Auto-instruments OpenAI · Anthropic · Gemini · Bedrock

Something went wrong.
The trace shows you exactly what.

Peekr traces every LLM call — cost, latency, tokens, hallucination score, and the exact inputs and outputs at every step. Two lines of Python. No proxy. No wrappers. No architecture change.

10k spans/month free · no credit card · MIT license

peekr · trace diagnosislive

# Peekr detected a pattern automatically:

⚠ ROOT CAUSE Sequential execution

15 "entity extraction" spans ran one-after-another

Sequential (now): ████████████████ 279s

Parallel (after fix): ████ ~37s

Fix (low effort):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=8) as pool:

results = list(pool.map(process, items))

trace · 95629e41 · extremis7.5× speedup available

279s

Before

37s

After

What Peekr shows you

Four complaints. Four traces. Four fixes.

The complaint

My agent gave the wrong answer.

trace output
agent.run  2100ms
  └─ tool.fetch_user  12ms
       out: null          ← returned null
  └─ openai.chat      2088ms
       in: "User profile: null…"  ← LLM got garbage

Peekr's fix ·

The trace shows exactly what the LLM received. Malformed tool output caught in seconds, not hours.

The complaint

My agent is hallucinating.

trace output
eval_scores: { Hallucination: 0.00 }

  ✗ contradicted  "founded in 1923"
  ✗ contradicted  "designed by Frank Lloyd Wright"
  ~ unsupported   "featured at the World's Fair"

Peekr's fix ·

Every claim verdicted: supported / contradicted / unsupported. Find the exact sentence that was wrong.

The complaint

My API bill is too high.

trace output
Trace 1:  18,432 tokens  · $0.018
Trace 2:  21,104 tokens  · $0.021
Trace 3:  24,891 tokens  · $0.025  ← growing

Cost by operation: chat_summary  67% of spend

Peekr's fix ·

Cost per query growing means unbounded context. Peekr shows the slope before the bill arrives.

The complaint

My agent is too slow.

trace output
agent.run  4300ms
  └─ tool.search_web  3800ms  ← 88% of time
  └─ tool.rerank         18ms
  └─ openai.chat        490ms  ← not your problem

Peekr's fix ·

Most teams swap models first. The trace shows it's the tool — not the LLM. 88% of time in one call.

Auto-detection

Peekr reads the traces. You fix the bugs.

6 performance pattern detectors run automatically across every trace. They surface in your Insights tab with the offending trace ID and a code fix.

Sequential execution

7.5× speedup

Same span runs N times one-after-another. Parallelising saves 242s.

Read the case study →
📊

Observer overhead

Zero latency

Eval blocking user responses. Move to background thread.

Read the case study →
🔁

Redundant embeddings

Latency cut

Same text embedded 2× in one trace. Cache the vector.

📈

Context growth

Cost reduced

Token count growing 60%+ across calls. Trim with rolling window.

🔧

Tool bottleneck

Major speedup

Single non-LLM call taking 88% of trace time. Cache it.

🔄

Retry storm

Reliability

Same span fails 3× before succeeding. Add exponential back-off.

Setup

Two lines. Everything else is automatic.

Peekr patches at the class level — every OpenAI() and Anthropic() instance is captured automatically. No wrappers. No proxy. No framework-specific config.

Zero latency overhead — spans export on a background thread

Works alongside JSONL / SQLite for local dev

Self-hostable — MIT license, run anywhere

Start free — 10k spans/month →
agent.py
import peekr

peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
    evaluators=[peekr.eval.Hallucination(detailed=True)],
    guardrails=[peekr.guard.PIIRedact()],
)

# Your existing code unchanged
from openai import OpenAI
client = OpenAI()  # ← automatically traced

Find bugs in your AI before your users do.

Free up to 10k spans per month. No credit card. Two lines of Python. Traces appear within 5 seconds of your first LLM call.

Also need compliance? See compliance guardrails →