Peekr traces every LLM call — cost, latency, tokens, hallucination score, and the exact inputs and outputs at every step. Two lines of Python. No proxy. No wrappers. No architecture change.
10k spans/month free · no credit card · MIT license
# Peekr detected a pattern automatically:
⚠ ROOT CAUSE Sequential execution
15 "entity extraction" spans ran one-after-another
Sequential (now): ████████████████ 279s
Parallel (after fix): ████ ~37s
Fix (low effort):
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=8) as pool:
results = list(pool.map(process, items))
279s
Before
→
37s
After
What Peekr shows you
The complaint
“My agent gave the wrong answer.”
agent.run 2100ms
└─ tool.fetch_user 12ms
out: null ← returned null
└─ openai.chat 2088ms
in: "User profile: null…" ← LLM got garbagePeekr's fix ·
The trace shows exactly what the LLM received. Malformed tool output caught in seconds, not hours.
The complaint
“My agent is hallucinating.”
eval_scores: { Hallucination: 0.00 }
✗ contradicted "founded in 1923"
✗ contradicted "designed by Frank Lloyd Wright"
~ unsupported "featured at the World's Fair"Peekr's fix ·
Every claim verdicted: supported / contradicted / unsupported. Find the exact sentence that was wrong.
The complaint
“My API bill is too high.”
Trace 1: 18,432 tokens · $0.018
Trace 2: 21,104 tokens · $0.021
Trace 3: 24,891 tokens · $0.025 ← growing
Cost by operation: chat_summary 67% of spendPeekr's fix ·
Cost per query growing means unbounded context. Peekr shows the slope before the bill arrives.
The complaint
“My agent is too slow.”
agent.run 4300ms
└─ tool.search_web 3800ms ← 88% of time
└─ tool.rerank 18ms
└─ openai.chat 490ms ← not your problemPeekr's fix ·
Most teams swap models first. The trace shows it's the tool — not the LLM. 88% of time in one call.
Auto-detection
6 performance pattern detectors run automatically across every trace. They surface in your Insights tab with the offending trace ID and a code fix.
Sequential execution
7.5× speedupSame span runs N times one-after-another. Parallelising saves 242s.
Read the case study →Observer overhead
Zero latencyEval blocking user responses. Move to background thread.
Read the case study →Redundant embeddings
Latency cutSame text embedded 2× in one trace. Cache the vector.
Context growth
Cost reducedToken count growing 60%+ across calls. Trim with rolling window.
Tool bottleneck
Major speedupSingle non-LLM call taking 88% of trace time. Cache it.
Retry storm
ReliabilitySame span fails 3× before succeeding. Add exponential back-off.
Setup
Peekr patches at the class level — every OpenAI() and Anthropic() instance is captured automatically. No wrappers. No proxy. No framework-specific config.
✓ Zero latency overhead — spans export on a background thread
✓ Works alongside JSONL / SQLite for local dev
✓ Self-hostable — MIT license, run anywhere
import peekr
peekr.instrument(
exporter=peekr.HTTPExporter(
endpoint="https://peekr.starkspherelabs.com",
api_key="pk_live_…",
),
evaluators=[peekr.eval.Hallucination(detailed=True)],
guardrails=[peekr.guard.PIIRedact()],
)
# Your existing code unchanged
from openai import OpenAI
client = OpenAI() # ← automatically tracedFree up to 10k spans per month. No credit card. Two lines of Python. Traces appear within 5 seconds of your first LLM call.
Also need compliance? See compliance guardrails →