AI Agent Observability · Peekr Cloud

Trace every step your
AI agent takes.

Multi-step agents fail in ways single-call tracing can't see: cascade loops, tool failures two levels deep, broken inter-agent handoffs. Peekr captures the full call tree (every LLM call, tool use, and sub-agent span) in two lines of Python, with no proxy.

OpenAI · Anthropic · Gemini · Bedrock · LangChain · CrewAI · LlamaIndex

Why agents are harder to observe

Single-call tracing misses agent failures.

A single LLM call either succeeds or fails. An agent that makes 20 calls, invokes 5 tools, and spawns 2 sub-agents can fail at any node — and the failure propagates silently through the rest of the pipeline. You need a trace, not a log.

Cascade failures

A tool returns null. The LLM receives null as context. The agent produces a confident wrong answer. Without a waterfall, you only see the wrong answer.

Runaway loops

An agent that retries on failure can make 100 LLM calls before timing out. Trace depth makes the loop visible at a glance and shows you the tool condition that caused it.

Cost per workflow

A workflow with 12 LLM calls can spend 90% of its budget on one summarization step. Per-span cost attribution shows you exactly which step to optimize.

Four patterns every agent team hits

The trace you wish you had at 2am.

The complaint

My agent gave the wrong answer — but I don't know why.

See the full call tree: which tool returned null, what the LLM actually received.

Single-call LLM tracing misses the agent-specific failure: a tool two levels deep returns an empty result, the LLM receives null as context, and the model hallucinates an answer. The waterfall shows exactly where the chain broke.

agent.run  3200ms
  ├─ tool.lookup_customer   18ms
  │    out: null               ← returned null
  ├─ tool.fetch_history     610ms
  └─ openai.chat           2570ms
       in: "Customer data: null…"  ← LLM got garbage
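
The check a waterfall lets you do by eye can also be sketched in code: walk the spans of one trace and flag any tool span whose output is empty. This is a minimal illustration only. The `Span` dataclass, its field names, and `empty_tool_outputs` are assumptions made up for this example, not Peekr's data model or API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Span:
    # Hypothetical span record; field names are illustrative only.
    name: str
    duration_ms: int
    output: Optional[Any] = None
    parent: Optional[str] = None


def empty_tool_outputs(spans: list[Span]) -> list[str]:
    """Return names of tool spans that produced no usable output."""
    return [
        s.name
        for s in spans
        if s.name.startswith("tool.") and s.output in (None, "", [], {})
    ]


# Same shape as the trace above: one tool returns nothing.
trace = [
    Span("agent.run", 3200, output="answer"),
    Span("tool.lookup_customer", 18, output=None, parent="agent.run"),
    Span("tool.fetch_history", 610, output=["order-1"], parent="agent.run"),
    Span("openai.chat", 2570, output="answer", parent="agent.run"),
]

print(empty_tool_outputs(trace))
```

Run against the trace above, the sketch flags only `tool.lookup_customer`, the span whose null output the LLM went on to consume.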

The complaint

One user request triggered 122 LLM calls.

The waterfall shows cascade loops the moment they start.

Retry loops, unbounded recursion, and agent-spawns-agent patterns are invisible in call-level logs. In a trace, a runaway loop looks like a stack of repeated spans — you see it immediately and know which tool or condition caused it.

agent.run  47,200ms (timeout)
  └─ sub_agent.retry  × 122
       └─ openai.chat  380ms each
            reason: tool always returns "try again"
            cost:   $0.46 per user request
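
A stack of repeated spans is also easy to detect mechanically. The sketch below counts span names within a single trace and reports any that exceed a limit. The function name `repeated_spans` and the default limit of 10 are assumptions chosen for illustration, not Peekr settings.

```python
from collections import Counter


def repeated_spans(span_names: list[str], limit: int = 10) -> dict[str, int]:
    """Return span names that occur more than `limit` times in one trace."""
    counts = Counter(span_names)
    return {name: n for name, n in counts.items() if n > limit}


# Span names from one hypothetical trace: one root plus 122 retries,
# each of which makes a single LLM call.
names = ["agent.run"] + ["sub_agent.retry", "openai.chat"] * 122

print(repeated_spans(names))
```

In the trace above, 122 calls at 380ms each account for 46,360ms of the 47,200ms total, so the loop is nearly the entire request.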

The complaint

My agent is hallucinating — which step caused it?

Claim-level scoring pinpoints the exact sentence and which context span fed it.

An agent that retrieves, summarizes, and answers in three steps can hallucinate at any of them. Peekr scores every LLM output at the sentence level — supported, contradicted, unsupported — and links the verdict back to the retrieval span so you know whether to fix the prompt or the retriever.

step.retrieve    → 4 docs fetched
step.summarize   → eval_scores: { Hallucination: 0.00 }
  ✗ contradicted  "founded in 1987"
  ✗ contradicted  "revenue $2.4B"
step.answer      → blocked by HallucinationBlock(threshold=0.5)
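
One way to read the output above is as a per-sentence verdict list rolled up into a single score. The sketch below assumes the score is the fraction of sentences judged "supported" and that a step is blocked when the score falls below the threshold. Both rules are assumptions for illustration; the page does not define how Peekr computes the score or how `HallucinationBlock` applies its threshold, and `support_score` and `should_block` are hypothetical names.

```python
from typing import Literal

Verdict = Literal["supported", "contradicted", "unsupported"]


def support_score(verdicts: list[Verdict]) -> float:
    """Fraction of sentences judged 'supported' (assumed scoring rule)."""
    if not verdicts:
        return 1.0  # nothing claimed, so nothing to contradict
    return sum(v == "supported" for v in verdicts) / len(verdicts)


def should_block(verdicts: list[Verdict], threshold: float = 0.5) -> bool:
    """Block the next step when the score falls below the threshold (assumed rule)."""
    return support_score(verdicts) < threshold


# Two sentences, both contradicted by the retrieved documents.
summary_verdicts: list[Verdict] = ["contradicted", "contradicted"]

print(support_score(summary_verdicts))
print(should_block(summary_verdicts))
```

Under these assumptions, two contradicted sentences give a score of 0.0, which is below 0.5, so the answer step would not run.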

The complaint

My agent cost $0.04 per request and I don't know why.

Cost is broken down per span, per workflow step, and per tenant.

A single agent workflow can make a dozen LLM calls with very different token counts. Peekr shows cost per span and cost per workflow, so you can see that the summarization step costs 28× the retrieval step and know exactly where to add caching or trim context.

workflow cost:  $0.038 / request
  step.plan        $0.002   (5%)
  step.retrieve    $0.001   (3%)
  step.summarize   $0.028   (74%)  ← unbounded tokens
  step.answer      $0.007   (18%)
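
The percentages in a breakdown like this are plain arithmetic over per-step costs. The sketch below reproduces them from the four figures shown above. The function name `cost_shares` is an assumption for illustration, not part of Peekr.

```python
def cost_shares(step_costs: dict[str, float]) -> dict[str, int]:
    """Return each step's share of total cost as a whole-number percentage."""
    total = sum(step_costs.values())
    return {step: round(100 * cost / total) for step, cost in step_costs.items()}


# Per-step costs in dollars, copied from the breakdown above.
step_costs = {
    "step.plan": 0.002,
    "step.retrieve": 0.001,
    "step.summarize": 0.028,
    "step.answer": 0.007,
}

shares = cost_shares(step_costs)
costliest = max(step_costs, key=step_costs.get)

print(f"total: ${sum(step_costs.values()):.3f}")
print(shares)
print(f"optimize first: {costliest}")
```

The step with the largest share is the first candidate for caching or context trimming.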

Two lines. Every span captured.

Instrument your agent before the first import. The rest is automatic.

Peekr patches at the class level, so every OpenAI(), AsyncOpenAI(), and anthropic.Anthropic() instance is covered. Any tool call that wraps a patched client becomes a child span automatically, with no per-tool configuration.

  • Async + streaming fully supported — AsyncOpenAI, streamed responses rolled up
  • Works with LangChain, CrewAI, LlamaIndex, and any framework on top of the SDK
  • No export latency on the request path: spans export on a background thread

agent.py
# 1. Call this before any other imports
import peekr

peekr.instrument(
    tenant_id="my-agent",
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
    evaluators=[peekr.eval.Hallucination(detailed=True)],
)

# 2. Your agent code is unchanged — every call below is traced
from openai import OpenAI
client = OpenAI()  # ← patched automatically

# LangChain, CrewAI, LlamaIndex — same 2 lines
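
For readers curious what "patching at the class level" means in general, here is a minimal sketch of the technique in plain Python. It is not Peekr's implementation. `FakeClient`, `patch_class`, and `recorded_spans` are stand-ins invented for this example, and no real provider SDK is involved.

```python
import functools
import time


class FakeClient:
    """Stand-in for a provider SDK client; not a real SDK class."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


recorded_spans: list[dict] = []


def patch_class(cls, method_name: str) -> None:
    """Replace cls.method_name with a wrapper that records one span per call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Record the span whether the call succeeded or raised.
            recorded_spans.append({
                "name": f"{cls.__name__}.{method_name}",
                "duration_ms": (time.perf_counter() - start) * 1000,
            })

    setattr(cls, method_name, wrapper)


existing = FakeClient()               # created before patching
patch_class(FakeClient, "complete")
later = FakeClient()                  # created after patching

existing.complete("hi")
later.complete("there")
print([s["name"] for s in recorded_spans])
```

Because the wrapper is installed on the class, instances created before and after the patch are both covered. A direct reference to the method taken before patching (for example `f = FakeClient.complete`) would still point at the original, which is one general reason to patch early.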

Regulated AI agents — healthcare, fintech, or legal — need more than observability. Peekr also enforces HIPAA, FDCPA, FINRA, and GDPR on every span in-process, with no proxy and a tamper-evident audit log.

See compliance packs →

FAQ

Common questions about AI agent observability.

What is AI agent observability?

AI agent observability is the practice of tracing every step of a multi-step agent pipeline (each LLM call, each tool use, each inter-agent handoff) and assembling those steps into a full call tree. Unlike basic LLM logging, agent observability must capture the causal chain across many spans so you can see exactly which tool returned null, which LLM received corrupted context, and which step caused a cascade failure.
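
As a rough illustration of how a call tree comes out of flat span records, the sketch below groups spans by parent ID and prints them with indentation. The tuple layout `(span_id, parent_id, name, duration_ms)` and the function `render_tree` are assumptions for this example, not Peekr's export format.

```python
from collections import defaultdict

# (span_id, parent_id, name, duration_ms): hypothetical flat span records.
spans = [
    ("a", None, "agent.run", 3200),
    ("b", "a", "tool.lookup_customer", 18),
    ("c", "a", "tool.fetch_history", 610),
    ("d", "a", "openai.chat", 2570),
]


def render_tree(spans) -> list[str]:
    """Return one indented line per span, children nested under parents."""
    children = defaultdict(list)
    for span_id, parent_id, name, ms in spans:
        children[parent_id].append((span_id, name, ms))

    lines: list[str] = []

    def walk(parent_id, depth):
        for span_id, name, ms in children[parent_id]:
            lines.append(f"{'  ' * depth}{name}  {ms}ms")
            walk(span_id, depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines


print("\n".join(render_tree(spans)))
```

The parent link is what separates a trace from a log: the same four records without `parent_id` would be an unordered list of calls.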

How is AI agent observability different from LLM observability?

LLM observability covers individual call metrics — tokens, latency, cost, and output quality for a single completion. AI agent observability covers the full pipeline: a workflow that makes dozens of LLM calls across multiple models, invokes external tools, and may spawn sub-agents. You need a trace waterfall, not just per-call logs.

Does Peekr support LangChain, CrewAI, and LlamaIndex?

Yes. Peekr auto-instruments at the class level, so every OpenAI(), AsyncOpenAI(), and anthropic.Anthropic() instance is patched. LangChain, CrewAI, LlamaIndex, and any other framework that calls the underlying provider SDK are covered without any per-framework configuration.

How do I add observability to an existing Python agent?

Add two lines before your other imports: import peekr and peekr.instrument(...). Peekr patches the provider clients at the class level — your agent code stays unchanged and every LLM call and tool span is captured automatically.

Can I observe agents that run in async or streaming mode?

Yes. Peekr supports AsyncOpenAI and streamed responses — streaming calls are rolled up into a single span with the final token count and cost. Async agents using asyncio are fully supported without any configuration changes.
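
Rolling a stream up into one span can be pictured as a generator wrapper: pass each chunk through untouched, then record a single span when the stream ends. The sketch below shows the general idea only. `traced_stream` and `recorded` are hypothetical names, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
from typing import Iterable, Iterator

recorded: list[dict] = []


def traced_stream(name: str, chunks: Iterable[str]) -> Iterator[str]:
    """Yield chunks unchanged, then record one span for the whole stream."""
    parts: list[str] = []
    try:
        for chunk in chunks:
            parts.append(chunk)
            yield chunk
    finally:
        # Runs when the stream is exhausted or closed early.
        text = "".join(parts)
        recorded.append({
            "name": name,
            "chunks": len(parts),
            # Whitespace word count as a crude stand-in for a tokenizer.
            "approx_tokens": len(text.split()),
        })


out = list(traced_stream("openai.chat", ["Hello", " wor", "ld", " again"]))
print(out)
print(recorded)
```

The caller still receives every chunk as it arrives, and the trace gets one span instead of one per chunk.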

Observe your first agent in two lines.

Free up to 10k spans per month. Every LLM call, tool span, and sub-agent step traced automatically. No proxy, no architecture change, no credit card required.

Also need compliance? See HIPAA for AI agents →