LangChain agents are notoriously hard to debug. Peekr solves this with two lines of code — drop them above your existing agent setup and you get full trace visibility, token counts, latency breakdowns, and guardrail hooks with zero changes to your agent logic.
Why LangChain Observability Is Harder Than It Looks
A single LangChain agent invocation isn't one LLM call. It's a cascade: tool selection, tool execution, result parsing, re-prompting, and final synthesis. By the time you get a response back, five to fifteen things have happened internally — and if something went wrong, you're staring at a Python traceback that tells you nothing useful.
Standard logging doesn't help much here. You can wrap agent.invoke() in a try/except and log the output, but you lose:
- Which tool was called and why
- The exact prompt that went into each LLM call
- Token usage per step (not just total)
- Latency per node in the chain
- Whether the agent looped, hallucinated a tool name, or hit a context limit
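To make that loss concrete, here is a minimal sketch of the try/except approach. `fake_agent_invoke` is a hypothetical stand-in for agent_executor.invoke() so the snippet runs without an API key, and `invoke_with_logging` is an illustrative name, not part of LangChain or Peekr.

```python
import logging

logger = logging.getLogger("agent")


def fake_agent_invoke(payload: dict) -> dict:
    """Hypothetical stand-in for agent_executor.invoke(); not a LangChain API."""
    # A real agent would run several LLM and tool steps here, none of which
    # are visible to the caller.
    return {"output": f"Answer to: {payload['input']}"}


def invoke_with_logging(invoke, payload: dict) -> dict:
    """Naive wrapper: logs only the final output or the final exception."""
    try:
        result = invoke(payload)
    except Exception:
        # The traceback shows where the failure surfaced, not which tool
        # call or prompt led to it.
        logger.exception("agent failed for input=%r", payload.get("input"))
        raise
    logger.info("agent output=%r", result["output"])
    return result


result = invoke_with_logging(fake_agent_invoke, {"input": "weather in SF?"})
print(sorted(result.keys()))
```

The only thing the wrapper ever sees is the final dictionary (or the final exception). Every intermediate step on the list above is already gone by the time the log line is written.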
That's the gap observability fills. Let's look at how to plug it in.
The Two-Line Setup
Install Peekr and add it before your LangChain imports:
pip install peekr langchain langchain-openai
import peekr
peekr.init(api_key="your-peekr-api-key")
# Everything below is untouched from your existing code
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools import tool
@tool
def get_weather(city: str) -> str:
    """Returns the current weather for a given city."""
    # In production this would call a weather API
    return f"It's 72°F and sunny in {city}."
@tool
def get_population(city: str) -> str:
    """Returns the approximate population of a given city."""
    populations = {"San Francisco": "874,000", "New York": "8.3 million"}
    return populations.get(city, "Population data not available.")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use tools when needed."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, [get_weather, get_population], prompt)
agent_executor = AgentExecutor(agent=agent, tools=[get_weather, get_population], verbose=False)
result = agent_executor.invoke({"input": "What's the weather and population of San Francisco?"})
print(result["output"])
That's it. Peekr auto-instruments LangChain's callback system at import time. Every chain invocation, LLM call, tool execution, and agent step gets captured and sent to your Peekr dashboard as a structured trace.
What You Actually See in the Trace
After running the snippet above, your Peekr dashboard shows a waterfall trace that looks something like this:
AgentExecutor [total: 1.84s]
├── ChatOpenAI (tool selection) [312ms | 189 tokens]
│   └── Tool call: get_weather(city="San Francisco")
│       └── Result: "It's 72°F and sunny in San Francisco."
├── ChatOpenAI (tool selection) [287ms | 201 tokens]
│   └── Tool call: get_population(city="San Francisco")
│       └── Result: "874,000"
└── ChatOpenAI (final answer) [408ms | 234 tokens]
    └── Output: "San Francisco is currently 72°F and sunny, ..."
The numbers above are illustrative, but the structure is real. You can click into any node and inspect:
- The full system prompt and user message
- The raw JSON of the tool call and its return value
- Token counts split by prompt and completion
- Whether the model used cached tokens (relevant for OpenAI's prompt caching)
This is what changes debugging from guesswork to diagnosis.
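As a rough sketch of what per-step data makes possible, the illustrative trace above can be modeled as plain records and aggregated. The `Span` dataclass and its field names are hypothetical and are not Peekr's export format; the numbers are copied from the example waterfall.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical record for one LLM call in a trace."""
    name: str
    latency_ms: int
    tokens: int


# Values copied from the illustrative waterfall above.
TOTAL_MS = 1840
spans = [
    Span("tool selection (get_weather)", 312, 189),
    Span("tool selection (get_population)", 287, 201),
    Span("final answer", 408, 234),
]

llm_ms = sum(s.latency_ms for s in spans)
total_tokens = sum(s.tokens for s in spans)
slowest = max(spans, key=lambda s: s.latency_ms)

print(f"LLM time: {llm_ms} ms of {TOTAL_MS} ms")
print(f"Time outside LLM calls: {TOTAL_MS - llm_ms} ms")
print(f"Total tokens: {total_tokens}")
print(f"Slowest step: {slowest.name}")
```

With these example numbers, 833 ms of the 1.84 s falls outside the three LLM calls. A log that only records the total would never surface that split.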
Adding Guardrails on Top of Observability
Observability tells you what happened. Guardrails tell the agent what it's not allowed to do. Peekr lets you attach both in the same init() call:
import peekr
from peekr.guardrails import GuardrailPolicy
peekr.init(
    api_key="your-peekr-api-key",
    guardrails=GuardrailPolicy(
        block_topics=["competitor pricing", "legal advice"],
        max_tokens_per_call=2000,
        on_violation="warn",  # or "block" to raise an exception
    ),
)
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools import tool
@tool
def search_docs(query: str) -> str:
    """Searches internal documentation."""
    return f"Results for '{query}': [doc_1, doc_2, doc_3]"
llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a customer support assistant for Acme Corp."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, [search_docs], prompt)
agent_executor = AgentExecutor(agent=agent, tools=[search_docs])
# This will trigger a guardrail warning in the Peekr dashboard
result = agent_executor.invoke({
    "input": "Can you compare your prices to CompetitorX's pricing plans?"
})
print(result["output"])
With on_violation="warn", the agent still responds but Peekr flags the trace in the dashboard and fires any webhooks you've configured. Switch to "block" and Peekr raises a GuardrailViolationError before the LLM call goes out, giving you a clean place to catch it and return a canned response.
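Here is a minimal sketch of that catch-and-fallback pattern in "block" mode. The import path for GuardrailViolationError isn't shown in this post, so the snippet defines a local stand-in class with the same name; `safe_invoke`, `blocked_invoke`, and the canned message are hypothetical.

```python
class GuardrailViolationError(Exception):
    """Local stand-in for Peekr's exception; the real import path is an open assumption."""


CANNED_RESPONSE = "Sorry, I can't help with that topic."


def safe_invoke(invoke, payload: dict) -> str:
    """Runs the agent and falls back to a canned reply on a guardrail block."""
    try:
        return invoke(payload)["output"]
    except GuardrailViolationError:
        # The request was stopped before reaching the LLM, so there is no
        # partial output to clean up.
        return CANNED_RESPONSE


def blocked_invoke(payload: dict) -> dict:
    """Hypothetical agent call that simulates a blocked request."""
    raise GuardrailViolationError("blocked topic: competitor pricing")


reply = safe_invoke(blocked_invoke, {"input": "Compare your prices to CompetitorX."})
print(reply)
```

In a real app you would pass agent_executor.invoke as the first argument and import the exception from wherever Peekr exposes it.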
Tracing Across Multi-Agent Workflows
Single-agent observability is table stakes. The real pain is when you have multiple agents calling each other — a supervisor agent that routes to specialist sub-agents, for example. Without distributed tracing, you get disconnected logs with no way to correlate a slow response back to which sub-agent caused it.
Peekr propagates trace context automatically through LangChain's RunnableConfig, so parent-child relationships are preserved:
import peekr
peekr.init(api_key="your-peekr-api-key")
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableLambda
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Sub-agent: summarization specialist
summarizer = llm | StrOutputParser()
# Sub-agent: translation specialist
translator = llm | StrOutputParser()
def supervisor(input_dict: dict) -> str:
    """Routes the request to the matching sub-agent."""
    task = input_dict["task"]
    text = input_dict["text"]
    if task == "summarize":
        return summarizer.invoke([HumanMessage(content=f"Summarize this: {text}")])
    elif task == "translate":
        return translator.invoke([HumanMessage(content=f"Translate to Spanish: {text}")])
    return "Unknown task."
supervisor_chain = RunnableLambda(supervisor)
result = supervisor_chain.invoke({
    "task": "summarize",
    "text": "LangChain is a framework for building applications powered by language models."
})
print(result)
In the Peekr dashboard, both the supervisor invocation and the summarizer's LLM call appear under the same root trace ID. You can see total end-to-end latency and drill down into exactly which sub-chain contributed what.
Quick Wins: What to Do Right Now
If you're running a LangChain agent in production today, here's the priority order:
1. Install and init Peekr (5 minutes)
Add import peekr; peekr.init(api_key="...") at the top of your entry point. No other changes. Run your agent once and confirm traces appear in the dashboard.
2. Tag your traces with metadata
Set default metadata at init so every trace carries your environment and version:
peekr.init(
    api_key="your-peekr-api-key",
    default_metadata={"environment": "production", "version": "2.1.0"},
)
To correlate traces with specific users or sessions, pass per-invocation metadata such as session_id and user_id through LangChain's config parameter on any invoke() call.
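A per-invocation config might look like the sketch below. `metadata` and `tags` are standard keys in LangChain's RunnableConfig; that Peekr surfaces them on the trace is an assumption here, and `build_run_config`, the IDs, and the tag value are hypothetical.

```python
def build_run_config(session_id: str, user_id: str) -> dict:
    """Builds a LangChain-style config dict carrying per-invocation metadata."""
    return {
        # `metadata` and `tags` are standard RunnableConfig keys in LangChain.
        "metadata": {"session_id": session_id, "user_id": user_id},
        "tags": ["support-bot"],  # hypothetical tag
    }


config = build_run_config(session_id="sess-123", user_id="user-42")
print(config["metadata"])

# With an AgentExecutor like the ones earlier in this post, the call would be:
# agent_executor.invoke({"input": "What's the weather in San Francisco?"}, config=config)
```

Building the config in one helper keeps the key names consistent across every entry point that calls the agent.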
3. Set a token budget alert
Unexpected token usage usually means an agent stuck in a loop or a tool returning a massive payload. Set a threshold in Peekr's dashboard under Settings → Alerts, and you'll get a Slack or webhook notification the moment a single trace exceeds it.
4. Enable input/output capture for your tools
By default, Peekr captures LLM inputs and outputs as well as tool call arguments and return values. Confirm that tool data shows up in your traces, then review a few of them to spot tools that return more data than the model actually needs. Trimming tool outputs is often the single biggest latency win.
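Steps 3 and 4 can also be approximated in application code while you set up the dashboard side. The sketch below is not a Peekr feature: `trim_tool_output`, `over_budget`, and the limits are hypothetical, and the token counts reuse the illustrative trace from earlier.

```python
MAX_TOOL_CHARS = 2000  # hypothetical limit; tune per tool


def trim_tool_output(text: str, limit: int = MAX_TOOL_CHARS) -> str:
    """Truncates a tool result so the model sees at most `limit` characters of it."""
    if len(text) <= limit:
        return text
    return text[:limit] + f" [truncated {len(text) - limit} chars]"


def over_budget(step_tokens: list[int], budget: int) -> bool:
    """Returns True when a trace's summed token count exceeds the budget."""
    return sum(step_tokens) > budget


print(trim_tool_output("x" * 10, limit=4))
print(over_budget([189, 201, 234], budget=500))
```

Calling trim_tool_output on the return value inside a tool function caps what gets re-sent to the model on the next step, which is where oversized payloads cost you both tokens and latency.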
5. Add one guardrail before your next deployment
Even on_violation="warn" with a single blocked topic gives you signal on what users are actually trying to do that you didn't anticipate. It's a cheap way to surface product insights alongside safety enforcement.
The Bigger Picture
Most LangChain performance problems and reliability issues — loops, hallucinated tool names, runaway token costs, slow p95 latency — become obvious the moment you have a proper trace. The debugging cycle that used to take hours of print() statements and re-runs collapses to a few minutes of clicking through a waterfall.
The two lines of Peekr setup pay for themselves the first time you ship a broken agent to production and can diagnose it in under five minutes instead of an afternoon.
Check out the Peekr docs to see the full LangChain integration guide and a live demo environment you can run traces against without a production agent.