Observability
Call peekr.instrument() once and Peekr records a span for every LLM call — model, tokens, latency, status, inputs, and outputs. No wrappers and no proxy.
What gets captured
Instrument before you import any LLM SDK. Peekr patches the client classes for OpenAI, Anthropic, AWS Bedrock, and Google Gemini, so every call after that is traced automatically.
import peekr
peekr.instrument() # patches OpenAI, Anthropic, Bedrock, Gemini at the class level
from openai import OpenAI # import AFTER instrument()
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this contract..."}],
)
# Span written: model, tokens_input, tokens_output, tokens_total, input, output, duration_ms, status
Warning
peekr.instrument() must run before you import or instantiate any LLM SDK. The instrumentation patches at the class level, so a client created before the call is never traced.
Every span carries these fields.
Top-level: trace_id, span_id, parent_id, name, start_time, end_time, duration_ms, status (ok | error), tenant_id, retention_class.
Under attributes: model, tokens_input, tokens_output, tokens_total, input, output, error, session_id, user_id, eval_scores, experiment_variant, feature, endpoint.
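As a rough illustration of that layout, a single span might look like the dictionary below. Only the field names come from the list above; the values, the types, the span name, and the missing_fields helper are assumptions made for this sketch and are not part of Peekr.

```python
# Illustrative only: field names follow the list above; values and types are assumptions.
TOP_LEVEL_FIELDS = {
    "trace_id", "span_id", "parent_id", "name", "start_time", "end_time",
    "duration_ms", "status", "tenant_id", "retention_class",
}

example_span = {
    "trace_id": "trace-1",
    "span_id": "span-1",
    "parent_id": None,                    # assumed: a root span has no parent
    "name": "openai.chat.completions",    # hypothetical span name
    "start_time": 1700000000.000,         # assumed: epoch seconds
    "end_time": 1700000000.850,
    "duration_ms": 850,
    "status": "ok",                       # ok | error
    "tenant_id": "acme",
    "retention_class": "default",
    "attributes": {
        "model": "gpt-4o",
        "tokens_input": 120,
        "tokens_output": 45,
        "tokens_total": 165,
        "input": "Summarise this contract...",
        "output": "The contract covers ...",
        "error": None,
        "session_id": "user-abc-turn-3",
        "user_id": "user-abc",
    },
}


def missing_fields(span: dict) -> set:
    """Return the top-level field names absent from a span dict (hypothetical helper)."""
    return TOP_LEVEL_FIELDS - set(span)


print(missing_fields(example_span))
```

A check like this can be handy when you post-process exported spans and want to fail early on records that do not match the shape you expect.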
Cost & latency
Token counts (tokens_input / tokens_output / tokens_total) let you attribute spend per call, and duration_ms on each span shows where wall-clock time actually goes — usually a tool or retrieval step, rarely the model itself.
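To turn token counts into spend, multiply them by your provider's per-token prices. The sketch below uses made-up prices and a hypothetical estimate_cost helper; neither comes from Peekr, and real rates should be taken from your provider's price list.

```python
# Hypothetical prices in USD per 1M tokens; substitute your provider's actual rates.
PRICES_PER_MILLION = {
    "example-model": {"input": 2.00, "output": 8.00},
}


def estimate_cost(model: str, tokens_input: int, tokens_output: int) -> float:
    """Estimate the USD cost of one call from its span's token counts."""
    price = PRICES_PER_MILLION[model]
    return (tokens_input * price["input"] + tokens_output * price["output"]) / 1_000_000


# 1,000 input tokens and 500 output tokens at the hypothetical rates above.
cost = estimate_cost("example-model", tokens_input=1_000, tokens_output=500)
print(f"{cost:.6f}")
```

Input and output tokens are priced separately here because providers typically charge different rates for each, which is why tokens_total alone is not enough for an accurate figure.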
-- Average tokens and duration per model, from local SQLite (storage="sqlite")
SELECT
    json_extract(attributes, '$.model') AS model,
    AVG(json_extract(attributes, '$.tokens_total')) AS avg_tokens,
    AVG(duration_ms) AS avg_ms,
    COUNT(*) AS calls
FROM spans
WHERE name LIKE 'openai.chat%'
GROUP BY model
ORDER BY avg_tokens DESC;
Sessions & multi-tenant
Wrap a multi-turn conversation in peekr.session(...) to group its spans and tag them with a user, tenant, and retention class. Everything inside the block shares those values.
with peekr.session(
    user_id="user-abc",
    tenant_id="acme",
    session_id="user-abc-turn-3",
    retention_class="short",  # default | short | long | pii
):
    resp1 = client.chat.completions.create(...)
    resp2 = client.chat.completions.create(...)
# Both spans share session_id, user_id, tenant_id, retention_class
Note
Values set in session() win over those set in instrument(), which in turn win over the environment variables PEEKR_TENANT_ID and PEEKR_RETENTION_CLASS.
Claim-level hallucination detection
peekr.eval.Hallucination() scores how grounded each response is — 1.0 means every claim is supported, 0.0 means fully hallucinated. The score lands in span.attributes["eval_scores"].
import peekr
peekr.instrument(
    evaluators=[
        peekr.eval.Hallucination(
            # Pull the grounding context out of each span (e.g. your RAG documents):
            context_extractor=lambda span: span.attributes.get("input"),
            model="gpt-4o-mini",  # judge model
            detailed=True,  # store a verdict for every claim
        ),
    ],
)
With detailed=True, Peekr decomposes the response into atomic claims and writes per-claim verdicts to span.attributes.hallucination_details. Each verdict is one of supported, contradicted, or unsupported, so you can see exactly which sentence went off the rails, not just an aggregate number.
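One way to use the per-claim verdicts is to summarise them and pull out the claims that need attention. The sketch below assumes each entry in hallucination_details is a dict with claim and verdict keys. That shape, the summarise_verdicts helper, and the supported-fraction calculation are assumptions for illustration, not Peekr's documented format or scoring formula.

```python
from collections import Counter


def summarise_verdicts(details: list[dict]) -> dict:
    """Count verdicts and list the claims that were not supported (hypothetical helper)."""
    counts = Counter(d["verdict"] for d in details)
    total = len(details)
    return {
        "counts": dict(counts),
        # Share of claims judged "supported"; None when there are no claims.
        "supported_fraction": counts["supported"] / total if total else None,
        "flagged": [d["claim"] for d in details if d["verdict"] != "supported"],
    }


# Made-up verdicts in the assumed shape.
details = [
    {"claim": "The contract runs for 12 months.", "verdict": "supported"},
    {"claim": "Either party may terminate with 30 days' notice.", "verdict": "supported"},
    {"claim": "The contract renews automatically.", "verdict": "unsupported"},
    {"claim": "Payment is due within 90 days.", "verdict": "contradicted"},
]

summary = summarise_verdicts(details)
print(summary["supported_fraction"])
print(summary["flagged"])
```

Keeping contradicted and unsupported claims separate in the counts is useful because they usually call for different fixes: a contradiction points at the model misreading the context, while an unsupported claim often means the context was missing.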
Viewing traces
Choose where spans land with storage in instrument(), then inspect them from the CLI.
Open a trace file or database
Add --io to include the captured inputs and outputs.
peekr view traces.jsonl --io
peekr view traces.db --io
Generate a self-contained HTML report
Builds a shareable dashboard from a SQLite database.
peekr dashboard traces.db -o report.html
Replay a single trace
Step through one trace by id to see its full waterfall.
peekr replay <trace_id> --db traces.db
Peekr Cloud
Point an HTTPExporter at Peekr Cloud to get the trace waterfall, cost and token trends, hallucination distributions, and a worst-offender list with no SQL needed.
import peekr
peekr.instrument(
    tenant_id="acme",
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
)
// TypeScript
import { instrument } from "@peekr/sdk";
instrument({
  exporter: {
    type: "http",
    endpoint: "https://peekr.starkspherelabs.com",
    apiKey: "pk_live_…",
  },
});
Feedback & fine-tuning data
Attach a human rating to any trace with peekr.feedback(...), then export the good ones as a ready-to-train dataset with peekr.export_feedback(...).
# Rate a trace
peekr.feedback(trace_id="...", rating="good", note="grounded, correct tone")
# Export the good traces as an OpenAI fine-tuning file
peekr.export_feedback(
    db_path="traces.db",
    filter="good",
    output="training.jsonl",
    format="openai-ft",  # or "raw"
)
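For orientation, OpenAI's chat fine-tuning format is JSON Lines: one JSON object per line, each with a messages list of role/content pairs. The sketch below shows how a span's captured input and output could map onto one such line. The to_openai_ft_record helper is hypothetical, it assumes the captured input is a plain user prompt string, and it does not describe what peekr.export_feedback actually writes.

```python
import json


def to_openai_ft_record(prompt: str, completion: str) -> str:
    """Build one JSONL line in OpenAI's chat fine-tuning shape (hypothetical helper)."""
    record = {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }
    # ensure_ascii=False keeps non-ASCII text readable in the output file.
    return json.dumps(record, ensure_ascii=False)


line = to_openai_ft_record("Summarise this contract...", "The contract covers ...")
print(line)
```

Inspecting the first few lines of training.jsonl against this shape is a quick sanity check before uploading the file for a training job.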