← All posts
observabilityMay 31, 2026·4 min read

LLM Cost Per Query: How to Track It in Python (and Why Total Spend Lies)

Total LLM spend is a vanity metric. Cost per query is the number that tells you whether your AI agent is getting more expensive to run — and how to track it in Python with two lines of code.

LLM cost trackingOpenAI cost monitoringAI agent observabilityPythoncost per query

Your OpenAI bill is $2,000 this month. Last month it was $1,200. Is that a problem?

It depends entirely on whether usage grew proportionally. If you served 3× more queries, a 67% cost increase is actually efficiency improving. If usage stayed flat, you have a serious problem.

Total spend is a vanity metric. Cost per query is the number you need.

Why Cost Per Query Matters

Cost per query = total LLM cost ÷ number of root traces.

If this number is flat or falling as you grow, you're doing well. If it's rising, something is wrong:

  • Prompt bloat — conversation history is growing unboundedly, adding tokens every turn
  • Unnecessary tool calls — the agent is calling expensive tools it doesn't need
  • Wrong model for the task — using GPT-4o for simple classification that GPT-3.5 handles fine
  • Retries — failed calls being retried, doubling the cost

None of these show up in total spend until they've already cost you real money. Cost per query catches them early.

How to Track It

The naive approach is to sum usage.total_tokens from every API response and divide by request count. This breaks as soon as you have multiple services, async code, or want to attribute costs to specific features.

The right approach: instrument at the SDK level so every call is captured in context.

import peekr

peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
)

# Your existing OpenAI code is now instrumented — no changes
from openai import OpenAI
client = OpenAI()

Peekr captures tokens_input, tokens_output, and tokens_total on every LLM span, applies the current model pricing, and computes cost per span. It then groups spans by trace (a trace = one user query → one agent run) to give you cost per query.

Reading the Cost Per Query Trend

In the Peekr Cloud dashboard, the Costs page shows:

  1. Cost per query over 30 days — the single most important number. A flat or falling line means you're efficient. A rising line means investigation is needed.

  2. Cost by operation — which specific calls (chat completions, embeddings, tool calls) drive the most spend

  3. Cost by tenant — if you're B2B, which customer is your most expensive to serve

Day 1:  $0.0012 per query
Day 7:  $0.0014 per query   ← prompt growing
Day 14: $0.0021 per query   ← need to investigate
Day 21: $0.0015 per query   ← added summarisation, improving

The Three Most Common Causes of Rising Cost Per Query

1. Unbounded conversation history

This is the #1 culprit. Every turn appends to the messages array, so token count grows linearly with conversation length. Fix: summarise after N turns.

# Detect it: look at token growth across traces with the same session
# In SQL against the peekr traces:
SELECT
  session_id,
  AVG(tokens_total) as avg_tokens,
  COUNT(*) as turns
FROM spans
WHERE name = 'openai.chat.completions'
GROUP BY session_id
ORDER BY turns DESC;

2. Model mismatch

Using gpt-4o ($5/MTok input) for tasks that gpt-4o-mini ($0.15/MTok) handles equally well. The Peekr Insights tab surfaces this automatically: it compares the faithfulness scores by model and flags cases where downgrading would save money without quality loss.

3. Unnecessary embeddings

Embedding every chunk on every request instead of caching vectors. Embeddings are cheap per call but add up fast at scale. The costs page shows embedding spend as a separate operation — if it's more than 10% of total spend, you probably have a caching opportunity.

Setting a Cost Per Query Alert

Once you have a baseline, set an alert:

from peekr.alerts import CostSpike

peekr.instrument(
    alerts=[
        CostSpike(multiplier=2.0)  # alert if cost doubles vs rolling baseline
    ],
)

This fires when the average cost per trace exceeds 2× the rolling 7-day baseline — before the bill arrives.

What to Do When Cost Per Query Rises

In order of impact:

  1. Find the span that changed — compare yesterday's top-cost traces vs a week ago. The trace waterfall shows exactly which call got more expensive.

  2. Check token growth — if input tokens are growing run-over-run on the same operation, you have prompt bloat.

  3. Check model distribution — if something switched from a cheap to an expensive model, it shows in the "Cost by model" breakdown.

  4. Check error rate — retries are invisible in cost metrics unless you track errors. A 20% error rate means 20% of your spend is wasted.

Peekr surfaces all of this in one dashboard. Free tier covers 10k spans/month — enough to get your cost baseline established before you need to pay anything.

Start observing your AI agents in two lines of code.

Free tier — 10k spans/month. No credit card required.