129% of current LLM spend · 7 actionable recommendations
Projected from the last 24h × 30 days
Cost, latency, or quality drift vs. baseline
Recommendations · sorted by savings
What to fix this week.
Route short code_assist queries to gpt-4o-mini
57 of 57 code_assist calls in 24h were under 600 input tokens on claude-opus-4-7.
Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.
Route short chat_summary queries to gpt-4o-mini
84 of 84 chat_summary calls in 24h were under 600 input tokens on claude-opus-4-7.
Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.
Enable prompt caching for chat_summary
84 calls share a 2.4k-token system prompt on claude-opus-4-7.
Anthropic prompt caching cuts repeated system-prompt cost by ~90% after the first hit. Set cache_control: {"type":"ephemeral"} on the system block — no code path change required on Peekr's side.
Route short support_bot queries to gpt-4o-mini
119 of 119 support_bot calls in 24h were under 600 input tokens on claude-opus-4-7.
Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.
Fine-tune for support_bot (high-volume on premium model)
119 support_bot calls in 24h, 100% on premium models.
At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.
Route short search_qa queries to gpt-4o-mini
90 of 90 search_qa calls in 24h were under 600 input tokens on gpt-4o.
Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.
Fine-tune for search_qa (high-volume on premium model)
90 search_qa calls in 24h, 100% on premium models.
At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.
Anomalies · last 7 days
When things changed without you noticing.
chat_summary cost +38% vs 7-day baseline
Triggered when chat_summary defaulted back to claude-opus-4-7 on 2026-05-18. Volume held flat — the spike is purely model-mix.
Inspect a representative trace →tool.web_fetch p95 latency doubled
p95 jumped from 480ms to 980ms after the 13:00 deploy. Hit rate on the downstream proxy dropped — likely cache invalidation.
data_extraction hallucination rate up 11pp
Switched from claude-opus-4-7 to claude-sonnet-4-6 on the structured extraction prompt. Quality regressed; estimated $/correct-answer is actually higher.
Top spenders
Which users cost you the most.
| User | Share | Calls | Top feature | Models used | 24h | Projected /mo |
|---|---|---|---|---|---|---|
he u_heavy_19 | 3.6% | 40 | data_extraction | 3 | $0.335 | $10.06 |
he u_heavy_27 | 3.0% | 15 | chat_summary | 1 | $0.281 | $8.43 |
he u_heavy_39 | 2.7% | 19 | code_assist | 3 | $0.258 | $7.74 |
83 u_832 | 2.7% | 4 | code_assist | 1 | $0.254 | $7.62 |
he u_heavy_10 | 2.2% | 15 | support_bot | 1 | $0.209 | $6.28 |
he u_heavy_37 | 2.2% | 25 | moderation | 3 | $0.205 | $6.15 |
Want these recommendations on your real traffic?
Sign in, mint a key, ship spans. Peekr starts surfacing optimizations the moment your first batch lands.