P
Peekr Cloud
DemoAcme Agents

Insights

Save money. Fix regressions. Right-size models.

Real-time analysis of your spans. We highlight where you're overpaying and where your stack just regressed — with concrete actions, not vibes.

Available savings
$365 / mo

129% of current LLM spend · 7 actionable recommendations

Current monthly spend
$282

Projected from the last 24h × 30 days

Active anomalies
3

Cost, latency, or quality drift vs. baseline

Recommendations · sorted by savings

What to fix this week.

Cost breakdown →
Model swapcode_assistlow effort

Route short code_assist queries to gpt-4o-mini

57 of 57 code_assist calls in 24h were under 600 input tokens on claude-opus-4-7.

Now:claude-opus-4-7·57 calls/24h·0 mean input tokensProposed:gpt-4o-mini

Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.

Save / mo
$85
99% on feature
Model swapchat_summarylow effort

Route short chat_summary queries to gpt-4o-mini

84 of 84 chat_summary calls in 24h were under 600 input tokens on claude-opus-4-7.

Now:claude-opus-4-7·84 calls/24h·0 mean input tokensProposed:gpt-4o-mini

Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.

Save / mo
$74
99% on feature
Prompt cachingchat_summarylow effort

Enable prompt caching for chat_summary

84 calls share a 2.4k-token system prompt on claude-opus-4-7.

Now:claude-opus-4-7·84 calls/24h·2,400 mean input tokens

Anthropic prompt caching cuts repeated system-prompt cost by ~90% after the first hit. Set cache_control: {"type":"ephemeral"} on the system block — no code path change required on Peekr's side.

Save / mo
$73
81% on feature
Model swapsupport_botlow effort

Route short support_bot queries to gpt-4o-mini

119 of 119 support_bot calls in 24h were under 600 input tokens on claude-opus-4-7.

Now:claude-opus-4-7·119 calls/24h·0 mean input tokensProposed:gpt-4o-mini

Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.

Save / mo
$65
99% on feature
Fine-tunesupport_bothigh effort

Fine-tune for support_bot (high-volume on premium model)

119 support_bot calls in 24h, 100% on premium models.

At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.

Save / mo
$49
75% on feature
Model swapsearch_qalow effort

Route short search_qa queries to gpt-4o-mini

90 of 90 search_qa calls in 24h were under 600 input tokens on gpt-4o.

Now:gpt-4o·90 calls/24h·0 mean input tokensProposed:gpt-4o-mini

Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_input < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.

Save / mo
$11
96% on feature
Fine-tunesearch_qahigh effort

Fine-tune for search_qa (high-volume on premium model)

90 search_qa calls in 24h, 100% on premium models.

At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.

Save / mo
$8
75% on feature

Anomalies · last 7 days

When things changed without you noticing.

costchat_summary2026-05-18 14:00

chat_summary cost +38% vs 7-day baseline

Triggered when chat_summary defaulted back to claude-opus-4-7 on 2026-05-18. Volume held flat — the spike is purely model-mix.

Inspect a representative trace →
38%
latency2026-05-19 13:18

tool.web_fetch p95 latency doubled

p95 jumped from 480ms to 980ms after the 13:00 deploy. Hit rate on the downstream proxy dropped — likely cache invalidation.

104%
qualitydata_extraction2026-05-19 09:42

data_extraction hallucination rate up 11pp

Switched from claude-opus-4-7 to claude-sonnet-4-6 on the structured extraction prompt. Quality regressed; estimated $/correct-answer is actually higher.

11%

Top spenders

Which users cost you the most.

All users →
UserShareCallsTop featureModels used24hProjected /mo
he
u_heavy_19
3.6%
40data_extraction3$0.335$10.06
he
u_heavy_27
3.0%
15chat_summary1$0.281$8.43
he
u_heavy_39
2.7%
19code_assist3$0.258$7.74
83
u_832
2.7%
4code_assist1$0.254$7.62
he
u_heavy_10
2.2%
15support_bot1$0.209$6.28
he
u_heavy_37
2.2%
25moderation3$0.205$6.15

Want these recommendations on your real traffic?

Sign in, mint a key, ship spans. Peekr starts surfacing optimizations the moment your first batch lands.

Sign in to start