Same Saavi narration, smaller file. Opus 48k preferred — auto-selected by your browser.
Scenario anchor
Aap Aaranya IT BFSI division mein ek large-language-model pipeline design kar rahe
hain — har API call mein 80,000-token regulatory compliance corpus attach
ho raha hai, aur input token costs quarter-over-quarter explode ho rahe
hain. Finance director ka escalation aa gaya hai. Prompt caching ka breakpoint
galat jagah rakha gaya tha — dynamic user query static system prompt ke
PEHLE aa rahi thi — isliye cache hit rate zero tha. Is lesson mein hum
exactly dekhenge ki cache breakpoints kahan place karne chahiye taaki
Claude ka prefix cache maximum hit kare aur aapki team ka cost SLA rescue ho.
Key Takeaways
Always structure your prompt as static-first, dynamic-last: system prompt → reference corpus → few-shot examples → cache_control breakpoint → user turn. This is the only ordering that guarantees a cache hit on the expensive static prefix.
Treat the cache breakpoint like a CDN edge node origin boundary — everything upstream of it must be byte-identical across requests; even a single variable token upstream causes a full cache miss and a full input-token charge.
Token-cost amortization math: a cached prefix costs roughly 10% of the base input-token price on cache hit reads, but carries a write premium on first load — breakpoints only break even if the same static prefix is reused across at least 3-5 subsequent requests within the cache TTL window.
Memory anchor — CDN edge cache invalidation: cache breakpoint placement in Claude = origin boundary placement in a CDN; static assets upstream, dynamic query-string params downstream, never the reverse.