Shrutam CCA-F Crash › Domain 5 › d5-l02-cache-static-prefix Hinglish English →

Prompt Caching — Where to Place Cache Breakpoints

Domain 5 · 15% ~12 min Hinglish narration

Audio-only (commute / mobile data)

Same Saavi narration, smaller file. Opus 48k preferred — auto-selected by your browser.

Scenario anchor

Aap Aaranya IT BFSI division mein ek large-language-model pipeline design kar rahe hain — har API call mein 80,000-token regulatory compliance corpus attach ho raha hai, aur input token costs quarter-over-quarter explode ho rahe hain. Finance director ka escalation aa gaya hai. Prompt caching ka breakpoint galat jagah rakha gaya tha — dynamic user query static system prompt ke PEHLE aa rahi thi — isliye cache hit rate zero tha. Is lesson mein hum exactly dekhenge ki cache breakpoints kahan place karne chahiye taaki Claude ka prefix cache maximum hit kare aur aapki team ka cost SLA rescue ho.

Key Takeaways

Always structure your prompt as static-first, dynamic-last: system prompt → reference corpus → few-shot examples → cache_control breakpoint → user turn. This is the only ordering that guarantees a cache hit on the expensive static prefix.
Treat the cache breakpoint like a CDN edge node origin boundary — everything upstream of it must be byte-identical across requests; even a single variable token upstream causes a full cache miss and a full input-token charge.
Token-cost amortization math: a cached prefix costs roughly 10% of the base input-token price on cache hit reads, but carries a write premium on first load — breakpoints only break even if the same static prefix is reused across at least 3-5 subsequent requests within the cache TTL window.
Memory anchor — CDN edge cache invalidation: cache breakpoint placement in Claude = origin boundary placement in a CDN; static assets upstream, dynamic query-string params downstream, never the reverse.

← Previous lesson Same lesson in English Next lesson → Back to overview