Shrutam CCA-F Crash › Domain 4 › d4-l06-extended-thinking-output-tag Hinglish English →

Extended Thinking — When to Enable, When to Skip

Domain 4 · 18% ~12 min Hinglish narration

Audio-only (commute / mobile data)

Same Saavi narration, smaller file. Opus 48k preferred — auto-selected by your browser.

Scenario anchor

Aap Aaranya IT BFSI ke liye ek loan-underwriting engine design kar rahe hain — har request pe Claude ko complex regulatory constraints aur applicant risk profile simultaneously evaluate karna hai. Product owner bol raha hai: "Extended Thinking enable karo, quality badhegi." Lekin aapka FinOps lead token costs dekh ke freeze ho gaya hai. Yeh lesson resolve karega: kab extended thinking genuinely ROI-positive hai, kab woh sirf latency aur cost ka overhead hai — aur output schema mein <thinking> tag ko aap kaise handle karte hain structured JSON ke saath.

Key Takeaways

Extended thinking should be enabled only when the task requires genuine multi-step inference across ambiguous constraints — regulatory arbitration, adversarial edge-case triage — not for deterministic extraction or template-fill tasks where a strict JSON schema prompt suffices.
The output tag is a first-class token-billable block; control it at the API parameter level (thinking budget) rather than post-processing it out of the response payload — your cost model must account for thinking tokens separately from output tokens.
In structured-output pipelines, validate that your JSON schema contract explicitly excludes or wraps the thinking block; passing raw Claude output with an embedded tag directly into a downstream schema validator will cause parse failures in production.
Memory anchor — treat extended thinking like a schema-validation pipeline: invoke it only when the payload complexity justifies the overhead, define the contract upfront, and never let unvalidated internal tags leak into your downstream message bus.

← Previous lesson Same lesson in English Next lesson → Back to overview