Same Saavi narration, smaller file. Opus 48k preferred — auto-selected by your browser.
Scenario anchor
Aap Aaranya IT BFSI ke liye ek loan-underwriting engine design kar rahe hain —
har request pe Claude ko complex regulatory constraints aur applicant
risk profile simultaneously evaluate karna hai. Product owner bol raha
hai: "Extended Thinking enable karo, quality badhegi." Lekin aapka
FinOps lead token costs dekh ke freeze ho gaya hai. Yeh lesson resolve
karega: kab extended thinking genuinely ROI-positive hai, kab woh sirf
latency aur cost ka overhead hai — aur output schema mein
<thinking> tag ko aap kaise handle karte hain structured JSON ke saath.
Key Takeaways
Extended thinking should be enabled only when the task requires genuine multi-step inference across ambiguous constraints — regulatory arbitration, adversarial edge-case triage — not for deterministic extraction or template-fill tasks where a strict JSON schema prompt suffices.
The output tag is a first-class token-billable block; control it at the API parameter level (thinking budget) rather than post-processing it out of the response payload — your cost model must account for thinking tokens separately from output tokens.
In structured-output pipelines, validate that your JSON schema contract explicitly excludes or wraps the thinking block; passing raw Claude output with an embedded tag directly into a downstream schema validator will cause parse failures in production.
Memory anchor — treat extended thinking like a schema-validation pipeline: invoke it only when the payload complexity justifies the overhead, define the contract upfront, and never let unvalidated internal tags leak into your downstream message bus.