How to use Claude's extended thinking for complex reasoning tasks

Claude's extended thinking feature is one of the most underutilized capabilities in the API. While most people treat Claude like a fast autocomplete, extended thinking transforms it into a genuine reasoning engine that can tackle problems requiring deep analysis, multi-step logic, and careful consideration.

Here's what you need to know to use it effectively.

What Extended Thinking Actually Does

Extended thinking enables Claude to spend computational resources on internal reasoning before generating a response. Think of it like the difference between blurting out an answer versus pausing to think it through.

When you enable extended thinking, Claude produces a "thinking" block in which it reasons through the problem before writing the actual response. You can read this reasoning, but it is not free: thinking tokens are billed as output tokens, so a long thinking block can cost more than the answer itself.

The key insight: this isn't a gimmick. It tends to improve performance on hard problems, most noticeably on logic puzzles, code debugging, mathematical reasoning, and strategic analysis.

When to Actually Use Extended Thinking

Extended thinking isn't for everything. Use it when:

The problem requires multiple reasoning steps. Math proofs, complex system design, debugging tangled code.
You need to weigh competing considerations. Strategic decisions, policy analysis, ethical dilemmas.
The stakes are high enough to justify the cost. A better answer is worth the extra tokens and latency for your use case.
The problem is genuinely difficult. Claude struggles with it without thinking enabled.

Skip it for:

Simple factual questions
Creative writing that doesn't require analysis
Straightforward coding tasks
Anything time-sensitive where latency matters
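
The two lists above amount to a routing decision, which can be sketched as a small helper. This is a minimal sketch, not a recommended policy: the function name build_request_kwargs and the two boolean flags are hypothetical, and how you decide that a task is "multi-step" is left to your application.

```python
from typing import Any, Dict


def build_request_kwargs(
    prompt: str,
    *,
    multi_step: bool,
    latency_sensitive: bool,
    budget_tokens: int = 8000,
    max_tokens: int = 16000,
) -> Dict[str, Any]:
    """Build keyword arguments for client.messages.create().

    Hypothetical helper: thinking is enabled only for multi-step problems
    where extra latency is acceptable.
    """
    kwargs: Dict[str, Any] = {
        "model": "claude-3-7-sonnet-20250219",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if multi_step and not latency_sensitive:
        # Same parameter shape as the API example in this article.
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return kwargs


# A simple factual question: thinking stays off.
simple = build_request_kwargs(
    "What is the capital of France?", multi_step=False, latency_sensitive=False
)

# A hard debugging task with no latency pressure: thinking is turned on.
hard = build_request_kwargs(
    "Debug this tangled function: [code]", multi_step=True, latency_sensitive=False
)
```

The resulting dictionaries can be passed as client.messages.create(**simple) or client.messages.create(**hard).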

How to Implement It

The implementation depends on where you're using Claude.

Via API (claude-3-7-sonnet or a later model that supports extended thinking; claude-3-5-sonnet does not):

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    # max_tokens covers thinking plus the final answer, so it must exceed budget_tokens.
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {
            "role": "user",
            "content": "Debug this function that's supposed to find the longest palindromic substring but returns incorrect results for edge cases."
        }
    ]
)

# The response is a list of content blocks: thinking first, then the answer text.
for block in response.content:
    if block.type == "thinking":
        print("Internal reasoning:", block.thinking)
    elif block.type == "text":
        print("Response:", block.text)

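The budget_tokens parameter sets the maximum number of tokens Claude may spend on thinking. Higher budgets allow more reasoning but raise cost and latency. The minimum is 1,024 tokens, and the budget must be smaller than max_tokens, because max_tokens covers both the thinking and the final answer. Start with 5000-10000 tokens and adjust based on problem complexity.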
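
Catching a bad budget before sending the request can save a failed API call. The sketch below assumes the two constraints documented for the Messages API at the time of writing (a 1,024-token minimum, and budget_tokens strictly below max_tokens); check_thinking_budget is a hypothetical helper, not part of the SDK, and the limits should be verified against the current documentation.

```python
# Assumed from the API documentation at the time of writing; verify before relying on it.
MIN_BUDGET_TOKENS = 1024


def check_thinking_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError if the thinking budget cannot work with max_tokens."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens={budget_tokens} is below the minimum of {MIN_BUDGET_TOKENS}"
        )
    if budget_tokens >= max_tokens:
        # max_tokens has to leave room for the final answer after thinking.
        raise ValueError(
            f"budget_tokens={budget_tokens} must be smaller than max_tokens={max_tokens}"
        )


# The values used in the example request pass the check.
check_thinking_budget(budget_tokens=10000, max_tokens=16000)
```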

Via Claude Web or Claude Desktop:

In the Claude apps, extended thinking is a setting you switch on for models that support it rather than an API parameter, and availability can depend on your plan. When it is on, pay attention to the "thinking" section that appears with the response.

Practical Patterns That Work

Pattern 1: Structured Problem Analysis

For complex decisions, enable thinking in the request and ask for a specific analytical structure in the prompt. The prompt shapes the reasoning; it does not turn thinking on by itself.

I need to decide whether to refactor our authentication system or migrate to a third-party provider.

Please:
1. Analyze both options in depth
2. Consider: cost, security implications, timeline, team capability
3. Identify hidden risks for each approach
4. Recommend the better path with clear reasoning

Extended thinking excels here because it genuinely weighs multiple factors rather than defaulting to the first compelling argument.

Pattern 2: Code Debugging

When you're stuck on a bug:

This Python function is supposed to calculate compound interest but produces wrong results for certain inputs.
Here's the code: [code]

Debug this by:
1. Tracing through the logic step-by-step
2. Identifying where the calculation diverges from the correct formula
3. Showing me the fix and explaining the error

Extended thinking's internal reasoning helps Claude trace execution paths that would otherwise be missed.

Pattern 3: Multi-Domain Problem Solving

Problems spanning multiple disciplines benefit heavily from extended thinking:

We're designing a new onboarding process for a SaaS product.
Current metrics: 40% drop-off rate, mostly from non-technical users.

Consider: UX principles, business goals, technical constraints, user psychology, support costs.

What's the optimal redesign?

The model's reasoning blocks help integrate insights from different domains rather than optimizing for just one.

Optimizing Your Budget Tokens

Budget tokens control the depth of reasoning. Here's how to set them:

Simple reasoning (short arithmetic or lookup-style questions): 1024 tokens, the API minimum, if you enable thinking at all
Moderate complexity (debug this code): 5000-8000 tokens
Very complex (system architecture decisions): 10000-16000 tokens

Monitor your actual usage. Claude won't always use the full budget: the budget is a ceiling, not a target, and you are billed for the thinking tokens actually generated.

Pro tip: If Claude's response seems shallow or misses obvious angles, you likely need more budget tokens. Increase incrementally.
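
"Increase incrementally" can be sketched as a starting budget per tier plus a fixed step. The tier names, the step size of 2,000 tokens, and the helper escalate_budget are illustrative assumptions; the starting values follow the ranges above, with 1,024 used as the floor because it is the documented API minimum.

```python
# Starting points per tier (illustrative, taken from the ranges in this section).
STARTING_BUDGETS = {"simple": 1024, "moderate": 5000, "complex": 10000}


def escalate_budget(current: int, step: int = 2000, ceiling: int = 16000) -> int:
    """Return the next budget to try, never exceeding the ceiling."""
    return min(current + step, ceiling)


# Budgets you would try, in order, for a moderate task whose answers keep
# coming back shallow. Remember that max_tokens must stay above each budget.
schedule = [STARTING_BUDGETS["moderate"]]
while schedule[-1] < 16000:
    schedule.append(escalate_budget(schedule[-1]))

print(schedule)
```

In practice you would stop escalating as soon as the answer quality is acceptable rather than walking the whole schedule.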

What to Watch For

Extended thinking isn't magic, and there are real limitations:

Latency increases. A 10,000-token thinking budget can add tens of seconds to response time, depending on how much of the budget Claude actually uses. That is usually too slow for interactive applications.

Token costs add up. Thinking tokens are billed as output tokens, which are priced higher than input tokens. A 10,000-token thinking block is billed like 10,000 extra output tokens on top of the final answer. Budget accordingly.

Sometimes it overthinks. For straightforward problems, extended thinking can produce verbose reasoning that doesn't improve the actual answer. The solution: use it selectively, not by default.

Visible reasoning isn't a guarantee. You can read the thinking block, but you're still trusting Claude's final response. Extended thinking improves reasoning but doesn't eliminate hallucination risk.
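
The latency and cost points above are easiest to judge with your own numbers. The sketch below is a rough measuring aid under stated assumptions: thinking tokens are assumed to be counted within output tokens and billed at the output rate, the per-million-token prices are placeholders rather than current pricing, and timed and estimate_cost_usd are hypothetical helpers.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run call() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate request cost; output_tokens should include thinking tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Stand-in for a real call such as: lambda: client.messages.create(**kwargs)
result, seconds = timed(lambda: sum(range(1000)))

# Placeholder prices (USD per million tokens); substitute the current rates.
# 2,000 input tokens, 10,000 thinking tokens, 1,500 answer tokens.
cost = estimate_cost_usd(2_000, 10_000 + 1_500, 3.0, 15.0)
print(f"elapsed: {seconds:.4f}s, estimated cost: ${cost:.4f}")
```

Running the same prompt once with thinking and once without, and comparing both numbers, gives a more reliable picture than any rule of thumb.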

Real-World Example

Here's how I'd tackle a genuinely hard problem:

Task: Design a caching strategy for a distributed system handling 100k requests/second.

Prompt:

Design a caching strategy for a distributed system:
- 100k requests/second
- 80/20 rule: 80% of requests hit 20% of keys
- 1GB memory limit per cache node
- 5-second acceptable staleness for most data
- Some data must be fresh within 100ms

Consider: eviction policies, consistency guarantees, cache topology,
refresh strategies, failure modes.

Why extended thinking here:

Multiple competing constraints
Requires systems thinking
No single right answer
Hidden tradeoffs worth exploring

With extended thinking enabled, Claude will reason through cache hit ratios, consistency models, and scaling implications before recommending an architecture.
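
Put together, the request for this example might look like the sketch below. The budget and max_tokens values are illustrative choices from the "very complex" range, split_blocks is a hypothetical helper, and the SimpleNamespace objects only stand in for a real response so the block runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple

CACHING_PROMPT = """Design a caching strategy for a distributed system:
- 100k requests/second
- 80/20 rule: 80% of requests hit 20% of keys
- 1GB memory limit per cache node
- 5-second acceptable staleness for most data
- Some data must be fresh within 100ms

Consider: eviction policies, consistency guarantees, cache topology,
refresh strategies, failure modes."""

request_kwargs = {
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 20000,  # leaves room for the answer after thinking
    "thinking": {"type": "enabled", "budget_tokens": 12000},
    "messages": [{"role": "user", "content": CACHING_PROMPT}],
}


def split_blocks(content: Iterable) -> Tuple[str, str]:
    """Separate thinking text from answer text in a response's content blocks."""
    thinking_parts, text_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # Any other block type is ignored in this sketch.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Stand-in for response.content from client.messages.create(**request_kwargs).
fake_content = [
    SimpleNamespace(type="thinking", thinking="Compare LRU and LFU under skewed access."),
    SimpleNamespace(type="text", text="Use a two-tier cache with short TTLs for hot keys."),
]
reasoning, answer = split_blocks(fake_content)
```

Keeping the reasoning separate from the answer makes it easier to log the thinking for review while showing users only the final recommendation.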

The Bottom Line

Extended thinking is worth using when answer quality matters more to you than speed or cost. It's a tool for problems that genuinely require reasoning, not a universal upgrade.

Start small. Pick one type of problem where your current Claude responses feel insufficient. Enable extended thinking for those cases. Monitor both quality improvements and token costs. You'll quickly find the right balance for your use case.