## What Is Chain-of-Thought?
Standard prompting: question → answer.
Chain-of-thought (CoT): question → reasoning steps → answer.
By asking the model to reason before answering, you dramatically improve accuracy on tasks that require multi-step logic, math, or inference.
Without CoT:

```
Q: A store sells apples for $0.50 each and bananas for $0.30 each.
If I buy 3 apples and 5 bananas, how much do I spend?

A: $2.50  ✗ wrong
```

With CoT:

```
Q: ... (same question) ... Think step by step.

A: Let me work through this:
- 3 apples × $0.50 = $1.50
- 5 bananas × $0.30 = $1.50
- Total: $1.50 + $1.50 = $3.00

The answer is $3.00.  ✓ correct
```
The reasoning forces the model to actually compute rather than pattern-match to an answer.
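The arithmetic in the worked example is easy to check directly; a quick sanity check in Python, using integer cents to sidestep float rounding:

```python
# Verify the worked example: 3 apples at $0.50, 5 bananas at $0.30.
apples_cents = 3 * 50
bananas_cents = 5 * 30
total_cents = apples_cents + bananas_cents

print(f"${total_cents / 100:.2f}")  # → $3.00
```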
## When CoT Helps Most
| Task Type | CoT Impact |
|---|---|
| Multi-step math | Very high |
| Logic and reasoning puzzles | Very high |
| Code debugging | High |
| Complex instruction following | Medium |
| Factual recall | Low |
| Creative writing | None |
CoT helps when the correct answer requires intermediate reasoning steps. It doesn't help (and can slow you down) for tasks where you just need recall.
## The Three CoT Patterns
### 1. Zero-Shot CoT ("Think step by step")
Just add "Let's think step by step" or "Think step by step before answering":

```
Problem: A Python function is returning None instead of the expected list.
The function uses a generator. What's likely wrong?

Think step by step, then give your answer.
```
This is the simplest pattern, and it works well for most reasoning tasks with frontier models.
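A zero-shot CoT prompt is just the task plus a trigger phrase, so it is easy to apply uniformly with a small helper (a sketch; the function name `with_cot` is illustrative):

```python
def with_cot(problem: str) -> str:
    """Append a zero-shot chain-of-thought trigger to a prompt."""
    return f"{problem}\n\nThink step by step, then give your answer."

prompt = with_cot(
    "A Python function is returning None instead of the expected list.\n"
    "The function uses a generator. What's likely wrong?"
)
```

The returned string is what you would send as the user message.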
### 2. Few-Shot CoT (Examples with Reasoning)
Show the model examples of correct reasoning, then let it follow the pattern:
```
Classify the sentiment of the following reviews.
Show your reasoning, then give a label: POSITIVE, NEGATIVE, or NEUTRAL.

Review: "The product works as described but took 3 weeks to arrive."
Reasoning: The product functionality is positive (works as described) but
the shipping time is negative (3 weeks is long). Mixed signals with one
notable frustration. Overall: slightly negative.
Label: NEGATIVE

Review: "Absolutely fantastic quality, exceeded my expectations!"
Reasoning: Strong positive language ("absolutely fantastic," "exceeded expectations").
No negatives mentioned.
Label: POSITIVE

Review: "The coffee maker makes decent coffee."
Reasoning: [Your reasoning here]
Label: [POSITIVE / NEGATIVE / NEUTRAL]
```
### 3. Self-Consistency (Multiple Reasoning Paths)
For high-stakes decisions, generate multiple reasoning paths and take the most common answer:
```python
import anthropic
from collections import Counter

client = anthropic.Anthropic()

def self_consistent_answer(question: str, n: int = 5) -> str:
    answers = []
    for _ in range(n):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nThink through this carefully, step by step."
            }]
        )
        # Extract final answer (implementation depends on format)
        answers.append(extract_answer(response.content[0].text))
    # Return most common answer
    return Counter(answers).most_common(1)[0][0]
```
More expensive (N API calls), but significantly more reliable for complex reasoning.
## CoT for Code Review
Review this Python function for correctness.
Think through each step: what the code does, what it's supposed to do,
and any cases where these might differ.
```python
def find_peak(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2
        if arr[mid] > arr[mid + 1]:
            right = mid
        else:
            left = mid + 1
    return left
```

Does this correctly find a peak element? Walk through your reasoning.
By asking the model to trace through the code, it's much more likely to catch edge cases and logical errors.
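To see what a correct trace looks like, the function from the prompt can be run directly. Note the implicit assumptions a good review should surface: the array must be non-empty, and `arr[mid + 1]` is safe only because `mid < right` inside the loop.

```python
def find_peak(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2
        if arr[mid] > arr[mid + 1]:  # descending here: a peak is at mid or to the left
            right = mid
        else:                        # ascending here: a peak is strictly to the right
            left = mid + 1
    return left

print(find_peak([1, 3, 5, 4, 2]))  # → 2 (arr[2] == 5 is a peak)
print(find_peak([1, 2, 3]))        # → 2 (the last element is a peak)
```

Tracing `[1, 3, 5, 4, 2]` by hand: mid=2, 5 > 4 so right=2; then mid=1, 3 < 5 so left=2; left == right, return 2. This is exactly the step-by-step walk the prompt asks the model to perform.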
---
## Structured CoT for Complex Problems
For very complex problems, provide a structure for the reasoning:
```
You're debugging a performance issue in a web application.

Problem: The dashboard page takes 8 seconds to load. It was fast last week.

Use this framework to analyze the problem:
- What changed: What might have changed in the codebase or environment?
- Possible causes: List 3-5 potential causes in order of likelihood.
- Diagnosis steps: For each cause, how would you verify or rule it out?
- Recommended first step: Based on your analysis, where would you start?
```
Structured CoT is especially useful for engineering decisions where you want systematic reasoning, not just an answer.
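The framework can be assembled programmatically so every debugging prompt gets the same structure (a sketch; the constant and function names are illustrative):

```python
ANALYSIS_FRAMEWORK = """Use this framework to analyze the problem:
- What changed: What might have changed in the codebase or environment?
- Possible causes: List 3-5 potential causes in order of likelihood.
- Diagnosis steps: For each cause, how would you verify or rule it out?
- Recommended first step: Based on your analysis, where would you start?"""

def structured_debug_prompt(problem: str) -> str:
    """Wrap a problem description in the structured-CoT framework."""
    return f"Problem: {problem}\n\n{ANALYSIS_FRAMEWORK}"

prompt = structured_debug_prompt(
    "The dashboard page takes 8 seconds to load. It was fast last week."
)
```

Keeping the framework in one constant means every incident gets analyzed under the same headings, which makes model outputs easy to compare across problems.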
---
## Implementation in Prompts
```python
# System prompt encouraging CoT
SYSTEM = """You are a code assistant.
When given a technical problem:
1. First identify what the problem is asking
2. Break down the approach into steps
3. Work through each step
4. State your final answer clearly
Show your reasoning before giving the answer."""
# User message
user_message = "What's wrong with this SQL query: SELECT * FROM users WHERE id = " + user_input
```
## Key Takeaways
- CoT prompting asks the model to reason before answering, improving accuracy on multi-step tasks
- "Think step by step" (zero-shot CoT) works well for most reasoning tasks
- Few-shot CoT (examples that include their reasoning) is the most reliable way to get consistent formatting
- Self-consistency (multiple CoT samples plus a majority vote) is worth the extra cost for high-stakes decisions
- CoT helps most for math, logic, debugging, and complex inference; it does little for factual recall
- For very complex problems, provide a reasoning structure to guide systematic thinking