
Chain-of-Thought Prompting: Get Better Answers from Any LLM

Chain-of-thought prompting makes LLMs reason step by step before answering. Here's when to use it and the patterns that work best.

📅 January 26, 2026 · ✍ TechTwitter.io · Tags: prompt-engineering, chain-of-thought, llm, reasoning

## What Is Chain-of-Thought?

Standard prompting: question → answer.

Chain-of-thought (CoT): question → reasoning steps → answer.

By asking the model to reason before answering, you dramatically improve accuracy on tasks that require multi-step logic, math, or inference.

Without CoT:

Q: A store sells apples for $0.50 each and bananas for $0.30 each.
   If I buy 3 apples and 5 bananas, how much do I spend?

A: $2.50  ← wrong

With CoT:

Q: ... (same question) ... Think step by step.

A: Let me work through this:
   - 3 apples × $0.50 = $1.50
   - 5 bananas × $0.30 = $1.50
   - Total: $1.50 + $1.50 = $3.00

The answer is $3.00.  ← correct

The reasoning forces the model to actually compute rather than pattern-match to an answer.
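The chain's arithmetic is also trivially checkable outside the model:

```python
# The model's step-by-step arithmetic, reproduced exactly
total = 3 * 0.50 + 5 * 0.30
assert abs(total - 3.00) < 1e-9  # the CoT answer, not the pattern-matched $2.50
```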


## When CoT Helps Most

| Task Type | CoT Impact |
| --- | --- |
| Multi-step math | Very high |
| Logic and reasoning puzzles | Very high |
| Code debugging | High |
| Complex instruction following | Medium |
| Factual recall | Low |
| Creative writing | None |

CoT helps when the correct answer requires intermediate reasoning steps. It doesn't help (and can slow you down) for tasks where you just need recall.


## The Three CoT Patterns

### 1. Zero-Shot CoT ("Think step by step")

Just add "Let's think step by step" or "Think step by step before answering":

Problem: A Python function is returning None instead of the expected list.
The function uses a generator. What's likely wrong?

Think step by step, then give your answer.

Simple, works well for most reasoning tasks with frontier models.
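In code, zero-shot CoT is just string concatenation. A minimal sketch (the `with_cot` helper name is our own) that appends the trigger to any question before sending it to a model:

```python
def with_cot(question: str) -> str:
    """Append the zero-shot CoT trigger phrase to any question."""
    return f"{question}\n\nThink step by step, then give your answer."

prompt = with_cot(
    "A Python function is returning None instead of the expected list. "
    "The function uses a generator. What's likely wrong?"
)
```

Send `prompt` as the user message to whichever model you're using.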

### 2. Few-Shot CoT (Examples with Reasoning)

Show the model examples of correct reasoning, then let it follow the pattern:

Classify the sentiment of the following reviews.
Show your reasoning, then give a label: POSITIVE, NEGATIVE, or NEUTRAL.

Review: "The product works as described but took 3 weeks to arrive."
Reasoning: The product functionality is positive (works as described) but
the shipping time is negative (3 weeks is long). Mixed signals with one
notable frustration. Overall: slightly negative.
Label: NEGATIVE

Review: "Absolutely fantastic quality, exceeded my expectations!"
Reasoning: Strong positive language ("absolutely fantastic," "exceeded expectations").
No negatives mentioned.
Label: POSITIVE

Review: "The coffee maker makes decent coffee."
Reasoning: [Your reasoning here]
Label: [POSITIVE / NEGATIVE / NEUTRAL]
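A few-shot CoT prompt like the one above can be assembled programmatically. A sketch, with the reasoning strings condensed from the example (the `few_shot_prompt` helper is our own):

```python
# (review, reasoning, label) triples taken from the example above
EXAMPLES = [
    ("The product works as described but took 3 weeks to arrive.",
     "Functionality is positive but shipping is slow. Overall: slightly negative.",
     "NEGATIVE"),
    ("Absolutely fantastic quality, exceeded my expectations!",
     "Strong positive language, no negatives mentioned.",
     "POSITIVE"),
]

def few_shot_prompt(review: str) -> str:
    """Build a few-shot CoT prompt ending at the new review's Reasoning slot."""
    header = ("Classify the sentiment of the following reviews.\n"
              "Show your reasoning, then give a label: POSITIVE, NEGATIVE, or NEUTRAL.\n")
    shots = "\n".join(
        f'\nReview: "{r}"\nReasoning: {reasoning}\nLabel: {label}'
        for r, reasoning, label in EXAMPLES
    )
    return f'{header}{shots}\n\nReview: "{review}"\nReasoning:'
```

Ending the prompt at `Reasoning:` nudges the model to produce its reasoning before the label, matching the pattern in the examples.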

### 3. Self-Consistency (Multiple Reasoning Paths)

For high-stakes decisions, generate multiple reasoning paths and take the most common answer:

```python
from collections import Counter

import anthropic

client = anthropic.Anthropic()

def self_consistent_answer(question: str, n: int = 5) -> str:
    answers = []
    for _ in range(n):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nThink through this carefully, step by step."
            }]
        )
        # Extract the final answer (implementation depends on your answer format)
        answers.append(extract_answer(response.content[0].text))

    # Majority vote across the n independent reasoning paths
    return Counter(answers).most_common(1)[0][0]
```

More expensive (N API calls), but significantly more reliable for complex reasoning.
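The `extract_answer` call above is left unimplemented because it depends on your answer format. One hedged sketch, assuming responses end with a line like "The answer is $3.00.":

```python
import re

def extract_answer(text: str) -> str:
    """Pull the final answer out of a CoT response.

    Assumes the reply ends with a line like 'The answer is $3.00.'
    Falls back to the last non-empty line otherwise.
    """
    match = re.search(r"[Tt]he answer is\s*(.+?)\.?\s*$", text.strip())
    if match:
        return match.group(1).strip()
    lines = [line for line in text.strip().splitlines() if line.strip()]
    return lines[-1].strip() if lines else ""
```

A more robust alternative is to ask the model for a fixed output format (e.g. a final line `ANSWER: ...`) so extraction is unambiguous.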


## CoT for Code Review

Review this Python function for correctness.
Think through each step: what the code does, what it's supposed to do,
and any cases where these might differ.

```python
def find_peak(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2
        if arr[mid] > arr[mid + 1]:
            right = mid
        else:
            left = mid + 1
    return left
```

Does this correctly find a peak element? Walk through your reasoning.


By asking the model to trace through the code, it's much more likely to catch edge cases and logical errors.
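Before trusting the model's verdict, it helps to have ground truth. A quick local check of `find_peak` against a strict-peak definition (test arrays avoid adjacent duplicates, since binary-search peak finding assumes neighbors differ):

```python
def find_peak(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2
        if arr[mid] > arr[mid + 1]:
            right = mid
        else:
            left = mid + 1
    return left

def is_peak(arr, i):
    # A peak is strictly greater than both neighbors;
    # out-of-bounds neighbors count as negative infinity.
    left_ok = i == 0 or arr[i] > arr[i - 1]
    right_ok = i == len(arr) - 1 or arr[i] > arr[i + 1]
    return left_ok and right_ok

for arr in ([1, 2, 3, 1], [5, 4, 3], [1, 2], [7]):
    assert is_peak(arr, find_peak(arr))
```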

---

## Structured CoT for Complex Problems

For very complex problems, provide a structure for the reasoning:

You're debugging a performance issue in a web application.

Problem: The dashboard page takes 8 seconds to load. It was fast last week.

Use this framework to analyze the problem:

  1. What changed: What might have changed in the codebase or environment?
  2. Possible causes: List 3-5 potential causes in order of likelihood.
  3. Diagnosis steps: For each cause, how would you verify or rule it out?
  4. Recommended first step: Based on your analysis, where would you start?

Structured CoT is especially useful for engineering decisions where you want systematic reasoning, not just an answer.
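A framework like this is easy to template so it can be reused across problems. A hypothetical helper (`structured_prompt` and `DEBUG_FRAMEWORK` are our own names):

```python
DEBUG_FRAMEWORK = """Use this framework to analyze the problem:

1. What changed: What might have changed in the codebase or environment?
2. Possible causes: List 3-5 potential causes in order of likelihood.
3. Diagnosis steps: For each cause, how would you verify or rule it out?
4. Recommended first step: Based on your analysis, where would you start?"""

def structured_prompt(problem: str) -> str:
    """Wrap a problem statement in the structured-CoT debugging framework."""
    return f"Problem: {problem}\n\n{DEBUG_FRAMEWORK}"
```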

---

## Implementation in Prompts

```python
import anthropic

client = anthropic.Anthropic()

# System prompt encouraging CoT
SYSTEM = """You are a code assistant.
When given a technical problem:
1. First identify what the problem is asking
2. Break down the approach into steps
3. Work through each step
4. State your final answer clearly

Show your reasoning before giving the answer."""

# User message (user_input comes from elsewhere in your application)
user_message = "What's wrong with this SQL query: SELECT * FROM users WHERE id = " + user_input

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=SYSTEM,
    messages=[{"role": "user", "content": user_message}],
)
```

## Key Takeaways

  • CoT prompting asks the model to reason before answering, improving accuracy on multi-step tasks
  • "Think step by step" (zero-shot CoT) works well for most reasoning tasks
  • Few-shot CoT (examples with reasoning) is most reliable for consistent formatting
  • Self-consistency (multiple CoT samples plus a majority vote) is best for high-stakes decisions
  • CoT helps most for math, logic, debugging, and complex inference, not for factual recall
  • Provide a reasoning structure for very complex problems to guide systematic thinking