
Overview

Some AI models support a "Thinking" (reasoning) feature: the model reasons internally before generating its final answer, and that thinking process is returned in a model-specific response field.

Models with Thinking Support

Model                        Thinking Field       Description
o1                           reasoning_content    OpenAI reasoning model
o1-mini                      reasoning_content    Lightweight reasoning model
o3                           reasoning_content    Latest reasoning model
o3-mini                      reasoning_content    Lightweight version
o4-mini                      reasoning_content    Latest lightweight reasoning
claude-sonnet-4-20250514     thinking             Claude extended thinking
claude-opus-4-20250514       thinking             Claude extended thinking
gemini-2.5-pro               thoughts             Gemini thinking
gemini-2.5-flash-thinking    thoughts             Gemini thinking
deepseek-r1                  reasoning_content    DeepSeek reasoning
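The mapping above can be captured in a small lookup helper. This is only a sketch that restates the table; the prefix-matching logic (so dated variants like claude-sonnet-4-20250514 resolve correctly) is an assumption, not part of the gateway's API:

```python
# Map model-name prefixes to the response field carrying the thinking text.
# Mirrors the table above; prefix matching for dated variants is an assumption.
THINKING_FIELDS = {
    "o1": "reasoning_content",
    "o3": "reasoning_content",
    "o4-mini": "reasoning_content",
    "deepseek-r1": "reasoning_content",
    "claude-sonnet-4": "thinking",
    "claude-opus-4": "thinking",
    "gemini-2.5": "thoughts",
}

def thinking_field(model: str):
    """Return the thinking field name for a model, or None if unsupported."""
    for prefix, field in THINKING_FIELDS.items():
        if model.startswith(prefix):
            return field
    return None
```

A helper like this lets client code check which field to read before inspecting a response.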

Response Format

OpenAI Format (o1/o3/DeepSeek)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final answer content",
      "reasoning_content": "Model's thinking process..."
    }
  }],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 500,
    "total_tokens": 600,
    "completion_tokens_details": {
      "reasoning_tokens": 350
    }
  }
}

Thinking in Streaming Output

// Thinking phase
{"choices": [{"delta": {"reasoning_content": "Let me think about this..."}}]}
{"choices": [{"delta": {"reasoning_content": "First, let's analyze..."}}]}

// Answer phase
{"choices": [{"delta": {"content": "The final answer is..."}}]}
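A minimal sketch of consuming such a stream, operating on plain dicts shaped like the JSON chunks above (the real SDK exposes the same fields as object attributes, so adapt accordingly):

```python
def split_stream(chunks):
    """Accumulate streamed deltas into (thinking, answer) strings.

    `chunks` are plain dicts mirroring the streaming JSON above; thinking
    deltas arrive in reasoning_content, answer deltas in content.
    """
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            thinking.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)
```

Separating the two phases this way lets a UI render the thinking text (e.g. collapsed or dimmed) independently of the final answer.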

Claude Extended Thinking

When using Claude models, enable extended thinking via the thinking parameter:
from openai import OpenAI

client = OpenAI(api_key="sk-xxx", base_url="https://crazyrouter.com/v1")

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational"}],
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000
        }
    }
)

# Thinking content is in the reasoning_content field
print("Thinking:", response.choices[0].message.reasoning_content)
print("Answer:", response.choices[0].message.content)

Thinking Token Billing

Thinking tokens count toward billed usage. For reasoning models, thinking tokens typically account for 50%-80% of output tokens, so budget accordingly.
Model                       Thinking Token Billing
o1/o3 series                Billed at output token price
Claude extended thinking    Billed at output token price
DeepSeek-R1                 Billed at output token price
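Because thinking tokens are billed at the output price, estimating their cost share is simple arithmetic. A sketch using the usage shape shown earlier (the per-1k price argument is a placeholder, not real pricing):

```python
def output_cost_breakdown(usage: dict, output_price_per_1k: float) -> dict:
    """Split output cost into reasoning vs. answer shares.

    completion_tokens already includes reasoning_tokens, so reasoning is a
    share of the output bill, not an extra charge on top of it.
    """
    completion = usage["completion_tokens"]
    reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    per_token = output_price_per_1k / 1000
    return {
        "reasoning_cost": reasoning * per_token,
        "answer_cost": (completion - reasoning) * per_token,
        "total_output_cost": completion * per_token,
        "reasoning_share": reasoning / completion if completion else 0.0,
    }
```

With the example usage from the Response Format section (500 completion tokens, 350 of them reasoning), reasoning accounts for 70% of the output bill.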

Code Example: Extracting Thinking Content

response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}]
)

msg = response.choices[0].message

# Thinking process
if hasattr(msg, 'reasoning_content') and msg.reasoning_content:
    print(f"Thinking process:\n{msg.reasoning_content}\n")

# Final answer
print(f"Answer:\n{msg.content}")

# Token usage
usage = response.usage
print(f"\nTotal tokens: {usage.total_tokens}")
if hasattr(usage, 'completion_tokens_details'):
    print(f"Reasoning tokens: {usage.completion_tokens_details.reasoning_tokens}")

Not all models support thinking. For models that don't support it, the reasoning_content field will not appear in the response.