Compare inference costs across GPT-5, Claude 4.5, Gemini 3, and more.
Now with agentic loops, caching, and reasoning tokens
In 2026, most AI applications use multi-step "agentic loops" rather than single prompts. Calculate costs for workflows like: "5 research steps + 1 summary step = 6 total steps."
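The step arithmetic above can be sketched as a small helper. The prices and per-step token counts below are illustrative assumptions, not quotes from any provider:

```python
# Rough cost estimate for a multi-step agentic workflow.
# Prices and token counts are illustrative assumptions.

def workflow_cost(steps, input_tokens_per_step, output_tokens_per_step,
                  input_price_per_m, output_price_per_m):
    """Total cost in dollars for `steps` sequential LLM calls."""
    input_cost = steps * input_tokens_per_step * input_price_per_m / 1_000_000
    output_cost = steps * output_tokens_per_step * output_price_per_m / 1_000_000
    return input_cost + output_cost

# Example: 5 research steps + 1 summary step = 6 total steps,
# each with 10k input and 1k output tokens at $0.07 / $0.30 per 1M.
cost = workflow_cost(6, 10_000, 1_000, 0.07, 0.30)
print(f"${cost:.4f}")  # 6 * ($0.0007 + $0.0003) = $0.0060
```

Note that in real agentic loops each step usually re-sends the growing conversation, so input tokens per step tend to increase rather than stay flat; treat this as a lower bound.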
Modern APIs (Anthropic, Gemini, OpenAI) offer context caching that reduces input costs by up to 90% for repeated data. Set your cache hit rate to see potential savings.
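Under the assumption that cached tokens are billed at a 90% discount (per the claim above), the effective input cost for a given cache hit rate can be estimated like this; the token count and price are illustrative:

```python
# Effective input cost with context caching, assuming cached tokens
# are billed at a 90% discount off the normal input rate.

def effective_input_cost(input_tokens, cache_hit_rate,
                         input_price_per_m, cached_discount=0.90):
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    cached_price = input_price_per_m * (1 - cached_discount)
    return (fresh * input_price_per_m + cached * cached_price) / 1_000_000

# 100k input tokens at $3.00/1M with an 80% cache hit rate:
print(f"${effective_input_cost(100_000, 0.8, 3.00):.4f}")  # $0.0840 vs $0.3000 uncached
```

Real providers differ in the exact discount and may also charge a fee to write entries into the cache, so check each API's pricing page before relying on these numbers.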
Reasoning models (like OpenAI's o-series) "think" before responding. These "thinking tokens" cost extra money and are often missed by older calculators. Enable reasoning to see the true cost.
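Reasoning tokens are typically billed at the output rate even though they never appear in the response. A sketch of how they change the bill, with an assumed 2:1 thinking-to-visible ratio for illustration:

```python
# Thinking tokens are billed as output tokens, so they inflate cost
# invisibly. The token counts below are illustrative assumptions.

def reasoning_cost(input_tokens, visible_output_tokens, thinking_tokens,
                   input_price_per_m, output_price_per_m):
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000

# o3-style pricing ($10 in / $40 out): 5k input, 1k visible output,
# plus 2k hidden thinking tokens.
print(f"${reasoning_cost(5_000, 1_000, 2_000, 10.00, 40.00):.4f}")  # $0.1700
```

Here the hidden thinking tokens double the output bill relative to what the visible response alone would suggest, which is exactly the gap older calculators miss.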
Enable context caching to save up to 90% on repeated inputs
Visual comparison showing which model performs best for different use cases
| Model | Provider | Input Price | Output Price | Context | Total Cost | vs. Lowest |
|---|---|---|---|---|---|---|
| Gemini 3 Flash | Google | $0.07/1M | $0.30/1M | 1.0M | $0.004350 | Lowest |
| Llama 4 Maverick | Meta | $0.20/1M | $0.60/1M | 1.0M | $0.0112 | +$0.006850 (157%) |
| GPT-5 Mini | OpenAI | $0.25/1M | $2.00/1M | 128.0K | $0.0165 | +$0.0122 (279%) |
| Gemini 3 Pro | Google | $1.25/1M | $10.00/1M | 2.0M | $0.0825 | +$0.0781 (1797%) |
| GPT-5 | OpenAI | $1.75/1M | $14.00/1M | 256.0K | $0.1155 | +$0.1111 (2555%) |
| Claude 4.5 Sonnet | Anthropic | $3.00/1M | $15.00/1M | 200.0K | $0.1800 | +$0.1757 (4038%) |
| Grok-4 | xAI | $3.31/1M | $16.54/1M | 256.0K | $0.1986 | +$0.1942 (4465%) |
| o3 (Reasoning) | OpenAI | $10.00/1M | $40.00/1M | 200.0K | $0.5800 | +$0.5756 (13233%) |
| Claude 4 Opus | Anthropic | $15.00/1M | $75.00/1M | 200.0K | $0.9000 | +$0.8957 (20590%) |
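The ranking in the table can be recomputed for your own token mix. The sketch below uses the per-1M prices from the table; the 50k/10k token counts are illustrative, so the totals will not match the table's Total Cost column, which was computed for one specific workflow configuration:

```python
# Rank models by total cost for a given input/output token mix,
# using the per-1M prices from the table above.

PRICES = {  # (input $/1M, output $/1M)
    "Gemini 3 Flash": (0.07, 0.30),
    "Llama 4 Maverick": (0.20, 0.60),
    "GPT-5 Mini": (0.25, 2.00),
    "Gemini 3 Pro": (1.25, 10.00),
    "GPT-5": (1.75, 14.00),
    "Claude 4.5 Sonnet": (3.00, 15.00),
    "Grok-4": (3.31, 16.54),
    "o3": (10.00, 40.00),
    "Claude 4 Opus": (15.00, 75.00),
}

def rank_by_cost(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted cheapest-first."""
    costs = {
        name: (input_tokens * inp + output_tokens * out) / 1_000_000
        for name, (inp, out) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])

for name, cost in rank_by_cost(50_000, 10_000):
    print(f"{name:18s} ${cost:.4f}")
```

Because output prices spread much wider than input prices, the ordering can shift for output-heavy workloads; rerun the ranking with your actual token ratio.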
Copy-paste code to integrate Gemini 3 Flash into your application
import google.generativeai as genai

# Configure the client with your API key
genai.configure(api_key="your-api-key")

model = genai.GenerativeModel("gemini-3-flash")
response = model.generate_content(
    "Your prompt here",
    generation_config={
        "max_output_tokens": 2000
    }
)
print(response.text)

💡 Replace "your-api-key" with your actual API key. Adjust tokens and parameters as needed.
Compare the cheapest text AI APIs in 2026. Find the most affordable LLM APIs for text generation, analysis, summarization, and language tasks.
Full pricing details, including cached-input rates where offered:

| Model | Provider | Input | Output | Cached Input | Context |
|---|---|---|---|---|---|
| Gemini 3 Flash | Google | $0.07/1M | $0.30/1M | $0.02/1M | 1.0M |
| Llama 4 Maverick | Meta | $0.20/1M | $0.60/1M | — | 1.0M |
| GPT-5 Mini | OpenAI | $0.25/1M | $2.00/1M | $0.06/1M | 128K |
| Gemini 3 Pro | Google | $1.25/1M | $10.00/1M | $0.32/1M | 2.0M |
| GPT-5 | OpenAI | $1.75/1M | $14.00/1M | $0.44/1M | 256K |
| Claude 4.5 Sonnet | Anthropic | $3.00/1M | $15.00/1M | $0.30/1M | 200K |
| Grok-4 | xAI | $3.31/1M | $16.54/1M | — | 256K |
| o3 (Reasoning) | OpenAI | $10.00/1M | $40.00/1M | $2.50/1M | 200K |
| Claude 4 Opus | Anthropic | $15.00/1M | $75.00/1M | $1.50/1M | 200K |
Input pricing ranges from $0.07 per 1M tokens (Gemini 3 Flash) to $15.00 per 1M tokens (Claude 4 Opus) — a difference of more than 200-fold — so choosing the right model can significantly impact your costs.
While input pricing is important, consider output costs, context caching availability, and additional features. Some models offer better value when factoring in cached input pricing, larger context windows, or specialized capabilities that reduce overall token usage.
The cheapest model for high-volume input may not be cheapest for high-output scenarios. Use the calculator above to estimate costs based on your specific input/output token ratios, agentic steps, and caching usage.
Gemini 3 Flash offers the lowest input pricing at $0.07 per 1M tokens. However, the best choice depends on your specific use case, output requirements, and whether you can leverage context caching.
Use context caching when available, optimize your prompts to reduce token usage, implement streaming for faster responses, and consider using cheaper models for high-volume operations while reserving premium models for complex tasks.
Many developers use multiple APIs for different tasks. You might use a cheaper model for high-volume operations and a more capable model for complex reasoning or specialized tasks. The calculator above helps you compare costs across different models.
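The multi-API pattern above can be sketched as a tiny router. The model names and the length-based complexity heuristic are illustrative assumptions, not a recommendation of any specific pairing:

```python
# Route cheap, high-volume tasks to an inexpensive model and
# reasoning-heavy tasks to a premium one. Model names and the
# complexity heuristic are illustrative assumptions.

CHEAP_MODEL = "gemini-3-flash"
PREMIUM_MODEL = "claude-4.5-sonnet"

def pick_model(task: str, needs_reasoning: bool = False) -> str:
    """Naive router: long or reasoning-heavy tasks go to the premium model."""
    if needs_reasoning or len(task) > 2_000:
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(pick_model("Summarize this tweet."))                 # gemini-3-flash
print(pick_model("Prove this theorem step by step", True)) # claude-4.5-sonnet
```

Production routers usually classify tasks with a lightweight model or heuristic rather than raw prompt length, but the cost logic is the same: keep the premium model for the small fraction of traffic that needs it.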