Compare inference costs across GPT-5, Claude 4.5, Gemini 3, and more.
Now with agentic loops, caching, and reasoning tokens
In 2026, most AI applications use multi-step "agentic loops" rather than single prompts. Calculate costs for workflows like: "5 research steps + 1 summary step = 6 total steps."
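The step arithmetic above can be sketched as a small helper. The prices and per-step token counts below are illustrative assumptions, not quotes from any provider:

```python
# Rough cost estimate for a multi-step agentic workflow.
# Prices and token counts are illustrative assumptions.

def workflow_cost(steps, input_tokens_per_step, output_tokens_per_step,
                  input_price_per_m, output_price_per_m):
    """Total cost in dollars for `steps` sequential LLM calls."""
    input_cost = steps * input_tokens_per_step * input_price_per_m / 1_000_000
    output_cost = steps * output_tokens_per_step * output_price_per_m / 1_000_000
    return input_cost + output_cost

# Example: 5 research steps + 1 summary step = 6 total steps,
# each with 10k input and 1k output tokens at $0.07 / $0.30 per 1M.
cost = workflow_cost(6, 10_000, 1_000, 0.07, 0.30)
print(f"${cost:.4f}")  # 6 * ($0.0007 + $0.0003) = $0.0060
```

Note that in real agentic loops each step usually re-sends the growing conversation, so input tokens per step tend to increase rather than stay flat; treat this as a lower bound.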
Modern APIs (Anthropic, Gemini, OpenAI) offer context caching that reduces input costs by up to 90% for repeated data. Set your cache hit rate to see potential savings.
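Under the assumption that cached tokens are billed at a 90% discount (per the claim above), the effective input cost for a given cache hit rate can be estimated like this; the token count and price are illustrative:

```python
# Effective input cost with context caching, assuming cached tokens
# are billed at a 90% discount off the normal input rate.

def effective_input_cost(input_tokens, cache_hit_rate,
                         input_price_per_m, cached_discount=0.90):
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    cached_price = input_price_per_m * (1 - cached_discount)
    return (fresh * input_price_per_m + cached * cached_price) / 1_000_000

# 100k input tokens at $3.00/1M with an 80% cache hit rate:
print(f"${effective_input_cost(100_000, 0.8, 3.00):.4f}")  # $0.0840 vs $0.3000 uncached
```

Real providers differ in the exact discount and may also charge a fee to write entries into the cache, so check each API's pricing page before relying on these numbers.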
Reasoning models (like OpenAI's o-series) "think" before responding. These "thinking tokens" cost extra money and are often missed by older calculators. Enable reasoning to see the true cost.
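Reasoning tokens are typically billed at the output rate even though they never appear in the response. A sketch of how they change the bill, with an assumed 2:1 thinking-to-visible ratio for illustration:

```python
# Thinking tokens are billed as output tokens, so they inflate cost
# invisibly. The token counts below are illustrative assumptions.

def reasoning_cost(input_tokens, visible_output_tokens, thinking_tokens,
                   input_price_per_m, output_price_per_m):
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000

# o3-style pricing ($10 in / $40 out): 5k input, 1k visible output,
# plus 2k hidden thinking tokens.
print(f"${reasoning_cost(5_000, 1_000, 2_000, 10.00, 40.00):.4f}")  # $0.1700
```

Here the hidden thinking tokens double the output bill relative to what the visible response alone would suggest, which is exactly the gap older calculators miss.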
Enable context caching to save up to 90% on repeated inputs
Visual comparison showing which model performs best for different use cases
| Model | Provider | Input Price | Output Price | Context | Total Cost | vs. Lowest |
|---|---|---|---|---|---|---|
| Gemini 3 Flash | Google | $0.07/1M | $0.30/1M | 1.0M | $0.004350 | Lowest |
| Llama 4 Maverick | Meta | $0.20/1M | $0.60/1M | 1.0M | $0.0112 | +$0.006850 (157%) |
| GPT-5 Mini | OpenAI | $0.25/1M | $2.00/1M | 128.0K | $0.0165 | +$0.0122 (279%) |
| Gemini 3 Pro | Google | $1.25/1M | $10.00/1M | 2.0M | $0.0825 | +$0.0781 (1797%) |
| GPT-5 | OpenAI | $1.75/1M | $14.00/1M | 256.0K | $0.1155 | +$0.1111 (2555%) |
| Claude 4.5 Sonnet | Anthropic | $3.00/1M | $15.00/1M | 200.0K | $0.1800 | +$0.1757 (4038%) |
| Grok-4 | xAI | $3.31/1M | $16.54/1M | 256.0K | $0.1986 | +$0.1942 (4465%) |
| o3 (Reasoning) | OpenAI | $10.00/1M | $40.00/1M | 200.0K | $0.5800 | +$0.5756 (13233%) |
| Claude 4 Opus | Anthropic | $15.00/1M | $75.00/1M | 200.0K | $0.9000 | +$0.8957 (20590%) |
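The ranking in the table can be recomputed for your own token mix. The sketch below uses the per-1M prices from the table; the 50k/10k token counts are illustrative, so the totals will not match the table's Total Cost column, which was computed for one specific workflow configuration:

```python
# Rank models by total cost for a given input/output token mix,
# using the per-1M prices from the table above.

PRICES = {  # (input $/1M, output $/1M)
    "Gemini 3 Flash": (0.07, 0.30),
    "Llama 4 Maverick": (0.20, 0.60),
    "GPT-5 Mini": (0.25, 2.00),
    "Gemini 3 Pro": (1.25, 10.00),
    "GPT-5": (1.75, 14.00),
    "Claude 4.5 Sonnet": (3.00, 15.00),
    "Grok-4": (3.31, 16.54),
    "o3": (10.00, 40.00),
    "Claude 4 Opus": (15.00, 75.00),
}

def rank_by_cost(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted cheapest-first."""
    costs = {
        name: (input_tokens * inp + output_tokens * out) / 1_000_000
        for name, (inp, out) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])

for name, cost in rank_by_cost(50_000, 10_000):
    print(f"{name:18s} ${cost:.4f}")
```

Because output prices spread much wider than input prices, the ordering can shift for output-heavy workloads; rerun the ranking with your actual token ratio.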
Copy-paste code to integrate Gemini 3 Flash into your application
import google.generativeai as genai

# Configure the client with your API key
genai.configure(api_key="your-api-key")

model = genai.GenerativeModel("gemini-3-flash")
response = model.generate_content(
    "Your prompt here",
    generation_config={
        "max_output_tokens": 2000
    }
)
print(response.text)

💡 Replace "your-api-key" with your actual API key. Adjust tokens and parameters as needed.
Compare the cheapest text AI APIs in 2026. Find the most affordable LLM APIs for text generation, analysis, summarization, and language tasks.
Full pricing details, including cached-input rates where offered:

| Model | Provider | Input | Output | Cached Input | Context |
|---|---|---|---|---|---|
| Gemini 3 Flash | Google | $0.07/1M | $0.30/1M | $0.02/1M | 1.0M |
| Llama 4 Maverick | Meta | $0.20/1M | $0.60/1M | — | 1.0M |
| GPT-5 Mini | OpenAI | $0.25/1M | $2.00/1M | $0.06/1M | 128K |
| Gemini 3 Pro | Google | $1.25/1M | $10.00/1M | $0.32/1M | 2.0M |
| GPT-5 | OpenAI | $1.75/1M | $14.00/1M | $0.44/1M | 256K |
| Claude 4.5 Sonnet | Anthropic | $3.00/1M | $15.00/1M | $0.30/1M | 200K |
| Grok-4 | xAI | $3.31/1M | $16.54/1M | — | 256K |
| o3 (Reasoning) | OpenAI | $10.00/1M | $40.00/1M | $2.50/1M | 200K |
| Claude 4 Opus | Anthropic | $15.00/1M | $75.00/1M | $1.50/1M | 200K |
Input pricing ranges from $0.07 per 1M tokens (Gemini 3 Flash) to $15.00 per 1M tokens (Claude 4 Opus) — a difference of more than 200-fold — so choosing the right model can significantly impact your costs.
While input pricing is important, consider output costs, context caching availability, and additional features. Some models offer better value when factoring in cached input pricing, larger context windows, or specialized capabilities that reduce overall token usage.
The cheapest model for high-volume input may not be cheapest for high-output scenarios. Use the calculator above to estimate costs based on your specific input/output token ratios, agentic steps, and caching usage.
Gemini 3 Flash offers the lowest input pricing at $0.07 per 1M tokens. However, the best choice depends on your specific use case, output requirements, and whether you can leverage context caching.
Use context caching when available, optimize your prompts to reduce token usage, implement streaming for faster responses, and consider using cheaper models for high-volume operations while reserving premium models for complex tasks.
Many developers use multiple APIs for different tasks. You might use a cheaper model for high-volume operations and a more capable model for complex reasoning or specialized tasks. The calculator above helps you compare costs across different models.
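The multi-API pattern above can be sketched as a tiny router. The model names and the length-based complexity heuristic are illustrative assumptions, not a recommendation of any specific pairing:

```python
# Route cheap, high-volume tasks to an inexpensive model and
# reasoning-heavy tasks to a premium one. Model names and the
# complexity heuristic are illustrative assumptions.

CHEAP_MODEL = "gemini-3-flash"
PREMIUM_MODEL = "claude-4.5-sonnet"

def pick_model(task: str, needs_reasoning: bool = False) -> str:
    """Naive router: long or reasoning-heavy tasks go to the premium model."""
    if needs_reasoning or len(task) > 2_000:
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(pick_model("Summarize this tweet."))                 # gemini-3-flash
print(pick_model("Prove this theorem step by step", True)) # claude-4.5-sonnet
```

Production routers usually classify tasks with a lightweight model or heuristic rather than raw prompt length, but the cost logic is the same: keep the premium model for the small fraction of traffic that needs it.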