AI API Cost Calculator

Compare inference costs across GPT-5, Claude 4.5, Gemini 3, and more.
Now with agentic loops, caching, and reasoning tokens

Last Updated: February 6, 2026

Configure Your Workload


Agentic Loop Configuration

In 2026, most AI applications use multi-step "agentic loops" rather than single prompts. Calculate costs for workflows like: "5 research steps + 1 summary step = 6 total steps."

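As a rough sketch of the arithmetic, per-step token counts can be multiplied out across the loop. The helper name below is my own; the 50k-input/2k-output workload and GPT-5 rates are taken from this page's comparison tables, and the sketch assumes each step sends roughly the same number of tokens:

```python
def agentic_cost(steps, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Total dollar cost for a multi-step agentic workflow.

    Assumes every step uses the same input/output token counts;
    prices are per 1M tokens.
    """
    per_step = (input_tokens * input_price_per_m +
                output_tokens * output_price_per_m) / 1_000_000
    return steps * per_step

# "5 research steps + 1 summary step = 6 total steps" at GPT-5 rates
# ($1.75/1M input, $14.00/1M output), 50k input and 2k output per step
print(agentic_cost(6, 50_000, 2_000, 1.75, 14.00))
```

A single step at these rates costs $0.1155, so the six-step workflow costs six times that.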

Context Caching (Save up to 90%)

Modern APIs (Anthropic, Gemini, OpenAI) offer context caching that reduces input costs by up to 90% for repeated data. Set your cache hit rate to see potential savings.

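The effective input rate is just a blend of the full and cached prices weighted by the hit rate. A minimal sketch (helper name is mine; the GPT-5 rates come from the pricing table on this page):

```python
def cached_input_cost(input_tokens, hit_rate, full_price_per_m, cached_price_per_m):
    """Input cost in dollars when a fraction of tokens is served from cache.

    hit_rate is the fraction of input tokens that hit the cache (0.0 to 1.0);
    prices are per 1M tokens.
    """
    fresh = input_tokens * (1 - hit_rate) * full_price_per_m
    cached = input_tokens * hit_rate * cached_price_per_m
    return (fresh + cached) / 1_000_000

# 50k input tokens on GPT-5 ($1.75/1M full, $0.44/1M cached), 90% hit rate
print(cached_input_cost(50_000, 0.9, 1.75, 0.44))
```

At a 90% hit rate this drops the input cost from $0.0875 to about $0.0286, roughly a two-thirds saving at GPT-5's cached rate.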

Reasoning Tokens

Reasoning models (such as OpenAI's o-series) "think" before responding. These thinking tokens are billed in addition to the visible output and are often missed by older calculators. Enable reasoning to see the true cost.
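Reasoning tokens are typically billed at the output-token rate, so they can dominate cost even when the visible answer is short. A small illustrative sketch (helper name and the 10k reasoning-token figure are assumptions; the $14/1M rate is GPT-5's output price from the table below):

```python
def total_output_cost(visible_tokens, reasoning_tokens, output_price_per_m):
    """Output-side cost assuming reasoning tokens bill at the output rate."""
    return (visible_tokens + reasoning_tokens) * output_price_per_m / 1_000_000

# 2k visible tokens plus a hypothetical 10k reasoning tokens at $14.00/1M
print(total_output_cost(2_000, 10_000, 14.00))
```

Here the hidden reasoning tokens account for five-sixths of the output bill.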

Cost Comparison

GPT-5 (OpenAI)
Total cost: $0.1155
Input tokens: $0.0875 / Output tokens: $0.0280

Gemini 3 Pro (Google)
Total cost: $0.0825
Input tokens: $0.0625 / Output tokens: $0.0200

Gemini 3 Flash (Google) - Lowest
Total cost: $0.004350
Input tokens: $0.003750 / Output tokens: $0.000600

Grok-4 (xAI)
Total cost: $0.1986
Input tokens: $0.1655 / Output tokens: $0.0331

Llama 4 Maverick (Meta)
Total cost: $0.0112
Input tokens: $0.0100 / Output tokens: $0.001200

Context Caching Savings

Enable context caching to save up to 90% on repeated inputs

Comparison Heat Map

Visual comparison showing which model ranks best for each use case, from #1 (best) to #5 (not recommended).

Use cases:
- Low Latency: fast response times, minimal processing
- High Intelligence: complex reasoning, large context, advanced capabilities
- Cost Optimized: lowest total cost with caching
- High Volume: best for bulk processing, high throughput

Model                     Low Latency   High Intelligence   Cost Optimized   High Volume
GPT-5 (OpenAI)            #3            #5                  #4               #4
Gemini 3 Pro (Google)     #5            #1                  #3               #3
Gemini 3 Flash (Google)   #1            #4                  #1               #1
Grok-4 (xAI)              #4            #2                  #5               #5
Llama 4 Maverick (Meta)   #2            #3                  #2               #2

Detailed Breakdown

Model              Provider   Input Price   Output Price   Context   Total Cost   vs. Lowest
Gemini 3 Flash     Google     $0.07/1M      $0.30/1M       1.0M      $0.004350    Lowest
Llama 4 Maverick   Meta       $0.20/1M      $0.60/1M       1.0M      $0.0112      +$0.006850 (157%)
Gemini 3 Pro       Google     $1.25/1M      $10.00/1M      2.0M      $0.0825      +$0.0781 (1797%)
GPT-5              OpenAI     $1.75/1M      $14.00/1M      256.0K    $0.1155      +$0.1111 (2555%)
Grok-4             xAI        $3.31/1M      $16.54/1M      256.0K    $0.1986      +$0.1942 (4465%)
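The "vs. Lowest" column above can be reproduced directly from the total costs. A small sketch (the totals are copied from the table; rounding in the last column may differ by a point or two):

```python
# Total costs per model, from the detailed breakdown above
totals = {
    "Gemini 3 Flash": 0.004350,
    "Llama 4 Maverick": 0.0112,
    "Gemini 3 Pro": 0.0825,
    "GPT-5": 0.1155,
    "Grok-4": 0.1986,
}

lowest = min(totals.values())
for model, cost in sorted(totals.items(), key=lambda kv: kv[1]):
    premium = cost - lowest          # absolute extra cost vs. the cheapest
    pct = premium / lowest * 100     # the "vs. Lowest" percentage
    print(f"{model}: ${cost:.4f}  (+${premium:.4f}, {pct:.0f}%)")
```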

Developer Code Snippet

Copy-paste code to integrate Gemini 3 Flash into your application

# pip install google-generativeai
import google.generativeai as genai

# Authenticate (load the key from an environment variable in production)
genai.configure(api_key="your-api-key")
model = genai.GenerativeModel('gemini-3-flash')

response = model.generate_content(
    "Your prompt here",
    generation_config={
        "max_output_tokens": 2000  # cap output tokens to control cost
    }
)

print(response.text)

💡 Replace "your-api-key" with your actual API key. Adjust tokens and parameters as needed.

Largest Context Window AI API 2026

Compare AI APIs with the largest context windows in 2026. Find models that can handle the most tokens for long documents and extensive conversations.

Complete Pricing Comparison

Model              Provider   Input       Output       Cached      Context
Gemini 3 Pro       Google     $1.25/1M    $10.00/1M    $0.32/1M    2M
Gemini 3 Flash     Google     $0.07/1M    $0.30/1M     $0.02/1M    1M
Llama 4 Maverick   Meta       $0.20/1M    $0.60/1M     n/a         1M
GPT-5              OpenAI     $1.75/1M    $14.00/1M    $0.44/1M    256K
Grok-4             xAI        $3.31/1M    $16.54/1M    n/a         256K
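The pricing table above can also be kept as data, which makes it easy to pick the cheapest model for a given workload. A minimal sketch (the dictionary values come from the table; `cheapest` and its context filter are my own helper, and cached rates are omitted for simplicity):

```python
# Per-1M-token prices and context sizes (in thousands of tokens),
# transcribed from the pricing comparison above
PRICING = {
    "Gemini 3 Pro":     {"input": 1.25, "output": 10.00, "context_k": 2000},
    "Gemini 3 Flash":   {"input": 0.07, "output": 0.30,  "context_k": 1000},
    "Llama 4 Maverick": {"input": 0.20, "output": 0.60,  "context_k": 1000},
    "GPT-5":            {"input": 1.75, "output": 14.00, "context_k": 256},
    "Grok-4":           {"input": 3.31, "output": 16.54, "context_k": 256},
}

def cheapest(input_tokens, output_tokens, min_context_k=0):
    """Return (model, cost) for the cheapest model meeting a context floor."""
    candidates = {
        name: (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6
        for name, p in PRICING.items()
        if p["context_k"] >= min_context_k
    }
    return min(candidates.items(), key=lambda kv: kv[1])

print(cheapest(50_000, 2_000))                      # cheapest overall
print(cheapest(50_000, 2_000, min_context_k=2000))  # requires a 2M context
```

Note how the answer changes once a 2M context window is required: only Gemini 3 Pro qualifies.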

Key Insights

Consider Total Cost of Ownership

While input pricing is important, consider output costs, context caching availability, and additional features. Some models offer better value when factoring in cached input pricing, larger context windows, or specialized capabilities that reduce overall token usage.

Usage Patterns Matter

The cheapest model for high-volume input may not be cheapest for high-output scenarios. Use the calculator above to estimate costs based on your specific input/output token ratios, agentic steps, and caching usage.
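To make the crossover concrete, here is a sketch with two hypothetical models (the rates below are invented for illustration, not taken from this page's tables): Model A at $0.50/$5.00 per 1M input/output tokens, and Model B at $1.00/$2.00.

```python
def cost(in_tok, out_tok, in_price_per_m, out_price_per_m):
    """Dollar cost of one request given per-1M-token prices."""
    return (in_tok * in_price_per_m + out_tok * out_price_per_m) / 1e6

# Input-heavy job (100k in, 5k out): Model A wins on cheap input
print(cost(100_000, 5_000, 0.50, 5.00), cost(100_000, 5_000, 1.00, 2.00))

# Output-heavy job (100k in, 50k out): Model B wins on cheap output
print(cost(100_000, 50_000, 0.50, 5.00), cost(100_000, 50_000, 1.00, 2.00))
```

With these rates the ranking flips once output exceeds one-sixth of input, which is exactly why the input/output ratio in the calculator matters.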

Frequently Asked Questions

Which AI API has the lowest pricing?

Gemini 3 Flash offers the lowest input pricing at $0.07 per 1M tokens; among the premium models, Gemini 3 Pro is cheapest at $1.25 per 1M tokens. However, the best choice depends on your specific use case, output requirements, and whether you can leverage context caching.

How do I reduce API costs?

Use context caching when available, optimize your prompts to reduce token usage, implement streaming for faster responses, and consider using cheaper models for high-volume operations while reserving premium models for complex tasks.

Should I use multiple AI APIs?

Many developers use multiple APIs for different tasks. You might use a cheaper model for high-volume operations and a more capable model for complex reasoning or specialized tasks. The calculator above helps you compare costs across different models.
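A minimal routing sketch of that idea (the routing rule and task labels are illustrative assumptions; the model names follow this page's tables):

```python
def pick_model(task_type):
    """Route bulk work to a cheap model and complex work to a premium one."""
    routes = {
        "bulk": "gemini-3-flash",     # high volume, lowest cost
        "reasoning": "gemini-3-pro",  # complex, long-context work
    }
    # Default to the cheap model for unrecognized task types
    return routes.get(task_type, "gemini-3-flash")

print(pick_model("bulk"))
print(pick_model("reasoning"))
```

Real routers usually add fallbacks and per-request cost logging, but the core decision is this simple lookup.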