NVIDIA Models API Cost Calculator & Comparison

Every NVIDIA model, side by side — current API rates, context window, benchmarks, and a live calculator that ranks them at your exact workload. 10 active models, 10 with public pricing. Prices refreshed daily.

Models tracked: 10
Active: 10
With public pricing: 10
Cheapest input: $0.00/1M

Calculate your NVIDIA API cost at your workload.

Set your workload — every priced model ranks in real time.

Adjust the workload

Every model below updates in real time.

1,000 / 10,000 / 50,000 / 250,000 / 1M / 10M


Pricing at a glance

Blended $/1M tokens across the lineup.

Blended price uses a 3-to-1 input/output ratio. Green bar = cheapest.
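As a rough illustration of the 3-to-1 blend, the sketch below assumes "3-to-1" means a weighted average with three parts input to one part output. The page does not state the exact formula, so the weighting and the function name `blended_price` are assumptions. The example rates are the Nemotron 3 Super rates listed in the pricing table on this page.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average of input and output rates, in USD per 1M tokens.

    Assumes the blend is a simple parts-weighted mean (3 input : 1 output
    by default); the page's exact formula is not published.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_per_m + output_parts * output_per_m) / total_parts


# Nemotron 3 Super: $0.09 input, $0.45 output per 1M tokens.
nemotron_3_super = blended_price(0.09, 0.45)
print(f"${nemotron_3_super:.2f}/1M")  # prints $0.18/1M
```

A model that charges the same rate for input and output, such as Llama 3.1 Nemotron 70B Instruct at $1.20/$1.20, has a blended price equal to that rate under any weighting.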

Quick picks

Best NVIDIA model for your use case.

As of April 2026, NVIDIA offers 10 active models via API, with input prices ranging from $0 to $1.20 per 1M tokens. The largest context window is 262K tokens. Supported capabilities include vision, deep reasoning, and tool use. All prices are in USD per 1 million tokens.

Quality vs price

NVIDIA benchmarks at a glance.

Each point is one model — X is blended $/1M tokens, Y is the average of available quality benchmarks. Larger bubbles mean larger context windows.

Per-model benchmark scores

Model | Avg | Scores
Nemotron 3 Super | 81.0 | MMLU Pro 83.6, LiveCodeBench 78.4
Nemotron Nano 9B V2 | 80.6 | AIME 2025 72.1, MATH 97.8, GPQA Diamond 64, IFEval 90.3, HellaSwag 78.9
Nemotron 3 Nano 30B A3B | 78.6 | MMLU Pro 78.3, AIME 2025 89.1, LiveCodeBench 68.3
Llama 3.3 Nemotron Super 49B V1.5 | 71.2 | GPQA Diamond 72, MATH 97.4, AIME 2024 87.5, AIME 2025 82.7, LiveCodeBench 73.6, AA Intelligence Index 14
Llama 3.1 Nemotron 70B Instruct | 50.3 | MT-Bench 9.0, Chatbot Arena Elo 1267, IFEval 73.8, BBH 47.1, MMLU Pro 43.5
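The Avg column is described as the average of available quality benchmarks. A minimal sketch, assuming it is an unweighted mean rounded to one decimal place, reproduces the listed figures for the two models below. The helper name `average_score` is hypothetical, and rows that mix scales (for example Chatbot Arena Elo) presumably need a normalization step that the page does not describe.

```python
from statistics import mean

# Scores copied from the benchmark table on this page.
scores = {
    "Nemotron 3 Super": {"MMLU Pro": 83.6, "LiveCodeBench": 78.4},
    "Nemotron 3 Nano 30B A3B": {
        "MMLU Pro": 78.3,
        "AIME 2025": 89.1,
        "LiveCodeBench": 68.3,
    },
}


def average_score(benchmarks: dict[str, float]) -> float:
    """Unweighted mean of whichever benchmarks are available, to 1 decimal."""
    return round(mean(list(benchmarks.values())), 1)


for model, benchmarks in scores.items():
    print(model, average_score(benchmarks))
```

Because each model reports a different set of benchmarks, averages computed this way are not strictly comparable across rows.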

Every model

Every NVIDIA model — pricing, context & capabilities.

Model | Context | Input /1M | Output /1M
Nemotron 3 Nano 30B A3B | 262K | $0.05 | $0.20
Nemotron 3 Nano 30B A3B | 256K | $0.00 | $0.00
Nemotron 3 Super | 262K | $0.09 | $0.45
Nemotron 3 Super | 262K | $0.00 | $0.00
Nemotron Nano 12B 2 VL | 131K | $0.20 | $0.60
Nemotron Nano 12B 2 VL | 128K | $0.00 | $0.00
Nemotron Nano 9B V2 | 131K | $0.04 | $0.16
Nemotron Nano 9B V2 | 128K | $0.00 | $0.00
Llama 3.3 Nemotron Super 49B V1.5 | 131K | $0.10 | $0.40
Llama 3.1 Nemotron 70B Instruct | 131K | $1.20 | $1.20
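To show how the rates in the table above translate into a monthly bill, here is a small sketch that ranks the paid listings for one workload. The workload shape (requests per month, input and output tokens per request) and the function names are assumptions for illustration; the page's calculator may take different inputs.

```python
# (input $/1M, output $/1M) for the paid listings in the table above.
PRICES = {
    "Nemotron Nano 9B V2": (0.04, 0.16),
    "Nemotron 3 Nano 30B A3B": (0.05, 0.20),
    "Nemotron 3 Super": (0.09, 0.45),
    "Llama 3.3 Nemotron Super 49B V1.5": (0.10, 0.40),
    "Nemotron Nano 12B 2 VL": (0.20, 0.60),
    "Llama 3.1 Nemotron 70B Instruct": (1.20, 1.20),
}


def monthly_cost(input_rate: float, output_rate: float, requests: int,
                 input_tokens: int, output_tokens: int) -> float:
    """USD per month for `requests` calls of the given token sizes."""
    input_millions = requests * input_tokens / 1_000_000
    output_millions = requests * output_tokens / 1_000_000
    return input_millions * input_rate + output_millions * output_rate


def rank_by_monthly_cost(prices, requests, input_tokens, output_tokens):
    """Return (model, monthly USD) pairs, cheapest first."""
    bills = {
        name: monthly_cost(i, o, requests, input_tokens, output_tokens)
        for name, (i, o) in prices.items()
    }
    return sorted(bills.items(), key=lambda item: item[1])


# Hypothetical workload: 50,000 requests/month, 1,500 input and 500 output tokens each.
ranking = rank_by_monthly_cost(PRICES, 50_000, 1_500, 500)
for name, bill in ranking:
    print(f"{name}: ${bill:.2f}/month")
```

For this hypothetical workload the cheapest paid listing comes to $7.00 per month and the most expensive to $120.00, which shows how far apart the two ends of the lineup are at identical usage.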

FAQ

Frequently asked questions

Pricing patterns, best-known use cases, and how this provider stacks up.


NVIDIA API pricing ranges from $0 to $1.20 per 1M input tokens. Output tokens cost more than input on every paid listing except Llama 3.1 Nemotron 70B Instruct, which charges $1.20/1M for both. Prices are per 1 million tokens (1M tokens ≈ 750,000 words). Use the calculator above to estimate your monthly spend at your actual workload.
Several NVIDIA listings are priced at $0/1M input tokens, including Nemotron 3 Nano 30B A3B. Among paid listings, Nemotron Nano 9B V2 is the cheapest at $0.04/1M input. These models suit high-volume tasks where cost matters most: classification, extraction, summarization, and similar workloads that don't need frontier reasoning.
Llama 3.1 Nemotron 70B Instruct is NVIDIA's highest-priced model at $1.20/1M input. Price does not track benchmark results in this lineup: its benchmark average (50.3) is the lowest in the table above, so compare scores as well as rates before choosing it. For workloads that don't require frontier performance, a mid-tier model typically cuts inference costs substantially.
Nemotron 3 Nano 30B A3B (both listings), Nemotron 3 Super, and 6 more support deep reasoning mode, which improves performance on multi-step coding, debugging, and code review. For simpler autocomplete or snippet generation, a faster, cheaper model often delivers acceptable quality at a fraction of the cost.
Nemotron 3 Nano 30B A3B (both listings), Nemotron 3 Super, and 7 more support function calling (tool use), required for agentic workflows. Agents need a model that reliably follows structured output schemas, so test with your specific tool definitions before committing to production volumes.
Yes. Both Nemotron Nano 12B 2 VL listings accept image input alongside text. You can pass screenshots, photos, charts, and documents for analysis. Vision adds no separate line item on most NVIDIA models; you're billed for the token equivalent of the image.
Yes. NVIDIA supports prompt caching (discounts for repeated context) and batch processing (accept a delay, cut costs by roughly 50%). The pricing table above lists only standard input and output rates, so cached and batch rates are not shown there. Caching pays off quickly if your prompts share a long system prompt or document prefix across many calls.
NVIDIA has historically adjusted prices when launching new model generations, often cutting rates to stay competitive. Buzzi.ai snapshots pricing daily — you can subscribe to price-drop alerts on any NVIDIA model using the "Alert me" button on its detail page.
Use the main comparison wizard to run the same calculator across NVIDIA, Anthropic, Google, Meta, Mistral, and 20+ other providers. Set your exact workload and get a ranked cost chart in under a minute.
Nemotron 3 Nano 30B A3B and Nemotron 3 Super (both listings of each) and 5 more offer an extended thinking or reasoning mode. The model spends extra compute "thinking" before answering, which is slower and more expensive but meaningfully better on complex, multi-step problems. Standard mode is faster and cheaper for routine tasks.
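The batch-processing answer above quotes a saving of roughly 50%. A minimal sketch of what that would mean for a monthly bill follows; the 0.5 default is that approximate figure rather than a published per-model rate, and `batch_cost` is a hypothetical helper.

```python
def batch_cost(standard_cost: float, batch_discount: float = 0.5) -> float:
    """Estimated cost after a batch discount, given the standard-rate cost.

    `batch_discount` is a fraction between 0 and 1; the 0.5 default is the
    approximate figure quoted in the FAQ, not a confirmed rate.
    """
    if not 0.0 <= batch_discount <= 1.0:
        raise ValueError("batch_discount must be between 0 and 1")
    return standard_cost * (1.0 - batch_discount)


# A $120.00 standard-rate monthly bill at an assumed 50% batch discount.
print(f"${batch_cost(120.0):.2f}")  # prints $60.00
```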

Look wider

Compare NVIDIA against other providers.

Open the full wizard — pick a use case, set your usage, and cross-compare against OpenAI, Anthropic, Google, and 20+ more.