Z.ai Models API Cost Calculator & Comparison

Every Z.ai model, side by side — current API rates, context window, benchmarks, and a live calculator that ranks them at your exact workload. 13 active models, 13 with public pricing. Prices refreshed daily.

Models tracked: 13
Active: 13
With public pricing: 13
Cheapest input: $0.00/1M

Calculate your Z.ai API cost at your workload.

Set your workload — every priced model ranks in real time.

Adjust the workload

Every model below updates in real time.

Workload presets: 1,000 · 10,000 · 50,000 · 250,000 · 1M · 10M

Ranked by your monthly bill


Pricing at a glance

Blended $/1M tokens across the lineup.

Blended price uses a 3-to-1 input/output ratio. Green bar = cheapest.
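The blended figure can be reproduced directly from a model's posted rates. A minimal sketch — the 3-to-1 input/output ratio is the only assumption, carried over from the note above; the function name is illustrative:

```python
def blended_per_1m(input_rate: float, output_rate: float,
                   input_share: float = 0.75) -> float:
    """Blend $/1M input and output rates at a 3-to-1 input/output ratio."""
    return input_rate * input_share + output_rate * (1 - input_share)

# GLM 4.7 at $0.38 in / $1.74 out:
print(round(blended_per_1m(0.38, 1.74), 2))  # 0.72
```

A 3:1 ratio reflects typical chat traffic, where prompts (with history) outweigh completions; workloads that generate long outputs should lower `input_share`.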

Quick picks

Best Z.ai model for your use case.

As of April 2026, Z.ai offers 13 active models via API, ranging from $0 to $1.20 per 1M input tokens. The most context-rich model handles up to 205K tokens. Models support vision, deep reasoning, and tool use. All prices are USD per 1 million tokens.

Quality vs price

Z.ai benchmarks at a glance.

Each point is one model — X is blended $/1M tokens, Y is the average of available quality benchmarks. Larger bubbles mean larger context windows.

Per-model benchmark scores

| Model | Avg | Scores |
| --- | --- | --- |
| GLM 5.1 | 80.0 | GPQA Diamond 86.2 · SWE-Bench Verified 58.4 · AIME 2025 95.3 |
| GLM 4.7 Flash | 75.3 | AIME 2025 91.6 · GPQA Diamond 75.2 · SWE-Bench Verified 59.2 |
| GLM 4.7 | 70.7 | MMLU Pro 84.3 · GPQA Diamond 85.7 · AIME 2025 95.7 · SWE-Bench Verified 73.8 · LiveCodeBench 84.9 · FrontierMath Tier-4 0.0% |
| GLM 4.5 | 65.5 | MMLU Pro 84.6 · GPQA Diamond 79.1 · SWE-Bench Verified 64.2 · MATH 98.2 · AIME 2024 91 · LiveCodeBench 72.9 · AA Intelligence Index 26 · Humanity's Last Exam 8.3 |
| GLM 5 | 61.7 | GPQA Diamond 86 · SWE-Bench Verified 77.8 · AIME 2025 92.7 · AA Intelligence Index 50 · FrontierMath Tier-4 2.1 |
| GLM 5 Turbo | 54.0 | FrontierMath Tier-4 2.1 · SWE-Bench Verified 72.1 · GPQA Diamond 87.8 |
| GLM 4.6 | 45.7 | SWE-Bench Verified 68 · LiveCodeBench 82.8 · AA Intelligence Index 30 · FrontierMath Tier-4 2.1 |
| GLM 5V Turbo | 43.0 | AA Intelligence Index 43 |
| GLM 4.5 Air | 32.9 | SWE-Bench Verified 57.6 · Humanity's Last Exam 8.1 |

Every model

Every Z.ai model — pricing, context & capabilities.

| Model | Context | Input /1M | Output /1M |
| --- | --- | --- | --- |
| GLM 5.1 | 203K | $1.05 | $3.50 |
| GLM 5V Turbo | 203K | $1.20 | $4.00 |
| GLM 5 Turbo | 203K | $1.20 | $4.00 |
| GLM 5 | 80K | $0.72 | $2.30 |
| GLM 4.7 | 203K | $0.38 | $1.74 |
| GLM 4.7 Flash | 203K | $0.06 | $0.40 |
| GLM 4.6V | 131K | $0.30 | $0.90 |
| GLM 4.6 | 205K | $0.39 | $1.90 |
| GLM 4.5V | 66K | $0.60 | $1.80 |
| GLM 4.5 | 131K | $0.60 | $2.20 |
| GLM 4.5 Air | 131K | $0.13 | $0.85 |
| GLM 4.5 Air | 131K | $0.00 | $0.00 |
| GLM 4 32B | 128K | $0.10 | $0.10 |
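The "ranked by your monthly bill" view above is straightforward to sketch from these rows. The rates below are a subset copied from the table; the workload numbers are purely illustrative:

```python
# $/1M (input, output) rates, copied from the table above (subset)
RATES = {
    "GLM 5.1": (1.05, 3.50),
    "GLM 4.7": (0.38, 1.74),
    "GLM 4.7 Flash": (0.06, 0.40),
    "GLM 4.6": (0.39, 1.90),
    "GLM 4.5 Air": (0.13, 0.85),
}

def monthly_bill(in_rate: float, out_rate: float,
                 requests: int, in_tok: int, out_tok: int) -> float:
    """USD per month for `requests` calls of in_tok input / out_tok output tokens."""
    return (requests * in_tok / 1e6) * in_rate + (requests * out_tok / 1e6) * out_rate

# Illustrative workload: 50,000 requests/month, 1,500 in + 500 out tokens each
ranked = sorted(RATES, key=lambda m: monthly_bill(*RATES[m], 50_000, 1_500, 500))
for m in ranked:
    print(f"{m}: ${monthly_bill(*RATES[m], 50_000, 1_500, 500):,.2f}")
```

At that workload GLM 4.7 Flash comes out cheapest; as output share grows, models with low output rates gain ground.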

FAQ

Frequently asked questions

Pricing patterns, best-known use cases, and how this provider stacks up.


Z.ai API pricing ranges from $0 to $1.20 per 1M input tokens. Output tokens cost more than input on every model. Prices are per 1 million tokens (1M ≈ 750,000 words). Use the calculator above to estimate your monthly spend at your actual workload.
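Using the ≈750,000-words-per-1M-tokens rule of thumb from the answer above, a back-of-the-envelope spend estimate can be computed from word counts. A sketch — the conversion factor is the only assumption, and real tokenizer ratios vary by language and content:

```python
WORDS_PER_MILLION_TOKENS = 750_000  # rule of thumb; actual ratio varies

def cost_from_words(words_in: int, words_out: int,
                    input_rate: float, output_rate: float) -> float:
    """Approximate USD cost from word counts and $/1M-token rates."""
    millions_in = words_in / WORDS_PER_MILLION_TOKENS    # millions of tokens
    millions_out = words_out / WORDS_PER_MILLION_TOKENS
    return millions_in * input_rate + millions_out * output_rate

# 3M words in, 1M words out on GLM 5.1 ($1.05 in / $3.50 out):
print(round(cost_from_words(3_000_000, 1_000_000, 1.05, 3.50), 2))  # 8.87
```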
GLM 4.5 Air is the lowest-priced Z.ai model with public pricing at $0/1M input tokens. It suits high-volume tasks where cost matters most — classification, extraction, summarization, and similar workloads that don't need frontier reasoning.
GLM 5V Turbo is Z.ai's highest-tier model at $1.20/1M input. It delivers the most sophisticated reasoning, instruction-following, and nuance. For workloads that don't require frontier performance, a mid-tier model typically cuts inference costs substantially.
GLM 5.1, GLM 5V Turbo, GLM 5 Turbo and 10 more support deep reasoning mode, which improves performance on multi-step coding, debugging, and code review. For simpler autocomplete or snippet generation, a faster, cheaper model often delivers acceptable quality at a fraction of the cost.
GLM 5.1, GLM 5V Turbo, GLM 5 Turbo and 10 more support function calling (tool use), required for agentic workflows. Agents need a model that reliably follows structured output schemas — test with your specific tool definitions before committing to production volumes.
Yes — GLM 5V Turbo, GLM 4.6V, GLM 4.5V accept image input alongside text. You can pass screenshots, photos, charts, and documents for analysis. Vision adds no separate line-item on most Z.ai models — you're billed for the token equivalent of the image.
Yes — Z.ai supports prompt caching (discounts for repeated context) and batch processing (accept a delay in exchange for roughly 50% lower rates). Cached and batch rates are listed on each model's detail page. Caching pays off quickly if your prompts share a long system prompt or document prefix across many calls.
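The payoff math for caching is simple to sketch. This assumes a hypothetical 50% discount on cached input tokens and that the shared prefix is cached after the first call — check the model's detail page for its actual cache rate and eviction rules:

```python
def cost_with_cache(calls: int, prefix_tok: int, fresh_tok: int,
                    in_rate: float, cache_discount: float = 0.5) -> float:
    """Input-token cost when a shared prefix is cached after the first call."""
    cached_rate = in_rate * (1 - cache_discount)
    first = (prefix_tok + fresh_tok) / 1e6 * in_rate           # first call: full price
    rest = (calls - 1) * (prefix_tok / 1e6 * cached_rate       # cached prefix
                          + fresh_tok / 1e6 * in_rate)         # fresh suffix
    return first + rest

# 10,000 calls sharing a 20K-token system prompt, 500 fresh tokens each,
# at GLM 4.7's $0.38/1M input rate:
no_cache = 10_000 * (20_000 + 500) / 1e6 * 0.38
print(round(no_cache, 2), round(cost_with_cache(10_000, 20_000, 500, 0.38), 2))
```

The larger the shared prefix relative to the per-call fresh tokens, the closer the savings approach the full cache discount.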
Z.ai has historically adjusted prices when launching new model generations, often cutting rates to stay competitive. Buzzi.ai snapshots pricing daily — you can subscribe to price-drop alerts on any Z.ai model using the "Alert me" button on its detail page.
Use the main comparison wizard to run the same calculator across Z.ai, Anthropic, Google, Meta, Mistral, and 20+ other providers. Set your exact workload and get a ranked cost chart in under a minute.
GLM 5.1, GLM 5V Turbo, GLM 5 Turbo, GLM 5 and 9 more offer an extended thinking or reasoning mode. The model spends extra compute "thinking" before answering — slower and more expensive, but meaningfully better on complex, multi-step problems. Standard mode is faster and cheaper for routine tasks.

Look wider

Compare Z.ai against other providers.

Open the full wizard — pick a use case, set your usage, and cross-compare against OpenAI, Anthropic, Google, and 20+ more.