Where the data comes from
Sourced, timestamped, auditable.
Every model row has a last_verified_at timestamp. Models not re-verified within 30 days are flagged in the admin UI for refresh.
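The staleness rule above can be sketched in a few lines. This is a minimal illustration, not the production check: only `last_verified_at` and the 30-day window come from the text, while the function name `is_stale` and the sample dates are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(days=30)  # re-verification window stated above


def is_stale(last_verified_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when a row was last verified more than 30 days ago."""
    now = now or datetime.now(timezone.utc)
    return now - last_verified_at > STALE_AFTER


# Hypothetical rows, checked against a fixed "now" so the result is reproducible.
now = datetime(2025, 3, 1, tzinfo=timezone.utc)
print(is_stale(datetime(2025, 1, 15, tzinfo=timezone.utc), now))  # True (45 days old)
print(is_stale(datetime(2025, 2, 20, tzinfo=timezone.utc), now))  # False (9 days old)
```

Using timezone-aware UTC timestamps avoids off-by-hours errors when the check runs on servers in different regions.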
Pricing & specs
Official provider pricing pages and API docs. Input, output, cached, batch — all as published.
Benchmarks
Provider model cards first, then widely-cited third-party leaderboards (MMLU, SWE-Bench, HumanEval, GPQA, AIME, MMMU, DocVQA). Source noted on each row.
Regions & compliance
Provider trust centers and certification pages: SOC 2, HIPAA, GDPR, FedRAMP, regional data-residency.
Update cadence
Refreshed every morning, audited on every change.
Step 01
Daily · 02:00 UTC
Snapshot cron
Captures current price and status; diffs against yesterday to detect changes.
Step 02
Daily · 02:30 UTC
Alerts cron
Emails subscribed users about price changes, deprecations, and sunsets.
Step 03
Monthly · 1st @ 09:00 UTC
Market Pulse newsletter
One short email with the month’s price moves, new launches, and quiet deprecations.
Step 04
Ad hoc
Admin edits
New launches land the same day. Every change is written to a public audit log.
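As a rough sketch of the schedule and the snapshot step above: the cron expressions follow directly from the stated times, but the snapshot shape (`{model_id: {"price", "status"}}`), the function name `diff_snapshots`, and the model names are all assumptions made for illustration.

```python
# Standard five-field cron expressions for the scheduled steps above (UTC).
SCHEDULE = {
    "snapshot": "0 2 * * *",      # daily at 02:00
    "alerts": "30 2 * * *",       # daily at 02:30
    "market_pulse": "0 9 1 * *",  # 1st of the month at 09:00
}


def diff_snapshots(yesterday: dict, today: dict) -> list:
    """Compare two snapshots and list launches, removals, and field changes."""
    changes = []
    for model_id, row in today.items():
        prev = yesterday.get(model_id)
        if prev is None:
            changes.append({"model": model_id, "field": "launch", "old": None, "new": row})
            continue
        for field in ("price", "status"):
            if prev.get(field) != row.get(field):
                changes.append({"model": model_id, "field": field,
                                "old": prev.get(field), "new": row.get(field)})
    for model_id, row in yesterday.items():
        if model_id not in today:
            changes.append({"model": model_id, "field": "removed", "old": row, "new": None})
    return changes


# Hypothetical data: one price cut, one deprecation, one new launch.
yesterday = {
    "model-a": {"price": 3.00, "status": "active"},
    "model-b": {"price": 1.00, "status": "active"},
}
today = {
    "model-a": {"price": 2.50, "status": "active"},
    "model-b": {"price": 1.00, "status": "deprecated"},
    "model-c": {"price": 0.40, "status": "active"},
}
for change in diff_snapshots(yesterday, today):
    print(change["model"], change["field"])
```

The change list produced by a diff like this is the natural input for the alerts step that runs 30 minutes later.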
Scoring on Best-for pages
Weights match the task.
Each Best-for page defines a set of pillars with explicit weights (visible at the top of the page). For tasks where quality dominates economics — reasoning, agents, healthcare — price is weighted under 30%. For tasks where price dominates — cheap-bulk, long-context with large input — price is weighted above 40%.
Missing benchmarks are treated as the category median. We don't assume a model is bad because a score isn't published.
Quality benchmarks
MMLU, SWE-Bench, HumanEval, GPQA, AIME, MMMU, DocVQA.
Price
Input + output per 1M tokens.
Memory
Context window size.
Capabilities
Function calling, JSON mode, vision, structured output.
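To illustrate how task-specific weights change a ranking, here is a minimal sketch. The two weight profiles are hypothetical: only the price bounds (under 30% for quality-led tasks, above 40% for price-led tasks) come from the text. It also assumes each pillar has already been normalized to a 0-100 score, which the page does not specify.

```python
# Hypothetical profiles; the real per-page weights are shown on each Best-for page.
PROFILES = {
    "reasoning":  {"quality": 0.50, "price": 0.20, "memory": 0.10, "capabilities": 0.20},
    "cheap-bulk": {"quality": 0.25, "price": 0.50, "memory": 0.10, "capabilities": 0.15},
}


def best_for_score(pillars: dict, weights: dict) -> float:
    """Weighted sum of pillar scores (each assumed to be on a 0-100 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("pillar weights must sum to 1")
    return sum(weights[name] * pillars[name] for name in weights)


# Two made-up models: one stronger on quality, one much cheaper.
strong = {"quality": 90, "price": 40, "memory": 70, "capabilities": 80}
cheap = {"quality": 70, "price": 95, "memory": 60, "capabilities": 70}

for task, weights in PROFILES.items():
    winner = max(("strong", strong), ("cheap", cheap),
                 key=lambda item: best_for_score(item[1], weights))
    print(task, "->", winner[0])
```

With these made-up numbers the quality-led profile ranks the stronger model first and the price-led profile ranks the cheaper one first, which is the behavior the weighting scheme is meant to produce.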
Buzzi Intelligence Index
One score, six benchmarks, explicit weights.
The Quality-vs-Price scatter on the results page uses our own composite score (0–100) built from published benchmark scores. Missing benchmarks fall back to the category median so a model isn't penalized for data we don't have.
We don't import Artificial Analysis's index or any third-party composite. The math is ours and the inputs are auditable.
- 25%
MMLU
Broad knowledge and reasoning — 57 subjects.
- 20%
GPQA
Expert-level science questions (physics, chemistry, biology).
- 20%
HumanEval
Python code generation from docstrings.
- 20%
SWE-Bench
Real-world GitHub issue-fixing tasks.
- 15%
MMMU
Multimodal (text + image) college-level problems.
- 10%
AIME
High-school math olympiad problems.
The weights sum to 1.1 before normalization, so the weighted total is divided by that sum: a model with 100/100 on all six benchmarks scores exactly 100. When a benchmark is missing, the denominator shrinks proportionally.
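The normalization can be sketched as a weighted average. This is an illustration under assumptions, not the production code: the text describes both a category-median fallback and a shrinking denominator for missing benchmarks without saying when each applies, so this sketch uses the median when one is supplied and otherwise drops the benchmark from both numerator and denominator. The function name and sample scores are hypothetical.

```python
# Published weights in percentage points; they sum to 110, i.e. 1.1.
WEIGHTS = {"MMLU": 25, "GPQA": 20, "HumanEval": 20, "SWE-Bench": 20, "MMMU": 15, "AIME": 10}


def composite(scores: dict, medians: dict = None) -> float:
    """Weighted average of 0-100 benchmark scores, normalized by the weights used."""
    medians = medians or {}
    numerator = 0.0
    denominator = 0
    for bench, weight in WEIGHTS.items():
        value = scores.get(bench)
        if value is None:
            value = medians.get(bench)  # assumed fallback: category median
        if value is None:
            continue  # no score, no median: the denominator shrinks
        numerator += weight * value
        denominator += weight
    if denominator == 0:
        raise ValueError("no benchmark scores available")
    return numerator / denominator


perfect = {bench: 100 for bench in WEIGHTS}
print(composite(perfect))  # 100.0

partial = {"MMLU": 80, "GPQA": 60}  # hypothetical model with two published scores
print(round(composite(partial), 2))  # 71.11, using only the 45 points of weight present
print(round(composite(partial, {b: 50 for b in WEIGHTS}), 2))  # 58.64, medians fill the gaps
```

Dividing by the sum of the weights actually used keeps the score on a 0-100 scale whether or not the weights sum to 1.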
Cost formulas
The math, in detail.
Volume cost uses the standard per-million-tokens model. Switch cost assumes a 40-hour engineering week at your chosen rate, with a configurable risk premium.
monthly_cost = (uncached_input_tokens / 1M) × input_price_per_1M
             + (cached_input_tokens / 1M) × cached_input_price_per_1M
             + (batch_input_tokens / 1M) × batch_input_price_per_1M
             + (standard_output_tokens / 1M) × output_price_per_1M
             + (batch_output_tokens / 1M) × batch_output_price_per_1M
Token counts come from the provider's own tokenizer when we have it (tiktoken, o200k), otherwise a family coefficient with a ±7% error envelope.
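A minimal sketch of the volume-cost formula above, with a worked example. The class and field names are hypothetical and the prices are invented. The `estimate_range` helper assumes the ±7% envelope applies uniformly to every token count, in which case the cost scales by the same factor because it is linear in tokens.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass
class Prices:
    """Prices per 1M tokens (hypothetical field names)."""
    input: float
    cached_input: float
    batch_input: float
    output: float
    batch_output: float


@dataclass
class Usage:
    """Monthly token counts per traffic type."""
    uncached_input: int
    cached_input: int
    batch_input: int
    standard_output: int
    batch_output: int


def monthly_cost(usage: Usage, prices: Prices) -> float:
    """Sum of (tokens / 1M) x price over the five traffic types in the formula."""
    return (
        usage.uncached_input / PER_MILLION * prices.input
        + usage.cached_input / PER_MILLION * prices.cached_input
        + usage.batch_input / PER_MILLION * prices.batch_input
        + usage.standard_output / PER_MILLION * prices.output
        + usage.batch_output / PER_MILLION * prices.batch_output
    )


def estimate_range(cost: float, envelope: float = 0.07) -> tuple:
    """Low/high cost when token counts are estimates with a +/- envelope."""
    return cost * (1 - envelope), cost * (1 + envelope)


# Invented prices and volumes, for illustration only.
prices = Prices(input=2.0, cached_input=0.5, batch_input=1.0, output=8.0, batch_output=4.0)
usage = Usage(uncached_input=100_000_000, cached_input=50_000_000, batch_input=20_000_000,
              standard_output=10_000_000, batch_output=5_000_000)

cost = monthly_cost(usage, prices)
print(cost)  # 345.0 = 200 + 25 + 20 + 80 + 20
print(estimate_range(cost))
```

Keeping the five traffic types separate matters because cached and batch tokens are typically priced differently from standard ones, so a single blended rate would misstate the bill.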
What we don't do
Three rules that keep the data honest.
No sponsorships.
We do not take money from LLM providers. No affiliate fees, no paid placements.
No vibes.
We do not weight gut feelings. Every rank is a formula you can audit.
No guessed benchmarks.
If a score has no citable source, we treat the model as median rather than invent a number.
Frequently asked questions
How we work, in detail.
The non-obvious parts of sourcing, scoring, and refreshing the data.
Brand marks. Provider logos shown across the comparison tool are used under nominative fair use for factual product comparison. All marks are property of their respective owners. Where an official logo isn't available, we display a generated monogram wordmark as a placeholder.
Spotted an error?
Corrections are welcome.
Spotted a missing model or a stale price? Email us with a link to the source. We typically correct within 24 hours.
hello@buzzi.ai