Methodology.
Exactly how the comparison, the calculators, and the Best-for rankings are built — so you, and the AI engines citing our data, can trust the output.
Where the data comes from
Every model row has a last_verified_at timestamp. Models not re-verified within 30 days are flagged in the admin UI for refresh.
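As an illustration, the 30-day re-verification rule could be checked like this (a minimal sketch; the function name and argument shapes are ours, not the actual admin code):

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)  # re-verification window from the methodology

def is_stale(last_verified_at: str, now=None) -> bool:
    """Flag a model row whose last_verified_at (ISO 8601) is older than 30 days."""
    now = now or datetime.now(timezone.utc)
    verified = datetime.fromisoformat(last_verified_at)
    return now - verified > STALE_AFTER
```

Rows where this returns True would surface in the refresh queue.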
Pricing: official provider pricing pages and API docs. Input, output, cached, batch — all as published.
Benchmarks: provider model cards first, then widely cited third-party leaderboards (MMLU, SWE-Bench, HumanEval, GPQA, AIME, MMMU, DocVQA). Source noted on each row.
Compliance: provider trust centers and certification pages — SOC 2, HIPAA, GDPR, FedRAMP, regional data residency.
Update cadence
Step 01
Daily · 02:00 UTC
Captures current price and status; diffs against yesterday to detect changes.
Step 02
Daily · 02:30 UTC
Emails subscribed users about price changes, deprecations, and sunsets.
Step 03
Monthly · 1st @ 09:00 UTC
One short email with the month’s price moves, new launches, and quiet deprecations.
Step 04
Ad hoc
New launches land the same day. Every change is written to a public audit log.
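The daily snapshot-and-diff step (Step 01) can be sketched as follows. This is an illustration only; the record shapes and function name are assumptions, not the production pipeline:

```python
def diff_prices(yesterday: dict, today: dict) -> list:
    """Compare two daily snapshots (model id -> price per 1M input tokens)
    and emit one change record per difference."""
    changes = []
    for model, price in today.items():
        old = yesterday.get(model)
        if old is None:
            changes.append({"model": model, "kind": "new", "price": price})
        elif old != price:
            changes.append({"model": model, "kind": "price_change", "from": old, "to": price})
    # Models present yesterday but gone today: candidate deprecations.
    for model in yesterday.keys() - today.keys():
        changes.append({"model": model, "kind": "removed"})
    return changes
```

Each emitted record would then feed the alert emails (Step 02) and the public audit log.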
Scoring for Best-for pages
Each Best-for page defines a set of pillars with explicit weights (visible at the top of the page). For tasks where quality dominates economics — reasoning, agents, healthcare — price is weighted under 30%. For tasks where price dominates — cheap-bulk, long-context with large input — price is weighted above 40%.
Missing benchmarks are treated as the category median. We don’t assume a model is bad because a score isn’t published.
Benchmarks: MMLU, SWE-Bench, HumanEval, GPQA, AIME, MMMU, DocVQA.
Price: input + output per 1M tokens.
Context: context window size.
Capabilities: function calling, JSON mode, vision, structured output.
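The pillar weighting with the category-median fallback described above might look like this. A minimal sketch only: pillar names and weights here are illustrative, not the published per-page weights:

```python
from statistics import median

def pillar_score(model_scores: dict, weights: dict, category_scores: dict) -> float:
    """Weighted score on a 0-100 scale. Weights should sum to 1.
    A missing pillar falls back to the category median, so a model is
    never penalized for an unpublished score."""
    total = 0.0
    for pillar, weight in weights.items():
        value = model_scores.get(pillar)
        if value is None:
            value = median(category_scores[pillar])  # median fallback
        total += weight * value
    return total
```

For example, a model with no published price-pillar value would receive the median of its category rather than zero.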
Buzzi Intelligence Index
The Quality-vs-Price scatter on the results page uses our own composite score (0–100) built from published benchmark scores. Missing benchmarks fall back to the category median so a model isn’t penalized for data we don’t have.
We don’t import Artificial Analysis’s index or any third-party composite. The math is ours and the inputs are auditable.
MMLU: broad knowledge and reasoning across 57 subjects.
GPQA: expert-level science questions (physics, chemistry, biology).
HumanEval: Python code generation from docstrings.
SWE-Bench: real-world GitHub issue-fixing tasks.
MMMU: multimodal (text + image) college-level problems.
AIME: high-school competition math problems.
Per-benchmark weights sum to 1.1 before normalization; after normalizing, a model with 100/100 on all six benchmarks scores exactly 100. When a benchmark is missing, the denominator shrinks proportionally instead of counting the gap as zero.
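The shrinking-denominator behavior can be sketched in a few lines. The weights below are hypothetical (chosen only to sum to 1.1, matching the stated pre-normalization total); the real published weights differ:

```python
# Hypothetical weights summing to 1.1 before normalization (illustrative only).
WEIGHTS = {"MMLU": 0.25, "GPQA": 0.2, "HumanEval": 0.15,
           "SWE-Bench": 0.25, "MMMU": 0.1, "AIME": 0.15}

def intelligence_index(scores: dict) -> float:
    """0-100 composite. Missing benchmarks drop out of both numerator and
    denominator, so coverage gaps don't drag the score down."""
    covered = {b: w for b, w in WEIGHTS.items() if b in scores}
    denom = sum(covered.values())
    return sum(w * scores[b] for b, w in covered.items()) / denom
```

A model scoring 100 everywhere lands at exactly 100; a model with only one published benchmark is scored on that benchmark alone.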
Cost formulas
Volume cost uses the standard per-million-tokens model. Switch cost assumes a 40-hour engineering week at your chosen rate, with a configurable risk premium.
monthly_cost = (uncached_input_tokens / 1M) × input_price_per_1M
+ (cached_input_tokens / 1M) × cached_input_price_per_1M
+ (batch_input_tokens / 1M) × batch_input_price_per_1M
+ (standard_output_tokens / 1M) × output_price_per_1M
+ (batch_output_tokens / 1M) × batch_output_price_per_1M

Token counts come from the provider’s own tokenizer when we have it (tiktoken, o200k), otherwise a family coefficient with a ±7% error envelope.
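The volume-cost formula and the switch-cost assumption can be expressed as a short sketch (field names are illustrative; the calculators expose these as form inputs, not this exact API):

```python
M = 1_000_000  # prices are quoted per 1M tokens

def monthly_cost(tokens: dict, prices: dict) -> float:
    """Sum each token bucket (uncached_input, cached_input, batch_input,
    standard_output, batch_output) against its per-1M price."""
    return sum(tokens[k] / M * prices[k] for k in tokens)

def switch_cost(weeks: float, hourly_rate: float, risk_premium: float = 0.0) -> float:
    """Migration cost: 40-hour engineering weeks at your chosen rate,
    scaled by a configurable risk premium."""
    return weeks * 40 * hourly_rate * (1 + risk_premium)
```

For example, 2M uncached input tokens at $3/1M plus 1M output tokens at $15/1M comes to $21/month; a two-week migration at $100/hour with a 25% risk premium is $10,000.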
What we don’t do
We do not take money from LLM providers. No affiliate fees, no paid placements.
We do not weight gut feelings. Every rank is a formula you can audit.
If a score has no citable source, we treat the model as median rather than invent a number.
FAQ
The non-obvious parts of sourcing, scoring, and refreshing the data.
Brand marks. Provider logos shown across the comparison tool are used under nominative fair use for factual product comparison. All marks are property of their respective owners. Where an official logo isn’t available, we display a generated monogram wordmark as a placeholder.
Found an error?
Spotted a missing model or a stale price? Email us with a link to the source. We typically correct within 24 hours.
hello@buzzi.ai

Open data
The underlying data is available as a JSON feed under CC BY 4.0 with attribution — free for research, products, and AI engines.
Open JSON feed