How we score SLM vs LLM.

Rule-based and reproducible. Same inputs, same shortlist, every time. Hard filters cut anything that fails residency, language, or accuracy floor. Soft weighted scores rank the rest by cost (35%), accuracy on your task (35%), latency fit (15%), and a sovereignty bonus (15%) when residency is selected.
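A minimal Python sketch of this ranking, assuming hypothetical field names (`regions`, `languages`, `accuracy_floor`, and component scores already normalized to 0–1, where lower `cost_norm` is cheaper):

```python
# Weights from the methodology above; data shapes are assumptions.
WEIGHTS = {"cost": 0.35, "accuracy": 0.35, "latency": 0.15, "sovereignty": 0.15}

def shortlist(models, needs):
    # Hard filters: drop anything failing residency, language, or accuracy floor.
    eligible = [
        m for m in models
        if (not needs["residency"] or needs["residency"] in m["regions"])
        and needs["language"] in m["languages"]
        and m["accuracy"] >= needs["accuracy_floor"]
    ]

    # Soft weighted score: each component in 0..1, higher is better.
    def score(m):
        s = (WEIGHTS["cost"] * (1 - m["cost_norm"])
             + WEIGHTS["accuracy"] * m["accuracy"]
             + WEIGHTS["latency"] * m["latency_fit"])
        if needs["residency"]:  # sovereignty bonus only when residency is selected
            s += WEIGHTS["sovereignty"] * m["sovereignty"]
        return s

    return sorted(eligible, key=score, reverse=True)
```

Because the filters and weights are fixed, the same inputs always yield the same shortlist.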

Cost formulas

Per-model monthly cost

monthly_api_cost =
    (queries × avg_input_tokens  / 1M) × price_per_M_input
  + (queries × avg_output_tokens / 1M) × price_per_M_output

# Caching discount: up to 90% off input × cache_hit_rate
monthly_api_cost *= (1 - cache_hit_rate × 0.9)

# Batch discount: up to 50% off when "Batch-tolerant" is selected
monthly_api_cost *= (1 - batch_rate × 0.5)

# Self-hosted (open-weight SLMs)
self_hosted_cost =
    monthly_gpu_cost_usd
  + (setup_effort_days × 8h × eng_hourly_rate_usd) / 12

# Effective cost by deployment mode
effective_monthly_cost = {
  api:                monthly_api_cost,
  managed-inference:  monthly_api_cost × 1.10,
  self-hosted-gpu:    self_hosted_cost,
  on-prem:            self_hosted_cost × 1.25,
  air-gapped:         self_hosted_cost × 1.40,
}[chosen_deployment_mode]

Defaults: cache_hit_rate = 0; batch_rate = 1 when "Batch-tolerant" is selected, else 0; eng_hourly_rate_usd = 150.
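The formulas above can be sketched as runnable Python. The constants (90% cache discount, 50% batch discount, 8h/day, 12-month amortization, deployment multipliers) come from this page; the function names and signatures are illustrative:

```python
def monthly_api_cost(queries, avg_input_tokens, avg_output_tokens,
                     price_per_m_input, price_per_m_output,
                     cache_hit_rate=0.0, batch_rate=0.0):
    """Monthly API cost in USD, with caching and batch discounts applied."""
    cost = ((queries * avg_input_tokens / 1e6) * price_per_m_input
            + (queries * avg_output_tokens / 1e6) * price_per_m_output)
    cost *= (1 - cache_hit_rate * 0.9)  # caching: up to 90% off
    cost *= (1 - batch_rate * 0.5)      # batching: up to 50% off
    return cost

def self_hosted_cost(monthly_gpu_cost_usd, setup_effort_days,
                     eng_hourly_rate_usd=150):
    """GPU rental plus setup effort amortized over 12 months at 8h/day."""
    return monthly_gpu_cost_usd + (setup_effort_days * 8 * eng_hourly_rate_usd) / 12

def effective_monthly_cost(mode, api_cost, hosted_cost):
    """Apply the deployment-mode multiplier from the table above."""
    return {
        "api":               api_cost,
        "managed-inference": api_cost * 1.10,
        "self-hosted-gpu":   hosted_cost,
        "on-prem":           hosted_cost * 1.25,
        "air-gapped":        hosted_cost * 1.40,
    }[mode]
```

For example, 1M queries/month at 500 input and 200 output tokens, priced at $1/$3 per million tokens, costs $1,100/month via API before discounts.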

Hosting matrix

Volume × latency × residency → hosting mode

Volume   | Latency  | Residency       | Recommended
---------|----------|-----------------|--------------------------------
<10K     | Batch OK | No constraint   | Direct cloud API
10K–100K | <2s      | No constraint   | Direct cloud API
100K–1M  | <2s      | EU              | Managed inference (EU)
100K–1M  | <500ms   | Any             | Managed inference (low-latency)
1M–10M   | <500ms   | No constraint   | Self-hosted GPU
1M–10M   | <2s      | Residency-bound | Self-hosted GPU in region
10M+     | Any      | Any             | Self-hosted GPU fleet
Any      | Any      | On-prem         | Self-hosted on-prem (open SLM)
Any      | Any      | Air-gapped      | SLM on-prem disconnected
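One way to encode the matrix is a first-match rule lookup. The thresholds and labels come from the table; the encoding (residency as `None`, `"EU"`, `"on-prem"`, or `"air-gapped"`, latency in milliseconds or `None` for batch-tolerant) is an assumption:

```python
def hosting_mode(monthly_queries, latency_ms, residency):
    """Map volume x latency x residency to a recommended hosting mode.

    residency: None (no constraint), "EU", a region code, "on-prem",
    or "air-gapped". latency_ms: None means batch-tolerant.
    """
    # Residency constraints that force on-prem override everything else.
    if residency == "air-gapped":
        return "SLM on-prem disconnected"
    if residency == "on-prem":
        return "Self-hosted on-prem (open SLM)"
    if monthly_queries >= 10_000_000:
        return "Self-hosted GPU fleet"
    if monthly_queries >= 1_000_000:
        if residency is None and latency_ms is not None and latency_ms <= 500:
            return "Self-hosted GPU"
        return "Self-hosted GPU in region"
    if monthly_queries >= 100_000:
        if latency_ms is not None and latency_ms <= 500:
            return "Managed inference (low-latency)"
        if residency == "EU":
            return "Managed inference (EU)"
    return "Direct cloud API"
```

Rows higher in the table win only because no earlier condition matched, mirroring how the matrix is read top to bottom.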

Integrity

Three commitments.

No vendor sponsorships.

No vendor pays for placement on this tool, methodology page, or in any model card.

Pricing is not pay-to-play.

We list every provider we track. Pricing is sourced from vendor pages and refreshed monthly with a daily snapshot.

Benchmarks cited, not invented.

Every score has a source URL and capture date. We use Artificial Analysis, HELM, HumanEval, AgentBench, and the HuggingFace Open LLM Leaderboard.

Sources

Where the numbers come from.

  • Pricing: vendor product / pricing pages (OpenAI, Anthropic, Google, Mistral, TII Falcon, Alibaba Qwen, Microsoft Phi). Refreshed monthly with daily snapshot diff.
  • Benchmarks: Artificial Analysis, Stanford HELM, HumanEval / MBPP, AgentBench, HuggingFace Open LLM Leaderboard. Capture date stored per row.
  • Hosted regions: vendor compliance and residency documentation; cross-checked quarterly.
  • Self-host estimates: published vendor sizing guides + Buzzi production deployments. Defaults assume ~40% utilization.

Found a number that's wrong? Email hello@buzzi.ai with the source; we publish corrections within 48 hours.