
How we score SLMs vs LLMs.

Rule-based and reproducible: the same inputs produce the same shortlist, every time. Hard filters first cut anything that fails the residency, language, or accuracy-floor requirements. Soft weighted scores then rank the rest by cost (35%), accuracy on your task (35%), latency fit (15%), and a sovereignty bonus (15%) when residency is selected.
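A minimal sketch of that two-stage pipeline in Python. The Model record, its field names, the cost normalization, and the pass/fail latency rule are illustrative assumptions, not the calculator's actual schema:

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    regions: set             # hosted regions, e.g. {"EU", "US"}
    languages: set
    accuracy: float          # task-benchmark score in [0, 1]
    monthly_cost: float      # effective_monthly_cost, defined below
    p95_latency_ms: float
    sovereign: bool          # can run fully in-region or self-hosted

def shortlist(models, region, language, accuracy_floor, latency_budget_ms):
    # Hard filters: failing any one removes the model outright.
    ok = [m for m in models
          if (region is None or region in m.regions)
          and language in m.languages
          and m.accuracy >= accuracy_floor]
    if not ok:
        return []

    costs = [m.monthly_cost for m in ok]
    lo, hi = min(costs), max(costs)

    def score(m):
        # Soft weights: 35% cost, 35% accuracy, 15% latency, 15% sovereignty.
        cheapness   = 1 - (m.monthly_cost - lo) / ((hi - lo) or 1)  # 1.0 = cheapest
        latency_fit = 1.0 if m.p95_latency_ms <= latency_budget_ms else 0.0
        sovereignty = 1.0 if (region and m.sovereign) else 0.0
        return (0.35 * cheapness + 0.35 * m.accuracy
                + 0.15 * latency_fit + 0.15 * sovereignty)

    return sorted(ok, key=score, reverse=True)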

Cost formulas

Per-model monthly cost

monthly_api_cost = (
      (queries * avg_input_tokens  / 1e6) * price_per_M_input
    + (queries * avg_output_tokens / 1e6) * price_per_M_output
)

# Caching discount: up to 90% off input, scaled by cache_hit_rate
monthly_api_cost *= (1 - cache_hit_rate * 0.9)

# Batch discount: up to 50% off when "Batch-tolerant" is selected
monthly_api_cost *= (1 - batch_rate * 0.5)

# Self-hosted (open-weight SLMs): GPU rent plus one-time setup effort
# amortized over 12 months, at 8 working hours per day
self_hosted_cost = (
      monthly_gpu_cost_usd
    + (setup_effort_days * 8 * eng_hourly_rate_usd) / 12
)

# Effective cost by deployment mode
effective_monthly_cost = {
    "api":               monthly_api_cost,
    "managed-inference": monthly_api_cost * 1.10,
    "self-hosted-gpu":   self_hosted_cost,
    "on-prem":           self_hosted_cost * 1.25,
    "air-gapped":        self_hosted_cost * 1.40,
}[chosen_deployment_mode]

Defaults: cache_hit_rate = 0; batch_rate = 1 when "Batch-tolerant" is selected, else 0; eng_hourly_rate_usd = 150.
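Plugging hypothetical numbers into the formulas above (the volumes, prices, and self-host figures are made up for illustration, not vendor quotes):

queries, avg_in, avg_out = 500_000, 400, 150
price_in, price_out = 0.25, 1.00       # USD per million tokens
cache_hit_rate, batch_rate = 0.4, 1    # "Batch-tolerant" selected

api = (queries * avg_in / 1e6) * price_in + (queries * avg_out / 1e6) * price_out
api *= (1 - cache_hit_rate * 0.9)      # caching discount
api *= (1 - batch_rate * 0.5)          # batch discount

gpu_usd, setup_days, rate = 1_800, 10, 150   # hypothetical self-host inputs
self_hosted = gpu_usd + (setup_days * 8 * rate) / 12

print(f"API: ${api:,.2f}  on-prem: ${self_hosted * 1.25:,.2f}")
# -> API: $40.00  on-prem: $3,500.00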

Hosting matrix

Volume × latency × residency → hosting mode

Volume    Latency   Residency        Recommended
<10K      Batch OK  No constraint    Direct cloud API
10K–100K  <2s       No constraint    Direct cloud API
100K–1M   <2s       EU               Managed inference (EU)
100K–1M   <500ms    Any              Managed inference (low-latency)
1M–10M    <500ms    No constraint    Self-hosted GPU
1M–10M    <2s       Residency-bound  Self-hosted GPU in region
10M+      Any       Any              Self-hosted GPU fleet
Any       Any       On-prem          Self-hosted on-prem (open SLM)
Any       Any       Air-gapped       SLM on-prem, disconnected
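A sketch of how the matrix could be applied in code, read top-down with first match winning. The band labels and wildcard encoding are our own, not the calculator's internals:

MATRIX = [
    # (volume band, latency band, residency,       recommendation)
    ("any",      "any",      "air-gapped",      "SLM on-prem, disconnected"),
    ("any",      "any",      "on-prem",         "Self-hosted on-prem (open SLM)"),
    ("10M+",     "any",      "any",             "Self-hosted GPU fleet"),
    ("1M-10M",   "<2s",      "residency-bound", "Self-hosted GPU in region"),
    ("1M-10M",   "<500ms",   "none",            "Self-hosted GPU"),
    ("100K-1M",  "<500ms",   "any",             "Managed inference (low-latency)"),
    ("100K-1M",  "<2s",      "EU",              "Managed inference (EU)"),
    ("10K-100K", "<2s",      "none",            "Direct cloud API"),
    ("<10K",     "batch-ok", "none",            "Direct cloud API"),
]

def recommend(volume_band, latency_band, residency):
    for v, l, r, rec in MATRIX:
        if (v in (volume_band, "any")
                and l in (latency_band, "any")
                and r in (residency, "any")):
            return rec
    return None  # combination not covered by the matrix

recommend("1M-10M", "<2s", "residency-bound")  # -> "Self-hosted GPU in region"

The residency rows sit at the top so that on-prem and air-gapped requirements win regardless of volume or latency.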

Integrity

Three commitments.

No vendor sponsorships.

No vendor pays for placement in this tool, on this methodology page, or in any model card.

Pricing is not pay-to-play.

We list every provider we track. Pricing is sourced from vendor pages, refreshed monthly, and checked against a daily snapshot diff.

Benchmarks cited, not invented.

Every score has a source URL and capture date. We use Artificial Analysis, HELM, HumanEval, AgentBench, and the HuggingFace Open LLM Leaderboard.

Sources

Where the numbers come from.

  • Pricing – vendor product / pricing pages (OpenAI, Anthropic, Google, Mistral, TII Falcon, Alibaba Qwen, Microsoft Phi). Refreshed monthly with daily snapshot diff.
  • Benchmarks – Artificial Analysis, Stanford HELM, HumanEval / MBPP, AgentBench, HuggingFace Open LLM Leaderboard. Capture date stored per row.
  • Hosted regions – vendor compliance and residency documentation; cross-checked quarterly.
  • Self-host estimates – published vendor sizing guides + Buzzi production deployments. Defaults assume ~40% utilization.

Found a number that's wrong? Email hello@buzzi.ai with the source; we publish corrections within 48 hours.