How we score SLM vs LLM.

Rule-based and reproducible. Same inputs, same shortlist, every time. Hard filters remove any model that fails the residency, language, or accuracy-floor requirement. Weighted soft scores rank the rest by cost (35%), accuracy on your task (35%), latency fit (15%), and a sovereignty bonus (15%) applied when residency is selected.
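A minimal Python sketch of the filter-then-rank flow described above. Only the four weights come from this page. The Candidate fields, the 0 to 1 scaling of each sub-score, the choice to skip the residency filter when residency is not selected, and the choice to add nothing (rather than renormalize) when the sovereignty bonus does not apply are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Weights as stated on this page; everything else below is illustrative.
WEIGHTS = {"cost": 0.35, "accuracy": 0.35, "latency": 0.15, "sovereignty": 0.15}


@dataclass
class Candidate:
    name: str
    meets_residency: bool  # hard filter (assumed to apply only when residency is selected)
    meets_language: bool   # hard filter
    accuracy: float        # task accuracy in [0, 1]
    cost_score: float      # assumed scale: 1.0 = cheapest, 0.0 = most expensive
    latency_score: float   # assumed scale: 1.0 = best latency fit
    sovereign: bool        # qualifies for the sovereignty bonus


def shortlist(candidates, accuracy_floor, residency_selected):
    """Apply the hard filters, then rank the survivors by weighted score."""
    survivors = [
        c for c in candidates
        if c.meets_language
        and c.accuracy >= accuracy_floor
        and (c.meets_residency or not residency_selected)
    ]

    def score(c):
        total = (
            WEIGHTS["cost"] * c.cost_score
            + WEIGHTS["accuracy"] * c.accuracy
            + WEIGHTS["latency"] * c.latency_score
        )
        # Assumption: the bonus is simply absent when residency is not selected.
        if residency_selected and c.sovereign:
            total += WEIGHTS["sovereignty"]
        return total

    # sorted() is stable, so ties keep input order: same inputs, same shortlist.
    return sorted(survivors, key=score, reverse=True)
```

Because the function has no randomness and no hidden state, calling it twice with the same list returns the same ranking, which is the reproducibility property the text claims.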

Cost formulas

Per-model monthly cost

monthly_api_cost =
    (queries × avg_input_tokens  / 1M) × price_per_M_input
  + (queries × avg_output_tokens / 1M) × price_per_M_output

# Caching discount — up to 90% off input × cache_hit_rate
monthly_api_cost *= (1 - cache_hit_rate × 0.9)

# Batch discount — up to 50% off when "Batch-tolerant" is selected
monthly_api_cost *= (1 - batch_rate × 0.5)

# Self-hosted (open-weight SLMs)
self_hosted_cost =
    monthly_gpu_cost_usd
  + (setup_effort_days × 8h × eng_hourly_rate_usd) / 12

# Effective cost by deployment mode
effective_monthly_cost = {
  api:                monthly_api_cost,
  managed-inference:  monthly_api_cost × 1.10,
  self-hosted-gpu:    self_hosted_cost,
  on-prem:            self_hosted_cost × 1.25,
  air-gapped:         self_hosted_cost × 1.40,
}[chosen_deployment_mode]

Defaults: cache_hit_rate = 0; batch_rate = 1 when “Batch-tolerant” is selected, else 0; eng_hourly_rate_usd = 150.
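The formulas above can be sketched in Python as follows. The function and parameter names are illustrative, not the tool's actual code. The sketch follows the formulas literally, so the caching discount scales the whole API bill exactly as written, and setup effort is spread over 12 months.

```python
# Deployment-mode multipliers copied from the formulas above.
# Each mode maps to (base cost kind, multiplier).
MODE_MULTIPLIERS = {
    "api": ("api", 1.00),
    "managed-inference": ("api", 1.10),
    "self-hosted-gpu": ("self_hosted", 1.00),
    "on-prem": ("self_hosted", 1.25),
    "air-gapped": ("self_hosted", 1.40),
}


def monthly_api_cost(queries, avg_input_tokens, avg_output_tokens,
                     price_per_m_input, price_per_m_output,
                     cache_hit_rate=0.0, batch_tolerant=False):
    """Monthly API cost in USD, with caching and batch discounts applied."""
    cost = (queries * avg_input_tokens / 1e6) * price_per_m_input
    cost += (queries * avg_output_tokens / 1e6) * price_per_m_output
    cost *= 1 - cache_hit_rate * 0.9       # caching discount, as written above
    batch_rate = 1.0 if batch_tolerant else 0.0
    cost *= 1 - batch_rate * 0.5           # batch discount
    return cost


def self_hosted_cost(monthly_gpu_cost_usd, setup_effort_days,
                     eng_hourly_rate_usd=150.0):
    """Monthly GPU cost plus setup effort (8-hour days) spread over 12 months."""
    return monthly_gpu_cost_usd + (setup_effort_days * 8 * eng_hourly_rate_usd) / 12


def effective_monthly_cost(mode, api_cost, hosted_cost):
    """Pick the base cost for the deployment mode and apply its multiplier."""
    base_kind, multiplier = MODE_MULTIPLIERS[mode]
    base = api_cost if base_kind == "api" else hosted_cost
    return base * multiplier
```

With made-up inputs of 100,000 queries, 1,000 input and 500 output tokens per query, and hypothetical prices of $1 and $2 per million tokens, the undiscounted API cost is $200 per month and the batch discount halves it to $100. A hypothetical $1,200 GPU bill plus 10 setup days at the default rate gives 1,200 + 12,000 / 12 = $2,200 per month self-hosted.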

Hosting matrix

Volume × latency × residency → hosting mode

Volume	Latency	Residency	Recommended
<10K	Batch OK	No constraint	Direct cloud API
10K–100K	<2s	No constraint	Direct cloud API
100K–1M	<2s	EU	Managed inference (EU)
100K–1M	<500ms	Any	Managed inference (low-latency)
1M–10M	<500ms	No constraint	Self-hosted GPU
1M–10M	<2s	Residency-bound	Self-hosted GPU in region
10M+	Any	Any	Self-hosted GPU fleet
Any	Any	On-prem	Self-hosted on-prem (open SLM)
Any	Any	Air-gapped	SLM on-prem disconnected
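One way to read the matrix is as a first-match rule table. The Python sketch below encodes the rows as data and returns the first row that fits. Several things are assumptions: the string labels for latency and residency, the handling of bucket boundaries (for example, exactly 100K falls in the 100K–1M bucket), and the row order, which puts the on-prem and air-gapped rows first so they win over volume-based rows. The matrix does not cover every combination, so the lookup returns None when nothing matches.

```python
# Rows copied from the matrix; "any" is a wildcard.
# Assumed precedence: air-gapped and on-prem rows are checked before volume rows.
HOSTING_MATRIX = [
    ("any", "any", "air-gapped", "SLM on-prem disconnected"),
    ("any", "any", "on-prem", "Self-hosted on-prem (open SLM)"),
    ("<10K", "batch", "none", "Direct cloud API"),
    ("10K-100K", "<2s", "none", "Direct cloud API"),
    ("100K-1M", "<2s", "eu", "Managed inference (EU)"),
    ("100K-1M", "<500ms", "any", "Managed inference (low-latency)"),
    ("1M-10M", "<500ms", "none", "Self-hosted GPU"),
    ("1M-10M", "<2s", "residency-bound", "Self-hosted GPU in region"),
    ("10M+", "any", "any", "Self-hosted GPU fleet"),
]


def volume_bucket(volume):
    """Map a query volume to a matrix bucket (boundary handling is assumed)."""
    if volume < 10_000:
        return "<10K"
    if volume < 100_000:
        return "10K-100K"
    if volume < 1_000_000:
        return "100K-1M"
    if volume < 10_000_000:
        return "1M-10M"
    return "10M+"


def recommend_hosting(volume, latency, residency):
    """Return the first matching recommendation, or None if no row applies."""
    bucket = volume_bucket(volume)
    for row_volume, row_latency, row_residency, mode in HOSTING_MATRIX:
        if (row_volume in ("any", bucket)
                and row_latency in ("any", latency)
                and row_residency in ("any", residency)):
            return mode
    return None
```

Keeping the rows as data rather than nested conditionals makes the precedence explicit and easy to audit against the table.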

Integrity

Three commitments.

No vendor sponsorships.

No vendor pays for placement in this tool, on this methodology page, or in any model card.

Pricing is not pay-to-play.

We list every provider we track. Pricing is sourced from vendor pages and refreshed monthly, with a daily snapshot diff.

Benchmarks cited, not invented.

Every score has a source URL and capture date. We use Artificial Analysis, HELM, HumanEval, AgentBench, and the HuggingFace Open LLM Leaderboard.

Sources

Where the numbers come from.

Pricing — vendor product / pricing pages (OpenAI, Anthropic, Google, Mistral, TII Falcon, Alibaba Qwen, Microsoft Phi). Refreshed monthly with daily snapshot diff.
Benchmarks — Artificial Analysis, Stanford HELM, HumanEval / MBPP, AgentBench, HuggingFace Open LLM Leaderboard. Capture date stored per row.
Hosted regions — vendor compliance and residency documentation; cross-checked quarterly.
Self-host estimates — published vendor sizing guides + Buzzi production deployments. Defaults assume ~40% utilization.

Found a number that's wrong? Email hello@buzzi.ai with the source — we publish corrections within 48 hours.
