How we score risk

Methodology.

The exact formulas, sources and safeguards behind your risk scores and governance scorecard — reproducible, auditable and published in full.

Per-tool risk model

Three multipliers, applied in sequence — then flags that explain the result.

Every tool starts with a base risk score between 1.0 and 5.0 reflecting its intrinsic data-handling profile. Four multipliers sharpen that into your organisation’s realised risk.

  • Tier multiplier

    Free vs paid-consumer vs enterprise. Captures whether the tier contractually excludes training-on-inputs and supports SSO, audit logs and admin controls.

  • Regulated-data multiplier

    Rises with PHI, MNPI, CJIS, PCI, GDPR-EU or generic PII. Tools handling regulated data get scored more harshly than those that only see marketing copy.

  • Governance-gap multiplier

    Amplifies realised risk when controls are weak. A dangerous tool with strong controls lands lower than a mild tool with zero oversight.

  • Tier-level flags

    Every tier-specific attribute that pushed a score (no-DPA, no-SSO, no-data-residency) is recorded as a rationale string on the tool profile.

Risk bands

Final score clamps to 1-10, mapped to four bands.

  • Low

    Score < 4

    Everyone uses it safely — enterprise tier, DPA in place, admin controls wired up.

  • Medium

    4 – 5.99

    Acceptable when data is bounded. Watch tier drift and shadow installs.

  • High

    6 – 7.99

    Needs intervention. Move to a sanctioned tier or replace with a safer alternative.

  • Critical

    ≥ 8

    Get it off your network today. Training-on-inputs default, regulated data, zero admin control.
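The band thresholds above reduce to a single threshold walk. A minimal sketch — the function and band-name strings are illustrative, not the product's API:

```python
def risk_band(score: float) -> str:
    """Map a risk score to one of the four bands above."""
    score = min(max(score, 1.0), 10.0)  # clamp to 1-10, as the model does
    if score >= 8.0:
        return "Critical"
    if score >= 6.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"
```

Note the band edges: 5.99 is still Medium, 6.0 tips into High.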

Governance score

Twelve questions. Zero to a hundred.

Each answer converts to a magnitude between 0 and 4, multiplied by its question's weight; the weighted sum is normalised to a 0-100 score. The result feeds directly into the governance-gap multiplier.

Question domains cover policy, procurement, visibility, access, privacy, training, incident response and monitoring.

  • 01

    Policy · Public AI-use policy signed by exec team

  • 02

    Procurement · AI tools go through vendor review

  • 03

    Visibility · Inventory of tools touching customer data

  • 04

    Access · SSO enforced on sanctioned AI tools

  • 05

    Privacy · DPAs on file for every in-use tool

  • 06

    Training · Annual AI-use training for every employee

+ 6 more questions covering monitoring, incident response and drills.
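The weighting described above can be sketched in a few lines. The dict shapes and per-question weights here are illustrative assumptions — only the 0-4 magnitudes and the 0-100 normalisation come from the methodology:

```python
def governance_score(answers: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted 0-4 answer magnitudes, normalised to a 0-100 score.

    `answers` maps question id -> magnitude (0-4); `weights` maps
    question id -> that question's weight. Both shapes are illustrative.
    """
    max_points = 4 * sum(weights.values())  # every answer at magnitude 4
    points = sum(weights[q] * magnitude for q, magnitude in answers.items())
    return round(100 * points / max_points, 1)
```

With equal weights, answering every question at magnitude 2 lands exactly at 50.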

Cost-of-risk formulas

The math, laid out.

Per-tool risk and overall organisational risk are both reproducible from your answers. Here are the formulas verbatim.

# Per-tool risk (clamped 1.0 - 10.0)
tool_risk = base_risk
          × tier_multiplier           # 0.75 (enterprise)   … 1.4  (free-consumer)
          × regulated_data_multiplier # 1.0  (no reg. data) … 1.8  (multi-regulated)
          × governance_gap_multiplier # 0.85 (mature)       … 1.35 (no governance)

# Overall organisation risk
if any(tool_risk >= 8.0):          overall = CRITICAL
elif count(tool_risk >= 6.0) >= 3: overall = HIGH
else:                              overall = weighted_avg(tools)
                                           × (1.0 + 0.01 × (70 - gov_score))

The weighted_avg step favours higher-risk tools so a single critical outlier isn’t diluted by dozens of safe ones.
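The published formulas run as written once the pseudocode is made concrete. One assumption in the sketch below: the methodology says weighted_avg "favours higher-risk tools" without naming the scheme, so this version weights each tool by its own score — a common way to keep an outlier from being diluted:

```python
def tool_risk(base: float, tier_mult: float, reg_mult: float, gov_mult: float) -> float:
    """Per-tool risk: base x three multipliers, clamped to 1.0-10.0."""
    return min(max(base * tier_mult * reg_mult * gov_mult, 1.0), 10.0)

def overall_risk(tool_risks: list[float], gov_score: float):
    """Overall organisation risk, per the decision rules above."""
    if any(r >= 8.0 for r in tool_risks):
        return "CRITICAL"
    if sum(r >= 6.0 for r in tool_risks) >= 3:
        return "HIGH"
    # Risk-weighted average (illustrative weighting): each tool is
    # weighted by its own score, so riskier tools pull harder.
    weighted = sum(r * r for r in tool_risks) / sum(tool_risks)
    return weighted * (1.0 + 0.01 * (70 - gov_score))
```

A tool with base risk 3.0 on a free-consumer tier (1.4), handling multi-regulated data (1.8) with no governance (1.35) multiplies out past 10 and clamps — exactly the Critical profile described above.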

Registry verification

Verified every 90 days. Audited every change.

  1. Step 01

    Every 90 days

    Editorial re-verification

    Each registry entry is checked against vendor-published terms, pricing, and certifications. Timestamped as last_verified_at.

  2. Step 02

    Weekly

    Freshness cron

    Flags entries past the 90-day window for editorial re-review. No entry is allowed to decay past that window silently.

  3. Step 03

    Ad hoc

    Incident & change log

    New incidents, policy changes and risk shifts land in ai_tools_changelog and surface on tool profile pages within 24 hours.

Peer benchmarks

Anonymous. Bucketed. Suppressed when thin.

Benchmarks aggregate across audits completed in the last 12 months, bucketed by industry + company size. Minimum sample is 15 per bucket — below that we suppress the number rather than mislead.

  • Last 12 months · rolling window

  • Bucketed by industry + company size

  • Minimum 15 audits per bucket

  • Suppressed when data is thinner than threshold

Framework alignment

Tagged to the frameworks auditors already reference.

  • NIST AI RMF

    Govern · Map · Measure · Manage

    Recommendations carry the NIST AI RMF sub-category they address so your risk team can map findings onto existing tracking.

  • EU AI Act

    Article-level tagging

    Tools that touch high-risk uses (Annex III) are flagged with the relevant Article references so compliance sees the exposure immediately.

  • ISO/IEC 42001

    AIMS control mapping

    Governance findings map back to the specific AIMS controls they belong to — useful for teams already on the ISO path.

  • Sector rules

    HIPAA · SR 11-7 · CJIS · PCI-DSS

    Sector overlays are auto-applied based on your industry + regulated-data selections in the context step.

What we don’t do

Three rules that keep the data honest.

  • No vendor pay-to-play.

    Tool ranking is never influenced by commercial relationships. The weights are published and the data is auditable.

  • No guessed scores.

    If a tool hasn’t been verified, it’s flagged — not invented. Missing inputs fall back to the category median rather than a fabricated number.

  • No vibes.

    Every score comes from a formula you can reproduce. If you disagree with the output, you can trace exactly which multiplier changed it.

FAQ

Methodology — in more detail.

How often are tools re-verified?

Every 90 days at minimum, with a weekly freshness cron that flags entries needing attention. Changes — new incidents, policy updates, tier reshuffles — land in the changelog and propagate to tool profiles within 24 hours.

What happens when a tool splits its tiers (free vs paid vs enterprise)?

Each tier is scored independently. The "ChatGPT" row in your audit resolves to the tier your team actually uses. If you mix tiers (some on free, some on Enterprise), we score both and flag the exposure from the weaker tier.

Do you take sponsorships from vendors in the registry?

No. Inclusion and scoring are editorial. Vendors can request review of inaccuracies and we respond within 48 hours, but they cannot alter the weights or the score formula.

Why does my score change when I change governance answers?

The governance-gap multiplier is applied to every tool. Stronger controls (DPA on file, SSO enforced, quarterly vendor review) bring down realised risk for the same tool. The rationale string on each tool profile shows the multipliers that produced the number.

What’s the minimum sample size for peer benchmarks?

15 completed audits per industry + size bucket. Below that threshold we suppress the benchmark rather than mislead with thin data.

Can I see the exact formula?

Yes — see the Cost-of-risk formulas section on this page. Every multiplier is published and the math is reproducible from our documentation.

Found an error?

Corrections welcome.

Spotted a stale score, an out-of-date DPA or a missing incident? Email us with the link to the source. We correct within 48 hours and log every change.

hello@buzzi.ai

Ready to audit?

Start the 12-minute audit.

The formulas above plug into your actual answers — with a risk score per tool, a governance scorecard and a block-list for IT at the end.

Back to audit