How we score risk
Methodology.
The exact formulas, sources and safeguards behind your risk scores and governance scorecard – reproducible, auditable and published in full.
Per-tool risk model
Every tool starts with a base risk score between 1.0 and 5.0 reflecting its intrinsic data-handling profile. Three multipliers sharpen that into your organisation's realised risk.
Tier multiplier · Free vs paid-consumer vs enterprise. Captures whether the tier contractually excludes training-on-inputs and supports SSO, audit logs and admin controls.
Regulated-data multiplier · Rises with PHI, MNPI, CJIS, PCI, GDPR-EU or generic PII. Tools handling regulated data are scored more harshly than those that only see marketing copy.
Governance-gap multiplier · Amplifies realised risk when controls are weak. A dangerous tool with strong controls lands lower than a mild tool with zero oversight.
Rationale strings · Every tier-specific attribute that pushed a score (no-DPA, no-SSO, no-data-residency) is recorded as a rationale string on the tool profile.
Risk bands
Score < 4
Everyone uses it safely – enterprise tier, DPA in place, admin controls wired up.
4 – 5.99
Acceptable when data is bounded. Watch tier drift and shadow installs.
6 β 7.99
Needs intervention. Move to a sanctioned tier or replace with a safer alternative.
≥ 8
Get it off your network today. Training-on-inputs default, regulated data, zero admin control.
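Read top to bottom, the bands reduce to a simple threshold cascade. A minimal Python sketch – the numeric cut-offs come from the table above, while the band names are illustrative labels, not published terminology:

def risk_band(tool_risk: float) -> str:
    # Map a clamped per-tool risk score (1.0 – 10.0) to its band.
    # Band names are illustrative; the thresholds are the published ones.
    if tool_risk >= 8.0:
        return "CRITICAL"  # get it off your network today
    if tool_risk >= 6.0:
        return "HIGH"      # needs intervention
    if tool_risk >= 4.0:
        return "MODERATE"  # acceptable when data is bounded
    return "LOW"           # everyone uses it safely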
Governance score
Each answer converts to a magnitude between 0 and 4 and is weighted by the question's weight; the weighted total is expressed as a 0–100 score. The result feeds directly into the governance-gap multiplier.
Question domains cover policy, procurement, visibility, access, privacy, training, incident response and monitoring.
Policy · Public AI-use policy signed by exec team
Procurement · AI tools go through vendor review
Visibility · Inventory of tools touching customer data
Access · SSO enforced on sanctioned AI tools
Privacy · DPAs on file for every in-use tool
Training · Annual AI-use training for every employee
+ 6 more questions covering monitoring, incident response, training and drills.
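To make the conversion concrete, here is a minimal Python sketch of the scoring step. The function names and the linear shape between the published multiplier endpoints (0.85 mature, 1.35 none) are assumptions; only the 0–4 magnitudes, the per-question weighting and the 0–100 scale come from the description above.

def governance_score(answers: dict[str, int], weights: dict[str, float]) -> float:
    # answers: question id -> magnitude (0–4); weights: question id -> weight.
    earned = sum(weights[q] * magnitude for q, magnitude in answers.items())
    possible = sum(weights[q] * 4 for q in answers)
    return 100.0 * earned / possible

def governance_gap_multiplier(gov_score: float) -> float:
    # Linear interpolation between the published endpoints: 1.35 at a
    # zero score (no governance) down to 0.85 at a perfect score
    # (mature). The linear shape between them is an assumption.
    return 1.35 - (1.35 - 0.85) * (gov_score / 100.0)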
Cost-of-risk formulas
Per-tool risk and overall organisational risk are both reproducible from your answers. Here are the formulas verbatim.
# Per-tool risk (clamped 1.0 – 10.0)
tool_risk = base_risk
  × tier_multiplier            # 0.75 (enterprise) – 1.4 (free-consumer)
  × regulated_data_multiplier  # 1.0 (no regulated) – 1.8 (multi-reg)
  × governance_gap_multiplier  # 0.85 (mature) – 1.35 (none)

# Overall organisation risk
if any(tool_risk >= 8.0): overall = CRITICAL
elif count(tool_risk >= 6.0) >= 3: overall = HIGH
else: overall = weighted_avg(tools) × (1.0 + 0.01 × (70 - gov_score))

The weighted_avg step favours higher-risk tools, so a single critical outlier isn't diluted by dozens of safe ones.
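For teams that want to run the numbers themselves, a direct Python transcription of the pseudocode above. One assumption is flagged inline: the weighting function behind weighted_avg is not itself shown above, so the sketch uses a self-weighted average (each tool weighted by its own score), which matches the stated intent.

def tool_risk(base_risk, tier_mult, reg_data_mult, gov_gap_mult):
    # Per-tool realised risk, clamped to 1.0 – 10.0.
    raw = base_risk * tier_mult * reg_data_mult * gov_gap_mult
    return max(1.0, min(10.0, raw))

def overall_risk(tool_risks, gov_score):
    # Organisation-level roll-up; returns a band label or a number,
    # mirroring the pseudocode above.
    if any(r >= 8.0 for r in tool_risks):
        return "CRITICAL"
    if sum(r >= 6.0 for r in tool_risks) >= 3:
        return "HIGH"
    # Assumed self-weighted average: each tool is weighted by its own
    # score, so one risky outlier isn't diluted by dozens of safe tools.
    weighted_avg = sum(r * r for r in tool_risks) / sum(tool_risks)
    return weighted_avg * (1.0 + 0.01 * (70 - gov_score))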
Registry verification
Step 01
Every 90 days
Each registry entry is checked against vendor-published terms, pricing, and certifications. Timestamped as last_verified_at.
Step 02
Weekly
A weekly freshness pass flags entries past the 90-day window for editorial re-review. No entry is allowed to decay past that window silently.
Step 03
Ad hoc
New incidents, policy changes and risk shifts land in ai_tools_changelog and surface on tool profile pages within 24 hours.
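The weekly pass in Step 02 amounts to a date comparison against last_verified_at. A minimal sketch – the registry shape is an assumption; the field name and the 90-day window are from above:

from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)

def stale_entries(registry: list[dict]) -> list[dict]:
    # Flag entries whose last_verified_at (a timezone-aware datetime)
    # has aged past the 90-day window, for editorial re-review.
    now = datetime.now(timezone.utc)
    return [entry for entry in registry
            if now - entry["last_verified_at"] > STALE_AFTER]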
Peer benchmarks
Benchmarks aggregate across audits completed in the last 12 months, bucketed by industry + company size. Minimum sample is 15 per bucket – below that we suppress the number rather than mislead.
Last 12 months Β· rolling window
Bucketed by industry + company size
Minimum 15 audits per bucket
Suppressed when data is thinner than threshold
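Bucketing and suppression are mechanical enough to sketch. The field names and the choice of median as the aggregate are assumptions; the industry + size bucketing and the 15-audit floor are from above:

from collections import defaultdict
from statistics import median

MIN_SAMPLE = 15  # published suppression threshold

def peer_benchmarks(audits):
    # audits: dicts with industry, size_band and overall_risk for the
    # trailing 12 months. Buckets below the floor return None (suppressed).
    buckets = defaultdict(list)
    for audit in audits:
        buckets[(audit["industry"], audit["size_band"])].append(audit["overall_risk"])
    return {bucket: (median(scores) if len(scores) >= MIN_SAMPLE else None)
            for bucket, scores in buckets.items()}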
Framework alignment
Recommendations carry the NIST AI RMF sub-category they address so your risk team can map findings onto existing tracking.
Tools that touch high-risk uses under the EU AI Act (Annex III) are flagged with the relevant Article references so compliance sees the exposure immediately.
Governance findings map back to the specific ISO/IEC 42001 AIMS controls they belong to – useful for teams already on the ISO path.
Sector overlays are auto-applied based on your industry + regulated-data selections in the context step.
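As an illustration of how an overlay lookup might work – the table below is purely hypothetical, not the production mapping; only the trigger (industry + regulated-data selections from the context step) comes from above:

# Hypothetical overlay table; keys and overlay names are illustrative.
SECTOR_OVERLAYS = {
    ("healthcare", "PHI"): ["HIPAA overlay"],
    ("finance", "PCI"): ["PCI DSS overlay"],
    ("public-sector", "CJIS"): ["CJIS overlay"],
}

def overlays_for(industry: str, regulated_data: set[str]) -> list[str]:
    # Auto-apply every overlay triggered by the context-step selections.
    applied = []
    for data_type in regulated_data:
        applied += SECTOR_OVERLAYS.get((industry, data_type), [])
    return applied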
What we don't do
Tool ranking is never influenced by commercial relationships. The weights are published and the data is auditable.
If a tool hasn't been verified, it's flagged – not invented. Missing inputs fall back to the category median rather than a fabricated number.
Every score comes from a formula you can reproduce. If you disagree with the output, you can trace exactly which multiplier changed it.
FAQ
How often is the registry re-verified?
Every 90 days at minimum, with a weekly freshness cron that flags entries needing attention. Changes – new incidents, policy updates, tier reshuffles – land in the changelog and propagate to tool profiles within 24 hours.
What if my team uses multiple tiers of the same tool?
Each tier is scored independently. The "ChatGPT" row in your audit resolves to the tier your team actually uses. If you mix tiers (some on free, some on Enterprise), we score both and flag the exposure from the weaker tier.
Can vendors pay to influence their scores?
No. Inclusion and scoring are editorial. Vendors can request review of inaccuracies and we respond within 48 hours, but they cannot alter the weights or the score formula.
How do my governance answers affect per-tool scores?
The governance-gap multiplier is applied to every tool. Stronger controls (DPA on file, SSO enforced, quarterly vendor review) bring down realised risk for the same tool. The rationale string on each tool profile shows the multipliers that produced the number.
What is the minimum sample for peer benchmarks?
15 completed audits per industry + size bucket. Below that threshold we suppress the benchmark rather than mislead with thin data.
Are the scoring formulas public?
Yes – see the Cost-of-risk formulas section on this page. Every multiplier is published and the math is reproducible from our documentation.
Found an error?
Spotted a stale score, an out-of-date DPA or a missing incident? Email us with the link to the source. We correct within 48 hours and log every change.
hello@buzzi.ai
Ready to audit?
The formulas above plug into your actual answers – producing a risk score per tool, a governance scorecard and a block-list for IT at the end.
Back to audit