Methodology v1.0 · April 2026

12-Point AI Vendor Trust Score methodology.

How we turn 12 wizard answers into a verdict, a 7-category scorecard, and a contract-clause cheat sheet. The flag library, the scorecard rollup, the verdict cascade, and the clause mapping are all open and deterministic. Same answers, same verdict, every time.

The 12 questions

Q1 vendor archetype, Q2 engagement type, Q3 budget band, Q4 who is leading the conversation, Q5 working example, Q6 pricing model, Q7 underlying AI model, Q8 data handling, Q9 ongoing costs, Q10 IP ownership, Q11 timeline, Q12 references. Wording adapts for six vendor archetypes (consultancy, agency, SaaS, freelancer, staff-aug, mixed) so the question feels native; the option codes stay constant so the engine is variant-stable.

Each option encodes a discrete declarative answer (e.g. live_demo_client, slides_only). The engine reads option codes only - we never let an LLM interpret free-form vendor answers, so reproducibility is total.
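
As a sketch of the shape the engine consumes - TypeScript and the type name are ours for illustration, and beyond live_demo_client and slides_only the option codes shown are hypothetical:

  // Shape of a completed wizard submission (illustrative).
  // Only option codes reach the engine - never free text.
  type Answers = {
    archetype: "consultancy" | "agency" | "saas" | "freelancer" | "staff_aug" | "mixed";
    workingExample: "live_demo_client" | "generic_demo" | "slides_only" | "none";
    references: "named_clients" | "linkedin_only" | "anonymized" | "none";
    // ...the remaining nine questions follow the same pattern
  };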

How flags fire

Each flag is a deterministic predicate over the 12 answer codes. Predicates are pure - no randomness, no LLM call, no time dependency. A flag has three properties: severity (red or yellow), the categories it touches, and a list of references to the procurement-failure case studies that justify the rule.

Flags do not stack - each flag fires at most once. A pitch can trigger zero, one, or many flags. The full list (currently 25) is below.
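
A minimal sketch of what that means in practice - field names and the example flag are illustrative, not the shipping code:

  type Severity = "red" | "yellow";

  interface Flag {
    id: string;
    severity: Severity;
    categories: string[];            // scorecard categories this flag touches
    references: string[];            // case-study URLs that justify the rule
    fires: (a: Answers) => boolean;  // pure predicate: no I/O, no clock, no LLM
  }

  // Illustrative predicate for "No working example to show":
  const noWorkingExample: Flag = {
    id: "no_working_example",
    severity: "red",
    categories: ["customer proof"],
    references: [/* case-study URLs */],
    fires: (a) => a.workingExample === "none",
  };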

How we score: the verdict cascade

After flags are evaluated, we tally red and yellow counts. The verdict is a deterministic function of those two integers:

if red >= 3                  -> high_risk        (walk away unless amended)
if red >= 1                  -> concerns          (resolve before signing)
if red == 0 && yellow >= 4   -> mixed             (ask the recommended questions)
if red == 0 && yellow <= 1   -> looks_legit       (final due diligence)
otherwise                    -> worth_proceeding  (with the open questions)
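
The cascade transcribes directly into code. A behaviorally identical TypeScript version (names ours):

  type Verdict = "high_risk" | "concerns" | "mixed" | "looks_legit" | "worth_proceeding";

  function verdict(red: number, yellow: number): Verdict {
    if (red >= 3) return "high_risk";       // walk away unless amended
    if (red >= 1) return "concerns";        // resolve before signing
    if (yellow >= 4) return "mixed";        // red is 0 here: ask the questions
    if (yellow <= 1) return "looks_legit";  // final due diligence
    return "worth_proceeding";              // red 0, yellow 2 or 3
  }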

The 7-category scorecard tracks per-category red/yellow tallies. A category with zero of both is marked clean. Categories never aggregate into a numeric score - they are diagnostic, not summative.
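
The rollup itself is a plain tally. A sketch, reusing the illustrative Flag type from above:

  // Per-category red/yellow tally over the fired flags.
  function scorecard(fired: Flag[]): Map<string, { red: number; yellow: number }> {
    const tally = new Map<string, { red: number; yellow: number }>();
    for (const flag of fired) {
      for (const cat of flag.categories) {
        const t = tally.get(cat) ?? { red: 0, yellow: 0 };
        t[flag.severity] += 1;
        tally.set(cat, t);
      }
    }
    return tally; // a category at { red: 0, yellow: 0 } renders as clean
  }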

How we recommend clauses

Every triggered flag maps to one or more contract clauses in the 21-clause library. The mapping is many-to-many: token_cost_trap recommends monthly_cost_cap, quarterly_cost_review, and 30day_cost_notice. unclear_ip_red recommends ip_ownership_clause, derivative_works_clause, and source_code_escrow.
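
In code the mapping is a lookup table plus a dedupe. A sketch using the flag and clause ids named above:

  // Many-to-many: each fired flag recommends one or more clause ids.
  const clauseMap: Record<string, string[]> = {
    token_cost_trap: ["monthly_cost_cap", "quarterly_cost_review", "30day_cost_notice"],
    unclear_ip_red: ["ip_ownership_clause", "derivative_works_clause", "source_code_escrow"],
    // ...one entry per flag
  };

  function recommendClauses(fired: Flag[]): string[] {
    return [...new Set(fired.flatMap((f) => clauseMap[f.id] ?? []))]; // dedupe across flags
  }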

The PDF report ships every recommended clause with US-default sample copy. The clause library is below for quick reference; have your counsel adapt the language to your jurisdiction before signing.

Rule library

All 25 flag patterns

Each flag is a deterministic predicate over the 12 answers, mapped to one or more risk categories. References cite the procurement-failure case studies that justify the rule.

  • No working example to show (red): customer proof
  • Slides-only sales pitch (yellow): customer proof · technical credibility
  • Generic demo, not yours (yellow): customer proof
  • No references at six-figure budget (red): customer proof · relationship trust
  • No customer references (yellow): customer proof
  • LinkedIn-only references (yellow): customer proof
  • Only anonymized references (yellow): customer proof
  • Vague on the underlying model (red): technical credibility
  • Claims a proprietary model (yellow): technical credibility
  • Switches between frontier models (yellow): technical credibility · cost predictability
  • Data handling answer was vague (red): technical credibility · operational competence
  • You did not ask about data handling (yellow): operational competence
  • You did not ask about the model (yellow): operational competence
  • Token-cost pass-through trap (red): cost predictability · pricing transparency
  • All-inclusive pricing too good to be true (yellow): pricing transparency · cost predictability
  • Ongoing pricing not discussed (red): pricing transparency · relationship trust
  • Vendor keeps the IP (yellow): ip protection
  • IP ownership is unclear (red): ip protection · relationship trust
  • Sales-led pitch with no proof (red): operational competence · relationship trust · customer proof
  • Open-ended timeline (yellow): operational competence · cost predictability
  • Unknown delivery lead (yellow): relationship trust
  • Large budget without milestones (red): cost predictability · operational competence
  • Looks like a thin wrapper at premium price (red): technical credibility · pricing transparency
  • Staff aug without named engineers (yellow): operational competence
  • Freelancer at six-figure budget (red): operational competence · relationship trust

Contract clauses

All 21 contract clauses

US-default jurisdiction. Sample copy ships in the PDF. Have your counsel adapt the language to your jurisdiction before signing.

  • Monthly cost cap: cost predictability
  • Quarterly cost review: cost predictability
  • 30-day cost-change notice: cost predictability
  • Cost assumption disclosure: pricing transparency
  • Itemized cost breakdown: pricing transparency
  • IP ownership: ip protection
  • Derivative works: ip protection
  • Source code escrow: ip protection
  • Buyout option: ip protection
  • Data portability: ip protection
  • Data processing addendum: operational competence
  • Data residency: operational competence
  • SLA / uptime: operational competence
  • Reference performance guarantee: customer proof
  • Pilot-first clause: customer proof
  • Fixed milestones: operational competence
  • Milestone payments: operational competence
  • SOW amendment process: operational competence
  • Key personnel: relationship trust
  • Model disclosure: technical credibility
  • Termination for convenience: relationship trust

Honest about limits

What this tool is not.

This is not legal advice. The clause library ships US-default sample copy as a starting point. Have your counsel adapt the language to your jurisdiction, your industry, and the specific contract.

This is not a credit check. The scorecard reads how the vendor pitched, not how the vendor is run. SOC 2, financial stability, and litigation history live in your standard procurement workflow.

This is not a verdict on the vendor. A high_risk verdict means the pitch reads risky on the patterns we encoded. Vendors can answer the open questions, amend the contract, and recover.

Versioning

Methodology v1.0 - April 2026.

Refresh cadence is quarterly. New procurement-failure patterns get added; outdated flags get retired. Each release ships with a numbered version, a change log, and the date of the next refresh.

A self-pass test runs in CI: buzzi.ai answers the 12 questions in writing and the engine must return looks_legit or worth_proceeding with zero red flags. If the test ever fails, we revisit the rules before shipping.
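
In sketch form - the test runner and import paths are assumptions for illustration, not our actual CI setup:

  import { test, expect } from "vitest";
  import { allFlags, verdict } from "./engine";         // hypothetical module
  import { selfPitchAnswers } from "./fixtures/buzzi";  // our own 12 answers, hypothetical path

  test("buzzi.ai passes its own methodology", () => {
    const fired = allFlags.filter((f) => f.fires(selfPitchAnswers));
    const red = fired.filter((f) => f.severity === "red").length;
    const yellow = fired.filter((f) => f.severity === "yellow").length;
    expect(red).toBe(0);
    expect(["looks_legit", "worth_proceeding"]).toContain(verdict(red, yellow));
  });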

Citations

Where the rules come from.

Every flag carries reference URLs to the procurement-failure case studies that justify the rule. They live inline in the flag library above; click any flag to see its sources.

Ready?

Score a vendor in five minutes.

Twelve questions. Free preview. Methodology you just read - applied to your pitch.

Run the 12-question wizard