Methodology v1.0 · April 2026

12-Point AI Vendor Trust Score methodology.

How we turn 12 wizard answers into a verdict, a 7-category scorecard, and a contract-clause cheat sheet. The flag library, the scorecard rollup, the verdict cascade, and the clause mapping are all open and deterministic. Same answers, same verdict, every time.

The 12 questions

Q1 vendor archetype, Q2 engagement type, Q3 budget band, Q4 who is leading the conversation, Q5 working example, Q6 pricing model, Q7 underlying AI model, Q8 data handling, Q9 ongoing costs, Q10 IP ownership, Q11 timeline, Q12 references. Wording adapts for six vendor archetypes (consultancy, agency, SaaS, freelancer, staff-aug, mixed) so the question feels native; the option codes stay constant so the engine is variant-stable.

Each option encodes a discrete declarative answer (e.g. live_demo_client, slides_only). The engine reads option codes only - we never let an LLM interpret free-form vendor answers, so reproducibility is total.
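
As a sketch of the shape the engine consumes - TypeScript and the type name are ours for illustration, and beyond live_demo_client and slides_only the option codes shown are hypothetical:

  // Shape of a completed wizard submission (illustrative).
  // Only option codes reach the engine - never free text.
  type Answers = {
    archetype: "consultancy" | "agency" | "saas" | "freelancer" | "staff_aug" | "mixed";
    workingExample: "live_demo_client" | "generic_demo" | "slides_only" | "none";
    references: "named_clients" | "linkedin_only" | "anonymized" | "none";
    // ...the remaining nine questions follow the same pattern
  };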

How flags fire

Each flag is a deterministic predicate over the 12 answer codes. Predicates are pure - no randomness, no LLM call, no time dependency. A flag has three properties: severity (red or yellow), the categories it touches, and a list of references to the procurement-failure case studies that justify the rule.

Flags do not stack - each flag fires at most once. A pitch can trigger zero, one, or many flags. The full list (currently 25) is below.
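
A minimal sketch of what that means in practice - field names and the example flag are illustrative, not the shipping code:

  type Severity = "red" | "yellow";

  interface Flag {
    id: string;
    severity: Severity;
    categories: string[];            // scorecard categories this flag touches
    references: string[];            // case-study URLs that justify the rule
    fires: (a: Answers) => boolean;  // pure predicate: no I/O, no clock, no LLM
  }

  // Illustrative predicate for "No working example to show":
  const noWorkingExample: Flag = {
    id: "no_working_example",
    severity: "red",
    categories: ["customer proof"],
    references: [/* case-study URLs */],
    fires: (a) => a.workingExample === "none",
  };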

How we score: the verdict cascade

After flags are evaluated, we tally red and yellow counts. The verdict is a deterministic function of those two integers:

if red >= 3                  -> high_risk        (walk away unless amended)
if red >= 1                  -> concerns          (resolve before signing)
if red == 0 && yellow >= 4   -> mixed             (ask the recommended questions)
if red == 0 && yellow <= 1   -> looks_legit       (final due diligence)
otherwise                    -> worth_proceeding  (with the open questions)
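
The cascade transcribes directly into code. A behaviorally identical TypeScript version (names ours):

  type Verdict = "high_risk" | "concerns" | "mixed" | "looks_legit" | "worth_proceeding";

  function verdict(red: number, yellow: number): Verdict {
    if (red >= 3) return "high_risk";       // walk away unless amended
    if (red >= 1) return "concerns";        // resolve before signing
    if (yellow >= 4) return "mixed";        // red is 0 here: ask the questions
    if (yellow <= 1) return "looks_legit";  // final due diligence
    return "worth_proceeding";              // red 0, yellow 2 or 3
  }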

The 7-category scorecard tracks per-category red/yellow tallies. A category with zero of both is marked clean. Categories never aggregate into a numeric score - they are diagnostic, not summative.
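
The rollup itself is a plain tally. A sketch, reusing the illustrative Flag type from above:

  // Per-category red/yellow tally over the fired flags.
  function scorecard(fired: Flag[]): Map<string, { red: number; yellow: number }> {
    const tally = new Map<string, { red: number; yellow: number }>();
    for (const flag of fired) {
      for (const cat of flag.categories) {
        const t = tally.get(cat) ?? { red: 0, yellow: 0 };
        t[flag.severity] += 1;
        tally.set(cat, t);
      }
    }
    return tally; // a category at { red: 0, yellow: 0 } renders as clean
  }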

How we recommend clauses

Every triggered flag maps to one or more contract clauses in the 21-clause library. The mapping is many-to-many: token_cost_trap recommends monthly_cost_cap, quarterly_cost_review, and 30day_cost_notice. unclear_ip_red recommends ip_ownership_clause, derivative_works_clause, and source_code_escrow.
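
In code the mapping is a lookup table plus a dedupe. A sketch using the flag and clause ids named above:

  // Many-to-many: each fired flag recommends one or more clause ids.
  const clauseMap: Record<string, string[]> = {
    token_cost_trap: ["monthly_cost_cap", "quarterly_cost_review", "30day_cost_notice"],
    unclear_ip_red: ["ip_ownership_clause", "derivative_works_clause", "source_code_escrow"],
    // ...one entry per flag
  };

  function recommendClauses(fired: Flag[]): string[] {
    return [...new Set(fired.flatMap((f) => clauseMap[f.id] ?? []))]; // dedupe across flags
  }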

The PDF report ships every recommended clause with US-default sample copy. The clause library is below for quick reference; have your counsel adapt the language to your jurisdiction before signing.

Rule library

All 25 flag patterns

Each flag is a deterministic predicate over the 12 answers, mapped to one or more risk categories. References cite the procurement-failure case studies that justify the rule.

  • No working example to show (red): customer proof
  • Slides-only sales pitch (yellow): customer proof · technical credibility
  • Generic demo, not yours (yellow): customer proof
  • No references at six-figure budget (red): customer proof · relationship trust
  • No customer references (yellow): customer proof
  • LinkedIn-only references (yellow): customer proof
  • Only anonymized references (yellow): customer proof
  • Vague on the underlying model (red): technical credibility
  • Claims a proprietary model (yellow): technical credibility
  • Switches between frontier models (yellow): technical credibility · cost predictability
  • Data handling answer was vague (red): technical credibility · operational competence
  • You did not ask about data handling (yellow): operational competence
  • You did not ask about the model (yellow): operational competence
  • Token-cost pass-through trap (red): cost predictability · pricing transparency
  • All-inclusive pricing too good to be true (yellow): pricing transparency · cost predictability
  • Ongoing pricing not discussed (red): pricing transparency · relationship trust
  • Vendor keeps the IP (yellow): ip protection
  • IP ownership is unclear (red): ip protection · relationship trust
  • Sales-led pitch with no proof (red): operational competence · relationship trust · customer proof
  • Open-ended timeline (yellow): operational competence · cost predictability
  • Unknown delivery lead (yellow): relationship trust
  • Large budget without milestones (red): cost predictability · operational competence
  • Looks like a thin wrapper at premium price (red): technical credibility · pricing transparency
  • Staff aug without named engineers (yellow): operational competence
  • Freelancer at six-figure budget (red): operational competence · relationship trust

Contract clauses

All 21 contract clauses

US-default jurisdiction. Sample copy ships in the PDF. Have your counsel adapt the language to your jurisdiction before signing.

  • Monthly cost cap: cost predictability
  • Quarterly cost review: cost predictability
  • 30-day cost-change notice: cost predictability
  • Cost assumption disclosure: pricing transparency
  • Itemized cost breakdown: pricing transparency
  • IP ownership: ip protection
  • Derivative works: ip protection
  • Source code escrow: ip protection
  • Buyout option: ip protection
  • Data portability: ip protection
  • Data processing addendum: operational competence
  • Data residency: operational competence
  • SLA / uptime: operational competence
  • Reference performance guarantee: customer proof
  • Pilot-first clause: customer proof
  • Fixed milestones: operational competence
  • Milestone payments: operational competence
  • SOW amendment process: operational competence
  • Key personnel: relationship trust
  • Model disclosure: technical credibility
  • Termination for convenience: relationship trust

Honest about limits

What this tool is not.

This is not legal advice. The clause library ships US-default sample copy as a starting point. Have your counsel adapt the language to your jurisdiction, your industry, and the specific contract.

This is not a credit check. The scorecard reads how the vendor pitched, not how the vendor is run. SOC 2, financial stability, and litigation history live in your standard procurement workflow.

This is not a verdict on the vendor. A high_risk verdict means the pitch reads risky on the patterns we encoded. Vendors can answer the open questions, amend the contract, and recover.

Versioning

Methodology v1.0 - April 2026.

Refresh cadence is quarterly. New procurement-failure patterns get added; outdated flags get retired. Each release ships with a numbered version, a change log, and the date of the next refresh.

A self-pass test runs in CI: buzzi.ai answers the 12 questions in writing and the engine must return looks_legit or worth_proceeding with zero red flags. If the test ever fails, we revisit the rules before shipping.
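
In sketch form - the test runner and import paths are assumptions for illustration, not our actual CI setup:

  import { test, expect } from "vitest";
  import { allFlags, verdict } from "./engine";         // hypothetical module
  import { selfPitchAnswers } from "./fixtures/buzzi";  // our own 12 answers, hypothetical path

  test("buzzi.ai passes its own methodology", () => {
    const fired = allFlags.filter((f) => f.fires(selfPitchAnswers));
    const red = fired.filter((f) => f.severity === "red").length;
    const yellow = fired.filter((f) => f.severity === "yellow").length;
    expect(red).toBe(0);
    expect(["looks_legit", "worth_proceeding"]).toContain(verdict(red, yellow));
  });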

Citations

Where the rules come from.

Every flag carries reference URLs to the procurement-failure case studies that justify the rule. They live inline in the flag library above; click any flag to see its sources.

Ready?

Score a vendor in five minutes.

Twelve questions. Free preview. Methodology you just read - applied to your pitch.

Run the 12-question wizard