AI Vendor Risk Assessment

Use this when you are evaluating an AI vendor before buying: a model provider (OpenAI, Anthropic, a hosting platform), an AI-native product, or a SaaS tool that has quietly embedded an LLM. It covers the risks generic vendor review misses: whether they train on your data, what happens when they sunset the model you built on, and whether usage-based pricing becomes unaffordable at scale. If the tool has no AI component and you only need a standard security posture review, use /vendor-security-assessment instead. If you have not yet decided whether to buy at all, start with /build-vs-buy.

Related skills: Use /vendor-security-assessment for the underlying security posture review (certs, encryption, access controls). Use /vendor-evaluation for broader vendor fit beyond AI risk. Use /build-vs-buy when this is part of a build-vs-buy decision. Record findings in /ai-risk-register.

The hard part most teams miss

Organizations buy AI capability far faster than they govern it. The security questionnaire that worked for a logging SaaS does not surface the failures that actually sink AI vendors.

"Do they train on your data" is the question that matters, and it has two hidden halves: retention and sub-processors. Most teams ask the headline question, get a "no," and stop. But a "no" on training is worthless if prompts and outputs are retained for 30 days for "abuse monitoring," or if the vendor routes your data through a sub-processor whose terms you never read. The data terms are the assessment. Everything else is secondary.
Demos hide model deprecation and lock-in. The demo runs on whatever model is best today. It tells you nothing about what happens when the vendor sunsets that model in eighteen months, changes its behavior in a silent update, or deprecates the fine-tune you spent a quarter building on. Ask what the deprecation policy is, what notice you get, and whether you can pin a version. A vendor with no answer is a vendor planning to move your floor without telling you.
The pricing model is a strategic risk, not a line item. Usage-based and token-based pricing means your cost scales with your success. A product that is cheap in pilot can become unaffordable the month it works, and you will have built your economics on a number the vendor controls. Model the cost at 10x and 100x current volume before signing, not after.

Process

Step 1: Gather inputs

Ask the user:

What is the vendor and what does the AI do? ({{vendor_name}} and the job the AI performs, in one sentence.)
Is this a model provider, an AI-native product, or a SaaS tool with embedded AI? (This changes which risks dominate.)
What data flows to the vendor? ({{data_types}}: prompts, documents, customer PII, code, proprietary content. Be specific.)
What would you build on top of it? (Prompts, fine-tunes, retrieval indexes, agent workflows. This sets your lock-in exposure.)
What is the expected volume now, and at success? (Requests, tokens, or seats today and at 10x. This sets your pricing risk.)
What is your regulatory context? (GDPR, EU AI Act role, sector rules. This sets compliance fit.)
What is the cost of the AI being wrong, slow, or unavailable? (This sets how hard you press on reliability and transparency.)

Depth should be proportional to exposure. A vendor that summarizes public help articles gets a lighter review than one that ingests customer PII into a model you cannot inspect.

Step 2: Assess data handling

This is the core of the review. Press until claims are evidenced, not asserted.

Training: Does the vendor train or fine-tune on your prompts, outputs, or uploaded data? Is it on by default? Can you contractually opt out, and is the opt-out the default for your tier?
Retention: How long are prompts and outputs retained, and why? "Zero retention" and "30-day abuse-monitoring retention" are very different risk profiles. Get the number.
Sub-processors: Who else touches the data? Which model providers, hosting, or analytics vendors sit behind this one? Is there a published sub-processor list and a change-notification commitment?
Data region: Where is data processed and stored? Can residency be contractually guaranteed, or is it best-effort?
Output handling: Are outputs logged, reviewed by humans, or used to improve the service in any way?

A vendor who cannot give you a written, specific answer on training and retention is a red flag regardless of how good the product is.

Step 3: Assess security, provenance, and reliability

Security posture: SOC 2 Type II or ISO 27001, encryption at rest and in transit, access controls and MFA. For depth here, hand off to /vendor-security-assessment; do not duplicate it.
Model provenance and transparency: Which underlying models power the product? Are they first-party or resold? Does the vendor publish or share eval results, known failure modes, and benchmarks, or is it a black box?
Update and deprecation policy: How often do models change? Do you get notice before behavior changes? Can you pin a version? What is the deprecation timeline and migration support when a model is sunset?
Reliability: Is there a real SLA with credits, or marketing uptime language? What are the rate limits, and do they throttle at your expected volume? Is there a public status page with incident history you can read?

Step 4: Assess lock-in, pricing, and compliance fit

Lock-in and portability: Can you export prompts, fine-tune artifacts, embeddings, and conversation data in a usable format? Are you building on proprietary APIs with no equivalent elsewhere, or on portable patterns? What is the realistic cost and time to migrate away?
Pricing-model risk: Is pricing per-token, per-seat, per-request, or flat? Model the monthly cost at current, 10x, and 100x volume. Are there committed-use discounts, price-change notice terms, and a cap on overage? Usage pricing with no ceiling is a strategic exposure, not a line item.
Compliance fit: Does the vendor support your GDPR obligations (DPA, SCCs)? Under the EU AI Act, are they a provider and are you the deployer, and do their obligations and yours line up? Is there a documented incident history, and have they had a model-safety or data incident you should know about?

Step 5: Score and recommend

Score each domain Red, Yellow, or Green. Red on any data-handling item is a presumptive disqualifier unless a compensating control exists.

## AI Vendor Risk Assessment -- {{vendor_name}} -- {{date}}

**Vendor type:** Model provider / AI-native product / SaaS with embedded AI
**What the AI does:** {{one_line}}
**Data sent:** {{data_types}}

| Domain | Finding | Rating | Note |
|---|---|---|---|
| Trains on your data | {{yes_no_optout}} | Red / Yellow / Green | {{note}} |
| Retention | {{duration_and_reason}} | Red / Yellow / Green | {{note}} |
| Sub-processors | {{who_and_notice}} | Red / Yellow / Green | {{note}} |
| Data region | {{region_guarantee}} | Red / Yellow / Green | {{note}} |
| Security posture | {{certs_encryption}} | Red / Yellow / Green | {{note}} |
| Model provenance / transparency | {{models_and_evals}} | Red / Yellow / Green | {{note}} |
| Update / deprecation policy | {{notice_and_pinning}} | Red / Yellow / Green | {{note}} |
| Reliability (SLA, limits, status) | {{sla_and_history}} | Red / Yellow / Green | {{note}} |
| Lock-in / portability | {{export_and_migration}} | Red / Yellow / Green | {{note}} |
| Pricing-model risk | {{model_and_scale_cost}} | Red / Yellow / Green | {{note}} |
| Compliance fit | {{gdpr_ai_act_incidents}} | Red / Yellow / Green | {{note}} |

**Overall risk:** Low / Medium / High / Critical

**Pricing at scale (show the math):**
- Now ({{current_volume}}): {{cost}}
- 10x: {{cost}}
- 100x: {{cost}}

**Red flags:** {{list}}
**Residual risks and compensating controls:** {{list}}

**Recommendation:** Approve / Approve with conditions / Reject
- If conditional: {{conditions, timeline, reassessment trigger}}
- If reject: {{specific disqualifiers}}

Rating definitions: Red is a disqualifier or critical gap (trains on your data with no opt-out, no version pinning on a model you would build on, uncapped usage pricing with no notice terms). Yellow is acceptable with conditions (30-day retention, SLA without credits, single-region only). Green meets or exceeds requirements.

Step 6: Review