Use this when you are evaluating an AI vendor before buying: a model provider (OpenAI, Anthropic, a hosting platform), an AI-native product, or a SaaS tool that has quietly embedded an LLM. It covers the risks generic vendor review misses: whether they train on your data, what happens when they sunset the model you built on, and whether usage-based pricing becomes unaffordable at scale. If the tool has no AI component and you only need a standard security posture review, use /vendor-security-assessment instead. If you have not yet decided whether to buy at all, start with /build-vs-buy.
Related skills: Use /vendor-security-assessment for the underlying security posture review (certs, encryption, access controls). Use /vendor-evaluation for broader vendor fit beyond AI risk. Use /build-vs-buy when this is part of a build-vs-buy decision. Record findings in /ai-risk-register.
The hard part most teams miss
Organizations buy AI capability far faster than they govern it. The security questionnaire that worked for a logging SaaS does not surface the failures that actually sink AI vendors.
- "Do they train on your data" is the question that matters, and it has two hidden halves: retention and sub-processors. Most teams ask the headline question, get a "no," and stop. But a "no" on training is worthless if prompts and outputs are retained for 30 days for "abuse monitoring," or if the vendor routes your data through a sub-processor whose terms you never read. The data terms are the assessment. Everything else is secondary.
- Demos hide model deprecation and lock-in. The demo runs on whatever model is best today. It tells you nothing about what happens when the vendor sunsets that model in eighteen months, changes its behavior in a silent update, or deprecates the fine-tune you spent a quarter building on. Ask what the deprecation policy is, what notice you get, and whether you can pin a version. A vendor with no answer is a vendor planning to move your floor without telling you.
- The pricing model is a strategic risk, not a line item. Usage-based and token-based pricing means your cost scales with your success. A product that is cheap in pilot can become unaffordable the month it works, and you will have built your economics on a number the vendor controls. Model the cost at 10x and 100x current volume before signing, not after.
Process
Step 1: Gather inputs
Ask the user:
- What is the vendor and what does the AI do? ({{vendor_name}} and the job the AI performs, in one sentence.)
- Is this a model provider, an AI-native product, or a SaaS tool with embedded AI? (This changes which risks dominate.)
- What data flows to the vendor? ({{data_types}}: prompts, documents, customer PII, code, proprietary content. Be specific.)
- What would you build on top of it? (Prompts, fine-tunes, retrieval indexes, agent workflows. This sets your lock-in exposure.)
- What is the expected volume now, and at success? (Requests, tokens, or seats today and at 10x. This sets your pricing risk.)
- What is your regulatory context? (GDPR, EU AI Act role, sector rules. This sets compliance fit.)
- What is the cost of the AI being wrong, slow, or unavailable? (This sets how hard you press on reliability and transparency.)
Depth should be proportional to exposure. A vendor that summarizes public help articles gets a lighter review than one that ingests customer PII into a model you cannot inspect.
Step 2: Assess data handling
This is the core of the review. Press until claims are evidenced, not asserted.
- Training: Does the vendor train or fine-tune on your prompts, outputs, or uploaded data? Is it on by default? Can you contractually opt out, and is the opt-out the default for your tier?
- Retention: How long are prompts and outputs retained, and why? "Zero retention" and "30-day abuse-monitoring retention" are very different risk profiles. Get the number.
- Sub-processors: Who else touches the data? Which model providers, hosting, or analytics vendors sit behind this one? Is there a published sub-processor list and a change-notification commitment?
- Data region: Where is data processed and stored? Can residency be contractually guaranteed, or is it best-effort?
- Output handling: Are outputs logged, reviewed by humans, or used to improve the service in any way?
A vendor who cannot give you a written, specific answer on training and retention is a red flag regardless of how good the product is.
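One way to keep "evidenced, not asserted" honest is to record, for each data-handling item, both the vendor's answer and the written source that backs it. The sketch below is illustrative only: the `DataHandlingAnswer` record, its field names, and the `red_flags` helper are hypothetical, and the only rule it encodes is the one stated above, that training and retention need a written, specific answer.

```python
from dataclasses import dataclass
from typing import List, Optional

# Items that must have a written, specific answer (per the rule above).
MUST_BE_WRITTEN = ("Training", "Retention")


@dataclass(frozen=True)
class DataHandlingAnswer:
    """Hypothetical record of one Step 2 answer."""
    item: str                      # e.g. "Training", "Retention", "Sub-processors"
    answer: Optional[str]          # the vendor's specific answer, or None if none given
    written_source: Optional[str]  # e.g. a DPA clause; None means verbal or asserted only


def red_flags(answers: List[DataHandlingAnswer]) -> List[str]:
    """Return the must-have items that lack a specific answer or a written source."""
    by_item = {a.item: a for a in answers}
    flagged = []
    for item in MUST_BE_WRITTEN:
        record = by_item.get(item)
        if record is None or not record.answer or not record.written_source:
            flagged.append(item)
    return flagged


if __name__ == "__main__":
    answers = [
        DataHandlingAnswer("Training", "No training on customer data", "DPA section 4.2"),
        DataHandlingAnswer("Retention", "30 days, abuse monitoring", None),
    ]
    print(red_flags(answers))
```

An item that was never asked is treated the same as one the vendor could not answer, so gaps in the review surface as flags rather than passing silently.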
Step 3: Assess security, provenance, and reliability
- Security posture: SOC 2 Type II or ISO 27001, encryption at rest and in transit, access controls and MFA. For depth here, hand off to /vendor-security-assessment; do not duplicate it.
- Model provenance and transparency: Which underlying models power the product? Are they first-party or resold? Does the vendor publish or share eval results, known failure modes, and benchmarks, or is it a black box?
- Update and deprecation policy: How often do models change? Do you get notice before behavior changes? Can you pin a version? What is the deprecation timeline and migration support when a model is sunset?
- Reliability: Is there a real SLA with credits, or marketing uptime language? What are the rate limits, and do they throttle at your expected volume? Is there a public status page with incident history you can read?
Step 4: Assess lock-in, pricing, and compliance fit
- Lock-in and portability: Can you export prompts, fine-tune artifacts, embeddings, and conversation data in a usable format? Are you building on proprietary APIs with no equivalent elsewhere, or on portable patterns? What is the realistic cost and time to migrate away?
- Pricing-model risk: Is pricing per-token, per-seat, per-request, or flat? Model the monthly cost at current, 10x, and 100x volume. Are there committed-use discounts, price-change notice terms, and a cap on overage? Usage pricing with no ceiling is a strategic exposure, not a line item.
- Compliance fit: Does the vendor support your GDPR obligations (DPA, SCCs)? Under the EU AI Act, are they a provider and are you the deployer, and do their obligations and yours line up? Is there a documented incident history, and have they had a model-safety or data incident you should know about?
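The pricing-model check above asks for monthly cost at current, 10x, and 100x volume. A minimal sketch of that arithmetic for per-token pricing follows. Every number and name in it is a placeholder assumption (`TokenPricing`, the rates, the flat platform fee); substitute the vendor's quoted price sheet and your own measured token counts. It also ignores committed-use discounts and overage caps, which you would apply on top.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-token price sheet; replace with the vendor's quoted rates."""
    input_usd_per_million: float
    output_usd_per_million: float
    monthly_platform_fee_usd: float = 0.0


def monthly_cost(
    pricing: TokenPricing,
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
) -> float:
    """Monthly bill in USD: flat fee plus input and output token charges."""
    input_millions = requests_per_month * input_tokens_per_request / 1_000_000
    output_millions = requests_per_month * output_tokens_per_request / 1_000_000
    return (
        pricing.monthly_platform_fee_usd
        + input_millions * pricing.input_usd_per_million
        + output_millions * pricing.output_usd_per_million
    )


def cost_at_scale(
    pricing: TokenPricing,
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    multipliers: Tuple[int, ...] = (1, 10, 100),
) -> Dict[int, float]:
    """Project the monthly bill at each volume multiplier, rounded to cents."""
    return {
        m: round(
            monthly_cost(
                pricing,
                requests_per_month * m,
                input_tokens_per_request,
                output_tokens_per_request,
            ),
            2,
        )
        for m in multipliers
    }


if __name__ == "__main__":
    # Placeholder rates and volumes, not any real vendor's pricing.
    sheet = TokenPricing(input_usd_per_million=2.0, output_usd_per_million=8.0,
                         monthly_platform_fee_usd=100.0)
    for multiplier, usd in cost_at_scale(sheet, 50_000, 1_000, 500).items():
        print(f"{multiplier}x: ${usd:,.2f}/month")
```

With usage pricing the bill grows almost linearly with volume, so the 100x figure is the one to put in front of finance before signing.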
Step 5: Score and recommend
Score each domain Red, Yellow, or Green. Red on any data-handling item is a presumptive disqualifier unless a compensating control exists.
## AI Vendor Risk Assessment -- {{vendor_name}} -- {{date}}
**Vendor type:** Model provider / AI-native product / SaaS with embedded AI
**What the AI does:** {{one_line}}
**Data sent:** {{data_types}}
| Domain | Finding | Rating | Note |
|---|---|---|---|
| Trains on your data | {{yes_no_optout}} | Red / Yellow / Green | {{note}} |
| Retention | {{duration_and_reason}} | Red / Yellow / Green | {{note}} |
| Sub-processors | {{who_and_notice}} | Red / Yellow / Green | {{note}} |
| Data region | {{region_guarantee}} | Red / Yellow / Green | {{note}} |
| Security posture | {{certs_encryption}} | Red / Yellow / Green | {{note}} |
| Model provenance / transparency | {{models_and_evals}} | Red / Yellow / Green | {{note}} |
| Update / deprecation policy | {{notice_and_pinning}} | Red / Yellow / Green | {{note}} |
| Reliability (SLA, limits, status) | {{sla_and_history}} | Red / Yellow / Green | {{note}} |
| Lock-in / portability | {{export_and_migration}} | Red / Yellow / Green | {{note}} |
| Pricing-model risk | {{model_and_scale_cost}} | Red / Yellow / Green | {{note}} |
| Compliance fit | {{gdpr_ai_act_incidents}} | Red / Yellow / Green | {{note}} |
**Overall risk:** Low / Medium / High / Critical
**Pricing at scale (show the math):**
- Now ({{current_volume}}): {{cost}}
- 10x: {{cost}}
- 100x: {{cost}}
**Red flags:** {{list}}
**Residual risks and compensating controls:** {{list}}
**Recommendation:** Approve / Approve with conditions / Reject
- If conditional: {{conditions, timeline, reassessment trigger}}
- If reject: {{specific disqualifiers}}
Rating definitions: Red is a disqualifier or critical gap (trains on your data with no opt-out, no version pinning on a model you would build on, uncapped usage pricing with no notice terms). Yellow is acceptable with conditions (30-day retention, SLA without credits, single-region only). Green meets or exceeds requirements.
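The rollup from domain ratings to a recommendation can be sketched as below. Only one rule comes from this document: a Red on a data-handling item is a presumptive disqualifier unless a compensating control exists. Everything else is an assumption made for illustration: treating the first four table rows as the data-handling domains, mapping any other Red or any Yellow to "Approve with conditions", and the `recommend` function itself. Adjust the rules to your own risk appetite.

```python
from enum import Enum
from typing import Mapping, Optional


class Rating(Enum):
    GREEN = "Green"
    YELLOW = "Yellow"
    RED = "Red"


# Assumption: the first four rows of the assessment table are the data-handling items.
DATA_HANDLING_DOMAINS = frozenset(
    {"Trains on your data", "Retention", "Sub-processors", "Data region"}
)


def recommend(
    ratings: Mapping[str, Rating],
    compensating_controls: Optional[Mapping[str, str]] = None,
) -> str:
    """Map domain ratings to Approve / Approve with conditions / Reject."""
    controls = compensating_controls or {}
    # A data-handling Red with no compensating control is a presumptive disqualifier.
    blocking = [
        domain
        for domain, rating in ratings.items()
        if rating is Rating.RED
        and domain in DATA_HANDLING_DOMAINS
        and domain not in controls
    ]
    if blocking:
        return "Reject"
    # Assumed rule: any remaining Red or Yellow needs conditions before approval.
    if any(rating is not Rating.GREEN for rating in ratings.values()):
        return "Approve with conditions"
    return "Approve"


if __name__ == "__main__":
    ratings = {
        "Trains on your data": Rating.GREEN,
        "Retention": Rating.RED,
        "Pricing-model risk": Rating.YELLOW,
    }
    print(recommend(ratings))
    print(recommend(ratings, {"Retention": "Zero-retention endpoint in contract"}))
```

The function returns a starting point for the recommendation line, not a verdict; the red flags and residual risks still need to be written out by a person.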
Step 6: Review
Ask the user:
- What is the single worst thing this vendor could do with our data, and is it contractually prevented or just promised?
- If they deprecated the model we built on tomorrow, what would it cost us, and how much notice are we owed in writing?
- Have we modeled the bill at the volume we are actually hoping for, not the pilot volume?
- Does legal need to review the DPA and sub-processor terms before signing?
- Should this go into the AI risk register with a reassessment trigger?
Anti-patterns
| Anti-pattern | Why it fails | Do instead |
|---|---|---|
| Taking "we don't train on your data" at face value | Ignores retention and sub-processors, where the real exposure lives | Get retention duration and the sub-processor list in writing |
| Evaluating on the demo | The demo hides deprecation, silent updates, and lock-in | Ask for the deprecation policy and version-pinning terms before you build |
| Treating pricing as a line item | Usage pricing scales with success and can become unaffordable | Model cost at 10x and 100x volume before signing |
| Reusing the generic security questionnaire | Misses training terms, model provenance, and update policy entirely | Run the AI-specific domains here on top of the security review |
| Ignoring portability until you want to leave | Lock-in is invisible until migration, when it is most expensive | Confirm export of prompts, fine-tunes, and data up front |
| Skipping the EU AI Act role question | Provider and deployer obligations can land on you unexpectedly | Confirm who is provider, who is deployer, and that obligations align |
Output location
Deliver as a markdown document. Suggested filename: ai-vendor-assessment-{{vendor-name}}-{{date}}.md. The recommendation and pricing-at-scale sections should be extractable for procurement review.
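If you generate the filename programmatically, a small helper keeps it consistent across assessments. This is a sketch under two assumptions not stated above: the vendor name is lowercased and reduced to a hyphenated slug, and the date uses ISO 8601 (YYYY-MM-DD). The function name `assessment_filename` is hypothetical.

```python
import re
from datetime import date


def assessment_filename(vendor_name: str, on: date) -> str:
    """Build ai-vendor-assessment-<vendor-slug>-<YYYY-MM-DD>.md."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", vendor_name.lower()).strip("-")
    if not slug:
        raise ValueError("vendor_name must contain at least one letter or digit")
    return f"ai-vendor-assessment-{slug}-{on.isoformat()}.md"


if __name__ == "__main__":
    print(assessment_filename("Acme AI, Inc.", date(2025, 1, 15)))
```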