Use this when the team has shortlisted 2-4 vendors for a technology component and needs a structured evaluation to make a defensible selection decision. Also use for periodic vendor reviews to assess whether to renew or switch.

Related skills: Use /build-vs-buy when evaluation includes a build option. Use /vendor-security-assessment for security-specific evaluation. Use /decision-brief for final recommendation format. Use /business-case-builder for investment justification.

Process

Step 1: Gather inputs

Collect from the evaluation lead:

Capability being procured -- specific description, not just a category
Non-negotiable requirements -- must-haves that disqualify a vendor if unmet
Nice-to-have requirements -- valued but not dealbreakers
Timeline -- when does the decision need to be made, when must the solution be in production
Budget constraints -- annual and total budget available
Shortlisted vendors -- 2-4 vendors to evaluate (more than 4 makes the process unwieldy)
Existing relationships -- current contracts, partnerships, or history with any of the vendors

Step 2: Define weighted scorecard

Define criteria categories with weights summing to 100. Adjust weights based on what matters most for this decision:

Category	Default weight	Covers
Functionality/fit	25	Feature coverage against requirements
Scalability/performance	15	Can it handle projected growth
Security/compliance	15	Meets regulatory and security needs
Integration/ecosystem	10	Works with existing tech stack
Pricing/TCO	20	3-year total cost of ownership
Vendor viability	5	Financial health, market position, longevity
Support quality	5	Response times, expertise, escalation
Developer experience	5	API quality, docs, SDK maturity

Ask the user: "Which of these categories matters most for your decision? I'll adjust weights accordingly."

Step 3: Establish scoring methodology

Use a 1-5 scale with clear definitions:

Score	Meaning
5	Exceeds requirements with evidence (demo, reference, documentation)
4	Fully meets requirements
3	Meets most requirements with minor gaps
2	Significant gaps requiring workarounds
1	Does not meet requirements

Every score must include evidence -- a number without a reason is just an opinion.

Step 4: Score each vendor

Evaluate each vendor against every criterion using:

Vendor documentation -- feature lists, architecture docs, compliance certifications
Demos/POC results -- hands-on evaluation, not just sales demos
Reference checks -- conversations with existing customers (see Step 7)
Security questionnaire responses -- reference /vendor-security-assessment
Pricing proposals -- not list price, actual negotiated pricing

For each criterion, capture:

Numeric score (1-5)
Evidence supporting the score
Notes on gaps or concerns

Step 5: Generate comparison matrix

# Vendor Evaluation -- {{capability_name}}

## Evaluation date: {{date}}
## Evaluation lead: {{name/role}}
## Vendors evaluated: {{vendor_1}}, {{vendor_2}}, {{vendor_3}}

### Must-have requirements check
| Requirement | {{vendor_1}} | {{vendor_2}} | {{vendor_3}} |
|---|---|---|---|
| {{requirement_1}} | Pass/Fail | Pass/Fail | Pass/Fail |
| {{requirement_2}} | Pass/Fail | Pass/Fail | Pass/Fail |

**Disqualified vendors:** {{any vendor failing a must-have}}

### Weighted scorecard
| Criterion | Weight | {{vendor_1}} | {{vendor_2}} | {{vendor_3}} |
|---|---|---|---|---|
| Functionality/fit | {{w}} | {{score}} | {{score}} | {{score}} |
| Scalability/performance | {{w}} | {{score}} | {{score}} | {{score}} |
| Security/compliance | {{w}} | {{score}} | {{score}} | {{score}} |
| Integration/ecosystem | {{w}} | {{score}} | {{score}} | {{score}} |
| Pricing/TCO | {{w}} | {{score}} | {{score}} | {{score}} |
| Vendor viability | {{w}} | {{score}} | {{score}} | {{score}} |
| Support quality | {{w}} | {{score}} | {{score}} | {{score}} |
| Developer experience | {{w}} | {{score}} | {{score}} | {{score}} |
| **Weighted total** | **100** | **{{total}}** | **{{total}}** | **{{total}}** |

### Pricing comparison (3-year TCO)
| Cost component | {{vendor_1}} | {{vendor_2}} | {{vendor_3}} |
|---|---|---|---|
| Year 1 licensing | {{$}} | {{$}} | {{$}} |
| Year 2 licensing | {{$}} | {{$}} | {{$}} |
| Year 3 licensing | {{$}} | {{$}} | {{$}} |
| Implementation | {{$}} | {{$}} | {{$}} |
| Integration maintenance | {{$}} | {{$}} | {{$}} |
| **3-year total** | **{{$}}** | **{{$}}** | **{{$}}** |

### Key differentiators
{{Per-vendor summary of what makes each stand out, positive and negative}}

Step 6: Produce recommendation

## Recommendation

**Selected vendor:** {{vendor_name}}

**Rationale:** {{2-3 sentences on why this vendor wins}}

**Key risks:**
- {{Risk 1 and mitigation}}
- {{Risk 2 and mitigation}}

**Negotiation leverage:**
- {{Where you have alternatives that strengthen your position}}
- {{Pricing benchmarks or competitive pressure points}}

**Recommended contract terms:**
- Contract length: {{X}} years
- SLA requirements: {{key SLAs to include}}
- Exit clause: {{terms for early termination}}
- Price escalation cap: {{maximum annual increase}}
- Data portability: {{export requirements}}

Step 7: Generate reference check questions

Prepare these questions for speaking with each vendor's existing customers:

## Reference check questions -- {{vendor_name}}

1. How long have you been using {{vendor_name}}? What was your implementation timeline?
2. How did the implementation compare to what was promised during the sales process?
3. What's been your experience with their support? Can you give an example of a critical issue?
4. Have you encountered any unexpected costs or pricing changes?
5. How does the product handle {{your_specific_use_case}}?
6. What's the biggest limitation you've discovered?
7. If you were starting over, would you choose them again? Why or why not?
8. What would you negotiate differently in the contract?

Step 8: Design POC/trial plan

If the decision requires hands-on validation before committing:

## POC plan -- {{vendor_name}}

**Scope:** {{what to test, focused on highest-risk areas}}
**Duration:** {{1-4 weeks}}
**Team:** {{who participates}}

**Success criteria:**
- {{Criterion 1 with measurable target}}
- {{Criterion 2 with measurable target}}
- {{Criterion 3 with measurable target}}

**Evaluation rubric:**
- All criteria met = proceed with vendor
- 1-2 criteria missed = negotiate mitigations
- 3+ criteria missed = disqualify

Step 9: Review

Before finalizing, ask:

Are the weights right for this specific decision?
Did any vendor score surprisingly high or low? If so, check the evidence.
Are there political or relationship factors that the scorecard doesn't capture? (Name them explicitly rather than letting them influence scores silently.)
Does the recommendation need stakeholder buy-in before proceeding?
Is the evaluation team confident enough to commit, or is a POC needed?

Voice/audio vendor addendum

When evaluating voice or audio AI vendors, add these criteria to the weighted scorecard:

Criterion	What to evaluate	How to test
Speech recognition accuracy	Word error rate (WER) on your domain's vocabulary, accent coverage	Provide 50+ audio samples representing your actual user base; measure WER
Synthesis quality	Naturalness (MOS score), expressiveness, voice cloning fidelity	Blind listening test with 5+ evaluators; rate on 1-5 scale
Latency	End-to-end response time (speech in to speech out)	Measure p50 and p95 latency under realistic load
Language coverage	Number of supported languages, quality per language	Test top 3 languages your users speak; WER varies significantly by language
Telephony integration	SIP/PSTN support, warm handoff capability, call recording	Run test calls through actual phone infrastructure
Compliance	Call recording consent, AI disclosure, data residency, HIPAA/SOC2	Review certifications; test disclosure flows
Interruption handling	How gracefully the system handles user interruptions mid-response	Run 10 interruption scenarios; rate recovery quality

Voice-specific reference check questions (add to Step 7):

How does the voice agent handle accented or dialectal speech from your users?
What's the actual end-to-end latency you see in production (not the vendor's benchmark)?
How did you handle the regulatory disclosure requirement ("you are speaking with an AI")?
What's your warm handoff experience -- does context transfer to human agents reliably?

Output location

Save to deliverables/vendor-evaluation-{{capability_name}}-{{date}}.md. Reference check notes and POC results can be appended as sections or linked as separate files.

Example Output

Input

Capability being procured: Real-time fraud detection API for card-not-present transactions, including device fingerprinting, velocity checks, and ML-based risk scoring
Non-negotiable requirements: SOC 2 Type II certified, sub-100ms p95 latency, supports 5,000+ transactions per second, GDPR-compliant data residency in EU
Shortlisted vendors: Sardine, Kount (Equifax), Featurespace, Sift
Budget constraints: $400K/year cap; current legacy system costs $180K/year
Timeline: Decision by March 14; production by June 1 (ahead of summer transaction volume peak)
Existing relationships: Kount — existing Equifax data contract provides potential bundle discount; Sift — used by parent company's EU division

Output (abbreviated)

Vendor Evaluation — Real-Time Fraud Detection API

Evaluation date: March 7, 2025 Evaluation lead: Director of Payments Engineering Vendors evaluated: Sardine, Kount (Equifax), Featurespace, Sift

Must-Have Requirements Check

Requirement	Sardine	Kount	Featurespace	Sift
SOC 2 Type II certified	✅ Pass	✅ Pass	✅ Pass	✅ Pass
p95 latency < 100ms	✅ Pass	✅ Pass	❌ Fail	✅ Pass
5,000+ TPS throughput	✅ Pass	✅ Pass	✅ Pass	✅ Pass
EU data residency (GDPR)	✅ Pass	✅ Pass	✅ Pass	✅ Pass

Disqualified vendors: Featurespace — POC measured p95 latency at 134ms under 3,000 TPS load, failing the sub-100ms requirement. Removed from scored evaluation.

Weighted Scorecard

Weights adjusted: Pricing/TCO reduced to 15%, Functionality increased to 30%, Security/compliance increased to 20% given regulatory environment.

Criterion	Weight	Sardine	Kount	Sift
Functionality/fit	30%	4.5	3.5	4.0
Scalability/performance	15%	4.0	4.0	4.0
Security/compliance	20%	4.0	4.5	4.0
Integration/ecosystem	10%	4.0	3.0	4.5
Pricing/TCO	15%	3.5	4.5	3.0
Vendor viability	5%	3.0	5.0	4.0
Support quality	5%	4.0	3.5	4.0
Weighted total	100%	4.05	3.93	4.03

Score evidence highlights:

Sardine — Functionality (4.5): POC demonstrated best-in-class device fingerprinting with 94% fraud catch rate at 0.4% false positive rate on Brightline Financial's test dataset. Behavioral biometrics feature not available in competitors.
Kount — Security/Compliance (4.5): Equifax backing means dedicated compliance team; provided pre-signed EU DPA, existing audit reports, and named DPO contact within 24 hours. Bundle with existing Equifax contract simplifies procurement.
Kount — Integration/Ecosystem (3.0): Webhook-based event model requires custom middleware to integrate with Brightline's Kafka event stream; Sardine and Sift both offer native Kafka connectors.
Sift — Integration/Ecosystem (4.5): Pre-built connector for Brightline's existing Braintree + Stripe stack; estimated 3 fewer weeks of engineering time vs. Sardine.
Sardine — Vendor viability (3.0): Series B company (~$70M raised); no evidence of profitability. Mitigation required if selected.

Pricing Comparison (3-Year TCO)

Cost component	Sardine	Kount	Sift
Year 1 licensing	$285,000	$210,000	$320,000
Year 2 licensing	$300,000	$220,500	$336,000
Year 3 licensing	$315,000	$231,500	$353,000
Implementation / onboarding	$40,000	$55,000	$25,000
Integration maintenance (est.)	$30,000	$75,000	$15,000
3-year total	$970,000	$792,000	$1,049,000

Kount pricing reflects 12% bundle discount from existing Equifax contract. Sift pricing assumes 5% annual escalation per contract terms.

Key Differentiators

Sardine: Strongest raw fraud detection performance in POC; behavioral biometrics is a genuine differentiator for account-takeover scenarios Brightline is increasingly seeing. Primary concern is startup viability risk and highest integration cost.

Kount: Best TCO by $178K over Sardine; Equifax relationship de-risks vendor stability. Integration complexity is real — engineering estimate is 6 weeks vs. 3 for Sift. Tight June 1 deadline makes this a material concern.

Sift: Fastest time-to-production due to pre-built connectors; detection performance trails Sardine by ~4 percentage points in POC. Highest 3-year cost and steepest annual escalation clause.

Recommendation

Selected vendor: Sardine

Rationale: Sardine's fraud detection performance meaningfully outperformed Kount in head-to-head POC testing — a 4-point gap in catch rate at equivalent false positive thresholds translates to approximately $2.1M in prevented annual fraud losses at Brightline's transaction volume, dwarfing the $178K 3-year TCO premium over Kount. The June 1 timeline is achievable with a dedicated integration sprint. Vendor viability risk is manageable through contract protections.

Key risks:

Sardine startup risk — Mitigate by requiring source code escrow, 12-month data export SLA, and 90-day termination-for-convenience clause. Request audited financials annually.
Integration timeline — 6-week estimate is tight for June 1. Mitigate by beginning integration planning immediately upon contract signature; negotiate a 30-day parallel-run period with legacy system.
False positive rate creep — Sardine's model performance tied to consortium data; contractually require quarterly model performance reviews with defined SLA on false positive rate < 0.6%.

Negotiation leverage:

Kount is a credible alternative at significantly lower cost — use the $178K TCO gap explicitly in negotiations to push Sardine toward $260K Year 1
Sift's parent company (Midigator acquisition) has signaled aggressive 2025 pricing to win enterprise logos; use as secondary pressure point
Sardine's Series B investors have runway pressure; end-of-quarter signing (March 31) may yield additional discount

Recommended contract terms:

Contract length: 2 years with mutual option to extend (avoid 3-year lock given startup risk)
SLA: p95 latency ≤ 80ms (headroom below 100ms requirement), 99.95% uptime, fraud catch rate ≥ 91% measured quarterly
Exit clause: 90-day termination for convenience; 30-day termination for SLA breach (2 consecutive quarters)
Price escalation cap: 5% annually
Data portability: Full model feature export and 18-month transaction history in JSON within 30 days of contract end

Reference Check Questions — Sardine

How long have you been using Sardine, and how did your implementation timeline compare to

Run this now

Try /vendor-evaluation on your own input

0/4000

Part of these Playbook topics

Product-Engineering Pairing

Related Assessment & Diagnostics skills

10x Move Evaluator AI Adoption Evaluator AI Maturity Assess AI Maturity Org Annual Strategy Review Attribution Model Designer Bias Spotter Brand Audit

Back to Skills Catalog