Skip to main content
Assessment & Diagnostics/vendor-evaluation

Vendor Evaluation

You need to score and compare 2-4 technology vendors with a weighted criteria scorecard.

Use this when the team has shortlisted 2-4 vendors for a technology component and needs a structured evaluation to make a defensible selection decision. Also use for periodic vendor reviews to assess whether to renew or switch.

Related skills: Use /build-vs-buy when evaluation includes a build option. Use /vendor-security-assessment for security-specific evaluation. Use /decision-brief for final recommendation format. Use /business-case-builder for investment justification.

Process

Step 1: Gather inputs

Collect from the evaluation lead:

  • Capability being procured -- specific description, not just a category
  • Non-negotiable requirements -- must-haves that disqualify a vendor if unmet
  • Nice-to-have requirements -- valued but not dealbreakers
  • Timeline -- when does the decision need to be made, when must the solution be in production
  • Budget constraints -- annual and total budget available
  • Shortlisted vendors -- 2-4 vendors to evaluate (more than 4 makes the process unwieldy)
  • Existing relationships -- current contracts, partnerships, or history with any of the vendors

Step 2: Define weighted scorecard

Define criteria categories with weights summing to 100. Adjust weights based on what matters most for this decision:

CategoryDefault weightCovers
Functionality/fit25Feature coverage against requirements
Scalability/performance15Can it handle projected growth
Security/compliance15Meets regulatory and security needs
Integration/ecosystem10Works with existing tech stack
Pricing/TCO203-year total cost of ownership
Vendor viability5Financial health, market position, longevity
Support quality5Response times, expertise, escalation
Developer experience5API quality, docs, SDK maturity

Ask the user: "Which of these categories matters most for your decision? I'll adjust weights accordingly."

Step 3: Establish scoring methodology

Use a 1-5 scale with clear definitions:

ScoreMeaning
5Exceeds requirements with evidence (demo, reference, documentation)
4Fully meets requirements
3Meets most requirements with minor gaps
2Significant gaps requiring workarounds
1Does not meet requirements

Every score must include evidence -- a number without a reason is just an opinion.

Step 4: Score each vendor

Evaluate each vendor against every criterion using:

  • Vendor documentation -- feature lists, architecture docs, compliance certifications
  • Demos/POC results -- hands-on evaluation, not just sales demos
  • Reference checks -- conversations with existing customers (see Step 7)
  • Security questionnaire responses -- reference /vendor-security-assessment
  • Pricing proposals -- not list price, actual negotiated pricing

For each criterion, capture:

  • Numeric score (1-5)
  • Evidence supporting the score
  • Notes on gaps or concerns

Step 5: Generate comparison matrix

# Vendor Evaluation -- {{capability_name}}

## Evaluation date: {{date}}
## Evaluation lead: {{name/role}}
## Vendors evaluated: {{vendor_1}}, {{vendor_2}}, {{vendor_3}}

### Must-have requirements check
| Requirement | {{vendor_1}} | {{vendor_2}} | {{vendor_3}} |
|---|---|---|---|
| {{requirement_1}} | Pass/Fail | Pass/Fail | Pass/Fail |
| {{requirement_2}} | Pass/Fail | Pass/Fail | Pass/Fail |

**Disqualified vendors:** {{any vendor failing a must-have}}

### Weighted scorecard
| Criterion | Weight | {{vendor_1}} | {{vendor_2}} | {{vendor_3}} |
|---|---|---|---|---|
| Functionality/fit | {{w}} | {{score}} | {{score}} | {{score}} |
| Scalability/performance | {{w}} | {{score}} | {{score}} | {{score}} |
| Security/compliance | {{w}} | {{score}} | {{score}} | {{score}} |
| Integration/ecosystem | {{w}} | {{score}} | {{score}} | {{score}} |
| Pricing/TCO | {{w}} | {{score}} | {{score}} | {{score}} |
| Vendor viability | {{w}} | {{score}} | {{score}} | {{score}} |
| Support quality | {{w}} | {{score}} | {{score}} | {{score}} |
| Developer experience | {{w}} | {{score}} | {{score}} | {{score}} |
| **Weighted total** | **100** | **{{total}}** | **{{total}}** | **{{total}}** |

### Pricing comparison (3-year TCO)
| Cost component | {{vendor_1}} | {{vendor_2}} | {{vendor_3}} |
|---|---|---|---|
| Year 1 licensing | {{$}} | {{$}} | {{$}} |
| Year 2 licensing | {{$}} | {{$}} | {{$}} |
| Year 3 licensing | {{$}} | {{$}} | {{$}} |
| Implementation | {{$}} | {{$}} | {{$}} |
| Integration maintenance | {{$}} | {{$}} | {{$}} |
| **3-year total** | **{{$}}** | **{{$}}** | **{{$}}** |

### Key differentiators
{{Per-vendor summary of what makes each stand out, positive and negative}}

Step 6: Produce recommendation

## Recommendation

**Selected vendor:** {{vendor_name}}

**Rationale:** {{2-3 sentences on why this vendor wins}}

**Key risks:**
- {{Risk 1 and mitigation}}
- {{Risk 2 and mitigation}}

**Negotiation leverage:**
- {{Where you have alternatives that strengthen your position}}
- {{Pricing benchmarks or competitive pressure points}}

**Recommended contract terms:**
- Contract length: {{X}} years
- SLA requirements: {{key SLAs to include}}
- Exit clause: {{terms for early termination}}
- Price escalation cap: {{maximum annual increase}}
- Data portability: {{export requirements}}

Step 7: Generate reference check questions

Prepare these questions for speaking with each vendor's existing customers:

## Reference check questions -- {{vendor_name}}

1. How long have you been using {{vendor_name}}? What was your implementation timeline?
2. How did the implementation compare to what was promised during the sales process?
3. What's been your experience with their support? Can you give an example of a critical issue?
4. Have you encountered any unexpected costs or pricing changes?
5. How does the product handle {{your_specific_use_case}}?
6. What's the biggest limitation you've discovered?
7. If you were starting over, would you choose them again? Why or why not?
8. What would you negotiate differently in the contract?

Step 8: Design POC/trial plan

If the decision requires hands-on validation before committing:

## POC plan -- {{vendor_name}}

**Scope:** {{what to test, focused on highest-risk areas}}
**Duration:** {{1-4 weeks}}
**Team:** {{who participates}}

**Success criteria:**
- {{Criterion 1 with measurable target}}
- {{Criterion 2 with measurable target}}
- {{Criterion 3 with measurable target}}

**Evaluation rubric:**
- All criteria met = proceed with vendor
- 1-2 criteria missed = negotiate mitigations
- 3+ criteria missed = disqualify

Step 9: Review

Before finalizing, ask:

  • Are the weights right for this specific decision?
  • Did any vendor score surprisingly high or low? If so, check the evidence.
  • Are there political or relationship factors that the scorecard doesn't capture? (Name them explicitly rather than letting them influence scores silently.)
  • Does the recommendation need stakeholder buy-in before proceeding?
  • Is the evaluation team confident enough to commit, or is a POC needed?

Voice/audio vendor addendum

When evaluating voice or audio AI vendors, add these criteria to the weighted scorecard:

CriterionWhat to evaluateHow to test
Speech recognition accuracyWord error rate (WER) on your domain's vocabulary, accent coverageProvide 50+ audio samples representing your actual user base; measure WER
Synthesis qualityNaturalness (MOS score), expressiveness, voice cloning fidelityBlind listening test with 5+ evaluators; rate on 1-5 scale
LatencyEnd-to-end response time (speech in to speech out)Measure p50 and p95 latency under realistic load
Language coverageNumber of supported languages, quality per languageTest top 3 languages your users speak; WER varies significantly by language
Telephony integrationSIP/PSTN support, warm handoff capability, call recordingRun test calls through actual phone infrastructure
ComplianceCall recording consent, AI disclosure, data residency, HIPAA/SOC2Review certifications; test disclosure flows
Interruption handlingHow gracefully the system handles user interruptions mid-responseRun 10 interruption scenarios; rate recovery quality

Voice-specific reference check questions (add to Step 7):

  • How does the voice agent handle accented or dialectal speech from your users?
  • What's the actual end-to-end latency you see in production (not the vendor's benchmark)?
  • How did you handle the regulatory disclosure requirement ("you are speaking with an AI")?
  • What's your warm handoff experience -- does context transfer to human agents reliably?

Output location

Save to deliverables/vendor-evaluation-{{capability_name}}-{{date}}.md. Reference check notes and POC results can be appended as sections or linked as separate files.

Example Output

Input

  • Capability being procured: Real-time fraud detection API for card-not-present transactions, including device fingerprinting, velocity checks, and ML-based risk scoring
  • Non-negotiable requirements: SOC 2 Type II certified, sub-100ms p95 latency, supports 5,000+ transactions per second, GDPR-compliant data residency in EU
  • Shortlisted vendors: Sardine, Kount (Equifax), Featurespace, Sift
  • Budget constraints: $400K/year cap; current legacy system costs $180K/year
  • Timeline: Decision by March 14; production by June 1 (ahead of summer transaction volume peak)
  • Existing relationships: Kount — existing Equifax data contract provides potential bundle discount; Sift — used by parent company's EU division

Output (abbreviated)

Vendor Evaluation — Real-Time Fraud Detection API

Evaluation date: March 7, 2025 Evaluation lead: Director of Payments Engineering Vendors evaluated: Sardine, Kount (Equifax), Featurespace, Sift


Must-Have Requirements Check

RequirementSardineKountFeaturespaceSift
SOC 2 Type II certified✅ Pass✅ Pass✅ Pass✅ Pass
p95 latency < 100ms✅ Pass✅ Pass❌ Fail✅ Pass
5,000+ TPS throughput✅ Pass✅ Pass✅ Pass✅ Pass
EU data residency (GDPR)✅ Pass✅ Pass✅ Pass✅ Pass

Disqualified vendors: Featurespace — POC measured p95 latency at 134ms under 3,000 TPS load, failing the sub-100ms requirement. Removed from scored evaluation.


Weighted Scorecard

Weights adjusted: Pricing/TCO reduced to 15%, Functionality increased to 30%, Security/compliance increased to 20% given regulatory environment.

CriterionWeightSardineKountSift
Functionality/fit30%4.53.54.0
Scalability/performance15%4.04.04.0
Security/compliance20%4.04.54.0
Integration/ecosystem10%4.03.04.5
Pricing/TCO15%3.54.53.0
Vendor viability5%3.05.04.0
Support quality5%4.03.54.0
Weighted total100%4.053.934.03

Score evidence highlights:

  • Sardine — Functionality (4.5): POC demonstrated best-in-class device fingerprinting with 94% fraud catch rate at 0.4% false positive rate on Brightline Financial's test dataset. Behavioral biometrics feature not available in competitors.
  • Kount — Security/Compliance (4.5): Equifax backing means dedicated compliance team; provided pre-signed EU DPA, existing audit reports, and named DPO contact within 24 hours. Bundle with existing Equifax contract simplifies procurement.
  • Kount — Integration/Ecosystem (3.0): Webhook-based event model requires custom middleware to integrate with Brightline's Kafka event stream; Sardine and Sift both offer native Kafka connectors.
  • Sift — Integration/Ecosystem (4.5): Pre-built connector for Brightline's existing Braintree + Stripe stack; estimated 3 fewer weeks of engineering time vs. Sardine.
  • Sardine — Vendor viability (3.0): Series B company (~$70M raised); no evidence of profitability. Mitigation required if selected.

Pricing Comparison (3-Year TCO)

Cost componentSardineKountSift
Year 1 licensing$285,000$210,000$320,000
Year 2 licensing$300,000$220,500$336,000
Year 3 licensing$315,000$231,500$353,000
Implementation / onboarding$40,000$55,000$25,000
Integration maintenance (est.)$30,000$75,000$15,000
3-year total$970,000$792,000$1,049,000

Kount pricing reflects 12% bundle discount from existing Equifax contract. Sift pricing assumes 5% annual escalation per contract terms.


Key Differentiators

Sardine: Strongest raw fraud detection performance in POC; behavioral biometrics is a genuine differentiator for account-takeover scenarios Brightline is increasingly seeing. Primary concern is startup viability risk and highest integration cost.

Kount: Best TCO by $178K over Sardine; Equifax relationship de-risks vendor stability. Integration complexity is real — engineering estimate is 6 weeks vs. 3 for Sift. Tight June 1 deadline makes this a material concern.

Sift: Fastest time-to-production due to pre-built connectors; detection performance trails Sardine by ~4 percentage points in POC. Highest 3-year cost and steepest annual escalation clause.


Recommendation

Selected vendor: Sardine

Rationale: Sardine's fraud detection performance meaningfully outperformed Kount in head-to-head POC testing — a 4-point gap in catch rate at equivalent false positive thresholds translates to approximately $2.1M in prevented annual fraud losses at Brightline's transaction volume, dwarfing the $178K 3-year TCO premium over Kount. The June 1 timeline is achievable with a dedicated integration sprint. Vendor viability risk is manageable through contract protections.

Key risks:

  • Sardine startup risk — Mitigate by requiring source code escrow, 12-month data export SLA, and 90-day termination-for-convenience clause. Request audited financials annually.
  • Integration timeline — 6-week estimate is tight for June 1. Mitigate by beginning integration planning immediately upon contract signature; negotiate a 30-day parallel-run period with legacy system.
  • False positive rate creep — Sardine's model performance tied to consortium data; contractually require quarterly model performance reviews with defined SLA on false positive rate < 0.6%.

Negotiation leverage:

  • Kount is a credible alternative at significantly lower cost — use the $178K TCO gap explicitly in negotiations to push Sardine toward $260K Year 1
  • Sift's parent company (Midigator acquisition) has signaled aggressive 2025 pricing to win enterprise logos; use as secondary pressure point
  • Sardine's Series B investors have runway pressure; end-of-quarter signing (March 31) may yield additional discount

Recommended contract terms:

  • Contract length: 2 years with mutual option to extend (avoid 3-year lock given startup risk)
  • SLA: p95 latency ≤ 80ms (headroom below 100ms requirement), 99.95% uptime, fraud catch rate ≥ 91% measured quarterly
  • Exit clause: 90-day termination for convenience; 30-day termination for SLA breach (2 consecutive quarters)
  • Price escalation cap: 5% annually
  • Data portability: Full model feature export and 18-month transaction history in JSON within 30 days of contract end

Reference Check Questions — Sardine

  1. How long have you been using Sardine, and how did your implementation timeline compare to