AI Adoption Evaluator

Use this when a client is evaluating whether to adopt an AI tool, feature, or capability and needs a structured framework to cut through the hype.

Platform context: Run this per consuming team to build the platform team's adoption priority queue. The aggregate view across teams reveals which AI capabilities have the broadest demand and the strongest case for platform-level investment.
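
As a rough illustration of that aggregate view, the sketch below counts how many teams received an Adopt or Pilot first recommendation for each capability. The team names, capabilities, and the choice to rank purely by team count are assumptions made for the example, not part of the skill.

```python
from collections import Counter

# Recommendations that signal real demand (labels taken from the skill's output options).
POSITIVE = {"Adopt", "Pilot first"}


def adoption_priority_queue(evaluations):
    """Rank capabilities by how many teams got a positive recommendation.

    `evaluations` is a list of (team, capability, recommendation) tuples,
    assuming one evaluation per team per capability.
    """
    demand = Counter(
        capability
        for _team, capability, recommendation in evaluations
        if recommendation in POSITIVE
    )
    # Broadest demand first; ties broken alphabetically for a stable order.
    return sorted(demand.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-team results, for illustration only.
evaluations = [
    ("Growth", "feedback triage agent", "Pilot first"),
    ("Support", "feedback triage agent", "Adopt"),
    ("Support", "voice agent", "Pilot first"),
    ("Billing", "voice agent", "Defer"),
    ("Billing", "feedback triage agent", "Skip"),
]

queue = adoption_priority_queue(evaluations)
print(queue)
```

Counting teams only captures breadth of demand. Strength of case would still come from reading each team's scorecard.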

How it works

You describe the AI tool or capability being considered and the current workflow
The skill evaluates it against four critical questions
It returns an evaluation scorecard, adoption risk assessment, and change management checklist

Prompt

You are evaluating an AI adoption decision for the practitioner. The practitioner is a product strategist and consultant. The real challenge with AI adoption isn't "Can we use this tool?" -- it's "Does this make our system better, or are we just adding more noise?" Strong product ops asks four questions before adopting anything AI-related. Your job is to apply those questions rigorously.

Inputs I will provide:

What's being evaluated: {{TOOL_OR_CAPABILITY}} (the AI tool, feature, or capability under consideration)
Current workflow: {{CURRENT_WORKFLOW}} (how the team does this work today, without the AI tool)
Team context (optional): {{TEAM_CONTEXT}} (team size, technical sophistication, change appetite)
Who's pushing for it (optional): {{CHAMPION}} (where the impetus is coming from -- leadership, individual IC, vendor pitch)
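
If you fill these placeholders programmatically, something like the following sketch may help. It assumes the {{NAME}} placeholder syntax shown above, treats the two inputs not marked optional as required, and drops the line for any optional input that is not supplied. The function name and the shortened template are hypothetical.

```python
import re

# Shortened copy of the inputs block, without the parenthetical hints.
PROMPT_INPUTS = """\
What's being evaluated: {{TOOL_OR_CAPABILITY}}
Current workflow: {{CURRENT_WORKFLOW}}
Team context (optional): {{TEAM_CONTEXT}}
Who's pushing for it (optional): {{CHAMPION}}"""

# Inputs not marked "(optional)" above.
REQUIRED = {"TOOL_OR_CAPABILITY", "CURRENT_WORKFLOW"}


def fill_inputs(template: str, values: dict[str, str]) -> str:
    """Substitute {{NAME}} placeholders; omit lines whose optional value is missing."""
    missing = REQUIRED - {key for key, value in values.items() if value}
    if missing:
        raise ValueError(f"missing required inputs: {sorted(missing)}")

    filled_lines = []
    for line in template.splitlines():
        names = re.findall(r"\{\{(\w+)\}\}", line)
        if any(not values.get(name) for name in names):
            continue  # optional input not supplied: drop the whole line
        for name in names:
            line = line.replace("{{" + name + "}}", values[name])
        filled_lines.append(line)
    return "\n".join(filled_lines)


example = fill_inputs(PROMPT_INPUTS, {
    "TOOL_OR_CAPABILITY": "AI agent that tallies and ranks feature requests",
    "CURRENT_WORKFLOW": "PM manually reviews top support tickets weekly",
    "CHAMPION": "VP of Product saw a demo at a conference",
})
print(example)
```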

Step 1: Apply the four evaluation questions

Q1: Is this solving a real problem, or are we chasing hype?

What specific problem does this tool solve?
Is the problem significant enough to justify the investment?
Would the team have prioritized this problem if AI weren't in the headlines?
Is there a simpler, non-AI solution that would work?
Architecture test: Is the team automating the existing workflow, or redesigning the workflow with AI in the loop? The real productivity gain from AI automation isn't bolting AI onto existing processes -- it's rethinking the architecture entirely. If the answer is "we want AI to do what we already do, but faster," probe whether the workflow itself should change.
Tool-tier test: Is the team evaluating the right tier of AI coding tool? The market has split into two camps: engineering amplifiers (Cursor, Claude Code, Windsurf -- make developers faster) and non-technical builders (Lovable, v0, bolt.new, Replit -- let non-developers ship). If the team is evaluating a developer amplifier but the actual users are PMs or designers, or vice versa, the tool-job fit is wrong before the evaluation even starts.
Build-path test: Is the team choosing between code, no-code, and AI-led building? The market now offers three paths to a working product:
- Code (traditional engineering or vibe coding with Cursor/Claude Code) -- full ownership, full complexity
- No-code platforms (Bubble, Retool, FlutterFlow) -- drag-and-drop, database-backed, moderate lock-in
- AI-led builders (Rork, Repaint, Noah AI) -- describe-and-deploy, fastest start, highest lock-in risk
If the team hasn't explicitly evaluated which path fits their ownership needs, data requirements, and graduation timeline, the adoption evaluation is incomplete. See /tool-recommend for the no-code cluster framework.
Market positioning probe: Is this tool an established leader with strong community validation (e.g., #1 reviewed, broad adoption), or an emerging tool where you'd be an early adopter? Both are valid, but the risk profile differs. Check knowledge/ai-market-landscape-reference.md for current positioning data.

Q2: How will we measure whether it's actually helping?

What metrics will indicate success?
What's the baseline today (before adoption)?
What does "good" look like in 30/60/90 days?
Who is responsible for tracking this?

Q3: What processes and responsibilities need to change?

What workflow changes are required?
What new skills does the team need?
What existing tools or processes does this replace or modify?
What happens to the quality of work that was previously human-reviewed?

Q4: Who is accountable when something goes sideways?

What could go wrong? (data quality issues, hallucinations, compliance risks, trust erosion)
Who owns the fallout?
What's the rollback plan?
Are there governance requirements (legal, compliance, security)?

Step 2: Score the evaluation

Question	Score	Evidence
Real problem?	Strong / Moderate / Weak	[reasoning]
Measurable?	Strong / Moderate / Weak	[reasoning]
Process change manageable?	Strong / Moderate / Weak	[reasoning]
Accountability clear?	Strong / Moderate / Weak	[reasoning]
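
For anyone capturing the scorecard in code, here is one possible representation. It validates that each row uses one of the three allowed scores and renders the same tab-separated layout as the table above. The class and function names are illustrative assumptions, and the evidence strings are shortened.

```python
from dataclasses import dataclass

ALLOWED_SCORES = ("Strong", "Moderate", "Weak")
QUESTIONS = (
    "Real problem?",
    "Measurable?",
    "Process change manageable?",
    "Accountability clear?",
)


@dataclass(frozen=True)
class ScoreRow:
    question: str
    score: str
    evidence: str

    def __post_init__(self) -> None:
        # Reject anything outside the four questions and three score levels.
        if self.question not in QUESTIONS:
            raise ValueError(f"unknown question: {self.question!r}")
        if self.score not in ALLOWED_SCORES:
            raise ValueError(f"score must be one of {ALLOWED_SCORES}")


def render_scorecard(rows: list[ScoreRow]) -> str:
    """Render the rows as a tab-separated table with a header line."""
    if [row.question for row in rows] != list(QUESTIONS):
        raise ValueError("scorecard needs one row per question, in order")
    lines = ["Question\tScore\tEvidence"]
    lines += [f"{row.question}\t{row.score}\t{row.evidence}" for row in rows]
    return "\n".join(lines)


def weak_questions(rows: list[ScoreRow]) -> list[str]:
    """Questions scored Weak, i.e. the assumptions a pilot would need to test."""
    return [row.question for row in rows if row.score == "Weak"]


rows = [
    ScoreRow("Real problem?", "Moderate", "PM spends significant time on manual review"),
    ScoreRow("Measurable?", "Weak", "No baseline metrics"),
    ScoreRow("Process change manageable?", "Strong", "Low technical barrier"),
    ScoreRow("Accountability clear?", "Weak", "Nobody validates the ranking"),
]
print(render_scorecard(rows))
print(weak_questions(rows))
```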

Step 3: Assess adoption risks

Identify the top 3 adoption risks:

Trust risk: Will the team trust the AI output enough to act on it?
Skills risk: Does the team have the skills to use it effectively?
Habit risk: Will people actually change their behavior?
Data quality risk: Is the input data good enough for the AI to produce useful output?
Integration risk: Does this fit into the existing tool stack?
Change fatigue risk: Is the team already overwhelmed with changes?
Governance architecture risk: Does the workflow need human-in-the-loop checkpoints? For regulated industries or sensitive workflows, look at tools with built-in approval steps and audit trails (e.g., Relay pattern) rather than fire-and-forget automation. Market validation confirms this: human-in-the-loop controls (Relay.app approval gates, Retool governance features, Make error-handling paths) are now table stakes in no-code automation, not premium add-ons.
Voice trust risk: (Voice AI only) Users may distrust an AI voice more than AI text -- a robotic or uncanny voice erodes confidence faster than a mediocre chatbot response.
Accent/equity risk: (Voice AI only) Speech recognition accuracy varies by accent, dialect, and language. Deploying voice AI that works well for some users and poorly for others is a fairness issue, not just a quality issue. Test with representative user samples.
Regulatory/disclosure risk: (Voice AI only) Many jurisdictions require disclosure that the caller is speaking with an AI. Some require consent for call recording. Healthcare and financial services have additional constraints.

Step 4: Generate output

Evaluation Summary

What's being evaluated and why.

Four-Question Scorecard

Table with scores and evidence for each question.

Overall Recommendation

One of:

Adopt -- clear value, manageable change, accountability in place
Pilot first -- promising but untested assumptions; run a bounded experiment
Defer -- not the right time, team, or problem
Skip -- solving a problem that doesn't exist, or the non-AI solution is better

Adoption Risks

Top 3 risks with mitigation strategies.

Change Management Checklist

If adopting or piloting:

Define success metrics and baseline
Assign an owner for the pilot/rollout
Identify 2-3 power users to test first
Set a review date (30 days recommended)
Document the rollback plan
Brief the team on what's changing and why
Plan for the skill gap (training, documentation, pairing)

Measurement Plan

What to track, how to track it, and when to evaluate.

Workflow Redesign Sketch (if recommending Adopt or Pilot)

Before adoption, sketch how the workflow actually changes:

Current state → Future state:

Step	Current (who does what)	Future (who does what, with AI)	What changes	Quality gate
{{step}}	{{current actor and action}}	{{new actor/AI split}}	{{what's different}}	{{how quality is maintained}}

This sketch should answer:

Which steps does the AI handle vs. augment vs. not touch?
Where does a human review AI output before it moves forward?
What happens when the AI produces bad output? Who catches it?
What skills does the team need that they don't have today?

For deeper workflow redesign: Use /ai-workflow-redesign to produce a full human-AI operating model with role changes, quality assurance cadence, feedback loops, and transition phases.

Change Management Checklist (expanded)

If adopting or piloting:

ROI Framework

Before recommending Adopt or Pilot, size the financial picture:

Cost of adoption:
- Implementation cost: integration work, vendor fees, setup time (estimate in person-hours or dollars)
- Training cost: how long before the team is proficient? What productivity is lost during ramp?
- Ongoing operational cost: subscription fees, maintenance, data pipeline costs, monitoring overhead
- Total estimated cost for Year 1: [VERIFY: refine with actual vendor pricing and team capacity]
Expected value:
- Time saved: hours per week/month reclaimed across the team (be specific -- "saves the PM 3 hours/week on manual feedback triage")
- Error reduction: what mistakes does this prevent, and what do those mistakes cost today?
- Revenue impact: does this tool directly enable revenue (e.g., faster time to market, better conversion) or is the value purely operational?
- Estimated annual value: [VERIFY: confirm baseline metrics before committing to this number]
Payback period: How many months until cumulative value exceeds cumulative cost? If the payback period exceeds 6 months for a tool the team has never used, that is a yellow flag.
Opportunity cost: What else could this investment fund? If the team spends a quarter integrating an AI tool, what feature work, research, or infrastructure gets deferred? Name the specific trade-off.
Strategic fit: Does this AI adoption strengthen the company's competitive position (defensible advantage, proprietary data flywheel, unique capability) -- or is it table-stakes that every competitor will also have? Table-stakes adoption is fine, but it should be sized as a small bet, not a strategic initiative.
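
The payback-period check can be sketched as follows. This is a minimal illustration that assumes a one-time upfront cost plus flat monthly cost and value; real adoption curves usually ramp, and every figure in the example is hypothetical. The 6-month threshold is the one stated above.

```python
from typing import Optional

YELLOW_FLAG_MONTHS = 6  # threshold from the framework above


def payback_months(upfront_cost: float, monthly_cost: float,
                   monthly_value: float, horizon_months: int = 60) -> Optional[int]:
    """First month in which cumulative value exceeds cumulative cost.

    Returns None if that never happens within the horizon.
    """
    cumulative_cost = upfront_cost
    cumulative_value = 0.0
    for month in range(1, horizon_months + 1):
        cumulative_cost += monthly_cost
        cumulative_value += monthly_value
        if cumulative_value > cumulative_cost:
            return month
    return None


def is_yellow_flag(months: Optional[int], new_to_team: bool = True) -> bool:
    """Flag a payback that never arrives, or one over 6 months for an unfamiliar tool."""
    if months is None:
        return True
    return new_to_team and months > YELLOW_FLAG_MONTHS


# Hypothetical figures: $12,000 to implement, $500/month to run, $2,000/month of value.
months = payback_months(upfront_cost=12_000, monthly_cost=500, monthly_value=2_000)
print(months, is_yellow_flag(months))
```

With these example numbers the net gain is $1,500 per month, so cumulative value first exceeds cumulative cost in month 9, which trips the yellow flag.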

If the ROI math doesn't work or is too speculative to estimate, that's a signal: either the problem isn't well enough understood, or the tool isn't mature enough. Flag with .

For assessing product ops infrastructure needed to support AI adoption, use the product-ops-assessment skill. For PLG readiness when AI is part of the growth model, use plg-readiness-check.

Example Output

Input

What's being evaluated: "AI agent that tallies and ranks feature requests across sales calls, support tickets, and customer transcripts"
Current workflow: "PM manually reviews top support tickets weekly and checks in with sales lead monthly. Feedback lives in Slack, Zendesk, and Gong."
Who's pushing for it: "VP of Product saw a demo at a conference"

Output (abbreviated)

Four-Question Scorecard:

Question	Score	Evidence
Real problem?	Moderate	The PM does spend significant time on manual feedback review. But the real question is: is the problem that feedback isn't aggregated, or that the team doesn't know what questions to ask of the feedback?
Measurable?	Weak	No baseline metrics for PM time on feedback, feedback coverage, or decision quality. Hard to measure improvement without a starting point.
Process change manageable?	Strong	Low technical barrier. PM would review a dashboard instead of multiple tools.
Accountability clear?	Weak	Who validates the AI's ranking? If the AI surfaces the wrong priorities, who catches it?

Overall Recommendation: Pilot first. The problem is real but the tool may be solving the wrong layer of it. Tallying feature requests faster doesn't help if the team needs to understand why customers are asking. Pilot with one PM for 30 days. Measure: does the PM make better decisions, or just process more data?

Input (voice AI example)

What's being evaluated: "AI voice agent to handle inbound customer service calls (order status, returns, FAQ)"
Current workflow: "IVR menu tree routes to human agents. 60% of calls are order status or return label requests. Average handle time 4 minutes. Agent cost ~$1.20/call."
Who's pushing for it: "VP of Operations saw Klarna case study"

Output (abbreviated)

Four-Question Scorecard:

Question	Score	Evidence
Real problem?	Strong	60% of calls are simple lookups -- classic automation candidate. Cost per call ($1.20) vs. AI (~$0.40/call) is a clear savings at volume.
Measurable?	Strong	Baseline exists: handle time, cost/call, CSAT, call volume by type. Easy to compare AI vs. human on same call types.
Process change manageable?	Moderate	Agents need retraining for escalation handling. Customers need disclosure ("you're speaking with an AI"). Regulatory check needed for call recording in applicable jurisdictions.
Accountability clear?	Moderate	Who owns it when the voice agent gives wrong return instructions? Need to define: who reviews transcripts, how errors are caught, what the human escalation trigger is.

Overall Recommendation: Pilot first. Strong unit economics but voice has trust and regulatory dimensions that text doesn't. Pilot with one call type (order status) for 30 days. Measure: resolution rate, CSAT, escalation rate, and -- critically -- listen to 50 random calls to catch quality issues that metrics miss.
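
The unit economics in this example can be sanity-checked with a short calculation. The per-call costs and the 60% share come from the example above. The monthly call volume, the containment rate (share of eligible calls the voice agent resolves without escalation), and the rule that an escalated call pays for both AI and human time are assumptions added for illustration. The second function sketches the 50-call random review.

```python
import random

HUMAN_COST_CENTS = 120  # ~$1.20/call, from the current-workflow input
AI_COST_CENTS = 40      # ~$0.40/call, from the scorecard evidence
ELIGIBLE_SHARE = 0.60   # share of calls that are order status or return labels


def monthly_savings_cents(monthly_calls: int, containment_rate: float) -> int:
    """Savings on eligible calls; escalated calls are charged AI plus human cost."""
    eligible = round(monthly_calls * ELIGIBLE_SHARE)
    contained = round(eligible * containment_rate)
    escalated = eligible - contained
    baseline = eligible * HUMAN_COST_CENTS
    with_ai = (contained * AI_COST_CENTS
               + escalated * (AI_COST_CENTS + HUMAN_COST_CENTS))
    return baseline - with_ai


def sample_calls_for_review(call_ids: list[str], k: int = 50, seed: int = 0) -> list[str]:
    """Pick k distinct calls at random for a human to listen to."""
    rng = random.Random(seed)  # seeded so the review sample is reproducible
    return rng.sample(call_ids, min(k, len(call_ids)))


# Hypothetical pilot month: 10,000 inbound calls, 70% containment.
savings = monthly_savings_cents(monthly_calls=10_000, containment_rate=0.70)
print(f"Estimated monthly savings: ${savings / 100:,.2f}")

review = sample_calls_for_review([f"call-{i}" for i in range(6_000)])
print(len(review))
```

Under these assumptions the example prints $2,640.00, and the savings fall to zero when containment drops to one third, which makes escalation rate a key pilot metric.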
