
AI Adoption Evaluator

You need to evaluate AI adoption readiness and progress.

Use this when a client is evaluating whether to adopt an AI tool, feature, or capability and needs a structured framework to cut through the hype.

Platform context: Run this evaluation once per consuming team to build the platform team's adoption priority queue. The aggregate view across teams shows which AI capabilities have the broadest demand and the strongest case for platform-level investment.
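
The aggregate view can be sketched as a simple count of demand per capability. This is a minimal illustration, not part of the skill itself: the team names, capability names, and the rule that only "Adopt" and "Pilot first" count as demand are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical per-team results: (team, capability, recommendation).
# Team and capability names are illustrative, not taken from the skill.
evaluations = [
    ("checkout", "feedback summarization", "Adopt"),
    ("search", "feedback summarization", "Pilot first"),
    ("billing", "voice agent", "Defer"),
    ("support", "voice agent", "Pilot first"),
    ("growth", "feedback summarization", "Skip"),
]

# Assumption: only these two recommendations signal demand for platform investment.
DEMAND_SIGNALS = {"Adopt", "Pilot first"}


def demand_by_capability(rows):
    """Count the teams whose evaluation signals demand for each capability."""
    counts = Counter()
    for _team, capability, recommendation in rows:
        if recommendation in DEMAND_SIGNALS:
            counts[capability] += 1
    return counts


# Capabilities ordered from broadest demand to narrowest.
priority_queue = demand_by_capability(evaluations).most_common()
print(priority_queue)
```

A count is the crudest possible ranking. A real queue would probably also weigh the scorecard strength behind each recommendation.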


How it works

  1. You describe the AI tool or capability being considered and the current workflow
  2. The skill evaluates it against four critical questions
  3. It returns an evaluation scorecard, adoption risk assessment, and change management checklist

Prompt

You are evaluating an AI adoption decision for Kate Makrigiannis. Kate is a product strategist and consultant. The real challenge with AI adoption isn't "Can we use this tool?" — it's "Does this make our system better, or are we just adding more noise?" Strong product ops asks four questions before adopting anything AI-related. Your job is to apply those questions rigorously.

Inputs I will provide:

  • What's being evaluated: {{TOOL_OR_CAPABILITY}} (the AI tool, feature, or capability under consideration)
  • Current workflow: {{CURRENT_WORKFLOW}} (how the team does this work today, without the AI tool)
  • Team context (optional): {{TEAM_CONTEXT}} (team size, technical sophistication, change appetite)
  • Who's pushing for it (optional): {{CHAMPION}} (where the impetus is coming from — leadership, individual IC, vendor pitch)
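
If the inputs are filled programmatically rather than by hand, the substitution might look like the sketch below. The `fill_inputs` helper and the "not provided" default for optional inputs are assumptions for illustration. Only the placeholder names come from the list above.

```python
import re

# Matches placeholders such as {{TOOL_OR_CAPABILITY}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_inputs(template, values, optional=()):
    """Replace {{NAME}} placeholders; missing optional ones become 'not provided'."""
    def substitute(match):
        name = match.group(1)
        if name in values:
            return values[name]
        if name in optional:
            return "not provided"
        # A missing required input should fail loudly, not reach the model blank.
        raise KeyError(f"missing required input: {name}")

    return PLACEHOLDER.sub(substitute, template)


template = (
    "What's being evaluated: {{TOOL_OR_CAPABILITY}}\n"
    "Current workflow: {{CURRENT_WORKFLOW}}\n"
    "Team context: {{TEAM_CONTEXT}}\n"
    "Who's pushing for it: {{CHAMPION}}"
)

filled = fill_inputs(
    template,
    {
        "TOOL_OR_CAPABILITY": "AI voice agent for inbound customer service calls",
        "CURRENT_WORKFLOW": "IVR menu tree routes to human agents",
        "CHAMPION": "VP of Operations",
    },
    optional=("TEAM_CONTEXT", "CHAMPION"),
)
print(filled)
```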

Step 1: Apply the four evaluation questions

Q1: Is this solving a real problem, or are we chasing hype?

  • What specific problem does this tool solve?

  • Is the problem significant enough to justify the investment?

  • Would the team have prioritized this problem if AI weren't in the headlines?

  • Is there a simpler, non-AI solution that would work?

  • Architecture test: Is the team automating the existing workflow, or redesigning the workflow with AI in the loop? The real productivity gain from AI automation comes from rethinking the architecture, not from bolting AI onto existing processes. If the answer is "we want AI to do what we already do, but faster," probe whether the workflow itself should change.

  • Tool-tier test: Is the team evaluating the right tier of AI coding tool? The market has split into two camps: engineering amplifiers (Cursor, Claude Code, Windsurf -- make developers faster) and non-technical builders (Lovable, v0, bolt.new, Replit -- let non-developers ship). If the team is evaluating a developer amplifier but the actual users are PMs or designers, or vice versa, the tool-job fit is wrong before the evaluation even starts.

  • Build-path test: Is the team choosing between code, no-code, and AI-led building? The market now offers three paths to a working product:

    • Code (traditional engineering or vibe coding with Cursor/Claude Code) -- full ownership, full complexity
    • No-code platforms (Bubble, Retool, FlutterFlow) -- drag-and-drop, database-backed, moderate lock-in
    • AI-led builders (Rork, Repaint, Noah AI) -- describe-and-deploy, fastest start, highest lock-in risk

    If the team hasn't explicitly evaluated which path fits their ownership needs, data requirements, and graduation timeline, the adoption evaluation is incomplete. See /tool-recommend for the no-code cluster framework.

  • Market positioning probe: Is this tool an established leader with strong community validation (e.g., #1 reviewed, broad adoption), or an emerging tool where you'd be an early adopter? Both are valid, but the risk profile differs. Check knowledge/ai-market-landscape-reference.md for current positioning data.

Q2: How will we measure whether it's actually helping?

  • What metrics will indicate success?
  • What's the baseline today (before adoption)?
  • What does "good" look like in 30/60/90 days?
  • Who is responsible for tracking this?

Q3: What processes and responsibilities need to change?

  • What workflow changes are required?
  • What new skills does the team need?
  • What existing tools or processes does this replace or modify?
  • What happens to the quality of work that was previously human-reviewed?

Q4: Who is accountable when something goes sideways?

  • What could go wrong? (data quality issues, hallucinations, compliance risks, trust erosion)
  • Who owns the fallout?
  • What's the rollback plan?
  • Are there governance requirements (legal, compliance, security)?

Step 2: Score the evaluation

Question | Score | Evidence
Real problem? | Strong / Moderate / Weak | [reasoning]
Measurable? | Strong / Moderate / Weak | [reasoning]
Process change manageable? | Strong / Moderate / Weak | [reasoning]
Accountability clear? | Strong / Moderate / Weak | [reasoning]
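
For teams that want to store scorecards as data instead of free text, one possible shape is sketched below. The `ScorecardRow` class and the `weak_questions` and `render` helpers are hypothetical names introduced for illustration. The sample rows paraphrase the feature-request example later in this document.

```python
from dataclasses import dataclass

SCORES = ("Strong", "Moderate", "Weak")
QUESTIONS = (
    "Real problem?",
    "Measurable?",
    "Process change manageable?",
    "Accountability clear?",
)


@dataclass(frozen=True)
class ScorecardRow:
    question: str
    score: str
    evidence: str

    def __post_init__(self):
        # Reject anything outside the four questions and three score levels.
        if self.question not in QUESTIONS:
            raise ValueError(f"unknown question: {self.question}")
        if self.score not in SCORES:
            raise ValueError(f"unknown score: {self.score}")


def weak_questions(rows):
    """Return the questions scored Weak, in scorecard order."""
    return [row.question for row in rows if row.score == "Weak"]


def render(rows):
    """Render the scorecard as pipe-separated text rows."""
    lines = ["Question | Score | Evidence"]
    lines += [f"{r.question} | {r.score} | {r.evidence}" for r in rows]
    return "\n".join(lines)


scorecard = [
    ScorecardRow("Real problem?", "Moderate", "PM spends significant time on manual review"),
    ScorecardRow("Measurable?", "Weak", "No baseline metrics"),
    ScorecardRow("Process change manageable?", "Strong", "Low technical barrier"),
    ScorecardRow("Accountability clear?", "Weak", "Nobody validates the AI's ranking"),
]

print(render(scorecard))
print(weak_questions(scorecard))
```

Keeping the evidence next to each score matters more than the scores themselves, since a bare "Weak" gives the reader nothing to act on.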

Step 3: Assess adoption risks

Identify the top 3 adoption risks from the categories below:

  • Trust risk: Will the team trust the AI output enough to act on it?
  • Skills risk: Does the team have the skills to use it effectively?
  • Habit risk: Will people actually change their behavior?
  • Data quality risk: Is the input data good enough for the AI to produce useful output?
  • Integration risk: Does this fit into the existing tool stack?
  • Change fatigue risk: Is the team already overwhelmed with changes?
  • Governance architecture risk: Does the workflow need human-in-the-loop checkpoints? For regulated industries or sensitive workflows, look at tools with built-in approval steps and audit trails (e.g., Relay pattern) rather than fire-and-forget automation. Market validation confirms this: human-in-the-loop controls (Relay.app approval gates, Retool governance features, Make error-handling paths) are now table stakes in no-code automation, not premium add-ons.
  • Voice trust risk: (Voice AI only) Users may distrust AI voice more than AI text -- a robotic or uncanny voice erodes confidence faster than a mediocre chatbot response. Voice builds or destroys trust faster than text.
  • Accent/equity risk: (Voice AI only) Speech recognition accuracy varies by accent, dialect, and language. Deploying voice AI that works well for some users and poorly for others is a fairness issue, not just a quality issue. Test with representative user samples.
  • Regulatory/disclosure risk: (Voice AI only) Many jurisdictions require disclosure that the caller is speaking with an AI. Some require consent for call recording. Healthcare and financial services have additional constraints.

Step 4: Generate output

Evaluation Summary

What's being evaluated and why.

Four-Question Scorecard

Table with scores and evidence for each question.

Overall Recommendation

One of:

  • Adopt — clear value, manageable change, accountability in place
  • Pilot first — promising but untested assumptions; run a bounded experiment
  • Defer — not the right time, team, or problem
  • Skip — solving a problem that doesn't exist, or the non-AI solution is better

Adoption Risks

Top 3 risks with mitigation strategies.

Change Management Checklist

If adopting or piloting:

  • Define success metrics and baseline
  • Assign an owner for the pilot/rollout
  • Identify 2-3 power users to test first
  • Set a review date (30 days recommended)
  • Document the rollback plan
  • Brief the team on what's changing and why
  • Plan for the skill gap (training, documentation, pairing)

Measurement Plan

What to track, how to track it, and when to evaluate.

Workflow Redesign Sketch (if recommending Adopt or Pilot)

Before adoption, sketch how the workflow actually changes:

Current state → Future state:

Step | Current (who does what) | Future (who does what, with AI) | What changes | Quality gate
{{step}} | {{current actor and action}} | {{new actor/AI split}} | {{what's different}} | {{how quality is maintained}}

This sketch should answer:

  • Which steps does the AI handle vs. augment vs. not touch?
  • Where does a human review AI output before it moves forward?
  • What happens when the AI produces bad output? Who catches it?
  • What skills does the team need that they don't have today?

For deeper workflow redesign: Use /ai-workflow-redesign to produce a full human-AI operating model with role changes, quality assurance cadence, feedback loops, and transition phases.

Change Management Checklist (expanded)

If adopting or piloting:

  • Define success metrics and baseline
  • Assign an owner for the pilot/rollout
  • Identify 2-3 power users to test first
  • Set a review date (30 days recommended)
  • Document the rollback plan
  • Brief the team on what's changing and why
  • Plan for the skill gap (training, documentation, pairing)
  • Identify resistance patterns likely to emerge (see /change-readiness-assessment for framework)
  • Design the communication cadence (weekly updates during pilot, then monthly)
  • Define the "graduate from pilot" criteria -- what has to be true to expand?
  • Name who catches AI errors and how they're reported back to improve the workflow

ROI Framework

Before recommending Adopt or Pilot, size the financial picture:

  • Cost of adoption:

    • Implementation cost: integration work, vendor fees, setup time (estimate in person-hours or dollars)
    • Training cost: how long before the team is proficient? What productivity is lost during ramp?
    • Ongoing operational cost: subscription fees, maintenance, data pipeline costs, monitoring overhead
    • Total estimated cost for Year 1: [VERIFY: refine with actual vendor pricing and team capacity]
  • Expected value:

    • Time saved: hours per week/month reclaimed across the team (be specific -- "saves the PM 3 hours/week on manual feedback triage")
    • Error reduction: what mistakes does this prevent, and what do those mistakes cost today?
    • Revenue impact: does this tool directly enable revenue (e.g., faster time to market, better conversion) or is the value purely operational?
    • Estimated annual value: [VERIFY: confirm baseline metrics before committing to this number]
  • Payback period: How many months until cumulative value exceeds cumulative cost? If the payback period exceeds 6 months for a tool the team has never used, that is a yellow flag.

  • Opportunity cost: What else could this investment fund? If the team spends a quarter integrating an AI tool, what feature work, research, or infrastructure gets deferred? Name the specific trade-off.

  • Strategic fit: Does this AI adoption strengthen the company's competitive position (defensible advantage, proprietary data flywheel, unique capability) -- or is it table-stakes that every competitor will also have? Table-stakes adoption is fine, but it should be sized as a small bet, not a strategic initiative.
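
The payback-period check above can be worked through with a short calculation. This is a sketch under simplifying assumptions: costs and value are flat per month, and all dollar figures are made up for illustration. The `payback_month` function is a hypothetical helper, not part of the skill.

```python
def payback_month(upfront_cost, monthly_cost, monthly_value, horizon_months=36):
    """Return the first month in which cumulative value exceeds cumulative cost.

    Returns None if that never happens within the horizon.
    """
    cumulative_cost = upfront_cost
    cumulative_value = 0
    for month in range(1, horizon_months + 1):
        cumulative_cost += monthly_cost
        cumulative_value += monthly_value
        if cumulative_value > cumulative_cost:
            return month
    return None


# Illustrative figures only: $12,000 to implement, $500/month to run,
# $2,500/month in reclaimed time.
months = payback_month(upfront_cost=12_000, monthly_cost=500, monthly_value=2_500)

# The 6-month threshold is the yellow-flag rule stated above.
yellow_flag = months is None or months > 6
print(months, yellow_flag)
```

With these figures, value only equals cost at month 6 ($15,000 each) and first exceeds it in month 7, so the tool lands just past the yellow-flag threshold.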

If the ROI math doesn't work or is too speculative to estimate, that's a signal: either the problem isn't well enough understood, or the tool isn't mature enough. Flag with .


For assessing product ops infrastructure needed to support AI adoption, use the product-ops-assessment skill. For PLG readiness when AI is part of the growth model, use plg-readiness-check.

Examples

Input

  • What's being evaluated: "AI agent that tallies and ranks feature requests across sales calls, support tickets, and customer transcripts"
  • Current workflow: "PM manually reviews top support tickets weekly and checks in with sales lead monthly. Feedback lives in Slack, Zendesk, and Gong."
  • Who's pushing for it: "VP of Product saw a demo at a conference"

Output (abbreviated)

Four-Question Scorecard:

Question | Score | Evidence
Real problem? | Moderate | The PM does spend significant time on manual feedback review. But the real question is: is the problem that feedback isn't aggregated, or that the team doesn't know what questions to ask of the feedback?
Measurable? | Weak | No baseline metrics for PM time on feedback, feedback coverage, or decision quality. Hard to measure improvement without a starting point.
Process change manageable? | Strong | Low technical barrier. PM would review a dashboard instead of multiple tools.
Accountability clear? | Weak | Who validates the AI's ranking? If the AI surfaces the wrong priorities, who catches it?

Overall Recommendation: Pilot first. The problem is real but the tool may be solving the wrong layer of it. Tallying feature requests faster doesn't help if the team needs to understand why customers are asking. Pilot with one PM for 30 days. Measure: does the PM make better decisions, or just process more data?

Input (voice AI example)

  • What's being evaluated: "AI voice agent to handle inbound customer service calls (order status, returns, FAQ)"
  • Current workflow: "IVR menu tree routes to human agents. 60% of calls are order status or return label requests. Average handle time 4 minutes. Agent cost ~$1.20/call."
  • Who's pushing for it: "VP of Operations saw Klarna case study"

Output (abbreviated)

Four-Question Scorecard:

Question | Score | Evidence
Real problem? | Strong | 60% of calls are simple lookups -- classic automation candidate. Cost per call ($1.20) vs. AI (~$0.40/call) is a clear savings at volume.
Measurable? | Strong | Baseline exists: handle time, cost/call, CSAT, call volume by type. Easy to compare AI vs. human on same call types.
Process change manageable? | Moderate | Agents need retraining for escalation handling. Customers need disclosure ("you're speaking with an AI"). Regulatory check needed for call recording in applicable jurisdictions.
Accountability clear? | Moderate | Who owns it when the voice agent gives wrong return instructions? Need to define: who reviews transcripts, how errors are caught, what the human escalation trigger is.

Overall Recommendation: Pilot first. Strong unit economics but voice has trust and regulatory dimensions that text doesn't. Pilot with one call type (order status) for 30 days. Measure: resolution rate, CSAT, escalation rate, and -- critically -- listen to 50 random calls to catch quality issues that metrics miss.
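
The unit economics in this example can be sized with a rough calculation. The per-call costs ($1.20 human, ~$0.40 AI) and the 60% automatable share come from the example inputs. The monthly call volume and the containment rate (the share of automatable calls the AI resolves without escalation) are assumptions for illustration. Amounts are kept in whole cents to avoid floating-point rounding.

```python
def monthly_savings_cents(calls_per_month, automatable_pct, containment_pct,
                          human_cost_cents=120, ai_cost_cents=40):
    """Estimate monthly savings from routing automatable calls to a voice agent.

    Contained calls save the difference between human and AI cost.
    Escalated calls still reach a human, so they add the AI cost on top.
    """
    automatable = calls_per_month * automatable_pct // 100
    contained = automatable * containment_pct // 100
    escalated = automatable - contained
    return contained * (human_cost_cents - ai_cost_cents) - escalated * ai_cost_cents


# Hypothetical volume (10,000 calls/month) and containment rate (70%).
savings = monthly_savings_cents(10_000, automatable_pct=60, containment_pct=70)
print(f"${savings / 100:,.2f} per month")

# Upper bound: every automatable call is resolved without escalation.
ceiling = monthly_savings_cents(10_000, automatable_pct=60, containment_pct=100)
print(f"${ceiling / 100:,.2f} per month at full containment")
```

The gap between the two figures is why escalation rate belongs in the pilot metrics: savings fall quickly as more calls bounce back to human agents.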
