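Use this when someone wants to understand their personal AI maturity level across 7 dimensions (6 core dimensions plus Agent Operations, which is assessed only when the team has deployed agents) and get specific growth recommendations. Works for PMs, Designers, and Engineers.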
Process
Step 1: Identify the role
Ask: What is your role? (Product Manager, Designer, or Engineer)
This determines which behavioral indicators to use for each dimension.
Step 2: Assess each dimension
For each of the 7 dimensions below, ask 2-3 behavioral questions tailored to their role. Ask about what they actually do, not what they know in theory.
The 7 dimensions (Agent Operations applies only if the team has deployed agents):
- Prompt & Interaction Quality — How well they craft inputs and structure human-AI interaction
- Evaluation Discipline — How rigorously they review and validate AI outputs
- Workflow Integration — How deeply AI is embedded in their daily work
- Context & Knowledge Management — How well they structure context for AI tools
- Governance & Bounded Autonomy — How clearly they draw boundaries for AI autonomy vs. human review
- AI Foundations — Their understanding of core AI/ML concepts
- Agent Operations — How well they manage AI agents and autonomous workflows in production (monitoring, cost control, error recovery, deployment)
Scoring levels:
- 1 = Not Yet Started (no engagement with AI tools)
- 2 = Growing (experimenting, inconsistent results)
- 3 = Meets Expectations (effective daily use with review discipline)
- 4 = Exceeds Expectations (team multiplier, defines patterns for others)
- 5 = Leading (shapes organizational culture, drives cross-team standards)
Key maturity signal -- the "experiment to infrastructure" transition: The clearest indicator of moving from level 2 to level 3 is when AI tools stop being experiments and start being treated as infrastructure. At level 2, people say "I tried using AI for..." At level 3, AI is assumed -- it's wired into workflows before anyone consciously chooses it. At level 4-5, teams redesign workflows around AI capabilities rather than bolting AI onto existing processes. Probe for this transition explicitly: "Do you experiment with AI, or is it already part of how work gets done?"
Tool-tier awareness signal: At level 2, people use whatever AI coding tool someone recommended. At level 3, they consciously choose between tool tiers -- using Cursor for codebase work but Lovable for quick internal-tool prototypes. At level 4-5, they match tool tiers to the job systematically: engineering amplifiers for production code, prompt-to-app builders for prototypes and internal tools, agent orchestration for complex multi-step workflows. Probe: "Do you use different AI tools for different types of work, or the same tool for everything?"
Platform selection maturity signal: Beyond AI coding tools, probe whether the person evaluates build-vs-buy-vs-no-code decisions deliberately:
- Level 2: Uses whatever tool was recommended or is trending. No awareness of lock-in or data ownership trade-offs.
- Level 3: Consciously chooses between code, no-code platforms (Bubble, Retool, FlutterFlow), and AI-led builders (Rork, Repaint) based on the job.
- Level 4-5: Evaluates graduation paths proactively -- knows when a prototype should move from a no-code platform to owned infrastructure. Considers data ownership, vendor lock-in, and cost scaling as first-order selection criteria.
Probe: "When you need to build an internal tool or prototype, how do you decide whether to code it, use a no-code platform, or use an AI builder?"
Example questions by role:
PM:
- "When your team reaches for a no-code tool, what's your process for evaluating whether it's the right choice vs. building with code?"
- "When you use AI to draft a user story, what does your review process look like?"
- "How do you provide context to AI tools about your current project?"
- "What rules does your team have about when AI output needs human review?"
Designer:
- "How do you use AI tools in your design workflow today?"
- "When AI generates design suggestions, how do you evaluate them against brand and accessibility standards?"
- "How do you structure context (personas, brand guidelines) for AI tools?"
Engineer:
- "How do you use AI for writing tests or implementation code?"
- "What's your review process for AI-generated code before it ships?"
- "How do you set up context (codebase, conventions) for AI coding tools?"
Agent Operations (all roles -- skip if the team hasn't deployed agents yet):
- "If an AI agent fails mid-workflow, what happens? Retry, fallback, human escalation?"
- "How do you track agent costs separately from other AI usage?"
- "What's your process for updating prompts or tools in a deployed agent?"
- "How do you monitor whether agents are actually producing good results over time?"
Ask one dimension at a time. Listen to the answer before moving on.
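One way to picture how the role from Step 1 selects the questions in Step 2 is a small lookup table. This is only a sketch: the names `QUESTION_BANK`, `AGENT_OPS_QUESTIONS`, `questions_for`, and the `agents_deployed` flag are hypothetical and not part of this skill. The question text is copied from the examples above.

```python
# Hypothetical sketch: the names and structure here are illustrative assumptions.
QUESTION_BANK = {
    "PM": [
        "When your team reaches for a no-code tool, what's your process for "
        "evaluating whether it's the right choice vs. building with code?",
        "When you use AI to draft a user story, what does your review process look like?",
        "How do you provide context to AI tools about your current project?",
        "What rules does your team have about when AI output needs human review?",
    ],
    "Designer": [
        "How do you use AI tools in your design workflow today?",
        "When AI generates design suggestions, how do you evaluate them against "
        "brand and accessibility standards?",
        "How do you structure context (personas, brand guidelines) for AI tools?",
    ],
    "Engineer": [
        "How do you use AI for writing tests or implementation code?",
        "What's your review process for AI-generated code before it ships?",
        "How do you set up context (codebase, conventions) for AI coding tools?",
    ],
}

# Asked for every role, but only when the team has deployed agents.
AGENT_OPS_QUESTIONS = [
    "If an AI agent fails mid-workflow, what happens? Retry, fallback, human escalation?",
    "How do you track agent costs separately from other AI usage?",
    "What's your process for updating prompts or tools in a deployed agent?",
    "How do you monitor whether agents are actually producing good results over time?",
]


def questions_for(role: str, agents_deployed: bool) -> list[str]:
    """Return the example questions for a role, adding Agent Operations if relevant."""
    if role not in QUESTION_BANK:
        raise ValueError(f"Unknown role: {role!r}")
    questions = list(QUESTION_BANK[role])  # copy so the bank itself is never mutated
    if agents_deployed:
        questions += AGENT_OPS_QUESTIONS
    return questions
```

For example, `questions_for("Engineer", agents_deployed=True)` returns the three Engineer questions followed by the four Agent Operations questions. The questions would still be asked one dimension at a time rather than all at once.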
Step 3: Score and explain
After gathering answers for all assessed dimensions, score each 1-5 based on their consistent behavior (not their best day).
Briefly explain the reasoning for each score.
Step 4: Generate the assessment
Output in this format:
AI Maturity Assessment: (name or role)
Date: (today's date) Role: (PM / Designer / Engineer)
Dimension scores
| Dimension | Score | Key observation |
|---|---|---|
| Prompt & Interaction Quality | (1-5) | (one sentence) |
| Evaluation Discipline | (1-5) | (one sentence) |
| Workflow Integration | (1-5) | (one sentence) |
| Context & Knowledge Management | (1-5) | (one sentence) |
| Governance & Bounded Autonomy | (1-5) | (one sentence) |
| AI Foundations | (1-5) | (one sentence) |
| Agent Operations | (1-5, or N/A if no agents are deployed) | (one sentence) |
Overall maturity level: (lowest score)
The overall maturity level uses the weakest-link model: the overall level equals the lowest score among the assessed dimensions, because a gap in any one dimension limits the effectiveness of all the others.
Bottleneck dimension: (dimension name)
(Why this dimension matters and how it limits the others)
Growth recommendations
Priority 1: (specific, actionable recommendation for the weakest dimension)
Priority 2: (recommendation for the second weakest)
One practice to adopt this week
(A single, concrete weekly habit that addresses the bottleneck — something they can start doing Monday)
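The weakest-link rule used in the template above can be sketched in a few lines of Python. The function name `summarize`, the use of `None` for a skipped dimension, and the example scores are assumptions made for illustration. They are not taken from a real assessment.

```python
from typing import Optional


def summarize(scores: dict[str, Optional[int]]) -> dict:
    """Apply the weakest-link rule to 1-5 dimension scores.

    A score of None marks a skipped dimension (an assumption of this sketch,
    e.g. Agent Operations when no agents are deployed) and is ignored.
    """
    assessed = {dim: s for dim, s in scores.items() if s is not None}
    if not assessed:
        raise ValueError("At least one dimension must be scored")
    for dim, s in assessed.items():
        if not 1 <= s <= 5:
            raise ValueError(f"Score for {dim!r} must be between 1 and 5, got {s}")

    overall = min(assessed.values())  # overall level = lowest score
    bottlenecks = [dim for dim, s in assessed.items() if s == overall]  # ties kept
    # sorted() is stable, so equal scores stay in their original order.
    priorities = sorted(assessed, key=assessed.get)[:2]
    return {"overall": overall, "bottlenecks": bottlenecks, "priorities": priorities}


# Hypothetical scores for illustration only.
example_scores = {
    "Prompt & Interaction Quality": 3,
    "Evaluation Discipline": 2,
    "Workflow Integration": 4,
    "Context & Knowledge Management": 3,
    "Governance & Bounded Autonomy": 3,
    "AI Foundations": 3,
    "Agent Operations": None,  # skipped: no agents deployed
}
result = summarize(example_scores)
print(result)
```

With these scores the overall level is 2, the bottleneck is Evaluation Discipline, and the two priorities are Evaluation Discipline followed by Prompt & Interaction Quality. When two dimensions tie for the lowest score, both appear in `bottlenecks`.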
Step 5: Discuss
Ask:
- Does this assessment feel accurate?
- Any scores you'd adjust based on things I didn't ask about?
- Want to explore specific practices to grow in your bottleneck dimension?
Output location
Present the assessment as formatted text in the conversation.
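If the dimension table is ever assembled programmatically rather than typed out, a minimal rendering sketch might look like the following. The helper name `render_table` and the sample rows are hypothetical. The column headers match the template in Step 4.

```python
def render_table(rows: list[tuple[str, int, str]]) -> str:
    """Render (dimension, score, observation) rows as a pipe-delimited table."""
    lines = ["| Dimension | Score | Key observation |", "|---|---|---|"]
    for dimension, score, observation in rows:
        # Escape pipes so free-text observations cannot break the column layout.
        safe_observation = observation.replace("|", "\\|")
        lines.append(f"| {dimension} | {score} | {safe_observation} |")
    return "\n".join(lines)


# Hypothetical rows for illustration only.
sample_rows = [
    ("Evaluation Discipline", 2, "Reviews output informally; no checklist."),
    ("Workflow Integration", 3, "Uses AI weekly for docs."),
]
table_text = render_table(sample_rows)
print(table_text)
```

The output is plain text, so it can be pasted straight into the conversation as the format above requires.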
Example Output
Input
- Role: Product Manager
- Name/Context: Priya Nair, Senior PM at Meridian Health (mid-size digital health company, ~800 employees)
- Self-described AI usage: "I use ChatGPT pretty regularly for drafting PRDs and user stories, and I've started experimenting with Notion AI. My team recently deployed a patient intake automation agent but I'm not super involved in monitoring it."
- Sample answers to assessment questions:
- Prompt quality: "I usually just describe what I need and paste in some context. I don't have a consistent template."
- Evaluation: "I read the output and edit it — I'd say I catch obvious mistakes but I don't have a formal checklist."
- Workflow integration: "AI is in my weekly workflow for docs, but I don't use it for roadmap prioritization or stakeholder comms yet."
- Context management: "I paste in our product brief when I remember. No system for it."
- Governance: "We don't really have team rules about when to review AI output vs. ship it directly. It's individual judgment."
- AI foundations: "I know what LLMs are and understand hallucination risk, but I couldn't explain embeddings or fine-tuning."
- Agent ops: "The intake agent is live but our eng team monitors it. I don't know what the failure mode is or what it costs per run."
Output (abbreviated)
AI Maturity Assessment: Priya Nair
Date: June 14, 2025 Role: Product Manager — Meridian Health
Dimension Scores
| Dimension | Score | Key Observation |
|---|---|---|
| Prompt & Interaction Quality | 2 | Uses AI reactively with ad-hoc inputs; no prompt templates or structured interaction patterns. |
| Evaluation Discipline | 2 | Applies judgment informally but has no review checklist or consistent validation standard. |
| Workflow Integration | 3 | AI is reliably embedded in documentation work, but absent from higher-leverage PM tasks like prioritization. |
| Context & Knowledge Management | 2 | No persistent context system; project knowledge shared inconsistently depending on what's remembered. |
| Governance & Bounded Autonomy | 1 | No team-level rules about AI review thresholds; each person decides individually, including in a production agent context. |
| AI Foundations | 3 | Solid working knowledge of LLM behavior and risk; gap in technical depth limits evaluation of architectural trade-offs. |
| Agent Operations | 1 | Live agent in production with no PM visibility into cost, failure modes, or output quality monitoring. |
Overall Maturity Level: 1
The weakest-link model applies — Priya's genuine strengths in workflow integration and AI foundations are capped by critical gaps in governance and agent operations. A production patient intake agent running without PM-level oversight represents meaningful compliance and quality risk in a regulated industry.
Bottleneck Dimension: Governance & Bounded Autonomy (tied with Agent Operations)
Meridian Health operates under HIPAA constraints. Without explicit team-level rules defining when AI output requires human review before patient-facing use, every individual on the team is making implicit risk decisions. The live intake agent compounds this — it's infrastructure, not an experiment, but it's being treated like an experiment no one owns.
Growth Recommendations
Priority 1: Define a two-tier review policy with your team this sprint. Tier 1 = internal docs (user stories, PRDs) — spot-check only. Tier 2 = anything patient-facing or feeding a downstream system — mandatory human review before use. Write it as a one-pager and get team sign-off.
Priority 2: Schedule a 30-minute sync with the engineer running the intake agent. Leave with three answers: what triggers a failure alert, what it costs per 100 runs, and what the fallback is if it errors mid-intake. Document these as the agent's operational spec and own it as PM.
One Practice to Adopt This Week
Monday: Create a shared Notion doc titled "AI Context Pack — [Current Quarter]" and paste in your product vision, top 3 OKRs, and one active user persona. Link it in your team Slack channel. Use it as your default context block every time you open a new AI conversation this week. By Friday you'll have a reusable context system instead of starting from scratch each session.
Does this feel accurate? Any dimensions you'd score differently based on context I didn't capture? Happy to go deeper on governance practices for regulated environments specifically.