AI Maturity Assess - AI Agent Skill

Use this when someone wants to understand their personal AI maturity level across 7 dimensions and get specific growth recommendations. Works for PMs, Designers, and Engineers.

Process

Step 1: Identify the role

Ask: What is your role? (Product Manager, Designer, or Engineer)

This determines which behavioral indicators to use for each dimension.

Step 2: Assess each dimension

For each of the 7 dimensions below, ask 2-3 behavioral questions tailored to their role. Ask about what they actually do, not what they know in theory.

The 7 dimensions:

Prompt & Interaction Quality -- How well they craft inputs and structure human-AI interaction
Evaluation Discipline -- How rigorously they review and validate AI outputs
Workflow Integration -- How deeply AI is embedded in their daily work
Context & Knowledge Management -- How well they structure context for AI tools
Governance & Bounded Autonomy -- How clearly they draw boundaries for AI autonomy vs. human review
AI Foundations -- Their understanding of core AI/ML concepts
Agent Operations -- How well they manage AI agents and autonomous workflows in production (monitoring, cost control, error recovery, deployment)

Scoring levels:

1 = Not Yet Started (no engagement with AI tools)
2 = Growing (experimenting, inconsistent results)
3 = Meets Expectations (effective daily use with review discipline)
4 = Exceeds Expectations (team multiplier, defines patterns for others)
5 = Leading (shapes organizational culture, drives cross-team standards)

Key maturity signal -- the "experiment to infrastructure" transition: The clearest indicator of moving from level 2 to level 3 is when AI tools stop being experiments and start being treated as infrastructure. At level 2, people say "I tried using AI for..." At level 3, AI is assumed -- it's wired into workflows before anyone consciously chooses it. At level 4-5, teams redesign workflows around AI capabilities rather than bolting AI onto existing processes. Probe for this transition explicitly: "Do you experiment with AI, or is it already part of how work gets done?"

Tool-tier awareness signal: At level 2, people use whatever AI coding tool someone recommended. At level 3, they consciously choose between tool tiers -- using Cursor for codebase work but Lovable for quick internal-tool prototypes. At level 4-5, they match tool tiers to the job systematically: engineering amplifiers for production code, prompt-to-app builders for prototypes and internal tools, agent orchestration for complex multi-step workflows. Probe: "Do you use different AI tools for different types of work, or the same tool for everything?"

Platform selection maturity signal: Beyond AI coding tools, probe whether the person evaluates build-vs-buy-vs-no-code decisions deliberately:

Level 2: Uses whatever tool was recommended or is trending. No awareness of lock-in or data ownership trade-offs.
Level 3: Consciously chooses between code, no-code platforms (Bubble, Retool, FlutterFlow), and AI-led builders (Rork, Repaint) based on the job.
Level 4-5: Evaluates graduation paths proactively -- knows when a prototype should move from a no-code platform to owned infrastructure. Considers data ownership, vendor lock-in, and cost scaling as first-order selection criteria.

Probe: "When you need to build an internal tool or prototype, how do you decide whether to code it, use a no-code platform, or use an AI builder?"

Example questions by role:

PM:

"When your team reaches for a no-code tool, what's your process for evaluating whether it's the right choice vs. building with code?"
"When you use AI to draft a user story, what does your review process look like?"
"How do you provide context to AI tools about your current project?"
"What rules does your team have about when AI output needs human review?"

Designer:

"How do you use AI tools in your design workflow today?"
"When AI generates design suggestions, how do you evaluate them against brand and accessibility standards?"
"How do you structure context (personas, brand guidelines) for AI tools?"

Engineer:

"How do you use AI for writing tests or implementation code?"
"What's your review process for AI-generated code before it ships?"
"How do you set up context (codebase, conventions) for AI coding tools?"

Agent Operations (all roles -- skip if the team hasn't deployed agents yet):

"If an AI agent fails mid-workflow, what happens? Retry, fallback, human escalation?"
"How do you track agent costs separately from other AI usage?"
"What's your process for updating prompts or tools in a deployed agent?"
"How do you monitor whether agents are actually producing good results over time?"

Ask about one dimension at a time. Listen to the answer before moving on.
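The role-based selection described above can be sketched as a small lookup. This is an illustrative sketch only: the names ROLE_QUESTIONS, AGENT_OPS_QUESTIONS, and questions_for are hypothetical, and the question text is copied from the examples above (two per group to keep it short).

```python
# Hypothetical question bank; question text is taken from the examples above.
ROLE_QUESTIONS = {
    "PM": [
        "When you use AI to draft a user story, what does your review process look like?",
        "How do you provide context to AI tools about your current project?",
    ],
    "Designer": [
        "How do you use AI tools in your design workflow today?",
        "How do you structure context (personas, brand guidelines) for AI tools?",
    ],
    "Engineer": [
        "What's your review process for AI-generated code before it ships?",
        "How do you set up context (codebase, conventions) for AI coding tools?",
    ],
}

AGENT_OPS_QUESTIONS = [
    "If an AI agent fails mid-workflow, what happens? Retry, fallback, human escalation?",
    "How do you track agent costs separately from other AI usage?",
]


def questions_for(role: str, has_deployed_agents: bool) -> list[str]:
    """Return the role's questions, adding Agent Operations only if agents are deployed."""
    if role not in ROLE_QUESTIONS:
        raise ValueError(f"Unknown role: {role!r}")
    questions = list(ROLE_QUESTIONS[role])  # copy so the bank is not mutated
    if has_deployed_agents:
        questions.extend(AGENT_OPS_QUESTIONS)
    return questions


for question in questions_for("Engineer", has_deployed_agents=False):
    print(question)
```

Keeping Agent Operations in a separate list mirrors the instruction to skip it when the team has not deployed agents yet.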

Step 3: Score and explain

After gathering answers for every assessed dimension, score each 1-5 based on their consistent behavior (not their best day). Leave Agent Operations unscored if it was skipped.

Briefly explain the reasoning for each score.

Step 4: Generate the assessment

Output in this format:

AI Maturity Assessment: (name or role)

Date: (today's date) Role: (PM / Designer / Engineer)

Dimension scores

Dimension	Score	Key observation
Prompt & Interaction Quality	(1-5)	(one sentence)
Evaluation Discipline	(1-5)	(one sentence)
Workflow Integration	(1-5)	(one sentence)
Context & Knowledge Management	(1-5)	(one sentence)
Governance & Bounded Autonomy	(1-5)	(one sentence)
AI Foundations	(1-5)	(one sentence)
Agent Operations	(1-5, or N/A if skipped)	(one sentence)

Overall maturity level: (lowest score)

The overall maturity level uses the weakest-link model -- your overall level equals your lowest dimension score. This reflects that a gap in any dimension limits the effectiveness of all others.

Bottleneck dimension: (dimension name)

(Why this dimension matters and how it limits the others)

Growth recommendations

Priority 1: (specific, actionable recommendation for the weakest dimension)

Priority 2: (recommendation for the second weakest)

One practice to adopt this week

(A single, concrete weekly habit that addresses the bottleneck -- something they can start doing Monday)
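The weakest-link scoring used in this format can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: the function name overall_maturity is hypothetical, the scores in the usage example are illustrative, and treating a skipped dimension as None and excluding it from the minimum is an assumption, since the skill does not say how a skipped Agent Operations dimension affects the overall level.

```python
from typing import Optional


def overall_maturity(scores: dict[str, Optional[int]]) -> tuple[int, list[str]]:
    """Weakest-link model: the overall level is the lowest scored dimension.

    None marks a skipped dimension (assumption: it is excluded from the minimum).
    Returns the overall level and every dimension tied at that level.
    """
    scored = {name: s for name, s in scores.items() if s is not None}
    if not scored:
        raise ValueError("at least one dimension must be scored")
    for name, s in scored.items():
        if not 1 <= s <= 5:
            raise ValueError(f"{name}: score {s} is outside the 1-5 scale")
    overall = min(scored.values())
    # Several dimensions can tie for lowest; report all of them as bottlenecks.
    bottlenecks = [name for name, s in scored.items() if s == overall]
    return overall, bottlenecks


# Illustrative scores, not real assessment data.
level, bottlenecks = overall_maturity({
    "Prompt & Interaction Quality": 2,
    "Evaluation Discipline": 2,
    "Workflow Integration": 3,
    "Context & Knowledge Management": 2,
    "Governance & Bounded Autonomy": 1,
    "AI Foundations": 3,
    "Agent Operations": 1,
})
print(level, bottlenecks)
# prints: 1 ['Governance & Bounded Autonomy', 'Agent Operations']
```

Returning every tied dimension, rather than only the first, lets the assessment name a shared bottleneck when two dimensions have the same lowest score.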

Step 5: Discuss

Ask:

Does this assessment feel accurate?
Any scores you'd adjust based on things I didn't ask about?
Want to explore specific practices to grow in your bottleneck dimension?

Output location

Present the assessment as formatted text in the conversation.

Example Output

Input

Role: Product Manager
Name/Context: Priya Nair, Senior PM at Meridian Health (mid-size digital health company, ~800 employees)
Self-described AI usage: "I use ChatGPT pretty regularly for drafting PRDs and user stories, and I've started experimenting with Notion AI. My team recently deployed a patient intake automation agent but I'm not super involved in monitoring it."
Sample answers to assessment questions:
- Prompt quality: "I usually just describe what I need and paste in some context. I don't have a consistent template."
- Evaluation: "I read the output and edit it — I'd say I catch obvious mistakes but I don't have a formal checklist."
- Workflow integration: "AI is in my weekly workflow for docs, but I don't use it for roadmap prioritization or stakeholder comms yet."
- Context management: "I paste in our product brief when I remember. No system for it."
- Governance: "We don't really have team rules about when to review AI output vs. ship it directly. It's individual judgment."
- AI foundations: "I know what LLMs are and understand hallucination risk, but I couldn't explain embeddings or fine-tuning."
- Agent ops: "The intake agent is live but our eng team monitors it. I don't know what the failure mode is or what it costs per run."

Output (abbreviated)

AI Maturity Assessment: Priya Nair

Date: June 14, 2025 Role: Product Manager — Meridian Health

Dimension Scores

Dimension	Score	Key Observation
Prompt & Interaction Quality	2	Uses AI reactively with ad-hoc inputs; no prompt templates or structured interaction patterns.
Evaluation Discipline	2	Applies judgment informally but has no review checklist or consistent validation standard.
Workflow Integration	3	AI is reliably embedded in documentation work, but absent from higher-leverage PM tasks like prioritization.
Context & Knowledge Management	2	No persistent context system; project knowledge shared inconsistently depending on what's remembered.
Governance & Bounded Autonomy	1	No team-level rules about AI review thresholds; each person decides individually, including in a production agent context.
AI Foundations	3	Solid working knowledge of LLM behavior and risk; gap in technical depth limits evaluation of architectural trade-offs.
Agent Operations	1	Live agent in production with no PM visibility into cost, failure modes, or output quality monitoring.

Overall Maturity Level: 1

The weakest-link model applies — Priya's genuine strengths in workflow integration and AI foundations are capped by critical gaps in governance and agent operations. A production patient intake agent running without PM-level oversight represents meaningful compliance and quality risk in a regulated industry.

Bottleneck Dimension: Governance & Bounded Autonomy (tied with Agent Operations)

Meridian Health operates under HIPAA constraints. Without explicit team-level rules defining when AI output requires human review before patient-facing use, every individual on the team is making implicit risk decisions. The live intake agent compounds this — it's infrastructure, not an experiment, but it's being treated like an experiment no one owns.

Growth Recommendations

Priority 1: Define a two-tier review policy with your team this sprint. Tier 1 = internal docs (user stories, PRDs) — spot-check only. Tier 2 = anything patient-facing or feeding a downstream system — mandatory human review before use. Write it as a one-pager and get team sign-off.

Priority 2: Schedule a 30-minute sync with the engineer running the intake agent. Leave with three answers: what triggers a failure alert, what it costs per 100 runs, and what the fallback is if it errors mid-intake. Document these as the agent's operational spec and own it as PM.

One Practice to Adopt This Week

Monday: Create a shared Notion doc titled "AI Context Pack — [Current Quarter]" and paste in your product vision, top 3 OKRs, and one active user persona. Link it in your team Slack channel. Use it as your default context block every time you open a new AI conversation this week. By Friday you'll have a reusable context system instead of starting from scratch each session.

Does this feel accurate? Any dimensions you'd score differently based on context I didn't capture? Happy to go deeper on governance practices for regulated environments specifically.
