UX Audit - AI Agent Skill

Use this when you're joining an existing product and need to quickly understand where the user experience falls short -- or when it's time for a periodic reassessment. Produces a prioritized gap map with severity ratings, evidence, and recommendations.

Framework attribution: Grounded in Agile UCD (inclusivity and accessible-by-default design) as practiced at Pivotal Labs, alongside Nielsen's usability heuristics. See knowledge/agile-ucd-reference.md.

Related skills: For a full WCAG accessibility evaluation (perceivable, operable, understandable, robust), use /accessibility-audit -- this skill covers Nielsen's heuristics plus a baseline accessibility quick screen, not a complete standards review. For evaluative testing with real users, use /usability-test-plan.

Process

Step 1: Gather inputs

Ask the user to provide:

Product description -- what it does, who it's for, current state
Raw observations -- notes from walking through the product yourself (screenshots, flow descriptions, pain points noticed)
User data -- any available support tickets, NPS responses, app store reviews, analytics (drop-off rates, low-adoption features, time-on-task data)
User segments -- who are the primary users? Different users experience different gaps.
Core flows to assess -- which user flows matter most? (e.g., onboarding, core task, checkout, search)

The user can paste raw notes, screenshots, or data exports. If they haven't walked through the product yet, advise them to spend 2-4 hours using it before running the audit.
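When the user supplies raw funnel counts rather than ready-made percentages, the drop-off rates mentioned above can be derived in a few lines. This is a minimal sketch; the function name `drop_off_rates`, the stage names, and the counts are illustrative assumptions, not data from any real product.

```python
def drop_off_rates(funnel):
    """Return (from_stage, to_stage, drop_off_fraction) for each adjacent pair.

    `funnel` is an ordered list of (stage_name, user_count) tuples.
    """
    rates = []
    for (prev_name, prev_count), (next_name, next_count) in zip(funnel, funnel[1:]):
        if prev_count <= 0:
            raise ValueError(f"stage {prev_name!r} has no users")
        rates.append((prev_name, next_name, 1 - next_count / prev_count))
    return rates


# Hypothetical counts, for illustration only.
example_funnel = [
    ("account created", 1000),
    ("first report ordered", 390),
    ("first proposal sent", 273),
]
for start, end, rate in drop_off_rates(example_funnel):
    print(f"{start} -> {end}: {rate:.0%} drop-off")
```

The stage with the largest drop-off is usually a good candidate for the core flows listed above.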

Step 2: Heuristic evaluation

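Analyze the observations against Nielsen's 10 usability heuristics, plus two supplementary checks (accessibility and, for multi-locale products, internationalization):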

## UX Gap Assessment -- (Product name, date)

### Heuristic Evaluation

| # | Heuristic | Rating | Key Findings |
|---|-----------|--------|-------------|
| 1 | Visibility of system status | (Good / Issues found) | (Brief summary) |
| 2 | Match between system and real world | (Good / Issues found) | (Brief summary) |
| 3 | User control and freedom | (Good / Issues found) | (Brief summary) |
| 4 | Consistency and standards | (Good / Issues found) | (Brief summary) |
| 5 | Error prevention | (Good / Issues found) | (Brief summary) |
| 6 | Recognition rather than recall | (Good / Issues found) | (Brief summary) |
| 7 | Flexibility and efficiency | (Good / Issues found) | (Brief summary) |
| 8 | Aesthetic and minimalist design | (Good / Issues found) | (Brief summary) |
| 9 | Error recovery | (Good / Issues found) | (Brief summary) |
| 10 | Help and documentation | (Good / Issues found) | (Brief summary) |
| 11 | Accessibility (WCAG 2.2 AA baseline) | (Good / Issues found) | (Brief summary) |
| 12 | Internationalization & cultural fit (if multi-locale) | (Good / Issues found) | (Brief summary) |

Heuristic 11 -- Accessibility quick screen: This is not a replacement for /accessibility-audit (which covers full WCAG 2.2 compliance). This is a baseline check that every UX audit should include:

Perceivable -- do text and UI elements meet contrast ratios (4.5:1 text, 3:1 components)? Is information conveyed beyond color alone? Do images have alt text?
Operable -- can all interactive elements be reached and activated via keyboard? Are focus indicators visible? Do touch targets meet minimum size (24x24 CSS px for WCAG 2.2 AA, 44x44 CSS px recommended)?
Understandable -- are form inputs labeled? Do error messages identify the problem and suggest correction? Is language consistent?
Robust -- does the heading hierarchy follow h1-h6 order? Are custom components using appropriate ARIA roles?

Flag issues found, note severity, and recommend /accessibility-audit for a comprehensive evaluation if significant issues surface.
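The contrast thresholds in the Perceivable check (4.5:1 for text, 3:1 for components) can be verified numerically from two colors. The sketch below follows the WCAG relative-luminance and contrast-ratio definitions; the function names are illustrative, and it assumes opaque `#rrggbb` colors (no alpha blending, no large-text exception).

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light, per the WCAG definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of an opaque '#rrggbb' color (0.0 = black, 1.0 = white)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a 6-digit hex color such as '#1a2b3c'")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_text_contrast(foreground, background):
    """True if the pair meets the 4.5:1 threshold for body text."""
    return contrast_ratio(foreground, background) >= 4.5


def passes_component_contrast(foreground, background):
    """True if the pair meets the 3:1 threshold for UI components."""
    return contrast_ratio(foreground, background) >= 3.0
```

For example, `passes_text_contrast("#767676", "#ffffff")` holds while `#777777` on white falls just short, which is why mid-gray text is a common finding in this screen.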

Heuristic 12 -- Internationalization & cultural fit (if multi-locale): Skip this heuristic if the product serves a single locale. Include it when the product serves or plans to serve multiple languages or regions.

Text truncation -- do translated strings overflow containers or get clipped? Check longest locales (German, Finnish).
Date/time/number formatting -- are formats correct for each locale, or hardcoded to one convention?
RTL layout support -- if serving Arabic, Hebrew, Farsi, or Urdu: is the layout properly mirrored? Navigation, icons, progress indicators?
Cultural appropriateness -- do imagery, icons, colors, or metaphors carry unintended meaning in target cultures?
Text expansion handling -- do UI layouts accommodate 30-40% longer strings without breaking?
Locale switcher -- is the language/region selector discoverable and functional? Does it use native language names (Deutsch, not German)?
Input methods -- does the product support IME input for CJK languages, diacritics for European languages?
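One way to probe the truncation and text-expansion checks before real translations exist is pseudo-localization: pad each source string by the expected growth and see what breaks. The sketch below is an assumption-laden illustration; `pseudo_expand` and `strings_over_budget` are hypothetical helpers, the 40% default comes from the upper end of the range above, and character counts are only a rough proxy for rendered width.

```python
import math


def pseudo_expand(text, expansion=0.4, pad_char="~"):
    """Pad `text` so it is roughly `expansion` longer, wrapped in brackets.

    The brackets make clipping easy to spot: if the closing bracket is
    missing on screen, the container truncated the string.
    """
    extra = math.ceil(len(text) * expansion)
    return f"[{text}{pad_char * extra}]"


def strings_over_budget(strings, max_chars, expansion=0.4):
    """Return keys whose expanded length would exceed a per-key character budget.

    `strings` maps string keys to source text; `max_chars` maps the same keys
    to the number of characters the container can hold (an assumed measure).
    Keys without a budget are never flagged.
    """
    return [
        key
        for key, value in strings.items()
        if math.ceil(len(value) * (1 + expansion)) > max_chars.get(key, math.inf)
    ]


print(pseudo_expand("Send"))
print(strings_over_budget({"cta": "Send proposal"}, {"cta": 16}))
```

Flagged keys are candidates for a closer look in the longest target locales rather than confirmed defects.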

For each issue found, detail:

What's happening -- describe the problem
Severity -- Critical / Major / Minor / Cosmetic
Frequency -- how often does a user encounter this?
Evidence -- observation + supporting data (if available)

Mobile and touch-specific heuristics

If the product has a mobile or responsive experience, add this evaluation:

### Mobile & Touch Evaluation

| Criterion | Rating | Key Findings |
|-----------|--------|-------------|
| Touch targets (minimum 44x44px, 48x48px preferred) | (Pass / Fail) | (Summary) |
| Thumb zone placement (primary actions in easy-reach zone) | (Good / Issues found) | (Summary) |
| Viewport adaptation (no horizontal scroll, readable without zoom) | (Good / Issues found) | (Summary) |
| Input method fit (appropriate keyboards, no hover-dependent UI) | (Good / Issues found) | (Summary) |
| Gesture discoverability (swipe, pull-to-refresh, long-press are discoverable) | (Good / Issues found) | (Summary) |
| Orientation handling (landscape/portrait transitions work) | (Good / Issues found) | (Summary) |
| Fat-finger tolerance (adequate spacing between tappable elements) | (Good / Issues found) | (Summary) |
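If element bounding boxes are available (for example, copied from browser dev tools), the touch-target and fat-finger criteria can be screened mechanically. This is a sketch under stated assumptions: the `Target` record and helper names are hypothetical, the 44px minimum comes from the table above, and the 8px spacing threshold is an illustrative default rather than a value this skill prescribes.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    x: float       # left edge, CSS px
    y: float       # top edge, CSS px
    width: float
    height: float


def undersized(targets, min_size=44):
    """Names of targets smaller than `min_size` in either dimension."""
    return [t.name for t in targets if t.width < min_size or t.height < min_size]


def gap(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(b.x - (a.x + a.width), a.x - (b.x + b.width), 0)
    dy = max(b.y - (a.y + a.height), a.y - (b.y + b.height), 0)
    return (dx ** 2 + dy ** 2) ** 0.5


def crowded_pairs(targets, min_gap=8):
    """Pairs of targets whose edges are closer together than `min_gap`."""
    pairs = []
    for i, a in enumerate(targets):
        for b in targets[i + 1:]:
            if gap(a, b) < min_gap:
                pairs.append((a.name, b.name))
    return pairs
```

A failing result here supports a Fail or Issues found rating, but the thumb-zone and gesture criteria still need a hands-on walkthrough.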

Step 3: Cross-reference with existing research

Before layering in quantitative data, check if prior research exists that strengthens or challenges your heuristic findings. Ask the user:

Do you have past interview transcripts, usability test results, or research syntheses for this product?
Are there research nuggets or synthesis documents from /interview-synthesis or /research-synthesize?

If prior research is available, cross-reference:

### Prior Research Cross-Reference

| Heuristic finding | Prior research evidence | Effect on severity |
|---|---|---|
| (Gap from heuristic review) | (What interviews/tests revealed about this same area) | Increased / Confirmed / Decreased / No prior data |

This step matters because heuristic evaluation is expert opinion -- prior research adds real user evidence. A heuristic issue that users also struggled with in testing is higher-confidence than one based on expert judgment alone. Conversely, an issue you flagged that users navigated easily may be lower-severity than your review suggested.

If no prior research exists, note this as a gap:

[NOTE: No prior user research available for this product. Heuristic findings are expert-judgment only. Consider follow-up usability testing to validate severity ratings.]

Step 4: Cross-reference with user data

If the user provided support data, analytics, or reviews, layer it against the heuristic findings:

### Data Cross-Reference

| Finding | Heuristic evidence | Data evidence | Confidence |
|---------|-------------------|---------------|------------|
| (Gap) | (What you observed) | (What the data shows) | High / Medium / Low |

Findings supported by both observation and data are high-confidence gaps.
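The confidence column can be filled in with a simple rule. Only the High case is stated above; the Medium and Low cases in this sketch are assumptions made for illustration, and the function name `confidence` is hypothetical.

```python
def confidence(observed, data_evidence):
    """Rate confidence in a finding for the Data Cross-Reference table.

    observed       -- True if the gap was seen during the heuristic walkthrough
    data_evidence  -- list of supporting data points (tickets, analytics, reviews)
    """
    has_data = len(data_evidence) > 0
    if observed and has_data:
        return "High"    # stated rule: observation and data agree
    if observed or has_data:
        return "Medium"  # assumption: only one kind of evidence
    return "Low"         # assumption: neither observed nor in the data


print(confidence(True, ["drop-off at step 2"]))
print(confidence(True, []))
```

In practice the rating may also be lowered when the data is anecdotal or unquantified; that judgment stays with the reviewer.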

Step 5: Gap map

Produce the prioritized gap map:

### Gap Map (prioritized)

| # | Gap | Severity | Frequency | User Segments | Evidence | Recommendation |
|---|-----|----------|-----------|---------------|----------|----------------|
| 1 | (What's missing or broken) | Critical | Daily | (Who) | (Sources) | (What to do) |
| 2 | ... | Major | Weekly | ... | ... | ... |

### Quick Wins (high value, low effort)
- (Improvement 1) -- (why it's quick, what it unlocks)
- (Improvement 2)

### Deep Investigations Needed
- (Gap that needs user research to fully understand)

### What's Working Well
- (Strength 1 -- protect this)
- (Strength 2)
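The ordering of the gap map can be sketched as a sort on severity, then frequency. The `Gap` record and helper names below are hypothetical; the severity scale comes from Step 2, while the frequency scale beyond "Daily" and "Weekly" is an assumption.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["Critical", "Major", "Minor", "Cosmetic"]
FREQUENCY_ORDER = ["Daily", "Weekly", "Monthly", "Rarely"]  # assumed scale


@dataclass
class Gap:
    description: str
    severity: str
    frequency: str
    segments: str
    evidence: str
    recommendation: str


def prioritize(gaps):
    """Sort by severity first, then by how often users hit the gap."""
    return sorted(
        gaps,
        key=lambda g: (
            SEVERITY_ORDER.index(g.severity),
            FREQUENCY_ORDER.index(g.frequency),
        ),
    )


def render_gap_map(gaps):
    """Render the prioritized gaps as a markdown table in the template's layout."""
    lines = [
        "| # | Gap | Severity | Frequency | User Segments | Evidence | Recommendation |",
        "|---|-----|----------|-----------|---------------|----------|----------------|",
    ]
    for number, g in enumerate(prioritize(gaps), start=1):
        lines.append(
            f"| {number} | {g.description} | {g.severity} | {g.frequency} "
            f"| {g.segments} | {g.evidence} | {g.recommendation} |"
        )
    return "\n".join(lines)
```

A strict sort is only a starting point: a Major gap affecting the highest-churn segment may deserve to outrank a Critical one that few users reach.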

Step 5b: AI-in-the-loop pass (2026 practice)

By 2026, AI does the first heavy lift on a UX audit and the human verifies. Use it to compress the manual work, not to replace your judgment. Three places it earns its keep:

1. AI-assisted synthesis of the evidence pile. When you have support tickets, reviews, or session notes feeding Step 4, run an AI clustering pass before you hand-sort. Tools like Dovetail auto-transcribe and theme-tag, and AI-native repositories like Notably and Marvin auto-cluster repeated themes and draft insight summaries. This roughly halves interview-to-insight time and collapses the manual affinity-mapping burden. The catch: AI clusters surface what is frequent, not always what is severe. Treat the output as a draft theme map and verify every cluster against the raw quotes before it changes a severity rating. Note in the gap map which findings came from an AI-clustered pass versus direct observation.

2. Critique AI-generated prototypes, not just the shipped product. 72% of designers now use generative AI, and Figma Make, v0, and Lovable turn a prompt or a screenshot into an interactive prototype with clean Tailwind/Radix code. When a team brings an AI-generated prototype as the proposed fix, audit that artifact with the same heuristics. AI-generated UI reliably skips the unglamorous states: empty states, error and loading states, focus indicators, long-string and truncation handling, and brand fidelity. Your value shifts from spotting layout problems to validating the generated output against edge cases and the accessibility screen in Step 2.

3. Add a codegen-readiness dimension when the team builds with AI. Trustworthy AI codegen runs on the design system. An MCP server scans the codebase and emits a structured rules file (token definitions, component libraries, style hierarchies, naming conventions) so AI agents generate brand- and accessibility-aligned code without re-prompting. Only about 32% of designers currently trust AI-generated code, and the gap is missing context. If the team ships with AI assistance, assess token coverage and MCP-readiness as part of the audit:

### Codegen Readiness (if the team builds with AI)

| Dimension | Rating | Key Findings |
|-----------|--------|-------------|
| Token coverage (color, spacing, type defined as tokens, not hardcoded) | (Good / Issues found) | (Summary) |
| Component library completeness (reusable components vs. one-off markup) | (Good / Issues found) | (Summary) |
| MCP-readiness (a rules file or MCP server exposes the system to AI agents) | (Good / Issues found) | (Summary) |
| Naming and style hierarchy (consistent, documented conventions) | (Good / Issues found) | (Summary) |

A stronger design system produces better AI output, which is a flywheel worth flagging as a quick win when token gaps are small. This is a system-health dimension, not a replacement for the human review gate: Nielsen's heuristics and WCAG still demand human-in-the-loop validation, and AI-generated code still needs the accessibility screen run against it before it ships.
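For the token coverage row, a quick first pass is to list hex colors that appear in stylesheets outside of token definitions. The sketch below is a rough screen, not a parser: `hardcoded_colors` is a hypothetical helper, it assumes one declaration per line and tokens defined as CSS custom properties (`--name: value`), and it ignores named colors and `rgb()`/`hsl()` values.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")


def hardcoded_colors(css_text):
    """Return (line_number, value) for hex colors used outside token definitions.

    Assumes one declaration per line; a real audit would use a CSS parser.
    """
    hits = []
    for number, line in enumerate(css_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("--") or stripped.endswith("{") or ":" not in stripped:
            continue  # token definition, selector line, or not a declaration
        value = stripped.split(":", 1)[1]
        hits.extend((number, match) for match in HEX_COLOR.findall(value))
    return hits


sample = """:root {
  --color-text: #333333;
}
.button {
  color: var(--color-text);
  border: 1px solid #ccc;
  background-color: #0a7cff;
}
"""
for line_number, value in hardcoded_colors(sample):
    print(f"line {line_number}: {value}")
```

A short list of hits suggests the token gaps are small enough to flag as a quick win; a long one points to a larger design-system investment.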

Step 6: Review and refine

Ask the user:

Do the severity ratings match your experience with the product?
Any gaps missing from the map?
Are the quick wins actually quick? (You know the codebase and team capacity better than the agent.)
Is "what's working well" accurate? (Don't let the audit be entirely negative.)

Adjust as needed.

Portfolio example: B Lab (2020-2021)

The practitioner audited B Lab's certification platform and SDG Action Manager. Key findings: 55% of SDG Action Manager users hadn't answered a single question, the abandonment rate was 90%, NPS was 26, and more than 50% of support tickets were confusion-related. She identified 9 platform-wide UX problems, including IA hidden behind hamburger menus, insider "B" language users didn't understand, too many distracting options post-registration, and no clear path to success. She then created lo-fi dashboard mockups showing before and after states: irrelevant nav items were stripped, and new users were guided only to the actions that mattered most.

Output location

Present the assessment as formatted text in the conversation for the user to share with the team and convert into backlog items.

Example Output

Input

Product description: Roofr, a SaaS platform for roofing contractors to generate instant roof measurement reports and send proposals to homeowners. Primary users are small roofing business owners and their estimators (30–200 employee shops). Currently in growth phase with ~8,000 active contractor accounts.
Raw observations: Walked through onboarding, report ordering, and proposal builder flows. Noticed: no progress indicator during the 3-step report order process; proposal editor has 14 toolbar icons with no labels; "Upgrade" CTA appears 6 times on the dashboard before a user has completed their first report; error message on payment failure just reads "Something went wrong"; can't edit a sent proposal without duplicating it first (not obvious).
User data: App store rating 3.6 stars (common complaint: "confusing editor"); Intercom support tickets show 34% tagged "how do I edit a proposal"; onboarding funnel analytics show 61% drop-off between "account created" and "first report ordered"; NPS of 31; top detractor comment theme: "I didn't know my proposal was sent until the client called me."
User segments: (1) Owner-operators who do their own estimating, low tech fluency; (2) office estimators at mid-size shops, higher tech fluency but time-pressured; (3) new signups in 14-day trial, highest churn risk.
Core flows to assess: Onboarding to first report, proposal builder (create → send), and post-send confirmation.

UX Gap Assessment — Roofr, June 2025

Heuristic Evaluation

| # | Heuristic | Rating | Key Findings |
|---|-----------|--------|-------------|
| 1 | Visibility of system status | Issues found | No progress indicator in report order flow; users don't know if report is processing or failed; no confirmation screen post-send on proposals |
| 2 | Match between system and real world | Issues found | "Measurement report" vs. "slope report" terminology inconsistent; proposal editor uses design-tool metaphors (layer icons) unfamiliar to contractors |
| 3 | User control and freedom | Issues found | No undo in proposal editor; editing a sent proposal requires non-obvious duplication step; no draft autosave indicator |
| 4 | Consistency and standards | Issues found | Primary action buttons alternate between blue and green across flows; "Send" vs. "Deliver" vs. "Submit" used interchangeably for same action |
| 5 | Error prevention | Issues found | No confirmation dialog before sending a proposal; no validation warning if proposal total is $0 before send |
| 6 | Recognition rather than recall | Issues found | 14 unlabeled toolbar icons in proposal editor force memorization; recent reports not surfaced on dashboard |
| 7 | Flexibility and efficiency | Issues found | No proposal templates for returning estimators; no bulk re-order for repeat roof types; power users have no keyboard shortcuts |
| 8 | Aesthetic and minimalist design | Issues found | Dashboard shows 6 "Upgrade" prompts before user completes first report; sidebar has 11 nav items visible to trial users, most irrelevant |
| 9 | Error recovery | Critical issues | Payment failure message: "Something went wrong" -- no reason given, no next step; failed report orders have no retry path visible |
| 10 | Help and documentation | Issues found | Help widget hidden under settings menu; no contextual tooltips in proposal editor; onboarding checklist disappears after day 1 |
| 11 | Accessibility (WCAG 2.2 AA baseline) | Issues found | Several toolbar icons fail 3:1 contrast on white background; no visible focus ring on proposal editor canvas; icon-only buttons lack aria-labels |

Accessibility note: Icon contrast failures and missing aria-labels are significant. Recommend running /accessibility-audit for full WCAG 2.2 AA evaluation before next major release.

Mobile & Touch Evaluation

| Criterion | Rating | Key Findings |
|-----------|--------|-------------|
| Touch targets (min 44×44px) | Fail | Proposal editor toolbar icons measure ~28×28px on mobile; frequent mis-taps reported in reviews |
| Thumb zone placement | Issues found | Primary "Send Proposal" CTA sits top-right on mobile -- outside easy thumb reach for right-handed users |
| Viewport adaptation | Issues found | Proposal preview table scrolls horizontally on viewports below 390px without user indication |
| Input method fit | Good | Numeric fields correctly trigger numeric keyboard; no hover-only interactions found |
| Gesture discoverability | Issues found | No pull-to-refresh on reports list; swipe-to-delete exists on line items but is undiscoverable |
| Orientation handling | Good | Layout reflows correctly in landscape; no broken states observed |
| Fat-finger tolerance | Issues found | "Duplicate" and "Delete" proposal actions are adjacent with <8px gap; accidental deletion risk |

Prior Research Cross-Reference

| Heuristic finding | Prior research evidence | Effect on severity |
|---|---|---|
| No post-send confirmation (H1) | Previous usability test (Q3 2024) showed 3/5 participants refreshed the page repeatedly after sending, unsure if action completed | Increased -- confirmed user confusion, not just expert judgment |
| Unlabeled toolbar icons (H6) | No prior research on this specific issue | No prior data |
| Upgrade prompt overload on dashboard (H8) | Onboarding interview notes (Q1 2025): 4/6 new users described dashboard as "overwhelming" or "pushy" on first login | Increased -- research-confirmed |

Data Cross-Reference

| Finding | Heuristic evidence | Data evidence | Confidence |
|---------|-------------------|---------------|------------|
| Onboarding drop-off | No progress indicator; noisy dashboard with 11 nav items | 61% drop between signup and first report ordered | High |
| Proposal editor confusion | 14 unlabeled icons, no undo, inconsistent terminology | 34% of support tickets = "how do I edit a proposal"; 3.6 App Store rating with editor complaints | High |
| Post-send status anxiety | No confirmation screen or send receipt | "I didn't know my proposal was sent until the client called me" -- top NPS detractor theme | High |
| Error recovery failure | "Something went wrong" payment error, no retry path | Volume not quantified, but payment failure tickets are second-largest support category | Medium |

Gap Map (prioritized)

| # | Gap | Severity | Frequency | User Segments | Evidence | Recommendation |
|---|-----|----------|-----------|---------------|----------|----------------|
| 1 | No confirmation or status after proposal is sent | Critical | Every send action | All, especially owner-operators | NPS detractor quotes; Q3 usability test | Add post-send confirmation screen with: sent timestamp, recipient email, "View sent proposal" link, and optional follow-up reminder |
| 2 | "Something went wrong" error on payment failure | Critical | Every failed payment | All segments | Second-largest support ticket category | Replace with specific error (card declined / billing address mismatch) + clear retry CTA + link to update billing |
| 3 | 61% drop-off between signup and first report | Critical | First session | Trial users (highest churn risk) | Funnel analytics; onboarding interview notes | Reduce dashboard nav to 3–4 items for trial users; replace upgrade prompts with a single persistent trial banner; add linear onboarding checklist with progress indicator |
| 4 | Proposal editor icons are unlabeled and too small | Major | Every editing session | Owner-operators (low tech fluency), mobile users | 34% support tickets; App Store reviews; 28×28px touch targets | Add icon labels below toolbar; increase touch targets to 44×44px; introduce 3-item "most used" shortcut bar |
| 5 | No undo / no autosave signal in proposal editor | Major | Multiple times per session | Estimators, owner-operators | User review: "lost my whole proposal twice" | Implement Ctrl+Z undo stack; add autosave indicator ("Saved 2s ago") in editor header |
| 6 | Editing a sent proposal requires non-obvious duplication | Major | Any revision request | All segments | 34% of support tickets include this workflow | Add "Edit & resend" button directly on sent proposal view; show inline explainer on first |
