
Usability Test Plan

Plan a usability test end to end, from objectives to moderator guide.

Use this when you have a prototype, design, or live product and need to plan a usability test. Produces a complete test plan: objectives, task scenarios, success metrics, moderator guide, and observer briefing. This is the evaluative counterpart to /interview-plan (which covers generative research).

Related skills: Pairs with /screener-design for participant recruitment. Feeds into /research-synthesize for post-test analysis. For generative interviews, use /interview-plan instead. For heuristic evaluation without users, use /ux-audit.

Process

Step 1: Gather inputs

Ask the user to provide:

  1. What's being tested -- prototype, live product, specific flow, or concept. Include links or screenshots if available.
  2. Research questions -- what do you need to learn? (e.g., "Can users complete the onboarding flow without assistance?" or "Where do users get confused in checkout?")
  3. User segment -- who should participate? Role, experience level, familiarity with the product.
  4. Known issues -- anything the team already suspects is broken? (So we can validate, not just discover.)
  5. Decisions this informs -- what will you do differently based on the results? (If nothing, don't test.)
  6. Constraints -- moderated vs. unmoderated, remote vs. in-person, timeline, budget, tools available (e.g., Maze, UserTesting, Lookback).

Step 2: Define test objectives and metrics

## Usability Test Plan -- (Product/feature, date)

### Objectives
1. (Primary objective -- the most important thing to learn)
2. (Secondary objective)
3. (Tertiary objective -- nice-to-have)

### Success metrics

| Metric | Target | How measured |
|---|---|---|
| Task completion rate | (e.g., 80%+ complete without help) | Per-task outcome: completed / completed with help / failed |
| Time on task | (e.g., under 3 minutes for core flow) | Stopwatch from task start to completion |
| Error rate | (e.g., fewer than 2 wrong paths per task) | Count of incorrect actions before recovery |
| Satisfaction | (e.g., 4+ on 5-point post-task rating) | Post-task questionnaire |
| Critical issues found | (e.g., 0 blockers in core flow) | Issues that prevent task completion |
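
As a rough illustration of how the metrics in this table could be computed from observer notes, here is a minimal Python sketch. The `TaskObservation` record, its field names, and the sample values are assumptions made up for this example. They are not part of the plan template or of any testing tool.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskObservation:
    """One observer record for one participant on one task (hypothetical shape)."""
    participant: str
    task: int
    outcome: str       # "completed", "with_help", or "failed"
    seconds: float     # time on task, from task start to completion
    errors: int        # incorrect actions before recovery
    ease_rating: int   # 1-5 post-task rating


def summarize_task(observations: list[TaskObservation]) -> dict[str, float]:
    """Aggregate the plan's metrics across all records for a single task."""
    if not observations:
        raise ValueError("need at least one observation")
    n = len(observations)
    # Only unassisted completions count toward the completion-rate target.
    unassisted = sum(1 for o in observations if o.outcome == "completed")
    return {
        "completion_rate": unassisted / n,
        "median_seconds": median(o.seconds for o in observations),
        "mean_errors": sum(o.errors for o in observations) / n,
        "mean_ease": sum(o.ease_rating for o in observations) / n,
    }


# Illustrative records for one task; the values are invented.
sample = [
    TaskObservation("P1", 1, "completed", 95, 0, 5),
    TaskObservation("P2", 1, "completed", 140, 1, 4),
    TaskObservation("P3", 1, "with_help", 210, 3, 2),
    TaskObservation("P4", 1, "completed", 120, 2, 4),
    TaskObservation("P5", 1, "completed", 160, 1, 4),
]
summary = summarize_task(sample)
print(summary)
```

The median is used for time on task because, with 5-6 participants, one very slow session can distort a mean. That is a design choice for this sketch, not a requirement of the plan.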

Step 3: Write task scenarios

Design 4-7 task scenarios. Each scenario should:

  • Describe a realistic goal, not a set of instructions ("You want to invite a teammate to your project" not "Click the settings gear, then click Team Members")
  • Start with context that grounds the participant in a realistic situation
  • Be completable in 2-5 minutes
  • Not reveal the expected path

### Task Scenarios

**Task 1: (Task name)**
- **Scenario:** "(Context and goal written as the participant would experience it -- e.g., 'You just signed up for the product and want to set up your first project. Start from the dashboard.')"
- **Success criteria:** (What counts as successful completion)
- **Metrics to capture:** (Completion, time, errors, satisfaction)
- **Watch for:** (Specific behaviors or decision points to observe)

**Task 2: (Task name)**
(Same format)

Task design rules:

  • Order tasks from simple to complex. Build participant confidence before testing harder flows.
  • Include at least one task that tests error recovery (e.g., "You accidentally deleted something -- now recover it")
  • Include one open-ended exploration task if time allows ("Look around and tell me what you think this product does")
  • Avoid jargon the product uses internally -- use the participant's language

Step 4: Write the moderator guide

### Moderator Guide

**Before the session (5 min)**
- Welcome and thank participant
- Explain the session: "We're testing the product, not you. There are no wrong answers."
- Get consent for recording
- Ask participant to think aloud: "Tell me what you're thinking as you go through each task."
- Confirm their background matches the screener (1-2 quick questions)

**Warm-up (3-5 min)**
- (1-2 questions about their current workflow related to the product area)
- (Goal: establish comfort and get baseline context)

**Task execution (25-40 min)**
- Read each task scenario aloud, then provide it in writing (paste in chat or hand a card)
- After each task:
  - "On a scale of 1-5, how easy was that?" (post-task rating)
  - "What were you expecting to happen when you clicked X?" (if they hesitated)
  - "Was there anything confusing about that?" (open-ended)
- Between tasks: brief pause. Note any spontaneous comments.

**Debrief (5-10 min)**
- "Which task was the hardest? Why?"
- "What would you change about this product?"
- "Is there anything you expected to find that wasn't there?"
- "Any final thoughts?"

**Moderator rules:**
- Do not help unless the participant is completely stuck for 60+ seconds
- If stuck, offer one neutral prompt: "What would you try next?" before giving a hint
- Never explain the interface. If they ask "What does this button do?" respond: "What do you think it does?"
- Note timestamps for key moments (confusion, delight, errors)
- Record non-verbal cues: hesitation, sighing, re-reading, backtracking
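
The segment timings in the moderator guide can be checked against the planned session length before scheduling. The sketch below is only an illustration: the function names and the one-minute allowance for post-task questions are assumptions, not part of the template.

```python
# Segment durations in minutes as (shortest, longest), taken from the guide.
GUIDE_SEGMENTS = {
    "before_session": (5, 5),
    "warm_up": (3, 5),
    "task_execution": (25, 40),
    "debrief": (5, 10),
}


def session_range(segments: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return the shortest and longest total session length in minutes."""
    shortest = sum(low for low, _ in segments.values())
    longest = sum(high for _, high in segments.values())
    return shortest, longest


def task_block_fits(task_count: int, minutes_per_task: int,
                    followup_minutes: int, block_minutes: int) -> bool:
    """Check whether the tasks plus post-task questions fit the task block."""
    return task_count * (minutes_per_task + followup_minutes) <= block_minutes


shortest, longest = session_range(GUIDE_SEGMENTS)
print(shortest, longest)  # 38 60

# Assumption: about 1 minute of follow-up questions after each task.
print(task_block_fits(7, 5, 1, 40))  # False
print(task_block_fits(5, 5, 1, 40))  # True
```

Under these assumptions, seven tasks that each take the full five minutes would overrun a 40-minute task block, so a plan at the upper end of the 4-7 range needs shorter tasks or a longer session.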

Step 5: Write the observer briefing

### Observer Briefing

**Your role:** Watch silently. Take notes. Do not interact with the participant.

**What to capture:**
- Task completion: did they finish? How long? What path did they take?
- Errors: wrong clicks, backtracking, confusion
- Quotes: verbatim things they say while thinking aloud
- Body language: hesitation, frustration, surprise, delight
- Workarounds: unexpected paths they invent

**Note-taking template (one per task per participant):**

| Field | Notes |
|---|---|
| Task # | |
| Completed? | Yes / No / With help |
| Time | |
| Path taken | (sequence of actions) |
| Errors | |
| Key quote | |
| Observations | |
| Severity of issues | Critical / Major / Minor |

Step 6: Logistics and schedule

### Test logistics

| Item | Detail |
|---|---|
| Format | Moderated / Unmoderated |
| Setting | Remote / In-person |
| Tool | (Zoom, Maze, UserTesting, Lookback, etc.) |
| Participants | (Number -- typically 5-6 per round) |
| Session length | (45-60 min typical) |
| Schedule | (Date range, sessions per day -- max 3-4 to avoid moderator fatigue) |
| Recording | (Video, audio, screen -- confirm consent process) |
| Recruitment | (Use /screener-design for participant screening) |

Step 7: Review and validate

Ask the user:

  • Are the task scenarios realistic? Would a real user encounter these goals?
  • Are the success criteria clear enough that two observers would agree on pass/fail?
  • Are we testing the right things? (The hardest part of usability testing is choosing the right questions to test, not running the sessions.)
  • Anything missing from the observer briefing?
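
One way to check whether two observers agree on pass/fail is to have both score the same pilot sessions and compare their calls. The sketch below uses Cohen's kappa, a standard agreement statistic that this plan does not itself prescribe. The function name and the sample ratings are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of tasks")
    n = len(rater_a)
    # Share of tasks where both observers made the same call.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented pass/fail calls from two observers on the same six task attempts.
observer_1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
observer_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 2))  # 0.67
```

A low value on pilot sessions suggests the success criteria need tighter wording before the real sessions begin.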

Output location

Present the test plan as formatted text in the conversation. The user copies it into their research documentation or shares it with the team.

Example Output

Input

  • What's being tested: Mid-fidelity Figma prototype of a redesigned invoice creation flow in Meridian Financial's B2B payments platform (desktop web). Link to prototype available internally.
  • Research questions: Can finance managers create and send a custom invoice without assistance? Where do line-item editing and tax configuration cause errors or drop-off?
  • User segment: Finance managers and AR specialists at mid-market companies (50–500 employees), moderate SaaS familiarity, some prior experience with QuickBooks or FreshBooks
  • Known issues: Team suspects the "Add Tax Rule" modal is confusing — it requires users to set a jurisdiction before a rate, but the UI doesn't communicate that dependency. Also unsure if "Save as Draft" vs. "Send Now" distinction is clear.
  • Decisions this informs: Whether to ship the redesigned flow in Q3 or return to iteration. Tax modal may be redesigned or replaced with a simpler inline input.
  • Constraints: Moderated remote sessions via Zoom + Lookback, 6 participants, sessions in the next 2 weeks, ~60 minutes each

Output

Usability Test Plan — Meridian Financial Invoice Creation Flow, June 2025


Objectives

  1. Primary: Determine whether finance managers can complete the end-to-end invoice creation and send flow without moderator assistance
  2. Secondary: Identify where users encounter errors or confusion in line-item editing and tax rule configuration
  3. Tertiary: Assess whether the Save as Draft vs. Send Now distinction is understood and used as intended

Success Metrics

| Metric | Target | How measured |
|---|---|---|
| Task completion rate | 80%+ complete core flow without help | Per-task outcome: completed / completed with help / failed |
| Time on task (full invoice) | Under 6 minutes | Stopwatch from task start to first send confirmation |
| Error rate on tax configuration | Fewer than 2 wrong actions before correct input | Count of mis-clicks, wrong field entries, modal re-opens |
| Post-task ease rating | 4+ out of 5 for core flow | Single-question post-task scale |
| Critical blockers | 0 in core send flow | Issues that fully prevent task completion |
| Draft vs. Send confusion | Fewer than 3 participants choose wrong action | Observed behavior at the final action step |
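
With only 6 participants, percentage targets translate into very small counts, so it helps to work out the pass threshold in advance. The sketch below is an illustration only, and `min_successes` is a hypothetical helper name.

```python
def min_successes(participants: int, target_percent: int) -> int:
    """Smallest number of unassisted completions that meets the target rate."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_percent * participants // 100)


needed = min_successes(6, 80)
print(needed)  # 5
```

For this plan, the 80% target means at least 5 of 6 participants must complete the core flow unaided: 5 of 6 is about 83%, while 4 of 6 is about 67% and misses the target.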

Task Scenarios

Task 1: Orientation

  • Scenario: "You've just logged into Meridian for the first time this month. Take a minute to look around this screen and tell me what you think you can do here."
  • Success criteria: Participant can describe the general purpose of the dashboard and locate the invoice area unprompted
  • Metrics to capture: Time, verbal description accuracy, navigation path
  • Watch for: Whether "New Invoice" CTA is noticed immediately or buried in their scan

Task 2: Create a Basic Invoice

  • Scenario: "A client, Hargrove Construction, has asked for an invoice for two services: a $4,200 consulting retainer and a $950 setup fee. Create an invoice for them and add both line items."
  • Success criteria: Both line items entered with correct amounts; client name applied
  • Metrics to capture: Completion, time, errors on line-item entry (especially editing an existing row)
  • Watch for: Whether users try to edit a line item inline or look for a separate edit button; confusion with the "+" icon vs. "Add Line" text link (known duplication in the UI)

Task 3: Apply a Tax Rule

  • Scenario: "Hargrove Construction is based in Ontario, Canada. You need to apply the applicable HST rate to both line items before sending."
  • Success criteria: HST (13%) applied to both line items via the tax modal
  • Metrics to capture: Completion, error count, time in modal, post-task ease rating
  • Watch for: Whether users attempt to enter the rate before selecting jurisdiction; whether they re-open the modal or abandon; verbal expressions of confusion or expectation mismatch

Task 4: Save for Later

  • Scenario: "Your manager wants to review this invoice before it goes out. Save it so you can come back and send it after they've approved it."
  • Success criteria: Invoice saved as Draft (not sent)
  • Metrics to capture: Completion, whether Send Now is chosen accidentally, any hesitation at the action buttons
  • Watch for: Eye movement or hovering between "Send Now" and "Save as Draft"; participants who save correctly but express uncertainty about whether it actually saved

Task 5: Send the Final Invoice

  • Scenario: "Your manager has approved the invoice. Now send it to Hargrove Construction's billing contact, billing@hargroveconstruction.com."
  • Success criteria: Invoice sent to correct email address; confirmation state reached
  • Metrics to capture: Completion, time, any errors adding the recipient email
  • Watch for: Whether users navigate back to the draft naturally or need to search for it; confusion at the recipient entry field if it doesn't pre-populate the client contact

Task 6: Error Recovery

  • Scenario: "You just realized you sent an invoice with the wrong setup fee — it should have been $1,150, not $950. What would you do?"
  • Success criteria: Participant attempts to locate the sent invoice and find an edit or void option (success defined as reaching the correct screen, even if edit is not available in prototype)
  • Metrics to capture: Path taken, time, verbal problem-solving
  • Watch for: Whether users look for an "Edit" button on the sent invoice, try to duplicate and re-send, or express that they expect this capability to exist but can't find it

Moderator Guide

Before the session (5 min)

  • Welcome and thank the participant; introduce yourself and any silent observers
  • "We're testing the product today, not your skills — there are no wrong answers. If something is confusing, that's valuable information for us."
  • Confirm recording consent; explain Lookback will capture screen and audio
  • "As you go through each task, please think out loud — tell me what you're looking at, what you're expecting, and what's going through your mind."
  • Quick screener confirmation: "Just to confirm, do you currently handle invoicing or accounts receivable in your role?"

Warm-up (3–5 min)

  • "Walk me through how you typically create an invoice today — what tool do you use, and how long does it usually take?"
  • "What's the most frustrating part of that process?"
  • (Goal: calibrate language, establish comfort, surface mental models before they see the prototype)

Task execution (30–40 min)

  • Read each scenario aloud, then paste the text into the Zoom chat
  • After each task:
    • "On a scale of 1–5, how easy was that?" → note number and any unprompted explanation
    • If hesitation observed: "I noticed you paused there — what were you thinking?"
    • If they clicked something unexpected: "What were you expecting to happen when you did that?"
    • "Was there anything about that task that surprised you?"

Debrief (8–10 min)

  • "Which task felt the hardest? What made it difficult?"
  • "Was there anything you expected to find that wasn't there?"
  • "If you could change one thing about creating an invoice in this product, what would it be?"
  • "Any final thoughts before we wrap up?"

Moderator rules:

  • Do not intervene unless the participant has been fully stuck for 60+ seconds
  • First neutral prompt: "What would you try next?"
  • Second prompt (if still stuck): "Is there anywhere else you might look for that?"
  • Never confirm or correct. If they ask "Is this right?" respond: "What do you think?"
  • Timestamp every error, long pause (5+ sec), and spontaneous comment
  • Note non-verbal signals: re-reading instructions, sighing, laughing, leaning toward screen

Observer Briefing

Your role: Silent observation. Take structured notes. Do not speak or react during tasks. Because the sessions are remote, stay muted with your camera off.

What to capture:

  • Exact path taken (sequence of clicks/actions), not a summary
  • Verbatim quotes while thinking aloud — especially around the tax modal and send/draft step
  • Errors: wrong clicks, modal re-opens, backtracking, form re-entry
  • Moments of hesitation (3+ seconds without action)
  • Any workarounds or invented paths not anticipated by the design

Note-taking template (one per task per participant):

| Field | Notes |
|---|---|
| Participant ID | |
| Task # | |