Use this when a team has a feature idea, product concept, or strategic bet and needs to identify what they're assuming, how risky those assumptions are, and what to test first. Produces a prioritized assumption map with risk ratings and recommended validation methods. This is the starting point for focused discovery -- it tells you where to aim your research.
Related skills: Operationalizes the four-risk model from knowledge/pm-discovery-frameworks.md. Output feeds into /interview-plan, /usability-test-plan, or /experiment-design depending on the assumption type. Part of the discovery-sprint recipe.
Process
Step 1: Gather inputs
Ask the user to provide:
- The idea or bet -- what is the team planning to build, launch, or invest in? (Feature, product, initiative, or strategic direction.)
- Current evidence -- what do you already know? (Prior research, analytics, customer feedback, competitive intel.)
- Stakeholder context -- who believes this will work, and why? (Understanding conviction sources helps surface hidden assumptions.)
- Timeline pressure -- when does the team need to make a go/no-go decision? (This constrains how much validation is possible.)
Step 2: Surface assumptions
Guide the user through assumption extraction. For each risk category, ask: "What are we assuming is true that, if wrong, would make this fail?"
## Assumption Map -- (Idea/feature, date)
### Value assumptions (Will anyone want this?)
1. (Assumption -- e.g., "Users currently struggle with X and would switch to a better solution")
2. (Assumption -- e.g., "This problem is painful enough that users will pay to solve it")
3. (Assumption)
### Usability assumptions (Can people figure it out?)
1. (Assumption -- e.g., "Users will understand the new navigation without training")
2. (Assumption -- e.g., "The core workflow can be completed in under 3 minutes")
3. (Assumption)
### Viability assumptions (Should we build this?)
1. (Assumption -- e.g., "This feature won't cannibalize our existing product line")
2. (Assumption -- e.g., "We can acquire users for under $X CAC")
3. (Assumption)
### Feasibility assumptions (Can we build this?)
1. (Assumption -- e.g., "The API can handle 10x current load without infrastructure changes")
2. (Assumption -- e.g., "We can ship an MVP in 4 weeks with the current team")
3. (Assumption)
Extraction tips:
- Ask "What would have to be true for this to succeed?" -- each answer is an assumption
- Ask "What are we taking for granted?" -- especially for things that feel obvious
- Ask "What would a skeptic challenge?" -- find the assumptions the team doesn't want to examine
- Aim for 8-15 assumptions total. Fewer than 8 means you haven't dug deep enough. More than 20 means you need to consolidate.
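The size guidance above can be checked mechanically once assumptions are captured as records. The following is a minimal sketch, not part of the skill itself: the `Assumption` class and `check_coverage` function are hypothetical names, and the warning about empty categories is an added assumption rather than a rule from the tips. The source says nothing about lists of 16-20 items, so the sketch stays silent in that range.

```python
from dataclasses import dataclass

# The four risk categories used in the assumption map template.
CATEGORIES = ("Value", "Usability", "Viability", "Feasibility")


@dataclass
class Assumption:
    text: str
    category: str  # expected to be one of CATEGORIES


def check_coverage(assumptions):
    """Return human-readable warnings about list size and empty categories."""
    warnings = []
    n = len(assumptions)
    if n < 8:
        warnings.append(f"Only {n} assumptions: dig deeper (aim for 8-15).")
    elif n > 20:
        warnings.append(f"{n} assumptions: consolidate (aim for 8-15).")
    # Assumption of this sketch: an empty category is worth flagging.
    for category in CATEGORIES:
        if not any(a.category == category for a in assumptions):
            warnings.append(f"No {category} assumptions listed.")
    return warnings
```

An empty return value means the list is within the suggested range and touches all four categories; it says nothing about the quality of the assumptions themselves.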
Step 3: Map assumptions to Business Model Canvas zones
After surfacing assumptions, map each one to its Business Model Canvas zone. This reveals where risk clusters:
- Desirability (Value Proposition, Customer Relationships, Channels, Customer Segments) -- Will anyone want this? Are we solving a real problem for real people?
- Feasibility (Key Activities, Key Resources, Key Partners) -- Can we build and deliver this? Do we have the people, tech, and partners?
- Viability (Revenue Streams, Cost Structure) -- Should we build this? Will the economics work?
If most assumptions cluster in one zone, that zone is the team's primary risk: desirability-heavy means a value risk, feasibility-heavy a delivery risk, and viability-heavy a business model risk. Name the cluster explicitly -- it focuses the validation plan.
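As an illustration of the clustering check, the sketch below tallies assumptions by zone and names the dominant risk. The zone membership and risk names come from the step above; the function name `dominant_risk` and the 50% threshold are assumptions made for this example, since the step does not define how strong a cluster has to be.

```python
from collections import Counter

# Canvas block -> zone, as listed in Step 3.
ZONE_BY_BLOCK = {
    "Value Proposition": "Desirability",
    "Customer Relationships": "Desirability",
    "Channels": "Desirability",
    "Customer Segments": "Desirability",
    "Key Activities": "Feasibility",
    "Key Resources": "Feasibility",
    "Key Partners": "Feasibility",
    "Revenue Streams": "Viability",
    "Cost Structure": "Viability",
}

# Zone -> the risk name Step 3 attaches to it.
RISK_NAME = {
    "Desirability": "value risk",
    "Feasibility": "delivery risk",
    "Viability": "business model risk",
}


def dominant_risk(blocks, threshold=0.5):
    """Take one canvas block name per assumption.

    Return (zone, risk name) when a single zone holds more than
    `threshold` of the assumptions, otherwise None.
    The 0.5 default is an illustrative choice, not a rule from the skill.
    """
    if not blocks:
        return None
    counts = Counter(ZONE_BY_BLOCK[block] for block in blocks)
    zone, count = counts.most_common(1)[0]
    if count / len(blocks) > threshold:
        return zone, RISK_NAME[zone]
    return None
```

A `None` result means risk is spread across zones, in which case the per-assumption priorities from the next step matter more than any single cluster label.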
Step 4: Prioritize on the risk matrix
Plot each assumption on two axes:
- Impact if wrong: How badly does the idea fail if this assumption is false? (High / Medium / Low)
- Evidence level: How much do we actually know? (Strong evidence / Some evidence / No evidence / Contradictory evidence)
### Risk matrix
| # | Assumption | Risk category | Impact if wrong | Evidence level | Priority |
|---|---|---|---|---|---|
| 1 | (Assumption) | Value | High | No evidence | **Test first** |
| 2 | (Assumption) | Usability | High | Some evidence | **Test soon** |
| 3 | (Assumption) | Viability | Medium | Some evidence | Monitor |
| 4 | (Assumption) | Feasibility | Low | Some evidence | Accept for now |
### Priority definitions
- **Test first:** High impact + no evidence or contradictory evidence. These are the riskiest assumptions. Validate before committing resources.
- **Test soon:** High impact + some evidence, or medium impact + no evidence. Validate within this iteration.
- **Monitor:** Medium impact with some evidence. Keep an eye on it, but don't block progress.
- **Accept for now:** Low impact or strong evidence. Revisit only if conditions change.
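The priority definitions amount to a small decision table, sketched below. This is one reading of the definitions, not an official rule set: the function name `priority` is hypothetical, "low evidence" is interpreted as "No evidence" or "Contradictory evidence", and medium impact with contradictory evidence is treated the same as medium impact with no evidence, a case the definitions leave open.

```python
IMPACTS = ("High", "Medium", "Low")
EVIDENCE = ("Strong evidence", "Some evidence", "No evidence", "Contradictory evidence")


def priority(impact, evidence):
    """Map an (impact, evidence) pair to one of the four priority labels."""
    if impact not in IMPACTS or evidence not in EVIDENCE:
        raise ValueError(f"Unknown rating: {impact!r}, {evidence!r}")
    # Low impact or strong evidence: nothing to test right now.
    if impact == "Low" or evidence == "Strong evidence":
        return "Accept for now"
    # Assumption of this sketch: missing and contradictory evidence both count as weak.
    weak = evidence in ("No evidence", "Contradictory evidence")
    if impact == "High":
        return "Test first" if weak else "Test soon"
    # Remaining case: medium impact.
    return "Test soon" if weak else "Monitor"
```

For example, `priority("High", "Some evidence")` returns `"Test soon"`. Treat the result as a starting point for discussion; the review questions in Step 6 exist precisely because teams misjudge their own evidence.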
Step 5: Recommend validation methods
For each "Test first" and "Test soon" assumption, recommend a validation approach:
### Validation plan
| Assumption | Recommended method | Why this method | Timeline | Effort |
|---|---|---|---|---|
| (Assumption 1) | (e.g., 5 user interviews) | (Value assumption -- need to hear from users whether this pain exists) | 1-2 weeks | Medium |
| (Assumption 2) | (e.g., Prototype usability test) | (Usability assumption -- need to see if users can navigate the flow) | 3-5 days | Low |
| (Assumption 3) | (e.g., Spreadsheet financial model) | (Viability assumption -- need to validate unit economics) | 1-2 days | Low |
| (Assumption 4) | (e.g., Technical spike) | (Feasibility assumption -- need to prove the architecture works) | 3-5 days | Medium |
Method selection guidance:
- Value assumptions → User interviews, surveys, landing page tests, fake door tests
- Usability assumptions → Prototype testing, hallway tests, wizard of oz
- Viability assumptions → Financial models, market sizing, competitive analysis, sales conversations
- Feasibility assumptions → Technical spikes, proof of concepts, architecture reviews
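The method selection guidance can be kept as a simple lookup keyed by risk category. A minimal sketch follows; the method lists are copied from the guidance above, while the `METHODS` table name, the `candidate_methods` function, and the tuple layout of its input are assumptions made for illustration.

```python
# Risk category -> candidate validation methods, from the guidance above.
METHODS = {
    "Value": ["User interviews", "Surveys", "Landing page tests", "Fake door tests"],
    "Usability": ["Prototype testing", "Hallway tests", "Wizard of Oz"],
    "Viability": ["Financial models", "Market sizing", "Competitive analysis", "Sales conversations"],
    "Feasibility": ["Technical spikes", "Proof of concepts", "Architecture reviews"],
}

# Only these priorities get a row in the validation plan (Step 5).
NEEDS_VALIDATION = ("Test first", "Test soon")


def candidate_methods(rows):
    """Take (assumption, category, priority) tuples.

    Return (assumption, methods) pairs for the rows that need validation,
    preserving input order.
    """
    return [
        (text, METHODS[category])
        for text, category, prio in rows
        if prio in NEEDS_VALIDATION
    ]
```

The lookup only narrows the options. Choosing one method per assumption still depends on the timeline and effort columns of the validation plan.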
Step 6: Review and validate
Ask the user:
- Are we missing any assumptions? (The scariest ones are the ones nobody mentions.)
- Do the priority ratings feel right? Is anything under-weighted because the team has false confidence?
- Is the validation plan realistic given timeline and resources?
- Who on the team should own each validation? (Assign owners now or it won't happen.)
- When will you re-validate? Assumptions are dated the moment you write them. Market conditions, competitive landscape, and user needs shift. Schedule a review in 4-6 weeks. A validated assumption from last quarter may be invalid this quarter. Product-market fit is a moving target, not a destination.
Output location
Present the assumption map as formatted text in the conversation. The user copies it into their project documentation. The validation plan feeds directly into /interview-plan, /usability-test-plan, or /experiment-design depending on the assumption type.
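If the risk matrix is assembled programmatically before being pasted into the conversation, a small renderer can produce the table in the same layout as the Step 4 template. This is a sketch only: `render_risk_matrix` and the tuple layout are hypothetical, and sorting rows by priority is a presentation choice of this example rather than a requirement of the skill.

```python
PRIORITY_ORDER = ["Test first", "Test soon", "Monitor", "Accept for now"]
HEADER = ["#", "Assumption", "Risk category", "Impact if wrong", "Evidence level", "Priority"]


def render_risk_matrix(rows):
    """Take (assumption, category, impact, evidence, priority) tuples.

    Return a Markdown table as a string, ordered from most to least urgent.
    """
    # sorted() is stable, so rows with equal priority keep their input order.
    ordered = sorted(rows, key=lambda row: PRIORITY_ORDER.index(row[4]))
    lines = [
        "| " + " | ".join(HEADER) + " |",
        "|" + "---|" * len(HEADER),
    ]
    for number, row in enumerate(ordered, start=1):
        lines.append("| " + " | ".join([str(number), *row]) + " |")
    return "\n".join(lines)
```

Calling `print(render_risk_matrix(rows))` yields text the user can copy straight into project documentation.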
Example Output
Input
- The idea or bet: Add an AI-powered spend categorization feature to Ardent Finance's SMB expense management platform — the system would automatically tag transactions and flag policy violations without requiring manual review
- Current evidence: NPS surveys show "manual categorization" is the #3 complaint; one enterprise pilot customer reduced reconciliation time by 40%; no data on SMB willingness to pay a premium; engineering has not evaluated ML infrastructure costs
- Stakeholder context: VP of Product believes this closes deals against Expensify; CFO is skeptical because a similar initiative was deprioritized at a previous company due to low adoption
- Timeline pressure: Board review in 10 weeks; team needs a go/no-go on roadmap commitment in 6 weeks
Output (abbreviated)
Assumption Map — Ardent Finance AI Spend Categorization, June 2025
Value assumptions (Will anyone want this?)
- SMB finance managers find manual categorization painful enough to change workflows for an automated solution
- Users will trust AI-generated categories enough to act on them without reviewing each transaction
- Policy violation flagging is a valued feature, not just a nice-to-have — SMBs will pay to catch violations proactively
- The 40% reconciliation time reduction from the enterprise pilot will generalize to SMB users with smaller transaction volumes
Usability assumptions (Can people figure it out?)
- Users will understand when and why the AI is uncertain, and know what to do when confidence is low
- Overriding an AI-assigned category won't feel more effortful than manual entry from scratch
- The policy violation alert UI won't generate false-positive fatigue that causes users to ignore all flags
Viability assumptions (Should we build this?)
- SMBs will pay a meaningful price premium (≥$15/seat/month) for AI categorization over the base tier
- Categorization accuracy needs to stay above ~92% or support ticket volume will erode margin
- This feature differentiates Ardent against Expensify and Ramp rather than prompting a feature match within 2 quarters
Feasibility assumptions (Can we build this?)
- Ardent's current transaction data is large enough and clean enough to train or fine-tune a categorization model
- The team can ship an MVP accurate enough to demo in 6 weeks with current ML and backend capacity
- Ongoing model retraining and monitoring can be absorbed without a dedicated MLOps hire
Risk matrix
| # | Assumption | Risk category | Impact if wrong | Evidence level | Priority |
|---|---|---|---|---|---|
| 1 | SMBs find manual categorization painful enough to change workflows | Value | High | Some evidence (NPS data) | Test soon |
| 2 | Users will trust AI categories without reviewing each transaction | Value | High | No evidence | Test first |
| 3 | SMBs will pay ≥$15/seat premium for this feature | Viability | High | No evidence | Test first |
| 4 | 40% time savings generalizes from enterprise to SMB segment | Value | High | Contradictory evidence | Test first |
| 5 | Override flow won't feel harder than manual entry | Usability | High | No evidence | Test first |
| 6 | Users will understand low-confidence AI states | Usability | Medium | No evidence | Test soon |
| 7 | Transaction data is sufficient for model training | Feasibility | High | No evidence | Test first |
| 8 | Policy flagging is valued, not just tolerated | Value | Medium | Some evidence | Monitor |
| 9 | Accuracy must stay above ~92% to protect margin | Viability | Medium | Some evidence | Monitor |
| 10 | Feature won't be matched by Expensify within 2 quarters | Viability | Medium | Some evidence | Monitor |
| 11 | MVP shippable in 6 weeks with current team | Feasibility | Medium | Some evidence | Monitor |
| 12 | MLOps absorbed without dedicated hire | Feasibility | Low | Some evidence | Accept for now |
Risk cluster diagnosis: Assumptions heavily cluster in Desirability — the team has strong internal conviction but is largely guessing at SMB user behavior and willingness to pay. This is a value risk, not a delivery risk. Validate demand before touching the tech stack.
Validation plan
| Assumption | Recommended method | Why this method | Timeline | Effort |
|---|---|---|---|---|
| SMBs find categorization painful enough to change workflows | 8 problem interviews with SMB finance managers across 3 verticals | Need first-person accounts of the workflow, not survey proxies — uncover severity and current workarounds | Week 1–2 | Medium |
| Users will trust AI categories without reviewing each transaction | Prototype concept test: show categorized transactions and observe review behavior | Usability + value hybrid; reveals trust calibration before any code is written | Week 2–3 | Low |
| SMBs will pay ≥$15/seat premium | Pricing page fake-door test + 5 sales discovery calls asking directly about budget | Combine stated and revealed willingness to pay; CFO skepticism makes this a must-answer before board review | Week 1–3 | Low–Medium |
| 40% time savings generalizes to SMBs | Diary study or time-on-task analysis with 3 SMB pilot users doing live reconciliation | Behavioral data beats extrapolating from a single enterprise customer | Week 2–4 | Medium |
| Override flow won't feel harder than manual entry | 5-participant hallway usability test on Figma prototype with override scenario | Fast, cheap, answers the specific interaction risk before engineering builds on a guess | Week 3 | Low |
| Transaction data sufficient for model training | Engineering data audit + 1-week feasibility spike | Unblock or kill the technical path early — this gates everything downstream | Week 1 | Medium |
Open questions before 6-week go/no-go
- Who owns the SMB interview recruiting — does CS have a warm list, or do we need a panel?
- Should sales be included in the pricing fake-door test, or will that contaminate active pipeline?
- The CFO's prior experience with low adoption: what specifically failed? Interview that stakeholder before finalizing the assumption map.
- Re-validate date: Schedule assumption review for Week 5 — one week before the go/no-go decision — to incorporate findings before roadmap commitment is made.