A PM I was coaching came to me with a product idea. The pitch was good: use an LLM to classify customer support tickets, route them to the right team, and suggest a first response. The demo was impressive. The model was fast. The accuracy was high.
I asked one question: "What does the support agent do after they see the suggestion?"
The PM paused. They hadn't mapped the workflow past the AI output. The model classified tickets beautifully. No one had designed what happened next: how agents edited the suggestion, how confidence thresholds affected routing, what happened when the model was wrong, how the team would know if agents trusted it or ignored it.
The model worked. The product didn't exist yet.
Capability vs. job
I wrote about the Fruit Tool problem - the pattern where teams build one clever AI feature, wrap it in a UI, and call it a product. The diagnostic I use is simple: if you remove the AI, is there still a product? If not, you have a demo.
That diagnostic is the starting point for how I think about AI product strategy. The full framework has three layers.
Layer 1: Start with the job, not the model
Every AI product I've built started with the same question: what's the person trying to accomplish?
At FLUXX Health, we built a menopause education platform with AI-driven phase detection. The capability was impressive - quiz responses mapped to hormonal phases using a scoring engine. But the job wasn't "detect my phase." The job was "help me understand what's happening to my body and what to do about it." Phase detection was infrastructure. The product was clarity.
When I built assessment tools for my own practice, the same principle applied. The AI Maturity Assessment doesn't just produce a score. It surfaces the specific dimension where a team is weakest and recommends the first action for that week. The output is a decision, not a number.
If your AI product's primary value proposition is "look what the model can do," you are building a demo. If the value proposition is "here's what you should do next," you are building a product.
Layer 2: Design for the moment the model is wrong
Every model is wrong sometimes. The question isn't whether it will be wrong. The question is what happens when it is.
Most AI product teams spend 90% of their energy on the happy path - the model is right, the user is delighted. The best teams spend equal energy on the failure path. What does the user see when confidence is low? How do they correct the output? Does the correction make the system better?
At FLUXX, we built uncertainty handling into the phase detection logic. When the model couldn't confidently assign a phase, it said so. It didn't guess. That decision was hard - it meant some users got a less satisfying result on their first use. But it also meant users trusted the results they did get. Trust compounded. Engagement stayed high.
The failure path is where AI products earn or lose trust. Design it as carefully as the happy path.
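To make the failure path concrete, here is a minimal sketch for the support-ticket example, assuming the model returns a label with a confidence score. The names `Prediction`, `RoutingDecision`, `route_ticket`, the `manual_triage` queue, and the 0.80 threshold are all hypothetical, chosen for illustration. This is not the FLUXX implementation.

```python
from dataclasses import dataclass

# Hypothetical threshold; in practice it would be tuned against real ticket data.
CONFIDENCE_THRESHOLD = 0.80


@dataclass
class Prediction:
    label: str         # team the model thinks should own the ticket
    confidence: float  # model's score for that label, between 0 and 1


@dataclass
class RoutingDecision:
    queue: str             # where the ticket goes
    show_suggestion: bool  # whether the agent sees the drafted response
    note: str              # what the agent is told about the model's certainty


def route_ticket(pred: Prediction, threshold: float = CONFIDENCE_THRESHOLD) -> RoutingDecision:
    """Route on the model's label only when it is confident; otherwise say so."""
    if pred.confidence >= threshold:
        return RoutingDecision(pred.label, True, f"Suggested team: {pred.label}")
    # Low confidence: do not guess. Send to a human triage queue and be explicit.
    return RoutingDecision("manual_triage", False, "Model was not confident; needs human review")


if __name__ == "__main__":
    print(route_ticket(Prediction("billing", 0.93)))
    print(route_ticket(Prediction("billing", 0.41)))
```

The design choice worth noticing is that the low-confidence branch is a first-class outcome with its own queue and its own message, not an error case bolted on later.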
Layer 3: Measure the outcome, not the output
AI products have a unique measurement trap: the model metrics can be excellent while the product metrics are terrible.
A classification model with 95% accuracy sounds great. But if agents override the suggestion 40% of the time, the product isn't working. If the time-to-resolution doesn't decrease, the classification isn't helping. If agents stop looking at the suggestions after the first week, accuracy doesn't matter.
The metrics that matter are always downstream of the model:
- Did the user accomplish their goal faster?
- Did the user trust the result enough to act on it?
- Did the product reduce the work the user had to do?
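As a rough illustration, downstream signals like these can be computed from a simple event log. The sketch below assumes a hypothetical `TicketEvent` record with three fields. The field names and the sample data are invented for the example and do not come from a real system.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TicketEvent:
    suggestion_viewed: bool       # did the agent open the AI suggestion?
    suggestion_overridden: bool   # did the agent replace the suggestion?
    minutes_to_resolution: float  # time from ticket open to ticket close


def outcome_metrics(events: list[TicketEvent]) -> dict[str, float]:
    """Summarize how agents actually used the suggestions."""
    if not events:
        return {"view_rate": 0.0, "override_rate": 0.0, "median_minutes_to_resolution": 0.0}
    viewed = [e for e in events if e.suggestion_viewed]
    # Override rate is measured only over suggestions the agent actually looked at.
    override_rate = (
        sum(e.suggestion_overridden for e in viewed) / len(viewed) if viewed else 0.0
    )
    return {
        "view_rate": len(viewed) / len(events),
        "override_rate": override_rate,
        "median_minutes_to_resolution": median(e.minutes_to_resolution for e in events),
    }


if __name__ == "__main__":
    sample = [
        TicketEvent(True, False, 10),
        TicketEvent(True, True, 30),
        TicketEvent(False, False, 20),
        TicketEvent(True, False, 40),
    ]
    print(outcome_metrics(sample))
```

Tracking the view rate over time is what would reveal agents quietly abandoning the suggestions, even while model accuracy stays flat.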
At FLUXX, we measured quiz completion rates and return visits - not just phase detection accuracy. A model that detected phases correctly but produced a quiz that 60% of people abandoned was a model that wasn't helping anyone. When we optimized the quiz for mobile and reduced abandonment by 38%, that was a product win. The model didn't change. The experience around it did.
Outcome metrics tell you whether the product works. To know whether the model itself works, you need AI evals - systematic tests of output quality that run as you iterate, not a one-time demo.
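One simple shape such an eval can take is a fixed set of labeled cases that gets re-run on every change. The sketch below is an assumption about the general pattern, not a description of any specific eval suite. `run_eval`, `keyword_classifier`, and `EVAL_CASES` are hypothetical names, and the keyword stub stands in for a real model call.

```python
from typing import Callable

# Hypothetical labeled cases: (ticket text, expected team).
EVAL_CASES = [
    ("I was charged twice this month", "billing"),
    ("I can't reset my password", "account"),
    ("Where is my invoice?", "billing"),  # the keyword stub below misses this one
    ("Do you have a dark mode?", "general"),
]


def keyword_classifier(text: str) -> str:
    """Stand-in for a model call, so the example runs without any API."""
    lowered = text.lower()
    if "refund" in lowered or "charge" in lowered:
        return "billing"
    if "password" in lowered or "login" in lowered:
        return "account"
    return "general"


def run_eval(classify: Callable[[str], str], cases: list[tuple[str, str]]):
    """Run every case through the classifier; return accuracy and the misses."""
    failures = []
    for text, expected in cases:
        actual = classify(text)
        if actual != expected:
            failures.append((text, expected, actual))
    accuracy = 1 - len(failures) / len(cases) if cases else 0.0
    return accuracy, failures


if __name__ == "__main__":
    accuracy, failures = run_eval(keyword_classifier, EVAL_CASES)
    print(f"accuracy: {accuracy:.2f}")
    for text, expected, actual in failures:
        print(f"MISS: {text!r} expected {expected}, got {actual}")
```

Returning the failed cases alongside the score matters more than the score itself: the misses are what tell you which prompt, threshold, or data change to try next.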
The framework in practice
When I evaluate an AI product opportunity, I run three questions in order:
1. Can I describe the user's job without mentioning AI? If the pitch only works because "AI" is in the sentence, the job isn't real. "Help support agents resolve tickets faster" is a job. "Use AI to classify tickets" is a capability.
2. What happens when the model is wrong? If the answer is "it won't be" or "we'll improve accuracy," the failure path hasn't been designed. Every model will be wrong. The product needs to handle it gracefully.
3. What outcome am I measuring? If the primary metric is model accuracy, the team is optimizing the wrong thing. Find the user outcome that the model enables, and measure that.
The organizational problem
The hardest part of AI product strategy isn't the framework. It's the organizational pressure to ship something that demos well.
Executives who have seen ChatGPT want an AI feature. Boards want an AI story. Sales wants an AI pitch. The pressure to wrap a model in a UI and ship is enormous.
The best AI product leaders I know resist that pressure by reframing the conversation. They don't say "we shouldn't use AI." They say "here's the user problem AI can solve, and here's how we'll know it's working." They make the job the hero of the story, not the model.
That reframe is the entire strategy. Start with the job. Design for failure. Measure outcomes. Everything else is implementation.