The phrase "AI agent" gets thrown around loosely. For this framework, an agent is an AI system that can take actions - not just generate text, but read data, call APIs, send messages, modify records. The shift from AI-as-advisor to AI-as-actor changes everything about how you need to design the experience.
Most agent failures look like intelligence problems. The output was wrong, the action was inappropriate, the context was missed. But when you dig in, the root cause is almost always structural. The agent didn't have the right information, didn't know what it was supposed to do, or didn't have the right tools configured.
The waterline model
Think of agent failures like an iceberg. The visible failure - bad output, wrong action, missed context - sits above the waterline. Below it: missing permissions, incomplete context, wrong tool configuration, unclear identity documents, absent feedback loops.
Diagnostic sequence when an agent underperforms:
- Does the agent have access to the information it needs? (Permissions)
- Does the agent know what it's supposed to do? (Identity/soul document)
- Does the agent have the right tools configured? (Integrations)
- Is the context window overloaded or underscoped? (Context management)
- Only after all structural issues are ruled out: is model capability the bottleneck?
Teams waste enormous time prompt-engineering around structural problems. Fix the infrastructure first.
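The diagnostic sequence above can be sketched as an ordered list of checks that stops at the first structural failure. This is a minimal illustration, not a real diagnostic tool: the `agent` dictionary, its keys (`required_scopes`, `configured_tools`, `context_tokens`, and so on), and the pass/fail rules are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    layer: str
    question: str
    passed: Callable[[dict], bool]  # returns True when this layer is healthy


# Ordered as in the sequence above: structural checks come first.
# All dictionary keys below are hypothetical.
CHECKS = [
    Check("Permissions",
          "Does the agent have access to the information it needs?",
          lambda a: a["required_scopes"] <= a["granted_scopes"]),
    Check("Identity/soul document",
          "Does the agent know what it's supposed to do?",
          lambda a: bool(a["soul_document"].strip())),
    Check("Integrations",
          "Does the agent have the right tools configured?",
          lambda a: a["required_tools"] <= a["configured_tools"]),
    Check("Context management",
          "Is the context window overloaded or underscoped?",
          lambda a: a["min_context_tokens"] <= a["context_tokens"] <= a["max_context_tokens"]),
]


def diagnose(agent: dict) -> str:
    """Return the first failing structural layer, or blame the model last."""
    for check in CHECKS:
        if not check.passed(agent):
            return check.layer
    return "Model capability"


agent = {
    "required_scopes": {"calendar.read"},
    "granted_scopes": {"calendar.read", "email.read"},
    "soul_document": "You are a scheduling assistant.",
    "required_tools": {"calendar", "email"},
    "configured_tools": {"calendar"},  # the email integration is missing
    "context_tokens": 4000,
    "min_context_tokens": 500,
    "max_context_tokens": 8000,
}
print(diagnose(agent))  # Integrations
```

The ordering is the point: model capability is only named once every structural check has passed.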
Three layers of agent design
An effective agent system has three layers. Most products only build one.
Soul
The identity document. Who the agent is, what it values, how it communicates, what it refuses to do. This persists across sessions and provides behavioral consistency.
A good soul document covers: voice and tone, domain expertise, decision-making principles, escalation boundaries, and explicit limitations. Without it, the agent's personality changes with every conversation.
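One way to keep a soul document honest is to treat it as structured data and check that none of the five areas is empty. The sketch below assumes a hypothetical `SoulDocument` class; the field names simply mirror the list above, and the example values are invented.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SoulDocument:
    # One field per area named above; the class itself is hypothetical.
    voice_and_tone: str = ""
    domain_expertise: str = ""
    decision_principles: list[str] = field(default_factory=list)
    escalation_boundaries: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


soul = SoulDocument(
    voice_and_tone="Plain and direct, no jargon",
    domain_expertise="Customer support for a billing product",
    decision_principles=["Ask before guessing"],
    escalation_boundaries=["Hand refund disputes to a human"],
)
print(soul.missing_sections())  # ['limitations']
```

A check like this could run before deployment, so a vague or half-written identity is caught as a structural gap rather than discovered as inconsistent behavior.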
Heartbeat
The recurring cadence. Scheduled tasks that run without prompting: morning briefings, end-of-day summaries, weekly pulse checks, inbox monitoring. This turns a passive tool into an active collaborator.
Heartbeat is what separates "I have a chatbot" from "I have a system that works for me." Most products skip this layer entirely and wonder why adoption stalls after the novelty wears off.
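A heartbeat can start as nothing more than a schedule the agent checks on every tick. The sketch below is an assumption-heavy illustration: `HeartbeatTask`, the task names, and the times are made up, and it only handles fixed daily times (interval-style work such as inbox monitoring would need a separate mechanism).

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class HeartbeatTask:
    name: str
    at: time                   # local time of day the task should run
    weekdays: frozenset[int]   # 0 = Monday ... 6 = Sunday


EVERY_WEEKDAY = frozenset(range(5))

# Hypothetical schedule based on the examples above.
HEARTBEAT = [
    HeartbeatTask("morning_briefing", time(8, 0), EVERY_WEEKDAY),
    HeartbeatTask("end_of_day_summary", time(17, 30), EVERY_WEEKDAY),
    HeartbeatTask("weekly_pulse_check", time(9, 0), frozenset({0})),
]


def due_tasks(now: datetime, last_run: datetime) -> list[str]:
    """Tasks scheduled for today whose time falls in (last_run, now].

    Simplification: windows that span midnight are not handled.
    """
    due = []
    for task in HEARTBEAT:
        scheduled = datetime.combine(now.date(), task.at)
        if now.weekday() in task.weekdays and last_run < scheduled <= now:
            due.append(task.name)
    return due


# Monday 1 January 2024, checked at 09:05 after a previous check at 07:55.
print(due_tasks(datetime(2024, 1, 1, 9, 5), datetime(2024, 1, 1, 7, 55)))
# ['morning_briefing', 'weekly_pulse_check']
```

In practice a cron job or task queue would call something like `due_tasks` on a timer; the user never has to prompt it.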
Jobs
The task backlog. Specific work the agent can do on request: generate a report, draft a response, analyze a dataset, triage incoming items. In most AI products, this is the entire offering.
Jobs are necessary but insufficient. Without soul, the jobs lack consistency. Without heartbeat, the user has to remember to use the agent. The full stack - soul, heartbeat, jobs - creates a system that knows who it is, shows up proactively, and does useful work.
Progressive trust
Grant agent capabilities incrementally, not all at once:
| Trust level | Capabilities | Example |
|---|---|---|
| Read-only | View data, analyze, summarize | Calendar read, email read, file analysis |
| Draft | Generate content for human review | Email drafts, document drafts, code suggestions |
| Send with approval | Take actions, but human confirms each one | Send email after review, create ticket after confirmation |
| Autonomous | Act independently within defined boundaries | Auto-triage low-priority items, schedule recurring tasks |
Each level requires demonstrated reliability at the previous level. Jumping straight to autonomous is how you get agents sending embarrassing emails or creating duplicate records.
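The trust ladder can be enforced in code rather than left as a policy statement. The following is a minimal sketch under stated assumptions: the action names, the `REQUIRED_LEVEL` mapping, and the promotion thresholds (`min_successes`, `max_failure_rate`) are hypothetical values chosen for illustration, and a real system would also need to define the boundaries for autonomous actions.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    READ_ONLY = 1
    DRAFT = 2
    SEND_WITH_APPROVAL = 3
    AUTONOMOUS = 4


# Hypothetical mapping from action to the minimum level that permits it.
REQUIRED_LEVEL = {
    "read_calendar": TrustLevel.READ_ONLY,
    "draft_email": TrustLevel.DRAFT,
    "send_email": TrustLevel.SEND_WITH_APPROVAL,
    "auto_triage": TrustLevel.AUTONOMOUS,
}


def authorize(action: str, level: TrustLevel, human_approved: bool = False) -> bool:
    """Allow an action only if the agent's trust level covers it."""
    required = REQUIRED_LEVEL[action]
    if level < required:
        return False
    # An agent at the approval level needs a human to confirm each such action.
    if level == TrustLevel.SEND_WITH_APPROVAL and required == TrustLevel.SEND_WITH_APPROVAL:
        return human_approved
    return True


def next_level(level: TrustLevel, successes: int, failures: int,
               min_successes: int = 50, max_failure_rate: float = 0.02) -> TrustLevel:
    """Promote by at most one step, and only after demonstrated reliability."""
    total = successes + failures
    reliable = (total > 0 and successes >= min_successes
                and failures / total <= max_failure_rate)
    if reliable and level < TrustLevel.AUTONOMOUS:
        return TrustLevel(level + 1)
    return level


print(authorize("send_email", TrustLevel.DRAFT))                     # False
print(authorize("send_email", TrustLevel.SEND_WITH_APPROVAL, True))  # True
print(next_level(TrustLevel.READ_ONLY, successes=100, failures=1).name)  # DRAFT
```

Because `next_level` only ever moves one step, there is no code path that jumps an agent straight from read-only to autonomous.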
This trust model maps to security thinking: each trust level requires a different security posture. The UX for each level is covered in agentic UX.
Measuring agent health
Use the AI Health Indicator to assess your agent across all six CARATS dimensions. Agents are particularly vulnerable to:
- Consistency failures when context management is poor
- Alignment failures when the soul document is missing or vague
- Security failures when trust levels aren't enforced
- Tone failures when the agent operates across different contexts without adjusting
Track these metrics over time. Agent quality degrades silently - a model update, a context change, or a new integration can each shift behavior without anyone noticing.
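Silent degradation is easier to catch if each dimension's score is compared against its own baseline. The sketch below assumes you already record a periodic score between 0 and 1 per dimension; the scoring method, the window sizes, and the `max_drop` threshold are all hypothetical, and the sample data is invented.

```python
from statistics import mean


def detect_drift(scores: list[float], baseline_n: int = 5,
                 recent_n: int = 5, max_drop: float = 0.1) -> bool:
    """Flag a dimension when its recent average falls well below its baseline.

    Compares the mean of the first `baseline_n` scores with the mean of the
    last `recent_n` scores. Returns False if there is not enough history.
    """
    if len(scores) < baseline_n + recent_n:
        return False
    baseline = mean(scores[:baseline_n])
    recent = mean(scores[-recent_n:])
    return baseline - recent > max_drop


# Invented weekly scores for two of the dimensions listed above.
history = {
    "consistency": [0.9, 0.9, 0.9, 0.9, 0.9, 0.7, 0.7, 0.7, 0.7, 0.7],
    "tone": [0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9],
}
flagged = [dim for dim, scores in history.items() if detect_drift(scores)]
print(flagged)  # ['consistency']
```

A check like this is worth rerunning after every model update or new integration, since those are the moments when behavior shifts without anyone noticing.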