
Agent Experience

Designing AI agents that work like good colleagues, not unpredictable tools

The phrase "AI agent" gets thrown around loosely. For this framework, an agent is an AI system that can take actions: not just generate text, but read data, call APIs, send messages, and modify records. The shift from AI-as-advisor to AI-as-actor changes how you need to design the experience, because a wrong action has consequences that a wrong suggestion does not.

Most agent failures look like intelligence problems. The output was wrong, the action was inappropriate, the context was missed. But when you dig in, the root cause is almost always structural. The agent didn't have the right information, didn't know what it was supposed to do, or didn't have the right tools configured.

The waterline model

Think of agent failures like an iceberg. The visible failure - bad output, wrong action, missed context - sits above the waterline. Below it: missing permissions, incomplete context, wrong tool configuration, unclear identity documents, absent feedback loops.

Diagnostic sequence when an agent underperforms:

  1. Does the agent have access to the information it needs? (Permissions)
  2. Does the agent know what it's supposed to do? (Identity/soul document)
  3. Does the agent have the right tools configured? (Integrations)
  4. Is the context window overloaded or underscoped? (Context management)
  5. Only after all structural issues are ruled out: is the model capability the bottleneck?

Teams waste enormous time prompt-engineering around structural problems. Fix the infrastructure first.
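The diagnostic sequence can be sketched as an ordered check that stops at the first structural problem it finds. This is a minimal illustration, not a prescribed implementation: `AgentSetup` and all of its fields are hypothetical stand-ins for whatever your platform actually exposes.

```python
from dataclasses import dataclass


@dataclass
class AgentSetup:
    # Every field here is a hypothetical stand-in for real platform state.
    granted_scopes: set[str]
    required_scopes: set[str]
    soul_document: str
    configured_tools: set[str]
    required_tools: set[str]
    context_tokens: int
    context_budget: int


def diagnose(setup: AgentSetup) -> str:
    """Return the first structural problem found, in checklist order."""
    missing_scopes = setup.required_scopes - setup.granted_scopes
    if missing_scopes:
        return f"permissions: missing {sorted(missing_scopes)}"
    if not setup.soul_document.strip():
        return "identity: soul document is empty"
    missing_tools = setup.required_tools - setup.configured_tools
    if missing_tools:
        return f"integrations: missing {sorted(missing_tools)}"
    if setup.context_tokens > setup.context_budget:
        return "context: window overloaded"
    if setup.context_tokens == 0:
        return "context: underscoped (nothing loaded)"
    # Only now is model capability worth investigating.
    return "structure ok: evaluate model capability"
```

The ordering is the point: model capability is only considered once every cheaper structural explanation has been ruled out.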

Three layers of agent design

An effective agent system has three layers. Most products only build one.

Soul

The identity document. Who the agent is, what it values, how it communicates, what it refuses to do. This persists across sessions and provides behavioral consistency.

A good soul document covers: voice and tone, domain expertise, decision-making principles, escalation boundaries, and explicit limitations. Without it, the agent's personality changes with every conversation.
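One way to keep a soul document honest is to treat the five areas above as required sections and refuse to start a session when any is blank. The sketch below assumes the document is stored as a plain dictionary; the section keys, the example content, and the function names are all illustrative, not a standard format.

```python
# Section keys mirror the five areas listed above; the key names are assumptions.
REQUIRED_SECTIONS = (
    "voice_and_tone",
    "domain_expertise",
    "decision_principles",
    "escalation_boundaries",
    "limitations",
)

# Hypothetical example content for a support agent.
SOUL = {
    "voice_and_tone": "Plain, direct, no jargon.",
    "domain_expertise": "Customer support for a billing product.",
    "decision_principles": "Prefer reversible actions; ask when unsure.",
    "escalation_boundaries": "Hand refund requests to a human.",
    "limitations": "",  # deliberately blank to show the check failing
}


def missing_sections(soul: dict[str, str]) -> list[str]:
    """List required sections that are absent or blank."""
    return [name for name in REQUIRED_SECTIONS if not soul.get(name, "").strip()]


def render_preamble(soul: dict[str, str]) -> str:
    """Join the sections into one text block to load at the start of every session."""
    missing = missing_sections(soul)
    if missing:
        raise ValueError(f"soul document incomplete: {missing}")
    return "\n\n".join(
        f"{name.replace('_', ' ').title()}:\n{soul[name]}" for name in REQUIRED_SECTIONS
    )
```

Loading the same rendered preamble into every session is what gives the agent a stable identity instead of one that shifts with each conversation.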

Heartbeat

The recurring cadence. Scheduled tasks that run without prompting: morning briefings, end-of-day summaries, weekly pulse checks, inbox monitoring. This turns a passive tool into an active collaborator.

Heartbeat is what separates "I have a chatbot" from "I have a system that works for me." Most products skip this layer entirely and wonder why adoption stalls after the novelty wears off.
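A heartbeat can start as nothing more than a small schedule plus a function that reports which tasks are due. The following sketch assumes a scheduler that wakes up periodically and asks what to run; the task names, times, and the `due` helper are hypothetical, and a production system would more likely lean on cron or a job queue.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Heartbeat:
    name: str
    at: time                  # earliest time of day the task may run
    weekdays: frozenset[int]  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()


# Illustrative schedule; the times are assumptions, not recommendations.
SCHEDULE = [
    Heartbeat("morning_briefing", time(8, 0), frozenset(range(5))),
    Heartbeat("end_of_day_summary", time(17, 30), frozenset(range(5))),
    Heartbeat("weekly_pulse_check", time(9, 0), frozenset({0})),
]


def due(schedule: list[Heartbeat], now: datetime, already_ran_today: set[str]) -> list[str]:
    """Return names of tasks scheduled for today whose time has passed and that have not run yet."""
    return [
        hb.name
        for hb in schedule
        if now.weekday() in hb.weekdays
        and now.time() >= hb.at
        and hb.name not in already_ran_today
    ]
```

Tracking what already ran keeps a task from firing twice when the scheduler wakes up more than once after its start time.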

Jobs

The task backlog. Specific work the agent can do on request: generate a report, draft a response, analyze a dataset, triage incoming items. In most AI products, this is the entire offering.

Jobs are necessary but insufficient. Without soul, the jobs lack consistency. Without heartbeat, the user has to remember to use the agent. The full stack - soul, heartbeat, jobs - creates a system that knows who it is, shows up proactively, and does useful work.
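The jobs layer is often just a registry that maps a job name to a callable. A minimal sketch, assuming jobs take and return plain strings: the `job` decorator, `run_job`, and the keyword-based triage rule are hypothetical placeholders for real model-backed work.

```python
from typing import Callable

# Registry of on-request jobs, keyed by name.
JOBS: dict[str, Callable[[str], str]] = {}


def job(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator that registers a function as a named job."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        JOBS[name] = fn
        return fn
    return register


@job("triage")
def triage(item: str) -> str:
    # Placeholder rule; a real job would call the model with the soul preamble loaded.
    return "urgent" if "outage" in item.lower() else "normal"


def run_job(name: str, payload: str) -> str:
    """Dispatch a request to a registered job, failing loudly on unknown names."""
    if name not in JOBS:
        raise KeyError(f"unknown job: {name}")
    return JOBS[name](payload)
```

Because the heartbeat and the user both go through `run_job`, scheduled and on-request work share one code path.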

Progressive trust

Grant agent capabilities incrementally, not all at once:

| Trust level | Capabilities | Example |
| --- | --- | --- |
| Read-only | View data, analyze, summarize | Calendar read, email read, file analysis |
| Draft | Generate content for human review | Email drafts, document drafts, code suggestions |
| Send with approval | Take actions, but human confirms each one | Send email after review, create ticket after confirmation |
| Autonomous | Act independently within defined boundaries | Auto-triage low-priority items, schedule recurring tasks |

Each level requires demonstrated reliability at the previous level. Jumping straight to autonomous is how you get agents sending embarrassing emails or creating duplicate records.
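The four levels can be enforced in code as an ordered enum plus a gate that every action passes through. This is a sketch under stated assumptions: the action names, the `AUTONOMOUS_BOUNDARY` set, and the promotion threshold of 50 clean runs are all hypothetical and would need to reflect your own risk tolerance.

```python
from enum import IntEnum


class Trust(IntEnum):
    READ_ONLY = 1
    DRAFT = 2
    SEND_WITH_APPROVAL = 3
    AUTONOMOUS = 4


# Hypothetical action catalogue: each action is a read, a draft, or a side-effecting act.
ACTION_KIND = {
    "read_calendar": "read",
    "draft_email": "draft",
    "send_email": "act",
    "auto_triage_low_priority": "act",
}

# Acts the agent may perform without confirmation once it is autonomous (an assumption).
AUTONOMOUS_BOUNDARY = {"auto_triage_low_priority"}


def decide(action: str, level: Trust) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for an action at a trust level."""
    kind = ACTION_KIND[action]
    if kind == "read":
        return "allow"  # every level includes read access
    if kind == "draft":
        return "allow" if level >= Trust.DRAFT else "deny"
    # Side-effecting actions from here on.
    if level < Trust.SEND_WITH_APPROVAL:
        return "deny"
    if level == Trust.AUTONOMOUS and action in AUTONOMOUS_BOUNDARY:
        return "allow"
    return "needs_approval"


def promote(level: Trust, clean_runs: int, required_runs: int = 50) -> Trust:
    """Move up exactly one level, and only after enough reviewed runs without problems."""
    if level is Trust.AUTONOMOUS or clean_runs < required_runs:
        return level
    return Trust(level + 1)
```

Note that an autonomous agent still needs approval for actions outside its boundary, and `promote` never skips a level.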

This trust model maps to security thinking: each trust level requires a different security posture. The UX for each level is covered separately under agentic UX.

Measuring agent health

Use the AI Health Indicator to assess your agent across all six CARATS dimensions. Agents are particularly vulnerable to:

  • Consistency failures when context management is poor
  • Alignment failures when the soul document is missing or vague
  • Security failures when trust levels aren't enforced
  • Tone failures when the agent operates across different contexts without adjusting

Track these dimensions as metrics over time. Agent quality degrades silently: a model update, a context change, or a new integration can shift behavior without anyone noticing.
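Silent degradation is easier to catch when each dimension's score is compared against its own history. The sketch below assumes scores between 0 and 1 recorded at regular intervals; that scale, the window size, and the tolerance are illustrative assumptions and are not part of the AI Health Indicator itself.

```python
from statistics import mean


def drifted(history: list[float], window: int = 3, tolerance: float = 0.1) -> bool:
    """True if the mean of the last `window` scores fell more than `tolerance` below the earlier mean."""
    if len(history) < 2 * window:
        return False  # not enough data to compare recent scores against a baseline
    baseline = mean(history[:-window])
    recent = mean(history[-window:])
    return baseline - recent > tolerance


def flag_dimensions(scores: dict[str, list[float]]) -> list[str]:
    """Return the dimensions whose recent scores have dropped, in insertion order."""
    return [name for name, history in scores.items() if drifted(history)]
```

For example, with a stable consistency history and a tone history that drops from 0.9 to 0.6 over the last three readings, `flag_dimensions` would return only the tone dimension.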
