Most teams shipping AI features have no structured way to answer "is this actually working?" They rely on user complaints, gut feel, or the absence of obvious failures. That is not a quality strategy. That is hope.
The AI Health Indicator (AHI) is a diagnostic framework built around six dimensions that matter for production AI systems. It gives you a score, a risk map, and a clear picture of where to invest next.
The CARATS framework
CARATS stands for Consistency, Accuracy, Reliability, Alignment, Tone, and Security. Each dimension captures a different way AI systems can fail silently.
| Dimension | What it measures | Silent failure mode |
|---|---|---|
| Consistency | Same input produces similar output across runs | Outputs drift without anyone noticing |
| Accuracy | Outputs are factually correct and complete | Confident wrong answers that look right |
| Reliability | System performs under load, over time, across user segments | Works in demo, degrades in production |
| Alignment | Outputs match what users actually need | Technically correct but practically useless |
| Tone | Communication style fits the audience and context | Medical assistant sounds like a chatbot |
| Security | Protected from injection, leakage, adversarial manipulation | Prompt injection exposes system instructions |
The framework grew out of patterns across 30+ AI engagements. One theme kept emerging: teams were measuring whether the AI was running, not whether it was working.
How to use it
Run the assessment
The AI Health Check tool walks you through 10 questions covering all six dimensions plus structural health factors (evaluation maturity, context discipline, experimentation rigor). It takes about 5 minutes and produces a scored breakdown with risk areas highlighted.
Read the scores
Each dimension scores on a 1-5 scale:
- 4.0+: Healthy. You have practices in place and they're working.
- 3.5 to below 4.0: At risk. You have some practices but gaps are showing.
- Below 3.5: Needs attention. This dimension is a liability.
Your overall score is the average, but the real value is the per-dimension breakdown. A team scoring 4.5 on Consistency but 2.0 on Security has a very different action plan than one scoring 3.0 across the board.
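The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not the AI Health Check tool's actual implementation: the function names (`classify`, `summarize`) and the example scores are assumptions, and only the band thresholds come from the text.

```python
from statistics import mean

# Thresholds taken from the score bands above.
HEALTHY = 4.0
AT_RISK = 3.5


def classify(score: float) -> str:
    """Map a 1-5 dimension score to its health band."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score >= HEALTHY:
        return "healthy"
    if score >= AT_RISK:
        return "at risk"
    return "needs attention"


def summarize(scores: dict[str, float]) -> dict:
    """Return the overall average plus the band for each dimension."""
    return {
        "overall": round(mean(scores.values()), 2),
        "bands": {dim: classify(s) for dim, s in scores.items()},
    }


# Hypothetical scores, for illustration only.
team_a = {"Consistency": 4.5, "Accuracy": 3.0, "Reliability": 3.0,
          "Alignment": 3.0, "Tone": 2.5, "Security": 2.0}
team_b = {dim: 3.0 for dim in team_a}

print(summarize(team_a))
print(summarize(team_b))
```

Both hypothetical teams average 3.0, but only the per-dimension bands show that the first team is healthy on Consistency while Security sits well below the threshold.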
Act on the gaps
For each dimension below healthy, the framework points to specific practices:
- Low Consistency/Accuracy → Invest in test-driven development and metrics for AI outputs
- Low Reliability/Alignment → Run delivery diagnostics to find structural issues
- Low Tone/Security → Apply security thinking and agentic UX principles
- Low Context Discipline → Structure your agent skills and context management
- Low Experimentation → Adopt experiment-driven development
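One way to turn the mapping above into an action plan is a simple lookup. The sketch below is hypothetical: the `PRACTICES` table restates the list above, while `action_plan` and the weakest-first ordering are assumptions made for illustration.

```python
HEALTHY = 4.0  # healthy threshold from the score bands

# Practices restated from the list above, keyed by dimension.
PRACTICES = {
    "Consistency": "Test-driven development and metrics for AI outputs",
    "Accuracy": "Test-driven development and metrics for AI outputs",
    "Reliability": "Delivery diagnostics to find structural issues",
    "Alignment": "Delivery diagnostics to find structural issues",
    "Tone": "Security thinking and agentic UX principles",
    "Security": "Security thinking and agentic UX principles",
    "Context Discipline": "Structured agent skills and context management",
    "Experimentation": "Experiment-driven development",
}


def action_plan(scores: dict[str, float]) -> list[str]:
    """List practices for dimensions below healthy, weakest first, no repeats."""
    plan: list[str] = []
    for dim, score in sorted(scores.items(), key=lambda item: item[1]):
        practice = PRACTICES.get(dim)
        if score < HEALTHY and practice is not None and practice not in plan:
            plan.append(practice)
    return plan


# Hypothetical scores, for illustration only.
example = {"Consistency": 4.5, "Accuracy": 3.8, "Tone": 3.0, "Security": 2.0}
for step in action_plan(example):
    print("-", step)
```

Sorting by score puts the weakest dimension's practice first, which matches the idea of investing where the liability is largest.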
Beyond CARATS: structural health
CARATS measures output quality. But output quality depends on structural factors:
Evaluation maturity asks whether your team writes evals before building features. If you only check quality after launch, you're doing quality assurance. If you define expected behavior before implementation, you're doing eval-driven development. The difference is the same as the gap between TDD and manual testing.
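To make the TDD analogy concrete, here is a minimal sketch of evals written before the feature exists. Everything in it is an assumption for illustration: the `EvalCase` shape, the phrase-matching check, the refund-assistant scenario, and the `stub_assistant` placeholder that stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    must_include: tuple[str, ...]        # phrases the answer has to contain
    must_exclude: tuple[str, ...] = ()   # phrases the answer must not contain


def run_evals(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output meets its expectations."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = 0
    for case in cases:
        output = generate(case.prompt).lower()
        has_required = all(p.lower() in output for p in case.must_include)
        has_forbidden = any(p.lower() in output for p in case.must_exclude)
        if has_required and not has_forbidden:
            passed += 1
    return passed / len(cases)


# Expected behavior, defined before any implementation work starts.
CASES = [
    EvalCase("What is the refund window?", must_include=("30 days",)),
    EvalCase("Can I get a refund after a year?", must_include=("not eligible",)),
]


def stub_assistant(prompt: str) -> str:
    """Placeholder for the real model call; returns one canned answer."""
    return "Refunds are accepted within 30 days of purchase."


print(f"pass rate: {run_evals(stub_assistant, CASES):.0%}")
```

The stub fails the second case, which is the point: the eval exists first, and the implementation is done when the pass rate reaches the bar the team has set. Phrase matching is a deliberately crude check; real evals often need richer graders.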
Context discipline asks whether your AI agents get the right information at the right time. Poor context management causes drift, hallucination, and inconsistency. Teams with strong context discipline use structured knowledge documents, scoped tool access, and explicit context boundaries.
Bounded autonomy asks whether there are clear lines between what the AI can do alone and what requires human review. Most failures come from AI systems operating outside their competence boundary without anyone knowing.
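A bounded-autonomy rule can be as small as an allowlist plus a confidence floor. The sketch below is hypothetical: the action names, the 0.8 threshold, and the `route` function are assumptions chosen to illustrate the idea, not part of the framework.

```python
from dataclasses import dataclass

# Hypothetical policy: actions the AI may take without review,
# and the minimum confidence required to do so.
AUTONOMOUS_ACTIONS = {"draft_reply", "summarize_ticket"}
MIN_CONFIDENCE = 0.8


@dataclass(frozen=True)
class ProposedAction:
    name: str
    confidence: float  # score in [0, 1] from the model or a heuristic


def route(action: ProposedAction) -> str:
    """Return 'execute' inside the boundary, otherwise 'human_review'."""
    inside_boundary = (
        action.name in AUTONOMOUS_ACTIONS
        and action.confidence >= MIN_CONFIDENCE
    )
    return "execute" if inside_boundary else "human_review"


for proposed in (
    ProposedAction("draft_reply", 0.92),
    ProposedAction("draft_reply", 0.55),
    ProposedAction("issue_refund", 0.99),
):
    print(proposed.name, proposed.confidence, "->", route(proposed))
```

Defaulting to `human_review` for anything not explicitly allowed means a new or unexpected action escalates to a person rather than running unnoticed.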
When to run it
- At kickoff - baseline before building
- Monthly - track trends, catch drift
- When something feels off - structured diagnosis instead of guessing
- Before a launch - confirm readiness across all dimensions
The assessment is lightweight enough to run regularly. The value compounds as you track scores over time and can see whether your investments are moving the needle.
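Tracking scores over time can start with a comparison of the two most recent assessments. This is a minimal sketch under stated assumptions: the `flag_drift` function, the 0.5-point drop threshold, and the sample history are all illustrative.

```python
def flag_drift(history: list[dict[str, float]], drop: float = 0.5) -> dict[str, float]:
    """Return the change for each dimension that fell by at least `drop`
    between the previous assessment and the latest one."""
    if len(history) < 2:
        return {}
    previous, latest = history[-2], history[-1]
    return {
        dim: round(latest[dim] - previous[dim], 2)
        for dim in latest
        if dim in previous and previous[dim] - latest[dim] >= drop
    }


# Hypothetical monthly assessments, oldest first.
history = [
    {"Consistency": 4.2, "Security": 3.8},
    {"Consistency": 4.1, "Security": 3.0},
]

print(flag_drift(history))
```

In this example only Security is flagged, since Consistency moved by a tenth of a point. The threshold is a judgment call; a smaller value catches drift earlier at the cost of more noise.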