AI Health Indicator

Most teams shipping AI features have no structured way to answer "is this actually working?" They rely on user complaints, gut feel, or the absence of obvious failures. That is not a quality strategy. That is hope.

The AI Health Indicator (AHI) is a diagnostic framework built around six dimensions that matter for production AI systems. It gives you a score, a risk map, and a clear picture of where to invest next.

The CARATS framework

CARATS stands for Consistency, Accuracy, Reliability, Alignment, Tone, and Security. Each dimension captures a different way AI systems can fail silently.

Dimension	What it measures	Silent failure mode
Consistency	Same input produces similar output across runs	Outputs drift without anyone noticing
Accuracy	Outputs are factually correct and complete	Confident wrong answers that look right
Reliability	System performs under load, over time, across user segments	Works in demo, degrades in production
Alignment	Outputs match what users actually need	Technically correct but practically useless
Tone	Communication style fits the audience and context	Medical assistant sounds like a chatbot
Security	Protected from injection, leakage, adversarial manipulation	Prompt injection exposes system instructions

Built from patterns across 30+ AI engagements, the theme that kept emerging: teams were measuring whether the AI was running but not whether it was working.

How to use it

Run the assessment

The AI Health Check tool walks you through 10 questions covering all six dimensions plus structural health factors (evaluation maturity, context discipline, experimentation rigor). It takes about 5 minutes and produces a scored breakdown with risk areas highlighted.

Read the scores

Each dimension scores on a 1-5 scale:

4.0+: Healthy. You have practices in place and they're working.
3.5-3.9: At risk. You have some practices but gaps are showing.
Below 3.5: Needs attention. This dimension is a liability.

Your overall score is the average, but the real value is the per-dimension breakdown. A team scoring 4.5 on Consistency but 2.0 on Security has a very different action plan than one scoring 3.0 across the board.

Act on the gaps

For each dimension below healthy, the framework points to specific practices:

Low Consistency/Accuracy → Invest in test-driven development and metrics for AI outputs
Low Reliability/Alignment → Run delivery diagnostics to find structural issues
Low Tone/Security → Apply security thinking and agentic UX principles
Low Context Discipline → Structure your agent skills and context management
Low Experimentation → Adopt experiment-driven development

Beyond CARATS: structural health

CARATS measures output quality. But output quality depends on structural factors:

Evaluation maturity asks whether your team writes evals before building features. If you only check quality after launch, you're doing quality assurance. If you define expected behavior before implementation, you're doing eval-driven development. The difference is the same as the gap between TDD and manual testing.

Context discipline asks whether your AI agents get the right information at the right time. Poor context management causes drift, hallucination, and inconsistency. Teams with strong context discipline use structured knowledge documents, scoped tool access, and explicit context boundaries.

Bounded autonomy asks whether there are clear lines between what the AI can do alone and what requires human review. Most failures come from AI systems operating outside their competence boundary without anyone knowing.

When to run it

At kickoff - baseline before building
Monthly - track trends, catch drift
When something feels off - structured diagnosis instead of guessing
Before a launch - confirm readiness across all dimensions

The assessment is lightweight enough to run regularly. The value compounds as you track scores over time and can see whether your investments are moving the needle.

The CARATS framework

How to use it

Run the assessment

Read the scores

Act on the gaps

Beyond CARATS: structural health

When to run it

Skills for this topic

Apps for this topic

See this in practice

Related practices

Related services

Want help with ai health indicator?