Use this when you are taking an AI feature or system toward production and need a structured, durable record of what could go wrong, who owns each risk, and what catches it early. It enumerates risks across the dimensions that actually bite (accuracy, bias, security, privacy, IP, cost, reliability, compliance, reputation, provider dependency), scores them by judgment, assigns owners and controls, and sets a review cadence. If you only need to enumerate failure modes for a single feature before launch, use /ai-failure-mode-analysis. If you need the org-level policy and decision rights that sit above the register, use /ai-governance-framework.
Related skills: Sits under /ai-governance-framework as its operational artifact. Maps regulatory exposure with /eu-ai-act-readiness. Feeds from per-feature /ai-failure-mode-analysis.
The hard part most teams miss
A risk register is only as good as the day it was last touched. Most fail not in the scoring but in the keeping.
- A register nobody updates is theater. The first-pass scoring feels like the work, but it is the least durable part. What makes the register real is a review cadence with a named owner who is accountable for running it. A pristine document written once and never reopened gives the org false confidence, which is worse than no register at all. Decide the cadence and the owner before you score a single row.
- Scoring is judgment, not math. Likelihood times impact is a way to force a conversation, not a measurement. A composite score of "3.7" is false precision that hides disagreement: it averages a security lead who said "high" with a PM who said "low" and erases the fact that they disagree. Keep the scale coarse (Low/Medium/High, or 1 to 3) and surface the disagreement rather than dissolving it into a decimal. The argument about the score is the value.
- The real decision is residual-risk acceptance. After a mitigation, some risk remains. Someone has to say, out loud and by name, "we accept this residual risk and ship." That acceptance is a human decision, not a row in a spreadsheet, and it needs a named owner with the authority to make it. A register that lists residual risk but names no acceptor has not actually decided anything.
Process
Step 1: Gather inputs
Ask the user:
- What is the AI system or feature? ({{system_name}}: what it does, who uses it, what it touches.)
- What is the worst plausible outcome? (Regulatory action, customer harm, data breach, runaway cost, brand damage. This calibrates impact.)
- What data does it use, and whose is it? (Training data, user inputs, retrieved documents, PII. This drives the privacy and IP rows.)
- What model and provider? ({{model_provider}}: hosted API, self-hosted, fine-tuned. This drives the dependency and cost rows.)
- What is the regulatory exposure? (Jurisdictions, sectors, EU AI Act risk tier if known.)
- Who can accept risk, and how often will this be reviewed? (The named register owner and the cadence. Do not skip this; it is the point.)
Step 2: Set the cadence and the owner first
Before scoring anything, pin down two facts and write them at the top of the register:
- Register owner: the single named person accountable for keeping it current. Not a team, a person.
- Review cadence: how often the register is reopened and re-scored (monthly for a system in active change, quarterly for a stable one, plus event-triggered reviews on incident or model swap).
If the user cannot name an owner, stop and say so. A register without an owner is the failure mode this skill exists to prevent.
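The owner-and-cadence gate above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function names, the `CADENCE_DAYS` mapping, and the 30/90-day approximations for "monthly" and "quarterly" are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical approximations of the cadences named above.
CADENCE_DAYS = {"monthly": 30, "quarterly": 90}


def next_review(owner: str, cadence: str, last_reviewed: date) -> date:
    """Return the next scheduled review date, refusing to proceed without an owner."""
    if not owner.strip():
        # Mirrors the rule above: no named owner means stop before scoring.
        raise ValueError("No named register owner: stop before scoring.")
    return last_reviewed + timedelta(days=CADENCE_DAYS[cadence])


def review_due(next_due: date, today: date, event_triggered: bool = False) -> bool:
    """A review is due on schedule, or immediately after an incident or model swap."""
    return event_triggered or today >= next_due
```

The code cannot tell a person from a team name; that check stays with whoever fills in the owner field.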
Step 3: Enumerate risks across dimensions
Walk every dimension and name the concrete risks for this system. Do not skip a dimension; write "none identified" explicitly if it genuinely does not apply, so the gap is a decision, not an oversight.
- Accuracy / hallucination: confidently wrong output, fabricated facts, citations to sources that do not exist.
- Bias / fairness: systematically worse outcomes for a group; proxy discrimination in features or training data.
- Security: prompt injection, jailbreaks, data exfiltration through the model, tool-use abuse.
- Privacy / PII: user data in prompts or logs, memorized training data leaking, inadequate retention controls.
- IP / copyright: training-data provenance, output that reproduces protected work, unclear ownership of generated content.
- Cost overrun: token spend scaling faster than value, runaway loops, no per-user or per-tenant cap.
- Availability / reliability: provider outage, rate-limit throttling, latency spikes degrading the product.
- Compliance / regulatory: EU AI Act obligations, sector rules (health, finance), disclosure and consent requirements.
- Reputational: a screenshot-able bad output, public bias incident, perceived creepiness.
- Third-party / model-provider dependency: provider deprecates a model, changes pricing, alters terms, or restricts your use case.
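One way to enforce "do not skip a dimension" is a small coverage check over the draft. The sketch below assumes a hypothetical input shape, a dict mapping each dimension name to a list of risk descriptions, where an explicit `"none identified"` entry counts as a decision.

```python
# The ten dimensions listed above, in order.
DIMENSIONS = [
    "Accuracy / hallucination",
    "Bias / fairness",
    "Security",
    "Privacy / PII",
    "IP / copyright",
    "Cost overrun",
    "Availability / reliability",
    "Compliance / regulatory",
    "Reputational",
    "Third-party / model-provider dependency",
]


def uncovered_dimensions(risks_by_dimension: dict[str, list[str]]) -> list[str]:
    """Return dimensions that have neither a named risk nor an explicit 'none identified'."""
    return [d for d in DIMENSIONS if not risks_by_dimension.get(d)]


# Usage: a draft that filled in everything except Bias / fairness.
draft = {d: ["none identified"] for d in DIMENSIONS if d != "Bias / fairness"}
draft["Security"] = ["Prompt injection via retrieved documents"]
missing = uncovered_dimensions(draft)
```

Here `missing` contains only `"Bias / fairness"`, which is the silent gap the step is meant to surface.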
Step 4: Score by judgment
For each risk, assign Likelihood and Impact on a coarse scale (Low / Medium / High). Derive a Score by combining them (e.g. High x High = High; Medium x Low = Low). Resist any urge to produce decimals.
When stakeholders disagree on a score, do not average. Record the spread (e.g. "Likelihood: Med/High, security lead dissents High") and treat the disagreement as a flag to resolve in review, not noise to smooth over.
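The coarse scoring and the "record the spread" rule can be illustrated as a lookup table rather than arithmetic. The matrix below is an assumption: the text fixes only a few cells (High x High = High, Medium x Low = Low, and Med x High = High in the template's example row), and the remaining cells are one plausible filling. The names `combine` and `record_votes` are hypothetical.

```python
LEVELS = ("Low", "Medium", "High")

# Assumed combination matrix, keyed by (lower level, higher level).
# Only some cells come from the text; the rest are illustrative.
SCORE_MATRIX = {
    ("Low", "Low"): "Low",
    ("Low", "Medium"): "Low",
    ("Low", "High"): "Medium",
    ("Medium", "Medium"): "Medium",
    ("Medium", "High"): "High",
    ("High", "High"): "High",
}


def combine(likelihood: str, impact: str) -> str:
    """Look up a coarse score; there is no multiplication and no decimals."""
    lower, higher = sorted((likelihood, impact), key=LEVELS.index)
    return SCORE_MATRIX[(lower, higher)]


def record_votes(votes: dict[str, str]) -> dict:
    """Keep every stakeholder's vote and flag disagreement instead of averaging."""
    distinct = sorted(set(votes.values()), key=LEVELS.index)
    return {
        "spread": "/".join(distinct),
        "disagreement": len(distinct) > 1,
        "votes": dict(votes),
    }
```

For example, `record_votes({"security lead": "High", "PM": "Low"})` yields a spread of `"Low/High"` with `disagreement` set, so the row carries the dissent into review rather than a blended number.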
Step 5: Assign owner, control, residual, and trigger
For each row, fill:
- Owner: the named person responsible for the mitigation (distinct from the register owner).
- Mitigation / Control: the specific thing that reduces likelihood or impact (an eval gate, a PII filter, a spend cap, a fallback model, a human review step).
- Residual: the risk that remains after the control, scored on the same coarse scale.
- Acceptance: for any residual scored Medium or High, name the human who has accepted it. No name means not yet accepted, which means not yet ready to ship with that risk.
- Trigger signal: the early-warning metric or event that says this risk is materializing (hallucination rate crossing a threshold, a spike in refused requests, a provider status-page incident, monthly spend exceeding budget). A risk with no trigger signal is one you will only learn about from a customer.
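The fields above map naturally onto a small record type, with a check for the two gaps the text warns about. This is a sketch under assumptions: `RiskRow`, its field names, and `row_gaps` are hypothetical, and the example values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskRow:
    risk: str
    owner: str                            # person responsible for the mitigation
    control: str                          # the specific mitigation or control
    residual: str                         # "Low", "Medium", or "High"
    accepted_by: Optional[str] = None     # named acceptor, if any
    trigger_signal: Optional[str] = None  # early-warning metric or event


def row_gaps(row: RiskRow) -> list[str]:
    """List what is still missing from a row, per the rules above."""
    gaps = []
    if row.residual in ("Medium", "High") and not row.accepted_by:
        gaps.append("residual not accepted by a named person")
    if not row.trigger_signal:
        gaps.append("no trigger signal")
    return gaps


# Usage with invented example values.
row = RiskRow(
    risk="Fabricated citations in answers",
    owner="A. Example",
    control="Eval gate on citation validity",
    residual="Medium",
)
gaps = row_gaps(row)
```

With no acceptor and no trigger signal, `gaps` lists both problems; a Low residual needs no acceptor but still needs a signal.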
Step 6: Output the register
# AI Risk Register: {{system_name}}
**Register owner:** (named person, accountable for keeping this current)
**Review cadence:** (monthly / quarterly + event triggers)
**Last reviewed:** (date) **Next review:** (date)
**Risk acceptor(s):** (named person/people with authority to accept residual risk)
| Risk | Dimension | Likelihood | Impact | Score | Owner | Mitigation / Control | Residual | Accepted by | Trigger signal |
|---|---|---|---|---|---|---|---|---|---|
| (one-line description) | Accuracy | Med | High | High | (name) | (specific control) | Med | (name or "not accepted") | (metric/event + threshold) |
| | Bias / fairness | | | | | | | | |
| | Security | | | | | | | | |
| | Privacy | | | | | | | | |
| | IP / copyright | | | | | | | | |
| | Cost | | | | | | | | |
| | Reliability | | | | | | | | |
| | Compliance | | | | | | | | |
| | Reputational | | | | | | | | |
| | Provider dependency | | | | | | | | |
## Open disagreements
- (any score the team did not converge on, with who dissented and why)
## Changes since last review
- (rows added, scores moved, residuals re-accepted)
Step 7: Review
Ask the user:
- Does every Medium-or-High residual have a named acceptor? (If not, it is not ready to ship.)
- Does every High-score risk have a trigger signal someone actually watches?
- Who reopens this register, when, and what forces an off-cycle review?
- Where do the scores hide a disagreement that should be on the table instead?
Anti-patterns
| Anti-pattern | Why it fails | Do instead |
|---|---|---|
| Write once, never reopen | Gives false confidence; the world moves and the register does not | Set a cadence and a named owner before scoring anything |
| Composite decimal scores | "3.7" hides who disagreed and erases the conversation | Keep the scale coarse and record disagreement explicitly |
| Residual risk with no acceptor | Lists the risk but decides nothing; no one is on the hook | Name a human who accepts each Medium/High residual |
| No trigger signals | You learn the risk materialized from a customer, not a metric | Define an early-warning signal and threshold per risk |
| Register owned by "the team" | Diffuse ownership means no one keeps it current | Name one accountable person, not a group |
| Dimensions silently skipped | A blank row reads as "safe" when it means "not considered" | Write "none identified" explicitly so the gap is a decision |
Output location
Present the register as formatted markdown in the conversation; recommend the user store it as a living document in their governance space (alongside the /ai-governance-framework output) and reopen it on the cadence set in Step 2.