Observability Plan - AI Agent Skill

Use this when you need to design, audit, or improve product-level observability -- measuring what users do, how they experience the product, and whether they accomplish their goals. This covers event tracking, user journeys, task completion, funnel analysis, and product performance from the user's perspective. If you're looking to measure system health, uptime, or deployment reliability, use /instrumentation-plan instead.

The distinction: Observability answers "Are users successful?" Instrumentation answers "Is the system healthy?" Both are needed. Start here if you can't answer basic questions about how users interact with your product.

For AI-powered features: This skill covers product-level observability. If you need to monitor LLM-specific concerns (prompt quality, token costs, hallucination drift, model regression), use /llm-observability-plan -- it covers the AI-specific layer between product analytics and system instrumentation.

Process

Step 1: Gather context

Ask the user to provide:

Product description -- what does this product do? Who are the users? What are the primary use cases?
Current analytics -- what's already tracked? (existing analytics tools, dashboards, event logs)
Key user journeys -- the 3-5 most important paths a user takes (e.g., signup → first value, search → purchase, onboard → habit)
Business goals -- what does success look like? (activation rate, retention, revenue, engagement)
Known blind spots -- what questions about user behavior can't be answered today?
Analytics stack -- tools in use or under consideration (Amplitude, Mixpanel, PostHog, GA4, custom, etc.)

If the user doesn't have all of this, work with what's available. Flag gaps as assumptions.

Step 2: Define the event taxonomy

A clean event taxonomy is the foundation of product observability. Design it before implementing anything.

Event naming convention:

Use a consistent Object Action or object_action pattern:

Pattern	Example	Anti-pattern
`Object Action`	`Button Clicked`, `Page Viewed`, `Form Submitted`	`click`, `pageview`, `submit`
Namespace prefix	`Onboarding Step Completed`, `Search Query Executed`	`step_done`, `searched`
Past tense for completed actions	`Account Created`, `Item Purchased`	`creating_account`, `buying`
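
One way to enforce the convention is a small lint step that runs before an event is added to the taxonomy. The sketch below is illustrative only: the function names and the regular expression are assumptions, not part of any analytics tool, and the past-tense check is a rough heuristic that irregular verbs (for example "Sent") would need an allowlist for.

```python
import re

# Title Case words separated by single spaces, e.g. "Form Submitted".
# Hyphens inside a word are allowed so names like "At-Risk Employee Actioned" pass.
EVENT_NAME_RE = re.compile(r"^[A-Z][A-Za-z0-9-]*(?: [A-Z][A-Za-z0-9-]*)+$")


def is_valid_event_name(name: str) -> bool:
    """Return True if `name` looks like an `Object Action` event name."""
    return bool(EVENT_NAME_RE.match(name))


def looks_past_tense(name: str) -> bool:
    """Heuristic only: the final word ends in 'ed'."""
    return name.split(" ")[-1].endswith("ed")


def lint_event_name(name: str) -> list:
    """Return a list of human-readable problems; empty means the name passes."""
    problems = []
    if not is_valid_event_name(name):
        problems.append("not in Title Case `Object Action` form")
    if not looks_past_tense(name):
        problems.append("final word does not look past tense")
    return problems
```

Running `lint_event_name` over a proposed taxonomy in CI catches anti-patterns such as `click` or `creating_account` before they reach production data.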

Event categories:

Category	Purpose	Examples
Lifecycle	Track user progression	`Account Created`, `Onboarding Completed`, `Subscription Started`, `Account Churned`
Engagement	Track feature usage	`Feature Used`, `Content Viewed`, `Search Executed`, `Export Generated`
Conversion	Track goal completion	`Trial Started`, `Purchase Completed`, `Upgrade Initiated`
Navigation	Track movement patterns	`Page Viewed`, `Tab Switched`, `Navigation Clicked`
Error / Friction	Track failure points	`Error Displayed`, `Form Validation Failed`, `Timeout Experienced`
Security / Anomaly	Track security-relevant behavior for baseline building	`Permission Elevated`, `Unusual Access Pattern`, `Data Export Volume Exceeded`, `Off-Hours Activity`, `New Device Login`

Event properties (payload structure):

Every event should include:

Property	Type	Purpose	Example
`event_name`	string	What happened	`Button Clicked`
`timestamp`	ISO 8601	When it happened	`2026-03-05T14:30:00Z`
`user_id`	string	Who did it (anonymized if needed)	`usr_abc123`
`session_id`	string	Session grouping	`sess_xyz789`
`page` / `screen`	string	Where it happened	`/dashboard`
`properties`	object	Context-specific details	`{ button_name: "Export", format: "CSV" }`
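
The payload structure above can be sketched as a small data class. This is a minimal illustration, not the schema of any particular analytics SDK; the class name `Event` and the helper `utc_now_iso` are hypothetical, and the field names simply mirror the table.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


def utc_now_iso() -> str:
    """Current UTC time as an ISO 8601 string with a trailing Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class Event:
    """Common envelope shared by every event (hypothetical helper)."""
    event_name: str
    user_id: str
    session_id: str
    page: str
    properties: Dict[str, Any] = field(default_factory=dict)
    timestamp: str = field(default_factory=utc_now_iso)

    def to_payload(self) -> Dict[str, Any]:
        """Plain dict, ready to serialize and send to the analytics backend."""
        return asdict(self)


payload = Event(
    event_name="Button Clicked",
    user_id="usr_abc123",
    session_id="sess_xyz789",
    page="/dashboard",
    properties={"button_name": "Export", "format": "CSV"},
).to_payload()
```

Keeping the envelope in one place means every event answers who, what, where, and when without each call site having to remember the required fields.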

Taxonomy design rules:

Decide on a naming convention before your first event -- retrofitting is expensive
Every event must answer: Who did what, where, when, and with what context?
Limit property cardinality -- a property with 10,000 unique values is hard to analyze
Version your taxonomy -- when you rename or restructure events, document the change
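
The cardinality rule can be checked mechanically against a sample of collected events. The sketch below assumes events are dicts with a `properties` object as described above; the function names and the default limit of 1,000 are illustrative assumptions, so pick a limit that suits your analytics tool.

```python
from collections import defaultdict


def property_cardinality(events):
    """Count distinct values seen for each property key across all events."""
    seen = defaultdict(set)
    for event in events:
        for key, value in event.get("properties", {}).items():
            seen[key].add(str(value))
    return {key: len(values) for key, values in seen.items()}


def high_cardinality_properties(events, limit=1000):
    """Return property keys whose distinct-value count exceeds `limit`."""
    counts = property_cardinality(events)
    return sorted(key for key, count in counts.items() if count > limit)
```

Properties flagged this way (free-text search queries, raw URLs with IDs) are usually better bucketed or hashed before they are sent.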

Present the taxonomy as a table:

Event Name	Category	Trigger	Key Properties	Priority
(Page Viewed)	Navigation	Any page load	`page_path`, `referrer`, `load_time_ms`	P0
(Feature X Used)	Engagement	User completes action X	`feature_name`, `input_type`, `result_count`	P0
(Signup Completed)	Lifecycle	Registration finishes	`signup_method`, `referral_source`	P0

Step 3: Map user journeys and funnels

For each key user journey, define the funnel:

Funnel template:

Step	Event	Success Criteria	Expected Drop-off	Alert If
1. (Entry)	`Page Viewed (landing)`	User arrives	--	Traffic < (threshold)
2. (Engagement)	`Feature Explored`	User interacts	40-60% drop expected	Drop > 70%
3. (Activation)	`First Value Achieved`	User gets value	20-40% drop expected	Drop > 50%
4. (Conversion)	`Goal Completed`	User converts	10-30% drop expected	Drop > 40%

For each funnel:

Name the journey -- e.g., "New user → first value" or "Search → purchase"
Define the steps -- specific events that mark progression
Set baseline expectations -- what's a healthy drop-off rate at each step?
Define alert thresholds -- when does drop-off signal a problem vs. normal behavior?
Identify branch points -- where do users take alternate paths? Are those paths tracked?
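
To make the drop-off and alert columns concrete, here is a minimal sketch that computes step-to-step drop-off from user counts and compares it with the "Alert If" thresholds. The function name and the user counts in the example are hypothetical; the thresholds are the ones from the template table.

```python
def funnel_report(step_counts, alert_thresholds):
    """Compute drop-off between consecutive funnel steps.

    step_counts: ordered list of (event_name, users_reaching_step).
    alert_thresholds: dict of event_name -> maximum acceptable drop-off (0-1).
    """
    report = []
    for (_, prev_users), (name, users) in zip(step_counts, step_counts[1:]):
        drop = 1 - users / prev_users if prev_users else 0.0
        limit = alert_thresholds.get(name)
        report.append({
            "step": name,
            "drop_off": round(drop, 3),
            "alert": limit is not None and drop > limit,
        })
    return report


# Hypothetical weekly counts for the template funnel.
counts = [
    ("Page Viewed (landing)", 1000),
    ("Feature Explored", 450),
    ("First Value Achieved", 200),
    ("Goal Completed", 150),
]
thresholds = {
    "Feature Explored": 0.70,
    "First Value Achieved": 0.50,
    "Goal Completed": 0.40,
}
report = funnel_report(counts, thresholds)
```

With these made-up counts, only the activation step exceeds its threshold, which is the kind of signal the "Alert If" column is meant to surface.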

Step 4: Task completion and timing

Beyond funnels, measure whether users accomplish what they came to do:

Task completion framework:

Task	Start Event	End Event	Success Criteria	Time Target	Measure
(Complete onboarding)	`Onboarding Started`	`Onboarding Completed`	All steps finished	< 5 minutes	Completion rate, median time
(Find and use a feature)	`Search Executed`	`Feature Used`	User finds what they need	< 30 seconds	Success rate, time to result
(Submit a form)	`Form Opened`	`Form Submitted`	Valid submission	< 2 minutes	Completion rate, error rate, abandonment point

Timing metrics to capture:

Time to first value -- how long from signup/entry to the first meaningful outcome?
Time on task -- how long does a specific workflow take?
Time between sessions -- how frequently do users return?
Perceived performance -- Core Web Vitals (LCP, INP, CLS) as user-facing performance signals; INP replaced FID as a Core Web Vital in March 2024
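
Completion rate and time on task can both be derived from the start and end events in the framework table. The sketch below is one possible approach, assuming events are dicts with `event_name`, `user_id`, and an ISO 8601 UTC `timestamp` in a single consistent format (so that string sorting matches time order). The function names are hypothetical, and only each user's first attempt is measured.

```python
from datetime import datetime
from statistics import median


def parse_ts(ts):
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def task_durations(events, start_event, end_event):
    """Seconds from each user's first start event to their first later end event."""
    started, durations = {}, {}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        uid = e["user_id"]
        if e["event_name"] == start_event and uid not in started:
            started[uid] = parse_ts(e["timestamp"])
        elif e["event_name"] == end_event and uid in started and uid not in durations:
            durations[uid] = (parse_ts(e["timestamp"]) - started[uid]).total_seconds()
    return durations


def task_summary(events, start_event, end_event):
    """Completion rate among users who started, plus median time on task."""
    durations = task_durations(events, start_event, end_event)
    starters = {e["user_id"] for e in events if e["event_name"] == start_event}
    return {
        "completion_rate": len(durations) / len(starters) if starters else 0.0,
        "median_seconds": median(durations.values()) if durations else None,
    }
```

The median is used rather than the mean because a few users who leave a tab open overnight would otherwise dominate the time target.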

Behavioral baseline signals (optional -- include when the product feeds security or anomaly detection):

If the organization uses AI-powered security tools (UEBA, anomaly detection), product observability events serve double duty -- they're both product analytics and security intelligence. Consider tracking these behavioral patterns as part of the event taxonomy:

Signal	What it establishes	Anomaly example
Access frequency per user	Normal usage cadence	Sudden spike or off-pattern access
Typical session duration	Expected engagement length	Unusually long or short sessions
Normal data access volume	Baseline download/export behavior	Bulk data export outside normal range
Geographic consistency	Expected access locations	Login from new region or impossible travel
Feature access patterns	Which features a user typically uses	Sudden access to admin or sensitive features

These signals feed /telemetry-readiness-audit assessments and enable AI security tools to build meaningful behavioral baselines from the same instrumentation effort.
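
As a rough illustration of what "building a baseline" means, the sketch below flags a value that sits far from a user's own history using a z-score. This is a deliberately simple stand-in chosen for the example; dedicated UEBA and anomaly detection tools use richer models, and the function name and the threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, z_threshold=3.0):
    """Return True if `observed` deviates strongly from the user's history.

    history: past per-day values for one signal (e.g. records exported).
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


# Hypothetical daily export counts for one admin.
daily_exports = [10, 12, 11, 9, 13, 10, 12]
bulk_export_flagged = is_anomalous(daily_exports, 500)
normal_day_flagged = is_anomalous(daily_exports, 11)
```

The point of the example is that the same `Bulk Export Generated`-style events collected for product analytics already contain what a per-user baseline needs.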

Step 5: Feature adoption and retention signals

Track whether features are actually used and whether usage sticks:

Adoption metrics:

Metric	Formula	What it tells you
Feature adoption rate	Users who used feature / total active users	Is the feature discoverable?
Activation rate	Users who completed key action / users who signed up	Are users getting value?
Breadth of use	# of features used per user per session	Are users exploring or stuck?
Depth of use	Frequency of feature use per user per week	Is usage habitual or one-time?
Retention (D1/D7/D30)	Users returning on day N / users who started on day 0	Does the product stick?
Stickiness (DAU/MAU)	Daily active users / monthly active users	How often do users come back?
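
Two of these formulas, sketched in code. The data shapes (`first_seen` and `activity` dicts keyed by user) and the function names are assumptions made for the example, and stickiness is computed here with average DAU over the period, which is one common reading of DAU/MAU.

```python
from datetime import date, timedelta


def retention(first_seen, activity, day_n):
    """Share of the cohort active exactly `day_n` days after their first day.

    first_seen: dict of user_id -> date of first use (day 0).
    activity: dict of user_id -> set of dates on which the user was active.
    """
    if not first_seen:
        return 0.0
    returned = sum(
        1
        for user, day0 in first_seen.items()
        if day0 + timedelta(days=day_n) in activity.get(user, set())
    )
    return returned / len(first_seen)


def stickiness(daily_active_counts, monthly_active_users):
    """Average DAU over the period divided by MAU."""
    if not daily_active_counts or monthly_active_users == 0:
        return 0.0
    average_dau = sum(daily_active_counts) / len(daily_active_counts)
    return average_dau / monthly_active_users
```

To follow the cohort guidance, call `retention` once per acquisition cohort (for example, all users whose `first_seen` falls in the same week) rather than on the whole user base.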

Cohort analysis guidance:

Always segment by acquisition cohort (week or month of first use)
Compare feature adoption across cohorts to detect trends
Separate new users from power users in adoption metrics -- they have different baselines

Step 6: Experiment instrumentation

If the team runs A/B tests or experiments, ensure the observability layer supports them:

Experiment tracking requirements:

Every user session tagged with active experiment variants
Experiment assignment logged as an event (`Experiment Assigned` with `experiment_name`, `variant`, `user_id`)
Primary and secondary metrics defined before the experiment starts
Sample size and duration calculated before launch (not after)
Guardrail metrics defined -- metrics that must not degrade (e.g., page load time, error rate)
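
For the sample size requirement, a minimal sketch using the standard normal-approximation formula for comparing two proportions with a two-sided test. The function name and default values (5% significance, 80% power) are assumptions for illustration; a dedicated experimentation platform or statistics package may use a different method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, min_detectable_effect, alpha=0.05, power=0.8):
    """Users needed in each variant to detect an absolute lift in a conversion rate.

    baseline_rate: current conversion rate, e.g. 0.10 for 10%.
    min_detectable_effect: absolute change to detect, e.g. 0.02 for +2 points.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: 10% baseline conversion, detect a 2-point absolute lift.
needed = sample_size_per_variant(0.10, 0.02)
```

Dividing the required sample by expected daily traffic per variant gives the duration estimate, which is why both numbers need to be fixed before launch.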

Experiment event structure:

Event	Properties	When
`Experiment Assigned`	`experiment_name`, `variant`, `assignment_method`	User enters experiment
`Experiment Exposed`	`experiment_name`, `variant`, `exposure_context`	User sees the variant
`Experiment Goal Reached`	`experiment_name`, `variant`, `goal_name`, `goal_value`	User hits primary metric
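
A sketch of how `Experiment Assigned` might be produced. Hashing the user and experiment name is one common way to get a stable assignment without storing state; it is shown here as an assumption, not as the method any particular experimentation tool uses. The function names and the `sha256_hash` value for `assignment_method` are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment_name, variants=("control", "treatment")):
    """Deterministically map a user to a variant for a given experiment."""
    key = f"{experiment_name}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16)
    return variants[bucket % len(variants)]


def experiment_assigned_event(user_id, experiment_name, variants=("control", "treatment")):
    """Build the `Experiment Assigned` event for this user."""
    return {
        "event_name": "Experiment Assigned",
        "user_id": user_id,
        "properties": {
            "experiment_name": experiment_name,
            "variant": assign_variant(user_id, experiment_name, variants),
            "assignment_method": "sha256_hash",  # hypothetical label
        },
    }
```

Because the assignment depends only on the user and experiment name, the same user sees the same variant in every session, and `Experiment Exposed` can be logged separately at the moment the variant is actually rendered.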

Step 7: Generate the observability plan

Compile everything into a single document:

Observability Plan -- (Project name)

Generated: (date)
Product: (brief description)
Current state: (summary of what's tracked today)

Event Taxonomy

(Table from Step 2 -- event names, categories, triggers, properties, priority)

User Journey Funnels

(Funnel definitions from Step 3 -- one per key journey)

Task Completion Metrics

(Table from Step 4 -- tasks, events, targets, timing)

Feature Adoption & Retention

(Metrics from Step 5 -- adoption rate, activation, retention cohorts)

Experiment Instrumentation

(Structure from Step 6 -- if applicable; omit if team doesn't run experiments yet)

Implementation Checklist

Priority-ordered list of what to implement next:

(P0) (Most critical gap -- e.g., "No event tracking exists; implement page views and core action events")
(P0) (Second critical gap -- e.g., "Signup funnel has no step-level tracking")
(P1) (Important but not urgent -- e.g., "Add timing instrumentation to onboarding flow")
(P1) (Next important item)
(P2) (Nice to have -- e.g., "Implement breadth-of-use metric across feature set")

Data Governance Notes

PII handling: (what user data is collected, how it's anonymized or consented)
Retention policy: (how long event data is kept)
Access: (who can see raw events vs. aggregated dashboards)

Open Questions

(Anything that couldn't be resolved without more information)

Step 8: Review and refine

Ask the user:

Does the event taxonomy cover the questions you need to answer about user behavior?
Are the funnels measuring the right steps? Any steps missing or too granular?
Are the task completion targets realistic based on what you know about user behavior?
Is the implementation checklist ordered correctly for your current priorities?
Are there privacy or compliance constraints that affect what can be tracked?
Do you need experiment instrumentation now, or is that a future concern?

Adjust based on feedback.

Related skills

/instrumentation-plan -- plan SRE instrumentation for system health, uptime, and deployment reliability
/experiment-design -- design a specific experiment to validate a product hypothesis
/research-synthesize -- synthesize qualitative research alongside quantitative observability data

Output location

Present the plan as formatted text in the conversation. The user can copy it into their product wiki, analytics documentation, or team shared drive.

Example Output

Input

Product description: Capsule is a B2B SaaS product that helps HR teams build and manage employee onboarding programs — creating task checklists, sending automated welcome sequences, and tracking new hire progress through their first 90 days
Current analytics: Google Analytics 4 (page views only), no event tracking beyond that; no funnel visibility; onboarding completion rates estimated manually from Salesforce notes
Key user journeys: (1) HR admin signup → first program published, (2) New hire receives invite → completes onboarding checklist, (3) HR admin reviews progress dashboard → takes action on at-risk employee
Business goals: Improve activation rate (HR admin publishes first program within 7 days of signup), improve new hire completion rate (target 80%), reduce churn among accounts where no program is published within 14 days
Known blind spots: No visibility into where HR admins abandon the program builder, no data on which checklist steps new hires skip most, no session-level timing, can't tell if progress dashboard is actually used
Analytics stack: Migrating to PostHog (self-hosted); Salesforce for CRM; Intercom for in-app messaging; no data warehouse yet

Output (abbreviated)

Observability Plan — Capsule HR Onboarding

Generated: 2025-07-14
Product: B2B SaaS for HR teams that builds and tracks employee onboarding programs through 90-day new hire journeys
Current state: GA4 page view tracking only. No event-level visibility. Activation, funnel, and task completion data are estimated manually. PostHog migration is in progress, so this plan assumes PostHog as the implementation target.

Event Taxonomy

Event Name	Category	Trigger	Key Properties	Priority
`Page Viewed`	Navigation	Any page load	`page_path`, `referrer`, `load_time_ms`, `user_role`	P0
`Account Created`	Lifecycle	HR admin completes registration	`signup_method`, `company_size`, `referral_source`	P0
`Program Builder Opened`	Engagement	Admin clicks "Create Program"	`entry_point`, `template_used`	P0
`Program Step Added`	Engagement	Admin adds a task to program	`step_type`, `step_index`, `program_id`	P0
`Program Published`	Conversion	Admin clicks "Publish"	`program_id`, `step_count`, `time_to_publish_days`, `template_used`	P0
`New Hire Invited`	Lifecycle	Admin sends onboarding invite	`program_id`, `invite_method`, `days_before_start_date`	P0
`Onboarding Checklist Opened`	Engagement	New hire opens their checklist	`program_id`, `device_type`, `hours_since_invite`	P0
`Checklist Step Completed`	Engagement	New hire marks a step done	`step_id`, `step_type`, `step_index`, `program_id`, `completion_method`	P0
`Checklist Step Skipped`	Error / Friction	New hire skips or bypasses a step	`step_id`, `step_type`, `step_index`, `skip_reason`	P0
`Onboarding Completed`	Lifecycle	All required steps finished	`program_id`, `total_steps`, `days_to_complete`, `skip_count`	P0
`Progress Dashboard Viewed`	Engagement	Admin opens new hire progress view	`new_hire_count`, `at_risk_count`, `view_depth_seconds`	P1
`At-Risk Employee Actioned`	Conversion	Admin sends nudge or reassigns step	`action_type`, `days_since_last_hire_activity`, `program_id`	P1
`Program Builder Abandoned`	Error / Friction	Admin exits builder without publishing (session ends)	`last_step_reached`, `steps_added`, `time_in_builder_minutes`	P1
`Form Validation Failed`	Error / Friction	Inline error shown to user	`form_name`, `field_name`, `error_type`, `user_role`	P1
`Experiment Assigned`	Lifecycle	User enters A/B test	`experiment_name`, `variant`, `user_role`	P1
`Account Churned`	Lifecycle	Subscription cancelled or not renewed	`tenure_days`, `programs_published`, `last_active_date`	P1
`Bulk Export Generated`	Security / Anomaly	Admin exports new hire data	`record_count`, `export_format`, `time_of_day`	P2
`Permission Elevated`	Security / Anomaly	User role changed to admin	`changed_by`, `previous_role`, `account_id`	P2

User Journey Funnels

Journey 1: HR Admin Signup → First Program Published (Activation)

Step	Event	Success Criteria	Expected Drop-off	Alert If
1. Signup	`Account Created`	Admin registers	--	Volume > 20% below 7-day avg
2. Builder Entry	`Program Builder Opened`	Admin starts building within 7 days	20–30% drop	Drop > 45%
3. Content Added	`Program Step Added` (3+ events)	Admin adds at least 3 steps	20–30% drop	Drop > 40%
4. Published	`Program Published`	Admin publishes first program	25–35% drop	Drop > 50%

Target activation rate: ≥ 55% of signups publish a program within 7 days
Critical blind spot addressed: the `Program Builder Abandoned` event reveals where admins stall; step count and time in builder pinpoint the friction.
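
A quick sanity check worth running on any funnel is to multiply the expected step-level retention through to an end-to-end rate and compare it with the target. The sketch below does this for the Journey 1 drop-off ranges; the function name is hypothetical.

```python
def cumulative_conversion(drop_offs):
    """End-to-end conversion implied by a sequence of step drop-off rates."""
    rate = 1.0
    for drop in drop_offs:
        rate *= 1 - drop
    return rate


# Journey 1 expected drop-offs: low end and high end of each range.
best_case = cumulative_conversion([0.20, 0.20, 0.25])
worst_case = cumulative_conversion([0.30, 0.30, 0.35])
```

With these ranges the implied signup-to-publish rate is roughly 32% to 48%, which sits below the 55% target. That suggests treating the target as a stretch goal, or revisiting the expected drop-offs once real baseline data is available.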

Journey 2: New Hire → Onboarding Completed

Step	Event	Success Criteria	Expected Drop-off	Alert If
1. Invited	`New Hire Invited`	Invite delivered	—	Delivery failure rate > 5%
2. Checklist Opened	`Onboarding Checklist Opened`	New hire opens within 48 hrs	10–20% drop	Drop > 35%
3. First Step Completed	`Checklist Step Completed` (step_index = 1)	Any first action taken	15–25% drop	Drop > 40%
4. Halfway	`Checklist Step Completed` (step_index = 50% of total)	Sustained progress	15–25% drop	Drop > 35%
5. Completed	`Onboarding Completed`	All required steps done	10–20% drop	Completion rate < 70%

Target completion rate: ≥ 80% of invited new hires
Note: `Checklist Step Skipped` by `step_index` will reveal which specific tasks block completion. This is Capsule's most actionable unknown today.

Journey 3: HR Admin → Progress Dashboard → Action Taken

Step	Event	Success Criteria	Expected Drop-off	Alert If
1. Dashboard Opened	`Progress Dashboard Viewed`	Admin views dashboard	—	Less than 40% of active accounts/week
2. At-Risk Identified	Dashboard view with `at_risk_count > 0`	Admin sees a flagged hire	Varies	—
3. Action Taken	`At-Risk Employee Actioned`	Admin responds within 48 hrs	40–60% drop	Action rate < 25% on at-risk accounts

Task Completion Metrics

Task	Start Event	End Event	Success Criteria	Time Target	Measure
Publish first program	`Program Builder Opened`	`Program Published`	Program has ≥ 3 steps	< 20 minutes	Completion rate, median time, abandonment step
New hire completes onboarding	`Onboarding Checklist Opened`	`Onboarding Completed`	All required steps done	< 30 days	Completion rate, skip
