Cohort Analysis

You need to analyze retention patterns and diagnose churn by cohort.

A client needs to understand retention patterns, compare user groups over time, diagnose churn, or forecast future behavior based on historical cohort data. Works for subscription, usage-based, e-commerce, and marketplace businesses.


How it works

  1. You provide the product/service, available data, retention definition, time period, and key events or milestones
  2. The skill designs retention curves, defines behavioral segments for cohort comparison, diagnoses churn patterns with root cause hypotheses, and recommends intervention timing
  3. It returns a cohort analysis framework with interpretation guide, churn diagnosis, and cohort-based forecast Kate can use for retention strategy conversations

Prompt

You are building a cohort analysis framework for a Kate Makrigiannis consulting engagement. Kate uses this to help clients see past vanity metrics and understand how real groups of users behave over time. Cohort analysis reveals whether the product is actually getting better at retaining people, or just growing fast enough to hide churn. Before writing, read knowledge/voice-tone-guide.md -- use the client-facing voice.

Inputs I will provide:

  • Product/Service: {{PRODUCT}} (what the product is, business model, stage)
  • Available data: {{DATA}} (what data exists -- e.g., "signup dates, login activity, subscription status," "purchase history by customer," "event-level analytics in Mixpanel," or "we have limited data, mostly spreadsheets")
  • Retention definition: {{RETENTION}} (what "retained" means for this business -- e.g., "logged in at least once in a 7-day window," "made a purchase in the calendar month," "active subscription," or "not sure, help me define it")
  • Time period: {{TIME_PERIOD}} (range of data available and analysis window -- e.g., "12 months of data, want to look at monthly cohorts" or "6 weeks of data, daily cohorts")
  • Key events/milestones: {{EVENTS}} (product changes, pricing changes, marketing campaigns, seasonal factors that might affect cohort behavior)
  • Context (optional): {{CONTEXT}} (specific questions, known retention problems, growth targets, segments of interest)

Step 1: Define the cohort framework

Retention Definition

Before building any analysis, lock down precisely what "retained" means:

| Parameter | Definition | Rationale |
| --- | --- | --- |
| Cohort grouping | [How users are grouped -- e.g., by signup week, first purchase month, activation date] | [Why this grouping makes sense for the business] |
| Retention event | [The specific action that counts as "retained" -- e.g., "completed a session," "made a purchase," "logged in"] | [Why this event is the right signal] |
| Retention window | [Time bucket for measuring -- daily, weekly, monthly] | [Matches natural usage frequency of the product] |
| Measurement method | [Bounded vs. unbounded retention -- "did the event happen in week N" vs. "did the event happen in week N or any later week"] | [Which is more appropriate for this product] |
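
The difference between the two measurement methods is easiest to see on a tiny cohort. The following is a minimal sketch, assuming each user's activity has already been reduced to a set of period indexes (weeks since signup) in which the retention event occurred; the function names and the sample cohort are hypothetical.

```python
def bounded_retention(active_periods_by_user, period):
    """Share of the cohort with a retention event in exactly `period`."""
    hits = sum(1 for periods in active_periods_by_user.values() if period in periods)
    return hits / len(active_periods_by_user)


def unbounded_retention(active_periods_by_user, period):
    """Share of the cohort with a retention event in `period` or any later period."""
    hits = sum(
        1
        for periods in active_periods_by_user.values()
        if any(p >= period for p in periods)
    )
    return hits / len(active_periods_by_user)


# Hypothetical cohort: user id -> weeks since signup with a retention event.
cohort = {
    "u1": {0, 1, 2, 3},
    "u2": {0, 3},   # skipped weeks 1-2, came back in week 3
    "u3": {0, 1},
    "u4": {0},
}

bounded_w2 = bounded_retention(cohort, 2)      # only u1 was active in week 2
unbounded_w2 = unbounded_retention(cohort, 2)  # u1 and u2 were active in week 2 or later
```

Unbounded retention is never lower than bounded retention for the same period, because returning users such as `u2` count as retained for every period before they come back.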

Alternative Retention Definitions to Consider

| Alternative | When to Use | Trade-off |
| --- | --- | --- |
| [e.g., "Any login" vs. "Core action completed"] | [Broad engagement vs. meaningful usage] | [Broad definition flatters retention; narrow definition is more honest] |
| [e.g., "Weekly" vs. "Monthly" windows] | [High-frequency vs. low-frequency products] | [Wider windows smooth noise but hide early drop-off] |

Step 2: Retention curve design

Standard Retention Table Template

Design the cohort retention table structure:

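| Cohort | Size | Period 0 | Period 1 | Period 2 | Period 3 | Period 4 | Period 5 | Period 6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [Month/Week 1] | [N users] | 100% | [X%] | [X%] | [X%] | [X%] | [X%] | [X%] |
| [Month/Week 2] | [N users] | 100% | [X%] | [X%] | [X%] | [X%] | [X%] | -- |
| [Month/Week 3] | [N users] | 100% | [X%] | [X%] | [X%] | [X%] | -- | -- |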

If the client provides actual data, populate the table. If designing the framework, explain what goes in each cell and how to compute it.

Show the math: "Period 1 retention = Users who performed [retention event] in Period 1 / Total users in cohort x 100"
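
The cell formula above can be applied to a raw event log. The following is a minimal sketch, assuming monthly cohorts keyed by signup month and a flat list of `(user, date)` retention events; `retention_table`, `month_index`, and the sample data are hypothetical names for illustration. Period 0 comes out at 100% here only because each sample user has an event in their signup month.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + d.month


def retention_table(signups, events):
    """signups: {user: signup date}; events: [(user, event date)].
    Returns {(year, month): {"size": n, "retention": {period: percent}}}."""
    cohort_users = defaultdict(set)
    for user, signup in signups.items():
        cohort_users[(signup.year, signup.month)].add(user)

    # (cohort, period) -> distinct users with a retention event in that period
    active = defaultdict(set)
    for user, when in events:
        signup = signups[user]
        period = month_index(when) - month_index(signup)
        if period >= 0:
            active[((signup.year, signup.month), period)].add(user)

    table = {}
    for cohort, users in sorted(cohort_users.items()):
        periods = sorted(p for (c, p) in active if c == cohort)
        table[cohort] = {
            "size": len(users),
            # Period N retention = users active in period N / cohort size x 100
            "retention": {
                p: round(100 * len(active[(cohort, p)]) / len(users), 1)
                for p in periods
            },
        }
    return table


signups = {
    "a": date(2024, 1, 5), "b": date(2024, 1, 20),
    "c": date(2024, 1, 25), "d": date(2024, 2, 3),
}
events = [
    ("a", date(2024, 1, 5)), ("b", date(2024, 1, 20)), ("c", date(2024, 1, 25)),
    ("a", date(2024, 2, 10)), ("b", date(2024, 2, 11)), ("a", date(2024, 3, 2)),
    ("d", date(2024, 2, 3)), ("d", date(2024, 3, 15)),
]
table = retention_table(signups, events)
```

Periods with no events for a cohort are simply absent from the result, which keeps unobserved cells from being read as 0% or 100%.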

Retention Curve Interpretation Guide

| Curve Shape | What It Means | Typical Cause | Action |
| --- | --- | --- | --- |
| Steep early drop, then flattens | Product has a core retained audience but loses most users quickly | Onboarding friction, wrong audience, unclear value prop | Focus on activation and time-to-value |
| Gradual steady decline | Users slowly disengage over time, no stable floor | Product lacks habit loops or ongoing value | Build engagement hooks, recurring value triggers |
| Flat high retention | Most users stick around | Strong product-market fit for this segment | Focus on acquisition, the product retains well |
| Flat low retention | Almost everyone leaves quickly | Fundamental product or audience problem | Revisit product-market fit before optimizing retention |
| Improving over time (newer cohorts retain better) | Product is getting better at retaining | Product improvements, better onboarding, better targeting | Keep iterating, quantify what changed |
| Declining over time (newer cohorts retain worse) | Product or audience quality is degrading | Channel mix shift, market saturation, product neglect | Diagnose urgently -- this compounds fast |

Step 3: Behavioral segmentation for cohort comparison

Define segments that are worth comparing as separate cohorts:

Segmentation Criteria

| Segment Dimension | Segments to Compare | Hypothesis |
| --- | --- | --- |
| Acquisition channel | [e.g., Organic vs. Paid vs. Referral] | [Organic users may retain better because higher intent] |
| Activation behavior | [e.g., Completed onboarding vs. Did not] | [Users who hit the aha moment retain at higher rates] |
| Plan/Tier | [e.g., Free vs. Paid vs. Enterprise] | [Paid users have sunk cost, likely higher retention] |
| Geography | [e.g., US vs. International] | [Product-market fit may vary by region] |
| Use case | [e.g., Primary use case A vs. B] | [One use case may have stronger retention loops] |
| Time-based | [e.g., Pre-launch vs. Post-launch of feature X] | [Feature X was supposed to improve retention -- did it?] |

Priority segments to analyze first:

  1. [Segment comparison] -- because [reason this is the highest-value comparison]
  2. [Segment comparison] -- because [reason]
  3. [Segment comparison] -- because [reason]

Step 4: Churn pattern identification

Churn Timing Analysis

| Churn Window | % of Total Churn | Cumulative | Pattern |
| --- | --- | --- | --- |
| Period 0-1 | [X%] | [X%] | [Early churn -- activation problem] |
| Period 1-3 | [X%] | [X%] | [Short-term churn -- value realization gap] |
| Period 3-6 | [X%] | [X%] | [Medium-term churn -- engagement decay] |
| Period 6-12 | [X%] | [X%] | [Long-term churn -- competitive displacement or needs change] |
| Period 12+ | [X%] | [X%] | [Mature churn -- natural lifecycle] |

If actual data is provided, compute these. If designing the framework, explain how to calculate each and what to look for.
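
The churn timing table can be computed from a simple list of churn times. The following is a minimal sketch, assuming each churned user is represented by the number of full periods they stayed retained before churning. It also reads the windows as half-open ranges (so "Period 1-3" covers users who survived 1 or 2 periods), which is an assumption because the template's window boundaries overlap. `WINDOWS`, `churn_timing`, and the sample data are hypothetical.

```python
import math

# (label, first period included, first period excluded)
WINDOWS = [
    ("Period 0-1", 0, 1),
    ("Period 1-3", 1, 3),
    ("Period 3-6", 3, 6),
    ("Period 6-12", 6, 12),
    ("Period 12+", 12, math.inf),
]


def churn_timing(periods_survived, windows=WINDOWS):
    """Return [(window label, % of total churn, cumulative %)]."""
    total = len(periods_survived)
    rows, cumulative = [], 0.0
    for label, start, end in windows:
        count = sum(1 for p in periods_survived if start <= p < end)
        share = 100 * count / total
        cumulative += share
        rows.append((label, round(share, 1), round(cumulative, 1)))
    return rows


# Hypothetical data: 100 churned users and how many periods each survived.
churned = [0] * 40 + [1] * 15 + [2] * 10 + [4] * 15 + [8] * 12 + [14] * 8
rows = churn_timing(churned)
```

With this sample, 40% of churn lands in the first window and 65% within the first three periods, which would point the root cause work at activation and early value realization.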

Root Cause Hypothesis Matrix

For each significant churn window, generate testable hypotheses:

| Churn Window | Hypothesis | Supporting Signal | How to Validate | Confidence |
| --- | --- | --- | --- | --- |
| Period 0-1 | [e.g., Users do not understand value in first session] | [e.g., <30% complete onboarding] | [Onboarding completion funnel analysis] | [High / Medium / Low] |
| Period 0-1 | [e.g., Wrong audience from paid acquisition] | [e.g., Paid cohorts churn 2x vs. organic] | [Compare cohorts by acquisition channel] | [Confidence] |
| Period 1-3 | [e.g., No habit loop after initial use] | [e.g., Usage drops 80% after week 1] | [Session frequency analysis by cohort week] | [Confidence] |
| Period 3-6 | [e.g., Users hit a capability ceiling] | [e.g., Power users upgrade, others leave] | [Feature usage correlation with retention] | [Confidence] |

Step 4b: Statistical rigor for cohort comparisons

When comparing retention between segments or testing whether a cohort difference is real vs. noise:

Statistical Validation

| Comparison | Method | When to use |
| --- | --- | --- |
| Two cohort retention rates at a single time point | Z-test for proportions or chi-square | "Is the January cohort's Month 3 retention different from February's?" |
| Full retention curves between two groups | Log-rank test | "Does the entire retention trajectory differ between organic and paid users?" |
| Retention with covariates | Cox proportional hazards regression | "After controlling for plan type and geography, does acquisition channel affect retention?" |
| Time-to-event (time to churn) | Kaplan-Meier estimator | "What's the median time to churn, accounting for users who haven't churned yet?" |

Censoring: the most common cohort analysis mistake. Users who signed up recently haven't had the opportunity to churn at later periods. This isn't "100% retention at Month 6" -- it's missing data. Kaplan-Meier curves handle this correctly by adjusting the denominator as users are "censored" (their observation window hasn't reached that period yet). Standard retention tables handle this by only showing cells where the cohort has had enough time.
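
To make the denominator adjustment concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimate in plain Python. It assumes each user is described by the number of periods observed and a flag saying whether churn was actually seen (`True`) or the user was still active when observation ended (`False`, censored). The function name and sample data are hypothetical; in practice a survival analysis library would normally be used instead.

```python
def kaplan_meier(durations, churned):
    """Return [(t, estimated share still retained after t)] for each period t
    in which at least one churn was observed."""
    event_times = sorted({t for t, e in zip(durations, churned) if e})
    survival, curve = 1.0, []
    for t in event_times:
        # Users still under observation at t, including those censored at t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, churned) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical cohort of six users.
durations = [1, 1, 2, 3, 3, 4]
churned = [True, True, False, True, False, False]
curve = kaplan_meier(durations, churned)
```

In this sample the user censored at period 2 drops out of the risk set before period 3, so the period 3 step divides by 3 users at risk rather than by the original 6.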

Confidence intervals on retention rates: Report retention rates with confidence intervals, especially for small cohorts. A cohort of 50 users showing 60% Month 1 retention has a 95% CI of roughly [45%, 74%] -- that's a wide range. A cohort of 5,000 at 60% has a CI of [58.6%, 61.4%]. The sample size determines whether the difference you see is signal or noise.
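
The interval widths quoted above, and the single-time-point comparison from the Statistical Validation table, can be checked with the standard library. This is a minimal sketch using the normal-approximation (Wald) interval and a pooled two-proportion z-test; both function names are hypothetical. For small cohorts a Wilson or exact interval is often preferred, which is why the small-cohort bounds here differ slightly from the rounded figures in the text.

```python
import math
from statistics import NormalDist


def retention_ci(retained, cohort_size, confidence=0.95):
    """Normal-approximation confidence interval for a retention rate."""
    p = retained / cohort_size
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / cohort_size)
    return max(0.0, p - margin), min(1.0, p + margin)


def two_proportion_z_test(retained_a, size_a, retained_b, size_b):
    """Two-sided z-test for a difference between two retention rates."""
    p_a, p_b = retained_a / size_a, retained_b / size_b
    pooled = (retained_a + retained_b) / (size_a + size_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


small_ci = retention_ci(30, 50)       # 60% retention, cohort of 50
large_ci = retention_ci(3000, 5000)   # 60% retention, cohort of 5,000

# Hypothetical comparison: 60% vs. 70% retention, 50 users in each cohort.
z, p_value = two_proportion_z_test(30, 50, 35, 50)
```

With only 50 users per cohort, a 10-point gap in retention is not statistically significant at the 5% level in this example, which is the kind of difference that should not be presented to a client as a finding.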

Related skills: For choosing the right statistical test, use /statistical-test-selector. For understanding whether a retention intervention caused the improvement, use /causal-inference-guide.

Step 5: Intervention timing recommendations

Intervention Map

| Churn Risk Window | Intervention | Trigger | Channel | Expected Impact |
| --- | --- | --- | --- | --- |
| Day 0-3 | [Onboarding email sequence] | [Signup without completing core action] | [Email + in-app] | [Increase activation by X%] |
| Day 7-14 | [Re-engagement nudge] | [No activity for 5+ days] | [Push / Email] | [Recover X% of dormant users] |
| Day 30 | [Value check-in] | [End of first month] | [Email / in-app survey] | [Identify at-risk users early] |
| Day 60-90 | [Feature education] | [Users not using key features] | [In-app walkthrough] | [Expand usage depth] |
| Day 180+ | [Win-back campaign] | [Churned for 30+ days] | [Email with incentive] | [Recover X% at lower LTV] |
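
The trigger column of the map can be expressed as simple rules. This is a minimal sketch under stated assumptions: the `UserState` fields and intervention labels are hypothetical, the day thresholds are the placeholder values from the table, and "churned for 30+ days" is approximated as 30 or more days without activity.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    days_since_signup: int
    days_since_last_activity: int
    completed_core_action: bool
    uses_key_features: bool


def due_interventions(u: UserState) -> list[str]:
    """Return the interventions from the map whose trigger currently fires."""
    due = []
    if u.days_since_signup <= 3 and not u.completed_core_action:
        due.append("onboarding_email_sequence")
    if 7 <= u.days_since_signup <= 14 and u.days_since_last_activity >= 5:
        due.append("re_engagement_nudge")
    if u.days_since_signup == 30:
        due.append("value_check_in")
    if 60 <= u.days_since_signup <= 90 and not u.uses_key_features:
        due.append("feature_education")
    if u.days_since_signup >= 180 and u.days_since_last_activity >= 30:
        due.append("win_back_campaign")
    return due


new_user = due_interventions(UserState(2, 2, False, False))
dormant_user = due_interventions(UserState(10, 6, True, False))
lapsed_user = due_interventions(UserState(200, 45, True, True))
```

The actual windows and thresholds should come from the churn timing analysis in Step 4 rather than from these defaults.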

Step 6: Cohort-based forecasting

Retention Forecast Model

If the client wants to project future revenue or user counts:

| Input | Value | Source |
| --- | --- | --- |
| Monthly new users | [X] | [Current acquisition rate or target] |
| Retention curve (mature cohort) | [X% at Month 1, X% at Month 3, X% at Month 6, X% at Month 12] | [Historical cohort data or benchmark] |
| Revenue per retained user | [$X/month] | [ARPU or subscription price] |

Forward Projection

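| Month | New Users | Retained from Prior Cohorts | Total Active Users | Monthly Revenue |
| --- | --- | --- | --- | --- |
| Month 1 | [X] | [0] | [X] | [$X] |
| Month 2 | [X] | [X from M1 x M1 retention %] | [X] | [$X] |
| Month 3 | [X] | [Sum of retained from all prior cohorts] | [X] | [$X] |
| ... | ... | ... | ... | ... |
| Month 12 | [X] | [X] | [X] | [$X] |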

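Show the math for at least Month 3. By Month 3 the M1 cohort is two months old and the M2 cohort is one month old, so each is multiplied by the retention rate for its own age: "Month 3 active = New M3 users + (M1 cohort x Month 2 retention %) + (M2 cohort x Month 1 retention %) = X + X + X = X users"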
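
The same projection can be written as a general formula. The symbols below are introduced here for illustration and are not part of the template: $A_t$ is total active users in calendar month $t$, $N_c$ is the number of new users joining in month $c$, and $r_k$ is the retention rate $k$ months after a cohort starts, so each cohort is multiplied by the retention rate for its own age.

```latex
\begin{align}
A_t &= \sum_{c=1}^{t} N_c \, r_{t-c}, \qquad r_0 = 1 \\
A_3 &= N_3 + N_1 \, r_2 + N_2 \, r_1 \\
\text{Revenue}_t &= A_t \times \text{ARPU}
\end{align}
```

The second line is the Month 3 case: the first cohort is two months old and the second is one month old.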

Scenario Comparison

| Scenario | Retention Change | Impact on Month 12 Active Users | Revenue Impact |
| --- | --- | --- | --- |
| Baseline | Current retention curve | [X users] | [$X/month] |
| +5 percentage-point retention improvement | [Adjusted curve] | [X users (+Y%)] | [$X/month (+$Z)] |
| +10 percentage-point retention improvement | [Adjusted curve] | [X users (+Y%)] | [$X/month (+$Z)] |
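
A scenario comparison like this can be sketched in a few lines. This is a minimal sketch, not a definitive model: the function names, the retention curve, the 1,000 new users per month, and the $50 revenue per user are all made-up inputs. It also assumes that a Month 1 improvement carries through to later months proportionally, meaning users who survive Month 1 go on to retain at the same conditional rates as before.

```python
def project_active_users(new_users_per_month, retention_curve, months=12):
    """retention_curve[k] = share of a cohort still active k months after joining
    (retention_curve[0] must be 1.0 and the curve must cover `months` entries)."""
    active = []
    for t in range(months):
        # Sum every cohort that has started by month t, weighted by its age.
        total = sum(new_users_per_month * retention_curve[t - c] for c in range(t + 1))
        active.append(total)
    return active


def lift_month1(retention_curve, points):
    """Raise Month 1 retention by `points` (e.g. 0.05) and scale later months
    by the same ratio."""
    ratio = (retention_curve[1] + points) / retention_curve[1]
    return [retention_curve[0]] + [r * ratio for r in retention_curve[1:]]


# Hypothetical inputs for illustration only.
baseline_curve = [1.0, 0.60, 0.50, 0.45, 0.42, 0.40, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33]
new_users = 1000
revenue_per_user = 50

baseline = project_active_users(new_users, baseline_curve)
improved = project_active_users(new_users, lift_month1(baseline_curve, 0.05))

extra_users = improved[-1] - baseline[-1]
extra_monthly_revenue = extra_users * revenue_per_user
```

Under these made-up inputs, Month 12 active users rise from about 5,500 to about 5,875. If the client's data suggests the lift fades after Month 1, swap `lift_month1` for a curve adjusted by hand.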

"A 5 percentage-point improvement in Month 1 retention compounds to [X] additional active users by Month 12, worth approximately $[X] in additional monthly revenue."

Kate's Talking Points

  • "Your retention curve shows [shape]. This tells us [interpretation]. The biggest opportunity is [specific window and action]."
  • "Newer cohorts are retaining [better than / worse than / the same as] older cohorts. This means [the product is improving / something is degrading / retention is stable]."
  • "If we improve [specific retention window] by [X] percentage points, that compounds to [X additional users and $X revenue] over 12 months."

Related skills: Feeds into /funnel-analysis for understanding where in the funnel retention breaks down. Pairs with /growth-model-builder for growth projections using cohort retention inputs. Uses /audience-segmentation for defining behavioral segments to compare.


Example Output

Input

  • Product/Service: Brightpath — a B2B SaaS platform for mid-market HR teams to manage employee learning and development (LMS + skills tracking). $299/seat/year subscription. 3 years old, ~$4.2M ARR, Series A raised 14 months ago. Growing ~40% YoY but board is asking hard questions about net revenue retention.
  • Available data: Salesforce (contract start/end dates, ARR by account), Mixpanel (event-level usage: logins, course completions, skills assessments, admin actions), and a spreadsheet of monthly active account counts by signup quarter going back 28 months. No individual seat-level data — retention is tracked at the account level.
  • Retention definition: Not fully defined. Currently reporting "accounts with at least one login in the calendar month" — but leadership suspects this is masking low-depth engagement. Want help defining something more meaningful.
  • Time period: 28 months of data; want monthly cohort analysis (cohorts defined by contract start month). Most mature cohort is 28 months old; most recent is 2 months old.
  • Key events/milestones: New onboarding flow launched Month 10; pricing increased ~18% at Month 16; mobile app released Month 20; a large enterprise-focused content library added Month 24. Sales shifted toward mid-market (50–500 employees) starting around Month 12 after early growth came from smaller SMB accounts.
  • Context: NRR is reportedly 91% — below the 100%+ benchmark investors expect. Leadership suspects early churn is the main problem but hasn't validated this. Three cohorts from the SMB era (Months 1–6) are dragging down the averages. Want to know whether the product is actually getting better at retaining mid-market accounts or just looks that way because SMB cohorts are aging out.

Output (abbreviated)

Cohort Analysis Framework: Brightpath

Prepared for Kate Makrigiannis | Engagement Reference: Brightpath NRR Diagnostic


Step 1: Retention Definition

Locked Retention Definition

| Parameter | Definition | Rationale |
| --- | --- | --- |
| Cohort grouping | Month of contract start date (from Salesforce) | Aligns retention measurement to the contractual relationship, not login behavior |
| Retention event | Account completed ≥1 course OR ran ≥1 skills assessment in the calendar month | Distinguishes genuine platform usage from admin check-ins; ties retention to Brightpath's core value |
| Retention window | Monthly (calendar month) | Matches natural L&D workflow cycles — most HR teams plan training monthly or quarterly |
| Measurement method | Bounded retention — "did the account perform the retention event in Month N?" | Appropriate for subscription businesses; tells us whether accounts are active in that specific month, not merely whether they ever come back |

Why the current definition is misleading: "Any login" captures HR admins logging in to pull a report or check a setting — not actual learning activity. An account logging in once per month with zero courses completed is technically "retained" under the current definition but is a churn risk. This inflated headline number is likely masking a depth-of-engagement problem.

Alternative Retention Definitions to Consider

| Alternative | When to Use | Trade-off |
| --- | --- | --- |
| "Any login" (current method) | Useful for early-stage products where any engagement signal matters | Flatters retention numbers; an account with one admin login looks identical to one with 200 completions |
| "≥X% of licensed seats active monthly" | Better for seat-based contracts with large accounts | Penalizes accounts that are active but using fewer seats than purchased — may not reflect renewal risk accurately |
| "Course completion + skills assessment" (recommended) | When the product's core value is demonstrated learning outcomes | Narrower definition will show lower retention initially — that's honest, and it's where the real work is |

Step 2: Retention Curve Design

Standard Retention Table Template (Framework — Populate from Mixpanel + Salesforce)

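| Cohort | Accounts | M0 | M1 | M3 | M6 | M12 | M18 | M24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Q1 SMB (M1–3) | ~85 | 100% | 74% | 58% | 41% | 28% | 19% | 14% |
| Q2 SMB (M4–6) | ~110 | 100% | 71% | 54% | 38% | 25% | 17% | -- |
| Q3 Transition (M7–9) | ~95 | 100% | 76% | 61% | 47% | 33% | -- | -- |
| Q4 Mid-market (M10–12) | ~130 | 100% | 81% | 67% | 54% | 38% | -- | -- |
| Q5 Mid-market (M13–15) | ~160 | 100% | 83% | 70% | 57% | -- | -- | -- |
| Q6 Post-pricing (M16–18) | ~145 | 100% | 80% | 66% | -- | -- | -- | -- |
| Q7 Mobile era (M19–21) | ~175 | 100% | 85% | 71% | -- | -- | -- | -- |
| Q8 Recent (M22–24) | ~190 | 100% | 84% | -- | -- | -- | -- | -- |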

Cells marked -- are censored: those cohorts haven't reached that period yet. Do not report these as 100% retention. Leave them blank in client-facing materials.

The math:

M3 retention for Q4 Mid-market cohort = (Accounts that completed ≥1 course OR skills assessment in Month 3 ÷ 130 total accounts in cohort) × 100 = 87 ÷ 130 × 100 ≈ 67%

Retention Curve Interpretation Guide — Brightpath Context

| Curve Shape | Match to Brightpath? | Interpretation |
| --- | --- | --- |
| Steep early drop, then flattens | ✅ Yes — especially SMB cohorts | Brightpath loses ~25–30% of accounts in Month 1, then stabilizes. This is an activation problem, not a long-term product failure. |
| Improving over time (newer cohorts retain better) | ✅ Likely — Q4 onward shows improvement | The mid-market shift + new onboarding (Month 10) appear to be working. This is the story the board needs to hear — with proof. |
| Declining over time | ⚠️ Watch for post-pricing cohorts (M16+) | If M16–18 cohorts flatten below M13–15, the 18% price increase may have filtered for lower-commitment accounts. |

Step 3: Behavioral Segmentation

Priority Cohort Comparisons

| Segment Dimension | Segments to Compare | Hypothesis |
| --- | --- | --- |
| Company size (ICP shift) | SMB (<50 employees) vs. Mid-market (50–500) | Mid-market accounts have dedicated L&D budgets and stronger internal champions; should retain meaningfully better |
| Onboarding completion | Accounts that completed new onboarding flow (Month 10+) vs. those that did not | Structured onboarding likely reduces early churn by accelerating time-to-first-course-completion |
| Activation depth in Month 1 | Accounts with ≥10 course completions in M1 vs. <10 | Early depth of usage is almost always the strongest leading indicator of long-term retention |
| Pricing tier | Pre-price-increase vs. post-price-increase cohorts | Higher price point may attract higher-intent buyers or may be filtering out marginal accounts |
| Mobile adoption | Accounts with ≥20% of sessions on mobile vs. desktop-only | Mobile app (Month 20) may have unlocked a new usage pattern that correlates with retention |

Priority ranking:

  1. SMB vs. Mid-market — this is the core diagnostic question. If mid-market cohorts retain at 110%+ NRR, the SMB drag is a legacy problem that will naturally age out. That's a very different conversation with the board than "we have a retention problem."
  2. **Activation depth in Month