Use this when you suspect data issues are undermining decisions -- dashboards showing unexpected patterns, metrics that don't reconcile, or teams losing trust in their numbers. Also use proactively when standing up a new analytics surface or onboarding a data-heavy product. Produces a scored data quality report with validation rules, freshness SLOs, and a remediation roadmap. Now includes schema quality scoring, non-technical team assessment mode, and cross-system consistency checks.
Related skills: Pair with
/metrics-dashboardto validate the data underneath metric definitions. Use/instrumentation-planfor system-level observability. Feed findings into/analytics-dashboard-designto add data quality indicators to dashboards. Use/data-governance-playbookto establish ongoing governance. Pair with/integration-landscape-mapfor cross-system data flow context.
Process
Step 1: Gather inputs
Ask the user to provide:
- Data sources under review -- which databases, pipelines, event streams, or third-party integrations?
- Critical metrics -- which metrics does the business rely on for decisions? (Link to
/metrics-dashboardoutput if available.) - Known data issues -- what's already broken, suspect, or complained about?
- Pipeline architecture -- how does data flow from source to analytics surface? (Event capture, ETL/ELT, warehouse, BI tool.)
- Team context -- who owns the data? Who consumes it? Is there a data team, or do PMs and engineers own pipelines?
- Current monitoring -- any existing data quality checks, alerts, or validation rules?
Step 2: Assess the six dimensions of data quality
Evaluate each dimension across the critical data sources:
## Data Quality Assessment -- {{product/team}}, {{date}}
### Dimension scores
| Dimension | Definition | Score (1-5) | Key findings |
|-----------|-----------|-------------|-------------|
| Completeness | Are expected records present? Are required fields populated? | (n) | (e.g., "12% of events missing user_id") |
| Accuracy | Do values reflect reality? Do calculations match definitions? | (n) | (e.g., "Revenue metric double-counts refunded orders") |
| Freshness | Is data available when needed? What's the lag from event to queryable? | (n) | (e.g., "Dashboard refreshes daily but decisions need hourly data") |
| Consistency | Do the same concepts match across sources? Do metrics reconcile? | (n) | (e.g., "User count differs between product DB and analytics warehouse") |
| Validity | Do values fall within expected ranges? Do they conform to schema? | (n) | (e.g., "Negative session durations in 0.3% of records") |
| Uniqueness | Are there duplicates? Is deduplication handled correctly? | (n) | (e.g., "Event replay after outage created 2x records for March 14") |
| Schema quality | Are naming conventions consistent? Are relationships defined? Are orphaned entities minimal? | (n) | (e.g., "65% of custom objects have zero relationships -- orphan risk") |
### Score definitions
- **5** -- Monitored and maintained. Automated validation, clear SLOs, rare issues.
- **4** -- Solid. Occasional issues caught quickly. Some automation.
- **3** -- Functional. Known gaps, manual checks, issues discovered reactively.
- **2** -- Fragile. Frequent issues, low trust, workarounds common.
- **1** -- Unreliable. Data regularly contradicts reality. Decisions made despite data, not because of it.
Step 3: Audit metric computation
For each critical metric, verify the computation chain from source to surface:
### Metric computation audit
| Metric | Definition (as documented) | Actual computation | Match? | Issue |
|--------|--------------------------|-------------------|--------|-------|
| {{metric_name}} | (What the metric is supposed to measure) | (What the SQL/code actually computes) | Yes / No / Partial | (e.g., "Denominator excludes churned users but definition says 'all users'") |
### Common computation issues found
- [ ] Numerator/denominator mismatch with documented definition
- [ ] Time window ambiguity (calendar day vs. rolling 24h vs. business day)
- [ ] Null handling (are nulls excluded, counted as zero, or causing errors?)
- [ ] Currency/timezone conversions applied inconsistently
- [ ] Metric versioning (definition changed but historical data not restated)
- [ ] Aggregation errors (averaging averages, summing rates)
Step 4: Define freshness SLOs
For each critical data source, define when data should be available:
### Freshness SLOs
| Data source | Current latency | Required latency | SLO | Monitoring |
|-------------|----------------|-----------------|-----|-----------|
| {{source}} | (e.g., T+4 hours) | (e.g., T+1 hour) | (e.g., "99% of daily loads complete by 7am UTC") | (How is this monitored? Alert channel?) |
### Freshness monitoring rules
- Define "stale" threshold for each source (when should an alert fire?)
- Track load completion times, not just schedule
- Monitor row counts vs. expected volume (a load that completes with 0 rows is worse than a late load)
Step 5: Design validation rules
Propose automated checks for the highest-risk data:
### Validation rules
| Rule | Check | Severity | Action on failure |
|------|-------|----------|------------------|
| Completeness | {{field}} NOT NULL rate > {{threshold}}% | Critical / Warning | Block pipeline / Alert |
| Volume | Row count within {{n}}% of trailing 7-day average | Warning | Alert data team |
| Range | {{field}} BETWEEN {{min}} AND {{max}} | Critical | Quarantine rows |
| Referential | All {{foreign_key}} values exist in {{parent_table}} | Critical | Block pipeline |
| Recency | MAX({{timestamp_field}}) within {{n}} hours of now | Critical | Alert on-call |
| Uniqueness | COUNT(DISTINCT {{key}}) = COUNT(*) | Critical | Deduplicate and alert |
| Cross-source | {{metric}} in source A within {{n}}% of source B | Warning | Investigate discrepancy |
### Implementation priority
- Start with Critical rules on the 3-5 most-used tables
- Add Warning rules in the second pass
- Automate using: (dbt tests / Great Expectations / Monte Carlo / custom SQL checks / other)
Step 6: Assess schema quality (if applicable)
If the product has configurable or user-defined data structures (custom objects, custom fields, configurable forms), assess schema-level quality:
### Schema quality assessment
#### Naming consistency
| Check | Result | Issue |
|-------|--------|-------|
| Entity names follow a consistent pattern | (Y/N) | (e.g., mix of PascalCase and snake_case across object types) |
| Field names follow a consistent pattern | (Y/N) | (e.g., "created_date" vs. "dateCreated" vs. "create_dt") |
| Boolean fields named as questions | (Y/N) | (e.g., "active" instead of "is_active") |
| No generic names (field_1, misc, other) | (Y/N) | (Count of generic names found) |
#### Relationship health
| Check | Result | Issue |
|-------|--------|-------|
| Orphan rate (entities with zero relationships) | {{n}}% | (Target: < 10%. High orphan rates suggest users don't understand relationships or don't find them valuable.) |
| Circular reference count | {{n}} | (Circular references create infinite loops in queries and UI.) |
| Missing foreign keys (expected relationships not defined) | {{n}} | (Which relationships should exist based on the domain?) |
#### Schema sprawl
| Check | Result | Issue |
|-------|--------|-------|
| Unused entities (created but empty or < 5 records) | {{n}} | (Candidates for cleanup or admin guidance) |
| Fields with > 90% null values | {{n}} | (Either optional by design or a dead field) |
| Duplicate entity patterns (e.g., "Assets" and "Equipment" with same fields) | {{n}} | (Merge candidates) |
Step 7: Cross-system consistency check (if multiple data sources)
When data exists across multiple systems, verify consistency at integration boundaries:
### Cross-system consistency
| Entity/metric | System A | System B | Match? | Discrepancy | Root cause |
|--------------|----------|----------|--------|-------------|-----------|
| {{entity}} | (Count/value in system A) | (Count/value in system B) | Y/N/Partial | (Size of gap) | (Sync lag / filter mismatch / dedup difference / schema drift) |
### Common cross-system issues
- [ ] Same entity counted differently across systems (different filters or dedup logic)
- [ ] Sync lag creates temporary discrepancies that look like errors
- [ ] Schema drift (field renamed in one system but not the other)
- [ ] One system has records the other doesn't (failed sync, manual creation, different data entry points)
- [ ] Currency, timezone, or unit conversions applied in one system but not the other
### Source-of-truth recommendation
| Entity/metric | Recommended source of truth | Reason |
|--------------|---------------------------|--------|
| {{entity}} | (Which system should be authoritative?) | (Why this system over the others?) |
Step 8: Non-technical team scoring mode
If the team does not have a dedicated data engineering function, add this summary layer:
### Team data capability summary
| Capability | Current state | Gap | Recommendation |
|-----------|--------------|-----|---------------|
| Can the team define what "good data" looks like? | (Y/Partial/N) | | |
| Can the team detect data issues without engineering help? | (Y/Partial/N) | | |
| Can the team fix data issues without engineering help? | (Y/Partial/N) | | |
| Is there a named person responsible for data quality? | (Y/N) | | |
| Does the team review data quality regularly (not just when something breaks)? | (Y/N) | | |
### Recommended first actions for non-technical teams
1. Assign a data steward (does not need to write SQL -- needs authority to flag and escalate)
2. Create a "data issues" channel or form (lower the barrier to reporting)
3. Start with 3 checks the steward can do manually in 15 min/week
4. Automate those 3 checks as the first engineering investment
Step 9: Build the remediation roadmap
### Remediation roadmap
| Priority | Issue | Dimension | Remediation | Effort | Impact |
|----------|-------|-----------|-------------|--------|--------|
| P0 | (Actively causing wrong decisions) | (which) | (Specific fix) | S/M/L | (What improves) |
| P1 | (Eroding trust or creating workarounds) | (which) | (Specific fix) | S/M/L | (What improves) |
| P2 | (Technical debt, not yet causing visible harm) | (which) | (Specific fix) | S/M/L | (What improves) |
### Quick wins
- (Fix that's small effort, visible trust improvement)
### Systemic issues
- (Patterns needing architectural changes)
### What's working well
- (Data quality strengths to protect)
Step 10: Review
Ask the user:
- Do the dimension scores match your team's experience? Where do they lose trust?
- Are the freshness SLOs realistic given current infrastructure?
- Who will own the validation rules? (Without an owner, they decay.)
- What's the team's appetite for data quality investment vs. feature work?
- Is there a data quality incident that prompted this assessment?
Step 10b: ALCOA+ lens (for regulated or clinical data systems)
When the data system operates in a regulated environment (clinical trials, healthcare, life sciences), layer the ALCOA+ data integrity framework on top of the standard six-dimension assessment:
### ALCOA+ Data Integrity Overlay
Use this section when the data under assessment is subject to regulatory requirements (21 CFR Part 11, ICH-GCP, CLIA, HIPAA) or when decisions based on this data have safety or legal consequences.
| ALCOA+ Attribute | Status | Gap |
|---|---|---|
| **Attributable** -- every data entry traceable to a specific person | (Y/Partial/N) | |
| **Legible** -- data readable and permanent, no format degradation | (Y/Partial/N) | |
| **Contemporaneous** -- timestamps reflect actual event time, not backfill | (Y/Partial/N) | |
| **Original** -- source of truth defined, copies marked as copies | (Y/Partial/N) | |
| **Accurate** -- validation rules, range checks, referential integrity in place | (Y/Partial/N) | |
| **Complete** -- no selective deletion, failed results retained | (Y/Partial/N) | |
| **Consistent** -- no contradictions across systems or time periods | (Y/Partial/N) | |
| **Enduring** -- retention policies defined and enforced | (Y/Partial/N) | |
| **Available** -- data retrievable on demand for inspection | (Y/Partial/N) | |
### Regulatory-specific checks
- [ ] Audit trail enabled, immutable, and covering all critical data fields
- [ ] Electronic signatures compliant with 21 CFR Part 11 (if applicable)
- [ ] Change reasons documented for every data modification
- [ ] Data retention meets regulatory requirements (not just business requirements)
- [ ] Cross-system reconciliation procedures defined and executed on schedule
Related skill: For a full ALCOA+ audit, use
/clinical-data-integrity-audit.
Output location
Present the assessment as formatted text in the conversation. The remediation roadmap can be exported for backlog planning.
Example Output
Input
- Data sources under review: Snowflake analytics warehouse (populated via Fivetran from Salesforce CRM, Stripe billing, and a custom Rails app event stream via Segment), plus a Looker BI layer on top
- Critical metrics: Monthly Recurring Revenue (MRR), Net Revenue Retention (NRR), Product-Qualified Lead (PQL) conversion rate, and Weekly Active Accounts
- Known data issues: MRR figures in Looker don't match the Stripe revenue report CFO uses; PQL conversion rate jumped 40% in mid-October with no corresponding change in sales activity; the data team suspects event replay after an October 14 infrastructure incident
- Pipeline architecture: Segment → Fivetran → Snowflake raw layer → dbt transformations → Looker explores; Salesforce and Stripe synced separately via Fivetran on 6-hour schedules
- Team context: 2-person data team (1 analytics engineer, 1 data analyst) supporting 8 stakeholders across Finance, Sales, and Product; no formal data ownership outside the data team
- Current monitoring: dbt schema tests (not null, accepted values) on ~30% of models; no freshness monitoring; no cross-source reconciliation; no alerting
Output
Data Quality Assessment — Meridian Analytics Platform, November 2024
Dimension Scores
| Dimension | Definition | Score (1–5) | Key Findings |
|---|---|---|---|
| Completeness | Expected records present; required fields populated | 3 | 8% of Segment events missing account_id; Salesforce opportunities missing close_date in 14% of closed-won records |
| Accuracy | Values reflect reality; calculations match definitions | 2 | MRR double-counts expansion MRR when a subscription is upgraded mid-period; NRR denominator uses beginning-of-month ARR but code pulls beginning-of-quarter |
| Freshness | Data available when needed; lag from event to queryable | 2 | Salesforce and Stripe sync on 6-hour schedules; Finance reviews MRR at 8am and frequently queries stale data from the prior evening |
| Consistency | Same concepts match across sources; metrics reconcile | 2 | MRR differs by ~$31K between Looker and Stripe dashboard; Weekly Active Accounts count differs by ~420 between product DB and Snowflake warehouse |
| Validity | Values within expected ranges; schema conformance | 3 | Negative session_duration_ms values in 0.6% of records; 3 Stripe subscription statuses (past_due_paused, trialing_grandfathered, legacy) not in the accepted values list |
| Uniqueness | No duplicates; deduplication handled correctly | 1 | October 14 event replay created duplicate Segment events; deduplication logic in dbt uses event_id but Segment reissued new IDs on replay — duplicates are invisible to current checks |
| Schema Quality | Naming conventions consistent; relationships defined; low orphan rate | 3 | Mix of snake_case and camelCase field names across 4 dbt models; pql_score calculated in two separate models with slightly different logic |
Overall score: 2.3 / 5 — Fragile. Immediate remediation required on Accuracy, Freshness, Consistency, and Uniqueness before MRR and NRR can be trusted for board reporting.
Metric Computation Audit
| Metric | Definition (Documented) | Actual Computation | Match? | Issue |
|---|---|---|---|---|
| MRR | Sum of normalized monthly value of all active subscriptions at period end | Sums amount from Stripe subscriptions table including past_due status | Partial | past_due subscriptions included in MRR; Finance excludes them manually — $12–18K discrepancy depending on month |
| NRR | (Beginning ARR + expansion + contraction + churn) / Beginning ARR, cohorted monthly | Denominator pulls arr_start_of_quarter, not arr_start_of_month | No | Denominator mismatch inflates NRR by 3–6 points in growth months |
| PQL Conversion Rate | (Accounts reaching PQL threshold that convert to paid within 30 days) / (Total accounts reaching PQL threshold) | Counts duplicate events post-October 14 as separate PQL signals; some accounts triggered threshold twice | No | ~40% spike in mid-October is an artifact of event replay, not real conversion lift |
| Weekly Active Accounts | Distinct accounts with ≥1 qualifying event in rolling 7-day window | Uses account_id from Segment events; 8% null account_id rate means those sessions are excluded silently | Partial | Understates WAA by estimated 6–9%; nulls excluded without alerting or documentation |
Common Computation Issues Found
- Null handling — nulls excluded from WAA without documentation or alerting
- Metric versioning — NRR definition changed from quarterly to monthly cohorts in August; historical data not restated
- Numerator/denominator mismatch — NRR denominator uses wrong time window
- Aggregation errors —
pql_scoreaveraged across two models using different field weights - Time window ambiguity — not currently an issue but
rolling_7dwindow definition not documented - Currency/timezone conversions — Stripe amounts in USD only; not a current risk
Freshness SLOs
| Data Source | Current Latency | Required Latency | Proposed SLO | Monitoring |
|---|---|---|---|---|
| Stripe → Snowflake (billing) | T+6 hours (Fivetran schedule) | T+2 hours for Finance morning review | 99% of loads complete by 6am UTC | Fivetran webhook → Slack #data-alerts; row count check |
| Salesforce → Snowflake (CRM) | T+6 hours | T+4 hours (less time-critical) | 99% of loads complete by 5am UTC | Same webhook pattern |
| Segment → Snowflake (events) | T+15–45 min (near-real-time) | T+60 min acceptable | 95% of hourly event volumes arrive within 60 min of event time | Volume-based anomaly check; alert if hourly count < 70% of same-hour prior 7-day average |
| dbt transformations | Runs at 3am UTC; no completion monitoring | Models powering Finance must complete by 6am UTC | 99% of critical models complete by 6am UTC | dbt Cloud job alerts; model-level updated_at freshness test |
Freshness Monitoring Rules
- Stale threshold — Stripe: Alert if
max(updated_at)instripe.subscriptionsis > 3 hours behind wall clock at 6am UTC - Stale threshold — Segment: Alert if hourly event count drops below 50% of trailing 7-day average for the same hour (catches both pipeline failures and replay scenarios)
- Row count monitoring is mandatory — the October 14 replay scenario would have been caught 12 hours earlier with a "volume spike" alert; add upper-bound checks (> 200% of expected = replay risk)
Validation Rules
| Rule | Check | Severity | Action on Failure |
|---|---|---|---|
| Completeness — account_id | account_id NOT NULL rate > 95% on Segment events | Critical | Alert #data-alerts; surface null rate in Looker data quality tile |
| Volume — lower bound | Hourly Segment event count > 60% of 7-day trailing average | Critical | Page on-call data engineer |
| Volume — upper bound (replay detection) | Hourly Segment event count < 250% of 7-day trailing average | Warning | Alert and quarantine new events pending review |
| Range — session duration | session_duration_ms BETWEEN 0 AND 86400000 | Warning | Quarantine rows; log to data_quality.invalid_sessions |
| Referential — account existence | All account_id values in events table exist in accounts master table | Critical | Block downstream dbt models; alert |
| Uniqueness — event deduplication | After dedup, COUNT(DISTINCT composite_key) = COUNT(*) where composite_key = user_id + event_name + timestamp_trunc(minute) | Critical | Halt PQL model |