Skip to main content
Product Management/data-quality-assessment

Data Quality Assessment

You need to assess data quality across pipelines, metrics, and analytics surfaces.

Use this when you suspect data issues are undermining decisions -- dashboards showing unexpected patterns, metrics that don't reconcile, or teams losing trust in their numbers. Also use proactively when standing up a new analytics surface or onboarding a data-heavy product. Produces a scored data quality report with validation rules, freshness SLOs, and a remediation roadmap. Now includes schema quality scoring, non-technical team assessment mode, and cross-system consistency checks.

Related skills: Pair with /metrics-dashboard to validate the data underneath metric definitions. Use /instrumentation-plan for system-level observability. Feed findings into /analytics-dashboard-design to add data quality indicators to dashboards. Use /data-governance-playbook to establish ongoing governance. Pair with /integration-landscape-map for cross-system data flow context.

Process

Step 1: Gather inputs

Ask the user to provide:

  1. Data sources under review -- which databases, pipelines, event streams, or third-party integrations?
  2. Critical metrics -- which metrics does the business rely on for decisions? (Link to /metrics-dashboard output if available.)
  3. Known data issues -- what's already broken, suspect, or complained about?
  4. Pipeline architecture -- how does data flow from source to analytics surface? (Event capture, ETL/ELT, warehouse, BI tool.)
  5. Team context -- who owns the data? Who consumes it? Is there a data team, or do PMs and engineers own pipelines?
  6. Current monitoring -- any existing data quality checks, alerts, or validation rules?

Step 2: Assess the six dimensions of data quality

Evaluate each dimension across the critical data sources:

## Data Quality Assessment -- {{product/team}}, {{date}}

### Dimension scores

| Dimension | Definition | Score (1-5) | Key findings |
|-----------|-----------|-------------|-------------|
| Completeness | Are expected records present? Are required fields populated? | (n) | (e.g., "12% of events missing user_id") |
| Accuracy | Do values reflect reality? Do calculations match definitions? | (n) | (e.g., "Revenue metric double-counts refunded orders") |
| Freshness | Is data available when needed? What's the lag from event to queryable? | (n) | (e.g., "Dashboard refreshes daily but decisions need hourly data") |
| Consistency | Do the same concepts match across sources? Do metrics reconcile? | (n) | (e.g., "User count differs between product DB and analytics warehouse") |
| Validity | Do values fall within expected ranges? Do they conform to schema? | (n) | (e.g., "Negative session durations in 0.3% of records") |
| Uniqueness | Are there duplicates? Is deduplication handled correctly? | (n) | (e.g., "Event replay after outage created 2x records for March 14") |
| Schema quality | Are naming conventions consistent? Are relationships defined? Are orphaned entities minimal? | (n) | (e.g., "65% of custom objects have zero relationships -- orphan risk") |

### Score definitions
- **5** -- Monitored and maintained. Automated validation, clear SLOs, rare issues.
- **4** -- Solid. Occasional issues caught quickly. Some automation.
- **3** -- Functional. Known gaps, manual checks, issues discovered reactively.
- **2** -- Fragile. Frequent issues, low trust, workarounds common.
- **1** -- Unreliable. Data regularly contradicts reality. Decisions made despite data, not because of it.

Step 3: Audit metric computation

For each critical metric, verify the computation chain from source to surface:

### Metric computation audit

| Metric | Definition (as documented) | Actual computation | Match? | Issue |
|--------|--------------------------|-------------------|--------|-------|
| {{metric_name}} | (What the metric is supposed to measure) | (What the SQL/code actually computes) | Yes / No / Partial | (e.g., "Denominator excludes churned users but definition says 'all users'") |

### Common computation issues found
- [ ] Numerator/denominator mismatch with documented definition
- [ ] Time window ambiguity (calendar day vs. rolling 24h vs. business day)
- [ ] Null handling (are nulls excluded, counted as zero, or causing errors?)
- [ ] Currency/timezone conversions applied inconsistently
- [ ] Metric versioning (definition changed but historical data not restated)
- [ ] Aggregation errors (averaging averages, summing rates)

Step 4: Define freshness SLOs

For each critical data source, define when data should be available:

### Freshness SLOs

| Data source | Current latency | Required latency | SLO | Monitoring |
|-------------|----------------|-----------------|-----|-----------|
| {{source}} | (e.g., T+4 hours) | (e.g., T+1 hour) | (e.g., "99% of daily loads complete by 7am UTC") | (How is this monitored? Alert channel?) |

### Freshness monitoring rules
- Define "stale" threshold for each source (when should an alert fire?)
- Track load completion times, not just schedule
- Monitor row counts vs. expected volume (a load that completes with 0 rows is worse than a late load)

Step 5: Design validation rules

Propose automated checks for the highest-risk data:

### Validation rules

| Rule | Check | Severity | Action on failure |
|------|-------|----------|------------------|
| Completeness | {{field}} NOT NULL rate > {{threshold}}% | Critical / Warning | Block pipeline / Alert |
| Volume | Row count within {{n}}% of trailing 7-day average | Warning | Alert data team |
| Range | {{field}} BETWEEN {{min}} AND {{max}} | Critical | Quarantine rows |
| Referential | All {{foreign_key}} values exist in {{parent_table}} | Critical | Block pipeline |
| Recency | MAX({{timestamp_field}}) within {{n}} hours of now | Critical | Alert on-call |
| Uniqueness | COUNT(DISTINCT {{key}}) = COUNT(*) | Critical | Deduplicate and alert |
| Cross-source | {{metric}} in source A within {{n}}% of source B | Warning | Investigate discrepancy |

### Implementation priority
- Start with Critical rules on the 3-5 most-used tables
- Add Warning rules in the second pass
- Automate using: (dbt tests / Great Expectations / Monte Carlo / custom SQL checks / other)

Step 6: Assess schema quality (if applicable)

If the product has configurable or user-defined data structures (custom objects, custom fields, configurable forms), assess schema-level quality:

### Schema quality assessment

#### Naming consistency
| Check | Result | Issue |
|-------|--------|-------|
| Entity names follow a consistent pattern | (Y/N) | (e.g., mix of PascalCase and snake_case across object types) |
| Field names follow a consistent pattern | (Y/N) | (e.g., "created_date" vs. "dateCreated" vs. "create_dt") |
| Boolean fields named as questions | (Y/N) | (e.g., "active" instead of "is_active") |
| No generic names (field_1, misc, other) | (Y/N) | (Count of generic names found) |

#### Relationship health
| Check | Result | Issue |
|-------|--------|-------|
| Orphan rate (entities with zero relationships) | {{n}}% | (Target: < 10%. High orphan rates suggest users don't understand relationships or don't find them valuable.) |
| Circular reference count | {{n}} | (Circular references create infinite loops in queries and UI.) |
| Missing foreign keys (expected relationships not defined) | {{n}} | (Which relationships should exist based on the domain?) |

#### Schema sprawl
| Check | Result | Issue |
|-------|--------|-------|
| Unused entities (created but empty or < 5 records) | {{n}} | (Candidates for cleanup or admin guidance) |
| Fields with > 90% null values | {{n}} | (Either optional by design or a dead field) |
| Duplicate entity patterns (e.g., "Assets" and "Equipment" with same fields) | {{n}} | (Merge candidates) |

Step 7: Cross-system consistency check (if multiple data sources)

When data exists across multiple systems, verify consistency at integration boundaries:

### Cross-system consistency

| Entity/metric | System A | System B | Match? | Discrepancy | Root cause |
|--------------|----------|----------|--------|-------------|-----------|
| {{entity}} | (Count/value in system A) | (Count/value in system B) | Y/N/Partial | (Size of gap) | (Sync lag / filter mismatch / dedup difference / schema drift) |

### Common cross-system issues
- [ ] Same entity counted differently across systems (different filters or dedup logic)
- [ ] Sync lag creates temporary discrepancies that look like errors
- [ ] Schema drift (field renamed in one system but not the other)
- [ ] One system has records the other doesn't (failed sync, manual creation, different data entry points)
- [ ] Currency, timezone, or unit conversions applied in one system but not the other

### Source-of-truth recommendation
| Entity/metric | Recommended source of truth | Reason |
|--------------|---------------------------|--------|
| {{entity}} | (Which system should be authoritative?) | (Why this system over the others?) |

Step 8: Non-technical team scoring mode

If the team does not have a dedicated data engineering function, add this summary layer:

### Team data capability summary

| Capability | Current state | Gap | Recommendation |
|-----------|--------------|-----|---------------|
| Can the team define what "good data" looks like? | (Y/Partial/N) | | |
| Can the team detect data issues without engineering help? | (Y/Partial/N) | | |
| Can the team fix data issues without engineering help? | (Y/Partial/N) | | |
| Is there a named person responsible for data quality? | (Y/N) | | |
| Does the team review data quality regularly (not just when something breaks)? | (Y/N) | | |

### Recommended first actions for non-technical teams
1. Assign a data steward (does not need to write SQL -- needs authority to flag and escalate)
2. Create a "data issues" channel or form (lower the barrier to reporting)
3. Start with 3 checks the steward can do manually in 15 min/week
4. Automate those 3 checks as the first engineering investment

Step 9: Build the remediation roadmap

### Remediation roadmap

| Priority | Issue | Dimension | Remediation | Effort | Impact |
|----------|-------|-----------|-------------|--------|--------|
| P0 | (Actively causing wrong decisions) | (which) | (Specific fix) | S/M/L | (What improves) |
| P1 | (Eroding trust or creating workarounds) | (which) | (Specific fix) | S/M/L | (What improves) |
| P2 | (Technical debt, not yet causing visible harm) | (which) | (Specific fix) | S/M/L | (What improves) |

### Quick wins
- (Fix that's small effort, visible trust improvement)

### Systemic issues
- (Patterns needing architectural changes)

### What's working well
- (Data quality strengths to protect)

Step 10: Review

Ask the user:

  • Do the dimension scores match your team's experience? Where do they lose trust?
  • Are the freshness SLOs realistic given current infrastructure?
  • Who will own the validation rules? (Without an owner, they decay.)
  • What's the team's appetite for data quality investment vs. feature work?
  • Is there a data quality incident that prompted this assessment?

Step 10b: ALCOA+ lens (for regulated or clinical data systems)

When the data system operates in a regulated environment (clinical trials, healthcare, life sciences), layer the ALCOA+ data integrity framework on top of the standard six-dimension assessment:

### ALCOA+ Data Integrity Overlay

Use this section when the data under assessment is subject to regulatory requirements (21 CFR Part 11, ICH-GCP, CLIA, HIPAA) or when decisions based on this data have safety or legal consequences.

| ALCOA+ Attribute | Status | Gap |
|---|---|---|
| **Attributable** -- every data entry traceable to a specific person | (Y/Partial/N) | |
| **Legible** -- data readable and permanent, no format degradation | (Y/Partial/N) | |
| **Contemporaneous** -- timestamps reflect actual event time, not backfill | (Y/Partial/N) | |
| **Original** -- source of truth defined, copies marked as copies | (Y/Partial/N) | |
| **Accurate** -- validation rules, range checks, referential integrity in place | (Y/Partial/N) | |
| **Complete** -- no selective deletion, failed results retained | (Y/Partial/N) | |
| **Consistent** -- no contradictions across systems or time periods | (Y/Partial/N) | |
| **Enduring** -- retention policies defined and enforced | (Y/Partial/N) | |
| **Available** -- data retrievable on demand for inspection | (Y/Partial/N) | |

### Regulatory-specific checks
- [ ] Audit trail enabled, immutable, and covering all critical data fields
- [ ] Electronic signatures compliant with 21 CFR Part 11 (if applicable)
- [ ] Change reasons documented for every data modification
- [ ] Data retention meets regulatory requirements (not just business requirements)
- [ ] Cross-system reconciliation procedures defined and executed on schedule

Related skill: For a full ALCOA+ audit, use /clinical-data-integrity-audit.

Output location

Present the assessment as formatted text in the conversation. The remediation roadmap can be exported for backlog planning.

Example Output

Input

  • Data sources under review: Snowflake analytics warehouse (populated via Fivetran from Salesforce CRM, Stripe billing, and a custom Rails app event stream via Segment), plus a Looker BI layer on top
  • Critical metrics: Monthly Recurring Revenue (MRR), Net Revenue Retention (NRR), Product-Qualified Lead (PQL) conversion rate, and Weekly Active Accounts
  • Known data issues: MRR figures in Looker don't match the Stripe revenue report CFO uses; PQL conversion rate jumped 40% in mid-October with no corresponding change in sales activity; the data team suspects event replay after an October 14 infrastructure incident
  • Pipeline architecture: Segment → Fivetran → Snowflake raw layer → dbt transformations → Looker explores; Salesforce and Stripe synced separately via Fivetran on 6-hour schedules
  • Team context: 2-person data team (1 analytics engineer, 1 data analyst) supporting 8 stakeholders across Finance, Sales, and Product; no formal data ownership outside the data team
  • Current monitoring: dbt schema tests (not null, accepted values) on ~30% of models; no freshness monitoring; no cross-source reconciliation; no alerting

Output

Data Quality Assessment — Meridian Analytics Platform, November 2024


Dimension Scores

DimensionDefinitionScore (1–5)Key Findings
CompletenessExpected records present; required fields populated38% of Segment events missing account_id; Salesforce opportunities missing close_date in 14% of closed-won records
AccuracyValues reflect reality; calculations match definitions2MRR double-counts expansion MRR when a subscription is upgraded mid-period; NRR denominator uses beginning-of-month ARR but code pulls beginning-of-quarter
FreshnessData available when needed; lag from event to queryable2Salesforce and Stripe sync on 6-hour schedules; Finance reviews MRR at 8am and frequently queries stale data from the prior evening
ConsistencySame concepts match across sources; metrics reconcile2MRR differs by ~$31K between Looker and Stripe dashboard; Weekly Active Accounts count differs by ~420 between product DB and Snowflake warehouse
ValidityValues within expected ranges; schema conformance3Negative session_duration_ms values in 0.6% of records; 3 Stripe subscription statuses (past_due_paused, trialing_grandfathered, legacy) not in the accepted values list
UniquenessNo duplicates; deduplication handled correctly1October 14 event replay created duplicate Segment events; deduplication logic in dbt uses event_id but Segment reissued new IDs on replay — duplicates are invisible to current checks
Schema QualityNaming conventions consistent; relationships defined; low orphan rate3Mix of snake_case and camelCase field names across 4 dbt models; pql_score calculated in two separate models with slightly different logic

Overall score: 2.3 / 5 — Fragile. Immediate remediation required on Accuracy, Freshness, Consistency, and Uniqueness before MRR and NRR can be trusted for board reporting.


Metric Computation Audit

MetricDefinition (Documented)Actual ComputationMatch?Issue
MRRSum of normalized monthly value of all active subscriptions at period endSums amount from Stripe subscriptions table including past_due statusPartialpast_due subscriptions included in MRR; Finance excludes them manually — $12–18K discrepancy depending on month
NRR(Beginning ARR + expansion + contraction + churn) / Beginning ARR, cohorted monthlyDenominator pulls arr_start_of_quarter, not arr_start_of_monthNoDenominator mismatch inflates NRR by 3–6 points in growth months
PQL Conversion Rate(Accounts reaching PQL threshold that convert to paid within 30 days) / (Total accounts reaching PQL threshold)Counts duplicate events post-October 14 as separate PQL signals; some accounts triggered threshold twiceNo~40% spike in mid-October is an artifact of event replay, not real conversion lift
Weekly Active AccountsDistinct accounts with ≥1 qualifying event in rolling 7-day windowUses account_id from Segment events; 8% null account_id rate means those sessions are excluded silentlyPartialUnderstates WAA by estimated 6–9%; nulls excluded without alerting or documentation

Common Computation Issues Found

  • Null handling — nulls excluded from WAA without documentation or alerting
  • Metric versioning — NRR definition changed from quarterly to monthly cohorts in August; historical data not restated
  • Numerator/denominator mismatch — NRR denominator uses wrong time window
  • Aggregation errors — pql_score averaged across two models using different field weights
  • Time window ambiguity — not currently an issue but rolling_7d window definition not documented
  • Currency/timezone conversions — Stripe amounts in USD only; not a current risk

Freshness SLOs

Data SourceCurrent LatencyRequired LatencyProposed SLOMonitoring
Stripe → Snowflake (billing)T+6 hours (Fivetran schedule)T+2 hours for Finance morning review99% of loads complete by 6am UTCFivetran webhook → Slack #data-alerts; row count check
Salesforce → Snowflake (CRM)T+6 hoursT+4 hours (less time-critical)99% of loads complete by 5am UTCSame webhook pattern
Segment → Snowflake (events)T+15–45 min (near-real-time)T+60 min acceptable95% of hourly event volumes arrive within 60 min of event timeVolume-based anomaly check; alert if hourly count < 70% of same-hour prior 7-day average
dbt transformationsRuns at 3am UTC; no completion monitoringModels powering Finance must complete by 6am UTC99% of critical models complete by 6am UTCdbt Cloud job alerts; model-level updated_at freshness test

Freshness Monitoring Rules

  • Stale threshold — Stripe: Alert if max(updated_at) in stripe.subscriptions is > 3 hours behind wall clock at 6am UTC
  • Stale threshold — Segment: Alert if hourly event count drops below 50% of trailing 7-day average for the same hour (catches both pipeline failures and replay scenarios)
  • Row count monitoring is mandatory — the October 14 replay scenario would have been caught 12 hours earlier with a "volume spike" alert; add upper-bound checks (> 200% of expected = replay risk)

Validation Rules

RuleCheckSeverityAction on Failure
Completeness — account_idaccount_id NOT NULL rate > 95% on Segment eventsCriticalAlert #data-alerts; surface null rate in Looker data quality tile
Volume — lower boundHourly Segment event count > 60% of 7-day trailing averageCriticalPage on-call data engineer
Volume — upper bound (replay detection)Hourly Segment event count < 250% of 7-day trailing averageWarningAlert and quarantine new events pending review
Range — session durationsession_duration_ms BETWEEN 0 AND 86400000WarningQuarantine rows; log to data_quality.invalid_sessions
Referential — account existenceAll account_id values in events table exist in accounts master tableCriticalBlock downstream dbt models; alert
Uniqueness — event deduplicationAfter dedup, COUNT(DISTINCT composite_key) = COUNT(*) where composite_key = user_id + event_name + timestamp_trunc(minute)CriticalHalt PQL model