Data Quality Assessment - AI Agent Skill

Use this when you suspect data issues are undermining decisions -- dashboards showing unexpected patterns, metrics that don't reconcile, or teams losing trust in their numbers. Also use proactively when standing up a new analytics surface or onboarding a data-heavy product. Produces a scored data quality report with validation rules, freshness SLOs, and a remediation roadmap. Now includes schema quality scoring, non-technical team assessment mode, and cross-system consistency checks.

Related skills: Pair with /metrics-dashboard to validate the data underneath metric definitions. Use /instrumentation-plan for system-level observability. Feed findings into /analytics-dashboard-design to add data quality indicators to dashboards. Use /data-governance-playbook to establish ongoing governance. Pair with /integration-landscape-map for cross-system data flow context.

Process

Step 1: Gather inputs

Ask the user to provide:

Data sources under review -- which databases, pipelines, event streams, or third-party integrations?
Critical metrics -- which metrics does the business rely on for decisions? (Link to /metrics-dashboard output if available.)
Known data issues -- what's already broken, suspect, or complained about?
Pipeline architecture -- how does data flow from source to analytics surface? (Event capture, ETL/ELT, warehouse, BI tool.)
Team context -- who owns the data? Who consumes it? Is there a data team, or do PMs and engineers own pipelines?
Current monitoring -- any existing data quality checks, alerts, or validation rules?

Step 2: Assess the six dimensions of data quality

Evaluate each dimension across the critical data sources:

## Data Quality Assessment -- {{product/team}}, {{date}}

### Dimension scores

| Dimension | Definition | Score (1-5) | Key findings |
|-----------|-----------|-------------|-------------|
| Completeness | Are expected records present? Are required fields populated? | (n) | (e.g., "12% of events missing user_id") |
| Accuracy | Do values reflect reality? Do calculations match definitions? | (n) | (e.g., "Revenue metric double-counts refunded orders") |
| Freshness | Is data available when needed? What's the lag from event to queryable? | (n) | (e.g., "Dashboard refreshes daily but decisions need hourly data") |
| Consistency | Do the same concepts match across sources? Do metrics reconcile? | (n) | (e.g., "User count differs between product DB and analytics warehouse") |
| Validity | Do values fall within expected ranges? Do they conform to schema? | (n) | (e.g., "Negative session durations in 0.3% of records") |
| Uniqueness | Are there duplicates? Is deduplication handled correctly? | (n) | (e.g., "Event replay after outage created 2x records for March 14") |
| Schema quality | Are naming conventions consistent? Are relationships defined? Are orphaned entities minimal? | (n) | (e.g., "65% of custom objects have zero relationships -- orphan risk") |

### Score definitions
- **5** -- Monitored and maintained. Automated validation, clear SLOs, rare issues.
- **4** -- Solid. Occasional issues caught quickly. Some automation.
- **3** -- Functional. Known gaps, manual checks, issues discovered reactively.
- **2** -- Fragile. Frequent issues, low trust, workarounds common.
- **1** -- Unreliable. Data regularly contradicts reality. Decisions made despite data, not because of it.

Step 3: Audit metric computation

For each critical metric, verify the computation chain from source to surface:

### Metric computation audit

| Metric | Definition (as documented) | Actual computation | Match? | Issue |
|--------|--------------------------|-------------------|--------|-------|
| {{metric_name}} | (What the metric is supposed to measure) | (What the SQL/code actually computes) | Yes / No / Partial | (e.g., "Denominator excludes churned users but definition says 'all users'") |

### Common computation issues found
- [ ] Numerator/denominator mismatch with documented definition
- [ ] Time window ambiguity (calendar day vs. rolling 24h vs. business day)
- [ ] Null handling (are nulls excluded, counted as zero, or causing errors?)
- [ ] Currency/timezone conversions applied inconsistently
- [ ] Metric versioning (definition changed but historical data not restated)
- [ ] Aggregation errors (averaging averages, summing rates)

Step 4: Define freshness SLOs

For each critical data source, define when data should be available:

### Freshness SLOs

| Data source | Current latency | Required latency | SLO | Monitoring |
|-------------|----------------|-----------------|-----|-----------|
| {{source}} | (e.g., T+4 hours) | (e.g., T+1 hour) | (e.g., "99% of daily loads complete by 7am UTC") | (How is this monitored? Alert channel?) |

### Freshness monitoring rules
- Define "stale" threshold for each source (when should an alert fire?)
- Track load completion times, not just schedule
- Monitor row counts vs. expected volume (a load that completes with 0 rows is worse than a late load)

Step 5: Design validation rules

Propose automated checks for the highest-risk data:

### Validation rules

| Rule | Check | Severity | Action on failure |
|------|-------|----------|------------------|
| Completeness | {{field}} NOT NULL rate > {{threshold}}% | Critical / Warning | Block pipeline / Alert |
| Volume | Row count within {{n}}% of trailing 7-day average | Warning | Alert data team |
| Range | {{field}} BETWEEN {{min}} AND {{max}} | Critical | Quarantine rows |
| Referential | All {{foreign_key}} values exist in {{parent_table}} | Critical | Block pipeline |
| Recency | MAX({{timestamp_field}}) within {{n}} hours of now | Critical | Alert on-call |
| Uniqueness | COUNT(DISTINCT {{key}}) = COUNT(*) | Critical | Deduplicate and alert |
| Cross-source | {{metric}} in source A within {{n}}% of source B | Warning | Investigate discrepancy |

### Implementation priority
- Start with Critical rules on the 3-5 most-used tables
- Add Warning rules in the second pass
- Automate using: (dbt tests / Great Expectations / Monte Carlo / custom SQL checks / other)

Step 6: Assess schema quality (if applicable)

If the product has configurable or user-defined data structures (custom objects, custom fields, configurable forms), assess schema-level quality:

### Schema quality assessment

#### Naming consistency
| Check | Result | Issue |
|-------|--------|-------|
| Entity names follow a consistent pattern | (Y/N) | (e.g., mix of PascalCase and snake_case across object types) |
| Field names follow a consistent pattern | (Y/N) | (e.g., "created_date" vs. "dateCreated" vs. "create_dt") |
| Boolean fields named as questions | (Y/N) | (e.g., "active" instead of "is_active") |
| No generic names (field_1, misc, other) | (Y/N) | (Count of generic names found) |

#### Relationship health
| Check | Result | Issue |
|-------|--------|-------|
| Orphan rate (entities with zero relationships) | {{n}}% | (Target: < 10%. High orphan rates suggest users don't understand relationships or don't find them valuable.) |
| Circular reference count | {{n}} | (Circular references create infinite loops in queries and UI.) |
| Missing foreign keys (expected relationships not defined) | {{n}} | (Which relationships should exist based on the domain?) |

#### Schema sprawl
| Check | Result | Issue |
|-------|--------|-------|
| Unused entities (created but empty or < 5 records) | {{n}} | (Candidates for cleanup or admin guidance) |
| Fields with > 90% null values | {{n}} | (Either optional by design or a dead field) |
| Duplicate entity patterns (e.g., "Assets" and "Equipment" with same fields) | {{n}} | (Merge candidates) |

Step 7: Cross-system consistency check (if multiple data sources)

When data exists across multiple systems, verify consistency at integration boundaries:

### Cross-system consistency

| Entity/metric | System A | System B | Match? | Discrepancy | Root cause |
|--------------|----------|----------|--------|-------------|-----------|
| {{entity}} | (Count/value in system A) | (Count/value in system B) | Y/N/Partial | (Size of gap) | (Sync lag / filter mismatch / dedup difference / schema drift) |

### Common cross-system issues
- [ ] Same entity counted differently across systems (different filters or dedup logic)
- [ ] Sync lag creates temporary discrepancies that look like errors
- [ ] Schema drift (field renamed in one system but not the other)
- [ ] One system has records the other doesn't (failed sync, manual creation, different data entry points)
- [ ] Currency, timezone, or unit conversions applied in one system but not the other

### Source-of-truth recommendation
| Entity/metric | Recommended source of truth | Reason |
|--------------|---------------------------|--------|
| {{entity}} | (Which system should be authoritative?) | (Why this system over the others?) |

Step 8: Non-technical team scoring mode

If the team does not have a dedicated data engineering function, add this summary layer:

### Team data capability summary

| Capability | Current state | Gap | Recommendation |
|-----------|--------------|-----|---------------|
| Can the team define what "good data" looks like? | (Y/Partial/N) | | |
| Can the team detect data issues without engineering help? | (Y/Partial/N) | | |
| Can the team fix data issues without engineering help? | (Y/Partial/N) | | |
| Is there a named person responsible for data quality? | (Y/N) | | |
| Does the team review data quality regularly (not just when something breaks)? | (Y/N) | | |

### Recommended first actions for non-technical teams
1. Assign a data steward (does not need to write SQL -- needs authority to flag and escalate)
2. Create a "data issues" channel or form (lower the barrier to reporting)
3. Start with 3 checks the steward can do manually in 15 min/week
4. Automate those 3 checks as the first engineering investment

Step 9: Build the remediation roadmap

### Remediation roadmap

| Priority | Issue | Dimension | Remediation | Effort | Impact |
|----------|-------|-----------|-------------|--------|--------|
| P0 | (Actively causing wrong decisions) | (which) | (Specific fix) | S/M/L | (What improves) |
| P1 | (Eroding trust or creating workarounds) | (which) | (Specific fix) | S/M/L | (What improves) |
| P2 | (Technical debt, not yet causing visible harm) | (which) | (Specific fix) | S/M/L | (What improves) |

### Quick wins
- (Fix that's small effort, visible trust improvement)

### Systemic issues
- (Patterns needing architectural changes)

### What's working well
- (Data quality strengths to protect)

Step 10: Review

Ask the user:

Do the dimension scores match your team's experience? Where do they lose trust?
Are the freshness SLOs realistic given current infrastructure?
Who will own the validation rules? (Without an owner, they decay.)
What's the team's appetite for data quality investment vs. feature work?
Is there a data quality incident that prompted this assessment?

Step 10b: ALCOA+ lens (for regulated or clinical data systems)

When the data system operates in a regulated environment (clinical trials, healthcare, life sciences), layer the ALCOA+ data integrity framework on top of the standard six-dimension assessment:

### ALCOA+ Data Integrity Overlay

Use this section when the data under assessment is subject to regulatory requirements (21 CFR Part 11, ICH-GCP, CLIA, HIPAA) or when decisions based on this data have safety or legal consequences.

| ALCOA+ Attribute | Status | Gap |
|---|---|---|
| **Attributable** -- every data entry traceable to a specific person | (Y/Partial/N) | |
| **Legible** -- data readable and permanent, no format degradation | (Y/Partial/N) | |
| **Contemporaneous** -- timestamps reflect actual event time, not backfill | (Y/Partial/N) | |
| **Original** -- source of truth defined, copies marked as copies | (Y/Partial/N) | |
| **Accurate** -- validation rules, range checks, referential integrity in place | (Y/Partial/N) | |
| **Complete** -- no selective deletion, failed results retained | (Y/Partial/N) | |
| **Consistent** -- no contradictions across systems or time periods | (Y/Partial/N) | |
| **Enduring** -- retention policies defined and enforced | (Y/Partial/N) | |
| **Available** -- data retrievable on demand for inspection | (Y/Partial/N) | |

### Regulatory-specific checks
- [ ] Audit trail enabled, immutable, and covering all critical data fields
- [ ] Electronic signatures compliant with 21 CFR Part 11 (if applicable)
- [ ] Change reasons documented for every data modification
- [ ] Data retention meets regulatory requirements (not just business requirements)
- [ ] Cross-system reconciliation procedures defined and executed on schedule

Related skill: For a full ALCOA+ audit, use /clinical-data-integrity-audit.

Output location

Present the assessment as formatted text in the conversation. The remediation roadmap can be exported for backlog planning.

Example Output

Input

Data sources under review: Snowflake analytics warehouse (populated via Fivetran from Salesforce CRM, Stripe billing, and a custom Rails app event stream via Segment), plus a Looker BI layer on top
Critical metrics: Monthly Recurring Revenue (MRR), Net Revenue Retention (NRR), Product-Qualified Lead (PQL) conversion rate, and Weekly Active Accounts
Known data issues: MRR figures in Looker don't match the Stripe revenue report CFO uses; PQL conversion rate jumped 40% in mid-October with no corresponding change in sales activity; the data team suspects event replay after an October 14 infrastructure incident
Pipeline architecture: Segment → Fivetran → Snowflake raw layer → dbt transformations → Looker explores; Salesforce and Stripe synced separately via Fivetran on 6-hour schedules
Team context: 2-person data team (1 analytics engineer, 1 data analyst) supporting 8 stakeholders across Finance, Sales, and Product; no formal data ownership outside the data team
Current monitoring: dbt schema tests (not null, accepted values) on ~30% of models; no freshness monitoring; no cross-source reconciliation; no alerting

Output

Data Quality Assessment — Meridian Analytics Platform, November 2024

Dimension Scores

Dimension	Definition	Score (1–5)	Key Findings
Completeness	Expected records present; required fields populated	3	8% of Segment events missing `account_id`; Salesforce opportunities missing `close_date` in 14% of closed-won records
Accuracy	Values reflect reality; calculations match definitions	2	MRR double-counts expansion MRR when a subscription is upgraded mid-period; NRR denominator uses beginning-of-month ARR but code pulls beginning-of-quarter
Freshness	Data available when needed; lag from event to queryable	2	Salesforce and Stripe sync on 6-hour schedules; Finance reviews MRR at 8am and frequently queries stale data from the prior evening
Consistency	Same concepts match across sources; metrics reconcile	2	MRR differs by ~$31K between Looker and Stripe dashboard; Weekly Active Accounts count differs by ~420 between product DB and Snowflake warehouse
Validity	Values within expected ranges; schema conformance	3	Negative `session_duration_ms` values in 0.6% of records; 3 Stripe subscription statuses (`past_due_paused`, `trialing_grandfathered`, `legacy`) not in the accepted values list
Uniqueness	No duplicates; deduplication handled correctly	1	October 14 event replay created duplicate Segment events; deduplication logic in dbt uses `event_id` but Segment reissued new IDs on replay — duplicates are invisible to current checks
Schema Quality	Naming conventions consistent; relationships defined; low orphan rate	3	Mix of snake_case and camelCase field names across 4 dbt models; `pql_score` calculated in two separate models with slightly different logic

Overall score: 2.3 / 5 — Fragile. Immediate remediation required on Accuracy, Freshness, Consistency, and Uniqueness before MRR and NRR can be trusted for board reporting.

Metric Computation Audit

Metric	Definition (Documented)	Actual Computation	Match?	Issue
MRR	Sum of normalized monthly value of all active subscriptions at period end	Sums `amount` from Stripe `subscriptions` table including `past_due` status	Partial	`past_due` subscriptions included in MRR; Finance excludes them manually — $12–18K discrepancy depending on month
NRR	(Beginning ARR + expansion + contraction + churn) / Beginning ARR, cohorted monthly	Denominator pulls `arr_start_of_quarter`, not `arr_start_of_month`	No	Denominator mismatch inflates NRR by 3–6 points in growth months
PQL Conversion Rate	(Accounts reaching PQL threshold that convert to paid within 30 days) / (Total accounts reaching PQL threshold)	Counts duplicate events post-October 14 as separate PQL signals; some accounts triggered threshold twice	No	~40% spike in mid-October is an artifact of event replay, not real conversion lift
Weekly Active Accounts	Distinct accounts with ≥1 qualifying event in rolling 7-day window	Uses `account_id` from Segment events; 8% null `account_id` rate means those sessions are excluded silently	Partial	Understates WAA by estimated 6–9%; nulls excluded without alerting or documentation

Common Computation Issues Found

Null handling — nulls excluded from WAA without documentation or alerting
Metric versioning — NRR definition changed from quarterly to monthly cohorts in August; historical data not restated
Numerator/denominator mismatch — NRR denominator uses wrong time window
Aggregation errors — pql_score averaged across two models using different field weights
Time window ambiguity — not currently an issue but rolling_7d window definition not documented
Currency/timezone conversions — Stripe amounts in USD only; not a current risk

Freshness SLOs

Data Source	Current Latency	Required Latency	Proposed SLO	Monitoring
Stripe → Snowflake (billing)	T+6 hours (Fivetran schedule)	T+2 hours for Finance morning review	99% of loads complete by 6am UTC	Fivetran webhook → Slack `#data-alerts`; row count check
Salesforce → Snowflake (CRM)	T+6 hours	T+4 hours (less time-critical)	99% of loads complete by 5am UTC	Same webhook pattern
Segment → Snowflake (events)	T+15–45 min (near-real-time)	T+60 min acceptable	95% of hourly event volumes arrive within 60 min of event time	Volume-based anomaly check; alert if hourly count < 70% of same-hour prior 7-day average
dbt transformations	Runs at 3am UTC; no completion monitoring	Models powering Finance must complete by 6am UTC	99% of critical models complete by 6am UTC	dbt Cloud job alerts; model-level `updated_at` freshness test

Freshness Monitoring Rules

Stale threshold — Stripe: Alert if max(updated_at) in stripe.subscriptions is > 3 hours behind wall clock at 6am UTC
Stale threshold — Segment: Alert if hourly event count drops below 50% of trailing 7-day average for the same hour (catches both pipeline failures and replay scenarios)
Row count monitoring is mandatory — the October 14 replay scenario would have been caught 12 hours earlier with a "volume spike" alert; add upper-bound checks (> 200% of expected = replay risk)

Validation Rules

Rule	Check	Severity	Action on Failure
Completeness — account_id	`account_id` NOT NULL rate > 95% on Segment events	Critical	Alert `#data-alerts`; surface null rate in Looker data quality tile
Volume — lower bound	Hourly Segment event count > 60% of 7-day trailing average	Critical	Page on-call data engineer
Volume — upper bound (replay detection)	Hourly Segment event count < 250% of 7-day trailing average	Warning	Alert and quarantine new events pending review
Range — session duration	`session_duration_ms` BETWEEN 0 AND 86400000	Warning	Quarantine rows; log to `data_quality.invalid_sessions`
Referential — account existence	All `account_id` values in events table exist in `accounts` master table	Critical	Block downstream dbt models; alert
Uniqueness — event deduplication	After dedup, COUNT(DISTINCT composite_key) = COUNT(*) where composite_key = `user_id + event_name + timestamp_trunc(minute)`	Critical	Halt PQL model

Run this now

Try /data-quality-assessment on your own input

0/4000

Related Product Management skills

2x2 Prioritize A/B Test Planner Activation Optimization AI Prototype Guide Analytics Dashboard Design Audience Segmentation Backlog Craft Backlog Refine

Back to Skills Catalog