Data Quality Remediation
Data quality remediation transforms data from a degraded state to one that meets defined quality rules. This task covers the complete remediation cycle from issue triage through correction, verification, and recurrence prevention. You execute remediation when profiling reveals quality scores below acceptable thresholds or when downstream systems report data-related failures.
Remediation operates on issues already identified through profiling and monitoring. The Data Quality Management concept page establishes the rules and metrics that define what constitutes a quality issue. This task assumes those rules exist and focuses on the mechanics of correction.
Prerequisites
Before beginning remediation, confirm the following requirements are met.
Quality rules and profiling results: You need documented quality rules that define valid states for each data element. These rules specify acceptable value ranges, format patterns, referential integrity constraints, and uniqueness requirements. Profiling must have executed against current data, producing a quantified list of violations. Without profiling results, you cannot prioritise remediation effort or measure improvement.
Verify profiling is current by checking the profile timestamp:
```sql
SELECT profile_name, last_run_timestamp, total_records_profiled, total_violations_found
FROM data_quality.profile_runs
WHERE dataset_name = 'beneficiary_registry'
ORDER BY last_run_timestamp DESC
LIMIT 1;
```

The profile should be no older than 24 hours for actively changing datasets, or no older than the last data load for batch-updated datasets.
Issue prioritisation: Quality issues require prioritisation before remediation begins. A dataset with 50,000 violations cannot be remediated in arbitrary order. Prioritisation ranks issues by business impact, remediation complexity, and dependency relationships. High-impact issues affecting downstream reporting or operational decisions take precedence over cosmetic problems.
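One way to make "business impact over cosmetic problems" concrete is a numeric score. The sketch below is illustrative only: the weighting formula and the `downstream_consumers` and `complexity` fields are assumptions for this example, not columns of the remediation queue.

```python
# Illustrative priority scoring. Impact (affected records, amplified by the
# number of downstream consumers) pushes the score up; remediation
# complexity pulls it down.
def priority_score(affected_records: int, downstream_consumers: int,
                   complexity: int) -> float:
    """Return a sortable priority score for a quality issue."""
    impact = affected_records * (1 + downstream_consumers)
    return impact / complexity

issues = [
    {"id": "DQ-101", "affected_records": 3847, "downstream_consumers": 2, "complexity": 1},
    {"id": "DQ-102", "affected_records": 412, "downstream_consumers": 0, "complexity": 4},
]
ranked = sorted(
    issues,
    key=lambda i: priority_score(i["affected_records"],
                                 i["downstream_consumers"],
                                 i["complexity"]),
    reverse=True,
)
print([i["id"] for i in ranked])  # -> ['DQ-101', 'DQ-102']
```

Whatever formula your organisation adopts, storing its output as `priority_score` in the queue keeps the ranking reproducible.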
Confirm a prioritised issue list exists:
```sql
SELECT issue_id, issue_type, affected_records, priority_score, assigned_to
FROM data_quality.remediation_queue
WHERE dataset_name = 'beneficiary_registry'
  AND status = 'pending'
ORDER BY priority_score DESC;
```

Access permissions: Remediation requires write access to affected tables. For production data, this typically means elevated permissions beyond standard read access. Confirm your database role includes UPDATE and DELETE privileges on target tables, or that you have access to a remediation service account.
Test write access before beginning:
```sql
-- Test update capability (will be rolled back)
BEGIN TRANSACTION;
UPDATE beneficiary_registry
SET last_modified = CURRENT_TIMESTAMP
WHERE beneficiary_id = 'TEST-000';
ROLLBACK;
```

Backup verification: Before modifying production data, verify a current backup exists and test restoration capability. Remediation changes are difficult to reverse without backups, particularly for bulk corrections affecting thousands of records.
```bash
# Verify backup exists and is recent
psql -h backup-server -U backup_admin -c \
  "SELECT backup_id, backup_timestamp, database_name, size_mb
   FROM backup_catalog
   WHERE database_name = 'programme_data'
   ORDER BY backup_timestamp DESC
   LIMIT 1;"
```

The backup timestamp should precede your remediation start time by no more than the organisation’s recovery point objective, typically 4 to 24 hours.
Stakeholder authorisation: Data corrections require authorisation from the data owner or designated steward. This authorisation covers the scope of changes, acceptable data loss (if any), and service window for remediation activities. Document authorisation in the remediation ticket before proceeding.
Procedure
The remediation procedure progresses through six phases: triage, root cause analysis, planning, execution, duplicate resolution, and standardisation and missing data, followed by verification. Each phase produces artefacts required by subsequent phases.
```
+--------------------------------------------------------------------+
|                        REMEDIATION WORKFLOW                        |
+--------------------------------------------------------------------+
|                                                                    |
|  +----------------+     +----------------+     +----------------+  |
|  |                |     |                |     |                |  |
|  |  Issue Triage  +---->+   Root Cause   +---->+  Remediation   |  |
|  |                |     |    Analysis    |     |   Planning     |  |
|  +----------------+     +----------------+     +----------------+  |
|          |                      |                      |           |
|          v                      v                      v           |
|  - Categorise issues    - Identify source      - Select pattern    |
|  - Assess impact        - Trace lineage        - Estimate effort   |
|  - Confirm priority     - Document cause       - Define rollback   |
|                                                        |           |
|                                                        v           |
|                                                +----------------+  |
|                                                |                |  |
|                                                |   Execution    |  |
|                                                |                |  |
|                                                +----------------+  |
|                                                        |           |
|                                                        v           |
|                                                - Apply corrections |
|                                                - Log changes       |
|                                                - Update audit trail|
|                                                        |           |
|                                                        v           |
|                                                +----------------+  |
|                                                |                |  |
|                                                |  Verification  |  |
|                                                |                |  |
|                                                +----------------+  |
|                                                        |           |
|                                                        v           |
|                                                - Re-profile data   |
|                                                - Compare scores    |
|                                                - Document outcome  |
|                                                                    |
+--------------------------------------------------------------------+
```

Figure 1: Remediation workflow phases
Phase 1: Issue triage
Triage categorises issues, confirms their current state, and validates priority assignments from the remediation queue.
- Extract the issue batch for remediation from the queue. Select issues sharing a common category or root cause to enable efficient batch processing:
```sql
SELECT issue_id, rule_id, rule_name, violation_type, affected_table,
       affected_column, sample_values, record_count, first_detected,
       priority_score
FROM data_quality.remediation_queue
WHERE dataset_name = 'beneficiary_registry'
  AND status = 'pending'
  AND violation_type = 'invalid_format'
ORDER BY priority_score DESC
LIMIT 100;
```

This query retrieves up to 100 format-related issues for batch remediation. Adjust the violation_type filter based on the issue category you are addressing.
- Verify issue counts match profiling results. Discrepancies indicate data changes since profiling or queue synchronisation problems:
```sql
-- Count current violations for comparison
SELECT COUNT(*) as current_violations
FROM beneficiary_registry
WHERE phone_number !~ '^\+?[0-9]{10,15}$'
  AND phone_number IS NOT NULL;
```

Compare this count against the record_count from the queue. A variance exceeding 5% suggests re-profiling before remediation.
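The 5% variance rule can be checked mechanically in whatever script drives triage. A minimal sketch; the helper name is ours and the counts are illustrative:

```python
def violation_drift_pct(queue_count: int, fresh_count: int) -> float:
    """Percentage difference between the queued violation count and a
    fresh recount of the table."""
    if queue_count == 0:
        return 0.0 if fresh_count == 0 else 100.0
    return abs(fresh_count - queue_count) / queue_count * 100

# Queue said 4259 violations; a recount found 4502 -> ~5.7% drift,
# so re-profile before remediating.
needs_reprofile = violation_drift_pct(4259, 4502) > 5.0
```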
- Sample affected records to understand the violation patterns:
```sql
SELECT beneficiary_id, phone_number, registration_date, data_source
FROM beneficiary_registry
WHERE phone_number !~ '^\+?[0-9]{10,15}$'
  AND phone_number IS NOT NULL
ORDER BY registration_date DESC
LIMIT 50;
```

Examine the sample to identify whether violations cluster by data source, time period, or other attributes. This clustering informs root cause analysis.
- Document the triage findings in the remediation ticket. Record the issue count, sample patterns observed, and any adjustments to priority based on your analysis.
Phase 2: Root cause analysis
Root cause analysis identifies why invalid data entered the system. Remediation without root cause analysis treats symptoms while the underlying problem continues generating new violations.
```
+------------------------------------------------------------------+
|                       ROOT CAUSE ANALYSIS                        |
+------------------------------------------------------------------+
|                                                                  |
|                       +------------------+                       |
|                       |  Quality Issue   |                       |
|                       |    (symptom)     |                       |
|                       +--------+---------+                       |
|                                |                                 |
|             +------------------+------------------+              |
|             |                  |                  |              |
|             v                  v                  v              |
|        +-----------+      +-----------+      +-----------+       |
|        |  Source   |      | Transform |      |   Load    |       |
|        |  System   |      |   Logic   |      |  Process  |       |
|        +-----+-----+      +-----+-----+      +-----+-----+       |
|             |                  |                  |              |
|             v                  v                  v              |
|       - No validation    - Missing rule     - Truncation         |
|       - User error       - Wrong mapping    - Type coercion      |
|       - Legacy data      - Edge case        - Concurrency        |
|                                                                  |
|        [Trace using Data Lineage to identify exact point]        |
|                                                                  |
+------------------------------------------------------------------+
```

Figure 2: Root cause analysis categories
- Trace the data lineage for affected records. Lineage reveals the data path from source to current location, identifying transformation points where corruption could occur:
```sql
-- Query lineage metadata for affected table
SELECT source_system, source_table, transform_job, load_timestamp, record_count
FROM data_lineage.column_lineage
WHERE target_table = 'beneficiary_registry'
  AND target_column = 'phone_number'
ORDER BY load_timestamp DESC;
```

If your organisation uses a lineage tool such as Apache Atlas or DataHub, query its API for more detailed transformation history. See Data Lineage for lineage architecture.
- Examine source system data for the same records:
```sql
-- Query source system (adjust connection as needed)
SELECT registration_id, contact_phone, entry_timestamp, entry_user
FROM source_kobo.submissions
WHERE registration_id IN (
    SELECT source_id
    FROM beneficiary_registry
    WHERE phone_number !~ '^\+?[0-9]{10,15}$'
    LIMIT 100
);
```

Compare source values against destination values. If source data is already invalid, the root cause lies in data collection. If source data is valid but destination is corrupted, the root cause lies in transformation or loading.
- Review transformation logic for the identified pipeline:
```
# Example: Check dbt model or ETL script for phone handling
# Look for:
# - Missing COALESCE or NULL handling
# - Incorrect regex patterns
# - Truncation from VARCHAR sizing
# - Character encoding issues
```

Document the specific transformation step where corruption occurs. If no corruption point exists in the pipeline, the root cause is source data entry.
Record the root cause in the remediation ticket using this structure:
- Root cause category: Source system / Transform logic / Load process
- Specific cause: Description of the defect
- Evidence: Query results or code references demonstrating the cause
- Prevention recommendation: How to stop future occurrences
Phase 3: Remediation planning
Planning determines the correction approach, estimates effort, and defines rollback procedures.
Select the remediation pattern based on root cause and data characteristics. Three patterns apply to most scenarios:
Manual correction applies when fewer than 50 records require remediation and each needs individual assessment, or when corrections require domain expertise to determine valid values. Manual correction uses UPDATE statements with WHERE clauses targeting specific records.
Automated cleansing applies when hundreds or thousands of records share a common correction rule. Automated cleansing uses SQL UPDATE statements, Python scripts with pandas, or dedicated data quality tools to apply transformations in bulk.
Source fix applies when the root cause is ongoing data entry or upstream system problems. Source fixes modify collection forms, API validations, or upstream ETL before remediating existing data.
For this procedure, select automated cleansing when affected records exceed 50 and share a deterministic correction rule.
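As a cross-check before committing to the SQL, the correction rule can be prototyped in the scripting option the automated-cleansing pattern mentions. The dependency-free sketch below mirrors the three CASE branches; with pandas it could be applied as `df["phone_number"].map(clean_phone)` (the function name and record layout are ours):

```python
import re

# Same pattern as the SQL rule: optional leading +, then 10-15 digits.
VALID_PHONE = re.compile(r"^\+?[0-9]{10,15}$")

def clean_phone(raw):
    """Return (corrected_value, flag), mirroring the three CASE branches:
    already valid -> unchanged, cleanable -> punctuation stripped,
    otherwise -> NULL plus a manual-review flag."""
    if raw is None:
        return None, None
    if VALID_PHONE.match(raw):
        return raw, "valid"
    stripped = re.sub(r"[^0-9+]", "", raw)  # drop spaces, dashes, parentheses
    if VALID_PHONE.match(stripped):
        return stripped, "auto_corrected"
    return None, "requires_review"
```

Running this over a sample of profiled values and comparing the flags against the SELECT-mode breakdown is a cheap way to catch regex-flavour differences between Python and the database before the UPDATE runs.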
Define the correction logic. Express the transformation as a SQL CASE statement or equivalent logic:
```sql
-- Define correction logic for phone number standardisation
-- Rule: Remove non-numeric characters except leading +
-- Rule: Reject if result is not 10-15 digits
UPDATE beneficiary_registry
SET phone_number = CASE
        -- Already valid: no change
        WHEN phone_number ~ '^\+?[0-9]{10,15}$'
            THEN phone_number
        -- Cleanable: remove spaces, dashes, parentheses
        WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
            THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
        -- Uncorrectable: set to NULL for manual review
        ELSE NULL
    END,
    phone_number_remediation_flag = CASE
        WHEN phone_number ~ '^\+?[0-9]{10,15}$' THEN 'valid'
        WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
            THEN 'auto_corrected'
        ELSE 'requires_review'
    END,
    last_modified = CURRENT_TIMESTAMP,
    modified_by = 'dq_remediation_batch_2024_001'
WHERE phone_number IS NOT NULL
  AND phone_number !~ '^\+?[0-9]{10,15}$';
```

- Estimate records affected by each correction branch. Run the logic in SELECT mode first:
```sql
SELECT CASE
        WHEN phone_number ~ '^\+?[0-9]{10,15}$' THEN 'already_valid'
        WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
            THEN 'auto_correctable'
        ELSE 'requires_review'
    END as correction_category,
    COUNT(*) as record_count
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
GROUP BY 1;
```

Expected output:
```
correction_category | record_count
--------------------|-------------
already_valid       |        45230
auto_correctable    |         3847
requires_review     |          412
```

This breakdown informs stakeholders about expected outcomes: 3,847 automatic corrections and 412 records requiring manual attention.
- Create the rollback script before executing corrections. Store original values in a remediation log table:
```sql
-- Create remediation log if not exists
CREATE TABLE IF NOT EXISTS data_quality.remediation_log (
    log_id SERIAL PRIMARY KEY,
    remediation_batch VARCHAR(50),
    table_name VARCHAR(100),
    record_id VARCHAR(100),
    column_name VARCHAR(100),
    original_value TEXT,
    new_value TEXT,
    correction_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    corrected_by VARCHAR(100)
);

-- Capture original values before correction
INSERT INTO data_quality.remediation_log
    (remediation_batch, table_name, record_id, column_name,
     original_value, new_value, corrected_by)
SELECT
    'BATCH_2024_001',
    'beneficiary_registry',
    beneficiary_id,
    'phone_number',
    phone_number,
    CASE
        WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
            THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
        ELSE NULL
    END,
    'dq_remediation_batch_2024_001'
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
  AND phone_number !~ '^\+?[0-9]{10,15}$';
```

Document the remediation plan in the ticket:
- Correction pattern: Manual / Automated / Source fix
- Correction logic: SQL or script reference
- Expected outcomes: Record counts by category
- Rollback procedure: Reference to remediation_log entries
- Execution window: Scheduled time for corrections
- Verification method: Re-profiling and spot checks
Phase 4: Execution
Execute the remediation during the approved service window with appropriate transaction controls.
- Begin a database transaction to enable rollback if errors occur:
```sql
BEGIN TRANSACTION;
SET LOCAL statement_timeout = '30min';
```

The statement timeout prevents runaway queries. Adjust it based on the expected execution time from your row count estimates.
- Execute the original value capture (if not done in planning):
```sql
INSERT INTO data_quality.remediation_log
    (remediation_batch, table_name, record_id, column_name,
     original_value, new_value, corrected_by)
SELECT
    'BATCH_2024_001',
    'beneficiary_registry',
    beneficiary_id,
    'phone_number',
    phone_number,
    CASE
        WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
            THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
        ELSE NULL
    END,
    'dq_remediation_batch_2024_001'
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
  AND phone_number !~ '^\+?[0-9]{10,15}$';
```

Verify the insert count matches your planning estimate:
```
INSERT 0 4259
```

- Execute the correction statement:
```sql
UPDATE beneficiary_registry
SET phone_number = CASE
        WHEN phone_number ~ '^\+?[0-9]{10,15}$'
            THEN phone_number
        WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
            THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
        ELSE NULL
    END,
    phone_number_remediation_flag = CASE
        WHEN phone_number ~ '^\+?[0-9]{10,15}$' THEN 'valid'
        WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
            THEN 'auto_corrected'
        ELSE 'requires_review'
    END,
    last_modified = CURRENT_TIMESTAMP,
    modified_by = 'dq_remediation_batch_2024_001'
WHERE phone_number IS NOT NULL
  AND phone_number !~ '^\+?[0-9]{10,15}$';
```

Verify the update count:
```
UPDATE 4259
```

- Perform immediate validation before committing:
```sql
-- Check no valid records were corrupted
SELECT COUNT(*) as still_valid
FROM beneficiary_registry
WHERE phone_number ~ '^\+?[0-9]{10,15}$';

-- Check correction counts match expectations
SELECT phone_number_remediation_flag, COUNT(*)
FROM beneficiary_registry
WHERE modified_by = 'dq_remediation_batch_2024_001'
GROUP BY 1;
```

Expected output:
```
phone_number_remediation_flag | count
------------------------------|-------
auto_corrected                |  3847
requires_review               |   412
```

If counts deviate significantly from planning estimates, investigate before committing.
- Commit or rollback based on validation results:
```sql
-- If validation passes:
COMMIT;

-- If validation fails:
ROLLBACK;
```

Commit decision
Only commit if validation counts match planning estimates within 2% tolerance. Larger variances indicate logic errors requiring investigation.
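When the execution steps are scripted, the tolerance check itself is easy to automate. A sketch, assuming the counts from the planning example; the helper name is ours:

```python
def within_tolerance(expected: int, actual: int, tolerance_pct: float = 2.0) -> bool:
    """True when the executed row count is close enough to the plan to commit."""
    if expected == 0:
        return actual == 0
    return abs(actual - expected) / expected * 100 <= tolerance_pct

commit_ok = within_tolerance(4259, 4259)          # matches the plan -> COMMIT
must_rollback = not within_tolerance(4259, 4120)  # ~3.3% short -> ROLLBACK
```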
- Update the remediation queue to reflect completed work:
```sql
UPDATE data_quality.remediation_queue
SET status = 'completed',
    completion_timestamp = CURRENT_TIMESTAMP,
    records_remediated = 4259,
    notes = 'Batch BATCH_2024_001: 3847 auto-corrected, 412 set for manual review'
WHERE dataset_name = 'beneficiary_registry'
  AND violation_type = 'invalid_format'
  AND affected_column = 'phone_number'
  AND status = 'pending';
```

Phase 5: Duplicate resolution
Duplicate records require specialised handling because resolution involves choosing survivor records and merging or purging losers. This phase applies when your remediation batch includes uniqueness violations.
- Identify duplicate clusters using the matching criteria defined in your quality rules:
```sql
-- Find duplicates by matching key (name + date of birth + location)
WITH duplicate_clusters AS (
    SELECT
        LOWER(TRIM(first_name)) || '|' || LOWER(TRIM(last_name)) || '|' ||
        date_of_birth || '|' || location_pcode AS match_key,
        ARRAY_AGG(beneficiary_id ORDER BY registration_date) as duplicate_ids,
        COUNT(*) as cluster_size
    FROM beneficiary_registry
    GROUP BY 1
    HAVING COUNT(*) > 1
)
SELECT * FROM duplicate_clusters
ORDER BY cluster_size DESC
LIMIT 100;
```

Apply survivorship rules to select the golden record from each cluster. Survivorship rules define which record’s values take precedence. Common rules include:
- Most recent registration date wins
- Most complete record (fewest NULLs) wins
- Specific source system wins (e.g., biometric registration over paper)
- Earliest record wins (original registration)
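The "most complete record" rule, with earliest registration as tiebreak, can also be expressed compactly outside SQL. The field list below mirrors the completeness score used in the survivor-ranking query; the record layout and function name are illustrative:

```python
COMPLETENESS_FIELDS = ("phone_number", "email", "national_id", "household_id")

def pick_survivor(cluster):
    """Survivor = most populated fields; ties broken by earliest registration."""
    def rank(rec):
        completeness = sum(rec.get(f) is not None for f in COMPLETENESS_FIELDS)
        # min() then picks the highest completeness (negated), then the
        # earliest ISO-formatted registration date.
        return (-completeness, rec["registration_date"])
    return min(cluster, key=rank)

cluster = [
    {"beneficiary_id": "B-001", "phone_number": None, "email": None,
     "national_id": "N1", "household_id": None, "registration_date": "2023-01-10"},
    {"beneficiary_id": "B-002", "phone_number": "0791234567", "email": None,
     "national_id": "N1", "household_id": "H7", "registration_date": "2024-06-01"},
]
survivor = pick_survivor(cluster)  # B-002: three populated fields beat one
```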
```sql
-- Identify survivors using "most complete record" rule
WITH duplicate_clusters AS (
    SELECT
        LOWER(TRIM(first_name)) || '|' || LOWER(TRIM(last_name)) || '|' ||
        date_of_birth || '|' || location_pcode AS match_key,
        beneficiary_id,
        -- Count non-null fields as completeness score
        (CASE WHEN phone_number IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN email IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN national_id IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN household_id IS NOT NULL THEN 1 ELSE 0 END) as completeness_score,
        registration_date,
        ROW_NUMBER() OVER (
            PARTITION BY LOWER(TRIM(first_name)) || '|' || LOWER(TRIM(last_name)) || '|' ||
                         date_of_birth || '|' || location_pcode
            ORDER BY
                -- Primary: completeness
                (CASE WHEN phone_number IS NOT NULL THEN 1 ELSE 0 END +
                 CASE WHEN email IS NOT NULL THEN 1 ELSE 0 END +
                 CASE WHEN national_id IS NOT NULL THEN 1 ELSE 0 END +
                 CASE WHEN household_id IS NOT NULL THEN 1 ELSE 0 END) DESC,
                -- Secondary: earliest registration
                registration_date ASC
        ) as survivor_rank
    FROM beneficiary_registry
)
SELECT
    match_key,
    beneficiary_id,
    completeness_score,
    registration_date,
    CASE WHEN survivor_rank = 1 THEN 'survivor' ELSE 'duplicate' END as resolution
FROM duplicate_clusters
WHERE match_key IN (
    SELECT match_key FROM duplicate_clusters
    GROUP BY 1
    HAVING COUNT(*) > 1
)
ORDER BY match_key, survivor_rank;
```

Because a CTE is visible only to its own statement, materialise this ranking (for example with CREATE TEMP TABLE duplicate_clusters AS ...) so the merge, archive, and delete steps can reference it.

- Merge valuable data from duplicate records into survivors before deletion:
```sql
-- Merge phone numbers from duplicates into survivors where survivor is NULL
-- (duplicate_clusters is the survivor-ranking result, materialised as a
-- temporary table)
UPDATE beneficiary_registry survivor
SET phone_number = dup.phone_number,
    last_modified = CURRENT_TIMESTAMP,
    modified_by = 'dq_dedup_batch_2024_001'
FROM (
    SELECT match_key, beneficiary_id, phone_number
    FROM duplicate_clusters
    WHERE survivor_rank > 1
      AND phone_number IS NOT NULL
) dup
JOIN (
    SELECT match_key, beneficiary_id
    FROM duplicate_clusters
    WHERE survivor_rank = 1
      AND phone_number IS NULL
) surv ON dup.match_key = surv.match_key
WHERE survivor.beneficiary_id = surv.beneficiary_id;
```

- Archive duplicate records before deletion:
```sql
INSERT INTO beneficiary_registry_archive
SELECT *,
    CURRENT_TIMESTAMP as archived_at,
    'duplicate_resolution' as archive_reason
FROM beneficiary_registry
WHERE beneficiary_id IN (
    SELECT beneficiary_id
    FROM duplicate_clusters
    WHERE survivor_rank > 1
);
```

- Delete archived duplicates:
```sql
DELETE FROM beneficiary_registry
WHERE beneficiary_id IN (
    SELECT beneficiary_id
    FROM duplicate_clusters
    WHERE survivor_rank > 1
);
```

Verify the deletion count matches the expected number of duplicates:
```
DELETE 847
```

Phase 6: Standardisation and missing data
Standardisation corrects inconsistent representations of the same value. Missing data handling applies default values or derives values from related fields.
- Apply standardisation transformations for categorical fields:
```sql
-- Standardise gender field
UPDATE beneficiary_registry
SET gender = CASE
        WHEN UPPER(TRIM(gender)) IN ('M', 'MALE', 'MAN', 'HOMME') THEN 'Male'
        WHEN UPPER(TRIM(gender)) IN ('F', 'FEMALE', 'WOMAN', 'FEMME') THEN 'Female'
        WHEN UPPER(TRIM(gender)) IN ('O', 'OTHER', 'AUTRE', 'NON-BINARY') THEN 'Other'
        ELSE gender  -- Preserve unrecognised values for manual review
    END,
    last_modified = CURRENT_TIMESTAMP,
    modified_by = 'dq_standardisation_batch_2024_001'
WHERE gender IS NOT NULL
  AND gender NOT IN ('Male', 'Female', 'Other');
```

- Derive missing values from related fields where business rules permit:
```sql
-- Derive age_group from date_of_birth where age_group is NULL
UPDATE beneficiary_registry
SET age_group = CASE
        WHEN DATE_PART('year', AGE(date_of_birth)) < 5 THEN '0-4'
        WHEN DATE_PART('year', AGE(date_of_birth)) < 18 THEN '5-17'
        WHEN DATE_PART('year', AGE(date_of_birth)) < 60 THEN '18-59'
        ELSE '60+'
    END,
    last_modified = CURRENT_TIMESTAMP,
    modified_by = 'dq_derivation_batch_2024_001'
WHERE age_group IS NULL
  AND date_of_birth IS NOT NULL;
```

- Apply default values for mandatory fields where organisational policy permits:
```sql
-- Set default registration status for records missing status
UPDATE beneficiary_registry
SET registration_status = 'pending_verification',
    last_modified = CURRENT_TIMESTAMP,
    modified_by = 'dq_default_batch_2024_001'
WHERE registration_status IS NULL;
```

Default value policy
Apply defaults only for fields where your data governance policy explicitly permits default assignment. Document default logic in the remediation ticket for audit purposes.
Verification
Verification confirms remediation achieved its objectives without introducing new problems.
Execute a full data profile after remediation completes:
```sql
-- Trigger re-profiling (tool-specific; example for Great Expectations)
-- great_expectations checkpoint run beneficiary_registry_checkpoint
```

For manual verification, run the original quality rules against remediated data:
```sql
-- Recount phone number violations
SELECT COUNT(*) as remaining_violations
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
  AND phone_number !~ '^\+?[0-9]{10,15}$';
```

Expected result: 0 violations (auto-correctable records fixed) or a count matching the “requires_review” category from planning.
Compare quality scores before and after remediation:
```sql
SELECT
    p.profile_name,
    p.run_timestamp,
    p.total_records,
    p.passing_records,
    ROUND(100.0 * p.passing_records / p.total_records, 2) as quality_score_pct
FROM data_quality.profile_results p
WHERE p.dataset_name = 'beneficiary_registry'
  AND p.rule_name = 'phone_format_valid'
ORDER BY p.run_timestamp DESC
LIMIT 2;
```

Expected output showing improvement:
```
profile_name     | run_timestamp       | total_records | passing_records | quality_score_pct
-----------------|---------------------|---------------|-----------------|------------------
post_remediation | 2024-11-16 14:30:00 |         49489 |           49077 |             99.17
pre_remediation  | 2024-11-16 09:00:00 |         49489 |           45230 |             91.39
```

Perform spot checks on remediated records to verify corrections are sensible:
```sql
-- remediation_log stores the key as record_id, not beneficiary_id
SELECT record_id, original_value, new_value, correction_timestamp
FROM data_quality.remediation_log
WHERE remediation_batch = 'BATCH_2024_001'
ORDER BY RANDOM()
LIMIT 20;
```

Review the 20 randomly selected corrections. Each should show a logical transformation from invalid to valid format.
Verify no downstream system failures occurred after remediation by checking application logs or monitoring dashboards for the 2 hours following correction completion.
Troubleshooting
| Symptom | Cause | Resolution |
|---|---|---|
| UPDATE affects 0 rows | WHERE clause too restrictive or data already remediated | Verify issue still exists with SELECT using same WHERE clause; check if previous remediation batch already corrected |
| UPDATE affects more rows than expected | WHERE clause too broad; missing condition | ROLLBACK immediately; add additional predicates to limit scope; re-test with SELECT before UPDATE |
| Transaction timeout during large batch | Batch size exceeds statement_timeout setting | Split remediation into smaller batches of 10,000-50,000 records; increase timeout with SET LOCAL statement_timeout = '2h' |
| Foreign key violation on DELETE | Duplicate records referenced by child tables | Update child table foreign keys to point to survivor record before deleting duplicates |
| Deadlock detected | Concurrent processes modifying same records | Retry after delay; schedule remediation during low-activity windows; use explicit row locking with SELECT ... FOR UPDATE |
| Character encoding errors in standardisation | Source data contains mixed encodings | Convert encoding before standardisation: CONVERT_FROM(column::bytea, 'UTF8') |
| Regex pattern not matching expected records | Pattern syntax differs between databases | Verify regex flavour (POSIX vs PCRE); test pattern in database-specific regex tester |
| Remediation log INSERT fails with duplicate key | Log table primary key conflict from previous run | Truncate log table for current batch or use different batch identifier |
| Quality score unchanged after remediation | Re-profiling not executed or cached results | Force fresh profile run; clear profile cache; verify profile query references correct timestamp |
| New violations appear immediately after remediation | Source system continuing to load invalid data | Implement source fix before re-remediating; add validation to ingestion pipeline |
| Rollback fails with “transaction already committed” | COMMIT executed before problem discovered | Restore from remediation_log table using reverse UPDATE; if log incomplete, restore from backup |
| Memory exhaustion during large UPDATE | Database loading entire result set into memory | Process in batches using LIMIT and OFFSET; use cursor-based approach |
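Two rows above (transaction timeout, memory exhaustion) share one remedy: split the correction into bounded batches. A driver-agnostic sketch of that loop; `run_update` stands in for whatever executes SQL and returns an affected-row count (for example `cursor.rowcount` in a DB-API driver), and the LIMIT-ed key subquery keeps each transaction small:

```python
# Abbreviated batch statement; the full CASE logic from Phase 3 belongs in
# the SET clause. The LIMIT-ed subquery bounds each transaction's row count.
BATCH_SQL = """
UPDATE beneficiary_registry
SET phone_number = REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g'),
    last_modified = CURRENT_TIMESTAMP
WHERE beneficiary_id IN (
    SELECT beneficiary_id FROM beneficiary_registry
    WHERE phone_number !~ '^\\+?[0-9]{10,15}$'
      AND phone_number IS NOT NULL
    LIMIT 10000
);
"""

def remediate_in_batches(run_update, sql=BATCH_SQL, max_batches=1000):
    """Run the bounded UPDATE until it affects no rows; return total corrected."""
    total = 0
    for _ in range(max_batches):  # hard stop in case the rule never converges
        affected = run_update(sql)
        if affected == 0:
            break
        total += affected
    return total

# Dry run with a stub executor that pretends three batches existed.
fake_counts = iter([10000, 4259, 0])
total = remediate_in_batches(lambda sql: next(fake_counts))  # total == 14259
```

Committing between batches (inside `run_update`) is what releases locks and memory; a single wrapping transaction would defeat the purpose.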
Recurring issue investigation
When the same quality issues reappear within 30 days of remediation, the root cause was not addressed. Investigate using this sequence:
Query the lineage for records that failed quality rules after remediation:
```sql
SELECT
    l.source_system,
    l.load_job_id,
    l.load_timestamp,
    COUNT(*) as new_violations
FROM beneficiary_registry b
JOIN data_lineage.load_audit l
    ON b.load_batch_id = l.load_job_id
WHERE b.phone_number !~ '^\+?[0-9]{10,15}$'
  AND b.registration_date > (
      SELECT MAX(correction_timestamp)
      FROM data_quality.remediation_log
      WHERE remediation_batch = 'BATCH_2024_001'
  )
GROUP BY 1, 2, 3
ORDER BY l.load_timestamp;
```

This identifies which source system and load job introduced new violations. Address the problem by implementing validation at the ingestion point rather than remediating repeatedly.