
Data Quality Remediation

Data quality remediation transforms data from a degraded state to one that meets defined quality rules. This task covers the complete remediation cycle from issue triage through correction, verification, and recurrence prevention. You execute remediation when profiling reveals quality scores below acceptable thresholds or when downstream systems report data-related failures.

Remediation operates on issues already identified through profiling and monitoring. The Data Quality Management concept page establishes the rules and metrics that define what constitutes a quality issue. This task assumes those rules exist and focuses on the mechanics of correction.

Prerequisites

Before beginning remediation, confirm the following requirements are met.

Quality rules and profiling results: You need documented quality rules that define valid states for each data element. These rules specify acceptable value ranges, format patterns, referential integrity constraints, and uniqueness requirements. Profiling must have executed against current data, producing a quantified list of violations. Without profiling results, you cannot prioritise remediation effort or measure improvement.

Verify profiling is current by checking the profile timestamp:

SELECT profile_name,
last_run_timestamp,
total_records_profiled,
total_violations_found
FROM data_quality.profile_runs
WHERE dataset_name = 'beneficiary_registry'
ORDER BY last_run_timestamp DESC
LIMIT 1;

The profile should be no older than 24 hours for actively changing datasets, or no older than the last data load for batch-updated datasets.
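As an illustration, this freshness rule can be expressed as a small check. The following is a sketch in Python with illustrative names, not part of any profiling tool:

```python
from datetime import datetime, timedelta

def profile_is_current(last_run: datetime, now: datetime,
                       max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True when the profile run is recent enough to base remediation on."""
    return (now - last_run) <= max_age
```

For batch-updated datasets, compare `last_run` against the last load timestamp instead of a fixed 24-hour window.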

Issue prioritisation: Quality issues require prioritisation before remediation begins. A dataset with 50,000 violations cannot be remediated in arbitrary order. Prioritisation ranks issues by business impact, remediation complexity, and dependency relationships. High-impact issues affecting downstream reporting or operational decisions take precedence over cosmetic problems.

Confirm a prioritised issue list exists:

SELECT issue_id,
issue_type,
affected_records,
priority_score,
assigned_to
FROM data_quality.remediation_queue
WHERE dataset_name = 'beneficiary_registry'
AND status = 'pending'
ORDER BY priority_score DESC;

Access permissions: Remediation requires write access to affected tables. For production data, this typically means elevated permissions beyond standard read access. Confirm your database role includes UPDATE and DELETE privileges on target tables, or that you have access to a remediation service account.

Test write access before beginning:

-- Test update capability (will be rolled back)
BEGIN TRANSACTION;
UPDATE beneficiary_registry
SET last_modified = CURRENT_TIMESTAMP
WHERE beneficiary_id = 'TEST-000';
ROLLBACK;

Backup verification: Before modifying production data, verify a current backup exists and test restoration capability. Remediation changes are difficult to reverse without backups, particularly for bulk corrections affecting thousands of records.

# Verify backup exists and is recent
psql -h backup-server -U backup_admin -c \
"SELECT backup_id, backup_timestamp, database_name, size_mb
FROM backup_catalog
WHERE database_name = 'programme_data'
ORDER BY backup_timestamp DESC LIMIT 1;"

The backup timestamp should precede your remediation start time by no more than the organisation’s recovery point objective, typically 4 to 24 hours.

Stakeholder authorisation: Data corrections require authorisation from the data owner or designated steward. This authorisation covers the scope of changes, acceptable data loss (if any), and service window for remediation activities. Document authorisation in the remediation ticket before proceeding.

Procedure

The remediation procedure progresses through triage, root cause analysis, planning, and execution, followed by specialised phases for duplicate resolution and standardisation, and concludes with verification. Each phase produces artefacts required by subsequent phases.

+--------------------------------------------------------------------+
| REMEDIATION WORKFLOW |
+--------------------------------------------------------------------+
| |
| +----------------+ +----------------+ +----------------+ |
| | | | | | | |
| | Issue Triage +---->+ Root Cause +---->+ Remediation | |
| | | | Analysis | | Planning | |
| +----------------+ +----------------+ +----------------+ |
| | | | |
| v v v |
| - Categorise issues - Identify source - Select pattern |
| - Assess impact - Trace lineage - Estimate effort |
| - Confirm priority - Document cause - Define rollback |
| |
| +----------------+ |
| | | |
| | Execution | |
| | | |
| +-------+--------+ |
| | |
| v |
| - Apply corrections |
| - Log changes |
| - Update audit trail |
| | |
| v |
| +-------+--------+ |
| | | |
| | Verification | |
| | | |
| +----------------+ |
| | |
| v |
| - Re-profile data |
| - Compare scores |
| - Document outcome |
| |
+--------------------------------------------------------------------+

Figure 1: Remediation workflow phases

Phase 1: Issue triage

Triage categorises issues, confirms their current state, and validates priority assignments from the remediation queue.

  1. Extract the issue batch for remediation from the queue. Select issues sharing a common category or root cause to enable efficient batch processing:
SELECT issue_id,
rule_id,
rule_name,
violation_type,
affected_table,
affected_column,
sample_values,
record_count,
first_detected,
priority_score
FROM data_quality.remediation_queue
WHERE dataset_name = 'beneficiary_registry'
AND status = 'pending'
AND violation_type = 'invalid_format'
ORDER BY priority_score DESC
LIMIT 100;

This query retrieves up to 100 format-related issues for batch remediation. Adjust the violation_type filter based on the issue category you are addressing.

  2. Verify issue counts match profiling results. Discrepancies indicate data changes since profiling or queue synchronisation problems:
-- Count current violations for comparison
SELECT COUNT(*) as current_violations
FROM beneficiary_registry
WHERE phone_number !~ '^\+?[0-9]{10,15}$'
AND phone_number IS NOT NULL;

Compare this count against the record_count from the queue. A variance exceeding 5% suggests re-profiling before remediation.
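The 5% variance rule can be captured as a guard function (an illustrative sketch, not an existing utility):

```python
def needs_reprofiling(queued_count: int, current_count: int,
                      tolerance: float = 0.05) -> bool:
    """True when the live violation count deviates from the queued count
    by more than the tolerance, signalling stale profiling results."""
    if queued_count == 0:
        return current_count != 0
    return abs(current_count - queued_count) / queued_count > tolerance
```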

  3. Sample affected records to understand the violation patterns:
SELECT beneficiary_id,
phone_number,
registration_date,
data_source
FROM beneficiary_registry
WHERE phone_number !~ '^\+?[0-9]{10,15}$'
AND phone_number IS NOT NULL
ORDER BY registration_date DESC
LIMIT 50;

Examine the sample to identify whether violations cluster by data source, time period, or other attributes. This clustering informs root cause analysis.

  4. Document the triage findings in the remediation ticket. Record the issue count, sample patterns observed, and any adjustments to priority based on your analysis.

Phase 2: Root cause analysis

Root cause analysis identifies why invalid data entered the system. Remediation without root cause analysis treats symptoms while the underlying problem continues generating new violations.

+------------------------------------------------------------------+
| ROOT CAUSE ANALYSIS |
+------------------------------------------------------------------+
| |
| +------------------+ |
| | Quality Issue | |
| | (symptom) | |
| +--------+---------+ |
| | |
| +--------------+--------------+ |
| | | | |
| v v v |
| +--------+----+ +------+------+ +----+--------+ |
| | Source | | Transform | | Load | |
| | System | | Logic | | Process | |
| +--------+----+ +------+------+ +----+--------+ |
| | | | |
| v v v |
| - No validation - Missing rule - Truncation |
| - User error - Wrong mapping - Type coercion |
| - Legacy data - Edge case - Concurrency |
| |
| [Trace using Data Lineage to identify exact point] |
| |
+------------------------------------------------------------------+

Figure 2: Root cause analysis categories

  1. Trace the data lineage for affected records. Lineage reveals the data path from source to current location, identifying transformation points where corruption could occur:
-- Query lineage metadata for affected table
SELECT source_system,
source_table,
transform_job,
load_timestamp,
record_count
FROM data_lineage.column_lineage
WHERE target_table = 'beneficiary_registry'
AND target_column = 'phone_number'
ORDER BY load_timestamp DESC;

If your organisation uses a lineage tool like Apache Atlas or DataHub, query its API for more detailed transformation history. See Data Lineage for lineage architecture.

  2. Examine source system data for the same records:
-- Query source system (adjust connection as needed)
SELECT registration_id,
contact_phone,
entry_timestamp,
entry_user
FROM source_kobo.submissions
WHERE registration_id IN (
SELECT source_id
FROM beneficiary_registry
WHERE phone_number !~ '^\+?[0-9]{10,15}$'
LIMIT 100
);

Compare source values against destination values. If source data is already invalid, the root cause lies in data collection. If source data is valid but destination is corrupted, the root cause lies in transformation or loading.

  3. Review transformation logic for the identified pipeline:
# models/staging/stg_beneficiaries.sql
# Example: Check dbt model or ETL script for phone handling
# Look for:
# - Missing COALESCE or NULL handling
# - Incorrect regex patterns
# - Truncation from VARCHAR sizing
# - Character encoding issues

Document the specific transformation step where corruption occurs. If no corruption point exists in the pipeline, the root cause is source data entry.

  4. Record the root cause in the remediation ticket using this structure:

    • Root cause category: Source system / Transform logic / Load process
    • Specific cause: Description of the defect
    • Evidence: Query results or code references demonstrating the cause
    • Prevention recommendation: How to stop future occurrences

Phase 3: Remediation planning

Planning determines the correction approach, estimates effort, and defines rollback procedures.

  1. Select the remediation pattern based on root cause and data characteristics. Three patterns apply to most scenarios:

    Manual correction applies when fewer than 50 records require remediation, each requiring individual assessment, or when corrections need domain expertise to determine valid values. Manual correction uses UPDATE statements with WHERE clauses targeting specific records.

    Automated cleansing applies when hundreds or thousands of records share a common correction rule. Automated cleansing uses SQL UPDATE statements, Python scripts with pandas, or dedicated data quality tools to apply transformations in bulk.

    Source fix applies when the root cause is ongoing data entry or upstream system problems. Source fixes modify collection forms, API validations, or upstream ETL before remediating existing data.

    For this procedure, select automated cleansing when affected records exceed 50 and share a deterministic correction rule.

  2. Define the correction logic. Express the transformation as a SQL CASE statement or equivalent logic:

-- Define correction logic for phone number standardisation
-- Rule: Remove non-numeric characters except leading +
-- Rule: Reject if result is not 10-15 digits
UPDATE beneficiary_registry
SET phone_number = CASE
-- Already valid: no change
WHEN phone_number ~ '^\+?[0-9]{10,15}$'
THEN phone_number
-- Cleanable: remove spaces, dashes, parentheses
WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
-- Uncorrectable: set to NULL for manual review
ELSE NULL
END,
phone_number_remediation_flag = CASE
WHEN phone_number ~ '^\+?[0-9]{10,15}$' THEN 'valid'
WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$' THEN 'auto_corrected'
ELSE 'requires_review'
END,
last_modified = CURRENT_TIMESTAMP,
modified_by = 'dq_remediation_batch_2024_001'
WHERE phone_number IS NOT NULL
AND phone_number !~ '^\+?[0-9]{10,15}$';
  3. Estimate records affected by each correction branch. Run the logic in SELECT mode first:
SELECT
CASE
WHEN phone_number ~ '^\+?[0-9]{10,15}$' THEN 'already_valid'
WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$' THEN 'auto_correctable'
ELSE 'requires_review'
END as correction_category,
COUNT(*) as record_count
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
GROUP BY 1;

Expected output:

correction_category | record_count
--------------------|-------------
already_valid | 45230
auto_correctable | 3847
requires_review | 412

This breakdown informs stakeholders about expected outcomes: 3,847 automatic corrections and 412 records requiring manual attention.
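The same correction rule can be expressed in Python for use in a pandas- or script-based cleansing run. This is a sketch mirroring the SQL CASE; the handling of NULL input is an assumption, since the SQL skips NULLs in its WHERE clause:

```python
import re
from typing import Optional, Tuple

VALID_PHONE = re.compile(r'^\+?[0-9]{10,15}$')

def correct_phone(value: Optional[str]) -> Tuple[Optional[str], str]:
    """Mirror the SQL CASE: keep valid values, strip separators from
    cleanable ones, and NULL out uncorrectable values for manual review."""
    if value is None:
        return None, 'requires_review'  # assumption: SQL WHERE clause excludes NULLs
    if VALID_PHONE.match(value):
        return value, 'valid'
    cleaned = re.sub(r'[^0-9+]', '', value)
    if VALID_PHONE.match(cleaned):
        return cleaned, 'auto_corrected'
    return None, 'requires_review'
```

Applied over a column of values, this yields the same valid / auto_corrected / requires_review split as the SELECT-mode estimate.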

  4. Create the rollback script before executing corrections. Store original values in a remediation log table:
-- Create remediation log if not exists
CREATE TABLE IF NOT EXISTS data_quality.remediation_log (
log_id SERIAL PRIMARY KEY,
remediation_batch VARCHAR(50),
table_name VARCHAR(100),
record_id VARCHAR(100),
column_name VARCHAR(100),
original_value TEXT,
new_value TEXT,
correction_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
corrected_by VARCHAR(100)
);
-- Capture original values before correction
INSERT INTO data_quality.remediation_log
(remediation_batch, table_name, record_id, column_name, original_value, new_value, corrected_by)
SELECT
'BATCH_2024_001',
'beneficiary_registry',
beneficiary_id,
'phone_number',
phone_number,
CASE
WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
ELSE NULL
END,
'dq_remediation_batch_2024_001'
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
AND phone_number !~ '^\+?[0-9]{10,15}$';
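If a committed batch later needs reversing, the logged originals can drive reverse UPDATEs. Below is a minimal sketch with a hypothetical helper name; in practice, build these through your database driver's parameterised queries rather than string interpolation:

```python
from typing import Optional

def rollback_statement(table: str, key_column: str, record_id: str,
                       column: str, original_value: Optional[str]) -> str:
    """Build a reverse UPDATE restoring one remediation_log row's original value.
    Doubles single quotes as basic SQL escaping; assumes string-typed values."""
    if original_value is None:
        value_sql = 'NULL'
    else:
        value_sql = "'" + original_value.replace("'", "''") + "'"
    return (f"UPDATE {table} SET {column} = {value_sql} "
            f"WHERE {key_column} = '{record_id}';")
```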
  5. Document the remediation plan in the ticket:

    • Correction pattern: Manual / Automated / Source fix
    • Correction logic: SQL or script reference
    • Expected outcomes: Record counts by category
    • Rollback procedure: Reference to remediation_log entries
    • Execution window: Scheduled time for corrections
    • Verification method: Re-profiling and spot checks

Phase 4: Execution

Execute the remediation during the approved service window with appropriate transaction controls.

  1. Begin a database transaction to enable rollback if errors occur:
BEGIN TRANSACTION;
SET LOCAL statement_timeout = '30min';

The statement timeout prevents runaway queries. Adjust based on expected execution time from your row count estimates.

  2. Execute the original value capture (if not done in planning):
INSERT INTO data_quality.remediation_log
(remediation_batch, table_name, record_id, column_name, original_value, new_value, corrected_by)
SELECT
'BATCH_2024_001',
'beneficiary_registry',
beneficiary_id,
'phone_number',
phone_number,
CASE
WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
ELSE NULL
END,
'dq_remediation_batch_2024_001'
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
AND phone_number !~ '^\+?[0-9]{10,15}$';

Verify the insert count matches your planning estimate:

INSERT 0 4259
  3. Execute the correction statement:
UPDATE beneficiary_registry
SET phone_number = CASE
WHEN phone_number ~ '^\+?[0-9]{10,15}$'
THEN phone_number
WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$'
THEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g')
ELSE NULL
END,
phone_number_remediation_flag = CASE
WHEN phone_number ~ '^\+?[0-9]{10,15}$' THEN 'valid'
WHEN REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') ~ '^\+?[0-9]{10,15}$' THEN 'auto_corrected'
ELSE 'requires_review'
END,
last_modified = CURRENT_TIMESTAMP,
modified_by = 'dq_remediation_batch_2024_001'
WHERE phone_number IS NOT NULL
AND phone_number !~ '^\+?[0-9]{10,15}$';

Verify the update count:

UPDATE 4259
  4. Perform immediate validation before committing:
-- Check no valid records were corrupted
SELECT COUNT(*) as still_valid
FROM beneficiary_registry
WHERE phone_number ~ '^\+?[0-9]{10,15}$';
-- Check correction counts match expectations
SELECT phone_number_remediation_flag, COUNT(*)
FROM beneficiary_registry
WHERE modified_by = 'dq_remediation_batch_2024_001'
GROUP BY 1;

Expected output:

phone_number_remediation_flag | count
------------------------------|-------
auto_corrected | 3847
requires_review | 412

If counts deviate significantly from planning estimates, investigate before committing.

  5. Commit or rollback based on validation results:
-- If validation passes:
COMMIT;
-- If validation fails:
ROLLBACK;

Commit decision

Only commit if validation counts match planning estimates within 2% tolerance. Larger variances indicate logic errors requiring investigation.

  6. Update the remediation queue to reflect completed work:
UPDATE data_quality.remediation_queue
SET status = 'completed',
completion_timestamp = CURRENT_TIMESTAMP,
records_remediated = 4259,
notes = 'Batch BATCH_2024_001: 3847 auto-corrected, 412 set for manual review'
WHERE dataset_name = 'beneficiary_registry'
AND violation_type = 'invalid_format'
AND affected_column = 'phone_number'
AND status = 'pending';

Phase 5: Duplicate resolution

Duplicate records require specialised handling because resolution involves choosing survivor records and merging or purging losers. This phase applies when your remediation batch includes uniqueness violations.

  1. Identify duplicate clusters using the matching criteria defined in your quality rules:
-- Find duplicates by matching key (name + date of birth + location)
WITH duplicate_clusters AS (
SELECT
LOWER(TRIM(first_name)) || '|' ||
LOWER(TRIM(last_name)) || '|' ||
date_of_birth || '|' ||
location_pcode AS match_key,
ARRAY_AGG(beneficiary_id ORDER BY registration_date) as duplicate_ids,
COUNT(*) as cluster_size
FROM beneficiary_registry
GROUP BY 1
HAVING COUNT(*) > 1
)
SELECT * FROM duplicate_clusters
ORDER BY cluster_size DESC
LIMIT 100;
  2. Apply survivorship rules to select the golden record from each cluster. Survivorship rules define which record’s values take precedence. Common rules include:

    • Most recent registration date wins
    • Most complete record (fewest NULLs) wins
    • Specific source system wins (e.g., biometric registration over paper)
    • Earliest record wins (original registration)
-- Identify survivors using "most complete record" rule.
-- Materialise the ranked results so the merge, archive, and delete
-- steps that follow can reference the same survivor_rank assignments.
CREATE TEMP TABLE duplicate_clusters AS
SELECT
LOWER(TRIM(first_name)) || '|' ||
LOWER(TRIM(last_name)) || '|' ||
date_of_birth || '|' ||
location_pcode AS match_key,
beneficiary_id,
phone_number,
-- Count non-null fields as completeness score
(CASE WHEN phone_number IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN email IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN national_id IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN household_id IS NOT NULL THEN 1 ELSE 0 END) as completeness_score,
registration_date,
ROW_NUMBER() OVER (
PARTITION BY LOWER(TRIM(first_name)) || '|' ||
LOWER(TRIM(last_name)) || '|' ||
date_of_birth || '|' ||
location_pcode
ORDER BY
-- Primary: completeness
(CASE WHEN phone_number IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN email IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN national_id IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN household_id IS NOT NULL THEN 1 ELSE 0 END) DESC,
-- Secondary: earliest registration
registration_date ASC
) as survivor_rank
FROM beneficiary_registry;

-- Review the resolution for multi-record clusters
SELECT match_key, beneficiary_id, completeness_score, registration_date,
CASE WHEN survivor_rank = 1 THEN 'survivor' ELSE 'duplicate' END as resolution
FROM duplicate_clusters
WHERE match_key IN (
SELECT match_key FROM duplicate_clusters GROUP BY 1 HAVING COUNT(*) > 1
)
ORDER BY match_key, survivor_rank;
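The survivorship ordering can also run outside the database, for example when deduplicating an extract in a script. A sketch of the "most complete record" rule with illustrative field names:

```python
from typing import Dict, List

SCORED_FIELDS = ('phone_number', 'email', 'national_id', 'household_id')

def pick_survivor(cluster: List[Dict]) -> Dict:
    """Return the golden record: highest completeness wins, ties broken
    by earliest registration_date (matching the SQL ROW_NUMBER ordering)."""
    def completeness(rec: Dict) -> int:
        return sum(rec.get(f) is not None for f in SCORED_FIELDS)
    return min(cluster, key=lambda r: (-completeness(r), r['registration_date']))
```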
  3. Merge valuable data from duplicate records into survivors before deletion:
-- Merge phone numbers from duplicates into survivors where survivor is NULL
UPDATE beneficiary_registry survivor
SET phone_number = dup.phone_number,
last_modified = CURRENT_TIMESTAMP,
modified_by = 'dq_dedup_batch_2024_001'
FROM (
SELECT match_key, beneficiary_id, phone_number
FROM duplicate_clusters
WHERE survivor_rank > 1
AND phone_number IS NOT NULL
) dup
JOIN (
SELECT match_key, beneficiary_id
FROM duplicate_clusters
WHERE survivor_rank = 1
AND phone_number IS NULL
) surv ON dup.match_key = surv.match_key
WHERE survivor.beneficiary_id = surv.beneficiary_id;
  4. Archive duplicate records before deletion:
INSERT INTO beneficiary_registry_archive
SELECT *, CURRENT_TIMESTAMP as archived_at, 'duplicate_resolution' as archive_reason
FROM beneficiary_registry
WHERE beneficiary_id IN (
SELECT beneficiary_id
FROM duplicate_clusters
WHERE survivor_rank > 1
);
  5. Delete archived duplicates:
DELETE FROM beneficiary_registry
WHERE beneficiary_id IN (
SELECT beneficiary_id
FROM duplicate_clusters
WHERE survivor_rank > 1
);

Verify deletion count matches expected duplicates:

DELETE 847

Phase 6: Standardisation and missing data

Standardisation corrects inconsistent representations of the same value. Missing data handling applies default values or derives values from related fields.

  1. Apply standardisation transformations for categorical fields:
-- Standardise gender field
UPDATE beneficiary_registry
SET gender = CASE
WHEN UPPER(TRIM(gender)) IN ('M', 'MALE', 'MAN', 'HOMME') THEN 'Male'
WHEN UPPER(TRIM(gender)) IN ('F', 'FEMALE', 'WOMAN', 'FEMME') THEN 'Female'
WHEN UPPER(TRIM(gender)) IN ('O', 'OTHER', 'AUTRE', 'NON-BINARY') THEN 'Other'
ELSE gender -- Preserve unrecognised values for manual review
END,
last_modified = CURRENT_TIMESTAMP,
modified_by = 'dq_standardisation_batch_2024_001'
WHERE gender IS NOT NULL
AND gender NOT IN ('Male', 'Female', 'Other');
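The mapping is straightforward to reuse in a script-based cleansing pass. A sketch; the variant list matches the SQL above, and in practice should come from your governance reference data:

```python
GENDER_MAP = {
    **dict.fromkeys(('M', 'MALE', 'MAN', 'HOMME'), 'Male'),
    **dict.fromkeys(('F', 'FEMALE', 'WOMAN', 'FEMME'), 'Female'),
    **dict.fromkeys(('O', 'OTHER', 'AUTRE', 'NON-BINARY'), 'Other'),
}

def standardise_gender(value: str) -> str:
    """Map known variants to canonical labels; preserve unrecognised
    values so they surface for manual review."""
    return GENDER_MAP.get(value.strip().upper(), value)
```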
  2. Derive missing values from related fields where business rules permit:
-- Derive age_group from date_of_birth where age_group is NULL
UPDATE beneficiary_registry
SET age_group = CASE
WHEN DATE_PART('year', AGE(date_of_birth)) < 5 THEN '0-4'
WHEN DATE_PART('year', AGE(date_of_birth)) < 18 THEN '5-17'
WHEN DATE_PART('year', AGE(date_of_birth)) < 60 THEN '18-59'
ELSE '60+'
END,
last_modified = CURRENT_TIMESTAMP,
modified_by = 'dq_derivation_batch_2024_001'
WHERE age_group IS NULL
AND date_of_birth IS NOT NULL;
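The derivation rule, expressed as a function for script-based remediation (a sketch; whole-year age calculation is an assumption about the business rule):

```python
from datetime import date

def age_group(dob: date, as_of: date) -> str:
    """Derive the age band from date of birth using age in whole years."""
    years = as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))
    if years < 5:
        return '0-4'
    if years < 18:
        return '5-17'
    if years < 60:
        return '18-59'
    return '60+'
```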
  3. Apply default values for mandatory fields where organisational policy permits:
-- Set default registration status for records missing status
UPDATE beneficiary_registry
SET registration_status = 'pending_verification',
last_modified = CURRENT_TIMESTAMP,
modified_by = 'dq_default_batch_2024_001'
WHERE registration_status IS NULL;

Default value policy

Apply defaults only for fields where your data governance policy explicitly permits default assignment. Document default logic in the remediation ticket for audit purposes.

Verification

Verification confirms remediation achieved its objectives without introducing new problems.

Execute a full data profile after remediation completes:

-- Trigger re-profiling (tool-specific; example for Great Expectations)
-- great_expectations checkpoint run beneficiary_registry_checkpoint

For manual verification, run the original quality rules against remediated data:

-- Recount phone number violations
SELECT COUNT(*) as remaining_violations
FROM beneficiary_registry
WHERE phone_number IS NOT NULL
AND phone_number !~ '^\+?[0-9]{10,15}$';

Expected result: 0 remaining violations, because auto-correctable records were fixed and uncorrectable values were set to NULL and flagged requires_review, so they no longer match the NOT NULL condition.

Compare quality scores before and after remediation:

SELECT
p.profile_name,
p.run_timestamp,
p.total_records,
p.passing_records,
ROUND(100.0 * p.passing_records / p.total_records, 2) as quality_score_pct
FROM data_quality.profile_results p
WHERE p.dataset_name = 'beneficiary_registry'
AND p.rule_name = 'phone_format_valid'
ORDER BY p.run_timestamp DESC
LIMIT 2;

Expected output showing improvement:

profile_name | run_timestamp | total_records | passing_records | quality_score_pct
--------------------|---------------------|---------------|-----------------|------------------
post_remediation | 2024-11-16 14:30:00 | 49489 | 49077 | 99.17
pre_remediation | 2024-11-16 09:00:00 | 49489 | 45230 | 91.39

Perform spot checks on remediated records to verify corrections are sensible:

SELECT beneficiary_id,
original_value,
new_value,
correction_timestamp
FROM data_quality.remediation_log
WHERE remediation_batch = 'BATCH_2024_001'
ORDER BY RANDOM()
LIMIT 20;

Review the 20 randomly selected corrections. Each should show a logical transformation from invalid to valid format.

Verify no downstream system failures occurred after remediation by checking application logs or monitoring dashboards for the 2 hours following correction completion.

Troubleshooting

Symptom | Cause | Resolution
--------|-------|-----------
UPDATE affects 0 rows | WHERE clause too restrictive, or data already remediated | Verify the issue still exists with a SELECT using the same WHERE clause; check whether a previous remediation batch already corrected it
UPDATE affects more rows than expected | WHERE clause too broad; missing condition | ROLLBACK immediately; add additional predicates to limit scope; re-test with SELECT before UPDATE
Transaction timeout during large batch | Batch size exceeds statement_timeout setting | Split remediation into smaller batches of 10,000-50,000 records; increase timeout with SET LOCAL statement_timeout = '2h'
Foreign key violation on DELETE | Duplicate records referenced by child tables | Update child table foreign keys to point to the survivor record before deleting duplicates
Deadlock detected | Concurrent processes modifying the same records | Retry after a delay; schedule remediation during low-activity windows; use explicit row locking with SELECT ... FOR UPDATE
Character encoding errors in standardisation | Source data contains mixed encodings | Convert encoding before standardisation: CONVERT_FROM(column::bytea, 'UTF8')
Regex pattern not matching expected records | Pattern syntax differs between databases | Verify the regex flavour (POSIX vs PCRE); test the pattern in a database-specific regex tester
Remediation log INSERT fails with duplicate key | Log table primary key conflict from a previous run | Truncate the log table for the current batch or use a different batch identifier
Quality score unchanged after remediation | Re-profiling not executed, or cached results | Force a fresh profile run; clear the profile cache; verify the profile query references the correct timestamp
New violations appear immediately after remediation | Source system continuing to load invalid data | Implement the source fix before re-remediating; add validation to the ingestion pipeline
Rollback fails with “transaction already committed” | COMMIT executed before the problem was discovered | Restore from the remediation_log table using a reverse UPDATE; if the log is incomplete, restore from backup
Memory exhaustion during large UPDATE | Database loading the entire result set into memory | Process in batches using LIMIT and OFFSET; use a cursor-based approach
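The batching remedies in the table can be sketched as a key-range generator, with each range bounding one UPDATE statement. This assumes an integer surrogate key and is an illustration, not a prescribed utility:

```python
from typing import Iterator, Tuple

def id_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) key ranges so each batched UPDATE
    touches at most batch_size keys and stays within statement_timeout."""
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1
```

Key-range batches avoid the rescanning cost that LIMIT/OFFSET incurs on large tables.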

Recurring issue investigation

When the same quality issues reappear within 30 days of remediation, the root cause was not addressed. Investigate using this sequence:

Query the lineage for records that failed quality rules after remediation:

SELECT l.source_system,
l.load_job_id,
l.load_timestamp,
COUNT(*) as new_violations
FROM beneficiary_registry b
JOIN data_lineage.load_audit l ON b.load_batch_id = l.load_job_id
WHERE b.phone_number !~ '^\+?[0-9]{10,15}$'
AND b.registration_date > (
SELECT MAX(correction_timestamp)
FROM data_quality.remediation_log
WHERE remediation_batch = 'BATCH_2024_001'
)
GROUP BY 1, 2, 3
ORDER BY l.load_timestamp;

This identifies which source system and load job introduced new violations. Address by implementing validation at the ingestion point rather than repeated remediation.

See also