
Backup Verification

Backup verification confirms that backup data exists, remains intact, and can be restored within required timeframes. Verification operates at three levels: integrity checking validates that backup files are uncorrupted, restore testing proves that data can be recovered to a usable state, and compliance documentation demonstrates verification activities to auditors and regulators. Without systematic verification, organisations discover backup failures only when attempting recovery during an incident.

Prerequisites

Before beginning backup verification activities, confirm the following requirements are in place.

| Requirement | Detail |
| --- | --- |
| Backup system access | Administrative access to the backup platform (Restic, Veeam, Azure Backup, AWS Backup, or equivalent) |
| Test restore environment | Isolated infrastructure for restore testing that does not affect production systems |
| Storage capacity | Sufficient space for test restores: at least 1.5× the size of the largest backup set |
| Documentation access | Write access to the backup verification log repository |
| Scheduling authority | Ability to schedule verification jobs during maintenance windows |
| Time allocation | 4 hours weekly for routine verification; 8 hours monthly for full restore tests |

Verify that backup jobs are completing successfully before beginning verification. A backup that fails to run produces nothing to verify:

Terminal window
# Check recent backup job status (Restic example)
restic -r /path/to/repo snapshots --latest 5
# Expected output shows recent snapshots with timestamps
# ID        Time                 Host        Tags  Paths
# a1b2c3d4  2024-11-15 02:00:01  prod-db-01        /var/lib/postgresql
# e5f67a8b  2024-11-14 02:00:01  prod-db-01        /var/lib/postgresql

If no recent snapshots appear, investigate the backup job before proceeding with verification.
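This freshness gate can be made automatic. The sketch below is an illustrative assumption, not part of the scripts later in this guide: the hard-coded timestamp stands in for the value you would read from `restic snapshots --latest 1 --json | jq -r '.[0].time'`, and the 26-hour threshold allows slack around a daily 02:00 job.

```shell
#!/bin/bash
# Flag the backup as stale if the newest snapshot exceeds a threshold age.
# LATEST_TIME is a hard-coded stand-in for live restic output.
LATEST_TIME="2024-11-15T02:00:01Z"   # illustrative value, not live data
MAX_AGE_SECONDS=$((26 * 3600))       # 26h allows slack around a daily job

LATEST_EPOCH=$(date -d "$LATEST_TIME" +%s)
NOW_EPOCH=$(date +%s)
AGE=$((NOW_EPOCH - LATEST_EPOCH))

if [ "$AGE" -gt "$MAX_AGE_SECONDS" ]; then
    echo "STALE: latest snapshot is ${AGE}s old (limit ${MAX_AGE_SECONDS}s)"
else
    echo "FRESH: latest snapshot is ${AGE}s old"
fi
```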

Procedure

Backup verification follows a tiered approach: automated integrity checks run continuously, scheduled restore tests occur weekly and monthly, and compliance evidence collection happens quarterly or as required by audit schedules.

Configuring automated integrity verification

Automated verification catches corruption, storage failures, and incomplete backups without manual intervention. Configure these checks to run after each backup job completes.

  1. Enable backup integrity checking in your backup configuration. For Restic-based backups, add verification to the post-backup script:
/opt/backup/scripts/backup-with-verify.sh
#!/bin/bash
REPO="/mnt/backup/restic-repo"
BACKUP_PATH="/var/lib/postgresql"

# Run backup
restic -r "$REPO" backup "$BACKUP_PATH" --tag postgresql
BACKUP_EXIT=$?

# Verify backup integrity only if the backup itself succeeded
if [ $BACKUP_EXIT -eq 0 ]; then
    restic -r "$REPO" check --read-data-subset=5%
    VERIFY_EXIT=$?
else
    VERIFY_EXIT=1
fi

# Report results
if [ $VERIFY_EXIT -ne 0 ]; then
    echo "CRITICAL: Backup verification failed" | \
        mail -s "Backup Alert: $(hostname)" backup-alerts@example.org
    exit 1
fi
echo "Backup and verification completed successfully"
exit 0

The --read-data-subset=5% flag reads back and verifies a random 5% sample of the repository on each run. Because each sample is drawn independently, coverage accumulates across runs: after 20 nightly checks, roughly 64% of the data is expected to have been read at least once. This keeps per-run verification time manageable, typically under 30 minutes for repositories up to 2TB, but it never guarantees full coverage; the scheduled full --read-data check in the next step closes that gap.
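The coverage arithmetic is easy to check directly. Sampling a random 5% each run means the expected fraction read at least once after N runs is 1 − 0.95^N; a small sketch:

```shell
# Expected % of repository data read at least once after N independent
# runs of `restic check --read-data-subset=5%`: 1 - (1 - 0.05)^N
coverage() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", (1 - 0.95 ^ n) * 100 }'
}
coverage 20   # prints 64.2: about two-thirds of the data after 20 runs
coverage 60   # prints 95.4: near-complete coverage takes months of nightly runs
```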

  2. Configure verification scheduling in your job scheduler. For systemd-based systems, create a timer that runs verification independently of backup jobs:
/etc/systemd/system/backup-verify.timer
[Unit]
Description=Weekly full backup verification
[Timer]
OnCalendar=Sun 04:00
Persistent=true
RandomizedDelaySec=1800
[Install]
WantedBy=timers.target
/etc/systemd/system/backup-verify.service
[Unit]
Description=Full backup repository verification
[Service]
Type=oneshot
ExecStart=/opt/backup/scripts/full-verify.sh
User=backup
StandardOutput=journal
StandardError=journal

Enable the timer:

Terminal window
sudo systemctl enable --now backup-verify.timer
  3. Create the full verification script that performs comprehensive integrity checking:
/opt/backup/scripts/full-verify.sh
#!/bin/bash
REPO="/mnt/backup/restic-repo"
LOG_DIR="/var/log/backup-verify"
DATE=$(date +%Y%m%d)
mkdir -p "$LOG_DIR"
echo "Starting full verification at $(date)" | tee "$LOG_DIR/verify-$DATE.log"
# Full repository check (reads all data)
restic -r "$REPO" check --read-data 2>&1 | tee -a "$LOG_DIR/verify-$DATE.log"
CHECK_EXIT=${PIPESTATUS[0]}
# Record the snapshot inventory alongside the check results
restic -r "$REPO" snapshots --json | \
    jq -r '.[] | "\(.time) \(.hostname) \(.paths[])"' | \
    tee -a "$LOG_DIR/verify-$DATE.log"
# Remove any stale locks left by interrupted jobs
restic -r "$REPO" unlock 2>&1 | tee -a "$LOG_DIR/verify-$DATE.log"
echo "Verification completed with exit code $CHECK_EXIT at $(date)" | \
tee -a "$LOG_DIR/verify-$DATE.log"
exit $CHECK_EXIT
  4. Configure alerting for verification failures. Create an alert rule that triggers when verification jobs fail or do not run:
/etc/prometheus/rules/backup-alerts.yml
groups:
  - name: backup_verification
    rules:
      - alert: BackupVerificationFailed
        expr: backup_verify_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Backup verification failed on {{ $labels.instance }}"
          description: "Backup verification has been failing for more than 5 minutes."
      - alert: BackupVerificationMissing
        expr: time() - backup_verify_last_success > 604800
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "No successful backup verification in 7 days"
          description: "Instance {{ $labels.instance }} has not had a successful verification in over 7 days."

For environments without Prometheus, configure email alerts through cron or your backup platform’s native alerting.
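As a minimal cron-based fallback, two crontab entries can cover both failure alerting and absence alerting. These lines are illustrative assumptions (the paths and alert address match the examples above); note that `%` must be escaped as `\%` inside crontab entries.

```
# Illustrative crontab entries for the backup user (crontab -e)
# Re-run the full integrity check every Sunday at 04:00; mail on failure
0 4 * * 0  /opt/backup/scripts/full-verify.sh || echo "Verification failed on $(hostname)" | mail -s "Backup Alert" backup-alerts@example.org
# Monday 08:30: alert if no verification log appeared in the past 8 days
30 8 * * 1  find /var/log/backup-verify -name 'verify-*.log' -mtime -8 | grep -q . || echo "No verification log this week" | mail -s "Backup Alert" backup-alerts@example.org
```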

Performing scheduled restore tests

Automated integrity checks confirm that backup data is uncorrupted, but only restore tests prove that data can actually be recovered. Schedule restore tests according to the criticality of each data type.

+--------------------------------------------------------------------+
| RESTORE TEST SCHEDULE |
+--------------------------------------------------------------------+
| |
| DATA CRITICALITY TEST FREQUENCY RESTORE SCOPE |
| +-----------------+ +---------------+ +---------------+ |
| | Critical | | Weekly | | Sample files | |
| | (databases, +------>| (every Sunday)| | + full DB | |
| | financial) | | | | monthly | |
| +-----------------+ +---------------+ +---------------+ |
| |
| +-----------------+ +---------------+ +---------------+ |
| | Important | | Monthly | | Representative| |
| | (documents, +------>| (1st Sunday) | | sample | |
| | email) | | | | | |
| +-----------------+ +---------------+ +---------------+ |
| |
| +-----------------+ +---------------+ +---------------+ |
| | Standard | | Quarterly | | Random sample | |
| | (user files, +------>| (Jan/Apr/ | | | |
| | archives) | | Jul/Oct) | | | |
| +-----------------+ +---------------+ +---------------+ |
| |
+--------------------------------------------------------------------+

Figure 1: Restore test frequency aligned to data criticality

  1. Prepare your test restore environment. The environment must be isolated from production to prevent test data from contaminating live systems:
Terminal window
# Create isolated restore target directory
sudo mkdir -p /mnt/restore-test
sudo chown backup:backup /mnt/restore-test
# For database restores, prepare a test database instance
# (PostgreSQL example)
sudo -u postgres createdb restore_test_db

For cloud-based backup systems, provision a separate resource group or project for restore testing:

Terminal window
# Azure example
az group create --name rg-restore-test --location uksouth
# AWS example
aws ec2 create-vpc --cidr-block 10.99.0.0/16 \
--tag-specifications 'ResourceType=vpc,Tags=[{Key=Purpose,Value=restore-test}]'
  2. Execute a file-level restore test. Select files from different time periods to verify both recent and older backups:
Terminal window
# Restore specific files from the most recent backup
restic -r /mnt/backup/restic-repo restore latest \
    --target /mnt/restore-test \
    --include "/var/lib/postgresql/data/base" \
    --verify

# Restore from the newest snapshot that is at least 30 days old
# (restic lists snapshots oldest first, so take the last match)
SNAPSHOT_30D=$(restic -r /mnt/backup/restic-repo snapshots \
    --json | jq -r '[.[] | select(.time < (now - 2592000 | todate))] | .[-1].id')
restic -r /mnt/backup/restic-repo restore "$SNAPSHOT_30D" \
    --target /mnt/restore-test-30d \
    --include "/etc" \
    --verify

The --verify flag compares restored file checksums against the backup repository, confirming data integrity through the entire restore chain.

  3. Execute a database restore test. Database restores require bringing the restored data online to confirm usability:
Terminal window
# PostgreSQL restore test
# 1. Restore the data directory
restic -r /mnt/backup/restic-repo restore latest \
    --target /mnt/restore-test/pg-data \
    --include "/var/lib/postgresql/14/main"

# 2. Start PostgreSQL against the restored data on a non-production port
sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl \
    -D /mnt/restore-test/pg-data/var/lib/postgresql/14/main \
    -o "-p 5433" \
    start

# 3. Verify database accessibility and run consistency checks
psql -h localhost -p 5433 -U postgres -c "SELECT count(*) FROM pg_tables;"
psql -h localhost -p 5433 -U postgres -d production_db \
    -c "SELECT schemaname, tablename FROM pg_tables WHERE schemaname = 'public';"

# 4. Run application-specific verification queries
psql -h localhost -p 5433 -U postgres -d production_db \
    -c "SELECT COUNT(*) AS beneficiary_count FROM beneficiaries;"
# Expected: count matches or closely matches production
# Deviation greater than 1% indicates a potential backup issue

# 5. Shut down the test instance and clean up
sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl \
    -D /mnt/restore-test/pg-data/var/lib/postgresql/14/main \
    stop
rm -rf /mnt/restore-test/pg-data
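The row-count tolerance check can be scripted rather than eyeballed. A sketch, with illustrative counts standing in for the psql query results; the 1% threshold mirrors the comment in the test above:

```shell
# Compare a restored row count against production; flag deviation above 1%.
# Counts below are illustrative stand-ins for live query results.
PROD_COUNT=15851
RESTORED_COUNT=15847
DELTA=$(awk -v p="$PROD_COUNT" -v r="$RESTORED_COUNT" \
    'BEGIN { d = (p - r) / p * 100; if (d < 0) d = -d; printf "%.2f\n", d }')
echo "Deviation: ${DELTA}%"
awk -v d="$DELTA" 'BEGIN { exit (d > 1.0) ? 1 : 0 }' \
    && echo "PASS: within 1% tolerance" \
    || echo "FAIL: investigate backup"
```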
  4. Document restore test results. Create a verification record for each test:
Terminal window
# Generate restore test report
cat > /var/log/backup-verify/restore-test-$(date +%Y%m%d).md << EOF
# Restore Test Report
Date: $(date +%Y-%m-%d)
Tester: $(whoami)
Backup Repository: /mnt/backup/restic-repo
## File Restore Test
- Snapshot ID: $(restic -r /mnt/backup/restic-repo snapshots --latest 1 --json | jq -r '.[0].id')
- Snapshot Date: $(restic -r /mnt/backup/restic-repo snapshots --latest 1 --json | jq -r '.[0].time')
- Files Restored: 847
- Verification: PASSED (checksums match)
- Restore Duration: 4m 23s
## Database Restore Test
- Database: production_db
- Table Count: 42 (matches production)
- Row Count Check: beneficiaries: 15,847 (production: 15,851, delta: 0.03%)
- Verification: PASSED
- Restore Duration: 12m 07s
## Issues Identified
None
## Sign-off
Test completed successfully. Backups verified recoverable.
EOF
  5. Measure and record restore performance. Recovery Time Objective (RTO) compliance depends on knowing actual restore speeds:
Terminal window
# Timed full restore test (run quarterly)
START=$(date +%s)
restic -r /mnt/backup/restic-repo restore latest \
--target /mnt/restore-test/full \
--verify
END=$(date +%s)
DURATION=$((END - START))
SIZE=$(du -sh /mnt/restore-test/full | cut -f1)
echo "Full restore: $SIZE in $DURATION seconds"
echo "Restore rate: $(echo "scale=2; $(du -sb /mnt/restore-test/full | cut -f1) / $DURATION / 1048576" | bc) MB/s"
# Compare against RTO
# If RTO is 4 hours (14400 seconds) and restore took 3600 seconds,
# you have 75% margin
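That comparison is easy to automate at the end of the timed restore. The figures below are illustrative; in practice RESTORE_SECONDS would come from the DURATION variable measured above:

```shell
# Express the measured restore time as remaining margin against the RTO.
RTO_SECONDS=14400      # 4-hour RTO
RESTORE_SECONDS=3600   # measured full-restore duration (illustrative)
MARGIN=$(( (RTO_SECONDS - RESTORE_SECONDS) * 100 / RTO_SECONDS ))
echo "Restore used $((RESTORE_SECONDS * 100 / RTO_SECONDS))% of RTO; ${MARGIN}% margin remains"
```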

Configuring verification for cloud backup services

Cloud backup services provide built-in verification capabilities that differ from self-managed backup tools. Configure these features to provide equivalent assurance.

  1. Enable Azure Backup verification features:
Terminal window
# Enable soft delete and cross-region restore for the Recovery Services vault
az backup vault backup-properties set \
    --resource-group rg-backup \
    --name vault-backup-prod \
    --soft-delete-feature-state Enable \
    --cross-region-restore-flag True

# Configure a daily backup policy for IaaS VMs
az backup policy create \
    --resource-group rg-backup \
    --vault-name vault-backup-prod \
    --name policy-daily-verified \
    --backup-management-type AzureIaasVM \
    --policy '{
      "schedulePolicy": {
        "schedulePolicyType": "SimpleSchedulePolicy",
        "scheduleRunFrequency": "Daily",
        "scheduleRunTimes": ["2024-01-01T02:00:00Z"]
      },
      "retentionPolicy": {
        "retentionPolicyType": "LongTermRetentionPolicy",
        "dailySchedule": {
          "retentionDuration": {"count": 30, "durationType": "Days"}
        }
      },
      "instantRpRetentionRangeInDays": 5
    }'
  2. Configure AWS Backup verification:
Terminal window
# Create a backup plan with continuous backup enabled
aws backup create-backup-plan --backup-plan '{
  "BackupPlanName": "daily-verified",
  "Rules": [{
    "RuleName": "daily-backup",
    "TargetBackupVaultName": "backup-vault-prod",
    "ScheduleExpression": "cron(0 2 * * ? *)",
    "StartWindowMinutes": 60,
    "CompletionWindowMinutes": 180,
    "Lifecycle": {
      "MoveToColdStorageAfterDays": 30,
      "DeleteAfterDays": 365
    },
    "EnableContinuousBackup": true
  }]
}'

# Create a restore testing plan
aws backup create-restore-testing-plan --restore-testing-plan '{
  "RestoreTestingPlanName": "weekly-restore-test",
  "ScheduleExpression": "cron(0 4 ? * SUN *)",
  "StartWindowHours": 4,
  "RecoveryPointSelection": {
    "Algorithm": "LATEST_WITHIN_WINDOW",
    "IncludeVaults": ["arn:aws:backup:eu-west-1:123456789:backup-vault:backup-vault-prod"],
    "RecoveryPointTypes": ["CONTINUOUS", "SNAPSHOT"]
  }
}'
  3. Schedule and monitor cloud restore tests:
Terminal window
# AWS: Check restore testing job status
aws backup list-restore-testing-plans
aws backup list-restore-jobs --by-restore-testing-plan-arn \
"arn:aws:backup:eu-west-1:123456789:restore-testing-plan:weekly-restore-test"
# Azure: Check restore job history
az backup job list \
--resource-group rg-backup \
--vault-name vault-backup-prod \
--operation Restore \
--output table

Establishing verification metrics and reporting

Verification activities generate data that demonstrates backup health over time and provides evidence for compliance requirements.

  1. Define key verification metrics. Track these indicators to identify trends before they become failures:
# Create metrics collection script
cat > /opt/backup/scripts/collect-metrics.sh << 'EOF'
#!/bin/bash
REPO="/mnt/backup/restic-repo"
METRICS_FILE="/var/lib/prometheus/node-exporter/backup_metrics.prom"
# Metric: Last successful verification timestamp
LAST_VERIFY=$(stat -c %Y /var/log/backup-verify/verify-*.log 2>/dev/null | sort -rn | head -1)
echo "backup_verify_last_success ${LAST_VERIFY:-0}" > "$METRICS_FILE"
# Metric: Repository size
REPO_SIZE=$(restic -r "$REPO" stats --json 2>/dev/null | jq -r '.total_size // 0')
echo "backup_repository_size_bytes $REPO_SIZE" >> "$METRICS_FILE"
# Metric: Snapshot count
SNAPSHOT_COUNT=$(restic -r "$REPO" snapshots --json 2>/dev/null | jq -r 'length // 0')
echo "backup_snapshot_count $SNAPSHOT_COUNT" >> "$METRICS_FILE"
# Metric: Latest snapshot age (seconds)
LATEST_TIME=$(restic -r "$REPO" snapshots --latest 1 --json 2>/dev/null | \
jq -r '.[0].time // "1970-01-01T00:00:00Z"')
LATEST_EPOCH=$(date -d "$LATEST_TIME" +%s 2>/dev/null || echo 0)
NOW_EPOCH=$(date +%s)
SNAPSHOT_AGE=$((NOW_EPOCH - LATEST_EPOCH))
echo "backup_latest_snapshot_age_seconds $SNAPSHOT_AGE" >> "$METRICS_FILE"
# Metric: Verification success (1=success, 0=failure)
if grep -q "exit code 0" /var/log/backup-verify/verify-$(date +%Y%m%d).log 2>/dev/null; then
    echo "backup_verify_success 1" >> "$METRICS_FILE"
else
    echo "backup_verify_success 0" >> "$METRICS_FILE"
fi
EOF
chmod +x /opt/backup/scripts/collect-metrics.sh
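Before pointing the node_exporter textfile collector at this output, it is worth confirming the file parses as simple name-value pairs, since a malformed line can cause the collector to reject the file. A self-contained sketch; the sample content stands in for the real backup_metrics.prom:

```shell
# Sanity-check a textfile-collector metrics file: every line should be
# "metric_name numeric_value". Sample file stands in for the live one.
METRICS_FILE=$(mktemp)
cat > "$METRICS_FILE" << 'EOF'
backup_verify_last_success 1731549601
backup_verify_success 1
backup_snapshot_count 30
EOF

BAD_LINES=$(awk 'NF != 2 || $2 !~ /^[0-9.eE+-]+$/ { bad++ } END { print bad + 0 }' "$METRICS_FILE")
if [ "$BAD_LINES" -eq 0 ]; then
    echo "Metrics file OK"
else
    echo "Malformed lines: $BAD_LINES"
fi
rm -f "$METRICS_FILE"
```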
  2. Create a verification dashboard or report template:
# Generate weekly verification report
cat > /opt/backup/scripts/weekly-report.sh << 'EOF'
#!/bin/bash
REPORT_DIR="/var/log/backup-verify/reports"
mkdir -p "$REPORT_DIR"
WEEK=$(date +%Y-W%V)
REPORT="$REPORT_DIR/weekly-$WEEK.md"
cat > "$REPORT" << HEADER
# Backup Verification Weekly Report
Week: $WEEK
Generated: $(date +%Y-%m-%d)
## Summary
| Metric | Value | Status |
|--------|-------|--------|
HEADER
# Calculate metrics
VERIFY_COUNT=$(ls -1 /var/log/backup-verify/verify-*.log 2>/dev/null | \
xargs -I {} sh -c 'date -d "$(basename {} .log | cut -d- -f2)" +%s' | \
awk -v week_start=$(date -d "last sunday" +%s) '$1 >= week_start' | wc -l)
RESTORE_TESTS=$(grep -l "Restore Test Report" /var/log/backup-verify/*.md 2>/dev/null | \
xargs -I {} sh -c 'date -d "$(basename {} .md | cut -d- -f3)" +%s' | \
awk -v week_start=$(date -d "last sunday" +%s) '$1 >= week_start' | wc -l)
FAILURES=$(grep -l "FAILED\|exit code [1-9]" /var/log/backup-verify/verify-*.log 2>/dev/null | wc -l)
echo "| Integrity checks completed | $VERIFY_COUNT | $([ $VERIFY_COUNT -ge 7 ] && echo '✓' || echo '⚠') |" >> "$REPORT"
echo "| Restore tests completed | $RESTORE_TESTS | $([ $RESTORE_TESTS -ge 1 ] && echo '✓' || echo '⚠') |" >> "$REPORT"
echo "| Verification failures | $FAILURES | $([ $FAILURES -eq 0 ] && echo '✓' || echo '✗') |" >> "$REPORT"
cat >> "$REPORT" << FOOTER
## Restore Test Results
$(cat /var/log/backup-verify/restore-test-*.md 2>/dev/null | grep -A 20 "^## " | head -40)
## Issues Requiring Attention
$(grep -h "FAILED\|ERROR\|WARNING" /var/log/backup-verify/verify-*.log 2>/dev/null | sort -u | head -10)
---
*Report generated automatically. Review and archive for compliance.*
FOOTER
echo "Report generated: $REPORT"
EOF
chmod +x /opt/backup/scripts/weekly-report.sh
  3. Archive verification evidence for compliance. Retain verification records according to your data retention policy, typically matching or exceeding backup retention:
# Archive verification logs monthly
cat > /opt/backup/scripts/archive-verification.sh << 'EOF'
#!/bin/bash
ARCHIVE_DIR="/mnt/archive/backup-verification"
LOG_DIR="/var/log/backup-verify"
MONTH=$(date -d "last month" +%Y-%m)
mkdir -p "$ARCHIVE_DIR"
# Create compressed archive of month's verification logs
tar -czf "$ARCHIVE_DIR/verification-$MONTH.tar.gz" \
-C "$LOG_DIR" \
$(find "$LOG_DIR" -name "*$MONTH*" -type f -printf "%f\n")
# Generate SHA256 checksum for archive integrity
sha256sum "$ARCHIVE_DIR/verification-$MONTH.tar.gz" > \
"$ARCHIVE_DIR/verification-$MONTH.tar.gz.sha256"
# Remove archived logs from active directory (keep 3 months online)
find "$LOG_DIR" -name "*.log" -mtime +90 -delete
find "$LOG_DIR" -name "*.md" -mtime +90 -delete
EOF
chmod +x /opt/backup/scripts/archive-verification.sh
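When an auditor later pulls an archive, the recorded checksum demonstrates it has not been altered since creation. A self-contained sketch of that check, using a temporary directory in place of /mnt/archive/backup-verification:

```shell
# Verify an archived bundle against its recorded SHA-256 checksum.
# A temp directory and dummy file stand in for the real archive.
WORKDIR=$(mktemp -d)
echo "verification evidence" > "$WORKDIR/verification-2024-10.tar.gz"
sha256sum "$WORKDIR/verification-2024-10.tar.gz" > "$WORKDIR/verification-2024-10.tar.gz.sha256"

# Later, during an audit:
if sha256sum -c "$WORKDIR/verification-2024-10.tar.gz.sha256" > /dev/null 2>&1; then
    RESULT="intact"
else
    RESULT="CORRUPTED"
fi
echo "Archive $RESULT"
rm -rf "$WORKDIR"
```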

Verification

After configuring backup verification, confirm that the system operates correctly through these checks.

Check that automated verification runs on schedule:

Terminal window
# Verify timer is active and shows next run time
systemctl list-timers | grep backup-verify
# Expected output:
# Sun 2024-11-17 04:00:00 GMT 6 days left Sun 2024-11-10 04:12:33 GMT 23h ago backup-verify.timer backup-verify.service
# Check recent verification job logs
journalctl -u backup-verify.service --since "1 week ago" | tail -20

Confirm alerting functions correctly by triggering a test alert:

Terminal window
# Temporarily create a failing verification to test alerting
echo "backup_verify_success 0" > /var/lib/prometheus/node-exporter/backup_metrics.prom
# Wait for alert to fire (check Prometheus/Alertmanager)
# Then restore normal metric
/opt/backup/scripts/collect-metrics.sh

Verify restore test documentation exists and contains required elements:

Terminal window
# Check for recent restore test reports
ls -la /var/log/backup-verify/restore-test-*.md
# Verify report contains required sections
grep -E "^## (File|Database) Restore Test" /var/log/backup-verify/restore-test-*.md

Confirm metrics are being collected and exported:

Terminal window
# Check Prometheus metrics endpoint (if using node_exporter textfile collector)
curl -s localhost:9100/metrics | grep backup_
# Expected output includes:
# backup_verify_last_success 1731234567
# backup_verify_success 1
# backup_repository_size_bytes 1234567890
# backup_snapshot_count 30
+------------------------------------------------------------------+
| VERIFICATION CONFIRMATION FLOW |
+------------------------------------------------------------------+
| |
| 1. AUTOMATED CHECKS |
| +-------------------+ |
| | Timer active? +---> NO ---> Enable timer |
| +--------+----------+ |
| | YES |
| v |
| +-------------------+ |
| | Jobs completing? +---> NO ---> Check logs for errors |
| +--------+----------+ |
| | YES |
| v |
| 2. ALERTING |
| +-------------------+ |
| | Test alert fires? +---> NO ---> Check alert config |
| +--------+----------+ |
| | YES |
| v |
| 3. DOCUMENTATION |
| +-------------------+ |
| | Reports exist? +---> NO ---> Run manual restore test |
| +--------+----------+ |
| | YES |
| v |
| +-------------------+ |
| | VERIFICATION | |
| | COMPLETE | |
| +-------------------+ |
| |
+------------------------------------------------------------------+

Figure 2: Verification system confirmation workflow

Troubleshooting

| Symptom | Cause | Resolution |
| --- | --- | --- |
| repository is locked error during verification | Previous job did not complete cleanly, or concurrent access | Run restic unlock to clear stale locks. If the lock persists, check for running processes: ps aux \| grep restic. Kill orphaned processes if safe. |
| Verification completes but reports 0 files checked | Incorrect repository path or empty repository | Verify the repository path with restic -r /path/to/repo snapshots. Check that backup jobs are writing to the expected location. |
| check: data blob not found errors | Repository corruption or incomplete backup | Run restic check --read-data to identify affected snapshots. May require restoring from an earlier snapshot or rebuilding from source. |
| Restore test fails with permission denied | Test restore environment not configured correctly | Verify restore target directory ownership: ls -la /mnt/restore-test. Ensure the backup service account has write permissions. |
| Database restore succeeds but application reports missing data | Point-in-time recovery needed; snapshot taken during a transaction | For transactional consistency, use database-native backup tools (pg_dump, mysqldump) rather than filesystem snapshots, or enable continuous archiving. |
| Restore test takes longer than RTO | Insufficient restore infrastructure, network bottleneck, or storage IOPS limits | Profile the restore with the time command. Check network throughput during restore with iftop. Consider faster restore targets or parallel restore streams. |
| Verification timer not running | Timer not enabled, or service file has errors | Check timer status: systemctl status backup-verify.timer. Verify service file syntax: systemd-analyze verify backup-verify.service. |
| Metrics not appearing in Prometheus | Textfile collector path incorrect, or script not executable | Verify the collector path in the node_exporter config. Check script permissions: ls -la /opt/backup/scripts/collect-metrics.sh. Run the script manually to test. |
| Alert fires but no notification received | Alertmanager routing misconfigured, or receiver not set up | Test Alertmanager directly: amtool alert add alertname=test. Check receiver configuration in alertmanager.yml. |
| Verification passes but restore fails | Integrity checks verify structure, not application-level consistency | Always include actual restore tests, not just integrity checks. Database restores must include application verification queries. |
| Cloud backup verification shows success but restore quota exceeded | Cloud provider limits on restore operations or egress | Check provider quotas and limits (Azure Recovery Services vault limits; AWS Backup vault restore limits). Plan restore tests within quota. |
| Restore test database conflicts with production | Test environment not sufficiently isolated | Use separate ports, separate hosts, or containerised test environments. Never restore to a production database server without explicit isolation. |

Verification is not backup

Successful verification confirms that existing backups are recoverable. It does not guarantee that backups contain the right data, cover all required systems, or meet retention requirements. Verification is one component of backup assurance alongside backup policy, coverage audits, and retention compliance checks.

See also