Data Archiving
Data archiving transfers data from primary storage to long-term repositories where it remains accessible but at lower cost and with different performance characteristics than operational systems. You perform archiving to meet retention obligations, reduce storage costs on high-performance systems, and preserve historical records for future reference. Unlike backup, which creates copies for disaster recovery, archiving moves data to a different tier with the expectation that retrieval will be infrequent.
- Archive
- A collection of data removed from operational systems and stored in long-term repositories with metadata sufficient for future retrieval and interpretation.
- Cold storage
- Storage tier optimised for infrequently accessed data, offering lowest cost per gigabyte but higher retrieval latency and often retrieval fees.
- Legal hold
- A directive requiring preservation of specified data regardless of normal retention schedules, typically issued in anticipation of litigation or regulatory investigation.
- Archive package
- A self-contained unit combining data files, metadata, checksums, and documentation sufficient to interpret the archive contents without external dependencies.
Prerequisites
Before beginning archiving operations, confirm the following requirements are satisfied.
You need an approved retention policy that specifies which data categories require archiving, the retention periods for each category, and the conditions under which archived data may be disposed. Without this policy, you cannot determine what to archive or for how long. The retention policy should reference your data classification scheme so you can identify sensitive data requiring encryption in archives. Obtain the current retention schedule from your records management function or data governance team.
You need access to archive storage infrastructure with capacity for the data volume you intend to archive plus 20% headroom for packaging overhead. Calculate expected archive size by querying source systems for data volumes matching your archive criteria. For cloud storage, confirm you have credentials with write permissions to the target bucket or container and that network egress costs are budgeted.
You need data classification labels applied to source data so you can determine encryption requirements and access restrictions for archived data. Unclassified data cannot be archived because you cannot verify appropriate handling. If source data lacks classification, complete classification before proceeding.
You need checksums or hash values for source data to verify archive integrity. If source systems do not provide checksums, you must generate them before extraction. SHA-256 provides sufficient collision resistance for integrity verification.
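If source systems do not emit checksums, generating them before extraction is straightforward. A minimal sketch in Python using only the standard library; the function names are illustrative, not from any particular tool, and the output mirrors the `sha256sum` format used later in this procedure:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large exports fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def write_checksums(root: Path, out_file: Path) -> int:
    """Write one 'checksum  path' line per file under root, sha256sum-style."""
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root)}"
        for p in sorted(root.rglob("*")) if p.is_file()
    ]
    out_file.write_text("\n".join(lines) + "\n")
    return len(lines)
```

A file produced this way can later be verified with `sha256sum -c`.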
You need documentation of the source system schema, field definitions, and any encoding or compression applied to source data. Archives without schema documentation become uninterpretable when source systems are decommissioned. Export schema definitions and data dictionaries as part of archive preparation.
For database archives, you need credentials with SELECT permissions on tables to be archived and, if performing destructive archiving, DELETE permissions. Coordinate with database administrators to schedule archive operations during low-usage periods if source systems cannot tolerate concurrent archiving load.
For grant or project closeout archiving, you need confirmation from the programme manager that the project has completed final reporting and that no outstanding data requests exist. Premature archiving of project data can obstruct audit responses.
Procedure
Identifying archive candidates
You identify data for archiving by applying retention policy criteria to source systems. The selection criteria derive from your retention schedule and typically include age thresholds, project status, or explicit archive flags set by data owners.
- Query the source system to identify records meeting archive criteria. For date-based selection, calculate the cutoff date from your retention policy. If your policy specifies archiving data older than 24 months from last modification, calculate the threshold:
```sql
-- PostgreSQL: identify records for archiving
SELECT id,
       created_at,
       modified_at,
       pg_size_pretty(pg_column_size(t.*)) as record_size
FROM programme_data t
WHERE modified_at < CURRENT_DATE - INTERVAL '24 months'
  AND archived_at IS NULL
  AND legal_hold = FALSE;
```
Record the count and total size of candidate records. For this example, assume the query returns 145,000 records totalling 12.3 GB.
- Verify no legal holds apply to candidate records. Query your legal hold register or check the legal hold flag on individual records. Any record under legal hold must be excluded regardless of age:
```sql
-- Verify no legal holds in candidate set
SELECT COUNT(*) as held_records
FROM programme_data
WHERE modified_at < CURRENT_DATE - INTERVAL '24 months'
  AND archived_at IS NULL
  AND legal_hold = TRUE;
```
If this returns any non-zero count, investigate each hold before proceeding. Do not archive records under legal hold.
- Classify the archive candidate set by data classification level. This determines encryption requirements and storage tier:
```sql
SELECT data_classification,
       COUNT(*) as record_count,
       SUM(pg_column_size(t.*)) as total_bytes
FROM programme_data t
WHERE modified_at < CURRENT_DATE - INTERVAL '24 months'
  AND archived_at IS NULL
  AND legal_hold = FALSE
GROUP BY data_classification;
```
Records classified as CONFIDENTIAL or higher require encryption before transfer to archive storage. Records classified as INTERNAL or PUBLIC may be archived without encryption, depending on your security policy.
- Obtain approval from the data owner for the identified archive set. Send the record counts, date ranges, and classification breakdown to the responsible data steward. Document their approval with date and any conditions they specify.
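The cutoff used in the selection queries can also be computed outside the database, for example when assembling the approval summary. A sketch using only the Python standard library; the function name is illustrative and the 24-month window is taken from the example policy:

```python
import calendar
from datetime import date

def archive_cutoff(today: date, months: int = 24) -> date:
    """Date `months` calendar months before `today`; records modified
    before this date are archive candidates under the example policy."""
    total = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(total, 12)
    # Clamp the day for shorter target months (e.g. the 31st -> 28/29 Feb)
    day = min(today.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)

print(archive_cutoff(date(2025, 1, 1)))  # 2023-01-01
```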
Preparing archive packages
Archive packages bundle data with metadata to ensure future interpretability. A well-constructed archive package contains everything needed to understand and use the archived data without access to the original source system.
- Create a working directory for archive assembly with subdirectories for data, metadata, and checksums:
```shell
ARCHIVE_ID="prog-data-$(date +%Y%m%d)-001"
mkdir -p /archive/staging/${ARCHIVE_ID}/{data,metadata,checksums}
cd /archive/staging/${ARCHIVE_ID}
```
- Export the data from the source system in a format that preserves data types and handles special characters correctly. For relational data, use the database’s native export format or a portable format like CSV with explicit quoting:
```shell
# PostgreSQL export with explicit format controls
# Note: psql's \COPY meta-command must appear on a single line
psql -h dbserver -U archive_user -d programme_db -c "\COPY (SELECT * FROM programme_data WHERE modified_at < CURRENT_DATE - INTERVAL '24 months' AND archived_at IS NULL AND legal_hold = FALSE) TO '/archive/staging/${ARCHIVE_ID}/data/programme_data.csv' WITH (FORMAT CSV, HEADER TRUE, ENCODING 'UTF8', NULL '')"
```
For binary data or documents, export to the data subdirectory, preserving original filenames or using systematic naming:
```shell
# Export documents with preserved directory structure
rsync -av --files-from=<(
  psql -t -c "SELECT file_path FROM documents
              WHERE modified_at < CURRENT_DATE - INTERVAL '24 months'"
) /documents/ /archive/staging/${ARCHIVE_ID}/data/documents/
```
- Generate the schema documentation capturing table structure, field definitions, data types, and constraints:
```shell
# Export PostgreSQL schema
pg_dump -h dbserver -U archive_user -d programme_db \
  --schema-only \
  --table=programme_data \
  > metadata/schema.sql

# Generate human-readable data dictionary
psql -h dbserver -U archive_user -d programme_db -c "
  SELECT column_name, data_type, character_maximum_length,
         is_nullable, column_default
  FROM information_schema.columns
  WHERE table_name = 'programme_data'
  ORDER BY ordinal_position
" > metadata/data_dictionary.txt
```
- Create an archive manifest documenting the archive contents, source system, extraction date, and retention requirements:
```shell
cat > metadata/MANIFEST.json << EOF
{
  "archive_id": "${ARCHIVE_ID}",
  "created_at": "$(date -Iseconds)",
  "created_by": "$(whoami)",
  "source_system": "programme_db.example.org",
  "source_table": "programme_data",
  "record_count": 145000,
  "date_range": {
    "earliest_record": "2019-03-15",
    "latest_record": "2022-12-31"
  },
  "data_classification": "INTERNAL",
  "retention_policy": "PROG-RET-001",
  "retain_until": "2032-01-01",
  "extraction_criteria": "modified_at < 2023-01-01",
  "files": [
    {"name": "data/programme_data.csv", "format": "CSV/UTF-8", "rows": 145000},
    {"name": "metadata/schema.sql", "format": "PostgreSQL DDL"},
    {"name": "metadata/data_dictionary.txt", "format": "text"}
  ]
}
EOF
```
- Generate checksums for all files in the archive package to enable integrity verification:
```shell
find data metadata -type f -exec sha256sum {} \; > checksums/SHA256SUMS

# Generate a checksum of the checksum file for tamper detection
sha256sum checksums/SHA256SUMS > checksums/SHA256SUMS.sig
```
- For data classified as CONFIDENTIAL or higher, encrypt the data files before packaging. Use AES-256 encryption with a key managed through your secrets management system:
```shell
# Encrypt data files (for CONFIDENTIAL classification)
gpg --symmetric --cipher-algo AES256 \
  --output data/programme_data.csv.gpg \
  data/programme_data.csv

# Remove unencrypted file after successful encryption
rm data/programme_data.csv

# Replace the stale checksum entry for the removed plaintext file,
# then record the encrypted file so later verification passes
grep -v 'data/programme_data.csv$' checksums/SHA256SUMS > checksums/SHA256SUMS.tmp
mv checksums/SHA256SUMS.tmp checksums/SHA256SUMS
sha256sum data/programme_data.csv.gpg >> checksums/SHA256SUMS
```
Store the encryption key identifier in the manifest. Do not store the key itself in the archive.
- Package the archive directory into a single archive file for transfer:
```shell
cd /archive/staging
tar -cvf ${ARCHIVE_ID}.tar ${ARCHIVE_ID}/

# Generate checksum of the complete package
sha256sum ${ARCHIVE_ID}.tar > ${ARCHIVE_ID}.tar.sha256
```
The resulting package is roughly the size of the exported data, about 12.3 GB in this example; `tar -cvf` adds no compression, so apply gzip (`tar -czvf`) if transfer or storage costs warrant it.
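Before transfer, the assembled package can be sanity-checked programmatically. A sketch in Python that assumes the layout used above (a MANIFEST.json with a `files` list, and SHA256SUMS in `sha256sum` format); `verify_package` is an illustrative helper, not an existing tool:

```python
import hashlib
import json
from pathlib import Path

def verify_package(package_dir: Path) -> list[str]:
    """Return a list of problems found in a staged package (empty = OK)."""
    problems = []
    manifest_path = package_dir / "metadata" / "MANIFEST.json"
    if not manifest_path.is_file():
        return ["missing metadata/MANIFEST.json"]
    manifest = json.loads(manifest_path.read_text())
    # Every file the manifest declares must actually be present
    for entry in manifest.get("files", []):
        if not (package_dir / entry["name"]).is_file():
            problems.append(f"listed in manifest but missing: {entry['name']}")
    sums = package_dir / "checksums" / "SHA256SUMS"
    if not sums.is_file():
        problems.append("missing checksums/SHA256SUMS")
        return problems
    # Recompute every recorded checksum
    for line in sums.read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        target = package_dir / name
        if not target.is_file():
            problems.append(f"checksummed file missing: {name}")
            continue
        if hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

Run it against the staging directory before creating the tar; any reported problem means the package must be repaired before upload.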
Selecting storage tier
Archive storage tiers trade access speed against cost. You select the tier based on expected retrieval frequency and acceptable retrieval latency.
| Expected retrieval over next 5 years | Tier | Example services | Retrieval latency | Approx. monthly cost |
|---|---|---|---|---|
| More than 10 times | Warm | S3 Standard-IA, Azure Cool, GCS Nearline | Minutes | ~$0.01/GB |
| 1-10 times (annual audits, occasional reference) | Cold | S3 Glacier Instant, Azure Cold, GCS Coldline | Minutes | ~$0.004/GB |
| Likely never (compliance retention only) | Archive | S3 Glacier Deep Archive, Azure Archive, GCS Archive | Hours | ~$0.001/GB |

Table 1: Storage tier selection based on retrieval frequency
The cost figures represent approximate monthly storage costs; retrieval adds roughly $0.01-0.03 per GB for cold tiers and $0.02-0.05 per GB for archive tiers. A 12 GB archive stored for 10 years in the deep archive tier costs approximately $1.44 in storage, plus retrieval costs when accessed. The same archive in the warm tier costs approximately $14.40 over 10 years but with immediate access.
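The arithmetic is simply rate × size × months; a small sketch makes the trade-off reproducible for other volumes (the rates are the approximate per-GB-month figures from the table and vary by provider and region):

```python
def storage_cost(gb: float, rate_per_gb_month: float, years: int) -> float:
    """Total storage cost in dollars over the retention period."""
    return gb * rate_per_gb_month * years * 12

# Approximate tier rates from the table above
archive_10y = storage_cost(12, 0.001, 10)  # deep archive tier
warm_10y = storage_cost(12, 0.01, 10)      # warm tier
print(f"archive: ${archive_10y:.2f}, warm: ${warm_10y:.2f}")
# archive: $1.44, warm: $14.40
```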
For the programme data archive in this example, annual audit access is expected, making cold tier appropriate. Configure the storage class during upload.
Transferring to archive storage
Transfer the packaged archive to its designated storage tier with integrity verification at each stage.
- Upload the archive package to cloud storage with the appropriate storage class:
```shell
# AWS S3 with Glacier Instant Retrieval
aws s3 cp ${ARCHIVE_ID}.tar \
  s3://org-archives/programme-data/${ARCHIVE_ID}.tar \
  --storage-class GLACIER_IR \
  --metadata "retention-until=2032-01-01,classification=INTERNAL"

# Upload checksum file to standard storage for quick verification
aws s3 cp ${ARCHIVE_ID}.tar.sha256 \
  s3://org-archives/programme-data/${ARCHIVE_ID}.tar.sha256
```
For Azure Blob Storage:
```shell
# Azure with Cool tier
az storage blob upload \
  --account-name orgarchives \
  --container-name programme-data \
  --name ${ARCHIVE_ID}.tar \
  --file ${ARCHIVE_ID}.tar \
  --tier Cool \
  --metadata "retention-until=2032-01-01" "classification=INTERNAL"
```
- Verify the upload completed successfully by comparing checksums:
```shell
# Get the ETag (MD5 for single-part uploads) from S3
aws s3api head-object \
  --bucket org-archives \
  --key programme-data/${ARCHIVE_ID}.tar \
  --query 'ETag' --output text

# For large files uploaded in parts, download and verify
aws s3 cp s3://org-archives/programme-data/${ARCHIVE_ID}.tar - | \
  sha256sum | cut -d' ' -f1

# Compare with local checksum
cat ${ARCHIVE_ID}.tar.sha256
```
The checksums must match exactly. Any discrepancy indicates transfer corruption requiring re-upload.
- Apply retention locks if your storage platform supports them. Retention locks prevent deletion until the retention period expires, protecting against accidental or malicious deletion:
```shell
# AWS S3 Object Lock (requires bucket with Object Lock enabled)
aws s3api put-object-retention \
  --bucket org-archives \
  --key programme-data/${ARCHIVE_ID}.tar \
  --retention '{"Mode":"GOVERNANCE","RetainUntilDate":"2032-01-01T00:00:00Z"}'
```
Governance mode allows users with specific permissions to override the lock; compliance mode prevents any deletion until expiry, including by the root account.
- Register the archive in your archive index or data catalogue. This registration enables future discovery and retrieval:
```sql
INSERT INTO archive_registry (
  archive_id, source_system, source_table, record_count,
  date_range_start, date_range_end, storage_location, storage_tier,
  checksum_sha256, retain_until, data_classification,
  created_at, created_by
) VALUES (
  'prog-data-20250101-001',
  'programme_db.example.org',
  'programme_data',
  145000,
  '2019-03-15',
  '2022-12-31',
  's3://org-archives/programme-data/prog-data-20250101-001.tar',
  'GLACIER_IR',
  'a3f2b8c9d4e5f6...',
  '2032-01-01',
  'INTERNAL',
  CURRENT_TIMESTAMP,
  'archive_operator'
);
```
Marking source data as archived
After successful transfer and verification, update source records to indicate archival status. This prevents duplicate archiving and enables source data deletion if your retention policy permits.
- Update the archived flag on source records within a transaction:
```sql
BEGIN;

UPDATE programme_data
SET archived_at = CURRENT_TIMESTAMP,
    archive_reference = 'prog-data-20250101-001'
WHERE modified_at < CURRENT_DATE - INTERVAL '24 months'
  AND archived_at IS NULL
  AND legal_hold = FALSE;

-- Verify expected count was updated
-- Should match earlier candidate count: 145000
SELECT COUNT(*) FROM programme_data
WHERE archive_reference = 'prog-data-20250101-001';

COMMIT;
```
- If your retention policy permits deletion of archived source data, schedule the deletion for a grace period after archiving. A 30-day grace period allows discovery of archive problems before source data is removed:
```sql
-- Schedule deletion 30 days after archiving
INSERT INTO scheduled_deletions (
  table_name, selection_criteria, scheduled_for,
  archive_reference, approved_by
) VALUES (
  'programme_data',
  'archive_reference = ''prog-data-20250101-001''',
  CURRENT_DATE + INTERVAL '30 days',
  'prog-data-20250101-001',
  'data_steward_name'
);
```
Do not delete source data immediately after archiving. The grace period enables retrieval testing and discovery of any packaging errors.
- Remove the staging copy of the archive package after confirming successful upload:
```shell
# Verify the cloud copy exists before removing staging files.
# Testing the command's exit status directly avoids capturing head-object
# output into a variable, which would break the string comparison.
if aws s3api head-object \
     --bucket org-archives \
     --key programme-data/${ARCHIVE_ID}.tar > /dev/null 2>&1; then
  rm -rf /archive/staging/${ARCHIVE_ID}
  rm /archive/staging/${ARCHIVE_ID}.tar
  rm /archive/staging/${ARCHIVE_ID}.tar.sha256
  echo "Staging files removed"
else
  echo "ERROR: Cloud copy not verified, retaining staging files"
  exit 1
fi
```
Legal hold procedures
When a legal hold is issued, you must immediately suspend normal retention processing for affected data and preserve all copies including archives.
- Upon receiving a legal hold notice, identify all data within scope by querying source systems and archive registries:
```sql
-- Flag active records under hold scope
UPDATE programme_data
SET legal_hold = TRUE,
    legal_hold_reference = 'LH-2025-003',
    legal_hold_date = CURRENT_DATE
WHERE project_id IN ('PRJ-2021-045', 'PRJ-2021-046')
   OR beneficiary_region = 'Eastern Province';

-- Identify archives containing potentially relevant data
SELECT archive_id, storage_location, date_range_start, date_range_end
FROM archive_registry
WHERE source_table = 'programme_data'
  AND date_range_end >= '2021-01-01';
```
- Disable any automated deletion or lifecycle policies that could affect held data:
```shell
# Suspend the S3 lifecycle rule for affected archives.
# Caution: this call replaces the bucket's entire lifecycle configuration,
# so include every existing rule, not only the one being disabled.
aws s3api put-bucket-lifecycle-configuration \
  --bucket org-archives \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-expiry",
      "Status": "Disabled",
      "Filter": {"Prefix": "programme-data/"},
      "Expiration": {"Days": 3650}
    }]
  }'
```
- Document the hold in your legal hold register with scope, date issued, issuing authority, and affected data locations:
```sql
INSERT INTO legal_holds (
  hold_reference, issued_date, issued_by, matter_description,
  data_scope, affected_systems, affected_archives, status
) VALUES (
  'LH-2025-003',
  '2025-01-15',
  'General Counsel',
  'Employment dispute - Eastern Province operations',
  'All programme data for projects PRJ-2021-045, PRJ-2021-046 and Eastern Province beneficiary records',
  'programme_db',
  'prog-data-20230101-001, prog-data-20240101-001',
  'ACTIVE'
);
```
- When the legal hold is released, restore normal retention processing and update records:
```sql
-- Release hold on active records
UPDATE programme_data
SET legal_hold = FALSE,
    legal_hold_released = CURRENT_DATE
WHERE legal_hold_reference = 'LH-2025-003';

-- Update hold register
UPDATE legal_holds
SET status = 'RELEASED',
    released_date = CURRENT_DATE,
    released_by = 'General Counsel'
WHERE hold_reference = 'LH-2025-003';
```
Data that passed its retention date during the hold period becomes immediately eligible for disposal upon release.
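The release rule in the last sentence can be expressed as a small predicate; a sketch with illustrative field names, not your actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ArchivedRecord:
    retain_until: date
    legal_hold: bool

def disposal_eligible(rec: ArchivedRecord, today: date) -> bool:
    """A record may be disposed once retention has expired and no hold applies.

    A record whose retention date passed during a hold becomes eligible
    immediately when the hold is released.
    """
    return not rec.legal_hold and today >= rec.retain_until

rec = ArchivedRecord(retain_until=date(2024, 6, 30), legal_hold=True)
print(disposal_eligible(rec, date(2025, 1, 20)))  # False: still under hold
rec.legal_hold = False                            # hold released
print(disposal_eligible(rec, date(2025, 1, 20)))  # True: retention already expired
```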
Project and grant closeout archiving
Grant closeout archiving follows a compressed timeline with specific documentation requirements. Donor agreements typically require data preservation for 3-7 years after project end, with some requiring longer retention.
Obtain the grant closeout checklist from the programme manager confirming all deliverables are submitted and final reports accepted. Do not begin archiving until programme confirms readiness.
Query all project-related data across systems using the project or grant identifier:
```sql
-- Identify all data linked to closing grant
SELECT 'programme_data' as source_table,
       COUNT(*) as records,
       MIN(created_at) as earliest,
       MAX(modified_at) as latest
FROM programme_data
WHERE grant_id = 'GRANT-2020-123'
UNION ALL
SELECT 'beneficiary_records', COUNT(*), MIN(created_at), MAX(modified_at)
FROM beneficiary_records
WHERE grant_id = 'GRANT-2020-123'
UNION ALL
SELECT 'financial_transactions', COUNT(*), MIN(created_at), MAX(modified_at)
FROM financial_transactions
WHERE grant_id = 'GRANT-2020-123';
```
- Create a consolidated archive package containing all grant-related data, with explicit donor retention requirements in the manifest:
```shell
GRANT_ID="GRANT-2020-123"
ARCHIVE_ID="grant-closeout-${GRANT_ID}-$(date +%Y%m%d)"

# Include donor retention requirements in manifest
cat > metadata/MANIFEST.json << EOF
{
  "archive_id": "${ARCHIVE_ID}",
  "archive_type": "grant_closeout",
  "grant_id": "${GRANT_ID}",
  "donor": "USAID",
  "project_end_date": "2024-12-31",
  "donor_retention_requirement": "7 years from project end",
  "retain_until": "2031-12-31",
  "created_at": "$(date -Iseconds)",
  ...
}
EOF
```
- Transfer the closeout archive to storage with a retention lock matching donor requirements. USAID requires 7-year retention; FCDO typically requires 6 years; the EU requires 5 years from final payment:
```shell
# Calculate retention date based on donor requirement
RETAIN_UNTIL="2031-12-31"  # 7 years from 2024 project end

aws s3 cp ${ARCHIVE_ID}.tar \
  s3://org-archives/grant-closeouts/${ARCHIVE_ID}.tar \
  --storage-class GLACIER_IR

aws s3api put-object-retention \
  --bucket org-archives \
  --key grant-closeouts/${ARCHIVE_ID}.tar \
  --retention "{\"Mode\":\"COMPLIANCE\",\"RetainUntilDate\":\"${RETAIN_UNTIL}T00:00:00Z\"}"
```
Use COMPLIANCE mode for donor-mandated retention to prevent any override.
- Notify the programme manager and finance team that archiving is complete, providing the archive reference for inclusion in closeout documentation.
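The donor retention rules above can be kept in one place so closeout dates are computed rather than hand-set. A sketch; the year values are only the examples given here, so always verify against the actual grant agreement:

```python
from datetime import date

# Donor retention periods in years (examples from this procedure,
# not authoritative values)
DONOR_RETENTION_YEARS = {"USAID": 7, "FCDO": 6, "EU": 5}

def retain_until(donor: str, anchor_date: date) -> date:
    """Retention end date from the donor's anchor (project end, or
    final payment for EU)."""
    years = DONOR_RETENTION_YEARS[donor]
    try:
        return anchor_date.replace(year=anchor_date.year + years)
    except ValueError:  # 29 Feb anchor landing in a non-leap year
        return anchor_date.replace(year=anchor_date.year + years, day=28)

print(retain_until("USAID", date(2024, 12, 31)))  # 2031-12-31
```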
Verification
After completing archiving operations, verify the archive is complete, intact, and retrievable.
Test archive integrity by downloading and verifying checksums. Perform this test on every archive before marking source data for deletion:
```shell
# Download archive to verification location
aws s3 cp s3://org-archives/programme-data/${ARCHIVE_ID}.tar \
  /archive/verify/${ARCHIVE_ID}.tar

# Verify package checksum
cd /archive/verify
sha256sum -c ${ARCHIVE_ID}.tar.sha256
# Expected output: prog-data-20250101-001.tar: OK

# Extract and verify internal checksums
tar -xf ${ARCHIVE_ID}.tar
cd ${ARCHIVE_ID}
sha256sum -c checksums/SHA256SUMS
# Expected output: all files show OK
```
If any checksum fails, the archive is corrupted. Do not proceed with source data deletion. Investigate the failure, regenerate the archive from source data, and re-upload.
Test data retrievability by extracting sample records and comparing with source. Select 10-20 records at random and verify field values match:
```shell
# Extract specific records from archived CSV
head -1 data/programme_data.csv > /tmp/verify_sample.csv
grep -E "^(12345|67890|11111)," data/programme_data.csv >> /tmp/verify_sample.csv

# Compare with source (before deletion)
psql -h dbserver -U archive_user -d programme_db -c "
  SELECT * FROM programme_data WHERE id IN (12345, 67890, 11111)" \
  > /tmp/source_sample.txt
```
Visually compare the extracted records with source records. Any discrepancy indicates extraction or encoding problems requiring investigation.
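For larger samples, the comparison can be partially automated. A Python sketch that checks sampled archive rows against expected source values; the helper name is an assumption, and the `id` column comes from the example schema:

```python
import csv
from pathlib import Path

def sample_matches(csv_path: Path, expected: dict[str, dict[str, str]]) -> bool:
    """True if every sampled record (keyed by id) is present in the
    archived CSV with the expected field values."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        rows = {row["id"]: row for row in csv.DictReader(fh)}
    return all(
        rid in rows and all(rows[rid].get(field) == value
                            for field, value in fields.items())
        for rid, fields in expected.items()
    )
```

Feed it the sampled ids with field values taken from the source query; any False result warrants the same investigation as a visual mismatch.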
Verify metadata completeness by confirming the archive package contains all required documentation:
```shell
# Check required files exist
REQUIRED_FILES="metadata/MANIFEST.json metadata/schema.sql metadata/data_dictionary.txt checksums/SHA256SUMS"
for file in $REQUIRED_FILES; do
  if [ ! -f "$file" ]; then
    echo "MISSING: $file"
    exit 1
  fi
done
echo "All required metadata files present"
```
Verify the archive registry entry exists and contains correct information:
```sql
SELECT archive_id, storage_location, checksum_sha256, retain_until
FROM archive_registry
WHERE archive_id = 'prog-data-20250101-001';
```
The query must return exactly one row with values matching the archive package.
Troubleshooting
| Symptom | Cause | Resolution |
|---|---|---|
| Upload fails with “Access Denied” | IAM policy missing write permission to target bucket or path | Verify IAM policy includes s3:PutObject for the target path; check bucket policy does not deny the operation |
| Checksum mismatch after upload | Network corruption during transfer or multipart upload reassembly error | Re-upload the file; for files over 5 GB, use --expected-size flag to ensure consistent multipart chunking |
| Archive extraction fails with “Unexpected EOF” | Incomplete upload or corrupted package | Verify source tar file integrity locally; re-upload if local file is intact; regenerate if local file corrupted |
| Cannot set retention lock | Bucket not configured for Object Lock or insufficient permissions | Object Lock must be enabled at bucket creation; cannot be added later; create new bucket with Object Lock enabled and migrate archives |
| Retrieval from Glacier takes longer than expected | Deep Archive tier selected instead of Instant Retrieval | Check storage class with head-object; for urgent retrieval, use Expedited retrieval option at higher cost |
| Encrypted archive cannot be decrypted | Key not recorded or key rotated | Verify key identifier in manifest matches available keys; restore key from backup if available; encrypted data is unrecoverable without key |
| Archive query returns no results | Archive not registered in catalogue or search criteria mismatch | Query archive_registry directly by known archive_id; verify registration completed; re-register if entry missing |
| “Insufficient storage” during packaging | Staging volume full | Extend staging volume or clean old staging files; archive packages require approximately 1.2x source data size during creation |
| CSV export contains corrupted characters | Encoding mismatch between source and export | Specify UTF-8 encoding explicitly in export command; verify source data encoding; convert if necessary |
| Legal hold records still being archived | Hold flag not checked in selection query | Add AND legal_hold = FALSE to archive candidate selection; update any records incorrectly archived |
| Lifecycle policy deleting archives early | Retention lock not applied or policy override | Verify retention lock status; reapply with COMPLIANCE mode; check IAM policies for lifecycle override permissions |
| Grant closeout archive rejected by donor | Missing required documentation or incorrect retention period | Review donor requirements; regenerate archive with complete documentation; adjust retention to match donor specification |
Automation
For organisations with regular archiving requirements, automate the identification and packaging steps while retaining manual approval for transfer and source deletion.
Create an archiving script that identifies candidates and prepares packages:
```shell
#!/bin/bash
# archive_prepare.sh - Identify and package archive candidates

set -e

ARCHIVE_DATE=$(date +%Y%m%d)
STAGING_DIR="/archive/staging"
LOG_FILE="/var/log/archive/prepare-${ARCHIVE_DATE}.log"

log() {
  echo "[$(date -Iseconds)] $1" | tee -a "$LOG_FILE"
}

# Identify candidates
log "Identifying archive candidates"
CANDIDATE_COUNT=$(psql -t -c "
  SELECT COUNT(*) FROM programme_data
  WHERE modified_at < CURRENT_DATE - INTERVAL '24 months'
    AND archived_at IS NULL
    AND legal_hold = FALSE")

if [ "$CANDIDATE_COUNT" -eq 0 ]; then
  log "No candidates found, exiting"
  exit 0
fi

log "Found ${CANDIDATE_COUNT} candidates"

# Generate archive package
ARCHIVE_ID="prog-data-${ARCHIVE_DATE}-001"
log "Creating archive package: ${ARCHIVE_ID}"

mkdir -p "${STAGING_DIR}/${ARCHIVE_ID}"/{data,metadata,checksums}
cd "${STAGING_DIR}/${ARCHIVE_ID}"

# Export data
psql -c "\COPY (SELECT * FROM programme_data WHERE modified_at < CURRENT_DATE - INTERVAL '24 months' AND archived_at IS NULL AND legal_hold = FALSE) TO 'data/programme_data.csv' WITH CSV HEADER"

# Generate metadata and checksums
# ... (remaining steps from procedure)

log "Archive package ready for review: ${STAGING_DIR}/${ARCHIVE_ID}"
log "Manual approval required before transfer"

# Send notification
echo "Archive ${ARCHIVE_ID} ready for review" | \
  mail -s "Archive Ready for Approval" data-steward@example.org
```
Schedule the preparation script to run monthly:
```shell
# /etc/cron.d entry: run at 02:00 on the first of each month
0 2 1 * * archive_user /opt/scripts/archive_prepare.sh
```
The script prepares archives but does not transfer them. A data steward reviews the prepared package and manually executes transfer after verification.