Database Archival System
Overview
Implement automated data archival pipelines that move historical records from primary database tables to archive storage (archive tables, S3, Azure Blob, or GCS) based on age, status, or access frequency criteria.
Prerequisites
- Database credentials with SELECT, INSERT, and DELETE permissions on source and archive tables
- Cloud storage credentials (AWS S3, Azure Blob, or GCS) if archiving to cold storage
- `psql` or `mysql` CLI for executing archival queries
- `aws s3`, `az storage`, or `gsutil` CLI for cloud storage uploads
- Understanding of data retention requirements and compliance policies (GDPR, HIPAA, SOX)
- Current table sizes: run `SELECT pg_size_pretty(pg_total_relation_size('table_name'))` (PostgreSQL) to identify archival candidates
Instructions
- Identify archival candidates by finding large tables with time-based data: `SELECT relname, n_live_tup, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_stat_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 10`
- Focus on tables where historical data is rarely queried: logs, audit trails, events, old orders, expired sessions
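To narrow the size ranking to tables that actually carry time-based data, one possible follow-up is to list date and timestamp columns alongside table size. This is a sketch for PostgreSQL only; it assumes the candidate tables are ordinary user tables visible in `pg_stat_user_tables`.

```sql
-- List user tables that have a date/timestamp column, largest tables first.
-- Assumption: a date or timestamp column is a usable proxy for "time-based data".
SELECT s.schemaname,
       s.relname                                      AS table_name,
       c.column_name,
       c.data_type,
       pg_size_pretty(pg_total_relation_size(s.relid)) AS total_size
FROM pg_stat_user_tables s
JOIN information_schema.columns c
  ON c.table_schema = s.schemaname
 AND c.table_name   = s.relname
WHERE c.data_type IN ('date',
                      'timestamp without time zone',
                      'timestamp with time zone')
ORDER BY pg_total_relation_size(s.relid) DESC, s.relname, c.column_name;
```

Tables that rank high here but are still queried heavily across their full history are poor candidates, so the result is a starting list rather than a decision.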
- Define archival criteria for each table:
- Age-based: records older than N days/months (`WHERE created_at < NOW() - INTERVAL '1 year'`)
- Status-based: records in a terminal state (`WHERE status IN ('completed', 'cancelled', 'expired')`)
- Combined: old AND terminal (`WHERE created_at < NOW() - INTERVAL '6 months' AND status = 'completed'`)
- Calculate the expected volume: `SELECT COUNT(*), pg_size_pretty(SUM(pg_column_size(t.*))) FROM table_name t WHERE <criteria>`
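As an illustration of the volume estimate, the query below applies the combined criteria to a hypothetical `orders` table with `created_at` and `status` columns (both names are assumptions, not taken from a real schema). It is a rough sketch: `pg_column_size` measures row data only, so indexes and TOAST overhead are not included in the figure.

```sql
-- Estimate how many rows match the archival criteria and roughly how much
-- row data they hold. Table and column names are hypothetical.
SELECT COUNT(*)                                              AS candidate_rows,
       pg_size_pretty(COALESCE(SUM(pg_column_size(o.*)), 0)) AS approx_row_data,
       MIN(o.created_at)                                     AS oldest_candidate,
       MAX(o.created_at)                                     AS newest_candidate
FROM orders o
WHERE o.created_at < NOW() - INTERVAL '6 months'
  AND o.status = 'completed';
```

The `COALESCE` keeps the size column readable when no rows match, since `SUM` over an empty set returns NULL.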
- Handle referential integrity by archiving in dependency order:
- Archive child records first (order_items before orders)
- For tables with active foreign key references, verify that no active records reference the candidates: `SELECT COUNT(*) FROM active_child WHERE parent_id IN (SELECT id FROM parent WHERE <criteria>)`
- Alternatively, cascade the archive by moving each parent together with all of its descendants
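The dependency order can be read from the PostgreSQL catalog rather than guessed. The sketch below first lists every foreign key that points at a hypothetical `orders` table, then counts child rows that would block archival. The `shipments` table, its `order_id` and `status` columns, and the status values are illustrative assumptions.

```sql
-- 1. Which tables reference orders? These children must be archived first.
SELECT c.conrelid::regclass      AS child_table,
       c.conname                 AS constraint_name,
       pg_get_constraintdef(c.oid) AS definition
FROM pg_constraint c
WHERE c.contype = 'f'                      -- foreign key constraints only
  AND c.confrelid = 'orders'::regclass;    -- referenced (parent) table

-- 2. Do any still-active child rows reference the archival candidates?
--    A non-zero count means those parents should not be archived yet.
SELECT COUNT(*) AS blocking_rows
FROM shipments s
WHERE s.status NOT IN ('delivered', 'cancelled')
  AND s.order_id IN (
      SELECT o.id
      FROM orders o
      WHERE o.created_at < NOW() - INTERVAL '6 months'
        AND o.status = 'completed'
  );
```

Running the first query against each child table in turn gives the full chain of descendants, which is what a cascade archive would need to move together.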
- Create archive destination tables matching the source schema plus metadata columns:
`CREATE TABLE orders_archive (LIKE orders INCLUDING ALL);`
`ALTER TABLE orders_archive ADD COLUMN archived_at TIMESTAMPTZ DEFAULT NOW();`
`ALTER TABLE orders_archive ADD COLUMN archive_batch_id UUID;`
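- Remove any foreign key constraints from archive tables, since archived data must be self-contained. In PostgreSQL, `LIKE ... INCLUDING ALL` does not copy foreign keys, so this step mainly applies to archive tables created by other means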
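The archive-table setup can be made safe to re-run, which helps when the pipeline is scheduled. The following is a minimal sketch for PostgreSQL 9.6 or later, using the `orders` example from the steps above; the index name is a hypothetical choice.

```sql
BEGIN;

-- Copy column definitions, defaults, CHECK constraints, and indexes.
CREATE TABLE IF NOT EXISTS orders_archive (LIKE orders INCLUDING ALL);

-- Metadata columns recording when and in which batch a row was archived.
ALTER TABLE orders_archive
    ADD COLUMN IF NOT EXISTS archived_at      TIMESTAMPTZ DEFAULT NOW(),
    ADD COLUMN IF NOT EXISTS archive_batch_id UUID;

-- Hypothetical index to make per-batch verification or rollback cheaper.
CREATE INDEX IF NOT EXISTS orders_archive_batch_idx
    ON orders_archive (archive_batch_id);

COMMIT;

-- Confirm the archive table carries no foreign keys (expect zero rows).
SELECT conname
FROM pg_constraint
WHERE contype = 'f'
  AND conrelid = 'orders_archive'::regclass;
```

If the final query returns any rows, each listed constraint can be dropped with `ALTER TABLE orders_archive DROP CONSTRAINT` followed by the constraint name.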