Data Validation Engine
Overview
Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL.
Prerequisites
- Database credentials with ALTER TABLE and CREATE FUNCTION permissions
psql or mysql CLI for executing validation queries
- Current schema documentation or access to
information_schema for column specifications
- Business rules document describing valid data ranges, formats, and relationships
- Backup of production data before applying new constraints (constraints may reject existing invalid data)
Instructions
- Audit existing data quality by running validation queries before adding constraints. Check for NULL values in columns that should be required:
SELECT columnname, COUNT() FILTER (WHERE columnname IS NULL) AS nullcount, COUNT() AS total FROM tablename GROUP BY column_name.
- Detect orphaned records (broken referential integrity):
SELECT c.id FROM childtable c LEFT JOIN parenttable p ON c.parent_id = p.id WHERE p.id IS NULL. Document all orphaned records for cleanup or archival before adding foreign key constraints.
- Validate data format compliance:
- Email format:
SELECT email FROM users WHERE email !~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
- Phone format:
SELECT phone FROM contacts WHERE phone !~ '^\+?[1-9]\d{6,14}$'
- URL format:
SELECT url FROM links WHERE url !~ '^https?://.+'
- Date ranges:
SELECT * FROM events WHERE startdate > enddate
- Check numeric range violations:
SELECT FROM products WHERE price < 0 OR price > 999999.99 and SELECT FROM users WHERE age < 0 OR age > 150. Map each column to its valid range based on business rules.
- Identify duplicate records that violate intended uniqueness:
SELECT email, COUNT() FROM users GROUP BY email HAVING COUNT() > 1. Determine which duplicate to keep (most recent, most complete) and plan deduplication.
- Generate CHECK constraints for validated rules:
ALTER TABLE products ADD CONSTRAINT chkpricepositive CHECK (price >= 0)
ALTER TABLE users ADD CONSTRAINT chkemailformat CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
ALTER TABLE events ADD CONSTRAINT chkdateorder CHECK (startdate <= enddate)
ALTER TABLE orders ADD CONSTRAINT chkstatusvalid CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))
- Create foreign key con