Data Seeder Generator
Overview
Generate realistic database seed scripts that populate development and testing environments with representative data. This skill creates seed data that respects foreign key relationships, unique constraints, check constraints, and data type validations using Faker libraries (faker.js, Faker for Python, or raw SQL with random functions).
Prerequisites
- Database schema definition (SQL DDL, ORM models, or Prisma schema) to understand table structures
- Target database connection for schema introspection (optional, can work from DDL files)
- Faker library available:
@faker-js/faker (Node.js), faker (Python), or Bogus (.NET)
- Knowledge of referential integrity constraints (foreign keys, cascades)
- Target data volume per table (e.g., 100 users, 1000 orders, 5000 line items)
Instructions
- Analyze the database schema to catalog all tables, columns, data types, constraints, and foreign key relationships. Build a dependency graph where parent tables (referenced by foreign keys) must be seeded before child tables.
- Determine the seeding order by topologically sorting the dependency graph. Tables with no foreign keys are seeded first (users, categories, products), then tables referencing them (orders, reviews), then junction tables and deeply nested tables last.
- Map each column to an appropriate Faker generator based on column name and data type:
firstname, lastname -> faker.person.firstName(), faker.person.lastName()
email -> faker.internet.email() with unique enforcement
phone -> faker.phone.number()
address, city, state, zip -> faker.location.*
createdat, updatedat -> faker.date.between({ from: '2023-01-01', to: '2024-12-31' })
price, amount -> faker.commerce.price({ min: 1, max: 999 })
description, bio -> faker.lorem.paragraph()
status -> Random selection from CHECK constraint values or enum values
uuid -> faker.string.uuid()
- Generate foreign key values by referencing previously inserted parent records. Store parent IDs in arrays during generation and randomly select from them for child records. Ensure every parent has at least one child (if the relationship is expected) and distribute children realistically (e.g., Zipf distribution where some users have many orders, most have few).
- Handle unique constraints by tracking generated values in a Set and regenerating on collision. For ema