NoSQL Data Modeler
Overview
Design data models for NoSQL databases including MongoDB (document), DynamoDB (key-value/wide-column), Redis (key-value), and Cassandra (wide-column). Unlike relational modeling, where normalization drives design, NoSQL modeling starts from access patterns and query requirements, then shapes the data to serve those patterns efficiently.
Prerequisites
- mongosh, aws dynamodb CLI, redis-cli, or cqlsh installed, depending on the target database
- Documented list of application access patterns (read/write queries the application performs)
- Expected data volumes (document count, average document size, growth rate)
- Read/write ratio and latency requirements for each access pattern
- Understanding of consistency requirements (strong vs. eventual consistency)
Instructions
- Catalog all application access patterns as a table with columns: pattern name, query description, frequency (queries/sec), latency requirement, and data fields accessed. This drives every modeling decision.
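The catalog described above can be kept as structured data so it is easy to sort and review. The following is a minimal sketch, not a prescribed format: the `AccessPattern` class, its field names, and the three example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    query: str
    frequency_qps: float   # expected queries per second
    latency_ms: int        # latency requirement in milliseconds
    fields: tuple          # data fields the query touches


# Hypothetical entries for an order-management application.
CATALOG = [
    AccessPattern("monthly_sales_report", "Aggregate revenue per product per month",
                  0.01, 5000, ("product_id", "quantity", "unit_price")),
    AccessPattern("get_order", "Fetch one order with its line items by order ID",
                  200.0, 20, ("order_id", "line_items", "total")),
    AccessPattern("list_customer_orders", "List a customer's orders, newest first",
                  50.0, 50, ("customer_id", "order_id", "created_at")),
]


def by_priority(catalog):
    """Sort hottest patterns first: highest frequency, then tightest latency."""
    return sorted(catalog, key=lambda p: (-p.frequency_qps, p.latency_ms))


def to_table(catalog):
    """Render the catalog as a pipe-delimited table, one row per pattern."""
    rows = ["| Pattern | Query | Queries/sec | Latency (ms) | Fields |",
            "|---|---|---|---|---|"]
    for p in by_priority(catalog):
        rows.append(f"| {p.name} | {p.query} | {p.frequency_qps:g} | "
                    f"{p.latency_ms} | {', '.join(p.fields)} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(to_table(CATALOG))
```

Sorting by frequency and latency puts the patterns that should shape the primary key or document layout at the top; rare, latency-tolerant patterns such as reports can usually be served by secondary indexes.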
- For MongoDB document modeling, apply the embedding vs. referencing decision framework:
  - Embed when: data is always accessed together, child data has no independent lifecycle, cardinality is bounded (1:few), and updates are infrequent.
  - Reference when: data has independent access patterns, cardinality is unbounded (1:many/many:many), child documents are large, or data is shared across parents.
- Design document schemas that match query patterns. If the application needs "all orders for a customer with line items," embed line items inside the order document. If the application needs "all products across all orders," use references to a products collection.
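The order and product example above might look like this in practice. This is a sketch under stated assumptions: plain Python dicts stand in for MongoDB documents (no driver or server is involved), and every field name and ID is hypothetical.

```python
# "products" collection: shared across orders, so orders reference it by ID.
products = {
    "prod-1": {"_id": "prod-1", "name": "Keyboard"},
    "prod-2": {"_id": "prod-2", "name": "Mouse"},
    "prod-3": {"_id": "prod-3", "name": "Monitor"},
}

# "orders" collection: line items are embedded because they are always read
# with their order and have no lifecycle of their own.
orders = [
    {"_id": "order-1001", "customer_id": "cust-42", "line_items": [
        {"product_id": "prod-1", "quantity": 2, "unit_price_cents": 950},
        {"product_id": "prod-2", "quantity": 1, "unit_price_cents": 2000}]},
    {"_id": "order-1002", "customer_id": "cust-42", "line_items": [
        {"product_id": "prod-2", "quantity": 3, "unit_price_cents": 2000}]},
]


def orders_for_customer(orders, customer_id):
    """'All orders for a customer with line items': one pass, no joins."""
    return [o for o in orders if o["customer_id"] == customer_id]


def order_total_cents(order):
    """Everything needed for the total is already inside the order document."""
    return sum(i["quantity"] * i["unit_price_cents"] for i in order["line_items"])


def products_across_orders(orders, products):
    """'All products across all orders': follow references to the products collection."""
    ids = {i["product_id"] for o in orders for i in o["line_items"]}
    return [products[pid] for pid in sorted(ids)]
```

The first two functions never leave the order document, which is the benefit of embedding; the third needs a second lookup, which is the cost of referencing shared data.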
- For DynamoDB, design the partition key and sort key to support the primary access pattern with a single-table design. Use composite sort keys (e.g., ORDER#2024-01-15#12345) for hierarchical data. Plan GSIs (Global Secondary Indexes) for secondary access patterns, keeping total GSI count under 5.
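To illustrate why a composite sort key such as ORDER#2024-01-15#12345 supports hierarchical queries, the sketch below simulates a prefix match over an in-memory list. It does not call DynamoDB; the `PK`/`SK` attribute names, the `CUSTOMER#` partition key format, and the helper functions are assumptions made for illustration. Against a real table, the equivalent would be a Query with a `begins_with` condition on the sort key.

```python
def order_sort_key(order_date, order_id):
    """Build a composite sort key: entity type, then ISO date, then order ID."""
    return f"ORDER#{order_date}#{order_id}"


def query_begins_with(items, pk, sk_prefix):
    """Simulate a single-partition query with a sort-key prefix condition."""
    matches = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    # Items within a partition come back ordered by sort key.
    return sorted(matches, key=lambda i: i["SK"])


# Hypothetical single-table contents: a profile item and order items share a partition.
items = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE"},
    {"PK": "CUSTOMER#42", "SK": order_sort_key("2024-02-03", 12399)},
    {"PK": "CUSTOMER#42", "SK": order_sort_key("2023-12-30", 12001)},
    {"PK": "CUSTOMER#42", "SK": order_sort_key("2024-01-15", 12345)},
    {"PK": "CUSTOMER#77", "SK": order_sort_key("2024-01-20", 12350)},
]

all_orders = query_begins_with(items, "CUSTOMER#42", "ORDER#")
orders_2024 = query_begins_with(items, "CUSTOMER#42", "ORDER#2024")
orders_jan_2024 = query_begins_with(items, "CUSTOMER#42", "ORDER#2024-01")
```

Because ISO dates sort lexicographically in chronological order, narrowing the prefix from `ORDER#` to `ORDER#2024` to `ORDER#2024-01` narrows the result from all orders to a year to a month, all within one partition.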
- Evaluate denormalization trade-offs: duplicating data across documents reduces read latency but increases write complexity and storage. Denormalize data that changes rarely (user names, product categories) but reference data that changes frequently (prices, inventory counts).
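The write-side cost of denormalization can be made concrete with a small sketch. Plain dicts stand in for documents here, and the collection contents, field names, and `rename_customer` helper are hypothetical.

```python
customers = {"cust-42": {"_id": "cust-42", "name": "Ada Lovelace"}}

# The customer name is duplicated into each order so reads need no second lookup.
orders = [
    {"_id": "order-1001", "customer_id": "cust-42", "customer_name": "Ada Lovelace"},
    {"_id": "order-1002", "customer_id": "cust-42", "customer_name": "Ada Lovelace"},
    {"_id": "order-1003", "customer_id": "cust-77", "customer_name": "Alan Turing"},
]


def order_summary(order):
    """Read path: a single document answers the query."""
    return f"{order['_id']} for {order['customer_name']}"


def rename_customer(customers, orders, customer_id, new_name):
    """Write path: the source record plus every duplicated copy must change.

    Returns the number of documents touched.
    """
    customers[customer_id]["name"] = new_name
    touched = 1
    for order in orders:
        if order["customer_id"] == customer_id:
            order["customer_name"] = new_name
            touched += 1
    return touched


touched = rename_customer(customers, orders, "cust-42", "Ada King")
```

One logical change touches three documents in this example, and the count grows with the number of orders. That fan-out is tolerable for a name that rarely changes, but it would be expensive for a price or inventory count that changes often.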
- Handle one-to-many relationships by choosing between embedding (small arrays), child referencing (parent stores child IDs), or parent referencing (child stores parent ID). For unbounded one-to-many, always use parent referencing so the parent document cannot grow past the document size limit (16 MB in MongoDB).
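The three one-to-many shapes can be compared side by side. The sketch below uses plain dicts in place of documents, and the entities (blog post tags, product parts, host log events) and helper names are hypothetical examples chosen for illustration.

```python
# 1. Embedding: a small, bounded array lives inside the parent.
blog_post = {"_id": "post-1", "title": "Modeling", "tags": ["nosql", "modeling"]}

# 2. Child referencing: the parent stores the IDs of its children.
product = {"_id": "prod-1", "name": "Bike", "part_ids": ["part-1", "part-2"]}
parts = {
    "part-1": {"_id": "part-1", "name": "Frame"},
    "part-2": {"_id": "part-2", "name": "Wheel"},
}

# 3. Parent referencing: each child stores its parent's ID, so the parent
#    document stays the same size no matter how many children exist.
host = {"_id": "host-1", "name": "web-01"}
log_events = [
    {"_id": f"evt-{n}", "host_id": "host-1", "message": f"request {n}"}
    for n in range(1000)
]


def resolve_child_ids(parent, id_field, children_by_id):
    """Child referencing: look up each ID held by the parent."""
    return [children_by_id[cid] for cid in parent[id_field]]


def children_of(children, parent_field, parent_id):
    """Parent referencing: filter the child collection by parent ID."""
    return [c for c in children if c[parent_field] == parent_id]
```

With parent referencing, adding the next log event is an insert into the child collection and never rewrites `host`. In a real deployment the `host_id` field would typically be indexed so the filter does not scan the whole collection.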
- Model many-to-many relationships using an array of references in each document or a dedicated junction collection. For DynamoDB, use adja