SQL Query Optimizer
Overview
Rewrite SQL queries for maximum performance by eliminating anti-patterns, restructuring JOINs, leveraging window functions, and applying database-specific optimizations for PostgreSQL and MySQL. This skill takes a slow query and its execution plan as input and produces an optimized version with measurable improvement, along with any supporting index changes needed.
Prerequisites
- The slow SQL query text and its current execution time
EXPLAIN ANALYZE output (PostgreSQL) or EXPLAIN FORMAT=JSON output (MySQL) for the query
- Table row counts and approximate data distribution for involved tables
psql or mysql CLI for testing rewrites
- Knowledge of the application's acceptable result ordering and NULL handling requirements
Instructions
- Examine the original query structure and identify common anti-patterns:
SELECT * instead of specific columns (forces unnecessary I/O)
WHERE column IN (SELECT ...) that can be rewritten as JOIN or EXISTS
DISTINCT used to mask duplicate rows from incorrect JOINs
- Functions applied to indexed columns in WHERE clauses (
WHERE UPPER(name) = 'FOO')
OR conditions that prevent index usage
NOT IN with nullable columns (produces wrong results and poor plans)
- Analyze the execution plan to identify the most expensive operation nodes. Focus optimization effort on the node consuming the most time or processing the most rows.
- Rewrite subqueries as JOINs where possible. Convert correlated subqueries to lateral joins (PostgreSQL) or derived tables. Replace
IN (SELECT ...) with EXISTS (SELECT 1 ...) for existence checks since EXISTS short-circuits after the first match.
- Optimize JOIN ordering for the query planner: place the most selective table (fewest matching rows after WHERE filters) as the driving table. Use
JOIN hints only as a last resort since the optimizer usually picks the correct order with accurate statistics.
- Replace multiple OR conditions on the same column with
IN (...): change WHERE status = 'active' OR status = 'pending' to WHERE status IN ('active', 'pending'). For OR across different columns, consider UNION ALL of two simpler queries.
- Apply window functions to replace self-joins or correlated subqueries. Use
ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) for top-N-per-group queries instead of GROUP BY with subqueries.
- Leverage CTEs (Common Table Expressions) for readability but be aware that PostgreSQL versions before 12 materialize all CTEs. For perf