Database Index Advisor
Overview
Analyze database index usage, identify missing indexes causing sequential scans, detect redundant or unused indexes wasting write performance, and recommend optimal index configurations for PostgreSQL and MySQL.
Prerequisites
- Database credentials with access to
pgstatuserindexes, pgstatusertables, and pgstatstatements (PostgreSQL) or performance_schema and sys schema (MySQL)
pgstatstatements extension enabled for PostgreSQL query statistics
psql or mysql CLI for executing analysis queries
- Representative workload running (analysis during off-peak hours may miss important query patterns)
- At least 24 hours of statistics accumulation since the last
pgstatreset()
Instructions
- Identify tables with high sequential scan activity (candidates for missing indexes):
- PostgreSQL:
SELECT relname, seqscan, seqtupread, idxscan, nlivetup FROM pgstatusertables WHERE seqscan > 100 AND nlivetup > 10000 ORDER BY seqtupread DESC LIMIT 20
- A table with high
seqscan count and high seqtupread relative to nlive_tup is scanning most of the table repeatedly
- Find the queries causing sequential scans by correlating with
pgstatstatements:
SELECT query, calls, meanexectime, rows FROM pgstatstatements WHERE query ILIKE '%tablename%' ORDER BY meanexec_time DESC LIMIT 10
- Run
EXPLAIN (ANALYZE, BUFFERS) on the top queries to confirm sequential scan usage
- Analyze query WHERE clauses and JOIN conditions to determine which columns need indexes. Extract the filtering columns and their selectivity:
SELECT columnname, ndistinct, correlation FROM pgstats WHERE tablename = 'targettable'
- High
n_distinct (close to row count) indicates good index selectivity
correlation close to 1.0 or -1.0 suggests the column benefits from a B-tree index
- Recommend composite indexes for multi-column queries. Follow the equality-first, range-second ordering:
- Place columns used with
= operators first in the index
- Place columns used with
>, <, BETWEEN, or LIKE 'prefix%' last
- Example:
WHERE status = 'active' AND createdat > '2024-01-01' -> CREATE INDEX ON orders (status, createdat)
- Identify unused indexes wasting write performance: