Database Indexes Explained: The Only Guide You Need

TL;DR

Indexes make queries fast by avoiding full table scans. But too many indexes slow down writes. Use EXPLAIN to identify slow queries, add indexes on WHERE/JOIN columns, and avoid indexing everything.

I spent three hours debugging why a simple query was taking 8 seconds on a table with only 100,000 rows. The query looked fine. The database had plenty of memory. But every request was scanning the entire table.

Then I added a single index and the query time dropped to 12 milliseconds. That's 666x faster. One line of SQL.

Most developers know indexes make databases faster, but they don't really understand when to use them, how they work, or why sometimes they make things worse. Here's everything you need to know about database indexes.

What Indexes Actually Do

An index is like a book's index - instead of reading every page to find a topic, you look it up in the index and jump straight to the right page.

Without an index, databases do a full table scan:

-- Query: Find user with email 'john@example.com'
SELECT * FROM users WHERE email = 'john@example.com';

-- Without index: Database reads every single row
-- 100,000 rows scanned, 1,847ms
-- 1,000,000 rows scanned, 18,234ms

With an index on the email column:

-- Same query with index
SELECT * FROM users WHERE email = 'john@example.com';

-- With index: Database jumps directly to the row
-- 100,000 rows in table, 1 row scanned, 3ms
-- 1,000,000 rows in table, 1 row scanned, 3ms

Query time stays nearly constant regardless of table size - index lookups grow logarithmically, which is effectively flat in practice. That's the power of indexes.

Real Performance Benchmarks

I tested the same queries with and without indexes on a table with 1 million users:

Without Index

-- Query 1: Find by email
SELECT * FROM users WHERE email = 'john@example.com';
-- Time: 1,847ms
-- Rows scanned: 1,000,000

-- Query 2: Find by signup date
SELECT * FROM users WHERE created_at > '2025-01-01';
-- Time: 2,134ms
-- Rows scanned: 1,000,000

-- Query 3: Find by country and active status
SELECT * FROM users WHERE country = 'US' AND active = true;
-- Time: 2,401ms
-- Rows scanned: 1,000,000

With Proper Indexes

-- Create indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_country_active ON users(country, active);

-- Query 1: Find by email
SELECT * FROM users WHERE email = 'john@example.com';
-- Time: 3ms
-- Rows scanned: 1

-- Query 2: Find by signup date
SELECT * FROM users WHERE created_at > '2025-01-01';
-- Time: 142ms
-- Rows scanned: 45,821 (only matching rows)

-- Query 3: Find by country and active status
SELECT * FROM users WHERE country = 'US' AND active = true;
-- Time: 89ms
-- Rows scanned: 28,934 (only matching rows)

The difference is dramatic: roughly 600x faster for exact matches, 15-27x faster for range queries.

How Indexes Work Under the Hood

Most databases use B-tree indexes (balanced tree):

Without Index - Full Table Scan:
┌─────────────────────────────────────┐
│ users table (sequential scan)      │
│ Row 1: id=1, email=alice@...       │
│ Row 2: id=2, email=bob@...         │ ← Check every row
│ Row 3: id=3, email=charlie@...     │ ← Check every row
│ ...                                 │ ← Check every row
│ Row 999,999: id=999999, email=...  │ ← Check every row
│ Row 1,000,000: id=1000000, email=..│ ← Check every row
└─────────────────────────────────────┘
Time: O(n) - proportional to table size

With Index - B-tree Lookup:
           ┌─────────────┐
           │ root: a-z   │
           └──────┬──────┘
                  │
         ┌────────┴────────┐
    ┌────▼────┐      ┌────▼────┐
    │  a-m    │      │  n-z    │
    └────┬────┘      └────┬────┘
         │                │
    ┌────▼────┐      ┌────▼────┐
    │ john@.. │      │ zack@.. │
    └─────────┘      └─────────┘

Time: O(log n) - a few node reads (~20 key comparisons) for millions of rows

For 1 million rows:

  • Full scan: 1,000,000 comparisons
  • B-tree index: ~20 comparisons

That's why indexes are so much faster.
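The arithmetic behind those numbers is simple back-of-the-envelope math (real B-trees vary with page size and key width, so the fan-out of 100 below is an assumption):

```python
import math

rows = 1_000_000

full_scan = rows                           # one comparison per row
comparisons = math.ceil(math.log2(rows))   # binary-search view: ~log2(n)
node_visits = round(math.log(rows, 100))   # B-tree view: ~100 keys/node => ~log100(n) levels

print(full_scan, comparisons, node_visits)  # 1000000 20 3
```

A million-row lookup costs about 20 key comparisons spread over roughly 3 tree levels, which is why the query time barely moves as the table grows.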

When to Add Indexes: The Rules

1. Index Columns in WHERE Clauses

-- This query is slow without an index
SELECT * FROM orders WHERE customer_id = 12345;

-- Add index on customer_id
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Now it's fast
-- Before: 2,341ms (1M rows scanned)
-- After: 4ms (147 rows scanned)

If you filter by a column, index it.

2. Index Columns in JOIN Conditions

-- Slow join without indexes
SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.active = true;

-- Add an index on the JOIN column of the many side
-- (users.id is the primary key, so it's already indexed;
--  indexing the boolean active column rarely helps - see Mistake 2)
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Before: 8,234ms
-- After: 287ms (29x faster)

Always index foreign keys used in JOINs.

3. Index Columns in ORDER BY

-- Slow sort without index
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;

-- Add index
CREATE INDEX idx_posts_created_at ON posts(created_at DESC);

-- Before: 1,456ms (full table scan + sort)
-- After: 12ms (index scan, already sorted)

The index stores data in sorted order, eliminating the sort step.
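SQLite makes the eliminated sort step visible in its query plan: without a matching index the plan contains a `USE TEMP B-TREE FOR ORDER BY` step, and with one it disappears. A sketch with an illustrative schema:

```python
import sqlite3

# Illustrative schema; the point is the plan output, not the data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO posts (created_at) VALUES (?)",
                 [(f"2025-01-{i % 28 + 1:02d}",) for i in range(5_000)])

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM posts ORDER BY created_at DESC LIMIT 20"
before = plan(query)   # contains "USE TEMP B-TREE FOR ORDER BY"
conn.execute("CREATE INDEX idx_posts_created_at ON posts(created_at DESC)")
after = plan(query)    # scans the index in order; the sort step is gone
print(before)
print(after)
```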

4. Composite Indexes for Multiple Columns

-- Query filters by multiple columns
SELECT * FROM users
WHERE country = 'US' AND active = true AND age > 21;

-- Option 1: Separate indexes (less efficient)
CREATE INDEX idx_users_country ON users(country);
CREATE INDEX idx_users_active ON users(active);
CREATE INDEX idx_users_age ON users(age);

-- Option 2: Composite index (better)
CREATE INDEX idx_users_country_active_age ON users(country, active, age);

-- Separate indexes: 234ms
-- Composite index: 43ms (5x faster)

Composite indexes work best when columns are queried together.

The Column Order Matters

Composite indexes work left-to-right only:

-- Index on (country, active, age)
CREATE INDEX idx_users_c_a_a ON users(country, active, age);

-- ✅ Uses index - starts with country
SELECT * FROM users WHERE country = 'US';

-- ✅ Uses index - starts with country, active
SELECT * FROM users WHERE country = 'US' AND active = true;

-- ✅ Uses index - all three columns
SELECT * FROM users WHERE country = 'US' AND active = true AND age > 21;

-- ❌ Doesn't use index - skips country
SELECT * FROM users WHERE active = true AND age > 21;

-- ❌ Doesn't use index - starts with age
SELECT * FROM users WHERE age > 21;

Rule: Put the most selective columns first (columns that filter out the most rows).

-- Bad: active is boolean (50/50 split)
CREATE INDEX idx_bad ON users(active, country, age);

-- Good: country is selective (1 of 200 countries)
CREATE INDEX idx_good ON users(country, active, age);
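The left-to-right rule is easy to verify with SQLite's query planner. This sketch uses made-up demo data (50 countries, coin-flip active flag) and an extra email column so the skipped-prefix query can't sneak through as an index-only scan:

```python
import sqlite3

# Demo data: country is selective (50 values), active is a coin flip
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, country TEXT, active INTEGER, age INTEGER, email TEXT)""")
conn.executemany(
    "INSERT INTO users (country, active, age, email) VALUES (?, ?, ?, ?)",
    [(f"C{i % 50}", i % 2, 18 + i % 60, f"u{i}@example.com") for i in range(3_000)])
conn.execute("CREATE INDEX idx_users_c_a_a ON users(country, active, age)")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

p_prefix = plan("SELECT * FROM users WHERE country = 'C7' AND active = 1")
p_skip   = plan("SELECT * FROM users WHERE age > 21")  # leading column missing
print(p_prefix)   # seeks via idx_users_c_a_a
print(p_skip)     # falls back to a full table scan
```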

Using EXPLAIN to Find Slow Queries

Every database has an EXPLAIN command that shows how queries execute:

-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john@example.com';

-- Without index:
Seq Scan on users  (cost=0.00..25000.00 rows=1 width=100) (actual time=1847.234..1847.234 rows=1 loops=1)
  Filter: (email = 'john@example.com')
  Rows Removed by Filter: 999999
Planning Time: 0.123 ms
Execution Time: 1847.456 ms

-- With index:
Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=100) (actual time=0.032..0.033 rows=1 loops=1)
  Index Cond: (email = 'john@example.com')
Planning Time: 0.089 ms
Execution Time: 0.052 ms

Key things to look for:

  • Seq Scan = bad (full table scan)
  • Index Scan = good (using an index)
  • Rows Removed by Filter = wasted work
  • Execution Time = actual query time

Finding Queries That Need Indexes

-- MySQL: Find slow queries
SELECT
    query_time,
    rows_examined,
    rows_sent,
    sql_text
FROM mysql.slow_log
WHERE rows_examined > 10000
ORDER BY query_time DESC;

-- PostgreSQL: Find queries not using indexes
SELECT
    calls,
    total_exec_time,
    mean_exec_time,
    query
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_%'
ORDER BY mean_exec_time DESC
LIMIT 20;

Look for queries with:

  • High execution time
  • Many rows examined
  • Low rows returned (scanning much more than returning)

Common Indexing Mistakes

Mistake 1: Indexing Everything

-- Don't do this
CREATE INDEX idx_users_id ON users(id);              -- Primary key already indexed
CREATE INDEX idx_users_email ON users(email);        -- Good
CREATE INDEX idx_users_name ON users(name);          -- Rarely queried?
CREATE INDEX idx_users_bio ON users(bio);            -- Large text field
CREATE INDEX idx_users_created_at ON users(created_at); -- Good
CREATE INDEX idx_users_updated_at ON users(updated_at); -- Rarely filtered
CREATE INDEX idx_users_last_login ON users(last_login); -- Rarely filtered

Problem: Every index slows down writes.

-- With 7 indexes, every INSERT updates 8 things (table + 7 indexes)
INSERT INTO users (email, name, ...) VALUES (...);
-- Without indexes: 2ms
-- With 7 indexes: 18ms (9x slower)

Rule: Only index columns you actually filter, join, or sort by.

Mistake 2: Indexing Low-Cardinality Columns

-- Bad: boolean column (only 2 possible values)
CREATE INDEX idx_users_active ON users(active);

-- Query: 50% of users are active
SELECT * FROM users WHERE active = true;
-- Index doesn't help - still scans half the table
-- Database might ignore the index and do a full scan anyway

Rule: Index columns with high cardinality (many unique values). Email: good. Boolean: usually bad.
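Before adding an index, you can measure a column's cardinality with one query. A sketch using SQLite and a hypothetical users table (the ratio query itself is portable SQL):

```python
import sqlite3

# Hypothetical users table: email is unique, active is a boolean flag
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
conn.executemany("INSERT INTO users (email, active) VALUES (?, ?)",
                 [(f"user{i}@example.com", i % 2) for i in range(1_000)])

def cardinality(column):
    # distinct values / total rows: near 1.0 = good B-tree candidate
    sql = f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM users"
    return conn.execute(sql).fetchone()[0]

print(cardinality("email"))   # 1.0   - every value unique
print(cardinality("active"))  # 0.002 - two values across 1,000 rows
```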

Mistake 3: Wrong Column Order in Composite Indexes

-- Table with 1M users: 200 countries, 50% active
CREATE INDEX idx_bad ON users(active, country);   -- Boolean first
CREATE INDEX idx_good ON users(country, active);  -- Selective first

-- Query
SELECT * FROM users WHERE country = 'US' AND active = true;

-- For this exact query (equality on both columns) either index can seek
-- directly - column order matters most for prefix reuse:
-- idx_bad's prefix:  WHERE active = true alone still matches 500k rows
-- idx_good's prefix: WHERE country = 'US' alone matches ~5k rows and uses the index
-- The selective-first index serves far more of your queries efficiently

Mistake 4: Not Using Partial Indexes

-- Bad: Index all rows including inactive
CREATE INDEX idx_users_email ON users(email);

-- Good: Only index active users (smaller, faster)
CREATE INDEX idx_users_email_active ON users(email) WHERE active = true;

-- 90% of queries filter active users anyway
-- Index is 50% smaller, 50% faster

Partial indexes save space and improve performance for common query patterns.
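SQLite supports partial indexes too (3.8+), so the pattern is easy to try locally. A sketch with placeholder names - note the query must repeat the predicate so the planner can prove the index applies:

```python
import sqlite3

# Demo table; ~10% of rows are active
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
conn.executemany("INSERT INTO users (email, active) VALUES (?, ?)",
                 [(f"user{i}@example.com", 1 if i % 10 == 0 else 0)
                  for i in range(2_000)])

# Only rows with active = 1 are stored in the index
conn.execute("CREATE INDEX idx_users_email_active ON users(email) WHERE active = 1")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The query repeats the predicate, so the partial index qualifies
p = plan("SELECT * FROM users WHERE email = 'user10@example.com' AND active = 1")
print(p)   # e.g. "SEARCH users USING INDEX idx_users_email_active (email=?)"
```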

Mistake 5: Indexing Large Text Fields

-- Bad: Full-text column
CREATE INDEX idx_posts_content ON posts(content);
-- Problem: content is 10KB per row, index is massive

-- Good: Full-text search index
CREATE INDEX idx_posts_content_fts ON posts USING GIN(to_tsvector('english', content));
-- Problem solved: specialized index type for text search

Use full-text search indexes (FTS5 in SQLite, GIN in Postgres) for large text fields.

Index Types Beyond B-tree

Full-Text Search Indexes

-- PostgreSQL
CREATE INDEX idx_posts_search ON posts USING GIN(to_tsvector('english', content));

SELECT * FROM posts WHERE to_tsvector('english', content) @@ to_tsquery('database & index');
-- Searches millions of documents in milliseconds

-- SQLite
CREATE VIRTUAL TABLE posts_fts USING fts5(title, content);

SELECT * FROM posts_fts WHERE posts_fts MATCH 'database index';

Hash Indexes (Exact Matches Only)

-- PostgreSQL - faster for equality checks, but can't do ranges
CREATE INDEX idx_users_email_hash ON users USING HASH(email);

-- Works great for
SELECT * FROM users WHERE email = 'exact@match.com';

-- Doesn't work for
SELECT * FROM users WHERE email LIKE 'john%';  -- Won't use hash index

Covering Indexes (Include Columns)

-- Regular index: returns row pointers
CREATE INDEX idx_users_email ON users(email);

SELECT email, name FROM users WHERE email = 'john@example.com';
-- 1. Use index to find row
-- 2. Read row from table to get name
-- Two disk reads

-- Covering index: includes extra columns (PostgreSQL 11+ INCLUDE syntax)
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name);

SELECT email, name FROM users WHERE email = 'john@example.com';
-- 1. Use index to get both email AND name
-- One disk read, 2x faster
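SQLite has no INCLUDE clause, but putting the extra column in the index key gets the same index-only behavior, and the plan reports it as a COVERING INDEX. A sketch with demo names:

```python
import sqlite3

# Demo schema; bio stands in for the rest of the row we want to avoid reading
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT)")
conn.executemany("INSERT INTO users (email, name, bio) VALUES (?, ?, ?)",
                 [(f"user{i}@example.com", f"User {i}", "long bio text")
                  for i in range(1_000)])
conn.execute("CREATE INDEX idx_users_email_name ON users(email, name)")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Both selected columns live in the index, so no table lookup is needed
p = plan("SELECT email, name FROM users WHERE email = 'user5@example.com'")
print(p)   # e.g. "SEARCH users USING COVERING INDEX idx_users_email_name (email=?)"
```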

Advanced Patterns

Indexing JSON Fields

-- PostgreSQL: Index JSON field
CREATE INDEX idx_metadata_user_id ON events((metadata->>'user_id'));

SELECT * FROM events WHERE metadata->>'user_id' = '12345';
-- Uses index even though user_id is in JSON

-- Or use GIN for flexible JSON queries
CREATE INDEX idx_metadata_gin ON events USING GIN(metadata);

SELECT * FROM events WHERE metadata @> '{"user_id": "12345"}';
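The same trick works in SQLite via json_extract plus an expression index. This sketch assumes the JSON1 extension, which is built into most SQLite builds bundled with Python; names are illustrative:

```python
import json
import sqlite3

# Assumes the JSON1 extension (built into most SQLite builds)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, metadata TEXT)")
conn.executemany("INSERT INTO events (metadata) VALUES (?)",
                 [(json.dumps({"user_id": str(i)}),) for i in range(1_000)])

# Expression index over the extracted JSON field
conn.execute("CREATE INDEX idx_metadata_user_id "
             "ON events(json_extract(metadata, '$.user_id'))")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

p = plan("SELECT * FROM events WHERE json_extract(metadata, '$.user_id') = '42'")
print(p)   # e.g. "SEARCH events USING INDEX idx_metadata_user_id (<expr>=?)"
```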

Partial Indexes for Common Queries

-- 95% of queries look for recent orders
-- Note: the predicate must be a constant (PostgreSQL rejects NOW() here
-- because it isn't immutable), so use a fixed cutoff and recreate it periodically
CREATE INDEX idx_orders_recent ON orders(created_at)
WHERE created_at > '2025-01-01';

-- Index is 10x smaller, much faster to maintain
-- Perfect for time-series data

Expression Indexes

-- Queries often use lowercase email
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- Regular index doesn't help
CREATE INDEX idx_users_email ON users(email);  -- Won't use this

-- Expression index works
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now the query uses the index
-- Before: 1,234ms
-- After: 4ms
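SQLite (3.9+) supports expression indexes as well, so you can confirm the planner picks one up when the query's expression matches the index's. A sketch with placeholder names:

```python
import sqlite3

# Demo data with mixed-case emails; names are illustrative
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"User{i}@Example.com",) for i in range(1_000)])
conn.execute("CREATE INDEX idx_users_email_lower ON users(LOWER(email))")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The query must use the same expression the index was built on
p = plan("SELECT * FROM users WHERE LOWER(email) = 'user5@example.com'")
print(p)   # e.g. "SEARCH users USING INDEX idx_users_email_lower (<expr>=?)"
```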

Monitoring Index Usage

PostgreSQL

-- Find unused indexes
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes taking up space
DROP INDEX idx_users_some_unused_column;

MySQL

-- Find unused indexes
SELECT
    object_schema,
    object_name,
    index_name,
    count_star,
    count_read,
    count_fetch
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND index_name != 'PRIMARY'
  AND count_star = 0
ORDER BY object_schema, object_name;

SQLite

-- SQLite doesn't track index usage
-- Use EXPLAIN QUERY PLAN to verify indexes are used
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'john@example.com';

-- Look for "USING INDEX" in the output
SEARCH users USING INDEX idx_users_email (email=?)

Index Maintenance

Indexes need maintenance over time:

-- PostgreSQL: Rebuild bloated indexes
REINDEX INDEX idx_users_email;

-- Or rebuild all indexes on a table
REINDEX TABLE users;

-- Analyze statistics for query planner
ANALYZE users;

-- MySQL: Optimize table (rebuilds indexes)
OPTIMIZE TABLE users;

-- SQLite: Analyze for query planner
ANALYZE;

-- Rebuild indexes if needed
REINDEX;

Schedule maintenance during low-traffic periods:

#!/bin/bash
# Weekly index maintenance (runs Sunday 2 AM)

# PostgreSQL (REINDEX takes locks; on Postgres 12+ prefer REINDEX ... CONCURRENTLY)
psql -d production -c "REINDEX DATABASE production;"
psql -d production -c "ANALYZE;"

# MySQL
mysql -e "OPTIMIZE TABLE users, posts, orders;"

Real-World Example: E-commerce Site

Here's how I indexed an e-commerce database:

-- Products table
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(200) NOT NULL,
    category_id INTEGER NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    in_stock BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Query patterns and indexes
-- 1. Search by SKU (exact match)
CREATE UNIQUE INDEX idx_products_sku ON products(sku);

-- 2. Browse by category, only show in-stock
CREATE INDEX idx_products_category_stock ON products(category_id)
WHERE in_stock = true;  -- Partial index (predicate fixes in_stock, so it needn't be in the key)

-- 3. Sort by price within category
CREATE INDEX idx_products_category_price ON products(category_id, price);

-- 4. Find new products
CREATE INDEX idx_products_created_at ON products(created_at DESC);

-- Orders table
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    status VARCHAR(20) NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Query patterns and indexes
-- 1. User's orders
CREATE INDEX idx_orders_user_id ON orders(user_id, created_at DESC);

-- 2. Pending orders (admin dashboard)
CREATE INDEX idx_orders_status ON orders(status, created_at DESC)
WHERE status IN ('pending', 'processing');  -- Partial index

-- 3. Sales reports (date range)
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Performance results:
-- User order history: 3ms (was 847ms)
-- Product search: 12ms (was 1,234ms)
-- Admin dashboard: 34ms (was 3,456ms)

Quick Reference Guide

When to Add an Index

✅ Column used in WHERE clause frequently
✅ Column used in JOIN conditions (foreign keys)
✅ Column used in ORDER BY
✅ Column with high cardinality (many unique values)
✅ Columns queried together (use composite index)

❌ Primary keys (automatically indexed)
❌ Low cardinality columns (booleans, small enums)
❌ Columns rarely queried
❌ Large text fields (use full-text index instead)
❌ Tables with heavy write workload and few reads

Index Types Cheat Sheet

Type         Use Case                   Example
B-tree       General purpose, default   Most indexes
Hash         Exact equality only        Hash lookups
GIN/GiST     Full-text search, JSON     Document search
Covering     Avoid table lookups        Include extra columns
Partial      Subset of rows             WHERE active = true
Expression   Functions in queries       LOWER(email)

Index Size Guidelines

-- Check index sizes
-- PostgreSQL
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- MySQL
SELECT
    table_name,
    index_name,
    ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) as size_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
ORDER BY stat_value DESC;

Rule of thumb:

  • Indexes should be 20-30% of table size
  • If indexes are larger than the table, you have too many

My Indexing Workflow

When I encounter a slow query:

  1. Measure: Use EXPLAIN to see execution plan
  2. Identify: Look for Seq Scan on large tables
  3. Analyze: What columns are in WHERE, JOIN, ORDER BY?
  4. Create: Add appropriate index
  5. Test: Run EXPLAIN again, verify Index Scan
  6. Benchmark: Measure actual query time improvement
  7. Monitor: Check index is actually being used in production

-- Example workflow
-- 1. Slow query identified
SELECT * FROM orders WHERE user_id = 12345 AND status = 'pending';
-- Time: 2,341ms

-- 2. Check execution plan
EXPLAIN SELECT * FROM orders WHERE user_id = 12345 AND status = 'pending';
-- Result: Seq Scan on orders (cost=0.00..25000.00)

-- 3. Add index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- 4. Verify with EXPLAIN
EXPLAIN SELECT * FROM orders WHERE user_id = 12345 AND status = 'pending';
-- Result: Index Scan using idx_orders_user_status (cost=0.42..8.44)

-- 5. Benchmark
SELECT * FROM orders WHERE user_id = 12345 AND status = 'pending';
-- Time: 4ms (585x improvement)

The Bottom Line

Indexes are the easiest way to make databases fast. One well-placed index can turn a 10-second query into a 10-millisecond query.

The key principles:

  • Index columns you filter, join, or sort by
  • Use EXPLAIN to identify slow queries
  • Don't over-index - every index slows down writes
  • Put selective columns first in composite indexes
  • Monitor and remove unused indexes

Start with these basics and you'll solve 90% of database performance problems. The remaining 10% requires query optimization, database tuning, and sometimes just better hardware.

But indexes first. They're the biggest win for the least effort.