Database Indexing | System Design Library

Why Indexes Matter

Without an index, a database performs a full table scan — reading every row to find matches. With a billion rows, that's fatal to performance. An index lets the database jump directly to matching rows in O(log N) time.

-- Without index: full table scan across billions of rows
SELECT * FROM users WHERE email = 'alice@example.com';

-- With index on email: O(log N) lookup
CREATE INDEX idx_users_email ON users(email);

B-Tree Index (Default)

Most databases default to B-Tree indexes. A balanced tree structure sorted by the indexed column.

Supports:

Equality lookups: WHERE user_id = 123
Range queries: WHERE age BETWEEN 20 AND 30
Prefix matching: WHERE name LIKE 'Ali%'
ORDER BY (if index order matches)

Doesn't support: Suffix matching (WHERE name LIKE '%son'), inequality on multiple columns simultaneously

Hash Index

Stores a hash of each key. Only useful for exact equality lookups — no range support.

Faster than B-Tree for equality lookups
Cannot support range queries or sorting
Used by: Redis, some PostgreSQL operators

Composite Indexes (Multi-Column)

Indexes on multiple columns together.

CREATE INDEX idx_user_created ON posts(user_id, created_at);

Crucial rule — leftmost prefix principle: The index is used only when the query filters on the leading columns of the index.

-- Uses index (user_id is leftmost)
SELECT * FROM posts WHERE user_id = 5 AND created_at > '2024-01-01';

-- Uses index (user_id alone)
SELECT * FROM posts WHERE user_id = 5;

-- CANNOT use index (skips user_id)
SELECT * FROM posts WHERE created_at > '2024-01-01';

Covering Index

A covering index includes all columns needed by a query, eliminating the need to look up the actual row.

-- If we frequently run this query:
SELECT user_id, created_at FROM posts WHERE user_id = 5;

-- A covering index on (user_id, created_at) serves the query entirely from the index
CREATE INDEX idx_covering ON posts(user_id, created_at);

Covering indexes can dramatically improve performance for read-heavy queries.

Index Trade-offs

| Aspect | Effect | |---|---| | Read performance | Dramatically improved | | Write performance | Slightly degraded (index must be updated on every insert/update/delete) | | Storage | Additional disk space per index | | Too many indexes | Can hurt write-heavy workloads |

Rule: Only index columns you actually query on. Every index has a write penalty.

When to Add an Index

Columns in WHERE clauses with high selectivity (many distinct values)
Columns used in JOIN conditions
Columns in ORDER BY or GROUP BY clauses
Foreign key columns (most ORMs do this automatically)

Advanced Index Types

Full-text index: For text search (MATCH AGAINST in MySQL, tsvector in PostgreSQL)
Partial index: Index only a subset of rows (WHERE active = true) — smaller and faster
Expression index: Index the result of a function (LOWER(email))
GIN/GiST: PostgreSQL indexes for arrays, JSONB, geometric data, full-text

Interview Tips

"Add an index" is often the right first answer for slow query problems before reaching for architectural changes
Know the leftmost prefix rule for composite indexes cold — it's frequently misunderstood
Index selectivity matters: an index on gender (2 values) is nearly useless; an index on email (billions of unique values) is very effective
Discuss the write overhead of indexes — in write-heavy systems, too many indexes can hurt performance
EXPLAIN ANALYZE is how you verify whether an index is being used

Database DesignDatabase Indexing