Database DesignDatabase Indexing

Indexes speed up data retrieval by maintaining a separate, sorted data structure that maps column values to row locations. Good indexing is the fastest way to fix slow database queries without changing your architecture.

Why Indexes Matter

Without an index, a database performs a full table scan — reading every row to find matches. With a billion rows, that's fatal to performance. An index lets the database jump directly to matching rows in O(log N) time.

-- Without index: full table scan across billions of rows
SELECT * FROM users WHERE email = 'alice@example.com';

-- With index on email: O(log N) lookup
CREATE INDEX idx_users_email ON users(email);

B-Tree Index (Default)

Most databases default to B-Tree indexes. A balanced tree structure sorted by the indexed column.

Supports:

  • Equality lookups: WHERE user_id = 123
  • Range queries: WHERE age BETWEEN 20 AND 30
  • Prefix matching: WHERE name LIKE 'Ali%'
  • ORDER BY (if index order matches)

Doesn't support: Suffix matching (WHERE name LIKE '%son'), inequality on multiple columns simultaneously

Hash Index

Stores a hash of each key. Only useful for exact equality lookups — no range support.

  • Faster than B-Tree for equality lookups
  • Cannot support range queries or sorting
  • Used by: Redis, some PostgreSQL operators

Composite Indexes (Multi-Column)

Indexes on multiple columns together.

CREATE INDEX idx_user_created ON posts(user_id, created_at);

Crucial rule — leftmost prefix principle: The index is used only when the query filters on the leading columns of the index.

-- Uses index (user_id is leftmost)
SELECT * FROM posts WHERE user_id = 5 AND created_at > '2024-01-01';

-- Uses index (user_id alone)
SELECT * FROM posts WHERE user_id = 5;

-- CANNOT use index (skips user_id)
SELECT * FROM posts WHERE created_at > '2024-01-01';

Covering Index

A covering index includes all columns needed by a query, eliminating the need to look up the actual row.

-- If we frequently run this query:
SELECT user_id, created_at FROM posts WHERE user_id = 5;

-- A covering index on (user_id, created_at) serves the query entirely from the index
CREATE INDEX idx_covering ON posts(user_id, created_at);

Covering indexes can dramatically improve performance for read-heavy queries.

Index Trade-offs

| Aspect | Effect | |---|---| | Read performance | Dramatically improved | | Write performance | Slightly degraded (index must be updated on every insert/update/delete) | | Storage | Additional disk space per index | | Too many indexes | Can hurt write-heavy workloads |

Rule: Only index columns you actually query on. Every index has a write penalty.

When to Add an Index

  1. Columns in WHERE clauses with high selectivity (many distinct values)
  2. Columns used in JOIN conditions
  3. Columns in ORDER BY or GROUP BY clauses
  4. Foreign key columns (most ORMs do this automatically)

Advanced Index Types

  • Full-text index: For text search (MATCH AGAINST in MySQL, tsvector in PostgreSQL)
  • Partial index: Index only a subset of rows (WHERE active = true) — smaller and faster
  • Expression index: Index the result of a function (LOWER(email))
  • GIN/GiST: PostgreSQL indexes for arrays, JSONB, geometric data, full-text

Interview Tips

  • "Add an index" is often the right first answer for slow query problems before reaching for architectural changes
  • Know the leftmost prefix rule for composite indexes cold — it's frequently misunderstood
  • Index selectivity matters: an index on gender (2 values) is nearly useless; an index on email (billions of unique values) is very effective
  • Discuss the write overhead of indexes — in write-heavy systems, too many indexes can hurt performance
  • EXPLAIN ANALYZE is how you verify whether an index is being used