Database Indexing Fundamentals in System Design

Learn Database Indexing from a System Design perspective. This guide explains how database indexes work, B-Tree and Hash indexes, clustered and non-clustered indexes, composite indexes, covering indexes, query optimization, execution plans, and real-world examples from Amazon, Banking, Netflix, and Uber.

Introduction

Imagine you have a Customers table with 100 million records.

A customer logs into your banking application.

The application executes:

SELECT * FROM customers
WHERE customer_id = 100001;

How does the database find one row among 100 million rows?

Without an index,

the database may scan every row.

100 Million Rows

↓

Sequential Scan

↓

Slow Response

With an index,

the database jumps directly to the required row.

Index

↓

Locate Row

↓

Fast Response

Indexes are one of the most powerful performance optimization techniques in relational databases.

Learning Objectives

After completing this article, you will understand:

What is an Index?
Why Indexes are Important
Sequential Scan vs Index Scan
B-Tree Index
Hash Index
Clustered Index
Non-Clustered Index
Composite Index
Covering Index
Query Execution Plan
Best Practices

What is a Database Index?

A Database Index is a data structure that improves the speed of retrieving rows.

Think of it like the index in a book.

Instead of reading every page,

you jump directly to the required topic.

Book Example

Without Index

flowchart TD
    A["Open Book"]
    B["Page 1"]
    C["Page 2"]
    D["..."]
    E["Page 500"]

    A --> B
    B --> C
    C --> D
    D --> E

With Index

flowchart TD
    A["Index"]
    B["Java"]
    C["Page 350"]

    A --> B
    B --> C

Exactly how database indexes work.

Without Index

flowchart TD
    APP["Application"]
    DB["Database"]
    SCAN["Sequential Scan"]
    ROWS["100 Million Rows"]

    APP --> DB
    DB --> SCAN
    SCAN --> ROWS

Every row is checked.

With Index

flowchart TD
    APP["Application"]
    IDX["Index"]
    ROW["Target Row"]
    DB["Database"]

    APP --> IDX
    IDX --> ROW
    ROW --> DB

Only a few pages are accessed.

Sequential Scan

Example

SELECT *
FROM customers
WHERE email='[email protected]';

Without index

Row 1

Row 2

Row 3

...

Row 100 Million

Time complexity

O(n)

Index Scan

With an index

Index

↓

Target Row

Time complexity

O(log n)

Much faster.

B-Tree Index

The default index type in most relational databases.

flowchart TD
    ROOT["50"]

    LEFT["25"]
    RIGHT["75"]

    ROOT --> LEFT
    ROOT --> RIGHT

    LEFT --> L1["10"]
    LEFT --> L2["30"]

    RIGHT --> R1["60"]
    RIGHT --> R2["90"]

Advantages

Fast search
Fast sorting
Supports range queries

Used by

PostgreSQL
MySQL
Oracle
SQL Server

B-Tree Search

Searching

Database traverses

Root

↓

75

↓

Found

Instead of scanning every row.

Hash Index

Hash indexes use a hash function.

flowchart LR
    KEY["Customer ID = 1001"]
    HASH["Hash Function"]
    BUCKET["Bucket #42"]
    ROW["Target Row"]

    KEY --> HASH
    HASH --> BUCKET
    BUCKET --> ROW

Excellent for

WHERE customer_id=1001

Not suitable for

ORDER BY

BETWEEN

B-Tree vs Hash

Feature	B-Tree	Hash
Equality Search	Excellent	Excellent
Range Queries	Yes	No
ORDER BY	Yes	No
Default Index	Yes	No

Clustered Index

Rows are stored physically in index order.

flowchart TD
    IDX["Clustered Index"]
    R1["Row 1"]
    R2["Row 2"]
    R3["Row 3"]
    R4["Row 4"]

    IDX --> R1
    R1 --> R2
    R2 --> R3
    R3 --> R4

Only one clustered index per table.

Example

Primary Key.

Non-Clustered Index

Data remains separate.

flowchart LR
    IDX["Non-Clustered Index"]
    PTR["Row Pointer"]
    TABLE["Actual Table Data"]

    IDX --> PTR
    PTR --> TABLE

Multiple non-clustered indexes are allowed.

Clustered vs Non-Clustered

Clustered	Non-Clustered
Data sorted	Pointer to data
One per table	Multiple allowed
Fast range scans	Good for lookups

Composite Index

Index on multiple columns.

Example

CREATE INDEX idx_customer
ON customers(last_name, city);

Works well for

WHERE last_name='Smith'
AND city='Dallas';

Composite Index Flow

flowchart LR
    A["Last Name"]
    B["City"]
    IDX["Composite Index"]
    ROW["Target Row"]

    A --> IDX
    B --> IDX
    IDX --> ROW

Covering Index

Index contains all required columns.

Query

SELECT name,email
FROM customers
WHERE customer_id=1001;

If the index already stores

customer_id
name
email

Database avoids reading the table.

Very fast.

Unique Index

Guarantees uniqueness.

CREATE UNIQUE INDEX
idx_email
ON customers(email);

Prevents duplicate emails.

Query Execution Plan

Every database decides

how to execute queries.

flowchart LR
    SQL["SQL Query"]
    OPT["Optimizer"]
    PLAN["Execution Plan"]
    EXEC["Execute"]
    RESULT["Result"]

    SQL --> OPT
    OPT --> PLAN
    PLAN --> EXEC
    EXEC --> RESULT

EXPLAIN Example

EXPLAIN
SELECT *
FROM customers
WHERE customer_id=1001;

Output

Index Scan

Using idx_customer

Without index

Sequential Scan

Banking Example

Query

SELECT *
FROM accounts
WHERE account_number='123456789';

Account Number should always be indexed.

Otherwise,

customer login becomes slow.

Amazon Example

Product ID

SKU

Category

Indexes exist on

Product ID
Seller ID
Category

for fast searches.

Netflix Example

Indexes on

Movie ID
Genre
Release Year

speed up catalog searches.

Uber Example

Indexes on

Driver ID
Trip ID
Rider ID

allow fast ride lookups.

Spring Boot Architecture

flowchart LR
    REACT["React"]
    SPRING["Spring Boot"]
    JPA["Spring Data JPA"]
    TABLE["PostgreSQL Table"]
    INDEX["B-Tree Index"]

    REACT --> SPRING
    SPRING --> JPA
    JPA --> TABLE
    INDEX -. "Speeds up queries" .-> TABLE

JPA queries automatically benefit from properly designed database indexes.

Good Index Candidates

Create indexes on

Primary Keys
Foreign Keys
Frequently searched columns
JOIN columns
ORDER BY columns
WHERE clause columns

Avoid Indexing

Avoid indexes on

Small tables
Frequently updated columns
Boolean columns
Columns with very low selectivity
Temporary tables

Advantages

Faster SELECT queries
Faster JOIN operations
Better sorting
Better filtering
Lower CPU usage
Improved scalability

Disadvantages

Increased storage
Slower INSERT
Slower UPDATE
Slower DELETE
Additional maintenance

Every index must be updated whenever indexed data changes.

Monitoring

Monitor

Slow Queries
Sequential Scans
Index Usage
Query Latency
Lock Wait Time
Deadlocks
CPU Usage
Buffer Cache Hit Ratio

Tools

PostgreSQL EXPLAIN ANALYZE
pg_stat_statements
MySQL EXPLAIN
Oracle AWR Reports
Datadog
Grafana

Common Mistakes

❌ Indexing every column

❌ Ignoring slow query logs

❌ Duplicate indexes

❌ Missing indexes on JOIN columns

❌ Not reviewing execution plans

❌ Keeping unused indexes

Best Practices

Index frequently queried columns.
Use B-Tree indexes for general-purpose queries.
Use composite indexes carefully based on query patterns.
Remove unused indexes periodically.
Analyze execution plans before creating indexes.
Monitor slow queries continuously.
Keep indexes as small as possible.
Rebuild fragmented indexes when necessary.

Common Interview Questions

What is a Database Index?

A database index is a data structure that allows the database to locate rows quickly without scanning the entire table.

Why are B-Tree indexes the default?

B-Tree indexes support equality searches, range queries, sorting, and ordered traversal, making them suitable for a wide variety of workloads.

What is the difference between a Clustered and a Non-Clustered Index?

A clustered index determines the physical order of rows in a table, while a non-clustered index stores pointers to the actual rows. A table typically has one clustered index but can have many non-clustered indexes.

What is a Composite Index?

A composite index contains multiple columns and is most effective when queries filter or sort using those columns in the same order as the index definition.

Why shouldn't every column be indexed?

Each additional index consumes storage and increases the cost of INSERT, UPDATE, and DELETE operations because the database must maintain every index when data changes.

Summary

Database indexes are fundamental to high-performance applications. They reduce query execution time, improve scalability, and enable efficient searching, sorting, and joining of large datasets.

In this article, we covered:

Index fundamentals
Sequential Scan vs Index Scan
B-Tree
Hash Index
Clustered & Non-Clustered Indexes
Composite Index
Covering Index
Query Execution Plans
Banking, Amazon, Netflix, and Uber examples
Monitoring
Best practices

Proper indexing is one of the highest-impact optimizations in database design. Combined with efficient SQL queries, caching, and good schema design, indexes enable applications to serve millions of requests with low latency while minimizing database resource consumption.

Loading likes...

Comments

Share a question, correction, or practical insight about this article.

Loading approved comments...