Full Stack • Java • System Design • Cloud • AI Engineering

SQL2026-06-12

SQL Indexes: Complete Guide to Database Performance

Master SQL indexes including B-Tree, clustered, non-clustered, composite, and covering indexes with visual diagrams and performance optimization techniques

SQL Indexes

📋 Table of Contents

  1. Introduction to Indexes
  2. How Indexes Work
  3. Index Types
  4. Creating Indexes
  5. Composite Indexes
  6. Index Performance
  7. Best Practices

Introduction to Indexes

SQL Indexes are data structures that improve query performance by allowing faster data retrieval.

┌─────────────────────────────────────────────────────────┐
│              Without Index vs With Index                 │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  WITHOUT INDEX (Full Table Scan)                        │
│  ┌──────────────────────────────────────────┐          │
│  │ Row 1 → Row 2 → Row 3 → ... → Row 1000  │          │
│  └──────────────────────────────────────────┘          │
│  Time: O(n) - Linear search                             │
│                                                          │
│  WITH INDEX (B-Tree Lookup)                             │
│  ┌──────────────────────────────────────────┐          │
│  │        Root → Branch → Leaf              │          │
│  └──────────────────────────────────────────┘          │
│  Time: O(log n) - Logarithmic search                    │
│                                                          │
└─────────────────────────────────────────────────────────┘

Why Use Indexes?

Benefit Description Impact
Faster Queries Reduces data scan time 10-100x faster
Efficient Sorting Pre-sorted data structure Eliminates sort operations
Quick Lookups Direct data access Constant time for unique keys
Join Performance Speeds up table joins Significant improvement

Trade-offs

Pros Cons
✅ Faster SELECT queries ❌ Slower INSERT/UPDATE/DELETE
✅ Efficient WHERE clauses ❌ Additional storage space
✅ Quick ORDER BY ❌ Index maintenance overhead
✅ Improved JOIN performance ❌ Can slow down writes

How Indexes Work

B-Tree Index Structure

                    [50]
                   /    \
                  /      \
              [25]        [75]
             /    \      /    \
           /      \    /      \
        [10,20] [30,40] [60,70] [80,90]
         |  |    |  |    |  |    |  |
        Data    Data    Data    Data

B-Tree Properties:

  • Balanced tree structure
  • Sorted keys for efficient searching
  • Logarithmic search time O(log n)
  • Self-balancing on insertions/deletions

Index Lookup Process

┌─────────────────────────────────────────────────────────┐
│              Index Lookup Steps                          │
└─────────────────────────────────────────────────────────┘

1. Query: SELECT * FROM users WHERE user_id = 42

2. Index Traversal:
   Root Node    → Compare 42 with 50 → Go LEFT
   Branch Node  → Compare 42 with 25 → Go RIGHT
   Leaf Node    → Find 42 → Get Row Pointer

3. Data Retrieval:
   Follow pointer → Fetch actual row data

Total: 3-4 disk reads vs 1000+ without index

Index Types

1. Clustered Index

Physical order of data matches index order. One per table.

┌─────────────────────────────────────────────────────────┐
│              Clustered Index                             │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Index (Primary Key)        Actual Data                 │
│  ┌────┐                     ┌────────────┐             │
│  │ 1  │ ──────────────────→ │ Row 1 Data │             │
│  │ 2  │ ──────────────────→ │ Row 2 Data │             │
│  │ 3  │ ──────────────────→ │ Row 3 Data │             │
│  └────┘                     └────────────┘             │
│                                                          │
│  Data is physically sorted by index key                 │
│                                                          │
└─────────────────────────────────────────────────────────┘
-- Clustered index (automatically created with PRIMARY KEY)
CREATE TABLE users (
    user_id INT PRIMARY KEY,  -- Clustered index
    username VARCHAR(50),
    email VARCHAR(100)
);

2. Non-Clustered Index

Separate structure with pointers to data. Multiple per table.

┌─────────────────────────────────────────────────────────┐
│            Non-Clustered Index                           │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Index (email)              Actual Data                 │
│  ┌──────────────┐           ┌────────────┐             │
│  │ alice@...    │ ────────→ │ Row 3 Data │             │
│  │ bob@...      │ ────────→ │ Row 1 Data │             │
│  │ charlie@...  │ ────────→ │ Row 2 Data │             │
│  └──────────────┘           └────────────┘             │
│                                                          │
│  Index is separate, points to data location             │
│                                                          │
└─────────────────────────────────────────────────────────┘
-- Non-clustered index
CREATE INDEX idx_email ON users(email);
CREATE INDEX idx_username ON users(username);

3. Unique Index

Ensures column values are unique.

-- Unique index
CREATE UNIQUE INDEX idx_email_unique ON users(email);

-- Automatically created with UNIQUE constraint
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

4. Composite Index

Index on multiple columns.

-- Composite index (order matters!)
CREATE INDEX idx_name ON users(last_name, first_name);

-- Efficient for:
-- ✅ WHERE last_name = 'Smith'
-- ✅ WHERE last_name = 'Smith' AND first_name = 'John'
-- ❌ WHERE first_name = 'John' (doesn't use index)

5. Covering Index

Index contains all columns needed by query.

-- Covering index
CREATE INDEX idx_user_details ON users(user_id, username, email);

-- This query uses only the index (no table access)
SELECT user_id, username, email
FROM users
WHERE user_id = 42;

Creating Indexes

Basic Syntax

-- Create index
CREATE INDEX index_name ON table_name(column_name);

-- Create unique index
CREATE UNIQUE INDEX index_name ON table_name(column_name);

-- Create composite index
CREATE INDEX index_name ON table_name(col1, col2, col3);

Practical Examples

-- Single column index
CREATE INDEX idx_email ON customers(email);

-- Composite index for common query patterns
CREATE INDEX idx_order_lookup ON orders(customer_id, order_date);

-- Covering index for specific query
CREATE INDEX idx_order_summary 
ON orders(customer_id, order_date, total_amount, status);

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;

-- Descending index
CREATE INDEX idx_recent_orders ON orders(order_date DESC);

Viewing Indexes

-- MySQL
SHOW INDEX FROM users;

-- SQL Server
EXEC sp_helpindex 'users';

-- PostgreSQL
\d users
SELECT * FROM pg_indexes WHERE tablename = 'users';

-- Check index usage
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

Dropping Indexes

-- Drop index
DROP INDEX idx_email ON users;  -- MySQL
DROP INDEX idx_email;            -- PostgreSQL, SQL Server

Composite Indexes

Column Order Matters

┌─────────────────────────────────────────────────────────┐
│          Composite Index: (A, B, C)                      │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Can be used for:                                       │
│  ✅ WHERE A = ?                                         │
│  ✅ WHERE A = ? AND B = ?                               │
│  ✅ WHERE A = ? AND B = ? AND C = ?                     │
│                                                          │
│  Cannot be used for:                                    │
│  ❌ WHERE B = ?                                         │
│  ❌ WHERE C = ?                                         │
│  ❌ WHERE B = ? AND C = ?                               │
│                                                          │
└─────────────────────────────────────────────────────────┘

Example

-- Create composite index
CREATE INDEX idx_user_activity ON user_logs(user_id, action_date, action_type);

-- ✅ Uses index (left-most prefix)
SELECT * FROM user_logs WHERE user_id = 123;
SELECT * FROM user_logs WHERE user_id = 123 AND action_date = '2024-01-01';
SELECT * FROM user_logs WHERE user_id = 123 AND action_date = '2024-01-01' AND action_type = 'login';

-- ❌ Doesn't use index (skips left-most column)
SELECT * FROM user_logs WHERE action_date = '2024-01-01';
SELECT * FROM user_logs WHERE action_type = 'login';

Choosing Column Order

-- Order by: Equality → Range → Sort
CREATE INDEX idx_orders ON orders(
    customer_id,    -- 1. Equality (WHERE customer_id = ?)
    status,         -- 2. Equality (WHERE status = ?)
    order_date      -- 3. Range/Sort (WHERE order_date > ? ORDER BY order_date)
);

Index Performance

Query Performance Comparison

┌─────────────────────────────────────────────────────────┐
│         Performance: No Index vs Index                   │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Table Size: 1,000,000 rows                             │
│                                                          │
│  Query: SELECT * FROM users WHERE email = 'user@...'   │
│                                                          │
│  Without Index:                                         │
│  ├─ Full table scan: 1,000,000 rows                    │
│  ├─ Time: ~2.5 seconds                                 │
│  └─ Disk I/O: High                                     │
│                                                          │
│  With Index:                                            │
│  ├─ Index lookup: ~20 comparisons                      │
│  ├─ Time: ~0.001 seconds                               │
│  └─ Disk I/O: Minimal                                  │
│                                                          │
│  Performance Gain: 2500x faster                         │
│                                                          │
└─────────────────────────────────────────────────────────┘

When Indexes Help

-- ✅ Indexes help with:
SELECT * FROM users WHERE user_id = 123;           -- Equality
SELECT * FROM users WHERE age > 25;                -- Range
SELECT * FROM users ORDER BY created_at DESC;      -- Sorting
SELECT * FROM users WHERE email LIKE 'john%';      -- Prefix search

-- ❌ Indexes don't help with:
SELECT * FROM users WHERE email LIKE '%@gmail.com'; -- Wildcard at start
SELECT * FROM users WHERE YEAR(created_at) = 2024;  -- Function on column
SELECT * FROM small_table WHERE ...;                -- Small tables

Index Selectivity

-- High selectivity (good for indexing)
CREATE INDEX idx_email ON users(email);        -- Unique values
CREATE INDEX idx_ssn ON users(ssn);            -- Unique values

-- Low selectivity (poor for indexing)
-- Don't index: gender (2 values), boolean flags, status with few values
-- CREATE INDEX idx_gender ON users(gender);   -- ❌ Not recommended

Best Practices

1. Index Foreign Keys

-- ✅ GOOD: Index foreign keys for joins
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE INDEX idx_customer_id ON orders(customer_id);

2. Index WHERE Clause Columns

-- Analyze common queries
-- Query: SELECT * FROM orders WHERE status = 'pending' AND created_at > '2024-01-01'

-- ✅ GOOD: Create composite index
CREATE INDEX idx_status_date ON orders(status, created_at);

3. Use Covering Indexes for Frequent Queries

-- Frequent query
SELECT order_id, customer_id, total_amount
FROM orders
WHERE customer_id = 123;

-- ✅ GOOD: Covering index (includes all selected columns)
CREATE INDEX idx_customer_covering 
ON orders(customer_id, order_id, total_amount);

4. Avoid Over-Indexing

CREATE INDEX idx_location ON users(state, city);


### 5. Monitor Index Usage

```sql
-- MySQL: Check index usage
SELECT 
    TABLE_NAME,
    INDEX_NAME,
    SEQ_IN_INDEX,
    COLUMN_NAME
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'your_database';

-- Find unused indexes (PostgreSQL)
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;

6. Rebuild Fragmented Indexes

-- MySQL: Optimize table
OPTIMIZE TABLE users;

-- SQL Server: Rebuild index
ALTER INDEX idx_email ON users REBUILD;

-- PostgreSQL: Reindex
REINDEX TABLE users;

Index Decision Tree

Should I create an index?
├─ Is column used in WHERE/JOIN? → YES
│  ├─ High cardinality (many unique values)? → YES → CREATE INDEX
│  └─ Low cardinality (few unique values)? → NO → Skip
├─ Is column used in ORDER BY? → YES → CREATE INDEX
├─ Is table small (< 1000 rows)? → NO → Skip
└─ Will it slow down writes significantly? → YES → Reconsider

Summary

This guide covered:

  • ✅ Index fundamentals and B-Tree structure
  • ✅ Clustered vs non-clustered indexes
  • ✅ Composite and covering indexes
  • ✅ Index performance impact
  • ✅ Best practices for index management

Key Takeaways:

  1. Index columns used in WHERE, JOIN, and ORDER BY
  2. Use composite indexes for multi-column queries
  3. Column order matters in composite indexes
  4. Monitor and remove unused indexes
  5. Balance read performance vs write overhead
  6. Rebuild fragmented indexes regularly