Full Stack • Java • System Design • Cloud • AI Engineering

SQL Indexes

Master SQL indexes with Mermaid diagrams covering B-Tree, clustered, non-clustered, composite, and covering indexes with optimization techniques

SQL Indexes are data structures that dramatically improve query performance by enabling fast data retrieval. They work like a book's index, allowing direct access to data without scanning entire tables.

Index Performance Impact

graph LR
    A[Query Execution] --> B{Index Available?}
    B -->|No| C[Full Table Scan]
    B -->|Yes| D[Index Lookup]
    
    C --> C1[Scan all rows]
    C --> C2[O n complexity]
    C --> C3[Slow 2-5 seconds]
    
    D --> D1[B-Tree traversal]
    D --> D2[O log n complexity]
    D --> D3[Fast 0.001 seconds]
    
    style C fill:#F44336
    style C3 fill:#F44336
    style D fill:#4CAF50
    style D3 fill:#4CAF50

Key Points:

  • Without Index: Full table scan, O(n) complexity, slow for large tables
  • With Index: B-Tree lookup, O(log n) complexity, 100-1000x faster
  • Trade-off: Faster reads but slower writes (INSERT, UPDATE, DELETE)
  • Storage: Indexes require additional disk space
  • Maintenance: Database automatically maintains index structure

B-Tree Index Structure

graph TB
    A[Root Node: 50] --> B[Branch: 25]
    A --> C[Branch: 75]
    B --> D[Leaf: 10, 20]
    B --> E[Leaf: 30, 40]
    C --> F[Leaf: 60, 70]
    C --> G[Leaf: 80, 90]
    
    D --> D1[Data Pointers]
    E --> E1[Data Pointers]
    F --> F1[Data Pointers]
    G --> G1[Data Pointers]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#4CAF50
    style D fill:#FF9800

Key Points:

  • Balanced Tree: All leaf nodes at same depth
  • Sorted Keys: Enables binary search efficiency
  • Self-Balancing: Maintains balance on insert/delete
  • Logarithmic Search: O(log n) lookup time
  • Leaf Nodes: Contain actual data pointers

Index Types Overview

graph TB
    A[SQL Index Types] --> B[Clustered]
    A --> C[Non-Clustered]
    A --> D[Unique]
    A --> E[Composite]
    A --> F[Covering]
    
    B --> B1[Physical data order]
    B --> B2[One per table]
    
    C --> C1[Separate structure]
    C --> C2[Multiple per table]
    
    D --> D1[Enforces uniqueness]
    D --> D2[No duplicates]
    
    E --> E1[Multiple columns]
    E --> E2[Order matters]
    
    F --> F1[Includes all query columns]
    F --> F2[No table access needed]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336
    style F fill:#00BCD4

Key Points:

  • Clustered: Data physically sorted by index key, one per table
  • Non-Clustered: Separate index with pointers, multiple allowed
  • Unique: Ensures no duplicate values in indexed column(s)
  • Composite: Index on multiple columns, left-most prefix rule
  • Covering: Contains all columns needed by query

Clustered vs Non-Clustered

sequenceDiagram
    participant Q as Query
    participant CI as Clustered Index
    participant NCI as Non-Clustered Index
    participant Data as Data Pages
    
    Note over Q,Data: Clustered Index Lookup
    Q->>CI: Search for key
    CI->>Data: Direct access (data is index)
    Data->>Q: Return row
    
    Note over Q,Data: Non-Clustered Index Lookup
    Q->>NCI: Search for key
    NCI->>NCI: Find pointer
    NCI->>Data: Follow pointer to data
    Data->>Q: Return row

Key Points:

  • Clustered: One lookup, data is physically ordered
  • Non-Clustered: Two lookups, index then data
  • Primary Key: Usually creates clustered index automatically
  • Performance: Clustered faster for range queries
  • Flexibility: Non-clustered allows multiple indexes

Composite Index Left-Most Prefix

flowchart TB
    A[Composite Index: A, B, C] --> B{Query Uses A?}
    B -->|Yes| C{Query Uses B?}
    B -->|No| D[Index NOT Used]
    
    C -->|Yes| E{Query Uses C?}
    C -->|No| F[Index Used: A]
    
    E -->|Yes| G[Index Used: A, B, C]
    E -->|No| H[Index Used: A, B]
    
    style G fill:#4CAF50
    style H fill:#4CAF50
    style F fill:#4CAF50
    style D fill:#F44336

Key Points:

  • Left-Most Rule: Must use first column in index
  • Partial Use: Can use prefix (A) or (A,B) but not (B) or (C)
  • Column Order: Critical for index effectiveness
  • Query Patterns: Design based on common WHERE clauses
  • Multiple Indexes: May need separate indexes for different patterns

Covering Index Benefit

graph LR
    A[Query: SELECT A, B, C WHERE A=?] --> B{Covering Index?}
    B -->|No| C[Index Lookup]
    C --> D[Table Access]
    D --> E[Return Data]
    
    B -->|Yes| F[Index Lookup Only]
    F --> G[Return Data from Index]
    
    style C fill:#FF9800
    style D fill:#FF9800
    style F fill:#4CAF50
    style G fill:#4CAF50

Key Points:

  • No Table Access: All data in index, faster retrieval
  • Include Columns: Add SELECT columns to index
  • Trade-off: Larger index size vs faster queries
  • Best For: Frequently executed queries
  • Optimization: Eliminates expensive table lookups

Index Selectivity

graph TB
    A[Index Selectivity] --> B[High Selectivity]
    A --> C[Low Selectivity]
    
    B --> B1[Many unique values]
    B --> B2[Email, SSN, ID]
    B --> B3[Good for indexing]
    
    C --> C1[Few unique values]
    C --> C2[Gender, Boolean, Status]
    C --> C3[Poor for indexing]
    
    style B fill:#4CAF50
    style B3 fill:#4CAF50
    style C fill:#F44336
    style C3 fill:#F44336

Key Points:

  • High Selectivity: Many distinct values, index very effective
  • Low Selectivity: Few distinct values, index less useful
  • Calculation: Distinct values / Total rows
  • Threshold: Generally need >10% selectivity
  • Examples: Good: email, SSN; Bad: gender, boolean flags

Index Maintenance Impact

sequenceDiagram
    participant App as Application
    participant Table as Table
    participant Idx1 as Index 1
    participant Idx2 as Index 2
    participant Idx3 as Index 3
    
    App->>Table: INSERT new row
    Table->>Table: Add row
    Table->>Idx1: Update index
    Table->>Idx2: Update index
    Table->>Idx3: Update index
    
    Note over Table,Idx3: More indexes = slower writes

Key Points:

  • Write Overhead: Each index must be updated on INSERT/UPDATE/DELETE
  • Balance: More indexes = faster reads, slower writes
  • Monitoring: Track index usage, remove unused indexes
  • Fragmentation: Indexes fragment over time, need rebuilding
  • Best Practice: Only create indexes that provide significant benefit

Code Examples

Creating Indexes

-- Single column index
CREATE INDEX idx_email ON users(email);

-- Unique index
CREATE UNIQUE INDEX idx_username ON users(username);

-- Composite index (order matters!)
CREATE INDEX idx_name ON users(last_name, first_name);

-- Covering index
CREATE INDEX idx_user_summary 
ON users(user_id, username, email, created_at);

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users 
ON users(email) WHERE is_active = TRUE;

-- Descending index
CREATE INDEX idx_recent_orders 
ON orders(order_date DESC);

Composite Index Usage

-- Create composite index
CREATE INDEX idx_orders ON orders(customer_id, status, order_date);

-- ✅ Uses index (left-most prefix)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending' AND order_date > '2024-01-01';

-- ❌ Does NOT use index (skips left-most)
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE order_date > '2024-01-01';
SELECT * FROM orders WHERE status = 'pending' AND order_date > '2024-01-01';

Covering Index Example

-- Frequent query
SELECT order_id, customer_id, total_amount, status
FROM orders
WHERE customer_id = 123;

-- Create covering index (includes all SELECT columns)
CREATE INDEX idx_customer_orders 
ON orders(customer_id, order_id, total_amount, status);

-- Query now uses only index, no table access needed

Index Analysis

-- Check if query uses index (MySQL)
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

-- View all indexes on table
SHOW INDEX FROM users;

-- Find unused indexes (PostgreSQL)
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan as scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;

-- Index fragmentation (SQL Server)
SELECT 
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id;

Index Maintenance

-- Rebuild index (SQL Server)
ALTER INDEX idx_email ON users REBUILD;

-- Optimize table (MySQL)
OPTIMIZE TABLE users;

-- Reindex (PostgreSQL)
REINDEX TABLE users;

-- Drop unused index
DROP INDEX idx_old_column ON users;

Best Practices

  1. Index Foreign Keys: Always index columns used in JOINs
  2. Index WHERE Columns: Index frequently filtered columns
  3. Composite Order: Equality → Range → Sort
  4. Monitor Usage: Remove unused indexes regularly
  5. Avoid Over-Indexing: Each index slows writes
  6. High Selectivity: Index columns with many unique values
  7. Covering Indexes: For critical, frequent queries
  8. Rebuild Regularly: Maintain index health and performance

Loading likes...

Comments

Share a question, correction, or practical insight about this article.

Loading approved comments...