SQL Indexes
Master SQL indexes with Mermaid diagrams covering B-Tree, clustered, non-clustered, composite, and covering indexes with optimization techniques
SQL Indexes are data structures that dramatically improve query performance by enabling fast data retrieval. They work like a book's index, allowing direct access to data without scanning entire tables.
Index Performance Impact
graph LR
A[Query Execution] --> B{Index Available?}
B -->|No| C[Full Table Scan]
B -->|Yes| D[Index Lookup]
C --> C1[Scan all rows]
C --> C2[O n complexity]
C --> C3[Slow 2-5 seconds]
D --> D1[B-Tree traversal]
D --> D2[O log n complexity]
D --> D3[Fast 0.001 seconds]
style C fill:#F44336
style C3 fill:#F44336
style D fill:#4CAF50
style D3 fill:#4CAF50
Key Points:
- Without Index: Full table scan, O(n) complexity, slow for large tables
- With Index: B-Tree lookup, O(log n) complexity, 100-1000x faster
- Trade-off: Faster reads but slower writes (INSERT, UPDATE, DELETE)
- Storage: Indexes require additional disk space
- Maintenance: Database automatically maintains index structure
B-Tree Index Structure
graph TB
A[Root Node: 50] --> B[Branch: 25]
A --> C[Branch: 75]
B --> D[Leaf: 10, 20]
B --> E[Leaf: 30, 40]
C --> F[Leaf: 60, 70]
C --> G[Leaf: 80, 90]
D --> D1[Data Pointers]
E --> E1[Data Pointers]
F --> F1[Data Pointers]
G --> G1[Data Pointers]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#4CAF50
style D fill:#FF9800
Key Points:
- Balanced Tree: All leaf nodes at same depth
- Sorted Keys: Enables binary search efficiency
- Self-Balancing: Maintains balance on insert/delete
- Logarithmic Search: O(log n) lookup time
- Leaf Nodes: Contain actual data pointers
Index Types Overview
graph TB
A[SQL Index Types] --> B[Clustered]
A --> C[Non-Clustered]
A --> D[Unique]
A --> E[Composite]
A --> F[Covering]
B --> B1[Physical data order]
B --> B2[One per table]
C --> C1[Separate structure]
C --> C2[Multiple per table]
D --> D1[Enforces uniqueness]
D --> D2[No duplicates]
E --> E1[Multiple columns]
E --> E2[Order matters]
F --> F1[Includes all query columns]
F --> F2[No table access needed]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
style E fill:#F44336
style F fill:#00BCD4
Key Points:
- Clustered: Data physically sorted by index key, one per table
- Non-Clustered: Separate index with pointers, multiple allowed
- Unique: Ensures no duplicate values in indexed column(s)
- Composite: Index on multiple columns, left-most prefix rule
- Covering: Contains all columns needed by query
Clustered vs Non-Clustered
sequenceDiagram
participant Q as Query
participant CI as Clustered Index
participant NCI as Non-Clustered Index
participant Data as Data Pages
Note over Q,Data: Clustered Index Lookup
Q->>CI: Search for key
CI->>Data: Direct access (data is index)
Data->>Q: Return row
Note over Q,Data: Non-Clustered Index Lookup
Q->>NCI: Search for key
NCI->>NCI: Find pointer
NCI->>Data: Follow pointer to data
Data->>Q: Return row
Key Points:
- Clustered: One lookup, data is physically ordered
- Non-Clustered: Two lookups, index then data
- Primary Key: Usually creates clustered index automatically
- Performance: Clustered faster for range queries
- Flexibility: Non-clustered allows multiple indexes
Composite Index Left-Most Prefix
flowchart TB
A[Composite Index: A, B, C] --> B{Query Uses A?}
B -->|Yes| C{Query Uses B?}
B -->|No| D[Index NOT Used]
C -->|Yes| E{Query Uses C?}
C -->|No| F[Index Used: A]
E -->|Yes| G[Index Used: A, B, C]
E -->|No| H[Index Used: A, B]
style G fill:#4CAF50
style H fill:#4CAF50
style F fill:#4CAF50
style D fill:#F44336
Key Points:
- Left-Most Rule: Must use first column in index
- Partial Use: Can use prefix (A) or (A,B) but not (B) or (C)
- Column Order: Critical for index effectiveness
- Query Patterns: Design based on common WHERE clauses
- Multiple Indexes: May need separate indexes for different patterns
Covering Index Benefit
graph LR
A[Query: SELECT A, B, C WHERE A=?] --> B{Covering Index?}
B -->|No| C[Index Lookup]
C --> D[Table Access]
D --> E[Return Data]
B -->|Yes| F[Index Lookup Only]
F --> G[Return Data from Index]
style C fill:#FF9800
style D fill:#FF9800
style F fill:#4CAF50
style G fill:#4CAF50
Key Points:
- No Table Access: All data in index, faster retrieval
- Include Columns: Add SELECT columns to index
- Trade-off: Larger index size vs faster queries
- Best For: Frequently executed queries
- Optimization: Eliminates expensive table lookups
Index Selectivity
graph TB
A[Index Selectivity] --> B[High Selectivity]
A --> C[Low Selectivity]
B --> B1[Many unique values]
B --> B2[Email, SSN, ID]
B --> B3[Good for indexing]
C --> C1[Few unique values]
C --> C2[Gender, Boolean, Status]
C --> C3[Poor for indexing]
style B fill:#4CAF50
style B3 fill:#4CAF50
style C fill:#F44336
style C3 fill:#F44336
Key Points:
- High Selectivity: Many distinct values, index very effective
- Low Selectivity: Few distinct values, index less useful
- Calculation: Distinct values / Total rows
- Threshold: Generally need >10% selectivity
- Examples: Good: email, SSN; Bad: gender, boolean flags
Index Maintenance Impact
sequenceDiagram
participant App as Application
participant Table as Table
participant Idx1 as Index 1
participant Idx2 as Index 2
participant Idx3 as Index 3
App->>Table: INSERT new row
Table->>Table: Add row
Table->>Idx1: Update index
Table->>Idx2: Update index
Table->>Idx3: Update index
Note over Table,Idx3: More indexes = slower writes
Key Points:
- Write Overhead: Each index must be updated on INSERT/UPDATE/DELETE
- Balance: More indexes = faster reads, slower writes
- Monitoring: Track index usage, remove unused indexes
- Fragmentation: Indexes fragment over time, need rebuilding
- Best Practice: Only create indexes that provide significant benefit
Code Examples
Creating Indexes
-- Single column index
CREATE INDEX idx_email ON users(email);
-- Unique index
CREATE UNIQUE INDEX idx_username ON users(username);
-- Composite index (order matters!)
CREATE INDEX idx_name ON users(last_name, first_name);
-- Covering index
CREATE INDEX idx_user_summary
ON users(user_id, username, email, created_at);
-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users
ON users(email) WHERE is_active = TRUE;
-- Descending index
CREATE INDEX idx_recent_orders
ON orders(order_date DESC);
Composite Index Usage
-- Create composite index
CREATE INDEX idx_orders ON orders(customer_id, status, order_date);
-- ✅ Uses index (left-most prefix)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending' AND order_date > '2024-01-01';
-- ❌ Does NOT use index (skips left-most)
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE order_date > '2024-01-01';
SELECT * FROM orders WHERE status = 'pending' AND order_date > '2024-01-01';
Covering Index Example
-- Frequent query
SELECT order_id, customer_id, total_amount, status
FROM orders
WHERE customer_id = 123;
-- Create covering index (includes all SELECT columns)
CREATE INDEX idx_customer_orders
ON orders(customer_id, order_id, total_amount, status);
-- Query now uses only index, no table access needed
Index Analysis
-- Check if query uses index (MySQL)
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
-- View all indexes on table
SHOW INDEX FROM users;
-- Find unused indexes (PostgreSQL)
SELECT
schemaname,
tablename,
indexname,
idx_scan as scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;
-- Index fragmentation (SQL Server)
SELECT
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id;
Index Maintenance
-- Rebuild index (SQL Server)
ALTER INDEX idx_email ON users REBUILD;
-- Optimize table (MySQL)
OPTIMIZE TABLE users;
-- Reindex (PostgreSQL)
REINDEX TABLE users;
-- Drop unused index
DROP INDEX idx_old_column ON users;
Best Practices
- Index Foreign Keys: Always index columns used in JOINs
- Index WHERE Columns: Index frequently filtered columns
- Composite Order: Equality → Range → Sort
- Monitor Usage: Remove unused indexes regularly
- Avoid Over-Indexing: Each index slows writes
- High Selectivity: Index columns with many unique values
- Covering Indexes: For critical, frequent queries
- Rebuild Regularly: Maintain index health and performance
Comments
Share a question, correction, or practical insight about this article.
Checking login status...
Loading approved comments...