SQL Indexes: Complete Guide to Database Performance
Master SQL indexes including B-Tree, clustered, non-clustered, composite, and covering indexes with visual diagrams and performance optimization techniques
SQL Indexes
📋 Table of Contents
- Introduction to Indexes
- How Indexes Work
- Index Types
- Creating Indexes
- Composite Indexes
- Index Performance
- Best Practices
Introduction to Indexes
SQL Indexes are data structures that improve query performance by allowing faster data retrieval.
┌─────────────────────────────────────────────────────────┐
│ Without Index vs With Index │
├─────────────────────────────────────────────────────────┤
│ │
│ WITHOUT INDEX (Full Table Scan) │
│ ┌──────────────────────────────────────────┐ │
│ │ Row 1 → Row 2 → Row 3 → ... → Row 1000 │ │
│ └──────────────────────────────────────────┘ │
│ Time: O(n) - Linear search │
│ │
│ WITH INDEX (B-Tree Lookup) │
│ ┌──────────────────────────────────────────┐ │
│ │ Root → Branch → Leaf │ │
│ └──────────────────────────────────────────┘ │
│ Time: O(log n) - Logarithmic search │
│ │
└─────────────────────────────────────────────────────────┘
Why Use Indexes?
| Benefit | Description | Impact |
|---|---|---|
| Faster Queries | Reduces data scan time | 10-100x faster |
| Efficient Sorting | Pre-sorted data structure | Eliminates sort operations |
| Quick Lookups | Direct data access | Constant time for unique keys |
| Join Performance | Speeds up table joins | Significant improvement |
Trade-offs
| Pros | Cons |
|---|---|
| ✅ Faster SELECT queries | ❌ Slower INSERT/UPDATE/DELETE |
| ✅ Efficient WHERE clauses | ❌ Additional storage space |
| ✅ Quick ORDER BY | ❌ Index maintenance overhead |
| ✅ Improved JOIN performance | ❌ Can slow down writes |
How Indexes Work
B-Tree Index Structure
[50]
/ \
/ \
[25] [75]
/ \ / \
/ \ / \
[10,20] [30,40] [60,70] [80,90]
| | | | | | | |
Data Data Data Data
B-Tree Properties:
- Balanced tree structure
- Sorted keys for efficient searching
- Logarithmic search time O(log n)
- Self-balancing on insertions/deletions
Index Lookup Process
┌─────────────────────────────────────────────────────────┐
│ Index Lookup Steps │
└─────────────────────────────────────────────────────────┘
1. Query: SELECT * FROM users WHERE user_id = 42
2. Index Traversal:
Root Node → Compare 42 with 50 → Go LEFT
Branch Node → Compare 42 with 25 → Go RIGHT
Leaf Node → Find 42 → Get Row Pointer
3. Data Retrieval:
Follow pointer → Fetch actual row data
Total: 3-4 disk reads vs 1000+ without index
Index Types
1. Clustered Index
Physical order of data matches index order. One per table.
┌─────────────────────────────────────────────────────────┐
│ Clustered Index │
├─────────────────────────────────────────────────────────┤
│ │
│ Index (Primary Key) Actual Data │
│ ┌────┐ ┌────────────┐ │
│ │ 1 │ ──────────────────→ │ Row 1 Data │ │
│ │ 2 │ ──────────────────→ │ Row 2 Data │ │
│ │ 3 │ ──────────────────→ │ Row 3 Data │ │
│ └────┘ └────────────┘ │
│ │
│ Data is physically sorted by index key │
│ │
└─────────────────────────────────────────────────────────┘
-- Clustered index (automatically created with PRIMARY KEY)
CREATE TABLE users (
user_id INT PRIMARY KEY, -- Clustered index
username VARCHAR(50),
email VARCHAR(100)
);
2. Non-Clustered Index
Separate structure with pointers to data. Multiple per table.
┌─────────────────────────────────────────────────────────┐
│ Non-Clustered Index │
├─────────────────────────────────────────────────────────┤
│ │
│ Index (email) Actual Data │
│ ┌──────────────┐ ┌────────────┐ │
│ │ alice@... │ ────────→ │ Row 3 Data │ │
│ │ bob@... │ ────────→ │ Row 1 Data │ │
│ │ charlie@... │ ────────→ │ Row 2 Data │ │
│ └──────────────┘ └────────────┘ │
│ │
│ Index is separate, points to data location │
│ │
└─────────────────────────────────────────────────────────┘
-- Non-clustered index
CREATE INDEX idx_email ON users(email);
CREATE INDEX idx_username ON users(username);
3. Unique Index
Ensures column values are unique.
-- Unique index
CREATE UNIQUE INDEX idx_email_unique ON users(email);
-- Automatically created with UNIQUE constraint
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);
4. Composite Index
Index on multiple columns.
-- Composite index (order matters!)
CREATE INDEX idx_name ON users(last_name, first_name);
-- Efficient for:
-- ✅ WHERE last_name = 'Smith'
-- ✅ WHERE last_name = 'Smith' AND first_name = 'John'
-- ❌ WHERE first_name = 'John' (doesn't use index)
5. Covering Index
Index contains all columns needed by query.
-- Covering index
CREATE INDEX idx_user_details ON users(user_id, username, email);
-- This query uses only the index (no table access)
SELECT user_id, username, email
FROM users
WHERE user_id = 42;
Creating Indexes
Basic Syntax
-- Create index
CREATE INDEX index_name ON table_name(column_name);
-- Create unique index
CREATE UNIQUE INDEX index_name ON table_name(column_name);
-- Create composite index
CREATE INDEX index_name ON table_name(col1, col2, col3);
Practical Examples
-- Single column index
CREATE INDEX idx_email ON customers(email);
-- Composite index for common query patterns
CREATE INDEX idx_order_lookup ON orders(customer_id, order_date);
-- Covering index for specific query
CREATE INDEX idx_order_summary
ON orders(customer_id, order_date, total_amount, status);
-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;
-- Descending index
CREATE INDEX idx_recent_orders ON orders(order_date DESC);
Viewing Indexes
-- MySQL
SHOW INDEX FROM users;
-- SQL Server
EXEC sp_helpindex 'users';
-- PostgreSQL
\d users
SELECT * FROM pg_indexes WHERE tablename = 'users';
-- Check index usage
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
Dropping Indexes
-- Drop index
DROP INDEX idx_email ON users; -- MySQL
DROP INDEX idx_email; -- PostgreSQL, SQL Server
Composite Indexes
Column Order Matters
┌─────────────────────────────────────────────────────────┐
│ Composite Index: (A, B, C) │
├─────────────────────────────────────────────────────────┤
│ │
│ Can be used for: │
│ ✅ WHERE A = ? │
│ ✅ WHERE A = ? AND B = ? │
│ ✅ WHERE A = ? AND B = ? AND C = ? │
│ │
│ Cannot be used for: │
│ ❌ WHERE B = ? │
│ ❌ WHERE C = ? │
│ ❌ WHERE B = ? AND C = ? │
│ │
└─────────────────────────────────────────────────────────┘
Example
-- Create composite index
CREATE INDEX idx_user_activity ON user_logs(user_id, action_date, action_type);
-- ✅ Uses index (left-most prefix)
SELECT * FROM user_logs WHERE user_id = 123;
SELECT * FROM user_logs WHERE user_id = 123 AND action_date = '2024-01-01';
SELECT * FROM user_logs WHERE user_id = 123 AND action_date = '2024-01-01' AND action_type = 'login';
-- ❌ Doesn't use index (skips left-most column)
SELECT * FROM user_logs WHERE action_date = '2024-01-01';
SELECT * FROM user_logs WHERE action_type = 'login';
Choosing Column Order
-- Order by: Equality → Range → Sort
CREATE INDEX idx_orders ON orders(
customer_id, -- 1. Equality (WHERE customer_id = ?)
status, -- 2. Equality (WHERE status = ?)
order_date -- 3. Range/Sort (WHERE order_date > ? ORDER BY order_date)
);
Index Performance
Query Performance Comparison
┌─────────────────────────────────────────────────────────┐
│ Performance: No Index vs Index │
├─────────────────────────────────────────────────────────┤
│ │
│ Table Size: 1,000,000 rows │
│ │
│ Query: SELECT * FROM users WHERE email = 'user@...' │
│ │
│ Without Index: │
│ ├─ Full table scan: 1,000,000 rows │
│ ├─ Time: ~2.5 seconds │
│ └─ Disk I/O: High │
│ │
│ With Index: │
│ ├─ Index lookup: ~20 comparisons │
│ ├─ Time: ~0.001 seconds │
│ └─ Disk I/O: Minimal │
│ │
│ Performance Gain: 2500x faster │
│ │
└─────────────────────────────────────────────────────────┘
When Indexes Help
-- ✅ Indexes help with:
SELECT * FROM users WHERE user_id = 123; -- Equality
SELECT * FROM users WHERE age > 25; -- Range
SELECT * FROM users ORDER BY created_at DESC; -- Sorting
SELECT * FROM users WHERE email LIKE 'john%'; -- Prefix search
-- ❌ Indexes don't help with:
SELECT * FROM users WHERE email LIKE '%@gmail.com'; -- Wildcard at start
SELECT * FROM users WHERE YEAR(created_at) = 2024; -- Function on column
SELECT * FROM small_table WHERE ...; -- Small tables
Index Selectivity
-- High selectivity (good for indexing)
CREATE INDEX idx_email ON users(email); -- Unique values
CREATE INDEX idx_ssn ON users(ssn); -- Unique values
-- Low selectivity (poor for indexing)
-- Don't index: gender (2 values), boolean flags, status with few values
-- CREATE INDEX idx_gender ON users(gender); -- ❌ Not recommended
Best Practices
1. Index Foreign Keys
-- ✅ GOOD: Index foreign keys for joins
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE INDEX idx_customer_id ON orders(customer_id);
2. Index WHERE Clause Columns
-- Analyze common queries
-- Query: SELECT * FROM orders WHERE status = 'pending' AND created_at > '2024-01-01'
-- ✅ GOOD: Create composite index
CREATE INDEX idx_status_date ON orders(status, created_at);
3. Use Covering Indexes for Frequent Queries
-- Frequent query
SELECT order_id, customer_id, total_amount
FROM orders
WHERE customer_id = 123;
-- ✅ GOOD: Covering index (includes all selected columns)
CREATE INDEX idx_customer_covering
ON orders(customer_id, order_id, total_amount);
4. Avoid Over-Indexing
CREATE INDEX idx_location ON users(state, city);
### 5. Monitor Index Usage
```sql
-- MySQL: Check index usage
SELECT
TABLE_NAME,
INDEX_NAME,
SEQ_IN_INDEX,
COLUMN_NAME
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'your_database';
-- Find unused indexes (PostgreSQL)
SELECT
schemaname,
tablename,
indexname,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;
6. Rebuild Fragmented Indexes
-- MySQL: Optimize table
OPTIMIZE TABLE users;
-- SQL Server: Rebuild index
ALTER INDEX idx_email ON users REBUILD;
-- PostgreSQL: Reindex
REINDEX TABLE users;
Index Decision Tree
Should I create an index?
├─ Is column used in WHERE/JOIN? → YES
│ ├─ High cardinality (many unique values)? → YES → CREATE INDEX
│ └─ Low cardinality (few unique values)? → NO → Skip
├─ Is column used in ORDER BY? → YES → CREATE INDEX
├─ Is table small (< 1000 rows)? → NO → Skip
└─ Will it slow down writes significantly? → YES → Reconsider
Summary
This guide covered:
- ✅ Index fundamentals and B-Tree structure
- ✅ Clustered vs non-clustered indexes
- ✅ Composite and covering indexes
- ✅ Index performance impact
- ✅ Best practices for index management
Key Takeaways:
- Index columns used in WHERE, JOIN, and ORDER BY
- Use composite indexes for multi-column queries
- Column order matters in composite indexes
- Monitor and remove unused indexes
- Balance read performance vs write overhead
- Rebuild fragmented indexes regularly