SQL Window Functions: Complete Guide with Visual Examples
Master SQL window functions including ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, and aggregate window functions with visual data flow diagrams
SQL Window Functions
📋 Table of Contents
- Introduction to Window Functions
- ROW_NUMBER
- RANK and DENSE_RANK
- LEAD and LAG
- Aggregate Window Functions
- PARTITION BY
- Best Practices
Introduction to Window Functions
Window Functions perform calculations across a set of rows related to the current row without collapsing the result set.
┌─────────────────────────────────────────────────────────┐
│ Regular Aggregate vs Window Function │
├─────────────────────────────────────────────────────────┤
│ │
│ REGULAR AGGREGATE (GROUP BY) │
│ Input: 10 rows → Output: 3 rows (collapsed) │
│ ┌──────────┐ │
│ │ 10 rows │ ──GROUP BY──→ │ 3 rows │ │
│ └──────────┘ │
│ │
│ WINDOW FUNCTION (OVER) │
│ Input: 10 rows → Output: 10 rows (preserved) │
│ ┌──────────┐ │
│ │ 10 rows │ ──OVER──→ │ 10 rows + calculations │ │
│ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
Basic Syntax
function_name() OVER (
[PARTITION BY column]
[ORDER BY column]
[ROWS/RANGE frame_specification]
)
Sample Data
CREATE TABLE sales (
sale_id INT,
employee_name VARCHAR(50),
department VARCHAR(50),
sale_amount DECIMAL(10,2),
sale_date DATE
);
INSERT INTO sales VALUES
(1, 'Alice', 'Electronics', 1500.00, '2024-01-15'),
(2, 'Bob', 'Electronics', 2000.00, '2024-01-16'),
(3, 'Charlie', 'Electronics', 1800.00, '2024-01-17'),
(4, 'David', 'Clothing', 1200.00, '2024-01-15'),
(5, 'Eve', 'Clothing', 1500.00, '2024-01-16'),
(6, 'Frank', 'Clothing', 1000.00, '2024-01-17');
ROW_NUMBER
Assigns a unique sequential number to each row within a partition.
Data Flow Diagram
┌─────────────────────────────────────────────────────────┐
│ ROW_NUMBER() Data Flow │
└─────────────────────────────────────────────────────────┘
Input Data (Ordered by sale_amount DESC):
┌──────────┬────────────┬──────────────┐
│ Name │ Department │ Sale Amount │
├──────────┼────────────┼──────────────┤
│ Bob │Electronics │ 2000.00 │
│ Charlie │Electronics │ 1800.00 │
│ Alice │Electronics │ 1500.00 │
│ Eve │ Clothing │ 1500.00 │
│ David │ Clothing │ 1200.00 │
│ Frank │ Clothing │ 1000.00 │
└──────────┴────────────┴──────────────┘
↓
ROW_NUMBER() OVER (ORDER BY sale_amount DESC)
↓
Output with Row Numbers:
┌──────────┬────────────┬──────────────┬────────┐
│ Name │ Department │ Sale Amount │ Row # │
├──────────┼────────────┼──────────────┼────────┤
│ Bob │Electronics │ 2000.00 │ 1 │
│ Charlie │Electronics │ 1800.00 │ 2 │
│ Alice │Electronics │ 1500.00 │ 3 │
│ Eve │ Clothing │ 1500.00 │ 4 │
│ David │ Clothing │ 1200.00 │ 5 │
│ Frank │ Clothing │ 1000.00 │ 6 │
└──────────┴────────────┴──────────────┴────────┘
Examples
-- Basic ROW_NUMBER
SELECT
employee_name,
sale_amount,
ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num
FROM sales;
-- ROW_NUMBER with PARTITION BY
SELECT
employee_name,
department,
sale_amount,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY sale_amount DESC
) AS dept_rank
FROM sales;
-- Result:
-- employee_name | department | sale_amount | dept_rank
-- Bob | Electronics | 2000.00 | 1
-- Charlie | Electronics | 1800.00 | 2
-- Alice | Electronics | 1500.00 | 3
-- Eve | Clothing | 1500.00 | 1
-- David | Clothing | 1200.00 | 2
-- Frank | Clothing | 1000.00 | 3
-- Use case: Get top 2 from each department
WITH ranked_sales AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY sale_amount DESC
) AS rn
FROM sales
)
SELECT employee_name, department, sale_amount
FROM ranked_sales
WHERE rn <= 2;
RANK and DENSE_RANK
Comparison Diagram
┌─────────────────────────────────────────────────────────┐
│ ROW_NUMBER vs RANK vs DENSE_RANK │
└─────────────────────────────────────────────────────────┘
Input Data (with ties):
┌──────────┬──────────────┐
│ Name │ Sale Amount │
├──────────┼──────────────┤
│ Bob │ 2000.00 │
│ Charlie │ 1800.00 │
│ Alice │ 1500.00 │ ← Tie
│ Eve │ 1500.00 │ ← Tie
│ David │ 1200.00 │
└──────────┴──────────────┘
ROW_NUMBER() RANK() DENSE_RANK()
1 1 1 Bob (2000)
2 2 2 Charlie (1800)
3 3 3 Alice (1500) ← Tie
4 3 3 Eve (1500) ← Tie
5 5 4 David (1200)
↑ ↑ ↑
Always Skips 4 No gaps
unique after tie
Examples
-- Compare all three
SELECT
employee_name,
sale_amount,
ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num,
RANK() OVER (ORDER BY sale_amount DESC) AS rank,
DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS dense_rank
FROM sales;
-- RANK with PARTITION BY
SELECT
employee_name,
department,
sale_amount,
RANK() OVER (
PARTITION BY department
ORDER BY sale_amount DESC
) AS dept_rank
FROM sales;
-- Use case: Top 3 with ties
WITH ranked AS (
SELECT *,
DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales
)
SELECT * FROM ranked WHERE rank <= 3;
LEAD and LAG
Access data from subsequent (LEAD) or previous (LAG) rows.
Data Flow Diagram
┌─────────────────────────────────────────────────────────┐
│ LEAD and LAG Data Flow │
└─────────────────────────────────────────────────────────┘
Input Data (Ordered by sale_date):
┌──────────┬────────────┬──────────────┐
│ Name │ Sale Date │ Sale Amount │
├──────────┼────────────┼──────────────┤
│ Alice │ 2024-01-15 │ 1500.00 │
│ Bob │ 2024-01-16 │ 2000.00 │
│ Charlie │ 2024-01-17 │ 1800.00 │
└──────────┴────────────┴──────────────┘
LAG(sale_amount) ←─┐ Current ┌─→ LEAD(sale_amount)
│ │
NULL ←─────┤ 1500.00 ├─────→ 2000.00
1500.00 ←─────┤ 2000.00 ├─────→ 1800.00
2000.00 ←─────┤ 1800.00 ├─────→ NULL
Output:
┌──────────┬─────────┬──────────┬──────────┐
│ Name │ Current │ Previous │ Next │
├──────────┼─────────┼──────────┼──────────┤
│ Alice │ 1500.00 │ NULL │ 2000.00 │
│ Bob │ 2000.00 │ 1500.00 │ 1800.00 │
│ Charlie │ 1800.00 │ 2000.00 │ NULL │
└──────────┴─────────┴──────────┴──────────┘
Examples
-- LEAD and LAG
SELECT
employee_name,
sale_date,
sale_amount,
LAG(sale_amount) OVER (ORDER BY sale_date) AS prev_sale,
LEAD(sale_amount) OVER (ORDER BY sale_date) AS next_sale
FROM sales;
-- Calculate difference from previous
SELECT
employee_name,
sale_date,
sale_amount,
sale_amount - LAG(sale_amount) OVER (ORDER BY sale_date) AS diff_from_prev
FROM sales;
-- LEAD with offset and default
SELECT
employee_name,
sale_amount,
LEAD(sale_amount, 2, 0) OVER (ORDER BY sale_date) AS two_ahead
FROM sales;
-- Use case: Compare with previous month
SELECT
DATE_FORMAT(sale_date, '%Y-%m') AS month,
SUM(sale_amount) AS current_month,
LAG(SUM(sale_amount)) OVER (ORDER BY DATE_FORMAT(sale_date, '%Y-%m')) AS prev_month,
SUM(sale_amount) - LAG(SUM(sale_amount)) OVER (ORDER BY DATE_FORMAT(sale_date, '%Y-%m')) AS growth
FROM sales
GROUP BY DATE_FORMAT(sale_date, '%Y-%m');
Aggregate Window Functions
Running Total Diagram
┌─────────────────────────────────────────────────────────┐
│ Running Total Data Flow │
└─────────────────────────────────────────────────────────┘
Input Data:
┌──────────┬──────────────┐
│ Date │ Amount │
├──────────┼──────────────┤
│ Jan 15 │ 1500 │
│ Jan 16 │ 2000 │
│ Jan 17 │ 1800 │
└──────────┴──────────────┘
SUM() OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
Processing:
Row 1: SUM(1500) = 1500
Row 2: SUM(1500 + 2000) = 3500
Row 3: SUM(1500 + 2000 + 1800) = 5300
Output:
┌──────────┬──────────────┬───────────────┐
│ Date │ Amount │ Running Total │
├──────────┼──────────────┼───────────────┤
│ Jan 15 │ 1500 │ 1500 │
│ Jan 16 │ 2000 │ 3500 │
│ Jan 17 │ 1800 │ 5300 │
└──────────┴──────────────┴───────────────┘
Examples
-- Running total
SELECT
sale_date,
sale_amount,
SUM(sale_amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM sales;
-- Moving average (last 3 rows)
SELECT
sale_date,
sale_amount,
AVG(sale_amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM sales;
-- Cumulative average
SELECT
sale_date,
sale_amount,
AVG(sale_amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_avg
FROM sales;
-- Percentage of total
SELECT
employee_name,
department,
sale_amount,
ROUND(100.0 * sale_amount / SUM(sale_amount) OVER (), 2) AS pct_of_total,
ROUND(100.0 * sale_amount / SUM(sale_amount) OVER (PARTITION BY department), 2) AS pct_of_dept
FROM sales;
PARTITION BY
Divides result set into partitions for independent calculations.
Partition Diagram
┌─────────────────────────────────────────────────────────┐
│ PARTITION BY Visualization │
└─────────────────────────────────────────────────────────┘
Without PARTITION BY:
┌────────────────────────────────────┐
│ All rows processed together │
│ ┌──────────────────────────────┐ │
│ │ Electronics + Clothing │ │
│ │ (6 rows) │ │
│ └──────────────────────────────┘ │
└────────────────────────────────────┘
With PARTITION BY department:
┌────────────────────────────────────┐
│ Partition 1: Electronics │
│ ┌──────────────────────────────┐ │
│ │ Bob, Charlie, Alice │ │
│ │ (3 rows) │ │
│ └──────────────────────────────┘ │
└────────────────────────────────────┘
┌────────────────────────────────────┐
│ Partition 2: Clothing │
│ ┌──────────────────────────────┐ │
│ │ Eve, David, Frank │ │
│ │ (3 rows) │ │
│ └──────────────────────────────┘ │
└────────────────────────────────────┘
Examples
-- Rank within each department
SELECT
employee_name,
department,
sale_amount,
RANK() OVER (PARTITION BY department ORDER BY sale_amount DESC) AS dept_rank
FROM sales;
-- Running total per department
SELECT
employee_name,
department,
sale_date,
sale_amount,
SUM(sale_amount) OVER (
PARTITION BY department
ORDER BY sale_date
) AS dept_running_total
FROM sales;
-- Compare to department average
SELECT
employee_name,
department,
sale_amount,
AVG(sale_amount) OVER (PARTITION BY department) AS dept_avg,
sale_amount - AVG(sale_amount) OVER (PARTITION BY department) AS diff_from_avg
FROM sales;
Best Practices
1. Use Appropriate Window Function
-- ✅ GOOD: ROW_NUMBER for unique ranking
SELECT *, ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;
-- ✅ GOOD: DENSE_RANK when you need continuous ranks
SELECT *, DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;
-- ✅ GOOD: RANK when gaps after ties are acceptable
SELECT *, RANK() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;
2. Optimize with Indexes
-- Create index on PARTITION BY and ORDER BY columns
CREATE INDEX idx_sales_dept_amount ON sales(department, sale_amount DESC);
- ✅ Frame specifications (ROWS, RANGE)
Key Takeaways:
- Window functions preserve row count (unlike GROUP BY)
- Use PARTITION BY to create independent groups
- ORDER BY determines calculation order
- Frame specification controls which rows are included
- Choose the right function for your use case
- Index PARTITION BY and ORDER BY columns