Full Stack • Java • System Design • Cloud • AI Engineering

SQL2026-06-12

SQL Window Functions: Complete Guide with Visual Examples

Master SQL window functions including ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, and aggregate window functions with visual data flow diagrams

SQL Window Functions

📋 Table of Contents

  1. Introduction to Window Functions
  2. ROW_NUMBER
  3. RANK and DENSE_RANK
  4. LEAD and LAG
  5. Aggregate Window Functions
  6. PARTITION BY
  7. Best Practices

Introduction to Window Functions

Window Functions perform calculations across a set of rows related to the current row without collapsing the result set.

┌─────────────────────────────────────────────────────────┐
│        Regular Aggregate vs Window Function              │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  REGULAR AGGREGATE (GROUP BY)                           │
│  Input: 10 rows → Output: 3 rows (collapsed)           │
│  ┌──────────┐                                           │
│  │ 10 rows  │ ──GROUP BY──→ │ 3 rows │                │
│  └──────────┘                                           │
│                                                          │
│  WINDOW FUNCTION (OVER)                                 │
│  Input: 10 rows → Output: 10 rows (preserved)          │
│  ┌──────────┐                                           │
│  │ 10 rows  │ ──OVER──→ │ 10 rows + calculations │    │
│  └──────────┘                                           │
│                                                          │
└─────────────────────────────────────────────────────────┘

Basic Syntax

function_name() OVER (
    [PARTITION BY column]
    [ORDER BY column]
    [ROWS/RANGE frame_specification]
)

Sample Data

CREATE TABLE sales (
    sale_id INT,
    employee_name VARCHAR(50),
    department VARCHAR(50),
    sale_amount DECIMAL(10,2),
    sale_date DATE
);

INSERT INTO sales VALUES
(1, 'Alice', 'Electronics', 1500.00, '2024-01-15'),
(2, 'Bob', 'Electronics', 2000.00, '2024-01-16'),
(3, 'Charlie', 'Electronics', 1800.00, '2024-01-17'),
(4, 'David', 'Clothing', 1200.00, '2024-01-15'),
(5, 'Eve', 'Clothing', 1500.00, '2024-01-16'),
(6, 'Frank', 'Clothing', 1000.00, '2024-01-17');

ROW_NUMBER

Assigns a unique sequential number to each row within a partition.

Data Flow Diagram

┌─────────────────────────────────────────────────────────┐
│              ROW_NUMBER() Data Flow                      │
└─────────────────────────────────────────────────────────┘

Input Data (Ordered by sale_amount DESC):
┌──────────┬────────────┬──────────────┐
│   Name   │ Department │ Sale Amount  │
├──────────┼────────────┼──────────────┤
│   Bob    │Electronics │   2000.00    │
│ Charlie  │Electronics │   1800.00    │
│  Alice   │Electronics │   1500.00    │
│   Eve    │ Clothing   │   1500.00    │
│  David   │ Clothing   │   1200.00    │
│  Frank   │ Clothing   │   1000.00    │
└──────────┴────────────┴──────────────┘
                 ↓
        ROW_NUMBER() OVER (ORDER BY sale_amount DESC)
                 ↓
Output with Row Numbers:
┌──────────┬────────────┬──────────────┬────────┐
│   Name   │ Department │ Sale Amount  │ Row #  │
├──────────┼────────────┼──────────────┼────────┤
│   Bob    │Electronics │   2000.00    │   1    │
│ Charlie  │Electronics │   1800.00    │   2    │
│  Alice   │Electronics │   1500.00    │   3    │
│   Eve    │ Clothing   │   1500.00    │   4    │
│  David   │ Clothing   │   1200.00    │   5    │
│  Frank   │ Clothing   │   1000.00    │   6    │
└──────────┴────────────┴──────────────┴────────┘

Examples

-- Basic ROW_NUMBER
SELECT 
    employee_name,
    sale_amount,
    ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num
FROM sales;

-- ROW_NUMBER with PARTITION BY
SELECT 
    employee_name,
    department,
    sale_amount,
    ROW_NUMBER() OVER (
        PARTITION BY department 
        ORDER BY sale_amount DESC
    ) AS dept_rank
FROM sales;

-- Result:
-- employee_name | department  | sale_amount | dept_rank
-- Bob           | Electronics | 2000.00     | 1
-- Charlie       | Electronics | 1800.00     | 2
-- Alice         | Electronics | 1500.00     | 3
-- Eve           | Clothing    | 1500.00     | 1
-- David         | Clothing    | 1200.00     | 2
-- Frank         | Clothing    | 1000.00     | 3

-- Use case: Get top 2 from each department
WITH ranked_sales AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY department 
            ORDER BY sale_amount DESC
        ) AS rn
    FROM sales
)
SELECT employee_name, department, sale_amount
FROM ranked_sales
WHERE rn <= 2;

RANK and DENSE_RANK

Comparison Diagram

┌─────────────────────────────────────────────────────────┐
│      ROW_NUMBER vs RANK vs DENSE_RANK                    │
└─────────────────────────────────────────────────────────┘

Input Data (with ties):
┌──────────┬──────────────┐
│   Name   │ Sale Amount  │
├──────────┼──────────────┤
│   Bob    │   2000.00    │
│ Charlie  │   1800.00    │
│  Alice   │   1500.00    │ ← Tie
│   Eve    │   1500.00    │ ← Tie
│  David   │   1200.00    │
└──────────┴──────────────┘

ROW_NUMBER()  RANK()  DENSE_RANK()
    1           1          1        Bob (2000)
    2           2          2        Charlie (1800)
    3           3          3        Alice (1500) ← Tie
    4           3          3        Eve (1500)   ← Tie
    5           5          4        David (1200)
    ↑           ↑          ↑
  Always    Skips 4    No gaps
  unique    after tie

Examples

-- Compare all three
SELECT 
    employee_name,
    sale_amount,
    ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num,
    RANK() OVER (ORDER BY sale_amount DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS dense_rank
FROM sales;

-- RANK with PARTITION BY
SELECT 
    employee_name,
    department,
    sale_amount,
    RANK() OVER (
        PARTITION BY department 
        ORDER BY sale_amount DESC
    ) AS dept_rank
FROM sales;

-- Use case: Top 3 with ties
WITH ranked AS (
    SELECT *,
        DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS rank
    FROM sales
)
SELECT * FROM ranked WHERE rank <= 3;

LEAD and LAG

Access data from subsequent (LEAD) or previous (LAG) rows.

Data Flow Diagram

┌─────────────────────────────────────────────────────────┐
│              LEAD and LAG Data Flow                      │
└─────────────────────────────────────────────────────────┘

Input Data (Ordered by sale_date):
┌──────────┬────────────┬──────────────┐
│   Name   │ Sale Date  │ Sale Amount  │
├──────────┼────────────┼──────────────┤
│  Alice   │ 2024-01-15 │   1500.00    │
│   Bob    │ 2024-01-16 │   2000.00    │
│ Charlie  │ 2024-01-17 │   1800.00    │
└──────────┴────────────┴──────────────┘

LAG(sale_amount) ←─┐  Current  ┌─→ LEAD(sale_amount)
                   │           │
        NULL ←─────┤  1500.00  ├─────→ 2000.00
     1500.00 ←─────┤  2000.00  ├─────→ 1800.00
     2000.00 ←─────┤  1800.00  ├─────→ NULL

Output:
┌──────────┬─────────┬──────────┬──────────┐
│   Name   │ Current │ Previous │   Next   │
├──────────┼─────────┼──────────┼──────────┤
│  Alice   │ 1500.00 │   NULL   │ 2000.00  │
│   Bob    │ 2000.00 │ 1500.00  │ 1800.00  │
│ Charlie  │ 1800.00 │ 2000.00  │   NULL   │
└──────────┴─────────┴──────────┴──────────┘

Examples

-- LEAD and LAG
SELECT 
    employee_name,
    sale_date,
    sale_amount,
    LAG(sale_amount) OVER (ORDER BY sale_date) AS prev_sale,
    LEAD(sale_amount) OVER (ORDER BY sale_date) AS next_sale
FROM sales;

-- Calculate difference from previous
SELECT 
    employee_name,
    sale_date,
    sale_amount,
    sale_amount - LAG(sale_amount) OVER (ORDER BY sale_date) AS diff_from_prev
FROM sales;

-- LEAD with offset and default
SELECT 
    employee_name,
    sale_amount,
    LEAD(sale_amount, 2, 0) OVER (ORDER BY sale_date) AS two_ahead
FROM sales;

-- Use case: Compare with previous month
SELECT 
    DATE_FORMAT(sale_date, '%Y-%m') AS month,
    SUM(sale_amount) AS current_month,
    LAG(SUM(sale_amount)) OVER (ORDER BY DATE_FORMAT(sale_date, '%Y-%m')) AS prev_month,
    SUM(sale_amount) - LAG(SUM(sale_amount)) OVER (ORDER BY DATE_FORMAT(sale_date, '%Y-%m')) AS growth
FROM sales
GROUP BY DATE_FORMAT(sale_date, '%Y-%m');

Aggregate Window Functions

Running Total Diagram

┌─────────────────────────────────────────────────────────┐
│              Running Total Data Flow                     │
└─────────────────────────────────────────────────────────┘

Input Data:
┌──────────┬──────────────┐
│   Date   │    Amount    │
├──────────┼──────────────┤
│ Jan 15   │    1500      │
│ Jan 16   │    2000      │
│ Jan 17   │    1800      │
└──────────┴──────────────┘

SUM() OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

Processing:
Row 1: SUM(1500) = 1500
Row 2: SUM(1500 + 2000) = 3500
Row 3: SUM(1500 + 2000 + 1800) = 5300

Output:
┌──────────┬──────────────┬───────────────┐
│   Date   │    Amount    │ Running Total │
├──────────┼──────────────┼───────────────┤
│ Jan 15   │    1500      │     1500      │
│ Jan 16   │    2000      │     3500      │
│ Jan 17   │    1800      │     5300      │
└──────────┴──────────────┴───────────────┘

Examples

-- Running total
SELECT 
    sale_date,
    sale_amount,
    SUM(sale_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales;

-- Moving average (last 3 rows)
SELECT 
    sale_date,
    sale_amount,
    AVG(sale_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM sales;

-- Cumulative average
SELECT 
    sale_date,
    sale_amount,
    AVG(sale_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_avg
FROM sales;

-- Percentage of total
SELECT 
    employee_name,
    department,
    sale_amount,
    ROUND(100.0 * sale_amount / SUM(sale_amount) OVER (), 2) AS pct_of_total,
    ROUND(100.0 * sale_amount / SUM(sale_amount) OVER (PARTITION BY department), 2) AS pct_of_dept
FROM sales;

PARTITION BY

Divides result set into partitions for independent calculations.

Partition Diagram

┌─────────────────────────────────────────────────────────┐
│           PARTITION BY Visualization                     │
└─────────────────────────────────────────────────────────┘

Without PARTITION BY:
┌────────────────────────────────────┐
│  All rows processed together       │
│  ┌──────────────────────────────┐ │
│  │ Electronics + Clothing       │ │
│  │ (6 rows)                     │ │
│  └──────────────────────────────┘ │
└────────────────────────────────────┘

With PARTITION BY department:
┌────────────────────────────────────┐
│  Partition 1: Electronics          │
│  ┌──────────────────────────────┐ │
│  │ Bob, Charlie, Alice          │ │
│  │ (3 rows)                     │ │
│  └──────────────────────────────┘ │
└────────────────────────────────────┘
┌────────────────────────────────────┐
│  Partition 2: Clothing             │
│  ┌──────────────────────────────┐ │
│  │ Eve, David, Frank            │ │
│  │ (3 rows)                     │ │
│  └──────────────────────────────┘ │
└────────────────────────────────────┘

Examples

-- Rank within each department
SELECT 
    employee_name,
    department,
    sale_amount,
    RANK() OVER (PARTITION BY department ORDER BY sale_amount DESC) AS dept_rank
FROM sales;

-- Running total per department
SELECT 
    employee_name,
    department,
    sale_date,
    sale_amount,
    SUM(sale_amount) OVER (
        PARTITION BY department 
        ORDER BY sale_date
    ) AS dept_running_total
FROM sales;

-- Compare to department average
SELECT 
    employee_name,
    department,
    sale_amount,
    AVG(sale_amount) OVER (PARTITION BY department) AS dept_avg,
    sale_amount - AVG(sale_amount) OVER (PARTITION BY department) AS diff_from_avg
FROM sales;

Best Practices

1. Use Appropriate Window Function

-- ✅ GOOD: ROW_NUMBER for unique ranking
SELECT *, ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;

-- ✅ GOOD: DENSE_RANK when you need continuous ranks
SELECT *, DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;

-- ✅ GOOD: RANK when gaps after ties are acceptable
SELECT *, RANK() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;

2. Optimize with Indexes

-- Create index on PARTITION BY and ORDER BY columns
CREATE INDEX idx_sales_dept_amount ON sales(department, sale_amount DESC);
  • ✅ Frame specifications (ROWS, RANGE)

Key Takeaways:

  1. Window functions preserve row count (unlike GROUP BY)
  2. Use PARTITION BY to create independent groups
  3. ORDER BY determines calculation order
  4. Frame specification controls which rows are included
  5. Choose the right function for your use case
  6. Index PARTITION BY and ORDER BY columns