SQL Window Functions
Master SQL window functions with Mermaid diagrams, data flow charts, and practical examples covering ROW_NUMBER, RANK, LEAD, LAG, and aggregates
Window functions perform calculations across rows related to the current row without collapsing the result set, unlike GROUP BY aggregates.
Window Function Architecture
flowchart TB
A[Input Rows] --> B[PARTITION BY]
B --> C[Divide into Groups]
C --> D[ORDER BY]
D --> E[Sort within Partitions]
E --> F[Window Frame]
F --> G[Calculate Function]
G --> H[Output All Rows + Calculation]
style A fill:#2196F3
style B fill:#4CAF50
style D fill:#FF9800
style F fill:#9C27B0
style H fill:#F44336
Key Points:
- Preserves Rows: Unlike GROUP BY, window functions keep all original rows
- PARTITION BY: Divides data into groups for independent calculations
- ORDER BY: Determines calculation order within partitions
- Window Frame: Specifies which rows to include in calculation
- No Aggregation: Each row gets a calculated value without collapsing
Window Function vs GROUP BY
graph LR
A[10 Input Rows] --> B[GROUP BY]
A --> C[Window Function]
B --> D[3 Grouped Rows]
C --> E[10 Rows + Calculations]
style A fill:#2196F3
style B fill:#FF9800
style C fill:#4CAF50
style D fill:#F44336
style E fill:#00BCD4
Key Points:
- GROUP BY: Collapses rows into groups, reduces row count
- Window Functions: Adds calculations to each row, preserves row count
- Use GROUP BY: When you need summary statistics only
- Use Window Functions: When you need row-level data with aggregates
ROW_NUMBER Execution Flow
sequenceDiagram
participant Input as Input Data
participant Partition as PARTITION BY
participant Order as ORDER BY
participant Assign as Assign Numbers
participant Output
Input->>Partition: Group by department
Partition->>Order: Sort by sale_amount DESC
Order->>Assign: Assign 1, 2, 3...
Assign->>Output: Each row gets unique number
Note over Assign: Resets for each partition
Key Points:
- Unique Numbers: Each row gets unique sequential number
- Partition Reset: Numbering restarts at 1 for each partition
- Deterministic: Same ORDER BY always produces same numbers
- Use Cases: Pagination, top-N queries, deduplication
RANK vs DENSE_RANK Comparison
graph TB
A[Sorted Data with Ties] --> B[ROW_NUMBER]
A --> C[RANK]
A --> D[DENSE_RANK]
B --> B1["1,2,3,4,5 - Always unique"]
C --> C1["1,2,3,3,5 - Skips after tie"]
D --> D1["1,2,3,3,4 - No gaps"]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
Key Points:
- ROW_NUMBER: Always assigns unique numbers, even for ties
- RANK: Assigns same rank to ties, skips next rank(s)
- DENSE_RANK: Assigns same rank to ties, no gaps in sequence
- Choose Based On: Whether you need unique numbers or handle ties
LEAD and LAG Data Access
flowchart LR
A[Previous Row] -->|LAG| B[Current Row]
B -->|LEAD| C[Next Row]
D[LAG sale_amount, 1] --> E[Get previous value]
F[LEAD sale_amount, 1] --> G[Get next value]
H[Default Value] --> I[NULL if no row]
style B fill:#2196F3
style A fill:#4CAF50
style C fill:#FF9800
Key Points:
- LAG: Access previous row's value based on ORDER BY
- LEAD: Access next row's value based on ORDER BY
- Offset: Specify how many rows back/forward (default 1)
- Default: Provide default value when row doesn't exist (default NULL)
- Use Cases: Compare with previous period, calculate differences
Running Total Calculation
sequenceDiagram
participant R1 as Row 1: 1500
participant R2 as Row 2: 2000
participant R3 as Row 3: 1800
participant Total as Running Total
R1->>Total: 1500
Note over Total: Sum = 1500
R2->>Total: + 2000
Note over Total: Sum = 3500
R3->>Total: + 1800
Note over Total: Sum = 5300
Key Points:
- Cumulative Sum: Adds current row to all previous rows
- Frame Specification: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- Order Matters: ORDER BY determines which rows are "previous"
- Use Cases: Cumulative sales, running balances, progressive totals
PARTITION BY Visualization
graph TB
A[All Data 6 rows] --> B{PARTITION BY department}
B --> C[Electronics Partition]
B --> D[Clothing Partition]
C --> C1[Bob: 2000]
C --> C2[Charlie: 1800]
C --> C3[Alice: 1500]
D --> D1[Eve: 1500]
D --> D2[David: 1200]
D --> D3[Frank: 1000]
C1 --> E[Rank 1]
C2 --> F[Rank 2]
C3 --> G[Rank 3]
D1 --> H[Rank 1]
D2 --> I[Rank 2]
D3 --> J[Rank 3]
style A fill:#2196F3
style C fill:#4CAF50
style D fill:#FF9800
Key Points:
- Independent Groups: Each partition calculated separately
- Rank Resets: Numbering/ranking starts over for each partition
- Multiple Partitions: Can partition by multiple columns
- Performance: Indexes on partition columns improve performance
Code Examples
Basic Window Functions
-- ROW_NUMBER: Unique sequential numbers
SELECT employee_name, sale_amount,
ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num
FROM sales;
-- RANK: Handles ties, skips ranks
SELECT employee_name, sale_amount,
RANK() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;
-- DENSE_RANK: Handles ties, no gaps
SELECT employee_name, sale_amount,
DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS dense_rank
FROM sales;
LEAD and LAG
-- Compare with previous and next
SELECT employee_name, sale_date, sale_amount,
LAG(sale_amount) OVER (ORDER BY sale_date) AS prev_sale,
LEAD(sale_amount) OVER (ORDER BY sale_date) AS next_sale,
sale_amount - LAG(sale_amount) OVER (ORDER BY sale_date) AS diff
FROM sales;
Aggregate Window Functions
-- Running total
SELECT sale_date, sale_amount,
SUM(sale_amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM sales;
-- Moving average (last 3 rows)
SELECT sale_date, sale_amount,
AVG(sale_amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM sales;
PARTITION BY Examples
-- Rank within each department
SELECT employee_name, department, sale_amount,
RANK() OVER (
PARTITION BY department
ORDER BY sale_amount DESC
) AS dept_rank
FROM sales;
-- Top 2 from each department
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY sale_amount DESC
) AS rn
FROM sales
)
SELECT * FROM ranked WHERE rn <= 2;
Percentage Calculations
-- Percentage of total and department
SELECT employee_name, department, sale_amount,
ROUND(100.0 * sale_amount /
SUM(sale_amount) OVER (), 2) AS pct_of_total,
ROUND(100.0 * sale_amount /
SUM(sale_amount) OVER (PARTITION BY department), 2) AS pct_of_dept
FROM sales;
Best Practices
- Index Optimization: Create indexes on PARTITION BY and ORDER BY columns
- Choose Right Function: ROW_NUMBER for unique, RANK for ties with gaps, DENSE_RANK for no gaps
- Frame Specification: Use ROWS or RANGE to control calculation window
- Avoid Redundancy: Calculate once, reuse in outer query with CTE
- Performance: Window functions can be expensive on large datasets
Comments
Share a question, correction, or practical insight about this article.
Checking login status...
Loading approved comments...