Full Stack • Java • System Design • Cloud • AI Engineering

SQL Window Functions

Master SQL window functions with Mermaid diagrams, data flow charts, and practical examples covering ROW_NUMBER, RANK, LEAD, LAG, and aggregates

Window functions perform calculations across rows related to the current row without collapsing the result set, unlike GROUP BY aggregates.

Window Function Architecture

flowchart TB
    A[Input Rows] --> B[PARTITION BY]
    B --> C[Divide into Groups]
    C --> D[ORDER BY]
    D --> E[Sort within Partitions]
    E --> F[Window Frame]
    F --> G[Calculate Function]
    G --> H[Output All Rows + Calculation]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style D fill:#FF9800
    style F fill:#9C27B0
    style H fill:#F44336

Key Points:

  • Preserves Rows: Unlike GROUP BY, window functions keep all original rows
  • PARTITION BY: Divides data into groups for independent calculations
  • ORDER BY: Determines calculation order within partitions
  • Window Frame: Specifies which rows to include in calculation
  • No Aggregation: Each row gets a calculated value without collapsing

Window Function vs GROUP BY

graph LR
    A[10 Input Rows] --> B[GROUP BY]
    A --> C[Window Function]
    
    B --> D[3 Grouped Rows]
    C --> E[10 Rows + Calculations]
    
    style A fill:#2196F3
    style B fill:#FF9800
    style C fill:#4CAF50
    style D fill:#F44336
    style E fill:#00BCD4

Key Points:

  • GROUP BY: Collapses rows into groups, reduces row count
  • Window Functions: Adds calculations to each row, preserves row count
  • Use GROUP BY: When you need summary statistics only
  • Use Window Functions: When you need row-level data with aggregates

ROW_NUMBER Execution Flow

sequenceDiagram
    participant Input as Input Data
    participant Partition as PARTITION BY
    participant Order as ORDER BY
    participant Assign as Assign Numbers
    participant Output
    
    Input->>Partition: Group by department
    Partition->>Order: Sort by sale_amount DESC
    Order->>Assign: Assign 1, 2, 3...
    Assign->>Output: Each row gets unique number
    
    Note over Assign: Resets for each partition

Key Points:

  • Unique Numbers: Each row gets unique sequential number
  • Partition Reset: Numbering restarts at 1 for each partition
  • Deterministic: Same ORDER BY always produces same numbers
  • Use Cases: Pagination, top-N queries, deduplication

RANK vs DENSE_RANK Comparison

graph TB
    A[Sorted Data with Ties] --> B[ROW_NUMBER]
    A --> C[RANK]
    A --> D[DENSE_RANK]
    
    B --> B1["1,2,3,4,5 - Always unique"]
    C --> C1["1,2,3,3,5 - Skips after tie"]
    D --> D1["1,2,3,3,4 - No gaps"]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0

Key Points:

  • ROW_NUMBER: Always assigns unique numbers, even for ties
  • RANK: Assigns same rank to ties, skips next rank(s)
  • DENSE_RANK: Assigns same rank to ties, no gaps in sequence
  • Choose Based On: Whether you need unique numbers or handle ties

LEAD and LAG Data Access

flowchart LR
    A[Previous Row] -->|LAG| B[Current Row]
    B -->|LEAD| C[Next Row]
    
    D[LAG sale_amount, 1] --> E[Get previous value]
    F[LEAD sale_amount, 1] --> G[Get next value]
    
    H[Default Value] --> I[NULL if no row]
    
    style B fill:#2196F3
    style A fill:#4CAF50
    style C fill:#FF9800

Key Points:

  • LAG: Access previous row's value based on ORDER BY
  • LEAD: Access next row's value based on ORDER BY
  • Offset: Specify how many rows back/forward (default 1)
  • Default: Provide default value when row doesn't exist (default NULL)
  • Use Cases: Compare with previous period, calculate differences

Running Total Calculation

sequenceDiagram
    participant R1 as Row 1: 1500
    participant R2 as Row 2: 2000
    participant R3 as Row 3: 1800
    participant Total as Running Total
    
    R1->>Total: 1500
    Note over Total: Sum = 1500
    R2->>Total: + 2000
    Note over Total: Sum = 3500
    R3->>Total: + 1800
    Note over Total: Sum = 5300

Key Points:

  • Cumulative Sum: Adds current row to all previous rows
  • Frame Specification: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • Order Matters: ORDER BY determines which rows are "previous"
  • Use Cases: Cumulative sales, running balances, progressive totals

PARTITION BY Visualization

graph TB
    A[All Data 6 rows] --> B{PARTITION BY department}
    
    B --> C[Electronics Partition]
    B --> D[Clothing Partition]
    
    C --> C1[Bob: 2000]
    C --> C2[Charlie: 1800]
    C --> C3[Alice: 1500]
    
    D --> D1[Eve: 1500]
    D --> D2[David: 1200]
    D --> D3[Frank: 1000]
    
    C1 --> E[Rank 1]
    C2 --> F[Rank 2]
    C3 --> G[Rank 3]
    
    D1 --> H[Rank 1]
    D2 --> I[Rank 2]
    D3 --> J[Rank 3]
    
    style A fill:#2196F3
    style C fill:#4CAF50
    style D fill:#FF9800

Key Points:

  • Independent Groups: Each partition calculated separately
  • Rank Resets: Numbering/ranking starts over for each partition
  • Multiple Partitions: Can partition by multiple columns
  • Performance: Indexes on partition columns improve performance

Code Examples

Basic Window Functions

-- ROW_NUMBER: Unique sequential numbers
SELECT employee_name, sale_amount,
    ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num
FROM sales;

-- RANK: Handles ties, skips ranks
SELECT employee_name, sale_amount,
    RANK() OVER (ORDER BY sale_amount DESC) AS rank
FROM sales;

-- DENSE_RANK: Handles ties, no gaps
SELECT employee_name, sale_amount,
    DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS dense_rank
FROM sales;

LEAD and LAG

-- Compare with previous and next
SELECT employee_name, sale_date, sale_amount,
    LAG(sale_amount) OVER (ORDER BY sale_date) AS prev_sale,
    LEAD(sale_amount) OVER (ORDER BY sale_date) AS next_sale,
    sale_amount - LAG(sale_amount) OVER (ORDER BY sale_date) AS diff
FROM sales;

Aggregate Window Functions

-- Running total
SELECT sale_date, sale_amount,
    SUM(sale_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales;

-- Moving average (last 3 rows)
SELECT sale_date, sale_amount,
    AVG(sale_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM sales;

PARTITION BY Examples

-- Rank within each department
SELECT employee_name, department, sale_amount,
    RANK() OVER (
        PARTITION BY department 
        ORDER BY sale_amount DESC
    ) AS dept_rank
FROM sales;

-- Top 2 from each department
WITH ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY department 
            ORDER BY sale_amount DESC
        ) AS rn
    FROM sales
)
SELECT * FROM ranked WHERE rn <= 2;

Percentage Calculations

-- Percentage of total and department
SELECT employee_name, department, sale_amount,
    ROUND(100.0 * sale_amount / 
        SUM(sale_amount) OVER (), 2) AS pct_of_total,
    ROUND(100.0 * sale_amount / 
        SUM(sale_amount) OVER (PARTITION BY department), 2) AS pct_of_dept
FROM sales;

Best Practices

  1. Index Optimization: Create indexes on PARTITION BY and ORDER BY columns
  2. Choose Right Function: ROW_NUMBER for unique, RANK for ties with gaps, DENSE_RANK for no gaps
  3. Frame Specification: Use ROWS or RANGE to control calculation window
  4. Avoid Redundancy: Calculate once, reuse in outer query with CTE
  5. Performance: Window functions can be expensive on large datasets

Loading likes...

Comments

Share a question, correction, or practical insight about this article.

Loading approved comments...