Full Stack • Java • System Design • Cloud • AI Engineering

SQL Aggregate Functions

Master SQL aggregate functions with Mermaid diagrams covering COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, ROLLUP, and CUBE

Aggregate Functions perform calculations on multiple rows and return a single result. They are essential for data analysis, reporting, and statistical operations.

Aggregate Functions Overview

graph TB
    A[Aggregate Functions] --> B[COUNT]
    A --> C[SUM]
    A --> D[AVG]
    A --> E[MIN/MAX]
    A --> F[Advanced]
    
    B --> B1[Count rows]
    B --> B2[Handles NULL]
    
    C --> C1[Sum values]
    C --> C2[Numeric only]
    
    D --> D1[Average values]
    D --> D2[Excludes NULL]
    
    E --> E1[Find extremes]
    E --> E2[Any data type]
    
    F --> F1[STRING_AGG, STDDEV]
    F --> F2[PERCENTILE, VARIANCE]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336
    style F fill:#00BCD4

Key Points:

  • COUNT: Counts rows, COUNT(*) includes NULL, COUNT(column) excludes NULL
  • SUM: Adds numeric values, ignores NULL
  • AVG: Calculates average, excludes NULL from calculation
  • MIN/MAX: Finds minimum/maximum values, works with any comparable type
  • Advanced: String aggregation, statistical functions, percentiles

Query Execution Order with Aggregates

flowchart TB
    A[FROM] --> B[JOIN]
    B --> C[WHERE]
    C --> D[GROUP BY]
    D --> E[Aggregate Functions]
    E --> F[HAVING]
    F --> G[SELECT]
    G --> H[DISTINCT]
    H --> I[ORDER BY]
    I --> J[LIMIT]
    
    style A fill:#4CAF50
    style C fill:#FF9800
    style D fill:#2196F3
    style E fill:#9C27B0
    style F fill:#F44336

Key Points:

  • WHERE: Filters rows before grouping
  • GROUP BY: Groups rows for aggregation
  • Aggregates: Calculated on each group
  • HAVING: Filters groups after aggregation
  • ORDER BY: Sorts final result set

GROUP BY Mechanism

sequenceDiagram
    participant Query
    participant GroupBy as GROUP BY
    participant Agg as Aggregate
    participant Result
    
    Query->>GroupBy: Process rows
    GroupBy->>GroupBy: Create groups
    
    loop For each group
        GroupBy->>Agg: Calculate COUNT, SUM, AVG
        Agg->>Result: Store group result
    end
    
    Result->>Query: Return aggregated data

Key Points:

  • Grouping: Rows with same GROUP BY values combined
  • Per Group: Aggregate calculated for each group
  • Non-Aggregated Columns: Must be in GROUP BY clause
  • Multiple Columns: Can group by multiple columns
  • Performance: Index GROUP BY columns for speed

WHERE vs HAVING

flowchart LR
    A[Data Rows] --> B[WHERE Filter]
    B --> C[Filtered Rows]
    C --> D[GROUP BY]
    D --> E[Grouped Data]
    E --> F[Aggregate Functions]
    F --> G[HAVING Filter]
    G --> H[Final Result]
    
    style B fill:#FF9800
    style D fill:#2196F3
    style F fill:#9C27B0
    style G fill:#F44336

Key Points:

  • WHERE: Filters individual rows before grouping
  • HAVING: Filters groups after aggregation
  • WHERE Performance: Better, reduces rows early
  • HAVING Use: Filter on aggregate results (COUNT, SUM, AVG)
  • Both Together: WHERE first, then GROUP BY, then HAVING

NULL Handling in Aggregates

graph TB
    A[Aggregate Functions & NULL] --> B[COUNT *]
    A --> C[COUNT column]
    A --> D[SUM/AVG/MIN/MAX]
    
    B --> B1[Includes NULL rows]
    B --> B2[Counts all rows]
    
    C --> C1[Excludes NULL]
    C --> C2[Counts non-NULL only]
    
    D --> D1[Ignores NULL]
    D --> D2[Calculates on non-NULL]
    
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0

Key Points:

  • COUNT(*): Counts all rows including NULL
  • COUNT(column): Counts only non-NULL values
  • SUM/AVG: Ignores NULL, calculates on non-NULL values
  • MIN/MAX: Ignores NULL, finds extreme non-NULL value
  • COALESCE: Use to replace NULL with default value

ROLLUP Hierarchy

graph TB
    A[ROLLUP year, quarter, month] --> B[Detail Level]
    A --> C[Quarter Subtotals]
    A --> D[Year Subtotals]
    A --> E[Grand Total]
    
    B --> B1[year, quarter, month]
    C --> C1[year, quarter, NULL]
    D --> D1[year, NULL, NULL]
    E --> E1[NULL, NULL, NULL]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336

Key Points:

  • Hierarchical: Creates subtotals at each level
  • Right to Left: Removes columns from right
  • Grand Total: Final row with all NULLs
  • Use Case: Reports with subtotals and totals
  • Performance: Single query instead of UNION

CUBE vs ROLLUP

graph LR
    A[CUBE region, product] --> B[All Combinations]
    C[ROLLUP region, product] --> D[Hierarchy Only]
    
    B --> B1[region, product]
    B --> B2[region, NULL]
    B --> B3[NULL, product]
    B --> B4[NULL, NULL]
    
    D --> D1[region, product]
    D --> D2[region, NULL]
    D --> D3[NULL, NULL]
    
    style A fill:#2196F3
    style C fill:#FF9800

Key Points:

  • CUBE: All possible combinations of grouping columns
  • ROLLUP: Hierarchical subtotals only
  • CUBE Results: 2^n combinations (n = number of columns)
  • ROLLUP Results: n+1 levels
  • Use CUBE: Cross-tabulation, multi-dimensional analysis

Aggregate Performance

flowchart TB
    A[Optimize Aggregates] --> B[Index GROUP BY columns]
    A --> C[Filter with WHERE]
    A --> D[Limit result set]
    A --> E[Use covering indexes]
    
    B --> B1[Faster grouping]
    C --> C1[Reduce rows early]
    D --> D1[Less data to process]
    E --> E1[Avoid table access]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#00BCD4

Key Points:

  • Indexes: Create on GROUP BY and aggregate columns
  • WHERE First: Filter before aggregation
  • Covering Index: Include all SELECT columns in index
  • Partitioning: For very large tables
  • Materialized Views: Pre-calculate common aggregates

Code Examples

COUNT Function

-- Count all rows
SELECT COUNT(*) AS total_orders FROM orders;

-- Count non-NULL values
SELECT COUNT(shipped_date) AS shipped_orders FROM orders;

-- Count distinct values
SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM orders;

-- Count with condition
SELECT 
    COUNT(*) AS total,
    COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped,
    COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending
FROM orders;

SUM and AVG

-- Basic SUM and AVG
SELECT 
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value,
    COUNT(*) AS order_count
FROM orders;

-- SUM with calculation
SELECT 
    SUM(quantity * unit_price) AS total_sales,
    AVG(quantity * unit_price) AS avg_item_value
FROM order_items;

-- Conditional SUM
SELECT 
    SUM(CASE WHEN status = 'completed' THEN total_amount ELSE 0 END) AS completed_revenue,
    SUM(CASE WHEN status = 'pending' THEN total_amount ELSE 0 END) AS pending_revenue
FROM orders;

MIN and MAX

-- Find extremes
SELECT 
    MIN(price) AS lowest_price,
    MAX(price) AS highest_price,
    AVG(price) AS avg_price
FROM products;

-- MIN/MAX with dates
SELECT 
    MIN(order_date) AS first_order,
    MAX(order_date) AS last_order,
    DATEDIFF(MAX(order_date), MIN(order_date)) AS days_span
FROM orders;

-- Find records with MIN/MAX values
SELECT * FROM products
WHERE price = (SELECT MAX(price) FROM products);

GROUP BY

-- Single column grouping
SELECT 
    category,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM products
GROUP BY category;

-- Multiple column grouping
SELECT 
    category,
    brand,
    COUNT(*) AS product_count,
    SUM(stock) AS total_stock
FROM products
GROUP BY category, brand
ORDER BY category, brand;

-- GROUP BY with expression
SELECT 
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

HAVING Clause

-- Filter groups
SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 5 AND SUM(total_amount) > 1000;

-- WHERE and HAVING together
SELECT 
    category,
    AVG(price) AS avg_price,
    COUNT(*) AS product_count
FROM products
WHERE is_active = TRUE
GROUP BY category
HAVING AVG(price) > 100 AND COUNT(*) >= 10;

ROLLUP

-- Hierarchical subtotals
SELECT 
    YEAR(order_date) AS year,
    QUARTER(order_date) AS quarter,
    SUM(total_amount) AS revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY YEAR(order_date), QUARTER(order_date) WITH ROLLUP;

-- Identify ROLLUP rows
SELECT 
    COALESCE(category, 'ALL CATEGORIES') AS category,
    COALESCE(brand, 'ALL BRANDS') AS brand,
    COUNT(*) AS product_count
FROM products
GROUP BY category, brand WITH ROLLUP;

CUBE

-- All combinations (SQL Server, PostgreSQL)
SELECT 
    region,
    product_category,
    SUM(sales_amount) AS total_sales
FROM sales
GROUP BY CUBE(region, product_category);

-- Results include:
-- (region, category), (region, NULL), (NULL, category), (NULL, NULL)

Best Practices

  1. Include All Non-Aggregated Columns: In GROUP BY clause
  2. Use HAVING for Groups: Filter aggregated results with HAVING
  3. Handle NULLs: Use COALESCE or IFNULL for NULL values
  4. Index GROUP BY Columns: Improves performance significantly
  5. Filter Early: Use WHERE before GROUP BY when possible
  6. Meaningful Aliases: Name aggregated columns clearly
  7. DISTINCT Carefully: COUNT(DISTINCT) can be expensive
  8. Consider Materialized Views: For frequently used aggregations

Loading likes...

Comments

Share a question, correction, or practical insight about this article.

Loading approved comments...