SQL Aggregate Functions

Master SQL aggregate functions with Mermaid diagrams covering COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, ROLLUP, and CUBE

Aggregate Functions perform calculations on multiple rows and return a single result. They are essential for data analysis, reporting, and statistical operations.

Aggregate Functions Overview

graph TB
    A[Aggregate Functions] --> B[COUNT]
    A --> C[SUM]
    A --> D[AVG]
    A --> E[MIN/MAX]
    A --> F[Advanced]
    
    B --> B1[Count rows]
    B --> B2[Handles NULL]
    
    C --> C1[Sum values]
    C --> C2[Numeric only]
    
    D --> D1[Average values]
    D --> D2[Excludes NULL]
    
    E --> E1[Find extremes]
    E --> E2[Any data type]
    
    F --> F1[STRING_AGG, STDDEV]
    F --> F2[PERCENTILE, VARIANCE]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336
    style F fill:#00BCD4

Key Points:

COUNT: Counts rows, COUNT(*) includes NULL, COUNT(column) excludes NULL
SUM: Adds numeric values, ignores NULL
AVG: Calculates average, excludes NULL from calculation
MIN/MAX: Finds minimum/maximum values, works with any comparable type
Advanced: String aggregation, statistical functions, percentiles

Query Execution Order with Aggregates

flowchart TB
    A[FROM] --> B[JOIN]
    B --> C[WHERE]
    C --> D[GROUP BY]
    D --> E[Aggregate Functions]
    E --> F[HAVING]
    F --> G[SELECT]
    G --> H[DISTINCT]
    H --> I[ORDER BY]
    I --> J[LIMIT]
    
    style A fill:#4CAF50
    style C fill:#FF9800
    style D fill:#2196F3
    style E fill:#9C27B0
    style F fill:#F44336

Key Points:

WHERE: Filters rows before grouping
GROUP BY: Groups rows for aggregation
Aggregates: Calculated on each group
HAVING: Filters groups after aggregation
ORDER BY: Sorts final result set

GROUP BY Mechanism

sequenceDiagram
    participant Query
    participant GroupBy as GROUP BY
    participant Agg as Aggregate
    participant Result
    
    Query->>GroupBy: Process rows
    GroupBy->>GroupBy: Create groups
    
    loop For each group
        GroupBy->>Agg: Calculate COUNT, SUM, AVG
        Agg->>Result: Store group result
    end
    
    Result->>Query: Return aggregated data

Key Points:

Grouping: Rows with same GROUP BY values combined
Per Group: Aggregate calculated for each group
Non-Aggregated Columns: Must be in GROUP BY clause
Multiple Columns: Can group by multiple columns
Performance: Index GROUP BY columns for speed

WHERE vs HAVING

flowchart LR
    A[Data Rows] --> B[WHERE Filter]
    B --> C[Filtered Rows]
    C --> D[GROUP BY]
    D --> E[Grouped Data]
    E --> F[Aggregate Functions]
    F --> G[HAVING Filter]
    G --> H[Final Result]
    
    style B fill:#FF9800
    style D fill:#2196F3
    style F fill:#9C27B0
    style G fill:#F44336

Key Points:

WHERE: Filters individual rows before grouping
HAVING: Filters groups after aggregation
WHERE Performance: Better, reduces rows early
HAVING Use: Filter on aggregate results (COUNT, SUM, AVG)
Both Together: WHERE first, then GROUP BY, then HAVING

NULL Handling in Aggregates

graph TB
    A[Aggregate Functions & NULL] --> B[COUNT *]
    A --> C[COUNT column]
    A --> D[SUM/AVG/MIN/MAX]
    
    B --> B1[Includes NULL rows]
    B --> B2[Counts all rows]
    
    C --> C1[Excludes NULL]
    C --> C2[Counts non-NULL only]
    
    D --> D1[Ignores NULL]
    D --> D2[Calculates on non-NULL]
    
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0

Key Points:

COUNT(*): Counts all rows including NULL
COUNT(column): Counts only non-NULL values
SUM/AVG: Ignores NULL, calculates on non-NULL values
MIN/MAX: Ignores NULL, finds extreme non-NULL value
COALESCE: Use to replace NULL with default value

ROLLUP Hierarchy

graph TB
    A[ROLLUP year, quarter, month] --> B[Detail Level]
    A --> C[Quarter Subtotals]
    A --> D[Year Subtotals]
    A --> E[Grand Total]
    
    B --> B1[year, quarter, month]
    C --> C1[year, quarter, NULL]
    D --> D1[year, NULL, NULL]
    E --> E1[NULL, NULL, NULL]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336

Key Points:

Hierarchical: Creates subtotals at each level
Right to Left: Removes columns from right
Grand Total: Final row with all NULLs
Use Case: Reports with subtotals and totals
Performance: Single query instead of UNION

CUBE vs ROLLUP

graph LR
    A[CUBE region, product] --> B[All Combinations]
    C[ROLLUP region, product] --> D[Hierarchy Only]
    
    B --> B1[region, product]
    B --> B2[region, NULL]
    B --> B3[NULL, product]
    B --> B4[NULL, NULL]
    
    D --> D1[region, product]
    D --> D2[region, NULL]
    D --> D3[NULL, NULL]
    
    style A fill:#2196F3
    style C fill:#FF9800

Key Points:

CUBE: All possible combinations of grouping columns
ROLLUP: Hierarchical subtotals only
CUBE Results: 2^n combinations (n = number of columns)
ROLLUP Results: n+1 levels
Use CUBE: Cross-tabulation, multi-dimensional analysis

Aggregate Performance

flowchart TB
    A[Optimize Aggregates] --> B[Index GROUP BY columns]
    A --> C[Filter with WHERE]
    A --> D[Limit result set]
    A --> E[Use covering indexes]
    
    B --> B1[Faster grouping]
    C --> C1[Reduce rows early]
    D --> D1[Less data to process]
    E --> E1[Avoid table access]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#00BCD4

Key Points:

Indexes: Create on GROUP BY and aggregate columns
WHERE First: Filter before aggregation
Covering Index: Include all SELECT columns in index
Partitioning: For very large tables
Materialized Views: Pre-calculate common aggregates

Code Examples

COUNT Function

-- Count all rows
SELECT COUNT(*) AS total_orders FROM orders;

-- Count non-NULL values
SELECT COUNT(shipped_date) AS shipped_orders FROM orders;

-- Count distinct values
SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM orders;

-- Count with condition
SELECT 
    COUNT(*) AS total,
    COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped,
    COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending
FROM orders;

SUM and AVG

-- Basic SUM and AVG
SELECT 
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value,
    COUNT(*) AS order_count
FROM orders;

-- SUM with calculation
SELECT 
    SUM(quantity * unit_price) AS total_sales,
    AVG(quantity * unit_price) AS avg_item_value
FROM order_items;

-- Conditional SUM
SELECT 
    SUM(CASE WHEN status = 'completed' THEN total_amount ELSE 0 END) AS completed_revenue,
    SUM(CASE WHEN status = 'pending' THEN total_amount ELSE 0 END) AS pending_revenue
FROM orders;

MIN and MAX

-- Find extremes
SELECT 
    MIN(price) AS lowest_price,
    MAX(price) AS highest_price,
    AVG(price) AS avg_price
FROM products;

-- MIN/MAX with dates
SELECT 
    MIN(order_date) AS first_order,
    MAX(order_date) AS last_order,
    DATEDIFF(MAX(order_date), MIN(order_date)) AS days_span
FROM orders;

-- Find records with MIN/MAX values
SELECT * FROM products
WHERE price = (SELECT MAX(price) FROM products);

GROUP BY

-- Single column grouping
SELECT 
    category,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM products
GROUP BY category;

-- Multiple column grouping
SELECT 
    category,
    brand,
    COUNT(*) AS product_count,
    SUM(stock) AS total_stock
FROM products
GROUP BY category, brand
ORDER BY category, brand;

-- GROUP BY with expression
SELECT 
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

HAVING Clause

-- Filter groups
SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 5 AND SUM(total_amount) > 1000;

-- WHERE and HAVING together
SELECT 
    category,
    AVG(price) AS avg_price,
    COUNT(*) AS product_count
FROM products
WHERE is_active = TRUE
GROUP BY category
HAVING AVG(price) > 100 AND COUNT(*) >= 10;

ROLLUP

-- Hierarchical subtotals
SELECT 
    YEAR(order_date) AS year,
    QUARTER(order_date) AS quarter,
    SUM(total_amount) AS revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY YEAR(order_date), QUARTER(order_date) WITH ROLLUP;

-- Identify ROLLUP rows
SELECT 
    COALESCE(category, 'ALL CATEGORIES') AS category,
    COALESCE(brand, 'ALL BRANDS') AS brand,
    COUNT(*) AS product_count
FROM products
GROUP BY category, brand WITH ROLLUP;

CUBE

-- All combinations (SQL Server, PostgreSQL)
SELECT 
    region,
    product_category,
    SUM(sales_amount) AS total_sales
FROM sales
GROUP BY CUBE(region, product_category);

-- Results include:
-- (region, category), (region, NULL), (NULL, category), (NULL, NULL)

Best Practices

Include All Non-Aggregated Columns: In GROUP BY clause
Use HAVING for Groups: Filter aggregated results with HAVING
Handle NULLs: Use COALESCE or IFNULL for NULL values
Index GROUP BY Columns: Improves performance significantly
Filter Early: Use WHERE before GROUP BY when possible
Meaningful Aliases: Name aggregated columns clearly
DISTINCT Carefully: COUNT(DISTINCT) can be expensive
Consider Materialized Views: For frequently used aggregations

Loading likes...

Comments

Share a question, correction, or practical insight about this article.

Loading approved comments...