SQL Aggregate Functions
Master SQL aggregate functions with Mermaid diagrams covering COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, ROLLUP, and CUBE
Aggregate Functions perform calculations on multiple rows and return a single result. They are essential for data analysis, reporting, and statistical operations.
Aggregate Functions Overview
graph TB
A[Aggregate Functions] --> B[COUNT]
A --> C[SUM]
A --> D[AVG]
A --> E[MIN/MAX]
A --> F[Advanced]
B --> B1[Count rows]
B --> B2[Handles NULL]
C --> C1[Sum values]
C --> C2[Numeric only]
D --> D1[Average values]
D --> D2[Excludes NULL]
E --> E1[Find extremes]
E --> E2[Any data type]
F --> F1[STRING_AGG, STDDEV]
F --> F2[PERCENTILE, VARIANCE]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
style E fill:#F44336
style F fill:#00BCD4
Key Points:
- COUNT: Counts rows, COUNT(*) includes NULL, COUNT(column) excludes NULL
- SUM: Adds numeric values, ignores NULL
- AVG: Calculates average, excludes NULL from calculation
- MIN/MAX: Finds minimum/maximum values, works with any comparable type
- Advanced: String aggregation, statistical functions, percentiles
Query Execution Order with Aggregates
flowchart TB
A[FROM] --> B[JOIN]
B --> C[WHERE]
C --> D[GROUP BY]
D --> E[Aggregate Functions]
E --> F[HAVING]
F --> G[SELECT]
G --> H[DISTINCT]
H --> I[ORDER BY]
I --> J[LIMIT]
style A fill:#4CAF50
style C fill:#FF9800
style D fill:#2196F3
style E fill:#9C27B0
style F fill:#F44336
Key Points:
- WHERE: Filters rows before grouping
- GROUP BY: Groups rows for aggregation
- Aggregates: Calculated on each group
- HAVING: Filters groups after aggregation
- ORDER BY: Sorts final result set
GROUP BY Mechanism
sequenceDiagram
participant Query
participant GroupBy as GROUP BY
participant Agg as Aggregate
participant Result
Query->>GroupBy: Process rows
GroupBy->>GroupBy: Create groups
loop For each group
GroupBy->>Agg: Calculate COUNT, SUM, AVG
Agg->>Result: Store group result
end
Result->>Query: Return aggregated data
Key Points:
- Grouping: Rows with same GROUP BY values combined
- Per Group: Aggregate calculated for each group
- Non-Aggregated Columns: Must be in GROUP BY clause
- Multiple Columns: Can group by multiple columns
- Performance: Index GROUP BY columns for speed
WHERE vs HAVING
flowchart LR
A[Data Rows] --> B[WHERE Filter]
B --> C[Filtered Rows]
C --> D[GROUP BY]
D --> E[Grouped Data]
E --> F[Aggregate Functions]
F --> G[HAVING Filter]
G --> H[Final Result]
style B fill:#FF9800
style D fill:#2196F3
style F fill:#9C27B0
style G fill:#F44336
Key Points:
- WHERE: Filters individual rows before grouping
- HAVING: Filters groups after aggregation
- WHERE Performance: Better, reduces rows early
- HAVING Use: Filter on aggregate results (COUNT, SUM, AVG)
- Both Together: WHERE first, then GROUP BY, then HAVING
NULL Handling in Aggregates
graph TB
A[Aggregate Functions & NULL] --> B[COUNT *]
A --> C[COUNT column]
A --> D[SUM/AVG/MIN/MAX]
B --> B1[Includes NULL rows]
B --> B2[Counts all rows]
C --> C1[Excludes NULL]
C --> C2[Counts non-NULL only]
D --> D1[Ignores NULL]
D --> D2[Calculates on non-NULL]
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
Key Points:
- COUNT(*): Counts all rows including NULL
- COUNT(column): Counts only non-NULL values
- SUM/AVG: Ignores NULL, calculates on non-NULL values
- MIN/MAX: Ignores NULL, finds extreme non-NULL value
- COALESCE: Use to replace NULL with default value
ROLLUP Hierarchy
graph TB
A[ROLLUP year, quarter, month] --> B[Detail Level]
A --> C[Quarter Subtotals]
A --> D[Year Subtotals]
A --> E[Grand Total]
B --> B1[year, quarter, month]
C --> C1[year, quarter, NULL]
D --> D1[year, NULL, NULL]
E --> E1[NULL, NULL, NULL]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
style E fill:#F44336
Key Points:
- Hierarchical: Creates subtotals at each level
- Right to Left: Removes columns from right
- Grand Total: Final row with all NULLs
- Use Case: Reports with subtotals and totals
- Performance: Single query instead of UNION
CUBE vs ROLLUP
graph LR
A[CUBE region, product] --> B[All Combinations]
C[ROLLUP region, product] --> D[Hierarchy Only]
B --> B1[region, product]
B --> B2[region, NULL]
B --> B3[NULL, product]
B --> B4[NULL, NULL]
D --> D1[region, product]
D --> D2[region, NULL]
D --> D3[NULL, NULL]
style A fill:#2196F3
style C fill:#FF9800
Key Points:
- CUBE: All possible combinations of grouping columns
- ROLLUP: Hierarchical subtotals only
- CUBE Results: 2^n combinations (n = number of columns)
- ROLLUP Results: n+1 levels
- Use CUBE: Cross-tabulation, multi-dimensional analysis
Aggregate Performance
flowchart TB
A[Optimize Aggregates] --> B[Index GROUP BY columns]
A --> C[Filter with WHERE]
A --> D[Limit result set]
A --> E[Use covering indexes]
B --> B1[Faster grouping]
C --> C1[Reduce rows early]
D --> D1[Less data to process]
E --> E1[Avoid table access]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
style E fill:#00BCD4
Key Points:
- Indexes: Create on GROUP BY and aggregate columns
- WHERE First: Filter before aggregation
- Covering Index: Include all SELECT columns in index
- Partitioning: For very large tables
- Materialized Views: Pre-calculate common aggregates
Code Examples
COUNT Function
-- Count all rows
SELECT COUNT(*) AS total_orders FROM orders;
-- Count non-NULL values
SELECT COUNT(shipped_date) AS shipped_orders FROM orders;
-- Count distinct values
SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM orders;
-- Count with condition
SELECT
COUNT(*) AS total,
COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending
FROM orders;
SUM and AVG
-- Basic SUM and AVG
SELECT
SUM(total_amount) AS total_revenue,
AVG(total_amount) AS avg_order_value,
COUNT(*) AS order_count
FROM orders;
-- SUM with calculation
SELECT
SUM(quantity * unit_price) AS total_sales,
AVG(quantity * unit_price) AS avg_item_value
FROM order_items;
-- Conditional SUM
SELECT
SUM(CASE WHEN status = 'completed' THEN total_amount ELSE 0 END) AS completed_revenue,
SUM(CASE WHEN status = 'pending' THEN total_amount ELSE 0 END) AS pending_revenue
FROM orders;
MIN and MAX
-- Find extremes
SELECT
MIN(price) AS lowest_price,
MAX(price) AS highest_price,
AVG(price) AS avg_price
FROM products;
-- MIN/MAX with dates
SELECT
MIN(order_date) AS first_order,
MAX(order_date) AS last_order,
DATEDIFF(MAX(order_date), MIN(order_date)) AS days_span
FROM orders;
-- Find records with MIN/MAX values
SELECT * FROM products
WHERE price = (SELECT MAX(price) FROM products);
GROUP BY
-- Single column grouping
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
GROUP BY category;
-- Multiple column grouping
SELECT
category,
brand,
COUNT(*) AS product_count,
SUM(stock) AS total_stock
FROM products
GROUP BY category, brand
ORDER BY category, brand;
-- GROUP BY with expression
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;
HAVING Clause
-- Filter groups
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 5 AND SUM(total_amount) > 1000;
-- WHERE and HAVING together
SELECT
category,
AVG(price) AS avg_price,
COUNT(*) AS product_count
FROM products
WHERE is_active = TRUE
GROUP BY category
HAVING AVG(price) > 100 AND COUNT(*) >= 10;
ROLLUP
-- Hierarchical subtotals
SELECT
YEAR(order_date) AS year,
QUARTER(order_date) AS quarter,
SUM(total_amount) AS revenue,
COUNT(*) AS order_count
FROM orders
GROUP BY YEAR(order_date), QUARTER(order_date) WITH ROLLUP;
-- Identify ROLLUP rows
SELECT
COALESCE(category, 'ALL CATEGORIES') AS category,
COALESCE(brand, 'ALL BRANDS') AS brand,
COUNT(*) AS product_count
FROM products
GROUP BY category, brand WITH ROLLUP;
CUBE
-- All combinations (SQL Server, PostgreSQL)
SELECT
region,
product_category,
SUM(sales_amount) AS total_sales
FROM sales
GROUP BY CUBE(region, product_category);
-- Results include:
-- (region, category), (region, NULL), (NULL, category), (NULL, NULL)
Best Practices
- Include All Non-Aggregated Columns: In GROUP BY clause
- Use HAVING for Groups: Filter aggregated results with HAVING
- Handle NULLs: Use COALESCE or IFNULL for NULL values
- Index GROUP BY Columns: Improves performance significantly
- Filter Early: Use WHERE before GROUP BY when possible
- Meaningful Aliases: Name aggregated columns clearly
- DISTINCT Carefully: COUNT(DISTINCT) can be expensive
- Consider Materialized Views: For frequently used aggregations
Comments
Share a question, correction, or practical insight about this article.
Checking login status...
Loading approved comments...