SQL Aggregate Functions: Complete Guide with Examples
Comprehensive guide to SQL aggregate functions including COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, ROLLUP, CUBE, and GROUPING SETS with visual diagrams
SQL Aggregate Functions
Table of Contents
- Introduction to Aggregate Functions
- COUNT Function
- SUM Function
- AVG Function
- MIN and MAX Functions
- GROUP BY Clause
- HAVING Clause
- GROUPING SETS
- ROLLUP
- CUBE
- Advanced Aggregation
- Best Practices
Introduction to Aggregate Functions
Aggregate functions perform calculations on a set of values and return a single value. They are commonly used with GROUP BY clause to group results.
┌─────────────────────────────────────────────────────────────┐
│ AGGREGATE FUNCTIONS │
├─────────────────────────────────────────────────────────────┤
│ │
│ COUNT() → Count number of rows │
│ SUM() → Calculate sum of values │
│ AVG() → Calculate average of values │
│ MIN() → Find minimum value │
│ MAX() → Find maximum value │
│ GROUP BY → Group rows for aggregation │
│ HAVING → Filter groups (like WHERE for groups) │
│ │
└─────────────────────────────────────────────────────────────┘
Aggregate Function Flow
┌─────────────────────────────────────────────────────────────┐
│ AGGREGATE FUNCTION EXECUTION │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. FROM → Get data from tables │
│ 2. WHERE → Filter individual rows │
│ 3. GROUP BY → Group rows │
│ 4. Aggregate → Apply aggregate functions │
│ 5. HAVING → Filter groups │
│ 6. SELECT → Select columns │
│ 7. ORDER BY → Sort results │
│ 8. LIMIT → Limit results │
│ │
└─────────────────────────────────────────────────────────────┘
Detailed Aggregate Functions Overview:
┌─────────────────────────────────────────────────────────────┐
│ AGGREGATE FUNCTIONS - DETAILED OVERVIEW │
├─────────────────────────────────────────────────────────────┤
│ │
│ How Aggregate Functions Work: │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Input: Multiple Rows │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Row 1: value = 100 │ │ │
│ │ │ Row 2: value = 200 │ │ │
│ │ │ Row 3: value = 150 │ │ │
│ │ │ Row 4: value = 300 │ │ │
│ │ │ Row 5: value = 250 │ │ │
│ │ └──────────────┬──────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Aggregate Function Processing: │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ COUNT(*) → 5 rows │ │ │
│ │ │ SUM(value) → 1000 │ │ │
│ │ │ AVG(value) → 200 │ │ │
│ │ │ MIN(value) → 100 │ │ │
│ │ │ MAX(value) → 300 │ │ │
│ │ └──────────────┬──────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Output: Single Value per Group │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Result: Aggregated value │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Function Comparison: │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Function │ Purpose │ NULL Handling │ │
│ │ ─────────┼──────────────────┼───────────────────── │ │
│ │ COUNT(*) │ Count all rows │ Includes NULLs │ │
│ │ COUNT(c) │ Count non-NULL │ Excludes NULLs │ │
│ │ SUM(c) │ Sum values │ Ignores NULLs │ │
│ │ AVG(c) │ Average values │ Ignores NULLs │ │
│ │ MIN(c) │ Minimum value │ Ignores NULLs │ │
│ │ MAX(c) │ Maximum value │ Ignores NULLs │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ With GROUP BY: │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Data: │ │
│ │ ┌────────────┬────────┐ │ │
│ │ │ category │ amount │ │ │
│ │ ├────────────┼────────┤ │ │
│ │ │ Electronics│ 100 │ ┐ │ │
│ │ │ Electronics│ 200 │ ├─ Group 1 │ │
│ │ │ Electronics│ 150 │ ┘ │ │
│ │ │ Books │ 50 │ ┐ │ │
│ │ │ Books │ 75 │ ├─ Group 2 │ │
│ │ │ Books │ 60 │ ┘ │ │
│ │ └────────────┴────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ GROUP BY category: │ │
│ │ ┌────────────┬───────┬─────┬─────┬─────┐ │ │
│ │ │ category │ COUNT │ SUM │ AVG │ MAX │ │ │
│ │ ├────────────┼───────┼─────┼─────┼─────┤ │ │
│ │ │ Electronics│ 3 │ 450 │ 150 │ 200 │ │ │
│ │ │ Books │ 3 │ 185 │ 62 │ 75 │ │ │
│ │ └────────────┴───────┴─────┴─────┴─────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Common Use Cases: │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 📊 Reporting: │ │
│ │ • Total sales per month │ │
│ │ • Average order value │ │
│ │ • Customer count by region │ │
│ │ │ │
│ │ 📈 Analytics: │ │
│ │ • Revenue trends │ │
│ │ • Performance metrics │ │
│ │ • Statistical analysis │ │
│ │ │ │
│ │ 🎯 Business Intelligence: │ │
│ │ • KPI calculations │ │
│ │ • Comparative analysis │ │
│ │ • Data summarization │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Sample Tables
-- Sales table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
category VARCHAR(50),
region VARCHAR(50),
sale_date DATE,
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(10,2)
);
INSERT INTO sales VALUES
(1, 101, 'Electronics', 'North', '2024-01-15', 2, 500.00, 1000.00),
(2, 102, 'Electronics', 'South', '2024-01-16', 1, 300.00, 300.00),
(3, 103, 'Furniture', 'North', '2024-01-17', 3, 200.00, 600.00),
(4, 104, 'Furniture', 'East', '2024-01-18', 1, 450.00, 450.00),
(5, 101, 'Electronics', 'West', '2024-01-19', 2, 500.00, 1000.00),
(6, 105, 'Clothing', 'North', '2024-01-20', 5, 50.00, 250.00),
(7, 106, 'Clothing', 'South', '2024-01-21', 3, 75.00, 225.00),
(8, 103, 'Furniture', 'West', '2024-01-22', 2, 200.00, 400.00),
(9, 102, 'Electronics', 'East', '2024-02-01', 1, 300.00, 300.00),
(10, 107, 'Clothing', 'North', '2024-02-02', 4, 60.00, 240.00);
-- Employees table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
department VARCHAR(50),
salary DECIMAL(10,2),
hire_date DATE,
manager_id INT
);
INSERT INTO employees VALUES
(1, 'John CEO', 'Executive', 150000, '2020-01-01', NULL),
(2, 'Jane Manager', 'Sales', 80000, '2020-06-01', 1),
(3, 'Bob Manager', 'IT', 85000, '2020-06-01', 1),
(4, 'Alice Employee', 'Sales', 50000, '2021-01-15', 2),
(5, 'Charlie Employee', 'Sales', 48000, '2021-02-01', 2),
(6, 'David Employee', 'IT', 55000, '2021-03-01', 3),
(7, 'Eve Employee', 'IT', 52000, '2021-03-15', 3),
(8, 'Frank Employee', 'Sales', 51000, '2021-04-01', 2);
COUNT Function
COUNT() returns the number of rows that match a specified criterion.
Syntax Variations
COUNT(*) -- Count all rows (including NULLs)
COUNT(column) -- Count non-NULL values in column
COUNT(DISTINCT column) -- Count unique non-NULL values
Example 1: Basic COUNT
-- Count total number of sales
SELECT COUNT(*) AS total_sales
FROM sales;
/*
total_sales
-----------
10
*/
-- Count sales by category
SELECT
category,
COUNT(*) AS sale_count
FROM sales
GROUP BY category
ORDER BY sale_count DESC;
/*
category | sale_count
------------|------------
Electronics | 4
Furniture | 3
Clothing | 3
*/
Example 2: COUNT with DISTINCT
-- Count unique products sold
SELECT COUNT(DISTINCT product_id) AS unique_products
FROM sales;
/*
unique_products
---------------
7
*/
-- Count unique products per category
SELECT
category,
COUNT(DISTINCT product_id) AS unique_products,
COUNT(*) AS total_sales
FROM sales
GROUP BY category;
/*
category | unique_products | total_sales
------------|-----------------|------------
Electronics | 2 | 4
Furniture | 2 | 3
Clothing | 3 | 3
*/
Example 3: COUNT with Conditions
-- Count sales over $500
SELECT COUNT(*) AS high_value_sales
FROM sales
WHERE total_amount > 500;
/*
high_value_sales
----------------
4
*/
-- Count sales by region with conditions
SELECT
region,
COUNT(*) AS total_sales,
COUNT(CASE WHEN total_amount > 500 THEN 1 END) AS high_value_sales,
COUNT(CASE WHEN total_amount <= 500 THEN 1 END) AS low_value_sales
FROM sales
GROUP BY region;
/*
region | total_sales | high_value_sales | low_value_sales
-------|-------------|------------------|----------------
North | 3 | 1 | 2
South | 2 | 0 | 2
East | 2 | 0 | 2
West | 3 | 2 | 1
*/
Example 4: COUNT with NULL Handling
-- Create table with NULLs
CREATE TABLE orders (
order_id INT,
customer_id INT,
notes TEXT
);
INSERT INTO orders VALUES
(1, 101, 'Express delivery'),
(2, 102, NULL),
(3, 103, 'Gift wrap'),
(4, 104, NULL);
-- COUNT(*) includes NULLs
SELECT COUNT(*) AS total_orders FROM orders; -- Returns 4
-- COUNT(column) excludes NULLs
SELECT COUNT(notes) AS orders_with_notes FROM orders; -- Returns 2
-- Count NULLs
SELECT COUNT(*) - COUNT(notes) AS orders_without_notes FROM orders; -- Returns 2
Example 5: COUNT with Multiple Groups
-- Count sales by category and region
SELECT
category,
region,
COUNT(*) AS sale_count,
SUM(quantity) AS total_quantity
FROM sales
GROUP BY category, region
ORDER BY category, region;
/*
category | region | sale_count | total_quantity
------------|--------|------------|---------------
Clothing | North | 2 | 9
Clothing | South | 1 | 3
Electronics | East | 1 | 1
Electronics | North | 1 | 2
Electronics | South | 1 | 1
Electronics | West | 1 | 2
Furniture | East | 1 | 1
Furniture | North | 1 | 3
Furniture | West | 1 | 2
*/
SUM Function
SUM() calculates the total sum of a numeric column.
Example 1: Basic SUM
-- Total sales amount
SELECT SUM(total_amount) AS total_revenue
FROM sales;
/*
total_revenue
-------------
4765.00
*/
-- Sum by category
SELECT
category,
SUM(total_amount) AS category_revenue,
SUM(quantity) AS total_quantity
FROM sales
GROUP BY category
ORDER BY category_revenue DESC;
/*
category | category_revenue | total_quantity
------------|------------------|---------------
Electronics | 2600.00 | 6
Furniture | 1450.00 | 6
Clothing | 715.00 | 12
*/
Example 2: SUM with Calculations
-- Calculate total revenue and profit
SELECT
category,
SUM(total_amount) AS revenue,
SUM(quantity * unit_price * 0.3) AS estimated_profit,
SUM(total_amount) - SUM(quantity * unit_price * 0.7) AS profit_margin
FROM sales
GROUP BY category;
Example 3: SUM with Conditions (CASE)
-- Sum sales by region with conditional totals
SELECT
region,
SUM(total_amount) AS total_sales,
SUM(CASE WHEN category = 'Electronics' THEN total_amount ELSE 0 END) AS electronics_sales,
SUM(CASE WHEN category = 'Furniture' THEN total_amount ELSE 0 END) AS furniture_sales,
SUM(CASE WHEN category = 'Clothing' THEN total_amount ELSE 0 END) AS clothing_sales
FROM sales
GROUP BY region
ORDER BY total_sales DESC;
/*
region | total_sales | electronics_sales | furniture_sales | clothing_sales
-------|-------------|-------------------|-----------------|---------------
North | 1850.00 | 1000.00 | 600.00 | 250.00
West | 1400.00 | 1000.00 | 400.00 | 0.00
South | 525.00 | 300.00 | 0.00 | 225.00
East | 750.00 | 300.00 | 450.00 | 0.00
*/
Example 4: SUM with Date Grouping
-- Monthly sales totals
SELECT
YEAR(sale_date) AS year,
MONTH(sale_date) AS month,
COUNT(*) AS sale_count,
SUM(total_amount) AS monthly_revenue,
AVG(total_amount) AS avg_sale_amount
FROM sales
GROUP BY YEAR(sale_date), MONTH(sale_date)
ORDER BY year, month;
/*
year | month | sale_count | monthly_revenue | avg_sale_amount
-----|-------|------------|-----------------|----------------
2024 | 1 | 8 | 4225.00 | 528.13
2024 | 2 | 2 | 540.00 | 270.00
*/
Example 5: Running Total (Cumulative SUM)
-- Running total of sales
SELECT
sale_date,
total_amount,
SUM(total_amount) OVER (ORDER BY sale_date) AS running_total
FROM sales
ORDER BY sale_date;
/*
sale_date | total_amount | running_total
-----------|--------------|---------------
2024-01-15 | 1000.00 | 1000.00
2024-01-16 | 300.00 | 1300.00
2024-01-17 | 600.00 | 1900.00
2024-01-18 | 450.00 | 2350.00
...
*/
AVG Function
AVG() calculates the average value of a numeric column.
Example 1: Basic AVG
-- Average sale amount
SELECT AVG(total_amount) AS avg_sale_amount
FROM sales;
/*
avg_sale_amount
---------------
476.50
*/
-- Average by category
SELECT
category,
COUNT(*) AS sale_count,
AVG(total_amount) AS avg_sale_amount,
MIN(total_amount) AS min_sale,
MAX(total_amount) AS max_sale
FROM sales
GROUP BY category
ORDER BY avg_sale_amount DESC;
/*
category | sale_count | avg_sale_amount | min_sale | max_sale
------------|------------|-----------------|----------|----------
Electronics | 4 | 650.00 | 300.00 | 1000.00
Furniture | 3 | 483.33 | 400.00 | 600.00
Clothing | 3 | 238.33 | 225.00 | 250.00
*/
Example 2: AVG with ROUND
-- Rounded averages
SELECT
category,
ROUND(AVG(total_amount), 2) AS avg_sale,
ROUND(AVG(quantity), 2) AS avg_quantity,
ROUND(AVG(unit_price), 2) AS avg_unit_price
FROM sales
GROUP BY category;
/*
category | avg_sale | avg_quantity | avg_unit_price
------------|----------|--------------|---------------
Electronics | 650.00 | 1.50 | 400.00
Furniture | 483.33 | 2.00 | 283.33
Clothing | 238.33 | 4.00 | 61.67
*/
Example 3: AVG with Filtering
-- Average salary by department (excluding CEO)
SELECT
department,
COUNT(*) AS employee_count,
ROUND(AVG(salary), 2) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
WHERE department != 'Executive'
GROUP BY department
ORDER BY avg_salary DESC;
/*
department | employee_count | avg_salary | min_salary | max_salary
-----------|----------------|------------|------------|------------
IT | 3 | 64000.00 | 52000.00 | 85000.00
Sales | 4 | 57250.00 | 48000.00 | 80000.00
*/
Example 4: Weighted Average
-- Weighted average price (by quantity)
SELECT
category,
SUM(quantity * unit_price) / SUM(quantity) AS weighted_avg_price,
AVG(unit_price) AS simple_avg_price
FROM sales
GROUP BY category;
/*
category | weighted_avg_price | simple_avg_price
------------|-------------------|------------------
Electronics | 400.00 | 400.00
Furniture | 216.67 | 283.33
Clothing | 59.58 | 61.67
*/
Example 5: AVG with NULL Handling
-- AVG ignores NULL values
CREATE TABLE test_scores (
student_id INT,
test1 INT,
test2 INT,
test3 INT
);
INSERT INTO test_scores VALUES
(1, 85, 90, 88),
(2, 78, NULL, 82), -- NULL is ignored in AVG
(3, 92, 95, NULL);
SELECT
student_id,
AVG(test1) AS avg_test1,
AVG(test2) AS avg_test2,
AVG(test3) AS avg_test3,
(COALESCE(test1, 0) + COALESCE(test2, 0) + COALESCE(test3, 0)) / 3 AS avg_all
FROM test_scores
GROUP BY student_id, test1, test2, test3;
MIN and MAX Functions
MIN() returns the minimum value, MAX() returns the maximum value.
Example 1: Basic MIN and MAX
-- Find price range
SELECT
MIN(unit_price) AS lowest_price,
MAX(unit_price) AS highest_price,
MAX(unit_price) - MIN(unit_price) AS price_range
FROM sales;
/*
lowest_price | highest_price | price_range
-------------|---------------|-------------
50.00 | 500.00 | 450.00
*/
Example 2: MIN and MAX by Group
-- Price range by category
SELECT
category,
MIN(unit_price) AS min_price,
MAX(unit_price) AS max_price,
AVG(unit_price) AS avg_price,
MAX(unit_price) - MIN(unit_price) AS price_range
FROM sales
GROUP BY category
ORDER BY price_range DESC;
/*
category | min_price | max_price | avg_price | price_range
------------|-----------|-----------|-----------|-------------
Electronics | 300.00 | 500.00 | 400.00 | 200.00
Furniture | 200.00 | 450.00 | 283.33 | 250.00
Clothing | 50.00 | 75.00 | 61.67 | 25.00
*/
Example 3: MIN and MAX with Dates
-- Find first and last sale dates
SELECT
category,
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale,
DATEDIFF(MAX(sale_date), MIN(sale_date)) AS days_between
FROM sales
GROUP BY category;
/*
category | first_sale | last_sale | days_between
------------|------------|------------|-------------
Electronics | 2024-01-15 | 2024-02-01 | 17
Furniture | 2024-01-17 | 2024-01-22 | 5
Clothing | 2024-01-20 | 2024-02-02 | 13
*/
Example 4: MIN and MAX with Strings
-- Alphabetical range of employee names
SELECT
department,
MIN(name) AS first_alphabetically,
MAX(name) AS last_alphabetically,
COUNT(*) AS employee_count
FROM employees
GROUP BY department;
/*
department | first_alphabetically | last_alphabetically | employee_count
-----------|---------------------|---------------------|---------------
Executive | John CEO | John CEO | 1
Sales | Alice Employee | Jane Manager | 4
IT | Bob Manager | Eve Employee | 3
*/
Example 5: Finding Records with MIN/MAX Values
-- Find the highest paid employee in each department
SELECT
e1.department,
e1.name,
e1.salary
FROM employees e1
WHERE e1.salary = (
SELECT MAX(e2.salary)
FROM employees e2
WHERE e2.department = e1.department
)
ORDER BY e1.salary DESC;
/*
department | name | salary
-----------|--------------|----------
Executive | John CEO | 150000.00
IT | Bob Manager | 85000.00
Sales | Jane Manager | 80000.00
*/
GROUP BY Clause
GROUP BY groups rows that have the same values in specified columns into summary rows.
GROUP BY Execution Flow Diagram:
┌─────────────────────────────────────────────────────────────┐
│ GROUP BY EXECUTION FLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ Query Execution Order: │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 1. FROM clause │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Identify source tables │ │ │
│ │ │ Perform JOINs if needed │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 2. WHERE clause │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Filter individual rows │ │ │
│ │ │ BEFORE grouping │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 3. GROUP BY clause │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Group rows by specified columns │ │ │
│ │ │ │ │ │
│ │ │ Example Data: │ │ │
│ │ │ ┌──────────┬────────┐ │ │ │
│ │ │ │ category │ amount │ │ │ │
│ │ │ ├──────────┼────────┤ │ │ │
│ │ │ │ A │ 100 │ ┐ │ │ │
│ │ │ │ A │ 200 │ ├─ Group A│ │ │
│ │ │ │ A │ 150 │ ┘ │ │ │
│ │ │ │ B │ 300 │ ┐ │ │ │
│ │ │ │ B │ 250 │ ├─ Group B│ │ │
│ │ │ │ B │ 400 │ ┘ │ │ │
│ │ │ └──────────┴────────┘ │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 4. Aggregate Functions │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Apply aggregations per group │ │ │
│ │ │ │ │ │
│ │ │ Group A: COUNT=3, SUM=450 │ │ │
│ │ │ Group B: COUNT=3, SUM=950 │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 5. HAVING clause │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Filter groups │ │ │
│ │ │ AFTER aggregation │ │ │
│ │ │ │ │ │
│ │ │ Example: HAVING SUM > 500 │ │ │
│ │ │ Result: Only Group B passes │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 6. SELECT clause │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Select columns and aggregates │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 7. ORDER BY clause │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Sort final results │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 8. LIMIT/OFFSET │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Limit number of results │ │ │
│ │ └─────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Visual Example: │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Original Data (10 rows): │ │
│ │ ┌──────────┬────────┬────────┐ │ │
│ │ │ dept │ emp │ salary │ │ │
│ │ ├──────────┼────────┼────────┤ │ │
│ │ │ Sales │ John │ 50000 │ ┐ │ │
│ │ │ Sales │ Jane │ 55000 │ │ │ │
│ │ │ Sales │ Bob │ 52000 │ ├─ Sales Group │ │
│ │ │ IT │ Alice │ 60000 │ │ │ │
│ │ │ IT │ Tom │ 65000 │ ├─ IT Group │ │
│ │ │ IT │ Sarah │ 62000 │ │ │ │
│ │ │ HR │ Mike │ 48000 │ │ │ │
│ │ │ HR │ Lisa │ 50000 │ ├─ HR Group │ │
│ │ └──────────┴────────┴────────┘ ┘ │ │
│ │ │ │ │
│ │ ▼ GROUP BY dept │ │
│ │ │ │
│ │ Grouped Result (3 rows): │ │
│ │ ┌──────────┬───────┬────────────┬────────────┐ │ │
│ │ │ dept │ COUNT │ AVG_SALARY │ MAX_SALARY │ │ │
│ │ ├──────────┼───────┼────────────┼────────────┤ │ │
│ │ │ Sales │ 3 │ 52333 │ 55000 │ │ │
│ │ │ IT │ 3 │ 62333 │ 65000 │ │ │
│ │ │ HR │ 2 │ 49000 │ 50000 │ │ │
│ │ └──────────┴───────┴────────────┴────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Multiple Column GROUP BY: │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ GROUP BY dept, year │ │
│ │ │ │
│ │ Creates groups for each unique combination: │ │
│ │ ┌──────────┬──────┬───────┐ │ │
│ │ │ dept │ year │ COUNT │ │ │
│ │ ├──────────┼──────┼───────┤ │ │
│ │ │ Sales │ 2023 │ 5 │ ← Unique combination │ │
│ │ │ Sales │ 2024 │ 7 │ ← Unique combination │ │
│ │ │ IT │ 2023 │ 3 │ ← Unique combination │ │
│ │ │ IT │ 2024 │ 4 │ ← Unique combination │ │
│ │ └──────────┴──────┴───────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Example 1: Single Column GROUP BY
-- Sales by category
SELECT
category,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue,
AVG(total_amount) AS avg_sale
FROM sales
GROUP BY category
ORDER BY total_revenue DESC;
Example 2: Multiple Column GROUP BY
-- Sales by category and region
SELECT
category,
region,
COUNT(*) AS sale_count,
SUM(total_amount) AS revenue,
AVG(total_amount) AS avg_sale
FROM sales
GROUP BY category, region
ORDER BY category, revenue DESC;
/*
category | region | sale_count | revenue | avg_sale
------------|--------|------------|---------|----------
Clothing | North | 2 | 490.00 | 245.00
Clothing | South | 1 | 225.00 | 225.00
Electronics | North | 1 | 1000.00 | 1000.00
Electronics | West | 1 | 1000.00 | 1000.00
Electronics | South | 1 | 300.00 | 300.00
Electronics | East | 1 | 300.00 | 300.00
Furniture | North | 1 | 600.00 | 600.00
Furniture | East | 1 | 450.00 | 450.00
Furniture | West | 1 | 400.00 | 400.00
*/
Example 3: GROUP BY with Expressions
-- Sales by year and month
SELECT
YEAR(sale_date) AS year,
MONTH(sale_date) AS month,
MONTHNAME(sale_date) AS month_name,
COUNT(*) AS sale_count,
SUM(total_amount) AS monthly_revenue
FROM sales
GROUP BY YEAR(sale_date), MONTH(sale_date), MONTHNAME(sale_date)
ORDER BY year, month;
/*
year | month | month_name | sale_count | monthly_revenue
-----|-------|------------|------------|----------------
2024 | 1 | January | 8 | 4225.00
2024 | 2 | February | 2 | 540.00
*/
Example 4: GROUP BY with CASE
-- Group sales into price ranges
SELECT
CASE
WHEN total_amount < 300 THEN 'Low (< 300)'
WHEN total_amount BETWEEN 300 AND 600 THEN 'Medium (300-600)'
ELSE 'High (> 600)'
END AS price_range,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue,
AVG(total_amount) AS avg_amount
FROM sales
GROUP BY
CASE
WHEN total_amount < 300 THEN 'Low (< 300)'
WHEN total_amount BETWEEN 300 AND 600 THEN 'Medium (300-600)'
ELSE 'High (> 600)'
END
ORDER BY avg_amount;
/*
price_range | sale_count | total_revenue | avg_amount
-------------------|------------|---------------|------------
Low (< 300) | 4 | 965.00 | 241.25
Medium (300-600) | 3 | 1350.00 | 450.00
High (> 600) | 3 | 2450.00 | 816.67
*/
Example 5: GROUP BY ALL Columns
-- Find duplicate sales (if any)
SELECT
product_id,
category,
region,
sale_date,
COUNT(*) AS duplicate_count
FROM sales
GROUP BY product_id, category, region, sale_date
HAVING COUNT(*) > 1;
HAVING Clause
HAVING filters groups after aggregation (WHERE filters rows before aggregation).
WHERE vs HAVING
┌─────────────────────────────────────────────────────────────┐
│ WHERE vs HAVING │
├─────────────────────────────────────────────────────────────┤
│ │
│ WHERE HAVING │
│ ───────────────────────────── ────────────────────────── │
│ • Filters individual rows • Filters groups │
│ • Before GROUP BY • After GROUP BY │
│ • Cannot use aggregates • Can use aggregates │
│ • Faster (reduces data early) • Slower (after grouping) │
│ │
└─────────────────────────────────────────────────────────────┘
Example 1: Basic HAVING
-- Categories with more than 2 sales
SELECT
category,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue
FROM sales
GROUP BY category
HAVING COUNT(*) > 2
ORDER BY sale_count DESC;
/*
category | sale_count | total_revenue
------------|------------|---------------
Electronics | 4 | 2600.00
Furniture | 3 | 1450.00
Clothing | 3 | 715.00
*/
Example 2: HAVING with Multiple Conditions
-- High-performing categories
SELECT
category,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue,
AVG(total_amount) AS avg_sale
FROM sales
GROUP BY category
HAVING COUNT(*) >= 3
AND SUM(total_amount) > 1000
ORDER BY total_revenue DESC;
/*
category | sale_count | total_revenue | avg_sale
------------|------------|---------------|----------
Electronics | 4 | 2600.00 | 650.00
Furniture | 3 | 1450.00 | 483.33
*/
Example 3: WHERE and HAVING Together
-- Sales in 2024 January, grouped by category, with revenue > 500
SELECT
category,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue
FROM sales
WHERE YEAR(sale_date) = 2024
AND MONTH(sale_date) = 1 -- WHERE filters rows first
GROUP BY category
HAVING SUM(total_amount) > 500 -- HAVING filters groups
ORDER BY total_revenue DESC;
/*
category | sale_count | total_revenue
------------|------------|---------------
Electronics | 3 | 2300.00
Furniture | 3 | 1450.00
*/
Example 4: HAVING with Subquery
-- Categories with above-average revenue
SELECT
category,
SUM(total_amount) AS total_revenue
FROM sales
GROUP BY category
HAVING SUM(total_amount) > (
SELECT AVG(category_revenue)
FROM (
SELECT SUM(total_amount) AS category_revenue
FROM sales
GROUP BY category
) AS category_totals
)
ORDER BY total_revenue DESC;
Example 5: HAVING with Complex Aggregates
-- Departments with high average salary and multiple employees
SELECT
department,
COUNT(*) AS employee_count,
ROUND(AVG(salary), 2) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
GROUP BY department
HAVING COUNT(*) > 1
AND AVG(salary) > 50000
ORDER BY avg_salary DESC;
/*
department | employee_count | avg_salary | min_salary | max_salary
-----------|----------------|------------|------------|------------
IT | 3 | 64000.00 | 52000.00 | 85000.00
Sales | 4 | 57250.00 | 48000.00 | 80000.00
*/
GROUPING SETS
GROUPING SETS allows you to define multiple grouping sets in a single query.
Example 1: Basic GROUPING SETS
-- Sales totals by category, region, and overall
SELECT
category,
region,
SUM(total_amount) AS total_revenue
FROM sales
GROUP BY GROUPING SETS (
(category, region), -- Group by both
(category), -- Group by category only
(region), -- Group by region only
() -- Grand total
)
ORDER BY category, region;
/*
category | region | total_revenue
------------|--------|---------------
Clothing | North | 490.00
Clothing | South | 225.00
Clothing | NULL | 715.00 -- Category subtotal
Electronics | East | 300.00
Electronics | North | 1000.00
Electronics | South | 300.00
Electronics | West | 1000.00
Electronics | NULL | 2600.00 -- Category subtotal
Furniture | East | 450.00
Furniture | North | 600.00
Furniture | West | 400.00
Furniture | NULL | 1450.00 -- Category subtotal
NULL | East | 750.00 -- Region subtotal
NULL | North | 2090.00 -- Region subtotal
NULL | South | 525.00 -- Region subtotal
NULL | West | 1400.00 -- Region subtotal
NULL | NULL | 4765.00 -- Grand total
*/
Example 2: GROUPING SETS vs UNION
-- GROUPING SETS (efficient)
SELECT category, region, SUM(total_amount) AS revenue
FROM sales
GROUP BY GROUPING SETS ((category), (region), ());
-- Equivalent using UNION (less efficient)
SELECT category, NULL AS region, SUM(total_amount) AS revenue
FROM sales GROUP BY category
UNION ALL
SELECT NULL, region, SUM(total_amount) AS revenue
FROM sales GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(total_amount) AS revenue
FROM sales;
ROLLUP
ROLLUP creates subtotals and grand totals in a hierarchical manner.
Example 1: Basic ROLLUP
-- Hierarchical totals: category → region → grand total
SELECT
category,
region,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue
FROM sales
GROUP BY ROLLUP(category, region)
ORDER BY category, region;
/*
category | region | sale_count | total_revenue
------------|--------|------------|---------------
Clothing | North | 2 | 490.00
Clothing | South | 1 | 225.00
Clothing | NULL | 3 | 715.00 -- Category subtotal
Electronics | East | 1 | 300.00
Electronics | North | 1 | 1000.00
Electronics | South | 1 | 300.00
Electronics | West | 1 | 1000.00
Electronics | NULL | 4 | 2600.00 -- Category subtotal
Furniture | East | 1 | 450.00
Furniture | North | 1 | 600.00
Furniture | West | 1 | 400.00
Furniture | NULL | 3 | 1450.00 -- Category subtotal
NULL | NULL | 10 | 4765.00 -- Grand total
*/
Example 2: ROLLUP with Three Levels
-- Year → Month → Day hierarchy
SELECT
YEAR(sale_date) AS year,
MONTH(sale_date) AS month,
DAY(sale_date) AS day,
SUM(total_amount) AS daily_revenue
FROM sales
GROUP BY ROLLUP(YEAR(sale_date), MONTH(sale_date), DAY(sale_date))
ORDER BY year, month, day;
Example 3: Identifying ROLLUP Rows
-- Use GROUPING() to identify subtotal rows
SELECT
category,
region,
SUM(total_amount) AS revenue,
GROUPING(category) AS is_category_total,
GROUPING(region) AS is_region_total,
CASE
WHEN GROUPING(category) = 1 AND GROUPING(region) = 1 THEN 'Grand Total'
WHEN GROUPING(region) = 1 THEN 'Category Subtotal'
ELSE 'Detail'
END AS row_type
FROM sales
GROUP BY ROLLUP(category, region)
ORDER BY category, region;
CUBE
CUBE creates subtotals for all possible combinations of dimensions.
Example 1: Basic CUBE
-- All possible combinations of category and region
SELECT
category,
region,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue
FROM sales
GROUP BY CUBE(category, region)
ORDER BY category, region;
/*
category | region | sale_count | total_revenue
------------|--------|------------|---------------
NULL | NULL | 10 | 4765.00 -- Grand total
NULL | East | 2 | 750.00 -- Region total
NULL | North | 3 | 2090.00 -- Region total
NULL | South | 2 | 525.00 -- Region total
NULL | West | 3 | 1400.00 -- Region total
Clothing | NULL | 3 | 715.00 -- Category total
Clothing | North | 2 | 490.00
Clothing | South | 1 | 225.00
Electronics | NULL | 4 | 2600.00 -- Category total
Electronics | East | 1 | 300.00
Electronics | North | 1 | 1000.00
Electronics | South | 1 | 300.00
Electronics | West | 1 | 1000.00
Furniture | NULL | 3 | 1450.00 -- Category total
Furniture | East | 1 | 450.00
Furniture | North | 1 | 600.00
Furniture | West | 1 | 400.00
*/
Example 2: CUBE vs ROLLUP
-- ROLLUP: Hierarchical (category → region)
-- Creates: (category, region), (category), ()
SELECT category, region, SUM(total_amount) AS revenue
FROM sales
GROUP BY ROLLUP(category, region);
-- CUBE: All combinations
-- Creates: (category, region), (category), (region), ()
SELECT category, region, SUM(total_amount) AS revenue
FROM sales
GROUP BY CUBE(category, region);
Advanced Aggregation
Example 1: Conditional Aggregation
-- Pivot-like aggregation using CASE
SELECT
region,
SUM(CASE WHEN category = 'Electronics' THEN total_amount ELSE 0 END) AS electronics,
SUM(CASE WHEN category = 'Furniture' THEN total_amount ELSE 0 END) AS furniture,
SUM(CASE WHEN category = 'Clothing' THEN total_amount ELSE 0 END) AS clothing,
SUM(total_amount) AS total
FROM sales
GROUP BY region
ORDER BY total DESC;
/*
region | electronics | furniture | clothing | total
-------|-------------|-----------|----------|--------
North | 1000.00 | 600.00 | 490.00 | 2090.00
West | 1000.00 | 400.00 | 0.00 | 1400.00
East | 300.00 | 450.00 | 0.00 | 750.00
South | 300.00 | 0.00 | 225.00 | 525.00
*/
Example 2: String Aggregation
-- MySQL: GROUP_CONCAT
SELECT
category,
GROUP_CONCAT(DISTINCT region ORDER BY region) AS regions,
COUNT(DISTINCT region) AS region_count
FROM sales
GROUP BY category;
/*
category | regions | region_count
------------|----------------------|-------------
Clothing | North,South | 2
Electronics | East,North,South,West| 4
Furniture | East,North,West | 3
*/
-- PostgreSQL: STRING_AGG
SELECT
category,
STRING_AGG(DISTINCT region, ', ' ORDER BY region) AS regions
FROM sales
GROUP BY category;
-- SQL Server: STRING_AGG (SQL Server 2017+)
SELECT
category,
STRING_AGG(region, ', ') WITHIN GROUP (ORDER BY region) AS regions
FROM sales
GROUP BY category;
Example 3: Statistical Aggregates
-- Standard deviation and variance
SELECT
category,
COUNT(*) AS sale_count,
AVG(total_amount) AS avg_amount,
STDDEV(total_amount) AS std_dev,
VARIANCE(total_amount) AS variance,
MIN(total_amount) AS min_amount,
MAX(total_amount) AS max_amount
FROM sales
GROUP BY category;
Example 4: Percentile Calculations
-- Median and percentiles (MySQL 8.0+)
SELECT
category,
AVG(total_amount) AS mean,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) AS median,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_amount) AS q1,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_amount) AS q3
FROM sales
GROUP BY category;
Example 5: Nested Aggregation
-- Average of category averages
SELECT
AVG(category_avg) AS overall_avg
FROM (
SELECT
category,
AVG(total_amount) AS category_avg
FROM sales
GROUP BY category
) AS category_averages;
Best Practices
1. Use Appropriate Aggregate Functions
-- ✅ GOOD: Use COUNT(*) for row count
SELECT COUNT(*) FROM sales;
-- ❌ AVOID: COUNT(column) when you want all rows
SELECT COUNT(product_id) FROM sales; -- Excludes NULLs
2. Include All Non-Aggregated Columns in GROUP BY
-- ✅ GOOD: All non-aggregated columns in GROUP BY
SELECT category, region, SUM(total_amount)
FROM sales
GROUP BY category, region;
-- ❌ ERROR: Missing column in GROUP BY
SELECT category, region, SUM(total_amount)
FROM sales
GROUP BY category; -- Error: region not in GROUP BY
3. Use HAVING for Group Filtering
-- ✅ GOOD: HAVING for aggregate conditions
SELECT category, SUM(total_amount) AS revenue
FROM sales
GROUP BY category
HAVING SUM(total_amount) > 1000;
-- ❌ WRONG: WHERE with aggregate (won't work)
SELECT category, SUM(total_amount) AS revenue
FROM sales
WHERE SUM(total_amount) > 1000 -- Error!
GROUP BY category;
4. Handle NULLs Appropriately
-- ✅ GOOD: Use COALESCE for NULL handling
SELECT
category,
COALESCE(SUM(total_amount), 0) AS total_revenue
FROM sales
GROUP BY category;
5. Use Indexes on GROUP BY Columns
-- ✅ GOOD: Create indexes for frequently grouped columns
CREATE INDEX idx_sales_category ON sales(category);
CREATE INDEX idx_sales_region ON sales(region);
CREATE INDEX idx_sales_date ON sales(sale_date);
6. Consider Performance with Large Datasets
-- ✅ GOOD: Filter before grouping
SELECT category, SUM(total_amount)
FROM sales
WHERE sale_date >= '2024-01-01' -- Filter first
GROUP BY category;
-- ❌ SLOWER: Group all data then filter
SELECT category, SUM(total_amount)
FROM sales
GROUP BY category
HAVING MIN(sale_date) >= '2024-01-01';
7. Use Meaningful Aliases
-- ✅ GOOD: Clear aliases
SELECT
category,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_revenue,
AVG(total_amount) AS avg_sale_amount
FROM sales
GROUP BY category;
-- ❌ BAD: Unclear aliases
SELECT
category,
COUNT(*) AS c,
SUM(total_amount) AS s,
AVG(total_amount) AS a
FROM sales
GROUP BY category;
Summary
This section covered:
- ✅ COUNT function (with DISTINCT and NULL handling)
- ✅ SUM function (with conditional aggregation)
- ✅ AVG function (with weighted averages)
- ✅ MIN and MAX functions
- ✅ GROUP BY clause (single and multiple columns)
- ✅ HAVING clause (filtering groups)
- ✅ GROUPING SETS (multiple groupings)
- ✅ ROLLUP (hierarchical totals)
- ✅ CUBE (all combinations)
- ✅ Advanced aggregation techniques
- ✅ Best practices