SQL • 2026-06-12

SQL Subqueries - Complete Guide with Diagrams

Master SQL subqueries with visual diagrams, practical examples, and best practices. Learn scalar, correlated, EXISTS, IN, ANY, and ALL subqueries.

SQL Subqueries - Complete Guide

Introduction to Subqueries
Scalar Subqueries
Row Subqueries
Table Subqueries
Correlated Subqueries
EXISTS and NOT EXISTS
IN and NOT IN
ANY and ALL
Subqueries in Different Clauses
Performance Considerations
Best Practices
Common Use Cases
Interview Questions

Introduction to Subqueries

A subquery (also called an inner query or nested query) is a query within another SQL query. Subqueries can return different types of results and can be used in various parts of a SQL statement.

What is a Subquery?

A subquery is a SELECT statement embedded within another SQL statement. It provides a way to perform operations that would otherwise require multiple separate queries or complex joins.

Subquery Types

┌─────────────────────────────────────────────────────────────┐
│                    SUBQUERY TYPES                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. SCALAR SUBQUERY                                          │
│     • Returns single value (one row, one column)            │
│     • Can be used anywhere a single value is expected       │
│     • Example: SELECT (SELECT MAX(salary) FROM employees)   │
│                                                              │
│  2. ROW SUBQUERY                                             │
│     • Returns single row (multiple columns)                 │
│     • Used with row comparisons                             │
│     • Example: WHERE (col1, col2) = (SELECT ...)           │
│                                                              │
│  3. TABLE SUBQUERY                                           │
│     • Returns multiple rows and columns                     │
│     • Used with IN, EXISTS, FROM clause                     │
│     • Example: SELECT * FROM (SELECT ...) AS subquery      │
│                                                              │
│  4. CORRELATED SUBQUERY                                      │
│     • References outer query columns                        │
│     • Executes once per outer query row                     │
│     • Example: WHERE salary > (SELECT AVG(salary) ...)     │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Subquery Execution Flow

┌─────────────────────────────────────────────────────────────┐
│              SUBQUERY EXECUTION FLOW                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Non-Correlated Subquery:                                   │
│  ┌──────────────────────────────────────┐                  │
│  │  1. Execute subquery once            │                  │
│  │     (independent of outer query)     │                  │
│  └──────────────┬───────────────────────┘                  │
│                 │                                            │
│                 ▼                                            │
│  ┌──────────────────────────────────────┐                  │
│  │  2. Store result                     │                  │
│  └──────────────┬───────────────────────┘                  │
│                 │                                            │
│                 ▼                                            │
│  ┌──────────────────────────────────────┐                  │
│  │  3. Use result in outer query        │                  │
│  └──────────────────────────────────────┘                  │
│                                                              │
│  Correlated Subquery:                                       │
│  ┌──────────────────────────────────────┐                  │
│  │  1. Outer query processes row 1      │                  │
│  └──────────────┬───────────────────────┘                  │
│                 │                                            │
│                 ▼                                            │
│  ┌──────────────────────────────────────┐                  │
│  │  2. Subquery executes for row 1      │                  │
│  │     (uses row 1 values)              │                  │
│  └──────────────┬───────────────────────┘                  │
│                 │                                            │
│                 ▼                                            │
│  ┌──────────────────────────────────────┐                  │
│  │  3. Result used for row 1            │                  │
│  └──────────────┬───────────────────────┘                  │
│                 │                                            │
│                 ▼                                            │
│  Repeat for each row in outer query                        │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Subquery Syntax

-- Basic subquery structure
SELECT column1, column2
FROM table1
WHERE column3 = (SELECT column4 FROM table2 WHERE condition);

-- Subquery in SELECT clause
SELECT 
    column1,
    (SELECT MAX(column2) FROM table2) AS max_value
FROM table1;

-- Subquery in FROM clause
SELECT *
FROM (
    SELECT column1, column2
    FROM table1
    WHERE condition
) AS subquery_alias;

Scalar Subqueries

Scalar subqueries return a single value (one row, one column) and can be used anywhere a single value is expected.

Scalar Subquery Concept

┌─────────────────────────────────────────────────────────────┐
│              SCALAR SUBQUERY CONCEPT                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Outer Query:                                                │
│  SELECT name, salary                                         │
│  FROM employees                                              │
│  WHERE salary > (SELECT AVG(salary) FROM employees)         │
│                         │                                    │
│                         └──────────┐                         │
│                                    │                         │
│  Subquery executes first:         │                         │
│  ┌────────────────────────────────▼──┐                     │
│  │ SELECT AVG(salary)                 │                     │
│  │ FROM employees                     │                     │
│  │ Result: 50000 (single value)       │                     │
│  └────────────────────────────────────┘                     │
│                         │                                    │
│                         ▼                                    │
│  Outer query becomes:                                        │
│  SELECT name, salary                                         │
│  FROM employees                                              │
│  WHERE salary > 50000                                        │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Scalar Subquery Examples

-- Find employees earning more than average
SELECT 
    employee_id,
    first_name,
    last_name,
    salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Compare with maximum value
SELECT 
    product_name,
    price,
    (SELECT MAX(price) FROM products) AS max_price,
    price - (SELECT MAX(price) FROM products) AS difference
FROM products;

-- Use in SELECT clause
SELECT 
    order_id,
    customer_id,
    total_amount,
    (SELECT AVG(total_amount) FROM orders) AS avg_order_amount,
    total_amount - (SELECT AVG(total_amount) FROM orders) AS difference_from_avg
FROM orders;

-- Multiple scalar subqueries
SELECT 
    employee_id,
    salary,
    (SELECT MIN(salary) FROM employees) AS min_salary,
    (SELECT MAX(salary) FROM employees) AS max_salary,
    (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees
WHERE department_id = 10;

Row Subqueries

Row subqueries return a single row with multiple columns.

Row Subquery Concept

┌─────────────────────────────────────────────────────────────┐
│              ROW SUBQUERY CONCEPT                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Compare multiple columns at once:                          │
│                                                              │
│  SELECT * FROM employees                                     │
│  WHERE (department_id, salary) =                            │
│        (SELECT department_id, MAX(salary)                   │
│         FROM employees                                       │
│         WHERE department_id = 10)                           │
│                                                              │
│  Subquery returns: (10, 75000)                             │
│                     │    │                                  │
│                     │    └─ salary                          │
│                     └────── department_id                   │
│                                                              │
│  Matches rows where both conditions are true               │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Row Subquery Examples

-- Find employee with highest salary in department
SELECT *
FROM employees
WHERE (department_id, salary) = (
    SELECT department_id, MAX(salary)
    FROM employees
    WHERE department_id = 10
);

-- Compare multiple columns
SELECT *
FROM products
WHERE (category_id, price) IN (
    SELECT category_id, MAX(price)
    FROM products
    GROUP BY category_id
);

-- Row comparison with operators
SELECT *
FROM employees e1
WHERE (salary, hire_date) > (
    SELECT AVG(salary), MIN(hire_date)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

Table Subqueries

Table subqueries return multiple rows and columns, creating a derived table.

Table Subquery Concept

┌─────────────────────────────────────────────────────────────┐
│              TABLE SUBQUERY CONCEPT                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Subquery as a derived table:                               │
│                                                              │
│  SELECT dept_name, avg_salary                               │
│  FROM (                                                      │
│      SELECT                                                  │
│          department_id,                                      │
│          AVG(salary) as avg_salary                          │
│      FROM employees                                          │
│      GROUP BY department_id                                  │
│  ) AS dept_avg                                              │
│  JOIN departments d ON dept_avg.department_id = d.id        │
│                                                              │
│  Subquery creates temporary result set:                     │
│  ┌──────────────┬─────────────┐                            │
│  │ department_id│ avg_salary  │                            │
│  ├──────────────┼─────────────┤                            │
│  │      1       │   50000     │                            │
│  │      2       │   60000     │                            │
│  │      3       │   55000     │                            │
│  └──────────────┴─────────────┘                            │
│                                                              │
│  This derived table is then joined with departments         │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Table Subquery Examples

-- Derived table in FROM clause
SELECT 
    dept_summary.department_id,
    d.department_name,
    dept_summary.emp_count,
    dept_summary.avg_salary
FROM (
    SELECT 
        department_id,
        COUNT(*) as emp_count,
        AVG(salary) as avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_summary
JOIN departments d ON dept_summary.department_id = d.department_id;

-- Multiple levels of subqueries
SELECT *
FROM (
    SELECT 
        customer_id,
        SUM(total_amount) as total_spent
    FROM orders
    GROUP BY customer_id
) AS customer_totals
WHERE total_spent > (
    SELECT AVG(total_spent)
    FROM (
        SELECT SUM(total_amount) as total_spent
        FROM orders
        GROUP BY customer_id
    ) AS avg_calc
);

-- Subquery with aggregation
SELECT 
    category,
    product_count,
    avg_price
FROM (
    SELECT 
        category_id,
        COUNT(*) as product_count,
        AVG(price) as avg_price
    FROM products
    GROUP BY category_id
    HAVING COUNT(*) > 5
) AS category_stats
JOIN categories c ON category_stats.category_id = c.category_id;

Correlated Subqueries

Correlated subqueries reference columns from the outer query and execute once for each row processed by the outer query.

Correlated Subquery Execution

┌─────────────────────────────────────────────────────────────┐
│          CORRELATED SUBQUERY EXECUTION                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  SELECT e1.name, e1.salary                                  │
│  FROM employees e1                                           │
│  WHERE e1.salary > (                                        │
│      SELECT AVG(e2.salary)                                  │
│      FROM employees e2                                       │
│      WHERE e2.department_id = e1.department_id              │
│  )                              │                            │
│                                 └─ References outer query    │
│                                                              │
│  Execution for each row:                                     │
│                                                              │
│  Row 1: e1 = {id:1, dept:10, salary:60000}                 │
│  ┌────────────────────────────────────────┐                │
│  │ Subquery: AVG salary for dept 10       │                │
│  │ Result: 50000                           │                │
│  │ Compare: 60000 > 50000 ✓ Include       │                │
│  └────────────────────────────────────────┘                │
│                                                              │
│  Row 2: e1 = {id:2, dept:10, salary:45000}                 │
│  ┌────────────────────────────────────────┐                │
│  │ Subquery: AVG salary for dept 10       │                │
│  │ Result: 50000                           │                │
│  │ Compare: 45000 > 50000 ✗ Exclude       │                │
│  └────────────────────────────────────────┘                │
│                                                              │
│  Row 3: e1 = {id:3, dept:20, salary:70000}                 │
│  ┌────────────────────────────────────────┐                │
│  │ Subquery: AVG salary for dept 20       │                │
│  │ Result: 65000                           │                │
│  │ Compare: 70000 > 65000 ✓ Include       │                │
│  └────────────────────────────────────────┘                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Correlated Subquery Examples

-- Find employees earning above department average
SELECT 
    e1.employee_id,
    e1.first_name,
    e1.last_name,
    e1.salary,
    e1.department_id
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- Find customers with above-average order totals
SELECT 
    c.customer_id,
    c.customer_name,
    (SELECT SUM(o.total_amount)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c
WHERE (
    SELECT SUM(o.total_amount)
    FROM orders o
    WHERE o.customer_id = c.customer_id
) > (
    SELECT AVG(customer_total)
    FROM (
        SELECT SUM(total_amount) AS customer_total
        FROM orders
        GROUP BY customer_id
    ) AS totals
);

-- Find products more expensive than category average
SELECT 
    p1.product_id,
    p1.product_name,
    p1.price,
    p1.category_id
FROM products p1
WHERE p1.price > (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category_id = p1.category_id
);

EXISTS and NOT EXISTS

EXISTS checks if a subquery returns any rows. It's more efficient than IN for large datasets.

EXISTS Concept

┌─────────────────────────────────────────────────────────────┐
│              EXISTS vs IN COMPARISON                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  EXISTS:                                                     │
│  • Returns TRUE if subquery returns any rows               │
│  • Stops as soon as first match is found                   │
│  • More efficient for large datasets                       │
│  • Can handle NULL values better                           │
│                                                              │
│  SELECT * FROM customers c                                   │
│  WHERE EXISTS (                                             │
│      SELECT 1 FROM orders o                                 │
│      WHERE o.customer_id = c.customer_id                   │
│  )                                                          │
│                                                              │
│  Execution:                                                  │
│  For each customer:                                         │
│  ┌────────────────────────────────┐                        │
│  │ Check if any orders exist      │                        │
│  │ If yes: Include customer ✓     │                        │
│  │ If no:  Exclude customer ✗     │                        │
│  │ Stop checking after first match│                        │
│  └────────────────────────────────┘                        │
│                                                              │
│  IN:                                                         │
│  • Compares value against list                             │
│  • Processes entire subquery result                        │
│  • Can be slower for large datasets                        │
│                                                              │
└─────────────────────────────────────────────────────────────┘

EXISTS Examples

-- Find customers who have placed orders
SELECT 
    customer_id,
    customer_name,
    email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- Find customers who haven't placed orders
SELECT 
    customer_id,
    customer_name,
    email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- Find employees who manage others
SELECT 
    e1.employee_id,
    e1.first_name,
    e1.last_name
FROM employees e1
WHERE EXISTS (
    SELECT 1
    FROM employees e2
    WHERE e2.manager_id = e1.employee_id
);

-- Find products never ordered
SELECT 
    product_id,
    product_name,
    price
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

-- Complex EXISTS with multiple conditions
SELECT 
    c.customer_id,
    c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
    AND o.total_amount > 1000
);

IN and NOT IN

IN operator checks if a value matches any value in a list or subquery result.

IN Operator Concept

┌─────────────────────────────────────────────────────────────┐
│              IN OPERATOR CONCEPT                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  SELECT * FROM employees                                     │
│  WHERE department_id IN (                                   │
│      SELECT department_id                                   │
│      FROM departments                                        │
│      WHERE location = 'New York'                            │
│  )                                                          │
│                                                              │
│  Subquery returns: [10, 20, 30]                            │
│                                                              │
│  Equivalent to:                                              │
│  WHERE department_id = 10                                   │
│     OR department_id = 20                                   │
│     OR department_id = 30                                   │
│                                                              │
│  ⚠️  NULL Handling:                                         │
│  • IN handles NULLs correctly                               │
│  • NOT IN with NULLs can return unexpected results         │
│                                                              │
└─────────────────────────────────────────────────────────────┘

IN and NOT IN Examples

-- Find employees in specific departments
SELECT 
    employee_id,
    first_name,
    last_name,
    department_id
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE location = 'New York'
);

-- Find products in popular categories
SELECT 
    product_id,
    product_name,
    category_id
FROM products
WHERE category_id IN (
    SELECT category_id
    FROM (
        SELECT category_id, COUNT(*) as product_count
        FROM products
        GROUP BY category_id
        HAVING COUNT(*) > 10
    ) AS popular_categories
);

-- Find customers who ordered specific products
SELECT DISTINCT
    c.customer_id,
    c.customer_name
FROM customers c
WHERE c.customer_id IN (
    SELECT o.customer_id
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE oi.product_id IN (101, 102, 103)
);

-- NOT IN example (be careful with NULLs)
SELECT 
    product_id,
    product_name
FROM products
WHERE product_id NOT IN (
    SELECT product_id
    FROM order_items
    WHERE product_id IS NOT NULL  -- Important!
);

-- Better alternative to NOT IN with NULLs
SELECT 
    p.product_id,
    p.product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

ANY and ALL

ANY and ALL operators compare a value to a set of values returned by a subquery.

ANY and ALL Concepts

┌─────────────────────────────────────────────────────────────┐
│              ANY and ALL OPERATORS                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ANY (or SOME):                                              │
│  • Returns TRUE if comparison is true for ANY value         │
│  • = ANY is equivalent to IN                                │
│                                                              │
│  WHERE salary > ANY (SELECT salary FROM employees)          │
│  Returns TRUE if salary is greater than at least one value  │
│                                                              │
│  ALL:                                                        │
│  • Returns TRUE if comparison is true for ALL values        │
│  • More restrictive than ANY                                │
│                                                              │
│  WHERE salary > ALL (SELECT salary FROM employees)          │
│  Returns TRUE if salary is greater than every value         │
│                                                              │
│  Comparison:                                                 │
│  Subquery returns: [30000, 40000, 50000]                   │
│                                                              │
│  salary = 45000                                             │
│  • > ANY: TRUE (45000 > 30000) ✓                           │
│  • > ALL: FALSE (45000 not > 50000) ✗                      │
│                                                              │
│  salary = 60000                                             │
│  • > ANY: TRUE (60000 > 30000) ✓                           │
│  • > ALL: TRUE (60000 > 50000) ✓                           │
│                                                              │
└─────────────────────────────────────────────────────────────┘

ANY and ALL Examples

-- Find employees earning more than ANY employee in dept 10
SELECT 
    employee_id,
    first_name,
    salary
FROM employees
WHERE salary > ANY (
    SELECT salary
    FROM employees
    WHERE department_id = 10
);

-- Find employees earning more than ALL employees in dept 10
SELECT 
    employee_id,
    first_name,
    salary
FROM employees
WHERE salary > ALL (
    SELECT salary
    FROM employees
    WHERE department_id = 10
);

-- Find products cheaper than ANY product in category 1
SELECT 
    product_id,
    product_name,
    price
FROM products
WHERE price < ANY (
    SELECT price
    FROM products
    WHERE category_id = 1
);

-- Find products more expensive than ALL products in category 1
SELECT 
    product_id,
    product_name,
    price
FROM products
WHERE price > ALL (
    SELECT price
    FROM products
    WHERE category_id = 1
);

-- = ANY is equivalent to IN
SELECT * FROM employees
WHERE department_id = ANY (SELECT department_id FROM departments WHERE location = 'NYC');

-- Same as:
SELECT * FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'NYC');

-- <> ALL is equivalent to NOT IN
SELECT * FROM products
WHERE category_id <> ALL (SELECT category_id FROM categories WHERE active = 0);

Subqueries in Different Clauses

Subqueries can be used in various parts of a SQL statement.

Subquery Locations

┌─────────────────────────────────────────────────────────────┐
│          SUBQUERY LOCATIONS IN SQL                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. SELECT Clause (Scalar subquery)                         │
│     SELECT col1, (SELECT ...) AS col2                       │
│                                                              │
│  2. FROM Clause (Derived table)                             │
│     FROM (SELECT ...) AS alias                              │
│                                                              │
│  3. WHERE Clause (Filter condition)                         │
│     WHERE col1 = (SELECT ...)                               │
│     WHERE col1 IN (SELECT ...)                              │
│     WHERE EXISTS (SELECT ...)                               │
│                                                              │
│  4. HAVING Clause (Group filter)                            │
│     HAVING COUNT(*) > (SELECT ...)                          │
│                                                              │
│  5. INSERT Statement                                         │
│     INSERT INTO table1 SELECT ... FROM table2               │
│                                                              │
│  6. UPDATE Statement                                         │
│     UPDATE table1 SET col1 = (SELECT ...)                   │
│                                                              │
│  7. DELETE Statement                                         │
│     DELETE FROM table1 WHERE col1 IN (SELECT ...)           │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Examples in Different Clauses

-- SELECT clause
SELECT 
    e.employee_id,
    e.first_name,
    e.salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    (SELECT MAX(salary) FROM employees WHERE department_id = e.department_id) AS dept_max
FROM employees e;

-- FROM clause
SELECT 
    dept_stats.department_id,
    dept_stats.avg_salary,
    dept_stats.emp_count
FROM (
    SELECT 
        department_id,
        AVG(salary) AS avg_salary,
        COUNT(*) AS emp_count
    FROM employees
    GROUP BY department_id
) AS dept_stats
WHERE dept_stats.emp_count > 5;

-- WHERE clause
SELECT *
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
AND department_id IN (SELECT department_id FROM departments WHERE location = 'NYC');

-- HAVING clause
SELECT 
    department_id,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (
    SELECT AVG(salary)
    FROM employees
);

-- INSERT with subquery
INSERT INTO high_value_customers (customer_id, customer_name, total_spent)
SELECT 
    customer_id,
    customer_name,
    (SELECT SUM(total_amount) FROM orders WHERE orders.customer_id = customers.customer_id)
FROM customers
WHERE (SELECT SUM(total_amount) FROM orders WHERE orders.customer_id = customers.customer_id) > 10000;

-- UPDATE with subquery
UPDATE employees
SET salary = salary * 1.1
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE budget > 1000000
);

-- DELETE with subquery
DELETE FROM products
WHERE product_id NOT IN (
    SELECT DISTINCT product_id
    FROM order_items
    WHERE product_id IS NOT NULL
);

Performance Considerations

Performance Comparison

┌─────────────────────────────────────────────────────────────┐
│          SUBQUERY PERFORMANCE TIPS                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. EXISTS vs IN                                             │
│     • EXISTS: Stops at first match (faster)                 │
│     • IN: Processes entire result set                       │
│     • Use EXISTS for large datasets                         │
│                                                              │
│  2. JOIN vs Subquery                                         │
│     • JOINs often faster than correlated subqueries         │
│     • Modern optimizers can convert subqueries to joins     │
│     • Test both approaches                                  │
│                                                              │
│  3. Correlated vs Non-Correlated                            │
│     • Non-correlated: Executes once (faster)                │
│     • Correlated: Executes per row (slower)                 │
│     • Avoid correlated when possible                        │
│                                                              │
│  4. Subquery in SELECT                                       │
│     • Can be slow if executed for many rows                 │
│     • Consider JOIN or window functions instead             │
│                                                              │
│  5. NOT IN with NULLs                                        │
│     • Can return unexpected results                         │
│     • Use NOT EXISTS or LEFT JOIN instead                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Optimization Examples

-- Slow: Correlated subquery in SELECT
SELECT 
    c.customer_id,
    c.customer_name,
    (SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id) AS order_count
FROM customers c;

-- Fast: JOIN with aggregation
SELECT 
    c.customer_id,
    c.customer_name,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;

-- Slow: NOT IN with potential NULLs
SELECT *
FROM products
WHERE product_id NOT IN (SELECT product_id FROM order_items);

-- Fast: NOT EXISTS
SELECT *
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

-- Slow: Multiple correlated subqueries
SELECT 
    e.employee_id,
    (SELECT COUNT(*) FROM projects WHERE manager_id = e.employee_id),
    (SELECT SUM(budget) FROM projects WHERE manager_id = e.employee_id)
FROM employees e;

-- Fast: Single JOIN with aggregation
SELECT 
    e.employee_id,
    COUNT(p.project_id) AS project_count,
    SUM(p.budget) AS total_budget
FROM employees e
LEFT JOIN projects p ON e.employee_id = p.manager_id
GROUP BY e.employee_id;

Best Practices

Subquery Best Practices

┌─────────────────────────────────────────────────────────────┐
│          SUBQUERY BEST PRACTICES                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ✓ DO:                                                      │
│    • Use EXISTS instead of IN for large datasets            │
│    • Avoid correlated subqueries when possible              │
│    • Use meaningful aliases for derived tables              │
│    • Test performance with EXPLAIN                          │
│    • Consider JOINs as alternative                          │
│    • Handle NULL values explicitly                          │
│    • Keep subqueries simple and readable                    │
│                                                              │
│  ✗ DON'T:                                                   │
│    • Use SELECT * in subqueries                             │
│    • Nest subqueries too deeply                             │
│    • Use NOT IN with nullable columns                       │
│    • Ignore performance implications                        │
│    • Forget to alias derived tables                         │
│    • Use correlated subqueries in SELECT unnecessarily      │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Common Use Cases

1. Finding Top N per Group

-- Find top 3 highest paid employees per department
SELECT *
FROM (
    SELECT 
        employee_id,
        first_name,
        last_name,
        salary,
        department_id,
        ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
    FROM employees
) AS ranked
WHERE rn <= 3;

-- Alternative with correlated subquery
SELECT e1.*
FROM employees e1
WHERE (
    SELECT COUNT(*)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
    AND e2.salary > e1.salary
) < 3;

2. Finding Duplicates

-- Find duplicate email addresses
SELECT email, COUNT(*) as count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- Find customers with duplicate emails
SELECT *
FROM customers
WHERE email IN (
    SELECT email
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
);

3. Running Totals

-- Calculate running total using subquery
SELECT 
    order_id,
    order_date,
    total_amount,
    (SELECT SUM(total_amount)
     FROM orders o2
     WHERE o2.order_date <= o1.order_date) AS running_total
FROM orders o1
ORDER BY order_date;

Interview Questions

Basic Questions

Q1: What is a subquery?

A subquery is a SELECT statement nested inside another SQL statement. It can return a single value, a single row, or multiple rows and can be used in various clauses like SELECT, FROM, WHERE, and HAVING.

Q2: What are the types of subqueries?

Scalar: Returns single value
Row: Returns single row with multiple columns
Table: Returns multiple rows and columns
Correlated: References outer query columns

Q3: What's the difference between correlated and non-correlated subqueries?

Non-correlated: Independent of outer query, executes once
Correlated: References outer query columns, executes once per outer row

Intermediate Questions

Q4: When should you use EXISTS instead of IN?

Use EXISTS when:

Working with large datasets (EXISTS stops at first match)
Checking for existence rather than matching values
Dealing with NULL values (EXISTS handles them better)

Q5: What's the problem with NOT IN and NULL values?

If the subquery returns any NULL values, NOT IN returns no rows because NULL comparisons are undefined. Use NOT EXISTS or LEFT JOIN with IS NULL instead.

Q6: How do you optimize a correlated subquery?

Convert to JOIN when possible
Use window functions
Create indexes on join columns
Consider materialized views for complex calculations

Advanced Questions

Q7: Explain the difference between ANY and ALL.

ANY: Returns TRUE if condition is true for at least one value
ALL: Returns TRUE if condition is true for all values
Example: salary > ANY (subquery) vs salary > ALL (subquery)

Q8: How do subqueries affect query performance?

Correlated subqueries execute once per outer row (slow)
Non-correlated subqueries execute once (faster)
Subqueries in SELECT clause can be expensive
Modern optimizers may convert subqueries to joins

Q9: Can you use a subquery in an INSERT statement?

Yes, you can insert results from a subquery:

INSERT INTO table1 (col1, col2)
SELECT col1, col2
FROM table2
WHERE condition;

Q10: What's the difference between a subquery and a CTE?

Subquery: Inline, can be used once
CTE: Named, can be referenced multiple times, more readable
CTEs are better for complex queries and recursive operations

Summary

Subqueries are powerful tools for complex data retrieval. Key takeaways:

Types: Scalar, row, table, and correlated subqueries
Operators: EXISTS, IN, ANY, ALL for different comparison needs
Performance: EXISTS often faster than IN, avoid correlated when possible
Best Practices: Use appropriate type, handle NULLs, consider JOINs as alternative
Use Cases: Finding top N, duplicates, running totals, complex filtering

Master subqueries to write more sophisticated and efficient SQL queries that solve complex data problems elegantly.