SQL Subqueries - Complete Guide with Diagrams
Master SQL subqueries with visual diagrams, practical examples, and best practices. Learn scalar, correlated, EXISTS, IN, ANY, and ALL subqueries.
SQL Subqueries - Complete Guide
Table of Contents
- Introduction to Subqueries
- Scalar Subqueries
- Row Subqueries
- Table Subqueries
- Correlated Subqueries
- EXISTS and NOT EXISTS
- IN and NOT IN
- ANY and ALL
- Subqueries in Different Clauses
- Performance Considerations
- Best Practices
- Common Use Cases
- Interview Questions
Introduction to Subqueries
A subquery (also called an inner query or nested query) is a query within another SQL query. Subqueries can return different types of results and can be used in various parts of a SQL statement.
What is a Subquery?
A subquery is a SELECT statement embedded within another SQL statement. It provides a way to perform operations that would otherwise require multiple separate queries or complex joins.
Subquery Types
┌─────────────────────────────────────────────────────────────┐
│ SUBQUERY TYPES │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. SCALAR SUBQUERY │
│ • Returns single value (one row, one column) │
│ • Can be used anywhere a single value is expected │
│ • Example: SELECT (SELECT MAX(salary) FROM employees) │
│ │
│ 2. ROW SUBQUERY │
│ • Returns single row (multiple columns) │
│ • Used with row comparisons │
│ • Example: WHERE (col1, col2) = (SELECT ...) │
│ │
│ 3. TABLE SUBQUERY │
│ • Returns multiple rows and columns │
│ • Used with IN, EXISTS, FROM clause │
│ • Example: SELECT * FROM (SELECT ...) AS subquery │
│ │
│ 4. CORRELATED SUBQUERY │
│ • References outer query columns │
│ • Executes once per outer query row │
│ • Example: WHERE salary > (SELECT AVG(salary) ...) │
│ │
└─────────────────────────────────────────────────────────────┘
Subquery Execution Flow
┌─────────────────────────────────────────────────────────────┐
│ SUBQUERY EXECUTION FLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ Non-Correlated Subquery: │
│ ┌──────────────────────────────────────┐ │
│ │ 1. Execute subquery once │ │
│ │ (independent of outer query) │ │
│ └──────────────┬───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ 2. Store result │ │
│ └──────────────┬───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ 3. Use result in outer query │ │
│ └──────────────────────────────────────┘ │
│ │
│ Correlated Subquery: │
│ ┌──────────────────────────────────────┐ │
│ │ 1. Outer query processes row 1 │ │
│ └──────────────┬───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ 2. Subquery executes for row 1 │ │
│ │ (uses row 1 values) │ │
│ └──────────────┬───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ 3. Result used for row 1 │ │
│ └──────────────┬───────────────────────┘ │
│ │ │
│ ▼ │
│ Repeat for each row in outer query │
│ │
└─────────────────────────────────────────────────────────────┘
Subquery Syntax
-- Basic subquery structure
SELECT column1, column2
FROM table1
WHERE column3 = (SELECT column4 FROM table2 WHERE condition);
-- Subquery in SELECT clause
SELECT
column1,
(SELECT MAX(column2) FROM table2) AS max_value
FROM table1;
-- Subquery in FROM clause
SELECT *
FROM (
SELECT column1, column2
FROM table1
WHERE condition
) AS subquery_alias;
Scalar Subqueries
Scalar subqueries return a single value (one row, one column) and can be used anywhere a single value is expected.
Scalar Subquery Concept
┌─────────────────────────────────────────────────────────────┐
│ SCALAR SUBQUERY CONCEPT │
├─────────────────────────────────────────────────────────────┤
│ │
│ Outer Query: │
│ SELECT name, salary │
│ FROM employees │
│ WHERE salary > (SELECT AVG(salary) FROM employees) │
│ │ │
│ └──────────┐ │
│ │ │
│ Subquery executes first: │ │
│ ┌────────────────────────────────▼──┐ │
│ │ SELECT AVG(salary) │ │
│ │ FROM employees │ │
│ │ Result: 50000 (single value) │ │
│ └────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Outer query becomes: │
│ SELECT name, salary │
│ FROM employees │
│ WHERE salary > 50000 │
│ │
└─────────────────────────────────────────────────────────────┘
Scalar Subquery Examples
-- Find employees earning more than average
SELECT
employee_id,
first_name,
last_name,
salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Compare with maximum value
SELECT
product_name,
price,
(SELECT MAX(price) FROM products) AS max_price,
price - (SELECT MAX(price) FROM products) AS difference
FROM products;
-- Use in SELECT clause
SELECT
order_id,
customer_id,
total_amount,
(SELECT AVG(total_amount) FROM orders) AS avg_order_amount,
total_amount - (SELECT AVG(total_amount) FROM orders) AS difference_from_avg
FROM orders;
-- Multiple scalar subqueries
SELECT
employee_id,
salary,
(SELECT MIN(salary) FROM employees) AS min_salary,
(SELECT MAX(salary) FROM employees) AS max_salary,
(SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees
WHERE department_id = 10;
Row Subqueries
Row subqueries return a single row with multiple columns.
Row Subquery Concept
┌─────────────────────────────────────────────────────────────┐
│ ROW SUBQUERY CONCEPT │
├─────────────────────────────────────────────────────────────┤
│ │
│ Compare multiple columns at once: │
│ │
│ SELECT * FROM employees │
│ WHERE (department_id, salary) = │
│ (SELECT department_id, MAX(salary) │
│ FROM employees │
│ WHERE department_id = 10) │
│ │
│ Subquery returns: (10, 75000) │
│ │ │ │
│ │ └─ salary │
│ └────── department_id │
│ │
│ Matches rows where both conditions are true │
│ │
└─────────────────────────────────────────────────────────────┘
Row Subquery Examples
-- Find employee with highest salary in department
SELECT *
FROM employees
WHERE (department_id, salary) = (
SELECT department_id, MAX(salary)
FROM employees
WHERE department_id = 10
);
-- Compare multiple columns
SELECT *
FROM products
WHERE (category_id, price) IN (
SELECT category_id, MAX(price)
FROM products
GROUP BY category_id
);
-- Row comparison with operators
SELECT *
FROM employees e1
WHERE (salary, hire_date) > (
SELECT AVG(salary), MIN(hire_date)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
Table Subqueries
Table subqueries return multiple rows and columns, creating a derived table.
Table Subquery Concept
┌─────────────────────────────────────────────────────────────┐
│ TABLE SUBQUERY CONCEPT │
├─────────────────────────────────────────────────────────────┤
│ │
│ Subquery as a derived table: │
│ │
│ SELECT dept_name, avg_salary │
│ FROM ( │
│ SELECT │
│ department_id, │
│ AVG(salary) as avg_salary │
│ FROM employees │
│ GROUP BY department_id │
│ ) AS dept_avg │
│ JOIN departments d ON dept_avg.department_id = d.id │
│ │
│ Subquery creates temporary result set: │
│ ┌──────────────┬─────────────┐ │
│ │ department_id│ avg_salary │ │
│ ├──────────────┼─────────────┤ │
│ │ 1 │ 50000 │ │
│ │ 2 │ 60000 │ │
│ │ 3 │ 55000 │ │
│ └──────────────┴─────────────┘ │
│ │
│ This derived table is then joined with departments │
│ │
└─────────────────────────────────────────────────────────────┘
Table Subquery Examples
-- Derived table in FROM clause
SELECT
dept_summary.department_id,
d.department_name,
dept_summary.emp_count,
dept_summary.avg_salary
FROM (
SELECT
department_id,
COUNT(*) as emp_count,
AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
) AS dept_summary
JOIN departments d ON dept_summary.department_id = d.department_id;
-- Multiple levels of subqueries
SELECT *
FROM (
SELECT
customer_id,
SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id
) AS customer_totals
WHERE total_spent > (
SELECT AVG(total_spent)
FROM (
SELECT SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id
) AS avg_calc
);
-- Subquery with aggregation
SELECT
category,
product_count,
avg_price
FROM (
SELECT
category_id,
COUNT(*) as product_count,
AVG(price) as avg_price
FROM products
GROUP BY category_id
HAVING COUNT(*) > 5
) AS category_stats
JOIN categories c ON category_stats.category_id = c.category_id;
Correlated Subqueries
Correlated subqueries reference columns from the outer query and execute once for each row processed by the outer query.
Correlated Subquery Execution
┌─────────────────────────────────────────────────────────────┐
│ CORRELATED SUBQUERY EXECUTION │
├─────────────────────────────────────────────────────────────┤
│ │
│ SELECT e1.name, e1.salary │
│ FROM employees e1 │
│ WHERE e1.salary > ( │
│ SELECT AVG(e2.salary) │
│ FROM employees e2 │
│ WHERE e2.department_id = e1.department_id │
│ ) │ │
│ └─ References outer query │
│ │
│ Execution for each row: │
│ │
│ Row 1: e1 = {id:1, dept:10, salary:60000} │
│ ┌────────────────────────────────────────┐ │
│ │ Subquery: AVG salary for dept 10 │ │
│ │ Result: 50000 │ │
│ │ Compare: 60000 > 50000 ✓ Include │ │
│ └────────────────────────────────────────┘ │
│ │
│ Row 2: e1 = {id:2, dept:10, salary:45000} │
│ ┌────────────────────────────────────────┐ │
│ │ Subquery: AVG salary for dept 10 │ │
│ │ Result: 50000 │ │
│ │ Compare: 45000 > 50000 ✗ Exclude │ │
│ └────────────────────────────────────────┘ │
│ │
│ Row 3: e1 = {id:3, dept:20, salary:70000} │
│ ┌────────────────────────────────────────┐ │
│ │ Subquery: AVG salary for dept 20 │ │
│ │ Result: 65000 │ │
│ │ Compare: 70000 > 65000 ✓ Include │ │
│ └────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Correlated Subquery Examples
-- Find employees earning above department average
SELECT
e1.employee_id,
e1.first_name,
e1.last_name,
e1.salary,
e1.department_id
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
-- Find customers with above-average order totals
SELECT
c.customer_id,
c.customer_name,
(SELECT SUM(o.total_amount)
FROM orders o
WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c
WHERE (
SELECT SUM(o.total_amount)
FROM orders o
WHERE o.customer_id = c.customer_id
) > (
SELECT AVG(customer_total)
FROM (
SELECT SUM(total_amount) AS customer_total
FROM orders
GROUP BY customer_id
) AS totals
);
-- Find products more expensive than category average
SELECT
p1.product_id,
p1.product_name,
p1.price,
p1.category_id
FROM products p1
WHERE p1.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category_id = p1.category_id
);
EXISTS and NOT EXISTS
EXISTS checks if a subquery returns any rows. It's more efficient than IN for large datasets.
EXISTS Concept
┌─────────────────────────────────────────────────────────────┐
│ EXISTS vs IN COMPARISON │
├─────────────────────────────────────────────────────────────┤
│ │
│ EXISTS: │
│ • Returns TRUE if subquery returns any rows │
│ • Stops as soon as first match is found │
│ • More efficient for large datasets │
│ • Can handle NULL values better │
│ │
│ SELECT * FROM customers c │
│ WHERE EXISTS ( │
│ SELECT 1 FROM orders o │
│ WHERE o.customer_id = c.customer_id │
│ ) │
│ │
│ Execution: │
│ For each customer: │
│ ┌────────────────────────────────┐ │
│ │ Check if any orders exist │ │
│ │ If yes: Include customer ✓ │ │
│ │ If no: Exclude customer ✗ │ │
│ │ Stop checking after first match│ │
│ └────────────────────────────────┘ │
│ │
│ IN: │
│ • Compares value against list │
│ • Processes entire subquery result │
│ • Can be slower for large datasets │
│ │
└─────────────────────────────────────────────────────────────┘
EXISTS Examples
-- Find customers who have placed orders
SELECT
customer_id,
customer_name,
email
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Find customers who haven't placed orders
SELECT
customer_id,
customer_name,
email
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Find employees who manage others
SELECT
e1.employee_id,
e1.first_name,
e1.last_name
FROM employees e1
WHERE EXISTS (
SELECT 1
FROM employees e2
WHERE e2.manager_id = e1.employee_id
);
-- Find products never ordered
SELECT
product_id,
product_name,
price
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
-- Complex EXISTS with multiple conditions
SELECT
c.customer_id,
c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
AND o.total_amount > 1000
);
IN and NOT IN
IN operator checks if a value matches any value in a list or subquery result.
IN Operator Concept
┌─────────────────────────────────────────────────────────────┐
│ IN OPERATOR CONCEPT │
├─────────────────────────────────────────────────────────────┤
│ │
│ SELECT * FROM employees │
│ WHERE department_id IN ( │
│ SELECT department_id │
│ FROM departments │
│ WHERE location = 'New York' │
│ ) │
│ │
│ Subquery returns: [10, 20, 30] │
│ │
│ Equivalent to: │
│ WHERE department_id = 10 │
│ OR department_id = 20 │
│ OR department_id = 30 │
│ │
│ ⚠️ NULL Handling: │
│ • IN handles NULLs correctly │
│ • NOT IN with NULLs can return unexpected results │
│ │
└─────────────────────────────────────────────────────────────┘
IN and NOT IN Examples
-- Find employees in specific departments
SELECT
employee_id,
first_name,
last_name,
department_id
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location = 'New York'
);
-- Find products in popular categories
SELECT
product_id,
product_name,
category_id
FROM products
WHERE category_id IN (
SELECT category_id
FROM (
SELECT category_id, COUNT(*) as product_count
FROM products
GROUP BY category_id
HAVING COUNT(*) > 10
) AS popular_categories
);
-- Find customers who ordered specific products
SELECT DISTINCT
c.customer_id,
c.customer_name
FROM customers c
WHERE c.customer_id IN (
SELECT o.customer_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE oi.product_id IN (101, 102, 103)
);
-- NOT IN example (be careful with NULLs)
SELECT
product_id,
product_name
FROM products
WHERE product_id NOT IN (
SELECT product_id
FROM order_items
WHERE product_id IS NOT NULL -- Important!
);
-- Better alternative to NOT IN with NULLs
SELECT
p.product_id,
p.product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
ANY and ALL
ANY and ALL operators compare a value to a set of values returned by a subquery.
ANY and ALL Concepts
┌─────────────────────────────────────────────────────────────┐
│ ANY and ALL OPERATORS │
├─────────────────────────────────────────────────────────────┤
│ │
│ ANY (or SOME): │
│ • Returns TRUE if comparison is true for ANY value │
│ • = ANY is equivalent to IN │
│ │
│ WHERE salary > ANY (SELECT salary FROM employees) │
│ Returns TRUE if salary is greater than at least one value │
│ │
│ ALL: │
│ • Returns TRUE if comparison is true for ALL values │
│ • More restrictive than ANY │
│ │
│ WHERE salary > ALL (SELECT salary FROM employees) │
│ Returns TRUE if salary is greater than every value │
│ │
│ Comparison: │
│ Subquery returns: [30000, 40000, 50000] │
│ │
│ salary = 45000 │
│ • > ANY: TRUE (45000 > 30000) ✓ │
│ • > ALL: FALSE (45000 not > 50000) ✗ │
│ │
│ salary = 60000 │
│ • > ANY: TRUE (60000 > 30000) ✓ │
│ • > ALL: TRUE (60000 > 50000) ✓ │
│ │
└─────────────────────────────────────────────────────────────┘
ANY and ALL Examples
-- Find employees earning more than ANY employee in dept 10
SELECT
employee_id,
first_name,
salary
FROM employees
WHERE salary > ANY (
SELECT salary
FROM employees
WHERE department_id = 10
);
-- Find employees earning more than ALL employees in dept 10
SELECT
employee_id,
first_name,
salary
FROM employees
WHERE salary > ALL (
SELECT salary
FROM employees
WHERE department_id = 10
);
-- Find products cheaper than ANY product in category 1
SELECT
product_id,
product_name,
price
FROM products
WHERE price < ANY (
SELECT price
FROM products
WHERE category_id = 1
);
-- Find products more expensive than ALL products in category 1
SELECT
product_id,
product_name,
price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category_id = 1
);
-- = ANY is equivalent to IN
SELECT * FROM employees
WHERE department_id = ANY (SELECT department_id FROM departments WHERE location = 'NYC');
-- Same as:
SELECT * FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'NYC');
-- <> ALL is equivalent to NOT IN
SELECT * FROM products
WHERE category_id <> ALL (SELECT category_id FROM categories WHERE active = 0);
Subqueries in Different Clauses
Subqueries can be used in various parts of a SQL statement.
Subquery Locations
┌─────────────────────────────────────────────────────────────┐
│ SUBQUERY LOCATIONS IN SQL │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. SELECT Clause (Scalar subquery) │
│ SELECT col1, (SELECT ...) AS col2 │
│ │
│ 2. FROM Clause (Derived table) │
│ FROM (SELECT ...) AS alias │
│ │
│ 3. WHERE Clause (Filter condition) │
│ WHERE col1 = (SELECT ...) │
│ WHERE col1 IN (SELECT ...) │
│ WHERE EXISTS (SELECT ...) │
│ │
│ 4. HAVING Clause (Group filter) │
│ HAVING COUNT(*) > (SELECT ...) │
│ │
│ 5. INSERT Statement │
│ INSERT INTO table1 SELECT ... FROM table2 │
│ │
│ 6. UPDATE Statement │
│ UPDATE table1 SET col1 = (SELECT ...) │
│ │
│ 7. DELETE Statement │
│ DELETE FROM table1 WHERE col1 IN (SELECT ...) │
│ │
└─────────────────────────────────────────────────────────────┘
Examples in Different Clauses
-- SELECT clause
SELECT
e.employee_id,
e.first_name,
e.salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
(SELECT MAX(salary) FROM employees WHERE department_id = e.department_id) AS dept_max
FROM employees e;
-- FROM clause
SELECT
dept_stats.department_id,
dept_stats.avg_salary,
dept_stats.emp_count
FROM (
SELECT
department_id,
AVG(salary) AS avg_salary,
COUNT(*) AS emp_count
FROM employees
GROUP BY department_id
) AS dept_stats
WHERE dept_stats.emp_count > 5;
-- WHERE clause
SELECT *
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
AND department_id IN (SELECT department_id FROM departments WHERE location = 'NYC');
-- HAVING clause
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (
SELECT AVG(salary)
FROM employees
);
-- INSERT with subquery
INSERT INTO high_value_customers (customer_id, customer_name, total_spent)
SELECT
customer_id,
customer_name,
(SELECT SUM(total_amount) FROM orders WHERE orders.customer_id = customers.customer_id)
FROM customers
WHERE (SELECT SUM(total_amount) FROM orders WHERE orders.customer_id = customers.customer_id) > 10000;
-- UPDATE with subquery
UPDATE employees
SET salary = salary * 1.1
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE budget > 1000000
);
-- DELETE with subquery
DELETE FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
WHERE product_id IS NOT NULL
);
Performance Considerations
Performance Comparison
┌─────────────────────────────────────────────────────────────┐
│ SUBQUERY PERFORMANCE TIPS │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. EXISTS vs IN │
│ • EXISTS: Stops at first match (faster) │
│ • IN: Processes entire result set │
│ • Use EXISTS for large datasets │
│ │
│ 2. JOIN vs Subquery │
│ • JOINs often faster than correlated subqueries │
│ • Modern optimizers can convert subqueries to joins │
│ • Test both approaches │
│ │
│ 3. Correlated vs Non-Correlated │
│ • Non-correlated: Executes once (faster) │
│ • Correlated: Executes per row (slower) │
│ • Avoid correlated when possible │
│ │
│ 4. Subquery in SELECT │
│ • Can be slow if executed for many rows │
│ • Consider JOIN or window functions instead │
│ │
│ 5. NOT IN with NULLs │
│ • Can return unexpected results │
│ • Use NOT EXISTS or LEFT JOIN instead │
│ │
└─────────────────────────────────────────────────────────────┘
Optimization Examples
-- Slow: Correlated subquery in SELECT
SELECT
c.customer_id,
c.customer_name,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id) AS order_count
FROM customers c;
-- Fast: JOIN with aggregation
SELECT
c.customer_id,
c.customer_name,
COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;
-- Slow: NOT IN with potential NULLs
SELECT *
FROM products
WHERE product_id NOT IN (SELECT product_id FROM order_items);
-- Fast: NOT EXISTS
SELECT *
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
-- Slow: Multiple correlated subqueries
SELECT
e.employee_id,
(SELECT COUNT(*) FROM projects WHERE manager_id = e.employee_id),
(SELECT SUM(budget) FROM projects WHERE manager_id = e.employee_id)
FROM employees e;
-- Fast: Single JOIN with aggregation
SELECT
e.employee_id,
COUNT(p.project_id) AS project_count,
SUM(p.budget) AS total_budget
FROM employees e
LEFT JOIN projects p ON e.employee_id = p.manager_id
GROUP BY e.employee_id;
Best Practices
Subquery Best Practices
┌─────────────────────────────────────────────────────────────┐
│ SUBQUERY BEST PRACTICES │
├─────────────────────────────────────────────────────────────┤
│ │
│ ✓ DO: │
│ • Use EXISTS instead of IN for large datasets │
│ • Avoid correlated subqueries when possible │
│ • Use meaningful aliases for derived tables │
│ • Test performance with EXPLAIN │
│ • Consider JOINs as alternative │
│ • Handle NULL values explicitly │
│ • Keep subqueries simple and readable │
│ │
│ ✗ DON'T: │
│ • Use SELECT * in subqueries │
│ • Nest subqueries too deeply │
│ • Use NOT IN with nullable columns │
│ • Ignore performance implications │
│ • Forget to alias derived tables │
│ • Use correlated subqueries in SELECT unnecessarily │
│ │
└─────────────────────────────────────────────────────────────┘
Common Use Cases
1. Finding Top N per Group
-- Find top 3 highest paid employees per department
SELECT *
FROM (
SELECT
employee_id,
first_name,
last_name,
salary,
department_id,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
FROM employees
) AS ranked
WHERE rn <= 3;
-- Alternative with correlated subquery
SELECT e1.*
FROM employees e1
WHERE (
SELECT COUNT(*)
FROM employees e2
WHERE e2.department_id = e1.department_id
AND e2.salary > e1.salary
) < 3;
2. Finding Duplicates
-- Find duplicate email addresses
SELECT email, COUNT(*) as count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
-- Find customers with duplicate emails
SELECT *
FROM customers
WHERE email IN (
SELECT email
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
);
3. Running Totals
-- Calculate running total using subquery
SELECT
order_id,
order_date,
total_amount,
(SELECT SUM(total_amount)
FROM orders o2
WHERE o2.order_date <= o1.order_date) AS running_total
FROM orders o1
ORDER BY order_date;
Interview Questions
Basic Questions
Q1: What is a subquery?
A subquery is a SELECT statement nested inside another SQL statement. It can return a single value, a single row, or multiple rows and can be used in various clauses like SELECT, FROM, WHERE, and HAVING.
Q2: What are the types of subqueries?
- Scalar: Returns single value
- Row: Returns single row with multiple columns
- Table: Returns multiple rows and columns
- Correlated: References outer query columns
Q3: What's the difference between correlated and non-correlated subqueries?
- Non-correlated: Independent of outer query, executes once
- Correlated: References outer query columns, executes once per outer row
Intermediate Questions
Q4: When should you use EXISTS instead of IN?
Use EXISTS when:
- Working with large datasets (EXISTS stops at first match)
- Checking for existence rather than matching values
- Dealing with NULL values (EXISTS handles them better)
Q5: What's the problem with NOT IN and NULL values?
If the subquery returns any NULL values, NOT IN returns no rows because NULL comparisons are undefined. Use NOT EXISTS or LEFT JOIN with IS NULL instead.
Q6: How do you optimize a correlated subquery?
- Convert to JOIN when possible
- Use window functions
- Create indexes on join columns
- Consider materialized views for complex calculations
Advanced Questions
Q7: Explain the difference between ANY and ALL.
- ANY: Returns TRUE if condition is true for at least one value
- ALL: Returns TRUE if condition is true for all values
- Example:
salary > ANY (subquery)vssalary > ALL (subquery)
Q8: How do subqueries affect query performance?
- Correlated subqueries execute once per outer row (slow)
- Non-correlated subqueries execute once (faster)
- Subqueries in SELECT clause can be expensive
- Modern optimizers may convert subqueries to joins
Q9: Can you use a subquery in an INSERT statement?
Yes, you can insert results from a subquery:
INSERT INTO table1 (col1, col2)
SELECT col1, col2
FROM table2
WHERE condition;
Q10: What's the difference between a subquery and a CTE?
- Subquery: Inline, can be used once
- CTE: Named, can be referenced multiple times, more readable
- CTEs are better for complex queries and recursive operations
Summary
Subqueries are powerful tools for complex data retrieval. Key takeaways:
- Types: Scalar, row, table, and correlated subqueries
- Operators: EXISTS, IN, ANY, ALL for different comparison needs
- Performance: EXISTS often faster than IN, avoid correlated when possible
- Best Practices: Use appropriate type, handle NULLs, consider JOINs as alternative
- Use Cases: Finding top N, duplicates, running totals, complex filtering
Master subqueries to write more sophisticated and efficient SQL queries that solve complex data problems elegantly.