SQL Subqueries
Master SQL subqueries with Mermaid diagrams, practical examples covering scalar, correlated, EXISTS, IN, ANY, and ALL subqueries
A subquery (nested query) is a SELECT statement embedded within another SQL query. It provides a way to perform complex operations that would otherwise require multiple queries or joins.
Subquery Types
graph TB
A[SQL Subqueries] --> B[Scalar Subquery]
A --> C[Row Subquery]
A --> D[Table Subquery]
A --> E[Correlated Subquery]
B --> B1[Single Value]
B --> B2[One row, one column]
C --> C1[Single Row]
C --> C2[One row, multiple columns]
D --> D1[Multiple Rows]
D --> D2[Multiple columns]
E --> E1[References Outer Query]
E --> E2[Executes per row]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
style E fill:#F44336
Key Points:
- Scalar: Returns single value, used anywhere a value is expected
- Row: Returns single row with multiple columns
- Table: Returns multiple rows and columns, used with IN, EXISTS, FROM
- Correlated: References outer query columns, executes for each outer row
- Non-Correlated: Independent of outer query, executes once
Non-Correlated Subquery Execution
sequenceDiagram
participant Outer as Outer Query
participant Sub as Subquery
participant Result
Outer->>Sub: Execute subquery first
Sub->>Sub: Process independently
Sub->>Result: Return result set
Result->>Outer: Use result in outer query
Outer->>Outer: Filter/process with result
Note over Sub: Executes only once
Key Points:
- Independent: Subquery doesn't reference outer query columns
- Single Execution: Runs once before outer query
- Result Caching: Result stored and reused
- Performance: Generally faster than correlated subqueries
Correlated Subquery Execution
sequenceDiagram
participant Outer as Outer Query
participant Row as Current Row
participant Sub as Subquery
loop For each row
Outer->>Row: Process row 1
Row->>Sub: Execute with row 1 values
Sub->>Row: Return result
Row->>Outer: Include/exclude row
Outer->>Row: Process row 2
Row->>Sub: Execute with row 2 values
Sub->>Row: Return result
Row->>Outer: Include/exclude row
end
Note over Sub: Executes once per row
Key Points:
- Row-Dependent: References outer query columns
- Multiple Executions: Runs once for each outer query row
- Dynamic: Different result for each row
- Performance: Can be slower for large datasets
- Use Case: Compare row values with aggregates
EXISTS vs IN Comparison
graph LR
A[Existence Check] --> B[EXISTS]
A --> C[IN]
B --> B1[Returns TRUE/FALSE]
B --> B2[Stops at first match]
B --> B3[Better for large sets]
C --> C1[Compares values]
C --> C2[Processes all results]
C --> C3[Better for small sets]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
Key Points:
- EXISTS: Checks if any rows exist, stops at first match
- IN: Compares value against list, processes entire result
- Performance: EXISTS faster for large datasets
- NULL Handling: EXISTS handles NULLs better than IN
- Readability: IN more intuitive for simple value lists
Subquery Locations
flowchart TB
A[Subquery Placement] --> B[SELECT Clause]
A --> C[FROM Clause]
A --> D[WHERE Clause]
A --> E[HAVING Clause]
B --> B1[Scalar subquery]
B --> B2[Calculated columns]
C --> C1[Derived table]
C --> C2[Inline view]
D --> D1[Filter conditions]
D --> D2[EXISTS, IN, ANY, ALL]
E --> E1[Group filtering]
E --> E2[Aggregate conditions]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
style D fill:#9C27B0
style E fill:#F44336
Key Points:
- SELECT: Add calculated columns using scalar subqueries
- FROM: Create derived tables for complex queries
- WHERE: Filter rows based on subquery results
- HAVING: Filter grouped results with aggregates
ANY and ALL Operators
graph TB
A[Comparison Operators] --> B[ANY/SOME]
A --> C[ALL]
B --> B1[True if ANY match]
B --> B2[salary > ANY subquery]
B --> B3[At least one condition]
C --> C1[True if ALL match]
C --> C2[salary > ALL subquery]
C --> C3[Every condition]
style A fill:#2196F3
style B fill:#4CAF50
style C fill:#FF9800
Key Points:
- ANY: Condition true if it matches at least one value
- ALL: Condition true only if it matches all values
- SOME: Synonym for ANY
- Use Cases: ANY for "greater than minimum", ALL for "greater than maximum"
Performance Optimization
flowchart LR
A[Subquery] --> B{Can Convert?}
B -->|Yes| C[JOIN]
B -->|No| D[Optimize Subquery]
C --> E[Better Performance]
D --> F[Add Indexes]
D --> G[Limit Results]
style A fill:#2196F3
style B fill:#FF9800
style C fill:#4CAF50
style D fill:#9C27B0
Key Points:
- JOIN Alternative: Often faster than subqueries
- Indexes: Index columns used in subquery conditions
- LIMIT: Restrict subquery results when possible
- EXISTS vs IN: Use EXISTS for large datasets
- Avoid Correlated: Convert to JOIN when possible
Code Examples
Scalar Subquery
-- Single value in SELECT
SELECT
employee_name,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary,
salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees;
Correlated Subquery
-- Compare with department average
SELECT
e1.employee_name,
e1.salary,
e1.department_id
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
EXISTS
-- Find customers with orders
SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Find customers without orders
SELECT customer_id, customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
IN Operator
-- Find employees in specific departments
SELECT employee_name, department_id
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location = 'New York'
);
-- NOT IN for exclusion
SELECT product_name
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
);
ANY and ALL
-- Salary greater than ANY in department 10
SELECT employee_name, salary
FROM employees
WHERE salary > ANY (
SELECT salary
FROM employees
WHERE department_id = 10
);
-- Salary greater than ALL in department 10
SELECT employee_name, salary
FROM employees
WHERE salary > ALL (
SELECT salary
FROM employees
WHERE department_id = 10
);
Subquery in FROM
-- Derived table
SELECT dept_name, avg_salary
FROM (
SELECT
d.department_name AS dept_name,
AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
) AS dept_averages
WHERE avg_salary > 50000;
Best Practices
- Use EXISTS over IN: For large datasets and better NULL handling
- Convert to JOINs: When possible for better performance
- Index Subquery Columns: Add indexes on columns used in WHERE
- Limit Correlated: Avoid when JOIN alternative exists
- Alias Subqueries: Always alias derived tables in FROM clause
- Test Performance: Compare subquery vs JOIN execution plans
Comments
Share a question, correction, or practical insight about this article.
Checking login status...
Loading approved comments...