Full Stack • Java • System Design • Cloud • AI Engineering

SQL Subqueries

Master SQL subqueries with Mermaid diagrams, practical examples covering scalar, correlated, EXISTS, IN, ANY, and ALL subqueries

A subquery (nested query) is a SELECT statement embedded within another SQL query. It provides a way to perform complex operations that would otherwise require multiple queries or joins.

Subquery Types

graph TB
    A[SQL Subqueries] --> B[Scalar Subquery]
    A --> C[Row Subquery]
    A --> D[Table Subquery]
    A --> E[Correlated Subquery]
    
    B --> B1[Single Value]
    B --> B2[One row, one column]
    
    C --> C1[Single Row]
    C --> C2[One row, multiple columns]
    
    D --> D1[Multiple Rows]
    D --> D2[Multiple columns]
    
    E --> E1[References Outer Query]
    E --> E2[Executes per row]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336

Key Points:

  • Scalar: Returns single value, used anywhere a value is expected
  • Row: Returns single row with multiple columns
  • Table: Returns multiple rows and columns, used with IN, EXISTS, FROM
  • Correlated: References outer query columns, executes for each outer row
  • Non-Correlated: Independent of outer query, executes once

Non-Correlated Subquery Execution

sequenceDiagram
    participant Outer as Outer Query
    participant Sub as Subquery
    participant Result
    
    Outer->>Sub: Execute subquery first
    Sub->>Sub: Process independently
    Sub->>Result: Return result set
    Result->>Outer: Use result in outer query
    Outer->>Outer: Filter/process with result
    
    Note over Sub: Executes only once

Key Points:

  • Independent: Subquery doesn't reference outer query columns
  • Single Execution: Runs once before outer query
  • Result Caching: Result stored and reused
  • Performance: Generally faster than correlated subqueries

Correlated Subquery Execution

sequenceDiagram
    participant Outer as Outer Query
    participant Row as Current Row
    participant Sub as Subquery
    
    loop For each row
        Outer->>Row: Process row 1
        Row->>Sub: Execute with row 1 values
        Sub->>Row: Return result
        Row->>Outer: Include/exclude row
        
        Outer->>Row: Process row 2
        Row->>Sub: Execute with row 2 values
        Sub->>Row: Return result
        Row->>Outer: Include/exclude row
    end
    
    Note over Sub: Executes once per row

Key Points:

  • Row-Dependent: References outer query columns
  • Multiple Executions: Runs once for each outer query row
  • Dynamic: Different result for each row
  • Performance: Can be slower for large datasets
  • Use Case: Compare row values with aggregates

EXISTS vs IN Comparison

graph LR
    A[Existence Check] --> B[EXISTS]
    A --> C[IN]
    
    B --> B1[Returns TRUE/FALSE]
    B --> B2[Stops at first match]
    B --> B3[Better for large sets]
    
    C --> C1[Compares values]
    C --> C2[Processes all results]
    C --> C3[Better for small sets]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800

Key Points:

  • EXISTS: Checks if any rows exist, stops at first match
  • IN: Compares value against list, processes entire result
  • Performance: EXISTS faster for large datasets
  • NULL Handling: EXISTS handles NULLs better than IN
  • Readability: IN more intuitive for simple value lists

Subquery Locations

flowchart TB
    A[Subquery Placement] --> B[SELECT Clause]
    A --> C[FROM Clause]
    A --> D[WHERE Clause]
    A --> E[HAVING Clause]
    
    B --> B1[Scalar subquery]
    B --> B2[Calculated columns]
    
    C --> C1[Derived table]
    C --> C2[Inline view]
    
    D --> D1[Filter conditions]
    D --> D2[EXISTS, IN, ANY, ALL]
    
    E --> E1[Group filtering]
    E --> E2[Aggregate conditions]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336

Key Points:

  • SELECT: Add calculated columns using scalar subqueries
  • FROM: Create derived tables for complex queries
  • WHERE: Filter rows based on subquery results
  • HAVING: Filter grouped results with aggregates

ANY and ALL Operators

graph TB
    A[Comparison Operators] --> B[ANY/SOME]
    A --> C[ALL]
    
    B --> B1[True if ANY match]
    B --> B2[salary > ANY subquery]
    B --> B3[At least one condition]
    
    C --> C1[True if ALL match]
    C --> C2[salary > ALL subquery]
    C --> C3[Every condition]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800

Key Points:

  • ANY: Condition true if it matches at least one value
  • ALL: Condition true only if it matches all values
  • SOME: Synonym for ANY
  • Use Cases: ANY for "greater than minimum", ALL for "greater than maximum"

Performance Optimization

flowchart LR
    A[Subquery] --> B{Can Convert?}
    B -->|Yes| C[JOIN]
    B -->|No| D[Optimize Subquery]
    
    C --> E[Better Performance]
    D --> F[Add Indexes]
    D --> G[Limit Results]
    
    style A fill:#2196F3
    style B fill:#FF9800
    style C fill:#4CAF50
    style D fill:#9C27B0

Key Points:

  • JOIN Alternative: Often faster than subqueries
  • Indexes: Index columns used in subquery conditions
  • LIMIT: Restrict subquery results when possible
  • EXISTS vs IN: Use EXISTS for large datasets
  • Avoid Correlated: Convert to JOIN when possible

Code Examples

Scalar Subquery

-- Single value in SELECT
SELECT 
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS avg_salary,
    salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees;

Correlated Subquery

-- Compare with department average
SELECT 
    e1.employee_name,
    e1.salary,
    e1.department_id
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

EXISTS

-- Find customers with orders
SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- Find customers without orders
SELECT customer_id, customer_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

IN Operator

-- Find employees in specific departments
SELECT employee_name, department_id
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE location = 'New York'
);

-- NOT IN for exclusion
SELECT product_name
FROM products
WHERE product_id NOT IN (
    SELECT DISTINCT product_id
    FROM order_items
);

ANY and ALL

-- Salary greater than ANY in department 10
SELECT employee_name, salary
FROM employees
WHERE salary > ANY (
    SELECT salary
    FROM employees
    WHERE department_id = 10
);

-- Salary greater than ALL in department 10
SELECT employee_name, salary
FROM employees
WHERE salary > ALL (
    SELECT salary
    FROM employees
    WHERE department_id = 10
);

Subquery in FROM

-- Derived table
SELECT dept_name, avg_salary
FROM (
    SELECT 
        d.department_name AS dept_name,
        AVG(e.salary) AS avg_salary
    FROM employees e
    JOIN departments d ON e.department_id = d.department_id
    GROUP BY d.department_name
) AS dept_averages
WHERE avg_salary > 50000;

Best Practices

  1. Use EXISTS over IN: For large datasets and better NULL handling
  2. Convert to JOINs: When possible for better performance
  3. Index Subquery Columns: Add indexes on columns used in WHERE
  4. Limit Correlated: Avoid when JOIN alternative exists
  5. Alias Subqueries: Always alias derived tables in FROM clause
  6. Test Performance: Compare subquery vs JOIN execution plans

Loading likes...

Comments

Share a question, correction, or practical insight about this article.

Loading approved comments...