Full Stack • Java • System Design • Cloud • AI Engineering

Advanced SQL Topics

Master advanced SQL with Mermaid diagrams covering CTEs, recursive queries, PIVOT, dynamic SQL, optimization, and partitioning

Advanced SQL techniques enable complex data manipulation, hierarchical queries, performance optimization, and dynamic query generation for sophisticated database applications.

Advanced SQL Topics Overview

graph TB
    A[Advanced SQL] --> B[CTEs]
    A --> C[Recursive Queries]
    A --> D[PIVOT/UNPIVOT]
    A --> E[Dynamic SQL]
    A --> F[Optimization]
    A --> G[Partitioning]
    
    B --> B1[Readable queries]
    B --> B2[Reusable subqueries]
    
    C --> C1[Hierarchical data]
    C --> C2[Tree traversal]
    
    D --> D1[Rows to columns]
    D --> D2[Columns to rows]
    
    E --> E1[Runtime SQL generation]
    E --> E2[Flexible queries]
    
    F --> F1[Execution plans]
    F --> F2[Index optimization]
    
    G --> G1[Large table management]
    G --> G2[Query performance]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#F44336
    style F fill:#00BCD4
    style G fill:#E91E63

Key Points:

  • CTEs: Common Table Expressions for readable, reusable queries
  • Recursive: Handle hierarchical data (org charts, categories)
  • PIVOT: Transform rows to columns for reporting
  • Dynamic SQL: Generate SQL at runtime for flexibility
  • Optimization: Improve query performance with execution plans
  • Partitioning: Manage large tables efficiently

CTE (Common Table Expression) Flow

sequenceDiagram
    participant Query
    participant CTE as CTE Definition
    participant Main as Main Query
    participant Result
    
    Query->>CTE: WITH cte_name AS (SELECT...)
    CTE->>CTE: Execute subquery
    CTE->>Main: Make result available
    Main->>Main: Reference CTE
    Main->>Result: Return final result
    
    Note over CTE: Temporary named result set

Key Points:

  • Readability: Named subqueries improve code clarity
  • Reusability: Reference CTE multiple times in query
  • Scope: CTE exists only for single query execution
  • Performance: Similar to subquery, not materialized
  • Multiple CTEs: Can define multiple CTEs in one query

Recursive CTE Structure

flowchart TB
    A[Recursive CTE] --> B[Anchor Member]
    A --> C[UNION ALL]
    A --> D[Recursive Member]
    
    B --> E[Initial rows]
    B --> F[Base case]
    
    D --> G[References CTE itself]
    D --> H[Adds more rows]
    
    C --> I[Combines results]
    
    I --> J{More rows?}
    J -->|Yes| D
    J -->|No| K[Final Result]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style D fill:#FF9800
    style K fill:#9C27B0

Key Points:

  • Anchor: Initial query, base case (non-recursive)
  • Recursive: References CTE itself, adds rows iteratively
  • UNION ALL: Combines anchor and recursive results
  • Termination: Stops when recursive member returns no rows
  • Use Cases: Hierarchies, trees, graphs, sequences

PIVOT Transformation

graph LR
    A[Original Table] --> B[PIVOT Operation]
    B --> C[Transformed Table]
    
    A --> A1[Rows: Product, Month, Sales]
    C --> C1[Columns: Product, Jan, Feb, Mar]
    
    style A fill:#4CAF50
    style B fill:#FF9800
    style C fill:#2196F3

Key Points:

  • Rows to Columns: Transform row values into column headers
  • Aggregation: Requires aggregate function (SUM, COUNT, AVG)
  • Fixed Columns: Column names must be known in advance
  • Dynamic PIVOT: Use dynamic SQL for unknown columns
  • Reporting: Ideal for cross-tabulation reports

Dynamic SQL Execution

sequenceDiagram
    participant App as Application
    participant Builder as SQL Builder
    participant DB as Database
    participant Result
    
    App->>Builder: Provide parameters
    Builder->>Builder: Construct SQL string
    Builder->>DB: EXECUTE/PREPARE statement
    DB->>DB: Parse and compile
    DB->>DB: Execute query
    DB->>Result: Return data
    Result->>App: Final result

Key Points:

  • Runtime Generation: Build SQL statements dynamically
  • Flexibility: Adapt queries based on conditions
  • Parameterization: Use parameters to prevent SQL injection
  • Performance: Compiled each time, no plan caching
  • Use Cases: Dynamic filters, column selection, table names

Query Optimization Strategies

flowchart TB
    A[Query Optimization] --> B[Index Usage]
    A --> C[Query Rewriting]
    A --> D[Execution Plan]
    A --> E[Statistics]
    
    B --> B1[Create appropriate indexes]
    B --> B2[Covering indexes]
    
    C --> C1[Avoid SELECT *]
    C --> C2[Use EXISTS vs IN]
    C --> C3[Avoid functions on columns]
    
    D --> D1[Analyze with EXPLAIN]
    D --> D2[Identify bottlenecks]
    
    E --> E1[Update statistics]
    E --> E2[Help optimizer]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0
    style E fill:#00BCD4

Key Points:

  • Indexes: Create on WHERE, JOIN, ORDER BY columns
  • Avoid Functions: Don't use functions on indexed columns in WHERE
  • EXISTS vs IN: EXISTS faster for large datasets
  • SELECT Specific: Avoid SELECT *, specify needed columns
  • EXPLAIN: Analyze execution plans to find issues

Execution Plan Analysis

graph TB
    A[Query] --> B[Parser]
    B --> C[Optimizer]
    C --> D[Execution Plan]
    
    D --> E[Table Scan]
    D --> F[Index Seek]
    D --> G[Index Scan]
    D --> H[Nested Loop]
    D --> I[Hash Join]
    
    E --> E1[Slow - reads all rows]
    F --> F1[Fast - direct lookup]
    G --> G1[Moderate - scans index]
    H --> H1[Good for small sets]
    I --> I1[Good for large sets]
    
    style E fill:#F44336
    style F fill:#4CAF50
    style G fill:#FF9800

Key Points:

  • Table Scan: Reads entire table, slowest
  • Index Seek: Direct lookup using index, fastest
  • Index Scan: Scans entire index, moderate speed
  • Join Types: Nested loop, hash join, merge join
  • Cost: Optimizer estimates cost, chooses lowest

Table Partitioning

graph TB
    A[Large Table] --> B[Partition by Range]
    A --> C[Partition by List]
    A --> D[Partition by Hash]
    
    B --> B1[Date ranges]
    B --> B2[2020, 2021, 2022]
    
    C --> C1[Discrete values]
    C --> C2[Region: US, EU, ASIA]
    
    D --> D1[Hash function]
    D --> D2[Even distribution]
    
    style A fill:#2196F3
    style B fill:#4CAF50
    style C fill:#FF9800
    style D fill:#9C27B0

Key Points:

  • Range: Partition by value ranges (dates, IDs)
  • List: Partition by discrete values (regions, categories)
  • Hash: Even distribution using hash function
  • Benefits: Faster queries, easier maintenance, parallel processing
  • Partition Pruning: Query only relevant partitions

Temporary Tables vs CTEs

graph LR
    A[Temporary Storage] --> B[Temporary Tables]
    A --> C[CTEs]
    
    B --> B1[Physical storage]
    B --> B2[Can have indexes]
    B --> B3[Multiple queries]
    
    C --> C1[Logical view]
    C --> C2[No indexes]
    C --> C3[Single query only]
    
    style B fill:#4CAF50
    style C fill:#FF9800

Key Points:

  • Temp Tables: Physical storage, persist across statements
  • CTEs: Logical, exist only for single query
  • Indexes: Temp tables can have indexes, CTEs cannot
  • Performance: Temp tables better for large intermediate results
  • Simplicity: CTEs better for readability and simple cases

Code Examples

Common Table Expression (CTE)

-- Simple CTE
WITH high_value_customers AS (
    SELECT 
        customer_id,
        SUM(total_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    HAVING SUM(total_amount) > 10000
)
SELECT 
    c.customer_name,
    hvc.total_spent
FROM high_value_customers hvc
JOIN customers c ON hvc.customer_id = c.customer_id;

-- Multiple CTEs
WITH 
monthly_sales AS (
    SELECT 
        YEAR(order_date) AS year,
        MONTH(order_date) AS month,
        SUM(total_amount) AS revenue
    FROM orders
    GROUP BY YEAR(order_date), MONTH(order_date)
),
avg_monthly AS (
    SELECT AVG(revenue) AS avg_revenue
    FROM monthly_sales
)
SELECT 
    ms.year,
    ms.month,
    ms.revenue,
    am.avg_revenue,
    ms.revenue - am.avg_revenue AS difference
FROM monthly_sales ms
CROSS JOIN avg_monthly am;

Recursive CTE

-- Employee hierarchy
WITH RECURSIVE employee_hierarchy AS (
    -- Anchor: Top-level employees
    SELECT 
        employee_id,
        employee_name,
        manager_id,
        1 AS level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: Add subordinates
    SELECT 
        e.employee_id,
        e.employee_name,
        e.manager_id,
        eh.level + 1
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, employee_name;

-- Number sequence generator
WITH RECURSIVE numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT * FROM numbers;

PIVOT

-- SQL Server PIVOT
SELECT *
FROM (
    SELECT product_name, month, sales_amount
    FROM sales
) AS source_data
PIVOT (
    SUM(sales_amount)
    FOR month IN ([Jan], [Feb], [Mar], [Apr])
) AS pivot_table;

-- MySQL alternative (conditional aggregation)
SELECT 
    product_name,
    SUM(CASE WHEN month = 'Jan' THEN sales_amount ELSE 0 END) AS Jan,
    SUM(CASE WHEN month = 'Feb' THEN sales_amount ELSE 0 END) AS Feb,
    SUM(CASE WHEN month = 'Mar' THEN sales_amount ELSE 0 END) AS Mar
FROM sales
GROUP BY product_name;

Dynamic SQL

-- SQL Server
DECLARE @sql NVARCHAR(MAX);
DECLARE @table_name NVARCHAR(100) = 'customers';
DECLARE @min_amount DECIMAL(10,2) = 1000;

SET @sql = N'SELECT * FROM ' + QUOTENAME(@table_name) + 
           N' WHERE total_spent > @amount';

EXEC sp_executesql @sql, N'@amount DECIMAL(10,2)', @amount = @min_amount;

-- MySQL
SET @table_name = 'customers';
SET @sql = CONCAT('SELECT * FROM ', @table_name, ' WHERE total_spent > 1000');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Query Optimization

-- Use EXISTS instead of IN
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

-- Avoid functions on indexed columns
-- Bad: WHERE YEAR(order_date) = 2024
-- Good:
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- Use covering index
CREATE INDEX idx_covering ON orders(customer_id, order_date, total_amount);
SELECT customer_id, order_date, total_amount
FROM orders
WHERE customer_id = 123;  -- Uses index only, no table access

Table Partitioning

-- Range partitioning (SQL Server)
CREATE PARTITION FUNCTION pf_orders_by_year(DATE)
AS RANGE RIGHT FOR VALUES ('2021-01-01', '2022-01-01', '2023-01-01');

CREATE PARTITION SCHEME ps_orders_by_year
AS PARTITION pf_orders_by_year ALL TO ([PRIMARY]);

CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
) ON ps_orders_by_year(order_date);

Best Practices

  1. Use CTEs for Readability: Break complex queries into logical parts
  2. Limit Recursive Depth: Add MAXRECURSION hint to prevent infinite loops
  3. Parameterize Dynamic SQL: Prevent SQL injection attacks
  4. Analyze Execution Plans: Use EXPLAIN to identify bottlenecks
  5. Index Strategically: Create indexes on frequently queried columns
  6. Partition Large Tables: Improve query performance and maintenance
  7. Update Statistics: Keep optimizer statistics current
  8. Test Performance: Compare different approaches with real data

Source: SQL Documentation
Last Updated: 2026-06-12

Loading likes...

Comments

Share a question, correction, or practical insight about this article.

Loading approved comments...