Subqueries: Queries Within Queries
A subquery (also called an inner query or nested query) is a query embedded inside another query. Subqueries let you break complex problems into smaller, logical steps: "Find customers who ordered products that cost more than the average price." Instead of calculating the average separately, you nest that calculation inside your main query. Subqueries can appear in SELECT, WHERE, FROM, and HAVING clauses, and mastering them unlocks elegant solutions to problems that would otherwise require multiple queries or complex joins. They're the bridge between basic SQL and truly advanced data analysis.
📚 What You'll Learn
- What subqueries are and when to use them
- How to write subqueries in WHERE clause
- How to use subqueries in SELECT clause
- How to use subqueries in FROM clause (derived tables)
- How to use IN, EXISTS, ANY, and ALL operators
- The difference between correlated and non-correlated subqueries
- How subqueries compare to JOINs
- Performance considerations for subqueries
- Common subquery patterns and use cases
1. What is a Subquery?
A subquery is a SELECT statement nested inside another SQL statement. The inner query executes first, and its results are used by the outer query.
Basic Structure
SELECT column
FROM table
WHERE column operator (
SELECT column ← Subquery (inner query)
FROM table
WHERE condition
);
← Main query (outer query)
Simple Example
-- Find products more expensive than the average
SELECT
product_id,
name,
price
FROM products
WHERE price > (
SELECT AVG(price) -- Subquery calculates average
FROM products
);
-- Process:
-- 1. Subquery runs: SELECT AVG(price) FROM products → returns 49.99
-- 2. Main query runs: WHERE price > 49.99
2. Subqueries in WHERE Clause
Single-Value Subqueries
-- Find customers who placed orders today
SELECT
customer_id,
name,
email
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE order_date = CURRENT_DATE
);
-- Subquery returns a list of customer IDs
-- Main query filters customers by that list
Comparison Operators
-- Find products in categories with more than 5 products
SELECT
name,
category_id
FROM products
WHERE category_id IN (
SELECT category_id
FROM products
GROUP BY category_id
HAVING COUNT(*) > 5
);
-- Subquery finds "popular" categories
-- Main query filters products to those categories
3. Using IN with Subqueries
IN operator checks if a value matches any value in a list returned by the subquery. It's the most common way to use subqueries in WHERE.
Basic IN Pattern
-- Find customers who have placed orders
SELECT
customer_id,
name,
email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
);
-- Returns customers who appear in the orders table
NOT IN Pattern
-- Find customers who have NEVER placed an order
SELECT
customer_id,
name,
email
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id
FROM orders
WHERE customer_id IS NOT NULL -- Important: avoid NULL issues
);
-- Returns customers who don't appear in orders table
⚠️ NULL Warning: NOT IN can behave unexpectedly with NULLs. If the subquery returns any NULL values, NOT IN returns no rows! Always filter out NULLs or use NOT EXISTS instead.
4. EXISTS - Testing for Existence
EXISTS returns TRUE if the subquery returns any rows, FALSE otherwise. It's often more efficient than IN because it stops as soon as it finds a match.
Basic EXISTS
-- Find customers who have placed orders (using EXISTS)
SELECT
c.customer_id,
c.name,
c.email
FROM customers c
WHERE EXISTS (
SELECT 1 -- Can use 1, *, or any value
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- EXISTS just checks if rows exist, doesn't return values
-- More efficient than IN for large datasets
NOT EXISTS
-- Find customers who have NEVER placed an order (safer than NOT IN)
SELECT
c.customer_id,
c.name,
c.email
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- NOT EXISTS handles NULLs properly, unlike NOT IN
EXISTS vs IN
-- These are equivalent, but EXISTS is often faster:
-- Using IN
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
);
-- Using EXISTS (preferred for large datasets)
SELECT name FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
5. Correlated Subqueries
Correlated subqueries reference columns from the outer query. They execute once for each row processed by the outer query. Non-correlated subqueries execute once and return a result used by the outer query.
Non-Correlated (Independent)
-- Non-correlated: Subquery runs ONCE
SELECT name, price
FROM products
WHERE price > (
SELECT AVG(price) FROM products -- Runs once, returns one value
);
-- Process:
-- 1. Subquery executes: AVG(price) = 49.99
-- 2. Main query uses that value
Correlated (Dependent)
-- Correlated: Subquery runs for EACH row
SELECT
p1.product_id,
p1.name,
p1.price,
p1.category_id
FROM products p1
WHERE p1.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category_id = p1.category_id -- References outer query!
);
-- Process:
-- 1. Main query looks at product #1 (category_id = 5)
-- 2. Subquery calculates AVG(price) for category 5
-- 3. Compare p1.price to that average
-- 4. Repeat for each product
-- Finds products more expensive than their category average
Another Correlated Example
-- Find each customer's most recent order
SELECT
c.name,
o.order_id,
o.order_date,
o.total
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date = (
SELECT MAX(order_date)
FROM orders o2
WHERE o2.customer_id = c.customer_id -- Correlated
);
6. Subqueries in SELECT Clause
Subqueries in SELECT must return a single value (scalar subquery). They execute once per row in the result set.
Basic Pattern
-- Show each customer with their order count
SELECT
c.customer_id,
c.name,
(
SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id
) AS order_count
FROM customers c
ORDER BY order_count DESC;
-- Subquery runs once per customer
Multiple Scalar Subqueries
-- Customer summary with multiple calculations
SELECT
c.customer_id,
c.name,
(
SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id
) AS total_orders,
(
SELECT SUM(total)
FROM orders o
WHERE o.customer_id = c.customer_id
) AS total_spent,
(
SELECT MAX(order_date)
FROM orders o
WHERE o.customer_id = c.customer_id
) AS last_order_date
FROM customers c;
⚠️ Performance Note: Multiple subqueries in SELECT can be slow because each executes for every row. Consider using JOINs with GROUP BY instead for better performance.
7. Subqueries in FROM (Derived Tables)
A subquery in FROM creates a temporary result set (derived table) that you can query like a regular table. You must give it an alias.
Basic Derived Table
-- Find categories with average price > $50
SELECT
category_name,
avg_price,
product_count
FROM (
SELECT
c.name AS category_name,
AVG(p.price) AS avg_price,
COUNT(*) AS product_count
FROM categories c
INNER JOIN products p ON c.category_id = p.category_id
GROUP BY c.name
) AS category_stats -- Must have alias!
WHERE avg_price > 50
ORDER BY avg_price DESC;
-- Process:
-- 1. Subquery creates temporary result set
-- 2. Main query filters and orders that result
Complex Derived Table
-- Find customers whose average order is above the global average
SELECT
customer_name,
avg_order_value,
order_count
FROM (
SELECT
c.name AS customer_name,
AVG(o.total) AS avg_order_value,
COUNT(o.order_id) AS order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.name
) AS customer_stats
WHERE avg_order_value > (
SELECT AVG(total) FROM orders
)
ORDER BY avg_order_value DESC;
8. ANY and ALL Operators
ANY (or SOME)
-- Find products more expensive than ANY product in category 5
SELECT
product_id,
name,
price
FROM products
WHERE price > ANY (
SELECT price
FROM products
WHERE category_id = 5
);
-- Returns TRUE if price is greater than at least one value
-- Equivalent to: price > MIN(subquery)
ALL
-- Find products more expensive than ALL products in category 5
SELECT
product_id,
name,
price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category_id = 5
);
-- Returns TRUE if price is greater than every value
-- Equivalent to: price > MAX(subquery)
Equivalents
-- These pairs are equivalent:
price > ANY (subquery) = price > (SELECT MIN(...) FROM subquery)
price > ALL (subquery) = price > (SELECT MAX(...) FROM subquery)
price < ANY (subquery) = price < (SELECT MAX(...) FROM subquery)
price < ALL (subquery) = price < (SELECT MIN(...) FROM subquery)
9. Common Subquery Patterns
Pattern 1: Top N per Group
-- Find the most expensive product in each category
SELECT
p1.category_id,
p1.name,
p1.price
FROM products p1
WHERE p1.price = (
SELECT MAX(p2.price)
FROM products p2
WHERE p2.category_id = p1.category_id
);
Pattern 2: Difference from Average
-- Show how much each product differs from category average
SELECT
p1.name,
p1.price,
(
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category_id = p1.category_id
) AS category_avg_price,
p1.price - (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category_id = p1.category_id
) AS price_difference
FROM products p1
ORDER BY price_difference DESC;
Pattern 3: Ranking Without Window Functions
-- Rank customers by total spending
SELECT
c.name,
(
SELECT SUM(o.total)
FROM orders o
WHERE o.customer_id = c.customer_id
) AS total_spent,
(
SELECT COUNT(*) + 1
FROM customers c2
WHERE (
SELECT SUM(o2.total)
FROM orders o2
WHERE o2.customer_id = c2.customer_id
) > (
SELECT SUM(o3.total)
FROM orders o3
WHERE o3.customer_id = c.customer_id
)
) AS rank
FROM customers c
ORDER BY total_spent DESC;
10. When to Use Subqueries vs JOINs
| Scenario | Use Subquery | Use JOIN |
|---|---|---|
| Filtering based on aggregate | ✅ Good | ❌ Complex |
| Checking existence (IN/EXISTS) | ✅ Good | ✅ Also good |
| Need columns from both tables | ❌ Awkward | ✅ Better |
| Multiple aggregations per row | ⚠️ Can be slow | ✅ Usually faster |
| Readability matters | ✅ Sometimes clearer | ✅ Sometimes clearer |
Same Query: Subquery vs JOIN
-- Using subquery
SELECT name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE order_date >= '2024-01-01'
);
-- Using JOIN (often faster)
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';
11. 🎯 Practice Exercises
Exercise 1: Basic Subqueries
Given this schema:
products (product_id, name, price, category_id, stock)
categories (category_id, name)
orders (order_id, customer_id, order_date, total)
order_items (item_id, order_id, product_id, quantity)
Tasks:
- Find products priced above the average price
- Find products that have never been ordered (using NOT IN)
- Find products that have never been ordered (using NOT EXISTS)
Show Solution
-- Task 1: Products above average price
SELECT
product_id,
name,
price,
price - (SELECT AVG(price) FROM products) AS above_avg
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
)
ORDER BY price DESC;
-- Task 2: Never ordered (NOT IN)
SELECT
product_id,
name,
price,
stock
FROM products
WHERE product_id NOT IN (
SELECT product_id
FROM order_items
WHERE product_id IS NOT NULL
)
ORDER BY price DESC;
-- Task 3: Never ordered (NOT EXISTS - preferred)
SELECT
p.product_id,
p.name,
p.price,
p.stock
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
)
ORDER BY p.price DESC;
Key Concepts:
- ✅ Non-correlated subquery for average (runs once)
- ✅ NOT IN requires NULL check
- ✅ NOT EXISTS is safer and often faster
- ✅ Correlated subquery in EXISTS
Exercise 2: Correlated Subqueries
Given this schema:
employees (employee_id, name, salary, department_id, hire_date)
departments (department_id, name, location)
Tasks:
- Find employees earning more than their department's average
- Find the highest-paid employee in each department
- Show each employee with their department's employee count
Show Solution
-- Task 1: Above department average
SELECT
e1.employee_id,
e1.name,
e1.salary,
e1.department_id,
(
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
) AS dept_avg_salary
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
)
ORDER BY e1.department_id, e1.salary DESC;
-- Task 2: Highest paid per department
SELECT
e1.employee_id,
e1.name,
e1.salary,
d.name AS department_name
FROM employees e1
INNER JOIN departments d ON e1.department_id = d.department_id
WHERE e1.salary = (
SELECT MAX(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
)
ORDER BY e1.salary DESC;
-- Task 3: Employee count per department
SELECT
e.employee_id,
e.name,
e.salary,
d.name AS department_name,
(
SELECT COUNT(*)
FROM employees e2
WHERE e2.department_id = e.department_id
) AS dept_employee_count
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
ORDER BY dept_employee_count DESC, e.name;
Correlated Subquery Techniques:
- ✅ Subquery references outer table (e1.department_id)
- ✅ Executes once per outer row
- ✅ Useful for per-group comparisons
- ✅ Can be used in SELECT and WHERE
Exercise 3: Derived Tables
Given this schema:
customers (customer_id, name, country, registration_date)
orders (order_id, customer_id, order_date, total, status)
Tasks:
- Find countries with average customer spending over $500
- Find customers whose total spending is in the top 10%
- Create a summary showing customer tier (based on spending)
Show Solution
-- Task 1: Countries with high average spending
SELECT
country,
customer_count,
avg_spending
FROM (
SELECT
c.country,
COUNT(DISTINCT c.customer_id) AS customer_count,
AVG(o.total) AS avg_spending
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.country
) AS country_stats
WHERE avg_spending > 500
ORDER BY avg_spending DESC;
-- Task 2: Top 10% spenders
SELECT
customer_id,
name,
total_spent
FROM (
SELECT
c.customer_id,
c.name,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name
) AS customer_spending
WHERE total_spent >= (
SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY total_spent)
FROM (
SELECT SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
) AS spending_dist
)
ORDER BY total_spent DESC;
-- Task 3: Customer tiers
SELECT
customer_id,
name,
total_spent,
CASE
WHEN total_spent >= 5000 THEN 'Platinum'
WHEN total_spent >= 2000 THEN 'Gold'
WHEN total_spent >= 500 THEN 'Silver'
ELSE 'Bronze'
END AS customer_tier
FROM (
SELECT
c.customer_id,
c.name,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
AND o.status = 'completed'
GROUP BY c.customer_id, c.name
) AS customer_totals
ORDER BY total_spent DESC;
Derived Table Patterns:
- ✅ Subquery in FROM creates temporary table
- ✅ Must have alias (country_stats, customer_spending)
- ✅ Can filter/sort derived results
- ✅ Useful for multi-step calculations
🔥 Challenge: Complex Analysis with Multiple Subqueries
You have an e-commerce database:
products (product_id, name, price, category_id)
categories (category_id, name)
orders (order_id, customer_id, order_date, status)
order_items (item_id, order_id, product_id, quantity, price)
reviews (review_id, product_id, customer_id, rating, review_date)
Challenge: Create a product performance report showing: products with above-average ratings, sales metrics, comparison to category average, and identify top performers.
Show Solution
-- Comprehensive product performance analysis
SELECT
p.product_id,
p.name AS product_name,
cat.name AS category_name,
p.price,
-- Review metrics
(
SELECT COUNT(*)
FROM reviews r
WHERE r.product_id = p.product_id
) AS review_count,
(
SELECT AVG(rating)
FROM reviews r
WHERE r.product_id = p.product_id
) AS avg_rating,
-- Sales metrics
(
SELECT COUNT(DISTINCT oi.order_id)
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE oi.product_id = p.product_id
AND o.status = 'completed'
) AS orders_containing_product,
(
SELECT SUM(oi.quantity)
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE oi.product_id = p.product_id
AND o.status = 'completed'
) AS total_units_sold,
-- Category comparison
(
SELECT AVG(price)
FROM products p2
WHERE p2.category_id = p.category_id
) AS category_avg_price,
p.price - (
SELECT AVG(price)
FROM products p2
WHERE p2.category_id = p.category_id
) AS price_vs_category_avg,
-- Performance indicator
CASE
WHEN (
SELECT AVG(rating)
FROM reviews r
WHERE r.product_id = p.product_id
) >= 4.5
AND (
SELECT SUM(oi.quantity)
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE oi.product_id = p.product_id
AND o.status = 'completed'
) > (
SELECT AVG(units_sold)
FROM (
SELECT SUM(oi2.quantity) AS units_sold
FROM order_items oi2
INNER JOIN orders o2 ON oi2.order_id = o2.order_id
WHERE o2.status = 'completed'
GROUP BY oi2.product_id
) AS product_sales
) THEN 'Top Performer'
WHEN (
SELECT AVG(rating)
FROM reviews r
WHERE r.product_id = p.product_id
) >= 4.0 THEN 'Strong'
ELSE 'Needs Improvement'
END AS performance_category
FROM products p
INNER JOIN categories cat ON p.category_id = cat.category_id
WHERE EXISTS (
-- Only products with reviews
SELECT 1
FROM reviews r
WHERE r.product_id = p.product_id
)
AND (
SELECT AVG(rating)
FROM reviews r
WHERE r.product_id = p.product_id
) > (
SELECT AVG(rating)
FROM reviews
)
ORDER BY avg_rating DESC, total_units_sold DESC;
Advanced Subquery Techniques:
- ✅ Multiple correlated subqueries in SELECT
- ✅ Nested subqueries (subquery within subquery)
- ✅ EXISTS for filtering
- ✅ Subqueries in CASE expressions
- ✅ Derived tables for intermediate calculations
- 💡 Combines everything learned in this lesson!
Performance Note:
This query has many subqueries and could be slow. In production, consider: - Using JOINs with GROUP BY instead - Creating indexed views for frequently-used aggregations - Caching results if data doesn't change frequently
📝 Key Takeaways
- Subqueries are queries nested inside other queries
- Subqueries can appear in SELECT, WHERE, FROM, and HAVING
- Non-correlated subqueries run once, correlated run per row
- Use IN to check if value exists in subquery result
- Use EXISTS to test if subquery returns any rows (faster than IN)
- NOT EXISTS is safer than NOT IN (handles NULLs properly)
- Scalar subqueries in SELECT return single values
- Derived tables (subqueries in FROM) must have aliases
- ANY returns TRUE if condition is true for any value
- ALL returns TRUE if condition is true for all values
- Subqueries can be nested multiple levels deep
- JOINs are often faster than subqueries for large datasets
- Use subqueries for filtering and comparisons
- Use JOINs when you need columns from multiple tables
- Subqueries enable elegant solutions to complex problems