← Back to Lessons
ADVANCED ⏱ 45 minutes

UNION and Set Operations

Set operations combine the results of multiple SELECT statements into a single result set. While JOINs combine columns from different tables, set operations combine rows. UNION merges results from two queries, removing duplicates, while UNION ALL keeps all rows including duplicates. These operations are essential for queries like "show all customers from the USA plus all VIP customers from any country" or "combine historical data with current data from different tables." Understanding set operations unlocks powerful ways to merge data from different sources, time periods, or conditions into unified result sets.


📚 What You'll Learn

1. Set Operations vs JOINs

JOINs combine columns from different tables. Set operations combine rows from different queries. Think of UNION as stacking result sets vertically, while JOINs extend them horizontally.

Visual Comparison

JOIN (Horizontal - combines columns):
Table A        Table B         Result
+----+         +----+          +----+----+
| A1 |    +    | B1 |    →     | A1 | B1 |
| A2 |         | B2 |          | A2 | B2 |
+----+         +----+          +----+----+

UNION (Vertical - combines rows):
Query 1        Query 2         Result
+----+         +----+          +----+
| A1 |         | C1 |          | A1 |
| A2 |    +    | C2 |    →     | A2 |
+----+         +----+          | C1 |
                               | C2 |
                               +----+

2. UNION: Combining Result Sets

Basic UNION Syntax

SELECT column1, column2
FROM table1
WHERE condition

UNION

SELECT column1, column2
FROM table2
WHERE condition;

Simple Example

-- Combine customers from USA and VIP customers from any country
SELECT 
    customer_id,
    name,
    email,
    'USA Customer' AS category
FROM customers
WHERE country = 'USA'

UNION

SELECT 
    customer_id,
    name,
    email,
    'VIP Customer' AS category
FROM customers
WHERE loyalty_tier = 'VIP';

-- Result: All USA customers + all VIP customers (duplicates removed)

UNION Removes Duplicates

-- If a customer is both from USA and VIP:
-- UNION keeps only ONE row (duplicate removed)

Query 1 results:        Query 2 results:        UNION result:
+----+-------+          +----+-------+           +----+-------+
| 1  | Alice |          | 1  | Alice |           | 1  | Alice |
| 2  | Bob   |          | 3  | Carol |           | 2  | Bob   |
+----+-------+          +----+-------+           | 3  | Carol |
                                                 +----+-------+
Alice appears once because she's in both queries

3. UNION ALL: Keeping Duplicates

UNION ALL keeps all rows including duplicates. It's faster than UNION because it doesn't need to check for and remove duplicates.

UNION vs UNION ALL

-- UNION (removes duplicates, slower)
SELECT product_id, name FROM products WHERE category_id = 1
UNION
SELECT product_id, name FROM products WHERE price > 100;

-- UNION ALL (keeps duplicates, faster)
SELECT product_id, name FROM products WHERE category_id = 1
UNION ALL
SELECT product_id, name FROM products WHERE price > 100;

-- Use UNION ALL when:
-- 1. You know there are no duplicates
-- 2. You want to keep duplicates
-- 3. Performance matters and duplicates are acceptable

Performance Difference

-- UNION process:
-- 1. Execute first query
-- 2. Execute second query
-- 3. Combine results
-- 4. SORT to find duplicates (expensive!)
-- 5. Remove duplicates
-- 6. Return result

-- UNION ALL process:
-- 1. Execute first query
-- 2. Execute second query
-- 3. Combine results
-- 4. Return result (no sorting/deduplication)

✅ Best Practice: Use UNION ALL when possible. Only use UNION when you specifically need to remove duplicates.

4. Rules for Using UNION

Rule 1: Same Number of Columns

❌ ERROR: Column count mismatch

-- This will fail!
SELECT customer_id, name FROM customers
UNION
SELECT product_id, name, price FROM products;  -- 3 columns vs 2

-- Error: "The used SELECT statements have different number of columns"

✅ CORRECT: Same column count

-- Add NULL or placeholder for missing column
SELECT customer_id, name, NULL AS price FROM customers
UNION
SELECT product_id, name, price FROM products;

Rule 2: Compatible Data Types

-- ✅ Compatible types (both strings)
SELECT name FROM customers
UNION
SELECT name FROM suppliers;

-- ✅ Compatible types (both numbers)
SELECT customer_id FROM customers
UNION
SELECT supplier_id FROM suppliers;

-- ⚠️ Different types may work (automatic conversion)
SELECT customer_id FROM customers      -- INT
UNION
SELECT 'NEW' AS customer_id;          -- STRING (converts to string)

-- ❌ Incompatible types will fail
SELECT order_date FROM orders          -- DATE
UNION
SELECT customer_id FROM customers;    -- INT (cannot convert)

Rule 3: Column Names from First Query

-- First query determines column names
SELECT customer_id AS id, name AS full_name FROM customers
UNION
SELECT supplier_id, company_name FROM suppliers;

-- Result columns are named: id, full_name
-- (supplier_id and company_name are ignored)

5. ORDER BY and LIMIT with UNION

ORDER BY Must Be at the End

-- ✅ CORRECT: ORDER BY applies to entire result
SELECT name, 'Customer' AS type FROM customers
UNION
SELECT name, 'Supplier' AS type FROM suppliers
ORDER BY name;  -- Sorts combined result

-- ❌ WRONG: Can't ORDER BY individual queries
SELECT name FROM customers ORDER BY name  -- Error!
UNION
SELECT name FROM suppliers;

LIMIT with UNION

-- LIMIT applies to final result
SELECT name FROM customers
UNION
SELECT name FROM suppliers
LIMIT 10;  -- Returns top 10 from combined result

-- To limit individual queries, use subqueries
(SELECT name FROM customers ORDER BY registration_date DESC LIMIT 5)
UNION ALL
(SELECT name FROM suppliers ORDER BY created_date DESC LIMIT 5)
ORDER BY name;

6. Common UNION Patterns

Pattern 1: Combining Different Tables

-- Create unified contact list
SELECT 
    customer_id AS id,
    name,
    email,
    phone,
    'Customer' AS contact_type
FROM customers

UNION ALL

SELECT 
    supplier_id,
    company_name,
    email,
    phone,
    'Supplier'
FROM suppliers

UNION ALL

SELECT 
    employee_id,
    name,
    work_email,
    work_phone,
    'Employee'
FROM employees

ORDER BY name;

Pattern 2: Historical + Current Data

-- Combine archived orders with current orders
SELECT 
    order_id,
    customer_id,
    order_date,
    total,
    'Archived' AS status
FROM orders_archive
WHERE order_date >= '2020-01-01'

UNION ALL

SELECT 
    order_id,
    customer_id,
    order_date,
    total,
    'Current'
FROM orders_current

ORDER BY order_date DESC;

Pattern 3: Multiple Conditions

-- Find customers needing attention
SELECT 
    customer_id,
    name,
    email,
    'High Value - No Recent Orders' AS reason
FROM customers
WHERE total_lifetime_value > 10000
    AND last_order_date < DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)

UNION ALL

SELECT 
    customer_id,
    name,
    email,
    'New Customer - No Orders Yet'
FROM customers
WHERE registration_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
    AND total_orders = 0

UNION ALL

SELECT 
    customer_id,
    name,
    email,
    'Abandoned Cart'
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM cart_items ci
    WHERE ci.customer_id = c.customer_id
        AND ci.created_at < DATE_SUB(CURRENT_DATE, INTERVAL 24 HOUR)
)

ORDER BY reason, name;

Pattern 4: Report Generation

-- Monthly sales summary with totals
SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS period,
    'Sales' AS metric,
    SUM(total) AS value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')

UNION ALL

SELECT 
    DATE_FORMAT(order_date, '%Y-%m'),
    'Orders',
    COUNT(*)
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')

UNION ALL

SELECT 
    DATE_FORMAT(order_date, '%Y-%m'),
    'Avg Order Value',
    AVG(total)
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')

ORDER BY period, metric;

7. INTERSECT and EXCEPT (MySQL Alternatives)

⚠️ MySQL Note: MySQL doesn't natively support INTERSECT and EXCEPT. However, you can achieve the same results using other methods.

INTERSECT (Rows in Both Queries)

-- Standard SQL (not MySQL):
SELECT customer_id FROM customers_usa
INTERSECT
SELECT customer_id FROM vip_customers;

-- MySQL equivalent using INNER JOIN:
SELECT DISTINCT c1.customer_id
FROM customers_usa c1
INNER JOIN vip_customers c2 ON c1.customer_id = c2.customer_id;

-- MySQL equivalent using IN:
SELECT customer_id
FROM customers_usa
WHERE customer_id IN (
    SELECT customer_id
    FROM vip_customers
);

EXCEPT/MINUS (Rows in First, Not in Second)

-- Standard SQL (not MySQL):
SELECT customer_id FROM all_customers
EXCEPT
SELECT customer_id FROM vip_customers;

-- MySQL equivalent using NOT IN:
SELECT customer_id
FROM all_customers
WHERE customer_id NOT IN (
    SELECT customer_id
    FROM vip_customers
    WHERE customer_id IS NOT NULL
);

-- MySQL equivalent using NOT EXISTS (better):
SELECT c1.customer_id
FROM all_customers c1
WHERE NOT EXISTS (
    SELECT 1
    FROM vip_customers c2
    WHERE c2.customer_id = c1.customer_id
);

8. Performance Tips for UNION

Optimization Strategies

  • ✅ Use UNION ALL instead of UNION when duplicates don't matter
  • ✅ Filter data before UNION (in individual queries)
  • ✅ Index columns used in WHERE clauses of each query
  • ✅ Limit rows in individual queries when possible
  • ✅ Use identical column types to avoid conversions
  • ✅ Consider materialized views for frequently-used UNIONs

Filter Early

-- ❌ SLOW: Filter after UNION
SELECT name, email FROM customers
UNION
SELECT name, email FROM suppliers
WHERE country = 'USA';  -- Error! Can't do this

-- ✅ FAST: Filter before UNION
SELECT name, email FROM customers WHERE country = 'USA'
UNION
SELECT name, email FROM suppliers WHERE country = 'USA';

9. 🎯 Practice Exercises

Exercise 1: Basic UNION Operations

Given this schema:

customers (customer_id, name, email, country, registration_date)
employees (employee_id, name, email, department, hire_date)
suppliers (supplier_id, company_name, email, country, established_date)

Tasks:

  • Create a unified email list from all three tables
  • Find all names (customers, employees, suppliers) sorted alphabetically
  • Combine USA customers and suppliers, indicating the type
Show Solution
-- Task 1: Unified email list
SELECT 
    email,
    'Customer' AS source
FROM customers

UNION

SELECT 
    email,
    'Employee'
FROM employees

UNION

SELECT 
    email,
    'Supplier'
FROM suppliers

ORDER BY email;


-- Task 2: All names alphabetically (UNION removes duplicates)
SELECT name FROM customers
UNION
SELECT name FROM employees
UNION
SELECT company_name FROM suppliers
ORDER BY name;


-- Task 3: USA entities with type indicator
SELECT 
    customer_id AS id,
    name,
    email,
    country,
    'Customer' AS entity_type
FROM customers
WHERE country = 'USA'

UNION ALL

SELECT 
    supplier_id,
    company_name,
    email,
    country,
    'Supplier'
FROM suppliers
WHERE country = 'USA'

ORDER BY entity_type, name;

Key Concepts:

  • ✅ UNION removes duplicate emails automatically
  • ✅ Add type indicators to distinguish sources
  • ✅ UNION ALL faster when no duplicates expected
  • ✅ ORDER BY applies to entire combined result

Exercise 2: Historical Data Combination

Given this schema:

orders_2023 (order_id, customer_id, order_date, total, status)
orders_2024 (order_id, customer_id, order_date, total, status)
orders_archive (order_id, customer_id, order_date, total, status, archive_year)

Tasks:

  • Combine all orders from all tables for customer_id = 1001
  • Get total order count and revenue across all tables
  • Find top 10 orders by total across all years
Show Solution
-- Task 1: All orders for specific customer
SELECT 
    order_id,
    customer_id,
    order_date,
    total,
    status,
    2023 AS year
FROM orders_2023
WHERE customer_id = 1001

UNION ALL

SELECT 
    order_id,
    customer_id,
    order_date,
    total,
    status,
    2024
FROM orders_2024
WHERE customer_id = 1001

UNION ALL

SELECT 
    order_id,
    customer_id,
    order_date,
    total,
    status,
    archive_year
FROM orders_archive
WHERE customer_id = 1001

ORDER BY order_date DESC;


-- Task 2: Total counts and revenue
SELECT 
    'Total' AS metric,
    COUNT(*) AS order_count,
    SUM(total) AS total_revenue
FROM (
    SELECT order_id, total FROM orders_2023
    UNION ALL
    SELECT order_id, total FROM orders_2024
    UNION ALL
    SELECT order_id, total FROM orders_archive
) AS all_orders;


-- Task 3: Top 10 orders across all years
SELECT 
    order_id,
    customer_id,
    order_date,
    total
FROM orders_2023

UNION ALL

SELECT 
    order_id,
    customer_id,
    order_date,
    total
FROM orders_2024

UNION ALL

SELECT 
    order_id,
    customer_id,
    order_date,
    total
FROM orders_archive

ORDER BY total DESC
LIMIT 10;

Historical Data Patterns:

  • ✅ UNION ALL appropriate (no duplicates across years)
  • ✅ Add year column to track source
  • ✅ Wrap UNION in subquery for aggregation
  • ✅ ORDER BY and LIMIT work on combined result

🔥 Challenge: Multi-Source Report Generation

You need to create a comprehensive sales dashboard:

orders (order_id, customer_id, order_date, total, status)
refunds (refund_id, order_id, refund_date, amount, reason)
exchanges (exchange_id, order_id, exchange_date, old_product_id, new_product_id)
subscriptions (subscription_id, customer_id, start_date, monthly_amount, status)

Challenge: Create a unified financial activity report showing all transactions (orders, refunds, exchanges, subscriptions) with type indicators, amounts (positive/negative), and monthly summaries.

Show Solution
-- Comprehensive financial activity report
SELECT 
    order_id AS transaction_id,
    customer_id,
    order_date AS transaction_date,
    total AS amount,
    'Order' AS transaction_type,
    status AS transaction_status,
    DATE_FORMAT(order_date, '%Y-%m') AS month
FROM orders
WHERE order_date >= '2024-01-01'

UNION ALL

SELECT 
    refund_id,
    (SELECT customer_id FROM orders WHERE order_id = r.order_id),
    refund_date,
    -amount,  -- Negative for refunds
    'Refund',
    reason,
    DATE_FORMAT(refund_date, '%Y-%m')
FROM refunds r
WHERE refund_date >= '2024-01-01'

UNION ALL

SELECT 
    exchange_id,
    (SELECT customer_id FROM orders WHERE order_id = e.order_id),
    exchange_date,
    0,  -- No financial impact
    'Exchange',
    CONCAT('Old: ', old_product_id, ', New: ', new_product_id),
    DATE_FORMAT(exchange_date, '%Y-%m')
FROM exchanges e
WHERE exchange_date >= '2024-01-01'

UNION ALL

SELECT 
    subscription_id,
    customer_id,
    start_date,
    monthly_amount,
    'Subscription',
    status,
    DATE_FORMAT(start_date, '%Y-%m')
FROM subscriptions
WHERE start_date >= '2024-01-01'

ORDER BY transaction_date DESC, transaction_type;


-- Monthly summary from combined data
SELECT 
    month,
    transaction_type,
    COUNT(*) AS transaction_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
FROM (
    -- Same UNION query as above
    SELECT 
        order_id AS transaction_id,
        order_date AS transaction_date,
        total AS amount,
        'Order' AS transaction_type,
        DATE_FORMAT(order_date, '%Y-%m') AS month
    FROM orders
    WHERE order_date >= '2024-01-01'
    
    UNION ALL
    
    SELECT 
        refund_id,
        refund_date,
        -amount,
        'Refund',
        DATE_FORMAT(refund_date, '%Y-%m')
    FROM refunds
    WHERE refund_date >= '2024-01-01'
    
    UNION ALL
    
    SELECT 
        subscription_id,
        start_date,
        monthly_amount,
        'Subscription',
        DATE_FORMAT(start_date, '%Y-%m')
    FROM subscriptions
    WHERE start_date >= '2024-01-01'
) AS all_transactions
GROUP BY month, transaction_type
ORDER BY month DESC, transaction_type;

Advanced UNION Techniques:

  • ✅ Multiple UNION ALL for different transaction types
  • ✅ Negative amounts for refunds
  • ✅ Subqueries to fetch related data
  • ✅ Wrap UNION in subquery for aggregation
  • ✅ DATE_FORMAT for monthly grouping
  • 💡 This pattern powers real financial dashboards!

📝 Key Takeaways