UNION and Set Operations
Set operations combine the results of multiple SELECT statements into a single result set. While JOINs combine columns from different tables, set operations combine rows. UNION merges results from two queries, removing duplicates, while UNION ALL keeps all rows including duplicates. These operations are essential for queries like "show all customers from the USA plus all VIP customers from any country" or "combine historical data with current data from different tables." Understanding set operations unlocks powerful ways to merge data from different sources, time periods, or conditions into unified result sets.
📚 What You'll Learn
- What set operations are and how they differ from JOINs
- How to use UNION to combine result sets
- The difference between UNION and UNION ALL
- Rules for combining queries with UNION
- How to order and limit combined results
- How to use INTERSECT (if supported)
- How to use EXCEPT/MINUS (if supported)
- Performance implications of set operations
- Common use cases and patterns
1. Set Operations vs JOINs
JOINs combine columns from different tables. Set operations combine rows from different queries. Think of UNION as stacking result sets vertically, while JOINs extend them horizontally.
Visual Comparison
JOIN (Horizontal - combines columns):
Table A Table B Result
+----+ +----+ +----+----+
| A1 | + | B1 | → | A1 | B1 |
| A2 | | B2 | | A2 | B2 |
+----+ +----+ +----+----+
UNION (Vertical - combines rows):
Query 1 Query 2 Result
+----+ +----+ +----+
| A1 | | C1 | | A1 |
| A2 | + | C2 | → | A2 |
+----+ +----+ | C1 |
| C2 |
+----+
2. UNION: Combining Result Sets
Basic UNION Syntax
SELECT column1, column2
FROM table1
WHERE condition
UNION
SELECT column1, column2
FROM table2
WHERE condition;
Simple Example
-- Combine customers from USA and VIP customers from any country
SELECT
customer_id,
name,
email,
'USA Customer' AS category
FROM customers
WHERE country = 'USA'
UNION
SELECT
customer_id,
name,
email,
'VIP Customer' AS category
FROM customers
WHERE loyalty_tier = 'VIP';
-- Result: All USA customers + all VIP customers (duplicates removed)
UNION Removes Duplicates
-- If a customer is both from USA and VIP:
-- UNION keeps only ONE row (duplicate removed)
Query 1 results: Query 2 results: UNION result:
+----+-------+ +----+-------+ +----+-------+
| 1 | Alice | | 1 | Alice | | 1 | Alice |
| 2 | Bob | | 3 | Carol | | 2 | Bob |
+----+-------+ +----+-------+ | 3 | Carol |
+----+-------+
Alice appears once because she's in both queries
3. UNION ALL: Keeping Duplicates
UNION ALL keeps all rows including duplicates. It's faster than UNION because it doesn't need to check for and remove duplicates.
UNION vs UNION ALL
-- UNION (removes duplicates, slower)
SELECT product_id, name FROM products WHERE category_id = 1
UNION
SELECT product_id, name FROM products WHERE price > 100;
-- UNION ALL (keeps duplicates, faster)
SELECT product_id, name FROM products WHERE category_id = 1
UNION ALL
SELECT product_id, name FROM products WHERE price > 100;
-- Use UNION ALL when:
-- 1. You know there are no duplicates
-- 2. You want to keep duplicates
-- 3. Performance matters and duplicates are acceptable
Performance Difference
-- UNION process:
-- 1. Execute first query
-- 2. Execute second query
-- 3. Combine results
-- 4. SORT to find duplicates (expensive!)
-- 5. Remove duplicates
-- 6. Return result
-- UNION ALL process:
-- 1. Execute first query
-- 2. Execute second query
-- 3. Combine results
-- 4. Return result (no sorting/deduplication)
✅ Best Practice: Use UNION ALL when possible. Only use UNION when you specifically need to remove duplicates.
4. Rules for Using UNION
Rule 1: Same Number of Columns
❌ ERROR: Column count mismatch
-- This will fail!
SELECT customer_id, name FROM customers
UNION
SELECT product_id, name, price FROM products; -- 3 columns vs 2
-- Error: "The used SELECT statements have different number of columns"
✅ CORRECT: Same column count
-- Add NULL or placeholder for missing column
SELECT customer_id, name, NULL AS price FROM customers
UNION
SELECT product_id, name, price FROM products;
Rule 2: Compatible Data Types
-- ✅ Compatible types (both strings)
SELECT name FROM customers
UNION
SELECT name FROM suppliers;
-- ✅ Compatible types (both numbers)
SELECT customer_id FROM customers
UNION
SELECT supplier_id FROM suppliers;
-- ⚠️ Different types may work (automatic conversion)
SELECT customer_id FROM customers -- INT
UNION
SELECT 'NEW' AS customer_id; -- STRING (converts to string)
-- ❌ Incompatible types will fail
SELECT order_date FROM orders -- DATE
UNION
SELECT customer_id FROM customers; -- INT (cannot convert)
Rule 3: Column Names from First Query
-- First query determines column names
SELECT customer_id AS id, name AS full_name FROM customers
UNION
SELECT supplier_id, company_name FROM suppliers;
-- Result columns are named: id, full_name
-- (supplier_id and company_name are ignored)
5. ORDER BY and LIMIT with UNION
ORDER BY Must Be at the End
-- ✅ CORRECT: ORDER BY applies to entire result
SELECT name, 'Customer' AS type FROM customers
UNION
SELECT name, 'Supplier' AS type FROM suppliers
ORDER BY name; -- Sorts combined result
-- ❌ WRONG: Can't ORDER BY individual queries
SELECT name FROM customers ORDER BY name -- Error!
UNION
SELECT name FROM suppliers;
LIMIT with UNION
-- LIMIT applies to final result
SELECT name FROM customers
UNION
SELECT name FROM suppliers
LIMIT 10; -- Returns top 10 from combined result
-- To limit individual queries, use subqueries
(SELECT name FROM customers ORDER BY registration_date DESC LIMIT 5)
UNION ALL
(SELECT name FROM suppliers ORDER BY created_date DESC LIMIT 5)
ORDER BY name;
6. Common UNION Patterns
Pattern 1: Combining Different Tables
-- Create unified contact list
SELECT
customer_id AS id,
name,
email,
phone,
'Customer' AS contact_type
FROM customers
UNION ALL
SELECT
supplier_id,
company_name,
email,
phone,
'Supplier'
FROM suppliers
UNION ALL
SELECT
employee_id,
name,
work_email,
work_phone,
'Employee'
FROM employees
ORDER BY name;
Pattern 2: Historical + Current Data
-- Combine archived orders with current orders
SELECT
order_id,
customer_id,
order_date,
total,
'Archived' AS status
FROM orders_archive
WHERE order_date >= '2020-01-01'
UNION ALL
SELECT
order_id,
customer_id,
order_date,
total,
'Current'
FROM orders_current
ORDER BY order_date DESC;
Pattern 3: Multiple Conditions
-- Find customers needing attention
SELECT
customer_id,
name,
email,
'High Value - No Recent Orders' AS reason
FROM customers
WHERE total_lifetime_value > 10000
AND last_order_date < DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
UNION ALL
SELECT
customer_id,
name,
email,
'New Customer - No Orders Yet'
FROM customers
WHERE registration_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
AND total_orders = 0
UNION ALL
SELECT
customer_id,
name,
email,
'Abandoned Cart'
FROM customers c
WHERE EXISTS (
SELECT 1 FROM cart_items ci
WHERE ci.customer_id = c.customer_id
AND ci.created_at < DATE_SUB(CURRENT_DATE, INTERVAL 24 HOUR)
)
ORDER BY reason, name;
Pattern 4: Report Generation
-- Monthly sales summary with totals
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS period,
'Sales' AS metric,
SUM(total) AS value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
UNION ALL
SELECT
DATE_FORMAT(order_date, '%Y-%m'),
'Orders',
COUNT(*)
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
UNION ALL
SELECT
DATE_FORMAT(order_date, '%Y-%m'),
'Avg Order Value',
AVG(total)
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY period, metric;
7. INTERSECT and EXCEPT (MySQL Alternatives)
⚠️ MySQL Note: MySQL doesn't natively support INTERSECT and EXCEPT. However, you can achieve the same results using other methods.
INTERSECT (Rows in Both Queries)
-- Standard SQL (not MySQL):
SELECT customer_id FROM customers_usa
INTERSECT
SELECT customer_id FROM vip_customers;
-- MySQL equivalent using INNER JOIN:
SELECT DISTINCT c1.customer_id
FROM customers_usa c1
INNER JOIN vip_customers c2 ON c1.customer_id = c2.customer_id;
-- MySQL equivalent using IN:
SELECT customer_id
FROM customers_usa
WHERE customer_id IN (
SELECT customer_id
FROM vip_customers
);
EXCEPT/MINUS (Rows in First, Not in Second)
-- Standard SQL (not MySQL):
SELECT customer_id FROM all_customers
EXCEPT
SELECT customer_id FROM vip_customers;
-- MySQL equivalent using NOT IN:
SELECT customer_id
FROM all_customers
WHERE customer_id NOT IN (
SELECT customer_id
FROM vip_customers
WHERE customer_id IS NOT NULL
);
-- MySQL equivalent using NOT EXISTS (better):
SELECT c1.customer_id
FROM all_customers c1
WHERE NOT EXISTS (
SELECT 1
FROM vip_customers c2
WHERE c2.customer_id = c1.customer_id
);
8. Performance Tips for UNION
Optimization Strategies
- ✅ Use UNION ALL instead of UNION when duplicates don't matter
- ✅ Filter data before UNION (in individual queries)
- ✅ Index columns used in WHERE clauses of each query
- ✅ Limit rows in individual queries when possible
- ✅ Use identical column types to avoid conversions
- ✅ Consider materialized views for frequently-used UNIONs
Filter Early
-- ❌ SLOW: Filter after UNION
SELECT name, email FROM customers
UNION
SELECT name, email FROM suppliers
WHERE country = 'USA'; -- Error! Can't do this
-- ✅ FAST: Filter before UNION
SELECT name, email FROM customers WHERE country = 'USA'
UNION
SELECT name, email FROM suppliers WHERE country = 'USA';
9. 🎯 Practice Exercises
Exercise 1: Basic UNION Operations
Given this schema:
customers (customer_id, name, email, country, registration_date)
employees (employee_id, name, email, department, hire_date)
suppliers (supplier_id, company_name, email, country, established_date)
Tasks:
- Create a unified email list from all three tables
- Find all names (customers, employees, suppliers) sorted alphabetically
- Combine USA customers and suppliers, indicating the type
Show Solution
-- Task 1: Unified email list
SELECT
email,
'Customer' AS source
FROM customers
UNION
SELECT
email,
'Employee'
FROM employees
UNION
SELECT
email,
'Supplier'
FROM suppliers
ORDER BY email;
-- Task 2: All names alphabetically (UNION removes duplicates)
SELECT name FROM customers
UNION
SELECT name FROM employees
UNION
SELECT company_name FROM suppliers
ORDER BY name;
-- Task 3: USA entities with type indicator
SELECT
customer_id AS id,
name,
email,
country,
'Customer' AS entity_type
FROM customers
WHERE country = 'USA'
UNION ALL
SELECT
supplier_id,
company_name,
email,
country,
'Supplier'
FROM suppliers
WHERE country = 'USA'
ORDER BY entity_type, name;
Key Concepts:
- ✅ UNION removes duplicate emails automatically
- ✅ Add type indicators to distinguish sources
- ✅ UNION ALL faster when no duplicates expected
- ✅ ORDER BY applies to entire combined result
Exercise 2: Historical Data Combination
Given this schema:
orders_2023 (order_id, customer_id, order_date, total, status)
orders_2024 (order_id, customer_id, order_date, total, status)
orders_archive (order_id, customer_id, order_date, total, status, archive_year)
Tasks:
- Combine all orders from all tables for customer_id = 1001
- Get total order count and revenue across all tables
- Find top 10 orders by total across all years
Show Solution
-- Task 1: All orders for specific customer
SELECT
order_id,
customer_id,
order_date,
total,
status,
2023 AS year
FROM orders_2023
WHERE customer_id = 1001
UNION ALL
SELECT
order_id,
customer_id,
order_date,
total,
status,
2024
FROM orders_2024
WHERE customer_id = 1001
UNION ALL
SELECT
order_id,
customer_id,
order_date,
total,
status,
archive_year
FROM orders_archive
WHERE customer_id = 1001
ORDER BY order_date DESC;
-- Task 2: Total counts and revenue
SELECT
'Total' AS metric,
COUNT(*) AS order_count,
SUM(total) AS total_revenue
FROM (
SELECT order_id, total FROM orders_2023
UNION ALL
SELECT order_id, total FROM orders_2024
UNION ALL
SELECT order_id, total FROM orders_archive
) AS all_orders;
-- Task 3: Top 10 orders across all years
SELECT
order_id,
customer_id,
order_date,
total
FROM orders_2023
UNION ALL
SELECT
order_id,
customer_id,
order_date,
total
FROM orders_2024
UNION ALL
SELECT
order_id,
customer_id,
order_date,
total
FROM orders_archive
ORDER BY total DESC
LIMIT 10;
Historical Data Patterns:
- ✅ UNION ALL appropriate (no duplicates across years)
- ✅ Add year column to track source
- ✅ Wrap UNION in subquery for aggregation
- ✅ ORDER BY and LIMIT work on combined result
🔥 Challenge: Multi-Source Report Generation
You need to create a comprehensive sales dashboard:
orders (order_id, customer_id, order_date, total, status)
refunds (refund_id, order_id, refund_date, amount, reason)
exchanges (exchange_id, order_id, exchange_date, old_product_id, new_product_id)
subscriptions (subscription_id, customer_id, start_date, monthly_amount, status)
Challenge: Create a unified financial activity report showing all transactions (orders, refunds, exchanges, subscriptions) with type indicators, amounts (positive/negative), and monthly summaries.
Show Solution
-- Comprehensive financial activity report
SELECT
order_id AS transaction_id,
customer_id,
order_date AS transaction_date,
total AS amount,
'Order' AS transaction_type,
status AS transaction_status,
DATE_FORMAT(order_date, '%Y-%m') AS month
FROM orders
WHERE order_date >= '2024-01-01'
UNION ALL
SELECT
refund_id,
(SELECT customer_id FROM orders WHERE order_id = r.order_id),
refund_date,
-amount, -- Negative for refunds
'Refund',
reason,
DATE_FORMAT(refund_date, '%Y-%m')
FROM refunds r
WHERE refund_date >= '2024-01-01'
UNION ALL
SELECT
exchange_id,
(SELECT customer_id FROM orders WHERE order_id = e.order_id),
exchange_date,
0, -- No financial impact
'Exchange',
CONCAT('Old: ', old_product_id, ', New: ', new_product_id),
DATE_FORMAT(exchange_date, '%Y-%m')
FROM exchanges e
WHERE exchange_date >= '2024-01-01'
UNION ALL
SELECT
subscription_id,
customer_id,
start_date,
monthly_amount,
'Subscription',
status,
DATE_FORMAT(start_date, '%Y-%m')
FROM subscriptions
WHERE start_date >= '2024-01-01'
ORDER BY transaction_date DESC, transaction_type;
-- Monthly summary from combined data
SELECT
month,
transaction_type,
COUNT(*) AS transaction_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount
FROM (
-- Same UNION query as above
SELECT
order_id AS transaction_id,
order_date AS transaction_date,
total AS amount,
'Order' AS transaction_type,
DATE_FORMAT(order_date, '%Y-%m') AS month
FROM orders
WHERE order_date >= '2024-01-01'
UNION ALL
SELECT
refund_id,
refund_date,
-amount,
'Refund',
DATE_FORMAT(refund_date, '%Y-%m')
FROM refunds
WHERE refund_date >= '2024-01-01'
UNION ALL
SELECT
subscription_id,
start_date,
monthly_amount,
'Subscription',
DATE_FORMAT(start_date, '%Y-%m')
FROM subscriptions
WHERE start_date >= '2024-01-01'
) AS all_transactions
GROUP BY month, transaction_type
ORDER BY month DESC, transaction_type;
Advanced UNION Techniques:
- ✅ Multiple UNION ALL for different transaction types
- ✅ Negative amounts for refunds
- ✅ Subqueries to fetch related data
- ✅ Wrap UNION in subquery for aggregation
- ✅ DATE_FORMAT for monthly grouping
- 💡 This pattern powers real financial dashboards!
📝 Key Takeaways
- UNION combines rows from multiple queries, removing duplicates
- UNION ALL keeps all rows including duplicates (faster)
- JOINs combine columns, UNION combines rows
- Queries must have the same number of columns
- Column data types must be compatible
- Column names come from the first query
- ORDER BY goes at the end, applies to entire result
- Use UNION ALL when possible for better performance
- UNION requires sorting to remove duplicates (expensive)
- Filter data before UNION in individual queries
- MySQL doesn't support INTERSECT/EXCEPT natively
- Use IN/EXISTS for INTERSECT functionality
- Use NOT IN/NOT EXISTS for EXCEPT functionality
- Common uses: historical data, multi-source reports, contact lists
- UNION enables combining data from different tables/time periods