← Back to Lessons
INTERMEDIATE ⏱ 40 minutes

SELF JOIN: Joining a Table to Itself

A SELF JOIN joins a table to itself, treating it as if it were two separate tables. This powerful technique solves problems involving hierarchies (employees and managers), comparisons within the same dataset (finding pairs, duplicates, or sequences), and relationships where both sides reference the same entity type. SELF JOINs require careful use of aliases to distinguish between the two "copies" of the table, but once mastered, they unlock solutions to complex relational problems that would otherwise require multiple queries or application-level logic.


📚 What You'll Learn

1. What is a SELF JOIN?

A SELF JOIN joins a table to itself by treating it as two separate tables. You must use different aliases to distinguish between the "left" and "right" copies of the same table.

Conceptual Example

Think of a table of employees:
┌─────────────┬──────────────┬────────────┐
│ employee_id │ name         │ manager_id │
├─────────────┼──────────────┼────────────┤
│ 1           │ Alice (CEO)  │ NULL       │
│ 2           │ Bob          │ 1          │
│ 3           │ Carol        │ 1          │
│ 4           │ David        │ 2          │
└─────────────┴──────────────┴────────────┘

Question: "Show each employee with their manager's name"

To answer, we need to join employees to employees:
- Left side: employees (as workers)
- Right side: employees (as managers)
- Connection: worker.manager_id = manager.employee_id

Why We Need SELF JOIN

2. Basic SELF JOIN Syntax

The Pattern

SELECT columns
FROM table_name AS alias1
JOIN table_name AS alias2
    ON alias1.column = alias2.column;

⚠️ Critical: You MUST use different aliases (e.g., e1 and e2, or worker and manager). Without aliases, SQL can't distinguish which copy of the table you're referencing.

Employee-Manager Example

-- Show employees with their manager names
SELECT 
    worker.employee_id,
    worker.name AS employee_name,
    manager.name AS manager_name
FROM employees worker
LEFT JOIN employees manager 
    ON worker.manager_id = manager.employee_id
ORDER BY worker.employee_id;

-- Aliases explained:
-- "worker" = employees table (left side)
-- "manager" = employees table (right side)
-- Join condition: worker's manager_id matches manager's employee_id

3. Hierarchical Relationships (Org Charts)

Sample Data

employees
+-------------+------------------+------------+------------+
| employee_id | name             | manager_id | position   |
+-------------+------------------+------------+------------+
| 1           | Alice Chen       | NULL       | CEO        |
| 2           | Bob Smith        | 1          | VP Sales   |
| 3           | Carol Williams   | 1          | VP Eng     |
| 4           | David Lee        | 2          | Sales Mgr  |
| 5           | Emma Johnson     | 2          | Sales Rep  |
| 6           | Frank Brown      | 3          | Dev Lead   |
| 7           | Grace Davis      | 3          | Dev Lead   |
+-------------+------------------+------------+------------+

Basic Manager Lookup

-- List employees with their manager
SELECT 
    e.name AS employee,
    e.position,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id
ORDER BY e.employee_id;

-- Result:
+------------------+------------+------------------+
| employee         | position   | manager          |
+------------------+------------+------------------+
| Alice Chen       | CEO        | NULL             | (No manager)
| Bob Smith        | VP Sales   | Alice Chen       |
| Carol Williams   | VP Eng     | Alice Chen       |
| David Lee        | Sales Mgr  | Bob Smith        |
| Emma Johnson     | Sales Rep  | Bob Smith        |
| Frank Brown      | Dev Lead   | Carol Williams   |
| Grace Davis      | Dev Lead   | Carol Williams   |
+------------------+------------+------------------+

Find Direct Reports

-- Show managers with their direct reports
SELECT 
    m.name AS manager,
    m.position AS manager_position,
    e.name AS direct_report,
    e.position AS report_position
FROM employees m
INNER JOIN employees e ON m.employee_id = e.manager_id
ORDER BY m.name, e.name;

-- Shows only managers who have direct reports
+------------------+------------------+------------------+----------------+
| manager          | manager_position | direct_report    | report_position|
+------------------+------------------+------------------+----------------+
| Alice Chen       | CEO              | Bob Smith        | VP Sales       |
| Alice Chen       | CEO              | Carol Williams   | VP Eng         |
| Bob Smith        | VP Sales         | David Lee        | Sales Mgr      |
| Bob Smith        | VP Sales         | Emma Johnson     | Sales Rep      |
| Carol Williams   | VP Eng           | Frank Brown      | Dev Lead       |
| Carol Williams   | VP Eng           | Grace Davis      | Dev Lead       |
+------------------+------------------+------------------+----------------+

Count Direct Reports

-- How many direct reports does each manager have?
SELECT 
    m.name AS manager,
    COUNT(e.employee_id) AS direct_reports
FROM employees m
LEFT JOIN employees e ON m.employee_id = e.manager_id
GROUP BY m.employee_id, m.name
HAVING COUNT(e.employee_id) > 0
ORDER BY direct_reports DESC;

-- Result:
+------------------+----------------+
| manager          | direct_reports |
+------------------+----------------+
| Alice Chen       | 2              |
| Bob Smith        | 2              |
| Carol Williams   | 2              |
+------------------+----------------+

4. Finding Peers (Same Level)

Employees with Same Manager

-- Find employees who share the same manager (peers)
SELECT 
    e1.name AS employee1,
    e2.name AS employee2,
    m.name AS shared_manager
FROM employees e1
INNER JOIN employees e2 
    ON e1.manager_id = e2.manager_id
    AND e1.employee_id < e2.employee_id  -- Prevent duplicates
INNER JOIN employees m ON e1.manager_id = m.employee_id
ORDER BY m.name, e1.name;

-- Result:
+------------------+------------------+------------------+
| employee1        | employee2        | shared_manager   |
+------------------+------------------+------------------+
| Bob Smith        | Carol Williams   | Alice Chen       |
| David Lee        | Emma Johnson     | Bob Smith        |
| Frank Brown      | Grace Davis      | Carol Williams   |
+------------------+------------------+------------------+

💡 Pro Tip: The condition e1.employee_id < e2.employee_id prevents getting duplicate pairs (e.g., both "Alice & Bob" and "Bob & Alice"). It also prevents matching an employee to themselves.

5. Comparing Rows: Finding Pairs and Patterns

Example: Products at Similar Prices

-- Find pairs of products with similar prices (within $5)
SELECT 
    p1.product_id AS product1_id,
    p1.name AS product1_name,
    p1.price AS product1_price,
    p2.product_id AS product2_id,
    p2.name AS product2_name,
    p2.price AS product2_price,
    ABS(p1.price - p2.price) AS price_difference
FROM products p1
INNER JOIN products p2 
    ON p1.product_id < p2.product_id
    AND ABS(p1.price - p2.price) <= 5
ORDER BY price_difference;

-- Finds competing products at similar price points

Example: Sequential Records

-- Compare consecutive orders from same customer
SELECT 
    o1.order_id AS first_order,
    o1.order_date AS first_date,
    o1.total AS first_total,
    o2.order_id AS next_order,
    o2.order_date AS next_date,
    o2.total AS next_total,
    DATEDIFF(o2.order_date, o1.order_date) AS days_between,
    o2.total - o1.total AS spending_change
FROM orders o1
INNER JOIN orders o2 
    ON o1.customer_id = o2.customer_id
    AND o2.order_date > o1.order_date
WHERE o1.customer_id = 1
ORDER BY o1.order_date, o2.order_date;

-- Shows spending patterns over time for customer 1

6. Detecting Duplicates

Find Duplicate Emails

-- Find customers with duplicate emails
SELECT 
    c1.customer_id AS customer1_id,
    c1.name AS customer1_name,
    c2.customer_id AS customer2_id,
    c2.name AS customer2_name,
    c1.email AS shared_email
FROM customers c1
INNER JOIN customers c2 
    ON c1.email = c2.email
    AND c1.customer_id < c2.customer_id
ORDER BY c1.email;

-- Shows pairs of customers sharing an email

Find Similar Names (Potential Duplicates)

-- Find customers with very similar names (typos, duplicates)
SELECT 
    c1.customer_id AS id1,
    c1.name AS name1,
    c2.customer_id AS id2,
    c2.name AS name2
FROM customers c1
INNER JOIN customers c2 
    ON c1.customer_id < c2.customer_id
    AND SOUNDEX(c1.name) = SOUNDEX(c2.name)
ORDER BY c1.name;

-- SOUNDEX finds phonetically similar names
-- Example: "John Smith" and "Jon Smyth"

7. Recursive Relationships

Example: Category Hierarchy

-- categories table
+-------------+--------------------+-------------------+
| category_id | name               | parent_category_id|
+-------------+--------------------+-------------------+
| 1           | Electronics        | NULL              |
| 2           | Computers          | 1                 |
| 3           | Smartphones        | 1                 |
| 4           | Laptops            | 2                 |
| 5           | Desktops           | 2                 |
+-------------+--------------------+-------------------+

-- Show categories with their parent
SELECT 
    child.name AS category,
    parent.name AS parent_category
FROM categories child
LEFT JOIN categories parent 
    ON child.parent_category_id = parent.category_id
ORDER BY parent.name, child.name;

-- Result:
+-------------+------------------+
| category    | parent_category  |
+-------------+------------------+
| Electronics | NULL             |
| Computers   | Electronics      |
| Smartphones | Electronics      |
| Laptops     | Computers        |
| Desktops    | Computers        |
+-------------+------------------+

Example: Referral Chain

-- Show who referred whom
SELECT 
    u1.username AS user,
    u2.username AS referred_by
FROM users u1
LEFT JOIN users u2 ON u1.referred_by_user_id = u2.user_id
WHERE u1.referred_by_user_id IS NOT NULL
ORDER BY u2.username, u1.username;

8. Multi-Level Relationships

Two Levels: Grandparent Relationships

-- Show employee → manager → manager's manager (skip level)
SELECT 
    emp.name AS employee,
    mgr.name AS manager,
    top_mgr.name AS managers_manager
FROM employees emp
LEFT JOIN employees mgr ON emp.manager_id = mgr.employee_id
LEFT JOIN employees top_mgr ON mgr.manager_id = top_mgr.employee_id
ORDER BY emp.employee_id;

-- Shows 3 levels of hierarchy

Find All Descendants

-- Find all reports under a specific manager (direct and indirect)
-- This requires recursive CTE (Common Table Expression)

WITH RECURSIVE org_tree AS (
    -- Base case: Direct reports
    SELECT 
        employee_id,
        name,
        manager_id,
        1 AS level
    FROM employees
    WHERE manager_id = 1  -- Start with Alice Chen's reports
    
    UNION ALL
    
    -- Recursive case: Reports of reports
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        ot.level + 1
    FROM employees e
    INNER JOIN org_tree ot ON e.manager_id = ot.employee_id
)
SELECT 
    name,
    level,
    REPEAT('  ', level - 1) || name AS indented_name
FROM org_tree
ORDER BY level, name;

-- Shows entire org tree under Alice

9. SELF JOIN Performance Considerations

Optimization Tips

  • ✅ Index the join column (e.g., manager_id, parent_id)
  • ✅ Index both sides if joining on different columns
  • ✅ Use INNER JOIN when you don't need NULL-manager rows
  • ✅ Add WHERE filters to reduce result set
  • ✅ Limit comparison queries (e.g., use < not !=)
  • ✅ Consider materialized views for complex hierarchies

Index Example

-- Create index on manager_id for faster SELF JOINs
CREATE INDEX idx_employees_manager_id ON employees(manager_id);

-- Now manager lookups are much faster

10. 🎯 Practice Exercises

Exercise 1: Employee Hierarchy

Given this employees table:

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(255),
    position VARCHAR(100),
    manager_id INT,
    salary DECIMAL(10, 2),
    hire_date DATE
);

Tasks:

  • List all employees with their manager's name and position
  • Find employees who earn more than their manager
  • Show managers with 3+ direct reports
Show Solution
-- Task 1: Employees with manager info
SELECT 
    e.employee_id,
    e.name AS employee_name,
    e.position AS employee_position,
    e.salary,
    m.name AS manager_name,
    m.position AS manager_position
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id
ORDER BY e.name;


-- Task 2: Employees earning more than their manager
SELECT 
    e.name AS employee,
    e.salary AS employee_salary,
    m.name AS manager,
    m.salary AS manager_salary,
    e.salary - m.salary AS salary_difference
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary
ORDER BY salary_difference DESC;


-- Task 3: Managers with 3+ direct reports
SELECT 
    m.employee_id,
    m.name AS manager_name,
    m.position,
    COUNT(e.employee_id) AS direct_reports
FROM employees m
INNER JOIN employees e ON m.employee_id = e.manager_id
GROUP BY m.employee_id, m.name, m.position
HAVING COUNT(e.employee_id) >= 3
ORDER BY direct_reports DESC;

Key Concepts:

  • ✅ LEFT JOIN includes employees without managers
  • ✅ INNER JOIN excludes CEO (no manager to compare)
  • ✅ WHERE on joined table for salary comparison
  • ✅ GROUP BY + HAVING for aggregate filtering

Exercise 2: Product Comparisons

Given this products table:

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(255),
    category VARCHAR(100),
    price DECIMAL(10, 2),
    stock INT
);

Tasks:

  • Find pairs of products in same category with similar prices (within $10)
  • Find products that are more expensive than all other products in their category
  • List products with lower stock than average stock in their category
Show Solution
-- Task 1: Similar-priced products in same category
SELECT 
    p1.name AS product1,
    p1.price AS price1,
    p2.name AS product2,
    p2.price AS price2,
    p1.category,
    ABS(p1.price - p2.price) AS price_difference
FROM products p1
INNER JOIN products p2 
    ON p1.category = p2.category
    AND p1.product_id < p2.product_id
    AND ABS(p1.price - p2.price) <= 10
ORDER BY p1.category, price_difference;


-- Task 2: Most expensive product per category
SELECT 
    p1.product_id,
    p1.name,
    p1.category,
    p1.price
FROM products p1
LEFT JOIN products p2 
    ON p1.category = p2.category
    AND p2.price > p1.price
WHERE p2.product_id IS NULL
ORDER BY p1.category;
-- If no product is more expensive, it's the most expensive


-- Task 3: Below-average stock in category
SELECT 
    p.product_id,
    p.name,
    p.category,
    p.stock,
    AVG(p2.stock) AS category_avg_stock
FROM products p
INNER JOIN products p2 ON p.category = p2.category
GROUP BY p.product_id, p.name, p.category, p.stock
HAVING p.stock < AVG(p2.stock)
ORDER BY p.category, p.stock;

Advanced Techniques:

  • ✅ Multiple join conditions (category AND price)
  • ✅ LEFT JOIN + IS NULL finds "maximum" records
  • ✅ SELF JOIN for within-group averages
  • ✅ p1.product_id < p2.product_id prevents duplicates

Exercise 3: Sequential Order Analysis

Given this orders table:

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total DECIMAL(10, 2),
    status VARCHAR(20)
);

Tasks:

  • For each customer, show their order with the previous order date and total
  • Find customers whose spending increased between consecutive orders
  • Calculate average days between orders for each customer
Show Solution
-- Task 1: Orders with previous order info
SELECT 
    o1.customer_id,
    o1.order_id AS current_order,
    o1.order_date AS current_date,
    o1.total AS current_total,
    o2.order_id AS previous_order,
    o2.order_date AS previous_date,
    o2.total AS previous_total,
    DATEDIFF(o1.order_date, o2.order_date) AS days_since_last
FROM orders o1
LEFT JOIN orders o2 
    ON o1.customer_id = o2.customer_id
    AND o2.order_date < o1.order_date
    AND o2.order_date = (
        SELECT MAX(order_date)
        FROM orders
        WHERE customer_id = o1.customer_id
          AND order_date < o1.order_date
    )
ORDER BY o1.customer_id, o1.order_date;


-- Task 2: Customers with increasing spending
SELECT DISTINCT
    o1.customer_id,
    o1.order_id AS order1,
    o1.total AS order1_total,
    o2.order_id AS order2,
    o2.total AS order2_total,
    o2.total - o1.total AS spending_increase
FROM orders o1
INNER JOIN orders o2 
    ON o1.customer_id = o2.customer_id
    AND o2.order_date > o1.order_date
    AND o2.total > o1.total
ORDER BY o1.customer_id, o1.order_date;


-- Task 3: Average days between orders
SELECT 
    o1.customer_id,
    COUNT(o2.order_id) AS order_pairs,
    AVG(DATEDIFF(o2.order_date, o1.order_date)) AS avg_days_between_orders
FROM orders o1
INNER JOIN orders o2 
    ON o1.customer_id = o2.customer_id
    AND o2.order_date > o1.order_date
GROUP BY o1.customer_id
HAVING COUNT(o2.order_id) > 0
ORDER BY avg_days_between_orders;

Sequential Analysis Patterns:

  • ✅ Compare current row to previous: o2.date < o1.date
  • ✅ Subquery finds "immediately previous" record
  • ✅ DATEDIFF calculates time between events
  • ✅ AVG aggregates across all pairs

🔥 Challenge: Social Network Analysis

You have a social network database:

CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(100),
    joined_date DATE
);

CREATE TABLE friendships (
    friendship_id INT PRIMARY KEY,
    user_id INT,
    friend_id INT,
    created_at TIMESTAMP
);

Challenge:

  • Find mutual friends (users who are friends with each other)
  • Find friend-of-friend connections (2 degrees of separation)
  • Recommend potential friends (friends of friends who aren't already friends)
Show Solution
-- Challenge 1: Mutual friends (bidirectional friendships)
SELECT 
    u1.username AS user1,
    u2.username AS user2,
    f1.created_at AS friendship_date
FROM friendships f1
INNER JOIN friendships f2 
    ON f1.user_id = f2.friend_id
    AND f1.friend_id = f2.user_id
INNER JOIN users u1 ON f1.user_id = u1.user_id
INNER JOIN users u2 ON f1.friend_id = u2.user_id
WHERE f1.user_id < f1.friend_id  -- Prevent duplicates
ORDER BY f1.created_at;


-- Challenge 2: Friend-of-friend connections
SELECT 
    u1.username AS user,
    u2.username AS friend,
    u3.username AS friend_of_friend
FROM friendships f1
INNER JOIN friendships f2 
    ON f1.friend_id = f2.user_id
    AND f2.friend_id != f1.user_id  -- Not back to original user
INNER JOIN users u1 ON f1.user_id = u1.user_id
INNER JOIN users u2 ON f1.friend_id = u2.user_id
INNER JOIN users u3 ON f2.friend_id = u3.user_id
WHERE f1.user_id = 1  -- For specific user
ORDER BY u3.username;


-- Challenge 3: Friend recommendations (friends of friends, not already friends)
SELECT 
    u1.username AS user,
    u3.username AS recommended_friend,
    COUNT(*) AS mutual_friends
FROM friendships f1
INNER JOIN friendships f2 
    ON f1.friend_id = f2.user_id
INNER JOIN users u1 ON f1.user_id = u1.user_id
INNER JOIN users u3 ON f2.friend_id = u3.user_id
WHERE f1.user_id = 1  -- For specific user
    AND f2.friend_id != f1.user_id  -- Not self
    AND NOT EXISTS (
        -- Not already friends
        SELECT 1 
        FROM friendships f3
        WHERE f3.user_id = f1.user_id 
          AND f3.friend_id = f2.friend_id
    )
GROUP BY u1.username, u3.username
ORDER BY mutual_friends DESC, u3.username
LIMIT 10;

Advanced Graph Patterns:

  • ✅ Bidirectional matching with f1 ↔ f2
  • ✅ Two-hop traversal (friend → friend's friend)
  • ✅ NOT EXISTS excludes existing relationships
  • ✅ COUNT mutual friends for recommendation strength
  • 💡 This pattern scales to LinkedIn, Facebook, Twitter

Real-World Application:

  • 🌐 LinkedIn "People You May Know"
  • 🌐 Facebook friend suggestions
  • 🌐 Twitter "Who to Follow"
  • 🌐 Professional networking platforms

📝 Key Takeaways