Introduction to Joins: Connecting Related Tables
Real-world databases don't store everything in a single table. Customer data lives in one table, orders in another, products in a third. Joins are how you connect these related tables to answer meaningful questions like "What did customer Alice order?" or "Which products have never been purchased?" This lesson introduces the fundamental concept of joinsβwhy they exist, how they work, and how to think about relationships between tables. Mastering joins transforms you from writing simple queries to building sophisticated data analyses.
π What You'll Learn
- Why databases split data across multiple tables
- How tables relate through primary and foreign keys
- What a join accomplishes and when to use it
- The difference between matching and non-matching rows
- The basic structure of a join query
- How to visualize joins conceptually
- Common join terminology and patterns
- How to identify join opportunities in real queries
1. Why Databases Use Multiple Tables
Imagine storing all customer and order information in a single table. You'd repeat customer details (name, email, address) for every order they place. This wastes space, creates inconsistencies, and makes updates difficult. Instead, we normalize data: customers in one table, orders in another, connected by keys.
The Problem: Denormalized Data
-- β BAD: Everything in one table (denormalized)
orders_denormalized
+----------+---------------+------------------+------------+------------+
| order_id | customer_name | customer_email | product | quantity |
+----------+---------------+------------------+------------+------------+
| 1 | Alice Johnson | alice@email.com | Laptop | 1 |
| 2 | Alice Johnson | alice@email.com | Mouse | 2 |
| 3 | Alice Johnson | alice@email.com | Keyboard | 1 |
| 4 | Bob Smith | bob@email.com | Monitor | 2 |
+----------+---------------+------------------+------------+------------+
Problems:
β’ Alice's info repeated 3 times (data redundancy)
β’ If Alice changes email, must update 3 rows (update anomaly)
β’ Wastes storage space
β’ Risk of inconsistent data (typos, partial updates)
The Solution: Normalized Data with Relationships
-- β
GOOD: Split into related tables (normalized)
customers
+-------------+---------------+------------------+
| customer_id | name | email |
+-------------+---------------+------------------+
| 1 | Alice Johnson | alice@email.com |
| 2 | Bob Smith | bob@email.com |
+-------------+---------------+------------------+
orders
+----------+-------------+------------+----------+
| order_id | customer_id | product | quantity |
+----------+-------------+------------+----------+
| 1 | 1 | Laptop | 1 |
| 2 | 1 | Mouse | 2 |
| 3 | 1 | Keyboard | 1 |
| 4 | 2 | Monitor | 2 |
+----------+-------------+------------+----------+
Benefits:
β’ Customer info stored once (no redundancy)
β’ Update Alice's email in one place
β’ Saves storage space
β’ Maintains data consistency
2. How Tables Connect: Primary and Foreign Keys
Primary Keys: Unique Identifiers
Every table should have a primary keyβa column (or combination of columns) that uniquely identifies each row. Primary keys are typically numeric IDs that auto-increment.
CREATE TABLE customers (
customer_id INT PRIMARY KEY AUTO_INCREMENT, -- Primary key
name VARCHAR(255),
email VARCHAR(255)
);
Foreign Keys: References to Other Tables
A foreign key is a column in one table that references the primary key of another table. This creates the relationship between tables.
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT, -- Foreign key referencing customers.customer_id
product VARCHAR(255),
quantity INT,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Visualizing the Relationship
customers orders
βββββββββββββββ βββββββββββββββ
β customer_id βββββββββββββββ customer_id β (Foreign Key)
β name β β order_id β
β email β β product β
βββββββββββββββ β quantity β
βββββββββββββββ
Relationship: One customer can have many orders (One-to-Many)
β’ customer_id in orders points to customer_id in customers
β’ This link lets us connect order data to customer data
3. What Joins Accomplish
A join combines rows from two or more tables based on a related column. It "stitches together" data that's been separated for efficiency, allowing you to query across tables as if they were one.
Without Joins: Limited Queries
-- Can only see customer IDs, not names
SELECT * FROM orders;
+----------+-------------+----------+----------+
| order_id | customer_id | product | quantity |
+----------+-------------+----------+----------+
| 1 | 1 | Laptop | 1 |
| 2 | 1 | Mouse | 2 |
| 4 | 2 | Monitor | 2 |
+----------+-------------+----------+----------+
-- Can only see customer info, no order details
SELECT * FROM customers;
+-------------+---------------+------------------+
| customer_id | name | email |
+-------------+---------------+------------------+
| 1 | Alice Johnson | alice@email.com |
| 2 | Bob Smith | bob@email.com |
+-------------+---------------+------------------+
With Joins: Complete Picture
-- Join to see both customer info AND order details
SELECT
o.order_id,
c.name AS customer_name,
c.email,
o.product,
o.quantity
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
+----------+---------------+------------------+----------+----------+
| order_id | customer_name | email | product | quantity |
+----------+---------------+------------------+----------+----------+
| 1 | Alice Johnson | alice@email.com | Laptop | 1 |
| 2 | Alice Johnson | alice@email.com | Mouse | 2 |
| 4 | Bob Smith | bob@email.com | Monitor | 2 |
+----------+---------------+------------------+----------+----------+
Now we can answer: "What did Alice order?" "Who bought monitors?"
4. Basic Join Structure
Anatomy of a Join Query
SELECT
columns
FROM table1
JOIN table2 ON table1.column = table2.column;
Breaking It Down
SELECT
c.name, -- Column from customers table
o.product -- Column from orders table
FROM customers c -- First table (with alias 'c')
JOIN orders o -- Second table (with alias 'o')
ON c.customer_id = o.customer_id; -- Join condition
Components:
1. FROM: Starting table (customers)
2. JOIN: Table to connect (orders)
3. ON: How tables relate (matching customer_id values)
4. SELECT: Which columns to display from either table
Table Aliases
When joining tables, use aliases (short names) to make queries readable. Without aliases, you'd write customers.customer_id repeatedly. With aliases, just write c.customer_id.
-- Without aliases (verbose, hard to read)
SELECT customers.name, orders.product
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id;
-- With aliases (clean, readable)
SELECT c.name, o.product
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
5. Matching vs. Non-Matching Rows
Understanding what happens when rows match or don't match is crucial for choosing the right join type.
Matching Rows
customers
+-------------+---------------+
| customer_id | name |
+-------------+---------------+
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carol Davis | β No orders yet
+-------------+---------------+
orders
+----------+-------------+----------+
| order_id | customer_id | product |
+----------+-------------+----------+
| 1 | 1 | Laptop | β Matches Alice
| 2 | 1 | Mouse | β Matches Alice
| 4 | 2 | Monitor | β Matches Bob
+----------+-------------+----------+
Matching rows:
β’ Alice (customer_id=1) has 2 orders
β’ Bob (customer_id=2) has 1 order
Non-matching row:
β’ Carol (customer_id=3) has NO orders
What Different Joins Do
| Join Type | What It Returns | Carol's Row? |
|---|---|---|
| INNER JOIN | Only rows that match in BOTH tables | β Excluded |
| LEFT JOIN | All from left table + matches from right | β Included (with NULLs) |
| RIGHT JOIN | All from right table + matches from left | Depends on direction |
| FULL OUTER JOIN | All rows from both tables | β Included (with NULLs) |
6. Common Table Relationships
One-to-Many (Most Common)
One row in table A relates to many rows in table B. This is the most common relationship type.
Examples:
β’ One customer β Many orders
β’ One author β Many books
β’ One department β Many employees
β’ One category β Many products
customers (1) β orders (Many)
Alice β [Laptop, Mouse, Keyboard]
Bob β [Monitor, Keyboard]
Many-to-Many
Many rows in table A relate to many rows in table B. Requires a junction table (also called bridge table or associative table).
Examples:
β’ Students β Courses (students take many courses, courses have many students)
β’ Products β Orders (products in many orders, orders contain many products)
β’ Authors β Books (authors write many books, books can have multiple authors)
students enrollments courses
Alice β [Math, History] β Math has [Alice, Bob, Carol]
Bob β [Math, Science] β Science has [Bob, David]
Carol β [Math] β History has [Alice]
-- Junction table for many-to-many
CREATE TABLE students (
student_id INT PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(255)
);
CREATE TABLE enrollments ( -- Junction table
enrollment_id INT PRIMARY KEY,
student_id INT,
course_id INT,
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
One-to-One (Rare)
One row in table A relates to exactly one row in table B. Less common, often used for separating sensitive data or large optional fields.
Examples:
β’ User β User_Profile (extended info)
β’ Employee β Employee_Passport (sensitive data)
β’ Product β Product_Detailed_Specs (large optional text)
users (1) ββ user_profiles (1)
Alice ββ [Bio, Avatar, Preferences]
Bob ββ [Bio, Avatar, Preferences]
7. Real-World Join Scenarios
Scenario 1: E-commerce Order History
-- Question: "Show me all orders with customer names and emails"
SELECT
o.order_id,
o.order_date,
c.name AS customer_name,
c.email,
o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
ORDER BY o.order_date DESC;
Scenario 2: Product Catalog with Categories
-- Question: "List all products with their category names"
SELECT
p.product_id,
p.name AS product_name,
p.price,
cat.name AS category_name
FROM products p
JOIN categories cat ON p.category_id = cat.category_id
ORDER BY cat.name, p.name;
Scenario 3: Employee Department Lookup
-- Question: "Show employees with their department names"
SELECT
e.employee_id,
e.name AS employee_name,
e.position,
d.name AS department_name,
d.location
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
8. Identifying When You Need a Join
You Need a Join When:
- You need data from multiple tables in one result
- You want to see related information together
- You need to filter based on values in another table
- You're answering questions that span table boundaries
- You see foreign keys pointing to other tables
Question Analysis
| Question | Tables Needed | Join Required? |
|---|---|---|
| "List all customers" | customers | β No |
| "What did Alice order?" | customers + orders | β Yes |
| "Show products by category" | products + categories | β Yes |
| "Total sales per customer" | customers + orders | β Yes |
| "How many orders today?" | orders | β No |
9. π― Practice Exercises
Exercise 1: Identify Relationships
Given these table structures, identify the type of relationship:
authors (author_id, name)
books (book_id, title, author_id)
products (product_id, name)
orders (order_id, customer_id)
order_items (order_item_id, order_id, product_id)
users (user_id, username)
profiles (profile_id, user_id, bio, avatar)
Questions:
- What's the relationship between authors and books?
- What's the relationship between products and orders?
- What's the relationship between users and profiles?
Show Solution
Answer 1: Authors and Books
One-to-Many relationship
- One author can write many books
- Each book has one author (based on author_id foreign key in books)
- Foreign key: books.author_id β authors.author_id
Answer 2: Products and Orders
Many-to-Many relationship
- One product can appear in many orders
- One order can contain many products
- Junction table: order_items connects them
- Foreign keys: order_items.order_id β orders.order_id
- Foreign keys: order_items.product_id β products.product_id
Answer 3: Users and Profiles
One-to-One relationship
- One user has exactly one profile
- One profile belongs to exactly one user
- Foreign key: profiles.user_id β users.user_id
- Typically profiles.user_id would be UNIQUE
Exercise 2: Write Your First Join
Given these tables:
CREATE TABLE departments (
department_id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
salary DECIMAL(10, 2)
);
Task: Write a query to show each employee's name, salary, and their department name.
Show Solution
-- Solution
SELECT
e.name AS employee_name,
e.salary,
d.name AS department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
ORDER BY d.name, e.name;
-- Breaking it down:
-- 1. FROM employees e: Start with employees table, alias 'e'
-- 2. JOIN departments d: Connect to departments table, alias 'd'
-- 3. ON e.department_id = d.department_id: Match on foreign key
-- 4. SELECT: Choose columns from both tables
-- 5. ORDER BY: Sort by department, then employee name
Key Points:
- β Aliases (e, d) make the query readable
- β ON clause specifies how tables relate
- β We can select from either table (e.name, d.name)
- β Result shows employee and department together
Exercise 3: Identify Join Needs
For each question, determine which tables you'd need and whether a join is required:
- "How many products do we have?"
- "Which customers have placed orders in 2024?"
- "What's the total revenue per product category?"
- "List all employees hired after 2020"
- "Show orders with customer email addresses"
Show Solution
Question 1: "How many products do we have?"
- Tables needed: products
- Join required: β No - single table query
- Query:
SELECT COUNT(*) FROM products;
Question 2: "Which customers have placed orders in 2024?"
- Tables needed: customers, orders
- Join required: β Yes - need customer names from one table, order dates from another
- Relationship: customers.customer_id = orders.customer_id
Question 3: "What's the total revenue per product category?"
- Tables needed: categories, products, orders (or order_items)
- Join required: β Yes - need category names, product info, and sales data
- Multiple joins needed!
Question 4: "List all employees hired after 2020"
- Tables needed: employees
- Join required: β No - single table with WHERE clause
- Query:
SELECT * FROM employees WHERE hire_date > '2020-12-31';
Question 5: "Show orders with customer email addresses"
- Tables needed: orders, customers
- Join required: β Yes - emails in customers, order data in orders
- Relationship: orders.customer_id = customers.customer_id
π Key Takeaways
- Databases normalize data by splitting it across multiple tables
- Primary keys uniquely identify rows within a table
- Foreign keys create relationships by referencing primary keys
- Joins combine data from related tables into a single result
- The ON clause specifies how tables are related
- Use table aliases to make join queries readable
- One-to-Many is the most common relationship type
- Many-to-Many relationships require a junction table
- Different join types handle matching vs non-matching rows differently
- You need a join when your question spans multiple tables
- Joins let you reconstruct normalized data for queries
- Understanding relationships is key to writing effective joins