Primary Keys: The Foundation of Data Integrity
Every well-designed table needs a primary keyβa column (or set of columns) that uniquely identifies each row. Primary keys are the cornerstone of database integrity, enabling relationships between tables and ensuring that no two rows can be confused with each other. In this lesson, you'll learn what makes a good primary key, how to implement them, and why they're absolutely essential for professional database design.
π What You'll Learn
- What a primary key is and why every table needs one
- The four essential rules for primary keys
- How primary keys enable table relationships
- How primary keys support efficient queries and joins
- Natural keys vs. surrogate keys (and when to use each)
- Single-column vs. composite primary keys
- How to create primary keys in SQL
- Common primary key mistakes to avoid
- Best practices for choosing primary keys
1. What is a Primary Key?
A primary key is a column (or set of columns) that uniquely identifies each row in a table. No two rows can have the same primary key value, and the primary key cannot be NULL.
Example: Students Table
| π student_id | name | major | |
|---|---|---|---|
| 100234 | Alice Johnson | alice@uni.edu | Computer Science |
| 100567 | Bob Smith | bob@uni.edu | Biology |
| 100891 | Carol Davis | carol@uni.edu | Mathematics |
In this table, student_id is the primary key. Each student has a unique ID that will never change and will never be shared with another student.
2. The Four Golden Rules of Primary Keys
For a column to serve as a primary key, it must follow these four essential rules:
Rule 1: Uniqueness
Every value must be unique. No two rows can have the same primary key value.
Rule 2: Not NULL
Primary keys cannot be NULL. Every row must have a valid primary key value.
Rule 3: Unchanging
Primary keys should never change. Once assigned, they remain constant for the life of the row.
Rule 4: Simple
Keep it simple. Prefer single columns when possible. Avoid complex composite keys unless necessary.
β What Makes a BAD Primary Key?
| Bad Choice | Why It's Bad |
|---|---|
| Name | β Not unique (multiple people can have the same name) |
| β Can change (people change email addresses) | |
| Phone Number | β Can change and can be NULL (some people don't provide phone) |
| Address | β Not unique and changes frequently |
3. Why Every Table MUST Have a Primary Key
Reason 1: Uniquely Identify Rows
Without a primary key, you cannot reliably reference a specific row. Imagine trying to update "the student named John Smith" when you have three students with that name!
β Without Primary Key:
UPDATE students
SET major = 'Data Science'
WHERE name = 'John Smith';
-- Which John Smith? All three get updated! π±
β With Primary Key:
UPDATE students
SET major = 'Data Science'
WHERE student_id = 100234;
-- Exactly one row updated! β
Reason 2: Enable Relationships (Foreign Keys)
Foreign keys reference primary keys. Without primary keys, you cannot create relationships between tables!
students (π student_id) ββ enrollments (π student_id FK)
enrollments.student_id references students.student_id
Reason 3: Fast Lookups and Joins
Databases automatically create an index on the primary key, making queries extremely fast. Looking up a row by primary key is nearly instantaneous, even in tables with millions of rows.
| Query Type | Speed |
|---|---|
WHERE student_id = 100234 |
β‘ Super fast (indexed) |
WHERE name = 'Alice Johnson' |
β οΈ Slower (full table scan) |
Reason 4: Prevent Duplicate Data
The database enforces uniqueness automatically. You cannot accidentally insert two students with the same student_idβthe database will reject it.
4. Natural Keys vs. Surrogate Keys
Natural Keys: Using Real-World Identifiers
A natural key is a column that has real business meaning. Examples: Social Security Number, ISBN, License Plate, Employee Badge Number.
β Advantages of Natural Keys:
- Meaningful to users (ISBN tells you it's a book identifier)
- No need to join to another table to see the identifier
- One less column in the database
β Disadvantages of Natural Keys:
- Can change: SSNs can be reissued, license plates change
- Can be NULL: Not everyone has an SSN when first entering the system
- Privacy concerns: SSNs shouldn't be widely shared
- Composite keys: Natural keys are often multiple columns
Surrogate Keys: Database-Generated IDs
A surrogate key is an artificial identifier created by the database, usually an auto-incrementing integer. Examples: student_id, order_id, product_id.
β Advantages of Surrogate Keys:
- Never changes: Once assigned, stays forever
- Always unique: Database guarantees it
- Never NULL: Auto-generated on insert
- Simple: Single integer column
- Fast: Integers are compact and join quickly
- No privacy issues: No business meaning to protect
β Disadvantages of Surrogate Keys:
- No business meaning (student_id 100234 doesn't tell you who they are)
- Extra column in the database
β Best Practice: Use surrogate keys (auto-increment IDs) as primary keys in most cases. They're simple, reliable, and follow all four golden rules perfectly.
5. Creating Primary Keys in SQL
Method 1: Define at Column Level
CREATE TABLE students (
student_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
major VARCHAR(100)
);
Method 2: Define at Table Level (More Flexible)
CREATE TABLE students (
student_id INT AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
major VARCHAR(100),
PRIMARY KEY (student_id)
);
Composite Primary Key (Multiple Columns)
Sometimes you need multiple columns to uniquely identify a row:
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrollment_date DATE NOT NULL,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id)
);
This composite key ensures that each student can enroll in each course only once. The combination of (student_id, course_id) must be unique.
Adding Primary Key to Existing Table
-- Add primary key after table creation
ALTER TABLE students
ADD PRIMARY KEY (student_id);
-- Add auto-increment to existing column
ALTER TABLE students
MODIFY student_id INT AUTO_INCREMENT;
6. Common Primary Key Mistakes
| β Mistake | β Fix |
|---|---|
| No primary key at all | Always add a surrogate key (id AUTO_INCREMENT) |
| Using email as primary key | Email can change! Use surrogate key instead |
| Allowing NULL primary keys | Primary keys are automatically NOT NULL |
| Changing primary key values | Never update primary keys! They should be permanent |
| Using varchar(255) for IDs | Use INTβmuch faster and more compact |
| Complex composite keys (4+ columns) | Add a surrogate key for simplicity |
7. π― Practice Exercises
Exercise 1: Create a Products Table
Create a products table with the following requirements:
- Auto-incrementing primary key
- Product name (required)
- Product description
- Price (required, decimal)
- Stock quantity (required, default 0)
Show Solution
CREATE TABLE products (
product_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL,
stock_quantity INT NOT NULL DEFAULT 0
);
-- Verify the structure
DESCRIBE products;
Key Points:
- β product_id is auto-incrementing surrogate key
- β PRIMARY KEY ensures uniqueness and creates index
- β AUTO_INCREMENT automatically assigns next available ID
- β Required fields use NOT NULL
- β stock_quantity has DEFAULT value
Exercise 2: Fix a Broken Table
Someone created this table with several problems. Identify what's wrong and write the correct CREATE TABLE statement:
-- β Broken table
CREATE TABLE employees (
email VARCHAR(255),
first_name VARCHAR(100),
last_name VARCHAR(100),
hire_date DATE
);
Show Solution
Problems with Original Table:
- β No primary key! Cannot uniquely identify rows
- β Email shouldn't be primary key (can change)
- β No way to reference employees from other tables
- β Multiple employees can have same name
Fixed Version:
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(255) UNIQUE NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
hire_date DATE NOT NULL
);
-- Now you can reference employees:
-- SELECT * FROM employees WHERE employee_id = 42;
What We Fixed:
- β Added employee_id as surrogate primary key
- β Made email UNIQUE (no duplicates) but not primary key
- β Added NOT NULL constraints to required fields
- β Now each employee can be uniquely identified
Exercise 3: Composite Primary Key
Create a course_prerequisites table that tracks which courses require other courses as prerequisites. Each course can have multiple prerequisites, and the same prerequisite can be required by multiple courses. The combination of (course_id, prerequisite_course_id) should be unique.
Show Solution
CREATE TABLE course_prerequisites (
course_id INT NOT NULL,
prerequisite_course_id INT NOT NULL,
PRIMARY KEY (course_id, prerequisite_course_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id),
FOREIGN KEY (prerequisite_course_id) REFERENCES courses(course_id),
CHECK (course_id != prerequisite_course_id)
);
-- Example data:
-- CS201 requires CS101
-- INSERT INTO course_prerequisites VALUES (201, 101);
-- CS301 requires both CS201 and MATH150
-- INSERT INTO course_prerequisites VALUES (301, 201);
-- INSERT INTO course_prerequisites VALUES (301, 150);
Analysis:
- β Composite primary key (course_id, prerequisite_course_id)
- β Prevents duplicate prerequisite entries
- β Both columns reference the same courses table
- β CHECK prevents course from being its own prerequisite
- β This is a many-to-many self-referencing relationship
π₯ Challenge: Natural vs Surrogate Key Decision
You're designing a books table. You have ISBN (International Standard Book Number) which is unique to each book. Should you use ISBN as the primary key, or create a surrogate key book_id? Explain your reasoning and write the CREATE TABLE statement for your chosen approach.
Show Solution & Analysis
Option 1: ISBN as Natural Primary Key
CREATE TABLE books (
isbn VARCHAR(20) PRIMARY KEY,
title VARCHAR(255) NOT NULL,
author VARCHAR(255) NOT NULL,
publication_year INT
);
Advantages:
- β ISBN is globally unique and standardized
- β ISBN has real-world meaning
- β One less column in the database
Disadvantages:
- β VARCHAR(20) is slower than INT for joins
- β Takes more storage space (20 bytes vs 4 bytes)
- β Some books don't have ISBNs (old books, self-published)
- β ISBN format changed over time (ISBN-10 vs ISBN-13)
Option 2: Surrogate Key + ISBN
CREATE TABLE books (
book_id INT PRIMARY KEY AUTO_INCREMENT,
isbn VARCHAR(20) UNIQUE,
title VARCHAR(255) NOT NULL,
author VARCHAR(255) NOT NULL,
publication_year INT
);
Advantages:
- β INT is fast and compact
- β Handles books without ISBNs (isbn can be NULL)
- β Consistent pattern across all tables
- β Easy to reference in foreign keys
- β Still enforce ISBN uniqueness with UNIQUE constraint
β Best Practice: Use Option 2 (surrogate key + UNIQUE ISBN). This gives you the best of both worlds: fast integer joins and ISBN uniqueness enforcement. The ISBN remains searchable and unique, but the primary key is optimized for database operations.
π Key Takeaways
- Every table needs a primary key to uniquely identify rows
- Primary keys must be unique, not NULL, unchanging, and simple
- Primary keys enable relationships (foreign keys) between tables
- Primary keys are automatically indexed for fast lookups
- Surrogate keys (AUTO_INCREMENT IDs) are preferred over natural keys
- Use INT AUTO_INCREMENT for most primary keys
- Composite keys use multiple columns (rare, but sometimes necessary)
- Never use columns that can change or be NULL as primary keys
- Primary keys prevent duplicate data and enable efficient queries
- Without primary keys, you cannot reliably update, delete, or reference specific rows