← Back to Lessons
BEGINNER ⏱ 35 minutes

Primary Keys: The Foundation of Data Integrity

Every well-designed table needs a primary keyβ€”a column (or set of columns) that uniquely identifies each row. Primary keys are the cornerstone of database integrity, enabling relationships between tables and ensuring that no two rows can be confused with each other. In this lesson, you'll learn what makes a good primary key, how to implement them, and why they're absolutely essential for professional database design.


πŸ“š What You'll Learn

1. What is a Primary Key?

A primary key is a column (or set of columns) that uniquely identifies each row in a table. No two rows can have the same primary key value, and the primary key cannot be NULL.

Example: Students Table

πŸ”‘ student_id name email major
100234 Alice Johnson alice@uni.edu Computer Science
100567 Bob Smith bob@uni.edu Biology
100891 Carol Davis carol@uni.edu Mathematics

In this table, student_id is the primary key. Each student has a unique ID that will never change and will never be shared with another student.

2. The Four Golden Rules of Primary Keys

For a column to serve as a primary key, it must follow these four essential rules:

Rule 1: Uniqueness

Every value must be unique. No two rows can have the same primary key value.

Rule 2: Not NULL

Primary keys cannot be NULL. Every row must have a valid primary key value.

Rule 3: Unchanging

Primary keys should never change. Once assigned, they remain constant for the life of the row.

Rule 4: Simple

Keep it simple. Prefer single columns when possible. Avoid complex composite keys unless necessary.

❌ What Makes a BAD Primary Key?

Bad Choice Why It's Bad
Name ❌ Not unique (multiple people can have the same name)
Email ❌ Can change (people change email addresses)
Phone Number ❌ Can change and can be NULL (some people don't provide phone)
Address ❌ Not unique and changes frequently

3. Why Every Table MUST Have a Primary Key

Reason 1: Uniquely Identify Rows

Without a primary key, you cannot reliably reference a specific row. Imagine trying to update "the student named John Smith" when you have three students with that name!

❌ Without Primary Key:

UPDATE students 
SET major = 'Data Science' 
WHERE name = 'John Smith';

-- Which John Smith? All three get updated! 😱

βœ… With Primary Key:

UPDATE students 
SET major = 'Data Science' 
WHERE student_id = 100234;

-- Exactly one row updated! βœ“

Reason 2: Enable Relationships (Foreign Keys)

Foreign keys reference primary keys. Without primary keys, you cannot create relationships between tables!

students (πŸ”‘ student_id) ←→ enrollments (πŸ”— student_id FK)

enrollments.student_id references students.student_id

Reason 3: Fast Lookups and Joins

Databases automatically create an index on the primary key, making queries extremely fast. Looking up a row by primary key is nearly instantaneous, even in tables with millions of rows.

Query Type Speed
WHERE student_id = 100234 ⚑ Super fast (indexed)
WHERE name = 'Alice Johnson' ⚠️ Slower (full table scan)

Reason 4: Prevent Duplicate Data

The database enforces uniqueness automatically. You cannot accidentally insert two students with the same student_idβ€”the database will reject it.

4. Natural Keys vs. Surrogate Keys

Natural Keys: Using Real-World Identifiers

A natural key is a column that has real business meaning. Examples: Social Security Number, ISBN, License Plate, Employee Badge Number.

βœ… Advantages of Natural Keys:

❌ Disadvantages of Natural Keys:

Surrogate Keys: Database-Generated IDs

A surrogate key is an artificial identifier created by the database, usually an auto-incrementing integer. Examples: student_id, order_id, product_id.

βœ… Advantages of Surrogate Keys:

❌ Disadvantages of Surrogate Keys:

βœ… Best Practice: Use surrogate keys (auto-increment IDs) as primary keys in most cases. They're simple, reliable, and follow all four golden rules perfectly.

5. Creating Primary Keys in SQL

Method 1: Define at Column Level

CREATE TABLE students (
    student_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    major VARCHAR(100)
);

Method 2: Define at Table Level (More Flexible)

CREATE TABLE students (
    student_id INT AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    major VARCHAR(100),
    PRIMARY KEY (student_id)
);

Composite Primary Key (Multiple Columns)

Sometimes you need multiple columns to uniquely identify a row:

CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE NOT NULL,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id)
);

This composite key ensures that each student can enroll in each course only once. The combination of (student_id, course_id) must be unique.

Adding Primary Key to Existing Table

-- Add primary key after table creation
ALTER TABLE students
ADD PRIMARY KEY (student_id);

-- Add auto-increment to existing column
ALTER TABLE students
MODIFY student_id INT AUTO_INCREMENT;

6. Common Primary Key Mistakes

❌ Mistake βœ… Fix
No primary key at all Always add a surrogate key (id AUTO_INCREMENT)
Using email as primary key Email can change! Use surrogate key instead
Allowing NULL primary keys Primary keys are automatically NOT NULL
Changing primary key values Never update primary keys! They should be permanent
Using varchar(255) for IDs Use INTβ€”much faster and more compact
Complex composite keys (4+ columns) Add a surrogate key for simplicity

7. 🎯 Practice Exercises

Exercise 1: Create a Products Table

Create a products table with the following requirements:

  • Auto-incrementing primary key
  • Product name (required)
  • Product description
  • Price (required, decimal)
  • Stock quantity (required, default 0)
Show Solution
CREATE TABLE products (
    product_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INT NOT NULL DEFAULT 0
);

-- Verify the structure
DESCRIBE products;

Key Points:

  • βœ… product_id is auto-incrementing surrogate key
  • βœ… PRIMARY KEY ensures uniqueness and creates index
  • βœ… AUTO_INCREMENT automatically assigns next available ID
  • βœ… Required fields use NOT NULL
  • βœ… stock_quantity has DEFAULT value

Exercise 2: Fix a Broken Table

Someone created this table with several problems. Identify what's wrong and write the correct CREATE TABLE statement:

-- ❌ Broken table
CREATE TABLE employees (
    email VARCHAR(255),
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    hire_date DATE
);
Show Solution

Problems with Original Table:

  • ❌ No primary key! Cannot uniquely identify rows
  • ❌ Email shouldn't be primary key (can change)
  • ❌ No way to reference employees from other tables
  • ❌ Multiple employees can have same name

Fixed Version:

CREATE TABLE employees (
    employee_id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(255) UNIQUE NOT NULL,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    hire_date DATE NOT NULL
);

-- Now you can reference employees:
-- SELECT * FROM employees WHERE employee_id = 42;

What We Fixed:

  • βœ… Added employee_id as surrogate primary key
  • βœ… Made email UNIQUE (no duplicates) but not primary key
  • βœ… Added NOT NULL constraints to required fields
  • βœ… Now each employee can be uniquely identified

Exercise 3: Composite Primary Key

Create a course_prerequisites table that tracks which courses require other courses as prerequisites. Each course can have multiple prerequisites, and the same prerequisite can be required by multiple courses. The combination of (course_id, prerequisite_course_id) should be unique.

Show Solution
CREATE TABLE course_prerequisites (
    course_id INT NOT NULL,
    prerequisite_course_id INT NOT NULL,
    PRIMARY KEY (course_id, prerequisite_course_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id),
    FOREIGN KEY (prerequisite_course_id) REFERENCES courses(course_id),
    CHECK (course_id != prerequisite_course_id)
);

-- Example data:
-- CS201 requires CS101
-- INSERT INTO course_prerequisites VALUES (201, 101);
-- CS301 requires both CS201 and MATH150
-- INSERT INTO course_prerequisites VALUES (301, 201);
-- INSERT INTO course_prerequisites VALUES (301, 150);

Analysis:

  • βœ… Composite primary key (course_id, prerequisite_course_id)
  • βœ… Prevents duplicate prerequisite entries
  • βœ… Both columns reference the same courses table
  • βœ… CHECK prevents course from being its own prerequisite
  • βœ… This is a many-to-many self-referencing relationship

πŸ”₯ Challenge: Natural vs Surrogate Key Decision

You're designing a books table. You have ISBN (International Standard Book Number) which is unique to each book. Should you use ISBN as the primary key, or create a surrogate key book_id? Explain your reasoning and write the CREATE TABLE statement for your chosen approach.

Show Solution & Analysis

Option 1: ISBN as Natural Primary Key

CREATE TABLE books (
    isbn VARCHAR(20) PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255) NOT NULL,
    publication_year INT
);

Advantages:

  • βœ… ISBN is globally unique and standardized
  • βœ… ISBN has real-world meaning
  • βœ… One less column in the database

Disadvantages:

  • ❌ VARCHAR(20) is slower than INT for joins
  • ❌ Takes more storage space (20 bytes vs 4 bytes)
  • ❌ Some books don't have ISBNs (old books, self-published)
  • ❌ ISBN format changed over time (ISBN-10 vs ISBN-13)

Option 2: Surrogate Key + ISBN

CREATE TABLE books (
    book_id INT PRIMARY KEY AUTO_INCREMENT,
    isbn VARCHAR(20) UNIQUE,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255) NOT NULL,
    publication_year INT
);

Advantages:

  • βœ… INT is fast and compact
  • βœ… Handles books without ISBNs (isbn can be NULL)
  • βœ… Consistent pattern across all tables
  • βœ… Easy to reference in foreign keys
  • βœ… Still enforce ISBN uniqueness with UNIQUE constraint

βœ… Best Practice: Use Option 2 (surrogate key + UNIQUE ISBN). This gives you the best of both worlds: fast integer joins and ISBN uniqueness enforcement. The ISBN remains searchable and unique, but the primary key is optimized for database operations.

πŸ“ Key Takeaways

← Previous: Schema Design
Next: Foreign Keys β†’