Indexes and Query Optimization
Indexes are data structures that dramatically speed up data retrieval at the cost of slower writes and additional storage. Think of an index like a book's index: instead of reading every page to find "SQL," you look in the index and jump directly to the right pages. Without indexes, databases perform table scansβreading every row to find matches. With proper indexes, queries that took seconds complete in milliseconds. Understanding when and how to create indexes, recognizing query patterns that benefit from indexing, and using EXPLAIN to analyze query execution are essential skills for building performant applications. This lesson covers index types, creation strategies, and optimization techniques that transform slow queries into fast ones.
π What You'll Learn
- What indexes are and how they work
- How to create and drop indexes
- Different index types (PRIMARY, UNIQUE, INDEX, FULLTEXT)
- How to create composite (multi-column) indexes
- When to use indexes and when not to
- How to use EXPLAIN to analyze queries
- How to identify slow queries
- Index optimization strategies
- Common indexing mistakes to avoid
1. Understanding Indexes
An index is a data structure that improves the speed of data retrieval operations. Indexes store a sorted copy of selected columns along with pointers to the full rows, allowing the database to find data without scanning every row.
Without Index (Table Scan)
-- Query: Find customer with email 'alice@example.com'
SELECT * FROM customers WHERE email = 'alice@example.com';
Without Index:
1. Read row 1 β check email β not a match
2. Read row 2 β check email β not a match
3. Read row 3 β check email β not a match
...
1000. Read row 1000 β check email β MATCH!
Result: Scanned 1,000 rows to find 1 record
Time: ~100ms
With Index
With Index on email:
1. Look up 'alice@example.com' in sorted index β found at position
2. Follow pointer to row 1000
3. Return row 1000
Result: Found in ~3 lookups (binary search)
Time: ~1ms (100x faster!)
The Trade-off
| Operation | Without Index | With Index |
|---|---|---|
| SELECT (search) | β Slow (table scan) | β Fast (index lookup) |
| INSERT | β Fast | β οΈ Slower (update index) |
| UPDATE | β Fast | β οΈ Slower (update index) |
| DELETE | β Fast | β οΈ Slower (update index) |
| Storage | β Less space | β οΈ More space |
2. Creating Indexes
Basic Index Syntax
-- Create index on single column
CREATE INDEX index_name ON table_name(column_name);
-- Example: Index on email column
CREATE INDEX idx_email ON customers(email);
Index Created During Table Creation
-- Indexes can be created with the table
CREATE TABLE customers (
customer_id INT PRIMARY KEY, -- Automatic index
email VARCHAR(255) UNIQUE, -- Automatic index
name VARCHAR(100),
country VARCHAR(50),
INDEX idx_country (country), -- Explicit index
INDEX idx_name (name) -- Explicit index
);
Adding Index to Existing Table
-- Add index after table creation
CREATE INDEX idx_order_date ON orders(order_date);
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_customer_id ON orders(customer_id);
3. Types of Indexes
PRIMARY KEY (Clustered Index)
-- PRIMARY KEY creates unique index
CREATE TABLE products (
product_id INT PRIMARY KEY, -- Automatic unique index
name VARCHAR(100)
);
-- Only one PRIMARY KEY per table
-- Cannot contain NULL values
-- Physically organizes data on disk
UNIQUE Index
-- UNIQUE index prevents duplicate values
CREATE UNIQUE INDEX idx_email ON customers(email);
-- Or during table creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE -- Automatic unique index
);
-- Allows one NULL value (in most databases)
Regular (Non-Unique) Index
-- Regular index allows duplicates
CREATE INDEX idx_country ON customers(country);
CREATE INDEX idx_status ON orders(status);
-- Most common type of index
-- Speeds up searches on that column
FULLTEXT Index
-- FULLTEXT index for text searching
CREATE FULLTEXT INDEX idx_description ON products(description);
-- Use with MATCH...AGAINST
SELECT * FROM products
WHERE MATCH(description) AGAINST('wireless bluetooth');
-- Only works on TEXT/CHAR columns
-- Supports natural language and boolean searches
4. Composite (Multi-Column) Indexes
Composite indexes include multiple columns in a specific order. The order matters! MySQL uses the index left-to-right.
Creating Composite Index
-- Composite index on multiple columns
CREATE INDEX idx_country_city ON customers(country, city);
CREATE INDEX idx_date_status ON orders(order_date, status);
CREATE INDEX idx_category_price ON products(category_id, price);
Leftmost Prefix Rule
-- Index: idx_country_city on (country, city)
-- β
Can use index (leftmost column)
SELECT * FROM customers WHERE country = 'USA';
-- β
Can use index (both columns)
SELECT * FROM customers WHERE country = 'USA' AND city = 'Austin';
-- β Cannot use index efficiently (skips leftmost column)
SELECT * FROM customers WHERE city = 'Austin';
-- The index MUST start from the left!
-- Think of phone book: sorted by Last, First
-- You can search: Smith, Smith John
-- You CANNOT search: John (without last name)
Column Order Matters
-- Index on (country, city, zip)
β
WHERE country = 'USA'
β
WHERE country = 'USA' AND city = 'Austin'
β
WHERE country = 'USA' AND city = 'Austin' AND zip = '78701'
β
WHERE country = 'USA' AND zip = '78701' -- Uses country only
β WHERE city = 'Austin'
β WHERE zip = '78701'
β WHERE city = 'Austin' AND zip = '78701'
Put most selective/frequently-queried columns first!
5. When to Create Indexes
β Good Candidates for Indexes
- β Columns in WHERE clauses (frequently searched)
- β Columns in JOIN conditions (foreign keys)
- β Columns in ORDER BY clauses
- β Columns in GROUP BY clauses
- β Columns with high selectivity (many unique values)
- β Large tables (1000+ rows)
β Bad Candidates for Indexes
- β Small tables (< 1000 rows) - table scan is faster
- β Columns with low selectivity (few unique values like boolean)
- β Columns rarely used in queries
- β Tables with frequent INSERT/UPDATE/DELETE
- β Too many indexes (slows writes, wastes space)
Examples of Good Indexes
-- β
Foreign keys (for JOINs)
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_product_id ON order_items(product_id);
-- β
Frequently searched columns
CREATE INDEX idx_email ON customers(email);
CREATE INDEX idx_order_date ON orders(order_date);
-- β
Columns used in WHERE
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_country ON customers(country);
-- β
Composite for common query patterns
CREATE INDEX idx_date_status ON orders(order_date, status);
6. Using EXPLAIN to Analyze Queries
EXPLAIN shows how MySQL executes a query. It reveals whether indexes are being used, how many rows are scanned, and where optimizations are needed.
Basic EXPLAIN
-- Add EXPLAIN before any SELECT
EXPLAIN SELECT * FROM orders WHERE customer_id = 1001;
EXPLAIN SELECT
c.name,
COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.name;
Reading EXPLAIN Output
Key columns in EXPLAIN output:
type: Join type (how table is accessed)
- system: Only 1 row (best)
- const: PRIMARY KEY or UNIQUE lookup (excellent)
- eq_ref: One row per JOIN (excellent)
- ref: Multiple rows match index (good)
- range: Index range scan (acceptable)
- index: Full index scan (poor)
- ALL: Full table scan (worst)
possible_keys: Indexes that COULD be used
key: Index that WAS actually used (NULL = no index)
rows: Estimated rows to scan (lower is better)
Extra: Additional information
- Using index: Covering index (excellent)
- Using where: WHERE filter applied
- Using filesort: Sort without index (slow)
- Using temporary: Temp table created (slow)
Example EXPLAIN Analysis
-- Query without index
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-01';
-- Result:
-- type: ALL
-- key: NULL
-- rows: 10000
-- Extra: Using where
-- Problem: Full table scan! Scanning all 10,000 rows
-- Create index
CREATE INDEX idx_order_date ON orders(order_date);
-- Query with index
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-01';
-- Result:
-- type: ref
-- key: idx_order_date
-- rows: 50
-- Extra: Using index condition
-- Success! Using index, only scanning 50 rows
7. Query Optimization Strategies
1. Index Foreign Keys
-- Always index columns used in JOINs
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_order_id ON order_items(order_id);
CREATE INDEX idx_product_id ON order_items(product_id);
-- This makes JOINs dramatically faster
2. Use Covering Indexes
-- Covering index: includes ALL columns in query
-- Database can answer query from index alone (no table lookup)
-- Query: SELECT name, email FROM customers WHERE country = 'USA'
-- Without covering index:
-- 1. Look up in country index
-- 2. Fetch name and email from table
-- With covering index:
CREATE INDEX idx_country_name_email ON customers(country, name, email);
-- 1. Look up in index
-- 2. Return results (no table access!)
-- EXPLAIN shows: Extra: Using index
3. Optimize WHERE Clauses
-- β BAD: Function on indexed column prevents index use
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
-- β
GOOD: Use range instead
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';
-- β BAD: Leading wildcard prevents index use
SELECT * FROM customers WHERE email LIKE '%@gmail.com';
-- β
GOOD: No leading wildcard
SELECT * FROM customers WHERE email LIKE 'john%';
4. Limit Result Sets
-- Use LIMIT to reduce data transfer
SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;
-- Paginate large results
SELECT * FROM products
ORDER BY product_id
LIMIT 100 OFFSET 200; -- Page 3 (100 per page)
5. Avoid SELECT *
-- β BAD: Fetches all columns
SELECT * FROM customers WHERE country = 'USA';
-- β
GOOD: Only fetch needed columns
SELECT customer_id, name, email FROM customers WHERE country = 'USA';
-- Benefits:
-- - Less data transfer
-- - May use covering index
-- - Clearer intent
8. Managing Indexes
Viewing Indexes
-- Show all indexes on a table
SHOW INDEX FROM customers;
SHOW INDEX FROM orders;
-- Or using information_schema
SELECT
TABLE_NAME,
INDEX_NAME,
COLUMN_NAME,
NON_UNIQUE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
AND TABLE_NAME = 'customers';
Dropping Indexes
-- Drop an index
DROP INDEX idx_country ON customers;
-- Alternative syntax
ALTER TABLE customers DROP INDEX idx_country;
-- Drop primary key (rare, requires no foreign key references)
ALTER TABLE table_name DROP PRIMARY KEY;
Renaming Indexes
-- MySQL 5.7+
ALTER TABLE customers RENAME INDEX old_name TO new_name;
9. Common Indexing Mistakes
Mistake 1: Too Many Indexes
-- β BAD: Index on every column
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(100),
description TEXT,
price DECIMAL(10,2),
cost DECIMAL(10,2),
stock INT,
category_id INT,
INDEX idx_name (name), -- Maybe needed
INDEX idx_description (description), -- Probably not needed
INDEX idx_price (price), -- Maybe needed
INDEX idx_cost (cost), -- Probably not needed
INDEX idx_stock (stock), -- Low selectivity
INDEX idx_category (category_id) -- Probably needed
);
-- Problem: Every INSERT/UPDATE/DELETE must update 6 indexes!
-- Solution: Only index columns you actually query frequently
Mistake 2: Duplicate/Redundant Indexes
-- β BAD: Redundant indexes
CREATE INDEX idx_country ON customers(country);
CREATE INDEX idx_country_city ON customers(country, city);
-- idx_country is redundant! idx_country_city can handle both:
-- WHERE country = 'USA' (uses idx_country_city)
-- WHERE country = 'USA' AND city = 'Austin'
-- β
GOOD: Keep only the composite index
DROP INDEX idx_country;
-- Keep idx_country_city
Mistake 3: Wrong Column Order
-- Common query: WHERE status = 'active' AND category_id = 5
-- β BAD: Low selectivity first
CREATE INDEX idx_status_category ON products(status, category_id);
-- status has only 2 values (active/inactive) - not selective
-- β
GOOD: High selectivity first
CREATE INDEX idx_category_status ON products(category_id, status);
-- category_id has many values - more selective
-- Rule: Put most selective columns first
10. π― Practice Exercises
Exercise 1: Basic Index Creation
Given this schema:
customers (customer_id, name, email, country, registration_date)
orders (order_id, customer_id, order_date, total, status)
products (product_id, name, price, category_id, stock)
Tasks:
- Create indexes for all foreign key columns
- Create indexes for frequently-searched columns (email, order_date, status)
- Create a composite index for common query: WHERE country = ? AND registration_date > ?
Show Solution
-- Task 1: Index foreign keys (for JOINs)
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_category_id ON products(category_id);
-- Task 2: Index frequently-searched columns
CREATE INDEX idx_email ON customers(email);
CREATE INDEX idx_order_date ON orders(order_date);
CREATE INDEX idx_status ON orders(status);
-- Task 3: Composite index for common query pattern
CREATE INDEX idx_country_regdate ON customers(country, registration_date);
-- This index supports:
-- WHERE country = 'USA'
-- WHERE country = 'USA' AND registration_date > '2024-01-01'
-- Verify with EXPLAIN
EXPLAIN SELECT * FROM customers
WHERE country = 'USA' AND registration_date > '2024-01-01';
Exercise 2: Analyzing Slow Queries
You have these slow queries:
-- Query 1: Takes 5 seconds
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- Query 2: Takes 3 seconds
SELECT c.name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.name;
-- Query 3: Takes 8 seconds
SELECT * FROM products WHERE category_id = 5 AND price > 100;
Tasks:
- Use EXPLAIN to analyze each query
- Create appropriate indexes
- Verify improvement with EXPLAIN after adding indexes
Show Solution
-- Query 1 Analysis
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- Result: type: ALL, key: NULL (full table scan)
-- Solution: Index on order_date
CREATE INDEX idx_order_date ON orders(order_date);
-- Verify
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- Result: type: range, key: idx_order_date (uses index!)
-- Query 2 Analysis
EXPLAIN SELECT c.name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.name;
-- Result: JOIN on orders shows type: ALL (full scan on every customer)
-- Solution: Index foreign key
CREATE INDEX idx_customer_id ON orders(customer_id);
-- Verify
EXPLAIN SELECT c.name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.name;
-- Result: type: ref, key: idx_customer_id (uses index for JOIN!)
-- Query 3 Analysis
EXPLAIN SELECT * FROM products
WHERE category_id = 5 AND price > 100;
-- Result: type: ALL, key: NULL (full table scan)
-- Solution: Composite index (most selective first)
CREATE INDEX idx_category_price ON products(category_id, price);
-- Verify
EXPLAIN SELECT * FROM products
WHERE category_id = 5 AND price > 100;
-- Result: type: range, key: idx_category_price (uses composite index!)
Performance Improvements:
- β Query 1: 5 seconds β 50ms (100x faster)
- β Query 2: 3 seconds β 200ms (15x faster)
- β Query 3: 8 seconds β 10ms (800x faster)
π₯ Challenge: Optimize E-commerce Database
You have an e-commerce database with performance issues:
customers (customer_id, name, email, country, city, registration_date)
orders (order_id, customer_id, order_date, total, status)
order_items (item_id, order_id, product_id, quantity, price)
products (product_id, name, category_id, price, stock, status)
categories (category_id, name)
Common Queries:
- Find orders by customer email
- Get active products in specific category sorted by price
- Daily sales reports: WHERE order_date = ? AND status = 'completed'
- Customer list: WHERE country = ? AND city = ?
Challenge: Create optimal indexes for these query patterns.
Show Solution
-- 1. Index foreign keys (always!)
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_order_id ON order_items(order_id);
CREATE INDEX idx_product_id ON order_items(product_id);
CREATE INDEX idx_category_id ON products(category_id);
-- 2. For: Find orders by customer email
-- Need to JOIN customers β orders via email
CREATE INDEX idx_email ON customers(email);
-- 3. For: Active products in category, sorted by price
-- Composite: category_id, status, price (in that order)
CREATE INDEX idx_cat_status_price ON products(category_id, status, price);
-- This is a "covering index" for:
-- WHERE category_id = ? AND status = 'active' ORDER BY price
-- 4. For: Daily sales WHERE date = ? AND status = ?
-- Composite index on both columns
CREATE INDEX idx_date_status ON orders(order_date, status);
-- 5. For: Customers by country and city
-- Composite index
CREATE INDEX idx_country_city ON customers(country, city);
-- 6. Additional optimization: Registration date ranges
CREATE INDEX idx_registration_date ON customers(registration_date);
-- Verify with EXPLAIN
-- Query 1: Orders by email
EXPLAIN SELECT o.*
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.email = 'alice@example.com';
-- Uses: idx_email, idx_customer_id
-- Query 2: Active products in category, sorted by price
EXPLAIN SELECT * FROM products
WHERE category_id = 5 AND status = 'active'
ORDER BY price;
-- Uses: idx_cat_status_price (covering index!)
-- Query 3: Daily sales
EXPLAIN SELECT * FROM orders
WHERE order_date = '2024-12-18' AND status = 'completed';
-- Uses: idx_date_status
-- Query 4: Customers by location
EXPLAIN SELECT * FROM customers
WHERE country = 'USA' AND city = 'Austin';
-- Uses: idx_country_city
Index Strategy Summary:
- β Total indexes: 10 (reasonable for this schema)
- β All foreign keys indexed (for JOINs)
- β Composite indexes match query patterns
- β Column order optimized for selectivity
- β One covering index for best performance
- π‘ Expected improvement: 10-100x faster queries!
π Key Takeaways
- Indexes dramatically speed up SELECT but slow down INSERT/UPDATE/DELETE
- Without indexes, databases perform full table scans
- Create indexes with CREATE INDEX index_name ON table(column)
- PRIMARY KEY and UNIQUE automatically create indexes
- Composite indexes include multiple columns in specific order
- Leftmost prefix rule: composite indexes work left-to-right only
- Index foreign keys for JOIN performance
- Index columns in WHERE, ORDER BY, GROUP BY clauses
- Use EXPLAIN to analyze query execution
- Look for type: ALL (bad) vs type: ref/range (good)
- Avoid too many indexes (slows writes)
- High selectivity columns make better indexes
- Put most selective columns first in composite indexes
- Covering indexes include all queried columns (fastest)
- Proper indexing can make queries 10-1000x faster