Aggregate Functions - Analyzing Your Data
Up until now, you've been working with individual rows: "Show me Alice's GPA" or "Find students in Biology." But what if you want to answer bigger questions like "What's the average GPA across all students?" or "How many students are enrolled?" That's where aggregate functions come in! They take many rows and collapse them into a single summary value. Think of them as your data analysis powerhouse!
π What You'll Learn
- What aggregate functions are and why they're essential
- The "Big Five" aggregate functions: COUNT, SUM, AVG, MIN, MAX
- How aggregates collapse many rows into single summary values
- The critical difference between COUNT(*) and COUNT(column)
- How NULL values affect each aggregate function
- Why you can't mix aggregate and non-aggregate columns (without GROUP BY)
- Using aggregates with WHERE to filter before summarizing
- Combining multiple aggregates in one query
- Real-world business analytics scenarios
- Common beginner mistakes and how to avoid them
- Hands-on practice exercises
1. What Are Aggregate Functions?
Aggregate functions perform calculations across multiple rows and return a single result. They "aggregate" (combine) data into summaries.
π‘ Real-World Example: Imagine you're a teacher with 100 students. Instead of looking at each student's grade individually, you want to know: "What's the class average?" or "What's the highest score?" or "How many students passed?" Aggregate functions answer these "big picture" questions by processing all rows at once and giving you one answer.
Visual Example: From Many Rows to One Summary
Input: Individual Rows
| name | gpa |
|---|---|
| Alice | 3.8 |
| Bob | 3.5 |
| Carol | 3.9 |
| Diana | 3.2 |
Output: Single Summary
SELECT AVG(gpa)
FROM students;
| AVG(gpa) |
|---|
| 3.6 |
See the magic? Four rows collapsed into one number: the average GPA!
2. The Big Five Aggregate Functions
SQL has five essential aggregate functions you'll use constantly:
| Function | What It Does | Example Question |
|---|---|---|
| COUNT() | Counts rows | "How many students?" |
| SUM() | Adds numbers | "What's total sales?" |
| AVG() | Calculates average | "What's average GPA?" |
| MIN() | Finds smallest value | "What's lowest price?" |
| MAX() | Finds largest value | "What's highest score?" |
3. COUNT() - How Many Rows?
COUNT() is the most commonly used aggregate function. It counts rows.
COUNT(*) - Count All Rows
-- How many students are in the database?
SELECT COUNT(*) AS total_students
FROM students;
Result: If there are 150 students, you get: total_students: 150
π‘ What COUNT(*) Means: The asterisk (*) means "count every row, regardless of content." Even if a row has NULL values in every column, COUNT(*) still counts it because the row exists.
COUNT(column) - Count Non-NULL Values
-- How many students provided a phone number?
SELECT COUNT(phone) AS students_with_phone
FROM students;
This counts only rows where phone is NOT NULL.
The Critical Difference
COUNT(*)
SELECT COUNT(*)
FROM students;
Result: 4 (all rows)
COUNT(column)
SELECT COUNT(phone)
FROM students;
Result: 3 (Bob has NULL phone)
β οΈ Critical Point: COUNT(*) and COUNT(column) give DIFFERENT results! This is one of the most common interview questions. COUNT(*) counts rows, COUNT(column) counts non-NULL values in that column.
COUNT(DISTINCT column) - Count Unique Values
-- How many different majors do we have?
SELECT COUNT(DISTINCT major) AS unique_majors
FROM students;
If 100 students are divided among 5 majors, this returns 5 (not 100).
4. SUM() - What's the Total?
SUM() adds up all values in a numeric column.
-- What are our total sales?
SELECT SUM(sales_amount) AS total_sales
FROM orders;
Example with Sample Data:
Order Data
| order_id | amount |
|---|---|
| 1 | $250 |
| 2 | $175 |
| 3 | $500 |
| 4 | $325 |
SUM Result
SELECT SUM(amount)
FROM orders;
| SUM(amount) |
|---|
| $1,250 |
250 + 175 + 500 + 325 = 1,250
β οΈ Important: SUM() only works with numeric columns! Trying to SUM() text like names will cause an error.
5. AVG() - What's the Average?
AVG() calculates the average (mean) of numeric values.
-- What's the average GPA?
SELECT AVG(gpa) AS average_gpa
FROM students;
Example Calculation:
GPAs: 3.8, 3.5, 3.9, 3.2
Calculation: (3.8 + 3.5 + 3.9 + 3.2) Γ· 4 = 14.4 Γ· 4 = 3.6
How NULL Affects AVG()
AVG() ignores NULL values completely:
With NULL
GPAs: 3.8, 3.5, NULL, 3.9
AVG(): (3.8 + 3.5 + 3.9) Γ· 3 = 3.73
NULL is excluded from both sum and count!
Without NULL
GPAs: 3.8, 3.5, 3.2, 3.9
AVG(): (3.8 + 3.5 + 3.2 + 3.9) Γ· 4 = 3.6
All values included
6. MIN() and MAX() - Finding Extremes
MIN() finds the smallest value, MAX() finds the largest value.
Finding Minimum Values
-- What's the lowest GPA?
SELECT MIN(gpa) AS lowest_gpa
FROM students;
Finding Maximum Values
-- What's the highest price?
SELECT MAX(price) AS highest_price
FROM products;
MIN/MAX Work with Text Too!
-- First student alphabetically
SELECT MIN(name) AS first_name_alphabetically
FROM students;
-- Last student alphabetically
SELECT MAX(name) AS last_name_alphabetically
FROM students;
With names like Alice, Bob, Carol, Diana:
MIN(name) = "Alice" (first alphabetically)
MAX(name) = "Diana" (last alphabetically)
π‘ Pro Tip: MIN() and MAX() also ignore NULL values, just like SUM() and AVG().
7. Combining Multiple Aggregates in One Query
You can use multiple aggregate functions in a single SELECT statement:
-- Get complete GPA statistics
SELECT
COUNT(*) AS total_students,
AVG(gpa) AS average_gpa,
MIN(gpa) AS lowest_gpa,
MAX(gpa) AS highest_gpa
FROM students;
Result:
| total_students | average_gpa | lowest_gpa | highest_gpa |
|---|---|---|---|
| 150 | 3.45 | 2.1 | 4.0 |
One query, complete statistical summary! This is incredibly powerful for dashboards and reports.
8. Using WHERE to Filter Before Aggregating
WHERE filters rows before the aggregate function runs:
-- Average GPA of Biology majors only
SELECT AVG(gpa) AS bio_average_gpa
FROM students
WHERE major = 'Biology';
Execution Order:
- FROM: Get all students
- WHERE: Filter to Biology majors only
- SELECT AVG(): Calculate average of filtered rows
Practical Example: Sales Analysis
-- Total sales for 2024 only
SELECT
COUNT(*) AS orders_in_2024,
SUM(amount) AS total_revenue
FROM orders
WHERE YEAR(order_date) = 2024;
9. The Mixing Problem (Without GROUP BY)
This is one of the most confusing things for beginners:
-- β ERROR: Can't mix aggregate and non-aggregate columns!
SELECT name, AVG(gpa)
FROM students;
β Why This Fails: AVG(gpa) returns ONE number (the average). But name has 150 different values (one per student). SQL can't put 150 names next to 1 numberβthey're incompatible!
Think About It Logically:
Question: "Show me each student's name and the average GPA."
Problem: The average GPA is 3.6 for ALL students. Which name should appear next to it? Alice? Bob? All 150 names?
Answer: SQL doesn't know, so it throws an error!
The Solution: GROUP BY (Next Lesson!)
To mix aggregates with regular columns, you need GROUP BY. That's coming in the next lesson! For now, remember:
- β
Aggregates alone:
SELECT AVG(gpa) FROM students - β
Regular columns alone:
SELECT name FROM students - β Mixed without GROUP BY:
SELECT name, AVG(gpa) FROM students
10. Common Beginner Mistakes
Mistake #1: Confusing COUNT(*) and COUNT(column)
-- These give DIFFERENT results!
SELECT COUNT(*) FROM students; -- Counts all rows (4)
SELECT COUNT(phone) FROM students; -- Counts non-NULL phones (3)
Mistake #2: Using AVG() on Text
-- β ERROR: Can't average text!
SELECT AVG(name) FROM students;
-- β
CORRECT: Average numeric values
SELECT AVG(gpa) FROM students;
Mistake #3: Forgetting NULL Affects Results
-- If there are 100 students but 20 have NULL GPA:
SELECT AVG(gpa) FROM students;
-- Averages only the 80 non-NULL GPAs (not 100!)
Mistake #4: Mixing Aggregates with Regular Columns
-- β WRONG: Can't mix without GROUP BY
SELECT name, AVG(gpa) FROM students;
-- β
CORRECT: Just the aggregate
SELECT AVG(gpa) FROM students;
Mistake #5: Using Aggregate in WHERE
-- β WRONG: Can't use aggregate in WHERE
SELECT * FROM students WHERE gpa > AVG(gpa);
-- β
CORRECT: Use HAVING (covered in next lesson)
-- Or calculate separately in subquery
11. π― Hands-On Practice Exercises
Exercise 1: Count All Students
Task: Write a query to count the total number of students.
Show Answer
SELECT COUNT(*) AS total_students
FROM students;
Exercise 2: Calculate Average GPA
Task: Find the average GPA across all students.
Show Answer
SELECT AVG(gpa) AS average_gpa
FROM students;
Exercise 3: Find Highest and Lowest GPA
Task: Find both the highest and lowest GPAs in one query.
Show Answer
SELECT
MAX(gpa) AS highest_gpa,
MIN(gpa) AS lowest_gpa
FROM students;
Exercise 4: Count Students with Email
Task: Count how many students have provided an email address.
Show Answer
SELECT COUNT(email) AS students_with_email
FROM students;
Remember: COUNT(email) excludes NULL values!
Exercise 5: Complete Statistics
Task: Get complete GPA statistics: count, average, min, and max in one query.
Show Answer
SELECT
COUNT(*) AS total_students,
AVG(gpa) AS average_gpa,
MIN(gpa) AS lowest_gpa,
MAX(gpa) AS highest_gpa
FROM students;
Exercise 6: Filtered Aggregate
Task: Find the average GPA of Computer Science majors only.
Show Answer
SELECT AVG(gpa) AS cs_average_gpa
FROM students
WHERE major = 'Computer Science';
Challenge Exercise: COUNT Comparison
Task: Write a query that shows both total students and students with phone numbers.
Show Answer
SELECT
COUNT(*) AS total_students,
COUNT(phone) AS students_with_phone,
COUNT(*) - COUNT(phone) AS students_without_phone
FROM students;
Bonus: We calculated students WITHOUT phone by subtracting!
π Key Takeaways
- Aggregate functions collapse many rows into one summary
- COUNT(*) counts all rows, COUNT(column) counts non-NULL values
- SUM() adds numbers, AVG() calculates average
- MIN() finds smallest, MAX() finds largest
- All aggregates (except COUNT(*)) ignore NULL values
- You can combine multiple aggregates in one query
- Use WHERE to filter before aggregating
- Can't mix aggregates with regular columns without GROUP BY
- Aggregates work on numbers, but MIN/MAX also work on text
- Perfect for dashboards, reports, and data analysis!
π What's Next?
Excellent work! You now know how to summarize entire datasets into meaningful statistics. But what if you want to break down those summaries by category? For example, "average GPA per major" or "total sales per region"? That's where GROUP BY comes inβit's one of the most powerful SQL features!
Up next: GROUP BY - Grouping Data for Analysis