Inner Joins
Learning Objectives
- You can write
INNER JOINqueries that follow the relationships in a schema. - You can explain why non-matching rows disappear in an inner join.
In this chapter, we look into the SQL syntax for combining related rows from multiple tables. The most basic way to do that is with an inner join, which combines rows when matches are required on both sides.
Start with One Relationship
Suppose the question is: which user made each submission?
To answer this, we need to combine information from two tables: users and submissions. The structure of the tables is:
users(id, name, email)submissions(id, exercise_id, user_id, submitted_at, score, status)
An INNER JOIN query starts with one table and then joins the other table by following the relationship between them. In this case, the relationship is that submissions.user_id refers to users.id. The query looks like this:
SELECT
s.id,
s.status,
s.score,
u.name
FROM submissions AS s
INNER JOIN users AS u ON s.user_id = u.id;
A useful way to read this is: start from submissions, match each submission to its user, and then return columns from the combined result.
Joined Result Rows
After writing a join, it is worth pausing and asking: what does one result row represent now?
In the query above, one result row still represents one submission. The difference is that the row now also carries related user information.
A possible result could look like this:
| id | status | score | name |
|---|---|---|---|
| 10 | graded | 9 | Aino |
| 11 | submitted | NULL | Leevi |
That row meaning matters. If you lose track of it, joins quickly become confusing.
On the ON Condition
The ON condition explains how rows from the two tables are connected. It follows the relationship in the schema. In this case, the relationship is that submissions.user_id refers to users.id, so the ON condition is s.user_id = u.id.
It is useful to contrast that with a condition that looks similar but does not follow the relationship. For example, s.id = u.id would usually be wrong, because it compares the submission identifier to the user identifier instead of following the foreign key stored in s.user_id.
A common beginner mistake is to treat a join as “attach another table somehow.” A safer interpretation is that every join should follow one specific relationship.
If the relationship is unclear, the ON condition is likely to be wrong too.
Non-Matching Rows
The word INNER answers another important question: what happens if one row does not find a match?
With an inner join, only matching pairs survive.
So if one submission row had user_id = 999 and no user existed with id = 999, that submission would disappear from the result.
Inner joins are useful when the question only cares about matching pairs of related rows.
Adding a Join at a Time
The same query pattern extends to longer paths. For example, suppose the question is: which exercises has each user submitted?
The path here starts from users and follows two relationships: first to submissions, then to exercises. In words:
users -> submissions -> exercises
Once the path is clear, the SQL follows it almost literally:
SELECT
u.name,
s.id,
e.title AS exercise_title
FROM users AS u
INNER JOIN submissions AS s ON s.user_id = u.id
INNER JOIN exercises AS e ON e.id = s.exercise_id;
The pattern is still the same: start from one table, follow one relationship, and then follow the next relationship. Each INNER JOIN ... ON ... line corresponds to one step in the path.
This query can be read in ordinary language: start from users, connect each user to their submissions, connect each submission to its exercise, and then show the selected columns.
One result row now represents one submission, together with information about the user who made it and the exercise it was made for.
That reading is more useful than treating the query as a pile of clauses. It keeps the SQL connected to the relationship path.
Common Join Mistakes
One of the most common join mistakes is a missing or wrong ON condition. When a join does not have any ON condition, the result is a Cartesian product, which combines every row from one table with every row from the other table. That usually produces far too many rows.
A Cartesian product is a mathematical operation that combines every element of one set with every element of another set. In SQL, if you write
FROM A, Bwithout a join condition, you get the Cartesian product of tables A and B.
A join bug often shows up first as a symptom in the result: too many rows, repeated combinations, or values that should never appear together.
Knowing the data helps. If you know that there are 100 users and 100 submissions, then a result with 10,000 rows is a clear sign of a missing or wrong ON condition. Even if you do not know the exact numbers, a result that seems far too large is a strong signal to check the join conditions.
Similarly, a combination of values that should not appear together is a sign of a wrong ON condition. For example, if the result contains a row with name = 'Aino' and exercise_title = 'Linked Lists', but Aino has never submitted that exercise, then the join conditions are likely wrong.
Check Your Understanding
- What rows survive an inner join?
- Why can a wrong
ONcondition produce far too many rows? - Why is it useful to ask what one joined result row represents?
SQL Programming Exercise
The following short SQL sequence practices the core ideas from this chapter: following one relationship with INNER JOIN, then extending the same pattern to a slightly longer join path.