Data in Multiple Tables

Inner Joins


Learning Objectives

  • You can write INNER JOIN queries that follow the relationships in a schema.
  • You can explain why non-matching rows disappear in an inner join.

In this chapter, we look into the SQL syntax for combining related rows from multiple tables. The most basic way to do that is with an inner join, which combines rows when matches are required on both sides.

Start with One Relationship

Suppose the question is: which user made each submission?

To answer this, we need to combine information from two tables: users and submissions. The structure of the tables is:

  • users(id, name, email)
  • submissions(id, exercise_id, user_id, submitted_at, score, status)

An INNER JOIN query starts with one table and then joins the other table by following the relationship between them. In this case, the relationship is that submissions.user_id refers to users.id. The query looks like this:

SELECT
  s.id,
  s.status,
  s.score,
  u.name
FROM submissions AS s
INNER JOIN users AS u ON s.user_id = u.id;

A useful way to read this is: start from submissions, match each submission to its user, and then return columns from the combined result.

Joined Result Rows

After writing a join, it is worth pausing and asking: what does one result row represent now?

In the query above, one result row still represents one submission. The difference is that the row now also carries related user information.

A possible result could look like this:

idstatusscorename
10graded9Aino
11submittedNULLLeevi

That row meaning matters. If you lose track of it, joins quickly become confusing.

Loading Exercise...

On the ON Condition

The ON condition explains how rows from the two tables are connected. It follows the relationship in the schema. In this case, the relationship is that submissions.user_id refers to users.id, so the ON condition is s.user_id = u.id.

It is useful to contrast that with a condition that looks similar but does not follow the relationship. For example, s.id = u.id would usually be wrong, because it compares the submission identifier to the user identifier instead of following the foreign key stored in s.user_id.

A common beginner mistake is to treat a join as “attach another table somehow.” A safer interpretation is that every join should follow one specific relationship.

If the relationship is unclear, the ON condition is likely to be wrong too.

Loading Exercise...

Non-Matching Rows

The word INNER answers another important question: what happens if one row does not find a match?

With an inner join, only matching pairs survive.

So if one submission row had user_id = 999 and no user existed with id = 999, that submission would disappear from the result.

Inner joins are useful when the question only cares about matching pairs of related rows.

Loading Exercise...

Adding a Join at a Time

The same query pattern extends to longer paths. For example, suppose the question is: which exercises has each user submitted?

The path here starts from users and follows two relationships: first to submissions, then to exercises. In words:

  • users -> submissions -> exercises

Once the path is clear, the SQL follows it almost literally:

SELECT
  u.name,
  s.id,
  e.title AS exercise_title
FROM users AS u
INNER JOIN submissions AS s ON s.user_id = u.id
INNER JOIN exercises AS e ON e.id = s.exercise_id;

The pattern is still the same: start from one table, follow one relationship, and then follow the next relationship. Each INNER JOIN ... ON ... line corresponds to one step in the path.

This query can be read in ordinary language: start from users, connect each user to their submissions, connect each submission to its exercise, and then show the selected columns.

One result row now represents one submission, together with information about the user who made it and the exercise it was made for.

That reading is more useful than treating the query as a pile of clauses. It keeps the SQL connected to the relationship path.

Common Join Mistakes

One of the most common join mistakes is a missing or wrong ON condition. When a join does not have any ON condition, the result is a Cartesian product, which combines every row from one table with every row from the other table. That usually produces far too many rows.

A Cartesian product is a mathematical operation that combines every element of one set with every element of another set. In SQL, if you write FROM A, B without a join condition, you get the Cartesian product of tables A and B.

A join bug often shows up first as a symptom in the result: too many rows, repeated combinations, or values that should never appear together.

Knowing the data helps. If you know that there are 100 users and 100 submissions, then a result with 10,000 rows is a clear sign of a missing or wrong ON condition. Even if you do not know the exact numbers, a result that seems far too large is a strong signal to check the join conditions.

Similarly, a combination of values that should not appear together is a sign of a wrong ON condition. For example, if the result contains a row with name = 'Aino' and exercise_title = 'Linked Lists', but Aino has never submitted that exercise, then the join conditions are likely wrong.

Loading Exercise...

Check Your Understanding

  1. What rows survive an inner join?
  2. Why can a wrong ON condition produce far too many rows?
  3. Why is it useful to ask what one joined result row represents?

SQL Programming Exercise

The following short SQL sequence practices the core ideas from this chapter: following one relationship with INNER JOIN, then extending the same pattern to a slightly longer join path.

Loading Exercise...