Data in Multiple Tables

Complex Joins


Learning Objectives

  • You can extend joins to larger query patterns involving several tables.
  • You can explain how row meaning and join type still matter in longer queries.
  • You can debug a suspicious join result by narrowing the query one step at a time.

Once a two-table join feels manageable, the next step is to grow the query, following one relationship at a time and keeping the row meaning in mind as the query grows.

Start with the Path and Build the Query

Suppose the question is: which users submitted exercises in the spring 2026 instance of CS-A1150?

Before writing SQL, describe the path in words. Start from the course, move to the correct course instance, move to its exercises, move to submissions, and then move to users.

That gives the path:

  • courses -> course_instances -> exercises -> submissions -> users

This habit matters even more in long joins than in short ones. When the path is clear, the later JOIN clauses have a much better chance of being correct.

The SQL might look like this:

SELECT
  c.code,
  ci.term,
  ci.year,
  e.title AS exercise_title,
  u.name
FROM courses AS c
INNER JOIN course_instances AS ci ON ci.course_id = c.id
INNER JOIN exercises AS e ON e.course_instance_id = ci.id
INNER JOIN submissions AS s ON s.exercise_id = e.id
INNER JOIN users AS u ON u.id = s.user_id
WHERE c.code = 'CS-A1150'
  AND ci.term = 'spring'
  AND ci.year = 2026;

Even though the query is longer, the logic is still the same as before: follow one relationship at a time.

A possible result might look like this:

codetermyearexercise_titlename
CS-A1150spring2026SQL BasicsAino
CS-A1150spring2026SQL BasicsLeevi
CS-A1150spring2026Filtering QueriesAino

Each row represents one matched path through the query: one course instance, one exercise, one submission, and one user.

At this point, it is useful to practice that same idea directly: follow the path, add one join at a time, and keep the row meaning in mind as the query grows.

Loading Exercise...

Repeated Values

Longer joins often produce repeated values.

That can feel suspicious at first, but the important question is not simply whether some value appears more than once. The more useful question is what one result row represents now.

If several users submitted the same exercise, then a query that shows both exercises and users will naturally include several rows for that exercise. The repeated-looking exercise title may be completely correct.

So the interpretation rule is simple: repeated-looking values are only a bug if the row meaning says they should not be there.

Loading Exercise...

A Longer Path with a Left Join

Longer join paths do not have to use INNER JOIN all the way through. The join type should still come from the question.

Suppose the question is: show all exercises in the spring 2026 instance of CS-A1150, together with submissions if they exist.

That question still starts with a long path:

  • courses -> course_instances -> exercises

The last step is different, however. The question asks for all exercises, even if some of them have no submissions, so the final step should use a LEFT JOIN:

SELECT
  e.title,
  s.submitted_at,
  s.score
FROM courses AS c
INNER JOIN course_instances AS ci ON ci.course_id = c.id
INNER JOIN exercises AS e ON e.course_instance_id = ci.id
LEFT JOIN submissions AS s ON s.exercise_id = e.id
WHERE c.code = 'CS-A1150'
  AND ci.term = 'spring'
  AND ci.year = 2026;

Longer query paths are still built one relationship at a time. The path tells us which tables belong in the query, and the wording of the question tells us whether each step should require a match or keep unmatched rows visible.

Loading Exercise...

A Small Self-Join

Sometimes a table must be joined to itself.

For example, imagine that courses stores a prerequisite through prerequisite_course_id, which points back to courses.id.

Then one query can read the table in two roles:

SELECT
  c.code,
  p.code AS prerequisite_code
FROM courses AS c
INNER JOIN courses AS p ON c.prerequisite_course_id = p.id;

The main idea is simple: a join is about a relationship path, not about whether the two roles happen to come from tables with different names.

Aliases are important here because the query needs one name for the course itself and another name for the prerequisite course. In other words, the aliases make it possible to say clearly which role each copy of the table is playing.

Loading Exercise...

Debugging Suspicious Results

Longer joins also make more room for mistakes. Before worrying about syntax, a useful first question is the one from earlier in the chapter: what does one result row represent?

If the answer is clear but the rows on screen do not match it, a short debugging habit helps.

A useful order is to simplify back toward something known to work:

  1. keep only the first one or two joins,
  2. run the shortened query and inspect the rows,
  3. add one join back at a time,
  4. at each step, check whether the row count or row shape changes in a way you did not expect.

Different symptoms usually point to different causes:

  • Far too many rows: often a missing or wrong ON condition, which produces something close to a Cartesian product.
  • Rows that should not appear together: usually a wrong ON condition that connects tables by the wrong columns.
  • Missing rows that should be visible: often an INNER JOIN where a LEFT JOIN was needed, or a WHERE condition that accidentally filters out rows with NULL on one side.
  • Repeated rows that look like duplicates: often correct and explained by the row meaning, but worth confirming against the schema and the data.

The general pattern is the same as in the debugging chapter of Part 2: start small, compare expectation with result, and add complexity back one step at a time. Joins only make that habit more important.

Loading Exercise...

Check Your Understanding

  1. Why is it useful to describe the join path before writing a long query?
  2. Why are repeated-looking rows sometimes correct in a multi-table join?
  3. Why might the last step in a longer query use LEFT JOIN instead of INNER JOIN?

SQL Programming Exercise

The following SQL sequence practices longer relationship paths, one longer query that ends with LEFT JOIN, and one self-join.

Loading Exercise...