Data in Multiple Tables

Subqueries and Set Operations


Learning Objectives

  • You can use subqueries for questions that are naturally about membership or existence.
  • You can explain when set operations are a natural fit.

So far, we have practiced joins as the main way to express multi-table questions. Joins are powerful, but they are not the only way to express a multi-table question. Sometimes the question is really about membership, existence, or combining whole result sets. That is where subqueries and set operations become useful.

When IN (...) Fits the Question

Suppose the question is: which exercises belong to course instances from the year 2026?

One clear way to write that is:

SELECT
  id,
  title
FROM exercises
WHERE course_instance_id IN (
  SELECT id
  FROM course_instances
  WHERE year = 2026
);

This query can be read in two steps. First, find the course instance identifiers for the year 2026. Then keep only exercises whose course_instance_id is in that result.

That is the practical meaning of the subquery here. One query produces a result that another query uses.

This is also why the example fits a subquery well. The question is really about membership: which exercises belong to a set of course instances? In that kind of case, the IN (...) structure can match the wording more naturally than a join.

That does not mean subqueries are always better than joins. The join version would also work here:

SELECT
  e.id,
  e.title
FROM exercises AS e
INNER JOIN course_instances AS ci ON ci.id = e.course_instance_id
WHERE ci.year = 2026;

The two queries answer the same question. The difference is mainly in emphasis. The join version follows the relationship path directly, while the subquery version keeps the membership idea more visible: first find the relevant course instance identifiers, and then keep the exercises that belong to them.

If the real goal is to reconstruct related facts from several tables and show them together, a join is often the clearer choice. Subqueries are most helpful when the question is naturally about membership or existence.

Write SQL for clarity first

In many everyday cases, SQL should first be written in the form that best matches the question for a human reader. Modern database systems can often optimize different but equivalent query forms, so clarity is usually a good starting point. Still, different query shapes are not always optimized in exactly the same way, so performance should be checked when it matters.


Loading Exercise...

EXISTS and NOT EXISTS

Another common question shape is keep this row if at least one related row exists. Suppose the question is: which users have made at least one submission?

Here, the EXISTS operator is a natural fit:

SELECT
  u.id,
  u.name
FROM users AS u
WHERE EXISTS (
  SELECT 1
  FROM submissions AS s
  WHERE s.user_id = u.id
);

Read this as: keep this user row if there exists at least one related submission row.

Loading Exercise...

The opposite pattern is just as useful:

SELECT
  u.id,
  u.name
FROM users AS u
WHERE NOT EXISTS (
  SELECT 1
  FROM submissions AS s
  WHERE s.user_id = u.id
);

Read this as: keep this user row only if no related submission row exists.

This is a good example of choosing the SQL pattern from the question. If the question really says “at least one” or “none,” then EXISTS or NOT EXISTS often fits naturally.

Loading Exercise...

Set Operations

Subqueries are not the only alternative to joins. SQL can also combine whole query results through set operations. The main set operation is UNION, which combines two results and removes duplicates by default. Other set operations include INTERSECT for rows present in both results, and EXCEPT for rows present in the first result but not in the second.

For example, if we had a table called archived_courses that stores old courses, we could ask: which course codes have ever been used?

SELECT code
FROM courses
UNION
SELECT code
FROM archived_courses;

This query lists all unique course codes from both the courses and archived_courses tables.

The following query asks which course codes exist in courses but not in archived_courses:

SELECT code
FROM courses
EXCEPT
SELECT code
FROM archived_courses;

That is a different kind of question from a join. Instead of reconstructing related facts, it compares one whole result with another whole result.

Result shapes must match

Set operations work on whole result sets, so the two queries must have compatible result shapes. That means the same number of columns and matching types in the corresponding positions. If the result shapes are not compatible, the database will raise an error.


Loading Exercise...

Choose the Pattern from the Question

Joins are often a good fit for reconstructing related facts. Subqueries are often a good fit for membership or existence conditions. Set operations are often a good fit for combining or comparing whole results.

That framing keeps the SQL connected to the information need. It also reduces the temptation to force every relational question into the same query pattern.

Common mistake

It is easy to force every multi-table question into a join, even when the question is really about membership, existence, or missing related rows. If a join version feels awkward to read, a subquery may be a better fit.


Loading Exercise...

Check Your Understanding

  1. When can a subquery express a question more clearly than a long join?
  2. Why is NOT EXISTS useful for finding missing related rows?
  3. Why do set operations need compatible result shapes?

SQL Programming Exercise

The following SQL sequence practices the chapter’s main patterns: IN (...), EXISTS, NOT EXISTS, and one set operation.

Loading Exercise...