Data in Multiple Tables

A Relational-Algebra Reading of Queries


Learning Objectives

  • You can explain how a few core relational-algebra operators connect to familiar SQL.
  • You can read a SQL query as a sequence of smaller relational steps.
  • You can explain why a relational-algebra reading helps in understanding query transformations and query optimization.

By this point in the part, you have already written joins, subqueries, and set operations. These can be also presented using a small set of relational-algebra operators that make query structure and transformations easier to discuss.

This chapter gives a relational-algebra reading of familiar SQL queries, giving names to the main operations that appear again and again in relational querying. This helps learning for two reasons: it helps explain what a query means, and it helps explain why the same query can often be written in more than one form.

Relational algebra is not a set of prerequisites for writing joins; it is a way of reading queries you have already written, so that their structure and the transformations between equivalent forms become easier to talk about. In this chapter, the goal is not to turn the course into a notation-heavy theory course. The goal is to make familiar SQL feel more structured and less like a collection of commands.

Use relational algebra as a structure model

The point of this chapter is not to memorize symbols for their own sake. The point is to see that a relational query can be read as a sequence of smaller operations on tables.

Relations In, Relations Out

One useful starting idea is that a relational-algebra operator takes one or more relations as input and produces another relation as output.

That may sound abstract at first, but it explains something important about relational queries. Query operations can be chained because the output of one step can become the input of the next step.

This is also why it is useful to think in steps. A query does not need to be understood as one large block. It can be understood as a small pipeline of row and column operations.

Loading Exercise...

Selection and Projection

Two of the most important operators are selection and projection. Selection means deciding which rows stay, while projection means deciding which columns stay visible.

In relational-algebra notation, selection is commonly written as sigma (𝜎) and projection as pi (πœ‹). You can read 𝜎condition(relation) as β€œkeep the rows of relation that satisfy the condition” and πœ‹columns(relation) as β€œkeep these columns visible.”

For example, consider the following query:

SELECT
  name
FROM users
WHERE id < 3;

This can be read in two parts. WHERE id < 3 is a selection that keeps only the rows whose id is less than 3, and SELECT name is a projection that keeps only the name column visible in the result.

In relational-algebra notation, the query can be written as:

πœ‹name(𝜎id < 3(users))

You do not need to become fluent in symbolic notation for this course. The inner step keeps only the rows where id < 3, and the outer step keeps only the name column.

In ordinary language, the structure is:

  1. start from users,
  2. apply selection to keep only some rows,
  3. apply projection to keep only some columns.

When thinking of the query in that way, it is easier to see how the selection and projection steps are separate. The selection step decides which rows stay, and the projection step decides which columns stay visible. That is a useful way to understand the structure of a query β€” we start with the full table, and then we apply a sequence of operations that keep only some rows and some columns.

Loading Exercise...

Another central idea is combination. In relational algebra, a join is often understood as a structured combination of rows. The notation for a join in relational algebra is:

relation_1⨝conditionrelation_2.

This means that the two relations are combined by keeping the row pairs that satisfy the written relationship condition. The condition could, for example, be relation_1.attribute = relation_2.attribute to follow the relationship between those tables.

One helpful way to think about it is: first consider row combinations, and then keep only the combinations that satisfy the relationship condition.

That interpretation helps explain why a missing or broken join condition causes trouble. If too many combinations survive, the result grows larger than expected and the rows start to look suspicious or repetitive.

Consider this query:

SELECT
  u.name,
  e.title
FROM submissions AS s
JOIN users AS u ON u.id = s.user_id
JOIN exercises AS e ON e.id = s.exercise_id
WHERE s.status = 'graded';

The same query can be read as a sequence of relational steps:

  1. start from submissions,
  2. combine it with users by matching s.user_id with u.id,
  3. combine that result with exercises by matching s.exercise_id with e.id,
  4. keep only the rows where status = 'graded',
  5. keep only the visible columns name and title.

If we write only the main relational structure, it is often clearer to split the algebra into a few named steps instead of forcing everything into one long line. First, we construct relation 𝑅1 by joining submissions and users on the relationship between them:

𝑅1≔submissions⨝submissions.user_id = users.idusers

Then, we construct relation 𝑅2 by joining 𝑅1 with exercises on the relationship between them:

𝑅2≔𝑅1⨝submissions.exercise_id = exercises.idexercises

Then, we construct the relation 𝑅3 by applying selection to keep only the graded submissions:

𝑅3β‰”πœŽstatus = graded(𝑅2)

Finally, we apply projection to keep only the desired columns:

πœ‹name,title(𝑅3)

Together, these steps give a structured reading of the query. Each step has a clear meaning, and the overall structure is easier to follow than one long nested expression.

Divide and conquer

When reading a query, it is often helpful to break it down into smaller steps. Each step can be understood on its own, and together they build up the full meaning of the query. This is one of the main insights of a relational-algebra reading: it allows us to understand complex queries as a sequence of simpler operations.


Loading Exercise...

Reading a Query Through Relational Algebra

It is useful to see one familiar query written in both forms. Consider the following SQL query that finds the course code, term, and year for all course instances in the year 2026:

SELECT
  c.code,
  ci.term,
  ci.year
FROM course_instances AS ci
JOIN courses AS c ON c.id = ci.course_id
WHERE ci.year = 2026;

To construct a relational-algebra reading of the same query, we can follow the same steps as before. First, we join course_instances with courses on the relationship between them:

𝑅1≔course_instances⨝course_instances.course_id = courses.idcourses

Then, we apply selection to keep only the course instances from the year 2026:

𝑅2β‰”πœŽyear = 2026(𝑅1)

Finally, we apply projection to keep only the desired columns:

πœ‹code,term,year(𝑅2)

This gives us a structured reading of the query in relational-algebra form. Each step corresponds to a part of the SQL query, and together they build up the full meaning of the query.

This is one reason a relational-algebra reading is useful. It shows that a relational query is built from a small set of reusable operations rather than from one large special-purpose statement.

Loading Exercise...

Set Operations Fit the Same Model

The SQL operators UNION, INTERSECT, and EXCEPT also fit into the same picture. They work on compatible query results and produce another compatible result.

Union (βˆͺ) combines two results and removes duplicates by default. Intersection (∩) keeps only the rows that appear in both results. Set difference (βˆ’) keeps only the rows that appear in the first result but not in the second.

In algebra-style notation, the shape of those operations is simply:

𝑅βˆͺ𝑆 π‘…βˆ©π‘† π‘…βˆ’π‘†

This is useful because it keeps the same basic pattern intact: start with relations, apply an operator, and produce another relation.

That also explains why these operators require compatible result shapes. If the two inputs do not describe results with the same structure, the set-style operation no longer makes sense.

Loading Exercise...

Relational Algebra and Query Optimization

Relational algebra is often used in query optimization because it gives a clear model for query structure and query transformation.

For example, compare these two ways of structuring the same query. In the first structure, we join first and filter later:

𝑅1≔submissions⨝submissions.user_id = users.idusers 𝑅2≔𝑅1⨝submissions.exercise_id = exercises.idexercises 𝑅3β‰”πœŽstatus = graded(𝑅2) πœ‹name,title(𝑅3)

In the second structure, we filter submissions first and only then join the smaller intermediate result:

𝑅1β‰”πœŽstatus = graded(submissions) 𝑅2≔𝑅1⨝submissions.user_id = users.idusers 𝑅3≔𝑅2⨝submissions.exercise_id = exercises.idexercises πœ‹name,title(𝑅3)

The main point is that the selection step can sometimes be moved earlier without changing the meaning of the query.

Now consider the implications of the two different query structures. The first structure joins the full submissions table with users and exercises, which may produce a large intermediate result. Then it applies the selection to keep only the graded submissions, which may be a small subset of the original result.

The second structure, on the other hand, applies the selection to submissions first, which reduces the number of rows before the joins. That can lead to a much smaller intermediate result when joining with users and exercises, which in turn can improve performance.

The idea of reordering operations while preserving the meaning of the query is a key aspect of query optimization. Relational algebra provides a framework for understanding when such transformations are valid and how they can lead to more efficient query execution plans.

In this course, the most important point about relational algebra is simple: relational algebra helps describe both what a query means and how its steps can sometimes be rearranged safely.

Loading Exercise...

Check Your Understanding

  1. What is the difference between selection and projection?
  2. How can a join be read as a sequence of smaller relational steps?
  3. Why is a relational-algebra reading useful even when the course still writes queries in SQL?

AI Study Aid

Create a chapter diagram

We're looking into to what extent AI-generated diagrams could help with studying.

Use this study aid to generate an AI-generated visual summary of the material you just studied. Each diagram style emphasizes the content in a different way, so you can choose the focus that feels most useful.

Using the diagram generator is voluntary and does not affect your course progress. We encourage you to try it out and see if it helps, but it's totally up to you! Your answers help us understand how to make better study aids in the future.

Diagrams tried: 0

Selected style: choose a diagram style