A Relational-Algebra Reading of Queries
Learning Objectives
- You can explain how a few core relational-algebra operators connect to familiar SQL.
- You can read a SQL query as a sequence of smaller relational steps.
- You can explain why a relational-algebra reading helps in understanding query transformations and query optimization.
By this point in the part, you have already written joins, subqueries, and set operations. These can be also presented using a small set of relational-algebra operators that make query structure and transformations easier to discuss.
This chapter gives a relational-algebra reading of familiar SQL queries, giving names to the main operations that appear again and again in relational querying. This helps learning for two reasons: it helps explain what a query means, and it helps explain why the same query can often be written in more than one form.
Relational algebra is not a set of prerequisites for writing joins; it is a way of reading queries you have already written, so that their structure and the transformations between equivalent forms become easier to talk about. In this chapter, the goal is not to turn the course into a notation-heavy theory course. The goal is to make familiar SQL feel more structured and less like a collection of commands.
The point of this chapter is not to memorize symbols for their own sake. The point is to see that a relational query can be read as a sequence of smaller operations on tables.
Relations In, Relations Out
One useful starting idea is that a relational-algebra operator takes one or more relations as input and produces another relation as output.
That may sound abstract at first, but it explains something important about relational queries. Query operations can be chained because the output of one step can become the input of the next step.
This is also why it is useful to think in steps. A query does not need to be understood as one large block. It can be understood as a small pipeline of row and column operations.
Selection and Projection
Two of the most important operators are selection and projection. Selection means deciding which rows stay, while projection means deciding which columns stay visible.
In relational-algebra notation, selection is commonly written as sigma () and projection as pi (). You can read as βkeep the rows of relation that satisfy the conditionβ and as βkeep these columns visible.β
For example, consider the following query:
SELECT
name
FROM users
WHERE id < 3;
This can be read in two parts. WHERE id < 3 is a selection that keeps only the rows whose id is less than 3, and SELECT name is a projection that keeps only the name column visible in the result.
In relational-algebra notation, the query can be written as:
You do not need to become fluent in symbolic notation for this course. The inner step keeps only the rows where id < 3, and the outer step keeps only the name column.
In ordinary language, the structure is:
- start from
users, - apply selection to keep only some rows,
- apply projection to keep only some columns.
When thinking of the query in that way, it is easier to see how the selection and projection steps are separate. The selection step decides which rows stay, and the projection step decides which columns stay visible. That is a useful way to understand the structure of a query β we start with the full table, and then we apply a sequence of operations that keep only some rows and some columns.
Combining Related Tables
Another central idea is combination. In relational algebra, a join is often understood as a structured combination of rows. The notation for a join in relational algebra is:
This means that the two relations are combined by keeping the row pairs that satisfy the written relationship condition. The condition could, for example, be relation_1.attribute = relation_2.attribute to follow the relationship between those tables.
One helpful way to think about it is: first consider row combinations, and then keep only the combinations that satisfy the relationship condition.
That interpretation helps explain why a missing or broken join condition causes trouble. If too many combinations survive, the result grows larger than expected and the rows start to look suspicious or repetitive.
Consider this query:
SELECT
u.name,
e.title
FROM submissions AS s
JOIN users AS u ON u.id = s.user_id
JOIN exercises AS e ON e.id = s.exercise_id
WHERE s.status = 'graded';
The same query can be read as a sequence of relational steps:
- start from
submissions, - combine it with
usersby matchings.user_idwithu.id, - combine that result with
exercisesby matchings.exercise_idwithe.id, - keep only the rows where
status = 'graded', - keep only the visible columns
nameandtitle.
If we write only the main relational structure, it is often clearer to split the algebra into a few named steps instead of forcing everything into one long line. First, we construct relation by joining submissions and users on the relationship between them:
Then, we construct relation by joining with exercises on the relationship between them:
Then, we construct the relation by applying selection to keep only the graded submissions:
Finally, we apply projection to keep only the desired columns:
Together, these steps give a structured reading of the query. Each step has a clear meaning, and the overall structure is easier to follow than one long nested expression.
When reading a query, it is often helpful to break it down into smaller steps. Each step can be understood on its own, and together they build up the full meaning of the query. This is one of the main insights of a relational-algebra reading: it allows us to understand complex queries as a sequence of simpler operations.
Reading a Query Through Relational Algebra
It is useful to see one familiar query written in both forms. Consider the following SQL query that finds the course code, term, and year for all course instances in the year 2026:
SELECT
c.code,
ci.term,
ci.year
FROM course_instances AS ci
JOIN courses AS c ON c.id = ci.course_id
WHERE ci.year = 2026;
To construct a relational-algebra reading of the same query, we can follow the same steps as before. First, we join course_instances with courses on the relationship between them:
Then, we apply selection to keep only the course instances from the year 2026:
Finally, we apply projection to keep only the desired columns:
This gives us a structured reading of the query in relational-algebra form. Each step corresponds to a part of the SQL query, and together they build up the full meaning of the query.
This is one reason a relational-algebra reading is useful. It shows that a relational query is built from a small set of reusable operations rather than from one large special-purpose statement.
Set Operations Fit the Same Model
The SQL operators UNION, INTERSECT, and EXCEPT also fit into the same picture. They work on compatible query results and produce another compatible result.
Union () combines two results and removes duplicates by default. Intersection () keeps only the rows that appear in both results. Set difference () keeps only the rows that appear in the first result but not in the second.
In algebra-style notation, the shape of those operations is simply:
This is useful because it keeps the same basic pattern intact: start with relations, apply an operator, and produce another relation.
That also explains why these operators require compatible result shapes. If the two inputs do not describe results with the same structure, the set-style operation no longer makes sense.
Relational Algebra and Query Optimization
Relational algebra is often used in query optimization because it gives a clear model for query structure and query transformation.
For example, compare these two ways of structuring the same query. In the first structure, we join first and filter later:
In the second structure, we filter submissions first and only then join the smaller intermediate result:
The main point is that the selection step can sometimes be moved earlier without changing the meaning of the query.
Now consider the implications of the two different query structures. The first structure joins the full submissions table with users and exercises, which may produce a large intermediate result. Then it applies the selection to keep only the graded submissions, which may be a small subset of the original result.
The second structure, on the other hand, applies the selection to submissions first, which reduces the number of rows before the joins. That can lead to a much smaller intermediate result when joining with users and exercises, which in turn can improve performance.
The idea of reordering operations while preserving the meaning of the query is a key aspect of query optimization. Relational algebra provides a framework for understanding when such transformations are valid and how they can lead to more efficient query execution plans.
In this course, the most important point about relational algebra is simple: relational algebra helps describe both what a query means and how its steps can sometimes be rearranged safely.
Check Your Understanding
- What is the difference between selection and projection?
- How can a join be read as a sequence of smaller relational steps?
- Why is a relational-algebra reading useful even when the course still writes queries in SQL?
AI Study Aid
Create a chapter diagram
We're looking into to what extent AI-generated diagrams could help with studying.
Use this study aid to generate an AI-generated visual summary of the material you just studied.
Each diagram style emphasizes the content in a different way, so you can choose the focus
that feels most useful.
Using the diagram generator is voluntary and does not affect your course progress. We encourage
you to try it out and see if it helps, but it's totally up to you! Your answers help us understand
how to make better study aids in the future.
Selected style: choose a diagram style