Data in Multiple Tables

Data Distribution and Redundancy

Learning Objectives

You can explain why repeated facts make a schema fragile.
You can connect redundancy to update, insert, and delete anomalies.
You can read the extended multi-table schema that the rest of this part uses.

So far, we have worked with a single table at a time. We could, in principle, keep doing that by putting all the facts into one big table. That design becomes fragile quickly, however, because the same fact ends up stored in many rows.

In this chapter, we first look at what that redundancy means in one wide table. After that, we introduce the cleaner multi-table structure that we use from this part onward.

Start with One Wide Table

Imagine that a system stores course submission information in one table.

submission_id	user_name	user_email	course_code	course_title	exercise_title	score
1	Aino	aino@example.com	CS-A1150	Databases	SQL Basics	9
2	Aino	aino@example.com	CS-A1150	Databases	Filtering Queries	10
3	Leevi	leevi@example.com	CS-A1150	Databases	SQL Basics	8

If we read one row at a time, this table looks fine.

Reading by rows is not enough here, however. It helps to read the table by kinds of facts instead. The user facts are repeated, the course facts are repeated, the exercise facts are repeated, and only a few of the columns are really about the submission itself. That repeated storage of the same fact is called redundancy.

Redundant Facts and Anomalies

When the schema contains redundant facts, it becomes fragile. That is because the same fact is stored in many places, and those places can get out of sync. These out-of-sync situations are called anomalies. There are three main kinds of anomalies: update anomalies, insert anomalies, and delete anomalies.

Update Anomalies

Suppose the course title changes from Databases to Databases and Applications. In the one-table design, the title is not stored once. It is stored in many rows. One conceptual change now requires many row updates.

In a one-table design, one fact stored many times must also be changed many times. As soon as that happens, there is room for inconsistent data.

If some rows are updated and others are not, the database no longer contains one clear course title. It contains several competing versions of the same fact. This situation is called an update anomaly.

Loading Exercise...

Insert Anomalies

Suppose the system wants to store a new course before any exercise or submission exists. In the one-table design, that is awkward because the row shape is really designed for submissions: adding a new course would require filling in submission-related fields, possibly with NULL values.

The schema is asking for submission-related values even though we only want to store a course. That is called an insert anomaly. One fact cannot be added cleanly by itself because the table shape assumes that some other fact already exists too.

Delete Anomalies

Suppose the last submission row for one exercise is deleted. If that row also contained the only stored copy of the exercise title or course title, deleting one submission would also delete another fact by accident. That is called a delete anomaly.

Loading Exercise...

The Schema Used from Now On

Now that the redundancy problem is visible, it helps to see the cleaner multi-table structure that the rest of this part uses.

This schema is close to the one introduced in Part 1, but it adds two tables that were not present there: course_instances and enrollments. The reason for the addition is that one course is usually offered more than once, and who is enrolled in a particular offering is a separate fact from the course itself. Once those two refinements are made, questions about, say, the spring 2026 instance of CS-A1150 become easy to express.

Figure 1 — The schema used in the SQL chapters of this part. Compared with Part 1, it adds course_instances and enrollments, and exercises now points to a course instance rather than directly to a course.

Courses, course instances, exercises, users, enrollments, and submissions are all stored in separate tables. Each table contains one kind of fact, and the relationships between the tables are expressed with foreign keys.

The earlier one-table design would now be split across several related tables. For example, the same information would be distributed like this:

Courses:

id	code	title
1	CS-A1150	Databases

Course instances:

id	course_id	term	year
1	1	Spring	2026

Exercises:

id	course_instance_id	title	deadline_at
1	1	SQL Basics	2026-02-01
2	1	Filtering Queries	2026-02-08

Users:

id	name	email
1	Aino	aino@example.com
2	Leevi	leevi@example.com

Enrollments:

user_id	course_instance_id
1	1
2	1

Submissions:

id	exercise_id	user_id	submitted_at	score	status
1	1	1	2026-01-30 10:00	9	submitted
2	2	1	2026-02-07 15:00	10	submitted
3	1	2	2026-01-31 12:00	8	submitted

Lots of tables and foreign keys, but no redundancy. Each fact is stored once, and the relationships between facts are expressed with foreign keys instead of repeated titles and names.

The SQL examples and exercises in Chapters 2-7 run against this teaching dataset, not against the study tracker project database. The project itself continues to grow from Chapter 9 onward, where cards and a small linking table are added on top of the existing decks and tags.

The interpretation also changes: stable user facts are stored once in users, stable course facts are stored once in courses, course offerings are stored in course_instances, and submission rows refer to related facts through identifiers instead of repeating titles and names.

For example, if the course title changes now, the change belongs in courses. The submission rows do not need their own copies of the course title anymore.

This structure reduces the earlier anomaly risks considerably. Updating the course title changes one row in courses. A new course can be added without needing any submission-related fields. A submission can be deleted without removing the only stored copy of a course title.

Queries Become Harder

There is one important tradeoff. When facts are split across tables, the schema becomes better structured, but the questions become less direct to write.

In the one-table design, one row already contained user, course, exercise, and submission information together. In the split design, that same view has to be reconstructed by joining the relevant tables.

This is one of the central ideas of relational databases: store facts in a sensible structure, then reconstruct useful views with queries.

The rest of this part is organized around that idea. The next chapter practices reading relationships from a schema before writing any join; the chapters after that introduce inner joins, left joins, more complex join patterns, and then subqueries and set operations. A later chapter then revisits the same queries through relational algebra, and the final two chapters turn these ideas into visible application features.

Loading Exercise...

Check Your Understanding

Why can repeated facts make a schema fragile?
What is the ordinary-language meaning of an insert anomaly?
Why does splitting data into several tables create a new need for joins?

AI Study Aid

Create a chapter diagram

We're looking into to what extent AI-generated diagrams could help with studying.

Use this study aid to generate an AI-generated visual summary of the material you just studied. Each diagram style emphasizes the content in a different way, so you can choose the focus that feels most useful.

Using the diagram generator is voluntary and does not affect your course progress. We encourage you to try it out and see if it helps, but it's totally up to you! Your answers help us understand how to make better study aids in the future.

Diagrams tried: 0

← Overview

Tables and Relationships →