Schemas and Constraints

Planning the Schema from the Design Package

Learning Objectives

You can turn an ER diagram and its supporting documents into a concrete table plan.
You can identify the foreign-key dependencies that decide the order of table creation.
You can check the table plan against the design rationale to see that nothing has drifted.

An ER diagram is a conceptual picture. A relational schema is a concrete structure with tables, columns, keys, and constraints that a DBMS can enforce. This chapter bridges the two.

The design package from the chapter Documenting a Design from the previous part contains the entities, the relationships, the identifier choices, and the rationale. What remains is mechanical: turn each piece into the right kind of SQL structure and check that the result matches the intent.

A Three-Step Mapping Sketch

For creating the schema, a useful workflow is:

give each main entity a primary key, usually a surrogate id,
turn many-to-many relationships into linking tables,
place foreign keys on the “many” side of one-to-many relationships.

This is not the only possible mapping workflow, but it covers the large majority of designs. When a case does not fit this pattern, the design package should already have explained why.

Entities Become Tables

Each main entity in the diagram becomes a table, and each attribute becomes a column.

A conceptual entity such as course usually becomes a table such as courses(id, code, title). A conceptual entity such as course_instance becomes something like course_instances(id, course_id, term, year).

The mapping is not fully automatic as you still have to decide how each entity is identified and which attributes belong to it. Once the design — which we practised in the last part — has answered those questions, the table structure follows without much additional thought.

One-to-Many Relationships

A one-to-many relationship is usually implemented by placing a foreign key on the “many” side.

If one course has many course instances, then the course_instances table gets a course_id column that refers to courses.id. The “many” side carries the reference because each course instance belongs to exactly one course, so each course-instance row needs one course identifier.

This pattern appeared implicitly when you practised writing joins. A path such as courses -> course_instances -> exercises only works because the schema makes those one-to-many relationships visible through foreign keys.

Loading Exercise...

Many-to-Many Relationships

A many-to-many relationship becomes its own table.

The course-platform has users enrolled in course instances. Neither users nor course_instances can store the relationship as a column, so the relationship gets its own table:

users,
course_instances,
enrollments(user_id, course_instance_id, joined_at).

The linking table allows one user to connect to many course instances and one course instance to connect to many users. If the design rationale said that the relationship has an attribute such as joined_at, the linking table is where that attribute lives.

Any attribute that depends on both sides of the relationship belongs to the relationship, not to either entity.

Loading Exercise...

A Worked Table Plan: The Course Platform

Using the course-platform design package from the end of the last part, the table plan looks like this:

courses(id, code, title) with code UNIQUE,
course_instances(id, course_id, term, year),
exercises(id, course_instance_id, title, deadline_at),
users(id, name, email, role) with email UNIQUE,
enrollments(user_id, course_instance_id, joined_at),
submissions(id, user_id, exercise_id, submitted_at, score, status).

Primary keys and foreign keys:

every main entity has a surrogate id primary key,
course_instances.course_id references courses.id,
exercises.course_instance_id references course_instances.id,
enrollments has a composite primary key (user_id, course_instance_id), and each part is also a foreign key,
submissions.user_id references users.id,
submissions.exercise_id references exercises.id.

Uniqueness on natural-value columns:

courses.code is UNIQUE because one code should identify one course,
users.email is UNIQUE because one email address should identify one user account.

These UNIQUE constraints come from the key-choice rationale: use surrogate id values for internal relationships, and keep natural values (code, email) as separate UNIQUE attributes.

This is a plan that the SQL implementation will follow. Writing the plan in this compact form first makes the later implementation straightforward, because all the structural decisions are visible at once.

Foreign-Key Dependencies Decide the Order

One important consequence of foreign keys is that some tables cannot exist before others. course_instances cannot be created before courses, because course_instances.course_id must refer to a real table. enrollments cannot be created before both users and course_instances.

For the course-platform plan, a valid creation order is:

courses,
users,
course_instances (depends on courses),
exercises (depends on course_instances),
enrollments (depends on users and course_instances),
submissions (depends on users and exercises).

The order is not unique; users could be created before courses, because neither references the other. But each step must come after any table it references.

This ordering matters for two reasons. It decides the order of CREATE TABLE statements in a migration, and it decides the order of INSERT statements in seed data. Both are sensitive to dependency order, and both fail with a foreign-key error when the order is wrong.

Dependencies are not alphabetical

It is tempting to order migrations alphabetically or by whatever order the tables appear in the diagram. Neither is reliable. The only order that works is one that respects the foreign-key dependencies.

Loading Exercise...

Matching the Plan Against the Design Rationale

Before writing any SQL, it is worth reading the table plan and the design rationale side by side. Every table and every foreign key in the plan should be traceable to a decision in the rationale. If the plan contains a table that the rationale did not discuss, either the plan has drifted or the rationale was incomplete.

For the course-platform example, the mapping is:

courses and the surrogate id match the rationale’s “each main entity uses a surrogate id” choice,
courses.code UNIQUE and users.email UNIQUE match the rationale’s “natural values as separate UNIQUE attributes” choice,
enrollments(user_id, course_instance_id) with a composite key matches the “enrollment as its own structure” paragraph,
users.role matches the “teachers and students in one table” paragraph,
submissions.score and submissions.status exist because the chapter Testing a Design Against Its Questions in the last part showed they were needed.

If a design change is needed during implementation, it is almost always a signal that the Part 5 rationale should be updated. The rationale is the source of truth, not a one-time document.

Loading Exercise...

A Second Small Plan: The Recipe App

The recipe app diagram maps to this plan:

users(id, name, email) with email UNIQUE,
recipes(id, user_id, title, description),
ingredients(id, name) with name UNIQUE,
recipe_ingredients(recipe_id, ingredient_id, quantity, unit),
favorites(user_id, recipe_id).

Primary keys and foreign keys:

users, recipes, and ingredients have surrogate id primary keys,
recipes.user_id references users.id,
recipe_ingredients has a composite primary key (recipe_id, ingredient_id), with both parts as foreign keys,
favorites has a composite primary key (user_id, recipe_id), with both parts as foreign keys.

A few of these columns are extensions over the subject-analysis from the last part. The note named the entities and the relationships but did not enumerate attributes; values like users.name, users.email, recipes.title, and recipes.description are added here because the application will need them. The note should be updated to record these decisions, since they are now part of the design.

Notice also that recipe_ingredients carries two relationship attributes: a quantity and a unit. The last part left this as an open question (“are ingredient units stored on the ingredient or on the recipe-ingredient connection?”). The plan answers it by putting both on the connection, because the same ingredient can appear with different units in different recipes (“1 cup of flour” in one recipe, “100 g of flour” in another).

A separate open question that the note raised was whether preparation steps deserve their own entity or can be modeled as text on the recipe. The plan above defers that question by leaving preparation steps out altogether. If the application needs to render or reorder steps individually, a recipe_steps(id, recipe_id, position, content) table would be the natural addition, but it does not appear in this plan.

A Third Plan: The Library

The library diagram maps to this plan:

books(id, title, isbn) with isbn UNIQUE when present,
copies(id, book_id),
members(id, name, email, joined_at) with email UNIQUE,
loans(id, copy_id, member_id, borrowed_at, returned_at).

Primary keys and foreign keys:

books, copies, members, and loans each have a surrogate id primary key,
copies.book_id references books.id,
loans.copy_id references copies.id,
loans.member_id references members.id.

Several decisions in this plan are extensions over the note from the last part rather than restatements of it. The note named member, book, copy, and loan as entities but did not list attributes for them. This plan adds name, email, and joined_at to members; title and isbn to books; and the borrow and return timestamps to loans. Each addition has a defensible reason — members need contact details, books need a stable external identifier, loans need their dates — but none of them is dictated by the earlier design. The rationale should be updated to include them.

Two of these are worth noticing in particular.

First, members.joined_at did not appear in the note at all. The plan adds it because the library is likely to want a “member since” date visible on member pages, and adding it here is cheaper than adding it later.

Second, loans.returned_at is nullable because a loan exists before the copy is returned. The NULL value here carries meaning — it means “not yet returned” — and later queries will depend on that.

When the Mapping Is Not Automatic

Most mappings follow the three-step sketch, but a few design choices need more thought.

A one-to-one relationship can be mapped by placing a unique foreign key on one side, or by having both sides share the same primary key. The design package should say which. If it does not, the design package is incomplete.

A composite natural key may need an explicit decision about whether the table uses the composite as its primary key or whether a surrogate id is added with a separate UNIQUE constraint on the composite. Both choices are defensible; the rationale should pick one.

An entity with no clear owner sometimes appears. For example, a category table that classifies items without belonging to any particular user or account. The schema can still be created, but the lack of an owning entity may affect later decisions about access control and deletion.

When such decision points appear, the rationale in the design package should be consulted first.

Loading Exercise...

Check Your Understanding

Why should the table plan be written before any CREATE TABLE statement?
Why does the order of table creation matter, and how is that order decided?
What does it mean for a table plan to drift from the design rationale, and what should happen when it does?

AI Study Aid

Create a chapter diagram

We're looking into to what extent AI-generated diagrams could help with studying.

Use this study aid to generate an AI-generated visual summary of the material you just studied. Each diagram style emphasizes the content in a different way, so you can choose the focus that feels most useful.

Using the diagram generator is voluntary and does not affect your course progress. We encourage you to try it out and see if it helps, but it's totally up to you! Your answers help us understand how to make better study aids in the future.

Diagrams tried: 0

← Overview

Creating Tables with SQL →