Schemas and Constraints

Constraints and Integrity Rules

Learning Objectives

You can use constraints to enforce design decisions at the database level.
You can connect specific constraints to specific invalid states that they prevent.
You can choose between referential actions when deciding what happens on delete or update.
You can recognize when a constraint is genuinely protecting the design and when it is only adding friction.

A schema is more trustworthy when the database can reject invalid states automatically. The previous chapter already used three constraints: NOT NULL, UNIQUE, and REFERENCES. This chapter adds the rest of the practical set and connects each one to a design decision it protects.

Primary Keys

A primary key identifies each row uniquely.

id SERIAL PRIMARY KEY

This tells the database two things. First, the column must uniquely identify rows: no two rows may have the same id. Second, the SERIAL type asks the database to assign new values automatically, so the application does not have to choose them.

Without a primary key, the database could contain two rows that claim to be the same thing. That is almost never what the design intends.

For composite primary keys, the declaration moves to the table level:

CREATE TABLE enrollments (
    user_id INTEGER NOT NULL REFERENCES users(id),
    course_instance_id INTEGER NOT NULL REFERENCES course_instances(id),
    joined_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, course_instance_id)
);

The composite key says that the pair of values uniquely identifies one enrollment. No user can appear twice in the same course instance.

Foreign Keys

A foreign key connects one table to another.

course_instance_id INTEGER NOT NULL REFERENCES course_instances(id)

This does two useful things. It documents the relationship between the tables, and it asks the database to check that the referenced row exists. If course_instance_id = 17 appears in exercises, the foreign key rule says that course instance 17 must really exist in course_instances.

A DBMS will reject any attempt to insert a row with a foreign key value that does not match any row in the referenced table. That is one of the main ways the database protects the design from invalid states.

This is one of the main reasons joins can be trusted: the schema has already protected the relationship structure. You do not need to worry that the join will find garbage references, because the foreign key would have prevented them from being inserted in the first place.

Referential Actions

A foreign key also decides what happens when the referenced row is updated or deleted. These are called referential actions, and they are declared with ON DELETE and ON UPDATE clauses.

The most common options are:

ON DELETE NO ACTION (the default) — prevents the delete if any row still references it,
ON DELETE RESTRICT — the same effect as NO ACTION in practice, with slightly different timing in transactions,
ON DELETE CASCADE — when the referenced row is deleted, the referencing rows are deleted too,
ON DELETE SET NULL — when the referenced row is deleted, the referencing column is set to NULL.

For example:

CREATE TABLE submissions (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    exercise_id INTEGER NOT NULL REFERENCES exercises(id) ON DELETE CASCADE,
    submitted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    score INTEGER,
    status TEXT NOT NULL
);

Here, deleting a user is blocked if that user has any submissions. Deleting an exercise removes all its submissions automatically.

Neither choice is universally right. The right answer depends on what the domain means by “deleted”. In this example, a user’s submissions are considered meaningful history that should not disappear silently, while an exercise’s submissions are considered part of the exercise and should be removed when the exercise itself is removed.

Loading Exercise...

Default is restrictive

Without an ON DELETE clause, PostgreSQL defaults to NO ACTION, which prevents the delete. That is usually a safe default. Explicit ON DELETE CASCADE should only appear when the design rationale has discussed the cascade and accepted it.

Loading Exercise...

A Worked Referential-Action Decision

Consider the project’s own relationship: cards belong to decks. When a deck is deleted, what should happen to its cards?

The answer is a design decision, not a syntax issue.

Option A: ON DELETE CASCADE. Deleting a deck removes all its cards. This matches a user model where “delete this deck” means “I don’t want any of this material anymore.”

Option B: ON DELETE RESTRICT. Deleting a deck is blocked while it has cards. The user must either move the cards or delete them first. This matches a user model where deck removal is rare and should be deliberate.

Option C: ON DELETE SET NULL. The cards stay, but their deck_id becomes NULL. This requires deck_id to be nullable, and it implies that cards can exist without belonging to any deck. For the study tracker, for example, this is usually not what the design wants — otherwise, we would have orphaned cards in the database that would not be shown anywhere in the user interface.

In practise, each option is a different answer to the question “what does it mean to delete a deck?”. The design rationale should include this — if it does not, the rationale should be updated to match the implementation.

Loading Exercise...

Other Useful Constraints

A few other constraints appear often.

UNIQUE prevents repetition where repetition should not be allowed. The course-platform schema already uses UNIQUE on two columns:

code TEXT NOT NULL UNIQUE
email TEXT NOT NULL UNIQUE

No two courses can share a code, and no two users can share an email address. Composite uniqueness is also available:

UNIQUE (course_id, term, year)

This says that the triple of values must be unique across all rows, even though each column individually may repeat. The same course can be offered many times, but not twice in the same term and year.

CHECK expresses simple validity rules:

year INTEGER NOT NULL CHECK (year >= 2000)

This rejects rows where year is before 2000. The database will prevent such a row from being inserted or updated into place.

DEFAULT provides a value when none is supplied:

joined_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP

The default is not a constraint that rejects bad data. It is a convenience that fills in common cases automatically.

A Realistic Example with Several Constraints

Putting several constraints together on the full course-platform schema:

CREATE TABLE courses (
    id SERIAL PRIMARY KEY,
    code TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL
);

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    role TEXT NOT NULL CHECK (role IN ('student', 'teacher'))
);

CREATE TABLE course_instances (
    id SERIAL PRIMARY KEY,
    course_id INTEGER NOT NULL REFERENCES courses(id) ON DELETE RESTRICT,
    term TEXT NOT NULL,
    year INTEGER NOT NULL CHECK (year >= 2000),
    UNIQUE (course_id, term, year)
);

CREATE TABLE exercises (
    id SERIAL PRIMARY KEY,
    course_instance_id INTEGER NOT NULL REFERENCES course_instances(id) ON DELETE CASCADE,
    title TEXT NOT NULL,
    deadline_at TIMESTAMP NOT NULL
);

CREATE TABLE enrollments (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    course_instance_id INTEGER NOT NULL REFERENCES course_instances(id) ON DELETE CASCADE,
    joined_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, course_instance_id)
);

CREATE TABLE submissions (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    exercise_id INTEGER NOT NULL REFERENCES exercises(id) ON DELETE CASCADE,
    submitted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    score INTEGER CHECK (score IS NULL OR (score >= 0 AND score <= 100)),
    status TEXT NOT NULL CHECK (status IN ('submitted', 'graded', 'late'))
);

This one set of tables already enforces several design rules:

every course has a unique code and every user has a unique email,
every user’s role is one of a known set of values,
no two course instances of the same course share a term and year,
a course cannot be removed while any of its instances still exist,
an exercise and its submissions go together when the course instance is removed,
an enrollment disappears when either its user or its course instance is removed,
a submission belongs to a real user and a real exercise,
the submitted_at timestamp is always set,
the score, if present, is between 0 and 100,
the status value must be one of three allowed values.

Notice the shape of the score check. The column is nullable, so the check allows NULL. Without the score IS NULL OR ... guard, the constraint would reject ungraded submissions because NULL comparisons evaluate to unknown, and a failing check treats unknown as failure.

It is useful to read this schema line by line and ask what bad row each line prevents. That habit turns constraints from syntax into meaning.

This is the canonical course-platform schema for the rest of Part 6. Later chapters refer to these choices when they talk about the “course-platform schema” or the “running schema.”

Sensible and Less Sensible Constraints

Constraints are useful only when they match the meaning of the data. The goal is not to add as many as possible but to add the right ones.

For example, this is sensible:

code TEXT NOT NULL UNIQUE

in a courses table, if each course code should really identify one course.

This would usually be a bad idea:

title TEXT UNIQUE

in the same table, because different courses could reasonably share similar titles, and titles change over time.

Wrong Use of `UNIQUE`

In the previous part, the course-platform case description noted that “a student may resubmit an exercise, and in that case only the most recent submission counts for grading.”

A constraint like:

UNIQUE (exercise_id, user_id)

on submissions would directly contradict that rule, because it would prevent the same student from submitting the same exercise twice.

If the domain instead said “a student may submit once per exercise,” the UNIQUE constraint would be a good match. Neither domain is wrong in the abstract. What matters is that the constraint matches the rule from the design package.

Why Rules in the Database?

Application code should certainly validate inputs. But if the database also enforces the rules, the protection becomes more reliable, because:

more than one application may write to the database,
bugs can exist in application code,
direct scripts or maintenance commands may also change data in ways the normal UI does not.

Constraints make invalid states harder to create accidentally, no matter how the data arrives.

Loading Exercise...

Check Your Understanding

What is the difference between ON DELETE RESTRICT and ON DELETE CASCADE?
Why does a CHECK constraint on a nullable column usually need a column IS NULL OR ... guard?
Why should constraints reflect rules from the design package rather than the implementer’s preferences?

SQL Programming Exercise

Here, you again practise writing SQL statements for creating tables. You first create a table with only the basic constraints, and then extend it so that it has a set of constraints.

Loading Exercise...

← Creating Tables with SQL

Practicing Schema Implementation →

Constraints and Integrity Rules

Learning Objectives

Primary Keys

Foreign Keys

Referential Actions

A Worked Referential-Action Decision

Other Useful Constraints

A Realistic Example with Several Constraints

Sensible and Less Sensible Constraints

Wrong Use of UNIQUE

Check Your Understanding

SQL Programming Exercise

Wrong Use of `UNIQUE`