Schemas and Constraints

Migrations and Seed Data


Learning Objectives

  • You can explain why schema changes should be tracked through migrations rather than applied manually.
  • You can distinguish schema creation from seed-data loading and write each one appropriately.
  • You can connect a migration and seed example back to a working application page.

Projects that use a database use migrations: files that record the schema changes in a way that can be applied repeatedly.

We already introduced migrations in the course project. The practical message so far has been “put schema changes into migration files”. We’ll now look into the deeper reasons: migrations and seed data make the schema reproducible, reviewable, and safe to evolve.

What a Migration Is

A migration is a tracked, reproducible change to the database schema.

Instead of saying “create these tables somehow,” a project records the changes as SQL files that can be applied in a specific order. Anyone starting from the same migration files ends up with the same schema, whether it is a teammate on a fresh laptop, a grader on a continuous-integration server, or a developer returning after a two-week break.

Without migrations, a database is a snapshot whose history is invisible. With migrations, the database is the result of a specific sequence of changes, each reviewable on its own.

Schema and Seed Data Are Different

It is useful to separate two concerns.

Schema migrations change the structure: tables, columns, constraints, foreign keys. The CREATE TABLE (and ALTER TABLE that we’ll soon look into) statements belong here.

Seed data provides a small dataset that makes the application usable or testable immediately. The INSERT statements for starter rows belong here.

Both use SQL, and both can live in the same directory, but they have different responsibilities. Schema changes are usually stable once written and rarely need to be revised. Seed data changes more often as the application evolves and new seeds become useful.

The conventional pattern is to keep these concerns in separate files. A project might have:

  • 001_schema_courses_users.sql — creates courses and users,
  • 002_seed_initial_data.sql — inserts a few starter rows,
  • 003_schema_exercises.sql — adds the exercises table.

Separation is not strictly required by the migration helper, but it makes each file easier to read and review.

Loading Exercise...

A Small Running Example

A schema migration for the course-platform example might create:

CREATE TABLE courses (
    id SERIAL PRIMARY KEY,
    code TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL
);

CREATE TABLE course_instances (
    id SERIAL PRIMARY KEY,
    course_id INTEGER NOT NULL REFERENCES courses(id),
    term TEXT NOT NULL,
    year INTEGER NOT NULL
);

CREATE TABLE exercises (
    id SERIAL PRIMARY KEY,
    course_instance_id INTEGER NOT NULL REFERENCES course_instances(id),
    title TEXT NOT NULL,
    deadline_at TIMESTAMP NOT NULL
);

A seed file might then add starter rows:

INSERT INTO courses (code, title)
VALUES ('CS-A1150', 'Introduction to Databases');

INSERT INTO course_instances (course_id, term, year)
VALUES (
  (SELECT id FROM courses WHERE code = 'CS-A1150'),
  'Spring', 2026
);

INSERT INTO exercises (course_instance_id, title, deadline_at)
VALUES (
  (SELECT id FROM course_instances
   WHERE course_id = (SELECT id FROM courses WHERE code = 'CS-A1150')
     AND term = 'Spring' AND year = 2026),
  'SQL Basics', '2026-05-10 23:59:00'
);

The seed uses SELECT subqueries to look up the foreign-key values rather than assuming that courses.id = 1. That pattern is more verbose but more resilient. If the migrations are ever re-run on a database with other data, or if the order of earlier migrations changes, the lookup still finds the right row.

For simple examples, assuming id = 1 works fine. The shortcut is worth using while the examples are small and only a few rows deep. Once the seed has more than a few rows or more than one table of references, the lookup pattern becomes more reliable than counting on sequence state.

Loading Exercise...

Migration Helper

In the project scaffold, the migration helper looks through the migrations/ directory, applies any SQL files that have not yet been recorded in a small tracking table called schema_migrations, and leaves already-applied files alone.

The container startup script runs the helper before the application starts:

python3 -m app.migrate
exec python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

This means the project can be started repeatedly without recreating everything from scratch each time. It also means that a fresh environment produces exactly the same schema as an existing one, as long as both see the same migration files.

A typical local workflow looks like this:

  1. run docker compose up --build,
  2. let PostgreSQL become healthy,
  3. let start.sh call python3 -m app.migrate,
  4. let the migration helper apply each unapplied .sql file in order,
  5. open the application and confirm that the seeded rows appear on a page such as /decks.

That sequence is worth reading carefully because it connects the design, the SQL, and the running software into one reproducible chain.

In real projects, the migration helper is often a more complex tool that can also roll back changes, run tests, and manage multiple environments. The scaffold’s helper is a simple version that focuses on the core idea of applying migrations in order and tracking which ones have been applied, without the need for an external dependency.

One Migration Rule

Never edit a migration file that has already been applied.

The scaffold tracks applied migrations by filename. When the helper runs, it looks up each filename in schema_migrations and skips any that are already there. If a file has been applied once and is then edited, the helper still sees the filename as applied and does not re-run it. The edit appears to do nothing.

In generic migration tools, migrations are typically tracked by a hash of the file contents and the filename. That approach allows edits to be detected. Edited files that have already been applied typically lead to a failed migration and an error.

When the failure is silent, it can be confusing. A student edits 003_tags.sql to add a column, runs docker compose up --build, checks the table, and sees no change. The migration file says the column should be there, but the database says it is not. The helper is correct; it applied the original file, and the edit was never seen.

The right pattern is to add a new migration file for any change after the original has been applied:

  • 003_tags.sql — the original (already applied),
  • 004_tags_add_color.sql — the change.

The new file adds the column, the helper sees a new filename, and the change takes effect.

Loading Exercise...

Verifying That Migrations Ran

After the migration helper reports success, it is worth confirming that the schema actually matches expectations. Useful quick checks include:

  • a route renders the seeded rows,
  • a follow-up SELECT returns the expected records,
  • the application logs clearly show that the schema and seed files were applied,
  • \dt inside psql lists the expected tables.

Do not stop at “the command finished.” A migration can report success while producing an unexpected result, for example if a seed file quietly failed but did not abort the whole run. A running page is the most reliable verification target, because it exercises the schema and the seed together.

Loading Exercise...

Why Reproducibility Matters

Once the migrations and seed files have run, the project has a foundation it can rebuild from. A fresh environment produces the same tables, the same starter rows, and the same dependent application pages, as long as both environments see the same migration files. That is what reproducibility means in practice, and it is what makes migrations more than just a way to remember CREATE TABLE statements.

The chain from design to running software now looks like this:

  • conceptual model,
  • relational schema,
  • migration and seed files,
  • seeded data,
  • visible page.

Every step in the chain is documented and repeatable. This is valuable both in teaching and in real projects. In real systems, migration discipline is how teams deploy schema changes without each developer having a subtly different local database.

Loading Exercise...

Check Your Understanding

  1. What is the difference between a schema migration and seed data?
  2. Why is it unsafe to edit a migration file that has already been applied?
  3. Why does a running page make a better verification target than “the command finished without errors”?
Loading Exercise...