Schemas and Constraints

Evolving a Schema Safely


Learning Objectives

  • You can describe a safe order of operations for common kinds of schema change.
  • You can write backfill SQL that fills in new columns for existing rows.
  • You can explain why column renames should be staged rather than done in one step.
  • You can recognize when a change is additive and when it requires more care.

Schemas do not stay fixed forever. New features, better designs, and new requirements often require change. This chapter discusses how to use ALTER TABLE and migrations to make schema changes safely.

Altering Table Structure

Table structure can be changed with ALTER TABLE. For example, to add a new column called description to the courses table, the command is:

ALTER TABLE courses
ADD COLUMN description TEXT;

That command changes the existing schema without recreating the whole database from scratch.

Such a change would belong to a new migration file, not in the original CREATE TABLE migration.

Schema Changes Need Care

A schema change can affect existing data, application code, queries, templates, and tests or grading scripts.

The main risk is that a change can leave the database and the application in disagreement. If the schema changes before the application is ready, the application may see columns that do not exist or fail to write rows it expects to save. If the application changes before the schema, the schema may silently accept rows that no longer match what the application sends.

Safe evolution is mostly a matter of sequencing the changes so that the system works at every step, not only at the end.

Additive Changes Are Usually Easier

Some changes are safer than others.

Examples that are often low-risk:

  • adding a new nullable column,
  • adding a new table,
  • adding an index,
  • adding a new default that does not affect existing rows.

Examples that need more care:

  • removing a column,
  • renaming a column,
  • tightening a nullable column to NOT NULL,
  • splitting one column into several.

The difference is whether the change can be applied without any existing data becoming invalid. Additive changes usually leave existing rows alone. Destructive or restrictive changes often require updating existing rows first.

Loading Exercise...

A Safer Order of Operations

A practical pattern for any change that is not purely additive:

  1. add the new structure,
  2. backfill the existing data so the new structure has valid values for old rows,
  3. update the application code to use the new structure,
  4. remove the old structure once nothing depends on it.

This order helps avoid the moments when the database and the application disagree.

For purely additive changes, steps 1 and 3 are usually enough: add the new thing, and start using it. The backfill and removal only apply when old data or old code needs to be transitioned.

A Worked Additive Change

Suppose the platform originally stores only:

  • exercises(id, course_instance_id, title, deadline_at).

Later, the team decides that exercises also need a published_at timestamp so that drafts can exist separately from visible exercises.

The safer change plan is:

  1. add published_at TIMESTAMP as a nullable column,
  2. decide whether old rows need a backfilled value (yes: treat all existing exercises as published, so set published_at to the current time),
  3. update the application code to read and write the new column,
  4. decide later whether published_at should become NOT NULL.

The migration numbers below are illustrative; in the course-platform’s hypothetical migration sequence, this would be the next available number, whatever that happens to be.

The first step is one migration file:

-- 036_exercises_add_published_at.sql
ALTER TABLE exercises
ADD COLUMN published_at TIMESTAMP;

The second step is usually a separate migration:

-- 037_exercises_backfill_published_at.sql
UPDATE exercises
SET published_at = CURRENT_TIMESTAMP
WHERE published_at IS NULL;

Keeping these in separate files matters for a practical reason: a migration that both adds a column and backfills it has two responsibilities, and if the backfill fails, it is harder to reason about what state the database is in. Two separate migrations are easier to reason about and easier to re-run safely.

The application update is not a migration. It is code change that comes out in the same deploy or after the migrations. Once the application code uses the new column, published_at is fully in use.

Later, if the team decides that every exercise must have a published_at, a further migration tightens the constraint:

-- 042_exercises_publish_required.sql
ALTER TABLE exercises
ALTER COLUMN published_at SET NOT NULL;

This migration only works because the earlier backfill already guaranteed that no row has published_at IS NULL. Without that guarantee, the tightening would fail on the existing data. That is exactly why the backfill matters.

Migrations in Transactions

Most migration management systems run each migration inside a transaction. If any statement in the migration fails, the whole transaction rolls back and the database is left unchanged. This means that if a migration file has multiple statements, they either all succeed or all fail together.

Due to this, multiple statements that logically belong together (like adding a column and backfilling it) can be put in the same migration file without risking a half-applied state. However, for clarity and maintainability, it is often better to separate them into different files, as described in the chapter.

A Worked Rename

Renaming a column is a classic example where doing it in one step usually causes trouble.

Suppose submissions.submitted_at should be renamed to created_at, because the team has decided that every time-stamped audit column in the schema should use the same name.

A naive one-step approach:

ALTER TABLE submissions
RENAME COLUMN submitted_at TO created_at;

This is a single SQL statement, and it runs successfully. The problem is that the moment it runs, every application query that still refers to submitted_at starts failing. If the rename is in the same migration as the application update and they deploy together, the rename succeeds but some queries may have been in flight against the old name. For a small course project this may be acceptable, but it is worth learning the safer pattern because it transfers directly to real systems.

The safer pattern is to stage the change through several migrations:

Migration 1: add the new column, copy data across.

ALTER TABLE submissions
ADD COLUMN created_at TIMESTAMP;

UPDATE submissions
SET created_at = submitted_at;

Update the application to write to both columns. This is application code, not a migration. Every INSERT and UPDATE that used to set submitted_at now also sets created_at. Read queries still use submitted_at at this stage.

Update the application to read from the new column. Another code change. Now all reads use created_at; writes still write both.

Migration 2: drop the old column.

ALTER TABLE submissions
DROP COLUMN submitted_at;

Clean up the application so that it only writes created_at.

At every step in this sequence, the schema and the application agree. No query fails because a column moved out from under it. This is more work than the one-step version, but for changes that affect running systems, it is worth the care.

For a local project, where nothing is deployed, the shorter path may be reasonable. What matters is recognizing when each approach fits.

Loading Exercise...

A Worked Tightening

Suppose a column is currently nullable but the design now requires it to be NOT NULL. For example, recipes.description started as optional but the team has decided every recipe must have a short description.

The naive one-step approach:

ALTER TABLE recipes
ALTER COLUMN description SET NOT NULL;

This fails immediately if any existing row has description IS NULL. That is the right behavior: tightening the constraint without handling the existing data would have silently left the table in an invalid state.

The safer pattern is to backfill first:

-- Migration 1: backfill existing rows.
UPDATE recipes
SET description = '(no description)'
WHERE description IS NULL;

-- Migration 2: tighten the constraint.
ALTER TABLE recipes
ALTER COLUMN description SET NOT NULL;

The application should also be modified so that it enforces the rule on new rows already before the migrations are run. This way, no new NULL values should be added by the application after the backfill and before the tightening.

Loading Exercise...

A Worked Split

Sometimes one column needs to become several. For example, if a shipping system currently stores an address as one long string in orders.shipping_address, and a new feature needs to query orders by city and country separately.

The domain here is mechanical: the split rule (“everything before the comma is street, the next segment is city, the last is country”) works for well-formed addresses. That is very different from splitting a human name, which is notoriously hard because names do not always cleanly split into two parts; for such cases the split rule belongs in the application, not the database.

Whatever the split rule, the schema-evolution pattern is the same:

  1. add the new columns as nullable,
  2. backfill them from the existing data (this may be a SQL UPDATE with string functions, or it may require application code if the split rule is complex),
  3. update the application to write to the new columns,
  4. update the application to read from the new columns,
  5. remove the old column,
  6. tighten the new columns to NOT NULL if appropriate.

Each step is a separate migration or code change. At no point do both old and new coexist in a way that the application cannot handle.

Loading Exercise...

What Goes Wrong Without Discipline

A few concrete failure modes are worth naming.

Silent edit of an applied migration. The developer edits an already-applied file to fix a typo. On their own machine, the change has no effect because the helper skips the filename. On a fresh machine, the change does take effect. The two environments now silently differ. The fix is the rule: every change is a new migration file.

Loading Exercise...

Schema-before-code change. The developer renames a column in a migration, restarts the application, and every route that used the old name starts returning 500. The error message (“column does not exist”) is not hard to read, but a deployed system cannot afford the downtime.

Tightening without backfill. The developer adds NOT NULL to a column that has existing NULL values. The migration fails, and the database ends up in a half-applied state depending on exactly how the helper handles the error.

Cascade without thinking. The developer adds ON DELETE CASCADE to a foreign key without checking what depends on the referenced rows. The next delete silently removes more data than expected.

Each of these is avoidable. Each of them comes up often enough in projects that the discipline in this chapter is worth developing early.

Check Your Understanding

  1. Why are additive changes usually safer than changes that remove or restrict structure?
  2. Why does renaming a column benefit from being staged through several migrations and code changes rather than done in one step?
  3. Why does tightening a column to NOT NULL usually require a backfill migration first?

SQL Programming Exercise

Here, you get to practise alter table and backfilling data. The

Loading Exercise...