Schemas and Constraints

Creating Tables with SQL


Learning Objectives

  • You can write basic CREATE TABLE statements from a table plan.
  • You can choose reasonable data types for the columns in a design.
  • You can read a CREATE TABLE statement as a record of design decisions.

Once the table plan is clear, the next step is to write the SQL that creates those tables.

In this course, SQL appears in three main roles:

  • as a query language for reading data,
  • as a data definition language (DDL) for creating and changing tables,
  • and as a modification language for changing stored rows.

This chapter focuses on the second of those roles.

A Basic CREATE TABLE Example

The CREATE TABLE statement defines a new table in the database. For example, the courses table from the course-platform example can be created with this statement:

CREATE TABLE courses (
    id SERIAL PRIMARY KEY,
    code TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL
);

This statement defines a table, its columns, and a few core design choices.

Read it slowly:

  • CREATE TABLE courses says what kind of thing is being stored,
  • id SERIAL PRIMARY KEY says each row gets a stable identifier, and the database will assign it automatically,
  • code TEXT NOT NULL UNIQUE says every course must have a code and no two courses can share one,
  • title TEXT NOT NULL says every course must have a title.

Every word here corresponds to a decision the design package made. The SERIAL choice matches the “surrogate id” rationale. The NOT NULL on code and title matches the domain rule that every course must have both. The UNIQUE on code comes straight from the key-choice rationale: surrogate id for internal relationships, with the natural value (code) kept as a separate UNIQUE attribute.

Tables with relationships are created in the same way, with the addition of a REFERENCES clause to turn an ordinary column into a foreign key. For example, the course_instances table can be created with this statement:

CREATE TABLE course_instances (
    id SERIAL PRIMARY KEY,
    course_id INTEGER NOT NULL REFERENCES courses(id),
    term TEXT NOT NULL,
    year INTEGER NOT NULL
);

Now the schema contains both structure and a relationship. The course_id column is the one-to-many relationship from the table plan that now becomes visible in SQL.

Three details are worth noticing:

  • course_id INTEGER matches the type of courses.id, which SERIAL creates as an integer,
  • NOT NULL matches the domain rule that every course instance must belong to a course,
  • REFERENCES courses(id) tells the database to check that the referenced row exists.

The next chapter explains what else the database does with that reference. For now, the important point is that REFERENCES turns an ordinary integer column into a tracked relationship.

Loading Exercise...

One More Step: Exercises

Let’s add one more table to the schema, to show how the structure grows as the design package is implemented. The exercises table can be created with this statement:

CREATE TABLE exercises (
    id SERIAL PRIMARY KEY,
    course_instance_id INTEGER NOT NULL REFERENCES course_instances(id),
    title TEXT NOT NULL,
    deadline_at TIMESTAMP NOT NULL
);

With three related tables in place, the schema already starts reflecting how the platform works in practice.

It also enforces the order from the previous chapter. courses must be created before course_instances, and course_instances must be created before exercises, because the later tables refer to the earlier ones.

Choosing Data Types

A small practical set of types covers most cases.

  • INTEGER or SERIAL for identifiers and whole-number values,
  • TEXT for free-form text of any length,
  • BOOLEAN for true/false values,
  • DATE for calendar dates without a time component,
  • TIMESTAMP for date-time values,
  • NUMERIC for decimal values where precision matters, such as money or measurements.

The right type helps both correctness and later querying. For example, deadline_at TIMESTAMP communicates something different from deadline_at TEXT. The timestamp version tells both the database and the reader that the value is a real date-time, not just a piece of text that happens to look like one. With a real timestamp type, later queries can order by time, compare against the current moment, and compute intervals.

TEXT is the default for strings in PostgreSQL

DBMSs allow storing varying-length character strings in fixed-length CHAR(n) or variable-length VARCHAR(n) columns. Internally, in PostgreSQL, both of those types are implemented as variable-length strings, so the length limit is only a constraint on the values, not a storage optimization. The TEXT type is a more flexible alternative that does not require a length limit. It is the idiomatic choice for free-form text.


Loading Exercise...

Working Through the Plan

The table plan from the previous chapter becomes DDL by working through it one table at a time, respecting the dependency order. Two tables in the plan are worth a closer look beyond the simple courses and course_instances cases above.

The enrollments table has a composite primary key:

CREATE TABLE enrollments (
    user_id INTEGER NOT NULL REFERENCES users(id),
    course_instance_id INTEGER NOT NULL REFERENCES course_instances(id),
    joined_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, course_instance_id)
);

When a primary key spans more than one column, it has to be declared at the table level rather than inline on one column. The joined_at column shows another small piece of SQL syntax: DEFAULT CURRENT_TIMESTAMP tells the database to fill in the current time when the application does not supply a value.

The submissions table has a column that is allowed to be NULL:

CREATE TABLE submissions (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    exercise_id INTEGER NOT NULL REFERENCES exercises(id),
    submitted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    score INTEGER,
    status TEXT NOT NULL
);

submissions.score is the only nullable column in the schema. That is because the score is absent until the submission has been graded. NOT NULL would have forced a placeholder value, which would have been more confusing than a real NULL.

Loading Exercise...

On the other hand, submissions.status is NOT NULL, but the schema does not yet constrain which values are allowed. A CHECK constraint could pin that down. The next chapter discusses when to add such constraints and when to leave them out, and shows the canonical course-platform schema with all of its constraints in place.

Loading Exercise...

Reading DDL as Design Decisions

A CREATE TABLE statement is both syntax and a record of design choices:

  • what the entity is called,
  • what its attributes are,
  • which values are required,
  • which other table it depends on,
  • what happens automatically when a row is inserted.

That is why it is useful to read DDL slowly, as if it were part of the design explanation itself. Once the design package is real SQL, reading the SQL is a way to check that the design survived translation.

For example, reading the enrollments table above:

  • an enrollment connects one user to one course instance,
  • both references must point to real rows,
  • the enrollment has a joined_at timestamp that defaults to the current time,
  • the pair (user_id, course_instance_id) uniquely identifies one enrollment.

Each of those statements is a design decision that the earlier design rationale should have justified. The DDL is the machine-readable form of the rationale.

CREATE TABLE does not ask the database for rows. It changes the structure that later queries will use.

A default value such as DEFAULT CURRENT_TIMESTAMP is part of the design too. It decides what happens when the application is silent about a column. Most inserts into enrollments will happen at the moment of enrollment, and the default removes the need for the application to compute the timestamp itself. Like other design decisions, defaults are worth being deliberate about.

Check Your Understanding

  1. What design decisions can you read directly from a CREATE TABLE statement?
  2. Why is the chosen data type part of the design, not only syntax?
  3. When is VARCHAR(n) a more appropriate choice than TEXT?

SQL Programming Exercise

Here, you get to practise SQL for creating a schema, starting with a single table. The exercise sequence uses the same course example as the materials do. Do not directly copy-paste what you saw earlier. Instead, write the statements based on the handout.

Loading Exercise...