Database Design and Modeling

Keys, Identifiers, and Stability

Learning Objectives

You can explain why every main entity needs a stable identifier.
You can distinguish natural keys, surrogate keys, and composite keys in practical terms.
You can justify a key choice based on stability, uniqueness, and expected use.

Identifiers are not only a SQL syntax issue. They are a design decision that affects every relationship and every future update in the schema.

At the beginning of the course, primary keys appeared as a feature of existing tables. In the next part, they will appear as constraints in CREATE TABLE statements. Between those, in this part, the question is more fundamental: what should identify each entity, and why?

Entities Need Identifiers

If the system needs to refer to a course, a user, a submission, or any other entity, it helps if that thing has a stable identity.

Without a stable identifier, the design becomes awkward quickly. How do we distinguish two rows with the same name? How do other tables refer to this row? How do we update one chosen row reliably if the value that identifies it can change at any time?

Every relationship in the ER diagram implicitly depends on identifiers. When the course-platform diagram shows that a submission belongs to a user, the submission row will eventually carry the user’s identifier in some form. The choice of identifier decides what that reference looks like.

Loading Exercise...

Natural Keys

A natural key is based on meaningful data already present in the domain.

Examples might include:

a course code,
a unique institutional email,
a national identifier,
an ISBN for a book.

Natural keys can be elegant when they are truly stable and unique. A course code can be a strong natural candidate if the institution guarantees that one course keeps the same code over time. An ISBN is a good natural key for books because it is designed to be globally unique and stable.

Natural keys have one major benefit: they are meaningful. If a query returns the course code CS-A1150, a reader who knows the domain immediately understands which course this is. With a surrogate identifier like 42, no such recognition is possible without looking up the row.

Natural keys also have one major risk: if the value ever changes, every reference to that entity elsewhere in the schema must change with it. That is often more work than the benefit is worth.

Loading Exercise...

Surrogate Keys

A surrogate key is an artificial identifier created for the system, usually an integer id generated automatically.

Depending on the case, surrogate keys might also be UUIDs, strings, or other opaque values. The key point is that they have no meaning outside of their role as identifiers.

This is very common in application databases because it gives a simple stable reference even if names or other business values change later. The id has no meaning outside of its role as an identifier, which is why it never needs to change.

In the course-platform example, every main entity has a surrogate key: courses.id, users.id, exercises.id, and so on. The meaningful attributes like code, email, and title are still stored, but the relationships between tables use the surrogate identifier, not the meaningful value.

Surrogate keys have one trade-off worth naming. When you see a foreign key column like course_id = 42, the value does not tell you anything about which course this is. You have to look it up. Natural keys would show course_code = 'CS-A1150' directly. This is a small inconvenience, but, for large systems, it is usually worth the trade-off because changeable natural values cause more trouble than opaque surrogate values do.

Both at once is normal

Many designs use surrogate keys as primary keys and also enforce uniqueness on natural identifiers through a separate constraint. The id handles the relationships; the UNIQUE constraint on code or email prevents duplicates of the natural value.

Loading Exercise...

Composite Keys

Sometimes identity depends on more than one column.

The most common example is a linking table for a many-to-many relationship. An enrollment might reasonably be identified by (user_id, course_instance_id), because what uniquely identifies an enrollment is the pair of one specific user and one specific course instance.

Other examples appear when the natural identifier of an entity only makes sense in context. A seat in a theatre might be identified by (row_number, seat_number) because either value alone is ambiguous.

Composite keys work well when the combination is truly what identifies the row. They are less convenient when other tables need to reference the entity, because the reference must also be composite. That is one reason some designs use a composite UNIQUE constraint along with a simple surrogate id: the uniqueness rule captures the identity, and the id makes references simpler.

Loading Exercise...

A Worked Key Discussion

Consider these choices in the course-platform example.

courses.id: surrogate key. The table uses an integer id as the primary key. This is simple, stable, and easy for other tables to reference.

courses.code: natural candidate. The course code is a natural value that should be unique. Rather than making it the primary key, the design could enforce uniqueness through a UNIQUE constraint and still uses the surrogate id for references. This protects against the cost of a course code ever changing.

users.id: surrogate key. Same reasoning as for courses.id.

users.email: natural candidate, with caution. Email is a good candidate for uniqueness, but many systems let users change their email address. A surrogate id for references and a UNIQUE constraint on email handles both concerns.

enrollments(user_id, course_instance_id): composite key. The relationship itself is identified by the pair. Either user_id or course_instance_id alone does not identify one enrollment. The composite key captures this naturally.

This kind of mixed strategy is common in real systems. One table does not force every other table to make the same key choice.

Design Trade-offs

There is no single perfect key rule for every case. Useful practical questions include:

is the value guaranteed to be unique?
can it change later?
will many other tables need to refer to it?
would a simpler surrogate key make the design easier to maintain?
does the domain already provide a stable identifier that the system can adopt?

Good key choices make later foreign keys, joins, and updates much more straightforward. Weak key choices often surface as awkward updates or cascading rewrites that could have been avoided.

Contrastive Example: Name as Identifier

Using name as the identifier for users would be a poor choice. The concrete problem is the cascading update: if a user changes their display name, and name is the primary key, then every foreign key row that referenced that user — every submission, every enrollment, every favorite — must also update its stored name value. In a small table that is annoying. In a real application it is a correctness risk, because if any of the cascading updates is missed, the data silently becomes inconsistent.

A surrogate id avoids the problem entirely. The name column can still exist as an ordinary attribute, and changing it affects only one row. The relationships do not care, because they reference the stable id instead.

Meaningful is nice, stable is essential

Human-readable values are pleasant to work with, but stable values are what relationships need. When these goals conflict, stability usually wins, and the meaningful value becomes a separate UNIQUE attribute.

Check Your Understanding

What is the difference between a natural key and a surrogate key?
When can a composite key make sense, and when is a surrogate key plus a UNIQUE constraint often simpler?
Why should key choice be justified by stability and expected use, not only by what looks readable?

AI Study Aid

Create a chapter diagram

We're looking into to what extent AI-generated diagrams could help with studying.

Use this study aid to generate an AI-generated visual summary of the material you just studied. Each diagram style emphasizes the content in a different way, so you can choose the focus that feels most useful.

Using the diagram generator is voluntary and does not affect your course progress. We encourage you to try it out and see if it helps, but it's totally up to you! Your answers help us understand how to make better study aids in the future.

Diagrams tried: 0

Design Exercise

Apply the chapter’s trade-off logic to a compact model.

For each entity, choose an identifier strategy and justify it with stability, uniqueness, and expected reference use.

Loading Exercise...

← Testing a Design Against Its Questions

Design Patterns and Anti-Patterns →