Database Design and Modeling

Practicing Subject Analysis


Learning Objectives

  • You can apply the three-pass reading method to a case description you have not seen before.
  • You can produce a compact subject-analysis note that is organized by category rather than as one long list.
  • You can recognize when your own note is still hiding design decisions inside vague words.

In the previous chapter, we walked through subject analysis on one case. That is a good starting point. Subject analysis is a judgment skill, however, and judgment gets better with repetition on varied domains. This chapter gives you two new cases to practice on.

The two cases are intentionally different from each other and from the course-platform example. A recipe app and a library lending system ask the designer to notice different things. The goal is not to memorize the course-platform domain but to build a habit that transfers.

Working On This Chapter

For each case below:

  1. read the case description twice quickly to get the sense of the domain,
  2. do the three-pass reading from the previous chapter,
  3. produce a compact note with the same sections we used last time,
  4. then read the commented worked note that follows and compare it with your own.

The worked notes are not meant to be the only correct answer. Subject analysis usually has more than one defensible reading. What matters is whether your own note organizes the domain clearly and makes its uncertainties visible.

Loading Exercise...

Do not peek

The worked notes are genuinely more useful after you have tried the case yourself. It is tempting to read them first, but the skill only transfers if you have already committed to some answers before you see the reference.

Case 1: A Small Recipe App

Here is the first case description:

A recipe app stores recipes written by users. Each recipe has a title, a short description, a list of ingredients with quantities, and a sequence of preparation steps. The same ingredient, such as flour or sugar, can appear in many recipes, and one recipe can use many ingredients. Users can mark other users’ recipes as favorites so that they can find them again quickly. A recipe belongs to exactly one user, and that user can later edit or delete it.

Before reading on, try the three-pass reading yourself. What are the main things? What are the connections? What is the case description leaving unsaid?

The exercise below repeats the case description and asks you to write a compact subject-analysis note for it.

Loading Exercise...

A Commented Worked Note

Here is one reasonable note for the recipe app, with commentary on the non-obvious choices.

  • candidate entities: user, recipe, ingredient, preparation step

    Commentary: recipe and user are clearly entities. ingredient is also an entity because the same ingredient appears across many recipes and has a stable identity (flour is always flour). preparation step is a borderline case. It could be an entity with its own identity, or it could be modeled as an ordered list stored on the recipe. We will leave it as a candidate entity for now and revisit it later.

  • candidate relationship concepts:

    • a recipe uses many ingredients, and an ingredient appears in many recipes (many-to-many),
    • a user favorites many recipes, and a recipe is favorited by many users (many-to-many),
    • a recipe belongs to one user (one-to-many from user to recipe)
  • relationship with its own attribute: the connection between recipe and ingredient carries a quantity. That quantity does not belong to the recipe on its own, and it does not belong to the ingredient on its own. It belongs to the connection between a specific recipe and a specific ingredient. This is a strong hint that the relationship itself will need its own row in the later schema.

  • candidate events: none obvious. Editing and deleting a recipe are actions on stored data, but the case description does not say the system must remember the history of those actions.

  • business rules:

    • a recipe belongs to exactly one user,
    • only that user can edit or delete the recipe,
    • an ingredient may appear in many recipes,
    • a user may favorite many recipes
  • open questions:

    • are preparation steps their own entity, or an ordered list attached to the recipe?
    • are ingredient units (grams, cups, pieces) stored on the ingredient or on the recipe-ingredient connection?
    • can a user favorite their own recipe?
    • can the same ingredient appear twice in the same recipe (for example, “flour (for dough)” and “flour (for dusting)”)?
    • what happens to a user’s recipes if the user account is deleted?
  • likely application queries:

    • all recipes by this user,
    • all recipes that use this ingredient,
    • my favorite recipes,
    • ingredients and quantities for this recipe

Notice how different this note looks from the course-platform note. The recipe app has a genuine many-to-many with an attribute on the relationship (quantity), which the course platform did not have in the same clear way. That difference is an excellent example of why practicing on varied domains helps learning.

Case 2: A Small Library Lending System

Here is the second case description:

A public library keeps a catalogue of books. For each book in the catalogue, the library may own several physical copies. Members can borrow one copy at a time of a given book. When a copy is borrowed, the library records who borrowed it and when. When the copy is returned, the return date is recorded as well. Copies that are not returned within four weeks are considered overdue, and the system should be able to list the overdue loans so that staff can contact the member.

Try the three passes yourself before reading on. Like before, the exercise below repeats the case description and asks you to write a compact subject-analysis note for it.

Loading Exercise...

A Commented Worked Note

Here is one reasonable note for the library, again with commentary.

  • candidate entities: member, book (the catalogue entry), copy (a physical item), loan

    Commentary: the most important observation in this case is that book and copy are two different things. A book in the catalogue is a title such as “The Hobbit.” A copy is one physical object on the shelf. The library owns many copies of the same book. If we collapse them into one entity, we lose the ability to ask which physical copy is currently borrowed.

  • candidate relationships:

    • a book has many copies (one-to-many),
    • a copy may have many loans over time (one-to-many from copy to loan),
    • a loan belongs to exactly one member (many-to-one from loan to member)

    Commentary: notice that there is no direct relationship between book and member. A member borrows a copy, not a book. The connection from a member back to a book goes through the copy and the loan. This is the kind of path that Part 4 taught us to trace.

  • candidate events: borrowing, returning. Each of these is naturally an event with a timestamp.

    Commentary: in this domain, a loan is more naturally modeled as an event than as an attribute. The loan has a start date, possibly a return date, and it relates a member to a copy. An attribute on copy called “currently borrowed by” would work for the current state but would not let us ask who borrowed the copy a month ago.

  • business rules:

    • a member borrows one copy at a time of a given book,
    • a loan has a borrow date and possibly a return date,
    • a loan is overdue if not returned within four weeks
  • open questions:

    • can a member borrow two different copies of the same book at once (probably not, but the sentence is ambiguous)?
    • is “four weeks” a fixed rule or a per-book policy?
    • is “overdue” stored as a flag, or computed from dates?
    • what happens to old loan records after a copy is returned?
    • do we record which staff member processed the loan?
  • likely application queries:

    • which copies of this book are currently available?
    • which loans are overdue right now?
    • borrowing history of this member,
    • borrowing history of this copy
  • design hint: whether “overdue” is computed from dates or stored as a flag matters. If it is stored as a flag, the system must remember to update the flag when time passes. If it is computed, the system only needs the dates and the rule. Computed is usually easier to keep correct, but that is a decision for later.

Notice how this note surfaces two decisions that the case description did not make explicit: the book/copy distinction and the stored-vs-computed question for “overdue.” Those decisions will affect the schema substantially, and writing them down now means they will not get lost.

What These Two Cases Show Together

The recipe app and the library have very different shapes. The recipe app’s interesting feature is a relationship with its own attribute (quantity). The library’s interesting feature is that two entities that sound similar (book and copy) must actually stay separate.

These two patterns — “the relationship has an attribute” and “two similar-sounding things are really different” — appear in many domains. If you can spot them reliably on unfamiliar cases, a large part of subject analysis is already in place.

You may also notice that both cases leave things unsaid. Real case descriptions are almost always incomplete, because the people writing them know their domain well enough that they do not think to mention everything.

A useful habit is to read the case description as if you were a stranger to the domain. What would a stranger need explained? Those are usually the ambiguities worth writing down.

Same method, different domains

The three-pass reading is the same every time. What changes is which questions the domain surfaces. That is why practicing on varied domains matters more than practicing on the same domain repeatedly.


Loading Exercise...

Check Your Understanding

  1. Why is it useful to record that a relationship has its own attribute, such as “quantity” in the recipe case?
  2. Why are “book” and “copy” two different entities in the library case?
  3. Why does writing down ambiguities lead to better designs than resolving them by guessing?