Database Design and Modeling

Testing a Design Against Its Questions

Learning Objectives

You can walk a draft design through its expected questions and check whether the structure supports them.
You can recognize when a question cannot be answered without adding something to the model.
You can use the question list as a revision tool rather than only as a specification.

The Database Design Process chapter highlighted that database design should start from the questions the system must answer. The Reading a Case Description chapter added likely application queries to the subject-analysis note. At this point, designs should have questions attached to them. This chapter turns those question lists into concrete checks.

The idea is simple: for every question the design claims to support, trace the path through the model that would answer it. If the path does not exist, the model is missing something. If the path is awkward, the model may be structured in a way that makes the question harder than it should be.

The Basic Method

For each question in the list:

identify the entities the question mentions,
trace the path between them through the diagram,
check that each step in the path corresponds to a real relationship in the model,
check that the attributes needed by the question actually exist on the right entities,
if the path or the attributes are missing, note what needs to be added.

This is almost the same process as what we used to plan join paths before writing a query. The difference is that now you are doing it on a draft design, to find out whether the design is ready, rather than on a finished schema, to write a query.

A Worked Walk-Through

We will use the course-platform design from the previous chapters. Here is the diagram again for reference.

Figure 1 — The course-platform ER diagram we have been refining throughout this part.

And here is the list of likely application queries from the subject-analysis note:

Which exercises belong to a given course instance?
Which students are enrolled in a given course instance?
Which submissions still need grading?
What is the most recent submission from a given student for a given exercise?
Which students have enrolled in a course instance but never submitted any work?
What is the average score for a given exercise?

Let us walk through each question.

Question 1: Which exercises belong to a given course instance?

entities: exercise, course instance,
path: course instance -> exercise (through the one-to-many relationship shown on the diagram),
attributes needed: the course instance identifier to filter by, and some exercise attributes to show (probably title and deadline),
verdict: supported.

Question 2: Which students are enrolled in a given course instance?

entities: user, course instance,
path: course instance -> enrollment -> user,
attributes needed: the course instance identifier, and some user attributes to show (probably name and email),
verdict: supported.

Question 3: Which submissions still need grading?

entities: submission,
path: this is a single-table question; no relationships needed,
attributes needed: some way to distinguish ungraded submissions from graded ones,
verdict: this is where we find a gap. The current model for submission in the diagram does not explicitly show the status or score attributes. Without one of them, there is no way to filter for ungraded submissions.

That is a valuable result. We have just discovered that the diagram has been implicitly assuming submission has a score or a status, but neither has been made visible. Either we add status (with values like submitted and graded) or we rely on score IS NULL. Whichever we choose, the attribute needs to appear in the model so that the question can be answered.

Question 4: What is the most recent submission from a given student for a given exercise?

entities: submission, user, exercise,
path: user -> submission <- exercise with filters for this specific user and this specific exercise,
attributes needed: the user identifier, the exercise identifier, a timestamp on the submission to order by,
verdict: supported if submission has a submitted_at attribute.

Worth noting: the earlier case description said “a student may resubmit, and only the most recent submission counts for grading.” This question maps directly to that rule. The design needs to support both the question and the rule, and it does, as long as the timestamp is there.

Question 5: Which students have enrolled in a course instance but never submitted any work?

entities: user, course instance, enrollment, submission, exercise,
path: start from enrollment filtered by the course instance, follow to user, and then check whether that user has any submission for any exercise in the same course instance,
attributes needed: course instance identifier, user identifier,
verdict: supported, but it uses a relatively long path.

This one is worth inspecting. The query needs to cross from enrollment to submission via two separate sides: the user side and the course-instance side. The design supports it because the relationships are all there, but it is a good example of a question whose answer depends on the right combination of relationships being in place.

Question 6: What is the average score for a given exercise?

entities: submission, exercise,
path: exercise -> submission with an aggregate over the score,
attributes needed: exercise identifier, score on submission,
verdict: same issue as Question 3. The model is implicitly assuming score exists on submission but the diagram has not shown it.

Loading Exercise...

What the Walk-Through Told Us

Going through the six questions produced one concrete revision need: the diagram should show the status and score attributes on submission. Without them, two of the six questions cannot be answered.

It also reassured us that the relationship structure is complete. Every question that needed a multi-entity path found that path in the diagram. No new entities or relationships need to be added for these six questions.

This is exactly what the check is supposed to produce: a small concrete revision list, or reassurance that the design is solid. In either case, the result is useful.

Expect to find something

The first time you walk a design through its questions, you will usually find at least one gap. That is normal. Designs without gaps are almost always designs that have not been checked.

Loading Exercise...

When a Question Cannot Be Answered

If a question cannot be answered from the current model, there are usually three kinds of fixes:

Add an attribute. The model has the right entities and relationships, but one of them is missing a property that the question needs. Question 3 above was this kind of case.

Add a relationship. The entities exist, but the connection between them has not been modeled. For example, if the course-platform design did not have an enrollment structure, Question 2 would be unanswerable even though both user and course instance exist as entities.

Add an entity. The model has not recognized that a whole thing needs to be represented. This is rarer at the checking stage because entity-level omissions are usually caught earlier, but it does happen. For example, if a new question asked “who graded each submission,” the design would need either a grader attribute on submission or a richer representation of grading as an event.

Each of these fixes is relatively easy to make at the conceptual stage. All of them are harder to make after the schema has been implemented, because tables have data and applications have code that depends on the current shape.

When a Question Is Awkward

Sometimes the path exists, but it feels indirect. That is not always a design problem. Some questions are genuinely about longer paths in the domain, and the model is right to reflect that.

But if a question that users ask frequently requires a four-step path when a two-step path would be possible with a small design change, the design might be worth reconsidering. Frequently asked questions deserve shorter paths, all else being equal.

This is a softer judgment than the earlier cases. A single awkward-feeling question is not usually a reason to revise the model. A pattern of awkward questions often is.

Loading Exercise...

Contrastive Example: The Library Loan Question

The library case from the Practicing Subject Analysis chapter is a good example of how this check helps.

One of the expected questions was “what is the borrowing history of this member?” In the diagram from the previous chapter, this works: start from member, follow to loan, follow from loan to copy, and from copy to book. Four entities, three relationships, one path.

Consider an alternative design where the loan stored only the member and not the copy. Then the question “which book did this member borrow?” could not be answered directly. The design would have to be revised.

The walk-through surfaces these issues before they become schema problems.

Loading Exercise...

Use the Question List as a Revision Tool

The question list is not only a specification for what the system must do. It is also a tool for improving the design.

A useful habit is to revisit the question list at every revision of the diagram. Every time the model changes, walk through the questions again. A change that looks like a small improvement on one part of the diagram can silently break support for an unrelated question. Catching that breakage early is much easier than catching it after the schema is in use.

Check Your Understanding

Why is it useful to walk a design through its expected questions before calling it done?
What are the three main kinds of fixes when a question cannot be answered?
Why is an awkward path not always a design problem, but a pattern of awkward paths usually is?

Design Exercise

Use this as a two-step design check.

First revise the diagram structure so the missing question-path becomes explicit. Then explain, in plain language, why your revision closes the gap.

Loading Exercise...

← Entity-Relationship Diagrams

Keys, Identifiers, and Stability →