Summaries and Derived Data

Summary Queries in Applications

Learning Objectives

You can explain the difference between stored and derived data, and why this part defaults to deriving.
You can connect grouped queries to application features such as dashboards and summary pages.
You can decide when summary work belongs in the database, the route, or the template.
You can structure several related summary queries cleanly, either as multiple small queries or as one larger CTE-based query.

So far in this part, the focus has been on summary-oriented SQL itself. The next practical question is where that SQL belongs in a working application.

A summary page in an application is rarely just one query. A dashboard might show the deck count, the card count, the review count, the average rating per deck, and a small chart of recent activity. Each of those is a summary; together, they are a feature. Building such a feature well is not just about writing the SQL — it is about deciding which queries to run, where to compose them, and how to keep the route, the SQL, and the template understandable.

This chapter brings the summary toolkit from the previous chapters into application work. Before turning to the route and template structure, though, it is worth pausing on a design question that has been implicit throughout this part.

Stored vs. Derived Data

By this point in the part, you have written many summary queries. Some of those queries return values that the application uses repeatedly: the number of cards in a deck, the average rating per deck, the count of active loans per member. A natural question arises: should those values live in the database as stored columns, or should they be computed every time they are needed?

Stored data is kept explicitly in the database as a column value. It is written when a row is created or updated and read back as-is. A course title, an exercise deadline, a submission timestamp, a card’s question text — these are facts the system needs to remember directly. There is no way to compute a course title from anything else; it has to be stored.

Derived data is computed from stored data when needed. The number of submissions for an exercise, the average score across all submissions for a course, the count of cards in each deck — each of these can be computed from the underlying tables. The storage layer holds the facts; the query layer derives the summaries.

The choice between storing and deriving is a trade-off across three goals:

Consistency. Stored data can drift from reality. If decks.card_count is a stored column that is supposed to equal the number of cards in the deck, then every change to the cards table has to also update decks.card_count. A bug, a missed update, or a concurrent write can leave the stored count wrong. With derived data, this category of bug does not exist, because the count is always computed fresh.
Convenience. Stored data is easy to read. A query for “show me all decks with their card counts” is simpler when the count is already there as a column. With derived data, the query has to join, group, and count.
Performance. Stored data is faster to read. Computing a summary across thousands or millions of rows takes time; reading a precomputed value takes no time. For a high-traffic dashboard, this difference can matter.

Storing favors convenience and read performance. Deriving favors consistency. Choosing between them is a trade-off, not a universal rule.

This part defaults to deriving, and the next part takes up the trade-off in detail under the name denormalization, including the patterns for keeping a stored summary correct when storing really is the right choice.

Loading Exercise...

A Before-and-After Example

To make the trade-off concrete, consider a team thinking about adding card_count directly to the decks table:

ALTER TABLE decks ADD COLUMN card_count INTEGER NOT NULL DEFAULT 0;

Before storing it:

the count is computed with SELECT COUNT(*) FROM cards WHERE deck_id = {deck_id},
the only source of truth is the cards table,
adding, updating, or deleting a card just works,
the deck list page runs one extra grouped query.

After storing it:

the deck list page can read decks.card_count directly,
but every insert into cards must also UPDATE decks SET card_count = card_count + 1 WHERE id = {deck_id},
every delete from cards must decrement,
moving a card from one deck to another must decrement one and increment another,
if any of these updates is missed (in a route, in a script, in a future feature), the stored count drifts from reality,
and detecting the drift later requires re-running the derived query and comparing.

For a project where the count can be derived in a few milliseconds, the second scenario is much more work for a benefit that may not be needed. That is why “derive first, store reluctantly” is a reasonable default for this part: until measured pressure shows that deriving is too slow, derivation is simpler and more correct.

With that design note settled, the rest of the chapter focuses on how summary queries actually live inside route code.

A Dashboard for the Study Tracker

Suppose the study tracker should have a dashboard page that shows, at a glance:

the total number of decks,
the total number of cards,
the total number of reviews,
the average rating across all reviews,
a per-deck breakdown of card counts and review counts,
a small list of recent review activity.

That is a handful of queries. None of them are individually complex, but combining them well requires choosing how the route loads the data.

One Query Per Number, or One Combined Query?

Two strategies are possible.

Strategy A: Several small queries. Each summary is its own focused query. The route runs all of them and passes the results to the template.

@app.get("/dashboard")
def dashboard(request: Request):
    with get_connection() as conn:
        deck_count = conn.execute(
            "SELECT COUNT(*) AS deck_count FROM decks"
        ).fetchone().deck_count

        card_count = conn.execute(
            "SELECT COUNT(*) AS card_count FROM cards"
        ).fetchone().card_count

        review_count = conn.execute(
            "SELECT COUNT(*) AS review_count FROM reviews"
        ).fetchone().review_count

        average_rating = conn.execute(
            "SELECT COALESCE(ROUND(AVG(rating), 2), 0) AS average_rating FROM reviews"
        ).fetchone().average_rating
        per_deck = load_per_deck_breakdown(conn)
        recent_reviews = load_recent_reviews(conn)

    return render(
        request,
        "dashboard.html",
        deck_count=deck_count,
        card_count=card_count,
        review_count=review_count,
        average_rating=average_rating,
        per_deck=per_deck,
        recent_reviews=recent_reviews,
    )

This is verbose but easy to understand. Each query has one purpose. If something is wrong with the average rating, only one query has to be inspected. If a new summary is needed, a new small query can be added without touching the others.

Strategy B: One larger query with CTEs. The summary numbers can also be combined into one query that uses CTEs to compute each piece and then assembles the results.

WITH deck_stats AS (
  SELECT
    d.id, d.name,
    COALESCE(cc.card_count, 0) AS card_count,
    COALESCE(rc.review_count, 0) AS review_count
  FROM decks AS d
  LEFT JOIN (
    SELECT deck_id, COUNT(*) AS card_count
    FROM cards
    GROUP BY deck_id
  ) AS cc ON cc.deck_id = d.id
  LEFT JOIN (
    SELECT c.deck_id, COUNT(*) AS review_count
    FROM cards AS c
    INNER JOIN reviews AS r ON r.card_id = c.id
    GROUP BY c.deck_id
  ) AS rc ON rc.deck_id = d.id
)
SELECT * FROM deck_stats ORDER BY name;

This particular query handles the per-deck breakdown. Other summaries (totals, averages, recent activity) would each be one or two more queries.

Combining everything into one giant query is rarely a good idea, because the single query becomes hard to read and almost impossible to debug. Strategy A’s “several small queries” wins for clarity, while Strategy B — when several summaries share intermediate computation — reduces unnecessary computation that would otherwise be repeated.

Loading Exercise...

Several Small Queries vs. One CTE-Based Query

For dashboards that are used infrequently and where performance is not critical, the practical solution often is building a small number of focused queries, each computing one or a few related summaries. Each query stays short enough to read; the route stays short enough to follow.

Then again, if the dashboard is used frequently and performance is critical, a single CTE-based query that computes everything in one go can be the right choice. The decision is a trade-off between readability and performance, and it depends on the specific context.

In some contexts, it might be also acceptable to show stale data, which again changes the trade-off. The important thing is to make the trade-off consciously, rather than just writing one big query because it seems like the “right” way to do it.

Do Not Count Everything in Application Code

A beginner-friendly alternative to summary SQL is to compute summaries in Python:

decks = fetch_all_decks()
cards = fetch_all_cards()
deck_card_counts = {}
for card in cards:
    deck_card_counts[card.deck_id] = deck_card_counts.get(card.deck_id, 0) + 1

This works for tiny datasets and, even there, is a bad practice that one should avoid. For anything larger, it creates concrete problems:

First, it transfers more data than needed. If a deck has 10,000 cards, the application loads all 10,000 to compute one number. The database could have computed the count without sending any of the cards.

Second, it pushes work the database is good at into a layer that is less specialized. DBMSs have decades of optimization for grouped queries. Programming languages do not.

Third, it scatters logic. A count computed in the application is a count that lives outside the schema. If another route needs the same count, it has to re-implement the same logic. A SQL summary can be expressed once and reused.

When the question is naturally a summary question, SQL is almost always the clearer home for the work. A programmatic loop approach makes sense only when the summary depends on application logic that SQL cannot express, which is rare.

Counting and grouping belong in the database. Rendering and presentation belong in the template. The route is the thin layer in the middle that connects them. When summary logic creeps into the route or the template, it is usually a sign that the SQL could have done more.

Loading Exercise...

Templates Should Display, Not Compute

Once the route has loaded the summaries, the template’s job is to render them. Templates should not compute averages, look up totals, or perform aggregations of their own. Even Jinja, which has loop constructs, is the wrong place for summary work.

A clean dashboard template might look like:

<h1>Dashboard</h1>

<section class="totals">
  <p>Total decks: {{ deck_count }}</p>
  <p>Total cards: {{ card_count }}</p>
  <p>Total reviews: {{ review_count }}</p>
  <p>Average rating: {{ average_rating }}</p>
</section>

<section class="per-deck">
  <h2>Decks</h2>
  <table>
    <thead>
      <tr>
        <th>Deck</th>
        <th>Cards</th>
        <th>Reviews</th>
      </tr>
    </thead>
    <tbody>
    {% for row in per_deck %}
      <tr>
        <td>{{ row.name }}</td>
        <td>{{ row.card_count }}</td>
        <td>{{ row.review_count }}</td>
      </tr>
    {% endfor %}
    </tbody>
  </table>
</section>

<section class="recent">
  <h2>Recent reviews</h2>
  <ul>
    {% for r in recent_reviews %}
    <li>{{ r.deck_name }} / {{ r.question }} -- rating {{ r.rating }}</li>
    {% endfor %}
  </ul>
</section>

The template loops, displays, and formats. It does not compute. Each summary value is already in the shape it needs to be in. The next chapter turns this sketch into a complete worked template, including the empty-list case and the final route wiring.

When the template starts adding totals across the loop or computing percentages, the line between display and computation has been crossed. The fix is usually to move the computation into the SQL and pass the ready value to the template.

Loading Exercise...

Routes Stay Schema-Aware

The route still needs to ask the same design questions as before:

what does one result row mean,
which rows must remain visible (driving the choice of LEFT JOIN),
and which values are stored versus derived?

These questions do not go away when summary queries enter the application. If anything, they matter more, because a summary mistake silently affects every page that displays it.

The route is also where the verification habit from earlier has its biggest payoff. A route that loads several summary queries should at least once be checked end to end: do the displayed numbers actually match what the underlying data says? A quick SELECT COUNT(*) FROM cards in psql should equal the dashboard’s “total cards” number. If they disagree, something is wrong somewhere.

Loading Exercise...

Suitable Dashboard Choices for the Project

Good options for the project’s first summary-driven page include:

a deck overview that shows the number of cards and number of reviews per deck,
a recent-activity page that lists the last several reviews with deck and card context,
a “needs attention” page that highlights cards with low average ratings or cards that have not been reviewed yet,
a per-day activity chart for the last week or month,
or a combined dashboard that brings several of these together.

Each of these draws on the techniques from the previous chapters. A realistic first dashboard usually has three or four such elements, not a single one. The next chapter’s project checkpoint walks through one such page in detail and ends with a project-zip submission.

Loading Exercise...

Check Your Understanding

Why does this part default to deriving summary values rather than storing them as columns?
Why is “several small queries” usually clearer than “one giant combined query” for a dashboard route?
Why should templates display summary values rather than compute them?

SQL Programming Exercise

This sequence keeps the application context but isolates just the SQL side of the work. You will practice the kinds of summary queries that a summary page could load before any route or template code is involved.

Loading Exercise...

← Complex Summary Queries with CTEs

Project Checkpoint: Building a Dashboard →