Summaries and Derived Data

Readable Summary Queries

Learning Objectives

You can use aliases to make summary results easier to interpret.
You can use COALESCE and CASE to make summary output more useful, while staying honest about what the values mean.
You can apply time bucketing patterns to produce reports by day, week, or month.

A summary query that produces the right number is a good start. A summary query that produces the right number with a name and a format that the reader can immediately understand is better. This chapter focuses on the small additions that turn raw aggregates into output that reads cleanly.

These techniques have already been practised in the chapter Simple Expressions in Queries. The difference is that they now apply to grouped output, where the line between data and presentation often matters more, and where unclear output can mislead a reader who has no easy way to inspect the underlying rows.

Use Clear Names

Aliases matter especially in summary queries. A grouped column makes its meaning obvious; an aggregate without an alias does not.

Compare these two styles:

SELECT exercise_id, COUNT(*) FROM submissions GROUP BY exercise_id;

SELECT
  exercise_id,
  COUNT(*) AS submission_count
FROM submissions
GROUP BY exercise_id;

The first produces a column literally called count, which is a generic term that does not tell the reader what is being counted. The second produces a column called submission_count, which tells anyone reading the result exactly what each number represents.

In application code, the alias also becomes the key the template uses. A route returning row.submission_count is much clearer than row.count, and refactoring the query later becomes safer.

Loading Exercise...

The same applies for derived columns:

SELECT
  exercise_id,
  ROUND(AVG(score), 2) AS average_score,
  MAX(submitted_at) AS most_recent_submission_at
FROM submissions
WHERE status = 'graded'
GROUP BY exercise_id;

Each result column has an alias that explains both what it is and, where helpful, what units or shape it has. most_recent_submission_at ends in _at to signal that it is a timestamp, matching the column-naming convention used elsewhere in the schema.

`COALESCE` for Display

When a summary aggregate can return NULL for some groups, COALESCE provides a fallback value. This is most useful in display-oriented summaries where NULL would render as blank or as the literal string “None.”

SELECT
  exercise_id,
  COALESCE(ROUND(AVG(score), 2), 0) AS average_score
FROM submissions
GROUP BY exercise_id;

For an exercise with no graded submissions, AVG(score) would normally return NULL. COALESCE(..., 0) replaces that with 0 for display.

But this transformation comes with a warning. A real average of 0 is not the same thing as “no scores yet.” Both will show as 0 in this query, and the reader cannot tell them apart. If the difference matters, the query should keep NULL and the application should render it as something distinguishable.

A safer pattern when the difference matters is to expose both:

SELECT
  exercise_id,
  ROUND(AVG(score), 2) AS average_score,
  COUNT(score) AS graded_count
FROM submissions
GROUP BY exercise_id;

Here the application can distinguish “average is NULL” (because graded_count is 0) from “average is genuinely 0” (because there are graded submissions all with score 0). The query stays honest, and the application has the information it needs.

A fallback value is a design decision

COALESCE(AVG(score), 0) is convenient, but it also makes a claim about what missing data should look like. That claim should match the reader’s expectations; otherwise, the query is technically correct and practically misleading.

Loading Exercise...

Categories and `CASE`

Values can be turned into categories with CASE. This is useful when the summary should show interpretation, not only raw numbers.

SELECT
  exercise_id,
  CASE
    WHEN AVG(score) >= 8 THEN 'high'
    WHEN AVG(score) >= 5 THEN 'middle'
    WHEN AVG(score) IS NOT NULL THEN 'low'
    ELSE 'no graded submissions'
  END AS performance_tier
FROM submissions
GROUP BY exercise_id;

Reading top to bottom:

if the average score is at least 8, the exercise is in the “high” tier,
otherwise if at least 5, “middle”,
otherwise (meaning average score is below 5), “low” — but only if any graded submissions exist,
otherwise (meaning no graded submissions exist at all), “no graded submissions”.

The IS NOT NULL check distinguishes “low average” from “no data.” Without it, an exercise where every submission is still ungraded would fall into “low” by default, because NULL >= 5 is unknown and would not match the WHEN AVG(score) >= 5 branch.

One nuance worth naming: AVG(score) is computed across the graded submissions within each exercise’s group, because AVG ignores NULL. An exercise with a mix of graded and ungraded submissions is placed into a tier based on the graded ones alone. If the tiering should instead depend on “all submissions, treating ungraded as zero,” the query needs AVG(COALESCE(score, 0)) instead. The two answer different questions, so matching the query to the question matters.

CASE is also useful when the summary needs to bucket non-numeric values:

SELECT
  CASE
    WHEN status = 'graded' THEN 'completed'
    WHEN status IN ('submitted', 'late') THEN 'pending'
    ELSE 'other'
  END AS submission_state,
  COUNT(*) AS submission_count
FROM submissions
GROUP BY
  CASE
    WHEN status = 'graded' THEN 'completed'
    WHEN status IN ('submitted', 'late') THEN 'pending'
    ELSE 'other'
  END;

The same CASE expression has to appear in both SELECT and GROUP BY, because the grouping has to be by the same value the result shows. This is a slightly verbose pattern, and CTEs from the next chapter make it cleaner.

Loading Exercise...

Time Bucketing

Reports often ask for summaries per day, week, or month. The date_trunc function is the most common tool for this:

SELECT
  date_trunc('day', submitted_at) AS submission_day,
  COUNT(*) AS submission_count
FROM submissions
GROUP BY date_trunc('day', submitted_at)
ORDER BY submission_day;

date_trunc('day', ...) rounds a timestamp down to the nearest day. 'week', 'month', 'year', 'hour' all work too, with the corresponding granularity.

A common variation is to filter to a recent window:

SELECT
  date_trunc('day', submitted_at) AS submission_day,
  COUNT(*) AS submission_count
FROM submissions
WHERE submitted_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
GROUP BY date_trunc('day', submitted_at)
ORDER BY submission_day;

This produces one row per day for the last week, with the count of submissions on that day. The pattern is widely used in dashboards.

If the dashboard needs to show every day in the window, including the empty ones, the query has to be combined with a generated date series, which is more advanced and outside the scope of this course.

For most summary needs, the simple form above is enough. Recognizing when it is not enough is part of summary judgment.

Loading Exercise...

Time Bucketing in Different Granularities

The same pattern works at different time scales:

-- Hourly counts for the last day
SELECT date_trunc('hour', submitted_at) AS submission_hour, COUNT(*)
FROM submissions
WHERE submitted_at >= CURRENT_TIMESTAMP - INTERVAL '1 day'
GROUP BY date_trunc('hour', submitted_at)
ORDER BY submission_hour;

-- Weekly counts for the last quarter
SELECT date_trunc('week', submitted_at) AS submission_week, COUNT(*)
FROM submissions
WHERE submitted_at >= CURRENT_TIMESTAMP - INTERVAL '3 months'
GROUP BY date_trunc('week', submitted_at)
ORDER BY submission_week;

-- Monthly counts for the last year
SELECT date_trunc('month', submitted_at) AS submission_month, COUNT(*)
FROM submissions
WHERE submitted_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
GROUP BY date_trunc('month', submitted_at)
ORDER BY submission_month;

The window length should usually match the granularity. Hourly counts over a year would produce 8,760 rows, which is too many for any human to read. Daily counts over a few weeks, weekly counts over a few months, monthly counts over a few years — each combination produces a manageable result.

Combining Aliases, COALESCE, and Time Bucketing

A typical dashboard query brings several of these together:

SELECT
  date_trunc('day', submitted_at) AS submission_day,
  COUNT(*) AS submission_count,
  COUNT(DISTINCT user_id) AS distinct_user_count,
  COALESCE(ROUND(AVG(score), 2), 0) AS average_score
FROM submissions
WHERE submitted_at >= CURRENT_TIMESTAMP - INTERVAL '14 days'
GROUP BY date_trunc('day', submitted_at)
ORDER BY submission_day;

This produces a 14-day activity report with three numbers per day: total submissions, distinct submitting users, and average score. Each column has a clear name. The score average defaults to 0 for display purposes, with the caveat that callers should know the difference between “no scores yet” and “average is genuinely 0.”

This query inherits the no-data-days caveat from the earlier section: a day with zero submissions in the window will not appear in the result at all. For most dashboards, that is acceptable (the reader infers “missing means zero”), but when the dashboard has to show a complete 14-day row even for empty days, the query has to be combined with a generated date series.

A query like this is what most “analytics” pages eventually turn into. The shape — bucket by time, group by something, aggregate several numbers, name them clearly — repeats again and again across domains.

Readability Is Part of Correctness

SQL should not be judged only by whether it runs. A summary query is also part of communication.

If the result is hard to interpret, the reader may draw the wrong conclusion even when the SQL is syntactically valid. A column named count could be counting anything. An average of 0 could mean “no data” or “everyone scored zero.” A daily report missing days could be read as “those days had no activity” when in fact the query simply did not include them.

Readability fixes — clear aliases, careful fallbacks, explicit handling of NULL — are part of producing correct output, not separate from it.

Check Your Understanding

Why do summary aliases matter more than aliases in row-level queries?
When is COALESCE(AVG(...), 0) a useful display fix, and when does it hide important information?
What does date_trunc('day', timestamp) do, and why is it useful in summary queries?

SQL Programming Exercise

These exercises focus on making summary output easier to read, not just making the SQL run. You will practice clear aliases, CASE-based categories, time buckets, and using a fallback value.

Loading Exercise...

← Practicing Summary Queries

Complex Summary Queries with CTEs →

Readable Summary Queries

Learning Objectives

Use Clear Names

COALESCE for Display

Categories and CASE

Time Bucketing

Time Bucketing in Different Granularities

Combining Aliases, COALESCE, and Time Bucketing

Check Your Understanding

SQL Programming Exercise

`COALESCE` for Display

Categories and `CASE`