Practical Performance and Verifying Improvements
Learning Objectives
- You can recognize the most common performance problems in application-backed SQL usage.
- You can verify that a performance improvement actually helped, with measurements rather than guesses.
- You can describe a small set of safe defaults for performance work in introductory projects.
The previous chapters introduced the building blocks: indexes, query plans, EXPLAIN ANALYZE. This chapter brings them together as a practical skill: recognizing performance problems, applying the right fix, and confirming the fix actually worked.
Performance Often Relates To Several Decisions
Performance is rarely about one magic trick. It is usually about how schema design, query shape, application behavior, and data size interact.
A query is fast or slow because of:
- the schema it queries (normalized vs. denormalized, how data is split across tables),
- the indexes that support it (or do not),
- the shape of the SQL (whether it does extra work, joins through the right path, or asks for too much data),
- the application’s access pattern (one query per request, or N+1 queries that should have been one),
- the size and distribution of the data (a query that is fast on 100 rows can be slow on 100 million).
Performance work means looking at these together rather than reaching for a single fix. The most common mistake is to add an index without first understanding which of these is the actual problem.
Common Problems
A small set of patterns accounts for a large fraction of the performance problems that appear in real projects.
The N+1 problem. The application loads a list of items, then runs one extra query per item. For a list of 100 items, this is 101 queries instead of 1.
Missing indexes on hot lookup paths. A query filters on a column that is not indexed. For small tables it is fine; as the data grows, it becomes the dominant cost.
Loading too much data. A query selects all columns when only a few are needed, or returns thousands of rows when the page only displays the first ten.
Inefficient joins. A query joins through a path that does not have supporting indexes, or joins many tables when fewer would do.
Redundant work in loops. The application computes something in code that could have been computed in SQL (such as a summary), or does the same query in two different code paths that could have been combined.
This chapter focuses on the first two — N+1 and missing indexes — because they are the most common in projects of the scale this course covers.
The N+1 Problem
Suppose the dashboard shows a list of decks with their card counts. A naive implementation might look like:
def show_dashboard(conn) -> list[dict]:
decks = conn.execute(
"SELECT id, name FROM decks ORDER BY name"
).fetchall()
result = []
for deck in decks:
count = conn.execute(
t"SELECT COUNT(*) AS card_count FROM cards WHERE deck_id = {deck.id}"
).fetchone().card_count
result.append({"id": deck.id, "name": deck.name, "card_count": count})
return result
For 20 decks, this runs 21 queries: one for the deck list and 20 for the per-deck counts. Each query involves a round trip to the database, which has a small fixed cost. The 20 small queries together are usually slower than one larger query.
The fix is to compute the counts in one grouped query, as the summary chapters showed:
def show_dashboard(conn) -> list[dict]:
rows = conn.execute("""
SELECT d.id, d.name, COUNT(c.id) AS card_count
FROM decks AS d
LEFT JOIN cards AS c ON c.deck_id = d.id
GROUP BY d.id, d.name
ORDER BY d.name
""").fetchall()
return [{"id": r.id, "name": r.name, "card_count": r.card_count} for r in rows]
One query, regardless of how many decks. The database does the join and grouping in one pass; the application gets the results in one round trip.
The N+1 problem is most visible when the loop runs many iterations. For 20 decks, the difference is small but measurable. For 200 decks, it becomes obvious. For 2,000, the naive version becomes unusable.
The N+1 problem is one of the more common application-side performance bugs. It is easy to write and hard to notice until the data grows. The fix is usually straightforward — rewrite the query to do the grouping — but it requires recognizing the pattern first.
Commonly used ORMs (Object-Relational Mappers) that we have not covered in this course, such as Django’s ORM, can mask the problem if used naively. They often generate one query per item when you access a related field.
Detecting Missing Indexes
The signal for a missing index is a query that uses Seq Scan on a large table when the filter is selective.
EXPLAIN ANALYZE SELECT * FROM reviews WHERE card_id = 42;
If the output shows Seq Scan on reviews and the table has many rows, an index on card_id is likely worth adding.
A useful workflow:
- identify the slow query (from logs, from
\timingin psql, from user reports), - run
EXPLAIN ANALYZEon it, - look at the bottom of the plan tree; if it is
Seq Scanon a column that the query filters on, that column is a candidate for an index, - add the index, applying the change through a migration if it is a real-project schema change,
- run
EXPLAIN ANALYZEagain to confirm the plan changed and the time improved.
The last step is the verification. Without it, the index might exist but not be used (because of low selectivity, stale statistics, or other reasons), and the supposed fix would not actually help.
A Worked Verification
This section walks through the detect-and-fix pattern on a hypothetical scenario. The point is the pattern, not the project — the study tracker does not actually have a maintenance page that needs this index. The project checkpoints later in this part are where you will add index migrations to the project itself.
Suppose a project displays “all reviews ordered by recency” on a maintenance page, and that query has become slow:
EXPLAIN ANALYZE
SELECT * FROM reviews ORDER BY reviewed_at DESC LIMIT 50;
Limit (cost=2156.50..2156.62 rows=50 width=24) (actual time=18.123..18.135 rows=50 loops=1)
-> Sort (cost=2156.50..2406.50 rows=100000 width=24) (actual time=18.122..18.130 rows=50 loops=1)
Sort Key: reviewed_at DESC
Sort Method: top-N heapsort Memory: 28kB
-> Seq Scan on reviews (cost=0.00..1740.00 rows=100000 width=24)
(actual time=0.012..8.412 rows=100000 loops=1)
Planning Time: 0.092 ms
Execution Time: 18.165 ms
Diagnosis: the Seq Scan reads all 100,000 rows, then a top-N heapsort picks out the 50 most recent. The dominant cost is reading the whole table just to pick out a tiny ordered window.
Hypothesis: an index on reviewed_at should let the query walk the index in descending order and stop after 50 rows, skipping the table scan and the sort.
The migration that would add the index, in the form it would take in a real project:
CREATE INDEX reviews_reviewed_at_idx ON reviews (reviewed_at DESC);
After applying the migration, running EXPLAIN ANALYZE again on the same query shows:
Limit (cost=0.42..2.30 rows=50 width=24) (actual time=0.024..0.158 rows=50 loops=1)
-> Index Scan using reviews_reviewed_at_idx on reviews
(cost=0.42..3744.42 rows=100000 width=24) (actual time=0.022..0.150 rows=50 loops=1)
Planning Time: 0.078 ms
Execution Time: 0.180 ms
The plan now uses the index, walking it in descending order and stopping after 50 rows. Execution time dropped from 18 ms to 0.2 ms — about 100x faster. The hypothesis was correct, and the verification confirmed it.
If the verification had shown the same Seq Scan and the same time, something would have been off — perhaps the index was not created, or perhaps the optimizer had a reason to prefer the scan. In either case, the next step would have been to investigate, not to assume the fix worked.
This is the pattern: diagnose with EXPLAIN ANALYZE, form a hypothesis, apply the fix, verify with another EXPLAIN ANALYZE. Each step is small; together they make the difference between an actual improvement and a guess.
Other Verification Tools
EXPLAIN ANALYZE is the most versatile verification tool, but a few others are worth knowing.
\timing in psql. Turning on \timing in psql causes every query to report its execution time. Useful for ad-hoc comparison without writing EXPLAIN ANALYZE for each one.
=> \timing
Timing is on.
=> SELECT COUNT(*) FROM reviews WHERE card_id = 42;
count
-------
20
Time: 0.193 ms
Counting queries during a request. As discussed above, the number of queries per request is often the clearest signal of an N+1 problem. A page that should load with 1-2 queries should not be running 50.
Comparing seed-data sizes. A performance problem that disappears with smaller seed data is a sign that the algorithm scales poorly. Running a quick EXPLAIN ANALYZE on a small table and again on a 10x-larger table shows whether the time grows linearly, more than linearly, or barely at all.
pg_stat_statements. A PostgreSQL extension that tracks query frequencies and times. It is useful for production systems.
For the work in this course, EXPLAIN ANALYZE plus \timing covers almost everything.
Default Approaches
For projects at the scale of this course, there are a few default approaches that work well.
Index foreign-key columns that are frequently filtered. For the study tracker, this means cards.deck_id, reviews.card_id, and the columns of card_tags.
Use grouped queries instead of N+1 loops. Whenever the application does “load list, then loop and query per item,” check whether the inner queries can be combined.
Verify before changing. Every performance change should be accompanied by a before-and-after EXPLAIN ANALYZE. The change is not done until the verification confirms it over multiple runs.
Do not denormalize without measurement. Until profiling shows that the derived query is genuinely the bottleneck, the schema should stay normalized.
Do not optimize what is not slow. A query that runs in 5 ms does not need an index, even if one would technically apply. Indexes have costs, and spending those costs on queries that are not slow is waste.
These defaults are not exhaustive, but they cover the common cases. More advanced performance work — query rewriting, partial indexes, table partitioning, connection pooling — comes up in more specialized contexts.
Check Your Understanding
- Why is the N+1 problem often more impactful than a single inefficient query?
- Why is
EXPLAIN ANALYZEmore useful thanEXPLAINfor verifying a performance improvement? - Why is “the query feels faster” not a good enough signal that an optimization worked?