Building Applications with Large Language Models

Conversations, State, and Structured Outputs


Learning Objectives

  • You understand how a CLI application can maintain conversation state.
  • You know why structured outputs are useful in LLM-powered programs.
  • You can connect message history and downstream program logic safely.

Conversation state

In a single-turn prompt, the request contains all the context the model needs. In a chat-style application, this is no longer true. The program must decide what previous messages to keep and how to store them.

At a minimum, a CLI chat application often keeps an array like this:

const messages = [
  { role: "system", content: "You are a concise assistant." },
];

Each time the user enters a new message, the program appends it to the array, calls the model, and then appends the model's response as well.
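This append–call–append cycle can be sketched as follows. Here `callModel` is a hypothetical stand-in for a real provider SDK call; it simply echoes the last user message so the sketch runs on its own.

```javascript
// Conversation state is an ordinary array of messages.
const messages = [
  { role: "system", content: "You are a concise assistant." },
];

// Hypothetical model call; a real implementation would send
// `history` to an LLM API and return the generated text.
function callModel(history) {
  const lastUser = history[history.length - 1].content;
  return `You said: ${lastUser}`;
}

// One chat turn: append the user message, call the model,
// append the assistant reply.
function handleUserTurn(userInput) {
  messages.push({ role: "user", content: userInput });
  const reply = callModel(messages);
  messages.push({ role: "assistant", content: reply });
  return reply;
}

handleUserTurn("Hello!");
// messages now holds the system, user, and assistant entries in order.
```

Because the history is plain program state, the application controls exactly what each later model call sees.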

This means that conversation state is ordinary program state. The application decides what goes into the message history, in what order, and for how long it remains there. That design choice affects both cost and quality.

State affects later answers

Messages kept in the history influence every later answer. This is useful when the application needs continuity, but it also creates trade-offs: a longer history costs more because more tokens are sent on every request, irrelevant history can distract the model from the current task, and unreviewed state can carry bad assumptions forward from earlier turns.

For that reason, the application should decide deliberately what stays in memory and what gets discarded or summarized.

In a small CLI application, one simple strategy is to keep the entire conversation in memory while the session is short. In a longer-running system, the application may need to trim old turns or replace them with a shorter summary. The important point is that the application should have a policy, not just an ever-growing array.
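One such policy can be sketched in a few lines. The function below is a hypothetical trimming strategy, not a prescribed one: it keeps the system message plus the most recent turns and drops the oldest, assuming each turn is one user message followed by one assistant message.

```javascript
// Hypothetical trimming policy: keep the system message plus the
// last `maxTurns` turns (one user + one assistant message each).
function trimHistory(messages, maxTurns) {
  const [system, ...rest] = messages;
  const kept = rest.slice(-maxTurns * 2);
  return [system, ...kept];
}

const history = [
  { role: "system", content: "You are a concise assistant." },
  { role: "user", content: "turn 1" },
  { role: "assistant", content: "reply 1" },
  { role: "user", content: "turn 2" },
  { role: "assistant", content: "reply 2" },
  { role: "user", content: "turn 3" },
  { role: "assistant", content: "reply 3" },
];

// Keep the system message and only the last two turns.
const trimmed = trimHistory(history, 2);
```

A real system might instead summarize the dropped turns, but even this simple rule is a policy rather than an ever-growing array.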


Structured outputs

Many applications need more than free-form text: they need a response that the surrounding program can use directly.

For example, a CLI helper might ask the model to return a JSON object with fields such as intent, summary, and nextAction.

If the program expects these fields, the prompt should say so clearly and the code should still check that the fields really exist.

const messages = [
  { role: "system", content: "Return JSON with keys intent, summary, and nextAction. Do not include markdown fences." },
  { role: "user", content: "Summarize this bug report and suggest the next development action." },
];

It is often useful to think of this in two layers. The prompt layer requests a predictable structure from the model. The program layer then decides whether the returned structure is good enough to trust. If either layer is weak, the result becomes fragile.

Prompting for JSON or another explicit format is helpful, but it is not enough on its own. The surrounding code still needs to check whether the response actually matches the expected shape.
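The program-layer check can be sketched like this. The field names match the prompt above; the function returns the parsed object only when the response is valid JSON with all expected string fields, and `null` otherwise so the caller can retry or fall back.

```javascript
// Parse the model's raw text and verify the expected shape
// before any downstream logic trusts it.
function parseStructuredReply(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return null; // not even valid JSON
  }
  const required = ["intent", "summary", "nextAction"];
  const valid = required.every((key) => typeof parsed[key] === "string");
  return valid ? parsed : null;
}

// A well-formed reply passes the check...
const good = parseStructuredReply(
  '{"intent":"bug_report","summary":"Crash on save.","nextAction":"reproduce"}'
);

// ...while a chatty free-text reply does not.
const bad = parseStructuredReply("Sure! Here is the summary...");
```

Returning `null` instead of throwing keeps the failure explicit at the call site, which is one reasonable design choice among several.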


Structured model outputs

Some provider SDKs offer dedicated mechanisms for defining structured model outputs; see, for example, OpenAI's structured model outputs documentation. These provider-specific approaches can be more sophisticated than the prompt-based one outlined here.

For instance, OpenAI’s structured model outputs rely on constrained decoding, where the set of tokens the model is allowed to generate at each step is restricted to those that keep the output valid with respect to the specified schema.
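Provider mechanisms like this are typically driven by a schema. Purely as an illustration (the exact way a schema is passed varies by SDK), a JSON Schema for the three-field structure discussed here could look like:

```
{
  "type": "object",
  "properties": {
    "intent": { "type": "string" },
    "summary": { "type": "string" },
    "nextAction": { "type": "string" }
  },
  "required": ["intent", "nextAction", "summary"],
  "additionalProperties": false
}
```

With constrained decoding, the model is only allowed to emit tokens that keep the output consistent with a schema of this kind.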

Text generation versus program input

A useful way to think about structured outputs is this: plain text is primarily for people, while structured output is primarily for programs.

In practice, many applications need both. A CLI program may print a human-readable answer to the user while also storing a structured summary internally.

Suppose that a command-line support tool asks the model to summarize a user request. The user may want a readable explanation on the screen, but the application may also want a machine-readable intent field so that later logic can decide whether to ask a follow-up question or file a ticket. One response may therefore serve two audiences at once.
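This dual use can be sketched as follows. Here `reply` stands in for an already-parsed model response; the field names mirror the structure discussed above, and the intent-to-action mapping is purely illustrative.

```javascript
// One response, two audiences: the summary is printed for the
// person at the terminal, while the intent drives program logic.
const reply = {
  intent: "file_ticket",
  summary: "The user cannot log in after the latest update.",
  nextAction: "Collect the app version and open a support ticket.",
};

// Human-readable layer: show the summary on screen.
console.log(reply.summary);

// Machine-readable layer: branch on the structured intent field.
let action;
if (reply.intent === "file_ticket") {
  action = "open_ticket";
} else {
  action = "ask_followup";
}
```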

A minimal structure prompt can be very short:

Read the conversation below.
Return valid JSON with exactly these keys:
- intent
- summary
- nextAction

Do not include markdown fences or extra text.

The advantage of a prompt like this is predictability. The application and the test writer both know what shape to expect.
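Because the shape is fixed, a test can assert on the exact keys. The snippet below sketches such a check against `sampleReply`, a hypothetical model response captured for testing.

```javascript
// A captured example response (hypothetical test fixture).
const sampleReply =
  '{"intent":"question","summary":"User asks about pricing.","nextAction":"answer"}';

// The prompt promises exactly these keys, so the check is a
// straightforward comparison of sorted key lists.
const parsed = JSON.parse(sampleReply);
const keys = Object.keys(parsed).sort();
const expected = ["intent", "nextAction", "summary"];
const shapeOk = JSON.stringify(keys) === JSON.stringify(expected);
```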


Keep the structure simple

If an application only needs two fields, ask for two fields. Overly complex response schemas make prompts harder to maintain and validations harder to explain.

Small, explicit structures are easier to test and easier to use safely in later program logic. That same design habit appears in tool schemas, evaluation records, and retrieval outputs later in the course. Compact structure is usually easier to reason about than ambitious structure.