Knowledge, Retrieval, and Evaluation

Overview


In the previous part, we connected an LLM application to tools. In this part, we focus on two questions that appear as soon as an application needs to answer domain-specific questions:

  • Where does the application get reliable context?
  • How do we know whether the answers are good enough for the intended use?

Large language models already contain a great deal of general knowledge, but many useful applications depend on material that is local, changing, or specific to one organization. A CLI assistant for course policies, internal engineering guidelines, or local documentation cannot rely only on what the model happened to learn during training.

Retrieval helps with that problem. The application can search its own documents, select relevant pieces of text, and include them in the prompt. But retrieval alone is not enough. An application may retrieve the wrong passage, assemble the context badly, or still generate a weak answer. That is why this part also introduces evaluation.

This part also introduces a second kind of model call. In the earlier application parts, we only sent chat-style requests. Here, the application may also create embeddings for documents and questions so that it can retrieve relevant chunks before generating an answer.
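To make the embed-then-retrieve step concrete, here is a minimal sketch of that second kind of model call. The `embed` function below is a deliberately crude stand-in (a bag-of-words vector built with the standard library); a real application would call an embeddings API instead, but the surrounding logic — embed the question, score every chunk, keep the best matches, build the prompt — stays the same. All names here are illustrative, not part of any specific library.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model call: a bag-of-words
    vector. A production app would call an embeddings API here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Embed the question, score every chunk, return the top k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Toy document store for a course-policy assistant.
chunks = [
    "Late submissions lose 10 percent per day.",
    "The course uses Python for all exercises.",
    "Exams are held twice per semester.",
]

question = "What happens if I submit late?"
top = retrieve(question, chunks, k=1)
prompt = f"Answer using this context:\n{top[0]}\n\nQuestion: {question}"
```

The key point is the ordering: the embedding calls and the similarity ranking happen before any chat-style request is made, and only the selected chunks end up in the prompt.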

Figure 1 summarizes the basic pattern that we will use in this part.

Figure 1 — Retrieval and evaluation turn a plain prompt-response application into a system that can ground answers in local data and check how well it performs.

Retrieval begins before answer generation

It is easy to think of RAG as “ask a question and get an answer with citations”. In practice, much of the engineering work happens before the answer is generated at all.

Someone has to decide what documents are included, how they are chunked, how identifiers are assigned, how embeddings are created, and how many chunks are retrieved for one question. Those decisions shape the quality of the final answer just as much as the wording of the final prompt does.
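One of those pre-generation decisions, chunking with stable identifiers, can be sketched as follows. This is an illustrative fixed-size character window with overlap; the `size` and `overlap` values are placeholder defaults, not recommendations, and real systems often chunk by sentences, headings, or tokens instead.

```python
def chunk_document(doc_id: str, text: str,
                   size: int = 200, overlap: int = 50) -> list[dict]:
    """Split a document into overlapping character windows,
    assigning each chunk a stable id like 'policies#2'.
    size and overlap are illustrative defaults."""
    chunks = []
    step = size - overlap
    for i, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append({"id": f"{doc_id}#{i}", "text": piece})
        if start + size >= len(text):
            break  # last window already covers the end of the text
    return chunks
```

The overlap means each chunk repeats the tail of its predecessor, so a sentence that straddles a boundary is still retrievable as a whole from at least one chunk, and the `doc#index` ids make it possible to trace a retrieved chunk back to its source document later.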


The structure of this part is as follows:

Finally, Recap and Feedback summarizes the part and prepares you for the final part on security, limits, and responsible use.