Embeddings and Similarity
Learning Objectives
- You understand the basic idea of an embedding.
- You know why similarity search is useful in LLM applications.
- You can implement a simple similarity calculation in code.
From text to vectors
An embedding is a numeric representation of a piece of text. Instead of storing only the original words, we also store a vector that places the text in a high-dimensional space.
We discussed embeddings and word representations in a bit more depth in the Introduction to Large Language Models course.
The key idea is that pieces of text with similar meaning often end up closer to each other in that space than unrelated pieces of text. That makes embeddings useful for retrieval. If a user asks “How should I store API keys for this project?”, then a system should ideally retrieve chunks about secrets, configuration, and environment variables, even if the document does not use exactly the same phrasing.
Embeddings matter because they let us compare pieces of text by similarity. Suppose that we have three document chunks:
- “Store API keys in environment variables, not in source code.”
- “Use snapshot tests to keep CLI output stable.”
- “Never commit private credentials to a public repository.”
A question about API keys should rank the first and third chunks higher than the second one. Similarity search is the mechanism that makes that possible.
A simple similarity function
In practice, embedding providers return long vectors, often with hundreds or thousands of dimensions. The application then compares vectors with a similarity measure such as cosine similarity.
The following example shows the core idea:
const cosineSimilarity = (a, b) => {
  // Accumulate the dot product and the squared norms in one pass.
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let index = 0; index < a.length; index += 1) {
    dotProduct += a[index] * b[index];
    normA += a[index] * a[index];
    normB += b[index] * b[index];
  }
  // Divide by the product of the vector lengths (Euclidean norms).
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};
The result is a number between -1 and 1 that tells us how closely the vectors point in the same direction. Values near 1 mean the vectors point the same way, which usually indicates greater similarity.
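To make the numbers concrete, the same function can be applied to toy two-dimensional vectors. Real embedding vectors are much longer, but the arithmetic is identical:

```javascript
// Cosine similarity as defined above, repeated here so the example runs on its own.
const cosineSimilarity = (a, b) => {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let index = 0; index < a.length; index += 1) {
    dotProduct += a[index] * b[index];
    normA += a[index] * a[index];
    normB += b[index] * b[index];
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};

console.log(cosineSimilarity([1, 0], [1, 0])); // 1: identical direction
console.log(cosineSimilarity([1, 0], [0, 1])); // 0: orthogonal, no similarity
console.log(cosineSimilarity([1, 1], [2, 2])); // ≈ 1: same direction, different length
```

The last case shows an important property: cosine similarity ignores vector length and only compares direction.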
A worked ranking example
It is useful to make the ranking step concrete. Suppose that the application stores the following three chunks:
- security-guidelines.md#1: “Store API keys in environment variables, not in source code.”
- testing-notes.md#2: “Use snapshot tests to keep CLI output stable.”
- security-guidelines.md#4: “Never commit private credentials to a public repository.”
Now suppose that the user asks:
Where should I keep API keys for this tool?
After the application embeds the question and compares it with the stored chunk embeddings, it might see scores like the following:
[
{ id: "security-guidelines.md#1", score: 0.91 },
{ id: "security-guidelines.md#4", score: 0.86 },
{ id: "testing-notes.md#2", score: 0.22 },
]
The exact numbers do not matter. What matters is the ranking. The two chunks from security-guidelines.md are both about secret handling, while the testing chunk is not. In a real application, the retriever would pass the top chunks into the prompt and leave out the unrelated one.
This also shows why retrieval debugging needs more than one layer of inspection. If the system answers badly, the problem might come from the prompt, but it might also come from the ranking step. A good engineering workflow therefore records which chunks were selected and with what scores.
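The ranking and recording steps can be sketched in a few lines. This is a minimal illustration, assuming each stored chunk already carries an `id` and an `embedding`; the toy two-dimensional vectors stand in for real provider output:

```javascript
// Cosine similarity as shown earlier in this section.
const cosineSimilarity = (a, b) => {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let index = 0; index < a.length; index += 1) {
    dotProduct += a[index] * b[index];
    normA += a[index] * a[index];
    normB += b[index] * b[index];
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};

// Score every chunk against the question embedding, sort by score,
// and keep the top results. The returned objects double as a debug
// record of which chunks were selected and with what scores.
const rankChunks = (questionEmbedding, chunks, topK) =>
  chunks
    .map((chunk) => ({
      id: chunk.id,
      score: cosineSimilarity(questionEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

// Toy embeddings: the first dimension loosely represents "security".
const chunks = [
  { id: "security-guidelines.md#1", embedding: [0.9, 0.1] },
  { id: "testing-notes.md#2", embedding: [0.1, 0.9] },
  { id: "security-guidelines.md#4", embedding: [0.8, 0.2] },
];

const ranked = rankChunks([1, 0], chunks, 2);
console.log(ranked); // both security chunks, highest score first
```

Logging the output of `rankChunks` alongside the final prompt gives exactly the inspection layer described above: when an answer goes wrong, you can see whether the ranking or the prompt is at fault.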
Embeddings are usually attached to chunks
Real documents are often too long to retrieve as one unit. Instead, the system splits them into smaller chunks such as paragraphs or short sections. This way, retrieval becomes more precise, and the application can cite the exact chunks it used.
Chunking also changes what the system can retrieve well. Chunks that are too large may mix unrelated topics. Chunks that are too small may lose the surrounding context needed to interpret the text.
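A very simple chunker can split on blank lines and assign each chunk a traceable identifier. This is only a sketch of the core idea; real chunkers usually also handle overlap, size limits, and section boundaries:

```javascript
// Split a document into paragraph-sized chunks, each with an id of the
// form "<docId>#<n>" so retrieved text can be traced back to its source.
const chunkByParagraph = (docId, text) =>
  text
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((paragraph, index) => ({
      id: `${docId}#${index + 1}`,
      text: paragraph,
    }));

const doc =
  "Store API keys in environment variables.\n\nNever commit credentials.";
const result = chunkByParagraph("security-guidelines.md", doc);
console.log(result); // two chunks: security-guidelines.md#1 and #2
```

Each chunk would then be embedded separately, so the retriever can return just the relevant paragraph rather than the whole document.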
An embedding search can find text that is semantically close to the question, but that does not guarantee that the retrieved text is correct, complete, or current. Retrieval improves the chances of grounding an answer. It does not remove the need for evaluation.
What this means for engineering
From a software engineering perspective, embeddings introduce a new data representation into the application:
- raw text is still needed for display and prompting,
- vectors are needed for retrieval,
- and chunk identifiers help trace which passages influenced the answer.
Those pieces work together. If the application cannot trace retrieved chunks back to readable text and document sources, debugging becomes much harder.
In practical terms, embeddings add another small subsystem to the application. Someone has to decide when embeddings are created, where they are stored, how the corresponding raw text is tracked, and how retrieval results are inspected later. That makes embeddings both a machine-learning concept and a software-engineering concept.
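One way to keep those pieces together is a single record per chunk. This is a hypothetical in-memory shape, not a prescribed schema, but it shows how raw text, the vector, and traceability information travel together:

```javascript
// Hypothetical chunk record combining the three representations:
// an id for tracing, raw text for display and prompting,
// and an embedding vector for retrieval.
const makeChunkRecord = (id, source, text, embedding) => ({
  id,        // e.g. "security-guidelines.md#1"
  source,    // document the chunk came from
  text,      // raw text shown to the model and, possibly, the user
  embedding, // vector used for similarity search
});

const record = makeChunkRecord(
  "security-guidelines.md#1",
  "security-guidelines.md",
  "Store API keys in environment variables, not in source code.",
  [0.9, 0.1] // toy vector; real embeddings are much longer
);
console.log(record.id, "->", record.source);
```

With records like this, a retrieval result can always be traced from a score back to the document and the exact text that produced it.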