Knowledge, Retrieval, and Evaluation

Tutorial: Building and Evaluating a RAG Application


Learning Objectives

  • You can build a small CLI RAG application over local documents.
  • You can store an index of chunks and embeddings for later retrieval.
  • You can evaluate the application with a small dataset of questions and expected citations.

In this tutorial, we build a command-line question-answering application over a local document collection. The application has three commands:

  • index builds a retrieval index from local Markdown files,
  • ask "question" answers one question using retrieved chunks,
  • eval runs a small evaluation set and reports the results.

Unlike the earlier tutorials, this project needs both a chat-style model call and an embeddings call. The retrieval step depends on embeddings, while the final answer still comes from a chat-style completion request.

The project structure is as follows:

docs-rag/
├── data/
│   ├── docs/
│   │   ├── course-policies.md
│   │   ├── llm-usage.md
│   │   └── security-guidelines.md
│   ├── evals.json
│   └── index.json
├── deno.json
├── main.js
└── src/
    ├── chatClient.js
    ├── chunker.js
    ├── config.js
    ├── embeddingsClient.js
    ├── evaluator.js
    ├── indexer.js
    ├── retrieval.js
    └── structuredAnswer.js

Figure 1 summarizes the program flow.

Figure 1. The application separates indexing, question answering, and evaluation into explicit CLI commands.

Step 1: Define the project configuration

We use Deno with a small import map in deno.json:

{
  "imports": {
    "@std/path": "jsr:@std/path@1.1.4"
  }
}

The configuration module reads chat and embedding settings from environment variables.

// src/config.js
const getRequired = (name) => {
  const value = Deno.env.get(name);
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
};

const loadConfig = () => {
  return {
    chatApiUrl: getRequired("LLM_CHAT_API_URL"),
    embeddingsApiUrl: getRequired("LLM_EMBEDDINGS_API_URL"),
    apiKey: getRequired("LLM_API_KEY"),
    chatModel: getRequired("LLM_CHAT_MODEL"),
    embeddingModel: getRequired("LLM_EMBEDDING_MODEL"),
  };
};

export { loadConfig };

Step 2: Chunk the source documents

We use paragraph-based chunking: the text is split on blank lines, and fragments shorter than 40 characters are discarded. This is sufficient for demonstration purposes.

// src/chunker.js
const chunkMarkdown = ({ source, text }) => {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length >= 40)
    .map((chunk, index) => ({
      id: `${source}#${index + 1}`,
      source,
      text: chunk,
    }));
};

export { chunkMarkdown };

The chunk identifiers will later be used for citations and evaluation.
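To see what the chunker produces, here is a quick standalone check. The sample text is invented for illustration; the function body mirrors src/chunker.js so the snippet runs on its own.

```javascript
// Same logic as src/chunker.js, repeated here so the snippet is self-contained.
const chunkMarkdown = ({ source, text }) =>
  text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length >= 40)
    .map((chunk, index) => ({
      id: `${source}#${index + 1}`,
      source,
      text: chunk,
    }));

// Three paragraphs; the middle one is shorter than 40 characters.
const sample = [
  "Store API keys in environment variables, never in source code.",
  "Short note.",
  "Rotate keys regularly and restrict their scope to the minimum needed.",
].join("\n\n");

const chunks = chunkMarkdown({ source: "security-guidelines.md", text: sample });
console.log(chunks.map((chunk) => chunk.id));
// The short paragraph is dropped, so the surviving chunks get ids #1 and #2.
```

Note that identifiers are assigned after filtering, so dropped fragments never leave gaps in the numbering.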


Step 3: Create embeddings and build the index

The embeddings are created through a third-party API. The embedding client uses a raw fetch call and returns one vector per input text.

// src/embeddingsClient.js
const createEmbeddings = async ({
  apiUrl,
  apiKey,
  model,
  input,
}) => {
  const response = await fetch(apiUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input }),
  });

  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.data.map((item) => item.embedding);
};

export { createEmbeddings };

Now we can read the documents, create chunks, request embeddings, and store the result in data/index.json.

// src/indexer.js
import * as path from "@std/path";
import { chunkMarkdown } from "./chunker.js";
import { createEmbeddings } from "./embeddingsClient.js";

const buildIndex = async ({ docsDirectory, outputPath, config }) => {
  const chunks = [];

  for await (const entry of Deno.readDir(docsDirectory)) {
    if (!entry.isFile || !entry.name.endsWith(".md")) {
      continue;
    }

    const filepath = path.join(docsDirectory, entry.name);
    const text = await Deno.readTextFile(filepath);
    chunks.push(...chunkMarkdown({ source: entry.name, text }));
  }

  const embeddings = await createEmbeddings({
    apiUrl: config.embeddingsApiUrl,
    apiKey: config.apiKey,
    model: config.embeddingModel,
    input: chunks.map((chunk) => chunk.text),
  });

  const index = chunks.map((chunk, chunkIndex) => ({
    ...chunk,
    embedding: embeddings[chunkIndex],
  }));

  await Deno.writeTextFile(outputPath, JSON.stringify(index, null, 2));
};

export { buildIndex };

Step 4: Retrieve relevant chunks

Retrieval compares the question embedding with the stored chunk embeddings and returns the top results.

// src/retrieval.js
const cosineSimilarity = (a, b) => {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;

  for (let index = 0; index < a.length; index += 1) {
    dotProduct += a[index] * b[index];
    normA += a[index] * a[index];
    normB += b[index] * b[index];
  }

  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};

const retrieveTopChunks = ({ questionEmbedding, index, topK = 3 }) => {
  return index
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(questionEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
};

export { retrieveTopChunks };
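The ranking behavior can be checked without real embeddings by using toy two-dimensional vectors. The ids and vectors below are invented; the functions mirror src/retrieval.js so the snippet runs standalone.

```javascript
// Copies of the functions from src/retrieval.js for a self-contained check.
const cosineSimilarity = (a, b) => {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let index = 0; index < a.length; index += 1) {
    dotProduct += a[index] * b[index];
    normA += a[index] * a[index];
    normB += b[index] * b[index];
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};

const retrieveTopChunks = ({ questionEmbedding, index, topK = 3 }) => {
  return index
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(questionEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
};

// A toy index: one chunk aligned with the question, one orthogonal,
// and one pointing diagonally between the two.
const toyIndex = [
  { id: "a.md#1", embedding: [1, 0] },
  { id: "b.md#1", embedding: [0, 1] },
  { id: "c.md#1", embedding: [0.7, 0.7] },
];

const top = retrieveTopChunks({
  questionEmbedding: [1, 0],
  index: toyIndex,
  topK: 2,
});
console.log(top.map((chunk) => chunk.id));
// The aligned chunk scores 1.0 and ranks first; the diagonal chunk
// (score ~0.707) beats the orthogonal one (score 0).
```

Cosine similarity only compares directions, so chunk vectors of different magnitudes are ranked on a common scale.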

Step 5: Ask the model for a structured answer

The chat client asks the model to return JSON with an answer and a list of cited chunk identifiers.

// src/chatClient.js
const extractOutputText = (data) => {
  const messageItem = (data.output ?? []).find((item) => item.type === "message");
  const textPart = messageItem?.content?.find((part) =>
    part.type === "output_text"
  );
  return textPart?.text;
};

const requestAnswer = async ({
  apiUrl,
  apiKey,
  model,
  question,
  chunks,
}) => {
  const context = chunks
    .map((chunk) => `(${chunk.id}) ${chunk.text}`)
    .join("\n\n");

  const messages = [
    {
      role: "system",
      content:
        "You answer questions using only the supplied context. Return JSON with keys answer and citations. Citations must be an array of chunk ids. If the context is insufficient, say so clearly.",
    },
    {
      role: "user",
      content: `Question: ${question}\n\nContext:\n${context}`,
    },
  ];

  const response = await fetch(apiUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input: messages }),
  });

  if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }

  const data = await response.json();
  return extractOutputText(data);
};

export { extractOutputText, requestAnswer };

The parser checks that the response has the required shape.

// src/structuredAnswer.js
const parseStructuredAnswer = (text) => {
  const parsed = JSON.parse(text);

  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof parsed.answer !== "string" ||
    !Array.isArray(parsed.citations)
  ) {
    throw new Error("Model response did not match the expected JSON shape.");
  }

  return parsed;
};

export { parseStructuredAnswer };

Before wiring everything into the CLI, it is helpful to inspect the system prompt on its own. A good prompt for this application makes three things explicit at once: groundedness, output structure, and insufficiency handling.

You answer questions using only the supplied context.
Return valid JSON with keys answer and citations.
Citations must be an array of chunk ids.
If the context is insufficient, say so clearly in the answer instead of guessing.

This prompt is short, but it already captures the core behavioral contract of the CLI tool.
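The parser enforces exactly this contract. A quick standalone check (the model replies below are made up) shows that a grounded "insufficient context" reply passes, while a wrong shape is rejected.

```javascript
// Same logic as src/structuredAnswer.js, repeated so the snippet runs on its own.
const parseStructuredAnswer = (text) => {
  const parsed = JSON.parse(text);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof parsed.answer !== "string" ||
    !Array.isArray(parsed.citations)
  ) {
    throw new Error("Model response did not match the expected JSON shape.");
  }
  return parsed;
};

// An honest "insufficient context" reply still satisfies the shape:
// a string answer and an (empty) citations array.
const ok = parseStructuredAnswer(
  '{"answer": "The supplied context does not cover this.", "citations": []}',
);
console.log(ok.citations.length); // 0

// A reply with a non-string answer is rejected.
let rejected = false;
try {
  parseStructuredAnswer('{"answer": 42, "citations": []}');
} catch {
  rejected = true;
}
console.log(rejected); // true
```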


Step 6: Add a simple evaluation runner

The evaluation file can contain records such as:

[
  {
    "question": "Where should API keys be stored?",
    "expectedCitations": ["security-guidelines.md#1"],
    "mustContain": ["environment variable"]
  }
]

The evaluator runs each question through the same pipeline and scores the result.

// src/evaluator.js
const scoreAnswer = ({ answer, expectedCitations, mustContain }) => {
  const citationPass = expectedCitations.every((id) =>
    answer.citations.includes(id)
  );
  const contentPass = mustContain.every((text) =>
    answer.answer.toLowerCase().includes(text.toLowerCase())
  );

  return {
    citationPass,
    contentPass,
    passed: citationPass && contentPass,
  };
};

export { scoreAnswer };
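A quick standalone run of the scorer (the answer objects below are fabricated) shows how the two checks combine into a single pass/fail verdict.

```javascript
// Same logic as src/evaluator.js, repeated so the snippet is self-contained.
const scoreAnswer = ({ answer, expectedCitations, mustContain }) => {
  const citationPass = expectedCitations.every((id) =>
    answer.citations.includes(id)
  );
  const contentPass = mustContain.every((text) =>
    answer.answer.toLowerCase().includes(text.toLowerCase())
  );
  return { citationPass, contentPass, passed: citationPass && contentPass };
};

// Both checks pass: the expected chunk is cited and the phrase appears.
const passing = scoreAnswer({
  answer: {
    answer: "Store API keys in environment variables, not in source code.",
    citations: ["security-guidelines.md#1"],
  },
  expectedCitations: ["security-guidelines.md#1"],
  mustContain: ["environment variable"],
});
console.log(passing.passed); // true

// Both checks fail: no citation and the required phrase is missing.
const failing = scoreAnswer({
  answer: { answer: "Keep keys somewhere safe.", citations: [] },
  expectedCitations: ["security-guidelines.md#1"],
  mustContain: ["environment variable"],
});
console.log(failing.passed); // false
```

The substring match is case-insensitive, so "environment variables" in the answer satisfies the expectation "environment variable".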

Step 7: Wire the CLI commands together

The entry point dispatches between indexing, asking, and evaluation.

// main.js
import { loadConfig } from "./src/config.js";
import { createEmbeddings } from "./src/embeddingsClient.js";
import { scoreAnswer } from "./src/evaluator.js";
import { buildIndex } from "./src/indexer.js";
import { retrieveTopChunks } from "./src/retrieval.js";
import { requestAnswer } from "./src/chatClient.js";
import { parseStructuredAnswer } from "./src/structuredAnswer.js";

const config = loadConfig();
const [command, ...rest] = Deno.args;
const usage = [
  "Usage:",
  '  deno run --allow-read --allow-write --allow-net --allow-env main.js index',
  '  deno run --allow-read --allow-net --allow-env main.js ask "question"',
  '  deno run --allow-read --allow-net --allow-env main.js eval',
].join("\n");

if (command === "index") {
  await buildIndex({
    docsDirectory: "./data/docs",
    outputPath: "./data/index.json",
    config,
  });
  console.log("Index written to ./data/index.json");
} else if (command === "ask") {
  const question = rest.join(" ");
  if (question.trim().length === 0) {
    throw new Error("The ask command requires a question string.");
  }
  const index = JSON.parse(await Deno.readTextFile("./data/index.json"));
  const [questionEmbedding] = await createEmbeddings({
    apiUrl: config.embeddingsApiUrl,
    apiKey: config.apiKey,
    model: config.embeddingModel,
    input: [question],
  });
  const chunks = retrieveTopChunks({ questionEmbedding, index });
  const content = await requestAnswer({
    apiUrl: config.chatApiUrl,
    apiKey: config.apiKey,
    model: config.chatModel,
    question,
    chunks,
  });
  const answer = parseStructuredAnswer(content);
  console.log(JSON.stringify(answer, null, 2));
} else if (command === "eval") {
  const index = JSON.parse(await Deno.readTextFile("./data/index.json"));
  const evalCases = JSON.parse(await Deno.readTextFile("./data/evals.json"));

  for (const evalCase of evalCases) {
    const [questionEmbedding] = await createEmbeddings({
      apiUrl: config.embeddingsApiUrl,
      apiKey: config.apiKey,
      model: config.embeddingModel,
      input: [evalCase.question],
    });
    const chunks = retrieveTopChunks({ questionEmbedding, index });
    const content = await requestAnswer({
      apiUrl: config.chatApiUrl,
      apiKey: config.apiKey,
      model: config.chatModel,
      question: evalCase.question,
      chunks,
    });
    const answer = parseStructuredAnswer(content);
    const score = scoreAnswer({
      answer,
      expectedCitations: evalCase.expectedCitations,
      mustContain: evalCase.mustContain,
    });

    console.log({
      question: evalCase.question,
      passed: score.passed,
      citations: answer.citations,
    });
  }
} else {
  console.log(usage);
}

Run the commands like this:

$ export LLM_CHAT_API_URL="https://api.example.com/v1/responses"
$ export LLM_EMBEDDINGS_API_URL="https://api.example.com/v1/embeddings"
$ export LLM_API_KEY="your-api-key"
$ export LLM_CHAT_MODEL="example-chat-model"
$ export LLM_EMBEDDING_MODEL="example-embedding-model"
$ deno run --allow-read --allow-write --allow-net --allow-env main.js index
$ deno run --allow-read --allow-net --allow-env main.js ask "Where should API keys be stored?"
$ deno run --allow-read --allow-net --allow-env main.js eval

A typical successful run could look like this:

$ deno run --allow-read --allow-write --allow-net --allow-env main.js index
Index written to ./data/index.json

$ deno run --allow-read --allow-net --allow-env main.js ask "Where should API keys be stored?"
{
  "answer": "Store API keys in environment variables, not in source code.",
  "citations": [
    "security-guidelines.md#1"
  ]
}

A useful validation case is a missing question for the ask command:

$ deno run --allow-read --allow-net --allow-env main.js ask
Uncaught Error: The ask command requires a question string.

The next chapter revisits the same RAG application through LangChainJS. Reading the two versions together makes the framework trade-offs easier to see in code.

The programming exercise for this chapter stubs both embedding and chat requests. That makes it possible to test indexing, retrieval, structured parsing, and evaluation locally without spending tokens or depending on external services during grading.
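The exercise uses its own stubbing setup, but the idea can be sketched by overriding the global fetch so that createEmbeddings returns deterministic vectors without any network traffic. The fake vectors below simply encode each input's length and position; everything here is an illustrative assumption, not the exercise's actual stub.

```javascript
// Same client as src/embeddingsClient.js, repeated so the snippet is standalone.
const createEmbeddings = async ({ apiUrl, apiKey, model, input }) => {
  const response = await fetch(apiUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input }),
  });
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.data.map((item) => item.embedding);
};

// Stub: answer every request locally with made-up two-dimensional vectors.
globalThis.fetch = async (_url, options) => {
  const { input } = JSON.parse(options.body);
  const body = {
    data: input.map((text, position) => ({ embedding: [text.length, position] })),
  };
  return new Response(JSON.stringify(body), { status: 200 });
};

const vectors = await createEmbeddings({
  apiUrl: "https://stub.invalid/v1/embeddings", // never actually contacted
  apiKey: "test-key",
  model: "stub-model",
  input: ["hello", "world!!"],
});
console.log(vectors); // [[5, 0], [7, 1]]
```

Because the client only depends on the fetch contract, swapping the transport is enough to make indexing, retrieval, and evaluation fully testable offline.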
