Knowledge, Retrieval, and Evaluation

Framework Variant: A Small RAG Pipeline with LangChainJS


Learning Objectives

  • You can read a small retrieval pipeline built with a text splitter, an embeddings model, and an explicit ranking step.
  • You understand why the LangChainJS version remains more provider-specific than the generic example in the main course.

This chapter revisits the RAG tutorial from this part. The main tutorial builds the pipeline explicitly so that chunking, embedding requests, retrieval, and answer generation remain visible as ordinary application code. This chapter shows how a framework can package some of those steps into reusable components.

Installing the packages

A compact Deno setup for this LangChainJS-based RAG example is:

$ deno add npm:@langchain/core@1.1.32 npm:@langchain/openai@1.2.13 npm:@langchain/textsplitters@1.0.1

If you’ve already set up the previous LangChainJS examples, you only need to add the textsplitters package; the core and openai packages are already in place.

This adds the dependencies to deno.json, after which the code can use the shorthand import names. This example uses OpenAIEmbeddings and ChatOpenAI because they are well documented in LangChainJS. If your project standardizes on another provider, LangChainJS offers other integrations as well, but those integrations remain provider-specific.

This framework variant reuses the course-level environment-variable structure where possible. The provider integration is specific, but the surrounding application can still treat configuration in a familiar way.
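As a minimal sketch of that familiar configuration handling, the variable names below match the course setup, but the resolveModelConfig helper itself is hypothetical and not part of the tutorial code:

```javascript
// Hypothetical helper; the variable names match the course setup,
// but this exact function is not part of the tutorial code.
const resolveModelConfig = (env) => ({
  apiKey: env.LLM_API_KEY,
  chatModel: env.LLM_CHAT_MODEL ?? "gpt-5-nano-2025-08-07",
  embeddingModel: env.LLM_EMBEDDING_MODEL ?? "text-embedding-3-small",
});

// In Deno, the env object could come from Deno.env.toObject().
const config = resolveModelConfig({ LLM_API_KEY: "test-key" });
```

Keeping the defaults in one place means both the indexing and the question side read the same models without duplicating fallback strings.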

Splitting and indexing documents

The following example uses the recommended recursive character splitter together with the embedding client. The overall structure stays close to the earlier implementation while the framework simplifies some of the moving parts.

// src/langchainIndex.js
import { OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const buildIndex = async (markdownFiles) => {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 500,
    chunkOverlap: 50,
  });

  const chunks = [];

  for (const file of markdownFiles) {
    const splitTexts = await splitter.splitText(file.text);
    for (const [index, text] of splitTexts.entries()) {
      // Stable chunk ids: the source path plus a 1-based position in the file.
      chunks.push({
        id: `${file.source}#${index + 1}`,
        source: file.source,
        text,
      });
    }
  }

  const embeddings = new OpenAIEmbeddings({
    apiKey: Deno.env.get("LLM_API_KEY"),
    model: Deno.env.get("LLM_EMBEDDING_MODEL") ?? "text-embedding-3-small",
  });

  // One batched embeddings request covering every chunk text.
  const vectors = await embeddings.embedDocuments(
    chunks.map((chunk) => chunk.text),
  );

  return chunks.map((chunk, index) => ({
    ...chunk,
    embedding: vectors[index],
  }));
};

export { buildIndex };

This code still leaves the final index visible as ordinary application data. The outer file-reading loop and the final JSON index format can stay the same as in the tutorial; the main difference is that the splitter and embedding client now come from LangChainJS.
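Because the index is plain JSON, a serialization round-trip preserves exactly the record shape that buildIndex returns. The helper names below are hypothetical, not tutorial code:

```javascript
// Hypothetical persistence helpers; the tutorial may name these
// differently, but the chunk records match what buildIndex returns.
const serializeIndex = (index) => JSON.stringify(index, null, 2);
const deserializeIndex = (json) => JSON.parse(json);

const sampleIndex = [
  {
    id: "notes.md#1",
    source: "notes.md",
    text: "Store API keys in environment variables.",
    embedding: [0.12, -0.08, 0.31],
  },
];

// In Deno, the serialized form could be written with Deno.writeTextFile
// and read back with Deno.readTextFile.
const restored = deserializeIndex(serializeIndex(sampleIndex));
```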

Asking a question

Once the index exists, the question side of the program can use LangChainJS for embeddings and the chat model while keeping the ranking logic explicit:

// src/langchainAsk.js
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { parseStructuredAnswer } from "./structuredAnswer.js";

const cosineSimilarity = (a, b) => {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;

  for (let index = 0; index < a.length; index += 1) {
    dotProduct += a[index] * b[index];
    normA += a[index] * a[index];
    normB += b[index] * b[index];
  }

  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};

const retrieveTopChunks = ({ questionEmbedding, index, topK = 3 }) => {
  return index
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(questionEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
};

const askQuestion = async ({ index, question }) => {
  const embeddings = new OpenAIEmbeddings({
    apiKey: Deno.env.get("LLM_API_KEY"),
    model: Deno.env.get("LLM_EMBEDDING_MODEL") ?? "text-embedding-3-small",
  });
  const questionEmbedding = await embeddings.embedQuery(question);
  const chunks = retrieveTopChunks({ questionEmbedding, index });
  const context = chunks
    .map((chunk) => `(${chunk.id}) [${chunk.source}] ${chunk.text}`)
    .join("\n\n");

  const model = new ChatOpenAI({
    apiKey: Deno.env.get("LLM_API_KEY"),
    model: Deno.env.get("LLM_CHAT_MODEL") ?? "gpt-5-nano-2025-08-07",
    useResponsesApi: true,
    reasoning: { effort: "low" },
  });

  const response = await model.invoke([
    {
      role: "system",
      content:
        "Answer only from the supplied context. Return JSON with keys answer and citations. Citations must be an array of chunk ids. If the context is insufficient, say so clearly.",
    },
    {
      role: "user",
      content: `Question: ${question}\n\nContext:\n${context}`,
    },
  ]);

  const content = response.text;

  return parseStructuredAnswer(content);
};

export { askQuestion };
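Because the ranking step is plain application code, it can be exercised on toy vectors without any network calls. This standalone sketch duplicates the helpers from the listing above on made-up two-dimensional embeddings:

```javascript
// Standalone copy of the ranking logic from langchainAsk.js, exercised
// on toy two-dimensional embeddings; no API calls involved.
const cosineSimilarity = (a, b) => {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let index = 0; index < a.length; index += 1) {
    dotProduct += a[index] * b[index];
    normA += a[index] * a[index];
    normB += b[index] * b[index];
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};

const index = [
  { id: "a.md#1", embedding: [1, 0] },
  { id: "a.md#2", embedding: [0, 1] },
  { id: "b.md#1", embedding: [0.9, 0.1] },
];

// A question embedding pointing mostly along the first axis should rank
// a.md#1 and b.md#1 above the orthogonal a.md#2.
const ranked = index
  .map((chunk) => ({
    ...chunk,
    score: cosineSimilarity([1, 0.05], chunk.embedding),
  }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 2);
```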

The answer-generation step still looks familiar. The main difference is that chunking and embeddings are now expressed through LangChainJS interfaces rather than through hand-written helper functions. The invocation still takes ordinary message objects, and the returned answer text can be read from response.text before the JSON is parsed.

The retrieval function can stay explicit, the answer is still parsed through parseStructuredAnswer, and the later evaluation logic can keep the same expectations as before.
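parseStructuredAnswer itself lives in the main tutorial's structuredAnswer.js. A minimal hypothetical version, assuming the model may wrap its JSON in surrounding text, could look like this; the real implementation may handle more edge cases:

```javascript
// Hypothetical minimal parser; the tutorial's structuredAnswer.js may be
// more thorough. It extracts the first JSON object from the reply and
// validates the answer/citations keys requested by the system prompt.
const parseStructuredAnswer = (content) => {
  const start = content.indexOf("{");
  const end = content.lastIndexOf("}");
  if (start === -1 || end === -1) {
    throw new Error("No JSON object found in model reply");
  }
  const parsed = JSON.parse(content.slice(start, end + 1));
  if (typeof parsed.answer !== "string" || !Array.isArray(parsed.citations)) {
    throw new Error("Model reply is missing answer or citations");
  }
  return parsed;
};

const result = parseStructuredAnswer(
  'Sure: {"answer":"Use environment variables.","citations":["notes.md#2"]}',
);
```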

LangChainJS also offers higher-level retriever and vector-store abstractions. In this framework variant, we deliberately keep the ranking step explicit so that the bridge to the main tutorial stays easy to follow.


What changed compared with the earlier RAG tutorial

The most important difference is where the implementation effort moves.

In the explicit tutorial, we wrote the chunking logic, embedding calls, and retrieval support directly as ordinary helper functions. In this framework variant, the splitter and embedding client come from LangChainJS, so the code becomes shorter and more uniform. The application still needs to decide:

  • what documents are indexed,
  • how chunks are identified,
  • how many chunks are retrieved,
  • how the prompt is assembled,
  • and how the answer is evaluated afterward.

This shows what a framework actually buys. It can reduce boilerplate, but it does not remove the need for retrieval design.

A short run

If this framework variant is wrapped into the same index / ask / eval shell as the main tutorial, the terminal experience can stay very close to the original:

$ export LLM_API_KEY="your-api-key"
$ export LLM_CHAT_MODEL="gpt-5-nano-2025-08-07"
$ export LLM_EMBEDDING_MODEL="text-embedding-3-small"
$ deno run --allow-read --allow-env main.js ask "Where should API keys be stored?"

Store API keys in environment variables, not in source code.