Programming Foundations for LLM Applications

Tutorial: Text Processing Pipeline


Learning Objectives

  • You can structure a small CLI project into multiple modules.
  • You can combine file reading, text analysis, and reporting in one program.

In this tutorial, we build a small command-line program that reads text files from a folder, analyzes them, and prints a summary report.

The project structure is as follows:

text-analysis/
├── data/
│   └── hello.txt
├── src/
│   ├── analyzer.js
│   ├── analyzer_test.js
│   ├── processor.js
│   ├── report.js
│   └── utils/
│       └── fileUtils.js
├── deno.json
└── main.js

Step 1: Add a dependency

Create deno.json and declare the two dependencies there: the path helpers and the assertion library. In practice, you could add them with deno add, but here we write the file directly so the structure is visible:

{
  "imports": {
    "@std/path": "jsr:@std/path@1.0.8",
    "@std/assert": "jsr:@std/assert@1.0.15"
  }
}

Assume that data/hello.txt contains the text hello world.

Step 2: Read files

The file utility module lists files and loads their contents.

// src/utils/fileUtils.js
const listFiles = async (directory) => {
  const files = [];
  for await (const entry of Deno.readDir(directory)) {
    if (entry.isFile) {
      files.push(`${directory}/${entry.name}`);
    }
  }
  return files;
};

const readTextFiles = async (filepaths) => {
  return await Promise.all(
    filepaths.map(async (filepath) => ({
      filepath,
      content: await Deno.readTextFile(filepath),
    })),
  );
};

export { listFiles, readTextFiles };
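Notice that the rest of the pipeline depends only on the record shape these helpers produce, { filepath, content }, not on Deno's filesystem APIs. A minimal sketch with in-memory data (the fakeFiles map is invented for illustration and is not part of the project):

```javascript
// In-memory stand-in for the filesystem, invented for illustration.
const fakeFiles = new Map([
  ["data/hello.txt", "hello world"],
]);

// Produces the same record shape as readTextFiles, without touching disk.
const toRecord = (filepath) => ({
  filepath,
  content: fakeFiles.get(filepath),
});

const records = ["data/hello.txt"].map(toRecord);
console.log(records); // each record: { filepath, content }
```

Keeping downstream steps tied to this plain shape is what makes them easy to test without real files.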

Step 3: Analyze text

The analyzer focuses only on the logic of turning text into counts.

// src/analyzer.js
const analyzeText = (text) => {
  const words = text
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0);

  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  const topWords = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 5);

  return {
    totalWords: words.length,
    uniqueWords: counts.size,
    topWords,
  };
};

export { analyzeText };
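One detail worth knowing: the \W+ split treats every non-word character as a separator, so apostrophes and hyphens break words apart. A quick sketch (analyzeText is copied inline here so the snippet stands alone; the sample sentence is invented):

```javascript
// Copy of analyzeText from src/analyzer.js, inlined for illustration.
const analyzeText = (text) => {
  const words = text
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0);

  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  const topWords = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 5);

  return { totalWords: words.length, uniqueWords: counts.size, topWords };
};

console.log(analyzeText("Don't panic, don't run"));
// totalWords: 6, uniqueWords: 4 — "don't" is counted as "don" plus "t"
```

Whether that splitting behavior is acceptable depends on the texts you expect; for this tutorial it keeps the analyzer simple.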

Step 4: Process documents

Now we combine file metadata and the analysis results.

// src/processor.js
import * as path from "@std/path";
import { analyzeText } from "./analyzer.js";

const processDocument = (file) => {
  const analysis = analyzeText(file.content);
  return {
    filename: path.basename(file.filepath),
    filepath: file.filepath,
    ...analysis,
  };
};

// processDocument is synchronous for now; Promise.all keeps the signature
// ready for asynchronous processing steps (such as API requests) later.
const processDocuments = async (files) => {
  return await Promise.all(files.map((file) => processDocument(file)));
};

const generateReport = (documents) => {
  const totalDocuments = documents.length;
  const totalWords = documents.reduce((sum, doc) => sum + doc.totalWords, 0);
  const averageWordsPerDoc = totalDocuments === 0 ? 0 : totalWords / totalDocuments;

  return {
    totalDocuments,
    totalWords,
    averageWordsPerDoc,
    documents,
  };
};

export { generateReport, processDocument, processDocuments };
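Because generateReport is a pure function over plain objects, its aggregation can be checked by hand. A sketch with two invented documents and precomputed counts (the function body is copied inline so the snippet is self-contained):

```javascript
// Copy of generateReport from src/processor.js, inlined for illustration.
const generateReport = (documents) => {
  const totalDocuments = documents.length;
  const totalWords = documents.reduce((sum, doc) => sum + doc.totalWords, 0);
  const averageWordsPerDoc = totalDocuments === 0 ? 0 : totalWords / totalDocuments;

  return { totalDocuments, totalWords, averageWordsPerDoc, documents };
};

// Two invented documents with precomputed word counts.
const report = generateReport([
  { filename: "a.txt", totalWords: 4 },
  { filename: "b.txt", totalWords: 8 },
]);

console.log(report.totalWords); // 12
console.log(report.averageWordsPerDoc); // 6
```

Note the guard for an empty document list: without it, dividing by zero would make the average NaN.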

Step 5: Add a test

Before wiring the whole program together, we can test the analysis logic separately.

// src/analyzer_test.js
import { assertEquals } from "@std/assert";
import { analyzeText } from "./analyzer.js";

Deno.test("analyzeText counts words and unique words", () => {
  assertEquals(analyzeText("Hello hello world"), {
    totalWords: 3,
    uniqueWords: 2,
    topWords: [["hello", 2], ["world", 1]],
  });
});

This keeps the most important logic from becoming a black box. If a later refactoring changes the behavior unexpectedly, the test gives immediate feedback.


Step 6: Format the result

The reporting module turns the structured result into CLI output.

// src/report.js
const formatReport = (report) => {
  const lines = [
    `Documents: ${report.totalDocuments}`,
    `Total words: ${report.totalWords}`,
    `Average words per document: ${report.averageWordsPerDoc.toFixed(1)}`,
    "",
  ];

  for (const document of report.documents) {
    lines.push(`${document.filename}: ${document.totalWords} words`);
  }

  return lines.join("\n");
};

export { formatReport };
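The formatter is also a pure function, so its output can be previewed without running the whole pipeline. A sketch with an invented report object matching the shape produced in Step 4 (formatReport is copied inline so the snippet stands alone):

```javascript
// Copy of formatReport from src/report.js, inlined for illustration.
const formatReport = (report) => {
  const lines = [
    `Documents: ${report.totalDocuments}`,
    `Total words: ${report.totalWords}`,
    `Average words per document: ${report.averageWordsPerDoc.toFixed(1)}`,
    "",
  ];

  for (const document of report.documents) {
    lines.push(`${document.filename}: ${document.totalWords} words`);
  }

  return lines.join("\n");
};

const output = formatReport({
  totalDocuments: 1,
  totalWords: 2,
  averageWordsPerDoc: 2,
  documents: [{ filename: "hello.txt", totalWords: 2 }],
});

console.log(output);
// Documents: 1
// Total words: 2
// Average words per document: 2.0
//
// hello.txt: 2 words
```

The toFixed(1) call is why the average always prints with one decimal, as in the sample run below.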

Step 7: Put the pieces together

The entry point coordinates the program.

// main.js
import { listFiles, readTextFiles } from "./src/utils/fileUtils.js";
import { generateReport, processDocuments } from "./src/processor.js";
import { formatReport } from "./src/report.js";

const fileList = await listFiles("./data");
const files = await readTextFiles(fileList);
const processedDocs = await processDocuments(files);
const report = generateReport(processedDocs);

console.log(formatReport(report));

First run the tests:

$ deno test
ok | 1 passed | 0 failed

Then run the program with read permission:

$ deno run --allow-read main.js

With the example hello.txt, the output could look like this:

$ deno run --allow-read main.js
Documents: 1
Total words: 2
Average words per document: 2.0

hello.txt: 2 words

Structure matters

This project is small, but it already shows a useful pattern:

  • keep I/O in one place,
  • keep analysis logic in another,
  • keep output formatting separate,
  • and let the entry point coordinate the overall flow.

That same structure will be useful later when the program logic also includes API requests and response validation. Similarly, the test file already establishes the habit that a multi-module project should be runnable and testable, not only readable.

Focused AI help inside an existing project

Once a small project has several files, broad prompts such as “improve this project” often lead to unnecessary rewrites. It is usually better to constrain the request to one module and one change.

For example, if we wanted the analyzer to ignore a short stop-word list, a good prompt would explicitly protect the current structure:

Update only the analyzer module shown below.
Keep the current project structure unchanged.
Add a small stop-word filter before counting words.
Return only the revised module code.

This kind of prompt is useful because it matches how engineers often work in practice. Most changes should be local. The request should therefore be local too.
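As a sketch of what such a constrained change might produce, here is one possible revision of the analyzer (the stop-word list below is invented for illustration; a real list would depend on the texts being analyzed):

```javascript
// One possible local change: filter a small stop-word list before counting.
// The list itself is an invented example.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "of", "to"]);

const analyzeText = (text) => {
  const words = text
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));

  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  const topWords = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 5);

  return { totalWords: words.length, uniqueWords: counts.size, topWords };
};
```

Every other file stays untouched, which is exactly what the prompt asked for, and the existing test in src/analyzer_test.js immediately tells you whether the change altered behavior you wanted to keep.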

The programming exercise for this chapter follows the same pipeline shape: multiple modules, one focused unit test to start from, and a CLI entry point that should still run cleanly once the pieces are connected.
