Testing, Review, and Documentation with AI
Learning Objectives
- You understand where AI assistance can be useful in testing, review, and documentation.
- You know common weaknesses of AI-generated tests and review comments.
- You can use AI support while still keeping verification under human control.
AI-assisted testing
Large language models are often good at proposing test cases quickly. This is especially helpful when you want help enumerating successful paths, missing-input cases, malformed-input cases, and output-format edge cases.
However, AI-generated tests are often weaker than they first appear. They may test only the happy path, repeat the implementation logic instead of checking behavior, or use vague assertions that would pass even if the program were wrong.
For example, a weak test might only check that “some output exists”. A stronger test checks the exact value or exact structure of the output.
Consider the function:
const isPassing = (grade) => grade >= 90;
A very weak test would be:
assert(typeof isPassing(80) === "boolean");
A stronger test is:
assertEquals(isPassing(90), true);
assertEquals(isPassing(89), false);
The stronger test says what the function should actually do.
When a test only checks that something happened, it leaves too much room for incorrect behavior. Good tests reduce ambiguity by making expectations explicit.
A useful testing prompt therefore needs to say what kind of tests are wanted. For example:
Read the specification below and propose concrete test cases.
Include successful cases, invalid-input cases, and boundary cases.
State the expected behavior for each test.
Avoid vague checks such as "some output exists".
This kind of prompt encourages the model to connect its suggestions to the specification instead of producing generic testing advice.
At the same time, the engineer still has to decide whether the proposed tests are good enough to keep. A test that looks technical can still be too weak to catch real defects.
Code review support
AI tools can also help review code by pointing out possible issues such as:
- missing validation,
- duplicated logic,
- unclear names,
- or error cases that seem unhandled.
This can be useful as a second pair of eyes. But review suggestions still need review themselves. A model may claim that something is a bug when it is actually an intentional design choice, or it may miss the most important problem entirely.
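As a hypothetical illustration of the "missing validation" category, consider a small function a reviewer, human or AI, might flag. The function name and the chosen fix are invented for this example.

```javascript
// Hypothetical review finding: the original version trusted its input.
// An empty array would divide by zero and silently return NaN,
// which a vague test might never notice.
const average = (grades) => {
  if (!Array.isArray(grades) || grades.length === 0) {
    throw new Error("grades must be a non-empty array");
  }
  return grades.reduce((sum, g) => sum + g, 0) / grades.length;
};
```

Whether to throw, return a default, or leave the behavior as-is is a design decision; the model can point at the spot, but the engineer decides whether it is actually a bug.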
Early security-focused research on LLM coding assistants reinforces the same point. See Do Users Write More Insecure Code with AI Assistants?, the later empirical study Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study, and the mitigation-oriented paper Large Language Models for Code: Security Hardening and Adversarial Testing.
Documentation support
Documentation is another area where AI assistance can be helpful. It can:
- summarize a function,
- draft a README section,
- explain command-line flags,
- or propose a changelog entry.
Still, documentation should reflect the actual behavior of the program, not only what the code appears to do at a quick glance. If the implementation changed during debugging or refactoring, old descriptions may become inaccurate.
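A made-up example of this kind of drift: a doc comment written for an earlier version of the code that was never updated after the threshold changed.

```javascript
// Hypothetical drift: the comment below documents an old threshold.
/**
 * Returns true when the grade is passing.
 * Passing means a grade of 80 or higher.  (Stale: the code now uses 90.)
 */
const isPassing = (grade) => grade >= 90;
```

A reader trusting the comment would expect isPassing(85) to be true, while the actual implementation returns false. Regenerating documentation with AI after a change can help, but only if someone checks the draft against the current behavior.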
Early examples of using LLMs to help with documentation also included tasks such as automatically generating Git commit messages.
Documentation should name boundaries and assumptions
Documentation is especially useful when it makes the boundaries of a program clear. For instance, a command-line tool may need to document required environment variables, expected input files, output formats, or cases where the program deliberately rejects weak input.
These details matter because later LLM-powered programs often depend on small explicit contracts. If a module expects JSON with two keys, the documentation should say so. If a CLI tool needs --allow-read and --allow-env, the usage notes should say so. AI can help draft that text, but the engineer still has to decide whether the draft matches the actual behavior.
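The "JSON with two keys" case can be made explicit in code as well as in documentation. The module name parseRecord and the keys used here are invented for illustration; the point is that the documented contract and the enforced contract should match.

```javascript
// Hypothetical module contract worth stating in the documentation:
// input must be JSON with a string "name" and a numeric "grade".
const parseRecord = (text) => {
  const data = JSON.parse(text);
  if (typeof data.name !== "string" || typeof data.grade !== "number") {
    throw new Error('expected JSON with a string "name" and a numeric "grade"');
  }
  return data;
};
```

When the check and the README describe the same two keys, a caller who reads either one gets the real contract, and an AI-drafted usage section can be verified against the validation code.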
The same review habit matters for documentation too. A generated README section may look polished while still omitting required permissions, environment variables, or cases where a feature is optional rather than automatic.