03. Tool Design Fundamentals

Tools are the agent's only channel to the outside world. Design tools badly and the model learns the wrong behavior; make tool output unstable and the loop becomes hard to test; grant tools too much authority and the security boundary becomes decoration. In a usable coding agent, the tool layer has to serve the model, the user interface, the log, and the security policy all at once.

A tool is not a function wrapper

Exposing a local function to the model as-is usually fails. An ordinary function's parameters are written for programmers, and so are its error messages; an agent tool's parameters and errors are consumed by the model. Models are bad at inferring "what to change next" from a stack trace, but very good at correcting a call based on structured, explicit, concise feedback.

A tool definition should answer four questions:

When to use this tool.
How the parameters are expressed.
What the output will contain, and how it may be truncated.
How the model can correct itself when the call fails.

For example, do not describe grep as just "searches text." A better description: search the workspace when you do not know where a file lives; make queries as specific as possible; at most the first 50 results are returned; if there are too many results, narrow the keywords or restrict the directory.
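As a rough illustration, the four questions can be carried directly in the description string of a tool definition. The ToolDefinition shape, its field names, and the directory parameter below are assumptions made for this sketch, not the API of any particular provider; only the parameter schema follows ordinary JSON Schema conventions.

```typescript
// Hypothetical tool-definition shape; real providers use similar but not identical fields.
type ToolDefinition = {
  name: string;
  description: string;
  // JSON Schema describing the arguments the model must produce.
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
    additionalProperties: boolean;
  };
};

const MAX_GREP_RESULTS = 50;

const grepTool: ToolDefinition = {
  name: "grep",
  description: [
    // 1. When to use this tool.
    "Search the workspace for text when you do not know where a file lives.",
    // 2. How the parameters are expressed.
    "Make the query as specific as possible; optionally restrict the search to one directory.",
    // 3. What the output contains and how it may be truncated.
    `Returns matching lines with their paths. At most the first ${MAX_GREP_RESULTS} results are returned.`,
    // 4. How to correct the call.
    "If there are too many results, narrow the keywords or restrict the directory.",
  ].join(" "),
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Literal text to search for; be specific." },
      directory: { type: "string", description: "Optional workspace-relative directory to search in." },
    },
    required: ["query"],
    additionalProperties: false,
  },
};
```

Building the description from a constant keeps the stated cap and the enforced cap from drifting apart, provided the tool implementation reads the same constant.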

Argument validation

Model output is untrusted input. Even if the provider claims to return arguments that conform to the schema, the runtime must validate them again: streamed tool arguments can be truncated, the model may add extra fields, and a restored old session may carry arguments shaped by an old schema.

The teaching project can start with hand-written validation:

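// Arguments the read tool accepts once they have been validated.
type ReadInput = {
  path: string;
};

// Returns a result object instead of throwing, so the caller can turn a failure into a tool result.
function parseReadInput(value: unknown): { ok: true; input: ReadInput } | { ok: false; message: string } {
  // Reject primitives and null; arrays pass this check but fail the path check below.
  if (typeof value !== "object" || value === null) {
    return { ok: false, message: "Expected an object with a path field." };
  }
  const record = value as Record<string, unknown>;
  if (typeof record.path !== "string" || record.path.length === 0) {
    return { ok: false, message: "Expected path to be a non-empty string." };
  }
  // Copy only the known field so that extra fields added by the model are dropped.
  return { ok: true, input: { path: record.path } };
}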

A production system can use JSON Schema, Zod, or Valibot, but the principle is the same: a validation failure must produce a tool result, not crash the runtime. Error messages must be actionable: "path must be a relative path," not "validation failed."
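A minimal sketch of that principle, assuming the raw arguments arrive as a JSON string and that a tool result message is a plain object with an isError flag and a text field. The ToolResultMessage shape, callReadTool, and the stubbed executeRead are all hypothetical names introduced for illustration.

```typescript
// Hypothetical message shape: the text the model sees, plus an error flag.
type ToolResultMessage = { isError: boolean; text: string };

// Stub standing in for the real read implementation, which would touch the filesystem.
function executeRead(path: string): ToolResultMessage {
  return { isError: false, text: `Read ${path}.` };
}

// Every failure path returns a tool result; nothing here throws to the runtime.
function callReadTool(rawArgs: string): ToolResultMessage {
  let value: unknown;
  try {
    value = JSON.parse(rawArgs);
  } catch {
    // Truncated streamed arguments typically end up here.
    return {
      isError: true,
      text: 'Arguments were not valid JSON (they may have been cut off). Resend the call with a complete object, for example {"path": "src/config.ts"}.',
    };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { isError: true, text: "Expected an object with a path field." };
  }
  const path = (value as Record<string, unknown>).path;
  if (typeof path !== "string" || path.length === 0) {
    return { isError: true, text: "Expected path to be a non-empty string." };
  }
  if (path.startsWith("/")) {
    return { isError: true, text: 'path must be a relative path inside the workspace, for example "src/config.ts".' };
  }
  return executeRead(path);
}
```

Each error text names the field and states what a valid value looks like, which gives the model something concrete to change on the next call.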

Write output for the model

Tool output has two audiences: the model and humans. The model needs text that is concise, stable, and usable as a basis for further reasoning. Humans may need the full diff, command exit codes, elapsed time, the truncation policy, and expandable details. Do not cram every structure the human UI needs into the tool result text.

Have each tool return two layers of results:

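type ToolResult<Details> = {
  // Concise text that is placed in the LLM context.
  message: ToolResultMessage;
  // Structured data for the event stream, the log, or the UI; not placed in the LLM context.
  details: Details;
};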

message goes into the LLM context; details goes into the event stream, the log, or the UI. The edit tool can tell the model "replacement succeeded, 2 lines changed" while handing the UI a structured diff. That saves tokens without sacrificing observability.
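A small sketch of the edit example, operating on an in-memory string so it can run without a filesystem. The ToolResultMessage fields, the EditDetails payload, and the applyEdit function are assumptions made for illustration; a real diff structure would likely be richer.

```typescript
// Hypothetical message shape; the source does not define ToolResultMessage.
type ToolResultMessage = { isError: boolean; text: string };
type ToolResult<Details> = { message: ToolResultMessage; details: Details };

// Hypothetical structured payload for the UI and the log.
type EditDetails = { path: string; removed: string[]; added: string[] };

// Replaces the first occurrence of oldText and reports the change in two layers.
function applyEdit(
  path: string,
  content: string,
  oldText: string,
  newText: string,
): { updated: string; result: ToolResult<EditDetails> } {
  const index = content.indexOf(oldText);
  if (index === -1) {
    return {
      updated: content,
      result: {
        message: {
          isError: true,
          text: `No match for the given text in ${path}. Read the file again and copy the text to replace exactly.`,
        },
        details: { path, removed: [], added: [] },
      },
    };
  }
  const updated = content.slice(0, index) + newText + content.slice(index + oldText.length);
  const removed = oldText.split("\n");
  const added = newText.split("\n");
  return {
    updated,
    result: {
      // The model sees only a short summary.
      message: {
        isError: false,
        text: `Replacement succeeded in ${path}: ${removed.length} line(s) removed, ${added.length} line(s) added.`,
      },
      // The UI receives the full before/after lines.
      details: { path, removed, added },
    },
  };
}
```

The summary contains counts rather than the changed text itself, so the token cost of the result stays roughly constant no matter how large the edit is.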

The read-only toolbox

Before giving the agent write access, implement a read-only toolbox first:

read: read a text file, with support for line ranges and output truncation.
ls: list a directory, distinguishing files, directories, symlinks, and hidden entries.
grep: search text, cap the number of results, return matching lines and paths.
find: find files by name, with limits on traversal scope and result count.

The goal of the read-only tools is not feature completeness; it is to instill in the model the habit of "observe before acting." The system prompt should also state the requirements explicitly: read the target file before editing it; search first when you do not know the path; never guess file contents.
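As one illustration of the toolbox, here is a capped grep sketched over in-memory files so that it runs without touching a disk. The SourceFile input, the GrepOutput shape, and the exact wording of the messages are assumptions for this sketch; a real tool would walk the workspace and would probably support patterns rather than literal substrings.

```typescript
type SourceFile = { path: string; content: string };
type GrepMatch = { path: string; line: number; text: string };
type GrepOutput = { matches: GrepMatch[]; truncated: boolean; text: string };

// Literal substring search that stops collecting once maxResults is reached.
function grepFiles(files: SourceFile[], query: string, maxResults = 50): GrepOutput {
  const matches: GrepMatch[] = [];
  let truncated = false;

  search: for (const file of files) {
    let lineNumber = 0;
    for (const text of file.content.split("\n")) {
      lineNumber++;
      if (!text.includes(query)) continue;
      if (matches.length === maxResults) {
        // One more match exists beyond the cap, so the output is incomplete.
        truncated = true;
        break search;
      }
      matches.push({ path: file.path, line: lineNumber, text });
    }
  }

  if (matches.length === 0) {
    return {
      matches,
      truncated,
      text: `No matches for "${query}". Try a shorter or different keyword.`,
    };
  }

  const header = truncated
    ? `Showing the first ${maxResults} results; more matches exist. Narrow the keywords or restrict the directory.`
    : `Found ${matches.length} result(s).`;
  const body = matches.map((m) => `${m.path}:${m.line}: ${m.text}`);
  return { matches, truncated, text: [header, ...body].join("\n") };
}
```

Both the empty case and the truncated case end with a suggested next step, so the model is not left with a dead end after either outcome.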

Watching it run

A good read result should look like this:

Read src/config.ts lines 1-200.
Output was truncated after 200 lines. Request a narrower range if needed.

1 export type Config = {
2   model: string;
3   maxTurns: number;
...

It gives the model facts, boundaries, and a suggested next step all at once. Bad output is either "file too long" or tens of thousands of lines pasted in raw. The former leaves the model nothing to continue with; the latter wastes context.
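A formatter along these lines could produce that kind of output. This is a sketch under stated assumptions: the function name formatRead, the 200-line default, and the out-of-range message are illustrative choices, and the content is passed in as a string rather than read from disk.

```typescript
type ReadOutput = { text: string; truncated: boolean };

// Formats a slice of a file with line numbers and an explicit truncation notice.
function formatRead(path: string, content: string, startLine = 1, maxLines = 200): ReadOutput {
  // Ignore a single trailing newline so it does not count as an extra empty line.
  const trimmed = content.endsWith("\n") ? content.slice(0, -1) : content;
  const all = trimmed.split("\n");
  const first = Math.max(1, startLine);
  const slice = all.slice(first - 1, first - 1 + maxLines);

  if (slice.length === 0) {
    return {
      text: `${path} has ${all.length} lines; line ${first} is out of range. Request a range within 1-${all.length}.`,
      truncated: false,
    };
  }

  const last = first + slice.length - 1;
  const truncated = last < all.length;
  const header = [`Read ${path} lines ${first}-${last}.`];
  if (truncated) {
    header.push(`Output was truncated after ${maxLines} lines. Request a narrower range if needed.`);
  }
  // Prefix each line with its 1-based line number.
  const body = slice.map((line, i) => `${first + i} ${line}`);
  return { text: [...header, "", ...body].join("\n"), truncated };
}
```

Because the header is derived from the slice actually returned, the stated range and the numbered lines cannot disagree.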

Production trade-offs

The tool layer needs at least these policies:

Paths must be resolved before use and must stay inside the workspace boundary; ".." segments must not escape it.
Text reads have to handle encodings and binary files.
Long output must be truncated, and the model must be told explicitly that truncation happened.
Command-style tools need timeouts and process-tree cleanup.
File-writing tools must participate in the per-file write queue to avoid parallel overwrites.
Tool results need stable ids so the UI and the log can correlate them.

These policies sound like details, but they decide whether the agent "occasionally demos well" or "can be used in a real repository."
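Two of these policies, path containment and binary detection, can be sketched as pure functions. Both are assumptions made for illustration: resolveInWorkspace works on path strings only and does not follow symlinks (a real implementation would likely also check the resolved path on disk), and looksBinary uses the common heuristic of scanning a leading sample for a NUL byte, which does not handle every encoding.

```typescript
type PathCheck = { ok: boolean; path: string; message: string };

// Normalizes a workspace-relative path and rejects anything that would leave the workspace.
function resolveInWorkspace(requested: string): PathCheck {
  // Reject POSIX absolute paths and Windows drive paths outright.
  if (requested.startsWith("/") || /^[A-Za-z]:[\\/]/.test(requested)) {
    return { ok: false, path: "", message: "path must be a relative path inside the workspace." };
  }
  const parts: string[] = [];
  for (const segment of requested.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) {
        // Popping past the root means the path escapes the workspace.
        return {
          ok: false,
          path: "",
          message: `"${requested}" escapes the workspace. Use a path below the workspace root.`,
        };
      }
      parts.pop();
      continue;
    }
    parts.push(segment);
  }
  return { ok: true, path: parts.join("/"), message: "" };
}

// Heuristic: treat the data as binary if a NUL byte appears in the leading sample.
function looksBinary(bytes: Uint8Array, sampleSize = 8000): boolean {
  const end = Math.min(bytes.length, sampleSize);
  for (let i = 0; i < end; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```

Running the containment check before any filesystem call means that a rejected path never reaches the read, ls, or grep implementation at all.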

Exercises

Implement three of the read-only tools: read, ls, and grep.

Acceptance criteria:

A nonexistent path returns isError: true, and the error message suggests how to correct the call.
Overly long files are truncated, with the truncation policy stated.
Binary files are refused rather than having garbled bytes stuffed into the context.
Every path must resolve inside the workspace.
Use the faux provider to make the model grep first and then read, verifying that the two tool calls chain together.