A human opens a web page and ignores most of it: navigation, sidebars, repeated links. The brain filters automatically. For an AI agent, none of that is free. Every character that reaches the prompt becomes tokens — processed, paid for, and potentially confusing.

Efficiency in agent tools is not about the model. It is about the interaction.

I built a web browser for AI agents, and what I learned from building it has reshaped how I design every tool since.

The Problem: Flat Interactions

Most agent tools follow one pattern: call the tool, dump everything into the prompt, let the model sort it out. A web scraper returns the full page. A file reader returns the entire file. A database client returns every column.

This works. It is also wasteful.

When a tool returns more than the agent asked for, the model pays for data it didn’t need and must reason through noise it didn’t ask to see. The cost is not just tokens. It is attention, accuracy, and latency.
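In code, the flat pattern looks something like this minimal sketch. The tool name and shape here are hypothetical, chosen only to illustrate the pattern:

```python
# Flat interaction: the tool returns everything, the model sorts it out.
# `fetch_page` is an illustrative name, not a real tool from the browser.
import urllib.request

def fetch_page(url: str) -> str:
    """Return the entire response body as one string."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Every character of this result lands in the prompt as tokens,
# whether the agent needed one section of the page or none of it.
```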

Two-Step Interactions

The browser I built uses a two-step interaction pattern instead:

  1. The first call returns structure — what is available: sections, headings, links. No content.
  2. The second call returns content — only what the agent explicitly selects.

The agent decides what it needs before the data crosses into the prompt. The harness fetches and returns only that.
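The contract can be made concrete with a small sketch. Assume the harness has already fetched and parsed a page into sections; `outline` and `read_section` are illustrative names, not the browser's actual API:

```python
# Two-step contract: step one returns structure, step two returns only
# the content the agent explicitly selected.
from dataclasses import dataclass

@dataclass
class Section:
    id: str
    heading: str

def outline(page: dict) -> list[Section]:
    """Step 1: structure only -- section ids and headings, no content."""
    return [Section(id=s["id"], heading=s["heading"]) for s in page["sections"]]

def read_section(page: dict, section_id: str) -> str:
    """Step 2: content for exactly one section the agent selected."""
    for s in page["sections"]:
        if s["id"] == section_id:
            return s["text"]
    raise KeyError(section_id)

# The agent sees the outline first, picks a section id, and only that
# section's text ever crosses into the prompt.
```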

This is not caching. This is not compression. This is a different interaction contract between agent and tool.

The Numbers

Fetching one section of a Wikipedia article through a two-step interaction costs around 196 tokens. Loading the full page into the prompt: around 14,600.

The model did not change. The prompt structure did not change. The interaction changed.

Same information. 74× fewer tokens. The gain is not in the LLM. It is in the decision point.

Push Decisions Into the Harness

The principle generalizes beyond the browser:

Push processing into the harness, reasoning into the LLM.

The harness computes, filters, and structures. The LLM reasons over what remains. The harness decides what to return. The LLM decides what it means.
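As one illustration of the split, consider a hypothetical log-reading tool. The harness tallies and ranks errors; only a compact summary reaches the model:

```python
# Harness computes; LLM reasons. The function name and log format are
# assumptions for the sketch, not a real tool.
from collections import Counter

def error_summary(log_lines: list[str], limit: int = 5) -> str:
    """Count error lines in the harness; return only a short summary."""
    errors = Counter(
        line.split(":", 1)[-1].strip()
        for line in log_lines
        if line.startswith("ERROR")
    )
    top = errors.most_common(limit)
    return "\n".join(f"{count}x {msg}" for msg, count in top)

# The LLM receives a handful of summary lines, not thousands of raw
# ones. It decides what the errors mean, not how to tally them.
```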

The idea is not new: GraphQL applied the same inversion to API over-fetching a decade ago. But for agent tools, the stakes are different. Over-fetching does not merely cost bandwidth. It costs reasoning capacity.

What Actually Needs to Reach the LLM?

I carry one question into every new agent tool I build: what actually needs to reach the LLM?

If the tool can decide it, the tool should decide it. If the structure can be explored before the content is fetched, the agent should explore first. If the data can be scoped to the request, scope it before it enters the prompt.
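A database tool, for instance, can apply that last rule directly. This is a sketch under the assumption that table and column names are validated before they reach the query:

```python
# Scope the data to the request before it enters the prompt:
# select only the columns the agent asked for, never SELECT *.
import sqlite3

def query_scoped(db_path: str, table: str, columns: list[str], limit: int = 10):
    """Return only the requested columns, capped at `limit` rows."""
    cols = ", ".join(columns)  # identifiers assumed validated upstream
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(f"SELECT {cols} FROM {table} LIMIT ?", (limit,))
        return cur.fetchall()
```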

The answer is always the same: change the interaction, not the model.


This is the second post in a series on building AI agent harnesses. Read the first: From HCI to ACI.