Don't Send the Web Page Through the Model

I built sanqian-browser to do two practical things: save what I'm reading, and fill the forms I'm tired of filling. The save part looked like an afternoon of work. It mostly was, until it broke. The fix turned out to be more interesting than the feature.

First, the shape of the system. sanqian-browser is a browser extension; it does not decide anything by itself. The agent lives in sanqian, the desktop app. The extension exposes tools: read the page, click things, and, because the desktop app sits behind it, write to a folder on disk. So saving a page looked like one more tool:

web_file_write(path, content)

The agent reads the page, hands the content to web_file_write, done. It worked on a tweet, on a blog post, on every ordinary article I pointed it at -- until the day I asked it to save a sprawling Wikipedia entry, body and infoboxes and three hundred references all at once, and it failed.

Why it breaks

The context window is not the limit you hit first. Everyone already budgets for input, and million-token context windows are no longer surprising. The tighter limit is on the other side: a model can generate far less in a single response than it can read, and output limits have grown much more slowly than context windows. web_file_write(path, content) puts the whole file on the output side. To save a 48KB article, the model has to emit those 48KB as the content argument -- typing the entire page back out, byte for byte, after it has just read it.

What I had built, it turned out, was a very expensive photocopier. Reading the page costs input tokens, which is usually fine. Reproducing the page costs output tokens, which is where it fails. Past a certain length the page does not fit through the model's output at all, and the call truncates or gives up. On a tweet you never notice. On something the size of that Wikipedia page, you notice every time.
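A back-of-envelope check makes the asymmetry concrete. This is only a sketch: the four-bytes-per-token ratio is a rough rule of thumb for English text, and the 8,192-token output cap is a hypothetical figure chosen for illustration, not the limit of any particular model.

```javascript
// Rough rule of thumb, not a measurement from a real tokenizer.
const BYTES_PER_TOKEN = 4;

// Hypothetical output cap, picked only to illustrate the comparison.
const OUTPUT_CAP_TOKENS = 8192;

// Estimate how many tokens the model must emit to reproduce `byteLength` bytes.
function estimateTokens(byteLength) {
  return Math.ceil(byteLength / BYTES_PER_TOKEN);
}

// True when the whole payload could fit in a single response under the cap.
function fitsInOutput(byteLength, cap = OUTPUT_CAP_TOKENS) {
  return estimateTokens(byteLength) <= cap;
}

const tweetBytes = 280;
const articleBytes = 48 * 1024;

console.log(estimateTokens(tweetBytes), fitsInOutput(tweetBytes));     // 70 true
console.log(estimateTokens(articleBytes), fitsInOutput(articleBytes)); // 12288 false
```

Under these assumptions the tweet is a rounding error and the article overshoots the cap by half, which matches the way the failure only shows up on long pages.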

The input side has a quieter version of the same problem. To be read at all, the whole article has to sit in the context window, and a long one crowds out everything else the model needs room for. That is why the read tools truncate by default: ask for a long page and you get the first few thousand characters, with the rest dropped. So the model cannot reliably get the whole article in, never mind back out. The page is too big for the model in both directions, which is the hint that the model is the wrong thing to move it.

The fix that wasn't

First instinct: chunk it. Save the first 10KB, append the next, and so on -- web_file_write even has an append flag.

It does not help, and the reason is the point of the whole bug. Every chunk still has to come out of the model. You have not removed the retyping; you have spread it across turns and added bookkeeping on top: which byte did I stop at, do not drop the boundary, do not cover a gap with "...". Models are bad at exactly that. I wrote three increasingly elaborate prompt rules to make chunked saving reliable before I noticed the responsibility was in the wrong layer. The content should not be chunked through the model. It should not go through the model at all.
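The arithmetic behind that can be sketched as follows. The numbers are assumptions for illustration: roughly four bytes per token, and a hypothetical 40 tokens of wrapper overhead for each tool call.

```javascript
// Assumptions for illustration only, not measurements.
const BYTES_PER_TOKEN = 4;           // rough rule of thumb for English text
const OVERHEAD_TOKENS_PER_CALL = 40; // hypothetical cost of each tool-call wrapper

// Total output the model must produce to save `totalBytes` in chunks of `chunkBytes`.
function chunkedCost(totalBytes, chunkBytes) {
  const calls = Math.ceil(totalBytes / chunkBytes);
  const payloadTokens = Math.ceil(totalBytes / BYTES_PER_TOKEN);
  return { calls, outputTokens: payloadTokens + calls * OVERHEAD_TOKENS_PER_CALL };
}

const oneShot = chunkedCost(48 * 1024, 48 * 1024);
const chunked = chunkedCost(48 * 1024, 10 * 1024);

console.log(oneShot); // { calls: 1, outputTokens: 12328 }
console.log(chunked); // { calls: 5, outputTokens: 12488 }
```

Chunking changes how many calls there are, not how many bytes the model has to retype. The payload term is identical in both rows; only the overhead grows.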

Send the code to the data

So I turned it around. Instead of pulling the page into the model and asking the model to hand bytes to a write tool, I had it send a small script into the page, where the bytes already are. The script does the extraction there and hands off a finished file.

The model writes that script; web_execute_js runs it in the page:

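// Readability mutates the DOM it is given, so parse a clone of the document.
const article = new Readability(document.cloneNode(true)).parse();
// parse() returns null when it finds no article; fail with a clear message.
if (!article) throw new Error('Readability could not extract an article');
const markdown = new TurndownService().turndown(article.content);
// Collapse each run of non-alphanumerics into a single dash and trim the ends.
const slug = article.title.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
return {
  __save: {
    path: `articles/${slug || 'untitled'}.md`,
    content: `# ${article.title}\n\n> ${location.href}\n\n${markdown}`,
  },
};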

The script passed to web_execute_js is capped at 10KB, so what the model produces is a recipe, not a payload. Readability and TurndownService are already loaded in the page (as is DOMPurify, which this script does not need), so extraction runs where the content lives. The full text never has to enter the model's context to get saved, which sidesteps the read-side problem too. The script does not write anything itself; page JavaScript cannot touch the disk. It just returns an object with a __save key.

That return value lands back in the extension, and before it reaches the model, the handler catches the key:

// bypass the LLM for large content
if (result && typeof result === 'object' && '__save' in result) {
  const { path, content } = result.__save;
  const saved = await saveFileViaNativeMessaging({ path, content, baseDir });
  return { success: true, saved: { path: saved.fullPath, size: saved.size } };
}

__save is not a function; it is just a key the handler looks for. The extension cannot write files either; it is sandboxed like the page. So it ships the bytes over Native Messaging to com.sanqian.native, a small Go process installed alongside sanqian. That process writes the file and confines the path to the base directory, so a stray ../ cannot climb out of it. What comes back to the model is this:

{ "success": true, "saved": { "path": "articles/foo.md", "size": "48.5 KB" } }

A few dozen tokens at most. The model named the file, shaped the markdown, and decided where it went, but it never had to carry the page body.
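The path confinement mentioned above is the kind of check that belongs in the host rather than in a prompt. The real host is a Go process whose code is not shown here; what follows is a minimal sketch of the same idea in JavaScript for Node, with confinePath as a hypothetical helper name. It does not resolve symlinks, which a production check would also need to consider.

```javascript
const path = require('node:path');

// Hypothetical helper: resolve `requested` inside `baseDir`, or throw if the
// result would land outside it (or on the base directory itself).
function confinePath(baseDir, requested) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);

  const escapes =
    rel === '' ||                      // the base directory itself is not a file
    rel === '..' ||                    // exactly the parent directory
    rel.startsWith('..' + path.sep) || // anywhere above the base
    path.isAbsolute(rel);              // e.g. a different drive on Windows

  if (escapes) {
    throw new Error(`path escapes base directory: ${requested}`);
  }
  return target;
}
```

Comparing the relative path, rather than checking the raw string for "../", means a sequence like articles/../../secrets is rejected after normalization while a harmless file name that merely begins with two dots is still allowed.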

Two pipes

The bytes do not even travel on the same wire as the agent. Tool calls ride a WebSocket between the desktop app and the extension; file content rides Native Messaging between the extension and the Go host. The article goes page -> extension -> Go host -> disk, and the model's side of the system never carries any of it.

The chunking comes back here too, just not in the model. Down in that plumbing, a message size limit still forces big content to be split and put back together. But there it is deterministic code: a small loop that tracks offsets and reassembles bytes. What I failed to make reliable with prompt rules becomes routine when it lives in the right layer. Putting bytes back in order takes no judgment, and judgment is the thing the model has that the rest of the system does not.
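A minimal sketch of that kind of loop, under stated assumptions: the function names and the chunk shape (offset, total, data) are hypothetical, the chunk size is left to the caller, and a real Native Messaging transport would also need to encode the bytes for its JSON messages, which is omitted here.

```javascript
// Split a string's UTF-8 bytes into fixed-size chunks, each tagged with its
// byte offset and the total length so the receiver can verify completeness.
function splitIntoChunks(text, maxChunkBytes) {
  const bytes = new TextEncoder().encode(text);
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += maxChunkBytes) {
    chunks.push({
      offset,
      total: bytes.length,
      data: bytes.slice(offset, offset + maxChunkBytes),
    });
  }
  return chunks;
}

// Reassemble chunks in whatever order they arrive. Bytes are joined before
// decoding, so a multi-byte character split across two chunks survives.
function reassemble(chunks) {
  if (chunks.length === 0) return '';
  const total = chunks[0].total;
  const out = new Uint8Array(total);
  let written = 0;
  for (const { offset, data } of chunks) {
    out.set(data, offset);
    written += data.length;
  }
  // Fail loudly on a missing or duplicated chunk instead of papering over it.
  if (written !== total) {
    throw new Error(`expected ${total} bytes, got ${written}`);
  }
  return new TextDecoder().decode(out);
}
```

Everything the prompt rules were trying to enforce (where did I stop, is the boundary intact, is anything missing) is a counter and a length check here.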

What it was really about

None of this is specific to saving pages. Form-filling, the other thing I set out to build, trips over the same instinct in reverse, but that is another post. The failing instinct is common: describe a task the way a person would do it, then put the agent in the person's seat, holding whatever the task touches. It demos fine because demo inputs are small. It breaks on real ones.

The model is good at deciding what should happen. It is a bad place to put data that is only passing through. When the data already lives somewhere the runtime can reach -- a page, a file, a response -- what the model emits should be the instruction for moving it, not the data itself.