You Can't Save a Web Page Through an LLM

I built sanqian-browser to do two boring things: save what I'm reading, and fill the forms I'm tired of filling. The save part looked like an afternoon of work. It mostly was -- until it broke, and the fix turned out to be more interesting than the feature.

First, the shape of the thing. sanqian-browser is a browser extension with no brain of its own. The agent lives in sanqian, the desktop app; the extension just exposes tools -- read the page, click things, and, because the desktop app sits behind it, write to a folder on disk. So saving a page was one more tool:

web_file_write(path, content)

The agent reads the page, hands the content to web_file_write, done. It worked on a tweet, on a blog post, on every ordinary article I pointed it at -- until the day I asked it to save a sprawling Wikipedia entry, body and infoboxes and three hundred references all at once, and it declined.

Why it breaks

It isn't the context window that breaks this. Everyone budgets for that now, with inputs reaching a million tokens. The limit is on the other side: a model can generate far less than it can read, and that ceiling has risen far more slowly than the context window has. web_file_write(path, content) puts the whole file on the output side. To save a 48KB article, the model has to emit those 48KB as the content argument -- typing the entire page back out, byte for byte, having just read it in.

What I had built, it turned out, was a very expensive photocopier. Read in (input tokens, fine), reproduce out (output tokens, fatal). Past a certain length the page doesn't fit through the model's output at all, and the call truncates or gives up. On a tweet you never notice. On something the size of that Wikipedia page, you notice every time.
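To put rough numbers on the 48KB case, here is a back-of-the-envelope sketch. It assumes the common heuristic of about four characters per token for English text (real tokenizers vary, and other languages often cost more) and a hypothetical 8K-token output cap; actual caps differ by model. It counts the content as a JSON-encoded tool argument, because that is the form in which it leaves the model, so every escaped newline adds to the bill. `estimateOutputTokens` and `fitsInOutput` are illustrative names.

```javascript
// Back-of-the-envelope check: does a web_file_write call fit in the model's output?
// Assumption: ~4 characters per token (a rough heuristic for English text).
const CHARS_PER_TOKEN = 4;

// The content leaves the model as a JSON-encoded tool argument,
// so escaped quotes and newlines count against the output too.
function estimateOutputTokens(path, content) {
  const argument = JSON.stringify({ path, content });
  return Math.ceil(argument.length / CHARS_PER_TOKEN);
}

function fitsInOutput(path, content, outputCap) {
  return estimateOutputTokens(path, content) <= outputCap;
}

// A 48KB article (ASCII, so 1 char = 1 byte) with a newline every 80 characters.
const line = 'x'.repeat(79) + '\n';
const article = line.repeat(Math.ceil((48 * 1024) / line.length)).slice(0, 48 * 1024);

console.log(estimateOutputTokens('articles/foo.md', article));
console.log(fitsInOutput('articles/foo.md', article, 8192)); // hypothetical 8K-token cap
```

Under these assumptions the single call needs roughly twelve thousand output tokens of pure copying before the model has written a word of its own.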

The input side has a quieter version of the same problem. To be read at all, the whole article has to sit in the context window -- and a long one crowds out everything else the model needs room for. That's why the read tools truncate by default: ask for a long page and you get the first few thousand characters, the rest dropped. So the model can't reliably get the whole article in, never mind back out. The page is too big for the model in both directions -- which is the hint that the model is the wrong thing to be moving it.
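A truncating read tool can be as simple as the sketch below. The name `truncateForModel`, the 4,000-character default, and the shape of the return value are assumptions for illustration, not sanqian-browser's actual read tool. Whatever falls past the cut never reaches the model, so there is nothing for it to write back out.

```javascript
// Hypothetical read-side limit: keep only the first maxChars characters of page text.
// The default is an assumption standing in for "the first few thousand characters".
function truncateForModel(text, maxChars = 4000) {
  if (text.length <= maxChars) {
    return { text, truncated: false, totalChars: text.length };
  }
  return {
    // Everything past maxChars is dropped before the model ever sees it.
    text: text.slice(0, maxChars),
    truncated: true,
    totalChars: text.length,
  };
}

// In the page this would be fed something like document.body.innerText.
const page = 'word '.repeat(10000); // 50,000 characters
const view = truncateForModel(page);
console.log(view.truncated, view.text.length, view.totalChars); // true 4000 50000
```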

The fix that wasn't

First instinct: chunk it. Save the first 10KB, append the next, and so on -- web_file_write even has an append flag.

It doesn't help, and the reason is the whole point. Every chunk still has to come out of the model. You haven't removed the retyping; you've spread it across turns and added bookkeeping on top -- which byte did I stop at, don't drop the boundary, don't paper over a gap with "...". Models are bad at exactly that. I wrote three increasingly elaborate prompt rules to make chunked saving reliable before I noticed I was negotiating with the wrong part of the system. The content shouldn't be chunked through the model. It shouldn't go through the model at all.
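The arithmetic of chunking is easy to check. This sketch plans the tool calls a chunked save would need, using the 10KB chunks from the first attempt; `planChunkedWrites` is a hypothetical helper, and `append` is an assumed field name (the flag's real name isn't given). However the content is sliced, the chunks add back up to the whole article, and every one of them is still output the model has to generate.

```javascript
// Hypothetical plan for saving an article in 10KB chunks via web_file_write.
// Each entry is a tool call the MODEL would have to emit, content included.
function planChunkedWrites(path, content, chunkSize = 10 * 1024) {
  const calls = [];
  for (let start = 0; start < content.length; start += chunkSize) {
    calls.push({
      tool: 'web_file_write',
      // 'append' is an assumed flag name: the first call creates, the rest append.
      args: { path, content: content.slice(start, start + chunkSize), append: start > 0 },
    });
  }
  return calls;
}

const article = 'a'.repeat(48 * 1024);
const calls = planChunkedWrites('articles/foo.md', article);

// Total characters the model must generate across all calls.
const emitted = calls.reduce((sum, c) => sum + c.args.content.length, 0);
console.log(calls.length, emitted === article.length); // 5 true
```

Five calls instead of one, the same 48KB of output, and now the model also has to keep the offsets straight from turn to turn.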

Send the code to the data

So I turned it around. Instead of pulling the page up into the model and having it hand the bytes to a write tool, I had it push a small script down into the page -- where the bytes already are -- and let the script do the extraction and hand off a finished file.

The model writes that script; web_execute_js runs it in the page:

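// Runs in the page via web_execute_js; Readability and TurndownService are already loaded there.
const article = new Readability(document.cloneNode(true)).parse();
if (!article) return { error: 'no readable article on this page' }; // parse() returns null on failure
const title = article.title || 'untitled';
const markdown = new TurndownService().turndown(article.content);
// File name: one dash per run of non-alphanumerics, no leading or trailing dashes.
const slug = title.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-|-$/g, '').toLowerCase() || 'untitled';
return {
  __save: {
    path: `articles/${slug}.md`,
    content: `# ${title}\n\n> ${location.href}\n\n${markdown}`,
  },
};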

The code is capped at 10KB, so what the model produces is a recipe, not a payload. Readability, TurndownService, and DOMPurify are already loaded in the page, so the extraction runs where the content lives -- and the full text never has to come up into the model's context to get saved, which sidesteps the read-side problem too. The script doesn't write anything itself; page JavaScript can't touch the disk. It just returns an object with a __save key.
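The 10KB cap is what keeps the model on the recipe side of the line. A minimal sketch of such a check, assuming the cap is measured in UTF-8 bytes (whether it counts bytes or characters isn't stated); `MAX_SCRIPT_BYTES` and `checkScript` are hypothetical names.

```javascript
// Hypothetical guard in front of web_execute_js: reject scripts over 10KB.
// Assumption: the cap is measured in UTF-8 bytes, not characters.
const MAX_SCRIPT_BYTES = 10 * 1024;

function checkScript(code) {
  const bytes = new TextEncoder().encode(code).length;
  if (bytes > MAX_SCRIPT_BYTES) {
    return { ok: false, error: `script is ${bytes} bytes; limit is ${MAX_SCRIPT_BYTES}` };
  }
  return { ok: true, bytes };
}

// The extraction recipe is a few hundred bytes...
const recipe = `
const article = new Readability(document.cloneNode(true)).parse();
const markdown = new TurndownService().turndown(article.content);
return { __save: { path: 'articles/foo.md', content: markdown } };
`;
console.log(checkScript(recipe).ok); // true

// ...while pasting the article itself into the script as a literal blows the cap.
const payload = `return { __save: { path: 'a.md', content: ${JSON.stringify('x'.repeat(48 * 1024))} } };`;
console.log(checkScript(payload).ok); // false
```

A script that tries to smuggle the article through as a string literal fails for the same reason the old write call did, so the cheap way to save a big page is to let the page produce it.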

The script's return value lands back in the extension, and before it can reach the model, the handler catches the __save key:

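// bypass the LLM for large content: a result carrying __save is written to
// disk here and never forwarded to the model
if (result && typeof result === 'object' && '__save' in result) {
  const { path, content } = result.__save ?? {};
  if (typeof path !== 'string' || typeof content !== 'string') {
    return { success: false, error: '__save needs a string path and content' };
  }
  const saved = await saveFileViaNativeMessaging({ path, content, baseDir });
  // only the location and size go back to the model
  return { success: true, saved: { path: saved.fullPath, size: saved.size } };
}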

__save isn't a function -- it's just a key the handler looks for. The extension can't write files either; it's sandboxed like the page. So it ships the bytes over Native Messaging to com.sanqian.native, a small Go process installed alongside sanqian that writes the file and jails the path so a stray ../ can't climb out. What comes back to the model is this:

{ "success": true, "saved": { "path": "articles/foo.md", "size": "48.5 KB" } }

A few dozen tokens. The model named the file, shaped the markdown, decided where it went, and never handled a byte of the body.
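The Go host's path jail isn't shown here. Below is a JavaScript sketch of the same idea, kept in one language: normalize the requested path lexically, refuse absolute paths, and refuse anything whose `..` segments would climb above the base directory. `jailPath` and the `/home/me/Sanqian` base directory are hypothetical. This purely lexical version assumes POSIX-style forward slashes and does not resolve symlinks, which a real host would also need to handle before writing.

```javascript
// Lexical path jail: resolve `requested` against a base directory and refuse
// anything that escapes it. POSIX-style forward slashes only; symlinks NOT resolved.
function jailPath(baseDir, requested) {
  if (requested.startsWith('/') || requested.includes('\\') || requested.includes('\0')) {
    throw new Error(`rejected path: ${requested}`);
  }
  const parts = [];
  for (const segment of requested.split('/')) {
    if (segment === '' || segment === '.') continue; // "a//b" and "./a" are harmless
    if (segment === '..') {
      if (parts.length === 0) throw new Error(`path escapes base dir: ${requested}`);
      parts.pop(); // ".." is fine as long as it stays inside the jail
    } else {
      parts.push(segment);
    }
  }
  if (parts.length === 0) throw new Error('empty path');
  return `${baseDir.replace(/\/+$/, '')}/${parts.join('/')}`;
}

console.log(jailPath('/home/me/Sanqian', 'articles/foo.md')); // /home/me/Sanqian/articles/foo.md
console.log(jailPath('/home/me/Sanqian', 'articles/../notes/x.md')); // /home/me/Sanqian/notes/x.md
try {
  jailPath('/home/me/Sanqian', '../../.ssh/authorized_keys');
} catch (e) {
  console.log(e.message); // path escapes base dir: ../../.ssh/authorized_keys
}
```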

Two pipes

The bytes don't even travel on the same wire as the agent. Tool calls ride a WebSocket between the desktop app and the extension; the file content rides Native Messaging between the extension and the Go host. The article goes page → extension → Go host → disk, and the model's side of the system never carries any of it.
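On the Native Messaging leg, the browser does the framing for the extension (`chrome.runtime.connectNative` and `chrome.runtime.sendNativeMessage`). What the host reads from stdin is, per Chrome's documented protocol, a 32-bit message length in native byte order followed by that many bytes of UTF-8 JSON. The sketch below encodes and decodes one such frame in JavaScript, assuming a little-endian machine; `encodeFrame`, `decodeFrame`, and the `{ op, path, content }` message shape are hypothetical, and in sanqian the real reader is the Go host.

```javascript
// Native Messaging frame: 4-byte length (native byte order; little-endian
// assumed here) followed by the UTF-8 JSON message.
function encodeFrame(message) {
  const body = new TextEncoder().encode(JSON.stringify(message));
  const frame = new Uint8Array(4 + body.length);
  new DataView(frame.buffer).setUint32(0, body.length, true); // true = little-endian
  frame.set(body, 4);
  return frame;
}

// Read one frame from the front of a byte buffer. Returns the message and the
// number of bytes consumed, or null if the frame hasn't fully arrived yet.
function decodeFrame(bytes) {
  if (bytes.length < 4) return null;
  const length = new DataView(bytes.buffer, bytes.byteOffset, 4).getUint32(0, true);
  if (bytes.length < 4 + length) return null;
  const json = new TextDecoder().decode(bytes.subarray(4, 4 + length));
  return { message: JSON.parse(json), consumed: 4 + length };
}

// Hypothetical save request.
const frame = encodeFrame({ op: 'save', path: 'articles/foo.md', content: '# Foo\n\nbody' });
const { message, consumed } = decodeFrame(frame);
console.log(message.path, consumed === frame.length); // articles/foo.md true
```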

The chunking comes back here, too -- just not in the model. Down in that plumbing a message size limit forces big content to be split and put back together, and there it's a few dumb lines that never miss. What I spent three prompt rules failing to teach the model, an indifferent loop does for free. Putting bytes back in order takes no judgment -- and judgment is the only thing the model has that the rest of the system doesn't.
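Which limit applies, and how big the pieces are, isn't specified, so this sketch takes the piece size as a parameter. `splitForTransport`, `Reassembler`, and the `{ id, seq, total, data }` envelope are hypothetical, not sanqian's actual protocol. It shows why this is loop work rather than model work: each piece carries its own number, and putting them back together is slotting by number and joining once every slot is full.

```javascript
// Split a large string into numbered pieces, each small enough for one message.
// maxChars is a parameter: the real limit depends on the transport.
function splitForTransport(id, content, maxChars) {
  if (maxChars < 2) throw new Error('maxChars must be at least 2');
  const chunks = [];
  let start = 0;
  while (start < content.length) {
    let end = Math.min(start + maxChars, content.length);
    // Don't cut between the two halves of a surrogate pair (an emoji, say):
    // a lone half may not survive JSON decoding on the receiving side.
    const last = content.charCodeAt(end - 1);
    if (end < content.length && last >= 0xd800 && last <= 0xdbff) end -= 1;
    chunks.push(content.slice(start, end));
    start = end;
  }
  return chunks.map((data, seq) => ({ id, seq, total: chunks.length, data }));
}

// Collect pieces in any order; return the whole content once every piece is in.
class Reassembler {
  constructor() {
    this.pending = new Map(); // id -> sparse array of pieces, indexed by seq
  }

  add({ id, seq, total, data }) {
    if (!this.pending.has(id)) this.pending.set(id, new Array(total));
    const slots = this.pending.get(id);
    slots[seq] = data;
    // filter() skips holes, so this counts only the slots that have arrived.
    if (slots.filter((s) => s !== undefined).length < total) return null;
    this.pending.delete(id);
    return slots.join('');
  }
}

// Round trip, delivering the pieces in reverse order.
const article = 'Hello 👋 '.repeat(5000);
const pieces = splitForTransport('save-1', article, 1000);
const reassembler = new Reassembler();
let rebuilt = null;
for (const piece of [...pieces].reverse()) {
  const done = reassembler.add(piece);
  if (done !== null) rebuilt = done;
}
console.log(rebuilt === article); // true
```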

What it was really about

None of this is specific to saving pages. (Form-filling, the other thing I set out to build, trips over the same instinct in reverse -- but that's another post.) The instinct that failed is the same one every time: you describe a task the way a person would do it, then build the agent into the person's seat, holding whatever the task touches. It demos fine -- demo inputs are small -- and breaks on real ones.

The model is good at deciding what should happen. It's a bad place to put anything that's only passing through. When the data already lives somewhere the runtime can reach -- a page, a file, a response -- what the model emits should be the instruction for moving it, never the data itself.