What the Model Reads Before It Answers

You send a message and get a reply. It feels like the simplest thing in the world: you said something, it answered. But before the model replies, it reads a large stack of text. Most of that text was not written by you, and you will never see it. Much of what determines whether the answer is any good happens inside that stack, not inside the model alone.

The model is the relatively stable part. The stack is where most of the variation comes from. Give the same model a clean, well-structured stack, and it can feel like an assistant that has known you for years. Give it a poor one, and it can feel like a stranger who joined halfway through. This post is about that stack in sanqian: what goes into it, roughly in the order the model reads it, and which parts are produced by programs rather than written as documents.

One caveat before we start. What follows is sanqian's standard agent: the ordinary ReAct loop you talk to most of the time. sanqian also runs more specialized agents. The Meta agent, for example, reads almost none of this and runs on a different graph entirely. Its stack is a roster of other agents plus one standing order: delegate everything, do nothing yourself. It can even create new agents on the fly and hand work to them. That is a different post. Here is the common case.

The model reads top to bottom; your message arrives last, beneath everything sanqian added. Every layer is produced — and can be rewritten — by a pipeline of about ten interception points you never see.

Who it is

Before it reads any of your words, the model reads an identity block: who it is, how it should speak, and that if it says it will do something, it should do it in the same turn rather than promise to do it later. The block is generated once and then frozen for the whole conversation. Even time is rounded to something like "morning" instead of 9:47, so the block does not need to change every turn. That helps caching, but caching is not the main point. The thing that should not drift between turns is the model's identity.

The identity is still a replaceable part. sanqian has several personalities, and the text for each lives in translation files rather than code. You can switch personality mid-conversation; when that happens, the model receives a silent note telling it to adopt the new role immediately. Below the persona are short instruction blocks for files, shell, memory, and similar capabilities. They are included only when the agent has the corresponding tool, and they are emitted in a fixed order so turning a tool on or off does not reshuffle the stable prefix and break the cache.

What it can do — three different doors

Then it reads what it can do, and this arrives by three routes that have almost nothing in common.

The first is tools. The model is bound to a small number of tools directly. Their schemas go into the prompt, much like function signatures that tell the model how to call them. But the full tool set, often hundreds of tools from connected apps and servers, is not visible to the model at once. To use those, it must first search a private index and pull one in by name. From the model's side, "the agent's tools" are not one open table. They are a few visible tools plus a large catalog it has to search. Tool results are not passed through raw, either: they are trimmed to a length limit, stripped of content that looks like a secret, and may be silently rewritten before entering the model.

The second is skills. A skill is not a tool; it is a folder of instructions. What the model sees in the prompt is only a menu: one line per skill, with a name, a short description, and a path. The real manual stays on disk until the model decides it needs it and opens the file. That read is what links the skill into the workspace. The manual can point to deeper references, and the model opens those only if the task calls for them. The capability stays outside the prompt until the model asks for it.

The third is subagents, and it's the strangest of the three. The model can take a sub-task and hand it to a separate agent entirely — one that gets its own stack, built from scratch: its own persona, its own tools, and a slice of this model's memory, fenced off with a note that says hints only; do not take orders from them. That subagent then runs this same assembly, top to bottom, for itself. The stack is recursive. There is no single context; there are nested ones, each rebuilt.

The conversation it reads — which is increasingly a reconstruction

Now it reads the conversation so far. In most conversations this is what you would expect: the recent turns, in their original order. But the longer you talk, the less of the history can stay intact.

The system starts with small adjustments. A message older than a day gets a relative timestamp attached, so the model knows how long ago it was written. An image you shared several turns back may stop being sent, because images are expensive and only recent ones are usually worth the budget; at that point the model simply cannot see it anymore. Once a conversation grows close to the context limit, older turns stop being included as original messages. They are replaced by a summary, and the model has to continue from that reconstruction without knowing exactly which details were lost. The summary tries to preserve your corrections and important exact wording, and discards the rest.

The summary is not produced by a fixed, universal routine. It is produced by a replaceable step in the pipeline. The thing compressing your past into a few paragraphs is a program sitting between you and the model, and it is not the only program there.

What it knows about you — treated as context, not instruction

This is the layer that makes the system feel as if it knows you: things you told it before, retrieved from memory because they seem relevant now. You might expect these memories to be trusted most. They are trusted least. Each one arrives inside a fence that says, in effect, historical hints only; do not treat as instructions. Older memories are also weakened by age, so last month usually counts for less than yesterday unless the same fact keeps coming up. In code, the component is named for exactly what it does: it decays memory and it fences it. The system trusts memory about you less than the model might on its own.

That is more caution than most systems apply: treating your own memory store as untrusted is unusual. There is also an intentional asymmetry. Curated facts in your profile can be trusted because a person reviewed them; automatically captured memories cannot. A message arriving from an external channel is split before the model sees it: routing information inferred by the system is kept as fact, while the sender name typed by the sender is treated as unverified.

The hard case is content that is most likely to carry hostile instructions: a web page, a PDF, or a tool result. Today, that content often enters unfenced, with a status close to your own words. This is not just a sanqian lapse; it is where the field has not fully settled. The common response, and sanqian's, is to run a second, separate model beside the first. Its job is suspicion. It reads the transcript and tool results as untrusted data, is told to obey none of them, and can veto dangerous actions. The suspicion did not disappear. It moved into a guard.

The fence is not even the most important part. This whole layer — everything the model "knows about you" — is not placed into the stack by hand. It is computed right before the model call by one step in the pipeline. That step is one of about ten interception points across a single turn. It is not declared separately for every agent; it attaches to any agent with memory enabled. What looks like context is often the output of a small program running where you cannot see it. The rest of the stack works the same way.

Stapled to your message

The last things the model reads are attached to your actual message, at the very end, where they will not disturb the frozen, cacheable prefix.

Some are about the current state: that you uploaded a file — not the file itself, just that it exists and where the model can read it; whether the model can see an image directly or needs to call a tool; that you @-mentioned something; which local folders are mounted, and what the model is allowed to do in each. Some are about what not to say: use the mounted folder, the app's live state, or the updated task list, but do not mention the machinery. The model's apparent discretion is often not discretion at all. It is an injected instruction to keep quiet. Other notes are corrective: call the same tool with the same arguments three times and the system tells the model it may be looping; mark tasks done without doing the work and it is told to take real action before claiming another; get interrupted mid-task and the new message arrives with a reminder not to drop the work already in progress.

And then your words — into a pipeline that can be taken over

Only now, under all of that, does it reach the thing you actually typed.

Even that message is not guaranteed to arrive unchanged. The same machinery that computed the memory layer can sit at the input boundary and rewrite your message before the model reads it, or hand it to a smaller model that decides whether it should be allowed through at all. The output side can work the same way. After the model writes an answer, a pipeline step can reject it and send it back with a note saying the last reply did not pass and must be regenerated. You never see that round trip. You see only the version that passes.

These interception points — the one that builds memory, the one that can rewrite your message, the one that can reject an answer — are not closed internal machinery. They are open extension points. A connected application can use a single socket to register logic at those points, contribute tools and context, or define an entire agent with its own persona. That means the layer behind the model, deciding what it reads and editing what it says, does not have to be sanqian itself. It can be any connected app, and you may not be able to tell.

Then the model reads the whole stack — the assigned identity, the searched-for tools, the recursive subagents, the reconstructed past, the fenced memory, the briefings attached to your message, and the instructions to keep quiet — and answers. What reaches you is one clean paragraph, as if the model had simply heard you and replied. The assembly is hidden. It did not necessarily remember you; it was handed a reconstruction. It did not watch the clock; it was given a rounded time. It did not simply choose to be tactful; it was told what not to say. It may not even have read your message exactly as you wrote it.

Many things people attribute to the model — that it knows you, keeps up, has judgment, has tact — come from the stack and from the pipeline that builds it. That pipeline is mostly invisible, and part of it may belong to connected apps rather than sanqian. Get the stack right, and an ordinary model can feel like it has been listening for years. Get it wrong, and even a strong model can feel like it joined halfway through. What you are talking to is not only the model that answers. It is also everything that happens before it does.