Don’t Panic · A Field Guide to Your AI Sidekick

Don’t Panic

A field guide to your AI sidekick · how to drive it

We trace one thread: what you type, what it reads, and how tools turn text into action. See the thread, and you can drive it.

by Yu Xiao

“Don’t Panic” is printed on the cover of The Hitchhiker’s Guide to the Galaxy. Reportedly very useful.

WHY · why this is worth half an hour

All the scary words — MCP, agents, RAG, skills, harness, hooks — grow from one thread

We follow that thread end to end: how it reads your words, where it fails, how you feed it, and how it acts for you.

Once you see the thread, you can drive the system without opening the box. The jargon gets much smaller.

RECAP · the essence, in five

What you feed in is one long string of text · System + your words + its replies, joined into one
It guesses the next token from a probability table · so it’ll be confidently wrong sometimes — that’s “hallucination”
It can only read so much at once · longer means costlier — hence the context-length limit
It has no memory; each turn it re-reads from the top · longer chats mean more text — slower and pricier
It reads the two ends clearly, the middle blurs · the start and end land best; put what matters there

that’s it for the mechanics

It reads, then forgets.
So what do you feed it each time?

the list is written

But the things on that list —
did you have them all up front?

RECAP · working with it, in five

Treat it like an intern on day one · goal, current status, your materials, your constraints — spell it all out
Don’t give it what it already knows · skip public knowledge and common sense; give only what it can’t look up
Can’t put it into words? Let it ask you first · add “ask me anything unclear before you start”
The first draft is usually wrong — that’s normal · paste “what’s wrong” back as-is; a few rounds get it there
Don’t close it once it works · have it write a doc; reuse it next time and skip the rounds

you know the method · but you still did the work

You’ve been its hands and memory.
Now programs take over — we give it a body.

RECAP · the body, in five parts

At its core it’s just a brain · only reads and writes text; every other ability is bolted on outside
Hands = tools · it writes a call; a program outside actually does it (plugged in via MCP)
Memory = injection · docs, your preferences, honed playbooks — written into its context ahead of time (RAG/Memory/Skill)
Running itself = Agent · write → execute → write back, round after round, hands-off
The whole body = the harness · the layer that wires it together; at key points you can hang hooks (block / inject / follow-up)

all the buzzwords, decoded

This whole thread —
what is it really about?

SO · looking back, you had two jobs

Tools, Skills, harness… whatever the words, your job stays the same

You learned to feed it, read it, and feed it again. That is enough.

Feed it enough

Put the right text in its window: your situation, materials, status, and saved Skills.

Check the output

See if it is right. If not, paste what went wrong and try another round.

KNOW IT · the boundary you work with

It remembers the world — just not you

Two habits: say what matters every time, and give it raw material. It forgets you, but it is still trained.

Simple boundary: public + common + pre-cutoff = built in; about you / private / recent = you supply.

Side A · no memory

Each message is a first meeting
So say what matters every time

Side B · trained

In a field new to you, it may score 60–70 while you score 0
Do not dismiss a 70 from the position of 0
Give it raw material and let it judge

THE SHIFT · your seat has moved

Operator → Commander

In life, skip the technical detail: say what you want done, then check the result. At work, you do not write every line. You define the goal and review the output.

Before

You researched it and did it yourself. You were the operator.

→

Now

You define the problem, prepare the input, and check the result. AI executes; you command.

Step off the field and onto the bridge. The same hours produce more. That is leverage, not laziness.

MORE · it can help you think

It also helps you think

Ask it to argue against you. That is usually more useful than agreement. Told to be harsh, it will not soften the critique to protect your feelings.

The breakthrough is when you stop using AI only to do work and start using it to think. We are trapped in our own heads. This is a cheap outside view.

Got a plan

Ask it to find faults, holes, and objections.

Stuck

Use it to brainstorm until another path appears.

Deciding

Ask for trade-offs from angles you missed.

"Here is my plan. Be a harsh critic: find the 3 most fatal holes, and tell me exactly when it would fail."

THE CORE · push it to the bottom

The whole thread reduces to one word

Context

In a new field it may be 60–70 while you are 0 →
so give it raw material and let it judge →
tools / RAG / Memory / Skills all lower the cost of becoming good input →
one word: Context

The core is not learning tools or memorizing prompts. It is making your work legible to AI. The better the context, the more it can do for you.

The industry calls this context engineering. You just learned the useful part.

It compounds. Saved context, workflows, and Skills pay you back every time. Start early; compounding takes time.

ONE LAST THING

Don’t panic. Go try.

It only reads text and writes text — everything else is your call.

Pick one small thing you cannot do yet. Tell it the whole story today. See what comes back. Worst case, the answer is mediocre — and it cannot think you are stupid.

back home

You do not need to understand it. You only need to use it. That is enough.

The chat · what you use every day

System is set once, then User / Assistant take turns

System is the app’s house rules, set once per chat. Then User (you) and Assistant (it) take turns. The turns pile up. Nothing else is in there.
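In code, a chat is usually just a list of role-tagged messages. A minimal sketch, assuming the `{"role": ..., "content": ...}` shape this guide uses in its request examples; `check_turns` is a hypothetical helper that tests the rule above: one System message first, then User and Assistant taking turns.

```python
# A chat is nothing more than a list of role-tagged messages.
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "To be, or not to be,"},
    {"role": "assistant", "content": "that is the question:"},
    {"role": "user", "content": "Whether ’tis nobler"},
]


def check_turns(messages):
    """Hypothetical check: one system message first, then user/assistant alternate."""
    if not messages or messages[0]["role"] != "system":
        return False
    expected = "user"
    for msg in messages[1:]:
        if msg["role"] != expected:
            return False
        expected = "assistant" if expected == "user" else "user"
    return True


print(check_turns(chat))  # True
```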

The essence · what it actually does

Everything is joined into one text; it guesses the next word

Remove the boxes and the pieces become one long string. The model reads the string and continues from the end, one word at a time. However fancy the AI looks, this is the job.
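Removing the boxes takes only a few lines. The markers below copy the DeepSeek-style template this guide shows in its chat breakdown; treat `flatten` as a simplified, hypothetical stand-in for a real chat template, which handles more cases (tools, missing system line, and so on).

```python
# Special tokens, copied from the DeepSeek-style template shown in this guide.
BOS, EOS = "<｜begin▁of▁sentence｜>", "<｜end▁of▁sentence｜>"
USER, ASSISTANT = "<｜User｜>", "<｜Assistant｜>"


def flatten(messages):
    """Join role-tagged messages into the single string the model actually reads."""
    text = BOS
    for msg in messages:
        if msg["role"] == "system":
            text += msg["content"]
        elif msg["role"] == "user":
            text += USER + msg["content"]
        else:  # assistant: its earlier reply is closed with an end marker
            text += ASSISTANT + msg["content"] + EOS
    # End on the Assistant marker: the model continues from here.
    return text + ASSISTANT


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "To be, or not to be,"},
    {"role": "assistant", "content": "that is the question:"},
    {"role": "user", "content": "Whether ’tis nobler"},
]
print(flatten(chat))
```

The boxes were only ever for display: the roles survive as marker tokens inside one string.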
So how does it “guess”? Let’s flatten the view.

The essence · through its eyes

Flattened out: one string, new words popping out at the tail

Bubbles and roles are for you. The model sees one long string of tokens. It reads the whole string, then emits the next word at the end. That is the typing effect you see.

How it guesses · ① attention

Before guessing, it looks back at what matters most

Attention is the heart of the transformer. Before picking the next word, it weighs earlier words differently. Here it leans on “To be, or not to be” and the rhythm of the line.
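The weighing step can be shown with toy numbers. This is a minimal sketch of scaled dot-product attention weights for one position: score each earlier word, divide by the square root of the vector size, then softmax the scores into weights that sum to 1. The two-number word vectors are invented for illustration; real models learn much longer vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Score every earlier word against the current position: dot product / sqrt(d)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


# Invented 2-number vectors for earlier words; not taken from any real model.
words = ["To", "be", "or", "not", "question"]
keys = [[0.1, 0.0], [1.0, 0.2], [0.2, 0.1], [0.9, 0.3], [1.5, 1.0]]
query = [1.0, 1.0]  # the position about to be predicted

for word, w in zip(words, attention_weights(query, keys)):
    print(f"{word:>9}: {w:.2f}")
```

In a full transformer these weights then blend the earlier words' value vectors; that step is omitted here.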

How it guesses · ② probability

It never knows the answer — it scores every candidate

After reading everything, it builds a probability table for the next word: arrows .71 / pains .12 / stings .05… “Guessing” means picking from that table. Because it is always guessing, confident does not mean correct. A hallucination is not a separate bug; it is this table doing its normal job.
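Picking from the table can be sketched directly. Only the three listed words are kept here (`choices` treats their weights proportionally), and the seed is fixed so the run repeats; real systems add knobs such as temperature, which this sketch leaves out.

```python
import random

# The table from the text; in a real model the remaining probability covers every other word.
table = {"arrows": 0.71, "pains": 0.12, "stings": 0.05}


def pick_greedy(table):
    """Always take the most likely word."""
    return max(table, key=table.get)


def pick_sampled(table, rng):
    """Draw a word in proportion to its probability (among the listed words only)."""
    words = list(table)
    return rng.choices(words, weights=[table[w] for w in words], k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
print(pick_greedy(table))  # arrows
print([pick_sampled(table, rng) for _ in range(10)])
```

Sampling will sometimes return "pains" with the same fluency as "arrows": nothing in the mechanism marks a pick as right or wrong.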

How it guesses · ③ append, repeat

Pick one, append it, guess the next the same way

The chosen word is added to the end. Then it repeats the same process: guess, append, guess again. Word by word, the whole reply gets generated. That is the core of an LLM.
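The guess, append, repeat loop is short enough to sketch. `TOY_MODEL` is an invented lookup that only looks at the last word, which is a large simplification: a real model rereads the entire string at every step and scores its whole vocabulary.

```python
# A toy "model": for each last word, a table of next-word probabilities (invented).
TOY_MODEL = {
    "the": {"slings": 0.6, "question": 0.4},
    "slings": {"and": 0.9, "of": 0.1},
    "and": {"arrows": 0.71, "pains": 0.12, "stings": 0.05},
    "arrows": {"<end>": 0.8, "of": 0.2},
}


def next_word(words):
    """Guess: pick the most likely next word (greedy)."""
    table = TOY_MODEL.get(words[-1], {"<end>": 1.0})
    return max(table, key=table.get)


def generate(prompt, max_words=10):
    words = prompt.split()
    for _ in range(max_words):
        word = next_word(words)  # guess
        if word == "<end>":
            break
        words.append(word)  # append, then repeat on the longer text
    return " ".join(words)


print(generate("to suffer the"))  # to suffer the slings and arrows
```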

The constraint · one “head” first

It reads the honest way: every word × every word

One reader like this is called a “head”. It scores how each word relates to every other word and stores the result as a web: 100 words = 10,000 cells; 10,000 words = 100 million. What you put in is context. The web’s limit is the context length, fixed at build time.

The constraint · that was one head

There are dozens of heads — not split work, but angles

Head 1 does not read the front while head 2 reads the back. Every head reads the whole text, each from its own angle: rhythm, names, tone. Dozens run at once. It is faster, but no less work: dozens of webs, all computed. (The picture uses 64.)

The constraint · deeper layer by layer

That stack of webs is just layer 1 — dozens more follow

Layer 1 reads the raw text. Each later layer reads what the previous layer produced and builds another web, one level deeper (the picture uses 80 layers). The bill multiplies: words × words × heads × layers. Double the window, quadruple the bill. That is why the window needs a cap.
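The bill can be checked with a few multiplications. A minimal sketch that counts only attention-score cells, using the picture's 64 heads and 80 layers as defaults; real models count tokens rather than words, vary in size, and do plenty of other work, but the squaring is the same.

```python
def attention_bill(n_words, heads=64, layers=80):
    """Score cells across the whole model: words x words x heads x layers.

    One head alone is n_words * n_words cells (100 words -> 10,000 cells).
    Defaults follow the picture; real models vary.
    """
    return n_words * n_words * heads * layers


small = attention_bill(10_000)
double = attention_bill(20_000)
print(f"{small:,}")     # 512,000,000,000
print(double // small)  # 4: double the window, quadruple the bill
```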

The constraint · the one hard limit

It can only read so much — and the middle goes muddy

The string has a hard cap: context length. The two classic complaints, “big files choke it” and “long chats make it forget”, both start here. The longer the string gets, the easier it is to miss the middle. The buried line is the one it reads past.
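When the string would exceed the cap, something has to go. One common harness-side strategy is to keep the System line and drop the oldest turns. A minimal sketch of that, assuming word counts as a stand-in for tokens and a made-up budget; `fit_to_budget` is hypothetical. It also shows one way long chats "forget": the early turns are simply no longer sent.

```python
def count_tokens(text):
    """Stand-in: real systems count model tokens, not words."""
    return len(text.split())


def fit_to_budget(messages, budget):
    """Keep the system message, then as many of the newest turns as fit."""
    system, turns = messages[0], messages[1:]
    used = count_tokens(system["content"])
    kept = []
    for msg in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break  # everything older than this is dropped
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My project is called Falcon."},
    {"role": "assistant", "content": "Noted."},
    {"role": "user", "content": "Summarize our plan in three bullet points please."},
]
for msg in fit_to_budget(chat, budget=14):
    print(msg["role"], ":", msg["content"])  # the Falcon line is gone
```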

Stateless · there is no “just now”

The chat is an illusion: every message is a first meeting

Between replies, nothing stays in its head. A continuous chat is the app resending the history each time. The model rereads from the top and continues. That is also why long chats cost more. It has no state. What you send is what it has.
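The resend can be sketched with a stand-in model. `fake_model` is hypothetical; like a real model call, it sees only the messages passed in. The app keeps the history and sends all of it every turn, so what goes out grows each round.

```python
def fake_model(messages):
    """Hypothetical stand-in: it knows only what is passed in, nothing else."""
    return f"(reply after reading {len(messages)} messages)"


history = [{"role": "system", "content": "You are a helpful assistant."}]
sent_sizes = []

for user_text in ["Hi, I'm Ann.", "What's my name?", "Thanks!"]:
    history.append({"role": "user", "content": user_text})
    sent_sizes.append(len(history))  # the whole history goes out each time
    reply = fake_model(history)      # the model rereads from the top
    history.append({"role": "assistant", "content": reply})

print(sent_sizes)  # [2, 4, 6]
```

If the app stopped resending, the name "Ann" would be gone on the next turn: the memory lives in `history`, not in the model.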

Feed it · start with a handoff list

First, list every piece of knowledge the job takes

Treat it like a brilliant new hire on a permanent first day. What would they need before taking over? Your goal, your current status and latest progress, your private materials, your constraints, plus public knowledge and common sense. Do not sort who supplies what yet. List it all.

Cross off · what it comes with

Two items on the list can be crossed right off

Public knowledge and common sense: if it is public, common, and before the cutoff, it saw it in training. Already in its head. Cross those off. Four things remain for you to supply: goal / status / materials / constraints. Those go into the window.
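The handoff list and the cross-off can be written down as data. A minimal sketch; the item names follow the list above, while `HANDOFF`, `build_prompt`, and the sample answers are hypothetical.

```python
# Who supplies each item on the handoff list.
HANDOFF = {
    "goal": "you",
    "status": "you",
    "materials": "you",
    "constraints": "you",
    "public knowledge": "model",  # seen in training, if public and before the cutoff
    "common sense": "model",
}


def build_prompt(answers):
    """Assemble only the items the model cannot know; refuse if one is missing."""
    needed = [item for item, who in HANDOFF.items() if who == "you"]
    missing = [item for item in needed if not answers.get(item)]
    if missing:
        raise ValueError(f"still missing: {', '.join(missing)}")
    return "\n".join(f"{item.capitalize()}: {answers[item]}" for item in needed)


answers = {
    "goal": "Draft a reply to the landlord about the broken heater.",
    "status": "Reported it twice by phone; no fix yet.",
    "materials": "Lease clause 7 (pasted below).",
    "constraints": "Polite, under 150 words.",
}
print(build_prompt(answers))
```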

Ask · information comes out in talk

Just talk — and for anything complex, have it ask you first

You do not need to fill the list at once. Information comes out in talk: it asks, you answer, the slots fill in. For complex work, start with “Before you start, ask me about anything unclear.” One round of questions saves rounds of rework.

Sand it · v1 is usually wrong

Toss “what went wrong” back as-is, round after round

The first version is rarely right. It was guessed into being. Paste the raw result or error back; it revises. Still wrong? Paste again. The chat grows, and the result gets sharper. Give symptoms, not diagnosis. Stop when it is good enough.
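The sanding loop is simple enough to sketch. `fake_model` and `check` are hypothetical stand-ins; the part that matters is that the raw error text goes back into the chat unedited (symptoms, not diagnosis), and the loop stops at good enough or after a round limit.

```python
def fake_model(messages):
    """Hypothetical model: fixes the bug once it has seen the error text."""
    saw_error = any("ZeroDivisionError" in m["content"] for m in messages)
    if saw_error:
        return "def avg(xs): return sum(xs) / len(xs) if xs else 0.0"
    return "def avg(xs): return sum(xs) / len(xs)"


def check(code):
    """Try the draft on a tricky input; return the raw error text, or None if fine.

    exec is fine for this toy; run real model-written code in a sandbox.
    """
    scope = {}
    try:
        exec(code, scope)
        scope["avg"]([])
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


messages = [{"role": "user", "content": "Write avg(xs) in Python."}]
for round_no in range(1, 6):  # stop after a few rounds no matter what
    draft = fake_model(messages)
    messages.append({"role": "assistant", "content": draft})
    error = check(draft)
    if error is None:
        break  # good enough: stop sanding
    messages.append({"role": "user", "content": error})  # pasted back as-is

print(round_no, draft)
```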

Save · don’t just close the tab

Have it distill the chat into a document — that is a Skill

If the job worked, add one last instruction: “Distill this chat into a reusable doc.” It extracts the context, final solution, and pitfalls, then writes a Skill. Experience accumulates as documents you can reuse.

Save · how it pays off

Next time’s handoff: the Skill fills the slots

Next time the same kind of job appears, drop in the Skill. The how-to, pitfalls, and house rules come with it. You only state this time’s goal and status. Five rounds last time; one round now. That is saved context compounding.
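Saving and reusing a Skill comes down to writing a file and reading it back into the context. A minimal sketch, assuming the Skill is a plain text file placed in the System line; the file name, helper names, and sample content are hypothetical, and real tools organize Skills in their own ways.

```python
import pathlib
import tempfile


def save_skill(folder, name, text):
    """Store the doc the model wrote when asked to distill the chat."""
    path = pathlib.Path(folder) / f"{name}.md"
    path.write_text(text, encoding="utf-8")
    return path


def start_chat_with_skill(path, goal, status):
    """Next time: the Skill fills the how-to slots; you add only goal and status."""
    skill = path.read_text(encoding="utf-8")
    return [
        {"role": "system", "content": "You are a helpful assistant.\n\n" + skill},
        {"role": "user", "content": f"Goal: {goal}\nStatus: {status}"},
    ]


with tempfile.TemporaryDirectory() as folder:
    doc = "# Landlord letters\nPitfall: always cite the lease clause."
    path = save_skill(folder, "landlord-letters", doc)
    chat = start_chat_with_skill(path, "Ask for a repair date.", "Heater still broken.")
    print(chat[0]["content"])
```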

First, see what it is

It’s just a brain — it only reads and writes text

Nothing else: no hands, no memory, can’t act on its own. You’ve seen it check calendars, remember you, run long tasks — none of that is the model; it’s programs around it doing the work. The next few pages add those programs, one at a time.

Fit the hands · tool

The model has no hands: it writes an instruction; a tool does the work

So how does it book that meeting? It can’t reach your calendar — all it can do is write one line on the notepad: check_calendar(Wed,…), then stop. A program outside sees it and actually checks the calendar — that program is a tool. The result comes back as text; it reads it and answers. It only ever talks; the doing is the tool’s.
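The outside program can be sketched in a few lines: the model's output is only text, and the harness parses it, runs a real function, and hands back text. `check_calendar` and its calendar data are hypothetical; the call format (tool name, a space, JSON arguments) follows the token-level line shown in this guide's tool example.

```python
import json

# Hypothetical calendar data: free slots per person per day.
FREE = {
    "Ann": {"Wed": ["10:00 am", "3:00 pm"]},
    "Bob": {"Wed": ["3:00 pm", "4:00 pm"]},
    "me": {"Wed": ["9:00 am", "3:00 pm"]},
}


def check_calendar(day, people):
    """The real work happens here, outside the model: intersect everyone's free slots."""
    slots = set(FREE[people[0]].get(day, []))
    for person in people[1:]:
        slots &= set(FREE[person].get(day, []))
    return sorted(f"{day} {slot}" for slot in slots)


TOOLS = {"check_calendar": check_calendar}


def run_tool_call(line):
    """Parse 'name {json args}' written by the model, run it, return text."""
    name, _, raw_args = line.partition(" ")
    result = TOOLS[name](**json.loads(raw_args))
    return f"free_slots: {', '.join(result) or 'none'}"


model_output = 'check_calendar {"day": "Wed", "people": ["Ann", "Bob", "me"]}'
print(run_tool_call(model_output))  # free_slots: Wed 3:00 pm
```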

Fit the memory · injection (RAG/Memory/Skill)

Memory is bolted on: it fetches, or someone writes

The model does not remember you by itself. Memory is added around it. Two routes: a tool fetches something, or the harness writes context onto the notepad — documents (RAG), preferences and past results (Memory), reusable how-tos (Skill). All arrive at fixed moments. Who writes them? Coming up.
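Injection can be sketched as the harness assembling the System line before each turn. Here retrieval is plain keyword overlap, a deliberately simple stand-in for the embedding search most RAG systems use; the documents, memory, Skill text, and helper names are hypothetical.

```python
import re

DOCS = {
    "lease.txt": "Clause 7: the landlord repairs heating within 14 days of notice.",
    "recipes.txt": "Tomato soup: simmer tomatoes, onion and garlic for 20 minutes.",
}
MEMORY = "User prefers short, polite replies."
SKILL = "Landlord letters: cite the clause, give a deadline, stay polite."


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, docs, k=1):
    """Rank documents by words shared with the question (toy RAG)."""
    ranked = sorted(docs, key=lambda name: len(words(question) & words(docs[name])), reverse=True)
    return [docs[name] for name in ranked[:k]]


def build_system(question):
    """The harness writes these onto the notepad at a fixed moment: before the turn."""
    parts = ["You are a helpful assistant."]
    parts += [f"[doc] {d}" for d in retrieve(question, DOCS)]
    parts.append(f"[memory] {MEMORY}")
    parts.append(f"[skill] {SKILL}")
    return "\n".join(parts)


print(build_system("My heating is broken; what does the lease say about repairs?"))
```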

The body moves · the loop (Agent)

The same model, fed a few more rounds = an Agent

With hands and memory attached, the system can run on its own: write → tool runs → result written back, round after round. None of it needs you; the notepad keeps growing. An Agent sounds mysterious. The mechanism is plain: the same model, fed a few more rounds.
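The loop itself is only a few lines. A minimal sketch: `scripted_model` is a hypothetical stand-in that asks for one tool and then answers (a real model decides from the text), and the round cap is a common safety choice made by the surrounding program, not by the model.

```python
import json


def check_calendar(day, people):
    """Hypothetical tool: pretend everyone is free at 3 pm."""
    return f"free_slots: {day} 3:00 pm"


TOOLS = {"check_calendar": check_calendar}


def scripted_model(messages):
    """Stand-in model: call the tool once, then answer using its result."""
    results = [m["content"] for m in messages if m["role"] == "tool"]
    if not results:
        return {"tool": "check_calendar", "args": {"day": "Wed", "people": ["Ann", "Bob", "me"]}}
    return {"text": f"Based on '{results[-1]}', shall I book it?"}


def run_agent(user_text, model, max_rounds=5):
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_rounds):
        out = model(messages)                                  # write
        if "text" in out:
            return out["text"], messages                       # final answer ends the loop
        messages.append({"role": "assistant", "content": json.dumps(out)})
        result = TOOLS[out["tool"]](**out["args"])             # execute
        messages.append({"role": "tool", "content": result})   # write back
    return "Stopped: too many rounds.", messages


answer, log = run_agent("Book a meeting with Ann and Bob this Wednesday.", scripted_model)
print(answer)  # Based on 'free_slots: Wed 3:00 pm', shall I book it?
```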

The full skeleton · harness

The shell that assembles this body is the harness

The harness is the program that writes the notepad, offers tools, and runs the loop: Claude Code, Cursor, your AI app. “Who does the writing?” This is who. It can also run several loops as a workflow. The point: only the model is special. The rest is ordinary code, and you can shape it.

Reflexes & red lines · hooks

Fixed points on the loop come with sockets — hooks

“Hook” is an old programmer word: when a fixed event fires, your attached action runs. If dinner starts, “wash hands first” runs. The loop has sockets at key points: after you speak, before it acts, after a tool runs, and before wrap-up. Put gates, injections, and follow-ups there. A prompt asks. A hook enforces.
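Sockets on the loop can be sketched as a map from event names to lists of functions. The event name `before_tool` and the helpers are hypothetical; real harnesses such as Claude Code define their own event names and configuration. Only the before-tool gate is shown: if a hook returns a reason, the call is blocked no matter what the prompt said.

```python
from collections import defaultdict

HOOKS = defaultdict(list)  # event name -> functions to run when it fires


def on(event):
    """Decorator: attach a function to a socket on the loop."""
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register


def fire(event, payload):
    """Run every hook on this socket; the first non-None return value wins."""
    for fn in HOOKS[event]:
        verdict = fn(payload)
        if verdict is not None:
            return verdict
    return None


@on("before_tool")
def block_deletes(call):
    # Gate: a red line enforced in code, not requested in a prompt.
    if call["tool"] == "delete_file":
        return "blocked: deleting files needs a human"
    return None


def run_tool(call):
    reason = fire("before_tool", call)
    if reason:
        return reason
    return f"ran {call['tool']}"


print(run_tool({"tool": "check_calendar"}))  # ran check_calendar
print(run_tool({"tool": "delete_file"}))     # blocked: deleting files needs a human
```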

the lens · this chat, stripped bare

To be, or not to be,

that is the question:

Whether ’tis nobler

to suffer the slings and arrows

IN · fed to the model

System: You are a helpful assistant. (set by the app; invisible in the chat UI)

User: To be, or not to be,

AI: that is the question: (its own last reply, fed straight back in; it remembers nothing, it rereads)

User: Whether ’tis nobler

OUT · what it wrote

AI: to suffer the slings and arrows

→ REQUEST (in)

{
  "messages": [
    { "role": "system",    "content": "You are a helpful assistant." },
    { "role": "user",      "content": "To be, or not to be," },
    { "role": "assistant", "content": "that is the question:" },
    { "role": "user",      "content": "Whether ’tis nobler" }
  ]
}

← RESPONSE (out)

{
  "content": [{ "type": "text", "text": "to suffer the slings and arrows" }]
}

↓ what the model is fed is just this one string (DeepSeek template shown)

<｜begin▁of▁sentence｜>You are a helpful assistant.<｜User｜>To be, or not to be,<｜Assistant｜>that is the question:<｜end▁of▁sentence｜><｜User｜>Whether ’tis nobler<｜Assistant｜>to suffer the slings and arrows<｜end▁of▁sentence｜>

■ grey = role separators (special tokens) · ■ orange = what the model wrote, word by word. The moment it reads <｜Assistant｜> it guesses onward — the very same string you saw on stage.

This is what you see every day: you give the setup, it continues. Split open, the input is more than what you typed: the app’s System line and even its own previous reply get fed back in; the output is only the newest line. All of it is text. “Calling an AI” means mailing these few pieces of text over; what returns is also text. There is no secret handshake. At the very bottom, even “chat” and “roles” are flattened away: just one long string, the one on stage. It reads <｜Assistant｜> and guesses on, word by word.
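“Mailing text over” can be shown without any network call. A minimal sketch: the request is serialized to a JSON string, and the reply is parsed from a JSON string shaped like the response in this example. The field names copy this guide's exhibits; each real provider's API has its own exact schema and endpoint.

```python
import json

request = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "To be, or not to be,"},
        {"role": "assistant", "content": "that is the question:"},
        {"role": "user", "content": "Whether ’tis nobler"},
    ]
}
outgoing = json.dumps(request, ensure_ascii=False)  # this string is what gets sent

# What comes back is also a string; the app parses it and shows the text.
incoming = '{"content": [{"type": "text", "text": "to suffer the slings and arrows"}]}'
reply = json.loads(incoming)
print(type(outgoing).__name__)      # str
print(reply["content"][0]["text"])  # to suffer the slings and arrows
```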

Book a meeting with Ann and Bob this Wednesday.

⚙ calling check_calendar(Wed, [Ann, Bob, me])

↩ calendar: Wed 3:00 pm

All three are free Wednesday afternoon — book 3 pm?

step ① · fed in

Tools: check_calendar(day, people): finds shared free slots. (The tool itself is just a paragraph of instructions, sent along with the request.)

System: You are a helpful assistant.

User: Book a meeting with Ann and Bob this Wednesday.

step ② · what it wrote (a tool call)

AI: check_calendar({"day":"Wed","people":["Ann","Bob","me"]}) (structured text, nothing more)

step ③ · tool runs, result fed back

⚙ check_calendar → free_slots: Wed 3:00 pm (the tool finishes; its result goes back into the chat as text)

step ④ · it reads, keeps writing

AI: All three are free Wednesday afternoon — shall I book it for 3 pm?

→ REQUEST (in · note the new tools field)

{
  "tools": [{ "name": "check_calendar", "desc": "finds shared free slots" }],
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Book a meeting with Ann and Bob this Wednesday." }
  ]
}

← RESPONSE (out · only the call was written)

{ "tool_calls": [{ "name": "check_calendar",
    "args": { "day": "Wed", "people": ["Ann","Bob","me"] } }] }

then: the program runs it → result returns as text → one more round → it answers

↓ the whole round at token level: still one string (DeepSeek template, simplified)

<｜begin▁of▁sentence｜>You are a helpful assistant.
[tools] check_calendar(day, people): finds shared free slots<｜User｜>Book a meeting with Ann and Bob this Wednesday.<｜Assistant｜><｜tool▁calls▁begin｜>check_calendar {"day":"Wed","people":["Ann","Bob","me"]}<｜tool▁calls▁end｜><｜end▁of▁sentence｜><｜tool▁outputs▁begin｜>free_slots: Wed 3:00 pm<｜tool▁outputs▁end｜><｜Assistant｜>All three are free Wednesday — book 3 pm?<｜end▁of▁sentence｜>

■ grey = special tokens · ■ orange = what it wrote · ■ green = what the tool returned. Tool definition, call, and result all sit on one string.

What you see: it “used a tool”. Peel down one layer and the four steps mirror ①②③④ on stage. Step ①: the tool is just instructions fed in. Step ②: the “call” is structured text in the reply. Step ③: the program runs it and puts the result back as text. Step ④: one more round, and it answers. Each step only adds words to the same conversation; “wiring up a tool” means one extra tools field in the request. Flattened all the way down, the tool definition, the tool call, and the tool result are all words on one string. The rule holds: it is all text.