Nested Code Blocks in Markdown Are Unsolvable

I pasted a markdown file into sanqian notes' editor and the code blocks broke. The file was a documentation template -- markdown about markdown, with typescript code examples nested inside a markdown code block. The inner code blocks spilled out of the outer one. Headings and body text that should have been inside the code block rendered as actual content.

What CommonMark says

CommonMark's fenced code block spec is clear: everything inside a fenced code block is flat text. There is no nesting.

Say you're writing documentation that includes a TypeScript code example inside a markdown code block. The outer block opens with three backticks and "markdown". Inside, you have a TypeScript example that also uses three backticks. Then a bare three-backtick line to close the TypeScript block, followed by more content, and finally a bare three-backtick line to close the outer block.

The parser doesn't see it that way. It sees:

1. Three backticks with "markdown" -- opens a fenced code block
2. Everything after that is content, including the three backticks with "typescript" (the line has an info string, and the spec does not allow info strings on closing fences)
3. The first bare three-backtick line -- closes the outer block, because it has no info string and is at least as long as the opening fence
4. Everything after that renders as regular markdown

The bare fence that was meant to close the inner TypeScript block closes the outer one instead. CommonMark has no concept of "inner" -- content inside a code block is flat text, period.
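The four steps above can be sketched as a small scanner. This is a simplified model rather than a full CommonMark implementation: it assumes backtick fences only and ignores tildes, block quotes, and list containers. The names `parseFlatFence` and `FlatBlock` are made up for illustration.

```typescript
// A fence line: up to 3 spaces, 3+ backticks, then an optional info string.
const FENCE = /^ {0,3}(`{3,})[ \t]*([^`]*?)[ \t]*$/;

interface FlatBlock {
  lang: string; // info string of the opening fence
  text: string; // content captured inside the block
  rest: string; // everything after the closing fence
}

function parseFlatFence(src: string): FlatBlock | null {
  const lines = src.split("\n");
  const open = FENCE.exec(lines[0] ?? "");
  if (!open) return null;

  const openLen = open[1].length;
  const body: string[] = [];
  for (let i = 1; i < lines.length; i++) {
    const m = FENCE.exec(lines[i]);
    // A closing fence has no info string and is at least as long as the opener.
    if (m && m[2] === "" && m[1].length >= openLen) {
      return { lang: open[2], text: body.join("\n"), rest: lines.slice(i + 1).join("\n") };
    }
    body.push(lines[i]); // anything else, including "inner" fences, is flat text
  }
  // An unclosed fence runs to the end of the input.
  return { lang: open[2], text: body.join("\n"), rest: "" };
}

// Usage: the nested template from the text, built without literal fence lines.
const tick3 = "`".repeat(3);
const nestedDoc = [
  tick3 + "markdown",
  "# Template",
  tick3 + "typescript",
  "const x = 1",
  tick3, // meant to close the inner block, but closes the outer one
  "More template text",
  tick3,
].join("\n");

const flatResult = parseFlatFence(nestedDoc);
// flatResult.text stops after "const x = 1"; the remaining lines land in flatResult.rest.
```

Changing the first and last lines of `nestedDoc` to four backticks makes the same function capture the whole template, because the three-backtick lines are then too short to count as closing fences.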

The workaround: use four backticks for the outer fence and three for the inner. A closing fence must have at least as many backticks as the opening fence, so the three-backtick inner fences are too short to close the four-backtick outer block and are treated as content. This works, but requires the author to know about it. When you're pasting someone else's markdown, you can't control the source.

What editors do

I checked. Obsidian follows CommonMark faithfully -- the community filed a feature request and the answer was "use more backticks." Typora, same. VS Code, same. GitHub's docs tell you to use quadruple backticks. Cursor has open bugs about it.

Nobody does anything special on the input side. The one place the industry handles this is markdown output -- ProseMirror's markdown serializer scans code block content for backtick sequences and generates a fence one backtick longer than the longest run:

const backticks = node.textContent.match(/`{3,}/gm)
const fence = backticks ? (backticks.sort().slice(-1)[0] + "`") : "```"

That works because at serialization time, the document structure is known. You know what's a code block and what isn't. Parsing is the opposite -- you're trying to discover the structure from ambiguous text.
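The same idea can be written as a self-contained function. This is a sketch of the technique, not ProseMirror's code: `fenceFor` and `serializeCodeBlock` are hypothetical names, and it compares run lengths numerically instead of sorting the matches.

```typescript
// Pick a fence that nothing inside `content` can close:
// one backtick longer than the longest run of 3+ backticks, minimum 3.
function fenceFor(content: string): string {
  const runs = content.match(/`{3,}/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  return "`".repeat(Math.max(3, longest + 1));
}

// Wrap known code-block content in a safe fence with an optional info string.
function serializeCodeBlock(content: string, lang = ""): string {
  const fence = fenceFor(content);
  return `${fence}${lang}\n${content}\n${fence}`;
}
```

Content with no backtick runs gets the usual three-backtick fence, and content that already contains three-backtick fences gets a four-backtick one, which is the manual workaround applied automatically.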

What we did

For a notes app, "use more backticks" isn't an answer. Users paste documentation, AI output, and tutorial content. They don't think about CommonMark fence rules.

We wrote a custom block extension for marked (v17) that replaces the built-in fence tokenizer with a nesting-aware one. The heuristic: a fence line with an info string (like "typescript" or "python") inside a code block is treated as an inner opening. A bare fence line is a close. Track depth with a stack.

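// Stack of open fence lengths; the outer fence is the first entry.
const fenceStack: number[] = [outerFenceCount]

while (pos < src.length && fenceStack.length > 0) {
  // ...scan line by line...

  // The loop guard keeps the stack non-empty, so at(-1) is always defined.
  if (!fenceInfo && backtickCount >= fenceStack.at(-1)!) {
    // Bare fence long enough to close the innermost open block.
    fenceStack.pop()
    if (fenceStack.length === 0) {
      // outer block closes here
      return { type: 'code', raw, text: lines.join('\n'), lang }
    }
    lines.push(line) // inner close, keep as content
  } else if (fenceInfo && backtickCount >= 3) {
    fenceStack.push(backtickCount) // inner opening
    lines.push(line)
  } else {
    lines.push(line) // ordinary content line
  }
}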

For non-nested code blocks, this produces identical results to CommonMark -- the stack starts with one entry (depth 1), the first bare fence pops it to depth 0, and the block closes. No behavior change.

For nested blocks, it tracks depth. The outer "markdown" fence opens at depth 1. An inner "typescript" fence pushes to depth 2. The first bare fence pops back to 1 (inner close). The final bare fence pops to 0 (outer close). The entire nested structure is captured as one code block.
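To make the depth tracking concrete, here is a standalone version of the heuristic that runs without marked. It is a sketch under assumptions: `scanNestedFence` and `NestedBlock` are hypothetical names, only backtick fences are handled, and returning `null` for an unclosed block is a choice of this sketch so that a caller could fall back to the standard tokenizer.

```typescript
// A fence line: up to 3 spaces, 3+ backticks, then an optional info string.
const FENCE_LINE = /^ {0,3}(`{3,})[ \t]*([^`]*?)[ \t]*$/;

interface NestedBlock {
  lang: string; // info string of the outer fence
  text: string; // content, including any inner fences
  raw: string;  // the full source consumed, outer fences included
}

function scanNestedFence(src: string): NestedBlock | null {
  const lines = src.split("\n");
  const open = FENCE_LINE.exec(lines[0] ?? "");
  if (!open) return null;

  const fenceStack: number[] = [open[1].length]; // depth 1: the outer fence
  const body: string[] = [];

  for (let i = 1; i < lines.length; i++) {
    const line = lines[i];
    const m = FENCE_LINE.exec(line);
    const innermost = fenceStack[fenceStack.length - 1];

    if (m && m[2] === "" && m[1].length >= innermost) {
      // Bare fence: closes the innermost open block.
      fenceStack.pop();
      if (fenceStack.length === 0) {
        return { lang: open[2], text: body.join("\n"), raw: lines.slice(0, i + 1).join("\n") };
      }
    } else if (m && m[2] !== "") {
      // Fence with an info string: treated as an inner opening.
      fenceStack.push(m[1].length);
    }
    body.push(line); // inner fences and ordinary lines are all content
  }
  return null; // ran out of input with blocks still open
}
```

Given a "markdown" block that contains a "typescript" block, the first bare fence only pops the inner entry, so `text` keeps both inner fences and the content that follows them. For a block with no inner fences, the first bare fence empties the stack and the result matches the flat reading.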

What this gets us

The heuristic covers documentation, tutorials, and AI output where inner code blocks have language tags -- which is most of the time. When inner fences have no info string, the tokenizer falls back to standard CommonMark behavior. That edge case is ambiguous by nature, and rare enough in practice.

The spec is designed for authoring, where the writer controls the markup. Paste is consumption, where you take what you get. Different contexts, different constraints. CommonMark addresses the first. For the second, a heuristic that handles the common case is the best we've found.