Hierarchical Markdown

Conceptually, a markdown file can be divided into a hierarchical structure by its headers:

# Level 1
|--- ## Level 2
|    |--- Paragraph
|    +--- Paragraph
+--- ## Level 2

I also noticed this structure in the ATXHeading implementation of lezer-parser/markdown, and Markdown can actually be folded based on this tree-like structure.
I wonder if there is a way to access this tree without a custom plugin. (I'm struggling to implement it with my own plugin (>_<).)
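The folding rule implied by this structure can be stated without any parser support: a section opened by a level-n heading runs until the next heading whose level is n or lower, or to the end of the document. A minimal sketch, assuming the headings have already been collected into a flat list of hypothetical `Heading` records (level plus start/end offsets), for instance from the heading nodes of the existing parse tree; `sectionFoldRange` is a hypothetical name, not a CodeMirror API:

```typescript
// Hypothetical record for one heading line; not a type from @lezer/markdown.
interface Heading {
  level: number; // 1 for "#", 2 for "##", ...
  from: number;  // offset where the heading line starts
  to: number;    // offset where the heading line ends
}

// Returns the range that folding the heading at `index` would hide: from the
// end of the heading line up to the start of the next heading of the same or
// a higher rank (level <= this one), or to the end of the document.
function sectionFoldRange(
  headings: readonly Heading[],
  index: number,
  docLength: number
): {from: number; to: number} {
  const head = headings[index];
  let end = docLength;
  for (let j = index + 1; j < headings.length; j++) {
    if (headings[j].level <= head.level) {
      end = headings[j].from;
      break;
    }
  }
  return {from: head.to, to: end};
}
```

For the document `"# A\ntext\n## B\nmore\n# C\n"`, folding `# A` hides everything up to `# C`, including the nested `## B` section, while folding `## B` stops at the same place because `# C` has a lower level number.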

import {MarkdownConfig, NodeSpec, BlockParser} from "@lezer/markdown";
// banyanTags (the per-level highlight tags) is defined elsewhere in my project.

/// Parse sections as composite blocks
const sectionParserPlugin: MarkdownConfig = {
    defineNodes: (() => {
        let nodes = new Array<NodeSpec>();
        for (let i = 1; i <= 10; ++i) {
            nodes.push({
                name: `SectionHead${i}`,
                block: true,
                style: banyanTags[`sectionHead${i}`]
            }, {
                name: `Section${i}`,
                style: banyanTags[`section${i}`],
                block: true,
                // Called for each following line while the section is open:
                // non-heading lines and deeper headings stay inside it.
                composite(_, line): boolean {
                    let level = line.text.trimStart().match(/^#*/)?.[0].length || 0;
                    return (level == 0) || (level > i);
                }
            });
        }
        return nodes;
    })(),
    parseBlock: (() => {
        let blockParsers = new Array<BlockParser>();
        for (let i = 1; i <= 10; ++i) {
            blockParsers.push({
                name: `Section${i}`,
                parse(cx, line): boolean | null {
                    let text = line.text.trimStart();
                    let indent = line.text.length - text.length;
                    let level = text.match(/^#*/)?.[0].length || 0;
                    // Not a heading of this level: leave the line to other parsers
                    if (level != i) { return false; }
                    // Parse the inline content of the heading (after the '#' marks)
                    let start = cx.lineStart;
                    let contentFrom = indent + level;
                    let lineElements = cx.parser.parseInline(line.text.slice(contentFrom), start + contentFrom);
                    // Open the Section composite block at the heading line
                    cx.startComposite(`Section${i}`, cx.lineStart);
                    // Move past the heading line and add it as the section's first child
                    cx.nextLine();
                    cx.addElement(cx.elt(`SectionHead${i}`, start, cx.lineStart, lineElements));
                    return null;
                }
            });
        }
        return blockParsers;
    })(),
};
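One detail worth checking in the code above: `match(/^#*/)` counts any leading `#` characters, so a paragraph starting with `#tag`, a line of seven `#`, or a heading indented by four or more spaces (which CommonMark treats as an indented code block) would all be taken as headings. A hedged sketch of a stricter check that follows the CommonMark rule for ATX headings (at most three spaces of indentation, one to six `#`, then a space, a tab, or the end of the line); `atxHeadingLevel` is a hypothetical helper, and it only looks at the raw line text, so it does not handle headings inside blockquotes or lists:

```typescript
// Hypothetical helper: returns the ATX heading level (1-6) of a raw line,
// or 0 when the line is not an ATX heading according to CommonMark.
// The lookahead requires a space, a tab, or the end of the line after the marks.
const ATX_HEADING = /^ {0,3}(#{1,6})(?=[ \t]|$)/;

function atxHeadingLevel(text: string): number {
  const match = ATX_HEADING.exec(text);
  return match ? match[1].length : 0;
}
```

For example, `atxHeadingLevel("## Level 2")` gives 2, while `"#tag"`, `"####### seven"` and `"    # code"` all give 0. It could stand in for both `/^#*/` matches, and the `composite` test `(level == 0) || (level > i)` would keep working unchanged.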

No, none of the code in @lezer/markdown or @codemirror/lang-markdown uses this kind of structure. As you found, the parse tree just follows the CommonMark standard, and represents a document as a sequence of blocks. Quotes and lists can produce block-level structure, but there’s no such thing as sections—headers are just blocks among blocks.
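Because the tree is a flat sequence of top-level blocks, the section hierarchy can also be rebuilt after parsing, without a custom block parser. A minimal sketch, assuming the top-level children of the `Document` node have been collected into a list of hypothetical `FlatBlock` records (for example by calling `firstChild()` and then `nextSibling()` on `syntaxTree(state).cursor()`); it relies only on the heading node names used by @lezer/markdown (`ATXHeading1` to `ATXHeading6`, `SetextHeading1`, `SetextHeading2`):

```typescript
// Hypothetical flat record for one top-level block of the parse tree.
interface FlatBlock {
  name: string; // node name, e.g. "ATXHeading2" or "Paragraph"
  from: number;
  to: number;
}

// One node of the rebuilt section tree; the root has level 0 and no heading.
interface Section {
  level: number;
  heading: FlatBlock | null;
  children: (Section | FlatBlock)[];
}

// Heading level encoded in the node name, or 0 for non-heading blocks.
function headingLevel(name: string): number {
  const match = /^(?:ATX|Setext)Heading(\d)$/.exec(name);
  return match ? Number(match[1]) : 0;
}

function buildSectionTree(blocks: readonly FlatBlock[]): Section {
  const root: Section = {level: 0, heading: null, children: []};
  // Stack of open sections; the innermost one is last.
  const stack: Section[] = [root];
  for (const block of blocks) {
    const level = headingLevel(block.name);
    if (level === 0) {
      // Ordinary block: it belongs to the innermost open section.
      stack[stack.length - 1].children.push(block);
      continue;
    }
    // A heading closes every open section of the same or a deeper level.
    while (stack[stack.length - 1].level >= level) stack.pop();
    const section: Section = {level, heading: block, children: []};
    stack[stack.length - 1].children.push(section);
    stack.push(section);
  }
  return root;
}
```

This is the same stack discipline a composite-block parser would apply line by line, just done once over the finished tree; the trade-off is that it has to be recomputed (or patched) after edits instead of being kept up to date by the incremental parser.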

My follow-up questions:

  • Is it possible to implement this section tree structure using composite blocks?
  • Are my custom block parsers incremental? How is that achieved?

I’m not sure how well this approach will work, but yes, doing it like that should be incremental.
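For reference, incremental parsing in Lezer works by handing the parser the previous tree, cut into fragments that exclude the edited ranges, so that unchanged regions can be reused instead of re-parsed. CodeMirror's language support does this bookkeeping automatically; a standalone sketch of the same mechanism with `TreeFragment` from `@lezer/common` (the document text and the edit are made up for illustration) looks roughly like this:

```typescript
import {parser} from "@lezer/markdown";
import {TreeFragment} from "@lezer/common";

// Custom block parsers would be added here with parser.configure(...).
const mdParser = parser;

const oldText = "# A\n\nSome text.\n\n## B\n\nMore text.\n";
const oldTree = mdParser.parse(oldText);

// Simulate inserting "new " at offset 5 (the start of "Some text.").
const insertAt = 5;
const inserted = "new ";
const newText = oldText.slice(0, insertAt) + inserted + oldText.slice(insertAt);

// Wrap the old tree into fragments, then cut out the parts touched by the
// change. fromA/toA are positions in the old text, fromB/toB in the new text.
let fragments = TreeFragment.addTree(oldTree);
fragments = TreeFragment.applyChanges(fragments, [
  {fromA: insertAt, toA: insertAt, fromB: insertAt, toB: insertAt + inserted.length},
]);

// The parser reuses whatever fragments are still valid for the new text.
const newTree = mdParser.parse(newText, fragments);
```

Whether a given custom block is actually reused depends on where the markdown parser finds safe reuse points; the answer above suggests composite blocks defined this way take part in that.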

I think I also need some help with composite blocks.
How do I use them correctly? Can I start one in a BlockParser.parse function that returns true/false? And what is the value parameter of the startComposite function?

Your code seems to mostly work, but it doesn’t get run because it’s added after the default ATX heading parser logic. Putting a before: "ATXHeading" property on the block parser objects should help with that.
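Putting that suggestion together with the `value` question: as far as I can tell from the @lezer/markdown reference, the third argument of `startComposite(type, start, value)` is stored with the open block and passed back as the `value` argument of the node's `composite(cx, line, value)` callback on each following line (the built-in list items appear to use it for their indentation), and `parse` is documented to return `null` when it has started a composite block. The sketch below assumes a single hypothetical `Section`/`SectionHead` node pair whose level travels in `value` instead of ten `Section${i}` types, keeps the original control flow, and adds `before: "ATXHeading"`. It also assumes that `start` is an offset within the current line, which is how I read the library's own Blockquote parser; treat that as unverified.

```typescript
import {parser, MarkdownConfig, NodeSpec, BlockParser} from "@lezer/markdown";

// Hypothetical helper: CommonMark ATX heading level (1-6) of a raw line, or 0.
function atxLevel(text: string): number {
  const match = /^ {0,3}(#{1,6})(?=[ \t]|$)/.exec(text);
  return match ? match[1].length : 0;
}

// Hypothetical composite node; `value` is the third argument that was given
// to startComposite (the heading level here). Returning true keeps the line
// inside the section; a heading of the same or a lower level closes it.
const sectionNode: NodeSpec = {
  name: "Section",
  block: true,
  composite(_cx, line, value) {
    const level = atxLevel(line.text);
    return level == 0 || level > value;
  },
};

const sectionBlockParser: BlockParser = {
  name: "Section",
  // Run before the built-in ATXHeading parser so it does not claim the line.
  before: "ATXHeading",
  parse(cx, line) {
    const level = atxLevel(line.text);
    if (level == 0) return false;
    const markPos = line.text.indexOf("#");       // offset of the first '#'
    const from = cx.lineStart + markPos;          // absolute heading start
    const to = cx.lineStart + line.text.length;   // absolute heading end
    const contentFrom = markPos + level;          // text after the '#' marks
    const content = cx.parser.parseInline(
      line.text.slice(contentFrom), cx.lineStart + contentFrom);
    // Assumption: `start` is relative to the current line.
    cx.startComposite("Section", markPos, level);
    cx.nextLine();
    cx.addElement(cx.elt("SectionHead", from, to, content));
    return null; // same result as the original code
  },
};

const sectionPlugin: MarkdownConfig = {
  defineNodes: [{name: "SectionHead", block: true}, sectionNode],
  parseBlock: [sectionBlockParser],
};

export const sectionParser = parser.configure(sectionPlugin);
```

With a single node type, the level is no longer visible in the node name, so highlighting per level would have to look at the `SectionHead` content instead; keeping one type per level, as in the original, avoids that at the cost of more node types. Headings inside fenced code blocks would probably still end a section, since the `composite` check only sees raw line text.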