How to get access to tokens that were consumed when a specific grammar production matched?

shche · April 30, 2026, 10:15am

I am working on a parser for the Wolfram Language. It has an Out operator (see "Out" in the Wolfram documentation), which can take different forms, including %%…% (% repeated an arbitrary number of times). I can write %, or %1, %%%, etc.

I have:

@tokens {
  backref { "%" @digit+ }
  ref { "%"+ }
  @precedence { backref, ref }
}
...
Out { backref | ref }

After I get a syntax tree, I traverse it for my purposes:

    tree.iterate({
        enter: (node: SyntaxNodeRef) => ...
    });

Inside enter I receive the current syntax node. How do I get to the relevant tokens? In my case I need to extract the number from the backref token or count the length of the ref token.

marijn · April 30, 2026, 10:30am

Because tree nodes only store their extent and type, getting information (such as identifier name, literal value, or in this case, reference number) involves fetching that range from the document text and looking at its content.
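As a rough sketch of what that could look like for the grammar above: the helper below takes anything with the `name`, `from`, and `to` fields of a `SyntaxNodeRef`, slices that range out of the source string, and inspects the text. The names `NodeRange`, `OutRef`, and `readOut` are made up for this example, and it assumes the lowercase `backref` and `ref` tokens do not show up as their own nodes, so the text of the `Out` node is checked directly.

```typescript
// Structural subset of a Lezer SyntaxNodeRef: only the fields used here.
interface NodeRange {
  name: string;
  from: number;
  to: number;
}

// Hypothetical result type for this sketch.
type OutRef =
  | { kind: "absolute"; line: number }   // %N
  | { kind: "relative"; depth: number }; // %, %%, %%%, ...

// Reads the text covered by an Out node and classifies it.
// `source` is assumed to be the same string that was given to the parser.
function readOut(node: NodeRange, source: string): OutRef | null {
  if (node.name !== "Out") return null;
  const text = source.slice(node.from, node.to);

  // "%" followed by digits: extract the number.
  const numbered = /^%(\d+)$/.exec(text);
  if (numbered) return { kind: "absolute", line: Number(numbered[1]) };

  // One or more "%": the count is simply the length of the text.
  if (/^%+$/.test(text)) return { kind: "relative", depth: text.length };

  return null;
}
```

Inside `enter` this would be called as `readOut(node, source)`. When the tree comes from a CodeMirror editor rather than a plain string, `state.sliceDoc(node.from, node.to)` should give the same text.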

JerryI · May 4, 2026, 9:14am

Hi @shche,

I am glad to see that someone is trying to write a grammar for WL! Ages ago I tried with Lezer, but my knowledge wasn’t enough to pull it off. So I ended up using just a tokenizer (the legacy streaming parser plugin).

Good luck!