How to get access to tokens that were consumed when a specific grammar production matched?

I am working on a parser for the Wolfram Language. It has an Out operator (see Out in the Wolfram documentation), which can take different forms, including %%…% (% repeated an arbitrary number of times). I can write %, or %1, %%%, etc.

I have:

    @tokens {
      backref { "%" @digit+ }
      ref { "%"+ }
      @precedence { backref, ref }
    }
    ...
    Out { backref | ref }

After I get a syntax tree, I traverse it for my purposes:

    tree.iterate({
        enter: (node: SyntaxNodeRef) => { ... }
    });

Inside enter I receive the current syntax node. How do I get to the relevant tokens? In my case I need to extract the number from the backref token or count the length of the ref token.

Because tree nodes only store their extent and type, getting information (such as identifier name, literal value, or in this case, reference number) involves fetching that range from the document text and looking at its content.
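
A rough sketch of what that could look like for the Out case, in case it helps. The names NodeRange, OutRef and readOut are made up for illustration and are not part of Lezer; NodeRange just lists the fields of SyntaxNodeRef that the sketch relies on (name, from, to). It assumes doc is the same string that was handed to the parser. Since lowercase rules such as backref and ref do not get nodes of their own in the tree, the range is taken from the Out node, which spans exactly the matched token.

```typescript
// Hypothetical stand-in for the parts of Lezer's SyntaxNodeRef used below.
interface NodeRange { name: string; from: number; to: number }

// Hypothetical result type: %n refers to an absolute output line,
// a run of % characters refers back by that many outputs.
type OutRef =
  | { kind: "absolute"; line: number }
  | { kind: "relative"; back: number };

// Slice the node's range out of the document text and inspect the content.
function readOut(doc: string, node: NodeRange): OutRef | null {
  const text = doc.slice(node.from, node.to);
  const numbered = /^%(\d+)$/.exec(text);
  if (numbered) return { kind: "absolute", line: Number(numbered[1]) };
  if (/^%+$/.test(text)) return { kind: "relative", back: text.length };
  return null; // not an Out form this sketch knows about
}
```

Inside enter this would be called as readOut(doc, node) whenever node.name is "Out".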

Hi @shche,

I am glad to see that someone is trying to write a grammar for WL! Ages ago I tried with Lezer, but my knowledge wasn’t enough to pull it off, so I ended up using just a tokenizer (the legacy streaming parser plugin).

Good luck!