Can child/sub tokens appear in the parse tree?

AlexErrant · March 24, 2024, 6:40pm

I feel like the answer is “no” but I have to ask just to make sure.

This grammar and input yield the following parse tree:

@top Program { Either* }

@tokens {
  Letter { @asciiLetter+ }
  Number { @digit+ }
  Either { Letter | Number }
}
---
a1b2
---
Program (a1b2)
	Either (a)
	Either (1)
	Either (b)
	Either (2)

Is there a way to have this be the parse tree without moving Either out of @tokens?

Program (a1b2)
	Either (a)
		Letter (a)
	Either (1)
		Number (1)
	Either (b)
		Letter (b)
	Either (2)
		Number (2)

In my non-toy language, moving that token out of @tokens causes additional complexity I’d love to avoid (@skip, ExternalTokenizer, etc.).

marijn · March 24, 2024, 7:15pm

No. Tokens are the atomic elements in the parse tree. No structure is recorded inside them.

AlexErrant · June 28, 2024, 2:49pm

If you’re desperate/creative, you can use the parseMixed feature. First, create your base grammar, which contains Either as a token. Then define a second grammar that can parse the contents of Either; in this case, Number and Letter. Finally, do something like:

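import { parseMixed } from "@lezer/common"

// baseParser and eitherParser are the parsers generated from the two grammars.
// Either is assumed to be the term id exported by the base grammar's terms file.
const mixedParser = baseParser.configure({
	wrap: parseMixed((node) =>
		node.type.is(Either) ? { parser: eitherParser } : null,
	),
})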

There is extra complexity when dealing with a mixed language, and it isn’t as powerful as an integrated grammar, but hey, you’re desperate.
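
If all you need is to know whether each Either is a Letter or a Number, a lighter alternative to a mixed parse may be to re-inspect the token text after parsing, since the token itself records no inner structure. This is plain post-processing, not a Lezer feature. The sketch below assumes you have already collected the token spans from the tree as `{ name, from, to }` objects; `classifyEither`, `nestEither`, and that span shape are hypothetical names made up for illustration.

```javascript
// Hypothetical helper (not part of Lezer): decide which alternative of
// Either matched by looking at the token's text.
function classifyEither(text) {
  if (/^[A-Za-z]+$/.test(text)) return "Letter"; // mirrors @asciiLetter+
  if (/^[0-9]+$/.test(text)) return "Number";    // mirrors @digit+
  return null; // text matches neither alternative
}

// Hypothetical helper: give each Either span a single child describing
// the alternative that matched. `tokens` is assumed to be an array of
// { name, from, to } spans gathered from the parse tree.
function nestEither(source, tokens) {
  return tokens.map(({ name, from, to }) => {
    const kind = name === "Either" ? classifyEither(source.slice(from, to)) : null;
    return {
      name,
      from,
      to,
      children: kind ? [{ name: kind, from, to }] : [],
    };
  });
}

// Usage with the toy input from the question.
const source = "a1b2";
const spans = [
  { name: "Either", from: 0, to: 1 },
  { name: "Either", from: 1, to: 2 },
  { name: "Either", from: 2, to: 3 },
  { name: "Either", from: 3, to: 4 },
];
const nested = nestEither(source, spans);
```

This only works when the alternatives can be told apart from the text alone, and the regular expressions have to be kept in sync with the grammar by hand.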