Lezer Indentation Example - Question

richardbann · November 16, 2023, 10:28am

Hi,

I am at the beginning of my Lezer journey and am having trouble understanding one piece of the indentation example grammar on the site.

The example uses an external tokenizer that generates the indent, dedent and blankLineStart tokens.

It is not clear to me why the tokenizer uses stack.canShift(blankLineStart). Checking the grammar, it seems to me that after a blankLineStart token, none of the indent, dedent, or blankLineStart tokens is valid. If I understand the following sentence from the System Guide correctly, the external tokenizer is not called in that case:

[external tokenizer] will only be called when the current state allows one of the tokens it defines to be matched.

Could you please point out where my misunderstanding is?

Thanks

marijn · November 16, 2023, 11:32am

I think that in this grammar, that check is there so that, once the parser enters the blankLineStart (spaces | Comment)* lineEnd sequence, it doesn’t continue matching blankLineStart tokens. It would probably be safe to remove, since indeed, inside that sequence none of the tokens produced by this external tokenizer match, so it will not be called. But infinite loops from zero-length tokens are really easy to produce, and the reasoning that makes the check superfluous in this case is kind of involved, and brittle if the grammar were to change. So I feel it’s a good idea to include it regardless, to nudge people into a habit that avoids this issue.
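
To make the zero-length-token concern concrete, here is a toy sketch of the failure mode. It does not use the real @lezer/lr API: ToyStack, tokenizeBlankLine, and tokensAtOnePosition are hypothetical names, and the driver loop is a deliberately naive stand-in for a parser that keeps calling the tokenizer at the same position (as might happen if the grammar changed so that the tokenizer became callable inside the blank-line sequence). The only thing it is meant to show is that a token consuming no input can only be stopped by a check against parser state.

```typescript
// Toy model only: ToyStack and the functions below are hypothetical names,
// not part of @lezer/lr.
interface ToyStack {
  canShift(term: string): boolean;
}

const blankLineStart = "blankLineStart";

// Returns the zero-length token to emit at a blank line, or null for none.
function tokenizeBlankLine(stack: ToyStack, guarded: boolean): string | null {
  if (guarded && !stack.canShift(blankLineStart)) return null;
  return blankLineStart;
}

// Naive driver: keeps asking the tokenizer at the same input position and
// accepts every token it gets. Assumes the tokenizer is always callable,
// which is an assumption of this sketch, not how Lezer schedules tokenizers.
function tokensAtOnePosition(guarded: boolean, cap: number = 1000): number {
  let insideBlankLine = false;
  const stack: ToyStack = {
    // blankLineStart is only valid before the blank-line sequence is entered.
    canShift: (term) => term === blankLineStart && !insideBlankLine,
  };
  let emitted = 0;
  while (emitted < cap) {
    const token = tokenizeBlankLine(stack, guarded);
    if (token === null) break; // no zero-length token: the parser can move on
    insideBlankLine = true;    // the token consumed no input, only state changed
    emitted++;
  }
  return emitted;
}

console.log(tokensAtOnePosition(true));  // 1
console.log(tokensAtOnePosition(false)); // 1000 (stopped only by the cap)
```

With the guard, the tokenizer emits the token once and then declines; without it, nothing but the artificial cap ends the loop.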

richardbann · November 16, 2023, 11:44am

Sure, thanks for the fast reply. All I needed was to confirm my understanding is correct. (Building a mental model of how the parser works is not easy…)