I am at the beginning of my Lezer journey and am having some trouble understanding one piece of the indentation example grammar on the site.
The example uses an external tokenizer that generates the grammar's indentation-related tokens, `blankLineStart` among them. It is not clear to me why the tokenizer uses `stack.canShift(blankLineStart)`. Looking at the grammar, it seems to me that right after a `blankLineStart` token, none of the tokens this tokenizer defines (`blankLineStart` included) is valid. If I understand the following sentence from the System Guide correctly, the external tokenizer is not called in that case:
[external tokenizer] will only be called when the current state allows one of the tokens it defines to be matched.
Could you please point out where my misunderstanding is?
I think that in this grammar, the check is there so that, once the parser has entered the `blankLineStart (spaces | Comment)* lineEnd` rule, it doesn’t continue matching `blankLineStart` tokens. It would probably be safe to remove: inside that rule, none of the tokens produced by this external tokenizer match, so it will not be called. But infinite loops from zero-length tokens are really easy to produce, and the reasoning that makes the check superfluous in this case is fairly involved and would become brittle if the grammar were to change. So I feel it’s a good idea to include it regardless, to nudge people into a habit that avoids this issue.
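To make the habit concrete, here is a minimal sketch of a tokenizer callback with that guard in place. It is written from the description in this thread, not copied from the example on the site. The numeric term ids and the name `tokenizeIndentation` are hypothetical, and `stack.context.depth` assumes a context tracker that stores the current indentation depth.

```javascript
// Hypothetical term ids. In a real project they would be imported from the
// parser.terms.js file that the Lezer generator emits.
const indent = 1, dedent = 2, blankLineStart = 3

// Character codes the tokenizer inspects.
const newline = 10, space = 32, tab = 9, hash = 35

// Callback with the (input, stack) shape used by @lezer/lr external tokenizers.
// Assumption: stack.context.depth holds the indentation of the enclosing block.
function tokenizeIndentation(input, stack) {
  const prev = input.peek(-1)
  // Only act at the start of the document or directly after a newline.
  if (prev !== -1 && prev !== newline) return

  // Count the leading whitespace on this line.
  let spaces = 0
  while (input.next === space || input.next === tab) {
    input.advance()
    spaces++
  }

  const blankOrComment = input.next === newline || input.next === hash
  if (blankOrComment && stack.canShift(blankLineStart)) {
    // The negative end offset moves the token end back to the line start, so
    // the token is zero-length. The canShift guard keeps the tokenizer from
    // emitting it in a state that cannot consume it.
    input.acceptToken(blankLineStart, -spaces)
  } else if (spaces > stack.context.depth) {
    input.acceptToken(indent)
  } else if (spaces < stack.context.depth) {
    input.acceptToken(dedent, -spaces)
  }
}
```

With `@lezer/lr`, a function like this would be passed to `new ExternalTokenizer(...)`, and the constants would be replaced by the generated term ids. The guard costs one call per blank line, and it keeps the tokenizer safe even if a later grammar change makes it reachable inside the blank-line rule.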
Sure, thanks for the fast reply. All I needed was confirmation that my understanding is correct. (Building a mental model of how the parser works is not easy…)