Parsing error when a production uses the repetition operator

shche · April 23, 2026, 9:31am

I have a grammar that skips `\n` tokens:

@skip { newline }

But I need to recognize expressions separated by newlines:

implicitCompoundExpression[@name="CompoundExpression"] {
    expr eLineBreakCharSeparator implicitCompoundExpression
    | expr eLineBreakCharSeparator expr
}

Where eLineBreakCharSeparator is a token inserted by an ExternalTokenizer:

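export const insertImplicitNewline = new ExternalTokenizer(
    (input, stack) => {
        // Only emit the separator if the parser can shift it in the current state.
        if (stack.canShift(terms.eLineBreakCharSeparator)) {
            // No input is consumed here, so the accepted token is zero-length.
            input.acceptToken(terms.eLineBreakCharSeparator);
        }
    },
    {
        fallback: true
    }
);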

The production above works as expected.

But when I change it to

implicitCompoundExpression[@name="CompoundExpression"] {
    expr (eLineBreakCharSeparator expr)+
}

Lezer returns an error node after the second expression:

1+2
10+20
100+200

The error is reported at the beginning of token 100. I do not understand why. Please help.

marijn · April 23, 2026, 10:20am

How are you defining newline and eLineBreakCharSeparator? Those sound like they would conflict.

shche · April 23, 2026, 10:41am

I have an external tokenizer for that too. newline and eLineBreakCharSeparator are numbers:

export const [
    newline,
    ...
    eLineBreakCharSeparator,
    ...
] = _.range(200);

My tokenizer maps characters to numbers like the ones above and simply returns those numbers.

marijn · April 23, 2026, 10:54am

Sure, but how does it determine whether to return newline or eLineBreakCharSeparator when it sees a newline? I’m trying to understand how you are setting things up so that newlines are skipped but at the same time used as a significant part of the grammar.

shche · April 23, 2026, 11:06am

When the tokenizer detects a newline, it returns the number for newline. The other token, eLineBreakCharSeparator, is never returned from the tokenizer; it is only declared in the grammar. This is legacy code, and I have no control over it.

My understanding is that since I need newlines in only a few rules of the grammar, I skip newlines and insert the other token, eLineBreakCharSeparator, in place of a newline where it is needed.

marijn · April 23, 2026, 11:19am

Then I would expect the rules that use it to never match.

You can selectively turn off skipping for just some rules with a `@skip {} { ... }` block, and put in comments and whitespace explicitly in those rules. But you have to be careful about the boundaries of the rules you use from such a block, because mixing rules with different skip behavior only works when they are unambiguously delimited.

shche · April 23, 2026, 11:31am

Um… I have an external tokenizer that inserts that other token…

But I am trying to understand why my rule works when I write it using explicit recursion and does not work (generates an error) when I use the + operator.

marijn · April 23, 2026, 11:48am

Oh I see, the fallback tokenizer in the original message. The two formulations do look like they should be equivalent. I don’t think I can really say much more about this without a (simplified) reproduction.
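
As a sanity check on the claim that the two formulations accept the same input, here is a minimal sketch in plain JavaScript. It does not use Lezer at all: the token names "expr" and "sep" and all three functions are hypothetical stand-ins, with "expr" representing an already-parsed expression and "sep" representing eLineBreakCharSeparator. It only compares the two rule shapes as recognizers over short token sequences. It does not model the generated parse states, the skip rule, or the fallback tokenizer, so it cannot explain the reported error by itself.

```javascript
// Illustrative token kinds (assumptions, not part of the original grammar):
// "expr" = one whole expression, "sep" = eLineBreakCharSeparator.

// Recursive form:  C -> expr sep C | expr sep expr
function matchesRecursive(tokens, pos = 0) {
  if (tokens[pos] !== "expr" || tokens[pos + 1] !== "sep") return false;
  // Second alternative: exactly one more expr, then end of input.
  if (tokens[pos + 2] === "expr" && pos + 3 === tokens.length) return true;
  // First alternative: recurse on the remainder.
  return matchesRecursive(tokens, pos + 2);
}

// Repetition form:  C -> expr (sep expr)+
function matchesRepetition(tokens) {
  if (tokens[0] !== "expr") return false;
  let pos = 1;
  let pairs = 0;
  while (tokens[pos] === "sep" && tokens[pos + 1] === "expr") {
    pos += 2;
    pairs++;
  }
  // At least one (sep expr) pair, and nothing left over.
  return pairs >= 1 && pos === tokens.length;
}

// Build every sequence of "expr"/"sep" tokens up to the given length.
function allSequences(maxLen) {
  let result = [[]];
  let frontier = [[]];
  for (let len = 1; len <= maxLen; len++) {
    frontier = frontier.flatMap(seq => [[...seq, "expr"], [...seq, "sep"]]);
    result = result.concat(frontier);
  }
  return result;
}

// Collect sequences on which the two recognizers disagree.
const disagreements = allSequences(7).filter(
  seq => matchesRecursive(seq) !== matchesRepetition(seq)
);
console.log(disagreements.length);
```

If the two shapes describe the same language, as they appear to, the list of disagreements stays empty. That would suggest the difference seen in Lezer comes from somewhere this sketch leaves out, such as how the tokenizer interacts with the parse states produced for each formulation, which is why a reduced reproduction is the useful next step.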