Why does this input generate no error?

shche · September 27, 2023, 1:11am

Hello, I am using the sample grammar from the Guide (Lezer System Guide):

@top Program { expression }

expression { Name | Number | BinaryExpression }

BinaryExpression { "(" expression ("+" | "-") expression ")" }

@tokens {
  Name { @asciiLetter+ }
  Number { @digit+ }
}

When I parse the text below, I receive no error nodes in the tree:

a(b-c)

I expected at least one error node for the opening bracket. But I get this:

NODE: Program [0-6]
NODE: Name [0-1]
NODE: BinaryExpression [1-6]
NODE: Name [2-3]
NODE: Name [4-5]

Here is my code:

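import {parser} from './lang';

const input = 'a(b-c)';
const tree = parser.parse(input);

console.log(input, '\n');

// Walk every node of the tree in pre-order, starting at the top node.
const cursor = tree.cursor();

do {
    const {type} = cursor;
    // Label error nodes so they stand out from regular nodes in the output.
    console.log(`${type.isError ? 'ERROR' : 'NODE'}: ${cursor.name} [${cursor.from}-${cursor.to}]`);
} while (cursor.next());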

Am I missing something?

marijn · September 27, 2023, 8:46am

When the parser reaches an end state and there’s still more input, it restarts at the starting parse state. I guess it would be reasonable to insert an error node where it does that. What were you trying to do that requires the error node? Or did you just notice that this looked surprising?
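
As a rough illustration of how such a restart shows up in the tree: with the grammar above, Program should wrap a single expression, so a second direct child under the top node hints that the parser started over. The sketch below counts those children. `countTopLevelChildren` is a hypothetical helper name, not part of Lezer, and it assumes a tree object that exposes `topNode`, `firstChild`, and `nextSibling` the way Lezer trees do.

```javascript
// Hypothetical helper (not part of Lezer): count the direct children of the
// top node. For the grammar in this thread, more than one child suggests
// that the parser restarted partway through the input.
function countTopLevelChildren(tree) {
  let count = 0;
  // Step through the top node's children via the sibling links.
  for (let child = tree.topNode.firstChild; child; child = child.nextSibling) {
    count++;
  }
  return count;
}
```

For the tree printed in the question, this should report two children (Name and BinaryExpression) rather than one. The check is specific to grammars whose top rule allows exactly one child.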

shche · September 27, 2023, 10:55am

Thanks. This just looked surprising in general. I would, at least, expect more than one parse tree in this case, or, yes, an error node. Otherwise, how can I tell that the input is correct? Or maybe there is another API that I can use which does not restart the parser?

marijn · September 27, 2023, 3:05pm

Attached patch makes the parser emit an error node.

In Lezer, there’s no way for a parse to stop before reaching the end of the input, so I don’t know how turning off restarting would work.
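
For readers who want to answer the "is the input correct?" question programmatically, here is a minimal sketch that collects the ranges of all error nodes, using the same cursor calls as the code in the question. `findErrorRanges` is a hypothetical helper name, not a Lezer API.

```javascript
// Hypothetical helper (not part of Lezer): walk the whole tree with a cursor
// and collect the position of every node whose type is marked as an error.
function findErrorRanges(tree) {
  const errors = [];
  const cursor = tree.cursor();
  do {
    if (cursor.type.isError) {
      errors.push({from: cursor.from, to: cursor.to});
    }
  } while (cursor.next());
  return errors;
}
```

With a parser version that includes the patch, an empty result from `findErrorRanges(parser.parse(input))` would suggest that the input matched the grammar. Without the patch, a restart leaves no error node behind, so this check alone would miss input like a(b-c).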