Guiding error recovery to better match brackets

jade-guiton · August 22, 2025, 4:40pm

Hello! I’m trying to write a Lezer parser for a simple Lisp-like language, with the goal of using it for syntax highlighting in CodeMirror. I currently have the following grammar:

@top Program { expression }
expression { Fn | App | Var }
Fn { LPar @specialize[@name="fn"]<Var, "fn"> Var expression RPar }
App { LPar expression expression+ RPar }
@tokens {
	Var { $[a-zA-Z_]+ }
	space { $[ \t\r\n]+ }
	LPar { "(" }
	RPar { ")" }
}
@skip { space }
@detectDelim

When trying it in CodeMirror with the bracket matching extension enabled, some simple ill-formed inputs succeed in matching brackets, while others fail.

For instance:

() fails to match; parses as Program(App(LPar,⚠(RPar)))
(a) succeeds; parses as Program(App(LPar,Var,⚠,RPar))
(fn) fails; parses as Program(Fn(LPar,fn,⚠(RPar)))
(fn a) succeeds; parses as Program(Fn(LPar,fn,Var,⚠,RPar))
(fn a a a) succeeds; parses as Program(Fn(LPar,fn,Var,Var,⚠(Var),RPar))

As you can see, the failures seem to be cases where the expected tokens are ... A B ), but both A and B are missing. The error recovery mechanism decides to skip the closing bracket and assume all three tokens are missing, instead of assuming that A and B are missing and matching the closing bracket.

Is there a way to structure my grammar or configure Lezer such that balanced brackets successfully match, even when more than one token is missing in between?

marijn · August 24, 2025, 9:01am

No, that’s unfortunately not something Lezer makes possible. The search heuristic it applies to continue parsing after a syntax error simply looks for the smallest number of tokens to insert or skip, or rules to break off, that lets it continue parsing.
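
If the goal is only to highlight balanced brackets, one possible workaround is to match them on the raw text instead of relying on the recovered parse tree. The following is a minimal sketch of that idea, not a Lezer or CodeMirror feature: `findMatchingParen` is a hypothetical helper, and it assumes the language's only brackets are `(` and `)` and that parentheses never appear inside strings or comments (true for the grammar above, which has neither).

```typescript
// Hypothetical helper, not part of Lezer or CodeMirror.
// Returns the index of the parenthesis that balances the one at `pos`,
// or null if `pos` is not on a parenthesis or no balancing one exists.
function findMatchingParen(text: string, pos: number): number | null {
  const ch = text[pos];
  if (ch !== "(" && ch !== ")") return null;

  // Scan forward from "(" and backward from ")".
  const dir = ch === "(" ? 1 : -1;
  let depth = 0;

  for (let i = pos; i >= 0 && i < text.length; i += dir) {
    // Brackets facing the same way as the starting one increase the depth,
    // brackets facing the other way decrease it.
    if (text[i] === "(") depth += dir;
    else if (text[i] === ")") depth -= dir;

    // Depth returns to zero exactly at the balancing bracket.
    if (depth === 0) return i;
  }
  return null; // Ran off the end of the text: unbalanced.
}

// The two inputs that fail with tree-based matching in the examples above:
console.log(findMatchingParen("()", 0));   // 1
console.log(findMatchingParen("(fn)", 3)); // 0
```

Because this ignores the syntax tree entirely, it is unaffected by which tokens error recovery decides to skip. How to wire such a function into the editor is left open here, since that depends on the CodeMirror setup in use.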