Guiding error recovery to better match brackets

Hello! I’m trying to write a Lezer parser for a simple Lisp-like language, with the goal of using it for syntax highlighting in CodeMirror. I currently have the following grammar:

@top Program { expression }
expression { Fn | App | Var }
Fn { LPar @specialize[@name="fn"]<Var, "fn"> Var expression RPar }
App { LPar expression expression+ RPar }
@tokens {
	Var { $[a-zA-Z_]+ }
	space { $[ \t\r\n]+ }
	LPar { "(" }
	RPar { ")" }
}
@skip { space }
@detectDelim

When trying it in CodeMirror with the bracket matching extension enabled, some simple ill-formed inputs succeed in matching brackets, while others fail.

For instance:

  • () fails to match; parses as Program(App(LPar,⚠(RPar)))
  • (a) succeeds; parses as Program(App(LPar,Var,⚠,RPar))
  • (fn) fails; parses as Program(Fn(LPar,fn,⚠(RPar)))
  • (fn a) succeeds; parses as Program(Fn(LPar,fn,Var,⚠,RPar))
  • (fn a a a) succeeds; parses as Program(Fn(LPar,fn,Var,Var,⚠(Var),RPar))

As you can see, the failures seem to be cases where the expected tokens are ... A B ), but both A and B are missing. The error recovery mechanism decides to skip the closing bracket and assume all three tokens are missing, instead of assuming that A and B are missing and matching the closing bracket.

Is there a way to structure my grammar or configure Lezer such that balanced brackets successfully match, even when more than one token is missing in between?

No, that’s unfortunately not something Lezer makes possible. The search heuristic it applies to continue parsing after a syntax error simply looks for the smallest number of tokens to insert or skip, or rules to break off, that lets it resume parsing.
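
To make that trade-off concrete, here is a toy model of a cost-based choice between the two recoveries seen in the examples above. It is only a sketch: the weights and the `cheapestRecovery` helper are hypothetical and chosen for illustration, and the real parser explores many candidate parse stacks rather than comparing two fixed options. The point is just that once two items are missing before the `)`, inserting both can cost more than skipping the `)` and breaking off the rule.

```typescript
// Toy model of cost-based error recovery.
// ASSUMPTION: these weights are illustrative only, not taken from Lezer.
const INSERT_COST = 200;    // pretend one missing token or expression is present
const SKIP_COST = 190;      // drop one token from the input
const BREAK_OFF_COST = 100; // force an unfinished rule to end early

type Recovery = {
  strategy: "insert" | "skip";
  cost: number;
  bracketMatched: boolean;
};

// `missing` is the number of items the grammar still expects before ")".
function cheapestRecovery(missing: number): Recovery {
  // Option 1: insert every missing item, then consume ")" normally.
  const insert: Recovery = {
    strategy: "insert",
    cost: missing * INSERT_COST,
    bracketMatched: true,
  };
  // Option 2: skip ")" and break off the unfinished rule at end of input.
  const skip: Recovery = {
    strategy: "skip",
    cost: SKIP_COST + BREAK_OFF_COST,
    bracketMatched: false,
  };
  return insert.cost <= skip.cost ? insert : skip;
}

// Inputs from the question, paired with how many items are missing before ")".
const cases: [string, number][] = [
  ["()", 2],
  ["(a)", 1],
  ["(fn)", 2],
  ["(fn a)", 1],
];

for (const [input, missing] of cases) {
  const r = cheapestRecovery(missing);
  console.log(`${input}: ${r.strategy}, cost ${r.cost}, bracket matched: ${r.bracketMatched}`);
}
```

Under these assumed weights the model picks "skip" for () and (fn) and "insert" for (a) and (fn a), which lines up with the parses listed in the question. No grammar-level setting is implied here; the sketch only shows why the cheaper recovery is not always the one that keeps the brackets paired.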