Hello! I’m trying to write a Lezer parser for a simple Lisp-like language, with the goal of using it for syntax highlighting in CodeMirror. I currently have the following grammar:
@top Program { expression }
expression { Fn | App | Var }
Fn { LPar @specialize[@name="fn"]<Var, "fn"> Var expression RPar }
App { LPar expression expression+ RPar }
@tokens {
Var { $[a-zA-Z_]+ }
space { $[ \t\r\n]+ }
LPar { "(" }
RPar { ")" }
}
@skip { space }
@detectDelim
When trying it in CodeMirror with the bracket matching extension enabled, some simple ill-formed inputs succeed in matching brackets, while others fail.
For instance:
()
fails to match; parses as Program(App(LPar,⚠(RPar)))
(a)
succeeds; parses as Program(App(LPar,Var,⚠,RPar))
(fn)
fails; parses as Program(Fn(LPar,fn,⚠(RPar)))
(fn a)
succeeds; parses as Program(Fn(LPar,fn,Var,⚠,RPar))
(fn a a a)
succeeds; parses as Program(Fn(LPar,fn,Var,Var,⚠(Var),RPar))
As you can see, the failures seem to be cases where the expected tokens are ... A B ), but both A and B are missing. The error recovery mechanism decides to skip the closing bracket and assume that all three tokens are missing, instead of assuming that only A and B are missing and matching the closing bracket.
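To make the pattern concrete, here is a small self-contained TypeScript sketch that reads the printed trees from the examples and checks whether RPar ends up as a direct sibling of LPar. It rests on an assumption I have not verified against the CodeMirror source: that bracket matching only finds the closing token when it sits next to the opening one under the same parent, so an RPar wrapped in an error node (⚠(RPar)) is not found. The helpers parseTreeString and bracketsAreSiblings are hypothetical names for this illustration, not part of Lezer or CodeMirror.

```typescript
// Minimal model of the printed tree notation used in this post,
// e.g. "Program(App(LPar,⚠(RPar)))". Not a Lezer data structure.
interface TreeNode {
  name: string;
  children: TreeNode[];
}

// Hypothetical helper: parse "Name(child,child,...)" into nested nodes.
function parseTreeString(src: string): TreeNode {
  let pos = 0;
  function node(): TreeNode {
    const start = pos;
    // A node name runs until the next "(", ")" or ",".
    while (pos < src.length && "(),".indexOf(src[pos]) === -1) pos++;
    const name = src.slice(start, pos);
    const children: TreeNode[] = [];
    if (src[pos] === "(") {
      pos++; // skip "("
      while (pos < src.length && src[pos] !== ")") {
        children.push(node());
        if (src[pos] === ",") pos++;
      }
      pos++; // skip ")"
    }
    return { name, children };
  }
  return node();
}

// Hypothetical helper: true when some node has LPar as its first child
// and RPar as another direct child. This is only a rough stand-in for
// the assumed sibling-based matching, good enough for single-pair inputs.
function bracketsAreSiblings(tree: TreeNode): boolean {
  const names = tree.children.map((c) => c.name);
  if (names[0] === "LPar" && names.indexOf("RPar") !== -1) return true;
  return tree.children.some(bracketsAreSiblings);
}

// The five trees reported above, in the same order.
const reported: string[] = [
  "Program(App(LPar,⚠(RPar)))",
  "Program(App(LPar,Var,⚠,RPar))",
  "Program(Fn(LPar,fn,⚠(RPar)))",
  "Program(Fn(LPar,fn,Var,⚠,RPar))",
  "Program(Fn(LPar,fn,Var,Var,⚠(Var),RPar))",
];

for (const text of reported) {
  console.log(text, bracketsAreSiblings(parseTreeString(text)));
}
```

Under that assumption, the check is false exactly for the two trees where bracket matching fails and true for the three where it succeeds, which is consistent with the idea that the problem lies in where error recovery places RPar rather than in the matching extension itself.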
Is there a way to structure my grammar or configure Lezer such that balanced brackets successfully match, even when more than one token is missing in between?