I made a few language extensions to TypeScript using Babel, but Babel turned out not to be a great fit for this, as it is solely focused on the JavaScript ecosystem, and not meant for custom languages. It also does not have any parse error recovery, it can only recover errors based on analysing the succesfully parsed tree, like having a redefined const
, making using it as part of a language service (editor support) a bad fit. As this would mean e.g. not having auto complete at object.|
, where |
is the cursor, as Babel fails to parse this, and thus there is no parse tree to determine the auto completion on. In my use case, there is no parse tree to transform to valid TypeScript that can be further processed by the TypeScript language service.
As far as I know Lezer is the only parser with parse error recovery as a JavaScript library, without resorting to WASM, that is actively maintained. And as it is designed for use in a code editor, it is a great fit for language support. I know Lezer stops at parse trees, as that is all that is needed for its intended use case. However, since all alternatives I could find that do provide support for abstract syntax trees per design are either unmaintained or missing some of those crucial features for good language service support, I plan to use Lezer as my basis.
I can construct a basic abstract tree using the the original input and iterating over the parse tree. However for the abstract tree to be really useful, it would need named children, like being able to refer to the left and right expression in a binary expression. That example can be easily dealt with using some additional metadata that a binary expression should have 2 expression children and 1 operator child, and that the first expression is left, and second right, but I can think of more complex examples where it would be undecidable without knowing the alternatives chosen. For showcase purposes, say we add the syntax to name children with <name>:
, and we have something like this:
BinaryExpression { left:Expression Operator right:Expression | "flip" right:Expression Operator left:Expression }
Where the "flip"
alternative would allow you to write a binary expression with the expressions flipped. Given either alternative, my Lezer parse tree would only know it is a BinaryExpression
and that is has the children [Expression, Operator, Expression]
, not giving me enough information to determine which Expression
should be named left
or right
.
Is this information present in the Lezer parse tree? The most obvious answer would be to inspect the tokens and determine the alternatives that way, like checking for the "flip"
token, but that would require me to basically reparse the parse tree again, which feels wrong, redoing part of the work Lezer already did.
Another option would be to make sure that whenever the parse tree becomes ambiguous like this, to rewrite the grammer to add additional rules to make these distinctions.
Is there some obvious solution I am missing?
Or is using Lezer for things like this simply not a good fit? If so, do you know any alternatives?