Switch from Monaco to CodeMirror and generate AST from Lezer tree

medihack · August 30, 2022, 10:01am

I have an existing parser (based on Chevrotain) for a custom (JSX-like) language that also supports some schema definition (something like XML simple-schema). In the end, I need an AST for a custom interpreter and also some web editor integration. That’s why I wrote a Monaco editor integration with a language service, but there are various reasons why I would like to switch to the wonderful CodeMirror 6 editor. It seems to me that using Lezer as the parser for CodeMirror is a better option than reusing the Chevrotain parser. I wonder whether I should now get rid of the Chevrotain parser completely and generate the AST from the Lezer tree. The AST is also needed for schema validation and autocompletion (and I guess it must be recreated on every change). Do you think this is a good scenario for CodeMirror 6 (especially as Monaco does all of this in a web worker)?

marijn · August 30, 2022, 10:36am

That depends. Lezer does not emit an abstract syntax tree, just a set of nodes with types and parent/child relations. So using that for analysis is definitely possible, but more awkward than what you’d get from a regular parser: you have to iterate through child nodes to find the one you’re interested in, get the name of a variable by reading the text between the corresponding node’s start and end positions from the document, and so on.
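
To make the awkwardness concrete, here is a minimal sketch of the two chores described above: scanning child nodes for a given type, and recovering a name from document positions. `NodeLike` is a structural stand-in for the few `SyntaxNode` members used (`name`, `from`, `to`, `firstChild`, `nextSibling`), so the snippet runs without the library. The node names `VariableDefinition` and `VariableName` and the helper `definedName` are hypothetical; real names depend on the grammar.

```typescript
// Structural stand-in for the parts of Lezer's SyntaxNode used below.
interface NodeLike {
  readonly name: string;
  readonly from: number;
  readonly to: number;
  readonly firstChild: NodeLike | null;
  readonly nextSibling: NodeLike | null;
}

// Walk the direct children and return the first one with the given type name.
function findChild(parent: NodeLike, name: string): NodeLike | null {
  for (let child = parent.firstChild; child !== null; child = child.nextSibling) {
    if (child.name === name) return child;
  }
  return null;
}

// The tree stores only positions, so the text has to come from the document.
function nodeText(node: NodeLike, source: string): string {
  return source.slice(node.from, node.to);
}

// Hypothetical helper: read the identifier out of a definition node.
// "VariableName" is an assumed node type, not something Lezer defines.
function definedName(def: NodeLike, source: string): string | null {
  const nameNode = findChild(def, "VariableName");
  return nameNode ? nodeText(nameNode, source) : null;
}
```

With the real library, `node.getChild(name)` covers the child lookup, and inside CodeMirror the text would typically come from `state.sliceDoc(from, to)` rather than a plain string.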

medihack · August 31, 2022, 9:12am

@marijn Thank you for your assessment. With the help of TreeCursor and the source text itself, it seems quite easy to transform the Lezer tree into an AST (and then compare that AST to the schema object). But I guess I’ll just have to test whether validation and autocompletion are still fast enough not to block the UI. If not, I could still use a web worker to compute the linting and autocompletion results. Is there a way to serialize and deserialize a Lezer tree (to send it to a worker)?
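
A rough sketch of the cursor-based conversion mentioned here, under stated assumptions: `CursorLike` mirrors only the `TreeCursor` members the walk needs (`name`, `from`, `to`, `firstChild()`, `nextSibling()`, `parent()`), and `AstNode` is a deliberately generic, hypothetical shape. A real interpreter would map specific node types to typed AST nodes instead of copying every node.

```typescript
// Structural stand-in for the parts of Lezer's TreeCursor used by the walk.
// A cursor obtained from tree.cursor() is expected to fit this shape.
interface CursorLike {
  readonly name: string;
  readonly from: number;
  readonly to: number;
  firstChild(): boolean;   // move to the first child, false if there is none
  nextSibling(): boolean;  // move to the next sibling, false if there is none
  parent(): boolean;       // move back up to the parent node
}

// Hypothetical generic AST node; not a type provided by Lezer.
interface AstNode {
  type: string;
  from: number;
  to: number;
  text: string;
  children: AstNode[];
}

// Convert the node under the cursor (and its subtree) into an AstNode.
// The cursor points at the same node on return as it did on entry.
function toAst(cursor: CursorLike, source: string): AstNode {
  const node: AstNode = {
    type: cursor.name,
    from: cursor.from,
    to: cursor.to,
    // Node text is not stored in the tree, so slice it from the source.
    text: source.slice(cursor.from, cursor.to),
    children: [],
  };
  if (cursor.firstChild()) {
    do {
      node.children.push(toAst(cursor, source));
    } while (cursor.nextSibling());
    cursor.parent(); // restore the position for the caller
  }
  return node;
}
```

Because this rebuilds the whole AST on every call, it is worth measuring on realistic documents before deciding whether the work needs to move off the main thread.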

marijn · August 31, 2022, 9:35am

No, that currently doesn’t exist. There’s Tree.build to deserialize from the arrays the parser builds up, but no corresponding serialization function. Also, since trees are constantly being updated and share structure with previous trees, you’d probably want something more intelligent than full serialization/deserialization (such as tagging subtrees with IDs using WeakMap so that you can reuse them on the other side when they have already been moved to the worker).
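
One way the WeakMap idea could look, as a sketch only: nothing below is Lezer API. `SubtreeRegistry`, `encode`, `WorkerCache`, and the message shape are hypothetical names, and the `serialize` callback is left to the caller precisely because Lezer ships no serializer. The sender tags each subtree object with an ID the first time it is sent; later sends of the same object carry only the ID.

```typescript
// Main-thread side: remembers which subtree objects were already sent.
// WeakMap keys do not keep subtrees alive, so entries vanish with the tree.
class SubtreeRegistry<T extends object> {
  private ids = new WeakMap<T, number>();
  private nextId = 1;

  tag(subtree: T): { id: number; isNew: boolean } {
    const known = this.ids.get(subtree);
    if (known !== undefined) return { id: known, isNew: false };
    const id = this.nextId++;
    this.ids.set(subtree, id);
    return { id, isNew: true };
  }
}

// Either the full payload (first time) or just a reference (afterwards).
type SubtreeMessage<P> =
  | { kind: "full"; id: number; payload: P }
  | { kind: "ref"; id: number };

// `serialize` is supplied by the caller; it is an assumption of this sketch.
function encode<T extends object, P>(
  subtree: T,
  registry: SubtreeRegistry<T>,
  serialize: (subtree: T) => P,
): SubtreeMessage<P> {
  const { id, isNew } = registry.tag(subtree);
  return isNew ? { kind: "full", id, payload: serialize(subtree) } : { kind: "ref", id };
}

// Worker side: keeps payloads by ID so references can be resolved.
class WorkerCache<P> {
  private byId = new Map<number, P>();

  resolve(message: SubtreeMessage<P>): P {
    if (message.kind === "full") {
      this.byId.set(message.id, message.payload);
      return message.payload;
    }
    const hit = this.byId.get(message.id);
    if (hit === undefined) throw new Error(`unknown subtree id ${message.id}`);
    return hit;
  }
}
```

The worker-side `Map` never evicts in this sketch; a real implementation would need some protocol for telling the worker which IDs are no longer reachable.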

medihack · August 31, 2022, 12:04pm

Thanks a lot for your input. It sounds like a bit of work, but totally doable. Really cool project!!!