I have implemented my own lightweight markup language and want to give it full, proper language support in editors. At the moment I want to support VS Code, ProseMirror, and CodeMirror. So I implemented the language manually (i.e. without a generator) in Lezer to produce a parse tree. From there I can transform it into whatever I need; in the case of VS Code that could be semantic tokens (i.e. syntax highlighting) and folding ranges.

However, I started to see a pattern. If I implement those features naively, I can just traverse the full parse tree and produce whatever I need, whenever it is requested. Then I realized you can combine these traversals into one traversal, using async/promises. And then I realized that in most cases only a small change has occurred, and I know exactly which part of the parse tree changed as a result, so it is rather inefficient to reproduce the whole thing. For big documents, this could become significant. This holds for parsing itself (with Lezer we can use `TreeFragment`s), but also for the semantic tokens, folding ranges, document links, etc.

Unfortunately, as I noticed when implementing reuse in the parser, it is definitely non-trivial to get right: everything has to end up exactly as it would after a full reparse. I wonder how others approach this. When I think of all the corrections I would need to make to reuse results properly, I feel like a full reproduction might sometimes actually be faster, apart from perhaps memory usage (i.e. just recalculate all folding ranges, even if only one is affected).
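To make the "one traversal" idea concrete, here is a minimal sketch of what I mean. It is deliberately simplified: `MarkupNode`, `Collector`, `foldingCollector`, `tokenCollector`, and `walk` are hypothetical names for illustration, the tree is a plain object instead of a real Lezer tree, and the walk is synchronous rather than promise-based.

```typescript
// Hypothetical minimal tree shape; a real implementation would walk a Lezer tree.
interface MarkupNode {
  name: string;
  from: number;
  to: number;
  children: MarkupNode[];
}

// A collector sees every node once and accumulates its own result.
interface Collector<T> {
  enter(node: MarkupNode): void;
  result(): T;
}

interface FoldingRange { from: number; to: number }
interface SemanticToken { from: number; to: number; type: string }

function foldingCollector(foldable: Set<string>): Collector<FoldingRange[]> {
  const ranges: FoldingRange[] = [];
  return {
    enter(node) {
      if (foldable.has(node.name)) ranges.push({ from: node.from, to: node.to });
    },
    result: () => ranges,
  };
}

function tokenCollector(tokenTypes: Map<string, string>): Collector<SemanticToken[]> {
  const tokens: SemanticToken[] = [];
  return {
    enter(node) {
      const type = tokenTypes.get(node.name);
      if (type !== undefined) tokens.push({ from: node.from, to: node.to, type });
    },
    result: () => tokens,
  };
}

// One depth-first walk feeds all collectors at once.
function walk(root: MarkupNode, collectors: Collector<unknown>[]): void {
  const stack: MarkupNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    for (const c of collectors) c.enter(node);
    // Push children in reverse so they are visited in document order.
    for (let i = node.children.length - 1; i >= 0; i--) stack.push(node.children[i]);
  }
}

// Usage: both results come out of a single pass over the tree.
const tree: MarkupNode = {
  name: "Document", from: 0, to: 20, children: [
    { name: "Section", from: 0, to: 12, children: [
      { name: "Heading", from: 0, to: 5, children: [] },
      { name: "Emphasis", from: 6, to: 12, children: [] },
    ] },
    { name: "Section", from: 13, to: 20, children: [] },
  ],
};
const folds = foldingCollector(new Set(["Section"]));
const tokens = tokenCollector(new Map([["Heading", "keyword"], ["Emphasis", "string"]]));
walk(tree, [folds, tokens]);
```

Each feature stays a small, separate collector, but the cost of walking the tree is paid only once per update.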
At the moment I am only going to do full reproductions (but combine them in one traversal), while still reusing the parse tree where possible with `TreeFragment`s. However, I would like to go the extra step and learn how to recalculate those reproductions more efficiently. If anybody has examples of how others do this, or other pointers, I'd love to hear them!
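For reference, this is roughly the kind of incremental update I have in mind for derived data such as folding ranges. It is only a sketch under assumptions: `Span`, `DocChange`, and `updateSpans` are hypothetical names, `DocChange` merely mirrors the `fromA`/`toA`/`fromB`/`toB` shape of Lezer's `ChangedRange`, and the `recompute` callback stands in for a traversal restricted to part of the new tree. It is conservative: every cached range that touches the edit (including ranges enclosing it) is thrown away and recomputed.

```typescript
interface Span { from: number; to: number }

// A = positions in the old document, B = positions in the new document.
interface DocChange { fromA: number; toA: number; fromB: number; toB: number }

function updateSpans(
  cached: Span[],
  change: DocChange,
  // Hypothetical callback: recompute all spans inside [from, to] of the NEW document.
  recompute: (from: number, to: number) => Span[],
): Span[] {
  const delta = (change.toB - change.fromB) - (change.toA - change.fromA);

  // Grow the dirty window (old coordinates) until no cached span straddles its edges.
  let dirtyFrom = change.fromA;
  let dirtyTo = change.toA;
  let grew = true;
  while (grew) {
    grew = false;
    for (const s of cached) {
      const touches = s.to >= dirtyFrom && s.from <= dirtyTo;
      if (touches && (s.from < dirtyFrom || s.to > dirtyTo)) {
        dirtyFrom = Math.min(dirtyFrom, s.from);
        dirtyTo = Math.max(dirtyTo, s.to);
        grew = true;
      }
    }
  }

  const before: Span[] = [];
  const after: Span[] = [];
  for (const s of cached) {
    if (s.to < dirtyFrom) {
      before.push(s); // untouched, positions unchanged
    } else if (s.from > dirtyTo) {
      after.push({ from: s.from + delta, to: s.to + delta }); // untouched, only shifted
    }
    // Everything else lies inside the dirty window and is dropped.
  }

  // The window end is mapped to new-document coordinates before recomputing.
  const fresh = recompute(dirtyFrom, dirtyTo + delta);
  return [...before, ...fresh, ...after];
}

// Usage: five characters inserted at offset 25, inside the second cached span.
const cached: Span[] = [{ from: 0, to: 10 }, { from: 20, to: 30 }, { from: 40, to: 50 }];
const calls: Span[] = [];
const updated = updateSpans(
  cached,
  { fromA: 25, toA: 25, fromB: 25, toB: 30 },
  (from, to) => {
    calls.push({ from, to }); // record which window had to be recomputed
    return [{ from, to }];    // stand-in for a real, restricted tree traversal
  },
);
```

The catch is visible even in this toy version: a range that encloses the edit drags its whole extent into the dirty window, so an edit deep inside nested sections can still end up recomputing most of the document. That is exactly the bookkeeping that makes me wonder whether a full recalculation is sometimes the cheaper option.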