Granularity of incremental parsing

martijnwalraven · May 30, 2025, 5:58am

I’ve been trying to get incremental parsing to work, but I feel I may have misunderstood what it is able to do. My hope was that incremental parsing would be able to reuse syntax nodes when these have not been affected by changes. As a next step, I would then be able to associate dependent data with those syntax nodes (through NodeWeakMap).

I’ve used TreeFragment.applyChanges to modify fragments, and I can see the range changing, but the syntax nodes in that range are never reused. Even if I don’t make any changes to the text at all, parsing still fails to reuse any of the previously generated syntax nodes. If I do something like this for example:

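import {TreeFragment} from "@lezer/common";

// `parser` is assumed to be an LRParser instance and `text` the unchanged input.
let tree = parser.parse(text);
let fragments = TreeFragment.addTree(tree);
let newTree = parser.parse(text, fragments);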

Surprisingly, there doesn’t seem to be any reuse between parses. One reason seems to be that incremental parsing is only enabled when the input length exceeds a certain value (parser.bufferLength * 4, so 4096 by default):

github.com/lezer-parser/lr, src/parse.ts (commit 3eaa5d375):

          this.fragments = fragments.length && this.stream.end - from > parser.bufferLength * 4
            ? new FragmentCursor(fragments, parser.nodeSet) : null

That seems like an optimization to balance the performance benefits of incremental parsing against the overhead of using it. But after some more digging, I realized that incremental parsing is limited to reusing whole Trees (or rather TreeBuffers).
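
To make the quoted condition concrete, here is a minimal sketch of the same check as a standalone function. The helper name `wouldUseFragments` and its parameters are made up for illustration and are not part of the library. Only the comparison itself comes from the snippet above, and 1024 is the default `bufferLength` implied by the 4096 figure.

```typescript
// Standalone sketch of the quoted check; not the library's code.
// 1024 is the default buffer length implied by "4096 by default" above.
const DEFAULT_BUFFER_LENGTH = 1024;

// Hypothetical helper: true when the quoted condition would create a FragmentCursor.
// `from` and `end` stand in for the start and end of the range being parsed.
function wouldUseFragments(
  fragmentCount: number,
  from: number,
  end: number,
  bufferLength: number = DEFAULT_BUFFER_LENGTH
): boolean {
  return fragmentCount > 0 && end - from > bufferLength * 4;
}

const atThreshold = wouldUseFragments(1, 0, 4096);   // false: the length must exceed 4096
const pastThreshold = wouldUseFragments(1, 0, 4097); // true
const noFragments = wouldUseFragments(0, 0, 100000); // false: nothing to reuse
```

Under this reading, a short test document never takes the incremental path at all, no matter which fragments are passed in.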

Am I correct in thinking incremental parsing will never break up TreeBuffers? That means it is more coarse-grained than I thought, and won’t be a reliable way to keep track of which syntax nodes have changed. Is that a fair conclusion?

marijn · May 30, 2025, 6:46am

That’s correct. It’ll reuse Tree objects, but any TreeBuffer touched by the changes is going to be re-parsed.

There’s no way to associate data with nodes inside TreeBuffers in any case, since there’s no object identity for those.
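
A toy illustration of what missing object identity means in practice. This is a simplified model and not Lezer's actual classes: it assumes nodes are packed into a flat numeric array, four numbers per node, and that wrapper objects are created on demand. `ToyNode` and `nodeAt` are hypothetical names.

```typescript
// Toy model only. Each node occupies four slots in a flat array:
// type id, start, end, and the index where its subtree ends.
type ToyNode = { type: number; from: number; to: number };

// Builds a fresh wrapper object on every call; nothing is stored per node.
function nodeAt(data: Uint16Array, index: number): ToyNode {
  return { type: data[index], from: data[index + 1], to: data[index + 2] };
}

const packed = new Uint16Array([1, 0, 5, 4]); // a single node: type 1, range 0..5
const first = nodeAt(packed, 0);
const second = nodeAt(packed, 0);

// Data attached to one wrapper is keyed on that wrapper's identity.
const attached = new WeakMap<ToyNode, string>();
attached.set(first, "data for this node");

const sameObject = first === second;     // false: two distinct wrappers for one node
const foundAgain = attached.has(second); // false: the identity-based lookup misses
```

Both wrappers describe the same node, but an identity-keyed map treats them as unrelated keys.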

Caching on the Tree level still works very well though (see for example the way local variable completion works in packages like lang-javascript), on the assumption that recomputing for a couple of buffers’ worth of code is cheap enough.
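
The general shape of Tree-level caching can be sketched with a plain WeakMap. This is a toy model under stated assumptions, not how lang-javascript implements it: `ToyTree`, `declarations`, and `reparse` are hypothetical, and `reparse` only imitates an incremental parse by keeping unchanged chunk objects and replacing the changed one.

```typescript
// Toy model: each ToyTree stands in for a subtree object that a reparse may reuse.
interface ToyTree { readonly text: string }

let computeCount = 0;
const declCache = new WeakMap<ToyTree, string[]>();

// Hypothetical analysis: names declared with `let`, cached per tree object.
function declarations(tree: ToyTree): string[] {
  let cached = declCache.get(tree);
  if (cached === undefined) {
    computeCount++;
    cached = [];
    const pattern = /\blet\s+(\w+)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(tree.text)) !== null) cached.push(match[1]);
    declCache.set(tree, cached);
  }
  return cached;
}

// Stand-in for an incremental parse: untouched chunks keep their object,
// the chunk at `changedIndex` is rebuilt as a new object.
function reparse(old: ToyTree[], changedIndex: number, newText: string): ToyTree[] {
  return old.map((tree, i) => (i === changedIndex ? { text: newText } : tree));
}

const oldChunks: ToyTree[] = [{ text: "let a = 1" }, { text: "let b = 2" }, { text: "let c = 3" }];
oldChunks.forEach(chunk => declarations(chunk)); // three computations, one per chunk

const newChunks = reparse(oldChunks, 1, "let renamed = 2");
const visibleNames: string[] = [];
for (const chunk of newChunks) visibleNames.push(...declarations(chunk));
// Only the rebuilt chunk is recomputed; the other two hit the cache.
```

The cost of an edit in this model is one recomputation for the rebuilt chunk, which matches the assumption that redoing a couple of buffers' worth of work is cheap enough.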