Incremental Syntax Highlighting with Lezer

I am trying to use the incremental parsing API in Lezer to implement incremental syntax highlighting. Here’s a sketch of what I have so far:

// Global value that is cached between transactions.
let fragments = []; 

// Try to emulate https://github.com/codemirror/language/blob/e2b4213ce8b24f0f47189369ec5289beac9841d9/src/language.ts#L328-L330
// but for prosemirror transactions
let ranges: ChangedRange[] = [];
if (stepMap) {
  // Normalize entire document to single PM node by substracting start of node
  stepMap.forEach((oldStart, oldEnd, newStart, newEnd) => ranges.push({fromA: oldStart - pos, toA: oldEnd - pos, fromB: newStart - pos, toB: newEnd - pos}));
}
fragments = TreeFragment.applyChanges(fragments, ranges) as any;
let tree = (support.language.parser as Parser).parse(node.textContent, 0, {fragments});

// Has as callback a function that generates syntax highlighted decorations.
highlightTree(tree, ...);

// Cache fragments for this codeblock to be used next time.
fragments = TreeFragment.addTree(tree);

From an earlier discussion, it seems that we want to take previously outputted tree fragments and use them as the input to Parser.parse or Parser.startParse which returns a tree and PartialParse respectively.

To perform syntax highlight, we want to use highlightTree or highlightTreeRange from @codemirror/highlight.

What methods do we use from the outtputted tree or treeFragments to only call highlightTreeRange on the modified ranges to avoid outputting decorations for unmodified ranges?

Relevant links:
highlight/highlight.ts at main · codemirror/highlight · GitHub highlightTree, highlightTreeRange
language/language.ts at main · codemirror/language · GitHub uses applyChanges
lezer/parse.ts at master · lezer-parser/lezer · GitHub

There’s no such feature at the moment. You could try to set something up using tree cursors to diff the two trees, but I haven’t tried that yet.

When iterating through a tree with treeCursor, what’s the way to compare whether two syntaxNodes from different trees are equal to know whether to skip or descend into?

I thought about using cursor.from, cursor.to, cursor.name but nodes after the changedRanges which may retain the same decorations / same cursor.name would have offsetted cursor.from, cursor.to values depending on the changedRanges.

For example, if we had

def sumOfSquares(x, y)
  return x**2 + y**2

move to

def sumOfSquares(x, y):
  return x**2 + y**2

we would have

Node Script from 0 to 43
Node FunctionDefinition from 0 to 22
Node def from 0 to 3
Node VariableName from 4 to 16
Node ParamList from 16 to 22
Node ( from 16 to 17
Node VariableName from 17 to 18
Node , from 18 to 19
Node VariableName from 20 to 21
Node ) from 21 to 22
Node ⚠ from 22 to 22
Node ReturnStatement from 25 to 43
Node return from 25 to 31
Node BinaryExpression from 32 to 43
Node BinaryExpression from 32 to 36
Node VariableName from 32 to 33
Node ArithOp from 33 to 35
Node ** from 33 to 35
Node Number from 35 to 36
Node ArithOp from 37 to 38
Node BinaryExpression from 39 to 43
Node VariableName from 39 to 40
Node ArithOp from 40 to 42
Node ** from 40 to 42
Node Number from 42 to 43

to

Node Script from 0 to 44
Node FunctionDefinition from 0 to 44
Node def from 0 to 3
Node VariableName from 4 to 16
Node ParamList from 16 to 22
Node ( from 16 to 17
Node VariableName from 17 to 18
Node , from 18 to 19
Node VariableName from 20 to 21
Node ) from 21 to 22
Node Body from 22 to 44
Node : from 22 to 23
Node ReturnStatement from 26 to 44
Node return from 26 to 32
Node BinaryExpression from 33 to 44
Node BinaryExpression from 33 to 37
Node VariableName from 33 to 34
Node ArithOp from 34 to 36
Node ** from 34 to 36
Node Number from 36 to 37
Node ArithOp from 38 to 39
Node BinaryExpression from 40 to 44
Node VariableName from 40 to 41
Node ArithOp from 41 to 43
Node ** from 41 to 43
Node Number from 43 to 44

Edit: I suppose the simplified question is: given two trees, where one reuses subtrees of the other, how do you detect if a current substree has been re-used when iterating with treeCursor?

The approach I had in mind would be to iterate over both trees, once from the front and once from the back, to find the first and last difference, in order to limit the amount of re-rendering to the range between those. So you’d run two cursors in sync until the nodes they point at differ, and take that position as the start of the tree changes.

I’m confused about this point: how do you check if nodes differ between two treeCursors? Is it just a simple equality check?

const cursor1 = tree1.cursor(); // input to `applyChanges`
const cursor2 = tree2.cursor(); // output of parse where tree1 is fragments
while (cursor1.node == cursor2.node) { // Do this likewise for back
  cursor1.next();
  cursor2.next();
}

As I said, I haven’t done this yet, so I’m not going to provide you with a complete implementation. Comparing nodes by identity is meaningless (they are allocated on demand). Rather, you’ll want to do something with the node’s type and position.