Get row-aware changes from Transaction.changes?

martaver · November 16, 2022, 8:39pm

I’m playing with tree-sitter and lezer grammars and a CodeMirror 6 editor to learn and compare the two parsers.

I understand that lezer is designed specifically to work with CodeMirror, so I’m interested in understanding where tree-sitter falls short by comparison.

While setting up tree-sitter’s incremental parsing, I noticed that CodeMirror always reports its edits as flat character offsets (a ‘single line’ format), that is fromA, toA, fromB, toB… where A is the previous state and B is the new state.

I found this strange because CodeMirror seems to store Document Text in Lines rather than one big string.

Also, tree-sitter’s edit function has startPosition, oldEndPosition and newEndPosition fields that accept a row and column for the start, old end and new end of the edit respectively.

At the same time, tree-sitter’s parse method could read directly from the array of Lines in a Document’s Text, rather than concatenating them to a string first, and then parsing them. Presumably this could work together with edits that are row-aware to only read the lines that have been changed from the Document.

My questions are:

1. Is there a way to get row-aware changes from CodeMirror, so that tree-sitter can just read the lines that have been edited on a subsequent parse?
2. Why does CodeMirror’s edit callback include both fromA and fromB, when they always seem to be the same value (tree-sitter just has startIndex)? Is there a scenario where they could be different?
3. Is this touching on some limitation of tree-sitter in a browser context?

marijn · November 17, 2022, 1:41pm

The document (Text type) can be used to look up the line that a given position falls in (lineAt method).
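
A minimal sketch of that lookup, assuming the line object returned by lineAt exposes a 1-based `number` and a `from` offset for the line start, and that tree-sitter wants zero-based rows. `toPoint`, `toEdit` and `StringDoc` are hypothetical names for illustration; `StringDoc` is only a stand-in so the sketch runs without CodeMirror, and column units may need adjusting for the tree-sitter binding in use.

```typescript
// Structural types that CodeMirror's Text and Line are assumed to satisfy.
interface LineLike { number: number; from: number }
interface DocLike { lineAt(pos: number): LineLike }

// tree-sitter style point: zero-based row, column measured from the line start.
interface Point { row: number; column: number }

interface TreeSitterEdit {
  startIndex: number;
  oldEndIndex: number;
  newEndIndex: number;
  startPosition: Point;
  oldEndPosition: Point;
  newEndPosition: Point;
}

// Hypothetical helper: turn a flat offset into a row/column pair via lineAt.
function toPoint(doc: DocLike, pos: number): Point {
  const line = doc.lineAt(pos);
  return { row: line.number - 1, column: pos - line.from };
}

// Hypothetical helper for a SINGLE change (where fromA equals fromB).
// Old offsets are resolved against the old document, new ones against the new.
function toEdit(
  oldDoc: DocLike, newDoc: DocLike,
  fromA: number, toA: number, toB: number,
): TreeSitterEdit {
  return {
    startIndex: fromA,
    oldEndIndex: toA,
    newEndIndex: toB,
    startPosition: toPoint(oldDoc, fromA),
    oldEndPosition: toPoint(oldDoc, toA),
    newEndPosition: toPoint(newDoc, toB),
  };
}

// Stand-in document over a plain string, used only to make the sketch runnable.
class StringDoc implements DocLike {
  private readonly text: string;
  constructor(text: string) { this.text = text; }
  lineAt(pos: number): LineLike {
    if (pos < 0 || pos > this.text.length) {
      throw new RangeError(`Invalid position ${pos}`);
    }
    let number = 1;
    let from = 0;
    // Count the line breaks that occur before pos.
    for (let i = 0; i < pos; i++) {
      if (this.text[i] === "\n") { number++; from = i + 1; }
    }
    return { number, from };
  }
}
```

With a real editor, the same helper would presumably be called from inside the changes iteration, passing the old and new documents of the update in place of the stand-ins. When a transaction contains several changes, the offsets handed to tree-sitter need extra care, because each tree-sitter edit is applied on top of the previous one.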

I’m not sure which API you are talking about here, but if you have multiple changes the fromA/fromB for a given change can definitely differ.
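
To illustrate why the two can differ, here is a small standalone sketch. It does not use CodeMirror itself; it assumes changes are specified against the old document, sorted and non-overlapping, and `mapRanges` is a hypothetical helper that computes where each change lands in the new document.

```typescript
// A change described against the OLD document: replace [from, to) with `insert`.
interface ChangeSpec { from: number; to: number; insert: string }

// A = offsets in the old document, B = offsets in the new document.
interface MappedRange { fromA: number; toA: number; fromB: number; toB: number }

// Hypothetical helper: every earlier change shifts the B offsets of later ones
// by its net length difference.
function mapRanges(changes: ChangeSpec[]): MappedRange[] {
  let shift = 0; // net length change contributed by earlier changes
  return changes.map(({ from, to, insert }) => {
    const fromB = from + shift;
    const toB = fromB + insert.length;
    shift += insert.length - (to - from);
    return { fromA: from, toA: to, fromB, toB };
  });
}

const ranges = mapRanges([
  { from: 0, to: 0, insert: "xx" }, // insert two characters at the start
  { from: 5, to: 6, insert: "" },   // delete one character further along
]);

// The first change has fromA === fromB; the second is shifted by the insertion.
console.log(ranges);
```

With a single change per transaction there is nothing earlier to shift it, which would explain why the two values always looked identical in simple typing tests.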

Tree-sitter is a C project. For the time being, CodeMirror does not use WebAssembly.

martaver · November 18, 2022, 9:02pm

Thanks for your reply.