I’m playing with tree-sitter and lezer grammars and a CodeMirror 6 editor to learn and compare the two parsers.
I understand that lezer is designed specifically to work with CodeMirror and so I’m interested to understand tree-sitter’s shortcomings.
I was setting up tree-sitter’s incremental parsing when I noticed that CodeMirror always gives its edits in a ‘single line’ format, that is fromA, toA, fromB, toB… where A is the last state and B is the new state.
I found this strange because CodeMirror seems to store Document Text in Lines rather than one big string.
Also, tree-sitter’s edit function has *Position fields that accept row and column for start, oldEnd, newEnd respectively.
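To make the mismatch concrete, here is a rough sketch of the conversion I think is needed, going from CodeMirror's flat offsets to the object that tree-sitter's edit function takes. `DocLike`, `docFromString`, `toPoint` and `toTreeEdit` are hypothetical names of mine, not part of either library; `DocLike` only mirrors the `lineAt` lookup (1-based line number plus line start offset) that CodeMirror's Text provides. I'm assuming edits are applied to the tree in document order, and that tree-sitter counts columns in the same units as CodeMirror's offsets, which I haven't verified.

```typescript
// Structural stand-ins for the few pieces of CodeMirror / tree-sitter used here.
interface LineLike { number: number; from: number } // line numbers are 1-based
interface DocLike { lineAt(pos: number): LineLike }
interface Point { row: number; column: number }
interface TreeEdit {
  startIndex: number; oldEndIndex: number; newEndIndex: number;
  startPosition: Point; oldEndPosition: Point; newEndPosition: Point;
}

// Hypothetical helper: a tiny DocLike over a plain string, for experimenting.
function docFromString(text: string): DocLike {
  const starts = [0];
  for (let i = 0; i < text.length; i++) if (text[i] === "\n") starts.push(i + 1);
  return {
    lineAt(pos: number): LineLike {
      let n = 0;
      while (n + 1 < starts.length && starts[n + 1] <= pos) n++;
      return { number: n + 1, from: starts[n] };
    },
  };
}

// Flat offset -> zero-based row/column.
function toPoint(doc: DocLike, pos: number): Point {
  const line = doc.lineAt(pos);
  return { row: line.number - 1, column: pos - line.from };
}

// One (fromA, toA, fromB, toB) change -> one tree-sitter style edit.
// Assumption: earlier changes in the same batch have already been applied to the
// tree, so the edit starts at fromB and the old end is fromB plus the old length.
function toTreeEdit(
  oldDoc: DocLike, newDoc: DocLike,
  fromA: number, toA: number, fromB: number, toB: number,
): TreeEdit {
  const startOld = toPoint(oldDoc, fromA);
  const endOld = toPoint(oldDoc, toA);
  const start = toPoint(newDoc, fromB);
  const rowDelta = endOld.row - startOld.row;
  return {
    startIndex: fromB,
    oldEndIndex: fromB + (toA - fromA),
    newEndIndex: toB,
    startPosition: start,
    oldEndPosition: {
      row: start.row + rowDelta,
      // Same row: shift along the line; otherwise the old end column is unchanged.
      column: rowDelta === 0 ? start.column + (toA - fromA) : endOld.column,
    },
    newEndPosition: toPoint(newDoc, toB),
  };
}
```

With the real libraries, the four offsets would come from the change callback and the documents from the old and new editor state; the sketch only shows that every row/column lookup has to go back through the document.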
At the same time, tree-sitter’s parse method could read directly from the array of Lines in a Document’s Text, rather than first concatenating them into a string and then parsing the result. Presumably this could work together with row-aware edits, so that only the lines that have changed are read from the Document.
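This is roughly what I mean by reading from lines, modelled on the callback-style input that the tree-sitter web binding accepts in place of a string. `lineInput` is a hypothetical helper of mine, and the plain `lines` array is only a stand-in for however the lines would really be pulled out of the Document's Text. I'm assuming the callback receives a row/column position and that returning null signals the end of input.

```typescript
type Point = { row: number; column: number };
type InputCallback = (index: number, position: Point) => string | null;

// Build an input callback that serves text one line at a time,
// without ever joining the lines into a single string.
function lineInput(lines: readonly string[]): InputCallback {
  return (_index, position) => {
    if (position.row >= lines.length) return null; // past the last line: end of input
    const rest = lines[position.row].slice(position.column);
    const isLast = position.row === lines.length - 1;
    // Line breaks are not stored in the lines, so add one back for every line but the last.
    return isLast ? rest : rest + "\n";
  };
}
```

The callback would then be handed to parse together with the previously edited tree, so in principle only the rows the parser actually asks for get read.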
My questions are:
- is there a way to get row-aware changes from CodeMirror, so that tree-sitter can read just the lines that have been edited on a subsequent parse?
- why does CodeMirror’s edit callback include both fromA and fromB, when they always seem to be the same value (tree-sitter just has startIndex)? Is there a scenario where they could be different?
- is this touching on some limitation of tree-sitter in a browser context?