I’m trying to build a generic autocompletion module that works independently of language modes, and my idea here is to:
1. Receive ViewUpdate events from the editor.
2. Identify the new tokens from the changed part.
Since tokenizing the entire document on every change is likely to be a problem, I would like to identify the changed content, and update the list of tokens to be suggested. The latter part is on me to implement; my question is about the former.
Playing around for a bit, I can see that there is a ViewUpdate.changes which contains a list of insertions, but there doesn’t seem to be a way to identify deletions. So, my questions are:
1. Am I thinking correctly about how a generic autocompletion may be implemented? Or is there another API I should be looking at to build this feature?
2. Assuming the answer to (1) is yes, how would I go about identifying deletions and retrieving the changed line efficiently?
This is what incremental parsing (as done by CodeMirror language modes) intends to solve. If you can use the syntax tree that was already built up by the library, you don’t have to worry about this entire problem.
Change sets also identify deletions (changes that have fromA < toA delete content).
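To make the fromA < toA rule concrete, here is a minimal sketch that walks a change set with the documented iterChanges method and labels each change. The ChangeIterator interface, the classifyChanges helper and the "insert" / "delete" / "replace" labels are hypothetical names introduced for illustration; the interface is only a structural stand-in that a real ChangeSet from @codemirror/state should satisfy, so the sketch runs without the package installed.

```typescript
// Structural stand-in (an assumption for this sketch) for the documented
// ChangeSet.iterChanges method from @codemirror/state.
interface ChangeIterator {
  iterChanges(
    f: (
      fromA: number, toA: number, // range in the old document
      fromB: number, toB: number, // range in the new document
      inserted: { length: number }
    ) => void
  ): void;
}

type ChangeKind = "insert" | "delete" | "replace";

interface ClassifiedChange {
  kind: ChangeKind;
  fromA: number;
  toA: number;
  fromB: number;
  toB: number;
}

// Labels every changed range: fromA < toA means old content was removed,
// a non-empty `inserted` means new content was added.
function classifyChanges(changes: ChangeIterator): ClassifiedChange[] {
  const result: ClassifiedChange[] = [];
  changes.iterChanges((fromA, toA, fromB, toB, inserted) => {
    const removed = fromA < toA;
    const added = inserted.length > 0;
    const kind: ChangeKind =
      removed && added ? "replace" : removed ? "delete" : "insert";
    result.push({ kind, fromA, toA, fromB, toB });
  });
  return result;
}
```

In an update listener this would presumably be called as classifyChanges(update.changes).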
Thank you. The positions in the changeset seem to be character offsets into the entire document. Is there a way to map those offsets to a specific line efficiently, so that I don’t have to look through all the lines?
Apologies for the many questions, I’m a bit lost in the docs as to how to achieve the things you mentioned…
I noticed the {from/to}{A/B} properties, but the only way I see to map a position to a line number would be to iterate over EditorState.doc.text and find out the relevant line. Is there a more efficient way to do this? I’d need this to identify token boundaries in the line and add or remove the tokens from the autocompletions list that I manage manually.
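For the position-to-line question, the document object (Text in @codemirror/state) has a documented lineAt(pos) method that returns a line with from, to, number and text, so iterating over every line should not be needed. Below is a minimal sketch that collects only the lines touched by a range. LineLike, DocLike and linesInRange are hypothetical names; the two interfaces are structural stand-ins for Line and Text so the sketch runs on its own.

```typescript
// Structural stand-ins (assumptions for this sketch) for Line and Text
// from @codemirror/state.
interface LineLike {
  from: number;   // offset of the line start
  to: number;     // offset of the line end, before the line break
  number: number; // 1-based line number
  text: string;
}
interface DocLike {
  lineAt(pos: number): LineLike;
}

// Returns the lines overlapping [from, to]. Positions must lie inside
// the document; lineAt is expected to throw otherwise.
function linesInRange(doc: DocLike, from: number, to: number): LineLike[] {
  const lines: LineLike[] = [];
  let pos = from;
  for (;;) {
    const line = doc.lineAt(pos);
    lines.push(line);
    if (line.to >= to) break;
    pos = line.to + 1; // step over the line break to the next line
  }
  return lines;
}
```

With a real editor this would presumably be called as linesInRange(update.state.doc, fromB, toB) for each changed range, using the new-document positions.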
I understand that I can get the syntax tree by using syntaxTree(editor.state), but is there any example that shows how I can get the individual tokens? The tree object seems to use packed integers for representing its children, and TreeCursor seems to be specific to Lezer parsers, and I think it won’t apply to simple/legacy modes like the Rust one?
The Rust mode in @codemirror/lang-rust is also a Lezer parser. But even a legacy mode builds up a tree using the data structure defined in @lezer/common, and supports the same operations (though it will be shallower, only listing tokens and no larger structures).
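As a rough illustration of getting tokens out of the tree, the sketch below uses the iterate method of the tree type from @lezer/common and keeps every node inside which no child was entered. It assumes a recent @lezer/common in which the enter and leave callbacks receive a node reference with name, from and to (older releases passed different arguments). TreeLike, NodeRefLike, Token and leafTokens are hypothetical names; TreeLike is a structural stand-in for the object returned by syntaxTree(state).

```typescript
// Structural stand-in (an assumption for this sketch) for Tree.iterate
// from @lezer/common.
interface NodeRefLike {
  name: string;
  from: number;
  to: number;
}
interface TreeLike {
  iterate(spec: {
    from?: number;
    to?: number;
    enter(node: NodeRefLike): boolean | void;
    leave?(node: NodeRefLike): void;
  }): void;
}

interface Token {
  name: string;
  from: number;
  to: number;
}

// Collects nodes overlapping [from, to] that had no child entered during
// the walk. Caveat: a parent whose children all fall outside the range
// is also reported, so callers may want to filter by node name.
function leafTokens(tree: TreeLike, from: number, to: number): Token[] {
  const tokens: Token[] = [];
  const hasChild: boolean[] = []; // one flag per currently open node
  tree.iterate({
    from,
    to,
    enter() {
      if (hasChild.length > 0) hasChild[hasChild.length - 1] = true;
      hasChild.push(false);
    },
    leave(node) {
      if (!hasChild.pop()) {
        tokens.push({ name: node.name, from: node.from, to: node.to });
      }
    },
  });
  return tokens;
}
```

For a legacy mode, where the tree only lists tokens, most nodes below the top node would come out as leaves; the token text itself can then be read from the document with the from and to offsets.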
I can identify the changed sections using ViewUpdate.changedRanges, but what would be the most efficient way to detect the changed text?
The easiest way would be to do editor.state.doc.slice(x, y) but I assume it uses slicing under the hood. There’s also ViewUpdate.changes.inserted which does seem to have the changed text, but I have a few concerns:
For insertions/replacements, I always see an additional TextLeaf object of zero-length for each change, is this expected?
Can I assume that the order of ViewUpdate.changedRanges and ViewUpdate.changes.inserted are the same? For example, if I have a ViewUpdate object that looks like this:
Please don’t go digging through objects for undocumented stuff. That can change and will not provide a usable, stable interface. The docs tell you what you can do with a ChangeSet object, and that’s what you will have to work with.
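Staying within the documented interface, one way to get at the changed text is a sketch like the following: iterChanges hands over the inserted content for each change, and the removed text can be read from the old document with sliceString. ChangeSource, InsertedText, SliceableDoc, TextChange and changedText are hypothetical names; the interfaces are structural stand-ins for ChangeSet and Text from @codemirror/state, not the library's own types.

```typescript
// Structural stand-ins (assumptions for this sketch) for ChangeSet and
// Text from @codemirror/state.
interface InsertedText {
  length: number;
  toString(): string;
}
interface ChangeSource {
  iterChanges(
    f: (fromA: number, toA: number, fromB: number, toB: number, inserted: InsertedText) => void
  ): void;
}
interface SliceableDoc {
  sliceString(from: number, to?: number): string;
}

interface TextChange {
  fromA: number;    // where the change starts in the old document
  fromB: number;    // where the change starts in the new document
  removed: string;  // text deleted from the old document ("" if none)
  inserted: string; // text added to the new document ("" if none)
}

// Pairs each change with the text it removed and the text it inserted.
// `oldDoc` must be the document from before the update.
function changedText(changes: ChangeSource, oldDoc: SliceableDoc): TextChange[] {
  const result: TextChange[] = [];
  changes.iterChanges((fromA, toA, fromB, _toB, inserted) => {
    result.push({
      fromA,
      fromB,
      removed: oldDoc.sliceString(fromA, toA),
      inserted: inserted.toString(),
    });
  });
  return result;
}
```

In an update listener this would presumably be called as changedText(update.changes, update.startState.doc), since the A positions refer to the document before the update.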