Async Streaming Lint / Autocomplete

Hi!

I’ve built a streaming parser / validator, which I would like to use both for linting and for providing autocompletion in a very long file.

Ideally, I would like to have a loop that runs when the editor is idle. It would:

  • Parse ~50 lines or so
  • Add decorations for any syntax highlighting
  • Report any errors to the lint source
  • Provide autocompletion results if it has just parsed the cursor position, and if there are results to provide
  • Save a resumable “checkpoint” of its state
  • When the editor is next idle, repeat starting from the last saved checkpoint
  • On transactions that change the document, discard any checkpoints after the transaction start
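Concretely, I imagine something like this rough sketch (`Checkpoint`, `ChunkResult`, and `parseChunk` are just stand-ins for my own parser, not a real API):

```ts
// Stand-ins for my streaming parser / validator, not a real API.
interface Checkpoint { line: number; parserState: unknown }
interface ChunkResult { checkpoint: Checkpoint; done: boolean }
declare function parseChunk(doc: string, from: Checkpoint, lines: number): ChunkResult

let checkpoint: Checkpoint = { line: 1, parserState: null }
let handle: number | null = null

function scheduleIdleWork(doc: string) {
  if (handle == null)
    handle = requestIdleCallback(() => { handle = null; step(doc) })
}

function step(doc: string) {
  // Parse ~50 lines, report highlighting / errors / completions from the
  // result, save the resumable checkpoint, and reschedule if not done.
  const result = parseChunk(doc, checkpoint, 50)
  checkpoint = result.checkpoint
  if (!result.done) scheduleIdleWork(doc)
}
```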

I can run these pieces separately:

  • I can create a StreamLanguage for the syntax highlighting
  • I can create an async Lint source for linting
  • I can create an async autocompletion source for autocompletion

But it’s a bit sad that I can’t reuse work between the three.
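For reference, this is roughly how the three separate pieces look at the moment (simplified; `myStreamParser`, `lintDocument`, and `completeAt` are stand-ins for my parser / validator):

```ts
import { StreamLanguage, StreamParser } from "@codemirror/language"
import { linter, Diagnostic } from "@codemirror/lint"
import { autocompletion, CompletionContext, CompletionResult } from "@codemirror/autocomplete"

// Stand-ins for my parser; each extension ends up re-parsing the document.
declare const myStreamParser: StreamParser<unknown>
declare function lintDocument(doc: string): Promise<Diagnostic[]>
declare function completeAt(doc: string, pos: number): Promise<CompletionResult | null>

const extensions = [
  // 1. Syntax highlighting via a stream language
  StreamLanguage.define(myStreamParser),
  // 2. An async lint source, which parses the whole document again
  linter(async view => lintDocument(view.state.doc.toString())),
  // 3. An async completion source, which parses again up to the cursor
  autocompletion({
    override: [async (ctx: CompletionContext) => completeAt(ctx.state.doc.toString(), ctx.pos)]
  })
]
```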

Also, it doesn’t look like the lint source can return multiple times, so I would have to either return on the first linting error, or wait until the entire document has been linted before returning any errors.

Even more problematically for this approach, it doesn’t look like I can store my “checkpoints” state in the lint source / autocompletion source.

Would the “correct” approach here be to:

  • run one copy of the parser as a stream language
  • run a second copy in a state field, which implements the “loop” by running transactions that update itself
  • have the lint source and autocompletion source just pull from the state field
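
Roughly like this sketch, I mean (the `Checkpoint` shape and `parseNextChunk` are placeholders for my parser):

```ts
import { StateField, StateEffect, Text } from "@codemirror/state"
import { EditorView, ViewPlugin, ViewUpdate } from "@codemirror/view"

// Placeholder checkpoint shape and parse step for my streaming parser.
interface Checkpoint { pos: number; parserState: unknown }
declare function parseNextChunk(doc: Text, from?: Checkpoint): Checkpoint | null

// Effect the idle loop uses to record a newly reached checkpoint.
const addCheckpoint = StateEffect.define<Checkpoint>()

// State field holding the checkpoints reached so far.
const checkpoints = StateField.define<Checkpoint[]>({
  create: () => [],
  update(value, tr) {
    if (tr.docChanged) {
      // Discard checkpoints at or after the start of the change.
      let changeStart = tr.startState.doc.length
      tr.changes.iterChangedRanges(fromA => { changeStart = Math.min(changeStart, fromA) })
      value = value.filter(cp => cp.pos < changeStart)
    }
    for (const e of tr.effects) if (e.is(addCheckpoint)) value = value.concat(e.value)
    return value
  }
})

// Second copy of the parser, driven during idle time by a view plugin that
// dispatches transactions updating the state field.
const idleWorker = ViewPlugin.fromClass(class {
  handle: number | null = null
  constructor(readonly view: EditorView) { this.schedule() }
  update(u: ViewUpdate) { if (u.docChanged) this.schedule() }
  destroy() { if (this.handle != null) cancelIdleCallback(this.handle) }
  schedule() {
    if (this.handle == null)
      this.handle = requestIdleCallback(() => { this.handle = null; this.work() })
  }
  work() {
    const cps = this.view.state.field(checkpoints)
    const next = parseNextChunk(this.view.state.doc, cps[cps.length - 1]) // ~50 lines
    if (next) {
      this.view.dispatch({ effects: addCheckpoint.of(next) })
      this.schedule()
    }
  }
})
```

Both `checkpoints` and `idleWorker` would then go into the editor’s extensions, and the lint / autocompletion sources would only read `state.field(checkpoints)`.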

Thanks!

Using setDiagnostics (instead of a regular lint source) to manage your linting should make updating the diagnostics from a state field rather straightforward. For completions, yes, it does make sense to store some kind of representation of the program in a state field and query that in the completion source function.
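For example, something along these lines (just a sketch; the `analysis` field and its shape are hypothetical, standing in for whatever representation your parser builds):

```ts
import { StateField } from "@codemirror/state"
import { EditorView } from "@codemirror/view"
import { setDiagnostics, Diagnostic } from "@codemirror/lint"
import { CompletionContext, CompletionResult } from "@codemirror/autocomplete"

// Hypothetical field holding whatever your parser builds incrementally
// (checkpoints, symbols, errors collected so far, ...).
declare const analysis: StateField<{
  diagnostics: Diagnostic[]
  completionsAt(pos: number): CompletionResult | null
}>

// Whenever the idle loop has made progress, push the errors found so far.
// setDiagnostics replaces the current set, so it can be called repeatedly.
function publishDiagnostics(view: EditorView) {
  view.dispatch(setDiagnostics(view.state, view.state.field(analysis).diagnostics))
}

// The completion source only queries the already-computed representation,
// so it stays cheap and synchronous.
function completionSource(context: CompletionContext): CompletionResult | null {
  return context.state.field(analysis).completionsAt(context.pos)
}
```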

If possible, using a Lezer parser and having the other two processes use the syntax tree it produces might avoid duplicated work, but if you’re using a stream parser, that can only emit tokens, not a structured tree, so it might not be much use for that purpose.
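If you do go the Lezer route, both sources could work from the tree the language extension already maintains, roughly like this (sketch only; the error check and the `"Identifier"` case are placeholders):

```ts
import { syntaxTree } from "@codemirror/language"
import { Diagnostic } from "@codemirror/lint"
import { CompletionContext, CompletionResult } from "@codemirror/autocomplete"
import { EditorView } from "@codemirror/view"

// Lint source that walks the tree maintained by the Lezer-based language
// extension instead of re-parsing the document itself.
function lintFromTree(view: EditorView): Diagnostic[] {
  const diagnostics: Diagnostic[] = []
  syntaxTree(view.state).iterate({
    enter: node => {
      if (node.type.isError)
        diagnostics.push({ from: node.from, to: node.to, severity: "error", message: "Syntax error" })
    }
  })
  return diagnostics
}

// Completion source that resolves the node around the cursor in the same tree.
function completeFromTree(context: CompletionContext): CompletionResult | null {
  const node = syntaxTree(context.state).resolveInner(context.pos, -1)
  // Placeholder: real logic would look at node.name, node.parent, etc.
  return node.name == "Identifier" ? { from: node.from, options: [] } : null
}
```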
