Custom language implementation efficiency

Hi
We have built an ANTLR4 lexer/parser.
I integrated it with CodeMirror 5 by defining my own language mode.
I’m wondering if it’s the most efficient method.

This is the code:

const macroMode = {
    name: "macro",
    token: (stream: StringStream, state: any): string | null => {
        // tokensForLine returns an array of {text, token} objects for the parsed line.
        const tokens = tokensForLine(stream.string);
        for (const t of tokens) {
            // Try each token's text at the current stream position; on a match the
            // stream advances and we return the token type, which is used for styling.
            if (stream.match(t.text!)) {
                return "macro-"+t.token;
            }
        }

        stream.next();
        return null;
    },
};

Is it efficient to do it this way?
Any tips on how to improve it?
Thanks in advance.

That’s clearly quadratic (you’re re-tokenizing the entire line for every token) so no, that doesn’t look efficient.
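
One way to avoid the repeated work, as a rough sketch rather than a drop-in fix: tokenize each line once, when the stream is at the start of the line, keep the result in the mode state, and hand out one cached token per call. This assumes the lexer can report each token's start offset within the line. The `start` field, the `StreamLike` interface, and the regex-based `tokensForLine` stand-in below are hypothetical placeholders for the ANTLR-backed version.

```typescript
// Minimal subset of CodeMirror 5's StringStream that this sketch relies on.
interface StreamLike {
    string: string;   // the full text of the current line
    pos: number;      // current position within the line
    sol(): boolean;   // true when positioned at the start of the line
}

// Assumed token shape: the original {text, token} plus a start offset (assumption).
interface LineToken { text: string; token: string; start: number; }

interface MacroState { tokens: LineToken[]; index: number; }

// Hypothetical stand-in for the ANTLR-backed tokensForLine.
function tokensForLine(line: string): LineToken[] {
    const out: LineToken[] = [];
    const re = /[A-Za-z_]\w*|\d+/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(line)) !== null) {
        const kind = /^\d/.test(m[0]) ? "number" : "ident";
        out.push({ text: m[0], token: kind, start: m.index });
    }
    return out;
}

const cachedMacroMode = {
    name: "macro",
    startState: (): MacroState => ({ tokens: [], index: 0 }),
    token: (stream: StreamLike, state: MacroState): string | null => {
        // Lex the line once, on the first call for that line.
        if (stream.sol()) {
            state.tokens = tokensForLine(stream.string);
            state.index = 0;
        }
        // Skip cached tokens that end at or before the current position
        // (this also skips empty tokens, which could not advance the stream).
        let t = state.tokens[state.index];
        while (t !== undefined && t.start + t.text.length <= stream.pos) {
            state.index++;
            t = state.tokens[state.index];
        }
        if (t === undefined) {
            // No tokens left: consume the rest of the line unstyled.
            stream.pos = stream.string.length;
            return null;
        }
        if (t.start > stream.pos) {
            // Gap (e.g. whitespace) before the next token: consume it unstyled.
            stream.pos = t.start;
            return null;
        }
        // Consume the cached token and return its style.
        stream.pos = t.start + t.text.length;
        state.index++;
        return "macro-" + t.token;
    },
};
```

With this shape each line is lexed once, so the cost per line grows linearly with the number of tokens. It still lexes line by line, so a grammar with tokens that span several lines would need more state than this sketch keeps.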

I see.

Can you explain (or link to documentation that explains) how the “token” function is called? Is it always called after input? Is it supposed to return the token up to the cursor or the one after the cursor?
That part is not very clear to me.

Do you have any tips on how to make it more efficient? I suppose I need to implement a StringStream-like object for ANTLR so it can recognize tokens on the fly? What’s the minimal interface that I would need to implement?

See CodeMirror: User Manual
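
As a rough model of what the manual describes: the editor calls token repeatedly while highlighting each line, independent of where the cursor is. Each call starts where the previous token ended, must advance the stream by at least one character, and returns a style string or null. The driver below is a simplified illustration, not CodeMirror's actual code; `MiniStream`, `MiniMode`, `driveMode`, and `digitsMode` are made-up names.

```typescript
// Simplified stand-in for CodeMirror 5's StringStream (only what this sketch needs).
class MiniStream {
    string: string;
    pos = 0;    // where the next read happens
    start = 0;  // where the current token began
    constructor(line: string) { this.string = line; }
    eol(): boolean { return this.pos >= this.string.length; }
    sol(): boolean { return this.pos === 0; }
    next(): string | undefined {
        return this.pos < this.string.length ? this.string.charAt(this.pos++) : undefined;
    }
    current(): string { return this.string.slice(this.start, this.pos); }
}

interface MiniMode<S> {
    startState(): S;
    token(stream: MiniStream, state: S): string | null;
}

// Hypothetical driver modelling the calling contract, one line at a time.
function driveMode<S>(text: string, mode: MiniMode<S>): Array<[string, string | null]> {
    const out: Array<[string, string | null]> = [];
    const state = mode.startState(); // the state carries over from line to line
    for (const line of text.split("\n")) {
        const stream = new MiniStream(line);
        while (!stream.eol()) {
            const style = mode.token(stream, state);
            // A token call that consumes nothing would loop forever, so it is an error.
            if (stream.pos <= stream.start) {
                throw new Error("mode failed to advance stream");
            }
            out.push([stream.current(), style]);
            stream.start = stream.pos; // the next token begins where this one ended
        }
    }
    return out;
}

// Toy mode: styles single digits as "number", everything else unstyled.
const digitsMode: MiniMode<null> = {
    startState: () => null,
    token: (stream) => {
        const ch = stream.next();
        return ch !== undefined && ch >= "0" && ch <= "9" ? "number" : null;
    },
};
```

Calling `driveMode("a1\n2", digitsMode)` walks both lines from their first character, which is why a token function has no need to know about the cursor: it only has to read one token from wherever the stream currently stands.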
