Custom language implementation efficiency

kostyay · March 26, 2022, 6:09pm

Hi
We have an ANTLR4 lexer/parser that we have built.
I integrated it with CodeMirror 5 by defining my own language.
I’m wondering if it’s the most efficient method.

This is the code:

const macroMode = {
    name: "macro",
    token: (stream: StringStream, state: any): string | null => {
        // this function returns an array of [{text, token}] based on the parsed text.
        const tokens = tokensForLine(stream.string);
        for (const t of tokens) {
            // we iterate over the stream and match the token text to advance the stream
            // returning the token type that is used for the styling
            if (stream.match(t.text!)) {
                return "macro-"+t.token;
            }
        }

        stream.next();
        return null;
    },
};

Is it efficient to do it this way?
Any tips on how to improve this?
Thanks in advance.

marijn · March 26, 2022, 8:17pm

That’s clearly quadratic (you’re re-tokenizing the entire line for every token) so no, that doesn’t look efficient.
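
One way to avoid the repeated work, as a rough sketch rather than a definitive fix: tokenize once when the stream is at the start of a line, keep the result in the mode state, and walk through it with an index. Everything below other than the `token`/`startState` shape is an assumption for illustration: `LineStream` is a stand-in for the few `StringStream` methods used, `tokensForLine` is a regex placeholder for the ANTLR-backed tokenizer, and `FakeStream`/`highlightLine` only imitate the way the editor calls `token` until the line is consumed. Tokens that span multiple lines are not handled.

```typescript
// Stand-in for the parts of CodeMirror 5's StringStream used here (assumption).
interface LineStream {
  string: string;
  pos: number;
  sol(): boolean;
  eol(): boolean;
  next(): string | undefined;
  match(pattern: string): boolean;
}

interface LineToken { text: string; token: string; }
interface MacroState { tokens: LineToken[]; index: number; }

// Hypothetical placeholder for the ANTLR-backed tokenizer; counts its calls.
let tokenizeCalls = 0;
function tokensForLine(line: string): LineToken[] {
  tokenizeCalls++;
  return (line.match(/\S+/g) ?? []).map((text) => ({
    text,
    token: /^\d+$/.test(text) ? "number" : "word",
  }));
}

const macroMode = {
  name: "macro",
  startState: (): MacroState => ({ tokens: [], index: 0 }),
  token: (stream: LineStream, state: MacroState): string | null => {
    // Tokenize only once per line, when the stream is at its start.
    if (stream.sol()) {
      state.tokens = tokensForLine(stream.string);
      state.index = 0;
    }
    // Only the next unconsumed token can match at the current position.
    const t = state.tokens[state.index];
    if (t && stream.match(t.text)) {
      state.index++;
      return "macro-" + t.token;
    }
    // Text the lexer skipped (e.g. whitespace): advance one character, unstyled.
    stream.next();
    return null;
  },
};

// Tiny driver imitating the editor: call token() until the line is consumed.
class FakeStream implements LineStream {
  string: string;
  pos = 0;
  constructor(line: string) { this.string = line; }
  sol(): boolean { return this.pos === 0; }
  eol(): boolean { return this.pos >= this.string.length; }
  next(): string | undefined {
    return this.pos < this.string.length ? this.string.charAt(this.pos++) : undefined;
  }
  match(pattern: string): boolean {
    if (!this.string.startsWith(pattern, this.pos)) return false;
    this.pos += pattern.length;
    return true;
  }
}

function highlightLine(line: string): (string | null)[] {
  const stream = new FakeStream(line);
  const state = macroMode.startState();
  const styles: (string | null)[] = [];
  while (!stream.eol()) styles.push(macroMode.token(stream, state));
  return styles;
}
```

With this shape the tokenizer runs once per line and each `token` call tries a single `match`, so the work per line is roughly linear in its length. In a real mode the `LineStream` type would be replaced by CodeMirror's own `StringStream`.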

kostyay · March 27, 2022, 7:35am

I see.

Can you explain (or link to documentation that explains) how the “token” function is called? Is it always called after input? Is it supposed to return the token up to the cursor or after the cursor?
It’s not very clear.

Do you have any tips on how to make it more efficient? I suppose I need to implement a StringStream-like object for ANTLR so it can recognize tokens on the fly? What’s the minimal interface that I would need to implement?

marijn · March 27, 2022, 2:43pm

See CodeMirror: User Manual

RuslanZh · August 17, 2022, 1:29pm

If anyone is interested in integrating CodeMirror 6 with an ANTLR4 grammar,
you can find an example on the GitHub page.

The main idea is here:

token: (stream, state) => {
    // Tokenize the current line (stream.string) using the antlr4 TokenStream
    const tokens = getTokensForText(stream.string);
    // Find the first token that starts at or after the current position
    const nextToken = tokens.find(t => t.startIndex >= stream.pos);
    // If that token matches at the current position, consume it and style it
    if (nextToken && stream.match(nextToken.text)) {
        let valueClass = getStyleNameByTag(tags.keyword);

        switch (nextToken.type) {
            case ...:
                valueClass = getStyleNameByTag(tags.string);
                break;
            ...
            default:
                valueClass = getStyleNameByTag(tags.keyword);
                break;
        }

        return valueClass;
    } else {
        // Nothing matches here (e.g. whitespace): skip one character, unstyled
        stream.next();
        return null;
    }
}
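
Note that the snippet above still calls the tokenizer on every `token` call, which is the per-token re-tokenizing pointed out earlier in the thread. A possible mitigation, sketched here under assumptions, is to wrap the tokenizer so that repeated calls with the same line text reuse the previous result. `memoizeLast`, `PositionedToken`, `lexerRuns`, and the regex-based `getTokensForText` below are hypothetical stand-ins, not part of the linked example or of any library.

```typescript
// Wraps a tokenizer so that consecutive calls with identical text reuse the last result.
function memoizeLast<T>(tokenize: (text: string) => T): (text: string) => T {
  let lastText: string | null = null;
  let lastResult: T | undefined;
  return (text: string): T => {
    if (lastText !== text) {
      lastResult = tokenize(text);
      lastText = text;
    }
    return lastResult as T;
  };
}

// Hypothetical stand-in for the ANTLR-backed getTokensForText; counts lexer runs.
interface PositionedToken { text: string; type: number; startIndex: number; }
let lexerRuns = 0;
function getTokensForText(text: string): PositionedToken[] {
  lexerRuns++;
  const tokens: PositionedToken[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    tokens.push({ text: m[0], type: 1, startIndex: m.index });
  }
  return tokens;
}

// Drop-in replacement for the direct call inside token().
const getTokensCached = memoizeLast(getTokensForText);
```

Inside `token`, calling `getTokensCached(stream.string)` instead of `getTokensForText(stream.string)` would then run the lexer once per distinct line while the stream walks across it. The lookup for the next token is still a scan over the token list, so keeping an index in the mode state remains the more thorough fix.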