[custom mode] Any way to process the entire string in the editor before running token(stream, state)?

Hello, just wanted to say I love CM.

I have a weird situation: I have my own tokenizers that I run on the editor text, and the CM mode just follows along the already parsed content.

I used this to parse the string only once. state.parsed becomes an array, and every invocation of token() just shifts one item from the array.

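    token: function (stream, state) {
      // Re-parse only when the string handed to the mode has changed.
      if (stream.string !== state.parsedStr) {
        state.parsedStr = stream.string;
        state.parsed = utils.parseString(stream.string);
      }
      // token() must return the style for the text it consumed.
      return processToken(stream, state);
    }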

However, silly me didn’t think of pressing Enter for days. I now see that every line in the editor comes into this function independently (stream.string is only the current line, not the entire text).

Is there a lifecycle hook or something that could allow me to process the entire text in the editor before the token functions run? I thought startState might get the entire body, but I only see 2 undefined parameters passed to it.

I would appreciate any help or advice. The tokenizers I use are not “streaming”: they take a string and run to completion, and they can’t “continue” the way a CM mode does.

Thanks

It is usually less work (and less of a performance hazard) to implement a regular streaming CodeMirror mode than to try to build on top of a classical whole-file parser. But if you really want to do it this way, you can parameterize your mode with a full token stream, including token text, and have it step through that (keeping a counter in its state) as long as the token stream it has matches the content, and just start outputting null tokens when it runs into an outdated token. You can then set a debounced change event handler that parses the document and calls cm.setOption("mode", myModeWithTokens(tokenizeWholeFile(cm.getValue()))), starting a new highlight.
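
A minimal sketch of such a token-replaying mode, under some assumptions that are not from this thread: the whole-file tokenizer returns a flat array of `{ text, style }` objects in document order, the tokens cover every character except line breaks, and no token spans a line. The name `createReplayMode` and the mode name "replay" are hypothetical.

    // Sketch of a mode that replays a precomputed token list.
    // Assumption: `tokens` is a flat array of { text, style } objects in
    // document order, covering every character except line breaks.
    function createReplayMode(tokens) {
      return {
        startState() {
          // index: next token to emit; stale: set once text and tokens diverge.
          return { index: 0, stale: false };
        },
        token(stream, state) {
          const next = state.stale ? undefined : tokens[state.index];
          // stream.match(string) consumes the text only when it is present
          // at the current position.
          if (next && next.text.length > 0 && stream.match(next.text)) {
            state.index += 1;
            return next.style;
          }
          // Outdated or exhausted token list: emit the rest of the line
          // unstyled and stay in the stale state until the mode is replaced.
          state.stale = true;
          stream.skipToEnd();
          return null;
        }
      };
    }

    // Possible registration with CodeMirror 5 (shown as comments only):
    // CodeMirror.defineMode("replay", function (config, modeConfig) {
    //   return createReplayMode(modeConfig.tokens);
    // });
    // cm.setOption("mode", { name: "replay", tokens: tokenizeWholeFile(cm.getValue()) });

Because the state holds only a counter and a flag, CodeMirror's default state copying should be enough here. Empty lines need no special handling under the stated assumption, since they contribute no tokens.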

Thanks for the performance comment, I do see a potential issue with running many more regular expressions on every keystroke. I still think it’s a decent decision because seeing how our homegrown parsers work visually within CM has been priceless.

I think I understand: if I repeatedly call setOption("mode"), it will re-initialize the mode, allowing me to pass the parsed object through the config. That’s great and should solve my problem.
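
For reference, the re-initialization could be wired up roughly like this. It is only a sketch: `debounce` and `watchAndRetokenize` are hypothetical helpers, the 200 ms delay is arbitrary, and it assumes a mode registered under the hypothetical name "replay" that reads a `tokens` field from its mode spec. The only CodeMirror calls used are `cm.on("change", ...)`, `cm.getValue()` and `cm.setOption("mode", ...)`.

    // Hypothetical helper: run `fn` once, `delayMs` after the last call.
    function debounce(fn, delayMs) {
      let timer = null;
      function flush() {
        clearTimeout(timer);
        timer = null;
        fn();
      }
      function debounced() {
        clearTimeout(timer);
        timer = setTimeout(flush, delayMs);
      }
      debounced.flush = flush; // lets callers force an immediate run
      return debounced;
    }

    // Re-parse the whole document after edits settle and hand the fresh
    // tokens to the mode through its spec object.
    function watchAndRetokenize(cm, tokenizeWholeFile, delayMs = 200) {
      const reparse = debounce(function () {
        const tokens = tokenizeWholeFile(cm.getValue());
        // Setting the "mode" option re-initializes the mode and restarts
        // highlighting, even if the mode name is unchanged.
        cm.setOption("mode", { name: "replay", tokens: tokens });
      }, delayMs);
      cm.on("change", reparse);
      return reparse;
    }

Calling `watchAndRetokenize(cm, utils.parseString)` once after creating the editor would be the intended usage, if `utils.parseString` returns tokens in the shape the mode expects.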

> just start outputting null tokens when it runs into an outdated token

You mean because of the debouncing and the “indirect” tokenization?

Thanks

There is generally no guarantee that your event handlers will run before the next time the editor tries to draw something, or before someone calls getTokenAt or similar (which also goes through the mode), so you have to handle the mode being called on text that doesn’t match the token array it was initialized with.