Best way to define a lot of keywords for StreamParser

JerryI · April 1, 2023, 11:48pm

I am working on a tokenizer for the Mathematica language using the legacy (stream parser) approach.
The idea is to highlight built-in (standard library) functions and user-defined symbols. modes/mathematica.js already contains this check:

  // Literals like variables, keywords, functions
  if (stream.match(reIdInContext, true, false)) {
    return "function";
  }

I added my own check to match built-in symbols, like this:

var reKeywords = new RegExp(
  "StringQ|Null|NullQ|Do|Block|Module|With|While|Sqrt|Switch|Which|... a lot"
);
  // Literals like variables, keywords, functions
  if (stream.match(reKeywords, true, false)) {
    return "keyword";
  }

As I suspected, it is extremely slow.
Is there a reliable way to do this?

I checked the source code of StreamParser, and it seems I could simply do

let arrayOfFunctions = ["Table", "While", "Do", ... a lot];

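// A plain loop is used because `return` inside a forEach callback
// would only exit the callback, not the token function.
for (const key of arrayOfFunctions) {
  if (stream.match(key, true, false)) {
    return "keyword";
  }
}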

But the overhead of calling stream.match once per keyword is also quite big…

marijn · April 2, 2023, 7:22am

What you want to do is match identifiers, and then look up the resulting word (stream.current()) in tables for the various kinds of words you want to highlight specially (keywords, built-ins, …).
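
A minimal sketch of that approach, under stated assumptions: the identifier regex is a simplified stand-in for the mode's reIdInContext, the word lists are short placeholders rather than the real Mathematica tables, the "builtin" and "variable" style names are illustrative, and FakeStream is a hypothetical stand-in for CodeMirror's stream object so the snippet runs on its own. Only match, next, eol and current are imitated.

```javascript
// Lookup tables. Set membership is constant time on average, so the cost
// per token no longer grows with the number of keywords.
// Placeholder lists (assumption), not the full Mathematica symbol tables.
const keywords = new Set(["Do", "Block", "Module", "With", "While", "Switch", "Which"]);
const builtins = new Set(["Sqrt", "StringQ", "Table"]);

// Simplified identifier pattern (assumption). The ^ anchor makes the regex
// fail fast instead of scanning the rest of the line.
const reIdentifier = /^[A-Za-z$][A-Za-z0-9$]*/;

function token(stream) {
  if (stream.match(reIdentifier)) {
    // One regex match per identifier, then a table lookup on the whole word.
    const word = stream.current();
    if (keywords.has(word)) return "keyword";
    if (builtins.has(word)) return "builtin";
    return "variable";
  }
  stream.next(); // consume one character so the tokenizer always advances
  return null;
}

// Hypothetical stand-in for the stream object, only for trying the sketch
// outside an editor.
class FakeStream {
  constructor(string) { this.string = string; this.start = 0; this.pos = 0; }
  eol() { return this.pos >= this.string.length; }
  next() {
    if (this.pos < this.string.length) return this.string.charAt(this.pos++);
  }
  match(pattern) {
    const m = this.string.slice(this.pos).match(pattern);
    if (!m || m.index > 0) return null; // must match at the current position
    this.pos += m[0].length;
    return m;
  }
  current() { return this.string.slice(this.start, this.pos); }
}

// Drives token() over one line and collects [text, style] pairs.
function tokenize(line) {
  const stream = new FakeStream(line);
  const out = [];
  while (!stream.eol()) {
    stream.start = stream.pos;
    const style = token(stream);
    out.push([stream.current(), style]);
  }
  return out;
}
```

Because the lookup is done on the complete identifier, a user symbol such as Whiles is not mistaken for the keyword While, which an unanchored alternation like "Null|NullQ|While|..." cannot guarantee.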

JerryI · April 2, 2023, 12:05pm

It worked absolutely perfectly!
git

Maybe I will publish it on NPM as well.

The tokenizer also highlights symbols that are mentioned more than once in the code.
If I manage to implement visual editing of matrices/tables using Decorations, that feature will be included too.