simpleMode for syntax-highlighter: how to get tags

I’ve been following the example from this post, along with the documentation, to try to use a SimpleMode object to create a basic syntax highlighter for an unsupported language (ink). I’m now trying to figure out how to make Tag objects from my SimpleMode object. Tag objects are paired with HTML class names and passed into the HighlightStyle.define() method to create the highlighter.

This tutorial has been linked a few times. It explains how a SimpleMode object works, but not how to make use of it. The post I’ve been following (the first link) creates Tag objects in a non-standard way that doesn’t apply to my more general use case.

Can anyone help me understand how to create Tag objects that represent the rules of a SimpleMode object?


Below is the code I’ve been putting into CodeMirror’s “try” page. It no longer produces console errors, but it also doesn’t work: the comments in the editor get no special HTML classes.


```javascript
import { minimalSetup, EditorView } from "codemirror"
import { simpleMode } from "@codemirror/legacy-modes/mode/simple-mode";
import { HighlightStyle, LanguageSupport } from "@codemirror/language";
import { StreamLanguage, syntaxHighlighting } from "@codemirror/language";

const inkSimpleMode = simpleMode({
  start: [
    { regex: /\/\/.*/, token: 'comment' },
  ]
});
window.dev = { inkSimpleMode: inkSimpleMode }; // DEV

const inkStreamLanguage = StreamLanguage.define(inkSimpleMode);
window.dev.inkStreamLanguage = inkStreamLanguage; // DEV

export const inkHighlighter = syntaxHighlighting(
  HighlightStyle.define([
    {
////////////////////////////////////////
// WHAT TO PUT FOR NEXT LINE?
////////////////////////////////////////
      tag: 'comment',
      class: 'ink-comment'
    }
  ]));
window.dev.inkHighlighter = inkHighlighter; // DEV

const inkLanguage =
    new LanguageSupport(inkStreamLanguage, [inkHighlighter]);
window.dev.inkLanguage = inkLanguage; // DEV

new EditorView({
  parent: document.body,
  doc: `Hi // goodbye
left // right
tall // short`,
  extensions: [
    minimalSetup,
    inkLanguage,
    ////////////////////////////////////////
    // SHOULD I USE THE NEXT LINE INSTEAD OF THE PRIOR?
    ////////////////////////////////////////
    //syntaxHighlighting(inkLanguage, { fallback: true }),
  ],
})
```

You’re almost there. Import tags from @lezer/highlight and use tags.comment rather than the string 'comment'.
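Why does the string 'comment' fail silently instead of throwing an error? As far as I can tell from the @lezer/highlight source, the highlighter built by `HighlightStyle.define()` stores each class under the `id` of the `Tag` object it was given. A plain string has no `id`, so nothing ever looks it up. The sketch below is a simplified, self-contained model of that behaviour. `makeTag` and `defineStyle` are hypothetical stand-ins, not CodeMirror's API.

```javascript
// Hypothetical stand-in for a Tag: each tag gets its own numeric id.
let nextTagId = 0;
function makeTag(name) {
  return { id: nextTagId++, name };
}

// Hypothetical stand-in for HighlightStyle.define(): maps tag ids to CSS classes.
function defineStyle(specs) {
  const classById = {};
  for (const spec of specs) {
    // A string has no `id` property, so its class ends up under the key "undefined",
    // where no real tag's id will ever find it.
    classById[spec.tag.id] = spec.class;
  }
  return tag => classById[tag.id] ?? null;
}

const commentTag = makeTag("comment"); // plays the role of tags.comment

const broken = defineStyle([{ tag: "comment", class: "ink-comment" }]);
const fixed = defineStyle([{ tag: commentTag, class: "ink-comment" }]);

console.log(broken(commentTag)); // null
console.log(fixed(commentTag));  // ink-comment
```

The same identity rule would explain why a freshly created tag with the same name does not match either. The tag in the style has to be the very object the parser emits.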

It works! sigh Thank you very much.

One more query, if you don’t mind. It’s a 2-parter.

Your answer works for pre-existing tags (like “comment”). The language I’m highlighting requires a lot of custom tags. I’ve been able to get custom tags working by modifying the code as outlined below. My question: are these modifications the proper way to handle custom tags? Specifically, I’ve attached my custom tags to a custom field on the simpleMode object. Is this the expected place to store them?
EDIT: I tried defining the Tags directly within the HighlightStyle.define() call, and it didn’t work.

Also, the language uses “#” to start a “lineTags” rule, but it allows escaping “#” (“\#”) to avoid the rule. My understanding is that lookbehind is disabled. Because of that, my escape-character check below only matches a “#” that is preceded by a throwaway character other than a backslash (such as a space). My question: is what I’ve done the best way to handle escape characters when using simpleMode objects?
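The snippet below makes the drawbacks of the `[^\\]#` approach concrete in plain JavaScript. It assumes that, as in CodeMirror's `StringStream.match()`, a rule's regex is tried against the rest of the line sliced from the current position, and counts only if it matches right there. `matchAt` is a hypothetical helper that models this. The last case shows why a lookbehind does not help under that assumption: the backslash has already been sliced away.

```javascript
// Hypothetical helper: try `regex` on the text from `pos` onward and accept
// only a match that starts exactly at `pos`.
function matchAt(regex, line, pos) {
  const m = line.slice(pos).match(regex);
  return m && m.index === 0 ? m[0] : null;
}

const lineTagsRule = /[^\\]#.*/;

// A tag at the very start of a line has no preceding character, so it is missed.
console.log(JSON.stringify(matchAt(lineTagsRule, "#todo", 0))); // null

// With a preceding space the rule matches, but the space becomes part of the token.
console.log(JSON.stringify(matchAt(lineTagsRule, "text #todo", 4))); // " #todo"

// An escaped pound sign (the text: text \#todo) is skipped as intended.
console.log(JSON.stringify(matchAt(lineTagsRule, "text \\#todo", 5))); // null

// A lookbehind cannot see the backslash once the line has been sliced past it.
console.log(JSON.stringify(matchAt(/(?<!\\)#.*/, "text \\#todo", 6))); // "#todo"
```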

1. I imported Tag from @lezer/highlight:
   ```javascript
   import { Tag } from "@lezer/highlight";
   ```
2. I added a new language-specific rule:
   ```javascript
   const inkSimpleMode = simpleMode({
     start: [
       { regex: /\/\/.*/, token: 'comment' },
       { regex: /[^\\]#.*/, token: 'lineTags' },
     ]
   });
   ```
3. I created a new Tag object for each rule in the simpleMode object and attached them to the simpleMode object. (This is mainly what I’m asking about.)
   ```javascript
   inkSimpleMode.tokenTable = { comment: Tag.define(), lineTags: Tag.define() };
   ```
4. I used the tags I created in step 3 in the HighlightStyle.define() call:
   ```javascript
   export const inkHighlighter = syntaxHighlighting(
     HighlightStyle.define([
       { tag: inkSimpleMode.tokenTable.comment, class: 'ink-comment' },
       { tag: inkSimpleMode.tokenTable.lineTags, class: 'ink-lineTags' },
     ]));
   ```

I have an answer to my first question:
I had assumed that, since “tokenTable” is created and added to the simpleMode object by hand, it was just a convenient place to store custom Tag objects. However, experimentation has shown that it is not only convenient but necessary. It MUST also be in place before the simpleMode object is passed into StreamLanguage.define(), or the highlighting doesn’t work.

I later found this thread, where CodeMirror was patched to add support for an optional “tokenTable” on “streamParser” objects. It looks like the “simpleMode” function generates a kind of “streamParser” object, so this all fits.
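The requirement that tokenTable be set before StreamLanguage.define() runs fits a language that copies the parser's settings once, at definition time, and does not re-read them later. The sketch below is a hypothetical model of just that ordering effect, not CodeMirror's code. `defineLanguage` stands in for StreamLanguage.define(), and plain objects stand in for Tag.define().

```javascript
// Hypothetical stand-in for StreamLanguage.define(): it copies the parser's
// tokenTable once, when the language is defined, and only consults that copy later.
function defineLanguage(parser) {
  const table = { ...(parser.tokenTable || {}) };
  return { tagFor: tokenName => table[tokenName] ?? null };
}

const lineTagsTag = { name: "lineTags" }; // plays the role of Tag.define()

// Wrong order: the table is attached after the language was defined.
const lateParser = {};
const lateLanguage = defineLanguage(lateParser);
lateParser.tokenTable = { lineTags: lineTagsTag };

// Right order: the table is attached first.
const earlyParser = { tokenTable: { lineTags: lineTagsTag } };
const earlyLanguage = defineLanguage(earlyParser);

console.log(lateLanguage.tagFor("lineTags") === lineTagsTag);  // false
console.log(earlyLanguage.tagFor("lineTags") === lineTagsTag); // true
```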

Make some other token match escaped characters, so that they are consumed before other tokens are matched against them.

Ah, of course. Thank you again.

So like this:

I added “token_pound” to the simpleMode, but not to the tokenTable or the HighlightStyle.define() call.

```javascript
const inkSimpleMode = simpleMode({
  start: [
    { regex: /\/\/.*/, token: 'comment' },
    { regex: /\\#.*/, token: 'token_pound' },
    { regex: /#.*/, token: 'lineTags' },
  ]
});
```
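One thing to watch with this version: the escape rule /\\#.*/ ends in `.*`, so it consumes everything up to the end of the line. A real “#” tag later on the same line would therefore be styled as token_pound too. The plain-regex check below uses a made-up sample line to show the difference from matching only the two-character escape.

```javascript
const line = "Price \\#1 item #sale"; // the text: Price \#1 item #sale

// The escape rule from the snippet above swallows the rest of the line...
const escapeToEol = /\\#.*/;
console.log(escapeToEol.exec(line)[0]); // \#1 item #sale

// ...whereas matching only the escape itself leaves "#sale" for later rules.
const escapeOnly = /\\#/;
console.log(escapeOnly.exec(line)[0]); // \#
```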

(Note: I’m being extra-explicit for posterity. I hope this is helpful to others.)

Actually, “token_pound” needs to be in the tokenTable as well, just not in HighlightStyle.define(). Otherwise there are console warnings whenever the rule is used.

I believe you can give a token a null ("") tag if you don’t want to highlight it. Also, depending on the language, a generic ‘escaped character’ token might make more sense than a specific one per character.
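The behaviour described in the last two posts can be roughly modelled as follows:

- A null (or empty) token means no styling.
- A name found in the tokenTable, or among the built-in names, maps to a tag.
- Any other name produces a warning.

This is a hypothetical, simplified resolver written for illustration. `resolveToken`, `builtinTags` and the warning text are assumptions, not CodeMirror's code.

```javascript
// Stands in for the standard tags a stream language already knows (e.g. tags.comment).
const builtinTags = { comment: { name: "comment" } };

// Hypothetical model of turning a token name into a tag.
function resolveToken(tokenName, tokenTable, warnings) {
  if (!tokenName) return null; // null or "" token: no tag and no warning
  const tag = tokenTable[tokenName] ?? builtinTags[tokenName];
  if (!tag) {
    warnings.push(`Unknown token "${tokenName}"`); // models the console warning
    return null;
  }
  return tag;
}

const lineTagsTag = { name: "lineTags" }; // plays the role of Tag.define()
const tokenTable = { lineTags: lineTagsTag };
const warnings = [];

console.log(resolveToken("lineTags", tokenTable, warnings) === lineTagsTag); // true
console.log(resolveToken(null, tokenTable, warnings));                       // null
console.log(resolveToken("token_pound", tokenTable, warnings));              // null
console.log(warnings.length);                                                // 1
```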

Yes, that is much nicer. So the version below is the best: it is simple and universal (the first rule handles ALL escaped characters).

```javascript
const inkSimpleMode = simpleMode({
  start: [
    { regex: /\\./, token: null },
    { regex: /\/\/.*/, token: 'comment' },
    { regex: /#.*/, token: 'lineTags' },
  ]
});
```
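Why does putting /\\./ first work? The model below is a simplified, self-contained version of the per-line loop simpleMode appears to run. At the current position it tries the rules in order, takes the first one that matches there, and otherwise advances one character. The first rule therefore consumes “\#” before the lineTags rule ever sees the “#”. `tokenizeLine` is a hypothetical helper written for illustration. It ignores states, next/push/pop and other simpleMode features.

```javascript
// Rules in the same order as the final inkSimpleMode above.
const rules = [
  { regex: /\\./, token: null },
  { regex: /\/\/.*/, token: "comment" },
  { regex: /#.*/, token: "lineTags" },
];

// Hypothetical, simplified model of simpleMode's first-match loop over one line.
function tokenizeLine(line, rules) {
  const tokens = [];
  let pos = 0;
  while (pos < line.length) {
    let matched = false;
    for (const { regex, token } of rules) {
      // A sticky ("y") regex only matches at lastIndex, i.e. at the current position.
      const sticky = new RegExp(regex.source, "y");
      sticky.lastIndex = pos;
      const m = sticky.exec(line);
      if (m && m[0].length > 0) {
        tokens.push({ text: m[0], token });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) pos += 1; // no rule matched here: skip one unstyled character
  }
  return tokens.filter(t => t.token !== null); // keep only styled tokens
}

console.log(JSON.stringify(tokenizeLine("Use \\# for a literal pound sign", rules))); // []
console.log(JSON.stringify(tokenizeLine("Hello #greeting", rules)));
// [{"text":"#greeting","token":"lineTags"}]
```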