Markdown and LaTeX syntax highlighting

Hi,

I’m wondering what the recommended way is to get both Markdown and LaTeX syntax highlighting working simultaneously. I noticed there’s a function called parseMixed in the @lezer/common package that selects a different parser for particular nodes. I want text surrounded by single dollar signs (inline math mode) or double dollar signs (display math mode) to be highlighted as LaTeX, and everything outside as Markdown. Should I read the ranges of each node given by parseMixed and find matches with a regex, or should I extend @lezer/markdown to add new node types for these sections? Also, would I have to modify the Markdown table extension to allow LaTeX within tables?

Yes, write an extension for the markdown parser that recognizes dollar sign markup, and either directly integrate the math parsing in there, or use parseMixed to enable some kind of LaTeX parsing inside those nodes.
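As a rough sketch of such an extension, the following defines hypothetical `InlineMath` and `InlineMathMark` nodes for `$…$`. Unlike the Strikethrough extension, which registers delimiters with `cx.addDelimiter` and lets the other inline parsers run over the content, this version scans ahead for the closing `$` itself (similar in spirit to how inline code is handled), so Markdown syntax such as `_` is not interpreted inside the math. The node names and the backslash-escape rule are assumptions, not part of @lezer/markdown.

```typescript
import { tags as t } from "@lezer/highlight";
import { parser as baseParser, type MarkdownConfig } from "@lezer/markdown";

const DOLLAR = 36; // "$"
const BACKSLASH = 92; // "\"

// Hypothetical extension: `$...$` becomes an InlineMath node whose first and
// last characters are InlineMathMark nodes.
export const InlineMath: MarkdownConfig = {
  defineNodes: [
    { name: "InlineMath" },
    { name: "InlineMathMark", style: t.processingInstruction },
  ],
  parseInline: [{
    name: "InlineMath",
    parse(cx, next, pos) {
      // Only a lone "$" opens inline math; "$$" is left to other parsers.
      if (next !== DOLLAR || cx.char(pos + 1) === DOLLAR || cx.char(pos - 1) === DOLLAR) return -1;
      for (let i = pos + 1; i < cx.end; i++) {
        const ch = cx.char(i);
        if (ch === BACKSLASH) { i++; continue; } // skip escaped characters such as \$
        if (ch === DOLLAR) {
          // Returns the end position of the element, i.e. i + 1.
          return cx.addElement(cx.elt("InlineMath", pos, i + 1, [
            cx.elt("InlineMathMark", pos, pos + 1),
            cx.elt("InlineMathMark", i, i + 1),
          ]));
        }
      }
      return -1; // no closing "$": leave the text as it is
    },
  }],
};

// Usage: a Markdown parser with the extension enabled.
export const mathParser = baseParser.configure([InlineMath]);
```

A parseMixed wrapper can then be added (via the `wrap` field of a `MarkdownConfig`) to hand the content of these nodes to a LaTeX parser.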

I copied the code for the Strikethrough extension and made it work with dollar signs. I don’t know if I’m doing this correctly, but the mixed-parsing wrapper became simply this:

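import { parseMixed } from "@lezer/common";

// `parser` is the LaTeX parser to nest inside InlineMath nodes (defined elsewhere).
const latexWrapper = parseMixed((node, input) => {
    if (node.type.name === "InlineMath") {
        return { parser };
    }
    return null;
});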

The problem I’m running into now is that I would like to have a <div class="cm-math"> wrap the LaTeX elements, but I guess the parser replaces the “InlineMath” node with its own nodes. What’s the correct way of doing this?

Never mind. I just had to do:

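import { parseMixed } from "@lezer/common";

// With `overlay`, the InlineMath node stays in the Markdown tree and the
// LaTeX `parser` (defined elsewhere) is mounted over the given range.
const latexWrapper = parseMixed((node, input) => {
    if (node.type.name === "InlineMath") {
        return { parser, overlay: [{ from: node.from, to: node.to }] };
    }

    return null;
});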
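The overlay works because the `InlineMath` node stays in the tree. If you additionally want the whole math range wrapped in an element carrying the `cm-math` class, one option (a sketch, not necessarily the approach used here) is a view plugin that puts a mark decoration over every `InlineMath` node. Note that `Decoration.mark` wraps the text in inline `<span>` elements rather than a `<div>`; the node name is assumed to match the extension above.

```typescript
import { syntaxTree } from "@codemirror/language";
import { RangeSetBuilder } from "@codemirror/state";
import {
  Decoration, type DecorationSet, type EditorView, ViewPlugin, type ViewUpdate,
} from "@codemirror/view";

// Mark decoration that wraps each math range in <span class="cm-math">.
const mathMark = Decoration.mark({ class: "cm-math" });

// Collect decorations for every InlineMath node in the visible part of the document.
function buildMathDecorations(view: EditorView): DecorationSet {
  const builder = new RangeSetBuilder<Decoration>();
  let lastTo = -1; // avoids adding a node twice when it spans two visible ranges
  for (const { from, to } of view.visibleRanges) {
    syntaxTree(view.state).iterate({
      from,
      to,
      enter(node) {
        if (node.name !== "InlineMath") return;
        if (node.from >= lastTo) {
          builder.add(node.from, node.to, mathMark);
          lastTo = node.to;
        }
        return false; // no need to descend into the math node
      },
    });
  }
  return builder.finish();
}

export const mathHighlighter = ViewPlugin.fromClass(
  class {
    decorations: DecorationSet;
    constructor(view: EditorView) {
      this.decorations = buildMathDecorations(view);
    }
    update(update: ViewUpdate) {
      // Rebuild when the text, the viewport, or the parse tree changed.
      if (update.docChanged || update.viewportChanged ||
          syntaxTree(update.startState) !== syntaxTree(update.state)) {
        this.decorations = buildMathDecorations(update.view);
      }
    }
  },
  { decorations: (plugin) => plugin.decorations },
);
```

`mathHighlighter` would be added to the editor's extensions alongside the Markdown language; the `cm-math` class can then be styled with a theme or plain CSS.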

@marijn I was able to do the inline parser, but I’m having trouble parsing the latex blocks. This is what I got:

parse: (cx: BlockContext, line: Line) => {
    if (!line.text.startsWith("$$")) {
        return false;
    }

    const startFrom = line.pos;
    const startTo   = line.pos + 2;

    while (cx.nextLine()) {
        if (line.text.startsWith("$$")) {
            const mark = cx.elt(mathBlockMark, cx.lineStart, cx.lineStart + 2);
            const elt  = cx.elt(mathBlockNode, startFrom, startTo, [mark]);
            cx.addElement(elt);
            return true;
        }
    }

    return false;
}

Am I doing this correctly?

I’m not sure how this block math markup works, but that code looks a bit dubious: you’re returning false when there’s only a single line prefixed with $$, and creating separate mathBlockNode elements for every prefixed line beyond the first (but not for the first).
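Addressing those points, a block parser for math opened by a line starting with `$$` could look roughly like this sketch. It emits a single `BlockMath` element covering the whole range (with the opening and closing marks as children), accepts `$$2x+1$$` on one line as a complete block, and, like fenced code, lets an unclosed block run to the end of the document. The node names are hypothetical, and the sketch ignores blockquote/list container boundaries and does not let `$$` interrupt a running paragraph.

```typescript
import { tags as t } from "@lezer/highlight";
import { parser as baseParser, type BlockContext, type Line, type MarkdownConfig } from "@lezer/markdown";

// Hypothetical extension: a line starting with "$$" opens a BlockMath node that
// runs until the next "$$" (or to the end of the document if there is none).
export const BlockMath: MarkdownConfig = {
  defineNodes: [
    { name: "BlockMath", block: true },
    { name: "BlockMathMark", style: t.processingInstruction },
  ],
  parseBlock: [{
    name: "BlockMath",
    parse(cx: BlockContext, line: Line): boolean {
      if (line.text.slice(line.pos, line.pos + 2) !== "$$") return false;
      const from = cx.lineStart + line.pos;
      const marks = [cx.elt("BlockMathMark", from, from + 2)];
      let end = cx.lineStart + line.text.length;

      // Look for the closing "$$", first on the opening line, then on later lines.
      let close = line.text.indexOf("$$", line.pos + 2);
      while (close < 0 && cx.nextLine()) {
        end = cx.lineStart + line.text.length;
        close = line.text.indexOf("$$", line.pos);
      }
      if (close >= 0) {
        const closeFrom = cx.lineStart + close;
        marks.push(cx.elt("BlockMathMark", closeFrom, closeFrom + 2));
        end = closeFrom + 2;
        cx.nextLine(); // move past the closing line
      }
      // One element for the whole block, added once.
      cx.addElement(cx.elt("BlockMath", from, end, marks));
      return true;
    },
  }],
};

export const blockMathParser = baseParser.configure([BlockMath]);
```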

Yeah, that was a bit off, but I think I figured out how to do it.

  parse: (cx: BlockContext, line: Line) => {
      if (!line.text.startsWith("$$")) {
          return false;
      }

      const start = cx.lineStart;
      while (cx.nextLine()) {
          if (line.text.startsWith("$$")) {
              cx.addElement(cx.elt(mathBlockNode, start, cx.lineStart + 2));
              cx.nextLine();
              return true;
          }
      }
      return false;
  }

So, that ended up not working, lol. What I want to do is parse everything between the dollar signs (including the dollar signs themselves) as a MathBlock, regardless of where they are (I think this is how FencedCode is parsed). I guess doing this with parseBlock.parse won’t work, since the start of a block doesn’t have to be at the beginning of a line. Anyway, the following examples should all be valid:

Example 1

$$2x+1$$

Example 2

$$
2x+1$$

Example 3

test $$
2x+1
$$

Example 4

test $$
2x+1
$$

@marijn How do I go about doing this?

If these can occur in inline text then it looks like you’ll have to define an inline parser for them.
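Following that suggestion, here is a rough sketch of an inline parser for `$$…$$`. The inline content of a paragraph spans all of its lines (joined with line breaks), so the closing `$$` may sit on a later line of the same paragraph, which covers inputs like `test $$` / `2x+1` / `$$`. A blank line ends the paragraph, so the math cannot cross one. The `DisplayMath`/`DisplayMathMark` names are hypothetical.

```typescript
import { tags as t } from "@lezer/highlight";
import { parser as baseParser, type MarkdownConfig } from "@lezer/markdown";

const DOLLAR = 36; // "$"

// Hypothetical extension: "$$...$$" anywhere in inline content becomes a
// DisplayMath node, even when the delimiters are on different lines.
export const DisplayMath: MarkdownConfig = {
  defineNodes: [
    { name: "DisplayMath" },
    { name: "DisplayMathMark", style: t.processingInstruction },
  ],
  parseInline: [{
    name: "DisplayMath",
    parse(cx, next, pos) {
      if (next !== DOLLAR || cx.char(pos + 1) !== DOLLAR) return -1;
      // cx.slice takes document positions; the paragraph text includes its line breaks.
      const rel = cx.slice(pos + 2, cx.end).indexOf("$$");
      if (rel < 0) return -1; // unclosed: leave it as plain text
      const closeFrom = pos + 2 + rel;
      return cx.addElement(cx.elt("DisplayMath", pos, closeFrom + 2, [
        cx.elt("DisplayMathMark", pos, pos + 2),
        cx.elt("DisplayMathMark", closeFrom, closeFrom + 2),
      ]));
    },
  }],
};

export const displayMathParser = baseParser.configure([DisplayMath]);
```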

Hello there, I am also trying to do the same thing. I was wondering which parser you are using: are you using sTeX from the legacy modes?

I also managed to create a simple inline parser based on the Strikethrough extension, but I haven’t used the block parser, so unfortunately I may not be able to help you :frowning:.

This is similar to what was included above, but this is the parser I’m currently using to mark regions as InlineMath or BlockMath.

The variant of markdown the parser is for, however, only supports $$ at the beginning of a line and uses $ for inline math.

I haven’t tried using it yet, but it looks like a 3rd-party lezer parser for TeX exists (lezer-tex on npm).


A version that uses the sTeX parser can be found here.
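For reference, here is a sketch of how an sTeX legacy mode might be plugged in through `parseMixed`: `StreamLanguage.define(...)` turns a legacy mode into a language whose `parser` can serve as the nested parser. It assumes `@codemirror/legacy-modes` provides `stexMath` (sTeX started in math mode), and that some other extension defines math nodes whose first and last children are the dollar marks; the names in `mathNodes` are placeholders to adjust.

```typescript
import { markdown } from "@codemirror/lang-markdown";
import { StreamLanguage } from "@codemirror/language";
import { stexMath } from "@codemirror/legacy-modes/mode/stex";
import { parseMixed } from "@lezer/common";
import type { MarkdownExtension } from "@lezer/markdown";

// sTeX legacy mode, started in math mode, wrapped so it exposes a Lezer Parser.
const latexParser = StreamLanguage.define(stexMath).parser;

// Markdown language support where the text between the dollar marks of the
// given (hypothetical) math nodes is parsed as LaTeX via an overlay.
export function markdownWithLatex(
  mathExtensions: MarkdownExtension,
  mathNodes: ReadonlySet<string> = new Set(["InlineMath", "BlockMath"]),
) {
  const wrap = parseMixed((node) => {
    if (!mathNodes.has(node.name)) return null;
    // Assumes the opening/closing marks are the node's first/last children.
    const open = node.node.firstChild;
    const close = node.node.lastChild;
    const from = open ? open.to : node.from;
    // With a single mark (an unclosed block), parse up to the end of the node.
    const to = close && open && close.from > open.from ? close.from : node.to;
    return from < to ? { parser: latexParser, overlay: [{ from, to }] } : null;
  });
  return markdown({ extensions: [mathExtensions, { wrap }] });
}
```

The result is a `LanguageSupport` that goes into the editor's extensions in place of a plain `markdown()` call, with the argument being whatever extension defines the math nodes.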


Edit: This post originally contained a question about the usage of what I thought was an @internal constructor. I was confused. The original question is below:


This version of the parser does, however, use an @internal version of cx.elt(...):

I’m using this constructor to nest elements. Is there some other way I should be doing this?


BlockContext.elt is public. Could you elaborate on what is internal about this use?


Sorry! I was confused!

I was looking at the Element constructor:

(which I am not using).

Thanks a lot for sharing all this information. This allows us to activate highlighting of mathematical expressions in JupyterLab 4 (which switches to CodeMirror 6). Interested developers can have a look at:
