Custom language with syntax highlighting

IceTheAxeMan · July 4, 2023, 10:42pm

I am creating a custom language and am trying to get custom highlighting to work.
I currently have this in my language file generated by lezer-generator:

const PypelineLanguage = LRLanguage.define({
  parser: parser.configure({
    props: [
      indentNodeProp.add({
        Application: delimitedIndent({ closing: ")", align: false })
      }),
      foldNodeProp.add({
        Application: foldInside
      }),
      styleTags({
        ComparisonOperator: tags.operator,
        PlusMinus: tags.operator,
        StarObelus: tags.operator,
        NotQuoteChar: tags.string,
        String: tags.string,
        Digit: tags.number,
        TransformerName: tags.name,
        FunctionName: tags.name,
        True: tags.keyword,
        False: tags.keyword,
      })
    ]
  }),
  languageData: {
    commentTokens: { line: ";" }
  }
});

I am wiring it into the editor like this:

const newLang = new LanguageSupport(PypelineLanguage);
const editorState = EditorState.create({
      extensions: [
        keymap.of([
          ...defaultKeymap,
          ...searchKeymap,
          ...historyKeymap,
          ...lintKeymap,
        
          // NOTE: This keymap refers to the `tab` key, NOT tabs vs spaces.
          // NOTE: `indentWithTab` should be loaded after Emmet to ensure Emmet completions can take precedence
          // NOTE: Warn users about ESC + Tab https://codemirror.net/examples/tab/
          indentWithTab,
        ]),
        placeholder('Enter here'),
        EditorView.lineWrapping,
        // autocompletion({override: [myCompletions]}),
        autocompleteExtension,
        newLang,
        EditorView.updateListener.of((v) => {
          if (v.docChanged) {
            // Read the first line through the public Text API instead of the internal `text` array.
            setExpressionString(v.state.doc.line(1).text);
          }
        })
      ],
    });

    const view = new EditorView({
      state: editorState,
      parent: editorRef.current,
    });

The editor is working fine, but the highlighting isn’t working for any values. Any help would be greatly appreciated.

marijn · July 5, 2023, 8:31am

Is your parser working when you directly call its parse method? Are the node names in the styleTags call the nodes you are seeing in the tree (use .toString() to inspect it)?

IceTheAxeMan · July 5, 2023, 10:08pm

I’m new to CodeMirror, so I’m not sure exactly what you mean by calling its parse method, but I ran console.log(parser.parse()) and got this response.

The image shows that the values in the styleTags are also in this NodeSet. If this isn’t what you were looking for, I am willing to try anything you need. Thanks for the help.

IceTheAxeMan · July 6, 2023, 12:07am

So I realized that the comments in my grammar seemed to be causing problems, because after removing them I got an error when trying to build in the lang-example repo. I have since resolved that problem; this is the updated grammar:

@top BuiltExpression { Expression }

@tokens {
  ComparisonOperator { ">" | "<" | ">=" | "<=" | "==" | "=" | "!=" }
  PlusMinus { "+" | "-" }
  StarObelus { "*" | "/" }
  NotQuoteChar { $[^"]+ }
  String { "\"" NotQuoteChar* "\"" }
  Digit { '0' | $[1-9] }
  TransformerName { "min_max_scaler" | "standard_scale" }
  FunctionName { "if" | "and" | "or" | "not" | "rename" | "toUppercase" | "toLowercase" | "trim" | "contains" | "indexOf" | "lastIndexOf" | "replace" | "length" | "match" }
  True { "True" }
  False { "False" }
}

Expression {
  Expression "or" Comparison
  | Comparison
}

Comparison {
  Comparison ComparisonOperator Item
  | Item
}

Item {
  Item PlusMinus Term
  | Term
}

Term {
  Term StarObelus Factor
  | Factor
}

Factor {
  Value
  | Parentheses
}

Parentheses {
  "(" Expression ")"
}

Value {
  DataFrame
  | Column
  | Function
  | String
  | Number
  | Bool
}

Function {
  Transformer
  | RegularFunction
}

Transformer {
  TransformerName "(" Expression ("," Expression)* ")"
}

RegularFunction {
  FunctionName "(" Expression ("," Expression)* ")"
}

DataFrame {
  "[" String "," String "," String* "]"
}

Column {
  "[" String "]"
}

Number {
  Float
  | Integer
}

Bool {
  True
  | False
}

Float {
  Integer "." Integer
}

Integer {
  Digit+
}

@detectDelim

The problem now is that whenever I run a very basic test case:

# True

"True"

==>

Program(String)

# False

"False"

==>

Program(String)

I get this error:

Side note, it’s also not working in the project. I just don’t have anything to show there so I’m trying to show my build process. Thanks again for any help you can offer.

tamilselvanyes · July 6, 2023, 3:43pm

I am also trying to apply syntax highlighting to my SQL editor but am stuck on the same problem. Is there any update on this?

IceTheAxeMan · July 10, 2023, 2:07pm

@marijn I was wondering if you have any suggestions for this problem? Any help would really be appreciated.

marijn · July 10, 2023, 2:54pm

That’s not Lezer syntax. You want !["] instead.
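
For readers more used to regular expressions: in Lezer, $[...] is a character set and ![...] is an inverted one, so !["] plays the role that [^"] plays in a regex. The sketch below is only a JavaScript regex analogue of the NotQuoteChar and String token rules with that fix applied, not Lezer code, and the function name looksLikeStringToken is made up for illustration.

```javascript
// Rough regex analogue of the token rules (an illustration, not Lezer itself):
//   NotQuoteChar { !["] }                ~  [^"]
//   String { "\"" NotQuoteChar* "\"" }   ~  "[^"]*"
const stringToken = /^"[^"]*"$/;

// Hypothetical helper: true when the whole text has the shape of one String token.
function looksLikeStringToken(text) {
  return stringToken.test(text);
}

console.log(looksLikeStringToken('"True"')); // true
console.log(looksLikeStringToken('"a"b"'));  // false: a quote cannot appear inside
console.log(looksLikeStringToken('True'));   // false: no surrounding quotes
```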

IceTheAxeMan · July 10, 2023, 3:23pm

Thanks for catching that! Sadly, still getting a “No parse…” error for the true or false test.

I’ll have a run at the syntax again and see where there are mistakes. Thanks for the help.

IceTheAxeMan · July 13, 2023, 6:19am

@marijn I was able to fix the grammar and tested it in an online Lezer parser tester. The grammar had no hanging branches in the tree and the structure was what I expected. I then moved it into the lang-example repository and built a new JS file. When I copied that file into my project and imported it into the CodeMirror element, the syntax highlighting didn’t work at all. None of the tokens I set up were highlighted. Am I missing a step or did I do something incorrectly?

marijn · July 13, 2023, 7:02am

Did you add highlighting tags to your nodes?

IceTheAxeMan · July 13, 2023, 7:21am

This is what my file looks like:

import { LRParser } from '@lezer/lr';
import { styleTags, tags } from '@lezer/highlight';
import { LRLanguage, indentNodeProp, delimitedIndent, foldNodeProp, foldInside, LanguageSupport } from '@codemirror/language';
import { completeFromList } from "@codemirror/autocomplete"
import { snippets } from "./snippets";

// This file was generated by lezer-generator. You probably shouldn't edit it.
const parser = LRParser.deserialize({
  version: 14,
  states: "(QOQOPOOOlOPO'#CiOqOPO'#CkOvOPO'#CmOOOO'#Cj'#CjOOOO'#Cy'#CyO{OQO'#CqO!jOQO'#CoOOOO'#Co'#CoOOOO'#Cs'#CsOOOO'#Cf'#CfOQOPO'#CvOOOO'#Ce'#CeO#UOQO'#CaOOOO'#Cc'#CcO#mOQO'#C_O$ROQO'#C^Q$dOQOOO$iOPO,59TOQOPO,59VOQOPO,59XOOOO-E6w-E6wO$qOPO,59[O$vOQO,59bOQOPO,58}OQOPO,58{OQOPO,58yOQOPO,58xO%OOPO1G.mOOOO1G.o1G.oO%TOQO1G.qO%`OQO1G.sOOOO1G.v1G.vOOOO1G.|1G.|OOOO1G.i1G.iO%kOQO1G.gO&SOQO1G.eO&hOQO1G.dO&yOPO7+$XOQOPO'#CxO'OOPO7+$]OOOO7+$]7+$]O'WOPO7+$_OOOO7+$_7+$_O'`OPO<<GsO'hOQO,59dOOOO-E6v-E6vOOOO<<Gw<<GwOOOO<<Gy<<GyOOOO'#Cw'#CwO'sOPOAN=_OOOOAN=_AN=_OOOO-E6u-E6uOOOOG22yG22y",
  stateData: "'{~O[YO`QObROfTOhXOiXOpPOsZO~O[bO~OscO~OsdO~OfTOSeXUeXWeXneXoeXueXteXqeX~OufOScXUcXWcXncXocXtcXqcX~OWhOSTXUTXnTXoTXtTXqTX~OUiOSRXnRXoRXtRXqRX~OSjOnQXoQXtQXqQX~OokO~OqlOrmO~OfTO~OokOtqO~O[vO~OokOqwOtyO~OokOqwOt{O~OWhOSTiUTinTioTitTiqTi~OUiOSRinRioRitRiqRi~OSjOnQioQitQiqQi~Oq|O~OqwOt!PO~OqwOt!QO~O[!ROr!TO~OokOqlatla~O[!ROr!VO~O",
  goto: "${nPPo!OP!YP!eP!q#O#ZP#Z#Z#fP#fP#Z#q#|P#ZPP#O$[$b$lQaOQgZQncQodR}wY`OZcdwRuk[_OZcdkwRtj^]OZcdjkwRsi`^OZcdijkwRrhc[OZcdhijkwcYOZcdhijkwcSOZcdhijkwcWOZcdhijkwbVOZcdhijkwRpfQ!S|R!U!SQxnQzoT!OxzdUOZcdfhijkwReU",
  nodeNames: "⚠ BuiltExpression Expression Comparison ComparisonOperator Item PlusMinus Term StarObelus Factor Value DataFrame String Column Function Transformer TransformerName RegularFunction FunctionName Number Float Integer Digit Bool True False Parentheses",
  maxTerm: 37,
  skippedNodes: [0],
  repeatNodeCount: 3,
  tokenData: "0k~Rmqr!|rs#Xxy#myz#rz{#w{|#||}$R}!O#|!O!P$W!P!Q#w!Q!R$]!R![$]!^!_$b!_!`$b!`!a$b!h!i$j!v!w%X!}#O%p#P#Q%u#T#U%z#V#W&]#]#^'X#`#a(Q#a#b)]#b#c+c#c#d+o#f#g+|#g#h-R#h#i.s~#PP!_!`#S~#XOS~~#[Qrs#b#Q#R#X~#gQ[~rs#b#Q#R#X~#rOs~~#wOt~~#|OW~~$ROU~~$WOq~~$]Ou~~$bOf~~$gPS~!_!`#S~$mP#T#U$p~$sP#`#a$v~$yP#g#h$|~%PP#X#Y%S~%XOi~~%[P#f#g%_~%bP#i#j%e~%hP#X#Y%k~%pOh~~%uOp~~%zOr~P%}P#b#c&QP&TP#W#X&WP&]ObPP&`P#c#d&cP&fP#b#c&iP&lP#h#i&oP&rP#T#U&uP&xP#]#^&{P'OP#b#c'RP'UP#g#h&WP'[Q#Y#Z&W#b#c'bP'eP#W#X'hP'kP#X#Y'nP'qP#l#m'tP'wP!q!r'zP'}P#Y#Z&WP(TQ#T#U(Z#X#Y(sP(^P#g#h(aP(dP#h#i(gP(jP!k!l(mP(pP#b#c'bP(vP#b#c(yP(|P#Z#[)PP)SP#h#i)VP)YP#[#]&W~)`Q#T#U)f#]#^)rP)iP#h#i)lP)oP#V#W)V~)uP#b#c)x~){P#R#S*O~*RP#a#b*U~*XP#T#U*[~*_P#l#m*b~*eP#R#S*h~*kP#g#h*n~*qP#V#W*t~*wP#T#U*z~*}P#`#a+Q~+TP#X#Y+W~+ZP#f#g+^~+cO`~P+fP#c#d+iP+lP#h#i&WR+rP#f#g+uR+|OoQbPP,PP#X#Y,SP,VQ#b#c,]#d#e,oP,`P#T#U,cP,fP#a#b,iP,lP#X#Y&WP,rP#`#a,uP,xP#T#U,{P-OP#V#W,i~-UP#h#i-X~-[P#T#U-_~-bP#b#c-e~-hP#W#X-k~-nP#T#U-q~-tP#f#g-w~-zP#W#X-}~.QP#R#S.T~.WP#g#h.Z~.^P#V#W.a~.dP#T#U.g~.jP#`#a.m~.pP#X#Y+^P.vQ#c#d.|#f#g0_P/PQ!n!o/V!w!x0RP/YP#c#d/]P/`P#k#l/cP/fP#X#Y/iP/lP#f#g/oP/rP#V#W/uP/xP#T#U/{P0OP#g#h,iP0UP#d#e0XP0[P#d#e/cP0bP#]#^0eP0hP#a#b&W",
  tokenizers: [0, 1],
  topRules: {"BuiltExpression":[0,1]},
  tokenPrec: 0
});

const PypelineLanguage = LRLanguage.define({
  parser: parser.configure({
    props: [
      indentNodeProp.add({
        Application: delimitedIndent({ closing: ")", align: false })
      }),
      foldNodeProp.add({
        Application: foldInside
      }),
      styleTags({
        ComparisonOperator: tags.operator,
        PlusMinus: tags.operator,
        StarObelus: tags.operator,
        NotQuoteChar: tags.string,
        String: tags.string,
        Digit: tags.number,
        TransformerName: tags.name,
        FunctionName: tags.name,
        True: tags.keyword,
        False: tags.keyword,
      })
    ]
  }),
  languageData: {
    commentTokens: { line: ";" }
  }
});
function Pypeline() {
  console.log(parser.parse("if()"))
  return new LanguageSupport(PypelineLanguage, PypelineLanguage.data.of({
    autocomplete: completeFromList(snippets)
  }));
}

export { Pypeline, PypelineLanguage };
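
One way to act on the earlier question about node names is to compare the keys passed to styleTags, indentNodeProp and foldNodeProp against the nodeNames string of the generated parser. The sketch below does that in plain JavaScript. The inputs are copied from the file above, the helper name missingNodeNames is made up for illustration, and it assumes every key is a plain node name rather than a path selector.

```javascript
// Node names copied from the `nodeNames` field of the generated parser above.
const nodeNames = "⚠ BuiltExpression Expression Comparison ComparisonOperator Item PlusMinus Term StarObelus Factor Value DataFrame String Column Function Transformer TransformerName RegularFunction FunctionName Number Float Integer Digit Bool True False Parentheses".split(" ");

// Keys used in the styleTags call above.
const styleTagKeys = [
  "ComparisonOperator", "PlusMinus", "StarObelus", "NotQuoteChar", "String",
  "Digit", "TransformerName", "FunctionName", "True", "False",
];

// Key used in both the indentNodeProp and foldNodeProp calls above.
const indentFoldKeys = ["Application"];

// Hypothetical helper: returns the keys that do not name any node in the parser.
function missingNodeNames(keys, names) {
  const known = new Set(names);
  return keys.filter((key) => !known.has(key));
}

console.log(missingNodeNames(styleTagKeys, nodeNames));   // [ 'NotQuoteChar' ]
console.log(missingNodeNames(indentFoldKeys, nodeNames)); // [ 'Application' ]
```

Both reported names are absent from nodeNames, so those entries cannot match anything in the tree. The remaining styleTags keys do name nodes in this parser.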

marijn · July 13, 2023, 8:26am

And True/False or strings aren’t highlighted? (Operators and names don’t get a specific color in the default highlighter.)
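
To make that point concrete for the styleTags mapping in the file above, the sketch below splits its entries by whether their tag is one that, per the comment above, gets no specific color in the default highlighter. The tags are written as plain strings so the snippet runs without CodeMirror. The helper name splitByDefaultColor is made up, and the uncolored list is an assumption taken only from that comment, not from the library source.

```javascript
// styleTags entries from the language file, with tags written as plain strings.
// (NotQuoteChar is left out because it is not in the parser's nodeNames.)
const styleTagEntries = {
  ComparisonOperator: "operator",
  PlusMinus: "operator",
  StarObelus: "operator",
  String: "string",
  Digit: "number",
  TransformerName: "name",
  FunctionName: "name",
  True: "keyword",
  False: "keyword",
};

// Assumption: only the two tags named in the reply above are treated as uncolored.
const uncoloredByDefault = new Set(["operator", "name"]);

// Hypothetical helper: groups node names by whether their tag is in the uncolored set.
function splitByDefaultColor(entries) {
  const colored = [];
  const uncolored = [];
  for (const [node, tag] of Object.entries(entries)) {
    (uncoloredByDefault.has(tag) ? uncolored : colored).push(node);
  }
  return { colored, uncolored };
}

const result = splitByDefaultColor(styleTagEntries);
console.log(result.colored);   // [ 'String', 'Digit', 'True', 'False' ]
console.log(result.uncolored); // operators and names
```

Under that assumption, only strings, digits and True/False would be expected to change color with the default highlighter; if those stay plain as well, the cause lies elsewhere.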

IceTheAxeMan · July 14, 2023, 8:11am

Good catch. I changed that to

Bool: tags.keyword

but it’s still not highlighting anything.

To clarify, when I use the JavaScript language instead, highlighting works as expected.

Ranjit-s94 · March 8, 2024, 11:07am

@IceTheAxeMan Could you please share the final grammar file with me? I am stuck at this point:

@top BuilderExpression { Expression }

@tokens {
  ComparisonOperator { ">" | "<" | ">=" | "<=" | "==" | "=" | "!=" }
  PlusMinus { "+" | "-" }
  StarObelus { "*" | "/" }
  NotQuoteChar { !["] }
  String { "\"" NotQuoteChar* "\"" }
  Digit { '0' | $[1-9] }
  TransformerName { "min_max_scaler" | "standard_scale" }
  FunctionName { "contains" | "concat" | "indexOf" | "lastIndexOf" | "slice" | "split" | "splitByLengths" | "length" | "match" | "startsWith" | "endsWith" | "replace" | "trim" | "toUppercase" | "toLowercase" | "toNumber" | "toText" | "if" | "or" | "not" | "and" }
  Space { " " }
  RoundTerm { ")" }
} 

Expression {
  Expression "or" Comparison
  | Comparison
}

Comparison {
  Comparison Item
  | Item
}

Item {
  AddSubtract
}

Round {
  RoundTerm
}

AddSubtract {
  Term
}

TimesDivide { 
  SpaceFactor
}

SpaceFactor {
  Space Factor
}

Term {
  TimesDivide
}

Factor {
  Parentheses
  | Value
}

Parentheses {
  "(" Expression ")"
}

Value {
  Function
  | DataFrame
  | Column
  | String
  | Number
  | Bool
  | Round
  | StarObelus
  | PlusMinus
  | ComparisonOperator
}

Function {
  Transformer
  | RegularFunction
}

Transformer {
  TransformerName "(" Expression ("," Expression)* ")"
}

RegularFunction {
  FunctionName "(" Expression ("," Expression)* ")"
}

DataFrame {
  "[" String "," String "," String* "]"
}

Column {
  "[" String "]"
}

Number {
  Float
  | Integer
}

Bool {
  "true"
  | "false"
}

Float {
  Integer "." Integer
}

Integer {
  Digit+
}

@detectDelim