Lezer token for "prop: value" syntax

Kasheftin · December 12, 2022, 8:33pm

I have to write a grammar for a simple custom YAML-like language. It should parse code like:

@ First Task
  color: #ddd
  tags: first, second
  due: tomorrow
@ Second Task
  tags: third(#ccc)
  color: #aaa

This is my grammar file:

@top TaskList { expression+ }

@skip { space | LineComment }

expression {
  TaskTitle | 
  Prop
  LineComment
}

@tokens {
  LineComment { "//" ![\n]* }

  TaskTitle { "@" ![\n]* }

  Prop { "\n" ($[a-zA-Z]+) ":" ![\n]* }

  @precedence { TaskTitle, Prop, space }

  space { $[ \t\n\r]+ }
}

Is there any way to mark the beginning of a line or of the file? I want the task title to match only when the @ character is at the beginning of a line. Something like TaskTitle { (\n|^)\s*@ ![\n]* }, however ^ is not recognized.
The main reason for using Lezer is to get syntax highlighting in CodeMirror. Is it possible to capture the regex matches and pass them on to the highlighter? I want the #ddd text to be colored with the #ddd color, meaning the lexer should capture (#[a-f0-9]{3,6}) and produce <span style="--custom-color:$1">$1</span>.
Is it possible to capture the prop name and use it as a CSS class? A (tags):(![\n]*) row should produce <div class="$1">$1: $2</div>.
Is it reasonable to use Lezer at all for my “language”? There are no nested structures, just a flat list, so it would be quite trivial to write a bunch of regular expressions that produce the required HTML. Might a CodeMirror legacy mode, something similar to yaml.js in the codemirror/legacy-modes repository on GitHub, be the better choice?

marijn · December 13, 2022, 9:20am

Don’t include line breaks in your skipped space token. Make them explicitly appear in the grammar, since they are a significant part of the syntax.

You’ll have to do styling based on token content in an editor extension, not in the parser. The extension can use the parse tree, of course.
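
To make the point about content-based styling concrete, here is a minimal sketch of the part of such an extension that does not depend on CodeMirror at all: finding the color literals in a piece of text and computing the inline style for each. The names ColorRange and findColorRanges are hypothetical helpers invented for this illustration, and the pattern is the one from the question. In a real extension, each range would presumably become a mark decoration (for example via Decoration.mark from @codemirror/view with an attributes option) built inside a view plugin, and the parse tree could be used to limit the search to Prop nodes. That wiring is an assumption and is not shown here.

```typescript
// Hypothetical helper type (not part of CodeMirror): one range to be styled.
interface ColorRange {
  from: number;  // start offset in the document
  to: number;    // end offset (exclusive)
  style: string; // inline style to attach, e.g. "--custom-color:#ddd"
}

// Scan `text` for color literals and return the ranges to style.
// `offset` is the document position where `text` starts, so that a caller
// can pass in the content of a single node or line.
function findColorRanges(text: string, offset: number = 0): ColorRange[] {
  // Pattern taken from the question; created per call so that the
  // regex's lastIndex state is never shared between calls.
  const hexColor = /#[a-f0-9]{3,6}/g;
  const ranges: ColorRange[] = [];
  let match: RegExpExecArray | null;
  while ((match = hexColor.exec(text)) !== null) {
    const from = offset + match.index;
    ranges.push({
      from,
      to: from + match[0].length,
      style: `--custom-color:${match[0]}`,
    });
  }
  return ranges;
}
```

Keeping the matching logic in a plain function like this means it can be checked on its own, and the editor extension only has to map each returned range onto a decoration.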

You could use a legacy mode, but doing this in Lezer should also be straightforward.