Need to know if Lezer can solve this

Dandada1993 · January 3, 2024, 8:52pm

Hi

I am migrating from CodeMirror 5 to 6 and am attempting to use Lezer to define the grammar. However, I am running into a problem, and I need to know definitively whether it can be solved with the built-in Lezer mechanisms or whether I should abandon Lezer and use the Stream Parser. Below is a minimal example that demonstrates the problem.

@top Scripts { Script (Separator Script)* }

Script { (Command+) }

Command {
	Motion lineEnd |
	Container { Cblock lineEnd Body }
}

Cblock {
	control_forever { @specialize<Phrase, "forever"> }
}

Motion {
	movesteps { "move (" Value? ") steps" }
}

MotionValues {
	xposition { @specialize<Phrase, "x position"> }
}

Body { indent Command+ (dedent | eof) }

Value {
	Number |
	MotionValues |
	Phrase
}

Separator { separator Phrase* lineEnd }

@skip { 
	spaces | 
	blankLineStart spaces lineEnd
}

lineEnd { newline | eof }

@context trackIndent from "./tokens.js"

@external tokens indentation from "./tokens.js" {
	indent,
	dedent,
	blankLineStart
}

//!tokens

@tokens {
  @precedence { Number, Phrase }
  spaces { $[ \t]+ }
  newline { "\n" }
  eof { @eof }
  char { @asciiLetter | $[_\u{a1}-\u{10ffff}] }
  word { (@digit | "." | char)+ }
  Number { @digit @digit* }
  Phrase { word (" " word)* }
  separator { "-----" }
}

I also want a valid Script to be able to consist of a single MotionValues, i.e.

Script { (Command+|MotionValues) }

However, when I do that, there are overlapping tokens between "move (" and Phrase. I have tried defining "movesteps" like this:

movesteps { @specialize[@name=move]<Phrase, "move ("> Value? ") steps" }

However, the text "move (" then does not get highlighted correctly. I assume that I could define "move (" as a token, but there are too many such commands for defining each one as a token to be a viable solution.

I need to know if the Ambiguity Marker or some other Lezer mechanism can resolve this or if I should move to the Stream Parser. I have a Mode from version 5 that I could probably largely reuse. All I really need is highlighting.

marijn · January 4, 2024, 11:01am

It doesn’t look like Phrase even matches the string "move (" (since a paren is not a Word). Making the tokens smaller (i.e. making Word the token, and Phrase a non-terminal that matches words separated by whitespace, and matching a specialized word for "move" followed by an opening paren token for "move ("), might make this easier.
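
To illustrate why Phrase cannot match "move (", here is a rough sketch that approximates the original word and Phrase tokens with JavaScript regular expressions. This is an assumption made for illustration, not how Lezer tokenizes internally, and the names WORD, PHRASE and matchesPhrase are hypothetical.

```javascript
// Rough regex approximation of the original token definitions (an assumption
// for illustration; Lezer does not tokenize with JavaScript regexes):
//   char   { @asciiLetter | $[_\u{a1}-\u{10ffff}] }
//   word   { (@digit | "." | char)+ }
//   Phrase { word (" " word)* }
const WORD = "[0-9.A-Za-z_\\u{a1}-\\u{10ffff}]+";
const PHRASE = new RegExp(`^${WORD}(?: ${WORD})*$`, "u");

// Hypothetical helper: true when the whole input could be a single Phrase.
function matchesPhrase(text) {
  return PHRASE.test(text);
}

console.log(matchesPhrase("forever"));    // true
console.log(matchesPhrase("x position")); // true
console.log(matchesPhrase("move ("));     // false: "(" is not a word character
```

Since @specialize only applies when the base token matches the given string exactly, a Phrase that can never be "move (" can never be specialized to it, which is likely why splitting the input into smaller tokens helps here.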

Dandada1993 · January 4, 2024, 1:57pm

I will make the suggested edits and see if it works.

Dandada1993 · January 4, 2024, 11:16pm

Thank you very much. I was able to implement your suggestions and the following worked.

@top Scripts { Script (Separator Script)* }

Script { (Command+|MotionValues) }

Command {
	Motion lineEnd |
	Container { Cblock lineEnd Body }
}

Cblock {
	control_forever { @specialize<word, "forever"> }
}

Motion {
	movesteps { @specialize<word, "move"> "(" Value? ") steps" }
}

MotionValues {
	xposition { @specialize<word, "x"> space @specialize<word, "position"> }
}

Value {
	Number |
	MotionValues |
	Phrase
}

Body { indent Command+ (dedent | eof) }

Separator { separator Phrase* lineEnd }

Phrase { word (space word)* }

@skip {
	spaces |
	blankLineStart spaces lineEnd
}

lineEnd { newline | eof }

@context trackIndent from "./tokens.js"

@external tokens indentation from "./tokens.js" {
	indent,
	dedent,
	blankLineStart
}

//!tokens

@tokens {
  space { $[ \t] }
  spaces { space+ }
  newline { "\n" }
  eof { @eof }
  char { @asciiLetter | $[_\u{a1}-\u{10ffff}] }
  word { (@digit | "." | char)+ }
  separator { "-----" }
  Number { @digit @digit* }
  @precedence { spaces, space }
  @precedence { word, spaces }
  @precedence { Number, word }
}