How to allow syntax `2h30` when syntax `2h 30` works?

tobiasBora · May 29, 2023, 7:34pm

Hello, I used Lezer to create a mini-language for writing basic expressions like 1+3*8 or abc in ["a", "b", "c"]. I’d now like to add a syntax for hours, like 2h30. By doing:

BinaryExpressionInfix {
  Expression !hour Hour Expression | …
}
Hour[@isGroup=Sign] { @specialize<Identifier, "h"> }

I can make the syntax 2h 30 work, but I can’t remove the space to support the syntax 2h30. My understanding is that Lezer scans the full token, i.e. h30 (since digits are allowed in identifiers), and doesn’t recognize it as h followed by a number, but I don’t know how to fix this issue. For now, my tokens are:

@tokens {
  Identifier { $[a-zA-Z]$[a-zA-Z0-9_]* }
  Number { @digit+ }
  space { @whitespace+ } 
}

but if I turn it into:

Identifier { OneLetter (OneLetter | Digit)* }
Number { Digit+}
@tokens {
  OneLetter { $[a-zA-Z] }
  Digit { @digit }
  space { @whitespace+ } 
}

then I will no longer be able to use @specialize<Identifier, "if">, since Identifier is no longer a token… What is the solution to my problem?

marijn · May 29, 2023, 10:05pm

Right. h30 matches your Identifier token, so that won’t be parsed as an h identifier. If your syntax requires the h to be between numbers with no spaces around it, you’ll probably want to make that a single token. If they are actually separate tokens that allow spaces between them, then that seems awkward to parse. You could add an external specializer for identifiers that recognizes the h + two digits format and assigns a specific token type to it, I guess.
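
A rough, untested sketch of how the external specializer idea might be declared in the grammar. The names specializeHour, HourMinutes and TimeLiteral, and the module path "./tokens", are assumptions made for illustration and are not part of the original grammar:

```
@top Program { expression* }

@skip { space }

expression { TimeLiteral | Number | Identifier }

// A Number followed by an identifier that the external function re-tagged.
TimeLiteral { Number HourMinutes }

// Hypothetical: specializeHour is exported from "./tokens" and is called
// for every Identifier token the tokenizer produces.
@external specialize {Identifier} specializeHour from "./tokens" { HourMinutes }

@tokens {
  Identifier { $[a-zA-Z] $[a-zA-Z0-9_]* }
  Number { @digit+ }
  space { @whitespace+ }
}
```

The function itself would receive the identifier text and return the HourMinutes term id when the text is h followed by two digits, or -1 to leave the token as a plain Identifier. Since space is skipped between tokens, this sketch would presumably also accept 2 h30, and it would stop h30 from being usable as an ordinary variable name unless the function also checks the parse state.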

tobiasBora · May 29, 2023, 10:15pm

Thanks, but what do you mean by a single token? If I remember correctly, if I put the h in

Hour[@isGroup=Sign] { "h" }

it will fail, like if I put:

@tokens {
  Identifier { $[a-zA-Z]$[a-zA-Z0-9_]* }
  Number { @digit+ }
  H { "h" }
  space { @whitespace+ } 
}

since tokens will overlap. Or am I missing something?

marijn · May 29, 2023, 10:28pm

You can specify precedence between tokens. So a token like Time { @digit @digit? "h" @digit @digit } will conflict with your number token, but you can say (in the @tokens block) @precedence { Time, Number } to indicate that if both apply, the time token should be used.
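
Put together, the suggestion might look like the following minimal, untested sketch. The Program and expression rules are placeholders added for illustration, and the precedence list is written in the comma-separated form used in the Lezer documentation:

```
@top Program { expression* }

@skip { space }

expression { Time | Number | Identifier }

@tokens {
  Identifier { $[a-zA-Z] $[a-zA-Z0-9_]* }
  Number { @digit+ }
  // One or two hour digits, a literal "h", then exactly two minute digits.
  Time { @digit @digit? "h" @digit @digit }
  space { @whitespace+ }
  // Where both Time and Number could match, prefer Time.
  @precedence { Time, Number }
}
```

Under these assumptions, 2h30 should come out as a single Time token, while an input like 2h3 would not match Time (only one minute digit) and would fall back to Number followed by Identifier.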

tobiasBora · May 29, 2023, 10:32pm

But my understanding of precedence is that if I do that, I will not be able to define variables whose names start with h, like hello: the tokenizer would first try to parse the h, prioritize h over a generic token, and then fail to continue parsing (that’s the reason I needed to use specialize in the first place).

marijn · May 30, 2023, 5:11pm

The token I showed above doesn’t match h on its own (it has to start with a digit), so no, that won’t be an actual problem.