Hello, I created a mini-language with Lezer for writing basic expressions like 1+3*8 or abc in ["a", "b", "c"]. Now I'd like to add a syntax for hours, like 2h30. By doing:
I can make the syntax 2h 30 work, but I can't get rid of the space to support 2h30. My understanding is that Lezer matches the longest possible token, i.e. h30 (since identifiers may contain digits), and so doesn't recognize it as h followed by a number, but I don't know how to fix this. For now, my tokens are:
@tokens {
Identifier { $[a-zA-Z]$[a-zA-Z0-9_]* }
Number { @digit+ }
space { @whitespace+ }
}
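To see why h30 ends up as a single token, here is a simplified longest-match ("maximal munch") model of the three rules above. It is a hypothetical sketch, not Lezer's generated tokenizer: the regexes only approximate the @tokens rules (\s stands in for @whitespace), and the tokenize function is made up for illustration.

```typescript
// Simplified longest-match ("maximal munch") tokenizer model.
// Hypothetical: Lezer compiles @tokens into a state machine; this only
// approximates the same rules with regexes to show which token wins.
type Tok = { type: string; text: string };

// Regex approximations of the Identifier, Number and space rules.
const rules: [string, RegExp][] = [
  ["Identifier", /^[a-zA-Z][a-zA-Z0-9_]*/],
  ["Number", /^[0-9]+/],
  ["space", /^\s+/],
];

function tokenize(input: string): Tok[] {
  const out: Tok[] = [];
  let pos = 0;
  while (pos < input.length) {
    const rest = input.slice(pos);
    let best: Tok | null = null;
    for (const [type, re] of rules) {
      const m = re.exec(rest);
      // Keep the longest match found at this position.
      if (m && (best === null || m[0].length > best.text.length)) {
        best = { type, text: m[0] };
      }
    }
    if (best === null) throw new Error(`No token matches at offset ${pos}`);
    out.push(best);
    pos += best.text.length;
  }
  return out;
}
```

With this model, tokenize("2h30") yields Number "2" followed by Identifier "h30", while tokenize("2h 30") yields Number "2", Identifier "h", space and Number "30", which matches the behavior described above.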
Right. h30 matches your Identifier token, so it won't be split into an h identifier followed by a number. If your syntax requires the h to sit between numbers with no spaces around it, you'll probably want to make the whole thing a single token. If they really are separate tokens that allow spaces between them, that seems awkward to parse. I guess you could add an external specializer for identifiers that recognizes the h-plus-two-digits format and assigns it a specific token type.
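The specializer suggested here could look roughly like the sketch below. It rests on assumptions: the grammar wiring shown in the comments, the HourMinutes term name and its numeric value, and the specializeIdentifier function name are all hypothetical. Only the general contract (the function receives the token's text and returns a term ID, or -1 to leave the token alone) follows Lezer's @external specialize mechanism.

```typescript
// Hypothetical grammar wiring (not from the thread):
//   @external specialize {Identifier} specializeIdentifier from "./tokens" { HourMinutes }
//   Duration { Number HourMinutes }
// In a real setup, HourMinutes would be imported from the terms file that
// lezer-generator emits; the value below is only a placeholder.
const HourMinutes = 1000;

// Lezer calls a specializer with the token text (and the parse stack, unused
// here). Returning a term ID retypes the token; -1 keeps it an Identifier.
function specializeIdentifier(value: string): number {
  // "h" followed by exactly two digits, e.g. the "h30" in "2h30".
  return /^h[0-9]{2}$/.test(value) ? HourMinutes : -1;
}
```

Because the check only fires on text shaped like h30, an identifier such as hello falls through with -1 and keeps its Identifier type.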
You can specify precedence between tokens. A token like Time { @digit @digit? "h" @digit @digit } will conflict with your Number token, but you can add @precedence { Time Number } inside the @tokens block to indicate that, if both apply, the Time token should be used.
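One way to see where this conflict comes from is to check which token patterns can match at the start of a given input. This is a hypothetical sketch using regex approximations of the rules (the candidatesAt helper is made up for illustration); it does not reproduce lezer-generator's own overlap analysis.

```typescript
// Regex approximations of the token rules, including the Time token
// from the reply (@digit @digit? "h" @digit @digit).
const tokens: [string, RegExp][] = [
  ["Time", /^[0-9][0-9]?h[0-9][0-9]/],
  ["Number", /^[0-9]+/],
  ["Identifier", /^[a-zA-Z][a-zA-Z0-9_]*/],
];

// Names of the tokens whose pattern matches at the very start of `input`.
function candidatesAt(input: string): string[] {
  return tokens.filter(([, re]) => re.test(input)).map(([name]) => name);
}
```

Here candidatesAt("2h30") returns ["Time", "Number"], the overlap that a precedence declaration has to resolve, while candidatesAt("hello") returns only ["Identifier"], since Time must start with a digit.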
But my understanding of precedence is that if I do that, I won't be able to define variables with an h in their name, like hello: the tokenizer would first try to match the h, prefer it over the generic token, and then fail to continue parsing. (That's the reason I needed to use specialize in the first place.)