How to allow syntax `2h30` when syntax `2h 30` works?

Hello, I used Lezer to create a mini-language for writing basic expressions like `1+3*8` or `abc in ["a", "b", "c"]`. However, I’d like to add a syntax for hours, like `2h30`. With this rule:

```
BinaryExpressionInfix {
  Expression !hour Hour Expression | …
}
Hour[@isGroup=Sign] { @specialize<Identifier, "h"> }
```

I can make the syntax `2h 30` work, but unfortunately I can’t remove the space to support `2h30`. My understanding is that Lezer scans the full token, i.e. `h30` (since digits are allowed in identifiers), and doesn’t recognize it as `h`, but I don’t know how to fix this. For now, my tokens are:

```
@tokens {
  Identifier { $[a-zA-Z]$[a-zA-Z0-9_]* }
  Number { @digit+ }
  space { @whitespace+ }
}
```

But if I turn it into:

```
Identifier { OneLetter (OneLetter | Digit)* }
Number { Digit+ }
@tokens {
  OneLetter { $[a-zA-Z] }
  Digit { @digit }
  space { @whitespace+ }
}
```

then I will not be able to use `@specialize<Identifier, "if">`, since `Identifier` is no longer a token. What is the solution to my problem?

Right. `h30` matches your `Identifier` token, so it won’t be parsed as an `h` identifier followed by a number. If your syntax requires the `h` to sit between numbers with no spaces around it, you’ll probably want to make the whole thing a single token. If they are genuinely separate tokens that may have spaces between them, that seems awkward to parse. I guess you could also add an external specializer for identifiers that recognizes the `h` + two digits format and assigns a specific token type to it.

Thanks, but what do you mean by a single token? If I remember correctly, if I write the `h` as

```
Hour[@isGroup=Sign] { "h" }
```

it will fail, just as it does if I put:

```
@tokens {
  Identifier { $[a-zA-Z]$[a-zA-Z0-9_]* }
  Number { @digit+ }
  H { "h" }
  space { @whitespace+ }
}
```

since the tokens overlap. Or am I missing something?

You can specify precedence between tokens. A token like `Time { @digit @digit? "h" @digit @digit }` will conflict with your `Number` token, but you can write `@precedence { Time Number }` (in the `@tokens` block) to indicate that if both apply, the `Time` token should be used.
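A minimal sketch of how that could look, assuming a toy `@top` rule (`Program`, `expression`, and the `If` keyword node are hypothetical names for illustration, not part of the original grammar). `Identifier` stays a real token, so `@specialize` keeps working for keywords:

```
@top Program { expression* }

// Hypothetical top-level rule, just enough to exercise the tokens.
expression { Time | Number | Identifier | If }

// Keywords can still be specialized, since Identifier is still a token.
If { @specialize<Identifier, "if"> }

@skip { space }

@tokens {
  // One token for the whole duration, e.g. 2h30 or 12h05.
  Time { @digit @digit? "h" @digit @digit }
  Identifier { $[a-zA-Z] $[a-zA-Z0-9_]* }
  Number { @digit+ }
  space { @whitespace+ }
  // When Time and Number both match at the same position, prefer Time.
  @precedence { Time, Number }
}
```

With this sketch, `2h30` should come out as a single `Time` token, while `2h 30` falls back to `Number`, `Identifier`, `Number`, because `Time` cannot finish without the two trailing digits. Since `Time` must start with a digit, it never competes with `Identifier`, so names like `hello` or `h30` are still read as identifiers. The hours and minutes would then be read from the `Time` node's text when evaluating.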

But my understanding of precedence is that if I do that, I will not be able to define variables whose names contain an `h`, like `hello`, since the tokenizer will first match the `h`, prefer it over the generic token, and then fail to continue parsing (that’s the reason I needed `@specialize` in the first place).

The token I showed above doesn’t match a bare `h` (it has to start with a digit), so no, that won’t be a problem.