Lookahead from within terms

mikhail · March 9, 2022, 9:26am

Imagine a toy rule-set:

@tokens {
  Name { std.asciiLetter+ }
  Number { std.digit+ | "0x" $[0-9a-f]+ }
}

With these rules, the tokenizer reads “0x” as a Number (“0”) followed by a Name (“x”), whereas the input is probably a malformed hexadecimal number.
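
To make the split concrete, here is a minimal standalone sketch that mimics the two token rules with plain regular expressions, assuming the tokenizer always takes the longest match. It is only an illustration and does not use Lezer; the names `tokenize`, `rules` and `Token` are made up for this example.

```typescript
// Illustrative sketch only: imitates longest-match tokenization for the two
// rules above. This is not Lezer's tokenizer; all names are hypothetical.
type Token = { type: "Name" | "Number"; text: string };

// Patterns are anchored with ^ and applied to the remaining input.
// JavaScript alternation is ordered, so the longer "0x..." form is listed
// first to get the same result as a longest-match rule.
const rules: { type: Token["type"]; pattern: RegExp }[] = [
  { type: "Name", pattern: /^[A-Za-z]+/ },
  { type: "Number", pattern: /^(?:0x[0-9a-f]+|[0-9]+)/ },
];

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < input.length) {
    const rest = input.slice(pos);
    let best: Token | null = null;
    // Try every rule at the current position and keep the longest match.
    for (const { type, pattern } of rules) {
      const match = pattern.exec(rest);
      if (match && (best === null || match[0].length > best.text.length)) {
        best = { type, text: match[0] };
      }
    }
    if (best === null) throw new Error(`No token matches at position ${pos}`);
    tokens.push(best);
    pos += best.text.length;
  }
  return tokens;
}
```

Under these assumptions, `tokenize("0x")` yields a Number with text `0` and then a Name with text `x`, while `tokenize("0x1f")` yields a single Number.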

Is there a way to add a non-capturing negative lookahead of ![0-9a-f] to the Number term without writing an external tokenizer?

Thanks!

marijn · March 9, 2022, 11:08am

No. Lexer tokens are a strictly regular language, and don’t do lookahead.
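
Since the built-in token rules cannot express this, the check would have to live in hand-written scanning code, such as the external tokenizer mentioned in the question. Below is a rough sketch of what that scanning logic might look like, written as a plain function over a string. It deliberately does not use Lezer's external tokenizer API, and `scanNumber` is a hypothetical helper name.

```typescript
// Hypothetical helper, not part of Lezer: scans a number starting at `pos`.
// Returns the end offset of the number, or -1 when no well-formed number
// starts there (including a bare "0x" with no hex digit after it).
function scanNumber(input: string, pos: number): number {
  const isDigit = (ch: string): boolean => ch >= "0" && ch <= "9";
  const isHex = (ch: string): boolean => isDigit(ch) || (ch >= "a" && ch <= "f");

  if (input.charAt(pos) === "0" && input.charAt(pos + 1) === "x") {
    let end = pos + 2;
    while (end < input.length && isHex(input.charAt(end))) end++;
    // Reject "0x" that is not followed by at least one hex digit.
    return end > pos + 2 ? end : -1;
  }

  // Plain decimal number: one or more digits.
  let end = pos;
  while (end < input.length && isDigit(input.charAt(end))) end++;
  return end > pos ? end : -1;
}
```

With this sketch, `scanNumber("0x", 0)` reports failure instead of accepting `0` and leaving `x` for the next token, so the caller can decide how to treat the malformed input.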

mikhail · March 9, 2022, 11:29am

Thank you for the quick reply, Marijn!