It does work. However, I noticed that MultibyteChar { $[\u0080-\uFFFF] } actually does what I want, because MultibyteChar { $[\uFFFF] } seems to match both UTF-16 code units of code points above 0xFFFF.
Is this a bug? It seems like this would prevent you from matching any specific codepoints above 0xFFFF.
MultibyteChar { "š" } seems to match that input just fine when I test it.
But indeed, the parser uses \uffff as a special marker (I think I assumed it wasn't a valid character when I made that choice), which messes things up. I'll take a look at how to fix that.
Also note that grammars are not UTF-16, in that you can't specify surrogate pairs as two separate characters, only as Unicode characters.
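For anyone unsure what a surrogate pair is, here is a small TypeScript sketch of how a code point above 0xFFFF looks in a UTF-16 string. It only uses standard JavaScript string methods and says nothing about the parser's internals; U+1F600 is just an arbitrary example code point.

```typescript
// U+1F600 lies above 0xFFFF, so a UTF-16 string stores it as two code units.
const astral: string = String.fromCodePoint(0x1f600);

// .length counts UTF-16 code units; Array.from iterates by code point.
const codeUnitCount: number = astral.length;
const codePointCount: number = Array.from(astral).length;

// The two code units are a high surrogate followed by a low surrogate.
const high: number = astral.charCodeAt(0);
const low: number = astral.charCodeAt(1);

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

console.log(codeUnitCount, codePointCount);
console.log(high.toString(16), low.toString(16));
console.log(isHighSurrogate(high), isLowSurrogate(low));
```

Both surrogates fall inside the range \u0080-\uFFFF, which is why a range written in terms of code units can appear to match each half of such a character separately.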
Ok, thanks. I just thought it was worth mentioning. It's a good choice that the grammars aren't in UTF-16. What would be the proper way of matching any character above 127?
The patches below should fix the confusion around direct mentions of character 0xffff. Unfortunately, npm is having some kind of issue right now and I can't push new releases. But matching any character above 127 would be done with something like $[\u{80}-\u{10ffff}] even with the current versions of the libraries.
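As a rough analogy only, the same code-point range can be tried out with a plain JavaScript regular expression. This sketch assumes nothing about how the grammar tokenizer works; the function names are made up for illustration. It contrasts a code-point-aware range (the `u` flag) with a range that is applied per UTF-16 code unit.

```typescript
// Code-point-aware range: every character above 127, including those above 0xFFFF.
const aboveAscii = /[\u{80}-\u{10ffff}]/gu;

// Without the `u` flag the range is applied per UTF-16 code unit,
// so each half of a surrogate pair matches on its own.
const aboveAsciiUnits = /[\u0080-\uFFFF]/g;

// Hypothetical helper: all non-ASCII characters, one entry per code point.
function nonAsciiChars(text: string): string[] {
  return text.match(aboveAscii) || [];
}

// Hypothetical helper: all non-ASCII UTF-16 code units.
function nonAsciiUnits(text: string): string[] {
  return text.match(aboveAsciiUnits) || [];
}

// "a", then U+00E9, then U+1F600 (which needs a surrogate pair).
const sample: string = "a\u00e9" + String.fromCodePoint(0x1f600);

console.log(nonAsciiChars(sample).length);
console.log(nonAsciiUnits(sample).length);
```

With the sample above, the code-point version yields two matches and the code-unit version yields three, because the character above 0xFFFF is counted once in the first case and as two separate surrogates in the second.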