I am trying to write my language so that it can support member expressions and floating point literals, but the “.” symbol causes conflicts. Would someone help point me in the right direction?
For floating point numbers I expect formats with or without the leading number:
.3 is valid
0.3 is valid
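To make the two accepted shapes concrete, here is a minimal plain-JavaScript sketch (the regex and name are mine, not from any actual grammar) of a float token that allows an optional integer part:

```javascript
// A float literal may omit the integer part: ".3" and "0.3" both match.
// This is only a sketch of the token shape, not a real Lezer rule.
const floatLiteral = /^[0-9]*\.[0-9]+$/;

floatLiteral.test(".3");   // matches
floatLiteral.test("0.3");  // matches
floatLiteral.test("3.");   // does not match under this sketch
```

Note that this sketch deliberately rejects a trailing-dot form like "3.", since only the two forms above were listed as valid.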
For member expressions, the following inputs end up with float literals instead of member expressions:
1__id.2__id.3__id
@User.1.2
Ideally, I would want these to resolve into member-expression syntax trees rather than float literals.
The way languages like JavaScript handle this is that a dot followed or preceded by a digit is always a floating point literal (which in Lezer would mean a higher token precedence on the floating point literal).
That would clear up the conflict, but I am confused about how it would produce a useful syntax tree.
If I were to write something like some_name__id.1_other_name__id, I would want the tree to look something like Program(MemberExpression(Object, Property)), but if I were to put a higher precedence on the Float Literal, I would instead get something like Program(?, FloatLiteral, ?). How does the JavaScript grammar work around this?
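To make the conflict concrete, here is a plain-JavaScript sketch (the function and its name are mine, purely for illustration) of the digit-adjacency rule described above, showing what it does when an identifier after the dot starts with a digit:

```javascript
// Decide, at the position of a ".", whether it belongs to a float literal
// or acts as a member-access dot. Mirrors the JS-style rule described above:
// a dot adjacent to a digit is always part of a float literal.
function classifyDot(text, pos) {
  const isDigit = (ch) => ch >= "0" && ch <= "9";
  const prev = text[pos - 1];
  const next = text[pos + 1];
  if ((prev && isDigit(prev)) || (next && isDigit(next))) return "float";
  return "member-dot";
}

classifyDot("a.b", 1);                            // "member-dot"
classifyDot(".3", 0);                             // "float"
classifyDot("some_name__id.1_other_name__id", 13); // "float" -- the problem case
```

The last call is exactly the problem: because 1_other_name__id starts with a digit, the dot gets claimed by the float tokenizer, and the property name can never be produced.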
That wouldn’t be valid in JavaScript—if your identifiers can start with a digit you’ll have to find some other strategy to disambiguate this. But that’s more of a syntax design issue than a Lezer issue.
I see the issue now. Unfortunately, I cannot make changes to the syntax.
It seems for now that the most consistent thing to do is create a token with higher precedence than FloatLiteral that matches the entire MemberExpression, since I cannot break it into parts without conflicting with float literals.
Are external tokenizers able to help at all in this situation? I may just need to get the entire MemberExpression value using from and to on the document and split the string by periods.
If lookahead helps, an external tokenizer could do that—using some more involved logic to determine whether to produce a dot token or a float literal. Or maybe make sure your grammar doesn’t allow float literals in any places that allow dots, so that contextual tokenizing takes care of this.
I see. I am thinking of replacing my FloatLiteral token with an external tokenizer's token. It will be exactly the same, except that it will also look behind to check whether $[a-zA-Z] occurs right before the dot. Is that a correct use of external tokenizers?
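As a sketch of that look-behind check (plain JavaScript, names are mine; in a real grammar this predicate would sit inside an ExternalTokenizer from @lezer/lr, which exposes lookaround via input.peek(offset)):

```javascript
// Decide whether the "." at `pos` should begin a float literal.
// Mimics the proposed rule: if an identifier letter [a-zA-Z] immediately
// precedes the dot, emit a member-access dot instead of a float.
// Note: underscores and digits before the dot are not covered by the
// stated [a-zA-Z] class, so "1__id." would still need separate handling.
function dotStartsFloat(text, pos) {
  const prev = text[pos - 1];
  if (prev !== undefined && /[a-zA-Z]/.test(prev)) return false; // member access
  return /[0-9]/.test(text[pos + 1] ?? "");                      // ".3"-style float
}

dotStartsFloat(".3", 0);                  // true  -> FloatLiteral
dotStartsFloat("some_name__id.1_x", 13);  // false -> member dot
```

In the external tokenizer itself, the same decision would be made with input.peek(-1) and input.peek(0) relative to the dot rather than by indexing into a string, since the tokenizer only sees a streaming input.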