Ideal way for parsing keywords that can be identifiers too?

I have keywords like ‘Action’, ‘Routine’ and ‘Private’ which are used in my rules, but they can occur as identifiers too. They should be parsed as identifiers when they are not specified in the rule definition.

For example, for content like `Alpha Beta Gamma Routine` with a rule like

```
RuleA {
```

it's currently getting parsed as `RuleA, (Error) Routine`, when it should be purely `RuleA`,

but while doing this, I also want it to parse content like `Routine Alpha` with a rule like

```
RuleB {
    Routine Identifier
}
```

I fixed it using

```
@external extend {Identifier} extendIdentifier from "./tokens" {
```

and then in tokens.js

```js
// `routine` is the numeric term id of the keyword token, imported
// from the terms file Lezer generates alongside the parser
// (the exact path depends on your build setup)
import {routine} from "./parser.terms.js";

const extraKeywordTokens = {
  routine: routine
};

// Return the keyword term id when the identifier text matches,
// or -1 to leave the token as a plain Identifier
export const extendIdentifier = (value, stack) => {
  return extraKeywordTokens[value.toLowerCase()] || -1;
};
```
Is this the correct way to do it? Thank you.

You can also directly use @extend (instead of @specialize) in your grammar. See for example the way the JavaScript grammar handles contextual keywords.
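For anyone reading along, the in-grammar form could look roughly like this (rule names are hypothetical, following the parameterized-rule pattern the JavaScript grammar uses for contextual keywords):

```
// Turn an Identifier into a contextual keyword without
// losing its identifier reading
kw<term> { @extend[@name={term}]<Identifier, term> }

RuleB { kw<"Routine"> Identifier }
```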


Oh that’s concise, thank you!

Sorry for the trouble, but what would you suggest for a case like this?

There are two rules: 1) URLText 2) Identifier

These have similar definitions, so ‘google’ can be parsed as either URLText or Identifier. I set the precedence as below, which seemed to fix it for most cases.

@precedence {URLText, Identifier}

but this breaks in one rule where both of these are used in close proximity: input like `Alpha` is getting parsed as URLText when it should be parsed as an identifier.

From the docs, dynamic precedence seems to be a possible solution, but I’m not sure how to approach this with it, could you please give any suggestions?
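For reference, Lezer attaches dynamic precedence to a rule with a pseudo-prop. A minimal sketch, with a hypothetical rule name, biasing the GLR split toward the identifier reading:

```
// When multiple parses survive an ambiguity, the parse
// containing this rule is preferred
IdentifierUse[@dynamicPrecedence=1] { Identifier }
```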

This’ll happen in parse states where both are valid. If you have ambiguous tokens that may both occur in a given position, your grammar has a problem, and Lezer can’t do much more than apply the precedence you specified. I’m not sure whether you’re implementing an existing grammar or creating one here, but in the former case maybe check that grammar’s spec more closely, and in the latter see if you can change something to remove the ambiguity.

Oh cool, thank you!

Hi again Marijn, I’ve been thinking about specialize and extend.

Specifically, the below part:

> There is another operator similar to @specialize, called @extend. Whereas specialized tokens replace the original token, extended tokens allow both meanings to take effect, implicitly enabling GLR when both apply. This can be useful for contextual keywords where it isn’t clear whether they should be treated as an identifier or a keyword until a few tokens later.

If it’s no bother, could you elaborate on the “allow both meanings to take effect” part?

Prior to using extend, I had most of my tokens specialized; when I came across cases where identifiers could have the same text as those tokens, I extended them instead.

This will mostly be off, but why not have all of the tokens extended? Why would specialize be needed? Parsing seems to work fine that way too, with the additional benefit of not clashing with identifiers.

Apologies if it’s very wrong, would really appreciate your input.

Firstly, it’s less efficient to follow multiple parses on every keyword. But also, this would allow things like `let function = 10` in, for example, the JavaScript parser, which the language most certainly does not allow. `function` is always a keyword, and most languages work like that: you can’t use keywords as identifiers.

Got it. Thank you!

Just to confirm, is a use like this ideal?

specialize → Normal keywords
extend → Keywords that can conflict, i.e. can be either an identifier or a keyword depending on the rule
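That split could be sketched in grammar form as (token and rule names hypothetical):

```
// Always a keyword: @specialize replaces the Identifier reading
PrivateKw { @specialize<Identifier, "Private"> }

// Contextual keyword: @extend keeps both readings and lets
// the GLR parser decide which one survives
RoutineKw { @extend<Identifier, "Routine"> }
```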