Hi!
I am working on a personal language project with Lezer. While I’ve gotten something to work, it’s hacky, and I was wondering what the preferred approach is for expressing something like the following:
```
variable : type = "value"   // -> Assign( VarName OptionalType String )
variable                    // -> ExprStmt(VarName)
variable with               // -> ExprStmt( VarName WithClause(
  key1 : "value1"           //      Key Value(String)
  key2 : "value2"           //      Key Value(String)
  key3 : "value3"           //      Key Value(String) )
variable                    // -> ExprStmt(VarName)
```
The main problem is distinguishing a variable name from a Key in the with clause sequence without using an end delimiter for the with clause (also, right now I’m skipping spaces and newlines, if that matters). I had the grammar below, which has an `identifier` token and a `labeled` rule that matches `!label Key{identifier} ":" Value{expression}` after the `with` keyword. (The `!label` precedence marker is needed to resolve a shift-reduce conflict.) However, this forces every identifier that follows the clause to be parsed as a Key.
Grammar 1
```
@top Script { statement* }

statement {
  ExprStmt{ expression } |
  Assign
}

expression {
  String |
  VarName{identifier} |
  expression
  !withL WithClause{@specialize[@name=With]<identifier, "with"> labeled+ }
}

Assign {
  VarName{identifier} ( ':' OptionalType{ identifier})?
  "=" expression
}

@precedence {
  withL @left
  label @left
}

// the label prec removes the shift/reduce conflict, but forces all succeeding identifiers to be a Key
labeled { !label Key{identifier} ':' Value{expression} }

@skip { space | newline }

@tokens {
  space { $[ \t]+ }
  newline { $[\r\n] }
  String { '"' (![\\\n"] | "\\" _)* '"' }
  identifierChar { @asciiLetter }
  identifier { identifierChar (identifierChar | @digit)* }
}
```
```
variable with           // ExprStmt( VarName WithClause(
  key1 : "value1"       //   Key Value(String)
  key2 : "value2"       //   Key Value(String)
  key3 : "value3"       //   Key Value(String)
variable                // -> still a Key instead of the intended variable name
```
Because of this, I added a `label` token, which is essentially `identifier ":"`, gave it a higher precedence than `identifier`, and used it instead of `identifier` for the keys in the with clause (snippet below; full grammar at the bottom of the post).
```
expression {
  String | VarName{identifier} |
  expression !withL WithClause{With labeled+ }
}

With{@specialize< identifier, "with">}

labeled { !label Key{label} Value{expression} } // <- use label token instead of identifier

@tokens {
  // ...
  identifier { identifierChar (identifierChar | @digit)* }
  label { identifier space? ":"}
  @precedence {label, identifier}
}
```
This “works” for the with clause, but the colon now ends up inside the Key{label} node, and it has other side effects, such as needing extra handling for variable type declarations (details collapsed below if you’re interested). So is there a cleaner way to accomplish this within the grammar, or should I resort to an external tokenizer (or just give up and introduce delimiters :D)? Thank you in advance!
```
variable with           // ExprStmt( VarName WithClause(
  key1 : "value1"       //   Key Value(String)
  key2 : "value2"       //   Key Value(String)
  key3 : "value3"       //   Key Value(String)
//^^^^^^ Key nodes now include the colon because they use the label token
variable                // -> But at least this is now an ExprStmt(VarName)
```
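For comparison, this is roughly how I imagine the grammar side of the external-tokenizer option would look (an untested sketch; `./tokens.js`, `keyTokens` and `keyName` are made-up names). The idea is that the JS tokenizer peeks past an identifier and any spaces, and only if the next character is `:` and `stack.canShift(keyName)` is true does it accept just the identifier as `keyName`. The colon would then be matched as the ordinary `':'` token, so it stays outside the `Key` node, and since keys and variable names become different tokens, I’d expect the `!label` marker to be unnecessary:

```
// Untested sketch: "./tokens.js", keyTokens and keyName are hypothetical names.
// Placed before @tokens, assuming tokenizers are tried in declaration order.
@external tokens keyTokens from "./tokens.js" { keyName }

// Key wraps only the name; ':' stays a separate token outside the node.
// keyName is a different token from identifier, so no precedence marker here.
labeled { Key{keyName} ':' Value{expression} }
```

One caveat I can already see: directly after a with clause, a statement like `x : T = "v"` would also look like a key to such a tokenizer, because a key can be shifted at that point.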
Side effect on variable type declaration
Using the label token (`identifier ":"`) interferes with a type declaration that follows VarName{identifier}: because label has a higher precedence than identifier, the tokenizer now reads `variable :` as a single label token.
```
variable : optionalType = "string"
^^^^^^^^^^ // this range is now a label token, so VarName{identifier} doesn't kick in
```
To handle that, I split the variable assignment rule into two alternatives, one starting with an identifier token and another with a label token. This again works as intended, but of course it’s hackish, and again the colon becomes part of the VarName{label} node.
```
Assign {
  ( VarName{identifier}
  | VarName{label} OptionalType{ identifier} )
  "=" expression
}
```
Grammar 2
```
@top Script { statement* }

@skip { space | newline }

@precedence {
  withL @left
  label @left
}

statement {
  ExprStmt{ expression } |
  Assign
}

expression {
  String | VarName{identifier} |
  expression !withL WithClause{With labeled+ }
}

With{@specialize< identifier, "with">}

// use the label token for Key instead of identifier
labeled { !label Key{label} Value{expression} }

Assign {
  ( VarName{identifier}
  | VarName{label} OptionalType{ identifier} )
  "=" expression
}

@tokens {
  space { $[ \t]+ }
  newline { $[\r\n] }
  String { '"' (![\\\n"] | "\\" _)* '"' }
  identifierChar { @asciiLetter }
  identifier { identifierChar (identifierChar | @digit)* }
  label { identifier space? ":"} // <-- new label token
  @precedence {label, identifier}
}
```
Edit: I think I’ve now correctly interpreted the shift-reduce conflict mentioned above that `!label` was meant to solve, and I’ve put `~with` ambiguity markers in the (seemingly) correct places, where `!label` used to be. The grammar below now works without the special label token, but I’d still love to hear your thoughts on this approach, especially whether the ambiguity marker might be unneeded. Thank you!
New grammar with ambiguity markers instead of a label token and precedence marker
```
@top Script { statement* }

statement {
  ExprStmt{ expression } |
  Assign
}

expression {
  String |
  VarName{identifier} |
  expression
  ~with WithClause{@specialize[@name=With]<identifier, "with"> labeled+ }
}

Assign {
  VarName{identifier} ( ':' OptionalType{ identifier})?
  "=" expression
}

@precedence {
  withL @left
  label @left
}

// ~with marks this conflict as intentional, so the parser tries both continuing
// the with clause (another Key) and ending it, and keeps the branch that can continue
labeled { ~with Key{identifier} ':' Value{expression} }

@skip { space | newline }

@tokens {
  space { $[ \t]+ }
  newline { $[\r\n] }
  String { '"' (![\\\n"] | "\\" _)* '"' }
  identifierChar { @asciiLetter }
  identifier { identifierChar (identifierChar | @digit)* }
}
```