Hi all, I’ve spent a few days now stuck on getting grammar rules to match a language grammar I’m working on. The language is called Csound, and it has some unconventional ways of writing a function statement. It can call a function with or without parentheses, and it supports comma-separated arguments as both the input and the output parameters of a function call. This is where Lezer is tripping me up: even with the ambiguity marker ~ I still can’t get this to work, so I’m hoping to tap into some expert advice here.
In Csound, a function has the archaic name Opcode, which I’ll use in this example.
You can call an opcode inside an instr block like this, with the arguments followed by a newline that terminates the statement:
instr 1
sum 1, 2, 3
endin
This is easy enough and works with a grammar like the one below (note that I’m leaving out some rules to keep it short). Note as well that I’m calling Csound variables SignalRateIdentifier; these are the only identifier type allowed in opcode outputs.
@top Program { rootstatement* }
@skip { BlockComment | LineComment | space }
rootstatement {
InstrumentDeclaration | newline
}
statement {
OpcodeStatement
}
OpcodeStatement {
Opcode expressionCommaSep newline
}
opcodeOutputExpression { commaSep<SignalRateIdentifier> }
expressionNoComma {
Number |
String |
SignalRateIdentifier |
ParenthesizedExpression { "(" expressionNoComma ")" } |
BinaryExpression {
expressionNoComma !arithOpR arithOpRight expressionNoComma |
expressionNoComma !arithOpL arithOpLeft expressionNoComma
} |
ConditionalExpression {
expressionNoComma !ternary "?" expressionNoComma ":" expressionNoComma
} |
CallbackExpression { Opcode !call ArgList }
}
expressionCommaSep { commaSep< expressionNoComma> }
InstrumentDeclaration {
kw<"instr"> (FunctionName | Number) newline
statement*
InstrumentEnd
}
InstrumentEnd {
kw<"endin">
}
FunctionName { word }
SignalRateIdentifier { identifier }
Opcode { identifier }
kw<term> { @specialize[@name={term}]<identifier, term> }
@tokens {
LineComment { ";" ![\n]* | "//" ![\n]* }
BlockComment { "/*" blockCommentRest }
blockCommentRest { ![*] blockCommentRest | "*" blockCommentAfterStar }
blockCommentAfterStar { "/" | "*" blockCommentAfterStar | ![/*] blockCommentRest }
identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
Number { ("+" | "-")? @digit? "."? @digit+ | ("+" | "-")? @digit+ "."? @digit* }
word { identifierChar (identifierChar | @digit)* }
identifier { word (":" $[iak])? }
@precedence { spaces, newline, identifier }
@precedence { spaces, newline, opcodeIdentifier }
@precedence { spaces, newline, word }
newline { $[\n\r]+ }
space { $[ \t]+ }
}
So here comes the ambiguity that I’ve been stuck on: Csound allows the following syntax, which triggers reduce/reduce errors in Lezer.
instr 1
signalRate opcode
signalRate1, signalRate2 opcode 1, 2, opcode(1 + 2)
opcode signalRate1, signalRate2, (1 + (2 * 3))
opcode
endin
My focus is on the statement rule itself: an opcode statement with both inputs and outputs, with only inputs or only outputs, or a bare opcode with no arguments at all.
This is my attempt to catch this pattern, which fails. Maybe this is not even possible?
OpcodeStatement {
opcodeOutputExpression Opcode expressionCommaSep newline |
Opcode expressionCommaSep newline |
opcodeOutputExpression Opcode newline |
Opcode newline
}
The error usually complains about SignalRateIdentifier and Opcode conflicting over the identifier token. So I tried applying the ~ operator, but that seems to confuse Lezer even more, and it starts to give even worse results.
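To make the conflict concrete, here is a rough Python sketch. It is not Lezer: the regex and the token kind names are simplifications made up for illustration, and `token_kinds` is a hypothetical helper. It reduces two of the statements above to their token kinds, to show that a statement starting with an output and one starting with an opcode look the same for the first two tokens.

```python
import re

# Simplified token kinds, loosely mirroring the grammar's tokens.
# Illustration only: this is NOT how Lezer tokenizes, and the
# ":i/a/k" identifier suffix and strings are deliberately left out.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d*)?|\.\d+)
  | (?P<identifier>[A-Za-z_]\w*)
  | (?P<punct>[(),+*/-])
  | (?P<space>[ \t]+)
""", re.VERBOSE)


def token_kinds(line):
    """Return the token kinds of one statement line, skipping whitespace."""
    kinds = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected character {line[pos]!r}")
        if m.lastgroup == "punct":
            kinds.append(m.group())      # keep punctuation as itself
        elif m.lastgroup != "space":
            kinds.append(m.lastgroup)    # "identifier" or "number"
        pos = m.end()
    kinds.append("newline")              # the statement terminator
    return kinds


# First identifier is an output (SignalRateIdentifier), second is the Opcode.
output_first = token_kinds("signalRate opcode")
# First identifier is the Opcode, second is an input argument.
opcode_first = token_kinds("opcode signalRate1, signalRate2, (1 + (2 * 3))")

print(output_first[:2])
print(opcode_first[:2])
```

Both lines print `['identifier', 'identifier']`. So when the parser has read the first identifier and sees another identifier coming, it has nothing to tell it whether to reduce the first one to SignalRateIdentifier or to Opcode, which I suspect is exactly the reduce/reduce conflict being reported.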
There’s a lot of context here, but I’d be happy with even the tiniest hint or tip!