What's wrong with this grammar file?

Ranjit-s94 · March 6, 2024, 6:54am

I am new to the Lezer parser generator and I am getting a few warnings. The grammar is below:

@top BuilderExpression { Expression }

@tokens {
  ComparisonOperator { ">" | "<" | ">=" | "<=" | "==" | "=" | "!=" }
  PlusMinus { "+" | "-" }
  StarObelus { "*" | "/" }
  NotQuoteChar { !["] }
  String { "\"" NotQuoteChar* "\"" }
  Digit { '0' | $[1-9] }
  TransformerName { "min_max_scaler" | "standard_scale" }
  FunctionName { "if" | "and" | "or" | "not" | "rename" | "toUppercase" | "toLowercase" | "trim" | "contains" | "indexOf" | "lastIndexOf" | "replace" | "length" | "match" }
  Space { " " }
  Round { "(" | ")" }
}

Expression {
  Expression "or" Comparison
  | Comparison
}

Comparison {
  Comparison ComparisonOperator Item
  | Item
}

TimesDivide { 
  TimesDivide SpaceFactor StarObelus  SpaceFactor Factor |
  Factor
}

SpaceFactor {
  Item Space Item | AddSubtract "(" Expression ")"
}

AddSubtract {
  Item PlusMinus Term | RoundTerm "(" Expression ")" 
}

RoundTerm {
  Item Round Term
}

Item {
  Item PlusMinus Term
  | Term
}

Term {
  Term StarObelus Factor
  | Factor
}

Factor {
  Value
  | Parentheses
}

Parentheses {
  "(" Expression ")"
}

Value {
  DataFrame
  | Column
  | Function
  | String
  | Number
  | Bool
}

Function {
  Transformer
  | RegularFunction
}

Transformer {
  TransformerName "(" Expression ("," Expression)* ")"
}

RegularFunction {
  FunctionName "(" Expression ("," Expression)* ")"
}

DataFrame {
  "[" String "," String "," String* "]"
}

Column {
  "[" String "]"
}

Number {
  Float
  | Integer
}

Bool {
  "true"
  | "false"
}

Float {
  Integer "." Integer
}

Integer {
  Digit+
}

@detectDelim

These are the warnings printed while generating the parser:

Unused rule 'Space' (src/my.grammar 12:2)
Unused rule 'Round' (src/my.grammar 13:2)
Unused rule 'TimesDivide' (src/my.grammar 26:0)
Unused rule 'SpaceFactor' (src/my.grammar 31:0)
Unused rule 'AddSubtract' (src/my.grammar 35:0)
Unused rule 'RoundTerm' (src/my.grammar 39:0)

Please help me resolve these.

marijn · March 6, 2024, 9:04am

The warnings seem pretty obvious. You're not using TimesDivide anywhere, so it, and all the rules that are only reachable through it (including the Space and Round tokens), are marked as unused.

Also, it is strongly recommended not to use the classical CFG style of building a deep tower of rules to represent precedence (Expression, Comparison, Item, etc.), especially with all of them capitalized, because that produces ridiculously deeply nested trees. Instead, use a general expression rule (non-capitalized) that contains many different expression types (as capitalized node types), and use precedence (!) markers instead of nesting to control relative precedence.
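As a minimal sketch of that approach (untested, not a drop-in replacement), the operator part of the grammar could look like this. It reuses the token names from the question; the rule names BinaryExpression and ParenthesizedExpression, the precedence names times/plus/compare/logic, and the `space` skip token are illustrative choices, not anything the original grammar defines. Precedences listed earlier in `@precedence` bind more tightly, and `@left` makes each operator left-associative.

```
// Earlier entries bind tighter; @left makes each level left-associative
@precedence {
  times @left,
  plus @left,
  compare @left,
  logic @left
}

@top BuilderExpression { expression }

// Whitespace is skipped between tokens instead of being matched explicitly
@skip { space }

// Lowercase rule: does not create a node, so values are not wrapped in layers
expression {
  Number |
  String |
  ParenthesizedExpression |
  BinaryExpression
}

ParenthesizedExpression { "(" expression ")" }

// One flat node type for every binary operator; the ! markers
// decide which operator wins instead of nested rules
BinaryExpression {
  expression !times StarObelus expression |
  expression !plus PlusMinus expression |
  expression !compare ComparisonOperator expression |
  expression !logic "or" expression
}

@tokens {
  space { @whitespace+ }
  ComparisonOperator { ">" | "<" | ">=" | "<=" | "==" | "=" | "!=" }
  PlusMinus { "+" | "-" }
  StarObelus { "*" | "/" }
  String { "\"" !["]* "\"" }
  // Assumption: a number is one token, so no Integer/Digit child nodes
  Number { @digit+ ("." @digit+)? }
}

@detectDelim
```

With these markers, an input like `1 + 2 * 3 > 4` should come out as a single BinaryExpression for `>` whose left side is the BinaryExpression for `+` (which in turn contains the one for `*`), rather than a chain of Expression, Comparison, Item, Term and Factor nodes around every value.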

Ranjit-s94 · March 7, 2024, 6:47am

I have updated the grammar file as shown below. It now generates a parser, but the tree structure is not what I expected. @marijn, could you please help me refactor the grammar file so that I get the correct tree structure?

The grammar file:

@top BuilderExpression { Expression }

@tokens {
  ComparisonOperator { ">" | "<" | ">=" | "<=" | "==" | "=" | "!=" }
  PlusMinus { "+" | "-" }
  StarObelus { "*" | "/" }
  NotQuoteChar { !["] }
  String { "\"" NotQuoteChar* "\"" }
  Digit { '0' | $[1-9] }
  TransformerName { "min_max_scaler" | "standard_scale" }
  FunctionName { "contains" | "concat" | "indexOf" | "lastIndexOf" | "slice" | "split" | "splitByLengths" | "length" | "match" | "startsWith" | "endsWith" | "replace" | "trim" | "toUppercase" | "toLowercase" | "toNumber" | "toText" | "if" | "or" | "not" | "and" }
  Space { " " }
  Round { "(" | ")" }
} 

Expression {
  Expression "or" Comparison
  | Comparison
}

Comparison {
  Comparison ComparisonOperator Item
  | Item
}

Item {
  RoundTerm SpaceFactor 
}

RoundTerm {
  AddSubtract Round | TimesDivide Round
}

AddSubtract {
  Item PlusMinus Item
}

TimesDivide { 
  Term StarObelus Factor |
  Factor
}

SpaceFactor {
  Item Space
}

Term {
  Term StarObelus Factor
  | Factor
}

Factor {
  Value
  | Parentheses
}

Parentheses {
  "(" Expression ")"
}

Value {
  DataFrame
  | Column
  | Function
  | String
  | Number
  | Bool
}

Function {
  Transformer
  | RegularFunction
}

Transformer {
  TransformerName "(" Expression ("," Expression)* ")"
}

RegularFunction {
  FunctionName "(" Expression ("," Expression)* ")"
}

DataFrame {
  "[" String "," String "," String* "]"
}

Column {
  "[" String "]"
}

Number {
  Float
  | Integer
}

Bool {
  "true"
  | "false"
}

Float {
  Integer "." Integer
}

Integer {
  Digit+
}

@detectDelim

The tree this grammar currently produces:

BuilderExpression(Expression(Comparison(Item(RoundTerm(TimesDivide(Term(Factor(Value(Function(RegularFunction(FunctionName,Expression(Comparison(Comparison(Item(RoundTerm(TimesDivide(Term(Factor(Value(Column(String))),⚠(Space)),⚠),⚠),⚠)),ComparisonOperator,⚠(Space),Item(RoundTerm(TimesDivide(Factor(Value(Number(Integer(Digit))))),Round),⚠))),⚠))))),⚠),⚠),⚠))))

The tree I expect:

BuilderExpression(Expression(Item(Term(SpaceFactor(Space,Factor(Value(Function(RegularFunction(FunctionName,Expression(Comparison(Item(Term(SpaceFactor(Space,Factor(Value(Column(String))),Space))),ComparisonOperator,Item(Term(SpaceFactor(Space,Factor(Value(Number(Integer(Digit)))),Space))))))))),Space)))))

marijn · March 7, 2024, 7:42am

Looking at the structure of this example grammar might point you in the right direction.
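Part of the reason the second grammar produces mostly error (⚠) nodes is that Item has no base case: Item requires a SpaceFactor, and SpaceFactor requires another Item, so no finite input can match it. Whitespace is also usually better handled with `@skip` than with an explicit Space token, because skipped tokens may appear between any two tokens and a lowercase token name keeps them out of the tree. The sketch below combines that with flat value types. It is an untested assumption, not a definitive fix: FunctionName is trimmed for brevity, Number is turned into a single token, the comparison is deliberately non-chained, and `ArgList`, `value` and `space` are hypothetical names.

```
@top BuilderExpression { expression }

// Spaces are skipped rather than matched, so they never appear in the tree
@skip { space }

expression { Comparison | value }

// A single, non-chained comparison keeps this sketch free of precedence markers
Comparison { value ComparisonOperator value }

// Lowercase: groups the alternatives without adding a wrapper node
value {
  String | Number | Bool | Column | DataFrame | Transformer | RegularFunction
}

Transformer { TransformerName ArgList }
RegularFunction { FunctionName ArgList }
ArgList { "(" expression ("," expression)* ")" }

Column { "[" String "]" }
DataFrame { "[" String "," String "," String* "]" }
Bool { "true" | "false" }

@tokens {
  space { @whitespace+ }
  ComparisonOperator { ">" | "<" | ">=" | "<=" | "==" | "=" | "!=" }
  String { "\"" !["]* "\"" }
  // Assumption: integers and floats as one token instead of Integer/Digit rules
  Number { @digit+ ("." @digit+)? }
  TransformerName { "min_max_scaler" | "standard_scale" }
  // Trimmed list for the sketch; the full list from the question would go here
  FunctionName { "contains" | "length" | "trim" | "replace" }
}

@detectDelim
```

For an input such as `contains( ["price"] > 5 )` this should give a tree along the lines of `BuilderExpression(RegularFunction(FunctionName, ArgList(Comparison(Column(String), ComparisonOperator, Number))))`. Arithmetic and chained operators would then be added with the precedence markers described earlier in the thread.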