Hi!
I built a grammar to parse a simple filter syntax. There's not much to it, but in certain cases I get an unexpected result that I don't understand. Maybe someone can help me figure it out.
I'm using the Lezer playground to inspect the resulting syntax tree.
The grammar is defined like this:
```
@top Filter { expression* }
expression { FilterGroup | FilterStatement | LogicalOperator | space }
FilterGroup { "(" (FilterStatement | LogicalOperator | FilterGroup | space)+ ")" }
FilterStatement { FilterKey space? (comparison | inclusion) }
comparison { ComparisonOperator space? FilterSimpleValue }
// comparison { (ComparisonOperator | InvalidComparisonOperator) space? FilterSimpleValue }
inclusion { Inclusion space? FilterIncludesValue }
FilterKey { Identifier | String }
FilterSimpleValue[@isGroup=FilterValue] { ((PositionOperator | LikeOperator)? (Identifier | String) PositionOperator?) }
FilterIncludesValue[@isGroup=FilterValue] {
  List | IncompleteList
}
@tokens {
  space { @whitespace }
  ComparisonOperator { '=' | '!=' | '<' | '<=' | '>' | '>=' }
  InvalidComparisonOperator { $[A-Za-z0-9!@#$%^&*?,_\\.-/]+ space }
  LogicalOperator { 'and' | 'AND' | 'or' | 'OR' }
  Inclusion { 'in' | '!in' | 'IN' | '!IN'}
  PositionOperator { "*" }
  LikeOperator { "?" }
  // Account for empty pair of brackets to not immediately start a new group, but wait
  // for a range / list to be finished inside a filter statement.
  // Range { "(" (Identifier space? ("to" | "TO") space? Identifier) ")" }
  List { ("(" space* ")" | "(" space? (Identifier | String) ("," space? (Identifier | String))* space? ")") }
  Identifier { $[A-Za-z0-9_] $[A-Za-z0-9_.-]* }
  String { '"' !["]* '"' }
  IncompleteRange { "(" ((space? Identifier?) | (space? Identifier space?)) (($[tT]+$[oO]?)? | ("to" | "TO") space? Identifier?) ")"? }
  IncompleteList { "(" ")"? }
  @precedence { LogicalOperator, Identifier }
  @precedence { Range, IncompleteRange }
  @precedence { List, IncompleteList }
  @precedence { Inclusion, InvalidComparisonOperator }
}
```
Valid input is parsed as expected. To make sure I can show proper error messages, I tested some invalid statements and got unexpected results.
When I enter `foo , ba`, I get a more or less expected result: one filter statement containing a key, two errors, and a value.
Changing the input to `foo , bar` suddenly produces two filter statements, and I have no idea why there should be a difference between `ba` and `bar`. As far as I can tell, the number of characters should not matter for the filter value.
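To double-check that claim at the token level, here is a small TypeScript sketch. It hand-translates a few of the `@tokens` rules above into JavaScript regular expressions (the translation and the `matchingTokens` helper are my own, not part of Lezer) and lists which rules each word of the input could match in isolation. It does not model Lezer's contextual tokenizer, `@precedence`, or error recovery, so it only shows that `ba` and `bar` look the same to the token definitions themselves.

```typescript
// Hand-translated approximations of some @tokens rules from the grammar above.
// Assumption: these regexes mirror the Lezer definitions. Lezer itself builds a
// contextual tokenizer and applies @precedence, neither of which is modeled here.
const tokenPatterns: Array<[string, RegExp]> = [
  ["LogicalOperator", /^(?:and|AND|or|OR)$/],
  ["Inclusion", /^(?:in|!in|IN|!IN)$/],
  ["ComparisonOperator", /^(?:=|!=|<|<=|>|>=)$/],
  // Identifier { $[A-Za-z0-9_] $[A-Za-z0-9_.-]* }
  ["Identifier", /^[A-Za-z0-9_][A-Za-z0-9_.-]*$/],
];

// Hypothetical helper: names of all rules whose pattern matches the whole word.
function matchingTokens(word: string): string[] {
  return tokenPatterns
    .filter(([, pattern]) => pattern.test(word))
    .map(([name]) => name);
}

// Words from the two test inputs, split on whitespace.
for (const word of ["foo", ",", "ba", "bar"]) {
  console.log(JSON.stringify(word), matchingTokens(word));
}
```

Both `ba` and `bar` match only `Identifier`, and `,` matches none of these rules, so whatever causes the different trees does not seem to come from the `Identifier` definition alone.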
Any input would be highly appreciated.
Best,
Peter