Hello! I’m currently looking at writing a Lezer grammar/parser for systemd-style ini files for a personal project, and hit a couple of snags.
While the actual EBNF of ini files is simple, I’ve had a lot of issues getting whitespace and comments to behave correctly.
The first issue is that comments are only valid if the line starts (ignoring leading whitespace) with a #. So for instance:
# This is a valid comment
x = 2 # but this is actually part of the value
This suddenly makes comments much harder to use in @skip rules, as they’re only valid in certain contexts.
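To pin down the rule I mean, here is a minimal TypeScript sketch that classifies a single physical line, outside of Lezer entirely. The helper name isCommentLine is made up for illustration, and accepting ; as well as # is an assumption taken from the Comment token in my grammar below.

```typescript
// Hypothetical helper (not part of Lezer): true when a physical line is a
// comment, i.e. its first non-blank character is "#" or ";".
// Assumption: only spaces and tabs count as leading whitespace.
function isCommentLine(line: string): boolean {
  return /^[ \t]*[#;]/.test(line);
}

// A "#" later in the line does not start a comment; it is part of the value.
const samples = ["# This is a valid comment", "x = 2 # but this is actually part of the value"];
for (const sample of samples) {
  console.log(JSON.stringify(sample), isCommentLine(sample));
}
```

So the decision depends on where the line starts, not on the # character alone, which is what makes a plain skip token awkward.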
The other nasty feature is continuation lines. As in many other languages, a line can end with \ so that the line break is treated as a space rather than a new line. However, the following line(s) may be comments, which are skipped before the actual content resumes:
[Section header\
# A comment
continued]
This can more easily be expressed as a @skip rule. However, in the above case, the “continued” section is parsed as a syntax error, and I’m not quite sure why:
- Section: "[Section header\\\n# A comment\ncontinued]\n"
- SectionHeader: "[Section header\\\n# A comment\ncontinued]"
- SectionName: "Section header"
- Comment: "# A comment" (skipped)
- ⚠: "continued" (error, skipped)
- SectionEnd: "]"
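For reference, the continuation behaviour I’m trying to capture can be sketched as a line-joining pass in TypeScript. This is only an illustration of the semantics, not something I can use directly: it throws away source positions, which a Lezer tree needs. The function name joinContinuations is made up, and treating a top-level comment line as never continuing is my assumption.

```typescript
// Hypothetical pre-processing sketch (not Lezer API): fold physical lines
// into logical lines. A trailing "\" is replaced by a space and the next
// line is appended; comment lines inside a continuation are dropped.
function joinContinuations(text: string): string[] {
  const logical: string[] = [];
  let pending: string | null = null; // text carried over from lines ending in "\"
  for (const line of text.split("\n")) {
    const isComment = /^[ \t]*[#;]/.test(line);
    if (isComment) {
      // Inside a continuation the comment is skipped; otherwise it is kept
      // as its own logical line (assumption: comments never continue).
      if (pending === null) logical.push(line);
      continue;
    }
    const current: string = pending === null ? line : pending + line;
    if (current.endsWith("\\")) {
      pending = current.slice(0, -1) + " ";
    } else {
      logical.push(current);
      pending = null;
    }
  }
  if (pending !== null) logical.push(pending); // dangling "\" at end of input
  return logical;
}

console.log(joinContinuations("[Section header\\\n# A comment\ncontinued]"));
```

With the example above, the three physical lines fold into the single logical line "[Section header continued]", which is the result I’d like the section name nodes to reflect.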
For completeness, here is the whole grammar:
ini.grammar
@skip { space | Comment eol | "\n" } {
  @top Unit {
    Section*
  }
  Section {
    sectionHeader
  }
}
@skip { cont (space? Comment eol)* } {
  sectionHeader { SectionHeader eol }
  SectionHeader { "[" SectionName* SectionEnd }
  SectionEnd { "]" }
  SectionName { sectionName | "\\" | "]" }
}
@tokens {
  eol { @eof | "\n" }
  space { $[ \t]+ }
  Comment { $[#;] ![\n]* }
  cont { "\\" eol }
  sectionName { ![\n\\\]]+ }
}
Was wondering if anyone with more familiarity with Lezer would be able to offer some thoughts on what I’m doing, and if there’s any other alternative routes I should be trying? It’s possible this is solvable with a context tracker + custom tokeniser, but wasn’t quite sure how to go about that.