Greetings,
I am trying to create a grammar that compiles a very simple language. I have [mark] and [/mark] tags that I want to parse. Here are some examples:
hello world
should be parsed as a single Text node
[mark]hello world[/mark]
should be parsed as MarkTag => (OpenTag Text CloseTag)
bla [mark]bla[/mark] bla
should be parsed as Text MarkTag => (OpenTag Text CloseTag) Text
[mark] [mark] [/mark] [/mark]
should match the first opening tag with the first closing tag, i.e. opening tags inside a mark block are treated as text. No nesting.
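To pin down the behaviour I am after, here is a rough reference sketch in plain Python (not Lezer). The function name `parse` and the tuple layout are made up for illustration. It also assumes two things the examples above do not settle: an unclosed [mark] and a stray [/mark] are both treated as plain text, and an empty mark block gets no Text child.

```python
OPEN, CLOSE = "[mark]", "[/mark]"

def parse(line):
    """Return a flat list of (node_name, value) pairs for one line.

    Hypothetical reference for the desired result, not a Lezer parser.
    """
    nodes = []
    pos = 0
    while pos < len(line):
        start = line.find(OPEN, pos)
        # Pair the opening tag with the FIRST closing tag after it (no nesting).
        end = line.find(CLOSE, start + len(OPEN)) if start != -1 else -1
        if start == -1 or end == -1:
            # Assumption: without a complete tag pair, the rest is plain text.
            nodes.append(("Text", line[pos:]))
            break
        if start > pos:
            nodes.append(("Text", line[pos:start]))
        inner = line[start + len(OPEN):end]
        children = [("OpenTag", OPEN)]
        if inner:
            # Anything inside the block, including "[mark]", is just text.
            children.append(("Text", inner))
        children.append(("CloseTag", CLOSE))
        nodes.append(("MarkTag", children))
        pos = end + len(CLOSE)
    return nodes
```

For the last example, `parse("[mark] [mark] [/mark] [/mark]")` gives one MarkTag whose Text child is " [mark] ", followed by a Text node holding the leftover " [/mark]".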
Here are some attempts:
@top Program { expression* }
expression { MarkTag | Text }
MarkTag { OpenTag Text CloseTag }
@tokens {
  OpenTag { "[mark]" }
  CloseTag { "[/mark]" }
  Text { ![\n]+ }
  @precedence { OpenTag, CloseTag, Text }
}
Despite the @precedence, once the tokenizer starts matching something as Text, it consumes the entire rest of the line that way.
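The effect looks the same as greedy matching with an ordinary regular expression. The snippet below is only an analogy in plain Python, assuming the Text token behaves like the regex `[^\n]+`. It is not Lezer's tokenizer, and `text_token_at` is a made-up helper name.

```python
import re

# Stand-in for the Text token ![\n]+ (assumption: same greedy behaviour).
TEXT = re.compile(r"[^\n]+")

def text_token_at(line, pos):
    """Return what a greedy Text-like token would consume starting at pos."""
    match = TEXT.match(line, pos)
    return match.group(0) if match else ""

line = "bla [mark]bla[/mark] bla"
# Starting at position 0, nothing stops the match before the end of the line,
# so the tags in the middle are swallowed as part of the text.
swallowed = text_token_at(line, 0)
```

If that analogy holds, the precedence declaration would only matter at positions where a tag and Text both start, which never comes up once Text has already run past the tags.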
@top Program { expression* }
expression { MarkTag | Text }
MarkTag { OpenTag Text CloseTag }
Text { chars ("[" chars)* }
@tokens {
  OpenTag { "[mark]" }
  CloseTag { "[/mark]" }
  chars { ![[\n]+ }
  @precedence { OpenTag, CloseTag, Text }
}
This attempted to interrupt the text on every [, to get it to consider OpenTag and CloseTag again, but it doesn't work in all cases, such as [mark][[/mark], because it requires extra chars around the open bracket. Any attempt I made to fix that issue (such as replacing the + in chars with *) created shift/reduce problems.
Two of us have been working on this for ages, but we don’t seem to be getting anywhere. How do I fix this grammar?