Is Lezer the right tool for parsing a domain-specific markup language?

I’m developing a simple markup language for structuring comic book scripts, along with some browser-based tools for manipulating those scripts.

Lezer seemed like precisely what I needed to create my parser, but writing a parser specification is very new to me, and I’m struggling with some odd errors. While working from the examples of other parsers included in CodeMirror 6, I ran across the Markdown parser. Its documentation said that Lezer was not well-suited to parsing Markdown, so the Markdown parser had been written separately to act like a Lezer-generated parser, even though it wasn’t one.

Markdown was an inspiration for my project, so I’m now wondering if I’m using the wrong tool for the job.

My language, Serifu (I’ve written a fairly rigorous description here), is meant to make it easy for a writer to specify the page, layout panel, and speaker of the lines of dialogue that make up a comic book script, in a way that feels natural but is also rigorously parseable. It looks sort of like this:

# PAGE 1
    - 1.1
    Title: My Great Comic Book
# PAGE 2
# PAGE 3
    - 3.1
    Superman: Hello, world!
    Lois: Hey, Supes.
    - 3.2
    Lex: I *hate* this guy.
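
To make the line-oriented structure of the sample concrete, here is a minimal TypeScript sketch of a block-level pass that classifies each line on its own. The rules (`# PAGE n` for pages, `- p.n` for panels, `Speaker: text` for dialogue) are assumptions inferred only from the sample above, not from the full Serifu description, and `classifyLine` is a hypothetical name. Under these rules, `Title:` is classified as a dialogue line whose speaker is `Title`.

```typescript
// Hypothetical line kinds, inferred only from the sample script (not the Serifu spec).
type SerifuLine =
  | { kind: "page"; page: number }
  | { kind: "panel"; panel: string }
  | { kind: "dialogue"; speaker: string; text: string }
  | { kind: "blank" }
  | { kind: "unknown"; raw: string };

// Classify one line by its leading characters; indentation (tabs or spaces) is ignored.
function classifyLine(raw: string): SerifuLine {
  const line = raw.trim();
  if (line === "") return { kind: "blank" };

  // Assumed page heading form: "# PAGE <number>".
  const page = /^#\s*PAGE\s+(\d+)$/.exec(line);
  if (page) return { kind: "page", page: Number(page[1]) };

  // Assumed panel marker form: "- <page>.<panel>".
  const panel = /^-\s*(\d+\.\d+)$/.exec(line);
  if (panel) return { kind: "panel", panel: panel[1] };

  // Assumed dialogue form: speaker is everything before the first colon.
  const colon = line.indexOf(":");
  if (colon > 0) {
    return {
      kind: "dialogue",
      speaker: line.slice(0, colon).trim(),
      text: line.slice(colon + 1).trim(),
    };
  }
  return { kind: "unknown", raw: line };
}

const sample = [
  "# PAGE 3",
  "    - 3.1",
  "    Superman: Hello, world!",
  "    Lex: I *hate* this guy.",
];
for (const l of sample) console.log(classifyLine(l));
```

Because each line can be classified without looking at any other line, this level of the format maps fairly naturally onto a token-at-a-time parser; the dialogue text itself is a separate question.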

My current Lezer grammar for this project is here. I’m not asking for anything to be debugged, but if Lezer is just the wrong approach to this problem, a hint to that effect would be very helpful. 🙂

If your language can conceptually be tokenized one token at a time, you can probably fit it into an LR parser. But if it does things like Markdown’s two-level parsing (first dividing the text into blocks, then parsing the block content) and arbitrary lookahead (to interpret the start of emphasis, you first have to find the matching end token), then indeed, trying to formulate it as an LR parser is going to be difficult.
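
The arbitrary-lookahead problem shows up in a minimal hand-written TypeScript sketch of inline emphasis for one line of dialogue, such as `I *hate* this guy.`. It assumes, hypothetically, that emphasis is a pair of `*` on the same line with no nesting, and `parseInline` is an illustrative name, not part of the Lezer or `@lezer/markdown` API. When the scanner reaches a `*`, it must search forward for a closing `*` before it knows whether it has emphasis or a literal asterisk, a decision a parser that commits to each token as it reads it cannot easily make.

```typescript
// Hypothetical inline node type: plain text or emphasized text.
type Inline = { type: "text"; value: string } | { type: "em"; value: string };

// Parse `*emphasis*` spans in a single line of text (assumed rule: same-line pairs, no nesting).
function parseInline(text: string): Inline[] {
  const out: Inline[] = [];
  let buf = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] === "*") {
      // Arbitrary lookahead: scan the rest of the line for a closing marker.
      const close = text.indexOf("*", i + 1);
      if (close > i + 1) {
        // Found a non-empty pair: flush pending text, then emit the emphasis node.
        if (buf) out.push({ type: "text", value: buf });
        buf = "";
        out.push({ type: "em", value: text.slice(i + 1, close) });
        i = close + 1;
        continue;
      }
    }
    // No closing marker (or an immediately adjacent one): keep this character as literal text.
    buf += text[i];
    i++;
  }
  if (buf) out.push({ type: "text", value: buf });
  return out;
}

console.log(parseInline("I *hate* this guy."));
```

With these rules, `parseInline("5 * 3 = 15")` yields a single text node, because no closing `*` is found. The two sketches together mirror the two-level approach described above: split the script into lines first, then run a separate pass over each line’s content.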
