Writing a custom parser without using Lezer or StreamParser

thetrevorharmon · July 11, 2022, 10:12pm

I have been working on adding language support for an in-house SQLish language. There already exists a grammar for it written in ANTLR4, as well as a lot of custom logic around parsing and tokenization that I will be able to leverage, so going the typical Lezer route doesn’t make much sense for my use case.

I attempted writing a StreamParser, but discovered that the parsing/tokenization utilities I am using are incompatible with the way stream parsing works: they parse the entire document at once and do not support a line-by-line approach. (I realize that this does not scale; however, the language is extremely limited and even the most complex queries possible are fairly small.)

From this document, it seems my only other option is to write a custom parser. I’ve been trying to explore what that option would look like by looking at the markdown package as an example, as well as exploring the docs for CodeMirror and Lezer.

I’m looking to validate my approach and make sure I am not missing anything obvious; I am also starting this thread as a place to share what I learn. Here are the initial steps from what I’ve gathered:

1. Subclass Language found in @codemirror/language
2. Subclass the Parser class found in @lezer/common
3. Implement the methods for Parser (createParse, startParse, parse)
4. Conform all of the types of my existing language utilities to match the types that Parser expects
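
To make the steps above more concrete, here is a rough, dependency-free sketch of the "parse everything in one go" shape. It is not the real API: the `Sketch*` types are simplified local stand-ins for `Parser`, `PartialParse`, and `Tree` from @lezer/common, and `tokenizeWholeDocument` is a hypothetical placeholder for the existing ANTLR4-based utilities. As far as I can tell, in @lezer/common only `createParse` is abstract (it takes an input, tree fragments, and ranges), while `startParse` and `parse` are provided by the base class, so the real subclass should mostly need `createParse` plus an object with `advance`, `parsedPos`, `stopAt`, and `stoppedAt`.

```typescript
// Simplified stand-in for a syntax tree node (real code would build a Tree
// from @lezer/common instead; this shape is an assumption for illustration).
interface SketchNode { name: string; from: number; to: number; children: SketchNode[] }

// Stand-in mirroring the members of PartialParse in @lezer/common.
interface SketchPartialParse {
  advance(): SketchNode | null;
  readonly parsedPos: number;
  stopAt(pos: number): void;
  readonly stoppedAt: number | null;
}

interface Token { type: string; from: number; to: number }

// Hypothetical whole-document tokenizer standing in for the existing
// ANTLR4-based utilities. It scans the full text in a single pass.
function tokenizeWholeDocument(text: string): Token[] {
  const keywords = ["select", "from", "where"];
  const pattern = /[A-Za-z_]\w*|\d+|[^\s\w]/g;
  const tokens: Token[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const word = m[0];
    let type = "Punctuation";
    if (/^\d/.test(word)) type = "Number";
    else if (keywords.indexOf(word.toLowerCase()) >= 0) type = "Keyword";
    else if (/^[A-Za-z_]/.test(word)) type = "Identifier";
    tokens.push({ type, from: m.index, to: m.index + word.length });
  }
  return tokens;
}

// A "partial" parse that is not partial at all: the first call to advance()
// does all the work and returns the finished tree.
class WholeDocumentParse implements SketchPartialParse {
  parsedPos = 0;
  stoppedAt: number | null = null;

  constructor(private readonly text: string) {}

  stopAt(pos: number): void {
    this.stoppedAt = pos;
  }

  advance(): SketchNode | null {
    // Respect stopAt() by only parsing up to the requested position.
    const limit = this.stoppedAt === null ? this.text.length : this.stoppedAt;
    const end = Math.min(limit, this.text.length);
    const children = tokenizeWholeDocument(this.text.slice(0, end)).map(
      (t): SketchNode => ({ name: t.type, from: t.from, to: t.to, children: [] })
    );
    this.parsedPos = end;
    return { name: "Query", from: 0, to: end, children };
  }
}

// Stand-in for the Parser subclass. The real createParse also receives
// fragments and ranges, which a non-incremental parser can largely ignore.
class SqlishParserSketch {
  createParse(text: string): SketchPartialParse {
    return new WholeDocumentParse(text);
  }

  // Drives advance() until a tree comes back, which is roughly what the
  // base class does for you in the real library.
  parse(text: string): SketchNode {
    const partial = this.createParse(text);
    for (;;) {
      const done = partial.advance();
      if (done) return done;
    }
  }
}
```

Calling `new SqlishParserSketch().parse("SELECT a FROM t")` returns a single `Query` node whose children are the tokens with their document offsets. Since nothing is reused between parses, every edit re-tokenizes the whole document, which is only reasonable because the queries are small.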

The questions that I’m attempting to answer:

1. Is this a valid approach?
2. Is there anything obvious I’m missing?

Thanks in advance!

marijn · July 12, 2022, 6:42am

This would probably work, if you are certain that queries are always going to be small enough that there won’t be a need for incremental parsing.