How can EBNF notation be used to add a new language to CodeMirror?



We are looking to add FEEL, a language introduced by the OMG DMN specification, to CodeMirror. I started looking at how to define a new language, and I was wondering whether there is a mapping available somewhere between EBNF (in which the whole syntax of FEEL is described) and the system used by CodeMirror.



I’m not aware of such a thing, though it would definitely be theoretically possible (and probably not even very hard) to implement. I’ve been hoping someone puts together a good open-source implementation, but no luck so far.


I wasn’t even looking for an automated mapping, but more for a guide on how to “translate” EBNF into a CodeMirror parser.


If your grammar is (mostly) LL(1), you can write a mode that resembles a recursive descent parser in structure, but keeps its stack as a first-class data structure (because modes have to run one token at a time and can’t use the regular function call stack). The JavaScript mode does something like this. It’s relatively complicated, partly because JavaScript has a horrible grammar, but also because it has to handle syntactically invalid input gracefully (and preferably recover and color the code after it properly).
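To make the explicit-stack idea concrete, here is a rough sketch for a toy grammar (expr := "if" expr "then" expr "else" expr | NUMBER | NAME). This grammar is made up for illustration and is not FEEL. The object follows the shape of a CodeMirror 5 mode (startState, copyState, token), but MiniStream and highlight are hypothetical stand-ins so the snippet runs without CodeMirror, and the error recovery is deliberately crude.

```javascript
// Stack marker for "an expression is expected here" (hypothetical name).
const EXPR = "<expr>";
const KEYWORDS = new Set(["if", "then", "else"]);

// Stand-in for the few StringStream methods used below; not the real class.
class MiniStream {
  constructor(text) { this.text = text; this.pos = 0; }
  eol() { return this.pos >= this.text.length; }
  next() { return this.text.charAt(this.pos++); }
  eatSpace() {
    const start = this.pos;
    while (/\s/.test(this.text.charAt(this.pos))) this.pos++;
    return this.pos > start;
  }
  match(re) {
    const m = re.exec(this.text.slice(this.pos));
    if (!m || m.index !== 0) return null;
    this.pos += m[0].length;
    return m;
  }
}

const toyMode = {
  // The parse stack lives in the state, so it survives between tokens and lines.
  startState() { return { stack: [EXPR] }; },
  copyState(state) { return { stack: state.stack.slice() }; },

  token(stream, state) {
    if (stream.eatSpace()) return null;
    const expected = state.stack.pop(); // undefined once nothing more is expected
    let m;
    if (stream.match(/^\d+/)) {
      if (expected === EXPR) return "number";
    } else if ((m = stream.match(/^[A-Za-z_]\w*/))) {
      const word = m[0];
      if (word === "if" && expected === EXPR) {
        // Expand the production; pushed in reverse so the condition is on top.
        state.stack.push(EXPR, "else", EXPR, "then", EXPR);
        return "keyword";
      }
      if (KEYWORDS.has(word) && word === expected) return "keyword";
      if (!KEYWORDS.has(word) && expected === EXPR) return "variable";
    } else {
      stream.next(); // unknown character: consume it so the stream advances
    }
    // Crude recovery: flag the token and keep waiting for what was expected.
    if (expected !== undefined) state.stack.push(expected);
    return "error";
  }
};

// Hypothetical driver that feeds one line through the mode, token by token.
function highlight(mode, line, state = mode.startState()) {
  const stream = new MiniStream(line);
  const out = [];
  while (!stream.eol()) {
    const start = stream.pos;
    const style = mode.token(stream, state);
    if (style) out.push([line.slice(start, stream.pos), style]);
  }
  return out;
}

console.log(highlight(toyMode, "if x then 1 else 2"));
```

Each grammar rule becomes "pop what is expected, look at one token, push what should follow", which is the part you would derive by hand from the EBNF productions.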

(If you don’t need a ‘deep’ parse, you can get away with just implementing a tokenizer, plus maybe some simple rules for tracking context for indentation, which is what most modes do.)
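For comparison, a tokenizer-only mode with a little indentation context could look roughly like the sketch below. The keyword list and token rules are made up for illustration and are not FEEL. The factory takes a config with indentUnit, the way a function passed to CodeMirror.defineMode does in CodeMirror 5; MiniStream and highlight are hypothetical stand-ins so the snippet runs on its own.

```javascript
// Stand-in for the few StringStream methods used below; not the real class.
class MiniStream {
  constructor(text) { this.text = text; this.pos = 0; }
  eol() { return this.pos >= this.text.length; }
  next() { return this.text.charAt(this.pos++); }
  match(re) {
    const m = re.exec(this.text.slice(this.pos));
    if (!m || m.index !== 0) return null;
    this.pos += m[0].length;
    return m;
  }
}

// Illustrative keyword list (an assumption, not taken from any specification).
const KEYWORDS = new Set(["if", "then", "else", "for", "in", "return"]);

function flatMode(config) {
  return {
    // The only context tracked is the bracket nesting depth.
    startState() { return { depth: 0 }; },

    token(stream, state) {
      if (stream.match(/^\s+/)) return null;
      // A string runs to its closing quote, or to end of line if unterminated.
      if (stream.match(/^"(?:[^"\\]|\\.)*"?/)) return "string";
      if (stream.match(/^\d+(?:\.\d+)?/)) return "number";
      const word = stream.match(/^[A-Za-z_]\w*/);
      if (word) return KEYWORDS.has(word[0]) ? "keyword" : "variable";
      const ch = stream.next();
      if ("([{".includes(ch)) state.depth++;
      else if (")]}".includes(ch)) state.depth = Math.max(0, state.depth - 1);
      return null;
    },

    // Indent one unit per open bracket; a leading closer dedents its own line.
    indent(state, textAfter) {
      const closing = /^\s*[)\]}]/.test(textAfter) ? 1 : 0;
      return Math.max(0, state.depth - closing) * config.indentUnit;
    }
  };
}

// Hypothetical driver that feeds one line through the mode, token by token.
function highlight(mode, line, state = mode.startState()) {
  const stream = new MiniStream(line);
  const out = [];
  while (!stream.eol()) {
    const start = stream.pos;
    const style = mode.token(stream, state);
    if (style) out.push([line.slice(start, stream.pos), style]);
  }
  return out;
}

const mode = flatMode({ indentUnit: 2 });
console.log(highlight(mode, 'if x = "apple" then ('));
```

There is no grammar in here at all, only token shapes plus a counter, which is why this style is much quicker to write than a full parse.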


Thanks Marijn for those insights.

The biggest difficulty with FEEL is that it allows spaces in variable and function names, which makes me doubt it is LL(1). I have to look ahead until I find a match with one of the existing variables to know that it is a variable. They even allow language keywords (such as if and for) to show up in those variable names, though not at the beginning. So in FEEL something like the following could be valid, assuming “Gift for teacher” is a variable in the context.

if Gift for teacher = "apple" then
return "grocery store"
return "mall"

So the highlighting produced in that post isn’t that bad, except that “for” should not be black, as it is not a keyword in that case.
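One way to get “for” colored correctly here is to try the known variable names, longest first, before falling back to keyword matching. The sketch below assumes the host application can supply the list of names in the context; tokenize, makeNameMatcher, nameToRegExp and the keyword list are all hypothetical, and it works on a single line because mode tokens cannot span lines.

```javascript
// Illustrative keyword list (an assumption, not the full FEEL set).
const KEYWORDS = new Set(["if", "then", "else", "for", "in", "return"]);

// Turn a name into an anchored RegExp that tolerates any run of whitespace
// between its words and refuses to stop in the middle of a longer word.
function nameToRegExp(name) {
  const words = name.trim().split(/\s+/)
    .map(w => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp("^" + words.join("\\s+") + "(?![A-Za-z0-9_])");
}

// Returns a function that finds the longest known name at the start of a string.
function makeNameMatcher(names) {
  const patterns = [...names]
    .sort((a, b) => b.length - a.length) // longest candidates first
    .map(nameToRegExp);
  return rest => {
    for (const re of patterns) {
      const m = re.exec(rest);
      if (m) return m[0];
    }
    return null;
  };
}

function tokenize(line, names) {
  const matchName = makeNameMatcher(names);
  const out = [];
  let pos = 0;
  while (pos < line.length) {
    const rest = line.slice(pos);
    let text, style = null, m;
    const known = matchName(rest);
    if ((m = /^\s+/.exec(rest))) {
      text = m[0];
    } else if (known !== null) {
      // Known names win over keywords, so "for" inside a name is not a keyword.
      text = known; style = "variable";
    } else if ((m = /^"(?:[^"\\]|\\.)*"?/.exec(rest))) {
      text = m[0]; style = "string";
    } else if ((m = /^[A-Za-z_]\w*/.exec(rest))) {
      // Words that are neither known names nor keywords are left unstyled here.
      text = m[0]; style = KEYWORDS.has(text) ? "keyword" : null;
    } else {
      text = rest[0];
    }
    if (style) out.push([text, style]);
    pos += text.length;
  }
  return out;
}

console.log(tokenize('if Gift for teacher = "apple" then', ["Gift for teacher"]));
```

Inside an actual mode the same lookup would run against the rest of the current line in the stream, and the list of names would have to reach the mode somehow, for example through its configuration.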


It’d be great if CodeMirror (perhaps the new version 6) had direct support for EBNF grammars. I’m using ANTLR to specify a custom language for a project and will most likely use CodeMirror for the stylized display. Being able to use the same grammar for both the language parsing and CodeMirror would greatly simplify the process. It may also be possible to define an EBNF grammar and use ANTLR to translate it into a CodeMirror-specific parser. ANTLR has a JavaScript target, so this should be possible.