Hi. I’d like to write a CM6 highlighter for my language. I’m new, and I might be misunderstanding.
I’m having some trouble. I think I understand what’s wrong, but I’m not confident of my understanding, and if I am right, I don’t know how to resolve it.
The Language
The basis of my problem is a detail of the language I’m trying to parse, FSL, which is heavily reliant on what lisps and prologs call atoms, or what perl calls barewords.
In this language, you can tell from context when something is a bareword, and when something is a language token. As a result, it’s actually perfectly fine to have barewords which appear to collide with language tokens.
In the language,
foo -> bar -> baz;
This is a chain of three barewords. We can tell because they’re separated by arrows, which don’t get used anywhere else. My parser will parse this as a Chain. Underneath are three Atoms and two Arrows, plus a Terminator (the semicolon).
This matters because language tokens tend to look like barewords. For example, you can write
state foo: { shape: circle; };
This is a StateDecl (declaration) filled with a StateDeclItem.
The language realizes from context that state is a keyword, not an atom’s name.
I wrote a grammar for Lezer which is able to parse this correctly. It kinda feels like PEG, which is nice. As a result, the parser handles this, which otherwise seems ambiguous:
state -> country;
state country: { label: "A nation"; };
state state: { label: "A state"; };
The Problem
The problem is, what I receive in the editor is a thing that’s marked Chain, and another marked StateDecl. I don’t seem to be able to highlight the Atoms or the Arrows inside of the chain, or the StateDeclItem inside the StateDecl.
What I want instead is to reference the Arrows and Strings and Numbers and Atoms and so on. Almost all my rules are compound in this way, and so I don’t entirely know how to move forwards.
As a result, I’m only able to meaningfully highlight the rules which are complete as their top-level expression, like line comments and flow declarations.
The other problem is that I’m having trouble with ambiguity resolution. If I try to promote the subordinate rules such that they’re exposed, the atoms and arrows collide with one another, because in some ways they have overlapping character sets. foo-bar is a valid atom, so the arrow -> collides on grounds of the hyphen, apparently.
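To make the collision concrete, here is a minimal sketch in plain TypeScript (not Lezer, and not my real grammar). It assumes an atom alphabet of letters, digits, underscore and hyphen, which is a simplification I made up for illustration. The idea is that a hyphen only counts as part of an atom when it is not immediately followed by ">", so foo-bar stays one atom while foo->bar splits into three tokens. The names Token, ATOM and tokenize are hypothetical.

```typescript
// Token kinds mirror the names used in this post; they are illustrative only.
type Token = { kind: "Atom" | "Arrow" | "Terminator"; text: string };

// ASSUMPTION: atoms are made of letters, digits, "_" and "-".
// A "-" is accepted inside an atom only when the next character is not ">".
const ATOM = /^(?:[A-Za-z0-9_]|-(?!>))+/;

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    // Skip whitespace between tokens.
    const ws = /^\s+/.exec(rest);
    if (ws) {
      rest = rest.slice(ws[0].length);
      continue;
    }
    // Try the arrow before the atom, so "->" is never read as the start of an atom.
    if (rest.startsWith("->")) {
      tokens.push({ kind: "Arrow", text: "->" });
      rest = rest.slice(2);
      continue;
    }
    if (rest[0] === ";") {
      tokens.push({ kind: "Terminator", text: ";" });
      rest = rest.slice(1);
      continue;
    }
    const atom = ATOM.exec(rest);
    if (atom) {
      tokens.push({ kind: "Atom", text: atom[0] });
      rest = rest.slice(atom[0].length);
      continue;
    }
    throw new Error(`Unexpected character: ${rest[0]}`);
  }
  return tokens;
}
```

With this sketch, tokenize("foo-bar -> baz;") gives a single Atom for foo-bar, and tokenize("foo->bar;") still finds the Arrow. I believe Lezer's @tokens block has a @precedence declaration that may play a similar role, but I have not confirmed that it resolves this particular conflict in my grammar.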
The Ask
What I want is to be able to say “style the things inside the top level rule you gave me, instead of the top level rule itself.” Is that possible? If so, this all goes away.
If not, the alternative would be “is there a way to modify my grammar to make these sub-rules top-level, without falling afoul of ambiguity?” But I want this grammar shape, and if I can just highlight the sub-rules somehow, I’d much prefer that.
Reference
In case they’re relevant:
- My language plugin
- My grammar (see line 50 please)
- My test cases
- A live editor, set up for FSL
- The editor’s source
In the live editor, you’ll see that a Chain gets highlighted, but the individual atoms and arrows inside are not; similarly, the Chain has a DOM representation but its sub-elements do not.
A valid simple chain for the editor is
foo -> bar;
The ideal mock-dom for that would be (please ignore the dumb indentation, just forum formatting)
<chain>
<atom>foo</atom>
<arrow>-></arrow>
<atom>bar</atom><terminator>;</terminator>
</chain>
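In case it helps show what I mean by nesting, here is a small self-contained TypeScript sketch that turns a nested tree into that mock-dom. The MockNode type and the render function are hypothetical stand-ins I made up for this post; they are not the Lezer tree API or anything CodeMirror actually does.

```typescript
// Hypothetical tree shape: a node has either child nodes or leaf text.
interface MockNode {
  name: string;
  text?: string;
  children?: MockNode[];
}

// Wrap every node, including nested ones, in a tag named after its rule.
function render(node: MockNode): string {
  const tag = node.name.toLowerCase();
  const inner = node.children
    ? node.children.map(render).join("")
    : node.text ?? "";
  return `<${tag}>${inner}</${tag}>`;
}

// The tree I would hope to get for: foo -> bar;
const chain: MockNode = {
  name: "Chain",
  children: [
    { name: "Atom", text: "foo" },
    { name: "Arrow", text: "->" },
    { name: "Atom", text: "bar" },
    { name: "Terminator", text: ";" },
  ],
};

console.log(render(chain));
```

The point is only that every sub-rule gets its own wrapper, not just the top-level Chain.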