I have seen comments in a few places (quoted below) about the conventions for tags/scopes used in parsing and syntax highlighting. They aren’t clear to me, so I wanted to clarify what is meant and what CodeMirror’s goal is with the approach it takes.
TextMate-style open-ended type names mean that everybody is going to define their own ad-hoc types, and theme writers just won’t know what to target. So you get impractically huge themes trying to target all the crap found in the wild.
CodeMirror uses a mostly closed vocabulary of syntax tags (as opposed to traditional open string-based systems, which make it hard for highlighting themes to cover all the tokens produced by the various languages).
My understanding of TextMate grammars is that the problems implied above are possible in theory but rarely happen in practice, even though these grammars are used across most modern code editors.
The fallback system in TextMate grammars/scopes works like this: if a token has the scope markup.list.numbered.markdown and no highlighting rule matches it, highlighting falls back to markup.list.numbered, then markup.list, and finally markup. So the top-level scope names are the only things theme authors need to target. Theme authors would write themes with huge numbers of scopes if and only if they want to change specific things.
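The fallback described above amounts to longest-prefix matching on dot-separated scope names. A minimal sketch, assuming a theme is just a map from scope prefix to color (real TextMate theme matching also weighs parent scopes and selector specificity, which this ignores); `resolveScope` and the colors are made up for illustration:

```typescript
// Simplified model of TextMate-style scope fallback: an unmatched scope is
// retried with its last dot-separated segment dropped until a rule matches.
// Real TextMate matching also weighs parent scopes; that is ignored here.
type Theme = Map<string, string>; // scope prefix -> color

function resolveScope(scope: string, theme: Theme): string | undefined {
  let parts = scope.split(".");
  while (parts.length > 0) {
    const color = theme.get(parts.join("."));
    if (color !== undefined) return color;
    parts = parts.slice(0, -1); // e.g. markup.list.numbered -> markup.list
  }
  return undefined;
}

// Hypothetical theme that only targets two scopes.
const theme: Theme = new Map([
  ["markup", "#abb2bf"],
  ["markup.list", "#e5c07b"],
]);

console.log(resolveScope("markup.list.numbered.markdown", theme)); // #e5c07b
console.log(resolveScope("markup.heading.markdown", theme)); // #abb2bf
console.log(resolveScope("comment.line", theme)); // undefined
```

A theme that defines only `markup` still colors every Markdown scope; adding `markup.list` refines lists without touching anything else.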
The more restrictive approach taken in CodeMirror makes it difficult to build themes with the same degree of control that other editors offer, and seems to require one of the following options:
Modifying the parser to change the choices the mode author made about how tokens are grouped into a very limited number of available tags.
Writing every mode to export a large number of custom tags, each of which falls back to one of the standard tags. This essentially re-creates TextMate’s approach, but via imports rather than string construction, and I’m not sure that would be worth it.
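For comparison, that second option looks roughly like this with @lezer/highlight. `Tag.define` accepts an optional parent tag, and a highlighter with no rule for the new tag matches it through the parent, so generic themes keep working. `listMark`, the choice of `tags.processingInstruction` as parent, and the class names are hypothetical:

```typescript
import { Tag, tags, tagHighlighter } from "@lezer/highlight";

// Hypothetical mode-specific tag. Its parent is a standard tag, so any
// theme rule written for that parent also applies to it.
const listMark = Tag.define(tags.processingInstruction);

// A generic theme that only knows the standard vocabulary...
const generic = tagHighlighter([
  { tag: tags.processingInstruction, class: "tok-meta" },
]);

// ...and a theme that opts in to the finer-grained tag.
const detailed = tagHighlighter([
  { tag: tags.processingInstruction, class: "tok-meta" },
  { tag: listMark, class: "tok-list-mark" },
]);

console.log(generic.style([listMark])); // tok-meta (falls back to the parent)
console.log(detailed.style([listMark])); // tok-list-mark
```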
The themes I looked at had all kinds of language-specific rules in them, which seems contrary to the idea of a generic theme.
The vocabulary provided by @lezer/highlight is not, I think, “very limited”. It has 78 different tags and 6 modifiers to work with. Typical programming languages should be able to tag every construct they distinguish using these without much trouble, and exporting custom tags is not something that happens often.
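Modifiers also give a built-in fallback, similar in spirit to TextMate’s: a rule for `variableName` matches `definition(variableName)` unless a more specific rule exists. A small sketch with @lezer/highlight; the class names are made up:

```typescript
import { tags, tagHighlighter } from "@lezer/highlight";

// `definition` is one of the modifiers; applying it refines a base tag.
const defVar = tags.definition(tags.variableName);

// A theme that only targets the base tag...
const coarse = tagHighlighter([{ tag: tags.variableName, class: "tok-var" }]);

// ...and one that also styles definitions separately.
const fine = tagHighlighter([
  { tag: tags.variableName, class: "tok-var" },
  { tag: defVar, class: "tok-var-def" },
]);

console.log(coarse.style([defVar])); // tok-var (falls back to the unmodified tag)
console.log(fine.style([defVar])); // tok-var-def
```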
What, concretely, is the problem you are having here?
I think the reason language-specific rules are included matters. Theme authors provide highlighting rules for specific modes because they want fine-grained control to make a really polished theme, not because they need those rules to make a decent one.
The concrete issue I’m having is exactly that, particularly for Markdown (which I know is a special case). I’m making a theme and want page structure marks (section header marks, list marks) colored separately from text emphasis marks.
The Markdown mode for Lezer assigns the same tag to all of these (see markdown/markdown.ts in the lezer-parser/markdown repository on GitHub).
As far as I can tell, my only option is to modify the Markdown mode (exporting custom tags for these marks, etc.) to make a theme behave the way I want. Is that a good general way to give theme authors this kind of finer-grained control?
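A possible alternative to forking the mode, sketched under assumptions: @lezer/markdown’s `configure` accepts extra node props, so an additional `styleTags` rule can retag the mark nodes with new tags whose parent is the tag they had before. This assumes a `styleTags` rule added this way takes precedence over the built-in rule for the same node, which as far as I know holds for recent @lezer/common and @lezer/highlight releases but may not for older ones. It also assumes the built-in tag for these marks is `processingInstruction`. `structureMark`, `inlineMark`, and the class names are hypothetical:

```typescript
import { parser as markdownParser } from "@lezer/markdown";
import { styleTags, tags, Tag, tagHighlighter, highlightTree } from "@lezer/highlight";

// Hypothetical tags splitting Markdown's marks into two groups. Their parent
// is assumed to be the tag the built-in rules give every mark, so themes
// that don't know these tags style the marks exactly as before.
export const structureMark = Tag.define(tags.processingInstruction);
export const inlineMark = Tag.define(tags.processingInstruction);

// Assumption: this later styleTags rule overrides the built-in one for the
// same node names.
export const parser = markdownParser.configure({
  props: [
    styleTags({
      "HeaderMark ListMark QuoteMark": structureMark,
      "EmphasisMark CodeMark LinkMark": inlineMark,
    }),
  ],
});

// A theme-like highlighter that colors the two groups differently.
const highlighter = tagHighlighter([
  { tag: structureMark, class: "md-structure" },
  { tag: inlineMark, class: "md-inline" },
]);

// Collect the highlighted text ranges and the classes assigned to them.
const doc = "# Title\n\n- *item*\n";
const spans: { text: string; cls: string }[] = [];
highlightTree(parser.parse(doc), highlighter, (from, to, cls) => {
  spans.push({ text: doc.slice(from, to), cls });
});
console.log(spans);
```

In CodeMirror, the same config object should be passable through the `extensions` option of `markdown()` from @codemirror/lang-markdown, after which a `HighlightStyle` can target `structureMark` and `inlineMark` separately.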