Expressing patterns in styleTags

Hey, I’m working on Elixir language support. I wrote a parser that matches our tree-sitter implementation and have now moved on to highlighting. I have a few questions and some feedback on that:

  1. Is there a way to match against node content? This is crucial for highlighting Elixir correctly and I think it’s a very relevant feature in general.

    To give a specific example, in Elixir pretty much all keywords, such as def, if, for, are regular functions (macros, to be specific), so def my_fun do ... end is parsed as “calling function def with argument my_fun and a do-block”. For highlighting purposes, however, we still want to distinguish such calls and colour those functions as keywords. So the query we want is Call/Identifier, where Identifier is one of "def", "if", "for", ....

    Ideally the matching would support regexes, so we can match variables starting with _ and highlight them as unused variables.

    If this is not an existing feature, could it be added? Since this is a filter on top of an existing query, it shouldn’t hurt performance.

  2. Is there a way to assign a parent’s style based on a child? Again, a specific example: in Elixir, modules have module attributes, written as @name value. Among other uses, such attributes can be placed before a function definition as an annotation, which is how docstrings are written:

    @doc """
    Returns stuff.
    """
    def fun() do ... end
    

    The attribute would be parsed as UnaryOperator(Call(Identifier, Arguments(String(QuotedContent)))). To highlight this, we first need the feature outlined in 1., so that we can match on the specific “doc” identifier. But if we write it as UnaryOperator/Call/Identifier where Identifier is "doc", we can only target the Identifier and possibly the child String, while we should also annotate the operator itself. For reference, here you can see the corresponding tree-sitter query (and all the others :)).

  3. Tree-sitter distinguishes between “named” and “anonymous” nodes, so that the syntax tree is not verbose (relevant, for example, in tests). It also has “fields”, which allow adding a label to a child node (either named or anonymous), which can be used for matching later.

    To give an example, for the expression a |> b, in tree-sitter I would parse this as BinaryOperator(Identifier, Identifier), but there would also be the field labels left, operator and right. With these I can match on the right-hand side of |> (and there is a valid use case where this is needed). From my understanding, to make all this information available for matching in Lezer, I would need to parse it as BinaryOperator(Left(Identifier), Operator("|>"), Right(Identifier)).

    So my question is: am I right that in Lezer all of these need to be named nodes, since otherwise we can’t match on them?

Also, thank you for all your work!

No. Highlighting is purely derived from the tree, for performance reasons.

Again, no.

Also no named children in Lezer.
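Since highlighting is derived purely from the tree, a rule can only tell nodes apart by their names and the names of their ancestors. The sketch below is a simplified model of that idea, not the actual @lezer/highlight code: it borrows the "Parent/Child" selector shape that styleTags uses, keeps tags as plain strings, and builds the hypothetical Left/Operator/Right wrapper nodes from question 3. Once the right-hand side is its own named node, a selector like "Right/Identifier" can target it; nothing in the model can select on an identifier's text.

```typescript
// Simplified model of path-based highlighting, loosely following the
// "Parent/Child" selector form used by styleTags. NOT the real
// @lezer/highlight implementation (no "*", "...", or "!" support).

interface NodeModel {
  name: string;
  children: NodeModel[];
}

const node = (name: string, ...children: NodeModel[]): NodeModel => ({ name, children });

// "A/B/C" matches a node named C whose parent is B and grandparent is A.
function matchesSelector(path: string[], selector: string): boolean {
  const parts = selector.split("/");
  if (parts.length > path.length) return false;
  const tail = path.slice(path.length - parts.length);
  return parts.every((part, i) => part === tail[i]);
}

// Give each node the tag of the first matching rule. Only node names along
// the ancestor path are consulted; the document text is never read.
function assignTags(root: NodeModel, rules: Array<[string, string]>): string[] {
  const out: string[] = [];
  const visit = (n: NodeModel, ancestors: string[]): void => {
    const path = [...ancestors, n.name];
    const rule = rules.filter(([selector]) => matchesSelector(path, selector))[0];
    if (rule) out.push(`${path.join("/")} -> ${rule[1]}`);
    for (const child of n.children) visit(child, path);
  };
  visit(root, []);
  return out;
}

// `a |> b` with the hypothetical wrapper nodes from question 3.
const pipe = node(
  "BinaryOperator",
  node("Left", node("Identifier")),
  node("Operator"),
  node("Right", node("Identifier")),
);

const tagged = assignTags(pipe, [
  ["Right/Identifier", "function"], // more specific rule listed first
  ["Identifier", "variable"],
  ["Operator", "operator"],
]);
```

This model simply takes the first matching rule in list order; it makes no claim about how styleTags itself ranks overlapping selectors.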

Couldn’t styleTags accept (context) => tag functions, similar to indentation rules?

No. Highlighting is redone on every document or parse tree change. It has to be cheap.

Hey again!

For future reference, I ended up specialising certain nodes in the syntax tree, so it is effectively a superset of the tree-sitter version. For example, to solve 1. I added FunctionDefinitionCall, which is a special case of Call. This means more duplication, but gives a more specific tree to query against for the highlighting tags.
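The idea behind this workaround is to move the content check from highlighting time to parse time. Lezer grammars can specialize tokens by their text (for example with @specialize or @extend), but how lezer-elixir does it is not described here, so the TypeScript below is only a hypothetical model of the decision, not Lezer code. FunctionDefinitionCall comes from the post; DEFINITION_MACROS, UnusedIdentifier, callNodeName, identifierNodeName, tagFor, and the tag strings are made-up names for illustration.

```typescript
// Hypothetical model: the parser picks a more specific node name from the
// text, so highlighting only ever has to look at node names.

// Illustrative subset of Elixir definition macros (not exhaustive).
const DEFINITION_MACROS = ["def", "defp", "defmacro", "defmodule"];

// Parse-time decision for a call, based on its callee's text.
function callNodeName(callee: string): string {
  return DEFINITION_MACROS.indexOf(callee) !== -1 ? "FunctionDefinitionCall" : "Call";
}

// Parse-time decision for an identifier: a leading "_" marks an unused
// variable in Elixir, so it gets its own (hypothetical) node name.
function identifierNodeName(text: string): string {
  return /^_/.test(text) ? "UnusedIdentifier" : "Identifier";
}

// Highlight-time lookup: a static table keyed by "Parent/Child" or "Child",
// in the spirit of styleTags. Tag names are plain strings here.
const highlightRules: Record<string, string> = {
  "FunctionDefinitionCall/Identifier": "keyword",
  "Call/Identifier": "function",
  "UnusedIdentifier": "unusedVariable",
};

// Prefer the more specific "Parent/Child" rule, then fall back to "Child".
function tagFor(parent: string, child: string): string | undefined {
  return highlightRules[`${parent}/${child}`] ?? highlightRules[child];
}

const defTag = tagFor(callNodeName("def"), identifierNodeName("def")); // "keyword"
const mapTag = tagFor(callNodeName("map"), identifierNodeName("map")); // "function"
const accTag = tagFor("Arguments", identifierNodeName("_acc")); // "unusedVariable"
```

The cost of this design is the one mentioned above: the grammar grows more node types, but the highlighter stays a plain name lookup with no text access.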


@marijn I have both lezer-elixir and lang-elixir ready, are you open to including them in the respective organisations?

Side question: when working within codemirror/dev, do you have a good way to link a local Lezer package, so that the devserver build picks it up?

Great! I don’t put 3rd-party packages under my organizations, but I’ll gladly link them from the community language list and language-data. I’d recommend package names lezer-elixir and codemirror-lang-elixir.

No. That setup is intended for development on the stuff I maintain. What would be the reason for developing this package in that context?

Ok, I will let you know once both are published : )

What would be the reason for developing this package in that context?

I am asking because I am curious about the optimal setup for developing a parser/highlighting linked to a local demo (be it codemirror/dev, or an actual application). Using npm link or installing from a directory leaves multiple node_modules trees with duplicated packages, which breaks instanceof checks. While developing the package, to get a fast feedback loop, I ended up using absolute paths across the projects to avoid the dependency issue.
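The instanceof breakage can be reproduced without npm at all. This sketch rests on one assumption: each copy of a package is a separate evaluation of the same module body, which is what happens when two node_modules trees each contain that package. The hypothetical loadPackageCopy function simulates that by defining the class anew on every call.

```typescript
// Simulates loading the same package twice (e.g. once from the app's
// node_modules and once from a linked package's own node_modules).
// Each call evaluates the "module body" again, producing a distinct class.
function loadPackageCopy() {
  class Tree {
    type: string;
    constructor(type: string) {
      this.type = type;
    }
  }
  return { Tree };
}

const copyA = loadPackageCopy(); // e.g. app/node_modules/@lezer/common
const copyB = loadPackageCopy(); // e.g. lezer-elixir/node_modules/@lezer/common

const tree = new copyB.Tree("Source");

// Same source code, same shape, but identity-based checks fail across copies.
const sameCopy = tree instanceof copyB.Tree; // true
const otherCopy = tree instanceof copyA.Tree; // false
```

Any library that uses instanceof to recognise its own objects will reject values created by the other copy, even though the two look identical.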

Oh, right, yeah, that’s an annoying thing with npm. I usually just symlink the /dist directory from my dev version into the copy under the test project’s /node_modules, but that’s not a great solution either.
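A minimal sketch of the symlink workaround, assuming a Node.js environment. The directory names (lezer-elixir as the dev checkout, demo-app as the test project) and the linkDist helper are hypothetical. The script removes the installed dist/ inside the test project's node_modules and replaces it with a symlink to the dev build, demonstrated in a temporary directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Replace the installed copy of a package's dist/ directory with a symlink
// to the dist/ of a local development checkout.
function linkDist(devPackageDir: string, testProjectDir: string, pkgName: string): string {
  const source = path.join(devPackageDir, "dist");
  const target = path.join(testProjectDir, "node_modules", pkgName, "dist");
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.rmSync(target, { recursive: true, force: true }); // drop the installed build
  fs.symlinkSync(source, target, "dir"); // "dir" only matters on Windows
  return target;
}

// Demo in a temporary directory with a fake dev build.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "link-dist-"));
const devDir = path.join(root, "lezer-elixir");
const testDir = path.join(root, "demo-app");
fs.mkdirSync(path.join(devDir, "dist"), { recursive: true });
fs.writeFileSync(path.join(devDir, "dist", "index.js"), "export const version = 'dev';\n");

const linked = linkDist(devDir, testDir, "lezer-elixir");
```

One caveat: Node (and many bundlers) resolve a symlinked module to its real path by default (Node's --preserve-symlinks changes this), so imports inside the linked dist may still resolve against the dev checkout's own node_modules.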

Got it, thank you!

I’ve just published lezer-elixir (source code) and codemirror-lang-elixir (source code). The latter name was already taken by a package containing the legacy mode, but the owner was nice enough to transfer it, so it is now at v4.0.0.

Nice, thank you!