Hey, I’m working on Elixir language support. I wrote a parser that matches our tree-sitter implementation, and now I’ve gotten to highlighting. I have a couple of questions and some feedback about that:
1. Is there a way to match against node content? This is crucial for highlighting Elixir correctly, and I think it’s a very relevant feature in general.

   To give a specific example, in Elixir pretty much all keywords, such as `def`, `if` and `for`, are regular functions (macros, to be specific), so `def my_fun do ... end` is parsed as “calling function `def` with argument `my_fun` and a do-block”. For the purpose of highlighting, however, we still want to distinguish such calls and colour those functions as keywords. So the query we want is `Call/Identifier`, where `Identifier` is one of `"def"`, `"if"`, `"for"`, etc.

   Ideally the matching would also support regexes, so we can match variables starting with `_` and highlight them as unused variables. If this is not an existing feature, could it be added? Given that this is a filter on top of an existing query, it shouldn’t add a significant performance cost.
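To make the request concrete, here is a minimal TypeScript sketch of a content filter layered on top of a structural match, similar in spirit to tree-sitter’s `#eq?` and `#match?` query predicates. The `TreeNode` shape, the hand-built tree and the helper names (`anyOf`, `matching`, `selectByContent`) are hypothetical and only illustrate the idea; this is not Lezer’s API.

```typescript
// Hypothetical minimal syntax-node shape for illustration only;
// this is NOT Lezer's API, just enough structure to show the idea.
interface TreeNode {
  type: string;        // node name, e.g. "Call" or "Identifier"
  from: number;        // start offset in the source text
  to: number;          // end offset in the source text
  children: TreeNode[];
}

// A test on the source text covered by a node.
type TextPredicate = (text: string) => boolean;

// Exact match against a fixed set of words, e.g. Elixir's keyword-like macros.
function anyOf(...words: string[]): TextPredicate {
  const set = new Set(words);
  return (text) => set.has(text);
}

// Regex match; avoid the `g` flag, since `RegExp.test` is stateful with it.
function matching(re: RegExp): TextPredicate {
  return (text) => re.test(text);
}

// Structural match `Parent/Child` (or just `Child` when `parentType` is
// omitted), followed by a content filter on each candidate's text.
function selectByContent(
  root: TreeNode,
  src: string,
  childType: string,
  pred: TextPredicate,
  parentType?: string,
): TreeNode[] {
  const hits: TreeNode[] = [];
  const visit = (node: TreeNode): void => {
    for (const child of node.children) {
      const structural =
        child.type === childType &&
        (parentType === undefined || node.type === parentType);
      // The predicate only runs on nodes that already passed the structural check.
      if (structural && pred(src.slice(child.from, child.to))) hits.push(child);
      visit(child);
    }
  };
  visit(root);
  return hits;
}

// Helper to build leaf nodes for the example tree.
const leaf = (type: string, from: number, to: number): TreeNode =>
  ({ type, from, to, children: [] });

// `def my_fun do _unused end` as a call to `def` with an argument and a
// do-block (a simplified, hypothetical tree, not a real Elixir grammar's).
const src = "def my_fun do _unused end";
const tree: TreeNode = {
  type: "Call", from: 0, to: 25,
  children: [
    leaf("Identifier", 0, 3),
    { type: "Arguments", from: 4, to: 10, children: [leaf("Identifier", 4, 10)] },
    { type: "DoBlock", from: 11, to: 25, children: [leaf("Identifier", 14, 21)] },
  ],
};

// Call/Identifier where the identifier is one of the keyword-like macros.
const keywordHits = selectByContent(tree, src, "Identifier", anyOf("def", "if", "for"), "Call");
// Any Identifier whose text starts with `_` (an unused variable).
const unusedHits = selectByContent(tree, src, "Identifier", matching(/^_/));

console.log(keywordHits.map((n) => src.slice(n.from, n.to)));
console.log(unusedHits.map((n) => src.slice(n.from, n.to)));
```

Because the predicate is only evaluated for nodes that already matched structurally, in this sketch its cost is one string slice and comparison per candidate.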
2. Is there a way to assign a style to a parent node based on one of its children? Again, a specific example: in Elixir, modules have module attributes, written as `@name value`. Among other uses, such attributes can be placed before a function definition as an annotation, and this is how docstrings are written:

   ```elixir
   @doc """
   Returns stuff.
   """
   def fun() do
     ...
   end
   ```

   The attribute would be parsed as `UnaryOperator(Call(Identifier, Arguments(String(QuotedContent))))`. To highlight this, we first need the feature outlined in 1., so that we can match on the specific `doc` identifier. But if we do it as `UnaryOperator/Call/Identifier` where `Identifier` is `"doc"`, then we can only target the `Identifier` (and possibly the child `String`), while we also want to style the `@` operator itself. For reference, here you can see the corresponding tree-sitter query (and all the others :)).
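A minimal TypeScript sketch of this second request: match the child (`UnaryOperator/Call/Identifier` whose text is `doc`), but report the highlight range of an ancestor, so the `@` operator is covered as well. The `TreeNode` shape with its `up` link, the `styleAncestor` helper, its `liftBy` parameter and the `"docString"` style name are all hypothetical, for illustration only.

```typescript
// Hypothetical node shape (not Lezer's API); `up` links a node to its
// parent so a match on a child can be lifted to an ancestor.
interface TreeNode {
  type: string;
  from: number;
  to: number;
  children: TreeNode[];
  up: TreeNode | null;
}

interface StyledRange {
  from: number;
  to: number;
  style: string;
}

// Build a node and point each child's `up` link back at it.
function mk(type: string, from: number, to: number, children: TreeNode[] = []): TreeNode {
  const node: TreeNode = { type, from, to, children, up: null };
  for (const child of children) child.up = node;
  return node;
}

// True if `node` ends the ancestor chain `path`, read like
// `UnaryOperator/Call/Identifier` (the last entry is the node itself).
function matchesPath(node: TreeNode, path: string[]): boolean {
  let cur: TreeNode | null = node;
  for (let i = path.length - 1; i >= 0; i--) {
    if (cur === null || cur.type !== path[i]) return false;
    cur = cur.up;
  }
  return true;
}

// Match on a child, filter on its text, then style the ancestor that is
// `liftBy` levels above the matched node instead of the node itself.
function styleAncestor(
  root: TreeNode,
  src: string,
  path: string[],
  pred: (text: string) => boolean,
  liftBy: number,
  style: string,
): StyledRange[] {
  const out: StyledRange[] = [];
  const visit = (node: TreeNode): void => {
    if (matchesPath(node, path) && pred(src.slice(node.from, node.to))) {
      let target: TreeNode = node;
      for (let i = 0; i < liftBy; i++) {
        if (target.up === null) break;
        target = target.up;
      }
      out.push({ from: target.from, to: target.to, style });
    }
    for (const child of node.children) visit(child);
  };
  visit(root);
  return out;
}

// `@doc """ ... """` shaped as UnaryOperator(Call(Identifier, Arguments(String(QuotedContent)))).
const src = '@doc """\nReturns stuff.\n"""';
const attr = mk("UnaryOperator", 0, 27, [
  mk("Call", 1, 27, [
    mk("Identifier", 1, 4),
    mk("Arguments", 5, 27, [mk("String", 5, 27, [mk("QuotedContent", 8, 24)])]),
  ]),
]);

const attrPath = ["UnaryOperator", "Call", "Identifier"];
// Lift two levels (Identifier -> Call -> UnaryOperator) so `@` is included.
const docRanges = styleAncestor(attr, src, attrPath, (t) => t === "doc", 2, "docString");
console.log(docRanges);
```

Lifting by a fixed number of levels is just the simplest choice for a sketch; a query system would more likely let the pattern mark which of its nodes receives the style, as tree-sitter captures do.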
3. Tree-sitter distinguishes between “named” and “anonymous” nodes, so that the syntax tree is not verbose (which is relevant, for example, in tests). It also has “fields”, which let you attach a label to a node’s child (either named or anonymous), and these labels can be used for matching later.

   To give an example, if the expression is `a |> b`, in tree-sitter I would parse it as `BinaryOperator(Identifier, Identifier)`, but there would also be the labels `left`, `operator` and `right`. With these I can match on the right-hand side of `|>` (and there is a valid use case where this is needed). From my understanding, to access all of this information for matching in Lezer, I would need to parse it as `BinaryOperator(Left(Identifier), Operator("|>"), Right(Identifier))`.

   So the question is: is my understanding correct that in the Lezer system all of these should be named nodes, and otherwise we can’t match on them?
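The difference between the two encodings can be made concrete with a small TypeScript sketch. Both trees are hand-built with a hypothetical node shape (neither tree-sitter’s nor Lezer’s API); the point is only that field labels plus an anonymous `|>` token carry the same information as the extra named `Left`/`Operator`/`Right` nodes, so finding the right-hand side of `|>` works against either.

```typescript
// Hypothetical node shape for illustration (neither tree-sitter's nor
// Lezer's API). `named` mirrors tree-sitter's named/anonymous split and
// `field` is the label the parent attaches to this child.
interface TreeNode {
  type: string;
  named: boolean;
  field?: string;
  from: number;
  to: number;
  children: TreeNode[];
}

const src = "a |> b";
const textOf = (node: TreeNode): string => src.slice(node.from, node.to);

// tree-sitter style: BinaryOperator(Identifier, Identifier), with fields
// `left`, `operator` and `right`; the `|>` token itself is anonymous.
const withFields: TreeNode = {
  type: "BinaryOperator", named: true, from: 0, to: 6,
  children: [
    { type: "Identifier", named: true, field: "left", from: 0, to: 1, children: [] },
    { type: "|>", named: false, field: "operator", from: 2, to: 4, children: [] },
    { type: "Identifier", named: true, field: "right", from: 5, to: 6, children: [] },
  ],
};

// The encoding from the question, with only named nodes:
// BinaryOperator(Left(Identifier), Operator("|>"), Right(Identifier)).
const withWrappers: TreeNode = {
  type: "BinaryOperator", named: true, from: 0, to: 6,
  children: [
    { type: "Left", named: true, from: 0, to: 1, children: [
      { type: "Identifier", named: true, from: 0, to: 1, children: [] },
    ] },
    { type: "Operator", named: true, from: 2, to: 4, children: [] },
    { type: "Right", named: true, from: 5, to: 6, children: [
      { type: "Identifier", named: true, from: 5, to: 6, children: [] },
    ] },
  ],
};

// With fields: look children up by label, then check the operator's text.
function pipeRhsByField(bin: TreeNode): TreeNode | undefined {
  const op = bin.children.find((c) => c.field === "operator");
  if (op === undefined || textOf(op) !== "|>") return undefined;
  return bin.children.find((c) => c.field === "right");
}

// With wrapper nodes: the same information lives in the node types.
function pipeRhsByWrapper(bin: TreeNode): TreeNode | undefined {
  const op = bin.children.find((c) => c.type === "Operator");
  if (op === undefined || textOf(op) !== "|>") return undefined;
  return bin.children.find((c) => c.type === "Right")?.children[0];
}

const rhsA = pipeRhsByField(withFields);
const rhsB = pipeRhsByWrapper(withWrappers);
console.log(rhsA && textOf(rhsA), rhsB && textOf(rhsB));
```

Both lookups find `b`; what differs is where the information lives. With anonymous nodes hidden, the fielded tree still prints as `BinaryOperator(Identifier, Identifier)`, while the wrapper version adds `Left`, `Operator` and `Right` nodes to every binary operator, which is the extra verbosity the question is concerned with.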
Also, thank you for all your work!