match for all characters

Is there a way to match any character? I’m working on a grammar for liquid templates, where a comment block can look like this:

{% comment %} any symbol/character/bracket etc. can
go in between these tags.
{% endcomment %}

I want to mark this entire thing as a comment and skip it, no matter what’s in between the start and end. So far I have:

Comment {
  "{% comment %}"
  ($[a-zA-Z0-9] | whitespace | Operator)*
  "{% endcomment %}"
}

and I can keep going by adding every possible unicode symbol to the middle, but I’m wondering if there isn’t a way to match every character, like .* in regex? Thanks!

An underscore matches any character. But that won’t work here, because it will also match the characters of {% endcomment %}. Finding that kind of end terminator in a regular language is awkward, so it might be easier to write this as an external tokenizer.
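
To make the external-tokenizer idea concrete, here is a minimal sketch of the scanning loop such a tokenizer might run. It is written against a hypothetical `CharInput` interface and a `StringInput` helper that are invented for this example. They only loosely resemble the stream a real tokenizer receives, so treat the names as assumptions and not as the Lezer API. The loop reads one character at a time and counts how many characters of the end tag have matched so far.

```typescript
// Hypothetical minimal input interface for illustration only.
// It is NOT the actual Lezer input stream type.
interface CharInput {
  readonly next: number; // code of the current character, or -1 at end of input
  readonly pos: number;  // offset of the current character
  advance(): void;       // move to the following character
}

// Hypothetical string-backed implementation, so the sketch can run on its own.
class StringInput implements CharInput {
  pos = 0;
  private readonly text: string;
  constructor(text: string) {
    this.text = text;
  }
  get next(): number {
    return this.pos < this.text.length ? this.text.charCodeAt(this.pos) : -1;
  }
  advance(): void {
    if (this.pos < this.text.length) this.pos++;
  }
}

const END_TAG = "{% endcomment %}";

// Scans forward and returns the offset at which the comment body ends
// (that is, where END_TAG starts), or -1 if the end tag never appears.
function scanCommentBody(input: CharInput): number {
  let matched = 0; // number of END_TAG characters matched so far
  while (input.next !== -1) {
    const ch = input.next;
    if (ch === END_TAG.charCodeAt(matched)) {
      matched++;
      if (matched === END_TAG.length) {
        // input.pos is on the final "}" of the tag here.
        return input.pos - (END_TAG.length - 1);
      }
    } else {
      // "{" occurs only at the start of END_TAG, so after a mismatch the
      // only possible partial match is a fresh "{".
      matched = ch === END_TAG.charCodeAt(0) ? 1 : 0;
    }
    input.advance();
  }
  return -1;
}
```

In an actual grammar, a loop like this would live inside the external tokenizer and emit a token once the end tag is found. The Lezer documentation describes the real input stream and how tokens are accepted.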

Ooh good to know! Alternatively, is there a way to exclude a series of characters, rather than a set? Something like:

!["{% comment %}"]* …and have it exclude that exact sequence rather than any of the characters?

No, there isn’t. Tokenizers work one character at a time, so there’s no straightforward way to express such a thing.


Is the functionality of underscore documented somewhere? I just spent the better part of a morning trying to figure out why StringEscape { "\\" _ } from the Lezer System Guide works. Could not find anything about underscore.

If not documented, it should be IMO. It is a crucial part of not just the syntax but also the system guide.

Good point. I’ve added a sentence that introduces it in this patch.

Thank you for the prompt response. I didn’t realise the website was also open source, or I would’ve submitted a patch myself.