match for all characters

Is there a way to match any character? I’m working on a grammar for Liquid templates, where a comment block can look like this:

{% comment %} any symbol/character/bracket etc. can
go in between these tags.
{% endcomment %}

I want to mark this entire thing as a comment and skip it, no matter what’s in between the start and end. So far I have:

Comment {
  "{% comment %}"
  ($[a-zA-Z0-9] | whitespace | Operator)*
  "{% endcomment %}"
}

I could keep going by adding every possible Unicode symbol to the middle, but I’m wondering whether there is a way to match every character, like .* in regex. Thanks!

An underscore (_) matches any character. But that won’t work here, because it will also match {% endcomment %}. Finding that kind of end terminator in a regular language is awkward, so it might be easier to write this as an external tokenizer.
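
To make the external-tokenizer suggestion a bit more concrete, here is a rough sketch of the scanning logic such a tokenizer would need. It is plain TypeScript over a string rather than actual Lezer code: scanCommentBody and COMMENT_END are hypothetical names made up for illustration, and the sketch assumes the end tag is always written exactly as {% endcomment %}. In a real grammar this loop would presumably live inside an ExternalTokenizer from @lezer/lr and read from its input stream instead of a string.

```typescript
// Hypothetical helper: the names here are illustrative and not part of Lezer.
const COMMENT_END = "{% endcomment %}";

/**
 * Scans `input` starting at `start` (the position just past "{% comment %}")
 * and returns the offset of the first "{% endcomment %}".
 * Returns input.length if the comment is never closed.
 */
function scanCommentBody(input: string, start: number): number {
  let pos = start;
  while (pos < input.length) {
    // Multi-character lookahead: this is the step that a token built from
    // character sets (or `_`) cannot express.
    if (input.startsWith(COMMENT_END, pos)) return pos;
    pos++;
  }
  return input.length;
}

// Usage: everything between the two tags is treated as the comment body,
// including brackets and other tags.
const source = "{% comment %} any {% symbol %} } here {% endcomment %} after";
const bodyStart = "{% comment %}".length;
const bodyEnd = scanCommentBody(source, bodyStart);
console.log(JSON.stringify(source.slice(bodyStart, bodyEnd)));
```

The point of the sketch is only that the tokenizer has to look ahead several characters before deciding whether to keep consuming, which is why this is easier in hand-written code than in the grammar’s token syntax.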

Ooh good to know! Alternatively, is there a way to exclude a series of characters, rather than a set? Something like:

!["{% comment %}"]* …and have it exclude that exact sequence rather than any of the characters?

No, there isn’t. Tokenizers work one character at a time, so there’s no straightforward way to express such a thing.
