Matching up to but not including


I have a language that uses a somewhat annoying syntax. The main constructs are Actions and Values, both of which consist of nothing but keywords, no two of which are the same. Both can take arguments, but when they have none they are written without parentheses, and both can have spaces in their names. Actions and Values always come from a list of keywords and cannot be declared separately.

For example:

Some Action(Some Argument, Some Value(Some Other Argument));
Some Other Action;
Some Value(Argument, Some Value(Some Other Argument));
Some Other Value;

My current solution is along these lines: it matches any Action until a separator is reached.

separators { "(" | ")" | ";" | "," | "{" | "}" }
actions { "Action" | "Some Other Action" }
Action { actions separators }

This works, but unfortunately the separator is included in the result: the match covers "Some Action(" and is then followed by the arguments.

To prevent this, I wanted to give the punctuation precedence over the Actions and Values.

Punctuation { ";" | "(" | ")" | "[" | "]" | "{" | "}" | "." | "," | "!" | "?" | "=" | ">" | "<" }
@precedence { Punctuation, Value, Action }

But unfortunately this does not work.

In a similar fashion I have numbers declared like so.

Number { @digit+ $[.%] }

Here periods and percent signs are included in the number, which in this case is what I want. However, numbers are also matched in the middle of a word. For example, I might have a variable


In that case the digits inside the name are incorrectly highlighted as a number. How would I use some sort of word boundary?

I’ve gone over the documentation and scanned through the existing .grammar files, but I have a hard time making sense of it. I’ve tried setting up external tokens, which I think is what I need, but I don’t know what I’m doing. Any pointers in the right direction would be much appreciated!

I can’t quite figure out from your description what the rules are for these kinds of keywords, but if any group of space-separated words is always a single unit, the way to do this would be to define a token that greedily matches such a group of words (wordChar+ (" " wordChar+)* or something like that), and then use specializers to turn known keywords into different tokens. Because digits inside a word are consumed as part of that word token, this should also prevent numbers in the middle of a word from becoming separate tokens.
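
To make that concrete, here is a rough, untested sketch of what such a grammar could look like. The rule names (Program, statement, call, ArgList, Phrase, wordChar, space) are made up for illustration, the keyword lists only contain the names from the example above, and the character set for wordChar is a guess at what the language allows. Number is left out on purpose, since a number token would overlap with Phrase and need its own @precedence declaration.

```
@top Program { statement* }

statement { call ";" }

// An Action, a Value, or a plain (unrecognized) phrase, optionally with arguments.
call { (action | value | Phrase) ArgList? }

ArgList { "(" (call ("," call)*)? ")" }

// Known keywords become their own tokens: a Phrase whose text is exactly
// one of these strings is replaced by an Action or Value token.
action { @specialize[@name=Action]<Phrase, "Some Action" | "Some Other Action"> }
value { @specialize[@name=Value]<Phrase, "Some Value" | "Some Other Value"> }

@skip { space }

@tokens {
  // Assumption: words are made of ASCII letters, digits and underscores.
  wordChar { $[a-zA-Z0-9_] }

  // Greedily matches one or more words separated by single spaces.
  // Punctuation is not a wordChar, so the token stops before "(" or ";".
  Phrase { wordChar+ (" " wordChar+)* }

  space { $[ \t\n\r]+ }

  "(" ")" "," ";"
}
```

Since the separators are separate tokens and never part of Phrase, the match for "Some Action(" ends after "Some Action" and the parenthesis is tokenized on its own. If the keyword list is long or generated, an external specializer (a function that maps the token text to a term) may be easier to maintain than listing every string in the grammar.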