Tokenizer is not contextual with `@specialize`

AlexErrant · April 21, 2024, 8:29pm

I’m writing a language similar to GitHub’s query syntax. Consider this grammar:

@top Program { expression* }
@skip { whitespace }
@precedence { unary @right }

// "keyword"
kw<word> { @specialize[@name={word}]<SimpleString, word> }

expression {
  !unary Not?
  ( SimpleString | Filter )
  ( kw<"OR">? expression )*
}

Filter {
  created
  // | edited | author | tag | otherFilters
}

created {
  kw<"created"> ":"
  ( kw<"today">
  | kw<"yesterday">
  )
}

@tokens {
  SimpleString { @asciiLetter+ }
  whitespace { @whitespace+ }
  Not { "-" }
}

and this input: `foo bar yesterday created:today`

`created:today` is parsed correctly as a `Filter`, but the standalone `yesterday` is also tokenized as a `kw`, when I want it to be a `SimpleString`. Naively, I expected the tokenizer to be contextual here, so that `yesterday` only becomes a keyword when `created:` precedes it.

Is my only solution to use an external tokenizer?

marijn · April 22, 2024, 6:28am

@specialize itself will unconditionally replace tokens of the given type with the specialized token. What may work for you is to have a different SimpleString-like token that you use for the specializations, and make sure that is never used in a context where SimpleString is also valid.
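
For readers following along, here is an untested sketch of what that suggestion might look like when applied to the grammar in the question. The names `FilterWord` and `filterKw` are made up for illustration and do not come from the thread or from Lezer itself. The sketch assumes that Lezer's contextual tokenization only tries `FilterWord` in the state right after `created ":"`, where `SimpleString` is not valid.

```
// Keywords that appear where a SimpleString is also valid ("OR", "created")
kw<word> { @specialize[@name={word}]<SimpleString, word> }

// Keywords that are only valid after `created ":"`.
// FilterWord and filterKw are hypothetical names.
filterKw<word> { @specialize[@name={word}]<FilterWord, word> }

created {
  kw<"created"> ":"
  ( filterKw<"today">
  | filterKw<"yesterday">
  )
}

@tokens {
  SimpleString { @asciiLetter+ }
  // Same pattern as SimpleString, but never used in the same context
  FilterWord { @asciiLetter+ }
  whitespace { @whitespace+ }
  Not { "-" }
}
```

With this split, a bare `yesterday` at expression level should stay a `SimpleString`, because no specialization of `SimpleString` matches it. Note that `created` and `OR` are still specialized from `SimpleString`, so they remain keywords everywhere. If the two letter tokens ever become valid in the same parse state, the parser generator is expected to report them as overlapping, so the rest of the grammar has to keep them apart.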