"Lower than default" precedence in Lezer grammars

I’m trying to port Julia’s Tree-sitter grammar to Lezer. It uses negative precedence (in Tree-sitter every rule defaults to precedence 0, so a negative precedence sorts below the default), and I’m wondering if there’s a way to specify a precedence “lower than default” in Lezer?

No, that’s not supported (except with dynamic precedence, but if you can avoid that static precedence is much cheaper). But unless this rule conflicts with a lot of other rules, applying a regular precedence to the others might work.
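For reference, that inversion could look something like this in Lezer’s grammar notation (a minimal sketch with made-up rule names, not taken from the Julia grammar): rather than pushing one rule below the default, you pull the competing reading above it with a named precedence marker.

```
@precedence { call }

expression { Identifier | CallExpression }

// Marking the call reading with !call resolves its conflicts in its
// favor, which has the same effect as giving the competing rule a
// "lower than default" precedence in Tree-sitter.
CallExpression { expression !call "(" expression ")" }
```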


Thanks! I think I was able to overcome that.

Another question: the Julia grammar needs some extra control over whitespace. For example, in a function call the left paren `(` must follow the identifier immediately, with no whitespace in between: `f (1)` is a syntax error, while `f(1)` is fine.

In tree-sitter-julia they have a custom tokenizer which emits an IMMEDIATE_PAREN token.

I’ve tried to do the same with Lezer, but it seems external tokenizers don’t have access to any lookbehind. I’ve also looked into using a ContextTracker, but it doesn’t seem to react to skipped whitespace, so I can’t track whether whitespace just occurred in the input.

The gist of what I’m trying to do:

```javascript
import { ExternalTokenizer } from "lezer";
import * as terms from "./parser.terms.js";

const LPAREN = "(".charCodeAt(0);

export const layout = new ExternalTokenizer(
  (input, token, stack) => {
    if (
      input.get(token.start) === LPAREN &&
      // only if immediateParen is allowed
      stack.canShift(terms.immediateParen) &&
      // only if we haven't encountered whitespace on the prev position
      !stack.context.whitespace
    ) {
      // zero-length token ending right where "(" starts
      token.accept(terms.immediateParen, token.start);
      return;
    }
  },
  {
    contextual: true,
    // so we still produce "(" after immediateParen
    extend: true,
  }
);
```


```javascript
export let context = new ContextTracker({
  start: {whitespace: true},
  // THIS DOES NOT EXIST
  skip(context) {
    return {whitespace: true};
  },
  shift(_context, term, input, stack) {
    return {whitespace: false};
  },
  hash(context) {
    return +context.whitespace;
  }
});
```

Do you think this makes sense or am I missing some other obvious solution?

Thanks!

Custom tokenizers should be able to read any input they desire, so it should be possible to emit a special token for parens without whitespace in front of them.
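Concretely, the check itself is just a look at the character before the paren. Below is a sketch against the current @lezer/lr API, with assumptions labeled: `immediateParen`, `immediate`, and `opensImmediately` are hypothetical names, and `parser.terms.js` is the conventional generated-terms file, not anything from the actual Julia grammar.

```javascript
// Sketch of the check an external tokenizer could do. In a real grammar
// it would be wired up roughly like this (immediateParen being a
// hypothetical term from the generated parser.terms file):
//
//   import { ExternalTokenizer } from "@lezer/lr";
//   import { immediateParen } from "./parser.terms.js";
//
//   export const immediate = new ExternalTokenizer((input) => {
//     if (opensImmediately(input)) {
//       input.advance();                  // consume the "("
//       input.acceptToken(immediateParen);
//     }
//   });

const LPAREN = 40; // "(".charCodeAt(0)

function isSpace(ch) {
  return ch == 32 /* space */ || ch == 9 /* tab */ ||
         ch == 10 /* newline */ || ch == 13 /* carriage return */;
}

// `input` is the tokenizer's InputStream: `next` is the code unit at the
// current position, and `peek(offset)` supports small negative offsets,
// which is what lets the tokenizer look just before the paren.
function opensImmediately(input) {
  return input.next == LPAREN && !isSpace(input.peek(-1));
}
```

Since the token in this sketch covers the `(` itself, the grammar would use the external token in place of the plain `"("` in call rules, so no `extend` trick is needed.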

I was hoping I could reuse the whitespace token definitions from the grammar, but checking what comes before the paren is good enough too. Thanks!