How to reprocess syntax tree?

luizzappa · March 25, 2023, 9:19pm

Is it possible to reprocess the whole syntax tree again?

The context:

I’m creating a language package for Excel functions. One of the peculiarities is that some tokens change according to the idiom. For example, the Boolean token can be TRUE (English) or VERDADEIRO (Portuguese).

I created an external token that handles this based on the currIdiom variable:

File: tokens.ts

let currIdiom = 'en-US';

const i18n = {
  BoolToken: {
    'en-US': ['TRUE', 'FALSE'],
    'pt-BR': ['VERDADEIRO', 'FALSO']
  }
}

export const isBoolean = (value: string, stack: Stack): number => {
  return i18n.BoolToken[currIdiom].indexOf(value.toUpperCase()) !== -1
    ? tokens['BoolToken']
    : -1;
}

To be able to change the idiom, I exported a function that changes the currIdiom variable:

export const setLezerIdiom = (newIdiom: supportedIdioms): supportedIdioms =>
  (currIdiom = newIdiom);

Everything is working, but when I switch between languages it seems to keep the previous syntax tree; I need to type something for it to update. In the example below, TRUE is purple (recognized as a BoolToken when the language is en-US), but after I change to Portuguese I have to type something before it stops being purple (that is, before it is no longer recognized as a BoolToken):

[Image: excel formula]

How can I force a reprocess so that highlighting works when switching idiom?

marijn · March 26, 2023, 7:35am

Pass the language as a parameter to whatever creates the parser, and update your configuration to use a new parser whenever the language setting is changed.
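One way to read this advice, as a minimal sketch only: instead of a module-level currIdiom that the specializer reads, build the specializer from the idiom it should use, so each idiom gets its own function. The names makeIsBoolean and boolWords and the BoolToken value are hypothetical; in the real package the term id would come from the generated terms file.

```typescript
// Hypothetical term id; a real package would import it from the generated terms file.
const BoolToken = 1;

type Idiom = 'en-US' | 'pt-BR';

// Boolean keywords per idiom, stored in upper case.
const boolWords: Record<Idiom, readonly string[]> = {
  'en-US': ['TRUE', 'FALSE'],
  'pt-BR': ['VERDADEIRO', 'FALSO'],
};

// Returns a specializer bound to one idiom. It has the same shape as the
// isBoolean function from the question: a term id on a match, -1 otherwise.
const makeIsBoolean = (idiom: Idiom) =>
  (value: string, _stack?: unknown): number =>
    boolWords[idiom].indexOf(value.toUpperCase()) !== -1 ? BoolToken : -1;
```

Because no shared variable is mutated, a parser built with makeIsBoolean('pt-BR') is a different value from one built with makeIsBoolean('en-US'), which is what lets the editor notice that the language configuration actually changed.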

luizzappa · March 26, 2023, 2:53pm

I tried that, but without success… The highlighting still only updates after an interaction. I must be doing something wrong…

export function spreadsheet(idiom: supportedIdioms = 'en-US') {
  setLezerIdiom(idiom);
  return new LanguageSupport(spreadsheetLanguage);
}

Every dropdown change triggers this here:

editor.dispatch({
  effects: languageCompartment.reconfigure(spreadsheet(newIdiom))
});

luizzappa · March 26, 2023, 10:55pm

It’s ugly, but this is what solved it:

  editor.dispatch({
    effects: languageCompartment.reconfigure(spreadsheet(newIdiom)),
    changes: { // Force a reparse by replacing the whole document with itself
      from: 0,
      to: editor.state.doc.length,
      insert: editor.state.doc.toString()
    }
  });

marijn · March 27, 2023, 5:38am

That will destroy your undo history and all other editor state that is tracking document positions, though.

blh · June 29, 2023, 5:02pm

Does this mean using buildParser to create a parser on the fly, instead of having a static parser?

Looking at the source code, I’m not sure how one builds a static parser that allows changing the idiom or the decimal separator.

marijn · June 29, 2023, 5:15pm

No. You can use dialects or reconfigured external tokenizers to make small changes like that. Recompiling the grammar should never be necessary at run-time.
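A rough sketch of the "reconfigured" route, under stated assumptions: the compiled parser stays as it is, and its configure method is asked to swap the external specializer the grammar was built with for an idiom-specific one. ConfigurableParser below is a stand-in interface that mirrors only the configure({specializers}) shape described in the Lezer ParserConfig docs, so the block needs no imports; idiomParsers and its parameters are hypothetical names.

```typescript
// Shape of an external specializer: a term id on a match, -1 otherwise.
type Specializer = (value: string, stack: unknown) => number;

// Stand-in for the single parser method used here (assumption: the real
// LRParser exposes a configure method that accepts this option).
interface ConfigurableParser {
  configure(config: {
    specializers?: { from: Specializer; to: Specializer }[];
  }): ConfigurableParser;
}

// Hypothetical helper: returns a lookup yielding one configured parser per idiom.
function idiomParsers(
  base: ConfigurableParser,
  original: Specializer,            // the specializer the grammar was compiled with
  boolTerm: number,                 // term id of BoolToken
  words: Record<string, string[]>   // idiom -> boolean keywords, upper case
): (idiom: string) => ConfigurableParser {
  const cache = new Map<string, ConfigurableParser>();
  return (idiom) => {
    let parser = cache.get(idiom);
    if (!parser) {
      const list = words[idiom] ?? [];
      const to: Specializer = (value) =>
        list.indexOf(value.toUpperCase()) !== -1 ? boolTerm : -1;
      // Replace the original specializer; the grammar is not recompiled.
      parser = base.configure({ specializers: [{ from: original, to }] });
      cache.set(idiom, parser);
    }
    return parser;
  };
}
```

In an editor setup, the configured parser would presumably be wrapped in a new language object and passed to the compartment's reconfigure call. Since the language value itself then differs between idioms, the document should be reparsed without any `changes` entry in the transaction, so undo history stays intact. The cache keeps switching back and forth from creating a new parser on every dropdown change.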

blh · June 29, 2023, 6:02pm

"reconfigured external tokenizers to make small changes like that."

Is there a lezer example lang that does this which I could reference?

Edit: I’m guessing we have to use configure with its tokenizers and specializers options.

https://lezer.codemirror.net/docs/ref/#lr.ParserConfig.tokenizers

marijn · June 29, 2023, 6:20pm

You can see the lang-sql reconfiguring the parser to wire in a customized external tokenizer here.