I'm trying to match a table label with an external tokenizer in my Lezer grammar, but the tokenizer is invoked too late. The `Identifier` token is tried first, which is expected. The problem is that my external tokenizer only gets a chance to match after `Identifier` has already matched.
Here’s the relevant part of my grammar:
```
TableLabel { tableName ":" }
LoadStatement { TableLabel? load String }
VariableName { Identifier }
SetStatement { set VariableName "=" (String | Number) }
// ... more

@tokens {
  identifierChar { @asciiLetter | $[\u{a1}-\u{10ffff}_#%@^|?\\.] }
  word { identifierChar (identifierChar | @digit)* }
  Identifier { word }
  // ... more
}

@external tokens tableName from "./tokens" { tableName }
```
In tokens.ts:
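```typescript
import { ExternalTokenizer } from "@lezer/lr";
// Term IDs generated by lezer-generator (the file name depends on the build setup).
import * as terms from "./parser.terms";

const COLON = 58; // char code of ":"

// Match anything until it reaches a colon
export const tableName = new ExternalTokenizer(
  (input) => {
    let { next } = input;
    let hasContent = false;
    while (next !== -1 && next !== COLON) {
      hasContent = true;
      input.advance();
      next = input.next;
    }
    if (hasContent) {
      input.acceptToken(terms.tableName);
    }
  },
  { contextual: true, fallback: true },
);
```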
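To make it concrete what the loop above consumes, here is a small stand-alone model of it over a plain string, with no Lezer involved (`scanUntilColon` is a hypothetical name used only for illustration). A non-zero result corresponds to `hasContent` being true. The model shows that the scan does not stop at line breaks. Started at `Load`, it runs to the end of the input looking for a colon, and it still reports a match even though no colon was found.

```typescript
const COLON = 58; // char code of ":"

// Models the original tokenizer loop: advance until a colon or end of input.
// Returns how many characters would be consumed; 0 means no token is accepted.
function scanUntilColon(text: string, start: number): number {
  let pos = start;
  while (pos < text.length && text.charCodeAt(pos) !== COLON) {
    pos++;
  }
  return pos - start;
}

const src = 'tablename:\nLoad "somefile.csv";';
console.log(scanUntilColon(src, 0));  // 9  ("tablename")
console.log(scanUntilColon(src, 11)); // 20 (from "Load" to end of input, no colon)
console.log(scanUntilColon("foo\nbar:", 0)); // 7 (crosses the newline)
```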
When I parse input like this:

```
tablename:
Load "somefile.csv";
```
The problem is that `tableName` is only invoked at the position of the `:`, not at the start of the line. By then `tablename` has already been recognized as a `VariableName`, so I get a syntax error.
I tried declaring the `tableName` external tokenizer before the `@tokens` block and adding `extend: true` to its options. However, this significantly slows down parsing.
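One plausible contributor to that slowdown: when the tokenizer runs before `@tokens`, it is tried at many positions, and each call may scan all the way to the next colon anywhere in the document. A way to keep such a lookahead cheap is to bound it. The sketch below assumes a table label never spans a line break. It only peeks ahead (consumes nothing), stops at the first newline, and reports a length only when a colon actually follows at least one character. `CharInput`, `tableLabelLength`, and `stringInput` are hypothetical names. `CharInput` mirrors the `peek(offset)` method of Lezer's `InputStream` so the function runs on its own here.

```typescript
// Minimal stand-in for the part of Lezer's InputStream used here (assumption:
// only peek is needed). peek returns the char code at current position + offset,
// or -1 past the end of the input.
interface CharInput {
  peek(offset: number): number;
}

const COLON = 58; // ":"
const LF = 10;    // "\n"
const CR = 13;    // "\r"

// Length of a table label starting at the current position, or 0 if the
// current line has no colon after at least one character. Nothing is consumed.
function tableLabelLength(input: CharInput): number {
  let i = 0;
  let ch = input.peek(0);
  while (ch !== -1 && ch !== LF && ch !== CR) {
    if (ch === COLON) return i; // label ends just before the colon (0 if empty)
    i++;
    ch = input.peek(i);
  }
  return 0; // hit end of line or input without finding a colon
}

// String-backed stub so the function can be exercised without a parser.
function stringInput(text: string, pos: number): CharInput {
  const at = (p: number) => (p >= 0 && p < text.length ? text.charCodeAt(p) : -1);
  return { peek: (offset) => at(pos + offset) };
}

const src = 'tablename:\nLoad "somefile.csv";';
console.log(tableLabelLength(stringInput(src, 0)));  // 9
console.log(tableLabelLength(stringInput(src, 11))); // 0 (no colon on that line)
```

Inside an actual `ExternalTokenizer`, this might be used as `const len = tableLabelLength(input); if (len > 0) input.acceptToken(terms.tableName, len);`. That relies on `acceptToken` taking an optional end offset relative to the current position, since the input was never advanced. Whether this restores acceptable speed with the tokenizer declared before `@tokens` is untested. The point is only that each call does at most one line's worth of work.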
Any suggestions on how to fix this problem?