I am adapting a toy language (Monkey) into a dialect (pua-lang) whose keywords are ridiculous Chinese techno-babble. This is the kind of code I want to highlight:
赋能 拔河123 = 抓手(x) {
  细分 (x 对齐 0) {
    0;
  } 路径 {
    细分 (x 对齐 1) {
      1;
    } 路径 {
      拔河123(x - 1) 联动 拔河123(x - 2);
    }
  }
};
拔河123(10);
Starting from a working grammar for Monkey, I tried:
CodeMirror.defineSimpleMode('monkey', {
  start: [
    { regex: /".*"/, token: 'string' },
    { regex: /(?:fn|let|return|if|else|抓手|赋能|细分|路径|反哺)(?:\b|(?=\s|[()]))/, token: 'keyword' },
    { regex: /true|false|null|三七五|三二五/, token: 'atom' },
    { regex: /\d+|[-+]?(?:\.\d+|\d+\.?\d*)/, token: 'number' },
    { regex: /[-+\/*=<>!]|对齐|联动|差异|倾斜/, token: 'operator' },
    { regex: /[\{\[\(]/, indent: true },
    { regex: /[\}\]\)]/, dedent: true },
    { regex: /\p{XID_Start}\p{XID_Continue}*|[a-z$][\w$]*/u, token: 'variable' },
  ],
  comment: [],
  meta: {},
});
Now the keyword part looks over-complicated, but that's just an idiosyncrasy of \b, which is defined in terms of the ASCII-only \w and so never fires between a Chinese keyword and a space. Hardcode a look-ahead, and then it works in both the console and this grammar. What's really weird is that some things work in the console (as /...regex.../u.exec('string')) but not in the grammar, specifically the operator and variable tokens.
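For reference, here is a minimal sketch of the console checks described above, runnable in Node or a browser console. The sample strings are taken from the snippet at the top, and the variable names (withWordBoundary, variableNoU, and so on) are made up for illustration. The last check rebuilds the variable pattern without the u flag purely as a diagnostic: it shows how much the pattern depends on that flag. Whether the grammar actually keeps the flag is an assumption I have not verified.

```javascript
// \b is defined via \w, which only covers ASCII [A-Za-z0-9_], so there is
// no "word boundary" between a CJK character and a following space.
const withWordBoundary = /(?:细分|路径)\b/;
const withLookahead = /(?:细分|路径)(?=\s|[()])/;

console.log(withWordBoundary.test('细分 (x 对齐 0)')); // false
console.log(withLookahead.test('细分 (x 对齐 0)'));    // true

// Operator and variable patterns copied verbatim from the grammar.
const operator = /[-+\/*=<>!]|对齐|联动|差异|倾斜/;
const variable = /\p{XID_Start}\p{XID_Continue}*|[a-z$][\w$]*/u;

console.log(operator.exec('对齐 0')[0]);       // '对齐'
console.log(variable.exec('拔河123(10)')[0]);  // '拔河123'

// Diagnostic only (assumption: the flag might get lost somewhere).
// Without `u`, \p{...} is no longer a Unicode property escape; it is
// read as the literal text "p{XID_Start}", so nothing matches here.
const variableNoU = new RegExp(variable.source);
console.log(variableNoU.exec('拔河123(10)'));  // null
```

If the grammar behaves like the last line rather than the one before it, the flags would be the first thing to look at.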
What did I mess up here? (pr)