Dynamic token types for ambiguous languages

viridium · December 29, 2016, 4:54pm

Hi there,

I’m wondering if CodeMirror supports defining token types more dynamically. For example, if I were to use CodeMirror to write English-language sentences, a word in a sentence may be a verb or a noun in a way that a parser cannot infer statically; the information must be supplied externally by the user:

We saw her duck.

I’ve been pointed to markText, but it feels like the wrong place to start. Even if I ended up using it, I might have to build much of the use case myself. Note that I need a token to keep its assigned type until it’s modified; optionally, I would also like to force the token to be deleted or replaced all at once, which means disallowing edits inside the token.

Thanks,
Mihai

marijn · December 29, 2016, 7:18pm

That’s the atomic option to markText.
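
For illustration, a minimal sketch of how that could look with CodeMirror 5's `markText`. The helper name `tagWord` and the `pos-` class prefix are assumptions made for this example, not part of CodeMirror; `doc` is assumed to be anything exposing `markText(from, to, options)`, such as a CodeMirror 5 editor instance.

```javascript
// Sketch: tag a range with a user-chosen part of speech via markText.
// tagWord and the "pos-" class prefix are hypothetical names for this example.
function tagWord(doc, line, startCh, endCh, partOfSpeech) {
  return doc.markText(
    { line: line, ch: startCh },          // start of the word (inclusive)
    { line: line, ch: endCh },            // end of the word (exclusive)
    {
      className: "pos-" + partOfSpeech,   // style hook, e.g. a .pos-verb CSS rule
      atomic: true                        // cursor cannot be placed inside the range
    }
  );
}
```

With the sentence above on the first line, `tagWord(editor, 0, 11, 15, "verb")` would mark "duck". The returned marker has a `clear()` method, so switching the word to a noun could be done by clearing the marker and tagging the same range again.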

viridium · January 6, 2017, 1:18am

However, isn’t the mode the place to decide on this and represent the differences as different token types? Is there a best practice for doing it at the mode level? Any reason not to do it like that?

marijn · January 6, 2017, 8:09am

Nope, that’s not what CodeMirror modes do – they provide tokenizing information, but don’t directly change the state of the editor.

viridium · January 6, 2017, 9:59am

But modes can indirectly change the state of the editor: they can change their mind about a token’s type. If a noun changes to a verb, isn’t that a change in the token type? If we don’t use the mode, aren’t we in effect dumbing down the mode and pushing knowledge that should belong in the mode elsewhere?
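
To make the mode-level idea concrete, here is a rough sketch of a CodeMirror 5 style mode object whose token types come from an external, user-editable table. The names `makeEnglishMode` and `annotations` are hypothetical, and the table is keyed by word text only, which is a simplifying assumption: as far as I know, `token` sees the current line's stream but has no documented way to learn which line it is on.

```javascript
// Sketch: a mode whose token types are looked up in an external table.
// makeEnglishMode and `annotations` are hypothetical names for this example.
function makeEnglishMode(annotations) {
  return {
    token: function (stream) {
      if (stream.eatSpace()) return null;            // whitespace gets no style
      if (stream.match(/^[A-Za-z']+/)) {
        var word = stream.current().toLowerCase();
        // Own-property check so words like "constructor" do not hit Object.prototype.
        return Object.prototype.hasOwnProperty.call(annotations, word)
          ? annotations[word]                        // e.g. "verb" or "noun"
          : null;
      }
      stream.next();                                 // punctuation: consume one character
      return null;
    }
  };
}
```

It could be registered with something like `CodeMirror.defineMode("english", function () { return makeEnglishMode(annotations); })`. Because the lookup is by word text, every "duck" in the document would get the same type, and the editor would have to be told to re-highlight after the table changes. A per-occurrence type needs position information that survives edits, which is the part `markText` provides.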