Dynamic token types for ambiguous languages


#1

Hi there,

I’m wondering whether CodeMirror supports defining token types more dynamically. For example, if I were using CodeMirror to write English sentences, a word could be either a verb or a noun in a way that a parser cannot infer statically; that information must be supplied externally, by the user:

We saw her duck.

I’ve been pointed to markText, but it feels like the wrong place to start; even if I ended up using it, I might have to build much of this use case myself. Note that a token needs to keep its assigned type until it is modified. Optionally, I would also like to force a token to be deleted or replaced as a whole, which means disallowing edits inside the token.

Thanks,
Mihai


#2

The delete-or-replace-as-a-whole behavior is what the atomic option to markText provides.
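A minimal sketch of how the markText route could look, assuming the CodeMirror 5 API (doc.markText with the className, atomic, inclusiveLeft and inclusiveRight options, and the returned TextMarker's clear() and find()). The interfaces are declared locally so the sketch stands alone; the helpers tagWord, retagWord and tagOptions, the "pos-" class prefix and the noun/verb categories are hypothetical, for illustration only.

```typescript
// Locally declared subset of the CodeMirror 5 document API (markText and the
// TextMarker it returns), so this sketch type-checks without the library.
// In a real project, use CodeMirror's own typings instead.
interface Pos {
  line: number;
  ch: number;
}

interface MarkOptions {
  className?: string;
  atomic?: boolean;
  inclusiveLeft?: boolean;
  inclusiveRight?: boolean;
}

interface TextMarker {
  clear(): void;
  find(): { from: Pos; to: Pos } | undefined;
}

interface MarkableDoc {
  markText(from: Pos, to: Pos, options?: MarkOptions): TextMarker;
}

// Hypothetical categories chosen by the user rather than inferred by a parser.
type WordType = "noun" | "verb";

// Options shared by every tag: the type is encoded as a CSS class
// (e.g. "pos-verb"), and atomic: true keeps the cursor out of the range,
// so the word is deleted or replaced as a unit.
function tagOptions(type: WordType): MarkOptions {
  return {
    className: `pos-${type}`,
    atomic: true,
    // Text typed at either edge lands outside the marker instead of
    // silently extending the tagged word.
    inclusiveLeft: false,
    inclusiveRight: false,
  };
}

// Tag the word at [startCh, endCh) on the given line.
function tagWord(doc: MarkableDoc, line: number, startCh: number, endCh: number, type: WordType): TextMarker {
  return doc.markText({ line, ch: startCh }, { line, ch: endCh }, tagOptions(type));
}

// Change a word's type. marker.find() reports the marker's current range,
// which CodeMirror keeps up to date as the text around it is edited.
function retagWord(doc: MarkableDoc, marker: TextMarker, type: WordType): TextMarker | undefined {
  const range = marker.find();
  if (!range) return undefined; // marker no longer in the document (e.g. cleared)
  marker.clear();
  return doc.markText(range.from, range.to, tagOptions(type));
}
```

For "We saw her duck.", duck spans characters 11 to 15 of line 0, so tagWord(cm.getDoc(), 0, 11, 15, "verb") would tag it as a verb, and a CSS rule for .pos-verb would style it. Because the type lives on the marker rather than in the mode, it stays attached to that particular occurrence of the word as the surrounding text changes.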


#3

However, isn’t the mode the place to decide on this and to represent the differences as different token types? Is there a best practice for doing it at the mode level? Any reason not to do it like that?


#4

Nope, that’s not what CodeMirror modes do – they provide tokenizing information, but don’t directly change the state of the editor.


#5

But modes can indirectly change the state of the editor: they can change their mind about a token’s type. If a noun changes to a verb, isn’t that a change in the token type? If we don’t use the mode, aren’t we in effect dumbing down the mode and pushing knowledge that should belong in the mode elsewhere?
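For comparison, a minimal sketch of the mode-level alternative, assuming the CodeMirror 5 mode shape (an object whose token(stream, state) method consumes input and returns a style name or null) and only the StringStream methods eatWhile, next and current, declared locally so the sketch stands alone. The lexicon, makeSentenceMode and the word pattern are hypothetical. The mode reads user-supplied types from a lookup it closes over rather than inferring them.

```typescript
// Locally declared subset of CodeMirror 5's StringStream used below.
interface StreamLike {
  eatWhile(match: RegExp): boolean;
  next(): string | null | undefined;
  current(): string;
}

// CodeMirror 5 mode shape: token() consumes some input and returns a style
// name (CodeMirror renders it as the class "cm-<style>") or null.
interface ModeLike {
  token(stream: StreamLike, state?: unknown): string | null;
}

// Hypothetical user-maintained lexicon: lower-cased word -> part of speech.
// The mode only reads it; whatever UI lets the user edit it lives elsewhere.
type Lexicon = Map<string, "noun" | "verb">;

function makeSentenceMode(lexicon: Lexicon): ModeLike {
  return {
    token(stream) {
      // Consume a whole word, then look up its user-assigned type.
      if (stream.eatWhile(/[A-Za-z']/)) {
        const type = lexicon.get(stream.current().toLowerCase());
        return type ?? null; // words the user has not tagged stay unstyled
      }
      stream.next(); // whitespace or punctuation: consume one character
      return null;
    },
  };
}
```

One way to register it would be CodeMirror.defineMode("sentence", () => makeSentenceMode(lexicon)). Two caveats follow from the design: the editor does not know when the lexicon changes, so something must force a re-highlight (re-setting the mode option is one way), and a lexicon keyed by the word string gives every occurrence of "duck" the same type. Tagging individual occurrences would need positions that survive edits, which is the bookkeeping markText markers already do.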