CRDTs & Positions in CodeMirror 6

dmonad · August 18, 2020, 4:31pm

A few months ago you considered using a CRDT as a data model for CodeMirror.

https://marijnhaverbeke.nl/blog/collaborative-editing-cm.html

I completely understand that you are hesitant to integrate a CRDT into CodeMirror just for better position handling, although I have shown that the overhead of using a CRDT is not that large (https://blog.kevinjahns.de/are-crdts-suitable-for-shared-editing/).

ProseMirror and CodeMirror 6 now both implement a similar approach for transforming positions. From a developer-experience perspective, this is great: third-party plugins that provide features such as commenting on ranges of the document can map their positions uniquely while accounting for remote changes.
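To make the index-based approach concrete, here is a minimal sketch of mapping a numeric position through a single change. The `Change` shape and the `mapPos` helper are assumptions made for illustration; they mimic the general idea and are not the actual CodeMirror 6 or ProseMirror API.

```typescript
// Hypothetical, simplified change: replace the range [from, to) with `insert`.
interface Change {
  from: number;
  to: number;
  insert: string;
}

// Map an index position through one change.
// assoc < 0 keeps the position before inserted text, assoc > 0 moves it after.
function mapPos(pos: number, change: Change, assoc: number = -1): number {
  const { from, to, insert } = change;
  if (pos < from) return pos; // the change lies entirely after the position
  if (pos > to) return pos + insert.length - (to - from); // shift by the size difference
  // The position touches or lies inside the replaced range: collapse to one side.
  return assoc < 0 ? from : from + insert.length;
}

// A comment anchored at index 6 of "hello world" follows a remote insertion at 0.
const afterInsert = mapPos(6, { from: 0, to: 0, insert: "big " });

// Deleting "world" (6..11) collapses every position in that range to one index.
const del: Change = { from: 6, to: 11, insert: "" };
const collapsed = [6, 8, 11].map((p) => mapPos(p, del));
```

In this sketch the mapped position is always a plain number, which is what keeps the model simple. It also means that positions which were distinct before a deletion can no longer be told apart afterwards.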

Your blog post describes perfectly the advantages of using a CRDT for position mapping. As I described in the ProseMirror forum, the current position mapping approach is unsuitable for CRDT bindings to *Mirror editors. As a result, third-party plugins that rely on it won’t work when shared editing is provided by a CRDT.

I think there are several good arguments to reconsider using a CRDT for shared editing in CodeMirror:

Better position handling
The author of ShareJS and ShareDB openly speaks out against OT: https://news.ycombinator.com/item?id=24194091
The web has evolved to a point where web applications do work peer-to-peer over WebRTC. One of many real-world examples is room.sh. They use Yjs & CodeMirror 5 to provide collaboration in peer-to-peer WebRTC sessions.

I hope that the future web is more decentralized. Collaboration over WebRTC is already very common. Shared editing backends that use CRDTs are easier to scale because they only require a simple PubSub server to exchange updates (Nimbus based their editor on Yjs because CRDTs are easier to scale). I hope that Local-first software will become more relevant because I believe that data should be owned by the user.

Decentralized web applications are no longer an edge case and they hold a lot of future potential. I hope that CodeMirror 6 plugins that use the default position mapping will work as expected in decentralized applications.

So I hope that you reconsider using a CRDT as a data model in CodeMirror 6.

A bit of background about me: I’m the author of Yjs, a CRDT framework that enables shared editing over any network stack (WebRTC, Hyper, WebSocket, …). A couple of demos for shared editing in different editors (ProseMirror, CodeMirror, Quill, Atlaskit, …): https://docs.yjs.dev/ecosystem/editor-bindings & https://demos.yjs.dev/

My goal is to power shared editing on the web with Yjs. Different editors can use the same technology to provide shared editing over different backends. I think the Y.Text type would be a great data model for CodeMirror 6. If you are interested, I’d love to help you to integrate the Y.Text type into CodeMirror 6 and work with you on a better approach for position mapping. Another advantage of Yjs is that it supports selective Undo/Redo for free (no additional data structure to hold operations, just ranges on the vector clocks).

Alternatively, it would be helpful if CodeMirror 6 allowed CRDT editor bindings to hijack position mappings. However, as you explained, index positions don’t always accurately describe ranges in collaborative documents. With CRDTs, it is possible to refer to deleted characters. This is not possible if positions are described as index positions: with any change around the mapped position, some information gets lost. A commenting plugin that describes comments as numeric index ranges on the document wouldn’t work in WebRTC applications like room.sh.

Another alternative is to abstract positions in CodeMirror. A CRDT implementation could add markers to the abstract position to describe the position relative to the CRDT model without doing index transformations (i.e. map to the unique ID of the character).
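A rough sketch of what such an ID-based position could look like, assuming a toy sequence CRDT in which every character carries a unique ID and deleted characters remain as tombstones. The `Item` shape, the `client:clock` ID strings, and both helper functions are hypothetical; this is not how Yjs stores or resolves positions internally.

```typescript
// Hypothetical minimal sequence-CRDT item: each character keeps a unique ID
// and stays in the list as a tombstone after deletion.
interface Item {
  id: string;
  char: string;
  deleted: boolean;
}

// Turn a visible index into an ID-based position (null means "end of document").
function toRelative(items: Item[], index: number): string | null {
  let visible = 0;
  for (const item of items) {
    if (item.deleted) continue;
    if (visible === index) return item.id;
    visible++;
  }
  return null;
}

// Resolve an ID-based position back to a visible index in the current state.
function toIndex(items: Item[], id: string | null): number {
  let visible = 0;
  for (const item of items) {
    if (item.id === id) return visible; // also works if the item is a tombstone
    if (!item.deleted) visible++;
  }
  return visible; // null or unknown ID resolves to the end of the document
}

const doc: Item[] = [
  { id: "1:0", char: "a", deleted: false },
  { id: "1:1", char: "b", deleted: false },
  { id: "1:2", char: "c", deleted: false },
];
const anchor = toRelative(doc, 2); // the ID of "c"

// Remote changes: insert "XY" at the start, then delete "b".
doc.unshift(
  { id: "2:0", char: "X", deleted: false },
  { id: "2:1", char: "Y", deleted: false }
);
doc[3].deleted = true;
const moved = toIndex(doc, anchor);

// Deleting the anchored character itself does not lose the position.
doc[4].deleted = true;
const afterDelete = toIndex(doc, anchor);
```

No index transformation is applied here. The position is resolved against whatever the current state is, so the order in which remote changes arrived does not matter.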

This abstract position approach is the least intrusive approach that would allow CM to work in distributed software. I would still favor the CRDT integration, as shared Undo/Redo should be handled by the CRDT implementation. There are more advantages to integrating a collaboration-aware model into CodeMirror (e.g. support for diffs on the CRDT model, similar to the versions in the yjs.dev demo). But anything that would allow me to provide Yjs bindings for the fantastic CM 6 editor would work for me.

marijn · August 18, 2020, 6:42pm

Look, I spent a month and a half thinking about this, and wrote down my conclusions in that blog post. They stand.

I’m not sure how that would work. If mappings are defined in terms of document offsets (as opposed to operation IDs), how would you provide a useful mapping based on the CRDT data? Also, in which cases does this help?

In any case, the main argument against integrating a CRDT system is conceptual complexity, and ‘positions are just numbers’ is one of the main pieces of simplicity that this allowed me to preserve. I don’t expect you’ll be able to convince me otherwise.

Integrating Yjs as you did for ProseMirror, as an external component that drives the collaborative editing, should be even easier for CodeMirror (due to its much simpler data model). I think that’s the way to go. If you have specific concerns about how that’d work, let’s discuss those.

dmonad · August 19, 2020, 1:50pm

That’s fair. I just wanted to give this one last shot.

I just realized that the example in your blog post shows that positions in OT also don’t always converge. This was surprising to me. I’m currently educating people using y-prosemirror not to use the default position mapping because it doesn’t work in peer-to-peer scenarios. If this is the expected behavior for CodeMirror & ProseMirror, I can be more lenient about the fact that positions also don’t converge with Yjs.

I still think that in some cases this behavior is not acceptable, for example when implementing a shared-cursor plugin or a commenting plugin.

The idea is to find the position in the old CRDT state and then map that position to the new CRDT state.
I would implement this by associating every CodeMirror State object with a Yjs Snapshot. A snapshot is a view on the Yjs document that reflects the old state. A position mapping from one state to another would basically find the position in the old state (extracting the unique character ID) and map that position to the new state (by computing the offset of the unique character ID).
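The idea can be sketched roughly as follows, under strong simplifying assumptions: the snapshot is modeled as the set of item IDs that were visible in the old state, and the document is a flat list of items with tombstones. `Item`, `Snapshot`, `snapshotOf`, and `mapThroughSnapshot` are hypothetical names, not the Yjs snapshot API. The sketch always anchors to the character that follows the position in the old state, which is only one of several possible choices.

```typescript
// Hypothetical CRDT item; deleted items remain in the list as tombstones.
interface Item {
  id: string;
  deleted: boolean;
}

// Hypothetical snapshot: the IDs of all items that were visible in an older state.
type Snapshot = Set<string>;

function snapshotOf(items: Item[]): Snapshot {
  return new Set(items.filter((item) => !item.deleted).map((item) => item.id));
}

// Map an index in the old state (the snapshot) to an index in the current state.
function mapThroughSnapshot(items: Item[], old: Snapshot, pos: number): number {
  let oldIndex = 0; // offset counted in the old state
  let newIndex = 0; // offset counted in the current state
  for (const item of items) {
    const inOld = old.has(item.id);
    if (inOld && oldIndex === pos) return newIndex; // found the character ID at `pos`
    if (inOld) oldIndex++;
    if (!item.deleted) newIndex++;
  }
  return newIndex; // `pos` was at the end of the old document
}

const items: Item[] = [
  { id: "1:0", deleted: false },
  { id: "1:1", deleted: false },
  { id: "1:2", deleted: false },
];
const old = snapshotOf(items); // taken when the editor state was created

// Later changes: two characters inserted at the start, the second old one deleted.
items.unshift({ id: "2:0", deleted: false }, { id: "2:1", deleted: false });
items[3].deleted = true;

const mapped = mapThroughSnapshot(items, old, 2);
```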

If it was possible to hijack the position mapping and provide a custom mapping based on the CRDT model, it would be possible to have positions converge. This will be helpful for plugins that mark text and rely on positions to converge.

I’m currently struggling to find more good examples of when positions must converge. Maybe you are right and diverging positions are an acceptable tradeoff for a simpler position representation. It would still be nice to have positions converge when using Yjs.

marijn · August 19, 2020, 2:14pm

If your input is still a plain (and thus ambiguous in the face of deletions) position, I don’t see how you can provide a better mapping. A position next to a deletion will correspond to multiple CRDT character IDs, and you don’t have information to disambiguate that.
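The ambiguity can be illustrated with a small sketch, again assuming a toy item list with tombstones (the `Item` shape, the IDs, and `candidateAnchors` are hypothetical). It lists every item ID that a single plain index could plausibly stand for.

```typescript
// Hypothetical CRDT item; deleted items remain in the list as tombstones.
interface Item {
  id: string;
  char: string;
  deleted: boolean;
}

// List every item ID a plain index could refer to: the tombstones sitting at
// that index plus the next visible character (if there is one).
function candidateAnchors(items: Item[], index: number): string[] {
  const candidates: string[] = [];
  let visible = 0;
  for (const item of items) {
    if (visible === index) {
      candidates.push(item.id);
      if (!item.deleted) break; // the first visible character ends the run
    }
    if (!item.deleted) visible++;
  }
  return candidates;
}

// "abcd" with "b" and "c" deleted: the visible text is "ad".
const text: Item[] = [
  { id: "1:0", char: "a", deleted: false },
  { id: "1:1", char: "b", deleted: true },
  { id: "1:2", char: "c", deleted: true },
  { id: "1:3", char: "d", deleted: false },
];

// Index 1 (between "a" and "d") matches three different character IDs.
const candidates = candidateAnchors(text, 1);
```

With only the number 1 as input, nothing indicates which of the three IDs the position was originally attached to.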

dmonad · August 19, 2020, 3:46pm

You are right… this wouldn’t work either. For the few cases when convergence is required I will provide a different position abstraction that doesn’t need to be native to CodeMirror. Thanks for your feedback.