How to compress transactions

Hi all,

I’m using CodeMirror’s collab feature and saving the transactions in one of my DBs. The amount of data being read from and written to the DB is excessive, and I’m trying to see how to cut back. One thing I thought of right away is compressing the transactions. For example, the keys “map”, “effects”, etc. are repeated in every transaction. Couldn’t those be shortened to a single character if the data were compiled/compressed?

Another idea is to drop non-critical transactions, such as cursor transactions, from the list. Or is there an easy way to take an array of transactions and combine, say, the first 100 into a single transaction? I’m also open to any other way to make the whole system more efficient/compressed.

I’ve added an example of what part of my current array looks like when saved in the DB (I know it’s an object, but I’m calling it an array for simplicity).

Thanks so much.

{
    "81": "{\"clientID\":\"4f4761fd-9875-4709-ae1c-6915071fd4c8_oaiwejf@aofiej.com\",\"changes\":[34,[0,\"\",\"\"]],\"effects\":[],\"timestamp\":1726372240727}",
    "82": "{\"clientID\":\"4f4761fd-9875-4709-ae1c-6915071fd4c8_oaiwejf@aofiej.com\",\"changes\":[35],\"effects\":[{\"type\":{\"map\":{}},\"value\":{\"from\":35,\"to\":35}}],\"timestamp\":1726372240727}"
}

Are you just converting the raw collab Update objects to JSON? That’s sort of what that JSON looks like. But those "effects":[{"type":{"map":{}}] parts don’t look like they contain anything that could ever be converted back to useful data. Also, selection changes won’t generate such updates (your data doesn’t seem to even encode selection), so I’m not sure how to square that with your observation that maybe you don’t need to save cursor transactions.

What you need to save depends entirely on what you are doing with the data. If you want collab clients to be able to re-sync from any point in history, you do need precise changes since the point where they last synced. But it is entirely reasonable to disallow that for old points in history, and keep less fine-grained information (or no information at all) for older updates.

Changesets can be combined with compose, if you want to reduce the number of checkpoints you save.
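
As a minimal sketch of that, assuming the stored "changes" values are the output of ChangeSet.toJSON() (which is what the rows quoted above look like), consecutive change sets can be parsed back and folded together. The squash helper is a hypothetical name, not part of the library:

```typescript
import { ChangeSet } from "@codemirror/state";

// The "changes" values from rows 81 and 82 quoted earlier in the thread.
const storedChanges: unknown[] = [
  [34, [0, "", ""]], // keep 34 characters, then insert a line break
  [35],              // keep 35 characters, no edit
];

// Hypothetical helper: fold consecutive change sets into a single one.
// Each change set must start from the document the previous one produced,
// otherwise compose throws.
function squash(stored: readonly unknown[]): ChangeSet {
  if (stored.length === 0) throw new Error("nothing to squash");
  return stored
    .map((json) => ChangeSet.fromJSON(json))
    .reduce((acc, next) => acc.compose(next));
}

const combined = squash(storedChanges);
// One JSON value now stands in for both rows.
console.log(JSON.stringify(combined.toJSON()));
```

The trade-off is the one described above: a client that last synced somewhere inside the squashed range can no longer be brought up to date from it, and the per-update clientID is lost, so this only makes sense for history you no longer allow re-syncing from.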

These are the updates that you’re seeing:

const updates = sendableUpdates(this.view.state);

My data definitely has selection included. As you can see, this one is dealing with the cursor:

"82": "{\"clientID\":\"4f4761fd-9875-4709-ae1c-6915071fd4c8_oaiwejf@aofiej.com\",\"changes\":[35],\"effects\":[{\"type\":{\"map\":{}},\"value\":{\"from\":35,\"to\":35}}],\"timestamp\":1726372240727}"

Because it has the “from” and “to”.

What do you think I’m doing wrong that’s causing you to see useless data like "effects":[{"type":{"map":{}}]?

I see, you’re storing a shared effect to track remote selections and then directly dumping the JSON. That adds the useless {"type":{"map":{}} property to the output, since StateEffect objects don’t implement a toJSON method. And, I guess, you’re using a custom deserializer to parse that back into an effect.

So I guess you can already cut down on JSON size by being more careful about how you serialize these objects.
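
A rough sketch of what a more careful serializer could look like, under some assumptions: the only shared effect is the cursor one carrying {from, to}, the timestamp is something you attach yourself, and the short keys (c, d, s, t) and the names packUpdate, StoredUpdate and UpdateLike are made up for illustration. UpdateLike is a structural stand-in for the fields of a collab Update that are used here, so the snippet has no CodeMirror import:

```typescript
interface CursorRange { from: number; to: number }

// Hypothetical compact storage shape with short keys.
interface StoredUpdate {
  c: string;         // clientID
  d: unknown;        // changes, as produced by toJSON()
  s?: CursorRange[]; // cursor ranges from shared effects, omitted when empty
  t: number;         // timestamp
}

// Structural stand-in for the parts of an update this function reads.
interface UpdateLike {
  clientID: string;
  changes: { toJSON(): unknown };
  effects?: readonly { value: unknown }[];
}

function isCursorRange(v: unknown): v is CursorRange {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return typeof r.from === "number" && typeof r.to === "number";
}

// Write only the effect values, never the effect object itself, so the
// {"type":{"map":{}}} debris and empty "effects" arrays are not stored.
function packUpdate(update: UpdateLike, timestamp: number): StoredUpdate {
  const ranges = (update.effects ?? [])
    .map((effect) => effect.value)
    .filter(isCursorRange)
    .map(({ from, to }) => ({ from, to }));
  const packed: StoredUpdate = {
    c: update.clientID,
    d: update.changes.toJSON(),
    t: timestamp,
  };
  if (ranges.length > 0) packed.s = ranges;
  return packed;
}
```

Reading a row back would then mean rebuilding the effect from each stored range with your own effect type, which is roughly what your custom deserializer must already be doing. If the same few client IDs repeat in every row, storing them once and referencing them by a short index would likely save more than the key renaming does.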