"Help" fast merge be more accurate for structured data

s5bug · February 7, 2026, 7:37am

I’m moving my application Squirrel Explorer which currently uses Monaco over to CodeMirror to assist in reverse engineering Squirrel 2.2.5 bytecode files. The Monaco experience right now is this:

I’ve gotten the CodeMirror experience up to here:

Beyond the nice hatch pattern for added/removed lines, and the diff minimap showing where along the files the changes are, which are just cosmetic niceities, it’s pretty important to my workflow to be able to line up instructions side-by-side. Especially to see i.e. LoadInt(1, 2, 0, 0) on line 3 be matched up with LoadInt(2, 2, 0, 0) on line 4, telling me I’m both missing an instruction and have my stack set up wrong.

What I’m diffing is readonly, and as can be seen from the screenshot, has a lot of structure to it. However, because these outputs are upwards of dozens of thousands of lines long, I can’t use the slower diffing algorithm without the app locking up on every keystroke (Monaco resolves this issue by running the diff algorithm inside a web worker).

It would be nice if I could tell the diff algorithm something like: “for the next X lines, literals and params and outerValues and etc should line up on the [ and ], so use the slow diff algorithm on what’s between them”. Or at the very least, “use the slow algorithm for the first X lines and the fast algorithm for the rest”?

marijn · February 8, 2026, 10:31am

Sorry, but those aren’t things I feel would make sense to support in the diff algorithm. If it’s any help to you, I could make the algorithm pluggable, so you can bring your own.

s5bug · February 8, 2026, 8:00pm

That would actually help a lot. I already need to diff the binary representation of the data I’m outputting to the right merge view to reduce the amount of updates I’m making to it, and I could use that same logic to write the diff algorithm I want here. (The main thing I’ve never done before is matching things up within lines, but I imagine I can just grab the O(ND) algorithm already in the merge code and just run it on my small sections).

But actually, I’m not sure how to “attach” my binary representation to the editor context. I’d like to use that if possible instead of pretty printing → reparsing.

marijn · February 9, 2026, 7:06am

This would definitely need to happen in terms of document strings, because that’s what the library does the diffing on. Please take a look if attached patch is helpful for you.

s5bug · February 9, 2026, 8:30pm

Awesome, thanks! When I get a few hours of time to hack this out I’ll report back on if/how well it works.

s5bug · February 15, 2026, 11:36pm

Hmm, I’m actually really wishing I could access the editors’ states here: I already have a StateField that essentially holds the from/tos of where I pretty-printed each field to, so being able to access that would save me the trouble of re-parsing.

That being said, the only “re-parsing” I need to do here is looking for something like (\s+)(\w+) = \[, so I’ll try to get that working first.

s5bug · February 16, 2026, 2:03am

I got it working when I completely replace the entire buffer, but if I make incremental changes it no longer registers anything as changed.

Here’s it working as intended:

But if I send an empty transaction, I lose everything:

Here’s my code:

github.com/s5bug/squirrel-explorer

src/diff_cnut.ts

956cb0696

import { Change } from '@codemirror/merge'

/*
Begin copied from https://github.com/codemirror/merge/blob/3bb0627b0c4cbab0f1eaac1513d8d15ab024acf7/src/diff.ts
 */

function findDiff(a: string, fromA: number, toA: number, b: string, fromB: number, toB: number): Change[] {
  if (a == b) return []

  // Remove identical prefix and suffix
  let prefix = commonPrefix(a, fromA, toA, b, fromB, toB)
  let suffix = commonSuffix(a, fromA + prefix, toA, b, fromB + prefix, toB)
  fromA += prefix; toA -= suffix
  fromB += prefix; toB -= suffix
  let lenA = toA - fromA, lenB = toB - fromB
  // Nothing left in one of them
  if (!lenA || !lenB) return [new Change(fromA, toA, fromB, toB)]

  // Try to find one string in the other to cover cases with just 2
  // deletions/insertions.

This file has been truncated. show original

s5bug · February 24, 2026, 5:55pm

I see. It’s because the diff function is given partial strings, so I’ll have to be a little bit smarter.

s5bug · February 26, 2026, 5:54am

@marijn How bad of an idea is it to call the Lezer parser inside my diff function? Bottom-up means it’ll work (as best as it can) on the partial strings, right?

marijn · February 26, 2026, 6:22am

That sounds like it would be very expensive, since you won’t be able to do incremental re-parses and parsing unsyntactic (incomplete) input tends to be more work.

s5bug · March 15, 2026, 1:47am

This is definitely a “wrong” solution that I need to fix later, but running my broken/stupid algorithm for large changes & calling the smart built-in algorithm for small changes actually works with my workflow.

It’s nice and responsive too (much faster than Monaco!) so I’ve finally been able to deploy it. Thanks for all the help!