incorrect diff for large files in merge view

pondorasti · November 13, 2023, 8:37pm

I’ve noticed that when using very large files with the MergeView editor, the computed diff between the files is incorrect.

More specifically for my case, I have 2 ~1500+ loc files that have just a couple changes between them (not more than 40-50 loc) spread out throughout the whole document (small diffs at the beginning, middle, and end of the document).

Here’s the diff that I’m expecting to see when using a tool like diffchecker.com.

And this is how it’s rendered in CM using the MergeView. The whole document is basically flagged as one large diff.

Here’s a Gist with repro documents for testing and a playground configured with MergeView. Couldn’t generate a proper share link since the url is too long when using my repro documents.

marijn · November 13, 2023, 9:02pm

Computing the diff for such large (and different) files is too expensive to do interactively, so the editor falls back to an overapproximation of the diff.

pondorasti · November 13, 2023, 9:44pm

For my use case, I’m more interested in precision over speed. Is there a way I can prevent the editor from falling back to the approximation?

marijn · November 14, 2023, 6:41am

If people are allowed to edit the document, it seems that having a slow (and this is super-linear, so it gets very slow quite quickly) UI update cycle would make it unusable pretty quickly.

If users aren’t allowed to edit, maybe just statically rendering the text is easier than having it in a CodeMirror editor.

pondorasti · November 14, 2023, 6:59pm

My use case is very similar to GitHub’s diff mode in PRs. I use CM to look at the version of a source file or compare the source of a file across two different versions. The document is static to the user and is indeed initialized with EditorState.readOnly.of(true) and EditorView.editable.of(false).

If I understand correctly there’s no way of making CM render the content statically and get my desired outcome? Should I be looking at a different framework like prism for my use cases?

pondorasti · November 14, 2023, 7:11pm

Computing the diff for such large (and different) files is too expensive to do interactively, so the editor falls back to an overapproximation of the diff.

Could you go a bit more into detail in what is considered too expensive (file size, word count, etc) so I can get an idea of when the editor is expected to fallback to the approximation?

marijn · November 15, 2023, 11:07am

This patch makes the scan limit configurable through the merge view options.

pondorasti · November 15, 2023, 10:02pm

Awesome! I’ve put together a repo to try it out and it’s working as expected when setting the scanLimit (preview link).

Right now, I’m running the dependency off the main branch from GitHub. Once you cut a new release and make it official, I will update my package to the npm version.

// package.json
"@codemirror/merge": "github:codemirror/merge#main"

I greatly appreciate all the help and time you’ve put into this issue. Thanks for implementing the fix so fast.

marijn · November 16, 2023, 6:45am

I’ve tagged @codemirror/merge 6.3.0

cboulanger · April 16, 2025, 5:36pm

Hi, if I may continue this thread: this was very helpful since I am working with long documents with small changes, which were not correctly displayed before. The longer time it takes to compute the diffs (10-20 seconds) is acceptable in principle, but I would like to inform the user with a spinner/block screen. Is there an event I can listen to to know when the computation is finished? Thank you for any pointer.

marijn · April 17, 2025, 6:12am

Computing the diff is a synchronous process. So a spinner wouldn’t spin, and the UI would be entirely frozen. I don’t think that a 20 second freeze is really going to be acceptable in any circumstance, unfortunately.

cboulanger · April 17, 2025, 6:54am

Hi, thanks - I saw that my app was still responsive, just the editor wasn’t updating. In my special case, it’s not a problem because the result is so useful that the user of the app is happily waiting for the process to finish - the app also depends on long-running server-side computations, so my users have to be patient in any case.

cboulanger · April 17, 2025, 7:10am

Interestingly, while I was experiencing very long delays yesterday, adding the MergeView is almost instantaneously now even with scanLimit: 50000 and the same long documents as yesterday. Strange, my machine must have been busy with other things…