Feasibility Assessment: Extension to decorate conflict markers

DaRosenberg · July 20, 2024, 3:46pm

I would like to get a conversation started about a potential new CodeMirror extension that we have been pondering for some time. Initially I would like to describe our idea and thoughts about the implementation, and get some feedback from people more familiar with CodeMirror’s extensibility points and APIs than we are (in particular from you @Marijn of course, but also from anyone who has insights) to gather an overall assessment on whether this would be feasible to do, and a sense of how difficult or easy it would be.

Depending on what people think, there are of course several options for next steps to try to get this implemented. We can discuss this at a later stage.

Before I go any further, I want to clarify: this is not about building a three-way merge editor as many others have suggested already. This proposal is something much simpler (presumably) but related - more information on that follows below.

Background

Conflict markers

When using git merge there are sometimes conflicts that Git cannot automatically resolve. Some conflicts involve files that have been modified on both sides of the merge (theirs and ours), where both sides modified the same lines in the file, so that Git cannot resolve them automatically. A contiguous range of such lines we can call a conflict section. These sections are left as conflict markers in the merged files, for the user to resolve as she sees fit.

Here is a simple example:

Michael James Anderson
Sarah Elizabeth Thompson
David Alexander Harris
<<<<<<< HEAD
Emily Louise Mitchell
=======
Emily Jane Mitchell
>>>>>>> dev
Christopher Daniel Evans
Amanda Nicole Foster
Matthew Thomas Carter
Olivia Rose Bennett

This shows how a line has been modified to “Emily Louise Mitchell” in the current branch (ours) and to “Emily Jane Mitchell” in the incoming branch (theirs). There is no indication here of the original (base) value.

Conflict marker styles

Git can also be configured to use a few different styles for these conflict markers. What you see above is the default merge conflict style, but there are also diff3 and zdiff3 styles. These also add the base (i.e. the original) content inside the conflict marker, like so:

Michael James Anderson
Sarah Elizabeth Thompson
David Alexander Harris
<<<<<<< HEAD
Emily Louise Mitchell
||||||| d78303c
Emily Grace Mitchell
=======
Emily Jane Mitchell
>>>>>>> dev
Christopher Daniel Evans
Amanda Nicole Foster
Matthew Thomas Carter
Olivia Rose Bennett

In this example Git has been configured to use zdiff3 and it shows that the original (base) value was “Emily Grace Mitchell”.

Resolving conflict markers

In a basic real-world conflict resolution scenario, the user would inspect this file and its remaining conflict markers, choose how to resolve each one, and manually edit the file to the desired final state by replacing the entire conflict section with whatever content the user thinks should remain, save the file and finalize the merge.
Performing manual text-based resolution of conflict sections is simple in principle, but not very ergonomic due to:

Complete lack of visual aids such as color highlighting of the changes
Vertically stacked layout makes it difficult to visually compare the changed lines
No assistive “shortcut” actions to choose one side or another - everything must be done by manual editing

Existing tools

Some tools attempt to provide some assistance to help. For example, Visual Studio Code highlights conflict markers with some colors and provides code lens actions to accept a given side. Here’s the same zdiff3 style conflict section we saw above rendered in Visual Studio Code:

We want to provide something similar to this to our users, but implemented as an extension for CodeMirror, and with more visual clarity and even better ergonomics.

Not a three-way merge editor

As mentioned, what we are proposing is different from using a three-way merge editor.
A full-fledged merge editor such as Beyond Compare or Kaleidoscope will typically disregard the conflict markers left by Git in the target file, and instead extract the ours, theirs and base versions from Git, and perform a three-way merge from scratch based on these three files, and then overwrite the target file with the resulting merged file.

In the scenario we want to target, by contrast, Git has already performed the merge of the file in question, and the file already contains the changes from both sides that did not conflict (let’s call these the non-conflicting changes) along with conflict sections for anything Git was not able to resolve automatically.

We simply want to visualize those remaining conflict sections, and provide actions for a user to interact with them. There is only a single file to consider, and only a single editor is needed to resolve the remaining conflicts inside it.

Proposed Design

In a nutshell we are proposing an extension that turns this…

into this:

Basic principle

Use a multi-line regex to identify conflict markers in the document and extract the relevant fragments
- Note: I’ve seen that there is a placeholder utility but I recall seeing that it can only be used to match strings in a single line. Do similar utilities exist to match over multiple lines? Is it reasonably easy to do, or would this be a significant obstacle to overcome?
Replace each such section with a block-level widget that shows the change versions side-by-side horizontally
Use diff utilities in the @codemirror/merge package to highlight inline differences in ours and theirs compared to base
Disallow modifications inside the widget, but allow selection
Provide actions to accept any one of the three versions, i.e. replace the entire conflict section with that version
Provide actions to copy any one of the three versions to the clipboard
Provide an action to dismiss the conflict, i.e. simply delete the conflict section from the document
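To make the first step above more concrete, here is a minimal sketch of what the multi-line matching could look like when run over the whole document text. It is plain TypeScript with no CodeMirror dependency. The `ConflictSection` shape and the `findConflictSections` name are made up for illustration, and the sketch assumes LF line endings and the standard seven-character markers.

```typescript
// Hypothetical shape for one matched conflict section (illustrative names only).
interface ConflictSection {
  from: number;           // offset of the opening "<<<<<<<" marker
  to: number;             // offset just past the closing marker line
  oursRef: string;
  ours: string;
  baseRef: string | null; // null when the base marker carries no ref or is absent
  base: string | null;    // null for the default "merge" style
  theirs: string;
  theirsRef: string;
}

function findConflictSections(text: string): ConflictSection[] {
  // Groups: 1 ours ref, 2 ours, 3 base ref, 4 base, 5 theirs, 6 theirs ref.
  // The "|||||||" part is optional so all three marker styles are matched.
  const re =
    /^<{7}(?: (.*))?\n([\s\S]*?)(?:^\|{7}(?: (.*))?\n([\s\S]*?))?^={7}\n([\s\S]*?)^>{7}(?: (.*))?(?:\n|$)/gm;
  const sections: ConflictSection[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    sections.push({
      from: m.index,
      to: m.index + m[0].length,
      oursRef: m[1] ?? "",
      ours: m[2] ?? "",
      baseRef: m[3] ?? null,
      base: m[4] ?? null,
      theirs: m[5] ?? "",
      theirsRef: m[6] ?? "",
    });
  }
  return sections;
}
```

Each captured version keeps its trailing newline, and `to` includes the newline after the closing marker, so a version can be substituted for the whole section without leaving an empty line behind.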

No additional editor state

One of the main drivers behind proposing this particular design is that it would (presumably) not require any additional editor state. The current document content is sufficient at all times to represent the current state. The extension is only a visualization of the current content, along with some actions to mutate that content.

The normal document history and undo/redo actions can be used to get back to previous states, such as to restore a conflict section that was accepted or dismissed by mistake.
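As a rough illustration of why no extra state should be needed: every action reduces to one replacement of the conflict section's range. The sketch below uses hypothetical names (`Section`, `resolveChange`, `applyChange`) and operates on plain strings. The `{from, to, insert}` shape is chosen because CodeMirror accepts changes in that form, so in an actual extension the object would presumably be dispatched as a single transaction rather than applied by hand.

```typescript
// Same {from, to, insert} shape CodeMirror uses for a change; defined locally
// so the sketch has no dependencies.
interface Change { from: number; to: number; insert: string }

// Hypothetical minimal description of an already-located conflict section.
interface Section {
  from: number;
  to: number;
  ours: string;
  base: string | null;
  theirs: string;
}

type Resolution = "ours" | "base" | "theirs" | "dismiss";

// Accepting a version replaces the whole section with it;
// dismissing replaces the section with nothing.
function resolveChange(s: Section, choice: Resolution): Change {
  const insert = choice === "dismiss" ? "" : (s[choice] ?? "");
  return { from: s.from, to: s.to, insert };
}

// Stand-in for dispatching the change to an editor: applies it to a string.
function applyChange(doc: string, c: Change): string {
  return doc.slice(0, c.from) + c.insert + doc.slice(c.to);
}
```

Because the result depends only on the current document text and the chosen version, undoing the change brings back the original markers, and the widget would reappear simply because the pattern matches again.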

Mapping

Once matched by the regex, here is how the pieces of a conflict section would be mapped to the rendered widget:

The strings “ours”, “original” and “theirs” should be used by default, but the host application should also be able to specify those in options when needed. Similarly, the refs are taken from the conflict marker by default, but also those can be overridden as options by the host application.

This enables scenarios where the sides of a conflict might be more recognizable by the user if they are expressed in terms of some higher-level application-defined concept.
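A possible shape for these options, sketched in TypeScript. All names here (`ConflictWidgetOptions`, `labels`, `refs`, `paneTitles`) are hypothetical and only meant to show the defaulting behavior: labels fall back to the fixed strings, refs fall back to whatever was found in the conflict marker.

```typescript
interface PerPane<T> { ours: T; base: T; theirs: T }

// Hypothetical options object; not an existing CodeMirror API.
interface ConflictWidgetOptions {
  labels?: Partial<PerPane<string>>; // overrides "ours" / "original" / "theirs"
  refs?: Partial<PerPane<string>>;   // overrides the refs found in the markers
}

const defaultLabels: PerPane<string> = {
  ours: "ours",
  base: "original",
  theirs: "theirs",
};

// Combines defaults, marker refs, and host overrides into per-pane titles.
function paneTitles(
  markerRefs: PerPane<string>,
  options: ConflictWidgetOptions = {}
): PerPane<{ label: string; ref: string }> {
  const labels = { ...defaultLabels, ...options.labels };
  const refs = { ...markerRefs, ...options.refs };
  return {
    ours: { label: labels.ours, ref: refs.ours },
    base: { label: labels.base, ref: refs.base },
    theirs: { label: labels.theirs, ref: refs.theirs },
  };
}
```

A host application could then call, for example, `paneTitles(refsFromMarker, { labels: { theirs: "incoming" } })` and leave everything else at its default.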

Layout

The three versions of the conflicted section (ours, base and theirs) are stacked side-by-side from left to right horizontally, each in one pane.

The horizontal layout significantly improves ergonomics and reduces the cognitive load required to compare the sides of the conflict, because the different versions of a modified line are always adjacent (unlike when the sides are presented vertically and two versions of the same line can be potentially dozens of lines apart, depending on conflict section length).

Each version pane always occupies one third of the available horizontal space, such that the versions are always equally wide. If their content contains lines longer than the available space, each pane scrolls horizontally to accommodate the overflow. (Respecting the word-wrapping configuration of the surrounding editor would also be acceptable, but not worth much additional effort or complexity.)

The horizontal layout is also much more scalable. Conflict sections can consist of any number of lines; the widget can simply grow vertically to accommodate, and each version pane grows accordingly.

Highlighting diff between versions

As mentioned above, the rendered widget should be able to use the existing utilities in the @codemirror/merge package to highlight inline differences in ours and theirs compared to base as illustrated here:
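To show the kind of data the highlighting needs, here is a deliberately naive stand-in: it finds the single changed span of a version relative to base by trimming the common prefix and suffix. This is not the algorithm in @codemirror/merge, and `changedRange` is a made-up name. The real extension would use that package's diff utilities, which can report several changed ranges per text.

```typescript
// A changed span in the compared version, as [from, to) character offsets.
interface ChangedRange { from: number; to: number }

// Naive stand-in for a real diff: trims the common prefix and suffix and
// reports whatever remains of `version` as the single changed range.
function changedRange(base: string, version: string): ChangedRange | null {
  if (base === version) return null;
  let start = 0;
  const limit = Math.min(base.length, version.length);
  while (start < limit && base[start] === version[start]) start++;
  let endBase = base.length;
  let endVersion = version.length;
  while (
    endBase > start &&
    endVersion > start &&
    base[endBase - 1] === version[endVersion - 1]
  ) {
    endBase--;
    endVersion--;
  }
  return { from: start, to: endVersion };
}
```

For the example above, comparing "Emily Grace Mitchell" with "Emily Louise Mitchell" yields the range covering "Louis" (the trailing "e" is shared with "Grace"), which is the span that would receive a background color in the ours pane.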

Base version is optional

The default conflict marker style produced by git merge has no base version. The diff3 and zdiff3 formats both include the base version and their formats are exactly the same; they differ only in what ends up inside the base version in some cases.

This proposed extension should handle all three styles, i.e. should handle the absence of a base version gracefully. This means the regex used to match conflict sections needs to have the base section as optional, and the middle pane titled “original” can be missing. In such cases:

The ours and theirs panes each occupy 50% of the available horizontal space
The diff highlight can either be done between ours and theirs or disabled altogether
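The two cases can be captured in a small layout decision, sketched here with hypothetical names (`Pane`, `layoutPanes`, `diffWithoutBase`). Whether to diff ours against theirs when there is no base is left as a flag, since the text above keeps both options open.

```typescript
type PaneId = "ours" | "base" | "theirs";

interface Pane {
  id: PaneId;
  widthPercent: number;
  diffAgainst: PaneId | null; // which pane this one is highlighted against
}

// Decides pane order, widths, and diff targets from whether a base exists.
function layoutPanes(hasBase: boolean, diffWithoutBase: boolean = true): Pane[] {
  if (hasBase) {
    const third = 100 / 3;
    return [
      { id: "ours", widthPercent: third, diffAgainst: "base" },
      { id: "base", widthPercent: third, diffAgainst: null },
      { id: "theirs", widthPercent: third, diffAgainst: "base" },
    ];
  }
  return [
    { id: "ours", widthPercent: 50, diffAgainst: diffWithoutBase ? "theirs" : null },
    { id: "theirs", widthPercent: 50, diffAgainst: diffWithoutBase ? "ours" : null },
  ];
}
```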

Interactivity

The rendered widget provides several points of user interaction, as illustrated here:

Coexistence with other extensions

Syntax highlighting

The highlighting done by this extension is deliberately limited to background colors. This is to not interfere with any syntax highlighting that might simultaneously be enabled.

Unified merge view

It should be possible to enable this proposed extension together with the unifiedMergeView extension (provided in the @codemirror/merge package) on the same editor view. The unifiedMergeView extension can then be configured with the base version of the document (which would normally be extracted from Git separately) as the original document. As a result, non-conflicting changes in the document (i.e. those changes that Git has already managed to automatically resolve) would be visualized as well.

By composing the two extensions in this way, we get a reasonably full-featured conflict resolution experience, without the need to implement a three-way merge editor or any merging algorithms.

Undo/redo

As already mentioned, the actions provided by this extension should all be single-transaction mutations that integrate naturally with the normal undo/redo functionality. The user can then linearly undo any actions taken (e.g. accepting resolutions or dismissing conflicts) and the hosting application can optionally show UI for this outside the editor view itself.

Residence

In our opinion, this new proposed extension would fit very naturally in the @codemirror/merge package, because:

It is closely related to merge functionality
It would use some existing functionality in this package (the diff utility functions)
It would compose nicely together with the unifiedMergeView extension already provided by this package

Feasible or not?

So what does everyone think? Does this proposed design make sense, and is it feasible from an implementation standpoint?

Any feedback is highly appreciated.

AlexErrant · July 20, 2024, 5:39pm

Disclaimer: I am a CodeMirror noob.

I feel like it is unlikely that a block-level widget will accomplish your desired UI, especially if you wish for the conflicted sections to have the original’s syntax highlighting. Most grammars will throw up their hands upon seeing <<<<<<< HEAD, so you’ll need to have 3 separate source docs.

If I drop the syntax highlighting requirement, then we still run into

Each pane is read-only (but editable so that the user can select and copy parts of its content using both pointer and keyboard).

For UI consistency… each pane should be a CM editor. I suppose you could drop any consistency requirements and just render it in a div, but then you sacrifice keyboard selection (somewhat). I suppose you could insert a CM editor as a block-level widget, in which case we’re wayyyy outside my comfort zone.

Just some thoughts; I’d love to be wrong!

DaRosenberg · July 20, 2024, 7:39pm

Hey Alex, thanks for chiming in here, much appreciated!

Regarding syntax highlighting vs. the conflict markers: I don’t know if CM grammars are fundamentally different in this regard than those used by VS Code for example, but in the latter, syntax highlighting has no problems with the conflict markers. As an example, here is a C# file with a conflict section in it:

In fact, even if I remove one > at the end to make it not match the conflict marker pattern, the grammar seems to have no problem understanding what comes before, inside and after the conflict markers in VS Code… but you’re saying CM grammars would not be so forgiving?

For UI consistency… each pane should be a CM editor.

This thought occurred to me as well while I was writing it all up. To maintain consistency with the rest of the editor, and to enable things like keyboard selection, it seems inevitable indeed that each pane would need to be its own CM editor.

Would it necessarily be an inherently bad idea to have a block-level widget that hosts three small nested CM editor views?

AlexErrant · July 20, 2024, 8:37pm

VS Code uses TextMate grammars, which underneath run a buncha regex (which explains its performance characteristics hah), while CM (usually) uses Lezer, which is an LR-parser, so the grammars are pretty different. However, Lezer has good error recovery, so it too should be able to parse a document with conflict markers… as long as the conflicts are between complete statements/expressions, as in your screenshot. A more complex multi-line conflict that occurs inside or between statements/expressions may have… odd behavior. Like, a conflict line of just

.FirstOrDefault()

isn’t a complete statement/expression so IDK how a parser would be able to handle it without context.

Here’s a demo of some conflict markers in JS using https://lezer-playground.vercel.app/

The code is pretty nonsensical; don’t think too hard about it.

Would it necessarily be an inherently bad idea to have a block-level widget that hosts three small nested CM editor views?

That’s outside my experience, unfortunately. However a search of the forums reveals one user who says they got 3 levels of nested CM6 working.

JerryI · July 21, 2024, 12:37pm

Here are my 3 coins: in my experience, CM6 is extremely scalable. With some CSS magic, one can spawn more than 100 instances without any issues.

live demo

DaRosenberg · July 23, 2024, 1:20pm

OK so from a scalability standpoint, based on @JerryI’s example I’m not too worried.

Regarding syntax highlighting: for our use case the vast majority of documents will be XML, so I did some testing in the playground to see how the syntax highlighter deals with the presence of conflict markers in the document, and I can’t see any issues:

So I guess… by and large there don’t seem to be any major obstacles identified yet, right? What about the task of finding sections based on multi-line patterns? What do you guys think of that particular challenge?

AlexErrant · July 23, 2024, 4:33pm

Regex seems fine. You could always build a new lezer grammar and make a mixed-language for xml-gitconflictmarkers, but the advantages of that are not obvious to me.

DaRosenberg · August 2, 2024, 12:40pm

@marijn Two questions:

Do you think this extension would be a good fit for the @codemirror/merge package?
Regardless of where it would end up living, are you open to being funded by us (and potentially others) to develop this feature?

Kind regards,
Daniel Rosenberg
OrgFlow GmbH

marijn · August 4, 2024, 11:13am

Hey Daniel, I don’t think I want this in @codemirror/merge (it’s much more specific than the generic merge view features). But we could do a thing where I implement it on a free-lance basis. Write me an email and we can discuss.