I love the smell of

in the morning
Everything here is my opinion. I do not speak for your employer.
March 2008
April 2008

2008-03-23 »

A tale of five merges, part 3: 'git merge'

In the last two articles from this series, I talked about merging branches in CVS and SVN, and how they're largely the same.

Today is the first of two articles about merging in git. git is most definitely not the same as CVS and SVN.

Remember, the questions we're investigating for each merging system are: (A) how automatic is it, (B) what happens to the change history, (C) what happens when you merge back and forth multiple times between two branches, and (D) what happens when you "cherry pick" individual changes, usually bugfixes, from one branch to another.

Merging with 'git merge'

Compared to working with CVS or SVN, merging branches in git feels like magic. First, you check out the branch that you want to merge into, say MASTER. Then you type "git merge BRANCH", where BRANCH is the name of the branch you want to merge from. And that's all.

...When it works.

Aha. You see, "git merge" is really just doing exactly what you were doing in CVS or SVN by hand. That's why I went into CVS merging in such detail before.

Here's what really happens: given the name BRANCH and the implicit name MASTER (since that's the one you have checked out), git traces back in the history of both until it finds the branchpoint, that is, the last commit before the two diverged in the first place.(1) If we were using CVS, we'd call that point BEFORE, and BRANCH would be called AFTER. git then simply takes all the changes from BEFORE to AFTER and adds them to MASTER.

Now here's where things get especially clever. Remember when we were discussing bidirectional merges in CVS, and I pointed out that merging all the changes from BEFORE to AFTER into HEAD was the same as merging all the changes from BEFORE to HEAD into AFTER? The resulting tree is the same in both cases, but of course, in CVS and SVN, the actual commit you make is different in each case. After all, in one case, you're committing to the HEAD branch, and in another, you're committing to the AFTER branch.

git is smarter than that. In git, there really is no difference between the two operations. You generate a single commit that has two parents: AFTER and HEAD (or BRANCH and MASTER). Now, it so happens that git updates the MASTER branch pointer to point at your new commit in one case, or the BRANCH branch pointer in the other case, but the commit itself is exactly the same in both cases.

And that simple fact is why repeated merges and bidirectional merges work. Next time you're merging MASTER and BRANCH, git will trace back, and the most recent checkin that is shared between the two branches will be one of the parents of that magical two-parent merge checkin. We use that one as BEFORE, and merge the changes from BEFORE to BRANCH into MASTER. Or else we merge the changes from BEFORE to MASTER into BRANCH; again, it's the same thing.

This concept of a two-parented commit is what makes git merging totally different from SVN, and it's the difference between SVN's "linearized history" (in which every commit has exactly one parent, even if we made it that way artificially) and git's non-linear history.(2) It's also the reason git merging can be automatic, while svn's cannot.

Why it doesn't always work

The problem with git's automatic merging is, well, that it's automatic. When it's working for you, it's amazingly powerful. But people who are used to CVS or SVN merging sometimes feel like git is taking their power away.

For example, perhaps you don't want to merge all the changes from BRANCH; perhaps you only want the changes up until sometime last week. With 'svnmerge' this is easy enough to do, but with git it requires multiple steps, since git-merge won't accept arbitrary revisions on its command line, only branch names.(4)

Or say you create BRANCH1 from MASTER, and start to add feature #1. Then you create BRANCH2 from BRANCH1, and add (unrelated) feature #2. Now you decide that feature #2 is ready to merge back into MASTER, but feature #1 is not. No problem, right? You checkout MASTER and then type "git merge BRANCH2".

Oops! git has just merged all the history of BRANCH1 and BRANCH2 back into MASTER, which is not what you wanted at all! Unfortunately, because git's merging is fully automatic, there is no easy way to get the behaviour you wanted.(3) With CVS, on the other hand, what you would do is obvious: "cvs up -j BRANCH1 -j BRANCH2", to merge all the changes from BRANCH1 to BRANCH2 into MASTER. With SVN, it would be similar, except that BRANCH1 and BRANCH2 involve revision numbers and URLs.

In git, the solution to this sort of problem is to use the "git rebase" command, which we'll discuss in more detail next time. But any use of git-rebase or git-filter-branch, both of which are very useful, fundamentally change the commit IDs on the modified branch. "git push" and "git pull" stop working as expected. Essentially, these commands generate entirely new commits for the part of the tree they change, so now there are two sets of commits in your repository that do the same thing in two different ways, with nothing tying them together. Worse, after a rebase, the point where your two branches diverged might be no longer known to git - they no longer have a single commit id in common - so it just gives up.(5) And trying to merge the new MASTER back into the original BRANCH2 later will result in a ton of conflicts.

The really bad news is that git-svn, which is otherwise a great tool to help with migrating between git and svn, uses git-rebase pretty heavily. That confuses git's automated merging features altogether, and git's manual merging features are not as straightforward as SVN's, so things get confusing very fast.

Git merge and cherry picking

git includes another command called "git cherry-pick". Its job is to take the changes from one specific commit and apply them to the current branch.

Unfortunately, the existence of git cherry-pick is incompatible with git merge's view of the world. git's history may not be linear, but at least it's continuous: branches may swerve into and out of each other as they pass through time, but individual changes aren't supposed to simply jump from the middle of one tree into the middle of another.

Since there's no way to represent what really happened, a cherry-picked patch instead produces an entirely new commit with exactly one parent (the destination branch), so a future git-merge from that branch has no idea the patch already existed, and usually produces a conflict.

Cherry picking in SVN with 'svnmerge' definitely works better than this.

How the git changelog tracks merges

Last time around, I talked about how SVN's "linear" change history tracks merges, which is simple enough: every time you do a merge, it puts a single checkin into the destination branch which contains all the changes. And if you want to know the details of every patch from the source branch, well, you'll have to go look at the source branch yourself.

git definitely doesn't do that. Instead, if you ask for "git log" it will show you the complete set of changes in all branches leading up to the current commit. That is, all of the checkins to either of the merged branches show up in the log, with nothing in particular in the "git log" output identifying which patch came from where.(6) This is technically correct, but is not actually what most users wanted to know.

Most people who ask for the history of the MASTER branch want to know the "simplified" form of its history: Added feature X; added feature Y; fixed feature X; added feature Z. But instead, they confusingly see all the individual 57 commits making up "added feature X", even though if they were ignoring your featureX branch all along, "added feature X" was really a one-time event for them.(7)

A related problem is that git doesn't have a way of naming branches globally. So if you merge from the "featureX" branch, you're not really merging from featureX; you're merging from 2bed2ad4845eceb8ee650f34e476a60f9fbecc7c. As far as the computer is concerned, that's the same thing, but when it comes to tracing back through your history, it's not actually what you wanted to know.

Worse still, git doesn't remember at branch time where you diverged; it only figures it out at merge time. That means there is no equivalent to SVN's "svn log --stop-on-copy", which shows only the changes you added in this branch.(8)

git-merge overall

Now let's go back to the questions we were trying to answer in the first place:

git-merge: (A) does merges fully automatically, when it works, but complicates manual merges; (B) retains the history of each change, but that often turns out to be more than you wanted; (C) easily supports back-and-forth merges with no more trouble than single merges; and (D) does not really support cherry picking without introducing future merge conflicts.

Phew!

Next time: how 'git rebase' tries to solve some of the things people don't like about git merge.

Footnotes

(1) This is actually greatly oversimplified. Since git history is nonlinear, just tracing back through it is a bit complicated. On top of that, you can play tricks with selecting different branchpoints for different files, and so on.

(2) I guess if SVN's history is "linear", then git's history is "helix shaped," with endless splitting and remerging of histories.

(3) git proponents would say that what you should have done is branched BRANCH2 from MASTER in the first place. But not everyone is lucky enough to get everything right the first time.

(4) There seems to be no actual reason for this restriction. Perhaps it will go away eventually.

(5) That's the source of the mysterious note that "You should understand the implications of using git rebase on a repository that you share" in the git-rebase man page. Of course, the man page doesn't really tell you what the implications are. It just says it "will cause problems."

(6) The "gitk" and "gitweb" programs attempt to represent this graphically, which helps. The "git show-branch" program is also somewhat useful here. But all of these are much harder to understand than "svn log".

(7) There is actually a "--squash" feature of git-merge that tries to resolve this problem. Essentially it creates an entirely new commit with only one parent - the destination branch - and doesn't tie in at all to the history of the other branch. In other words, it does exactly what "svn merge" would do, but that has exactly the opposite set of tradeoffs, as discussed last time, and makes bidirectional merges very ugly. Alternatively, "git log --first-parent" displays only the first parent of each merge, which is closer to what most users actually want.

(8) If you manually keep track of which branch you came from, you can use "git log BEFORE..MYBRANCH" to see the list of changes between BEFORE and MYBRANCH. But if you're going to manually keep track of BEFORE, you might as well be using CVS. "git show-branch" also attempts to help here, but if you ever use "git rebase" its output will be hopelessly cluttered and confusing.

I'm CEO at Tailscale, where we make network problems disappear.

Why would you follow me on twitter? Use RSS.

apenwarr on gmail.com