It's a bird, it's a plane, it's

Everything here is my opinion. I do not speak for your employer.

2010-01-04

bup 0.01: It backs things up

I just spent a few days of my Christmas vacation writing a new program, bup.

bup is a program that backs things up. It's short for "backup." Can you believe that nobody else has named an open source program "bup" after all this time? Me neither. It also has almost no other meanings.

Despite its unassuming name, bup is pretty cool. To give you an idea of just how cool it is, I wrote you this poem:

Bup is teh awesome
What rhymes with awesome?
I guess maybe possum
But that's irrelevant.

Hmm. Did that help? Maybe prose is more useful after all.

Reasons bup is awesome

bup has a few advantages over other backup software:

  • It uses a rolling checksum algorithm (similar to rsync) to split large files into chunks. The most useful result of this is that you can back up huge virtual machine (VM) disk images, databases, and XML files incrementally, even though they're typically all in one huge file, and not use tons of disk space for multiple versions.
  • It uses the packfile format from git, so you can access the stored data even if you don't like bup's user interface.
  • Unlike git, it writes packfiles directly (instead of having a separate garbage collection / repacking stage) so it's fast even with gratuitously huge amounts of data.
  • Data is "automagically" shared between incremental backups without bup having to know which backup is based on which other one, even if the backups are made from two different computers that don't even know about each other. You just tell bup to back stuff up, and it saves only the minimum amount of data needed.
  • Even when a backup is incremental, you don't have to worry about restoring the full backup, then each of the incrementals in turn; an incremental backup acts as if it's a full backup, it just takes less disk space.
  • It's written in Python (with some C parts to make it faster) so it's easy to extend and maintain.
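The rolling-checksum split in the first bullet can be sketched in Python, the language bup itself is mostly written in. This is a minimal illustration, not bup's code: it assumes an rsync-style weak checksum over a 64-byte window and cuts a chunk whenever the low 13 bits of the checksum are all ones, which gives chunks of roughly 8 KiB on random-looking data. The names split_chunks, WINDOW and SPLIT_MASK, and both parameter values, are hypothetical.

```python
import hashlib

WINDOW = 64                 # bytes covered by the rolling checksum (assumed value)
SPLIT_MASK = (1 << 13) - 1  # cut when the low 13 bits are all 1s: ~8 KiB average chunks (assumed)


def split_chunks(data: bytes, window: int = WINDOW, mask: int = SPLIT_MASK) -> list[bytes]:
    """Cut data wherever the rolling checksum of the last `window` bytes matches `mask`."""
    chunks = []
    a = b = 0   # the two halves of an rsync-style weak checksum, kept modulo 2**16
    start = 0   # offset where the current chunk began
    for i, new in enumerate(data):
        # The byte falling out of the window; before the window fills, treat it as 0.
        old = data[i - window] if i >= window else 0
        a = (a - old + new) & 0xFFFF            # plain sum of the window's bytes
        b = (b - window * old + a) & 0xFFFF     # position-weighted sum, rolled forward
        if (b & mask) == mask:                  # boundary depends only on the last `window` bytes
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])             # whatever follows the last boundary
    return chunks


# Deterministic pseudo-random stand-in for a big file (256 KiB).
data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(8192))
chunks = split_chunks(data)
edited = data[:100] + b"new bytes" + data[100:]   # small insertion near the start
new_chunks = set(split_chunks(edited))
unchanged = sum(1 for c in chunks if c in new_chunks)
print(len(chunks), unchanged)
```

Because each cut depends only on the last 64 bytes, inserting data near the start of a big file changes only the chunk or two around the edit; every later chunk is byte-for-byte identical, so an incremental backup only has to store those few new chunks.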

Super quick example

(The README actually has a more detailed example.)

Try making a remote backup:

    tar -cvf - /etc | bup split -r myserver: -n my-etc -vv

Try restoring your backup (tar -t only lists what comes back; use tar -xf - to actually unpack it):

    bup join -r myserver: my-etc | tar -tf -

(On myserver) look at how much disk space your backup took:

    du -s ~/.bup

Make another backup (yes, that's exactly the same command):

    tar -cvf - /etc | bup split -r myserver: -n my-etc -vv

Look how little extra space your second backup used on top of the first:

    du -s ~/.bup
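Why the second du barely moves: each chunk is stored under a hash of its contents, so a chunk that is already in the repository costs nothing to save again. A minimal sketch, assuming a toy in-memory ChunkStore (hypothetical, not bup's API) that keys chunks by git's blob id, which is the SHA-1 of "blob <length>", a NUL byte, then the data. For brevity it cuts fixed 8 KiB pieces; for an unchanged input such as a second identical /etc tarball, any deterministic split produces the same chunks.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """The SHA-1 id git gives a blob: sha1 of b"blob <len>", a NUL byte, then the data."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class ChunkStore:
    """Toy content-addressed store (hypothetical): each distinct chunk is kept once."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def save(self, chunks) -> list[str]:
        """Store chunks; return the ordered ids needed to rebuild the stream."""
        ids = []
        for chunk in chunks:
            oid = git_blob_id(chunk)
            self.objects.setdefault(oid, chunk)  # already stored? then nothing new is kept
            ids.append(oid)
        return ids

    def restore(self, ids) -> bytes:
        return b"".join(self.objects[oid] for oid in ids)

    def size(self) -> int:
        return sum(len(chunk) for chunk in self.objects.values())


def fixed_chunks(data: bytes, size: int = 8192) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


# Hypothetical stand-in for the tar stream of /etc (128 KiB of pseudo-random bytes).
etc_tar = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(4096))
store = ChunkStore()
first = store.save(fixed_chunks(etc_tar))
after_first = store.size()
second = store.save(fixed_chunks(etc_tar))  # the identical second backup
print(after_first, store.size())            # the second save adds nothing
```

Both backups end up as lists of ids pointing into the same pool of chunks, which is also why restoring an older backup needs no "full plus incrementals" replay.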

Restore your first backup over again (the ~1 is git notation for "one older than the most recent"):

    bup join -r myserver: my-etc~1 | tar -tf -

What's next?

I have lots of plans for this lovely program, in the event that I actually get time to implement them. But if you think it's cool, please feel free to git clone it, hack away, and send some patches! Read the README for a list of some deficiencies in the current release.

I'm sure there are also more deficiencies that I don't know about, of course.

(Previous poetry-related adventures.)

Update (2010/01/05): Commentary at ycombinator news and reddit programming. To answer the most common question: it's different from most of those other apps you mention because: a) bup backs up really huge files rather than silently ignoring them or running out of memory; b) bup is a backend, not a GUI, while most of those apps are GUIs (which could use bup as a backend if they wanted); c) bup stores its backups in big packfiles, rather than a one-file-per-file model, and thus can be much faster (but 0.01 isn't optimized yet).
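Point c) is about layout: a packfile is one big file holding many objects back to back, instead of one file per stored object. A minimal sketch of writing blobs in git's version-2 pack layout, assuming no delta compression and only blob objects: a PACK header with version and object count, then for each object a variable-length type/size header followed by zlib-compressed data, and finally a SHA-1 of everything before it. The function names are hypothetical and this is not bup's code; a real git repository would also need a matching .idx file, whereas here the id-to-offset index is just a dict.

```python
import hashlib
import struct
import zlib

OBJ_BLOB = 3  # git's pack object type number for blobs


def object_header(obj_type: int, size: int) -> bytes:
    """Pack entry header: 3-bit type plus uncompressed size, 7 more size bits per extra byte."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: another size byte follows
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_pack(blobs) -> tuple[bytes, dict[str, int]]:
    """Return (pack bytes, {blob id: offset}), storing duplicate blobs once."""
    unique = {}
    for data in blobs:
        oid = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
        unique.setdefault(oid, data)
    pack = bytearray(b"PACK" + struct.pack(">II", 2, len(unique)))  # signature, version, count
    index = {}
    for oid, data in unique.items():
        index[oid] = len(pack)  # every object lives at an offset inside the one big file
        pack += object_header(OBJ_BLOB, len(data)) + zlib.compress(data)
    pack += hashlib.sha1(pack).digest()  # trailing checksum over the whole pack
    return bytes(pack), index


def read_object(pack: bytes, offset: int) -> tuple[int, bytes]:
    """Decode the entry at `offset`, returning (type, uncompressed data)."""
    byte = pack[offset]
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    offset += 1
    while byte & 0x80:
        byte = pack[offset]
        size |= (byte & 0x7F) << shift
        shift += 7
        offset += 1
    data = zlib.decompressobj().decompress(pack[offset:])  # stops at the end of this object's stream
    if len(data) != size:
        raise ValueError("size in header does not match data")
    return obj_type, data


pack, index = write_pack([b"hello\n", b"x" * 5000, b"hello\n"])
for oid, offset in index.items():
    print(oid[:8], offset, read_object(pack, offset)[0])
```

Appending to one file and remembering offsets avoids creating, syncing and later scanning one small file per chunk, which is where the potential speed advantage mentioned in c) comes from.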

Update 2 (2010/01/05): By popular demand (well, nonzero demand from the populace, anyway), I've created a mailing list. You can subscribe by sending an email to bup-list+subscribe@googlegroups.com (note the weird + character in the email address).

I'm CEO at Tailscale, where we make network problems disappear.


apenwarr on gmail.com