200304 - apenwarr

Everything here is my opinion. I do not speak for your employer.

← March 2003

May 2003 →

2003-04-06 »

Cascading Failures

Well, I promised not to discuss my life in here, but since I'm about to use it as an example of generalized system failure modes, I figure it's okay.

The goal: from Waterloo on Thursday morning, travel to Montreal by Friday evening at 8 pm. By car, this is a 7 hour drive. No problem, right?

Well, we were going to rent a car and drive on Thursday afternoon, but it started snowing/raining/sleeting so we decided not to drive after all; instead, let's take the train, which is safer in bad weather. Since the night train leaves you kind of tired, we decided to take the Friday morning (9:30 am) train from Toronto.

Friday morning, the weather still sucked, but that's okay. We called a taxi at 6:20 am to take us to the bus station in Waterloo. At 7 am, it finally arrived - delayed by bad weather, of course. So we missed the 7 am bus to Toronto. No problem, there's an 8 am bus that should still make it to Toronto in time. Unfortunately, the 8 am bus showed up at 8:30 (bad weather), departed shortly afterwards, and got to Toronto at 10:00 (extra late - bad weather). No problem, though; we rescheduled our train tickets from the 9:30 to the 11:30 train. We changed the reservation by cell phone from the bus, luckily, because by the time we arrived all the trains for the day were fully booked. Turns out all the airports were closed (bad weather) and the people taking flights had all switched to the train.

As we were picking up the tickets, they made an announcement that the 11:30 train would be leaving at 12:30 instead - bad weather. No problem: the 11:30 is supposed to get to Montreal at 4:45, so an hour later is 5:45, and even with additional weather delays I should certainly be in Montreal by 8 pm. So, we have some time, let's go for lunch.

At 12:20, we came back and found out that the train had left at 12:07, having been re-re-scheduled while we were gone. In fact, they had made the new announcement before we left the station, but because of a ridiculously loud random music performance (something about the Juno awards) in the middle of the station at the time, all the public announcements were inaudible.

Feeling guilty, they asked us to wait while they figured out what they'd do to get us to Montreal. The result: at 1 pm or so, we found out that they could squeeze us onto the 3:30 train (arrives around 9:30; useless) or a special 2:30 shuttle bus (could arrive at 8:30 in good weather; useless). So Via Rail wasn't going to be able to help.

Last chance: rent a car after all (there's a rental place at the train station) and drive it to Montreal. That takes at least 6 hours in good weather. By 1:30 we had almost finished filling out the rental forms, meaning that we could be in Montreal by 7:30 on a good day. Sadly, it wasn't a good day. (Interestingly, if we had known at 11:30 that we would miss the train, the rental would have saved us.)

I mentioned above that the airports were closed too (bad weather).

The Moral of the Story

Despite a metric tonne of backup plans (an extra day; an extra bus; an extra train; backup train should still arrive early; could rent a car if the train was cancelled) and slippage, we still didn't get to Montreal on time.

In management, we call this "slippage." In clustering, we call this "cascading failures."

The lesson to learn here is that if you're going to add redundancy (like the extra buses, trains, time, etc.) you'd best make sure that the same root cause can't screw up all of your backup plans at the same time. That means don't put a five-station Oracle database cluster on the same power circuit, don't write software that shuts down and expects the cluster to take over if it gets confused (because what if all the nodes get confused by the same thing?), and don't plug all your backup servers into the same Internet connection. For that matter, don't store them all in the same nuclear bunker in the Swiss Alps. If exactly the wrong thing happens, you'll be in trouble.
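One way to sanity-check a set of backup plans is to write down what each one depends on and look for anything they all share. The sketch below does that for a rough model of the trip; the plan names and dependency lists are illustrative assumptions, not an exact record of what happened.

```python
# Hypothetical model: each backup plan lists the things it depends on.
plans = {
    "drive Thursday": {"good weather", "rental car"},
    "morning train": {"good weather", "taxi", "bus", "train"},
    "later train": {"good weather", "bus", "train"},
    "rent a car in Toronto": {"good weather", "rental car"},
}


def shared_root_causes(plans):
    """Return the dependencies that every single plan has in common."""
    dependency_sets = list(plans.values())
    if not dependency_sets:
        return set()
    return set.intersection(*dependency_sets)


def surviving_plans(plans, failed):
    """Return the names of plans that do not depend on the failed thing."""
    return sorted(name for name, deps in plans.items() if failed not in deps)


print(shared_root_causes(plans))               # {'good weather'}
print(surviving_plans(plans, "bus"))           # plans that survive a bus failure
print(surviving_plans(plans, "good weather"))  # []
```

If `shared_root_causes` returns anything at all, the redundancy is weaker than it looks: one failure of that dependency takes out every plan at once.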

2003-04-13 »

Guadec

Woo hoo, my UniConf and Sharvil's ExchangeIT talks got accepted for Guadec. In half-hour timeslots instead of hour ones, which is unexpected but not really a problem. (Were you supposed to request something, or something?)

Now all I need is a passport, a plane ticket, and a place to stay. Eek!

Amazing Discovery

Today I learned that according to Statistics Canada, there are more married women in Canada than married men. (Statscan is truly awesome, so I suppose there's some explanation for this.)

Document Correlation

This weekend I wrote a page correlator using ideas I blatantly stole (badly) from Paul Graham's essay on spam detection using Bayesian filtering. His math makes way more sense than my cheesier version, but mine is just a hack so that's okay for now.

Quick summary of the theory: the interesting aspects of a document are characterized by the locally most common words among the set of globally least common words. That is, if I say the word splahooie a lot, but nobody else does, that makes "splahooie" an interesting characteristic of my document. But if everyone else says splahooie, or I don't say splahooie very much, it's not a keyword.
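A minimal sketch of that idea, assuming a simple ratio of local word rate to global word rate as the score. This is not the original perl+mysql code and not Paul Graham's math; the function names, the `min_count` cutoff, and the +1 smoothing are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def keywords(doc, corpus, top_n=3, min_count=2):
    """Rank words in `doc` that are common locally but rare globally.

    Words used fewer than `min_count` times in `doc` are skipped, since a
    word I barely say is not characteristic of my document.
    """
    local = Counter(tokenize(doc))
    global_counts = Counter()
    for text in corpus:
        global_counts.update(tokenize(text))
    local_total = sum(local.values())
    global_total = sum(global_counts.values())

    scores = {}
    for word, count in local.items():
        if count < min_count:
            continue
        local_rate = count / local_total
        # +1 smoothing so a word missing from the corpus cannot divide by zero.
        global_rate = (global_counts[word] + 1) / (global_total + 1)
        scores[word] = local_rate / global_rate

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


docs = [
    "splahooie splahooie the cat sat on the mat splahooie",
    "the dog sat on the log",
    "the cat and the dog sat on the mat",
]
print(keywords(docs[0], docs, top_n=1)[0][0])  # splahooie
```

In this toy corpus "the" appears more often than "splahooie" overall, but everyone says it, so its ratio drops below 1 and "splahooie" wins.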

So anyway, that works pretty well with some refinement. Using this technique and a cheesy "keyword correlation" algorithm in perl+mysql, with our internal company wiki (900+ pages) as a data source, I made it so you can get a list of "interesting pages" related to your current page.
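The post doesn't say how the "keyword correlation" step works, so the following is only one plausible reading: score each pair of pages by the weighted overlap of their keywords. The page names, keyword weights, and the `related_pages` function are invented for illustration.

```python
def related_pages(page, page_keywords, top_n=5):
    """Rank other pages by how strongly their keywords overlap with `page`.

    `page_keywords` maps page name -> {keyword: weight}. The score for a
    pair of pages is the sum, over shared keywords, of the two weights
    multiplied together.
    """
    mine = page_keywords[page]
    ranked = []
    for other, theirs in page_keywords.items():
        if other == page:
            continue
        shared = mine.keys() & theirs.keys()
        score = sum(mine[word] * theirs[word] for word in shared)
        if score > 0:
            ranked.append((other, score))
    # Highest score first; ties broken by page name for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up wiki pages with made-up keyword weights.
page_keywords = {
    "PersonPage": {"widgetlib": 2.0, "release": 0.5},
    "WidgetLib": {"widgetlib": 3.0, "plugin": 1.0},
    "MainProduct": {"release": 0.5, "installer": 2.0},
    "Lunch": {"pizza": 4.0},
}
print(related_pages("PersonPage", page_keywords))
# [('WidgetLib', 6.0), ('MainProduct', 0.25)]
```

Because the weights come from globally uncommon words, a page that shares one rare keyword can outrank a page that shares several ordinary ones.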

The results were... interesting. The algorithm, though simple, is surprisingly good. What it does, though, is a bit weird, because of the way we define "interesting." We only care about globally uncommon stuff. The result is a system that tells you not exactly, "What's related to this page?" but rather, "What's unusual about this page?" If I ask about pphaneuf, it mentions XPLC but not Net Integrator, even though I (as an evil manager) make sure he works more on the latter than the former. But other people do too, so XPLC is more interesting as far as the algorithm is concerned. It's kind of like the anti-google.

Hmm, an abnormality detector. I bet I could sell this to school boards in Kansas.

2003-04-21 »

Mozilla Still Sucks

dyork: I was distressed to read the entire "101 things that Mozilla can do that IE cannot" article and not find a single thing that an average person would find useful. Looks like browsers are officially "mature" now, kinda like word processors. (Translation: only really really smart people can think of something useful to add.)

Election Algorithms

Bram: The election analysis stuff is really interesting. I finally understand what's wrong with the Canadian/U.S. election systems (which are the only ones I care enough about to have watched, sorry). Now to convince the Powers That Be that they should switch to the Condorcet system. Good luck.

There is also something here about document correlation, but I'm not yet sure exactly what.

Opening the NIT

After more fiddling, we're at last almost ready to replace the old open.nit.ca with the new OpenNit quasi-wiki. And the anonymous cvs access is all set now using cvsd, pending one DNS registration.

Branch Constraint Theory

If I can just tear myself away from random browsing and email for a few hours, I'll be able to sanitize my (as previously mentioned) Branch Constraint Theory paper. Maybe tonight. Should I post it as an advogato article? Hmm.

(I must be on vacation, because I actually feel like I'm catching up for a change. I'm sure I'll get over it soon.)

2003-04-22 »

More Mozilla

dan: I really did try to find something that the average person would find more useful in Mozilla than IE, but I stand by my opinion. The average person doesn't benefit. If my grandma could figure out what a popup was or how to configure blocking for it, that would be great; if IE didn't do perfectly good password management already (and even better with RoboForms) that would be wonderful.

I was honestly hoping (expecting?) that I'd find something in the list of 101 items, since I'm fully aware that Mozilla (or its variants) are the best non-IE browsers available. I use it myself, since I don't have a Windows desktop. But unfortunately Microsoft has us beat. If one of the things in the list had been "loads faster" or "renders faster" or "renders more pages that people actually visit" or "integrates better with your desktop" or "fits in with your existing desktop theme" or "isn't ugly", then you would have had me. Sadly, IE does all of those things better than Mozilla, and those are the things that the average person cares about.

I was even personally pleased to see that Mozilla is now claiming pipelined HTTP, which is one of my big concerns - sadly, average people don't care about that, either.

I'm CEO at Tailscale, where we make network problems disappear.
Why would you follow me on twitter? Use RSS.

apenwarr on gmail.com