Thursday, January 22, 2009

The Perils of Operating a Service People Use

I was waiting at a bus stop a few weeks ago, describing OneBusAway to some fellow UW students waiting at the stop with me. "That's awesome!" they exclaimed, and they immediately tried calling the phone number to see the service in action.

Of course, the phone service wasn't working.

As luck would have it, the phone service was down for some reason. Restarting Asterisk fixed the problem, but the damage was already done. It was pretty embarrassing because (a) things were broken and (b) I had no idea.

In fact, there have been a number of times when a polite email or even a tweet was my first indication that all was not well in OneBusAway land. While I definitely encourage anyone to let me know if they are having issues with OBA, I'm happy to announce I'm taking more proactive measures from now on. I've now got a basic Nagios installation up that monitors the various services that make up OneBusAway and automatically sends out emails and text messages when things go wrong.
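For the curious, here is a rough sketch of what a single check in a setup like this might look like. It is not my actual configuration: `check_tcp` is a made-up helper and the host and port are whatever you pass in. The only part that comes from Nagios itself is the plugin convention that a check reports its result through its exit status (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) plus a single line of output.

```python
import socket

# Nagios plugin convention: the exit status tells Nagios the service state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Hypothetical helper: return (status, message) for a TCP connect check."""
    try:
        # If the connection opens within the timeout, something is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        # Refused connections, timeouts, and DNS failures all land here.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

A real plugin script would print the message and call `sys.exit(status)`, and Nagios would take care of the emails and text messages. Note that a check like this only proves something is listening on the port, not that the service behind it is giving sensible answers.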

And while things are green across the board for now:

Green is Good

Things will inevitably go bad. Hopefully I'll be one of the first to know about it now ; )

