Tuesday, February 28, 2012

On OneBusAway Inaccuracies

(This post is by S. Morris Rose. I'm the engineer who has been hired on a temporary basis to keep the services that power OneBusAway chugging away. Brian Ferris, the engineer who created it, has moved on to work on transit projects at Google Zurich, though he still pitches in from time to time. The position is funded by contracts with King County Metro, Pierce Transit, and Sound Transit. I've been a technical staff member in Computer Science & Engineering at the University of Washington, where OneBusAway was created, for more than a decade.)

Many users have noticed that OneBusAway sometimes isn't very accurate: it might report a bus is early when it's on time or late when it's early, display the status "scheduled departure" (which means that no real-time arrival data is available for that trip), or simply leave out a scheduled trip. In the case of Community Transit (which is not a project funder), the schedule data has simply gone missing. In this post, I'll explain a few of the factors that lead to these errors.

OneBusAway depends upon two types of data to tell you where your bus is: schedule data, which is all about where and when the agency plans for each bus to be, and real-time arrival data, which is all about where the bus is right now. Schedule data is updated only a few times a year. Real-time arrival data is updated constantly. Pull those two data types together and apply algorithms, and you've got a guess about when your bus will arrive. There can be, and are, problems with both data types and with the algorithms, and these lead to false predictions.
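To make the idea concrete, here is a minimal Python sketch of the simplest possible way to combine the two data types: take the scheduled arrival at a stop and shift it by how late or early the bus is currently running, falling back to the timetable when no real-time data exists. The names (`ScheduledStop`, `predict_arrival`) are hypothetical, and the "bus keeps its current lateness" rule is an assumption for illustration only; OneBusAway relies on predictions computed by others, and real predictors are considerably more sophisticated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class ScheduledStop:
    """Hypothetical schedule record: when the agency plans for a bus to reach a stop."""
    stop_id: str
    scheduled_arrival: datetime


def predict_arrival(stop: ScheduledStop,
                    deviation: Optional[timedelta]) -> Tuple[datetime, bool]:
    """Combine schedule data with real-time data.

    `deviation` is how late (positive) or early (negative) the bus is running,
    as derived from real-time data; None means no real-time data is available.
    Returns (estimated arrival, whether the estimate uses real-time data).
    """
    if deviation is None:
        # No real-time data: all we can show is the timetable
        # (what OneBusAway labels "scheduled departure").
        return stop.scheduled_arrival, False
    # Naive assumption: the bus stays exactly as late or early as it is now.
    return stop.scheduled_arrival + deviation, True


# Usage: a bus running 5 minutes late for an 8:00 arrival, then no real-time data.
stop = ScheduledStop("hypothetical-stop-1", datetime(2012, 2, 28, 8, 0))
print(predict_arrival(stop, timedelta(minutes=5)))  # 8:05 estimate, real-time flag set
print(predict_arrival(stop, None))                  # 8:00 timetable only
```

Any error in either input flows straight through: a stale schedule shifts the base time, and a wrong deviation shifts the estimate.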

In the case of schedule data, various things can go wrong. It can be incomplete, as is the case with the current King County Metro data, it can contain errors, as is the case with all complex datasets, or it can just be missing- as is the case, for now, with Community Transit data. Also, since the data only lands a few times a year, but minor changes are made by agencies along the way- perhaps due to construction- it can be partially stale. And if a trip is canceled or rerouted, such as during a snow emergency, the schedule data can become desperately wrong.

Real-time (AVL, or automatic vehicle location) data is much more complex and fraught. Because the data changes constantly, latency (the delay between when a data point is generated and when OneBusAway receives it) is a problem. Sometimes a trip goes missing due to technical issues, in which case only "scheduled departure" is shown. Some agencies, such as Community Transit, don't have real-time data at all. Complicating matters for King County Metro is the fact that it is transitioning from an older system based on a combination of radio beacons and wheel rotation counts to one based on GPS. (That process is about 60% complete, but more than 500 buses have yet to be converted. Some areas lag behind others, including north Seattle, where there is a high concentration of OneBusAway users.) Combining the two types of real-time data has proven challenging.
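The latency problem can be illustrated with a small check: measure how old a vehicle-location report is, and treat anything older than a cutoff as if no real-time data existed. This is a minimal Python sketch; the function name and the two-minute cutoff are hypothetical and do not describe how OneBusAway or any agency actually filters data.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff, chosen only for illustration.
MAX_REPORT_AGE = timedelta(minutes=2)


def report_is_usable(generated_at: datetime, now: datetime,
                     max_age: timedelta = MAX_REPORT_AGE) -> bool:
    """Return True if an AVL report is fresh enough to base a prediction on.

    The age includes both transmission latency (generation to receipt)
    and however long the report has sat since it was received.
    """
    age = now - generated_at
    return age <= max_age


# Usage: a report generated at 8:00:00, examined at 8:01:30 and at 8:03:00.
generated = datetime(2012, 2, 28, 8, 0, 0)
print(report_is_usable(generated, datetime(2012, 2, 28, 8, 1, 30)))  # True
print(report_is_usable(generated, datetime(2012, 2, 28, 8, 3, 0)))   # False
```

A stale report isn't necessarily wrong, but the bus has kept moving since it was generated, so a prediction built on it drifts further from reality the older it gets. Falling back to "scheduled departure" is one conservative way to handle that.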

And then there are the algorithms. To predict an arrival, there is a lot to compute even after the position of a bus is known. For example, a mile of Montlake Boulevard at rush hour on Friday translates to a lot more time than that same mile two hours later. OneBusAway doesn't do its own arrival prediction; instead, we rely upon data from others, who in turn run their own or commercial software. This arrival prediction data comes from the agencies themselves for buses that use GPS, and from MyBus for buses using the older AVL system. (MyBus is a system running here at UW, from Dan Dailey and the Intelligent Transportation Systems project. A big thank-you to Dan and Joel Bradbury for continuing to keep this data up and available! OneBusAway has relied on it from the beginning, and will continue to do so while the AVL system is still in use.)
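The Montlake example can be sketched as a lookup: the time to cover a stretch of road depends on when you travel it, not just on how far it is. In this minimal Python sketch the speed table and its numbers are entirely made up, and it ignores day of week for simplicity; a real predictor (an agency's, a commercial product's, or MyBus's) would draw on historical and live data rather than a fixed table.

```python
from datetime import datetime

# Hypothetical average speeds (mph) on one road segment, keyed by hour of day.
# These values are invented for illustration, not measured.
SEGMENT_SPEED_MPH = {17: 6.0, 19: 20.0}
DEFAULT_SPEED_MPH = 15.0


def travel_minutes(miles: float, when: datetime) -> float:
    """Estimate minutes to cover `miles` starting at time `when`.

    Only the hour matters in this toy table; day of week is ignored.
    """
    speed = SEGMENT_SPEED_MPH.get(when.hour, DEFAULT_SPEED_MPH)
    return miles * 60.0 / speed


# Usage: the same mile on a Friday at 5 p.m. and two hours later.
rush = datetime(2012, 3, 2, 17, 0)     # Friday, March 2, 2012
evening = datetime(2012, 3, 2, 19, 0)
print(travel_minutes(1.0, rush))     # 10.0
print(travel_minutes(1.0, evening))  # 3.0
```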

Finally, when buses are rerouted due to snow (as happened last month), the arrival predictions currently range anywhere from wildly inaccurate to totally missing.

Add up all these issues, toss in a snowstorm in January and simultaneous major schedule changes in mid-February, and you get a service that sometimes tells you lies.


Phillip said...

When there's a consistent error in the prediction is it something that can be corrected if the right people find out? Or are there just too many moving parts?

It's the 41 in north Seattle so I'm assuming it's AVL still but I've always wondered if this is something where feedback would be helpful.

DJStroky said...

Cool Job! Great to see someone is continuing the work Brian did.

KatieK said...

I'm glad to see that this is still an important project for Metro. But wrong information is really frustrating.

More than once, I've skipped bus A because bus B (closer to my destination) was coming in just 5 minutes. But bus B never came and I waited an hour in cold rain.

Why does snow routing throw all reporting accuracy "under the bus"? I understand that you can't tell where the bus is if it's on different streets, but the data should be present for portions of the bus line which are the same for snow routes as regular routes; 3rd and Pike in downtown Seattle, for example.

Cookie Guru said...

Phillip - You can tell if a bus uses GPS because it will have a "next stop" sign inside and will announce major stops and transfer points. All buses operating route 41 are from Metro's "North Base" which is slowly acquiring GPS-equipped coaches.

Shephard said...

I can get schedule data directly from the source. The only useful thing for me to seek out from third-party software is real-time data. (In fact you can even get that from the source, too, it just has a terrible user-interface.)

The technical difficulties getting that data, as interesting and valid as they may be, still result in the possible (and completely unpredictable) compromising of the *primary* service being offered. And a failure of this particular service is somewhat unique in that it can often result in my life being *more* difficult than if I hadn't trusted it at all.

Nicholas Barnard said...

@Shephard Wow, way to complain about something that you don't pay for.

I for one prefer the real-time data, but if that's not available, the schedule data is better than nothing. It's pretty clearly marked what is what when you're using OneBusAway, so it's not as if it's a bait and switch.

Unknown said...

In response to Phillip's question--

Many years ago, when MyBus had some funding support, I used to spend substantial time looking at users' reports of prediction errors. We maintained a rolling log of a month's worth of live bus data, and one of our engineers developed a really helpful graphical application that plotted the bus location data versus the schedule data. From this we could get some idea of what was causing errors (bad hardware, radio beacon issues, errors in schedule data, software glitches, etc.), and when applicable we passed the info on to Metro, or if possible made fixes on our end. But it takes time, and that requires funding. I'm sure that with the current transitions, and the mixing of GPS and AVL data, there are bound to be glitches that will take much time and careful investigation to sort out. I don't think anyone is being paid to do this work (unless there is someone at Metro tasked with it).

I do find it kind of interesting that although money was cobbled together to 'keep OneBusAway running', there was no provision made for supporting the live prediction data stream that OBA depends upon (for Metro buses). OBA is essentially a front-end app for displaying the data that the Busview/MyBus projects developed and made freely available to other developers. As users are now finding out, the utility of such an app is somewhat limited if there is no support, maintenance, and continued development of the resources that are doing the actual bus tracking and departure prediction calculations.

MyBus, and the data stream that feeds OBA, has been kept running for many years (on now ancient computers) through unpaid, volunteer efforts.

When Metro completes the switch in bus tracking systems later this year, they will presumably use other, commercial prediction software, rather than the algorithms that were developed for MyBus. It will remain to be seen if future predictions will be more or less reliable.

Joel said...

When I wrote the previous comment, I hadn't realized that OBA seems to be already using the commercial vendor's (INIT's) predictions for the buses that have been switched to GPS. I did some comparison with MyBus, and I saw some distinct flakiness in the OBA predictions. In one case, MyBus was steadily predicting a 16-minute delay, while OBA showed the bus as having already departed on time. Then, later, OBA changed to showing the bus as delayed, rather than departed. One time the OBA display went to black and showed the scheduled time (indicating no live data), and once this bus dropped out of the OBA display entirely. In the end, OBA finally showed the bus departing, much delayed, in agreement with MyBus.

In both cases, the bus location data was coming from the same source, presumably GPS in this case. What was different was the handling of the predictions.

Admittedly, this is a single observation. It would be interesting to see a more extensive, statistical comparison of the relative accuracy of the two prediction algorithms.

Unknown said...

This evening's commute showed "Scheduled Arrival."

I understand that there are going to be some inaccuracies; however, I could get this information from a bus schedule.

Joel said...

Well, just to suggest alternatives, MyBus (and the Metro counterpart, Tracker) were both continuing to display real-time bus information during the hours when OBA was apparently down this evening.

Justin Sweet said...

OneBusAway is quickly becoming unusable due to the many ongoing errors I keep getting. I know this is a free service, but it used to work so well and it's a real shame to see it fall apart like this.

John said...

KatieK: Metro buses using AVL don't work so well during or after reroutes, because the software makes predictions based on the distance traveled since the trip began. Since reroutes change the distance traveled, OBA should work as expected before the reroute, but afterwards it may have difficulty.

I'm pretty sure all South Base buses have GPS. Many buses at East and Bellevue bases have GPS, no trolleys have it, and for the others it's hit and miss. They should all have GPS by year's end.
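John's explanation of why reroutes hurt AVL-based predictions can be illustrated with a minimal Python sketch. It assumes, following his description, that the older system locates a bus only by the distance it has traveled since the trip began, and maps that distance onto the stops of the regular route. The stop distances and the function name are hypothetical, and this is not confirmed to be exactly how MyBus works.

```python
from bisect import bisect_right
from typing import List

# Hypothetical cumulative distances (miles) from the start of the trip
# to each stop along the *regular* (non-reroute) path.
STOP_DISTANCES: List[float] = [0.0, 0.5, 1.2, 2.0, 3.1]


def last_stop_passed(distance_traveled: float,
                     stop_distances: List[float] = STOP_DISTANCES) -> int:
    """Index of the last stop the bus is assumed to have reached,
    inferred only from distance traveled since the trip began."""
    return bisect_right(stop_distances, distance_traveled) - 1


# On the regular route, 1.3 miles puts the bus just past stop 2.
print(last_stop_passed(1.3))  # 2
# On a reroute, a hypothetical 0.8-mile detour means the bus has covered
# 2.1 miles by the time it reaches the same physical point, so it
# appears to have passed stop 3 when it hasn't.
print(last_stop_passed(2.1))  # 3
```

Before the detour, the distance still matches the regular route, which is why predictions look right up to that point; every stop after it inherits the error.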