I have an after-work event tonight, and rather than leave my car in the garage overnight, I ran to work. Since I’ve moved to downtown Bellevue, I’ve done this a few times – and given that I’m running another half-marathon in 10 days, it was a great opportunity for a long training run before I begin to taper my mileage down a bit leading up to the race.
I’ve been a long-time user of a running companion app called Runtastic. It does the usual stuff of tracking mileage, route, pace, and giving voice updates at user specified intervals. I find it especially valuable when training or racing, because I usually have very specific pace goals. Having a voice tell me how long the last mile/half-mile/km took me let’s me know if I’m running too fast (and burning out) or too slow (and putting my goals in jeopardy). Granted, I have a pretty good internal clock, and usually run my paces pretty well withouth “help”, but the feedback is really useful to me.
Today, I took off from home, started spacing out, and before long, I was a mile or two from home…when I noticed that I had not received any voice feedback yet. I knew exactly what happened (because it’s happened before). When you start the app, it gives you a 15 second timer before the tracking actually starts, along with the ability to add time to the delay up to two minutes or so. I LOVE this feature, because I can give myself time to put my phone into my running belt, walk twenty yards and curse myself for having such a painful hobby before actually exerting any physical energy.
Unfortunately, at least 25% of the time I use the app the countdown fails. But I never know it failed until too late.
What happens is that the countdown stops at 1 second. I hear the voice prompt count down 5-4-3-2-1, and I think it’s tracking, but it’s stuck on one second.
Today was extra painful not only because I wanted to see how I was doing on “race pace”, but after I discoverd it failed, I restarted the app, set the countdown again…and it “hung” on one second again.
Now, it’s a bug – that’s for sure. Some testers I know would automatically assume that every user in the world was hitting this bug and that public shaming of the company would be the next course of action. I, however, realize that context plays a role in many (every?) part of software engineering, and that given the value I get from the product (and my very amateur level of running) that while this is a painfully annoying bug, it’s not the end of the world.
Given that I wasn’t concerned at all with pace for the remaining 5+ miles of my commute, my mind began to wander. What follows is a completely made up story of how a Runtastic engineer may discover this issue without me, or anyone else, reporting it.
Pointy Haired Runtastic Manager: Hey super-smart employee (sse). Our default time for delay before a run is 15 seconds. Can you look at the data and see if our estimate for a default delay length is in the ballpark of what people actually use? Someone on the train told me that they though 30 seconds would be a lot better. I don’t think they’re right, but I want to make a decision based on data.
Super Smart Employee (sse): Sure boss. That data is pretty easy to pull. I’ll take a look!
What SSE is about to do at this point is gather Business Intelligence. They want to use data to make a business decision.
SSE looks at hundreds of thousands of activities from the past six months and sees that nearly 60% of the people just use the default 15 seconds. She quickly generates a scatter graph showing that shows the outliers and prepares it for her boss. Before sending, she realized that she wants to exclude the instances where people cancel the activity completely before the countdown completes (phone calls, cold feet (literally, and metaphorically), and a variety of other reasons could cause this). She filters the data and starts to send the report…but – while she’s there, she notices something…interesting. First, the number of “cancelled” activities seem high to her (over 15%). She flips the filter to look only at cancelled activities and things get weirder. Of the 15% of cancelled activities, 90% are cancelled at exactly 1 second.
That’s too weird to be true.
SSE looks at every activity where the timer was “killed” at one second. Often, those users started another activity within 10-15 minutes.
Or, maybe they all had the same model of phone.
Or maybe they were all running Spotify at the same time.
Or something. Remember. This story is completely made up.
The point is that SSE quickly went from gathering BI to using discovery and insight to find a pretty cool bug. Using data!
I told a story at a conference recently about a team I worked on that used an offshore vendor team to run through a large number of applications for app compatability testing. We asked them to take notes and to send a report, but not to bother filing bug reports.
Yeah – we had sufficient telemetry and monitoring that we knew about all the bugs and glitches (and had collected call stacks and other helpful information) already. Many, in fact, that the test team didn’t (or couldn’t) notice. Entering the bugs would have been a waste of time. In the rare cases where something weird happened that we didn’t track, we immediately added the appropriate instrumentation to track that class of failure in the future.
I expect that for most of you, my world isn’t your world. But in my world, data driven engineering is critical.
Since I don’t know how made up my made up story really is, I’m going to report it to Runtastic anyway. I can’t predict the future (or anything else), but I hope the reply to my complaint is, “Yeah – we already knew about that. From the data”.