Here’s a story I hear often. The names have been changed to protect the guilty.
Jake had barely taken a sip of his steaming coffee when he saw that thirty-two of the automated tests failed in last night’s test pass. “Crap, I’m slammed today”, thought Jake, “I don’t have time to look at thirty-blanking-two failures”. Without a second thought, Jake clicked the ‘re-run failures’ button on the web page that displayed results and turned his attention back to his coffee. After finishing his coffee and filling a second cup, Jake was happy to see that twenty-five of the failing tests now passed. “Must be a flaky environment”, thought Jake as he took a big swig of coffee and got to work investigating the seven remaining failures.
A few weeks later, Jake was sitting in a meeting to go over a few of the top live site failures reported by customers and the operations folks. Ellen, the development manager, was walking through the issues and fixes, and throwing in a little lightweight root-cause analysis where appropriate. “These three”, she began, “caused a pretty bad customer experience. When we first looked at the errors, we figured it had to be an issue with the deployment environment, but we discovered that we could reproduce all of these in our internal test and development environments as well.” Jake’s stomach sank a bit as Ellen continued. “It turns out that although the functionality is basically broken, it will work some of the time. I guess our tests were just lucky.”
In some versions of this story, Jake steps up to the plate and takes responsibility. In other versions, he merely learns a lesson. In a few versions of the story, Jake calls the whole thing a fluke and goes through the same thing later in his career.
The point of this story is simple. Every test failure means something. The failure may mean a product failure. It may mean you have flaky tests. When you start to assume that flaky tests or environments explain your failures, you’re heading into the land of broken windows and product failures you could have found earlier (actually, you probably did find them – you just ignored them).
Great testers rely on trustworthy tests. The goal is that every failed test represents a product failure, and any tests that fall short of that goal should be investigated and fixed – or at the very least updated with diagnostic information that lets you make a quick, confident decision about the failure. Relying on test automation for any part of your testing is pointless if you don’t care about the results and don’t investigate every failed test.
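To make that concrete, here is a minimal sketch of the idea, assuming a Python test suite run under pytest. The hook below attaches a few environment details to any failing test so whoever triages it has something to go on; the specific details captured (host name, a hypothetical TARGET_ENDPOINT variable) are illustrative placeholders, not a recommendation of exactly what your tests should log.

```python
# conftest.py -- a sketch of enriching failed-test reports with diagnostics.
# Assumes pytest; the captured details below are illustrative placeholders.
import datetime
import os
import platform

import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    # Let pytest build the report first, then enrich it on failure.
    outcome = yield
    report = outcome.get_result()

    if report.when == "call" and report.failed:
        diagnostics = "\n".join(
            [
                f"time (UTC): {datetime.datetime.now(datetime.timezone.utc).isoformat()}",
                f"host: {platform.node()}",
                # TARGET_ENDPOINT is a hypothetical environment variable
                # standing in for whatever your tests actually depend on.
                f"target endpoint: {os.environ.get('TARGET_ENDPOINT', '<unset>')}",
            ]
        )
        # The section is printed alongside the failure output, so the person
        # triaging can quickly judge product bug vs. environment problem.
        report.sections.append(("environment diagnostics", diagnostics))
```

The point is not this particular hook; it is that every failure report carries enough context to support the quick, confident call described above, instead of a shrug and a re-run.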
Yes, I know. Your situation is unique, and you have a business reason for ignoring failed tests. My first response when I hear this claim is that you’re probably wrong. Probably, but not definitely – either way, don’t let flaky tests get through your reality filter. Otherwise, you’ll be sitting in Jake’s shoes before you know it.
Our test execution tool has an option to “run until pass”…
My team doesn’t use this option, and neither does any team where I get to decide.
I have to admit, though, that since we have a flaky test environment that can’t be fixed (for example, homemade RF test equipment whose developers are long gone), and since investigating intermittent phenomena can often take days (or forever, due to a lack of relevant logs), we do sin by rerunning failed tests without fully understanding the root cause.
This is simply great. I have been Jake myself at times, and I knew I was doing something wrong all along. What you have written here is not very complex to comprehend, so I wonder why I did not understand this when I was actually going through it. I guess this is why we need somebody to tell us things. Thank you.