
Conflicting Results

By Alan Page, November 15, 2009 (updated November 16, 2009)

I’m a huge soccer fan, and I’m happily following the MLS Cup even though the local team was eliminated last week. Last night’s match between Real Salt Lake (RSL) and the Chicago Fire went to penalty kicks before one team finally prevailed. After the game ended, I went to mlsnet.com to watch the highlights and check out some of the stats. When I got there, the front page had this headline and teaser:

Quick – which team won? Did the Fire edge Real Salt Lake, or did RSL outlast the Fire?

If you read a bit more, you’ll see that “RSL will face the Galaxy in the 2009 MLS Cup”, so if you go with majority rules you’ll be correct, since RSL did indeed edge the Fire last night. Headline errors aren’t all that uncommon (e.g. Dewey Defeats Truman), so I don’t fault the news site at all. Unfortunately, a very close relative of error, the false positive, has been bugging the crap out of me lately, and this headline reminded me that it’s past time to share my thoughts.

Let’s say you have 10,000 automated tests (or checks, for those of you who speak Boltonese). We had a million or so on a medium-sized project I was involved with once, so 10k seems like a fair enough sample size. For the purpose of this example, let’s say that 98% of the tests are currently passing, and 2% (or 200 tests) are failing. This, of course, doesn’t mean you have 200 product bugs. Chances are that many of these failures are caused by the same product bug (and hopefully you have a way of discovering this automatically, because investigating even 200 failures manually is about as exciting as picking lint off of astroturf). Buried in those 200 failures are false positives – tests that fail due to bugs in the test rather than bugs in the product. I’ll be nice and say that 5% of the failures are false positives (you’re welcome to do your own math on this one). Now we’re down to 10 failures that aren’t really failures. You may be thinking that’s not too big of a deal – it’s only 0.1% of the total tests, and looking at 10 tests a bit closer to see what’s going on is definitely worth the overall sacrifice in test code quality. Testers in this situation either just ignore these test results or quickly patch them without too much further thought.
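One common way to spot failures that share a root cause, sketched here under the assumption that each failure comes with a plain-text message, is to normalize every message into a "signature" (stripping numbers, addresses, and quoted values that vary between runs) and group tests by it. The function names and regexes are hypothetical illustrations, not the behavior of any particular test tool.

```python
import re
from collections import defaultdict


def failure_signature(message: str) -> str:
    """Collapse details that vary between runs so that failures with the
    same likely cause share one key (a deliberately simple heuristic)."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)  # memory addresses
    sig = re.sub(r"'[^']*'", "<str>", sig)               # single-quoted values
    sig = re.sub(r'"[^"]*"', "<str>", sig)               # double-quoted values
    sig = re.sub(r"\d+", "<n>", sig)                     # counts, timeouts, ids
    return sig.strip()


def bucket_failures(failures: dict[str, str]) -> dict[str, list[str]]:
    """Group failing test names by the signature of their failure message."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for test_name, message in failures.items():
        buckets[failure_signature(message)].append(test_name)
    return dict(buckets)


# Hypothetical failure messages for illustration.
failures = {
    "test_login_ok": "Timeout after 30s waiting for 'LoginButton'",
    "test_login_bad_pw": "Timeout after 45s waiting for 'SubmitButton'",
    "test_cart_total": "NullReferenceException at 0x7ffe12a0",
}
for signature, tests in bucket_failures(failures).items():
    print(f"{len(tests)} x {signature}: {tests}")
# 2 x Timeout after <n>s waiting for <str>: ['test_login_ok', 'test_login_bad_pw']
# 1 x NullReferenceException at <addr>: ['test_cart_total']
```

A heuristic this crude will sometimes merge unrelated failures or split related ones (an apostrophe in a message, for instance, confuses the quote rule), so treat the buckets as a triage aid rather than a verdict.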

This worries me to no end. If 5% of your failing tests aren’t really failing, I think it’s fair to say that 5% of your passing tests aren’t really passing. I doubt that you (or the rest of the testers on your team) are capable of making mistakes only in the failing tests – you have crappy test code everywhere. A minute ago, you may have been OK with only 10 false positives out of 10k tests, but by the same math, 490 of your 9,800 “passing” tests are passing even though they should be failing. Now feel free to add zeroes if you have more automated tests. I also challenge you to examine all 9,800 passing tests to see which 490 are the “broken” ones.
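The arithmetic of this thought experiment can be written out as a small sketch. It bakes in the post's key assumption, that test bugs are spread evenly so the same percentage of passing tests are passing for the wrong reason; the function and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    failing: int
    passing: int
    false_positives: int         # failing because of bugs in the test
    hidden_false_negatives: int  # passing even though they should fail


def estimate(total: int, pass_pct: int, test_bug_pct: int) -> Estimate:
    # Integer percentages keep the math exact for the post's round numbers;
    # floor division simply truncates any fractional tests.
    failing = total * (100 - pass_pct) // 100
    passing = total - failing
    # Assumption: the test-bug rate is the same for passing and failing tests.
    return Estimate(
        failing=failing,
        passing=passing,
        false_positives=failing * test_bug_pct // 100,
        hidden_false_negatives=passing * test_bug_pct // 100,
    )


e = estimate(total=10_000, pass_pct=98, test_bug_pct=5)
print(e)
# Estimate(failing=200, passing=9800, false_positives=10, hidden_false_negatives=490)
print(f"False positives as a share of all tests: {100 * e.false_positives / 10_000:.1f}%")
# False positives as a share of all tests: 0.1%
```

The visible problem (10 tests) is a small fraction of the hidden one (490 tests), and the hidden one grows linearly with the size of the suite.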

Yet we (testers) continue to write fragile automation. I’ve heard quotes like, “It’s not product code, why should it be good”, or “We don’t have time to write good tests”, or “We don’t ship tests, we can’t make it as high quality as shipping code”. So, we deal with false positives, ignore the inverse problem, and bury our heads in the sand rather than write quality tests in the first place. In my opinion, it’s beyond idiotic – we’re wasting time, we’re wasting money, and we’re breeding the wrong habits in every tester who sets out to write automation.

But I remain curious. Are my observations consistent with what you see? Please convince me that I shouldn’t be as worried (and angry) as I am about this.

8 Comments

Scott Yost says:

November 15, 2009 at 5:08 pm

I’m pretty confident we have the false negative problem. I try to put in self-tests for my automation frameworks to prove that my methods are at least capable of failure. I catch a lot of test bugs this way, unfortunately.

We do typically try to hold test code to a high quality bar though, so we’re not quite as guilty of that problem.
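Scott's idea of proving that a check is at least capable of failure can be sketched as a tiny self-test: run the check against a known-good case and a known-bad case, and require it to pass the first and fail the second. The helpers below, including the deliberately buggy one, are hypothetical examples rather than part of any real framework.

```python
def verify_contains_all(actual: list[str], expected: list[str]) -> bool:
    """Hypothetical framework check: every expected item appears in actual."""
    return all(item in actual for item in expected)


def buggy_verify_contains_all(actual: list[str], expected: list[str]) -> bool:
    # Test bug: compares expected against itself, so it can never fail.
    return all(item in expected for item in expected)


def self_test(check) -> bool:
    """True only if `check` passes a known-good case AND fails a known-bad one."""
    passes_good = check(["a", "b", "c"], ["a", "b"])
    fails_bad = not check(["a"], ["a", "missing"])
    return passes_good and fails_bad


print(self_test(verify_contains_all))        # True
print(self_test(buggy_verify_contains_all))  # False: this check can't fail
```

The buggy helper would happily report "pass" in every real test that used it, which is exactly the kind of passing-but-broken test the post worries about.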

phil kirkham says:

November 16, 2009 at 2:08 am

Never mind automated tests – I’m currently having problems reviewing manual tests that have passed that should have failed 🙁

(and I wish it were only 5% of them)

[Alan] The numbers in my experience are worse too. But I thought the story was scary enough already.

Albert Gareev says:

November 16, 2009 at 6:52 am

> Please convince me that
> I shouldn’t be as worried
>(and angry) as I am about this

Hi Alan,

It’s all about the approach.
Since most testers have only just started studying programming, they are reproducing, one by one, all the errors and wrong assumptions that mature developers have already left behind.

A brilliant tester won’t necessarily create brilliant code… most likely it will be crappy code, especially if record/playback or another code-generation tool is used.

So don’t blame automated tests – they just reproduce the garbage that was put in.

[Alan] Believe me, I’m not blaming automation – I’m blaming the testers who find it perfectly acceptable to create poor automation.

Marlena says:

November 16, 2009 at 1:16 pm

This is exactly the problem I found when I started automating my tests and it left me totally crestfallen because I worked so hard on them.

My passing tests that should have been failing showed me that special attention must be paid if one is developing for test.

Separating testing and coding is much harder than it would appear on the surface, and it’s not as simple as “developers” writing “better” code than “testers.” There is complexity in the problem set of evaluating results.

Liz Marley says:

November 16, 2009 at 2:42 pm

Is it fair to assume that the type of coding errors which cause a test to incorrectly pass and errors which cause an incorrect fail are of equal probability?

For safety, many electrical appliances are designed so that if something goes wrong, they will refuse to work rather than refusing to stop. Is there a way we can engineer our tests so that if there’s a bug, it’s more likely to be a bogus fail than a bogus pass?

Sorry, I only have questions, no answers.

[Alan] – but they’re good questions, and enough questions may lead to an answer. Thanks for the comment.
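One partial answer to Liz's question, offered as a sketch rather than a proven practice, is to make the test harness "fail closed": a test is reported as passing only if it actually ran at least one check and every check succeeded, so an exception, an early return, or a forgotten verification all show up as failures. The `FailSafeTest` class and `run` function are hypothetical names for illustration.

```python
from typing import Callable


class FailSafeTest:
    """Hypothetical harness state: counts checks run and checks failed."""

    def __init__(self) -> None:
        self.checks_run = 0
        self.checks_failed = 0

    def check(self, condition: bool) -> None:
        self.checks_run += 1
        if not condition:
            self.checks_failed += 1


def run(test_body: Callable[[FailSafeTest], None]) -> str:
    """Default to failure; only an explicit, successful check earns a PASS."""
    t = FailSafeTest()
    try:
        test_body(t)
    except Exception:
        return "FAIL (exception)"
    if t.checks_run == 0:
        return "FAIL (no checks ran)"
    return "PASS" if t.checks_failed == 0 else "FAIL"


def forgot_to_verify(t: FailSafeTest) -> None:
    result = 2 + 2  # computes something but never checks it


def real_check(t: FailSafeTest) -> None:
    t.check(2 + 2 == 4)


print(run(forgot_to_verify))  # FAIL (no checks ran)
print(run(real_check))        # PASS
```

This closes only one door: a check with the wrong condition can still pass, which is why self-tests of the checks themselves, like the ones Scott describes, remain useful.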

Liz Marley says:

November 16, 2009 at 2:45 pm

PS Your blog may have a bug: My previous comment was posted at 1:42, but the time stamp displayed is 2:42.

[Alan] – oops – thanks for pointing that out. I set up the wrong UTC offset when I set up the blog. Should be fixed now.

Lanette says:

November 17, 2009 at 6:03 pm

We humans are not yet good at programming the evaluation portion of tests. For that reason the accuracy is woefully bad. A tweep told me “Automating checks is nearly 100% automated — meaning there is no need for human to evaluate the result – 0 or 1 …” If 0 or 1 were always an accurate pass or fail reflecting only the system under test, then the automated check would not need human evaluation. That has never been the case yet in ANY of the automation I’ve worked so hard to make maintainable. I think wishing for that is the wrong goal.

Any automation that makes it easier for a human to evaluate the quality of important aspects of the software faster, cheaper, and better from start to finish than they could without it is automation that is a success! I think our goals for automation are wrong, and the areas with great potential for innovation, such as model-based testing, randomly generated test variations, and even tests that detect change and are partly manual but help a tester be more powerful, are discounted because of this one stupid idea that human evaluation can somehow be skipped.

The automation can help testers do more. It is good at that. It only sucks at replacing testers because so far we aren’t good enough at programming machines to evaluate and test for us.

Sorry, Alan, this is a big rant in your blog, but I get fired up about this stuff because the goals and motivations are harmful and they could be so much better!

Adam R says:

May 27, 2010 at 8:10 am

Preaching to the choir! My tests aren’t fail-proof, but I try to practice safe coding. That means less headache for maintenance, more modular functions for reuse, and more readable code for someone who might want to update it.

If I expect the developers to provide me with good code to test, I feel that I owe them good code to test it with. Now, do we take it as far as writing unit tests for our test scripts? Can I get a QA crew to QA QA?
