“Ensuring Software Quality”…maybe

I ranted a bit on Twitter last week about this book excerpt from Capers Jones. I’ve always had respect for Jones’s work (and still do), but some of the statements in this writing grated on me a bit. This could be (and likely is) because of how I came across the article (more on that below), but I thought I’d try to share my thoughts in more than 140 characters.

Jones starts out this chapter by saying that testing isn’t enough, and that code inspections, static analysis, and other defect prevention techniques are necessary – all points I agree with completely. His comments on test organizations ("there is no standard way of testing software applications in 2009") are certainly true – although I personally think it’s fine, or even good, that testing isn’t standard, and I would hope that testing organizations can adapt to the software under test, customer segment, and market opportunity as appropriate.

Jones writes, "It is an unfortunate fact that most forms of testing are not very efficient, and find only about 25% to 40% of the bugs that are actually present". If all bugs were equal, I would put more weight on this statistic – but bugs come in a variety of flavors (far more than the severity levels we typically add in our bug tracking systems). I have no reason to doubt the numbers, and they seem consistent with my own observations – which is why I am a big believer in the portfolio theory of test design (the more ideas you have and the better you can use them where they’re needed, the better testing you will do). I believe that with a large variety of test design ideas, this statistic can be improved immensely.

As I re-read the excerpt sentence by sentence, there are a few points that I can call out as completely wrong – but there are several themes that still don’t sit well with me. They include:

  • The notion that defect removal == quality. Although Jones calls out several non-functional testing roles, he seems (to me) to equate software quality solely with defect removal. Quality software is much more than being free of defects – in fact, I am sure I could write a defect free program that nobody would find any value in. Without that value, is it really quality software?
  • Jones talks about TDD as a testing activity, where I see it as more of a software design activity. But more importantly, TDD primarily finds functional bugs at a very granular level. His claims that defect removal from TDD can top 85% may be true, but only for a specific class of bugs. If the design is wrong in the first place, or if the contracts aren’t understood well enough, a "defect free" TDD unit can still have plenty of bugs.
  • Jones claims that, "Testing can also be outsourced, although as of 2009 this activity is not common." I don’t have data to prove this wrong, but anecdotally, I saw a LOT of test outsourcing going on in 2009.


I found this article while searching for the source of this quote (attributed to Capers Jones here).

“As a result (of static analysis), when testing starts there are so few bugs present that testing schedules drop by perhaps 50%.”

I’m a huge fan (and user) of static analysis, but this quote – which I hope is out of context – bugs the crap out of me. We do (and have done) a lot of static analysis on my teams at Microsoft, and we find and fix a huge number of bugs with our SA tools – but in no way does it drop our testing schedule by 50% – or even a fraction of that. I worry that there’s a pointy haired boss somewhere that will read that quote and think he’s going to get a zillion dollar bonus if he can get static analysis tools run on his team’s code base. Static analysis is just one tool in a big testing toolbox, and something every software team should probably use. SA will save you time by finding some types of bugs early, but don’t expect SA to suck in source code on one end and spit out a "quality program" on the other end.
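For a concrete flavor of the "some types of bugs early" claim – this example is mine, not from the post – here is a classic Python defect that static analysis tools (e.g. pylint’s dangerous-default-value warning) flag instantly, but that casual functional testing can easily miss:

```python
# A mutable default argument is evaluated once, at function definition
# time, so every call that omits `tags` shares the SAME list object.
def add_tag(tag, tags=[]):      # BUG: linters flag this default immediately
    tags.append(tag)
    return tags

first = add_tag("alpha")        # returns ["alpha"] -- looks fine in isolation
second = add_tag("beta")        # surprise: ["alpha", "beta"] -- state leaked
print(first is second)          # prints True: both calls mutated one shared list
```

SA catches this before a single test runs – which is real, valuable time saved – but it says nothing about whether the tagging feature does what users need, which is why it shrinks the testing toolbox’s workload rather than replacing it.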

There are plenty of things I agree with in the Jones book excerpt. The comments on productivity and metrics are on the money, and I think the article is worth reading for anyone involved in testing and quality. It’s likely that my perception of the paper was skewed by the quote I found on the CAST software site, and I hope that anyone reading the article for the first time reads it with an open mind and forms their own opinions.

Comments

  1. I have read the excerpt twice, and there are so many things I disagree with and question.
    I will only quote the fascinating start:
    “There are more than 15 different types of testing.”

    I also find the piece difficult to read, there might be more numbers than words…

    1. Yes – it was difficult for me to read too. I also noticed the 15 types of testing line and had to pause.

      The big problem with readability was that every paragraph seemed to have a sentence that made me pause and think. It wasn’t that it was completely wrong – just wrong enough to make you stop and think rather than read.

  2. Alan,

    This one – ‘Jones claims that, “Testing can also be outsourced, although as of 2009 this activity is not common.” I don’t have data to prove this wrong, but anecdotally, I saw a LOT of test outsourcing going on in 2009.’ – makes me want to ask the question:

    “Mr. Jones, what planet are you on?” I’m like you – I’ve seen A LOT of outsourcing of testing going on.

    And that is a debate for later on. Otherwise… great post!

    1. The whole piece gave me that feeling – the outsourcing comment in particular was from left field, but many of the others seemed …just not quite right.

  3. Hi Alan, good article.
    Just a thought, imagine a world where it wasn’t possible to perform static analysis on the code. Rather, the software was coded and went straight into ‘testing’. I wouldn’t be surprised if the testing schedules grew by 100%. More bugs to find, to fix and to re-test.

    1. I could see 20% growth in that scenario, but not 100%. Maybe 20 years ago, when compiler warnings weren’t as comprehensive, the growth could be more, but I can’t see 100%.

      However – I would say that on a project built by junior developers with poor design and no code reviews – tested by new testers with no idea what they were doing – skipping static analysis could increase the schedule by 100%. So there you go. :}

  4. This read like a lot of conversations I’ve seen at work involving software development management people who have no real idea what software testing really involves.
