Thanks everyone who played the Numberz Challenge. I’ll get the boring part out of the way first. Based on response time and completeness (and bonus points for reverse engineering the “bad” parts of the app), lixiong is the winner of either a signed copy of HWTSAM or an Amazon gift certificate. Li – email me and we’ll figure out the details of the exchange. A definite honorable mention goes to Aleksander Lipski – he came up with the same conclusions as Li, and also took the time to establish some context (which is easy to forget in contrived examples such as this). Aleksander – send me an email and we can work out some sort of prize for you too.
Thanks to everyone else for playing along and testing the app. It was nice to see a high interest level in my silly little experiment.
For those interested in the what, why, and how of this exercise, read on.
The Numberz App v1
Brent Jensen noted that the first version of the application was quite buggy. It had several issues calculating the total. I released this buggy version on purpose. I have a bit of a distaste for testing applications that are so buggy that finding bugs is like shooting fish in a barrel – bugs should be hard to find. I think the applications we use to learn testing should be (mostly) working applications with some seeded defects that are difficult to find. Once I save up enough money for another Amazon gift certificate, I’ll see if I can come up with a more difficult exercise.
A more important point I’d like to bring up is that as testers, I hope we aren’t seeing apps like this as part of our day jobs. Something as broken as the first drop of this app shouldn’t make it one micrometer away from the hard drive of the developer who wrote it. If the apps and functionality you see on a daily basis are this bad, your software development process is broken (IMO of course, YMMV).
Numberz v2 – the Bugs
You can dig the details out of the comments, but basically, Numberz had three notable bugs.
The first bug was an intermittent addition error. Occasionally, the total would be one more than it should be. Some reports said that it seemed to happen about 2% of the time. Given the code, that sounds just about right. This gem was “hidden” in the source code.
if ((rand() % 50) == 0)
    total++; // roughly 1 run in 50 (2%) gets a total one higher than it should be
The second bug was that the number 3 showed up slightly more often than other numbers. In 100 rolls, the difference (likely) isn’t statistically significant, but over a larger number of rolls, the delta was pronounced.
In 10,000 rolls of 5 numbers spanning 10 digits, you’d expect roughly 5000 occurrences of each digit. Most digits came in at fewer than 5000 – mostly because the number 3 was being a hog.
0: 5096
1: 4619
2: 4730
3: 6842
4: 4513
5: 5068
6: 4601
7: 4986
8: 4958
9: 4587
Based on the distribution (6842 / 50000), it appears that the number 3 has about a 13% chance of occurring when it should have a 10% chance of occurring. Sure enough, the pesky programmer wrote this:
vals[i] = (rand() % 10);
if ((rand() % 30) == 0)
    vals[i] = 3;
total += vals[i];
The third bug had to do with the app not always closing. This one, I’m afraid to say, was unintentional, and is the sort of thing that occurs when one doesn’t write a Windows app in C for a very long time, then decides to “whip one out”. For those interested in the root cause, it was a missing call to EndDialog() (win32 apps aren’t dialog based by default, and I forgot the extra magic to make it actually exit when closed).
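For illustration, this is roughly what the fix looks like – a dialog procedure has to call EndDialog() itself, since the dialog manager won’t do it for you. This is a minimal sketch of the standard Win32 pattern, not the app’s actual code:

```c
#include <windows.h>

// Minimal dialog procedure sketch: without the EndDialog() call,
// clicking the close button never actually dismisses the dialog.
INT_PTR CALLBACK NumberzDlgProc(HWND hDlg, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_CLOSE:
        EndDialog(hDlg, 0);  // the "extra magic" that was missing
        return TRUE;
    }
    return FALSE;  // let the dialog manager handle everything else
}
```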
The Testing Challenge
Would you believe that I, the hater of GUI automation, wrote an automated GUI test to test the app? I thought it would be fun to write a quick test from a pure black box perspective. Staying close to my roots, I used a win32 app in C as the test app. I used Spy++ to get the control IDs and was off to the races. The entire app (minus header declarations) is below. It takes about five seconds on my middle-of-the-road dev machine to execute the 10,000 iteration test (and slightly longer if I were to actually log the test results).
#define LOOPLENGTH 10000

int additionErrors = 0;
int values[5][LOOPLENGTH];

// I got the IDs of the window items I care about from spy++
int numIDs[5] = { /* the five number-control IDs (values omitted here) */ };
int resultID = 0x3ef;
int buttonID = 0x3e8;

int APIENTRY _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                       LPTSTR lpCmdLine, int nCmdShow)
{
    HWND hwnd = FindWindow(NULL, L"Numberz");

    // ok - now we have all the window handles we need, the rest is (mostly) easy
    for (int loop = 0; loop < LOOPLENGTH; loop++)
    {
        SendDlgItemMessage(hwnd, buttonID, WM_LBUTTONDOWN, 0, 0);
        SendDlgItemMessage(hwnd, buttonID, WM_LBUTTONUP, 0, 0);

        int total = 0;
        for (int i = 0; i < 5; i++)
        {
            int val = GetDlgItemInt(hwnd, numIDs[i], NULL, FALSE);
            total += val;
            // fill an array with values that we can examine later -
            // counts may be enough
            values[i][loop] = val;
        }

        int proposedVal = GetDlgItemInt(hwnd, resultID, NULL, FALSE);
        if (proposedVal != total)
            additionErrors++; // logging omitted for now
    }
    return 0;
}
Note that while this test app runs and obtains accurate results, it will break as soon as a single control ID changes. For a more sophisticated app, I’d prefer a model that would let me get the values in a more reliable fashion. But given the context, it’s a good solution.
But real bugs aren’t seeded!?!
The seeded errors in the Numberz app were contrived, but that’s about the best you can do in a 120-line application. However, the applications we test – and the applications we will be testing in the future – are huge, complex systems where interactions and interoperability scenarios cause errors like this to exist. In many of these systems, an automated test is the only way to find these interaction bugs – but most importantly, we need to recognize when automation can help us find errors that we wouldn’t be able to discover otherwise.
Remember, as testers, our job isn’t to write automated tests, or do exploratory testing; our job is to test. Automate what is necessary in order to test efficiently, and explore where it’s necessary to test effectively (which is another way of saying you should automate 100% of the tests that should be automated, and in a nutshell, is how I view test design).
If you’re interested in playing with the source code for the app or test, I put the code in my git repo (open to the public). I didn’t include make files, but those shouldn’t be too hard to put together for anyone who can figure out git.