Numberz Winnerz

Thanks to everyone who played the Numberz Challenge. I’ll get the boring part out of the way first. Based on response time and completeness (and bonus points for reverse engineering the “bad” parts of the app), lixiong is the winner of either a signed copy of HWTSAM or an Amazon gift certificate. Li – email me and we’ll figure out the details of the exchange. A definite honorable mention goes to Aleksander Lipski – he came up with the same conclusions as Li, and also took the time to establish some context (which is easy to forget in contrived examples such as this). Aleksander – send me an email and we can work out some sort of prize for you too.

Thanks to everyone else for playing along and testing the app. It was nice to see a high interest level in my silly little experiment.

Analysis

For those interested in the what, why, and how of this exercise, read on.

The Numberz App v1

Brent Jensen noted that the first version of the application was quite buggy. It had several issues calculating the total. I released this buggy version on purpose. I have a bit of a distaste for testing applications that are so buggy that finding bugs is like shooting fish in a barrel – bugs should be hard to find. I think the applications we use to learn testing should be (mostly) working applications with some seeded defects that are difficult to find. Once I save up enough money for another Amazon gift certificate, I’ll see if I can come up with a more difficult exercise.

A more important point I’d like to bring up is that as testers, I hope we aren’t seeing apps like this as part of our day jobs. Something as broken as the first drop of this app shouldn’t make it one micrometer away from the hard drive of the developer who wrote it. If the apps and functionality you see on a daily basis are this bad, your software development process is broken (IMO of course, YMMV).

Numberz v2 – the Bugs

You can dig the details out of the comments, but basically, Numberz had three notable bugs.

The first bug was an intermittent addition error. Occasionally, the total would be one more than it should be. Some reports said that it seemed to happen about 2% of the time. Given the code (the condition below is true for exactly one of 50 possible remainders, or 2% of calls), that sounds just about right. This gem was “hidden” in the source code.

    if ((rand() % 50) == 0)
    {
        total+=1;
    }
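If you want to sanity-check that rate, here’s a throwaway sketch (not part of the app) that just counts how often the seeded condition fires:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int hits = 0;
        const int rolls = 10000;
        for (int i = 0; i < rolls; i++)
        {
            // same condition as the seeded bug: true for 1 remainder out of 50
            if ((rand() % 50) == 0)
            {
                hits++;
            }
        }
        printf("fired %d times in %d rolls (~%.1f%%)\n",
               hits, rolls, 100.0 * hits / rolls);
        return 0;
    }

Over 10,000 iterations, expect the count to land right around 200 – the 2% people were seeing.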

The second bug was that the number 3 showed up slightly more often than the other numbers. In 100 rolls, the difference (likely) isn’t statistically significant, but over a larger number of rolls, the delta was pronounced.

In 10,000 rolls of 5 single-digit numbers – 50,000 values in all – you’d expect roughly 5,000 occurrences of each digit. Most digits came in under 5,000 – mostly because the number 3 was being a hog.

        [0]    5096    int
        [1]    4619    int
        [2]    4730    int
        [3]    6842    int
        [4]    4513    int
        [5]    5068    int
        [6]    4601    int
        [7]    4986    int
        [8]    4958    int
        [9]    4587    int

Based on the distribution (6842 / 50000 ≈ 13.7%), it appears that the number 3 has about a 13% chance of occurring when it should have a 10% chance.
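Eyeballing isn’t proof, though. A quick chi-squared check over the counts (a throwaway sketch, using the digit totals from the run above) settles whether the skew could be chance:

    #include <stdio.h>

    int main(void)
    {
        // digit counts from the 50,000-sample run above
        double observed[10] = { 5096, 4619, 4730, 6842, 4513,
                                5068, 4601, 4986, 4958, 4587 };
        double expected = 5000.0;
        double chi2 = 0.0;
        for (int i = 0; i < 10; i++)
        {
            double d = observed[i] - expected;
            chi2 += (d * d) / expected;
        }
        // at 9 degrees of freedom, anything above ~28 means p < 0.001
        printf("chi-squared = %.1f\n", chi2);
        return 0;
    }

This works out to a chi-squared value of roughly 840 – about thirty times the ~28 needed for p < 0.001 at nine degrees of freedom – so the skew is definitely not sampling noise. Sure enough, the pesky programmer wrote this: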

        vals[i] = (rand() % 10);
        if ((rand() % 30) == 0)
        {
            vals[i] = 3;
        }
        total += vals[i];
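For the record, the math matches the data: the override fires whenever rand() % 30 == 0, so P(3) = 1/30 + (29/30)(1/10) = 39/300 = 13%, while every other digit drops to (29/30)(1/10) ≈ 9.7% – which is why most digit counts came in under 5,000.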


The third bug had to do with the app not always closing. This one, I’m afraid to say, was unintentional, and is the sort of thing that occurs when one doesn’t write a Windows app in C for a very long time, then decides to “whip one out”. For those interested in the root cause, it was a missing call to EndDialog() (win32 apps aren’t dialog-based by default, and I forgot the extra magic to make it actually exit when closed).
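For the curious, the missing magic is a one-liner in the dialog procedure. A minimal sketch, assuming the standard DialogBox-style message handling:

    INT_PTR CALLBACK DlgProc(HWND hDlg, UINT message, WPARAM wParam, LPARAM lParam)
    {
        switch (message)
        {
        case WM_CLOSE:
            // without this call, the window goes away but the dialog never ends
            EndDialog(hDlg, 0);
            return (INT_PTR)TRUE;
        }
        return (INT_PTR)FALSE;
    }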

The Testing Challenge

Would you believe that I, the hater of GUI automation, wrote an automated GUI test to test the app? I thought it would be fun to write a quick test from a pure black-box perspective. Staying close to my roots, I used a win32 app in C as the test app. I used Spy++ to get the control IDs and was off to the races. The entire app (minus header declarations) is below. It takes about five seconds on my middle-of-the-road dev machine to execute the 10,000-iteration test (slightly longer if I were to actually log the test results).

#define LOOPLENGTH 10000
 
int values[5][LOOPLENGTH];
int counts[10];
int additionErrors = 0;
 
// I got the IDs of the window items I care about from spy++
int numIDs[] = 
{
    0x3ea,
    0x3eb,
    0x3ec,
    0x3ed,
    0x3ee
};
int resultID = 0x3ef;
int buttonID = 0x3e8;
 
 
int APIENTRY _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,  
                       LPTSTR lpCmdLine, int nCmdShow)
{
    UNREFERENCED_PARAMETER(hPrevInstance);
    UNREFERENCED_PARAMETER(lpCmdLine);
 
    HWND hwnd = FindWindow(NULL, L"Numberz");
 
    //ok - now we have all the window handles we need, the rest is (mostly) easy
    ShowWindow(hwnd, nCmdShow);
    for (int loop = 0; loop < LOOPLENGTH; loop++)
    {
        SendDlgItemMessage(hwnd, buttonID, WM_LBUTTONDOWN, 0, 0);
        SendDlgItemMessage(hwnd, buttonID, WM_LBUTTONUP, 0, 0);
        int total = 0;
        for (int i = 0; i < 5; i++)
        {
            int val = GetDlgItemInt(hwnd, numIDs[i], NULL, FALSE);
            total += val;
            // fill an array with values that we can examine later
            values[i][loop] = val;
            // counts may be enough
            counts[val]++;
        }
        int proposedVal = GetDlgItemInt(hwnd, resultID, NULL, FALSE);
        if (proposedVal != total)
        {
            additionErrors++;
        }
    }
    // logging omitted for now
    return 0;
}

Note that while this test app runs and obtains accurate results, it will break as soon as a single control ID changes. For a more sophisticated app, I’d prefer a model that lets me get the values in a more reliable fashion. But given the context, it’s a good solution.
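If I did want something sturdier, one option (a hypothetical sketch – this isn’t in the repo) is to discover the edit controls at runtime by class name instead of baking in IDs from Spy++. It assumes the number fields are standard Edit controls, and you still have to trust the enumeration order:

    // Hypothetical helper: collect up to six Edit controls into a zeroed
    // array of HWNDs passed in via lParam.
    BOOL CALLBACK CollectEdits(HWND hChild, LPARAM lParam)
    {
        HWND *edits = (HWND *)lParam;
        TCHAR className[32];
        GetClassName(hChild, className, 32);
        if (_tcscmp(className, _T("Edit")) == 0)
        {
            for (int i = 0; i < 6; i++)
            {
                if (edits[i] == NULL)
                {
                    edits[i] = hChild;   // first empty slot
                    break;
                }
            }
        }
        return TRUE;   // keep enumerating
    }

    // usage:
    //     HWND edits[6] = { 0 };
    //     EnumChildWindows(hwnd, CollectEdits, (LPARAM)edits);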

But real bugs aren’t seeded!?!

The seeded errors in the Numberz app were contrived, but that’s about the best you can do in a 120-line application. However, the applications we test – and the applications we will be testing in the future – are huge, complex systems where interactions and interoperability scenarios cause errors like this to exist. In many of these systems, an automated test is the only way to find these interaction bugs – but most importantly, we need to recognize when automation can help us find errors that we wouldn’t be able to discover otherwise.

Remember, as testers, our job isn’t to write automated tests, or do exploratory testing; our job is to test. Automate what is necessary in order to test efficiently, and explore where it’s necessary to test effectively (which is another way of saying you should automate 100% of the tests that should be automated, and in a nutshell, is how I view test design).

If you’re interested in playing with the source code for the app or test, I put the code in my git repo (open to the public). I didn’t include makefiles, but those shouldn’t be too hard to put together for anyone who can figure out git.

Comments

  1. Adding my thanks – it was fun and a good way to try out some Ruby.

    Tip of the hat to lixiong for reverse engineering the app!!

    Maybe some of those people who wonder how to interview testers should look at this and try it out on their potential recruits.

  2. “Remember, as testers, our job isn’t to write automated tests, or do exploratory testing; our job is to test. Automate what is necessary in order to test efficiently, and explore where it’s necessary to test effectively (which is another way of saying you should automate 100% of the tests that should be automated, and in a nutshell, is how I view test design).”

    Well said!

  3. This was fun, Alan. Thanks! I recall, in the earlier days of my career, one of my employees constructed a ‘buggy’ app much as you suggested. We would put it in front of interview candidates, sans spec, and ask them to test it. It was a simple app, but it really helped us see who was testing and who was throwing ‘noodles at the wall’ to see which stuck.

    I agree finding bugs should be hard. I tell younger testers it’s our job to outthink dev. This job is hard. It’s not about automation, nor about blindly walking the product. There are places for both of those, but Testing is not a task for the intellectually lazy.

    That said, it sometimes feels like we haven’t done much to move “Quality Upstream” (my favorite inactionable QA cliche). At times, the desire of Dev to rely on the “QA safety net” seems to have greater momentum, so they keep handing us well-stocked barrels.
