Oh the tests I’ll run

Last week, Katrina Clokie (@katrina_tester) asked this question on twitter:

Has anyone dynamically ordered automated checks so that those most likely to fail are executed first, then the build can fail fast?

— Katrina Clokie (@katrina_tester) February 24, 2017

I gave a few abbreviated answers based on my experience, and promised to write up a bit more, as this is something I’ve played with quite a bit before. I sort of meant to copy and paste an email I sent to an internal team (at msft) a few months back, but alas – I don’t have access to that email anymore :}

Why select tests?

A lot of people will ask, “If I can run every test we ever wrote in five minutes, why does this matter?” If this is the case, of course it doesn’t matter. I’m all for parallelization and leveraging things like Selenium Grid to run a massive number of tests at the same time, but it’s not always possible. On the Xbox team, for example, we had a limited (although large) number of systems we could use for testing, so test selection / prioritization / ordering was something we had to do.

Basics of Test Selection

OK – so you’ve decided you need to run the tests most likely to fail first. Easy(ish) – just run the tests that exercise the code that changed!

This is quite a bit easier than it sounds – that is, if you’re already using code coverage tools. Now is a good time to remind you that code coverage is a wonderful tool, but a horrible metric. Test selection benefits from the former part of that statement. For every automated test in your suite, periodically collect the coverage information just for that test, and save it somewhere (I suggest a database, but you can shove it in JSON or Excel if you feel the need). Now, you know exactly which lines, functions, or blocks (depending on your coverage tool) are hit by each test.

The slightly harder part may be to figure out which lines of code have changed (or have been added, or removed) since the last time the tests were run (which may be the last check-in, the last day, or longer). I’ll leave it as an exercise to map source control information to the same database / JSON / Excel store mentioned above, but once you have this mapping of tests to lines, test selection is just picking the tests that hit changed lines.
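To make that concrete, here is a minimal sketch of the lookup in Python. Everything in it is an assumption for illustration: the test names, the file names, and the idea that both the per-test coverage and the changed lines have already been loaded as sets of (file, line) pairs. How you get those sets out of your coverage tool and source control is up to you.

```python
from typing import Dict, List, Set, Tuple

Line = Tuple[str, int]  # (file path, line number)


def select_tests(coverage: Dict[str, Set[Line]], changed: Set[Line]) -> List[str]:
    """Return the tests that hit at least one changed line, most hits first."""
    hits = {test: len(lines & changed) for test, lines in coverage.items()}
    selected = [test for test, count in hits.items() if count > 0]
    # Sort by number of changed lines hit (descending), then by name for stability.
    return sorted(selected, key=lambda test: (-hits[test], test))


def uncovered_changes(coverage: Dict[str, Set[Line]], changed: Set[Line]) -> Set[Line]:
    """Return the changed lines that no test in the map exercises."""
    covered: Set[Line] = set()
    for lines in coverage.values():
        covered |= lines
    return changed - covered


# Hypothetical data: which lines each test hit the last time coverage was collected.
coverage_map = {
    "test_login": {("auth.py", 10), ("auth.py", 11), ("net.py", 5)},
    "test_logout": {("auth.py", 20), ("net.py", 5)},
    "test_report": {("report.py", 3)},
}
# Hypothetical data: lines touched since the last test run.
changed_lines = {("auth.py", 10), ("auth.py", 11), ("auth.py", 20), ("billing.py", 7)}

to_run = select_tests(coverage_map, changed_lines)
gaps = uncovered_changes(coverage_map, changed_lines)
print(to_run)
print(gaps)
```

The second function is there because the list of changed lines that nothing covers is often as interesting as the list of tests to run.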

But there are a lot of caveats with this approach. If you’re changing a low level component, every test will hit it. As an example, I used this approach many years ago on a test system that communicated with external devices over Winsock. Every change to Winsock told us that we needed to run every test. While probably a correct approach, it didn’t really help with prioritization. You’ll also find that often enough, there aren’t any tests to cover the changed code – and I’ll let you figure out what to do when you have tests that hit code that was removed (hint: maybe run the test once anyway to make sure it fails).

Heuristics

What I’ve found is that coverage is a good start – and may be enough for most teams (among teams who can’t run all of their tests quickly on every build). But adding some other selection factors (or heuristics) and applying some weights can take you a bit farther.

Some heuristics I’ve used in the past for test prioritization / selection include:

Has the test found a bug before? Some tests are good at finding product bugs. I give these tests more weight.
When was the last time the test ran? If a test has run every day for a year and never failed, I don’t give it much weight. We testers are always paranoid that the moment we choose not to run a test, a regression will appear. This weighted heuristic helps resolve the conundrum of running the test that never fails vs. the fear of missing the regression.
How flaky is the test? If you never have flaky tests, skip this one. For everyone else, it makes sense to run tests that are prone to false positives less frequently (or at the end of the test pass).
How long does the test take? I put more weight on tests that run faster.

Then I give each of these a weight. You can give each a whole number (e.g. 1-5, or 1-10), a decimal value, or whatever. Then, do some math to turn the full set of weights into a value, and then sort tests by value. Voila – my tests are prioritized. As you run the tests and learn more, you can tweak the numbers.
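Here is one way the "do some math" step could look, as a rough Python sketch. The weights, the field names, and the formulas that squash each heuristic into a 0-to-1 range are all made up for illustration; they are not the numbers I used, and you should expect to tweak them.

```python
from dataclasses import dataclass

# Hypothetical weights, one per heuristic; tune them as you learn more.
WEIGHTS = {"found_bugs": 3.0, "recent_failure": 2.0, "stability": 2.0, "speed": 1.0}


@dataclass
class Candidate:
    name: str
    bugs_found: int          # product bugs this test has caught
    days_since_failure: int  # days since the test last failed
    flake_rate: float        # fraction of failures that were false positives
    duration_s: float        # typical run time in seconds


def factors(t: Candidate) -> dict:
    """Map each heuristic to a value between 0 and 1 (higher means run sooner)."""
    return {
        "found_bugs": min(t.bugs_found, 5) / 5,
        "recent_failure": 1 / (1 + t.days_since_failure / 30),
        "stability": 1 - t.flake_rate,
        "speed": 1 / (1 + t.duration_s / 60),
    }


def score(t: Candidate) -> float:
    """Weighted sum of the heuristic factors."""
    f = factors(t)
    return sum(WEIGHTS[key] * f[key] for key in WEIGHTS)


def prioritize(tests: list) -> list:
    """Sort tests so the highest-scoring ones come first."""
    return sorted(tests, key=score, reverse=True)


# Hypothetical test metadata.
suite = [
    Candidate("test_about_page", bugs_found=0, days_since_failure=365, flake_rate=0.0, duration_s=5),
    Candidate("test_sync", bugs_found=2, days_since_failure=10, flake_rate=0.4, duration_s=120),
    Candidate("test_checkout", bugs_found=4, days_since_failure=3, flake_rate=0.0, duration_s=30),
]

ordered = [t.name for t in prioritize(suite)]
print(ordered)
```

With these made-up numbers, the test that has found bugs and failed recently comes out on top, the flaky one lands in the middle, and the fast test that never fails goes last.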

You can add more test meta-data as needed, but the above is a minimum. For example, with just the above, you could run something like:

run the most important tests that will complete in under 15 minutes

Using whatever command line arguments would support the statement above, you can limit the test run based on test time (and optionally add even more weight to tests that run quickly).
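As a sketch of what could sit behind a request like that, the Python below walks the tests from highest to lowest priority and keeps each one that still fits in the time budget. The tuple layout, the priority values, and the durations are assumptions for illustration. Note that this is a simple greedy pass, not an optimal packing, so it can leave a little time unused.

```python
from typing import List, Tuple

# (test name, priority value, expected duration in seconds)
TestEntry = Tuple[str, float, float]


def select_within_budget(tests: List[TestEntry], budget_s: float) -> List[str]:
    """Greedily pick the highest-priority tests that fit in the time budget."""
    chosen = []
    remaining = budget_s
    # Highest priority first; among equal priorities, shorter tests first.
    for name, _priority, duration in sorted(tests, key=lambda t: (-t[1], t[2])):
        if duration <= remaining:
            chosen.append(name)
            remaining -= duration
    return chosen


# Hypothetical priorities and durations.
candidates = [
    ("test_checkout", 6.9, 300),
    ("test_sync", 4.2, 500),
    ("test_import", 3.5, 400),
    ("test_about_page", 3.1, 60),
]

# "Run the most important tests that will complete in under 15 minutes."
plan = select_within_budget(candidates, budget_s=15 * 60)
print(plan)
```

Here test_import is skipped because it no longer fits after the two higher-priority tests, while the short test_about_page still squeezes in.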

There’s probably a lot more nuance here, but the concept of test selection is something any tester working with automation should know a bit about.

Just in case.
