It’s (probably) a Design Problem

As you know, I occasionally rant about GUI automation; but I don’t think I’ve done a good job explaining some of the reasons why it can be fragile, and when it can actually be a good idea.

Let’s take a deeper look at some of the design challenges of GUI automation.

  • Record & Playback automation. This is a non-starter for me. I’ve never seen recorded automation work well in the long term – if your automation is based on record & playback, I can’t imagine it being successful. Yes, the tools are getting better, but the real problem is …
  • What are you going to do about the oracle? – Regardless of whether you write the automation, or you record it, simple playback of a user workflow has an oracle problem. The failure model in many UI tests is that nothing “bad” happens – the problem is that you don’t often know what flavors of “bad” you’re looking for, so your tests may miss stuff. However, if you design your tests with a trusted oracle (meaning if the test fails, you know there’s a problem, and if it passes, you know the scenario works), you probably have usable UI automation. By the way – if your oracle solution involves screen comparisons, I think you’re heading down a scary path – please consider this solution as a last, end-of-the-world type solution.
  • Bad Test Design I – Many authors of UI tests seemingly fail to realize they’re writing automation. Basic verification that would be hit by anyone walking through the basics of the application isn’t worth much to me. However, if your automation actually takes advantage of automation – meaning that loops, randomness, input variations, and loads of other ideas are part of the automation approach, the approach may be worthwhile.
  • Bad Test Design II – I dislike tests that do exactly the same thing every time. While valuable for regression testing, “hard coded” tests have severe limitations. In notepad, I can open a file by clicking the file menu, then selecting open, or I can press Alt-f, then o on the keyboard, I can press Ctrl-o, I can drag a file onto an open instance of notepad, or I can double click a file associated with notepad. A well-designed notepad test could just state File.Open(filename), and then use a randomly selected file open method. Too much UI automation doesn’t have this sort of abstraction and limits the effectiveness of the approach.
  • Bad Test Design III – Lack of forward thinking is another design flaw I see often. I once saw a GUI test suite that ran (successfully) on over 20 different localized versions of Windows 95, including RTL languages – and it ran successfully on Windows 98. Unfortunately, not all test authors consider the variety of scenarios where their tests could run.
  • Bad Test Design IV – Failure to consider what can fail. This one irks me, because testers should know that bad things can happen. Consider what will happen when someone changes window text, a control type, or dialog layout. Consider what happens when Windows reboots your machine in the middle of a test run. You don’t necessarily have to make your test resilient to these changes, but at the very least, you need to make the error text point to exactly what changed and caused the error. Too often, tests fail and give zero information on why they failed. Plan for failure and ensure that all test failures tell you exactly what is wrong.
  • Bad Test Design V – As a UI tester, you should have some notion of what sorts of UI automation are reliable, and which sorts are flaky. Using SendKeys works…but it probably should be a last resort. Good UI automation means that you know at least three ways to accomplish any task, and know which of the approaches is most reliable and which is least reliable.
  • Bad Test Design VI – One of my test design smells is Sleep (or similar) statements. The more I see in your test code, the less I trust it. Repeat after me – “Sleep functions are not a form of synchronization”. Most frameworks have an alternative to Sleep. There is always a better alternative to Sleep statements.
  • Fragile UI – It’s really easy to write automation that fails anytime the UI changes. It’s also really easy to write UI that breaks automation. If you don’t start testing until late in the product cycle, and know that the UI automation won’t be used in subsequent versions of the product, an investment in UI automation may make sense (given that it solves a real testing problem). Alternatively, if you’re involved early in the product cycle, you could wait to write UI automation until the UI is complete. A third (and recommended) approach is to make sure that the application under test is designed in a way that makes UI automation more robust (i.e. increase testability). Testability is, in short, the expense of test. Rewriting automation frequently during the product cycle is expensive. Writing complex oracles to verify things that should be easier is expensive. Working with developers to implement a UI model that enables straightforward and sustainable automation is cheap – especially when you consider the long term benefits and gains. I probably have another post on this subject (testability) alone.
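To make the Sleep point (Bad Test Design VI) and the "plan for failure" point (Bad Test Design IV) a bit more concrete, here is a minimal sketch of polling for a condition instead of sleeping for a fixed time. The names `wait_until`, `WaitTimeout`, and `FakeDialog` are made up for illustration and are not from any particular framework; most UI automation frameworks ship their own equivalent, which you should prefer.

```python
import time


class WaitTimeout(AssertionError):
    """Raised when a condition does not become true in time."""


def wait_until(condition, description, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value. Raises WaitTimeout with a message that names
    exactly what the test was waiting for, so the failure explains itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(
                f"Timed out after {timeout:.1f}s waiting for: {description}"
            )
        # Short pause between polls; the condition check, not the pause,
        # is what synchronizes the test with the application.
        time.sleep(interval)


class FakeDialog:
    """Hypothetical stand-in for a UI object that appears after a delay."""

    def __init__(self, appears_after):
        self._visible_at = time.monotonic() + appears_after

    def is_visible(self):
        return time.monotonic() >= self._visible_at


# Usage: the test continues as soon as the dialog shows up,
# and fails with a descriptive message if it never does.
dialog = FakeDialog(appears_after=0.2)
wait_until(dialog.is_visible, "Open dialog to become visible", timeout=2.0)
```

The test waits only as long as it has to, and a timeout reports which condition never came true rather than failing somewhere downstream with no explanation.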

My annoyance with GUI automation isn’t a blanket view. The problem is that few teams put the effort into product and test design that successful GUI automation requires. I think GUI automation can be awesome – but we (as an industry) don’t seem to care enough…yet.


  1. “Sleep functions are not a form of synchronization.” True, and important. Unfortunately, there are times when I can’t get anything else to work at all.

    Nice post, but it fairly screamed at me what IMO is a better solution *if* you can do it – advocate very early in the product cycle for an SDK that allows access to the business logic, and keep the GUI layer thin. If bugs can’t be found by SDK automation, then they’re lo-sev bugs, so the product quality risk associated with having a bunch of failing GUI automation, or missing GUI automation, is low.

    SDK automation will run more quickly and reliably than GUI automation, it will allow broader and more robust exercising of business logic including boundary checking, and it will be more stable than GUI automation.

  2. In response to “Bad Test Design II”: I think the random approach is only good if the choices made “randomly” can be duplicated, e.g. to verify a fix or to test for race conditions. For example, you could write your XML grammar to have a set of zero or more test run choice sets, with each set containing all choices, differentiated by e.g. an XML attribute, plus a stable index that identifies the set of choices. The harness interprets the XML, reports on the index, and runs the test with the choices specified. To reproduce the test run, the index can be specified.

    That way, you’re sure to know from the test artifacts which choices were made (e.g. Ctrl-O to open the file followed by a menu-drop to save-as) and if for some reason there’s a behavior that is associated only with those specific choices, you can run the test again at will. You can’t do that if the test run choices are (pseudo) random.

    Also, with some XML to drive the choices as above, you can test more efficiently with pair-wise testing rather than running the suite for a very long time, waiting for a case where the (pseudo) random choices break the product GUI.
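The File.Open(filename) abstraction from “Bad Test Design II” and the reproducibility concern raised in the comment above can be combined. Here is a minimal sketch, assuming a recorded seed is an acceptable stand-in for the "stable index" the comment describes. Every name in it (`FileActions`, `OPEN_METHODS`, the `open_via_*` functions) is hypothetical, and the functions return strings where a real suite would drive the UI.

```python
import random


# Hypothetical ways to open a file; a real suite would drive the UI here.
def open_via_menu(filename):
    return f"File > Open: {filename}"


def open_via_alt_keys(filename):
    return f"Alt-f, o: {filename}"


def open_via_ctrl_o(filename):
    return f"Ctrl-o: {filename}"


def open_via_drag_drop(filename):
    return f"drag onto window: {filename}"


OPEN_METHODS = [open_via_menu, open_via_alt_keys, open_via_ctrl_o, open_via_drag_drop]


class FileActions:
    """Test-facing abstraction: tests call open(); the method is chosen per call."""

    def __init__(self, seed=None):
        # Keep the seed so a failing run can be replayed with the same choices.
        if seed is None:
            seed = random.SystemRandom().randrange(2**32)
        self.seed = seed
        self._rng = random.Random(seed)
        self.log = []  # names of the methods actually used, in order

    def open(self, filename):
        method = self._rng.choice(OPEN_METHODS)
        self.log.append(method.__name__)
        return method(filename)


# Usage: report the seed and choices with the test results...
first = FileActions()
first.open("notes.txt")
print(f"seed={first.seed} choices={first.log}")

# ...and pass the same seed back in to repeat exactly the same choices.
replay = FileActions(seed=first.seed)
replay.open("notes.txt")
assert replay.log == first.log
```

The test itself only says "open this file"; the harness decides how, records what it chose, and can make the same choices again on demand.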

