Roles and Boxes

The fine folks at the Ministry of Testing keep promoting my blog posts, so the least I can do is give them a link and a shout out. I’m looking forward to talking about “Testing without Testers” at Test Bash Philadelphia (preview here) and about my role on the team.

This morning, I passively listened to The Bach Bros talk in a webinar about roles and what they mean, as I was curious to get their take. The punch line of the talk was the introduction of “Role grams” – which are shapes that describe who does what, what they’re doing, and who owns it. Nothing earth-shattering, but a workable model.

For the last 14 months, I’ve been a manager on an engineering team responsible for deployment and quality. As I’ve put it before, I am responsible for everything that happens between the time code is checked in (or slightly before, as I also added a few git hooks) and the time it is deployed to our production servers. Given that context, my role is a big blob of stuff. One could say that my “role” is Director of quality and infrastructure, and within that role, I take on activities of tools, engineering productivity, builds, quality, and testing. Certainly, you could say that each of those is a role I take on as part of my job – that model just doesn’t work as well for me. My role – the part I play as an actor on my product team – is one where I need to look at the entire ecosystem of check-in, CI, build, test, and release as one thing. I look at all of that as a system – how quickly and efficiently can I move a developer check-in to something that shows customer value. I have to view the system to know which parts are bottlenecks, and which parts need speeding up or slowing down in order to increase overall system efficiency.

Here’s a low-tech view of my role.

I’m not locked in my box – in fact, since I’m the “quality guy” (label, not role) on my team, I spend most of my time working across the team on improving the testing done by other engineers on the team. I also have a small team of dedicated testers (these people have a testing “role”), who focus almost entirely on exploratory testing driven by trends in customer feedback and areas where I think we need additional breadth coverage in order to reduce risk.

It’s an evolving role, but one I enjoy, and one that I think will become much more common in the coming years.

The Mojo of the Weasel

It’s been a heck of a year.

I joined a new team at MS fourteen months ago, and it’s been the busiest fourteen months of my career. To be fair, I worked more hours during my stint in Xbox One, but the role I’m in now has even more responsibility (more on what that means some other time). Since I haven’t really blogged much since I joined the team, perhaps a recap is in order for context.

I’m working on a yet-unannounced, and possibly-soon-to-be-released product. I was hired as “the quality guy”, but expanded the role over the time I’ve been on the team to take ownership of all of our build and release infrastructure as well. Basically, I’m responsible for everything from the moment code is checked in (including check-in quality gates), until it hits our production servers (at this time, “production” is for beta users only). This includes our CI system, build systems, test infrastructure, deployment, and a bit of manual testing as well. To this end, I’ve ventured back into management, and manage a small handful of full-time employees, as well as a larger handful of temporary (vendor) testers. I’m responsible for making sure we have systems and strategies that enable us to get a good product to our customers frequently. I make sure the code is ready, and that the product is well-tested. It’s also my call on whether – and when – to push bits to our production environment. It’s a killer job, but one deep in my wheelhouse. I’ve grown in many ways over the last year, and learned even more.

But it’s taken away a bit of who I am. I don’t write much anymore. I speak much less than I used to, and I’ve all but disappeared from twitter (one plus is that I’ve been really happy with the AB podcasts that Brent and I have been delivering). While part of me is growing and learning, another part of me is withering and dying.

So I need to make a change. Not necessarily a change as big as changing jobs or changing companies, but I need to remember that my job is just a job, and it’s not who I am. Like a lot of people, I lose sight of that sometimes, and I need to see if I can get myself back on track – and start by (re) connecting with the community that inspires me and drives me to do even better things.

I remain excited about the product I’m working on, and the team I work with – but perhaps more excited about speaking at Test Bash, getting back to talking with old friends on Twitter, knocking the dust off, and sharing a bit more with my friends.

Here’s to re-engaging.

The A Word returns (sort of)

Chris McMahon, who has always impressed me with his words and his wit, called me out in his blog.


Apropos of my criticism of “Context Driven Approach to Automation in Testing” (I reviewed version 1.04), I ask you to join me in condemning publicly both the tone and the substance of that paper.

Almost exactly a year ago, I reviewed a draft of the paper, and my name is among those listed as reviewing the paper. The feedback I gave was largely editorial (typos and flow), with a few comments about approach that I’ll repeat here:

The first red flag I called out (and will call out again here) is the phrase that describes test automation as something to “automate testing by automating the user”. This is a shallow view of test automation, but other than comment, I didn’t push hard on it. In hindsight, this was a mistake (much of The A Word touches on this topic).

In regards to the scenario the authors chose for scenario automation, I thought the choice was weird, and asked for more…context and provided some food for thought.

I think you can add even more emphasis to how and why you chose to automate scene creation. Too many times, testers choose to automate because they can automate. The thought process (for me) may be, “The scene feature is pretty important, I’m curious what happens if an author has thousands of scenes. Will it cause performance problems, formatting problems, file load problems, or other issues, etc. There’s no way in hell I want to do this manually, so let me write some code to help me run my experiment”.  You may also want to discuss alternate implementation ideas – e.g. creating a macro in Notepad++ to create the text then paste it in, or creating a macro in Word for the same, or using for in a windows console (e.g. for /L %f in (1,1,1000) do echo “##”>>output.txt&echo “Scene %f”>>output.txt ). Using tools for creating and manipulating data could be a whole article.
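As a rough illustration of the "write some code to help me run my experiment" idea, here is a minimal Python sketch that builds the same kind of text as the console loop quoted above. The `##` marker and the `Scene N` titles are taken from that command; the function name and the idea of returning one big string are assumptions made for this example, not anything from the paper under review.

```python
def build_scenes(count: int) -> str:
    """Return text with `count` scenes, each a '##' marker line followed by a title line."""
    lines = []
    for number in range(1, count + 1):
        lines.append("##")                 # scene separator, as in the console example
        lines.append(f"Scene {number}")    # placeholder scene title
    return "\n".join(lines) + "\n"
```

Writing `build_scenes(1000)` to a file gives a thousand-scene document to load into the application, which is the point of the exercise: the code exists to answer a question, not to be a test suite.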

And that led to my real beef with the paper. It talks about using tools to test – which can be a good thing, but it doesn’t really talk about automation in the way successful teams actually use it.

I think it may be important to talk about purposes of automation and where to apply it – or at least one context of that – as I don’t think that’s discussed enough (but I’ve made a note to myself to write a blog on this very subject). At a BFC (big company) like Microsoft, we write a lot of shared / distributed automation – automated tests that we need to run on a lot of different hardware/platform configurations in some sort of lab setting (tools like SauceLabs or BrowserStack are helpful here). Web apps are famous for this [problem].

I also commented:

The other kind of automation (which you cover in your article) is what I sometimes call exploratory automation (or more often, just “Testing”). This is where we get a test idea, and want to write some quick automation to help learn about the product. While we may turn this sort of test into something that’s distributed / shared someday, its primary purpose is to help me answer questions and learn. There’s (another) story in HWTSAM where I described a case of this. I wrote really ugly brute force automation in C (using things like FindWindow and SendMessage(LBUTTON_DOWN…) to simulate opening and closing a connection to a remote host many times (the only thing this app did). It found a nice memory leak that may not have been found otherwise (or at least not as quickly).

All of this feedback fed my uber-point, which was that while the article talked about test automation, the examples really just talked about using somewhat random tools to help the authors explore or test some software. There was nothing about strategy, or about more typical use of test automation. I asked about it in a comment:

I wonder why you don’t use the word “Tools” in the title – e.g. “A CDT approach to tools and automation in testing” or something like that.

…because the paper is about, as I said above, using non-standard tools to help test. Sure, it’s automation in a sense, but nothing in the paper reflects the way test automation is used successfully in thousands of successful products.

All that said, I do not support this paper as a description of good test automation, and I think it’s an inappropriate method for anyone to learn about how to write automation. Chris requested that the authors remove the paper; while I support this, and do believe that the paper can cause more harm than good, there’s so much bad advice on the internet about creating software, that removing this one piece of bad advice will hardly make a dent.

I did not realize my name was listed as a reviewer, and although I did (as admitted above) review this paper, I do not want my name associated with it, and will request that the authors remove my name.

A Few Anniversaries (and one announcement)

There’s a light at the end of the oh-my-work-is-so-crazy train, and I look forward to ranting more often both here and on twitter.

But first, a few minor anniversaries to acknowledge. Monday was my 21-year anniversary at Microsoft. It’s not a nice even number like 20, but it’s weird to think that people born on the day I started (full-time) at Microsoft, can now drink in the U.S. While I doubt I’ll make it to 25, I doubted that I’d make it to 20…or 10, so this is definitely an area where I’m bad at estimating.

Meanwhile, the ABTesting Podcast just hit episode #40. That’s another milestone I never thought I’d hit, but Brent and I keep finding things to talk about (or new ways to talk about the same things). We should hit the 50-episode milestone (by my already-established-as-poor estimates) before the end of the calendar year. I’m thinking of inviting Satya to be a guest, but I don’t think he’ll show up.

On the announcement front, I’m speaking at Test Bash Philadelphia in November. I’ll be talking about “Testing without Testers and other stupid ideas that sometimes work”. This is an evolution of a talk I’ve been giving recently, but I’m preparing something extra special for Test Bash that should inspire, as well as cause some great conversations to happen.

Filling a gap in Istanbul coverage

I’m at no loss for blog material, but have been short on time (that’s not going to change, so I’ll need to tweak priorities). But…I wanted to write something a bit different from normal in case anyone else ever needs to solve this specific problem (or if anyone else knows that this problem already has an even better solution).

Our team uses a tool called Istanbul to measure code coverage. It generates a report that looks sort of like this (minus the privacy scribbling).

For those who don’t know me, I feel compelled to once again share that I think Code Coverage is a wonderful tool, but a horrible metric. Driving coverage numbers up purely for the sake of getting a higher number is idiotic and irresponsible. However, discovering untested and unreachable code is invaluable, and dismissing the tool entirely can be worse than using the measurements incorrectly.

The Missing Piece

Istanbul shows all-up coverage for our web app (about 600 files in 300 or so directories). What I wanted to do was break down coverage by feature team as well. The “elegant” solution would be to create a map of files to features, then add code to the Istanbul reporter to add the feature team to each file / directory, and then modify the table output to include the ability to filter by team (or create separate reports by team).

I don’t have time for the elegant solution (but here’s where someone can tell me if it already exists).

The (or “My”) Solution

This seems like a job for Excel, so first, I looked to see if Istanbul had CSV as a reporter format (it doesn’t). It does, however, output JSON and XML, so I figured a quick and dirty solution was possible.

The first thing I did was assign a team owner to each code directory. I pulled the list of directories from the Istanbul report (I copied from the HTML, but I could have pulled from the XML as well), and then used Excel to create a CSV file with file and owner. I could figure out a team owner for over 90% of the files from the name (thanks to reasonable naming conventions!), and I used git log to discover the rest. I ended up with a format that looked like this:


Then it was a matter of parsing the coverage xml created by Istanbul and making a new CSV with the data I cared about (directory, coverage percentage, statements, and statements hit). The latter two are critical, because I would need to recalculate coverage per team.
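To see why the raw counts matter more than the percentages, here is a minimal Python sketch (the team names and numbers are made up for illustration, and the tuple layout is an assumption). Averaging per-directory percentages would weight a 100-statement directory the same as a 900-statement one; summing statements and statements hit per team and dividing once avoids that.

```python
from collections import defaultdict


def coverage_by_team(rows):
    """rows: iterable of (team, statements, statements_hit) tuples.

    Returns {team: coverage percentage} computed from summed counts,
    not from an average of per-directory percentages.
    """
    totals = defaultdict(lambda: [0, 0])  # team -> [statements, statements hit]
    for team, statements, hit in rows:
        totals[team][0] += statements
        totals[team][1] += hit
    return {
        team: (100.0 * hit / statements if statements else 0.0)
        for team, (statements, hit) in totals.items()
    }


# Hypothetical data: one small, well-covered directory and one large, poorly covered one.
example = [("search", 100, 90), ("search", 900, 90), ("billing", 50, 25)]
per_team = coverage_by_team(example)
```

In this made-up example the two "search" directories sit at 90% and 10%, which would average to 50%, yet only 180 of the team's 1,000 statements are actually hit.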

There was a time (like my first 20+ years in software) where a batch file was my answer for almost anything, but lately – and especially in this case – a bit of PowerShell was the right tool for the job.

The pseudo code was pretty much:

  • Load the xml file into a PS object
  • Walk the xml nodes to get the coverage data for a node
  • Load a map file from a csv
  • Use the map and node information to create a new csv

Hacky, yet effective.

I posted the whole script on github here.
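For readers who want the shape of those four steps without PowerShell, here is a rough Python sketch. It is not the script linked above. It assumes a clover-style XML report in which each `package` element carries a `metrics` child with `statements` and `coveredstatements` attributes, and a two-column map CSV of directory and owner; both layouts, and every function name, are assumptions for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_owner_map(csv_text):
    """Parse 'directory,owner' lines into a dict (assumed two-column layout)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def coverage_rows(xml_text, owners):
    """Walk the package nodes and pair each directory with its owner and counts."""
    root = ET.fromstring(xml_text)
    rows = []
    for package in root.iter("package"):
        metrics = package.find("metrics")
        if metrics is None:
            continue  # skip nodes without coverage data
        directory = package.get("name", "")
        statements = int(metrics.get("statements", "0"))
        hit = int(metrics.get("coveredstatements", "0"))
        percent = round(100.0 * hit / statements, 2) if statements else 0.0
        rows.append((directory, owners.get(directory, "unknown"), percent, statements, hit))
    return rows


def to_csv(rows):
    """Render the rows as CSV text that Excel can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["directory", "owner", "coverage", "statements", "statements_hit"])
    writer.writerows(rows)
    return out.getvalue()
```

Directories missing from the map come out as "unknown" rather than being dropped, which makes the gaps in the ownership map easy to spot once the CSV is open in Excel.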

Do it *my* way, or do it *our* way

I was thinking about this on the way to work today, and thought I’d try to spit out a quick blog post before I got side-tracked again.

I’ve been very fortunate to have had success with organizational change with teams at Microsoft. Whether it’s getting programmers to run integration tests before check-in, or helping a team get to a daily zero-bug bar, my leadership style is the same. I believe that people will do things that they think are valuable. In fact, this quote from Eisenhower (which is, admittedly, overused) aligns tightly with my style.

Leadership is the art of getting someone else to do something you want done because [s]he wants to do it.

I talk with people to understand what their concerns and motivations are. I communicate plans and strategies to the team. Often, I “plant seeds” – for example, I may mention to a manager a few of the benefits of keeping engineering debt low and give a few examples. No judgement or decree – just an idea to put in their head. Later, I may mention that it may be a good idea to keep pri 1 bug counts at zero, and maybe overall bugs below some arbitrary number. Often, a few weeks later, I’ll see that manager’s team with zero pri 1 bugs. Or, I’ll mention in a meeting that I’d like to get the whole team down to zero bugs, and I generally have support from everywhere I planted a seed.

The big advantage of this style of change management (in my experience) is that the team owns the change, and accepts it as part of the way they work. The disadvantage is that it takes time. To me, that time investment is worth it.

There’s a faster approach, but I don’t like it – yet I see it used often. It probably has a better name, but I’ll call it the do-it-because-I-said-so style of leadership. Eisenhower also said that leadership doesn’t come from barking orders or insisting on action (paraphrase because I’m too lazy to look it up). To me, leadership isn’t about your ideas, it’s about working with others and building your tribe. Too many so-called leaders think that leadership is being the loudest voice, or being the one that makes mandates to an organization. That’s not leadership to me. That’s being a dick.

That said, there’s a middle ground there, that I see often enough to respect, but not often enough to completely understand. I know some leaders who are able to make explicit mandates and have their team rally around them immediately. They don’t do this often, and I think it helps. They are humble and I think this helps. They have a relationship with their followers – and this helps too. Maybe the answer is that they’ve waited until they’re a real leader (rather than a self-proclaimed chest-thumper), and waited until circumstances were necessary before making a mandate.

What kind of leader do you want to be?

Intelligence and Insight

I have an after-work event tonight, and rather than leave my car in the garage overnight, I ran to work. Since I’ve moved to downtown Bellevue, I’ve done this a few times – and given that I’m running another half-marathon in 10 days, it was a great opportunity for a long training run before I begin to taper my mileage down a bit leading up to the race.

App Issues

I’ve been a long-time user of a running companion app called Runtastic. It does the usual stuff of tracking mileage, route, pace, and giving voice updates at user specified intervals. I find it especially valuable when training or racing, because I usually have very specific pace goals. Having a voice tell me how long the last mile/half-mile/km took me lets me know if I’m running too fast (and burning out) or too slow (and putting my goals in jeopardy). Granted, I have a pretty good internal clock, and usually run my paces pretty well without “help”, but the feedback is really useful to me.

Today, I took off from home, started spacing out, and before long, I was a mile or two from home…when I noticed that I had not received any voice feedback yet. I knew exactly what happened (because it’s happened before). When you start the app, it gives you a 15 second timer before the tracking actually starts, along with the ability to add time to the delay up to two minutes or so. I LOVE this feature, because I can give myself time to put my phone into my running belt, walk twenty yards and curse myself for having such a painful hobby before actually exerting any physical energy.


Unfortunately, at least 25% of the time I use the app the countdown fails. But I never know it failed until too late.

What happens is that the countdown stops at 1 second. I hear the voice prompt count down 5-4-3-2-1, and I think it’s tracking, but it’s stuck on one second.

Today was extra painful not only because I wanted to see how I was doing on “race pace”, but after I discovered it failed, I restarted the app, set the countdown again…and it “hung” on one second again.


Now, it’s a bug – that’s for sure. Some testers I know would automatically assume that every user in the world was hitting this bug and that public shaming of the company would be the next course of action. I, however, realize that context plays a role in many (every?) part of software engineering, and that given the value I get from the product (and my very amateur level of running) that while this is a painfully annoying bug, it’s not the end of the world.


Given that I wasn’t concerned at all with pace for the remaining 5+ miles of my commute, my mind began to wander. What follows is a completely made up story of how a Runtastic engineer may discover this issue without me, or anyone else, reporting it.

Pointy Haired Runtastic Manager: Hey super-smart employee (sse). Our default time for delay before a run is 15 seconds. Can you look at the data and see if our estimate for a default delay length is in the ballpark of what people actually use? Someone on the train told me that they thought 30 seconds would be a lot better. I don’t think they’re right, but I want to make a decision based on data.

Super Smart Employee (sse): Sure boss. That data is pretty easy to pull. I’ll take a look!

What SSE is about to do at this point is gather Business Intelligence. They want to use data to make a business decision.

SSE looks at hundreds of thousands of activities from the past six months and sees that nearly 60% of the people just use the default 15 seconds. She quickly generates a scatter graph that shows the outliers and prepares it for her boss. Before sending, she realizes that she wants to exclude the instances where people cancel the activity completely before the countdown completes (phone calls, cold feet (literally, and metaphorically), and a variety of other reasons could cause this). She filters the data and starts to send the report…but – while she’s there, she notices something…interesting. First, the number of “cancelled” activities seems high to her (over 15%). She flips the filter to look only at cancelled activities and things get weirder. Of the 15% of activities that were cancelled, 90% are cancelled at exactly 1 second.

That’s too weird to be true.


SSE looks at every activity where the timer was “killed” at one second. Often, those users started another activity within 10-15 minutes.

Or, maybe they all had the same model of phone.

Or maybe they were all running Spotify at the same time.

Or something. Remember. This story is completely made up.

The point is that SSE quickly went from gathering BI to using discovery and insight to find a pretty cool bug. Using data!
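The kind of query SSE runs in this made-up story can be sketched in a few lines of Python. Everything here is hypothetical: the record layout, the `cancelled` and `seconds_left` field names, and the function itself are assumptions for illustration, not anything Runtastic actually collects.

```python
from collections import Counter


def cancellation_summary(activities):
    """activities: list of dicts with hypothetical keys
    'cancelled' (bool) and 'seconds_left' (int, countdown value at cancel time).

    Returns (cancel rate, most common seconds_left among cancels, its share of cancels).
    """
    cancelled = [a for a in activities if a["cancelled"]]
    if not activities or not cancelled:
        return 0.0, None, 0.0
    cancel_rate = len(cancelled) / len(activities)
    second, count = Counter(a["seconds_left"] for a in cancelled).most_common(1)[0]
    return cancel_rate, second, count / len(cancelled)
```

A cancel rate that looks high, combined with one countdown value accounting for nearly all of the cancels, is the sort of spike that turns a routine BI question into a bug investigation.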


I told a story at a conference recently about a team I worked on that used an offshore vendor team to run through a large number of applications for app compatibility testing. We asked them to take notes and to send a report, but not to bother filing bug reports.


Yeah – we had sufficient telemetry and monitoring that we knew about all the bugs and glitches (and had collected call stacks and other helpful information) already. Many, in fact, that the test team didn’t (or couldn’t) notice. Entering the bugs would have been a waste of time. In the rare cases where something weird happened that we didn’t track, we immediately added the appropriate instrumentation to track that class of failure in the future.

I expect that for most of you, my world isn’t your world. But in my world, data driven engineering is critical.


Since I don’t know how made up my made up story really is, I’m going to report it to Runtastic anyway. I can’t predict the future (or anything else), but I hope the reply to my complaint is, “Yeah – we already knew about that. From the data”.

Creative Work

It’s early January, but I think I’ve already read at least a half dozen web articles on how testers need to be creative and use their brains, etc. The articles are exactly on point in some sense, but most give me the feeling that the authors think that software testing is (one of) the only profession(s) that requires thinking and creativity.

Which is, of course, complete crap.

In A Whole New Mind, Daniel Pink tells story after story about how creativity is the competitive advantage for any business, and any knowledge worker. The jobs of today, and especially of the future, will all require creativity and thinking over “book smarts” and rote work. Software development (including testing) is just one example of a knowledge worker role that requires those skills. Everyone who wants a successful career should look for ways to learn, opportunities to be creative, and new ways to think about hard problems.

Peter Drucker came up with the term Knowledge Worker in the 1950s, and most definitions I’ve read describe software testing quite well (Wikipedia article here if you’d like to form your own opinion) – but if you don’t want to click, try this excerpt:

Knowledge work can be differentiated from other forms of work by its emphasis on “non-routine” problem solving that requires a combination of convergent, divergent, and creative thinking.

I think the problems in software testing (and in software development in general) are some of the most interesting and challenging problems anywhere – but I do not believe that the approach to the problem solving is particularly unique – especially as unique as some of my industry peers seem to imply.

I encourage anyone curious about knowledge work to read Drucker’s writings on the subject, and especially Pink’s book mentioned above.

Roles and Fluidity

I had a twitter conversation about roles this week. I’ll recap it – and expand on my views; but first I’ll tell a story.

Very early on in the Windows 98 project, I was performing some exploratory testing on the explorer shell and found an interesting (and slightly weird) bug. At the end of my session, I entered the bug (and several others) into our bug tracking system – but the one issue continued to intrigue me. So, I took some time to look at the bug again and reflect on what could cause this bug to happen. I dug into the source code in the area where the bug occurred, but there was nothing obvious. I couldn’t shake my curiosity, so I looked at the check-in history and read through the code again; this time focusing on code checked in within the last few weeks. My assumption was that this was a newly introduced bug, and that seemed like a reasonable way to narrow my focus.

Less than an hour later, I discovered that a particular windows API was called several times throughout the code base, but on one occasion, was called with the parameters reversed. At this point, I could have hooked up a debugger (or some could say that I should have already hooked up a debugger), but after visual examination of the code, the code of the API, and the documentation, I was positive I had found the cause of the error. I added the information to the bug report and started to pack my things to go home.

But I didn’t. I couldn’t.

I was bothered by how easy it was to make this particular error and wondered if others had made the same error too. I sat down and wrote a small script which would attempt to discover this error in source code. Another hour or so later, and I had a not-perfect, but pretty-good analyzer for this particular error. I ran it across the code base, and found 19 more errors. I spot checked each one manually, and after verifying they were all errors, added each of them to the bug tracking system.
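A "not-perfect, but pretty-good" analyzer of this kind can be very small. The sketch below is a Python illustration, not the original script, and since the story does not name the API it uses the classic C mistake of swapping the last two arguments of `memset` as a stand-in (an assumption for this example). It flags any `memset` call whose final argument is a literal `0`, which usually means the fill value and the length were reversed.

```python
import re

# Matches memset(..., 0); on a single line. A literal 0 as the final
# (length) argument usually means the value and size were swapped.
SUSPECT_CALL = re.compile(r"\bmemset\s*\(([^;]*?),\s*0\s*\)\s*;")


def find_suspect_calls(source):
    """Return (line number, stripped line) for each suspicious call in the source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if SUSPECT_CALL.search(line):
            hits.append((number, line.strip()))
    return hits
```

A line-by-line regular expression misses calls split across lines and can flag intentional zero-length calls, so each hit still needs the manual spot check described above.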

Finally, I was about to go home. But as I was leaving, one of the developers on the team stopped by to ask how I found the bugs I just entered. I told him the story above, and he suggested I add the tool to the check-in suite (along with several other static analysis tools) so that developers could catch this error before checking in. I sat back down, we reviewed the code, made a few tweaks, and I added the tool to the check-in system.

Over the course of several hours, my role changed from testing and investigation of the product, to analysis and debugging, to tool developer, and finally to early detection / prevention. The changes were fluid.

On twitter, a conversation started on detection vs. prevention. Some testers have a stance that those two activities are distinct, and that doing both makes you average (at best) at both. The conversation (although plagued by circular discussion, metaphors and 140 character limits) centered around the point that you can’t do multiple roles simultaneously. While I agree completely that you cannot do multiple roles simultaneously, I believe (and have proven over 20+ years) that it is certainly possible to move fluidly through different roles. Furthermore, I can say anecdotally that people who can move fluidly through different roles tend to have the most impact on their teams.

To this day, I figure out what needs to be done, and I take on the role necessary to solve my team’s most important problems. Even though I have self-identified as a tester for most of my career, I don’t see a hard line between testing and developing (or detecting and preventing). In fact, that may be one of the roots of conversations like this. For years, I’ve considered the line between development and testing to be a very thin grey line. This reflects in my story above, and in many of my writings.

Today, however, I don’t see a line at all. Or – if it’s there, it’s nearly invisible. It’s been a freeing experience for me to consider software creation as an activity where I can make a significant contribution while contributing in whatever areas make sense – at any given moment.

Sure – there are places where develop and then test still exist, and this sort of role fluidity is difficult there (but not impossible). But for those of us shipping frequently and making high quality software for thousands (or millions) of customers, I think locking into roles is a bottleneck.

The key to building a great product is building a great team first. To me, great teams aren’t bound by roles, but they’re driven by moving forward. Roles can help define how people contribute to the team, but people can – and should – flow between roles as needed.
