Block-Testing

How you do QA is probably terrible.

Of course I don’t know you, but most software development shops that I’ve worked at or chatted with “do QA” in a way that is pretty terrible.

Almost everyone does it, and almost everyone considers it a “best practice”, and yet somehow it never even got a name. I’m going to call it “block-testing” because it’s the kind of testing that unfortunately makes continuous deployment and continuous delivery fundamentally impossible.

So what is it already?!?!

It’s the practice of having humans try to regression-test an entire product, or some high-value subset of features across the entire product, usually with the goal of finding reasons to stop it from being released. The most extreme (worst) case is when a specific role exists solely for this purpose (almost always given the misnomer “QA”).

Just to be super-duper clear: I’m aware that almost everyone is doing block-testing. That doesn’t mean it’s a good fit for every organization though. If you actually want to achieve continuous deployment or continuous delivery, this is one of the big-batch operations that you’ll need to forgo.

What’s so bad about it?

  • A single manual run-through of a broad test script is slow (compared to existing automation)
  • It’s expensive (compared to good automation, over the long-term)
  • It frequently misses all but the most egregious regressions (more often than alternatives)
  • It’s antagonistic, pitting different roles at the company against each other.
  • It confuses where the responsibility for quality lies
  • There are many other possible solutions to the same problem, many of which are faster and more comprehensive.

Alright, a few of those are probably obvious to most people, so let’s dig into the less obvious ones:

It’s antagonistic

Hiring people whose sole purpose is to find reasons to stop a release is counter to the goals of a company that is trying to release software quickly. Of course block-testers are actually tasked with the important goal of making sure each release is high-quality (and that’s a critical goal!), but the end result is that as block-testers, they naturally tend to optimize their role to “trying to find release-blockers”. They’ll log other defects too of course, but those don’t have the same importance to them (or anyone else) as issues that block releases.

Also when you’ve got one group of people responsible for finding fault in another group’s work, you’re going to have conflict between the groups. Engineers are going to start saying that the bug isn’t theirs or the bug isn’t “that bad”. Testers are going to say the opposite. Each group is going to blame the other when management wonders why releases are delayed or infrequent. You’ll get the same finger-pointing when a defect gets through to production as well. The ensuing arguments are a massive waste of time and each “side” becomes less empathetic to the other’s plight over time.

It confuses where the responsibility for quality lies

I’ve heard the words “You’re the tester! Test it!” a few times in my career. The sentiment is that development and testing are distinct phases rather than something that happens iteratively during development, and “the developers do the developing and the testers do the testing”. “This frees developers up to focus on just development”.

Nothing could be further from the truth. I have serious doubts about the experience of any developer who thinks they shouldn’t be testing their own work along the way, as if they could hammer out 15 or more LoC without bothering to see if it works. The fact of the matter is that quality is entirely the responsibility of the developer, and any successful developer is testing constantly as they go. Conversely, the tester’s only role is to assess the quality. The tester has no responsibility whatsoever to improve it.

At the same time, the very presence of the tester is a very confusing signal. It signals that the developers aren’t wholly responsible for quality (and maybe even that they’re not trusted with it!). It diminishes their skin in the game, and disincentivizes quality. Of course a tester-only role has no ability to actually fix defects, and you can’t sanely hold someone responsible for something they have no control over.

I would wager that this division of responsibility is the single greatest cost of block-testing. The testers and the developers start to engage in a kind of game of ping-pong where the developer tries to see what they can get past the tester, and the tester tries to find ways to block the release for any reason at all. I’ve seen a feature go back and forth from developer to tester more than a half-dozen times before the tester agrees that it’s okay to release. That back-and-forth often plays out over days through an issue tracker for trivially-fixed-but-important issues in high value features. If you’re looking closely for this kind of activity you can see it all over the place, and it can look about as effective as a football team that uses telegrams to communicate mid-play. These hand-offs are all classic waste from a Lean perspective.

There are many other possible solutions to ensuring quality, many of which are faster and more comprehensive.

Once you realize the harm, the cost, and the inadequacy of block-testing, you naturally have to ask, “but is there anything better?”. You can’t really fault an organization for block-testing if there are no alternatives.

Let’s chat about some alternatives:

Move Fast and Avoid False Dichotomies

It’s really unfortunate that the industry has resigned itself to the belief that you either go slow or you break things, and that quality and speed are an inevitable and universal trade-off. This is a really limiting belief.

In order to get creative, let’s forget the “best practices” for a minute and get back to first principles. Here’s what we’re trying to do:

  • Goal A: We want to minimize users’ exposure to defects and outages.
  • Goal B: We want to minimize the time between a developer completing a change on their workstation and it getting to users, to keep delivery radically fast.
  • Goal C: We want to catch more issues than a human can.

Preventative Measures

Preventative measures are anything that you might do to prevent a defect from affecting a user. They’re a pretty critical part of achieving goal A.

However, goal B means we really have to be careful what steps we put between the developer and the production environment. Ideally they’re critically important and they’re fast. Each of these steps has to pay for itself. Because of this, you have to be really careful about what preventative measures you choose. Some really fast preventative measures: static typing, automated tests, broken-link-checking spiders, linters.
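To give a feel for how cheap a fast preventative check can be, here is a minimal sketch of a broken-link check. It is not a real crawling spider: it assumes the site's pages are already available as a dictionary of path to HTML (a hypothetical setup, for example the output of a static build), and the names LinkCollector and find_broken_links are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(pages):
    """Return sorted (page, link) pairs whose internal link has no matching page.

    `pages` maps a site-relative path to its HTML. Only links that start with
    a single "/" are checked; external links are ignored in this sketch.
    """
    broken = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            is_internal = link.startswith("/") and not link.startswith("//")
            if is_internal and link not in pages:
                broken.append((path, link))
    return sorted(broken)
```

A check like this runs in milliseconds on every build, which is the kind of step that can plausibly pay for itself under goal B.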

Mitigating Measures

You also have to start thinking about how to achieve Goal A with more than just preventative measures. Preventative measures aren’t everything. Even the most ardent block-testers have non-preventative measures: feedback from the end-user (even if it goes through 5 levels of support first). I’ll call these post-deployment measures “mitigating measures”. There are 3 main categories for the types of mitigation:

  • Reducing the severity of a defect (# of users, impact per user, etc)
  • Finding defects faster
  • Fixing defects faster

Hopefully those categories get your imagination started. User feedback is a mitigation measure; it’s just the slowest and most expensive one. Here are a few that do better:

  • Fast automated deployment. If you can’t get a fix out quickly, more users will be affected for longer than necessary.
  • Telemetry: logging, metrics, and alerts. Instrument your codebase in as many ways as possible to make it self-report issues it might be having. Alerts will tell you when things are going wrong faster than user-feedback and often you’ll learn about things that humans (manual testers or even end-users) did not, or even could not find.
  • One-click rollback. Make it so that your production environment can be rolled back to the previous known-good deployment as quickly and easily as possible.
  • Staged roll-out. Coupled with great telemetry, a gradual deployment process allows you to halt deployment and roll back (even automatically) if there is a spike in crashes or logged errors before something is fully deployed.
  • Feature flags / Feature toggles / kill switches. Control (post-deployment) which users get which features and when. If there are any problems, only the allowed users are impacted, and the broken functionality can be quickly switched off.
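Several of the measures above combine naturally. The following is a rough sketch, under stated assumptions, of a percentage-based feature flag with a kill switch, driven through a staged roll-out that halts when telemetry shows an error spike. The FeatureFlag class, the staged_rollout function, the 1% error threshold, and the error_rate_at callback (a stand-in for a query against real telemetry) are all hypothetical, not any particular product's API.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureFlag:
    """A percentage-based feature flag with a kill switch (hypothetical design)."""
    name: str
    rollout_percent: int = 0   # 0..100: share of users who get the feature
    killed: bool = False       # kill switch: overrides everything else

    def is_enabled_for(self, user_id: str) -> bool:
        if self.killed:
            return False
        # Hash the flag name together with the user id so each user lands in a
        # stable bucket from 0 to 99 and gets the same answer on every request.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent


def staged_rollout(flag, stages, error_rate_at, max_error_rate=0.01):
    """Widen the roll-out stage by stage; kill the flag on an error spike.

    error_rate_at(percent) is assumed to return the observed error rate once
    the flag has been exposed to that percentage of users.
    Returns True if the last stage was reached, False if the roll-out halted.
    """
    for percent in stages:
        flag.rollout_percent = percent
        if error_rate_at(percent) > max_error_rate:
            flag.killed = True   # automatic switch-off, no human in the loop
            return False
    return True
```

In a real system each stage would wait for enough traffic before reading the error rate. The point of the sketch is only that the halt-and-revert decision can be made by the system itself, faster than a human could notice the problem.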

Of course, you will probably want to do these things even if you never eliminate block-testing from your release process. The prevailing practice, though, seems to be to forgo these things and expect that the human block-tester will catch everything. That’s a kind of bury-your-head-in-the-sand quality management.

If you try these things out though, you’re almost certainly going to see that these practices are cheaper, faster, and more effective than block-testers.

“But Automation is Expensive!”

Automation IS expensive! I have no argument against that. It is. I’ve seen numerous well-covered codebases have more test code than application code, so the upfront costs of automated tests are probably close to the same as writing the application code itself.

Over the long haul it pays for itself though. A well-covered codebase easily has thousands of tests each getting run thousands of times a year, so we’re talking about millions of behaviour verifications on the conservative end of things. Humans simply cannot compete with that. Humans are slow, and even the best of them are terrible at repetitive detail-oriented tasks. As expensive as automation is, block-testers are even more expensive.
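The arithmetic behind that claim can be spelled out in a few lines. Every number here is an illustrative assumption rather than a measurement: 2,000 stands in for “thousands”, and the manual figures describe a made-up team that runs a 200-step test script once per fortnightly release.

```python
# All numbers below are illustrative assumptions, not measurements.
tests_in_suite = 2_000        # "thousands of tests"
suite_runs_per_year = 2_000   # "thousands of times a year" (every push, every build)

automated_checks_per_year = tests_in_suite * suite_runs_per_year

# A hypothetical manual pass: 200 scripted checks, run once per release,
# with one release every two weeks.
manual_checks_per_pass = 200
manual_passes_per_year = 26

manual_checks_per_year = manual_checks_per_pass * manual_passes_per_year

ratio = automated_checks_per_year // manual_checks_per_year

print(f"automated: {automated_checks_per_year:,} checks/year")
print(f"manual:    {manual_checks_per_year:,} checks/year")
print(f"ratio:     roughly {ratio}x")
```

Under these assumptions the automated suite performs about 4 million verifications a year against about 5 thousand manual ones. Plug in your own numbers; the gap tends to stay at several orders of magnitude.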

“But developers are terrible at testing!”

They’re really not. Many haven’t had much practice because they’re working in block-testing environments where the responsibility is muddled, but they’re actually quite good at it with a little practice when they have the full responsibility and trust for it. Don’t be surprised when they don’t choose manual-testing for the solution to every quality problem though.

“Do we fire our QA?”

Block-testing is so common in the industry that many people have a really hard time understanding the place of specific testing personnel in the SDLC without it. There’s absolutely a place for specific testing personnel, but they’ve got to start contributing to improving quality beyond block-testing, and that change is understandably difficult.

Compared to computers, humans really are slow and terrible at repetitive detail-oriented tasks. However, humans have well-known strengths that computers do not: they’re creative and curious.

So there are still places for manual testing:

  • Post-deployment exploratory testing. This is a great match for curious humans, and doesn’t slow delivery.
  • Feature-by-feature for new features only, with the developers present and actively involved, before code-complete of that feature. It’s probably best if the tester doesn’t actually do the tests themselves at all, but instead talks the developer through what tests to perform. This hands-on approach improves the developer’s testing ability more permanently and doesn’t confuse the fact that the developers own quality.

With that said, I’ve been working with multiple teams for the better part of a decade that have no manual tester role at all. When they do test manually, the developers are testing things themselves, often as a group (mob testing!). You’ll probably want to at least consider eliminating the role of manual testing entirely.

There’s still a place for quality-minded people beyond manual testing though. In fact, QA and block-testing are really opposites. Actual quality assurance is about making sure that quality is baked into the process from beginning to end. Personnel that just do block-testing are not doing that at all. At best they’re quality control (QC), and that’s a far less valuable role. QA would be involved in a bunch of completely different concerns: How can we prevent defects from ever existing? How can we find them faster? How can we mitigate their impact? How can we recover from them faster?

Here’s a laundry list of possible Quality Assurance tasks:

  • run learning post-mortems
  • measure quality in many different ways (defect rate, MTTR, MTBF, etc.)
  • regularly visit production logs and metrics to look for live quality issues
  • coach devs on how to more aggressively test their work
  • get into test automation
  • liaise with users and customer support about quality issues
  • help establish quality criteria for a task before it gets started, and throughout its development
  • look for patterns in defects
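As one example of measuring quality, here is a small sketch of how MTTR and MTBF might be computed from incident records. It assumes each incident is a (detected, resolved) pair of datetimes, takes MTTR as average downtime per incident, and takes MTBF as healthy time divided by the number of failures. Definitions vary between organizations, so treat these as one reasonable choice rather than the standard; the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean time to recovery: average of (resolved - detected) per incident.

    Assumes at least one incident; an empty list raises ZeroDivisionError.
    """
    downtime = sum((end - start for start, end in incidents), timedelta())
    return downtime / len(incidents)


def mtbf(incidents, period_start, period_end):
    """Mean time between failures: healthy time divided by failure count."""
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = (period_end - period_start) - downtime
    return uptime / len(incidents)


# Hypothetical data: two incidents in a 30-day window.
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 11, 0)),     # 2 hours
    (datetime(2024, 1, 20, 14, 0), datetime(2024, 1, 20, 18, 0)),  # 4 hours
]
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31)

print("MTTR:", mttr(incidents))
print("MTBF:", mtbf(incidents, window_start, window_end))
```

Tracking these over time says more than any single value: a falling MTTR suggests the fast-deploy and rollback work is paying off, while a rising MTBF suggests the preventative measures are.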

“Does it really never make sense?”

The economics of block-testing make more sense if you anticipate very few releases with progressively smaller differences. Agencies write this kind of software, but I haven’t personally done this kind of work in over a decade, so I could be convinced of the economics either way by someone with more recent and extensive experience. Here I’m specifically talking about teams working on a software product that exists over a long period of time.

“But my situation is different because…”

Okay! I believe you! There are rarely one-size-fits-all practices in software development. I’m simply submitting this counter-argument for consideration. I’ve certainly worked at and heard from many organizations that should strongly consider stopping block-testing because the value proposition for them is just not there.
