Resiliency and Fault tolerance in Testing

personal experience, resilience, and testing
Seinfeld, 1989-1998

How do we look at failure when the system (or subsystem) we are observing is: “how do we build a testing process”?

A lot of software testers may come across different recipes for a test process, for example:

The schools that generate these and other counter-approaches, all miss the mark.

Some schools will focus on positive and reassuring ideas on quality and teamwork. Most of the ideas end up being in total dissonance with the world that we live in:

a religion of thousands of denominations, built on the premise of move slow, break stuff, pay later.

We all know its name.

How many of us abide by the idea that the teams we work with are agile? When do we stop to realize said teams are sick with the tyranny of structurelessness? Or that they succumb to bureaucracy mixed with recurring finger-pointing and fear-mongering?

Some schools will advocate for soul-less forms of testing. They cater to a world that doesn’t care for craft. Organizations want reassurance they are following the international standards.

They can’t fail if they follow the standards to the letter, right?

Many know the reality that these same standards are shallow and spineless. Like a self-fulfilling prophecy, recipes advocated by these entities end up as embarrassing failures.

Some schools struggle to have a marketable voice. The world can’t bear to slow down and smell the flowers of “refined” testing. The world is “cruel and cold” and instead demands painless solutions for two sets of problems:

Sure enough, said schools will excel at showing (selling) you an approach for the first problem. As tragic fate would have it, they fail to face the second one. At least through the lens of prospective organizations.

Some schools are full of good intentions. Their focus will be on improving business and delivery “using data”! Optimizing for “meta” team quality problems! Completely missing what testing is and what core problems testing intends to solve. Those problems don’t matter. Modern positive attitude matters. “Let’s get together and (you folks) solve things… While I guide you with The Force”.

There’s something in the market for everyone, and a voice for every ear. And yet, in spite of all the solutions all schools of thought might offer, there is failure. The common tester will notice over time that most approaches will fail one way or the other:

Then come the symptoms and idiosyncrasies that test engineers face over time:

The underlying premise is the same as with all “approaches that miss the mark”: No one is looking at failure in the system. Or observing the system itself. No one is sitting behind the wheel. The tragedy of our century:

Testers failing to test themselves.

What can we do about it?

There’s a handful of things that I’ve tried. Some of them turned out successful, some failed in specific contexts, at times. So full disclaimer:

What follows is based on a collection of personal lessons, likely has a lot of holes, and can be hypocritical…

Hopefully these might help others, since some of these helped me make systems more resilient/fault tolerant. At least in the domain of software testing. In no particular order:

Let’s look at each of these a bit in detail:

Avoid logical fallacies

Plenty of folks will appeal to authority, “this is how we do testing, because the Pope of quality, Pontifex Maximus Testingus says so”.

Some folks will repeat that their approaches towards testing or their preferred frameworks are ideal because their approaches towards testing and preferred frameworks are ideal.

Some folks will plead their case, saying that maybe the current testing approach doesn’t work because folks didn’t believe in it right.

People will attack successful testers’ suggestions because no good can come from testers that smell bad and come from the north.

Besides, we should follow what some dude that worked on a subset of teams, say, at Big Tech Corp said. He has likely never meaningfully tested anything in his life. But he wrote a book about the subject. It must have worked, because Big Tech Corp is successful. Plus, those big tech companies are all known for having zero embarrassing bugs, having users’ best interest in mind and not being evil…

Are you getting the idea here?

Watch out for sunk cost

If you’re doing something, and it hasn’t been working for a while, but it’s still too early to say, push a bit further.

If you are doing the same thing that has been done for months or years, or something different that is just another way of avoiding problems: stop sinking your time and resources into it. The horse is dead.

Systematically deconstruct biases

Picture someone that one day turns up to you and says:

“It was revealed to me in a dream that from now on we need to automate all the testing”.

Most craftsman testers’ reaction to this (cocaine-induced) troll bait is to start washing dirty laundry in public.

Don’t waste your breath. Don’t waste energy by taking that as an opportunity to tell them that “testing can’t be automated, testing is testing, you’re stupid and ignorant and stupid”.

Stop, and instead of taking the moral superiority road, which is getting us nowhere, take a step back and deconstruct:

We never take a step back to understand why and how these are set up in the person’s mind in the first place. You waste less energy deconstructing and studying biases than replying directly to them in a counterattack flamewar.

Deconstruct a wrongful concept’s origin first before trying to correct someone. This will force you to take a step back and look at the entire system. A system that induced the person to think in that way for a reason.

Add guardrails to the system

Stick with pure-ish guiding principles that will get you some hygiene-guardrails in the long run:

Keep the principles simple and easy to understand. Act on the principles in abundance.

Let Chaos follow its path

If there’s anything that Jurassic Park has taught us, it is that life will always find a way. Sometimes we must let life do its part, follow its course.

Think of all the systems (tech organizations, development, integration and deployment processes, etc.) that are broken by design.

They’re broken and yet, they’re supported by many willing/unwilling participants. If they are maintained in a broken fashion for a long while, despite our attempts to get them better, it might mean someone in power is benefiting from the broken system.

And someone is being taken advantage of.

So, sometimes it’s better to know when to leave. Know when to let extremely broken systems self-destruct.


If you read this far, thank you. Feel free to reach out to me with comments, ideas, grammar errors, and suggestions via any of my social media. Until next time, stay safe, take care!

Special shout-out to my friend Jorge for proof-reading and giving me new ideas while I drafted this post.