Guiding principles for choosing a Test framework
“No one understands my passion for this vain test library, but I shall make it default for everyone”
– A vernacular Software Test Engineer
Test tool (or framework) comparisons, and the actual choice of a tool, can be a “painful” and overblown exercise in modern tech organizations:
- It’s prone to a lot of lengthy and tiresome risk discussions between “hyperfocused” advocates of different tools who haven’t actually used the competing tools;
- It’s very easy in some cases for folks to fall for sales pitches from malicious test tool vendors;
- It’s easy to fall for a “honeymoon“-like anti-pattern phase, where the initial impression/adoption fails to uncover a lot of deep rooted issues that come with prolonged tool usage.
- Lastly, tools are intimately dependent on their context, e.g. familiarity, team-level, org-level, and industry-level suggested (or imposed) rules and standards, extensions & plugins, etc., meaning: the “one test tool/framework/library to rule them all” (likely) does not exist.
This is oftentimes true regardless of the area of focus of the test tools, be it UI-facing, API-facing, Load/Performance-dedicated, …
I’d like to share and suggest a handful of personal guiding principles that have helped me mitigate the non-useful noise and disruption that comes with a tool-choice exercise:
- Abstract tool-specific complexity through accessible interfaces (Anyone can run)
- Containerize from the start (Anyone can run from anywhere)
- Abstract the “attack type” (Anyone can run anything)
- Expose meaningful test results (Anyone can understand what failed)
- Make debugging easy (Anyone can tinker with failure)
The underlying principle is using accessibility and maintainability as a compass (and a shortcut) to help create some sort of safety distance / critical perspective while comparing test frameworks.
Let’s look at each of the above points in detail.
Anyone can run
One of the most overlooked aspects of any test tool, and specifically of tooling that is adapted and then developed on top of in-house tools, is that it’s not always straightforward for folks to actually make use of the tools.
If you have been working in the tech industry for a while, you have probably seen some of the worst symptoms of what I’m describing:
- The automated checks are only run by the same folks developing the checks, from the UI of an IDE, e.g. a JetBrains IDE like IntelliJ, or one of the typical apps nowadays built with ElectronJS, like Insomnia, Paw, Postman, etc.;
- The automated checks only run locally through a CLI or a set of shell commands, and are a mess to set up on any machine other than the ones belonging to the engineers coding the checks;
- The automated checks are already at a stage where they run on a CI pipeline in the typical Jenkins/GitLab CI/GitHub Actions/… BUT only the engineers that created those checks know about the CI jobs, and they are the only ones checking the results of those jobs.
The basis of the problem is always the same: only the test engineers developing the checks know how to run them and/or how to interpret the results, and there is a massive disconnect in value between the automation and everyone else in the team.
In my personal experience, to fight this problem we can resort to one principle:
Whatever complexity we might code into our automated checks, we need to provide multiple accessible interfaces for folks to run the automated checks, abstracting any tool-specific complexity (and setup) at all its levels.
What this means always depends on the context where we’re working, but here’s a few good signals that we should be tuning for:
- If non-technical folks are drawn to run the tool and can run it unsupervised and without fear of causing damage;
- If technical folks can run the tool in a way that suits their preferred development environment tempos;
- If the interface we designed is easy to remember;
In the field, the above can look like this: suppose we’re working on a test tool called “testthis” where folks can mimic user flows pointed at the API level.
Technical folks can run the tool via CLI, which could look like:
testthis run --flow release_goat_for_trex --environment staging --project dinopark
And non-technical folks can run the same tool from the places where they already work, like Slack or other chat-based software, e.g. using a slash command:
/testthis run --flow release_goat_for_trex --environment staging --project dinopark
The trick is abstracting and reducing configuration, reducing the mental load of someone using the tool. There’s more I could write about this principle alone, because it’s tricky to design an interface that is simple but not over-simplifying. I’ll leave it to another post for the time being.
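To make the idea concrete, here’s a minimal sketch in Python of how both interfaces can delegate to the same core function. All names are hypothetical (the “testthis” tool is imaginary); the point is that the CLI and the Slack slash command are thin shells around one shared entry point:

```python
import argparse


def run_flow(flow: str, environment: str, project: str) -> dict:
    """Core entry point shared by every interface (CLI, chat bot, CI job).

    All tool-specific setup and complexity lives behind this function,
    so callers only ever pass three human-readable arguments.
    """
    # ...here the real tool would resolve config, auth, test data, etc.
    return {"flow": flow, "environment": environment,
            "project": project, "status": "started"}


def cli(argv: list) -> dict:
    """Technical folks: `testthis run --flow ... --environment ... --project ...`."""
    parser = argparse.ArgumentParser(prog="testthis")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--flow", required=True)
    run.add_argument("--environment", required=True)
    run.add_argument("--project", required=True)
    args = parser.parse_args(argv)
    return run_flow(args.flow, args.environment, args.project)


def slack_slash_command(text: str) -> dict:
    """Non-technical folks: `/testthis run --flow ...`.

    Slack delivers everything after `/testthis` as a single string,
    so we can reuse the exact same argument parser.
    """
    return cli(text.split())
```

Because both interfaces converge on `run_flow`, adding a third entry point (a CI job, a spreadsheet button) is a few lines, and the tool-specific complexity stays abstracted in one place.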
Anyone can run from anywhere
One of the coolest things that I’ve come to appreciate over this past year was having direct access to a friend who also happens to be a Docker Captain (yes, that’s right Tom, I’m name dropping you on one of my posts).
The main teaching that we adopt when we’re interacting every day with someone who is a Docker (and containers) ace pilot is:
If we can break apart and contain a solution to a problem, e.g. a piece of software, the solution might not be perfect, but we’re solving in one go a lot of other tiny issues for ourselves and others.
We go from the typical “Works on my machine” to being able to share and distribute the thing, contain it and pin its dependencies to a working state, and reuse it without worrying about internals or obscure steps of setup or host machine specifics. It now “works on my container”, which is a slightly better predicament than having something just work on the programmer’s host machine.
Sadly, what we’ll find inside most organizations in the industry is that, aside from using existing containerized tools, very few test engineers tap into that power: they don’t think about containerizing their own automated checks and their own in-house test tooling, and they fail to extend tools that already allow for containerization. I believe this also aggravates the pickles that a lot of Test Engineers endure:
- “No one uses my test tools”
- “No one cares for the automated checks I’ve coded”
- “My test tools work fine, folks just need to follow these 10 steps on their machines to set it up, and between executions do these 15 steps to reset the test data”
- “I’ve worked on this piece of automation for months, and no developer or tester or non-technical person uses it”
Nobody cares because the test engineer is not distributing the thing properly.
Distribution is not just sharing a link to a repository or a CI job. In order to fix these “dormant test tooling” dilemmas, we need to keep in mind:
- Any piece of test tooling we are building needs to provide value from the start, not in “6 months”;
- If there’s no easy-to-follow README showing me how I can run the tool, I absolutely don’t care for the tool;
- If I can’t just spin up a container of the tool, not even technical folks will care for the tool.
And probably the key side-pieces that come as a byproduct of the above principles:
- We can provide value from the start if we can get the tool into the hands of the folks who will use it,
- There’s no better way of getting it into folks’ hands than containerizing it,
- Containerizing takes you to the next level, because containers can “run anywhere” and be triggered to start “from anywhere”,
- “from anywhere” means we easily provide our tool through a chat bot, a spreadsheet, or any other tool that can underneath do any sort of API requests to run a container “somewhere”.
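That last point is cheap to build: once the tool lives in an image, any caller just needs to compose a `docker run` invocation. A small sketch below (the image name and registry are hypothetical) shows the kind of thin layer a chat bot or spreadsheet script would use:

```python
import shlex


def docker_run_command(flow: str, environment: str, project: str,
                       image: str = "registry.example.com/qa/testthis:latest") -> str:
    """Compose the `docker run` invocation that any caller (chat bot,
    spreadsheet script, CI job) can hand to a container runtime.

    Dependencies are pinned inside the image, so the host only needs
    a container runtime itself -- no local setup steps.
    """
    args = [
        "docker", "run", "--rm",  # --rm: clean up the container after the run
        image,
        "run",
        "--flow", flow,
        "--environment", environment,
        "--project", project,
    ]
    # shlex.join quotes each argument safely for a shell
    return shlex.join(args)
```

The same command string works whether it’s executed by a CI runner, an ops box, or a bot reacting to a `/testthis` slash command, which is exactly what “run from anywhere” buys us.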
When we do the exercise of getting our test tool, or our handful of automated checks, to a state where they can run “anywhere”, more often than not it also forces us to think about the next problem: being able to “run anything”, pointed anywhere.
Anyone can run anything
It’s usually the case that our test tools all try to do the same thing: follow a scripted path of a user’s interaction through a certain narrow perspective of a product.
If we wanted the test tool user to run this same interaction at a larger scale, like in a load test, it would be good practice for them to be able to do just that: indicate that they want to run the same thing they run when mimicking one user - but for hundreds, thousands, etc. of users.
Here’s the part where most Test Engineers will spot a gotcha: folks dedicate too much time either:
- on tools that accomplish flows for a single user, adding a lot of detailed assertions throughout those flows,
- or on attacking the scale problem, focusing on load tests that are not deep in assertions.
But they almost never dedicate balanced time to both. This cannot be the case.
My proposed principle to try and do things right in this case is that we need to push for ways that we can dedicate enough meaningful time for both implementations. And this is only possible if from the start we try to provide those through the “same” interface:
testthis run --flow some_flow (... other arguments)
testthis run-load --flow some_flow (... other arguments) (load-specific arguments, like number of virtual users, iterations)
testthis run-distributed-load --flow some_flow (... other arguments) (load-specific arguments) (distributed arguments)
The end user should be able to run anything easily; they just need to focus on choosing the right “attack” type, while the “parent” test tool abstracts the underlying complexity and acts as an alias for any underlying tools.
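The “alias” idea can be sketched as a simple dispatch table. Everything here is illustrative - the backend commands (I’m using k6, a real load-testing tool, purely as an example) and the mapping itself would depend on your stack - but it shows how one interface can fan out to different attack types:

```python
# Hypothetical mapping from "attack type" to the underlying tool invocation.
# The parent tool (testthis) acts as an alias: one interface, many backends.
UNDERLYING_TOOLS = {
    "run": ["testthis-core"],                    # single-user functional flow
    "run-load": ["k6", "run"],                   # load test over the same flow
    "run-distributed-load": ["k6", "run", "--out", "cloud"],  # distributed variant
}


def build_invocation(attack_type, flow, extra=None):
    """Translate the user's choice of attack type into the real command.

    The flow name and shared arguments stay identical across single-user,
    load, and distributed-load runs; only the attack type changes.
    """
    if attack_type not in UNDERLYING_TOOLS:
        raise ValueError(f"unknown attack type: {attack_type}")
    return UNDERLYING_TOOLS[attack_type] + ["--flow", flow] + (extra or [])
```

With this shape, `testthis run-load --flow release_goat_for_trex --vus 500` is just `testthis run ...` with a different dispatch key, so dedicating time to both functional and load flavours stops being an either/or.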
Anyone can understand what failed
This goes back to something I had mentioned in a previous post.
As test engineers, we tend to make it so that when a given automated check suite fails, we get notified with some bland message and a link to a CI job. The problems with this approach:
- The notifications end up being cryptic for the desired audience for those notifications (in theory the whole team),
- It’s dumb to have to go through logs if what you want is just an initial understanding of the issue,
- The notifications don’t provide any meaningful information, their intended premise is “folks should care about these”, which curses the notifications to fall into oblivion, since they quickly get ignored.
This is not the way. Notifications for test tool failures should be as “delicious”, enticing and meaningful as a typical predatory notification for a new social media post… without the shallowness and ad-revenue hungry demonic spirits that come with default social media notifications and clickbait.
What would this look like in theory? Well, it means we prioritize:
- Meaningful error messages and easy to access logs over bland failure messages;
- Direct-feedback loop over scripted test case loops through integrations.
And what does this look like in practice? Taking the example from my anti-patterns post, it’s all about trying to reach a message that tells a story, like this:
SomeAutoChatbot says: The endpoint ABC in development environment 029 is failing with 502 Bad Gateway for the buy-an-action-man test scenario. Error trace-id is 053de188-7438-42b1. Link to the logs: some kibana/cloudwatch link. Possible solution: restart the orders service here or contact @oncall-support-dev-env-team.
versus saying something bland, like this:
something is not working, please check my failed jenkins job and the ticket 1234 of the test case on JIRA
The point is: whatever you do, you optimize for the message itself, by being sure that anyone in your surrounding context, including yourself, can have a quick grasp and clear signals of why something failed, and you leave breadcrumbs for folks to investigate deeper if they are up for it.
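Mechanically, this is just a template over structured failure data. A small sketch (field names are illustrative, not a real testthis API) of a formatter that produces story-like messages instead of bare job links:

```python
def format_failure_message(failure: dict) -> str:
    """Turn a structured failure record into a notification that tells a story.

    The message itself carries the what / where / why / what-next, and
    leaves breadcrumbs (trace id, logs link) for deeper investigation.
    """
    return (
        f"The endpoint {failure['endpoint']} in {failure['environment']} "
        f"is failing with {failure['error']} for the {failure['scenario']} "
        f"test scenario. Error trace-id is {failure['trace_id']}. "
        f"Logs: {failure['logs_url']}. "
        f"Possible solution: {failure['suggestion']}."
    )
```

The design choice worth noting: the check itself must emit structured data (endpoint, environment, trace id, suggestion) at the moment of failure - you cannot retrofit a meaningful message onto an exit code and a stack trace.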
Anyone can tinker with failure
How many times has a developer reached out to a test engineer and asked “how could I do this specific automated check, or debug a failing check?”, only to be met with several flavours of the same response:
You could, but you can’t
We tend to make our lives in a project harder by not looking at probably the most useful problem to look at after the problem of containerization:
How do we make it easy for anyone to debug a failure state of our test tool?
This principle depends heavily on the programming language, libraries, and tools with which each of us builds our automated checks and test tooling, but its importance is what can make or break a test tool - and even a test engineer.
Some folks are quick to write this off and will say: “Ah, if folks do this and that on the specific dev environment that I use, they can somewhat debug the test tool… problem solved”
Those folks fail to realize they are a part of the problem. This shouldn’t be the case. There are a few steps I can suggest in this case:
- You should provide multiple ways for folks to debug the test tool, both “breakpoint-resume” style and the typical “console log” approach;
- The tool you are building should detail somewhere, in meaningful logs and other document formats, the steps it took while trying to follow along a certain flow;
- Any debug approaches should be documented somewhere where it’s easy to find the info;
- You should try each of the debug approaches you suggest for yourself;
- If any of the suggested debug approaches is hard to explain or convoluted and complex, scrap it for a simpler one.
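The second point above - a tool that records the steps it took - can be as simple as a step-by-step execution trail. A minimal sketch (the flow/step structure is hypothetical, not a real testthis API) that logs every step in machine-readable form and stops at the first failure, leaving the trail behind for whoever wants to tinker:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("testthis")


def execute_flow(flow: str, steps: list) -> list:
    """Run a flow step by step, recording each step to the logger and to a
    machine-readable trail, so anyone can inspect or replay a failure.

    `steps` is a list of (name, callable) pairs.
    """
    trail = []
    for name, action in steps:
        record = {"flow": flow, "step": name}
        try:
            record["result"] = action()
            record["status"] = "ok"
        except Exception as exc:
            record["status"] = "failed"
            record["error"] = repr(exc)
            log.error(json.dumps(record))
            trail.append(record)
            break  # stop at the first failure; the trail shows how far we got
        log.info(json.dumps(record))
        trail.append(record)
    return trail
```

Because every record is plain JSON, the same trail feeds the console-log debugging path, the failure notification, and any “attach a debugger at step N” workflow.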
Wrap-up: A word of caution about tools that try to be human
Here’s what most folks might not talk about when it comes to test tool choice: the “evil” of tools that try to be and do everything a human does.
By this I mean: tools that impose a human-based domain-specific language (DSL).
There are two crucial points to keep in mind regarding this:
- It indirectly promotes software testing busywork, as in, “it is work that is done to show that work is being done” (see my anti-patterns post);
- It impacts maintainability, since not only do you have to maintain test code, you also need to maintain a regurgitated human translation of that code that no one will care for.
Just these points breed the equivalent of a weeds-blossoming problem: weeds blossom due to (re)implementation freedom. Oftentimes you end up with a million extra ways of using the “do-it-all” library to solve the same path within the same org, plus the added clutter of having libraries that do more than what you are trying to do in the context of a scripted test.
So, I’ll wrap up this post with a word of caution:
There is such a thing as test tools/frameworks that try to do everything a human does so much so they become vain tools.
I recommend you read Michael Bolton’s experience reports on Katalon and mabl to get a feel for what this usually means through the lens of a hardcore software tester.
If you read this far, thank you. Feel free to reach out to me with comments, ideas, grammar errors, and suggestions via any of my social media. Until next time, stay safe, take care! If you are up for it, you can also buy me a coffee ☕