My concern with proprietary Load Testing tools

July 03, 2021
tools, testing, load testing, and personal experience

Movie still from Star Wars Episode VIII - The Last Jedi, 2017

One of the things that I mentioned briefly, both in episode 2 of the tools series and in my load testing “get rich fast” scheme parody, was the extent to which some proprietary load test vendors take common load testing terms and twist them to try to fool potential customers… The biggest twist of all being the predatory and (almost) criminal way “virtual-users” tiers and similar concepts are marketed and portrayed to the folks who sit behind load testing budgets.

The Setup

The setup for the scam starts with the victim:

a person in charge of estimating load testing efforts, who believes their untested services will need to tackle “millions of users” or “millions of requests”, and that their apps need to scale ad infinitum and process billions of transactions on demand.

In most (but not all) cases the above is nothing but a symptom of an industry-wide ~~fetish~~ insecurity about infinite scalability (“will it scale?” intensifies), born from the inner dream that everybody wants to be the next Netflix and is insecure about the size of their… ermmm… digital genitalia?

~~Malicious~~ Closed source and/or proprietary load testing tool providers are aware of this fetish and exploit it as a weakness in the same way a social engineer would. The bait goes along these lines: “You know, your application is so important, we believe in your mission, we want to help you out, let’s make synergies, we think you need these many virtual users…”

So they send the victim:

a product prospectus mentioning that their closed tool actually integrates with a ton of other tools, and is, among other things, endorsed by the Pope of the Universal Church of the Kingdom of Quality Assurance,
alongside the prospectus comes a spreadsheet with the prices for licensing their closed source cloud solution,
and/or a spreadsheet with the prices of the licenses for a proprietary self-hosted approach (that likely has some sort of built-in rate-limiting).

In the worst cases, the prices are never available in a straightforward/transparent/public way, and the victim always has to go through sales folks of some sort… The spreadsheet, never the same one between customers, often shows exorbitant prices in a format that is loved by budget owners: it all looks crisp and transparent, usually with the number of virtual users (VUs), or requests per second, or some other big number stripped of its actual meaning in one column, and the price tag in the other.

Since most targeted victims tend to have free-ish rein over the budget, and the prospectus looks “clean and corporate-friendly” and mentions that the tool is guaranteed to help the victim load test their galaxy-grade trillions of users, the victim picks a tier, signs their soul over to the tool provider… and both parties become professional best friends forever.

Side-note, for those of you who want more of the lore behind this particular bit of world-building, here’s a bonus point: the poorly-paid interns and/or sweatshops that the victim organization will pay to do the actual load testing work and scripts are also certified in using the same proprietary load tool and… official partners of the proprietary load tool provider? Who would’ve guessed?!

The core of the scam

The gist of the actual scam is simple and deadly effective: attaching a price tag to a concept that requires deep context, like virtual users or requests per second, oversimplifying the problem and leaving it at that.

Here’s what these malicious proprietary load tool providers are not explaining to potential customers or leaving in microscopic print:

If the load flow you are exploring is made up of requests that act like tiny short bursts, e.g. requests that (almost) always take only a few milliseconds, then a given number of VUs will translate into a greater number of requests per second. If, on the other hand, requests take seconds because the service is waiting for stuff to be done in the background, the exact same number of VUs will translate into fewer requests per second: VUs and requests per second (RPS) intimately depend on the context of the system you are testing.
If, in the 21st century, the proprietary load tool folks are considering has no truthful and reproducible benchmarks publicly available, and no transparency on the actual infrastructure that would be hosting any cloud-based load generation, then folks are foolishly paying extra for good vibes and reassurance, not for credible software tooling. Well-intentioned tool providers will warn folks about the tool’s own limitations. Malicious providers (usually of proprietary tools) will hide any kind of information that paints the tool in a bad light.
Complex “production-grade” load flows and setups are never what closed source/proprietary tool providers demo in any sales pitch. There’s a reason for this: when designing a load flow, there’s always a limit to how much work both the load generating and the load target systems can sustain. There are also limitations that come with the complexity of the load flow designed. Don’t make your budget decision or tool choice until you have tried the tool out yourself, deeply and meaningfully.
For most folks, “the needle” would move just by making any (effective/meaningful) load testing effort at all, even at a tiny or local scale. Anything you can think of: persistence limitations, dependencies on external services, bugs in one’s code, bugs in one’s infrastructure and network, misconfigurations in load balancing, proxy bugs, DNS issues, overall flakiness and weirdness… Every single one of these problems can be dug up with load testing at a smaller scale than most folks might believe. The only thing that is missing is: folks doing the actual load testing work.
A bit related to the previous point: you can’t ask a load generator for more than it can generate, and you can’t (or shouldn’t) expect a target system to respond either linearly or predictably to a given load. In the (likely) event that the load target system is “tanking” and infrastructure is partially/fully “dying”, a fixed number of VUs won’t mean a steady RPS, because the VUs are effectively getting stuck waiting on responses. Taking this into account, if you are load testing a system for the first time, you are still in “discovery” mode, trying to map out the response of the target system under different loads, and it’s of no use to pay in advance for a load generation setup you won’t be using “just yet”.
Anyone can “bomb” a service using free, open source load test tools and still find the same (or better) issues one would find with overpriced proprietary tools, but no miraculous load test tool will cover for a lack of observability and monitoring set up beforehand on the systems being load tested.
…
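To put some numbers on the VUs vs RPS point above, here is a minimal back-of-the-envelope sketch. It assumes a "closed" load model, where each VU sends a request, waits for the response, optionally pauses, and repeats, so that steady-state throughput follows Little's Law. The function name, the 100 VUs, and the response times are all made up for illustration; they don't come from any vendor's pricing sheet or any particular tool.

```python
def requests_per_second(vus: int, response_time_s: float, think_time_s: float = 0.0) -> float:
    """Approximate steady-state RPS for a closed loop of virtual users.

    Each VU can only complete one request per (response time + think time),
    so throughput is roughly vus / cycle time (Little's Law).
    """
    cycle_s = response_time_s + think_time_s
    if vus < 0 or cycle_s <= 0:
        raise ValueError("vus must be >= 0 and the cycle time must be > 0")
    return vus / cycle_s


# Hypothetical scenarios: the same 100 VUs against differently behaving targets.
scenarios = {
    "fast endpoint (20 ms per request)": 0.02,
    "slow endpoint (2 s per request)": 2.0,
    "target tanking (10 s per request)": 10.0,
}

for name, response_time_s in scenarios.items():
    rps = requests_per_second(vus=100, response_time_s=response_time_s)
    print(f"100 VUs, {name}: ~{rps:.0f} requests/second")
```

Under these assumptions the same "100 VU" tier works out to roughly 5,000 RPS against the fast endpoint, 50 RPS against the slow one, and 10 RPS once the target starts tanking. Real systems are messier than one formula (timeouts, errors, load generator limits), but it shows why a VU count on its own says very little about the load you are actually buying.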

Wrap-up

In the end, the crude reality of what “victim” companies will be paying for is:

“Oh yes, we die with big traffic”, but with no in-depth information to actually help engineering folks debug issues;
Potentially second-hand, overpriced cloud instances;
Fake sense of pride and comfort: “Look here C-level management team, we spent our budget of X millions on load testing using an enterprise-grade tool, we crossed out these KPIs, assured the quality, and did the load testings, great success, high-five, very nice”… And bugs that hide in the shallow depths of missed meaningful load testing efforts thrive and live on.
…

In the end, folks are left with a disconnect between their context and actual load testing needs on one side, and a tool which may or may not suit that context and those needs on the other. That in itself isn’t necessarily world-ending, but it isn’t fair or right either, and personally, I just think it’s immoral. I don’t have a recipe for “fixing this”, other than some personal recommendations:

Always try to get information from load tool providers, and demand accountability for the claims they make about their own tools;
Make informed tooling decisions based on a better grasp of your context, not based on sales pitches that you try to mold into your context;
Be very mindful that not all money-based decisions with regards to load test tooling necessarily have an advantage over other expenses, particularly the expense of time as a resource. An expensive paid proprietary tool might have the selling point of saving you time, but that selling point can turn out to be smoke and mirrors in the field, when you try to use the tool in your specific context;
If you have a choice, go with free and open-source first (k6, Locust, JMeter for those into masochism, …), try things in depth, and only then, if you’re not satisfied, search the market for other approaches.

If you read this far, thank you. Feel free to reach out to me with comments, ideas, grammar errors, and suggestions via any of my social media. Until next time, stay safe, take care! If you are up for it, you can also buy me a coffee ☕