Frail and cast-iron tools - Episode 2

tools, testing, load testing, k6, and personal experience
First Blood, 1982

It’s been almost a year since my first Frail and cast-iron tools episode where I wrote about my personal experience with Postman and I’ve decided to do a new follow-up episode dedicated to another test tool that has been supercharging my (load) testing efforts for about 2 years now:

k6, an open source load testing tool.

Disclaimer (same as for the Postman post): this is not a tutorial. It’s not a paid or endorsed review. It’s a recollection of my experiences, feelings, frustrations and a flawed portrait of my user experience with the k6 tool, from a software tester point of view. If I might sound negative at some points, I prefer to do so and stay true to myself as a tester, instead of making a blog post of shallow “praise and wonder”. Read on!


So, similar to the other post, let’s start with a timeline: this post covers my personal experience using k6 from around mid-2019 until now (April 2021). I’ve used k6 with irregular consistency, sometimes multiple times a day, every day of the week on the most hardcore weeks, other times only a handful of days per month, and I’ve used almost all of the k6 versions released between v0.24.0 and v0.31.1, the one I remember interacting with the most being v0.26.0.

I’ve used it in a lot of load testing efforts, and my most-used “setup” was running load scenarios that were either “coded by hand” or generated through the postman-to-k6 library. The load scenarios could be anything from “do these 2 API calls a few thousand times” to “load these input data files from an S3 bucket into 50 load-generating k6 instances on a Kubernetes cluster to mimic this complex product flow of 500k users”.

I have never used any priced offering of k6, nor have I ever used their cloud, meaning that, even at scale, I’ve continued to use k6 from the free “self-hosted” perspective, and this post is written considering only the free & open source user space.

Lately my interactions with the tool have had less to do with reproducing huge scales of load and more with two main topics. First, running a hands-off set of smaller load scenarios on a nightly basis and “spitting out” comparisons of each night’s load stats. Second, democratizing access to load test tooling in the organization I currently work with: a non-technical person can, with a simple chat command, trigger what is at its core a k6 instance running a user-mimicking load scenario (comprising a series of HTTP calls and received WebSocket events), defining the virtual users (VUs) and iterations in the chat command itself and getting back “human”, meaningful results. Later in the year I expect to be back at the problem of large load tests / distributed load generation, but I have other problems to solve in the meantime.
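To give an idea of what the nightly comparison boils down to: diffing the metrics each run exports. The JSON shape below mirrors what k6’s `--summary-export` flag produces, but `compareNights`, the chosen metric, and the 10% threshold are made-up illustrations, not part of k6 or of our actual setup:

```javascript
// Sketch: compare one nightly k6 summary against the previous one.
// Input shape mirrors k6's --summary-export JSON:
//   { metrics: { http_req_duration: { "p(95)": 480, ... }, ... } }
function compareNights(prev, curr, metric = 'http_req_duration', stat = 'p(95)') {
  const before = prev.metrics[metric][stat];
  const after = curr.metrics[metric][stat];
  const deltaPct = Number((((after - before) / before) * 100).toFixed(1));
  return {
    metric, stat, before, after, deltaPct,
    // 10% is an arbitrary illustrative threshold for "worth a human look"
    verdict: deltaPct > 10 ? 'REGRESSION?' : 'ok',
  };
}

// Example with made-up numbers (p95 latency in ms):
const monday = { metrics: { http_req_duration: { 'p(95)': 480 } } };
const tuesday = { metrics: { http_req_duration: { 'p(95)': 552 } } };
console.log(compareNights(monday, tuesday)); // deltaPct: 15, verdict: 'REGRESSION?'
```

The nice part of this setup is that the “human” meaningful result (ok vs. possible regression) is what gets posted back, not raw percentile dumps.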

What I liked 😀 / “The Honeymoon period”

To best describe what I like the most about k6, let’s start with a problem:

I have very little time allocated for deep testing, and even less time to invest in (equally deep and investigative) load testing… due to XYZ constraints. So with the time I do have at my disposal as a tester/engineer, I need to quickly sketch realistic load flows and then run a lot of experiments and investigate why the system under load works weirdly or breaks…

(Pro-tip: a system always does break or behave weirdly, especially if it has never been tested (or thrown into production) at even the tiniest of scales. It’s always like opening a can of worms, and it’s always a good idea to load test.)

I had this problem and I needed to find a tool for it. And one day, by chance, a colleague (Carlos, you know who you are), whom I went to whenever I needed some arcane piece of networking/infra knowledge or a specific tool that might help solve some problem, came up to me and said: “Hey Filipe, I think this would fit your load use-case. It’s called k6, I just tried it out on my machine, you should try it out too.”

I opened their page, installed k6 with a brew command, and tried out their example. And that was it: with an onboarding experience of “a few seconds”, I started to get emotionally convinced.
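For reference, that first example was in the same spirit as the following sketch (reconstructed from memory, not the exact docs sample), which you’d run with `k6 run script.js`. Note that it runs only under the k6 binary, not under node, since the `k6/*` modules are provided by the tool itself:

```javascript
// Minimal k6 load script: 10 VUs hitting one endpoint for 30 seconds.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,         // 10 virtual users...
  duration: '30s', // ...iterating for 30 seconds
};

export default function () {
  const res = http.get('https://test.k6.io'); // k6's own demo target
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // think-time between iterations
}
```

That a complete, readable load test fits in a dozen lines is, I think, a big part of the “few seconds” onboarding feeling.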

I quickly searched for something that could help me port the product flows I had already defined as Postman collections over into k6, and there was one. Then I looked around their documentation: it was straight to the point and had examples. No bullshit. None of that “enterprise grade synergies to leverage power of closed-source load software for make benefit of big corporations” bullshit. I was hooked.

I could specifically describe what I like the most about k6 in multiple topics:

But instead I’ll try to sum it all up in a single topic…

Providing a great first user experience

One of the first things that struck me was the low (practically nonexistent) “barrier to entry” of k6. It’s straightforward to set up, and once installed, using the CLI (either directly or through Docker) is as simple as using any other typical CLI tool that most technical folks are already used to interacting with every day.

Unlike most other common tools, which have:

k6 removes both these barriers by, in my opinion, assuming three simple principles:

By sitting on these concepts (either by chance or on purpose) k6 lets half of the magic work happen on the developer/tester’s head:

The same principles apply when thinking of scale, from the local execution to large load tests:

Now that we’ve addressed my favorite parts, let’s go into the next topic…

What I didn’t like 😠

Version changes bringing surprises

I’m not sure I can call this a point of dislike, and it’s perhaps not an entirely fair point towards k6, but it was nevertheless a personal frustration, one by which I think other folks might have been affected as well.

Long story short:

user of open source tool is dumb, doesn’t read changelog, uses new version, new version has magic the user is not prepared for, breaking the user normal workflow

Around the time k6 was updated from v0.26.0 to v0.27.0, there was a breaking change that was, by all accounts, documented in the Breaking changes section. It was a minimal change, and it starts like this:

Previously, if the iterations and vus script options were specified, but duration was not, the script could have ran practically indefinitely, given a sufficiently large number or length of the used iterations. (…) From k6 v0.27.0, by default, if the specified iterations haven’t finished, these scripts will abort after 10 minutes (…) This default value can easily be changed (…) by setting the duration / --duration / K6_DURATION script option.

Now, the first problem, right off the bat: as a rascal free open source user, I generally didn’t read changelogs (lesson learned from that moment onward).

Second problem: an assumption was made that most users’ load scripts won’t take more than 10 minutes. This is far from the truth for folks running k6 load scenarios at scales of 50k iterations, where each iteration comprises anywhere between 10 and 30 different API calls.

Third problem: if checking the tool’s changelog isn’t the first idea that comes to the user’s mind, folks will inevitably waste time debugging and trying to understand why on earth their k6 load generator instances are dying before they even have a chance to run for a while.

We found the solution eventually, and the fix was literally a single line of code: defining the K6_DURATION environment variable to a value more in line with the typical load experiment duration. But I believe this particular kind of breaking change, where an assumption is made about the typical usage of the tool, needs to be tackled differently, beyond (only) documenting it in the changelog.
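For reference, the fix amounted to nothing more than pinning the duration explicitly (the `4h` value here is illustrative; pick whatever upper bound fits your experiments):

```shell
# Either via the environment variable (what we did)...
export K6_DURATION=4h
docker run -i -e K6_DURATION loadimpact/k6 run - <script.js --vus 10 --iterations 1000

# ...or directly as a CLI flag:
k6 run script.js --vus 10 --iterations 1000 --duration 4h
```

One line. Which is exactly why the hours spent finding it stung.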

For instance, the whole trouble could’ve been avoided with a simple warning log when running the tool:

$ docker run -i loadimpact/k6 run - <script.js --vus 10 --iterations 1000

! DEPRECATION WARNING !: Hey you rascal, after v0.27.0
   whenever you use --iterations be aware that max duration defaults to 10min,
   no matter if you still have iterations to run.
   Read our changelog for v0.27.0 here.

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local

This is a deeper problem than just the fact that I didn’t read changelogs and wasted time on something with such a simple fix. But we could discuss whether or not it would have been valuable to include some sort of deprecation warning at run-time for this kind of issue in particular.

We can consider it’s indeed my fault as a user for not staying informed, but in the long run this is a tricky choice when it comes to winning over chunks of the community. And assuming folks read the changelog can lead to a lot of noise (see this example, and the same example on GitHub).

This type of noise can, in my n00b opinion, always be prevented by either avoiding the breaking change altogether or, what I personally think is the more sensible choice, warning the user at run-time about any surprises.

Memory and CPU fine-tuning

At one point we hit a lot of memory and CPU weirdness/issues with “containerised” k6 runners, especially for scenarios that were intricate (with tons of requests and checks), where load runners would die out-of-memory (OOM) even before they were finished with their iterations.

When the issue wasn’t dying with OOM, it was the instances requiring a considerable amount of memory per pod running k6, e.g. 4 to 6 GB of RAM. It might not look like much individually, but multiply that by 50 instances and we’re talking about a small fortune in the monthly EC2 bill… sure, the rascal in me will say “well, if it’s the company paying and not me… not my money…”, but we can all agree it’s not an ideal situation.
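Back-of-the-envelope, the fleet-level numbers add up quickly (the per-GB price below is a made-up placeholder for illustration, not an actual AWS quote):

```javascript
// Rough memory footprint and cost of a fleet of k6 runner pods.
// PRICE_PER_GB_MONTH is a hypothetical figure, not a real EC2 price.
const PODS = 50;
const GB_PER_POD = 6;
const PRICE_PER_GB_MONTH = 5; // hypothetical $/GB/month

const totalGb = PODS * GB_PER_POD;                 // 300 GB of RAM just for load generators
const monthlyCost = totalGb * PRICE_PER_GB_MONTH;  // $1500/month under these assumptions

console.log(`${totalGb} GB -> $${monthlyCost}/month`);
```

Whatever the real per-GB number is in your cloud, the multiplication by 50 is what turns “a chunky pod” into a line item someone notices.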

As for the source of and solution to those issues, it was always a matter of fine-tuning, soul-searching and accurate guesses:

Figuring out each of these issues meant that we eventually knew, implicitly, that if a single load-generating pod in k8s had “this much CPU and this much memory”, it could hold this many VUs running this particular complex load scenario, and that if something was dead or buggy our test would fail sooner. But it took a lot of trial, error and experimenting to figure all of that out.

Note, this “dislike” isn’t news to anyone who has had to run deep load test experiments, regardless of whether the tool of choice is k6 or something else. One point of praise: at least k6 makes the attempt to document their benchmarks for folks to reproduce or to guide their own setups, while most other tool providers publish benchmarks of “used toilet-paper” grade.

Pain-points of having to use “large” data files as input

One of the indirect reasons memory was a problem was the use of large input data files. We hit a limitation at one point where we wanted to make sure that on each iteration of each virtual user, a unique line from an input data file was used. This was tricky for two main reasons:

I was no stranger to these, especially in moments where specific product flows had to be run for about half a million test/staging users. These users existed in some form on a target database, so no user creation was needed for some flows, and the input data files were sitting in some S3 bucket ready to be downloaded and “put to work”. BUT the issue remained: with flows that required reading data from input files, we’d always have to tackle the above issues.

If the files were too big (past a magic number), the k6 instances became clogged and wouldn’t “boot”, due to having to hold “a lot” of data in memory. So we split those files and created different “collections” of the same dataset, and the 500k users would be chunked into several files, depending on the number of k6 instances we wanted to spin up. It was quite common to hear: “let’s use the 5000-per-chunk collection today… which can feed this many k6 instances, or the 25000-per-chunk collection…”.
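The chunking itself was conceptually simple; a minimal sketch of the idea (the `chunkUsers` helper is hypothetical, for illustration only, not something k6 provides):

```javascript
// Split one big list of test users into fixed-size chunks,
// one chunk per data file, one file per k6 instance.
function chunkUsers(users, chunkSize) {
  const chunks = [];
  for (let i = 0; i < users.length; i += chunkSize) {
    chunks.push(users.slice(i, i + chunkSize));
  }
  return chunks;
}

// 500k users in a 5000-per-chunk "collection" -> 100 files,
// enough to feed up to 100 k6 instances.
const users = Array.from({ length: 500000 }, (_, i) => `user-${i}`);
const chunks = chunkUsers(users, 5000);
console.log(chunks.length); // 100
```

The “magic number” part was never the splitting; it was discovering, per scenario, how big a chunk a single k6 instance could hold before it refused to boot.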

After that, we had to guarantee that every user in a data file was only attempted once, which meant doing a trick (similar to this one) with how we loaded the data file and how each virtual user in a load run would get a unique “cross-section” of the data to use for its iterations. Even with an adaptation of the above trick, we hit collisions near the end of a load run, where a handful of virtual users would try to re-use data that had already been used, causing a few “false” failures.
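The core of the “cross-section” trick is deriving a collision-free index from k6’s `__VU` (1-based) and `__ITER` (0-based) execution globals. In this sketch they are plain parameters so the arithmetic can be shown (and checked) standalone:

```javascript
// Each VU gets a contiguous, non-overlapping slice of the data file:
// VU 1 -> lines 0..itersPerVU-1, VU 2 -> the next itersPerVU lines, etc.
// Inside a k6 script you'd call uniqueIndex(__VU, __ITER, itersPerVU).
function uniqueIndex(vu, iter, itersPerVU) {
  return (vu - 1) * itersPerVU + iter;
}

// 4 VUs x 3 iterations each -> indices 0..11, no collisions.
// Caveat: this only holds if every VU really runs exactly itersPerVU
// iterations; when iterations are shared out unevenly (as happened to
// us near the end of runs), the guarantee breaks down.
const seen = new Set();
for (let vu = 1; vu <= 4; vu++) {
  for (let iter = 0; iter < 3; iter++) {
    seen.add(uniqueIndex(vu, iter, 3));
  }
}
console.log(seen.size); // 12
```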

Note: it seems this issue has recently been “officially” solved in v0.30.0 (see this Github ticket and this pull-request). I haven’t tried it yet, but it’s promising.

Mixed feelings 😕

Solving the distributed load generation problem for tight budgets

Solving the distributed load generation problem (“how do I make my test run on multiple load-generating instances in a seamless and coordinated way, and see a sum of all results afterwards?”) was (and still is) very much a gruesome problem for open source users. Note, it’s not a brand new problem, and it has been acknowledged by the folks that develop k6.

k6 supposedly provides better support for large load tests in its paid SaaS cloud offering… BUT, there’s a catch: unless the engineers on the project have “access to the credit card” of the organization, or can convince the right folks, trying out or even considering the paid options is for many already a rejected option. The bigger and more structured the corporate environment, the harder it becomes to convince someone “above” that one’s team needs access to something like a k6 cloud paid subscription. Oftentimes the context is: there are already folks in the corporation that have sold their souls to other load tool brands, and it becomes a pointless pursuit bordering on internal political and organizational power-balance battles that few people are naive or unsuspecting enough to pursue.

This means that distributing load ultimately becomes a problem that falls into the tester’s/engineer’s lap, and they will have a brand new set of different problems to deal with:

For a lot of these problems, you either have the luck of solving them through a “tech miracle”, or the luck of being given the time and resources to solve them, or your team has already solved some of them for other use-cases, or you have a bright, high-IQ, extremely underpaid intern helping you out, freeing your time to focus on running plenty of actual load experiments. Otherwise, as a user you’ll be in muddy waters.

(don’t worry Billy, that promotion is coming, we know you are contributing more for this company than most C-levels, but we just believe you’re not quite there yet…)

To give a practical example, let’s take the point: how to monitor both the load-generating containers’ execution and the target-of-load systems.

The fact that k6 leaves this bit to the user (which is understandable; it shouldn’t try to do everything) becomes another point that can make or break load testing efforts:

Self-hosted life means dealing with this kind of stuff. Is this a positive or negative thing? The answer is: Yes.

postman-to-k6 converter needs some love

Talking about postman-to-k6 could be a post by itself. I’ll try to keep it brief: postman-to-k6 needed (and likely still needs) a lot of love. It’s a wonderful tool for simple collections, like when you’re trying out some simple exploit and the collection is just a handful of simple API requests. BUT it’s awful for Postman “power-users” that have to work with intricate setups.

Now, to be fair, this is not the fault of postman-to-k6 alone. The fact is that in this case we are talking about creating and maintaining complex Postman scripts that already hit performance limitations in the Postman client itself. From the converter’s perspective, this means trying to convert collections that:

These are conditions that I assume the converter was never designed to tackle, though in the beginning I was hopeful it would. When I hit this problem, it meant the extra work of creating dumbed/stripped-down versions of particular scenarios in Postman, with a lot of hardcoded stuff in them, and only then converting them to a k6 script. And even then it sometimes meant debugging compatibility issues between different k6 versions that surfaced in some of the libs the converter pre-includes in a converted script, or investing time adapting the generated code (when writing a “vanilla” k6 script would have taken less time).

There’s more to be said about this, but in k6’s case my work-around recommendation is to avoid converted scenarios if possible, and to start from scratch as much as you can.

What I hope for the future 🌱

I’ve written a parody in the past about doing business in load testing, sort of as a critique of a lot of foul play I’ve observed from some load testing vendors over time. Generally, the options for load testing tool choice are:

Both paths have their pros and cons, but both generally suffer from the same evil: the tool is not easy for a n00b millennial (like me) to “pick up and play”… and I don’t even want to think about what Gen Z will say about this at some point in the future.

In this context, I personally think k6 belongs to a rare third group: tools that try to solve a problem that has already been solved by many, but that democratize access to the solution for everyone, no matter their previous knowledge of the tool’s topic. And this can mean a lot of things: being open source first, no (hidden) limits for “free”-tier users, straight-to-the-point meaningful documentation, a reliable and natural onboarding experience, adaptability of the tool to any kind of use, be it simple one-time-only use or continued hardcore use… In most of these things, a tool like k6, with its strengths and limitations, delights.

With that said, I leave for the folks that develop k6 a few wishes/hopes for the future:

If you read this far, thank you. Feel free to reach out to me with comments, ideas, grammar errors, and suggestions via any of my social media. Until next time, stay safe, take care! If you are up for it, you can also buy me a coffee ☕