I’ve been in a few situations where a performance test looked great on my local machine, only for the production environment to buckle under actual load. The problem wasn’t the code—it was the test. My single-machine Locust setup had hit a CPU bottleneck, meaning I wasn’t testing the server; I was testing the limits of my own laptop. This is where distributed load testing with Locust becomes non-negotiable.
If you’ve already compared Locust vs. JMeter for performance testing, you know that Locust’s event-driven nature makes it incredibly efficient. However, even with gevent, a single Python process can only handle so many concurrent users before it starts dropping requests or reporting skewed latency. To simulate truly massive scale, you need to distribute the load.
The Challenge: The Single-Node Bottleneck
In a standard Locust run, one process handles both the web UI and the simulation of users. As you scale to thousands of users, the overhead of managing those simulated users consumes the CPU. When the CPU hits 100%, the ‘think time’ between requests increases, and your response times appear higher than they actually are. You’re no longer measuring your API’s latency; you’re measuring your test runner’s struggle.
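A quick way to confirm whether you’re in this trap is to watch the load generator’s own CPU while the test runs. Here’s a minimal sketch, assuming a Linux box with the test already in progress:

# Watch CPU usage of every running Locust process, refreshing each second.
# If these sit near 100%, your latency numbers are suspect.
top -p "$(pgrep -d, -f locust)"

Locust itself also logs a warning when one of its processes climbs above roughly 90% CPU, which is a strong hint to add capacity before trusting the results.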
To solve this, Locust uses a Master-Worker architecture. The Master node manages the state, collects statistics, and provides the Web UI, while multiple Worker nodes execute the actual Python code and generate the traffic. This decoupling allows you to scale horizontally by simply adding more workers.
Solution Overview: The Master-Worker Pattern
The beauty of distributed load testing with Locust is its simplicity. You don’t need complex configuration files for every node; you just need the same locustfile.py present on all machines. The Master coordinates the workers via ZeroMQ, ensuring that the total user count is split evenly across the fleet.
Core Architecture Components
- The Master: The brain. It doesn’t generate load itself; it tracks failures and response times, and manages the total user count.
- The Workers: The muscle. They run the User classes and send the actual HTTP requests to your target.
- The Target: The application or Kubernetes cluster you’re trying to stress test.
Implementation: Setting Up Distributed Load
I’ll walk you through a basic setup using two terminal windows (simulating two different machines), but the logic remains identical whether you’re using AWS EC2 instances or a local cluster.
Step 1: The Locustfile
Create a simple locustfile.py. Ensure this file is identical on the Master and all Worker nodes.
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    # Each simulated user pauses 1-5 seconds between tasks.
    wait_time = between(1, 5)

    @task
    def load_homepage(self):
        self.client.get("/")

    @task
    def load_about(self):
        self.client.get("/about")
Step 2: Start the Master
On your primary machine, start Locust in master mode. Notice the --master flag:
locust -f locustfile.py --master
Step 3: Start the Workers
On your worker machines (or separate terminals), run the following command. Replace <master-ip> with the actual IP address of your Master node:
locust -f locustfile.py --worker --master-host=<master-ip>
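Keep in mind that each worker is a single Python process, so it can only use one CPU core. To saturate a multi-core worker machine, the usual advice is to start one worker process per core; a minimal sketch:

# Launch one worker per CPU core on this machine (<master-ip> as above).
for _ in $(seq "$(nproc)"); do
  locust -f locustfile.py --worker --master-host=<master-ip> &
done
wait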
The workers will now check in with the master. Once you open the Web UI at http://localhost:8089 on the master node, you’ll see the number of active workers listed on the main page.
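If you’d rather codify this topology than juggle terminals, Docker Compose works well. A sketch, assuming the official locustio/locust image and a locustfile.py in the current directory:

services:
  master:
    image: locustio/locust
    ports:
      - "8089:8089"          # Web UI
    volumes:
      - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --master
  worker:
    image: locustio/locust
    volumes:
      - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --worker --master-host master

Scale the fleet with docker compose up --scale worker=4; Compose’s service discovery lets the workers resolve the master by its service name.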
Advanced Techniques for High-Scale Testing
Once you move past a few workers, you’ll run into network and OS limits. In my experience, these are the three most critical optimizations for distributed load testing with Locust:
1. FastHttpUser for Maximum Throughput
If you don’t need the full feature set of the requests library (which HttpUser uses), switch to FastHttpUser. It uses geventhttpclient and is significantly faster, often allowing a single worker to handle 5-10x more users.
from locust import FastHttpUser, task

class HighPerformanceUser(FastHttpUser):
    # FastHttpUser's client is geventhttpclient-based, not requests-based.
    @task
    def fast_request(self):
        self.client.get("/")
2. Tuning the OS (ulimit)
By default, Linux limits the number of open file descriptors (sockets). When running thousands of users, you’ll hit the ‘Too many open files’ error. I always run this on my workers before starting the test:
ulimit -n 65535
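Note that ulimit only affects the current shell session. For a persistent limit (assuming a standard Linux setup using PAM), raise the cap in /etc/security/limits.conf:

# /etc/security/limits.conf — takes effect at next login
*  soft  nofile  65535
*  hard  nofile  65535

If your workers run as systemd services, set LimitNOFILE= in the unit file instead, since systemd units don’t read limits.conf.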
3. Headless Execution for CI/CD
For automated pipelines, avoid the UI. Run the master in headless mode to define the duration and user count explicitly:
locust -f locustfile.py --master --headless -u 1000 -r 100 --run-time 10m
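One flag I always add in distributed headless runs is --expect-workers, which makes the master wait until that many workers have connected before starting, so the ramp-up isn’t split across a partial fleet. Assuming four workers:

locust -f locustfile.py --master --headless -u 1000 -r 100 --run-time 10m --expect-workers 4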
Case Study: Scaling to 50k Concurrent Users
I recently helped a client test a promotional landing page expected to hit 50k concurrent users. Using a single large instance wasn’t enough. We deployed a distributed cluster of 10 t3.medium instances on AWS.
The result? We discovered that while the app servers held up, the Load Balancer’s connection draining was misconfigured, causing 502 errors during scale-up. We wouldn’t have seen this with a single-node test because the ramp-up wasn’t aggressive enough to trigger the LB’s threshold.
Common Pitfalls to Avoid
- Network Saturation: Ensure your worker nodes aren’t all on the same subnet if you’re testing network throughput. You might be saturating the VPC gateway rather than the app.
- The ‘Master’ Bottleneck: While the master doesn’t generate load, it does process all the stats. If you have 100+ workers, the master’s CPU might spike. Consider increasing the master’s resources.
- Mismatched Code: If the locustfile.py on a worker differs from the master’s, the worker will fail to connect or produce erratic results. Use a git submodule or a shared S3 bucket to sync files (see the sketch below).
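For the file-sync problem specifically, even a simple push script beats manual copying. A minimal sketch, assuming SSH access; worker1, worker2, and the destination path are placeholders for your own fleet:

# Push the current locustfile to every worker before the run.
for host in worker1 worker2; do
  rsync -az locustfile.py "$host:/opt/locust/locustfile.py"
done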