When your CI/CD pipeline takes 3 hours to complete, "my code is compiling" stops being a joke and becomes a daily reality that kills productivity. Every developer knows the pain: push a fix in the morning, get results after lunch. Need to iterate? That's tomorrow's problem. With some effort we gave our development team 195 hours back every month by cutting build times from nearly 3 hours to just 48 minutes. The surprising part? We actually reduced our AWS costs in the process. The key was rethinking not just what we were running, but how we were provisioning it.
When "my code is compiling" takes 3 hours
Back in 2022, our main system's build pipeline was in crisis. Builds were taking up to 8 hours due to an artifact repository issue causing extreme download delays. After fixing that immediate fire and bringing builds down to 2.5-3 hours, we discovered we still had a problem. A deep dive into Jenkins revealed the new culprit: integration tests were consuming over 2 hours and 21 minutes of the total build time.
This wasn't just a minor inconvenience. With developers and QA engineers waiting for builds to complete, we were losing significant productivity every single day. It was time for a systematic approach to optimization.
Before diving into solutions, we established clear objectives:
- Goal: Reduce build time by at least 50%
- Constraint: The solution must not increase our AWS infrastructure costs
This second constraint was crucial. Throwing money at the problem wasn't an option. We needed to be smarter, not just richer.
On-demand node provisioning
First, it's important to understand the infrastructure change that made all of this possible. Our Jenkins setup had recently transitioned from a traditional provisioning model to an on-demand approach:
The old model: A fixed pool of worker nodes remained online 24/7, scaling horizontally only under heavy load. This created significant waste during quiet periods, with expensive EC2 instances sitting idle waiting for jobs.
The new model: Worker nodes are now provisioned on-demand when jobs need them. Instances launch dynamically, scale up to a configurable maximum (3 nodes), and automatically terminate after 10 minutes of inactivity. No jobs running? No nodes running. No waste.
This shift fundamentally changed our optimization approach. With the old model, using more powerful instances meant paying for expensive hardware even when idle. With on-demand provisioning, we only pay for what we use, making it economically viable to experiment with high-performance instance types.
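To make this concrete, here is a sketch of what an on-demand setup like this looks like with the Jenkins EC2 plugin's Configuration-as-Code schema. The field names follow the plugin's JCasC export and may differ across plugin versions; the AMI, region, and instance type are placeholders, not our actual values:

```yaml
jenkins:
  clouds:
    - amazonEC2:
        name: "on-demand-workers"
        region: "us-east-1"                  # placeholder region
        templates:
          - ami: "ami-0123456789abcdef0"     # placeholder worker AMI
            type: C5aXlarge
            idleTerminationMinutes: "10"     # terminate after 10 idle minutes
            instanceCapStr: "3"              # at most 3 concurrent worker nodes
```

With a configuration along these lines, a queued job triggers an instance launch, and an idle node simply disappears after ten minutes.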
Our two-pronged approach
1. Better hardware
Thanks to on-demand provisioning, we could now experiment with more powerful instances without the burden of 24/7 costs. Our baseline builds were running on t3a.xlarge EC2 instances (2.5 GHz, 4 vCPUs, 16GB RAM). Since our tests were CPU-bound rather than memory-bound, we tested compute-optimized instances that offered higher clock speeds:
- c5a.large: 3.3 GHz, 2 vCPUs, 4GB RAM
- c5a.xlarge: 3.3 GHz, 4 vCPUs, 8GB RAM
- c5a.2xlarge: 3.3 GHz, 8 vCPUs, 16GB RAM
The results from just hardware improvements were encouraging:
- t3a.xlarge (baseline): 165 minutes
- c5a.xlarge: 93 minutes (1.8x speedup)
2. Parallel test execution
The real breakthrough came when we noticed our tests were running sequentially on a single core, leaving 75% of our CPU capacity idle. It was time to parallelize our test execution.
Sequential to parallel testing
Making tests run in parallel sounds simple in theory, but our reality was more complex. We had years of legacy tests that assumed they would run in isolation, one after another. When we first attempted to run tests concurrently, chaos ensued:
- Tests were competing for the same database connections
- File system conflicts arose when multiple tests tried to write to the same locations
- Some tests depended on the state left behind by previous tests
- Race conditions appeared in code that had worked fine for years
Finding the right approach
We evaluated two strategies for parallel test execution:
- Thread-based parallelism: Running multiple tests within the same process
- Process-based parallelism: Spawning separate processes for different test groups
Thread-based parallelism immediately showed its limitations. Tests that had coexisted peacefully when run sequentially now fought over shared memory and resources. The failures were intermittent and debugging them would have required rewriting thousands of tests.
Process-based parallelism proved more forgiving. By running tests in completely separate processes, each with its own memory space, we avoided most of the concurrency issues. The configuration was simple: tell the test runner to spawn one process per CPU core.
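The article doesn't spell out the exact runner configuration, but since the pipeline builds with Gradle, the "one process per core" setting is a small change on the Test task. A sketch:

```groovy
// build.gradle — a sketch: fork one test JVM per available core
test {
    // Separate JVM processes give each test group its own memory space,
    // sidestepping most shared-state conflicts between legacy tests.
    maxParallelForks = Runtime.runtime.availableProcessors()

    // Optionally recycle forks so long suites don't accumulate state
    forkEvery = 100
}
```

Gradle then distributes test classes across the forks automatically; suites that are already isolated need no code changes.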
The pragmatic compromise
Despite our best efforts, two test modules absolutely refused to run in parallel:
- The automated UI test runner (which needed exclusive access to screen capture)
- The configuration client tests (which used shared system resources)
Rather than spending weeks refactoring legacy code, we made a pragmatic decision: let these two modules run sequentially while parallelizing everything else. It wasn't perfect, but it was good enough.
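Under the same assumption that the modules build with Gradle, the carve-out for the two stubborn modules is just an override in each module's own build file (module names here are hypothetical):

```groovy
// ui-tests/build.gradle and config-client/build.gradle (hypothetical paths)
test {
    maxParallelForks = 1   // these suites keep running sequentially
}
```

Everything else inherits the parallel setting, so the exception stays local to the two modules that need it.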
The results
Let's break down the impact of each optimization:
Hardware improvements alone (sequential tests)
First, we tested better hardware while keeping tests sequential:
| Instance Type | Build Duration | Improvement | Notes |
|---|---|---|---|
| t3a.xlarge (baseline) | 165 min | - | 2.5 GHz, 4 cores |
| c5a.xlarge | 93 min | 1.8x faster | 3.3 GHz, 4 cores |
Just by moving to compute-optimized instances with faster CPUs, we nearly halved our build time. The higher clock speed made a significant difference for our CPU-bound tests.
Parallel testing impact (same hardware)
Next, we enabled parallel testing on our original hardware:
| Configuration | Build Duration | Improvement |
|---|---|---|
| t3a.xlarge (sequential) | 165 min | - |
| t3a.xlarge (parallel) | 101 min | 1.6x faster |
Parallel testing alone delivered a 1.6x speedup, even on our baseline hardware. We went from using 25% of available CPU to nearly 100%.
The combined effect
When we combined both optimizations, the results were better than we anticipated:
| Instance Type | Parallel Tests | Build Duration | Total Speedup | Cost per Build |
|---|---|---|---|---|
| t3a.xlarge | No | 165 min | 1x | $0.47 |
| t3a.xlarge | Yes | 101 min | 1.6x | $0.29 |
| c5a.xlarge | Yes | 75 min | 2.2x | $0.23 |
| c5a.2xlarge | Yes | 52 min | 3.2x | $0.32 |
| c6a.2xlarge | Yes | 48 min | 3.5x | $0.26 |
The improvements multiplied rather than added:
- Better hardware alone: 1.8x
- Parallel tests alone: 1.6x
- Combined: up to 3.5x
Our winning configuration, the c6a.2xlarge instance with parallel testing, delivered:
- 3.5x faster builds (2h 45m → 48 min)
- 45% lower cost per build ($0.47 → $0.26)
Beyond infrastructure savings: the productivity impact
While the infrastructure cost savings are nice, they pale in comparison to the productivity gains. Let's talk about what really matters.
The human cost of waiting
When builds take 2 hours and 45 minutes, developers face a brutal choice:
- Context switch to something else and lose mental state
- Wait it out and waste nearly 3 hours of productivity
- Batch changes together, making debugging harder when things break
With 5 builds per day across our team, we were losing:
- 117 minutes saved per build (165 min → 48 min)
- 9.75 hours saved per day (117 min × 5 builds)
- 195 hours saved per month (assuming 20 working days)
Faster cycles
The benefits go beyond raw time savings:
Before (2h 45m builds):
- Morning fix? See results after lunch
- Two iteration cycles per day, maximum
- Developers batch multiple changes to avoid waiting
- Harder to isolate which change broke the build
- "I'll just push this and check tomorrow"
After (48 minute builds):
- Fix something, verify within the hour
- 5-6 iteration cycles per day possible
- Developers can test changes individually
- Faster debugging when issues arise
- Same-day verification becomes the norm
The economics that actually matter
Let's put this in perspective. With engineering costs averaging $75-100/hour:
- Monthly productivity gain: $14,625 - $19,500 (195 hours)
- Annual productivity gain: $175,500 - $234,000
Compare that to our AWS infrastructure savings of $252/year. The cloud costs are a rounding error.
Yes, we also saved on AWS costs
For completeness, here's the infrastructure cost comparison:
With on-demand provisioning (5 builds/day):
- Before: $560.67/year (t3a.xlarge)
- After: $308.17/year (c6a.2xlarge)
- Savings: $252.50/year (45% reduction)
Nice? Yes. Significant? Not really. The real impact is giving developers 2 hours of their life back, multiple times per day.
Implementation challenges and solutions
Enabling parallel test execution immediately exposed years of hidden race conditions and resource conflicts. Tests competed for files and database connections, and some depended on execution order. Rather than refactoring thousands of legacy tests, we took a pragmatic approach: configure the two most problematic modules to run sequentially while parallelizing everything else.
Resource allocation presented another challenge. Parallel tests require dedicated CPU resources to achieve optimal performance. The solution was to configure each Jenkins worker node with a single executor, ensuring each build gets exclusive access to all CPU cores. This might seem wasteful at first glance, but with our on-demand provisioning model, it's actually optimal:
- Each job gets a dedicated node that launches on demand
- AWS Auto Scaling spins up additional nodes as needed (up to our configured maximum of 3)
- Nodes automatically terminate after 10 minutes of inactivity
- During peak hours, multiple nodes run simultaneously, each handling a single job at maximum efficiency
This approach wouldn't have been cost-effective with the old always-on provisioning model, but with on-demand nodes, we only pay for the compute time we actually use. The 3x speed improvement more than justified dedicating nodes to individual builds.
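If the EC2 plugin's Configuration-as-Code schema is in use, the one-build-per-node setup is a single field on the worker node template (again, the field name follows the plugin's JCasC export and may vary by version):

```yaml
# Fragment of the EC2 worker node template
numExecutors: 1   # one executor: each build owns all cores on its node
```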
Key takeaways and recommendations
Four critical lessons emerged from this optimization journey:
First, measure what actually matters. Yes, we saved 45% on AWS costs, but the real win was returning 195 hours per month to our development team. That's like hiring an extra full-time developer, except better because it's distributed across the whole team.
Second, profile before optimizing. We initially suspected compilation or artifact downloads were the culprits, but Jenkins revealed that over 80% of build time was spent in test execution. This insight saved us from weeks of misdirected effort.
Third, optimization strategies compound. Faster CPUs alone gave us 1.8x improvement, parallel execution alone yielded 1.6x, but combining them delivered 3.5x. The whole truly was greater than the sum of its parts.
Fourth, infrastructure changes enable optimization opportunities. The shift to on-demand provisioning didn't just save money; it fundamentally changed what optimizations were economically viable. Sometimes the biggest wins come from rethinking the platform, not just the code running on it.
For teams facing similar challenges:
- Start by measuring where time is actually spent
- Try parallel execution on existing hardware first (it's free!)
- Consider your provisioning model before investing in expensive hardware
- Remember that developer time is your most expensive resource
Future improvements
Our journey didn't end here. Several further optimization opportunities opened up, and some of them we have since implemented:
- Testing even newer instance types (c7a, m5zn) for further performance gains. Eventually we migrated to Graviton-based instances
- Implementing test result caching for unchanged code. We had a shared cache for Gradle
- Main codebase improvements to better utilize the hardware
- Rewriting some of the problematic tests to accelerate execution times even more
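The shared Gradle cache mentioned above is typically wired up in settings.gradle; this is a sketch with a placeholder URL, not our production endpoint:

```groovy
// settings.gradle — remote build cache sketch (URL is a placeholder)
buildCache {
    remote(HttpBuildCache) {
        url = 'https://gradle-cache.example.com/cache/'
        // Only CI populates the cache; developer machines read from it
        push = System.getenv('CI') != null
    }
}
```

Caching is then enabled per build with `--build-cache`, or globally via `org.gradle.caching=true` in gradle.properties.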
Conclusion
Going from 3-hour builds to 48 minutes was a solid win, but let's be honest: 48 minutes is still a long coffee break. This optimization journey taught us that sometimes the biggest gains come from picking the low-hanging fruit first.
The shift to on-demand provisioning opened the door. Parallel testing and better hardware walked us through it. Together, they returned 195 hours per month to our development team. That's real value, even if builds aren't quite "instant" yet.
The most important lesson? Challenge your assumptions. We assumed faster hardware would cost more (it didn't). We assumed our tests couldn't run in parallel (most could). We assumed we were stuck with our provisioning model (we weren't).
Is 48 minutes perfect? No. But it's good enough to change how developers work. Good enough to allow same-day iterations. Good enough to stop being a major bottleneck. And most importantly, it proved that dramatic improvements are possible when you measure what matters and tackle the obvious problems first.
The journey continues. There are more optimizations to explore, more time to reclaim. But sometimes the best place to start is with the simple stuff that gets you 80% of the way there.