The Cloud Capacity Your JVM Leaves Idle: A Rigorous Test of Microsoft's jaz
In the cloud, every GB and every core show up on the bill, and the bill shrinks when you pack more services onto a node. Yet out of the box the Java runtime tends to leave part of that capacity idle and can deliver less than the environment allows, precisely in the smaller containers that density asks for. Reclaiming it has always meant a JVM tuning specialist. Microsoft's jaz, the Azure Command Launcher for Java, promises to do that adjustment for you. This study tests whether, and where, it delivers.
Abstract
jaz, the Azure Command Launcher for Java, is a drop-in replacement for the
java command that reads a container's cgroup limits and applies
cloud-appropriate JVM tuning (heap sizing, garbage collector choice, and diagnostics) with no
manual configuration. This study asks a simple question and answers it with measurements. For a realistic,
I/O- and memory-bound Java microservice, does replacing java
with jaz actually beat the default, and by how much?
We ran a controlled A/B benchmark on an Azure virtual machine across a grid of container sizes, 1 and 2 GB of memory by 1 and 2 vCPUs, that deliberately straddles the JVM's own ergonomic boundary for choosing a garbage collector. The headline result is concentrated. Where the default falls into the single-threaded Serial collector on a multi-core container, jaz raises throughput by 36 percent and cuts p99 tail latency to less than a sixth of the default's. Where the default already makes a good choice, jaz is a wash, and on a single core with roomy memory it can cost throughput. The value of jaz is not that it always wins, but that it makes the right call for the container automatically, without a specialist.
1. The Capacity You Already Pay For
In the cloud, every gigabyte and every core is a line on an invoice, and the invoice shrinks when you pack more services onto each node. Density is one of the most direct cost levers a platform team has. To get it, you shrink the resource request of each workload so more of them fit.
Java complicates that lever in a way most teams never see. The HotSpot JVM chooses its two most consequential defaults, the garbage collector and the maximum heap, from the machine it thinks it is running on. It only turns on the balanced G1 collector when the environment looks "server-class", which it defines as at least two available processors and at least about 1792 MB of memory. Below either threshold it falls back to the single-threaded, stop-the-world Serial collector. On either side of that line, it caps the maximum heap at roughly 25 percent of the container memory unless told otherwise. Both are reasonable choices for a laptop or a tiny utility. Neither is a good fit for a latency-sensitive service in a small container, which is exactly the shape density asks for.
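The ergonomic rule described above can be sketched in a few lines of Java. This is a simplified model built only from the thresholds quoted in the text, not HotSpot's actual implementation, and the class and method names are illustrative.

```java
// Simplified model of the default ergonomics described in the text.
// Assumption: the thresholds (2 CPUs, about 1792 MB, 25 percent heap)
// are taken from the article; real HotSpot logic has more cases.
final class ErgonomicsSketch {
    static final long MB = 1024L * 1024;
    static final long SERVER_CLASS_BYTES = 1792L * MB;

    // "Server-class" needs at least two processors AND enough memory.
    static boolean isServerClass(int cpus, long memoryBytes) {
        return cpus >= 2 && memoryBytes >= SERVER_CLASS_BYTES;
    }

    // G1 on server-class environments, Serial otherwise.
    static String predictedCollector(int cpus, long memoryBytes) {
        return isServerClass(cpus, memoryBytes) ? "G1" : "Serial";
    }

    // Default maximum heap of roughly one quarter of the memory limit.
    static long predictedMaxHeapMb(long memoryBytes) {
        return memoryBytes / 4 / MB;
    }

    public static void main(String[] args) {
        long gb = 1024L * MB;
        for (long mem : new long[] {gb, 2 * gb}) {
            for (int cpus : new int[] {1, 2}) {
                System.out.printf("%d GB / %d vCPU -> %s, max heap about %d MB%n",
                        mem / gb, cpus,
                        predictedCollector(cpus, mem), predictedMaxHeapMb(mem));
            }
        }
    }
}
```

Under this model only the 2 GB, 2 vCPU combination clears both thresholds, and a 1 GB limit yields a 256 MB default heap.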
The failure mode is quiet. Nothing errors. A team trims a Spring Boot service from a generous
footprint down to something denser, the container drops below the server-class line, the JVM
silently switches to Serial GC, and the tail latency degrades. The people who chose the size and
the people who feel the latency are often not the same, and the boundary that connects them is
specialist knowledge. Recovering the lost performance has always meant someone who knows to pass
-XX:+UseG1GC and a sensible
-XX:MaxRAMPercentage, and to revisit those flags every time the
container is resized.
Microsoft's jaz, the Azure Command Launcher for Java, is a bet that the runtime
should just handle this. You replace java with
jaz in your launch command, and it derives the tuning from the
container's actual cgroup limits at every start. It is a good promise. Good promises deserve a
rigorous test, so we built one.
2. The Candidate: What jaz Does
jaz sits between your container's start command and the JVM. It
reads the cgroup limits, picks JVM flags it considers appropriate for that envelope, and then
launches java with them. It only tunes when you have not passed
your own tuning flags, so it stays out of the way of a workload that already knows what it wants.
Two switches make it easy to inspect: JAZ_DRY_RUN=1 prints the
exact java command it would run, and
JAZ_BYPASS=1 disables its tuning entirely.
Because JAZ_DRY_RUN exposes the real command, we do not have to
trust documentation. For a 1 GB, 2 vCPU container, here is the difference between what the plain
java default does and what jaz applies.
| Setting | plain java default | jaz |
|---|---|---|
| Garbage collector | Serial GC | G1 GC |
| Max heap | 256 MB (25 percent of RAM) | 732 MB (about 71 percent) |
| Heap sizing | static | adaptive, with free-ratio bounds, time-based sizing, and periodic GC |
| Diagnostics | none | Native Memory Tracking, crash error file |
Two levers stand out. jaz switches the collector from Serial to G1, and it nearly triples the maximum heap, from 256 MB to 732 MB, so the runtime uses much more of the memory the container was given. Whether that translates into real performance, and where, is the whole question.
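For readers who want to capture the dry-run output programmatically, the following is a minimal sketch in Java. It assumes jaz is on the PATH and that the jar is named app.jar, as in the launch command used in this study. The class name is illustrative, and nothing is assumed about the format of what jaz prints.

```java
import java.io.IOException;
import java.util.List;

// Sketch: launch jaz with JAZ_DRY_RUN=1 so it prints the java command
// it would run. Assumption: jaz is installed and on the PATH.
final class JazDryRun {
    // Builds, but does not start, the dry-run launch.
    static ProcessBuilder dryRun(String jarPath) {
        ProcessBuilder pb = new ProcessBuilder(List.of("jaz", "-jar", jarPath));
        pb.environment().put("JAZ_DRY_RUN", "1");
        pb.inheritIO(); // show jaz's output directly on this console
        return pb;
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            int exit = dryRun("app.jar").start().waitFor();
            System.out.println("jaz exited with code " + exit);
        } catch (IOException e) {
            // Typically means jaz was not found on the PATH.
            System.err.println("could not start jaz: " + e.getMessage());
        }
    }
}
```

Running this inside each container cell would give a per-cell record of the flags jaz selects for that envelope.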
3. Hypotheses
We set out to test three claims, framed so the data could reject any of them.
- H1, resource utilization. The default caps the heap at about 25 percent of the container memory, leaving most of it idle. jaz uses more of the available memory and converts that otherwise-idle capacity into useful work.
- H2, throughput and tail latency. Under sustained concurrent load, jaz delivers throughput at or above the default and p99 tail latency at or below the default, in the same resource envelope.
- H3, garbage collection efficiency. jaz lowers GC overhead, measured as total stop-the-world pause time, for the workload.
4. Method
4.1 The Workload
A benchmark is only as good as its workload. We wrote an in-memory digital bank, a Spring Boot 4.1 service on Java 21 using the classic blocking, thread-per-request model that the large majority of Java services still run. Accounts and an append-only transaction ledger live entirely on the heap. Every request simulates a downstream call by parking its thread for a few milliseconds, so the service is I/O- and memory-bound rather than CPU-bound, the profile of a typical microservice rather than a number-crunching job. A warm working set is preloaded at startup so that heap sizing and garbage collection actually matter under load. The endpoints cover opening accounts, deposits, withdrawals, transfers, balance reads, and statement reads.
4.2 The Two Arms
The comparison is a clean A/B. The same container image, the same application, and the same JDK,
run two ways: java -jar app.jar, the default, versus
jaz -jar app.jar. Nothing else changes. The JDK is the Microsoft
Build of OpenJDK 21, whose container image already bundles jaz, so there is no separate install.
4.3 The Memory by CPU Grid
We run the grid that straddles the server-class boundary on purpose. With the default, this is what the JVM ergonomically chooses in each cell.
| | 1 vCPU | 2 vCPU |
|---|---|---|
| 1 GB | Serial GC | Serial GC |
| 2 GB | Serial GC | G1 GC |
Three of the four cells fall into Serial GC with the default, and only the 2 GB, 2 vCPU cell clears the bar for G1. jaz, as we will see, chooses G1 in all four. This grid lets us separate where jaz fixes a poor default from where the default was already fine. Per-cell captures of the exact flags each arm used are committed alongside the results.
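One way to confirm from inside a running service which collector a cell actually got is to ask the JVM's management beans. The sketch below is not part of the study's harness. It relies on the bean names HotSpot reports for Serial ("Copy", "MarkSweepCompact") and for G1 (names starting with "G1"), and the class name is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: report the collector, CPU count, and max heap the JVM settled on.
final class CollectorCheck {
    // Maps HotSpot's GC bean names to a collector family.
    static String classify(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("G1")) {
                return "G1";
            }
        }
        if (beanNames.contains("Copy") || beanNames.contains("MarkSweepCompact")) {
            return "Serial";
        }
        return "other";
    }

    public static void main(String[] args) {
        List<String> names = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
        Runtime rt = Runtime.getRuntime();
        System.out.println("collector beans: " + names + " -> " + classify(names));
        System.out.println("available processors: " + rt.availableProcessors());
        System.out.println("max heap MB: " + rt.maxMemory() / (1024 * 1024));
    }
}
```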
4.4 Bench and Measurement
We did not run this on a laptop. jaz targets the cloud, so the bench is an Azure virtual machine, a
Standard_D4s_v5 with 4 vCPUs and 16 GiB running Docker, with each
arm confined to the cell's cgroup limits. Load is generated on the same machine with k6, about 70
percent reads and 30 percent writes at fixed concurrency, with a warmup phase discarded before a
measurement window. We record throughput and the full latency distribution, reporting p99 as the
tail metric the hypothesis is about, peak and idle memory and CPU from the cgroup, and total GC
pause time from the JVM's unified log. Every cell is run five times and reported as the median. The
harness, the workload, and the raw data are in the companion experiment repository listed in the
references, and a single command reproduces the run.
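The two summary statistics used throughout the results, the median across repetitions and the p99 of a latency distribution, can be sketched as follows. The percentile here uses the nearest-rank definition, which is an assumption: the load generator may interpolate differently. The sample values in main are hypothetical, not measurements from this study.

```java
import java.util.Arrays;

// Sketch of the summary statistics: median of runs and a percentile.
final class RunStats {
    // Middle value of the sorted runs (mean of the two middle values
    // when the count is even).
    static double median(double[] values) {
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        double[] s = samples.clone();
        Arrays.sort(s);
        int rank = (int) Math.ceil(p * s.length / 100.0);
        return s[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration only.
        double[] fiveRuns = {101.0, 99.0, 100.0, 98.0, 102.0};
        double[] latenciesMs = {12.0, 15.0, 11.0, 240.0, 13.0, 14.0, 12.5, 16.0};
        System.out.println("median of runs: " + median(fiveRuns));
        System.out.println("p99 latency ms: " + percentile(latenciesMs, 99));
    }
}
```

Reporting the median of five runs keeps a single noisy repetition from moving the headline number.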
5. Results
Forty runs on the Azure bench, four cells by two launchers by five repetitions. No container ran out of memory, and no request failed with a server error. The default's collector choice matched the ergonomic prediction exactly, Serial GC everywhere except 2 GB / 2 vCPU where it used G1, and jaz used G1 in all four cells. The medians tell a story with a sharp peak and clear limits.
5.1 Tail Latency
Tail latency is where the Serial-versus-G1 difference shows up most violently. Serial GC freezes every request thread at once during a collection, so those stalls land in the tail. On the 1 GB, 2 vCPU cell the default's p99 is 249 ms while jaz's is 39 ms, more than six times better. On 1 GB, 1 vCPU it is 285 ms versus 102 ms. Where the default already uses G1, at 2 GB, 2 vCPU, the two are even, and on 2 GB, 1 vCPU jaz is slightly worse.
p99 tail latency by scenario (milliseconds, lower is better)
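| Scenario | java (default) | jaz |
|---|---|---|
| 1 GB / 1 vCPU | 286 | 102 |
| 1 GB / 2 vCPU | 249 | 39 |
| 2 GB / 1 vCPU | 85 | 100 |
| 2 GB / 2 vCPU | 39 | 38 |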
Figure 1: Serial GC pauses push the default's p99 far into the tail on the small cells, where jaz on G1 stays low.
5.2 Throughput
The throughput picture is more nuanced, and this is where the results diverge the most. On 1 GB, 2 vCPU, jaz processes 36 percent more requests per second than the default, because a bigger heap on G1 with two cores spends far less time paused. On the single-core cells the story splits. At 1 GB it is a tie, but at 2 GB, where the default's Serial GC is not under memory pressure, G1's concurrent machinery competes with the one core and jaz gives up about 20 percent of throughput, which puts the default about 25 percent ahead. Where both already use G1, at 2 GB, 2 vCPU, it is a wash.
Throughput by scenario (requests per second, higher is better)
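| Scenario | java (default) | jaz |
|---|---|---|
| 1 GB / 1 vCPU | 2,692 | 2,698 |
| 1 GB / 2 vCPU | 5,749 | 7,847 |
| 2 GB / 1 vCPU | 3,463 | 2,778 |
| 2 GB / 2 vCPU | 7,830 | 7,915 |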
Figure 2: jaz wins throughput big at 1 GB / 2 vCPU, ties on the larger and the tightest cells, and loses on 2 GB / 1 vCPU.
5.3 Garbage Collection
The garbage collector is the engine behind the latency result, and the numbers are stark. In the cells where the default falls into Serial GC under memory pressure, the 1 GB cells, its total stop-the-world pause time per run is 6 to 10 times higher than jaz's. On 1 GB, 2 vCPU the default spends over 29 seconds of a run paused, roughly a third of the time, while jaz spends under 3. At 2 GB the pause budgets are close, whether the default uses Serial or G1.
Total GC pause per run by scenario (milliseconds, lower is better)
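| Scenario | java (default) | jaz |
|---|---|---|
| 1 GB / 1 vCPU | 15,090 | 2,533 |
| 1 GB / 2 vCPU | 29,217 | 2,805 |
| 2 GB / 1 vCPU | 2,044 | 2,091 |
| 2 GB / 2 vCPU | 2,560 | 2,320 |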
Figure 3: On the small cells the default burns a huge fraction of the run in Serial GC pauses. jaz on G1 stays low across the board.
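Total pause time can be extracted from a unified GC log by summing the durations on its pause lines. The sketch below shows one way to do that and is not necessarily how the study's harness does it. It assumes log lines in the usual -Xlog:gc shape, where a pause line contains the word "Pause" and ends with a duration in milliseconds. The sample lines in main are illustrative, not taken from the study's logs.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: sum stop-the-world pause durations from unified GC log lines.
final class GcPauseSum {
    // A pause line mentions "Pause" and ends with "<number>ms".
    // Concurrent phases also end in "ms" but do not contain "Pause".
    private static final Pattern PAUSE =
            Pattern.compile("\\bPause\\b.*\\s(\\d+(?:\\.\\d+)?)ms\\s*$");

    static double totalPauseMs(List<String> logLines) {
        double total = 0.0;
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative lines only; a real run would read the log file.
        List<String> sample = List.of(
                "[1.204s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.500ms",
                "[2.910s][info][gc] GC(1) Pause Young (Allocation Failure) 69M->10M(247M) 12.250ms",
                "[3.015s][info][gc] GC(2) Concurrent Mark Cycle 40.000ms");
        System.out.println("total pause ms: " + totalPauseMs(sample));
    }
}
```

Only the two pause lines contribute to the sum; the concurrent cycle is skipped because it does not stop application threads for its full duration.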
5.4 Memory
In every cell jaz uses more of the container's memory than the default, and leaves less of it idle. That is the point of the larger heap. But using more memory is only worth something when it buys performance. The throughput chart shows that it does under pressure, at 1 GB, 2 vCPU, and mostly does not when the container is over-provisioned for the working set, in the 2 GB cells where the extra heap sits largely unused.
Peak memory used by scenario (MB, out of the container limit)
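| Scenario | java (default) | jaz |
|---|---|---|
| 1 GB / 1 vCPU | 407 | 501 |
| 1 GB / 2 vCPU | 416 | 530 |
| 2 GB / 1 vCPU | 421 | 705 |
| 2 GB / 2 vCPU | 492 | 750 |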
Figure 4: jaz always claims more of the memory the container was granted. The payoff depends on whether the container was actually under pressure.
5.5 Verdict on the Hypotheses
| Scenario | Metric | java (default) | jaz | Verdict |
|---|---|---|---|---|
| 1 GB / 2 vCPU (Serial vs G1) | throughput (req/s) | 5749 | 7847 | jaz +36 percent |
| | p99 (ms) | 249 | 39 | jaz 6.4x |
| | GC pause (ms) | 29217 | 2805 | jaz 10x less |
| 1 GB / 1 vCPU (Serial vs G1) | throughput (req/s) | 2692 | 2698 | tie |
| | p99 (ms) | 285 | 102 | jaz 2.8x |
| | GC pause (ms) | 15090 | 2533 | jaz 6x less |
| 2 GB / 1 vCPU (Serial vs G1) | throughput (req/s) | 3463 | 2778 | java +25 percent |
| | p99 (ms) | 85 | 100 | java |
| | GC pause (ms) | 2044 | 2091 | tie |
| 2 GB / 2 vCPU (G1 vs G1) | throughput (req/s) | 7830 | 7915 | tie |
| | p99 (ms) | 39 | 38 | tie |
| | GC pause (ms) | 2560 | 2320 | tie |
Read against the three hypotheses:
- H1, resource utilization: confirmed, conditional. jaz used more of the container memory and left less idle in every cell. That converted to useful work only under real memory pressure. At 1 GB, 2 vCPU the larger heap became 36 percent more throughput, while in the 2 GB cells the extra heap sat mostly unused.
- H2, throughput and p99 tail latency: confirmed in three of four cells, not universal. jaz met or beat both throughput and p99 at 1 GB / 1 vCPU, 1 GB / 2 vCPU, and 2 GB / 2 vCPU, by a wide margin at 1 GB / 2 vCPU. It lost both at 2 GB / 1 vCPU, where the default's Serial GC was not under pressure and G1's overhead cost throughput on a single core.
- H3, garbage collection efficiency: confirmed where Serial GC was under memory pressure. In the two 1 GB cells, jaz cut total pause time by 6 to 10 times. In the 2 GB cells, including 2 GB / 1 vCPU where the default still used Serial GC, the two were even. This is the mechanism behind the p99 result.
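The ratios in the verdict column can be re-derived from the medians with simple arithmetic. The sketch below does so for the two cells where the launchers differ most. The class and method names are illustrative, and the inputs are the medians reported in the table.

```java
// Sketch: re-derive the verdict ratios from the reported medians.
final class VerdictMath {
    // Percent change of candidate relative to baseline.
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    // How many times larger the first value is than the second.
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        // 1 GB / 2 vCPU: jaz relative to the default.
        System.out.printf("throughput: %+.1f percent%n", percentChange(5749, 7847));
        System.out.printf("p99: %.1fx lower%n", ratio(249, 39));
        System.out.printf("GC pause: %.1fx lower%n", ratio(29217, 2805));

        // 2 GB / 1 vCPU: the same gap, measured from each side.
        System.out.printf("default over jaz: %+.1f percent%n", percentChange(2778, 3463));
        System.out.printf("jaz under default: %+.1f percent%n", percentChange(3463, 2778));
    }
}
```

The last two lines show why the direction of a comparison matters: the default being about 25 percent ahead of jaz is the same gap as jaz being about 20 percent behind the default.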
6. What This Means in Practice
The numbers translate into a few situations a team meets on a normal week.
6.1 Resizing Without Re-Tuning
Container sizes change. Cost reviews, capacity planning, and autoscalers all move the memory and CPU a workload gets, while JVM flags are set once and forgotten, if they were ever set. Trimming a service from 2 GB to 1 GB to save money can silently drop the default from G1 to Serial GC and wreck the tail, with no code change and no error to point at. jaz re-derives the tuning from the cgroup at every start, so the resize does not quietly change the runtime's behavior.
6.2 Not Needing a JVM Specialist
The server-class boundary is niche knowledge. A Spring Boot service in a 1 GB, 2 vCPU container runs on Serial GC and suffers, and the team rarely knows why. jaz encodes the expertise. The 36 percent throughput gain and the six-times better p99 in that cell arrive without anyone reaching for a garbage collector flag.
6.3 Pod Density on Kubernetes
Density is cost. To pack more pods onto a node you shrink each one's request, but shrinking a Java pod with the default runs straight into the boundary, latency degrades, and teams over-provision to be safe, which defeats the point. jaz makes the small container perform, so Java workloads can run in tighter envelopes without the latency penalty. More pods per node, a smaller bill.
6.4 Consistency Across Environments
A laptop with eight cores, a staging box with two, and a production envelope that varies will make the default behave differently in each, the classic works-on-my-machine gap. jaz used G1 in every cell we tested, giving predictable garbage collection regardless of the envelope.
7. Conclusion
jaz is not a universal win. It loses on a single core with memory to spare, and it is a wash where the JVM's default was already going to pick G1. What it does, reliably, is remove a silent and common failure mode, the small multi-core container where the default drops into Serial GC and the tail latency quietly falls apart. In that cell, the one density pushes teams toward, replacing one word in the launch command bought 36 percent more throughput, a p99 more than six times better, and an order of magnitude less time paused for garbage collection.
The value is not that jaz always beats a JVM expert. A specialist who profiles the workload can match it by hand, or beat it. The value is that you do not have to be that specialist, and you do not have to remember to re-tune every time the container is resized. For a team that right-sizes Java workloads for cost and density, that is a real and cheap win, with eyes open about the two cells where it is not.
References
- Microsoft. (2026). About the Azure Command Launcher for Java. Microsoft Learn. Retrieved from https://learn.microsoft.com/en-us/java/jaz/overview
- Microsoft. (2026). Frequently Asked Questions about the Azure Command Launcher for Java. Microsoft Learn. Retrieved from https://learn.microsoft.com/en-us/java/jaz/faq
- Microsoft. (2026). Install the Azure Command Launcher for Java. Microsoft Learn. Retrieved from https://learn.microsoft.com/en-us/java/jaz/install
- Oracle. (2026). HotSpot Virtual Machine Garbage Collection Tuning Guide: Ergonomics. Java Platform, Standard Edition 21. Retrieved from https://docs.oracle.com/en/java/javase/21/gctuning/ergonomics.html
- TM Dev Lab. (2026). jaz vs java: cloud JVM defaults on an I/O- and memory-bound workload. Companion experiment repository. Retrieved from https://github.com/tm-dev-lab/tm-dev-lab-experiments/tree/main/jaz-cloud-jvm-tuning