Experiment | July 1, 2026

The Cloud Capacity Your JVM Leaves Idle: A Rigorous Test of Microsoft's jaz

In the cloud, every GB and every core shows up on the bill, and the bill shrinks when you pack more services onto a node. Yet out of the box, the Java runtime tends to leave part of that capacity idle and deliver less than the environment allows, precisely in the smaller containers that density calls for. Reclaiming it has usually required a JVM tuning specialist. Microsoft's jaz, the Azure Command Launcher for Java, promises to do that adjustment for you. This study tests whether, and where, it delivers.

Thiago Mendes, Researcher at TM Dev Lab

Abstract

jaz, the Azure Command Launcher for Java, is a drop-in replacement for the java command that reads a container's cgroup limits and applies cloud-appropriate JVM tuning (heap sizing, garbage collector choice, and diagnostics) with no manual configuration. This study asks a simple question and answers it with measurements. For a realistic, I/O- and memory-bound Java microservice, does replacing java with jaz actually beat the default, and by how much?

We ran a controlled A/B benchmark on an Azure virtual machine across a 2 by 2 grid of container sizes, 1 or 2 GB of memory by 1 or 2 vCPUs, that deliberately straddles the JVM's own ergonomic boundary for choosing a garbage collector. The headline result is concentrated in one cell. Where the default falls into the single-threaded Serial collector on a multi-core container (1 GB, 2 vCPU), jaz raises throughput by 36 percent and cuts p99 tail latency more than sixfold. Where the default already makes a good choice, jaz is a wash, and on a single core with roomy memory it can cost throughput. The value of jaz is not that it always wins, but that it removes the costliest default for the container automatically, without a specialist.

Keywords: jaz, Azure Command Launcher for Java, JVM tuning, G1 GC, Serial GC, container ergonomics, Kubernetes density, Spring Boot, p99 latency, cloud cost

1. The Capacity You Already Pay For

In the cloud, every gigabyte and every core is a line on an invoice, and the invoice shrinks when you pack more services onto each node. Density is one of the most direct cost levers a platform team has. To get it, you shrink the resource request of each workload so more of them fit.

Java complicates that lever in a way most teams never see. The HotSpot JVM chooses two of its most consequential defaults, the garbage collector and the maximum heap, from the machine it thinks it is running on. It only selects the balanced G1 collector when the environment looks "server-class", which it defines as at least two available processors and at least about 1792 MB of memory. Below either threshold it falls back to the single-threaded, stop-the-world Serial collector. Independently of the collector, for containers of the sizes tested here it caps the maximum heap at 25 percent of the container memory, the default value of -XX:MaxRAMPercentage. Both are reasonable choices for a small command-line utility. Neither is a good fit for a latency-sensitive service in a small container, which is exactly the shape density asks for.
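To make the rule concrete, here is a minimal Java model of the two defaults just described. It is a simplification under stated assumptions, not HotSpot's actual code: it models only the two-processor and roughly 1792 MB thresholds and the 25 percent heap default quoted above, and it ignores other ergonomic inputs, such as the larger heap share HotSpot gives very small containers. The class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical, simplified model of HotSpot's default ergonomics as described in the text.
 * Not HotSpot source: only the thresholds quoted above are modeled.
 */
public class ServerClassModel {

    // Memory threshold for "server-class", in MB, as quoted in the text (about 2 GB).
    static final long SERVER_CLASS_MIN_MEMORY_MB = 1792;
    // Processor threshold for "server-class".
    static final int SERVER_CLASS_MIN_PROCESSORS = 2;
    // Default value of -XX:MaxRAMPercentage.
    static final double DEFAULT_MAX_RAM_PERCENTAGE = 25.0;

    /** Collector the default ergonomics would pick, given what the container exposes. */
    static String defaultCollector(int availableProcessors, long memoryMb) {
        boolean serverClass = availableProcessors >= SERVER_CLASS_MIN_PROCESSORS
                && memoryMb >= SERVER_CLASS_MIN_MEMORY_MB;
        return serverClass ? "G1" : "Serial";
    }

    /** Default maximum heap in MB, valid for container sizes like those tested (not tiny ones). */
    static long defaultMaxHeapMb(long memoryMb) {
        return Math.round(memoryMb * DEFAULT_MAX_RAM_PERCENTAGE / 100.0);
    }

    public static void main(String[] args) {
        // Walk the 2 by 2 grid used in the study.
        for (long memoryMb : new long[] {1024, 2048}) {
            for (int cpus : new int[] {1, 2}) {
                System.out.printf(Locale.ROOT, "%d MB / %d vCPU -> %s GC, max heap %d MB%n",
                        memoryMb, cpus, defaultCollector(cpus, memoryMb), defaultMaxHeapMb(memoryMb));
            }
        }
    }
}
```

Applied to the four cells of the grid, this model reproduces the default column of Table 2 and the 256 MB default heap in Table 1.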

The failure mode is quiet. Nothing errors. A team trims a Spring Boot service from a generous footprint down to something denser, the container drops below the server-class line, the JVM silently switches to Serial GC, and the tail latency degrades. The people who chose the size and the people who feel the latency are often not the same, and the boundary that connects them is specialist knowledge. Recovering the lost performance has meant having someone who knows to pass -XX:+UseG1GC and a sensible -XX:MaxRAMPercentage, and who remembers to revisit those flags every time the container is resized.
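Because nothing errors, one cheap defense is to have the service report what the JVM actually chose. The sketch below uses the standard java.lang.management API and assumes HotSpot's usual GC MXBean names, for example "Copy" and "MarkSweepCompact" for Serial and names starting with "G1 " for G1. It maps those names to a collector family and prints it with the maximum heap and the processor count the JVM sees. The class name and the mapping table are illustrative, not part of any library.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints which collector family this JVM is running, so a silent switch to Serial is visible. */
public class CollectorReport {

    /** Maps HotSpot GC MXBean names to a collector family; "unknown" if nothing matches. */
    static String collectorFamily(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("G1 ")) return "G1";
            if (name.equals("Copy") || name.equals("MarkSweepCompact")) return "Serial";
            if (name.startsWith("PS ")) return "Parallel";
            if (name.startsWith("ZGC")) return "ZGC";
            if (name.startsWith("Shenandoah")) return "Shenandoah";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("collector=" + collectorFamily(names)
                + " beans=" + names
                + " maxHeapMB=" + maxHeapMb
                + " cpus=" + Runtime.getRuntime().availableProcessors());
    }
}
```

Logging a line like this at startup turns a resize that flips a service from G1 to Serial into something visible in the first log line rather than only in a latency dashboard.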

Microsoft's jaz, the Azure Command Launcher for Java, is a bet that the runtime should just handle this. You replace java with jaz in your launch command, and it derives the tuning from the container's actual cgroup limits at every start. It is a good promise. Good promises deserve a rigorous test, so we built one.

2. The Candidate: What jaz Does

jaz sits between your container's start command and the JVM. It reads the cgroup limits, picks JVM flags it considers appropriate for that envelope, and then launches java with them. It only tunes when you have not passed your own tuning flags, so it stays out of the way of a workload that already knows what it wants. Two switches make it easy to inspect: JAZ_DRY_RUN=1 prints the exact java command it would run, and JAZ_BYPASS=1 disables its tuning entirely.
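Assuming, as the VAR=1 form suggests, that JAZ_DRY_RUN is read from the environment, the dry run can also be driven from code, for example in a test that records the flags per container size. The sketch below builds a jaz invocation with JAZ_DRY_RUN=1 using the standard ProcessBuilder API and prints whatever jaz reports. It assumes jaz is on the PATH, and since we are not certain which output stream jaz prints the command to, it merges stderr into stdout. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: asks jaz which java command it would run, via JAZ_DRY_RUN=1. */
public class JazDryRun {

    /** Builds (but does not start) a dry-run invocation of jaz for the given jar. */
    static ProcessBuilder dryRun(String jarPath) {
        ProcessBuilder pb = new ProcessBuilder("jaz", "-jar", jarPath);
        pb.environment().put("JAZ_DRY_RUN", "1");
        pb.redirectErrorStream(true); // merge stderr in case the command is printed there
        return pb;
    }

    public static void main(String[] args) throws InterruptedException {
        String jar = args.length > 0 ? args[0] : "app.jar";
        try {
            Process process = dryRun(jar).start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                reader.lines().forEach(System.out::println);
            }
            System.out.println("exit=" + process.waitFor());
        } catch (IOException e) {
            System.err.println("could not start jaz (is it on the PATH?): " + e.getMessage());
        }
    }
}
```

Run inside the container whose limits you care about, for example with `java JazDryRun.java app.jar`, so that jaz sees the same cgroup limits as the real service.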

Because JAZ_DRY_RUN exposes the real command, we do not have to trust documentation. For a 1 GB, 2 vCPU container, here is the difference between what the plain java default does and what jaz applies.

Table 1: Plain java default versus jaz for a 1 GB / 2 vCPU container
Setting	plain java default	jaz
Garbage collector	Serial GC	G1 GC
Max heap	256 MB (25 percent of RAM)	732 MB (about 71 percent)
Heap sizing	static	adaptive, with free-ratio bounds, time-based sizing, and periodic GC
Diagnostics	none	Native Memory Tracking, crash error file

Two levers stand out. jaz switches the collector from Serial to G1, and it nearly triples the maximum heap, from 256 MB to 732 MB, so the runtime can use much more of the memory the container was given. Whether that translates into real performance, and where, is the whole question.
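One way to check how much of the container a running JVM is allowed to claim is to compare Runtime.maxMemory() with the container's memory limit. The sketch below assumes cgroup v2, where the limit is exposed in /sys/fs/cgroup/memory.max as a byte count or the literal "max"; cgroup v1 uses a different path and is not handled. Runtime.maxMemory() can report slightly less than the configured maximum heap, so treat the percentage as approximate. The class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.OptionalLong;

/** Compares the JVM's maximum heap with the container memory limit (cgroup v2 only). */
public class HeapShare {

    /** Parses cgroup v2 memory.max contents: a byte count, or "max" meaning no limit. */
    static OptionalLong parseMemoryMax(String contents) {
        String value = contents.trim();
        if (value.equals("max")) return OptionalLong.empty();
        return OptionalLong.of(Long.parseLong(value));
    }

    /** Maximum heap as a percentage of the container limit. */
    static double heapPercentOfLimit(long maxHeapBytes, long limitBytes) {
        return 100.0 * maxHeapBytes / limitBytes;
    }

    public static void main(String[] args) throws IOException {
        Path memoryMax = Path.of("/sys/fs/cgroup/memory.max"); // cgroup v2 location
        if (!Files.exists(memoryMax)) {
            System.out.println("no cgroup v2 memory.max found");
            return;
        }
        OptionalLong limit = parseMemoryMax(Files.readString(memoryMax));
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (limit.isEmpty()) {
            System.out.println("no memory limit; max heap " + maxHeap / (1024 * 1024) + " MB");
        } else {
            System.out.printf(Locale.ROOT, "max heap %d MB = %.1f%% of the %d MB limit%n",
                    maxHeap / (1024 * 1024),
                    heapPercentOfLimit(maxHeap, limit.getAsLong()),
                    limit.getAsLong() / (1024 * 1024));
        }
    }
}
```

For the 732 MB heap in Table 1 against a 1024 MB limit, heapPercentOfLimit returns about 71.5, consistent with the "about 71 percent" in the table.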

3. Hypotheses

We set out to test three claims, framed so the data could reject any of them.

H1, resource utilization. The default caps the heap at about 25 percent of the container memory, leaving most of it idle. jaz uses more of the available memory and converts that otherwise-idle capacity into useful work.
H2, throughput and tail latency. Under sustained concurrent load, jaz delivers throughput at or above the default and p99 tail latency at or below the default, in the same resource envelope.
H3, garbage collection efficiency. jaz lowers GC overhead, measured as total stop-the-world pause time, for the workload.

4. Method

4.1 The Workload

A benchmark is only as good as its workload. We wrote an in-memory digital bank, a Spring Boot 4.1 service on Java 21 using the classic blocking, thread-per-request model that many production Java services still run. Accounts and an append-only transaction ledger live entirely on the heap. Every request simulates a downstream call by parking its thread for a few milliseconds, so the service is I/O- and memory-bound rather than CPU-bound, the profile of a typical microservice rather than a number-crunching job. A warm working set is preloaded at startup so that heap sizing and garbage collection actually matter under load. The endpoints cover opening accounts, deposits, withdrawals, transfers, balance reads, and statement reads.
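The shape of that workload can be sketched in plain Java. This is a hypothetical, stripped-down illustration, not the service in the companion repository: it drops Spring Boot, and the class, record, and method names and the sleep-based downstream call are our assumptions. What it keeps is the property the benchmark depends on: all state lives on the heap, the ledger only grows, and each request parks its thread rather than burning CPU.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the benchmark's workload: heap-resident state, simulated I/O latency. */
public class InMemoryBank {

    /** One immutable ledger entry; the ledger is append-only. */
    record Txn(String from, String to, long amountCents, long timestampMillis) {}

    private final Map<String, Long> balances = new ConcurrentHashMap<>();
    private final List<Txn> ledger = new ArrayList<>(); // guarded by 'this'
    private final long downstreamMillis;

    InMemoryBank(long downstreamMillis) {
        this.downstreamMillis = downstreamMillis;
    }

    /** Parks the calling thread to stand in for a downstream call (I/O-bound, not CPU-bound). */
    private void simulateDownstreamCall() {
        try {
            Thread.sleep(downstreamMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void openAccount(String id, long initialCents) {
        balances.putIfAbsent(id, initialCents);
    }

    /** Lock-free read of one account; it may observe a transfer between debit and credit. */
    long balance(String id) {
        simulateDownstreamCall();
        return balances.getOrDefault(id, 0L);
    }

    /** Moves money and appends to the ledger; returns false on insufficient funds. */
    boolean transfer(String from, String to, long amountCents) {
        simulateDownstreamCall(); // outside the lock, so parked threads do not serialize
        synchronized (this) {
            long fromBalance = balances.getOrDefault(from, 0L);
            if (fromBalance < amountCents) return false;
            balances.put(from, fromBalance - amountCents);
            balances.merge(to, amountCents, Long::sum);
            ledger.add(new Txn(from, to, amountCents, System.currentTimeMillis()));
            return true;
        }
    }

    synchronized int ledgerSize() {
        return ledger.size();
    }

    public static void main(String[] args) {
        InMemoryBank bank = new InMemoryBank(2); // 2 ms simulated downstream latency
        bank.openAccount("alice", 10_000);
        bank.openAccount("bob", 0);
        bank.transfer("alice", "bob", 2_500);
        System.out.println(bank.balance("alice") + " " + bank.balance("bob") + " " + bank.ledgerSize());
    }
}
```

Because the ledger only grows and every request holds its thread while parked, both a heap that is too small and a collector that stops all threads at once show up directly in request latency, which is the effect the grid is designed to expose.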

4.2 The Two Arms

The comparison is a clean A/B. The same container image, the same application, and the same JDK, run two ways: java -jar app.jar, the default, versus jaz -jar app.jar. Nothing else changes. The JDK is the Microsoft Build of OpenJDK 21, whose container image already bundles jaz, so there is no separate install.

4.3 The Memory by CPU Grid

The grid deliberately straddles the server-class boundary. Table 2 shows what the JVM ergonomically chooses in each cell with the default launcher.

Table 2: The garbage collector the plain java default selects per cell
Memory	1 vCPU	2 vCPU
1 GB	Serial GC	Serial GC
2 GB	Serial GC	G1 GC

Three of the four cells fall into Serial GC with the default, and only the 2 GB, 2 vCPU cell clears the bar for G1. jaz, as we will see, chooses G1 in all four. This grid lets us separate where jaz fixes a poor default from where the default was already fine. Per-cell captures of the exact flags each arm used are committed alongside the results.

4.4 Bench and Measurement

We did not run this on a laptop. jaz targets the cloud, so the bench is an Azure virtual machine, a Standard_D4s_v5 with 4 vCPUs and 16 GiB running Docker, with each arm confined to the cell's cgroup limits. Load is generated on the same machine with k6, about 70 percent reads and 30 percent writes at fixed concurrency, with a warmup phase discarded before each measurement window. We record four things: throughput; the full latency distribution, reporting p99 as the tail metric that H2 is about; peak and idle memory and CPU from the cgroup; and total GC pause time from the JVM's unified GC log. Every cell is run five times and reported as the median. The harness, the workload, and the raw data are in the companion experiment repository listed in the references, and a single command reproduces the run.
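The two summary steps, a p99 within each run and a median across the five runs, are simple enough to state exactly. The sketch below uses the nearest-rank definition of a percentile; k6 and other tools may interpolate between samples instead, so small differences from tool output are expected. The class name and the sample values are illustrative, not the study's data.

```java
import java.util.Arrays;

/** Hypothetical helpers for the two summary steps: p99 per run, median across runs. */
public class RunStats {

    /** Nearest-rank percentile for 0 < p <= 100; the input array is not modified. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Median; averages the two middle values when the count is even. */
    static double median(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double[] latenciesMs = new double[1000];
        for (int i = 0; i < latenciesMs.length; i++) latenciesMs[i] = i + 1; // 1..1000 ms
        System.out.println("p99 = " + percentile(latenciesMs, 99)); // 990.0
        double[] perRunValues = {10, 30, 20, 50, 40}; // made-up values for five runs
        System.out.println("median = " + median(perRunValues)); // 30.0
    }
}
```

Taking the median of five runs, rather than the mean, keeps one unusually noisy run on a shared VM from moving the reported number.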

5. Results

Forty runs on the Azure bench, four cells by two launchers by five repetitions. No container ran out of memory, and no request failed with a server error. The default's collector choice matched the ergonomic prediction exactly, Serial GC everywhere except 2 GB / 2 vCPU where it used G1, and jaz used G1 in all four cells. The medians tell a story with a sharp peak and clear limits.

5.1 Tail Latency

Tail latency is where the Serial-versus-G1 difference shows up most sharply. Serial GC stops every application thread for the whole collection, so each stall lands directly in the tail. On the 1 GB, 2 vCPU cell the default's p99 is 249 ms while jaz's is 39 ms, more than six times lower. On 1 GB, 1 vCPU it is 285 ms versus 102 ms. Where the default already uses G1, at 2 GB, 2 vCPU, the two are even (39 ms versus 38 ms), and on 2 GB, 1 vCPU jaz is worse, 100 ms versus 85 ms.

p99 tail latency by scenario (milliseconds, lower is better)
Scenario	java (default)	jaz
1 GB / 1 vCPU	286	102
1 GB / 2 vCPU	249	39
2 GB / 1 vCPU	85	100
2 GB / 2 vCPU	39	38

Figure 1: Serial GC pauses push the default's p99 far into the tail on the small cells, where jaz on G1 stays low.

5.2 Throughput

The throughput picture is more nuanced, and this is where the cells diverge the most. On 1 GB, 2 vCPU, jaz processes 36 percent more requests per second than the default, most plausibly because a larger heap on G1 with two cores spends far less time paused. On the single-core cells the story splits. At 1 GB it is a tie, but at 2 GB, where the default's Serial GC is not under memory pressure, G1's concurrent work likely competes with the application for the single core, and jaz delivers about 20 percent less throughput (equivalently, the default is about 25 percent faster). Where both already use G1, at 2 GB, 2 vCPU, it is a wash.

Throughput by scenario (requests per second, higher is better)
Scenario	java (default)	jaz
1 GB / 1 vCPU	2,692	2,698
1 GB / 2 vCPU	5,749	7,847
2 GB / 1 vCPU	3,463	2,778
2 GB / 2 vCPU	7,830	7,915

Figure 2: jaz wins throughput by a wide margin at 1 GB / 2 vCPU, ties on the largest (2 GB / 2 vCPU) and the tightest (1 GB / 1 vCPU) cells, and loses on 2 GB / 1 vCPU.

5.3 Garbage Collection

The garbage collector is the engine behind the latency result, and the numbers are stark. In the two 1 GB cells, where the default falls into Serial GC under memory pressure, its total stop-the-world pause time per run is 6 to 10 times higher than jaz's. On 1 GB, 2 vCPU the default spends over 29 seconds of a run paused, roughly a third of the time, while jaz spends under 3. At 2 GB the pause budgets are close, both where the default still uses Serial GC (1 vCPU) and where both arms use G1 (2 vCPU).
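The totals in this section are sums of stop-the-world pause events from the JVM's unified GC log. A minimal way to compute such a total is sketched below, assuming the default -Xlog:gc format, in which each pause line contains "Pause" and ends with its duration in milliseconds, while concurrent phases such as "Concurrent Mark Cycle" do not contain "Pause". It is not the companion repository's parser, and with richer tag sets such as -Xlog:gc* other lines may need filtering first. The class name is illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: sums stop-the-world pause durations from -Xlog:gc output. */
public class GcPauseTotal {

    // Matches e.g. "GC(3) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms"
    // and captures the trailing duration. Lines without "Pause" (concurrent phases) are skipped.
    private static final Pattern PAUSE =
            Pattern.compile("\\bPause\\b.*?(\\d+(?:\\.\\d+)?)ms\\s*$");

    static double totalPauseMillis(List<String> logLines) {
        double total = 0;
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GcPauseTotal.java gc.log
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.printf(Locale.ROOT, "total pause: %.1f ms%n", totalPauseMillis(lines));
    }
}
```

Counting only pause lines matters for G1, whose concurrent marking runs alongside the application and would inflate the total if its cycle durations were included.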

Total GC pause per run by scenario (milliseconds, lower is better)
Scenario	java (default)	jaz
1 GB / 1 vCPU	15,090	2,533
1 GB / 2 vCPU	29,217	2,805
2 GB / 1 vCPU	2,044	2,091
2 GB / 2 vCPU	2,560	2,320

Figure 3: On the small cells the default burns a huge fraction of the run in Serial GC pauses. jaz on G1 stays low across the board.

5.4 Memory

In every cell jaz uses more of the container's memory than the default and leaves less of it idle. That is the point of the larger heap. But using more memory is only worth something when it buys performance. The throughput results show that it does under memory pressure, at 1 GB, 2 vCPU, and mostly does not when the container is over-provisioned for the working set, as in the 2 GB cells, where the extra heap sits largely unused.

Peak memory used by scenario (MB, out of the container limit)
Scenario	java (default)	jaz
1 GB / 1 vCPU	407	501
1 GB / 2 vCPU	416	530
2 GB / 1 vCPU	421	705
2 GB / 2 vCPU	492	750

Figure 4: jaz always claims more of the memory the container was granted. The payoff depends on whether the container was actually under pressure.

5.5 Verdict on the Hypotheses

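Table 3: Results per cell, median of five runs. Throughput in req/s; p99 latency and total GC pause in ms.
Scenario	Collectors (default vs jaz)	Metric	java (default)	jaz	Verdict
1 GB / 2 vCPU	Serial vs G1	throughput	5749	7847	jaz +36 percent
1 GB / 2 vCPU	Serial vs G1	p99	249	39	jaz 6.4x lower
1 GB / 2 vCPU	Serial vs G1	GC pause	29217	2805	jaz 10x less
1 GB / 1 vCPU	Serial vs G1	throughput	2692	2698	tie
1 GB / 1 vCPU	Serial vs G1	p99	285	102	jaz 2.8x lower
1 GB / 1 vCPU	Serial vs G1	GC pause	15090	2533	jaz 6x less
2 GB / 1 vCPU	Serial vs G1	throughput	3463	2778	java +25 percent
2 GB / 1 vCPU	Serial vs G1	p99	85	100	java 15 ms lower
2 GB / 1 vCPU	Serial vs G1	GC pause	2044	2091	tie
2 GB / 2 vCPU	G1 vs G1	throughput	7830	7915	tie
2 GB / 2 vCPU	G1 vs G1	p99	39	38	tie
2 GB / 2 vCPU	G1 vs G1	GC pause	2560	2320	tie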
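The verdict column compares medians, and the direction of a percentage matters. The short sketch below recomputes the main ratios from the medians in Table 3. It shows that the 2 GB / 1 vCPU gap is the same 685 req/s whether it is read as the default being about 25 percent faster or as jaz being about 20 percent slower. The class and method names are illustrative.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes relative differences from the Table 3 medians. */
public class RelativeChange {

    /** Percent change of 'value' relative to 'baseline'. */
    static double percentChange(double baseline, double value) {
        return 100.0 * (value - baseline) / baseline;
    }

    /** How many times smaller 'value' is than 'baseline' (for p99 or GC pause). */
    static double timesLower(double baseline, double value) {
        return baseline / value;
    }

    public static void main(String[] args) {
        // 1 GB / 2 vCPU throughput: jaz relative to the default.
        System.out.printf(Locale.ROOT, "%.1f%%%n", percentChange(5749, 7847)); // 36.5%
        // 2 GB / 1 vCPU throughput: the same gap seen from each side.
        System.out.printf(Locale.ROOT, "%.1f%%%n", percentChange(2778, 3463)); // 24.7%
        System.out.printf(Locale.ROOT, "%.1f%%%n", percentChange(3463, 2778)); // -19.8%
        // 1 GB / 2 vCPU: p99 and total GC pause, default divided by jaz.
        System.out.printf(Locale.ROOT, "%.1fx %.1fx%n",
                timesLower(249, 39), timesLower(29217, 2805)); // 6.4x 10.4x
    }
}
```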

Read against the three hypotheses:

H1, resource utilization: confirmed, with a condition. jaz used more of the container memory and left less idle in every cell, but that converted to useful work only under real memory pressure. At 1 GB, 2 vCPU the larger heap became 36 percent more throughput, while in the 2 GB cells the extra heap sat mostly unused.
H2, throughput and p99 tail latency: confirmed in three of four cells, not universal. jaz met or beat the default on both throughput and p99 at 1 GB / 1 vCPU, 1 GB / 2 vCPU, and 2 GB / 2 vCPU, by a wide margin at 1 GB / 2 vCPU. It lost on both at 2 GB / 1 vCPU, where the default's Serial GC was not under pressure and G1's overhead cost throughput on a single core.
H3, garbage collection efficiency: confirmed under memory pressure. In the two 1 GB cells, where the default fell into Serial GC, jaz cut total pause time by 6 to 10 times. At 2 GB the two were even, whether the default used Serial GC (1 vCPU) or G1 (2 vCPU). This is the mechanism behind the p99 result.

6. What This Means in Practice

The numbers translate into a few situations a team meets on a normal week.

6.1 Resizing Without Re-Tuning

Container sizes change. Cost reviews, capacity planning, and autoscalers all move the memory and CPU a workload gets, while JVM flags are set once and forgotten, if they were ever set. Trimming a 2 vCPU service from 2 GB to 1 GB to save money silently drops the default from G1 to Serial GC and can wreck the tail, with no code change and no error to point at. jaz re-derives its tuning from the cgroup at every start, so a resize does not quietly downgrade the collector.

6.2 Not Needing a JVM Specialist

The server-class boundary is niche knowledge. A Spring Boot service in a 1 GB, 2 vCPU container runs on Serial GC by default and suffers, and the team rarely knows why. jaz encodes the expertise. In that cell, the 36 percent throughput gain and the more than sixfold lower p99 arrived without anyone reaching for a garbage collector flag.

6.3 Pod Density on Kubernetes

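Density is cost. To pack more pods onto a node you shrink each one's request, but shrinking a Java pod with the default runs straight into the server-class boundary, latency degrades, and teams over-provision to be safe, which defeats the point. jaz makes the small container perform: in our grid, jaz at 1 GB / 2 vCPU matched the default at 2 GB / 2 vCPU (7,847 versus 7,830 req/s, and a p99 of 39 ms in both) on half the memory. For workloads like this one, that means Java services can run in tighter envelopes without the latency penalty: more pods per node, a smaller bill.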

6.4 Consistency Across Environments

A laptop with eight cores, a staging box with two, and a production envelope that varies will make the default behave differently in each, the classic works-on-my-machine gap. jaz used G1 in every cell we tested, giving predictable garbage collection regardless of the envelope.

Where jaz is not the answer. When a container is over-provisioned for its working set, the default is already fine and jaz adds little. On a single core with roomy memory, jaz's G1 overhead can cost throughput, and the default's Serial GC is competitive. The sweet spot is the small to moderate, multi-core container, a common shape for a microservice on Kubernetes.

7. Conclusion

jaz is not a universal win. It loses on a single core with memory to spare, and it is a wash where the JVM's default was already going to pick G1. What it does, reliably, is remove a silent and common failure mode, the small multi-core container where the default drops into Serial GC and the tail latency quietly falls apart. In that cell, the one density pushes teams toward, replacing one word in the launch command bought 36 percent more throughput, a p99 more than six times lower, and an order of magnitude less time paused for garbage collection.

The value is not that jaz always beats a JVM expert. A specialist who profiles the workload can match it by hand, or beat it. The value is that you do not have to be that specialist, and you do not have to remember to re-tune every time the container is resized. For a team that right-sizes Java workloads for cost and density, that is a real and cheap win, with eyes open about the two cells where it is not.

Reproduce it. The workload, the run harness, the raw data, and the environment capture are in the companion experiment repository. jaz is in public preview, so these numbers are a version-pinned snapshot, jaz 0.0.0-preview from June 2026 on the Microsoft Build of OpenJDK 21. A single command reproduces the full grid.

References

Microsoft. (2026). About the Azure Command Launcher for Java. Microsoft Learn. Retrieved from https://learn.microsoft.com/en-us/java/jaz/overview
Microsoft. (2026). Frequently Asked Questions about the Azure Command Launcher for Java. Microsoft Learn. Retrieved from https://learn.microsoft.com/en-us/java/jaz/faq
Microsoft. (2026). Install the Azure Command Launcher for Java. Microsoft Learn. Retrieved from https://learn.microsoft.com/en-us/java/jaz/install
Oracle. (2026). HotSpot Virtual Machine Garbage Collection Tuning Guide: Ergonomics. Java Platform, Standard Edition 21. Retrieved from https://docs.oracle.com/en/java/javase/21/gctuning/ergonomics.html
TM Dev Lab. (2026). jaz vs java: cloud JVM defaults on an I/O- and memory-bound workload. Companion experiment repository. Retrieved from https://github.com/tm-dev-lab/tm-dev-lab-experiments/tree/main/jaz-cloud-jvm-tuning
