Inside the JVM: The Engineering Behind Enterprise Performance
How the JVM delivers predictable performance at scale. A systematic look at the internals that make it the runtime of choice for high-concurrency enterprise applications.
Why Should You Care?
In a landscape of native binaries that start in milliseconds and interpreted languages that prioritize developer velocity, the JVM chose a third path: an adaptive runtime that pays a cost at startup and earns it back, with interest, in sustained production workloads.
Your Java code is not simply "interpreted" or "compiled." It is interpreted first, then compiled with lightweight optimizations while the JVM observes how your application actually behaves, then recompiled with aggressive speculative optimizations based on that observed behavior, then deoptimized back to the interpreter when those speculations turn out wrong, then recompiled again with better data. This cycle runs continuously while your application serves traffic. AOT-compiled runtimes produce native code at build time, and that code is final. The JVM produces native code that evolves.
This is a runtime that ships five production garbage collectors, including two that deliver sub-millisecond pauses on terabyte-scale heaps. A runtime that implements virtual threads with native M:N scheduling, where millions of concurrent tasks share a handful of OS threads. A runtime with a built-in diagnostic engine (JFR) that records 150+ event types with less than 1% overhead. All of this implemented in roughly 1.2 million lines of C++ in the OpenJDK repository.
These capabilities come at a price. The JVM consumes more memory at baseline than a statically compiled binary. It needs time to reach peak performance. Its operational surface is larger. This study does not hide those trade-offs. It explains them, shows where they come from in the source code, and describes how the OpenJDK community is actively addressing each one.
This study opens the box. It walks through every major subsystem of the JVM, explains what it does, how it works, and points to the exact C++ source files where each mechanism is implemented. Along the way, it draws comparisons with other runtime models to give you a clear picture of what the JVM's adaptive architecture buys and what it costs. The goal is not to turn you into a JVM contributor. It is to give you the mental model that separates an engineer who writes code that runs on the JVM from one who understands why it runs the way it does.
1. Overview: The Major Building Blocks
What and why
The HotSpot JVM is the virtual machine at the core of OpenJDK, the reference implementation of the Java platform. It runs your Java (and Kotlin, Scala, Groovy, Clojure) code. The name HotSpot comes from its core strategy: identify frequently executed code paths and compile them to native machine code at runtime.
HotSpot is composed of five subsystems that cooperate with each other:
| Subsystem | Responsibility |
|---|---|
| Class Loading | Find, load, verify, and prepare classes for execution |
| Execution Engine | Execute bytecode through interpretation and JIT compilation |
| Memory Management | Allocate objects, manage metadata, and reclaim memory (GC) |
| Runtime | Manage threads, synchronization, JNI, and cross-subsystem coordination |
| Diagnostics | Instrumentation, profiling, and monitoring (JFR, JVMTI) |
How it works: the lifecycle of a program
When you run java MyApp, the following happens:
flowchart TD
A["java MyApp"] --> B["Launcher creates JVM"]
B --> C["ClassLoader loads MyApp.class"]
C --> D["Interpreter executes main"]
D --> E{"Hot method?"}
E -->|No| D
E -->|Yes| F["JIT compiles to native code"]
F --> G["Optimized native execution"]
G --> H{"Assumption invalidated?"}
H -->|No| G
H -->|Yes| D
style A fill:#1e293b,stroke:#6366f1
style F fill:#1e293b,stroke:#10b981
style G fill:#1e293b,stroke:#10b981
Each step involves a different subsystem, and they exchange information constantly. Profiling in the interpreter feeds the JIT. Write barriers from the JIT serve the GC. Safepoints coordinate everyone. Understanding these interconnections is as important as understanding each subsystem in isolation.
Where it lives in the code: repository structure
The HotSpot code resides in src/hotspot/, organized as a layered platform abstraction:
| Directory | Purpose |
|---|---|
| share/ | Portable C++ code (~95% of HotSpot) |
| share/interpreter/ | Template interpreter |
| share/c1/ | C1 compiler (Client) |
| share/opto/ | C2 compiler (Server/Opto) |
| share/gc/ | Garbage collectors (g1, z, shenandoah, serial, parallel) |
| share/oops/ | Object model (oop, Klass, Method, ConstantPool) |
| share/classfile/ | Class loading, parsing, verification |
| share/runtime/ | Threads, safepoints, synchronization |
| share/compiler/ | Shared compilation infrastructure |
| share/code/ | Code cache (the memory region where JIT-compiled code is stored) and nmethods (the JVM's internal representation of a compiled Java method) |
| share/memory/ | Metaspace, internal allocation |
| share/prims/ | JNI, JVMTI, jvm.cpp |
| share/jfr/ | Java Flight Recorder |
| share/jvmci/ | Interface for external compilers (Graal) |
| cpu/<arch>/ | CPU-specific code (x86, aarch64, riscv, ..) |
| os/<os>/ | OS-specific code (linux, windows, bsd, ..) |
| os_cpu/<os>_<cpu>/ | OS + CPU combination |
The core Java code lives in src/java.base/, and the bridge between
them happens primarily in prims/jvm.cpp, which implements ~200
JVM_* functions called by the native code in
java.base.
2. Class Loading: From .class to Execution
What and why
Before executing any code, the JVM needs to transform the .class
file (a sequence of bytes) into internal structures it can work with. This process has three formal
phases defined by the JVM specification: loading, linking, and
initialization.
How it works
The .class file is a binary container with sequential structure: magic number
(0xCAFEBABE), version, constant pool (an indexed table with all
constants including strings, class names, method and field references), class information, fields,
methods, and attributes. The constant pool works as a symbolic dictionary: bytecode contains no
literal names, only indices into this pool.
The most important attribute of each method is Code, which contains the actual bytecode, stack and local variable limits, exception handler table, and the StackMapTable (used during verification).
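To make the container layout concrete, here is a minimal sketch in Java that reads only the fixed header fields described above: magic number, minor and major version, and the constant pool count. The class name `ClassFileHeader` and its `parse` method are illustrative, not part of any JDK API, and the sketch deliberately stops before the constant pool entries themselves.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper: reads the first 10 bytes of a .class file.
public class ClassFileHeader {
    public static final int MAGIC = 0xCAFEBABE;

    public final int minorVersion;
    public final int majorVersion;
    public final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int cpCount) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    public static ClassFileHeader parse(byte[] classBytes) {
        // Class files are big-endian, which is what DataInputStream reads.
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            if (magic != MAGIC) {
                throw new IllegalArgumentException("Incompatible magic value");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int cpCount = in.readUnsignedShort(); // number of pool entries plus one
            return new ClassFileHeader(minor, major, cpCount);
        } catch (IOException e) {
            throw new IllegalArgumentException("Truncated class file", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written header: magic, minor 0, major 65, constant pool count 16.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 65, 0, 16};
        ClassFileHeader h = parse(header);
        System.out.println(h.majorVersion + "." + h.minorVersion + ", cp count " + h.constantPoolCount);
    }
}
```

HotSpot's own parser performs the same reads in the same order, then continues into the constant pool, which this sketch does not attempt.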
Phase 1, Loading: The ClassLoader locates the binary representation of the class, reads its bytes, and creates the internal Klass structure. The JVM automatically loads all superclasses and superinterfaces first.
Phase 2, Linking:
- Verification: the JVM verifies that the bytecode is structurally valid and type-safe. The verifier uses the StackMapTable, type states precomputed by javac. It makes a single pass confirming that each instruction operates on correct types.
- Preparation: allocates memory for static fields (initialized to zero/null/false) and prepares the vtable (virtual dispatch) and itable (interface dispatch).
- Resolution: transforms symbolic references in the constant pool (textual names) into direct references (pointers). Resolution can be lazy, in which case it happens on first use.
Phase 3, Initialization: executes the <clinit> method (static initializers). The JVM guarantees thread safety: each class is initialized exactly once.
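The initialization guarantee can be observed from plain Java. The following is a small sketch, with the illustrative names `InitOnceDemo` and `Lazy`, in which several threads race to read a static field of a class that has not been initialized yet. The counter incremented by the static initializer should end at one no matter how many threads are used.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InitOnceDemo {
    // Counts how many times Lazy's static initializer (<clinit>) has run.
    public static final AtomicInteger INIT_COUNT = new AtomicInteger();

    static class Lazy {
        // Assigned in the static block, so it is not a compile-time constant
        // and reading it forces class initialization.
        static final int VALUE;
        static {
            INIT_COUNT.incrementAndGet();
            VALUE = 42;
        }
    }

    // Starts threadCount threads that all read Lazy.VALUE, then reports
    // how many times the static initializer ran in total.
    public static int touchFromThreads(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                if (Lazy.VALUE != 42) {
                    throw new IllegalStateException("unexpected value");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("initializer runs: " + touchFromThreads(8));
    }
}
```

Note that `Lazy` is initialized on first use, not when `InitOnceDemo` itself is loaded, which is the lazy behavior the specification describes.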
ClassLoader Hierarchy:
flowchart BT
C["Custom ClassLoaders"]
A["Application ClassLoader"]
P["Platform ClassLoader"]
B["Bootstrap ClassLoader"]
C -->|delegates to| A
A -->|delegates to| P
P -->|delegates to| B
style B fill:#1e293b,stroke:#6366f1
style P fill:#1e293b,stroke:#8b5cf6
style A fill:#1e293b,stroke:#a78bfa
style C fill:#1e293b,stroke:#c4b5fd
| ClassLoader | Scope |
|---|---|
| Bootstrap | VM internal, represented as null in Java. Loads java.lang.*, java.util.*, etc. |
| Platform | Platform modules (formerly Extension ClassLoader) |
| Application | Classpath / module path |
| Custom | Hot-reload, plugins, isolation |
The parent-first delegation model ensures that fundamental classes are always loaded by Bootstrap.
Internally, the SystemDictionary is the central registry. It maps (class name +
ClassLoader) to Klass*.
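The delegation chain is visible through the standard `ClassLoader` API. This sketch (the class name `LoaderChain` is illustrative) walks from a given loader up through its parents until it reaches `null`, which is how the Bootstrap loader appears from Java code. The concrete loader class names printed depend on the JDK in use, so none are assumed here.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {
    // Returns the class names of each loader from 'start' up to the root,
    // ending with a marker for the Bootstrap loader (represented as null).
    public static List<String> chain(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        // Start from the Application ClassLoader.
        chain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes report null because Bootstrap has no Java object.
        System.out.println("String loader: " + String.class.getClassLoader());
    }
}
```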
Where it lives in the code
The .class parsing happens in classFileParser.cpp. Here is the main entry point:
// src/hotspot/share/classfile/classFileParser.cpp
void ClassFileParser::parse_stream(const ClassFileStream* const stream, TRAPS) {
// Magic value
const u4 magic = stream->get_u4_fast();
guarantee_property(magic == JAVA_CLASSFILE_MAGIC, "Incompatible magic value");
// Version numbers
_minor_version = stream->get_u2_fast();
_major_version = stream->get_u2_fast();
// Constant pool
parse_constant_pool(stream, CHECK);
// Access flags, this class, super class
_access_flags.set_flags(stream->get_u2_fast() & JVM_RECOGNIZED_CLASS_MODIFIERS);
// ..
}
src/hotspot/share/classfile/classFileParser.cpp
The resolution entry point is in LinkResolver::resolve_invoke, which decides whether the call is virtual, interface, static, or special:
src/hotspot/share/interpreter/linkResolver.cpp
The SystemDictionary, the central class registry:
src/hotspot/share/classfile/systemDictionary.cpp
Why this matters in practice
Class loading is not just a startup concern. Late class loading (classes loaded for the first time during normal operation via reflection, service loaders, serialization frameworks, or plugin systems) can trigger cascading effects: new classes invalidate JIT-compiled code dependencies, causing deoptimization of previously optimized methods.
If you have ever seen a latency spike 30 minutes into a running service that correlates with a
new code path being hit for the first time, class loading is a likely culprit. Tools:
-verbose:class shows every class loaded and when.
-Xlog:class+load gives timestamps. CDS/AppCDS pre-loads
classes to eliminate this cost at startup.
3. The Object Model: oops and Klass
What and why
The JVM needs two things: to represent types (classes) and to represent instances
(objects). In HotSpot, types are represented by Klass structures
that live in Metaspace (native memory), and instances are oops
(ordinary object pointers) that live in the Java heap.
How it works
Klass Hierarchy:
classDiagram
Klass <|-- InstanceKlass
Klass <|-- ArrayKlass
InstanceKlass <|-- InstanceRefKlass
InstanceKlass <|-- InstanceMirrorKlass
InstanceKlass <|-- InstanceClassLoaderKlass
InstanceKlass <|-- InstanceStackChunkKlass
ArrayKlass <|-- TypeArrayKlass
ArrayKlass <|-- ObjArrayKlass
class Klass {
_name : Symbol*
_super : Klass*
_layout_helper : jint
_java_mirror : OopHandle
}
class InstanceKlass {
_constants : ConstantPool*
_methods : Array of Method*
_fields : Array of FieldInfo
_vtable inline
_itable inline
_init_state
}
class ArrayKlass {
_dimension : int
_component_mirror
}
InstanceKlass contains everything about a class: constant pool,
methods, fields, vtable (for virtual dispatch), itable (for interface dispatch), and initialization state.
Object layout in the heap (oop):
| Offset | Component | Size | Description |
|---|---|---|---|
| 0 | Mark Word | 64 bits | Tag bits (lock state), GC age (4 bits), identity hash code (31 bits), lock/GC metadata |
| 8 | Klass Pointer | 32 bits (compressed) | Points to the InstanceKlass in Metaspace |
| 12 | Instance data | Variable | Fields: int, long, Object references, etc. |
|  | Padding | 0-7 bytes | Alignment to 8-byte boundary |
For arrays, a 32-bit length field is inserted after the Klass pointer.
The mark word encodes multiple pieces of information. The tag bits indicate lock
state: 01 = unlocked, 00 =
lightweight-locked, 10 = inflated monitor,
11 = marked by GC. The 31-bit identity hash code is computed
lazily upon first call to hashCode(). The 4-bit GC age tracks
how many young GC cycles the object has survived (max 15, used for promotion decisions).
Compressed oops: object references are represented as 32 bits with a shift,
addressing up to ~32 GB of heap with 32-bit pointers.
Decoding: address = heap_base + (narrow_oop << 3).
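The 32 GB figure follows directly from the decoding formula: 2^32 possible narrow values, each scaled by the 8-byte object alignment. A minimal sketch of that arithmetic in Java, assuming a shift of 3 as in the formula above (the class and method names are illustrative):

```java
public class CompressedOops {
    // 8-byte object alignment means the low 3 bits of every address are zero.
    public static final int SHIFT = 3;

    // address = heap_base + (narrow_oop << 3), treating the narrow oop as unsigned.
    public static long decode(long heapBase, int narrowOop) {
        return heapBase + ((narrowOop & 0xFFFF_FFFFL) << SHIFT);
    }

    // Largest heap reachable with a 32-bit narrow oop and this shift.
    public static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(maxAddressableBytes() / (1L << 30) + " GiB addressable");
        System.out.println(Long.toHexString(decode(0x1000L, 2)));
    }
}
```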
Where it lives in the code
The mark word: all encoding/decoding logic lives in this class:
// src/hotspot/share/oops/markWord.hpp
class markWord {
private:
uintptr_t _value; // the entire word
public:
// Bit layout (64-bit):
static const int lock_bits = 2;
static const int age_bits = 4;
static const int hash_bits = 31; // max 31 bits for hash
static const int unused_gap_bits = 4; // reserved for Valhalla
// Lock bit state constants:
// 00 -> lightweight locked
// 01 -> unlocked (normal)
// 10 -> monitor (inflated)
// 11 -> marked by GC
// Operations
uint age() const { return mask_bits(value() >> age_shift, age_mask); }
markWord incr_age() const {
return age() == max_age ? markWord(_value) : set_age(age() + 1);
}
intptr_t hash() const { return mask_bits(value() >> hash_shift, hash_mask); }
};
src/hotspot/share/oops/markWord.hpp
The oop base (every object in the heap):
// src/hotspot/share/oops/oop.hpp
class oopDesc {
private:
volatile markWord _mark; // mark word (64 bits)
union _metadata {
Klass* _klass; // direct pointer to Klass
narrowKlass _compressed_klass; // compressed pointer (32 bits)
} _metadata;
// .. instance fields follow after the header
};
src/hotspot/share/oops/oop.hpp
src/hotspot/share/oops/klass.hpp, full Klass hierarchy
src/hotspot/share/oops/instanceKlass.hpp, InstanceKlass with vtable, itable, and more
Why this matters in practice
Every Java object carries a 12-byte header (mark word + klass pointer) before a single field is
stored. If your application has 10 million Boolean wrapper objects
in the heap, that is 120 MB of headers alone. The actual data (boolean
value) is 1 byte per object.
You can observe this directly using JOL (Java Object Layout):
// Run with: java -jar jol-cli.jar internals java.lang.Boolean
// Output (64-bit, compressed oops):
//
// OFFSET SIZE TYPE DESCRIPTION
// 0 12 (object header: mark word + klass)
// 12 1 boolean Boolean.value
// 13 3 (padding to 8-byte alignment)
//
// Instance size: 16 bytes
// Space losses: 3 bytes (padding) + 12 bytes (header) = 15 bytes overhead for 1 byte of data
This is why replacing HashMap<Integer, Boolean> with a
BitSet or a primitive-specialized collection can cut memory
consumption by 10x or more. Understanding object layout also explains why
value types (Project Valhalla) are so anticipated: they target header
overhead directly. Record types, by contrast, reduce boilerplate but are
still ordinary heap objects with a full header.
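The header arithmetic used in this section can be written down as a short sketch. It assumes the layout described above (12-byte header with compressed class pointers, 8-byte alignment) and ignores any alignment gaps between fields, so it is an estimate rather than a substitute for JOL. The class name `ObjectSizeEstimate` is illustrative.

```java
public class ObjectSizeEstimate {
    static final int HEADER_BYTES = 12; // mark word (8) + compressed klass pointer (4)
    static final int ALIGNMENT = 8;     // objects are padded to an 8-byte boundary

    // Header plus field bytes, rounded up to the next multiple of 8.
    public static long instanceSize(int fieldBytes) {
        long raw = HEADER_BYTES + fieldBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        long count = 10_000_000L;
        long perObject = instanceSize(1); // java.lang.Boolean: one boolean field
        System.out.println("Boolean instance size: " + perObject + " bytes");
        System.out.println("Header bytes for 10M objects: " + count * HEADER_BYTES);
        System.out.println("Total bytes for 10M objects:  " + count * perObject);
    }
}
```

For a `Boolean` this gives 16 bytes per instance, matching the JOL output above, of which only one byte is payload.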
4. Execution Engine: Interpretation
What and why
When the JVM starts executing a method, it does not invoke the JIT compiler immediately. That would be too slow at startup. The interpreter executes bytecode directly, delivering immediate results while collecting profiling data that the JIT will use later.
How it works
HotSpot does not interpret bytecode with a switch/case loop in C++.
Instead, it uses a template interpreter: during JVM boot, it generates codelets
(small blocks of native code) for each of the ~200 bytecodes. These codelets are stored in memory and
executed via indirect jumps.
flowchart LR
T["TemplateTable"] --> G["Generator"]
G --> S["StubQueue"]
S --> D["Dispatch Table"]
B["Bytecode stream"] --> D
D --> C1["codelet: iadd"]
D --> C2["codelet: iload"]
D --> C3["codelet: invokevirtual"]
style T fill:#1e293b,stroke:#6366f1
style D fill:#1e293b,stroke:#10b981
During JVM boot (left side), the TemplateTable drives the Generator to produce ~270 codelets stored in a StubQueue. At runtime (right side), the bytecode stream indexes into the Dispatch Table (256 entries per TOS state), which jumps to the corresponding codelet.
The execution cycle for each bytecode:
- Load the next opcode from the bytecode stream
- Advance the bytecode pointer
- Jump indirectly to the corresponding codelet via the dispatch table:
jmp *(table + opcode * 8)
That is only 2-3 machine instructions for dispatch. TOS caching (Top-of-Stack
caching) keeps the top value of the operand stack in a dedicated CPU register
(rax on x86-64), avoiding unnecessary push/pop between
consecutive bytecodes.
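The dispatch cycle can be imitated in Java as a toy analogue: an array of handlers indexed by opcode stands in for the dispatch table, and each handler stands in for a codelet. This is only a sketch of the table-dispatch idea. HotSpot's codelets are generated machine code rather than Java lambdas, and the sketch has no TOS caching. The class `ToyInterpreter` is hypothetical; the four opcode values are the real JVM encodings for bipush, iadd, imul, and ireturn.

```java
public class ToyInterpreter {
    interface Handler { void execute(ToyInterpreter vm); }

    static final int BIPUSH = 0x10, IADD = 0x60, IMUL = 0x68, IRETURN = 0xAC;

    // One entry per possible byte value, like the interpreter's dispatch table.
    static final Handler[] DISPATCH = new Handler[256];
    static {
        Handler unsupported = vm -> {
            throw new IllegalStateException("unsupported opcode");
        };
        java.util.Arrays.fill(DISPATCH, unsupported);
        DISPATCH[BIPUSH] = vm -> vm.push(vm.code[vm.pc++]); // operand byte follows the opcode
        DISPATCH[IADD] = vm -> vm.push(vm.pop() + vm.pop());
        DISPATCH[IMUL] = vm -> vm.push(vm.pop() * vm.pop());
        DISPATCH[IRETURN] = vm -> { vm.result = vm.pop(); vm.done = true; };
    }

    private final byte[] code;
    private final int[] stack = new int[16];
    private int pc, sp, result;
    private boolean done;

    private ToyInterpreter(byte[] code) { this.code = code; }
    private void push(int v) { stack[sp++] = v; }
    private int pop() { return stack[--sp]; }

    public static int run(byte[] code) {
        ToyInterpreter vm = new ToyInterpreter(code);
        while (!vm.done) {
            int opcode = vm.code[vm.pc++] & 0xFF; // load opcode, advance pointer
            DISPATCH[opcode].execute(vm);         // indirect jump through the table
        }
        return vm.result;
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        byte[] program = {0x10, 2, 0x10, 3, 0x60, 0x10, 4, 0x68, (byte) 0xAC};
        System.out.println(run(program));
    }
}
```

Swapping in a second handler array that checks a flag before delegating would mimic the safepoint table swap, without touching the loop itself.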
Dispatch table swap for safepoints: when the VM needs to pause threads, it replaces the normal dispatch table with a variant that includes safepoint checks. No thread needs to be interrupted. Each one naturally checks on the next dispatch.
Where it lives in the code
Interpreter initialization. Note the comment in the code itself confirming the ~270 codelets:
// src/hotspot/share/interpreter/templateInterpreter.cpp
void TemplateInterpreter::initialize_stub() {
assert(_code == nullptr, "must only initialize once");
assert((int)Bytecodes::number_of_codes <= (int)DispatchTable::length,
"dispatch table too small");
int code_size = InterpreterCodeSize;
NOT_PRODUCT(code_size *= 4;) // debug uses extra interpreter code space
// 270+ interpreter codelets are generated and each of them is aligned
// to HeapWordSize, plus their code section is aligned to CodeEntryAlignment.
// ..
}
src/hotspot/share/interpreter/templateInterpreter.cpp
The dispatch table and the safepoint swap:
// src/hotspot/share/interpreter/templateInterpreter.hpp
class TemplateInterpreter: public AbstractInterpreter {
// Three dispatch tables:
static DispatchTable _active_table; // currently active table (pointer that alternates)
static DispatchTable _normal_table; // normal dispatch
static DispatchTable _safept_table; // dispatch with safepoint checks
// The dispatch table: one entry per byte value x each TOS state
// _table[number_of_states][256]
// number_of_states includes: itos, ltos, ftos, dtos, atos, vtos, ..
static address* dispatch_table(TosState state) { return _active_table.table_for(state); }
static address* safept_table(TosState state) { return _safept_table.table_for(state); }
};
src/hotspot/share/interpreter/templateInterpreter.hpp
A concrete example: how the template for the _return instruction generates native code on x86-64:
// src/hotspot/cpu/x86/templateTable_x86.cpp
void TemplateTable::_return(TosState state) {
transition(state, state);
if (_desc->bytecode() == Bytecodes::_return_register_finalizer) {
Register robj = c_rarg1;
__ movptr(robj, aaddress(0)); // load 'this'
__ load_klass(rdi, robj, rscratch1); // load Klass*
__ testb(Address(rdi, Klass::misc_flags_offset()),
KlassFlags::_misc_has_finalizer); // has finalizer?
Label skip_register_finalizer;
__ jcc(Assembler::zero, skip_register_finalizer);
// If so, call runtime to register it
__ call_VM(noreg, CAST_FROM_FN_PTR(address,
InterpreterRuntime::register_finalizer), robj);
__ bind(skip_register_finalizer);
}
// .. remove_activation and return
}
src/hotspot/cpu/x86/templateTable_x86.cpp
Note the pattern: the __ macro is a shortcut for
_masm-> (the macro assembler), and each call generates native x86
instructions. This is the template interpreter: C++ that generates machine code.
Why this matters in practice
The interpreter is what runs your code during startup and warmup. If your application has strict
startup time requirements (serverless functions, CLI tools, microservices with frequent restarts),
you are measuring interpreter performance, not JIT performance. This is why frameworks like Spring
invest in reducing the amount of code executed before the application is ready. It also explains
why -XX:TieredStopAtLevel=1 (compile with C1 only, skip C2)
can improve startup time at the cost of peak throughput: you get native code faster,
just not the most optimized native code.
5. Execution Engine: JIT Compilation
What and why
The interpreter is fast enough for startup, but for peak performance, the JVM compiles "hot" methods into optimized native machine code. HotSpot has two JIT compilers that work together.
How it works: Tiered Compilation
The JVM uses five tiers of execution, balancing startup speed with peak performance:
| Tier | Executor | Behavior |
|---|---|---|
| 0 | Interpreter | Initial execution of all code. Collects basic invocation and backedge counters. |
| 1 | C1 without profiling | For trivial methods. Fast native code, no data collection. |
| 2 | C1 with limited profiling | Counters only. Fallback when the C2 queue is full. |
| 3 | C1 with full profiling | Collects rich data: receiver type profiles, branch frequencies, detailed counters. Default path before C2. |
| 4 | C2 full optimization | Highly optimized compilation using all data collected in Tier 3. |
flowchart TD
T0["Tier 0: Interpreter"] -->|"trivial method"| T1["Tier 1: C1, no profiling"]
T0 -->|"default path"| T3["Tier 3: C1, full profiling"]
T3 -->|"hot method + rich data"| T4["Tier 4: C2, full optimization"]
T4 -->|"deoptimization"| T0
style T0 fill:#1e293b,stroke:#6366f1
style T1 fill:#1e293b,stroke:#10b981
style T3 fill:#1e293b,stroke:#f59e0b
style T4 fill:#1e293b,stroke:#ef4444
The C1 Compiler prioritizes compilation speed (~9-16x faster than C2). Three phases:
- HIR (High-Level IR): abstract interpretation of bytecode generates an SSA (Static Single Assignment) representation, a form where every variable is assigned exactly once, making data flow analysis and optimization straightforward. Includes inlining of small methods and constant folding.
- LIR (Low-Level IR): translation to platform-specific instructions.
- Register Allocation + Emission: linear scan allocation (faster than graph coloring) + final machine code.
The C2 Compiler prioritizes code quality. Its IR is the Sea of Nodes, a graph where nodes represent operations and edges represent data and control dependencies. Nodes "float freely" without belonging to basic blocks, enabling aggressive reordering.
Before reading the table, a few terms: the Ideal Graph is C2's internal representation of the program as a Sea of Nodes graph. GVN (Global Value Numbering) is an optimization that eliminates redundant computations by identifying expressions that always produce the same result. Escape analysis determines whether an object is accessible outside the method that created it. If not, C2 can decompose it into its individual fields (scalar replacement), avoiding heap allocation entirely. MachNodes are platform-specific machine instruction representations that replace Ideal nodes during instruction selection.
| Phase | What it does |
|---|---|
| Parsing | Bytecode to Ideal Graph with GVN and inlining |
| Optimization | Iterative GVN, escape analysis, loop optimizations, vectorization |
| Instruction Selection | Ideal nodes to MachNodes via pattern matching |
| Global Code Motion | Places floating nodes into basic blocks |
| Register Allocation | Briggs-Chaitin graph coloring |
| Code Emission | Final machine code |
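To illustrate what escape analysis looks for during the Optimization phase, here is a sketch with two variants of the same computation. In `distance`, the two `Point` objects never leave the method, which makes them candidates for being replaced by their fields. In `distanceEscaping`, one object is stored in a static field, so it escapes and must live on the heap. Whether C2 actually removes the allocations depends on inlining decisions and the JVM version, so this shows the shape of the code, not a guaranteed outcome. All names are illustrative.

```java
public class EscapeCandidate {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Both Points are created, read, and dropped inside this method.
    public static double distance(double x1, double y1, double x2, double y2) {
        Point a = new Point(x1, y1);
        Point b = new Point(x2, y2);
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        return Math.sqrt(dx * dx + dy * dy);
    }

    static Point lastOrigin; // publishing a reference here makes the object escape

    public static double distanceEscaping(double x1, double y1, double x2, double y2) {
        Point a = new Point(x1, y1);
        lastOrigin = a; // 'a' is now reachable from outside the method
        Point b = new Point(x2, y2);
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        System.out.println(distance(0, 0, 3, 4));
        System.out.println(distanceEscaping(0, 0, 3, 4));
    }
}
```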
Profiling, the bridge between interpretation and compilation. Each method has a MethodData (MDO) that accumulates, per bytecode point: invocation counters, receiver type profiles (the 2 most frequent types, for bimorphic inlining), branch frequencies, and deoptimization history. This data fuels C2's speculative optimizations.
Inline Caches and call site specialization. The JVM classifies each virtual call
site based on observed receiver types. A monomorphic call site has seen only one
type. The JIT can inline the target method directly. A bimorphic site has seen
exactly two types. The JIT generates a conditional branch checking both. A
megamorphic site has seen three or more types. The JIT gives up on direct inlining
and falls back to a virtual dispatch lookup via the vtable or itable. This transition from
monomorphic to megamorphic is one of the most common causes of performance cliffs in Java
applications, and understanding it helps explain many
-XX:+PrintCompilation patterns.
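The classification rule above is simple enough to model directly. This sketch records the receiver classes seen at one call site and labels the site accordingly. It is a conceptual model only: the real profile keeps a small fixed number of receiver rows with counters rather than an unbounded set, and the names `CallSiteProfile` and `Kind` are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CallSiteProfile {
    public enum Kind { UNINITIALIZED, MONOMORPHIC, BIMORPHIC, MEGAMORPHIC }

    // Distinct receiver classes observed at this (simulated) call site.
    private final Set<Class<?>> receiverTypes = new LinkedHashSet<>();

    public void record(Object receiver) {
        receiverTypes.add(receiver.getClass());
    }

    public Kind classify() {
        switch (receiverTypes.size()) {
            case 0:  return Kind.UNINITIALIZED;
            case 1:  return Kind.MONOMORPHIC;  // one type: direct inlining is possible
            case 2:  return Kind.BIMORPHIC;    // two types: guarded branch on both
            default: return Kind.MEGAMORPHIC;  // three or more: virtual dispatch
        }
    }

    public static void main(String[] args) {
        CallSiteProfile site = new CallSiteProfile();
        site.record("first");
        site.record("second");
        System.out.println(site.classify());
        site.record(Integer.valueOf(1));
        System.out.println(site.classify());
        site.record(Double.valueOf(1.0));
        System.out.println(site.classify());
    }
}
```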
On-Stack Replacement (OSR): allows compiling a method that is inside a long-running loop and replacing the interpreted frame with a compiled one without waiting for the method to return.
Deoptimization: when speculative optimizations fail (unexpected type, new subclass,
null check), the compiled frame is reverted to interpreted. The method can be recompiled with
updated profiling. Common reasons: class_check,
null_check, unstable_if,
unreached, bimorphic.
Code Cache: stores all compiled code in native memory, segmented into three areas:
| Segment | Content | Lifetime |
|---|---|---|
| Non-method | Interpreter codelets, compiler buffers, VM internal code | Permanent (~5 MB) |
| Profiled nmethods | C1-compiled code (tiers 2/3) with profiling instrumentation | Short (replaced when C2 compiles) |
| Non-profiled nmethods | C2-compiled code (tier 4) and C1 tier 1 | Long (production code) |
Where it lives in the code
The CompileBroker, the orchestrator that decides what to compile and when:
src/hotspot/share/compiler/compileBroker.cpp
The C2 pipeline entry point, the Compile class:
src/hotspot/share/opto/compile.cpp
Escape analysis, where the JVM decides if it can eliminate an allocation:
src/hotspot/share/opto/escape.cpp
Deoptimization, the mechanism for "back to interpreter":
src/hotspot/share/runtime/deoptimization.cpp
The MethodData, the profiling structure that connects interpreter and JIT:
src/hotspot/share/oops/methodData.hpp
Why this matters in practice
Java benchmarks that do not warm up adequately are measuring the interpreter or C1, not the code that will run in production. That is why JMH (Java Microbenchmark Harness) exists: it manages warmup iterations, fork isolation, and deoptimization detection. If you benchmark without JMH, you are likely producing misleading numbers.
Megamorphic call sites are one of the most common silent performance killers:
// Monomorphic: the JIT sees only one type at this call site.
// It inlines process() directly, eliminating the virtual dispatch entirely.
List<Order> orders = new ArrayList<>();
for (Order order : orders) {
order.process(); // always StandardOrder.process() -> inlined
}
// Megamorphic: the JIT sees three or more types (four here) at the same call site.
// It cannot inline and falls back to vtable lookup on every call.
List<Order> orders = loadMixedOrders(); // StandardOrder, ExpressOrder, BulkOrder, ReturnOrder...
for (Order order : orders) {
order.process(); // which implementation? vtable lookup every time
}
The megamorphic version can be 2-5x slower at that call site compared to monomorphic dispatch
because the JIT cannot inline across multiple implementations. This does not mean you should avoid
polymorphism. It means you should be aware of the cost when polymorphism occurs in hot paths.
-XX:+PrintCompilation and
-XX:+TraceDeoptimization reveal when this happens.
A single class loading event (plugin loaded, serialization of a new type, dynamic proxy created)
can invalidate dozens of compiled methods at once, causing a burst of recompilation. In
latency-sensitive systems, this manifests as an unexplained latency spike that self-resolves
after a few seconds (recompilation).
-XX:+TraceDeoptimization is the diagnostic tool.
AOT-compiled runtimes (Go, Rust, .NET Native AOT) produce final machine code at build time. That code runs at full speed from the first instruction, but it cannot adapt. If a virtual call site turns out to be monomorphic in production, an AOT compiler has no way to know and no way to inline it. The JVM observes this at runtime and eliminates the dispatch entirely.
Interpreted runtimes (CPython, Ruby/CRuby) execute without compilation at all. V8 (Node.js) and LuaJIT sit between these extremes, with lightweight JIT compilation but without the deep speculative pipeline of C2. The closest comparable system to HotSpot is .NET's RyuJIT with dynamic PGO, which also uses tiered compilation and profile-guided recompilation. The JVM's advantage is the maturity and depth of its speculative optimization pipeline, built over two decades of production feedback. The trade-off is warmup time: AOT runtimes deliver peak performance immediately, the JVM takes seconds to minutes.
6. Memory Management: Heap and Allocation
What and why
The JVM manages all Java object memory automatically. This includes allocation (creating objects) and deallocation (garbage collection). The heap is divided into managed areas, and allocation is optimized to be extremely fast.
How it works: TLABs
Object allocation is one of the most frequent operations. HotSpot optimizes it with TLABs (Thread-Local Allocation Buffers): each Java thread receives a private buffer inside Eden (the young part of the heap). Allocation is bump pointer: the thread simply advances a cursor forward through a contiguous block of memory. The pointer moves by the object size and the space behind it becomes the new object. No synchronization, no locks, no atomic CAS. It takes ~6 machine instructions.
flowchart TD
NEW["new Object()"] --> CHECK{"TLAB has space?"}
CHECK -->|Yes| BUMP["Bump pointer: advance cursor"]
CHECK -->|No| REFILL["Retire current TLAB, request new one"]
REFILL --> EDEN{"Eden has space?"}
EDEN -->|Yes| NEWTLAB["Allocate new TLAB from Eden"]
EDEN -->|No| GC["Trigger minor GC"]
GC --> RETRY["Retry allocation"]
style BUMP fill:#1e293b,stroke:#10b981
style GC fill:#1e293b,stroke:#ef4444
CAS (Compare-And-Swap) is a CPU instruction that updates a memory location only if it still holds an expected value. It is the building block of lock-free data structures, but even a single CAS is far more expensive than the bump pointer fast path.
Because each thread owns its TLAB exclusively, bump pointer allocation takes ~6 machine instructions with zero synchronization. When a TLAB is exhausted, the remaining space is filled with a filler object (so the GC can walk the heap linearly) and a new TLAB is carved out of Eden.
When the TLAB is exhausted:
- Fill remaining space with a filler object
- Allocate a new TLAB from Eden
- If Eden is full, trigger a minor GC
- If allocation still fails after collection, throw OutOfMemoryError
Objects too large for a TLAB follow a different path. In G1 (the default collector), objects that are 50% or more of a region size are allocated directly as humongous regions in the old generation, bypassing Eden entirely. In other collectors, oversized objects are allocated in the shared Eden area using atomic CAS operations.
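The bump-pointer fast path and the refill sequence described above can be simulated in a few lines. This is a single-threaded sketch over plain `long` offsets, not real memory. It models one thread's TLAB and a shared Eden, skips the filler object and the oversized-object path, and returns a sentinel where a real JVM would trigger a minor GC. The class name `TlabSketch` and the constant `FAILED` are illustrative.

```java
public class TlabSketch {
    public static final long FAILED = -1;

    // Shared Eden. In a real JVM, carving a TLAB out of Eden is the
    // synchronized step; this simulation is single-threaded.
    private long edenTop = 0;
    private final long edenEnd;
    private final long tlabSize;

    // The current thread's TLAB: [top, end) is still free.
    private long top = 0;
    private long end = 0;
    private int refills = 0;

    public TlabSketch(long edenBytes, long tlabBytes) {
        this.edenEnd = edenBytes;
        this.tlabSize = tlabBytes;
    }

    public long allocate(long size) {
        long aligned = (size + 7) & ~7L;   // round up to 8-byte alignment
        if (aligned > tlabSize) {
            return FAILED;                 // oversized objects are not modeled
        }
        if (top + aligned > end && !refill()) {
            return FAILED;                 // Eden full: a real JVM would run a minor GC
        }
        long obj = top;                    // fast path: the object starts at the cursor
        top += aligned;                    // bump the cursor past it
        return obj;
    }

    // Retire the current TLAB and take a fresh one from Eden, if any is left.
    private boolean refill() {
        if (edenTop + tlabSize > edenEnd) {
            return false;
        }
        top = edenTop;
        end = edenTop + tlabSize;
        edenTop = end;
        refills++;
        return true;
    }

    public int refills() { return refills; }

    public static void main(String[] args) {
        TlabSketch heap = new TlabSketch(64, 32);
        for (int i = 0; i < 5; i++) {
            System.out.println("allocate(12) -> " + heap.allocate(12));
        }
        System.out.println("refills: " + heap.refills());
    }
}
```

The fast path is the two lines that read and advance `top`; everything else is the slow path that the text describes as comparatively rare.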
Where it lives in the code
The heap allocation path. See how obj_allocate delegates to MemAllocator:
// src/hotspot/share/gc/shared/collectedHeap.inline.hpp
inline oop CollectedHeap::obj_allocate(Klass* klass, size_t size, TRAPS) {
ObjAllocator allocator(klass, size, THREAD);
return allocator.allocate();
}
src/hotspot/share/gc/shared/collectedHeap.inline.hpp
The TLAB fast path: in memAllocator.cpp, allocate() tries the current thread's TLAB first. If it is exhausted, the slow path allocates a new TLAB or goes directly to Eden:
src/hotspot/share/gc/shared/memAllocator.cpp
TLAB configuration:
src/hotspot/share/gc/shared/tlab_globals.hpp
Why this matters in practice
Knowing that allocation is bump-pointer fast (~6 instructions) but GC cost is proportional to live objects changes a fundamental design instinct:
// This is FAST in Java. Each new StringBuilder() is a bump-pointer
// allocation in the thread's TLAB. The object dies young in Eden.
// GC cost is near zero for short-lived objects.
public String formatMessage(String user, String action) {
return new StringBuilder()
.append(user).append(": ").append(action)
.toString();
}
// This is SLOWER despite looking "optimized." The pool adds synchronization
// overhead on borrow/return, the pooled objects survive to old gen (increasing
// GC scanning cost), and the CAS contention on the pool defeats the whole
// purpose of lock-free TLAB allocation.
public String formatMessage(String user, String action) {
StringBuilder sb = pool.borrow(); // lock or CAS
try {
return sb.append(user).append(": ").append(action).toString();
} finally {
sb.setLength(0);
pool.release(sb); // lock or CAS again
}
}
In Java, creating short-lived objects and letting them die in Eden is often the correct and
cheapest approach. Object pooling is counterproductive unless the objects are genuinely expensive
to create (database connections, SSL contexts) or very large (direct byte buffers). When
troubleshooting allocation pressure on JDK 9 and later, -Xlog:gc+tlab=trace (the unified-logging
replacement for the older -XX:+PrintTLAB) reveals per-thread allocation rates and slow-path frequency.
Most runtimes with managed memory use similar thread-local allocation strategies. Go uses per-P mcache allocators that hand out slots from size-classed spans without taking a lock. .NET uses per-thread allocation contexts in its managed heap. V8 allocates in a generational heap with semi-space copying. The fast path is cheap in all of them. Where the JVM pulls ahead is downstream: C2's escape analysis can determine at runtime that an object never escapes the method, and eliminate the allocation entirely by decomposing the object into scalar values that live in CPU registers. Go performs escape analysis at compile time (AOT), but with less information: it cannot observe runtime behavior, so its decisions are more conservative. Runtimes without GC (Rust, C/C++) place allocation responsibility entirely on the developer, avoiding pause-time costs but requiring manual or ownership-based memory management.
7. Garbage Collection
What and why
The GC identifies unreachable objects in the heap and reclaims their space for new allocations. The JVM offers multiple collectors, each with different trade-offs between throughput, pause latency, and footprint.
How it works
The generational principle: most objects die young (weak generational hypothesis). Generational collectors divide the heap into young generation (newly created objects) and old generation (objects that survived multiple collections). Young GCs are frequent and fast. Old GCs are rarer and more expensive.
| Collector | Strategy | Pauses | Best for |
|---|---|---|---|
| Serial GC | Single-threaded, generational | STW, proportional to heap | ≤1 core, small heaps |
| Parallel GC | Multi-threaded, same architecture | STW, optimized for throughput | Batch processing |
| G1 GC (default) | Region-based, incremental, concurrent marking | STW with configurable pause targets | General-purpose |
| ZGC | Region-based, colored pointers, concurrent compaction | Sub-millisecond regardless of heap size | Latency-sensitive, up to 16 TB |
| Shenandoah | Region-based, forwarding pointers, concurrent compaction | Sub-millisecond, supports compressed oops | Latency-sensitive |
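A quick way to confirm which collector a process is actually running is to ask the standard management API. The sketch below uses `java.lang.management.GarbageCollectorMXBean`. The class name `ActiveCollectors` is an assumption for illustration, and the bean names it prints vary by collector and JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Returns one line per collector bean registered by the running JVM.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running it under different selection flags (-XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC, -XX:+UseZGC, -XX:+UseShenandoahGC) shows different bean names without any change to the application code.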
G1 GC in depth
The heap is divided into equal-sized regions (1 MB to 32 MB by default, always a power of two). Each region is dynamically assigned as Eden, Survivor, Old, Humongous, or Free. There is no fixed physical separation between generations.
| Region type | Role | Generation |
|---|---|---|
| Eden (E) | Newly allocated objects, TLABs live here | Young |
| Survivor (S) | Objects that survived at least one young GC | Young |
| Old (O) | Objects promoted after surviving multiple GCs | Old |
| Humongous (H) | Objects ≥50% of a region size, allocated directly in old gen | Old |
| Free (F) | Available for assignment to any role | Unassigned |
G1 adjusts the number of young-generation regions dynamically to meet the configured pause target (-XX:MaxGCPauseMillis).
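The humongous rule in the region table reduces to simple arithmetic. The sketch below encodes the threshold as the table states it (at least half a region). The class and method names are hypothetical, and the real G1 check works on object sizes in heap words, not on a helper like this.

```java
class G1RegionMath {
    static final long MB = 1024L * 1024L;

    // Rule from the region table: an object is humongous when it
    // occupies at least half of one region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Number of contiguous regions a humongous object would span, rounded up.
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 4 * MB;
        System.out.println(isHumongous(3 * MB, region));   // large array
        System.out.println(isHumongous(1 * MB, region));   // ordinary object
        System.out.println(regionsNeeded(9 * MB, region)); // spans several regions
    }
}
```

This is why a workload dominated by large arrays can benefit from a larger region size (-XX:G1HeapRegionSize): raising the region size raises the humongous threshold with it.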
- Remembered Sets and Card Table: When compiled code writes a reference (putfield), a write barrier marks the corresponding card as "dirty." Concurrent refinement threads process these dirty cards in the background, updating remembered sets (per-region structures that track incoming references).
- Concurrent Marking (SATB): G1 uses Snapshot-At-The-Beginning. A pre-write barrier captures references being overwritten to avoid losing live objects.
- Collection types: Young GC (evacuates Eden+Survivor), Mixed GC (evacuates young + old regions with the most garbage, garbage-first), Full GC (last resort, compacts entire heap).
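The card-marking idea behind remembered sets can be modeled in a few lines. This is a toy model, not G1's barrier: it assumes 512-byte cards, uses plain `long` values as stand-in addresses, and omits the filtering a real barrier performs (for example, skipping stores that stay within one region). All names are hypothetical.

```java
import java.util.BitSet;

class CardTableSketch {
    static final int CARD_SHIFT = 9; // assumption: 512-byte cards

    private final long heapBase;
    private final BitSet dirty;

    CardTableSketch(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.dirty = new BitSet((int) (heapBytes >>> CARD_SHIFT) + 1);
    }

    // Maps a field address to the index of the card that covers it.
    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    // Post-write barrier: after a reference store, mark the covering card dirty.
    void postWriteBarrier(long fieldAddress) {
        dirty.set(cardIndex(fieldAddress));
    }

    boolean isDirty(long address) {
        return dirty.get(cardIndex(address));
    }

    // Stand-in for concurrent refinement: process and clear all dirty cards,
    // returning how many cards had to be scanned.
    int refine() {
        int processed = dirty.cardinality();
        dirty.clear();
        return processed;
    }
}
```

Two stores that land in the same card cost one refinement unit, which is why the number of dirty cards, not the number of reference writes, drives refinement work.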
ZGC: sub-millisecond pauses regardless of heap size (8 MB to 16 TB). The core mechanism is colored pointers: metadata bits in 64-bit pointers. On every heap reference load, a load barrier checks the pointer "color" — if correct, fast path; if incorrect, the barrier "heals" the pointer (updates the address if the object was relocated). Generational ZGC adds store barriers for intergenerational tracking.
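The load-barrier fast path and the self-healing slow path can be sketched with plain bit operations. The bit layout here is invented for illustration (color in the top four bits of a `long`), and the forwarding table is a simple `HashMap`. ZGC's real layout and relocation structures differ and have changed between releases.

```java
import java.util.HashMap;
import java.util.Map;

class ColoredPointerSketch {
    // Assumption: a made-up layout with the color stored in the top 4 bits.
    static final int COLOR_SHIFT = 60;
    static final long ADDRESS_MASK = (1L << COLOR_SHIFT) - 1;

    long goodColor = 1;                                  // changes when a GC phase starts
    final Map<Long, Long> forwarding = new HashMap<>();  // old address -> new address

    static long color(long ptr)   { return ptr >>> COLOR_SHIFT; }
    static long address(long ptr) { return ptr & ADDRESS_MASK; }

    long make(long address) { return (goodColor << COLOR_SHIFT) | address; }

    // Load barrier: runs on every reference load from a heap slot.
    long load(long[] slots, int index) {
        long ptr = slots[index];
        if (color(ptr) == goodColor) {
            return ptr;                                  // fast path: pointer is current
        }
        // Slow path: follow the forwarding entry if the object was relocated.
        long current = forwarding.getOrDefault(address(ptr), address(ptr));
        long healed = make(current);
        slots[index] = healed;                           // self-heal: next load takes the fast path
        return healed;
    }
}
```

The key property is that a stale pointer is repaired at most once per slot: after healing, subsequent loads from that slot go straight through the fast path.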
Shenandoah: also sub-millisecond, but via forwarding pointers instead of colored pointers. In early versions, a forwarding pointer was an extra word prepended to each object header. In modern Shenandoah (JDK 13 and later), the forwarding information is stored inside the mark word when the object is evacuated, eliminating the extra per-object overhead. During concurrent evacuation, application threads check the mark word and follow the redirect if the object has been relocated. Load barriers ensure this happens transparently. Advantage: supports compressed oops.
Where it lives in the code
Each GC has its own subdirectory under gc/:
src/hotspot/share/gc/g1/, G1 complete
src/hotspot/share/gc/shenandoah/, Shenandoah
The BarrierSet framework, how each GC injects its barriers into the compilers:
src/hotspot/share/gc/shared/barrierSet.hpp
G1 provides separate implementations for C1 and C2:
src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp
src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp
Why this matters in practice
Architectural impact of cross-generational references: in G1, every reference write from an old-generation object to a young-generation object triggers a write barrier and eventually updates a remembered set. Consider two cache designs:
// Problematic: unbounded static cache lives in old gen forever.
// Every put() writes a reference from old gen to a young-gen value,
// triggering a write barrier and remembered set update.
// As the cache grows, GC refinement threads consume more CPU.
private static final Map<Long, UserSession> sessionCache = new ConcurrentHashMap<>();
public void onRequest(long userId) {
    sessionCache.put(userId, new UserSession(userId)); // old-to-young ref every time
}
// Better: bounded cache with eviction. Size is controlled,
// old entries are removed (reducing cross-gen references),
// and the GC has fewer remembered set entries to maintain.
private static final Cache<Long, UserSession> sessionCache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterAccess(Duration.ofMinutes(30))
        .build();
The symptom of excessive cross-generational references: the -XX:MaxGCPauseMillis target is met, but throughput drops because concurrent refinement threads are consuming CPU processing dirty cards. JFR events jdk.G1MMU and jdk.GCPhasePause break down where time is spent.
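For a quick look at pause phases from inside a process, the JFR streaming API (`jdk.jfr.consumer.RecordingStream`, available since JDK 14) can subscribe to jdk.GCPhasePause directly. This is a minimal sketch: the class name and the observation window are assumptions, and because JFR flushes events periodically, a short window may report nothing.

```java
import jdk.jfr.consumer.RecordingStream;
import java.util.concurrent.atomic.AtomicInteger;

class GcPauseWatcher {
    // Streams jdk.GCPhasePause events for roughly `millis` milliseconds
    // and returns how many events were observed in that window.
    static int watch(long millis) {
        AtomicInteger seen = new AtomicInteger();
        try (RecordingStream stream = new RecordingStream()) {
            stream.enable("jdk.GCPhasePause");
            stream.onEvent("jdk.GCPhasePause", event -> {
                seen.incrementAndGet();
                System.out.println("pause phase took "
                        + event.getDuration().toNanos() / 1_000 + " us");
            });
            stream.startAsync();
            System.gc();              // provoke at least one collection
            Thread.sleep(millis);     // give the stream time to flush
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("events observed: " + watch(2_000));
    }
}
```

For production analysis a recorded .jfr file inspected offline is usually more practical. The streaming form is handy for ad hoc checks and tests.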
Most runtimes offer a single garbage collector. Go ships one concurrent, non-compacting, non-generational collector optimized for low latency. .NET offers Workstation and Server GC modes with a generational architecture, but a narrower range of trade-offs. Runtimes without GC (Rust, C/C++) eliminate pause-time variability entirely. CPython uses reference counting with a cycle detector. The JVM is unique in offering five production collectors that cover the full spectrum from minimal-footprint (Serial) to sub-millisecond latency on terabyte heaps (ZGC, Shenandoah), selectable at startup without changing application code.
8. Runtime: Threads and Synchronization
What and why
The runtime subsystem manages the lifecycle of threads, synchronization between them, and the bridge to native code (JNI). It is the connective tissue that orchestrates the other subsystems.
How it works: thread model
HotSpot uses a 1:1 model for platform threads: each Java Thread corresponds to exactly one OS native thread.
Virtual Threads implement an M:N model: millions of virtual threads multiplexed over a pool of carrier threads (ForkJoinPool with work-stealing). The core mechanism is continuations, native stackful coroutines:
sequenceDiagram
participant S as Scheduler
participant C as Carrier Thread
participant VT as Virtual Thread
S->>C: mount VT
C->>VT: resume continuation
VT->>VT: execute code
VT->>VT: hit blocking I/O
VT->>C: freeze continuation
C->>S: VT unmounted
S->>C: mount another VT
Note over VT: Stack saved to heap
Note over C: Reused for other VTs
Virtual thread stacks start at a few hundred bytes and grow dynamically on the heap (subject to GC), versus ~1 MB per platform thread.
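A minimal sketch of what this enables, assuming JDK 21 or later: ten thousand blocking tasks, each on its own virtual thread. The class name, task count, and sleep duration are arbitrary choices for illustration. `Thread.sleep` stands in for any blocking I/O call that lets the virtual thread unmount.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class VirtualThreadFanOut {
    // Runs taskCount blocking tasks, one virtual thread per task,
    // and returns how many completed.
    static int runBlockingTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(50)); // blocks the virtual thread only
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runBlockingTasks(10_000));
    }
}
```

The same loop with a platform thread per task would need ten thousand OS threads. Here the tasks share a small pool of carrier threads while they sleep.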
Synchronization evolution
| State | Tag Bits | Mechanism |
|---|---|---|
| Unlocked | 01 | Normal state, no thread holds the lock |
| Lightweight locked | 00 | CAS flips tag bits. Header preserved in-place. Lock stack per-thread. |
| Inflated (ObjectMonitor) | 10 | Full runtime structure with wait queue, adaptive spinning, Object.wait()/notify() support |
| Marked by GC | 11 | Used during garbage collection |
flowchart LR
U["Unlocked: 01"] -->|"CAS flip bits"| L["Lightweight: 00"]
L -->|"contention detected"| I["Inflated: 10"]
I -->|"async deflation"| U
style U fill:#1e293b,stroke:#10b981
style L fill:#1e293b,stroke:#f59e0b
style I fill:#1e293b,stroke:#ef4444
When there is contention (multiple threads competing for the same lock), the lock is inflated to an ObjectMonitor, a full runtime structure that manages: the owning thread, recursion level, entry queue (with adaptive spinning before OS-level blocking), and wait set (threads in Object.wait()). Deflation occurs asynchronously when no thread references the monitor.
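The tag-bit table can be read as a small decoder plus one CAS. The sketch below models the mark word as an `AtomicLong` and follows the encodings from the table. It is a toy: the real fast path lives in generated machine code and also pushes the object onto the per-thread lock stack, which is omitted here. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class LockStateSketch {
    enum LockState { UNLOCKED, LIGHTWEIGHT_LOCKED, INFLATED, GC_MARKED }

    // Decodes the two low-order tag bits of a mark word, per the table above.
    static LockState decode(long markWord) {
        switch ((int) (markWord & 0b11)) {
            case 0b01: return LockState.UNLOCKED;
            case 0b00: return LockState.LIGHTWEIGHT_LOCKED;
            case 0b10: return LockState.INFLATED;
            default:   return LockState.GC_MARKED;
        }
    }

    // Lightweight-lock fast path: a single CAS flips the tag bits 01 -> 00
    // and leaves the rest of the header untouched.
    static boolean tryLightweightLock(AtomicLong header) {
        long mark = header.get();
        if (decode(mark) != LockState.UNLOCKED) {
            return false; // already locked or inflated: caller takes the slow path
        }
        return header.compareAndSet(mark, mark & ~0b11L);
    }
}
```

A failed CAS or a non-unlocked state is the point where the real runtime would fall back to contention handling and, if needed, inflation.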
Where it lives in the code
Threading:
src/hotspot/share/runtime/javaThread.hpp, JavaThread
src/java.base/share/classes/java/lang/VirtualThread.java, Virtual Threads (Java side)
src/hotspot/share/runtime/continuationFreezeThaw.cpp, freeze (capture stack) and thaw (resume) of continuations
Synchronization:
src/hotspot/share/runtime/synchronizer.cpp, monitorenter/monitorexit dispatch
src/hotspot/share/runtime/objectMonitor.cpp, ObjectMonitor (fat lock)
src/hotspot/share/runtime/lightweightSynchronizer.cpp, lightweight locking
Why this matters in practice
Virtual threads change the architecture of I/O-bound services. The one-thread-per-request model that was impractical with platform threads (10,000 threads at ~1 MB each reserve roughly 10 GB of stack space) is now the recommended approach. But the benefit only materializes if blocking operations allow the virtual thread to unmount from the carrier. Consider:
// Historically PINNED the carrier thread. Before JEP 491, the synchronized
// block prevented the virtual thread from unmounting when the I/O call blocked.
// JEP 491 addressed this by reworking monitor ownership tracking, but older
// JDK versions and some edge cases (native frames on stack) can still pin.
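public synchronized String fetchData(String url) throws IOException, InterruptedException {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
    return httpClient.send(request, BodyHandlers.ofString()).body();
}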
// Continuation-aware alternative. ReentrantLock has always supported
// unmounting: when the virtual thread blocks on I/O, it unmounts from
// the carrier, freeing it for other virtual threads.
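private final ReentrantLock lock = new ReentrantLock();
public String fetchData(String url) throws IOException, InterruptedException {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
    lock.lock();
    try {
        return httpClient.send(request, BodyHandlers.ofString()).body();
    } finally {
        lock.unlock();
    }
}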
On JDK versions before JEP 491, synchronized with blocking I/O is a significant scalability bottleneck for virtual threads. On newer versions the problem is largely resolved, but ReentrantLock remains the safer choice in libraries that need to support multiple JDK versions. The diagnostic: JFR event jdk.VirtualThreadPinned identifies pinning occurrences.
If your application spends significant time in inflated monitors, the answer is rarely to tune lock parameters. The answer is to redesign the data access pattern: reduce critical section duration, use lock striping, switch to java.util.concurrent structures, or eliminate shared mutable state entirely. JFR events jdk.JavaMonitorWait and jdk.JavaMonitorEnter quantify the cost.
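Of the redesign options listed, lock striping is the easiest to show in a few lines. The sketch below splits one hot lock into several, selected by key hash, so threads working on different keys rarely contend. The class, the stripe count, and the `hammer` helper are hypothetical. For plain counters, `java.util.concurrent.atomic.LongAdder` is usually the better off-the-shelf choice.

```java
import java.util.concurrent.locks.ReentrantLock;

class StripedCounters {
    private final ReentrantLock[] locks;
    private final long[] counts;

    StripedCounters(int stripes) {
        locks = new ReentrantLock[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // Picks a stripe from the key's hash; floorMod keeps the index non-negative.
    private int stripeFor(Object key) {
        return Math.floorMod(key.hashCode(), locks.length);
    }

    void increment(Object key) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            counts[s]++;
        } finally {
            locks[s].unlock();
        }
    }

    long total() {
        long sum = 0;
        for (int s = 0; s < locks.length; s++) {
            locks[s].lock();
            try {
                sum += counts[s];
            } finally {
                locks[s].unlock();
            }
        }
        return sum;
    }

    // Demo helper: several threads increment under different keys, then we read the total.
    static long hammer(int threads, int incrementsPerThread) {
        StripedCounters counters = new StripedCounters(16);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counters.increment("key-" + (id * 31 + i));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counters.total();
    }
}
```

The design trade-off is that any operation needing a consistent global view (here, `total()`) has to visit every stripe, so striping pays off when per-key updates dominate.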
Go pioneered M:N scheduling for mainstream use with goroutines (2012). Erlang/BEAM has run lightweight processes with preemptive scheduling since the 1980s. The JVM arrived later with virtual threads in JDK 21 (2023), but with a critical advantage: ecosystem integration. Virtual threads work transparently with the entire existing Java library surface (JDBC drivers, HTTP clients, logging frameworks, serialization) without requiring new APIs or concurrency patterns. Goroutines come with their own language and ecosystem, where channels are the idiomatic coordination tool. Erlang is built around actor-style message passing between processes. Virtual threads bring M:N scheduling to the imperative, thread-per-request style that most enterprise Java code already uses.
On the synchronization side, Rust eliminates data races at compile time through ownership and borrowing rules, a fundamentally different approach that trades runtime flexibility for compile-time safety. Go uses a simpler mutex model without the lightweight/inflated escalation that the JVM performs.
9. Safepoints: The Universal Coordination Mechanism
What and why
Several JVM operations require all Java threads to be in a safe and predictable state: STW GC phases, deoptimization, class redefinition. The mechanism that coordinates this is safepoints.
How it works
The mechanism is cooperative via page-trap polling:
sequenceDiagram
participant VM as VMThread
participant T1 as Thread 1
participant T2 as Thread 2
participant PP as Polling Page
VM->>PP: mprotect PROT_NONE
Note over PP: Page marked unreadable
T1->>PP: load poll
PP-->>T1: SIGSEGV
T1->>T1: signal handler enters safepoint
T2->>PP: load poll
PP-->>T2: SIGSEGV
T2->>T2: signal handler enters safepoint
VM->>VM: all threads safe
VM->>VM: execute operation
VM->>PP: mprotect PROT_READ
T1->>T1: resume
T2->>T2: resume
Compiled code contains periodic polls: a load from the polling page. During normal execution, this load hits the L1 cache (~1 cycle, negligible cost). When the VM needs a safepoint, it marks the page as unreadable. The next poll triggers a page fault caught by the signal handler. The interpreter takes a different route: the JVM swaps its dispatch table for a variant with safepoint checks.
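Thread-local handshakes (JEP 312) apply the same idea to individual threads: each thread polls its own state (held in its JavaThread structure). This enables operations like stack trace sampling or lock revocation on a single thread without pausing the entire application.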
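The cooperative protocol can be imitated in plain Java to build intuition. In this toy model a `volatile` flag plays the role of the armed polling page, worker threads poll it once per loop iteration, and a coordinator waits until every worker has parked before running its "operation". None of this is HotSpot's implementation, and all names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class SafepointSketch {
    private volatile boolean pollArmed = false;   // stands in for the protected polling page
    private volatile boolean shutdown = false;
    private volatile CountDownLatch arrived;
    private volatile CountDownLatch resume;
    private final AtomicLong progress = new AtomicLong();

    // The poll each worker executes once per loop iteration.
    private void poll() throws InterruptedException {
        if (pollArmed) {
            CountDownLatch release = resume;
            arrived.countDown();   // report "I am at a safepoint"
            release.await();       // park until the coordinator is done
        }
    }

    private boolean run(int workerCount) throws InterruptedException {
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (!shutdown) {
                        progress.incrementAndGet();   // "application work"
                        poll();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        arrived = new CountDownLatch(workerCount);
        resume = new CountDownLatch(1);
        pollArmed = true;                 // request the safepoint
        arrived.await();                  // this wait is the time-to-safepoint
        long before = progress.get();     // the "VM operation": nothing may move now
        Thread.sleep(20);
        long after = progress.get();
        pollArmed = false;                // disarm, then release the workers
        resume.countDown();
        shutdown = true;
        for (Thread w : workers) {
            w.join();
        }
        return before == after;
    }

    // Returns true if no worker made progress while the safepoint was held.
    static boolean demo(int workerCount) {
        try {
            return new SafepointSketch().run(workerCount);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The model also shows why time-to-safepoint matters: `arrived.await()` cannot return until the slowest worker reaches its next poll.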
Where it lives in the code
src/hotspot/share/runtime/safepoint.cpp
src/hotspot/share/runtime/safepointMechanism.cpp
src/hotspot/share/runtime/handshake.cpp, thread-local handshakes
Why this matters in practice
A safepoint pause is not the same thing as a GC pause, and confusing the two is one of the most common troubleshooting mistakes. A GC pause is one reason for a safepoint, but deoptimization, class redefinition, thread dumps, and (on JDKs that still ship biased locking) biased lock revocation are others. The time-to-safepoint (TTSP) can be significant: a thread inside a counted loop from which the JIT removed safepoint polls can delay the entire safepoint by hundreds of milliseconds. The symptom: GC logs show short GC pauses, but application latency shows long pauses. The diagnostic: -Xlog:safepoint shows TTSP separately from operation time. -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=2000 identifies threads that take too long to reach safepoint.
10. How the Subsystems Connect
The most important insight about HotSpot is that performance emerges not from individual components, but from their coordinated interactions.
Subsystem interaction map
| Source | Target | Interaction |
|---|---|---|
| Interpreter / C1 | C2 JIT | Profiling data in MethodData feeds speculative optimizations |
| C2 JIT | GC | Write/load barriers injected via BarrierSet. OOP maps tell GC where references are in compiled code |
| Class Loading | C2 JIT | New subclass loaded invalidates class hierarchy dependency, triggers deoptimization |
| C2 JIT | Interpreter | Deoptimization reverts compiled frames to interpreted frames |
| Safepoints | GC | Coordinates STW phases. Ensures all roots are mapped |
| Safepoints | JIT | Coordinates bulk deoptimization when dependencies are invalidated |
| Runtime (JVMTI) | Class Loading | Class redefinition is a Runtime operation (via JVMTI agent) that uses a safepoint and modifies class metadata in the Class Loading subsystem |
| Runtime | All | VMThread executes safepoint operations. CompileBroker manages compilation. ForkJoinPool schedules virtual threads |
| JFR / JVMTI | All | Observes events from every subsystem with minimal overhead |
The interactions form two distinct cycles, the compilation cycle (how code gets faster) and the invalidation cycle (how stale assumptions are discarded), plus an orchestration layer that keeps the subsystems in sync.
The compilation cycle
flowchart LR
INT["Interpreter / C1"] -->|"profiling data"| JIT["C2 JIT"]
JIT -->|"barriers + OOP maps"| GC["Garbage Collector"]
style INT fill:#1e293b,stroke:#f59e0b
style JIT fill:#1e293b,stroke:#6366f1
style GC fill:#1e293b,stroke:#10b981
The Interpreter and C1 collect profiling data (MethodData) that feeds C2's speculative optimizations. C2 generates write/load barriers for the active GC and OOP maps that tell the GC where references live in compiled frames.
The invalidation cycle
flowchart LR
CL["Class Loading"] -->|"invalidates dependencies"| JIT["C2 JIT"]
JIT -->|"deoptimization"| INT["Interpreter"]
INT -->|"new profiling data"| JIT
style CL fill:#1e293b,stroke:#8b5cf6
style JIT fill:#1e293b,stroke:#6366f1
style INT fill:#1e293b,stroke:#f59e0b
When a new class is loaded that breaks a C2 assumption (e.g., a new subclass appears), the dependency system marks affected compiled methods as invalid. Deoptimization reverts them to the interpreter, which collects fresh profiling data, and the cycle restarts.
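A small program can set up the conditions for this cycle. In the sketch below, the call site in `totalArea` only ever sees `Circle` during warmup, and a second implementation appears afterwards. The class names and iteration counts are hypothetical, and whether a deoptimization actually occurs depends on the JDK version, tiering thresholds, and inlining decisions. Running with -XX:+PrintCompilation typically shows the affected method being "made not entrant" when it does.

```java
class DependencyInvalidationSketch {
    interface Shape { double area(); }

    static final class Circle implements Shape {
        private final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // Hot method: during warmup the call to area() only ever sees Circle.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    static double run() {
        Shape[] circles = new Shape[1_000];
        for (int i = 0; i < circles.length; i++) {
            circles[i] = new Circle(1.0);
        }
        double warm = 0;
        for (int i = 0; i < 20_000; i++) {
            warm += totalArea(circles);      // phase 1: one receiver type observed
        }
        if (warm < 0) {
            throw new IllegalStateException("unreachable; keeps the warmup result live");
        }
        // Phase 2: Square is first used here, after the hot method was compiled.
        Shape[] mixed = { new Circle(1.0), new Square(2.0) };
        return totalArea(mixed);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The result is correct either way. What changes is the path the JVM takes to get there: speculative code is discarded and the method is recompiled with a profile that includes both types.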
The orchestration layer
The Runtime subsystem (VMThread, CompileBroker, ForkJoinPool scheduler) sits above all of this and orchestrates through safepoints, the cooperative mechanism that pauses threads when a stop-the-world operation is needed (GC STW phases, bulk deoptimization, class redefinition).
The complete lifecycle of a hot method
1. Class Loading loads the class and resolves dependencies in the SystemDictionary.
2. Interpreter executes the method, collecting data in the MethodData.
3. C1 compiles with full profiling (Tier 3), inserting GC barriers and OOP maps. Code goes to the code cache (profiled segment).
4. While executing, C1 code continues collecting rich data in the MethodData.
5. C2 compiles with aggressive optimizations (Tier 4) using MethodData: devirtualization, inlining, escape analysis, vectorization. Registers dependencies. Inserts GC barriers and generates OOP maps.
6. C2 code goes to the code cache (non-profiled segment). Previous C1 code is made not entrant and is later reclaimed.
7. If a new class invalidates a dependency, deoptimization via safepoint brings execution back to step 2.
8. The cycle repeats with updated profiling.
11. Where the JVM Pays the Price
The JVM's adaptive architecture delivers significant advantages in sustained workloads. But it comes with trade-offs that other runtime models do not pay. Understanding these trade-offs, and the work the OpenJDK community is doing to address them, is essential for making informed technology decisions.
Startup time
The cost. AOT-compiled runtimes deliver running processes in single-digit milliseconds. The JVM needs to load classes, verify bytecode, interpret, and then JIT-compile. In serverless environments, container orchestration with aggressive scale-out, or CLI tools, this startup cost compounds with every cold start.
Memory footprint
The cost. The JVM loads a full runtime before any application code runs. A minimal Java process consumes tens of megabytes. Comparable programs in Go, Rust, or .NET trimmed/AOT start with single-digit megabytes. In containerized environments where thousands of instances run simultaneously, this per-instance overhead multiplies.
jlink builds custom runtime images containing only the modules the application needs. Project Leyden will reduce the amount of metadata loaded at startup.
Warmup latency
The cost. Until the JIT reaches Tier 4 (C2), the application runs with suboptimal code. The first seconds of a process execute interpreted bytecode or C1-compiled code that is correct but not peak-optimized. In environments with frequent deployments or aggressive horizontal scaling, applications may spend a meaningful fraction of their lifetime below peak performance.
Operational complexity
The cost. GC selection, heap sizing, code cache tuning, classpath vs modulepath, JIT flags. The JVM has a large operational surface. Runtimes like Go made a deliberate choice of simplicity: one garbage collector, a handful of tuning knobs (chiefly GOGC and GOMEMLIMIT), a single static binary.
12. Conclusion
This continuous feedback loop between interpretation, profiling, compilation, and deoptimization is what defines the JVM's adaptive model. No single subsystem is responsible for peak performance. It emerges from the interaction between all of them: the interpreter collects data, the JIT uses it to speculate, the GC cooperates through barriers and OOP maps, and safepoints keep everything synchronized.
This is why the JVM remains such a strong choice in enterprise environments. In applications that run for hours, days, or months under sustained load, the adaptive model delivers performance that improves over time, shaped by the actual production workload rather than assumptions made at build time. Five garbage collectors, virtual threads, speculative devirtualization, and escape analysis are not features in isolation. They are parts of a system designed to extract maximum performance from long-running, high-concurrency workloads.
The JVM is not the right tool for every scenario. Its startup cost, memory footprint, and warmup latency are real trade-offs that matter in serverless functions, CLI tools, and resource-constrained environments. But the OpenJDK community is actively closing those gaps, and the engineering momentum behind projects like CRaC, Leyden, and Lilliput shows that the platform is not standing still.
Understanding how these subsystems work together does not just satisfy curiosity. It changes the way you design applications, diagnose production issues, and make technology decisions. That is the goal of this study: not to prove that the JVM is the best runtime, but to give you the depth of understanding to know when it is, and why.
13. References
Official Documentation
- The Java Virtual Machine Specification, Java SE 21 Edition
- HotSpot Runtime Overview
- HotSpot Glossary of Terms
- Inside.java, Official Java team blog
Repository
- github.com/openjdk/jdk, Full source code
- DeepWiki, OpenJDK/JDK, Documented repository navigation
OpenJDK Projects
- Project CRaC, Coordinated Restore at Checkpoint
- Project Leyden, Condensing the JVM startup
Referenced JEPs
- JEP 197: Segmented Code Cache
- JEP 243: JVMCI
- JEP 312: Thread-Local Handshakes
- JEP 374: Deprecate and Disable Biased Locking
- JEP 387: Elastic Metaspace
- JEP 439: Generational ZGC
- JEP 444: Virtual Threads
- JEP 450 / JEP 519: Compact Object Headers
- JEP 474: ZGC Generational Mode by Default
- JEP 475: Late Barrier Expansion for G1
- JEP 491: Synchronize Virtual Threads without Pinning