MCP Server Performance Benchmark v2: 15 Implementations, I/O-Bound Workloads
An expanded benchmark covering 15 MCP server implementations across Rust, Java (Spring MVC, WebFlux, Virtual Threads), Quarkus, Micronaut (JVM and GraalVM native images), Go, Bun, Node.js, and Python. Three independent runs totaling 39.9 million requests with real Redis and HTTP API I/O workloads and 0% error rate across all implementations.
This benchmark evaluates a specific question: how do different language runtimes and frameworks perform when implementing MCP servers under I/O-bound workloads at 50 concurrent virtual users? The results reflect this test scenario and should be interpreted within that context. They do not constitute a general ranking of programming languages or frameworks.
MCP is a nascent protocol and its ecosystem (SDKs, tooling, server frameworks) is evolving rapidly. Several SDKs used here are pre-release or early-access (Micronaut MCP SDK 0.0.19). Results may differ with future SDK versions, different concurrency levels, or alternative workload patterns.
The intent of this research is constructive contribution to the MCP ecosystem. No technology evaluated here is unsuitable by nature: each serves real use cases and the communities behind them continue to evolve their implementations. A lower benchmark rank reflects performance under this specific workload, not a judgment of quality or fitness for purpose. We welcome corrections, alternative configurations, and contributions via the benchmark repository.
Abstract
This experiment presents a comprehensive performance analysis of 15 Model Context Protocol (MCP) server implementations spanning Rust, Java (Spring MVC, WebFlux, Virtual Threads), Quarkus, Micronaut (JVM and GraalVM native images), Go, Bun, Node.js, and Python. Three independent runs totaling 39.9 million requests were executed with I/O-bound workloads (Redis + HTTP API), achieving a 0% error rate across all 15 servers in all 3 runs.
Key Findings: Rust leads throughput at 4,845 RPS with only 10.9 MB RAM and a CV of 0.04%. Quarkus leads latency at 4.04ms average and 8.13ms P95. Go and Java remain competitive at 3,616 and 3,540 RPS respectively. Classic blocking I/O (Spring MVC) outperforms reactive WebFlux at 50 VUs. GraalVM native images uniformly reduce memory (27-81%) while reducing throughput (20-36%), with Quarkus-native as the best native trade-off. Bun delivers 2.2x the RPS of Node.js on identical application code. Python with 4 workers and uvloop reaches 259 RPS. The bottleneck is FastMCP session overhead, not the ASGI server.
Recommendations: For high-load production deployments, Rust offers unmatched throughput and resource efficiency. Quarkus is the optimal choice when latency SLAs are primary. Go provides an excellent balance of performance, memory, and operational simplicity. Java Spring MVC (blocking) remains a strong Tier 2 choice. Native images are justified where memory is constrained and throughput requirements are moderate (below 3,500 RPS). JavaScript and Python runtimes are better suited for low-to-moderate traffic MCP deployments.
1. Introduction and Motivation
The MCP ecosystem is growing rapidly. Organizations adopting MCP servers face a widening set of implementation trade-offs: native compilation via GraalVM, reactive vs blocking I/O models within the JVM, alternative JavaScript runtimes, and the impact of real external I/O workloads on framework-level decisions. v2 was designed to surface these trade-offs empirically.
The v1 benchmark drew valid community feedback: no Quarkus, no GraalVM native images, no Virtual Threads, no reactive frameworks, no Micronaut, no Rust, and Python running in a single-worker default configuration. Version 2 is the direct experimental response to those criticisms, expanding from 4 to 15 implementations and replacing synthetic CPU tools with real Redis and HTTP API workloads. Five questions guided the design:
- How do JVM-based implementations compare across blocking, reactive, and virtual thread concurrency models?
- What are the throughput and latency trade-offs between JVM and GraalVM native images under I/O-bound load?
- Does Rust belong in the MCP server performance conversation?
- Can Bun's JavaScriptCore runtime change the JavaScript performance story for MCP servers?
- What is the realistic production ceiling for optimized Python (multi-worker + uvloop)?
2. Experimental Setup
2.1 From v1 to v2: What Changed
| Dimension | v1 | v2 |
|---|---|---|
| Servers | 4 | 15 |
| Java variants | 1 (Spring Boot + Spring AI) | 6 (Spring MVC, WebFlux, VT + 3 native images) |
| New runtimes | None | Rust, Quarkus, Micronaut, Bun |
| Tools | 4 synthetic (fibonacci, HTTP, JSON, sleep) | 3 I/O-bound (Redis + HTTP API) |
| Total requests | ~3.9M | ~39.9M |
| Runs | 3 rounds | 3 full independent runs |
| CPU per container | 1.0 vCPU | 2.0 vCPUs |
| Memory per container | 1 GB | 2 GB |
2.2 Test Environment
| Component | Specification |
|---|---|
| Host | Microsoft Azure VM, 8 vCPUs, 32 GB RAM |
| OS | Ubuntu 24.04 LTS |
| Container Runtime | Docker with Docker Compose |
| CPU Limit (per MCP server) | 2.0 vCPUs |
| Memory Limit (per MCP server) | 2 GB |
| Infrastructure | Redis 7 Alpine (0.5 vCPU / 512 MB) + Go API service (2 vCPUs / 2 GB) |
| Network | Docker bridge (inter-container, localhost) |
| Test Runs | 3 independent runs (February 27-28, 2026) |
2.3 Server Implementations
| Server | Framework / SDK | Runtime |
|---|---|---|
| rust | rmcp 0.17.0 | Rust / Tokio |
| quarkus | Quarkus 3.31.4 MCP Server SDK | Java 21 / Vert.x |
| go | mcp-go | Go 1.23 |
| java | Spring Boot 4 MVC + Spring AI | Java 21 |
| java-vt | Spring Boot 4 + Project Loom | Java 25 |
| java-webflux | Spring Boot 4 WebFlux + Spring AI | Java 21 / Netty |
| micronaut | Micronaut 4.10.8 / MCP SDK 0.0.19 | Java 21 |
| quarkus-native | Quarkus native image | GraalVM 23 / native |
| java-native | Spring Boot native image | GraalVM 25 / native |
| java-vt-native | Spring Boot VT native image | GraalVM 25 / native |
| java-webflux-native | Spring Boot WebFlux native | GraalVM 25 / native |
| micronaut-native | Micronaut native image | GraalVM 23 / native |
| bun | Express + MCP SDK | Bun 1 / JavaScriptCore |
| nodejs | Express + MCP SDK | Node.js 22 / V8 |
| python | FastMCP + Starlette | Python 3.11 / CPython |
2.4 Benchmark Tools and Workload
Each server implements three identical tools performing I/O-bound operations against a Redis instance and an HTTP API service (100,000 products in-memory):
| Tool | I/O Operations | Performance Dimension |
|---|---|---|
| search_products | HTTP GET /products/search + Redis ZRANGE (parallel) | Parallel async I/O, HTTP client pool |
| get_user_cart | Redis HGETALL, then HTTP GET /products/{id} + Redis LRANGE (parallel) | Sequential + parallel I/O, Redis read patterns |
| checkout | HTTP POST /cart/calculate + Redis pipeline INCR+RPUSH+ZADD (parallel) | Write throughput, Redis pipeline efficiency |
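The parallel pattern behind search_products can be sketched as follows; `http_search` and `redis_zrange` are hypothetical stand-ins for the real HTTP and Redis clients, simulated here with `asyncio.sleep`:

```python
import asyncio

# Hypothetical stand-ins for the real HTTP and Redis clients.
async def http_search(query):
    await asyncio.sleep(0.01)  # simulates GET /products/search
    return [{"id": 1, "name": "widget"}]

async def redis_zrange(key):
    await asyncio.sleep(0.01)  # simulates Redis ZRANGE
    return [b"1", b"2"]

async def search_products(query):
    # Both calls start immediately and run concurrently, so the tool's
    # latency is max(http, redis) rather than their sum.
    products, ranking = await asyncio.gather(
        http_search(query), redis_zrange("products:popular")
    )
    return {"products": products, "ranking": ranking}

result = asyncio.run(search_products("widget"))
```

Every implementation expresses this same fan-out in its own idiom (tokio::join in Rust, goroutines in Go, Mutiny combinators in Quarkus).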
```javascript
// k6 Load Profile — v2
export const options = {
  stages: [
    { duration: '10s', target: 50 }, // Ramp-up to 50 VUs
    { duration: '5m', target: 50 },  // Sustained load
    { duration: '10s', target: 0 },  // Ramp-down
  ],
  thresholds: {
    'http_req_failed': ['rate<0.05'],
  },
};
// First 60 seconds excluded from metrics (WARMUP_SECONDS=60)
```
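The warmup exclusion can be modeled as a simple filter; `exclude_warmup` is a hypothetical helper, not the actual harness code:

```python
WARMUP_SECONDS = 60

def exclude_warmup(samples, start_time):
    # Keep only (timestamp, value) samples recorded after the warmup window.
    return [(t, v) for t, v in samples if t - start_time >= WARMUP_SECONDS]

kept = exclude_warmup([(10, 900), (59, 950), (60, 1000), (300, 1010)], start_time=0)
```

Dropping the first 60 seconds removes JIT cold-start noise from the JVM-based servers before any metric is aggregated.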
2.5 Test Methodology
```mermaid
graph LR
    A[Redis Flush] --> B[Redis Seed]
    B --> C[Stop MCP Servers]
    C --> D[Start Target Server]
    D --> E[Health Check]
    E --> F[Warmup 60s]
    F --> G[k6 Run 5min]
    G --> H[Stats Collection]
    H --> I[Consolidate Results]
    style D fill:#1e293b,stroke:#6366f1
    style G fill:#1e293b,stroke:#10b981
```
Figure 1: Per-server test cycle ensuring isolation and reproducibility
- Redis Isolation: Flush and re-seed before each server to ensure identical data state across all 15 servers.
- Server Isolation: Only one MCP server running during each test period, eliminating resource contention.
- Warmup: 60 seconds of real tool calls excluded from metrics to allow JIT compilation on all code paths (5 init sessions + 9 tool call sessions per server).
- Sustained Load: 50 VUs, 5 minutes, all 3 tools called in rotation. VU N uses user-(N%1000), providing 50 distinct users.
- Parallel Stats Collection: Docker stats sampled alongside k6 metrics for CPU and memory.
- First run: February 27, 2026 at 21:08:47 UTC
- Second run: February 27, 2026 at 23:34:47 UTC
- Third run: February 28, 2026 at 00:00:47 UTC
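The VU-to-user mapping and tool rotation can be modeled as follows (written in Python for illustration; the real logic lives in the k6 script):

```python
TOOLS = ["search_products", "get_user_cart", "checkout"]

def user_for_vu(vu):
    # VU N uses user-(N % 1000); with 50 VUs this yields 50 distinct users.
    return f"user-{vu % 1000}"

def tool_for_iteration(i):
    # The three tools are called in rotation.
    return TOOLS[i % len(TOOLS)]

distinct_users = {user_for_vu(n) for n in range(1, 51)}
```

Distinct users per VU keep Redis key access spread out, while the rotation guarantees every server faces an identical tool mix.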
3. Implementation Details
3.1 Rust (rmcp SDK)
During our benchmarking, we identified an anomalous fixed latency on every tool that performed HTTP calls. In an isolated run with rmcp v0.16 defaults, the search_products tool (pure Redis path) ran at 1.11ms avg, while get_user_cart and checkout were stuck at 40.84ms regardless of actual I/O time. We traced the cause to rmcp v0.16, which hardcodes `text/event-stream` for all responses, even stateless request-response exchanges. Every response carries chunked transfer-encoding, SSE framing, and keep-alive pings: approximately 40ms of pure transport overhead per request on any tool that returns an HTTP-originated payload. The MCP spec explicitly permits `application/json` for stateless responses. rmcp was simply not implementing that path.
This issue is fixed in rmcp v0.17.0, released on February 27, 2026, which ships the `json_response` option as an official feature. The benchmark was run with the equivalent patch applied to v0.16. The server implementation has since been updated to use rmcp 0.17.0 from crates.io directly, with no local patch required.
| Configuration | RPS | Avg Latency | Tool Breakdown |
|---|---|---|---|
| Without json_response patch (SSE default) | 1,283 | 27.59 ms | search_products: 1.11ms / get_user_cart: 40.84ms / checkout: 40.84ms |
| With json_response patch | 4,845 | 5.09 ms | search_products: 6.12ms / get_user_cart: 5.63ms / checkout: 3.51ms |
```rust
// StreamableHttpServerConfig with json_response patch
StreamableHttpServerConfig {
    stateful_mode: false,
    json_response: true, // returns application/json directly instead of SSE
    ..Default::default()
}
```

```rust
// New branch in tower.rs (simplified)
if self.config.json_response {
    let cancel = self.config.cancellation_token.child_token();
    match tokio::select! {
        res = receiver.recv() => res,
        _ = cancel.cancelled() => None,
    } {
        Some(message) => {
            let body = serde_json::to_vec(&message)?;
            Ok(Response::builder()
                .header(CONTENT_TYPE, JSON_MIME_TYPE)
                .body(Full::new(Bytes::from(body)).boxed()))
        }
        None => Err(internal_error_response("empty response")(...))
    }
}
```
We submitted the fix to the official modelcontextprotocol/rust-sdk repository as PR #683. The patch adds `json_response: bool` to `StreamableHttpServerConfig`, backwards-compatible by default (`false` preserves the original SSE path unchanged). It was refined during code review (maintainers suggested `tokio::select!` with cancellation safety and `tracing::info!` logging), merged, and shipped as part of rmcp v0.17.0 on February 27, 2026.
Key Characteristics:
- rmcp 0.17.0 (official release) with `json_response: true` in `StreamableHttpServerConfig`
- Tokio async runtime with deadpool-redis connection pool (pool size 100)
- Parallel tool handlers via `tokio::spawn`
- Redis pipeline combines 3 write operations in checkout into a single RTT
- 10.9 MB average RAM (lowest of all 15 servers)
- CV 0.04% (most stable across all 3 runs)
3.2 Quarkus (Vert.x / Mutiny)
Quarkus required explicit connection pool tuning: with default pool settings, the server produced effectively 0 RPS under 50 VUs. The following configuration resolved it:

```properties
quarkus.rest-client.api-service.connection-pool-size=1000
quarkus.rest-client.api-service.keep-alive-enabled=true
quarkus.redis.max-pool-size=100
quarkus.redis.max-pool-waiting=1000
```
Key Characteristics:
- Reactive Vert.x/Mutiny event loop model, non-blocking I/O throughout
- Lowest CPU usage among all Java frameworks (161.7% avg, vs 200%+ for others)
- 194.5 MB RAM (lowest JVM footprint among JVM Java variants)
- Best latency of all 15 servers: 4.04ms avg, 8.13ms P95, 11.16ms P99
3.3 Go (mcp-go)
```go
var httpClient = &http.Client{
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 100,
		IdleConnTimeout:     90 * time.Second,
	},
	Timeout: 10 * time.Second,
}
```
Go's `http.DefaultClient` uses `MaxIdleConnsPerHost=2`. Under 50 VUs, goroutines contended for two idle keep-alive connections, causing TCP connection churn and P95 latency spikes of 61ms. Replacing it with a tuned transport reduced P95 to 17.62ms.
Key Characteristics:
- Goroutine-per-request concurrency model with the `net/http` standard library
- 23.9 MB average RAM (second lowest after Rust)
- Static binary, no runtime dependency
- CV 0.19% (third most stable overall)
3.4 Java (Spring MVC, Blocking I/O)
Key Characteristics:
- Spring Boot 4 MVC with Tomcat thread pool executor
- Sequential blocking I/O via `RestClient` and `StringRedisTemplate` per request
- No CompletableFuture or reactive primitives
- 368.1 MB average RAM (standard Spring Boot JVM footprint)
3.5 Java Virtual Threads (Project Loom)
Key Characteristics:
- `spring.threads.virtual.enabled=true` (canonical Spring Boot 4 configuration)
- Java 25 selected to include JEP 491 (delivered in Java 24): Virtual Threads can now acquire and release synchronized monitors without pinning the carrier thread. Before this fix, any synchronized block inside the call stack (including Spring internals) would pin the Virtual Thread to its OS carrier thread for the duration, eliminating the concurrency benefit for I/O-bound code.
- Same `RestClient` and `StringRedisTemplate` as Java MVC
- 349.7 MB average RAM (slightly below MVC's 368.1 MB)
- Competitive throughput at 3,482 RPS
3.6 Java WebFlux (Reactor / Netty)
WebFlux runs on Netty, which allocates off-heap direct buffers through `PooledByteBufAllocator`. These buffers are invisible to `-Xmx` and persist in GraalVM native images. This explains the high memory peak (663 MB max) and why java-webflux-native also has elevated memory (351 MB avg) despite native compilation.
Key Characteristics:
- Reactor event loop model, non-blocking I/O throughout
- Competitive average latency (8.89ms avg) but high P99 tail in checkout (47ms)
- 484.6 MB average RAM (highest among JVM variants, due to Netty off-heap buffers)
3.7 Micronaut
The micronaut-native image is built with `graalvm/native-image-community:23`. The previous GraalVM version could not fully inline Micronaut's annotation-driven dispatch at compile time, causing a -49% RPS regression vs JVM in earlier testing (worst of any stack). GraalVM 23 improves closed-world analysis for annotation metadata, reducing the regression significantly.
Key Characteristics:
- Micronaut MCP Server SDK 0.0.19 (pre-release)
- Compile-time dependency injection and annotation processing
- Netty server (same off-heap memory pattern as WebFlux)
- JVM variant competitive at 3,382 RPS
3.8 GraalVM Native Images: Cross-Stack Analysis
Five server stacks were benchmarked in both JVM and GraalVM native image configurations, revealing consistent trade-off patterns across all stacks.
| Stack | JVM RPS | Native RPS | RPS Regression | JVM RAM avg | Native RAM avg | RAM Saving |
|---|---|---|---|---|---|---|
| Quarkus | 4,739 | 3,449 | -27% | 194 MB | 36 MB | -81% |
| Java MVC | 3,540 | 2,316 | -35% | 368 MB | 178 MB | -52% |
| Java VT | 3,482 | 2,447 | -30% | 350 MB | 194 MB | -44% |
| WebFlux | 3,032 | 2,413 | -20% | 485 MB | 351 MB | -28% |
| Micronaut | 3,382 | 2,161 | -36% | 216 MB | 63 MB | -71% |
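The regression and saving columns are straightforward ratios of the JVM and native measurements; a quick check with a hypothetical helper:

```python
def pct_change(jvm_value, native_value):
    # Negative result means the native image is lower than the JVM variant.
    return round((native_value - jvm_value) / jvm_value * 100)

quarkus_rps_regression = pct_change(4739, 3449)    # RPS: JVM -> native
quarkus_ram_saving = pct_change(194, 36)           # RAM: JVM -> native
micronaut_rps_regression = pct_change(3382, 2161)
```

The same arithmetic reproduces every cell in the regression and saving columns above.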
3.9 Node.js and Bun
Creating a new `McpServer` and `StreamableHTTPServerTransport` per HTTP request is the intentional design for stateless MCP servers. The SDK throws "Stateless transport cannot be reused across requests" on any reuse attempt. This sets a fixed 5-10ms overhead floor per request that no framework swap can eliminate.
Experiments attempted and reverted: an undici pool with `connections: 100` regressed RPS by 28%. Hono + `WebStandardStreamableHTTPServerTransport` cost 23% RPS due to IncomingMessage-to-Request adaptation overhead on Node.js. Express with `StreamableHTTPServerTransport` is retained.
```javascript
// Cluster entry (WEB_CONCURRENCY=4 worker processes)
import cluster from 'node:cluster';
import express from 'express';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { createMcpServer } from './mcp.js'; // server factory defined elsewhere in the app

const WORKERS = parseInt(process.env.WEB_CONCURRENCY || '1', 10);

if (cluster.isPrimary) {
  for (let i = 0; i < WORKERS; i++) cluster.fork();
} else {
  const app = express();
  app.use(express.json());

  // Per-request McpServer (SDK design constraint — cannot be reused)
  app.post('/mcp', async (req, res) => {
    const server = createMcpServer();
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
  });

  app.listen(process.env.PORT || 8081);
}
```
Both runtimes execute the same `index.js` with `WEB_CONCURRENCY=4`. Bun's JavaScriptCore JIT and native `fetch()` deliver 876 RPS vs Node.js's 423 RPS (a 2.2x ratio). Memory cost: Bun uses 540.8 MB vs Node.js's 389.2 MB (about 152 MB more). Both runtimes saturate 2 vCPUs at approximately 200% with 4 workers.
3.10 Python (FastMCP + uvloop)
Python needed no transport fix: FastMCP's `json_response=True` option, which avoids SSE framing overhead, was already present in v1.
```python
# Launch command:
#   uvicorn main:app --host 0.0.0.0 --port 8082 --workers 4 --loop uvloop
from contextlib import asynccontextmanager

import httpx
from fastmcp import FastMCP

# FastMCP configuration: json_response=True eliminates SSE framing overhead
mcp = FastMCP("BenchmarkPythonServer", stateless_http=True, json_response=True)

# Shared HTTP client (avoids per-request TCP pool creation)
@asynccontextmanager
async def lifespan(app):
    global _http_client
    _http_client = httpx.AsyncClient(timeout=10.0)
    async with mcp.session_manager.run():
        yield
    await _http_client.aclose()
```
Key Characteristics:
- GIL limits true parallelism within each worker process
- 4 workers x approximately 65 RPS per worker = 259 RPS total
- 258.6 MB average RAM
- The bottleneck is FastMCP session overhead in CPython, not uvicorn or network I/O
4. Results and Analysis
4.1 Overall Performance Metrics
| Server | RPS (avg) ▼ | Avg Latency | P95 Latency | Requests Served (3 runs) |
|---|---|---|---|---|
| rust | 4,845 | 5.09 ms | 10.99 ms | 4,724,624 |
| quarkus | 4,739 | 4.04 ms | 8.13 ms | 4,620,520 |
| go | 3,616 | 6.87 ms | 17.62 ms | 3,525,424 |
| java | 3,540 | 6.13 ms | 13.71 ms | 3,452,064 |
| java-vt | 3,482 | 9.03 ms | 18.43 ms | 3,395,384 |
| quarkus-native | 3,449 | 10.36 ms | 15.92 ms | 3,362,784 |
| micronaut | 3,382 | 9.75 ms | 17.00 ms | 3,297,208 |
| java-webflux | 3,032 | 8.89 ms | 27.48 ms | 2,956,424 |
| java-vt-native | 2,447 | 19.06 ms | 36.82 ms | 2,385,880 |
| java-webflux-native | 2,413 | 14.43 ms | 44.17 ms | 2,353,056 |
| java-native | 2,316 | 16.20 ms | 42.44 ms | 2,258,592 |
| micronaut-native | 2,161 | 20.75 ms | 36.94 ms | 2,107,080 |
| bun | 876 | 48.46 ms | 98.50 ms | 853,736 |
| nodejs | 423 | 123.50 ms | 200.07 ms | 412,888 |
| python | 259 | 251.62 ms | 342.41 ms | 252,952 |
| Server | CPU avg (%) | RAM avg (MB) | Error Rate |
|---|---|---|---|
| rust | 117.9% | 10.9 MB | 0% |
| quarkus | 161.7% | 194.5 MB | 0% |
| java-vt | 184.9% | 349.7 MB | 0% |
| micronaut | 190.5% | 216.3 MB | 0% |
| go | 209.9% | 23.9 MB | 0% |
| java | 206.0% | 368.1 MB | 0% |
| quarkus-native | 204.6% | 36.1 MB | 0% |
| java-webflux | 207.2% | 484.6 MB | 0% |
| java-vt-native | 211.7% | 193.8 MB | 0% |
| java-webflux-native | 204.4% | 351.2 MB | 0% |
| java-native | 202.8% | 178.2 MB | 0% |
| micronaut-native | 233.2% | 63.0 MB | 0% |
| bun | 205.8% | 540.8 MB | 0% |
| nodejs | 202.2% | 389.2 MB | 0% |
| python | 206.6% | 258.6 MB | 0% |
4.2 Latency Analysis
Average latency measurements reveal four distinct performance tiers. The top tier (Rust, Quarkus, Go, Java) operates in the 4-7ms range with I/O-bound workloads. This contrasts sharply with the v1 sub-millisecond averages, which reflected synthetic CPU-bound tools. The v2 numbers represent realistic production latency with actual Redis and HTTP network round-trips.
4.3 Throughput Comparison
Throughput measurements reveal three clear clusters: Rust and Quarkus at 4,700-4,850 RPS, the Java/Go cluster at 3,000-3,620 RPS, and the JS/Python group at 250-880 RPS. The gap between the top and bottom clusters is approximately 19x (Rust vs Python). Within the Java ecosystem, the spread from micronaut-native (2,161 RPS) to quarkus (4,739 RPS) illustrates the significant impact of framework, concurrency model, and compilation strategy.
4.4 Resource Efficiency
| Server | RPS / CPU% | RPS / MB RAM | CPU Efficiency Rank |
|---|---|---|---|
| rust | 41.1 | 444.5 | 1st |
| quarkus | 29.3 | 24.4 | 2nd |
| java-vt | 18.8 | 10.0 | 3rd |
| micronaut | 17.7 | 15.6 | 4th |
| go | 17.2 | 151.3 | 5th |
| java | 17.2 | 9.6 | 5th |
| quarkus-native | 16.9 | 95.5 | 7th |
| java-webflux | 14.6 | 6.3 | 8th |
| java-webflux-native | 11.8 | 6.9 | 9th |
| java-vt-native | 11.6 | 12.6 | 10th |
| java-native | 11.4 | 13.0 | 11th |
| micronaut-native | 9.3 | 34.3 | 12th |
| bun | 4.3 | 1.6 | 13th |
| nodejs | 2.1 | 1.1 | 14th |
| python | 1.3 | 1.0 | 15th |
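Both efficiency columns are simple ratios of the throughput and resource measurements above; `efficiency` here is a hypothetical helper:

```python
def efficiency(rps, cpu_pct, ram_mb):
    # RPS per CPU percentage point and RPS per MB of RAM, rounded as in the table.
    return round(rps / cpu_pct, 1), round(rps / ram_mb, 1)

rust_eff = efficiency(4845, 117.9, 10.9)
go_eff = efficiency(3616, 209.9, 23.9)
```

Rust's margin comes from both sides of the ratio: highest numerator (RPS) and the lowest CPU and RAM denominators of any server.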
4.5 Tool-Specific Performance
Table 5 breaks down average latency (ms, consolidated across 3 runs) per tool, with each panel sorted independently to show which server leads for that specific operation.
search_products
HTTP GET + Redis ZRANGE (parallel)
| Server | Avg (ms) |
|---|---|
| quarkus | 4.37 |
| rust | 6.12 |
| java | 6.41 |
| go | 8.41 |
| java-vt | 9.00 |
| quarkus-native | 9.71 |
| java-webflux | 9.76 |
| micronaut | 9.82 |
| java-native | 13.97 |
| java-webflux-native | 15.12 |
| java-vt-native | 15.86 |
| micronaut-native | 20.87 |
| bun | 41.47 |
| nodejs | 119.79 |
| python | 244.50 |
get_user_cart
Redis HGETALL + HTTP + LRANGE (parallel)
| Server | Avg (ms) |
|---|---|
| quarkus | 4.35 |
| rust | 5.63 |
| java | 5.63 |
| go | 6.65 |
| java-vt | 8.24 |
| java-webflux | 8.56 |
| micronaut | 10.36 |
| quarkus-native | 12.45 |
| java-native | 12.74 |
| java-webflux-native | 14.63 |
| java-vt-native | 18.14 |
| micronaut-native | 22.03 |
| bun | 59.31 |
| nodejs | 141.88 |
| python | 260.39 |
checkout
HTTP POST + Redis pipeline (parallel)
| Server | Avg (ms) |
|---|---|
| quarkus | 3.38 |
| rust | 3.51 |
| go | 5.57 |
| java | 6.35 |
| java-webflux | 8.34 |
| quarkus-native | 8.93 |
| micronaut | 9.06 |
| java-vt | 9.83 |
| java-webflux-native | 13.54 |
| micronaut-native | 19.34 |
| java-native | 21.88 |
| java-vt-native | 23.18 |
| bun | 44.61 |
| nodejs | 108.84 |
| python | 249.97 |
- checkout is consistently the fastest tool for top performers (Quarkus 3.38ms, Rust 3.51ms). Redis pipeline combines 3 write operations into 1 RTT, eliminating the per-operation network overhead.
- search_products is the slowest tool for most servers. It requires a parallel HTTP GET + Redis ZRANGE, and the HTTP call to the API service dominates the latency.
- Java MVC's sequential I/O is visible in get_user_cart, where the server must wait for HGETALL to complete before firing the HTTP call, unlike reactive implementations that parallelize immediately.
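The checkout advantage comes from pipelining, which can be illustrated with a toy model (not redis-py, and not the benchmark's client code): queued commands share a single flush, so three writes cost one round trip.

```python
class PipelineModel:
    """Toy model: commands queue locally and flush in one network round trip."""
    def __init__(self):
        self.commands = []
        self.round_trips = 0

    def queue(self, *cmd):
        self.commands.append(cmd)
        return self

    def execute(self):
        self.round_trips += 1  # one RTT regardless of queued command count
        return [None] * len(self.commands)

pipe = PipelineModel()
pipe.queue("INCR", "orders:count")
pipe.queue("RPUSH", "orders:log", "order-123")
pipe.queue("ZADD", "orders:by_total", 59.90, "order-123")
replies = pipe.execute()
```

Without pipelining, the same three writes would each pay the Redis network round trip, which is exactly the per-operation overhead the checkout tool eliminates.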
4.6 Stability and Reproducibility
| Server | CV ▲ | Mean RPS | Std Dev | Stability |
|---|---|---|---|---|
| rust | 0.04% | 4,845 | 2 | Excellent |
| java-webflux-native | 0.10% | 2,413 | 2 | Excellent |
| go | 0.19% | 3,616 | 7 | Excellent |
| java-vt-native | 0.36% | 2,447 | 9 | Excellent |
| java-native | 0.44% | 2,316 | 10 | Excellent |
| quarkus | 0.50% | 4,739 | 24 | Excellent |
| bun | 0.52% | 876 | 5 | Excellent |
| micronaut-native | 0.57% | 2,161 | 12 | Excellent |
| java-webflux | 0.62% | 3,032 | 19 | Excellent |
| java | 0.64% | 3,540 | 23 | Excellent |
| micronaut | 0.70% | 3,382 | 24 | Excellent |
| java-vt | 0.92% | 3,482 | 32 | Excellent |
| quarkus-native | 1.13% | 3,449 | 39 | Excellent |
| python | 1.58% | 259 | 4 | Excellent |
| nodejs | 1.61% | 423 | 7 | Excellent |
All 15 servers achieved a CV (Coefficient of Variation: the ratio of standard deviation to mean, expressed as a percentage) below 2% across the 3 independent runs. A CV below 5% is generally considered excellent for load tests on shared infrastructure. The Redis flush-and-reseed methodology eliminates state drift between servers. The 60-second warmup exclusion eliminates JIT cold-start noise. The result is stable, reproducible rankings suitable for technology selection decisions. P95 latency showed equivalent stability: 13 of 15 servers had a stable P95 trend across runs. Only quarkus-native (+1.06ms) and bun (+2.86ms) showed slight increases. Python was the sole variable entry (range of approximately 30ms across runs), consistent with CPython GIL scheduling variability.
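The CV column is computed as sketched below; this assumes the population standard deviation, and the three sample values are hypothetical per-run RPS figures:

```python
def coefficient_of_variation(samples):
    # CV = (population standard deviation / mean) * 100.
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    return (variance ** 0.5) / mean * 100.0

cv = coefficient_of_variation([4843, 4845, 4847])  # well under the 2% threshold
```

Any server whose three runs land this close together produces a CV in the hundredths of a percent, which is why the rankings hold across runs.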
5. Discussion
5.1 Performance Tiers
- Rust: 4,845 RPS, 10.9 MB RAM, CV (Coefficient of Variation) 0.04%. Maximum throughput and minimum resource usage. The json_response fix (PR #683) was merged and shipped in rmcp v0.17.0.
- Quarkus: 4,739 RPS, 4.04ms avg latency, 8.13ms P95. Best latency of all 15 servers. Requires explicit connection pool tuning.
- Go: 3,616 RPS, 23.9 MB RAM. Third in throughput, second in memory efficiency, highly stable. Operational simplicity with no JVM dependency.
- Java MVC: 3,540 RPS, 6.13ms avg. Outperforms reactive WebFlux at 50 VUs due to lower scheduling overhead.
- Java-VT: 3,482 RPS, 9.03ms avg. Virtual Threads operate as designed in I/O-bound workloads.
- Quarkus-native: 3,449 RPS, 15.92ms P95 (4th best overall), 36 MB RAM. Best native image option.
- Micronaut: 3,382 RPS. Competitive across all 3 runs.
- Java-WebFlux: 3,032 RPS. Competitive throughput but high P99 tail (47ms) in checkout.
- Java-VT-native: 2,447 RPS, 19.06ms avg. Highest average latency among the Spring native images (VT continuation overhead in AOT).
- Java-WebFlux-native: 2,413 RPS, 44.17ms P95. High tail latency under sustained write load, compounded by Netty off-heap buffer pressure.
- Java-native: 2,316 RPS. Stable but high tail latency.
- Micronaut-native: 2,161 RPS, 233.2% CPU. Likely CPU-throttled by CFS.
- Bun: 876 RPS. Best JavaScript option, 2.2x over Node.js on identical code.
- Node.js: 423 RPS. Appropriate for low-traffic deployments.
- Python: 259 RPS. Ceiling set by FastMCP session overhead, not the ASGI server.
5.2 Trade-offs Analysis
| Dimension | Rust | Quarkus | Go | Java MVC | Java-VT | WebFlux | Bun | Node.js | Python |
|---|---|---|---|---|---|---|---|---|---|
| Peak Throughput | Highest | Very High | High | High | High | Moderate | Low | Very Low | Lowest |
| Latency (avg) | Very Low | Lowest | Low | Low | Low | Low | High | Very High | Highest |
| Latency Tail (P99) | Low | Lowest | Moderate | Low | Moderate | High | High | Very High | Highest |
| Memory Footprint | Lowest | Moderate | Very Low | High | High | Very High | High | High | Moderate |
| CPU Efficiency | Highest | High | Moderate | Moderate | Moderate | Moderate | Low | Very Low | Lowest |
| Ecosystem Maturity | Early | High | High | Highest | High | High | Moderate | High | High |
| SDK Overhead | Patched | Tunable | Standard | Standard | Standard | Standard | Fixed floor | Fixed floor | Fixed floor |
5.3 Consistency and Reliability
The CV below 2% for all 15 servers is exceptional for a benchmark running on a shared cloud VM. The Redis reset methodology eliminates state drift. The warmup exclusion eliminates JIT noise. Rankings are stable and can be used with confidence. The only notable anomaly is Python's P95 variability across runs (335-387ms), attributable to GC pressure variation in the CPython runtime rather than network or Redis inconsistency.
6. Recommendations
6.1 Production Deployment Guidance
Choose Rust when:
- Maximum throughput is the primary SLA (4,845 RPS, 41.1 RPS per CPU%)
- Memory footprint must be minimal (10.9 MB average)
- Resource cost efficiency matters at scale
- Team has Rust proficiency
- rmcp v0.17.0 or later is available on crates.io
Choose Quarkus when:
- Latency SLAs are strict (P95 below 10ms required, achieved 8.13ms)
- JVM ecosystem tooling and library access are needed
- Reactive non-blocking I/O is preferred
- Memory-constrained deployments favor Quarkus-native (36 MB)
- Team is Java-proficient and comfortable with reactive programming
Choose Go when:
- Cloud-native deployment on Kubernetes (23.9 MB RAM, static binary)
- Operational simplicity is preferred (no JVM, minimal configuration)
- Resource cost matters (151 RPS per MB RAM)
- Team uses Go
- No JVM dependency or startup time constraints exist
Choose Java Spring MVC when:
- Existing Spring ecosystem and team expertise in Java/Spring
- Moderate-to-high throughput requirements within the JVM ecosystem
- Reactive model overhead is not desired
- Blocking I/O matches the concurrency level
Consider instead: Java-VT for future-proofing at higher concurrency levels (above 100 VUs), where Virtual Threads show greater advantage.
Choose Bun (or Node.js) when:
- Team is JavaScript-native
- Low-to-moderate traffic scenarios where JavaScript development speed is the priority
- Rapid development cycle is valued
- Prefer Bun over Node.js when JavaScript is required (2.2x throughput advantage)
Not recommended for: Latency-sensitive or high-load production MCP deployments.
Choose Python when:
- Team is Python-native and Python ML/AI library integration is needed
- Low-traffic deployments where Python ecosystem integration outweighs performance
- Development, testing, or prototyping scenarios
- Integration with existing Python data science tooling outweighs performance requirements
Not recommended for: High-throughput production MCP deployments. The bottleneck is FastMCP session overhead in CPython, not the ASGI server or network I/O.
6.2 Use Case Decision Matrix
| Use Case | Recommended | Alternative | Avoid |
|---|---|---|---|
| High-load production deployments | Rust | Quarkus, Go | Python, Node.js |
| Latency SLA P95 < 10ms | Quarkus | Rust, Go | Python, Node.js |
| Kubernetes / cloud-native | Go | Rust, Quarkus-native | Java WebFlux |
| Memory-constrained (< 50 MB) | Rust | Go, Quarkus-native | Java JVM variants |
| Memory-constrained (< 200 MB) | Quarkus-native | Java-native, Micronaut-native | Java-WebFlux |
| Native image preferred | Quarkus-native | Java-native, Micronaut-native | Java-VT-native |
| Java ecosystem, moderate-to-high load | Java MVC | Java-VT, Micronaut | Python |
| Dev / Testing / low-traffic | Python | Node.js, Bun | (none) |
| JavaScript ecosystem required | Bun | Node.js | (none) |
| Java ecosystem, reactive preferred | Java-VT | Java WebFlux | Java-VT-native |
Figure 2: MCP Server Selection Guide — primary choice and alternatives by deployment scenario. See Table 10 for the full use case matrix.
7. Conclusion
This experimental analysis expanded the MCP server benchmark from 4 to 15 implementations, replacing synthetic CPU tools with real Redis and HTTP API workloads. The expansion revealed performance characteristics that were invisible in v1: the critical impact of connection pool configuration (Quarkus 0 RPS without tuning), the JVM vs native image throughput-memory trade-off under I/O load, the significance of runtime choice within the JavaScript ecosystem (Bun 2.2x Node.js), and the realistic production ceiling of optimized Python (259 RPS with 4 workers and uvloop).
The 39.9 million requests processed with 0% errors across all 15 servers validate the methodology's reproducibility. The CV below 2% for every server confirms that the rankings are stable. The data provides a reliable empirical basis for MCP server technology selection decisions.
Key Finding: In I/O-bound workloads representative of production MCP deployments, Rust and Quarkus lead the field at 4,845 and 4,739 RPS respectively, with Quarkus holding the best latency at 4.04ms average and 8.13ms P95. Go remains the optimal choice for teams prioritizing operational simplicity and resource efficiency. The study confirms that GraalVM native images reduce memory at the cost of throughput in sustained I/O workloads, with Quarkus-native as the best-positioned exception.
Summary of Findings:
- Performance tiers are clearly separated: Rust/Quarkus at 4,700-4,850 RPS, Go/Java cluster at 3,000-3,620 RPS, JS/Python at 250-880 RPS.
- Native images consistently reduce memory (27-81%) at a 20-36% throughput cost under sustained high load. At low request rates, this throughput regression is not observable. Quarkus-native offers the best trade-off at high load.
- Classic blocking I/O (Spring MVC) outperforms reactive (WebFlux) at 50 VUs in this I/O-bound workload.
- Bun delivers 2.2x the throughput of Node.js on identical code, making it the clear choice when the JavaScript ecosystem is required.
- All 15 servers achieved 0% errors and CV below 2% across 39.9 million requests, validating the methodology's reproducibility.
- Production choice (throughput): Rust at 4,845 RPS with 10.9 MB RAM
- Production choice (latency): Quarkus at 4.04ms avg, 8.13ms P95
- Resource and operational choice: Go at 23.9 MB RAM and 3,616 RPS
- Java ecosystem: Spring MVC (blocking) at 3,540 RPS for strong throughput with operational simplicity. Java-VT for future-proofing at higher concurrency levels.
- JavaScript ecosystem: Bun over Node.js (2.2x throughput advantage)
- Python: Appropriate for low-traffic deployments and Python-native teams. The ceiling is FastMCP session overhead in CPython, not the ASGI server.
Future Work: Higher concurrency levels (100-200 VUs) to identify saturation points. Persistent session benchmarks. Multi-instance Kubernetes deployments with session affinity. Rust with native compilation using rmcp v0.17.0.
8. References and Resources
- MCP Streamable HTTP Specification (2025). Model Context Protocol: Streamable HTTP Transport. https://modelcontextprotocol.io/specification/2025-06-18/basic/transports
- Mendes, T. (2026). rmcp SDK PR #683: json_response support for stateless HTTP transport. https://github.com/modelcontextprotocol/rust-sdk/pull/683
- Quarkus Team. (2025). Quarkus REST Client Reactive: Configuration Reference. https://quarkus.io/guides/rest-client-reactive
- OpenJDK. (2023). JEP 444: Virtual Threads. https://openjdk.org/jeps/444
- Oracle. (2025). GraalVM Native Image Documentation. https://www.graalvm.org/latest/reference-manual/native-image/
- FastMCP Contributors. (2025). FastMCP: Running a FastMCP Server. https://gofastmcp.com/deployment/running-server
- Grafana Labs. (2025). k6 Load Testing Documentation. https://k6.io/docs/
- deadpool-redis contributors. (2025). deadpool-redis crate documentation. https://docs.rs/deadpool-redis
- Bun Team. (2025). Bun JavaScript Runtime. https://bun.sh
9. Appendix
9.1 Raw Data and Complete Results
All raw benchmark data, including detailed results from all three runs, per-tool latency breakdowns, Docker stats logs, and k6 output files are available in the project repository:
The `benchmark/results/` directory contains timestamped result sets:
- `summary.json`: aggregated metrics across all servers
- `[server]/k6.json`: detailed k6 metrics for each server
- `[server]/stats.json`: Docker resource usage statistics
9.2 Server Implementations
Complete source code for all 15 MCP server implementations:
- `rust-server/`: rmcp 0.17.0 implementation with `json_response: true`
- `quarkus-server/`: Quarkus MCP server (JVM and native Dockerfiles)
- `go-server/`: mcp-go implementation
- `java-server/`: Spring Boot MVC implementation
- `java-vt-server/`: Spring Boot Virtual Threads implementation
- `java-webflux-server/`: Spring Boot WebFlux implementation
- `micronaut-server/`: Micronaut MCP server (JVM and native Dockerfiles)
- `nodejs-server/`: Node.js Express implementation
- `python-server/`: FastMCP + Starlette implementation
9.3 Benchmark Suite
- `benchmark/benchmark.js`: k6 load testing script
- `benchmark/run_benchmark.sh`: automated benchmark orchestration
- `benchmark/collect_stats.py`: Docker stats collection
- `benchmark/consolidate.py`: results aggregation