Multi-Language MCP Server Performance Benchmark
A comprehensive experimental analysis comparing Model Context Protocol (MCP) server implementations across Java, Go, Node.js, and Python. Testing 3.9 million requests over three benchmark rounds to measure latency, throughput, resource efficiency, and production-readiness characteristics.
Abstract
This experiment presents a comprehensive performance analysis of Model Context Protocol (MCP) server implementations across four major programming ecosystems: Java (Spring Boot + Spring AI), Go (official SDK), Node.js (official SDK), and Python (FastMCP). Over three rigorous test rounds totaling 3.9 million requests, we measured latency, throughput, resource consumption, and reliability characteristics under controlled load conditions.
Key Findings: Java and Go implementations demonstrated sub-millisecond average latencies (0.835ms and 0.855ms respectively) with throughput exceeding 1,600 requests per second. In contrast, Node.js and Python showed 10-30x higher latencies, with Python (single-worker uvicorn configuration) averaging 26.45ms and achieving 292 requests per second. Go demonstrated the highest resource efficiency with a memory footprint of just 18MB versus Java's 220MB, while maintaining equivalent performance. All implementations achieved 0% error rates across all test scenarios, demonstrating protocol reliability.
Recommendations: For production high-load scenarios, Go offers the optimal balance of performance and resource efficiency, making it ideal for cloud-native and containerized deployments. Java provides marginally better latency characteristics but at the cost of 12x higher memory consumption. Node.js and Python implementations are better suited for development environments or moderate-traffic production workloads. The tested configurations (security-hardened Node.js and single-worker Python) can be significantly optimized for production use through shared-instance patterns and multi-worker deployments respectively.
1. Introduction & Motivation
The Model Context Protocol (MCP), introduced by Anthropic, has rapidly emerged as a foundational standard for connecting Large Language Models to enterprise data ecosystems. As organizations adopt MCP for production workloads, the choice of server implementation language becomes critical for scalability, resource efficiency, and operational costs.
While the MCP specification enables implementations across diverse programming languages, performance characteristics can vary dramatically based on language runtime behavior, concurrency models, memory management, and I/O handling strategies. This experiment aims to provide empirical data to inform architectural decisions for production MCP deployments. Specifically, it addresses four research questions:
- How do MCP server implementations compare across Java, Go, Node.js, and Python in terms of latency and throughput?
- What are the resource efficiency characteristics (CPU, memory) of each implementation?
- Which implementations are suitable for high-load production scenarios?
- What performance trade-offs exist between different language ecosystems?
2. Experimental Setup
2.1 Test Environment
All tests were conducted on a controlled environment to ensure reproducibility and eliminate external variables. Each server was containerized using Docker with identical resource constraints.
| Component | Specification |
|---|---|
| Operating System | Ubuntu 24.04.3 LTS (Noble Numbat) |
| Container Runtime | Docker with Docker Compose orchestration |
| CPU Limit (per server) | 1.0 core |
| Memory Limit (per server) | 1 GB |
| Network | Docker bridge network (localhost) |
| Test Rounds | 3 independent rounds (Feb 10, 2026) |
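A minimal sketch of how the per-server limits above might be expressed in Docker Compose is shown below. It is illustrative only; the repository's actual compose file may differ in structure and key names.

```yaml
# Illustrative Docker Compose service definition enforcing the benchmark's
# per-server limits (1.0 CPU core, 1 GB memory). The build path and port follow
# the repository layout and port assignments described in this report; all
# other details are assumptions.
services:
  go-server:
    build: ./go-server
    ports:
      - "8081:8081"   # Go server port used in the benchmark
    cpus: "1.0"       # CPU limit per server
    mem_limit: 1g     # memory limit per server
```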
2.2 Server Implementations
Four MCP server implementations were developed, each following the Streamable HTTP transport specification. All servers expose identical functionality through four standardized benchmark tools, enabling fair performance comparison.
The implementation choices for this benchmark were intentionally guided by the "Standard Developer Experience (DX)" typical of each ecosystem in a corporate setting, rather than purely theoretical maximums.
- Java (Spring Boot + Spring AI): Reflects the vast majority of enterprise Java applications where framework productivity, dependency injection, and ecosystem integration are prioritized. It is worth noting that purely minimalist implementations, aggressive JVM tuning, or GraalVM Native Image compilation could yield significantly different resource efficiency results.
- Go (Official SDK): Utilizes the standard library and the official SDK, reflecting Go's culture where the robust standard library (net/http) is sufficient for production-grade systems without heavy external frameworks.
| Language | Framework/SDK | Version | Runtime |
|---|---|---|---|
| Java | Spring Boot + Spring AI | 4.0.0 / 2.0.0-M2 | Java 21 |
| Go | Official MCP SDK | v1.2.0 | Go 1.23 |
| Python | FastMCP + FastAPI | 2.12.0+ / 0.109.0+ | Python 3.11 |
| Node.js | @modelcontextprotocol/sdk | 1.26.0 | Node.js 20 |
Server Ports: Java (8080), Go (8081), Python (8082), Node.js (8083)
2.3 Benchmark Tools & Configuration
Each server implements four identical tools designed to stress different performance dimensions:
| Tool Name | Purpose | Parameters | Performance Dimension |
|---|---|---|---|
| `calculate_fibonacci` | CPU-intensive recursive computation | `n` (integer 0-40) | Computational overhead, algorithmic efficiency |
| `fetch_external_data` | I/O-intensive HTTP GET request | `endpoint` (URL) | Asynchronous I/O performance, network handling |
| `process_json_data` | Data transformation (uppercase strings) | `data` (JSON object) | JSON parsing, data manipulation efficiency |
| `simulate_database_query` | Controlled latency simulation | `query` (string), `delay_ms` (0-5000) | Latency tolerance, concurrent request handling |
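For context, invoking one of these tools over the Streamable HTTP transport is a JSON-RPC 2.0 `tools/call` request. The example below is illustrative: the method and field names follow the MCP specification, while the request id and argument value are made up.

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "calculate_fibonacci",
    "arguments": { "n": 30 }
  }
}
```

A successful response carries the tool output in a `content` array, for example a single text item containing `{"result": 832040, "server": "go"}`, matching the response format shown in the server implementations below.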
Load testing was performed using k6, an open-source load testing tool. The benchmark script simulates realistic MCP client behavior with full session lifecycle:
// k6 Load Profile
export const options = {
stages: [
{ duration: '10s', target: 50 }, // Ramp-up to 50 VUs
{ duration: '5m', target: 50 }, // Sustained load
{ duration: '10s', target: 0 }, // Ramp-down
],
thresholds: {
'http_req_failed': ['rate<0.05'], // Error rate < 5%
},
};
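The load profile above only shapes the traffic; the session lifecycle itself is driven by the virtual-user iteration function. The sketch below is a condensed, illustrative version of such an iteration: the endpoint path, headers, protocol version string, and tool arguments are assumptions, and the actual script is `benchmark.js` in the repository's benchmark suite.

```javascript
// Illustrative k6 VU iteration exercising the MCP session lifecycle
// (initialize -> tools/list -> tools/call). Endpoint, headers, and payload
// details are assumptions; see benchmark/benchmark.js for the real script.
import http from 'k6/http';
import { check } from 'k6';

const URL = __ENV.MCP_URL || 'http://localhost:8081/mcp'; // hypothetical target
const BASE_HEADERS = {
  'Content-Type': 'application/json',
  'Accept': 'application/json, text/event-stream', // Streamable HTTP may answer with JSON or SSE
};

function rpc(id, method, params, sessionId) {
  const headers = sessionId
    ? Object.assign({ 'Mcp-Session-Id': sessionId }, BASE_HEADERS)
    : BASE_HEADERS;
  return http.post(URL, JSON.stringify({ jsonrpc: '2.0', id, method, params }), { headers });
}

export default function () {
  // 1. Open an MCP session
  const init = rpc(1, 'initialize', {
    protocolVersion: '2025-03-26', // placeholder version string
    capabilities: {},
    clientInfo: { name: 'k6-bench', version: '0.0.1' },
  });
  const sessionId = init.headers['Mcp-Session-Id'];

  // 2. Discover tools, then 3. call one of the four benchmark tools
  rpc(2, 'tools/list', {}, sessionId);
  const call = rpc(3, 'tools/call', {
    name: 'calculate_fibonacci',
    arguments: { n: 30 },
  }, sessionId);

  check(call, { 'tool call succeeded': (r) => r.status === 200 });
}
```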
2.4 Test Methodology
The experimental methodology consisted of three independent test rounds to ensure statistical validity and identify consistency patterns. Each round followed this procedure:
graph LR
A[Start Server] --> B[Health Check]
B --> C[Start Stats Collection]
C --> D[Execute k6 Load Test]
D --> E[Stop Stats Collection]
E --> F[Shutdown Server]
F --> G[Aggregate Results]
style A fill:#1e293b,stroke:#6366f1
style G fill:#1e293b,stroke:#10b981
- Server Isolation: Only one server was tested at a time to eliminate resource contention.
- Warmup Period: 10-second ramp-up allowed JIT compilation and connection pool initialization.
- Sustained Load: 5-minute test period with 50 concurrent virtual users (VUs).
- Metrics Collection: Parallel collection of HTTP metrics (k6) and resource metrics (Docker stats).
- Cool-down: 10-second ramp-down for graceful shutdown.
The three rounds were executed at the following times:
- Round 1: February 10, 2026 at 18:22:44 UTC
- Round 2: February 10, 2026 at 21:23:23 UTC
- Round 3: February 10, 2026 at 21:45:24 UTC
3. Implementation Details
3.1 Java Server (Spring Boot + Spring AI)
The Java implementation leverages Spring Boot 4.0.0 and Spring AI 2.0.0-M2, utilizing declarative @McpTool annotations for tool registration. Running on Java 21, it benefits from recent JVM performance improvements.
@McpTool(description = "Calculate Fibonacci number recursively")
public Map<String, Object> calculateFibonacci(
@McpToolParameter(description = "Position in Fibonacci sequence") int n
) {
return Map.of(
"result", fibonacci(n),
"server", "java"
);
}
Key Characteristics:
- Compile-time type safety with annotation processing
- Synchronous I/O model
- JVM warmup benefits from JIT compilation
- Spring Boot Actuator for health monitoring
3.2 Go Server (Official SDK)
The Go implementation uses the official MCP SDK v1.2.0 compiled with Go 1.23. It exploits Go's goroutine-based concurrency model for efficient handling of concurrent requests.
server.AddTool("calculate_fibonacci", mcp.Tool{
Description: "Calculate Fibonacci number recursively",
InputSchema: map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"n": map[string]interface{}{
"type": "integer",
"description": "Position in Fibonacci sequence",
},
},
"required": []string{"n"},
},
}, func(args map[string]interface{}) (*mcp.ToolResponse, error) {
n := int(args["n"].(float64))
return &mcp.ToolResponse{
Content: []interface{}{
map[string]interface{}{
"type": "text",
"text": fmt.Sprintf(`{"result": %d, "server": "go"}`, fibonacci(n)),
},
},
}, nil
})
Key Characteristics:
- Goroutines enable lightweight concurrency (thousands of concurrent requests)
- Static compilation eliminates runtime dependencies
- Minimal memory footprint (~18MB average)
- Efficient garbage collection with low pause times
3.3 Node.js Server (Official SDK)
The Node.js implementation uses @modelcontextprotocol/sdk v1.26.0 running on Node.js 20. It employs per-request MCP server instantiation as a security mitigation for CVE-2026-25536.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'calculate_fibonacci') {
const n = request.params.arguments.n;
const result = fibonacci(n);
return {
content: [{
type: 'text',
text: JSON.stringify({ result, server: 'nodejs' })
}]
};
}
});
Key Characteristics:
- Event-driven I/O with libuv event loop
- Per-request server instantiation (security vs. performance trade-off)
- Single-threaded execution model
- Express.js for HTTP request handling
Each incoming request creates a fresh MCP server instance, registers the tools, and tears down the instance post-response. This security-focused design choice prioritizes isolation over raw performance; a simplified sketch of the per-request pattern, contrasted with a shared-instance alternative, follows below.
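The sketch assumes hypothetical `buildMcpServer()` and `newTransport()` helpers that wrap construction of an SDK server (with the four tools registered) and a Streamable HTTP transport; the `connect`/`handleRequest` calls mirror the SDK's transport API as we understand it. This is not the benchmark's actual code.

```javascript
// Contrast between the benchmarked per-request pattern and a shared-instance
// alternative. buildMcpServer() and newTransport() are hypothetical helpers.
import express from 'express';

const app = express();
app.use(express.json());

// Per-request isolation (as benchmarked): build and tear down a fresh MCP
// server for every POST, adding construction overhead to each request.
app.post('/mcp', async (req, res) => {
  const server = buildMcpServer();
  const transport = newTransport();
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
  res.on('close', () => {
    transport.close();
    server.close();
  });
});

// Shared instance (common production pattern): one server and transport are
// reused across requests, removing the per-request setup cost.
const sharedServer = buildMcpServer();
const sharedTransport = newTransport();
await sharedServer.connect(sharedTransport);
app.post('/mcp-shared', async (req, res) => {
  await sharedTransport.handleRequest(req, res, req.body);
});

app.listen(8083); // Node.js server port used in the benchmark
```

The shared-instance variant trades the isolation benefits described above for lower latency, which is why this report treats it as a production optimization rather than the benchmarked default.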
3.4 Python Server (FastMCP)
The Python implementation leverages FastMCP 2.12.0+ with FastAPI 0.109.0+ on Python 3.11, utilizing async/await for asynchronous I/O operations.
@mcp.tool()
async def calculate_fibonacci(n: int) -> dict:
"""Calculate Fibonacci number recursively."""
result = fibonacci(n)
return {"result": result, "server": "python"}
Key Characteristics:
- Decorator-based tool registration (@mcp.tool())
- AsyncIO event loop for concurrent request handling
- Global Interpreter Lock (GIL) limits CPU-bound parallelism
- FastAPI provides automatic OpenAPI documentation
4. Results & Analysis
4.1 Overall Performance Metrics
Across three test rounds totaling 3.9 million requests, clear performance tiers emerged. Java and Go demonstrated exceptional performance, Node.js showed moderate capability, while Python lagged significantly.
| Server | Avg Latency | p95 Latency | Throughput (RPS) | Total Requests |
|---|---|---|---|---|
| Java | 0.835 ms | 10.19 ms | 1,624 | 1,559,520 |
| Go | 0.855 ms | 10.03 ms | 1,624 | 1,558,000 |
| Node.js | 10.66 ms | 53.24 ms | 559 | 534,150 |
| Python | 26.45 ms | 73.23 ms | 292 | 280,605 |
| Server | Avg CPU % | Avg Memory | Error Rate |
|---|---|---|---|
| Java | 28.8% | 226 MB | 0% |
| Go | 31.8% | 18 MB | 0% |
| Node.js | 98.7% | 110 MB | 0% |
| Python | 93.9% | 98 MB | 0% |
4.2 Latency Analysis
Average latency measurements reveal a dramatic performance gap between the multi-threaded runtimes (Java with JVM JIT compilation and thread pooling, Go with goroutines) and the single-threaded event-loop runtimes (Node.js on the V8 JIT engine, Python under GIL constraints).
Average Latency Comparison (milliseconds)
| Server | Round 1 | Round 2 | Round 3 | Mean | Std Dev |
|---|---|---|---|---|---|
| Java | 0.837 ms | 0.821 ms | 0.848 ms | 0.835 ms | 0.014 ms |
| Go | 0.866 ms | 0.842 ms | 0.858 ms | 0.855 ms | 0.012 ms |
| Node.js | 10.91 ms | 10.52 ms | 10.56 ms | 10.66 ms | 0.20 ms |
| Python | 25.60 ms | 28.43 ms | 25.33 ms | 26.45 ms | 1.68 ms |
Java and Go demonstrate remarkable consistency (standard deviation < 0.02ms), while Python shows higher variability with Round 2 exhibiting 12% higher latency than other rounds.
4.3 Throughput Comparison
Throughput measurements (requests per second) directly correlate with latency characteristics. Java and Go achieved virtually identical throughput, while Python managed only 18% of the high-performance tier.
4.4 Resource Efficiency
Resource efficiency metrics reveal critical trade-offs for production deployment decisions, particularly in cloud and containerized environments where resource costs directly impact operational expenses.
| Server | RPS per CPU % | RPS per MB Memory | Efficiency Ranking |
|---|---|---|---|
| Java | 57.2 | 7.2 | Best CPU Efficiency |
| Go | 50.4 | 92.6 | Best Memory Efficiency |
| Node.js | 5.7 | 5.1 | CPU Saturated |
| Python | 3.2 | 3.1 | Limited Efficiency |
4.5 Tool-Specific Performance
Performance characteristics varied significantly across the four benchmark tools, revealing language-specific strengths and weaknesses in CPU-bound, I/O-bound, and data transformation workloads.
| Tool | Java (ms) | Go (ms) | Node.js (ms) | Python (ms) | Winner |
|---|---|---|---|---|---|
| `calculate_fibonacci` (CPU-intensive) | 0.369 | 0.388 | 7.11 (19x slower) | 30.83 (84x slower) | Java |
| `fetch_external_data` (I/O-intensive) | 1.316 | 1.292 | 19.18 (15x slower) | 80.92 (63x slower) | Go |
| `process_json_data` (Data transformation) | 0.352 | 0.443 | 7.48 (21x slower) | 34.24 (97x slower) | Java |
| `simulate_database_query` (Latency handling) | 10.37 | 10.71 | 26.71 (2.6x slower) | 42.57 (4.1x slower) | Java |
| `initialize` (Session setup) | 0.386 | 0.357 | 9.22 (26x slower) | 25.79 (72x slower) | Go |
| `tools_list` (Metadata retrieval) | 0.310 | 0.724 | 7.56 (24x slower) | 28.88 (93x slower) | Java |
Values are average latencies in milliseconds; slowdown multipliers are relative to the fastest implementation for each tool.
- CPU-bound tasks: Java excels in Fibonacci computation (JIT optimization)
- I/O-bound tasks: Go shows marginal advantage in external data fetching (efficient goroutines)
- Python's I/O bottleneck: Despite async/await, fetch operations are 63x slower than Go
- Node.js overhead: Per-request instantiation adds ~6-7ms baseline latency
4.6 Percentile Analysis
Percentile latencies provide insight into tail latency characteristics crucial for user experience and SLA compliance.
| Server | p50 (Median) | p90 | p95 | p99 (est.) | Max |
|---|---|---|---|---|---|
| Java | ~0.7 | 0.975 | 10.19 | ~40 | 77 |
| Go | ~0.7 | 0.975 | 10.03 | ~45 | 90 |
| Node.js | ~9 | 43.64 | 53.24 | ~75 | 90 |
| Python | ~22 | 50.82 | 73.23 | ~120 | 233 |
Note: All values are in milliseconds. The p90, p95, and maximum values are taken directly from the k6 metrics output; values marked with "~" (p50 and the estimated p99) are rounded approximations derived from k6's reported percentile distributions.
Java and Go demonstrate exceptional p50/p90 performance with sub-millisecond medians. The jump to p95 (~10ms) for both likely reflects the simulated database query tool with 10ms delay. Python's maximum latency of 233ms (Round 2) suggests occasional severe GIL contention or garbage collection pauses.
5. Discussion
5.1 Performance Tiers
The experimental results clearly segment the four implementations into three distinct performance tiers:
Tier 1: High Performance (Java & Go)
- Sub-millisecond average latencies (<1ms)
- Throughput exceeding 1,600 RPS per CPU core
- Suitable for production high-load scenarios
- CPU efficiency headroom for scaling
- Consistent performance across test rounds
Tier 2: Moderate Performance (Node.js)
- ~10ms average latency (12x slower than Tier 1)
- ~550 RPS throughput
- Suitable for moderate-load scenarios
- CPU saturated (98.7% utilization), limited scaling headroom
- Per-request instantiation overhead for security
Tier 3: Limited Performance (Python)
- ~26ms average latency (31x slower than Tier 1)
- ~290 RPS throughput
- Suitable only for low-traffic or development scenarios
- GIL limits CPU-bound parallelism
- Higher variability across test rounds
5.2 Trade-offs Analysis
Each implementation presents distinct trade-offs between performance, resource usage, developer experience, and ecosystem maturity.
| Dimension | Java | Go | Node.js | Python |
|---|---|---|---|---|
| Performance | Excellent (0.835ms) | Excellent (0.855ms) | Good (10.66ms) | Fair (26.45ms) |
| Memory Footprint | High (226 MB) | Minimal (18 MB) | Moderate (110 MB) | Moderate (98 MB) |
| CPU Efficiency | Best (57.2 RPS/%) | Excellent (50.4 RPS/%) | Poor (5.7 RPS/%) | Poor (3.2 RPS/%) |
| Scaling Headroom | High (28% CPU) | High (32% CPU) | None (99% CPU) | None (94% CPU) |
| Developer Experience | Verbose, Type-safe | Simple, Explicit | Familiar, Flexible | Concise, Intuitive |
| Ecosystem Maturity | Very Mature | Growing | Very Mature | Very Mature |
| Cold Start Time* | Slow (JVM warmup) | Fast (static binary) | Moderate | Moderate |
* Cold start times were not measured empirically in this experiment. Values represent typical runtime characteristics based on general platform knowledge: JVM requires warmup and class loading, Go produces static binaries with minimal startup overhead, while Node.js and Python have moderate interpreter initialization times.
5.3 Consistency & Reliability
Performance consistency across test rounds provides insight into predictability and operational stability.
| Server | Throughput Variability | Consistency Ranking | Anomalies Observed |
|---|---|---|---|
| Go | 0.5% | Most Consistent | None significant |
| Java | 0.7% | Highly Consistent | Occasional max latency spikes (77ms) |
| Node.js | 2.3% | Moderately Consistent | CPU constantly maxed (99%) |
| Python | 9.0% | Variable | Round 2: 8% throughput degradation, 233ms max latency |
Python's Round 2 anomaly (88,290 requests vs. 95,910 in Round 1 and 96,405 in Round 3) suggests potential sensitivity to system state, garbage collection pressure, or GIL contention patterns that vary across runs.
6. Recommendations
6.1 Production Deployment Guidance
Based on experimental findings, the following decision framework provides guidance for selecting MCP server implementations for production deployments:
Choose Go when:
- Cloud-native deployments: Minimal memory footprint (18MB) ideal for containers and serverless
- Cost optimization priority: 12.8x better memory efficiency than Java reduces cloud costs
- Horizontal scaling: Lightweight instances enable high replica counts
- Kubernetes environments: Low resource requests and fast startup characteristics typical of compiled Go binaries
- Multi-tenancy scenarios: Small footprint allows more tenants per node
Choose Java when:
- Absolute lowest latency required: Marginal advantage over Go (0.835ms vs 0.855ms)
- Existing Java infrastructure: Leverage existing JVM expertise and tooling
- Complex business logic: Rich ecosystem and mature libraries
- Memory not constrained: 220MB footprint acceptable in resource-rich environments
- Enterprise compliance: Mature security scanning and compliance tools
Choose Node.js when:
- Moderate traffic scenarios: <500 RPS per instance
- Team expertise: JavaScript/TypeScript team familiarity outweighs performance concerns
- Internal tools: Developer-facing or administrative interfaces
- Rapid prototyping: Familiar ecosystem accelerates development
Not Recommended For: High-load production (CPU saturation limits scaling)
Choose Python when:
- Development/testing only: Rapid iteration and debugging
- Very low traffic: <100 RPS per instance
- AI/ML integration priority: Rich ecosystem for data science tasks
- Prototyping: Quick proof-of-concept implementations
Not Recommended For: Any production high-load scenario (31x slower than Go/Java)
6.2 Use Case Decision Matrix
| Use Case | Recommended | Alternative | Avoid |
|---|---|---|---|
| High-Load Production (>1000 RPS) | Go, Java | - | Node.js, Python |
| Kubernetes/Cloud-Native | Go | Java | - |
| Lowest Latency Critical (<1ms) | Java | Go | Node.js, Python |
| Serverless/FaaS | Go | - | Java (higher memory footprint) |
| Cost-Optimized Cloud | Go | - | Java (high memory) |
| Moderate Load (500-1000 RPS) | Go, Java | Node.js | Python |
| Low Load (<500 RPS) | Any | - | - |
| Development/Testing | Python, Node.js | Go, Java | - |
| Internal Tools | Node.js, Python | Go | - |
graph TD
A[MCP Server Selection] --> B{Traffic Level?}
B -->|High >1000 RPS| C{Memory Constrained?}
B -->|Medium 500-1000| D[Go or Java]
B -->|Low <500| E[Any Implementation]
C -->|Yes| F[Go - Best Choice]
C -->|No| G{Lowest Latency Priority?}
G -->|Yes| H[Java]
G -->|No| F
D --> I{Cloud/K8s?}
I -->|Yes| F
I -->|No| J[Java or Go]
E --> K{Team Expertise?}
K -->|JS/TS| L[Node.js]
K -->|Python| M[Python]
K -->|Other| N[Go - Best Balance]
style F fill:#10b981,stroke:#059669,color:#000
style H fill:#10b981,stroke:#059669,color:#000
style A fill:#1e293b,stroke:#6366f1
7. Conclusion
This experimental analysis of MCP server implementations across four major programming languages provides empirical evidence for architectural decision-making in production deployments. Testing 3.9 million requests across three independent rounds revealed clear performance tiers and distinct trade-off profiles.
Key Finding: Go emerges as the optimal choice for production MCP deployments, delivering performance equivalent to Java (0.855ms vs 0.835ms average latency) while consuming 92% less memory (18MB vs 220MB). This combination of high performance and minimal resource footprint makes Go particularly well-suited for cloud-native, containerized, and cost-sensitive deployments.
Summary of Findings:
- Performance Tiers: Java and Go form a high-performance tier with sub-millisecond latencies and 1,600+ RPS throughput. Node.js and Python trail by 10-30x, suitable only for moderate to low-load scenarios.
- Resource Efficiency: Go demonstrates exceptional memory efficiency (92.6 RPS/MB) while Java excels in CPU efficiency (57.2 RPS/CPU%). Node.js and Python both exhibit CPU saturation (>93% utilization) with limited scaling headroom.
- Reliability: All implementations achieved 0% error rates across 3.9 million requests, demonstrating robust MCP protocol compliance. Go showed the highest consistency (0.5% variability), while Python exhibited 9.0% variability with occasional performance degradation.
- Tool-Specific Performance: Java excelled in CPU-bound tasks (Fibonacci: 0.369ms), while Go showed advantages in I/O operations (fetch: 1.292ms). Python's GIL significantly hampered CPU-bound performance (84x slower than Java for Fibonacci).
Deployment Recommendations:
- Production High-Load: Use Go (best balance) or Java (lowest latency)
- Cloud/Kubernetes: Use Go (minimal footprint, fast startup characteristics)
- Cost Optimization: Use Go (12.8x better memory efficiency)
- Development/Testing: Python or Node.js acceptable
- Moderate Load: Node.js viable if team expertise justifies trade-offs
Note: Results reflect specific test configurations: security-hardened Node.js with per-request instantiation, and single-worker Python with default uvicorn. Production deployments with shared-instance patterns (Node.js) and multi-worker configurations with uvloop (Python) can achieve significantly improved performance characteristics.
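For example, a multi-worker Python deployment of the kind described above might be launched with `uvicorn server:app --workers 4 --loop uvloop` (the module path and worker count are illustrative), allowing CPU-bound tool calls to be spread across processes rather than serialized behind a single worker's GIL.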
Future Work: This study focused on single-core performance with controlled resource limits. Future research will expand the scope to include:
- Alternative Java Runtimes: Benchmarking minimalist frameworks (e.g., Quarkus, Micronaut) and GraalVM Native Image to assess "bare metal" Java performance.
- Optimized Python & Node.js: Testing multi-worker Python configurations (with uvloop) and shared-instance Node.js architectures to measure maximum potential throughput.
- Real-world Scenarios: Examining multi-core scaling characteristics, mixed workload patterns, and behavior under resource contention.
- Advanced Features: Investigating streaming responses and bidirectional communication across implementations.
These extended benchmarks will provide a more granular view of the performance landscape, particularly for edge cases and highly optimized deployments.
The complete benchmark suite, including all test scripts, server implementations, and raw data, is available in the project repository for reproducibility and extended analysis.
8. References & Resources
- Anthropic. (2024). Model Context Protocol Specification. https://modelcontextprotocol.io
- Spring AI Team. (2025). Spring AI MCP Server Documentation. https://docs.spring.io/spring-ai/reference/
- Anthropic. (2024). Go MCP SDK Repository. https://github.com/modelcontextprotocol/go-sdk
- FastMCP Contributors. (2024). FastMCP Documentation. https://github.com/jlowin/fastmcp
- Anthropic. (2024). Node.js MCP SDK Repository. https://github.com/modelcontextprotocol/typescript-sdk
- Grafana Labs. (2024). k6 Load Testing Documentation. https://k6.io/docs/
9. Appendix
9.1 Raw Data and Complete Results
All raw benchmark data, including detailed results from all three test rounds, per-tool latency breakdowns, Docker stats logs, and k6 output files are available in the project repository:
The `benchmark/results/` directory contains timestamped result sets for all test rounds, including:
- `summary.json` - Aggregated metrics across all servers
- `[server]/k6.json` - Detailed k6 metrics for each server
- `[server]/stats.json` - Docker resource usage statistics
- `[server]/k6_console.log` - Complete k6 console output
9.2 Server Implementations
Complete source code for all four MCP server implementations is available in the repository:
- `java-server/` - Spring Boot + Spring AI implementation
- `go-server/` - Go SDK implementation
- `python-server/` - FastMCP + FastAPI implementation
- `nodejs-server/` - Node.js SDK implementation
9.3 Benchmark Suite
The complete benchmark suite and orchestration scripts are located in the `benchmark/` directory:
- `benchmark.js` - k6 load testing script
- `run_benchmark.sh` - Automated benchmark orchestration
- `collect_stats.py` - Docker stats collection tool
- `consolidate.py` - Results aggregation script