Boost.Corosio Performance Benchmarks
Executive Summary
This report presents comprehensive performance benchmarks comparing Boost.Corosio against Boost.Asio on Windows using the IOCP (I/O Completion Ports) backend. The benchmarks cover handler dispatch, socket throughput, socket latency, and HTTP server workloads.
Bottom Line
Corosio demonstrates exceptional single-threaded handler dispatch performance (2× faster than Asio) and superior interleaved post/run throughput (70% faster). However, Asio shows better multi-threaded scaling in both handler dispatch and HTTP server workloads. Socket I/O throughput is essentially identical between the two implementations.
Where Corosio Excels
- Single-threaded handler post: 2× faster than Asio (1.59 Mops/s vs 802 Kops/s)
- Interleaved post/run: 70% faster (2.90 Mops/s vs 1.71 Mops/s)
- Concurrent post and run: 14% faster (1.68 Mops/s vs 1.48 Mops/s)
- Unidirectional socket throughput: Essentially identical (within 2% at every buffer size), with a slight Corosio edge at 1 KB, 4 KB, and 64 KB buffers
Where Corosio Needs Improvement
- Multi-threaded handler scaling: Throughput regresses from 4→8 threads (2.58→2.09 Mops/s)
- Multi-threaded HTTP: Asio is 56% faster at 8 threads (337.68 vs 215.94 Kops/s)
- Tail latency: p99 latency ~50% higher than Asio (21 μs vs 14 μs)
- Concurrent socket pairs: Asio keeps a small mean and p99 latency advantage at 1, 4, and 16 pairs
Key Insights
| Component | Assessment |
|---|---|
| Handler Dispatch | Corosio is significantly faster single-threaded, but Asio scales better with threads |
| Socket I/O | Essentially identical throughput; Asio has ~0.5 μs lower mean round-trip latency |
| HTTP Server | Essentially tied at 1 and 2 threads; Asio is faster at 4 and 8 threads, and the gap widens with more threads |
| Scaling Behavior | Corosio's handler throughput drops at 8 threads, which suggests thread contention |
Detailed Results
Handler Dispatch Summary
| Scenario | Corosio | Asio | Winner |
|---|---|---|---|
| Single-threaded post | 1.59 Mops/s | 802 Kops/s | Corosio (+98%) |
| Multi-threaded (8 threads) | 2.09 Mops/s | 3.02 Mops/s | Asio (+44%) |
| Interleaved post/run | 2.90 Mops/s | 1.71 Mops/s | Corosio (+70%) |
| Concurrent post/run | 1.68 Mops/s | 1.48 Mops/s | Corosio (+14%) |
Socket Throughput Summary
| Scenario | Corosio | Asio | Winner |
|---|---|---|---|
| Unidirectional 1 KB buffer | 215 MB/s | 213 MB/s | Tie |
| Unidirectional 64 KB buffer | 6.43 GB/s | 6.40 GB/s | Tie |
| Bidirectional 64 KB buffer | 6.15 GB/s | 6.50 GB/s | Asio (+6%) |
Test Environment
| Parameter | Value |
|---|---|
| Platform | Windows (IOCP backend) |
| Benchmarks | Handler dispatch, socket throughput, socket latency, HTTP server |
| Measurement | Client-side latency and throughput |
Handler Dispatch Benchmarks
These benchmarks measure raw handler posting and execution throughput, isolating the scheduler from I/O completion overhead.
Single-Threaded Handler Post
Posting 5,000,000 handlers from a single thread.
| Metric | Corosio | Asio | Difference |
|---|---|---|---|
| Handlers | 5,000,000 | 5,000,000 | - |
| Elapsed | 3.143 s | 6.233 s | -50% |
| Throughput | 1.59 Mops/s | 802 Kops/s | +98% |
Key finding: Corosio’s single-threaded handler dispatch is nearly 2× faster than Asio.
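The throughput and difference figures in the table follow from the handler count and the elapsed times. The sketch below reproduces that arithmetic in Python; the helper names `throughput` and `pct_diff` are illustrative and are not part of either library or of the benchmark harness.

```python
def throughput(ops: int, elapsed_s: float) -> float:
    """Operations per second for a run of `ops` operations."""
    return ops / elapsed_s


def pct_diff(value: float, baseline: float) -> float:
    """Percentage by which `value` exceeds (or falls below) `baseline`."""
    return (value / baseline - 1.0) * 100.0


HANDLERS = 5_000_000
corosio = throughput(HANDLERS, 3.143)  # elapsed times taken from the table
asio = throughput(HANDLERS, 6.233)

print(f"Corosio: {corosio / 1e6:.2f} Mops/s")
print(f"Asio:    {asio / 1e3:.0f} Kops/s")
print(f"Elapsed difference:    {pct_diff(3.143, 6.233):+.0f}%")
print(f"Throughput difference: {pct_diff(corosio, asio):+.0f}%")
```

Halving the elapsed time roughly doubles the throughput, which is why the same measurement appears as -50% in the Elapsed row and +98% in the Throughput row.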
Multi-Threaded Scaling
Multiple threads running handlers concurrently (5,000,000 handlers total).
| Threads | Corosio | Asio | Corosio Speedup | Asio Speedup |
|---|---|---|---|---|
| 1 | 2.46 Mops/s | 1.51 Mops/s | (baseline) | (baseline) |
| 2 | 2.24 Mops/s | 2.16 Mops/s | 0.91× | 1.43× |
| 4 | 2.58 Mops/s | 2.97 Mops/s | 1.05× | 1.96× |
| 8 | 2.09 Mops/s | 3.02 Mops/s | 0.85× | 1.99× |
Scaling Analysis
Throughput vs Thread Count (Mops/s):
| Threads | Corosio | Asio | Winner |
|---|---|---|---|
| 1 | 2.46 | 1.51 | Corosio +63% |
| 2 | 2.24 | 2.16 | Corosio +4% |
| 4 | 2.58 | 2.97 | Asio +15% |
| 8 | 2.09 (regression) | 3.02 | Asio +44% |
Notable observations:
- Corosio is faster at 1 and 2 threads
- Crossover occurs between 2 and 4 threads
- Corosio regresses from 4→8 threads (2.58 → 2.09 Mops/s)
- Asio continues scaling through 8 threads
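The speedup columns and the crossover point can be recomputed from the throughput table. This is a sketch with illustrative names (`speedups`, `crossover`). Because it starts from the rounded Mops/s values, its last digit can differ slightly from the harness output (for example, Asio at 8 threads comes out as 2.00× rather than the reported 1.99×).

```python
# Throughput in Mops/s by thread count, copied from the scaling table.
corosio = {1: 2.46, 2: 2.24, 4: 2.58, 8: 2.09}
asio = {1: 1.51, 2: 2.16, 4: 2.97, 8: 3.02}


def speedups(results):
    """Speedup of each thread count relative to the 1-thread baseline."""
    base = results[1]
    return {threads: value / base for threads, value in results.items()}


# First measured thread count at which Asio overtakes Corosio.
crossover = next(t for t in sorted(corosio) if asio[t] > corosio[t])

for t in sorted(corosio):
    print(f"{t} thread(s): Corosio {speedups(corosio)[t]:.2f}x, "
          f"Asio {speedups(asio)[t]:.2f}x")
print(f"Asio is ahead from {crossover} threads onward")
```

Only 1, 2, 4, and 8 threads were measured, so the sketch can only say that the crossover lies somewhere after 2 threads and no later than 4.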
Interleaved Post/Run
Alternating between posting batches and running them (50,000 iterations × 100 handlers).
| Metric | Corosio | Asio | Difference |
|---|---|---|---|
| Total handlers | 5,000,000 | 5,000,000 | - |
| Elapsed | 1.723 s | 2.930 s | -41% |
| Throughput | 2.90 Mops/s | 1.71 Mops/s | +70% |
Key finding: Corosio is about 70% faster than Asio when posting and running are interleaved, a pattern that is common in real applications.
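To make the access pattern concrete, the sketch below shows the shape of an interleaved post/run loop against a toy FIFO queue. `ToyScheduler` is a hypothetical stand-in written for this illustration. It is not Corosio's or Asio's scheduler, it is single-threaded, and its timings say nothing about either library.

```python
import time
from collections import deque


class ToyScheduler:
    """Minimal FIFO handler queue (hypothetical stand-in, single-threaded)."""

    def __init__(self):
        self._queue = deque()

    def post(self, handler):
        """Queue a handler for later execution."""
        self._queue.append(handler)

    def run(self):
        """Run queued handlers until the queue is empty; return how many ran."""
        executed = 0
        while self._queue:
            self._queue.popleft()()
            executed += 1
        return executed


def interleaved_post_run(iterations, batch):
    """Alternate between posting a batch of handlers and draining the queue."""
    sched = ToyScheduler()
    counter = [0]

    def handler():
        counter[0] += 1

    start = time.perf_counter()
    for _ in range(iterations):
        for _ in range(batch):
            sched.post(handler)
        sched.run()
    return counter[0], time.perf_counter() - start


# The report uses 50,000 iterations x 100 handlers; fewer iterations here.
total, elapsed = interleaved_post_run(iterations=500, batch=100)
print(f"{total} handlers in {elapsed:.4f} s")
```

The queue is fully drained between batches, so it never holds more than one batch at a time.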
Socket Throughput Benchmarks
Unidirectional Throughput
Single direction transfer of 4096 MB with varying buffer sizes.
| Buffer Size | Corosio | Asio | Difference |
|---|---|---|---|
| 1024 bytes | 215.20 MB/s | 213.17 MB/s | +1% |
| 4096 bytes | 757.98 MB/s | 743.34 MB/s | +2% |
| 16384 bytes | 2.56 GB/s | 2.58 GB/s | -1% |
| 65536 bytes | 6.43 GB/s | 6.40 GB/s | +0.5% |
Observation: Throughput is essentially identical. The two implementations are within 2% of each other at every buffer size, and both reach about 6.4 GB/s with 64 KB buffers.
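One way to read the unidirectional table is to convert each throughput figure into a rate of buffer-sized transfers. The sketch below does this for the Corosio column. It assumes binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes) and that every operation moves exactly one full buffer. The report states neither, so the results are estimates.

```python
MIB = 1024 ** 2  # assumption: the report's MB is 2**20 bytes
GIB = 1024 ** 3  # assumption: the report's GB is 2**30 bytes

# (buffer size in bytes, Corosio throughput in bytes/s) from the table.
corosio = [
    (1024, 215.20 * MIB),
    (4096, 757.98 * MIB),
    (16384, 2.56 * GIB),
    (65536, 6.43 * GIB),
]


def per_buffer_time_us(buffer_bytes, bytes_per_s):
    """Average wall-clock time per buffer-sized transfer, in microseconds."""
    buffers_per_s = bytes_per_s / buffer_bytes
    return 1e6 / buffers_per_s


for size, rate in corosio:
    print(f"{size:>6} B: {rate / size / 1e3:7.1f} K buffers/s, "
          f"{per_buffer_time_us(size, rate):5.2f} us per buffer")
```

Under these assumptions the time per buffer only roughly doubles, from about 4.5 μs to about 9.5 μs, while the buffer grows 64-fold. This is consistent with per-operation overhead dominating at small buffer sizes.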
Bidirectional Throughput
Simultaneous transfer of 2048 MB in each direction (4096 MB total).
| Buffer Size | Corosio | Asio | Difference |
|---|---|---|---|
| 1024 bytes | 214.55 MB/s | 212.18 MB/s | +1% |
| 4096 bytes | 707.35 MB/s | 755.43 MB/s | -6% |
| 16384 bytes | 2.48 GB/s | 2.59 GB/s | -4% |
| 65536 bytes | 6.15 GB/s | 6.50 GB/s | -5% |
Observation: Asio leads bidirectional throughput by 4-6% at buffer sizes of 4 KB and above; at 1 KB the two are within 1% of each other.
Socket Latency Benchmarks
Ping-Pong Round-Trip Latency
Single socket pair exchanging messages (1,000,000 iterations each).
| Message Size | Corosio Mean | Asio Mean | Mean Difference | Corosio p99 | Asio p99 |
|---|---|---|---|---|---|
| 1 byte | 10.04 μs | 9.66 μs | +4% | 21.10 μs | 14.20 μs |
| 64 bytes | 10.10 μs | 9.61 μs | +5% | 21.20 μs | 13.30 μs |
| 1024 bytes | 10.03 μs | 9.66 μs | +4% | 21.10 μs | 12.30 μs |
Latency Distribution (64-byte messages)
| Percentile | Corosio | Asio | Difference |
|---|---|---|---|
| p50 | 9.60 μs | 9.20 μs | +4% |
| p90 | 9.80 μs | 9.70 μs | +1% |
| p99 | 21.20 μs | 13.30 μs | +59% |
| p99.9 | 115.70 μs | 76.40 μs | +51% |
| min | 8.30 μs | 8.10 μs | +2% |
| max | 3.15 ms | 2.13 ms | +48% |
Observation: Mean latencies are very close (~0.5 μs difference), but Corosio has significantly higher tail latency (p99+).
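A small example shows why the mean and the tail percentiles can tell different stories. The sketch below uses the nearest-rank percentile definition on synthetic data. The report does not say how the harness computes percentiles, so both the method and the sample values here are assumptions for illustration only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Synthetic round-trip times in microseconds: 98% fast, 2% slow outliers.
latencies_us = [10.0] * 980 + [20.0] * 15 + [100.0] * 5

mean = sum(latencies_us) / len(latencies_us)
print(f"mean  = {mean:.2f} us")
for p in (50, 90, 99, 99.9):
    print(f"p{p:<5}= {percentile(latencies_us, p):.2f} us")
```

In this synthetic set the 2% of slow samples raise the mean by only 6% (10.0 μs to 10.6 μs), yet p99 is twice the median. The measured data has the same shape: similar means, clearly different p99 and p99.9.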
Concurrent Socket Pairs
Multiple socket pairs operating concurrently (64-byte messages).
| Pairs | Iterations | Corosio Mean | Asio Mean | Corosio p99 | Asio p99 |
|---|---|---|---|---|---|
| 1 | 1,000,000 | 9.95 μs | 9.55 μs | 19.20 μs | 13.10 μs |
| 4 | 500,000 | 40.90 μs | 39.54 μs | 81.88 μs | 69.60 μs |
| 16 | 250,000 | 162.95 μs | 160.49 μs | 357.36 μs | 344.09 μs |
Observation: Both implementations scale similarly with concurrent pairs. Asio maintains a small latency advantage throughout.
HTTP Server Benchmarks
Single Connection (Sequential Requests)
| Metric | Corosio | Asio | Difference |
|---|---|---|---|
| Requests | 1,000,000 | 1,000,000 | - |
| Elapsed | 10.383 s | 10.421 s | -0.4% |
| Throughput | 96.31 Kops/s | 95.96 Kops/s | +0.4% |
| Mean latency | 10.36 μs | 10.39 μs | -0.3% |
| p99 latency | 14.70 μs | 13.80 μs | +7% |
Observation: Single-connection HTTP performance is essentially identical.
Concurrent Connections (Single Thread)
| Connections | Corosio Throughput | Asio Throughput | Corosio Mean | Asio Mean | Gap |
|---|---|---|---|---|---|
| 1 | 92.71 Kops/s | 92.35 Kops/s | 10.76 μs | 10.80 μs | Tie |
| 4 | 92.64 Kops/s | 91.14 Kops/s | 43.15 μs | 43.86 μs | Tie |
| 16 | 92.03 Kops/s | 90.38 Kops/s | 173.83 μs | 177.00 μs | Tie |
| 32 | 92.14 Kops/s | 89.11 Kops/s | 347.27 μs | 359.06 μs | Corosio +3% |
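Observation: Single-threaded HTTP throughput stays flat at roughly 89-93 Kops/s for both libraries from 1 to 32 connections, while mean latency grows in proportion to the connection count.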
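The latency column is what Little's law predicts for a saturated single thread: mean latency is roughly the number of requests in flight divided by the throughput. The sketch below checks this against the Corosio rows. It assumes a closed loop in which each connection keeps exactly one request outstanding, which the report does not state explicitly.

```python
# (connections, Corosio throughput in ops/s, reported Corosio mean latency in us)
rows = [
    (1, 92_710, 10.76),
    (4, 92_640, 43.15),
    (16, 92_030, 173.83),
    (32, 92_140, 347.27),
]


def predicted_latency_us(connections, ops_per_s):
    """Little's law with one outstanding request per connection."""
    return connections / ops_per_s * 1e6


for connections, ops_per_s, reported in rows:
    predicted = predicted_latency_us(connections, ops_per_s)
    error_pct = (predicted / reported - 1.0) * 100.0
    print(f"{connections:>2} connections: predicted {predicted:7.2f} us, "
          f"reported {reported:7.2f} us ({error_pct:+.2f}%)")
```

The predictions land within about 0.3% of the reported means. On this reading, the rising latency reflects requests queuing behind a single thread that is already saturated, not a per-connection slowdown.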
Multi-Threaded HTTP (32 Connections)
| Threads | Corosio Throughput | Asio Throughput | Gap (Corosio vs Asio) | Scaling Factor (Corosio / Asio) |
|---|---|---|---|---|
| 1 | 89.72 Kops/s | 88.25 Kops/s | +2% | (baseline) |
| 2 | 127.27 Kops/s | 127.48 Kops/s | 0% | 1.42× / 1.44× |
| 4 | 141.15 Kops/s | 210.64 Kops/s | -33% | 1.57× / 2.39× |
| 8 | 215.94 Kops/s | 337.68 Kops/s | -36% | 2.41× / 3.83× |
Multi-Threaded Latency
| Threads | Corosio Mean | Asio Mean | Corosio p99 | Asio p99 |
|---|---|---|---|---|
| 1 | 356.63 μs | 362.58 μs | 748.50 μs | 620.88 μs |
| 2 | 251.37 μs | 250.92 μs | 384.09 μs | 352.85 μs |
| 4 | 226.46 μs | 151.75 μs | 447.79 μs | 192.31 μs |
| 8 | 147.86 μs | 94.26 μs | 188.26 μs | 120.68 μs |
Key finding: Asio scales significantly better in multi-threaded HTTP workloads, achieving 3.83× scaling from 1→8 threads compared to Corosio’s 2.41×.
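Dividing each scaling factor by the thread count gives a parallel efficiency, where 100% would be perfect linear scaling. This metric is not part of the original report. The sketch below derives it from the multi-threaded HTTP throughput figures, and the function names are illustrative.

```python
threads = [1, 2, 4, 8]
corosio_kops = [89.72, 127.27, 141.15, 215.94]  # from the HTTP table
asio_kops = [88.25, 127.48, 210.64, 337.68]


def scaling(values):
    """Throughput relative to the first (1-thread) entry."""
    return [v / values[0] for v in values]


def efficiency(values, thread_counts):
    """Scaling factor divided by thread count (1.0 = perfect linear scaling)."""
    return [s / t for s, t in zip(scaling(values), thread_counts)]


for t, c, a in zip(threads,
                   efficiency(corosio_kops, threads),
                   efficiency(asio_kops, threads)):
    print(f"{t} thread(s): Corosio {c:.0%}, Asio {a:.0%}")
```

At 8 threads this gives roughly 30% for Corosio and 48% for Asio, so neither library is close to linear scaling in this workload.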
Analysis
Performance Characteristics
Handler Dispatch
Corosio shows dramatically better single-threaded performance but struggles with multi-threaded scaling:
| Scenario | Corosio Advantage | Notes |
|---|---|---|
| Single-threaded | +98% | Nearly 2× faster |
| Interleaved post/run | +70% | Excellent batch handling |
| Concurrent post/run (4 threads) | +14% | Still competitive |
| Multi-threaded (8 threads) | -31% | Scaling regression (Asio is 44% faster) |
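Percentage gaps in this report depend on which library is taken as the baseline: "Asio is 44% faster" and "Corosio is 31% slower" describe the same pair of measurements. The sketch below makes the conversion explicit; `faster_by` is an illustrative helper name.

```python
def faster_by(a, b):
    """Percent by which `a` exceeds `b`, using `b` as the baseline."""
    return (a / b - 1.0) * 100.0


pairs = {
    # scenario: (Corosio, Asio), values taken from the tables in this report
    "Handler dispatch, 8 threads (Mops/s)": (2.09, 3.02),
    "HTTP, 8 threads (Kops/s)": (215.94, 337.68),
}

for name, (corosio, asio) in pairs.items():
    print(f"{name}: Asio {faster_by(asio, corosio):+.0f}% "
          f"relative to Corosio, Corosio {faster_by(corosio, asio):+.0f}% "
          f"relative to Asio")
```

When comparing rows across tables, check which direction a percentage is quoted in before treating two figures as equivalent.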
Conclusions
Strengths
Corosio:
- Exceptional single-threaded handler dispatch (2× faster)
- Superior interleaved post/run performance (70% faster)
- Competitive socket I/O throughput
- Identical single-connection HTTP performance
Asio:
- Better multi-threaded scaling (no regression at 8 threads)
- Superior multi-threaded HTTP throughput (+56% at 8 threads)
- Lower tail latency in socket operations
- More predictable performance under load
Appendix: Raw Data
Corosio Results
Backend: iocp
=== Single-threaded Handler Post ===
Handlers: 5000000
Elapsed: 3.143 s
Throughput: 1.59 Mops/s
=== Multi-threaded Scaling ===
Handlers per test: 5000000
1 thread(s): 2.46 Mops/s
2 thread(s): 2.24 Mops/s (speedup: 0.91x)
4 thread(s): 2.58 Mops/s (speedup: 1.05x)
8 thread(s): 2.09 Mops/s (speedup: 0.85x)
=== Interleaved Post/Run ===
Iterations: 50000
Handlers/iter: 100
Total handlers: 5000000
Elapsed: 1.723 s
Throughput: 2.90 Mops/s
=== Concurrent Post and Run ===
Threads: 4
Handlers/thread: 1250000
Total handlers: 5000000
Elapsed: 2.970 s
Throughput: 1.68 Mops/s
=== Unidirectional Throughput ===
Buffer size: 1024 bytes, Transfer: 4096 MB
Throughput: 215.20 MB/s
Buffer size: 4096 bytes, Transfer: 4096 MB
Throughput: 757.98 MB/s
Buffer size: 16384 bytes, Transfer: 4096 MB
Throughput: 2.56 GB/s
Buffer size: 65536 bytes, Transfer: 4096 MB
Throughput: 6.43 GB/s
=== Bidirectional Throughput ===
Buffer size: 1024 bytes: 214.55 MB/s (combined)
Buffer size: 4096 bytes: 707.35 MB/s (combined)
Buffer size: 16384 bytes: 2.48 GB/s (combined)
Buffer size: 65536 bytes: 6.15 GB/s (combined)
=== Ping-Pong Round-Trip Latency ===
1 byte: mean=10.04 us, p99=21.10 us
64 bytes: mean=10.10 us, p99=21.20 us
1024 bytes: mean=10.03 us, p99=21.10 us
=== Concurrent Socket Pairs Latency ===
1 pair: mean=9.95 us, p99=19.20 us
4 pairs: mean=40.90 us, p99=81.88 us
16 pairs: mean=162.95 us, p99=357.36 us
=== HTTP Single Connection ===
Throughput: 96.31 Kops/s
Latency: mean=10.36 us, p99=14.70 us
=== HTTP Multi-threaded (32 connections) ===
1 thread: 89.72 Kops/s, mean=356.63 us
2 threads: 127.27 Kops/s, mean=251.37 us
4 threads: 141.15 Kops/s, mean=226.46 us
8 threads: 215.94 Kops/s, mean=147.86 us
Asio Results
=== Single-threaded Handler Post ===
Handlers: 5000000
Elapsed: 6.233 s
Throughput: 802.18 Kops/s
=== Multi-threaded Scaling ===
Handlers per test: 5000000
1 thread(s): 1.51 Mops/s
2 thread(s): 2.16 Mops/s (speedup: 1.43x)
4 thread(s): 2.97 Mops/s (speedup: 1.96x)
8 thread(s): 3.02 Mops/s (speedup: 1.99x)
=== Interleaved Post/Run ===
Iterations: 50000
Handlers/iter: 100
Total handlers: 5000000
Elapsed: 2.930 s
Throughput: 1.71 Mops/s
=== Concurrent Post and Run ===
Threads: 4
Handlers/thread: 1250000
Total handlers: 5000000
Elapsed: 3.374 s
Throughput: 1.48 Mops/s
=== Unidirectional Throughput ===
Buffer size: 1024 bytes: 213.17 MB/s
Buffer size: 4096 bytes: 743.34 MB/s
Buffer size: 16384 bytes: 2.58 GB/s
Buffer size: 65536 bytes: 6.40 GB/s
=== Bidirectional Throughput ===
Buffer size: 1024 bytes: 212.18 MB/s (combined)
Buffer size: 4096 bytes: 755.43 MB/s (combined)
Buffer size: 16384 bytes: 2.59 GB/s (combined)
Buffer size: 65536 bytes: 6.50 GB/s (combined)
=== Ping-Pong Round-Trip Latency ===
1 byte: mean=9.66 us, p99=14.20 us
64 bytes: mean=9.61 us, p99=13.30 us
1024 bytes: mean=9.66 us, p99=12.30 us
=== Concurrent Socket Pairs Latency ===
1 pair: mean=9.55 us, p99=13.10 us
4 pairs: mean=39.54 us, p99=69.60 us
16 pairs: mean=160.49 us, p99=344.09 us
=== HTTP Single Connection ===
Throughput: 95.96 Kops/s
Latency: mean=10.39 us, p99=13.80 us
=== HTTP Multi-threaded (32 connections) ===
1 thread: 88.25 Kops/s, mean=362.58 us
2 threads: 127.48 Kops/s, mean=250.92 us
4 threads: 210.64 Kops/s, mean=151.75 us
8 threads: 337.68 Kops/s, mean=94.26 us