Performance Benchmarks

All numbers on this page are measured and reproducible. Each section links to the script or tool used so you can verify on your own hardware.

Intra-Process Latency

Roundtrip write + read through the lock-free SlabPool/IndexRing path, no network involved. Measured with examples/latency_benchmark (10,000 messages, 1,000 warmup, BestEffort KeepLast(1), 16-byte payload).

Reproduce:

# Single run
cargo run --example latency_benchmark --release

# Statistical run (100 iterations)
./scripts/bench-intra-process.sh 100

Results (100 runs per machine)

| Machine | CPU                           | p50 median | p50 best | Min ever | Sub-257 ns runs |
|---------|-------------------------------|------------|----------|----------|-----------------|
| node2   | i9-9900KF @ 4.8GHz (turbo)    | 247 ns     | 243 ns   | 242 ns   | 53/100          |
| node4   | i7-6700K @ 4.0GHz (perf gov.) | 310 ns     | 306 ns   | 280 ns   | 0/100           |
| devdeb  | Xeon E5-2699 v4 @ 2.2GHz (VM) | ~50 us     | -        | 526 ns   | 0/100           |

Date: 2026-03-04, HDDS v1.0.8, Linux 6.12, --release (no LTO).

The "257 ns" figure published on hdds.io was originally measured on the first HDDS release (rustc 1.83, release build with LTO, i9-9900KF). The current codebase (v1.0.8, 100+ features) achieves a 247 ns median on the same hardware, so the published claim remains accurate and even slightly conservative.
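The fast path above hinges on a lock-free index ring between writer and reader. As a rough illustration of the idea (the names and layout here are hypothetical and do not reflect HDDS's actual SlabPool/IndexRing internals), a minimal single-producer/single-consumer ring of slot indices looks like this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative SPSC index ring: the writer publishes slab-slot indices,
// the reader consumes them, and no locks are taken on either side.
const CAP: usize = 8; // power of two, so wrapping is a cheap mask

struct IndexRing {
    slots: [AtomicUsize; CAP],
    head: AtomicUsize, // next write position (owned by the producer)
    tail: AtomicUsize, // next read position (owned by the consumer)
}

impl IndexRing {
    fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| AtomicUsize::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    // Producer side: publish a slab index; fails if the ring is full.
    fn push(&self, idx: usize) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head - tail == CAP {
            return false; // full
        }
        self.slots[head & (CAP - 1)].store(idx, Ordering::Relaxed);
        self.head.store(head + 1, Ordering::Release); // publish to the reader
        true
    }

    // Consumer side: take the oldest published index, if any.
    fn pop(&self) -> Option<usize> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail == head {
            return None; // empty
        }
        let idx = self.slots[tail & (CAP - 1)].load(Ordering::Relaxed);
        self.tail.store(tail + 1, Ordering::Release);
        Some(idx)
    }
}

fn main() {
    let ring = IndexRing::new();
    assert!(ring.push(42));
    assert_eq!(ring.pop(), Some(42));
    assert_eq!(ring.pop(), None);
    println!("ok");
}
```

Because each side owns one index and only ever crosses the other with acquire/release loads and stores, the hot path is a handful of atomics with no syscalls, which is what makes sub-microsecond roundtrips possible.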

Writer.write() Microbenchmark (Criterion)

Measured with cargo bench (Criterion, 64-byte Temperature struct). This measures only the write path, not the full roundtrip.

cargo bench --package hdds -- write_latency
| Payload | Latency (release, no LTO) |
|---------|---------------------------|
| 64 B    | 2.49 us                   |
| 256 B   | 2.54 us                   |
| 1 KB    | 2.70 us                   |

Date: 2026-03-04, HDDS v1.0.8, Xeon E5-2699 v4 @ 2.2GHz.

Note: The Criterion bench measures the full writer.write() path including RTPS framing and UDP send. The intra-process fast path (see above) bypasses these layers entirely, which is why the example benchmark is much faster.


UDP Loopback Latency (Same Host)

End-to-end RTT measured with hdds-latency-probe (ping/pong, 64-byte payload, 500-1000 samples, Reliable QoS).

# Terminal 1
hdds-latency-probe pong --domain 0

# Terminal 2
hdds-latency-probe ping --domain 0 --size 64 --count 1000

Results

| Machine | CPU                  | p50     | p99     |
|---------|----------------------|---------|---------|
| node2   | i9-9900KF @ 3.6GHz   | 246 us  | 450 us  |
| node4   | i7-6700K @ 4.0GHz    | ~270 us | ~470 us |
| devdeb  | Dual Xeon E5-2699 v4 | 246 us  | 450 us  |

Date: 2026-01-08, HDDS v0.8.0.
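The measurement loop behind a ping/pong run like this can be sketched with plain UDP sockets. This is a simplified stand-in to show the methodology, not hdds-latency-probe's actual implementation (which goes through the full DDS stack):

```rust
use std::net::UdpSocket;
use std::time::{Duration, Instant};

// Pick the p-th percentile from an ascending-sorted sample set.
fn percentile(sorted: &[Duration], p: usize) -> Duration {
    sorted[(sorted.len() - 1) * p / 100]
}

fn main() -> std::io::Result<()> {
    let pong = UdpSocket::bind("127.0.0.1:0")?; // responder socket
    let ping = UdpSocket::bind("127.0.0.1:0")?; // initiator socket
    let pong_addr = pong.local_addr()?;

    // Responder thread: echo every datagram straight back to its sender.
    std::thread::spawn(move || {
        let mut buf = [0u8; 64];
        while let Ok((n, from)) = pong.recv_from(&mut buf) {
            let _ = pong.send_to(&buf[..n], from);
        }
    });

    // 64-byte payload and 1000 samples, matching the table above.
    let payload = [0u8; 64];
    let mut buf = [0u8; 64];
    let mut samples = Vec::with_capacity(1000);
    for _ in 0..1000 {
        let t0 = Instant::now();
        ping.send_to(&payload, pong_addr)?;
        ping.recv_from(&mut buf)?;
        samples.push(t0.elapsed()); // one full RTT per sample
    }
    samples.sort();
    println!(
        "p50 = {:?}  p99 = {:?}",
        percentile(&samples, 50),
        percentile(&samples, 99)
    );
    Ok(())
}
```

A raw-socket loop like this gives the loopback floor; the DDS numbers above sit on top of it because every sample also pays for serialization, RTPS framing, and reader-side dispatch.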

Best-Effort vs Reliable (i9-9900KF, loopback)

| QoS         | Min    | Avg      | p99      |
|-------------|--------|----------|----------|
| Reliable    | 971 us | 1,288 us | 2,181 us |
| Best-Effort | 968 us | 1,230 us | 2,192 us |

Minimal difference on loopback (no packet loss to trigger retransmission).

Optimization History

| Version | Change                  | 64B p50  | vs baseline |
|---------|-------------------------|----------|-------------|
| v208    | 1ms polling             | 1,134 us | baseline    |
| v209    | 100us polling           | 413 us   | 2.7x faster |
| v210    | WakeNotifier (Condvar)  | 354 us   | 3.2x faster |
| v211    | Atomic fast-path + spin | 280 us   | 4.0x faster |
| v212    | mio/epoll listener      | 246 us   | 4.6x faster |
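The v210/v211 steps combine a Condvar-based notifier with an atomic fast path that spins briefly before blocking. A minimal sketch of that pattern (the type and method names here are hypothetical, not the HDDS WakeNotifier API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};

// Spin-then-block notifier: the waiter burns a few cycles hoping the
// notification arrives soon, and only falls back to a Condvar (and its
// syscall cost) when it does not.
struct WakeNotifier {
    ready: AtomicBool,
    lock: Mutex<()>,
    cond: Condvar,
}

impl WakeNotifier {
    fn new() -> Self {
        Self {
            ready: AtomicBool::new(false),
            lock: Mutex::new(()),
            cond: Condvar::new(),
        }
    }

    fn notify(&self) {
        self.ready.store(true, Ordering::Release);
        // Take the mutex so the flag store cannot race past a waiter that is
        // between its last check and its cond.wait().
        let _guard = self.lock.lock().unwrap();
        self.cond.notify_one();
    }

    fn wait(&self) {
        // Fast path: spin briefly, avoiding a syscall when data is imminent.
        for _ in 0..1000 {
            if self.ready.swap(false, Ordering::Acquire) {
                return;
            }
            std::hint::spin_loop();
        }
        // Slow path: block on the condvar until notified.
        let mut guard = self.lock.lock().unwrap();
        while !self.ready.swap(false, Ordering::Acquire) {
            guard = self.cond.wait(guard).unwrap();
        }
    }
}

fn main() {
    let n = Arc::new(WakeNotifier::new());
    let n2 = Arc::clone(&n);
    let t = std::thread::spawn(move || n2.wait());
    n.notify();
    t.join().unwrap();
    println!("woken");
}
```

The spin count trades CPU for latency: long enough to catch back-to-back messages on the fast path, short enough that an idle reader still parks quickly.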

Comparison with CycloneDDS

Measured on the same machine, same test conditions, UDP loopback.

| Size | CycloneDDS p50 | HDDS p50 | Gap  |
|------|----------------|----------|------|
| 64B  | 133 us         | 246 us   | 1.8x |
| 1KB  | 120 us         | ~270 us  | 2.2x |
| 8KB  | 150 us         | ~290 us  | 1.9x |

Date: 2026-01-08, CycloneDDS 0.10.x vs HDDS v0.8.0 (v212), Dual Xeon.

Both implementations achieved 0% packet loss across all tests.

The remaining gap comes mainly from an extra memcpy in the receive path and from thread context switches (the listener-to-router thread handoff).


Component Benchmarks

Serialization (CDR2)

From cargo bench --package hdds -- serialize:

| Operation   | 64 B  | 256 B  | 1 KB   | 4 KB   |
|-------------|-------|--------|--------|--------|
| Serialize   | 50 ns | 120 ns | 400 ns | 1.5 us |
| Deserialize | 40 ns | 100 ns | 350 ns | 1.3 us |
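For a sense of what these numbers measure, here is a hand-rolled little-endian encode/decode of a small sample type. This is an illustration of the kind of work involved (fixed-size fields, alignment padding), not HDDS's actual CDR2 encoder:

```rust
// A small sample type, encoded as: u32 id, 4 bytes of padding so the f64
// lands on an 8-byte boundary, then the f64 value -- 16 bytes total.
#[derive(Debug, PartialEq)]
struct Temperature {
    sensor_id: u32,
    celsius: f64,
}

fn serialize(t: &Temperature, out: &mut Vec<u8>) {
    out.extend_from_slice(&t.sensor_id.to_le_bytes());
    out.extend_from_slice(&[0u8; 4]); // pad to 8-byte alignment for the f64
    out.extend_from_slice(&t.celsius.to_le_bytes());
}

fn deserialize(buf: &[u8]) -> Temperature {
    Temperature {
        sensor_id: u32::from_le_bytes(buf[0..4].try_into().unwrap()),
        celsius: f64::from_le_bytes(buf[8..16].try_into().unwrap()),
    }
}

fn main() {
    let t = Temperature { sensor_id: 7, celsius: 21.5 };
    let mut buf = Vec::new();
    serialize(&t, &mut buf);
    assert_eq!(buf.len(), 16);
    assert_eq!(deserialize(&buf), t);
    println!("roundtrip ok, {} bytes", buf.len());
}
```

For plain fixed-size structs like this, serialization is essentially a few bounds-checked copies, which is why the 64 B figures above sit in the tens of nanoseconds.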

Memory Usage

Per-Entity Overhead

| Entity            | Memory            |
|-------------------|-------------------|
| DomainParticipant | ~2 MB             |
| Publisher         | ~64 KB            |
| Subscriber        | ~64 KB            |
| DataWriter        | ~128 KB + history |
| DataReader        | ~128 KB + history |
| Topic             | ~16 KB            |

History Cache Memory

Memory = instances x history_depth x sample_size + per-instance overhead

Example (100 sensors, 10-deep history, 1 KB samples, 256 B overhead per instance):
= 100 x 10 x 1024 + 100 x 256
= 1,024,000 B + 25,600 B = ~1.05 MB
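The sizing formula above is easy to wire into capacity planning. A small helper, assuming the example's 256 B of per-instance bookkeeping (that figure is taken from this page's example, not a guaranteed constant):

```rust
// Estimate history-cache memory: instances x depth x sample_size, plus a
// fixed per-instance overhead (256 B here, per the example above).
fn history_cache_bytes(instances: usize, depth: usize, sample_size: usize) -> usize {
    const PER_INSTANCE_OVERHEAD: usize = 256; // assumed, from the example
    instances * depth * sample_size + instances * PER_INSTANCE_OVERHEAD
}

fn main() {
    // 100 sensors, 10-deep history, 1 KB samples -> ~1.05 MB
    let bytes = history_cache_bytes(100, 10, 1024);
    println!("{} bytes (~{:.2} MB)", bytes, bytes as f64 / 1_000_000.0);
}
```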

Running Benchmarks

Intra-Process (fast path)

# Single run
cargo run --example latency_benchmark --release

# 100 runs with statistics
./scripts/bench-intra-process.sh 100

# Skip rebuild
./scripts/bench-intra-process.sh 100 --no-build

Tips for stable results:

  • Copy binary to /tmp (the script does this automatically)
  • Set CPU governor to performance: sudo cpupower frequency-set -g performance
  • Close other workloads

Criterion Microbenchmarks

cargo bench --package hdds                 # All benchmarks
cargo bench --package hdds -- write_latency # Write path only
cargo bench --package hdds -- serialize # CDR2 only

UDP Latency (hdds-latency-probe)

# Terminal 1 - Responder
hdds-latency-probe pong --domain 0

# Terminal 2 - Benchmark
hdds-latency-probe ping --domain 0 --size 64 --count 1000

# JSON output
hdds-latency-probe ping --domain 0 --json

Profiling

CPU Profiling

# Using perf
perf record -g cargo run --release --example latency_benchmark
perf report

# Using flamegraph
cargo flamegraph --example latency_benchmark

Memory Profiling

heaptrack cargo run --release --example latency_benchmark
heaptrack_gui heaptrack.latency_benchmark.*.gz