
Performance Issues

Identify and resolve performance problems in HDDS applications.

Diagnosing Performance Issues

Quick Health Check

# Check CPU usage
top -H -p $(pgrep my_app)

# Check memory
ps -o rss,vsz,pid,cmd -p $(pgrep my_app)

# Check network
ss -u -n | grep 7400
netstat -su

Identify Bottleneck

Performance issue?

  • High CPU? ───────> High CPU Usage
  • High memory? ────> High Memory Usage
  • High latency? ───> High Latency
  • Low throughput? ─> Low Throughput
  • Packet loss? ────> Packet Loss

High Latency

Symptoms

  • End-to-end delay exceeds requirements
  • Inconsistent timing (jitter)
  • Timeout errors

Diagnosis

// Measure write latency
let start = Instant::now();
writer.write(&sample)?;
let write_time = start.elapsed();
println!("Write took: {:?}", write_time);

// If blocking:
if write_time > Duration::from_millis(10) {
    println!("Write blocked - check reliability/history");
}
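A single measurement says little about jitter. A small harness that records many write latencies and reports percentiles makes the spread visible; a std-only sketch (the commented-out `writer.write` call marks where the real HDDS write goes):

```rust
use std::time::Instant;

/// Value at percentile `p` (0.0..=100.0) from unsorted microsecond samples.
fn percentile(latencies: &mut [u64], p: f64) -> u64 {
    assert!(!latencies.is_empty());
    latencies.sort_unstable();
    let idx = ((p / 100.0) * (latencies.len() - 1) as f64).round() as usize;
    latencies[idx]
}

fn main() {
    let mut latencies_us = Vec::with_capacity(1000);
    for _ in 0..1000 {
        let start = Instant::now();
        // writer.write(&sample)?;   // <- the real HDDS write goes here
        latencies_us.push(start.elapsed().as_micros() as u64);
    }
    println!("p50: {} us", percentile(&mut latencies_us, 50.0));
    println!("p99: {} us", percentile(&mut latencies_us, 99.0));
}
```

A large gap between p50 and p99 indicates jitter; check reliability and history settings first.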

Solutions

1. Use Best Effort for non-critical data:

let qos = DataWriterQos::default()
    .reliability(Reliability::BestEffort);

2. Enable shared memory:

let config = DomainParticipantConfig::default()
    .transport(TransportConfig::default()
        .enable_shared_memory(true)
        .prefer_shared_memory(true));

3. Reduce history depth:

let qos = DataWriterQos::default()
    .history(History::KeepLast { depth: 1 });

4. Disable batching:

let config = DataWriterConfig::default()
    .batching_enabled(false);

5. Tune network:

# Reduce buffer bloat
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144

# Disable interrupt coalescing
ethtool -C eth0 rx-usecs 0 tx-usecs 0

Low Throughput

Symptoms

  • Can't achieve expected message rate
  • Publish rate limited
  • Bandwidth underutilized

Diagnosis

// Measure throughput
let start = Instant::now();
let mut count = 0;

while start.elapsed() < Duration::from_secs(10) {
    if let Err(HddsError::WouldBlock) = writer.try_write(&sample) {
        // Buffer full - backpressure
        println!("Backpressure at {} samples", count);
        break;
    }
    count += 1;
}

// Divide by the actual elapsed time, not a fixed 10 s, so an early
// backpressure break still reports the correct rate.
println!("Throughput: {:.0} samples/sec", count as f64 / start.elapsed().as_secs_f64());

Solutions

1. Increase history buffer:

let qos = DataWriterQos::default()
    .history(History::KeepLast { depth: 10000 })
    .resource_limits(ResourceLimits {
        max_samples: 10000,
        ..Default::default()
    });

2. Enable batching:

let config = DataWriterConfig::default()
    .batching_enabled(true)
    .max_batch_size(64 * 1024)
    .batch_flush_period(Duration::from_millis(1));

3. Use parallel writers:

// Multiple writers for parallel publishing
let writers: Vec<_> = (0..4)
    .map(|_| publisher.create_datawriter(&topic))
    .collect::<Result<Vec<_>, _>>()?; // surface creation errors

// Round-robin across writers
for (i, sample) in samples.iter().enumerate() {
    writers[i % writers.len()].write(sample)?;
}

4. Increase socket buffers:

# Raise the OS limits first
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

// Then the application buffers (capped by the OS limits above)
let config = TransportConfig::default()
    .socket_send_buffer_size(16 * 1024 * 1024)
    .socket_receive_buffer_size(16 * 1024 * 1024);

5. Use shared memory:

// For same-host: 10x+ throughput improvement
let config = DomainParticipantConfig::default()
    .transport(TransportConfig::default()
        .enable_shared_memory(true)
        .shared_memory_segment_size(256 * 1024 * 1024));

High CPU Usage

Symptoms

  • CPU at 100% on one or more cores
  • System becomes unresponsive
  • Other processes starved

Diagnosis

# Profile with perf
perf record -g ./my_app
perf report

# Or flamegraph
cargo flamegraph --bin my_app

Solutions

1. Reduce polling:

// Bad: busy loop
loop {
    if let Ok(samples) = reader.try_take() {
        process(samples);
    }
    // 100% CPU!
}

// Good: wait with timeout
loop {
    let samples = reader.take_timeout(Duration::from_millis(10))?;
    process(samples);
}

// Better: use a WaitSet
let waitset = WaitSet::new()?;
waitset.attach(reader.status_condition())?;

loop {
    waitset.wait(Duration::from_secs(1))?;
    let samples = reader.take()?;
    process(samples);
}

2. Reduce logging:

# Production: errors only
export RUST_LOG=hdds=error

3. Use release build:

cargo build --release

4. Offload processing:

use std::sync::mpsc;
use std::thread;

// Receive in one thread
let (tx, rx) = mpsc::channel();
thread::spawn(move || {
    // `?` is not available in a closure returning `()`; handle errors explicitly
    while let Ok(samples) = reader.take() {
        if tx.send(samples).is_err() {
            break; // processing thread has exited
        }
    }
});

// Process in another
thread::spawn(move || {
    while let Ok(samples) = rx.recv() {
        heavy_processing(samples); // Won't block reader
    }
});
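The unbounded channel above will queue without limit if `heavy_processing` falls behind. A bounded `sync_channel` adds backpressure so the receiving side blocks instead of growing memory; a std-only sketch with integers standing in for samples:

```rust
use std::sync::mpsc;
use std::thread;

/// Bounded pipeline: capacity 100 applies backpressure to the sender.
fn run() -> u64 {
    let (tx, rx) = mpsc::sync_channel::<u64>(100);

    // Receive thread: in the real app this loop wraps `reader.take()`.
    let producer = thread::spawn(move || {
        for i in 0..1_000u64 {
            // Blocks once 100 samples are queued instead of growing memory.
            tx.send(i).expect("consumer hung up");
        }
    });

    // Processing thread: the sum stands in for heavy_processing().
    let consumer = thread::spawn(move || {
        let mut sum = 0u64;
        while let Ok(sample) = rx.recv() {
            sum += sample;
        }
        sum
    });

    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    println!("processed sum = {}", run()); // 0 + 1 + ... + 999
}
```

Choose the capacity from how long a processing stall you need to absorb; once full, the reader-side thread stalls too, which is the point of backpressure.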

High Memory Usage

Symptoms

  • Memory grows over time
  • OOM errors
  • System swapping

Diagnosis

# Track allocations
heaptrack ./my_app
heaptrack_gui heaptrack.my_app.*.gz

# Check at runtime
ps -o rss,vsz,pid,cmd -p $(pgrep my_app)

Solutions

1. Limit history:

// Don't use KeepAll without limits
// Don't use KeepAll without limits
let qos = DataReaderQos::default()
    .history(History::KeepLast { depth: 100 }) // Not KeepAll
    .resource_limits(ResourceLimits {
        max_samples: 1000,
        max_instances: 100,
        max_samples_per_instance: 10,
        ..Default::default() // fill remaining fields, as in the writer example
    });

2. Dispose instances:

// For keyed topics, dispose old instances
writer.dispose(&sample, handle)?;

// Or unregister to free memory
writer.unregister_instance(&sample, handle)?;

3. Reduce sample size:

// IDL: bounded types have a fixed maximum size
struct Efficient {
    string<256> name;            // Max 256 chars
    sequence<float, 100> values; // Max 100 elements
};

4. Use external storage:

// Mark large fields as external
struct LargeData {
    @external sequence<octet> image_data;
};
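The bounded IDL types in solution 3 have a direct Rust analogue: fixed-capacity fields keep every sample the same size and avoid per-sample heap allocation. A std-only sketch (the generated HDDS bindings, if any, may look different):

```rust
/// Fixed-capacity sample: arrays plus a length instead of String/Vec.
#[derive(Clone)]
struct Efficient {
    name: [u8; 256],    // like IDL string<256>
    name_len: usize,
    values: [f32; 100], // like IDL sequence<float, 100>
    values_len: usize,
}

impl Efficient {
    fn new(name: &str) -> Self {
        let mut s = Self { name: [0; 256], name_len: 0, values: [0.0; 100], values_len: 0 };
        let bytes = name.as_bytes();
        let n = bytes.len().min(256); // truncate rather than allocate
        s.name[..n].copy_from_slice(&bytes[..n]);
        s.name_len = n;
        s
    }

    /// Push a value; returns false when the bounded sequence is full.
    fn push(&mut self, v: f32) -> bool {
        if self.values_len == self.values.len() {
            return false;
        }
        self.values[self.values_len] = v;
        self.values_len += 1;
        true
    }
}

fn main() {
    let mut sample = Efficient::new("sensor_1");
    sample.push(3.5);
    println!("name bytes: {}, values: {}", sample.name_len, sample.values_len);
}
```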

Packet Loss

Symptoms

  • SampleLost callbacks
  • Sequence gaps
  • Unreliable even with Reliable QoS

Diagnosis

# Check interface errors
ip -s link show eth0 | grep -E "(dropped|errors)"

# Check socket buffer overruns
netstat -su | grep buffer

# Check HDDS stats
export RUST_LOG=hdds::transport=debug
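Loss can also be quantified from the application by counting gaps in a per-writer sequence number carried in each sample (a hypothetical `seq` field; duplicates and reordering are ignored here for brevity):

```rust
/// Counts missing samples given a monotonically increasing sequence number.
struct GapCounter {
    next_expected: u64,
    lost: u64,
}

impl GapCounter {
    fn new() -> Self {
        Self { next_expected: 0, lost: 0 }
    }

    /// Feed each received sequence number; returns total samples lost so far.
    fn observe(&mut self, seq: u64) -> u64 {
        if seq > self.next_expected {
            self.lost += seq - self.next_expected; // gap: these never arrived
        }
        self.next_expected = seq + 1;
        self.lost
    }
}

fn main() {
    let mut gaps = GapCounter::new();
    for seq in [0u64, 1, 2, 5, 6] {
        gaps.observe(seq);
    }
    println!("lost: {}", gaps.lost); // 3 and 4 never arrived -> 2
}
```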

Solutions

1. Increase socket buffers:

sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

2. Increase history for reliable:

// More retransmission buffer
let qos = DataWriterQos::default()
    .reliability(Reliability::Reliable)
    .history(History::KeepLast { depth: 1000 });

3. Reduce publish rate:

// Implement rate limiting
let interval = Duration::from_micros(100); // 10 kHz max
let mut last_write = Instant::now();

loop {
    let elapsed = last_write.elapsed();
    if elapsed < interval {
        std::thread::sleep(interval - elapsed);
    }
    writer.write(&sample)?;
    last_write = Instant::now();
}

4. Use flow control:

// Check backpressure before writing
loop {
    match writer.try_write(&sample) {
        Ok(()) => break,
        Err(HddsError::WouldBlock) => {
            // Back off
            std::thread::sleep(Duration::from_millis(1));
        }
        Err(e) => return Err(e),
    }
}

Discovery Performance

Symptoms

  • Slow startup
  • Takes seconds to match endpoints
  • Frequent re-discovery

Solutions

1. Static discovery:

let config = DomainParticipantConfig::default()
    .initial_peers(vec!["192.168.1.100:7400".parse()?])
    .enable_multicast_discovery(false);

2. Faster announcements:

let config = DomainParticipantConfig::default()
    .initial_announcement_period(Duration::from_millis(50))
    .initial_announcement_count(10);

3. Shorter lease:

let config = DomainParticipantConfig::default()
    .lease_duration(Duration::from_secs(10));

Performance Tuning Checklist

Low Latency

  • Use shared memory transport
  • Best effort reliability (if acceptable)
  • KeepLast(1) history
  • Disable batching
  • Pre-register instances
  • Pin threads to CPU cores
  • Disable kernel interrupt coalescing
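Most of these items map directly onto the QoS and transport calls shown earlier on this page; combined into one low-latency profile:

```rust
// Combined low-latency profile, built from the settings in the sections above.
let qos = DataWriterQos::default()
    .reliability(Reliability::BestEffort)     // if occasional loss is acceptable
    .history(History::KeepLast { depth: 1 }); // latest-value semantics

let writer_config = DataWriterConfig::default()
    .batching_enabled(false);                 // send each sample immediately

let participant_config = DomainParticipantConfig::default()
    .transport(TransportConfig::default()
        .enable_shared_memory(true)           // bypass the network stack on-host
        .prefer_shared_memory(true));
```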

High Throughput

  • Enable batching
  • Large history buffers
  • Large socket buffers
  • Multiple parallel writers
  • Use shared memory
  • Compress large payloads
  • Use fixed-size types

Low Memory

  • KeepLast with small depth
  • Set resource limits
  • Dispose/unregister instances
  • Use bounded types
  • Mark large fields external
  • Monitor and alert

Low CPU

  • Use WaitSet instead of polling
  • Release builds
  • Reduce logging
  • Offload processing to threads
  • Use efficient serialization

Performance Monitoring

use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

// Add performance metrics (Histogram here is assumed to come from a
// crate such as hdrhistogram)
struct PerformanceMonitor {
    write_latency: Histogram<u64>,
    read_latency: Histogram<u64>,
    throughput_counter: AtomicU64,
    last_report: Instant,
}

impl PerformanceMonitor {
    fn report(&self) {
        let elapsed = self.last_report.elapsed().as_secs_f64();
        let throughput = self.throughput_counter.load(Ordering::Relaxed) as f64 / elapsed;

        println!("Performance Report:");
        println!("  Write p99: {} us", self.write_latency.value_at_percentile(99.0));
        println!("  Read p99: {} us", self.read_latency.value_at_percentile(99.0));
        println!("  Throughput: {:.0} samples/sec", throughput);
    }
}
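The histogram fields above need an external crate, but the throughput side works with the standard library alone. A minimal std-only sketch:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Std-only throughput counter: call `record` on every sample,
/// `rate` for samples/sec since construction.
struct ThroughputCounter {
    count: AtomicU64,
    started: Instant,
}

impl ThroughputCounter {
    fn new() -> Self {
        Self { count: AtomicU64::new(0), started: Instant::now() }
    }

    fn record(&self) {
        // Relaxed is enough: the counter is monotonic and read only for reporting.
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    fn rate(&self) -> f64 {
        self.count.load(Ordering::Relaxed) as f64 / self.started.elapsed().as_secs_f64()
    }
}

fn main() {
    let counter = ThroughputCounter::new();
    for _ in 0..1000 {
        counter.record();
    }
    println!("rate: {:.0} samples/sec", counter.rate());
}
```

Because `record` takes `&self`, one counter can be shared across the receive and processing threads via `Arc`.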

Next Steps