Performance Issues
Identify and resolve performance problems in HDDS applications.
Diagnosing Performance Issues
Quick Health Check
# Check CPU usage
top -H -p $(pgrep my_app)
# Check memory
ps -o rss,vsz,pid,cmd -p $(pgrep my_app)
# Check network
ss -u -a -n | grep 7400
netstat -su
Identify Bottleneck
Performance issue?
│
▼
High CPU? ──────────> Profiling section
│
▼
High memory? ───────> Memory section
│
▼
High latency? ──────> Latency section
│
▼
Low throughput? ────> Throughput section
│
▼
Packet loss? ───────> Network section
High Latency
Symptoms
- End-to-end delay exceeds requirements
- Inconsistent timing (jitter)
- Timeout errors
Diagnosis
// Measure write latency
let start = Instant::now();
writer.write(&sample)?;
let write_time = start.elapsed();
println!("Write took: {:?}", write_time);
// If blocking:
if write_time > Duration::from_millis(10) {
    println!("Write blocked - check reliability/history");
}
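A single timing can hide jitter, which shows up in the tail of the distribution rather than in one measurement. The following is a minimal sketch, using only the standard library, of collecting many timings and reading off percentiles. The helpers `measure_latencies` and `percentile` are hypothetical names, and the closure stands in for the real `writer.write(&sample)` call.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `iterations` times and records how long each call took.
/// `op` stands in for the operation under test, e.g. `writer.write(&sample)`.
fn measure_latencies<F: FnMut()>(iterations: usize, mut op: F) -> Vec<Duration> {
    let mut latencies = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        op();
        latencies.push(start.elapsed());
    }
    latencies
}

/// Nearest-rank percentile (0.0 < p <= 100.0) of an already sorted slice.
/// Returns `None` for an empty slice.
fn percentile(sorted: &[Duration], p: f64) -> Option<Duration> {
    if sorted.is_empty() {
        return None;
    }
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    let index = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[index])
}

fn main() {
    // Placeholder workload; replace with the real write call.
    let mut latencies = measure_latencies(1_000, || {
        std::hint::black_box((0..100u64).sum::<u64>());
    });
    latencies.sort();

    println!("p50: {:?}", percentile(&latencies, 50.0));
    println!("p99: {:?}", percentile(&latencies, 99.0));
    println!("max: {:?}", latencies.last());
}
```

A large gap between p50 and p99 points at intermittent blocking rather than a uniformly slow path.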
Solutions
1. Use Best Effort for non-critical data:
let qos = DataWriterQos::default()
    .reliability(Reliability::BestEffort);
2. Enable shared memory:
let config = DomainParticipantConfig::default()
    .transport(TransportConfig::default()
        .enable_shared_memory(true)
        .prefer_shared_memory(true));
3. Reduce history depth:
let qos = DataWriterQos::default()
    .history(History::KeepLast { depth: 1 });
4. Disable batching:
let config = DataWriterConfig::default()
    .batching_enabled(false);
5. Tune network:
# Reduce buffer bloat
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
# Disable interrupt coalescing
ethtool -C eth0 rx-usecs 0 tx-usecs 0
Low Throughput
Symptoms
- Can't achieve expected message rate
- Publish rate limited
- Bandwidth underutilized
Diagnosis
// Measure throughput
let start = Instant::now();
let mut count: u64 = 0;
while start.elapsed() < Duration::from_secs(10) {
    match writer.try_write(&sample) {
        Ok(()) => count += 1,
        Err(HddsError::WouldBlock) => {
            // Buffer full - backpressure
            println!("Backpressure at {} samples", count);
            break;
        }
        Err(e) => return Err(e),
    }
}
// Divide by the actual elapsed time: the loop may exit early on backpressure
let secs = start.elapsed().as_secs_f64();
println!("Throughput: {:.0} samples/sec", count as f64 / secs);
Solutions
1. Increase history buffer:
let qos = DataWriterQos::default()
    .history(History::KeepLast { depth: 10000 })
    .resource_limits(ResourceLimits {
        max_samples: 10000,
        ..Default::default()
    });
2. Enable batching:
let config = DataWriterConfig::default()
    .batching_enabled(true)
    .max_batch_size(64 * 1024)
    .batch_flush_period(Duration::from_millis(1));
3. Use parallel writers:
// Multiple writers for parallel publishing
let writers: Vec<_> = (0..4)
    .map(|_| publisher.create_datawriter(&topic))
    .collect();
// Round-robin across writers
for (i, sample) in samples.iter().enumerate() {
    writers[i % writers.len()].write(sample)?;
}
4. Increase socket buffers:
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
let config = TransportConfig::default()
    .socket_send_buffer_size(16 * 1024 * 1024)
    .socket_receive_buffer_size(16 * 1024 * 1024);
5. Use shared memory:
// For same-host: 10x+ throughput improvement
let config = DomainParticipantConfig::default()
    .transport(TransportConfig::default()
        .enable_shared_memory(true)
        .shared_memory_segment_size(256 * 1024 * 1024));
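The round-robin loop in solution 3 spreads samples across writers but still publishes from a single thread. The sketch below shows one way to publish from several threads at once using `std::thread::scope`. It assumes a writer can be used from another thread, which this guide does not confirm for HDDS. The `Publish` trait and `CountingWriter` are hypothetical stand-ins, not HDDS types.

```rust
use std::thread;

/// Hypothetical stand-in for a data writer; not the HDDS API.
trait Publish<T>: Send {
    fn publish(&mut self, sample: &T) -> Result<(), String>;
}

/// Test double that only counts what it was asked to publish.
#[derive(Default)]
struct CountingWriter {
    sent: usize,
}

impl Publish<u32> for CountingWriter {
    fn publish(&mut self, _sample: &u32) -> Result<(), String> {
        self.sent += 1;
        Ok(())
    }
}

/// Splits `samples` into contiguous chunks and publishes each chunk from its
/// own thread, one writer per thread. Returns the number of samples published.
fn publish_parallel<T, W>(writers: &mut [W], samples: &[T]) -> Result<usize, String>
where
    T: Sync,
    W: Publish<T>,
{
    if samples.is_empty() {
        return Ok(0);
    }
    if writers.is_empty() {
        return Err("no writers available".to_string());
    }
    // Ceiling division so that every sample lands in some chunk.
    let chunk_size = (samples.len() + writers.len() - 1) / writers.len();

    thread::scope(|scope| -> Result<usize, String> {
        let handles: Vec<_> = writers
            .iter_mut()
            .zip(samples.chunks(chunk_size))
            .map(|(writer, chunk)| {
                scope.spawn(move || -> Result<usize, String> {
                    for sample in chunk {
                        writer.publish(sample)?;
                    }
                    Ok(chunk.len())
                })
            })
            .collect();

        let mut total = 0;
        for handle in handles {
            total += handle
                .join()
                .map_err(|_| "writer thread panicked".to_string())??;
        }
        Ok(total)
    })
}

fn main() {
    let mut writers: Vec<CountingWriter> = (0..4).map(|_| CountingWriter::default()).collect();
    let samples: Vec<u32> = (0..1_000).collect();
    match publish_parallel(&mut writers, &samples) {
        Ok(n) => println!("published {n} samples"),
        Err(e) => eprintln!("publish failed: {e}"),
    }
}
```

Samples within a chunk keep their order, but chunks are published concurrently, so this only suits data where ordering across writers does not matter.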
High CPU Usage
Symptoms
- CPU at 100% on one or more cores
- System becomes unresponsive
- Other processes starved
Diagnosis
# Profile with perf
perf record -g ./my_app
perf report
# Or flamegraph
cargo flamegraph --bin my_app
Solutions
1. Reduce polling:
// Bad: busy loop
loop {
    if let Ok(samples) = reader.try_take() {
        process(samples);
    }
    // 100% CPU!
}
// Good: wait with timeout
loop {
    let samples = reader.take_timeout(Duration::from_millis(10))?;
    process(samples);
}
// Better: use WaitSet
let waitset = WaitSet::new()?;
waitset.attach(reader.status_condition())?;
loop {
    waitset.wait(Duration::from_secs(1))?;
    let samples = reader.take()?;
    process(samples);
}
2. Reduce logging:
# Production: errors only
export RUST_LOG=hdds=error
3. Use release build:
cargo build --release
4. Offload processing:
use std::sync::mpsc;
use std::thread;
// Receive in one thread
let (tx, rx) = mpsc::channel();
thread::spawn(move || {
    // `?` cannot be used here because the closure returns (); stop on the first error
    while let Ok(samples) = reader.take() {
        // Stop if the processing thread has gone away
        if tx.send(samples).is_err() {
            break;
        }
    }
});
// Process in another
thread::spawn(move || {
    while let Ok(samples) = rx.recv() {
        heavy_processing(samples); // Won't block reader
    }
});
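`mpsc::channel()` is unbounded, so if processing is slower than receiving, the queue between the two threads grows without limit and trades a CPU problem for a memory problem. A minimal sketch of the same hand-off with a bounded queue follows, using `std::sync::mpsc::sync_channel`. The `offload` helper is a hypothetical name, and the iterator stands in for the stream of samples taken from the reader.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends every item to a worker thread through a bounded queue and returns
/// how many items the worker processed.
fn offload<T, I, F>(items: I, capacity: usize, mut process: F) -> usize
where
    T: Send + 'static,
    I: IntoIterator<Item = T>,
    F: FnMut(T) + Send + 'static,
{
    let (tx, rx) = sync_channel::<T>(capacity);

    let worker = thread::spawn(move || {
        let mut processed = 0;
        // recv() returns Err once the sender is dropped and the queue is drained.
        while let Ok(item) = rx.recv() {
            process(item);
            processed += 1;
        }
        processed
    });

    for item in items {
        // send() blocks while the queue already holds `capacity` items,
        // which applies backpressure to the producing side.
        if tx.send(item).is_err() {
            break; // worker exited early
        }
    }
    drop(tx); // signal end of input

    worker.join().expect("worker thread panicked")
}

fn main() {
    let batches = (0..100).map(|i| vec![i; 8]);
    let processed = offload(batches, 4, |batch: Vec<u32>| {
        std::hint::black_box(batch.iter().sum::<u32>());
    });
    println!("processed {processed} batches");
}
```

If blocking the receiving side is not acceptable, `try_send` can be used instead to drop or count samples when the queue is full.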
High Memory Usage
Symptoms
- Memory grows over time
- OOM errors
- System swapping
Diagnosis
# Track allocations
heaptrack ./my_app
heaptrack_gui heaptrack.my_app.*.gz
# Check at runtime
ps -o rss,vsz,pid,cmd -p $(pgrep my_app)
Solutions
1. Limit history:
// Don't use KeepAll without limits
let qos = DataReaderQos::default()
    .history(History::KeepLast { depth: 100 }) // Not KeepAll
    .resource_limits(ResourceLimits {
        max_samples: 1000,
        max_instances: 100,
        max_samples_per_instance: 10,
    });
2. Dispose instances:
// For keyed topics, dispose old instances
writer.dispose(&sample, handle)?;
// Or unregister to free memory
writer.unregister_instance(&sample, handle)?;
3. Reduce sample size:
// Use bounded types
struct Efficient {
    string<256> name;             // Max 256 chars
    sequence<float, 100> values;  // Max 100 elements
};
4. Use external storage:
// Mark large fields as external
struct LargeData {
    @external sequence<octet> image_data;
};
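Resource limits and bounded types together give a rough ceiling on how much sample data a reader can hold. The sketch below works through that arithmetic for the limits from solution 1 and the bounded `Efficient` type from solution 3. The `Limits` struct and both functions are hypothetical helpers for illustration, not HDDS types, and the estimate ignores serialization framing and per-sample bookkeeping.

```rust
/// Mirrors the three limits from the reader QoS above.
/// Hypothetical struct for illustration, not the HDDS `ResourceLimits` type.
struct Limits {
    max_samples: usize,
    max_instances: usize,
    max_samples_per_instance: usize,
}

/// Largest number of samples that can be held at once: the total cap, or the
/// instance cap times the per-instance cap, whichever is smaller.
fn effective_max_samples(limits: &Limits) -> usize {
    let per_instance_total = limits
        .max_instances
        .saturating_mul(limits.max_samples_per_instance);
    limits.max_samples.min(per_instance_total)
}

/// Rough payload upper bound in bytes; ignores per-sample overhead.
fn worst_case_payload_bytes(limits: &Limits, max_sample_bytes: usize) -> usize {
    effective_max_samples(limits).saturating_mul(max_sample_bytes)
}

fn main() {
    let limits = Limits {
        max_samples: 1000,
        max_instances: 100,
        max_samples_per_instance: 10,
    };
    // `Efficient`: up to 256 bytes of name plus 100 four-byte floats.
    let max_sample_bytes = 256 + 100 * 4;
    let bytes = worst_case_payload_bytes(&limits, max_sample_bytes);
    println!(
        "worst case: {} samples, about {} KiB of payload",
        effective_max_samples(&limits),
        bytes / 1024
    );
}
```

Comparing this figure with the RSS reported by `ps` helps separate memory held by sample caches from growth that comes from elsewhere.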
Packet Loss
Symptoms
- SampleLost callbacks
- Sequence gaps
- Unreliable even with Reliable QoS
Diagnosis
# Check interface errors
ip -s link show eth0 | grep -E "(dropped|errors)"
# Check socket buffer overruns
netstat -su | grep buffer
# Enable HDDS transport debug logging
export RUST_LOG=hdds::transport=debug
Solutions
1. Increase socket buffers:
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
2. Increase history for reliable:
// More retransmission buffer
let qos = DataWriterQos::default()
    .reliability(Reliability::Reliable)
    .history(History::KeepLast { depth: 1000 });
3. Reduce publish rate:
// Implement rate limiting
let interval = Duration::from_micros(100); // 10kHz max
let mut last_write = Instant::now();
loop {
    let elapsed = last_write.elapsed();
    if elapsed < interval {
        std::thread::sleep(interval - elapsed);
    }
    writer.write(&sample)?;
    last_write = Instant::now();
}
4. Use flow control:
// Check backpressure before writing
loop {
    match writer.try_write(&sample) {
        Ok(()) => break,
        Err(HddsError::WouldBlock) => {
            // Back off
            std::thread::sleep(Duration::from_millis(1));
        }
        Err(e) => return Err(e),
    }
}
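Solutions 3 and 4 can be tightened in two ways: pacing against absolute deadlines, so that the time spent inside the write does not stretch the period, and backing off exponentially with an attempt limit instead of retrying forever at a fixed 1 ms. The following is a minimal sketch of both using only the standard library. `Pacer`, `write_with_backoff`, and `WriteError` are hypothetical names; `WriteError::WouldBlock` merely mirrors the `HddsError::WouldBlock` variant used above, and the closure stands in for `writer.try_write(&sample)`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical error type standing in for `HddsError`.
#[derive(Debug, PartialEq)]
enum WriteError {
    WouldBlock,
    Other(String),
}

/// Paces calls against absolute deadlines.
struct Pacer {
    interval: Duration,
    next: Instant,
}

impl Pacer {
    fn new(interval: Duration) -> Self {
        Self { interval, next: Instant::now() }
    }

    /// Sleeps until the next slot. When already behind schedule, restarts the
    /// schedule from now instead of bursting to catch up.
    fn wait(&mut self) {
        let now = Instant::now();
        match self.next.checked_duration_since(now) {
            Some(remaining) => {
                thread::sleep(remaining);
                self.next += self.interval;
            }
            None => self.next = now + self.interval,
        }
    }
}

/// Retries `try_write` while it reports `WouldBlock`, doubling the sleep after
/// each attempt up to `max_delay`. Returns the number of attempts used.
fn write_with_backoff<F>(
    mut try_write: F,
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
) -> Result<u32, WriteError>
where
    F: FnMut() -> Result<(), WriteError>,
{
    let mut delay = initial_delay;
    for attempt in 1..=max_attempts {
        match try_write() {
            Ok(()) => return Ok(attempt),
            Err(WriteError::WouldBlock) if attempt < max_attempts => {
                thread::sleep(delay);
                delay = (delay * 2).min(max_delay);
            }
            Err(e) => return Err(e),
        }
    }
    Err(WriteError::WouldBlock)
}

fn main() {
    let mut pacer = Pacer::new(Duration::from_micros(100)); // 10 kHz ceiling
    let mut pending = 2; // simulated writer: full for the first two attempts
    for _ in 0..5 {
        pacer.wait();
        let attempts = write_with_backoff(
            || {
                if pending > 0 {
                    pending -= 1;
                    Err(WriteError::WouldBlock)
                } else {
                    Ok(())
                }
            },
            8,
            Duration::from_micros(50),
            Duration::from_millis(5),
        );
        println!("write finished: {:?}", attempts);
    }
}
```

Returning an error after the last attempt lets the caller decide whether to drop the sample, log it, or slow the producer further.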
Discovery Performance
Symptoms
- Slow startup
- Takes seconds to match endpoints
- Frequent re-discovery
Solutions
1. Static discovery:
let config = DomainParticipantConfig::default()
    .initial_peers(vec!["192.168.1.100:7400".parse()?])
    .enable_multicast_discovery(false);
2. Faster announcements:
let config = DomainParticipantConfig::default()
    .initial_announcement_period(Duration::from_millis(50))
    .initial_announcement_count(10);
3. Shorter lease:
let config = DomainParticipantConfig::default()
    .lease_duration(Duration::from_secs(10));
Performance Tuning Checklist
Low Latency
- Use shared memory transport
- Best effort reliability (if acceptable)
- KeepLast(1) history
- Disable batching
- Pre-register instances
- Pin threads to CPU cores
- Disable kernel interrupt coalescing
High Throughput
- Enable batching
- Large history buffers
- Large socket buffers
- Multiple parallel writers
- Use shared memory
- Compress large payloads
- Use fixed-size types
Low Memory
- KeepLast with small depth
- Set resource limits
- Dispose/unregister instances
- Use bounded types
- Mark large fields external
- Monitor and alert
Low CPU
- Use WaitSet instead of polling
- Release builds
- Reduce logging
- Offload processing to threads
- Use efficient serialization
Performance Monitoring
// Add performance metrics
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;
// Assumption: Histogram comes from the hdrhistogram crate
use hdrhistogram::Histogram;
struct PerformanceMonitor {
    write_latency: Histogram<u64>,
    read_latency: Histogram<u64>,
    throughput_counter: AtomicU64,
    last_report: Instant,
}
impl PerformanceMonitor {
    fn report(&self) {
        let elapsed = self.last_report.elapsed().as_secs_f64();
        let throughput = self.throughput_counter.load(Ordering::Relaxed) as f64 / elapsed;
        println!("Performance Report:");
        println!("  Write p99: {} us", self.write_latency.value_at_percentile(99.0));
        println!("  Read p99: {} us", self.read_latency.value_at_percentile(99.0));
        println!("  Throughput: {:.0} samples/sec", throughput);
    }
}
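The monitor above reads the counter without resetting it, so the reported throughput is an average since `last_report` was set rather than a per-interval rate. One possible way to get a per-interval figure is to swap the counter back to zero on each report, sketched below with the standard library only. `ThroughputCounter` is a hypothetical helper, not part of HDDS.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Counts samples and reports a per-interval rate.
#[derive(Default)]
struct ThroughputCounter {
    samples: AtomicU64,
}

impl ThroughputCounter {
    /// Call from the write or read path; safe to share across threads.
    fn record(&self, n: u64) {
        self.samples.fetch_add(n, Ordering::Relaxed);
    }

    /// Returns samples per second over `elapsed` and resets the counter,
    /// so each report covers only the interval since the previous one.
    fn take_rate(&self, elapsed: Duration) -> f64 {
        let n = self.samples.swap(0, Ordering::Relaxed);
        let secs = elapsed.as_secs_f64();
        if secs > 0.0 {
            n as f64 / secs
        } else {
            0.0
        }
    }
}

fn main() {
    let counter = ThroughputCounter::default();
    let mut last_report = Instant::now();
    for _ in 0..3 {
        for _ in 0..10_000 {
            counter.record(1);
        }
        let rate = counter.take_rate(last_report.elapsed());
        last_report = Instant::now();
        println!("Throughput: {rate:.0} samples/sec");
    }
}
```

Passing the elapsed time in as a parameter keeps the counter itself free of clock state, which makes it easy to share behind an `Arc` and to test.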
Next Steps
- Common Issues - General troubleshooting
- Debug Guide - Debugging techniques
- Benchmarks - Performance baselines