Debug Guide
Comprehensive guide to debugging HDDS applications.
Logging
Enable Logging
# Basic logging
export RUST_LOG=hdds=info
# Detailed logging
export RUST_LOG=hdds=debug
# Trace all DDS operations
export RUST_LOG=hdds=trace
# Specific modules
export RUST_LOG=hdds::discovery=debug,hdds::transport=trace
Log Levels
| Level | Use Case |
|---|---|
| error | Critical failures only |
| warn | Warnings and errors |
| info | General operation status |
| debug | Detailed debugging info |
| trace | Very verbose, all operations |
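To see how a multi-module `RUST_LOG` value resolves to a level for one module, here is a minimal standard-library sketch of the usual rule: the most specific matching target wins. It is only an illustration. `Level`, `parse_directives`, `effective_level`, and `enabled` are hypothetical names, and the real filtering is done by the logging crate, whose parser accepts more syntax than this.

```rust
/// Log levels in increasing verbosity (declaration order drives `Ord`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

fn parse_level(s: &str) -> Option<Level> {
    match s.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Parse "target=level,target=level" into pairs, skipping malformed entries.
fn parse_directives(spec: &str) -> Vec<(String, Level)> {
    spec.split(',')
        .filter_map(|directive| {
            let (target, level) = directive.split_once('=')?;
            Some((target.trim().to_string(), parse_level(level)?))
        })
        .collect()
}

/// The longest target that equals the module path or is a `::` prefix of it wins.
fn effective_level(directives: &[(String, Level)], module: &str) -> Option<Level> {
    directives
        .iter()
        .filter(|(target, _)| {
            module == target.as_str() || module.starts_with(&format!("{target}::"))
        })
        .max_by_key(|(target, _)| target.len())
        .map(|(_, level)| *level)
}

/// A message is emitted when its level is no more verbose than the effective level.
fn enabled(directives: &[(String, Level)], module: &str, level: Level) -> bool {
    effective_level(directives, module).map_or(false, |max| level <= max)
}
```

With `hdds=info,hdds::discovery=debug`, the module `hdds::discovery::spdp` resolves to `Debug`, while `hdds::transport` stays at `Info`.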
Log to File
# Redirect to file
RUST_LOG=hdds=debug ./my_app 2> hdds.log
# Or configure in code
use std::sync::Mutex;
let file = std::fs::File::create("hdds.log")?;
tracing_subscriber::fmt()
    // A bare File is not a MakeWriter; Mutex<File> is
    .with_writer(Mutex::new(file))
    // Keep ANSI colour codes out of the log file
    .with_ansi(false)
    .init();
Structured Logging
use tracing::{info, debug, span, Level};
let span = span!(Level::INFO, "dds_operation", topic = "SensorTopic");
let _guard = span.enter();
info!(sensor_id = 42, value = 23.5, "Publishing sample");
Discovery Debugging
Check Discovery Status
// List discovered participants
println!("Discovered participants:");
for info in participant.discovered_participants() {
println!(" GUID: {:?}", info.guid);
println!(" Vendor: {:?}", info.vendor_id);
println!(" Locators: {:?}", info.unicast_locators);
}
// List matched endpoints
println!("Writer matched {} readers",
writer.publication_matched_status()?.current_count);
println!("Reader matched {} writers",
reader.subscription_matched_status()?.current_count);
Network Debugging
# Watch SPDP traffic
tcpdump -i any -n udp port 7400 -X
# Watch all DDS traffic
tcpdump -i any -n 'udp and portrange 7400-7500'
# With Wireshark filter
rtps
Discovery Log Analysis
export RUST_LOG=hdds::discovery=trace
./my_app 2>&1 | grep -E "(SPDP|SEDP|match)"
Expected flow:
SPDP: Sending announcement
SPDP: Received participant 01.0f.aa.bb...
SEDP: Publishing writer info
SEDP: Received subscription info
SEDP: Match found - writer 01... <-> reader 02...
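When a capture is long, it can help to check mechanically which stage of the expected flow never appeared. The sketch below scans captured log text for the stages listed above, in order, and reports the first one that is missing. The stage strings are copied from the sample output and may not match the exact wording of a given HDDS version; `EXPECTED_STAGES` and `first_missing_stage` are hypothetical names.

```rust
/// Stage markers taken from the expected flow above (wording is an assumption).
const EXPECTED_STAGES: [&str; 5] = [
    "SPDP: Sending announcement",
    "SPDP: Received participant",
    "SEDP: Publishing writer info",
    "SEDP: Received subscription info",
    "SEDP: Match found",
];

/// Returns the first expected stage that does not appear, in order, in `log`.
/// `None` means every stage was seen in the expected sequence.
fn first_missing_stage(log: &str) -> Option<&'static str> {
    let mut lines = log.lines();
    for stage in EXPECTED_STAGES {
        // `any` consumes lines up to and including the match, so each
        // later stage must appear after the previous one.
        if !lines.any(|line| line.contains(stage)) {
            return Some(stage);
        }
    }
    None
}
```

If the result is `Some("SPDP: Received participant")`, announcements are going out but nothing is coming back, which points at multicast, firewall, or domain ID problems rather than QoS.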
Communication Debugging
Trace Data Flow
// Writer side
impl DataWriterListener for DebugListener {
fn on_publication_matched(&mut self, _w: &DataWriter<T>, status: PublicationMatchedStatus) {
println!("Matched: {} readers", status.current_count);
}
fn on_offered_deadline_missed(&mut self, _w: &DataWriter<T>, status: OfferedDeadlineMissedStatus) {
println!("DEADLINE MISSED: instance {:?}", status.last_instance_handle);
}
}
// Reader side
impl DataReaderListener for DebugListener {
fn on_data_available(&mut self, reader: &DataReader<T>) {
match reader.take() {
Ok(samples) => println!("Received {} samples", samples.len()),
Err(e) => println!("Take error: {:?}", e),
}
}
fn on_sample_lost(&mut self, _r: &DataReader<T>, status: SampleLostStatus) {
println!("SAMPLE LOST: {} total", status.total_count);
}
}
Monitor Write/Read Cycle
// Add timestamps
use std::time::Instant;
let start = Instant::now();
writer.write(&sample)?;
println!("Write took: {:?}", start.elapsed());
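// On reader side
// Note: Instant is monotonic and process-local, so this subtraction is only
// meaningful if source_timestamp is an Instant taken in the same process.
// For cross-process or cross-host latency, compare wall-clock timestamps
// and make sure the clocks are synchronized.
let samples = reader.take()?;
for (sample, info) in samples {
println!("Received: timestamp={:?}, latency={:?}",
info.source_timestamp,
Instant::now() - info.source_timestamp);
}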
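For latency across processes or hosts, wall-clock timestamps are the safer basis. The following is a minimal sketch using only `std::time::SystemTime`; it assumes the source timestamp can be converted to a `SystemTime`, which is not confirmed for the HDDS sample info type, and `one_way_latency` is a hypothetical helper.

```rust
use std::time::{Duration, SystemTime};

/// One-way latency from two wall-clock timestamps.
/// `Ok(latency)` when the receive time is at or after the source time.
/// `Err(skew)` when the source timestamp lies in the future, which usually
/// means the two clocks are not synchronized.
fn one_way_latency(source: SystemTime, received: SystemTime) -> Result<Duration, Duration> {
    received.duration_since(source).map_err(|e| e.duration())
}
```

Treat an `Err` value as a clock problem rather than a negative latency: fix time synchronization (for example NTP or PTP) before trusting any of the numbers.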
QoS Debugging
Print QoS Settings
fn print_qos(qos: &QoS) {
println!("QoS settings:");
println!(" {:?}", qos);
}
Check QoS Compatibility
// QoS compatibility is checked automatically by HDDS
// Writer reliability must be >= Reader reliability
// Writer durability must be >= Reader durability
// Check matched status to verify compatibility
println!("Writer matched {} readers", writer.matched_subscriptions().len());
println!("Reader matched {} writers", reader.matched_publications().len());
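The "offered must be at least as strong as requested" rule can be sketched with ordered enums. This is an illustration of the rule only, not the HDDS matching code: `Reliability`, `Durability`, `EndpointQos`, and `incompatibilities` are hypothetical types and names, with variants ordered from weakest to strongest as in the DDS specification.

```rust
/// Reliability kinds, weakest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Reliability {
    BestEffort,
    Reliable,
}

/// Durability kinds, weakest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Durability {
    Volatile,
    TransientLocal,
    Transient,
    Persistent,
}

#[derive(Debug, Clone, Copy)]
struct EndpointQos {
    reliability: Reliability,
    durability: Durability,
}

/// Lists the policies where the writer offers less than the reader requests.
/// An empty result means the two endpoints are compatible on these policies.
fn incompatibilities(offered: EndpointQos, requested: EndpointQos) -> Vec<&'static str> {
    let mut out = Vec::new();
    if offered.reliability < requested.reliability {
        out.push("reliability");
    }
    if offered.durability < requested.durability {
        out.push("durability");
    }
    out
}
```

A best-effort writer paired with a reliable reader yields `["reliability"]`, which is the classic reason for a matched count of zero even though discovery works.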
Memory Debugging
Track Allocations
# Using heaptrack
heaptrack ./my_app
heaptrack_gui heaptrack.my_app.*.gz
# Using valgrind
valgrind --tool=massif ./my_app
ms_print massif.out.*
Monitor Runtime Memory
// Add memory stats endpoint
fn print_memory_stats(participant: &DomainParticipant) {
let stats = participant.memory_stats();
println!("Memory usage:");
println!(" History cache: {} bytes", stats.history_cache_bytes);
println!(" Samples stored: {}", stats.samples_count);
println!(" Instances: {}", stats.instances_count);
}
Check for Leaks
# Using valgrind
valgrind --leak-check=full ./my_app
# Using AddressSanitizer (-Z flags require a nightly toolchain)
RUSTFLAGS="-Z sanitizer=address" cargo +nightly run --release
Fuzzing
HDDS is continuously fuzzed with cargo-fuzz (libFuzzer) to find parsing vulnerabilities.
Fuzz Targets
| Target | Description | Corpus | Crashes |
|---|---|---|---|
| fuzz_rtps_spdp | SPDP discovery parser | 2,054 | 0 |
| fuzz_rtps_sedp | SEDP endpoint parser | 3,939 | 0 |
| fuzz_rtps_control | RTPS control messages | 517 | 0 |
| fuzz_xml_permissions | Security XML parser | 6,383 | 0 |
Running Fuzzers
cd /projects/public/hdds
# Run single fuzzer
cargo +nightly fuzz run fuzz_rtps_spdp
# Run all fuzzers (1 hour each)
./fuzz/run_all_fuzzers.sh 3600
# Check for crashes
ls fuzz/artifacts/
Bugs Found & Fixed (v232)
Fuzzing discovered 3 integer overflow bugs in control_parser.rs and 1 bounds check issue in annotations.rs. All four were fixed with saturating_add() and proper bounds validation.
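The class of bug is easy to reproduce: an attacker-controlled length added to an offset can wrap around and pass a naive bounds check. The sketch below shows the general pattern of overflow-safe arithmetic on untrusted lengths. It is not the code from control_parser.rs or annotations.rs; `slice_payload` and `next_offset` are hypothetical helpers.

```rust
/// Returns `buf[offset..offset + len]` only if the range is valid.
/// `checked_add` rejects `offset + len` overflow instead of wrapping,
/// and `get` rejects ranges that run past the end of the buffer.
fn slice_payload(buf: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    let end = offset.checked_add(len)?;
    buf.get(offset..end)
}

/// Advances a 32-bit cursor without wrapping: on overflow the result
/// clamps to `u32::MAX`, which a later bounds check will reject.
fn next_offset(offset: u32, len: u32) -> u32 {
    offset.saturating_add(len)
}
```

`saturating_add` alone does not make a parser safe; it only guarantees that the subsequent comparison against the buffer length sees a large value instead of a small wrapped one.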
Sanitizer Testing
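HDDS is tested with Valgrind, AddressSanitizer (ASan), ThreadSanitizer (TSan), and MemorySanitizer (MSan). All included tests pass; two timing-sensitive tests are excluded under ASan, and the only TSan and MSan reports are false positives in the standard library.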
Test Results
| Tool | Result | Notes |
|---|---|---|
| Valgrind | PASS | 0 bytes definitely lost |
| ASan | PASS | 2171/2173 tests (2 timing-sensitive excluded) |
| TSan | PASS | False positives in std lib only |
| MSan | PASS | False positive in std lib (cgroups read) |
Running with Sanitizers
# AddressSanitizer - detects buffer overflows, use-after-free
# (-Z flags require a nightly toolchain)
RUSTFLAGS="-Z sanitizer=address" cargo +nightly test -p hdds
# ThreadSanitizer - detects data races
RUSTFLAGS="-Z sanitizer=thread" cargo +nightly test -p hdds
# MemorySanitizer - detects uninitialized memory reads
# Requires nightly and instrumented libc (Docker recommended)
RUSTFLAGS="-Z sanitizer=memory" cargo +nightly test -p hdds
# Valgrind - memory leak detection
cargo build --release -p hdds
valgrind --leak-check=full --show-leak-kinds=definite ./target/release/my_app
ASan Common Issues
# If ASan reports stack-buffer-overflow in tests:
# This is often due to RTPS packet parsing with malformed input.
# HDDS handles this gracefully via bounds checking.
# Exclude timing-sensitive tests (flaky under sanitizer)
cargo test -p hdds --features asan-safe -- --skip timing
TSan False Positives
ThreadSanitizer may report false positives in Rust's standard library (particularly std::sync internals). These are known issues with TSan's understanding of Rust atomics.
# Suppress known false positives
export TSAN_OPTIONS="suppressions=tsan_suppressions.txt"
MSan Setup
MSan requires an instrumented standard library. The easiest approach is using a prepared Docker image:
# Using prepared MSan image
docker run --rm -v "$(pwd)":/code hdds/msan-test cargo test -p hdds
Performance Debugging
Profile CPU Usage
# Using perf
perf record -g ./my_app
perf report
# Using flamegraph
cargo install flamegraph
cargo flamegraph --bin my_app
Measure Latency
use std::time::Instant;
use hdrhistogram::Histogram;
let mut histogram = Histogram::<u64>::new(3).unwrap();
for _ in 0..10000 {
let start = Instant::now();
// Operation to measure
writer.write(&sample)?;
let latency_us = start.elapsed().as_micros() as u64;
histogram.record(latency_us)?;
}
println!("Latency stats:");
println!(" p50: {} us", histogram.value_at_percentile(50.0));
println!(" p95: {} us", histogram.value_at_percentile(95.0));
println!(" p99: {} us", histogram.value_at_percentile(99.0));
println!(" max: {} us", histogram.max());
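For a quick check without adding a dependency, percentiles can also be computed from raw samples. This is a minimal nearest-rank sketch using only the standard library; it keeps every sample in memory and sorts on each call, so it suits short experiments rather than long runs, where a histogram is the better tool. `percentile` is a hypothetical helper.

```rust
use std::time::Duration;

/// Nearest-rank percentile over recorded latencies.
/// Sorts `samples` in place; returns `None` for an empty slice or a
/// percentile outside 0..=100.
fn percentile(samples: &mut [Duration], pct: f64) -> Option<Duration> {
    if samples.is_empty() || !(0.0..=100.0).contains(&pct) {
        return None;
    }
    samples.sort_unstable();
    // Nearest rank is ceil(pct * n / 100), 1-based. Multiplying before
    // dividing keeps whole-number percentiles exact in floating point.
    let rank = (pct * samples.len() as f64 / 100.0).ceil() as usize;
    Some(samples[rank.max(1) - 1])
}
```

Collect `start.elapsed()` into a `Vec<Duration>` inside the measurement loop, then call `percentile(&mut latencies, 99.0)` once the loop has finished so the sort does not distort the measurement.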
Measure Throughput
use std::time::Instant;
let sample_count = 100000;
let start = Instant::now();
for _ in 0..sample_count {
writer.write(&sample)?;
}
let elapsed = start.elapsed();
let throughput = sample_count as f64 / elapsed.as_secs_f64();
println!("Throughput: {:.0} samples/sec", throughput);
Network Debugging
Capture Packets
# Capture to file
tcpdump -i any -w hdds_capture.pcap 'udp and portrange 7400-7500'
# Analyze with Wireshark
wireshark hdds_capture.pcap
# Filter: rtps
Check Network Stats
# Socket buffer usage
ss -u -a -n | grep 7400
# Network errors
netstat -su
# Interface stats
ip -s link show eth0
Simulate Network Issues
# Add latency
sudo tc qdisc add dev eth0 root netem delay 10ms
# Add packet loss
sudo tc qdisc add dev eth0 root netem loss 1%
# Remove rules
sudo tc qdisc del dev eth0 root
Debug Tools
HDDS Viewer
# Monitor traffic
hdds-viewer capture --interface eth0 --domain 0
# Show discovered entities
hdds-viewer show participants
hdds-viewer show topics
hdds-viewer show endpoints
Built-in Diagnostics
// Enable internal diagnostics
let config = DomainParticipantConfig::default()
.enable_diagnostics(true)
.diagnostics_topic("hdds/diagnostics");
// Subscribe to diagnostics
let diag_reader = subscriber.create_datareader::<DiagnosticsData>(
participant.find_topic("hdds/diagnostics")?
)?;
Debug Assertions
// Enable debug assertions in release
// Cargo.toml:
// [profile.release]
// debug-assertions = true
debug_assert!(writer.publication_matched_status()?.current_count > 0,
"No readers matched!");
Common Debug Patterns
Minimal Reproducer
// Simplified test case
use hdds::{Participant, QoS, DDS, TransportMode};
fn main() -> Result<(), hdds::Error> {
// Minimal setup
let participant = Participant::builder("test")
.domain_id(0)
.with_transport(TransportMode::UdpMulticast)
.build()?;
// Single writer
let topic = participant.topic::<TestData>("TestTopic")?;
let writer = topic.writer().qos(QoS::reliable()).build()?;
// Write test data
let sample = TestData { id: 1, value: 42.0 };
writer.write(&sample)?;
println!("Write succeeded");
Ok(())
}
Binary Search Debug
When an issue appears in complex code:
- Add logging at the midpoint
- If the issue occurs before the midpoint, search the first half
- If the issue occurs after the midpoint, search the second half
- Repeat until the issue is isolated
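The same procedure can be automated when the code path is a sequence of numbered steps and a check can tell whether the state is still good after a given step. The sketch below is a plain binary search under that assumption: once a step fails, every later step is assumed to fail too. `first_failing_step` and the `passes` callback are hypothetical names.

```rust
/// Finds the first step at which `passes` returns false.
/// Assumes steps `0..first` pass and every step from `first` onward fails.
/// Returns `None` when all steps pass (or there are no steps).
fn first_failing_step(steps: usize, mut passes: impl FnMut(usize) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, steps);
    while lo < hi {
        // Probe the midpoint of the remaining range.
        let mid = lo + (hi - lo) / 2;
        if passes(mid) {
            // Still good here: the fault is in the second half.
            lo = mid + 1;
        } else {
            // Already broken here: the fault is at `mid` or earlier.
            hi = mid;
        }
    }
    (lo < steps).then_some(lo)
}
```

For 1,000 steps this needs at most 10 probes, which matters when each probe means rebuilding and rerunning the application.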
Comparison Debug
// Compare working vs broken configuration
let working_qos = QoS::reliable();
let broken_qos = QoS::best_effort();
// Test both and compare behavior
Debug Checklist
- Enable logging: export RUST_LOG=hdds=debug
- Check discovery: Are participants/endpoints matched?
- Verify QoS: Are writer/reader QoS compatible?
- Check network: Can hosts reach each other?
- Monitor resources: Memory, CPU, file descriptors
- Capture traffic: Use tcpdump/Wireshark
- Isolate issue: Create minimal reproducer
- Check versions: Are all components on the same version?
Next Steps
- Common Issues - Known issues and fixes
- Performance Issues - Performance debugging
- Error Codes - Error reference