OpenTelemetry (OTLP)

Export HDDS traces and metrics to any OTLP-compatible backend via gRPC.

Overview

The hdds-telemetry-otlp crate bridges HDDS's tracing instrumentation to OpenTelemetry OTLP exporters. This enables distributed tracing and metrics collection via gRPC (tonic) to backends like Jaeger, Grafana Tempo, or any OTLP-compatible collector.

What it provides:

  • Automatic export of tracing::info_span! calls as OpenTelemetry spans
  • DDS-specific metric instruments (messages sent/received, write latency, discovery events)
  • Clean shutdown via RAII guard pattern

Quick Start

Add the dependency

[dependencies]
hdds-telemetry-otlp = { version = "0.1" }

Initialize

use hdds_telemetry_otlp::{OtlpConfig, init_tracing};

fn main() {
    let config = OtlpConfig::default();
    let _guard = init_tracing(config).expect("Failed to init OTLP tracing");

    // All tracing::info_span! / tracing::info! calls are now exported
    // as OpenTelemetry spans to the configured OTLP endpoint.

    // _guard must be held alive for the duration of the application.
    // Dropping it triggers a clean shutdown of the pipeline.
}

Configuration

OtlpConfig

use hdds_telemetry_otlp::OtlpConfig;

let config = OtlpConfig {
    endpoint: "http://localhost:4317".to_string(),
    service_name: "my-dds-app".to_string(),
    export_traces: true,
    export_metrics: true,
    batch_timeout_ms: 5000,
};

Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| endpoint | http://localhost:4317 | OTLP collector endpoint (gRPC) |
| service_name | "hdds" | Service name reported to the collector |
| export_traces | true | Export spans via OTLP |
| export_metrics | true | Export metrics via OTLP |
| batch_timeout_ms | 5000 | Batch export timeout in milliseconds |
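
Since OtlpConfig implements Default (as used in Quick Start) and all fields shown above are public, struct update syntax lets you override only the fields you need. A minimal sketch:

use hdds_telemetry_otlp::OtlpConfig;

// Override only the service name; the remaining fields keep their defaults.
let config = OtlpConfig {
    service_name: "my-dds-app".to_string(),
    ..OtlpConfig::default()
};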

Tracing

When export_traces is enabled, init_tracing() sets up:

  1. An OTLP SpanExporter via gRPC (tonic) pointed at config.endpoint
  2. A SdkTracerProvider with batch span processing
  3. A tracing_opentelemetry::OpenTelemetryLayer wired into tracing_subscriber::Registry
  4. An EnvFilter (defaults to info, configurable via RUST_LOG)

All tracing spans and events in your application (and in HDDS internals) are automatically exported as OpenTelemetry spans.

// These spans appear in your OTLP backend
let _span = tracing::info_span!("dds.write", topic = "SensorData", seq = 42).entered();
tracing::info!("Writing sample to topic SensorData");

Metrics

Pre-Registered Instruments

When export_metrics is enabled, the following DDS instruments are pre-registered:

| Instrument | Type | Description |
|------------|------|-------------|
| dds.messages.sent | Counter (u64) | Total DDS messages sent |
| dds.messages.received | Counter (u64) | Total DDS messages received |
| dds.discovery.participants | Counter (u64) | DDS discovery participant events |
| dds.latency.write_ns | Histogram (u64) | DDS write latency in nanoseconds |
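
These names are registered on the crate's meter. As a purely illustrative sketch (not the crate's actual code), equivalent instruments could be created by hand with the opentelemetry meter API; the final builder call (.build()) matches recent opentelemetry releases and may differ in older ones.

use opentelemetry::global;
use opentelemetry::KeyValue;

// Illustrative only: hand-built equivalents of the pre-registered instruments.
let meter = global::meter("hdds");

let messages_sent = meter
    .u64_counter("dds.messages.sent")
    .with_description("Total DDS messages sent")
    .build();

let write_latency = meter
    .u64_histogram("dds.latency.write_ns")
    .with_description("DDS write latency in nanoseconds")
    .build();

// Attributes (here: the topic name) are attached per recording.
messages_sent.add(1, &[KeyValue::new("topic", "SensorData")]);
write_latency.record(1_200, &[KeyValue::new("topic", "SensorData")]);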

HddsMetrics

The HddsMetrics struct provides a convenience wrapper around the pre-registered instruments.

use hdds_telemetry_otlp::metrics::HddsMetrics;

let metrics = HddsMetrics::new();

// Record a write with latency
metrics.record_write(1_200); // 1200 ns write latency
// Increments dds.messages.sent and records dds.latency.write_ns

// Record a read
metrics.record_read();
// Increments dds.messages.received

// Record a discovery event
metrics.record_discovery_event("participant_added");
// Increments dds.discovery.participants with event_type attribute

Custom Meter

You can create HddsMetrics from an explicit Meter if needed:

use hdds_telemetry_otlp::metrics::HddsMetrics;
use opentelemetry::global;

let meter = global::meter("my-custom-meter");
let metrics = HddsMetrics::from_meter(&meter);

OtlpGuard

The init_tracing() function returns an OtlpGuard that must be held alive for the duration of the application. When dropped, it:

  1. Flushes any remaining spans
  2. Shuts down the SdkTracerProvider
  3. Shuts down the SdkMeterProvider

use hdds_telemetry_otlp::{OtlpConfig, init_tracing};

fn main() {
    // Hold the guard for the entire application lifetime
    let _guard = init_tracing(OtlpConfig::default())
        .expect("Failed to init OTLP");

    run_application();

    // Give batch exporter time to flush before shutdown
    std::thread::sleep(std::time::Duration::from_secs(2));

    // _guard drops here, triggering clean shutdown
}

Guard Lifetime

If the OtlpGuard is dropped too early, spans and metrics may be lost. Hold it in your main() function or equivalent application entry point.
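
For illustration, a sketch of the pitfall: binding the guard inside an inner scope drops it before the application is done, so later telemetry is silently discarded.

use hdds_telemetry_otlp::{OtlpConfig, init_tracing};

fn main() {
    {
        // Anti-pattern: the guard only lives until the end of this block.
        let _guard = init_tracing(OtlpConfig::default()).expect("Failed to init OTLP");
    } // guard dropped here: the export pipeline shuts down

    // Spans and metrics recorded from this point on are no longer exported.
    tracing::info!("This event will not reach the OTLP backend");
}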

Complete Example

use hdds_telemetry_otlp::{OtlpConfig, init_tracing};
use hdds_telemetry_otlp::metrics::HddsMetrics;

fn main() {
    // 1. Configure and initialize OTLP export
    let config = OtlpConfig {
        endpoint: "http://localhost:4317".to_string(),
        service_name: "hdds-example".to_string(),
        export_traces: true,
        export_metrics: true,
        batch_timeout_ms: 2000,
    };

    let _guard = init_tracing(config).expect("Failed to init OTLP tracing");

    // 2. Create metric instruments
    let metrics = HddsMetrics::new();

    // 3. Simulate DDS activity with tracing spans
    for i in 0..5 {
        {
            let _span = tracing::info_span!(
                "dds.write", topic = "SensorData", seq = i
            ).entered();
            tracing::info!("Writing sample {} to topic SensorData", i);

            let latency_ns = 10_000_000 + (i as u64 * 500_000);
            metrics.record_write(latency_ns);
        }

        {
            let _span = tracing::info_span!(
                "dds.read", topic = "SensorData", seq = i
            ).entered();
            tracing::info!("Reading sample {} from topic SensorData", i);
            metrics.record_read();
        }
    }

    // 4. Discovery event
    {
        let _span = tracing::info_span!("dds.discovery").entered();
        tracing::info!("New participant discovered");
        metrics.record_discovery_event("participant_added");
    }

    // 5. Give the batch exporter a moment to flush
    std::thread::sleep(std::time::Duration::from_secs(3));

    // 6. OtlpGuard is dropped here, triggering clean shutdown
    println!("Shutting down OTLP pipeline...");
}

Backend Setup

Jaeger (with OTLP receiver)

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest

Access the UI at http://localhost:16686.

Grafana Tempo

# tempo.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317

OpenTelemetry Collector

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      exporters: [logging]

Error Handling

The Error enum covers initialization failures:

| Variant | Description |
|---------|-------------|
| Trace | OpenTelemetry trace subsystem error |
| Metrics | OpenTelemetry metrics subsystem error |
| ExporterBuild | OTLP exporter build error (e.g., invalid endpoint) |
| SetSubscriber | Failed to set global tracing subscriber |

Subscriber Conflict

init_tracing() calls tracing_subscriber::registry().try_init(), which will fail if a global subscriber is already set. Only call it once per process.
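
If panicking on a failed initialization is not acceptable, the error can be handled and telemetry simply skipped. A minimal sketch, assuming only what is shown above: init_tracing() returns a Result whose error type implements Debug (implied by the .expect() calls).

use hdds_telemetry_otlp::{OtlpConfig, init_tracing};

fn main() {
    // Fall back to running without telemetry instead of panicking,
    // e.g. when another global tracing subscriber is already installed.
    let _guard = match init_tracing(OtlpConfig::default()) {
        Ok(guard) => Some(guard),
        Err(e) => {
            eprintln!("OTLP telemetry disabled: {e:?}");
            None
        }
    };

    // ... application logic runs here, with or without telemetry ...
}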

Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| RUST_LOG | Controls tracing log level filter | RUST_LOG=debug |
| OTEL_EXPORTER_OTLP_ENDPOINT | Override OTLP endpoint (standard OTEL env) | http://collector:4317 |
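
Whether the crate reads OTEL_EXPORTER_OTLP_ENDPOINT on its own is not guaranteed here; if it does not, a sketch like the following applies the variable manually, falling back to the default endpoint:

use hdds_telemetry_otlp::{OtlpConfig, init_tracing};

fn main() {
    // Honor the standard OTEL endpoint variable if set, otherwise keep the default.
    let endpoint = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT")
        .unwrap_or_else(|_| "http://localhost:4317".to_string());

    let config = OtlpConfig {
        endpoint,
        ..OtlpConfig::default()
    };

    let _guard = init_tracing(config).expect("Failed to init OTLP tracing");
}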

Limitations

| Limitation | Description |
|------------|-------------|
| gRPC only | OTLP export uses tonic (gRPC); no HTTP/JSON support |
| Single init | init_tracing() can only be called once per process |
| No auto-instrumentation | HDDS spans must use tracing macros manually |
| No C FFI | OTLP telemetry is not exposed in the C API |

Next Steps