DDS Troubleshooting

Common DDS (Data Distribution Service) problems and how to identify them with JitterTrap. DDS uses the RTPS wire protocol over UDP, making packet timing analysis valuable for diagnosing QoS violations and network issues.

Discovery Failures

Symptoms: DataReaders and DataWriters don't match. Participants can't see each other. The application works on localhost but fails across the network. New participants take a long time to discover existing ones.

What It Looks Like in JitterTrap

In Top Talkers:

  • Look for multicast traffic to 239.255.0.1 (default SPDP multicast)
  • For domain 0, SPDP multicast uses UDP port 7400 and unicast discovery uses 7410 + 2 × participant ID (RTPS default port mapping)
  • A missing multicast flow indicates discovery traffic is not reaching the interface
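The port numbers above follow the default RTPS port mapping for UDP/IPv4 (RTPS §9.6): each domain shifts the port range by 250, and each participant on a host gets its own unicast ports. The sketch below computes those defaults so that ports seen in Top Talkers can be matched to a domain and participant. It assumes the implementation has not overridden the mapping constants; the participant ID is whatever per-host index the implementation assigns.

```python
from dataclasses import dataclass

# Default constants of the RTPS UDP/IPv4 port mapping (RTPS §9.6).
# Implementations let users override these, so treat them as defaults only.
PB = 7400                      # port base
DG = 250                       # domain ID gain
PG = 2                         # participant ID gain
D0, D1, D2, D3 = 0, 10, 1, 11  # per-traffic-type offsets

DEFAULT_SPDP_MULTICAST = "239.255.0.1"


@dataclass(frozen=True)
class RtpsPorts:
    discovery_multicast: int  # SPDP announcements to the multicast group
    discovery_unicast: int    # discovery (metatraffic) sent to this participant
    user_multicast: int       # user data sent via multicast
    user_unicast: int         # user data sent to this participant


def rtps_ports(domain_id: int, participant_id: int = 0) -> RtpsPorts:
    """Return the default RTPS UDP ports for a domain/participant pair."""
    base = PB + DG * domain_id
    return RtpsPorts(
        discovery_multicast=base + D0,
        discovery_unicast=base + D1 + PG * participant_id,
        user_multicast=base + D2,
        user_unicast=base + D3 + PG * participant_id,
    )
```

For example, rtps_ports(0) gives 7400/7410/7401/7411 and rtps_ports(1, 2) gives 7650/7664/7651/7665, so a flow to port 7664 most likely belongs to participant ID 2 in domain 1.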

In Packet Gap chart:

  • SPDP announcements should be periodic (the period is implementation-specific and configurable; check your vendor's default)
  • Irregular or missing announcements indicate discovery problems

How to Diagnose

  1. Filter for the discovery multicast group and port (239.255.0.1:7400 for domain 0)
  2. Verify periodic SPDP traffic from all expected participants
  3. Check if traffic is unidirectional (participant sending but not receiving)
  4. Look for asymmetric packet counts between participants
  5. Set a trap to capture when expected discovery traffic stops
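Steps 2 and 5 amount to checking, for each participant, that announcements keep arriving at roughly the expected period. Here is a minimal offline sketch. It assumes the SPDP packets have been exported from a capture as hypothetical (timestamp_s, source_ip) pairs. It also assumes that a gap longer than twice the announcement period counts as a problem; the tolerance factor is an assumption, not a DDS rule.

```python
from collections import defaultdict


def check_spdp(packets, expected_sources, period_s, now_s, tolerance=2.0):
    """Classify each expected participant's SPDP announcements.

    packets: iterable of (timestamp_s, source_ip) sent to the SPDP group
    period_s: expected announcement period (implementation-specific)
    tolerance: multiple of period_s treated as too long (an assumption)
    Returns {source_ip: "ok" | "irregular" | "stopped" | "missing"}.
    """
    times = defaultdict(list)
    for ts, src in packets:
        times[src].append(ts)

    limit = period_s * tolerance
    report = {}
    for src in expected_sources:
        ts = sorted(times.get(src, []))
        if not ts:
            report[src] = "missing"      # never heard on this interface
        elif now_s - ts[-1] > limit:
            report[src] = "stopped"      # was announcing, has gone quiet
        elif any(b - a > limit for a, b in zip(ts, ts[1:])):
            report[src] = "irregular"    # announcements arrived with long gaps
        else:
            report[src] = "ok"
    return report
```

A participant reported as "missing" never announced on this interface, which points to the multicast or firewall causes below. A "stopped" participant matches the trap condition in step 5.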

Causes:

  • Multicast not enabled or routed on network
  • Firewall blocking UDP ports (7400+ for domain 0)
  • IGMP snooping misconfigured on switches
  • Participants on different DDS domains
  • Network interface binding issues (wrong NIC selected)

Solutions:

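  • Verify multicast reachability between hosts: ping 239.255.0.1 is a quick check, but many hosts ignore multicast echo requests, so a multicast send/receive test is more reliable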
  • Check firewall rules for DDS port ranges
  • Configure IGMP snooping with an active IGMP querier on the network, or disable snooping
  • Verify all participants use the same domain ID
  • Explicitly configure network interface in DDS config
  • Use unicast discovery peers for non-multicast networks

References: RTPS §8.5 (Discovery Protocol), DDS §7.1 (Domain)

Deadline Violations

Symptoms: DataReader reports deadline missed. Application logic expecting periodic data fails. Control loops become unstable due to missing updates.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Gaps exceeding the configured deadline QoS
  • Example: Deadline of 100ms but packet gaps of 150ms+

In Top Talkers:

  • Throughput dips or gaps correlating with deadline violations
  • Compare expected publication rate to actual packet rate

How to Diagnose

  1. Calculate expected packet interval from publication rate
  2. Compare to Packet Gap chart—gaps > deadline = violation
  3. Determine if gaps are at source (publisher) or network-induced
  4. Check if gaps correlate with network congestion (other flows spiking)
  5. Set a trap on packet gaps exceeding the deadline threshold
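The same comparison can be done offline on exported packet timestamps for a single topic's flow. This sketch assumes timestamps in seconds and one packet per sample, which does not hold when samples are fragmented or batched. It reports every gap longer than the deadline, and also whether the deadline is longer than the publication period at all (step 1).

```python
def deadline_report(timestamps_s, deadline_s, publish_hz):
    """Compare inter-packet gaps for one flow against a DEADLINE period.

    Assumes one packet per sample; fragmentation or batching breaks this.
    """
    expected_interval_s = 1.0 / publish_hz
    ts = sorted(timestamps_s)
    pairs = list(zip(ts, ts[1:]))
    gaps = [b - a for a, b in pairs]
    return {
        "expected_interval_s": expected_interval_s,
        # A deadline at or below the publication period cannot be met reliably.
        "deadline_achievable": deadline_s > expected_interval_s,
        "max_gap_s": max(gaps, default=0.0),
        # (gap start time, gap length) for every gap longer than the deadline
        "violations": [(a, b - a) for a, b in pairs if b - a > deadline_s],
    }
```

If violations appear while deadline_achievable is True, check whether they line up with spikes in other flows (network-induced) or with quiet periods on the publisher host (source-side).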

Causes:

  • Publisher not meeting publication rate (CPU, blocking calls)
  • Network congestion delaying packets
  • Publisher suspended or garbage collecting (Java/managed languages)
  • Incorrect deadline QoS configuration
  • System clock issues affecting timing

Solutions:

  • Profile publisher for blocking operations or CPU starvation
  • Ensure QoS deadline is achievable given network conditions
  • Add margin to the deadline (e.g., when publishing at 100 Hz, a 10 ms period, set the deadline comfortably above 10 ms)
  • Use real-time OS features if strict timing required
  • Enable QoS monitoring in DDS implementation

References: DDS §2.2.3 (Deadline QoS)

Latency Budget Exceeded

Symptoms: End-to-end latency higher than application requirements. Data arrives too late to be useful. Time-sensitive applications miss processing windows.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Jitter (min/max spread) indicates variable network latency
  • Consistent high gaps suggest systematic delay

In Top Talkers (if the DDS implementation uses a TCP transport):

  • TCP RTT shows network round-trip contribution
  • High RTT directly impacts DDS latency

How to Diagnose

  1. Measure one-way network latency between participants
  2. JitterTrap shows network contribution; add serialization/processing time
  3. Total latency = serialization + network + deserialization + processing
  4. Compare to LATENCY_BUDGET QoS setting
  5. Identify largest contributor to total latency
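Steps 3 to 5 are simple arithmetic once each component has been measured. The sketch below uses placeholder numbers; the component names and values are illustrative, not measurements. DDS treats LATENCY_BUDGET as a hint to the middleware rather than an enforced limit, so exceeding it does not by itself raise an error.

```python
from dataclasses import dataclass


@dataclass
class LatencyReport:
    total_ms: float
    within_budget: bool
    largest: str   # component contributing the most delay
    shares: dict   # component -> fraction of the total


def latency_breakdown(components_ms: dict, budget_ms: float) -> LatencyReport:
    """Sum per-stage latencies and compare the total against a budget."""
    total = sum(components_ms.values())
    largest = max(components_ms, key=components_ms.get)
    shares = {name: value / total for name, value in components_ms.items()} if total else {}
    return LatencyReport(total, total <= budget_ms, largest, shares)


# Illustrative numbers only; the "network" term is what JitterTrap observes.
example = latency_breakdown(
    {"serialization": 0.5, "network": 3.0, "deserialization": 0.5, "processing": 2.0},
    budget_ms=5.0,
)
```

Here the total is 6.0 ms against a 5.0 ms budget, and the network accounts for half of it. That points to the path and queuing fixes below rather than to type simplification.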

Causes:

  • Network path latency (distance, hops, queuing)
  • Serialization overhead for complex types
  • DDS implementation buffering
  • Competing traffic causing queuing delays
  • Large messages taking longer to transmit

Solutions:

  • Reduce network hops where possible
  • Use simpler data types to reduce serialization time
  • Configure DDS transport for low latency (disable batching)
  • Use TRANSPORT_PRIORITY QoS for time-critical topics (its effect on the network is implementation-specific)
  • Consider dedicated network for real-time traffic

References: DDS §2.2.3 (Latency Budget QoS)

Reliability Retransmissions

Symptoms: RELIABLE QoS but samples occasionally lost. Throughput lower than expected. Bursty traffic patterns with gaps followed by catch-up.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Irregular patterns: bursts of packets, then gaps, then more bursts
  • Retransmissions appear as "extra" packets after gaps

In Top Talkers:

  • Throughput shows sawtooth pattern (similar to TCP congestion)
  • Higher packet rate than expected publication rate (due to repairs)

How to Diagnose

  1. Compare actual packet rate to expected publication rate
  2. Higher actual rate indicates retransmissions occurring
  3. Look for ACKNACK/HEARTBEAT patterns in a packet capture
  4. Check if retransmissions correlate with network loss events
  5. Capture traffic during suspected loss for Wireshark analysis
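For step 3, the RTPS wire format makes it easy to count submessage kinds without a full dissector. Each message starts with a 20-byte header beginning with "RTPS". The header is followed by submessages, each with a 1-byte ID, a flags byte whose lowest bit selects little-endian encoding, and a 2-byte length. The sketch below counts submessages in one UDP payload and reports unknown IDs in hex. Extracting payloads from a pcap is not shown. Counting alone cannot tell a repair DATA from the original, so confirm with sequence numbers in Wireshark when the ratio looks suspicious.

```python
import struct
from collections import Counter

SUBMESSAGE_NAMES = {
    0x01: "PAD", 0x06: "ACKNACK", 0x07: "HEARTBEAT", 0x08: "GAP",
    0x09: "INFO_TS", 0x0C: "INFO_SRC", 0x0D: "INFO_REPLY_IP4",
    0x0E: "INFO_DST", 0x0F: "INFO_REPLY", 0x12: "NACK_FRAG",
    0x13: "HEARTBEAT_FRAG", 0x15: "DATA", 0x16: "DATA_FRAG",
}

RTPS_HEADER_LEN = 20  # "RTPS" + version (2) + vendorId (2) + guidPrefix (12)


def count_submessages(payload: bytes) -> Counter:
    """Count submessage kinds in one RTPS message (one UDP payload)."""
    counts = Counter()
    if len(payload) < RTPS_HEADER_LEN or payload[:4] != b"RTPS":
        return counts  # not an RTPS message
    offset = RTPS_HEADER_LEN
    while offset + 4 <= len(payload):
        sub_id, flags = payload[offset], payload[offset + 1]
        # Flag bit 0 (E) selects little-endian for this submessage's fields.
        fmt = "<H" if flags & 0x01 else ">H"
        (length,) = struct.unpack_from(fmt, payload, offset + 2)
        counts[SUBMESSAGE_NAMES.get(sub_id, f"0x{sub_id:02x}")] += 1
        if length == 0:
            # For PAD and INFO_TS a zero length means an empty body; for any
            # other kind the submessage runs to the end of the message.
            if sub_id in (0x01, 0x09):
                offset += 4
                continue
            break
        offset += 4 + length
    return counts
```

A rising share of ACKNACK and HEARTBEAT submessages relative to DATA, together with a packet rate above the publication rate, is consistent with the repair traffic described above.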

Causes:

  • Network packet loss triggering RTPS repair mechanism
  • Writer history cache too small (old samples purged before ACK)
  • Reader too slow (can't keep up, NACKs repeatedly)
  • Network reordering interpreted as loss

Solutions:

  • Address underlying packet loss (see network troubleshooting)
  • Increase writer history depth for RELIABLE topics
  • Tune HEARTBEAT frequency for faster loss detection
  • Consider BEST_EFFORT for high-rate, loss-tolerant data
  • Increase reader receive buffer size

References: RTPS §8.4 (Reliability), DDS §2.2.3 (Reliability QoS)

Large Data Fragmentation

Symptoms: Large messages fail to deliver reliably. Small messages work fine but large ones don't. Sporadic failures with large data types.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Bursts of closely-spaced packets (fragments of one message)
  • Loss of any fragment loses the entire message unless RELIABLE repair recovers it

In Top Talkers:

  • High packet rate relative to message rate (fragmentation overhead)
  • Throughput drops when large messages sent

How to Diagnose

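  1. Compare message size to the path MTU (typically 1500 bytes on Ethernet)
  2. Messages larger than roughly 1400 bytes of payload are split into multiple packets, either by RTPS (DATA_FRAG) or by IP fragmentation, depending on the implementation's maximum message size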
  3. Look for burst patterns in packet timing (fragments sent together)
  4. Check if failures correlate with message size
  5. Capture large message transmission for fragment analysis
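Steps 1 and 2 can be estimated numerically. The sketch below assumes a fixed per-packet overhead covering IPv4, UDP, the RTPS header and the DATA_FRAG submessage header. Vendors often add INFO_TS or INFO_DST submessages, so the default is a rough assumption. The delivery estimate also assumes independent packet loss. It shows why BEST_EFFORT delivery of large samples degrades as the fragment count grows.

```python
def fragments_needed(message_bytes: int, mtu: int = 1500, overhead_bytes: int = 84) -> int:
    """Packets needed to carry one sample when every packet fits the MTU.

    overhead_bytes approximates IPv4 (20) + UDP (8) + RTPS header (20)
    + DATA_FRAG submessage header (36); real implementations may add more.
    """
    per_packet = mtu - overhead_bytes
    if per_packet <= 0:
        raise ValueError("MTU too small for the assumed overhead")
    return max(1, -(-message_bytes // per_packet))  # ceiling division


def delivery_probability(loss_rate: float, fragments: int) -> float:
    """Chance that every fragment arrives, assuming independent loss and no repair."""
    return (1.0 - loss_rate) ** fragments
```

With the default overhead, a 65,536-byte sample needs 47 packets. At 0.1% independent loss, such a sample arrives intact only about 95% of the time without repair, while a single-packet sample arrives 99.9% of the time.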

Causes:

  • Large messages fragmented into many UDP packets
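  • With IP fragmentation, losing a single fragment discards the whole datagram, so the entire message must be resent (RTPS NACK_FRAG lets RELIABLE writers resend only the missing DATA_FRAG fragments)
  • The network is more likely to drop packets during fragment bursts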
  • Receiver reassembly buffer overflow

Solutions:

  • Reduce message size if possible (split into multiple samples)
  • Increase MTU if network supports jumbo frames
  • Prefer DDS-level fragmentation (RTPS DATA_FRAG) over IP fragmentation, and enable FEC where the implementation offers it
  • Use TCP transport for very large messages
  • Tune fragment reassembly timeout and buffer

References: RTPS §8.3.7 (Fragmentation)

Multicast Issues

Symptoms: Some participants receive data, others don't. Discovery works but data doesn't flow. Adding participants causes traffic to multiply unexpectedly.

What It Looks Like in JitterTrap

In Top Talkers:

  • Multicast traffic visible on some interfaces but not others
  • Unexpected unicast traffic (fallback from failed multicast)

In Throughput chart:

  • Asymmetric traffic patterns between participants
  • Traffic multiplication when multicast fails (unicast to each receiver)

How to Diagnose

  1. Verify multicast group membership on each interface
  2. Check if IGMP joins are being sent and acknowledged
  3. Compare traffic on sender vs receiver interfaces
  4. Look for unicast fallback traffic (indicates multicast failure)
  5. Test with iperf (version 2; iperf3 does not support multicast) or a similar multicast test tool
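You can check steps 1 and 2 from the receiving host by joining the group yourself. A socket that sets IP_ADD_MEMBERSHIP causes the kernel to send an IGMP membership report on the chosen interface. SPDP announcements arriving on that socket then suggest that multicast is being forwarded to the interface. The sketch below is minimal and uses Python's standard socket module. It assumes the default discovery group and the domain 0 port. iface_ip is the local address of the NIC under test; 0.0.0.0 lets the OS choose, which is exactly the wrong-NIC risk listed under Discovery Failures. Binding port 7400 alongside a running DDS application relies on SO_REUSEADDR, which behaves differently across operating systems.

```python
import socket
import struct


def make_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: multicast group address followed by local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def open_multicast_listener(group="239.255.0.1", port=7400, iface_ip="0.0.0.0"):
    """Bind a UDP socket and join the group (this triggers an IGMP report)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group, iface_ip))
    return sock


def wait_for_packets(sock, count=5, timeout_s=10.0):
    """Collect up to count (sender, size) pairs; stop early on timeout."""
    sock.settimeout(timeout_s)
    seen = []
    try:
        for _ in range(count):
            data, addr = sock.recvfrom(65535)
            seen.append((addr, len(data)))
    except socket.timeout:
        pass
    return seen
```

To cover step 3, run wait_for_packets(open_multicast_listener(iface_ip=...)) on each receiver with that NIC's real address and compare the results with the sender side. An empty list while the sender's Top Talkers shows the flow points to IGMP snooping, VLAN forwarding, or a firewall.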

Causes:

  • IGMP snooping dropping multicast before receivers join
  • Switches not forwarding multicast between VLANs
  • Receiver firewall blocking multicast
  • TTL too low for multicast to cross routers
  • Network interface not joined to multicast group

Solutions:

  • Configure IGMP querier on network
  • Enable multicast routing between subnets if needed
  • Verify firewall allows multicast UDP
  • Increase multicast TTL in DDS configuration
  • Use unicast for small deployments or hostile networks

References: RTPS §9.6 (UDP/IP Mapping), RFC 3376 (IGMPv3)

Liveliness Failures

Symptoms: Participants marked as "not alive" unexpectedly. Connections drop and reconnect. Applications see participant leave/join events during normal operation.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Gaps exceeding liveliness lease duration
  • Periodic liveliness assertions should be visible

In Top Talkers:

  • Participant's traffic stops, then resumes
  • Discovery traffic may continue while data traffic stops

How to Diagnose

  1. Check configured LIVELINESS lease duration
  2. Look for gaps in traffic exceeding lease duration
  3. Determine if gaps are network-induced or source-side
  4. Verify liveliness assertion mechanism (automatic vs manual)
  5. Correlate with system events (GC pauses, CPU spikes)
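Step 2 can be made precise by replaying the lease timer against captured timestamps. The sketch below models the reader side: each packet that asserts liveliness renews the lease, and the writer counts as not alive from lease expiry until the next packet. The timestamps must come from traffic that actually asserts liveliness for the writer in question (data or heartbeats, not just discovery, as the Top Talkers note above warns). The 3× margin in the suggestion helper is an assumption, not a DDS rule.

```python
def not_alive_windows(timestamps_s, lease_s):
    """Return (lost_at, regained_at) windows in which the lease would have expired."""
    ts = sorted(timestamps_s)
    windows = []
    for prev, nxt in zip(ts, ts[1:]):
        expiry = prev + lease_s  # lease runs out this long after the last assertion
        if nxt > expiry:
            windows.append((expiry, nxt))
    return windows


def suggested_lease(timestamps_s, margin=3.0):
    """Lease duration that covers the worst observed gap with a safety margin.

    The default margin is an illustrative assumption, not a DDS rule.
    """
    ts = sorted(timestamps_s)
    worst_gap = max((b - a for a, b in zip(ts, ts[1:])), default=0.0)
    return worst_gap * margin
```

If the windows line up with GC pauses or CPU spikes on the publisher, the gaps are source-side. If the publisher's own capture shows steady sending, look at the network path.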

Causes:

  • Lease duration too short for network conditions
  • Publisher stalled (GC, blocking I/O, CPU starvation)
  • Network partition isolating participant
  • Asymmetric routing (participant can send but not receive)
  • Clock skew between participants

Solutions:

  • Increase lease duration with appropriate margin
  • Use AUTOMATIC liveliness (DDS sends keepalives)
  • Avoid blocking operations in publishing thread
  • Monitor system resources (CPU, memory)
  • Ensure symmetric network connectivity

References: DDS §2.2.3 (Liveliness QoS)

General Diagnostic Workflow

For any DDS/RTPS issue:

  1. Verify discovery — Can all participants see each other?

    • Check for SPDP multicast traffic (239.255.0.1)
    • Verify matching domain IDs
    • Confirm network allows multicast or configure unicast peers
  2. Check QoS compatibility — Mismatched QoS prevents matching:

    • Reliability: RELIABLE writer required for RELIABLE reader
    • Durability: Writer must meet or exceed reader's durability
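    • Deadline: writer's offered period must be less than or equal to the reader's requested period
    • Latency budget: writer's offered duration must be less than or equal to the reader's
    • Liveliness: writer's kind must be at least as strong as the reader's (AUTOMATIC < MANUAL_BY_PARTICIPANT < MANUAL_BY_TOPIC) and its lease duration no longer than the reader's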
  3. Examine network timing — DDS QoS depends on network behavior:

    • Deadline requires consistent packet delivery
    • Latency budget requires bounded network delay
    • Liveliness requires periodic traffic to arrive
  4. Look for packet loss — RTPS reliability helps but has limits:

    • Check packet gap for missing intervals
    • Capture traffic to count sequence numbers
    • High retransmission rate indicates network issues
  5. Monitor multicast — Many DDS issues stem from multicast:

    • Discovery (SPDP/SEDP) uses multicast
    • Data topics may use multicast
    • Verify IGMP, switch config, firewall rules
  6. Capture for analysis — Set traps to capture:

    • When packet gap exceeds deadline
    • When expected periodic traffic stops
    • During participant join/leave events

    Use Wireshark with its RTPS dissector for detailed analysis.
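The matching rules in step 2 follow the DDS request/offered (RxO) model: a writer's offered QoS must be at least as strong as the reader's requested QoS. The sketch below encodes the five policies listed in step 2. DDS defines more RxO policies, such as OWNERSHIP and DESTINATION_ORDER, which are omitted here. The Qos field defaults are chosen for brevity and are not the DDS defaults. Durations are plain seconds, with infinity standing in for DURATION_INFINITE.

```python
from dataclasses import dataclass
from enum import IntEnum


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1
    TRANSIENT = 2
    PERSISTENT = 3


class LivelinessKind(IntEnum):
    AUTOMATIC = 0
    MANUAL_BY_PARTICIPANT = 1
    MANUAL_BY_TOPIC = 2


INFINITE = float("inf")


@dataclass
class Qos:
    # Defaults are for brevity in this sketch, not the DDS defaults.
    reliability: Reliability = Reliability.BEST_EFFORT
    durability: Durability = Durability.VOLATILE
    deadline_s: float = INFINITE
    latency_budget_s: float = 0.0
    liveliness: LivelinessKind = LivelinessKind.AUTOMATIC
    lease_s: float = INFINITE


def incompatibilities(offered: Qos, requested: Qos) -> list[str]:
    """List the RxO rules that a writer's offered QoS fails for a reader."""
    problems = []
    if offered.reliability < requested.reliability:
        problems.append("RELIABILITY")
    if offered.durability < requested.durability:
        problems.append("DURABILITY")
    if offered.deadline_s > requested.deadline_s:
        problems.append("DEADLINE")
    if offered.latency_budget_s > requested.latency_budget_s:
        problems.append("LATENCY_BUDGET")
    if offered.liveliness < requested.liveliness or offered.lease_s > requested.lease_s:
        problems.append("LIVELINESS")
    return problems
```

For example, a BEST_EFFORT writer with a 100 ms deadline matched against a RELIABLE reader requesting 50 ms yields ["RELIABILITY", "DEADLINE"]. DDS reports the same condition to applications through the OFFERED_INCOMPATIBLE_QOS and REQUESTED_INCOMPATIBLE_QOS statuses.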

References

Key Specifications

  Spec          Title                                 Source
  DDS 1.4       Data Distribution Service             OMG
  RTPS 2.5      DDS Interoperability Wire Protocol    OMG
  DDS-XTYPES    Extensible Types                      OMG
  DDS-SECURITY  DDS Security                          OMG