DDS Troubleshooting

Common DDS (Data Distribution Service) problems and how to identify them with JitterTrap. DDS uses the RTPS wire protocol over UDP, making packet timing analysis valuable for diagnosing QoS violations and network issues.

Discovery Failures — Participants can't find each other
Deadline Violations — Data not arriving on schedule
Latency Budget Exceeded — End-to-end delay too high
Reliability Retransmissions — Lost samples causing repairs
Large Data Fragmentation — Big messages vulnerable to loss
Multicast Issues — Discovery or data not reaching participants
Liveliness Failures — Participants appearing dead
General Diagnostic Workflow — Step-by-step approach
References — Key specifications

Discovery Failures

Symptoms: DataReaders and DataWriters don't match. Participants can't see each other. Application works on localhost but fails across the network. New participants take a long time to discover existing ones.

What It Looks Like in JitterTrap

In Top Talkers:

Look for multicast traffic to 239.255.0.1 (default SPDP multicast)
Discovery for domain 0 uses UDP port 7400 (multicast) and ports 7410, 7412, ... (unicast, one per participant on a host)
Missing multicast flow indicates discovery traffic not reaching interface
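
The port numbers to look for follow from the well-known port formula in the RTPS UDP/IP mapping. The sketch below uses the default parameter values from that mapping (port base 7400, domain gain 250, participant gain 2, offsets 0/10/1/11). The function name and the returned dictionary keys are illustrative, and an implementation may be configured with different values.

```python
# Default RTPS well-known port parameters (UDP/IP mapping).
PB = 7400  # port base
DG = 250   # domain ID gain
PG = 2     # participant ID gain
D0, D1, D2, D3 = 0, 10, 1, 11  # offsets for the four port kinds


def rtps_ports(domain_id: int, participant_id: int = 0) -> dict:
    """Return the default RTPS UDP ports for a domain and participant index."""
    base = PB + DG * domain_id
    return {
        "discovery_multicast": base + D0,
        "user_multicast": base + D2,
        "discovery_unicast": base + D1 + PG * participant_id,
        "user_unicast": base + D3 + PG * participant_id,
    }


if __name__ == "__main__":
    # Show the ports used by the first three participants in domain 0.
    for pid in range(3):
        print(pid, rtps_ports(0, pid))
```

Each additional participant on the same host shifts the unicast ports up by two, which is why a firewall rule covering only 7400-7410 can break discovery once several participants share a machine.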

In Packet Gap chart:

SPDP announcements should be periodic (typically every few seconds, e.g. 3 seconds; the default period varies by implementation)
Irregular or missing announcements indicate discovery problems

How to Diagnose

Filter for discovery multicast address (239.255.0.1 for domain 0)
Verify periodic SPDP traffic from all expected participants
Check if traffic is unidirectional (participant sending but not receiving)
Look for asymmetric packet counts between participants
Set trap to capture when expected discovery traffic stops
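
When checking a capture for traffic from all expected participants, it can help to group packets by the GUID prefix carried in the fixed 20-byte RTPS header (magic "RTPS", protocol version, vendor ID, GUID prefix). The following is a minimal sketch; the function names and the (timestamp, payload) input shape are assumptions for illustration, not part of JitterTrap.

```python
from typing import NamedTuple, Optional


class RtpsHeader(NamedTuple):
    version: tuple      # (major, minor) protocol version
    vendor_id: bytes    # 2 bytes
    guid_prefix: bytes  # 12 bytes identifying the participant


def parse_rtps_header(datagram: bytes) -> Optional[RtpsHeader]:
    """Parse the fixed 20-byte RTPS header, or return None if it is not RTPS."""
    if len(datagram) < 20 or datagram[:4] != b"RTPS":
        return None
    return RtpsHeader((datagram[4], datagram[5]), datagram[6:8], datagram[8:20])


def last_seen(packets) -> dict:
    """Map each GUID prefix (hex) to the timestamp of its latest RTPS packet.

    `packets` is an iterable of (timestamp, udp_payload) pairs.
    """
    seen = {}
    for ts, datagram in packets:
        hdr = parse_rtps_header(datagram)
        if hdr is not None:
            key = hdr.guid_prefix.hex()
            seen[key] = max(ts, seen.get(key, ts))
    return seen
```

A participant whose GUID prefix never appears on the receiving side, or whose last-seen time stops advancing, is a candidate for the one-way traffic problem described above.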

Causes:

Multicast not enabled or routed on network
Firewall blocking UDP ports (7400+ for domain 0)
IGMP snooping misconfigured on switches
Participants on different DDS domains
Network interface binding issues (wrong NIC selected)

Solutions:

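Verify multicast routing: ping 239.255.0.1 from each host (many hosts ignore multicast pings by default, so a missing reply is not conclusive; confirm with a multicast test tool)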
Check firewall rules for DDS port ranges
Configure IGMP snooping correctly on switches (it needs an IGMP querier on the VLAN), or disable it
Verify all participants use same domain ID
Explicitly configure network interface in DDS config
Use unicast discovery peers for non-multicast networks

References: RTPS §8.5 (Discovery Protocol), DDS §7.1 (Domain)

Deadline Violations

Symptoms: DataReader reports deadline missed. Application logic expecting periodic data fails. Control loops become unstable due to missing updates.

What It Looks Like in JitterTrap

In Packet Gap chart:

Gaps exceeding the configured deadline QoS
Example: Deadline of 100ms but packet gaps of 150ms+

In Top Talkers:

Throughput dips or gaps correlating with deadline violations
Compare expected publication rate to actual packet rate

How to Diagnose

Calculate expected packet interval from publication rate
Compare to the Packet Gap chart: any gap longer than the deadline is a violation
Determine if gaps are at source (publisher) or network-induced
Check if gaps correlate with network congestion (other flows spiking)
Set trap on packet gap exceeding deadline threshold
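
The gap comparison above can also be run offline against exported packet timestamps. This is a small sketch under the assumption that arrival times for one topic's flow are available as a sorted list; the function name is hypothetical and the unit is whatever the caller uses consistently (milliseconds in the example).

```python
def find_deadline_violations(arrival_times, deadline):
    """Return (previous, current, gap) for every inter-arrival gap above the deadline.

    `arrival_times` must be sorted and use the same unit as `deadline`.
    """
    violations = []
    for prev, curr in zip(arrival_times, arrival_times[1:]):
        gap = curr - prev
        if gap > deadline:
            violations.append((prev, curr, gap))
    return violations


if __name__ == "__main__":
    # Arrival times in milliseconds for a nominal 10 Hz flow with a 100 ms deadline.
    times_ms = [0, 100, 200, 350, 450]
    for prev, curr, gap in find_deadline_violations(times_ms, 100):
        print(f"gap of {gap} ms between t={prev} and t={curr}")
```

Running the same check on timestamps captured at the publisher's interface and at the subscriber's interface helps separate source-side gaps from network-induced ones.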

Causes:

Publisher not meeting publication rate (CPU, blocking calls)
Network congestion delaying packets
Publisher suspended or garbage collecting (Java/managed languages)
Incorrect deadline QoS configuration
System clock issues affecting timing

Solutions:

Profile publisher for blocking operations or CPU starvation
Ensure QoS deadline is achievable given network conditions
Add margin to deadline (e.g., publishing at 100 Hz gives a 10 ms period, so set the deadline above 10 ms)
Use real-time OS features if strict timing required
Enable QoS monitoring in DDS implementation

References: DDS §2.2.3 (Deadline QoS)

Latency Budget Exceeded

Symptoms: End-to-end latency higher than application requirements. Data arrives too late to be useful. Time-sensitive applications miss processing windows.

What It Looks Like in JitterTrap

In Packet Gap chart:

Jitter (min/max spread) indicates variable network latency
Consistent high gaps suggest systematic delay

In Top Talkers (if the DDS TCP transport is used):

TCP RTT shows network round-trip contribution
High RTT directly impacts DDS latency

How to Diagnose

Measure one-way network latency between participants
JitterTrap shows network contribution; add serialization/processing time
Total latency = serialization + network + deserialization + processing
Compare to LATENCY_BUDGET QoS setting (a hint to the middleware, not an enforced limit)
Identify largest contributor to total latency
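
The latency sum above is simple arithmetic, but writing it down makes the largest contributor obvious. The sketch below assumes each component has already been measured separately; the function name and all figures in the example are hypothetical.

```python
def latency_breakdown(components_ms: dict, budget_ms: float) -> dict:
    """Sum latency components and report the largest one and any budget overrun."""
    total = sum(components_ms.values())
    largest = max(components_ms, key=components_ms.get)
    return {
        "total_ms": total,
        "largest": largest,
        "over_budget_ms": max(0, total - budget_ms),
    }


if __name__ == "__main__":
    # Hypothetical measurements; the network figure would come from JitterTrap.
    measured = {"serialization": 2, "network": 9, "deserialization": 1, "processing": 3}
    print(latency_breakdown(measured, budget_ms=10))
```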

Causes:

Network path latency (distance, hops, queuing)
Serialization overhead for complex types
DDS implementation buffering
Competing traffic causing queuing delays
Large messages taking longer to transmit

Solutions:

Reduce network hops where possible
Use simpler data types to reduce serialization time
Configure DDS transport for low latency (disable batching)
Enable QoS priorities for time-critical topics
Consider dedicated network for real-time traffic

References: DDS §2.2.3 (Latency Budget QoS)

Reliability Retransmissions

Symptoms: RELIABLE QoS but samples occasionally lost. Throughput lower than expected. Bursty traffic patterns with gaps followed by catch-up.

What It Looks Like in JitterTrap

In Packet Gap chart:

Irregular patterns: bursts of packets, then gaps, then more bursts
Retransmissions appear as "extra" packets after gaps

In Top Talkers:

Throughput shows sawtooth pattern (similar to TCP congestion)
Higher packet rate than expected publication rate (due to repairs)

How to Diagnose

Compare actual packet rate to expected publication rate
Higher actual rate indicates retransmissions occurring
Look for ACKNACK/HEARTBEAT patterns in packet capture
Check if retransmissions correlate with network loss events
Capture traffic during suspected loss for Wireshark analysis
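
To spot ACKNACK/HEARTBEAT patterns without opening every packet in Wireshark, the submessages in each RTPS datagram can be counted by kind. The sketch below walks the submessage headers (1-byte ID, 1-byte flags, 2-byte length whose byte order is selected by flag bit 0) that follow the 20-byte RTPS header. The submessage IDs are those defined by RTPS; the function name is an assumption, and the code does not validate submessage contents.

```python
import struct
from collections import Counter

SUBMESSAGE_NAMES = {
    0x01: "PAD", 0x06: "ACKNACK", 0x07: "HEARTBEAT", 0x08: "GAP",
    0x09: "INFO_TS", 0x12: "NACK_FRAG", 0x13: "HEARTBEAT_FRAG",
    0x15: "DATA", 0x16: "DATA_FRAG",
}
RTPS_HEADER_LEN = 20


def count_submessages(datagram: bytes) -> Counter:
    """Count RTPS submessages by kind in a single UDP payload."""
    counts = Counter()
    if len(datagram) < RTPS_HEADER_LEN or datagram[:4] != b"RTPS":
        return counts
    offset = RTPS_HEADER_LEN
    while offset + 4 <= len(datagram):
        kind, flags = datagram[offset], datagram[offset + 1]
        # Bit 0 of the flags selects little-endian (1) or big-endian (0).
        fmt = "<H" if flags & 0x01 else ">H"
        (length,) = struct.unpack_from(fmt, datagram, offset + 2)
        counts[SUBMESSAGE_NAMES.get(kind, f"0x{kind:02x}")] += 1
        if length == 0 and kind not in (0x01, 0x09):
            break  # zero length: this submessage extends to the end of the message
        offset += 4 + length
    return counts
```

A rising share of ACKNACK, NACK_FRAG, and GAP submessages relative to DATA over time is a reasonable indication that the repair mechanism is active.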

Causes:

Network packet loss triggering RTPS repair mechanism
Writer history cache too small (old samples purged before ACK)
Reader too slow (can't keep up, NACKs repeatedly)
Network reordering interpreted as loss

Solutions:

Address underlying packet loss (see network troubleshooting)
Increase writer history depth for RELIABLE topics
Tune HEARTBEAT frequency for faster loss detection
Consider BEST_EFFORT for high-rate, loss-tolerant data
Increase reader receive buffer size

References: RTPS §8.4 (Reliability), DDS §2.2.3 (Reliability QoS)

Large Data Fragmentation

Symptoms: Large messages fail to deliver reliably. Small messages work fine but large ones don't. Sporadic failures with large data types.

What It Looks Like in JitterTrap

In Packet Gap chart:

Bursts of closely-spaced packets (fragments of one message)
With BEST_EFFORT or IP-level fragmentation, loss of any fragment loses the entire message

In Top Talkers:

High packet rate relative to message rate (fragmentation overhead)
Throughput drops when large messages sent

How to Diagnose

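Calculate message size vs MTU (typically 1500 bytes)
Messages larger than the MTU minus headers (roughly 1400 bytes of payload) are fragmented, either by RTPS (DATA_FRAG) or by IP, depending on the implementation's maximum message size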
Look for burst patterns in packet timing (fragments sent together)
Check if failures correlate with message size
Capture large message transmission for fragment analysis
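
The effect of message size on delivery can be estimated with a short calculation: the number of fragments grows with message size, and the chance of losing at least one of them grows with the number of fragments. The sketch below assumes a 1400-byte payload per fragment, independent packet loss, and no fragment-level repair; these are simplifying assumptions and the function names are illustrative.

```python
import math


def fragment_count(message_bytes: int, fragment_payload_bytes: int = 1400) -> int:
    """Number of packets needed to carry one message (at least one)."""
    return max(1, math.ceil(message_bytes / fragment_payload_bytes))


def message_loss_probability(packet_loss: float, fragments: int) -> float:
    """Chance that at least one of `fragments` packets is lost (independent loss)."""
    return 1.0 - (1.0 - packet_loss) ** fragments


if __name__ == "__main__":
    size = 64000  # bytes, hypothetical large sample
    n = fragment_count(size)
    print(n, "fragments,", f"{message_loss_probability(0.01, n):.1%} message loss at 1% packet loss")
```

Under these assumptions a 64000-byte sample needs 46 packets, and a 1% packet loss rate becomes roughly a 37% chance of losing the sample, which is why small messages can appear healthy on a network where large ones fail.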

Causes:

Large messages fragmented into many UDP packets
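Loss of a single fragment can invalidate the whole message (always with IP fragmentation or BEST_EFFORT; reliable RTPS readers can request individual fragments with NACK_FRAG)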
Network more likely to drop during fragment bursts
Receiver reassembly buffer overflow

Solutions:

Reduce message size if possible (split into multiple samples)
Increase MTU if network supports jumbo frames
Enable DDS-level (RTPS) fragmentation, with FEC if your implementation provides it
Use TCP transport for very large messages
Tune fragment reassembly timeout and buffer

References: RTPS §8.3.7 (Fragmentation)

Multicast Issues

Symptoms: Some participants receive data, others don't. Discovery works but data doesn't flow. Adding participants causes traffic to multiply unexpectedly.

What It Looks Like in JitterTrap

In Top Talkers:

Multicast traffic visible on some interfaces but not others
Unexpected unicast traffic (fallback from failed multicast)

In Throughput chart:

Asymmetric traffic patterns between participants
Traffic multiplication when multicast fails (unicast to each receiver)

How to Diagnose

Verify multicast group membership on each interface
Check if IGMP joins are being sent and acknowledged
Compare traffic on sender vs receiver interfaces
Look for unicast fallback traffic (indicates multicast failure)
Test with iperf or similar multicast test tool
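
Unicast fallback shows up as a shift in where the bytes are going. The sketch below assumes per-flow byte counts have been exported as (destination IP, bytes) pairs and splits them into multicast and unicast totals; the function name and the input shape are hypothetical.

```python
import ipaddress
from collections import defaultdict


def traffic_by_cast(flows) -> dict:
    """Sum bytes per destination type from (dst_ip, byte_count) pairs."""
    totals = defaultdict(int)
    for dst_ip, byte_count in flows:
        is_mcast = ipaddress.ip_address(dst_ip).is_multicast
        totals["multicast" if is_mcast else "unicast"] += byte_count
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical flows from one sender: one multicast group, two unicast receivers.
    flows = [("239.255.0.1", 1000), ("192.168.1.10", 400), ("192.168.1.11", 400)]
    print(traffic_by_cast(flows))
```

If the unicast total grows in proportion to the number of receivers while the multicast total stays flat or disappears, the sender is probably duplicating data per receiver.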

Causes:

IGMP snooping dropping multicast before receivers join
Switches not forwarding multicast between VLANs
Receiver firewall blocking multicast
TTL too low for multicast to cross routers
Network interface not joined to multicast group

Solutions:

Configure IGMP querier on network
Enable multicast routing between subnets if needed
Verify firewall allows multicast UDP
Increase multicast TTL in DDS configuration
Use unicast for small deployments or hostile networks

References: RTPS §9.6 (UDP/IP Mapping), RFC 3376 (IGMPv3)

Liveliness Failures

Symptoms: Participants marked as "not alive" unexpectedly. Connections drop and reconnect. Applications see participant leave/join events during normal operation.

What It Looks Like in JitterTrap

In Packet Gap chart:

Gaps exceeding liveliness lease duration
Periodic liveliness assertions should be visible

In Top Talkers:

Participant's traffic stops, then resumes
Discovery traffic may continue while data traffic stops

How to Diagnose

Check configured LIVELINESS lease duration
Look for gaps in traffic exceeding lease duration
Determine if gaps are network-induced or source-side
Verify liveliness assertion mechanism (automatic vs manual)
Correlate with system events (GC pauses, CPU spikes)
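
Whether a lease duration has enough margin can be estimated from the assertion period. In a simplified model where assertions are sent at a fixed period and the lease restarts at each received assertion, the lease survives k consecutive lost assertions as long as (k + 1) periods fit within the lease. The sketch below ignores network jitter; the function name is an assumption, and integer milliseconds are used to avoid floating-point rounding.

```python
def tolerated_missed_assertions(lease_ms: int, assert_period_ms: int) -> int:
    """Consecutive liveliness assertions that can be lost before the lease expires.

    A result of -1 means the lease expires even when nothing is lost.
    """
    if assert_period_ms <= 0:
        raise ValueError("assert_period_ms must be positive")
    return lease_ms // assert_period_ms - 1


if __name__ == "__main__":
    # Hypothetical settings: 10 s lease, assertion every 3 s.
    print(tolerated_missed_assertions(10_000, 3_000))
```

A result of 0 means a single lost or delayed assertion is enough to mark the participant as not alive, which matches the leave/join flapping described in the symptoms.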

Causes:

Lease duration too short for network conditions
Publisher stalled (GC, blocking I/O, CPU starvation)
Network partition isolating participant
Asymmetric routing (participant can send but not receive)
Clock skew between participants

Solutions:

Increase lease duration with appropriate margin
Use AUTOMATIC liveliness (DDS sends keepalives)
Avoid blocking operations in publishing thread
Monitor system resources (CPU, memory)
Ensure symmetric network connectivity

References: DDS §2.2.3 (Liveliness QoS)

General Diagnostic Workflow

For any DDS/RTPS issue:

1. Verify discovery: Can all participants see each other?
- Check for SPDP multicast traffic (239.255.0.1)
- Verify matching domain IDs
- Confirm network allows multicast or configure unicast peers
2. Check QoS compatibility: Mismatched QoS prevents matching:
- Reliability: RELIABLE writer required for RELIABLE reader
- Durability: Writer must meet or exceed reader's durability
- Deadline: Writer's offered period must not exceed the reader's requested period
- Latency budget: Writer's offered duration must not exceed the reader's requested duration
- Liveliness: Writer's kind must be at least as strict as the reader's, and its lease duration no longer
3. Examine network timing: DDS QoS depends on network behavior:
- Deadline requires consistent packet delivery
- Latency budget requires bounded network delay
- Liveliness requires periodic traffic to arrive
4. Look for packet loss: RTPS reliability helps but has limits:
- Check packet gap for missing intervals
- Capture traffic to count sequence numbers
- High retransmission rate indicates network issues
5. Monitor multicast: Many DDS issues stem from multicast:
- SPDP participant discovery uses multicast by default; SEDP endpoint discovery normally follows over unicast
- Data topics may use multicast
- Verify IGMP, switch config, firewall rules
6. Capture for analysis: Set traps to capture:
- When packet gap exceeds deadline
- When expected periodic traffic stops
- During participant join/leave events
Use Wireshark with the RTPS dissector for detailed analysis.
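
The QoS compatibility checks in the workflow above follow the DDS requested/offered model: the writer's offered policy must be at least as strong as what the reader requests. The sketch below encodes those rules for five policies. The Qos class, its field names, and its default values are hypothetical placeholders for illustration and do not correspond to any vendor API or to the DDS default QoS.

```python
from dataclasses import dataclass

# Policy kinds ordered from weakest to strongest.
RELIABILITY = ["BEST_EFFORT", "RELIABLE"]
DURABILITY = ["VOLATILE", "TRANSIENT_LOCAL", "TRANSIENT", "PERSISTENT"]
LIVELINESS = ["AUTOMATIC", "MANUAL_BY_PARTICIPANT", "MANUAL_BY_TOPIC"]
INFINITE = float("inf")


@dataclass
class Qos:
    reliability: str = "BEST_EFFORT"
    durability: str = "VOLATILE"
    deadline_s: float = INFINITE
    latency_budget_s: float = 0.0
    liveliness: str = "AUTOMATIC"
    lease_s: float = INFINITE


def incompatibilities(offered: Qos, requested: Qos) -> list:
    """Return the names of policies where the writer's offer fails the reader's request."""
    problems = []
    if RELIABILITY.index(offered.reliability) < RELIABILITY.index(requested.reliability):
        problems.append("reliability")
    if DURABILITY.index(offered.durability) < DURABILITY.index(requested.durability):
        problems.append("durability")
    if offered.deadline_s > requested.deadline_s:
        problems.append("deadline")
    if offered.latency_budget_s > requested.latency_budget_s:
        problems.append("latency_budget")
    if (LIVELINESS.index(offered.liveliness) < LIVELINESS.index(requested.liveliness)
            or offered.lease_s > requested.lease_s):
        problems.append("liveliness")
    return problems


if __name__ == "__main__":
    writer = Qos(reliability="BEST_EFFORT", deadline_s=0.2)
    reader = Qos(reliability="RELIABLE", deadline_s=0.1)
    print(incompatibilities(writer, reader))
```

An empty list means these five policies would not prevent the endpoints from matching; other policies (for example ownership or partition) are not covered here.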

References

Key Specifications

Spec	Title	Source
DDS 1.4	Data Distribution Service	OMG
RTPS 2.5	DDS Interoperability Wire Protocol	OMG
DDS-XTYPES	Extensible Types	OMG
DDS-SECURITY	DDS Security	OMG

Related Resources

ROS 2 DDS Tuning: ROS 2 DDS configuration
eProsima Fast DDS Docs: Popular open-source implementation
Eclipse Cyclone DDS: Another open-source implementation

Related

TCP Troubleshooting: If using DDS over TCP transport
RTP Troubleshooting: Similar real-time concerns
Network Impairments: Test DDS behavior under adverse conditions
