OPC UA Troubleshooting

Common OPC UA problems and how to identify them with JitterTrap. OPC UA typically runs over TCP, so TCP analysis tools are directly applicable for diagnosing session, subscription, and connection issues.

Session Timeouts

Symptoms: Client disconnects unexpectedly. Session expires despite activity. Application must repeatedly reconnect. "BadSessionIdInvalid" or "BadSessionClosed" errors.

What It Looks Like in JitterTrap

In TCP RTT chart:

  • RTT spikes approaching session timeout value
  • High RTT can delay keepalive responses past timeout

In Packet Gap chart:

  • Gaps in traffic exceeding session timeout / 3 (typical keepalive interval)
  • Sudden traffic stop followed by new connection

In Top Talkers:

  • TCP connection terminates (Closed marker)
  • New connection to same server shortly after

How to Diagnose

  1. Note the configured session timeout (default often 30-60 seconds)
  2. Check if RTT spikes + processing time could exceed timeout
  3. Look for gaps in client→server traffic exceeding keepalive interval
  4. Verify the client keeps Publish requests outstanding and the server answers with keepalives (Publish responses with no data)
  5. Capture session timeout event for timing analysis
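
The gap check in step 3 can be sketched in a few lines of Python. This is a minimal illustration, not part of JitterTrap: the function name and the input (a list of client→server packet timestamps in seconds, exported from a capture) are assumptions, and the divisor of 3 follows the "session timeout / 3" rule of thumb above.

```python
from typing import List, Tuple


def find_keepalive_gaps(
    timestamps_s: List[float],
    session_timeout_s: float,
    divisor: float = 3.0,
) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_length) for every gap between consecutive
    client->server packets that exceeds session_timeout_s / divisor."""
    threshold = session_timeout_s / divisor
    gaps = []
    # Pair each timestamp with the one that follows it.
    for earlier, later in zip(timestamps_s, timestamps_s[1:]):
        gap = later - earlier
        if gap > threshold:
            gaps.append((earlier, gap))
    return gaps
```

For example, with a 30 second session timeout, find_keepalive_gaps([0.0, 5.0, 10.0, 32.0, 37.0], 30.0) returns [(10.0, 22.0)]: one 22 second gap starting at t = 10 s, well over the 10 second threshold.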

Causes:

  • Network latency causing responses to arrive late
  • Server overloaded, slow to respond to keepalive
  • Client not sending activity within timeout period
  • Firewall terminating idle connections
  • Session timeout configured too aggressively

Solutions:

  • Increase session timeout to leave margin over worst-case RTT plus server processing time (at least 3× worst-case RTT)
  • Reduce network latency where possible
  • Ensure client sends regular activity (Publish requests)
  • Configure firewall for longer TCP idle timeout
  • Implement robust reconnection logic in client

References: OPC UA Part 4 §5.6 (Session Services)

Subscription Delays

Symptoms: Data arrives later than expected publishing interval. HMI updates lag behind actual values. Trending shows stale data. MonitoredItem queue fills up.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Irregular intervals instead of consistent publish period
  • Gaps larger than configured publishing interval

In TCP RTT chart:

  • RTT variability can delay Publish responses
  • High RTT reduces effective throughput

In Throughput chart:

  • Bursty traffic pattern (batched notifications)
  • Traffic rate lower than expected for item count

How to Diagnose

  1. Calculate expected traffic from publishing interval and item count
  2. Compare actual packet rate to expected Publish frequency
  3. Check RTT: high RTT delays the Publish response
  4. Look for batching (multiple intervals of data in one response)
  5. Set trap for packet gaps exceeding publishing interval
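
Steps 2 and 5 amount to comparing observed Publish response spacing with the configured interval. The sketch below assumes you have a list of response arrival times in seconds from a capture; the function names and the 50% tolerance are illustrative choices, not JitterTrap settings. Note that when no data changes, a server legitimately sends only keepalives, so long gaps on an idle subscription are not necessarily a fault.

```python
from typing import List, Tuple


def expected_publish_rate(publishing_interval_ms: float) -> float:
    """Publish responses per second if one arrives every interval."""
    return 1000.0 / publishing_interval_ms


def late_publishes(
    arrival_times_s: List[float],
    publishing_interval_ms: float,
    tolerance: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_length) for gaps between consecutive
    responses longer than the interval plus the given tolerance."""
    limit_s = (publishing_interval_ms / 1000.0) * (1.0 + tolerance)
    late = []
    for earlier, later in zip(arrival_times_s, arrival_times_s[1:]):
        gap = later - earlier
        if gap > limit_s:
            late.append((earlier, gap))
    return late
```

With a 500 ms publishing interval, late_publishes([0.0, 0.5, 1.0, 2.5, 3.0], 500) returns [(1.0, 1.5)], flagging the 1.5 second gap that spans roughly three publishing intervals.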

Causes:

  • Network latency delaying Publish responses
  • Server batching notifications (MaxNotificationsPerPublish)
  • Server overloaded, falling behind on sampling
  • Publishing interval faster than server can sustain
  • TCP window or congestion limiting throughput

Solutions:

  • Increase publishing interval to match network capability
  • Reduce MaxNotificationsPerPublish for lower latency
  • Optimize server sampling (reduce monitored item count)
  • Check for TCP issues (window starvation, congestion)
  • Use multiple subscriptions for different priority data

References: OPC UA Part 4 §5.13 (Subscription Services)

Connection Failures

Symptoms: Cannot establish connection to server. Connection drops randomly. TLS handshake fails. "BadConnectionClosed" errors.

What It Looks Like in JitterTrap

In Top Talkers:

  • TCP SYN sent but no connection established
  • Or connection established then immediately closed
  • Repeated connection attempts visible

In TCP RTT chart:

  • New connection markers (▶) without sustained traffic
  • Connection closed markers (■) shortly after open

How to Diagnose

  1. Verify TCP connectivity (port 4840 default, or custom)
  2. Check if connection completes (SYN-ACK received)
  3. Look for TLS handshake (if secure endpoint)
  4. Capture connection attempt for detailed analysis
  5. Check for RST packets indicating rejection
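
Steps 1, 2, and 5 can be approximated from the client side with a plain TCP connect. This sketch uses only the Python standard library; the function name and the returned labels are illustrative. It tests TCP reachability only and says nothing about the OPC UA or TLS handshake that follows.

```python
import socket


def check_tcp_connect(host: str, port: int = 4840, timeout_s: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"  # handshake completed (SYN-ACK received)
    except ConnectionRefusedError:
        return "refused"  # RST received: nothing listening, or actively rejected
    except socket.timeout:
        return "timeout"  # no answer: often a firewall silently dropping SYNs
    except OSError as exc:
        return f"error: {exc}"  # DNS failure, unreachable network, etc.
```

A "refused" result points at the server or port number, while "timeout" more often points at a firewall or routing problem between client and server.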

Causes:

  • Firewall blocking OPC UA port
  • Server not running or not listening
  • Port number mismatch
  • TLS/certificate issues (expired, untrusted, hostname mismatch)
  • Server at connection limit

Solutions:

  • Verify firewall allows OPC UA port (default 4840)
  • Confirm server is running and endpoint URL is correct
  • Check certificate validity and trust chain
  • Verify client trusts server certificate (and vice versa for mutual auth)
  • Check server connection limits and license

References: OPC UA Part 6 §7.1 (TCP Mapping)

Publish Interval Violations

Symptoms: MonitoredItems not updating at expected rate. Some items update, others don't. Data changes missed. Sampling slower than configured.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Publish responses not arriving at configured interval
  • Irregular spacing between responses

In Throughput chart:

  • Lower throughput than expected for item count
  • Periodic bursts instead of steady stream

How to Diagnose

  1. Verify configured publishing interval vs actual arrival rate
  2. Check if sampling interval < publishing interval (sampling is independent)
  3. Look for "keep-alive" Publishes (empty responses)
  4. Compare server capability to configured rate
  5. Monitor for DataChangeNotification vs keep-alive ratio
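
The ratio in step 5 is simple to compute once the Publish responses have been decoded. The sketch below assumes a hypothetical input: a list holding the number of notifications carried by each Publish response (0 for a keep-alive), for example transcribed from a Wireshark capture. The function name is illustrative.

```python
from typing import List


def keepalive_ratio(notification_counts: List[int]) -> float:
    """Fraction of Publish responses that carried no notifications."""
    if not notification_counts:
        return 0.0
    keepalives = sum(1 for count in notification_counts if count == 0)
    return keepalives / len(notification_counts)
```

A ratio near 1.0 on data that is known to be changing suggests the server is not sampling or not detecting changes, rather than the network dropping them.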

Causes:

  • Publishing interval faster than server supports
  • Sampling interval not matched to data change rate
  • Server prioritizing other subscriptions
  • Network limiting achievable publish rate
  • Queue mode discarding unchanged values

Solutions:

  • Match publishing interval to actual requirements
  • Configure sampling interval appropriately
  • Use priority settings for critical subscriptions
  • Reduce monitored item count if server limited
  • Set queue size > 1 if samples might be lost

References: OPC UA Part 4 §5.12.1 (MonitoredItem)

Secure Channel Issues

Symptoms: Connection drops every hour (default token lifetime). "BadSecureChannelTokenUnknown" errors. Security alerts in server logs. Intermittent authentication failures.

What It Looks Like in JitterTrap

In Top Talkers:

  • Connection stable, then drops at regular intervals
  • New connection immediately after drop

In TCP RTT chart:

  • Traffic pattern normal until sudden termination
  • May see RST if renewal fails

In Packet Gap chart:

  • Regular pattern interrupted at token expiration time

How to Diagnose

  1. Note security token lifetime (default 3600000 ms = 1 hour)
  2. Check if disconnects occur at that interval
  3. Look for OpenSecureChannel renewal traffic before expiry
  4. Verify renewal succeeds (no connection drop)
  5. Capture traffic around token expiration time
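
Steps 1 and 2 can be checked numerically. The sketch below assumes a list of observed disconnect times in seconds (for example, taken from connection closed markers); the function names and the 60 second tolerance are illustrative. The 75% renewal point follows the practice mentioned under Solutions below.

```python
from typing import List


def renewal_time_ms(token_lifetime_ms: float, fraction: float = 0.75) -> float:
    """Time after token issue at which the client should renew."""
    return token_lifetime_ms * fraction


def drops_match_token_lifetime(
    drop_times_s: List[float],
    token_lifetime_ms: float = 3_600_000,
    tolerance_s: float = 60.0,
) -> bool:
    """True if every interval between consecutive drops is within
    tolerance_s of the token lifetime."""
    lifetime_s = token_lifetime_ms / 1000.0
    intervals = [later - earlier for earlier, later in zip(drop_times_s, drop_times_s[1:])]
    # At least two drops are needed before any pattern can be claimed.
    return bool(intervals) and all(abs(i - lifetime_s) <= tolerance_s for i in intervals)
```

With the default one hour lifetime, renewal is due 2,700,000 ms (45 minutes) after the token is issued, and drops at 0 s, 3605 s, and 7201 s would be reported as matching the token lifetime.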

Causes:

  • Client not renewing security token before expiry
  • Server rejecting renewal (certificate issue)
  • Clock skew between client and server
  • Network delay causing renewal to arrive late
  • Misconfigured token lifetime

Solutions:

  • Ensure client renews token at 75% of lifetime (standard practice)
  • Synchronize clocks using NTP
  • Increase token lifetime for high-latency networks
  • Check certificate validity and trust
  • Verify security policy compatibility

References: OPC UA Part 4 §5.5 (SecureChannel Services), Part 6 §6.7 (Security)

Large Response Problems

Symptoms: Requests for many nodes fail or timeout. Browse operations incomplete. Historical reads truncated. "BadResponseTooLarge" errors.

What It Looks Like in JitterTrap

In Throughput chart:

  • Large burst of traffic for single request
  • Possible TCP window issues during burst

In TCP Window chart:

  • Window shrinking during large transfer
  • Possible Zero Window events

In TCP RTT chart:

  • RTT may increase during large transfer
  • Indicates buffer/queuing delay

How to Diagnose

  1. Estimate response size (node count × data size)
  2. Check if size exceeds MaxMessageSize or MaxChunkCount
  3. Look for chunked message indicators in capture
  4. Monitor TCP window behavior during large transfers
  5. Check if request succeeds with fewer items
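
Steps 1 and 2 are a back-of-the-envelope calculation. The sketch below is an estimate only: the function name is illustrative, the 65536 byte chunk size is an assumed buffer size (use the value your client and server actually negotiate), and per-chunk header and security overhead is ignored. A limit of 0 is treated as "no limit", as in the OPC UA TCP handshake.

```python
import math
from typing import Dict, Union


def check_response_fits(
    node_count: int,
    bytes_per_node: int,
    max_message_size: int,
    max_chunk_count: int,
    chunk_size: int = 65536,  # assumed negotiated buffer size
) -> Dict[str, Union[int, bool]]:
    """Estimate response size and chunk count, then compare with limits."""
    estimated_bytes = node_count * bytes_per_node
    chunks = math.ceil(estimated_bytes / chunk_size)
    return {
        "estimated_bytes": estimated_bytes,
        "chunks": chunks,
        # A limit of 0 means the peer imposes no limit.
        "size_ok": max_message_size == 0 or estimated_bytes <= max_message_size,
        "chunks_ok": max_chunk_count == 0 or chunks <= max_chunk_count,
    }
```

For 10,000 nodes at roughly 100 bytes each, with a 4 MiB message limit and a chunk limit of 8, the estimate is 1,000,000 bytes in 16 chunks: the size fits but the chunk count does not.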

Causes:

  • Response exceeds configured message size limits
  • MaxChunkCount too low for response size
  • TCP receive buffer too small
  • Network MTU causing fragmentation
  • Server truncating due to resource limits

Solutions:

  • Increase MaxMessageSize and MaxChunkCount
  • Use paging (ContinuationPoint) for large result sets
  • Break large requests into smaller batches
  • Increase TCP receive buffer size
  • Check server resource configuration

References: OPC UA Part 6 §7.1.2 (Message Chunking)

Discovery Issues

Symptoms: Client can't find server. FindServers returns empty. GetEndpoints works from some clients but not others. Server not visible on network.

What It Looks Like in JitterTrap

In Top Talkers:

  • Traffic to discovery port (4840) but no response
  • Or response received but no subsequent connection

How to Diagnose

  1. Test connectivity to discovery endpoint (usually same as server)
  2. Check if Local Discovery Server (LDS) is configured
  3. Verify GetEndpoints returns valid endpoint URLs
  4. Check if returned URLs are reachable from client
  5. Test with UaExpert or a similar tool to isolate the issue
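
Steps 3 and 4 often fail because the server advertises a hostname the client cannot resolve. The sketch below parses an opc.tcp endpoint URL and checks name resolution using only the Python standard library; the function names are illustrative, and the endpoint URLs themselves must come from a GetEndpoints call made with an OPC UA client.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def parse_endpoint(url: str, default_port: int = 4840) -> Tuple[str, int]:
    """Split an opc.tcp:// endpoint URL into (hostname, port)."""
    parts = urlsplit(url)
    if parts.scheme != "opc.tcp" or not parts.hostname:
        raise ValueError(f"not an opc.tcp endpoint URL: {url!r}")
    return parts.hostname, parts.port or default_port


def resolves(host: str) -> bool:
    """True if the client machine can resolve the hostname."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False
```

Run this on the client machine for each URL the server returns. A hostname that fails to resolve there explains why GetEndpoints works but the subsequent connection does not.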

Causes:

  • Discovery endpoint not enabled
  • Server returning unreachable endpoint URLs (wrong hostname/IP)
  • Firewall blocking discovery port
  • LDS not running or not registering servers
  • Network segmentation preventing discovery

Solutions:

  • Enable discovery endpoint on server
  • Configure server to return reachable endpoint URLs
  • Ensure hostname in endpoint URL resolves correctly
  • Configure firewall for OPC UA ports
  • Use direct endpoint URL if discovery not required

References: OPC UA Part 4 §5.4 (Discovery Services), Part 12 (Discovery)

Queue Overflow

Symptoms: Data changes missed between publishes. MonitoredItem queue overflow notifications. Historical gaps in trending. Alarm transitions missed.

What It Looks Like in JitterTrap

In Packet Gap chart:

  • Normal pattern, but application reports missing data
  • Gap represents time where multiple changes occurred

In Throughput chart:

  • Traffic rate matches publish interval
  • But data content indicates overflow (sequence gaps)

How to Diagnose

  1. Check MonitoredItem queue size vs expected data change rate
  2. Calculate: changes per second × publishing interval (in seconds) = required queue size
  3. Look for overflow indicators in OPC UA responses
  4. Verify sampling interval vs data change rate
  5. Check DataChangeFilter settings (might be filtering changes)
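
The calculation in step 2 can be written as a small helper. This is a sketch of that rule of thumb only: the function name is illustrative, and it assumes the sampling interval is short enough to observe every change, since the server can only queue values it has actually sampled.

```python
import math


def required_queue_size(changes_per_second: float, publishing_interval_ms: float) -> int:
    """Minimum MonitoredItem queue size to hold every change that can
    occur within one publishing interval (never less than 1)."""
    changes_per_interval = changes_per_second * publishing_interval_ms / 1000.0
    return max(1, math.ceil(changes_per_interval))
```

A value changing 20 times per second with a 500 ms publishing interval needs a queue of at least 10. In practice it is reasonable to add headroom on top of this figure to absorb delayed Publish responses.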

Causes:

  • Queue size too small for data change rate
  • Publishing interval too long for volatile data
  • Sampling interval causing aliasing (sampling slower than the data changes)
  • Server discarding oldest values (queue policy)
  • DataChangeFilter with deadband filtering changes

Solutions:

  • Increase queue size to at least change rate × publishing interval
  • Reduce publishing interval for volatile data
  • Reduce sampling interval to catch all changes
  • Use queue discard policy appropriate for application
  • Adjust deadband filter if filtering needed changes

References: OPC UA Part 4 §5.12.1.4 (Queuing)

General Diagnostic Workflow

For any OPC UA issue:

  1. Verify TCP connectivity — OPC UA relies on TCP:

    • Can client reach server IP and port?
    • Check for firewall, routing, or DNS issues
    • See TCP Troubleshooting for TCP-specific issues
  2. Check session health — Session state is fundamental:

    • Is session established and maintained?
    • Are keepalives (empty Publishes) being sent?
    • Is session timeout appropriate for network conditions?
  3. Examine subscription timing — Most data issues are subscription-related:

    • Does publishing interval match requirements?
    • Is sampling interval appropriate for data change rate?
    • Are queue sizes adequate?
  4. Monitor security channel — Security issues cause mysterious drops:

    • When was token last renewed?
    • Is clock synchronized?
    • Are certificates valid and trusted?
  5. Check message sizes — Large requests cause problems:

    • What's the expected response size?
    • Is it within configured limits?
    • Are you using paging for large results?
  6. Capture for analysis — Set traps to capture:

    • When session drops (TCP connection close)
    • When gaps exceed publishing interval
    • Before and after expected token renewal

    Use Wireshark with OPC UA dissector for protocol analysis.

References

Key Specifications

Part     Title                  Description
Part 1   Overview and Concepts  Architecture introduction
Part 4   Services               All OPC UA services defined
Part 6   Mappings               TCP and other transport mappings
Part 7   Profiles               Conformance profiles
Part 12  Discovery              Discovery services and LDS
Part 14  PubSub                 Publish-subscribe extension