OPC UA Troubleshooting
Common OPC UA problems and how to identify them with JitterTrap. OPC UA typically runs over TCP, so TCP analysis tools are directly applicable for diagnosing session, subscription, and connection issues.
Contents
- Session Timeouts — Sessions expiring unexpectedly
- Subscription Delays — Data not arriving on schedule
- Connection Failures — TCP connection problems
- Publish Interval Violations — Monitored items late
- Secure Channel Issues — Security token renewal failures
- Large Response Problems — Chunking and message size
- Discovery Issues — Finding servers on the network
- Queue Overflow — Monitored item data loss
- General Diagnostic Workflow — Step-by-step approach
- References — Key specifications
Session Timeouts
Symptoms: Client disconnects unexpectedly. Session expires despite activity. Application must repeatedly reconnect. "BadSessionIdInvalid" or "BadSessionClosed" errors.
What It Looks Like in JitterTrap
In TCP RTT chart:
- RTT spikes approaching session timeout value
- High RTT can delay keepalive responses past timeout
In Packet Gap chart:
- Gaps in traffic exceeding session timeout / 3 (typical keepalive interval)
- Sudden traffic stop followed by new connection
In Top Talkers:
- TCP connection terminates (Closed marker)
- New connection to same server shortly after
How to Diagnose
- Note the configured session timeout (default often 30-60 seconds)
- Check if RTT spikes + processing time could exceed timeout
- Look for gaps in client→server traffic exceeding keepalive interval
- Verify the client keeps Publish requests outstanding and the server answers with keep-alive responses (Publish responses carrying no notifications)
- Capture session timeout event for timing analysis
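The gap check in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of JitterTrap: the function name `find_keepalive_gaps` and the sample timestamps are hypothetical, and the input is assumed to be client-to-server packet times (in seconds) exported from a capture. It uses the session timeout / 3 rule of thumb mentioned earlier as the keepalive interval.

```python
from typing import List, Tuple


def find_keepalive_gaps(
    timestamps_s: List[float], session_timeout_s: float
) -> List[Tuple[float, float]]:
    """Return (gap_start_time, gap_length) for every gap in the packet
    timestamps that is longer than session_timeout / 3."""
    keepalive_interval_s = session_timeout_s / 3.0
    gaps = []
    for earlier, later in zip(timestamps_s, timestamps_s[1:]):
        gap = later - earlier
        if gap > keepalive_interval_s:
            gaps.append((earlier, gap))
    return gaps


# Client-to-server packet times in seconds (made-up values for illustration).
packets = [0.0, 5.0, 10.0, 32.0, 37.0]
for start, gap in find_keepalive_gaps(packets, session_timeout_s=30.0):
    print(f"gap of {gap:.1f} s starting at t={start:.1f} s")
```

With a 30 second session timeout the threshold is 10 seconds, so only the 22 second silence starting at t=10 is reported.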
Causes:
- Network latency causing responses to arrive late
- Server overloaded, slow to respond to keepalive
- Client not sending activity within timeout period
- Firewall terminating idle connections
- Session timeout configured too aggressively
Solutions:
- Increase session timeout so it exceeds the longest expected gap between client requests by a wide margin (and is at least 3× worst-case RTT)
- Reduce network latency where possible
- Ensure client sends regular activity (Publish requests)
- Configure firewall for longer TCP idle timeout
- Implement robust reconnection logic in client
References: OPC UA Part 4 §5.6 (Session Services)
Subscription Delays
Symptoms: Data arrives later than the configured publishing interval. HMI updates lag behind actual values. Trending shows stale data. MonitoredItem queue fills up.
What It Looks Like in JitterTrap
In Packet Gap chart:
- Irregular intervals instead of consistent publish period
- Gaps larger than configured publishing interval
In TCP RTT chart:
- RTT variability can delay Publish responses
- High RTT reduces effective throughput
In Throughput chart:
- Bursty traffic pattern (batched notifications)
- Traffic rate lower than expected for item count
How to Diagnose
- Calculate expected traffic from publishing interval and item count
- Compare actual packet rate to expected Publish frequency
- Check RTT: high RTT delays Publish responses
- Look for batching (multiple intervals of data in one response)
- Set trap for packet gaps exceeding publishing interval
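As a rough sketch of the comparison described above, the following Python computes the expected Publish response rate from the publishing interval and flags inter-arrival gaps that overshoot it. The function names, the 50% tolerance, and the sample arrival times are assumptions chosen for illustration. The arrival times are assumed to come from a capture of server-to-client Publish responses.

```python
def expected_publish_rate_hz(publishing_interval_ms: float) -> float:
    """Publish responses per second if every interval produces a response."""
    return 1000.0 / publishing_interval_ms


def late_gaps(arrival_times_s, publishing_interval_ms, tolerance=0.5):
    """Return inter-arrival gaps (seconds) that exceed the publishing
    interval by more than `tolerance`, given as a fraction of the interval."""
    limit_s = publishing_interval_ms / 1000.0 * (1.0 + tolerance)
    gaps = [b - a for a, b in zip(arrival_times_s, arrival_times_s[1:])]
    return [g for g in gaps if g > limit_s]


# Publish response arrival times in seconds (made-up values).
arrivals = [0.0, 0.5, 1.0, 2.5, 3.0]
print(expected_publish_rate_hz(500))   # responses/s expected at 500 ms
print(late_gaps(arrivals, 500))        # gaps longer than 0.75 s
```

A late gap that is close to a whole multiple of the publishing interval (here 1.5 s at a 500 ms interval) is consistent with several intervals of data being batched into one response.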
Causes:
- Network latency delaying Publish responses
- Server batching notifications (MaxNotificationsPerPublish)
- Server overloaded, falling behind on sampling
- Publishing interval faster than server can sustain
- TCP window or congestion limiting throughput
Solutions:
- Increase publishing interval to match network capability
- Reduce MaxNotificationsPerPublish for lower latency
- Optimize server sampling (reduce monitored item count)
- Check for TCP issues (window starvation, congestion)
- Use multiple subscriptions for different priority data
References: OPC UA Part 4 §5.13 (Subscription Services)
Connection Failures
Symptoms: Cannot establish connection to server. Connection drops randomly. TLS handshake fails. "BadConnectionClosed" errors.
What It Looks Like in JitterTrap
In Top Talkers:
- TCP SYN sent but no connection established
- Or connection established then immediately closed
- Repeated connection attempts visible
In TCP RTT chart:
- New connection markers (▶) without sustained traffic
- Connection closed markers (■) shortly after open
How to Diagnose
- Verify TCP connectivity (port 4840 default, or custom)
- Check if connection completes (SYN-ACK received)
- Look for TLS handshake (if secure endpoint)
- Capture connection attempt for detailed analysis
- Check for RST packets indicating rejection
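The first two checks can be approximated from the client machine with a short Python probe. This is a sketch using only the standard library. The function name `probe_tcp` and its return strings are hypothetical, and it tests only the TCP handshake, not the OPC UA Hello/Acknowledge exchange or TLS.

```python
import socket


def probe_tcp(host: str, port: int = 4840, timeout_s: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"        # handshake completed (SYN-ACK received)
    except ConnectionRefusedError:
        return "refused"         # RST received: nothing listening, or rejected
    except socket.timeout:
        return "timeout"         # no reply: often a firewall dropping packets
    except OSError as exc:
        return f"error: {exc}"   # e.g. no route to host, name not resolved
```

Calling `probe_tcp("server-hostname")` with the real server address gives a quick first answer: "refused" points at the server or port number, while "timeout" points at filtering or routing between client and server.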
Causes:
- Firewall blocking OPC UA port
- Server not running or not listening
- Port number mismatch
- TLS/certificate issues (expired, untrusted, hostname mismatch)
- Server at connection limit
Solutions:
- Verify firewall allows OPC UA port (default 4840)
- Confirm server is running and endpoint URL is correct
- Check certificate validity and trust chain
- Verify client trusts server certificate (and vice versa for mutual auth)
- Check server connection limits and license
References: OPC UA Part 6 §7.1 (TCP Mapping)
Publish Interval Violations
Symptoms: MonitoredItems not updating at expected rate. Some items update, others don't. Data changes missed. Sampling slower than configured.
What It Looks Like in JitterTrap
In Packet Gap chart:
- Publish responses not arriving at configured interval
- Irregular spacing between responses
In Throughput chart:
- Lower throughput than expected for item count
- Periodic bursts instead of steady stream
How to Diagnose
- Verify configured publishing interval vs actual arrival rate
- Check whether the sampling interval is shorter than the publishing interval (the server samples independently of publishing)
- Look for "keep-alive" Publishes (empty responses)
- Compare server capability to configured rate
- Monitor for DataChangeNotification vs keep-alive ratio
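The keep-alive ratio from the last step can be computed as below. This sketch assumes you already have, per Publish response, the number of notifications it carried (for example from a decoded capture). The function name and the sample counts are hypothetical.

```python
def keepalive_ratio(notification_counts):
    """Fraction of Publish responses that carried no notifications."""
    if not notification_counts:
        return 0.0
    empty = sum(1 for n in notification_counts if n == 0)
    return empty / len(notification_counts)


# Notifications per Publish response (made-up values).
counts = [3, 0, 0, 2, 0, 0, 0, 1]
print(f"{keepalive_ratio(counts):.1%} of responses were keep-alives")
```

A high ratio on a subscription whose values are known to change suggests the server is not reporting changes (sampling, filtering, or load), rather than a network problem.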
Causes:
- Publishing interval faster than server supports
- Sampling interval not matched to data change rate
- Server prioritizing other subscriptions
- Network limiting achievable publish rate
- Queue mode discarding unchanged values
Solutions:
- Match publishing interval to actual requirements
- Configure sampling interval appropriately
- Use priority settings for critical subscriptions
- Reduce monitored item count if server limited
- Set queue size > 1 if samples might be lost
References: OPC UA Part 4 §5.12.1 (MonitoredItem)
Secure Channel Issues
Symptoms: Connection drops every hour (default token lifetime). "BadSecureChannelTokenUnknown" errors. Security alerts in server logs. Intermittent authentication failures.
What It Looks Like in JitterTrap
In Top Talkers:
- Connection stable, then drops at regular intervals
- New connection immediately after drop
In TCP RTT chart:
- Traffic pattern normal until sudden termination
- May see RST if renewal fails
In Packet Gap chart:
- Regular pattern interrupted at token expiration time
How to Diagnose
- Note security token lifetime (default 3600000 ms = 1 hour)
- Check if disconnects occur at that interval
- Look for OpenSecureChannel renewal traffic before expiry
- Verify renewal succeeds (no connection drop)
- Capture traffic around token expiration time
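To make the timing concrete, the sketch below computes when renewal traffic should appear and checks whether observed disconnects line up with the token lifetime. The function names, the 60 second tolerance, and the sample drop times are assumptions for illustration. The 75% figure is the renewal point discussed in this section.

```python
def renewal_due_s(token_lifetime_ms: int, renew_fraction: float = 0.75) -> float:
    """Seconds after token issue at which renewal traffic is expected."""
    return token_lifetime_ms / 1000.0 * renew_fraction


def drops_match_token_lifetime(drop_times_s, token_lifetime_ms, tolerance_s=60.0):
    """True if every interval between consecutive drops is within
    `tolerance_s` of the token lifetime."""
    lifetime_s = token_lifetime_ms / 1000.0
    intervals = [b - a for a, b in zip(drop_times_s, drop_times_s[1:])]
    return bool(intervals) and all(
        abs(i - lifetime_s) <= tolerance_s for i in intervals
    )


print(renewal_due_s(3_600_000))  # 2700.0 seconds, i.e. 45 minutes
# Connection drop times in seconds since capture start (made-up values).
print(drops_match_token_lifetime([3605.0, 7210.0, 10812.0], 3_600_000))
```

If the second check is true, look in the capture around the 45 minute mark of each connection for an OpenSecureChannel renewal and whether it succeeded.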
Causes:
- Client not renewing security token before expiry
- Server rejecting renewal (certificate issue)
- Clock skew between client and server
- Network delay causing renewal to arrive late
- Misconfigured token lifetime
Solutions:
- Ensure client renews token at 75% of lifetime (standard practice)
- Synchronize clocks using NTP
- Increase token lifetime for high-latency networks
- Check certificate validity and trust
- Verify security policy compatibility
References: OPC UA Part 4 §5.5 (SecureChannel Services), Part 6 §6.7 (Security)
Large Response Problems
Symptoms: Requests for many nodes fail or timeout. Browse operations incomplete. Historical reads truncated. "BadResponseTooLarge" errors.
What It Looks Like in JitterTrap
In Throughput chart:
- Large burst of traffic for single request
- Possible TCP window issues during burst
In TCP Window chart:
- Window shrinking during large transfer
- Possible Zero Window events
In TCP RTT chart:
- RTT may increase during large transfer
- Indicates buffer/queuing delay
How to Diagnose
- Estimate response size (node count × data size)
- Check if size exceeds MaxMessageSize or MaxChunkCount
- Look for chunked message indicators in capture
- Monitor TCP window behavior during large transfers
- Check if request succeeds with fewer items
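The size estimate in the first two steps can be sketched as follows. The 65,536-byte chunk size and 64-byte per-chunk overhead are assumed values for illustration only: the real chunk size is the buffer size negotiated for the connection, and the overhead depends on the security policy. The function names are hypothetical. A limit of 0 is treated as "no limit", which is how OPC UA expresses unlimited MaxMessageSize and MaxChunkCount.

```python
import math


def chunks_needed(response_bytes, chunk_size_bytes=65536, overhead_bytes=64):
    """Estimate how many chunks a response of the given size needs."""
    body_per_chunk = chunk_size_bytes - overhead_bytes
    return math.ceil(response_bytes / body_per_chunk)


def fits_limits(response_bytes, max_message_size, max_chunk_count,
                chunk_size_bytes=65536, overhead_bytes=64):
    """Check an estimated response against both limits (0 means no limit)."""
    size_ok = max_message_size == 0 or response_bytes <= max_message_size
    chunks = chunks_needed(response_bytes, chunk_size_bytes, overhead_bytes)
    chunks_ok = max_chunk_count == 0 or chunks <= max_chunk_count
    return size_ok and chunks_ok


# 10,000 nodes at roughly 200 bytes each (made-up numbers).
estimate = 10_000 * 200
print(chunks_needed(estimate))                   # 31
print(fits_limits(estimate, 4_194_304, 16))      # False: too many chunks
```

In this example the response fits within a 4 MiB message size limit but needs more than 16 chunks, so the chunk count is the limit to raise or the request is the thing to split.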
Causes:
- Response exceeds configured message size limits
- MaxChunkCount too low for response size
- TCP receive buffer too small
- Network MTU causing fragmentation
- Server truncating due to resource limits
Solutions:
- Increase MaxMessageSize and MaxChunkCount
- Use paging (ContinuationPoint) for large result sets
- Break large requests into smaller batches
- Increase TCP receive buffer size
- Check server resource configuration
References: OPC UA Part 6 §7.1.2 (Message Chunking)
Discovery Issues
Symptoms: Client can't find server. FindServers returns empty. GetEndpoints works from some clients but not others. Server not visible on network.
What It Looks Like in JitterTrap
In Top Talkers:
- Traffic to discovery port (4840) but no response
- Or response received but no subsequent connection
How to Diagnose
- Test connectivity to discovery endpoint (usually same as server)
- Check if Local Discovery Server (LDS) is configured
- Verify GetEndpoints returns valid endpoint URLs
- Check if returned URLs are reachable from client
- Test with UaExpert or a similar tool to isolate the issue
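Checking whether returned endpoint URLs are usable from the client can be partly automated. The sketch below parses an `opc.tcp://` URL and checks that its hostname resolves on the client machine. The function names and the example URL are hypothetical, and obtaining the URLs from GetEndpoints is outside the scope of this snippet.

```python
import socket
from urllib.parse import urlparse


def endpoint_host_port(endpoint_url: str):
    """Extract (hostname, port) from an opc.tcp endpoint URL.
    Falls back to the default port 4840 when none is given."""
    parsed = urlparse(endpoint_url)
    if parsed.scheme != "opc.tcp" or not parsed.hostname:
        raise ValueError(f"not an opc.tcp endpoint URL: {endpoint_url!r}")
    return parsed.hostname, parsed.port or 4840


def resolves(hostname: str) -> bool:
    """True if the client machine can resolve the hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


host, port = endpoint_host_port("opc.tcp://plc-1.example:4840/UA/Server")
print(host, port, resolves(host))
```

A server that advertises its own internal hostname is a common reason GetEndpoints succeeds but the follow-up connection does not: the URL parses fine, yet `resolves` returns False on the client.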
Causes:
- Discovery endpoint not enabled
- Server returning unreachable endpoint URLs (wrong hostname/IP)
- Firewall blocking discovery port
- LDS not running or not registering servers
- Network segmentation preventing discovery
Solutions:
- Enable discovery endpoint on server
- Configure server to return reachable endpoint URLs
- Ensure hostname in endpoint URL resolves correctly
- Configure firewall for OPC UA ports
- Use direct endpoint URL if discovery not required
References: OPC UA Part 4 §5.4 (Discovery Services), Part 12 (Discovery)
Queue Overflow
Symptoms: Data changes missed between publishes. MonitoredItem queue overflow notifications. Historical gaps in trending. Alarm transitions missed.
What It Looks Like in JitterTrap
In Packet Gap chart:
- Normal pattern, but application reports missing data
- Gap represents time where multiple changes occurred
In Throughput chart:
- Traffic rate matches publish interval
- But data content indicates overflow (sequence gaps)
How to Diagnose
- Check MonitoredItem queue size vs expected data change rate
- Calculate: changes per second × publishing interval (in seconds) = minimum queue size
- Look for overflow indicators in OPC UA responses
- Verify sampling interval vs data change rate
- Check DataChangeFilter settings (might be filtering changes)
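The queue size calculation above, written out as a small Python helper. The function name and sample figures are hypothetical. It assumes the sampling interval is short enough to capture every change, and it rounds up so that a fractional result still reserves a whole queue slot.

```python
import math


def required_queue_size(changes_per_second: float,
                        publishing_interval_ms: float) -> int:
    """Minimum queue size needed to hold every change that can occur
    within one publishing interval."""
    changes_per_interval = changes_per_second * publishing_interval_ms / 1000.0
    return max(1, math.ceil(changes_per_interval))


print(required_queue_size(20, 500))    # 10
print(required_queue_size(0.5, 1000))  # 1
```

Treat the result as a lower bound and add headroom for late Publish responses, since a delayed publish means more than one interval of changes must be held.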
Causes:
- Queue size too small for data change rate
- Publishing interval too long for volatile data
- Sampling interval causing aliasing (sampling slower than the data changes)
- Server discarding oldest values (queue policy)
- DataChangeFilter with deadband filtering changes
Solutions:
- Increase queue size to match change rate × publish interval
- Reduce publishing interval for volatile data
- Reduce sampling interval to catch all changes
- Use queue discard policy appropriate for application
- Adjust deadband filter if filtering needed changes
References: OPC UA Part 4 §5.12.1.4 (Queuing)
General Diagnostic Workflow
For any OPC UA issue:
1. Verify TCP connectivity. OPC UA relies on TCP:
   - Can client reach server IP and port?
   - Check for firewall, routing, or DNS issues
   - See TCP Troubleshooting for TCP-specific issues
2. Check session health. Session state is fundamental:
   - Is session established and maintained?
   - Are keepalives (empty Publishes) being sent?
   - Is session timeout appropriate for network conditions?
3. Examine subscription timing. Most data issues are subscription-related:
   - Does publishing interval match requirements?
   - Is sampling interval appropriate for data change rate?
   - Are queue sizes adequate?
4. Monitor the secure channel. Security issues cause mysterious drops:
   - When was token last renewed?
   - Is clock synchronized?
   - Are certificates valid and trusted?
5. Check message sizes. Large requests cause problems:
   - What's the expected response size?
   - Is it within configured limits?
   - Are you using paging for large results?
6. Capture for analysis. Set traps to capture:
   - When session drops (TCP connection close)
   - When gaps exceed publishing interval
   - Before and after expected token renewal
Use Wireshark with OPC UA dissector for protocol analysis.
References
Key Specifications
| Part | Title | Description |
|---|---|---|
| Part 1 | Overview and Concepts | Architecture introduction |
| Part 4 | Services | All OPC UA services defined |
| Part 6 | Mappings | TCP and other transport mappings |
| Part 7 | Profiles | Conformance profiles |
| Part 12 | Discovery | Discovery services and LDS |
| Part 14 | PubSub | Publish-subscribe extension |
Related Resources
- OPC Foundation — Standards and membership
- open62541 — Open-source OPC UA implementation
- Eclipse Milo — Java OPC UA stack
- FreeOpcUa — Python OPC UA
Related
- TCP Troubleshooting — OPC UA runs over TCP
- Network Impairments — Test OPC UA under adverse conditions
- Traps & Capture — Capture session timeout events