TCP Troubleshooting
Common TCP problems and how to identify them with JitterTrap. Each section describes what to look for in the charts and how to diagnose the root cause.
Contents
- Bufferbloat — Latency increases under load
- Receive Window Starvation — Slow receiver limits throughput
- Retransmission Storms — Frequent packet loss
- Head-of-Line Blocking — Stalls from in-order delivery
- Nagle's Algorithm + Delayed ACK — 40ms latency on small writes
- Congestion Window Collapse — Sawtooth throughput pattern
- RTO Stalls — Multi-second stalls after loss
- Silly Window Syndrome — Tiny packets, poor efficiency
- TCP vs UDP — When TCP's guarantees hurt performance
- General Diagnostic Workflow — Step-by-step approach
- References — Key RFCs
Bufferbloat
Symptoms: Latency increases dramatically under load. A connection that shows 20ms RTT when idle may spike to 500ms+ when saturated. Interactive applications become sluggish during bulk transfers.
What It Looks Like in JitterTrap
In the TCP RTT chart:
- Baseline RTT is low (e.g., 20ms) when idle
- RTT climbs steadily as throughput increases
- RTT may reach 500ms or more at full load
- RTT returns to baseline when transfer completes
In the Throughput chart:
- High throughput correlates with high RTT
- The correlation is the signature—RTT tracks throughput
How to Test for Bufferbloat
- Start JitterTrap and establish baseline RTT to a remote host
- Begin a large file transfer (saturate the link)
- Watch the RTT chart; if it climbs from 20ms to 200ms+, you have bufferbloat (a simple command-line probe is sketched after this list)
- Stop the transfer and confirm RTT returns to baseline
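For a quick cross-check outside JitterTrap, the sketch below (a minimal probe, not part of JitterTrap) times repeated TCP connection setups to a host of your choosing while the bulk transfer runs. Connect time approximates one round trip plus queueing delay, so it climbs sharply when buffers are bloated; the host and port are placeholders.

```python
# Minimal RTT probe: time a TCP connect once per second while the link is
# saturated. Pick a nearby host you control so the congested path is the
# one being measured; the address below is a documentation placeholder.
import socket
import time

HOST = "192.0.2.10"   # placeholder (TEST-NET-1), replace with a real host
PORT = 22             # any open TCP port on that host

for _ in range(30):
    start = time.monotonic()
    try:
        with socket.create_connection((HOST, PORT), timeout=5):
            pass
    except OSError as exc:
        print(f"connect failed: {exc}")
    else:
        print(f"connect time ~ {(time.monotonic() - start) * 1000:.1f} ms")
    time.sleep(1)
```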
Causes: Oversized buffers in routers, switches, or host network stacks that allow excessive queuing.
Solutions:
- Enable Active Queue Management (AQM) like fq_codel on routers
- Reduce buffer sizes on network equipment
- Use TCP congestion control algorithms designed for bufferbloat (BBR, CUBIC with ECN)
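On a Linux host, a quick way to confirm which congestion control algorithm and default queueing discipline are active is to read the relevant sysctls; a minimal, Linux-specific sketch:

```python
# Read the sysctls governing TCP congestion control and the default qdisc.
# Values such as "bbr" and "fq_codel" mean the bufferbloat-aware options
# above are in effect. These paths exist only on Linux.
from pathlib import Path

def sysctl(path: str) -> str:
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "unavailable"

print("tcp_congestion_control:", sysctl("/proc/sys/net/ipv4/tcp_congestion_control"))
print("default_qdisc:         ", sysctl("/proc/sys/net/core/default_qdisc"))
```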
References: RFC 7567 (AQM Recommendations), RFC 8289 (CoDel), Bufferbloat.net
Receive Window Starvation
Symptoms: Throughput is limited even though the network has capacity. The receiver can't process data fast enough.
What It Looks Like in JitterTrap
In the TCP Window chart:
- Advertised window drops toward zero
- Zero Window markers (⚠) appear
- Window may oscillate between zero and small values
- Pattern is consistent regardless of RTT
In the Throughput chart:
- Throughput drops when window shrinks
- May see "staircase" pattern as window opens and closes
How to Diagnose
- Watch the TCP Window chart for a suspect flow
- If window drops to zero while throughput also drops, receiver is the bottleneck
- Capture packets during the event
- In Wireshark, look for Window Full and Zero Window events
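If you prefer to scan the capture programmatically rather than in Wireshark, a minimal sketch using the third-party scapy library (the capture filename is a placeholder):

```python
# List zero-window advertisements in a capture file. RST segments are skipped
# because they legitimately carry a zero window. Requires scapy (pip install scapy).
from scapy.all import rdpcap, IP, TCP

for pkt in rdpcap("capture.pcap"):                    # placeholder path
    if IP in pkt and TCP in pkt:
        tcp = pkt[TCP]
        if tcp.window == 0 and "R" not in str(tcp.flags):
            print(f"{float(pkt.time):.6f}  {pkt[IP].src}:{tcp.sport} -> "
                  f"{pkt[IP].dst}:{tcp.dport}  advertised zero window")
```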
Causes: Slow application not reading from socket buffers, or socket receive buffer too small.
Solutions:
- Profile and optimize the receiving application
- Increase the socket receive buffer size (SO_RCVBUF); see the sketch after this list
- Check for application-level backpressure
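A minimal sketch of raising the receive buffer; note that on Linux the kernel clamps the request to net.core.rmem_max and disables receive-buffer autotuning for that socket once SO_RCVBUF is set explicitly:

```python
# Request a larger receive buffer. Set it before connect() so the window
# scale option negotiated in the handshake can accommodate it.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)  # request 4 MiB
# Linux reports roughly double the requested value (bookkeeping overhead).
print("effective SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
# sock.connect((host, port)) would follow here
```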
References: RFC 793 (TCP Flow Control), RFC 7323 (Window Scaling)
Retransmission Storms
Symptoms: Poor throughput despite adequate bandwidth. High CPU usage on endpoints.
What It Looks Like in JitterTrap
In the TCP Window chart:
- Frequent Retransmit markers (↩)
- Markers may be clustered (burst loss) or evenly distributed (steady loss)
- Window size may fluctuate as congestion control reacts
In the TCP RTT chart:
- RTT may spike during retransmission events
- Erratic RTT pattern if loss is causing timeout-based retransmits
- Smoother RTT if fast retransmit (duplicate ACKs) is working
How to Diagnose
- Count retransmit markers over time—occasional is normal, frequent indicates a problem
- Note if retransmits are clustered (burst loss) or distributed (random loss)
- Set a trap to capture packets when retransmits exceed a threshold
- Analyze in Wireshark to determine if loss is at a specific hop
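As a cross-check outside JitterTrap, Linux keeps host-wide TCP counters in /proc/net/snmp; sampling the retransmitted-to-sent segment ratio over a short interval gives a rough sense of how lossy the host's traffic is (Linux-specific sketch):

```python
# Sample the host-wide TCP retransmission rate from /proc/net/snmp.
import time

def tcp_counters() -> dict:
    with open("/proc/net/snmp") as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    # The first "Tcp:" line holds field names, the second holds values.
    return dict(zip(lines[0][1:], map(int, lines[1][1:])))

before = tcp_counters()
time.sleep(10)
after = tcp_counters()

sent = after["OutSegs"] - before["OutSegs"]
retrans = after["RetransSegs"] - before["RetransSegs"]
rate = 100.0 * retrans / sent if sent else 0.0
print(f"{retrans} retransmits / {sent} segments sent = {rate:.2f}% over 10 s")
```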
Causes: Packet loss from congestion, bad links, or MTU issues.
Solutions:
- Identify where loss is occurring (use packet capture)
- Check for duplex mismatches
- Verify MTU is consistent across path
- Look for congested links or failing hardware
References: RFC 5681 (Fast Retransmit), RFC 6298 (RTO Calculation)
Head-of-Line Blocking
Symptoms: Periodic stalls in data delivery even when packets are arriving.
What It Looks Like in JitterTrap
In the Throughput chart:
- Gaps or dips that don't correlate with network congestion
- Throughput returns to normal after brief pause
- Pattern may be periodic if loss tends to hit the same position in the stream (for example, the tail of each burst)
In the TCP Window chart:
- Dup ACK markers during the stall
- Window may remain healthy (receiver has space, just waiting for in-order data)
How to Diagnose
- Look for throughput dips that don't match RTT spikes
- Check for Dup ACK markers (indicate out-of-order arrival)
- If application streams multiple independent data types over one TCP connection, head-of-line blocking is likely
- Capture during a stall to see the out-of-order packets in Wireshark
Causes: TCP's in-order delivery requirement means one lost packet stalls all following data.
Solutions:
- Consider QUIC or other protocols with stream multiplexing
- Use multiple TCP connections for independent data streams
- Reduce RTT to minimize stall duration
References: RFC 793 (In-Order Delivery), RFC 9000 (QUIC)
Nagle's Algorithm + Delayed ACK
Symptoms: Small writes have unexpectedly high latency (often ~40ms).
What It Looks Like in JitterTrap
In the TCP RTT chart:
- Very consistent ~40ms RTT spikes
- The regularity is the key signature—network jitter is random, this is fixed
- Pattern appears on request/response workloads with small messages
In the Throughput chart:
- Low throughput with periodic bursts
- Each burst separated by ~40ms gaps
How to Diagnose
- Look for suspiciously consistent 40ms RTT
- Check if the pattern occurs only with small messages
- Capture packets and look for delayed ACKs (the classic delayed-ACK timer is up to 200ms; Linux typically waits about 40ms)
- Test with TCP_NODELAY to confirm—if RTT drops dramatically, this was the cause
Causes: Nagle's algorithm waits for ACK before sending small packets. Delayed ACK waits ~40ms before acknowledging. Together they create artificial delays.
Solutions:
- Set TCP_NODELAY on latency-sensitive sockets (see the sketch after this list)
- Use TCP_QUICKACK on the receiver
- Batch small writes into larger ones
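A minimal sketch of the first two solutions on a single socket; TCP_QUICKACK is Linux-only and can be cleared again by the kernel, so it is guarded here and usually re-armed after each receive:

```python
# Disable Nagle so small writes go out immediately, and (on Linux) request
# immediate ACKs instead of delayed ones.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

if hasattr(socket, "TCP_QUICKACK"):           # Linux-only option
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)

# sock.connect((host, port)) and the request/response exchange would follow;
# latency-critical receivers typically set TCP_QUICKACK again after each recv().
```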
References: RFC 896 (Nagle's Algorithm), RFC 1122 §4.2.3.2 (Delayed ACK)
Congestion Window Collapse
Symptoms: Throughput drops sharply and recovers slowly after packet loss.
What It Looks Like in JitterTrap
In the Throughput chart:
- Sawtooth pattern: gradual increase, sharp drop, slow recovery
- Each cycle takes several RTTs to recover
- May see multiple cycles during sustained transfer
In the TCP RTT chart:
- RTT increases as congestion builds (bufferbloat)
- Retransmit markers appear
- RTT drops when congestion control backs off
How to Diagnose
- Look for the sawtooth throughput pattern
- Note if RTT spikes precede the throughput drops (bufferbloat triggering loss)
- Time the recovery—slow ramp indicates traditional AIMD congestion control
- Compare behavior with different congestion control algorithms (BBR vs CUBIC)
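To run that comparison from a test client, the algorithm can be selected per socket on Linux (Python 3.6+ exposes TCP_CONGESTION); the chosen module must be loaded and allowed by net.ipv4.tcp_allowed_congestion_control, so treat this as a sketch:

```python
# Select the congestion control algorithm for a single socket (Linux-specific).
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
except OSError as exc:
    print("could not switch to bbr (module loaded? allowed?):", exc)

name = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print("congestion control in use:", name.rstrip(b"\x00").decode())
```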
Causes: TCP's congestion control cuts the sending rate dramatically after detecting loss.
Solutions:
- Reduce packet loss (the real fix)
- Consider BBR congestion control for lossy links
- Use ECN to get early congestion signals before loss occurs
References: RFC 5681 (Congestion Control), RFC 8312 (CUBIC), RFC 3168 (ECN)
Retransmission Timeout (RTO) Stalls
Symptoms: Long stalls (1-3+ seconds) followed by a burst of activity. Much worse than typical packet loss recovery.
What It Looks Like in JitterTrap
In the TCP RTT chart:
- Gaps of 1+ seconds with no data
- Multiple Retransmit markers (↩) clustered after the gap
- Pattern: silence, then burst of retransmits, then recovery
In the Throughput chart:
- Complete stop, then sudden burst
- Much longer pause than normal retransmission
How to Diagnose
- Time the stall duration—1+ seconds indicates RTO, not fast retransmit
- Check if retransmits cluster after the gap (RTO fired)
- Look for patterns—tail loss (end of burst) often triggers RTO
- Capture packets and check if fast retransmit (3 dup ACKs) failed
Causes: When fast retransmit (3 duplicate ACKs) fails, TCP falls back to RTO-based recovery. The minimum RTO is often 200ms-1s, and it doubles with each failed attempt (exponential backoff). A lost retransmit can cause multi-second stalls.
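To see why this escalates so quickly, a small worked example of the doubling, assuming a 200ms minimum RTO (a common Linux default):

```python
# Exponential backoff of the retransmission timer across consecutive losses.
rto_ms = 200.0      # assumed minimum RTO; RFC 6298 recommends 1 s, Linux commonly uses 200 ms
stall_ms = 0.0
for attempt in range(1, 6):
    stall_ms += rto_ms
    print(f"retransmission {attempt}: wait {rto_ms:.0f} ms  (total stall {stall_ms / 1000:.1f} s)")
    rto_ms *= 2     # timer doubles after each unsuccessful retransmission
```

Five consecutive losses already amount to more than six seconds of stall, which matches the multi-second pauses described above.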
Solutions:
- Investigate why fast retransmit is failing (tail loss, small windows)
- Enable TLP (Tail Loss Probe) and RACK if available
- For latency-sensitive applications, these stalls may be unacceptable—consider UDP
References: RFC 6298 (RTO Calculation), RFC 5681 §3.2 (Fast Retransmit)
Silly Window Syndrome
Symptoms: High packet rate but low throughput. Lots of small packets instead of full-sized segments.
What It Looks Like in JitterTrap
In the TCP Window chart:
- Very small advertised window values (bytes, not KB)
- Window may oscillate between tiny values
In the Top Talkers:
- High packet count relative to byte count
- Throughput is a fraction of expected
How to Diagnose
- Compare packet rate to byte rate; if the packet rate is high but throughput is low, the segments are small (a quick arithmetic check follows this list)
- Check TCP Window for tiny values
- Look for recovery pattern after window starvation
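The packet-rate-to-byte-rate comparison above boils down to average segment size; a tiny illustration with made-up numbers:

```python
# Average segment size = bytes per second / packets per second.
# The figures below are illustrative, not real measurements.
packets_per_sec = 4_200
bytes_per_sec = 340_000

avg_segment = bytes_per_sec / packets_per_sec
print(f"average segment ~ {avg_segment:.0f} bytes")   # ~81 bytes, far below a ~1460-byte MSS
```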
Causes: Receiver advertises tiny windows (e.g., after window starvation recovery). Sender sends tiny segments to fill the advertised window. Overhead dominates.
Solutions:
- Most TCP stacks have SWS avoidance built in
- If you're seeing this, check for broken or embedded TCP implementations
- Increase receive buffer sizes
References: RFC 813 (Window and Acknowledgement Strategy), RFC 1122 §4.2.3.4 (SWS Avoidance)
TCP vs UDP: When TCP Hurts
TCP is designed for reliable, ordered delivery of bulk data. These guarantees come at a cost that's often invisible until you look closely:
| TCP Behavior | Cost for Real-Time Systems |
|---|---|
| Guaranteed delivery | Stalls waiting for retransmits of data that may no longer be relevant |
| In-order delivery | Head-of-line blocking—one lost packet blocks everything behind it |
| Congestion control | Throughput collapse after loss; slow recovery; competing flows affect each other |
| Connection establishment | 1.5 RTT before first data byte; connection state on both ends |
| Flow control | Slow receiver blocks fast sender, even if data could be dropped |
Consider UDP when:
- You can tolerate some loss
- Need lowest latency
- Data has a "freshness" deadline
- You want application-level control over retransmission decisions
Examples: VoIP, video conferencing, gaming, live telemetry, sensor data, financial trading, DNS.
General Diagnostic Workflow
For any TCP performance issue:
- Establish baseline — Observe charts during normal operation. Know what "good" looks like.
- Identify the flow — Use Top Talkers to find the specific connection with issues.
- Check RTT first — High or variable RTT affects almost everything else.
  - High RTT → check for bufferbloat, long paths, or congestion
  - Variable RTT → check for jitter, route changes, or competing traffic
- Check the Window — If throughput is limited but RTT is reasonable:
  - Small window → receiver issue (application not reading, buffer too small)
  - Window collapse → congestion control reacting to loss
- Look for markers — Retransmit (↩) and Zero Window (⚠) markers tell you what's happening:
  - Many retransmits → packet loss problem
  - Zero window → receiver backpressure
- Correlate events — The most useful insights come from correlating multiple charts:
  - RTT spike + throughput drop → bufferbloat
  - Window drop + throughput drop → receiver starvation
  - Retransmit + throughput drop → packet loss
- Capture packets — Set traps to automatically capture when thresholds are exceeded. Analyze in Wireshark for definitive diagnosis.
References
Key RFCs
| RFC | Title |
|---|---|
| RFC 793 | Transmission Control Protocol |
| RFC 896 | Congestion Control in IP/TCP (Nagle) |
| RFC 1122 | Requirements for Internet Hosts |
| RFC 3168 | Explicit Congestion Notification |
| RFC 5681 | TCP Congestion Control |
| RFC 6298 | Computing TCP's Retransmission Timer |
| RFC 7323 | TCP Extensions for High Performance |
| RFC 7567 | IETF Recommendations Regarding AQM |
| RFC 8312 | CUBIC Congestion Control |
| RFC 9000 | QUIC Transport Protocol |
Related
- Media Streaming — How these problems affect streaming applications
- Network Impairments — Test how your application handles these conditions