TCP Window Auto-Tuning in Windows 8.1
Many of our clients send large files over the network as part of their day-to-day business. In Windows, this can be slowed down by "network auto-tuning". If file transfers over the network are slow and you have already tried everything else, such as updating the NIC driver, remapping drives, checking for interfering software, checking the router, and flashing the BIOS, you might try the steps below.
This command changes the custom TCP setting to use a value of 64 for the initial congestion window and to use "Compound TCP" (an advanced TCP congestion control algorithm, similar to CUBIC on Linux). The InitialCongestionWindowMss parameter specifies the initial size of the congestion window, in units of MSS, and can be an even number from 2 through 64.
There is no way to adjust the default TCP buffer in Vista/7, which is 64 KB. Also, the Windows Vista/7 autotuning algorithm is not used unless the RTT is greater than 1 ms, so single stream TCP will be throttled on a LAN by this small default TCP buffer.
Before Windows Server 2008, the network stack used a fixed-size receive-side window. Starting with Windows Server 2008, Windows provides TCP receive window auto-tuning. The registry keywords TcpWindowSize, NumTcbTablePartitions, and MaxHashTableSize are ignored starting with Windows Server 2008.
TCP receive window size (RWIN) is the amount of received data (in bytes) that can be buffered during a connection. According to Wikipedia, the sending host can send only up to that amount of data before it must wait for an acknowledgment and window update from the receiving host. When a receiver advertises a window size of 0, the sender stops sending data and starts the persist timer. The persist timer protects TCP from a deadlock that can arise when a window size update from the receiver is lost: the receiver has no more data to send, while the sender is still waiting for the new window size. When the persist timer expires, the TCP sender sends a small packet so that the receiver ACKs it with the new window size, and TCP can recover.
The TCP window size field controls the flow of data and is limited to between 2 and 65,535 bytes. Because the field in the TCP header is only 16 bits wide, it cannot be made any larger. Instead, a scaling factor is used to get a larger TCP receive window and make more efficient use of high-bandwidth networks. The TCP window scale option increases the maximum window size from 65,535 bytes to 1 gigabyte. Scaling up to larger window sizes is part of what is necessary for TCP tuning. The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field, and can be set from 0 (no shift) to 14.
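The left-shift described above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not of any Windows API; the function name effective_window is made up for illustration.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value the 16-bit window field can carry
MAX_SHIFT = 14             # largest window scale value allowed


def effective_window(window_field: int, shift: int) -> int:
    """Return the receive window in bytes after applying the window scale shift."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


# No scaling: the window is capped at 65,535 bytes.
print(effective_window(65535, 0))
# Maximum scaling: just under 1 GiB (2**30 bytes).
print(effective_window(65535, 14))
```

With a shift of 14 the largest window is 65,535 × 16,384 bytes, which is where the "1 gigabyte" figure comes from.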
netsh interface tcp set global autotuninglevel=highlyrestricted
Allow the receive window to grow beyond the default value, but do so very conservatively. In this mode, Windows will by default use an RWIN of 16,384 bytes with a scale factor of 2.
netsh interface tcp set global autotuninglevel=normal
Allow the receive window to grow to accommodate almost all scenarios. This is the default setting in Windows, so running this command turns auto-tuning back on.
netsh interface tcp set global autotuninglevel=experimental
Allow the receive window to grow to accommodate extreme scenarios. Note: the experimental value can decrease performance in common scenarios and should be used only for research purposes.
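If you need to script these commands, a small wrapper along the following lines may help. It is a minimal sketch: the function names are hypothetical, and the "disabled" and "restricted" levels are taken from the netsh documentation rather than from the text above. Actually applying a level requires Windows and an elevated prompt.

```python
import subprocess
from typing import List

# Levels accepted by "netsh interface tcp set global autotuninglevel=...".
AUTOTUNING_LEVELS = (
    "disabled",
    "highlyrestricted",
    "restricted",
    "normal",
    "experimental",
)


def autotuning_command(level: str) -> List[str]:
    """Build the netsh argument list for a given auto-tuning level."""
    level = level.strip().lower()
    if level not in AUTOTUNING_LEVELS:
        raise ValueError(f"unknown autotuning level: {level!r}")
    return ["netsh", "interface", "tcp", "set", "global",
            f"autotuninglevel={level}"]


def apply_autotuning(level: str) -> None:
    """Run the command; raises CalledProcessError if netsh reports failure."""
    subprocess.run(autotuning_command(level), check=True)


# Building the command is safe on any platform; nothing is executed here.
print(" ".join(autotuning_command("normal")))
```

Validating the level before calling netsh keeps a typo from reaching the command line.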
Clearly the link can sustain this high throughput, but I have to explicitly set the window size to make any use of it, which most real-world applications won't let me do. The TCP handshakes use the same starting points in each case, but the forced one scales
Without any forcing, it scales as expected. This can't be something in the intervening hops or our local switches/routers, and it seems to affect Windows 7 and 8 clients alike. I've read lots of guides on auto-tuning, but these are typically about disabling scaling altogether to work around terrible home networking kit.
This is the first second of the 1MB capture, zoomed in. You can see Slow Start in action as the window scales up and the buffer gets bigger. There's then this tiny plateau of 0.2s exactly at the point that the default window iperf test flattens out forever. This one of course scales to much dizzier heights, but it's curious that there's this pause in the scaling (values are 1022 bytes * 512 = 523264) before it does so.
No change following disabling heuristics and RWIN autotuning. Have updated the Intel network drivers to the latest (12.10.28.0) with software that exposes functionality tweaks via Device Manager tabs. The card is an 82579V chipset on-board NIC (I'm going to do some more testing from clients with Realtek or other vendors).
8MB/s puts it up at the levels I was getting with explicitly large windows in iperf. Oddly, though, 80MB in 1273 buffers = a 64kB buffer again. A further Wireshark capture shows a good, variable RWIN coming back from the server (scale factor 256) that the client seems to fulfil, so perhaps ntttcp is misreporting the send window.
The existing algorithms that prevent a sending TCP peer from overwhelming the network are known as slow start and congestion avoidance. These algorithms increase the number of segments that the sender can send, known as the send window, when initially sending data on the connection and when recovering from a lost segment. Slow start increases the send window by one full TCP segment for either each acknowledgement segment received (for TCP in Windows XP and Windows Server 2003) or for each segment acknowledged (for TCP in Windows Vista and Windows Server 2008). Congestion avoidance increases the send window by one full TCP segment for each full window of data that is acknowledged.
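To get a feel for how differently the two phases grow, here is a deliberately idealized sketch. It assumes no loss and no delayed ACKs, treats slow start as doubling the send window once per round trip (one extra segment per segment acknowledged), and treats congestion avoidance as adding one segment per round trip. The switch-over point, called ssthresh here, is the standard TCP term but is not named in the text above, and the function name is hypothetical.

```python
def rtts_to_reach(target_segments: int, ssthresh: int, initial_cwnd: int = 1) -> int:
    """Count round trips until the send window reaches target_segments.

    Below ssthresh the window doubles each round trip (slow start);
    at or above it, the window grows by one segment per round trip
    (congestion avoidance).
    """
    if initial_cwnd < 1:
        raise ValueError("initial_cwnd must be at least 1 segment")
    cwnd = initial_cwnd
    rtts = 0
    while cwnd < target_segments:
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start, capped at ssthresh
        else:
            cwnd += 1                       # congestion avoidance
        rtts += 1
    return rtts


# Slow start all the way to 64 segments.
print(rtts_to_reach(64, ssthresh=64))
# Slow start to 32 segments, then linear growth for the remaining 32.
print(rtts_to_reach(64, ssthresh=32))
```

Under these assumptions the linear phase dominates as soon as the target window is much larger than ssthresh.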
These algorithms work well for LAN media speeds and smaller TCP window sizes. However, when you have a TCP connection with a large receive window size and a large bandwidth-delay product (high bandwidth and high delay), such as replicating data between two servers located across a high-speed WAN link with a 100 ms round trip time, these algorithms do not increase the send window fast enough to fully utilize the bandwidth of the connection. For example, on a 1 Gigabit per second (Gbps) WAN link with a 100 ms round trip time (RTT), it can take up to an hour for the send window to initially increase to the large window size being advertised by the receiver and to recover when there are lost segments.
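The bandwidth-delay product for the 1 Gbps, 100 ms example can be worked out directly. The sketch below uses integer arithmetic to avoid rounding noise; the 1,460-byte segment size is an assumption (a typical Ethernet MSS), not a figure from the text, and the function names are made up for illustration.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits per second times RTT, divided by 8."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def segments_in_flight(bdp: int, mss: int = 1460) -> int:
    """Full-size segments needed to fill the pipe (MSS of 1460 is an assumption)."""
    return -(-bdp // mss)  # ceiling division


bdp = bdp_bytes(1_000_000_000, 100)
print(bdp)                      # bytes that must be in flight to fill the link
print(segments_in_flight(bdp))  # the same figure in full-size segments
print(bdp / 65535)              # how many unscaled 64 KB windows that equals
```

For this link the sender needs about 12.5 MB outstanding, nearly 200 times what an unscaled 65,535-byte window allows.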
To better utilize the bandwidth of TCP connections in these situations, the Next Generation TCP/IP stack includes Compound TCP (CTCP). CTCP more aggressively increases the send window for connections with large receive window sizes and large bandwidth-delay products. CTCP attempts to maximize throughput on these types of connections by monitoring delay variations and losses. CTCP also ensures that its behavior does not negatively impact other TCP connections.
My analysis is that the sender isn't sending fast enough because the send window (aka the congestion control window) isn't opening up enough to satisfy the RWIN of the receiver. So in short the receiver says "Give me More", and when Windows is the sender it isn't sending fast enough.
I don't know. netsh interface tcp set global congestionprovider=ctcp sounds like the right thing to do to me because it would increase the send window (which is another term for the congestion window). You said that it isn't working. So just to make sure:
There's been some great info here by @Pat and @Kyle. Definitely pay attention to @Kyle's explanation of the TCP receive and send windows; I think there has been some confusion around that. To confuse matters further, iperf uses the term "TCP window" with the -w setting, which is kind of an ambiguous term with regard to the receive, send, or overall sliding window. What it actually does is set the socket send buffer for the -c (client) instance and the socket receive buffer on the -s (server) instance. In src/tcp_window_size.c:
When sending data over a TCP connection using Windows sockets, it is important to keep a sufficient amount of data outstanding (sent but not acknowledged yet) in TCP in order to achieve the highest throughput. The ideal value for the amount of data outstanding to achieve the best throughput for the TCP connection is called the ideal send backlog (ISB) size. The ISB value is a function of the bandwidth-delay product of the TCP connection and the receiver's advertised receive window (and partly the amount of congestion in the network).
The average throughput of your most recent iperf test using the 64k window is 5.8Mbps. That's from Statistics > Summary in Wireshark, which counts all the bits. Likely, iperf is counting TCP data throughput which is 5.7Mbps. We see the same performance with the FTP test as well, 5.6Mbps.
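A quick sanity check on these numbers: a single TCP stream cannot move more than one window of data per round trip, so throughput is bounded by window size divided by RTT. The sketch below assumes that "64k" means 65,536 bytes and that the window is the only limit. The RTT is not stated above, so the figure it derives is an estimate, and the function names are made up.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput for one stream: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def implied_rtt_seconds(window_bytes: int, throughput_bps: float) -> float:
    """RTT at which the given window would cap throughput at the observed rate."""
    return window_bytes * 8 / throughput_bps


window = 64 * 1024  # assumption: "64k" means 65,536 bytes

# RTT that would make a 64 KiB window top out at the measured 5.8 Mbps.
print(implied_rtt_seconds(window, 5.8e6))
# Ceiling for the same window over a 100 ms path.
print(max_throughput_bps(window, 0.100))
```

The implied RTT works out to roughly 90 ms, which is the kind of round-trip time you would expect on a long WAN path.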
What this demonstrates is that the send socket buffer directly controls the send window and that, coupled with the receive window from the other side, controls throughput. The advertised receive window has room, so we're not limited by the receiver.
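For applications you control, the socket buffers that iperf adjusts with -w can be set directly. The sketch below uses Python's standard socket module; the helper names are hypothetical, and the operating system is free to round, double, or cap the requested sizes, so the values are read back rather than assumed.

```python
import socket
from typing import Optional, Tuple


def make_tcp_socket(send_buf: Optional[int] = None,
                    recv_buf: Optional[int] = None) -> socket.socket:
    """Create a TCP socket, optionally requesting send/receive buffer sizes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if send_buf is not None:
        # Sender side: limits how much unacknowledged data can be outstanding.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buf)
    if recv_buf is not None:
        # Receiver side: influences the receive window that gets advertised.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_buf)
    return sock


def buffer_sizes(sock: socket.socket) -> Tuple[int, int]:
    """Return the (send, receive) buffer sizes the OS actually granted."""
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))


sock = make_tcp_socket(send_buf=1024 * 1024)
print(buffer_sizes(sock))  # the granted sizes may differ from the request
sock.close()
```

Set the buffers before connecting or listening, because the window scale factor is negotiated during the handshake.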