melevittfl
20-10-2004, 15:06
Over the last couple of days, I've been having serious performance problems trying to reach certain sites.
This occurred just after I had a short outage of about 15 minutes.
Here's what I think has happened:
During the outage, NTL installed a new router, proxy, traffic shaper, or some other bit of kit.
This new NTL equipment does not properly implement RFC 1323 (http://www.faqs.org/rfcs/rfc1323.html).
RFC 1323 defines a way for two machines to use a large TCP window size. Originally, the TCP window size was limited to 64 KB, because the window field in the TCP header is only 16 bits wide. This is too small to use the bandwidth of more modern networks efficiently. So RFC 1323 defines a standard by which connections can be set up with a window "scaling" factor.
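To put a number on "too small": a TCP sender can have at most one window of unacknowledged data in flight per round trip, so a single connection's throughput is capped at window / RTT. Here's a rough Python sketch of that arithmetic; the 100 ms round-trip time and 10 Mbit/s target are illustrative assumptions, not measurements:

```python
# Window-limited TCP throughput: a sender can have at most one receive
# window of unacknowledged data in flight per round trip, so
# throughput <= window / RTT.

MAX_UNSCALED_WINDOW = 65535  # largest value a 16-bit window field can hold


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Throughput ceiling in bits per second for a given window and RTT."""
    return window_bytes * 8 / rtt_seconds


def window_needed(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return int(bandwidth_bps * rtt_seconds / 8)


if __name__ == "__main__":
    rtt = 0.1  # assumed 100 ms round trip, purely illustrative
    ceiling = max_throughput_bps(MAX_UNSCALED_WINDOW, rtt)
    print(f"64 KB window at 100 ms RTT: at most {ceiling / 1e6:.2f} Mbit/s")
    print(f"Window needed for 10 Mbit/s: {window_needed(10e6, rtt)} bytes")
```

With those assumed figures the unscaled window caps a connection at roughly 5.2 Mbit/s, while keeping a 10 Mbit/s path full needs about 125,000 bytes in flight, more than the 16-bit field can express.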
RFC 1323 uses a TCP option to define a "window scaling factor". From lwn.net:
"...a system wanting to use window scaling sets a TCP option containing an eight-bit scale factor. All window values used by that system thereafter should be left-shifted by that scale factor; a window scale of zero, thus, implies no scaling at all, while a scale factor of five implies that window sizes should be shifted five bits, or multiplied by 32. With this scheme, a 128KB window could be expressed by setting the scale factor to five and putting 4096 in the window field."
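A minimal Python sketch of what the quote describes, assuming the option layout from RFC 1323 (kind 3, length 3, then a one-byte shift count). It reproduces the quote's example: a 128 KB window with a shift of five goes on the wire as 4096. The function names are made up for illustration:

```python
# RFC 1323 Window Scale option: kind = 3, length = 3, then a one-byte
# shift count. After the handshake, each side left-shifts the 16-bit
# window field it receives by the peer's shift count.

WSCALE_KIND = 3
MAX_SHIFT = 14  # RFC 1323 limits the shift count to 14


def encode_wscale(shift: int) -> bytes:
    """Build the three-byte Window Scale option for a SYN segment."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return bytes([WSCALE_KIND, 3, shift])


def to_header_field(window_bytes: int, shift: int) -> int:
    """Value the sender puts in the 16-bit window field (right shift)."""
    field = window_bytes >> shift
    if field > 0xFFFF:
        raise ValueError("window does not fit in 16 bits with this shift")
    return field


def from_header_field(field: int, shift: int) -> int:
    """Window the receiver reconstructs from the header field (left shift)."""
    return field << shift


if __name__ == "__main__":
    field = to_header_field(128 * 1024, 5)
    print(encode_wscale(5).hex(), field, from_header_field(field, 5))
```

Running it prints `030305 4096 131072`: the option bytes, the value in the header, and the 128 KB window the other end reconstructs.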
The problem is that some network devices (specifically whichever bit of kit NTL just added in the Reading area) incorrectly leave the window scale option in the TCP header but reset its value to zero. The other end of the connection sees the option present, so it agrees to use window scaling. However, the initiating machine thinks the window scale in use is, say, five, while the other end sees it as zero because of the broken router. Every window value the initiating machine advertises is therefore read as unscaled, i.e. 32 times smaller than intended, so the other end sends data far more slowly than the link allows.
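To make the failure concrete, here's a hypothetical Python sketch of what such a device does to the options in a SYN: it walks the option list, finds the window scale option, and zeroes the shift count while leaving the option itself in place, so the other end still sees window scaling offered. The `broken_middlebox` function and the sample option bytes are illustrative assumptions, not a capture from NTL's network, and a real device would also have to fix up the TCP checksum, which this ignores:

```python
from typing import Optional

# TCP option kinds used below (End of Option List, No-Operation, Window Scale)
EOL, NOP, WSCALE = 0, 1, 3


def wscale_offset(options: bytes) -> Optional[int]:
    """Index of the Window Scale shift-count byte in a TCP options block, or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == EOL:          # End of Option List: stop
            return None
        if kind == NOP:          # No-Operation is one byte with no length field
            i += 1
            continue
        if i + 1 >= len(options):
            return None          # truncated option
        length = options[i + 1]
        if length < 2:
            return None          # malformed length; give up rather than loop
        if kind == WSCALE and length == 3 and i + 2 < len(options):
            return i + 2
        i += length
    return None


def find_wscale(options: bytes) -> Optional[int]:
    """Shift count advertised in the options, or None if the option is absent."""
    pos = wscale_offset(options)
    return None if pos is None else options[pos]


def broken_middlebox(options: bytes) -> bytes:
    """Hypothetical faulty device: keeps the option but zeroes the shift count."""
    out = bytearray(options)
    pos = wscale_offset(options)
    if pos is not None:
        out[pos] = 0
    return bytes(out)


if __name__ == "__main__":
    # Illustrative SYN options: MSS 1460, SACK-permitted, timestamps, NOP, wscale 7
    syn = bytes([2, 4, 0x05, 0xB4, 4, 2, 8, 10, 0, 0, 0, 1, 0, 0, 0, 0, 1, 3, 3, 7])
    print("sent:", find_wscale(syn), "received:", find_wscale(broken_middlebox(syn)))
```

It prints `sent: 7 received: 0`. The option is still there after the rewrite, which is exactly why the far end happily agrees to scaling.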
In the more recent Linux kernels (2.6.8 and above, I think), the default window scale is 7. So what's happening is that Linux sets the window scale in the TCP options to 7 and some router in NTL's network resets it to zero (which is a violation of the rules of TCP, BTW). You wouldn't see this on a Windows system because Windows doesn't ask for window scaling by default. More details here:
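A rough Python sketch of what that does to the window the remote server sees. The shift of 7 comes from the paragraph above; the 256 KB receive window and 100 ms round trip are assumptions picked for illustration, and a real stack changes its advertised window as the connection runs:

```python
from typing import Tuple

SENT_SHIFT = 7  # default the post reports for Linux 2.6.8
SEEN_SHIFT = 0  # what the peer sees after the option is zeroed in transit


def window_mismatch(receive_window: int, sent_shift: int, seen_shift: int) -> Tuple[int, int]:
    """Return (window Linux means, window the peer believes), in bytes."""
    field = receive_window >> sent_shift  # 16-bit value Linux puts in the header
    if field > 0xFFFF:
        raise ValueError("window does not fit in 16 bits with this shift")
    return field << sent_shift, field << seen_shift


if __name__ == "__main__":
    meant, believed = window_mismatch(256 * 1024, SENT_SHIFT, SEEN_SHIFT)
    rtt = 0.1  # assumed 100 ms round trip
    print(f"Linux means {meant} bytes; peer believes {believed} bytes "
          f"({meant // believed}x smaller)")
    print(f"Peer's send ceiling: {believed * 8 / rtt / 1000:.0f} kbit/s")
```

With those numbers Linux means 262,144 bytes but the server believes it may only have 2,048 bytes in flight, 128 times less, which limits it to roughly 164 kbit/s per connection regardless of the line speed.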
http://lwn.net/Articles/92727/
[EDIT]: I'm pretty sure this is the problem because if I change the TCP stack to use a value of "zero" for the window scale, the sites that were slow are suddenly speedy. If I change the window scale back to the default of 7, the sites slow down again.
Now, having explained all of that, I can't imagine a way to get that across to anyone who'd answer the phone at NTL. So, if anyone on this board knows a way to get this info to someone who can actually fix the problem, that would be great.
Thanks.