On Tue, Mar 05, 2002 at 08:42:22AM +0200, Hank Nussbacher wrote:
> New 12.2(8)T feature in Cisco IOS called TCP Window Scaling: http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122newft/122...
> Specifically made for satellite networks: ip tcp window-size 750000
A) This is quite old; it's part of RFC 1323. From the command history:

   Release    Modification
   9.1        This command was introduced.
   12.2(8)T   Default window size and maximum window scaling
              factor were increased.

B) This only applies to connections to and from the router itself, not the traffic it routes.

You should also beware of turning up TCP window settings to whatever big number you feel like. I can only vouch for unix systems here, but the way the socket interface and kernel TCP work, a buffer big enough to hold all data in flight must be maintained in the kernel for every connection. That data cannot be released until it has been ACK'd (in case TCP needs to retransmit it), which is generally the limiting factor on TCP window sizes.

Say for example you turn up your socket buffers to 1MB and enable the RFC 1323 extensions (window scaling is the one you care about; it's basically just a multiplier so you can advertise windows bigger than 65535; a 1MB window needs a scale factor of at least 5, since 1,048,576 >> 5 = 32,768 fits in the 16-bit field). While this does let you keep a whole lot of data in flight, it also makes your system quite unstable. Consider the case where you are transferring a large file to a slow host: you will immediately fill the 1MB kernel buffer (the write() on the socket goes into that first, and the userland program has no way of knowing whether it is talking to a fast host or just a big kernel buffer, so it will misreport speed). Open a few more connections like that and you've exhausted your kernel memory and will most likely panic. If you used these settings on a web server, all it would take is a few dialups trying to download a big file before you go boom.

Barring malicious activity on the part of the remote host, it is generally safe to turn up your RECEIVE socket buffer to a REASONABLY beefy number. Your performance may ultimately be limited by the settings on the sending side, but you're still likely to improve performance with someone. If you're wondering what "reasonable" is, you can calculate what must be kept in flight from the bandwidth * delay product (a short C sketch of this calculation follows below):

   A slow ethernet pipe to a host that is fairly close by:
      10Mbit * 10ms = 13,107 bytes
   A fast ethernet pipe to a host that is fairly close by:
      100Mbit * 10ms = 131,072 bytes
   A fast ethernet pipe to a host on the opposite side of the US,
   via someone's drunken fiber path:
      100Mbit * 100ms = 1,310,720 bytes
   A satellite link:
      5Mbit * 800ms = 524,288 bytes

For best results, multiply that number by a minimum of 2 so TCP can do error recovery without destroying your windowing. Multiplying by 3 is the most you would want; more than that is unnecessary and wasteful.

If you're looking for something you can do as a server to improve responsiveness over long fat networks, the simplest and safest is to turn up the slowstart multiplier (under FreeBSD it's the sysctl net.inet.tcp.slowstart_flightsize). This skips slowstart ahead just a bit, optimizing for a certain target audience (say 56k modems), while people with 300 baud modems will just have to drop some packets and back off. Ramping slowstart up from 1 segment can be very painful if your delay is extreme.

While I'm on the subject, I'm not certain whether Cisco's BGP is linked to the "ip tcp" settings or tunes itself, but that is a potential win for a peer over an LFN if it does not. Anyone want to comment?

And just for theory's sake: the correct way to fix the whole socket buffer mess is to automatically tune the buffers based on feedback from the congestion window.
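To put some code behind the bandwidth * delay numbers above, here's a minimal sketch of sizing a receive buffer from them. This is hypothetical illustration, not from any particular stack; bdp_bytes is my own helper, and it uses the same 1 Mbit = 2^20 bits convention as the figures above:

    /* sketch: size SO_RCVBUF from the bandwidth * delay product */
    #include <stdio.h>
    #include <sys/socket.h>

    /* bytes in flight: bandwidth (Mbit/s, 1 Mbit = 2^20 bits) * delay (ms) */
    static long bdp_bytes(double mbit, double delay_ms)
    {
        return (long)(mbit * (1 << 20) * (delay_ms / 1000.0) / 8.0);
    }

    int main(void)
    {
        long bdp = bdp_bytes(5.0, 800.0);   /* satellite case: 524,288 bytes */
        int rcvbuf = (int)(2 * bdp);        /* x2 headroom for loss recovery */

        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s >= 0)
            /* set before connect()/listen() so a large enough window
               scale factor gets negotiated during the handshake */
            setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

        printf("BDP %ld bytes, SO_RCVBUF %d bytes\n", bdp, rcvbuf);
        return 0;
    }

Note this only bumps the buffer for one socket; what the kernel actually grants is still capped by the system-wide maximums (kern.ipc.maxsockbuf on FreeBSD, for instance).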
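And a rough sketch of that autotuning idea, purely illustrative (the struct, names, and doubling policy here are my own invention, not PSC's actual code):

    #include <stdio.h>

    #define SB_MAX (1024 * 1024)  /* hard ceiling so one peer can't eat the kernel */

    struct conn {
        unsigned int snd_cwnd;    /* congestion window, bytes */
        unsigned int sb_hiwat;    /* send buffer high-water mark, bytes */
    };

    /* if cwnd has grown to within 3/4 of the buffer cap, the buffer
       (not the network) is about to become the bottleneck: double it */
    static void autotune_sndbuf(struct conn *c)
    {
        if (c->snd_cwnd >= c->sb_hiwat / 4 * 3 && c->sb_hiwat < SB_MAX) {
            unsigned int next = c->sb_hiwat * 2;
            c->sb_hiwat = next > SB_MAX ? SB_MAX : next;
        }
    }

    int main(void)
    {
        struct conn c = { 16 * 1024, 32 * 1024 };
        int i;
        for (i = 0; i < 8; i++) {
            c.snd_cwnd *= 2;              /* pretend the path keeps opening up */
            if (c.snd_cwnd > c.sb_hiwat)
                c.snd_cwnd = c.sb_hiwat;  /* the buffer caps the usable window */
            autotune_sndbuf(&c);
            printf("cwnd=%u sb_hiwat=%u\n", c.snd_cwnd, c.sb_hiwat);
        }
        return 0;
    }

The point is just that the buffer grows only when the congestion window proves the path can actually use it, instead of a static worst-case guess for every connection.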
PSC (http://www.psc.edu/networking/auto.html) has an implementation for NetBSD, but it goes a tiny bit overboard, doing nutty things like scanning the entire TCB list twice a second to try to achieve "fairness". Minus that part, however, it is actually pretty darn simple to implement (at least on BSD, where the socket buffers aren't allocated buffers at all, simply numbers which cap the maximum that can be allocated when data comes in).

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)