RE: UUNET Routing issues
Are the tickets closed yet? I'm tempted to call in and see if I can get a grasp of the scope and nature of the problem. But maybe it would be best if someone simply posted a brief summary of what is publicly known about the issue... to be followed by reasonable speculation peppered with some wild speculation. Seems to be a relevant topic of mild interest to the majority of the list, if in fact the problem is impacting multiple UUNET locations.

-----Original Message-----
From: Eric Whitehill [mailto:eric@botbay.net]
Sent: Thursday, October 03, 2002 10:19 AM
To: Patrick_McAllister@WASHGAS.COM
Cc: Manolo Hernandez; Nanog; owner-nanog@merit.edu
Subject: Re: UUNET Routing issues

For T-1 customers, the master ticket number is 651744. For customers with DS/OC gear, the master ticket number is 651751. I came to this information after calling their NOC and asking. :)

-Eric
I'm tempted to call in and see if I can get a grasp of the scope and nature of the problem. But maybe it would be best if someone simply posted a brief summary of what is publicly known about the issue... to be followed by reasonable speculation peppered with some wild speculation.
So far we've received notification of this from Verisign Global Registry, Verisign Payment Services, Genuity Customer Care, NANOG, MSNBC, F.C., and it's been mentioned on a few Web sites.

There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now it's improved to 1000ms.

 9  0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21)  24.466 ms  24.311 ms  24.382 ms
10  0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89)  24.467 ms  24.349 ms  24.454 ms
11  0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50)  1029.484 ms  1049.529 ms  1063.692 ms
12  0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194)  1106.067 ms  1118.102 ms  1132.124 ms

Kevin
Once upon a time, sigma@smx.pair.com <sigma@smx.pair.com> said:
There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now it's improved to 1000ms.
 9  0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21)  24.466 ms  24.311 ms  24.382 ms
10  0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89)  24.467 ms  24.349 ms  24.454 ms
11  0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50)  1029.484 ms  1049.529 ms  1063.692 ms
12  0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194)  1106.067 ms  1118.102 ms  1132.124 ms
We're a UUNet customer (we also have other connections), and we haven't really seen any big problem today. We're connected to Atlanta, and I see:

$ traceroute 152.63.73.21
traceroute to 0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21): 1-30 hops, 38 byte packets
 1  servers.hsvcore.hiwaay.net (208.147.154.33)  0.977 ms  0.977 ms  0.0 ms
 2  500.Serial2-6.GW6.ATL5.ALTER.NET (65.208.82.61)  5.85 ms  5.86 ms  6.83 ms
 3  178.at-6-0-0.XL4.ATL5.ALTER.NET (152.63.82.178)  6.83 ms (ttl=252!)  7.81 ms (ttl=252!)  7.81 ms (ttl=252!)
 4  0.so-2-1-0.TL2.ATL5.ALTER.NET (152.63.85.229)  7.81 ms (ttl=251!)  6.83 ms (ttl=251!)  6.83 ms (ttl=251!)
 5  0.so-5-3-0.TL2.CHI2.ALTER.NET (152.63.13.42)  22.4 ms (ttl=250!)  22.4 ms (ttl=250!)  21.4 ms (ttl=250!)
 6  0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21)  25.3 ms  27.3 ms  24.4 ms

--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.
On Thursday, October 3, 2002, at 04:07 PM, Chris Adams wrote:
Once upon a time, sigma@smx.pair.com <sigma@smx.pair.com> said:
There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now it's improved to 1000ms.
 9  0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21)  24.466 ms  24.311 ms  24.382 ms
10  0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89)  24.467 ms  24.349 ms  24.454 ms
11  0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50)  1029.484 ms  1049.529 ms  1063.692 ms
12  0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194)  1106.067 ms  1118.102 ms  1132.124 ms
We're a UUNet customer (we also have other connections), and we haven't really seen any big problem today. We're connected to Atlanta, and I see: <snip>
We haven't seen anything unusual on our UU circuit in PHX, either.
-- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble.
-- Matt Levine @Home: matt@deliver3.com @Work: matt@eldosales.com ICQ : 17080004 AIM : exile GPG : http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6C0D04CF "The Trouble with doing anything right the first time is that nobody appreciates how difficult it was." -BIX
The only thing I've noticed is high latency between UUNet and Sprint (around 2 second latency) in at least one traffic exchange point between them, maybe more. Probably because of the diversion of traffic on UUNet's network. At 04:30 PM 10/3/2002 -0400, Matt Levine wrote:
On Thursday, October 3, 2002, at 04:07 PM, Chris Adams wrote:
Once upon a time, sigma@smx.pair.com <sigma@smx.pair.com> said:
There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now it's improved to 1000ms.
 9  0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21)  24.466 ms  24.311 ms  24.382 ms
10  0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89)  24.467 ms  24.349 ms  24.454 ms
11  0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50)  1029.484 ms  1049.529 ms  1063.692 ms
12  0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194)  1106.067 ms  1118.102 ms  1132.124 ms
We're a UUNet customer (we also have other connections), and we haven't really seen any big problem today. We're connected to Atlanta, and I see: <snip>
We haven't seen anything unusual on our UU circuit in PHX, either.
-- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. -- Matt Levine @Home: matt@deliver3.com @Work: matt@eldosales.com ICQ : 17080004 AIM : exile GPG : http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6C0D04CF "The Trouble with doing anything right the first time is that nobody appreciates how difficult it was." -BIX
Vinny Abello Network Engineer Server Management vinny@tellurian.com (973)300-9211 x 125 (973)940-6125 (Direct) PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0 E935 5325 FBCB 0100 977A Tellurian Networks - The Ultimate Internet Connection http://www.tellurian.com (888)TELLURIAN
Where are they diverting it to, the Moon (1.5 light seconds away)?

Really - I have seen some multisecond latencies on network links we were testing, and I always wondered how these could come to be.

--
Regards
Marshall Eubanks

Vinny Abello wrote:
The only thing I've noticed is high latency between UUNet and Sprint (around 2 second latency) in at least one traffic exchange point between them, maybe more. Probably because of the diversion of traffic on UUNet's network.
At 04:30 PM 10/3/2002 -0400, Matt Levine wrote:
On Thursday, October 3, 2002, at 04:07 PM, Chris Adams wrote:
Once upon a time, sigma@smx.pair.com <sigma@smx.pair.com> said:
There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now it's improved to 1000ms.
 9  0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21)  24.466 ms  24.311 ms  24.382 ms
10  0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89)  24.467 ms  24.349 ms  24.454 ms
11  0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50)  1029.484 ms  1049.529 ms  1063.692 ms
12  0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194)  1106.067 ms  1118.102 ms  1132.124 ms
We're a UUNet customer (we also have other connections), and we haven't really seen any big problem today. We're connected to Atlanta, and I see: <snip>
We haven't seen anything unusual on our UU circuit in PHX, either.
-- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble.
-- Matt Levine @Home: matt@deliver3.com @Work: matt@eldosales.com ICQ : 17080004 AIM : exile GPG : http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6C0D04CF "The Trouble with doing anything right the first time is that nobody appreciates how difficult it was." -BIX
Vinny Abello Network Engineer Server Management vinny@tellurian.com (973)300-9211 x 125 (973)940-6125 (Direct) PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0 E935 5325 FBCB 0100 977A
Tellurian Networks - The Ultimate Internet Connection http://www.tellurian.com (888)TELLURIAN
T.M. Eubanks Multicast Technologies, Inc 10301 Democracy Lane, Suite 410 Fairfax, Virginia 22030 Phone : 703-293-9624 Fax : 703-293-9609 e-mail : tme@multicasttech.com http://www.multicasttech.com Test your network for multicast : http://www.multicasttech.com/mt/ Status of Multicast on the Web : http://www.multicasttech.com/status/index.html
The Juniper routers (it appears they are, based on the interface naming scheme) tend to have incredible buffering capabilities compared to their predecessors of the time. This allows a full link to buffer packets over a period of time instead of dropping them. This obviously has ramifications for TCP timing: when you go from a 20ms RTT per packet to 1000+ms, TCP will assume there is some loss.

- Jared

On Thu, Oct 03, 2002 at 05:33:05PM -0400, Marshall Eubanks wrote:
Where are they diverting it to, the Moon (1.5 light seconds away) ?
Really - I have seen some multisecond latencies on network links we were testing, and I always wondered how these could come to be.
-- Regards Marshall Eubanks
Vinny Abello wrote:
The only thing I've noticed is high latency between UUNet and Sprint (around 2 second latency) in at least one traffic exchange point between them, maybe more. Probably because of the diversion of traffic on UUNet's network.
At 04:30 PM 10/3/2002 -0400, Matt Levine wrote:
On Thursday, October 3, 2002, at 04:07 PM, Chris Adams wrote:
Once upon a time, sigma@smx.pair.com <sigma@smx.pair.com> said:
There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now it's improved to 1000ms.
 9  0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21)  24.466 ms  24.311 ms  24.382 ms
10  0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89)  24.467 ms  24.349 ms  24.454 ms
11  0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50)  1029.484 ms  1049.529 ms  1063.692 ms
12  0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194)  1106.067 ms  1118.102 ms  1132.124 ms
We're a UUNet customer (we also have other connections), and we haven't really seen any big problem today. We're connected to Atlanta, and I see: <snip>
We haven't seen anything unusual on our UU circuit in PHX, either.
-- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble.
-- Matt Levine @Home: matt@deliver3.com @Work: matt@eldosales.com ICQ : 17080004 AIM : exile GPG : http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6C0D04CF "The Trouble with doing anything right the first time is that nobody appreciates how difficult it was." -BIX
Vinny Abello Network Engineer Server Management vinny@tellurian.com (973)300-9211 x 125 (973)940-6125 (Direct) PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0 E935 5325 FBCB 0100 977A
Tellurian Networks - The Ultimate Internet Connection http://www.tellurian.com (888)TELLURIAN
T.M. Eubanks Multicast Technologies, Inc 10301 Democracy Lane, Suite 410 Fairfax, Virginia 22030 Phone : 703-293-9624 Fax : 703-293-9609 e-mail : tme@multicasttech.com http://www.multicasttech.com
Test your network for multicast : http://www.multicasttech.com/mt/ Status of Multicast on the Web : http://www.multicasttech.com/status/index.html
-- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
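Jared's point about TCP timing can be made concrete with a small sketch (Python; an illustration, not anything from the thread) of the standard Jacobson/Karels retransmission-timer estimator as specified in RFC 2988. The 200 ms minimum RTO is an assumed Linux-style value; RFC 2988 itself recommends 1 second, and the RTT figures are illustrative. Once samples jump from ~20 ms to ~1100 ms, they land far beyond the timer tuned for the short path, so the sender retransmits segments that were merely sitting in a queue.

# A minimal sketch of the Jacobson/Karels RTO estimator (RFC 2988), showing
# how an RTT jump from ~20 ms to ~1100 ms overshoots the retransmission timer
# and triggers a spurious retransmission even though nothing was dropped.
# min_rto=0.200 is an assumption (Linux-style); RFC 2988 recommends 1 second.

ALPHA, BETA = 1.0 / 8, 1.0 / 4          # RFC 2988 smoothing gains

def update_rto(srtt, rttvar, sample, min_rto=0.200):
    """Feed one RTT measurement (seconds); return (srtt, rttvar, rto)."""
    if srtt is None:                     # first measurement
        srtt, rttvar = sample, sample / 2.0
    else:
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar, max(min_rto, srtt + 4 * rttvar)

srtt = rttvar = rto = None
# ten samples on the healthy ~20 ms path, then the queue builds to ~1.1 s
for i, rtt in enumerate([0.020] * 10 + [1.1] * 5):
    late = rto is not None and rtt > rto     # ACK arrives after the timer fired
    srtt, rttvar, rto = update_rto(srtt, rttvar, rtt)
    print("sample %2d: rtt=%6.0f ms  rto=%6.0f ms%s"
          % (i, rtt * 1000, rto * 1000, "  <- spurious retransmit" if late else ""))

Each spurious retransmission adds yet more traffic to the already saturated link, which is part of why a heavily buffered hot spot tends to get worse rather than better.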
On Thu, 3 Oct 2002, Marshall Eubanks wrote:
Where are they diverting it to, the Moon (1.5 light seconds away) ?
Really - I have seen some multisecond latencies on network links we were testing, and I always wondered how these could come to be.
Good question. Cisco routers use a default queue size of 40 packets. That will give you a ~2 second delay on a 128 kbps line. I seem to remember that during my tour of duty at UUNET we had slightly faster lines... But that was back in the good old days when life was good.

At 155 Mbps you need 32 MB worth of buffer space to arrive at a delay like this. I wouldn't put it past ATM vendors to think of this kind of over-enthusiastic buffering as a feature rather than a bug.

Does anyone have any thoughts on optimum buffer sizes?
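The arithmetic here is easy to sanity-check with delay = buffered bytes x 8 / line rate. In the sketch below the 800-byte average packet size and the 155.52 Mbps OC-3 line rate are my assumptions used to reproduce the figures above, not numbers from the post:

# Queueing delay added by a full buffer: delay = buffered bytes * 8 / link rate.

def queue_delay_ms(buffer_bytes, link_bps):
    return buffer_bytes * 8.0 / link_bps * 1000

# 40 packets of ~800 bytes (assumed average size) on a 128 kbps line
print(round(queue_delay_ms(40 * 800, 128e3)))         # ~2000 ms
# 32 MB of buffered traffic draining at an OC-3's 155.52 Mbps line rate
print(round(queue_delay_ms(32 * 2**20, 155.52e6)))    # ~1726 ms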
At 155 Mbps you need 32 MB worth of buffer space to arrive at a delay like this. I wouldn't put it past ATM vendors to think of this kind of over-enthusiastic buffering as a feature rather than a bug.
Vendor C sells packet memory of up to 256M each way for a line card. Whether this makes any sense obviously depends on your interfaces.

Theoretically it makes sense to be able to accommodate the number of flows you're carrying times the window size advertised by TCP. In live networks not too large a percentage of the flows send data at maximum rate, so one would expect to have a few thousand "full" flows on a link at a time. A 64k window for a thousand flows would use 64M of buffer memory (not counting memory utilization inefficiencies).

If you go deeper into the equation and start to analyze how fast you'll get the packets in anyway, the associated mathematics require a significantly longer presentation, which you'll probably find easily via Google.

Pete
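Pete's sizing rule restated numerically, plugging in the flow count and window size from his post (the restatement is mine, not his):

# Worst-case buffer if every "full" flow can have a whole advertised window
# queued at once: flows x window (ignoring memory-allocation inefficiencies).
flows = 1000
window_bytes = 64 * 1024                      # the common 64 KB window, no scaling
print(flows * window_bytes / 2.0**20, "MB")   # -> 62.5 MB, roughly the 64M cited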
On Fri, 4 Oct 2002, Petri Helenius wrote:
Vendor C sells packet memory up to 256M each way for a line card. Whether this makes any sense depends obviously on your interfaces.
Hm, even at 10 Gbps 256M would add up to a delay of something like 200 ms. I doubt this is something customers like. Don't forget TCP can handle either a long round trip time or packet loss relatively well, but not both at the same time. So if you're doing that much buffering you should make absolutely sure it's enough to get rid of tail drops or TCP performance will be extremely poor.
Theoretically it makes sense to be able to accommodate the number of flows you´re carrying times the window size advertised by TCP.
Curious. Then the objective of buffering would be to absorb the entire window for each TCP flow. Is this a good thing to do? That will only add more delay, so TCP will use larger windows and you need more buffering... Kind of an arms race between the routers and the hosts to see which can buffer more data. Also, well-behaved TCP implementations shouldn't send a full window worth of data back to back. The only way I can see this happening is when the application at the receiving end stalls and then absorbs all the data buffered by the receiving TCP at once. But then the sending TCP should initiate the congestion avoidance algorithm, IMO. Under normal circumstances, the full window worth of data will be spread out over the entire path with no more than two packets arriving back to back at routers along the way (unless one session monopolizes a link). Iljitsch
http://www.wired.com/news/technology/0,1282,55580,00.html

How and Why the Internet Broke
By Michelle Delio
9:35 a.m. Oct. 4, 2002 PDT

The Internet was very confused on Thursday. But cyberspace hasn't gone senile. Those massive e-mail delays, slow Internet connections and downed e-businesses were all caused by a software upgrade that went horribly wrong at WorldCom's UUNet division, a large provider of network communications.

[...]
Sean Donelan <sean@donelan.com> wrote:
But cyberspace hasn't gone senile. Those massive e-mail delays, slow Internet connections and downed e-businesses were all caused by a software upgrade that went horribly wrong at WorldCom's UUNet division, a large provider of network communications.
After reading all the stories about what supposedly happened does anyone know what really happened? Did UUNet US really do an IOS upgrade on a sizable proportion of their border routers in one go? This seems like suicide to me. What possible reason could there be for a network-wide roll out of an untested IOS apart from being in the mire already? Tim
On Sat, 5 Oct 2002, Tim Thorne wrote:
After reading all the stories about what supposedly happened does anyone know what really happened? Did UUNet US really do an IOS upgrade on a sizable proportion of their border routers in one go? This seems like suicide to me. What possible reason could there be for a network-wide roll out of an untested IOS apart from being in the mire already?
Corporate culture is the hardest thing to change in a company. You'll need to talk with your Worldcom account rep about what happened, and what Worldcom intends to do about it. In the past, Worldcom has not been very open or transparent when it has had network problems.
Curious. Then the objective of buffering would be to absorb the entire window for each TCP flow. Is this a good thing to do? That will only add more delay, so TCP will use larger windows and you need more buffering... Kind of an arms race between the routers and the hosts to see which can buffer more data.
You usually end up with a 64k window on modern systems anyway. Hardly anything uses the window scaling bits actively. Obviously, by dropping select packets you can keep the window at a more moderate size. Doing this effectively would require the box to recognize flows, which is not feasible at high speeds. (unless you're a Caspian salesperson :-)
Also, well-behaved TCP implementations shouldn't send a full window worth of data back to back. The only way I can see this happening is when the application at the receiving end stalls and then absorbs all the data buffered by the receiving TCP at once. But then the sending TCP should initiate the congestion avoidance algorithm, IMO.
I didn't want to imply that the packets would be back to back in the queue, but if you have a relatively short path with real latency on the order of a few tens of milliseconds and introduce an extra 1000ms to the path, you have a full window of packets in the same queue. They will not be adjacent to each other, but they would be sitting in the same packet memory.
Under normal circumstances, the full window worth of data will be spread out over the entire path with no more than two packets arriving back to back at routers along the way (unless one session monopolizes a link).
This discussion started as a discussion of non-normal circumstances. Not sure if the consensus is that congestion is non-normal. It's very complicated to agree on metrics that define a "normal" network. Most people consider some packet loss normal and some jitter normal. Some people even accept their DNS being offline for 60 seconds every hour for a "reload" as normal.

Pete
On Fri, 04 Oct 2002 22:28:01 +0300, Petri Helenius said:
You usually end up with a 64k window on modern systems anyway. Hardly anything uses the window scaling bits actively. Obviously, by dropping select packets you can keep the window at a more moderate size. Doing this effectively would require the box to recognize flows, which is not feasible at high speeds. (unless you're a Caspian salesperson :-)
OK. I'll bite - is it feasible if you're a caspian engineer? ;)
OK. I'll bite - is it feasible if you're a caspian engineer? ;)
Obviously, as most of the audience knows, it's a function of the speed you want to achieve, the number of flows you expect to be interested in, and what you want to do with the flows. Getting traffic split up into a few million flows and maintaining the flow cache and associated state and doing lookups in the cache is not too hard. Doing anything more clever than switching packets (like scheduling which one goes next) across a large dataset has been an unachievable challenge so far (at least at price points people want to pay).

It would have to be an earlier hour to walk through whether a design that combined flow classification and CAM-based scheduling would cut it, but I'm afraid of the aliasing contention killing the actual thing you're trying to achieve (service guarantees).

Pete
On Fri, 4 Oct 2002, Petri Helenius wrote:
Kind of an arms race between the routers and the hosts to see which can buffer more data.
You usually end up with 64k window with modern systems anyway. Hardly anything uses window scaling bits actively.
I also see ~17k a lot. I guess most applications don't need the extra performance offered by the larger windows anyway.
Obviously, by dropping select packets you can keep the window at a more moderate size. Doing this effectively would require the box to recognize flows, which is not feasible at high speeds.
I think random early detect works reasonably well. Obviously something that really looks at the sessions would work better, but statistically, RED should work out fairly well.
Also, well-behaved TCP implementations shouldn't send a full window worth of data back to back. The only way I can see this happening is when the application at the receiving end stalls and then absorbs all the data buffered by the receiving TCP at once.
I didn't want to imply that the packets would be back to back in the queue, but if you have a relatively short path with real latency on the order of a few tens of milliseconds and introduce an extra 1000ms to the path, you have a full window of packets in the same queue. They will not be adjacent to each other, but they would be sitting in the same packet memory.
The only way this would happen is when the sending TCP sends them out back to back after the window opening up after having been closed. Under normal circumstances, the sending TCP sends out two new packets after each ACK. Obviously ACKs aren't forthcoming if all the traffic is waiting in buffers somewhere along the way. Only when a packet gets through an ack comes back and a new packet (or two) is transmitted. Hm, but a somewhat large number of packets being released at once by a sending TCP could also happen as the slow start threshold gets bigger. This could be half a window at once.
Under normal circumstances, the full window worth of data will be spread out over the entire path with no more than two packets arriving back to back at routers along the way (unless one session monopolizes a link).
This discussion started as a discussion of non-normal circumstances. Not sure if the consensus is that congestion is non-normal. It's very complicated to agree on metrics that define a "normal" network. Most people consider some packet loss normal and some jitter normal. Some people even accept their DNS being offline for 60 seconds every hour for a "reload" as normal.
Obviously "some" packet loss and jitter are normal. But how much is normal? Even at a few tenths of a percent packet loss hurts TCP performance. The only way to keep jitter really low without dropping large numbers of packets is to severly overengineer the network. That costs money. So how much are customers prepared to pay to avoid jitter? In any case, delays of 1000 ms aren't within any accepted definition of "normal". With these delays, high-bandwidth batch applications will monopolize the links and interactive traffic suffers. 20 ms worth of buffer space with RED would keep those high-bandwidth applications in check and allow a reasonable degree of interactive traffic. Maybe a different buffer size would be better, but the 20 ms someone mentioned seems as good a starting point as anything else.
## On 2002-10-04 23:50 +0200 Iljitsch van Beijnum typed:

IvB> Obviously "some" packet loss and jitter are normal. But how much is
IvB> normal? Even at a few tenths of a percent packet loss hurts TCP
IvB> performance. The only way to keep jitter really low without dropping large
IvB> numbers of packets is to severely overengineer the network. That costs
IvB> money. So how much are customers prepared to pay to avoid jitter?

There may be better ways to keep jitter "reasonable", but that depends on what counts as "really low" jitter - care to define numbers?

IvB> In any case, delays of 1000 ms aren't within any accepted definition of
IvB> "normal".

Ever used a satellite link? Practical RTT ("normal" - end to end, including the local loops at both sides) starts at about 600msec.
IvB> With these delays, high-bandwidth batch applications will
IvB> monopolize the links and interactive traffic suffers.
I'm assuming TCP since you didn't state otherwise, with TCP extensions for "fat pipes" (such as window scaling and SACK) disabled (as both sides of the TCP connection need to have them).

IIRC the maximum (theoretical) TCP session BW under these conditions is less than 1Mb/sec (for 600msec RTT).

For a reality check you may want to have a look at the links under "Satellite links and performance" on <http://www.internet-2.org.il/documents.html> (yes, the docs are a bit dated, but the principles aren't).
IvB> 20 ms worth of
IvB> buffer space with RED would keep those high-bandwidth applications in
IvB> check and allow a reasonable degree of interactive traffic. Maybe a
IvB> different buffer size would be better, but the 20 ms someone mentioned
IvB> seems as good a starting point as anything else.
-- Rafi
On Sat, 5 Oct 2002, Rafi Sadowsky wrote:
IvB> Obviously "some" packet loss and jitter are normal. But how much is IvB> normal? Even at a few tenths of a percent packet loss hurts TCP IvB> performance. The only way to keep jitter really low without dropping large IvB> numbers of packets is to severely overengineer the network. That costs IvB> money. So how much are customers prepared to pay to avoid jitter?
There may be better ways to keep "reasonable" jitter but that depends on what is "really low" jitter - care to define numbers ?
I don't use applications that have jitter requirements, so I'm not in the best position to comment on this. I'd say that with a line utilization of 50% or less, which leads to an average queue size of one packet or less, jitter is "really low". If the level of jitter introduced here is too high, then I don't think the application can successfully run over IP.
IvB> In any case, delays of 1000 ms aren't within any accepted definition of
IvB> "normal".
Ever used a satellite link ? Practical RTT("normal" - end to end including the local loops at both sides) starts at about 600msec
So then a satellite link with a 1000 ms delay wouldn't be normal, would it?
IvB> With these delays, high-bandwidth batch applications will
IvB> monopolize the links and interactive traffic suffers.
I'm assuming TCP since you didn't state otherwise, with TCP extensions for "fat pipes" (such as window scaling and SACK) disabled (as both sides of the TCP connection need to have them).
IIRC the maximum (theoretical) TCP session BW under these conditions is less than 1Mb/sec (for 600msec RTT).
Ok, so "1 Mbps batch applications" will monopolize the links and interactive traffic suffers.
On Sat, 5 Oct 2002 18:29:38 +0200 (CEST) Iljitsch van Beijnum <iljitsch@muada.com> wrote:
On Sat, 5 Oct 2002, Rafi Sadowsky wrote:
IvB> Obviously "some" packet loss and jitter are normal. But how much is IvB> normal? Even at a few tenths of a percent packet loss hurts TCP IvB> performance. The only way to keep jitter really low without dropping large IvB> numbers of packets is to severely overengineer the network. That costs IvB> money. So how much are customers prepared to pay to avoid jitter?
There may be better ways to keep "reasonable" jitter but that depends on what is "really low" jitter - care to define numbers ?
I don't use applications that have jitter requirements, so I'm not in the best position to comment on this. I'd say that with a line utilization of 50% or less, which leads to an average queue size of one packet or less, jitter is "really low". If the level of jitter introduced here is too high, then I don't think the application can successfully run over IP.
IvB> In any case, delays of 1000 ms aren't within any accepted definition of
IvB> "normal".
Ever used a satellite link ? Practical RTT("normal" - end to end including the local loops at both sides) starts at about 600msec
Dear Iljitsch;

Geosynchronous satellites are up at 35,786 km. That is roughly 120 msec of one-way propagation delay for a station directly beneath, and up to about 140 msec at the limb. So, if the return is also by satellite, you get an RTT of roughly 480-560 msec (+ equipment delays). However, sometimes (here to India, for example) satellite paths use two hops - so you can get 1 second. (Communications satellites in Molniya orbits are a little higher at apogee, but you are unlikely to encounter these outside the FSU.)

Of course, it is notorious that TCP requires tuning or proxies to perform well at high bandwidths over such long links.

Having said all of that, I have seen RTTs of _tens_ of seconds US - Singapore. I would love to know how this is arranged.

Regards
Marshall Eubanks
So then a satellite link with a 1000 ms delay wouldn't be normal, would it?
IvB> With these delays, high-bandwidth batch applications will
IvB> monopolize the links and interactive traffic suffers.
I'm assuming TCP since you didn't state otherwise, with TCP extensions for "fat pipes" (such as window scaling and SACK) disabled (as both sides of the TCP connection need to have them).
IIRC the maximum (theoretical) TCP session BW under these conditions is less than 1Mb/sec (for 600msec RTT).
Ok, so "1 Mbps batch applications" will monopolize the links and interactive traffic suffers.
IIRC the maximum (theoretical) TCP session BW under these conditions is less than 1Mb/sec (for 600msec RTT).
873.8kbps payload, add headers with assumed 1500 byte MTU and you'll have 897.8kbps. This assumes zero latency on the hosts reacting to the packets. Pete
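Pete's figures follow directly from throughput <= window / RTT, assuming the standard 1460-byte MSS inside a 1500-byte packet; whether the last digit comes out as 897.7 or 897.8 kbps depends only on whether the window is counted as 65,535 or 65,536 bytes. A quick check:

# Maximum single-session TCP throughput without window scaling: window / RTT.
window = 65535                  # bytes: the largest window without scaling
rtt = 0.6                       # seconds, the satellite RTT discussed above
mss, mtu = 1460, 1500           # TCP payload vs. on-the-wire packet size

payload_bps = window * 8 / rtt
wire_bps = payload_bps * mtu / mss       # add 40 bytes of TCP/IP header per segment
print("payload: %.1f kbps, on the wire: %.1f kbps" % (payload_bps / 1e3, wire_bps / 1e3))
# -> payload: 873.8 kbps, on the wire: 897.7 kbps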
Thus spake "Iljitsch van Beijnum" <iljitsch@muada.com>
At 155 Mbps you need 32 MB worth of buffer space to arrive at a delay like this. I wouldn't put it past ATM vendors to think of this kind of over-enthusiastic buffering as a feature rather than a bug.
Traditionally, it's ATM switches that have tiny buffers and routers that have excessive buffers. ATM networks have closed-loop feedback and ingress policing mechanisms to handle this scenario; IP networks just throw buffers at the problem and hope it works.
Does anyone have any thoughts on optimum buffer sizes?
The "correct" amount of buffer space for a link is equal to its bandwidth-delay product. Unfortunately, this requires per-link testing and configuration on the part of the operator, which is extremely rare. S
Well, Corning had to do something with all that extra fiber they couldn't sell, so they made a gigantic spool and made it a light buffer.

On Thu, 3 Oct 2002, Marshall Eubanks wrote:
Where are they diverting it to, the Moon (1.5 light seconds away) ?
Really - I have seen some multisecond latencies on network links we were testing, and I always wondered how these could come to be.
participants (15)
- Brennan_Murphy@NAI.com
- Chris Adams
- Iljitsch van Beijnum
- Jared Mauch
- Marshall Eubanks
- Matt Levine
- Petri Helenius
- Rafi Sadowsky
- Scott Granados
- Sean Donelan
- sigma@smx.pair.com
- Stephen Sprunk
- tim.thorne@btinternet.com
- Valdis.Kletnieks@vt.edu
- Vinny Abello