Questions about Internet Packet Losses
Hello, and best wishes for what's left of 1997. Now, if you would, ...

Below are some questions I hope you'll help me answer about packet loss on the Internet.

Here are two paragraphs taken from: http://www.merit.edu/~ipma/netnow/docs/info.html:

"Early experiments with NetNow show that 30% packet loss between public exchange points is common for major Internet service providers during peak usage periods. The initial investigation also suggests that loss rates are closely related to bandwidth usage and congestion problems. Although some of the packet loss is inadvertent, a large percentage of the public exchange point connectivity problems reflect intentional engineering decisions by Internet service providers based on commercial settlement issues.

"The high packet loss may not generally reflect problems seen by the majority of customers of the larger network service providers. In fact, increasing levels of Internet traffic are not traversing the public exchange points. Instead, many large service providers are migrating their inter-provider traffic to private exchange points, or direct connections to other providers. Merit is working closely with providers to develop tools and infrastructure that more closely reflect Internet performance as observed by the majority of backbone customers."

Questions:

Are you familiar with this packet loss data from Merit? If not, please see above URL.

Is Merit's packet loss data (NetNow) credible? Do packet losses in the Internet now average between 2% and 4% daily? Are 30% packet losses common during peak periods? Is there any evidence that Internet packet losses are trending up or down?

If Merit's data is not correct, where has Merit gone wrong? Where is there better data?

Were Merit's data correct, what would be the impact of 30% packet losses on opening up TCP connections? On TCP throughput, say through a 28.8Kbps modem? On Web throughput, since so many TCP connections are involved? On DNS look-ups? On email transport?

How big a problem is HTTP's opening of so many TCP connections?

Does TCP need to operate differently than it does now when confronted routinely with 30% packet losses and quarter-second transit delays?

What is the proper response of an IP-based protocol, like TCP, as packet losses climb? Try harder or back off or what?

How robust are various widespread TCP/IP implementations in the face of 30% packet loss and quarter-second transit delays?

Is the Internet's sometimes bogging down due mainly to packet losses or busy servers or what, or does the Internet not bog down?

What fraction of Internet traffic still goes through public exchange points and therefore sees these kinds of packet losses? What fraction of Internet traffic originates and terminates within a single ISP?

Where is the data on packet losses experienced by traffic that does not go through public exchange points?

If 30% loss impacts are noticeable, what should be done to eliminate the losses or reduce their impacts on Web performance and reliability?

Are packet losses due mainly to transient queue buffer overflows of user traffic or to discards by overburdened routing processors or something else?

What does Merit mean when they say that some of these losses are intentional because of settlement issues?

Are ISPs cooperating intelligently in the carriage of Internet traffic, or are ISPs competing destructively, to the detriment of them and their customers?

Any help you can offer on these questions would be appreciated.
/Bob Metcalfe, InfoWorld

Dr. Robert M. ("Bob") Metcalfe
VP Technology, International Data Group
InfoWorld columns: www.infoworld.com
Mail: metcalfe@infoworld.com
Telephone: 617-534-1215
Conference Chairman for ACM97: The Next 50 Years of Computing
San Jose Convention Center, March 1-5, 1997
Registration information: www.acm.org/acm97
Register now at 1-800-342-6626
Bob Metcalfe writes:
Hello, and best wishes for what's left of 1997. Now, if you would, ...
Below are some questions I hope you'll help me answer about packet loss on the Internet.
Didn't get burned enough last time, Metcalfe? You seem to have claimed the internet was doomed to collapse last year. Somehow this hasn't happened. In fact, I've got to say that stability for my customers has improved dramatically over recent months.

Cruising to make more collapse claims now? (You know, the standard "oh, did I say the world was ending last Tuesday? I meant NEXT Tuesday, of course".) Trying to salvage what's left of your reputation? Or is it just a way of creating flame wars on nanog?

Go back under your rock.

Perry
to add to tony's constructive response...

re: host stacks. other improvements i've heard of that are relevant to "reacting to http" are (1) applying slow-start- and congestion-avoidance-type algorithms to the rate at which new tcp connections are opened and (2) having kernels share data in the protocol control blocks relevant to the tcp algorithms across connections to the same host (or even "network"). there are issues with both of these ideas, but the point is that we can do things to react to the observed behavior resulting from the extreme popularity of the web.

re: merit's data. while i have the highest respect for their motivations in collecting this data, i *do* concur with tony's concern about them doing anything more than reporting raw numbers. and with respect to the raw numbers, some questions and observations are:

(1) the measurements are made nap-to-nap. no user traffic goes from nap-to-nap (because ISP1 doesn't play transit for traffic between ISP2 and ISP3)

(2) what percentage of traffic between large providers goes across public exchanges anyway?

(3) do the collection methodologies and analysis techniques share community consensus? is the former independently verifiable?

having said all of this, i will add that some of their data is alarming. but for the second time today on this list, i will say that we need to be careful about chasing stats for their own sake... we need to look very carefully at *exactly* what is being measured, how it's being collected and analyzed, and what the results actually mean.

/jws
Bob,
You quote:
"Although some of the packet loss is inadvertent, a large percentage of the public exchange point connectivity problems reflect intentional engineering decisions by Internet service providers based on commercial settlement issues.
I think that this is an _extremely_ dangerous assertion on Merit's part. As always, ascribing intent rather than raw data requires much more justification which I have yet to see.
Are you familiar with this packet loss data from Merit? If not, please see above URL.
Am now... ;-)
Is Merit's packet loss data (NetNow) credible? Do packet losses in the Internet now average between 2% and 4% daily? Are 30% packet losses common during peak periods? Is there any evidence that Internet packet losses are trending up or down?
Yes, that matches my instinctive feel. I don't have concrete data which corroborates or disputes their data, nor reflects high packet loss rates nor trends.
Were Merit's data correct, what would be the impact of 30% packet losses on opening up TCP connections?
TCP is pretty damn robust. Opening a connection is still likely to work.
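A back-of-the-envelope sketch of why connection setup usually still succeeds: assume independent per-packet loss in each direction and a client that retransmits its SYN a few times. The model and the retry count are my own assumptions, not anything measured in this thread.

    # Toy model, not any real stack: independent loss probability p per packet,
    # a handshake attempt succeeds if one SYN and one SYN-ACK both arrive,
    # and the client retries its SYN up to `retries` times.  Server-side
    # SYN-ACK retransmission is ignored, so this slightly understates success.
    def handshake_success(p, retries):
        single_attempt = (1 - p) ** 2              # SYN out and SYN-ACK back
        return 1 - (1 - single_attempt) ** (retries + 1)

    for p in (0.02, 0.04, 0.30):
        print("loss %2.0f%%: one try %.2f, with 4 retries %.2f"
              % (p * 100, handshake_success(p, 0), handshake_success(p, 4)))

At 30% loss a single attempt succeeds only about half the time, but a handful of SYN retransmissions pushes the overall success rate above 95%, which fits the point that opening a connection is still likely to work, just slowly.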
On TCP throughput, say through a 28.8Kbps modem? On Web throughput, since so many TCP connections are involved? On DNS look-ups? On email transport?
As you might imagine, that kind of packet loss rate is 'highly detrimental' to throughput. If you're asking for concrete numbers, I don't have them, but I've lived through them. Qualitatively, it means that interactive usage is intolerable. On the bright side, email works just fine.
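For rough numbers, a standard back-of-the-envelope approximation bounds steady-state TCP throughput under random loss at roughly MSS * sqrt(3/2) / (RTT * sqrt(p)). The MSS and RTT below are illustrative assumptions, and the formula is only meaningful for light-to-moderate loss; at 30% loss retransmission timeouts dominate and real throughput is far worse than it suggests.

    import math

    # Rough steady-state TCP throughput bound under random loss p:
    #   rate ~= MSS * sqrt(3/2) / (RTT * sqrt(p))
    # Only meaningful for small p; at 30% loss timeouts dominate and the
    # real figure is far lower.  MSS/RTT values are illustrative assumptions.
    MSS = 536          # bytes
    RTT = 0.25         # seconds (the quarter-second delay in the question)

    for p in (0.02, 0.04, 0.30):
        rate = MSS * math.sqrt(1.5) / (RTT * math.sqrt(p))   # bytes/sec
        print("loss %2.0f%%: ~%6.1f kbit/s upper bound" % (p * 100, rate * 8 / 1000))

Even taken at face value, 30% loss drags the bound down to roughly modem speed, and in practice the connection spends much of its time in retransmission timeout, which is why interactive use feels intolerable while store-and-forward email still gets through.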
How big a problem is HTTP's opening of so many TCP connections?
It's a very significant problem. It decreases the average packet size, thereby making router work much harder. It generates many more packets than necessary, and then closes down the connection after a very short transfer. In short, it's a horribly inefficient use of the net.
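A rough packet count makes the inefficiency concrete. The numbers below assume a 536-byte MSS, one delayed ACK per two data segments, a full three-way handshake plus four-segment teardown, and a request that fits in one segment; all of that is my simplification, not measured data.

    import math

    MSS = 536   # bytes of payload per data segment (illustrative assumption)

    def packets_per_fetch(object_bytes, new_connection):
        data = int(math.ceil(object_bytes / float(MSS)))   # response segments
        acks = int(math.ceil(data / 2.0)) + 1              # delayed ACKs + ACK of request
        setup_teardown = (3 + 4) if new_connection else 0
        return setup_teardown + 1 + data + acks            # +1 for the HTTP request

    for size in (2000, 10000):
        print("%5d-byte object: %2d packets on a fresh connection, %2d if reused"
              % (size, packets_per_fetch(size, True), packets_per_fetch(size, False)))

For a typical small inline image, nearly half the packets are connection setup, teardown, and bare ACKs, and every one of those is another small packet for the routers to handle.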
Does TCP need to operate differently than it does now when confronted routinely with 30% packet losses and quarter-second transit delays?
Your question presumes that we should live with the 30% losses. We should not. TCP does tolerably well at surviving such brown-outs and I would not suggest changes for that cause. Note that there are other changes that I'd like to see, such as more use of Path MTU Discovery and fixing HTTP, which are much more important.

The quarter-second transit delays fall into two categories. The first is transient delays, mostly caused by routing transients. Obviously we need to minimize such transients. The second is normal propagation delay. Using larger windows would help a great deal there; I don't think that many TCP implementations allocate sufficient buffering today to truly be efficient.
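On the window-sizing point, the bandwidth-delay product shows how much data a single connection must keep in flight to fill a quarter-second path; the 8 KB figure used for comparison is just my stand-in for a typical default socket buffer, not a survey result.

    # Bandwidth-delay product at a 0.25 s round-trip time: the window a single
    # TCP connection needs in flight to fill the path.  The 8 KB "typical
    # default" used for comparison is an assumption.
    RTT = 0.25  # seconds
    DEFAULT_WINDOW = 8 * 1024

    links = [("28.8 kbps modem", 28800), ("T1", 1544000),
             ("T3", 44736000), ("OC-3", 155520000)]

    for name, bps in links:
        bdp = bps / 8.0 * RTT                      # bytes
        verdict = "fits" if bdp <= DEFAULT_WINDOW else "needs a bigger window"
        print("%15s: BDP %9.1f KB  (default 8 KB %s)" % (name, bdp / 1024, verdict))

Anything faster than a modem blows past small default windows at these delays, which is the buffering shortfall mentioned above; window scaling (RFC 1323) plus larger socket buffers is the usual remedy.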
What is the proper response of an IP-based protocol, like TCP, as packet losses climb? Try harder or back off or what?
Back off. Slow start is the accepted algorithm. Trying harder only increases congestion.
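A minimal sketch of that "back off" behavior at segment granularity: slow start grows the congestion window exponentially, congestion avoidance grows it linearly, and a detected loss halves the threshold and restarts growth. This is a Tahoe-style toy model under my own simplifications, not any particular implementation.

    # Toy Tahoe-style congestion window trace, one value per round trip.
    # Not a real TCP: no timers, no fast retransmit, losses injected by hand.
    def cwnd_trace(loss_rtts, rtts=20, ssthresh=32):
        cwnd, trace = 1, []
        for t in range(rtts):
            trace.append(cwnd)
            if t in loss_rtts:
                ssthresh = max(cwnd // 2, 2)   # multiplicative decrease
                cwnd = 1                       # back off and restart slow start
            elif cwnd < ssthresh:
                cwnd *= 2                      # slow start: exponential growth
            else:
                cwnd += 1                      # congestion avoidance: +1 segment/RTT
        return trace

    print(cwnd_trace(loss_rtts={8, 15}))

Aggregated over many connections, this is why backing off keeps a congested exchange point usable, while "trying harder" would simply convert every loss into more offered load.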
How robust are various widespread TCP/IP implementations in the face of 30% packet loss and quarter-second transit delays?
I have yet to see a significant problem with robustness.
Is the Internet's sometimes bogging down due mainly to packet losses or busy servers or what, or does the Internet not bog down?
That depends on your definitions. "The Internet" as a whole does not bog down. It's a modular system and there are localized problems and congestion which result in poor service to a wide-ranging set of users. The causes of the problems vary. I've seen lots of really slow servers, congested access links, unhappy routers, congested interconnects, etc.
Where is the data on packet losses experienced by traffic that does not go through public exchange points?
I suspect that you'd have to ask the parties involved in the private exchange point. I suspect that there are not such statistics currently kept, or if so, they would not be willing to disclose them. Thus IPPM...
If 30% loss impacts are noticeable, what should be done to eliminate the losses or reduce their impacts on Web performance and reliability?
Ah... Yes, loss rates of 30% are noticeable and painful. There are literally hundreds of things that can and should be done to improve things. Let's see, just off the top of my head:
- more private interconnects are necessary in the long term to scale the network. We cannot have interconnects of infinite bandwidth as hardware simply doesn't scale as quickly as demand. Thus, we need to invoke parallelism. I think that this is already happening in a reasonable way.

- more bandwidth. Of course, faster is better. OC3 SONET technology is quickly becoming an obvious upgrade path from today's T3 backbones.

- better routers. Current implementations have many shortcomings which aggravate instability.

- accurate reporting. There seems to be a trend to find a problem and get everyone hyped up over it, far in excess of reality. We spend time dealing with such issues rather than doing beneficial engineering.

- improved protocols. We have an ongoing scalability problem with our routing protocols.

- fixed host stacks. Using the full MTU would be a boon. Recent data indicates that >40% of the packets out there are 40 bytes (see the sketch just after this list).
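To make the 40-byte point concrete: 40 bytes is exactly a 20-byte IPv4 header plus a 20-byte TCP header with no options, i.e. a packet carrying no user data at all. Payload efficiency by packet size, assuming no IP or TCP options:

    HEADERS = 20 + 20   # IPv4 header + TCP header, no options assumed

    for pkt in (40, 576, 1500):
        payload = pkt - HEADERS
        print("%4d-byte packet: %4d payload bytes (%5.1f%% efficient)"
              % (pkt, payload, 100.0 * payload / pkt))

So a traffic mix that is >40% 40-byte packets spends a large share of router work forwarding packets that move no data, which is why full-MTU host stacks (and fewer tiny transfers) help.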
Are packet losses due mainly to transient queue buffer overflows of user traffic or to discards by overburdened routing processors or something else?
"mainly" is a dangerous quantifier given that there's no hard data. My intuition says that sheer congestion is the most serious problem, followed closely by router implementation.
What does Merit mean when they say that some of these losses are intentional because of settlement issues?
I think you really need to ask Merit that. I could find no justification for that on their Web page.
Are ISPs cooperating intelligently in the carriage of Internet traffic, or are ISPs competing destructively, to the detriment of them and their customers?
Ummm... I see them cooperating. "intelligently" is in the eye of the beholder. Certainly there are some who are being anti-social.
Tony
[Not about NANOG per se]
re: host stacks. other improvements i've heard of that are relevant to "reacting to http" are (1) applying slow-start- and congestion-avoidance-type algorithms to the rate at which new tcp connections are opened and (2) having kernels share data in the protocol control blocks relevant to the tcp algorithms across connections to the same host (or even "network"). there are issues with both of these ideas, but the point is that we can do things to react to the observed behavior resulting from the extreme popularity of the web
Given that, roughly, the TCP congestion avoidance algorithm `estimates' bandwidth available to a connection on a particular route, perhaps this estimate can be used as a bandwidth *hint* for any further traffic between the two hosts. That is, such an estimate should be useful beyond the connection that computed it. Why throw away hard-earned statistical data if we can figure out how to reuse it?

[Thinking aloud here...] Perhaps a part of the TCP congestion avoidance algorithm can be factored out into some sort of a `traffic central' module that tries to give you the best bandwidth/packet loss estimate it has for a given route, provided you keep it updated with what you learn (i.e. TCP tells it when a packet is lost, etc.). A new TCP connection can then immediately start off with a bigger window (and won't open the window too wide too quickly). Multiple connections between two hosts can avoid what would be largely redundant estimate computation. Even a UDP app can try to benefit from this (such as for communication where bounded delay is more critical than packet loss).

Other `traffic conditions' input can also be fed into this module [perhaps as part of some future routing protocol]. Combining this `quality of a route' aspect into routing protocols may make sense in the long run....

-- bakul
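A sketch of what such a "traffic central" could look like: a per-destination cache that connections seed from and report back into. All of the names, the EWMA weighting, and the aging policy below are my own assumptions for illustration, not a description of any existing stack; the general idea is essentially sharing TCP control-block state across connections.

    import time

    class PathEstimate:
        def __init__(self):
            self.srtt = 3.0           # smoothed RTT in seconds; conservative start
            self.ssthresh = 65535     # bytes; effectively "unknown" until a loss
            self.updated = time.time()

    class TrafficCentral:
        """Per-destination cache of path estimates shared across connections."""
        def __init__(self, max_age=600.0):
            self.cache = {}
            self.max_age = max_age

        def hint(self, dest):
            """Seed a new connection; fall back to defaults if unknown or stale."""
            est = self.cache.get(dest)
            if est is None or time.time() - est.updated > self.max_age:
                return PathEstimate()
            return est

        def report(self, dest, srtt, ssthresh):
            """Called when a connection measures an RTT or backs off after a loss."""
            est = self.cache.setdefault(dest, PathEstimate())
            est.srtt = 0.875 * est.srtt + 0.125 * srtt   # EWMA, like TCP's own SRTT
            est.ssthresh = ssthresh
            est.updated = time.time()

    # Hypothetical usage: a later connection to the same host starts from the hint.
    tc = TrafficCentral()
    tc.report("www.example.com", srtt=0.25, ssthresh=8192)
    seed = tc.hint("www.example.com")

A new connection to a host the machine has talked to recently would start with a realistic ssthresh and RTT estimate instead of probing from scratch, and a UDP application could consult the same hints; the open questions (staleness, fairness when many connections share one estimate) are exactly the "issues with both of these ideas" John mentions.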
at Mon, 13 Jan 1997 14:00:33 PST, you wrote:
You quote: "Although some of the packet loss is inadvertent, a large percentage of the public exchange point connectivity problems reflect intentional engineering decisions by Internet service providers based on commercial settlement issues.
I think that this is an _extremely_ dangerous assertion on Merit's part. As always, ascribing intent rather than raw data requires much more justification which I have yet to see.
Unfortunately, a key word here was probably left out. The sentence should have read "_may_ reflect"... Tony is right -- we cannot ascribe any motives/rationale to ISP engineering decisions.

We are walking a bit of a tightrope here. Although I believe the NetNow statistics contain some valuable information on network performance, a number of large ISPs have expressed significant concerns about the possible misinterpretation of the data. Specifically, several ISPs have explained that increasing amounts of their customer traffic do not traverse public exchanges. These ISPs assert that measuring packet loss/latency between public exchange points is not reflective of actual network performance as perceived by their customers. These ISPs further explained that their priority is always to optimize their customers' network performance. For the large ISPs, this increasingly may mean prioritizing connectivity to direct/private exchange points before public exchange points. Of course, I have not seen publicly available statistics on the amount of traffic traversing public exchange points versus private exchange points.

The text Bob quotes essentially reiterates the explanations/descriptions of the NetNow data provided by the large ISPs. Lacking any evidence to the contrary, we included the text as an attempt to provide a more balanced view of the statistics.
Where is the data on packet losses experienced by traffic that does not go through public exchange points?
I suspect that you'd have to ask the parties involved in the private exchange point. I suspect that there are not such statistics currently kept, or if so, they would not be willing to disclose them. Thus IPPM...
A number of large ISPs have volunteered probe platforms at private exchange points. We (Merit/CAIDA/NLANR) are now evaluating the possible deployment of additional probe machines for inclusion in these network performance studies.

- Craig

--
Craig Labovitz (labovit@merit.edu)
Merit Network, Inc.
4251 Plymouth Road, Suite C, Ann Arbor, MI 48105-2785
(313) 764-0252 (office) / (313) 747-3745 (fax)
tli@jnx.com said:
- fixed host stacks. Using the full MTU would be a boon. Recent data indicates that >40% of the packets out there are 40 bytes.
One of the nice things about proxy servers and web caches is they mask end user host stack inadequacy, and thus (I allege) tend to increase MTUs and encourage better network utilization (in addition to the normal benefits of caching).

Isn't it unsurprising that >40% of packets are small? ACKs?

Alex Bligh
Xara Networks
Isn't it unsurprising that >40% of packets are small? ACKs?

If one made the gross oversimplifying approximation that everything is unidirectional TCP traffic, then you'd expect to see one ACK per two data packets. Thus, we'd expect to see 33% at 40 bytes. It is the additional 7+% that's surprising.

Tony
One of the nice things about proxy servers and web caches is they mask end user host stack inadequacy, and thus (I allege) tend to increase MTUs and encourage better network utilization (in addition to the normal benefits of caching).
Proxy servers can also maintain persistent long/medium term HTTP 1.1 connections that the clients are too dumb to maintain themselves, thus getting streaming without requiring client upgrades.
On Tue, 14 Jan 1997, Paul A Vixie wrote:
Proxy servers can also maintain persistent long/medium term HTTP 1.1 connections that the clients are too dumb to maintain themselves, thus getting streaming without requiring client upgrades.
Does Squid do this? Starting in what version? Michael Dillon - Internet & ISP Consulting Memra Software Inc. - Fax: +1-604-546-3049 http://www.memra.com - E-mail: michael@memra.com
On Mon, 13 Jan 1997, Bob Metcalfe wrote:
Hello, and best wishes for what's left of 1997. Now, if you would, ...
Is Merit's packet loss data (NetNow) credible? Do packet losses in the Internet now average between 2% and 4% daily? Are 30% packet losses common during peak periods? Is there any evidence that Internet packet losses are trending up or down?
If your provider has 30% packet loss you need to look at a new provider. I think most providers have little packet loss. This is a ping -c 1000 from one of my servers in Arlington, VA to a router at PAIX.

--- 205.215.63.18 ping statistics ---
1000 packets transmitted, 1000 packets received, 0% packet loss
round-trip min/avg/max = 77.3/80.0/127.3 ms

I know you are sad that the net did not fall apart, but most of us are able to keep up. The nice thing is that the price of bandwidth is starting to drop; we have some OC-3 circuits that cost just a little more than a DS3.

P.S. Yes, the delay is up there, but we are installing a DS3 from Palo Alto to Arlington, so packets from Arlington to Palo Alto will not need to go through Atlanta or Chicago to get to CA.

Nathan Stratton
President, NetRail, Inc.
Phone (888)NetRail / Fax (703)534-5033
2007 N. 15 St. Suite 5, Arlington, VA 22201
WWW http://www.netrail.net/

"Therefore do not worry about tomorrow, for tomorrow will worry about itself. Each day has enough trouble of its own." Matthew 6:34
participants (10):
- Alex Bligh
- Bakul Shah
- Bob Metcalfe
- Craig Labovitz
- John W. Stewart III
- Michael Dillon
- Nathan Stratton
- Paul A Vixie
- Perry E. Metzger
- Tony Li