to add to tony's constructive response...
re: host stacks. other improvements i've heard of that are relevant to "reacting to http" are (1) applying slow-start- and congestion-avoidance-type algorithms to the rate at which new tcp connections are opened and (2) having kernels share the data in the protocol control blocks that is relevant to the tcp algorithms across connections to the same host (or even "network"). there are issues with both of these ideas, but the point is that we can do things to react to the observed behavior resulting from the extreme popularity of the web.
re: merit's data. while i have the highest respect for their motivations in collecting this data, i *do* concur with tony's concern about them doing anything more than reporting raw numbers. and with respect to the raw numbers, some questions and observations are:
(1) the measurements are made nap-to-nap. no user traffic goes from nap-to-nap (because ISP1 doesn't play transit for traffic between ISP2 and ISP3).
(2) what percentage of traffic between large providers goes across public exchanges anyway?
(3) do the collection methodologies and analysis techniques share community consensus? is the former independently verifiable?
having said all of this, i will add that some of their data is alarming. but for the second time today on this list, i will say that we need to be careful about chasing stats for their own sake .. we need to look very carefully at *exactly* what is being measured, how it's being collected and analyzed, and what the results actually mean.
/jws
Bob,
You quote:
"Although some of the packet loss is inadvertent, a large percentage of the public exchange point connectivity problems reflect intentional engineering decisions by Internet service providers based on commercial settlement issues."
I think that this is an _extremely_ dangerous assertion on Merit's part. As always, ascribing intent rather than reporting raw data requires much more justification, which I have yet to see.
Are you familiar with this packet loss data from Merit? If not, please see above URL.
Am now... ;-)
Is Merit's packet loss data (NetNow) credible? Do packet losses in the Internet now average between 2% and 4% daily? Are 30% packet losses common during peak periods? Is there any evidence that Internet packet losses are trending up or down?
Yes, that matches my instinctive feel. I don't have concrete data that corroborates or disputes theirs, or that shows packet loss rates or trends.
Were Merit's data correct, what would be the impact of 30% packet losses on opening up TCP connections?
TCP is pretty damn robust. Opening a connection is still likely to work.
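A back-of-the-envelope sketch of why opening still tends to work. The assumptions here are mine and not from Merit's data: each packet is lost independently with probability p in either direction, an attempt succeeds when both the SYN and the SYN-ACK get through, and the number of SYN attempts (`attempts`) is a hypothetical parameter, since retry counts vary by implementation.

```python
def handshake_success(p_loss, attempts):
    """Probability that at least one SYN / SYN-ACK exchange completes.

    Assumes independent losses (real losses are bursty, so this is
    only a rough illustration, not a model of any particular stack).
    """
    per_attempt = (1.0 - p_loss) ** 2          # SYN and SYN-ACK both arrive
    return 1.0 - (1.0 - per_attempt) ** attempts


if __name__ == "__main__":
    for attempts in (1, 2, 4):
        print(attempts, round(handshake_success(0.30, attempts), 4))
```

Under these assumptions a single attempt succeeds a bit under half the time at 30% loss, while four attempts succeed more than 93% of the time. The cost shows up as delay from retransmission timers, not as outright failure.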
On TCP throughput, say through a 28.8Kbps modem? On Web throughput, since so many TCP connections are involved? On DNS look-ups? On email transport?
As you might imagine, that kind of packet loss rate is 'highly detrimental' to throughput. If you're asking for concrete numbers, I don't have them, but I've lived through them. Qualitatively, it means that interactive usage is intolerable. On the bright side, email works just fine.
How big a problem is HTTP's opening of so many TCP connections?
It's a very significant problem. It decreases the average packet size, thereby making routers work much harder. It generates many more packets than necessary, and then closes down the connection after a very short transfer. In short, it's a horribly inefficient use of the net.
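To put rough numbers on that, here is a simplified packet count. Everything in it is an assumption for illustration: 7 header-only control packets per connection (3 for the handshake, 4 for the teardown), a 512-byte MSS, 40 bytes of TCP/IP header per packet, and no accounting for ACKs of data or for the request packets themselves. The function name `packet_stats` is made up for this sketch.

```python
import math


def packet_stats(num_objects, object_bytes, connections, mss=512, header=40):
    """Return (total_packets, average_packet_size_in_bytes).

    Simplified: counts data segments plus 7 header-only control
    packets (3 handshake + 4 teardown) per connection.
    """
    data_packets = num_objects * math.ceil(object_bytes / mss)
    control_packets = connections * 7
    total_packets = data_packets + control_packets
    total_bytes = num_objects * object_bytes + total_packets * header
    return total_packets, total_bytes / total_packets


if __name__ == "__main__":
    # Ten 2000-byte objects: one connection each vs. one shared connection.
    print(packet_stats(10, 2000, connections=10))
    print(packet_stats(10, 2000, connections=1))
```

With these made-up inputs, one connection per object costs 110 packets against 47 for a single shared connection, and the average packet size drops by roughly half.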
Does TCP need to operate differently than it does now when confronted routinely with 30% packet losses and quarter-second transit delays?
Your question presumes that we should live with the 30% losses. We should not. TCP does palatably well at surviving such brown-outs and I would not suggest changes for that cause. Note that there are other changes that I'd like to see, such as more use of Path MTU Discovery and fixing HTTP, which are much more important.
The quarter-second transit delays fall into two categories. The first is transient delay, mostly caused by routing transients. Obviously we need to minimize such transients. The second is normal propagation delay. Using larger windows would aid that a great deal. I don't think that many TCP implementations allocate sufficient buffering today to truly be efficient.
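The point about larger windows comes down to the bandwidth-delay product: a sender can have at most one window of data in flight per round trip. A quick sketch, assuming the quarter second is a round-trip time (the question does not say) and using nominal line rates; the function names are just for this example.

```python
def window_bytes_needed(bandwidth_bps, rtt_seconds):
    """Bytes that must be in flight to keep a path of this speed full."""
    return bandwidth_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput for one connection with this window."""
    return window_bytes * 8 / rtt_seconds


T3_BPS = 44_736_000            # nominal T3 line rate
MODEM_BPS = 28_800             # 28.8Kbps modem
RTT = 0.25                     # assumed round-trip time, in seconds
MAX_UNSCALED_WINDOW = 65_535   # largest window a 16-bit field can advertise

if __name__ == "__main__":
    print(window_bytes_needed(T3_BPS, RTT))
    print(window_bytes_needed(MODEM_BPS, RTT))
    print(max_throughput_bps(MAX_UNSCALED_WINDOW, RTT))
```

Under those assumptions a T3 path needs about 1.4 MB in flight, a modem needs only 900 bytes, and a single connection with the largest unscaled window tops out at about 2.1 Mbit/s no matter how fast the path is.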
What is the proper response of an IP-based protocol, like TCP, as packet losses climb? Try harder or back off or what?
Back off. Slow start is the accepted algorithm. Trying harder only increases congestion.
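For anyone who hasn't looked at it, the backing-off behavior can be sketched as below. This is a deliberately simplified, Tahoe-style illustration and not any particular stack's code: the window is counted in segments, one step is one round trip, and `loss_rounds` (the round trips in which a loss is detected) and the initial threshold are made-up inputs.

```python
def simulate_cwnd(rounds, loss_rounds, initial_ssthresh=16):
    """Return the congestion window (in segments) at the start of each round.

    Slow start doubles the window each round trip up to ssthresh,
    congestion avoidance then adds one segment per round trip, and a
    detected loss halves ssthresh and restarts from one segment.
    """
    cwnd = 1
    ssthresh = initial_ssthresh
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)        # remember half the window
            cwnd = 1                            # back off to one segment
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)      # slow start
        else:
            cwnd += 1                           # congestion avoidance
    return history


if __name__ == "__main__":
    print(simulate_cwnd(10, loss_rounds={6}))
```

The sender ramps up quickly, probes gently once it nears the last known safe rate, and drops back hard when a loss signals congestion, which is the opposite of trying harder.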
How robust are various widespread TCP/IP implementations in the face of 30% packet loss and quarter-second transit delays?
I have yet to see a significant problem with robustness.
Is the Internet's sometimes bogging down due mainly to packet losses or busy servers or what, or does the Internet not bog down?
That depends on your definitions. "The Internet" as a whole does not bog down. It's a modular system and there are localized problems and congestion which result in poor service to a wide-ranging set of users. The causes of the problems vary. I've seen lots of really slow servers, congested access links, unhappy routers, congested interconnects, etc.
Where is the data on packet losses experienced by traffic that does not go through public exchange points?
I suspect that you'd have to ask the parties involved in the private exchange point. I suspect that no such statistics are currently kept, or if they are, the parties would not be willing to disclose them. Thus IPPM...
If 30% loss impacts are noticeable, what should be done to eliminate the losses or reduce their impacts on Web performance and reliability?
Ah... Yes, loss rates of 30% are noticeable and painful. There are literally hundreds of things that can and should be done to improve things. Let's see, just off the top of my head:
- more private interconnects are necessary in the long term to scale the network. We cannot have interconnects of infinite bandwidth as hardware simply doesn't scale as quickly as demand. Thus, we need to invoke parallelism. I think that this is already happening in a reasonable way.
- more bandwidth. Of course, faster is better. OC3 SONET technology is quickly becoming an obvious upgrade path from today's T3 backbones.
- better routers. Current implementations have many shortcomings which aggravate instability.
- accurate reporting. There seems to be a trend to find a problem and get everyone hyped up over it, far in excess of reality. We spend time dealing with such issues rather than doing beneficial engineering.
- improved protocols. We have an ongoing scalability problem with our routing protocols.
- fixed host stacks. Using the full MTU would be a boon. Recent data indicates that >40% of the packets out there are 40 bytes.
Are packet losses due mainly to transient queue buffer overflows of user traffic or to discards by overburdened routing processors or something else?
"mainly" is a dangerous quantifier given that there's no hard data. My intuition says that sheer congestion is the most serious problem, followed closely by router implementation.
What does Merit mean when they say that some of these losses are intentional because of settlement issues?
I think you really need to ask Merit that. I could find no justification for that on their Web page.
Are ISPs cooperating intelligently in the carriage of Internet traffic, or are ISPs competing destructively, to the detriment of them and their customers?
Ummm... I see them cooperating. "intelligently" is in the eye of the beholder. Certainly there are some who are being anti-social.
Tony