Re: Ungodly packet loss rates
Independent of the fact that the traceroute shows that packets are not taking the most direct route (both Best and BBN are at Mae-West), your complaint about high loss is valid. This is an industry problem, but the problem is not as universal as you may think.

Merit has been measuring loss NAP to NAP. We've been looking at this data and trying to provide useful summaries. It's kind of drafty since Merit's data collection is not so good, but you can get a vague idea of the conditions from the page:

(This page is temporary - don't point to it.)

http://www.brookfield.ans.net/ans/netnow/netnow.html

We're not sure of the accuracy of this data or the statistical validity. In particular 1) Merit's method of storage doesn't provide markers between data sets, 2) cumulative probabilities of 20-packet samples are fairly worthless for looking at the probability of losses below 5%. (The perl programs are all in the same directory so if you see any bugs please tell me. :)

\begin{aside}
Probability of 1% to 4% loss could be estimated by assuming data sets are uncorrelated. For example, the chance of 2.5% loss or more P(2.5) is

  (1 - ((1 - P(5))**2)) * (1 - P(10))

or in English, the estimated probability that one of two samples is zero times the probability that the other point is no worse than 5%. Or something like that. Apply Bayes' theorem of joint probabilities. But is the data uncorrelated?
\end{aside}

Curtis

ps- wrt what's being done about it, this is an attempt to provide insights into loss rates as per your point #2 but more for the purpose of meaningful measure than any kind of guarantee.

  2. Develop meaningful quality-of-service standards that can be used to guarantee reasonable performance in terms of end-to-end drop rates, delays, and downtime.

Though our loss is among the lowest, we're adding circuits to bring it back so it is zero most of the time, where it used to be and where it belongs.
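For anyone who wants to plug numbers into the expression in the aside, here is a minimal sketch in Python. It transcribes the expression exactly as written there, rough edges included, and it assumes that P(5) and P(10) are the cumulative probabilities that a single 20-packet sample shows at least 5% and at least 10% loss, and that samples are uncorrelated. The function name and the example values are made up for illustration and are not measured data.

```python
def estimate_p_loss_2_5(p5, p10):
    """Evaluate the aside's expression (1 - (1 - P(5))**2) * (1 - P(10)).

    p5  -- assumed: probability that one 20-packet sample shows >= 5% loss
    p10 -- assumed: probability that one 20-packet sample shows >= 10% loss
    The expression only makes sense if the samples are uncorrelated, which
    is exactly the open question raised in the aside.
    """
    # Cumulative "at least X% loss" probabilities cannot grow with X.
    if not (0.0 <= p10 <= p5 <= 1.0):
        raise ValueError("expected 0 <= p10 <= p5 <= 1")
    return (1 - (1 - p5) ** 2) * (1 - p10)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the Merit data.
    print(round(estimate_p_loss_2_5(0.2, 0.1), 3))
```

With the hypothetical inputs 0.2 and 0.1 this comes out at about 0.324. If the samples turn out to be correlated, the number means little.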
I guess this is the reason for the "state of the Internet" segment of NANOG tomorrow. Other providers can speak for themselves.

pps- It's been a long enough thread already. Sorry to add to the thread. :(

------- Forwarded Message

Received: from interlock.ans.net (interlock.ans.net [147.225.5.5]) by brookfield.ans.net (8.7.3/8.7.3) with SMTP id NAA05470 for <curtis@brookfield.ans.net>; Mon, 21 Oct 1996 13:59:17 -0400 (EDT)
Received: by interlock.ans.net id AA05244 (InterLock SMTP Gateway 3.0 for regional-techsers@ans.net); Mon, 21 Oct 1996 13:59:45 -0400
Received: by interlock.ans.net (Internal Mail Agent-5); Mon, 21 Oct 1996 13:59:45 -0400
Received: by interlock.ans.net (Internal Mail Agent-4); Mon, 21 Oct 1996 13:59:45 -0400
Received: by interlock.ans.net (Internal Mail Agent-3); Mon, 21 Oct 1996 13:59:45 -0400
Received: by interlock.ans.net (Internal Mail Agent-2); Mon, 21 Oct 1996 13:59:45 -0400
Received: by interlock.ans.net (Internal Mail Agent-1); Mon, 21 Oct 1996 13:59:45 -0400
To: noc@tlg.net, help@uunet.uu.net, nanog@merit.net, ops@bbnplanet.com
Cc: barb@velvet.com, ianp@darktower.demon.co.uk
From: jbash@velvet.com
Subject: Ungodly packet loss rates
Date: Mon, 21 Oct 1996 10:42:00 -0700
Message-Id: <96Oct21.104256-0700pdt.18972-3+3@blue.velvet.com>
Sender: owner-nanog@merit.edu

[Resent... I stupidly used the wrong address for the NANOG list]

This is being sent to the "help-line" addresses of several Internet providers because they're not providing what I consider appropriate service. It's being sent to the NANOG mailing list because it represents what I believe to be an industry-wide problem.

I'm just a lowly end user, and perhaps I shouldn't intrude into the councils of the Wise and the Great, but this is just a bit ridiculous.

Attached is a traceroute from my home machine to the system I'm trying to work on over a TELNET session.
It looks like there's about a 40-percent overall round-trip loss rate, most or all of it apparently introduced in the Alternet and BBN Planet backbones. This is not a transient condition; it's been going on for at least several days, and similar things happen all the time.

I think we can all agree that a 40-percent loss rate isn't an acceptable level of service in an IP network. It's certainly making it annoying and frustrating for me to try to work. It's also driving up the load on the network by provoking retransmissions. A corporate internal network running at that loss rate would probably be considered to be in collapse.

I pay TLGnet (now Best) an agreed-upon amount of money every month, nominally in exchange for a reasonable level of Internet service. I think that part of TLGnet's obligation under that arrangement is to contract for reliable backbone service. Likewise, the other end of the path (Cisco Systems, for whom I am emphatically not, *not*, *NOT* speaking here) pays BBN what I suspect to be a very large amount of money indeed for DS3 service, presumably in the expectation that most of the packets that go into the DS3 will come out of the network somewhere. Alternet presumably has agreements with both TLGnet and BBN. That puts everybody on the hook.

I fully understand that it's difficult to provide reliable service in an exponentially-growing network. I'm aware that everybody's already using the fastest lines they can get, and connecting the fastest routers to them. I know that links are being added. I appreciate that both lines and equipment are very expensive, and that adding lines serves to complicate an already amazingly complex router configuration situation. I understand that cash-flow issues (as well as convincing-the-bean-counters issues) are involved. I sympathize...

... but the fact remains that I'm not getting the level of service I think I'm entitled to, nor are other end users.
Not only that, but if the level of service gets any lower, the Net will become so painful to use that I'll start wondering why I bother. While reducing my Net use might be good for my mental health, I don't think anybody wants to see users abandoning the Net because of poor service.

So, what's to be done about it? Assuming that all technical means are being pursued, and from what I've seen on various mailing lists I believe they probably are, the only thing left is a management fix.

May I make the probably-sacrilegious suggestion that the industry as a whole, and the providers I've mentioned in particular, show greater concern for the quality of service provided, and specifically--

  1. Stop taking on new customers (or other traffic sources) until existing customers can be provided with an appropriate level of service.

  2. Develop meaningful quality-of-service standards that can be used to guarantee reasonable performance in terms of end-to-end drop rates, delays, and downtime.

  3. Reexamine both pricing levels and the Internet pricing model, to make sure that there's enough money available to fund a usable level of service.

Yes, this means giving up some business. That's one of the costs of honoring your agreements... and of not alienating an entire generation of customers.

Thank you for your attention.

Although I usually scan at least the subject lines of messages sent to the NANOG list, I'm temporarily without access to the news server on which I ordinarily read the list. For the next few days, I won't be able to answer replies not sent to me directly.

-- J. Bashinski

blue% traceroute -a -q 25 -Q champagne.cisco.com
traceroute to checkpoint-sj.cisco.com (171.69.10.37), 30 hops max, 40 byte packets
 1 tongue.velvet.com (206.14.77.65) (2.8 ms/3.6 ms(+-0.9 ms)/15.8 ms) 25/25 (100.00%)
 2 tlg-cust-link.tlg.net (140.174.151.93) (39.0 ms/47.5 ms(+-10.3 ms)/134.8 ms) 25/25 (100.00%)
 3 mae-west.tlg.net (198.32.136.22) (40.7 ms/45.6 ms(+-9.2 ms)/57.0 ms) 25/25 (100.00%)
 4 905.Hssi3-0.GW1.SCL1.ALTER.NET (137.39.133.89) (43.9 ms/49.1 ms(+-9.9 ms)/60.8 ms) 25/25 (100.00%)
 5 Fddi0-0.CR1.SCL1.Alter.Net (137.39.19.5) (43.5 ms/48.8 ms(+-9.8 ms)/57.0 ms) 25/25 (100.00%)
 6 Hssi3-0.San-Jose3.CA.Alter.Net (137.39.100.1) * * (45.3 ms/52.5 ms(+-11.1 ms)/77.6 ms) 23/25 (92.00%)
 7 Fddi0-0.San-Jose6.CA.Alter.Net (137.39.27.12) * (44.4 ms/49.6 ms(+-10.2 ms)/60.8 ms) 24/25 (96.00%)
 8 Hssi1-0.Palo-Alto2.CA.ALTER.NET (137.39.101.162) (46.7 ms/52.6 ms(+-10.5 ms)/61.6 ms) 25/25 (100.00%)
 9 Fddi1-0.Palo-Alto3.CA.Alter.Net (137.39.47.7) * * * * * (57.9 ms/82.0 ms(+-21.6 ms)/248.5 ms) 20/25 (80.00%)
10 decwrl.bbnplanet.net (198.32.176.5) * * * * * * (50.4 ms/60.8 ms(+-14.0 ms)/68.9 ms) 19/25 (76.00%)
11 paloalto-br1.bbnplanet.net (4.0.1.57) * * * * * * (52.7 ms/63.6 ms(+-14.7 ms)/82.3 ms) 19/25 (76.00%)
12 paloalto-cisco.bbnplanet.net (131.119.0.196) * * * * * * * (57.0 ms/66.2 ms(+-15.7 ms)/80.7 ms) 18/25 (72.00%)
13 * 131.119.26.10 (131.119.26.10) * * * * * (54.8 ms/65.6 ms(+-15.1 ms)/75.6 ms) 19/25 (76.00%)
14 sj-wall-2.cisco.com (192.31.7.34) * * * * * (48.2 ms/77.2 ms(+-18.4 ms)/151.9 ms) 20/25 (80.00%)
15 * * * * * sj-eng-corp2.cisco.com (198.92.1.130) * * * * * (58.5 ms/70.3 ms(+-18.2 ms)/81.2 ms) 15/25 (60.00%)
16 * eng-atm-gw2.cisco.com (171.69.4.129) * * * * * * * * * (61.9 ms/67.1 ms(+-17.4 ms)/78.7 ms) 15/25 (60.00%)
17 sj-eng-corp1.cisco.com (171.69.5.10) * * * * * * * (52.2 ms/64.9 ms(+-15.4 ms)/81.8 ms) 18/25 (72.00%)
18 checkpoint-sj.cisco.com (171.69.10.37) * * * * * * * * * * * * * * * * (58.1 ms/78.6 ms(+-30.0 ms)/202.2 ms) 9/25 (36.00%)

------- End of Forwarded Message
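To turn the per-hop summaries in the traceroute above into loss figures, something like the following Python sketch could be used. It assumes each hop line ends with "received/sent (percent%)", which is how the output reads: the number of `*` marks on each line equals the number of probes sent minus the number received. The function name and the regular expression are illustrative and were not part of the original tooling.

```python
import re

# Assumed layout: "<hop> <host, stars, timings> <received>/<sent> (<pct>%)"
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)\s(\d+)/(\d+)\s+\(\d+\.\d+%\)\s*$")


def parse_hop(line):
    """Return (hop, host, received, sent, loss_percent), or None if no match."""
    m = HOP_RE.match(line)
    if m is None:
        return None
    hop = int(m.group(1))
    received, sent = int(m.group(3)), int(m.group(4))
    # The host is the first token that is neither a timeout mark ("*")
    # nor a parenthesised address or timing field.
    host = next(
        (tok for tok in m.group(2).split()
         if tok != "*" and not tok.startswith("(")),
        None,
    )
    # The printed percentage is the share of replies received,
    # so the loss is the complement.
    loss_percent = 100.0 * (sent - received) / sent
    return hop, host, received, sent, loss_percent


if __name__ == "__main__":
    sample = ("18 checkpoint-sj.cisco.com (171.69.10.37) * * * * * * * * "
              "* * * * * * * * (58.1 ms/78.6 ms(+-30.0 ms)/202.2 ms) "
              "9/25 (36.00%)")
    print(parse_hop(sample))
```

Bear in mind that replies from intermediate hops are generated by the routers themselves, so per-hop figures are only a rough guide to where loss is introduced.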
Participants (1): Curtis Villamizar