
Curtis: I was referring to when Merit had the NSFNET NOC...! ;-) Of course you are correct; if you observe the links over a long enough time, you will see loss. I hope that the orders of magnitude between 10% loss and 1E-4/1E-5 make an impression on persons saying that the first number is acceptable. I'm also glad to hear that MCI has continued its vigilance; they were always very ready to look into problems which we reported, run diagnostics with us, etc.

Steve R.
From list-admin@merit.edu Tue Nov 7 00:08:47 1995
Message-Id: <199511070414.XAA17732@brookfield.ans.net>
To: "Steven J. Richardson" <sjr@merit.edu>
cc: hwb@upeksa.sdsc.edu, michael@memra.com, D.Mills@cs.ucl.ac.uk, mn@tremere.ios.com, nanog@merit.edu, nathan@netrail.net
Reply-To: curtis@ans.net
Subject: Re: links on the blink (fwd)
In-reply-to: Your message of "Mon, 06 Nov 1995 15:18:15 EST." <199511062018.PAA08597@home.merit.edu>
Date: Mon, 06 Nov 1995 23:14:45 -0500
From: Curtis Villamizar <curtis@ans.net>
Steve,
Enough of your wild stories of -0%- loss. :-) The correct figure was 10^-5 for acceptance with 10^-4 being the maximum threshold we would accept on a running circuit before contacting MCI to take the circuit in a maintenance window for diagnostics. That doesn't mean we wouldn't bug MCI to get the circuits back perfectly clean. ;-)
We still have the same criteria. I think MCInet is also as vigilant.
Curtis

On Tue, 7 Nov 1995, Steven J. Richardson wrote:
Of course you are correct; if you observe the links over a long enough time, you will see loss. I hope that the orders of magnitude between 10% loss and 1E-4/1E-5 make an impression on persons saying that the first number is acceptable.
You are misinterpreting my statements and I think this is because you are forgetting the time element. Even 100% packet loss is acceptable and was frequently occurring on the old NSFnet. If you measure over the time interval that a 1500 byte packet takes to traverse a gateway then loss of a single packet will register as 100% loss. So in order to state what percentage of loss is acceptable and be unambiguously understood, you need to specify the time element. Of course, I am also guilty of not explicitly stating this time element.

Then there is the difference between what loss is acceptable and what loss is desired. It is likely that most core NSP's would consider a DESIRABLE packet loss rate over an hour of time to be .001% but it is also possible to simultaneously consider 10% packet loss over the same time period to be ACCEPTABLE as long as it does not occur during more than one hour out of 24. Note the similarities between my statement re 10% and the familiar refrain from polling companies, "accurate to within 2 percentage points 19 times out of 20".

To get a REASONABLE standard of packet loss you have to qualify your numbers by saying that a loss rate of .001% 23 hours out of 24 is desirable but a loss rate of 10% 1 hour out of 24 is acceptable. This recognizes the reality of today's global Internet which is not anywhere near fully meshed and which is experiencing sustained surges of growth. Just like in a race condition, sometimes the NSP's will fall behind due to line failures and equipment failures and new equipment shipping failures and so on. We don't get anywhere by slamming NSP's for aberrations even if those aberrations recur on a daily basis for short periods of time because the Model T Ford level of network technology that we are using just doesn't allow them to do much better.

Michael Dillon                    Voice: +1-604-546-8022
Memra Software Inc.               Fax: +1-604-542-4130
http://www.memra.com              E-mail: michael@memra.com
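The window arithmetic in Michael's example can be sketched in a few lines; the per-hour packet counts below are invented purely for illustration of how the same day reads as 0.001%, 10%, or ~0.4% loss depending on the window:

```python
# Sketch (all traffic figures hypothetical): the same trace yields very
# different "loss rates" depending on the averaging window, which is
# exactly the missing time element Michael points out.

def loss_rate(lost, sent):
    """Fraction of packets lost in one measurement window."""
    return lost / sent if sent else 0.0

# One hypothetical day: 23 clean hours at ~0.001% loss, one congested
# hour at 10% loss. Each tuple is (packets sent, packets lost).
hours = [(3_600_000, 36)] * 23 + [(3_600_000, 360_000)]

per_hour = [loss_rate(lost, sent) for sent, lost in hours]
daily = loss_rate(sum(l for _, l in hours), sum(s for s, _ in hours))

print(max(per_hour))    # worst single hour: 0.1 (10%)
print(min(per_hour))    # clean hours: 1e-05 (0.001%)
print(round(daily, 5))  # 0.00418 -- averaged over 24h the burst fades
```

Averaged over the whole day the congested hour nearly disappears, which is why a loss figure without its window is ambiguous.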

Mike, I bet myself you would respond the way you did before I finished processing my mail bag. I won. Sure, 100% packet loss is eminently acceptable if that loss rate occurs not more than 1% of the time. Maybe 10% packet loss is acceptable if it occurs no more than 2% of the time and not in the same breath as the former. Please deliver me/us from all this. The performance of the US/UK link is clearly unacceptable. Geeze, but it would be nice to have a loss profile which would allow characterization on burst frequencies/durations for that, but the carriers serving Dog Island in the middle of the Thames ain't talking. The loss profile that HW is talking about might even be marginally acceptable, in spite of his squawks, but we don't have the data to develop that, right? Dave

In message <Pine.LNX.3.91.951107212422.18960D-100000@okjunc.junction.net>, Michael Dillon writes:
So in order to state what percentage of loss is acceptable and be unambiguously understood, you need to specify the time element. Of course, I am also guilty of not explicitly stating this time element.
10^-4 and 10^-5 is determined by a test involving a bit under 10^5 packets per run. <g>. The test is generally run many times during circuit acceptance. It is run on circuits that are suspected of trouble. Since the traffic source is pps limited, the test can be run on a live lightly loaded DS3.
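The arithmetic behind a run of that size can be sketched as follows; the run size and the two loss rates come from Curtis's description, while the Poisson model and cutoff of 10 losses are assumptions added for illustration:

```python
import math

# Sketch: why ~10^5 packets per run is enough to tell 10^-5 from 10^-4.
# The expected loss counts differ by 10x, and (modeling losses as
# Poisson, an assumption not stated in the thread) a single run already
# separates the two rates cleanly.

RUN_SIZE = 100_000  # "a bit under 10^5 packets per run"

def poisson_sf(k, lam):
    """P(X >= k) for a Poisson(lam) loss count: 1 - CDF(k - 1)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))

expect_clean = RUN_SIZE * 1e-5  # ~1 lost packet expected at 10^-5
expect_limit = RUN_SIZE * 1e-4  # ~10 lost packets expected at 10^-4

# Seeing 10+ losses in one run is vanishingly unlikely on a clean
# circuit, but better than even odds on one at the 10^-4 threshold.
p_false_alarm = poisson_sf(10, expect_clean)
p_detect = poisson_sf(10, expect_limit)
```

Repeating the run during acceptance, as described above, tightens the estimate further.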
To get a REASONABLE standard of packet loss you have to qualify your numbers by saying that a loss rate of .001% 23 hours out of 24 is desirable but a loss rate of 10% 1 hour out of 24 is acceptable. This recognizes the reality of today's global Internet which is not anywhere near fully meshed and which is experiencing sustained surges of growth. Just like in a race condition, sometimes the NSP's will fall behind due to line failures and equipment failures and new equipment shipping failures and so on.
We also allow a very small number of SES. I think it is 60 or 90 SES per day with a lower threshold on any 15 minute interval. This is reported in the DS3 MIB. One hour of loss has never been acceptable.

The NSS routers are pps limited but rock solid below a certain pps ceiling. We have strived to work around this by arranging our topology to avoid exceeding the pps limits until we can replace these routers. The goal here is to keep below 10^-4 packet loss over 15 minute periods including circuit or FDDI congestion within our core. Only tail circuits to customers are allowed to congest (as long as that is all the customer is willing to pay for their attachment). Lately we have been having difficulty with the pps limits but nowhere near 10% even briefly. At certain hot spots we have occasionally set up temporary shell programs checking error rates on a 1 second interval, saving intervals with high error rate. We will probably have to adjust our topology again to distribute traffic differently and have ordered circuits specifically to avoid loss.

The point is that not everyone accepts high loss rates as "normal".
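A hedged sketch of the SES budget check Curtis describes: the daily figure comes from his message, but he does not give the lower 15-minute threshold, so the per-interval value here is a made-up stand-in.

```python
# Sketch of a severely-errored-seconds (SES) budget: a daily ceiling
# plus a tighter per-15-minute ceiling, as described in the message.
# The 60/day figure is from the thread; the per-interval value of 5 is
# an invented placeholder, not a real ANS or MCI number.

DAILY_SES_BUDGET = 60     # "60 or 90 SES per day"
INTERVAL_SES_BUDGET = 5   # hypothetical lower 15-minute threshold

def circuit_ok(ses_per_interval):
    """ses_per_interval: SES counts for the 96 15-minute intervals in a day."""
    if sum(ses_per_interval) > DAILY_SES_BUDGET:
        return False
    return all(s <= INTERVAL_SES_BUDGET for s in ses_per_interval)

clean_day = [0] * 95 + [3]
bursty_day = [0] * 95 + [12]  # one bad 15-minute interval trips the check
print(circuit_ok(clean_day))   # True
print(circuit_ok(bursty_day))  # False
```

The per-interval check is what catches a short burst that a daily average would hide, mirroring the point about measurement windows earlier in the thread.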
Curtis

On Wed, 8 Nov 1995, Curtis Villamizar wrote:
We also allow a very small number of SES. I think it is 60 or 90 SES per day with a lower threshold on any 15 minute interval. This is reported in the DS3 MIB. One hour of loss has never been acceptable.
The NSS routers are pps limited but rock solid below a certain pps ceiling. We have strived to work around this by arranging our topology to avoid exceeding the pps limits until we can replace these routers. The goal here is to keep below 10^-4 packet loss over 15 minute periods including circuit or FDDI congestion within our core. Only tail circuits to customers are allowed to congest (as long as that is all the customer is willing to pay for their attachment). Lately we have been having difficulty with the pps limits but nowhere near 10% even briefly. At certain hot spots we have occasionally set up temporary shell programs checking error rates on a 1 second interval, saving intervals with high error rate. We will probably have to adjust our topology again to distribute traffic differently and have ordered circuits specifically to avoid loss.
The point is that not everyone accepts high loss rates as "normal".
Depends what you mean by normal. I understand that as an NSP you don't accept high loss rates as normal and ignore them. You monitor loss rates and take action to deal with them. But the fact is that in today's Internet it is a normal state of affairs for end users to experience high loss rates from time to time on some of their connections. The end user must accept this since it is a fact of life and totally beyond their control or their ISP's control. In fact, I suspect it is even beyond your control. You are not able to predict 100% of the time where congestion will occur and what will be needed to cope with it. All you can do is detect it fast, and act fast to fix the problem. During the interval between the first occurrence of the problem and the fix, end users have to live with congestion. That's what I mean by "normal".

Michael Dillon                    Voice: +1-604-546-8022
Memra Software Inc.               Fax: +1-604-542-4130
http://www.memra.com              E-mail: michael@memra.com
participants (4)
- Curtis Villamizar
- Dave Mills
- Michael Dillon
- Steven J. Richardson