Re: Limits of reliability or is 99.999999999% realistic

27 Nov 2000

      On Mon, 27 November 2000, Robert Cooper wrote:
...
On the other hand the data is somewhat conservative since it counts
total number of subscribers, not the total number of active or would-be
active users during the outage. The data does include overloads which
in the PSTN manifest themselves through call admission control
(e.g. network busy signal).
As with most statistical studies, understanding the data collection process
is key.  Using the FCC outage reports has some good features.  In theory
all carriers are under the same obligation to report outages meeting
the defined thresholds, so you should be able to compare the data
across carriers.

There are some data anomalies, which make it difficult to compare
the data more widely.  A failure to report a reportable outage is
difficult to detect in the FCC data.  And even among carriers reporting,
there is tremendous variation in the amount of information supplied.

Outages in leased, rather than owned, facilities are occasionally reported.
Shared leased facilities can also result in a very widespread but
unreported outage, because the owner may not have 30,000 direct customers
although the downstream carriers may have far more customers impacted.
The various Illuminet SS7 problems are an example of widespread,
unreported problems.  In addition, data or packet-switched outages
are only reportable insofar as they affect the voice network.
For example, when AT&T's frame-relay network failed, the analysis done
was of the impact of many ISDN dial-backup devices simultaneously
attempting to re-establish their data connections across the switched
telephone network.

And finally, the FCC outage data excludes most outages shorter than
30 minutes.  So it is impossible to claim 99.999% reliability
based on just the FCC data, because the resolution of the data isn't
fine enough to detect outages of 5 minutes, which is roughly all the
downtime five nines allows in a year.  The most you can claim is
four nines.
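
To make the arithmetic behind that limit concrete, here is a minimal
sketch.  It assumes availability is measured over an average year of
365.25 days and that the 30-minute reporting threshold is the shortest
outage the data can show.  The function names are mine, for illustration
only, and are not part of any FCC methodology.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year


def downtime_budget_minutes(nines: int) -> float:
    """Annual downtime allowed by `nines` nines of availability (5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10.0 ** (-nines)


def max_claimable_nines(threshold_minutes: float, limit: int = 12) -> int:
    """Largest number of nines whose annual downtime budget is still at
    least as long as the shortest outage the data set records."""
    best = 0
    for nines in range(1, limit + 1):
        if downtime_budget_minutes(nines) >= threshold_minutes:
            best = nines
    return best


if __name__ == "__main__":
    for nines in range(2, 6):
        print(f"{nines} nines: {downtime_budget_minutes(nines):8.2f} minutes/year")
    print("claimable with a 30-minute threshold:", max_claimable_nines(30.0))
```

Four nines allows about 52.6 minutes of downtime a year and five nines
about 5.3, so a data set that cannot see anything shorter than 30 minutes
cannot distinguish a five-nines network from a somewhat worse one.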
...
What data exists for the Internet?
There are several sets of data for the Internet.  As a starting point,
I point people to the Labovitz and Ahuja study "Experimental Study of
Internet Stability and Wide-Area Backbone Failures"

http://www.eecs.umich.edu/techreports/cse/1998/CSE-TR-382-98.pdf

One interesting piece of data they found is that the mean time to repair
an Internet problem is 20 minutes.  If we followed the FCC outage
reporting threshold of thirty minutes or more, most (mean? half?)
of the Internet problems wouldn't be counted as outages.  Also see
the list of references at the end of the paper.
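
How many problems fall under the threshold depends on the shape of the
repair-time distribution, which a mean alone doesn't tell you.  As a
rough sketch only, suppose repair times were exponentially distributed
with a 20-minute mean.  That distribution is my assumption for
illustration, not something taken from the study.

```python
import math


def fraction_longer_than(threshold_minutes: float, mean_minutes: float) -> float:
    """Share of repairs exceeding the threshold, ASSUMING exponentially
    distributed repair times (an illustrative assumption, not a study result)."""
    return math.exp(-threshold_minutes / mean_minutes)


def median_repair_minutes(mean_minutes: float) -> float:
    """Median of an exponential distribution with the given mean."""
    return mean_minutes * math.log(2)


if __name__ == "__main__":
    counted = fraction_longer_than(30.0, 20.0)
    print(f"share of problems lasting over 30 minutes: {counted:.1%}")
    print(f"share that would go unreported: {1 - counted:.1%}")
    print(f"median repair time: {median_repair_minutes(20.0):.1f} minutes")
```

Under that assumption roughly three quarters of the problems would never
show up in FCC-style data.  A heavier-tailed distribution would change
the number, so treat it as an order-of-magnitude illustration.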

Other sources of data include provider traffic reports, such as

http://traffic.cw.net/
http://ipnetwork.bgtmo.ip.att.net/

provider status pages, such as

http://www.noc.uu.net/
http://help.mindspring.com/netstatus/

and third-party performance reporting, such as

http://average.miq.net/
http://internetpulse.com/
http://www.whatsdown.com/

Sean Donelan