On Mon, 27 November 2000, Robert Cooper wrote:
On the other hand, the data is somewhat conservative, since it counts the total number of subscribers rather than the number of active or would-be active users during the outage. The data does include overloads, which in the PSTN manifest themselves through call admission control (e.g., a network busy signal).
As with most statistical studies, understanding the data collection process is key. Using the FCC outage reports has some good features. In theory all carriers are under the same obligation to report outages meeting the defined thresholds, so you should be able to compare the data across carriers. There are some data anomalies, however, which make it difficult to compare the data more widely. A carrier's failure to report a reportable outage is difficult to detect in the FCC data. And even among carriers that do report, there is tremendous variation in the amount of information supplied. Outages in leased, rather than owned, facilities are occasionally reported. Shared leased facilities can also result in a very widespread but unreported outage, because the owner may not have 30,000 direct customers even though the downstream carriers may have far more customers affected. The various Illuminet SS7 problems are an example of widespread, unreported problems. Also, data or packet-switched outages are only reportable insofar as they affect the voice network. For example, when AT&T's frame-relay network failed, the analysis done was of the impact of lots of ISDN dial-backup devices simultaneously attempting to re-establish their data connections across the switched telephone network.

Finally, the FCC outage data excludes most outages less than 30 minutes in length. So it is impossible to claim 99.999% reliability based on the FCC data alone, because the resolution of the data isn't fine enough to detect 5-minute outages. At most you can claim four nines.
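The four-nines ceiling follows from simple arithmetic on yearly downtime budgets. A minimal sketch, assuming a 365-day year and treating the 30-minute cutoff as a hard threshold (the FCC data excludes *most*, not all, shorter outages); the function and constant names are illustrative only:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year
REPORTING_THRESHOLD_MIN = 30      # assumed hard cutoff; shorter outages are treated as invisible

def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (5 -> 99.999%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability

def threshold_can_resolve(nines: int) -> bool:
    """True if a single outage big enough to be reported fits inside the yearly budget.

    If the whole budget is shorter than the threshold, a carrier could exceed
    the budget several times over without any outage appearing in the data.
    """
    return allowed_downtime_minutes(nines) >= REPORTING_THRESHOLD_MIN

for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):7.2f} min/year, "
          f"resolvable with a {REPORTING_THRESHOLD_MIN}-min threshold: {threshold_can_resolve(n)}")
```

Five nines allows roughly 5.3 minutes of downtime per year, well under the 30-minute threshold, while four nines allows roughly 52.6 minutes, so four nines is the finest target the thresholded data can even begin to speak to.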
What data exists for the Internet?
There are several sets of data for the Internet. As a starting point, I point people to the Labovitz and Ahuja study "Experimental Study of Internet Stability and Wide-Area Backbone Failures":

  http://www.eecs.umich.edu/techreports/cse/1998/CSE-TR-382-98.pdf

One interesting finding is that the mean time to repair an Internet problem is 20 minutes. If we followed the FCC outage reporting threshold of thirty minutes or more, most (or, since 20 minutes is only a mean, perhaps just half?) of the Internet problems wouldn't be counted as outages. Also see the list of references at the end of the paper.

Other sources of data include provider traffic reports, such as

  http://traffic.cw.net/
  http://ipnetwork.bgtmo.ip.att.net/

provider status pages, such as

  http://www.noc.uu.net/
  http://help.mindspring.com/netstatus/

and third-party performance reporting, such as

  http://average.miq.net/
  http://internetpulse.com/
  http://www.whatsdown.com/
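Whether "most" or only "half" of Internet problems would fall under a 30-minute reporting threshold depends on the shape of the repair-time distribution, not just on the 20-minute mean from the Labovitz and Ahuja study. The sketch below is purely illustrative: the exponential model and the sample lists are hypothetical assumptions, not data from the paper. It shows that the same mean can put anywhere from about half to all of the problems below the threshold.

```python
import math

MEAN_REPAIR_MIN = 20.0  # mean time to repair cited from Labovitz and Ahuja
THRESHOLD_MIN = 30.0    # FCC outage reporting threshold

def share_below_exponential(mean: float, threshold: float) -> float:
    """P(T < threshold) if repair time T is exponential with this mean (an assumption)."""
    return 1.0 - math.exp(-threshold / mean)

def share_below_samples(samples: list, threshold: float) -> float:
    """Fraction of repair times strictly shorter than the threshold."""
    return sum(1 for t in samples if t < threshold) / len(samples)

# Hypothetical repair times in minutes; both lists have a mean of exactly 20.
uniform_case = [20.0, 20.0, 20.0, 20.0]  # every problem lasts exactly 20 minutes
split_case = [2.0, 2.0, 38.0, 38.0]      # half very short, half just over the threshold

print(f"exponential model: {share_below_exponential(MEAN_REPAIR_MIN, THRESHOLD_MIN):.1%} below threshold")
print(f"exponential median: {MEAN_REPAIR_MIN * math.log(2):.1f} min")
print(f"uniform_case: {share_below_samples(uniform_case, THRESHOLD_MIN):.0%} below threshold")
print(f"split_case: {share_below_samples(split_case, THRESHOLD_MIN):.0%} below threshold")
```

Under the exponential assumption about 78% of problems would escape a 30-minute threshold; the two hypothetical sample sets give 100% and 50%. Answering the question properly needs the median or the full distribution from the study itself.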