On 31 Oct 2000, Sean Donelan wrote:
One anecdotal data point: I've been reporting on Internet problems for the last five years or so. Over those five years, no Internet network event has been severe enough to prevent me from reporting about the problem on the net. In a strange way, my postings about the problems on the net are proof of the reliability of the same network.
In the same time period, I've lost my telephone service several times.
I've lost my pager service multiple times.
Even the Associated Press has gone down in the last five years.
So I'm sick and tired of hearing that the telephone network is 99.999% reliable and the Internet isn't.
Me, too. I'll put some finer points on the topic, though.

The services you cite (phone, paging, AP) have essentially one application from a user's point of view. The Internet has thousands of different applications at the user level, and each handles this or that "outage" differently. Email is particularly robust, given its store-and-forward behavior; a bounce is the only failure mode that readily comes to mind (i.e., any mail that's eventually delivered is a success). Other applications behave differently, but what really complicates the analysis is the number of end nodes: the matrix of possible src/dest pairs quickly climbs into the billions, with port and protocol multiplexing on top of that.

Thinking of things this way, it seems clear there's no way to measure the "up-ness" of any part of the Internet that hasn't been isolated by an outage at its local entry point. In other words, short of its ethernet cable being unplugged or the WAN link out of the building going down, a given machine seems to be "up," as does the larger network, yet we can be sure there's something somewhere it can't get to and some application that would be affected. Given this, measurements based on 9's seem particularly ill-suited, and any metric that isn't *extremely* narrowly defined seems incalculable.

How to explain this to customers, though? One possible approach would be to remind them that the way most users approach Internet applications approximates, "Oh, it's not working now, time for a coffee break." Nothing seems to stop people from building inappropriate applications on top of IP, though. (While I might consider it folly to depend on a web page to send sell orders for my stocks when the market's crashing, can I be sure that whatever other approach is in the back of my mind would work in a "bad time"?)

I'd like to see more discussion in this forum of new ways to think about risk and to communicate it to those outside the technical IP community (e.g., managers and customers).

Tony
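P.S. For anyone who wants to sanity-check the two numbers hand-waved above (the src/dest pair count and what the nines actually buy you), here's a rough back-of-envelope sketch in Python. The host count is purely an assumed figure for illustration, not a measurement of anything:

    # Illustrative arithmetic only; the host count is an assumption.
    hosts = 100000                       # assume a modest population of end nodes
    pairs = hosts * (hosts - 1)          # ordered src/dest pairs, before port/protocol multiplexing
    print(f"{pairs:.2e} possible src/dest pairs")

    minutes_per_year = 365 * 24 * 60
    for availability in (0.999, 0.9999, 0.99999):
        allowed = minutes_per_year * (1 - availability)
        print(f"{availability:.3%} availability allows ~{allowed:.1f} minutes of downtime per year")

Even at that assumed size, the pair count is already around ten billion, and five nines works out to roughly five minutes of downtime per year, which is part of why the figure is so hard to apply to anything as loosely bounded as "the Internet."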