On Fri, 21 Jun 1996, Peter Kaminski wrote:
> Can other big parts of the backbone fall down and take 13 (or more) hours to get back up? Or is the rest of the net engineered more redundantly than Netcom? Should I build two backbones, each with separate technologies?
Ask NASA how they do it: three redundant systems using two separate technologies. But then look at NASA's downside and compare it to yours. If Netcom's customers hardly noticed this, maybe the dialup market doesn't care. The leased-line market, however, is a whole other story: those customers have the technical expertise to understand your backbone engineering, and they may be willing to pay a higher fee for that redundancy. This question really tangles marketing and engineering concerns together.
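To put rough numbers on the redundancy argument, here is a small sketch. It assumes each backbone fails independently with a fixed availability; the 0.999 figures are hypothetical, not measurements. It also shows how one shared element in series (for example, a single fibre bundle under both "redundant" backbones) caps the benefit, which is the kind of common-mode failure that separate technologies and separate paths are meant to avoid.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def parallel_availability(availabilities):
    """Availability of redundant systems in parallel, assuming independent
    failures: service is lost only when every system is down at once."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def with_common_mode(availabilities, shared):
    """Same redundant set, but every system also depends on one shared
    element (hypothetically, a single conduit) with availability `shared`,
    which sits in series with the whole redundant group."""
    return shared * parallel_availability(availabilities)


def downtime_hours_per_year(availability):
    """Expected hours of outage per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    one = parallel_availability([0.999])
    two = parallel_availability([0.999, 0.999])
    shared = with_common_mode([0.999, 0.999], 0.999)
    for label, a in [("one backbone", one),
                     ("two independent", two),
                     ("two, one shared conduit", shared)]:
        print(f"{label}: {downtime_hours_per_year(a):.3f} h/yr")
```

Under these assumptions, two independent 99.9% backbones cut expected downtime from about 8.8 hours a year to about half a minute, but putting both behind one 99.9% conduit brings it back to roughly 8.8 hours.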
> Was this a foreshock of the coming Metcalfean Big One, or just lousy procedures at one of the bigger ISPs?
The bigger they are, the harder they fall. It seems to me that as ISPs and NSPs get larger, their failures will be more spectacular. However, the Big One depends on failures being able to propagate from one ISP/NSP to another, and I don't think that is very likely, partly because of the different engineering styles and partly because of the diversity of technology deployed: you have frame relay backbones, ATM fabrics, DS3 meshes with Cisco nodes, and DS3 meshes with Bay nodes.

Until Netcom, the most spectacular failures I recall seeing over the past two years were caused either by NAP congestion or by backhoes. NAP congestion is partly a management failure to deploy bigger pipes and routers, and to increase the number of NAPs, in time to meet the growth in traffic. But it is also self-correcting: some customers migrate to NSPs with less congestion, and management injects capital into its infrastructure. It seems to be a well-understood problem.

To me, backhoes are the most interesting failure mode. For one thing, I don't think backhoe problems can be eliminated, and I think that as the physical mesh of fibre becomes more finely divided over the geography, these incidents will increase. I also don't know of anyone taking action to protect against them by building geographic redundancy into their backbones. This may be partly because NSPs often have no idea where the fibres lie, and partly because they want to use a specific infrastructure, like SPRINT and its railway rights of way. The incident in the Northeast, where a backhoe cut a Wiltel(?) fibre bundle carrying critical DS3s leased by all the NSPs in the region, shows how catastrophic this can be.

Michael Dillon
ISP & Internet Consulting
Memra Software Inc.
Fax: +1-604-546-3049
http://www.memra.com
E-mail: michael@memra.com