Even when you have generators, there are still a myriad of things which can and do go wrong no matter how hard you try, even with trebly redundant backups, etc. etc.

A fuel supplier accidentally puts half a load of jet fuel instead of #2 diesel in your storage tank, which you won't find until it reloads the run tank from the day tank.

Floods happen and the water rises higher than the 3-story snorkel on the diesels.

A generator takes a direct lightning strike and fries house DC power (even inside shielded enclosures). A fiber transmission system goes crazy when the control system is zapped by the lightning strike on the generator.

A 200mph hurricane gust rips a microwave system off the roof (tower and all) and throws it down on the generators, crushing the exhaust system, and the diesels strangle. (No, I'm not imagining these.)

We all try very, very hard to make things reliable, but the world isn't perfect, nor are any of us. BBN will probably find some things to improve and change the odds next time. Then it will be someone else's turn to catch the javelins.

I suggest that neither gleeful hand-rubbing nor "obvious" pronouncements based on partial knowledge about the real situation will seem quite so appropriate when it's *your* turn to be downfield from the launcher.

Peace,
-mo
On Sat, 12 Oct 1996, Mike O'Dell wrote:
A fuel supplier accidentally puts half a load of jet fuel instead of #2 diesel in your storage tank, which you won't find until it reloads the run tank from the day tank.
Take samples from the fuel tank and test them every time tanks are filled.
Floods happen and the water rises higher than the 3-story snorkel on the diesels.
Don't site buildings where there have been floods in the past 500 years, and don't site buildings downstream from dams. If a waterproof building is built in a flood area, make sure that snorkels rise above the 500-year flood level.
A generator takes a direct lightning strike and fries house DC power (even inside shielded enclosures). A fiber transmission system goes crazy when the control system is zapped by the lightning strike on the generator.
Is there any way to protect against lightning?
A 200mph hurricane gust rips a microwave system off the roof (tower and all) and throws it down on the generators, crushing the exhaust system, and the diesels strangle. (No, I'm not imagining these.)
When siting a diesel exhaust system make sure that there are no trees, towers or similar things nearby that could fall on it. If in a hurricane area, reinforce diesel snorkels and arrange for multiple paths to get air in.
We all try very, very hard to make things reliable, but the world isn't perfect, nor are any of us.
Yup. Now everybody else has learned from BBN's mistakes and from your anecdotes. Thus we make the network more reliable one step at a time. Michael Dillon - ISP & Internet Consulting Memra Software Inc. - Fax: +1-604-546-3049 http://www.memra.com - E-mail: michael@memra.com
Sure, Mike, but how do you protect against an airplane falling out of the sky? Or having the building that houses your generators flattened by a runaway semi? Or the ever-present possibility that the building next door will have a gas leak and explode? And what about that house-sized meteor that could come hurtling down?

Give me a break. Hindsight is 20/20; it's easy to see how things could have been avoided, but excessive paranoia can and does get in the way of getting real work done. Any engineer worth his salt will tell you that 100% reliability is unattainable. IMHO, these days, with the technology we work with daily as young as it is, I'm impressed with 90% uptime...

For all the effort you put into saying how you could have done better, I sure hope you check the fuel quality on your generator and hide it (and the rest of your ISP) in a flood-proof, well-ventilated bomb shelter. I hear the government has an installation that might meet your standards somewhere under Cheyenne Mountain....

-Z
On Sat, 12 Oct 1996, Zachary DeAquila wrote:
Sure, Mike, but how do you protect against an airplane falling out of the sky? or having the building that houses your generators flattened by a runaway semi? Or the ever present possibility that the building next door will have a gas leak and explode? And what about that house-sized meteor that could come hurtling down?
I don't think he was talking about protecting from something like a runaway semi, but you should protect your gear from as much as you can. It took us 6 months to get a location to build our Arlington colo. We did do things like only pick buildings over the 500-year flood stage, dual power feeds from diverse locations, a generator, redundant fiber feeds from diverse locations, walls built out of brick so that it would be harder to break in, and much more. BBN should have had a generator at the site and did not; they have now learned from their error. True, that would not protect them from a "house-sized meteor", but a backup generator on site is basic.

Nathan Stratton CEO, NetRail, Inc. Tracking the future today!
---------------------------------------------------------------------------
Phone   (703)524-4800                   NetRail, Inc.
Fax     (703)534-5033                   2007 N. 15 St. Suite 5
Email   sales@netrail.net               Arlington, Va. 22201
WWW     http://www.netrail.net/         Access: (703) 524-4802 guest
---------------------------------------------------------------------------
"Therefore do not worry about tomorrow, for tomorrow will worry about itself. Each day has enough trouble of its own." Matthew 6:34
On Sat, 12 Oct 1996, Zachary DeAquila wrote:
Sure, Mike, but how do you protect against an airplane falling out of the sky? or having the building that houses your generators flattened by a runaway semi? Or the ever present possibility that the building next door will have a gas leak and explode? And what about that house-sized meteor that could come hurtling down?
I suppose you think this is funny. But the people who run datacenters for large corporations (like insurance companies) and important government operations (like the taxman) do take these things into consideration. That's why you find redundant locations (like multiple exchange points) and data centers that are located two stories underground. About the only scenario you mentioned that the underground data center is vulnerable to is the meteor, and a baseball-sized one would likely suffice to destroy a whole town. That's why it is wise not to have everything at one physical location. Redundancy, redundancy, redundancy.
Give me a break. Hindsight is 20/20, it's easy to see how things could have been avoided,
That's right, so use hindsight to make better plans for the future.
but excessive paranoia can and does get in the way of getting real work done.
Not at all. Paranoia is for the people who make site plans and who recommend site planning issues to management. It doesn't need to consume your attention all day long. Just be ready when the boss comes in for a tour: point at a box in the corner and say, "See that box there? If it breaks, then the entire Northeast would be off the air for 24 hrs".
Any engineer worth his salt will tell you that 100% reliability is unattainable - IMHO, these days with the technology we work with daily as young as it is, I'm impressed with 90% uptime...
I'm not. Five nines quality *IS* attainable and the telcos generally manage this. Maybe individual components or subsystems will have as low as 90% uptime, but the entire mesh can be engineered for 99.999% uptime even with unreliable components like that. Five nines is equivalent to 8.76 hours downtime per year and that includes scheduled events.
For all the effort you put into saying how you could have done better,
I don't recall saying that I could have done better. I do recall saying that we (the industry as a whole) can do better in the future. Rather than throw up our hands when these events occur and say it's just bad luck, we can use them to learn where our blind spots are and fix the problems.
I hear the government has an installation that might meet your standards somewhere under Cheyenne Mountain....
I think that installation has much better than five nines uptime. What's wrong with learning from their example? If organizations like the Freemen and the OK City bombers weren't such frigging idiots they could probably destroy Western civilization as we know it by knocking out most of the USA's key power and communications infrastructure. Modern technological civilization is built on a house of cards and it's about time we started hardening the foundations before it collapses. Michael Dillon - ISP & Internet Consulting Memra Software Inc. - Fax: +1-604-546-3049 http://www.memra.com - E-mail: michael@memra.com
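The claim above, that a mesh built from components with only 90% uptime can still reach 99.999%, rests on redundancy arithmetic. The sketch below is only an illustration of that arithmetic, not anything posted in the thread: the function name `parallel_availability` is hypothetical, and it assumes that failures are statistically independent and that any single surviving component can carry the whole load. Shared power feeds, shared fiber paths, or one lightning strike break the independence assumption, which is much of what this thread is arguing about.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumption (not from the thread): failures are independent, so the
    group is down only when every copy happens to be down at once.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # Probability that all copies are down simultaneously.
    all_down = (1.0 - component_availability) ** copies
    return 1.0 - all_down


if __name__ == "__main__":
    # How availability grows as 90%-uptime components are added in parallel.
    for n in range(1, 6):
        print(f"{n} x 90% components -> {parallel_availability(0.90, n):.5%}")
```

Under these idealized assumptions, five independent 90% components in parallel are needed to reach five nines; correlated failures make the real figure worse.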
In message <Pine.BSI.3.93.961012190050.16602C-100000@sidhe.memra.com>, Michael Dillon writes:
Any engineer worth his salt will tell you that 100% reliability is unattainable - IMHO, these days with the technology we work with daily as young as it is, I'm impressed with 90% uptime...
I'm not. Five nines quality *IS* attainable and the telcos generally manage this. Maybe individual components or subsystems will have as low as 90% uptime, but the entire mesh can be engineered for 99.999% uptime even with unreliable components like that. Five nines is equivalent to 8.76 hours downtime per year and that includes scheduled events.
You must have missed my half-joking posting on the voice system outages we've seen (as just one customer of the voice network). In order to give us 99.999%, Nynex owes us 250 years of flawless service for the one 22-hour outage I mentioned. Since there have been other outages to our voice service, I think we're due for a good 1,000 years of flawless service to bring us up to 99.999%. If we hold BBN to the same standards that the telco industry uses to come up with 99.999%, BBN was up but "a few customers" experienced a localized outage. This would be brushed off the same as the Illinois AT&T fire at the POP that took out much of the Chicago suburbs for about a week. It doesn't count against the 99.999%. (Otherwise AT&T owes Chicago a couple thousand years of flawless service :).

Curtis
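The "250 years" figure above can be checked with a few lines of arithmetic. This is a minimal sketch with a hypothetical helper name (`years_to_amortize`), assuming a 365-day year and that availability is measured as uptime divided by total elapsed time for a single customer.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap days are ignored in this sketch


def years_to_amortize(outage_hours: float, availability: float) -> float:
    """Total period (in years) over which `outage_hours` of downtime
    still averages out to the given availability."""
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be strictly between 0 and 1")
    allowed_downtime_fraction = 1.0 - availability
    total_hours = outage_hours / allowed_downtime_fraction
    return total_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # One 22-hour outage measured against a 99.999% target.
    print(f"{years_to_amortize(22, 0.99999):.0f} years")
```

For a 22-hour outage at 99.999% this comes to roughly 251 years, which agrees with the figure quoted in the message.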
Date: Sat, 12 Oct 1996 19:02:50 -0500
From: Zachary DeAquila <zachary@zachs.place.org>
Sender: owner-nanog@merit.edu

Sure, Mike, but how do you protect against an airplane falling out of the sky? Or having the building that houses your generators flattened by a runaway semi? Or the ever-present possibility that the building next door will have a gas leak and explode? And what about that house-sized meteor that could come hurtling down?

Give me a break. Hindsight is 20/20; it's easy to see how things could have been avoided, but excessive paranoia can and does get in the way of getting real work done. Any engineer worth his salt will tell you that 100% reliability is unattainable. IMHO, these days, with the technology we work with daily as young as it is, I'm impressed with 90% uptime...

For all the effort you put into saying how you could have done better, I sure hope you check the fuel quality on your generator and hide it (and the rest of your ISP) in a flood-proof, well-ventilated bomb shelter. I hear the government has an installation that might meet your standards somewhere under Cheyenne Mountain....

-Z

There's another thing that hasn't been mentioned during this thread.

Suppose BBN did ALL of the necessary upgrades to prevent this sort of outage. Not only at this site, but at all of their sites.

Now they come to you, their customer, and say "Well, we've made all of these improvements, but we now have to raise your rates by 50% to cover the costs."

How many of you would simply change providers?

An editorial in Unix Review (October issue, I believe) talked about some outages that their site had suffered and mentioned other outages. Then the writer asked the question "Who's at fault?". His answer was that the consumer was at fault, because the consumer is unwilling to pay the rates necessary to pay for the level of service that they demand.

We all scream for fixes when these outages occur. How much are we (and our customers) willing to pay for them?
-- David.Schmidt@on-ramp.ior.com Internet On-Ramp, Inc. (509)624-RAMP (7267) Spokane, Washington http://www.ior.com/ (509)323-0116 (fax)
On Sun, 13 Oct 1996, David J. Schmidt wrote:
There's another thing that hasn't been mentioned during this thread.
Suppose BBN did ALL of the necessary upgrades to prevent this sort of outage. Not only at this site, but at all of their sites.
Great.
Now they come to you, their customer and say "Well, we've made all of these improvements, but we now have to raise your rates by 50% to cover the costs."
No way, most of the things mentioned that need to be done before you build a POP are free. Things like generators are not a lot of money, and a nice little manual bypass switch is about 2K. Things can be done. I don't think all of us should build POPs 100 feet under the ground, but there are things we can do that can help.
How many of you would simply change providers?
An editorial in Unix Review (October issue I believe) talked about some outages that their site had suffered and mentioned other outages. Then the writer asked the question "Who's at fault?". His answer was that the consumer was at fault, because the consumer is unwilling to pay the rates necessary to pay for the level of service that they demand.
No way, this stuff is not that much. A generator costs less than a router, and most of the things you can do to help are free.
We all scream for fixes when these outages occur. How much are we (and our customers) willing to pay for them?
Look, I don't think people are asking for 100% uptime, but 99.999 is a valid goal, and can be reached. I had to learn the hard way that a generator at every POP is something that saves you money.

Nathan Stratton CEO, NetRail, Inc. Tracking the future today!
---------------------------------------------------------------------------
Phone   (703)524-4800                   NetRail, Inc.
Fax     (703)534-5033                   2007 N. 15 St. Suite 5
Email   sales@netrail.net               Arlington, Va. 22201
WWW     http://www.netrail.net/         Access: (703) 524-4802 guest
---------------------------------------------------------------------------
"Therefore do not worry about tomorrow, for tomorrow will worry about itself. Each day has enough trouble of its own." Matthew 6:34
In message <Pine.LNX.3.95.961013231754.15595A-100000@netrail.net>, Nathan Stratton writes:
We all scream for fixes when these outages occur. How much are we (and our customers) willing to pay for them?
Look, I don't think people are asking for 100% uptime, but 99.999 is a valid goal, and can be reached. I had to learn the hard way that a generator at every POP is something that saves you money.
Sorry, wrong answer. Five 9s is less than 6 minutes of downtime/year. A large amount of the telco gear is only rated at six nines. (Six nines is less than 1 minute of downtime/year.) That's not even getting into the issue of downtime v. availability. With the delays in BGP processing commonly seen today (on the order of 15 minutes), one prefix flapping would put that prefix at less than four nines (11 minutes). Three nines (just under two hours) would be a good goal for planned availability.

Then of course, there is the issue of maintenance. I understand telephone switch software designers have the problem that all switch upgrades have to be done with the switch running. There is a whole programming group at some companies that searches for extra bits in structures to add new features, because they can't reboot the things to change the size of data structures... That just ain't gonna happen when your favorite router vendor's best guess for fixing some problems is to reload the router.... (or you have to install new firmware, and do a reload to get it going...)

Not all routers may be this way, but for any deployed networks today, three nines is going to be an upper limit.

---
Jeremy Porter, Freeside Communications, Inc. jerry@fc.net
PO BOX 80315 Austin, Tx 78708 | 1-800-968-8750 | 512-458-9810
http://www.fc.net
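Several different downtime figures for "N nines" have been quoted in this thread, so a quick cross-check may help. The sketch below is not from any of the posters; the function name `downtime_minutes_per_year` is hypothetical, and it assumes a 365-day year and the usual reading of "N nines" as an unavailability of 10^-N measured over the whole year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600; leap days are ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year, in minutes, for an availability of
    N nines (for example, nines=5 means 99.999%)."""
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 7):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes:10.3f} min/year ({minutes / 60:8.3f} h)")
```

Under this definition three nines allows about 8.76 hours per year, four nines about 52.6 minutes, five nines about 5.26 minutes, and six nines about 31.5 seconds.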
On Sat, 12 Oct 1996, Zachary DeAquila wrote:
Sure, Mike, but how do you protect against an airplane falling out of the sky? or having the building that houses your generators flattened by a runaway semi? Or the ever present possibility that the building next door will have a gas leak and explode? And what about that house-sized meteor that could come hurtling down?
You should not be working in a production, 24x7 environment.
Give me a break. Hindsight is 20/20, it's easy to see how things could have been avoided, but excessive paranoia can and does get in the way of getting real work done. Any engineer worth his salt will tell you that 100% reliability is unattainable - IMHO, these days with the technology we work with daily as young as it is, I'm impressed with 90% uptime...
Any Operations Manager should curb any engineer that does not understand or display a commitment to 24x7 uptime.
For all the effort you put into saying how you could have done better, I sure hope you check the fuel quality on your generator and hide it (and the rest of your ISP) in a flood-proof well ventilated bomb shelter.
I don't expect a small or medium sized ISP to take such precautions. I do expect a BBN, ANS, Sprint, MCI, UUNET *to take such precautions*.
I hear the government has an installation that might meet your standards somewhere under Cheyenne Mountain....
You're off.. Take a visit to any wireless office, PCS or Cellular, for an understanding of how equipment quarters are built to withstand disaster situations.. Many of these companies are smaller endeavors than the larger ISPs and IAPs and have dealt with comparable growth periods. If they can do it, so can people like BBN.

Regards
pjc
On Sat, 12 Oct 1996, Mike O'Dell wrote: Greetings..
Even when you have generators, there are still a myriad of things which can and do go wrong no matter how hard you try, even with trebly redundant backups, etc. etc.
A fuel supplier accidentally puts half a load of jet fuel instead of #2 diesel in your storage tank, which you won't find until it reloads the run tank from the day tank.
Run the generator under load periodically. Have your gen maintenance contractor do fuel tests quarterly. Water is just as bad in the Diesel tank.
Floods happen and the water rises higher than the 3-story snorkel on the diesels.
30' flood waters.. That's a bad, bad place for a network hub.
A generator takes a direct lightning strike and fries house DC power (even inside shielded enclosures).
Not so.. A properly grounded building with an external ground halo should be able to take a direct hit. Take a look inside a cellular telephone site located in the lightning belt for a firsthand understanding.
A fiber transmission system goes crazy when the control system is zapped by the lightning strike on the generator.
See above..
A 200mph hurricane gust rips a microwave system off the roof (tower and all) and throws it down on the generators, crushing the exhaust system, and the diesels strangle. (No, I'm not imagining these.)
Better to put the gensets inside the building.
We all try very, very hard to make things reliable, but the world isn't perfect, nor are any of us. BBN will probably find some things to improve and change the odds next time. Then it will be someone else's turn to catch the javelins.
By indications apparent, BBN didn't have much in the way of disaster planning in mind.
I suggest that neither gleeful hand-rubbing nor "obvious" pronouncements based on partial knowledge about the real situation will seem quite so appropriate when it's *your* turn to be downfield from the launcher.
I presently manage a very large cellular network located in a hurricane belt on an isolated power grid. One of our sidelines is equipment co-location for ISPs. Before this job, I managed the cellular switches for one of the Los Angeles carriers (Riots and Earthquakes). There are many, many steps to accept and follow that will allow you to maintain service in the worst situations. They're sometimes costly, yet common among the carriers that are in service when the others are not. I see system outage reports weekly from other markets in my company. Unfortunately, it is all too common to see that they ignored battery and generator maintenance and rarely, if ever, exercised their backup systems under load.

Regards
Patrick J. Chicas
Email: pjc@unix.off-road.com
URL: http://www.Off-Road.com
--------------------------------
The Off-Road Center of The 'Net!
participants (8)
- Curtis Villamizar
- David J. Schmidt
- Jeremy Porter
- Michael Dillon
- mo@UU.NET
- Nathan Stratton
- Patrick J. Chicas
- Zachary DeAquila