[fwd] Rats take down Stanford ...
A follow-up thought on redundancy issues. - paul [snip]
Date: Mon, 21 Oct 1996 12:54:05 -0700 (PDT)
From: risks@csl.sri.com
Subject: RISKS DIGEST 18.54
[snip]
Date: Fri, 18 Oct 96 11:03 EST
From: William Hugh Murray <0003158580@mcimail.com>
Subject: Re: Rats take down Stanford ... (RISKS-18.53)
PGN's request for redundancy brings to mind the story of the infrastructure computer center in Trumbull, Connecticut. It is an old story but bears repeating.
Seems that a squirrel got into a transformer and brought down the external power supply. The UPS kicked in, engine generators came on line, and the center operated in this mode for about an hour and a half. At the end of that time the external power was restored. The external power, the UPS, and the engine generators went into a deadly embrace. The whole thing came down and would not come back up.
I take two lessons from this. First, redundancy adds some complexity and a lot of redundancy adds a lot of complexity. At some point the redundancy begins to introduce failure modes and failure events that would not have existed in its absence. There is an upper bound to such redundancy.
Second, test redundant systems through to resumption of normal operations. In this case, the operators had tested to ensure that the redundant systems would come online in the event of a failure of the primary system. They had not tested to see what would happen when the primary system was restored to normal operation.
Who would have even thought about it? I confess that I would not have.
William Hugh Murray, New Canaan, Connecticut
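The second lesson can be sketched as a test that runs the whole cycle rather than stopping once the backup is carrying the load. This is a toy illustration only: `ToyPowerPlant`, its `retransfer_works` flag, and `full_cycle_test` are hypothetical names invented here, and the model says nothing about how the Trumbull installation actually behaved.

```python
from enum import Enum


class Source(Enum):
    UTILITY = "utility"
    GENERATOR = "generator"
    NONE = "none"


class ToyPowerPlant:
    """Hypothetical model: the load is fed by utility power or a generator."""

    def __init__(self, retransfer_works=True):
        self.feeding = Source.UTILITY
        # Assumption for illustration: the return to utility power may fail.
        self.retransfer_works = retransfer_works

    def utility_fails(self):
        # Failover: the generator picks up the load.
        self.feeding = Source.GENERATOR

    def utility_restored(self):
        # Failback: the load should return to utility power.
        self.feeding = Source.UTILITY if self.retransfer_works else Source.NONE


def full_cycle_test(plant):
    """Drive the plant through outage AND restoration, checking each step."""
    results = []
    plant.utility_fails()
    results.append(("failover", plant.feeding is Source.GENERATOR))
    plant.utility_restored()
    results.append(("failback", plant.feeding is Source.UTILITY))
    return results


if __name__ == "__main__":
    for step, ok in full_cycle_test(ToyPowerPlant(retransfer_works=False)):
        print(step, "ok" if ok else "FAILED")
```

A test that stopped after the failover step would pass on this faulty plant; only the failback step exposes the problem.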
[snip]
On Tue, 22 Oct 1996, Paul Ferguson wrote:
> Second, test redundant systems through to resumption of normal operations. In this case, the operators had tested to ensure that the redundant systems would come online in the event of a failure of the primary system. They had not tested to see what would happen when the primary system was restored to normal operation.
> Who would have even thought about it? I confess that I would not have.
> William Hugh Murray, New Canaan, Connecticut
Greetings,

As a rule, we exercise our generator under load every Thursday at 1000hrs. I would like to do this monthly, but the automatic exercise mechanism cannot deal with 30-day intervals. In addition, once every 6 months, we simulate a power outage by switching the main breaker at the service entrance off and on during a traffic lull.

Cheers,
Patrick J. Chicas
Email: pjc@unix.off-road.com
URL: http://www.Off-Road.com
--------------------------------
The Off-Road Center of The 'Net!
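One way to get a monthly exercise out of a timer that only understands weekly intervals might be to let it fire every Thursday and run the exercise only on the first Thursday of the month. The sketch below assumes that approach; `first_thursday` and `is_monthly_run` are hypothetical helper names, not part of any generator controller.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() counts Monday as 0


def first_thursday(year, month):
    """Return the date of the first Thursday in the given month."""
    first = date(year, month, 1)
    offset = (THURSDAY - first.weekday()) % 7
    return first + timedelta(days=offset)


def is_monthly_run(day):
    """True if a weekly Thursday trigger on `day` should actually run.

    The first Thursday of a month always falls on day 1 through 7.
    """
    return day.weekday() == THURSDAY and day.day <= 7


if __name__ == "__main__":
    # List the 1000hrs exercise dates for one year.
    for month in range(1, 13):
        print(first_thursday(1996, month).isoformat(), "10:00")
```

The gap between runs is then four or five weeks rather than exactly 30 days, which may or may not be acceptable for a given site.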