On 8 Jan 2001, Sean Donelan wrote:
Is there any consistency among network operators how they operate their networks when they know a possibility of imminent failure exists?
1. Do you attempt to preserve service as long as possible, including running equipment to the point of destruction?
To extend run time as long as you can: 1) power down as much hardware as possible, 2) pull all redundant cards, 3) pull fan trays.
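As a rough illustration of why shedding load buys time, here is a minimal sketch that treats the battery plant as an ideal energy store, so run time is simply stored energy divided by load. The capacity, the load names, and the wattages are made-up numbers for illustration only, and real batteries deliver less than this at high discharge rates.

```python
# Idealized model (assumption): run time = usable battery energy / constant load.
# All figures below are hypothetical, not taken from any real site.

def runtime_hours(battery_wh: float, load_w: float) -> float:
    """Return the idealized run time in hours for a constant load."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    return battery_wh / load_w


def shed(loads: dict, names: set) -> dict:
    """Return the loads that remain after powering down the named items."""
    return {name: watts for name, watts in loads.items() if name not in names}


BATTERY_WH = 48_000  # hypothetical usable capacity in watt-hours

loads = {
    "core routers": 4_000,        # watts, must stay up
    "redundant cards": 1_500,     # step 2: pull redundant cards
    "fan trays": 500,             # step 3: pull fan trays (risks overheating)
    "non-essential gear": 6_000,  # step 1: power down what you can
}

before = runtime_hours(BATTERY_WH, sum(loads.values()))
remaining = shed(loads, {"non-essential gear", "redundant cards", "fan trays"})
after = runtime_hours(BATTERY_WH, sum(remaining.values()))

print(f"{before:.1f} h before shedding, {after:.1f} h after")
```

With these invented numbers the run time goes from 4 hours to 12. The exact figures do not matter; the point is that run time scales inversely with the load you leave connected.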
2. Do you attempt to minimize recovery time by shutting down equipment to a "safe" condition before failure?
Depends on the outage. If you think you can make it through, then you don't. Things like pulling fan trays can give you a lot more run time, but may damage hardware, so you need to watch it. If it looks like you may make it, you may want to override the low voltage disconnects on your DC systems. It may toast your batteries, but if it will get you through an outage it may be worth it.
If you are running a database/transaction oriented system, I would expect you want to put the database into a stable condition. On the other hand, if you are operating mostly communication equipment, you would want to leave it operating as long as possible.
What I like to do is shut down the redundant database so you know you have something to fall back on. You then run the other database into the ground.
I'm aware of a variety of proprietary software shutdown programs associated with UPS vendors. But I'm wondering do any "open standards" exist for initiating soft shutdowns?
It very much depends on what you are doing. I like having control over what I kill in my network. Of course the best plan is to never let the above happen, but I don't care how redundant your system is, if you have been in this business a long time you will reach a crash event. Knowing how to deal with it can hold off the failure for a long time.
Nathan Stratton, CTO, Exario Networks, Inc.
nathan@robotics.net  nathan@exario.net
http://www.robotics.net  http://www.exario.net