Is there any consistency among network operators how they operate their networks when they know a possibility of imminent failure exists?
1. Do you attempt to preserve service as long as possible, including running equipment to the point of destruction?
2. Do you attempt to minimize recovery time by shutting down equipment to a "safe" condition before failure?
If you are running a database/transaction oriented system, I would expect you want to put the database into a stable condition. On the other hand, if you are operating mostly communication equipment, you would want to leave it operating as long as possible.
I'm aware of a variety of proprietary software shutdown programs associated with UPS vendors. But I'm wondering do any "open standards" exist for initiating soft shutdowns?
On 8 Jan 2001, Sean Donelan wrote:
Is there any consistency among network operators how they operate their networks when they know a possibility of imminent failure exists?
1. Do you attempt to preserve service as long as possible, including running equipment to the point of destruction?
To extend as long as you can:
1) Power down as much hardware as possible
2) Pull all redundant cards
3) Pull fan trays
2. Do you attempt to minimize recovery time by shutting down equipment to a "safe" condition before failure?
Depends on the outage; if you think you can make it, then you don't. Things like pulling fan trays can give you a lot more run time, but may damage hardware, so you need to watch it. If it looks like you may make it, you may want to override the low voltage disconnects on your DC systems. It may toast your batteries, but if it will get you through an outage it may be worth it.
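To put rough numbers on the load-shedding trade-off described above, here is a back-of-the-envelope sketch. It assumes an idealized battery whose usable energy does not depend on the discharge rate (real batteries deliver less at high loads), and every name and figure in it is made up for illustration rather than taken from any real plant.

```python
def runtime_hours(usable_wh, load_w):
    """Idealized runtime: usable energy divided by a constant load."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    return usable_wh / load_w


def shed(loads, to_drop):
    """Return the loads that remain after powering down the named items."""
    return {name: watts for name, watts in loads.items() if name not in to_drop}


# Hypothetical plant: all numbers are illustrative, not measurements.
USABLE_WH = 48 * 200 * 0.8   # 48 V string, 200 Ah, assume 80% is usable
LOADS_W = {
    "core_router": 1200.0,
    "redundant_cards": 400.0,
    "fan_trays": 300.0,
    "lab_gear": 500.0,
}

before = runtime_hours(USABLE_WH, sum(LOADS_W.values()))
remaining = shed(LOADS_W, {"redundant_cards", "lab_gear"})
after = runtime_hours(USABLE_WH, sum(remaining.values()))
print(f"full load: {before:.1f} h, after shedding: {after:.1f} h")
```

With these made-up numbers, shedding 900 W stretches the estimate from roughly 3.2 hours to roughly 5.1 hours. The sketch deliberately leaves the fan trays in place, since pulling them trades run time against the risk of hardware damage.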
If you are running a database/transaction oriented system, I would expect you want to put the database into a stable condition. On the other hand, if you are operating mostly communication equipment, you would want to leave it operating as long as possible.
What I like to do is shut down the redundant database so you know you have something to fall back on. You then run the other database into the ground.
I'm aware of a variety of proprietary software shutdown programs associated with UPS vendors. But I'm wondering do any "open standards" exist for initiating soft shutdowns?
It very much depends on what you are doing. I like having the control over what I kill in my network. Of course the best plan is to never let the above happen, but I don't care how redundant your system is; if you have been in this business a long time you will reach a crash event. Knowing how to deal with it can extend the event a long time.
Nathan Stratton
CTO, Exario Networks, Inc.
nathan@robotics.net nathan@exario.net
http://www.robotics.net http://www.exario.net
1. When I had a power supply fail in a fileserver about a year ago, I limped it along until my next maintenance window (which happened to be in 24 hours, thank goodness) and replaced it then. It was only a 10 minute downtime for my users, who were very happy because there was no downtime during business hours. Usually this is what I will do. The less downtime I can have outside my maintenance windows, the better.
2. Depends. If there is a chance I'll break something if I don't shut it all down, I will. If there is not a likely chance I'll break it, then great, I'll keep working. If I have to shut down my database server, I'll switch over to the backup and keep working and then do the repairs and bring my backup online.
We've had issues here with power outages and usually the UPSes will hold. The one time they didn't, we went and brought all the machines down gracefully as we didn't have the auto-shutdown installed on the systems.
While I do realize this is describing the "perfect" problem, there will be times when a NIC will fail or someone will cut the fiber, and then you just have to handle it the best way you know how to get the issue resolved, then take a blunt object (like the clue phone) to the person who cut the fiber. ;-)
-Eric
--
Eric Whitehill ericw@xtratyme.com
Network Engineer
XtraTyme Technologies
320.864.8513 http://www.xtratyme.com
On Mon, Jan 08, 2001 at 08:49:17AM -0600, Eric Whitehill wrote:
We've had issues here with power outages and usually the UPSes will hold. The one time they didn't, we went and brought all the machines down gracefully as we didn't have the auto-shutdown installed on the systems.
We don't shut anything down without a management call, unless it's going to fail and break something in the next 15 minutes.
We have a generator, but we have had two amazing coincidences cause it to fail. The first time, the generator was fine, but the switch didn't switch. The person who was signing off (erroneously) that he was checking that switch monthly lost his job shortly before we stopped using his company entirely. We discovered the problem when the batteries reached the point where it was supposed to cut over, and the entire data center went dark. That was a very, very bad day.
The second time, an o-ring blew out, and we dumped so much oil on the ground that we were told that if it'd been a tiny bit more we'd have had to call the EPA. This one gave us enough warning to shut things down, but we had to hustle and a few things were triaged as "let it die, we don't have time."
In general, however, we start planning for a controlled shutdown the minute we know there's a problem, and we attempt to schedule that shutdown for our scheduled weekly outage window if possible. If not, we try to make it after peak processing time for the affected components.
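The scheduling preference described above (weekly window first, otherwise after peak, immediately if failure is minutes away) can be written down as a small decision function. This is only a sketch of that stated policy; the function and parameter names are hypothetical, and the fallback of shutting down right away when neither option fits before the expected failure is an assumption, not something the post spells out.

```python
from datetime import datetime, timedelta

URGENT = timedelta(minutes=15)   # threshold mentioned in the post


def pick_shutdown_time(now, expected_failure, next_window, peak_end):
    """Pick when to start a controlled shutdown.

    All arguments are datetimes. next_window is the start of the next
    scheduled outage window; peak_end is when peak processing ends for
    the affected components.
    """
    if expected_failure - now <= URGENT:
        return now                      # no time to wait: shut down immediately
    if next_window <= expected_failure:
        return next_window              # preferred: use the scheduled window
    if peak_end <= expected_failure:
        return max(now, peak_end)       # otherwise wait until after peak
    return now                          # assumption: nothing fits, so act now


now = datetime(2001, 1, 8, 9, 0)
print(pick_shutdown_time(
    now,
    expected_failure=datetime(2001, 1, 8, 20, 0),
    next_window=datetime(2001, 1, 9, 2, 0),
    peak_end=datetime(2001, 1, 8, 18, 0),
))
```

In the example the weekly window falls after the expected failure, so the function settles on the end of peak processing instead.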
On Mon, 8 Jan 2001, Shawn McMahon wrote:
We have a generator, but we have had two amazing coincidences cause it to fail. The first time, the generator was fine, but the switch didn't switch. The person who was signing off (erroneously) that he was checking that switch monthly lost his job shortly before we stopped using his company entirely. We discovered the problem when the batteries reached the point where it was supposed to cut over, and the entire data center went dark. That was a very, very bad day.
I recall that ominous depressing feeling sitting by myself in a dark data center at 3 in the morning, with nothing but the lights of my equipment, listening to the rectifiers beeping and as the batteries went dead, hearing machines drop off one by one until it was completely dark...
andy
On Mon, Jan 08, 2001 at 09:27:07AM -0600, Andy Walden wrote:
I recall that ominous depressing feeling sitting by myself in a dark data center at 3 in the morning, with nothing but the lights of my equipment, listening to the rectifiers beeping and as the batteries went dead, hearing machines drop off one by one until it was completely dark...
I was on the phone with our Managing Director (that'd be three levels above me) assuring him that everything was reading fine on the monitors, and the generator was running, so everything would cut over any... ...and then three voices all said "oh shit" at the same time.
On 8 Jan 2001, Sean Donelan wrote:
I'm aware of a variety of proprietary software shutdown programs associated with UPS vendors. But I'm wondering do any "open standards" exist for initiating soft shutdowns?
Almost all UPSes on the market twiddle with the DTR and RTS signals on the serial port when a power failure occurs or a battery failure is imminent. You can generally twiddle a pin on the serial cable to shut the UPS off as well. For APC smart UPS devices, people have reverse engineered the protocol to communicate in smart mode and get battery voltages and the like. Do a search for "linux UPS daemons APC" and you should find something of use (since you will probably have to read the source code to figure out the protocol, it helps if you know C).
-Paul
By popular request my signature has moved to <http://198.87.147.226/paulsig.txt>
Paul Timmins paul@timmins.net http://www.timmins.net/
"By definition, if you don't stand up for anything, you stand for nothing." ---Paul Timmins
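For anyone who would rather poke at the serial signalling directly than read daemon source, here is a minimal Linux-only sketch of reading the control lines with the TIOCMGET ioctl. On the host side of a standard serial port, DTR and RTS are outputs, so signals driven by the UPS show up on input lines such as CTS or DCD. Which line means what, and with which polarity, depends on the UPS model and cable; the mapping below is purely an assumption, as are the function names.

```python
import fcntl
import struct
import termios

# ASSUMPTION: this line-to-meaning mapping is hypothetical. Check the
# documentation for your UPS and cable before relying on it.
ON_BATTERY_LINE = termios.TIOCM_CTS
LOW_BATTERY_LINE = termios.TIOCM_CD


def decode_status(bits, on_battery=ON_BATTERY_LINE, low_battery=LOW_BATTERY_LINE):
    """Translate modem-status bits into a coarse UPS state."""
    if bits & on_battery and bits & low_battery:
        return "shutdown-now"
    if bits & on_battery:
        return "on-battery"
    return "on-line"


def read_modem_bits(fd):
    """Read the current modem-status bits from an open serial port (Linux)."""
    buf = fcntl.ioctl(fd, termios.TIOCMGET, struct.pack("I", 0))
    return struct.unpack("I", buf)[0]
```

A monitoring loop would open the port (for example with `os.open` on a device path of your choosing), call `read_modem_bits` every few seconds, and start a soft shutdown when `decode_status` reports `"shutdown-now"`.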
On Mon, Jan 08, 2001 at 09:58:07AM -0500, Paul Timmins wrote:
On 8 Jan 2001, Sean Donelan wrote:
I'm aware of a variety of proprietary software shutdown programs associated with UPS vendors. But I'm wondering do any "open standards" exist for initiating soft shutdowns?
Almost all UPSes on the market twiddle with the DTR and RTS signals on the serial port when a power failure occurs or a battery failure is imminent. You can generally twiddle a pin on the serial cable to shut the UPS off as well. For APC smart UPS devices, people have reverse engineered the protocol to communicate in smart mode and get battery voltages and the like. Do a search for "linux UPS daemons APC" and you should find something of use (since you will probably have to read the source code to figure out the protocol, it helps if you know C)
And if you are running a late-model linux (preferably RedHat), you can download APC's own "award-winning PowerChute Plus" software for linux from their website. It seems to be identical to PowerChute Plus running on any other platform, except that the interface is through X86.
--
Henry Yen Aegis Information Systems, Inc.
Senior Systems Programmer Hicksville, New York
Unnamed Administration sources reported that Sean Donelan said:
I'm aware of a variety of proprietary software shutdown programs associated with UPS vendors. But I'm wondering do any "open standards" exist for initiating soft shutdowns?
NUT: Network UPS Tools <http://www.exploits.org/nut/> has sw for many UPSi.....
--
A host is a host from coast to coast.................wb8foz@nrk.com
& no one will talk to a host that's close........[v].(301) 56-LINUX
Unless the host (that isn't close).........................pob 1433
is busy, hung or dead....................................20915-1433
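As a rough illustration of driving a soft shutdown from Network UPS Tools, the sketch below asks a running `upsd` for the UPS status through the `upsc` client and maps the status flags to an action. It assumes a current NUT install, where the variable is called `ups.status` and the flags `OL`, `OB` and `LB` mean on line, on battery and low battery. The UPS name `myups@localhost` and both function names are hypothetical.

```python
import subprocess


def parse_ups_status(status):
    """Map a NUT ups.status string such as "OB LB" to a coarse action."""
    flags = set(status.split())
    if "OB" in flags and "LB" in flags:
        return "shutdown"       # on battery and nearly empty
    if "OB" in flags:
        return "on-battery"     # running on battery, still has charge
    if "OL" in flags:
        return "ok"             # on line power
    return "unknown"


def query_ups_status(ups="myups@localhost"):
    """Ask a running upsd for ups.status via the upsc client."""
    out = subprocess.run(
        ["upsc", ups, "ups.status"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```

In practice NUT's own monitoring daemon handles the shutdown decision; a script like this is mostly useful when you want your own control over what gets killed and in what order.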
participants (8)
- Andy Walden
- David Lesher
- Eric Whitehill
- Henry Yen
- Nathan Stratton
- Paul Timmins
- Sean Donelan
- Shawn McMahon