On Mon, 08 January 2001, Henry Yen wrote:
And if you are running a late-model Linux (preferably RedHat), you can download APC's own "award-winning PowerChute Plus" software for Linux from their website. It seems to be identical to PowerChute Plus running on any other platform, except that the interface is through X86.
And what if you are not using APCs?

One issue with highly redundant data centers is that the failure modes are "interesting." You don't want to shut down due to a single UPS failure, so you don't use something simple like PowerChute Plus. You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.

For a server farm, with potentially thousands of individual systems, is there any standard piece of software you can install on all of the systems to act as a receiver of a signal to begin a graceful shutdown that does not depend on a vendor's proprietary interface? Preferably one which does not involve running a lot of additional wires.

I know, everyone says their systems will never fail. Think of this as the "else" statement for the condition which will never happen.

Again, this is only needed if people want a graceful shutdown. If you can live with a hard shutdown, you wouldn't require this. If you use ctrl-alt-del as a normal management practice, I suspect you don't really require a graceful shutdown.
Unnamed Administration sources reported that Sean Donelan said:
And what if you are not using APCs?
See the menu of systems listed at: http://www.exploits.org/nut/
One issue with highly redundant data centers is that the failure modes are "interesting." You don't want to shut down due to a single UPS failure, so you don't use something simple like PowerChute Plus. You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
For a server farm, with potentially thousands of individual systems, is there any standard piece of software you can install on all of the systems to act as a receiver of a signal to begin a graceful shutdown that does not depend on a vendor's proprietary interface? Preferably one which does not involve running a lot of additional wires.
Good point; you'll likely need a box just to talk to the UPSi and control shutdowns. That, alas, adds a single point of failure.
Again, this is only needed if people want a graceful shutdown. If you can live with a hard shutdown, you wouldn't require this. If you use ctrl-alt-del as a normal management practice, I suspect you don't really require a graceful shutdown.
You really don't want to run all the UPS batteries flat. It will lengthen the recovery time. (If graceful shutdown is your goal: when power is restored, you want the UPS to FIRST recharge enough that it can again shut everything down gracefully if the power turns out to be back for just a minute or two. Thus you delay restarting the load.)

--
A host is a host from coast to coast.................wb8foz@nrk.com
& no one will talk to a host that's close........[v].(301) 56-LINUX
Unless the host (that isn't close).........................pob 1433
is busy, hung or dead....................................20915-1433
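The "recharge FIRST" rule above can be expressed as a simple restart gate. This is a minimal sketch under stated assumptions: the UPS reports charge as a percentage, runtime scales roughly linearly with charge, and the safety margin and the two-minute stability wait are hypothetical values, not anything a particular UPS exposes.

```python
# Minimal sketch: hold off restarting the load until the battery could
# cover one more graceful shutdown.
# Hypothetical assumptions: charge is reported in percent, and runtime
# scales linearly with charge (real batteries only approximate this).


def required_charge(shutdown_minutes: float, full_runtime_minutes: float,
                    margin: float = 1.5) -> float:
    """Percent charge needed to ride through one more graceful shutdown.

    shutdown_minutes: how long a full graceful shutdown of the load takes.
    full_runtime_minutes: runtime of a fully charged battery at this load.
    margin: hypothetical safety factor for battery aging and estimate error.
    """
    pct = 100.0 * shutdown_minutes / full_runtime_minutes * margin
    return min(pct, 100.0)


def ok_to_restart(charge_pct: float, needed_pct: float,
                  mains_up_seconds: float,
                  min_stable_seconds: float = 120.0) -> bool:
    """True once the battery can cover another graceful shutdown AND mains
    has stayed up for a while (min_stable_seconds is a hypothetical value)."""
    return charge_pct >= needed_pct and mains_up_seconds >= min_stable_seconds


if __name__ == "__main__":
    need = required_charge(shutdown_minutes=6, full_runtime_minutes=30)
    print(need)                            # 30.0
    print(ok_to_restart(25.0, need, 300))  # False: not recharged enough yet
    print(ok_to_restart(35.0, need, 300))  # True
```

The point of the gate is ordering: if power drops again a minute after it returns, the load is either still off or there is enough battery left to shut it down cleanly a second time.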
On Mon, Jan 08, 2001 at 02:35:49PM -0800, Sean Donelan put this into my mailbox:
One issue with highly redundant data centers is that the failure modes are "interesting." You don't want to shut down due to a single UPS failure, so you don't use something simple like PowerChute Plus. You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
It should be technically fairly easy to set up an 'Emergency Graceful Shutdown' button to live next to the EPO button. It controls a line that runs through the data center and activates either one relay per system or one optoisolator per system, depending on your fancy, which can raise or lower a particular serial line (DSR, CTS, whatever). You then install a daemon (again, fairly simple) that listens to this serial line; when it detects a change, it executes a graceful shutdown on that system.

If you wanted to get fancy, pushing the "EGS" button could send a series of pulses that the daemon would have to interpret; this way you guard against odd line noise or loose connections triggering the shutdown. You could even set up a 'Cancel EGS' signal as well. You could then interface this to more stuff, like the "You'll be out of diesel for your generator in 120 seconds" alert, etc. etc.

Then again, my sprinklers water my lawn via a cron job, so I might just be Different.

-dalvenjah

--
Dalvenjah FoxFire (aka Sven Nielsen)
Founder, the DALnet IRC Network
e-mail: dalvenjah@dal.net
WWW: http://www.dal.net/~dalvenjah/
whois: SN90
"If her breath were as terrible as her terminations, there were no living near her; she would infect to the North Star!"
Try DALnet! http://www.dal.net/
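The pulse-train idea can be sketched as a decoder that turns measured pulse widths into a command and rejects anything that doesn't match cleanly. Everything here is an assumption for illustration: the widths, the tolerance, and the "SSS"/"LL" patterns are hypothetical, and measuring the widths in the first place (for example by polling the modem-status bits, which on Linux can be read with the TIOCMGET ioctl) is left out.

```python
# Sketch: decode pulse trains seen on a serial status line (DSR, CTS, ...).
# Hypothetical assumption: the EGS button box encodes commands in pulse
# *widths*; the nominal widths, tolerance and patterns are made up.
from typing import List, Optional, Sequence

SHORT = 0.5       # seconds, hypothetical nominal short pulse
LONG = 2.0        # seconds, hypothetical nominal long pulse
TOLERANCE = 0.15  # accepted deviation from the nominal width, in seconds

# Hypothetical command table: symbol string -> action.
COMMANDS = {
    "SSS": "shutdown",  # Emergency Graceful Shutdown
    "LL": "cancel",     # Cancel EGS
}


def symbol_for(width: float) -> Optional[str]:
    """Map one pulse width to 'S' or 'L'; None means it matches neither."""
    if abs(width - SHORT) <= TOLERANCE:
        return "S"
    if abs(width - LONG) <= TOLERANCE:
        return "L"
    return None


def decode(widths: Sequence[float]) -> Optional[str]:
    """Return the command for a complete pulse train, or None.

    Any pulse that is neither short nor long (a glitch, a loose connector)
    invalidates the whole train instead of being skipped, so noise can
    only ever cause a missed command, never a spurious shutdown.
    """
    symbols: List[str] = []
    for w in widths:
        s = symbol_for(w)
        if s is None:
            return None
        symbols.append(s)
    return COMMANDS.get("".join(symbols))


if __name__ == "__main__":
    print(decode([0.5, 0.48, 0.55]))  # shutdown
    print(decode([2.1, 1.9]))         # cancel
    print(decode([0.5, 0.02, 0.5]))   # None: a 20 ms glitch spoils the train
```

Failing closed on any unrecognized pulse is the design choice that buys the noise immunity: a loose connection can only ever produce "no command."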
One issue with highly redundant data centers is that the failure modes are "interesting." You don't want to shut down due to a single UPS failure, so you don't use something simple like PowerChute Plus. You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
The old Deltec stuff was good about this. They had it so that a server daemon would notify different groups at different stages:

    Power lost   -> notify group A (Printers, PCs)
    Low battery  -> notify group B (Secondary servers)
    Dead battery -> notify group C (Primary servers, comms)

They also had different outlets on different "groups," so if a device wasn't able to understand the network alert (the routers and firewalls don't have agents), it could be terminated as part of a group.

Deltec got bought by somebody and I'm sure a lot of this stuff has changed since I last looked at it, but it was a good design.

--
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
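The staged scheme can be sketched as an ordered list of stages, each adding one group. The event names, the inventory, and the choice to re-notify earlier groups at every later stage are assumptions for illustration; this is not Deltec's actual software.

```python
# Sketch of staged shutdown notification (not Deltec's implementation).
from typing import Dict, List

# Escalation stages, in order.
STAGES: List[str] = ["power_lost", "low_battery", "dead_battery"]

# Which group is told to shut down at each stage.
GROUP_AT_STAGE: Dict[str, str] = {
    "power_lost": "A",    # printers, PCs
    "low_battery": "B",   # secondary servers
    "dead_battery": "C",  # primary servers, comms
}


def groups_to_notify(event: str) -> List[str]:
    """Groups that should be shutting down once `event` has occurred.

    Assumption: stages are cumulative. Reaching low_battery implies power
    was already lost, so group A is included again in case it missed the
    first notice.
    """
    if event not in GROUP_AT_STAGE:
        raise ValueError(f"unknown event: {event!r}")
    reached = STAGES[: STAGES.index(event) + 1]
    return [GROUP_AT_STAGE[stage] for stage in reached]


def hosts_to_notify(event: str, inventory: Dict[str, List[str]]) -> List[str]:
    """Expand groups into host names using a (hypothetical) inventory."""
    return [h for g in groups_to_notify(event) for h in inventory.get(g, [])]


if __name__ == "__main__":
    # Hypothetical inventory: group -> hosts running a shutdown agent.
    inventory = {"A": ["printer1", "pc7"], "B": ["db-replica"], "C": ["db-primary"]}
    print(hosts_to_notify("low_battery", inventory))
    # ['printer1', 'pc7', 'db-replica']
```

Agentless devices such as routers and firewalls would not appear in the inventory at all; in the scheme described above they sit on switched outlet groups that are cut at the corresponding stage instead.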
* Sean Donelan <sean@donelan.com> [20010108 15:05]:
And what if you are not using APCs?
But still standalone UPSes? Don't most data centers have larger UPS(es) or battery plants (say, two) feeding the entire facility? The ones I've worked in have (well, not *all* of them, but those exceptions had much bigger issues than worrying about how they were going to shut down all of the boxes at once). And if you aren't using standalone UPSes, why do you care what the interface to the BigUPS(tm) is, as long as you can get one of your network monitoring servers to talk to it (and reliably)? None of your servers in the server farm are going to be talking to your BigUPS(tm) directly anyway.
One issue with highly redundant data centers is that the failure modes are "interesting." You don't want to shut down due to a single UPS failure, so you don't use something simple like PowerChute Plus. You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
Agreed. And in this case, the UPS has no involvement. If the operator wants the servers shut down, the operator shuts the servers down. No UPS involved (OK, well, not literally). I realize this doesn't address your entire point... one sec, I'll get to that.
For a server farm, with potentially thousands of individual systems, is there any standard piece of software you can install on all of the systems to act as a receiver of a signal to begin a graceful shutdown that does not depend on a vendor's proprietary interface? Preferably one which does not involve running a lot of additional wires.
Sure, ssh/rsh[1]. :-)

What vendor's proprietary interface, the OS vendor of the servers? The UPSes don't have anything to do with the shutdown process if the operator is the one making the call. To accomplish that, it's a simple matter of scripting a bunch of:

    ssh webserver01 'shutdown -h now Power-Go-Bye-Bye'

Of course, if you have unmanaged boxes (e.g. customer boxes you do not have root access to) within the same data center, and you want to do the same for those, that's a whole other story... Oh, hmm, and Windows. Well, remote command execution is possible there too, from my understanding.

At that point, once all servers have been gracefully shut down, you can just shut the UPS(es) off if your intent is to eventually cut any and all power to the facility.

Or did I completely miss your point?
Again, this is only needed if people want a graceful shutdown. If you can live with a hard shutdown, you wouldn't require this. If you use ctrl-alt-del as a normal management practice, I suspect you don't really require a graceful shutdown.
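"Scripting a bunch of" those ssh calls might look like the following minimal Python sketch. It assumes key-based root ssh access to every host and a hypothetical host list; `shutdown_argv` and `shut_down_all` are illustrative names, and `dry_run` defaults to True so that running it by accident only prints the commands.

```python
import shlex
import subprocess
from typing import Iterable, List

MESSAGE = "Power-Go-Bye-Bye"  # broadcast message from the example above


def shutdown_argv(host: str, message: str = MESSAGE) -> List[str]:
    """Build the ssh argv that asks `host` to halt gracefully.

    ssh passes the remote part to the remote shell as one string, so the
    message is shell-quoted in case it contains spaces.
    """
    return ["ssh", host, "shutdown -h now " + shlex.quote(message)]


def shut_down_all(hosts: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run, just print) one shutdown command per host."""
    commands = [shutdown_argv(h) for h in hosts]
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # stdin closed (like `ssh -n`); check=False so one unreachable
            # host does not stop the loop for everyone else.
            subprocess.run(argv, stdin=subprocess.DEVNULL, check=False)
    return commands


if __name__ == "__main__":
    shut_down_all(["webserver01", "webserver02"])
    # ssh webserver01 'shutdown -h now Power-Go-Bye-Bye'
    # ssh webserver02 'shutdown -h now Power-Go-Bye-Bye'
```

This runs the hosts one after another, which is fine for a handful of machines; for thousands, the launches need to overlap.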
I'm being anal, but even ctrl-alt-del is graceful on most modern OSes. The power or reset button, on the other hand... :-)

[1] rsh is mentioned only for historical reasons; please don't use it to manage the remote power capability of your mission-critical server farm located in your highly redundant data center unless you understand why you might consider not doing so. :)

-jr

----
Josh Richards [JTR38/JR539-ARIN] <jrichard@geekresearch.com/cubicle.net/fix.net/freedom.gen.ca.us>
Geek Research LLC - <URL:http://www.geekresearch.com/>
IP Network Engineering and Consulting
2001-01-08-17:35:49 Sean Donelan:
[...] You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
For a server farm, with potentially thousands of individual systems, is there any standard piece of software you can install on all of the systems to act as a receiver of a signal to begin a graceful shutdown that does not depend on a vendor's proprietary interface? Preferably one which does not involve running a lot of additional wires.
I've got my own preference. When running even mere dozens of machines in a tightly coordinated farm, I want the ability to manage them all very quickly and easily, so I use a script I wrote for parallel execution of a command. It takes a command line and operates on it with controlled parallelism; it's available at <URL:http://people.oven.com/bet/multicmd>.

I'll install that on an admin server, which will be a very, very tightly secured machine indeed, since the account I use on that machine will have an ssh key, with no passphrase, that's accepted for running root commands on every machine in the farm.

Given that setup, the answer to your question requires only a pre-built list of the hostnames or IP addrs of the machines to halt, at which point it's something along the rough lines of

    multicmd ssh \$1 'sh -c "sleep 10;halt" >&- 2>&- <&- &' \
        <hostlist

or thereabouts; after it's tested I'd save this, and any other useful invocations, in scripts so I don't have to remember 'em.

For thousands of machines, the default options (10-at-a-time parallel, 1-second delay between launches) wouldn't be quick; for an emergency halt program, I'd probably up the parallelism to whatever my local system could handle well, and drop the inter-command delay to maybe 0.01 sec. And of course, for platforms with software-controllable power switches, the "halt" could be replaced with an invocation that would power the boxes down.

-Bennett
The first thing that comes to mind is a Perl script that, given the correct password/passphrase, can run `ssh -l root [machine] shutdown -h now`. That seems pretty simple to me, assuming you keep the list of all the servers current and they all accept a common RSA auth key or whatnot.

Matthew S. Hallacy
XtraTyme Technologies

On 8 Jan 2001, Sean Donelan wrote:
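multicmd's internals aren't shown here, so the following is not a port of it; it is a minimal Python sketch of the same idea (at most N commands running at once, plus a minimum delay between launches), with hypothetical names `run_parallel`, `halt_argv` and `run_command`. The remote command is the one from the example above; backgrounding the delayed halt with stdin, stdout and stderr closed presumably lets each ssh session return right away instead of waiting on the halt.

```python
import subprocess
import threading
import time
from typing import Callable, Dict, List, Optional, Sequence

# Remote command from the multicmd example: a delayed halt, backgrounded
# with all standard fds closed so the ssh session can exit immediately.
REMOTE_HALT = 'sh -c "sleep 10;halt" >&- 2>&- <&- &'


def halt_argv(host: str) -> List[str]:
    return ["ssh", host, REMOTE_HALT]


def run_command(argv: List[str]) -> int:
    """Run one command with stdin closed; return its exit status."""
    return subprocess.run(argv, stdin=subprocess.DEVNULL, check=False).returncode


def run_parallel(hosts: Sequence[str],
                 make_argv: Callable[[str], List[str]] = halt_argv,
                 parallelism: int = 10,
                 launch_delay: float = 1.0,
                 runner: Callable[[List[str]], int] = run_command,
                 ) -> Dict[str, Optional[int]]:
    """Run make_argv(host) for every host, at most `parallelism` at a time,
    with at least `launch_delay` seconds between successive launches.

    Returns host -> exit status (None if the runner raised).
    """
    slots = threading.BoundedSemaphore(parallelism)
    results: Dict[str, Optional[int]] = {}
    lock = threading.Lock()
    threads = []

    def worker(host: str) -> None:
        status: Optional[int] = None
        try:
            status = runner(make_argv(host))
        except Exception:
            pass  # record None for this host and keep going with the rest
        finally:
            slots.release()  # free the slot even on failure
            with lock:
                results[host] = status

    for i, host in enumerate(hosts):
        slots.acquire()               # block until a slot is free
        if i > 0 and launch_delay > 0:
            time.sleep(launch_delay)  # space out successive launches
        t = threading.Thread(target=worker, args=(host,))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()
    return results
```

The arithmetic behind "wouldn't be quick": with a 1-second launch delay, spacing alone puts a floor of about N - 1 seconds on N hosts, roughly 17 minutes for 1,000 machines. At 0.01 s that floor drops to about 10 seconds, after which ssh connection time and the parallelism limit dominate.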
On Mon, 08 January 2001, Henry Yen wrote:
And if you are running a late-model Linux (preferably RedHat), you can download APC's own "award-winning PowerChute Plus" software for Linux from their website. It seems to be identical to PowerChute Plus running on any other platform, except that the interface is through X86.
And what if you are not using APCs?
One issue with highly redundant data centers is that the failure modes are "interesting." You don't want to shut down due to a single UPS failure, so you don't use something simple like PowerChute Plus. You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
For a server farm, with potentially thousands of individual systems, is there any standard piece of software you can install on all of the systems to act as a receiver of a signal to begin a graceful shutdown that does not depend on a vendor's proprietary interface? Preferably one which does not involve running a lot of additional wires.
I know, everyone says their systems will never fail. Think of this as the "else" statement for the condition which will never happen.
Again, this is only needed if people want a graceful shutdown. If you can live with a hard shutdown, you wouldn't require this. If you use ctrl-alt-del as a normal management practice, I suspect you don't really require a graceful shutdown.
participants (8)

- Bennett Todd
- bmanning@vacation.karoshi.com
- Dalvenjah FoxFire
- David Lesher
- Eric A. Hall
- Josh Richards
- poptix@sleepybox.poptix.net
- Sean Donelan