
2001-01-08-17:35:49 Sean Donelan:
[...] You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
For a server farm, with potentially thousands of individual systems, is there any standard piece of software you can install on all of the systems to act as a receiver of a signal to begin a graceful shutdown that does not depend on a vendor's proprietary interface? Preferably one which does not involve running a lot of additional wires.
I've got my own preference. When running even mere dozens of machines in a tightly coordinated farm, I want the ability to manage them all very quickly and easily, so I use a script I wrote for parallel execution of a command. It takes a command line and operates on it with controlled parallelism; it's available at <URL:http://people.oven.com/bet/multicmd>.

I'll install that on an admin server, which will be a very, very tightly secured machine indeed, since the account I use on that machine will have an ssh key, with no passphrase, that's accepted for running root commands on every machine in the farm.

Given that setup, the answer to your question requires only a pre-built list of the hostnames or IP addresses of the machines to halt, at which point it's something along the rough lines of

    multicmd ssh \$1 'sh -c "sleep 10;halt" >&- 2>&- <&- &' \
        <hostlist

or thereabouts; after it's tested I'd save this, and any other useful invocations, in scripts so I don't have to remember 'em.

For thousands of machines, the default options (10-at-a-time parallel, 1-second delay between launches) wouldn't be quick; for an emergency halt program, I'd probably up the parallelism to whatever my local system could handle well, and drop the inter-command delay to maybe 0.01 sec. And of course, for platforms with software-controllable power switches, the "halt" could be replaced with an invocation that would power the boxes down.

-Bennett
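
For readers who want to see the shape of the idea, here is a minimal Python sketch of controlled-parallel execution over a host list. It is not multicmd itself, and it makes no claim about how that script works internally. The names `build_command` and `run_parallel` and the `runner` hook are hypothetical, introduced only for illustration. The defaults of 10 workers and a 1-second delay mirror the defaults mentioned in the post.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_command(template, host):
    """Replace every literal "$1" in the argv template with the host name."""
    return [arg.replace("$1", host) for arg in template]


def _default_runner(argv):
    """Run argv with stdin/stdout/stderr detached; return its exit code."""
    try:
        return subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        ).returncode
    except OSError:
        # Command could not be started at all (e.g. ssh is not installed).
        return 127


def run_parallel(template, hosts, parallelism=10, delay=1.0, runner=None):
    """Run one command per host, at most `parallelism` at a time.

    `delay` is the pause between submitting successive commands to the
    pool; when the pool is full, a command may start later than that.
    Returns a list of (host, exit_code) pairs in host-list order.
    """
    if runner is None:
        runner = _default_runner
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = []
        for index, host in enumerate(hosts):
            if index and delay > 0:
                time.sleep(delay)
            argv = build_command(template, host)
            futures.append((host, pool.submit(runner, argv)))
        # Collect exit codes; this blocks until every command has finished.
        return [(host, future.result()) for host, future in futures]
```

Under these assumptions, an emergency halt would pass a template such as `["ssh", "$1", 'sh -c "sleep 10;halt" >&- 2>&- <&- &']`, with the hosts read from the host list, a large `parallelism`, and `delay=0.01`. Any host that comes back with a nonzero exit code is one to check by hand.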