Re: San Francisco Power Outage
Seth wrote:
Jonathan Lassoff wrote:
Just a heads up to anyone on list that PG&E has just sustained a large outage in San Francisco that has caused a few hiccups (network, electrical, infrastructural, etc.) around the city.
I've confirmed that customers in 365 Main and parts of telecom 1 have both sustained brief blackouts. No word yet from 200 Paul.
Anyone in the area that could use a hand with anything, I'll probably be wrapping up fixes for my stuff soon, and would be glad to help however I can.
I have a question: does anyone seriously accept "oh, power trouble" as a reason your servers went offline? Where's the generators? UPS? Testing said combination of UPS and generators? What if it was important? I honestly find it hard to believe anyone runs a facility like that and people actually *pay* for it.
If you do accept this is a good reason for failure, why?
Unfortunate real-world lesson: there is a functional difference between pushing the UPS test cutover button and some of the stuff that can happen out on the power lines (including rapid voltage swings, harmonics, etc).

I know 365 Main has the equipment and tests it; I've been standing outside when the generators spool up.

I've had generator firmware upgrades generate reporting info on the serial uplink that flipped the UPSes into permanent error state until the Liebert guys got off the plane with the replacement mainboard.

I've had grid voltage fluctuations that toasted VSDs in chillers.

I watched a building's electrical service go "pop" when a transformer blew and ran 10 kV into the 220 V mains for a fraction of a second as it arced.

I was at home but called in after a 5 MW generator popped under a sufficiently badly harmonic UPS and PDU load of only about 2.4 MW.

I had a client who forgot to wire the A/C into the UPS, and nearly melted a whole server room.

And the stories that the power guy I'm working with tells about foreign facilities, particularly in middle east war zones, are really scary...

We fundamentally do not have the facilities problem completely nailed down to the point that things will never drop. Level 4 datacenters can, and will, fail. Nothing you can do including just doing 48V DC for everything are truly foolproof solutions.

-george william herbert
gherbert@retro.com
And the stories that the power guy I'm working with tells about foreign facilities, particularly in middle east war zones, are really scary...
We fundamentally do not have the facilities problem completely nailed down to the point that things will never drop. Level 4 datacenters can, and will, fail. Nothing you can do including just doing 48V DC for everything are truly foolproof solutions.
A single level 4 datacenter is a Single Point of Failure! Two of those middle-eastern style facilities is... ?

Has anyone actually kept track of all these data center failures over the years and done some statistical analysis on it? Maybe two half-baked data centers is better than one over the long run?

Remember that one 10-12 years ago in (Palo Alto, Mountain View?) where a lady in a car caused a backhoe driver to move out of the way, which resulted in him cutting a gas line, which resulted in the fire department evacuating the data center, cutting off electricity in the area, and forbidding the diesel generators to be switched on?

--Michael Dillon
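As a rough way to frame the "two half-baked data centers versus one good one" question, here is a minimal sketch of the usual back-of-envelope availability arithmetic. Everything in it is an assumption for illustration: the availability figures are made up, not measured from any facility in this thread, the function names are hypothetical, and the math assumes the sites fail independently. The gas-line story above is exactly the kind of event that breaks that assumption when both sites sit in the same area.

```python
# Back-of-envelope availability comparison.
# ASSUMPTION: sites fail independently of each other. A shared utility
# feed, gas main, or evacuation order would violate this.

HOURS_PER_YEAR = 8760


def combined_availability(site_availabilities):
    """Probability that at least one site is up, given per-site availabilities."""
    p_all_down = 1.0
    for availability in site_availabilities:
        p_all_down *= (1.0 - availability)
    return 1.0 - p_all_down


def downtime_hours(availability):
    """Expected hours per year with no service at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# HYPOTHETICAL figures, chosen only to illustrate the arithmetic.
single_good_site = 0.9999
half_baked_site = 0.99

pair = combined_availability([half_baked_site, half_baked_site])

print(f"one good site:       {downtime_hours(single_good_site):.2f} h/yr down")
print(f"one half-baked site: {downtime_hours(half_baked_site):.2f} h/yr down")
print(f"two half-baked:      {downtime_hours(pair):.2f} h/yr down")
```

Under these made-up numbers, two independent 99% sites come out at roughly 99.99% combined, the same as the single good site. Any correlation between the two sites, or any failover that does not work when called on, pushes the pair's real figure below that.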
On 7/25/07, michael.dillon@bt.com <michael.dillon@bt.com> wrote:
... fire department evacuating the data center, cutting off electricity in the area, and forbidding the diesel generators to be switched on?
I know a guy who was at the US Data Centers Inc facility in Marlborough, MA (before USDCI failed). Soon after they first opened it up, they had a fire. The problem was the fire was *in* the giant APC/Silicon system they had. They had to kill the APC, and that took the load down too.

So they installed an external transfer switch, rather than depending on the one built-in to the APC system. There was some SNAFU with the wiring, so right after the install, there was an electrical fire -- this time in the external transfer switch panel.

While I suspect poor planning/testing contributed to their woes, it still goes to show: Some days you're the windshield, and some days you're the bug.

-- Ben
Speaking on Deep Background, the Press Secretary whispered:
Level 4 datacenters can, and will, fail. Nothing you can do including just doing 48V DC for everything are truly foolproof solutions.
Hard to find anyone who takes the -48vdc mantra to heart more than an RBOC. Ditto on lightning protection. Yet I recall the Bell South 305-255 CO taking a lightning hit on the incoming power; the 5ESS was down for 3-4 hours.

--
A host is a host from coast to coast.................wb8foz@nrk.com
& no one will talk to a host that's close........[v].(301) 56-LINUX
Unless the host (that isn't close).........................pob 1433
is busy, hung or dead....................................20915-1433
participants (4)
- Ben Scott
- David Lesher
- George William Herbert
- michael.dillon@bt.com