165 Halsey recurring power issues
Hello,
I wanted to get some feedback as to what is considered standard A/B power setup when data centers sell redundant power. It has always been my understanding that A/B power means individually unique and preferably alternate path connections to disparate UPS units.
A few months ago, 165 Halsey took us down for several hours. They claimed that a UPS failed causing this issue. Our natural reaction was that we have A/B redundant power so a failed UPS on the A circuit should not take down the cabinet. Joe the facility manager claimed that industry standard A/B power means two circuits to the same UPS, which makes no sense to me.
They committed to move us to A/B power with redundant circuits to disparate UPS units. However, we had a multi-hour outage again in that site this weekend. At first glance it seems to be the same problem.
We have checked with all of our other data center providers who have confirmed A/B power is in fact individually unique connections to disparate UPS units. 165 Halsey's definition of what constitutes redundant power seems unique. Why would anyone pay extra for a second connection to the same UPS? However, I wanted to get feedback to see if I am taking crazy pills here 🙂
None-the-less, we have lost all confidence in this facility.
Best Regards,
Babak
165 Halsey (and most of its tenant data centers) is an older facility. Data center practices have changed over the decades, and terminology wasn't standardized until recently.
The biggest FUBAR in telco and data centers is the difference between "redundancy" and "diversity."
Redundant A/B power feeds are often multiple cables from the same power source.
Diverse A/B power feeds are cables from different backup power sources (within limits): 1 utility, 2 battery strings or backup generators. Often routed through the same conduits/cable trays. But both may be out of service for scheduled maintenance and some kinds of faults.
Add a spare A/B power feed. Generally an N+1 backup power source and some additional power switching capability.
Fault tolerant A/B power. Everything from utility to rack is diverse and redundant (cables, conduit/cable trays, switching equipment and backup sources). Maintenance can be performed on one of the power feeds without affecting the other feeds. Does not include redundant utility feed (not redundant substation, utility).
Cost increase 2x, 5x, 10x
I haven't toured 165 Halsey for 10+ years, so I don't know its current state. It has multiple tenant data centers, so some may be better than others.
The building itself got into the action, and their goal was to make a top-notch facility focusing on central patch panel fiber cross connects. They started with half of the 9th floor, originally called MMR-2, and continued with multiple spaces, each bigger, as it was quite successful.
No raised floors, properly positioned chillers, ample power, basic but standard and roomy cabinets, one-time fee per cross connect (plus initial cabling and panel setup OTC), and they have been very successful by all appearances. Staff reflected their initial goals and I have always interacted well with them.
The original MMR, where each xcon was actually pulled space to space, was quite a sight, with multiple cable conduits and trays running from the tops of the cabs to the ceiling, all full. The new space adopted modern approaches and looked it.
Joe
On Mon, Oct 23, 2023 at 10:38:09AM -0400, Babak Pasdar wrote:
I wanted to get some feedback as to what is considered standard A/B power setup when data centers sell redundant power. It has always been my understanding that A/B power means individually unique and preferably alternate path connections to disparate UPS units.
Generally speaking, the definition of A/B has become muddied in recent decades. It has almost become an inaccurate marketing term.
Most sane people have the opinion (myself included) that when "A/B" power is offered, it is at minimum offered as 2N UPS (different building entrance and MSBs and even physically separate UPS rooms are also desired on a true 2N A/B, but may not always be available). Some data center operators go even further and architect load switching within their distribution, thereby preventing single-side/one-leg power outages for customers during most of their power maintenance activities.
Some data center operators treat "A/B" as a convenience for them to undertake maintenance and offload uptime responsibilities to their own customers, and require them to either undertake their own transfer switching and/or dual-cord every piece of equipment, so that they can keep taking one side of the power system down for repeated maintenance. This does not scale well for retail colo, as not every customer is going to be good at maintaining two PSUs for every single piece of equipment.
Some data centers also view "N+1" system deployment at the UPS as an acceptable form of A/B protection, as long as customer circuits are on different PDUs.
Long story short, whether you're receiving N+1 or 2N or 1N, it's important to inquire about how your power circuits will be architected and delivered by the data center, and either have that codified in the contract or reflected appropriately in the SLA offering. There is nothing wrong with the data center providing N+1 or 1N power, as long as they're transparent about it and it is what you're willing to accept for the right terms. However, simply accepting "we are providing you A/B power" or "we've never had primary power failure" is not sufficient to meet proper due diligence during a site selection process, unless you can accept a site outage occurring from time to time, or you're deploying your own power plant (i.e. DC power and batteries) to supplant the data center's own power protection scheme.
James
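To make the 1N / N+1 / 2N labels a little more concrete, here is a minimal sketch of the module arithmetic behind them. It assumes a hypothetical 400 kW critical load and 100 kW UPS modules (neither figure comes from this thread), and the function name is made up for illustration. It only counts modules; it says nothing about whether the two halves of a 2N design are really independent systems, which is the part that has to be verified with the operator.

```python
import math


def ups_modules_required(load_kw: float, module_kw: float, scheme: str) -> int:
    """Return how many UPS modules a redundancy scheme calls for.

    "N" is the number of modules needed just to carry the load.
    "N+1" adds one spare module to the same system.
    "2N" duplicates the full set of N modules.
    """
    n = math.ceil(load_kw / module_kw)
    if scheme == "N":
        return n
    if scheme == "N+1":
        return n + 1
    if scheme == "2N":
        return 2 * n
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    # Hypothetical figures: 400 kW of load served by 100 kW modules.
    for scheme in ("N", "N+1", "2N"):
        print(scheme, ups_modules_required(400, 100, scheme))
```

With these assumed numbers, N+1 needs only one module more than N, while 2N needs twice as many, which is one reason the two are priced so differently.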
Thanks James,
At signup we asked for N+1 power, two circuits to different UPS units. I think they sliced it thin by connecting us to two battery packs on the same UPS. When the UPS controller crashed both battery packs went down. Which now raises the question -- is it reasonable to have to specify and expect that two UPS units means that they do not share any common points of failure?
Is the UPS the battery or the battery and controller combined?
Babak
On Mon, Oct 23, 2023 at 03:31:21PM -0400, Babak Pasdar wrote:
Is the UPS the battery or the battery and controller combined?
"N+1" nominally means you're connected to the same UPS system/complex, but each of your feeds is on a different module. Your other leg will be diverse from a failure in that module, or downstream PDU/panel work fed by that module. A common failure mode in the UPS system itself hosting the different modules can knock out both of your circuits. It sounds like this is how you are configured presently.
"2N" generally means you're connected to a completely different UPS system/complex and corresponding distribution systems for each of your circuits. This is the ideal configuration for most critical loads.
James
On Mon, 23 Oct 2023, James Jun wrote:
"2N" generally means you're connected to a completely different UPS system/complex and corresponding distribution systems for each of your circuits. This is the ideal configuration for most critical loads.
If you are in a single facility, even one with 2N+2 backups, redundancy, diversity, etc., it still has shared fate. Clouds with regions and zones on campuses in Eastern Virginia seem to come up with new and exciting ways to fail :-) https://en.wikipedia.org/wiki/Chaos_engineering
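The shared-fate point can be made concrete by listing what sits upstream of each feed and intersecting the two lists. A minimal sketch, assuming made-up component names and two simplified topologies loosely following the N+1 and 2N descriptions in this thread; it is not a model of 165 Halsey or any real facility:

```python
def shared_failure_points(feed_a, feed_b):
    """Return the components upstream of both feeds, sorted for stable output."""
    return sorted(set(feed_a) & set(feed_b))


# Hypothetical topologies; every component name below is invented for illustration.
N_PLUS_1 = {
    "A": ["utility", "ups-system-1", "ups-system-1/module-1", "pdu-a"],
    "B": ["utility", "ups-system-1", "ups-system-1/module-2", "pdu-b"],
}
TWO_N = {
    "A": ["utility", "ups-system-1", "pdu-a"],
    "B": ["utility", "ups-system-2", "pdu-b"],
}

if __name__ == "__main__":
    print("N+1:", shared_failure_points(N_PLUS_1["A"], N_PLUS_1["B"]))
    print("2N: ", shared_failure_points(TWO_N["A"], TWO_N["B"]))
```

In this toy model the N+1 feeds still share the UPS system itself, while the 2N feeds share only the utility. Anything left in the intersection is a single point of failure for both circuits, however the product is labeled.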
At which point one starts looking at the risk factors. If your whole facility is "redundant", is the power feed coming in from two geographically diverse substations, via diverse duct banks, into diverse entry vaults, and diverse risers? That doesn't eliminate the possibility of the entire building having some catastrophic emergency, but if you really need to use a singular specific geographic facility, it can reduce the risk.
The giant new electrical vault built under 6th Ave in Seattle in front of the Westin Building back in 2016/2017 is an example of such diversity.
If you have been sold "redundant" power and the DC provider has connected both sides to one UPS in any form, they are seriously amiss. You should not be expected to know the internal workings of the DC UPS systems, and any talk of battery packs (unless you are getting 48V DC) is utterly irrelevant. This DC provider is, in my opinion, very much out of step with reality if they think this is some sort of normal practice.
Willing to bet that there was slicing on both sides of that conversation, and this is what I will now refer to as the expected and resulting razor burn.
Babak Pasdar wrote:
Thanks James,
At signup we asked for N+1 power, two circuits to different UPS units. I think they sliced it thin by connecting us to two battery packs on the same UPS. When the UPS controller crashed both battery packs went down. Which now raises the question -- is it reasonable to have to specify and expect that two UPS units means that they do not share any common points of failure.
Is the UPS the battery or the battery and controller combined?
Babak
On Mon, Oct 23, 2023 at 7:38 AM Babak Pasdar <babak@pasdar.com> wrote:
A few months ago, 165 Halsey took us down for several hours. They claimed that a UPS failed causing this issue. Our natural reaction was that we have A/B redundant power so a failed UPS on the A circuit should not take down the cabinet. Joe the facility manager claimed that industry standard A/B power means two circuits to the same UPS, which makes no sense to me.
If they're being truthful (and many folks are not) then A/B power means that your power is redundant back to at least two different UPSes. The UPSes are maintained at under 40% capacity so that a failure of one doesn't cascade to the other. Ideally these UPSes back to two different generators, also maintained at 40% of capacity. In large, fancy data centers they even get power company feeds from two different substations.
Don't just ask the sales droid. When they deliver the rack or the cage, ask the data center manager to show you where your power connections run. If they can't or won't... don't believe them.
"Industry standard" A/B power does NOT mean two circuits to the same UPS. That's just extra power, not A/B. Joe lied to you.
Incidentally, if you're worried about N+1 redundancy, I assume you're hosted at more than one data center from more than one vendor? Buildings and vendors are single points of failure too. Even when built right, stuff happens.
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
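The "under 40%" figure is easy to sanity-check with arithmetic. A minimal sketch, assuming two UPSes of equal capacity and that the entire load of a failed unit lands on the survivor (for example through dual-corded equipment); the function names and the 100% limit are illustrative assumptions, not anything a vendor or this thread specifies:

```python
def failover_utilization_pct(ups_a_pct: float, ups_b_pct: float) -> float:
    """Utilization of the surviving UPS once its peer has failed.

    Assumes equal-capacity units and that the failed unit's whole load
    transfers to the survivor.
    """
    return ups_a_pct + ups_b_pct


def survives_peer_failure(ups_a_pct: float, ups_b_pct: float,
                          limit_pct: float = 100.0) -> bool:
    """True if the survivor stays at or below limit_pct after taking over."""
    return failover_utilization_pct(ups_a_pct, ups_b_pct) <= limit_pct


if __name__ == "__main__":
    print(failover_utilization_pct(40, 40), survives_peer_failure(40, 40))
    print(failover_utilization_pct(60, 60), survives_peer_failure(60, 60))
```

Two units at 40% leave the survivor at 80% with some headroom; two units at 60% would push the survivor past its rating, which is the cascade described above.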
I toured The Planet years ago in Dallas and was told by the sales rep that A+B power was two circuits from the same PDU. :)
I consider A+B power to be two distinct feeds: separate utility entrances, separate generators, separate UPSes, PDUs, etc. Past that, I consider things like firewall separation, rated chases and such to be customer-specific requirements.
Aaron
Bulk/high-volume hosting companies, dedicated server companies/small rack unit count colocation operate on very thin margins. Unless a customer is paying a LOT more per month they're not economically going to be connected to true diverse A/B power.
In this case their use of the incorrectly-described A/B was probably exclusively to handle the (not extremely rare) instances of rackmount server power supply failures, to give each 1U or 2U size machine, or rack of blades, two live power supplies with live power feeds. Nothing more complicated than that.
On Mon, Oct 23, 2023 at 3:56 PM Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
Bulk/high-volume hosting companies, dedicated server companies/small rack unit count colocation operate on very thin margins. Unless a customer is paying a LOT more per month they're not economically going to be connected to true diverse A/B power.
Zero sympathy for anyone who advertises A/B power and doesn't at least have them connected to different UPSs. Don't care how big you are; don't advertise fake reliability. I don't need "six nines" to make effective use of your service but if you lie to me, we're done.
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
I didn't say that I have sympathy for it, but unfortunately this is considered acceptable practice within many low-budget "hosting" companies and probably has been for 15 years. It's a known risk when you're buying a $50/month "server". Same general category of problem as the OVH datacenter that caught on fire in France a while back. Anything like that which becomes a race to the bottom in pricing for product MRC will have unacceptable corners cut.
I would highly encourage anyone who takes hosting their own stuff seriously to really know/understand the full infrastructure "underneath" their server in terms of power and cooling redundancy.
On 10/23/23 15:56, Eric Kuhnke wrote:
In this case their use of the incorrectly-described A/B was probably exclusively to handle the (not extremely rare) instances of rackmount server power supply failures, to give each 1U or 2U size machine, or rack of blades, two live power supplies with live power feeds. Nothing more complicated than that.
And then inevitably the customer will load the rack with dual supply gear to the point that each feed is pulling over 50% of the breaker rating. When one of the feeds eventually does have an issue, they'll immediately pop the breaker on the other one.
-- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
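The 50% trap can be checked before it bites. A rough sketch, assuming every device is dual-corded, splits its draw evenly across the A and B feeds, and that both feeds have breakers of the same rating; the `Device` type, the function name, and the amp figures are all hypothetical. Real breakers follow trip curves and are usually derated for continuous load, so treat this as a first-pass check only:

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    amps: float  # total draw, assumed to split evenly across two power supplies


def feed_loads(devices, breaker_amps):
    """Summarize per-feed load in normal operation and after losing one feed."""
    total = sum(d.amps for d in devices)
    return {
        "normal_per_feed_amps": total / 2,   # each feed carries half the rack
        "failover_amps": total,              # surviving feed carries everything
        "survives_single_feed_loss": total <= breaker_amps,
    }


if __name__ == "__main__":
    # Hypothetical rack: three 12 A devices on a pair of 30 A circuits.
    rack = [Device("sw1", 12.0), Device("srv1", 12.0), Device("srv2", 12.0)]
    print(feed_loads(rack, breaker_amps=30.0))
```

In this made-up rack each feed sits at 18 A, which is 60% of a 30 A breaker and looks comfortable, yet the surviving feed would be asked for 36 A the moment the other one drops.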
participants (9)
- Aaron Wendel
- Babak Pasdar
- Eric Kuhnke
- James Jun
- Jay Hennigan
- Joe Maimon
- Sean Donelan
- Tony Wicks
- William Herrin