On Thu, 29 May 2003, Alex Rubenstein wrote:
Even in instances where 'High availability' is designed, in the case where one of the units has a failure that causes a fire and FM200 dump, either the FM200 will still trigger an EPO, or the fire department will.
Why do you think most telephone central offices don't have EPOs? It is possible to meet code without an EPO, if you have a smart PE on the project.
So, the second 'high available' unit will generally not prevent you from dropping the critical load, but instead, will help you get back on line quicker.
That's why you have geographic diversity: if one node goes down, the other location may be unaffected.
A much cheaper and easier to implement external maintenance make-before-break bypass will accomplish the same thing.
Pick two out of three. The "Internet philosophy" has tended to be lots of cheap equipment connected by diverse paths. Designing for failure also means defining "failure" in terms of the service, not particular pieces of equipment. I don't care how many 9's your switch has, I just care if my packets get through.
I've heard many a story of the paralleling gear causing the problem in the first place, as well...
Yep, tying together "redundant" systems with paralleling gear turns two independent systems into one "co-dependent" system. In a failure situation, you want to compartmentalize the failure. Losing half your systems may be better than losing all your systems.
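
To put rough numbers on the compartmentalization point, here is a back-of-envelope sketch in Python. Everything in it is an assumption for illustration: the function name, the failure probabilities, and the simplification that the two power systems fail independently unless shared paralleling gear ties them together. It is not a model of any real installation.

```python
def outage_probabilities(p_unit: float, p_shared: float) -> dict:
    """Compare two hypothetical designs over some fixed period.

    p_unit   -- assumed probability that one power system fails
    p_shared -- assumed probability that shared paralleling gear fails
                and takes both systems down with it
    """
    # Compartmentalized: each system feeds half the load on its own.
    # All load is lost only if both systems fail.
    independent_total = p_unit * p_unit
    # Exactly one of the two fails: half the load is lost.
    independent_half = 2 * p_unit * (1 - p_unit)

    # Tied together: a shared-gear failure drops everything; otherwise
    # everything is lost only if both systems fail anyway.
    tied_total = p_shared + (1 - p_shared) * p_unit * p_unit

    return {
        "independent_total": independent_total,
        "independent_half": independent_half,
        "tied_total": tied_total,
    }


if __name__ == "__main__":
    # Made-up inputs: 1% per-system failure, 0.5% shared-gear failure.
    for name, p in outage_probabilities(0.01, 0.005).items():
        print(f"{name}: {p:.6f}")
```

With inputs like these, the compartmentalized design loses half the load more often, but the chance of losing everything is dominated by the shared gear in the tied design. How the trade works out in practice depends entirely on the real failure rates, which this sketch does not know.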