It flabbergasts me to no end that nobody simulated the actual incident they are guarding against. But I guess that’s why we run telecom companies. Diesel piston generators need to be run for 30 minutes every 30 days (absent engineer calcs permitting lower, but why). You should also consider a pull and re-strike on that breaker 3 times. Most transmission-level circuit breakers will auto-retry 3x, then quit if they trip each time. Your ATS should smooth this, but that function needs to get tested too. Things you learn in heavy civil construction that you don’t necessarily learn even in telecom.
-Ben
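To make the pull-and-re-strike idea concrete, here is a minimal sketch of the event sequence an ATS would have to ride through when an upstream breaker retries and then gives up. The function name `reclose_sequence` and its parameters are hypothetical, the three-attempt default simply mirrors the "3x" above, and timing between attempts is ignored entirely; real reclosing schemes are configured per installation.

```python
from typing import List, Optional


def reclose_sequence(holds_on_attempt: Optional[int], max_attempts: int = 3) -> List[str]:
    """Return the events seen downstream of a breaker that trips and auto-retries.

    holds_on_attempt: 1-based retry on which the breaker stays closed,
                      or None if the fault never clears.
    max_attempts:     number of automatic retries before the breaker quits
                      (assumed default of 3, not a universal setting).
    """
    events = ["trip"]  # the initial trip that starts the sequence
    for attempt in range(1, max_attempts + 1):
        events.append("reclose")
        if holds_on_attempt is not None and attempt >= holds_on_attempt:
            events.append("closed")  # fault cleared, supply stays up
            return events
        events.append("trip")  # fault still present, breaker opens again
    events.append("lockout")  # all retries tripped, breaker stays open
    return events
```

A manual test that pulls and re-strikes the breaker three times is reproducing the `reclose_sequence(None)` case by hand: the ATS sees power come and go repeatedly before it stays off.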
On Mar 18, 2020, at 9:58 AM, Paul Nash <paul@nashnetworks.ca> wrote:
You just have to make sure that you test the right thing.
In a former life I was an electrical engineer. My first job was with a consulting engineering firm; our biggest customer was the biggest supermarket chain in South Africa. One of my tasks was to travel to one of their stores each Saturday after closing (those were the days when they closed at noon on a Saturday until Monday morning) and test their standby generators.
The manager’s idea was usually to press the start button, check that the big diesel started, then shut down and go home. My idea was to pull the main incoming breaker. 9 times out of 10 on first visit, the diesel would start, and then die as soon as the load kicked in because of carbon buildup in the cylinders.
After discussions with the supermarket management, they decided to (a) have all the diesels serviced ASAP, and (b) adopt my protocol of start diesel, wait for it to come under load, run for at least 30 minutes to get up to heat and clear the carbon deposits.
I use a similar technique for failover tests on servers, routers, firewalls — pull the power cord and see what happens, pull the incoming network and see what happens.
This was stymied by a recent network outage where the ISP network was up and running, connected back to their local PoP and thence to their backbone, but connectivity from that network to the critical servers was down. So now we test end-to-end that the server is reachable, and let the network fail over if not.
paul
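The end-to-end check Paul describes could be sketched roughly as follows: probe the critical server itself rather than the local link, and trigger failover only after several consecutive failures. Everything here is an assumption for illustration: the names `tcp_reachable` and `FailoverMonitor`, the TCP-connect probe, the threshold of three, and the `fail_over` callback, which stands in for whatever actually switches the network path.

```python
import socket
from typing import Callable


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


class FailoverMonitor:
    """Call fail_over() once after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool],
                 fail_over: Callable[[], None], threshold: int = 3) -> None:
        self.probe = probe
        self.fail_over = fail_over
        self.threshold = threshold
        self.failures = 0
        self.failed_over = False

    def check_once(self) -> bool:
        """Run one probe; return its result and fail over if warranted."""
        if self.probe():
            self.failures = 0  # any success resets the consecutive count
            return True
        self.failures += 1
        if self.failures >= self.threshold and not self.failed_over:
            self.failed_over = True
            self.fail_over()
        return False
```

In use, the probe would be something like `lambda: tcp_reachable("critical-server.example", 443)` (a placeholder host), called on a timer. Requiring consecutive failures keeps a single dropped probe from flipping the path.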
On Mar 18, 2020, at 11:56 AM, Karl Auer <kauer@biplane.com.au> wrote:
An untested emergency system has to be regarded as a non-existent emergency system.
No matter how painful it is to test, no matter how expensive it is to test, the pain and the expense are nothing compared to the pain and expense of having an actual emergency and discovering that the emergency system doesn't work...
Multiplied by infinity if it costs lives.
Regards, K.
-- 
Karl Auer (kauer@biplane.com.au)
http://www.biplane.com.au/kauer
http://twitter.com/kauer389