Estimated Time To Repair (was Re: History: lengthy outages)
On Thu, 25 January 2001, Clayton Fiske wrote:
> 1. I think it might be prudent to weed out the 2-5 hour outages here. While that's still an excessively long time to recover a change that should have been monitored and tested properly in the first place (and still probably cause for firing in some shops), I can at least conceive of it taking this amount of time. Too long, yes, but not quite in the jaw-dropping category.
I deliberately included them for a reason. Historically, when I look at a lot of network problems, I notice an interesting coincidence: outages involving "operator error" tend to take the longest to fix, while outages involving equipment failure tend to be fixed the fastest.

  Complete hardware box failure (smoke makes debugging easy): 1 hour
  Power failure (utility, generator, etc.): 3 hours
  External malicious attack (DDoS, etc.): 4 hours
  Fiber/cable cut: 5 hours
  Electronic DCS failure: 18 hours
  Operator error: 1 business day (24-72 hours, depending on whether the operator made the change before leaving on a Friday night or a Tuesday night)
  Vendor software error: 2 business days (1 day to "escalate" the problem through customer channels, 1 day to actually get the fix; can take as long as 5 days if the problem happens after 3pm on Friday)

Psychologists study why people have a difficult time recognizing their own mistakes. It is a very difficult problem, and it is worse with "smart" people.
Sean Donelan