[Names withheld to protect the inept]

Tonight has been a frustrating night. Two outages, both totally unexpected, and as of yet, totally unexplained. Admittedly, it's only been 8 hours since the first, and 2 since the second. But it really should not take 48 hours (or, oftentimes, MORE) to come up with a valid explanation of what happened. We've had outages that have -never- been explained... just ignored.

Basically, during the first outage, around 8 tonight, we saw everything blip for about 3 minutes. We're dual homed to $provider within the facility, one gig-e pipe to each of two hosting routers. What it looked like to me was that both were rebooted, or something was cycled in between us and the world.

When I called $provider, I was shooed off by one tech who promised (and failed) to call back. The second crammed the "No, you're an idiot, you missed the published maintenance list which I'm emailing to you, go away" line down my throat. Of course, I don't know of providers that perform that level of maintenance at 8pm PST. None of their scheduled maintenance was listed.

The second was, indeed, scheduled maintenance. Replacing the secondary hosting router. Then someone (apparently) reloaded the primary router while the secondary was -in pieces-... Then when I called, I was told "Oh, no, we have no idea what happened, but we'll let you know in 48 hours..." My BGP session was reset. -SOMETHING- happened, guys.

Neither of these outages generated an email to the customer notification list. IMO, this shouldn't be acceptable. I may be new to some of this stuff, but in my experience, it doesn't take long to figure out whether someone reloaded the wrong router, or if the GRP let go of the precious magic smoke.

What are some of the other major hosting/transit providers' outage notification and post-mortem policies? Should our next contract include provisions for Rogaine, so that when I finish tearing out my hair I can recover?

-j

--
-Jonathan Disher
-Sr. Systems and Network Engineer, Web Operations
-Internet Pictures Corporation, Palo Alto, CA
-[v] (650) 388-0497 | [p] (877) 446-9311 | [e] jdisher@eng.ipix.com