On Wed, Apr 27, 2005 at 10:45:15AM -0400, Jay Patel wrote:
> I have heard rumors that S&D has been having persistent switch problems with their switches at PAIX (Palo Alto), and I was kind of wondering if anyone actually cared?
Personally, I tend to suspect the general lack of uproar is a rather unfortunate (for them) sign that PAIX is no longer relevant when it comes to critical backbone infrastructure.

It looks like different folks have been seeing different levels of outages depending upon which switch/card they are connected to, but I haven't been able to find anyone who has seen fewer than 30 hits between April 16th and the two this morning. Our ports have seen just under 28 hours of total downtime so far this month, while some lucky people have only seen around 6 hours.

I'm not sure if anyone at S&D or Extreme actually has any real idea what the problem is with these current switches, but given this amount of downtime, they should have replaced every last component by now. If Extreme can't fix them, there should be a pile of Black Diamonds sitting on the curb waiting for trash day. In fact, 9/10ths of the way through writing this e-mail, I got a call from S&D stating that they are doing exactly that. :)

In the meantime, here are some of the more interesting snippets of what has been tried on the current switches:

16 Apr 2005 20:19:53 GMT
We are currently experiencing some problems with 2 network cards in our Palo Alto peering switch. This might be causing possible service degradations. Switch Engineers are expecting new cards to replace the 2 suspected faulty network cards. These cards should be arriving in or around 1 hour. Right after the cards arrive, we will be scheduling an emergency maintenance window to get these cards replaced.

19 Apr 2005 14:16:07 GMT
The Purpose of this Emergency Maintenance window is for Switch Engineers to replace a faulty processor module card affecting the Bay Area Peering customers. The estimated down time will be 15 minutes.

(Actual downtime: several hours)

19 Apr 2005 19:27:49 GMT
This is the final update regarding the problems experienced today with the peering fabric. Our Switch Engineers corrected the problems during the emergency maintenance window by replacing two line cards and 2 processor cards in the Palo Alto switch. All peering sessions should be restored at this time.

22 Apr 2005 21:56:15 GMT
The purpose of this emergency maintenance window is for engineers to replace defective power supply units on the Paix Switch. No impact to your services is expected.

24 Apr 2005 21:25:48 GMT
Our Switch Engineers will be conducting and emergency processor cards replacement at the Palo Alto site. The expected downtime while this maintenance is being conducting will be 2 hours.

24 Apr 2005 21:36:18 GMT
Our Switch Engineers will be conducting and emergency chassis replacement at the Palo Alto site. The expected downtime while this maintenance is being conducting will be 3 hours.

25 Apr 2005 19:17:41 GMT
Our engineers have escalated the problems with the peering switch in Palo Alto to 3rd level support at Extreme, the switch vendor. More details will follow as they become available.

26 Apr 2005 03:00:34 GMT
Our Switch Engineers have advised us that the switch has been migrated to a different power bus to rule out any power variables. Power is being monitored for the next 24 hours.

28 Apr 2005 13:33:05 GMT
At approximately 6:05 AM local time, the peering switch rebooted itself. Our switch engineers are investigating this issue and believe all sessions are back to normal at this time. More details will be provided as they become available.

When I see a stable switching platform going forward, and some service credits for the massive outages we've all endured so far, I'll probably be a lot less cranky about the entire situation. Until then, I have to say that if they keep this up, they are going to need to change their name to "Switch or Data".

Oh well, at least this didn't happen during the S&D-sponsored NANOG. :)

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F  CC1C 53AF 4C41 5ECA F8B1 2CBC)