Anyone using transit from GNi at 365main seeing problems on routes that normally go over Above.net? For the last 36 hours, we've had problems. GNi isn't saying anything except that "replacement router cards are being delivered". I'm not sure if it's just routing via Above.net, but lots of routes going through core-01.ge-1-1.sfo1.gni.com seem to be getting dropped.
We found a routing loop on 8/20, caused by some maintenance that either did not get completed, wasn't properly configured, or otherwise had some problems the evening before. At that point I went ahead and shut down BGP peering and asked to be notified when all was well.

8/26: notified that all was well, other customers had been affected, really sorry about that, etc.
8/27: received another emergency maintenance notification.
8/29: received another emergency maintenance notification.

I did not do any investigation, so I don't know any more than the above, but there clearly is some work being done that has not gone exactly as planned. Meanwhile, peering will remain down until things get straightened out.

I suppose the short answer is, we've seen problems for about 10 days. Hope they get things worked out.

-wil

On Aug 30, 2008, at 11:42 AM, Rusty Hodge wrote:
> Anyone using transit from GNi at 365main seeing problems on routes that normally go over Above.net?
> For the last 36 hours, we've had problems. GNi isn't saying anything except that "replacement router cards are being delivered".
> I'm not sure if it's just routing via Above.net, but lots of routes going through core-01.ge-1-1.sfo1.gni.com seem to be getting dropped.
Received this update from GNi:
At this time we believe we have found a Cisco day 0 network vulnerability.
We have 20+ routers in our core network - 6 of the 20 have the identical route processor and IOS version. These 6 have been affected in 3 separate geographical locations in the past several days. The network issues range from simple SNMP failure to loss of BGP or OSPF communications. These are creating black holes intermittently across our network. We are experiencing as much as 20% of networks not available at this time.
We have been coordinating with Cisco for the past two days and have deployed patches to address the problem. Overall, we have determined that the fastest, surest path to restoring 100% network normality is to replace these 6 routers. We will continue this process to replace these 6 routers.
Two of the six affected routers have already been replaced and clients have been moved. We are now replacing the remaining 4 affected routers. We know that this week has been very painful and frustrating for you. We will provide a detailed and open RFO (Reason For Outage) post mortem document as soon as we have replaced the affected routers.
Also, I will be updating all customers who have opened tickets at least every two hours from this time to ensure that all customers have the latest status information. If you would prefer a phone call status report, please note it on the ticket and I will also give you a call to answer any additional questions you may have. In addition, you may call the NOC at your convenience at (415) 979-9786.
Thank you for your patience.
participants (2): Rusty Hodge, Wil Schultz