MAI's view:

Here's a status on what AS 7007 themselves saw and think happened. They're disconnected from the 'net now, so they can't post this themselves.

7007 is an ASN used by mai.net, MAI Network Services. They also use ASN 6082.

Their topology at their new data center is:

[asn 6082] Bay BLN [fddi] -> mae-east, where they peer with some and have mae-transit from Sprintlink [1239]
[asn 7007] Bay BLN [t3] -> sprintlink [1239 1790]
                   [t3] -> customer X

And the two BLNs are connected to each other via 100bt.

Customer X announced something like a full route table, according to MAI, and MAI was not filtering the routes on either AS_PATH or by distribute_list (or whatever Bay calls them). MAI apologizes for this and acknowledges that, had they been filtering, the major problem could have been avoided.

At about 11:30am (this is when we saw it), they started redistributing more specifics for many (thousands) of CIDR routes to Sprintlink [1239 1790], AND the routes were injected with an origin AS of 7007, hiding the AS_PATH the routes had (and the customer involved).

MAI noticed this at about 11:45. MAI shut down the Sprintlink connection, and when it was brought back up, the same thing happened again. 72,000 is the number of routes the Sprint router took before it melted, but the Bay might have been willing to generate even more than that. Anyway, Sprintlink saw the routes again from MAI (the de-aggregated specifics, with an AS_PATH of ^7007$). Then they rebooted the router, and the 7007 router was no longer advertising the specifics or hiding AS_PATH info, but the damage was done, and the more specifics were still out there for some reason.

At about 12:15 they shut off all of their routers; shortly after, Sprintlink shut down the T3 for good measure.

MAI does NOT think that they were distributing the BGP routes into an IGP and then readvertising them on that basis; they think there was/is a Bay BGP bug that caused this to happen.

MAI's NOC # is 888 624 8700.
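Because the leaked specifics reached Sprintlink with an AS_PATH of ^7007$, an AS_PATH filter that permits ^7007$ on that session would have let every one of them through; only a filter on the prefixes themselves tells them apart. The toy sketch below shows that difference. It is an illustration, not Bay or Cisco configuration: the `Route` type, the function names, and all prefixes (documentation ranges, not anyone's real allocations) are hypothetical, and the translation of the Cisco-style `_` delimiter into a Python regex is an approximation.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    as_path: str  # space-separated ASNs; the leftmost is the neighbor that sent it


def cisco_style_regex(pattern: str) -> str:
    # In Cisco-style AS_PATH regexes, "_" matches a delimiter: a space, comma,
    # brace, parenthesis, or the start/end of the string. Approximate it here.
    return pattern.replace("_", r"(?:^|$|[ ,{}()])")


def as_path_filter(route: Route, permit: str) -> bool:
    # Accept the route if its AS_PATH matches the permitted pattern.
    return re.search(cisco_style_regex(permit), route.as_path) is not None


def prefix_filter(route: Route, allowed: List[ipaddress.IPv4Network]) -> bool:
    # Accept the route only if its prefix lies inside a block the neighbor is
    # expected to announce (e.g. one backed by a registered route object).
    return any(route.prefix.subnet_of(block) for block in allowed)


# Hypothetical data: what AS 7007 "should" send vs. what actually arrived.
ALLOWED = [ipaddress.ip_network("192.0.2.0/24")]
received = [
    Route(ipaddress.ip_network("192.0.2.0/24"), "7007"),       # legitimate
    Route(ipaddress.ip_network("198.51.100.0/25"), "7007"),    # leaked specific
    Route(ipaddress.ip_network("198.51.100.128/25"), "7007"),  # leaked specific
]

by_path = [r for r in received if as_path_filter(r, "^7007$")]
by_prefix = [r for r in received if prefix_filter(r, ALLOWED)]
print(len(by_path), len(by_prefix))  # 3 1
```

The AS_PATH filter passes all three routes because the path was rewritten to a bare 7007; the prefix filter keeps only the one block the neighbor is supposed to originate.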
Vincent Bono of MAI wishes to apologize for the trouble; a Bay tech is on the scene in DC working on the problem to make sure that when they come back up, nothing like this will happen again.

An outside view of what happened:

At 11:30, I noticed that we lost connectivity to the world. Some of our customers called our NOC, and our dual-homed customers said that they saw hundreds of routes from ASN 7007. One of the customers noticed that the ASN 7007 routes were stomping on their OSPF routes (because the BGP routes from _1239 1790 7007$ were more specific than some of their internal /23 OSPF routes). And ASN 7007

We saw about 60k routes in our core routers at the time, and saw thousands of routes from 7007 when we looked more carefully. We kept clearing sessions to filter 7007, but the routes kept popping back up: Sprint (of course), UUNET, and MCI all had them. Also, we had to advertise some of our customers' routes more specifically because the dampened 7007 routes (dampened because everyone else was clearing sessions at the time) were screwing up their connectivity. I know that some others did this too, so tomorrow the tables will have a bunch of more specifics, I suppose.

A note: If all more specific routes for a destination are dampened, existing more general routes will not be looked at. If this behavior were changed, some of the blackholing that went on today would not have been possible. But this is a separate topic for discussion.

I suppose that the immediate topic will be route filtering vs. AS_PATH filtering...

Thanks,

Avi Freedman (the original)
Net Access (netaxs.net)
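The OSPF complaint in the message above comes down to longest-prefix match: a router picks the most specific covering route before it considers which protocol supplied it, so a leaked BGP /24 wins over an internal OSPF /23. The toy model below shows that, and also sketches the alternative raised in the note on dampening (falling back to a less specific route when the more specific one is suppressed). Everything in it is an assumption: the 10.1.0.0/23 block, the next-hop labels, the `Entry` type and `lookup` function, and the simplification that a suppressed most-specific route blackholes the destination, which is one reading of the post's description rather than how any particular router is implemented.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    prefix: ipaddress.IPv4Network
    source: str               # "ospf" or "bgp"; labels only
    next_hop: str             # hypothetical next-hop name
    suppressed: bool = False  # True if BGP dampening has suppressed the route


def lookup(table: List[Entry], dst: str, fall_back: bool) -> Optional[Entry]:
    """Longest-prefix match; returns None when the destination is blackholed."""
    addr = ipaddress.ip_address(dst)
    # Only routes whose prefix contains the destination are candidates.
    candidates = [e for e in table if addr in e.prefix]
    # Most specific first: a /24 beats a /23 whichever protocol supplied it.
    candidates.sort(key=lambda e: e.prefix.prefixlen, reverse=True)
    for entry in candidates:
        if not entry.suppressed:
            return entry
        if not fall_back:
            # Behavior as described in the post: the less specific route
            # is never consulted, so traffic has nowhere to go.
            return None
        # Proposed alternative: skip the suppressed route, try a shorter prefix.
    return None


table = [
    Entry(ipaddress.ip_network("10.1.0.0/23"), "ospf", "internal"),
    Entry(ipaddress.ip_network("10.1.0.0/24"), "bgp", "upstream"),
]
print(lookup(table, "10.1.0.5", fall_back=False).source)  # bgp
print(lookup(table, "10.1.1.5", fall_back=False).source)  # ospf

table[1].suppressed = True
print(lookup(table, "10.1.0.5", fall_back=False))         # None
print(lookup(table, "10.1.0.5", fall_back=True).source)   # ospf
```

With `fall_back=True`, the covering OSPF /23 takes over once the leaked /24 is suppressed, which is the kind of behavior change the note suggests would have avoided some of the blackholing.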
===== Avi Freedman previously wrote: ====
MAI does NOT think that they were distributing the BGP routes into an IGP and then readvertising them on that basis; they think there was/is a Bay BGP bug that caused this to happen.
Probably. I remember seeing it before, when JVNC was advertising our customer routes as more specifics: a similar problem on a much, much smaller scale. When I called them, I was told that it was a Bay bug. JVNC blocked the announcements at CIX to fix it (and since their upstream, MCI, uses the registered route objects, JVNC's extra routes were not advertised into MCI). I thought Bay had fixed the problem since then. That was more than a year ago, if memory serves.

Jun

--
Jun (John) Wu                               | Voice: (703)689-5325
Supervisor - Global IP Systems & Services   | Fax:   (703)478-7852
Global One Communications L.L.C.            | Email: jun@gsl.net
Reston, VA 20196                            | URL:   http://wolfox.gsl.net/jun