Steven, all,

On Wed, Jan 25, 2006 at 03:04:30PM -0500, Steven M. Bellovin wrote:
> It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was.

I keep hearing that Con Ed Comm was previously an upstream of Panix ( http://www.renesys.com/blog/2006/01/coned_steals_the_net.shtml#comments ) and that this might explain why Con Ed had Panix routes in their radb as-27506-transit object. But I checked our records of routing data going back to Jan 1, 2002, and see no evidence of 27506 and 2033 being adjacent to each other in any announcement from any of our peers at any time since then. So I can't really verify that Panix was ever a Con Ed Comm customer. Can anyone else clear this up? So far, it's not making sense.

The supposition was that all of the other affected ASes that are not currently customers of Con Ed Comm were also previously customers. Some appear to have been (Walrus Internet (AS7169), Advanced Digital Internet (AS23011), and NYFIX (AS20282) for sure) but I haven't been able to verify that all of them were.

I know that this isn't really the "root cause" that Steven was asking for, though. The root cause is that filtering is imperfect and frequently out of date. This case is particularly interesting and painful because Verio is known for building good filters automatically. In this case, they did so based on out-of-date information, unfortunately. This is particularly depressing because normally in cases of leaks like this, the propagation is via some provider or peer who doesn't filter at all. In this case, one of the vectors was one of the most responsible filterers on the net. sigh.

So in terms of engineering good solutions, the space is pretty crowded. One camp is of the "total solution" variety that involves new hardware, new protocols, and a Public Key approach where originations (or any announcements) are signed and verified. This is obviously a very good and complete approach to the problem but it's also obviously seeing precious little adoption. And in the meantime we have nothing.
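
For anyone who wants to repeat the adjacency check mentioned above (27506 next to 2033 in any AS path) against their own archived routing data, here is a rough sketch. It assumes each record has already been reduced to a (prefix, AS path string) pair; the function names and the sample records are made up for illustration (documentation prefixes and ASNs), and this is not the tooling we actually use.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (prefix, space-separated AS path); assumed input shape


def parse_as_path(path: str) -> List[int]:
    """Return the numeric hops of an AS path, skipping non-numeric tokens
    such as AS_SET notation like {64500,64501}."""
    return [int(token) for token in path.split() if token.isdigit()]


def adjacent(path: List[int], a: int, b: int) -> bool:
    """True if ASNs a and b appear next to each other, in either order."""
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))


def find_adjacencies(records: Iterable[Record], a: int, b: int) -> List[Record]:
    """Keep only the records whose AS path shows a and b as neighbors."""
    return [(prefix, path) for prefix, path in records
            if adjacent(parse_as_path(path), a, b)]


# Made-up sample data: only the first path has 27506 and 2033 side by side.
SAMPLE: List[Record] = [
    ("192.0.2.0/24", "64500 27506 2033"),
    ("198.51.100.0/24", "64501 2033"),
    ("203.0.113.0/24", "64500 2033 64502 27506"),
]

if __name__ == "__main__":
    for prefix, path in find_adjacencies(SAMPLE, 27506, 2033):
        print(prefix, path)
```

An empty result over the whole archive is what "no evidence of adjacency" means here. It only says that none of the peers contributing data ever saw such a path, not that the relationship never existed.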

Another set of approaches has been to look at alternate methods of building filters, taking into account more information about the history of routing announcements and dampening or refusing to accept novel, questionable announcements for some fixed, short amount of time. Josh Karlin's paper suggests that, as does some of the stuff that Tom Scholl, Jim Deleskie and I presented at the last NANOG. All of this has the advantage of being implementable easily and in stages, without a network forklift or a protocol upgrade, but the disadvantages of being only a partial solution and of being nowhere near fully baked.

Clearly more, smarter people need to keep searching for good solutions to this set of problems. Extra credit for solutions that can be implemented by individual autonomous systems without hardware upgrades or major protocol changes, but that may not be possible.

t.

p.s.: wrt comments made previously that imply that moving parts of routing control off of the routers is "Bell-like" or "bell-headed": although the comments are silly and made somewhat in jest, they're obviously not true. anyone who builds prefix filters or access lists off of routers is already generating policy somewhere other than the router. using additional history or smarts to do that and uploading prefix filters more often doesn't change that existing architecture or make the network somehow "bell-like". it might not work well enough to solve the problem, but that's another, interesting objection.

--
_____________________________________________________________________
todd underwood
chief of operations & security
renesys - internet intelligence
todd@renesys.com   http://www.renesys.com/blog