On Tue, Jul 12, 2011 at 11:42 AM, Leo Bicknell <bicknell@ufp.org> wrote:
> I'll pick on LISP as an example, since many operators are at least aware of it. Some operators have said we need a locator and identifier split. Interesting feedback. The IETF has gone off and started playing in the sandbox, trying to figure out how to make that go.
As an operator (one who understands in great detail how most things work), I found the LISP folks very much uninterested in my concerns about whether LISP can ever be made to scale up to "Internet-scale," with respect to a specific DDoS vector. I also think that an explosion of small, multi-homed SOHO networks would be a disaster, because we might have 3 million FIB entries instead of 360k after a few years. These two things are directly related to each other, too.

So I emailed some LISP gurus off-list and discussed my concern. I was encouraged to post to the LISP IETF list, which I did. To my great surprise, not one single person was interested in my problem. If you think it is a small problem, well, you should try going back to late-1990s flow-cache routing in your data-center networks and see what happens when you get DDoS. I am sure most of us remember some of those painful experiences.

Now there is a LISP "threats" draft, which the working group mandates they produce, discussing various security problems. The current draft is a laundry list of "what if" scenarios, such as: what if a malicious person could fill the LISP control-plane with garbage? BGP has the same issue: if some bad guy had enable on a big enough network whose peers/transits don't filter its routes, he could do a lot of damage before being stopped. This sometimes happens even by accident, for example, some poor guy accidentally announcing 12/9 and giving AT&T a really bad day.

What the draft doesn't contain is anything relevant to the special-case DDoS that all LISP sites would be vulnerable to, due to the IMO bad flow-cache management system that is specified. I am having a great deal of trouble getting the authors of the "threats" document to even understand what the problem is, because, as one of them put it, he is "just a researcher." I am sure he and his colleagues are very smart guys, but they clearly do not remember our 1990s pains. That is the "not an operator" problem. It is understandable.
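For anyone who did not live through the flow-cache era, here is a rough sketch of the failure mode described above. It is a toy model, not the LISP map-cache as specified: the LRU eviction policy, the cache size, and the traffic mix are all assumptions picked for illustration, and the names `FlowCache` and `legit_hit_rate` are made up. Every miss stands in for an expensive slow-path lookup.

```python
import random
from collections import OrderedDict


class FlowCache:
    """Toy LRU cache keyed by remote address (assumed policy, not the LISP spec)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        """Return True on a cache hit; on a miss, install the entry and return False."""
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh LRU position
            return True
        self.entries[key] = True  # a miss means a slow-path lookup, then install
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return False


def legit_hit_rate(cache_size, legit_remotes, attack_fraction, packets,
                   address_space=10**6, seed=0):
    """Fraction of legitimate packets that hit the cache.

    Attack packets carry random (e.g. spoofed) remote addresses, so nearly
    every one of them misses and pushes a useful entry out of the cache.
    """
    rng = random.Random(seed)
    cache = FlowCache(cache_size)
    hits = total = 0
    for _ in range(packets):
        if rng.random() < attack_fraction:
            cache.lookup(("attack", rng.randrange(address_space)))
        else:
            total += 1
            if cache.lookup(("legit", rng.randrange(legit_remotes))):
                hits += 1
    return hits / total


clean = legit_hit_rate(cache_size=1000, legit_remotes=500,
                       attack_fraction=0.0, packets=20000)
attacked = legit_hit_rate(cache_size=1000, legit_remotes=500,
                          attack_fraction=0.9, packets=20000)
```

With these toy numbers the cache is twice the size of the legitimate working set, so `clean` ends up close to 1, while `attacked` collapses because random attack entries keep evicting the legitimate ones. The only knob that helps in this model is making the cache larger than anything the attacker can touch.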
Others who have been around long enough simply dismiss this problem, because they believe the unparalleled benefits of LISP for mobility and multi-homing SOHO sites must greatly outweigh the downside. That downside is that, well, if you are a content provider and you receive a DDoS, your site will be down and there isn't a damn thing you can do about it, other than spec routers that have way, way more FIB than the number of possible routes, again due to the bad caching scheme.

The above is what I think is the "ego-invested" problem, where certain pretty smart, well-intentioned people have a lot of time, and professional credibility, invested in making LISP work. I'm sure it isn't pleasing for these guys to defend their project against my argument that it may never be able to reach Internet-scale, and that through several years of development they have missed what I claim is a show-stopping problem, one with an easy way to improve it. Especially since I am a guy who never participated in the IETF before, someone they don't know from a random guy on the street.

I am glad that this NANOG discussion has got some of these LISP folks to pay more attention to my argument, and to my suggested improvement (I am not only bashing their project; I have positive input, too). Simply posting to their mailing list once and emailing a few draft authors did not cause any movement at all. Evidently it does get attention, though, to jump up and down on a different list. Go figure!

If operators don't provide input and *perspective* to things like LISP, we will end up with bad results. How many of us are amazed that we still do not have 32:32-bit BGP communities to go along with 32-bit ASNs, for signalling requests to transit providers without collision with other networks' community schemes?
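To make the community problem concrete, here is a small sketch. The standard BGP community (RFC 1997) is a single 32-bit value, conventionally written as a 16-bit ASN followed by a 16-bit operator-defined value. The function name `encode_community` is made up for illustration; the example ASNs are from the private-use ranges.

```python
def encode_community(asn, value):
    """Pack ASN:value into a 32-bit standard community (16 bits for each half)."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError(f"ASN {asn} does not fit in the 16-bit ASN half")
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value {value} does not fit in the 16-bit value half")
    return (asn << 16) | value


# A 16-bit ASN tags its requests without stepping on anyone else's scheme.
ok = encode_community(65001, 100)

# A 32-bit ASN has no room for itself in the ASN half.
try:
    encode_community(4200000000, 100)
    fits = True
except ValueError:
    fits = False
```

Here `fits` ends up `False`: a network with a 32-bit ASN cannot put its own number in the ASN half of the community.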
It is a pretty stupid situation, and yet here we are: 32-bit ASNs have been around for years, and if you want to do advertisement control where 32-bit ASNs are in use, you are either mapping your 32-bit neighbors to special numbers, or your community scheme can overlap with others'.

That BGP community problem is pretty tiny compared to what would happen if people really started rolling out something new and clever like LISP, but in a half-baked, broken way that takes us back to the 1990s era of a small DDoS taking out a whole data-center aggregation router.

A lot of us think IPv6 is over-baked and broken, and this is probably why it has taken such a very long time to get anywhere with it. But ultimately, it is our fault for not participating. I am reversing my own behavior and providing input to some WGs I care about, in what time I have to do so. More operators should do the same. Otherwise, we have no right to blame the people who do participate in the IETF, because we aren't part of the solution.

--
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts