On Jul 17, 2011, at 1:17 PM, Jeff Wheeler wrote:
On Sun, Jul 17, 2011 at 3:40 PM, Owen DeLong <owen@delong.com> wrote:
Basically an ND entry would have the following states and timers:
I've discussed what you have described with some colleagues in the past. The idea has merit and I would certainly not complain if vendors included it (as a knob) on their boxes. The drawbacks of this approach are that it still doesn't ensure the discovery of new neighbors (as opposed to "ever seen" neighbors) during a DoS, and that it makes the local DoS a bit more complex by requiring more rules for purging these semi-permanent entries.
Sure they do, just not necessarily on the first attempt. There are no semi-permanent entries. In fact, the D flag doesn't make any entry more permanent than it is today; it just makes some entries more readily discardable than today's entries. So I think you have some misconceptions about how it would work in practice.
Under DoS, the first packet that arrives for a known host generates the standard ND request to the host, but the incomplete ND table entry is created with the D flag set. If the host responds before that entry is discarded, everything functions as normal. If the entry is discarded before the host responds, then the host's response creates a new incomplete entry without the D flag set. This entry lives for the normal time an incomplete ND entry would be kept (it is not eligible for early discard), and the retry packet from the originating host then generates a new ND request, whose response should arrive before the normal incomplete ND timer expires. At that point a normal complete entry is created and things continue to function.
So what happens under this scenario is that there is a small chance you need to wait for an initial connection retry to a host that hasn't been seen yet, but you can easily discard incomplete ND entries for which no response has yet been received. Further, since you only discard the single oldest entry each time you need to create a new entry in a full table, this only starts discarding things when an actual table overflow is occurring, whether from DoS or some other cause. If it's another cause, I don't think this makes life any worse. If it's DoS, it should be relatively rare for a responsive host to be the oldest ND table entry that gets discarded, no?
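To make the proposed D-flag behaviour concrete, here is a minimal Python sketch of an ND cache that follows the steps described above. Everything in it is hypothetical: the class and method names, the 3-second incomplete lifetime, and the reading of "the oldest entry" as "the oldest incomplete entry that still carries the D flag" are assumptions, not anything a vendor ships. It also models the proposal's choice to let a late reply create a non-discardable incomplete entry, whereas a standard RFC 4861 node silently discards an advertisement for which it has no entry. REACHABLE/STALE aging and retransmission timers are left out.

```python
from collections import OrderedDict
from dataclasses import dataclass

INCOMPLETE = "INCOMPLETE"
REACHABLE = "REACHABLE"


@dataclass
class NdEntry:
    state: str
    discardable: bool  # the proposed "D" flag
    solicited: bool    # has a solicitation been sent for this entry yet?
    created: float


class DFlagNdCache:
    """Hypothetical ND cache sketching the proposed D-flag behaviour."""

    def __init__(self, capacity, incomplete_lifetime=3.0):
        self.capacity = capacity
        self.incomplete_lifetime = incomplete_lifetime  # assumed value, seconds
        self.entries = OrderedDict()  # insertion order doubles as age order
        self.solicits_sent = []       # stand-in for transmitting a solicitation

    def _expire(self, now):
        # Drop incomplete entries whose normal incomplete timer has run out.
        stale = [a for a, e in self.entries.items()
                 if e.state == INCOMPLETE and now - e.created > self.incomplete_lifetime]
        for addr in stale:
            del self.entries[addr]

    def _make_room(self):
        # Only evict when the table is actually full, and then only the
        # single oldest incomplete entry that still carries the D flag.
        if len(self.entries) < self.capacity:
            return True
        victim = next((a for a, e in self.entries.items()
                       if e.discardable and e.state == INCOMPLETE), None)
        if victim is None:
            return False  # nothing discardable: refuse the new entry
        del self.entries[victim]
        return True

    def packet_for(self, addr, now):
        """A data packet needs addr resolved; True means it can be forwarded."""
        self._expire(now)
        entry = self.entries.get(addr)
        if entry is not None and entry.state == REACHABLE:
            return True
        if entry is None:
            if not self._make_room():
                return False
            entry = NdEntry(INCOMPLETE, discardable=True, solicited=False, created=now)
            self.entries[addr] = entry
        if not entry.solicited:
            # New entry, or a non-D entry left behind by a late reply:
            # this is where the sender's retry triggers a fresh solicitation.
            self.solicits_sent.append(addr)
            entry.solicited = True
        return False

    def advertisement_from(self, addr, now):
        """Handle a neighbor advertisement (solicited or not) for addr."""
        self._expire(now)
        entry = self.entries.get(addr)
        if entry is not None:
            if entry.state == INCOMPLETE and entry.solicited:
                entry.state = REACHABLE
                entry.discardable = False
            return  # repeated or spurious replies never add a second entry
        # Our D-flagged entry was discarded before the reply arrived (this
        # departs from standard ND): remember the host with an incomplete
        # entry that is NOT eligible for early discard.
        if self._make_room():
            self.entries[addr] = NdEntry(INCOMPLETE, discardable=False,
                                         solicited=False, created=now)
```

For example, with `capacity=2`, a host whose D-flagged entry is evicted by attacker-triggered entries still leaves a non-discardable entry behind when its reply arrives; further attack traffic then evicts other D-flagged entries instead, and the host's retry completes resolution. A second unsolicited reply for the same address finds the existing entry and adds nothing.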
I think most of this punting could be handled at the line card level. Is there any reason that the ND process can't be moved into line-card level silicon as described above?
You could implement ND solicitation in the data plane (and remove punts entirely) even on some current chips, to say nothing of future ones. As for whether that is a good idea, keep in mind that the ND solicits would then be multicast to the LAN at a potentially unlimited rate.
There's no reason it would have to be an unlimited rate, but I think that would probably be acceptable in most cases anyway.
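One way a data-plane implementation could avoid an unlimited solicit rate is a token bucket in front of the solicit path. The sketch below is an assumption for illustration: the thread does not say which limiter a vendor would use, and the class name and the rate and burst values are hypothetical.

```python
class SolicitRateLimiter:
    """Hypothetical token-bucket limiter for data-plane ND solicits."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec  # assumed sustained solicits per second
        self.burst = burst        # assumed maximum back-to-back solicits
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Return True if a solicit may be sent at time `now` (seconds)."""
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop (or defer) this solicit
```

Whatever the attack rate, the solicits reaching the LAN are capped at roughly `burst + rate_per_sec * elapsed`. The trade-off is that legitimate resolutions compete for the same budget while an attack is under way.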
That is not necessarily a problem unless the L2 implementation handles multicast poorly. For example, in some "switches" (mostly routers that can also switch), L2 multicast has surprising caveats, such as consuming a lot of fabric capacity for whatever replication scheme has been chosen.
If your L2 implementation sucks at multicast, you're kind of in a bad way with IPv6 anyway.
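The multicast in question is the solicited-node group: each Neighbor Solicitation goes to ff02::1:ff00:0/104 plus the low-order 24 bits of the target address (RFC 4291), which on Ethernet maps to a 33:33 MAC built from the group's low-order 32 bits (RFC 2464). This small sketch, using only Python's standard `ipaddress` module, computes both; the function names are illustrative.

```python
import ipaddress

SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_group(target: str) -> ipaddress.IPv6Address:
    """ff02::1:ff00:0/104 plus the low-order 24 bits of the target (RFC 4291)."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


def ethernet_group_mac(group: ipaddress.IPv6Address) -> str:
    """33:33 followed by the low-order 32 bits of the group (RFC 2464)."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + [(low32 >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ":".join(f"{octet:02x}" for octet in octets)
```

A scan across a /64 therefore spreads data-plane solicits over up to 2^24 different groups, which is where the multicast replication caveats mentioned above come into play.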
Of course, you also hope NDP on all the connected hosts works right. I believe some Juniper customers noticed a pretty big problem with the JUNOS NDP implementation when deploying boxes using the DE-CIX addressing scheme, and in a situation like that, the ingress router for the attack could be crippled by spurious responses from the other misbehaving hosts on the LAN, essentially like a smurf attack except without sending any garbage back out to the Internet.
I think the bad NDP implementations on the hosts will get sorted out fairly quickly anyway. Since all a spurious host would do is create a new incomplete entry without the D flag set the FIRST time it sends an unsolicited ND response, I'm not sure how that would really cripple the ingress router. Care to explain that?
What you definitely don't want to do is assume this fixes the local DoS, because it doesn't. I would like you to keep in mind that a host on the LAN, misconfigured to do something like "local proxy-arp," or otherwise responding to all ND solicits, would accidentally DoS the LAN's gateway. I do not think we should assume that the local DoS won't happen, or is "fixable" with a whack-a-mole method.
I consider local DoS to be a corner case unique to universities and very poorly run colos. We've already had that discussion and IIRC agreed to disagree.
Sure, that doesn't solve the problem on current hardware, but it moves it from a design problem to an implementation issue, which IMHO is a step in the right direction.
Well, it already is a design problem that implementations can largely work around. Vendors just aren't doing it. :-/
Well, I think that given a simple solution like the one outlined above, it might be easier to get them to do so if they think there is demand. I know I'll be discussing this with the guy who deals with our vendors to see if we can convince them to roll it into an upcoming release.
Owen