On Fri, Mar 11, 2011 at 6:33 PM, Owen DeLong <owen@delong.com> wrote:
Yes, you can bring as much of the pain from IPv4 forward into IPv6 as you like. You can also commit many other acts of masochism.
This is the problem with "Fundamentalists," such as yourself, Owen. You think that "fixing" things which work fine (like reasonable-sized VLSM LANs for content farms) is worth introducing a DDoS vulnerability for which there is no current defense, and for which the only feasible defense is either reversing your choice and renumbering the subnet from /64 to /smaller, or waiting until your vendors supply you with patched images for your routers and/or switches. You need to move beyond this myopic view that /64 provides a benefit that is worth this kind of operational sacrifice. When vendors cough up some more knobs, I'll be right there with you, configuring /64 subnets. I've already allocated them! It's pretty easy for me to renumber my /120 subnets to /64, after all -- I don't have to update any zone files for public-facing services, or modify significant configuration for software -- I just have to reconfigure my router and host interfaces from /120 to /64. You, on the other hand, may have addresses in use all over that /64, and condensing them into a smaller subnet is guaranteed to be at least as hard as my work for growing my subnet, and may be much more difficult -- every bit as difficult as renumbering from one IPv4 block to another. Given the current state of IPv6, your "Fundamentalist" way introduces new problems *and* brings the old ones forward. This makes no sense, but Fundamentalists rarely do.
Personally, I prefer to approach IPv6 as a way to reduce some of the more painful aspects of IPv4, such as undersized subnets, having to renumber or add prefixes for growth, limited aggregation, NAT, and more.
I look forward to that when it works. As I've noted, I have prepared to take advantage of those things as soon as the NDP issue is resolved.
None of that "standard IPv6 automatic stuff" works today, anyway. The state of IPv6 support on end-user CPE generally ranges from
As someone using SLAAC in a number of environments, I'm confused by this statement. It seems to be working quite well in many places and end-user residential networks are certainly not the only places where it is useful.
Your definition of "working quite well in many places" is different than mine. I'll come around to your point of view when it is possible to get working IPv6 connectivity from most major end-user ISPs, and all (or close enough) the CPE being sold at Fry's and Best Buy works right. We are pretty far from that right now. This is another thing the "IPv6 Fundamentalists" seem to ignore. CPE support is almost non-existent, ISP support is not there (some tier-1 transit networks still have no IPv6 product!), and the major IXPs still have three orders of magnitude more IPv4 traffic than IPv6. Cogent, Level3, and Hurricane Electric still can't decide that it's in their mutual interest to exchange IPv6 traffic with each other, and their customers don't care enough to go to another service provider, because IPv6 is largely unimportant to them. None of this stuff "works" today. You aren't seeing DDoS scenarios on the v6 network today because the largest IPv4 DDoS attacks are larger than the total volume of inter-domain IPv6 traffic.
Most of the top-of-rack switches I'm aware of have no problem doing at least 64k NDP/ARP entries. Many won't do more than that, but, most will go at least that far.
Owen, this statement is either:
1) a gross misunderstanding on your part, because you can't or don't read spec sheets, let alone test gear
2) you've never seen or used a top-of-rack switch or considered buying one long enough to examine the specs
3) your racks are about 3 feet taller than everyone else's and you blow 100k on switching for every few dozen servers
4) an outright lie, although not an atypical one for the "IPv6 Fundamentalist" crowd
I'd like you to clarify which of these is the case. Please list some switches which fit your definition of "top-of-rack switch" that support 64k NDP entries. Then list how many "top-of-rack" switches you are currently aware of. Don't bother listing the ones you know don't support 64k, because I'll gladly provide a list of far more of those than the number of switches you find that support 64k in a ToR form-factor.
For those following along at home, how many ToR switches do indeed support at least 64k NDP entries? Unlike Owen, I know the answer to this question: Zero. There are no ToR switches that support >= 64k NDP table entries.
Of course, I don't really mean to call Owen a liar, or foolish, or anything else. I do mean to point out that his "facts" are wrong and his argument not based in the world of reality. He is a "Fundamentalist," and is part of the problem, not the solution.
I find it interesting that you _KNOW_ that /64 LANs will cause you DoS problems and yet we've been running them for years without incident.
That's because I understand how packet forwarding to access LANs actually works. You don't. Again, the biggest DDoS attacks today dwarf the whole volume of inter-domain IPv6 traffic. *Routine* IPv4 attacks are greater than the peak IPv6 traffic at any IXP. IPv6 hasn't seen any real DDoS yet. It will probably happen soon.
There are several things that could eventually be implemented in the access switch software. Techniques like rapidly timing out unanswered NDP requests, not storing ND entries for SLAAC MAC-based suffixes (after all, the information you need is already in the IP address, just use that). Not storing ND entries for things that don't have an entry in the MAC forwarding table (pass the first ND packet and if you get a response, create the ND entry at that time), etc.
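For reference, the SLAAC suggestion above rests on the modified EUI-64 mapping, in which the interface identifier is the MAC address with ff:fe inserted in the middle and the universal/local bit flipped. A minimal Python sketch of recovering the MAC from such an address follows. The function name and the sample addresses are hypothetical, and this only illustrates the mapping, not how any switch implements it.

```python
import ipaddress

def mac_from_slaac(addr: str):
    """Return the MAC embedded in a modified EUI-64 interface ID, or None."""
    iid = ipaddress.IPv6Address(addr).packed[8:]  # low 64 bits of the address
    if iid[3:5] != b"\xff\xfe":
        return None  # suffix was not derived from a 48-bit MAC
    # Undo the universal/local bit flip and drop the ff:fe filler bytes.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)

# Hypothetical addresses in the 2001:db8::/32 documentation range.
print(mac_from_slaac("2001:db8::211:22ff:fe33:4455"))
print(mac_from_slaac("2001:db8::1"))
```

Addresses that were configured manually or generated with privacy extensions carry no ff:fe marker, so a lookup like this only covers MAC-derived suffixes.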
I am glad you have given this some thought. The things you mention above are not bad, but they don't fix the problem. There are several practical solutions available which require pretty straight-forward router/switch knobs. Vendors will *eventually* deliver these knobs. Probably not before IPv6 is deployed enough that we see real DDoS, though; and if the most popular fix becomes dependent on NDP inspection ... you can forget about benefiting from that fix if you still have 10-year-old access switches. I do have 10-year-old access switches, and older. I'm not upgrading them specifically because vendors aren't offering the needed knobs to solve this problem. I want budgetary resources to be available to me when that time comes. To Cisco/Foundry/Juniper/et al: I've been waiting a real long time to upgrade these old beasts, and whichever of you gets me a fix I consider practical first, is very likely to be the vendor that gets to sell me > 1000 new ToR switches. Unless Cisco feels like back-porting a fix to my older platforms, which I see as unlikely, I am quite prepared to replace 10 - 15 year old switches when an NDP flooding fix is among the benefits I receive. I really hope my 0 to 5 year old switches all get back-ported fixes, or I'll be pretty displeased.
Yes, these all involve a certain amount of changing some expected behaviors, but, those changes could probably be easily accommodated in most environments.
This can be fixed without changing *any* behavior at all from the host's perspective. We just need the knobs. To get that, we need people like you to stop telling people this isn't a problem, and to start telling your vendors that "the sky is falling," and asking for some specific fix that you think is practical, or a fix in general, if you don't think you have a truly practical idea.
Finally, the bottom line is that a rogue host behind your firewall is probably going to cause other forms of damage well before it runs you out of ND entries and any time you have such a thing, it's going to be pretty vital to identify and remove it as fast as possible anyway.
I've seen this argument before, too. We all have. It doesn't hold water. Once again, you "Fundamentalists" think that if there is any case where a fix might not be helpful, or you can distract attention from this issue to one of host security, you try to do so. Let me give you the case I care about: Script kiddie hacks one server in a multi-use hosting datacenter, which is served by a layer-3 switch aggregating hundreds of customers. Script kiddie decides to DoS someone from his newly-hacked server, and uses random source addresses within the configured /64. Maybe he intends to DoS the upstream aggregation switch, or maybe it doesn't even occur to him. Either way, my NDP table immediately becomes full (even with only a few hundred PPS.) Are any other customers affected? Yes, potentially all the customers on this layer-3 switch are affected. Definitely all the customers on this VLAN/subnet are affected, even with the Cisco knob (which is better than all VLANs/subnets breaking.) Now replace the "some script kiddie" scenario with something that's simply misconfigured or buggy. You don't even have to be compromised.
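To put rough numbers on this scenario, here is a minimal Python sketch. The table capacity and aging timer are assumed figures chosen for illustration, not the specs of any particular switch, and the model assumes every packet from a fresh address creates exactly one entry that lives until the timer expires. The 2001:db8:: prefixes are documentation addresses, and the /120 stands in for the smaller subnets I mentioned earlier.

```python
import ipaddress

TABLE_ENTRIES = 8192      # assumed ND table capacity (hypothetical)
AGING_SECONDS = 4 * 3600  # assumed entry aging timer (hypothetical)

def max_entries(prefix: str) -> int:
    """Upper bound on distinct neighbor entries one subnet can generate."""
    return ipaddress.ip_network(prefix).num_addresses

def seconds_to_fill(table_entries: int, pps: float) -> float:
    """Time to fill the table if every packet creates one new entry."""
    return table_entries / pps

def sustaining_pps(table_entries: int, aging_seconds: float) -> float:
    """Rate of new entries needed to replace the ones that age out."""
    return table_entries / aging_seconds

# A /120 cannot produce more entries than it has addresses; a /64 can.
for prefix in ("2001:db8:0:1::/120", "2001:db8:0:1::/64"):
    overflow = max_entries(prefix) > TABLE_ENTRIES
    print(f"{prefix}: can overflow the table: {overflow}")

# How long a modest packet rate takes to fill the assumed table.
for pps in (100, 300, 1000):
    secs = seconds_to_fill(TABLE_ENTRIES, pps)
    print(f"{pps} pps fills {TABLE_ENTRIES} entries in {secs:.1f} s")

rate = sustaining_pps(TABLE_ENTRIES, AGING_SECONDS)
print(f"rate needed to keep it full: {rate:.2f} pps")
```

Under these assumptions the subnet size is what bounds the damage: the /120 tops out at 256 entries no matter what the host sends, while the /64 has no practical bound short of the table itself.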
I'm glad SLAAC is an option, but that's all it is, an option. /64 LANs must also be considered optional, and should be considered useful
They are entirely optional, but, IMHO, avoiding them at all costs such as you seem to be suggesting is unnecessarily painful in most environments.
"At all costs?" Again, more "IPv6 Fundamentalist" talk. What is the cost of configuring a /120 instead of a /64 on my LAN, if I already know I don't want SLAAC on this LAN? None. Might the subnet need to grow? I'll grant that, but it's a minimal cost compared to a DoS vulnerability which can be exploited trivially.
I'd settle for Cisco coming to the point of having RA guard universally available on all switch products. That, to me, is a much more pressing issue than this imagined ND exhaustion attack which, in reality, requires near DDOS levels of traffic for most networks to actually run the ND table meaningfully into overflow.
First, I agree, we need more knobs on all switching ports, period. Vendors are not delivering, and I'm not buying any more access switches than I absolutely have to. I have been putting off upgrades for *years* because of these things. Anytime clients ask me, "should we replace these old switches? They work but.. they're pretty old!" I say "no, wait until X." This is X.
Second, ND exhaustion attack is hardly imagined. Go do it on a box. Any box, pick one. They all break, period. The failure mode differs somewhat from one vendor to the next (when entries are evicted, etc.) but *every router* breaks today, period.
Third, I don't know what you mean by "near DDOS levels of traffic," but I already know that you are unfamiliar with common NDP table sizes. 4k - 8k is actually a pretty common range of supported NDP entries for modern layer-3 ToR switches, and I'm talking about 1-year-old, 40x10GbE switches from reputable vendors. As you might imagine, it only takes a few thousand packets to fill this up, and aging timers these days get up to several *hours* before entries are evicted. One packet per second is plenty. One PPS! Some DDoS, huh?
If this sounds like a "magic packet" issue to some, remember, it's not. This is not "ping of death" or "winnuke," and it's not smurf. It's the same thing that happens if you toss a /8 on an IPv4 LAN and start banging away at the ARP table, while expecting all of your legitimate hosts within that /8 to continue working correctly. We all know that's crazy, right? How is it suddenly less crazy to put an even larger subnet on an IPv6 LAN without gaining any direct benefits from doing so?
Remember, many LANs don't need SLAAC. The VPS farm sure doesn't. The router point-to-point doesn't. Any person who would tell you to configure a /64 for those LANs is an "IPv6 Fundamentalist."
--
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts