Re: Internet Edge Router replacement - IPv6 route table size considerations
have you looked into juniper networks?

----- Reply message -----
From: "Chris Enger" <chrise@ci.hillsboro.or.us>
Date: Tue, Mar 8, 2011 5:15 pm
Subject: Internet Edge Router replacement - IPv6 route table size considerations
To: "'nanog@nanog.org'" <nanog@nanog.org>

Greetings,

I am researching possible replacements for our Internet edge routers, and wanted to see what people could recommend for a smaller chassis or fixed router that can handle current IPv4 routes and transition into IPv6. Currently we have Brocade NetIron 4802s pulling full IPv4 routes plus a default route.

I've looked at Extreme, Brocade, Cisco, and a few others. Most range from 256k - 500k IPv4 and 4k - 16k IPv6 routes when CAM space is allocated for both. The only exception I've found so far is the Cisco ASR 1002, which can do 125k v6 along with 500k v4 routes at once. I'm curious if any other vendors have comparable products.

My concern is trying to find a router (within our budget) that has room for growth in the IPv6 routing space. When compared to the live table sizes that the CIDR Report and Route Views show, some can't handle current routing tables, let alone years of growth. BGP tweaks may keep us going, but I can't see how 16k or fewer IPv6 routes on a router is going to be viable a few years from now.

Thank you,
Chris Enger
I did look at a Juniper J6350, and the documentation states it can handle 400k routes with 1GB of memory, or 1 million with 2GB. However, it doesn't spell out how that is divvied up between the two, whether based on a profile setting or some other mechanism.

Chris
On 09/03/11 11:57, Chris Enger wrote:
> I did look at a Juniper J6350, and the documentation states it can handle 400k routes with 1GB of memory, or 1 million with 2GB. However, it doesn't spell out how that is divvied up between the two, whether based on a profile setting or some other mechanism.

It's a software router, so the short answer is "it isn't".

With 3GB of RAM both a J4350 and a J6350 can easily handle multiple IPv4 feeds and an IPv6 feed (3GB just happens to be what I have, due to upgrading from 1GB by adding a pair of 1GB sticks).

If you need more than ~500Mbit or so then you would want something bigger. The MX80 is nice and has some cheap bundles at the moment; it's specced for 8M routes (unspecified, but the way Juniper chips typically store routes, there's less difference in size than the straight 4x).

From others, the Cisco ASR1k or Brocade NetIron XMR (2M routes IIRC) are the obvious choices.
On 09/03/11 12:08, Julien Goodwin wrote:
> From others, the Cisco ASR1k or Brocade NetIron XMR (2M routes IIRC) are the obvious choices.

And I meant Brocade NetIron CES here.
Our Brocade reps pointed us to the CER 2000 series, and they can do up to 512k v4 or up to 128k v6. With other Brocade products they spell out the CAM profiles that are available; however, I haven't found specifics for the CER series.

Chris
On Tue, Mar 8, 2011 at 5:18 PM, Chris Enger wrote:

> Our Brocade reps pointed us to the CER 2000 series, and they can do up to 512k v4 or up to 128k v6. With other Brocade products they spell out the CAM profiles that are available; however, I haven't found specifics for the CER series.

CER features are here: http://www.brocade.com/products/all/routers/product-details/netiron-cer-2000...
We use both NI-CERs and NI-XMRs for less than 175k. Work with a rep; don't go by list. The price depends on quantity and configuration, so "less than 175k" could mean 80k or 500k for your config.

-Bret

Sent from my iPhone

On Mar 8, 2011, at 6:59 PM, George Bonser <gbonser@seven.com> wrote:
> CER features are here:
> http://www.brocade.com/products/all/routers/product-details/netiron-cer-2000...
Get a cheap J series, load it full of memory, forget about it. If you haven't played with Juniper gear before, you will be quite pleased.

-Jack Carrozzo
MX80 is perfect for this.. 5g 10g bundles are cheap..
But even one of the small MX80 bundles is about the price of 5 J4350s or 3 J6350s. Granted, if you need the throughput, it is very difficult to beat an MX80, particularly one of the 5g or 10g bundles.

-Randy
I think this is the point where I get a shovel, a bullwhip, and head over to the horse graveyard that is CAM optimization...

-C
On Mar 9, 2011, at 6:11 PM, Chris Woodfield <rekoil@semihuman.com> wrote:
> I think this is the point where I get a shovel, a bullwhip, and head over to the horse graveyard that is CAM optimization...
>
> -C
Well, it really isn't so bad. With Brocade FPGA gear you can change how much CAM is allocated to different functions (but you can't do it on the fly; it takes a reboot). I don't think these profiles are available for the CER series, though. The XMR or MLX can be reconfigured.

The thing is that the XMR and MLX are not ASIC-based devices; they are FPGA-based, which means the hardware can be re-wired with a code change. Personally, I like to be able to reallocate CAM from features I am not using to features that I am using.

And to be fair, Brocade has been improving over the past couple of years. Now if only we could route layer 3 on MCT VLANs... (MCT is sort of like Arista MLAG, but it is layer 2 only at this point).
On Wed, Mar 09, 2011 at 07:00:57PM -0800, George Bonser wrote:
> And to be fair, Brocade has been improving over the past couple of years.
My experience with Foundry/Brocade, which is recent, is only with the FCX devices, and I wish I had gone with something else. No SNMP stats for virtual vlan interfaces, and when asking Brocade about it, you get told "it is too hard to program". You gotta be kiddin me...

Or how they do vlan configurations. Or how an FCX stack will crash when you do jumbo frames.

--
Regards, Ulf.

---------------------------------------------------------------------
Ulf Zimmermann, 1525 Pacific Ave., Alameda, CA-94501, #: 510-865-0204
You can find my resume at: http://www.Alameda.net/~ulf/resume.html
> No SNMP stats for virtual vlan interfaces, and when asking Brocade about it, you get told "it is too hard to program". You gotta be kiddin me...
Yeah, that is something that has been bugging me. No stats on ve interfaces.
> Or how they do vlan configurations.
I have complained about that, too. With Cisco you add vlans to ports; with Brocade you add ports to vlans. Subtle difference. You can't look at the config and very easily see which vlans are on which ports; you have to do something like:

show vlan e 1/1/1

and parse through the output.
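To make the difference concrete, here is roughly what the same tagged port carrying two VLANs looks like in each CLI (a sketch from memory; the VLAN numbers and port names are made up, and exact syntax varies by platform and release):

    ! Cisco: the VLAN list lives under the port
    interface GigabitEthernet0/1
     switchport mode trunk
     switchport trunk allowed vlan 10,20

    ! Brocade: the port is listed under each VLAN
    vlan 10
     tagged ethernet 1/1/1
    vlan 20
     tagged ethernet 1/1/1

So to answer "what is on port 1/1/1?" on the Brocade, you end up walking every VLAN stanza, which is why you're stuck parsing "show vlan e 1/1/1" output.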
> Or how an FCX stack will crash when you do jumbo frames.
I have been running jumbo frames with stacked FCX units, no problems so far. Running 7.2.00
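For what it's worth, enabling jumbo support on FastIron is a single global knob, something like this if memory serves (the exact command may vary by release, and it only takes effect after a reload, which is presumably where the stacking trouble comes in):

    jumbo
    write memory
    reload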
>> Or how they do vlan configurations.
>
> I have complained about that, too. With Cisco you add vlans to ports; with Brocade you add ports to vlans. Subtle difference. You can't look at the config and very easily see which vlans are on which ports; you have to do something like:
Extreme does the same. It has the great advantage that a trunk port doesn't magically allow all VLANs - which is an absolutely horrible default for Cisco in the SP case.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no
On Thu, Mar 10, 2011 at 08:30:17AM +0100, sthaug@nethelp.no wrote:
> Extreme does the same. It has the great advantage that a trunk port doesn't magically allow all VLANs - which is an absolutely horrible default for Cisco in the SP case.
I can agree with not allowing all VLANs by default, but Brocade's way is just broken imho.

--
Regards, Ulf.
On Wed, Mar 09, 2011 at 08:52:54PM -0800, George Bonser wrote:
>> Or how an FCX stack will crash when you do jumbo frames.
>
> I have been running jumbo frames with stacked FCX units, no problems so far. Running 7.2.00
This is with code 07.2.00aT7f3. Had two units stacked together, rebooted/power cycled at least once, and it worked. Next time we had to power cycle due to a bad config apply, the second unit came back, and as soon as it would join the stack, it crashed. Brocade wanted us to remove it from the stack (remotely) and/or disable jumbo frames.

--
Regards, Ulf.
On Wed, Mar 9, 2011 at 9:11 PM, Chris Woodfield <rekoil@semihuman.com> wrote:
> I think this is the point where I get a shovel, a bullwhip, and head over to the horse graveyard that is CAM optimization...
The classic problem with any sort of FIB optimization is that you can't optimize every figure on the spec sheet at once, at least not without telling lies to your customers! You can have more compact structures which require more memory accesses and clock cycles to perform look-ups, or you can have bigger structures which improve look-up speed at the expense of memory footprint.

Since the market is pretty much used to everything being advertised as "wire speed" now, in order to continue doing look-ups at wire speed with an ever-increasing number of routes in the FIB and with entries having longer bit masks, you need more silicon -- more parallel look-up capability, faster (or parallel) memory, or "optimizations" which may not maintain wire speed for all use cases (cache, interleaving, etc.)

As the guy making purchasing decisions, I really care about one thing: correct information on the spec sheet. You may have noticed that some recent spec sheets from Cisco include little asterisks about the number of routes which will fit in the FIB being based on "prefix length distribution," which means, in effect, that such "optimizations" are in effect and the box should perform at a guaranteed forwarding speed by sacrificing a guaranteed number of possible routes in FIB.

Relating to IPv6 forwarding in particular, this produces an interesting problem when deploying the network: the IPv6 NDP table exhaustion issue. Some folks think it's a red herring; I obviously strongly disagree, and point to Cisco's knob, which Cisco will gladly tell you only allows you to control the failure mode of your box (not prevent subnets/interfaces from breaking), as evidence. (I am not aware of any other vendors who have even added knobs for this.)

If you configure a /64, you are much more likely to have guaranteed forwarding speed to that destination, and a guaranteed number of routes in FIB. What you don't have is a guarantee that ARP/NDP will work correctly on the access router. If you choose to configure a /120, you may lose one or both of the first guarantees. The currently-available compromise is to configure a /120 on the access device and summarize to a /64 (or shorter) towards your aggregation/core. I see nothing wrong with this, since I allocate a /64 even if I only configure a /120 within it, and this is one of the driving reasons behind that decision (the other being a possible future solution to NDP table exhaustion, if one becomes practical.)

The number of people thinking about the "big picture" of IPv6 forwarding is shockingly small, and the lack of public discussion about these issues continues to concern me. I fear we are headed down a road where the first large IPv6 DDoS attacks will be a major wake-up call for operators and vendors. I don't intend to be one of the guys hurriedly redesigning my access layer as a result, but I'm pretty sure that many networks will be in exactly that situation.

--
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts
> If you configure a /64, you are much more likely to have guaranteed forwarding speed to that destination, and a guaranteed number of routes in FIB. What you don't have is a guarantee that ARP/NDP will work correctly on the access router. If you choose to configure a /120, you may lose one or both of the first guarantees. The currently-available compromise is to configure a /120 on the access device and summarize to a /64 (or shorter) towards your aggregation/core. I see nothing wrong with this, since I allocate a /64 even if I only configure a /120 within it, and this is one of the driving reasons behind that decision (the other being a possible future solution to NDP table exhaustion, if one becomes practical.)
What I have done on point to points and small subnets between routers is to simply make static neighbor entries. That eliminates any neighbor table exhaustion causing the desired neighbors to become unreachable. I also do the same with neighbors at public peering points. Yes, that comes at the cost of having to reconfigure the entry if a MAC address changes, but that doesn't happen often.
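On IOS, for instance, a static entry is one line per neighbor, roughly like this (the address and MAC below are made-up examples, not anything real):

    ipv6 neighbor 2001:DB8:0:12::1 GigabitEthernet0/0 0000.0c12.3456

The entry never ages out, and, as noted, it has to be updated by hand if the far side's MAC changes.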
On Thu, Mar 10, 2011 at 10:52:37AM -0800, George Bonser wrote:
> What I have done on point to points and small subnets between routers is to simply make static neighbor entries. That eliminates any neighbor table exhaustion causing the desired neighbors to become unreachable. I also do the same with neighbors at public peering points. Yes, that comes at the cost of having to reconfigure the entry if a MAC address changes, but that doesn't happen often.
And this is better than just not trying to implement IPv6 stateless auto-configuration on ptp links in the first place how exactly? Don't get taken in by the people waving an RFC around without actually taking the time to do a little critical thinking on their own first; /64s and auto-configuration just don't belong on router ptp links.

And btw, only a handful of routers are so poorly designed that they depend on not having subnets longer than /64s when doing IPv6 lookups, and there are many other good reasons why you should just not be using those boxes in the first place. :)

--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
On Thu, 10 Mar 2011, Richard A Steenbergen wrote:
> And this is better than just not trying to implement IPv6 stateless auto-configuration on ptp links in the first place how exactly? Don't get taken in by the people waving an RFC around without actually taking the time to do a little critical thinking on their own first; /64s and auto-configuration just don't belong on router ptp links. And btw, only a handful of routers are so poorly designed that they depend on not having subnets longer than /64s when doing IPv6 lookups, and there are many other good reasons why you should just not be using those boxes in the first place. :)
+1

Auto-config has its place, and I don't think core infrastructure is one of those places. In our addressing plan, I've allocated /64s for each point-to-point link, but will use /127s in practice. That seemed like the best compromise between throwing /64s at everything and being prepared for the off-chance that something absolutely requires a /64.

jms
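P.S. A rough sketch of the allocate-a-/64-configure-a-/127 approach, using documentation addresses rather than anything from our actual plan (syntax from memory, so treat it as illustrative):

    ! reserve 2001:db8:0:12::/64 for the link, but configure only a /127 of it
    interface GigabitEthernet0/0
     ipv6 address 2001:DB8:0:12::/127

    ! and on the far end:
    interface GigabitEthernet0/0
     ipv6 address 2001:DB8:0:12::1/127

If something ever genuinely needs the /64, the subnet can grow in place without renumbering anything.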
Is anyone staying away from certain address ranges in /127s? I have seen where they say not to use the all-zeros or end addresses from 1 - 127. Thoughts on this?

-Mike
On Mar 10, 2011, at 11:12 AM, Richard A Steenbergen wrote:
> And this is better than just not trying to implement IPv6 stateless auto-configuration on ptp links in the first place how exactly? Don't get taken in by the people waving an RFC around without actually taking the time to do a little critical thinking on their own first; /64s and auto-configuration just don't belong on router ptp links. And btw, only a handful of routers are so poorly designed that they depend on not having subnets longer than /64s when doing IPv6 lookups, and there are many other good reasons why you should just not be using those boxes in the first place. :)
I agree that SLAAC doesn't belong on PTP links, but I fail to see why having /64s on them is problematic if you take proper precautions.

Owen
> And this is better than just not trying to implement IPv6 stateless auto-configuration on ptp links in the first place how exactly?
I don't use autoconfiguration. Statically configured IPs, static neighbor entries for these types of links.
On Thu, 10 Mar 2011, George Bonser wrote:
>> And this is better than just not trying to implement IPv6 stateless auto-configuration on ptp links in the first place how exactly?
>
> I don't use autoconfiguration. Statically configured IPs, static neighbor entries for these types of links.
Man. It must be annoying to change all those static neighbor entries when interfaces fail, links must be migrated to other line cards, or you replace routers.

--
Pekka Savola, Netcore Oy
Systems. Networks. Security.
"You each name yourselves king, yet the kingdom bleeds." -- George R.R. Martin: A Clash of Kings
> Man. It must be annoying to change all those static neighbor entries when interfaces fail, links must be migrated to other line cards, or you replace routers.
Yeah, that happens about once every five years or so and in my particular case there aren't a lot of these point to point links, maybe a dozen. So what works for me wouldn't necessarily be a global truth.
On Thu, Mar 10, 2011 at 1:52 PM, George Bonser <gbonser@seven.com> wrote:
> What I have done on point to points and small subnets between routers is to simply make static neighbor entries. That eliminates any neighbor table exhaustion causing the desired neighbors to become unreachable. I also do the same with neighbors at public peering points. Yes, that comes at the cost of having to reconfigure the entry if a MAC address changes, but that doesn't happen often.
I wouldn't bet on the router evicting a maliciously-learned dynamic NDP entry to install a static NDP entry when an interface flaps up, and if it doesn't, I wouldn't bet on that static NDP entry ever being installed until the interface flaps again. Remember, there are several possible attack methods here, one of which is a compromised or badly broken box on a connected LAN.

As Richard points out, there is *no* reason to configure /64s on point-to-point links, and there are obvious disadvantages. The "RFC wavers" are downright stupid to suggest otherwise.

As for IXP LANs, I predict that one of two things will happen: either one or more major IXPs will be subject to NDP DoS and will decide to shrink their subnet size, allowing others to follow suit; or vendors will make NDP inspection work and be configurable enough to prevent most problems. Again, Cisco has already added a knob to some platforms which allows you to steer the failure mode. Interfaces will fail regardless of what you do; the Cisco knob just lets you decide to break NDP on only the interface(s) subject to attack instead of on the entire box. In any case, I don't judge static NDP entries on IXP LANs to be a practical long-term solution. There are obvious disadvantages to that.

If your network is entirely made up of backbone routers with fairly static neighbors, your strategy can certainly work with a bit of extra effort and a vendor box that doesn't do entirely crazy things. If you have customers (those pesky customers!) they may not be so comfortable having to open a ticket and feel like they are troubleshooting a problem you've caused them because you have configured a static NDP entry facing them. If you have hosting customers with servers attached to VLANs, especially in a VPS environment where IP/MAC associations may routinely change ... good luck with those static NDP entries.

Obviously, some folks will continue to cite "standards" for something which was developed in 1997 and still isn't really working, or claim their own "fix" works, until they get actual IPv6 customers. Those folks are probably choosing to redesign their access layer in the future, *AFTER* they already have customers.

I have been talking to smart people about this problem for nearly ten years, and I have never heard a practical solution that doesn't involve some kind of persistent "sticky NDP" which refuses to make discovery requests to the LAN for addresses which have never been seen from the LAN. I've also never seen a practical idea for preventing malicious hosts on the LAN from filling the table that doesn't involve NDP inspection at the access port, some kind of database (e.g. RADIUS/etc) or additional configuration in the router, or proposals that would simply change the failure mode (e.g. rate-limit knobs.)

You'll notice that there have been several discussions about this on NANOG and other mailing lists, most of which include some "RFC wavers," some people saying "the sky is falling," and some guys in-between who think their vendor's box will fail gracefully or that NDP learning not functioning for some or all interfaces is not bad as long as the box doesn't evict busy entries. I suggest all the folks in the middle ask themselves why Cisco added a knob *just to control the failure mode.* This is a real problem, and the current, practical fix is simply to not configure /64.
> As Richard points out, there is *no* reason to configure /64s on point-to-point links, and there are obvious disadvantages. The "RFC wavers" are downright stupid to suggest otherwise.
>
> As for IXP LANs, I predict that one of two things will happen: either one or more major IXPs will be subject to NDP DoS and will decide to shrink their subnet size, allowing others to follow suit; or vendors will make NDP inspection work and be configurable enough to prevent most problems. Again, Cisco has already added a knob to some platforms which allows you to steer the failure mode. Interfaces will fail regardless of what you do; the Cisco knob just lets you decide to break NDP on only the interface(s) subject to attack instead of on the entire box. In any case, I don't judge static NDP entries on IXP LANs to be a practical long-term solution. There are obvious disadvantages to that.
And I say making them /127s may not really make any difference. Say you make all of those /127s, at some point you *are* going to have a network someplace that is a /64 that has hosts on it and that one is just as subject to such an attack. If you are a content provider, it doesn't make any difference if they take down the links between your routers or if they take down the link that your content farm is on. The end result is the same. You have managed to re-arrange the deck chairs. Have another squeeze at that water balloon. If you are a service provider where practically all of your links are point to points, sure.
> If your network is entirely made up of backbone routers with fairly static neighbors, your strategy can certainly work with a bit of extra effort and a vendor box that doesn't do entirely crazy things.
Where that is done is primarily on a backbone section of the network and where I connect to public peering points. I add static entries for the specific peers I communicate with. Yes, it does take a little maintenance when a peer changes out some gear or moves to a different port on their gear, but that doesn't happen all that often and is a compromise I am willing to make in exchange for some added protection. It also protects against someone who I am not peering with on that same switch fat-fingering an IP address during some maintenance and accidentally configuring their gear with the same IP as someone I *am* peering with. I won't even see it if I have a static neighbor (the same thing is done with v4 on public peering switches with static ARP entries, too).
> If you have customers (those pesky customers!) they may not be so comfortable having to open a ticket and feel like they are troubleshooting a problem you've caused them because you have configured a static NDP entry facing them.
Right, if I had dozens of point-to-points with gear that is constantly changing at the other end, yeah, I agree, I might then consider that approach. But in this case I can only speak about my own stuff, and what is the best solution for one specific application might not be the best solution for someone else. I am not trying to say that this solution is best for everyone; I am simply pointing out a solution that others might find useful depending on their application.
> I have been talking to smart people about this problem for nearly ten years, and I have never heard a practical solution that doesn't involve some kind of persistent "sticky NDP" which refuses to make discovery requests to the LAN for addresses which have never been seen from the LAN. I've also never seen a practical idea for preventing malicious hosts on the LAN from filling the table that doesn't involve NDP inspection at the access port, some kind of database (e.g. RADIUS/etc) or additional configuration in the router, or proposals that would simply change the failure mode (e.g. rate-limit knobs.)
Yeah, it's a tough nut to crack. I do agree that 64-bit host addressing is just too big. The reason is that it even allows you to configure more IPs on a single subnet legitimately than the network gear can handle. The notion of "we are going to give you a subnet with 8 bazillion possible addresses, but don't try to use more than half a million of them or you will melt your network gear" seems quite idiotic, actually. So you have a huge address space that you actually can't use much of (relative to the size of the space), and it creates a stability risk.

We didn't need much more host addressing; we needed more subnet addressing. I would have settled for 16 bits of host addressing and 112 bits of subnet addressing, and I suppose nothing prevents me from doing that except that none of the standard IPv6 automatic stuff would work anymore.
> You'll notice that there have been several discussions about this on NANOG and other mailing lists, most of which include some "RFC wavers," some people saying "the sky is falling," and some guys in-between who think their vendor's box will fail gracefully or that NDP learning not functioning for some or all interfaces is not bad as long as the box doesn't evict busy entries. I suggest all the folks in the middle ask themselves why Cisco added a knob *just to control the failure mode.* This is a real problem, and the current, practical fix is simply to not configure /64.
And again, are you talking about all the way down to the host subnet level? I suppose I could configure server farms in /112 or even /96 (/96 has some appeal for other reasons mostly having to do with multicast) but then I would wonder how many bugs that would flush out of OS v6 stacks.

Something will have to evolve to handle this problem because I agree with you, it is going to be a major issue at some point. It might just be as simple as a host doing what amounts to a gratuitous announcement when an IP is assigned to an interface (and some kind of withdrawal announcement when the address is removed). If a packet arrived from an interface off the host's subnet for an IP address that is not in the neighbor table at all (even stale ones), then maybe the router simply drops the packet. Leave it up to the hosts themselves to keep their information current on the network even when they aren't passing traffic.

That doesn't protect against rogue hosts but there might be ways around that, too, or at least limiting the damage a rogue host can cause. So any packet arriving on a router for a local subnet, if that address isn't already in the neighbor table, drop it on the floor or return an "unreachable" or whatever. Something will have to be done at some point ... soon.
On Mar 11, 2011, at 10:51 AM, George Bonser wrote:
> If you are a content provider, it doesn't make any difference if they take down the links between your routers or if they take down the link that your content farm is on.
Of course, it does - you may have many content farms/instances, and taking down point-to-point links can DoS your entire set of farms/instances, whereas an attack against a given endpoint access network doesn't necessarily mean that your other properties/networks/services are being attacked, as well.

Limiting this vector to endpoint access networks also makes mitigation mechanisms far more practicable.

There is no good reason to use /64s on point-to-point links. It is wasteful (please, no more about the supposed infinitude of IPv6 addresses; some of us reject this as being shortsighted and insufficiently visionary concerning eventual one-time-uses of IPv6 addresses at nanoscale) and turns your routers into sinkholes. It is a Very Bad Idea.

;>

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

The basis of optimism is sheer terror. -- Oscar Wilde
> Of course, it does - you may have many content farms/instances, and taking down point-to-point links can DoS your entire set of farms/instances, whereas an attack against a given endpoint access network doesn't necessarily mean that your other properties/networks/services are being attacked, as well.
And I say taking down 10 such farms is no bigger problem than taking down 10 /64 backbone links. Same challenge. A /64 is a /64, seen one you've seen them all.
> There is no good reason to use /64s on point-to-point links. It is wasteful (please, no more about the supposed infinitude of IPv6 addresses; some of us reject this as being shortsighted and insufficiently visionary concerning eventual one-time-uses of IPv6 addresses at nanoscale) and turns your routers into sinkholes. It is a Very Bad Idea.
I wouldn't say it is wasteful so much as it is unnecessary, but the difference is that everything is pretty much known to work as expected with a /64 subnet. Anything broken with a /64 is really broken, and the vendor would be expected to get right on it. If something breaks while using a /127, the doctor might tell you to stop sticking the spoon in your eye.
On Mar 11, 2011, at 11:34 AM, George Bonser wrote:
> And I say taking down 10 such farms is no bigger problem than taking down 10 /64 backbone links.
Yes, but the difference is in routine attacker behavior. And of course, iACLs should be protecting p2p links and loopbacks, irrespective of CIDR length, anyways.
> If something breaks while using a /127, the doctor might tell you to stop sticking the spoon in your eye.
If vendors are somehow optimizing for or restricting functionality to certain CIDR lengths, they should stop this immediately. Features and functionality should work the same, irrespective of CIDR length.
On Mar 10, 2011, at 8:00 PM, Dobbins, Roland wrote:
>> If you are a content provider, it doesn't make any difference if they take down the links between your routers or if they take down the link that your content farm is on.
>
> Of course, it does - you may have many content farms/instances, and taking down point-to-point links can DoS your entire set of farms/instances, whereas an attack against a given endpoint access network doesn't necessarily mean that your other properties/networks/services are being attacked, as well.
How is an attack against all your content farms in any way MORE difficult than an attack against enough point-to-point links to take everything out? If you've designed things properly, it takes more PtoP links to DoS the complete set than it does endpoint networks.
> Limiting this vector to endpoint access networks also makes mitigation mechanisms far more practicable.
It's actually pretty easy to eliminate it 100% from the PtoP links even if they are /64s by simply not allowing traffic to the PtoP addresses other than from selected sources (NOC/admin network, required peers, etc.). If you want to be truly anal about it, you can also block packets to non-existent addresses on the PtoP links.
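A minimal IOS-style sketch of that kind of filter, assuming (purely for illustration) that all the PtoP links are numbered from 2001:db8:ffff::/48 and the NOC lives in 2001:db8:1234::/48:

    ipv6 access-list protect-ptp
     remark NOC/admin may reach the link addresses
     permit ipv6 2001:DB8:1234::/48 2001:DB8:FFFF::/48
     remark nobody else gets to talk to the point-to-point block
     deny ipv6 any 2001:DB8:FFFF::/48
     remark transit traffic is unaffected
     permit ipv6 any any

applied inbound on the edge interfaces with "ipv6 traffic-filter protect-ptp in".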
> There is no good reason to use /64s on point-to-point links. It is wasteful (please, no more about the supposed infinitude of IPv6 addresses; some of us reject this as being shortsighted and insufficiently visionary concerning eventual one-time-uses of IPv6 addresses at nanoscale) and turns your routers into sinkholes. It is a Very Bad Idea.
This isn't a one-time-use of IPv6 addresses, and the one-time-uses of IPv6 addresses are what should be considered unscalable and absurdly wasteful. There's a lot to be said for the principle of least surprise, and uniform /64s actually help with that quite a bit.

Frankly, unless you have parallel links, there isn't a definite need to even number PtoP links for IPv6. Everything you need to do with an interface-specific address on a PtoP link can be done with link local.

Owen
On Mar 11, 2011, at 2:02 PM, Owen DeLong wrote:
> If you want to be truly anal about it, you can also block packets to non-existent addresses on the PtoP links.
Sure, I advocate iACLs to block traffic to p2p links and loopbacks. Still, it's best not to turn routers into sinkholes in the first place.
> This isn't a one-time-use of IPv6 addresses and the one-time-uses of IPv6 addresses are what should be considered unscalable and absurdly wasteful.
I don't know that I agree with this - I can see lots of value in one-time-use addresses/blocks, and have a metaphysical degree of certitude that they'll be used that way in some cases, irrespective of what I think.
> There's a lot to be said for the principle of least surprise and uniform /64s actually help with that quite a bit.
Enforcing uniformity of wasteful and potentially harmful addressing practices in the name of consistency isn't necessarily a win, IMHO. ;>
> Frankly, unless you have parallel links, there isn't a definite need to even number PtoP links for IPv6. Everything you need to do with an interface-specific address on a PtoP link can be done with link local.
Which is why IP unnumbered caught on so well in IPv4-land, heh?
On Mar 10, 2011, at 11:22 PM, Dobbins, Roland wrote:
> On Mar 11, 2011, at 2:02 PM, Owen DeLong wrote:
>
>> If you want to be truly anal about it, you can also block packets to non-existent addresses on the PtoP links.
>
> Sure, I advocate iACLs to block traffic to p2p links and loopbacks. Still, it's best not to turn routers into sinkholes in the first place.
>
>> This isn't a one-time-use of IPv6 addresses and the one-time-uses of IPv6 addresses are what should be considered unscalable and absurdly wasteful.
>
> I don't know that I agree with this - I can see lots of value in one-time-use addresses/blocks, and have a metaphysical degree of certitude that they'll be used that way in some cases, irrespective of what I think.
If so, hopefully from a tiny and limited range. fc00::/7 sounds good to me. It has few other useful purposes in life.
>> There's a lot to be said for the principle of least surprise and uniform /64s actually help with that quite a bit.
>
> Enforcing uniformity of wasteful and potentially harmful addressing practices in the name of consistency isn't necessarily a win, IMHO.
We can agree to disagree. I don't think it's so wasteful and it's what the bits were put there to do. Perverting them to other uses and then complaining that the legitimate uses are getting in the way, OTOH, well...
>> Frankly, unless you have parallel links, there isn't a definite need to even number PtoP links for IPv6. Everything you need to do with an interface-specific address on a PtoP link can be done with link local.
>
> Which is why IP unnumbered caught on so well in IPv4-land, heh?
There's a HUGE difference between IP unnumbered and link-local.

Frankly, absent parallel links, there was a lot to be said for IP unnumbered, and I think that if people had better understood the implications of where and when it was a good vs. bad idea and tied it properly to loopbacks instead of $RANDOM_INTERFACE, it might have caught on better.

Owen
On Mar 11, 2011, at 2:33 PM, Owen DeLong wrote:
> There's a HUGE difference between IP unnumbered and link-local.
In all honesty, at the macro level, I don't see it; if you wouldn't mind elaborating on this, I would certainly find it useful.
At 01:33 AM 3/11/2011, Owen DeLong wrote:
> Frankly, unless you have parallel links, there isn't a definite need to even number PtoP links for IPv6. Everything you need to do with an interface-specific address on a PtoP link can be done with link local.
Is anyone else considering only using link local for their PtoP links? I realized while deploying our IPv6 infrastructure that OSPFv3 uses the link-local address in the routing table rather than the global address, so if I want to have a routing table which makes sense, I need to statically assign a global address AND the link-local address. Then I realized, why even assign a global in the first place? Traceroute replies end up using the loopback. BGP will use loopbacks. So is there any obvious harm in this approach that I'm missing?

-James
On Fri, Mar 11, 2011 at 1:55 PM, James Stahr <stahr@mailbag.com> wrote:
> Is anyone else considering only using link local for their PtoP links? I realized while deploying our IPv6 infrastructure that OSPFv3 uses the link-local address in the routing table rather than the global address, so if I want to have a routing table which makes sense, I need to statically assign a global address AND the link-local address. Then I realized, why even assign a global in the first place? Traceroute replies end up using the loopback. BGP will use loopbacks. So is there any obvious harm in this approach that I'm missing?
For now I have allocated /64s per p-t-p, but I'm doing "ipv6 unnumbered loopback0". I quite like how the core route table looks. It also lets me avoid "The Point to Point Wars" :-)

Maybe there will be a good reason to go back and slap globals on there, but I've not been convinced yet.

-- Tim:>
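P.S. For anyone wanting to try it, the IOS-style version looks roughly like this (a sketch; the process number, loopback address, and interface names are illustrative):

    interface Loopback0
     ipv6 address 2001:DB8::1/128
     ipv6 ospf 1 area 0
    !
    interface GigabitEthernet0/0
     ipv6 enable
     ipv6 unnumbered Loopback0
     ipv6 ospf 1 area 0
     ipv6 ospf network point-to-point

"ipv6 enable" brings up only the link-local address, OSPFv3 forms its adjacency over that, and the loopback supplies a global source address for anything that needs one.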
On Fri, Mar 11, 2011 at 12:55:33PM -0600, James Stahr wrote:
> Then I realized, why even assign a global in the first place? Traceroute replies end up using the loopback. BGP will use loopbacks. So is there any obvious harm in this approach that I'm missing?
Traceroute replies most assuredly do NOT use loopbacks on most networks, and it would make troubleshooting massively more difficult if this was the only option. Imagine any kind of complex network where there is more than one link between a pair of routers (and don't just picture your own internal network, but imagine customers connecting to their ISPs as well), and now tell me how you plan on identifying a particular link with a traceroute. The two words that best sum this up would be "epic disaster".
On Thu, Mar 10, 2011 at 10:51 PM, George Bonser <gbonser@seven.com> wrote:
> And I say making them /127s may not really make any difference. Say you make all of those /127s, at some point you *are* going to have a network someplace that is a /64 that has hosts on it and that one is just as subject to such an attack. If you are a content provider, it doesn't make any difference if they take down the links between your routers or if they take down the link that your content farm is on. The end result is the same. You have managed to re-arrange the deck chairs. Have another squeeze at that water balloon.
Again, this is the argument put forth by the "RFC wavers," that you can't solve the problem because you must want to configure /64s for ... what, exactly? Oh, right, SLAAC. More on that below.

If I'm a content provider, I don't have to configure a /64 for my content farm. I can configure a /120 or whatever subnet size is practical for my environment. I can also use link-local addressing on my content farm LANs and route subnets to my content boxes, if that is somehow more practical than using a smaller subnet.
> If you are a service provider where practically all of your links are point to points, sure.
No, you can avoid configuring /64s if you don't need SLAAC. Who needs SLAAC? I don't. It has absolutely no place in any of my environments. It seems to me that DHCPv6 will do everything which SLAAC does, and everything SLAAC forgot about. The "complexity" argument is pretty much indefensible when the trade-off is configuring DHCPv6 vs turning a bunch of router knobs and hoping no one ever targets your LANs with an NDP DoS.
> We didn't need much more host addressing; we needed more subnet addressing. I would have settled for 16 bits of host addressing and 112 bits of subnet addressing, and I suppose nothing prevents me from doing that except that none of the standard IPv6 automatic stuff would work anymore.
None of that "standard IPv6 automatic stuff" works today, anyway. The state of IPv6 support on end-user CPE generally ranges from non-existent to untested to verified-to-be-broken. This is the only place in your network where /64 can offer any value, and currently, CPE is just not there. Even the latest Cisco/Linksys CPE does not support IPv6. Sure, that'll change; but what won't change is the total lack of any basis for configuring /64 LANs for "content farms" or any similar non-end-user, non-dynamic segments.

I don't want 16 bits of host addressing. I want to choose an appropriate size for each subnet. Why? Because exactly zero of my access routers can handle 2**16 NDP entries, let alone 2**16 entries on multiple interfaces/VLANs. I would like to see much larger ARP/NDP tables in layer-3 switches, and I think vendors will deliver on that, but I doubt we'll soon see even a 10x increase from typical table sizes of today.

VPS farms are already pushing the envelope with IPv4, where customers are already used to conserving addresses. Guess what, customers may still have to conserve addresses with IPv6, not because the numbers themselves are precious, but because the number of ARP/NDP entries in the top-of-rack or distribution switch is finite.
> And again, are you talking about all the way down to the host subnet level? I suppose I could configure server farms in /112 or even /96 (/96 has some appeal for other reasons mostly having to do with multicast) but then I would wonder how many bugs that would flush out of OS v6 stacks.
I'm not getting reports of problems with smaller-than-/64 subnets from customers yet. Am I confident that I never will? No, absolutely not! Like almost everyone else, I have some customers who have configured IPv6, but the amount of production traffic on it remains trivial.

That is why I allocate a /64 but provision a /120 (or similar practical size.) I can grow the subnet if I have to. I do know that /64 LANs will cause me DoS problems, so I choose to work around that problem now. If /120 LANs cause me OS headaches in the future, I have the option to revise my choice with minimal effort (no services get renumbered, only the subnet must grow.)

Why would you suggest /96 as being more practical than /64 from the perspective of NDP DoS? Again, this is an example of the "in-between" folks in these arguments, who seem not to fully understand the problem. Your routers do not have room for 2**(128-96) NDP entries. Typical access switches today have room for a few thousand to perhaps 16k, and typical "bigger" switches are specifying figures below 100k. This doesn't approach the 4.3M addresses in a /96. In short, suggesting /96 is flat out stupid -- you get "the worst of both worlds," potential for OS compatibility issues, AND guaranteed NDP DoS vulnerability.
passing traffic. That doesn't protect against rogue hosts but there might be ways around that, too, or at least limiting the damage a rogue host can cause.
How do you suggest we limit the damage a rogue host can cause? A lot of people would like to hear your idea. Again, in nearly ten years of discussing this with colleagues, I have not seen any idea which is more practical than configuring a /120 instead of a /64. I have not seen any idea, period, which doesn't involve configuring the IPs which are allowed to be used on the LAN, either on the access switch port (NDP inspection), the access router, or in a database (like RADIUS.)

This kind of configuration complexity is exactly what SLAAC hoped to avoid. Unfortunately, the folks who thought that up did so at a time when most access routers were still routing in CPU and main memory, not ASIC. It's important to understand how the underlying technology has changed in the past 15+ years -- if you don't, you must think, "man, those IPv6 standards guys were idiots." They weren't idiots, but they didn't know how routing and switching hardware would evolve, certainly they did not know how long it would take IPv6 to be deployed (it still isn't, effectively), and they probably didn't consider the potential future where DDoS is rampant and virtually unchecked to be a good reason not to craft SLAAC into the standards.

They were also coming out of an era where CIDR/VLSM was still kinda new, and I believe there were *zero* boxes at that time which could "route" at wire speed, vs many boxes that would switch at wire speed, thus there were perceived advantages to having fewer, bigger subnets with no requirement for VLSM. Remember when there was no such thing as a layer-3 switch, and installing an RSM in your Cat5500 to get a little bit of routing capability was the greatest thing since sliced bread?

This is where IPv6 came from, and it is why we have these design problems today -- the standards people are ideologically married to ideas that made perfect sense in the mid-90s, but they forgot why those ideas made sense back then and don't understand why this is not practical today.

I'm glad SLAAC is an option, but that's all it is, an option. /64 LANs must also be considered optional, and should be considered useful only when SLAAC is desired.
Something will have to be done at some point ... soon.
I'm glad more people are coming around to this point of view. Cisco certainly is there. -- Jeff S Wheeler <jsw@inconcepts.biz> Sr Network Operator / Innovative Network Concepts
Jeff Wheeler wrote:
I'm glad SLAAC is an option, but that's all it is, an option. /64 LANs must also be considered optional, and should be considered useful only when SLAAC is desired.
That also could be optional; automatic host configuration does not actually require 64 bits, unless there is a naive assumption that DAD need not occur. It must always occur, and for many reasons.

rfc3927 does not require 64 bits and works sufficiently well wherever it is employed. SLAAC should be redesigned to be configurable to work with however many bits are available to it and it should be a standard feature to turn that knob all the way from on - off with 128 bit stops in between.

And now DHCP-PD can start at any point in the connected hierarchy, working with whatever amount of space is available and not requiring complete upstream support.

I don't accept that IPv6 is set in stone. IPv4 wasn't/isn't set in stone, and people were/are actually using it because they depend on it.

Joe
On Fri, 11 Mar 2011 09:38:12 EST, Joe Maimon said:
rfc3927 does not require 64 bits and works sufficiently well wherever it is employed. SLAAC should be redesigned to be configurable to work with however many bits are available to it and it should be a standard feature to turn that knob all the way from on - off with 128 bit stops in between.
Feel free to explain how SLAAC should work on a /96 with 32 bits of host address (or any amount smaller than the 48 bits most MAC addresses provide). Remember in your answer to deal with collisions. It's one thing to say "it should be redesigned". It's another matter entirely to actually come up with a scheme that doesn't suck even harder than "screw it, it's a /64".
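To put the collision question in numbers, here is a back-of-envelope birthday-problem calculation for randomly chosen interface IDs; the host counts are illustrative assumptions, not figures from the thread.

```python
# Birthday-problem odds that any two of `hosts` randomly chosen
# `id_bits`-bit interface IDs collide on one LAN.
import math

def p_collision(hosts, id_bits):
    space = 2 ** id_bits
    # exact product form, computed via logs for numerical stability
    log_p_unique = sum(math.log1p(-k / space) for k in range(hosts))
    return 1.0 - math.exp(log_p_unique)

for bits in (64, 48, 32, 16):
    print(f"{bits}-bit IIDs, 500 hosts: p ~ {p_collision(500, bits):.2e}")

# 32 bits still gives only about 3e-5 for a 500-host LAN: rare, but no
# longer "effectively never", which is why DAD plus a retry matters.
```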
Valdis.Kletnieks@vt.edu wrote:
On Fri, 11 Mar 2011 09:38:12 EST, Joe Maimon said:
rfc3927 does not require 64 bits and works sufficiently well wherever it is employed. SLAAC should be redesigned to be configurable to work with however many bits are available to it and it should be a standard feature to turn that knob all the way from on - off with 128 bit stops in between.
Feel free to explain how SLAAC should work on a /96 with 32 bits of host address (or any amount smaller than the 48 bits most MAC addresses provide). Remember in your answer to deal with collisions.
Is there something fundamentally wrong with rfc3927?
It's one thing to say "it should be redesigned". It's another matter entirely to actually come up with a scheme that doesn't suck even harder than "screw it, it's a /64".
I don't have to, it's already been done. In IPv4. Joe
In a message written on Fri, Mar 11, 2011 at 01:07:15PM -0500, Valdis.Kletnieks@vt.edu wrote:
On Fri, 11 Mar 2011 09:38:12 EST, Joe Maimon said:
rfc3927 does not require 64 bits and works sufficiently well wherever it is employed. SLAAC should be redesigned to be configurable to work with however many bits are available to it and it should be a standard feature to turn that knob all the way from on - off with 128 bit stops in between.
Feel free to explain how SLAAC should work on a /96 with 32 bits of host address (or any amount smaller than the 48 bits most MAC addresses provide). Remember in your answer to deal with collisions.
Well, I at least think an option should be a /80, using the 48 bits of MAC directly. This generates exactly the same collision potential as today we have with a /64 and an EUI-64 constructed from an EUI-48 ethernet address. The router is already sending RA's for SLAAC to work, sending along one of a well-known set of masks would be a relatively minor modification.

That said, ND has built into it DAD - Duplicate Address Detection. There is already an expectation that there will be collisions, and the protocols to detect them are already in place. I see little to no reason you couldn't use a different length subnet (like the /96 in your example), randomly select an address and do DAD to see if it is in use. Indeed, this is pretty much how AppleTalk back in the day worked (with a 16 bit number space). The probability of collision is pretty low, and the penalty/recovery (picking a new address and trying again) is rather quick and cheap.

If a service provider is going to end up giving me a /64 at home (I know, a whole different argument) I'd vastly prefer to use /80 or /96 subnets with either of these methods, and still be able to subnet the space. I suspect if /64's are given out one or both will come to be "standard". -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
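As a rough illustration of the two layouts being compared, a short Python sketch: the EUI-64 construction follows the standard RFC 4291 procedure, while the /80 direct-MAC layout is only the proposal above, not any standard. The prefixes and MAC address are made up.

```python
# Contrast: today's modified-EUI-64 IID in a /64 vs. the proposed
# direct 48-bit MAC in a /80. Illustrative values throughout.
import ipaddress

def eui64_iid(mac):
    """RFC 4291 modified EUI-64: flip the universal/local bit and
    insert 0xFFFE between the OUI and the NIC-specific bytes."""
    b = bytes.fromhex(mac.replace(":", ""))
    eui = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    return int.from_bytes(eui, "big")

mac = "00:1b:21:3c:4d:5e"

# Today: 64-bit prefix + 64-bit modified EUI-64.
p64 = int(ipaddress.ip_address("2001:db8:0:1::"))
addr64 = ipaddress.ip_address(p64 | eui64_iid(mac))

# Proposal: 80-bit prefix + the 48 MAC bits used directly.
p80 = int(ipaddress.ip_address("2001:db8:0:1:aaaa::"))
addr80 = ipaddress.ip_address(p80 | int(mac.replace(":", ""), 16))

print(addr64)  # 2001:db8:0:1:21b:21ff:fe3c:4d5e
print(addr80)  # 2001:db8:0:1:aaaa:1b:213c:4d5e
```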
In a message written on Fri, Mar 11, 2011 at 10:58:09AM -0800, Leo Bicknell wrote:
That said, ND has built into it DAD - Duplicate Address Detection. There is already an expectation that there will be collisions, and the protocols to detect them are already in place. I see little to no reason you couldn't use a different length subnet (like the /96 in your example), randomly select an address and do DAD to see if it is in use. Indeed, this is pretty much how AppleTalk back in the day worked (with a 16 bit number space).
Three people have now mailed me privately saying that DAD does not provide a way to select a second address if your first choice is already in use. What I am basically suggesting is to use the same method in RFC 3041 as the default way to get an address. That is, the protocol is already defined and deployed; it's just that today it's used for a second (third, fourth, ...) address. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
Leo Bicknell wrote:
Three people have now mailed me privately saying that DAD does not provide a way to select a second address if your first choice is already in use.
So fix that as well while we are at it, how about it? It's code, not stone.
On 03/11/2011 04:05 PM, Joe Maimon wrote:
Leo Bicknell wrote:
Three people have now mailed me privately saying that DAD does not provide a way to select a second address if your first choice is already in use.
So fix that as well while we are at it, how about it? It's code, not stone.
So it is simple: use privacy (RFC 3041) addresses scaled appropriately for the netmask in use... remember, the last D in DAD is Detection; what you do about a duplicate belongs to another protocol. -- Pete
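A minimal sketch of that idea, randomly sized interface IDs plus the retry-on-collision step Leo describes; dad_in_use() is a hypothetical stand-in for real Duplicate Address Detection, and the prefix is illustrative.

```python
# Privacy-style random host IDs scaled to whatever the netmask leaves
# free, re-rolling whenever DAD reports a duplicate. Sketch only.
import ipaddress
import secrets

def pick_address(subnet, dad_in_use, max_tries=8):
    host_bits = 128 - subnet.prefixlen
    tried = set()
    for _ in range(max_tries):
        iid = secrets.randbits(host_bits)
        if iid in tried:
            continue
        tried.add(iid)
        candidate = subnet[iid]          # network address + host ID
        if not dad_in_use(candidate):
            return candidate
    raise RuntimeError("no free address found; subnet may be under attack")

# Works the same for a /64, a /120, or anything in between:
lan = ipaddress.ip_network("2001:db8:0:1::/120")
taken = {ipaddress.ip_address("2001:db8:0:1::7")}
print(pick_address(lan, lambda a: a in taken))
```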
On Mar 11, 2011, at 10:58 AM, Leo Bicknell wrote:
In a message written on Fri, Mar 11, 2011 at 01:07:15PM -0500, Valdis.Kletnieks@vt.edu wrote:
On Fri, 11 Mar 2011 09:38:12 EST, Joe Maimon said:
rfc3927 does not require 64 bits and works sufficiently well wherever it is employed. SLAAC should be redesigned to be configurable to work with however many bits are available to it and it should be a standard feature to turn that knob all the way from on - off with 128 bit stops in between.
Feel free to explain how SLAAC should work on a /96 with 32 bits of host address (or any amount smaller than the 48 bits most MAC addresses provide). Remember in your answer to deal with collisions.
Well, I at least think an option should be a /80, using the 48 bits of MAC directly. This generates exactly the same collision potential as today we have with a /64 and an EUI-64 constructed from an EUI-48 ethernet address. The router is already sending RA's for SLAAC to work, sending along one of a well-known set of masks would be a relatively minor modification.
How would you use that on a FireWire network or FDDI or any of the other media that use 64-bit MAC addresses?
That said, ND has built into it DAD - Duplicate Address Detection. There is already an expectation that there will be collisions, and the protocols to detect them are already in place. I see little to no reason you couldn't use a different length subnet (like the /96 in your example), randomly select an address and do DAD to see if it is in use. Indeed, this is pretty much how AppleTalk back in the day worked (with a 16 bit number space).
Detect, yes. Mitigate, no. DAD on the link-local address results in interface shutdown. In an environment where there's a very low probability of collision, that's an acceptable risk that is easily mitigated in most cases. In an environment where you create a much higher risk of collision, on the order of 1 in 2^32 rather than 1 in 2^48 or better, I think that's a rather ill-advised approach.
The probability of collision is pretty low, and the penalty/recovery (picking a new address and trying again) is rather quick and cheap.
IPv6 does not try to pick a new address and try again in SLAAC; at least, that's not what it's supposed to do.
If a service provider is going to end up giving me a /64 at home (I know, a whole different argument) I'd vastly prefer to use /80 or /96 subnets with either of these methods, and still be able to subnet the space. I suspect if /64's are given out one or both will come to be "standard".
If a service provider attempts to give me a /64 at home, I'd opt for a new provider instead. Owen
In a message written on Fri, Mar 11, 2011 at 04:13:13PM -0800, Owen DeLong wrote:
On Mar 11, 2011, at 10:58 AM, Leo Bicknell wrote:
Well, I at least think an option should be a /80, using the 48 bits of MAC directly. This generates exactly the same collision potential as today we have with a /64 and an EUI-64 constructed from an EUI-48 ethernet address. The router is already sending RA's for SLAAC to work, sending along one of a well-known set of masks would be a relatively minor modification.
How would you use that on a FireWire network or FDDI or any of the other media that use 64-bit MAC addresses?
It wouldn't. I'm not proposing a solution for everything, just a useful case for some things. I don't want to change say, RIR policy that you can allocate a /64, just allow operators to use /80's, or /96's in a more useful way if they find that useful.

Basically I think the IETF and IPv6 proponents went a bit too far down the "one size fits all" route. It has nothing to do with how many numbers may or may not be used, but everything to do with the fact that you often have to fit inside what's been given to you. If you're stuck with a monopoly provider who gives you a /64 to your cable modem there should be easy options to split it up and get some subnets. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
Leo Bicknell wrote:
In a message written on Fri, Mar 11, 2011 at 04:13:13PM -0800, Owen DeLong wrote:
On Mar 11, 2011, at 10:58 AM, Leo Bicknell wrote:
Well, I at least think an option should be a /80, using the 48 bits of MAC directly. This generates exactly the same collision potential as today we have with a /64 and an EUI-64 constructed from an EUI-48 ethernet address. The router is already sending RA's for SLAAC to work, sending along one of a well-known set of masks would be a relatively minor modification.
How would you use that on a FireWire network or FDDI or any of the other media that use 64-bit MAC addresses?
It wouldn't.
Yes it would. It works for any size subnet that can fit both the RA node and the auto-configuring one, from /0 - /127. And it is even backwards compatible.

1) Listen to RA, discover subnet size and whether to perform autoconfiguration; for backwards compatibility, assume /64 size if not included in RA.

2a) Generate address using phy bits, right-aligned up to the lesser of subnet/phy size. Use 1fff as leftmost host bits if the subnet size is 64 and the phy is 48.

2b) Use any other algorithm that may be more desirable, such as one that helps preserve privacy and that contains /dev/random as one of its inputs. The randomness can optionally be initially confined to the subnet bits that exceed the phy bits, if any.

3) Perform DAD.

4a) Collision: goto 2b, remembering the previous values and avoiding them. Retry 2a and forget all avoided values when they total up to (subnet size ** 2) - 3.

4b) No collision: happy surfing.

5) RA values change/expire: goto 1.

Why is the ability to blindly embed the phy L2 into an auto-configured L3 address considered a prerequisite, let alone a universally good idea?
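A rough sketch of steps 1 through 4 above in Python, assuming a hypothetical dad_in_use() callback in place of real DAD; it simplifies step 4a by re-rolling randomly without the full avoid-list bookkeeping. Nothing here is standardized behavior, it is only the proposal as described.

```python
# Generalized SLAAC for an arbitrary prefix length, per the steps
# above: right-align as many MAC ("phy") bits as fit, fall back to
# random bits after a DAD collision. Illustrative sketch only.
import ipaddress
import secrets

def autoconf(prefix, mac, dad_in_use):
    host_bits = 128 - prefix.prefixlen
    mac48 = int(mac.replace(":", ""), 16)

    # Step 2a: phy bits, right-aligned, truncated to what fits.
    iid = mac48 & ((1 << min(host_bits, 48)) - 1)
    if host_bits == 64:
        iid |= 0x1FFF << 48          # the "1fff leftmost host bits" case

    # Steps 3/4a/2b, without the avoid-list bookkeeping of step 4a.
    while dad_in_use(prefix[iid]):
        iid = secrets.randbits(host_bits)
    return prefix[iid]                # step 4b

lan = ipaddress.ip_network("2001:db8::/96")   # deliberately not a /64
print(autoconf(lan, "00:1b:21:3c:4d:5e", lambda a: False))
# -> 2001:db8::213c:4d5e  (the low 32 bits of the MAC)
```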
I'm not proposing a solution for everything, just a useful case for some things. I don't want to change say, RIR policy that you can allocate a /64, just allow operators to use /80's, or /96's in a more useful way if they find that useful.
Basically I think the IETF and IPv6 proponents went a bit too far down the "one size fits all" route. It has nothing to do with how many numbers may or may not be used, but everything to do with the fact that you often have to fit inside what's been given to you. If you're stuck with a monopoly provider who gives you a /64 to your cable modem there should be easy options to split it up and get some subnets.
Leaving scarcity behind should not mean kicking flexibility to the curb as well.

Just because SLAAC may work best with /64 should not mean that it must only work with a /64.

And failing with an unconfigurable stack when DAD detects a collision means that SLAAC is not a guaranteed safe general use option, contrasted with DHCP and the possibility of conflict detection and reaction.

Using bad design choices as justification for requiring additional ones simply means that SLAAC is broken as designed. It also means attempts to fix it are going to run up against entrenched opposition. Which is readily apparent.

DHCPv6 needs to be fixed with address and router options and then all DHCPv6 servers/helpers should be configurable to disable all RA on a segment by way of beaconing their own poison-reverse RA.

Joe
Nice article relating to the original subject of the post. I didn't see if it had been previously posted. http://ccie-in-3-months.blogspot.com/2011/03/trying-to-calculate-ipv6-bgp-ta... -Hammer- "I was a normal American nerd." -Jack Herer
On Fri, Mar 11, 2011 at 1:07 PM, <Valdis.Kletnieks@vt.edu> wrote:
Feel free to explain how SLAAC should work on a /96 with 32 bits of host address (or any amount smaller than the 48 bits most MAC addresses provide). Remember in your answer to deal with collisions.
Why should SLAAC dictate the size of *every subnet* on the IPv6 Internet? This is what people who I label "IPv6 Fundamentalists" wish to do. They refuse to admit that their ideas were conceived in the mid-90s, that technology has advanced a great deal since that time, and ARP/NDP is a real limit now, while VLSM is no longer a tough challenge (vendors have had a couple decades to make it work really well!)

I think there are a lot of people who throw around the SLAAC argument like it's actually good for something. Do these people know what SLAAC does? For core networks, it doesn't do anything. For hosting/datacenter networks and cluster/VPS environments, again, it doesn't do anything. Zero benefit. You probably don't configure these things using DHCP today. Wait, you do? Oh, it's a good thing we've got DHCPv6, which clearly can run alongside your DHCP for IPv4.

Is SLAAC for end-user access networks? Not so much. See recent discussions on this list about things which are not included in SLAAC that DHCPv6 does do today. SLAAC can provide an advantage if you can live without those things, but that advantage is limited to one thing: the subnet doesn't need a DHCPv6 server (or proxy/forwarding of packets to same.) IPv4 has gotten along just fine for a long time with both full-featured and light-weight DHCP servers, and statically configured subnets.

Is SLAAC solving any problem? Sure, for some situations, like SOHO networks, it's a nice option, but it's just that, an option. It isn't needed.

Is SLAAC for fully peer-to-peer networks, with no central gateway? No. To function, SLAAC requires an RA message from something that decides it is a router. It isn't going to facilitate a headless, ad-hoc network to support the next revolution with peer-to-peer cell phones.

So what we know is that the sole arguments from "IPv6 Fundamentalists" in favor of /64 LANs are:

* VLSM is hard (it isn't; vendors are really good at it now, otherwise IPv4 wouldn't work)
* SLAAC needs it to work (not all LANs need SLAAC)
* it's the standard (more on this below)

I believe everything except the "it's the standard" argument is fully and completely debunked. If anyone disagrees with me, feel free to correct me, or argue your point until you are blue in the face. I have often been reminded that I should have been more vocal about this matter 10+ years ago, but frankly, I thought vendors, large ISPs, veterans with more public credibility than myself, or the standards folks themselves, would have straightened this out a long time ago.

If you can decide for yourself that VLSM is easy and you trust your vendor to do it right (if you don't, summarize to /64 towards your core, or do one great thing IPv6 allows us to do, and summarize to *even shorter* prefixes towards your core, and carry fewer routes in core) then you are half-way there. If you realize SLAAC isn't a tool for your VPS farm or on your backbone link-nets, you're all the way there. At this point, you can deploy your IPv6 without it being broken by design.

The only thing broken here is the "Fundamentalists," who are stuck in a mid-1990s mindset. These guys need to get out of the way, because they are impeding deployment (for those smart enough to recognize this problem) and they are creating an almost certain need for a future re-design (for those who aren't smart.) This "future" doesn't depend on anything except v6 actually getting deployed enough to where DDoS happens over it at any appreciable scale.
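A small illustration of the summarization point above, using Python's ipaddress module; the prefixes are invented for the example, not taken from any network in the thread.

```python
# Many small edge subnets collapse into one short aggregate toward
# the core, so the core route count is independent of how finely the
# edge is subnetted. All prefixes are made up.
import ipaddress

edge_lans = [ipaddress.ip_network(f"2001:db8:0:{v:x}::/120")
             for v in range(1, 9)]

# Advertise one covering prefix upstream instead of eight /120s.
aggregate = ipaddress.ip_network("2001:db8::/48")
assert all(lan.subnet_of(aggregate) for lan in edge_lans)

print(f"{len(edge_lans)} edge routes -> 1 core route: {aggregate}")
```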
In the current state of the Internet, it is certain that this problem will happen. No visible progress has been made on solving it, except by guys like myself who are happy to cry "the sky is falling," configure our networks in a "non-standard" way, and tell the standards folks they are wrong. The Cisco knob is "progress" only in that Cisco recognizes customers are concerned about this problem and allow them to steer their failure mode. If the DDoS happens before vendors provide a real solution, or before standards are revised or thrown out, you can thank those of us on the "sky is falling" side of this argument for testing the work-around (by never having exposed ourselves to the problem in the first place.)
It's one thing to say "it should be redesigned". It's another matter entirely to actually come up with a scheme that doesn't suck even harder than "screw it, it's a /64".
This is true. I think the price of energy is continuing to rise and our future is very uncertain as a result. I don't know how to fix it. Does that mean I should keep my opinion to myself? Of course not. Recognizing a problem is the first step on the path to a solution.

I imagine the same arguments taking place before VLSM and again before CIDR, which pre-dates my entry into this field by a few years. I don't think most people in our field today understand why the IPv6 standard ended up the way it did. They haven't been around long enough to understand the context of IPv6 being developed in what was truly a different era.

I don't have a better idea for SLAAC, in large part because I do not care about SLAAC. I don't think SLAAC should dictate the size of *every subnet.* I do think SLAAC should be an *option* available to those who want it on a given LAN.

I've got three feasible "fixes" to the NDP flooding problem. One is dead simple: don't configure /64 LANs. How hard is that? It's a lot easier than the alternatives. -- Jeff S Wheeler <jsw@inconcepts.biz> Sr Network Operator / Innovative Network Concepts
On Fri, 11 Mar 2011, Jeff Wheeler wrote:
I've got three feasible "fixes" to the NDP flooding problem. One is dead simple: don't configure /64 LANs. How hard is that? It's a lot easier than the alternatives.
What problem are you trying to solve by not having every subnet being a /64 ? -- Mikael Abrahamsson email: swmike@swm.pp.se
On 11/03/2011 19:37, Mikael Abrahamsson wrote:
What problem are you trying to solve by not having every subnet being a /64 ?
ND cache exhaustion? Personally, I'm rather concerned about this, and you should be too, given the overlapping spheres of our interests. Particularly as there is no DAI equivalent for IPv6 on any switch vendor roadmap that I've seen. Nick
On Mar 11, 2011, at 11:22, Jeff Wheeler wrote:
I think there are a lot of people who throw around the SLAAC argument like it's actually good for something. Do these people know what SLAAC does? For core networks, it doesn't do anything. For hosting/datacenter networks and cluster/VPS environments, again, it doesn't do anything. Zero benefit.
Doesn't SLAAC give you automatic "MAC address to IP" mapping? It'll save you manually doing that (in an otherwise well controlled environment). - ask
On 14 Mar 2011, at 23:30, Ask Bjørn Hansen <ask@develooper.com> wrote:
Doesn't SLAAC give you automatic "MAC address to IP" mapping? It'll save you manually doing that (in an otherwise well controlled environment).
No, it doesn't. On some systems, the MAC address is used to create the IPv6 address, but not on others (e.g. Windows 7). Nick
On Mar 14, 2011, at 16:38, Nick Hilliard wrote:
Doesn't SLAAC give you automatic "MAC address to IP" mapping? It'll save you manually doing that (in an otherwise well controlled environment).
No, it doesn't. On some systems, the MAC address is used to create the IPv6 address, but not on others (e.g. Windows 7).
Sorry, I made the mail a bit too short, I suppose. "Well controlled environment" in my case is a bunch of relatively homogeneous Linux server systems ("plain" hardware and virtualized), all managed by the same team. - ask -- Ask Bjørn Hansen, http://askask.com/
On Mar 11, 2011, at 5:53 AM, Jeff Wheeler wrote:
On Thu, Mar 10, 2011 at 10:51 PM, George Bonser <gbonser@seven.com> wrote:
And I say making them /127s may not really make any difference. Say you make all of those /127s, at some point you *are* going to have a network someplace that is a /64 that has hosts on it and that one is just as subject to such an attack. If you are a content provider, it doesn't make any difference if they take down the links between your routers or if they take down the link that your content farm is on. The end result is the same. You have managed to re-arrange the deck chairs. Have another squeeze at that water balloon.
Again, this is the argument put forth by the "RFC wavers," that you can't solve the problem because you must want to configure /64s for ... what, exactly? Oh, right, SLAAC. More on that below.
If I'm a content provider, I don't have to configure a /64 for my content farm. I can configure a /120 or whatever subnet size is practical for my environment. I can also use link-local addressing on my content farm LANs and route subnets to my content boxes, if that is somehow more practical than using a smaller subnet.
Yes, you can bring as much of the pain from IPv4 forward into IPv6 as you like. You can also commit many other acts of masochism. Personally, I prefer to approach IPv6 as a way to reduce some of the more painful aspects of IPv4, such as undersized subnets, having to renumber or add prefixes for growth, limited aggregation, NAT, and more.
If you are a service provider where practically all of your links are point to points, sure.
No, you can avoid configuring /64s if you don't need SLAAC. Who needs SLAAC? I don't. It has absolutely no place in any of my environments. It seems to me that DHCPv6 will do everything which SLAAC does, and everything SLAAC forgot about. The "complexity" argument is pretty much indefensible when the trade-off is configuring DHCPv6 vs turning a bunch of router knobs and hoping no one ever targets your LANs with an NDP DoS.
SLAAC is a very useful and convenient way to deal with client networks. I would agree it's of limited use in a content provider scenario, but, there is utility to /64s beyond just SLAAC. Yes, they are a hard requirement for SLAAC.
We didn't need much more host addressing, we needed more subnet addressing. I would have settled for 16 bits of host addressing and 112 bits of subnet addressing and I suppose nothing prevents me from doing that except none of the standard IPv6 automatic stuff would work anymore.
None of that "standard IPv6 automatic stuff" works today, anyway. The state of IPv6 support on end-user CPE generally ranges from non-existent to untested to verified-to-be-broken. This is the only place in your network where /64 can offer any value, and currently, CPE is just not there. Even the latest Cisco/Linksys CPE does not support IPv6. Sure, that'll change; but what won't change is the total lack of any basis for configuring /64 LANs for "content farms" or any similar non-end-user, non-dynamic segments.
As someone using SLAAC in a number of environments, I'm confused by this statement. It seems to be working quite well in many places and end-user residential networks are certainly not the only places where it is useful. Yes, residential end-user CPE is rather limited and somewhat less than ideal today. I would argue that there are probably at least as many end-user hosts on non-residential networks that could take advantage of SLAAC if the administrators wanted to.
I don't want 16 bits of host addressing. I want to choose an appropriate size for each subnet. Why? Because exactly zero of my access routers can handle 2**16 NDP entries, let alone 2**16 entries on multiple interfaces/VLANs. I would like to see much larger ARP/NDP tables in layer-3 switches, and I think vendors will deliver on that, but I doubt we'll soon see even a 10x increase from typical table sizes of today. VPS farms are already pushing the envelope with IPv4, where customers are already used to conserving addresses. Guess what, customers may still have to conserve addresses with IPv6, not because the numbers themselves are precious, but because the number of ARP/NDP entries in the top-of-rack or distribution switch is finite.
What do your access routers have to do with your content farm? Sounds like you've got some pretty darn small access routers as well if they can't handle 64k NDP entries. Yes, larger tables in switches would be a good thing, but, I hardly think that's a reason to use smaller netmasks. Most of the top-of-rack switches I'm aware of have no problem doing at least 64k NDP/ARP entries. Many won't do more than that, but, most will go at least that far.
And again, are you talking about all the way down to the host subnet level? I suppose I could configure server farms in /112 or even /96 (/96 has some appeal for other reasons mostly having to do with multicast) but then I would wonder how many bugs that would flush out of OS v6 stacks.
I'm not getting reports of problems with smaller-than-/64 subnets from customers yet. Am I confident that I never will? No, absolutely not! Like almost everyone else, I have some customers who have configured IPv6, but the amount of production traffic on it remains trivial. That is why I allocate a /64 but provision a /120 (or similar practical size.) I can grow the subnet if I have to. I do know that /64 LANs will cause me DoS problems, so I choose to work around that problem now. If /120 LANs cause me OS headaches in the future, I have the option to revise my choice with minimal effort (no services get renumbered, only the subnet must grow.)
How many customers are using smaller-than-/64 subnets to do much of anything yet? I find it interesting that you _KNOW_ that /64 LANs will cause you DoS problems and yet we've been running them for years without incident. I believe growing the subnet still requires you to touch every machine unless they're getting all their configuration from DHCP. Again, sounds like an exercise in unnecessary masochism to me.
Why would you suggest /96 as being more practical than /64 from the perspective of NDP DoS? Again, this is an example of the "in-between" folks in these arguments, who seem not to fully understand the problem. Your routers do not have room for 2**(128-96) NDP entries. Typical access switches today have room for a few thousand to perhaps 16k, and typical "bigger" switches are specifying figures below 100k. This doesn't approach the 4.3M addresses in a /96. In short, suggesting /96 is flat out stupid -- you get "the worst of both worlds," potential for OS compatibility issues, AND guaranteed NDP DoS vulnerability.
Yeah, in-between makes little sense. Kind of worst of both worlds. On that we can at least agree. As to your /96, that's 4.3B, not 4.3M. (At least last I looked, 2^32 was 4.3B, and 4.3M is approximately a /106.)
passing traffic. That doesn't protect against rogue hosts but there might be ways around that, too, or at least limiting the damage a rogue host can cause.
How do you suggest we limit the damage a rogue host can cause? A lot of people would like to hear your idea. Again, in nearly ten years of discussing this with colleagues, I have not seen any idea which is more practical than configuring a /120 instead of a /64. I have not seen any idea, period, which doesn't involve configuring the IPs which are allowed to be used on the LAN, either on the access switch port (NDP inspection), the access router, or in a database (like RADIUS.)
There are several things that could eventually be implemented in the access switch software. Techniques like rapidly timing out unanswered NDP requests, not storing ND entries for SLAAC MAC-based suffixes (after all, the information you need is already in the IP address, just use that). Not storing ND entries for things that don't have an entry in the MAC forwarding table (pass the first ND packet and if you get a response, create the ND entry at that time), etc.

Yes, these all involve a certain amount of changing some expected behaviors, but, those changes could probably be easily accommodated in most environments.

Finally, the bottom line is that a rogue host behind your firewall is probably going to cause other forms of damage well before it runs you out of ND entries and any time you have such a thing, it's going to be pretty vital to identify and remove it as fast as possible anyway.
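As an illustration of the "the information you need is already in the IP address" point: recovering a MAC from a modified EUI-64 suffix is a fixed computation, so a device could in principle recompute it instead of caching it. The sketch below shows the idea only; no shipping switch is claimed to work this way.

```python
# Invert RFC 4291 modified EUI-64: check for the ff:fe marker, remove
# it, and flip the universal/local bit back. Illustrative sketch.
import ipaddress

def mac_from_eui64(addr):
    b = addr.packed[8:]                 # the 64-bit interface ID
    if b[3:5] != b"\xff\xfe":
        return None                     # not an EUI-64 suffix
    mac = bytes([b[0] ^ 0x02]) + b[1:3] + b[5:8]
    return ":".join(f"{x:02x}" for x in mac)

addr = ipaddress.ip_address("2001:db8:0:1:21b:21ff:fe3c:4d5e")
print(mac_from_eui64(addr))  # 00:1b:21:3c:4d:5e -- no table entry needed
```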
I'm glad SLAAC is an option, but that's all it is, an option. /64 LANs must also be considered optional, and should be considered useful only when SLAAC is desired.
They are entirely optional, but, IMHO, avoiding them at all costs such as you seem to be suggesting is unnecessarily painful in most environments.
Something will have to be done at some point ... soon.
I'm glad more people are coming around to this point of view. Cisco certainly is there.
I'd settle for Cisco coming to the point of having RA guard universally available on all switch products. That, to me, is a much more pressing issue than this imagined ND exhaustion attack which, in reality, requires near-DDoS levels of traffic for most networks to actually run the ND table meaningfully into overflow. Owen
On Fri, Mar 11, 2011 at 6:33 PM, Owen DeLong <owen@delong.com> wrote:
Yes, you can bring as much of the pain from IPv4 forward into IPv6 as you like. You can also commit many other acts of masochism.
This is the problem with "Fundamentalists," such as yourself, Owen. You think that "fixing" things which work fine (like reasonable-sized VLSM LANs for content farms) is worth introducing a DDoS vulnerability for which there is no current defense, and for which the only feasible defense is either reversing your choice and renumbering the subnet from /64 to /smaller, or waiting until your vendors supply you with patched images for your routers and/or switches.

You need to move beyond this myopic view that /64 provides a benefit that is worth this kind of operational sacrifice. When vendors cough up some more knobs, I'll be right there with you, configuring /64 subnets. I've already allocated them!

It's pretty easy for me to renumber my /120 subnets to /64, after all -- I don't have to update any zone files for public-facing services, or modify significant configuration for software -- I just have to reconfigure my router and host interfaces from /120 to /64. You, on the other hand, may have addresses in use all over that /64, and condensing them into a smaller subnet is guaranteed to be at least as hard as my work for growing my subnet, and may be much more difficult -- every bit as difficult as renumbering from one IPv4 block to another.

Given the current state of IPv6, your "Fundamentalist" way introduces new problems *and* brings the old ones forward. This makes no sense, but Fundamentalists rarely do.
Personally, I prefer to approach IPv6 as a way to reduce some of the more painful aspects of IPv4, such as undersized subnets, having to renumber or add prefixes for growth, limited aggregation, NAT, and more.
I look forward to that when it works. As I've noted, I have prepared to take advantage of those things as soon as the NDP issue is resolved.
None of that "standard IPv6 automatic stuff" works today, anyway. The state of IPv6 support on end-user CPE generally ranges from non-existent to untested to verified-to-be-broken.

As someone using SLAAC in a number of environments, I'm confused by this statement. It seems to be working quite well in many places and end-user residential networks are certainly not the only places where it is useful.
Your definition of "working quite well in many places" is different than mine. I'll come around to your point of view when it is possible to get working IPv6 connectivity from most major end-user ISPs, and all (or close enough) the CPE being sold at Fry's and Best Buy works right. We are pretty far from that right now.

This is another thing the "IPv6 Fundamentalists" seem to ignore. CPE support is almost non-existent, ISP support is not there (some tier-1 transit networks still have no IPv6 product!), and the major IXPs still have three orders of magnitude more IPv4 traffic than IPv6. Cogent, Level3, and Hurricane Electric still can't decide that it's in their mutual interest to exchange IPv6 traffic with each other, and their customers don't care enough to go to another service provider, because IPv6 is largely unimportant to them.

None of this stuff "works" today. You aren't seeing DDoS scenarios on the v6 network today because the largest IPv4 DDoS attacks are larger than the total volume of inter-domain IPv6 traffic.
Most of the top-of-rack switches I'm aware of have no problem doing at least 64k NDP/ARP entries. Many won't do more than that, but, most will go at least that far.
Owen, this statement is either:

1) a gross misunderstanding on your part, because you can't or don't read spec sheets, let alone test gear
2) you've never seen or used a top-of-rack switch or considered buying one long enough to examine the specs
3) your racks are about 3 feet taller than everyone else's and you blow 100k on switching for every few dozen servers
4) an outright lie, although not an atypical one for the "IPv6 Fundamentalist" crowd

I'd like you to clarify which of these is the case. Please list some switches which fit your definition of "top-of-rack switch" that support 64k NDP entries. Then list how many "top-of-rack" switches you are currently aware of. Don't bother listing the ones you know don't support 64k, because I'll gladly provide a list of plenty more of those, than the number of switches which you find to support 64k in a ToR form-factor.

For those following along at home, how many ToR switches do indeed support at least 64k NDP entries? Unlike Owen, I know the answer to this question: Zero. There are no ToR switches that support >= 64k NDP table entries.

Of course, I don't really mean to call Owen a liar, or foolish, or anything else. I do mean to point out that his "facts" are wrong and his argument not based in the world of reality. He is a "Fundamentalist," and is part of the problem, not the solution.
I find it interesting that you _KNOW_ that /64 LANs will cause you DoS problems and yet we've been running them for years without incident.
That's because I understand how packet forwarding to access LANs actually works. You don't. Again, the biggest DDoS attacks today dwarf the whole volume of inter-domain IPv6 traffic. *Routine* IPv4 attacks are greater than the peak IPv6 traffic at any IXP. IPv6 hasn't seen any real DDoS yet. It will probably happen soon.
There are several things that could eventually be implemented in the access switch software. Techniques like rapidly timing out unanswered NDP requests, not storing ND entries for SLAAC MAC-based suffixes (after all, the information you need is already in the IP address, just use that). Not storing ND entries for things that don't have an entry in the MAC forwarding table (pass the first ND packet and if you get a response, create the ND entry at that time), etc.
I am glad you have given this some thought. The things you mention above are not bad, but they don't fix the problem. There are several practical solutions available which require pretty straight-forward router/switch knobs. Vendors will *eventually* deliver these knobs. Probably not before IPv6 is deployed enough that we see real DDoS, though; and if the most popular fix becomes dependent on NDP inspection ... you can forget about benefiting from that fix if you still have 10-year-old access switches.

I do have 10-year-old access switches, and older. I'm not upgrading them specifically because vendors aren't offering the needed knobs to solve this problem. I want budgetary resources to be available to me when that time comes.

To Cisco/Foundry/Juniper/et al: I've been waiting a real long time to upgrade these old beasts, and whichever of you gets me a fix I consider practical first, is very likely to be the vendor that gets to sell me > 1000 new ToR switches. Unless Cisco feels like back-porting a fix to my older platforms, which I see as unlikely, I am quite prepared to replace 10 - 15 year old switches when an NDP flooding fix is among the benefits I receive. I really hope my 0 to 5 year old switches all get back-ported fixes, or I'll be pretty displeased.
Yes, these all involve a certain amount of changing some expected behaviors, but, those changes could probably be easily accommodated in most environments.
This can be fixed without changing *any* behavior at all from the host's perspective. We just need the knobs. To get that, we need people like you to stop telling people this isn't a problem, and to start telling your vendors that "the sky is falling," and asking for some specific fix that you think is practical, or a fix in general, if you don't think you have a truly practical idea.
Finally, the bottom line is that a rogue host behind your firewall is probably going to cause other forms of damage well before it runs you out of ND entries and any time you have such a thing, it's going to be pretty vital to identify and remove it as fast as possible anyway.
I've seen this argument before, too. We all have. It doesn't hold water. Once again, you "Fundamentalists" think that if there is any case where a fix might not be helpful, or you can distract attention from this issue to one of host security, you try to do so.

Let me give you the case I care about: Script kiddie hacks one server in a multi-use hosting datacenter, which is served by a layer-3 switch aggregating hundreds of customers. Script kiddie decides to DoS someone from his newly-hacked server, and uses random source addresses within the configured /64. Maybe he intends to DoS the upstream aggregation switch, or maybe it doesn't even occur to him. Either way, my NDP table immediately becomes full (even with only a few hundred PPS.)

Are any other customers affected? Yes, potentially all the customers on this layer-3 switch are affected. Definitely all the customers on this VLAN/subnet are affected, even with the Cisco knob (which is better than all VLANs/subnets breaking.)

Now replace the "some script kiddie" scenario with something that's simply misconfigured or buggy. You don't even have to be compromised.
I'm glad SLAAC is an option, but that's all it is, an option. /64 LANs must also be considered optional, and should be considered useful They are entirely optional, but, IMHO, avoiding them at all costs such as you seem to be suggesting is unnecessarily painful in most environments.
"At all costs?" Again, more "IPv6 Fundamentalist" talk. What is the cost of configuring a /120 instead of a /64 on my LAN, if I already know I don't want SLAAC on this LAN? None. Might the subnet need to grow? I'll grant that, but it's a minimal cost compared to a DoS vulnerability which can be exploited trivially.
I'd settle for Cisco coming to the point of having RA guard universally available on all switch products. That, to me, is a much more pressing issue than this imagined ND exhaustion attack which, in reality, requires near-DDoS levels of traffic for most networks to actually run the ND table meaningfully into overflow.
First, I agree, we need more knobs on all switching ports, period. Vendors are not delivering, and I'm not buying any more access switches than I absolutely have to. I have been putting off upgrades for *years* because of these things. Anytime clients ask me, "should we replace these old switches? They work but.. they're pretty old!" I say "no, wait until X." This is X.

Second, the ND exhaustion attack is hardly imagined. Go do it on a box. Any box, pick one. They all break, period. The failure mode differs somewhat from one vendor to the next (when entries are evicted, etc.) but *every router* breaks today, period.

Third, I don't know what you mean by "near-DDoS levels of traffic," but I already know that you are unfamiliar with common NDP table sizes. 4k - 8k is actually a pretty common range of supported NDP entries for modern layer-3 ToR switches, and I'm talking about 1-year-old, 40x10GbE switches from reputable vendors. As you might imagine, it only takes a few thousand packets to fill this up, and aging timers these days get up to several *hours* before entries are evicted. One packet per second is plenty. One PPS! Some DDoS, huh?

If this sounds like a "magic packet" issue to some, remember, it's not. This is not "ping of death" or "winnuke," and it's not smurf. It's the same thing that happens if you toss a /8 on an IPv4 LAN and start banging away at the ARP table, while expecting all of your legitimate hosts within that /8 to continue working correctly. We all know that's crazy, right? How is it suddenly less crazy to put an even larger subnet on an IPv6 LAN without gaining any direct benefits from doing so?

Remember, many LANs don't need SLAAC. The VPS farm sure doesn't. The router point-to-point doesn't. Any person who would tell you to configure a /64 for those LANs is an "IPv6 Fundamentalist." -- Jeff S Wheeler <jsw@inconcepts.biz> Sr Network Operator / Innovative Network Concepts
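A quick back-of-envelope on those numbers; the table size and aging timer below are illustrative values in the ranges quoted above, not measurements from any particular switch.

```python
# How fast a scan of a /64 fills an NDP table, given a table size
# and an aging timer. Figures are illustrative assumptions.
table_size = 8_000        # entries (modern L3 ToR switch, per the post)
attack_pps = 1            # one packet per second, each to a new address
aging = 4 * 3600          # seconds before an incomplete entry is evicted

fill_seconds = table_size / attack_pps
print(f"table full after {fill_seconds / 3600:.1f} hours at {attack_pps} pps")
# ~2.2 hours at 1 pps; a few thousand pps fills it in about a second.

# The table stays full as long as the attacker out-paces eviction:
steady_state = min(table_size, attack_pps * aging)
print(f"entries held at steady state: {steady_state}")
```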
On Mar 12, 2011, at 11:14 AM, Jeff Wheeler wrote:
Of course, I don't really mean to call Owen a liar, or foolish, or anything else.
Please don't; even though I disagree with him and agree with you very strongly on this set of issues, Owen is a smart and straightforward guy, and is simply speaking from his (selective on this particular set of topics, IMHO) own individual viewpoint. ;>
and if the most popular fix becomes dependent on NDP inspection
If that comes to pass, then the fix will be useless, unfortunately, just as dynamic ARP inspection (DAI) is useless today; it self-DoSes the box. Any form of 'inspection' will not scale for this problem, as it will be CPU-bound even on ASIC-based platforms.

All this ICMPv6 weirdness and outright brokenness is the Achilles' heel of IPv6, and I see no ready solution in sight for the set of problems it engenders.

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

The basis of optimism is sheer terror. -- Oscar Wilde
On Fri, Mar 11, 2011 at 8:14 PM, Jeff Wheeler <jsw@inconcepts.biz> wrote:
It's the same thing that happens if you toss a /8 on an IPv4 LAN and start banging away at the ARP table, while expecting all of your legitimate hosts within that /8 to continue working correctly. We all know that's crazy, right?
This is a valid concern. However...
How is it suddenly less crazy to put an even larger subnet on an IPv6 LAN without gaining any direct benefits from doing so? [...]
This is not a valid statement. I understand that you don't value the benefits we find with /64 or less, but we find value there, and it's really important to us, and they're things which were explicitly hoped for and planned for with IPv6 transition.

The problem you pointed out, with a single host overrunning switch tables, can be outsmarted rather than brute forced by mandating small enough subnets that it doesn't exist. If we presume that the originating host doesn't fake its layer 2 MAC as it's faking its layer 3 address, it's pretty trivial; you build in a software option that puts a maximum number of IPs per MAC. You balance virtualization cluster size limits with preemptive defense against this type of DOS when you do that, but balance points around 1E2 to 1E3 seem to me to be able to handle that just fine. You build in an override for switches / L2 gateways, or by port, or whatever other tuning mechanisms make sense (default to 10, override for your VMware cluster box and your switches...).

If the originating host does try to fake its layer 2 MAC, you can detect new floods of new MACs via existing mechanisms. Plenty of port MAC map / allowed MAC mechanisms already exist for basic LAN security purposes. You just dump the fake MACs on the floor.

The world is not perfect, and I'm sure there are still new vulnerabilities out there. But we can outsmart this one. If we can't outsmart this one, I'll be extremely surprised and disappointed. -- -george william herbert george.herbert@gmail.com
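A toy sketch of the per-MAC cap described above: the switch refuses to create new ND entries once one MAC has claimed its quota. The class, its names, and the default cap of 10 are taken only from the post's example tuning; the implementation itself is hypothetical.

```python
# Per-MAC address quota for ND entry creation. Illustrative only;
# no vendor implementation is being described.
from collections import defaultdict

class NdpQuota:
    def __init__(self, default_cap=10, overrides=None):
        self.cap = default_cap
        self.overrides = overrides or {}   # e.g. raise cap for a VM host port
        self.ips_per_mac = defaultdict(set)

    def allow(self, mac, ip):
        """Admit the (mac, ip) binding unless the MAC is over quota."""
        claimed = self.ips_per_mac[mac]
        if ip in claimed:
            return True
        if len(claimed) >= self.overrides.get(mac, self.cap):
            return False                   # drop: likely forged source IPs
        claimed.add(ip)
        return True

q = NdpQuota(overrides={"00:1b:21:3c:4d:5e": 1000})  # VMware cluster port
print(q.allow("00:aa:bb:cc:dd:ee", "2001:db8::1"))   # True
# An attacker cycling source IPs behind one ordinary MAC hits False
# after 10 entries, so the table stops growing.
```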
participants (29)

- Ask Bjørn Hansen
- Bret Palsson
- Chris Enger
- Chris Woodfield
- Dobbins, Roland
- George Bonser
- George Herbert
- Hammer
- Jack Carrozzo
- James Stahr
- Jeff Wheeler
- Joe Maimon
- Julien Goodwin
- Justin M. Streiner
- Leo Bicknell
- Mikael Abrahamsson
- Mike Walter
- Nick Hilliard
- Owen DeLong
- Pekka Savola
- Pete Carah
- Randy Carpenter
- Richard A Steenbergen
- sthaug@nethelp.no
- Tim Durack
- Tim Jackson
- tsison@gmail.com
- Ulf Zimmermann
- Valdis.Kletnieks@vt.edu