Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network. Thanks, David
On Jun 5, 2012, at 9:29 PM, David Hubbard wrote:
security practices
<http://www.ciscopress.com/bookstore/product.asp?isbn=1587055945> <http://www.ciscopress.com/bookstore/product.asp?isbn=1587053365> ----------------------------------------------------------------------- Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com> Luck is the residue of opportunity and design. -- John Milton
I believe that Silvia Hagen's book [1] is still the primary reference available, but there are others reviewed here: http://getipv6.info/index.php/Book_Reviews. Cheers, ~Chris PS - Shameless plug: If you're running Juniper, I wrote two books for them that you can get for free [2][3]. And I have an intro to IPv6 done in four parts on my blog as well (read from the bottom up) [4]. [1] - http://shop.oreilly.com/product/9780596100582.do [2] - http://chrisgrundemann.com/index.php/2010/day-exploring-ipv6/ [3] - http://chrisgrundemann.com/index.php/2011/day-advanced-ipv6-configuration/ [4] - http://chrisgrundemann.com/index.php/category/ipv6/introducing-ipv6/ On Tue, Jun 5, 2012 at 8:33 AM, Dobbins, Roland <rdobbins@arbor.net> wrote:
On Jun 5, 2012, at 9:29 PM, David Hubbard wrote:
security practices
<http://www.ciscopress.com/bookstore/product.asp?isbn=1587055945>
<http://www.ciscopress.com/bookstore/product.asp?isbn=1587053365>
----------------------------------------------------------------------- Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>
Luck is the residue of opportunity and design.
-- John Milton
-- @ChrisGrundemann http://chrisgrundemann.com
On 5-6-2012 16:29, David Hubbard wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
I liked the O'Reilly IPv6 Essentials. I've read a few chapters when I needed it. Cheers, Seth
http://long.ccaba.upc.es/long/070Related_Activities/020Documents/IPv6_An_Int... worth going through certification................ ________________________________ From: Seth Mos <seth.mos@dds.nl> To: nanog@nanog.org Sent: Tuesday, June 5, 2012 3:45 PM Subject: Re: ipv6 book recommendations? On 5-6-2012 16:29, David Hubbard wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
I liked the O'Reilly IPv6 Essentials. I've read a few chapters when I needed it. Cheers, Seth
Shameless plug: Certification wise, the IPv6 Sage certification at Hurricane Electric (http://www.tunnelbroker.net) uses a practical step-by-step approach where you actually have to deploy IPv6 and make it work to progress through the steps. Owen On Jun 5, 2012, at 10:07 AM, isabel dias wrote:
http://long.ccaba.upc.es/long/070Related_Activities/020Documents/IPv6_An_Int...
worth going through certification................
________________________________ From: Seth Mos <seth.mos@dds.nl> To: nanog@nanog.org Sent: Tuesday, June 5, 2012 3:45 PM Subject: Re: ipv6 book recommendations?
On 5-6-2012 16:29, David Hubbard wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
I liked the O'Reilly IPv6 Essentials. I've read a few chapters when I needed it.
Cheers,
Seth
And you get a t-shirt at the end! That was enough motivation for me, anyway :) -- Adam Kennedy Network Engineer Omnicity, Inc. From: Owen DeLong <owen@delong.com> To: isabel dias <isabeldias1@yahoo.com> Cc: "nanog@nanog.org" <nanog@nanog.org> Subject: Re: ipv6 book recommendations? Shameless plug: Certification wise, the IPv6 Sage certification at Hurricane Electric (http://www.tunnelbroker.net) uses a practical step-by-step approach where you actually have to deploy IPv6 and make it work to progress through the steps. Owen On Jun 5, 2012, at 10:07 AM, isabel dias wrote: http://long.ccaba.upc.es/long/070Related_Activities/020Documents/IPv6_An_Int... worth going through certification................ ________________________________ From: Seth Mos <seth.mos@dds.nl> To: nanog@nanog.org Sent: Tuesday, June 5, 2012 3:45 PM Subject: Re: ipv6 book recommendations? On 5-6-2012 16:29, David Hubbard wrote: Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network. I liked the O'Reilly IPv6 Essentials. I've read a few chapters when I needed it. Cheers, Seth
On Tue, Jun 5, 2012 at 7:29 AM, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
Network Warrior. Sounds a bit silly since it's a bit of an overview of lots of different things, however its chapters on IPv6 get right to the point and helped clear up a lot of things for me. -B
On 6/5/12, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
Hi David, Instead of going the book route, I'd suggest getting some tunneled addresses from he.net and then working through http://ipv6.he.net/certification/ . They have the basics pretty well covered, it's interactive and it's free.
Some additional thoughts:
1. Anybody who tells you that there are security best practices for IPv6 is full of it. It simply hasn't seen enough use in the environment to which we're now deploying it and rudimentary technologies widely used in IPv4 (e.g. NAT/PAT to private address space) haven't yet made their transition.
2. Subnetting in v6 in a nutshell:
a. If it's a LAN, /64. Always. Stateless autoconfiguration (SLAAC) only works for /64.
b. Delegations on 4-bit boundaries for reverse-DNS convenience.
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
d. Default customer assignments should be /56 or /48 depending on who you ask. /48 was the IETF's original plan. Few of your customers appear to use tens of LANs, let alone thousands. Maybe that will change but the motivations driving such a thing seem a bit pie in the sky. /56 lets the customer implement more than one LAN (e.g. wired and wireless) but burns through your address space much more slowly. /60 would do that too but nobody seems to be using it. /64 allows only one LAN, so avoid it.
e. "sparse allocation" if you feel like it. The jury is still out on whether this is a good idea. Basically, instead of assigning address blocks linearly, you divide your largest free space in half and stick the new assignment right in the middle. Good news: if the assignment later needs to grow you can probably just change the subnet mask, keeping the number of entries in the routing table the same. Bad news: fragments the heck out of your address space so when you actually need a large address block for something, you don't have it.
Trying to keep non-dynamic assignments in local or regional aggregable blocks works about as well as it did in IPv4, which is to say poorly.
Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
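To make point (e) above concrete, here is a minimal sketch of allocation by bisection (Python 3 ipaddress module; the function name, variable names and the 2001:db8::/32 documentation prefix are illustrative choices, not anything from the thread): each new block is carved out of the middle of the largest remaining free block, so it can later grow by simply shortening its mask.

import ipaddress

def bisect_allocate(free_blocks, new_prefix):
    """Assign one new_prefix-sized network from the middle of the largest free block."""
    largest = min(free_blocks, key=lambda n: n.prefixlen)     # smallest prefixlen = biggest block
    lower, upper = largest.subnets(prefixlen_diff=1)          # split the block in half
    assignment = next(upper.subnets(new_prefix=new_prefix))   # new block starts at the midpoint
    remaining = [n for n in free_blocks if n != largest]
    remaining.append(lower)                                    # lower half stays free
    remaining.extend(upper.address_exclude(assignment))        # so does the rest of the upper half
    return assignment, remaining

free = [ipaddress.IPv6Network("2001:db8::/32")]
for _ in range(2):
    customer, free = bisect_allocate(free, 48)
    print(customer)    # 2001:db8:8000::/48, then 2001:db8:4000::/48

If 2001:db8:8000::/48 later needs to grow to a /47, only its mask changes and the routing table entry count stays the same, which is the upside Bill describes.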
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
On 6/5/12, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
Hi David,
Instead of going the book route, I'd suggest getting some tunneled addresses from he.net and then working through http://ipv6.he.net/certification/ .
They have the basics pretty well covered, it's interactive and it's free.
Some additional thoughts:
1. Anybody who tells you that there are security best practices for IPv6 is full of it. It simply hasn't seen enough use in the environment to which we're now deploying it and rudimentary technologies widely used in IPv4 (e.g. NAT/PAT to private address space) haven't yet made their transition.
Not quite. I will say that the security BCPs are not mature and are evolving, but that does not mean that they do not yet exist.
2. Subnetting in v6 in a nutshell:
a. If it's a LAN, /64. Always. Stateless autoconfiguration (SLAAC) only works for /64.
b. Delegations on 4-bit boundaries for reverse-DNS convenience.
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
/64 is perfectly reasonable per point to point as well.
d. Default customer assignments should be /56 or /48 depending on who you ask. /48 was the IETF's original plan. Few of your customers appear to use tens of LANS, let alone thousands. Maybe that will change but the motivations driving such a thing seem a bit pie in the sky. /56 let's the customer implement more than one LAN (e.g. wired and wireless) but burns through your address space much more slowly. /60 would do that too but nobody seems to be using it. /64 allows only one LAN, so avoid it.
Planning your IPv6 deployment based on today's network needs is folly. Deploying /48s will help future-proof your network and pave the way for some very interesting innovations in the home networking space.
e. "sparse allocation" if you feel like it. The jury is still out on whether this is a good idea. Basically, instead of assigning address blocks linearly, you divide your largest free space in half and stick the new assignment right in the middle. Good news: if the assignment later needs to grow your can probably just change the subnet mask, keeping the number of entries in the routing table the same. Bad news: fragments the heck out of your address space so when you actually need a large address block for something, you don't have it.
Since you should be doing this mostly at the 4-12 bits to the right of your base allocation and the policy is structured such that you should, in most cases, be able to assign same-sized chunks everywhere at this level, that really shouldn't be an issue. Lower in the hierarchy, it's a judgement call on which optimization fits better on a case-by-case basis. Generally, the higher up the hierarchy, the more likely that allocation by bisection (there are other forms of sparse allocation as well) is ideal. In some cases, sparse allocation by reservation, for example, can reduce fragmentation while still providing substantial room for likely growth.
Trying to keep non-dynamic assignments in local or regional aggregable blocks works about as well as it did in IPv4, which is to say poorly.
If you apply for a large enough IPv6 block, this should be less of an issue. That was hard to do under previous policy regimes, but the current ISP allocation policy should make it pretty easy to optimize for this. Certainly, if you have suggestions for how policy can better support this, I am open to improvements at any time. Owen
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
2. Subnetting in v6 in a nutshell:
FWIW - There is a published BCOP on IPv6 subnetting: http://www.ipbcop.org/ratified-bcops/bcop-ipv6-subnetting/ Cheers, ~Chris -- @ChrisGrundemann http://chrisgrundemann.com
On Jun 5, 2012, at 3:15 PM, Chris Grundemann wrote:
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
2. Subnetting in v6 in a nutshell:
FWIW - There is a published BCOP on IPv6 subnetting: http://www.ipbcop.org/ratified-bcops/bcop-ipv6-subnetting/
Unfortunately, this BCOP recommends /56s for residential which is potentially harmful. I'm also not a fan of the /126 or /127 on point-to-points, but, the theoretical issues of neighbor table exhaustion attacks, etc. certainly should not be ignored entirely. Owen
On Tue, Jun 5, 2012 at 4:29 PM, Owen DeLong <owen@delong.com> wrote:
On Jun 5, 2012, at 3:15 PM, Chris Grundemann wrote:
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
2. Subnetting in v6 in a nutshell:
FWIW - There is a published BCOP on IPv6 subnetting: http://www.ipbcop.org/ratified-bcops/bcop-ipv6-subnetting/
Unfortunately, this BCOP recommends /56s for residential which is potentially harmful.
While it does use /56 as an example (mainly because most of the operators I have spoken to say that is as big as they'll go and many are shooting for less), it does NOT make that a recommendation. From the BCOP: "This is an example for demonstrative purposes only. Individual operators will need to determine their own prefix size preference for serving customers (internal or external). The SMEs of this BCOP highly recommend a /48 for any site that requires more than one subnet and that a site be defined as an individual customer in residential networks."
I'm also not a fan of the /126 or /127 on point-to-points, but, the theoretical issues of neighbor table exhaustion attacks, etc. certainly should not be ignored entirely.
Agreed, they must be considered. Cheers, ~Chris
Owen
-- @ChrisGrundemann http://chrisgrundemann.com
On 6/5/12, Owen DeLong <owen@delong.com> wrote:
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
/64 is perfectly reasonable per point to point as well.
Hi Owen, Sure, but with the neighbor discovery cache issues that come up with /64's under attack, why open yourself to trouble where you can't realize any benefit? Regards, Bill -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
On Jun 5, 2012, at 3:23 PM, William Herrin wrote:
On 6/5/12, Owen DeLong <owen@delong.com> wrote:
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
/64 is perfectly reasonable per point to point as well.
Hi Owen,
Sure, but with the neighbor discovery cache issues that come up with /64's under attack, why open yourself to trouble where you can't realize any benefit?
Why permit external traffic aimed at your point to point links at all? No external traffic, no attack surface. Owen
On Jun 5, 2012, at 3:23 PM, William Herrin wrote:
On 6/5/12, Owen DeLong <owen@delong.com> wrote:
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
/64 is perfectly reasonable per point to point as well.
Hi Owen,
Sure, but with the neighbor discovery cache issues that come up with /64's under attack, why open yourself to trouble where you can't realize any benefit?
It makes little sense to me to permit people outside your network to deliver packets to your point to point interfaces. Denying this traffic at your borders/edges eliminates all of the attacks without having to juggle inconsistent prefix sizes or do silly bit-math to figure out which address is at the other end of the link. Owen
Apologies for the double post... Mistakenly hit send instead of cancel on the first one. Owen On Jun 5, 2012, at 3:32 PM, Owen DeLong wrote:
On Jun 5, 2012, at 3:23 PM, William Herrin wrote:
On 6/5/12, Owen DeLong <owen@delong.com> wrote:
On Jun 5, 2012, at 2:23 PM, William Herrin wrote:
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
/64 is perfectly reasonable per point to point as well.
Hi Owen,
Sure, but with the neighbor discovery cache issues that come up with /64's under attack, why open yourself to trouble where you can't realize any benefit?
It makes little sense to me to permit people outside your network to deliver packets to your point to point interfaces. Denying this traffic at your borders/edges eliminates all of the attacks without having to juggle inconsistent prefix sizes or do silly bit-math to figure out which address is at the other end of the link.
Owen
Sure, but with the neighbor discovery cache issues that come up with
/64's under attack, why open yourself to trouble where you can't realize any benefit?
I happen to be a fan of /126s, but if you choose to use a /64, presumably your infrastructure ACLs would provide protection against such attacks.
On 5-6-2012 23:23, William Herrin wrote:
On 6/5/12, David Hubbard<dhubbard@dino.hostasaurus.com> wrote: Hi David,
Instead of going the book route, I'd suggest getting some tunneled addresses from he.net and then working through http://ipv6.he.net/certification/ .
They have the basics pretty well covered, it's interactive and it's free. +1 it's one of the best ways to learn. Do.
Some additional thoughts:
1. Anybody who tells you that there are security best practices for IPv6 is full of it. It simply hasn't seen enough use in the environment to which we're now deploying it and rudimentary technologies widely used in IPv4 (e.g. NAT/PAT to private address space) haven't yet made their transition.
Well, not quite, but firewall rules work just the same as before. Use those. The longer version is that some people used from-internet-to-any rules on their WAN, which with IPv4 NAT really translated to "allow everything to my external address" (unless you used 1:1, of course, but I digress). In IPv6 such a rule really means anything internal. People that have administered firewalls that route public addresses will know exactly what I mean.
d. Default customer assignments should be /56 or /48 depending on who you ask. /48 was the IETF's original plan. Few of your customers appear to use tens of LANs, let alone thousands. Maybe that will change but the motivations driving such a thing seem a bit pie in the sky. /56 lets the customer implement more than one LAN (e.g. wired and wireless) but burns through your address space much more slowly. /60 would do that too but nobody seems to be using it. /64 allows only one LAN, so avoid it.
You seem to miss a semi-important thing here: daisy chaining of routers in the premises. Some routers (pfSense included) allow for setting up prefix delegation, which means that you can connect routers behind the one you have and still have native v6. Although the automatic setup system I wrote for this works with /56 networks, it will only set up PD for /64 networks at this point. I allocate a part of the assigned /56 network for prefix delegation automatically. If the PD is a /48 I can delegate /56 networks to the subrouters, which in turn can delegate /64 networks to another sub router. It's not that the user will actually assign all those networks themselves; the routers will do it automatically, and you need proper route aggregation. It's unlikely that all networks will be directly assigned as /64 networks either; it could also be multiple routers. Even if it was done manually I'd assign a /60 route out of a /56 PD. The notion that it will always be a /64 is... well. Regards, Seth
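As a rough sketch of the daisy-chaining Seth describes (Python ipaddress; the 2001:db8: prefixes are documentation examples and the variable names are mine): a delegated /48 can hand /56s to first-level routers, each of which can in turn hand /60s or /64s further downstream.

import ipaddress

site_pd = ipaddress.IPv6Network("2001:db8:1234::/48")          # PD received from upstream (example)
first_router = next(site_pd.subnets(new_prefix=56))             # one of 256 /56s for downstream routers
sub_delegation = next(first_router.subnets(new_prefix=60))      # one of 16 /60s for a second-level router
first_lan = next(sub_delegation.subnets(new_prefix=64))         # one of 16 /64 LANs at the bottom
print(first_router, sub_delegation, first_lan)
# 2001:db8:1234::/56 2001:db8:1234::/60 2001:db8:1234::/64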
One more (free) book: http://www.ipv6tf.org/index.php?page=news/newsroom&id=8281 (available in several languages) ********************************************** IPv4 is over Are you ready for the new Internet ? http://www.consulintel.es The IPv6 Company
On Jun 5, 2012, at 5:23 PM, William Herrin wrote:
On 6/5/12, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
Hi David,
Instead of going the book route, I'd suggest getting some tunneled addresses from he.net and then working through http://ipv6.he.net/certification/ .
They have the basics pretty well covered, it's interactive and it's free.
Some additional thoughts:
1. Anybody who tells you that there are security best practices for IPv6 is full of it. It simply hasn't seen enough use in the environment to which we're now deploying it and rudimentary technologies widely used in IPv4 (e.g. NAT/PAT to private address space) haven't yet made their transition.
2. Subnetting in v6 in a nutshell:
a. If it's a LAN, /64. Always. Stateless autoconfiguration (SLAAC) only works for /64.
b. Delegations on 4-bit boundaries for reverse-DNS convenience.
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
d. Default customer assignments should be /56 or /48 depending on who you ask. /48 was the IETF's original plan. Few of your customers appear to use tens of LANS, let alone thousands. Maybe that will change but the motivations driving such a thing seem a bit pie in the sky. /56 let's the customer implement more than one LAN (e.g. wired and wireless) but burns through your address space much more slowly. /60 would do that too but nobody seems to be using it. /64 allows only one LAN, so avoid it.
e. "sparse allocation" if you feel like it. The jury is still out on whether this is a good idea. Basically, instead of assigning address blocks linearly, you divide your largest free space in half and stick the new assignment right in the middle. Good news: if the assignment later needs to grow your can probably just change the subnet mask, keeping the number of entries in the routing table the same. Bad news: fragments the heck out of your address space so when you actually need a large address block for something, you don't have it.
Trying to keep non-dynamic assignments in local or regional aggregable blocks works about as well as it did in IPv4, which is to say poorly.
Regards, Bill Herrin
-- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
Bill's additional comments about subnetting are a concise and accurate view. They also show an overlooked benefit of IPv6 over IPv4 -- For address planning, it is no longer necessary to count individual end points, rather only the subnets must be counted. This reduces labor in planning, assigning, and tracking addresses. James R. Cutler james.cutler@consultant.com
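A trivial worked example of that point (Python, purely illustrative): an address plan counts subnets rather than hosts, and the number of /64 LANs in the customer assignment sizes discussed in this thread is just a power of two.

for plen in (48, 56, 60):
    print(f"/{plen} assignment -> {2 ** (64 - plen)} possible /64 subnets")
# /48 -> 65536, /56 -> 256, /60 -> 16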
On 6 June 2012 14:12, Cutler James R <james.cutler@consultant.com> wrote:
On Jun 5, 2012, at 5:23 PM, William Herrin wrote:
On 6/5/12, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
Does anyone have suggestions on good books to really get a thorough understanding of v6, subnetting, security practices, etc. Or a few books. Just turned up dual stack with our peers and a test network but I'd like to be a lot more comfortable with it before looking at our customer network.
Hi David,
Instead of going the book route, I'd suggest getting some tunneled addresses from he.net and then working through http://ipv6.he.net/certification/ .
They have the basics pretty well covered, it's interactive and it's free.
Some additional thoughts:
1. Anybody who tells you that there are security best practices for IPv6 is full of it. It simply hasn't seen enough use in the environment to which we're now deploying it and rudimentary technologies widely used in IPv4 (e.g. NAT/PAT to private address space) haven't yet made their transition.
2. Subnetting in v6 in a nutshell:
a. If it's a LAN, /64. Always. Stateless autoconfiguration (SLAAC) only works for /64.
b. Delegations on 4-bit boundaries for reverse-DNS convenience.
c. If it's a point to point, a reasonable practice seems to be a /64 per network area and around /124 per link. Works OK for ethernet point to points too.
d. Default customer assignments should be /56 or /48 depending on who you ask. /48 was the IETF's original plan. Few of your customers appear to use tens of LANS, let alone thousands. Maybe that will change but the motivations driving such a thing seem a bit pie in the sky. /56 let's the customer implement more than one LAN (e.g. wired and wireless) but burns through your address space much more slowly. /60 would do that too but nobody seems to be using it. /64 allows only one LAN, so avoid it.
e. "sparse allocation" if you feel like it. The jury is still out on whether this is a good idea. Basically, instead of assigning address blocks linearly, you divide your largest free space in half and stick the new assignment right in the middle. Good news: if the assignment later needs to grow your can probably just change the subnet mask, keeping the number of entries in the routing table the same. Bad news: fragments the heck out of your address space so when you actually need a large address block for something, you don't have it.
Trying to keep non-dynamic assignments in local or regional aggregable blocks works about as well as it did in IPv4, which is to say poorly.
Regards, Bill Herrin
-- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
Bill's additional comments about subnetting are a concise and accurate view. They also show an overlooked benefit of IPv6 over IPv4 -- For address planning, it is no longer necessary to count individual end points, rather only the subnets must be counted. This reduces labor in planning, assigning, and tracking addresses.
James R. Cutler james.cutler@consultant.com
Hi all, Potentially silly question but, as Bill points out, a LAN always occupies a /64. Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age-old discussion about keeping broadcast segments small? Or will it be that a /64 will typically only have a similar number of hosts in it as, say, a /23 or /24 in the IPv4 world? Cheers, Anton
Anton Smith <anton@huge.geek.nz> wrote on 06/06/2012 09:53:02 AM:
Potentially silly question but, as Bill points out a LAN always occupies a /64.
Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age old discussion about keeping broadcast segments small?
The /64 only removes the limitation on the number of *addresses* on the L2 domain. Limitations still apply for the amount of ARP and ND noise. A maximum number of hosts is reached when that noise floor represents a significant portion of the link bandwidth. If ARP/ND proxying is used, the limiting factor may instead be the CPU on the gateway. The ND noise generated is arguably higher than ARP because of DAD, but I don't remember seeing actual numbers on this (anybody?). I've seen links with up to 15k devices where ARP represented a significant part of the link usage, but most weren't (yet) IPv6. /JF
Does anyone know the reason /64 was proposed as the size for all L2 domains? I've looked for this answer before, never found a good one. I thought I read there are some L2 technologies that use a 64 bit hardware address, might have been Bluetooth. Guaranteeing that ALL possible hosts could live together in the same L2 domain seems like overkill, even for this group. /80 would make more sense, it does match up with Ethernet MACs. Not as easy to compute, for humans nor processors that like things in 32 or 64 bit chunks however. Anyone have a definite answer? Thanks, Chuck -----Original Message----- From: Jean-Francois.TremblayING@videotron.com [mailto:Jean-Francois.TremblayING@videotron.com] Sent: Wednesday, June 06, 2012 10:36 AM To: anton@huge.geek.nz Cc: NANOG list Subject: IPv6 /64 links (was Re: ipv6 book recommendations?) Anton Smith <anton@huge.geek.nz> wrote on 06/06/2012 09:53:02 AM:
Potentially silly question but, as Bill points out a LAN always occupies a /64.
Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age old discussion about keeping broadcast segments small?
The /64 only removes the limitation on the number of *addresses* on the L2 domain. Limitations still apply for the amount of ARP and ND noise. A maximum number of hosts is reached when that noise floor represents a significant portion of the link bandwidth. If ARP/ND proxying is used, the limiting factor may instead be the CPU on the gateway. The ND noise generated is arguably higher than ARP because of DAD, but I don't remember seeing actual numbers on this (anybody?). I've seen links with up to 15k devices where ARP represented a significant part of the link usage, but most weren't (yet) IPv6. /JF
Thus spake Chuck Church (chuckchurch@gmail.com) on Wed, Jun 06, 2012 at 10:58:05AM -0400:
Does anyone know the reason /64 was proposed as the size for all L2 domains?
Some day eui-48 will "run out". So, just assume eui-64 now and map into it. Also, as you point out below, not all L2 is ethernet.
I've looked for this answer before, never found a good one. I thought I read there are some L2 technologies that use a 64 bit hardware address, might have been Bluetooth. Guaranteeing that ALL possible hosts could live together in the same L2 domain seems like overkill, even for this group. /80 would make more sense, it does match up with Ethernet MACs. Not as easy to compute, for humans nor processors that like things in 32 or 64 bit chunks however. Anyone have a definite answer?
A good history lesson for this addressing model would be to look at IPX. (And maybe also IRDP for ipv4). When we did our first trial ipv6 deployments here in the early 2000's we were still running IPX, so I guess SLAAC wasn't hard to grasp. Dale
It is because of IEEE EUI-64 standard. It was believed at the time of IPv6 development that EUI-48 would run out of numbers and IEEE had proposed going to EUI-64. While IEEE still hasn't quite made that change (though Firewire does appear to use EUI-64 already), it will likely occur prior to the EOL for IPv6. There is a simple algorithm used by IEEE for mapping EUI-48 onto the EUI-64 space. The 0x02 bit of the first octet of an EUI-64 address is an L-Flag, indicating that the address was locally generated (if it is a 1) vs. IEEE/vendor assigned (if it is a 0). The mapping process takes the EUI-48 address XX:YY:ZZ:RR:SS:TT and maps it as follows: let AA = XX xor 0x02. AAYY:ZZff:feRR:SSTT ff:fe above is literal. IPv6 was originally going to be a 32-bit address space, but, the developers and proponent of SLAAC convinced IETF to add 64 more bits to the IPv6 address for this purpose. Since bits are free when designing a new protocol, there really was no reason to impose such limitations. You really don't gain anything by going to /80 at this point. There are more than enough addresses available in IPv6 for any foreseeable future even with /64 subnets. Owen On Jun 6, 2012, at 7:58 AM, Chuck Church wrote:
Does anyone know the reason /64 was proposed as the size for all L2 domains? I've looked for this answer before, never found a good one. I thought I read there are some L2 technologies that use a 64 bit hardware address, might have been Bluetooth. Guaranteeing that ALL possible hosts could live together in the same L2 domain seems like overkill, even for this group. /80 would make more sense, it does match up with Ethernet MACs. Not as easy to compute, for humans nor processors that like things in 32 or 64 bit chunks however. Anyone have a definite answer?
Thanks,
Chuck
-----Original Message----- From: Jean-Francois.TremblayING@videotron.com [mailto:Jean-Francois.TremblayING@videotron.com] Sent: Wednesday, June 06, 2012 10:36 AM To: anton@huge.geek.nz Cc: NANOG list Subject: IPv6 /64 links (was Re: ipv6 book recommendations?)
Anton Smith <anton@huge.geek.nz> wrote on 06/06/2012 09:53:02 AM:
Potentially silly question but, as Bill points out a LAN always occupies a /64.
Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age old discussion about keeping broadcast segments small?
The /64 only removes the limitation on the number of *addresses* on the L2 domain. Limitations still apply for the amount of ARP and ND noise. A maximum number of hosts is reached when that noise floor represents a significant portion of the link bandwidth. If ARP/ND proxying is used, the limiting factor may instead be the CPU on the gateway.
The ND noise generated is arguably higher than ARP because of DAD, but I don't remember seeing actual numbers on this (anybody?). I've seen links with up to 15k devices where ARP represented a significant part of the link usage, but most weren't (yet) IPv6.
/JF
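A small sketch of the mapping Owen describes above (Python; the helper names and the example prefix/MAC are mine): flip the 0x02 bit of the first octet, insert ff:fe in the middle, then append the resulting 64-bit interface identifier to a /64 prefix the way SLAAC would.

import ipaddress

def eui48_to_modified_eui64(mac):
    octets = bytes(int(x, 16) for x in mac.split(":"))
    first = octets[0] ^ 0x02                                   # toggle the universal/local bit
    return bytes([first]) + octets[1:3] + b"\xff\xfe" + octets[3:6]

def slaac_address(prefix, mac):
    net = ipaddress.IPv6Network(prefix)                        # expected to be a /64
    iid = int.from_bytes(eui48_to_modified_eui64(mac), "big")
    return net[iid]

print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
# 2001:db8::211:22ff:fe33:4455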
On 06/06/2012 03:05 PM, Owen DeLong wrote:
It is because of IEEE EUI-64 standard.
It was believed at the time of IPv6 development that EUI-48 would run out of numbers and IEEE had proposed going to EUI-64. While IEEE still hasn't quite made that change (though Firewire does appear to use EUI-64 already), it will likely occur prior to the EOL for IPv6.
There is a simple algorithm used by IEEE for mapping EUI-48 onto the EUI-64 space.
The 0x02 bit of the first octet of an EUI-64 address is an L-Flag, indicating that the address was locally generated (if it is a 1) vs. IEEE/vendor assigned (if it is a 0).
The mapping process takes the EUI-48 address XX:YY:ZZ:RR:SS:TT and maps it as follows:
let AA = XX xor 0x02.
AAYY:ZZff:feRR:SSTT
ff:fe above is literal.
IPv6 was originally going to be a 32-bit address space, but, the developers
did you mean "originally going to be a 64-bit address space"...
and proponent of SLAAC convinced IETF to add 64 more bits to the IPv6 address for this purpose. Since bits are free when designing a new protocol, there really was no reason to impose such limitations.
You really don't gain anything by going to /80 at this point. There are more than enough addresses available in IPv6 for any foreseeable future even with /64 subnets.
Owen
On Jun 6, 2012, at 7:58 AM, Chuck Church wrote:
Does anyone know the reason /64 was proposed as the size for all L2 domains? I've looked for this answer before, never found a good one. I thought I read there are some L2 technologies that use a 64 bit hardware address, might have been Bluetooth. Guaranteeing that ALL possible hosts could live together in the same L2 domain seems like overkill, even for this group. /80 would make more sense, it does match up with Ethernet MACs. Not as easy to compute, for humans nor processors that like things in 32 or 64 bit chunks however. Anyone have a definite answer?
Thanks,
Chuck
-----Original Message----- From: Jean-Francois.TremblayING@videotron.com [mailto:Jean-Francois.TremblayING@videotron.com] Sent: Wednesday, June 06, 2012 10:36 AM To: anton@huge.geek.nz Cc: NANOG list Subject: IPv6 /64 links (was Re: ipv6 book recommendations?)
Anton Smith <anton@huge.geek.nz> wrote on 06/06/2012 09:53:02 AM:
Potentially silly question but, as Bill points out a LAN always occupies a /64.
Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age old discussion about keeping broadcast segments small? The /64 only removes the limitation on the number of *addresses* on the L2 domain. Limitations still apply for the amount of ARP and ND noise. A maximum number of hosts is reached when that noise floor represents a significant portion of the link bandwidth. If ARP/ND proxying is used, the limiting factor may instead be the CPU on the gateway.
The ND noise generated is arguably higher than ARP because of DAD, but I don't remember seeing actual numbers on this (anybody?). I've seen links with up to 15k devices where ARP represented a significant part of the link usage, but most weren't (yet) IPv6.
/JF
-- Stephen Clark *NetWolves* Director of Technology Phone: 813-579-3200 Fax: 813-882-0209 Email: steve.clark@netwolves.com http://www.netwolves.com
Owen DeLong wrote:
It is because of IEEE EUI-64 standard.
Right, so far.
It was believed at the time of IPv6 development that EUI-48 would run out of numbers and IEEE had proposed going to EUI-64. While IEEE still hasn't quite made that change (though Firewire does appear to use EUI-64 already), it will likely occur prior to the EOL for IPv6.
Wrong. It is because I pointed out that IEEE1394 already uses EUI-64.
Since bits are free when designing a new protocol, there really was no reason to impose such limitations.
Bits are not free. Remembering a 64 bit value is human, a 128 bit value divine, which makes IPv6 network operation hard. Masataka Ohta
On Wed, 06 Jun 2012 10:58:05 -0400, Chuck Church <chuckchurch@gmail.com> wrote:
Does anyone know the reason /64 was proposed as the size for all L2 domains?
There is one, and only one, reason for the ::/64 split: SLAAC. IPv6 is a classless addressing system. You can make your LAN ::/117 if you want to; SLAAC will not work there. The reason the requirement is (currently) 64 is to accommodate EUI-64 hardware addresses -- firewire, bluetooth, fibre channel, etc. Originally, SLAAC was designed for ethernet and its 48bit hardware address. (required LAN mask was ::/80.) The purpose wasn't to put the whole internet into one LAN. It was to make address selection "brainless", esp. for embedded systems with limited memory/cpu/etc... they can form an address by simply appending their MAC to the prefix, and be 99.99999% sure it won't be in use. (i.e. no DAD required.) However, that was optimizing a problem that never existed -- existing tiny systems of the day were never destined to have an IPv6 stack, "modern" IPv6 hardware can select an address and perform DAD efficiently in well under 1K. (which is noise vs. the size of the rest of the IPv6 stack.) SLAAC has been a flawed idea from the first letter... if for no other reason than it makes people think "64bit network + 64bit host" -- and that is absolutely wrong. (one cannot make such assumptions about networks they do not control. it's even worse when people design hardware thinking that.) --Ricky
On Jun 7, 2012, at 1:27 PM, Ricky Beam wrote:
On Wed, 06 Jun 2012 10:58:05 -0400, Chuck Church <chuckchurch@gmail.com> wrote:
Does anyone know the reason /64 was proposed as the size for all L2 domains?
There is one, and only one, reason for the ::/64 split: SLAAC. IPv6 is a classless addressing system. You can make your LAN ::/117 if you want to; SLAAC will not work there.
Nope... There's also ND and the solicited node address.
The reason the requirement is (currently) 64 is to accomodate EUI-64 hardware addresses -- firewire, bluetooth, fibre channel, etc. Originally, SLAAC was designed for ethernet and its 48bit hardware address. (required LAN mask was ::/80.) The purpose wasn't to put the whole internet into one LAN. It was to make address selection "brainless", esp. for embeded systems with limited memory/cpu/etc... they can form an address by simply appending their MAC to the prefix, and be 99.99999% sure it won't be in use. (i.e. no DAD required.) However, that was optimizing a problem that never existed -- existing tiny systems of the day were never destined to have an IPv6 stack, "modern" IPv6 hardware can select an address and perform DAD efficiently in well under 1K. (which is noise vs. the size of the rest of the IPv6 stack.)
Modern embedded IPv6 systems in short order will have IPv6 implemented in the chip, a la the WIZnet W5100 chip that is very popular for IPv4 in embedded systems and micro-controllers today.
SLAAC has been a flawed idea from the first letter... if for no other reason than it makes people think "64bit network + 64bit host" -- and that is absolutely wrong. (one cannot make such assumptions about networks they do not control. it's even worse when people design hardware thinking that.)
While one cannot assume 64+64 on networks you don't control and CIDR is the rule for IPv6, having a common 64+64 subnet size widely deployed has a number of advantages. I am interested to hear what people are using in lieu of ND and ARP on NBMA and/or BMA multipoint IPv6 networks with netmasks longer than /64. Owen
On 07/06/2012 22:27, Ricky Beam wrote:
On Wed, 06 Jun 2012 10:58:05 -0400, Chuck Church <chuckchurch@gmail.com> wrote:
Does anyone know the reason /64 was proposed as the size for all L2 domains?
There is one, and only one, reason for the ::/64 split: SLAAC. IPv6 is a classless addressing system. You can make your LAN ::/117 if you want to; SLAAC will not work there.
SLAAC could work with ::/117, but not on Ethernet and its kin. There are many other links than Ethernet and IEEE. Nothing (no RFC) prohibits SLAAC with something longer than 64, provided a means to form an Interface Identifier for that particular link is defined, i.e. a new document that specifies e.g. IPv6-over-LTE (replace LTE with something non-IEEE). Alex
The reason the requirement is (currently) 64 is to accomodate EUI-64 hardware addresses -- firewire, bluetooth, fibre channel, etc. Originally, SLAAC was designed for ethernet and its 48bit hardware address. (required LAN mask was ::/80.) The purpose wasn't to put the whole internet into one LAN. It was to make address selection "brainless", esp. for embeded systems with limited memory/cpu/etc... they can form an address by simply appending their MAC to the prefix, and be 99.99999% sure it won't be in use. (i.e. no DAD required.) However, that was optimizing a problem that never existed -- existing tiny systems of the day were never destined to have an IPv6 stack, "modern" IPv6 hardware can select an address and perform DAD efficiently in well under 1K. (which is noise vs. the size of the rest of the IPv6 stack.)
SLAAC has been a flawed idea from the first letter... if for no other reason than it makes people think "64bit network + 64bit host" -- and that is absolutely wrong. (one cannot make such assumptions about networks they do not control. it's even worse when people design hardware thinking that.)
--Ricky
I think the length of the Interface ID is 64 mostly because IEEE now works with 64bit EUI identifiers (instead of the older 48bit MAC addresses). I.e. compatibility between IEEE and IETF IPv6 would be the main reason for this Interface ID to be 64. And this is so even though there are IEEE links for which the MAC address is even shorter than 64 bits, like 802.15.4 short addresses being 16 bits. For those, an IPv6 prefix length of 112 bits would even make sense. But it's not done, because the same IEEE which says the 15.4 MAC address is 16 bits says that its EUI is 64 bits (whatever 'default' fills that out is what gets into an IPv6 address as well). The good thing is there is nothing in the IPv6 Addressing Architecture RFC that makes the Interface ID a MUST at 64 bits. It just says 'n'. What there _is_, is that when using RFC stateless address autoconfiguration (not DHCP) on Ethernet and its kin (WiFi, Bluetooth, ZigBee, more; but not USB nor LTE for example), then one must use an Interface ID of 64 bits, and consequently a network prefix length of no more than 64 bits. Alex On 06/06/2012 16:58, Chuck Church wrote:
Does anyone know the reason /64 was proposed as the size for all L2 domains? I've looked for this answer before, never found a good one. I thought I read there are some L2 technologies that use a 64 bit hardware address, might have been Bluetooth. Guaranteeing that ALL possible hosts could live together in the same L2 domain seems like overkill, even for this group. /80 would make more sense, it does match up with Ethernet MACs. Not as easy to compute, for humans nor processors that like things in 32 or 64 bit chunks however. Anyone have a definite answer?
Thanks,
Chuck
-----Original Message----- From: Jean-Francois.TremblayING@videotron.com [mailto:Jean-Francois.TremblayING@videotron.com] Sent: Wednesday, June 06, 2012 10:36 AM To: anton@huge.geek.nz Cc: NANOG list Subject: IPv6 /64 links (was Re: ipv6 book recommendations?)
Anton Smith <anton@huge.geek.nz> wrote on 06/06/2012 09:53:02 AM:
Potentially silly question but, as Bill points out a LAN always occupies a /64.
Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age old discussion about keeping broadcast segments small?
The /64 only removes the limitation on the number of *addresses* on the L2 domain. Limitations still apply for the amount of ARP and ND noise. A maximum number of hosts is reached when that noise floor represents a significant portion of the link bandwidth. If ARP/ND proxying is used, the limiting factor may instead be the CPU on the gateway.
The ND noise generated is arguably higher than ARP because of DAD, but I don't remember seeing actual numbers on this (anybody?). I've seen links with up to 15k devices where ARP represented a significant part of the link usage, but most weren't (yet) IPv6.
/JF
On Jun 19, 2012, at 8:44 AM, Alexandru Petrescu wrote:
I think, the length of Interface ID be 64 is so mostly because IEEE works now with 64bit EUI identifiers (instead of older 48bit MAC addresses). I.e. compatibility between IEEE and IETF IPv6 would be the main reason for this Interface ID to be 64.
And this is so, even though there are IEEE links for which the MAC address is even shorter than 64bit, like 802.15.4 short addresses being on 16bit. For those, an IPv6 prefix length of 112bit would even make sense. But it's not done, because same IEEE which says the 15.4 MAC address is 16bit says that its EUI is 64bit. (what 'default' fill that with is what gets into an IPv6 address as well).
It's easy to put a 16 bit value into a 64 bit bucket. It's very hard to put a 64 bit value into a 16 bit bucket. Just saying.
The good thing isthere is nothing in the RFC IPv6 Addressing Architecture that makes the Interface ID to be MUST 64bit. It just says 'n'.
What there _is_, is that when using RFC stateless addess autoconfiguration (not DHCP) and on Ethernet and its keen (WiFi, Bluetooth, ZigBee, more; but not USB nor LTE for example) then one must use Interface ID of 64bit; and consequently network prefix length of 64bit no more.
Well, there's another issue... On such a network, how would you go about doing ND? How do you construct a solicited node multicast address for such a node if it has, say, a /108 prefix? Owen
On Wed, 2012-06-06 at 10:35 -0400, Jean-Francois.TremblayING@videotron.com wrote:
The ND noise generated is arguably higher than ARP because of DAD, but I don't remember seeing actual numbers on this (anybody?). I've seen links with up to 15k devices where ARP represented a significant part of the link usage, but most weren't (yet) IPv6.
That doesn't sound right to me. a) DAD only happens when an IPv6 node is starting up. ARP happens whenever a node needs to talk to another node that it hasn't seen in while. b) DAD only goes to solicited node multicast addresses, i.e., only to those nodes that share the same last 24 bits as the target address. ARP goes to every node on the link (broadcast). c) Similarly, ND (the direct equivalent of ARP) goes only to solicited node multicast addresses, ARP goes to every node on the link. So I'm not sure how DAD traffic would exceed ARP traffic. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
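For point (b) above, a short sketch of how the solicited-node group is formed (Python; the function name and example address are mine): it is ff02::1:ff00:0/104 plus the low 24 bits of the unicast address, which is why only nodes sharing those last 24 bits ever see the solicitation.

import ipaddress

def solicited_node(addr):
    unicast = ipaddress.IPv6Address(addr)
    base = ipaddress.IPv6Address("ff02::1:ff00:0")             # solicited-node prefix (RFC 4291)
    return ipaddress.IPv6Address(int(base) | (int(unicast) & 0xFFFFFF))

print(solicited_node("2001:db8::211:22ff:fe33:4455"))
# ff02::1:ff33:4455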
On Wed, 06 Jun 2012 17:17:37 -0400, Karl Auer <kauer@biplane.com.au> wrote:
a) DAD only happens when an IPv6 node is starting up. ARP happens whenever a node needs to talk to another node that it hasn't seen in while.
DAD is a special case of ND. It happens every time the system selects an address. (i.e. startup with a non-SLAAC address, and when privacy extensions generate an address.)
b) DAD only goes to solicited node multicast addresses, i.e., only to those nodes that share the same last 24 bits as the target address. ARP goes to every node on the link (broadcast).
This assumes a network of devices that do multicast filtering, correctly. This is not a good assumption even in large enterprises. Common residential gear usually doesn't understand multicast at all. (unless you're a uverse tv customer using ethernet and paid close attention to your hardware.)
c) Similarly, ND (the direct equivalent of ARP) goes only to solicited node multicast addresses, ARP goes to every node on the link.
Effectively the same as broadcast in the IPv6 world. If everyone is running IPv6, then everyone will see the packet. (things not running ipv6 can filter it out, but odds are it'll be put on the cable.)
So I'm not sure how DAD traffic would exceed ARP traffic.
I wouldn't expect it to. Looking at the output of my 3745, it fires 3 ND's at startup and is then silent. (TWC has no IPv6 on my node, but v4 ARP broadcasts amount to ~16K/s) --Ricky
On Thu, Jun 7, 2012 at 8:42 PM, Ricky Beam <jfbeam@gmail.com> wrote:
On Wed, 06 Jun 2012 17:17:37 -0400, Karl Auer <kauer@biplane.com.au> wrote:
c) Similarly, ND (the direct equivalent of ARP) goes only to solicited node multicast addresses, ARP goes to every node on the link.
Effectively the same as broadcast in the IPv6 world. If everyone is running IPv6, then everyone will see the packet. (things not running ipv6 can filter it out, but odds are it'll be put on the cable.)
Bzzt. With ARP, every IPv4 node on the link indicates each ARP packet to the OS. With ND, only those nodes sharing the same last 24 bits of the IPv6 address indicate the packet up the stack. The rest of the IPv6 nodes filter the multicast in the NIC. Cheers, Dave Hart
On Thu, 2012-06-07 at 21:07 +0000, Dave Hart wrote:
Bzzt. With ARP, every IPv4 node on the link indicates each ARP packet to the OS. With ND, only those nodes sharing the same last 24 bits of the IPv6 address indicate the packet up the stack. The rest of the IPv6 nodes filter the multicast in the NIC.
Still not quite correct :-) The "filtering" is done by a MLD-aware switch, which will send multicast packets only to nodes that are listening to the appropriate multicast group. The filtering you describe is pretty much what ARP does - ALL nodes receive the packet, all but one ignore it. It depends on the platform whether the CPU that does the ignoring is just in the NIC or is in the node itself. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
On Thu, Jun 7, 2012 at 10:14 PM, Karl Auer <kauer@biplane.com.au> wrote:
On Thu, 2012-06-07 at 21:07 +0000, Dave Hart wrote:
Bzzt. With ARP, every IPv4 node on the link indicates each ARP packet to the OS. With ND, only those nodes sharing the same last 24 bits of the IPv6 address indicate the packet up the stack. The rest of the IPv6 nodes filter the multicast in the NIC.
Still not quite correct :-)
The "filtering" is done by a MLD-aware switch, which will send multicast packets only to nodes that are listening to the appropriate multicast group. The filtering you describe is pretty much what ARP does - ALL nodes receive the packet, all but one ignore it. It depends on the platform whether the CPU that does the ignoring is just in the NIC or is in the node itself.
Karl, you seem to fail to understand how ethernet NICs are implemented in the real world. Ignoring the optional (but common) promiscuous mode support and various offloading, IPv4 ARP is sent as ethernet broadcast and the NIC hardware and driver is in no position to filter -- it must be done by the IP stack. In contrast, ND is sent as ethernet multicast which are filtered by receivers in hardware. Whether or not the switches are smart enough to filter is an implementation decision that has no bearing on the requirement to filter in the NIC hardware. Cheers, Dave Hart
On Thu, 2012-06-07 at 22:27 +0000, Dave Hart wrote:
Karl, you seem to fail to understand how ethernet NICs are implemented in the real world. Ignoring the optional (but common) promiscuous mode support and various offloading, IPv4 ARP is sent as ethernet broadcast and the NIC hardware and driver is in no position to filter -- it must be done by the IP stack. In contrast, ND is sent as ethernet multicast which are filtered by receivers in hardware. Whether or not the switches are smart enough to filter is an implementation decision that has no bearing on the requirement to filter in the NIC hardware.
I'm the first to admit that I often don't know stuff. One good reason to be on the NANOG mailing list! But in this case... Yes - whether with ARP or ND, any node has to filter out the packets that do not apply to it (whether it's done by the NIC or the host CPU is another question, not relevant here). But in a properly switched IPv6 network, many/most ND packets do not arrive at most nodes' network interfaces at all, so those nodes have no filtering work to do. Yes, the nodes that DO get a packet - those listening on the relevant multicast group, often a solicited node multicast group - DO need to filter out the NDs that don't apply to them, but the point is that a vastly reduced number of nodes are thus inconvenienced compared to ARP. The original post posited that ND could cause as much traffic as ARP. My point is that it probably doesn't, because the ND packets will only be seen on the specific switch ports belonging to those nodes that are listening to the relevant multicast groups, and only those nodes will actually receive the ND packets. In contrast to ARP, which is broadcast, always, to all nodes, and thus goes out every switch port in the broadcast domain. This is pretty much the *point* of using multicast instead of broadcast. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
In message <1339116492.2754.162.camel@karl>, Karl Auer writes:
On Thu, 2012-06-07 at 22:27 +0000, Dave Hart wrote:
Karl, you seem to fail to understand how ethernet NICs are implemented in the real world. Ignoring the optional (but common) promiscuous mode support and various offloading, IPv4 ARP is sent as ethernet broadcast and the NIC hardware and driver is in no position to filter -- it must be done by the IP stack. In contrast, ND is sent as ethernet multicast which are filtered by receivers in hardware. Whether or not the switches are smart enough to filter is an implementation decision that has no bearing on the requirement to filter in the NIC hardware.
I'm the first to admit that I often don't know stuff. One good reason to be on the NANOG mailing list! But in this case...
Yes - whether with ARP or ND, any node has to filter out the packets that do not apply to it (whether it's done by the NIC or the host CPU is another question, not relevant here).
But in a properly switched IPv6 network, many/most ND packets do not arrive at most nodes' network interfaces at all, so those nodes have no filtering work to do. Yes, the nodes that DO get a packet - those listening on the relevant multicast group, often a solicited node multicast group - DO need to filter out the NDs that don't apply to them, but the point is that a vastly reduced number of nodes are thus inconvenienced compared to ARP.
The original post posited that ND could cause as much traffic as ARP. My point is that it probably doesn't, because the ND packets will only be seen on the specific switch ports belonging to those nodes that are listening to the relevant multicast groups, and only those nodes will actually receive the ND packets. In contrast to ARP, which is broadcast, always, to all nodes, and thus goes out every switch port in the broadcast domain.
This is pretty much the *point* of using multicast instead of broadcast.
The point of multicast is to be able to reject traffic sooner rather than later. Running IPv6 with a NIC that doesn't support several multicast addresses is a real pain, which I know from experience. It can, however, be done.
-- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
On Fri, 2012-06-08 at 11:08 +1000, Mark Andrews wrote:
This is pretty much the *point* of using multicast instead of broadcast.
The point of multicast is to be able to reject traffic sooner rather than later.
Well - yes - and my description was of how, when properly configured and on the right hardware, unwanted multicast IPv6 packets do not even reach the NIC. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
On Fri, Jun 8, 2012 at 12:48 AM, Karl Auer <kauer@biplane.com.au> wrote:
Yes - whether with ARP or ND, any node has to filter out the packets that do not apply to it (whether it's done by the NIC or the host CPU is another question, not relevant here).
It is relevant to the question of the scalability of large L2 networks. With IPv4, ARP presents not only a network capacity issue, but also a host capacity issue as every node expends software resources processing every broadcast ARP. With ND, only a tiny fraction of hosts expend any software capacity processing a given multicast packet, thanks to ethernet NIC's hardware filtering of received multicasts -- with or without multicast-snooping switches.
The original post posited that ND could cause as much traffic as ARP. My point is that it probably doesn't, because the ND packets will only be seen on the specific switch ports belonging to those nodes that are listening to the relevant multicast groups, and only those nodes will actually receive the ND packets. In contrast to ARP, which is broadcast, always, to all nodes, and thus goes out every switch port in the broadcast domain.
This is pretty much the *point* of using multicast instead of broadcast.
I agree.
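The "tiny fraction of hosts" effect follows from how a host subscribes: joining a group via setsockopt() is what causes the stack to send an MLD report and to program the NIC's multicast filter, so unsubscribed hosts drop the frame in hardware. A hedged sketch of that join call, not from the thread (Python; the interface name and group are placeholders, the constant may be spelled IPV6_ADD_MEMBERSHIP on some platforms, and for ND groups the kernel performs this join itself when an address is configured -- the sketch only shows the mechanism):

    import socket
    import struct

    GROUP = "ff02::1:ff67:89ab"                # example solicited-node group (placeholder)
    ifindex = socket.if_nametoindex("eth0")    # interface name is an assumption

    s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # struct ipv6_mreq: 16-byte group address followed by the interface index
    mreq = socket.inet_pton(socket.AF_INET6, GROUP) + struct.pack("@I", ifindex)
    s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)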
On Fri, 2012-06-08 at 03:08 +0000, Dave Hart wrote:
networks. With IPv4, ARP presents not only a network capacity issue, but also a host capacity issue as every node expends software resources processing every broadcast ARP. With ND, only a tiny fraction of hosts expend any software capacity processing a given multicast packet, thanks to ethernet NIC's hardware filtering of received multicasts -- with or without multicast-snooping switches.
So we are actually sort of agreeing. That's a relief :-) However, preventing packets getting to the NICs *at all* is a pretty big win, because even if a clever NIC can prevent a host CPU being interrupted, the packet was still wasting bandwidth on the path to the NIC. I would go so far as to say that MLD snooping makes the NIC side of things almost irrelevant. Almost :-) Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
On Thu, 2012-06-07 at 16:42 -0400, Ricky Beam wrote:
On Wed, 06 Jun 2012 17:17:37 -0400, Karl Auer wrote:
a) DAD only happens when an IPv6 node is starting up. ARP happens whenever a node needs to talk to another node that it hasn't seen in a while.
DAD is a special case of ND. It happens every time the system selects an address. (i.e. startup with non-SLAAC address, and when privacy extensions generates an address.)
Er - OK. I should have said "happens when an address is assigned to an interface". It is still, however, way less traffic than ARP, which was my point. Possible exception - a network where everyone is using privacy addresses.
b) DAD only goes to solicited node multicast addresses
This assumes a network of devices that do multicast filtering, correctly.
Yes, it does. It assumes a properly provisioned and configured IPv6 network. While that may not be common now, it will become more common. And it is a self-correcting problem - people who don't want lots of noise will implement their networks correctly, those who don't care will do as they wish. No change there :-) BTW, I'm assuming here that by "multicast filtering" you mean "switching that properly snoops on MLD and sends multicast packets only to the correct listeners".
c) Similarly, ND (the direct equivalent of ARP) goes only to solicited node multicast addresses, ARP goes to every node on the link.
Effectively the same as broadcast in the IPv6 world. If everyone is running IPv6, then everyone will see the packet. (things not running ipv6 can filter it out, but odds are it'll be put on the cable.)
On this point I think you are wrong. Except for router advertisements, most NDP packets are sent to a solicited node multicast address, and so do NOT go to all nodes. It is "the same as broadcast" only in a network with switches that do not do MLD snooping.
So I'm not sure how DAD traffic would exceed ARP traffic.
I wouldn't expect it to.
Nor would I - which was the point of my response to an original poster who said it might. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
Karl Auer <kauer@biplane.com.au> wrote on 07/06/2012 06:09:46 PM:
On this point I think you are wrong. Except for router advertisements, most NDP packets are sent to a solicited node multicast address, and so do NOT go to all nodes. It is "the same as broadcast" only in a network with switches that do not do MLD snooping.
So I'm not sure how DAD traffic would exceed ARP traffic. I wouldn't expect it to. Nor would I - which was the point of my response to an original poster who said it might.
Karl, Actually, your analysis seems fair for a normal broadcast network. It's true that DAD is fairly rare. RS and RAs are also part of ND though, but they shouldn't be a large part of the traffic. My comment was probably skewed by my perspective as an MSO. DOCSIS networks are not really broadcast in nature and the gateway (CMTS) sees all the ND traffic (ND-proxying), including DAD and RS, which can become a fair amount of traffic in some specific situations. /JF
Karl Auer wrote:
BTW, I'm assuming here that by "multicast filtering" you mean "switching that properly snoops on MLD and sends multicast packets only to the correct listeners".
Errrrr, do you want to say MLD noise is not a problem?
On this point I think you are wrong. Except for router advertisements, most NDP packets are sent to a solicited node multicast address, and so do NOT go to all nodes. It is "the same as broadcast" only in a network with switches that do not do MLD snooping.
But, MLD packets must go to all routers.
So I'm not sure how DAD traffic would exceed ARP traffic.
I wouldn't expect it to.
Nor would I - which was the point of my response to an original poster who said it might.
For the original poster, : I've seen links with up to 15k devices where ARP represented : a significant part of the link usage, but most weren't (yet) IPv6. MLD noise around a router is as bad as ARP/ND noise. That's how IPv6 along with SLAAC is totally broken. Masataka Ohta
On Tue, 2012-06-12 at 17:16 +0900, Masataka Ohta wrote:
Errrrr, do you want to say MLD noise is not a problem?
I did not say or imply that MLD noise is (or is not) a problem. I took issue with the idea that DAD traffic - the specific kind of traffic mentioned by the original poster - was likely to exceed ARP traffic.
But, MLD packets must go to all routers.
As I understand it, DAD is not MLD and does not itself cause any MLD traffic. The MLD that happens around the same time as DAD happens anyway, as the node adds itself to all-link-local-nodes and its own solicited-node-multicast group. Apart from the facts that DAD is part of NDP, and that both NDP and MLD use ICMPv6 as their transport, DAD has nothing to do with MLD. You might be right that MLD is noisy, but I don't think that has anything to do with the original discussion.
: I've seen links with up to 15k devices where ARP represented : a significant part of the link usage, but most weren't (yet) IPv6.
MLD noise around a router is as bad as ARP/ND noise.
Possibly true, but that's another discussion. And is the MLD traffic as bad *everywhere* on the link, as ARP is? I strongly suspect not, because the payoff for MLD is a lessening of traffic going to all nodes.
That's how IPv6 along with SLAAC is totally broken.
I think we have different ideas of what constitutes "totally" broken. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
Karl Auer wrote:
: I've seen links with up to 15k devices where ARP represented : a significant part of the link usage, but most weren't (yet) IPv6.
MLD noise around a router is as bad as ARP/ND noise.
Possibly true, but that's another discussion.
Then, you could have simply argued that there is no ARP problem with IPv6, because ND, not ARP, is another discussion.
That's how IPv6 along with SLAAC is totally broken.
I think we have different ideas of what constitutes "totally" broken.
It is because you avoid to face the reality of MLD. Masataka Ohta
Masataka Ohta wrote:
Karl Auer wrote:
: I've seen links with up to 15k devices where ARP represented : a significant part of the link usage, but most weren't (yet) IPv6.
MLD noise around a router is as bad as ARP/ND noise.
Possibly true, but that's another discussion.
Then, you could have simply argued that there is no ARP problem with IPv6, because ND, not ARP, were another discussion.
That's how IPv6 along with SLAAC is totally broken.
I think we have different ideas of what constitutes "totally" broken.
It is because you avoid to face the reality of MLD.
MLD != ND
MLD == IGMP
ND ~= ARP

ND is less overhead on end systems than ARP because it is only received by nodes that are subscribed to a specific multicast group rather than broadcast reception by all. There is no difference in L2 resolution traffic at the packet level on the network. There are multicast join messages for groups specific to ND use, but those should not be frequent, and were a specific tradeoff in minor additional network load to reduce significant end system load. There are DAD messages that impact group members, but in IPv4 there are gratuitous ARP broadcasts which impact all nodes, so while the number of messages for that function is the same, the system-wide impact is much lower. Multicast group management is inherently noisy, but a few more bits on the wire reduces the load on the significantly larger number of end systems. Get over it ... Tony
Tony Hain wrote:
It is because you avoid to face the reality of MLD.
MLD != ND MLD == IGMP
OK.
ND ~= ARP
Wrong, because ND requires MLD while ARP does not.
ND is less overhead on end systems than ARP
Today, overhead in time is more serious than that in processor load. As ND requires MLD and DAD, overhead in time when addresses are assigned is very large (several seconds or more if multicast is not very reliable), which is harmful especially for quickly moving mobile hosts.
because it is only received by nodes that are subscribed to a specific multicast group rather than broadcast reception by all.
Broadcast reception by all is good because that's how ARP can detect duplicated addresses without DAD overhead in time.
Multicast group management is inherently noisy,
Thus, IPv6 is inherently noisy while IPv4 is not.
but a few more bits on the wire reduces the load on the significantly larger number of end systems. Get over it ...
First of all, with the CATENET model, there is no significantly large number of end systems in a link. Secondly, even if there is a significantly large number of end systems in a link, with the end to end principle, network equipment must be dumb while end systems must be intelligent, which means MLD snooping is unnecessary and end systems must take care of themselves, violation of which results in inefficiencies and incompleteness of ND. Masataka Ohta
Masataka Ohta wrote:
Tony Hain wrote:
It is because you avoid to face the reality of MLD.
MLD != ND MLD == IGMP
OK.
ND ~= ARP
Wrong, because ND requires MLD while ARP does not.
Note the ~ ... And ARP requires media level broadcast, which ND does not. Not all media support broadcast.
ND is less overhead on end systems than ARP
Today, overhead in time is more serious than that in processor load.
As ND requires MLD and DAD, overhead in time when addresses are assigned is very large (several seconds or more if multicast is not very reliable), which is harmful especially for quickly moving mobile hosts.
So leveraging broadcast is why just about every implementation does a gratuitous ARP-and-wait multiple times, which is no different than DAD timing? MLD does not need to significantly increase time for address assignment. If hosts are moving quickly the fabric needs to be able to keep up with that anyway, so adding a new multicast member needs to be fast independent of IPv6 address assignment.
because it is only received by nodes that are subscribed to a specific multicast group rather than broadcast reception by all.
Broadcast reception by all is good because that's how ARP can detect duplicated addresses without DAD overhead in time.
BS ... Broadcasts are dropped all the time, so some nodes miss them and they need to be repeated which causes further delay. On top of that, the widespread practice of a gratuitous ARP was the precedent for the design of DAD.
Multicast group management is inherently noisy,
Thus, IPv6 is inherently noisy while IPv4 is not.
but a few more bits on the wire reduces the load on the significantly larger number of end systems. Get over it ...
First of all, with the CATENET model, there is no significantly large number of end systems in a link.
Clearly you have never looked at some networks with > 64k nodes on a link. Not all nodes move, and not all networks are a handful of end systems per segment.
Secondly, even if there is a significantly large number of end systems in a link, with the end to end principle, network equipment must be dumb while end systems must be intelligent, which means MLD snooping is unnecessary and end systems must take care of themselves, violation of which results in inefficiencies and incompleteness of ND.
MLD snooping was a recent addition to deal with intermediate network devices that want to insert themselves into a process that was designed to bypass them. That is not a violation of the end systems taking care of themselves, it is an efficiency issue some devices chose to assert that isn't strictly required for end-to-end operation. Just because you have never liked the design choices and tradeoffs made in developing IPv6 doesn't make them wrong. I don't know anybody that is happy with all aspects of the process, but that is also true for all the bolt-on's developed to keep IPv4 running over the last 30 years. IPv4 had its day, and it is time to move on. Continuing to complain about existing IPv6 design does nothing productive. If there are constructive suggestions to make the outcome better, take them to the IETF just like all the constructive changes made to IPv4. Tony
Tony Hain wrote:
Note the ~ ... And ARP requires media level broadcast, which ND does not.
Any multicast capable link is broadcast capable.
Not all media support broadcast.
A fundamental misunderstanding of the people who designed IPv6 is that they believed ATM to be multicast capable but not broadcast capable.
As ND requires MLD and DAD, overhead in time when addresses are assigned is very large (several seconds or more if multicast is not very reliable), which is harmful especially for quickly moving mobile hosts.
So leveraging broadcast is why just about every implementation does a gratuitous ARP-and-wait multiple times,
Not at all. IPv4 over something does not have to be ARP. IPv6 is broken in eventually requiring all links to use ND, even though ND was designed for stationary hosts with only Ethernet, PPP and ATM (with a lot of misunderstanding) in mind.
MLD does not need to significantly increase time for address assignment.
That DAD latency is already too bad does not validate additional latency of MLD.
If hosts are moving quickly the fabric needs to be able to keep up with that anyway, so adding a new multicast member needs to be fast independent of IPv6 address assignment.
If only IPv6 over each link type were defined to reflect link-specific properties. Instead, the universal timing specification of ND and MLD, which ignores the variety of links in the world, makes it impossible to be fast.
BS ... Broadcasts are dropped all the time,
On Ethernet, broadcast is as reliable as unicast.
MLD snooping was a recent addition
MLD snooping ~= IGMP snooping.
it is an efficiency issue some devices chose to assert that isn't strictly required for end-to-end operation.
There certainly are many problems, including but not limited to efficiency ones, caused by ND ignoring the end to end principle to make routers more intelligent than hosts, against which MLD snooping could be a half solution. But, so what?
Just because you have never liked the design choices and tradeoffs made in developing IPv6 doesn't make them wrong.
It is the ignorance of the end to end principle which makes IPv6 wrong.
Continuing to complain about existing IPv6 design does nothing productive.
Insisting on broken IPv6 design does nothing productive.
If there are constructive suggestions to make the outcome better, take them to the IETF just like all the constructive changes made to IPv4.
IPv6 is a proof that IETF has lost the power to make the world better. Masataka Ohta
On Jun 12, 2012, at 4:24 PM, Masataka Ohta wrote:
Tony Hain wrote:
Note the ~ ... And ARP requires media level broadcast, which ND does not.
Any multicast capable link is broadcast capable.
BZZT! but thank you for playing. Many NBMA topologies support multicast.
Not all media support broadcast.
A fundamental misunderstanding of the people who designed IPv6 is that they believed ATM to be multicast capable but not broadcast capable.
This is, in fact, true. Yes, you can synthesize ATM broadcast-like behavior, but it is not broadcast.
As ND requires MLD and DAD, overhead in time when addresses are assigned is very large (several seconds or more if multicast is not very reliable), which is harmful especially for quickly moving mobile hosts.
So leveraging broadcast is why just about every implementation does a gratuitous ARP-and-wait multiple times,
Not at all. IPv4 over something does not have to be ARP.
IPv4 over anything requires some form of L2 address resolution in any case where L2 addresses must be discovered.
IPv6 is broken in eventually requiring all links to use ND, even though ND was designed for stationary hosts with only Ethernet, PPP and ATM (with a lot of misunderstanding) in mind.
Not really.
BS ... Broadcasts are dropped all the time,
On Ethernet, broadcast is as reliable as unicast.
BS.
Just because you have never liked the design choices and tradeoffs made in developing IPv6 doesn't make them wrong.
It is the ignorance of the end to end principle which makes IPv6 wrong.
End-to-end is significantly more broken in IPv4 because of the need for NAT than it is in IPv6. IIRC, you were the one promoting even more borked forms of NAT to try and compensate for this.
If there are constructive suggestions to make the outcome better, take them to the IETF just like all the constructive changes made to IPv4.
IPv6 is a proof that IETF has lost the power to make the world better.
IPv6 is quite a bit better than IPv4 in many ways. It could be better still, but, it is definitely superior to current IPv4 implementations and vastly superior to the IPv4 implementations that existed when IPv6 was designed. Owen
Owen DeLong wrote:
Any multicast capable link is broadcast capable.
BZZT! but thank you for playing.
Many NBMA topologies support multicast.
When you specify a "link" as a small subset of an NBMA network, it is broadcast capable, as was demonstrated by the history of CLIP. If you want to have a "link" in NBMA so large that broadcast is not practical, multicast control messages badly implode.
So leveraging broadcast is why just about every implementation does a gratuitous ARP-and-wait multiple times,
Not at all. IPv4 over something does not have to be ARP.
IPv4 over anything requires some form of L2 address resolution in any case where L2 addresses must be discovered.
For a mobile link around a base station, during link set-up, the base station and mobile hosts learn each other's MAC addresses. The base station can (or, in the case of hidden terminals, must) relay packets between mobile hosts attached to it. No ARP nor ARP-and-wait is necessary.
IPv6 is broken in eventually requiring all links to use ND, even though ND was designed for stationary hosts with only Ethernet, PPP and ATM (with a lot of misunderstanding) in mind.
Not really.
I know it happened within the WG discussions.
End-to-end is significantly more broken in IPv4 because of the need for NAT than it is in IPv6.
More? So, even you think IPv6 is more or less broken.
IIRC, you were the one promoting even more borked forms of NAT to try and compensate for this.
I just need a UPnP capable NAT to restore the end to end transparency.
IPv6 is quite a bit better than IPv4 in many ways. It could be better still, but, it is definitely superior to current IPv4 implementations and vastly superior to the IPv4 implementations that existed when IPv6 was designed.
That is commonly heard propaganda. However, these days, few believe it. Actually, in this thread, your statement has been shown to be untrue w.r.t. the amount of noise on link bandwidth on large links. Masataka Ohta
On Wed, Jun 13, 2012 at 4:23 AM, Masataka Ohta wrote:
I just need a UPnP capable NAT to restore the end to end transparency.
You're not restoring transparency, you're restoring communication after stateful reconfiguration of the network for each service. It is not transparent when you have to negotiate an inbound path for each service. Even for apps that work today through local NATs, the future is dim. Increasing use of carrier NAT will force apps to additionally try Port Control Protocol to overcome evolving IPv4 brokenness. UPnP is inadequate for carrier NAT due to its model assuming the NAT trusts its clients. When TCP headers are being rewritten, it's a strong hint that transparency has been lost, even if some communication remains possible. Cheers, Dave Hart
Dave Hart wrote:
It is not transparent when you have to negotiate an inbound path for each service.
I mean, for applications, global address and global port numbers are visible.
UPnP is inadequate for carrier NAT due to its model assuming the NAT trusts its clients.
UPnP gateway configured with purely static port mapping needs no security. Assuming shared global address of 131.112.32.132, TCP/UDP port 100 to 199 may be forwarded to port 100 to 199 of 192.168.1.1, port 200 to 299 be forwarded to port 200 to 299 of 192.168.1.2, ...
When TCP headers are being rewritten, it's a strong hint that transparency has been lost, even if some communication remains possible.
UPnP provides information for clients to restore IP and TCP headers from local ones back to global ones, which is visible to applications. See the following protocol stack.

     UPnP capable NAT GW                                      Client
                                                            +---------+
                                                            | public  |
                                                            | appli-  |
                                                            | cation  |
              information                                   +---------+
 +------+     for reverse translation                       | public  |
 | UPnP |-------------------------------------------------->|transport|
 +---------+---------+                                      +---------+
 | public  | private |                                      | private |
 |transport|transport|                                      |transport|
 +---------+---------+              +---------+             +---------+
 | public  | private |              | private |             | private |
 |   IP    |   IP    |              |   IP    |             |   IP    |
 +---------+-----------------------+-----------------------+---------+
           |   private datalink    |   private datalink    |
           +-----------------------+-----------------------+

Masataka Ohta
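For readers trying to picture the static partitioning described a few paragraphs up, here is a toy sketch of the lookup such a gateway would perform, not from the thread (Python; the addresses and ranges are the ones from the example, everything else is invented, and this is not a NAT implementation):

    PORTS_PER_BLOCK = 100
    CUSTOMERS = {1: "192.168.1.1", 2: "192.168.1.2"}   # block 1 = ports 100-199, block 2 = ports 200-299, ...

    def inside_host_for(global_port):
        """Map a port on the shared global address 131.112.32.132 to the inside host."""
        host = CUSTOMERS.get(global_port // PORTS_PER_BLOCK)
        if host is None:
            return None                                 # port range not delegated to any customer
        return host, global_port                        # the port number itself is preserved

    print(inside_host_for(150))   # ('192.168.1.1', 150)
    print(inside_host_for(250))   # ('192.168.1.2', 250)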
On Jun 12, 2012, at 10:47 PM, Masataka Ohta wrote:
Dave Hart wrote:
It is not transparent when you have to negotiate an inbound path for each service.
I mean, for applications, global address and global port numbers are visible.
Showing that you don't actually understand what everyone else means when they say "end-to-end".
UPnP is inadequate for carrier NAT due to its model assuming the NAT trusts its clients.
UPnP gateway configured with purely static port mapping needs no security.
Assuming shared global address of 131.112.32.132, TCP/UDP port 100 to 199 may be forwarded to port 100 to 199 of 192.168.1.1, port 200 to 299 be forwarded to port 200 to 299 of 192.168.1.2, ...
No carrier is going to implement that for obvious reasons. Besides, that's not transparent end-to-end, that's predictably opaque end-to-end.
When TCP headers are being rewritten, it's a strong hint that transparency has been lost, even if some communication remains possible.
UPnP provides information for clients to restore IP and TCP headers from local ones back to global ones, which is visible to applications.
But it doesn't work across multiple layers of NAT.
See the following protocol stack.
     UPnP capable NAT GW                                      Client
                                                            +---------+
                                                            | public  |
                                                            | appli-  |
                                                            | cation  |
              information                                   +---------+
 +------+     for reverse translation                       | public  |
 | UPnP |-------------------------------------------------->|transport|
 +---------+---------+                                      +---------+
 | public  | private |                                      | private |
 |transport|transport|                                      |transport|
 +---------+---------+              +---------+             +---------+
 | public  | private |              | private |             | private |
 |   IP    |   IP    |              |   IP    |             |   IP    |
 +---------+-----------------------+-----------------------+---------+
           |   private datalink    |   private datalink    |
           +-----------------------+-----------------------+
Now, redraw the diagram for the real world scenario:

host <-> UPnP NAT <-> Carrier NAT <-> Internet <-> Carrier NAT <-> UPnP NAT <-> host

Tell me again how the application signaling from UPnP survives through all that and comes up with correct answers? Yeah, thought so. Owen
Owen DeLong wrote:
Showing that you don't actually understand what everyone else means when they say "end-to-end".
Where is your point, other than to demonstrate that you don't understand what "end to end" means?
No carrier is going to implement that for obvious reasons.
Besides, that's not transparent end-to-end, that's predictably opaque end-to-end.
With no reasoning from you, I can simply say: WRONG
UPnP provides information for clients to restore IP and TCP headers from local ones back to global ones, which is visible to applications.
But it doesn't work across multiple layers of NAT.
It is trivially easy to make UPnP work across multiple layers of UPnP capable NAT.
Now, redraw the diagram for the real world scenario:
host <-> UPnP NAT <-> Carrier NAT <-> Internet <-> Carrier NAT <-> UPnP NAT <-> host
Tell me again how the application signaling from UPnP survives through all that and comes up with correct answers?
It is trivially:

host <-> home UPnP NAT <-> Carrier UPnP NAT <-> Internet <-> Carrier UPnP NAT <-> home UPnP NAT <-> host

Masataka Ohta
On Tue, 2012-06-19 at 22:28 +0900, Masataka Ohta wrote:
It is trivially:
host <-> home UPnP NAT <-> Carrier UPnP NAT <-> Internet <-> Carrier UPnP NAT <-> home UPnP NAT <-> host
"Trivially"? I think this looks much nicer: host <-> Internet <-> host The way it used to be before NAT, and the way, with IPv6, it can be again. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
Karl Auer wrote:
host <-> home UPnP NAT <-> Carrier UPnP NAT <-> Internet <-> Carrier UPnP NAT <-> home UPnP NAT <-> host
"Trivially"? I think this looks much nicer:
host <-> Internet <-> host
Yes, if only the Internet were uniform. However, compared to:

V6 capable host <-> Internet <-> V6 incapable home router <-> Internet <->
  V6 capable 6/4 tunnel end point <-> Internet <->
  V6 capable 6/4 tunnel end point <-> Internet <->
  V6 incapable home router <-> Internet <-> host

which can often be:

V6 capable host <-> Internet <-> V6 incapable home router <-> Internet <->
  V6 *INCAPABLE* 6/4 tunnel end point <-> Internet <->
  V6 capable 6/4 tunnel end point <-> Internet <->
  V6 incapable home router <-> Internet <-> host
host <-> home UPnP NAT <-> Carrier UPnP NAT <-> Internet <-> Carrier UPnP NAT <-> home UPnP NAT <-> host
is just trivial and uniform.
The way it used to be before NAT, and the way, with IPv6, it can be again.
With IPv6, see above. Masataka Ohta
On Wed, 13 Jun 2012 14:47:35 +0900, Masataka Ohta said:
Dave Hart wrote:
is inadequate for carrier NAT due to its model assuming the NAT trusts its clients.
UPnP gateway configured with purely static port mapping needs no security.
Assuming shared global address of 131.112.32.132, TCP/UDP port 100 to 199 may be forwarded to port 100 to 199 of 192.168.1.1, port 200 to 299 be forwarded to port 200 to 299 of 192.168.1.2,
And you tell the rest of the world that customer A's SMTP port is on 125, and B's is on 225, and Z's is up at 2097, how? (Hint - we haven't solved that problem for NAT yet, it's one of the big reasons that NAT breaks stuff) (Totally overlooking the debugging issues that arise when a customer tries to run a combination of applications that in aggregate have 101 ports open..)
valdis.kletnieks@vt.edu wrote:
And you tell the rest of the world that customer A's SMTP port is on 125, and B's is on 225, and Z's is up at 2097, how?
How? In draft-ohta-e2e-nat-00.txt, I already wrote: A server port number different from well known ones may be specified through mechanisms to specify an address of the server, which is the case of URLs. However, port numbers for DNS and SMTP are, in general, implicitly assumed by DNS and are not changeable. When an ISP operate a NAT gateway, the ISP should, for fairness between customers, reserve some well know port numbers and assign small port numbers evenly to all the customers. Or, a NAT gateway may receive packets to certain ports and behave as an application gateway to end hosts, if request messages to the server contains information, such as domain names, which is the case with DNS, SMTP and HTTP, to demultiplex the request messages to end hosts. However, for an ISP operating the NAT gateway, it may be easier to operate independent servers at default port for DNS, SMTP, HTTP and other applications for their customers than operating application relays.
(Hint - we haven't solved that problem for NAT yet, it's one of the big reasons that NAT breaks stuff)
As you can see, there is no such problem.
(Totally overlooking the debugging issues that arise when a customer tries to run a combination of applications that in aggregate have 101 ports open..)
The applications are broken if they can't handle a temporary EAGAIN error when trying to use the 101st port. Unlike legacy NAT, where no error can be returned for a failed port allocation, end to end NAT can take care of the situation. Masataka Ohta
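A toy sketch of the name-based demultiplexing idea quoted from the draft, for the HTTP case only, where the Host header carries the name, not from the thread (Python; hostnames and inside addresses are invented, and the follow-up below explains why the same trick is much harder for SMTP and TLS):

    BACKENDS = {
        "www.customer-a.example": ("192.168.1.1", 80),
        "www.customer-b.example": ("192.168.1.2", 80),
    }

    def route_http(request_head):
        """Pick the inside host to relay to, based on the Host header of an HTTP request."""
        for line in request_head.split(b"\r\n")[1:]:
            if line.lower().startswith(b"host:"):
                name = line.split(b":", 1)[1].strip().decode()
                return BACKENDS.get(name)
        return None

    print(route_http(b"GET / HTTP/1.1\r\nHost: www.customer-a.example\r\n\r\n"))
    # ('192.168.1.1', 80)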
On Tue, 19 Jun 2012 22:21:11 +0900, Masataka Ohta said:
Or, a NAT gateway may receive packets to certain ports and behave as an application gateway to end hosts, if request messages to the server contains information, such as domain names, which is the case with DNS, SMTP and HTTP, to demultiplex the request messages to end
For SMTP, you'll have already consumed the 3 packet handshake and the EHLO, MAIL FROM, and at least one RCPT TO before you know which end host to demultiplex to (and even then, you may not unless the end hosts are running a DNS that advertises MX's with the NAT'ed IP in them). At that point, you have little choice but to then start up a conversation with the end host and relay the EHLO/MAIL FROM/RCPT TO and hope to heck that the end host doesn't reply differently to you than you did to the other end (in particular, you had to respond to the EHLO with a list of extensions supported - if you said you supported an extension that the end system doesn't actually have, you get to do fixups on the fly as you continue the MITM). And some things, like ssh or anything that uses OpenSSL, you'll have a very hard time because you need to respond with the right certificate or key, which you don't have.
hosts. However, for an ISP operating the NAT gateway, it may be easier to operate independent servers at default port for DNS, SMTP, HTTP and other applications for their customers than operating application relays.
So you're admitting that the NAT breaks things badly enough at the ISP level that running a forwarding ALG is easier than actually making the NAT work.
(Hint - we haven't solved that problem for NAT yet, it's one of the big reasons that NAT breaks stuff)
As you can see, there is no such problem.
You haven't actually *deployed* your solution in a production environment, have you?
valdis.kletnieks@vt.edu wrote:
hosts. However, for an ISP operating the NAT gateway, it may be easier to operate independent servers at default port for DNS, SMTP, HTTP and other applications for their customers than operating application relays.
So you're admitting that the NAT breaks things badly enough at the ISP level that running a forwarding ALG is easier than actually making the NAT work.
No, I don't. I just wrote that, if servers' port numbers are not changeable, which has nothing to do with NAT, ISPs or someone else can run servers, not ALGs. It's like operating a server for whois, back when whois commands had a hard-coded IP address of the server. Note that, at that time, the Internet was completely transparent, so your argument has nothing to do with transparency.
(Hint - we haven't solved that problem for NAT yet, it's one of the big reasons that NAT breaks stuff)
As you can see, there is no such problem.
You haven't actually *deployed* your solution in a production environment, have you?
Because we still have enough IPv4 addresses, because most users are happy with legacy NAT and because some people love legacy NAT, there is not much commercial motivation. However, it does not invalidate end to end NAT as a counter argument against people insisting that IPv6 is so transparent, while a lot of legacy NAT is used by people who love it. That is, end to end transparency can not be a reason to insist on IPv6. Masataka Ohta
On Wed, Jun 20, 2012 at 8:44 AM, Masataka Ohta wrote:
Because we still have enough IPv4 addresses, because most users are happy with legacy NAT and because some people love legacy NAT, there is not much commercial motivation.
Sure, there are folks out there who believe NAT gives them benefits. Some are actually sane (small multihomers avoiding BGP). You stand out as insane for attempting to redefine "transparent" to mean "inbound communication is possible after negotiation with multiple levels of NAT".
However, it does not invalidate end to end NAT as a counter argument against people insisting that IPv6 is so transparent, while a lot of legacy NAT is used by people who love it.
That is, end to end transparency can not be a reason to insist on IPv6.
It certainly is, for those of us not arguing by redefinition. Cheers, Dave Hart
Dave Hart wrote:
Sure, there are folks out there who believe NAT gives them benefits. Some are actually sane (small multihomers avoiding BGP).
They are sane, because there is no proper support for multiple addresses (as is demonstrated by a host with a v4 and a v6 address) nor for automatic renumbering, with either v4 or v6. Here, v6 is guilty of a lack of transparency, because these are the promised features of v6. But there are people, including me, still working on them both with v4 and v6, and we know they are not very hard problems.
You stand out as insane for attempting to redefine "transparent" to mean "inbound communication is possible
I just say it is as transparent as hosts directly connected to the Internet with port based routing, such as RSIP [RFC3102] hosts: : Abstract : This document examines the general framework of Realm Specific IP : (RSIP). RSIP is intended as a alternative to NAT in which the end- : to-end integrity of packets is maintained. We focus on : implementation issues, deployment scenarios, and interaction with : other layer-three protocols. and despite the IESG note on it, RSIP is transparent to IPsec if SPIs are regarded as port numbers.
after negotiation with multiple levels of NAT".
It will be necessary only with what is, according to your definition, an insane configuration with multiple levels of NAT.
That is, end to end transparency can not be a reason to insist on IPv6.
It certainly is, for those of us not arguing by redefinition.
The problem is that you are arguing against non-existent redefinitions. Masataka Ohta
----- Original Message -----
From: "Dave Hart" <davehart@gmail.com>
Sure, there are folks out there who believe NAT gives them benefits. Some are actually sane (small multihomers avoiding BGP). You stand out as insane for attempting to redefine "transparent" to mean "inbound communication is possible after negotiation with multiple levels of NAT".
However, it does not invalidate end to end NAT as a counter argument against people insisting that IPv6 is so transparent, while a lot of legacy NAT is used by people who love it.
That is, end to end transparency can not be a reason to insist on IPv6.
It certainly is, for those of us not arguing by redefinition.
Ah, you're on the "I should be required to allow direct outside connection to my interior machines if I want to be connected to the Internet" crowd. Got it. Cheers, -- jra -- Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274
On Wed, Jun 20, 2012 at 11:05 PM, Jay Ashworth <jra@baylink.com> wrote:
----- Original Message -----
From: "Dave Hart" <davehart@gmail.com>
Sure, there are folks out there who believe NAT gives them benefits. Some are actually sane (small multihomers avoiding BGP). You stand out as insane for attempting to redefine "transparent" to mean "inbound communication is possible after negotiation with multiple levels of NAT".
However, it does not invalidate end to end NAT as a counter argument against people insisting that IPv6 is so transparent, while a lot of legacy NAT is used by people who love it.
That is, end to end transparency can not be a reason to insist on IPv6.
It certainly is, for those of us not arguing by redefinition.
Ah, you're on the "I should be required to allow direct outside connection to my interior machines if I want to be connected to the Internet" crowd.
Not quite. I'd go for "I should be able to permit direct outside connection to my interior machines via stable IPv6 prefix, or it's not really the Internet to me." Packet filter to your heart's content. 1:1 NAT your clients if you believe breaking connectivity is in your interest. Cheers, Dave Hart
On Wed, 2012-06-20 at 19:05 -0400, Jay Ashworth wrote:
Ah, you're on the "I should be required to allow direct outside connection to my interior machines if I want to be connected to the Internet" crowd.
Speaking for myself, I'm one of the "if I want to allow direct outside connection to my interior machines I should be able to" crowd. And also one of the "if I and someone else want to connect our hosts directly we should be able to" crowd. That is, I don't want the architecture of the Internet to be crippled by NAT everywhere. If you want to NAT *your* network, go for it. As a local stalwart is wont to say, "I encourage my competitors to do that" ;-) Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
On Wed, 2012-06-20 at 19:05 -0400, Jay Ashworth wrote: That is, I don't want the architecture of the Internet to be crippled by NAT everywhere. If you want to NAT *your* network, go for it.
in this case, an air gap might be encouraged randy
Karl Auer wrote:
Speaking for myself, I'm one of the "if I want to allow direct outside connection to my interior machines I should be able to" crowd.
While "direct" and "interior" are not compatible that you actually mean some indirections... Anyway, what if, your ISP assigns a globally unique IPv4 address to your home router (a NAT box) which is UPnP capable? That's what the largest retail ISP in Japan is doing. Masataka Ohta
On Thu, 2012-06-21 at 21:04 +0900, Masataka Ohta wrote:
Karl Auer wrote:
Speaking for myself, I'm one of the "if I want to allow direct outside connection to my interior machines I should be able to" crowd.
While "direct" and "interior" are not compatible that you actually mean some indirections...
I am a native English speaker, and I actually meant exactly what I actually wrote. I have found "direct" and "interior" to be completely compatible. Regards, K. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@biplane.com.au) http://www.biplane.com.au/kauer GPG fingerprint: AE1D 4868 6420 AD9A A698 5251 1699 7B78 4EEE 6017 Old fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
Owen DeLong wrote:
Does not scale. Not enough IPv4 addresses to do that for 6.8 billion people on the planet.
It is the first step to have the RSIP style transparent Internet. The second step is to use port numbers for routing within ISPs. But, it is not necessary today.
What if my ISP just routes my /48? Seems to work quite well, actually.
Unlike IPv4 with natural boundary of /24, routing table explosion of IPv6 is a serious scalability problem. Masataka Ohta
On Jun 21, 2012, at 4:40 PM, Masataka Ohta wrote:
Owen DeLong wrote:
Does not scale. Not enough IPv4 addresses to do that for 6.8 billion people on the planet.
It is the first step to have the RSIP style transparent Internet.
The second step is to use port numbers for routing within ISPs. But, it is not necessary today.
Still doesn't scale. 40 bits isn't enough to uniquely identify a conversation end-point. If you use port numbers for routing, you don't have enough port numbers for conversation IDs.
What if my ISP just routes my /48? Seems to work quite well, actually.
Unlike IPv4 with natural boundary of /24, routing table explosion of IPv6 is a serious scalability problem.
Solvable. IPv6 has enough bits that we can use map/encap or other various forms of hierarchical overlay ASN-based routing to resolve those issues over time. Owen
Owen DeLong wrote:
It is the first step to have the RSIP style transparent Internet.
The second step is to use port numbers for routing within ISPs. But, it is not necessary today.
Still doesn't scale. 40 bits isn't enough to uniquely identify a conversation end-point.
It's 48 bits.
If you use port numbers for routing, you don't have enough port numbers for conversation IDs.
That you use IPv4 addresses for routing does not make them unusable for identification. Moreover, it is easy to have a transport protocol with 32-bit or 48-bit port numbers in an end to end fashion, only by modifying the end parts of the Internet.
Unlike IPv4 with natural boundary of /24, routing table explosion of IPv6 is a serious scalability problem.
Solvable.
It was solvable.
IPv6 has enough bits that we can use map/encap or other various forms of hierarchical overlay ASN-based routing to resolve those issues over time.
The reality is that the situation has been worsening over time. As RFC 2374 was obsoleted long ago, it is now impossible to restore it. Masataka Ohta
On Fri, 22 Jun 2012 08:40:02 +0900, Masataka Ohta said:
Owen DeLong wrote:
What if my ISP just routes my /48? Seems to work quite well, actually.
Unlike IPv4 with natural boundary of /24, routing table explosion of IPv6 is a serious scalability problem.
Do you have any *realistic* and *actual* reason to suspect that the IPv6 routing table will "explode" any further than the IPv4 has already? Hint - Owen's /48 will just get aggregated and announced just like the cable companies *already* aggregate all those /20s of customer /32s. Unless Owen multihomes - at which point he's a new entry in the v6 routing tables - but *also* almost certainly a new entry in the v4 routing table. Routing table size depends on the number of AS's, not the amount of address space the routes cover.
On Jun 21, 2012, at 5:36 PM, valdis.kletnieks@vt.edu wrote:
On Fri, 22 Jun 2012 08:40:02 +0900, Masataka Ohta said:
Owen DeLong wrote:
What if my ISP just routes my /48? Seems to work quite well, actually.
Unlike IPv4 with natural boundary of /24, routing table explosion of IPv6 is a serious scalability problem.
Do you have any *realistic* and *actual* reason to suspect that the IPv6 routing table will "explode" any further than the IPv4 has already? Hint - Owen's /48 will just get aggregated and announced just like the cable companies *already* aggregate all those /20s of customer /32s. Unless Owen multihomes - at which point he's a new entry in the v6 routing tables - but *also* almost certainly a new entry in the v4 routing table. Routing table size depends on the number of AS's, not the amount of address space the routes cover.
Um, unlikely. My /48 is an ARIN direct assignment: 2620:0:930::/48 It's not really aggregable with their other customers. I do multihome and I am one entry in the v6 routing tables. However, I'm actually two entries in the v4 routing table. 192.159.10.0/24 and 192.124.40.0/23. Owen
valdis.kletnieks@vt.edu wrote:
Unlike IPv4 with natural boundary of /24, routing table explosion of IPv6 is a serious scalability problem.
Do you have any *realistic* and *actual* reason to suspect that the IPv6 routing table will "explode" any further than the IPv4 has already?
That's not the point. The problem is that SRAMs scale well but CAMs do not. Masataka Ohta
On Fri, 22 Jun 2012, Masataka Ohta wrote:
Unlike IPv4 with natural boundary of /24, routing table explosion of IPv6 is a serious scalability problem.
I really don't see where you're getting that from. The biggest consumers of IPv4 space in the US tended to get initial IPv6 blocks from ARIN that were large enough to accommodate their needs for some time. One large v6 prefix in the global routing table is more efficient in terms of the impact on the global routing table than the patchwork of IPv4 blocks those same providers needed to get over time to accommodate growth. Those 'green-field' deployments of IPv6, coupled with the sparse allocation model that the RIRs seem to be using, will do a lot to keep v6 routing table growth in check. I see periodic upticks in the growth of the global v6 routing table (a little over 9k prefixes at the moment - the v4 global view is about 415k prefixes right now), which I would reasonably attribute to an upswing in networks getting initial assignments. If anything, I see more of a chance for the v4 routing table to grow more out of control, as v4 blocks get chopped up into smaller and smaller pieces in an ultimately vain effort to squeeze a little more mileage out of IPv4. jms
Justin M. Streiner wrote:
I see periodic upticks in the growth of the global v6 routing table (a little over 9k prefixes at the moment - the v4 global view is about 415k prefixes right now), which I would reasonably attribute to an upswing in networks getting initial assignments.
As I already wrote: : That's not the point. The problem is that SRAMs scale well but : CAMs do not. it is a lot more difficult to quickly look up 1M routes with /48 than 2M routes with /24.
If anything, I see more of a chance for the v4 routing table to grow more out of control, as v4 blocks get chopped up into smaller and smaller pieces in an ultimately vain effort to squeeze a little more mileage out of IPv4.
The routing table grows mostly because of multihoming, regardless of whether it is v4 or v6. The only solution is, IMO, to let multihomed sites have multiple prefixes inherited from their upper ISPs, still keeping the sites' ability to control loads between incoming multiple links. Masataka Ohta
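As an aside on the lookup-cost point above: a toy longest-prefix-match over a hash table per prefix length, roughly what a software (SRAM) lookup path does when it cannot use a TCAM, which is where the /48-versus-/24 key-width argument comes from. Not from the thread (Python; prefixes and next hops are invented):

    import ipaddress

    class Table:
        def __init__(self):
            self.by_len = {}                           # prefix length -> {network int: next hop}

        def add(self, prefix, nexthop):
            net = ipaddress.ip_network(prefix)
            self.by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = nexthop

        def lookup(self, addr):
            a = ipaddress.ip_address(addr)
            bits = a.max_prefixlen                     # 32 for IPv4, 128 for IPv6
            for plen in sorted(self.by_len, reverse=True):   # try longest prefixes first
                key = (int(a) >> (bits - plen)) << (bits - plen)
                hop = self.by_len[plen].get(key)
                if hop is not None:
                    return hop
            return None

    t = Table()
    t.add("2001:db8::/32", "peer-A")
    t.add("2001:db8:1234::/48", "peer-B")
    print(t.lookup("2001:db8:1234::1"))   # peer-B: the /48 wins over the covering /32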
The only solution is, IMO, to let multihomed sites have multiple prefixes inherited from their upper ISPs, still keeping the sites' ability to control loads between incoming multiple links.
And for the basement multi-homers, RA / SLAAC makes this much easier to do with v6. The larger-scale / more mission-critical multi-homers are going to consume an AS and some BGP space whether you like it or not - at least with v6 there's a really good chance that they'll only *ever* need to announce a single prefix. (Ignore "traffic engineering" pollution, but that doesn't get better or worse). Regards, Tim.
On Jun 25, 2012, at 12:06 AM, Masataka Ohta wrote:
Justin M. Streiner wrote:
I see periodic upticks in the growth of the global v6 routing table (a little over 9k prefixes at the moment - the v4 global view is about 415k prefixes right now), which I would reasonably attribute to an upswing in networks getting initial assignments.
As I already wrote:
: That's not the point. The problem is that SRAMs scale well but : CAMs do not.
it is a lot more difficult to quickly look up 1M routes with /48 than 2M routes with /24.
It is incrementally more difficult, but not a lot at this point. Further, 2M routes in IPv4 at the current prefix:ASN ratios would only map to about 100,000 routes in IPv6. (IPv6 prefix:AS ratio is currently about 3:1 while IPv4 is around 14:1, so if all 35,000 active AS were advertising 3 IPv6 routes, we would be at about 100,000. Most of the growth in the IPv4 routing table represents increases in the prefix:ASN ratio whereas most of the growth in the IPv6 routing table represents additional ASNs coming online with IPv6.)
If anything, I see more of a chance for the v4 routing table to grow more out of control, as v4 blocks get chopped up into smaller and smaller pieces in an ultimately vain effort to squeeze a little more mileage out of IPv4.
The routing table grows mostly because of multihoming, regardless of whether it is v4 or v6.
Assertion proved false by actual data. The majority of the growth in the IPv4 routing table is actually due to disaggregation and slow start. A smaller proportion is due to traffic engineering and multihoming. (See Geoff Huston's various presentations and white papers on this).
The only solution is, IMO, to let multihomed sites have multiple prefixes inherited from their upper ISPs, still keeping the sites' ability to control loads between incoming multiple links.
This is not a solution. This is an administrative nightmare for the multihomed sites which has very poor failure survival characteristics.

1. Established flows do not survive a failover.
2. Every end host has to have knowledge of reachability which is not readily available in order to make a proper source address selection.

The solution, in fact, is to move IDR to being locator based while intra-domain routing is done on prefix. This would allow the global table to only contain locator information and not care about prefixes. Currently, in order to do that, we unfortunately have to wrap the entire datagram up inside another datagram. If we were to create a version of the IPv6 header that had a field for destination ASN, we could do this without encapsulation. Unfortunately, encapsulation brings all the MTU baggage of tunneling. More unfortunately, changing the header comes with the need to touch the IP stack on every end host. Neither is an attractive option. It would have been better if IETF had actually solved this instead of punting on it when developing IPv6. Owen
On 6/25/12 7:54 AM, Owen DeLong wrote:
It would have been better if IETF had actually solved this instead of punting on it when developing IPv6.
Dear Owen, The IETF offered a HA solution that operates at the transport level. It solves jumbo frame error detection rate issues, head of queue blocking, instant fail-over, better supports high data rates with lower overhead, offers multi-homing transparently across multiple providers, offers fast setup and anti-packet source spoofing. The transport is SCTP, used by every cellular tower and for media distribution. This transport's improved error detection is now supported in hardware by current network adapters and processors. Conversely, TCP suffers from high undetected stuck bit errors, head of queue blocking, complex multi-homing, slow setup, high process overhead and is prone to source spoofing. It seems OS vendors rather than the IETF hampered progress in this area. Why band-aid on a solved problem? Regards, Douglas Otis
On Mon, Jun 25, 2012 at 1:09 PM, Douglas Otis <dotis@mail-abuse.org> wrote:
On 6/25/12 7:54 AM, Owen DeLong wrote:
It would have been better if IETF had actually solved this instead of punting on it when developing IPv6.
Dear Owen,
The IETF offered a HA solution that operates at the transport level. It solves jumbo frame error detection rate issues, head of queue blocking, instant fail-over, better supports high data rates with lower overhead, offers multi-homing transparently across multiple providers, offers fast setup and anti-packet source spoofing. The transport is SCTP, used by every cellular tower and for media distribution.
This transport's improved error detection is now supported in hardware by current network adapters and processors. Conversely, TCP suffers from high undetected stuck bit errors, head of queue blocking, complex multi-homing, slow setup, high process overhead and is prone to source spoofing. It seems OS vendors rather than the IETF hampered progress in this area. Why band-aid on a solved problem?
can I use sctp to do the facebooks?
On 6/25/12 10:17 AM, Christopher Morrow wrote:
On Mon, Jun 25, 2012 at 1:09 PM, Douglas Otis <dotis@mail-abuse.org> wrote:
On 6/25/12 7:54 AM, Owen DeLong wrote:
It would have been better if IETF had actually solved this instead of punting on it when developing IPv6.
Dear Owen,
The IETF offered a HA solution that operates at the transport level. It solves jumbo frame error detection rate issues, head of queue blocking, instant fail-over, better supports high data rates with lower overhead, offers multi-homing transparently across multiple providers, offers fast setup and anti-packet source spoofing. The transport is SCTP, used by every cellular tower and for media distribution.
This transport's improved error detection is now supported in hardware by current network adapters and processors. Conversely, TCP suffers from high undetected stuck bit errors, head of queue blocking, complex multi-homing, slow setup, high process overhead and is prone to source spoofing. It seems OS vendors rather than the IETF hampered progress in this area. Why band-aid on a solved problem?
can I use sctp to do the facebooks?
Dear Christopher, Not now, but you could. SCTP permits faster page loads and more efficient use of bandwidth. OS vendors could embrace SCTP to achieve safer and faster networks also better able to scale. Instead, vendors are hacking HTTP to provide experimental protocols like SPDY which requires extensions like: http://tools.ietf.org/search/draft-agl-tls-nextprotoneg-00 The Internet should use more than port 80 and port 443. Is extending entrenched TCP cruft really taking the Internet to a better and safer place? Regards, Douglas Otis
On Mon, Jun 25, 2012 at 1:58 PM, Douglas Otis <dotis@mail-abuse.org> wrote:
The Internet should use more than port 80 and port 443. Is extending entrenched TCP cruft really taking the Internet to a better and safer place?
Isn't "the internet should use more than 80/443" really "some compelling use case should be found for more than 2 ports"? Or perhaps more clearly: "what application is out there, getting wide appeal, that uses more than 80/443?" (aside from edonkey, which Arbor always shows as a huge user of bandwidth)

-chris

(btw, it would be nice to use more ports, if there are applications, and users of said applications, that want to do that...)
On Mon, Jun 25, 2012 at 1:09 PM, Douglas Otis <dotis@mail-abuse.org> wrote:
On 6/25/12 7:54 AM, Owen DeLong wrote:
It would have been better if IETF had actually solved this instead of punting on it when developing IPv6.
The IETF offered a HA solution that operates at the transport level. The transport is SCTP
Hi Douglas,

SCTP proposes a solution to multihoming by multi-addressing each server. Each address represents one of the leaf node's paths to the Internet and if one fails an SCTP session can switch to the other. Correct?

How does SCTP address the most immediate problem with multiaddressed TCP servers: the client doesn't rapidly find a currently working address from the set initially offered by A and AAAA DNS records. Is there anything in the SCTP protocol for this? Or does it handle it exactly the way TCP does (nothing at all in the API; app-controlled timeout and round robin)?

Is the SCTP API drop-in compatible with TCP where a client can change a parameter in a socket() call and expect it to try SCTP and promptly fall back to TCP if no connection establishes? On the server side, does it work like the IPv6 API where one socket accepts both protocols? Or do the apps have to be redesigned to handle both SCTP and TCP?

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
On 6/25/12 12:20 PM, William Herrin wrote:
On Mon, Jun 25, 2012 at 1:09 PM, Douglas Otis <dotis@mail-abuse.org> wrote:
On 6/25/12 7:54 AM, Owen DeLong wrote:
It would have been better if IETF had actually solved this instead of punting on it when developing IPv6.
The IETF offered a HA solution that operates at the transport level. The transport is SCTP
Hi Douglas,
SCTP proposes a solution to multihoming by multi-addressing each server. Each address represents one of the leaf node's paths to the Internet and if one fails an SCTP session can switch to the other. Correct?
Dear William,

Yes. An SCTP association periodically checks alternate path functionality; idle paths are probed with heartbeat chunks, so a dead path is noticed without waiting for application traffic.
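A rough idea of how that path probing is exposed to applications through the RFC 6458 socket options. This is only a sketch, assuming a Linux/lksctp-style stack, with 'sd' a connected SCTP socket and 'peer' one of the association's addresses:

/* Tune per-path heartbeats via SCTP_PEER_ADDR_PARAMS (RFC 6458, 8.1.12). */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

static int set_heartbeat(int sd, const struct sockaddr *peer,
                         socklen_t peerlen, unsigned int msec)
{
    struct sctp_paddrparams p;

    memset(&p, 0, sizeof(p));
    memcpy(&p.spp_address, peer, peerlen);
    p.spp_flags      = SPP_HB_ENABLE;   /* keep heartbeats enabled        */
    p.spp_hbinterval = msec;            /* probe this path every msec ms  */
    p.spp_pathmaxrxt = 3;               /* retransmits before the path is
                                           declared unreachable           */
    return setsockopt(sd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS,
                      &p, sizeof(p));
}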
How does SCTP address the most immediate problem with multiaddressed TCP servers: the client doesn't rapidly find a currently working address from the set initially offered by A and AAAA DNS records. Is there anything in the SCTP protocol for this? Or does it handle it exactly the way TCP does (nothing at all in the API; app-controlled timeout and round robin)?
This is addressed by deprecating the use of TCP, since SCTP offers a superset of the socket API. It can also dramatically expand the number of virtual associations supported, in a manner similar to UDP, while still mitigating source spoofing.
Is the SCTP API drop-in compatible with TCP where a client can change a parameter in a socket() call and expect it to try SCTP and promptly fall back to TCP if no connection establishes? On the server side, does it work like the IPv6 API where one socket accepts both protocols? Or do the apps have to be redesigned to handle both SCTP and TCP?
The SCTP socket API is defined in: http://tools.ietf.org/html/rfc6458

As the world adopts IPv6, NAT issues can become a bad memory of insecure middle boxes, replaced by transports that are as robust as necessary. IMHO, TCP is the impediment to simple, hardware-based high-speed interfaces that avoid buffer bloat.

Regards,
Douglas Otis
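To Bill's drop-in question above: with the RFC 6458 one-to-one style, a client can get going by changing little more than the third argument of socket(). A minimal sketch, with a hypothetical host and port and no TCP fallback attempted:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    struct addrinfo hints, *res;
    int sd;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;            /* v4 or v6 */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo("www.example.com", "80", &hints, &res) != 0)
        return 1;

    /* IPPROTO_SCTP instead of IPPROTO_TCP is the only change here. */
    sd = socket(res->ai_family, SOCK_STREAM, IPPROTO_SCTP);
    if (sd < 0 || connect(sd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("sctp connect");               /* a real client would now
                                                 fall back to TCP        */
        freeaddrinfo(res);
        if (sd >= 0)
            close(sd);
        return 1;
    }

    /* From here on, send()/recv() behave much as they do over TCP. */
    freeaddrinfo(res);
    close(sd);
    return 0;
}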
On Mon, Jun 25, 2012 at 7:06 PM, Douglas Otis <dotis@mail-abuse.org> wrote:
On 6/25/12 12:20 PM, William Herrin wrote:
How does SCTP address the most immediate problem with multiaddressed TCP servers: the client doesn't rapidly find a currently working address from the set initially offered by A and AAAA DNS records. Is there anything in the SCTP protocol for this? Or does it handle it exactly the way TCP does (nothing at all in the API; app-controlled timeout and round robin)?
This is addressed by deprecating use of TCP, since SCTP offers a super-set of the socket API. It can also dramatically expand the number of virtual associations supported in a manner similar to that of UDP while still mitigating source spoofing.
Hi Douglas,

Your answer was not responsive to my question. I'll rephrase.

The most immediate problem with multiaddressed TCP servers is that clients have no way to pass the list of IPv4 and IPv6 addresses received from DNS to the layer 4 protocol as a whole. Instead, the application must try each in sequence. This results in connect delays (2 minutes by default) for each address which is not currently reachable as the application attempts a TCP connection to each in sequence, trying the next after a time out. This delay is often unacceptable.

Does SCTP operate on a list of IPv4 and IPv6 addresses received from the application when it asks for a connect, parallelizing its attempt to reach a live address? Or a DNS name which it resolves to find those addresses? Or does it accept only one address at a time for the initial connect, just like TCP?

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
On Mon, Jun 25, 2012 at 8:03 PM, William Herrin <bill@herrin.us> wrote:
Does SCTP operate on a list of IPv4 and IPv6 addresses received from the application when it asks for a connect, parallelizing its attempt to reach a live address? Or a DNS name which it resolves to find those addresses? Or does it accept only one address at a time for the initial connect, just like TCP?
Hi Douglas,

Another gentleman clarified for me privately: sctp_connectx() is listed as a new function in the 12/2011 standard. It accepts and uses multiple addresses during the initial connect. Good progress since the last time I looked at SCTP.

I assume the SCTP API does not gracefully fall back to TCP for stream-oriented connections and UDP for datagram oriented connections, yes? So if an app author wants to use this in the real world as it exists in 2012, he'll have to juggle timeouts in order to try TCP if SCTP doesn't promptly establish. And he'll have to juggle the two APIs anywhere he does something more complex than send() and recv(). Yes?

Also, has there been improvement to the situation where an endpoint loses all of its IP addresses and wants to re-establish? Something like a notification to the app requesting a fresh list of addresses?

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
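What handing the whole DNS answer to sctp_connectx() can look like in practice. A sketch only, assuming a current Linux/lksctp stack (link with -lsctp); the buffer sizing is arbitrary, and a genuinely mixed v4/v6 list would normally want an AF_INET6 socket rather than the family of the first DNS result used here:

#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

int connect_all(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    char packed[16 * sizeof(struct sockaddr_in6)];  /* room for 16 addrs */
    size_t off = 0;
    int cnt = 0, sd;
    sctp_assoc_t assoc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    sd = socket(res->ai_family, SOCK_STREAM, IPPROTO_SCTP);
    if (sd < 0) {
        freeaddrinfo(res);
        return -1;
    }

    /* Pack every A/AAAA result back-to-back, as RFC 6458 specifies. */
    for (ai = res; ai && off + ai->ai_addrlen <= sizeof(packed); ai = ai->ai_next) {
        memcpy(packed + off, ai->ai_addr, ai->ai_addrlen);
        off += ai->ai_addrlen;
        cnt++;
    }
    freeaddrinfo(res);

    /* One call; the stack now owns the "find a working path" problem. */
    if (sctp_connectx(sd, (struct sockaddr *)packed, cnt, &assoc) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

Once the association is up, the unused addresses remain available for fail-over rather than being discarded.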
On Jun 25, 2012 6:38 PM, "William Herrin" <bill@herrin.us> wrote:
On Mon, Jun 25, 2012 at 8:03 PM, William Herrin <bill@herrin.us> wrote:
Does SCTP operate on a list of IPv4 and IPv6 addresses received from the application when it asks for a connect, parallelizing its attempt to reach a live address? Or a DNS name which it resolves to find those addresses? Or does it accept only one address at a time for the initial connect, just like TCP?
Hi Douglas,
Another gentleman clarified for me privately: sctp_connectx() is listed as a new function in the 12/2011 standard. It accepts and uses multiple addresses during the initial connect.
Good progress since the last time I looked at SCTP.
I assume the SCTP API does not gracefully fall back to TCP for stream-oriented connections and UDP for datagram oriented connections, yes? So if an app author wants to use this in the real world as it exists in 2012, he'll have to juggle timeouts in order to try TCP if SCTP doesn't promptly establish. And he'll have to juggle the two APIs anywhere he does something more complex than send() and recv(). Yes?
There is some scope for this type of work. This draft has expired, but I imagine it may come back soonish now that the IPv6 variant has shipped: http://tools.ietf.org/html/draft-wing-tsvwg-happy-eyeballs-sctp-02

SCTP is coming along, and it has a lot of promise.

CB
Also, has there been improvement to the situation where an endpoint loses all of its IP addresses and wants to re-establish? Something like a notification to the app requesting a fresh list of addresses?
Regards, Bill Herrin
--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
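The "juggle timeouts" fallback Bill describes above might look roughly like the following. This is a serial sketch rather than the parallel race in the expired happy-eyeballs draft, and the 250 ms / 10 s deadlines are arbitrary:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>

static int try_connect(int proto, const struct addrinfo *ai, int timeout_ms)
{
    int sd = socket(ai->ai_family, SOCK_STREAM, proto);
    if (sd < 0)
        return -1;                  /* e.g. no SCTP support in the kernel */

    fcntl(sd, F_SETFL, O_NONBLOCK);
    if (connect(sd, ai->ai_addr, ai->ai_addrlen) < 0 && errno != EINPROGRESS) {
        close(sd);
        return -1;
    }

    fd_set w;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    FD_ZERO(&w);
    FD_SET(sd, &w);

    int err = 0;
    socklen_t len = sizeof(err);
    if (select(sd + 1, NULL, &w, NULL, &tv) == 1 &&
        getsockopt(sd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
        return sd;                  /* connected in time (still non-blocking) */

    close(sd);
    return -1;
}

int connect_sctp_or_tcp(const struct addrinfo *ai)
{
    int sd = try_connect(IPPROTO_SCTP, ai, 250);   /* brief SCTP attempt */
    if (sd < 0)
        sd = try_connect(IPPROTO_TCP, ai, 10000);  /* then ordinary TCP  */
    return sd;
}

A real application would also loop over every address returned by getaddrinfo() and restore blocking mode on the winning socket.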
On Mon, 25 Jun 2012, Cameron Byrne wrote:
SCTP is coming along, and it has a lot of promise.
Doesn't SCTP "suffer" from the same problem as SHIM6 was said to be suffering from, ie that now all of a sudden end systems control where packets go and there is going to be a bunch of people on this list complaining that they no longer can do "traffic engineering"?

I don't mind. I wish more would use SCTP so it would get wider use. I also wish <http://mosh.mit.edu/> would have used SCTP instead of trying to invent that part again (the transport part of it at least).

--
Mikael Abrahamsson    email: swmike@swm.pp.se
On 6/25/12 10:33 PM, Mikael Abrahamsson wrote:
On Mon, 25 Jun 2012, Cameron Byrne wrote:
SCTP is coming along, and it has a lot of promise.
Doesn't SCTP "suffer" from the same problem as SHIM6 was said to be suffering from, ie that now all of a sudden end systems control where packets go and there is going to be a bunch of people on this list complaining that they no longer can do "traffic engineering"?
Dear Mikael,

SCTP lets specific hosts be served by multiple providers where instant fail-over is needed. When DNS returns multiple IP addresses, the application calls sctp_connectx() with that list combined into a single association endpoint for the host. This removes the need for PI addresses, and for the related routing-table growth, as high-availability service becomes popular. Unlike multi-homing implemented at the router, SCTP fail-over does not require 20-second delays, nor does fail-over cause a sizable shift in traffic that might introduce other instabilities.

Not every multi-homing detail stays hidden from the endpoints, but SCTP offers several significant advantages for performance and reliability. It can consolidate applications onto fewer ports: unlike TCP, SCTP can combine thousands of independent streams into a single association and port. It offers faster setup, eliminates head-of-line blocking and the buffering that goes with it, and compensates for the reduced Ethernet error-detection rate when jumbo frames are used.

Providers able to control multiple routers will likely prefer router-based methods. A router approach will not always be superior, nor will it limit routing-table growth, but traffic engineering should remain feasible when SCTP is used instead.
I don't mind. I wish more would use SCTP so it would get wider use. I also wish <http://mosh.mit.edu/> would have used SCTP instead of trying to invent that part again (the transport part of it at least).
Perhaps MIT could have implemented SCTP over UDP as a starting point. A major adoption impediment has been the desktop OS vendors. That may change as rising data rates and the desire for greater resiliency and security make SCTP's advantages increasingly apparent.

Regards,
Douglas Otis
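To make the streams-within-one-association point above concrete, a small sketch of sending on separate SCTP streams, so that loss on one stream does not stall the others. It assumes a connected one-to-one style socket on a Linux/lksctp stack (link with -lsctp), and that enough outbound streams were negotiated at association setup:

#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

/* Queue one message on a given SCTP stream of an existing association.
 * A loss on stream 3 delays only stream 3; the other streams keep
 * delivering in order, unlike a single TCP byte stream. */
int send_on_stream(int sd, uint16_t stream, const char *buf, size_t len)
{
    return sctp_sendmsg(sd, buf, len,
                        NULL, 0,    /* default peer for this association */
                        0,          /* payload protocol id (ppid)        */
                        0,          /* flags                             */
                        stream,     /* stream number                     */
                        0, 0);      /* lifetime, context                 */
}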
On Jun 21, 2012, at 5:04 AM, Masataka Ohta wrote:
Karl Auer wrote:
Speaking for myself, I'm one of the "if I want to allow direct outside connection to my interior machines I should be able to" crowd.
While "direct" and "interior" are not compatible that you actually mean some indirections...
Anyway, what if your ISP assigns a globally unique IPv4 address to your home router (a NAT box) which is UPnP capable?
That's what the largest retail ISP in Japan is doing.
Masataka Ohta
Does not scale. Not enough IPv4 addresses to do that for 6.8 billion people on the planet.

What if my ISP just routes my /48? Seems to work quite well, actually.

Owen
On Jun 6, 2012, at 9:53 AM, Anton Smith wrote:
<snip>
Hi all,
Potentially silly question, but as Bill points out, a LAN always occupies a /64.

Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age-old discussion about keeping broadcast segments small?

Or will it be that a /64 will typically only have a similar number of hosts in it as, say, a /23 or /24 in the IPv4 world?
Cheers, Anton
Now you have deduced the beauty of the scheme. The number of endpoints does not matter to IPv6 address planning. Said another way: my factory may have a gazillion(1) little machines on one subnet, while my data center boxes may be spread across several subnets. Just count the subnets, and let the traffic/technology drive the usage per subnet whilst you TRILL(2) a pretty tune.

Note (1): gazillion < 2^64
Note (2): Thanks, Radia

James R. Cutler
james.cutler@consultant.com
On Wed, 06 Jun 2012 14:53:02 +0100, Anton Smith said:
Potentially silly question but, as Bill points out a LAN always occupies a /64.
Does this imply that we would have large L2 segments with a large number of hosts on them? What about the age old discussion about keeping broadcast segments small?
Or, will it be that a /64 will only typically have a similar number of hosts in it as say, a /23|4 in the IPv4 world?
We simply allocated a v6 /64 for each v4 /21, /22, /23, /2whatever in our network. Works fine. No more "what the fsck is the subnet in THIS building?" issues. Amazing how often we find hosts that are in one building but misconfig'ed because the sysadmin put in the netmask for the building his office is in, not the building the server is in. When there's 125+ buildings on the campus, this matters. ;)

As somebody else mentioned, the limiting factor is "How much ND/ARP traffic are you willing to tolerate in one broadcast domain?".
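Numbering subnets this way is mechanical enough to show in a few lines. A sketch using the documentation prefix 2001:db8::/48 as a stand-in for a site allocation, stamping a 16-bit subnet number into bits 48-63 to get one /64 per building or VLAN:

#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
    struct in6_addr site;
    char buf[INET6_ADDRSTRLEN];

    inet_pton(AF_INET6, "2001:db8::", &site);     /* the site /48 */

    for (unsigned subnet = 1; subnet <= 5; subnet++) {
        struct in6_addr p = site;
        p.s6_addr[6] = (subnet >> 8) & 0xff;      /* bits 48-63 carry  */
        p.s6_addr[7] = subnet & 0xff;             /* the subnet number */
        inet_ntop(AF_INET6, &p, buf, sizeof(buf));
        printf("subnet %u -> %s/64\n", subnet, buf);
    }
    return 0;
}

The host count never enters into it, which is exactly the point made above: only the subnets need to be counted.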
participants (33)
- Adam Kennedy
- Alexandru Petrescu
- Anton Smith
- Bryan Irvine
- Cameron Byrne
- Chris Grundemann
- Christopher Morrow
- Chuck Church
- Cutler James R
- Dale W. Carder
- Dave Hart
- David Hubbard
- Dobbins, Roland
- Douglas Otis
- isabel dias
- Jay Ashworth
- Jean-Francois.TremblayING@videotron.com
- JORDI PALET MARTINEZ
- Justin M. Streiner
- Karl Auer
- Mark Andrews
- Mark Boolootian
- Masataka Ohta
- Mikael Abrahamsson
- Owen DeLong
- Randy Bush
- Ricky Beam
- Seth Mos
- Steve Clark
- Tim Franklin
- Tony Hain
- valdis.kletnieks@vt.edu
- William Herrin