Re: It's Ars Tech's turn to bang the IPv4 exhaustion drum
On 8/19/08 1:50 PM, "sthaug@nethelp.no" <sthaug@nethelp.no> wrote:
In practice, many routers require the packet to go twice in the hardware if the prefix length is > 64 bits, so even though it is a total waste of space, it is not stupid to use /64 for point-to-point links and even for loopbacks!
Could you provide some documentation on this? First I've heard about it.
Ask your favorite router vendor. This has been confirmed to me by at least 3 major ones we use.
- Alain.
Date: Tue, 19 Aug 2008 14:30:38 -0400
From: Alain Durand <alain_durand@cable.comcast.com>
Ask your favorite router vendor. This has been confirmed to me by at least 3 major ones we use.
Odd. I have asked both of our router vendors and they have confirmed that they route in the ASIC based on the full address, not just the first 64 bits. (I believe one of them based on actual testing. I am suspicious of the other.)
That said, one does use a few bits for something else (port) and does not load them into the FIB, so I believe they route on 120 bits, not 128.
I'd love to get complete verification of the real facts of this.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
What I was told is that, yes, the packet gets routed through the ASIC, but it has to go through it twice, which reduces the pps by a factor of 2 compared to IPv4. Some vendors had shortcuts so that, if the prefix length was 64 bits or less, only one pass was necessary. Caveat: this may not be true for all vendors or all models of all vendors. YMMV.
- Alain.
On 8/19/08 4:22 PM, "Kevin Oberman" <oberman@es.net> wrote:
I'd love to get complete verification of the real facts of this.
matsuzaki-san's preso, i think the copy he will present next week at apops:
http://www.attn.jp/presentation/apnic26-maz-ipv6-p2p.pdf
randy
On 20 aug 2008, at 3:31, Randy Bush wrote:
matsuzaki-san's preso, i think the copy he will present next week at apops:
He (she?) says packets will ping-pong across the link if they are addressed to an address on the p2p subnet that isn't used. However, this is only true if there is no address resolution on the subnet, which would be the normal mode of operation with IPv4 on p2p links because those links don't have addresses and there is no ARP. With v6 on the other hand, ND can work on all link types and PPP does negotiate an address of sorts. So whether this actually happens on a true point-to-point link is open with IPv6, and if it's a point-to-point ethernet or similar link you only get some neighbor discovery traffic that goes nowhere, not an increase in actual traffic.
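The ping-pong case discussed here can be illustrated with a toy model. This is only a sketch: the `Router` class, the `link_crossings` helper, the addresses (taken from the 2001:db8::/32 documentation range) and the starting hop limit of 64 are all assumptions made for illustration. The model deliberately has no neighbor discovery, which is exactly the condition under which the loop is said to appear, and it simplifies hop-limit handling to one decrement per forwarding step.

```python
import ipaddress

LINK = ipaddress.ip_network("2001:db8:1000::/64")  # hypothetical p2p subnet


class Router:
    """Toy router on a p2p link that does no address resolution."""

    def __init__(self, name, own_address):
        self.name = name
        self.own = ipaddress.ip_address(own_address)

    def handle(self, dst):
        """Return 'deliver' if dst is ours, else 'forward' onto the link."""
        if dst == self.own:
            return "deliver"
        # dst is inside the connected /64, so without neighbor discovery
        # the only choice is to push it out of the p2p interface.
        return "forward"


def link_crossings(dst, hop_limit=64):
    """Count how many times a packet crosses the link before it stops."""
    dst = ipaddress.ip_address(dst)
    if dst not in LINK:
        raise ValueError("model only covers destinations on the p2p subnet")
    current = Router("A", "2001:db8:1000::1")  # packet arrives here first
    other = Router("B", "2001:db8:1000::2")
    crossings = 0
    while hop_limit > 0:
        if current.handle(dst) == "deliver":
            return crossings
        hop_limit -= 1  # each forwarding step costs one hop
        crossings += 1
        current, other = other, current
    return crossings  # hop limit exhausted, packet dropped


print(link_crossings("2001:db8:1000::2"))     # assigned address: 1
print(link_crossings("2001:db8:1000::dead"))  # unused address: 64
```

In this model a packet to an unused address in the /64 crosses the link once per remaining hop, which is the traffic amplification at issue. A link with working address resolution would instead stop at the first router, with only neighbor discovery traffic on the wire.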
On 8/20/2008 at 1:54 AM, Iljitsch van Beijnum <iljitsch@muada.com> wrote:
So whether this actually happens on a true point-to-point link is open with IPv6, and if it's a point-to-point ethernet or similar link you only get some neighbor discovery traffic that goes nowhere, not an increase in actual traffic.
On a "true" P-to-P link, there is no netmask, no? A netmask is a concept that applies to broadcast media, like Ethernet. Even if you only have two hosts on an Ethernet link, it's not really P-to-P in the strict sense.
For example, my IPv6-over-IPv4 tunnel from home (thank you HE, tunnelbroker.net) terminating on a Soekris net5501 running FreeBSD 7.0 (im s0 l33t!!11),
  gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
          tunnel inet 24.6.175.101 --> 72.52.104.74
          inet6 fe80::200:24ff:feca:91b4%gif0 prefixlen 64 scopeid 0x7
          inet6 2001:470:1f04:2fc::2 --> 2001:470:1f04:2fc::1 prefixlen 128
Note the prefixlen on the P-to-P portion inet6 configuration, 128. And the IPv4 tunnel part doesn't bother with a netmask. It doesn't make sense for a P-to-P. But the link-local on the gif(4)... well, hmmm.
But as for how ND works over a P-to-P, the FreeBSD stack seems to be a little odd. I see,
  IP6 2001:470:1f04:2fc::2 > 2001:470:1f04:2fc::1: ICMP6, neighbor solicitation, who has 2001:470:1f04:2fc::1, length 24
  IP6 2001:470:1f04:2fc::1 > 2001:470:1f04:2fc::2: ICMP6, neighbor advertisement, tgt is 2001:470:1f04:2fc::1, length 24
  IP6 2001:470:1f04:2fc::1 > 2001:470:1f04:2fc::2: ICMP6, neighbor solicitation, who has 2001:470:1f04:2fc::2, length 24
  IP6 2001:470:1f04:2fc::1 > 2001:470:1f04:2fc::2: ICMP6, neighbor solicitation, who has 2001:470:1f04:2fc::2, length 24
  IP6 2001:470:1f04:2fc::1 > 2001:470:1f04:2fc::2: ICMP6, neighbor solicitation, who has 2001:470:1f04:2fc::2, length 24
My FreeBSD 7.0 occasionally will attempt gratuitous ND, and the other end responds, but when the remote tries to find us... silence. I don't think it's my ipf ruleset either. I tried to figure out if this is a bug or feature, but digging in src/sys/netinet6... well, it's been several years since I was last in there and it made my brain hurt.
But despite all of that, it all works pretty sweetly.
This is from a FreeBSD 6.2 host that does autoconf to the tunnel endpoint in my tunnelbroker.net /48 (and uses temporary, privacy extension addresses, m4d l33tnes),
  $ /sbin/ping6 www.freebsd.org
  PING6(56=40+8+8 bytes) 2001:470:8045:0:7034:f7e7:3d02:c41a --> 2001:4f8:fff6::21
  16 bytes from 2001:4f8:fff6::21, icmp_seq=0 hlim=56 time=16.844 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=1 hlim=56 time=17.674 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=2 hlim=56 time=15.692 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=3 hlim=56 time=45.123 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=4 hlim=56 time=116.619 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=5 hlim=56 time=22.286 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=6 hlim=56 time=18.861 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=7 hlim=56 time=15.797 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=8 hlim=56 time=15.391 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=9 hlim=56 time=19.165 ms
  16 bytes from 2001:4f8:fff6::21, icmp_seq=10 hlim=56 time=47.429 ms
  ^C
  --- www.freebsd.org ping6 statistics ---
  11 packets transmitted, 11 packets received, 0.0% packet loss
  round-trip min/avg/max/std-dev = 15.391/31.898/116.619/28.985 ms
The information contained in this e-mail message is confidential, intended only for the use of the individual or entity named above. If the reader of this e-mail is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any review, dissemination, distribution or copying of this communication is strictly prohibited. If you have received this e-mail in error, please contact postmaster@globalstar.com
On 20 aug 2008, at 20:34, Crist Clark wrote:
On a "true" P-to-P link, there is no netmask, no? A netmask is a concept that applies to broadcast media, like Ethernet. Even if you only have two hosts on an Ethernet link, it's not really P-to-P in the strict sense.
An interface needs a prefix length (subnet mask for those of us stuck in the '90s) so the system knows which addresses are directly connected through the interface in question. Whether the link is point-to-point, broadcast or NBMA doesn't matter for that purpose.
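The role of the prefix length described here can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the `is_on_link` helper and the 2001:db8::/32 documentation addresses are assumptions for illustration, not anything a particular stack implements under that name.

```python
import ipaddress


def is_on_link(interface, destination):
    """True if destination falls inside the interface's connected prefix."""
    iface = ipaddress.ip_interface(interface)
    return ipaddress.ip_address(destination) in iface.network


# Same local address, two different prefix lengths (hypothetical values).
print(is_on_link("2001:db8:1000::2/64", "2001:db8:1000::1"))   # True
print(is_on_link("2001:db8:1000::2/128", "2001:db8:1000::1"))  # False
print(is_on_link("2001:db8:1000::2/64", "2001:db8:2000::1"))   # False
```

With a /64 the far end counts as directly connected. With a /128 nothing but the local address does, so reaching the peer then depends on some other mechanism, such as an explicit destination address on the interface or a route.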
But as for how ND works over a P-to-P, the FreeBSD stack seems to be a little odd.
[...]
But despite all of that, it all works pretty sweetly.
There have been compatibility issues with PPP for IPv6 in the past because some implementations would do ND and others wouldn't...
On 8/20/2008 at 11:57 AM, Iljitsch van Beijnum <iljitsch@muada.com> wrote:
An interface needs a prefix length (subnet mask for those of us stuck in the '90s) so the system knows which addresses are directly connected through the interface in question. Whether the link is point-to-point, broadcast or NBMA doesn't matter for that purpose.
No, that's my point. On a true point-to-point link, there is only one other address on the link. That's what point-to-point means. And no, that does not really mean there is an implied /32 (for IPv4) or /128 (for IPv6) on the link, since that would tell the system it's the only address on the link.
For example, on the IPv4 ends of the gif(4) tunnel in my previous message,
  gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
          tunnel inet 24.6.175.101 --> 72.52.104.74
          inet6 fe80::200:24ff:feca:91b4%gif0 prefixlen 64 scopeid 0x7
          inet6 2001:470:1f04:2fc::2 --> 2001:470:1f04:2fc::1 prefixlen 128
A netmask doesn't make sense. They're not on the same LAN since there is no LAN on a point-to-point tunnel. (The most specific mask those two share is 0x80000000.)
As for the IPv6 portion, the two endpoints happen to be adjacent, they look like they are "on the same network," but there is no reason that has to be so in the general case, just like the IPv4 case. There is no LAN. It's point-to-point. It could be,
  2001:470:8045:0:2b0:d0ff:fe2c:982d --> 2001:470:1f04:2fc::1
(Those only happen to share the /32 belonging to the ISP; they are not even in the same /64.) And everything would be fine. Right? Or is something different about IPv6?
I'm wondering if all of this confusion is about people calling an Ethernet, or other broadcast media, link with only two interfaces on it point-to-point. It's not.
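How many leading bits the address pairs in this message actually share can be worked out mechanically. The sketch below uses Python's standard `ipaddress` module; `common_prefix_len` and `mask_hex` are hypothetical helper names introduced here for illustration.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits two addresses of the same family share."""
    a, b = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if a.version != b.version:
        raise ValueError("addresses must be the same IP version")
    diff = int(a) ^ int(b)  # set bits mark positions where they differ
    return a.max_prefixlen - diff.bit_length()


def mask_hex(prefix_len, width=32):
    """Netmask for prefix_len as a hex string (IPv4 width by default)."""
    mask = ((1 << prefix_len) - 1) << (width - prefix_len)
    return f"0x{mask:0{width // 4}x}"


# IPv4 tunnel endpoints from the gif0 output
v4 = common_prefix_len("24.6.175.101", "72.52.104.74")
print(v4, mask_hex(v4))  # 1 0x80000000

# IPv6 addresses configured on the tunnel
print(common_prefix_len("2001:470:1f04:2fc::2", "2001:470:1f04:2fc::1"))  # 126

# The hypothetical non-adjacent IPv6 pairing
print(common_prefix_len("2001:470:8045:0:2b0:d0ff:fe2c:982d",
                        "2001:470:1f04:2fc::1"))  # 32
```

The IPv4 endpoints share a single leading bit, the configured IPv6 endpoints share 126 bits, and the non-adjacent pairing shares only its first 32 bits.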
On 20 aug 2008, at 21:33, Crist Clark wrote:
No, that's my point. On a true point-to-point link, there is only one other address on the link. That's what point-to-point means.
For example, on the IPv4 ends gif(4) tunnel in my previous message,
  gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
          tunnel inet 24.6.175.101 --> 72.52.104.74
          inet6 fe80::200:24ff:feca:91b4%gif0 prefixlen 64 scopeid 0x7
          inet6 2001:470:1f04:2fc::2 --> 2001:470:1f04:2fc::1 prefixlen 128
Note that this interface doesn't _have_ any IPv4 addresses: the IPv4 addresses that you see are the tunnel endpoints.
However, the IPv6 addresses do what you say: there is a local one and a remote one and they don't share a subnet. Obviously it's possible to do this, but in my opinion, this is just an implementation variation, not the natural state of point-to-point links. It makes much more sense to have one set of behaviors that applies to all interfaces.
And what is a point-to-point link, anyway? In theory gigabit ethernet is CSMA/CD, but I don't think anyone ever bothered to implement that, in practice it's point-to-point on layer 1, but layer 2 is point-to-multipoint...
On 21 Aug 2008, at 09:09, Iljitsch van Beijnum wrote:
And what is a point-to-point link, anyway? In theory gigabit ethernet is CSMA/CD, but I don't think anyone ever bothered to implement that, in practice it's point-to-point on layer 1, but layer 2 is point-to-multipoint...
1000BASE-PX10 and 1000BASE-PX20 are both point to multipoint at layer 1.
matsuzaki-san's preso, i think the copy he will present next week at apops:
To summarize, using /64 on a link opens the door to a DoS problem that we need to pressure the vendors to fix. Obviously, this matters more to people who are running full-blown production IPv6 networks right now than it does to people in the planning stages. But everyone should really contact their vendor and find out when this issue will be fixed.
What could vendors do? They could have an implied packet filter built into the router code, or they could treat all odd addresses from a /64 as implicitly assigned to the :1 end and all even ones as implicitly assigned to the :2 end.
Workarounds are to number the link with a /64 from a link-local address range, to filter incoming traffic that could trigger the problem, or to use a /127 on the link. In the latter case, you should read and understand the implications documented in RFC 3627 <http://tools.ietf.org/html/rfc3627>.
In any case, IPv6 is not cut and dried. The landscape is still shifting and the only way for you to learn what works and what doesn't is to deploy it seriously.
--Michael Dillon
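The odd/even idea and the /127 workaround mentioned in this message can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the `implied_owner` and `slash127_addresses` helpers, the end labels, and the 2001:db8:1000::/64 link prefix from the documentation range. No vendor is known to implement the odd/even rule; it is only the suggestion from the message above.

```python
import ipaddress

LINK = ipaddress.ip_network("2001:db8:1000::/64")  # hypothetical link prefix


def implied_owner(destination):
    """Map any address in the link /64 to one end using the odd/even rule."""
    dst = ipaddress.ip_address(destination)
    if dst not in LINK:
        return None  # not a link address; normal routing applies
    # Odd addresses belong to the ::1 end, even ones to the ::2 end.
    return "::1 end" if int(dst) & 1 else "::2 end"


def slash127_addresses(first_address):
    """The two addresses of the /127 that starts at first_address."""
    net = ipaddress.ip_network(f"{first_address}/127")
    return [str(addr) for addr in net]


print(implied_owner("2001:db8:1000::dead"))   # ::1 end
print(implied_owner("2001:db8:1000::2"))      # ::2 end
print(slash127_addresses("2001:db8:1000::"))  # ['2001:db8:1000::', '2001:db8:1000::1']
```

With the odd/even rule every address in the /64 has an implied owner, so nothing is left to bounce between the two routers. A /127 gets the same effect by leaving no unused addresses at all, though its two addresses are ::0 and ::1 rather than ::1 and ::2.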
michael.dillon@bt.com wrote:
To summarize, using /64 on a link opens the door to a DOS problem that we need to pressure the vendors to fix.
How is this not an obvious 'duh' kind of situation that just depends on doing one's configuration correctly?
A similar problem occurs when one assigns a /48 down the P2P link and the downstream user has a default route back upstream but doesn't route the /48 to a loopback, and only routes a part of it (e.g. a /64 or two).
e.g.:
  { Internet } - { ISP } - { p2p-link } - { customer } - { c1 }
                                                       \ { c2 }
  p2p-link = 2001:db8:1000::/64 (::1 == ISP, ::2 == Customer)
  customer = 2001:db8:2000::/48 via 2001:db8:1000::2
  c1 = 2001:db8:2000:1::/64
  c2 = 2001:db8:2000:1::/64
Packets from $internet to 2001:db8:2000:1234::1 will travel down to the customer, who routes them with its default back up to the p2p-link, where your correctly configured box will see a source address of $internet and icmp admin reject it because that is an invalid source address.
Indeed the packet will bounce back up and a third packet (the icmp) will be sent, thus you have an amplification of 3x, but who cares? That is at the customer link, they should configure that link correctly, and they are paying you for that link anyway -> their problem, your cash $$$ :)
RPF saves the day here yet again. Remember boys and girls to configure at least your boxes correctly, don't trust other people to do the same ;)
There are a number of "ISPs" who of course don't do this and which allow full spoofing from any prefix, as they don't do RPF or even something as simple as a "source != 2001:db8::/32" or whatever they have as their own prefix on their core routers.
There are of course also "ISPs" which think they are transits and tunnel to everybody they can find. These "ISPs" then of course also don't do any spoofing-filtering and generally have 'customers' that exhibit the same problem, as those just set a default back upstream. Take a small guess how easy it is to take those networks off the Internet.... better start fixing that broken setup ;)
Greets, Jeroen
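The RPF check credited here with stopping the bounced packet can be sketched as a toy source-address test on the ISP's customer-facing interface. This is an illustration under stated assumptions, not any router's actual implementation: the `EXPECTED_SOURCES` table, the interface name `to-customer` and the `strict_rpf_accepts` helper are hypothetical, and 2001:4f8:fff6::21 merely stands in for "$internet".

```python
import ipaddress

# Hypothetical ISP-side view: source prefixes that are legitimately
# reachable through each interface, using the prefixes from the example.
EXPECTED_SOURCES = {
    "to-customer": [
        ipaddress.ip_network("2001:db8:1000::/64"),  # the p2p link itself
        ipaddress.ip_network("2001:db8:2000::/48"),  # routed customer prefix
    ],
}


def strict_rpf_accepts(in_interface, source):
    """Accept only if the source is reachable via the arrival interface."""
    src = ipaddress.ip_address(source)
    return any(src in net for net in EXPECTED_SOURCES.get(in_interface, []))


# Genuine customer traffic heading upstream
print(strict_rpf_accepts("to-customer", "2001:db8:2000:1::10"))  # True

# Packet from the Internet bounced back up by the customer's default route:
# it still carries the original remote source address.
print(strict_rpf_accepts("to-customer", "2001:4f8:fff6::21"))    # False
```

The bounced packet fails the check because its source lies outside the customer's prefixes, so it is dropped at the first return trip instead of looping until the hop limit expires.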
participants (8)
- Alain Durand
- Crist Clark
- Ian Mason
- Iljitsch van Beijnum
- Jeroen Massar
- Kevin Oberman
- michael.dillon@bt.com
- Randy Bush