IP4 address conservation method
I read: http://www.nanog.org/sites/default/files/tues.general.Papandreou.conservatio... I would like to point out RFC 3069. On most Cisco equipment this is done using static routes and "ip unnumbered". So my question is basically: What am I missing? Why can't data center guys build their network the same way regular ETTH is done? Either one vlan per customer and sharing the IPv4 subnet between several vlans, or having several customers in the same vlan but using antispoofing etc (IETF SAVI-wg functionality) to handle the security stuff? One vlan per customer also works very well with IPv6. -- Mikael Abrahamsson email: swmike@swm.pp.se
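The saving RFC 3069 is after comes down to simple arithmetic. Here is a rough sketch in Python. It assumes each customer otherwise gets its own power-of-two subnet that must also hold a network address, a broadcast address and one gateway address (no FHRP). It compares that with RFC 3069-style sub-VLANs sharing one subnet, which pay that three-address overhead only once. The customer host counts are made up for illustration.

```python
"""Rough address-consumption arithmetic: per-customer subnets vs. one shared
subnet (RFC 3069-style VLAN aggregation). Host counts below are hypothetical."""

OVERHEAD = 3  # network address + broadcast address + one gateway address


def block_size(hosts: int) -> int:
    """Smallest power-of-two IPv4 block that fits `hosts` plus the overhead."""
    needed = hosts + OVERHEAD
    return 1 << (needed - 1).bit_length()


def per_customer_cost(host_counts: list[int]) -> int:
    """Addresses consumed when every customer gets its own subnet."""
    return sum(block_size(h) for h in host_counts)


def shared_cost(host_counts: list[int]) -> int:
    """Addresses consumed when all customers share a single subnet."""
    return block_size(sum(host_counts))


if __name__ == "__main__":
    customers = [1, 1, 1, 1, 5, 6, 2]  # hypothetical usable IPs per customer
    print("per-customer subnets:", per_customer_cost(customers))  # 48
    print("one shared subnet:   ", shared_cost(customers))        # 32
```

With these invented numbers the shared subnet uses 32 addresses against 48. Most of the gap comes from the many single-IP customers, each of which burns a /30.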
On 06/05/13 00:34 +0200, Mikael Abrahamsson wrote:
I read:
http://www.nanog.org/sites/default/files/tues.general.Papandreou.conservatio...
I would like to point out RFC 3069. On most cisco equipment this is done using static routes and "ip unnumbered".
So my question is basically: What am I missing? Why can't data center guys build their network the same way regular ETTH is done? Either one vlan per customer and sharing the IPv4 subnet between several vlans, or having several customers in the same vlan but using antispoofing etc (IETF SAVI-wg functionality) to handle the security stuff?
VLAN-per-subscriber (1 customer per VLAN) can require more costly routing equipment, particularly if you're performing double tagging (outer tag for switch, inner tag for customer). Sharing an IPv4 subnet among customers is appropriate for residential and small business services, which is how we typically deliver service. But it may be less appropriate for larger business customers (and I presume hosting customers), where the number of IPs is large enough that you're throwing away fewer addresses ratio-wise. Generally the simpler deployment model wins out in that type of scenario. Also, the 'ip unnumbered' approach may require some layer-3 security features.

VLAN-per-service (>1 customer sharing a VLAN) is problematic, and typically pushes a lot of IPv4-specific layer-3 security features (MACFF, DHCPv4 snooping, proxy ARP, broadcast forwarding/split horizon) down into the access equipment, and that's rarely a perfect feature set. In my experience, IPv6 services lag behind on such equipment because those v4 security features break v6.
One vlan per customer also works very well with IPv6.
+1 -- Dan White
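Dan's VLAN-per-service case hinges on the access switch doing source validation. Below is a minimal sketch of the idea behind DHCPv4 snooping / SAVI-style filtering, under a simplified model. The switch learns a (port, MAC, IP) binding from each DHCP ACK it relays. It then drops any frame whose source does not match a binding on the ingress port. The class and method names are hypothetical. Real implementations also deal with lease expiry, ARP inspection, IPv6 DAD and more.

```python
from dataclasses import dataclass, field


@dataclass
class BindingTable:
    """Hypothetical access-switch table of (port, MAC) bindings keyed by source IP."""

    bindings: dict[str, tuple[int, str]] = field(default_factory=dict)

    def learn_from_dhcp_ack(self, port: int, mac: str, ip: str) -> None:
        """Record the lease a DHCP server just handed to this MAC on this port."""
        self.bindings[ip] = (port, mac.lower())

    def permit(self, port: int, src_mac: str, src_ip: str) -> bool:
        """Forward a frame only if its source IP is bound to this MAC on this port."""
        return self.bindings.get(src_ip) == (port, src_mac.lower())


table = BindingTable()
table.learn_from_dhcp_ack(port=7, mac="00:11:22:33:44:55", ip="198.51.100.20")
print(table.permit(7, "00:11:22:33:44:55", "198.51.100.20"))  # True
print(table.permit(8, "00:11:22:33:44:55", "198.51.100.20"))  # False: wrong port
print(table.permit(7, "00:11:22:33:44:55", "198.51.100.21"))  # False: unbound (spoofed) IP
```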
Dan White wrote the following on 6/5/2013 9:44 AM:
On 06/05/13 00:34 +0200, Mikael Abrahamsson wrote:
I read:
http://www.nanog.org/sites/default/files/tues.general.Papandreou.conservatio...
I would like to point out RFC 3069. On most cisco equipment this is done using static routes and "ip unnumbered".
So my question is basically: What am I missing? Why can't data center guys build their network the same way regular ETTH is done? Either one vlan per customer and sharing the IPv4 subnet between several vlans, or having several customers in the same vlan but using antispoofing etc (IETF SAVI-wg functionality) to handle the security stuff?
VLAN-per-subscriber (1 customer per VLAN) can require more costly routing equipment, particularly if you're performing double tagging (outer tag for switch, inner tag for customer). Sharing an IPv4 subnet among customers is appropriate for residential and small business services, which is how we typically deliver service. But it may be less appropriate for larger business customers (and I presume hosting customers), where the number of IPs is large enough that you're throwing away fewer addresses ratio-wise. Generally the simpler deployment model wins out in that type of scenario. Also, the 'ip unnumbered' approach may require some layer-3 security features.
One thing not mentioned so far in this discussion is using PPPoE or some other tunnel/VPN technology for efficient IP utilization. The result could be zero wasted IP addresses without the need to resort to non-routable IP addresses in a customer's path (as the PDF suggested) and without some of the quirkiness or vendor lock-in of using ip unnumbered. PPPoE (and other VPNs) have many of the same downsides mentioned above, though: they add routing cost and increase the complexity of the network. The question becomes which deployment costs more: the simple yet wasteful design, or the efficient but complex design.
--Blake
* Blake Hudson
One thing not mentioned so far in this discussion is using PPPoE or some other tunnel/VPN technology for efficient IP utilization. The result could be zero wasted IP addresses without the need to resort to non-routable IP addresses in a customer's path (as the PDF suggested) and without some of the quirkiness or vendor lock-in of using ip unnumbered.
PPPoE (and other VPNs) have many of the same downsides mentioned above, though: they add routing cost and increase the complexity of the network. The question becomes which deployment costs more: the simple yet wasteful design, or the efficient but complex design.
<shameless plug alert>

Or simply use IPv6, and use a stateless translation service located in the core network to provide IPv4 connectivity to the public Internet services. This allows for 100% efficient utilisation of whatever IPv4 addresses you have left: nothing needs to go to waste due to router interfaces, subnet power-of-2 overhead, internal servers/services that have no Internet-available services, etc. All of this works without requiring you to do anything special on the server/application stacks to support it (like set up tunnel endpoints), add dual-stack complexity into your network, or introduce any form of stateful translation or VPN service into your network.

Here are some more resources:

http://fud.no/talks/20130321-V6_World_Congress-The_Case_for_IPv6_Only_Data_C...
http://tools.ietf.org/html/draft-anderson-siit-dc-00

In case you're interested in more, Ivan Pepelnjak and I will host a (free) webinar about the approach next week. Feel free to join!

http://www.ipspace.net/IPv6-Only_Data_Centers

BTW: I hear Cisco has implemented support for this approach in their latest ASR1K code, although I haven't confirmed this myself yet.

Tore
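The reason a stateless translator wastes nothing is that the IPv4/IPv6 address mapping is purely algorithmic, so no per-flow state is kept. Below is a minimal sketch of the RFC 6052 /96 embedding using Python's standard `ipaddress` module. It assumes the well-known prefix 64:ff9b::/96. An operator might well use its own /96 instead. The sketch covers only the address mapping, not header translation or anything specific to the linked draft.

```python
import ipaddress

# Assumption: RFC 6052 well-known prefix; a real deployment may use its own /96.
PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def v4_to_v6(v4: str, prefix: ipaddress.IPv6Network = PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 translation prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    return ipaddress.IPv6Address(int(prefix.network_address) | int(ipaddress.IPv4Address(v4)))


def v6_to_v4(v6: str, prefix: ipaddress.IPv6Network = PREFIX) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from an IPv6 address inside the prefix."""
    addr = ipaddress.IPv6Address(v6)
    if addr not in prefix:
        raise ValueError(f"{addr} is not inside {prefix}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(v4_to_v6("192.0.2.33"))         # 64:ff9b::c000:221
print(v6_to_v4("64:ff9b::c000:221"))  # 192.0.2.33
```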
On Tue, Jun 4, 2013 at 6:34 PM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
http://www.nanog.org/sites/default/files/tues.general.Papandreou.conservatio...
So my question is basically: What am I missing?
Both the router and host have to support sending and accepting invalid ARP requests. Since the Linux kernel already mishandles arp by default, you're probably begging for unexpected behavior. Double down on that if the customer controls the server image.

I don't have any experience with SoftLayer, but I have had to abandon a handful of VPS providers due to bizarre routing failures they couldn't fix. I was particularly thrilled with the one where, if I didn't ping the second-hop router from each of the VPS's IPs at least once every 15 seconds, it would eventually forget how to reach the address. I could log in via one of the other addresses and confirm with tcpdump that no arps or anything else would appear on the interface. Their advice? Disable iptables. Thanks guys, real helpful.

-Bill -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
On Wed, 5 Jun 2013, William Herrin wrote:
Both the router and host have to support sending and accepting invalid ARP requests. Since the Linux kernel already mishandles arp by default, you're probably begging for unexpected behavior. Double down on that if the customer controls the server image.
Exactly what is wrong with the ARP answers and requests sent using local-proxy-arp? -- Mikael Abrahamsson email: swmike@swm.pp.se
On Wed, Jun 5, 2013 at 12:11 PM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
On Wed, 5 Jun 2013, William Herrin wrote:
Both the router and host have to support sending and accepting invalid ARP requests. Since the Linux kernel already mishandles arp by default, you're probably begging for unexpected behavior. Double down on that if the customer controls the server image.
Exactly what is wrong with the ARP answers and requests sent using local-proxy-arp?
Nothing. The problem is that the arp source IP doesn't fall within the interface netmask at the receiver. Some receivers ignore that... after all, why do they care what the source IP is? They only care about the source MAC. Other receivers see a spoofed packet and drop it. Regards, Bill Herrin
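Bill's point is that receivers differ in how they treat an ARP whose sender protocol address (ar$spa) lies outside the receiving interface's subnet. Here is a minimal sketch of the two behaviours. The `accept_arp` helper and its `strict` flag are hypothetical and do not reflect any particular vendor's code.

```python
import ipaddress


def accept_arp(sender_ip: str, iface: str, strict: bool) -> bool:
    """Decide whether to process an ARP packet received on an interface.

    iface is the receiving interface's address with prefix, e.g. "192.0.2.1/24".
    A lenient receiver only cares about the sender MAC and accepts anything;
    a strict receiver treats an off-subnet sender IP as spoofed and drops it.
    """
    if not strict:
        return True
    network = ipaddress.IPv4Interface(iface).network
    return ipaddress.IPv4Address(sender_ip) in network


# Hypothetical: an ARP request sourced from a VIP outside the router's /24.
print(accept_arp("203.0.113.10", "192.0.2.1/24", strict=False))  # True
print(accept_arp("203.0.113.10", "192.0.2.1/24", strict=True))   # False
```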
On Wed, 5 Jun 2013, William Herrin wrote:
Nothing. The problem is that the arp source IP doesn't fall within the interface netmask at the receiver. Some receivers ignore that... after all, why do they care what the source IP is? They only care about the source MAC. Other receivers see a spoofed packet and drop it.
Why wouldn't it be within the source IP mask? I would imagine local-proxy-arp would work exactly the same way as if a directly connected host with the IP the ARP request was for would have answered. -- Mikael Abrahamsson email: swmike@swm.pp.se
On 06/05/13 18:57 +0200, Mikael Abrahamsson wrote:
On Wed, 5 Jun 2013, William Herrin wrote:
Nothing. The problem is that the arp source IP doesn't fall within the interface netmask at the receiver. Some receivers ignore that... after all, why do they care what the source IP is? They only care about the source MAC. Other receivers see a spoofed packet and drop it.
Why wouldn't it be within the source IP mask? I would imagine local-proxy-arp would work exactly the same way as if a directly connected host with the IP the ARP request was for would have answered.
I've seen two vendors get it wrong:
1) when originating an ARP request, the router used a source IP that did not match the subnet of the IP being requested (this happened when the router interface had secondary IPs);
2) when a customer had more than one IP address assigned on an interface/VLAN and one device ARPed the other, the router responded with its own MAC, creating a race condition where traffic between those two devices was sometimes forced up through the router.
-- Dan White
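Dan's second failure can be read as the router answering a proxy ARP it should have left alone. Below is a minimal sketch of one plausible decision rule for local proxy ARP in an RFC 3069-style shared subnet. The router stays silent when the target sits on the requester's own sub-VLAN, because that host answers for itself. The subnet, the host-route table and `should_proxy` are all hypothetical.

```python
import ipaddress

SHARED_SUBNET = ipaddress.IPv4Network("198.51.100.0/24")  # hypothetical shared subnet

# Hypothetical host-route table: customer IP -> sub-VLAN it is routed to.
HOST_ROUTES = {
    "198.51.100.10": 101,
    "198.51.100.11": 101,  # second IP of the same customer
    "198.51.100.20": 102,
}


def should_proxy(requester_vlan: int, target_ip: str) -> bool:
    """Answer with the router's MAC only when the target lives on a different
    sub-VLAN of the shared subnet; unknown or off-subnet targets get no reply."""
    if ipaddress.IPv4Address(target_ip) not in SHARED_SUBNET:
        return False
    target_vlan = HOST_ROUTES.get(target_ip)
    return target_vlan is not None and target_vlan != requester_vlan


print(should_proxy(101, "198.51.100.20"))  # True: other customer, route via router
print(should_proxy(101, "198.51.100.11"))  # False: same sub-VLAN, host answers itself
```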
On Wed, 05 Jun 2013 12:06:49 -0400, William Herrin <bill@herrin.us> wrote:
... Since the Linux kernel already mishandles arp by default, you're probably begging for unexpected behavior. Double down on that if the customer controls the server image.
I won't argue against calling Linux "wrong". However, the Linux way of dealing with ARP is well tuned for "host" and not "router" duty. It's just not designed for the kernel to maintain huge arp tables for extended periods. Generally, a host speaks to very few L2 neighbors. Even a "server" tends to speak to few of its L2 neighbors, especially for an internet service (www, ftp, irc, etc.). However, a ROUTER speaks to everything on most of its links. As such, out of the box, Linux makes for a very BAD router: its neighbor cache goes "stale" in 30s (avg), and entries are dropped on a scale of minutes. Real Routers(tm) hold on to arps for *hours*, because broadcast traffic requires CPU attention.

That said, I do use a stripped Debian box as an inter-vlan router. You don't want to see the pages of tweaks it's taken to stop it being a broadcast storm generator. (And no, "arpd" is a stupid hack.) It's a beautiful thing to run "tcpdump ... broadcast" and see no packets!

(And I'm not too happy with the BS 32-interface limit for multicast routing.)

--Ricky
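Ricky's "stale in 30s (avg)" lines up with the Linux neighbor-cache timer. As far as I know, the kernel draws its reachable time uniformly from between half and one-and-a-half times `base_reachable_time_ms` (default 30000). That is the same randomisation RFC 4861 prescribes for IPv6 neighbor discovery, so entries stop being REACHABLE after about 30 seconds on average. The Python below only simulates that draw. It does not read any kernel state, and `rand_reach_time` is a hypothetical name.

```python
import random

BASE_REACHABLE_MS = 30_000  # default of net.ipv4.neigh.default.base_reachable_time_ms


def rand_reach_time(base_ms: int, rng: random.Random) -> int:
    """Pick a reachable time uniformly in [base/2, 3*base/2), as RFC 4861 suggests."""
    return base_ms // 2 + rng.randrange(base_ms)


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [rand_reach_time(BASE_REACHABLE_MS, rng) for _ in range(10_000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Raising `base_reachable_time_ms` shifts the whole window up, which is one of the knobs a router-style Linux box would touch.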
On Wed, Jun 5, 2013 at 6:25 PM, Ricky Beam <jfbeam@gmail.com> wrote:
I won't argue against calling Linux "wrong". However, the linux way of dealing with ARP is well tuned for "host" and not "router" duty.
I love Linux and use it throughout my work, but I can't tell you the number of times its ARP behavior has bitten me. If you send a packet to a VIP on a Linux box and it doesn't have an arp entry for the default gateway, the Linux box will send an arp request... with the VIP as the source. That is just wrong. Wrong, wrong, wrong. Use the damn interface IP when you arp for something on that interface. If the router doesn't happen to like the bad arp (since the VIP isn't on the router's LAN), the router will ignore it. And your service will merrily pop up and down depending on whether the Linux box has any traffic to originate. Okay, I'm done venting now. -Bill
William Herrin <bill@herrin.us> writes:
On Wed, Jun 5, 2013 at 6:25 PM, Ricky Beam <jfbeam@gmail.com> wrote:
I won't argue against calling Linux "wrong". However, the linux way of dealing with ARP is well tuned for "host" and not "router" duty.
I love Linux and use it throughout my work but I can't tell you the number of times its ARP behavior has bitten me. If you send a packet to a VIP on a Linux box and it doesn't have an arp entry for the default gateway, the Linux box will send an arp request... with the vip as the source. That is just wrong. Wrong, wrong, wrong. Use the damn interface IP when you arp for something on that interface. If the router doesn't happen to like the bad arp (since the VIP isn't on the router's LAN) the router will ignore it. And your service will merrily pop up and down depending on whether the Linux box has any traffic to originate.
Did you try setting net.ipv4.conf.all.arp_announce=2? Yes, the system default may be tuned for host/desktop usage, but it's not like you *have* to use the system default. Tweak it as you like. And if there aren't enough knobs, then you can always add another one. You have the source code. Bjørn
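What `arp_announce` changes is how the kernel picks ar$spa for an outgoing ARP request. For this sysctl the kernel uses the larger of the `all` and per-interface values. The sketch below is a simplified Python model of the three documented levels, not kernel code. The function name and arguments are hypothetical. It replays Bill's case: a VIP on lo, eth0 at 192.0.2.5/24, and an ARP for the gateway 192.0.2.1.

```python
import ipaddress


def arp_sender_ip(level: int, packet_src: str, target: str,
                  iface_addrs: list[str], host_addrs: set[str]) -> str:
    """Simplified model of how arp_announce picks the ARP sender address.

    iface_addrs: addresses (with prefix) on the outgoing interface, primary first.
    host_addrs:  every local address on the host (e.g. a VIP on lo).
    """
    if level not in (0, 1, 2):
        raise ValueError("this sketch only models levels 0-2")
    tgt = ipaddress.IPv4Address(target)
    ifaces = [ipaddress.IPv4Interface(a) for a in iface_addrs]

    if level == 0 and packet_src in host_addrs:
        # 0: any local address will do, so the packet's own source is reused.
        return packet_src
    if level == 1 and packet_src in host_addrs:
        # 1: reuse the packet's source only if it is on-link with the target.
        src = ipaddress.IPv4Address(packet_src)
        if any(src in i.network and tgt in i.network for i in ifaces):
            return packet_src
    # 2 (and fallback): interface address in the target's subnet, else the primary.
    for i in ifaces:
        if tgt in i.network:
            return str(i.ip)
    return str(ifaces[0].ip)


host = {"192.0.2.5", "203.0.113.10"}  # eth0 address plus a VIP on lo
for lvl in (0, 1, 2):
    print(lvl, arp_sender_ip(lvl, "203.0.113.10", "192.0.2.1", ["192.0.2.5/24"], host))
```

At level 0 the model reproduces the VIP-sourced ARP Bill complains about. Levels 1 and 2 fall back to 192.0.2.5.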
On Thu, Jun 6, 2013 at 3:00 PM, Bjørn Mork <bjorn@mork.no> wrote:
William Herrin <bill@herrin.us> writes:
On Wed, Jun 5, 2013 at 6:25 PM, Ricky Beam <jfbeam@gmail.com> wrote:
I won't argue against calling Linux "wrong". However, the linux way of dealing with ARP is well tuned for "host" and not "router" duty.
I love Linux and use it throughout my work but I can't tell you the number of times its ARP behavior has bitten me. If you send a packet to a VIP on a Linux box and it doesn't have an arp entry for the default gateway, the Linux box will send an arp request... with the vip as the source. That is just wrong. Wrong, wrong, wrong. Use the damn interface IP when you arp for something on that interface. If the router doesn't happen to like the bad arp (since the VIP isn't on the router's LAN) the router will ignore it. And your service will merrily pop up and down depending on whether the Linux box has any traffic to originate.
Did you try setting net.ipv4.conf.all.arp_announce=2?
Yes, of course I changed the sysctl. Yes, of course that worked. Every time I've run into the problem. On server after server after server.
Yes, the system default may be tuned for host/desktop usage
No, it doesn't default to reasonable desktop settings for ARP... it defaults to a version of wrong that, on a desktop with one NIC and one IP, doesn't happen to break anything. It'd be nice if it defaulted to RFC-compliant instead and let the few folks with wacky needs move it off the standard behavior. -Bill
On 6/6/13, William Herrin <bill@herrin.us> wrote:
Yes, the system default may be tuned for host/desktop usage No, it doesn't default to reasonable desktop settings for ARP... it defaults to a version of wrong that on a desktop with one NIC and one IP doesn't happen to break anything. It'd be nice if it defaulted to RFC compliant instead and let the few folks with wacky needs move it off the standard behavior.
I find Linux's arp defaults annoying also, but they're not "wrong" or "non-RFC compliant".

An interpretation that applies in the design of Linux networking is that IP addresses belong to the host, and IP addresses do not belong to IP interfaces (excepting 'scope local' IPs, such as IPv6 link-local). An interface has a source IP address assigned to it for outgoing traffic from the host. All destination IPs for incoming traffic to the host belong to no specific interface on the host. Any IP address added to any interface belongs to the host as a valid destination IP, and can be ARPed on any of the host's IP interfaces. Excepting a firewall rule to the contrary, traffic for any of the host's destination IPs can come in any interface.

This is a totally valid and correct way for a host to manage its IP addresses. However, it is a tad inconvenient for the administrator in some real-world circumstances, mainly unusual configs such as servers with multiple NICs plugged into different subnets, or servers behind a load balancer. And the ARP behavior is counterintuitive, because in spite of all that, in Linux you _still_ configure IP addresses on interfaces; every interface has a preferred IP, and maybe some alias IPs.

In most cases, Linux's choice not to restrict ARP to the specific interface the IP is bound to is not useful. However, it is useful if you have a host that has multiple NICs plugged into the same network.

The kernel has its defaults, but distribution vendors such as Red Hat/Ubuntu/Debian are free to supply their own defaults through sysctl.conf or their NetworkManager packages or network configuration scripts... It's interesting to note they have so far chosen to go (mostly) with the defaults. I'm sure most people do not have a problem, or else someone would have updated the defaults by now.
-- -JH
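Jimmy's "addresses belong to the host" view is what RFC 1122 calls the weak end-system model. On the answering side it corresponds to `arp_ignore`. The sketch below is a simplified Python model of its first three documented levels, not kernel code, and the function name and arguments are hypothetical. The example host has eth0 at 192.0.2.5/24, eth1 at 198.51.100.5/24 and a VIP on lo. An ARP arrives on eth0.

```python
import ipaddress


def should_answer_arp(level: int, target: str, sender: str,
                      in_iface: list[str], host_addrs: set[str]) -> bool:
    """Simplified model of arp_ignore for an incoming ARP request.

    in_iface:   addresses (with prefix) configured on the receiving interface.
    host_addrs: every local address on the host, on any interface.
    """
    if target not in host_addrs:
        return False  # not ours at all (proxy ARP is out of scope here)
    if level == 0:
        # Weak host model: any local address may be answered on any interface.
        return True
    local = [ipaddress.IPv4Interface(a) for a in in_iface]
    on_iface = [i for i in local if str(i.ip) == target]
    if level == 1:
        # Only answer for addresses configured on the receiving interface.
        return bool(on_iface)
    if level == 2:
        # As level 1, and the sender must be in that address's subnet.
        snd = ipaddress.IPv4Address(sender)
        return any(snd in i.network for i in on_iface)
    raise ValueError("this sketch only models levels 0-2")


host = {"192.0.2.5", "198.51.100.5", "203.0.113.10"}
eth0 = ["192.0.2.5/24"]
print(should_answer_arp(0, "198.51.100.5", "192.0.2.1", eth0, host))  # True
print(should_answer_arp(1, "198.51.100.5", "192.0.2.1", eth0, host))  # False
```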
On Fri, Jun 7, 2013 at 12:06 AM, Jimmy Hess <mysidia@gmail.com> wrote:
On 6/6/13, William Herrin <bill@herrin.us> wrote:
Yes, the system default may be tuned for host/desktop usage No, it doesn't default to reasonable desktop settings for ARP... it defaults to a version of wrong that on a desktop with one NIC and one IP doesn't happen to break anything. It'd be nice if it defaulted to RFC compliant instead and let the few folks with wacky needs move it off the standard behavior.
An interpretation that applies in the design of Linux networking, is that IP addresses belong to the host, and IP addresses do not belong to IP interfaces (excepting 'scope local' IPs, such as IPv6 link-local).
I find Linux's arp defaults annoying also, but they're not "wrong" or "non-RFC compliant".
Hi Jimmy,

I reread RFC 826 and, much to my annoyance, it doesn't directly speak to this question. But it does speak to it in a backhanded way, setting a requirement that makes sense only if the ARP source address is part of the subnet on which the arp request is made.

826 says, "The Address Resolution module then sets the [...] ar$spa with the protocol address of itself." "Itself" is never explicitly defined. But 826 also says, "The sender hardware address and sender protocol address are absolutely necessary. It is these fields that get put in a translation table." It says that in a context that appears to apply to both request and response ARPs. RFC 5227 confirms this interpretation, insisting that gratuitous arps and defensive arps are arp-request packets, not arp-reply packets.

That would make the ARP request message nonsensical *unless* the source layer 3 address is part of the subnet defined on that layer 2 network. Not just any source address will do; it must be one of the machine's addresses that would form a valid entry in the target's arp cache. Linux's default behavior copies the source IP address of the outgoing IP packet to the ARP request, regardless of whether that IP is valid for that particular LAN subnet. So, I reiterate that Linux's default for selecting the ARP source address does not match what the RFC says.

Postel's law cuts Linux some slack with respect to accepting ARPs on the wrong interface, even though that's almost always the wrong thing to do. On the other hand, it reinforces the errant nature of Linux's behavior with respect to source address selection when originating ARP requests.

-Bill
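The passage Bill quotes belongs to RFC 826's packet-reception procedure. Here it is as a minimal Python sketch, with the hardware/protocol type and length checks omitted and hypothetical names throughout. The point it illustrates is that the responder stores whatever ar$spa the requester chose. In the example, a router at 192.0.2.1 ends up caching an entry for a VIP that is not on its subnet.

```python
from dataclasses import dataclass
from typing import Optional

REQUEST, REPLY = 1, 2  # ares_op$REQUEST / ares_op$REPLY


@dataclass
class ArpPacket:
    op: int
    sha: str  # ar$sha, sender hardware address
    spa: str  # ar$spa, sender protocol address
    tha: str  # ar$tha, target hardware address
    tpa: str  # ar$tpa, target protocol address


def receive(pkt: ArpPacket, my_mac: str, my_ip: str,
            table: dict[str, str]) -> Optional[ArpPacket]:
    """Sketch of RFC 826 packet reception (type/length checks omitted)."""
    merge_flag = False
    if pkt.spa in table:          # refresh an existing <spa, sha> entry
        table[pkt.spa] = pkt.sha
        merge_flag = True
    if pkt.tpa != my_ip:          # "Am I the target protocol address?"
        return None
    if not merge_flag:            # learn the sender, even from a request
        table[pkt.spa] = pkt.sha
    if pkt.op == REQUEST:         # only now look at the opcode
        return ArpPacket(REPLY, my_mac, my_ip, pkt.sha, pkt.spa)
    return None


cache: dict[str, str] = {}
req = ArpPacket(REQUEST, "02:00:00:00:00:05", "203.0.113.10", "00:00:00:00:00:00", "192.0.2.1")
reply = receive(req, my_mac="02:00:00:00:00:01", my_ip="192.0.2.1", table=cache)
print(cache)  # {'203.0.113.10': '02:00:00:00:00:05'}
print(reply)
```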
Jimmy Hess <mysidia@gmail.com> writes:
The kernel has its defaults, but distribution vendors such as Red Hat/Ubuntu/Debian are free to supply their own defaults through sysctl.conf or their NetworkManager packages or network configuration scripts...
It's interesting to note they have so far chosen to go (mostly) with the defaults.
I'm sure most people do not have a problem, or else someone would have updated the defaults by now.
Changing defaults will break stuff for people relying on those defaults. This is usually not acceptable. At least not in the kernel. The behaviour is well documented and easy to change. Whining about the defaults not matching personal preferences is useless noise. Bjørn
On 2013-06-05 18:25, Ricky Beam wrote:
That said, I do use a stripped Debian box as an inter-vlan router. You don't want to see the pages of tweaks it's taken to stop it being a broadcast storm generator. (And no, "arpd" is a stupid hack.) It's a beautiful thing to run "tcpdump ... broadcast" and see no packets!
(And I'm not too happy with the BS 32 interface limit for multicast routing.)
Actually, I'd love to see the pages of tweaks. Seems like it would be useful if I need to do this in the future :) Maybe drop it on the Debian wiki somewhere if you get the chance. Or at the least, it would be nice to know what issues you're hitting now. You can tune the neighbor cache size and timeout via sysctl, so I would think it would be more of a memory limit than anything (unless the kernel uses a really poor hash lookup for arp entries).
--Ricky
--Robert
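The neighbor-cache knobs Robert mentions live under `net.ipv4.neigh.default` (and per interface). Below is a small sketch that renders them as a sysctl.conf fragment. It checks the ordering the garbage collector expects: gc_thresh1 ≤ gc_thresh2 ≤ gc_thresh3. The sysctl keys are real Linux names. The helper is hypothetical, and the numbers in the usage line are placeholders for illustration, not recommendations.

```python
def neigh_sysctl_fragment(thresh1: int, thresh2: int, thresh3: int,
                          stale_s: int, reachable_ms: int) -> str:
    """Render a sysctl.conf fragment for the IPv4 neighbor (ARP) cache."""
    if not (0 <= thresh1 <= thresh2 <= thresh3):
        raise ValueError("expected gc_thresh1 <= gc_thresh2 <= gc_thresh3")
    prefix = "net.ipv4.neigh.default"
    settings = {
        "gc_thresh1": thresh1,      # below this many entries, GC does not purge
        "gc_thresh2": thresh2,      # soft limit: GC gets more aggressive above it
        "gc_thresh3": thresh3,      # hard cap on non-permanent entries
        "gc_stale_time": stale_s,   # seconds unused before an entry may be collected
        "base_reachable_time_ms": reachable_ms,
    }
    return "\n".join(f"{prefix}.{k} = {v}" for k, v in settings.items()) + "\n"


# Placeholder values for illustration only.
print(neigh_sysctl_fragment(4096, 8192, 16384, 3600, 30000), end="")
```

The output can be dropped into a file under /etc/sysctl.d/ and loaded with `sysctl -p <file>`.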
On 6/5/13, rdrake <rdrake@direcpath.com> wrote:
On 2013-06-05 18:25, Ricky Beam wrote: [snip]
(And I'm not too happy with the BS 32 interface limit for multicast routing.)
Actually, I'd love to see the pages of tweaks. Seems like it would be useful if I need to do this in the future :)
The great thing about open sourced operating system kernels is if an arbitrary limit or system misbehavior causes you problems, or a tweak is needed to fix incorrect behavior, you can work out a patch to correct the situation -- or add an optional configuration setting to fix the problem, and submit the improvement to the maintainer in the form of a patch. :) -- -JH
Hi Mikael,

(Sorry if you are getting a duplicate copy of this.)

In our network we had a couple of problems with RFC 3069. Not all the hardware we currently use supports the RFC, so we tried to come up with a solution that worked and didn't have us opening a lot of ERs (I know I reference one ER in the presentation, but that's just one rather than a lot). We have more than just routers to consider (e.g. load balancers, firewalls, etc.) and don't want to lock ourselves into any particular vendor. We also wanted a solution that we could easily migrate our customers into, rather than completely taking them offline while we "retrofit" them into a new config (as probably would've been the case if we had tried implementing RFC 3069).

Additionally, for a number of our customers we needed a solution that worked with a FHRP. I don't currently see a way to do that with RFC 3069, but if I've missed something please let me know.

Thanks,
ChrisP.
SoftLayer Technologies
chrisp@softlayer.com
participants (10)
- Bjørn Mork
- Blake Hudson
- Christopher Papandreou
- Dan White
- Jimmy Hess
- Mikael Abrahamsson
- rdrake
- Ricky Beam
- Tore Anderson
- William Herrin