On Sat, Dec 28, 2013 at 12:56 AM, Jon Sands <fohdeesha@gmail.com> wrote:
Yes, and in that world, one should probably not start up an FTTH ISP when one has not even budgeted for a router, among a thousand other things. And if you must, you should probably figure out your cost breakdown beforehand, not after. Baldur, you mention $200k total to move 10G with Juniper (which seems insanely off to me). Look into Brocade's CER line; you can move 4x 10GbE per chassis for under $12k.
I was saying $100k for two Juniper routers total. Perhaps we could get back on track instead of trying to second-guess what we did or did not budget for. You have absolutely no information about our business plans.

The Brocade NetIron CER 2024F-4X (BR-CER-2024F-4X-RT-AC) goes for about $21k, and we need two of them. That is enough to buy a full year of unlimited 10G internet, and even then we would be short on 10G ports. It is not that we could not come up with that money if it were the only way to do it. It is just that I have so many other things I could spend that money on that would further our business plans far more.

I cannot even say whether the Juniper or the Brocade would actually solve my problem. I need it to route to tens of thousands of VLANs (Q-in-Q), with both IPv4 and IPv6. It needs to act as an IPv6 router on every VLAN, and very few devices seem to like having that many IP addresses assigned. It also needs to do VRRP and proxy ARP on every VLAN. The advantage of a software solution is that I can test it all before buying, and to some limited degree I am able to fix shortcomings myself.

Regards, Baldur
On Sat, Dec 28, 2013 at 8:09 AM, sten rulz <stenrulz@gmail.com> wrote:
Hello Baldur,
Your design with proxy ARP on every VLAN might hit some issues; if you look through the NANOG archives you will find people running into trouble with proxy ARP across a large number of VLANs. What is your requirement for proxy ARP? Doing something at the access switch, such as PVLAN or Brocade's "ip follow ve" statement, will most likely work better for you. If you are planning to put clients on the same subnet, what are you planning to put in place to prevent clients from stealing each other's IPs? Only a few Brocade devices support ARP ACL rules, which are a really nice feature. IP Source Guard works reasonably well if you are using a DHCP server; otherwise you need to specify the MAC address. Some other vendors' switches support filtering ARP packets per access port.
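(As an aside on that last point: per-access-port ARP filtering can also be tried on a plain Linux bridge with ebtables, which may help for lab-testing the idea before committing to vendor gear. A rough sketch only; the interface name, MAC and IP below are invented examples, and it assumes the access ports are bridged on a Linux box, which is not necessarily the setup being discussed here.)

# Lock bridged access port eth1 to one subscriber (example MAC/IP pair).
ebtables -A FORWARD -i eth1 -p ARP --arp-ip-src 192.0.2.10 --arp-mac-src 00:11:22:33:44:55 -j ACCEPT
ebtables -A FORWARD -i eth1 -p ARP -j DROP
# Likewise, only the expected source IP may send IPv4 traffic in from that port.
ebtables -A FORWARD -i eth1 -p IPv4 --ip-source ! 192.0.2.10 -j DROP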
This is a complex question that depends entirely on the capabilities of the equipment I can get. I was considering an OpenFlow solution, where this is easy: I would install rules that only forward traffic with the correct source IP from each VLAN. If the user tries something funny, nothing happens and his traffic is simply dropped. But I am a bit let down by the capabilities of current OpenFlow switches. Most only support OpenFlow 1.0, which is simply not good enough: it has no IPv6 support, and that is naturally a requirement. I know about the HP offerings, but they only support 4k rules in hardware, which is a far cry from enough. NoviFlow are still working on getting me a quote; if they can give me a competitive price I might still consider OpenFlow.

The problem is this: a conventional approach assigns a full IPv4 subnet to each user, which uses a minimum of four addresses per user. I currently have to pay somewhere between $10 and $20 for each address, and this will only become more expensive in the future. The users each have a unique VLAN (Q-in-Q). The question is, what do I put on those VLANs, if I do not want to put a full IPv4 subnet on each? My own answer to that is to have the users share a larger subnet, for example I could have a full class C sized subnet shared between 253 users/VLANs. To allow these users to communicate with each other, and so they can communicate with the default gateway IP, I will need proxy ARP. And in a non-OpenFlow solution, also the associated security functions such as DHCP snooping to prevent hijacking of IP addresses.

Which devices can solve this task? To me the work seems quite simple. For outbound packets, check that the source IP matches the expected IP on the VLAN, then forward the packet according to the routing table. For inbound packets, look up the destination IP to find the correct VLAN, push the VLAN tag onto the packet and forward it using the normal MAC lookup. For ARP packets, look up the destination VLAN from the destination IP, change the VLAN tag and forward the packet. There is no reason a device should not be able to handle a large number of rules like these, and the NoviSwitch will do it, but it appears that a lot of devices are quite limited in this regard.

I could buy a router/switch for every few thousand users and split the work between them. Spread over that many users, the extra cost would probably not be prohibitive. This is the "do the work at the edge" solution, although I would be hosting the equipment in the same rack as the core router. But why fill a rack with equipment to do simple dummy work that should be manageable by a single device?

Regards, Baldur
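(To make the rule shape concrete: a minimal sketch of the per-VLAN source check and the return-path VLAN handling described above, written as Open vSwitch flow rules. The bridge name, port numbers, VLAN ID and addresses are invented for illustration, the customer side is shown single-tagged rather than Q-in-Q, and OVS is only a stand-in here, not the hardware under discussion.)

# Upstream: only the expected source IP may leave customer VLAN 101 (port 1 = access side, port 2 = uplink).
ovs-ofctl add-flow br0 "priority=100,in_port=1,dl_vlan=101,ip,nw_src=203.0.113.101,actions=strip_vlan,output:2"
ovs-ofctl add-flow br0 "priority=50,in_port=1,dl_vlan=101,actions=drop"
# Downstream: look up the destination IP, tag the packet with that customer's VLAN and send it out the access port.
ovs-ofctl add-flow br0 "priority=100,in_port=2,ip,nw_dst=203.0.113.101,actions=mod_vlan_vid:101,output:1"
# ARP toward the customer's IP is retagged the same way.
ovs-ofctl add-flow br0 "priority=100,in_port=2,arp,nw_dst=203.0.113.101,actions=mod_vlan_vid:101,output:1"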
On Sun, 2013-12-29 at 03:31 +0100, Baldur Norddahl wrote:
(...) The users each have a unique VLAN (Q-in-Q). The question is, what do I put on those VLANs, if I do not want to put a full IPv4 subnet on each?
My own answer to that is to have the users share a larger subnet, for example I could have a full class C sized subnet shared between 253 users/VLANs.
To allow these users to communicate with each other, and so they can communicate with the default gateway IP, I will need proxy arp. And in a non-OpenFlow solution, also the associated security functions such as DHCP-snooping to prevent hijacking of IP addresses.
Which devices can solve this task?
Hi Baldur,

Assuming you manage 1.1.1.0/24 and 2001:db8:0::/48 and have a Linux box on both ends, you can get rid of IPv4 and IPv6 interconnect subnets and proxy ARP the following way:

1/ on the gateway:

ip addr add 1.1.1.0/32 dev lo

and for each client VLAN "NN" on eth0:

ip -6 addr add fe80::1/64 dev eth0.NN
ip -6 route add 2001:db8:0:NN00::/56 via fe80::1:NN dev eth0.NN

2/ on user CPE number "NN", the CPE WAN interface being eth0:

ip addr add 1.1.1.NN/32 dev eth0
ip route add 1.1.1.0/32 dev eth0
ip route add default via 1.1.1.0
ip -6 addr add fe80::1:NN/64 dev eth0
ip -6 route add default via fe80::1 dev eth0
# ip -6 addr add 2001:db8:0:NN00::1/56 dev eth0   # optional

Note: NN is in hex for IPv6.

The trick in IPv4 is that Linux by default will answer ARP requests for "1.1.1.0" on all interfaces, even though the address is on the loopback. In IPv6, use static link-local addresses on both ends. You can replace "1.1.1.0" with any IPv4 address, but since ".0" is rarely assigned to end users it doesn't waste anything, and it keeps traceroute showing a public IPv4 address.

The nice thing about this setup is that it "virtualizes" the routing from the client's point of view: you can split/balance your clients across multiple physical gateways without changing a line of client configuration while a client is being moved; you just have to configure your IGP between gateways to distribute the internal routes properly.

We (AS197422 / tetaneutral.net) use this for virtual machines too (with "tapNN" interfaces from KVM instead of "eth0.NN"): it lets us move virtual machines between physical machines without user reconfiguration, waste no IPv4, and avoid all the issues of shared L2 (rogue RA, ARP spoofing, whatever), since there is no shared L2 between user VMs anymore. It also means we don't have to pre-split our IPv4 space into a fixed scheme; we manage only /32s, so there is no waste at all.

Of course you still have work to do on PPS tuning.

Sincerely,

Laurent GUERBY AS197422 http://tetaneutral.net peering http://as197422.net

PS: minimum settings on a Linux router:

echo 1 > /proc/sys/net/ipv4/ip_forward
for i in /proc/sys/net/ipv6/conf/*; do for j in autoconf accept_ra; do echo 0 > $i/$j; done; done
echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
echo 65536 > /proc/sys/net/ipv6/route/max_size
for i in /proc/sys/net/ipv4/conf/*/arp_announce; do echo 2 > $i; done

PPS: we also like to give a /56 to our users in IPv6; it makes for a nice /24 IPv4 <=> /48 IPv6 correspondence (256 users).
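(A concrete example of the gateway side of the above: a small shell sketch that creates per-customer VLAN sub-interfaces and applies the link-local address and /56 route Laurent describes. The VLAN range, parent interface and prefixes are placeholder values; the per-client IPv4 /32 route is my addition, not shown in Laurent's commands, on the assumption that return traffic needs it.)

#!/bin/sh
# Sketch: provision customer VLANs 101-150 on parent interface eth0 (example values only).
ip addr add 1.1.1.0/32 dev lo                      # shared IPv4 gateway address, answered via ARP on every VLAN
for NN in $(seq 101 150); do
    HH=$(printf '%x' "$NN")                        # VLAN id in hex, as used in the IPv6 plan
    ip link add link eth0 name "eth0.$NN" type vlan id "$NN"
    ip link set "eth0.$NN" up
    ip -6 addr add fe80::1/64 dev "eth0.$NN"       # our end of the static link-local pair
    ip -6 route add "2001:db8:0:${HH}00::/56" via "fe80::1:${HH}" dev "eth0.$NN"
    ip route add "1.1.1.$NN/32" dev "eth0.$NN"     # assumed per-client IPv4 return route (not in the original commands)
done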
for i in /proc/sys/net/ipv4/conf/*/arp_announce; do echo 2 > $i;done
+1. Setting arp_announce is essential if Linux is being used as a router with more than one subnet. I would also recommend setting arp_ignore. For Linux-based routers, I've found the following settings to be optimal:

echo 1 > /proc/sys/net/ipv4/conf/all/arp_announce
echo 2 > /proc/sys/net/ipv4/conf/all/arp_ignore

On a side note, this underscores what a lot of people on-list are saying: if you don't understand the internals of a Linux system, "rolling your own" will bite you. It's also pretty rare to find a network engineer who is also a Linux system-level developer, so finding and keeping that talent can be a challenge. Many take this a step further and assert that, because of this, software-based systems can never be viable, which I disagree with. After all, the latest OS offerings from Cisco run a Linux kernel, and nearly all the Ciena DWDM and ME gear I run is built on Linux. These companies aren't doing quite as much with hardware acceleration as they would lead you to believe. I think Intel DPDK will be a disruptive technology for networking. At the end of the day, I'm pretty anxious to see the days of overpriced routers driving up network service costs go away.
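(For persistence across reboots, the same knobs can go in a sysctl configuration file instead of being echoed into /proc at boot. A sketch using the values Ray suggests plus Laurent's RA/autoconf settings; the filename is just an example, and on some kernels existing interfaces may still need the per-interface setting as in Laurent's loop.)

# /etc/sysctl.d/90-linux-router.conf  (example filename)
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.all.arp_ignore = 2
# Don't autoconfigure the router's own interfaces from router advertisements.
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.accept_ra = 0
net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.default.autoconf = 0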
--
Ray Patrick Soucy
Network Engineer
University of Maine System
T: 207-561-3526 F: 207-561-3531
MaineREN, Maine's Research and Education Network
www.maineren.net
participants (4)
- Baldur Norddahl
- Laurent GUERBY
- Ray Soucy
- sten rulz