Hi All,

Sorry if this is a repeat topic. I've done a fair bit of trawling but can't find anything concrete to base decisions on.

I'm hoping someone can offer some advice on suitable hardware and kernel tweaks for using Linux as a router running bgpd via Quagga. We do this at the moment and our box manages under the 100Mbps level very effectively. Over the next year however we expect to push about 250Mbps outbound traffic with very little inbound (50Mbps simultaneously) and I'm seeing differing suggestions of what to do in order to move up to the 1Gbps level.

It seems even a dual core box with expensive NICs and some kernel tweaks will accomplish this but we can't afford to get the hardware purchases wrong. We'd be looking to buy one live and one standby box within the next month or so. They will only run Quagga primarily with 'tc' for shaping. We're in the UK if it makes any difference.

Any help massively appreciated, ideally from those doing the same in production environments.

Thanks,
Chris
Chris <chris@ghostbusters.co.uk> writes:
I'm hoping someone can offer some advice on suitable hardware and kernel tweaks for using Linux as a router running bgpd via Quagga.
There was a talk, "Towards 10Gb/s open-source routing", at this year's Linux-Kongress in Hamburg. Here are the slides:

http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf

cheers
Jens
--
Berlin, Germany | http://www.quux.de | jabber: jenslink@guug.de
sage@guug Berlin: http://www.guug.de/lokal/berlin/index.html
I've been pretty happy running IBM x-series hardware using RHEL4. Usually it's PPS rather than throughput that will kill it, so if you're doing 250Mbit of DNS/I-mix/HTTP, you'll probably have very different results. There are some rx-ring tweaks for the NICs that are needed, but for the most part it's all out of the box (no custom kernel patches and such, just some sysctl settings).

I have two x3650s (quad core) doing around 6-700Mbit/sec (40k pps) at around 20% CPU right now. No Quagga BGP, but that's minimal in terms of CPU. I've not been able to get much beyond 1Gb/sec in this environment because my ASAs are not configured to support more than one gig into that particular network.
I don't think you will have any trouble with industry-standard hardware at the rates you are quoting. Once you exceed 300Mbps you have to start worrying about PPS. Above 600Mbps you should pick your system more carefully (tcpoe NICs, PCIe(X), CPU at over X GHz, fast RAM if you are doing a lot of BGP, tweaking your Linux distribution and kernel, etc.).

You should be fine with any recent hardware. A cheap HP DL360 would do a great job.

--p

-----Original Message-----
From: Chris [mailto:chris@ghostbusters.co.uk]
Sent: Wednesday, December 17, 2008 9:03 AM
To: nanog list
Subject: Gigabit Linux Routers

[snip]
You've given me lots to think about! Thanks for all the input so far.

A few queries for the replies if I may. My brain is whirring.

Chris: You're right and I'm tempted. I've almost had my arm twisted to go down the proprietary route as I have some Cisco experience but have become pretty familiar with Quagga and tc.

David: May I ask which NICs you use in the IBM boxes? I see the Intels recommended by Mike have dual ports on one board (the docs say "Two complete Gigabit Ethernet connections in a single device • Lower latency due to one electrical load on the bus").

Patrick: That's what I was hoping to hear :) It's not the world's biggest network.

Michael: Thanks very much. We have three upstreams. I guess 2GB of RAM would cover many more sessions.

Eugeniu: That's very useful. The Intel dual port NICs mentioned aren't any good then I presume (please see my comment to David).

Thanks again,
Chris
The boxes (3650s) came with Broadcom BCM5708 on-board, but I push most of my traffic over these:

1c:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
        Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 58
        Memory at c7ea0000 (32-bit, non-prefetchable) [size=128K]
        Memory at c7e80000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 6020 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting

There are four Intel ports in the boxes, so traffic may or may not stay on the same PCI-X card depending on how things are flowing.
This might be of some use: it's a document written by one of the AMS-IX engineers. It's a little aged (almost 2 years old), so there should be some improvement in the numbers, but it might give you some insight into the bottlenecks when pushing a Linux server to its max (10 Gigabit in this case):

http://noc.easycolocate.nl/10-GE_Routing_on_Linux.pdf
-- Met vriendelijke groet, Jeroen Wunnink, EasyHosting B.V. Systeembeheerder systeembeheer@easyhosting.nl telefoon:+31 (035) 6285455 Postbus 48 fax: +31 (035) 6838242 3755 ZG Eemnes http://www.easyhosting.nl http://www.easycolocate.nl
On Dec 18, 2008, at 4:13 AM, Jeroen Wunnink wrote:
This might be of some use, it's a document written by one of the AMS-IX engineers, it's a little aged (almost 2 years old) so there should be some improvement in the numbers, but it might give you some insight in the bottlenecks when pushing a Linux server to its max (10 Gigabit in this case)

Note that this test did not involve full BGP. Given the problems that used to occur on some name-brand routers when BGP took up too much CPU, I would be careful about extrapolating these results if you are planning on running full BGP.

As the paper itself says, "In a real-world situation the device might be running BGP, with a full routing table. This will surely affect the performance of the device."

Regards
Marshall
Just as another source of info here, I'm running:

Dual Core Intel Xeon 3060 @ 2.4GHz
2 GB RAM (it says "Mem: 2059280k total, 1258500k used, 800780k free, 278004k buffers" right now)
2 of these on the motherboard (port-channel bonded to my switch):
  Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
One other card with 2 ports:
  Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)

Gentoo Linux with a fairly small kernel with FIB_TRIE enabled. I'm taking in 2 full BGP feeds, running a decent number of iptables rules, and I've hit 1.2 Gbps with no problems. At this point, I just don't have anything behind the router to push more than that.

--
Alex Thurlow
Blastro Networks
http://www.blastro.com
http://www.roxwel.com
http://www.yallwire.com
Chris wrote:
Eugeniu: That's very useful. The Intel dual port NICs mentioned aren't any good then I presume (please see my comment to David).
Actually it depends on the motherboard chipset. Some chipsets allocate an interrupt per slot, and when you have lots of traffic between two ports on a dual-port card the interrupt load will increase dramatically. It should still get you to 1Gbps; at higher speeds, it depends.

It's advisable to use a 2.6 kernel, as its network stack is much better than the one in 2.4 and you can achieve higher speeds.
Chris wrote:
Hi All, Sorry if this is a repeat topic. I've done a fair bit of trawling but can't find anything concrete to base decisions on.
I'm hoping someone can offer some advice on suitable hardware and kernel tweaks for using Linux as a router running bgpd via Quagga. We do this at the moment and our box manages under the 100Mbps level very effectively. Over the next year however we expect to push about 250Mbps outbound traffic with very little inbound (50Mbps simultaneously) and I'm seeing differing suggestions of what to do in order to move up to the 1Gbps level.
Any recent hardware can do 1Gbps of routing from one NIC to another without issues. What you would need is PCI-Express cards, each in its own slot (try to avoid dual/quad port cards for I/O intensive tasks).

Quagga with one full view and two feeds of about 5000 prefixes each consumes around 50MB of RAM. Putting a lot of RAM in the box will not increase performance.

You can also use a kernel with LC-Trie as route hashing algorithm to improve FIB lookups.
It seems even a dual core box with expensive NICs and some kernel tweaks will accomplish this but we can't afford to get the hardware purchases wrong. We'd be looking to buy one live and one standby box within the next month or so. They will only run Quagga primarily with 'tc' for shaping. We're in the UK if it makes any difference.
Regarding tc, make sure you use a scalable algorithm like HTB/HFSC and tweak your rules so that a packet spends the least amount of time in the matching and classifying routines.
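For readers who have not looked inside a shaper, the building block that HTB arranges into a class hierarchy is the token bucket. The following is a minimal sketch of a single bucket only, with no hierarchy or borrowing, and it is not how tc is implemented; the class name, parameters, and the explicit `now` clock are assumptions made for illustration.

```python
class TokenBucket:
    """Single token bucket: tokens accrue at `rate_bps`, capped at `burst_bytes`."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # maximum tokens the bucket can hold
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time of the last refill, in seconds

    def try_send(self, size, now):
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        # Add the tokens earned since the last call, without exceeding the burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # not enough tokens: a real shaper would queue the packet


# 8000 bit/s is 1000 bytes per second, with a 1500-byte burst allowance.
bucket = TokenBucket(rate_bps=8000, burst_bytes=1500)
print(bucket.try_send(1500, now=0.0))  # full bucket covers one large packet
print(bucket.try_send(100, now=0.0))   # bucket is now empty
print(bucket.try_send(500, now=0.5))   # half a second earns 500 bytes
```

The per-packet cost here is constant, which is the property that matters at high packet rates; the classification step that picks the bucket is where rule ordering comes in.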
Any help massively appreciated, ideally from those doing the same in production environments.
At 100Mbps FDX full load (routing traffic from one NIC to another) on 2.53 GHz Celeron box with 512Mbps of traffic, the load is between 0.00 and 0.01-0.02
* Eugeniu Patrascu:
You can also use a kernel with LC-Trie as route hashing algorithm to improve FIB lookups.
Do you know if it's possible to switch off the route cache? Based on my past experience, it was a major source of routing performance dependency on traffic patterns (it's basically flow-based forwarding).

Anyway, with very few flows, we get quite decent performance (several hundred megabits five-minute peak, and we haven't bothered tuning yet), running on mid-range single-socket server boards and Intel NICs (PCI-X, this is all 2006 hardware). We use a router-on-a-stick configuration with VLAN separation between all hosts to get a decent number of ports.

My concern with PC routing (in the WAN area) is a lack of WAN NICs with properly maintained kernel drivers.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
Florian Weimer wrote:
* Eugeniu Patrascu:
My concern with PC routing (in the WAN area) is a lack of WAN NICs with properly maintained kernel drivers.
Depending on your WAN interface, there's actually a decent amount of stuff out there. The cheaper alternative to me has actually always been to get some old cisco hardware with the proper interfaces and use it for media conversion. I have a 6500 with Sup1As in it. It can't take BGP feeds with the amount of memory it has, but with the right cards, it will give my router Ethernet and push a few million pps with no problem. Sounds like he's getting Ethernet from his provider though, so this probably isn't an issue. -- Alex Thurlow Blastro Networks http://www.blastro.com http://www.roxwel.com http://www.yallwire.com
* Alex Thurlow:
Depending on your WAN interface, there's actually a decent amount of stuff out there. The cheaper alternative to me has actually always been to get some old cisco hardware with the proper interfaces and use it for media conversion. I have a 6500 with Sup1As in it. It can't take BGP feeds with the amount of memory it has, but with the right cards, it will give my router Ethernet and push a few million pps with no problem.
But you have to ask your peer to enable eBGP multihop, right? Or are there some TTL tricks you can play? -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
Florian Weimer wrote:
* Eugeniu Patrascu:
You can also use a kernel with LC-Trie as route hashing algorithm to improve FIB lookups.
Do you know if it's possible to switch off the route cache? Based on my past experience, it was a major source of routing performance dependency on traffic patterns (it's basically flow-based forwarding).
I don't understand your question.

In the kernel, when you compile it, you have two options:
- hash based route algorithm
- lc-trie based route algorithm

From what I've read on the internet about the latter algorithm, it's supposed to be faster regarding route lookups with large routing tables (like a global routing table).
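To make the difference between the two lookup styles concrete, here is a rough sketch of longest-prefix matching done both ways. It is an illustration under stated assumptions, not the kernel code: `HashFib` keeps one dictionary per prefix length and probes from longest to shortest, while `TrieFib` is a plain binary trie with none of the level or path compression that gives LC-trie its name. The class names and next-hop labels are invented for the example, and only IPv4 is handled.

```python
import ipaddress


class HashFib:
    """One dict per prefix length, probed from the longest length down."""

    def __init__(self):
        self.tables = {}  # prefix length -> {network address as int: next hop}

    def add(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

    def lookup(self, dst):
        addr = int(ipaddress.ip_address(dst))
        for plen in sorted(self.tables, reverse=True):
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            hop = self.tables[plen].get(addr & mask)
            if hop is not None:
                return hop  # first hit is the most specific match
        return None


class TrieFib:
    """Plain binary trie: one node per prefix bit, walked from the top bit."""

    def __init__(self):
        self.root = {}  # node layout: {0: child, 1: child, "hop": next hop}

    def add(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            node = node.setdefault((bits >> (31 - i)) & 1, {})
        node["hop"] = next_hop

    def lookup(self, dst):
        addr = int(ipaddress.ip_address(dst))
        node, best = self.root, self.root.get("hop")
        for i in range(32):
            node = node.get((addr >> (31 - i)) & 1)
            if node is None:
                break
            best = node.get("hop", best)  # remember the deepest match so far
        return best


routes = [("0.0.0.0/0", "default"), ("10.0.0.0/8", "a"),
          ("10.1.0.0/16", "b"), ("10.1.2.0/24", "c")]
hash_fib, trie_fib = HashFib(), TrieFib()
for prefix, hop in routes:
    hash_fib.add(prefix, hop)
    trie_fib.add(prefix, hop)

for dst in ("10.1.2.3", "10.1.3.1", "10.2.0.1", "8.8.8.8"):
    print(dst, hash_fib.lookup(dst), trie_fib.lookup(dst))
```

Both structures return the same answers; they differ in how the work grows with the number of distinct prefix lengths versus the depth of the trie, which is why the choice matters more with a full table.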
Anyway, with very few flows, we get quite decent performance (several hundred megabits five-minute peak, and we haven't bothered tuning yet), running on mid-range single-socket server boards and Intel NICs (PCI-X, this is all 2006 hardware). We use a router-on-a-stick configuration with VLAN separation between all hosts to get a decent number of ports.
In that configuration you'll split available bandwidth on the NIC and also have less throughput because server NICs are not optimized for "same interface switching".
My concern with PC routing (in the WAN area) is a lack of WAN NICs with properly maintained kernel drivers.
Usually it's better to get a dedicated router for that kind of stuff than bother with PC WAN cards.
* Eugeniu Patrascu:
Do you know if it's possible to switch off the route cache? Based on my past experience, it was a major source of routing performance dependency on traffic patterns (it's basically flow-based forwarding).
I don't understand your question.
Flow-based routing does not deal well with certain traffic patterns (high HTTP or DNS load, or DoS attacks).
In the kernel, when you compile it, you have two options:
- hash based route algorithm
- lc-trie based route algorithm
From what I've read on the internet about the latter algorithm, it's supposed to be faster regarding route lookups with large routing tables (like a global routing table).
In the past, Linux used flow routing. First, an ordinary hash table (the dst cache, also called route cache) is looked up using the destination address of the packet (and a few other bits). In case of a hit, the information is used. In case of a miss, a FIB lookup (using the hash algorithm or LC-trie) is performed, and the result is stored in the cache and used. If there are more flows than cache entries, the work to update the cache (and expire old records from it) is wasted. But under more benign conditions, the cache is a win.
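The lookup order described above can be sketched as a small simulation. This is a toy model and not the kernel's data structure: the cache here is keyed on the destination address alone and evicts the least recently used entry, whereas the real dst cache used more key bits and its own expiry logic. The `Fib` class is a linear-scan stand-in for the hash or LC-trie lookup, and all names and addresses are made up for the example.

```python
import ipaddress
from collections import OrderedDict


class Fib:
    """Toy FIB: longest-prefix match by scanning routes, most specific first."""

    def __init__(self):
        self.routes = []  # list of (network, next hop)

    def add(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))
        self.routes.sort(key=lambda r: r[0].prefixlen, reverse=True)

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        for net, hop in self.routes:
            if addr in net:
                return hop
        return None


class CachedRouter:
    """Check the cache first; on a miss, do the FIB lookup and store the result."""

    def __init__(self, fib, cache_size):
        self.fib = fib
        self.cache = OrderedDict()
        self.cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def route(self, dst):
        if dst in self.cache:
            self.hits += 1
            self.cache.move_to_end(dst)     # mark as recently used
            return self.cache[dst]
        self.misses += 1
        hop = self.fib.lookup(dst)          # slow path
        self.cache[dst] = hop
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return hop


fib = Fib()
fib.add("0.0.0.0/0", "upstream")
fib.add("10.0.0.0/8", "internal")
fib.add("10.1.0.0/16", "dmz")

# Few flows: after the first packet of each flow, everything hits the cache.
few = CachedRouter(fib, cache_size=4)
for _ in range(100):
    for dst in ("10.1.2.3", "10.9.9.9", "192.0.2.1"):
        few.route(dst)
print("few flows: ", few.hits, "hits,", few.misses, "misses")

# More flows than cache entries: every packet misses and the cache work is wasted.
many = CachedRouter(fib, cache_size=4)
for _ in range(100):
    for i in range(8):
        many.route(f"198.51.100.{i}")
print("many flows:", many.hits, "hits,", many.misses, "misses")
```

The second loop is the pathological case from the paragraph above: eight destinations cycling through a four-entry cache never hit, so each packet pays for the cache update and the FIB lookup.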
In that configuration you'll split available bandwidth on the NIC and also have less throughput because server NICs are not optimized for "same interface switching".
If this is a problem, I can use multiple trunk ports or multiple routers. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
Chris wrote:
Hi All, Sorry if this is a repeat topic. I've done a fair bit of trawling but can't find anything concrete to base decisions on.
I'm hoping someone can offer some advice on suitable hardware and kernel tweaks for using Linux as a router running bgpd via Quagga. We do this at the moment and our box manages under the 100Mbps level very effectively. Over the next year however we expect to push about 250Mbps outbound traffic with very little inbound (50Mbps simultaneously) and I'm seeing differing suggestions of what to do in order to move up to the 1Gbps level.
As somebody else said, it's more pps than bits you need to worry about. The Intel NICs can do a full gigabit without any difficulty, if packet size is large enough. But they buckle somewhere around 300Kpps. 300K 100-byte packets is only 240 Mb/s. On the other hand, you mentioned your traffic is mostly outbound, which makes me think you might be a content provider. In that case, you'll know what your average packet size is -- and it should be a lot bigger than 100 bytes. For that type of traffic, using a Linux router up to, say, 1.5-2 Gb/s is pretty trivial. You can do more than that, too, but have to start getting a lot more careful about hardware selection, tuning, etc.

The other issue is the number of concurrent flows. The actual route table size is unimportant -- it's the size of the route cache that matters. Unfortunately, I have no figures here. But I did once convert a router from limited routes (Quagga, 10K routes) to full routes (I think about 200K routes at the time), with absolutely no measurable impact. There were only a few thousand concurrent flows, and that number did not change -- and that's the one that might have made a difference.

I hope this is helpful.

Jim
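The packet-rate arithmetic above is easy to reproduce. The sketch below uses hypothetical helper names and ignores Ethernet framing overhead; the 300 kpps figure is the rough ceiling quoted in this message, not a measured value.

```python
def throughput_mbps(pps, packet_bytes):
    """Throughput in Mb/s for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1_000_000


def pps_needed(mbps, packet_bytes):
    """Packet rate needed to carry `mbps` at a given packet size."""
    return mbps * 1_000_000 / (packet_bytes * 8)


# 300K packets per second at 100 bytes each.
print(throughput_mbps(300_000, 100), "Mb/s")

# Packet rate needed for 250 Mb/s at a few packet sizes.
for size in (100, 500, 1500):
    print(size, "bytes:", round(pps_needed(250, size)), "pps")
```

At 1500-byte packets, 250 Mb/s needs only about 21 kpps, far below the quoted ceiling, which is why average packet size is the number to measure before buying hardware.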
All the responses have been really helpful. Thanks to everyone for being friendly and for taking the time to answer in detail. I've asked a hardware provider to quote for a couple of x86 boxes and I'll look for suitable Intel NICs too.

Jim: We're a very small ISP and have a full mix of packet sizes on the network but the vast majority is outbound on port 80 so hopefully that'll help.

Any more input will of course be considered. I may post the NIC models for approval if I'm scratching my head again :)

Thanks,
Chris

2008/12/17 Jim Shankland <nanog@shankland.org>
[snip]
On Wed, Dec 17, 2008, Chris wrote:
All the responses have been really helpful. Thanks to everyone for being friendly and for taking the time to answer in detail. I've asked a hardware provider to quote for a couple of x86 boxes and I'll look for suitable Intel NICs too.
Jim: We're a very small ISP and have a full mix of packet sizes on the network but the vast majority is outbound on port 80 so hopefully that'll help.
Any more input will of course be considered. I may post the NIC models for approval if I'm scratching my head again :)
Just FYI, the more recent Intel hardware has multiple hardware TX/RX queues, implemented via separate (IIRC) PCIe channels, and Linux/FreeBSD are growing support for handling these multiple queues via multiple kernel threads, i.e., multiple CPUs handling packet forwarding. The trick is whether they can pull it off in a way that scales the FIB and RIB lookups and updates across 4-core (and more) boxes.

But 40kpps is absolutely doable on one CPU.

Some of the FreeBSD guys working on it are looking at supporting 1mil pps+ on 10GE cards (in the public source tree), so .. :)

Adrian
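The idea behind multiple receive queues can be sketched as follows: hash each packet's flow identifiers to pick a queue, so that one flow always lands on the same CPU while different flows spread out. This is only an illustration. The hardware computes its own hash; CRC32 is swapped in here as a stand-in, and the function name and sample addresses are assumptions.

```python
import zlib
from collections import Counter


def rx_queue(src_ip, dst_ip, src_port, dst_port, proto, num_queues):
    """Map a flow's 5-tuple to a queue index (CRC32 stand-in for the NIC hash)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_queues


# Every packet of one flow maps to the same queue, so per-flow ordering is kept.
flow = ("192.0.2.10", "198.51.100.20", 40000, 80, "tcp")
print(rx_queue(*flow, num_queues=4), rx_queue(*flow, num_queues=4))

# Many flows spread across the queues, one queue per CPU core.
spread = Counter(
    rx_queue("192.0.2.10", "198.51.100.20", port, 80, "tcp", num_queues=4)
    for port in range(40000, 41000)
)
print(sorted(spread.items()))
```

Note that a single heavy flow still lands on one queue, so this helps with many concurrent flows rather than with one large transfer.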
Greetings all,

We are a software development firm that currently delivers our install ISOs via Sourceforge. We need to start serving them ourselves for marketing reasons and are therefore increasing our bandwidth and getting a 2nd ISP in our datacenter. Both ISPs will be delivering 100mbit/sec links. We don't expect to increase that for the next year or so and expect average traffic to be about 40-60mbit/sec.

We are planning to run two OpenBSD based firewalls (with CARP and pf) running OpenBGP in order to connect to the two ISPs.

I saw from previous email that Quagga was recommended as opposed to OpenBGP. Any further comments on that? Also, any comments on the choice of OpenBSD vs. Linux?

I don't want to start a religious war :-) Just curious about what most folks are doing and what their experiences have been.

Thanks in advance,

Marc Runkel
Technical Operations Manager
Untangle, Inc.
OpenBSD SMP support is quite limited. NetBSD SMP is quite limited. FreeBSD and Linux seem to be running better. :)

Adrian
-- - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support - - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
On Wed, Dec 17, 2008 at 9:37 AM, Marc Runkel <MRunkel@untangle.com> wrote: [snip]
We are planning to run two OpenBSD based firewalls (with CARP and pf) running OpenBGP in order to connect to the two ISPs.
I saw from previous email that Quagga was recommended as opposed to OpenBGP. Any further comments on that? Also, any comments on the choice of OpenBSD vs. Linux?
IMO, the performance and utility of OpenBSD as a routing/networking platform is unmatched by any other open source platform. OpenBGPD (recent 4-byte ASN issues notwithstanding) has been very stable for us in production (running roughly equivalent traffic levels to what you're discussing), and the best part is that you get stateful transparent failover with CARP, filtering/redirection with pf, load balancing all the way up through layer 7 with relayd, and a host of other excellent tools for the network engineer's toolkit, all included, and all integrated.

Then of course there are the wider issues: OpenBSD's track record on security and networking in comparison with the other OSS platforms, and the smaller pool of folks to draw on who are experienced in running and tuning OpenBSD (although any reasonably competent UNIX admin should be able to adapt to it in a few days, given the generally clean layout and high degree of internal consistency). advocacy@openbsd.org is down the hall, so I'll stop there. :)

As Adrian said, there are other platforms with better SMP implementations ... but my experience has been that for small and mid-size sites, CPU utilization on a reasonably modern x86-based router is the least of one's worries.

--
darkuncle@{gmail.com,darkuncle.net} || 0x5537F527
http://darkuncle.net/pubkey.asc for public key
Hi Marc,
I saw from previous email that Quagga was recommended as opposed to OpenBGP. Any further comments on that? Also, any comments on the choice of OpenBSD vs. Linux?
I don't want to start a religious war :-) Just curious about what most folks are doing and what their experiences have been.
We have been running a similar setup for about a year. I also don't want to start a "religious war" (being a happy user of both Linux and OpenBSD, for different purposes), but in this scenario my decision was quick and clear: I went for OpenBSD with OpenBGPD. That is consistent with my experience over the last few years that for the basic, "hidden" (from the end user's perspective) network services (routing, firewalling, DHCP, DNS…) OpenBSD never let me down and saved me a _lot_ of time and hassle as an admin (having done this stuff with Linux before). And admin time is often more valuable than one or two CPU cycles… (and as long as I get the throughput I demand plus a large enough margin I really don't care about those).

My basic rule of thumb now is (and I'm just pragmatic, not religious): If I can get away with the base installation of OpenBSD for a service, I give it the first try. The same went for OpenBGPD. It was also the documentation, the clean design and the usability (okay, that's really personal taste, but I really got to love the OpenBSD config file style) that helped with that decision. And from my perspective, it really was the right one: The setup just worked, right from the beginning. Flawless. With both Junipers and Ciscos as neighbors.
We are planning to run two OpenBSD based firewalls (with CARP and pf) running OpenBGP in order to connect to the two ISPs.
Just one thing independent of the OpenBSD vs. Linux question: Depending on the complexity of your setup, and maybe also for a cleaner design and possibly additional layers of security, I'd recommend thinking about separating the "pure" firewalls from the BGP stuff. I have three OpenBGPD boxes towards the Internet as our BGP peers plus two redundant pairs of OpenBSD carp/pf boxes towards different internal networks and DMZs. Between the OpenBGPD and the carp/pf boxes is our "backbone".

I experimented with a setup as you describe it (many different BGP/router/firewalling roles combined on one pair of OpenBSD boxes) first, but soon realized that (while perfectly okay for a simple setup) as soon as you get more and more specialized requirements, things tend to get unnecessarily complicated and you're probably better off with dedicated boxes (if not for performance reasons, then still for the design).

Best regards,
Beat Vontobel

--
Beat Vontobel, CTO, MeteoNews AG
Siewerdtstr. 105, CH-8050 Zurich, Switzerland
E-Mail: b.vontobel@meteonews.ch
IT Department: +41 (0)43 288 40 54
Main phone: +41 (0)43 288 40 50
On Thu, Dec 18, 2008 at 8:55 AM, Beat Vontobel <b.vontobel@meteonews.ch> wrote:
Hi Marc,
I saw from previous email that Quagga was recommended as opposed to OpenBGP. Any further comments on that? Also, any comments on the choice of OpenBSD vs. Linux?
I don't want to start a religious war :-) Just curious about what most folks are doing and what their experiences have been.
For the past couple of years we've had good success running Quagga border router/firewall boxes on Debian booting from Sony Microvault 1GB USB. Standard Debian install with some minor mods to make it USB friendly (noatime, a few /dev/shm links.) Once you've got used to apt, it's hard to accept anything else. I know lots of people prefer a stripped down system, but if you're running the same basic services (BGP, SSH) I don't see the difference. Disclaimer: we only take default from our upstreams, so can't comment on Quagga and full routes. Tim:>
Hi Marc,
We are a software development firm that currently delivers our install ISOs via Sourceforge. We need to start serving them ourselves for marketing reasons and are therefore increasing our bandwidth and getting a 2nd ISP in our datacenter. Both ISPs will be delivering 100mbit/sec links. We don't expect to increase that for the next year or so and expect average traffic to be about 40-60mbit/sec.
We are planning to run two OpenBSD based firewalls (with CARP and pf) running OpenBGP in order to connect to the two ISPs.
I saw from previous email that Quagga was recommended as opposed to OpenBGP. Any further comments on that? Also, any comments on the choice of OpenBSD vs. Linux?
I would suggest checking out Vyatta Linux as a possible Linux solution. It's designed to be configured as a routing/firewall platform. One caveat: I have never used it, but it seems to be mentioned on this list from time to time.

Now for my rant. I attempted a setup as you describe using two servers with pf, carp, and openbgp. I also had VLANs configured (each VLAN interface had its own CARP interface). I tried both load-balanced and failover mode but the results weren't desirable. The routers were connected to a switch which connects the servers and the ISP connection. There was only one drop from the ISP, but each router had its own /30 and BGP session on its own VLAN. The remaining servers were also VLANned appropriately. Each VLAN interface on the router that connects to the servers would also have an accompanying CARP interface.

There were a myriad of problems when attempting my setup. These are some that I distinctly recall.

* In load-balancing mode I would unplug a router. The other router would register as a CARP master but didn't forward the remaining traffic.
* In failover mode, when unplugging a router, the other router would forward traffic for certain VLANs and wouldn't register as master for the others.

In hindsight I should've reached out to the openbsd community for assistance. It's possible I was running into bugs in the CARP code or I was simply doing it all wrong. However, I was under a time crunch and this was merely a favour for a friend in need. I didn't want to further disrupt the network by testing, so I ended up going with a single router setup (still openbsd though). I haven't revisited the dual router setup since everything has been working fine and dandy with one router.

Regardless of what OS choice you make, be sure to thoroughly test your network setup and make sure it works as planned. Lastly, don't hesitate to ask the appropriate people for help. You may have discovered oddities that no one else has.

Good luck,
Naveen
"chris" == chris <chris@ghostbusters.co.uk> writes:
chris> All the responses have been really helpful. Thanks to everyone
chris> for being friendly and for taking the time to answer in detail.
chris> I've asked a hardware provider to quote for a couple of x86
chris> boxes and I'll look for suitable Intel NICs too.
chris> Jim: We're a very small ISP and have a full mix of packet sizes
chris> on the network but the vast majority is outbound on port 80 so
chris> hopefully that'll help.
chris> Any more input will of course be considered. I may post the NIC
chris> models for approval if I'm scratching my head again :)

It's also worth saying that you should consider using FreeBSD, which uses a radix tree for routes (constant time lookup) and is not flow-based.

Dave.

--
============================================================================
|David Gilbert, Independent Contractor.       | Two things can be          |
|Mail: dave@daveg.ca                          | equal if and only if they  |
|http://daveg.ca                              | are precisely opposite.    |
=========================================================GLO================
the recent facebook engineering post on scaling memcached to 200-300K UDP requests/sec/node may be germane here (in particular, patches to make irq handling more intelligent become very useful at the traffic levels being discussed). http://www.facebook.com/note.php?note_id=39391378919&id=9445547199&index=0 /sf On Wed, Dec 17, 2008 at 8:30 AM, Jim Shankland <nanog@shankland.org> wrote:
Chris wrote:
Hi All, Sorry if this is a repeat topic. I've done a fair bit of trawling but can't find anything concrete to base decisions on.
I'm hoping someone can offer some advice on suitable hardware and kernel tweaks for using Linux as a router running bgpd via Quagga. We do this at the moment and our box manages under the 100Mbps level very effectively. Over the next year however we expect to push about 250Mbps outbound traffic with very little inbound (50Mbps simultaneously) and I'm seeing differing suggestions of what to do in order to move up to the 1Gbps level.
As somebody else said, it's more pps than bits you need to worry about. The Intel NICs can do a full gigabit without any difficulty, if packet size is large enough. But they buckle somewhere around 300Kpps. 300K 100-byte packets is only 240 Mb/s. On the other hand, you mentioned your traffic is mostly outbound, which makes me think you might be a content provider. In that case, you'll know what your average packet size is -- and it should be a lot bigger than 100 bytes. For that type of traffic, using a Linux router up to, say, 1.5-2 Gb/s is pretty trivial. You can do more than that, too, but have to start getting a lot more careful about hardware selection, tuning, etc.
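For anyone who wants to redo the arithmetic above with their own numbers, here is a minimal sketch. The function names are mine, not from the thread, and it assumes a single fixed packet size and ignores Ethernet framing overhead, so treat the results as rough estimates only.

```python
def throughput_mbps(pps: int, packet_bytes: int) -> float:
    """Megabits per second carried by `pps` packets of `packet_bytes` each."""
    return pps * packet_bytes * 8 / 1_000_000


def pps_needed(mbps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill `mbps` with packets of `packet_bytes`."""
    return mbps * 1_000_000 / (packet_bytes * 8)


# The figure quoted in the thread: 300K packets of 100 bytes each.
print(throughput_mbps(300_000, 100))   # 240.0

# Hypothetical content-provider case: 250 Mb/s of 1000-byte packets.
print(pps_needed(250, 1000))           # 31250.0
```

With large packets the same 250 Mb/s needs roughly a tenth of the packet rate at which the NICs were said to buckle, which is why the traffic mix matters more than the headline bandwidth.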
The other issue is the number of concurrent flows. The actual route table size is unimportant -- it's the size of the route cache that matters. Unfortunately, I have no figures here. But I did once convert a router from limited routes (quagga, 10K routes) to full routes (I think about 200K routes at the time), with absolutely no measurable impact. There were only a few thousand concurrent flows, and that number did not change -- and that's the one that might have made a difference.
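To illustrate why the cache size tracks concurrent flows rather than table size, here is a toy model. It is only a sketch: the `FlowCache` class, its destination-only key and the LRU eviction are my own simplifications and not how the Linux route cache is actually implemented.

```python
from collections import OrderedDict


class FlowCache:
    """Toy LRU cache in front of a slow full-table lookup (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # destination -> next hop
        self.hits = 0
        self.misses = 0

    def lookup(self, dst, slow_path):
        if dst in self.entries:
            self.entries.move_to_end(dst)   # mark as recently used
            self.hits += 1
            return self.entries[dst]
        self.misses += 1
        nexthop = slow_path(dst)            # full routing-table lookup
        self.entries[dst] = nexthop
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return nexthop


def full_table_lookup(dst):
    # Stand-in for a lookup in a table of 10K or 200K routes; the cache
    # behaves the same either way.
    return "upstream-a"


cache = FlowCache(capacity=4)
for _ in range(100):
    for dst in ("198.51.100.7", "203.0.113.9", "192.0.2.44"):
        cache.lookup(dst, full_table_lookup)

print(cache.hits, cache.misses, len(cache.entries))  # 297 3 3
```

Three concurrent destinations produce three cache entries and three slow lookups, however large the table behind `full_table_lookup` is. Once the number of active flows exceeds the cache capacity, evictions and misses start to climb, which is the case that might hurt.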
I hope this is helpful.
Jim
-- darkuncle@{gmail.com,darkuncle.net} || 0x5537F527 http://darkuncle.net/pubkey.asc for public key
On 18/12/2008, at 3:02 AM, Chris wrote:
Hi All, Sorry if this is a repeat topic. I've done a fair bit of trawling but can't find anything concrete to base decisions on.
I'm hoping someone can offer some advice on suitable hardware and kernel tweaks for using Linux as a router running bgpd via Quagga. We do this at the moment and our box manages under the 100Mbps level very effectively. Over the next year however we expect to push about 250Mbps outbound traffic with very little inbound (50Mbps simultaneously) and I'm seeing differing suggestions of what to do in order to move up to the 1Gbps level.
It seems even a dual core box with expensive NICs and some kernel tweaks will accomplish this but we can't afford to get the hardware purchases wrong. We'd be looking to buy one live and one standby box within the next month or so. They will only run Quagga primarily with 'tc' for shaping. We're in the UK if it makes any difference.
Any help massively appreciated, ideally from those doing the same in production environments.
Give Click a try - it is an alternative forwarding plane for Linux, that ran much faster than regular Linux forwarding a few years ago, and I imagine would still do so. The XORP routing suite supports various different FIBs, including Click. http://read.cs.ucla.edu/click/ -- Nathan Ward
Ah, NO! Stay away from Click. It is NOT stable. Unless you want to hold your network together with paperclips and rubber bands, stay away. We use Linux software routing extensively where I work. We use Quagga primarily. I tried XORP, and it was very interesting, but not particularly ready for production. For our non-Linux boxes (FreeBSD), we use OpenBGPd, whereas on Linux, Quagga is far more stable. From a hardware perspective, you guys really don't need anything special anymore. Nathan Ward wrote:
On 18/12/2008, at 3:02 AM, Chris wrote:
Hi All, Sorry if this is a repeat topic. I've done a fair bit of trawling but can't find anything concrete to base decisions on.
I'm hoping someone can offer some advice on suitable hardware and kernel tweaks for using Linux as a router running bgpd via Quagga. We do this at the moment and our box manages under the 100Mbps level very effectively. Over the next year however we expect to push about 250Mbps outbound traffic with very little inbound (50Mbps simultaneously) and I'm seeing differing suggestions of what to do in order to move up to the 1Gbps level.
It seems even a dual core box with expensive NICs and some kernel tweaks will accomplish this but we can't afford to get the hardware purchases wrong. We'd be looking to buy one live and one standby box within the next month or so. They will only run Quagga primarily with 'tc' for shaping. We're in the UK if it makes any difference.
Any help massively appreciated, ideally from those doing the same in production environments.
Give Click a try - it is an alternative forwarding plane for Linux, that ran much faster than regular Linux forwarding a few years ago, and I imagine would still do so.
The XORP routing suite supports various different FIBs, including Click.
http://read.cs.ucla.edu/click/
-- Nathan Ward
-- +1.925.202.9485 Sargun Dhillon deCarta sdhillon@decarta.com www.decarta.com
Thanks to the list again. There are lots more options than I'd considered. I think it's likely that I'll stick with what I know, which is Linux (not FreeBSD) and Quagga. Not having to learn new stuff is my main motivation here, because I'm less likely to break things. One final quick question on the NICs if I can. Following Mike's suggestion about specific Intel chipsets (82575 or 82576), it looks like it's much easier to source the chipsets mentioned by David (82571EB). If these NICs are embedded on the motherboard, will that be a disadvantage in terms of performance? I take the point of the interrupts being the key, kindly thrown into the mix by Eugeniu. A nice man called John mailed me off list and mentioned this off-the-shelf build. On that note, does anyone have any experience of Lannerinc's appliances mentioned above by Ingo, or John's suggested RouterBoard: "the 1000 series seems good, just short on ram on the basic spec. At sub £500 notes, it's cheaper than buying a basic server and it's designed to do the job you need. http://www.routerboard.com/prices.html". Both appliances seem to perform well in the throughput tests. Now to look at very affordable layer 2, Gigabit 3com switches with good pps. Chris
Eugeniu Patrascu wrote:
Chris wrote:
Now to look at very affordable layer 2, Gigabit 3com switches with good pps.
You should take a look at HP. They have very good gigabit switches and also offer lifetime guarantee on them.
HP actually has a CLI to configure the switch, not the crap 3Com has.
Let me provide a strong second to HP. They are rock solid, easy to configure, easy to monitor remotely, and worth every penny. -- I like mathematics because it is not human and has nothing particular to do with this planet or with the whole accidental universe - because, like Spinoza's God, it won't love us in return. (Bertrand Russell)
On Dec 18, 2008, at 4:00 AM, Eugeniu Patrascu wrote:
Chris wrote:
Now to look at very affordable layer 2, Gigabit 3com switches with good pps.
You should take a look at HP. They have very good gigabit switches and also offer lifetime guarantee on them.
HP actually has a CLI to configure the switch, not the crap 3Com has.
Not to defend 3Com or anything, but all of their enterprise stuff (for quite a few years now) has an extremely similar CLI to IOS. Came out very shortly after they got involved with Huawei. If you're already familiar with 3com enterprise gear, check out the 4200G series for cheap L2 gig switching. -- Adam
Not to defend 3Com or anything, but all of their enterprise stuff (for quite a few years now) has an extremely similar CLI to IOS. Came out very shortly after they got involved with Huawei. If you're already familiar with 3com enterprise gear, check out the 4200G series for cheap L2 gig switching.
3Com's CLI is just different enough from Cisco's so they won't get sued.

show interface = display interface
write mem = save
no ip address = undo ip address
etc.

All in all we've been fairly happy with the higher end gear (5500EI, 5500GEI).
Dear Chris,
One final quick question on the NICs if I can. Following Mike's suggestion about specific Intel chipsets (82575 or 82576) it looks like it's much easier to source the chipsets mentioned by David (82571EB). If these NICs are embedded on the motherboard is it going to be of disadvantage in terms of performance ? I take the point of the interrupts being the key, kindly thrown into the mix by Eugeniu.
For a new system you should go with pci-e cards.
A nice man called John mailed me off list and mentioned this off-the-shelf build. On that note does anyone have any experience of Lannerinc's appliances mentioned above by Ingo
I posted those off-list; for the list:

http://www.lannerinc.com/DM/FW-7550_DM.pdf
pros: cheap, cf-disk support, low power (~50W)
cons: only 1GB RAM (enough for 1 million routes), pci-connected intel 82541GI, 32bit, 33MHz
acpi max-temp is set too low in the bios and needs an acpi-aml file to be loaded

http://www.axiomtek.de/uploads/na-820.pdf
pros: 7x pci-e

www.endian.com use them.
http://www.endian.com/en/products/hardware/macro-x2/

OS:
FreeBSD:
pros: very stable, quagga runs very well, fastforwarding support, simple traffic shaping, interrupt-less polling supported
cons: only 1 route for each network, vrrp failover is not easy to implement with quagga and ospf, no multipath routing
Linux:
pros: more than 1 route for each network possible, interrupt-less polling should be supported? fastforwarding?
cons: no multipath routing

CPUs:
Single-core CPUs perform better on FreeBSD than multi-core ones.

On the freebsd-net mailing list there is a very long thread about FreeBSD routers.

Kind regards,
Ingo Flaschberger
Ingo Flaschberger wrote:
OS: Freebsd: pros: very stable, quagge runs very well, fastforwarding support, simple traffic shaping, interrupt less polling supported cons: only 1 route for each network, vrrp failover is not easy to implement with quagga and ospf, no multipath routing Linux: pros: more than 1 route for each network possible, interrupt less polling should be supported? fastforwarding ?
cons: no multipath routing
      ^^^^^^^^^^^^^^^^^^^^
Are you sure? There is an option in the kernel, under advanced routing setup, to enable multipath routing. And with iproute2 you can also add multiple gateways with different or equal weights for a specific prefix.
Dear Eugeniu,
OS: Freebsd: pros: very stable, quagge runs very well, fastforwarding support, simple traffic shaping, interrupt less polling supported cons: only 1 route for each network, vrrp failover is not easy to implement with quagga and ospf, no multipath routing Linux: pros: more than 1 route for each network possible, interrupt less polling should be supported? fastforwarding ? cons: no multipath routing ^^^^^^^^^^^^^^^^^^^^ Are you sure ? Because there is an option in the kernel, under advanced routing setup to enable multipath routing. And also, with iproute2, you can add multiple gateways with different/equal weights for a specific prefix
Multipath, yes, but flow-based, not per packet. A patch exists for the 2.4 kernel, but not for 2.6. Or tinker with iptables. Kind regards, Ingo Flaschberger
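The distinction between flow-based and per-packet multipath can be sketched as follows. This is an illustration only: the hash-on-5-tuple selection, the function names and the next-hop labels are assumptions of mine and stand in for whatever mechanism the kernel actually uses to pin a flow to a gateway.

```python
import hashlib
from itertools import cycle

NEXTHOPS = ["isp-a", "isp-b"]  # hypothetical gateway names


def flow_based_nexthop(flow, nexthops):
    """Pick a next hop from a hash of the flow tuple: same flow, same path."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(nexthops)
    return nexthops[index]


def per_packet_nexthops(nexthops):
    """Round-robin over next hops: consecutive packets take different paths."""
    return cycle(nexthops)


# (src ip, src port, dst ip, dst port, protocol)
flow = ("10.0.0.1", 40000, "192.0.2.10", 80, "tcp")

# Every packet of one flow lands on the same gateway.
flow_choices = {flow_based_nexthop(flow, NEXTHOPS) for _ in range(100)}
print(len(flow_choices))  # 1

# Per-packet balancing alternates regardless of flow.
rr = per_packet_nexthops(NEXTHOPS)
print([next(rr) for _ in range(4)])  # ['isp-a', 'isp-b', 'isp-a', 'isp-b']
```

Flow-based selection keeps packets of one connection in order but balances only as well as the flows are distributed; per-packet selection balances evenly but can reorder packets within a connection.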
One final query for this thread if I may. Our hardware provider has come back with this as an 'easy to source build' in case we want two or three identical boxes: Supermicro X7SBI-LN2 motherboard with 2 x Intel 82573V/L gigabit PCI-Express NICs Does anyone have experience of these NICs before I commit ? Or any other comments ? I'll start trawling their specs too. Thanks again to all that responded, Chris
Ingo Flaschberger wrote:
cons: only 1 route for each network, vrrp failover is not easy to implement with quagga and ospf, no multipath routing
Does anyone care about VRRPD when you have Heartbeat?
Linux: pros: more than 1 route for each network possible, interrupt less polling should be supported? fastforwarding ? cons: no multipath routing
In what way is multipath routing not supported? iproute2 and conntrack have done this for ages. Equal metric round robin is also possible and works very well; the only problem is that it's not capacity sensitive.
I have posted thos off-list, for the list: http://www.lannerinc.com/DM/FW-7550_DM.pdf pros: cheap, cf-disk support, low power (~50W)
cf-disk support is pretty easy to add to lots of things. With the advent of 4GB compact flash modules and CF-to-IDE adapters, it is not too hard to avoid rotating media...
OS: Freebsd: pros: very stable, quagge runs very well, fastforwarding support,
quagga OSPF needs a patch on FreeBSD 7, else it will decimate your OSPF environment.
simple traffic shaping, interrupt less polling supported
Several different traffic shaping strategies are available, and I think all of them go far beyond "simple".
cons: only 1 route for each network, vrrp failover is not easy to implement with quagga and ospf, no multipath routing
carp seems easy to implement, even with quagga and ospf. At least, it's set up on a lab setup here and everything appears to work as expected. ... JG -- Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net "We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN) With 24 million small businesses in the US alone, that's way too many apples.
Dear Joe,
Several different traffic shaping strategies are available, and I think all of them go far beyond "simple".
ipfw add 100 pipe 1 all from 192.168.0.0/24 to any xmit vlan1
ipfw pipe 1 config bw 95Mbit/s queue 200Kbytes

That's simple.
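As a rough sanity check on those pipe parameters, the worst-case delay added by a full queue can be estimated as queue size divided by drain rate. This is a sketch of the arithmetic only; it assumes 1 Kbyte = 1000 bytes (dummynet may count differently) and a queue that drains at exactly the configured bandwidth.

```python
def max_queue_delay_ms(queue_bytes: int, bandwidth_bps: int) -> float:
    """Time in milliseconds to drain a full queue at the configured bandwidth."""
    return queue_bytes * 8 / bandwidth_bps * 1000


# Values from the pipe config above: 200 Kbytes of queue at 95 Mbit/s.
delay = max_queue_delay_ms(200 * 1000, 95_000_000)
print(f"{delay:.1f} ms")  # 16.8 ms
```

So a packet arriving behind a full queue would wait somewhere around 17 ms under these assumptions; a larger queue absorbs bigger bursts at the cost of more latency.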
cons: only 1 route for each network, vrrp failover is not easy to implement with quagga and ospf, no multipath routing
carp seems easy to implement, even with quagga and ospf. At least, it's set up on a lab setup here and everything appears to work as expected.
example setup:

A----(ospf)---B
 \           /
  \         /
   \       /
    \     /
     \   /
     lan1

A and B share 1 virtual ip for lan1 (192.168.0.1/24).

problems:
*) only 1 ip-net supported (no aliases)
*) carp is i bound, carp-dev line openbsd is in development (not sure if already stable)
*) if carp switch over:
   t=0: A is master, has route 192.168.0.1/24
        B has route 192.168.0.1/24 via ospf
   t=1: A goes down, route disappears (need linkstate in ospf)
   t=2: B carp takes over 192.168.0.1/24
        B can not add 192.168.0.1/24 route as it is still known via ospf
   t=3: B gets update to remove route 192.168.0.1/24 via ospf
   t=4: 192.168.0.1/24 route has disappeared, failover broken.

With ucarp, some special scripts and source code changes I was able to handle this situation, but not with carp and ospf (at least on freebsd 6.3).

Kind regards,
Ingo Flaschberger
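The timeline above can be replayed with a toy routing table for router B. This is a sketch under one explicit assumption taken from the description: the kernel keeps a single route per prefix and refuses a new one while another is installed. The `RoutingTable` class is hypothetical, and the prefix is written as the network 192.168.0.0/24 that the shared address 192.168.0.1/24 belongs to.

```python
class RoutingTable:
    """Toy kernel table: one route per prefix, an existing entry blocks new ones."""

    def __init__(self):
        self.routes = {}  # prefix -> source ("connected" or "ospf")

    def add(self, prefix, source):
        if prefix in self.routes:
            return False          # refused: prefix already present
        self.routes[prefix] = source
        return True

    def delete(self, prefix, source):
        # Only the owner of a route can withdraw it.
        if self.routes.get(prefix) == source:
            del self.routes[prefix]


LAN = "192.168.0.0/24"
b = RoutingTable()

b.add(LAN, "ospf")                    # t=0: B learns the LAN from A via OSPF
# t=1: A goes down; B's table has not changed yet
installed = b.add(LAN, "connected")   # t=2: CARP takeover, connected route refused
b.delete(LAN, "ospf")                 # t=3: OSPF withdraws the route
print(installed, b.routes)            # t=4: False {}
```

In this model B ends up with no route to the LAN at all, which matches the broken failover described. Re-adding the connected route after the OSPF withdrawal is the step that the modified ucarp scripts mentioned above would have to perform.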
Dear Joe,
Several different traffic shaping strategies are available, and I think all of them go far beyond "simple".
ipfw 100 add pipe 1 all from 192.168.0.0/24 to any xmit vlan1 ipfw pipe 1 config bw 95Mbit/s queue 200Kbytes
thats simple.
Yes, but the point was that the feature was listed as "simple traffic shaping." You can do *complicated* traffic shaping too, which was the reason I commented on that. Usually the ability to do complicated traffic shaping means you can do simple traffic shaping too. ;-)
cons: only 1 route for each network, vrrp failover is not easy to implement with quagga and ospf, no multipath routing
carp seems easy to implement, even with quagga and ospf. At least, it's set up on a lab setup here and everything appears to work as expected.
example setup:
A----(ospf)---B \ / \ / \ / \ / \ / lan1
A and B share 1 virtual ip for lan1 (192.168.0.1/24). problems: *) only 1 ip-net supported (no aliases)
So, you want, what, like multiple aliases on the network? I just tried adding an alias with the normal alias syntax, and it looks to work.

rtr0# ifconfig vlan20; ifconfig carp20
vlan20: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether 00:04:23:b7:8e:08
        inet 206.55.68.194 netmask 0xffffffe0 broadcast 206.55.68.223
        media: Ethernet autoselect
        status: active
        vlan: 20 parent interface: lagg0
carp20: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 206.55.68.193 netmask 0xffffffe0
        inet 206.55.68.196 netmask 0xffffffff
        carp: BACKUP vhid 1 advbase 1 advskew 0

rtr1# ifconfig vlan20; ifconfig carp20
vlan20: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:80:c8:cd:43:1d
        inet 206.55.68.195 netmask 0xffffffe0 broadcast 206.55.68.223
        media: Ethernet autoselect
        status: active
        vlan: 20 parent interface: lagg1
carp20: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 206.55.68.193 netmask 0xffffffe0
        inet 206.55.68.196 netmask 0xffffffff
        carp: MASTER vhid 1 advbase 1 advskew 0

switch20> ping 206.55.68.193
Pinging 206.55.68.193 with 56 bytes of data:
56 bytes from 206.55.68.193: icmp_seq=1. time=0 ms
56 bytes from 206.55.68.193: icmp_seq=2. time=0 ms
56 bytes from 206.55.68.193: icmp_seq=3. time=0 ms
56 bytes from 206.55.68.193: icmp_seq=4. time=0 ms
----206.55.68.193 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/0

switch20> ping 206.55.68.194
Pinging 206.55.68.194 with 56 bytes of data:
56 bytes from 206.55.68.194: icmp_seq=1. time=0 ms
56 bytes from 206.55.68.194: icmp_seq=2. time=0 ms
56 bytes from 206.55.68.194: icmp_seq=3. time=0 ms
56 bytes from 206.55.68.194: icmp_seq=4. time=0 ms
----206.55.68.194 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/0

switch20> ping 206.55.68.195
Pinging 206.55.68.195 with 56 bytes of data:
56 bytes from 206.55.68.195: icmp_seq=1. time=0 ms
56 bytes from 206.55.68.195: icmp_seq=2. time=0 ms
56 bytes from 206.55.68.195: icmp_seq=3. time=0 ms
56 bytes from 206.55.68.195: icmp_seq=4. time=0 ms
----206.55.68.195 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/0

switch20> ping 206.55.68.196
Pinging 206.55.68.196 with 56 bytes of data:
56 bytes from 206.55.68.196: icmp_seq=1. time=0 ms
56 bytes from 206.55.68.196: icmp_seq=2. time=0 ms
56 bytes from 206.55.68.196: icmp_seq=3. time=0 ms
56 bytes from 206.55.68.196: icmp_seq=4. time=0 ms
----206.55.68.196 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/0
switch20>

Mmm, generally, it looks to me like it works, but the above is the entirety of my testing, so I could easily be wrong.
*) carp is i bound, carp-dev line openbsd is in development (not shure if already stable)
You mean inbound? Well, yes. That's reasonably practical. It isn't entirely clear what other paradigms would look like (i.e. if the host system didn't have a native address on the wire), though several ideas spring to mind. Am I correct in assuming that you mean to have no native interface on the network in question, and only a CARP interface? Or am I reading in between the lines incorrectly?
*) if carp switch over: t=0: A is master, has route 192.168.0.1/24 B has route 192.168.0.1/24 via ospf t=1: A goes down, route disappear (need linkstate in ospf) t=2: B carp takes over 192.168.0.1/24 B can not add 192.168.0.1/24 route as it is still known via ospf t=3: B gets update to remove route 192.168.0.1/24 via ospf t=4: 192.168.0.1/24 route has disappeared, failover broken.
with ucarp, some special scripts and source code changed I was able to handle this situation, but not with carp and ospf (at least at freebsd 6.3)
I agree that this is a problematic scenario. FreeBSD 5.* and 6.* are pretty worthless to us, so we've pretty much jumped from 4 to 7, and so my knowledge of the networking improvements in between is limited. Under FreeBSD 4, there is indeed a great deal of pain associated with routes coming in via a routing protocol that are also theoretically available on a directly-attached interface.

I just tried downing rtr1: vlan20 on the above (which is FreeBSD 7, obviously) and from rtr1's PoV the network did move correctly to an alternate route via OSPF, but upon re-enabling the vlan20 interface, the OSPF route remained. Now, it seemed to "all work again" when I did the following:

# ifconfig vlan20 up
# route delete -net 206.55.68.192
# ifconfig vlan20 inet 206.55.68.195 netmask 0xffffffe0

which re-established the local link. That's not ideal, but it is a lot better than FreeBSD 4, where things were just breaking all over if you did "strange" things like this.

For most important things around here, we use OSPF with stub routes so the failure of a particular ethernet is not necessarily of great concern, but it would be nice to see things like this know how to DTRT.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
Dear Joe,
Yes, but the point was that the feature was listed as "simple traffic shaping." You can do *complicated* traffic shaping too, which was the reason I commented on that. Usually the ability to do complicated traffic shaping means you can do simple traffic shaping too. ;-)
with linux? really?
Mmm, generally, it looks to me like it works, but the above is the entirety of my testing, so I could easily be wrong.
Do you have ospf between these two boxes? Show me their routing tables. Do a failover and show the routing tables again.
*) carp is i bound, carp-dev line openbsd is in development (not shure if already stable)
You mean inbound? Well, yes. That's reasonably practical. It isn't entirely clear what other paradigms would look like (i.e. if the host system didn't have a native address on the wire), though several ideas spring to mind.
Am I correct in assuming that you mean to have no native interface on the network in question, and only a CARP interface? Or am I reading in between the lines incorrectly?
Only the carp interface has the IPs.
*) if carp switch over: t=0: A is master, has route 192.168.0.1/24 B has route 192.168.0.1/24 via ospf t=1: A goes down, route disappear (need linkstate in ospf) t=2: B carp takes over 192.168.0.1/24 B can not add 192.168.0.1/24 route as it is still known via ospf t=3: B gets update to remove route 192.168.0.1/24 via ospf t=4: 192.168.0.1/24 route has disappeared, failover broken.
with ucarp, some special scripts and source code changed I was able to handle this situation, but not with carp and ospf (at least at freebsd 6.3)
I agree that this is a problematic scenario. FreeBSD 5.* and 6.* are pretty worthless to us, so we've pretty much jumped from 4 to 7, and so my knowledge of the networking improvements in between are limited.
I have not yet tested FreeBSD 7, as the multicast kernel interface changed and Quagga OSPF broke. Also, I need(ed) a stable platform.
Under FreeBSD 4, there is indeed a great deal of pain associated with routes coming in via a routing protocol that are also theoretically available on a directly-attached interface.
I just tried downing rtr1: vlan20 on the above (which is FreeBSD 7, obviously) and from rtr1's PoV the network did move correctly to an alternate route via OSPF, but upon re-enabling the vlan20 interface, the OSPF route remained. Now, it seemed to "all work again" when I did the following:
Yes, that's the problem.
# ifconfig vlan20 up # route delete -net 206.55.68.192 # ifconfig vlan20 inet 206.55.68.195 netmask 0xffffffe0
I have changed ucarp to do so, but you also need gratuitous ARP and such stuff to get a real, flawless failover.
which re-established the local link. That's not ideal, but it is a lot better than FreeBSD 4, where things were just breaking all over if you did "strange" things like this.
For most important things around here, we use OSPF with stub routes so the failure of a particular ethernet is not necessarily of great concern, but it would be nice to see things like this know how to DTRT.
DTRT? Kind regards, Ingo Flaschberger
Dear Joe,
Yes, but the point was that the feature was listed as "simple traffic shaping." You can do *complicated* traffic shaping too, which was the reason I commented on that. Usually the ability to do complicated traffic shaping means you can do simple traffic shaping too. ;-)
with linux? really?
Reread the message... the text was in reply to a discussion of FreeBSD features. And, yes, really. ipfw, pf, and ipf solutions are all trivially available, giving you a selection of rule types and altq or dummynet shaping options that can be tied into extremely flexible rules.
Mmm, generally, it looks to me like it works, but the above is the entirety of my testing, so I could easily be wrong.
you have ospf between this 2 boxes?
Yes. vlan20 is OSPF-enabled, as a matter of fact, and both routers are on it. The goal was to see if I could get a network that was smart enough for both OSPF-enabled hosts ("no static gateway needs to be config'ed") and non-OSPF hosts (CARP as default gateway).
show me them routing table. do a failover and show the routing table again,
I did that experiment below. I didn't grab snapshots of the routing table at the time, but I described the effect. Essentially, upon downing of the interface, the local link via the vlan20 interface went away, and was promptly replaced by the OSPF route (generally good/desirable). Further discussion was in my previous message.
*) carp is i bound, carp-dev line openbsd is in development (not shure if already stable)
You mean inbound? Well, yes. That's reasonably practical. It isn't entirely clear what other paradigms would look like (i.e. if the host system didn't have a native address on the wire), though several ideas spring to mind.
Am I correct in assuming that you mean to have no native interface on the network in question, and only a CARP interface? Or am I reading in between the lines incorrectly?
only carp-int has the ip's.
Really? Interesting. I'm trying to think of how that would be configured. How does the system identify which ethernet interface to use, or is this something that's specific to Linux?
*) if carp switch over: t=0: A is master, has route 192.168.0.1/24 B has route 192.168.0.1/24 via ospf t=1: A goes down, route disappear (need linkstate in ospf) t=2: B carp takes over 192.168.0.1/24 B can not add 192.168.0.1/24 route as it is still known via ospf t=3: B gets update to remove route 192.168.0.1/24 via ospf t=4: 192.168.0.1/24 route has disappeared, failover broken.
with ucarp, some special scripts and source code changed I was able to handle this situation, but not with carp and ospf (at least at freebsd 6.3)
I agree that this is a problematic scenario. FreeBSD 5.* and 6.* are pretty worthless to us, so we've pretty much jumped from 4 to 7, and so my knowledge of the networking improvements in between is limited.
I have not yet tested FreeBSD 7, as the multicast kernel interface changed and Quagga OSPF broke. Also, I need(ed) a stable platform.
I'm aware of the Quagga OSPF issues, having grinched about them a number of times in various places. For what it is worth, there's a patch that appears to work, but which was thought to not really be a "correct" fix. Several people, including us, however, are using it with apparent success.
Under FreeBSD 4, there is indeed a great deal of pain associated with routes coming in via a routing protocol that are also theoretically available on a directly-attached interface.
I just tried downing rtr1: vlan20 on the above (which is FreeBSD 7, obviously) and from rtr1's PoV the network did move correctly to an alternate route via OSPF, but upon re-enabling the vlan20 interface, the OSPF route remained. Now, it seemed to "all work again" when I did the following:
Yes, that's the problem.
# ifconfig vlan20 up
# route delete -net 206.55.68.192
# ifconfig vlan20 inet 206.55.68.195 netmask 0xffffffe0
I have changed ucarp to do so, but you also need gratuitous ARP and similar to get a real, flawless failover.
Don't know.
which re-established the local link. That's not ideal, but it is a lot better than FreeBSD 4, where things were just breaking all over if you did "strange" things like this.
For most important things around here, we use OSPF with stub routes so the failure of a particular ethernet is not necessarily of great concern, but it would be nice to see things like this know how to DTRT.
DTRT?
Do the right thing. ... JG -- Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net "We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN) With 24 million small businesses in the US alone, that's way too many apples.
Dear Joe,
I did that experiment below. I didn't grab snapshots of the routing table at the time, but I described the effect. Essentially, upon downing of the interface, the local link via the vlan20 interface went away, and was promptly replaced by the OSPF route (generally good/desirable). Further discussion was in my previous message.
I'm not sure this setup would ever be "stable", even with ucarp tweaks. Hopefully FreeBSD will soon support more than 1 route.
Only carp-int has the IPs.
Really? Interesting. I'm trying to think of how that would be configured. How does the system identify which ethernet interface to use, or is this something that's specific to Linux?
I'm not sure how I configured that (~6 months ago). Now with ucarp I use a /32 as the IP on the interfaces, and the virtual IP is added as an alias.
I'm aware of the Quagga OSPF issues, having grinched about them a number of times in various places. For what it is worth, there's a patch that appears to work, but which was thought to not really be a "correct" fix. Several people, including us, however, are using it with apparent success.
As far as I remember, FreeBSD changed the multicast interface to Linux style. The source code seems to be there already; only the makefile needs to be changed to support both FreeBSD <7 and 7. Kind regards, Ingo Flaschberger
On Sat, Dec 20, 2008, Ingo Flaschberger wrote:
I'm not sure this setup would ever be "stable", even with ucarp tweaks. Hopefully FreeBSD will soon support more than 1 route.
FreeBSD, like all good open source projects, gets features supported when people code them up. So if you'd like to see FreeBSD support it, either code it up, or pay someone to code it up. Then everyone benefits. :) Adrian
Date: Sun, 21 Dec 2008 12:58:42 +0900 From: Adrian Chadd <adrian@creative.net.au>
On Sat, Dec 20, 2008, Ingo Flaschberger wrote:
I'm not sure this setup would ever be "stable", even with ucarp tweaks. Hopefully FreeBSD will soon support more than 1 route.
FreeBSD, like all good open source projects, gets features supported when people code them up.
So if you'd like to see FreeBSD support it, either code it up, or pay someone to code it up. Then everyone benefits. :)
I might mention that the development branch of FreeBSD (8-current) now supports multiple routing tables, so it looks like it is on the way. Note that these are being done for broader purposes than those being discussed in this thread, but I believe that they will do most of what is needed. If you are interested, I'm sure Julian would appreciate testing and input. -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: oberman@es.net Phone: +1 510 486-8634 Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
We spent a good amount of time looking into deploying a home-grown Linux-based CPE device over the summer.

Generally, Linux is not the issue with performance. You want to focus on your hardware. We've seen the best performance with Intel MT series PCI-X server NICs. When we were testing, the PCI-e cards were still underperforming, but they may have improved recently. The Intel cards have significantly better driver support in Linux, so you will probably want to stay away from anything without an Intel chipset.

We also went with a low-end server-grade box from Dell (PowerEdge 840 w/ dual-core Xeon 3040 1.86 GHz, 1066 MHz FSB), which proved to be more than adequate. We used a tower for the test box to cut costs, but you would probably want something rack-mountable. With our setup we were able to sustain about 970 Mbps.

Ultimately, we stopped because Quagga lacked any multicast support (we need PIM-SM). We recently looked at XORP as a possibility, and it works, but it lacks the level of logging and control you would expect for a production environment. Vyatta recently announced a shift from XORP to Quagga, so Quagga may see some new functionality. We also found IP Infusion, which is being advertised as a complete solution, but when we tried to talk to them about getting a demo they seemed hesitant to work with us on anything beyond what Quagga already does (I'm guessing that they don't really have anything and it's all advertising).

If all you're looking for is basic routing though, it might be worthwhile just getting a Vyatta appliance.

Ray
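Since packet rate rather than raw throughput tends to be the limiting factor on a software router, it can help to translate a throughput figure such as the roughly 970 Mbps above into packets per second. The Python sketch below does that arithmetic for a few Ethernet frame sizes. It assumes the figure is an on-the-wire rate and counts the standard 20 bytes of per-frame overhead (preamble, start-of-frame delimiter, inter-frame gap); the function name is hypothetical, and the real traffic mix in that test is not stated in the thread.

```python
# Bytes each Ethernet frame costs on the wire beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(bits_per_second, frame_bytes):
    """Frames per second needed to fill a link with frames of one size.

    frame_bytes is the full Ethernet frame including the 4-byte FCS
    (64 is the minimum, 1518 the untagged maximum).
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    return bits_per_second / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    rate = 970e6  # the sustained figure quoted above, assumed to be wire rate
    for size in (64, 512, 1518):
        pps = packets_per_second(rate, size)
        print(f"{size:5d}-byte frames: {pps:12,.0f} pps")
```

The same bit rate costs well over an order of magnitude more packets per second at minimum frame size than at full-size frames, which is why a box that sustains near line rate on large packets can still fall over on small-packet traffic.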
--On December 18, 2008 4:02:14 PM -0800 Bruce Robertson <bruce@greatbasin.net> wrote:
Imagestream does nice work as well.
I'll second the plug for imagestream as well.
Soucy, Ray wrote:
If all you're looking for is basic routing though, it might be worthwhile just getting a Vyatta appliance.
-- "Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds." -- Samuel Butler
On Fri, Dec 19, 2008 at 18:32:40PM -0700, Michael Loftis wrote:
--On December 18, 2008 4:02:14 PM -0800 Bruce Robertson <bruce@greatbasin.net> wrote:
Imagestream does nice work as well.
I'll second the plug for imagestream as well.
Soucy, Ray wrote:
If all you're looking for is basic routing though, it might be worthwhile just getting a Vyatta appliance.
Aren't both Imagestream and Vyatta routers built atop a Linux platform? -- Henry Yen Aegis Information Systems, Inc. Senior Systems Programmer Hicksville, New York
Henry Yen wrote:
On Fri, Dec 19, 2008 at 18:32:40PM -0700, Michael Loftis wrote:
--On December 18, 2008 4:02:14 PM -0800 Bruce Robertson <bruce@greatbasin.net> wrote:
Imagestream does nice work as well.
I'll second the plug for imagestream as well.
Soucy, Ray wrote:
If all you're looking for is basic routing though, it might be worthwhile just getting a Vyatta appliance.
Aren't both Imagestream and Vyatta routers built atop a Linux platform?
Juniper is likewise built on a BSD base (if I recall correctly). The difference is the selection of hardware and added routing hardware. The issue is that the additions Juniper, Imagestream and Vyatta make are not available on the standard platform, so they can't quite be compared. Kind regards, Martin List-Petersen -- Airwire - Ag Nascadh Pobal an Iarthar http://www.airwire.ie Phone: 091-865 968
I wasn't aware of imagestream using any custom (asic) hardware, except the T1/3 cards in the concentrator we bought from them (worked like a champ, btw). -brandon
-- Sent from my mobile device Brandon Galbraith Voice: 630.400.6992 Email: brandon.galbraith@gmail.com
Brandon Galbraith wrote:
I wasn't aware of imagestream using any custom (asic) hardware, except the T1/3 cards in the concentrator we bought from them (worked like a champ, btw).
It doesn't have to be hardware. Even their custom-developed drivers and software aren't available on anything but their platform. But true, their products show what can be done even without custom hardware. It's a matter of optimizing the drivers and a careful selection of hardware. All I was referring to is that if you take a Linux box, Quagga, and stock hardware, you might not get quite the same results. And don't get me wrong, we use Quagga and are quite happy with it. Kind regards, Martin List-Petersen
-- Airwire - Ag Nascadh Pobal an Iarthar http://www.airwire.ie Phone: 091-865 968
It doesn't - It's just an x86 PC. I have Vyatta running inside VMware ESX, not well, but it works ;-)

Comparing Imagestream and Vyatta to Juniper is crazy. The first two are software-based platforms (with perhaps some hardware off-load for checksums and whatnot), whereas the Juniper pretty much just uses BSD for control-plane features (BGP, for example, and controlling the hardware that actually does packet switching/routing).

Brandon Galbraith wrote:
I wasn't aware of imagestream using any custom (asic) hardware, except the T1/3 cards in the concentrator we bought from them (worked like a champ, btw).
David Coulson wrote:
It doesn't - It's just an x86 PC. I have Vyatta running inside VMware ESX, not well, but it works ;-)
Comparing Imagestream and Vyatta to Juniper is crazy. The first two are software-based platforms (with perhaps some hardware off-load for checksums and whatnot), whereas the Juniper pretty much just uses BSD for control-plane features (BGP, for example, and controlling the hardware that actually does packet switching/routing).
Sorry, but it's not crazy at all. They are all routers. The question is what your requirements are and what you are looking for. All I was pointing out was that it matters less what platform the router is based on and more what hardware it's running on and how much work has been spent on optimizing it. You can use an Olive with JunOS on stock hardware, too, and you'll probably be fine for gbit speeds, but you can't expect support for it. It's nice to see some development on the FOSS side of things to get past gbit speeds, but I don't think Imagestream or Vyatta could be mixed in there either, because that's the other end of the scale again (within the software routers, that is). Kind regards, Martin List-Petersen
-- Airwire - Ag Nascadh Pobal an Iarthar http://www.airwire.ie Phone: 091-865 968
* David Coulson:
Comparing Imagestream and Vyatta to Juniper is crazy. The first two are software-based platforms (with perhaps some hardware off-load for checksums and whatnot), whereas the Juniper pretty much just uses BSD for control-plane features (BGP, for example, and controlling the hardware that actually does packet switching/routing).
Juniper has software-based offerings, too. Not everybody pushes multi-gigabit traffic volumes at which (software on) special processors seem to be required.
Once upon a time, David Coulson <david@davidcoulson.net> said:
Comparing Imagestream and Vyatta to Juniper is crazy. The first two are software-based platforms (with perhaps some hardware off-load for checksums and whatnot), whereas the Juniper pretty much just uses BSD for control-plane features (BGP, for example, and controlling the hardware that actually does packet switching/routing).
Well, the J-series are fully software-based routers. Still, they have their own routing daemons and such. -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble.
Well, the J-series are fully software-based routers. Still, they have their own routing daemons and such.
The difference is that Juniper, even on the J-series box, completely separates the control plane and forwarding plane. The forwarding plane on an M or T series is going to be a PPC-based chip along with an ASIC to do the real lifting. In the J-series it is an emulated FPC running on real-time BSD. The control plane on both platforms is FreeBSD-based. This is also what an Olive is: the control plane with no FPC. You'll also note horrible forwarding performance on an Olive. So a J-series cannot be compared to a Vyatta or Imagestream even though it's software-based. I think IOS can be compared more so (combined forwarding and control planes on software routers). -Brandon
http://www.juniper.net/company/presscenter/pr/2005/pr-050131.html
http://www.linuxdevices.com/news/NS4066781213.html
participants (36)
- Adam Crosby
- Adrian Chadd
- Alex Thurlow
- Beat Vontobel
- Brandon Bennett
- Brandon Galbraith
- Bruce Robertson
- Chris
- Chris Adams
- Colin Alston
- Darden, Patrick S.
- David Coulson
- David Gilbert
- Etaoin Shrdlu
- Eugeniu Patrascu
- Florian Weimer
- Florian Weimer
- Henry Yen
- Ingo Flaschberger
- Jens Link
- Jeroen Wunnink
- Jim Shankland
- Joe Greco
- Kevin Oberman
- Marc Runkel
- Marshall Eubanks
- Martin List-Petersen
- Michael 'Moose' Dinn
- Michael Loftis
- Nathan Ward
- Naveen Nathan
- Randy Bush
- Sargun Dhillon
- Scott Francis
- Soucy, Ray
- Tim Durack