On Mon, Jan 26, 2015 at 8:53 PM, micah anderson <micah@riseup.net> wrote:
Hi,
I know that specially programmed ASICs on dedicated hardware like Cisco, Juniper, etc. are always going to outperform a general-purpose server running GNU/Linux, *BSD... but I find the idea of trying to use proprietary, NSA-backdoored devices difficult to accept, especially when I don't have the budget for them.
I've noticed that even with a relatively modern system (a Supermicro with a 4-core 1265LV2 CPU with 9MB cache, Intel E1G44HTBLK server adapters, and 16GB of RAM), you still tend to see a high percentage of CPU time spent on softirqs on all cores once you reach somewhere around 60-70kpps and traffic approaching 600-900Mbit/sec (during a DDoS, such hardware typically cannot cope).
It seems like finding hardware better optimized for very high packet-per-second rates would be a good thing to do; I just have no idea what is out there that could meet these goals. I'm unsure whether the real problem is faster CPUs, more CPUs, the network cards, or just plain old-fashioned tuning.
Any ideas or suggestions would be welcome!

micah
Hello! This is a very interesting yet obscure and rarely discussed subject, and the industry generally does not like it coming up on public lists like this one: if you can reach line-rate PPS throughput on x86, for filtering or forwarding, how will they keep the high profit margins on their products and keep investors happy?

With that said, I am a very happy user of two hardware vendors that are not widely known, and of a technology that is well known but still barely discussed. I run FreeBSD, the so-called "silent workhorse", as a BGP router, and FreeBSD (or pfSense) as a border firewall. For hardware, I am a very happy customer of:

- iXsystems (www.ixsystems.com)
- ServerU Inc. (www.serveru.us)

Both are BSD/Linux-focused hardware specialists, and both are very good consultants and technology engineers. I run a number of BGP and firewall boxes in GA, NY, FL and other East Coast locations, as well as in Belize, the BVI, the Bahamas and LATAM. pfSense is my number-one system of choice, but sometimes I run vanilla FreeBSD, especially at my core locations. In one central location I have the following setup:

- 1x ServerU Netmap L-800 box in bridge mode for core firewall protection
- 2x ServerU Netmap L-800 boxes as redundant BGP routers
- several ServerU Netmap L-800 and L-100 boxes plus iXsystems servers (iXsystems for everything else, since ServerU gear is networking-centric rather than high-storage, high-processing Xeon servers)

In this setup I am also running a not yet well known but very promising technology called netmap. The firewall (netmap-ipfw) was supplied by ServerU; it is a slightly modified version of what you can download from netmap author Luigi Rizzo's public repository, with multithreading based on the number of queues available on the ServerU igb(4) network card. What it does is, IMHO, amazing for x86 hardware: line-rate firewalling on a 1GbE port (1.3-1.4Mpps) and line-rate firewalling on a 10GbE port (12-14Mpps) on a system with 8 Intel Rangeley cores at 2.4GHz.

It's not Linux DNA. It's not PF_RING. It's not Intel DPDK. It's netmap, and it's there, available in the FreeBSD base system, with a number of utilities and reference code in Rizzo's repositories. This firewall has saved my sleep several times since November, dropping up to 9Mpps of amplified UDP/NTP traffic at peak DDoS attack rates.

For the BGP box I needed trunking, Q-in-Q and VLANs, and sadly none of that is available in a netmap implementation right now, so I had to keep my BGP router on the kernel path. It's funny to say this, because netmap usually skips the kernel path completely and works directly on the NIC, running straight into backplane and bus limits. The ServerU people recommended Chelsio Terminator 5 40G ports; I only needed 10G, but they convinced me to look not at the bits-per-second numbers but at the packets-per-second numbers. Honestly, I don't know how the Chelsio T5 does it. The ServerU 1GbE ports already behave very well on interrupt CPU usage (probably to the credit of Intel's igb(4)/ix(4) drivers), but when I route from one 40GbE port to the other on the same L-800 expansion card I see very, very low interrupt rates, sometimes no interrupts at all! I have peaked at routing 6Mpps on a ServerU L-800 and still had CPU to spare. I am not sure where the credit is properly due: the ServerU hardware, FreeBSD, netmap or Chelsio.
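To make the netmap model concrete, below is a minimal sketch of a two-port filter-and-forward loop, assuming the classic nm_open() wrapper API from <net/netmap_user.h>. The interface names (igb0/igb1) and the drop-NTP-sources rule are my own illustration, not netmap-ipfw's actual logic. As a sanity check on the line-rate figures above: a minimum-size Ethernet frame occupies 64 bytes plus 20 bytes of preamble and inter-frame gap, i.e. 672 bits on the wire, so line rate is about 1.488Mpps on 1GbE and 14.88Mpps on 10GbE, exactly the ballpark netmap-ipfw reaches.

    /*
     * Minimal netmap filter/forward sketch (illustrative only).
     * Build on FreeBSD with: cc -O2 fwd.c -o fwd
     */
    #include <sys/types.h>
    #include <poll.h>
    #include <arpa/inet.h>
    #include <net/ethernet.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>
    #include <netinet/udp.h>

    #define NETMAP_WITH_LIBS
    #include <net/netmap_user.h>

    /* Example drop rule: UDP packets sourced from port 123 (NTP). */
    static int
    drop(const u_char *buf, unsigned int len)
    {
        const struct ether_header *eh = (const struct ether_header *)buf;
        const struct ip *ip;
        const struct udphdr *uh;

        if (len < sizeof(*eh) + sizeof(*ip) + sizeof(*uh))
            return (0);
        if (eh->ether_type != htons(ETHERTYPE_IP))
            return (0);
        ip = (const struct ip *)(eh + 1);
        if (ip->ip_p != IPPROTO_UDP)
            return (0);
        uh = (const struct udphdr *)((const u_char *)ip + (ip->ip_hl << 2));
        return (uh->uh_sport == htons(123));
    }

    int
    main(void)
    {
        /* "netmap:igb0" detaches the NIC from the host stack and maps
         * its rings straight into this process's address space. */
        struct nm_desc *in = nm_open("netmap:igb0", NULL, 0, NULL);
        struct nm_desc *out = nm_open("netmap:igb1", NULL, 0, NULL);
        struct nm_pkthdr h;
        u_char *buf;

        if (in == NULL || out == NULL)
            return (1);

        struct pollfd pfd = { .fd = NETMAP_FD(in), .events = POLLIN };
        for (;;) {
            poll(&pfd, 1, 1000);    /* one syscall wakes us for a whole batch */
            while ((buf = nm_nextpkt(in, &h)) != NULL) {
                if (drop(buf, h.len))
                    continue;       /* filtered without ever entering the kernel */
                nm_inject(out, buf, h.len); /* copy to egress; ignores TX-full for brevity */
            }
        }
    }

The ServerU version of netmap-ipfw, as I understand it, scales by running one loop like this per NIC queue, which is where the multithread capability mentioned above comes from.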
But I am sure about what matters to my VP and my CFO: $$$. A T5 card costs around USD 1,000 and a ServerU L-800 router another USD 1,200, so I have roughly USD 2,200 total cost of ownership for a box that gives me PPS rates that would otherwise cost USD 9,000 to 12,000 in an industry product.

I have followed a good discussion in a LinkedIn group (anyone googling for it will find it) comparing netmap to DPDK from the developer's perspective. The netmap developer made some good points, and an Intel engineer offered other perspectives. Overall, from my end-user, non-developer, non-software-engineer point of view, DPDK and netmap sound very similar in terms of results, while differing in the gory inner details: netmap has some flexibility and generality advantages, while DPDK has hardware-specific advantages when running on Intel hardware (of course), since it is vendor-specific, much as CUDA is for Nvidia.

I honestly hope a fraction of the million dollars donated to the FreeBSD Foundation by a WhatsApp founder goes into research and enhancements for netmap. It is the most promising networking technology I have seen in recent years, and it goes straight to what FreeBSD does best: networking performance. It is no coincidence that since the beginning of the Internet, top traffic servers, from Yahoo! to WhatsApp and Netflix, have run FreeBSD. I don't know how decisions get made about adding to netmap a superset of full forwarding capability along with lagg(4), vlan(4), Q-in-Q, maybe carp(4) and other lightweight but still kernel-path-bound features, but I hope FreeBSD engineers make good decisions when assigning those issues, and direct time, funds and goals to netmap.

For now, if you want a relatively new and innovative technology with actual code to run, ready and available, this is my suggestion: FreeBSD + netmap, and for hardware vendors, iXsystems + ServerU. It takes the question out of the speculation field, since netmap reference code for serious work, including a whole firewall, is available and ready to test, compare, enhance and use. Suricata has netmap support, so yes, you can inspect packets at close to line rate in IDS (not IPS) mode. For everything else (DPDK, DNA, PF_RING) you have a framework in place, some experimental, some more mature, but you will have to code and prove it yourself, while FreeBSD/netmap is a flavor ready to be tasted. That is my five cents on a great topic!

Concerning BGP convergence time: come on, are you serious? You deal with platforms that take one minute, up to three minutes, for full convergence of a couple of full BGP sessions? What hardware is that, an 8-bit Nintendo? LOL! ;-) Seriously and literally, a Sega Dreamcast running NetBSD + BIRD would have better convergence times!

Now, serious again, no further ironic statements. While Cisco and Juniper have great ASICs, it is amazing to see that the Juniper MX series still runs weak Cavium Octeon CPUs for whatever their Trio 3D chip won't handle, and the same goes for Cisco: amazing ASICs, but weak CPUs that do indeed need to be protected from DDoS attacks for anything that won't run on the ASICs. Convergence times above 30 seconds should not be accepted in any new BGP deployment nowadays, IMHO; only legacy hardware should take that long. With OpenBGPD I see under 30 seconds of convergence for several full sessions on x86 hardware like the boxes mentioned above.
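On why processing is rarely the bottleneck: every BGP message begins with the same fixed 19-byte header (RFC 4271), and an UPDATE is just that header followed by withdrawn routes and path attributes. A minimal sketch of the header check, with names of my own invention rather than anything from OpenBGPD or BIRD:

    /*
     * Fixed BGP message header per RFC 4271 (illustrative only):
     * bytes 0-15 marker, bytes 16-17 length (network order), byte 18 type.
     */
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    #define BGP_HDR_LEN 19
    #define BGP_UPDATE  2

    /* Return the total length of the UPDATE at buf, or 0 if the
     * buffer does not yet hold a complete UPDATE message. */
    static uint16_t
    bgp_update_len(const uint8_t *buf, size_t avail)
    {
        uint16_t len;

        if (avail < BGP_HDR_LEN)
            return (0);
        if (buf[18] != BGP_UPDATE)
            return (0);
        memcpy(&len, buf + 16, sizeof(len));    /* length includes the header */
        len = ntohs(len);
        return (avail >= len ? len : 0);
    }

Framing and parsing at this level is cheap; the expensive part of convergence is moving a full table's worth of UPDATEs across the session and into the RIB, which brings me to my last point.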
With BIRD, convergence time frames are even lower. And when convergence does take longer with OpenBGPD or BIRD, it is mostly a matter of how long the UPDATE messages take to arrive, not how long they take to be processed.

-- Eddie