On 1/26/15 11:33 PM, Pavel Odintsov wrote:
Hello!
Looks like somebody wants to build a Linux soft router!) Nice idea for routing 10-30 Gbps. I route about 5+ Gbps on a Xeon E5-2620v2 with four Intel 82599 10GE cards and Debian Wheezy with kernel 3.2 (but it's a really terrible kernel; everyone should use a modern kernel, 3.16 or later, because of the buggy Linux route cache). My current processor load on that server is about 15%, so I could route about 15 GE on this Linux server.
I looked into the promise and limits of this approach pretty intensively a few years back, before abandoning the effort abruptly due to other constraints.

Underscoring what others have said: it's all about pps, not aggregate throughput. Modern NICs can inject packets at line rate into the kernel, distribute them across per-processor queues, and so on. Payloads end up getting DMA-ed from NIC to RAM to NIC. There's really no reason you shouldn't be able to push 80 Gb/s of traffic, or more, through these boxes. As for routing protocol performance (BGP convergence time, ability to handle multiple full tables, etc.): that's just CPU and RAM.

The part that's hard (as in "can't be fixed without rethinking this approach") is the per-packet routing overhead: the cost of reading the packet header, looking up the destination in the routing table, decrementing the TTL, and enqueueing the packet on the correct outbound interface. At the time, I was able to convince myself that doing this in 4 us, average, in the Linux kernel, was within reach. That's not really very much time: you start asking things like "will the entire routing table fit into the L2 cache?"

4 us to "think about" each packet comes out to 250 Kpps per processor; with 24 processors, that's 6 Mpps (assuming zero concurrency/locking overhead, which might be a little bit of an ... assumption). With 1500-byte packets, 6 Mpps is 72 Gb/s of throughput -- not too shabby. But with 40-byte packets, it's less than 2 Gb/s. Which means that your Xeon E5-2620v2 will not cope well with a DDoS of 40-byte packets. That's not necessarily a reason not to use this approach, depending on your situation; but it's something to be aware of.

I ended up convincing myself that OpenFlow was the right general idea: marry fast, dumb, and cheap switching hardware with fast, smart, and cheap generic CPU for the complicated stuff.

My expertise, such as it ever was, is a bit stale at this point, and my figures might be a little off. But I think the general principle applies: think about the minimum number of x86 instructions, and the minimum number of main memory accesses, needed to inspect a packet header, do a routing table lookup, and enqueue the packet on an outbound interface. I can't see that ever getting reduced to the point where a generic server can handle 40-byte packets at line rate (for that matter, "line rate" is increasing a lot faster than "speed of generic server" these days).

Jim
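
P.S. For anyone who wants to play with the numbers, here's a rough back-of-the-envelope sketch in Python of the arithmetic above. The 4 us per-packet cost and the 24 cores are just the assumptions from this message, not measurements from any particular box.

    # Forwarding throughput under the assumptions above:
    # 4 microseconds of CPU time per packet, 24 cores, zero locking overhead.
    PER_PACKET_US = 4                              # assumed per-packet routing cost
    CORES = 24                                     # assumed number of processors

    pps_per_core = 1_000_000 / PER_PACKET_US       # 250,000 packets/sec per core
    total_pps = pps_per_core * CORES               # 6,000,000 packets/sec

    def throughput_gbps(packet_bytes, pps=total_pps):
        """Aggregate throughput in Gb/s for a given packet size."""
        return pps * packet_bytes * 8 / 1e9

    print(throughput_gbps(1500))   # ~72 Gb/s with 1500-byte packets
    print(throughput_gbps(40))     # ~1.9 Gb/s with 40-byte (minimum-size) packets

The point of the exercise is that the same 6 Mpps budget looks great or terrible depending entirely on packet size, which is why a flood of minimum-size packets is the worst case for this design.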