Well said, Eddie. It would be worth pointing out that on CCRs each port also has a core dedicated to it. A benefit of such a design is that each port can handle a much higher PPS rate, and a DDoS attack on one port will not bring down the rest of the ports or the router (disclaimer: this assumes the router is set up properly, without all traffic going through the CPU, etc.).

Faisal Imtiaz
Snappy Internet & Telecom

----- Original Message -----
From: "Eddie Tardist" <edtardist@gmail.com>
To: "North American Network Operators Group" <nanog@nanog.org>
Sent: Wednesday, May 20, 2015 6:34:11 PM
Subject: Re: Low Cost 10G Router
On Wed, May 20, 2015 at 2:07 PM, Mike Hammett <nanog@ics-il.net> wrote:
Well, the cores on a many-core CPU aren't going to have the "torque" that a Xeon would. They're also still working on the software. It has gotten a ton better over the life of the CCRs thus far. BGP is still atrocious on the CCRs, but that's because the route update process isn't multithreaded. It won't be multithreaded in the next major version either, but they will have done some programming voodoo (all programming is voodoo to me) to rein in the poor performance issues with full tables.
I honestly don't know why most people are impressed by the number of Tilera cores on the CCR and think it's a good thing. Your "torque" point makes a lot of sense to me. A few cores with a decent clock and Xeon or Rangeley "torque" are just better. Adding that many weak, low-clocked Tilera cores only results in much more context switching and a much greater need for CPU affinity tuning.
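To put a rough number on the "torque" point, here is a back-of-the-envelope sketch of the cycle budget a single core has per packet when one flow is pinned to it. The 1.2 GHz figure is the CCR core clock quoted elsewhere in this thread; the 3.0 GHz clock and the 1 Mpps load are assumptions picked only for illustration, and the sketch ignores differences in work done per cycle, which would likely widen the gap further.

```python
def cycles_per_packet(clock_hz, packets_per_second):
    """Upper bound on the CPU cycles one core can spend on each packet."""
    return clock_hz / packets_per_second


# A single flow handled by one core cannot borrow cycles from idle cores,
# so the per-core clock (not the core count) sets this budget.
SINGLE_FLOW_PPS = 1_000_000  # hypothetical load, not a measured figure

CORES = [
    ("1.2 GHz core (CCR clock quoted in this thread)", 1.2e9),
    ("3.0 GHz core (assumed x86 clock)", 3.0e9),
]

for name, clock_hz in CORES:
    budget = cycles_per_packet(clock_hz, SINGLE_FLOW_PPS)
    print(f"{name}: about {budget:.0f} cycles per packet")
```

Everything the forwarding path does for a packet (lookup, firewall rules, queueing) has to fit inside that budget, which is one way to read why more slow cores do not help a load that does not spread evenly.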
Multithreading the relevant fine-grained bits of code will also lead to more context switching, just between threads instead of processes.
As I understand the architecture of those solutions, I don't see why a single-threaded BGP daemon is a problem. OK, a multithreaded one would converge faster on a full routing table. But once the routing table is loaded, it does not matter how many threads the BGP process uses. The dirty work on Linux (the RouterOS kernel, for that matter) is done in the forwarding table, in the packet forwarding code, and especially in softirq (deferred interrupt) processing. This is where the bottleneck seems to be, IMHO. Linux is not good at multithreaded packet forwarding, and especially not good at handling interrupt requests on multi-queue NICs. So RouterOS is not good at it either.
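For anyone who wants to check the softirq claim on their own generic Linux box (RouterOS does not give you a shell, so this does not apply to a CCR directly), here is a minimal sketch that computes how receive softirqs are spread across CPUs. It parses text in the layout of /proc/softirqs; the sample counters below are made up for illustration and are not measurements from any real router.

```python
def net_rx_share(softirqs_text):
    """Return each CPU's fraction of NET_RX softirqs from /proc/softirqs-style text."""
    for line in softirqs_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NET_RX:":
            counts = [int(value) for value in fields[1:]]
            total = sum(counts)
            return [count / total if total else 0.0 for count in counts]
    raise ValueError("no NET_RX row found")


# Made-up sample in the /proc/softirqs layout: one column per CPU.
SAMPLE = """\
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:     100000     120000     110000     115000
      NET_TX:        500         10         12          9
      NET_RX:     900000      50000      30000      20000
"""

for cpu, share in enumerate(net_rx_share(SAMPLE)):
    print(f"CPU{cpu}: {share:.1%} of NET_RX softirqs")
```

On a live system you would pass in the contents of /proc/softirqs instead of SAMPLE. A heavily lopsided result like the sample's would suggest that receive processing is landing on one core regardless of how many cores the box has.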
Therefore, those "several dozen" cheap and weak Tilera cores powering CCR boxes are absolutely not friendly to the Linux kernel or to RouterOS itself.
I'm better served by a smaller number of cores with a better clock and better "torque," as Mr. Hammett put it (I liked the expression, yes), and that's why a Linux or BSD box with a couple of Xeon CPUs will perform better than a CCR. Sometimes, as someone mentioned, a couple of i7 cores will outperform a CCR box as well. More torque, yeah. Less time wasted on context switching and time sharing.
However, this horizontally scaled number of Tilera cores on the CCR is good for marketing. After all, "you are buying a 36 CPU box" while paying "a couple hundred bucks." Impressive, huh? Well, not for me.
-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com
Midwest Internet Exchange http://www.midwest-ix.com
----- Original Message -----
From: "Colton Conor" <colton.conor@gmail.com>
To: "Faisal Imtiaz" <faisal@snappytelecom.net>
Cc: "North American Network Operators Group" <nanog@nanog.org>
Sent: Tuesday, May 19, 2015 9:06:26 PM
Subject: Re: Low Cost 10G Router
So this new $1295 Mikrotik CCR1036-8G-2S+EM has a 36-core Tilera CPU with 16 GB of RAM. Each core runs at 1.2 GHz? I assume that Mikrotik's software is multicore-aware, so why does this box not outperform the Intel boxes that everyone is recommending? Is it just a limitation of ports?
On Tue, May 19, 2015 at 6:03 PM, Faisal Imtiaz <faisal@snappytelecom.net> wrote:
I've seen serious, unusual performance bottlenecks in the Mikrotik CCR, in some cases not even achieving gigabit speeds on 10G interfaces. Performance drops more rapidly than on Cisco with smaller packet sizes.
-mel beckman
Folks often forget that Mikrotik ROS can also run on x86 machines.
Size your favorite hardware (server) or network appliance with appropriate ports, add MT ROS on a CF card, and you are good to go.
We use an i7-based network appliance with dual 10G cards (you can use a quad 10G card, such as those made by HotLava).
With 2 GB of RAM, you can easily handle multiple full BGP peers (4-5 or more), and i7s are good for approximately 1.2 Mpps.
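To see what a figure like 1.2 Mpps means on a 10G port, here is a rough sketch of the arithmetic. It assumes standard Ethernet framing, where every frame costs an extra 20 bytes on the wire (preamble, start-of-frame delimiter, and inter-frame gap); the frame sizes are illustrative, and the function names are made up for this example.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)

LINK_BPS = 10e9   # one 10G port
BOX_PPS = 1.2e6   # the approximate i7 figure quoted above


def line_rate_pps(link_bps, frame_bytes):
    """Packets per second needed to fill the link with frames of this size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def wire_rate_bps(pps, frame_bytes):
    """Bits per second occupied on the wire at a given packet rate."""
    return pps * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8


for frame_bytes in (64, 512, 1518):
    needed_mpps = line_rate_pps(LINK_BPS, frame_bytes) / 1e6
    achieved_gbps = min(wire_rate_bps(BOX_PPS, frame_bytes), LINK_BPS) / 1e9
    print(f"{frame_bytes:>4}-byte frames: line rate needs {needed_mpps:.2f} Mpps; "
          f"1.2 Mpps carries about {achieved_gbps:.2f} Gb/s")
```

Under these assumptions, a box limited to a fixed packet rate fills a 10G link only when the average frame is large; with minimum-size frames the same packet rate stays below 1 Gb/s.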
Best of luck.
Faisal Imtiaz Snappy Internet & Telecom