On 07/15/2018 09:03 AM, Denys Fedoryshchenko wrote:
On 2018-07-14 22:05, Baldur Norddahl wrote:
I have considered OpenFlow and might do that. We have OpenFlow-capable switches and I may be able to offload the work to the switch hardware. But I also consider this solution harder to get right than the idea of using Linux with tap devices. Also, it appears that Open vSwitch implements a different flavour of OpenFlow than the hardware switch (the hardware is limited to some fixed tables that Broadcom made up), so I might not be able to start with the software and then move on to the hardware.
AFAIK OpenFlow is suitable for datacenters, but doesn't scale well for user termination purposes. You will run out of TCAM much sooner than you expect.
Denys, could you expand on this? In a Linux-based solution (say with OVS), the TCAM is memory/software based, and in following their dev threads, they have been optimizing the flow caches continuously for various types of flows: megaflows, tiny flows, flow quantity and variety, caching, ... When you mate OVS with something like a Mellanox Spectrum switch (via SwitchDev) for hardware-based forwarding, I could see certain hardware limitations applying, but I don't have first-hand experience with that. I suppose you would see these TCAM issues on hardware-only, specialized OpenFlow switches. On edge-based translations, is hardware-based forwarding actually necessary, since there are so many software functions being performed anyway?

But I think a clarification of Baldur's speed requirements is needed. He indicates that there are a bunch of locations: does each of the locations require 10G throughput, or was the throughput defined for all sites in aggregate? If the sites individually have smaller throughput, software-based boxes might do, but if that throughput is needed at each site, then software-only boxes may not handle it. Then again, it may be conceivable that buying a number of servers and load-spreading across them will provide some resiliency and come in at a lower cost than putting in 'big iron' anyway. There are some additional benefits: you can run Network Function Virtualization at the edge and provide additional services to customers.

I forgot to mention this in the earlier thread, but there are some companies out there which provide devices with many ports on them and provide compute at the same time. So software-based Linux switches are possible, without resorting to a combination of a physical switch and a separate compute box. In a Linux-based switch, by using IRQ affinity, traffic from ports can be balanced across CPUs (a rough sketch follows below). So by collapsing switch and compute, additional savings might be realized.

As a couple of side notes: 1) the DPDK people support a user-space dataplane version of OVS/OpenFlow, and 2) an eBPF version of the OVS dataplane is being worked on. In summary, OVS supports three current dataplanes with a fourth on the way: 1) native kernel, 2) hardware offload via TC (SwitchDev), 3) DPDK, 4) eBPF.
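To sketch the IRQ affinity point: assuming a hypothetical NIC whose RX-queue IRQs are 64-67 (the real numbers come from /proc/interrupts), you just write a hex CPU mask to /proc/irq/<N>/smp_affinity so each queue's interrupts land on a different core. Something along these lines:

/* Minimal sketch: pin NIC queue IRQs to specific CPUs by writing a
 * hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity.
 * The IRQ numbers (64-67) are assumptions for illustration only;
 * look them up in /proc/interrupts for the real interface. */
#include <stdio.h>
#include <stdlib.h>

static int pin_irq_to_cpu(unsigned irq, unsigned cpu)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);

    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    /* smp_affinity takes a hex CPU bitmask; bit N selects CPU N. */
    fprintf(f, "%x\n", 1u << cpu);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Hypothetical layout: spread four RX-queue IRQs across CPUs 0-3
     * so no single core handles all of one port's traffic. */
    for (unsigned i = 0; i < 4; i++)
        if (pin_irq_to_cpu(64 + i, i) != 0)
            return EXIT_FAILURE;
    return EXIT_SUCCESS;
}

irqbalance does roughly the same thing automatically; pinning by hand just keeps the queue-to-core mapping stable, which helps when the same cores are also running a software dataplane.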
A Linux tap device has very high overhead; it is suitable for no more than something like a hotspot gateway for 100s of users.
As does the 'veth' construct.
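To make the tap overhead concrete: every frame that transits a tap device is copied across the user/kernel boundary with a read() and pushed back with a write(), so at least two syscalls and two copies per packet. A minimal sketch, assuming the interface name "tap0" (illustrative) and CAP_NET_ADMIN:

/* Sketch of a userspace dataplane on a tap device: the process must
 * read() every frame out of the kernel and write() every frame back,
 * one syscall plus one copy each way -- that is the per-packet cost
 * being discussed. The name "tap0" is only an example. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(void)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0) { perror("/dev/net/tun"); return 1; }

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;   /* raw ethernet frames */
    strncpy(ifr.ifr_name, "tap0", IFNAMSIZ);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) { perror("TUNSETIFF"); return 1; }

    char frame[2048];
    for (;;) {
        /* one user/kernel round trip per frame */
        ssize_t n = read(fd, frame, sizeof(frame));
        if (n <= 0) break;
        /* ... inspect/forward the frame in userspace ... */
        if (write(fd, frame, n) != n) break;
    }
    close(fd);
    return 0;
}

That per-packet syscall and copy cost is why a tap-based path tends to top out far below kernel-native forwarding, let alone DPDK or hardware offload.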
-- Raymond Burkholder ray@oneunified.net https://blog.raymond.burkholder.net