On Sat, May 13, 2023 at 12:28 AM Mark Tinka <mark@tinka.africa> wrote:
On 5/12/23 17:59, Dave Taht wrote:
:blush:
We have done a couple of podcasts about it, like this one:
https://packetpushers.net/podcast/heavy-networking-666-improving-quality-of-...
and have perhaps made a mistake by doing development and support, somewhat too invisibly, in Matrix chat rather than a web forum, but it has been a highly entertaining way to get a better picture of the real problems caring ISPs have.
I see you are in Africa? We have a few ISPs playing with this in Kenya...
Please DM me the ones you are aware of that would be willing to share their experiences. I'd like to get them to talk about what they've gathered at the upcoming SAFNOG meeting in Lusaka.
We have a fairly large network in Kenya, so we would be happy to engage with the operators running LibreQoS there.
I forwarded your info.

Slide 40 here has an anonymized LibreQoS report of observed latencies in Africa: http://www.taht.net/~d/Misunderstanding_Residential_Bandwidth_Latency.pdf

The RTTs there are severely bimodal (30ms vs 300ms), which mucks with sch_cake's default assumption of a 100ms RTT. There are two ways to deal with this; right now we are recommending a cake "rtt 200ms" setting to keep throughput up. The FQ component dominates for most traffic anyway. With a bit more work we hope to come up with a way to get more consistent queuing delay. Or we could just wait for more CDNs and IXPs to deploy there. /me hides

A note about our public plots: we had a lot of people sharing screenshots, so we added a "klingon mode" that consistently transliterates the more private data into that language.

Another fun fact: by deploying this stuff, several folk found enough non-paying clients on their network to pay for the hardware inside of a month or two.
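In case anyone wants to try that rtt override by hand, on a plain standalone cake setup it is just a one-liner. The interface and bandwidth below are placeholders; in a LibreQoS deployment you would put the same keyword on the per-circuit cake instances instead:

   tc qdisc replace dev eth0 root cake bandwidth 100mbit rtt 200ms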
We do not know. Presently our work is supported by Equinix's open source program, with four servers in their Dallas DC, on 25Gbit ports. Putting together enough dough to get to 100Gbit, or finding someone willing to send traffic through more bare metal at that data center or elsewhere, is on my mind. In other words, we can easily spin up the ability to L2 route some traffic through a box in their DCs, if only we knew where to find it. :)
If you assume linearity to cores (which is a lousy assumption, ok?), 64 Xeon cores could do about 200Gbit, running flat out. I am certain it will not scale linearly, and that we will hit multiple bottlenecks on the way to that goal.
Limits we know about:
A) Trying to drive 10s of gbits of realistic traffic through this requires more test clients and servers than we have, or someone with daring and that kind of real traffic in the first place. For example, one of our most gung-ho clients has 100Gbit ports, but nowhere near that amount of inbound traffic. (They are crazy enough to pull git head, try it for a few minutes in production, and then roll back or leave it up.)
B) A brief test of a 64-core AMD + Nvidia ethernet card was severely outperformed by our current choice of a 20-core Xeon Gold + Intel 710 or 810 card. The ethernet card is far more the dominating factor. I would kill for one that did an LPM -> CPU mapping (i.e., instead of an LPM -> route mapping, LPM to which CPU to interrupt). We also tried an 80-core ARM early on, with inconclusive results.
Tests of the latest Ubuntu release are ongoing. I am not prepared to bless it or release any results yet.
C) A single cake instance on one of the more high end Xeons can *almost* push 10Gbit/sec while eating a core.
D) Our model is one cake instance per subscriber, plus the ability to establish trees emulating links further down the chain (a rough tc sketch of that shape follows this list). One ISP is modeling 10 mmwave hops. Another is just putting in multiple boxes closer to the towers.
So, in other words, hundreds of gbits are achievable today if you throw boxes at it, and it is more cost-effective to do it that way. We will, of course, keep striving to crack 100Gbit natively on a single box with multiple cards. It is a nice goal to have.
E) In our present target markets, 10k typical residential subscribers only eat 11Gbit/sec at peak. That covers a LOT of the smaller ISPs and networks, so of late we have been focusing more on analytics and polish than on pushing more traffic. Some of our new real-time analytics break down at 10k cake instances (that is 40 million fq_codel queues, ok?), and we cannot sample at 10ms rates, so for now we fall back, conservatively, to 1s.
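To make D) a bit more concrete, here is a hand-rolled sketch of the general shape. LibreQoS generates all of this programmatically; the interface name, handles, and rates below are made up, and the real thing also spreads circuits across hardware queues:

   # a site/tower aggregate, with one subscriber circuit hanging off it
   tc qdisc replace dev eth1 root handle 1: htb default 2
   tc class add dev eth1 parent 1: classid 1:1 htb rate 10gbit ceil 10gbit
   tc class add dev eth1 parent 1:1 classid 1:10 htb rate 25mbit ceil 100mbit
   tc qdisc add dev eth1 parent 1:10 cake rtt 200ms

HTB handles the per-site and per-circuit rate limits; cake on each leaf does the flow queuing and AQM.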
We are close to putting out a v1.4-rc7, which is just features and polish; you can get a .deb of v1.4-rc6 here:
https://github.com/LibreQoE/LibreQoS/releases/tag/v1.4-rc6
There is an optional, anonymized reporting facility built into that. In the last two months, 44,404 cake-shaped devices that we know of have come online, shaping 0.19Tbit/s. Aside from that, we have no idea how many ISPs have picked it up! A best guess would be well over 100k subs at this point.
Putting in LibreQoS is massively cheaper than upgrading all the CPE to good queue management (it takes about 8 minutes to get it going in monitor mode, though exporting shaping data into it requires glue, and time), but better CPE remains desirable - especially CPE whose uplink component also does sane shaping natively.
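The glue is typically just a script that pulls circuits and rate plans out of your CRM or billing system and rewrites the flat device file the shaper reads, on a timer. Something like the sketch below, where the exporter is yours to write and the path and filename are from memory and may differ in your release:

   # hypothetical nightly cron job; export_from_billing.py is your own script
   /opt/glue/export_from_billing.py > /opt/libreqos/src/ShapedDevices.csv
   # then let the shaper pick up the change however your version reloads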
"And dang, it, ISPs of the world, please ship decent wifi!?", because we can see the wifi going south in many cases from this vantage point now. In the past year mikrotik in particular has done a nice update to fq_codel and cake in RouterOS, eero 6s have got quite good, much of openwifi/openwrt, evenroute is good...
It feels good, after 14 years of trying to fix the internet, to be seeing such progress on fixing bufferbloat and on understanding and explaining the internet better. joooooiiiiiiiin us..
All sounds very exciting.
I am happier about this than I have been since free.fr deployed fq_codel (to ultimately 3m devices) in 2012.
I'll share this with some friends at Cisco who are actively looking at ways to incorporate such tech in their routers in response to QUIC. They might find it interesting.
I had hoped they were paying attention - in particular, that Cisco AFD would deploy more widely. Is anyone using that? It doesn't do ECN (I think), but it seemed promising at the time I encountered it. I kind of gave up on getting Juniper to revisit their RED implementation, and started slapping cake on a Mikrotik in front of it. Here's an example of why: https://blog.cerowrt.org/post/juniper/

Hardware-wise - that is, running natively rather than as a middlebox - right now it is just the low/middle end of the market that is fq_codeled: the Cambiums, the Ubnts (smart queues are pretty universal there), OpenWrt and derivatives like OpenWiFi (I am pleased to see TIP OpenWiFi gaining traction), OPNsense, FreeBSD, OpenBSD, Linux, Apple, the Mikrotiks, the eeros, Riverbeds, Peplinks, evenroute, Firewalla - a pretty long list now! Only a few call out the underlying algorithm specifically: Peplink has a "mitigate bufferbloat" checkbox, ubnt calls it "smart queues".

Despite my advocacy here of LibreQoS, cake is now also in Paraqum's product. They are claiming 100Gbit support, and I do not know how that works!! Also, fq_codel has long been an integral part of Preseem, which I think is the leading middlebox in the WISP world. As best as I can tell, only Bequant (licensed by Cambium) is still doing a more DPI-oriented middlebox approach. I do not know much about the other major DPI players nowadays.

These companies actually have business models, which is thus far sadly lacking for LibreQoS. Donations and feature bounties are not cutting it. Without Equinix's support and a few fervent believers we would not be where we are today, and I worry about exponential growth overwhelming our chat room's all-volunteer support department. Yesterday we had 44,044 shaped devices, this morning 46,864 shaped devices... I think the install process is now easy enough for most folk to get it going without any support, but...
What we do now is put it inline, with OSPF/OLSR/BGP giving the path through the box a low cost and a backup wire a higher cost, so traffic routes around it if it fails. Things have stabilized a lot in the last few months; the last crash I can remember was in January (in Rust we trust!). You do have to watch out for breaking spanning tree with that backup wire. The most common install bug is someone flipping the inbound and outbound interfaces in the setup.
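For the OSPF flavour of that, it is just a cost preference on the routers on either side; roughly, in FRR terms (the interface names and costs here are made up):

   # interface facing the path through the shaping box: preferred
   vtysh -c 'conf t' -c 'interface eth1' -c 'ip ospf cost 10'
   # interface facing the direct bypass wire: only wins if the box dies
   vtysh -c 'conf t' -c 'interface eth2' -c 'ip ospf cost 100'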
Among other things, we replaced the native Linux bridge code with about 600 lines of eBPF C. The enormous speedup from that is getting us closer to what DPDK could do, but DPDK cannot queue worth a darn, it just forwards willy-nilly.
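If you have not played with XDP: conceptually, that replacement "bridge" is just an eBPF program attached to each of the two shaping interfaces, redirecting frames to the other side. The object and section names below are placeholders, not our actual build artifacts; this is only the generic loading mechanism:

   ip link set dev eth1 xdp obj xdp_bridge.o sec xdp
   ip link set dev eth2 xdp obj xdp_bridge.o sec xdp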
I hope, in particular, that far, far more folk start leveraging variants of in-band measurement with pping. The standalone code for that is here: https://github.com/thebracket/cpumap-pping
Thank you for doing this work. Because of the scope of our network, it's not something that we would deploy in its current form (which is why I'd like to see what Cisco thinks about it, even if we don't really use them much anymore). But I do see the utility in it, especially for the smaller-to-medium sized ISPs, and will be sure to let the community know about this.

Honestly, I want to finally quit doing bufferbloat in the next year or three, and work on other things. farplay.io is pretty neat, so is jacktrip.org. I have spent a lot of time in the past year trying to shift USA BEAD planning in more of the right directions.
Thanks for being willing to share!
Mark.
--
Podcast: https://www.linkedin.com/feed/update/urn:li:activity:7058793910227111937/
Dave Täht CSO, LibreQos