While attempting to ascertain how big the buffers in a 100G switch need to be, I rediscovered this article where I first learned about switch buffers:

https://fasterdata.es.net/network-tuning/router-switch-buffer-size-issues/

It suggests that 60 meg is what you need at 10G. Is that per interface? Would it be linear in that I would need 600 meg at 100G? Some 100G switches I was looking at only had 36 megs, so that's insufficient either way you look at it.

-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com
Midwest-IX
http://www.midwest-ix.com
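For context: the article's 60 MB at 10G looks like the classic single-flow rule of thumb, buffer = RTT x bandwidth, evaluated at roughly a 50 ms round trip. The 50 ms RTT is an assumption for illustration, not a number stated in the thread. A back-of-the-envelope sketch:

def bdp_bytes(rtt_s: float, rate_bps: float) -> float:
    """Bandwidth-delay product: bytes in flight at a given RTT and rate."""
    return rtt_s * rate_bps / 8

for rate_gbps in (10, 100):
    buf = bdp_bytes(0.050, rate_gbps * 1e9)  # assumed 50 ms RTT
    print(f"{rate_gbps}G @ 50 ms RTT: {buf / 1e6:.0f} MB")

# 10G  @ 50 ms RTT: 62 MB   (close to the article's ~60 MB)
# 100G @ 50 ms RTT: 625 MB  (the rule scales linearly with port speed)

If this reading is right, the figure applies per congested egress port and does scale linearly with port speed, which is why the 36 MB switches fall so far short of it.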
On Tue, Jan 2, 2024 at 3:02 PM Mike Hammett <nanog@ics-il.net> wrote:
While attempting to ascertain how big the buffers in a 100G switch need to be, I rediscovered this article where I first learned about switch buffers.
It suggests that 60 meg is what you need at 10G. Is that per interface? Would it be linear in that I would need 600 meg at 100G?
Some 100G switches I was looking at only had 36 megs, so that's insufficient either way you look at it.
Hi Mike,

My thoughts:

1. 50 ms is -way- too much buffer. A couple of links like that in the path and the user will suffer jitter in excess of 100 ms, which is incredibly disruptive for interactive applications.

2. The article discussed how much buffer to apply to the -slower- interfaces, not the faster ones, the idea being that data entering from the faster interfaces could otherwise overwhelm the slower ones, resulting in needless retransmission and head-of-line blocking. Are the 100G interfaces on your switch the -slower- ones?

I don't know the best number, but I suspect the speed at which packets clear an interface is a factor in the equation, so the reasonable buffer depth in ms when a packet clears in 1 ms is probably different than the reasonable buffer depth when a packet clears in 1 us.

Regards,
Bill Herrin

--
William Herrin
bill@herrin.us
https://bill.herrin.us/
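Bill's last point can be made concrete by converting the buffer sizes in this thread into drain time (buffer / line rate), under the assumption of a single fully congested port. A small sketch:

def drain_ms(buffer_bytes: float, rate_bps: float) -> float:
    """Time in ms to drain a completely full buffer at line rate."""
    return buffer_bytes * 8 / rate_bps * 1e3

print(f"60 MB @ 10G : {drain_ms(60e6, 10e9):.1f} ms")   # 48.0 ms of queue
print(f"36 MB @ 100G: {drain_ms(36e6, 100e9):.1f} ms")  # 2.9 ms of queue
print(f"60 MB @ 100G: {drain_ms(60e6, 100e9):.1f} ms")  # 4.8 ms of queue

The same byte count is ten times shallower, measured in time, at 100G than at 10G, which is why a buffer depth quoted in megabytes means little without the port speed attached.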
Hoo, boy. This is now such an old debate that I do not know where to start anymore.

I am of the firm opinion nowadays that if you are buffering more than a few ms at these enormous speeds, you are doing it wrong, and regardless, https://arxiv.org/abs/2109.11693 seems to hold for highly multiplexed traffic.

My outside number for a FIFO buffer in the modern CDN'd world is a mere 30 ms at lower speeds, which allows for good gaming and videoconferencing experiences and good performance with modern paced transports (in general, Linux now does packet pacing across all congestion controls). With a good head-drop AQM and FQ algorithm, far, far less buffering is feasible across the board: https://blog.cerowrt.org/post/juniper/

But regrettably that is not available at 100Gbit yet (though LibreQos is coming close).
--
40 years of net history, a couple songs:
https://www.youtube.com/watch?v=D9RGX6QFm5E
Dave Täht CSO, LibreQos
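Going the other way, Dave's time-based targets translate to bytes as delay x line rate, again assuming a single fully congested port. Notably, "a few ms" at 100G lands in the same ballpark as the 36 MB switches Mike mentioned:

def buffer_bytes(delay_ms: float, rate_bps: float) -> float:
    """Bytes of queue corresponding to a given delay at line rate."""
    return delay_ms / 1e3 * rate_bps / 8

print(f"3 ms  @ 100G: {buffer_bytes(3, 100e9) / 1e6:.2f} MB")  # 37.50 MB
print(f"30 ms @ 1G  : {buffer_bytes(30, 1e9) / 1e6:.2f} MB")   # 3.75 MB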
I'm assuming that modern queuing mechanisms aren't going to be viable in switches until such time as they're baked into the silicon.

-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com
Midwest-IX
http://www.midwest-ix.com
On Wed, 3 Jan 2024 at 01:05, Mike Hammett <nanog@ics-il.net> wrote:
It suggests that 60 meg is what you need at 10G. Is that per interface? Would it be linear in that I would need 600 meg at 100G?
Not at all.

You need to understand WHY buffering is needed to determine how much buffering you want to offer.

Big buffering is needed when:
- The sender is faster than the receiver
- The receiver wants to receive a single flow at maximum rate
- The sender is sending window growth at sender-rate instead of estimated receiver-rate (the common case, but easy to change, as Linux already estimates receiver-rate, and the 'tc' command can change this behaviour)

The amount of big buffering depends on how much the window can grow when it grows. Windows grow exponentially, so you need (RTT*receiver-rate)/2; /2 because when the window grows, the first half is already done and is dropping in at receiver-rate as the ACKs come by.

Let's imagine your sender is 100GE connected and your receiver is 10GE connected, and you want to achieve a 10Gbps single-flow rate.

- 10ms RTT: 12.5MB window size; worst case you need to grow 6.25MB, and take -10% off, because some of the growth you can send to the receiver instead of buffering all of it, so you'd need 5.5-6MB.
- 100ms RTT would be ~60MB
- 200ms RTT would be ~120MB

Now decide the answer you want to give in your products for these: at what RTT do you want to guarantee what single-flow maximum rate?

I do believe many of the CDNs are already using estimated receiver-rate to grow windows, which basically removes the need for buffering. But any standard cubic without tuning (i.e. all OSes) will burst window growth at line rate, causing the need for buffering.

--
++ytti
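A sketch of Saku's arithmetic as stated: the buffer must absorb roughly half of the window growth (the other half is already draining to the receiver at its line rate), less the ~10% he discounts. The scenario from his example: 100GE sender, 10GE receiver, 10Gbps single-flow target.

RECEIVER_RATE_BPS = 10e9  # the 10GE receiver is the bottleneck

def buffer_needed_mb(rtt_s: float, rate_bps: float = RECEIVER_RATE_BPS) -> float:
    window_bytes = rtt_s * rate_bps / 8   # window at full receiver rate (BDP)
    return window_bytes / 2 * 0.9 / 1e6   # half the growth, minus ~10%

for rtt_ms in (10, 100, 200):
    print(f"{rtt_ms:3d} ms RTT -> ~{buffer_needed_mb(rtt_ms / 1e3):.1f} MB buffer")

# 10 ms -> ~5.6 MB, 100 ms -> ~56 MB, 200 ms -> ~112 MB

Under this model the requirement scales linearly with RTT, so going from 100 ms to 200 ms roughly doubles the buffer needed.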
Threads like this are why I subscribe to this list.
Thus spake Mike Hammett (nanog@ics-il.net) on Tue, Jan 02, 2024 at 05:02:22PM -0600:
While attempting to ascertain how big the buffers in a 100G switch need to be, I rediscovered this article where I first learned about switch buffers.
It suggests that 60 meg is what you need at 10G. Is that per interface? Would it be linear in that I would need 600 meg at 100G?
We've tried to be clear about the use cases where these guidelines apply. In this set of articles, we are primarily describing issues prevalent between scientific research and education environments, where traffic can be dominated by multiplexing a low number of high-BDP machine-to-machine flows, such as from a telescope array to a supercomputer one continent away.

The numbers here are not one-size-fits-all, and are not necessarily characteristic of what you would want to do for multiplexing, say, a bazillion flows from CDNs to homes all within ~10 ms RTT. That said, if you dig in and understand where the numbers are coming from, the principles apply.

Dale
participants (6)
- Dale W. Carder
- Dave Taht
- Maurice Brown
- Mike Hammett
- Saku Ytti
- William Herrin