Once again: which is better, a feature-rich shared-buffer switch or a fat-buffer switch? When is it better to deploy a big-buffer switch? When is it better to drop and retransmit instead of queueing? Thanks. Dmitry
What I've observed is that it's better to have a big-buffer device when you're mixing port speeds. The more dramatic the port-speed differences (and the more of them), the more buffer you need. If you have all the same port speed, small buffers are fine. If you have 100G and 1G ports, you'll need big buffers wherever the transition to the smaller port speed is located. ----- Mike Hammett, Intelligent Computing Solutions / Midwest Internet Exchange / The Brothers WISP
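To put rough numbers on that transition, a minimal sketch (the burst size, packet size, and port speeds below are illustrative assumptions, not anything from this thread):

```python
# Rough sizing of the buffer needed at a 100G -> 1G port-speed transition.
# All numbers here are illustrative assumptions.

fast_bps = 100e9         # ingress port speed (100 Gb/s)
slow_bps = 1e9           # egress port speed (1 Gb/s)
burst_bytes = 10 * 1500  # a 10-packet burst of 1500-byte frames

# Time to receive the burst at ingress speed vs. drain it at egress speed.
t_in = burst_bytes * 8 / fast_bps
t_out = burst_bytes * 8 / slow_bps

# While the burst is draining, only (t_in / t_out) of it has left the
# queue by the time it has fully arrived, so nearly the whole burst
# must sit in the buffer.
backlog = burst_bytes * (1 - t_in / t_out)
print(f"arrives in {t_in*1e6:.1f} us, drains in {t_out*1e6:.1f} us, "
      f"peak backlog ~{backlog:.0f} bytes")
```

The point: the slow port drains almost nothing in the time the fast port takes to deliver the burst, so nearly the entire burst has to be buffered at the transition.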
On Fri, Apr 9, 2021 at 9:05 AM Mike Hammett <nanog@ics-il.net> wrote:
> If you have all the same port speed, small buffers are fine. If you have 100G and 1G ports, you'll need big buffers wherever the transition to the smaller port speed is located.

With the larger buffer there, you are likely to be severely impacting application throughput.
I have seen the opposite, where small buffers impacted throughput. Then again, that was observation only, with no research into why beyond the superficial. ----- Mike Hammett, Intelligent Computing Solutions / Midwest Internet Exchange / The Brothers WISP
The reason we need larger buffers for some applications is a TCP implementation detail. When the TCP window grows (exponentially, during slow start), the newly created window space is burst onto the wire at the sender's speed. If the sender is significantly faster than the receiver, someone needs to store those bytes while they are serialised at the receiver's speed. If we cannot store them, the window cannot grow to accommodate the bandwidth*delay product and the receiver cannot reach the ideal TCP receive rate. If we changed TCP senders to do bandwidth estimation, and newly created window space were serialised at the estimated receiver rate, we would need dramatically less buffering. However, such a less aggressive TCP algorithm would be outcompeted by New Reno, driving its bandwidth estimate toward zero. Luckily, almost all traffic is handled by a few players; if they agree to switch to a well-behaved TCP (or QUIC) algorithm, it matters little that the long tail is badly behaved TCP.
-- ++ytti
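A small sketch of the arithmetic behind this point (the link rates and window size are illustrative assumptions): one window's worth of newly opened data burst at sender line rate must be absorbed by a queue, while the same data paced at the estimated receiver rate barely queues at all.

```python
# Compare buffer demand for burst vs. paced sending across a speed mismatch.
# All rates and sizes are illustrative assumptions.

sender_bps = 10e9       # sender NIC: 10 Gb/s
receiver_bps = 1e9      # bottleneck/receiver: 1 Gb/s
window_bytes = 125_000  # one window's worth of new data (~1 ms at 1 Gb/s)

# Burst case: the window arrives at sender speed, drains at receiver speed.
t_arrive = window_bytes * 8 / sender_bps
drained_during_burst = receiver_bps * t_arrive / 8
burst_backlog = window_bytes - drained_during_burst

# Paced case: the sender spaces packets at the estimated receiver rate,
# so the queue never holds much more than one packet at a time.
paced_backlog = 1500

print(f"burst backlog ~{burst_backlog:,.0f} B, "
      f"paced backlog ~{paced_backlog} B")
```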
❦ 9 April 2021 17:20 +03, Saku Ytti:
> If we changed TCP senders to do bandwidth estimation, and newly created window space were serialised at the estimated receiver rate, we would need dramatically less buffering. However, such a less aggressive TCP algorithm would be outcompeted by New Reno, driving its bandwidth estimate toward zero.
> Luckily, almost all traffic is handled by a few players; if they agree to switch to a well-behaved TCP (or QUIC) algorithm, it matters little that the long tail is badly behaved TCP.
I think many of them are now using BBR or BBRv2. It would be interesting to know how that has impacted switch buffering. -- As flies to wanton boys are we to the gods; they kill us for their sport. -- Shakespeare, "King Lear"
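BBR paces packets at its estimate of the bottleneck rate rather than bursting a full window, which is close to the well-behaved sender Saku describes. For what it's worth, on Linux an application can request BBR per socket via the TCP_CONGESTION option; a minimal sketch (Linux only, and it assumes the tcp_bbr kernel module is available):

```python
import socket

# Request BBR congestion control on a single socket (Linux only; assumes
# the tcp_bbr module is loaded). A sketch, not a deployment recommendation.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
except OSError as exc:
    print(f"BBR not available, keeping kernel default: {exc}")

# Read back whichever algorithm the socket ended up with.
algo = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print("congestion control:", algo.split(b"\x00")[0].decode())
```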
On Fri, Apr 9, 2021 at 6:05 AM Mike Hammett <nanog@ics-il.net> wrote:
> What I've observed is that it's better to have a big-buffer device when you're mixing port speeds. The more dramatic the port-speed differences (and the more of them), the more buffer you need.
> If you have all the same port speed, small buffers are fine. If you have 100G and 1G ports, you'll need big buffers wherever the transition to the smaller port speed is located.
When a network is behaving well (losing few packets to data corruption), TCP throughput is impacted by exactly two factors:

1. Packet round-trip time
2. The size to which the congestion window has grown when the first packet is lost

Assuming the sender has data ready, it will (after the initial negotiation) slam out 10 packets back to back at the local wire speed. Those 10 packets are the initial congestion window. Having sent them, the sender waits until it hits a timeout or the other side responds with an acknowledgement. So the initial packets start out crammed right at the front of the round trip, with lots of empty afterwards.

The receiver gets the packets in a similar burst and sends its acks. As the sender receives an acknowledgement for each of the original packets, it sends two more. This doubling effect is called "slow start," and it's slow in the sense that the sender doesn't just throw the entire data set at the wire and hope. So, having received acks for 10 packets, it sends 20 more. These 20 have spread out a little, more or less based on the worst link speed in the path, but they're still crammed up in a bunch at the start of the round trip. The next round trip it doubles to 40 packets. Then 80. Then 160. All crammed up at the start of the round trip, causing them to hit that one slowest link in the middle all at once.

This doubling continues until one of the buffers in the middle is too small to hold the trailing part of the burst while the leading part is sent. With a full buffer, a packet is dropped. Whatever the congestion window size is when that first packet is dropped, that number divided by the round-trip time is more or less the throughput you're going to see on that TCP connection.

The various congestion control algorithms do different things after they see that first packet drop. Some knock the congestion window in half right away. Others back down more cautiously. Some reduce growth all the way down to 1 packet per round trip. Others allow faster growth as the packets spread out over the whole round trip and demonstrate that they don't keep getting lost. But in general, the throughput you're going to see on that TCP connection was decided as soon as you lost that first packet.

So, TCP will almost always get better throughput with more buffers. The flip side is latency: packets sitting in a buffer extend the time before the receiver gets them. If you build a buffer 500 milliseconds deep and then let a TCP connection fill it up, applications that work poorly in high-latency environments (like games and ssh) will suffer.

Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
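A toy model of the process Bill describes (the buffer depth, drain rate, RTT, and packet size are illustrative assumptions): the window doubles each round trip until the burst overflows the bottleneck queue, and the window size at that first loss roughly caps the connection's throughput.

```python
# Toy slow-start model: double cwnd per RTT until the burst overflows
# a fixed bottleneck buffer. Parameters are illustrative assumptions.

mss = 1500            # bytes per packet
rtt = 0.05            # 50 ms round trip
buffer_pkts = 64      # bottleneck queue depth, in packets
drain_per_burst = 32  # packets the bottleneck forwards while a burst arrives

cwnd = 10  # initial congestion window (packets)
while cwnd - drain_per_burst <= buffer_pkts:
    cwnd *= 2  # slow start: one doubling per round trip

# First loss happens at this cwnd; throughput is roughly cwnd-at-loss / RTT.
throughput_bps = cwnd * mss * 8 / rtt
print(f"first loss at cwnd={cwnd} pkts -> ~{throughput_bps/1e6:.0f} Mb/s")
```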
You will not get an easy, straight answer; say more about your environment and applications. If you consider the classical TCP algorithm and ignore latency, the large buffer wins, but what about microbursts? LG
There is no easy, one-size-fits-all answer to this question. It's a complex subject, and the answer will often differ depending on the environment and traffic profile. On Fri, Apr 9, 2021 at 8:58 AM Dmitry Sherman <dmitry@interhost.net> wrote:
> Once again: which is better, a feature-rich shared-buffer switch or a fat-buffer switch? When is it better to deploy a big-buffer switch? When is it better to drop and retransmit instead of queueing?
> Thanks. Dmitry
Buffer size has nothing to do with feature richness. Assuming you are asking about the data center: in a wide-radix, low-oversubscription network, shallow buffers do just fine. Some applications (think map-reduce or ML model training) have many-to-one traffic patterns and suffer incast as a result; deep buffers might be helpful there (rough arithmetic after the quote below). DCI/DC-GW is another case where deep buffers can be justified. Regards, Jeff
On Apr 9, 2021, at 05:59, Dmitry Sherman <dmitry@interhost.net> wrote:
> Once again: which is better, a feature-rich shared-buffer switch or a fat-buffer switch? When is it better to deploy a big-buffer switch? When is it better to drop and retransmit instead of queueing?
> Thanks. Dmitry
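To make the incast case concrete, rough arithmetic under assumed numbers (the fan-in, response size, port speed, and burst window are all illustrative): when many workers answer one aggregator at once, the egress port must briefly hold almost the entire fan-in.

```python
# Rough incast arithmetic: N synchronized senders answer one receiver.
# All parameters are illustrative assumptions.

n_senders = 100          # workers answering an aggregation query
response_bytes = 64_000  # bytes each worker returns
egress_bps = 25e9        # receiver-facing port: 25 Gb/s
burst_window = 1e-4      # responses land within ~100 us of each other

arriving = n_senders * response_bytes
drained = egress_bps * burst_window / 8
backlog = max(0, arriving - drained)
print(f"instantaneous backlog ~{backlog/1e6:.1f} MB at the egress port")
```

With these assumed numbers the aggregator's port needs to absorb several megabytes in one burst, which is the kind of transient a shallow shared-buffer chip may not be able to hold.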
participants (8)
- Dmitry Sherman
- Jeff Tantsura
- lobna gouda
- Mike Hammett
- Saku Ytti
- Tom Beecher
- Vincent Bernat
- William Herrin