Random Early Detect and streaming video
I've been involved in service provider networks, small retail ISPs, for 20+ years now. Largely, though, we've never needed complex QoS; at $OLD_DAY_JOB we were consistently positioned to avoid regular link congestion by having sufficient capacity. In the few instances when we did have link congestion, egress priority queuing met our needs.

At a new organization, a set of circumstances has resulted in a long-standing business decision to apply some rate-limiting/traffic management to a subset of traffic during times of higher utilization. That traffic happens to be ABR streaming video. The thought was that the small amount of packet loss introduced by RED or WRED could largely be absorbed by the ABR playback client's innate behavior, with, yes, possibly a drop in video profile. These are acceptable business outcomes in this case.

The question I have for the smart people of this list is: given the specific application that is receiving this treatment, is there any reason to apply RED behavior any appreciable amount before the bandwidth limit for this application? It makes sense to me for interactive TCP traffic, where you want to apply some artificial control to the TCP window, but I *feel* like ABR streaming video was designed to expect congestion, at least at the edge of the customer's home. Combine that with the client-side buffering, and it seems we should adjust the drop profile to kick in at a higher percentage. Today we use 70% fill to start triggering the drop behavior, but my head tells me it should be higher. My reasoning is that we are dropping packets ahead of full link congestion; yes, that is what RED was designed to do, but I surmise that we are making this application worse than is actually intended.

Hopefully my targeted vagueness has still left enough context intact to receive some useful commentary back. Thanks
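For concreteness, the intuition about client-side absorption can be sketched as a toy model. This is not any real player's algorithm; the bitrate ladder, segment duration, starting buffer, and 0.8 safety factor below are made-up illustration numbers. The point is only that a throughput dip caused by early drops tends to show up as a slightly lower buffer and, if it persists, a step down the profile ladder rather than a stall.

    # Toy model (not any real player) of how an ABR client absorbs a
    # throughput dip: the buffer drains a little and, if the dip persists,
    # the client shifts to a lower rung on the bitrate ladder.
    LADDER_MBPS = [1.5, 3.0, 6.0, 12.0]   # hypothetical profile bitrates
    SEGMENT_S = 4                          # hypothetical segment duration

    def simulate(throughput_mbps, buffer_s=20.0):
        # throughput_mbps: goodput estimate seen while fetching each segment
        profile = len(LADDER_MBPS) - 1
        for tput in throughput_mbps:
            # pick the highest profile the current throughput estimate supports
            while profile > 0 and LADDER_MBPS[profile] > 0.8 * tput:
                profile -= 1
            while profile < len(LADDER_MBPS) - 1 and LADDER_MBPS[profile + 1] <= 0.8 * tput:
                profile += 1
            download_s = SEGMENT_S * LADDER_MBPS[profile] / tput
            buffer_s += SEGMENT_S - download_s   # grows if the fetch beats real time
            print(f"tput={tput:5.1f} Mb/s  profile={LADDER_MBPS[profile]:4.1f} Mb/s  "
                  f"buffer={buffer_s:5.1f} s")

    # 12 Mb/s steady state, a dip to 5 Mb/s while drops are occurring, then recovery
    simulate([12, 12, 12, 5, 5, 5, 12, 12])

With these numbers the buffer never comes close to running dry; the cost of the dip is a temporary step down to a lower profile, which is exactly the business outcome described above.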
Hey, On Mon, 7 Nov 2022 at 21:58, Graham Johnston <johnston.grahamj@gmail.com> wrote:
I've been involved in service provider networks, small retail ISPs, for 20+ years now. Largely though, we've never needed complex QoS, as at $OLD_DAY_JOB, we had been consistently positioned to avoid regular link congestion by having sufficient capacity. In the few instances when we've had link congestion, egress priority queuing met our needs.
What does 'egress priority queueing' mean? Do you mean 'send all X before any Y, send all Y before any Z'? If so, then this must have been quite some time ago, because since traffic managers were implemented in hardware ages ago, this hasn't been available. The only thing that has been available is 'X has guaranteed rate X1, Y has Y1 and Z has Z1', and love it or hate it, that's the QoS tool the industry has decided you need.
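To make the distinction concrete, here is a rough sketch of the two models, with packets represented simply as byte counts. The queue structures and quanta are illustrative only, not any vendor's traffic-manager implementation; in the guaranteed-rate model the per-queue quantum would be sized in proportion to the configured rate.

    from collections import deque

    def strict_priority(queues):
        # 'send all X before any Y': always serve the highest non-empty queue
        for q in queues:
            if q:
                return q.popleft()
        return None

    def guaranteed_rate(queues, quanta, credits):
        # deficit-round-robin-style model: each backlogged queue may send up to
        # its byte quantum per round, i.e. it gets at least its configured share
        sent = []
        for i, (q, quantum) in enumerate(zip(queues, quanta)):
            credits[i] += quantum
            while q and credits[i] >= q[0]:
                credits[i] -= q[0]
                sent.append(q.popleft())
            if not q:
                credits[i] = 0
        return sent

    high = deque([1500, 1500, 1500]); low = deque([1500, 1500, 1500])
    print(strict_priority([high, low]))                      # always from 'high' first
    print(guaranteed_rate([high, low], [3000, 1500], [0, 0]))  # 'low' also gets its share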
combine that with the buffering and we should adjust the drop profile to kick in at a higher percentage. Today we use 70% to start triggering the drop behavior, but my head tells me it should be higher. The reason I am saying this is that we are dropping packets ahead of full link congestion, yes that is what RED was designed to do, but I surmise that we are making this application worse than is actually intended.
I wager almost no one knows what their RED curve is, and different vendors have different default curves, which then become the curves almost everyone uses. Some use a RED curve such that everything is basically tail drop (Juniper: 0% drop at 96% fill and 100% drop at 98% fill). Some are linear. Some allow defining just two points, some allow defining 64 points. And almost no one has any idea what their curve is, i.e. mostly it doesn't matter. If it usually mattered, we'd all know what the curve is and why.

In your case, I assume you have at least two points, with 0% drop at 69% fill, then a linear curve from 70% to 100% fill with 1% to 100% drop. It doesn't seem outright wrong to me. You have two goals here: to avoid synchronising TCP flows, so that you have steady fill instead of wave-like behaviour, and to reduce queueing delay for packets not dropped, which would otherwise experience as long a delay as there is queue if tail dropped. You could have a third possible goal: if you map more than one class of packets into the same queue, you can still give them different curves, so during congestion a single queue can show two different behaviours depending on the packet.

So what is the problem you're trying to fix? Can you measure it? I suspect in a modern high-speed network with massive amounts of flows the wave-like synchronisation is not a problem. If you can't measure it, or if your only goal is to reduce queueing delay because you have 'strategic' congestion, perhaps instead of worrying about RED, use tail drop only and reduce the queue size to something that is tolerable, 1ms-5ms max?

-- ++ytti
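To make the curve discussion concrete, here is a rough sketch of how a drop profile defined by a handful of (fill, drop-probability) points behaves, using the two shapes mentioned above: a near-tail-drop Juniper-style default and an assumed 70%-onset linear profile. Real implementations differ in how they interpolate and typically operate on an averaged rather than instantaneous queue depth, so treat this as illustration only.

    def drop_probability(points, fill):
        # Linear interpolation between configured (fill %, drop %) points,
        # starting from (0, 0) and flat after the last point.
        f_lo, p_lo = 0.0, 0.0
        for f_hi, p_hi in points:
            if fill <= f_hi:
                if f_hi == f_lo:
                    return p_hi
                return p_lo + (p_hi - p_lo) * (fill - f_lo) / (f_hi - f_lo)
            f_lo, p_lo = f_hi, p_hi
        return points[-1][1]

    juniper_like = [(96, 0), (98, 100)]     # effectively tail drop
    onset_70     = [(69, 0), (100, 100)]    # assumed current profile

    for fill in (60, 75, 85, 95, 97, 99):
        print(f"fill {fill:2d}%  juniper-like {drop_probability(juniper_like, fill):5.1f}%  "
              f"70%-onset {drop_probability(onset_70, fill):5.1f}%")

With the assumed two-point shape, a queue sitting at 85% fill is already dropping around half of the arriving packets in this class, while the Juniper-style default drops nothing until roughly 96% fill; that gap is essentially what is being debated in this thread.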
What does 'egress priority queueing' mean? Do you mean 'send all X, before any Y, send all Y before any Z'? If this, then this must have been quite some time now, as since traffic managers were implemented in hardware ages ago, this hasn't been available.
As you'll probably remember, I'm just an academic who tries to keep up with the times. What I can add is that in the latest MEF CECP (Carrier Ethernet Certified Professional) courseware (Blueprint D), egress bandwidth profiles based on token buckets are still actively advocated as a means of applying QoS.

Cheers,
Etienne
--
Ing. Etienne-Victor Depasquale
Assistant Lecturer
Department of Communications & Computer Engineering
Faculty of Information & Communication Technology
University of Malta
Web. https://www.um.edu.mt/profile/etiennedepasquale
Sorry, everyone, my initial reply was only to Saku so I'm replying again for visibility to the list. On Tue, 8 Nov 2022 at 02:57, Saku Ytti <saku@ytti.fi> wrote:
Hey,
On Mon, 7 Nov 2022 at 21:58, Graham Johnston <johnston.grahamj@gmail.com> wrote:
I've been involved in service provider networks, small retail ISPs, for 20+ years now. Largely though, we've never needed complex QoS, as at $OLD_DAY_JOB, we had been consistently positioned to avoid regular link congestion by having sufficient capacity. In the few instances when we've had link congestion, egress priority queuing met our needs.
What does 'egress priority queueing' mean? Do you mean 'send all X, before any Y, send all Y before any Z'? If this, then this must have been quite some time now, as since traffic managers were implemented in hardware ages ago, this hasn't been available. And the only thing that has been available has been 'X has guaranteed rate X1, Y has Y1 and Z has Z1' and love it or hate it, that's the QoS tool industry has decided you need.
Yeah, I'm sure I didn't use all of the features; we did have to set a bandwidth-share value and possibly a bit more. As I look at my past, I guess it was more a case of not having to perform any rate limiting on the parts of the network that I'm thinking about, combined with long-term familiarity with that platform, as compared to the new environment, which I'm less familiar with and which is a different platform, Juniper to be specific.
combine that with the buffering and we should adjust the drop profile to kick in at a higher percentage. Today we use 70% to start triggering the drop behavior, but my head tells me it should be higher. The reason I am saying this is that we are dropping packets ahead of full link congestion, yes that is what RED was designed to do, but I surmise that we are making this application worse than is actually intended.
I wager almost no one knows what their RED curve is, and different vendors have different default curves which is then the curve almost everyone uses. Some use a RED curve such that everything is basically tail drop (Juniper, 0% drop at 96% fill and 100% drop at 98% fill). Some are linear. Some allow defining just two points, some allow defining 64 points. And almost no one has any idea what their curve is, i.e. mostly it doesn't matter. If it usually mattered, we'd all know what the curve is and why.
Overall, with my current concern being drops before they seem to be necessary, combined with your comments about Juniper, which I take to describe the behavior of the default drop profile, I feel more confident that our current drop profile behavior is just more aggressive than it needs to be.
In your case, I assume you have at least two points with 0% drop at 69% fill, then a linear curve from 70% to 100% fill with 1% to 100% drop. It doesn't seem outright wrong to me. You have 2-3 goals here, to avoid synchronising TCP flows so that you have steady fill, instead of wave-like behaviour and to reduce queueing delay for packets not dropped, which would experience as long a delay as there is queue if tail dropped. You could have a 3rd possible goal, if you map more than 1 class of packets into the same queue you can still give them different curves, so during congestion in a single queue can show two different behaviours depending on packet. So what is the problem you're trying to fix? Can you measure it?
As mentioned above, my problem/supposition is that we drop too much before it's necessary and impact the customer experience in a way that isn't needed. While I can't directly measure the customer experience, I can measure drop rate versus bandwidth. If my supposition is correct, then with a drop profile that drops later (at a higher utilization rate) we'd see fewer dropped packets, and possibly higher utilization. While this whole configuration policy is in place to reduce utilization, we operate these links with a hard cap, so I'd like to use as much of that cap as possible. What may have changed is that in the past these links were functionally operated at their capacity, whereas right now we are slightly below capacity.
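A minimal sketch of that measurement, assuming you can poll cumulative transmitted-bytes, transmitted-packets and dropped-packets counters for the queue in question. get_counters below is a placeholder for whatever collection is actually available (SNMP, telemetry, CLI scraping), and the 10 Gb/s capacity is just an example figure.

    import time

    LINK_CAPACITY_BPS = 10e9          # assumed hard-capped rate for the link

    def sample_drop_vs_util(get_counters, interval_s=60):
        # get_counters() -> (tx_bytes, tx_packets, dropped_packets), cumulative
        b0, p0, d0 = get_counters()
        time.sleep(interval_s)
        b1, p1, d1 = get_counters()
        tx_pkts, drops = p1 - p0, d1 - d0
        utilization = (b1 - b0) * 8 / interval_s / LINK_CAPACITY_BPS
        drop_ratio = drops / (tx_pkts + drops) if (tx_pkts + drops) else 0.0
        return utilization, drop_ratio

Plotting drop_ratio against utilization over a day or two should show directly whether drops are appearing well below the cap; if they are, moving the onset point later and re-measuring gives a before/after comparison.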
I suspect in a modern high speed network with massive amounts of flows the wave-like synchronisation is not a problem. If you can't measure it or If your only goal is to reduce queueing delay because you have 'strategic' congestion, perhaps instead of worrying about RED, use tail only and reduce queue size to something that is tolerable 1ms-5ms max?
On many levels, it does seem like what I want is tail drop rather than RED.
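As a back-of-the-envelope check on what '1ms-5ms max' means in buffer terms (the 10 Gb/s rate and 1500-byte average packet size below are just assumptions for illustration):

    def queue_limit(rate_bps, max_delay_s, avg_pkt_bytes=1500):
        # bytes that drain in max_delay_s at line rate, and the packet equivalent
        max_bytes = rate_bps / 8 * max_delay_s
        return max_bytes, max_bytes / avg_pkt_bytes

    for delay_ms in (1, 5):
        nbytes, pkts = queue_limit(10e9, delay_ms / 1000)
        print(f"{delay_ms} ms at 10 Gb/s ~ {nbytes/1e6:.2f} MB ~ {pkts:.0f} packets")

A tail-drop queue capped at around those sizes bounds worst-case queueing delay without any curve left to tune.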
-- ++ytti
Thanks for your response, Saku. I'm also a user of Oxidized; thanks for that as well.
participants (3)
- Etienne-Victor Depasquale
- Graham Johnston
- Saku Ytti