Sorry, everyone, my initial reply went only to Saku, so I'm replying again for visibility to the list.

On Tue, 8 Nov 2022 at 02:57, Saku Ytti <saku@ytti.fi> wrote:
Hey,


On Mon, 7 Nov 2022 at 21:58, Graham Johnston <johnston.grahamj@gmail.com> wrote:


> I've been involved in service provider networks, small retail ISPs, for 20+ years now. Largely, though, we've never needed complex QoS: at $OLD_DAY_JOB we were consistently positioned to avoid regular link congestion by having sufficient capacity. In the few instances when we did have link congestion, egress priority queuing met our needs.

What does 'egress priority queueing' mean? Do you mean 'send all X,
before any Y, send all Y before any Z'? If this, then this must have
been quite some time now, as since traffic managers were implemented
in hardware ages ago, this hasn't been available. And the only thing
that has been available has been 'X has guaranteed rate X1, Y has Y1
and Z has Z1' and love it or hate it, that's the QoS tool industry has
decided you need.

Yeah, I'm sure I didn't use all of the features; we did have to set a bandwidth-share value and possibly a bit more. Looking back, it was more a case of not having to perform any rate limiting on the parts of the network I'm thinking about, plus long-term familiarity with that platform. The new environment is a different platform, Juniper to be specific, and I'm less familiar with it.
 

> combine that with the buffering and we should adjust the drop profile to kick in at a higher percentage. Today we use 70% to start triggering the drop behavior, but my head tells me it should be higher. The reason I am saying this is that we are dropping packets ahead of full link congestion. Yes, that is what RED was designed to do, but I surmise that we are making things worse for this application than is actually intended.

I wager almost no one knows what their RED curve is, and different
vendors have different default curves which is then the curve almost
everyone uses. Some use a RED curve such that everything is basically
tail drop (Juniper, 0% drop at 96% fill and 100% drop at 98% fill).
Some are linear. Some allow defining just two points, some allow
defining 64 points. And almost no one has any idea what their curve
is, i.e. mostly it doesn't matter. If it usually mattered, we'd all
know what the curve is and why. As practical example Juniper has
basically

Overall, my current concern is drops occurring before they seem necessary. Combined with your comments about Juniper, which I take to describe the behavior of the default drop profile, I feel more confident that our current drop profile is simply more aggressive than it needs to be.
 

In your case, I assume you have at least two points with 0% drop at
69% fill, then a linear curve from 70% to 100% fill with 1% to 100%
drop. It doesn't seem outright wrong to me. You have 2-3 goals here,
to avoid synchronising TCP flows so that you have steady fill, instead
of wave-like behaviour and to reduce queueing delay for packets not
dropped, which would experience as long a delay as there is queue if
tail dropped. You could have a 3rd possible goal, if you map more than
1 class of packets into the same queue you can still give them
different curves, so during congestion in a single queue can show two
different behaviours depending on packet.
So what is the problem you're trying to fix? Can you measure it?

As mentioned above, my supposition is that we drop too much before it's necessary and impact the customer experience in a way that isn't needed. While I can't directly measure the customer experience, I can measure drop rate versus bandwidth. If my supposition is correct, a drop profile that starts dropping later (at a higher utilization rate) should give us fewer dropped packets and possibly a higher utilization rate. While this whole configuration policy is in place to reduce utilization, we operate these links with a hard cap, so I'd like to use as much of that cap as possible. What may have changed is that in the past these links were functionally operated at their capacity, whereas right now we are slightly below capacity.
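
To put rough numbers on the difference, here is a minimal Python sketch that compares the two curves mentioned in this thread: the one Saku assumes we run (0% drop at 69% fill, then linear from 1% at 70% to 100% at 100%) and the near-tail-drop curve he attributes to Juniper (0% at 96% fill, 100% at 98%). The function and variable names are hypothetical, and the straight-line interpolation between points is an assumption; real hardware may approximate the curve differently and typically applies it to an averaged queue depth rather than the instantaneous one.

```python
from bisect import bisect_right

def drop_probability(fill_pct, points):
    """Drop probability (%) at a queue fill level (%), linearly
    interpolated between (fill_pct, drop_pct) points sorted by fill.
    Outside the defined range the nearest endpoint value is used."""
    fills = [f for f, _ in points]
    if fill_pct <= fills[0]:
        return float(points[0][1])
    if fill_pct >= fills[-1]:
        return float(points[-1][1])
    i = bisect_right(fills, fill_pct)
    (f0, d0), (f1, d1) = points[i - 1], points[i]
    return d0 + (d1 - d0) * (fill_pct - f0) / (f1 - f0)

# Curve Saku assumes for our current 70% profile (from this thread).
CURRENT_PROFILE = [(69, 0), (70, 1), (100, 100)]
# Near-tail-drop curve described for Juniper in this thread.
NEAR_TAIL_PROFILE = [(96, 0), (98, 100)]

if __name__ == "__main__":
    for fill in (50, 70, 85, 97, 100):
        cur = drop_probability(fill, CURRENT_PROFILE)
        late = drop_probability(fill, NEAR_TAIL_PROFILE)
        print(f"fill {fill:3d}%  current {cur:6.1f}%  near-tail {late:6.1f}%")
```

Under these assumptions, at 85% fill the current profile is already dropping about half of the arriving packets while the near-tail curve drops none, which is the gap I'd want to confirm against measured drop rate versus bandwidth.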
 

I suspect in a modern high speed network with massive amounts of flows
the wave-like synchronisation is not a problem. If you can't measure
it, or if your only goal is to reduce queueing delay because you have
'strategic' congestion, perhaps instead of worrying about RED, use
tail only and reduce queue size to something that is tolerable 1ms-5ms
max?

On many levels, it does seem like what I want is tail drop rather than RED.
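
For sizing a tail-drop queue to the 1ms-5ms range suggested above, a back-of-the-envelope calculation is enough: the buffer is the number of bytes the link drains in that time. The sketch below is only illustrative; the function name is hypothetical and the 10 Gb/s rate is an assumed figure, not our actual cap.

```python
def queue_bytes(link_bps, max_delay_ms):
    """Bytes of buffer that drain in max_delay_ms at link_bps.

    bits = link_bps * (max_delay_ms / 1000); bytes = bits / 8.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return link_bps * max_delay_ms // 8000

if __name__ == "__main__":
    ASSUMED_RATE_BPS = 10_000_000_000  # hypothetical 10 Gb/s cap
    for delay_ms in (1, 5):
        size = queue_bytes(ASSUMED_RATE_BPS, delay_ms)
        print(f"{delay_ms} ms at 10 Gb/s -> {size} bytes")
```

Whether the platform takes queue depth in bytes, packets, or time is something I'd still need to check for our hardware.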
 
--
  ++ytti

Thanks for your response, Saku. I'm also a user of Oxidized; thanks for that as well.