On Mon, Mar 16, 2009 at 09:09:35AM -0500, Leo Bicknell wrote:
Many edge devices have queues that are way too large.
What appears to happen is that vendors don't auto-size queues. Something like a cable or DSL modem may be designed for a maximum speed of 10Mbps, and the vendor sizes the queue appropriately. The service provider then deploys the device at 2.5Mbps, which means (roughly, as it can be more complex) the queue should be 1/4 the size. However, the software doesn't auto-size the buffer to the link speed, and the operator doesn't adjust the buffer size in their config.
The result is that if the vendor targeted 100ms of buffer, you now have 400ms of buffer, and really bad lag.
This is a very good point. Let me add that the same thing happens with every autosensing 10/100/1000Base-T Ethernet port, which typically does not auto-reduce its buffers when the negotiated speed is less than 1 Gbps.
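To make the arithmetic above concrete, here is a small back-of-the-envelope sketch (purely illustrative, using the numbers from the quoted example): a buffer sized for 100ms of delay at 10Mbps turns into 400ms of worst-case queueing delay when the same device is deployed at 2.5Mbps.

```python
# Numbers from the example above; the calculation is just
# buffer_bytes = rate * delay / 8 and delay = buffer_bytes * 8 / rate.
design_rate_bps   = 10_000_000   # speed the vendor sized the queue for
deployed_rate_bps = 2_500_000    # speed the provider actually sells
target_delay_s    = 0.100        # vendor's intended worst-case queueing delay

buffer_bytes   = design_rate_bps * target_delay_s / 8    # 125,000 bytes
actual_delay_s = buffer_bytes * 8 / deployed_rate_bps    # 0.4 s

print(f"fixed buffer: {buffer_bytes:,.0f} bytes")
print(f"worst-case queueing delay at 2.5 Mbps: {actual_delay_s * 1000:.0f} ms")
```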
As network operators we have to get out of the mindset that "packet drops are bad". While that may be true when planning the backbone for sufficient bandwidth, it's the exact opposite of true when managing congestion at the edge. Reducing the buffer to ~50ms worth of bandwidth makes the users a lot happier, and allows TCP to work. TCP needs drops to find the right speed.
My wish is for the vendors to step up. I would love to be able to configure my router/cable modem/DSL box with "queue-size 50ms" and have it compute, for the current link speed, 50ms of buffer.
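For what it's worth, a minimal sketch of the computation such a knob would have to perform: turn a delay target into a byte (and packet) limit for the currently negotiated rate. The link rates and the 1500-byte MTU below are just assumptions for illustration; no actual vendor CLI is implied.

```python
MTU_BYTES = 1500  # assumed full-size frame, only for a rough packet count

def queue_limit(link_rate_bps, target_delay_s=0.050):
    """Bytes (and full-size packets) that fit in target_delay_s at this rate."""
    limit_bytes = link_rate_bps * target_delay_s / 8
    return limit_bytes, max(1, int(limit_bytes // MTU_BYTES))

for rate_bps in (2.5e6, 10e6, 100e6, 1e9):
    nbytes, npkts = queue_limit(rate_bps)
    print(f"{rate_bps / 1e6:7.1f} Mbps -> {nbytes / 1e3:8.1f} kB  (~{npkts} packets)")
```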
Reducing buffers to 50 msec clearly avoids excessive queueing delays, but let's look at this from the wider perspective:

1) Initially we had a system where hosts were using fixed 64 kB buffers. This was unable to achieve good performance over high-BDP paths.
2) OS maintainers have fixed this by means of buffer autotuning, so the host buffer size is no longer the problem.
3) The above fix introduces unacceptable delays into networks and users are complaining, especially if the autotuning approach from #2 is used.
4) Network operators will fix the problem by reducing buffers to e.g. 50 msec.

So at the end of the day, we'll again have a system which is unable to achieve good performance over high-BDP paths, since with reduced buffers we'll have an underbuffered bottleneck in the path which will prevent full link utilization if RTT > 50 msec. Thus all the above exercises will end up in almost the same situation as before (of course YMMV). Something is seriously wrong, isn't it?

And yes, I opened this topic last week on the Linux netdev mailing list and tried hard to persuade those people that some less aggressive approach is probably necessary to achieve a good balance between the requirements of fastest possible throughput and fairness in the network. But the maintainers simply didn't want to listen :-(

M.
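As a rough illustration of the underbuffering concern: a simplified single-flow model (one long-lived Reno flow through a drop-tail bottleneck whose buffer holds 50 ms at line rate) gives a feel for how much capacity goes unused once the RTT exceeds the buffer's worth of delay. The sawtooth approximation below ignores slow start, delayed ACKs and competing flows, and the 100 Mbps rate and RTT values are only examples, so treat it as an order-of-magnitude sketch, not a measurement.

```python
def single_flow_utilization(capacity_bps, rtt_s, buffer_s):
    """Approximate utilization of one Reno flow behind a drop-tail buffer.

    Sawtooth model: in-flight data oscillates between (BDP + B) / 2 and
    (BDP + B); the link runs below line rate whenever cwnd < BDP.
    """
    bdp = capacity_bps * rtt_s        # bandwidth-delay product, in bits
    buf = capacity_bps * buffer_s     # buffer sized as buffer_s of delay
    w_max = bdp + buf                 # in-flight data just before a loss
    w_min = w_max / 2.0               # cwnd right after the halving
    steps = 1000
    total = 0.0
    for i in range(steps):
        cwnd = w_min + (w_max - w_min) * i / steps   # linear cwnd growth
        total += min(cwnd / bdp, 1.0)                # capped at line rate
    return total / steps

# 100 Mbps bottleneck with 50 ms of buffer, various path RTTs:
for rtt_ms in (20, 50, 100, 200):
    u = single_flow_utilization(100e6, rtt_ms / 1000.0, 0.050)
    print(f"RTT {rtt_ms:3d} ms -> ~{u:.0%} of link capacity")
```

In this toy model the link stays fully utilized as long as the RTT is at or below the 50 ms of buffering, and utilization falls off as the RTT grows beyond it, which is the trade-off being argued about above.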