The reason some applications need larger buffers comes down to a TCP implementation detail. When the TCP window grows (during slow start it grows exponentially), the newly opened window space is burst onto the wire at the sender's line rate.
If the sender's link is significantly faster than the receiver's, something in the path has to store those bytes while they are serialised at the receiver's rate. If they cannot be stored, the window cannot grow to cover the bandwidth*delay product, and the receiver never observes the ideal TCP receive rate.
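To make the bandwidth*delay product concrete, here is a minimal sketch with illustrative numbers (the 10 Gbit/s bottleneck and 100 ms RTT are assumptions, not figures from the text):

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of the given
    bottleneck bandwidth and round-trip time fully utilised."""
    return bandwidth_bps * rtt_s / 8  # bits -> bytes

# Hypothetical path: 10 Gbit/s bottleneck, 100 ms RTT.
# The window (and thus potential buffering) must reach 125 MB
# before the receiver sees the full rate.
print(bdp_bytes(10e9, 0.100))  # 125000000.0
```

The same calculation shows why a short-RTT LAN path needs far less buffering than a transcontinental one at the same bandwidth.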
If we changed the TCP sender to do bandwidth estimation, and newly opened window space were paced onto the wire at the estimated receiver rate, we would need dramatically less buffering. However, such a less aggressive TCP algorithm would be outcompeted by New Reno flows, driving its bandwidth estimate towards zero.
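The pacing idea can be sketched as follows: instead of bursting, the sender spaces segments by a gap derived from the estimated rate. The function name and the 100 Mbit/s estimate are illustrative assumptions, not part of any real stack's API:

```python
def pacing_gap_s(mss_bytes: int, est_rate_bps: float) -> float:
    """Inter-segment gap that serialises full-size segments at the
    estimated receiver rate, so no intermediate buffer has to absorb
    a line-rate burst."""
    return mss_bytes * 8 / est_rate_bps  # seconds per segment

# Hypothetical: 1500-byte segments paced at an estimated 100 Mbit/s.
gap = pacing_gap_s(1500, 100e6)
print(gap)  # 0.00012 -> 120 microseconds between segments
```

With this spacing the sender never exceeds the estimated rate, so buffers only need to absorb estimation error rather than entire window bursts.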
Luckily, almost all traffic is handled by a few large players. If they agree to switch to a well-behaved TCP (or QUIC) congestion control algorithm, it doesn't matter much if the long tail is badly behaved.