Re: overly timid congestion control with amazon prime live video

I replied to Dan off list to investigate. Any Prime Video quality issues reported to an ISP by customers or CDN issues can be sent directly to primevideo-isp-us@amazon.com . Thanks, Sean
On Dec 13, 2024, at 3:52 PM, Daniel Sterling <sterling.daniel@gmail.com> wrote:
While streaming football last night from AT&T fiber (AS7018), I noticed the video quality went way down when I did a large download on another system. I have gigabit fiber but I'm using Linux tc to throttle my network traffic. I've configured cake with a 200mbit limit, and I also use a low BQL setting to further ensure low latency for low-bandwidth traffic.
IOW, my Linux router will drop packets across the board rather liberally in the face of large downloads, but I've always seen streams fight back for their share of the bandwidth -- except for amazon's.
The live stream appears to use UDP on a non-standard port (not 443). Does anyone know what amazon has done to cause their congestion control algorithms to yield so much bandwidth and not fight for their fair share?
Thanks, Dan

Do you have an explanation for the question he asked? I am sure it would be of interest to many here. Shane
On Dec 13, 2024, at 4:23 PM, L Sean Kennedy <liam@fedney.org> wrote:
I replied to Dan off list to investigate.
Any Prime Video quality issues reported to an ISP by customers or CDN issues can be sent directly to primevideo-isp-us@amazon.com .
Thanks, Sean
On Dec 13, 2024, at 3:52 PM, Daniel Sterling <sterling.daniel@gmail.com> wrote:
While streaming football last night from AT&T fiber (AS7018), I noticed the video quality went way down when I did a large download on another system. I have gigabit fiber but I'm using Linux tc to throttle my network traffic. I've configured cake with a 200mbit limit, and I also use a low BQL setting to further ensure low latency for low-bandwidth traffic.
IOW, my Linux router will drop packets across the board rather liberally in the face of large downloads, but I've always seen streams fight back for their share of the bandwidth -- except for amazon's.
The live stream appears to use UDP on a non-standard port (not 443). Does anyone know what amazon has done to cause their congestion control algorithms to yield so much bandwidth and not fight for their fair share?
Thanks, Dan

A couple cake notes below...
On Fri, Dec 13, 2024 at 3:50 PM <sronan@ronan-online.com> wrote:
Do you have an explanation for the question he asked? I am sure it would be of interest to many here.
Shane
On Dec 13, 2024, at 4:23 PM, L Sean Kennedy <liam@fedney.org> wrote:
I replied to Dan off list to investigate.
Any Prime Video quality issues reported to an ISP by customers or CDN issues can be sent directly to primevideo-isp-us@amazon.com .
Thanks, Sean
On Dec 13, 2024, at 3:52 PM, Daniel Sterling <sterling.daniel@gmail.com> wrote:
While streaming football last night from AT&T fiber (AS7018), I noticed the video quality went way down when I did a large download on another system. I have gigabit fiber but I'm using Linux tc to throttle my network traffic. I've configured cake with a 200mbit limit, and I also use a low BQL setting to further ensure low latency for low-bandwidth traffic.
OK, there are several things about cake that seem to be conflated here. Glad you are using it!

1) BQL is for outbound traffic only and provides backpressure at the native rate of the interface. If you further apply cake's shaper to a rate below that, BQL hardly enters the picture.

2) Applying a 200Mbit inbound limit to a gig fiber connection is overkill. Worst case it should be at 50%, and we generally recommend 85%. People keep trying to go above 95%, and that fails to control slow start. None of this accounts for how congested the backhaul is, and we certainly see people trying desperately to figure that out - see the cake-autorate project for a 5G example.

3) By default cake is in triple-isolate mode, which does per-host / per-flow fq. This means if you have two devices asking for the bandwidth, one with 1 flow, the other with dozens, each device will get roughly half the bandwidth. fq_codel, on the other hand, shares on a pure flow basis. We put per-host fq in there because of torrent, and to some extent web traffic (which typically opens 15 flows at a time). However, if cake is on a natted router the per-host mode fails unless you apply the "nat" option on the invocation. Arguably we should have made nat the default.

If you have demand for less bandwidth than your fair share, you experience near zero latency and no loss for your flow. At 200Mbit, assuming nat mode was on, your amazon flow (probably running below 20mbit) should have seen no congestion or loss at all while another machine merrily downloaded at 180, dropping the occasional packet to keep it under control.

Anyway, moving on...
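Actually, one concrete note before I do: with the "nat" keyword and an ifb redirect for inbound, the whole recipe is only a few lines. The interface names and the 85%-of-a-gig figure below are illustrative, not a prescription for your exact link:

    # egress: shape to ~85% of the link rate, with per-host fairness that understands NAT
    tc qdisc replace dev eth0 root cake bandwidth 850mbit nat

    # ingress: redirect inbound traffic through an ifb device and shape it there
    ip link add ifb0 type ifb
    ip link set ifb0 up
    tc qdisc replace dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol all matchall action mirred egress redirect dev ifb0
    tc qdisc replace dev ifb0 root cake bandwidth 850mbit nat ingress

The "ingress" keyword on the ifb side tells cake to count the packets it drops toward the shaped rate, which is what you want when the traffic has already crossed the real bottleneck.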
IOW, my Linux router will drop packets across the board rather liberally in the face of large downloads, but I've always seen streams
Arguably cake drops fewer packets than any other AQM we know of. It still does not drop early and fast enough to control slow start on short RTTs. I keep trying to get people to deploy ECN.
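On Linux endpoints that is a one-line sysctl; once ECN is negotiated, cake marks those flows instead of dropping from them:

    # request ECN on outgoing TCP connections as well as accepting it on incoming ones
    sysctl -w net.ipv4.tcp_ecn=1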
fight back for their share of the bandwidth -- except for amazon's.
The live stream appears to use UDP on a non-standard port (not 443). Does anyone know what amazon has done to cause their congestion control algorithms to yield so much bandwidth and not fight for their fair share?
Anyway, this is a good question, if that was the observed behavior. A packet capture showing it not recovering after a loss or delay would be interesting.
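Even a header-only capture taken on the router during one of these events would be enough to work with. Something like the below, where the interface name is a placeholder and the filter is deliberately broad since we don't know which port the stream actually uses:

    # headers only (-s 128) so the file stays small; grab all UDP while the stream degrades
    tcpdump -i eth0 -s 128 -w prime-live.pcap udp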
Thanks, Dan
-- Dave Täht CSO, LibreQos

On Fri, Dec 13, 2024 at 7:32 PM Dave Taht <dave.taht@gmail.com> wrote:
A couple cake notes below...
Hey Dave, thanks for replying and for all your hard work on AQM and latency.
On Fri, Dec 13, 2024 at 3:50 PM <sronan@ronan-online.com> wrote:
Do you have an explanation for the question he asked? I am sure it would be of interest to many here.
Sean noted live video uses a custom protocol called Sye that estimates available throughput. My understanding is this estimation can be too low as compared to TCP when the network is slow or congested.

That is, in cases where TCP would buffer and then be able to move a large amount of data at the expense of latency, Sye may instead push much less data, sacrificing quality for latency.

IMO amazon streaming should fall back to TCP in these conditions, perhaps noting to the customer their stream is no longer "live" but is slightly delayed to improve video quality. Ideally a network that could push on average 10mbit/s consistently over TCP could also push 10mbit/s consistently over a custom UDP protocol, but when this is not the case (due to any number of bizarre real-world conditions), the system should detect this and reset. Giving the customer a delayed but high-quality nearly-live stream would (again, IMO) be a better experience than a live but poor-quality video.

Of course I could probably have achieved this myself by simply rewinding the live stream, but I was blinded to this option at the time by my surprise and amazement at how poor the video quality was.
2) Applying a 200Mbit inbound limit to a gig fiber connection is overkill. Worst case it should be at 50%, and we generally recommend 85%.
The reasoning for 200mbit is it's about 50% of best-case real-world 802.11 performance across a house. The goal is to keep buffers in the APs as empty as possible. I'd rather enforce this on the APs, but lacking that ability I do it for all traffic from the router to the rest of the network.
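For reference, the shaping amounts to roughly the following; the interface name, queue index, and the exact BQL cap are placeholders rather than my actual values:

    # all traffic from the router toward the rest of the network goes through cake at 200 Mbit
    tc qdisc replace dev eth1 root cake bandwidth 200mbit

    # keep the NIC ring itself nearly empty by capping BQL (repeat per tx queue)
    echo 3000 > /sys/class/net/eth1/queues/tx-0/byte_queue_limits/limit_max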
If you have demand for less bandwidth than your fair share, you experience near zero latency and no loss for your flow. At 200Mbit, assuming nat mode was on, your amazon flow (probably running below 20mbit) should have seen no congestion or loss at all while another machine merrily downloaded at 180, dropping the occasional packet to keep it under control.
You're absolutely right. It's very possible the issue I experienced was due to slowness at the wireless network level, and not Linux traffic shaping.
The live stream appears to use UDP on a non-standard port (not 443). Does anyone know what amazon has done to cause their congestion control algorithms to yield so much bandwidth and not fight for their fair share?
Anyway, this is a good question, if that was the observed behavior. A packet capture showing it not recovering after a loss or delay would be interesting.
My guess is that since Sye prioritizes live data over throughput, it will essentially by design deliver poor quality in situations where bandwidth is limited and TCP streams are vying to use as much of it as they can. This unfortunately describes a lot of home networks using wifi in real-world conditions. -- Dan

On Fri, Dec 13, 2024 at 5:32 PM Daniel Sterling <sterling.daniel@gmail.com> wrote:
On Fri, Dec 13, 2024 at 7:32 PM Dave Taht <dave.taht@gmail.com> wrote:
A couple cake notes below...
Hey Dave, thanks for replying and for all your hard work on AQM and latency.
I'm just the loudest...
On Fri, Dec 13, 2024 at 3:50 PM <sronan@ronan-online.com> wrote:
Do you have an explanation for the question he asked? I am sure it would be of interest to many here.
Sean noted live video uses a custom protocol called Sye that estimates available throughput. My understanding is this estimation can be too low as compared to TCP when the network is slow or congested.
That is, in cases where TCP would buffer and then be able to move a large amount of data at the expense of latency, Sye may instead push much less data, sacrificing quality for latency.
IMO amazon streaming should fall back to TCP in these conditions, perhaps noting to the customer their stream is no longer "live" but is slightly delayed to improve video quality. Ideally a network that could push on average 10mbit/s consistently over TCP could also push 10mbit/s consistently over a custom UDP protocol, but when this is not the case (due to any number of bizarre real-world conditions), the system should detect this and reset. Giving the customer a delayed but high-quality nearly-live stream would (again, IMO) be a better experience than a live but poor-quality video.
Of course I could probably have achieved this myself by simply rewinding the live stream, but I was blinded to this option at the time by my surprise and amazement at how poor the video quality was.
2) Applying a 200Mbit inbound limit to a gig fiber connection is overkill. Worst case it should be at 50%, and we generally recommend 85%.
The reasoning for 200mbit is it's about 50% of best-case real-world 802.11 performance across a house. The goal is to keep buffers in the APs as empty as possible. I'd rather enforce this on the APs, but lacking that ability I do it for all traffic from the router to the rest of the network.
These days I am a huge fan of the mt76 wifi chipsets, either the GL.iNet MT6000 or the new OpenWrt One.
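With openwrt on the AP you can also run cake at that hop via the sqm-scripts package. A minimal /etc/config/sqm stanza looks something like this, with the interface name and the rates (in kbit/s) as placeholders:

    config queue 'ap'
            option enabled '1'
            option interface 'eth0'
            option download '200000'
            option upload '200000'
            option qdisc 'cake'
            option script 'piece_of_cake.qos'

piece_of_cake.qos is the sqm-scripts wrapper that sets cake up in both directions (using an ifb for the inbound side).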
If you have demand for less bandwidth than your fair share, you experience near zero latency and no loss for your flow. At 200Mbit, assuming nat mode was on, your amazon flow (probably running below 20mbit) should have seen no congestion or loss at all while another machine merrily downloaded at 180, dropping the occasional packet to keep it under control.
You're absolutely right. It's very possible the issue I experienced was due to slowness at the wireless network level, and not Linux traffic shaping.
The live stream appears to use UDP on a non-standard port (not 443). Does anyone know what amazon has done to cause their congestion control algorithms to yield so much bandwidth and not fight for their fair share?
Anyway, this is a good question, if that was the observed behavior. A packet capture showing it not recovering after a loss or delay would be interesting.
My guess is that since Sye prioritizes live data over throughput, it will essentially by design deliver poor quality in situations where bandwidth is limited and TCP streams are vying to use as much of it as they can. This unfortunately describes a lot of home networks using wifi in real-world conditions.
It sounds like a protocol that could be improved.
-- Dan
-- Dave Täht CSO, LibreQos
participants (4)
- Daniel Sterling
- Dave Taht
- L Sean Kennedy
- sronan@ronan-online.com