On Thu, Sep 21, 2023 at 3:34 PM William Herrin <bill@herrin.us> wrote:
On Thu, Sep 21, 2023 at 6:28 AM Tom Beecher <beecher@beecher.cc> wrote:
My understanding has always been that 30ms was set based on human perceptibility. 30ms was the average point at which the average person could start to detect artifacts in the audio.
Hi Tom,
Jitter doesn't necessarily cause artifacts in the audio. Modern applications implement what's called a "jitter buffer." As the name implies, the buffer collects and delays audio for a brief time before playing it for the user. This allows time for the packets which have been delayed a little longer (jitter) to catch up with the earlier ones before they have to be played for the user. Smart implementations can adjust the size of the jitter buffer to match the observed variation in delay so that sound quality remains the same regardless of jitter.
Indeed, on Zoom I barely noticed audio artifacts for a friend who was experiencing 800ms jitter. Yes, really, 800ms. We had to quit our gaming session because it caused his character actions to be utterly spastic, but his audio came through okay.
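To make the jitter buffer idea concrete, here is a minimal sketch in Python. It is not how Zoom or any particular application does it; the class name, the sliding window, and the safety margin are all assumptions for illustration. The buffer simply sizes itself to the slowest packet it has seen recently, so it grows when jitter rises and shrinks again when the path calms down.

```python
from collections import deque


class AdaptiveJitterBuffer:
    """Toy playout-delay estimator (hypothetical, for illustration only)."""

    def __init__(self, window=50, margin_ms=5.0):
        # Recent one-way packet delays in milliseconds; old samples fall out.
        self.delays = deque(maxlen=window)
        # Assumed safety margin added on top of the slowest recent packet.
        self.margin_ms = margin_ms

    def observe(self, delay_ms):
        """Record the measured delay of a newly arrived packet."""
        self.delays.append(delay_ms)

    def playout_delay_ms(self):
        """How long to hold audio so the slowest recent packet still makes it."""
        if not self.delays:
            return self.margin_ms
        return max(self.delays) + self.margin_ms

    def is_late(self, delay_ms):
        """True if a packet with this delay would miss its playout slot."""
        return delay_ms > self.playout_delay_ms()


buf = AdaptiveJitterBuffer(window=4, margin_ms=5.0)
for d in [20, 25, 22, 60]:
    buf.observe(d)
print(buf.playout_delay_ms())  # 65.0: sized to the 60 ms straggler

for d in [21, 23, 22, 20]:
    buf.observe(d)
print(buf.playout_delay_ms())  # 28.0: the buffer shrinks once jitter subsides
```

Real implementations are more careful (they typically use a percentile or a smoothed estimate rather than the raw maximum, and stretch or compress audio when resizing), but the trade-off is the same.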
The problem, of course, is that instead of the audio delay being the average packet delay, it becomes the maximum packet delay.
Yes. I spoke to this point in my APNIC session here: https://blog.apnic.net/2020/01/22/bufferbloat-may-be-solved-but-its-not-over... I called it "riding the TCP sawtooth": the compensating VoIP delay becomes equal to the maximum size of the buffer, and controls the jitter that way, sometimes to unreasonable extents, like the 800ms in your example.
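A quick worked example of the average-versus-maximum point, in Python. The delay samples are made up for illustration (a roughly 40ms path with one packet stuck behind a full buffer), and the function name is hypothetical. It assumes a buffer that waits for the slowest packet in the sample.

```python
from statistics import mean


def audio_delay_ms(packet_delays_ms):
    """Compare the average path delay with what a jitter buffer must impose."""
    return {
        # What the listener would experience if there were no jitter to absorb.
        "average_ms": mean(packet_delays_ms),
        # What the listener gets once the buffer waits for the slowest packet.
        "buffered_ms": max(packet_delays_ms),
    }


# Hypothetical samples: three prompt packets and one delayed by a bloated queue.
result = audio_delay_ms([40, 45, 50, 840])
print(result["average_ms"])   # 243.75
print(result["buffered_ms"])  # 840
```

One late packet out of four drags the conversational delay from tens of milliseconds to most of a second, which is where the talking-over-each-other problem comes from.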
You start to have problems with people talking over each other because when they start they can't yet hear the other person talking. "Sorry, go ahead. No, you go ahead."
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
-- Oct 30: https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html Dave Täht CSO, LibreQos