Thank you all for your answers here, on the poll itself, and for papers like this one. The consensus seems to be settling around 30ms for VOIP with a few interesting outliers and viewpoints.

https://scholarworks.gsu.edu/cgi/viewcontent.cgi?article=1043&context=cs_theses

Something that came up in reading that, which I half remember from my early days of working with VOIP (on Asterisk), was that silence suppression (and comfort noise) that did not send any packets was in general worse than sending silence (or comfort noise), for two reasons: one was NAT closures, but the other was that a steady stream also helped control congestion and had smaller jitter swings.

So in the deployments I was doing then, I universally disabled this feature on the phones I was using.
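
To put rough numbers on that tradeoff, here is a minimal sketch. It assumes G.711 at 20ms packetization over RTP/UDP/IPv4, and a NAT UDP binding timeout of 30 seconds. The timeout is a made-up value for illustration (real devices vary a lot), and the function names are hypothetical, not from any particular stack.

```python
# Illustrative numbers only: G.711 at 20 ms packetization over RTP/UDP/IPv4.
CODEC_BITRATE_BPS = 64_000      # G.711 payload bitrate
PTIME_MS = 20                   # packetization interval (assumed)
HEADER_BYTES = 12 + 8 + 20      # RTP + UDP + IPv4 headers, no options
NAT_UDP_TIMEOUT_S = 30.0        # hypothetical NAT binding timeout


def stream_profile(bitrate_bps=CODEC_BITRATE_BPS, ptime_ms=PTIME_MS):
    """Return (packets per second, IP packet size in bytes, IP-layer bit/s)."""
    payload_bytes = bitrate_bps * ptime_ms // 1000 // 8
    packet_bytes = payload_bytes + HEADER_BYTES
    pps = 1000 // ptime_ms
    return pps, packet_bytes, pps * packet_bytes * 8


def gaps_risking_nat_closure(silence_gaps_s, timeout_s=NAT_UDP_TIMEOUT_S):
    """Return the silent periods (no packets sent) long enough to expire the binding."""
    return [gap for gap in silence_gaps_s if gap >= timeout_s]


if __name__ == "__main__":
    pps, size, bps = stream_profile()
    print(f"{pps} packets/s, {size} bytes each, {bps / 1000:.0f} kbit/s at the IP layer")
    print("gaps at risk:", gaps_risking_nat_closure([0.5, 4.0, 45.0]))
```

Under those assumptions, sending silence costs a steady 50 small packets per second per direction, while suppressing it risks the NAT mapping expiring during any sufficiently long pause.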

In my mind (particularly in a network whose buffers are limited by packets, not bytes), this showed that point (to an extreme!):

https://www.duo.uio.no/bitstream/handle/10852/45274/1/thesis.pdf 

But my question now is: are we doing silence suppression (not sending packets) on VOIP nowadays?

On Thu, Sep 21, 2023 at 2:55 PM Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
Artifacts in audio are a product of packet loss or jitter resulting in codec issues, leading to audio anomalies that human subjects can perceive, not so much latency by itself. Two-way voice is remarkably NOT terrible on a 495ms RTT geostationary satellite connection as long as there is little or no packet loss.

On Thu, Sep 21, 2023 at 12:47 PM Tom Beecher <beecher@beecher.cc> wrote:
My understanding has always been that 30ms was set based on human perceptibility. 30ms was the point at which the average person could start to detect artifacts in the audio.

On Tue, Sep 19, 2023 at 8:13 PM Dave Taht <dave.taht@gmail.com> wrote:
Dear nanog-ers:

I go back many, many years as to baseline numbers for managing voip networks, including things like Cisco LLQ, diffserv, fqm prioritizing vlans, and running voip networks entirely separately... I worked on codecs, such as oslec, and early SIP stacks, but that was over 20 years ago.

The thing is, I have been unable to find much research (as yet) as to why my number exists. Over here I am taking a poll as to what number is most correct (10ms, 30ms, 100ms, 200ms), but I am even more interested in finding cites to support various viewpoints, including mine, and learning how SLAs are met to deliver it.


--
Oct 30: https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html
Dave Täht CSO, LibreQos