
I have a radio station customer who is utilizing one of those streaming services to bring their broadcast station online. We've received a complaint of a half dozen or so 1-second drops in connectivity over the Internet to this streaming service in the six or so months they've been a customer. I consider that pretty amazing service delivery. However, the customer does not. I suspect this is a layer 8 issue, but what have your experiences been in these kinds of situations, and what technical remedies would be available? I don't know what sub-second failover systems exist, but I'm sure they're not cost-effective if they do.
-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com
Midwest-IX
http://www.midwest-ix.com

Hi Mike,
BFD on the links in the path that you control (if your network OS supports it reliably) can provide sub-second failover. For assurance, you can use Ethernet OAM to show that your own network is OK, although that is not much help for the Internet beyond it.
It may be worth reviewing the paths between your network and wherever the radio station wants the service delivered or streamed to. But if a third party beyond your upstream provider makes a faux pas, it would be unreasonable for the customer to blame you.
--
Kindest regards,
Tom Smyth.
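To put a number on the sub-second BFD claim above: in asynchronous mode (RFC 5880), the local detection time is the detect multiplier received from the peer times the agreed interval, where the agreed interval is the larger of the local Required Min RX and the peer's Desired Min TX. A minimal sketch of that arithmetic follows; the 50 ms and 300 ms timers and the multiplier of 3 are illustrative assumptions, not values from this thread or a recommendation for any platform.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: float,
                          remote_desired_min_tx_ms: float,
                          remote_detect_mult: int) -> float:
    """Time without received BFD packets before the local end declares the
    session down (RFC 5880, asynchronous mode)."""
    # The peer transmits no faster than we are willing to receive.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Illustrative timer choices (assumptions): aggressive vs. conservative.
for interval_ms in (50, 300):
    detect_ms = bfd_detection_time_ms(interval_ms, interval_ms, 3)
    print(f"{interval_ms} ms x 3 -> failure detected after {detect_ms:.0f} ms")
```

Detection is only the first step; the time to actually move traffic onto another path comes on top of it.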

The broadcasting industry generally runs parallel pipelines on completely independent infrastructure - the endpoints on either side simply take the first segment which lands. So they produce a segment (audio snippet) twice, ship it to destination over two separate paths, destination takes the first arriving and drops the second. There’s really no way to detect a failure and shift away so fast! G
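The receive side of that "take the first one that lands" scheme can be sketched as follows. This is only an illustration under stated assumptions: every segment carries a sequence number, both paths deliver into one process, and the class and function names are hypothetical rather than taken from any broadcast product. It does not reorder segments; it only suppresses duplicates.

```python
from collections import deque


class FirstArrivalDeduplicator:
    """Forward the first copy of each sequence number, drop later copies."""

    def __init__(self, window: int = 1024):
        self._seen = set()       # sequence numbers already forwarded
        self._order = deque()    # same numbers, oldest first, for eviction
        self._window = window    # how many recent numbers to remember

    def accept(self, seq: int) -> bool:
        if seq in self._seen:
            return False         # duplicate from the slower path
        self._seen.add(seq)
        self._order.append(seq)
        if len(self._order) > self._window:
            self._seen.discard(self._order.popleft())
        return True


def merge(arrivals, dedup):
    """arrivals: iterable of (path, seq, payload) in arrival order."""
    return [(seq, payload) for _path, seq, payload in arrivals if dedup.accept(seq)]


# Path A loses segment 2; path B still delivers it, so the output has no gap.
arrivals = [
    ("A", 1, b"seg1"), ("B", 1, b"seg1"),
    ("B", 2, b"seg2"),
    ("A", 3, b"seg3"), ("B", 3, b"seg3"),
]
output = merge(arrivals, FirstArrivalDeduplicator())
print(output)
```

Nothing here detects a failure or switches paths, which is the point made above: the loss on one path is simply never noticed.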

Mike,
I help run KRYZ LPFM. We run two simultaneous streams, with buffering, from two locations/providers to prevent these issues. StereoTool is what we use to create and send the encoded data streams.
Even then we can and do have upstream provider issues, so our second stream runs 'fill in' music that is hosted on a local hard drive. Most transmitters have this capability built in too. It's better than dead air, and we get notifications and fix things if we can, or hassle our upstream.
Having outages is life; preventing people from noticing them is our job!
https://www.thimeo.com/stereo-tool/
Colin Constable
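The fallback order described above (first stream, second stream, then local fill music rather than dead air) amounts to a small priority selection. The sketch below is an assumption about how such logic could look, not how StereoTool or any transmitter implements it; the source names and the 2-second silence threshold are made up for illustration.

```python
def pick_source(seconds_since_audio: dict, priority: list,
                max_silence_s: float, fallback: str) -> str:
    """Return the highest-priority source that delivered audio recently,
    or the local fallback if every stream has gone quiet."""
    for name in priority:
        age = seconds_since_audio.get(name)
        if age is not None and age <= max_silence_s:
            return name
    return fallback


priority = ["stream_a", "stream_b"]   # hypothetical source names
print(pick_source({"stream_a": 0.1, "stream_b": 0.2}, priority, 2.0, "local_fill"))
print(pick_source({"stream_a": 5.0, "stream_b": 0.2}, priority, 2.0, "local_fill"))
print(pick_source({"stream_a": 5.0, "stream_b": 9.0}, priority, 2.0, "local_fill"))
```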

I run a SHOUTcast/Icecast hosting business myself, and when something like that happens, my first suspect is latency/routing. There really are no failover systems for that, aside from using multiple DCs, but honestly this is more of a DC issue anyway, and it'll happen wherever you go. It usually originates at the tier 1 carriers anyway, or at some ISP that has the hop between the csr and them.

----- Original Message -----
From: "Mike Hammett via NANOG" <nanog@lists.nanog.org> To: "North American Network Operators Group" <nanog@lists.nanog.org> Cc: "Mike Hammett" <nanog@ics-il.net> Sent: Sunday, September 14, 2025 4:28:45 PM Subject: Resilient Internet
I have a radio station customer who is utilizing one of those streaming services to bring their broadcast station online. We've received a complaint of a half dozen or so 1-second drops in connectivity over the Internet to this streaming service in the six or so months they've been a customer. I consider that pretty amazing service delivery. However, the customer does not. I suspect this is a layer 8 issue, but what have your experiences been in these kinds of situations, and what technical remedies would be available? I don't know what sub-second failover systems exist, but I'm sure they're not cost-effective if they do.
If you've lost one second, 6 times in six months, that is like *six* nines for the year; already 10 times better than most commercial services (it's about 6.4 nines, actually: https://www.bmc.com/blogs/service-availability-calculation-metrics/).
That's already, likely, *much* better than they're paying for. Pushing to seven nines will cost about ten times what they're paying now.
This is what we used to call a "Sales problem" in my IT work: the problem is not that the service is bad; the problem is that the customer doesn't have reasonable expectations, because it hasn't been explained to them what service levels are, and how much it costs to add each "nine" at the end of that measure.
I know this won't help, but I hope this helps. :-)
Your problem is worse, because if your outage is only 1 second, you have to guarantee that any duplicate presented streams are in sync to within a tenth of that: 100 ms or less.
Can you just turn the buffering up a second?
Cheers,
-- jra
--
Jay R. Ashworth                  Baylink                       jra@baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates       http://www.bcp38.info          2000 Land Rover DII
St Petersburg FL USA      BCP38: Ask For It By Name!           +1 727 647 1274
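The "nines" arithmetic above is easy to check. A small sketch, assuming six 1-second outages over a six-month window of 182.5 days (the window length is an approximation):

```python
import math

SECONDS_PER_DAY = 24 * 3600


def nines(downtime_s: float, period_s: float) -> float:
    """Availability expressed as a count of nines: -log10(unavailability)."""
    return -math.log10(downtime_s / period_s)


def yearly_downtime_budget_s(n_nines: float) -> float:
    """Downtime per 365-day year allowed at a given number of nines."""
    return 10 ** (-n_nines) * 365 * SECONDS_PER_DAY


observed = nines(downtime_s=6 * 1.0, period_s=182.5 * SECONDS_PER_DAY)
print(f"observed: {observed:.2f} nines")
for n in (5, 6, 7):
    print(f"{n} nines allows {yearly_downtime_budget_s(n):.1f} s of downtime per year")
```

Twelve seconds a year sits between the six-nines budget (about 31.5 s) and the seven-nines budget (about 3.2 s), which is why the next "nine" is the expensive one.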

It's hard to answer the question properly until the actual cause of the 1-second drops is defined. Is it a network buffer, routing reconvergence, an application buffer, something TCP-related, etc.? It sounds like the customer probably doesn't know either, but without knowing that, any possible solutions are just blind guesses.

I've heard about this being an issue in broadcast journalism. The mobility of news vans adds another layer to this.
From what I've heard, broadcasters are a big part of the SD-WAN market. Packet duplication and dedup are common in SD-WAN implementations. Another user mentioned parallel feeds, and this would be one way to achieve that.
I imagine you'd need some kind of frame replication or PRP/HSR to do this without the SD-WAN overlay. It could be kind of involved. Maybe there's a way it can be done with broadcast/multicast traffic on the traditional networking side.
One last thought: it could be worth checking whether drop times align with changes in your RIB/FIB. You may have a flappy but more-preferred route to the service provider.
In all likelihood, the cause sits outside of your subscriber/AS boundaries anyway. At 1 second it might as well be solar flares or EMF interference from the station itself.
- Riley
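The RIB/FIB check suggested above can be done offline once both sets of timestamps are in hand. A rough sketch, assuming the drop times come from the customer's complaints and the route-change times from whatever logging is already in place; the timestamps and the 5-second matching window below are invented for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def drops_near_route_changes(drops, route_changes, window=timedelta(seconds=5)):
    """Return (drop, nearest_change) pairs where a route change happened
    within `window` of a reported drop."""
    changes = sorted(route_changes)
    matches = []
    for drop in drops:
        i = bisect_left(changes, drop)
        # Only the change just before and just after the drop can be nearest.
        candidates = changes[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda c: abs(c - drop))
        if abs(nearest - drop) <= window:
            matches.append((drop, nearest))
    return matches


# Hypothetical data, for illustration only.
drops = [datetime(2025, 6, 3, 14, 20, 11), datetime(2025, 8, 19, 9, 5, 40)]
route_changes = [datetime(2025, 6, 3, 14, 20, 9), datetime(2025, 7, 1, 0, 0, 0)]
matches = drops_near_route_changes(drops, route_changes)
for drop, change in matches:
    print(f"drop at {drop} is within 5 s of a route change at {change}")
```

A match is only a hint about where to look; with a handful of events, coincidence cannot be ruled out.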

A lot more information would be needed to meaningfully contribute.
But generally speaking, if the price expectation is anywhere near what Internet services typically cost, the customer is definitely asking too much. And your contract terms should make it clear that this level of service availability is within the SLA.
Having said that, I used to work for a company that provides streams for terrestrial TV. Not IPTV, regular antenna TV. The way this was done was with a dual-plane MPLS/IP backbone: the stream was sent through both planes, and at the antenna site the duplicate packet was dropped before content was fed to the transmitters. If you have a very high expectation of availability, you'll very quickly find that you either do it twice or you do it once and break SLA and apologise regularly.
--
++ytti
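The "do it twice" argument can be put in numbers. A minimal sketch, under the strong assumption that the two planes fail independently (which is exactly why they have to share no infrastructure); the 99.9% per-plane figure is illustrative, not a measurement.

```python
def parallel_availability(a1: float, a2: float) -> float:
    """Availability of a stream sent over two paths when it only fails if
    both paths are down at once (assumes independent failures)."""
    return 1 - (1 - a1) * (1 - a2)


single = 0.999                        # illustrative: each plane at three nines
dual = parallel_availability(single, single)
print(f"one plane:  {single:.4%}")
print(f"two planes: {dual:.6%}")
```

Any shared component, such as a common conduit, power feed, or upstream, breaks the independence assumption and caps the gain.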

On Monday, September 15th, 2025 at 1:14 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
you either do it twice or you do it once and break SLA and apologise regularly.
Well said. This shows up outside networking too: CPU instruction lockstepping was built on the same principle.
https://en.wikipedia.org/wiki/Lockstep_(computing)

*nods* Well, and that's the rub. Their expectations don't match any Internet SLA I've ever seen, much less for standard broadband.
However, simply telling the customer that we're within our SLA or proving it's not our fault doesn't do much to enhance customer satisfaction and thus doesn't help our reputation.
Hearing from others that the broadcast industry has already figured this problem out and sends the same stream via multiple paths is a big help in getting us going in the right direction.
-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com
Midwest-IX
http://www.midwest-ix.com

If they can bend the application they are using, and don't mind significant latency, something like RaptorQ codes with deep time interleaving can spackle over considerably larger gaps than 1 second, at the cost of some additional overhead.
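Why interleaving lets a code ride out a burst can be shown with something far simpler than RaptorQ. The sketch below is NOT RaptorQ: it swaps in a single XOR parity packet per group, which can repair only one loss per group, and then interleaves the groups so that a burst of consecutive losses touches each group at most once. Packet sizes, group size, and burst position are arbitrary illustrative choices.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets, k):
    """Split equal-length packets into groups of k data packets and append
    one XOR parity packet to each group (len(packets) must divide by k)."""
    groups = []
    for i in range(0, len(packets), k):
        group = list(packets[i:i + k])
        groups.append(group + [reduce(xor_bytes, group)])
    return groups


def interleave(groups):
    """Transmit order as (group, slot): slot 0 of every group, then slot 1..."""
    return [(g, s) for s in range(len(groups[0])) for g in range(len(groups))]


def recover(group, lost_slot):
    """Rebuild the one missing packet of a group by XOR-ing the survivors."""
    return reduce(xor_bytes, [p for s, p in enumerate(group) if s != lost_slot])


packets = [bytes([n] * 4) for n in range(12)]
groups = encode(packets, k=3)          # 4 groups of 3 data + 1 parity
order = interleave(groups)
burst = set(order[5:9])                # 4 consecutive transmissions are lost

repaired = []
for g, group in enumerate(groups):
    received = [None if (g, s) in burst else pkt for s, pkt in enumerate(group)]
    missing = [s for s, pkt in enumerate(received) if pkt is None]
    if len(missing) == 1:              # one parity packet repairs one loss
        received[missing[0]] = recover(received, missing[0])
    repaired.extend(received[:-1])     # drop the parity packet

print("burst fully repaired:", repaired == packets)
```

The cost is the same trade-off mentioned above: overhead (one extra packet per three here) and latency, because the receiver must wait for a whole interleaved block before it can repair anything.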

There's also the substantially easier option of keeping a buffer longer than one second and using TCP. Do some testing to make sure it will actually retransmit packets within the buffer timeout; that is likely already the case due to SACK.
On 15/09/2025 14:37, Dorn Hetzel via NANOG wrote:
If they can bend the application they are using, and don't mind significant latency, something like RaptorQ codes with deep time interleaving can spackle over considerably larger gaps than 1 second, at the cost of some additional overhead.
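For the buffer-plus-TCP option above, the sizing is simple arithmetic. A sketch under assumptions: a 128 kbit/s stream and a half-second allowance for TCP to retransmit and catch up after the outage are illustrative figures, not measurements from this thread.

```python
def buffer_bytes(bitrate_kbps: float, seconds: float) -> int:
    """Bytes of encoded audio needed to cover `seconds` of playback."""
    return int(bitrate_kbps * 1000 / 8 * seconds)


def buffer_survives(buffer_s: float, outage_s: float, recovery_s: float) -> bool:
    """True if playback keeps going through an outage plus the time the
    transport needs to retransmit and catch up afterwards."""
    return buffer_s > outage_s + recovery_s


# Illustrative values (assumptions): 128 kbit/s stream, 1 s drop, 0.5 s recovery.
for buffer_s in (1.0, 3.0):
    ok = buffer_survives(buffer_s, outage_s=1.0, recovery_s=0.5)
    print(f"{buffer_s:.0f} s buffer = {buffer_bytes(128, buffer_s)} bytes, survives: {ok}")
```

The memory cost is trivial; the real cost is that listeners hear everything that much later, which matters only if the stream needs to feel live.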

That's simpler, but coding can spackle over larger gaps than TCP can usually manage, and it doesn't require round trips for retransmissions: you just encode with enough redundancy to cover the gap sizes the design allows for.
On Mon, Sep 15, 2025 at 3:54 PM nanog--- via NANOG <nanog@lists.nanog.org> wrote:
There's also the substantially easier option of keeping a buffer longer than one second and using TCP. Do some testing to make sure it will actually retransmit packets within the buffer timeout; that is likely already the case due to SACK.

IIUC the application needs interactivity, which is not great with lots of buffering. Sorry if I’m off; that was my impression.
If you already know you’ll have some loss, that implies some packets _will_ be lost; IMHO duplication is most likely the only way to significantly lessen the losses. Retransmission algorithms are susceptible to being hit again on the retransmission, worsening the delay.
Btw, I’ve heard of duplication even over the _same link_ because it was detected to be lossy, just to increase the odds one of the duplicates would make it.
Pedro Martins Prado
pedro.prado@gmail.com / +353 83 036 1875
On 15 Sep 2025, at 15:21, Dorn Hetzel via NANOG <nanog@lists.nanog.org> wrote:
It's simpler, but you can use coding to spackle over larger gaps than TCP can usually manage, and it doesn't require the round trips for retransmissions; you just encode with enough redundancy to deal with the gap sizes the design allows for.
On Mon, Sep 15, 2025 at 3:54 PM nanog--- via NANOG <nanog@lists.nanog.org> wrote:
There's also the substantially easier option of keeping a buffer of longer than one second, and using TCP (do some testing to make sure it will actually retransmit packets within the buffer timeout. Likely already the case due to SACK.).
On 15/09/2025 14:37, Dorn Hetzel via NANOG wrote:
If they can bend the application they are using, and don't mind significant latency, something like RaptorQ codes with deep time interleaving can spackle over considerably larger gaps than 1 seconds, at the cost of some additional overhead.
On Mon, Sep 15, 2025 at 2:07 PM Mike Hammett via NANOG < nanog@lists.nanog.org> wrote:
*nods* Well, and that's the rub. Their expectations don't match any Internet SLA I've ever seen, much less for standard broadband. However, simply telling the customer that we're within our SLA or proving it's not our fault doesn't do much to enhance customer satisfaction and thus doesn't help our reputation. Hearing from others that the broadcast industry has already figured this problem out and sends the same stream via multiple paths is a big help in getting us going in the right direction.
----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com
Midwest-IX http://www.midwest-ix.com
----- Original Message ----- From: "Saku Ytti" <saku@ytti.fi> To: "North American Network Operators Group" <nanog@lists.nanog.org> Cc: "Mike Hammett" <nanog@ics-il.net> Sent: Monday, September 15, 2025 2:13:40 AM Subject: Re: Resilient Internet
On Sun, 14 Sept 2025 at 23:29, Mike Hammett via NANOG <nanog@lists.nanog.org> wrote:
I have a radio station customer who is utilizing one of those streaming services to bring their broadcast station online. We've received a complaint of a half dozen or so 1-second drops in connectivity over the Internet to this streaming service in the six or so months they've been a customer. I consider that pretty amazing service delivery. However, the customer does not. I suspect this is a layer 8 issue, but what have your experiences been in these kinds of situations, and what technical remedies would be available? I don't know what sub-second failover systems exist, but I'm sure they're not cost-effective if they do.
Lot more information would be needed to meaningfully contribute.
But generally speaking if the price expectation is anywhere near what Internet services typically are, the customer is definitely asking too much. And your contract terms should make it clear that this level of service availability is within the SLA.
Having said that, I used to work for a company that provides streams for terrestrial tv. Not IP-TV, regular antenna TV. How this was done was that there was dual-plane MPLS/IP backplane and the stream was sent through both planes, at the antenna site a duplicate packet was dropped before content was fed to the transmitters. If you have a very high expectation of availability, you'll very quickly find that you either do it twice or you do it once and break SLA and apologise regularly.
-- ++ytti
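
The dual-plane approach described above (send the same stream over both planes, drop the duplicate at the receiving site) can be sketched roughly as follows. This is a minimal illustration, not the system Saku describes: the `(sequence number, payload)` packet format, the `merge_dual_path` name and the `window` size are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Set, Tuple

# Hypothetical packet format: (sequence number, payload).
Packet = Tuple[int, bytes]


def merge_dual_path(arrivals: Iterable[Packet], window: int = 1024) -> Iterator[Packet]:
    """Yield the first copy of each sequence number and drop later copies.

    `arrivals` is the combined arrival order of packets from both planes.
    `window` bounds how many recent sequence numbers are remembered.
    """
    seen: Set[int] = set()
    recent: Deque[int] = deque()
    for seq, payload in arrivals:
        if seq in seen:
            continue  # duplicate from the other plane: drop it
        seen.add(seq)
        recent.append(seq)
        if len(recent) > window:
            # Forget the oldest sequence number to keep memory bounded.
            seen.discard(recent.popleft())
        yield seq, payload


# Plane A loses packet 2, plane B loses packet 4; the merged stream is complete.
plane_a_and_b = [
    (1, b"a"), (1, b"a"), (2, b"b"), (3, b"c"),
    (3, b"c"), (4, b"d"), (5, b"e"), (5, b"e"),
]
merged = [seq for seq, _ in merge_dual_path(plane_a_and_b)]
```

A real receiver would also need a playout buffer to reorder packets and absorb the delay difference between the two paths; this sketch only shows the duplicate drop.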
_______________________________________________ NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/KJNGBFS4...

On Mon, 15 Sept 2025 at 18:42, Pedro Prado via NANOG <nanog@lists.nanog.org> wrote:
Btw, I’ve heard of duplication even over the _same link_ because it was detected to be lossy, just to increase the odds one of the duplicates would make it.
Sure, if you misbehave you can win congested internet battles, provided most others do not misbehave. But it's likely not a reasonable place to start your product design from. That is, let's assume the original QUIC, which does some sort of dynamic FEC to increase redundancy to combat loss. This is fine if the loss is due to an unreliable link or such. But if, as is the common case, the loss is due to congestion, then it'll only make things worse when everyone else uses the same strategy, since as packet loss increases, demand for capacity increases with it in the form of increased redundancy.
-- ++ytti
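
The feedback loop in that argument can be illustrated with a toy model. Everything here is an assumption made for illustration (it is not how QUIC or any real FEC scheme behaves): all senders share one link, each adds just enough redundancy to cover the loss seen in the previous round, and the hypothetical `simulate` function tracks the resulting loss rate.

```python
from typing import List


def simulate(base_load: float, capacity: float = 1.0,
             link_loss: float = 0.0, rounds: int = 5) -> List[float]:
    """Return the loss rate observed in each round of a toy shared-link model.

    base_load: useful traffic offered by all senders, relative to capacity.
    link_loss: loss caused by an unreliable link, independent of load.
    Redundancy rule (assumed): to cover loss p, send 1 / (1 - p) times as much.
    """
    loss = 0.0
    history: List[float] = []
    for _ in range(rounds):
        offered = base_load / (1.0 - loss)  # useful traffic plus redundancy
        delivered = min(1.0, capacity / offered) * (1.0 - link_loss)
        loss = 1.0 - delivered
        history.append(loss)
    return history


# Loss from an unreliable link: the extra redundancy fits, loss stays put.
lossy_link = simulate(base_load=0.8, link_loss=0.02)

# Loss from congestion: every round of added redundancy raises the loss.
congested = simulate(base_load=1.05)
```

In the first case the loss rate stays at the link's own 2%; in the second it climbs every round, because the redundancy added to fight the loss is itself the extra load that causes it.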

I think most people on this list aren't the ones who design the products or services, but use them. Sometimes you need to use a screwdriver as a hammer.
----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com

*nods* Well, and that's the rub. Their expectations don't match any Internet SLA I've ever seen, much less for standard broadband. However, simply telling the customer that we're within our SLA or proving it's not our fault doesn't do much to enhance customer satisfaction and thus doesn't help our reputation. Hearing from others that the broadcast industry has already figured this problem out and sends the same stream via multiple paths is a big help in getting us going in the right direction.
If we assume that these 6 1s interruptions are actually the fault of the network providing the connectivity, then you're already much better than 6 9s of reliability, as Jay said. You'd be hard pressed to get much better. But again, without technical proof here that the network service itself is actually the cause of the issue, you're kind of in the tall grass still.

We had broadcast video teams internally years ago that would come to us with all kinds of similar 'small burp' problems, and blame the network for it. In 95% of cases, the problem turned out to be their equipment or applications doing it to themselves. Had nothing to do with the network at all. Hundreds of hours were sunk proving that.

You don't have to meet every expectation of every subscriber. It's totally fair for you to say "It appears as if the service we provide may not be suitable for your needs. We'd be happy to continue to provide you services, but it may be beneficial for you to investigate other options." Reasonable people will respect that. Unreasonable people stomp their feet and yell, but those people are going to stomp their feet and yell no matter what.
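
For reference, the "six nines" arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes six 1-second drops over roughly half a year (182.5 days) and that all of them are the network's fault; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def availability(downtime_s: float, period_days: float) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_s / (period_days * SECONDS_PER_DAY)


def allowed_downtime_s(nines: int, period_days: float) -> float:
    """Downtime budget in seconds for an availability of `nines` nines."""
    return period_days * SECONDS_PER_DAY * 10 ** (-nines)


HALF_YEAR_DAYS = 182.5  # assumed length of "six or so months"

observed = availability(downtime_s=6, period_days=HALF_YEAR_DAYS)
six_nines_budget = allowed_downtime_s(6, HALF_YEAR_DAYS)
five_nines_budget = allowed_downtime_s(5, HALF_YEAR_DAYS)

print(f"observed availability: {observed:.7%}")
print(f"six nines allows  {six_nines_budget:.1f} s per half year")
print(f"five nines allows {five_nines_budget:.1f} s per half year")
```

Under those assumptions, six nines allows about 15.8 seconds of downtime per half year, so 6 seconds of drops sits inside that budget.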

On Mon, Sep 15, 2025 at 7:07 AM Mike Hammett via NANOG <nanog@lists.nanog.org> wrote:
*nods* Well, and that's the rub. Their expectations don't match any Internet SLA I've ever seen
The implied expectation is completely infeasible for the provider of a basic internet line. The security updates alone that are necessary for CPEs, etc., would typically bump connections for more than a second per 6 months. To reasonably offer a service level anywhere near "1 second maximum every 6 months", let alone less, you are looking at something more like point-to-point path-protected circuits, or a dual-connection disparate-path dark fiber build directly between two locations, bought from a telecoms provider, and not an internet connection.

An infrequent one-off 1-second interruption typically doesn't count as an outage. It may also be something that cannot be diagnosed without major maintenance; that is part of the environment IP exists in, and the internet itself is a large network with multiple unstable or changing paths through other providers' networks. Most peers offer best-effort packet delivery, not a promise of 0% loss.

As described, the expectation is nearly equal to guaranteeing a lossless connection between point A and point B. But point B is across the internet, which means part of the path to B is outside the control of the provider of that line in the first place, and parts of that path are on infrastructure which is shared and overcommitted by both point A's network providers and point B's network providers. Also, point A and point B are both host devices which are subject to the chance of a local software or hardware issue.

That means when there is a "disruption", the rational thing for Provider A to do is to assume the issue is with point B, the network in the middle, or the end devices, until proven otherwise. Providers do not assume that an issue they have not detected is with their own service or network; that has to be proven, and the proof may be too difficult to accomplish.

Provider B or the networks in the middle have no reason to adjust their service to accommodate the special requirements of Provider A or Provider A's end users unless someone first purchases additional dedicated infrastructure and signs contracts with each provider between point A and point B.

For a network line provider to ensure this level of service, that provider realistically has to have failover circuits between point A and point B, with dedicated infrastructure on the whole path that is not dependent on 3rd parties: for example, ring path-protected point-to-point Ethernet or SONET-based circuits with established bandwidth reservations between A and B. Even with those, you can still expect a few hours of outage per year for maintenance. And there is always the chance of that one-off double fibre cut every few years, and similar. The risk level depends on many variables.

There are those SDN solutions that overlay private networks with redundant forwarding on top of several internet connections. _Both_ point A and point B require multiple internet connections. The multiple internet connections still have simultaneous failure scenarios during major internet events. You can expect more protection from different possible causes of outages, but the risk of them does not go to 0.0%.
-- -JA

On Mon, Sep 15, 2025 at 9:19 AM Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
*nods* Well, and that's the rub. Their expectations don't match any Internet SLA I've ever seen, much less for standard broadband. However, simply telling the customer that we're within our SLA or proving it's not our fault doesn't do much to enhance customer satisfaction and thus doesn't help our reputation. Hearing from others that the broadcast industry has already figured this problem out and sends the same stream via multiple paths is a big help in getting us going in the right direction.
If we assume that these 6 1s interruptions are actually the fault of the network providing the connectivity, then you're already much better than 6 9s of reliability, as Jay said. You'd be hard pressed to get much better. But again, without technical proof here that the network service itself is actually the cause of the issue, you're kind of in the tall grass still.
We had broadcast video teams internally years ago that would come to us with all kinds of similar 'small burp' problems, and blame the network for it. In 95% of cases, the problem turned out to be their equipment or applications doing it to themselves. Had nothing to do with the network at all. Hundreds of hours were sunk proving that.
The flip side to that, though, is the 5% of the time when it turns out to be misunderstood queue handling on a particular type of linecard when it is configured with non-default queue parameters and microbursts hit the linecard buffers...and after hundreds of hours of debugging, it really does turn out to be a network problem. The customer always remembers that 5%, and comes back wagging the finger at the network for every hiccup, conveniently forgetting the other 95% of the times when it wasn't the network. ^_^;;
You don't have to meet every expectation of every subscriber. It's totally fair for you to say "It appears as if the service we provide may not be suitable for your needs. We'd be happy to continue to provide you services, but it may be beneficial for you to investigate other options." Reasonable people will respect that. Unreasonable people stomp their feet and yell, but those people are going to stomp their feet and yell no matter what.
Also known as the "here is your SLA credit refund for the 1 second of lost traffic. We're so sorry. Please don't spend it all at once." answer. ;) Matt

We had broadcast video teams internally years ago that would come to us with all kinds of similar 'small burp' problems, and blame the network for it. In 95% of cases, the problem turned out to be their equipment or applications doing it to themselves. Had nothing to do with the network at all. Hundreds of hours were sunk proving that.
The flip side to that, though, is the 5% of the time when it turns out to be misunderstood queue handling on a particular type of linecard when it is configured with non-default queue parameters and microbursts hit the linecard buffers...and after hundreds of hours of debugging, it really does turn out to be a network problem. The customer always remembers that 5%, and comes back wagging the finger at the network for every hiccup, conveniently forgetting the other 95% of the times when it wasn't the network. ^_^;;
Reminds me of one time long ago at an MSP - I identified a faulty Verizon line card in a router hop. One office that was being supported was having issues with youtube videos playing, but no one else was complaining, so we checked with all the other sites/clients we supported.... And found two more with faulting issues. Between all of them, Verizon's business support actually doing their job, and the traceroutes pointing to a specific router hostname in common, they were able to have someone determine that - yes - that line card that was part of the hostname apparently, was faulty and causing the issue. These were all DSL circuits, all in a nearby geographical region. That was quite an interesting diagnosis, going back and forth with VZ until I could throw the commonalities at them. And the only noticeable problem (that we ever knew of) was failure of youtube videos to play properly. On the surface, without that, everything seemed 100% okay.
participants (15)
- Colin Constable
- Dorn Hetzel
- Doron Beit-Halahmi
- Gary Sparkes
- Giorgio Bonfiglio
- Jay Acuna
- Jay R. Ashworth
- Matthew Petach
- Mike Hammett
- nanog@immibis.com
- Pedro Prado
- Riley O
- Saku Ytti
- Tom Beecher
- Tom Smyth