iOS 7 seemed to be sent to everyone at once, causing large spikes and saturating many links for smaller ISPs.

I believe after that Apple moved to more of a staggered distribution, though I could be wrong. Maybe it was just that 7 was so vastly different that everyone was itching to try it.




Sent from my iPhone

On Apr 1, 2021, at 2:02 PM, Jean St-Laurent via NANOG <nanog@nanog.org> wrote:


I remember working for a big ISP in Europe offering cable TV + internet with 20M+ subscribers.

Every time there was a huge power outage in a major city, all the TVs would go off at the same time. I don't have stats comparing power grid stability in Europe vs. North America.

The problem was that when the power came back in big cities, all the TV subscribers would come back online at the exact same second or minute, more or less within the same 2 or 3 minutes.

What happened is that it would create a kind of internal DDoS: all the boxes would time out and show a weird error message. Something very useful like "Error Code 0x8098808. Please call our support line at this phone number."

The server sysadmins would panic because all the systems were overloaded. They often needed to work overtime because a DB crashed here, key servers crashed there, whatever... there was always something crashing. This was before the cloud, when you could just push a slider and have tons of VMs or containers to absorb the load in real time (in my dreams).

Every time, this created frustration for the clients, the help desk, the support teams and also upper management, and every time the teams were really tired afterwards. It drained everyone.

Anyway, after some years of internal discussion (red tape), we finally managed to add a random artificial penalty to the set-top boxes when they boot after a power outage. Nothing like 20 minutes, just enough to spread the load over a longer period of time. It was transparent for the end user: if the set-top box booted in 206 seconds instead of the super aggressive 34 seconds, well, it booted and they could watch TV,

vs.

"My system is totally frozen, it's been like that for 20 minutes with weird messages, because all your systems are down and the error message said to call the help desk."

This simple change, 3 lines of code adding a random artificial boot penalty of a few seconds, completely solved the problem. This way, when a city blacked out, we wouldn't self-DDoS, because the systems would slowly ramp up. The set-top boxes would all reboot, but wait a random amount of time before asking for the DRM package to unlock the cable TV service and validating that the billing was right.
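A minimal sketch of what that random penalty can look like (this is only an illustration in C, not the actual firmware; request_drm_package() and validate_billing() are placeholder names for the real back-end calls):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    /* Placeholder back-end calls, stubbed out so the sketch compiles. */
    static void request_drm_package(void) { puts("requesting DRM package"); }
    static void validate_billing(void)    { puts("validating billing"); }

    int main(void)
    {
        /* On a real box you would seed from something per-device
         * (MAC address, serial number), otherwise every box booting
         * at the same second could pick the same "random" delay. */
        srand((unsigned)time(NULL) ^ (unsigned)getpid());

        /* Wait 0-179 extra seconds so a city-wide power restore spreads
         * the back-end requests over ~3 minutes instead of one second. */
        sleep((unsigned)(rand() % 180));

        request_drm_package();   /* unlock the cable TV service */
        validate_billing();      /* check that the subscription is current */
        return 0;
    }

The only design decision that matters is that the delay is random per box, so the requests arrive spread out instead of synchronized.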

I'm no Call of Duty or Akamai expert, but I've seen the same question come up here many times:

What's happening?
Call of Duty!
Okay.

Would a kind of throttle help here?

An artificial rollout penalty somehow? Probably not at the ISP level, but more at the game level. Well, ISPs could also have some mechanisms to reduce the impact, or Akamai could even force a progressive rollout.

I'm not sure the proposed solutions would work, but this seems to impact NANOG members frequently, or at least to generate a call overnight or on a weekend. It also seems to happen just before long holidays, when operations teams are sometimes running on reduced personnel.

Are big game rollouts really impacting NANOG members, or is it more of a "Hey, I was curious what happened and thought I'd ask here on NANOG"?

#JustCurious

Jean

-----Original Message-----
From: NANOG <nanog-bounces+jean=ddostest.me@nanog.org> On Behalf Of aaron1@gvtc.com
Sent: April 1, 2021 12:12 PM
To: 'Jared Mauch' <jared@puck.nether.net>; 'Töma Gavrichenkov' <ximaera@gmail.com>
Cc: 'NANOG' <nanog@nanog.org>
Subject: RE: wow, lots of akamai

Gaming update... I had a feeling. Thanks for the feedback, folks.

Thanks, Jared. It's running well before, during and after. We have a lot of capacity there.

-Aaron