Fwd: Internet operations during pandemics
Did other folk on nanog-l see the nlnog-l note copied here?

I wonder how folk are planning for things (noted in the slides):

  o supply chain for parts/equipment
    Wait, I can't get me a new shiny shipped because what??

  o ongoing rollout of new equipment
    I'm deploying next week in KIX, I'm currently in LAX, how do I get there?
    equipment arrives.. in between...oops!

  o noc/etc support staff
    omg.. wait, I can't have my noc staff in the same room?
    our 'wfh' solution is ... wait, where is that?
    how do i get their phone queue sent to them? omg :( <sadness!>

  o services capacity crunches
    I love my shiny new dns service... wait, why is there a smoking hole where my dns servers were?

I think some of this has been discussed (shifts in peaks, leveling of peaks). Some hasn't really... I expect that at least sharing some 'err, our WFH changed, now we do: X, Y, Z and use M to get N solved' could be super cool to discuss/share and iterate for better solutions for all of our users.

thoughts? :)

thanks!
-chris
(note all the hard work in this message is not mine... thanks Job!)

---------- Forwarded message ---------
From: Job Snijders <job@ntt.net>
Date: Wed, Mar 18, 2020 at 6:02 PM
Subject: Internet operations during pandemics
To: <nlnog@nlnog.net>

Dear all,

I threw together a slidedeck today on the potential impact and second order effects of COVID-19 on Internet network operations.

http://instituut.net/~job/netops_during_pandemics.pdf

I hope we together over time can add and extend projections in the deck on what will happen and how we can mitigate the negative effects on Internet operations.

We have to answer questions such as:

1) What problems already exist today because of a few weeks of C19?
2) What problems are still coming? Will those be localized or global?
3) What possible workarounds can we plan for those problems?

I would appreciate feedback, comments, corrections or whatever you want to tell me. None of us have been in this situation before, so my guess is as good as yours.

Kind regards,

Job
In my past it has always benefited me to set use cases and plan accordingly. For many it is difficult to imagine these less than awesome use cases. Having worked to get datacentres back online in 8.8 earthquakes and dealt with fires in co-location sites, I know it is hard.

1. Document
2. Generate use cases, DR plans, OOBM, document peers' phone numbers offline
3. Implement, share, discuss
4. Profit

While this sounds ideal and simple it is not a small effort. I have two talks I must finish up, where one is on *Organizations as Code* and how to survive the worst.

On Wed, Mar 18, 2020 at 5:25 PM Christopher Morrow <morrowc.lists@gmail.com> wrote:
-- - Andrew "lathama" Latham -
Many years ago (1990s) I worked for a startup in NYC. We had a conference room called "Conference Room S"; this was a semi-reserved loft area in the Starbucks across the street, with an open tab[0].

The leads for each group had the project to design the DR / BCP plans for the entire organization, and so we had a daily, 1 hour meeting in Conf Room S. Instead of actually working on the DR plan, we used the time to get other work done - it was quiet, there was wifi, there were no interruptions, there was free coffee...

After a few months the CTO asked us to finish up and give him the plans... so, we wrote:

Step 1: Panic !!!
Step 2: Sell stock options (if any...)[1]
Step 3: Post resume on Monster.com
Step 4: ...

We printed up a bunch of copies of this, put it in an envelope, labeled it as "DR plan - open in case of disaster" and gave it to the CTO -- we fully expected him to open it, shout at us for a bit and / or chuckle resignedly, and then demand we actually do something useful - but, instead, something much much worse occurred... he thanked us, and locked it, unopened, in his filing cabinet. We blinked and asked him if he was going to read it, and he said "No, I trust you to have done a good job..."

We felt *really* bad, and worked late over the next few weeks and weekends to actually make a good BCP/DR plan, and then confessed our sins. We also ran table-tops, distributed and tested the plans, etc.

I've always wondered whether the CTO somehow knew what we'd been up to - because of our guilt, the quality / comprehensiveness of the plan ended up much better than it would have otherwise...

W

[0]: We thought that we were super cool for this...
[1]: This was an ongoing joke - the company was always "almost ready" to go public...

On Thu, Mar 19, 2020 at 8:35 AM Andrew Latham <lathama@gmail.com> wrote:
-- I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants. ---maf
On Wed, Mar 18, 2020 at 6:23 PM Christopher Morrow <morrowc.lists@gmail.com> wrote:
replying to myself, for one example of impact with some numbers:
https://www.pornhub.com/insights/corona-virus

Note that basically across the board there is a 20% uplift in serving traffic mid-day. I imagine that netflix/hulu/etc all have similar sorts of changes. That translates downstream to some extent as well, depending, I expect, upon how well / where the cache for this data is placed.

It occurred to me in another conversation that a bunch of the 'internet business' has worked for the last 10+ years to push 'content' as close to the user as possible. This likely relieves long-haul or interconnect links at the (not really fixable) cost of increased capacity demands on the last-mile links.

During this time, however, 'work from home' technology hasn't really progressed along the same path, has it? So, "get to the vpn" is still largely a process of getting packets across the wide internet and to small locations (your enterprise). There's little relief in sight for that model :(
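For anyone wanting to sanity-check what a 20% uplift means for a given link, here is a minimal sketch of the arithmetic. Only the 20% figure comes from the post linked above; the function names, the 70% starting utilization and the 90% ceiling are hypothetical values chosen for illustration. The uplift was reported for mid-day, which may not coincide with an evening peak, so treat this as a rough bound rather than a forecast.

```python
def utilization_after_uplift(current_util: float, uplift: float) -> float:
    """Utilization (fraction of link capacity) after traffic grows by `uplift`.

    `uplift` is a fraction, e.g. 0.20 for the 20% figure quoted above.
    """
    return current_util * (1.0 + uplift)


def max_utilization_before_uplift(ceiling: float, uplift: float) -> float:
    """Highest current utilization that still stays at or under `ceiling`
    once traffic has grown by `uplift`."""
    return ceiling / (1.0 + uplift)


if __name__ == "__main__":
    # Hypothetical link running at 70% before the uplift.
    after = utilization_after_uplift(0.70, 0.20)
    print(f"utilization after uplift: {after:.0%}")

    # Hypothetical policy: keep links at or below 90% utilization.
    limit = max_utilization_before_uplift(0.90, 0.20)
    print(f"links above {limit:.0%} today would exceed the ceiling")
```

The same two functions work for last-mile aggregation links or VPN concentrator uplinks, as long as the growth figure plugged in is one actually measured on that link.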
On 3/19/20 9:51 AM, Christopher Morrow wrote:
During this time, however, 'work from home' technology hasn't really progressed along the same path, has it? So, "get to the vpn" is still largely a process of getting packets across the wide internet and to small locations (your enterprise), there's little relief in sight for that model :(
IMO that's where local peering comes in, but the big ISPs like AT&T and Charter/Spectrum (the two national providers in my area) are loath to peer anywhere except a few big central locations, if at all. It's not a technical problem (i.e. Charter has a 10% utilized 10Ge and unused 1Ge switch trunks in my facility as custs cancel due to he.net moving in), it's a policy problem.

So we end up with setups like colo customers not using Charter at the colo because they can get better pricing options. Then suddenly they have remote workers on high latency cable connections at home, since for that home cable connection to talk to the colo server, traffic has to take some crazy long out of state boomerang path that a simple peering connection would solve.
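To put a rough number on the boomerang path described above, here is a small sketch that estimates propagation delay only. It assumes light travels through fiber at roughly 200 km per millisecond (about two thirds of c). The function names and the path lengths (20 km for a local peering path, 1500 km each way to an out-of-state exchange point) are hypothetical, not measurements from the thread, and the result ignores queuing, serialization and cable access-network latency, all of which add to it.

```python
# Approximate propagation speed of light in optical fiber (about 2/3 of c).
FIBER_KM_PER_MS = 200.0


def propagation_rtt_ms(one_way_path_km: float) -> float:
    """Round-trip propagation delay in milliseconds for a one-way fiber path length."""
    return 2.0 * one_way_path_km / FIBER_KM_PER_MS


def boomerang_penalty_ms(local_km: float, boomerang_km: float) -> float:
    """Extra RTT paid when traffic takes the long path instead of the local one."""
    return propagation_rtt_ms(boomerang_km) - propagation_rtt_ms(local_km)


if __name__ == "__main__":
    local_km = 20.0        # hypothetical: home to colo via a local peering point
    boomerang_km = 1500.0  # hypothetical: home to colo via an out-of-state exchange

    print(f"local RTT:     {propagation_rtt_ms(local_km):.1f} ms")
    print(f"boomerang RTT: {propagation_rtt_ms(boomerang_km):.1f} ms")
    print(f"penalty:       {boomerang_penalty_ms(local_km, boomerang_km):.1f} ms")
```

Measured RTTs from something like traceroute or mtr will be higher than these floor values; the point is only that the distance alone sets a lower bound that no amount of bandwidth fixes.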
On Thu, Mar 19, 2020 at 1:47 PM Seth Mattinen <sethm@rollernet.us> wrote:
IMO that's where local peering comes in, but the big ISPs like AT&T and Charter/Spectrum (the two national providers in my area) are loath to peer anywhere except a few big central locations, if at all. It's not a
peer or transit? or did you mean crossing between att/comcast? (assume they are settlement-free peers, not customer/transit)
technical problem (i.e. Charter has a 10% utilized 10Ge and unused 1Ge switch trunks in my facility as custs cancel due to he.net moving in), it's a policy problem.
I expect charter (in your example) would happily sign you up to a 1g or 10g port that's vacated there, right? the difference/question is about 'settlement free' or 'less than standard transit' access?
So we end up with setups like colo customers not using Charter at the colo because they can get better pricing options, then suddenly they have remote workers on high latency cable connections at home since for that home cable connection to talk to the colo server traffic has to take some crazy long out of state boomerang path that a simple peering connection would solve.
yea, this is exactly the sort of problem I was thinking about... I wonder if enterprises pulling their VPN from 'on prem' to 'deploy in "equinix" (pick your xerox copy of same)' with a private network backhaul to their prem(s) might actually make things better? Might that allow them to deploy more servers more easily? (ship to "equinix" ask remote hands to deploy...) That and some reasonable answer for 'connect to the IX, get some local peering to networks where your employees are...' etc.
participants (4)
- Andrew Latham
- Christopher Morrow
- Seth Mattinen
- Warren Kumari