Facebook post-mortems...
Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr

Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/

FYI,
/John
The FB one seems to be from a previous event. Downtime doesn't match, visible flaw effects don't either.

Rubens
Per the comments, the linked Facebook outage was from around 5/15/21.
On 4 Oct 2021, at 8:58 PM, jcurran@istaff.org wrote:

Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr

My bad - might be best to ignore the above post, as it is an unconfirmed/undated post-mortem that may reference a different event.

Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/

The Cloudflare writeup looks quite solid (loss of network -> no DNS servers -> major issue).

/John
On 10/4/21 6:07 PM, jcurran@istaff.org wrote:

My bad - might be best to ignore the above post, as it is an unconfirmed/undated post-mortem that may reference a different event.

One of the replies says it's from February, so yeah.

Mike
If I'm reading the source correctly, the timestamp inside is for 09 SEP 2021 14:22:49 GMT (Unix time 1631197369). Then again, I may not be reading it correctly. :) -- Rabbi Rob Thomas Team Cymru "It is easy to believe in freedom of speech for those with whom we agree." - Leo McKern
On 10/4/21 5:58 PM, jcurran@istaff.org wrote:

Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr
They have a monkey patch subsystem. Lol. Mike
On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <mike@mtcc.com> wrote:
They have a monkey patch subsystem. Lol.
Yes, actually, they do. They use Chef extensively to configure operating systems. Chef is written in Ruby. Ruby has something called Monkey Patches. This is where at an arbitrary location in the code you re-open an object defined elsewhere and change its methods.

Chef doesn't always do the right thing. You tell Chef to remove an RPM and it does. Even if it has to remove half the operating system to satisfy the dependencies. If you want it to do something reasonable, say throw an error because you didn't actually tell it to remove half the operating system, you have a choice: spin up a fork of chef with a couple patches to the chef-rpm interaction or just monkey-patch it in one of your chef recipes.

Regards,
Bill Herrin

-- 
William Herrin
bill@herrin.us
https://bill.herrin.us/
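For readers who haven't bumped into the term: "monkey patching" just means redefining a method on an existing class at runtime, from outside the code that defines it. Chef recipes do this in Ruby by re-opening the class; the short Python sketch below shows the same pattern with made-up class and method names, purely as an illustration of the idea - it is not Chef's actual API.

# Minimal illustration of monkey patching: rebind a method on an
# existing class at runtime. Class/method names are hypothetical,
# not anything from Chef.

class PackageManager:
    def remove(self, name, deps):
        # Original behaviour: remove the package plus whatever
        # dependent removals come with it, no questions asked.
        print(f"removing {name} and {len(deps)} dependent packages")

def cautious_remove(self, name, deps):
    # Patched behaviour: refuse surprising dependency cascades.
    if len(deps) > 5:
        raise RuntimeError(f"refusing to remove {name}: {len(deps)} dependents")
    print(f"removing {name}")

# The "patch": reassign the method on the class, from anywhere in the code.
PackageManager.remove = cautious_remove

pm = PackageManager()
pm.remove("small-pkg", deps=["a", "b"])   # allowed
try:
    pm.remove("glibc", deps=[str(i) for i in range(40)])   # now blocked by the patch
except RuntimeError as e:
    print(e)

The trade-off is that the patched behaviour applies everywhere the class is used, not just in your own recipe, which is what makes the technique both convenient and dangerous.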
129.134.30.0/23, 129.134.30.0/24, 129.134.31.0/24. The specific routes covering all 4 nameservers (a-d) were withdrawn from all FB peering at approximately 15:40 UTC. Cheers, Jeff
Maybe withdrawing those routes to their NS could have been mitigated by having NS in separate entities.

Let's check how these big companies are spreading their NS's.

$ dig +short facebook.com NS
d.ns.facebook.com.
b.ns.facebook.com.
c.ns.facebook.com.
a.ns.facebook.com.

$ dig +short google.com NS
ns1.google.com.
ns4.google.com.
ns2.google.com.
ns3.google.com.

$ dig +short apple.com NS
a.ns.apple.com.
b.ns.apple.com.
c.ns.apple.com.
d.ns.apple.com.

$ dig +short amazon.com NS
ns4.p31.dynect.net.
ns3.p31.dynect.net.
ns1.p31.dynect.net.
ns2.p31.dynect.net.
pdns6.ultradns.co.uk.
pdns1.ultradns.net.

$ dig +short netflix.com NS
ns-1372.awsdns-43.org.
ns-1984.awsdns-56.co.uk.
ns-659.awsdns-18.net.
ns-81.awsdns-10.com.

Amazon and Netflix seem to not keep their eggs in the same basket. From a first look, they seem more resilient than facebook.com, google.com and apple.com.

Jean
On 10/5/21 14:08, Jean St-Laurent via NANOG wrote:
Maybe withdrawing those routes to their NS could have been mitigated by having NS in separate entities.
Well, doesn't really matter if you can resolve the A/AAAA/MX records, but you can't connect to the network that is hosting the services.

At any rate, having 3rd party DNS hosting for your domain is always a good thing to have. But in reality, it only hits the spot if the service is also available on a 3rd party network, otherwise, you keep DNS up, but get no service.

Mark.
On Tue, Oct 05, 2021 at 02:22:09PM +0200, Mark Tinka wrote:
Well, doesn't really matter if you can resolve the A/AAAA/MX records, but you can't connect to the network that is hosting the services.
At any rate, having 3rd party DNS hosting for your domain is always a good thing to have. But in reality, it only hits the spot if the service is also available on a 3rd party network, otherwise, you keep DNS up, but get no service.
That's not quite true. It still gives a much better clue as to what is going on; if a host resolves to an IP but isn't pingable/traceroutable, that is something that many more techy people will understand than if the domain is simply unresolvable. Not everyone has the skill set and knowledge of DNS to understand how to track down what nameservers Facebook is supposed to have, and how to debug names not resolving. There are lots of helpdesk people who are not expert in every topic.

Having DNS doesn't magically get you service back, of course, but it leaves a better story behind than simply vanishing from the network.

... JG

-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"The strain of anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that 'my ignorance is just as good as your knowledge.'" - Asimov
On 10/5/21 14:52, Joe Greco wrote:
Having DNS doesn't magically get you service back, of course, but it leaves a better story behind than simply vanishing from the network.
That's great for you and me, who believe in and like troubleshooting. Jane and Thando, who just want their Instagram timeline feed, couldn't care less about DNS working while network access is down. To them, it's broken, despite your state-of-the-art global DNS architecture.

I'm also yet to find any DNS operator whose #1 priority for deploying 3rd party resiliency is to give other random network operators in the wild troubleshooting joy :-).

On the real though, I'm all for as much useful redundancy as we can get away with. But given just how much we rely on the web for basic life these days, we need to do better about making actual services as resilient as we can (and have) the DNS.

Mark.
On Tue, Oct 05, 2021 at 02:57:42PM +0200, Mark Tinka wrote:
Jane and Thando who just want their Instagram timeline feed couldn't care less about DNS working but network access is down. To them, it's broken, despite your state-of-the-art global DNS architecture.
You don't think at least 10,000 helpdesk requests about Facebook being down were sent yesterday?

There's something to be said for building these things to be resilient in a manner that isn't just convenient internally, but also externally to those people that network operators sometimes forget also support their network issues indirectly.

... JG
On 10/5/21 15:04, Joe Greco wrote:
You don't think at least 10,000 helpdesk requests about Facebook being down were sent yesterday?
That, and Jane + Thando likely re-installing all their apps and iOS/Android on their phones, and rebooting them 300 times in the hope that Facebook and WhatsApp would work.

Yes, total nightmare yesterday, but I'm sure that 9,999 of the helpdesk tickets had nothing to do with DNS. They likely all were - "Your Internet is down, just fix it; we don't wanna know".
There's something to be said for building these things to be resilient in a manner that isn't just convenient internally, but also externally to those people that network operators sometimes forget also support their network issues indirectly.
I don't disagree with you one bit. It's for that exact reason that we built:

https://as37100.net/

... not for us, but specifically for other random network operators around the world whom we may never get to drink a crate of wine with.

I have to say that it has likely cut e-mails to our NOC as well as overall pain in half, if not more.

Mark.
On 10/5/21 15:40, Mark Tinka wrote:
I don't disagree with you one bit. It's for that exact reason that we built:
... not for us, but specifically for other random network operators around the world whom we may never get to drink a crate of wine with.
I have to say that it has likely cut e-mails to our NOC as well as overall pain in half, if not more.
What I forgot to add, however, is that unlike Facebook, we aren't a major content provider. So we don't have a need to parallel our DNS resiliency with our service resiliency, in terms of 3rd party infrastructure. If our network were to melt, we'd already be getting it from our eyeballs.

If we had content of note that was useful to, say, a handful-billion people around the world, we'd give some thought - however complex - to having critical services running on 3rd party infrastructure.

Mark.
On Tue, Oct 5, 2021 at 9:56 AM Mark Tinka <mark@tinka.africa> wrote:
On 10/5/21 15:40, Mark Tinka wrote:
I don't disagree with you one bit. It's for that exact reason that we built:
... not for us, but specifically for other random network operators around the world whom we may never get to drink a crate of wine with.
Can someone explain to me, preferably in baby words, why so many providers view information like https://as37100.net/?bgp as secret/proprietary? I've interacted with numerous providers who require an NDA or pinky-swear to get a list of their communities -- is this really just 1: security through obscurity, 2: an artifact of the culture of not sharing, 3: an attempt to seem cool by making you jump through hoops to prove your worthiness, 4: some weird 'mah competitors won't be able to figure out my secret sauce without knowing that 17 means Asia', or 5: something else?

Yes, some providers do publish these (usually on the website equivalent of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'), and PeeringDB has definitely helped, but I still don't understand many providers' stance on this...

W
-- The computing scientist’s main challenge is not to get confused by the complexities of his own making. -- E. W. Dijkstra
On 10/5/21 09:49, Warren Kumari wrote:
Can someone explain to me, preferably in baby words, why so many providers view information like https://as37100.net/?bgp as secret/proprietary?
Not sure the rationale of keeping them secret, but at least one aggregated source of dozens of them exists and has been around for a long time.

https://onestep.net/communities/

-- 
Jay Hennigan - jay@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV
There are also a bunch at http://bgp.community (linked to the source where possible instead of keeping a stale copy).
Can someone explain to me, preferably in baby words, why so many providers view information like https://as37100.net/?bgp as secret/proprietary?
it shows we're important
On Tue, Oct 05, 2021 at 03:40:39PM +0200, Mark Tinka wrote:
Yes, total nightmare yesterday, but sure that 9,999 of the helpdesk tickets had nothing to do with DNS. They likely all were - "Your Internet is down, just fix it; we don't wanna know".
Unrealistic user expectations are not the point. Users can demand whatever unrealistic claptrap they wish to.

The point is that there are a lot of helpdesk staff at a lot of organizations who are responsible for responding to these issues. When Facebook or Microsoft or Amazon take a dump, you get a storm of requests. This is a storm of requests not just to one helpdesk, but to MANY helpdesks, across a wide number of organizations, and this means that you have thousands of people trying to investigate what has happened. It is very common for large companies to forget (or not care) that their technical failures impact not just their users, but also external support organizations.

I totally get your disdain and indifference towards end users in these instances; for the average end user, yes, it indeed makes no difference if DNS works or not. However, some of those end users do have a point of contact up the chain. This could be their ISP support, or a company helpdesk, and most of these are tasked with taking an issue like this to some sort of resolution. What I'm talking about here is that it is easier to debug and make a determination that there is an IP connectivity issue when DNS works. If DNS isn't working, then you get into a bunch of stuff where you need to do things like determine if maybe it is some sort of DNSSEC issue, or other arcane and obscure issues, which tends to be beyond what front line helpdesk is capable of.

These issues often cost companies real time and money to figure out. It is unlikely that Facebook is going to compensate them for this, so this brings me back around to the point that it's preferable to have DNS working when you have a BGP problem, because this is ultimately easier for people to test and reach a reasonable determination that the problem is on Facebook's side quickly and easily.

... JG
On 10/5/21 16:49, Joe Greco wrote:
Unrealistic user expectations are not the point. Users can demand whatever unrealistic claptrap they wish to.
The user's expectations, today, are always going to be unrealistic, especially when they are able to enjoy a half-decent service free-of-charge. The bar has moved. Nothing we can do about it but adapt.
The point is that there are a lot of helpdesk staff at a lot of organizations who are responsible for responding to these issues. When Facebook or Microsoft or Amazon take a dump, you get a storm of requests. This is a storm of requests not just to one helpdesk, but to MANY helpdesks, across a wide number of organizations, and this means that you have thousands of people trying to investigate what has happened.
We are in agreement. And it's no coincidence that the Facebook's of the world rely almost 100% on non-human contact to give their users support. So that leaves us, infrastructure, in the firing line to pick up the slack for a lack of warm-body access to BigContent.
It is very common for large companies to forget (or not care) that their technical failures impact not just their users, but also external support organizations.
Not just large companies, but I believe all companies... and worse, not at ground level where folk on lists like these tend to keep in touch, but higher up where the money decisions are made, and where caring about your footprint on other Internet settlers whom you may never meet actually matters.

You and I can bash our heads till they come home, but if the folk that need to say "Yes" to the $$$ needed to help external parties troubleshoot better don't get it, then perhaps starting a NOG or some such is our best bet.
I totally get your disdain and indifference towards end users in these instances; for the average end user, yes, it indeed makes no difference if DNS works or not.
On the contrary, I looooooove customers. I wasn't into them, say, 12 years ago, but since I began to understand that users will respond to empathy and value, I fell in love with them. They drive my entire thought-process and decision-making.

This is why I keep saying, "Users don't care about how we build the Internet", and they shouldn't. And I support that.

BigContent get it, and for better or worse, they are the ones who've set the bar higher than what most network operators are happy with. Infrastructure still doesn't get it, and we are seeing the effects of that play out around the world, with the recent SK Broadband/Netflix debacle being the latest barbershop gossip.
However, some of those end users do have a point of contact up the chain. This could be their ISP support, or a company helpdesk, and most of these are tasked with taking an issue like this to some sort of resolution. What I'm talking about here is that it is easier to debug and make a determination that there is an IP connectivity issue when DNS works. If DNS isn't working, then you get into a bunch of stuff where you need to do things like determine if maybe it is some sort of DNSSEC issue, or other arcane and obscure issues, which tends to be beyond what front line helpdesk is capable of.
We are in agreement.
These issues often cost companies real time and money to figure out. It is unlikely that Facebook is going to compensate them for this, so this brings me back around to the point that it's preferable to have DNS working when you have a BGP problem, because this is ultimately easier for people to test and reach a reasonable determination that the problem is on Facebook's side quickly and easily.
We are in agreement. So let's see if Facebook can fix the scope of their DNS architecture, and whether others can learn from it.

I know I have... even though we provide friendly secondary for a bunch of folk we are friends with, we haven't done the same for our own networks... all our stuff sits on just our network - granted in many different countries, but still, one AS.

It's been nagging at the back of my mind for yonks, but yesterday was the nudge I needed to get this organized; so off I go.

Mark.
If your NS are in 2 separate entities, you could still resolve your MX/A/AAAA/NS records. Look how Amazon is doing it:

$ dig +short amazon.com NS
ns4.p31.dynect.net.
ns3.p31.dynect.net.
ns1.p31.dynect.net.
ns2.p31.dynect.net.
pdns6.ultradns.co.uk.
pdns1.ultradns.net.

They use Dyn DNS from Oracle and UltraDNS - 2 very strong networks of anycast DNS servers. Amazon would not have been impacted like Facebook yesterday, unless UltraDNS and Oracle have their DNS servers hosted on Amazon infra? I doubt that Oracle has DNS hosted in Amazon, but it's possible.

Probably the management overhead of using 2 different entities for DNS is not financially viable?

Jean
On 10/5/21 14:58, Jean St-Laurent wrote:
If your NS are in 2 separate entities, you could still resolve your MX/A/AAAA/NS.
So I'm not worried about DNS stability when split across multiple physical entities. I'm talking about the actual services being hosted on a single network that goes bye-bye like what we saw yesterday.

All the DNS resolution means diddly, even if it tells us that DNS is not the issue.

Mark.
Mark Tinka wrote:
So I'm not worried about DNS stability when split across multiple physical entities.
You could put up a temp page or two. Like, the Internet is not down, we are just having a bad day. Bear with us for a bit. Go outside and enjoy nature for the next few hours.

But more importantly: internal infrastructure domains, containing router names, bootstraps, tools, utilities, physical access control, config repositories, network documentation, oob-network names (who remembers those?), oob e-mail, oob communications (messenger, conferences, voip), etc. It doesn't even have to be globally registered. An external DNS server in the resolver list of all tech laptops, slaving the zone. Rapid response requires certain amenities, or as we can see, you're talking about hours just getting started.

Also, the oob-network needs to be used regularly or it will be essentially unusable when actually needed, due to bit rot (accumulation of unnoticed and unresolved issues) and lack of mind muscle memory. It should be standard practice to deploy all new equipment from the oob-network servicing it. Install things how you want to be able to repair them.

Joe
On Tue, Oct 5, 2021 at 5:44 AM Mark Tinka <mark@tinka.africa> wrote:
On 10/5/21 14:08, Jean St-Laurent via NANOG wrote:
Maybe withdrawing those routes to their NS could have been mitigated by having NS in separate entities.
Well, doesn't really matter if you can resolve the A/AAAA/MX records, but you can't connect to the network that is hosting the services.
Disagree for two reasons:

1. If you have some DNS working, you can point it at a static “we are down and we know it” page much sooner.

2. If you have convinced the entire world to install tracking pixels on their web pages that all need your IP address, it is rude to the rest of the world’s DNS to not be able to always provide a prompt (and cacheable) response.
On 10/5/21 16:59, Matthew Kaufman wrote:
Disagree for two reasons:
1. If you have some DNS working, you can point it at a static “we are down and we know it” page much sooner.
Isn't that what Twirra is for, nowadays :-)...
2. If you have convinced the entire world to install tracking pixels on their web pages that all need your IP address, it is rude to the rest of the world’s DNS to not be able to always provide a prompt (and cacheable) response.
Agreed, but I know many an exec that signs the capex cheques who may find "rude" not a noteworthy discussion point when we submit the budget. Not saying I think being rude is cool, but there is a reason we are here, now, today. Mark.
1. If you have some DNS working, you can point it at a static “we are down and we know it” page much sooner.

2. Good catch, and you’re right that it would have reduced the planetary impact: fewer calls to the help-desk and fewer reboots of devices. It would have given visibility on what’s happening.

It seems that to be really resilient in today’s world, a business needs their NS in at least 2 different entities, like amazon.com is doing.

Jean
On Oct 5, 2021, at 10:32 AM, Jean St-Laurent via NANOG <nanog@nanog.org> wrote:
If you have some DNS working, you can point it at a static “we are down and we know it” page much sooner,
At the scale of Facebook, that seems extremely difficult to pull off w/o most of their architecture online. Imagine trying to terminate a billion-plus sessions. When they started to come back up and had their "We're sorry" page up - even their static png couldn't make it onto the wire.
Maybe withdrawing those routes to their NS could have been mitigated by having NS in separate entities.
Assuming they had such a thing in place, it would not have helped.

Facebook stopped announcing the vast majority of their IP space to the DFZ during this. So even if they did have an offnet DNS server that could have provided answers to clients, those same clients probably wouldn't have been able to connect to the IPs returned anyways.

If you are running your own auths like they are, you likely view your public network reachability as almost bulletproof, and assume that it will never disappear. Which is probably true most of the time. Until yesterday happens and the 9's in your reliability percentage change to 7's.
Facebook stopped announcing the vast majority of their IP space to the DFZ during this.

This is where I would like to learn more about the outage. Direct peering FB connections saw a drop of about a dozen networks, and one of the networks covered their C and D nameservers, but the block for the A and B nameservers remained advertised - simply not responsive. I imagine the dropped blocks could have prevented internal responses, but am surprised, from the perspective I have, that all of these issues would stem from that.
* telescope40@gmail.com (Lou D) [Tue 05 Oct 2021, 15:12 CEST]:
Facebook stopped announcing the vast majority of their IP space to the DFZ during this.
People keep repeating this but I don't think it's true.

It's probably based on this tweet: https://twitter.com/ryan505/status/1445118376339140618

but that's an aggregate adding up prefix counts from many sessions. The total number of hosts covered by those announcements didn't vary by nearly as much, since to a significant extent it was more specifics (/24) of larger prefixes (e.g. /17) that disappeared, while those /17s stayed.

(There were no covering prefixes for WhatsApp's NS addresses, so those were completely unreachable from the DFZ.)

-- Niels.
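To make the distinction concrete: a session's prefix count can crater while the address space actually covered barely moves, as long as the covering aggregates stay in the table. A small Python sketch using only the standard library - the prefix lists here are invented for illustration, not a snapshot of what AS32934 announced:

import ipaddress

def covered_addresses(prefixes):
    # Total addresses covered once overlapping prefixes are collapsed.
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))

# Illustrative only: a /17 plus ten of its own /24 more-specifics...
before = ["129.134.0.0/17"] + [f"129.134.{i}.0/24" for i in range(25, 35)]
# ...versus the same /17 with the more-specifics withdrawn.
after = ["129.134.0.0/17"]

print(len(before), covered_addresses(before))   # 11 prefixes, 32768 addresses
print(len(after), covered_addresses(after))     #  1 prefix,  32768 addresses

Eleven prefixes collapse to one, but the covered space is unchanged - which is why prefix-count graphs alone can overstate how much address space actually disappeared from the DFZ.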
People keep repeating this but I don't think it's true.
My comment is solely sourced on my direct observations on my network, maybe 30-45 minutes in. Everything except a few /24s disappeared from DFZ providers, but I still heard those prefixes from direct peerings. There was no disaggregation that I saw, just the big stuff gone. This was consistent over 5 continents from my viewpoints.

Others may have seen different things at different times. I do not run an eyeball so I had no need to continually monitor.
Does anyone have info whether this network 69.171.240.0/20 was reachable during the outage?

Jean
Niels, you are correct about my initial tweet, which I updated in later tweets to clarify, with a hat tip to Will Hargrave as thanks for seeking more detail.

Cheers,
Ryan
Ryan, thanks for sharing your data; it's unfortunate that it was seemingly misinterpreted by a few souls.
I agree that resolving a non-routable address doesn't bring you a working service. I thought a few networks were still reachable, like their MX or some DRP networks.

Thanks for the update.

Jean
As of now, their MX is hosted on 69.171.251.251. Was this network still announced yesterday in the DFZ during the outage?

69.171.224.0/19
69.171.240.0/20

Jean
Jean St-Laurent via NANOG <nanog@nanog.org> writes:
Let's check how these big companies are spreading their NS's.
Just to state the obvious: Names are irrelevant. Addresses are not. These names are just place holders for the glue in the parent zone anyway.

If you look behind the names you'll find that Apple spread their servers between two ASes. So they are not as vulnerable as Google and Facebook.

Bjørn
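For anyone who wants to repeat that check, the recipe is simply: resolve each NS name to its addresses, then look up the origin AS of each address (whois, bgp.tools, Team Cymru's IP-to-ASN service, and so on). A rough sketch of the first half in Python, standard library only; the NS names are copied from the dig output earlier in the thread, and the AS lookup is left as the manual second step:

import socket

# NS names taken from the dig output quoted earlier in the thread.
nameservers = {
    "facebook.com": ["a.ns.facebook.com", "b.ns.facebook.com",
                     "c.ns.facebook.com", "d.ns.facebook.com"],
    "apple.com": ["a.ns.apple.com", "b.ns.apple.com",
                  "c.ns.apple.com", "d.ns.apple.com"],
}

for zone, names in nameservers.items():
    print(zone)
    for name in names:
        try:
            # Unique A/AAAA addresses behind each NS name.
            addrs = sorted({ai[4][0] for ai in socket.getaddrinfo(name, 53)})
        except socket.gaierror as exc:
            addrs = [f"lookup failed: {exc}"]
        # Feed each address into whois or an IP-to-ASN service to see
        # whether the zone's servers really live in more than one AS.
        print(f"  {name}: {', '.join(addrs)}")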
On 5. Oct 2021, at 07:42, William Herrin <bill@herrin.us> wrote:
On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <mike@mtcc.com> wrote:
They have a monkey patch subsystem. Lol.
Yes, actually, they do. They use Chef extensively to configure operating systems. Chef is written in Ruby. Ruby has something called Monkey Patches.
While Ruby indeed has a chain-saw (read: powerful, dangerous, still the tool of choice in certain cases) in its toolkit that is generally called “monkey-patching”, I think Michael was actually thinking about the “chaos monkey”:

https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey
https://netflix.github.io/chaosmonkey/

That was a Netflix invention, but see also https://en.wikipedia.org/wiki/Chaos_engineering#Facebook_Storm

Grüße, Carsten
Carsten Bormann wrote:
While Ruby indeed has a chain-saw (read: powerful, dangerous, still the tool of choice in certain cases) in its toolkit that is generally called “monkey-patching”, I think Michael was actually thinking about the “chaos monkey”, https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey https://netflix.github.io/chaosmonkey/
That was a Netflix invention, but see also https://en.wikipedia.org/wiki/Chaos_engineering#Facebook_Storm
It seems to me that so-called chaos engineering assumes a cosmic internet environment; in the good old days, though, we were aware that the Internet is the source of chaos.

Masataka Ohta
On 10/5/21 12:17 AM, Carsten Bormann wrote:
... I think Michael was actually thinking about the “chaos monkey”.
No, chaos monkey is a purposeful thing to induce corner case errors so they can be fixed. The earlier outage involved a config sanitizer that screwed up and then pushed it out. I can't get my head around why anybody thought that was a good idea vs rejecting it and making somebody fix the config. Mike
Updated: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/ On Tue, Oct 5, 2021 at 1:26 PM Michael Thomas <mike@mtcc.com> wrote:
On 10/5/21 12:17 AM, Carsten Bormann wrote:
On 5. Oct 2021, at 07:42, William Herrin <bill@herrin.us> wrote:
On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <mike@mtcc.com> wrote:
They have a monkey patch subsystem. Lol. Yes, actually, they do. They use Chef extensively to configure operating systems. Chef is written in Ruby. Ruby has something called Monkey Patches. While Ruby indeed has a chain-saw (read: powerful, dangerous, still the tool of choice in certain cases) in its toolkit that is generally called “monkey-patching”, I think Michael was actually thinking about the “chaos monkey”, https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey https://netflix.github.io/chaosmonkey/
No, chaos monkey is a purposeful thing to induce corner case errors so they can be fixed. The earlier outage involved a config sanitizer that screwed up and then pushed it out. I can't get my head around why anybody thought that was a good idea vs rejecting it and making somebody fix the config.
Mike
-- Randy Monroe Network Engineering Uber <https://uber.com/>
This bit posted by Randy might get lost in the other thread, but it appears that their DNS withdraws BGP routes for prefixes that they can't reach or are flaky it seems. Apparently that goes for the prefixes that the name servers are on too. This caused internal outages too as it seems they use their front facing DNS just like everybody else. Sounds like they might consider having at least one split horizon server internally. Lots of fodder here. Mike On 10/5/21 11:11 AM, Randy Monroe wrote:
Updated: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/ <https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/>
On Tue, Oct 5, 2021 at 1:26 PM Michael Thomas <mike@mtcc.com <mailto:mike@mtcc.com>> wrote:
On 10/5/21 12:17 AM, Carsten Bormann wrote: > On 5. Oct 2021, at 07:42, William Herrin <bill@herrin.us <mailto:bill@herrin.us>> wrote: >> On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <mike@mtcc.com <mailto:mike@mtcc.com>> wrote: >>> They have a monkey patch subsystem. Lol. >> Yes, actually, they do. They use Chef extensively to configure >> operating systems. Chef is written in Ruby. Ruby has something called >> Monkey Patches. > While Ruby indeed has a chain-saw (read: powerful, dangerous, still the tool of choice in certain cases) in its toolkit that is generally called “monkey-patching”, I think Michael was actually thinking about the “chaos monkey”, > https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey <https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey> > https://netflix.github.io/chaosmonkey/ <https://netflix.github.io/chaosmonkey/>
No, chaos monkey is a purposeful thing to induce corner case errors so they can be fixed. The earlier outage involved a config sanitizer that screwed up and then pushed it out. I can't get my head around why anybody thought that was a good idea vs rejecting it and making somebody fix the config.
Mike
--
Randy Monroe
Network Engineering
Uber <https://uber.com/>
On 10/5/21 8:39 PM, Michael Thomas wrote:
This bit posted by Randy might get lost in the other thread, but it appears that their DNS withdraws BGP routes for prefixes that they can't reach or are flaky it seems. Apparently that goes for the prefixes that the name servers are on too. This caused internal outages too as it seems they use their front facing DNS just like everybody else.
Sounds like they might consider having at least one split horizon server internally. Lots of fodder here.
---------------------------------------------------------------------------- Move fast; break things? :) scott
On 10/5/21 5:51 AM, scott wrote:
On 10/5/21 8:39 PM, Michael Thomas wrote:
This bit posted by Randy might get lost in the other thread, but it appears that their DNS withdraws BGP routes for prefixes that they can't reach or are flaky it seems. Apparently that goes for the prefixes that the name servers are on too. This caused internal outages too as it seems they use their front facing DNS just like everybody else.
Sounds like they might consider having at least one split horizon server internally. Lots of fodder here.
even a POTS line connected to a modem connected to a serial port on a workstation in the data center so that you can talk to whatever you need to talk to. I would go so far as to have other outgoing serial connections to routers from that workstation. It's ugly, but it provides remote out-of-band disaster management. Just sayin'
----------------------------------------------------------------------------
Move fast; break things? :)
scott
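A minimal sketch of that last-ditch dial-in path, assuming a Hayes-compatible modem on a local serial port and pyserial installed (the device path and phone number are placeholders):

import serial  # pip install pyserial

# Dial the out-of-band modem at the isolated site and grab whatever the console prints.
with serial.Serial("/dev/ttyS0", 9600, timeout=15) as ser:
    ser.write(b"ATDT15555550100\r")        # Hayes dial command; placeholder number
    banner = ser.read(256)                 # hope for "CONNECT ..." then a login prompt
    print(banner.decode(errors="replace"))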
I probably still have my US Robotics 14.4 in the basement, but it's been awhile since I've had access to a POTS line it would work on ... :) pj capelli pjcapelli@pm.me "Never to get lost, is not living" - Rebecca Solnit Sent with ProtonMail Secure Email. ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Wednesday, October 6th, 2021 at 10:41 AM, Curtis Maurand <cmaurand@xyonet.com> wrote:
On 10/5/21 5:51 AM, scott wrote:
On 10/5/21 8:39 PM, Michael Thomas wrote:
This bit posted by Randy might get lost in the other thread, but it appears that their DNS withdraws BGP routes for prefixes that they can't reach or are flaky it seems. Apparently that goes for the prefixes that the name servers are on too. This caused internal outages too as it seems they use their front facing DNS just like everybody else.
Sounds like they might consider having at least one split horizon server internally. Lots of fodder here.
even a POTS line connected to a modem connected to a serial port on a workstation in the data enter so that you can talk to whatever you need to talk to. I would go so far as to have other outgoing serial connections to routers from that workstation. It's ugly, but it provides remote out of band disaster management. Just sayin'
----------------------------------------------------------------------------
Move fast; break things? :)
scott
It's a few years old, but Facebook has talked a little bit about their DNS infrastructure before. Here's a little clip that talks about Cartographer: https://youtu.be/bxhYNfFeVF4?t=2073 From their outage report, it sounds like their authoritative DNS servers withdraw their anycast announcements when they're unhealthy. The health check from those servers must have relied on something upstream. Maybe they couldn't talk to Cartographer for a few minutes so they thought they might be isolated from the rest of the network and they decided to withdraw their routes instead of serving stale data. Makes sense when a single node does it, not so much when the entire fleet thinks that they're out on their own. A performance issue in Cartographer (or whatever manages this fleet these days) could have been the ticking time bomb that set the whole thing in motion. On 10/5/21 3:39 PM, Michael Thomas wrote:
This bit posted by Randy might get lost in the other thread, but it appears that their DNS withdraws BGP routes for prefixes that they can't reach or are flaky it seems. Apparently that goes for the prefixes that the name servers are on too. This caused internal outages too as it seems they use their front facing DNS just like everybody else.
Sounds like they might consider having at least one split horizon server internally. Lots of fodder here.
Mike
On 10/5/21 11:11 AM, Randy Monroe wrote:
Updated: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
On Tue, Oct 5, 2021 at 1:26 PM Michael Thomas <mike@mtcc.com <mailto:mike@mtcc.com>> wrote:
On 10/5/21 12:17 AM, Carsten Bormann wrote: > On 5. Oct 2021, at 07:42, William Herrin <bill@herrin.us <mailto:bill@herrin.us>> wrote: >> On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <mike@mtcc.com <mailto:mike@mtcc.com>> wrote: >>> They have a monkey patch subsystem. Lol. >> Yes, actually, they do. They use Chef extensively to configure >> operating systems. Chef is written in Ruby. Ruby has something called >> Monkey Patches. > While Ruby indeed has a chain-saw (read: powerful, dangerous, still the tool of choice in certain cases) in its toolkit that is generally called “monkey-patching”, I think Michael was actually thinking about the “chaos monkey”, > https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey > https://netflix.github.io/chaosmonkey/
No, chaos monkey is a purposeful thing to induce corner case errors so they can be fixed. The earlier outage involved a config sanitizer that screwed up and then pushed it out. I can't get my head around why anybody thought that was a good idea vs rejecting it and making somebody fix the config.
Mike
--
Randy Monroe
Network Engineering
Uber <https://uber.com/>
On 10/5/21 3:09 PM, Andy Brezinsky wrote:
It's a few years old, but Facebook has talked a little bit about their DNS infrastructure before. Here's a little clip that talks about Cartographer: https://youtu.be/bxhYNfFeVF4?t=2073
From their outage report, it sounds like their authoritative DNS servers withdraw their anycast announcements when they're unhealthy. The health check from those servers must have relied on something upstream. Maybe they couldn't talk to Cartographer for a few minutes so they thought they might be isolated from the rest of the network and they decided to withdraw their routes instead of serving stale data. Makes sense when a single node does it, not so much when the entire fleet thinks that they're out on their own.
A performance issue in Cartographer (or whatever manages this fleet these days) could have been the ticking time bomb that set the whole thing in motion.
Rereading it, they say that their internal (?) backbone went down, so pulling the routes was arguably the right thing to do. Or at least not flat-out wrong. Taking out their nameserver subnets was clearly a problem, though a fix is probably tricky since you clearly want to take down errant nameservers too. Mike
Had some chats with other folks: Arguably you could change the nameserver isolation check failure action to be "depref your exports" rather than "yank it all". Basically, set up a tiered setup so the boxes passing those additional health checks and that should have correct entries would be your primary destination and failing nodes shouldn't receive query traffic since they're depref'd in your internal routing. But in case all nodes fail that check simultaneously, those nodes failing the isolation check would attract traffic again as no better paths remain. Better to serve stale data than none at all; CAP theorem trade-offs at work? -- Hugo Slabbert On Tue, Oct 5, 2021 at 3:22 PM Michael Thomas <mike@mtcc.com> wrote:
On 10/5/21 3:09 PM, Andy Brezinsky wrote:
It's a few years old, but Facebook has talked a little bit about their DNS infrastructure before. Here's a little clip that talks about Cartographer: https://youtu.be/bxhYNfFeVF4?t=2073
From their outage report, it sounds like their authoritative DNS servers withdraw their anycast announcements when they're unhealthy. The health check from those servers must have relied on something upstream. Maybe they couldn't talk to Cartographer for a few minutes so they thought they might be isolated from the rest of the network and they decided to withdraw their routes instead of serving stale data. Makes sense when a single node does it, not so much when the entire fleet thinks that they're out on their own.
A performance issue in Cartographer (or whatever manages this fleet these days) could have been the ticking time bomb that set the whole thing in motion.
Rereading it is said that their internal (?) backbone went down so pulling the routes was arguably the right thing to do. Or at least not flat out wrong. Taking out their nameserver subnets was clearly a problem though, though a fix is probably tricky since you clearly want to take down errant nameservers too.
Mike
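A minimal sketch of the tiered "depref instead of withdraw" idea Hugo describes, assuming a hypothetical announce() helper that drives the local BGP speaker (in real life this would be birdc/vtysh commands or an ExaBGP process):

PREFIX = "192.0.2.0/24"   # anycast DNS service prefix (documentation range)

def announce(prefix, med):
    # Hypothetical hook into the local BGP speaker; not a real API.
    print(f"announce {prefix} med {med}")

def adjust_announcement(passes_isolation_check):
    if passes_isolation_check:
        announce(PREFIX, med=0)     # healthy node: preferred path
    else:
        announce(PREFIX, med=500)   # failing node: still announced, but only attracts
                                    # traffic once every healthier node has also failed

If every node fails the check at once, the depref'd routes are the only paths left, so stale answers get served instead of nothing, which is the trade-off being pointed at.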
By what they have said publicly, the initial trigger point was that all of their datacenters were disconnected from their internal backbone, thus unreachable. Once that occurs, nothing else really matters. Even if the external announcements were not withdrawn, and the edge DNS servers could provide stale answers, the IPs those answers provided wouldn't have actually been reachable, and there wouldn't be 3 days of red herring conversations about DNS design. No DNS design exists that can help people reach resources not network reachable. /shrug On Tue, Oct 5, 2021 at 6:30 PM Hugo Slabbert <hugo@slabnet.com> wrote:
Had some chats with other folks: Arguably you could change the nameserver isolation check failure action to be "depref your exports" rather than "yank it all". Basically, set up a tiered setup so the boxes passing those additional health checks and that should have correct entries would be your primary destination and failing nodes shouldn't receive query traffic since they're depref'd in your internal routing. But in case all nodes fail that check simultaneously, those nodes failing the isolation check would attract traffic again as no better paths remain. Better to serve stale data than none at all; CAP theorem trade-offs at work?
-- Hugo Slabbert
On Tue, Oct 5, 2021 at 3:22 PM Michael Thomas <mike@mtcc.com> wrote:
On 10/5/21 3:09 PM, Andy Brezinsky wrote:
It's a few years old, but Facebook has talked a little bit about their DNS infrastructure before. Here's a little clip that talks about Cartographer: https://youtu.be/bxhYNfFeVF4?t=2073
From their outage report, it sounds like their authoritative DNS servers withdraw their anycast announcements when they're unhealthy. The health check from those servers must have relied on something upstream. Maybe they couldn't talk to Cartographer for a few minutes so they thought they might be isolated from the rest of the network and they decided to withdraw their routes instead of serving stale data. Makes sense when a single node does it, not so much when the entire fleet thinks that they're out on their own.
A performance issue in Cartographer (or whatever manages this fleet these days) could have been the ticking time bomb that set the whole thing in motion.
Rereading it is said that their internal (?) backbone went down so pulling the routes was arguably the right thing to do. Or at least not flat out wrong. Taking out their nameserver subnets was clearly a problem though, though a fix is probably tricky since you clearly want to take down errant nameservers too.
Mike
Tom Beecher <beecher@beecher.cc> writes:
Even if the external announcements were not withdrawn, and the edge DNS servers could provide stale answers, the IPs those answers provided wouldn't have actually been reachable
Do we actually know this wrt the tools referred to in "the total loss of DNS broke many of the tools we’d normally use to investigate and resolve outages like this."? Those tools aren't necessarily located in any of the remote data centers, and some of them might even refer to resources outside the facebook network. Not to mention that keeping the DNS service up would have prevented resolver overload in the rest of the world. Besides, the disconnected frontend servers are probably configured to display a "we have a slight technical issue. will be right back" notice in such situations. This is a much better user experience than the "facebook? never heard of it" message we got on Monday. Yes, it makes sense to keep your domains alive even if your network isn't. That's why the best practice is name servers in more than one AS. Bjørn
I mean, at the end of the day they likely designed these systems to be able to handle one or more datacenters being disconnected from the world, and considered a scenario of ALL their datacenters being disconnected from the world so unlikely they chose not to solve for it. Works great, until it doesn't. I'm sure they'll learn from this and in the future have some better things in place to account for such a scenario. On Wed, Oct 6, 2021 at 12:21 PM Bjørn Mork <bjorn@mork.no> wrote:
Tom Beecher <beecher@beecher.cc> writes:
Even if the external announcements were not withdrawn, and the edge DNS servers could provide stale answers, the IPs those answers provided wouldn't have actually been reachable
Do we actually know this wrt the tools referred to in "the total loss of DNS broke many of the tools we’d normally use to investigate and resolve outages like this."? Those tools aren't necessarily located in any of the remote data centers, and some of them might even refer to resources outside the facebook network.
Not to mention that keeping the DNS service up would have prevented resolver overload in the rest of the world.
Besides, the disconnected frontend servers are probably configured to display a "we have a slight technical issue. will be right back" notice in such situations. This is a much better user experience that the "facebook? never heard of it" message we got on monday.
yes, it makes sense to keep your domains alive even if your network isn't. That's why the best practice is name servers in more than one AS.
Bjørn
Do we actually know this wrt the tools referred to in "the total loss of DNS broke many of the tools we’d normally use to investigate and resolve outages like this."? Those tools aren't necessarily located in any of the remote data centers, and some of them might even refer to resources outside the facebook network.
Yea; that's kinda the thinking here. Specifics are scarce, but there were notes re: the OOB for instance also being unusable. The questions are how much that was due to dependence of the OOB network on the production side, and how much DNS being notionally available might have supported getting things back off the ground (if it would just provide mgt addresses for key devices, or if perhaps there was an AAA dependency that also rode on DNS). This isn't to say there aren't other design considerations in play to make that fly (e.g. if DNS lives in edge POPs, and such an edge POP gets isolated from the FB network but still has public Internet peering, how do we ensure that edge POP does not continue exporting the DNS prefix into the DFZ and serving stale records?), but perhaps also still solvable.
> I'm sure they'll learn from this and in the future have some better things in place to account for such a scenario.
100% I think we can say with some level of confidence that there is going to be a *lot* of discussion and re-evaluation of inter-service dependencies. -- Hugo Slabbert On Wed, Oct 6, 2021 at 9:48 AM Tom Beecher <beecher@beecher.cc> wrote:
I mean, at the end of the day they likely designed these systems to be able to handle one or more datacenters being disconnected from the world, and considered a scenario of ALL their datacenters being disconnected from the world so unlikely they chose not to solve for it. Works great, until it doesn't.
I'm sure they'll learn from this and in the future have some better things in place to account for such a scenario.
On Wed, Oct 6, 2021 at 12:21 PM Bjørn Mork <bjorn@mork.no> wrote:
Tom Beecher <beecher@beecher.cc> writes:
Even if the external announcements were not withdrawn, and the edge DNS servers could provide stale answers, the IPs those answers provided wouldn't have actually been reachable
Do we actually know this wrt the tools referred to in "the total loss of DNS broke many of the tools we’d normally use to investigate and resolve outages like this."? Those tools aren't necessarily located in any of the remote data centers, and some of them might even refer to resources outside the facebook network.
Not to mention that keeping the DNS service up would have prevented resolver overload in the rest of the world.
Besides, the disconnected frontend servers are probably configured to display a "we have a slight technical issue. will be right back" notice in such situations. This is a much better user experience that the "facebook? never heard of it" message we got on monday.
yes, it makes sense to keep your domains alive even if your network isn't. That's why the best practice is name servers in more than one AS.
Bjørn
Randy Monroe via NANOG wrote:
Updated: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
So, what was lost was internal connectivity between data centers. That facebook use very short expiration period for zone data is a separate issue. As long as name servers with expired zone data won't serve request from outside of facebook, whether BGP routes to the name servers are announced or not is unimportant. Masataka Ohta
Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> writes:
As long as name servers with expired zone data won't serve request from outside of facebook, whether BGP routes to the name servers are announced or not is unimportant.
I am not convinced this is true. You'd normally serve some semi-static content, especially wrt stuff you need yourself to manage your network. Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing. The problem is of course that you can't let the servers take the decision to withdraw from anycast if you want to prevent this catastrophe. The servers have no knowledge of the rest of the network. They only know that they've lost contact with it. So they all make the same stupid decision. But if the servers can't withdraw, then they will serve stale content if the data center loses backbone access. And with a large enough network then that is probably something which happens on a regular basis. This is a very hard problem to solve. Thanks a lot to facebook for making the detailed explanation available to the public. I'm crossing my fingers hoping they follow up with details about the solutions they come up with. The problem affects any critical anycast DNS service. And it doesn't have to be as big as facebook to be locally critical to an enterprise, ISP or whatever. Bjørn
Bjørn Mork wrote:
Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing.
As I wrote: : That facebook use very short expiration period for zone : data is a separate issue. that is a separate issue.
This is a very hard problem to solve.
If that is their policy, it is just a policy to enforce and not a problem to solve. Masataka Ohta
Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> writes:
Bjørn Mork wrote:
Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing.
As I wrote:
: That facebook use very short expiration period for zone : data is a separate issue.
that is a separate issue.
Sorry, I don't understand what you're getting at. The TTL is not an issue. An infinite TTL won't save you if all authoritative servers are unreachable. It will just make things worse in almost every other error scenario. The only solution to the problem of unreachable authoritative DNS servers is: Don't do that.
This is a very hard problem to solve.
If that is their policy, it is just a policy to enforce and not a problem to solve.
The policy is there to solve a real problem. Serving stale data from a single disconnected anycast site is a problem. A disconnected site is unmanaged and must make autonomous decisions. That pre-programmed decision is "just policy". Should you withdraw the DNS routes or not? Serve stale or risk meltdown? I still don't think there is an easy and obviously correct answer. But they do of course need to add a safety net or two if they continue with the "meltdown" policy. Bjørn
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine them to be sub-optimal (fsvo). I can certainly understand the DNS servers not giving answers they think are unreachable, but there is always the problem that they may be partitioned and not the routes themselves. At a minimum, I would think they'd need some consensus protocol that says that it's broken across multiple servers. But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems really strange and fraught with unintended consequences. Why is it a good idea to withdraw the route if it doesn't seem reachable from the DNS server? Give answers that are reachable, sure, but to actually make a topology decision? Yikes. And what happens to the cached answers that still point to the supposedly dead route? They're going to fail until the TTL expires anyway, so why is it preferable to withdraw the route too? My guess is that their post, while clearer than most, doesn't go into enough detail, but is it me or does it seem like this is a really weird thing to do? Mike On 10/5/21 11:56 PM, Bjørn Mork wrote:
Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> writes:
As long as name servers with expired zone data won't serve request from outside of facebook, whether BGP routes to the name servers are announced or not is unimportant. I am not convinced this is true. You'd normally serve some semi-static content, especially wrt stuff you need yourself to manage your network. Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing.
The problem is of course that you can't let the servers take the decision to withdraw from anycast if you want to prevent this catastrophe. The servers have no knowledge of the rest of the network. They only know that they've lost contact with it. So they all make the same stupid decision.
But if the servers can't withdraw, then they will serve stale content if the data center loses backbone access. And with a large enough network then that is probably something which happens on a regular basis.
This is a very hard problem to solve.
Thanks a lot to facebook for making the detailed explanation available to the public. I'm crossing my fingers hoping they follow up with details about the solutions they come up with. The problem affects any critical anycast DNS service. And it doesn't have to be as big as facebook to be locally critical to an enterprise, ISP or whatever.
Bjørn
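A sketch of the consensus idea from Michael's note above: withdraw only when enough peers are reachable and the failure therefore looks local, otherwise keep announcing and accept possibly-stale data. The peer addresses, probe port, and surrounding plumbing are all hypothetical:

import socket

PEERS = ["198.51.100.1", "198.51.100.2", "198.51.100.3"]  # other anycast nodes (placeholders)

def peer_reachable(ip, port=443, timeout=2):
    try:
        socket.create_connection((ip, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def should_withdraw(local_backbone_ok):
    if local_backbone_ok:
        return False
    reachable = sum(peer_reachable(p) for p in PEERS)
    # If we can still see most of our peers, the problem is probably ours alone and
    # withdrawing is safe. If we can't see anyone, all nodes may be making the same
    # decision at once, so keep announcing and serve stale data instead.
    return reachable > len(PEERS) // 2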
They most likely sent an update to the DNS servers for TLV DNSSEC and in an oversight forgot they needed to null something out of the workbook so as not to touch the BGP instances. I'd hardly believe that would be triggered by the DNS server itself. -- J. Hellenthal The fact that there's a highway to Hell but only a stairway to Heaven says a lot about anticipated traffic volume.
On Oct 6, 2021, at 12:45, Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). I can certainly understand for the DNS servers to not give answers they think are unreachable but there is always the problem that they may be partitioned and not the routes themselves. At a minimum, I would think they'd need some consensus protocol that says that it's broken across multiple servers.
But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems really strange and fraught with unintended consequences. Why is it a good idea to withdraw the route if it doesn't seem reachable from the DNS server? Give answers that are reachable, sure, but to actually make a topology decision? Yikes. And what happens to the cached answers that still point to the supposedly dead route? They're going to fail until the TTL expires anyway so why is it preferable withdraw the route too?
My guess is that their post while more clear that most doesn't go into enough detail, but is it me or does it seem like this is a really weird thing to do?
Mike
On 10/5/21 11:56 PM, Bjørn Mork wrote: Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> writes:
As long as name servers with expired zone data won't serve request from outside of facebook, whether BGP routes to the name servers are announced or not is unimportant. I am not convinced this is true. You'd normally serve some semi-static content, especially wrt stuff you need yourself to manage your network. Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing.
The problem is of course that you can't let the servers take the decision to withdraw from anycast if you want to prevent this catastrophe. The servers have no knowledge of the rest of the network. They only know that they've lost contact with it. So they all make the same stupid decision.
But if the servers can't withdraw, then they will serve stale content if the data center loses backbone access. And with a large enough network then that is probably something which happens on a regular basis.
This is a very hard problem to solve.
Thanks a lot to facebook for making the detailed explanation available to the public. I'm crossing my fingers hoping they follow up with details about the solutions they come up with. The problem affects any critical anycast DNS service. And it doesn't have to be as big as facebook to be locally critical to an enterprise, ISP or whatever.
Bjørn
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment. It doesn’t have to be externally visible like this event for that to be the case. I would say more like Application availability caused the BGP routes to be withdrawn. I know several network operators that run DNS internally (even on raspberry pi devices) and may have OSPF or BGP announcements internally to ensure things work well. If the process dies (crash, etc) they want to route to the next nearest cluster. Of course if they all are down there’s negative outcomes. - Jared
On Oct 6, 2021, at 1:42 PM, Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). I can certainly understand for the DNS servers to not give answers they think are unreachable but there is always the problem that they may be partitioned and not the routes themselves. At a minimum, I would think they'd need some consensus protocol that says that it's broken across multiple servers.
But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems really strange and fraught with unintended consequences. Why is it a good idea to withdraw the route if it doesn't seem reachable from the DNS server? Give answers that are reachable, sure, but to actually make a topology decision? Yikes. And what happens to the cached answers that still point to the supposedly dead route? They're going to fail until the TTL expires anyway so why is it preferable withdraw the route too?
My guess is that their post while more clear that most doesn't go into enough detail, but is it me or does it seem like this is a really weird thing to do?
Mike
On 10/5/21 11:56 PM, Bjørn Mork wrote:
Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> writes:
As long as name servers with expired zone data won't serve request from outside of facebook, whether BGP routes to the name servers are announced or not is unimportant. I am not convinced this is true. You'd normally serve some semi-static content, especially wrt stuff you need yourself to manage your network. Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing.
The problem is of course that you can't let the servers take the decision to withdraw from anycast if you want to prevent this catastrophe. The servers have no knowledge of the rest of the network. They only know that they've lost contact with it. So they all make the same stupid decision.
But if the servers can't withdraw, then they will serve stale content if the data center loses backbone access. And with a large enough network then that is probably something which happens on a regular basis.
This is a very hard problem to solve.
Thanks a lot to facebook for making the detailed explanation available to the public. I'm crossing my fingers hoping they follow up with details about the solutions they come up with. The problem affects any critical anycast DNS service. And it doesn't have to be as big as facebook to be locally critical to an enterprise, ISP or whatever.
Bjørn
Yes, it really is common to announce sink routes via bgp from destination services / proxies and to have those announcements be dynamically based on service viability. On Wed, Oct 6, 2021, 12:56 Jared Mauch <jared@puck.nether.net> wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment. It doesn’t have to be externally visible like this event for that to be the case.
I would say more like Application availability caused the BGP routes to be withdrawn.
I know several network operators that run DNS internally (even on raspberry pi devices) and may have OSPF or BGP announcements internally to ensure things work well. If the process dies (crash, etc) they want to route to the next nearest cluster.
Of course if they all are down there’s negative outcomes.
- Jared
On Oct 6, 2021, at 1:42 PM, Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). I can certainly understand for the DNS servers to not give answers they think are unreachable but there is always the problem that they may be partitioned and not the routes themselves. At a minimum, I would think they'd need some consensus protocol that says that it's broken across multiple servers.
But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems really strange and fraught with unintended consequences. Why is it a good idea to withdraw the route if it doesn't seem reachable from the DNS server? Give answers that are reachable, sure, but to actually make a topology decision? Yikes. And what happens to the cached answers that still point to the supposedly dead route? They're going to fail until the TTL expires anyway so why is it preferable withdraw the route too?
My guess is that their post while more clear that most doesn't go into enough detail, but is it me or does it seem like this is a really weird thing to do?
Mike
On 10/5/21 11:56 PM, Bjørn Mork wrote:
Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> writes:
As long as name servers with expired zone data won't serve request from outside of facebook, whether BGP routes to the name servers are announced or not is unimportant. I am not convinced this is true. You'd normally serve some semi-static content, especially wrt stuff you need yourself to manage your network. Removing all DNS servers at the same time is never a good idea, even in the situation where you believe they are all failing.
The problem is of course that you can't let the servers take the decision to withdraw from anycast if you want to prevent this catastrophe. The servers have no knowledge of the rest of the network. They only know that they've lost contact with it. So they all make the same stupid decision.
But if the servers can't withdraw, then they will serve stale content if the data center loses backbone access. And with a large enough network then that is probably something which happens on a regular basis.
This is a very hard problem to solve.
Thanks a lot to facebook for making the detailed explanation available to the public. I'm crossing my fingers hoping they follow up with details about the solutions they come up with. The problem affects any critical anycast DNS service. And it doesn't have to be as big as facebook to be locally critical to an enterprise, ISP or whatever.
Bjørn
Jared Mauch wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast.
I would say more like Application availability caused the BGP routes to be withdrawn.
Considering the failure modes where routes are not withdrawn even though the application has died, or where routes are withdrawn even though the application is alive, active withdrawal of routes is an unnecessary complication with no improvement in redundancy. DNS (and other protocols') redundancy is to have multiple unicast/anycast name server addresses. Just rely on it. Masataka Ohta
On Wed, Oct 6, 2021 at 10:44 PM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
Jared Mauch wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast.
You don't know what you're talking about. If your anycast node stops receiving updated data and you can't reach any of the other nodes to check whether they're online, 99 times out of 100 this means a local failure of some sort. You withdraw the node's announcement so that you don't serve bad data to the end user. That's what happened here - because the facebook backbone was down, the DNS servers stopped receiving updates and determined their data to be stale. Simply turning themselves off, instead of withdrawing the routes, would result in suboptimal performance. And 99 times out of 100, not doing one or the other would cause rather than prevent an outage. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
William Herrin wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast.
You don't know what you're talking about.
I do but you don't.
If your anycast node stops receiving updated data and you can't reach any of the other nodes to check whether they're online, 99 times out of 100 this means a local failure of some sort.
Yes. In case of DNS, if expiration period of a zone is passed without successful check of the current most zone version, unicast or anycast name servers stop responding requests for the zone. But, it has nothing specifically to do with anycast. As there are other name servers with different IP addresses, there is no reason to withdraw routes. So?
You withdraw the node's announcement so that you don't serve bad data to the end user.
That will only introduce new failure modes of mismatches between server availability and server reachability and is a bad idea.
That's what happened here -
Yes, facebook did wrong thing to actively withdraw routes.
Simply turning themselves off, instead of withdrawing the routes, would result in suboptimal performance.
This time, facebook is saying that they could not reach their name servers even though the servers were perfectly working. How much performance, do you think, facebook enjoyed? A lot less than "suboptimal", I'm afraid.
And 99 times out of 100, not doing one or the other would cause rather than prevent an outage.
That is a commonly seen misconception wrongly assuming that server routes were withdrawn if and only if the server is unavailable. But, the reality is that it is impossible to correctly recognize server is unavailable or to correctly withdraw routes only when server is unavailable. Masataka Ohta
But, the reality is that it is impossible to correctly recognize server is unavailable or to correctly withdraw routes only when server is unavailable.
Not true at all. On Thu, Oct 7, 2021 at 9:50 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
William Herrin wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast.
You don't know what you're talking about.
I do but you don't.
If your anycast node stops receiving updated data and you can't reach any of the other nodes to check whether they're online, 99 times out of 100 this means a local failure of some sort.
Yes. In case of DNS, if expiration period of a zone is passed without successful check of the current most zone version, unicast or anycast name servers stop responding requests for the zone.
But, it has nothing specifically to do with anycast. As there are other name servers with different IP addresses, there is no reason to withdraw routes. So?
You withdraw the node's announcement so that you don't serve bad data to the end user.
That will only introduce new failure modes of mismatches between server availability and server reachability and is a bad idea.
That's what happened here -
Yes, facebook did wrong thing to actively withdraw routes.
Simply turning themselves off, instead of withdrawing the routes, would result in suboptimal performance.
This time, facebook is saying that they could not reach their name servers even though the servers were perfectly working.
How much performance, do you think, facebook enjoyed? A lot less than "suboptimal", I'm afraid.
And 99 times out of 100, not doing one or the other would cause rather than prevent an outage.
That is a commonly seen misconception wrongly assuming that server routes were withdrawn if and only if the server is unavailable.
But, the reality is that it is impossible to correctly recognize server is unavailable or to correctly withdraw routes only when server is unavailable.
Masataka Ohta
Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> writes:
William Herrin wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast. You don't know what you're talking about.
I do but you don't.
https://datatracker.ietf.org/doc/html/rfc4786#section-4.4.1 Not a mistake. BCP. Bjørn
Bjørn Mork wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast. You don't know what you're talking about.
I do but you don't.
https://datatracker.ietf.org/doc/html/rfc4786#section-4.4.1
Not a mistake. BCP.
My comment on the rfc is that it is simply wrong. See also: https://datatracker.ietf.org/doc/html/rfc3258 While it would be possible to have some process withdraw the route for a specific server instance when it is not available, there is considerable operational complexity involved in ensuring that this occurs reliably. Given the existing DNS failover methods, the marginal improvement in performance will not be sufficient to justify the additional complexity for most uses. which was our consensus at that time in DNSOP. I have no idea why it was forgotten. Masataka Ohta
On Thu, Oct 7, 2021 at 8:28 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
My comment on the rfc is that it is simply wrong.
See also:
https://datatracker.ietf.org/doc/html/rfc3258 While it would be possible to have some process withdraw the route for a specific server instance when it is not available, there is considerable operational complexity involved in ensuring that this occurs reliably. Given the existing DNS failover methods, the marginal improvement in performance will not be sufficient to justify the additional complexity for most uses.
which was our consensus at that time in DNSOP. I have no idea why it was forgotten.
It wasn't forgotten. Folks gained a lot of experience with anycast DNS between 2002 and 2006. Not withdrawing the routes when the servers are deemed malfunctioning turned out not to be an operationally sound practice. The theory offered in 3258 was wrong. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On 10/7/21 18:21, William Herrin wrote:
It wasn't forgotten. Folks gained a lot of experience with anycast DNS between 2002 and 2006. Not withdrawing the routes when the servers are deemed malfunctioning turned out not to be an operationally sound practice. The theory offered in 3258 was wrong.
Especially terrible when you have a DNS daemon that has crashed, but Quagga (or whatever routing suite you use) is still humming. Mark.
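That crashed-daemon-but-routing-still-up case is exactly what the health check is supposed to catch: probe the DNS daemon itself, not just the box. A minimal watchdog sketch with dnspython, with the announce/withdraw actions left as print() stand-ins for whatever pokes Quagga/BIRD/ExaBGP (the probe record name is made up):

import time
import dns.resolver  # pip install dnspython

def daemon_answers():
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = ["127.0.0.1"]    # ask the local authoritative daemon directly
    r.lifetime = 2
    try:
        r.resolve("health-probe.example.com", "A")  # hypothetical record it must serve
        return True
    except Exception:
        return False

while True:
    if daemon_answers():
        print("keep announcing the service prefix")
    else:
        print("withdraw the service prefix")   # dead daemon: stop attracting queries
    time.sleep(5)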
William Herrin wrote:
It wasn't forgotten. Folks gained a lot of experience with anycast DNS between 2002 and 2006. Not withdrawing the routes when the servers are deemed malfunctioning turned out not to be an operationally sound practice. The theory offered in 3258 was wrong.
So, from limited experience, you thought it was wrong because:
Simply turning themselves off, instead of withdrawing the routes, would result in suboptimal performance.
But, this time, the reality strikes back. That you can be safe 99 times out of 100 can mean the remaining 1 time is totally disastrous. When servers are deemed malfunctioning, the best practice is to check whether the servers are really malfunctioning or not before blindly shutting down the servers. Masataka Ohta
On Thu, Oct 7, 2021 at 9:52 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
But, this time, the reality strikes back.
Not really. Or at all. Facebook the external service was down hard as soon as the cross-datacenter connections all failed. Whether or not the BGP routes for the external DNS were withdrawn had no impact on the outage. Facebook's _internal_ DNS, while not anycasted, followed a similar logic: if the data center is isolated and their data goes stale, they stop serving potentially wrong answers. Since the routing failure isolated all of the data centers, this left no usable _INTERNAL_ DNS on which more or less everything else depends. I didn't work for the DNS team when I worked as a production engineer for Facebook but I worked close enough to understand what happened from the posted description. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
William Herrin wrote:
Facebook's _internal_ DNS, while not anycasted, followed a similar logic: if the data center is isolated and their data goes stale, they stop serving potentially wrong answers.
As I already wrote, that is a standard mechanism of DNS with SOA expiration period as is documented in rfc1034 as ("an discard" should be "and discard"): If the secondary finds it impossible to perform a serial check for the EXPIRE interval, it must assume that its copy of the zone is obsolete an discard it. But, that has nothing to do with anycast or route (BGP or IGP) withdrawal.
I didn't work for the DNS team when I worked as a production engineer for Facebook but I worked close enough to understand what happened from the posted description.
I don't think those who posted the description properly understand what is wrong with their management. Masataka Ohta
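For reference, the expire mechanism being discussed is simple to express: a secondary tracks when it last managed to confirm the zone serial against the primary and stops answering once the SOA EXPIRE interval has passed. A rough dnspython sketch (the timestamps are illustrative; a real secondary would consult its own stored copy of the SOA):

import time
import dns.resolver  # pip install dnspython

def zone_expired(zone, last_successful_serial_check):
    soa = dns.resolver.resolve(zone, "SOA")[0]
    # soa.expire is the SOA EXPIRE field: how long a secondary may keep serving
    # the zone without having been able to refresh it from the primary.
    return time.time() - last_successful_serial_check > soa.expire

# Example: pretend our last successful serial check was two days ago.
print(zone_expired("facebook.com", time.time() - 2 * 86400))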
On Thu, Oct 7, 2021 at 10:23 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
William Herrin wrote:
Facebook's _internal_ DNS, while not anycasted, followed a similar logic: if the data center is isolated and their data goes stale, they stop serving potentially wrong answers.
As I already wrote, that is a standard mechanism of DNS with SOA expiration period as is documented in rfc1034
Then we agree: The failure mode was that after the data centers disconnected from each other, all their DNS expired, breaking the tools they'd normally use to recover. Facebook withdrawing the BGP routes to its anycasted public DNS servers as they expired made no difference. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
William Herrin wrote:
Facebook's _internal_ DNS, while not anycasted, followed a similar logic: if the data center is isolated and their data goes stale, they stop serving potentially wrong answers.
As I already wrote, that is a standard mechanism of DNS with SOA expiration period as is documented in rfc1034
Then we agree:
Do we?
The failure mode was that after the data centers disconnected from each other, all their DNS expired, breaking the tools they'd normally use to recover.
It means DNS management of facebook is poor. If they are using the standard expire mechanism, they should have used two zones: facebook.com for external users with a short expire and internal.facebook.com for internal users with a long expire.
Facebook withdrawing the BGP routes to its anycasted public DNS servers as they expired made no difference.
If they are not using standard expire mechanism expecting internal data still accessible even after external data has expired, there is difference. Masataka Ohta
----- On Oct 7, 2021, at 9:03 PM, Masataka Ohta mohta@necom830.hpcl.titech.ac.jp wrote: Hi,
It means DNS management of facebook is poor.
Whenever there is an aviation incident, the keyboard warriors at pprune.org are always the first to start speculating about root causes, and complain how the air crew made mistakes. They, the keyboard warriors, of course know how best to fly an aircraft with 20/20 hindsight from their armchairs. Why do I see so many posts that are basically throwing Facebook engineers under the bus? Let's for a moment contemplate about the sheer magnitude of their operation. With almost 3 billion users worldwide, can you imagine the amount of DNS queries they have to process? Their scale is unprecedented. Sure, it's ok to speculate about potential operational or design issues that may have been contributing factors to the outage. But throwing our colleagues in front of the lions like this is something I would not recommend. I'm sure they are aware of these posts, but are unable to reply due to the amount of NDAs signed. Thanks, Sabri
On 10/8/21 07:25, Sabri Berisha wrote:
Whenever there is an aviation incident, the keyboard warriors at pprune.org are always the first to start speculating about root causes, and complain how the air crew made mistakes. They, the keyboard warriors, of course know how best to fly an aircraft with 20/20 hindsight from their armchairs.
Why do I see so many posts that are basically throwing Facebook engineers under the bus? Let's for a moment contemplate about the sheer magnitude of their operation. With almost 3 billion users worldwide, can you imagine the amount of DNS queries they have to process? Their scale is unprecedented.
Sure, it's ok to speculate about potential operational or design issues that may have been contributing factors to the outage. But throwing our colleagues in front of the lions like this is something I would not recommend.
I'm sure they are aware of these posts, but are unable to reply due to the amount of NDAs signed.
Folk love to complain and critique. It's human nature, an unproductive quality that is part of our DNA. The good news is that lots of human beings have the DNA to ignore noise and carry on helping mankind to grow and live better lives. In any event, if you aren't willing to put your neck on the line and fail spectacularly, I don't care what you have to say. Failure is finding out what doesn't work, quickly, and inching closer to what does. I'm all for that. Mark.
Sabri Berisha wrote:
Let's for a moment contemplate about the sheer magnitude of their operation. With almost 3 billion users worldwide, can you imagine the amount of DNS queries they have to process? Their scale is unprecedented. That's what I predicted about 20 years ago, which is why I proposed to have anycast name servers analyzing its implications.
As such I'm sure anycast route withdrawal ignoring rfc3258 is poor engineering. Scalable solutions can be constructed only with careful theoretical analysis, against which random hacks, which may work 99% of the time, are just harmful. In facebook case, it was combined with poor understanding on short/long expiration period to cause the disaster. Masataka Ohta
In facebook case, it was combined with poor understanding on short/long expiration period to cause the disaster.
Still, no. The CAUSE of the outage was all of the FB datacenters being completely disconnected from their backbone, and thus the internet. DNS breaking was a direct RESULT of that. Even if FB's DNS was happily still providing answers to IPs that were still unreachable, they were still horked. Could their DNS design possibly have contributed to some delay in the RESTORATION phase? Perhaps. But with the volume of traffic they do, that was certainly going to take a while anyways. On Fri, Oct 8, 2021 at 5:17 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Sabri Berisha wrote:
Let's for a moment contemplate about the sheer magnitude of their operation. With almost 3 billion users worldwide, can you imagine the amount of DNS queries they have to process? Their scale is unprecedented. That's what I predicted about 20 years ago, which is why I proposed to have anycast name servers analyzing its implications.
As such I'm sure anycast route withdrawal ignoring rfc3258 is poor engineering.
Scalable solutions can be constructed only with careful theoretical analysis, against which random hacks, which may work 99% of the time, are just harmful.
In facebook case, it was combined with poor understanding on short/long expiration period to cause the disaster.
Masataka Ohta
On 2021-10-08, at 07:25, Sabri Berisha <sabri@cluecentral.net> wrote:
Whenever there is an aviation incident, the keyboard warriors at pprune.org are always the first to start speculating about root causes
So we need an NTSB, BFU, ... for the Internet and widely used Internet applications. (And the other national equivalents…) A site like avherald.com would also be useful (minor todo: Find someone as dedicated as Simon Hradecky to run it). Grüße, Carsten
On Thu, Oct 7, 2021 at 9:04 PM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
William Herrin wrote:
Facebook withdrawing the BGP routes to its anycasted public DNS servers as they expired made no difference.
If they are not using standard expire mechanism expecting internal data still accessible even after external data has expired, there is difference.
I give up. Although you have no knowledge whatsoever about how Facebook implemented their DNS you are obviously correct in all things. -- William Herrin bill@herrin.us https://bill.herrin.us/
William Herrin wrote:
If they are not using standard expire mechanism expecting internal data still accessible even after external data has expired, there is difference.
I give up.
To accept the reality of the disastrous facebook failure? I know.
Although you have no knowledge whatsoever about how Facebook implemented their DNS
https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/ our DNS servers disable those BGP advertisements if they themselves can not speak to our data centers The end result was that our DNS servers became unreachable even though they were still operational. means their DNS servers were serving the zone, even after they recognize their zone data were too old, that is, expired.
you are obviously correct in all things.
If you think so, it's your problem, I'm afraid. Masataka Ohta
(I'm going to hate myself in the morning, but) On Fri, Oct 8, 2021 at 10:22 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
William Herrin wrote:
https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
our DNS servers disable those BGP advertisements if they themselves can not speak to our data centers
The end result was that our DNS servers became unreachable even though they were still operational.
means their DNS servers were serving the zone, even after they recognize their zone data were too old, that is, expired.
that's not what this means. I think Mr. Petach previously described this, but:
1) dns server in pop serves some content (ttls aren't important right now)
2) dns server uses some quagga/gated/bird/etc to announce locally: "Hey, foo/32 here!" (imagine this triggers an 'aggregate route' or 'network statement' (pick your vendor solution) to appear in the global table)
3) dns server also 'ping backend server set'
4) when 3 fails for X period of time 'tell quagga/bird/etc to stop announcing the /32'
then the local pop no longer sources the aggregate (/24 or /23 or whatever)... so traffic SHOULD (externally) flow toward another copy of the /23 or /24 or whatever... there's not a lot of magic here... and it's not about the zone data really at all.
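Steps 2 through 4 above amount to a small daemon. A hedged sketch of the shape of it, with ping as the backend probe and print() standing in for the actual quagga/bird/etc. interaction (addresses and thresholds are placeholders):

import subprocess, time

SERVICE_ROUTE = "203.0.113.53/32"    # the per-pop DNS service address (placeholder)
BACKENDS = ["10.0.0.1", "10.0.0.2"]  # core datacenter targets to probe (placeholders)
FAIL_LIMIT = 6                       # consecutive failed rounds before pulling the route

def backend_ok(ip):
    # One ping with a short deadline; any reachable backend counts as healthy.
    return subprocess.call(["ping", "-c", "1", "-W", "1", ip],
                           stdout=subprocess.DEVNULL) == 0

fails = 0
while True:
    if any(backend_ok(b) for b in BACKENDS):
        fails = 0
        print(f"keep announcing {SERVICE_ROUTE}")  # pop keeps sourcing the aggregate
    else:
        fails += 1
        if fails >= FAIL_LIMIT:
            print(f"withdraw {SERVICE_ROUTE}")     # traffic drains to another pop
    time.sleep(10)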
Christopher Morrow wrote:
means their DNS servers were serving the zone, even after they recognize their zone data were too old, that is, expired.
that's not what this means. I think Mr. Petach previously described this,
He wrote:
So, the idea is that if the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering queries for A records with the local CDN node's address, and let a different site respond back to the client's DNS request.
which may be performed by standard DNS with a short expire period, after which the name servers will return SERVFAIL and other name servers, in other edge nodes with different IP addresses, are tried. It may be that Facebook uses all four name server IP addresses in each edge node. But that effectively kills the essential redundancy of DNS, namely having two or more name servers at separate locations, and the natural consequence is, as you can see, mass disaster.
but: 1) dns server in pop serves some content (ttls aren't important right now)
You MUST distinguish TTL and EXPIRE. They are different.
there's not a lot of magic here... and it's not about the zone data really at all.
Petach's statement, "the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering," means, in DNS terminology, that the zone data is expired, which has nothing to do with TTL. Masataka Ohta
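For readers who want to see the two timers side by side, here is a tiny illustration assuming the dnspython library (and using example.com purely as a placeholder): the TTL is the cache lifetime of an individual answer, while the SOA expire field is how long a secondary may keep serving the zone without a successful refresh before it must stop.

# Assumes dnspython is installed (pip install dnspython); example.com is a placeholder.
import dns.resolver

answer = dns.resolver.resolve("example.com", "SOA")
soa = answer[0]

# Two independent timers:
print("record TTL (how long caches may reuse this answer):", answer.rrset.ttl)
print("SOA refresh/retry/expire (secondary zone-maintenance timers):",
      soa.refresh, soa.retry, soa.expire)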
On Oct 9, 2021, at 10:37 AM, Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote: It may be that Facebook uses all four name server IP addresses in each edge node. But that effectively kills the essential redundancy of DNS, namely having two or more name servers at separate locations, and the natural consequence is, as you can see, mass disaster.
Yep. I think we even had a NANOG talk on exactly that specific topic a long time ago. https://www.pch.net/resources/Papers/dns-service-architecture/dns-service-ar... -Bill
Bill Woodcock wrote:
It may be that Facebook uses all four name server IP addresses in each edge node. But that effectively kills the essential redundancy of DNS, namely having two or more name servers at separate locations, and the natural consequence is, as you can see, mass disaster.
Yep. I think we even had a NANOG talk on exactly that specific topic a long time ago.
https://www.pch.net/resources/Papers/dns-service-architecture/dns-service-ar...
Yes, having separate sets of anycast addresses by two or more pops should be fine. However, if a CDN provider has its own transit backbone, which is, seemingly, not assumed by your slides, and retail ISPs are tightly connected to only one pop of the CDN provider, the CDN provider may be motivated to let users access only one pop, killing the essential redundancy of DNS, which should be over-engineering; that is my concern in the paragraph you quoted. Masataka Ohta
On Sat, Oct 9, 2021 at 11:16 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Bill Woodcock wrote:
It may be that Facebook uses all four name server IP addresses in each edge node. But that effectively kills the essential redundancy of DNS, namely having two or more name servers at separate locations, and the natural consequence is, as you can see, mass disaster.
Yep. I think we even had a NANOG talk on exactly that specific topic a long time ago.
https://www.pch.net/resources/Papers/dns-service-architecture/dns-service-ar...
Yes, having separate sets of anycast addresses by two or more pops should be fine.
To be fair, it looks like FB has 4 /32's (and 4 /128's) for their DNS authoritatives. All from different /24's or /48's, so they should have decent routing diversity. They could choose to announce half/half from alternate pops, or other games such as this. I don't know that that would have solved any of the problems last week nor any problems in the future.
I think Bill's slide 30 is pretty much what FB has/had deployed:
1) I would think the a/b cloud is really 'as similar a set of paths from like deployments as possible'
2) redundant pairs of servers in the same transit/network
3) hidden masters (almost certainly these are in the depths of the FB datacenter network) (though also this part isn't important for the conversation)
4) control/sync traffic on a different topology than the customer serving one
However, if CDN provider has their own transit backbone, which is, seemingly, not assumed by your slides, and retail ISPs are tightly
I think it is, actually, in slide 30? "We need a network topology to carry control and synchronization traffic between the nodes"
connected to only one pop of the CDN provider, the CDN provider
it's also not clear that FB is connecting their CDN to single points in any provider... I'd guess there are some cases of that, but for larger networks I would imagine there are multiple CDN deployments per network. I can't imagine that it's safe to deploy 1 CDN node for all of 7018 or 3320... for instance.
may be motivated to let users access only one pop killing essential redundancy of DNS, which should be overengineering, which is my concern of the paragraph quoted by you.
it seems that the problem FB ran into was really that there wasn't either: "secondary path to communicate: "You are the last one standing, do not die" (to an edge node) or: "maintain a very long/less-preferred path to a core location(s) to maintain service in case the CDN disappears". There are almost certainly more complexities which FB is not discussing in their design/deployment which affected their services last week, but it doesn't look like they were very far off on their deployment, if they need to maintain back-end connectivity to serve customers from the CDN locales. -chris
On Mon, Oct 11, 2021 at 8:07 AM Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Sat, Oct 9, 2021 at 11:16 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Bill Woodcock wrote:
[...]
it seems that the problem FB ran into was really that there wasn't either: "secondary path to communicate: "You are the last one standing, do not die" (to an edge node) or: "maintain a very long/less-preferred path to a core location(s) to maintain service in case the CDN disappears"
There are almost certainly more complexities which FB is not discussing in their design/deployment which affected their services last week, but it doesn't look like they were very far off on their deployment, if they need to maintain back-end connectivity to serve customers from the CDN locales.
-chris
Having worked on trying to solve health-checking situations in large production complexes in the past, I can definitely say that it is an exponentially difficult problem for a single site to determine whether it is "safe" for it to fail out, or if doing so will result in an entire service going offline, short of having a central controller which tracks every edge site's health, and can determine "no, we're below $magic_threshold number of sites, you can't fail yourself out no matter how unhealthy you think you are". Which of course you can't really have, without undoing one of the key reasons for distributing your serving sites to geographically distant places in different buildings on different providers--namely to eliminate single points of failure in your serving infrastructure.

Doing the equivalent of "no router bgp" on your core backbone is going to make things suck, no matter how you slice it, and I don't think any amount of tweaking the anycast setup or DNS values would have made a whit of difference to the underlying outage. I think the only question we can armchair quarterback at this point is whether there were prudent steps that could go into a design to shorten the recovery interval. So far, we seem to have collected a few key points:

1) Make sure your disaster recovery plan doesn't depend on your production DNS servers being usable; have key nodes in /etc/hosts files that are periodically updated via $automation_tool, but ONLY for non-production, out-of-band recovery nodes; don't static any of your production-facing entries. (A small sketch of such a generated hosts fragment appears below.)

2) Have a working out-of-band that exists entirely independent of your production network. Dial, frame relay, SMDS, LTE modems, starlink dishes on the roof; pick your poison, but budget it in for every production site. Test it monthly to ensure connectivity to all sites works. Audit regularly to ensure no dependencies on the production infrastructure have crept in.

3) Ensure you have a good "oh sh**" physical access plan for key personnel. Some of you at a recent virtual happy hour heard me talk about the time I isolated the credit card payment center for a $dayjob, which also cut off access for the card readers to get into it to restore the network. Use of a fire axe was granted to on-site personnel during that. Take the time to think through how physical access is controlled for every key site in your network, think about failure scenarios, and have an "in case of emergency, break glass to get the key" plan in place to shorten recovery times.

4) Have a dependency map/graph of your production network.
a) if everything dies, and you have to restart, what has to come up first?
b) what dependencies are there that have to be done in the right order?
c) what services are independent that can be brought up in parallel to speed up recovery?
d) does every team supporting services on the critical, dependent pathway have 24x7 on-call coverage, and do they know where in the recovery graph they're needed?
It doesn't help to have teams that can't start back up until step 9 crowding around asking "are you ready for us yet?" when you still can't raise the team needed for step 1 on the dependency graph. ^_^;

5) Do you know how close the nearest personnel are to each POP/CDN node, in case you have to do emergency "drive over with a laptop, hop on the console, and issue the following commands" rousting in the middle of the night?
If someone lives 3 miles from the CDN node, it's good to know that, so you don't call the person who is on-call but is 2 hours away without first checking if the person 3 miles away can do it faster. I'm sure others have even better experiences than I, who can contribute and add to the list. If nothing else, perhaps collectively we can help other companies prepare a bit better, so that when the next big "ooops" happens, the recovery time can be a little bit shorter. :) Thanks! Matt
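Picking up point 1 above, here is a minimal sketch of the kind of managed /etc/hosts fragment an $automation_tool might regenerate for out-of-band recovery nodes only. The hostnames, addresses and marker comments are all made up for illustration; a real deployment would pull the inventory from whatever source of truth it already has (Ansible, Puppet, or even a cron job would do the same thing in its own templating language).

#!/usr/bin/env python3
"""Sketch: keep a managed /etc/hosts fragment for OOB recovery nodes only.
All hostnames and addresses below are invented."""

import pathlib

# Hypothetical inventory of OOB/recovery nodes -- never production services.
OOB_NODES = {
    "oob-con1.example.net": "203.0.113.10",
    "oob-con2.example.net": "203.0.113.11",
    "recovery-jump.example.net": "203.0.113.20",
}

BEGIN = "# BEGIN managed OOB entries (do not edit by hand)"
END = "# END managed OOB entries"

def render_fragment() -> str:
    lines = [BEGIN]
    lines += [f"{ip}\t{name}" for name, ip in sorted(OOB_NODES.items())]
    lines.append(END)
    return "\n".join(lines) + "\n"

def update_hosts(path: str = "/etc/hosts") -> None:
    # Running against the real /etc/hosts needs root; point it at a test
    # file first. Drops any previous managed block, then appends a fresh one.
    hosts = pathlib.Path(path)
    text = hosts.read_text()
    if BEGIN in text and END in text:
        head, rest = text.split(BEGIN, 1)
        _, tail = rest.split(END, 1)
        text = head + tail
    hosts.write_text(text.rstrip("\n") + "\n\n" + render_fragment())

if __name__ == "__main__":
    update_hosts()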
Christopher Morrow wrote:
To be fair, it looks like FB has 4 /32's (and 4 /128's) for their DNS authoritatives. All from different /24's or /48's, so they should have decent routing diversity. They could choose to announce half/half from alternate pops, or other games such as this.
Yup.
I don't know that that would have solved any of the problems last week nor any problems in the future.
There are various solutions. For example, if FB had relied on the standard DNS expire mechanism instead of route withdrawal, FB would have noticed that it needed another zone of stable data for its maintenance servers, I think.
I think Bill's slide 30 is pretty much what FB has/had deployed:
It seems to me that he assumes transit providers and cloud providers are different entities. FB, instead, operates its own transit network and clouds within its domain, and the clouds are connected only by FB transit (there aren't multiple (red and green) transit networks).
it's also not clear that FB is connecting their CDN to single points in any provider... I'd guess there are some cases of that,
That is bad enough, if FB wants to "optimize" their traffic for those cases by killing DNS redundancy and putting all the name servers in a single POP, which is my concern. Masataka Ohta
On Sat, Oct 9, 2021 at 1:40 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Christopher Morrow wrote:
means their DNS servers were still serving the zone even after they recognized that their zone data was too old, that is, expired.
that's not what this means. I think Mr. Petach previously described this,
He wrote:
So, the idea is that if the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering queries for A records with the local CDN node's address, and let a different site respond back to the client's DNS request.
which may be performed by standard DNS with short expire period, after which name servers will return SERVFAIL and other name servers in other edge node with different IP addresses are tried.
(Apologies for the delayed response--I had back-to-back board meetings the past two days which had me completely tied up.)

That is one way in which it *could* be done--but is by no means the ONLY way in which it can be done. With an anycast setup using the same IP addresses in every location, returning SERVFAIL doesn't have the same effect, however, because failing over from anycast address 1 to anycast address 2 is likely to be routed to the same pop location, where the same result will occur. You don't really want to hunt among different *IP addresses*, you want to hunt to a different *location*. This is why withdrawing the BGP announcement from that location works more effectively, because it allows the clients to continue querying the same IP address, but get routed to the next most proximal location.

If you simply return SERVFAIL and have the client pick a different IP address from the list of NS entries, it falls into one of two situations:
a) the new IP address is also anycasted, and is therefore likely to pick the same pop that is unhealthy, with similar results, or
b) the new IP address is *not* anycasted, but is served from a single geographical location, which means answers given back by that DNS server are unlikely to be geolocated with any accuracy, and therefore the content served is also unlikely to be geographically relevant or correct.
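A toy model of that first situation may help make it concrete. Nothing below is real resolver or routing code; the addresses and the "nearest POP" table are invented. It just shows why hunting across anycasted NS addresses from the same client keeps landing in the same unhealthy POP, whereas withdrawing that POP's routes would change where the very same addresses go.

# Toy model: with anycast, "try another NS address" usually lands the client
# in the same (unhealthy) POP, so SERVFAIL-based failover buys nothing.

ANYCAST_NS = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]

# Which POP a given client reaches for each anycast address; in this toy
# world BGP "closest exit" picks the same nearby POP for every address.
NEAREST_POP = {ip: "pop-eu-1" for ip in ANYCAST_NS}

UNHEALTHY_POPS = {"pop-eu-1"}   # the POP has lost its backend connectivity

def query(ns_ip: str) -> str:
    pop = NEAREST_POP[ns_ip]
    return "SERVFAIL" if pop in UNHEALTHY_POPS else f"answer from {pop}"

def resolve_with_ns_failover() -> str:
    # The client hunts across NS addresses, but every address routes to the
    # same local POP, so every attempt fails the same way.
    for ns in ANYCAST_NS:
        result = query(ns)
        if result != "SERVFAIL":
            return result
    return "SERVFAIL (all NS addresses tried)"

print(resolve_with_ns_failover())
# Withdrawing the unhealthy POP's BGP routes would instead change NEAREST_POP,
# so the same anycast addresses would route to a healthy POP.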
It may be that Facebook uses all four name server IP addresses in each edge node. But that effectively kills the essential redundancy of DNS, namely having two or more name servers at separate locations, and the natural consequence is, as you can see, mass disaster.
Even if the four anycasted nameserver IP addresses weren't completely overlapping (let's assume as a hypothetical that a.ns is served out of EU pops, b.ns is served out of NA pops, c.ns is served out of SA pops, and d.ns is served out of APAC pops), if all sites run the same healthcheck code, then if the underlying healthcheck fails, *every site* will decide it is unhealthy, and stop answering requests; so, all the EU sites fail health check and stop serving a.ns; all the North America sites fail health check, and stop serving b.ns...and so forth. You followed the best practices, you had different NS entries that were on different subnets, that were geographically dispersed around the globe, that were redundant for each other. But because they all used the same fundamental health check, they all *independently* decided they were unhealthy and needed to stop giving out DNS answers, and instead let one of the other healthier sites take over.
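The correlated-failure point above can also be shown with a toy sketch. The NS group names, POP names, and the hard-coded "core is unreachable" flag are all invented; the only thing it demonstrates is that identical health-check logic with a shared backend dependency fails everywhere at once, no matter how diverse the addresses and locations are.

# Toy illustration: four regionally separate NS sets, one shared health check.

NS_GROUPS = {
    "a.ns (EU pops)": ["pop-eu-1", "pop-eu-2"],
    "b.ns (NA pops)": ["pop-na-1", "pop-na-2"],
    "c.ns (SA pops)": ["pop-sa-1"],
    "d.ns (APAC pops)": ["pop-ap-1", "pop-ap-2"],
}

CORE_IS_REACHABLE = False   # the shared dependency: the backbone just went away

def serving(pop: str) -> bool:
    # Identical health-check logic deployed at every site.
    return CORE_IS_REACHABLE

for ns, pops in NS_GROUPS.items():
    alive = [p for p in pops if serving(p)]
    print(ns, "->", alive if alive else "no site willing to answer")

# Every group prints "no site willing to answer": geographic and address
# diversity did not help, because the health check itself was the shared
# single point of failure.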
but: 1) dns server in pop serves some content (ttls aren't important right now)
You MUST distinguish TTL and EXPIRE. They are different.
TTL and EXPIRE are irrelevant here. The only thing changing those values would do is change how long it took for caching resolvers to reflect the loss of connectivity at the DNS layer. Once the underlying layer 3 connectivity had broken, DNS answers became meaningless. No matter what records were returned, or cached, you couldn't reach the servers. Yes, yes, as an academic exercise you can point out that there's a difference in how and when those DNS records stop being used, and you're right about that--but in terms of this particular failure, this particular post-mortem we're beating to a horse-shaped pulp, it's entirely meaningless. ^_^;
there's not a lot of magic here... and it's not about the zone data really at all.
Petach's statement, "the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering," means, in DNS terminology, that the zone data is expired, which has nothing to do with TTL.
As you're using my words, I'm going to have to point out that "the DNS servers should stop answering" does not require that any change happens *at the DNS layer* -- in this case, the change can happen at the routing layer, ensuring that even if some caching resolver out there is completely defiant of your expire time, you *will not answer* because the query packets can never reach you in the first place.
Masataka Ohta
Thanks! Matt
Matthew Petach wrote:
With an anycast setup using the same IP addresses in every location, returning SERVFAIL doesn't have the same effect, however, because failing over from anycast address 1 to anycast address 2 is likely to be routed to the same pop location, where the same result will occur.
That's why that is a bad idea. Alternative name servers with different IP addresses should be provided at separate locations. Masataka Ohta
On Tue, Oct 12, 2021 at 8:41 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Matthew Petach wrote:
With an anycast setup using the same IP addresses in every location, returning SERVFAIL doesn't have the same effect, however, because failing over from anycast address 1 to anycast address 2 is likely to be routed to the same pop location, where the same result will occur.
That's why that is a bad idea. Alternative name servers with different IP addresses should be provided at separate locations.
Masataka Ohta
Sure. But that doesn't do anything to help prevent the type of outage that hit Facebook, which was the point I was trying to make in my response. Facebook did use different IP addresses, and it didn't matter, because the underlying health of the network is what was at issue, not the health of the nameservers. I agree with you--different IP addresses should be used in different geographic locations, even with anycast setups. But people need to also recognize that's not a panacea that solves everything, and that it wouldn't have changed the nature of the outage last week. Thanks! :) Matt
Matthew Petach wrote:
With an anycast setup using the same IP addresses in every location, returning SERVFAIL doesn't have the same effect, however, because failing over from anycast address 1 to anycast address 2 is likely to be routed to the same pop location, where the same result will occur.
That's why that is a bad idea. Alternative name servers with different IP addresses should be provided at separate locations.
Sure. But that doesn't do anything to help prevent the type of outage that hit Facebook, which was the point I was trying to make in my response. Facebook did use different IP addresses, and it didn't matter, because the underlying health of the network is what was at issue, not the health of the nameservers.
A possible solution is to force the unbundling of CDN providers and transit providers by antitrust agencies. Then, CDN providers can't pursue efficiency alone at the cost of killing the fundamental redundancy of DNS. For network neutrality, backbone providers *MUST* be neutral for contents they carry. However, CDN providers having their own backbone are using their backbone for contents they prefer, which is *NOT* neutral at all. As such, access/retail providers may pay for peering with neutral backbone providers for their customers but should reject direct peering requests from CDN providers, which actively behave against neutrality.
I agree with you--different IP addresses should be used in different geographic locations, even with anycast setups.
But people need to also recognize that's not a panacea that solves everything, and that it wouldn't have changed the nature of the outage last week.
We should recognize the fundamental difference between independent, thus neutral, backbone providers and CDN providers with anti-neutral backbone of their own. Masataka Ohta
For network neutrality, backbone providers *MUST* be neutral for contents they carry.
However, CDN providers having their own backbone are using their backbone for contents they prefer, which is *NOT* neutral at all.
As such, access/retail providers may pay for peering with neutral backbone providers for their customers but should reject direct peering request from, actively behaving against neutrality, CDN providers.
If I am understanding you correctly, are you arguing that anyone with a network MUST be forced to become a transit provider for anyone else, in the name of "neutrality"? I'll reserve further comment until I make sure I have grasped your point. On Wed, Oct 13, 2021 at 9:28 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Matthew Petach wrote:
With an anycast setup using the same IP addresses in every location, returning SERVFAIL doesn't have the same effect, however, because failing over from anycast address 1 to anycast address 2 is likely to be routed to the same pop location, where the same result will occur.
That's why that is a bad idea. Alternative name servers with different IP addresses should be provided at separate locations.
Sure. But that doesn't do anything to help prevent the type of outage that hit Facebook, which was the point I was trying to make in my response. Facebook did use different IP addresses, and it didn't matter, because the underlying health of the network is what was at issue, not the health of the nameservers.
A possible solution is to force unbundling of CDN providers and transit providers by antitrust agencies.
Then, CDN providers can't pursue efficiency only to kill fundamental redundancy of DNS.
For network neutrality, backbone providers *MUST* be neutral for contents they carry.
However, CDN providers having their own backbone are using their backbone for contents they prefer, which is *NOT* neutral at all.
As such, access/retail providers may pay for peering with neutral backbone providers for their customers but should reject direct peering request from, actively behaving against neutrality, CDN providers.
I agree with you--different IP addresses should be used in different geographic locations, even with anycast setups.
But people need to also recognize that's not a panacea that solves everything, and that it wouldn't have changed the nature of the outage last week.
We should recognize the fundamental difference between independent, thus neutral, backbone providers and CDN providers with anti-neutral backbone of their own.
Masataka Ohta
Tom Beecher wrote:
For network neutrality, backbone providers *MUST* be neutral for contents they carry.
However, CDN providers having their own backbone are using their backbone for contents they prefer, which is *NOT* neutral at all.
As such, access/retail providers may pay for peering with neutral backbone providers for their customers but should reject direct peering request from, actively behaving against neutrality, CDN providers.
If I am understanding you correctly, are you arguing that anyone with a network MUST be forced to become a transit provider for anyone else, in the name of "neutrality"?
No, not at all. For example, CDN (N stands for a network) operators may rely on neutral transit providers to connect their CDN to access/retail providers. But, I certainly mean that CDN operators should not request peering directly to access/retail ISPs merely because they have their own transit, because the transit is not at all neutral. Masataka Ohta
But, I certainly mean that CDN operators should not request peering directly to access/retail ISPs merely because they have their own transit, because the transit is not at all neutral.
I'm still confused. Let's say I have a CDN network, with a datacenter somewhere, an edge site somewhere else. I carry my bits from my datacenter, across my internal network, to my edge site. This is where I intend to hand the bits over to someone else to carry them to the end user. Let's say in this site, I have a paid transit connection , and a peering session directly with the end user's ISP. Where is anything related to neutrality being 'violated', regardless of which path I choose to send the bits out? On Wed, Oct 13, 2021 at 10:36 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Tom Beecher wrote:
For network neutrality, backbone providers *MUST* be neutral for contents they carry.
However, CDN providers having their own backbone are using their backbone for contents they prefer, which is *NOT* neutral at all.
As such, access/retail providers may pay for peering with neutral backbone providers for their customers but should reject direct peering request from, actively behaving against neutrality, CDN providers.
If I am understanding you correctly, are you arguing that anyone with a network MUST be forced to become a transit provider for anyone else, in the name of "neutrality"?
No, not at all.
For example, CDN (N stands for a network) operators may rely on neutral transit providers to connect their CDN to access/retail providers.
But, I certainly mean that CDN operators should not request peering directly to access/retail ISPs merely because they have their own transit, because the transit is not at all neutral.
Masataka Ohta
On Wed, Oct 13, 2021 at 10:56 AM Tom Beecher <beecher@beecher.cc> wrote:
But, I certainly mean that CDN operators should not request
peering directly to access/retail ISPs merely because they have their own transit, because the transit is not at all neutral.
I'm still confused.
Let's say I have a CDN network, with a datacenter somewhere, an edge site somewhere else. I carry my bits from my datacenter, across my internal network, to my edge site. This is where I intend to hand the bits over to someone else to carry them to the end user.
Let's say in this site, I have a paid transit connection , and a peering session directly with the end user's ISP. Where is anything related to neutrality being 'violated', regardless of which path I choose to send the bits out?
It sounds like masataka is saying that the network between your 'datacenter' and 'cdn node' is a 'transit network'. I think 'transit network' is a sentence fragment much like: "bgp peer" .. it's overloaded (in this conversation at least) so probably some more clarity is required in the conversation to progress in a meaningful manner.
On Wed, Oct 13, 2021 at 10:36 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Tom Beecher wrote:
For network neutrality, backbone providers *MUST* be neutral for contents they carry.
However, CDN providers having their own backbone are using their backbone for contents they prefer, which is *NOT* neutral at all.
As such, access/retail providers may pay for peering with neutral backbone providers for their customers but should reject direct peering request from, actively behaving against neutrality, CDN providers.
If I am understanding you correctly, are you arguing that anyone with a network MUST be forced to become a transit provider for anyone else, in the name of "neutrality"?
No, not at all.
For example, CDN (N stands for a network) operators may rely on neutral transit providers to connect their CDN to access/retail providers.
But, I certainly mean that CDN operators should not request peering directly to access/retail ISPs merely because they have their own transit, because the transit is not at all neutral.
Masataka Ohta
Tom Beecher wrote:
But, I certainly mean that CDN operators should not request peering directly to access/retail ISPs merely because they have their own transit, because the transit is not at all neutral.
I'm still confused.
Let's say I have a CDN network, with a datacenter somewhere, an edge site somewhere else. I carry my bits from my datacenter, across my internal network, to my edge site. This is where I intend to hand the bits over to someone else to carry them to the end user.
The problem is that, unlike with neutral transit providers, "the bits" are biased by the CDN provider. Then, access/retail ISPs who also want to supply their own contents, even though they must be neutral to contents provided by neutral transit providers, naturally refuse peering with the anti-neutral CDN providers. Remember that CDN providers are not neutral at all. Masataka Ohta
On 10/13/21 17:24, Masataka Ohta wrote:
The problem is that, unlike neutral transit providers, "the bits" is biased by the CDN provider.
Then, access/retail ISPs who also want to supply their own contents, even though they must be neutral to contents provided by neutral transit providers, naturally refuse peering with the anti-neutral CDN providers.
Remember that CDN providers are not neutral at all.
Well, the purpose of a network is whatever its proprietor deems it to be, provided it makes no false advertising about it. A private enterprise network that carries a company's internal traffic - which may or may not interface with an external network that is interested in some or all of that traffic - would, in your eyes, be classified as not neutral, because it chooses not to use its network to provide global IP Transit?

In my mind, the word "transit" refers to carriage between two non-homogeneous points. So network A (customer) will talk to network C (content) via my network B (transit). If the traffic originates either from A or C, BUT terminates/ends inside of B, I do not consider that transit. I'm unaware of content operators who run their own network and (promise to) provide connectivity between A and C. Mark.
Mark Tinka wrote:
Remember that CDN providers are not neutral at all.
Well, the purpose of a network is whatever its proprietor deems it to be, and makes no false advertising about it.
What?
A private enterprise network that carries a company's internal traffic - which may or may not interface with an external network that is interested in some or all of that traffic - would, in your eyes, be classified as not neutral, because it chooses not to use its network to provide global IP Transit?
Unless they directly reach their end users, yes, of course.
The fundamental problem of networking is the last mile problem: access costs a lot more than backbone. As such, long distance carriers may peer with access providers only when they are neutral or pay some of their revenue share to access providers.
In my mind, the word "transit" refers to carriage between two non-homogeneous points. So network A (customer) will talk to network C (content) via my network B (transit). If the traffic originates either from A or C, BUT terminates/ends inside of B, I do not consider that transit.
With your definition, as CDN providers with their own backbone are not "transit", they cannot request peering from access providers (and, ultimately, end users) without paying something as compensation for the access network cost. Otherwise, CDN providers with their own backbone are free riders ignoring access costs. Masataka Ohta
On 10/16/21 15:44, Masataka Ohta wrote:
What?
I will use my network for what I want my network to do for me. There are no international rules about why a network must be built. Provided that I am clear to those whom I want to connect to my network, I can do what I want with it and not be a bad actor.
Unless they directly reach their end users, yes, of course.
So by your logic, a bank's internal network used to drive its ATM machines is not neutral because one cannot use that network for global IP Transit?
The fundamental problem of networking is the last mile problem that access costs alot more than backbone.
Well, yes and no. While it is true that one of the biggest problems of the Internet is the last mile, it is vital to not be forced into the mistake of classification. For some operators, the "last mile" is the biggest cost. For other networks, the "backbone" is the biggest cost. You can't tell me that US$700 million being spent to build a submarine cable around a continent is something to scoff at. For me, I don't want to hold myself back by classifying "access", "backbone", "metro", e.t.c. Your business model will determine what is costly to you, and what isn't.
As such, long distance carriers may peer with access providers only when they are neutral or pay some of there revenue share to access providers.
Again, you are trying to keep the old Internet (and the classic telephone company model) alive in 2021. That is not how operators work anymore. There are networks that have neither a "backbone" nor an "access network" that do very well, and don't cause anyone else pain, because they are clear in what their model is.
With your definition, as CDN providers with their own backbone are not "transit", they can not request access providers (and, ultimately, end users) peering without paying some as compensation for access network cost.
Otherwise, CDN providers with their own backbone are free riders ignoring access costs.
Okay, so by your logic, "access providers" should pay CDN's for peering, because the CDN's have spent millions building submarine cables and data centres around the world to bring their service to the access providers. After all, why give the access providers a free ride either? In case it's not clear, that last paragraph was sarcastic. It's 2021 - long distance, access, backbone, metro, e.t.c. Those are boxes that don't exist anymore. Let's not refuse the advancement of the model because we can't find a way to make it fit in our old box. Mark.
Mark Tinka wrote:
What?
I will use my network for what I want my network to do for me. There are no international rules about why a network must be built.
As you are seemingly requesting international legal formality, let me point out there are "International Telecommunication Regulations", based on which network neutrality is discussed by ITU.
Unless they directly reach their end users, yes, of course.
So by your logic, a bank's internal network used to drive its ATM machines is not neutral because one cannot use that network for global IP Transit?
No, of course. So?
You can't tell me that US$700 million being spent to build a submarine cable around a continent is something to scoff at.
That cost is negligible compared to the cost to prepare access network all over the continent, I'm afraid.
Okay, so by your logic, "access providers" should pay CDN's for peering, because the CDN's have spent millions building submarine cables and data centres around the world to bring their service to the access providers. After all, why give the access providers a free ride either?
The essential difference is whether they are neutral or not.
In case it's not clear, that last paragraph was sarcastic. It's 2021 - long distance, access, backbone, metro, e.t.c. Those are boxes that don't exist anymore.
Are you saying that there is no such thing as tier 1 ISPs? Masataka Ohta
On 10/18/21 10:11, Masataka Ohta wrote:
As you are seemingly requesting international legal formality, let me point out there are "International Telecommunication Regulations", based on which network neutrality is discussed by ITU.
And since when does the IETF world follow the ITU standards? Even though ITU heads don't think much of IETF heads, you can't find an SDH or DWDM port in a laptop. On the other hand, GMPLS is based on OSPF, IS-IS and RSVP-TE :-).
No, of course. So?
Well, I'll be asking my bank to sell me some IP Transit or DIA, then, since they are running an IP network.
That cost is negligible compared to the cost to prepare access network all over the continent, I'm afraid.
It may be, it may not be. The reason is only one or a small handful of folk are investing US$700 million into a submarine cable. On the other hand, access networks are built by several operators, all competing. So no single operator building an access network is spending more than the content folk laying pipe in the Atlantic and Indian oceans.
The essential difference is whether they are neutral or not.
Well, the issue is you want to label things. I can see this is what is causing your confusion. Willing buyer, willing seller. That's all that's needed. If the seller doesn't like the buyer, they move on. If the buyer doesn't like the seller, they move on.
Are you saying that there is no such thing as tier 1 ISPs?
Hehe, let's not go down that rat hole. But no, I don't believe in "tiers" for service provider networks. Haven't done so in nearly 15 years. Heck, my marketing team are always asking if we can identify ourselves as "Tier 1" because we own and operate a submarine cable. I'm sure you can guess my answer to them... Personally, I don't care whether you are "transit-free" or not. You cannot provide an Internet service to the entire world from just one operator (network or content). So trying to be "bigger" than the other guy is a pointless exercise. Jane + Thatho don't care about your measuring contest. Mark.
Mark Tinka wrote:
As you are seemingly requesting international legal formality, let me point out there are "International Telecommunication Regulations", based on which network neutrality is discussed by ITU.
And since when does the IETF world follow the ITU standards?
As copper and optical fiber for access politically belong to the ITU, the DSL and optical fiber standards of the ITU are followed by the IETF world. I actually joined an ITU meeting in Geneva when I was actively acting for DSL in Japan.
Even though ITU heads don't think much of IETF heads, you can't find an SDH or DWDM port in a laptop. On the other hand, GMPLS is based on OSPF, IS-IS and RSVP-TE :-).
FYI, IS-IS is part of OSI, which was jointly developed by ISO and ITU, not by IETF at all.
Well, I'll be asking my bank to sell me some IP Transit or DIA, then, since they are running an IP network.
Feel free to do so.
It may be, it may not be. The reason is only one or a small handful of folk are investing US$700 million into a submarine cable.
Are you agreeing with me that they are earning a lot more than they should?
On the other hand, access networks are built by several operators, all competing.
Access networks are subject to regional monopoly unless unbundling is forced by regulatory bodies. Worse, with PON, such unbundling is hard (not impossible, see https://ieeexplore.ieee.org/document/5616389).
Willing buyer, willing seller. That's all that's needed. If the seller doesn't like the buyer, they move on. If the buyer doesn't like the seller, they move on.
So, you are a neo-liberalist. Good luck.
Are you saying that there is no such thing as tier 1 ISPs?
Hehe, let's not go down that rat hole.
But no, I don't believe in "tiers" for service provider networks. Haven't done so in nearly 15 years.
Though the precise definition of "tier 1" is a rat hole, the idea that there are entities called tier 1, which are the primary elements of the Internet backbone, is a common concept shared by most of us, maybe excluding you. Masataka Ohta
On 10/18/21 14:16, Masataka Ohta wrote:
As copper and optical fiber for access politically belongs to ITU, DSL and optical fiber standards of ITU are followed by the IETF world.
Yes, but nobody cares about Layer 1 or Layer 2. Once the road is built, all anyone remembers is the car I drove across it, not whether the tar used to build the road was mixed well :-).
I actually joined an ITU meeting at Geneva, when I was actively acting for DSL in Japan.
Good for you. Look, I'm not saying the ITU are bad - I am saying that they are "more structured and rigid" than Internet-land. And that is okay. There is a reason TCP/IP became dominant.
FYI, IS-IS is part of OSI, which was jointly developed by ISO and ITU, not by IETF at all.
You might be forgetting that the IETF adapted IS-IS to IP networks: https://datatracker.ietf.org/doc/html/rfc1195 I'm not sure anyone running IS-IS in an ISP environment, today, is running it for CLNS. But we thank the ISO, immensely :-).
Are you agreeing with me that they are earning a lot more than they should?
I have zero interest in being the profit police. Who am I to tell anyone that they are earning too much? If you make something people find value in, the billions will automatically flow your way - you can't stop it. Is it a perfect system? Probably not, but it's what we've got.
Access networks are subject to regional monopoly unless unbundling is forced by regulatory bodies. Worse, with PON, such unbundling is hard (not impossible, see https://ieeexplore.ieee.org/document/5616389).
Submarine cables are usually either owned by one party, or a small club. It's no different - and trying to be a member of the club can be just as demoralizing as local regulation on terrestrial builds. That said, different markets have different policies on access networks. So a single policy for what we think is best is not practical. Moreover, if access networks are expensive due to backward regulation and monopolistic promotion, then that is an artificial problem that can be removed, but the actors choose not to. You can't blame a content operator for that market position.
So, you are a neo-liberalist. Good luck.
I also like the one where whole gubbermints shutdown the Internet for elections, or to hush voices. I discriminate equally :-).
Though precise definition of "tier 1" is a rat hole, that there are entities called tier 1, which are the primary elements of the Internet backbone, is a common concept shared by most of us, maybe excluding you.
I know many here that have moved on from the "tier" terminologies. But it's unnecessary for them to chime in. There hasn't been "a core of the Internet" for a long while, and anyone still believing that either in reality or words is living in a fantasy world long gone, which is partially why infrastructure finds itself becoming less and less relevant, and being swallowed up by BigContent. I mean, if you missed the fact that Facebook went down, and people thought the Internet had stopped, then maybe Facebook are a Tier 1... Mark.
Mark Tinka wrote:
Yes, but nobody cares about Layer 1 or Layer 2.
As you wrote:
You can't tell me that US$700 million being spent to build a submarine cable around a continent is something to scoff at.
you do care.
Look, I'm not saying the ITU are bad
FYI, I'm not arguing especially for the ITU. But it does have some regulatory influence over its Members.
FYI, IS-IS is part of OSI, which was jointly developed by ISO and ITU, not by IETF at all.
You might be forgetting that the IETF adapted IS-IS to IP networks:
Just as RIP was imported from the XNS world, which does not deny that Xerox and ITU/ISO primarily contributed to developing those protocols.
I have zero interest in being the profit police. Who am I tell anyone that they are earning too much?
Anti-trust agencies, of course.
Access networks are subject to regional monopoly unless unbundling is forced by regulatory bodies. Worse, with PON, such unbundling is hard (not impossible, see https://ieeexplore.ieee.org/document/5616389).
Submarine cables are usually either owned by one party, or a small club.
Submarine cables are for backbone. That's why you must distinguish access and backbone. Masataka Ohta
Otherwise, CDN providers with their own backbone are free riders ignoring access costs.
I think the Pointy Hairs and Bean Counters would love it if they could ignore all the monthly bills for the access costs that we generate. On Sat, Oct 16, 2021 at 9:46 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Mark Tinka wrote:
Remember that CDN providers are not neutral at all.
Well, the purpose of a network is whatever its proprietor deems it to be, and makes no false advertising about it.
What?
A private enterprise network that carries a company's internal traffic - which may or may not interface with an external network that is interested in some or all of that traffic - would, in your eyes, be classified as not neutral, because it chooses not to use its network to provide global IP Transit? Unless they directly reach their end users, yes, of course.
The fundamental problem of networking is the last mile problem that access costs alot more than backbone.
As such, long distance carriers may peer with access providers only when they are neutral or pay some of there revenue share to access providers.
In my mind, the word "transit" refers to carriage between two non-homogeneous points. So network A (customer) will talk to network C (content) via my network B (transit). If the traffic originates either from A or C, BUT terminates/ends inside of B, I do not consider that transit.
With your definition, as CDN providers with their own backbone are not "transit", they can not request access providers (and, ultimately, end users) peering without paying some as compensation for access network cost.
Otherwise, CDN providers with their own backbone are free riders ignoring access costs.
Masataka Ohta
On 10/13/21 07:34, Masataka Ohta wrote:
But, I certainly mean that CDN operators should not request peering directly to access/retail ISPs merely because they have their own transit, because the transit is not at all neutral.
I'm not sure that I understand this. Peering is rarely if ever neutral. It's almost always "My network and customers only to your network and customers only." CDNs and their customers (content providers) peering with ISPs and their customers (eyeballs) seems to me to be a win-win. Access/retail ISPs should want to peer with CDNs as it greatly reduces their transport costs. CDNs will want to peer with access/retail ISPs for the same reason. Specifically what is the objection to CDNs peering with access ISPs? -- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
Jay Hennigan wrote:
Access/retail ISPs should want to peer with CDNs as it greatly reduces their transport costs.
Not at all. Access/retail ISPs have no problem peering with neutral backbone providers. A CDN-provided backbone only reduces the costs of other backbone providers without reducing the costs of access/retail ISPs. Worse, peering beyond neutral providers costs more for access/retail providers. Masataka Ohta
On 10/16/21 06:48, Masataka Ohta wrote:
Jay Hennigan wrote:
Access/retail ISPs should want to peer with CDNs as it greatly reduces their transport costs.
Not at all.
Access/retail ISPs have no problem by peering with neutral backbone providers.
Neutral backbone providers don't peer with access/retail ISPs. They sell transit to them.
CDN provided backbone only reduces costs of other backbone providers without reducing costs of access/retail ISPs.
Access/retail ISPs that peer with CDNs eliminate the cost of paying for transit for the content delivered by the CDN. That's what the initials CDN stand for. Access/retail ISPs that peer with CDNs don't reduce the costs of backbone providers, they reduce their profits. Those backbone providers no longer are charging to deliver the content provided by the CDNs. The retail/access ISPs are getting it direct at no charge from the CDN by peering. It also reduces the cost to the content provider as they no longer are paying a transit provider to deliver it. It also often increases the reliability of the Internet experience by creating a more direct path.
Worse, peering beyond neutral providers costs more for access/retail providers.
I think you are mistaken. Every gigabyte delivered by peering is a gigabyte that the access/retail ISP isn't paying a transit provider to deliver. -- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
Jay Hennigan wrote:
Access/retail ISPs have no problem by peering with neutral backbone providers.
Neutral backbone providers don't peer with access/retail ISPs. They sell transit to them.
FYI, that is called paid peering.
CDN provided backbone only reduces costs of other backbone providers without reducing costs of access/retail ISPs.
Access/retail ISPs that peer with CDNs eliminate the cost of paying for transit for the content delivered by the CDN. That's what the initials CDN stand for.
But it does not mean both parties to the peering are equally benefited. As such, the peering may be a paid one, though that may not be the current practice. Given the observed profitability of CDN providers, CDN providers are, seemingly, more benefited (because they are not neutral), in which case CDN providers should pay access/retail ISPs. Masataka Ohta
* mohta@necom830.hpcl.titech.ac.jp (Masataka Ohta) [Sun 17 Oct 2021, 11:17 CEST]:
Jay Hennigan wrote:
Neutral backbone providers don't peer with access/retail ISPs. They sell transit to them.
FYI, that is called paid peering.
Can you please please please stop posting nonsense? -- Niels.
On Sun, 17 Oct 2021 at 11:16, Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Jay Hennigan wrote:
Access/retail ISPs have no problem by peering with neutral backbone providers.
Neutral backbone providers don't peer with access/retail ISPs. They sell transit to them.
FYI, that is called paid peering.
Paid peering is not the same product as IP Transit. In general, a packet never traverses two peering links, because that would mean the middle man is not getting paid to move the traffic. Paid peering with a backbone provider will get you routes from their paying customers but not from their peers, the same as you would get from settlement-free peering.
CDN provided backbone only reduces costs of other backbone providers without reducing costs of access/retail ISPs.
Access/retail ISPs that peer with CDNs eliminate the cost of paying for transit for the content delivered by the CDN. That's what the initials CDN stand for.
But, it does not mean both parties of the peer are equally benefited. As such peering may be paid one, though it may not be the current practice.
Given the observed profitability of CDN providers, CDN providers are, seemingly, more benefited (because they are not neutral), in which case, CDN providers should pay to access/retail ISPs.
Masataka Ohta
I do not want Netflix to pay me. I get paid by my customers, some of whom also happen to be Netflix customers. If Netflix had to pay me, they would need to get that money from the same people who are already paying me directly. What is the point of that? Let me tell you the point. Large ISPs can exploit their domination of the market to double dip, which means they want to be paid twice. That happens to be not neutral and is a way to make the customer pay a hidden fee. For smaller ISPs it works the other way around. An evil CDN could attempt to charge us, the small ISP. I am happy that is not happening. Regards Baldur
Baldur Norddahl wrote:
Neutral backbone providers don't peer with access/retail ISPs. They sell transit to them.
FYI, that is called paid peering.
Paid peering is not the same product as IP Transit. In general a packet never traverse two peering links because that would mean the middle man is not getting paid to move the traffic.
So, there is terminology confusion because, these days, many people distinguish transit from peering without a precise understanding of the peering situations.
Paid peering with a backbone provider will get you routes from their paying customers but not from their peers.
That argument may be applicable to the simplest cases of so-called peering between leaf ISPs, and of transit peering (here, "transit peering" seems to be a proper terminology accepted by most) between leaf ISPs and upper-level ISPs. But with settlement-free peering between tier 1 ISPs, tier 2 ISPs having transit/paid peering with a tier 1 ISP will receive routes from the peers of that tier 1 ISP. There is transit traffic exchanged between tier 1 ISPs over settlement-free peering. So I don't think distinguishing transit from peering is meaningful for precise discussions.
I do not want Netflix to pay me.
You are so generous.
Let me tell you the point. Large ISP can exploit their domination of the marked to double dip, which means they want to be paid twice. That happens to be not neutral and is a way to make the customer pay a hidden fee.
For smaller ISPs it works the other way around. An evil CDN could attempt to charge us, the small ISP. I am happy that is not happening.
Because of natural monopoly and PON, most access/retail ISPs enjoy their domination in their own area regardless of their sizes. Masataka Ohta
On Mon, 18 Oct 2021 at 09:51, Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
But, with settlement free peering between tier 1 ISPs, tier 2 ISPs having transit/paid peering with a tier 1 ISP will receive routes from peers of the tier 1 ISP. There is transit traffic exchanged between tier 1 ISPs over settlement free peering.
So, I don't think distinguishing transit from peering meaningful for precise discussions.
Around here there are certain expectations if you sell a product called IP Transit and other expectations if you call the product paid peering. The latter is not providing the whole internet and is cheaper.

The so-called "tier" of a company is a meaningless term. Traffic will never traverse two settlement free peering links and this is true for "tier 1" ISPs as well. Paid peering is understood to be the same as settlement free peering except for not being settlement free. Therefore paid peering with a "tier 1" ISP will not provide any traffic that traverses their settlement free peering links with other "tier 1" ISPs. It is quite possible some "tier 1" ISPs do not see the point in providing such a product, but then they just won't offer paid peering - only IP transit.

In more technical terms, no peering link, settlement free or for pay, has routes for the whole internet. If the peering had routes for the whole internet it would be IP transit. This is achieved by only announcing your own customer routes on the peering links and _not_ announcing routes received from other peering links. You get access to their customers but you need to make other arrangements to get access to the rest of the internet.
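To put that route-announcement point in concrete terms, here is a small sketch of the export decision. The Route/session model is invented for illustration (it is not any particular BGP implementation's API); the idea is just that a peer, paid or settlement-free, only ever sees the routes learned from your own customers, while a transit customer sees everything.

# Sketch of the export policy described above: peers (paid or settlement-free)
# get only the customer cone; transit customers get the full table.

from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    learned_from: str   # "customer", "peer", or "upstream"

def export_routes(routes: list[Route], session_type: str) -> list[Route]:
    if session_type == "transit-customer":
        # A transit customer gets the full table we have.
        return list(routes)
    if session_type in ("settlement-free-peer", "paid-peer"):
        # Peers, paid or not, only see routes learned from our own customers.
        return [r for r in routes if r.learned_from == "customer"]
    raise ValueError(f"unknown session type: {session_type}")

rib = [
    Route("203.0.113.0/24", "customer"),
    Route("198.51.100.0/24", "peer"),
    Route("192.0.2.0/24", "upstream"),
]

for kind in ("transit-customer", "paid-peer"):
    exported = [r.prefix for r in export_routes(rib, kind)]
    print(kind, "->", exported)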
For smaller ISPs it works the other way around. An evil CDN could attempt to charge us, the small ISP. I am happy that is not happening.
Because of natural monopoly and PON, most access/retail ISPs enjoy their domination in their own area regardless of their sizes.
This is not true in our part of the world. The regulator is requiring all major last mile infrastructure owners to give access to reseller ISPs, breaking that monopoly. My own company both owns infrastructure (FTTH and FTTB / apartment networks) and resells using FTTH / DSL owned by other companies. Plus we have three 5G networks providing an alternative and also breaking the monopoly. Regards, Baldur
On Mon, Oct 18, 2021 at 10:30 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Around here there are certain expectations if you sell a product called IP Transit and other expectations if you call the product paid peering. The latter is not providing the whole internet and is cheaper.
The problem with paid peering is that it creates a conflict of interest which corruptly influences the company's behavior. Two customers are paying you in full for a service but if one elects not to pay you will also deny or degrade the service to the other one who has, in fact, paid you. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On Mon, Oct 18, 2021 at 11:16 AM William Herrin <bill@herrin.us> wrote:
On Mon, Oct 18, 2021 at 10:30 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Around here there are certain expectations if you sell a product called IP Transit and other expectations if you call the product paid peering. The latter is not providing the whole internet and is cheaper.
The problem with paid peering is that it creates a conflict of interest which corruptly influences the company's behavior. Two customers are paying you in full for a service but if one elects not to pay you will also deny or degrade the service to the other one who has, in fact, paid you.
The phrase "paying you in full" is the stumbling point with your claim. As Baldur noted, "paid peer [...] is not providing the whole internet and is cheaper." If the two customers are "paying you in full", then they're paying you for transit, and as such, they get a copy of the full tables, regardless of how you learn those routes, whether through a paid relationship or a settlement free relationship. If the two customers are *not* paying full price, but are instead paying the reduced price for "paid peering", then they each recognize that the set of prefixes they are receiving, and the spread of their prefixes in return are inherently limited, *and will change over time as the customer relationships on each side change." Nobody buying "paid peering" expects the list of prefixes sent and received across those sessions to remain constant forever. That would imply no new customers are ever added, and would imply no customers ever leave, which is clearly unreasonable in the real world. If you, as the customer paying for paid peering, see the list of prefixes decreasing over time, when the contract comes up for renewal, you are likely to argue for a lower price, or may decide it's no longer worth it, and decide to not renew the relationship. On the other hand, if you, as the provider, are increasing the number of prefixes being seen across those paid peerings at a substantial rate, when the next renewal cycle comes up, you may decide the price for paid peering should go up, because you're providing more value across those sessions. Each side evaluates the then-present set of prefixes being exchanged when the contract comes up for renewal, to decide if it's still worth it or not. But if you're "paying in full" for IP transit, then the sessions should include as much of the full BGP table as possible, potentially including a default route, and the promise of that session is to make your prefixes as visible to the entire rest of the Internet as possible. (This is, as a small aside, why I don't think Cogent should be allowed to label their product "IP transit" so long as they are willfully refusing to propagate their customer's prefixes to *all* of the rest of the Internet. So long as they are choosing to cherry-pick out certain networks that they will *not* propagate their customers routes to, they are *not* providing true IP transit, and should not label it as such.)
Regards, Bill Herrin
Thanks! Matt
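To make the distinction Matt is drawing concrete, here is a minimal, hypothetical sketch of the usual customer/peer/provider export conventions: transit customers receive everything and have their routes sent everywhere, while peers, paid or settlement-free, only exchange customer-learned routes. It is a toy model for illustration, not any vendor's policy language and not a claim about how any specific network is configured.

```python
# Toy model of the export policies under discussion. The relationship names
# and the function itself are illustrative only; real policies live in
# router configuration and are far more nuanced.

def may_export(learned_from: str, sending_to: str) -> bool:
    """Return True if a route learned from a neighbor of type `learned_from`
    may be advertised to a neighbor of type `sending_to`, following the
    common "valley-free" customer/peer/provider conventions."""
    if sending_to == "customer":
        # Transit customers get the full table: customer-, peer- and
        # provider-learned routes alike.
        return True
    # Peers (paid or settlement-free) and upstream providers only hear
    # routes learned from our own customers.
    return learned_from == "customer"
```

Under that model, the set of prefixes exchanged across a paid peering session is exactly the two sides' customer cones at that moment, which is the "will change over time" point above.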
On Mon, Oct 18, 2021 at 11:47 AM Matthew Petach <mpetach@netflight.com> wrote:
On Mon, Oct 18, 2021 at 11:16 AM William Herrin <bill@herrin.us> wrote:
On Mon, Oct 18, 2021 at 10:30 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Around here there are certain expectations if you sell a product called IP Transit and other expectations if you call the product paid peering. The latter is not providing the whole internet and is cheaper.
The problem with paid peering is that it creates a conflict of interest which corruptly influences the company's behavior. Two customers are paying you in full for a service but if one elects not to pay you will also deny or degrade the service to the other one who has, in fact, paid you.
The phrase "paying you in full" is the stumbling point with your claim.
As Baldur noted, "paid peer [...] is not providing the whole internet and is cheaper."
Since peering customers can only reach transit customers, it follows that one of the customers in the equation is a fully-paid transit customer. That fully paid customer's service is degraded or denied unless the peering customer also pays. Hence the conflict of interest. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On Mon, Oct 18, 2021 at 1:17 PM William Herrin <bill@herrin.us> wrote:
On Mon, Oct 18, 2021 at 11:16 AM William Herrin <bill@herrin.us> wrote:
On Mon, Oct 18, 2021 at 10:30 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Around here there are certain expectations if you sell a product called IP Transit and other expectations if you call the product paid peering. The latter is not providing the whole internet and is cheaper.
The problem with paid peering is that it creates a conflict of interest which corruptly influences the company's behavior. Two customers are paying you in full for a service but if one elects not to pay you will also deny or degrade the service to the other one who has, in fact, paid you.
On Mon, Oct 18, 2021 at 11:47 AM Matthew Petach <mpetach@netflight.com> wrote:
The phrase "paying you in full" is the stumbling point with your claim.
As Baldur noted, "paid peer [...] is not providing the whole internet and is cheaper."
Since peering customers can only reach transit customers, it follows that one of the customers in the equation is a fully-paid transit customer. That fully paid customer's service is degraded or denied unless the peering customer also pays. Hence the conflict of interest.
I'm sorry. :( I'm feeling particularly dense this morning, so I'm going to work through the two cases very slowly to make sure I understand.

Customer A is a full transit-paying customer.

In case 1, Customer B is a full transit-paying customer also. Customer A announces their prefixes to ISP; as a transit customer, ISP promises to announce those prefixes to everyone they have a BGP relationship with, including customer B. Likewise, ISP provides a full BGP table, including default if requested, to Customer A, ensuring Customer A can reach Customer B, and Customer B can reach Customer A.

In case 2, Customer B is a paid peering customer. Customer A announces their prefixes to ISP; as a transit customer, the ISP promises to announce those prefixes to everyone they have a BGP relationship with, including Customer B. Likewise, ISP provides a full BGP table, including default if requested, to Customer A, ensuring Customer A can reach Customer B, and Customer B can reach Customer A.

I'm not seeing how Customer B's status as paid peer versus transit customer changes either the set of prefixes Customer A sees, or the spread of Customer A's prefixes to the rest of the Internet. In short--the amount Customer B is paying or not paying does not change the view of prefixes that Customer A sees, nor does it change the propagation scope of Customer A's prefixes. As neither of those two things change, I'm completely failing to see how Customer A's service is being degraded or denied based on Customer B's choices.

Can you explain what it is I'm missing here? ^_^;

Regards,
Bill Herrin
Thanks! Matt
On Mon, Oct 18, 2021 at 1:47 PM Matthew Petach <mpetach@netflight.com> wrote:
On Mon, Oct 18, 2021 at 1:17 PM William Herrin <bill@herrin.us> wrote:
Since peering customers can only reach transit customers, it follows that one of the customers in the equation is a fully-paid transit customer. That fully paid customer's service is degraded or denied unless the peering customer also pays. Hence the conflict of interest.
Customer A is a full transit-paying customer. In case 2, Customer B is a paid peering customer.
Can you explain what it is I'm missing here? ^_^;
The part where customer A is paying for a connection to "the Internet" at some data rate, which includes the network run by customer B. REGARDLESS of whether B pays the same service provider. If the service rendered to A is changed by B's payment (or lack), that's a conflict of interest. To remove the conflict of interest, you either have to fiddle the definition of what customer A is buying, turning it into something that would not be obvious to an ordinary person, or you have to allow B to engage in settlement free peering if they want to. Or, counterintuitively, pay your own transit provider enough to handle any capacity A and B together care to consume. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
Baldur Norddahl wrote:
Around here there are certain expectations if you sell a product called IP Transit and other expectations if you call the product paid peering.
That some word is used as marketing hype with an intentionally self-contradicting definition is not my problem at all.
The so-called "tier" of a company is a meaningless term.
I have been using that term properly, which means it is meaningful if people use it properly. So is "peering". Sabri Berisha wrote:
The term "network neutrality" was invented by people who want to control a network owned and paid for by someone else.
Excellent theory. Masataka Ohta
On Wed, Oct 13, 2021 at 6:26 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Matthew Petach wrote:
With an anycast setup using the same IP addresses in every location, returning SERVFAIL doesn't have the same effect, however, because failing over from anycast address 1 to anycast address 2 is likely to be routed to the same pop location, where the same result will occur.
That's why that is a bad idea. Alternative name servers with different IP addresses should be provided at separate locations.
Sure. But that doesn't do anything to help prevent the type of outage that hit Facebook, which was the point I was trying to make in my response. Facebook did use different IP addresses, and it didn't matter, because the underlying health of the network is what was at issue, not the health of the nameservers.
A possible solution is to force unbundling of CDN providers and transit providers by antitrust agencies.
Other people have already spoken to the misunderstanding or misuse of the terms "CDN provider" and "transit provider" in this case. I'd like to take a moment to point out the other problem with this sentence, which is "antitrust agencies".

One of the key aspects to both CDN providers and transit providers is they tend to be multi-national organizations with infrastructure in multiple countries on multiple continents. A CDN provider that only exists in one city is a hosting company, not a CDN. A transit provider that only provides network connectivity in one city, or one state, isn't a very valuable transit provider, since the implicit (and sometimes explicit) promise the transit network is making to their customers is that they will carry their IP traffic to the rest of the world, ensuring as best as they can that their prefixes are visible to others, and that their packets are carried to other networks, wherever they may be. You won't be terribly successful as a transit provider if your business model is to "carry traffic for your customers all the way to the edges of the city", or "carry your traffic anywhere within the country it needs to go, but discard it if it needs to go outside the country."

So, given that both our CDN provider and our transit network provider operate in more than one country, what "antitrust agency" would have jurisdiction over the CDN provider and the transit provider that could force unbundling of their services? What if every country the CDN provider and the transit provider operate in has a different definition of what it means to "unbundle" the services?

Then, CDN providers can't pursue efficiency only to kill
fundamental redundancy of DNS.
For network neutrality, backbone providers *MUST* be neutral for contents they carry.
Nothing at all requires backbone providers to be neutral. Backbone networks are free to restrict what traffic or content passes across their networks. Indeed, many backbone providers include in their terms of service lists of traffic that they reserve the right to block or discard. Most of the time, those clauses are focused on traffic which may be injurious to the backbone network or the systems that support it; but even DDoS traffic which isn't itself injurious to the backbone, but does impact other customers, may be dropped at the backbone providers' discretion.
We should recognize the fundamental difference between
independent, thus neutral, backbone providers and CDN providers with anti-neutral backbone of their own.
Others have, I think, already addressed more directly their fundamental disagreement with that statement. ^_^;
Masataka Ohta
Thanks! :) Matt
Matthew Petach wrote:
I'd like to take a moment to point out the other problem with this sentence, which is "antitrust agencies".
One of the key aspects to both CDN providers and transit providers is they tend to be multi-national organizations with infrastructure in multiple countries on multiple continents.
Your theory that multi-national entities cannot be targets of anti-trust agencies of individual countries and can enjoy worldwide oligopoly is totally against reality. Masataka Ohta
----- On Oct 17, 2021, at 4:50 AM, Masataka Ohta mohta@necom830.hpcl.titech.ac.jp wrote: Hi,
Matthew Petach wrote:
One of the key aspects to both CDN providers and transit providers is they tend to be multi-national organizations with infrastructure in multiple countries on multiple continents.
Your theory that multi-national entities can not be targets of anti-trust agencies of individual countries and can enjoy world wide oligopoly is totally against the reality.
At face value, your statement is correct. In context, it is unrealistic.

Government anti-trust intervention is nothing less than the government interfering in private business. In most civilized countries, that requires a strong legal basis, as the government is essentially infringing on private property, which is protected in most Constitutions. Therefore, anti-trust intervention is only considered in markets where there is a relatively small number of competitors and this lack of competition harms the consumer, or when one or more dominant parties use their position to force smaller companies into unreasonable compliance with their wishes.

The CDN market has multiple competitors, and the barrier to entering the market is relatively low, as you don't have any last-mile issues or difficult-to-get government license requirements. And let's not even begin to talk about anti-trust for content providers; on just my Roku I have Netflix, Disney+, Hulu, Amazon Prime, Discovery+, FandangoNow (although they moved into something else I think), NatGeo+, Sling TV, Nickelodeon, and a bunch more that I can't even remember. Plenty of competition there.

Thanks, Sabri
Sabri Berisha wrote:
Therefore, anti-trust intervention is only considered in markets where there are a relatively small amount of competitors and this lack of competition harms the consumer, or when one or more dominant parties use their position to force smaller companies into unreasonable compliance with their wishes.
Didn't network neutrality become an issue because "one or more dominant parties use their position to force smaller companies into unreasonable compliance with their wishes"?
The CDN market has multiple competitors, and the barrier to entry the market is relatively low as you don't have any last-mile issues or difficult-to-get government license requirements.
To enter the market competitively, you must have a large number of servers at many locations, I think. Masataka Ohta
----- On Oct 18, 2021, at 1:40 AM, Masataka Ohta mohta@necom830.hpcl.titech.ac.jp wrote:
Sabri Berisha wrote:
Therefore, anti-trust intervention is only considered in markets where there are a relatively small amount of competitors and this lack of competition harms the consumer, or when one or more dominant parties use their position to force smaller companies into unreasonable compliance with their wishes.
Didn't network neutrality become an issue because "one or more dominant parties use their position to force smaller companies into unreasonable compliance with their wishes"?
The term "network neutrality" was invented by people who want to control a network owned and paid for by someone else. Your version of "unreasonable" and my version of "unreasonable" are on the opposite end of the spectrum. I think it is unreasonable for you to tell me how to run configure my routers, and you think it is unreasonable for me to configure my routers that I pay for the way that I want to. Net neutrality is just a fancy word for "I don't like the fifth"*.
The CDN market has multiple competitors, and the barrier to entry the market is relatively low as you don't have any last-mile issues or difficult-to-get government license requirements.
To enter the market competitively, you must have large number of servers at many locations, I think.
Hence the "relatively low". It is far easier to start a CDN than it is to start a residential internet service. At least here in the U.S. Thanks, Sabri * The fifth, besides the right to remain silent, also contains the takings clause.
On 10/18/21 11:09 AM, Sabri Berisha wrote:
The term "network neutrality" was invented by people who want to control a network owned and paid for by someone else.
Your version of "unreasonable" and my version of "unreasonable" are on the opposite end of the spectrum. I think it is unreasonable for you to tell me how to run configure my routers, and you think it is unreasonable for me to configure my routers that I pay for the way that I want to.
Yeahbut, for the last mile that network is often a monopoly or maybe a duopoly if you're lucky. If streaming provider 1 pays ISP to give priority over streaming provider 2 -- maybe by severely rate limiting provider 2 -- the people who get screwed are end users without a way to vote with their feet. That sort of monopolistic behavior is bad for end users. Mostly I want ISP's to be dumb bit providers and stay out of shady deals that enrich ISP's at my expense. And if it takes regulation to do that, bring it. Mike
" to give priority" Assuming priority is given. It's going to be very rare for their to be both only one ISP and no other ISPs able to be motivated to be present. ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com ----- Original Message ----- From: "Michael Thomas" <mike@mtcc.com> To: nanog@nanog.org Sent: Monday, October 18, 2021 1:51:50 PM Subject: Re: DNS pulling BGP routes? On 10/18/21 11:09 AM, Sabri Berisha wrote:
The term "network neutrality" was invented by people who want to control a network owned and paid for by someone else.
Your version of "unreasonable" and my version of "unreasonable" are on the opposite end of the spectrum. I think it is unreasonable for you to tell me how to run configure my routers, and you think it is unreasonable for me to configure my routers that I pay for the way that I want to.
Yeahbut, for the last mile that network is often a monopoly or maybe a duopoly if you're lucky. If streaming provider 1 pays ISP to give priority over streaming provider 2 -- maybe by severely rate limiting provider 2 -- the people who get screwed are end users without a way to vote with their feet. That sort of monopolistic behavior is bad for end users. Mostly I want ISP's to be dumb bit providers and stay out of shady deals that enrich ISP's at my expense. And if it takes regulation to do that, bring it. Mike
----- On Oct 18, 2021, at 11:51 AM, Michael Thomas mike@mtcc.com wrote: Hi,
On 10/18/21 11:09 AM, Sabri Berisha wrote:
The term "network neutrality" was invented by people who want to control a network owned and paid for by someone else.
Your version of "unreasonable" and my version of "unreasonable" are on the opposite end of the spectrum. I think it is unreasonable for you to tell me how to run configure my routers, and you think it is unreasonable for me to configure my routers that I pay for the way that I want to.
Yeahbut, for the last mile that network is often a monopoly or maybe a duopoly if you're lucky. If streaming provider 1 pays ISP to give priority over streaming provider 2 -- maybe by severely rate limiting provider 2 -- the people who get screwed are end users without a way to vote with their feet. That sort of monopolistic behavior is bad for end users. Mostly I want ISP's to be dumb bit providers and stay out of shady deals that enrich ISP's at my expense. And if it takes regulation to do that, bring it.
I totally agree. 100%. Now we just have to agree on the regulation that we're talking about.

My idea of regulation in this context is to get rid of the monopoly/duopoly so that users actually do have a way out and can vote with their feet. From that perspective, the NBN model isn't that bad (not trying to start an NBN flamewar here). But, I would be opposed to regulation that prevents a network operator from going into enable mode.

There are more reasons than "government intervention into a privately owned network" / "network neutrality" to want more competition. Lower prices and better service, for example. Have you ever tried calling Comcast/Spectrum?

I'd love to get involved (privately, not professionally) in a municipal broadband project where I live. We have 1 fiber duct for the entire town. That got cut last year, and literally everyone was without internet access for many hours. We don't need net neutrality. We need competition. The FCC sucks, and so does the CPUC.

Thanks, Sabri
On 10/18/21 12:22 PM, Sabri Berisha wrote:
----- On Oct 18, 2021, at 11:51 AM, Michael Thomas mike@mtcc.com wrote:
Hi,
The term "network neutrality" was invented by people who want to control a network owned and paid for by someone else.
Your version of "unreasonable" and my version of "unreasonable" are on the opposite end of the spectrum. I think it is unreasonable for you to tell me how to run configure my routers, and you think it is unreasonable for me to configure my routers that I pay for the way that I want to. Yeahbut, for the last mile that network is often a monopoly or maybe a duopoly if you're lucky. If streaming provider 1 pays ISP to give
On 10/18/21 11:09 AM, Sabri Berisha wrote: priority over streaming provider 2 -- maybe by severely rate limiting provider 2 -- the people who get screwed are end users without a way to vote with their feet. That sort of monopolistic behavior is bad for end users. Mostly I want ISP's to be dumb bit providers and stay out of shady deals that enrich ISP's at my expense. And if it takes regulation to do that, bring it. I totally agree. 100%. Now we just have to agree on the regulation that we're talking about.
My idea of regulation in this context is to get rid of the monopoly/duopoly so that users actually do have a way out and can vote with their feet. From that perspective, the NBN model isn't that bad (not trying to start an NBN flamewar here).
But, I would be opposed to regulation that prevents a network operator from going into enable mode.
There are more reasons than "government intervention into a privately owned network" / "network neutrality" to want more competition. Lower prices and better service, for example. Have you ever tried calling Comcast/Spectrum?
I'd love to get involved (privately, not professionally) in a municipal broadband project where I live. We have 1 fiber duct for the entire town. That got cut last year, and literally everyone was without internet access for many hours. We don't need net neutrality. We need competition. The FCC sucks, and so does the CPUC.
I know that there are a lot of risks with hamfisted gubbermint regulations. But even when StarLink turns the sky into perpetual daylight and we get another provider, there are going to still be painfully few choices, and too often the response to $EVIL is not "oh great, more customers for us!" but "oh great, let's do that too!". Witness airlines and the race to the bottom with various fees -- and that's in a field where there is plenty of competition.

This is obviously complicated and one of the complications is QoS in the last mile. DOCSIS has a lot of QoS machinery so that MSO's could get CBR like flows for voice back in the day. I'm not sure whether this ever got deployed because as is often the case, brute force and ignorance (ie, make the wire faster) wins, mooting the need. Is there even a constructive use of QoS in the last mile these days that isn't niche? Maybe gaming? Would any sizable set of customers buy it if it were offered?

If there isn't, a regulation that just says "don't cut deals to prioritize one traffic source at the expense of others" seems pretty reasonable, and probably reflects the status quo anyway.

Mike
----- On Oct 18, 2021, at 12:40 PM, Michael Thomas mike@mtcc.com wrote:
On 10/18/21 12:22 PM, Sabri Berisha wrote:
I totally agree. 100%. Now we just have to agree on the regulation that we're talking about.
My idea of regulation in this context is to get rid of the monopoly/duopoly so that users actually do have a way out and can vote with their feet. From that perspective, the NBN model isn't that bad (not trying to start an NBN flamewar here).
I know that there are a lot of risks with hamfisted gubbermint regulations. But even when StarLink turns the sky into perpetual daylight and we get another provider, there are going to still be painfully few choices, and too often the response to $EVIL is not "oh great, more customers for us!" but "oh great, let's do that too!".
That's the point where MBAs take over from engineering to squeeze every last penny out of the customer. And that usually happens when a company gets large.
Witness airlines and the race to the bottom with various fees -- and that's in a field where there is plenty of competition.
For the most part: yes. But, that's also where the success of Southwest comes from. They generally don't take part in that kind of bovine manure.
This is obviously complicated and one of the complications is QoS in the last mile. DOCSIS has a lot of QoS machinery so that MSO's could get CBR like flows for voice back in the day. I'm not sure whether this ever got deployed because as is often the case, brute force and ignorance (ie, make the wire faster) wins, mooting the need. Is there even a constructive use of QoS in the last mile these days that isn't niche? Maybe gaming? Would any sizable set of customers buy it if it were offered?
It's been a few years since I've worked for a residential service provider, but to the best of my memory, congestion was rarely found in the last mile.
If there isn't, a regulation that just says "don't cut deals to prioritize one traffic source at the expense of others" seems pretty reasonable, and probably reflects the status quo anyway.
But again, now you are interfering in how I operate my network. Let's say I have two options:

1. Accept one million from Netflix to prioritize their traffic and set my residential internet pricing to $50; or
2. Be subjected to government regulations that prohibit me from accepting said funds and set my residential internet pricing to $100 to cover costs.

Isn't it up to me to make that decision? The government should not need to have any say in this matter. And note my careful wording, because in the current market, they do need to have a say. My point is: the market should be open enough that if a sub disagrees with their ISP's technical choices, they should be able to switch. It's government regulation that makes that extremely difficult, if not impossible.

But, I don't want to pollute the list any further and I've made my points, so I shall grant you the last word publicly :)

Thanks, Sabri
On 10/18/21 1:51 PM, Sabri Berisha wrote:
I know that there are a lot of risks with hamfisted gubbermint regulations. But even when StarLink turns the sky into perpetual daylight and we get another provider, there are going to still be painfully few choices, and too often the response to $EVIL is not "oh great, more customers for us!" but "oh great, let's do that too!".
That's the point where MBAs take over from engineering to squeeze every last penny out of the customer. And that usually happens when a company gets large.
So what's the counter? I mean, MSO's already pull that kind of shitty behavior with their "fees" cloaked as taxes. Maybe a better argument is that this is all theoretical since to my knowledge it's not being done on any large scale, so let's not fix theoretical problems.
This is obviously complicated and one of the complications is QoS in the last mile. DOCSIS has a lot of QoS machinery so that MSO's could get CBR like flows for voice back in the day. I'm not sure whether this ever got deployed because as is often the case, brute force and ignorance (ie, make the wire faster) wins, mooting the need. Is there even a constructive use of QoS in the last mile these days that isn't niche? Maybe gaming? Would any sizable set of customers buy it if it were offered?
It's been a few years since I've worked for a residential service provider, but to the best of my memory, congestion was rarely found in the last mile.
That's what I figured. I remember talking to some Sprint architect types around the same time when I told them all of their insistence on AAL2 was useless because voice was going to be a drop in the bucket. They looked at me as if I was completely insane. Mike
On Sun, Oct 17, 2021 at 4:54 AM Masataka Ohta < mohta@necom830.hpcl.titech.ac.jp> wrote:
Matthew Petach wrote:
I'd like to take a moment to point out the other problem with this sentence, which is "antitrust agencies".
One of the key aspects to both CDN providers and transit providers is they tend to be multi-national organizations with infrastructure in multiple countries on multiple continents.
Your theory that multi-national entities can not be targets of anti-trust agencies of individual countries and can enjoy world wide oligopoly is totally against the reality.
*facepalm*

No, the point I was making wasn't that they can't be the target of antitrust agencies; the point was that there are so many conflicting jurisdictions that consistent enforcement in a coordinated fashion is impossible. We can't even get countries to agree on what a copyright or a trademark means, or even what privacy rights a person should have.

I know one content distribution company that was originally thinking of putting a site in country X; however, after taking a closer look at the laws in country X, decided instead to put the site in a nearby country with more favourable laws and to interconnect with the network providers just outside country X, thus putting them outside the reach of those laws.

It's really, *really* hard to "regulate" global infrastructure because it crosses over/under/through so many different jurisdictions; if one country decides to put considerably stronger restrictions in place, the reaction by and large is to 'route around the damage' so to speak. The lack of success from Brasil's efforts is a good indication of just how successful per-country regulation of internet providers tends to be: https://www.networkworld.com/article/2175352/brazil-to-drop-requirement-that...

The GDPR is probably the most successful effort at reining in global internet companies in recent years, and even there, when companies ignore it, the resulting fines are a small slap on the wrist at best, hardly causing them to change their behaviours: https://secureprivacy.ai/blog/gdpr-the-6-biggest-fines-enforced-by-regulator...

Even the $5 billion fine Facebook paid to the FTC after the Cambridge Analytica scandal was really only a $106M fine, with an extra $4.9B thrown in to make the personal lawsuit go away: https://www.politico.com/news/2021/09/21/facebook-paid-billions-extra-to-the...

When companies can afford to throw an extra 50x the money at a regulatory agency to make a problem go away, it's pretty clear that thinking that regulatory agencies are going to have enough teeth to fundamentally change the way of life of those businesses is optimistic at best.

Looking at the top 15 antitrust cases in the US, you can see how in many cases, the antitrust action was minimally effective in the long term, as the companies that were split up often ended up rejoining again, years down the line: https://stacker.com/stories/3604/15-companies-us-government-tried-break-mono...
Masataka Ohta
Matt
On Fri, Oct 8, 2021 at 10:04 AM Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Fri, Oct 8, 2021 at 10:22 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
The end result was that our DNS servers became unreachable even though they were still operational.
means their DNS servers were serving the zone, even after they recognized their zone data were too old, that is, expired.
that's not what this means.
Give it up man. Masataka knows more about how Facebook implemented DNS than people who actually worked there. He will tell them (and us) what their public statements really mean. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
Nice document. In section 2.5 Routing, this is written: Distributing Authoritative Name Servers via Shared Unicast Addresses... organizations implementing these practices should always provide at least one authoritative server which is not a participant in any shared unicast mesh. Could it be that having the NS a,b in one mesh and c,d in another was a mistake? -----Original Message----- From: NANOG <nanog-bounces+jean=ddostest.me@nanog.org> On Behalf Of Masataka Ohta Sent: October 7, 2021 11:27 AM To: Bjørn Mork <bjorn@mork.no> Cc: nanog@nanog.org Subject: Re: DNS pulling BGP routes? Bjørn Mork wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast. You don't know what you're talking about.
I do but you don't.
https://datatracker.ietf.org/doc/html/rfc4786#section-4.4.1
Not a mistake. BCP.
My comment on the rfc is that it is simply wrong. See also: https://datatracker.ietf.org/doc/html/rfc3258 While it would be possible to have some process withdraw the route for a specific server instance when it is not available, there is considerable operational complexity involved in ensuring that this occurs reliably. Given the existing DNS failover methods, the marginal improvement in performance will not be sufficient to justify the additional complexity for most uses. which was our consensus at that time in DNSOP. I have no idea why it was forgotten. Masataka Ohta
On Oct 7, 2021, at 6:25 PM, Jean St-Laurent via NANOG <nanog@nanog.org> wrote:
Nice document.
In section 2.5 Routing, this is written:
Distributing Authoritative Name Servers via Shared Unicast Addresses...
organizations implementing these practices should always provide at least one authoritative server which is not a participant in any shared unicast mesh.
This was superstition, brought forward from 1992 by the folks who were yelling “damned kids get offa my lawn” at the time. There’s no reason to include a unicast address in an NS set in the 21st century, and plenty of reasons not to (since it’ll be very difficult to load-balance with the rest of the servers). But one should NEVER NEVER depend on a single administrative or technical authority for all your NS records. That’s what shot Facebook in the foot, they were trying to do it all themselves, so when they shot themselves in the foot, they only had the one foot, and nothing left to stand on. Whereas other folks shoot themselves in the foot all the time, and nobody notices, because they paid attention to the spirit of RFC 2182. -Bill
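A quick way to see whether a delegation leans on a single administrative authority, in the spirit Bill describes, is to look at where the NS names themselves live. Below is a rough sketch using dnspython; the library choice and the "same registered domain" heuristic are assumptions made for illustration, not a complete RFC 2182 audit.

```python
#!/usr/bin/env python3
"""Crude check for NS-set diversity. "Same registered domain" is only a
rough proxy for "same administrative and technical authority", so treat
the output as a hint, not a verdict."""

import sys
import dns.resolver  # pip install dnspython


def registered_domain(name):
    """Very rough: keep the last two labels ("a.ns.facebook.com" -> "facebook.com")."""
    labels = name.rstrip(".").split(".")
    return ".".join(labels[-2:]).lower()


def check_ns_diversity(zone):
    nameservers = [rr.target.to_text() for rr in dns.resolver.resolve(zone, "NS")]
    parents = {registered_domain(ns) for ns in nameservers}
    print(f"{zone}: {len(nameservers)} NS records under "
          f"{len(parents)} parent domain(s): {sorted(parents)}")
    if len(parents) == 1:
        # Every listed name server hangs off one organization's namespace;
        # if that organization's network disappears, so does the delegation.
        print("  -> all name servers share one administrative namespace")


if __name__ == "__main__":
    check_ns_diversity(sys.argv[1] if len(sys.argv) > 1 else "facebook.com")
```

Run against facebook.com in October 2021, a check like this would have shown every name server under the one namespace that had just vanished from the routing table; a zone with servers spread across unrelated operators would have kept answering from somewhere.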
Well said Bill. I agree with you about having all your tech/adm records + registrar on the same NS... especially for your OOB domain. Probably what killed them. They lost access to their fb-00b-net-mgmt.io cool dns name network. It just went from bad to worse when they realized that they also lost physical access to the building. We all learned a lot and we're still learning. Jean -----Original Message----- From: Bill Woodcock <woody@pch.net> Sent: October 7, 2021 12:45 PM To: Jean St-Laurent <jean@ddostest.me> Cc: Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>; Bjørn Mork <bjorn@mork.no>; nanog@nanog.org Subject: Re: DNS pulling BGP routes? This was superstition, brought forward from 1992 by the folks who were yelling “damned kids get offa my lawn” at the time. There’s no reason to include a unicast address in an NS set in the 21st century, and plenty of reasons not to (since it’ll be very difficult to load-balance with the rest of the servers). But one should NEVER NEVER depend on a single administrative or technical authority for all your NS records. That’s what shot Facebook in the foot, they were trying to do it all themselves, so when they shot themselves in the foot, they only had the one foot, and nothing left to stand on. Whereas other folks shoot themselves in the foot all the time, and nobody notices, because they paid attention to the spirit of RFC 2182. -Bill
On Oct 7, 2021, at 06:49 , Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
William Herrin wrote:
This is quite common to tie an underlying service announcement to BGP announcements in an Anycast or similar environment.
Yes, that is a commonly seen mistake with anycast. You don't know what you're talking about.
I do but you don't.
If your anycast node stops receiving updated data and you can't reach any of the other nodes to check whether they're online, 99 times out of 100 this means a local failure of some sort.
Yes. In case of DNS, if the expiration period of a zone passes without a successful check of the most current zone version, unicast or anycast name servers stop responding to requests for the zone.
But, it has nothing specifically to do with anycast. As there are other name servers with different IP addresses, there is no reason to withdraw routes. So?
WRONG. First, assuming that there are non-anycast name servers assumes facts not in evidence. Second, if you are a participant in an anycast name server network, there are good reasons to withdraw your announcement of that prefix in order to avoid users having to wait for timeouts (which in some cases might be even worse than serving stale data).
You withdraw the node's announcement so that you don't serve bad data to the end user.
That will only introduce new failure modes of mismatches between server availability and server reachability and is a bad idea.
No, if the server is available, it should announce the anycast prefix. If it is not available, it should withdraw it. That’s the best way to make anycast work and it’s what virtually every anycast DNS server network does. If the server is unavailable, but doesn’t withdraw, then you have the failure mode of the server being reachable, but unavailable and it becomes a black hole for traffic that should otherwise flow to other available anycast nodes.
That's what happened here -
Yes, facebook did wrong thing to actively withdraw routes.
No, facebook did the right thing for 99+% of situations that would trigger this withdraw. The problem was that they withdrew EVERY server when the failure wasn’t local instead of having some way to recognize the failure for what it was, global in nature and continue serving DNS.
Simply turning themselves off, instead of withdrawing the routes, would result in suboptimal performance.
This time, facebook is saying that they could not reach their name servers even though the servers were perfectly working.
Because their servers couldn’t verify that they were working and thus thought that they had stale data. Thus, the servers were “perfectly working” with stale data and the safe thing to do if you can’t confirm that your reason for believing you have stale data is erroneous, is to stop serving what you have. If you’re not going to serve what you have, then you shouldn’t announce the anycast prefix, either.
How much performance, do you think, facebook enjoyed? A lot less than "suboptimal", I'm afraid.
As noted, this was that 1% failure that isn’t anticipated. The behavior of the system was correct for 99% of failures and the number of years facebook has operated without a significant or noticeable DNS outage is testament to that fact.
And 99 times out of 100, not doing one or the other would cause rather than prevent an outage.
That is a commonly seen misconception wrongly assuming that server routes were withdrawn if and only if the server is unavailable.
The servers withdrew their routes because the servers had no ability to verify that they were serving valid data. If you can’t verify your data is valid, it’s better (in most cases) to not serve the data you have. If you’re not going to serve, the best thing to do is withdraw the anycast prefix that claims you are a server for the data.
But, the reality is that it is impossible to correctly recognize that a server is unavailable or to correctly withdraw routes only when the server is unavailable.
Yes… So you go with something that works 99% of the time and you get an event like this in that 1% of cases where the failure in question was not one of the failure modes that was previously anticipated. I’m betting that facebook is quickly figuring out changes that will mitigate this type of failure in the future and their DNS will likely stay up until the next 1 in 100 (or will it be 1 in 10,000 this time?) events pops up that surprised them again. That’s the nature of operations. Owen
Owen DeLong wrote:
But, it has nothing specifically to do with anycast. As there are other name servers with different IP addresses, there is no reason to withdraw routes. So?
WRONG.
First, assuming that there are non-anycast name servers assumes facts not in evidence.
There is no such assumption.
Second, if you are a participant in an anycast name server network, there are good reasons to withdraw your announcement of that prefix
You completely misunderstand my points. See other posts of mine. Masataka Ohta
On Wed, Oct 6, 2021 at 10:45 AM Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine they are sub-optimal (fsvo). I can certainly understand the DNS servers not giving answers they think are unreachable, but there is always the problem that they may be partitioned and not the routes themselves. At a minimum, I would think they'd need some consensus protocol that says that it's broken across multiple servers.
But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems really strange and fraught with unintended consequences. Why is it a good idea to withdraw the route if it doesn't seem reachable from the DNS server? Give answers that are reachable, sure, but to actually make a topology decision? Yikes. And what happens to the cached answers that still point to the supposedly dead route? They're going to fail until the TTL expires anyway so why is it preferable to withdraw the route too?
My guess is that their post, while more clear than most, doesn't go into enough detail, but is it me or does it seem like this is a really weird thing to do?
Mike
Hi Mike,

You're kinda thinking about this from the wrong angle. It's not that the route is withdrawn if it doesn't seem reachable from the DNS server. It's that your DNS server is geolocating requests to the nearest content delivery cluster, where the CDN cluster is likely fetching content from a core datacenter elsewhere. You don't want that remote/edge CDN node to give back A records for a CDN node that is isolated from the rest of the network and can't reach the datacenter to fetch the necessary content; otherwise, you'll have clients that reach the page, can load the static elements on the page, but all the dynamic elements hang, waiting for a fetch to complete from the origin which won't ever complete. Not a very good end user experience.

So, the idea is that if the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering queries for A records with the local CDN node's address, and let a different site respond back to the client's DNS request. In particular, you really don't want the client to even send the request to the edge CDN node that's been isolated, you want to allow anycast to find the next-best edge site; so, once the DNS servers fail the "can-I-reach-my-datacenter" health check, they stop announcing the Anycast service address to the local routers; that way, they drop out of the Anycast pool, and normal Internet routing will ensure the client DNS requests are now sent to the next-nearest edge CDN cluster for resolution and retrieving data.

This works fine for ensuring that one or two edge sites that get isolated due to fiber cuts don't end up pulling client requests into them, and subsequently leaving the users hanging, waiting for data that will never arrive. However, it fails big-time if *all* sites fail their "can-I-reach-the-datacenter" check simultaneously.

When I was involved in the decision making on a design like this, a choice was made to have a set of "really core" sites in the middle of the network always announce the anycast prefixes, as a fallback, so even if the routing wasn't optimal to reach them, the users would still get *some* level of reply back. In this situation, that would have ensured that at least some DNS servers were reachable; but it wouldn't have fixed the "oh crap we pushed 'no router bgp' out to all the routers at the same time" type problem. But that isn't really the core of your question, so we'll just quietly push that aside for now. ^_^;

Point being--it's useful and normal for edge sites that may become isolated from the rest of the network to be configured to stop announcing the Anycast service address for DNS out to local peers and transit providers at that site during the period in which they are isolated, to prevent users from being directed to CDN servers which can't fetch content from the origin servers in the datacenter. It's just generally assumed that not every site will become "isolated" at the same time like that. :)

I hope this helps clear up the confusion.

Thanks!

Matt
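For readers who have never run one of these setups, here is a minimal sketch of the withdraw-on-health-check-failure pattern described above, written against ExaBGP's process API (the script prints announce/withdraw lines that ExaBGP turns into BGP updates). The prefix, the check target and the check itself are placeholders, not anything Facebook has documented; real deployments add dampening, logging and the kind of always-announce fallback sites mentioned above.

```python
#!/usr/bin/env python3
"""Sketch of an anycast DNS health check driving route announcement.

Assumptions (not from the thread): ExaBGP's process API is in use,
192.0.2.53/32 stands in for the anycast service address, and "healthy"
means the site can still open a TCP connection to a core datacenter
endpoint. Purely illustrative.
"""

import socket
import sys
import time

ANYCAST_PREFIX = "192.0.2.53/32"                 # placeholder anycast address
CORE_CHECK = ("core-origin.example.net", 443)    # hypothetical datacenter target
INTERVAL = 5                                     # seconds between checks


def datacenter_reachable(target, timeout=2):
    """Return True if we can open a TCP connection to the core datacenter."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:
        return False


def main():
    announced = False
    while True:
        healthy = datacenter_reachable(CORE_CHECK)
        if healthy and not announced:
            # Site can reach the origin again: rejoin the anycast pool.
            print(f"announce route {ANYCAST_PREFIX} next-hop self", flush=True)
            announced = True
        elif not healthy and announced:
            # Site is isolated from the core: stop attracting client queries.
            print(f"withdraw route {ANYCAST_PREFIX}", flush=True)
            announced = False
        time.sleep(INTERVAL)


if __name__ == "__main__":
    sys.exit(main())
```

The failure mode being discussed drops straight out of a sketch like this: if the "can-I-reach-my-datacenter" check fails at every site at once, every site withdraws, and the anycast service address disappears from the DFZ entirely.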
----- On Oct 6, 2021, at 10:42 AM, Michael Thomas mike@mtcc.com wrote: Hi,
My guess is that their post while more clear that most doesn't go into enough detail, but is it me or does it seem like this is a really weird thing to do?
In large environments, it's not uncommon to have DNS servers announce themselves on an anycast IP. This is also referred to as "host BGP". Basically, the host (or hypervisor) speaks BGP with the TOR. Your spines or superspines will then pick a best route or ECMP across multiple DNS servers. My guess is that Facebook took this concept a step further and anycasted their public DNS servers through their datacenters to the internet. One single config change made the DNS servers think that they were no longer functioning properly which caused them to withdraw the routes. At least, that's what I understand from the post-mortem. Thanks, Sabri
On Wed, 6 Oct 2021, Michael Thomas wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). I can certainly understand for the DNS servers to not give answers they think are unreachable but there is always the problem that they may be partitioned and not the routes themselves. At a minimum, I would think they'd need some consensus protocol that says that it's broken across multiple servers.
But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems
Everything I've seen posted about this (whether from Facebook directly, or others) is so vague as to what happened, that I think everyone's just making assumptions based on their own experiences or best guesses as to what really happened.

In that vein, imagine you have dozens of small sites acting as anycast origins for DNS. Each regularly does some network health tests to determine if its links to the rest of the (region|backbone|world|etc.) are working within defined parameters. If the health test fails, the site needs to be removed from anycast until the network health issue is resolved. You're big, like automating things, and feel the need for speed, so when the health test fails, rather than trigger an alarm which your NOC may or may not act on in a timely manner, the local anycast origin routes are automatically suppressed from propagating beyond the site.

Just suppose you pushed out a new network health test that was guaranteed to fail in every POP...and you pushed it out to every POP. All of a sudden, your anycast routes aren't advertised anywhere.

Is this what happened? I really have no clue. It sounds like something like this might have happened. Unless someone at Facebook shares an actual detailed account of what they broke, most of us will never know what really happened.

---------------------------------------------------------------------- Jon Lewis, MCP :) | I route StackPath, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
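As a thought experiment only (this is not how Facebook says it works), the "consensus protocol" Michael asked about could be as simple as a guard that refuses to withdraw when the failure looks global rather than local. A hypothetical sketch, with made-up peer-POP health endpoints:

```python
#!/usr/bin/env python3
"""Sketch of a "don't all jump at once" guard. The peer-POP health URLs and
their JSON format are invented for illustration. The idea: before a site
pulls its anycast announcement, it asks a few sibling POPs whether they
still look healthy. If no peer is reachable or every peer is also failing,
the problem is probably global (or the health test itself is broken), and
serving possibly-degraded DNS beats withdrawing everywhere at once."""

import json
import urllib.request

PEER_HEALTH_URLS = [                              # hypothetical sibling POPs
    "http://pop-ams.health.example.net/status",
    "http://pop-sjc.health.example.net/status",
    "http://pop-sin.health.example.net/status",
]


def peer_is_healthy(url, timeout=2):
    """Return True if the peer POP reports itself healthy, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("healthy", False)
    except (OSError, ValueError):
        return False  # unreachable or malformed peers count as "not known healthy"


def should_withdraw(local_check_failed):
    """Withdraw only for failures that look local, not global."""
    if not local_check_failed:
        return False
    healthy_peers = sum(peer_is_healthy(u) for u in PEER_HEALTH_URLS)
    # If at least one sibling still looks fine, the problem is probably ours,
    # and withdrawing shifts users to the healthy sites. If nobody looks
    # healthy, fail open and keep announcing rather than blackhole DNS globally.
    return healthy_peers > 0
```

The trade-off is real in both directions: failing open can leave an isolated POP serving stale or unreachable answers, while failing closed, as this thread shows, can take the whole service off the Internet at once.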
On 06/10/2021 22:38, Jon Lewis wrote:
But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems
Everything I've seen posted about this (whether from Facebook directly, or others) is so vague as to what happened, that I think everyone's just making assumptions based on their own experiences or best guesses as to what really happened.
Better question is why do we not see any FB netadmins on NANOG? I'm not talking about October 2021 but rather over the past 3-5 years how many FB techies have posted here like we see people from Google, Cloudflare, Akamai, etc.? -Hank
On 10/7/21 08:26, Hank Nussbacher wrote:
Better question is why do we not see any FB netadmins on NANOG? I'm not talking about October 2021 but rather over the past 3-5 years how many FB techies have posted here like we see people from Google, Cloudflare, Akamai, etc.?
They are likely here, but BigContent does not really endorse talking about their operations in public fora, typically without PR/Legal OK. For those who talk about stuff, it's either stuff that is already public, publicly-known, or in their own capacity not representing their employer. Mark.
Something public that we know now, is that it's possible to totally shut down facebook and restart it. Can we shutdown the full internet one day and see if it will restart properly without too much hack here and there? Jean -----Original Message----- From: NANOG <nanog-bounces+jean=ddostest.me@nanog.org> On Behalf Of Mark Tinka Sent: October 7, 2021 2:31 AM To: nanog@nanog.org Subject: Re: DNS pulling BGP routes? On 10/7/21 08:26, Hank Nussbacher wrote:
Better question is why do we not see any FB netadmins on NANOG? I'm not talking about October 2021 but rather over the past 3-5 years how many FB techies have posted here like we see people from Google, Cloudflare, Akamai, etc.?
They are likely here, but BigContent does not really endorse talking about their operations in public fora, typically without PR/Legal OK. For those who talk about stuff, it's either stuff that is already public, publicly-known, or in their own capacity not representing their employer. Mark.
On 10/7/21 13:18, Jean St-Laurent wrote:
Something public that we know now, is that it's possible to totally shut down facebook and restart it.
Can we shutdown the full internet one day and see if it will restart properly without too much hack here and there?
I think one thing that I learned from this Facebook outage is that maintaining a steady supply of electricity to computing and networking gear under spool-up load is not a small problem to scoff at. We could shut down the entire Internet, and power companies will probably love us. But they will hate us a tad more as we reboot it. Mark.
On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo).
The servers' IP addresses are anycasted. When one data center determines itself to be malfunctioning, it withdraws the routes so that users will reach a different data center that is, in theory, still functioning. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On 10/6/21 2:33 PM, William Herrin wrote:
On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo).
The servers' IP addresses are anycasted. When one data center determines itself to be malfunctioning, it withdraws the routes so that users will reach a different data center that is, in theory, still functioning.
Ah, I was wondering if the anycast part was the relevant bit. But doesn't it seem odd that it would be intertwined with the DNS infrastructure? Mike
On Wed, 6 Oct 2021, Michael Thomas wrote:
On 10/6/21 2:33 PM, William Herrin wrote:
On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo).
The servers' IP addresses are anycasted. When one data center determines itself to be malfunctioning, it withdraws the routes so that users will reach a different data center that is, in theory, still functioning.
Ah, I was wondering if the anycast part was the relevant bit. But doesn't it seem odd that it would be intertwined with the DNS infrastructure?
People have been anycasting DNS server IPs for years (decades?). So, no. ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route StackPath, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
On 10/6/21 2:58 PM, Jon Lewis wrote:
On Wed, 6 Oct 2021, Michael Thomas wrote:
On 10/6/21 2:33 PM, William Herrin wrote:
On Wed, Oct 6, 2021 at 10:43 AM Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine are sub-optimal (fsvo). The servers' IP addresses are anycasted. When one data center determines itself to be malfunctioning, it withdraws the routes so that users will reach a different data center that is, in theory, still functioning.
Ah, I was wondering if the anycast part was the relevant bit. But doesn't it seem odd that it would be intertwined with the DNS infrastructure?
People have been anycasting DNS server IPs for years (decades?). So, no.
But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense that they'd pull out if they couldn't contact the backend. But I thought that almost all of their routes to the backend were pulled? That is, the DFZ was emptied of FB routes. Mike
On Wed, 6 Oct 2021, Michael Thomas wrote:
People have been anycasting DNS server IPs for years (decades?). So, no.
But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense that they'd pull out if they couldn't contact the backend. But I thought that almost all of their routes to the backend were pulled? That is, the DFZ was emptied of FB routes.
Well, as someone else said, DNS wasn't the problem...it was just one of the more noticeable casualties. Whatever they did broke the network rather completely, and that took out all of their DNS, which broke lots of other things that depend on DNS. ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route StackPath, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
On 10/6/21 3:33 PM, Jon Lewis wrote:
On Wed, 6 Oct 2021, Michael Thomas wrote:
People have been anycasting DNS server IPs for years (decades?). So, no.
But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense that they'd pull out if they couldn't contact the backend. But I thought that almost all of their routes to the backend were pulled? That is, the DFZ was emptied of FB routes.
Well, as someone else said, DNS wasn't the problem...it was just one of the more noticeable casualties. Whatever they did broke the network rather completely, and that took out all of their DNS, which broke lots of other things that depend on DNS.
Maybe the problem here is that two things happened and the article conflated the two: the DNS infrastructure pulled its routes from the anycast address and something else pulled all of the other routes but wasn't mentioned in the article. Mike
On Wed, 6 Oct 2021, Michael Thomas wrote:
On 10/6/21 3:33 PM, Jon Lewis wrote:
On Wed, 6 Oct 2021, Michael Thomas wrote:
People have been anycasting DNS server IPs for years (decades?). So, no.
But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense that they'd pull out if they couldn't contact the backend. But I thought that almost all of their routes to the backend were pulled? That is, the DFZ was emptied of FB routes.
Well, as someone else said, DNS wasn't the problem...it was just one of the more noticeable casualties. Whatever they did broke the network rather completely, and that took out all of their DNS, which broke lots of other things that depend on DNS.
Maybe the problem here is that two things happened and the article conflated the two: the DNS infrastructure pulled its routes from the anycast address and something else pulled all of the other routes but wasn't mentioned in the article.
From the engineering.fb.com article:
"This was the source of yesterday’s outage. During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally." If you kill the backbone, and every site determines "my connectivity is hosed, suppress anycast propagation.", then you simultaneously have no network, and no anycast (which might otherwise propagate to transit/peers at each or at least some subset of your sites). All of your internal data and communication systems that rely on both network and working DNS suddenly don't work, so internal communications likely degraded to engineers calling or texting each other.
From one of the earlier articles, it sounds like they don't have true out of band access to their routers/switches, which makes it kind of hard to fix the network, if it's no longer a network and you have no access to console or management ports.
---------------------------------------------------------------------- Jon Lewis, MCP :) | I route StackPath, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
On 10/7/21 00:37, Michael Thomas wrote:
Maybe the problem here is that two things happened and the article conflated the two: the DNS infrastructure pulled its routes from the anycast address and something else pulled all of the other routes but wasn't mentioned in the article.
The origin problem was some "automation thingy" that went to check capacity status around the network ahead of some planned maintenance work, and that "automation thingy" decided checking was not enough, let's just turn the whole thing off. Mark.
On 10/7/21 00:22, Michael Thomas wrote:
But it wasn't just their DNS subnets that were pulled, I thought. I'm obviously really confused. Anycast to a DNS server makes sense that they'd pull out if they couldn't contact the backend. But I thought that almost all of their routes to the backend were pulled? That is, the DFZ was emptied of FB routes.
During the outage, we kept serving traffic to Facebook in various locations. So it would seem that while a large amount of their NLRI left the DFZ, it wasn't all of them. However, what was left was not sufficient to actually keep typical services up, including their DNS. Mark.
On 05/10/2021 21:11, Randy Monroe via NANOG wrote:
Updated: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
Let's try to break down this "engineering" blog posting:

- "During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network"

Can anyone guess what command FB issued that would cause them to withdraw all those prefixes?

- "it was not possible to access our data centers through our normal means because their networks were down, and second, the total loss of DNS broke many of the internal tools we’d normally use to investigate and resolve outages like this. Our primary and out-of-band network access was down..."

Does this mean that FB acknowledges that the loss of DNS broke their OOB access?

-Hank
On 10/6/21 06:51, Hank Nussbacher wrote:
- "During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network"
Can anyone guess as to what command FB issued that would cause them to withdraw all those prefixes?
Hard to say, as it seems the command itself was innocent enough, perhaps running a batch of sub-commands to check port status, bandwidth utilization, MPLS-TE values, etc. However, it sounds like an unforeseen bug in the command ran other things, or the cascade of how the sub-commands were run caused unforeseen problems. We shall guess at this one forever, as I doubt Facebook will go into that much detail.

What I can tell you is that all the major content providers spend a lot of time, money and effort automating both capacity planning and capacity auditing. It's a bit more complex for them, because their variables aren't just links and utilization, but also locations, fibre availability, fibre pricing, capacity lease pricing, the presence of carrier-neutral data centres, the presence of exchange points, current vendor equipment models and pricing, projections of future fibre and capacity pricing, etc. It's a totally different world from normal ISP-land.
- "it was not possible to access our data centers through our normal means because their networks were down, and second, the total loss of DNS broke many of the internal tools we’d normally use to investigate and resolve outages like this. Our primary and out-of-band network access was down..."
Does this mean that FB acknowledges that the loss of DNS broke their OOB access?
I need to put my thinking cap on, but I'm not sure whether running DNS in the IGP would have been better in this instance.

We run our Anycast DNS network in our IGP, mainly to always guarantee latency-based routing, but also to ensure that the failure of a higher-level protocol like BGP does not disconnect internal access that is needed for troubleshooting and repair. Given that the IGP is a much lower-level routing protocol, it's more likely (though not guaranteed) that it would not go down with BGP. In the past we have, indeed, had BGP issues during which we maintained internal DNS access because the IGP was unaffected.

The final statement from that report is interesting:

"From here on out, our job is to strengthen our testing, drills, and overall resilience to make sure events like this happen as rarely as possible."

... which, in my rudimentary translation, means:

"There are no guarantees that our automation software will not poop cows again, but we hope that when that does happen, we shall be able to send our guys out to site much more quickly."

... which, to be fair, is totally understandable.

These automation tools, especially in large networks such as BigContent, are significantly more fragile the more complex they get, and the more batch tasks they need to perform on various parts of a network of this size and scope. It's a pity these automation tools are all homegrown and can't be bought "pre-packaged and pre-approved to never fail" from the IT Software Store down the road. But it's the only way for networks of this capacity to operate, and the risk they always sit with for being that large.

Mark.
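Since the thread keeps coming back to what an "assess the availability of global backbone capacity" job might have looked like, here is a deliberately generic sketch of the read-only, dry-run guard such automation can carry. Every device name, threshold and helper function in it is hypothetical and stands in for whatever telemetry and provisioning hooks a real tool would use; it is not Facebook's tooling:

#!/usr/bin/env python3
# Hypothetical capacity-audit sketch: collect link utilization and
# only *report*; any state-changing action requires an explicit flag.
import argparse

def get_link_utilization(device):
    """Placeholder: return {link_name: utilization_fraction} for a device."""
    return {"backbone-1": 0.42, "backbone-2": 0.87}   # made-up numbers

def drain_link(device, link):
    """Placeholder for a state-changing action (costing out a link)."""
    print(f"DRAINING {device}/{link}")

def main():
    parser = argparse.ArgumentParser(description="backbone capacity audit (sketch)")
    parser.add_argument("--commit", action="store_true",
                        help="actually act on findings; default is report-only")
    args = parser.parse_args()

    for device in ["bb01.example", "bb02.example"]:      # hypothetical inventory
        for link, util in get_link_utilization(device).items():
            print(f"{device} {link}: {util:.0%} utilized")
            if util > 0.80 and args.commit:
                # Without --commit this branch can never touch the network.
                drain_link(device, link)

if __name__ == "__main__":
    main()

The only point of the sketch is the guard: an audit that cannot change state without an explicit flag can't disconnect a backbone by accident, whatever its sub-commands end up doing.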
Hank Nussbacher wrote:
- "it was not possible to access our data centers through our normal means because their networks were down, and second, the total loss of DNS broke many of the internal tools we'd normally use to investigate and resolve outages like this. Our primary and out-of-band network access was down..."
Does this mean that FB acknowledges that the loss of DNS broke their OOB access?
It means FB still does not understand what happened. The lack of a BGP announcement does not mean "total loss"; the name servers should still have been accessible by internal tools.

But withdrawing the route (for BGP and, maybe, the IGP) of a failing anycast server is bad engineering, seemingly derived from the commonly seen misunderstanding that anycast by itself provides redundancy. Redundancy of DNS is maintained by multiple (unicast or anycast) name servers with different addresses, for which withdrawal of a failing route is an unnecessary complication.

Masataka Ohta
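The redundancy model being argued for here is simply that resolvers iterate over all of a zone's name-server addresses and move on when one doesn't answer, so a dead server does not need its route withdrawn for the zone to keep resolving. A rough sketch of that client behaviour using the dnspython library; the queried name and server addresses are placeholders, not Facebook's:

#!/usr/bin/env python3
# Sketch: query a zone's name servers one by one and use the first
# that answers. Server addresses are examples only.
import dns.exception
import dns.message
import dns.query

NAME = "www.example.com."
NAMESERVERS = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]  # illustrative a/b/c nameservers

def resolve(name, servers, timeout=2.0):
    query = dns.message.make_query(name, "A")
    for server in servers:
        try:
            # If this server is dead (e.g. its route is gone), the query
            # times out and we simply move on to the next address.
            response = dns.query.udp(query, server, timeout=timeout)
            return server, response.answer
        except (dns.exception.Timeout, OSError):
            continue
    raise RuntimeError("no name server answered")

if __name__ == "__main__":
    server, answer = resolve(NAME, NAMESERVERS)
    print(f"answered by {server}:")
    for rrset in answer:
        print(rrset)

As long as the zone lists several working addresses, clients route around a dead server on their own, which is the point being made above.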
On 10/4/21 10:42 PM, William Herrin wrote:
On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <mike@mtcc.com> wrote:
They have a monkey patch subsystem. Lol.

Yes, actually, they do. They use Chef extensively to configure operating systems. Chef is written in Ruby. Ruby has something called Monkey Patches: at an arbitrary location in the code you re-open a class defined elsewhere and change its methods.
Chef doesn't always do the right thing. You tell Chef to remove an RPM and it does. Even if it has to remove half the operating system to satisfy the dependencies. If you want it to do something reasonable, say throw an error because you didn't actually tell it to remove half the operating system, you have a choice: spin up a fork of chef with a couple patches to the chef-rpm interaction or just monkey-patch it in one of your chef recipes.
Just because a language allows monkey patching doesn't mean that you should use it. In that particular outage, they said that they fix up errant-looking config files rather than throw an error and make somebody fix them. That is an extremely bad practice and frankly looks like amateur hour to me. Mike
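For readers who haven't met the term: the Ruby mechanism described above re-opens a class defined elsewhere and swaps its methods at runtime. The same idea exists in Python, used here only to keep the thread's examples in one language; everything in this toy is made up and merely stands in for third-party code such as a config-management tool's package resource:

#!/usr/bin/env python3
# Toy monkey patch: replace a method on a class we don't own at runtime.
# PackageManager is a stand-in for someone else's library, not a real API.

class PackageManager:                      # pretend this lives in a third-party package
    def remove(self, name, deps):
        print(f"removing {name} and {len(deps)} dependencies")

# Monkey patch: anywhere in *our* code, swap in a safer version.
_original_remove = PackageManager.remove

def _guarded_remove(self, name, deps):
    if len(deps) > 10:
        # Instead of silently ripping out half the OS, refuse and make a human decide.
        raise RuntimeError(f"refusing to remove {name}: {len(deps)} dependents")
    return _original_remove(self, name, deps)

PackageManager.remove = _guarded_remove    # every existing and future instance is now changed

if __name__ == "__main__":
    pm = PackageManager()
    pm.remove("editor", deps=["plugin"])                      # allowed
    try:
        pm.remove("libc", deps=[f"pkg{i}" for i in range(50)])  # refused
    except RuntimeError as err:
        print(err)

The appeal and the danger are the same thing: the change applies globally, far from where the class was defined, which is why the posts above are wary of leaning on it.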
On 10/4/21 17:58, jcurran@istaff.org wrote:
Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr <https://m.facebook.com/nt/screen/?params={"note_id":10158791436142200}&path=/notes/note/&_rdr>
I believe that the above link refers to a previous outage. The duration of the outage doesn't match today's, the technical explanation doesn't align very well, and many of the comments reference earlier dates.
Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/ <https://blog.cloudflare.com/october-2021-facebook-outage/>
This appears to indeed reference today's event. -- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
The CF post mortem looks sensible, and a good summary of what we all saw from the outside with BGP routes being withdrawn. Given the fragility of BGP, this could still end up being a malicious attack. -mel via cell
On Oct 4, 2021, at 6:19 PM, Jay Hennigan <jay@west.net> wrote:
On 10/4/21 17:58, jcurran@istaff.org wrote:
Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr <https://m.facebook.com/nt/screen/?params={"note_id":10158791436142200}&path=/notes/note/&_rdr>
I believe that the above link refers to a previous outage. The duration of the outage doesn't match today's, the technical explanation doesn't align very well, and many of the comments reference earlier dates.
Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/ <https://blog.cloudflare.com/october-2021-facebook-outage/>
This appears to indeed reference today's event.
-- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
Update about the October 4th outage https://engineering.fb.com/2021/10/04/networking-traffic/outage/ -- TTFN, patrick
On Oct 4, 2021, at 9:25 PM, Mel Beckman <mel@beckman.org> wrote:
The CF post mortem looks sensible, and a good summary of what we all saw from the outside with BGP routes being withdrawn.
Given the fragility of BGP, this could still end up being a malicious attack.
-mel via cell
On Oct 4, 2021, at 6:19 PM, Jay Hennigan <jay@west.net> wrote:
On 10/4/21 17:58, jcurran@istaff.org wrote:
Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr <https://m.facebook.com/nt/screen/?params={"note_id":10158791436142200}&path=/notes/note/&_rdr>
I believe that the above link refers to a previous outage. The duration of the outage doesn't match today's, the technical explanation doesn't align very well, and many of the comments reference earlier dates.
Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/ <https://blog.cloudflare.com/october-2021-facebook-outage/>
This appears to indeed reference today's event.
-- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
On 05/10/2021 05:53, Patrick W. Gilmore wrote:
Update about the October 4th outage
https://engineering.fb.com/2021/10/04/networking-traffic/outage/
Thanks for the posting. How come they couldn't access their routers via their OOB access? -Hank
On 10/5/21 1:22 PM, Hank Nussbacher wrote:
Thanks for the posting. How come they couldn't access their routers via their OOB access?
Rumour is that when the FB route prefixes had been withdrawn their door authentication system stopped working and they could not get back into the building or server room :)
On 10/5/21 08:55, av@nethead.de wrote:
Rumour is that when the FB route prefixes had been withdrawn their door authentication system stopped working and they could not get back into the building or server room :)
Assuming there is any truth to that, guess we can't cancel the hard lines yet :-). #EverythingoIP Mark.
On 05.10.21 07:22, Hank Nussbacher wrote:
Thanks for the posting. How come they couldn't access their routers via their OOB access?
My speculative guess would be that OOB access to a few outbound-facing routers per DC does not help much if a configuration error withdraws the infrastructure prefixes down to the rack level while dedicated OOB to each RSW would be prohibitive. https://research.fb.com/wp-content/uploads/2021/03/Running-BGP-in-Data-Cente...
My speculative guess would be that OOB access to a few outbound-facing routers per DC does not help much if a configuration error withdraws the infrastructure prefixes down to the rack level while dedicated OOB to each RSW would be prohibitive.
If your OOB has any dependence on the in-band side, it's not OOB. It's not complicated to have a completely independent OOB infrastructure, even at scale.

On Tue, Oct 5, 2021 at 8:40 AM Hauke Lampe <lampe@hauke-lampe.de> wrote:
On 05.10.21 07:22, Hank Nussbacher wrote:
Thanks for the posting. How come they couldn't access their routers via their OOB access?
My speculative guess would be that OOB access to a few outbound-facing routers per DC does not help much if a configuration error withdraws the infrastructure prefixes down to the rack level while dedicated OOB to each RSW would be prohibitive.
https://research.fb.com/wp-content/uploads/2021/03/Running-BGP-in-Data-Cente...
On 05/10/2021 13:17, Hauke Lampe wrote:
On 05.10.21 07:22, Hank Nussbacher wrote:
Thanks for the posting. How come they couldn't access their routers via their OOB access?
My speculative guess would be that OOB access to a few outbound-facing routers per DC does not help much if a configuration error withdraws the infrastructure prefixes down to the rack level while dedicated OOB to each RSW would be prohibitive.
https://research.fb.com/wp-content/uploads/2021/03/Running-BGP-in-Data-Cente...
Thanks for sharing that article. But OOB access involves exactly that - Out Of Band - meaning one doesn't depend on any infrastructure prefixes or DFZ announced prefixes. OOB access is usually via a local ADSL or wireless modem connected to the BFR. The article does not discuss OOB at all. Regards, Hank
Why would you ever have a card reader on your external-facing network, if that was really why they couldn't get in to fix it?

-----Original Message-----
From: NANOG <nanog-bounces+bkain1=ford.com@nanog.org> On Behalf Of Patrick W. Gilmore
Sent: Monday, October 04, 2021 10:53 PM
To: North American Operators' Group <nanog@nanog.org>
Subject: Re: Facebook post-mortems...

Update about the October 4th outage https://clicktime.symantec.com/3X9y1HrhXV7HkUEoMWnXtR67Vc?u=https%3A%2F%2Fen... -- TTFN, patrick
On Oct 4, 2021, at 9:25 PM, Mel Beckman <mel@beckman.org> wrote:
The CF post mortem looks sensible, and a good summary of what we all saw from the outside with BGP routes being withdrawn.
Given the fragility of BGP, this could still end up being a malicious attack.
-mel via cell
On Oct 4, 2021, at 6:19 PM, Jay Hennigan <jay@west.net> wrote:
On 10/4/21 17:58, jcurran@istaff.org wrote:
Fairly abstract - Facebook Engineering - https://clicktime.symantec.com/3CDR8hh26akhF2bhzN9S5cv7Vc?u=https%3A%2F%2Fm.... <https://clicktime.symantec.com/3KA6ZdSTySHYFm2mVQy4h5j7Vc?u=https%3A%2F%2Fm.facebook.com%2Fnt%2Fscreen%2F%3Fparams%3D%7B"note_id":10158791436142200}&path=/notes/note/&_rdr>
I believe that the above link refers to a previous outage. The duration of the outage doesn't match today's, the technical explanation doesn't align very well, and many of the comments reference earlier dates.
Also, Cloudflare’s take on the outage - https://clicktime.symantec.com/3EkkFFLL3nVZGvWBnB834uN7Vc?u=https%3A%2F%2Fbl... <https://clicktime.symantec.com/3EkkFFLL3nVZGvWBnB834uN7Vc?u=https%3A%2F%2Fblog.cloudflare.com%2Foctober-2021-facebook-outage%2F>
This appears to indeed reference today's event.
-- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
On Tue, Oct 5, 2021 at 8:57 AM Kain, Becki (.) <bkain1@ford.com> wrote:
Why would you ever have a card reader on your external-facing network, if that was really why they couldn't get in to fix it?
Let's hypothesize for a moment.

Let's suppose you've decided that certificate-based authentication is the cat's meow, and so you've got dot1x authentication on every network port in your corporate environment, all your users are authenticated via certificates, all properly signed all the way up the chain to the root trust anchor.

Life is good.

But then you have a bad network day. Suddenly, you can't talk to upstream registries/registrars, you can't reach the trust anchor for your certificates, and you discover that all the laptops plugged into your network switches are failing to validate their authenticity; sure, you're on the network, but you're in a guest vlan, with no access. Your user credentials aren't able to be validated, so you're stuck with the base level of access, which doesn't let you into the OOB network.

Turns out your card readers were all counting on dot1x authentication to get them into the right vlan as well, and with the network buggered up, the switches can't validate *their* certificates either, so the door badge card readers just flash their LEDs impotently when you wave your badge at them.

Remember, one attribute of certificates is that they are designated as valid for a particular domain, or set of subdomains with a wildcard; that is, an authenticator needs to know where the certificate is being presented to know if it is valid within that scope or not. You can do that scope validation through several different mechanisms, such as through a chain of trust to a certificate authority, or through DNSSEC with DANE--but fundamentally, all certificates have a scope within which they are valid, and a means to identify in which scope they are being used. And whether your certificate chain of trust is being determined by certificate authorities or DANE, they all require that trust to be validated by something other than the client and server alone--which generally makes them dependent on some level of external network connectivity being present in order to properly function. [yes, yes, we can have a side discussion about having every authentication server self-sign certificates as its own CA, and thus eliminate external network connectivity dependencies--but that's an administrative nightmare that I don't think any large organization would sign up for.]

So, all of the client certificates and authorization servers we're talking about exist on your internal network, but they all counted on reachability to your infrastructure servers in order to properly authenticate and grant access to devices and people. If your BGP update made your infrastructure servers, such as DNS servers, become unreachable, then suddenly you might well find yourself locked out both physically and logically from your own network.

Again, this is purely hypothetical, but it's one scenario in which a routing-level "oooooops" could end up causing physical-entry denial, as well as logical network access level denial, without actually having those authentication systems on external facing networks.

Certificate-based authentication is scalable and cool, but it's really important to think about even generally "that'll never happen" failure scenarios when deploying it into critical systems.
It's always good to have the "break glass in case of emergency" network that doesn't rely on dot1x, that works without DNS, without NTP, without RADIUS, or any other external system, with a binder with printouts of the IP addresses of all your really critical servers and routers in it which gets updated a few times a year, so that when the SHTF, a person sitting at a laptop plugged into that network with the binder next to them can get into the emergency-only local account on each router to fix things.

And yes, you want every command that local emergency-only user types into a router to be logged, because someone wanting to create mischief in your network is going to aim for that account access if they can get it; so watch it like a hawk, and the only time it had better be accessed and used is when the big red panic button has already been hit, and the executives are huddled around speakerphones wanting to know just how fast you can get things working again. ^_^;

I know nothing of the incident in question. But sitting at home, hypothesizing about ways in which things could go wrong, this is one of the reasons why I still configure static emergency accounts on network devices, even with centrally administered account systems, and why there's always a set of "no dot1x" ports that work to get into the OOB/management network even when everything else has gone toes-up. :)

So--that's one way in which an outage like this could have locked people out of buildings. ^_^;

Thanks!

Matt
[ready for the deluge of people pointing out I've overly simplified the validation chain for certificates in order to keep the post short and high-level. ^_^; ]
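One way to see the external dependencies being described is that the revocation-checking endpoints are written into each certificate itself. A small sketch, assuming the Python cryptography library (recent API) and any PEM certificate you have on hand, that prints the CRL and OCSP URLs a validator would need to reach; it also touches on the CRL/OCSP point in the reply that follows:

#!/usr/bin/env python3
# Sketch: print the CRL and OCSP URLs embedded in a certificate.
# Those endpoints (plus the DNS to resolve them) are exactly the kind
# of external dependency that disappears when your network does.
# "cert.pem" is just whatever certificate you want to inspect.
import sys
from cryptography import x509
from cryptography.x509.oid import ExtensionOID, AuthorityInformationAccessOID

def external_dependencies(path):
    with open(path, "rb") as fh:
        cert = x509.load_pem_x509_certificate(fh.read())

    urls = []
    try:
        crl_ext = cert.extensions.get_extension_for_oid(
            ExtensionOID.CRL_DISTRIBUTION_POINTS)
        for dp in crl_ext.value:
            for name in dp.full_name or []:   # usually URI GeneralNames
                urls.append(("CRL", name.value))
    except x509.ExtensionNotFound:
        pass

    try:
        aia = cert.extensions.get_extension_for_oid(
            ExtensionOID.AUTHORITY_INFORMATION_ACCESS)
        for desc in aia.value:
            if desc.access_method == AuthorityInformationAccessOID.OCSP:
                urls.append(("OCSP", desc.access_location.value))
    except x509.ExtensionNotFound:
        pass

    return urls

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "cert.pem"
    for kind, url in external_dependencies(path):
        print(f"{kind}: {url}")

Each of those URLs, plus the DNS needed to resolve them, has to be reachable for online validation to succeed, which is exactly the sort of dependency that evaporates when the network underneath it does.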
Actually, for card readers the offline-verification nature of certificates is probably a nice property. But client certs pose all sorts of other problems, like their scalability, ease of making changes (roles, etc.), and other kinds of considerations that make you want to fetch more information online... which completely negates the advantages of offline verification. Just the CRL problem would probably sink you, since when you fire an employee you want access to be cut off immediately.

The other thing that would scare me in general about relying on offline verification is that the *reason* it's being used (to keep working offline) might get forgotten, and back come the online dependencies while nobody is looking.

BTW: you don't need to reach the trust anchor, though you almost certainly need to run OCSP or something like it if you have client certs.

Mike

On 10/5/21 1:34 PM, Matthew Petach wrote:
On Tue, Oct 5, 2021 at 8:57 AM Kain, Becki (.) <bkain1@ford.com <mailto:bkain1@ford.com>> wrote:
Why would you ever have a card reader on your external-facing network, if that was really why they couldn't get in to fix it?
Let's hypothesize for a moment.
Let's suppose you've decided that certificate-based authentication is the cat's meow, and so you've got dot1x authentication on every network port in your corporate environment, all your users are authenticated via certificates, all properly signed all the way up the chain to the root trust anchor.
Life is good.
But then you have a bad network day. Suddenly, you can't talk to upstream registries/registrars, you can't reach the trust anchor for your certificates, and you discover that all the laptops plugged into your network switches are failing to validate their authenticity; sure, you're on the network, but you're in a guest vlan, with no access. Your user credentials aren't able to be validated, so you're stuck with the base level of access, which doesn't let you into the OOB network.
Turns out your card readers were all counting on dot1x authentication to get them into the right vlan as well, and with the network buggered up, the switches can't validate *their* certificates either, so the door badge card readers just flash their LEDs impotently when you wave your badge at them.
Remember, one attribute of certificates is that they are designated as valid for a particular domain, or set of subdomains with a wildcard; that is, an authenticator needs to know where the certificate is being presented to know if it is valid within that scope or not. You can do that scope validation through several different mechanisms, such as through a chain of trust to a certificate authority, or through DNSSEC with DANE--but fundamentally, all certificates have a scope within which they are valid, and a means to identify in which scope they are being used. And wether your certificate chain of trust is being determined by certificate authorities or DANE, they all require that trust to be validated by something other than the client and server alone--which generally makes them dependent on some level of external network connectivity being present in order to properly function. [yes, yes, we can have a side discussion about having every authentication server self-sign certificates as its own CA, and thus eliminate external network connectivity dependencies--but that's an administrative nightmare that I don't think any large organization would sign up for.]
So, all of the client certificates and authorization servers we're talking about exist on your internal network, but they all counted on reachability to your infrastructure servers in order to properly authenticate and grant access to devices and people. If your BGP update made your infrastructure servers, such as DNS servers, become unreachable, then suddenly you might well find yourself locked out both physically and logically from your own network.
Again, this is purely hypothetical, but it's one scenario in which a routing-level "oooooops" could end up causing physical-entry denial, as well as logical network access level denial, without actually having those authentication systems on external facing networks.
Certificate-based authentication is scalable and cool, but it's really important to think about even generally "that'll never happen" failure scenarios when deploying it into critical systems. It's always good to have the "break glass in case of emergency" network that doesn't rely on dot1x, that works without DNS, without NTP, without RADIUS, or any other external system, with a binder with printouts of the IP addresses of all your really critical servers and routers in it which gets updated a few times a year, so that when the SHTF, a person sitting at a laptop plugged into that network with the binder next to them can get into the emergency-only local account on each router to fix things.
And yes, you want every command that local emergency-only user types into a router to be logged, because someone wanting to create mischief in your network is going to aim for that account access if they can get it; so watch it like a hawk, and the only time it had better be accessed and used is when the big red panic button has already been hit, and the executives are huddled around speakerphones wanting to know just how fast you can get things working again. ^_^;
I know nothing of the incident in question. But sitting at home, hypothesizing about ways in which things could go wrong, this is one of the reasons why I still configure static emergency accounts on network devices, even with centrally administered account systems, and why there's always a set of "no dot1x" ports that work to get into the OOB/management network even when everything else has gone toes-up. :)
So--that's one way in which an outage like this could have locked people out of buildings. ^_^;
Thanks!
Matt [ready for the deluge of people pointing out I've overly simplified the validation chain for certificates in order to keep the post short and high-level. ^_^; ]
I think that was from an outage in 2010: https://engineering.fb.com/2010/09/23/uncategorized/more-details-on-today-s-... On Mon, Oct 4, 2021 at 6:19 PM Jay Hennigan <jay@west.net> wrote:
On 10/4/21 17:58, jcurran@istaff.org wrote:
Fairly abstract - Facebook Engineering -
<https://m.facebook.com/nt/screen/?params={ "note_id":10158791436142200}&path=/notes/note/&_rdr>
I believe that the above link refers to a previous outage. The duration of the outage doesn't match today's, the technical explanation doesn't align very well, and many of the comments reference earlier dates.
Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/ <https://blog.cloudflare.com/october-2021-facebook-outage/>
This appears to indeed reference today's event.
-- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
jcurran@istaff.org wrote:
Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr <https://m.facebook.com/nt/screen/?params={"note_id":10158791436142200}&path=/notes/note/&_rdr>
Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/
FYI, /John
This may be a dumb question, but does this suggest that Facebook publishes rather short TTLs for their DNS records? Otherwise, why would an internal failure make them unreachable so quickly? Miles Fidelman -- In theory, there is no difference between theory and practice. In practice, there is. .... Yogi Berra Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. In our lab, theory and practice are combined: nothing works and no one knows why. ... unknown
On Tue, Oct 5, 2021 at 1:47 PM Miles Fidelman <mfidelman@meetinghouse.net> wrote:
jcurran@istaff.org wrote:
Fairly abstract - Facebook Engineering - https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr <https://m.facebook.com/nt/screen/?params=%7B%22note_id%22:10158791436142200%7D&path=/notes/note/&_rdr>
Also, Cloudflare’s take on the outage - https://blog.cloudflare.com/october-2021-facebook-outage/
FYI, /John
This may be a dumb question, but does this suggest that Facebook publishes rather short TTLs for their DNS records? Otherwise, why would an internal failure make them unreachable so quickly?
Looks like 60 seconds:

$ dig +norec star-mini.c10r.facebook.com. @d.ns.c10r.facebook.com.

; <<>> DiG 9.10.6 <<>> +norec star-mini.c10r.facebook.com. @d.ns.c10r.facebook.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25582
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;star-mini.c10r.facebook.com.   IN   A

;; ANSWER SECTION:
star-mini.c10r.facebook.com. 60   IN   A   157.240.229.35

;; Query time: 42 msec
;; SERVER: 185.89.219.11#53(185.89.219.11)
;; WHEN: Tue Oct 05 14:01:06 EDT 2021
;; MSG SIZE  rcvd: 72

... and cue the "Bwahahhaha! If *I* ran Facebook I'd make the TTL be [2 sec|30sec|5min|1h|6h+3sec|1day|6months|maxint32]" threads....

Choosing the TTL is a balancing act between stability, agility, load, politeness, renewal latency, etc -- but I'm sure NANOG can boil it down to "They did it wrong!..."

W
Miles Fidelman
-- In theory, there is no difference between theory and practice. In practice, there is. .... Yogi Berra
Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. In our lab, theory and practice are combined: nothing works and no one knows why. ... unknown
-- The computing scientist’s main challenge is not to get confused by the complexities of his own making. -- E. W. Dijkstra
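For anyone who wants to watch served TTLs themselves rather than eyeball dig output, the same check is easy to script. A small sketch with the dnspython library (2.x API assumed); the names queried are just examples:

#!/usr/bin/env python3
# Sketch: look up a record via the local resolver and print the TTL
# that came back. Against a cache this is the *remaining* TTL; the
# dig above queried the authoritative server and saw the full value.
import dns.exception
import dns.resolver

def served_ttl(name, rdtype="A"):
    answer = dns.resolver.resolve(name, rdtype)
    return answer.rrset.ttl

if __name__ == "__main__":
    for name in ["star-mini.c10r.facebook.com.", "www.example.com."]:
        try:
            print(f"{name} TTL={served_ttl(name)}")
        except dns.exception.DNSException as err:
            print(f"{name}: lookup failed ({err!r})")

Note that a query through a recursive resolver returns the remaining cached TTL; querying the authoritative server directly, as the dig above does, shows the configured 60 seconds.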
participants (50)

- Andy Brezinsky
- av@nethead.de
- Baldur Norddahl
- Bill Woodcock
- Bjørn Mork
- Blake Dunlap
- Callahan Warlick
- Carsten Bormann
- Christopher Morrow
- Curtis Maurand
- Hank Nussbacher
- Hauke Lampe
- Hugo Slabbert
- J. Hellenthal
- Jared Mauch
- Jay Hennigan
- jcurran@istaff.org
- Jean St-Laurent
- Jeff Tantsura
- Joe Greco
- Joe Maimon
- Jon Lewis
- Justin Keller
- Kain, Becki (.)
- Lou D
- Mark Tinka
- Masataka Ohta
- Matthew Kaufman
- Matthew Petach
- Mel Beckman
- Michael Thomas
- Mike Hammett
- Miles Fidelman
- Niels Bakker
- niels=nanog@bakker.net
- Owen DeLong
- Patrick W. Gilmore
- PJ Capelli
- Rabbi Rob Thomas
- Randy Bush
- Randy Monroe
- Ross Tajvar
- Rubens Kuhl
- Ryan Brooks
- Ryan Landry
- Sabri Berisha
- scott
- Tom Beecher
- Warren Kumari
- William Herrin