The following is from what I believe was a FB employee on Reddit; the account now appears to have been deleted.


As many of you know, DNS for FB services has been affected. This is likely a symptom of the actual issue, which is that BGP peering with Facebook's peering routers has gone down, very likely due to a configuration change that went into effect shortly before the outages began (roughly 15:40 UTC).

There are people now trying to gain access to the peering routers to implement fixes, but the people with physical access are separate from the people who know how to authenticate to the systems and from the people who know what to actually do, so there is now a logistical challenge in bringing all that knowledge together.

Part of this is also due to reduced staffing in data centers because of pandemic measures.

I believe the original change was 'automatic' (as in, configuration done via a web interface). However, now that the connection to the outside world is down, remote access to those tools no longer exists, so the emergency procedure is to gain physical access to the peering routers and do all the configuration locally.

https://twitter.com/jgrahamc/status/1445068309288951820 "About five minutes before Facebook's DNS stopped working we saw a large number of BGP changes (mostly route withdrawals) for Facebook's ASN."

From: NANOG <nanog-bounces+lguillory=reservetele.com@nanog.org> On Behalf Of Baldur Norddahl
Sent: Monday, October 4, 2021 1:41 PM
To: NANOG <nanog@nanog.org>
Subject: Re: massive facebook outage presently

I got a mail that Facebook was leaving NLIX. Maybe someone botched the script so they took down all BGP sessions instead of just NLIX and now they can't access the equipment to put it back... :-)

On Mon, Oct 4, 2021 at 20:31, Billy Croan <BCroan@unrealservers.net> wrote:

I know what this is... They forgot to update the credit card on their GoDaddy account and the domain lapsed. I guess it will be facebook.info when they get it back online. The post mortem should be an interesting read.

On Mon, Oct 4, 2021 at 11:46 AM Jason Kuehl <jason.w.kuehl@gmail.com> wrote:

Looks like they run their own nameservers, and I see that even the SOA records are missing.
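
For anyone who wants to repeat the SOA check without dig, here is a minimal Python sketch that builds an SOA query by hand (wire format per RFC 1035) and reads the response code and answer count from the reply header. The names build_query, parse_header and query_soa are made up for illustration, and the resolver address passed to query_soa is whatever you choose; none of this comes from the thread itself.

```python
import socket
import struct

SOA = 6        # DNS record type SOA (RFC 1035)
CLASS_IN = 1   # DNS class IN

def build_query(name, qtype=SOA, query_id=0x1234):
    """Build a minimal DNS query packet for `name` (illustrative helper)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)

def parse_header(response):
    """Return (rcode, answer_count) from the 12-byte DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount

def query_soa(name, server, timeout=3.0):
    """Send the SOA query over UDP to `server` and summarize the reply.

    Raises socket.timeout if the server does not answer in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(512)
    return parse_header(data)
```

Calling query_soa("facebook.com", "<your resolver's IP>") returns a tuple such as (0, 1) when an SOA answer comes back. A non-zero rcode (for example 2, SERVFAIL), a zero answer count, or a timeout would be consistent with the missing records described above.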

On Mon, Oct 4, 2021, 12:23 PM Mel Beckman <mel@beckman.org> wrote:

Here’s a screenshot:

 -mel beckman

On Oct 4, 2021, at 9:06 AM, Eric Kuhnke <eric.kuhnke@gmail.com> wrote:

Normally not worth mentioning random $service having an outage here, but this will undoubtedly generate a large volume of customer service calls.

Appears to be a failure in DNS resolution.
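
A quick way to confirm a resolution failure from a script, as a rough sketch only: the helper name resolves is made up here, and the hostname you test is your own choice. It asks the system resolver through Python's standard socket.getaddrinfo and treats a socket.gaierror as "does not resolve". The lookup parameter exists only so the function can be exercised without network access.

```python
import socket

def resolves(hostname, lookup=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address.

    `lookup` defaults to the system resolver; it is injectable (an
    assumption of this sketch) so the check can be tested offline.
    """
    try:
        return len(lookup(hostname, None)) > 0
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False
```

For example, print(resolves("facebook.com")) would print False while the outage described here is ongoing, assuming the local resolver has no cached answer. Note that this only tells you whether resolution works from where you are; it says nothing about why it fails.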
