On Tue, Feb 16, 2021 at 01:37:35PM -0600, John Kristoff wrote:
> I'd like to start a thread about the most famous and widespread Internet operational issues, outages or implementation incompatibilities you have seen.
> Which examples would make up your top three?
This was a fantastic outage; one could really feel the tremors reach into the far corners of the BGP default-free zone: https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experime...

The experiment triggered a bug in some Cisco router models: affected Ciscos would corrupt this specific BGP announcement ** ON OUTBOUND **. Any peer of such a Cisco that received this BGP UPDATE would (according to the RFCs current at the time) consider the UPDATE corrupted and would subsequently tear down its BGP session with the Cisco. Because the Ciscos themselves never detected the corruption, whenever a session came back up they would reannounce the corrupted UPDATE, causing yet another session teardown. Bounce ... Bounce ... Bounce ... at global scale, in both IBGP and EBGP! :-)

Luckily the industry took this and many other lessons to heart: in 2015 the IETF published RFC 7606 ("Revised Error Handling for BGP UPDATE Messages"), which specifies far more robust behaviour for BGP speakers.

Kind regards,

Job
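For readers who want to see why the loop never converged, here is a toy Python model of the bounce described above. It is not real BGP code. It assumes the buggy speaker replays its full table, including the corrupted UPDATE, every time the session comes up. It also simplifies RFC 7606 to plain "treat-as-withdraw"; the RFC actually chooses among several approaches (session reset, AFI/SAFI disable, treat-as-withdraw, attribute discard) depending on the attribute and the error. All class and function names are hypothetical, and the prefixes are RFC 5737 documentation ranges.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    # Legacy behaviour: any UPDATE error -> NOTIFICATION and session reset.
    SESSION_RESET = "session-reset"
    # Simplified RFC 7606 behaviour: withdraw the affected prefix, keep the session.
    TREAT_AS_WITHDRAW = "treat-as-withdraw"


@dataclass
class Update:
    prefix: str
    malformed: bool  # True if an attribute was corrupted by the sender


@dataclass
class Receiver:
    policy: Policy
    session_up: bool = False
    resets: int = 0
    rib: set = field(default_factory=set)  # prefixes accepted from this peer

    def handle(self, update: Update) -> None:
        if not update.malformed:
            self.rib.add(update.prefix)
            return
        if self.policy is Policy.SESSION_RESET:
            # Tear the session down; every route learned over it is lost.
            self.session_up = False
            self.resets += 1
            self.rib.clear()
        else:
            # Drop only the prefix carried in the malformed UPDATE.
            self.rib.discard(update.prefix)


def simulate(policy: Policy, attempts: int) -> Receiver:
    """Hypothetical model: the sender never notices its own corruption, so it
    resends the same table (good + corrupted UPDATE) on every session setup."""
    rx = Receiver(policy)
    good = Update("192.0.2.0/24", malformed=False)
    bad = Update("198.51.100.0/24", malformed=True)
    for _ in range(attempts):
        if rx.session_up:
            continue  # stable session: nothing new to announce
        rx.session_up = True  # session (re)establishes after the idle timer
        rx.handle(good)
        rx.handle(bad)
    return rx


if __name__ == "__main__":
    for p in Policy:
        rx = simulate(p, attempts=5)
        print(p.value, "resets:", rx.resets, "rib:", sorted(rx.rib))
```

Under the legacy policy every connection attempt ends in a reset and the peer never keeps any routes; under the simplified RFC 7606 policy the session stays up and only the one broken prefix is lost.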