----- On Feb 19, 2021, at 3:07 AM, Daniel Karrenberg dfk@ripe.net wrote: Hi,
Lessons: HW/SW mono-cultures are dangerous. Input testing is good practice at all levels software. Operational co-ordination is key in times of crisis.
Well... Here is a very similar, fairly recent one. Albeit in this case, the opposite is true: running one software train would have prevented an outage. Some members on this list (hi, Brian!) will recognize the story. Group XX within $company decided to deploy EVPN. All of backbone was running single $vendor, but different software trains. Turns out that between an early draft, implemented in version X, and the RFC, implemented in version Y, a change was made in NLRI formats which were not backwards compatible. Version X was in use on virtually all DC egress boxes, version Y was in use on route reflectors. The moment the first EVPN NLRI was advertised, the entire backbone melted down. Dept-wide alert issued (at night), people trying to log on to the VPN. Oh wait, the VPN requires yubikey, which requires the corp network to access the interwebs, which is not accessible due to said issue. And, despite me complaining since the day of hire, no out of band network. I didn't stay much longer after that. Thanks, Sabri