On Wed, Mar 27, 2019 at 09:36:20PM +0000, Graham Johnston wrote:
This afternoon at around 12:17 Central Time we began learning the subnet for the Equinix IX in Chicago via a transit provider; we are on the IX as well. The subnet in question is 208.115.136.0/23. Using stat.ripe.net, I can see that this subnet is also being learned by others; see the snip below. On our network this caused a nasty routing loop until we figured out what was wrong. My current best understanding is that because the route was learned via eBGP, it trumped the OSPF-learned route. As soon as I filtered the advertisement from my transit provider, everything returned to normal. What best practice am I missing that would have prevented this?
There are two pieces to help prevent this type of failure:

1/ Equinix should have created an RPKI ROA for 208.115.136.0/23, with an Origin ASN of 0 or one of their own ASNs, and a Max Length of 23.

2/ You should implement RPKI-based BGP Origin Validation in your network and honor those ROAs.

Kind regards,

Job
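To make the two points concrete, here is a minimal Python sketch of route origin validation as described in RFC 6811, run against a list of validated ROA payloads (VRPs). It is an illustration only, not how any particular router or validator implements it. The names `VRP`, `State`, and `validate` are hypothetical, and AS 64500 is a documentation ASN standing in for whatever origin the leaked path actually carried. With an AS0 ROA for 208.115.136.0/23 and a Max Length of 23, any announcement of that prefix (or a more-specific) evaluates to Invalid, so a network that rejects Invalid routes would not have installed the transit-learned route in the first place.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


@dataclass(frozen=True)
class VRP:
    """Hypothetical container for one validated ROA payload."""
    prefix: Network
    max_length: int
    asn: int  # 0 means "this prefix must never be originated" (AS0 ROA)


def validate(route: Network, origin_asn: int, vrps: Iterable[VRP]) -> State:
    """Classify a route per RFC 6811: Valid, Invalid, or NotFound."""
    covered = False
    for vrp in vrps:
        # Skip VRPs of the other address family (subnet_of would raise).
        if route.version != vrp.prefix.version:
            continue
        # A VRP "covers" the route if the route is equal to or more specific.
        if not route.subnet_of(vrp.prefix):
            continue
        covered = True
        # A VRP "matches" if the origin ASN agrees and the length is allowed.
        # An AS0 VRP can never match, so it only ever makes routes Invalid.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return State.VALID
    return State.INVALID if covered else State.NOT_FOUND


if __name__ == "__main__":
    # The ROA suggested above: prefix 208.115.136.0/23, origin AS0, max length 23.
    vrps = [VRP(ipaddress.ip_network("208.115.136.0/23"), 23, 0)]
    leaked = ipaddress.ip_network("208.115.136.0/23")
    print(validate(leaked, 64500, vrps))  # State.INVALID
```

"Honoring those ROAs" then amounts to an import policy that rejects routes whose state is `State.INVALID`; NotFound routes are normally still accepted, since most of the table has no ROA coverage.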