Dear Baldur,

On Mon, Oct 23, 2017 at 12:53:48AM +0200, Baldur Norddahl wrote:
> I do not get why every BGP implementation kills the session at the prefix limit. It appears that this makes a bad situation worse: routing flaps create a lot of visible disturbance for end users. When the BGP session restarts, it will just happen again and again until an operator intervenes.
Maximum prefix limits are used as a naive last resort to protect against catastrophic failures such as memory/FIB overflow and full table route leaks. The moment a maximum prefix limit kicks in, something somewhere went wrong and indeed an operator has to intervene. That is the beauty and essence of the maxpfx feature. :)
> Instead an implementation could ignore any additional prefixes
This may work in some specific cases, but can be disastrous in others. In my opinion, in the context of Internet routing, the potential for disaster outweighs any benefit I can see in "ignoring additional prefixes" (in an L3VPN context different considerations may apply). You offered "killing a session may make a bad situation worse", but there are scenarios where keeping the session up can turn a bad situation into a disaster.

I'll elaborate with an example to hopefully clarify what I mean. Let's take this event and hypothetically assume 'soft maximum prefix limits' are commonly deployed:

https://bgpmon.net/bgp-leak-causing-internet-outages-in-japan-and-beyond/

According to PeeringDB, AS 15169 recommends configuring 15,000 as the maximum prefix limit for IPv4 (https://www.peeringdb.com/asn/15169). Let's assume that Verizon had configured a "maximum of 15,000 but keep the BGP session up"-style soft limit. I currently see roughly 419 prefixes via AS15169 in the DFZ. 15000 - 419 = 14581, so this leaves room for 14581 invalid announcements before the soft limit kicks in. At that point I'd argue that it is better to just tear down the BGP session rather than create a situation where 14581 invalid announcements (which are part of a 160,000 prefix route leak) can continue to exist.

We could go back and forth a bit on how high or low that '15,000' number should be and how things would look if it was closer to 500. But in the end actual operator intervention was needed, and soft maxprefix limits would have the potential to hide that.
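To make the arithmetic above concrete, here is a minimal Python sketch comparing a hard limit (tear the session down) with a hypothetical soft limit (keep the session, ignore the excess). The function name `simulate` and the `mode` values are made up for illustration and do not correspond to any vendor's knobs. It also assumes the legitimate prefixes were learned before the leak started, which need not be the case in practice.

```python
def simulate(legit, leaked, limit, mode):
    """Model one peer's session during a route leak.

    legit  -- number of legitimate prefixes already learned from the peer
    leaked -- number of additional, invalid prefixes the peer announces
    limit  -- configured maximum prefix limit
    mode   -- "hard" (tear down the session) or "soft" (ignore the excess);
              "soft" is the hypothetical behaviour discussed in this thread

    Returns (session_up, legit_in_rib, leaked_in_rib).
    Assumption: the legitimate prefixes arrive before the leaked ones.
    """
    legit_in_rib = min(legit, limit)
    room = limit - legit_in_rib

    if leaked <= room:
        # Limit never reached: everything is accepted in either mode.
        return True, legit_in_rib, leaked

    if mode == "hard":
        # Session is torn down, so all routes from this peer are withdrawn.
        return False, 0, 0

    # Soft limit: the session stays up and the leaked prefixes that fit
    # under the limit stay in the RIB until an operator notices.
    return True, legit_in_rib, room


# Numbers from the example: 419 legitimate prefixes, a 160,000 prefix
# leak, and the 15,000 limit recommended in PeeringDB.
print(simulate(419, 160_000, 15_000, "hard"))  # (False, 0, 0)
print(simulate(419, 160_000, 15_000, "soft"))  # (True, 419, 14581)
```

With the soft limit the session looks healthy while 14581 invalid announcements remain in use, which is exactly the situation that can hide the need for operator intervention.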
> or it could compare each additional prefix received to already learned prefixes and decide to drop one to make room for the new one. For example you could drop the most specific routes before less specific routes.
The moment a BGP implementation can do such RIB compression, it may indeed make sense to offer two types of limits: a 'pre-policy maximum prefix limit' and a 'post-policy maximum prefix limit'. The former would be useful in the context of route leaks, the latter in the context of protecting against overflow of the FIB.

Kind regards,

Job

ps. RPKI Origin Validation and BGPSEC do have the potential to change the way we look at big hammers like maximum prefix limits, but we're not there yet.
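As a rough illustration of the pre-policy / post-policy distinction made in the reply above, here is a small Python sketch. Everything in it is an assumption for the sake of the example: `check_limits` is a hypothetical helper, and a simple "reject anything longer than /24" filter stands in for whatever policy or RIB compression an implementation might apply.

```python
import ipaddress


def check_limits(received, policy, pre_limit, post_limit):
    """Check one peer's announcements against two separate limits.

    received   -- iterable of prefix strings as sent by the peer
    policy     -- callable returning True for prefixes that pass import policy
    pre_limit  -- cap on what the peer sends (catches route leaks)
    post_limit -- cap on what survives policy (protects the FIB)
    """
    received = list(received)
    accepted = [p for p in received if policy(p)]

    if len(received) > pre_limit:
        return "pre-policy limit exceeded"
    if len(accepted) > post_limit:
        return "post-policy limit exceeded"
    return "ok"


def max_len_24(prefix):
    # Stand-in import policy (an assumption): drop anything longer than /24.
    return ipaddress.ip_network(prefix).prefixlen <= 24


# A leak of host routes: only 10 prefixes would reach the FIB, but the
# peer is clearly sending far more than expected.
leak = [f"10.0.{i}.0/24" for i in range(10)] + \
       [f"192.0.2.{i}/32" for i in range(90)]
print(check_limits(leak, max_len_24, pre_limit=50, post_limit=20))
# pre-policy limit exceeded

# A modest number of announcements that all pass policy and would
# together exceed what the FIB budget allows for this peer.
growth = [f"10.0.{i}.0/24" for i in range(30)]
print(check_limits(growth, max_len_24, pre_limit=50, post_limit=20))
# post-policy limit exceeded
```

The two checks fire in different situations, which is why a single combined limit cannot serve both purposes.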