All,

This thread touches on a day-one BGP architecture bug: the BGP spec is too vague about what should be considered a valid next hop.

Most implementations today go only as far as checking whether the next hop can be resolved in the RIB and, if so, consider the path valid and eligible for best path selection.
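
To make that concrete, here is a rough sketch of a RIB-only validity check. It is not taken from any real implementation; the function names, the dict-based RIB and the addresses (documentation prefixes) are all made up for illustration.

```python
import ipaddress


def resolve_next_hop(rib, next_hop):
    """Longest-prefix match of next_hop against the RIB prefixes.

    Returns the covering prefix as a string, or None if unresolved.
    """
    nh = ipaddress.ip_address(next_hop)
    covering = [net for net in map(ipaddress.ip_network, rib) if nh in net]
    if not covering:
        return None
    return str(max(covering, key=lambda net: net.prefixlen))


def path_is_valid_rib_only(rib, next_hop):
    """The check described above: valid if the next hop resolves, nothing more."""
    return resolve_next_hop(rib, next_hop) is not None


# Hypothetical RIB holding only the connected route for an IXP peering LAN.
rib = {"192.0.2.0/24": "connected, ixp0"}

# Nothing here asks whether 192.0.2.10 is actually alive: the connected
# route covers it, so the path counts as valid.
print(path_is_valid_rib_only(rib, "192.0.2.10"))  # True
```

The point of the sketch is only that a covering route says nothing about whether the next hop itself forwards traffic.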

Well, clearly that approach does not work well in an IXP setup: a peer on a connected subnet may have gone down 179 seconds ago, and we are still blackholing traffic by pushing it to that next hop.

In reality the situation gets worse even in intradomain cases where operators use encapsulation across the domain. The IGP still installs a route to the next hop, but the encap path can be broken while BGP thinks all is cool and keeps blasting traffic towards it.

It has been observed as a real issue in a number of networks, and a few years back Rajiv wrote a draft to fix it: https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-bestpath-selection-criteria-12 Well, like lots of useful real BGP protocol extensions in IDR, it died.

So, back to the top of this thread: IMHO we should not put BFD state in BGP. That's as ugly as BGP-LS.

Instead, each peer should be able to periodically test reachability to a peer (BGP next hop) in the data plane and only then consider the path valid for BGP best path selection.
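
A minimal sketch of that idea, under assumptions of my own: the class and function names, the 3-second freshness window and the addresses are hypothetical, and the probe itself (ICMP or anything else) is left out, with only its result fed in. A path is eligible only if its next hop both resolves in the RIB and answered a recent data-plane probe.

```python
import time


class NextHopProbeState:
    """Tracks data-plane probe results per BGP next hop (hypothetical design)."""

    def __init__(self, max_age=3.0, clock=time.monotonic):
        self.max_age = max_age  # seconds a successful probe stays fresh (assumed value)
        self.clock = clock
        self.last_ok = {}       # next hop -> time of last successful probe

    def record(self, next_hop, success):
        """Store the outcome of one probe; failures simply let the entry age out."""
        if success:
            self.last_ok[next_hop] = self.clock()

    def is_alive(self, next_hop):
        ts = self.last_ok.get(next_hop)
        return ts is not None and self.clock() - ts <= self.max_age


def eligible_paths(paths, rib_resolves, probes):
    """Keep paths whose next hop resolves in the RIB AND answered a recent probe."""
    return [
        p for p in paths
        if rib_resolves(p["next_hop"]) and probes.is_alive(p["next_hop"])
    ]


# Demo with a fake clock so the timeline is explicit.
now = [0.0]
probes = NextHopProbeState(max_age=3.0, clock=lambda: now[0])
paths = [
    {"prefix": "203.0.113.0/24", "next_hop": "192.0.2.10"},
    {"prefix": "203.0.113.0/24", "next_hop": "192.0.2.20"},
]

probes.record("192.0.2.10", True)   # t=0: both next hops answer
probes.record("192.0.2.20", True)
now[0] = 2.0
probes.record("192.0.2.10", False)  # t=2: .10 stops answering
probes.record("192.0.2.20", True)
now[0] = 4.0                        # t=4: last good probe of .10 is 4s old

alive = eligible_paths(paths, lambda nh: True, probes)
print([p["next_hop"] for p in alive])  # ['192.0.2.20']
```

Both next hops still resolve in the RIB throughout (the lambda always returns True), yet the path via 192.0.2.10 drops out of best path selection once its last successful probe is older than the freshness window.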

Would running even ICMP to a next-hop peer every second be overkill? I don't think so, but I am sure there are some who think it would.

Best,

On Sun, Dec 22, 2024 at 2:05 PM Mark Tinka <mark@tinka.africa> wrote:

On 12/22/24 14:15, Nick Hilliard wrote:

> As a separate issue, hold timers should generally be of a comparable
> order of magnitude to the non-availability effect they're attempting
> to mitigate. Inter-domain routing convergence is often measured in
> minutes rather than seconds. So even if the protocol layer worked at
> IXPs without causing control plane meltdown, it's still a mechanism
> which has a trigger timer two orders of magnitude faster than
> the general case of DFZ reconvergence. I can't see that this would
> help overall inter-domain routing stability.

I think this is the fundamental question.

BGP is stable and scales well given its global scope, not only because
it turns like a tanker, but because we accept that it turns like a tanker.

Now, in a world of TikTok Brain and Uber Eats where we are used to
getting what we want instantly, imposing that onto BGP, even if
sneakily, is probably not something we want. At least not at a global scale.

Mark.