Baldur Norddahl Sent: Wednesday, February 12, 2020 7:57 PM
On Tue, Feb 11, 2020 at 12:33 AM Lukas Tribus <mailto:lists@ltri.eu> wrote:
Therefore, if being down for several minutes is not OK, you should invest in dual links to your transits and connect them to two different routers. If possible, get a guarantee that the transits use two routers at their end, that diverse fiber paths are used, etc.
That is not my experience *at all*. I have always seen my prefixes converge in a couple of seconds upstream (vs 2 different Tier1's).
This is a bit old but probably still true:
https://labs.ripe.net/Members/vastur/the-shape-of-a-bgp-update
Quote: "To conclude, we observe that BGP route updates tend to converge globally in just a few minutes. The propagation of newly announced prefixes happens almost instantaneously, reaching 50% visibility in just under 10 seconds, revealing a highly responsive global system. Prefix withdrawals take longer to converge and generate nearly 4 times more BGP traffic, with the visibility dropping below 10% only after approximately 2 minutes".
Unfortunately they did not test the case of withdrawal from one router while having the prefix still active at another.
Yes, that's unfortunate. I'd think the convergence time in that case depends mostly on the first-hop upstream providers involved in the "local repair" for the affected AS. Once that repair is done, it doesn't matter that the rest of the world still sends traffic for the affected AS towards the original first-hop upstream AS, as long as that AS has a valid detour route. So I guess the topology of the first-hop upstreams around the affected AS dictates the convergence time. E.g. if your upstream A box happens to have a direct (usable) link/session to your upstream B box, you win; but the more boxes in the "local repair" detour that need to be told "A no more, now B is the way to go", the longer the convergence takes. If a significant portion of the Internet sees the withdrawal within 2 minutes, I wonder how long it takes for a typical "local repair" string of BGP speakers to all get the memo. Realistically, how many BGP speakers could that be: from a minimum of 2 up to, say, ~6?
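To put a rough shape on the "string of BGP speakers" question, here is a back-of-the-envelope sketch. It assumes the update is relayed strictly one speaker at a time and that every speaker adds the same delay (processing plus any advertisement timer). The function name and the 5-second per-hop figure are placeholders made up for illustration, not measurements.

```python
def local_repair_time(num_speakers: int, per_hop_delay_s: float) -> float:
    """Time until the last speaker in a serial chain has learned the new path.

    Assumption: each speaker adds the same delay before passing the update on.
    """
    if num_speakers < 1:
        raise ValueError("need at least one BGP speaker in the chain")
    return num_speakers * per_hop_delay_s


# Hypothetical per-hop delay in seconds; replace with a value observed on real gear.
PER_HOP_DELAY_S = 5.0

# The chain lengths guessed at above: 2 speakers at best, ~6 at worst.
best_case = local_repair_time(2, PER_HOP_DELAY_S)
worst_case = local_repair_time(6, PER_HOP_DELAY_S)
print(f"2 speakers: {best_case:.0f}s, 6 speakers: {worst_case:.0f}s")
```

Under these assumptions the repair time grows linearly with the number of speakers in the detour, so a short chain is the main lever.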
When I saw *minutes* of brownouts in connectivity it was always because of ingress prefix convergence (or the lack thereof, due to slow FIB programming, then temporary internal routing loops, nasty things like that, but never external).
That is also a significant problem. With a single transit connection per router, two routers and two providers, there will be a lot of internal convergence between your two routers when a link fails. That is also avoided by giving both routers the same provider connections. That way a router may still have to invalidate many routes, but there will be no loops, and the router already has loop-free alternatives (via the other provider) loaded into memory. Plus you can use the simple trick of having a default route as a fallback.
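The default-route trick can be illustrated with a toy longest-prefix-match lookup. This is only a sketch: the prefix is from the documentation range and the next-hop names "provider-A" and "provider-B" are made up. It shows that once the specific route is invalidated, the lookup falls through to the default instead of dropping the packet.

```python
import ipaddress


def lookup(fib, dst):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, next_hop)
    return best[1] if best else None


# Toy FIB: one specific route via provider A, default route via provider B.
specific = ipaddress.ip_network("198.51.100.0/24")
fib = {
    ipaddress.ip_network("0.0.0.0/0"): "provider-B",
    specific: "provider-A",
}

before = lookup(fib, "198.51.100.7")   # the /24 wins while it is installed

# The link to provider A fails and the specific route is invalidated.
del fib[specific]
after = lookup(fib, "198.51.100.7")    # the default route now catches the traffic

print(before, "->", after)
```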
This is a very good point actually. Since the box has two transit sessions, when only one of them fails it still retains all the prefixes in the FIB; it just needs to reprogram a few next-hops to point towards the other eBGP/iBGP speakers, whichever offers the best path. And reprogramming next-hops is significantly faster (with hierarchical FIBs, anyway).
adam
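A minimal sketch of why next-hop reprogramming is cheap with a hierarchical FIB. The classes and names here (FlatFib, HierarchicalFib, "via-A", "transit-A", "transit-B") are invented for illustration and do not model any vendor's implementation. In the flat table every prefix stores its own next hop; in the hierarchical one prefixes point at a shared path group, so a failover is a single write.

```python
class FlatFib:
    """Each prefix stores its own next hop."""

    def __init__(self):
        self.routes = {}  # prefix -> next hop

    def repoint(self, old, new):
        """Rewrite every prefix that used `old`; return the number of writes."""
        writes = 0
        for prefix, next_hop in self.routes.items():
            if next_hop == old:
                self.routes[prefix] = new
                writes += 1
        return writes


class HierarchicalFib:
    """Prefixes point at a shared path group; the group holds the next hop."""

    def __init__(self):
        self.routes = {}  # prefix -> path group id
        self.groups = {}  # path group id -> next hop

    def lookup(self, prefix):
        return self.groups[self.routes[prefix]]

    def repoint(self, group, new):
        """Update the shared group once; every prefix using it follows."""
        self.groups[group] = new
        return 1


# 1000 toy prefixes, all initially reached via transit A.
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]

flat = FlatFib()
flat.routes = {p: "transit-A" for p in prefixes}

hier = HierarchicalFib()
hier.routes = {p: "via-A" for p in prefixes}
hier.groups = {"via-A": "transit-A"}

flat_writes = flat.repoint("transit-A", "transit-B")
hier_writes = hier.repoint("via-A", "transit-B")
print(f"flat: {flat_writes} writes, hierarchical: {hier_writes} write")
```

Both tables end up forwarding every prefix via transit B; the difference is only in how many entries had to be touched to get there.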