Hello Baldur,

On Mon, 10 Feb 2020 at 19:57, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
> Many dual-homed companies may start out with two routers and two transits but without dual links to each transit, as you describe above. That will cause significant disruption if one link goes down. It is not just about convergence between T1 and T2 but for a major part of the internet. Been there, done that; yes, you can be down for up to several minutes before everything is normal again. Assume tier 1 transits and that contact to T1 was lost. This means T1 will have a peering session with T2 somewhere, but T1 will not allow peer-to-peer traffic to go via that link. All those peers will need to search for a different way to reach you, a way that does not transit T1 (unless they have a contract with T1).
>
> Therefore, if being down for several minutes is not ok, you should invest in dual links to your transits. And connect those to two different routers. If possible with a guarantee that the transits use two routers at their end and that divergent fiber paths are used etc.
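
For reference, the export behaviour described in the quoted text (T1 not carrying peer-to-peer traffic over its T2 peering) can be sketched roughly as below. This is only an illustration of the conventional "valley-free" export rule, not any particular provider's policy; the names `Rel` and `should_export` are made up for this example, and real-world policies often have exceptions.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, seen from the local AS (hypothetical model)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Rel, export_to: Rel) -> bool:
    """Conventional valley-free export rule (an assumption, not a specific vendor policy).

    Routes learned from a customer are announced to every neighbor.
    Routes learned from a peer or a provider are announced to customers only.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER


# While the direct link is up, T1 hears the prefix from its customer
# and announces it to customers, peers and (if it had any) providers.
before_failure = [n.value for n in Rel if should_export(Rel.CUSTOMER, n)]

# After the link is lost, T1 only hears the prefix via its peer T2,
# so it announces the prefix to its own customers only.
after_failure = [n.value for n in Rel if should_export(Rel.PEER, n)]

print(before_failure)
print(after_failure)
```

Under this model, T1's peers stop receiving the prefix from T1 once T1 only knows it via T2, which is why they need a path that does not transit T1.
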
That is not my experience *at all*. I have always seen my prefixes converge in a couple of seconds upstream (with 2 different Tier 1s). That is with a double-digit number of announcements. Maybe if you announce tens of thousands of prefixes as a large Tier 2, things are more problematic; that I can't tell. Or maybe you hit some old-school route dampening somewhere down the path. Maybe there is another reason for this. But even if 3 AS hops are involved, I don't really understand how they would spend *minutes* converging after receiving your BGP withdraw message.

When I saw *minutes* of brownouts in connectivity, it was always because of ingress prefix convergence (or the lack thereof, due to slow FIB programming, then temporary internal routing loops, nasty things like that), never because of anything external.

I agree there are a number of reasons (including best convergence) to have completely diversified connections to a single transit AS. Another reason is that when you manually reroute traffic for a certain AS path (say transit 2 has an always congested PNI towards a third-party ASN), you may not have an alternative to the congested path when your other transit provider goes away.

But I never saw minutes of brownout because of upstream -> downstream -> downstream convergence (or whatever the scenario looks like).

lukas