Hello,

I find that the type of outage that affects our network the most is neither of the two options you describe.

As is probably typical for smaller networks, we do not have redundant uplinks to all of our transits. If a transit link goes down, for example because we had to reboot a router, traffic is supposed to reroute to the remaining transit links. Internally, our network handles this fairly fast for egress traffic.

The problem, however, is the ingress traffic: it can take 5 to 15 minutes before everything has settled down. That is how long it takes for everyone else on the Internet to process that they have to switch to your alternate transit.

The only solution I know of is to have redundant links to all transits. Going forward I will make sure we have this, because it is a huge disadvantage not being able to take a router out of service without causing downtime for all users. Not to mention a router crash or link failure, which should take seconds at most to reroute around but instead causes at least 5 minutes of unstable Internet.

Regards,

Baldur

On 09/01/2017 at 23:56, Laurent Vanbever wrote:
Hi NANOG,
We often read that the Internet (i.e. BGP) is "slow to converge". But how slow is it really? Do you care anyway? And can we (researchers) do anything about it? Please help us find out by answering our short anonymous survey (<10 minutes).
Survey URL: https://goo.gl/forms/JZd2CK0EFpCk0c272 <https://goo.gl/forms/WW7KX5kT45m6UUM82>
** Background:
While existing fast-reroute mechanisms enable sub-second convergence upon local outages (planned or not), they do not apply to remote outages happening further away from your AS as their detection and protection mechanisms only work locally.
Remote outages therefore mandate a "BGP-only" convergence which tends to be slow, as long streams of BGP UPDATEs (up to hundreds of thousands of them) must be propagated router-by-router. Our initial measurements indicate that it can take state-of-the-art BGP routers dozens of seconds to process and propagate these large streams of BGP UPDATEs. During this time, traffic for important destinations can be lost.
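As a rough illustration of why router-by-router propagation adds up, here is a minimal back-of-the-envelope sketch. It is not a model of any real BGP implementation: the function names, the per-router processing rate, and the hop count are all hypothetical inputs chosen for illustration, not measurements from this survey or from our experiments.

```python
def store_and_forward_delay(num_updates: int, updates_per_second: float, hops: int) -> float:
    """Seconds until the last router in a chain has processed the whole stream,
    assuming (hypothetically) that each router finishes processing every UPDATE
    before it forwards any of them to the next router."""
    if num_updates < 0 or updates_per_second <= 0 or hops < 1:
        raise ValueError("need num_updates >= 0, updates_per_second > 0, hops >= 1")
    return hops * num_updates / updates_per_second


def pipelined_delay(num_updates: int, updates_per_second: float, hops: int) -> float:
    """Same chain, but assuming each router forwards an UPDATE as soon as it has
    processed it, so the routers work on the stream in parallel."""
    if num_updates < 0 or updates_per_second <= 0 or hops < 1:
        raise ValueError("need num_updates >= 0, updates_per_second > 0, hops >= 1")
    if num_updates == 0:
        return 0.0
    # The last UPDATE leaves the first router after num_updates processing slots,
    # then needs one more slot at each of the remaining (hops - 1) routers.
    return (num_updates + hops - 1) / updates_per_second


if __name__ == "__main__":
    # Hypothetical inputs: 100,000 UPDATEs, 10,000 UPDATEs/s per router, 3 hops.
    print(store_and_forward_delay(100_000, 10_000, 3))
    print(pipelined_delay(100_000, 10_000, 3))
```

Even in the optimistic pipelined case, the total is bounded below by the time a single router needs to work through the full stream, which is why large bursts of UPDATEs translate into seconds of convergence time.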
** This survey:
This survey aims at evaluating the impact of slow BGP convergence on operational practices. We expect the findings to increase the understanding of the perceived BGP convergence in the Internet, which could then help researchers to design better fast-reroute mechanisms.
We expect the questionnaire to be filled out by network operators whose job relates to BGP operations. It has a total of 17 questions and should take less than 10 minutes to answer. The survey and the collected data are anonymous (so please do *not* include information that may help to identify you or your organization). All questions are optional, so if you don't like a question or don't know the answer, please skip it.
A summary of the aggregate results will be published as a part of a scientific article later this year.
Thank you so much in advance, and we look forward to reading your responses!
Laurent Vanbever (ETH Zürich, Switzerland)
PS: It goes without saying that we would also be extremely grateful if you could forward this email to any operator you might know who may not read NANOG.