Baldur Norddahl Sent: Monday, February 10, 2020 3:06 PM
No matter how much money you put into your peering router, the session will be no more stable than whatever the peer did at their end.
Agreed, that's a fair point.
Plus at some point you will need to reboot due to a software upgrade or other reasons.
There are ways of draining traffic for planned maintenance.
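One such option, as far as I know, is BGP graceful shutdown (RFC 8326): before the maintenance you tag your routes with the well-known GRACEFUL_SHUTDOWN community (65535:0), and a neighbour that honours it drops LOCAL_PREF to 0 so traffic moves to an alternate path while the session is still up. A minimal sketch of the idea in Python, not a router config; the Route class, the prefix, and the R1/R2 labels are made up for illustration, and best-path selection is reduced to LOCAL_PREF only:

```python
from dataclasses import dataclass, field

# Well-known GRACEFUL_SHUTDOWN community defined in RFC 8326.
GRACEFUL_SHUTDOWN = (65535, 0)


@dataclass
class Route:
    prefix: str
    next_hop: str  # hypothetical label for the router advertising the path
    local_pref: int = 100
    communities: set = field(default_factory=set)


def apply_import_policy(route):
    """Receiver side: a route tagged GRACEFUL_SHUTDOWN gets LOCAL_PREF 0."""
    if GRACEFUL_SHUTDOWN in route.communities:
        route.local_pref = 0
    return route


def best_path(routes):
    """Simplified best-path: highest LOCAL_PREF wins, other tie-breakers ignored."""
    return max(routes, key=lambda r: r.local_pref)


via_r1 = Route("203.0.113.0/24", "R1")                 # normally preferred
via_r2 = Route("203.0.113.0/24", "R2", local_pref=90)  # backup path

before = best_path([apply_import_policy(via_r1), apply_import_policy(via_r2)]).next_hop

# Start of the maintenance window on R1: tag its route, then re-run selection.
via_r1.communities.add(GRACEFUL_SHUTDOWN)
after = best_path([apply_import_policy(via_r1), apply_import_policy(via_r2)]).next_hop

print(before, "->", after)  # R1 -> R2
```

The point is that the alternate path is selected before the session goes down, so the move happens without a period of blackholing.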
If you care at all, you should be doing redundancy by having multiple locations, multiple routers. You can then save the money spent on each router, because a router failure will not cause any change in what the internet sees through BGP.
I think a router failure will cause a change in what the Internet sees, as you rightly outlined below:
Also transits are way more important than peers. Losing a transit will cause massive route changes around the globe and it will take a few minutes to stabilize. Losing a peer usually just means the peer switches to the transit route, that they already had available.
Agreed, and I suppose the question is whether folks try to minimize these impacts by all means possible or just accept them as a necessary evil that will eventually happen.
Peers are not equal. You may want to ensure redundancy to your biggest peers, while the small fish will be fine without.
To be explicit: Router R1 has connections to transits T1 and T2. Router R2 also has connections to the same transits T1 and T2. When router R1 goes down, only small internal changes at T1 and T2 happen. Nobody notices and the recovery is sub-second.
Good point again. Though if I had only T1 on R1 and only T2 on R2, then convergence would not happen inside each transit but between T1 and T2 instead, which adds to the convergence time. So, thinking about it, the optimal design pattern in a distributed (horizontally scaled-out) edge seems to be to pair up, i.e. to have at least two edge nodes per transit (or peer, for that matter), in order to allow for potentially faster intra-transit convergence rather than arguably slower inter-transit convergence.

adam
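
P.S. A rough sketch of the pairing idea, in case it helps the discussion. Given which transits each edge router connects to, it reports, for a failed router, whether each affected transit still has a session to another of our routers (so it can reconverge internally) or has none left (so the Internet has to reconverge between transits). The convergence_scope function and the session maps are hypothetical and only mirror the R1/R2, T1/T2 example from this thread:

```python
def convergence_scope(sessions, failed_router):
    """Classify where convergence happens for each transit on the failed router.

    sessions maps a router name to the set of transits it connects to.
    """
    result = {}
    for transit in sorted(sessions[failed_router]):
        # Other routers of ours that still have a session to this transit.
        survivors = [
            router
            for router, transits in sessions.items()
            if router != failed_router and transit in transits
        ]
        result[transit] = "intra-transit" if survivors else "inter-transit"
    return result


# Paired design: both routers connect to both transits.
paired = {"R1": {"T1", "T2"}, "R2": {"T1", "T2"}}
# Split design: one transit per router.
split = {"R1": {"T1"}, "R2": {"T2"}}

print(convergence_scope(paired, "R1"))  # {'T1': 'intra-transit', 'T2': 'intra-transit'}
print(convergence_scope(split, "R1"))   # {'T1': 'inter-transit'}
```

It says nothing about actual timers, of course; it only flags which designs leave a transit with no surviving session when one edge node fails.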