On 12/Jun/20 04:01, Michael Hare wrote:
Mark (and others),
I used to run loose uRPF on peering/transit links for AS3128 because I thought that tightening the screws was always the "right thing to do".
I instrumented vendor J's uRPF drop counters on these links at 60s granularity. Drops during steady state [BGP converged] were low [Kbps]. Drops during planned maintenance were at much higher rates for a few minutes.
What was happening: I advertise a handful of routes to transit/peers from multiple ASBRs. Typically my ASBR sees 800K FIB routes and a few million RIB routes. We all know this takes a good amount of time to churn.
For planned maintenance of ASBR A [cold boot upgrades], if recovery didn't include converging my inbound routes before advertising via eBGP, I'd be tossing packets due to loose uRPF.
Remember that during this time 'ASBR B' in my AS is happily egressing traffic. As soon as 'ASBR A' advertises my dozen or so prefixes via eBGP, I start to see return traffic on it, well before 'ASBR A' has converged. There is no more-specific route covering that return traffic's sources yet, other than maybe a default, and if unlucky that lasts a few minutes. The result is traffic going to the bit bucket network-wide, despite ASBR B functioning just fine.
Maybe everyone already converges inbound routes before advertising via eBGP and I made a rookie mistake, but what about unplanned events?
For me, the summary is that I was causing more collateral damage than good [verified by time series data], so I turned off loose uRPF. YMMV.
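A rough way to see why the return traffic in the scenario above gets dropped: loose uRPF only asks whether the FIB holds any route covering a packet's source address; it does not check which interface that route points to. The Python sketch below models just that check under simplifying assumptions. It is not vendor J's implementation. The prefixes are hypothetical documentation ranges, and the `allow_default` flag is an assumption standing in for the vendor- and configuration-specific question of whether a default route satisfies the check.

```python
import ipaddress


def loose_urpf_pass(src, fib, allow_default=False):
    """Return True if a packet with source `src` passes a simplified loose uRPF check.

    Loose mode only asks whether *any* FIB entry covers the source address;
    the ingress interface is not compared. Whether a default route counts is
    modelled by the hypothetical `allow_default` flag.
    """
    addr = ipaddress.ip_address(src)
    for prefix in fib:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen == 0 and not allow_default:
            continue  # default route ignored for the uRPF check
        if addr.version == net.version and addr in net:
            return True
    return False


# Hypothetical FIB on ASBR A right after a cold boot: default plus internal routes only.
converging_fib = ["0.0.0.0/0", "192.0.2.0/24"]
# Hypothetical FIB once the full table from transit/peers has been installed.
converged_fib = converging_fib + ["198.51.100.0/24"]

return_src = "198.51.100.7"  # remote host replying to traffic that egressed via ASBR B
print(loose_urpf_pass(return_src, converging_fib))  # False: dropped while converging
print(loose_urpf_pass(return_src, converged_fib))   # True: forwarded after convergence
```

In this model the same packet flips from dropped to forwarded purely as a function of FIB state, which is why advertising prefixes via eBGP before inbound routes have converged exposes the gap.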
To be honest, we haven't seen this. We've got plenty of peering and transit exit/entry points, each just about dedicated to its own router, across multiple cities in Africa (peering) and Europe (peering + transit). We also only carry about 10%-15% of our traffic via transit (remember, we don't run any kind of uRPF on our peering routers). We originate our aggregates from deep within the core, never from the transit, peering or edge routers. Some years back, we did experience slow convergence for the transit network connected to our ASR9001 in Amsterdam, but that router had been slowing down for years. Since we swapped it out for MX480s and/or MX204s, the problem has gone away.

Mark.