On Sep 30, 2010, at 5:37 PM, Randy Bush <randy@psg.com> wrote:
i was recently bitten by a cousin of this.
a research router gets an ebgp multi-hop full feed from 147.28.0.1 (address is relevant).
it is on a lan with a default gateway 42.666.77.11 (address not relevant), so it has
ip route 0.0.0.0 0.0.0.0 42.666.77.11
massive flapping results.
it seems to get the bgp route for 147.28.0.0/16 and then cannot resolve the next hop (147.28.0.1). it would not recurse to the default exit.
of course it was solved by
ip route 147.28.0.0 255.255.0.0 42.666.77.11
but i do not really understand in my heart why i needed to do this.
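A toy model may help show why the static was needed. This is a minimal Python sketch under stated assumptions, not how IOS, JunOS, or any other router actually resolves next hops. It assumes longest-prefix-match resolution that refuses to resolve a next hop through a route it has already used, plus Cisco's default administrative distances (connected 0, static 1, eBGP 20). The gateway 192.0.2.1 and LAN 192.0.2.0/24 are hypothetical stand-ins, because 42.666.77.11 is not a valid IPv4 address. The names `Route`, `build_rib`, `longest_match` and `resolve` are illustrative only.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins: 42.666.77.11 is not a valid IPv4 address.
GATEWAY = "192.0.2.1"
LAN = "192.0.2.0/24"

@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    next_hop: Optional[str]  # None means directly connected
    source: str
    distance: int            # administrative distance, lower wins

def route(prefix, next_hop, source, distance):
    return Route(ipaddress.ip_network(prefix), next_hop, source, distance)

def build_rib(candidates):
    """Keep one route per prefix: the one with the lowest administrative distance."""
    best = {}
    for r in candidates:
        if r.prefix not in best or r.distance < best[r.prefix].distance:
            best[r.prefix] = r
    return list(best.values())

def longest_match(rib, addr):
    ip = ipaddress.ip_address(addr)
    matches = [r for r in rib if ip in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)

def resolve(rib, addr, seen=()):
    """Recursively resolve addr down to a connected route; None if unreachable or looping."""
    r = longest_match(rib, addr)
    if r is None or r in seen:
        return None   # no route, or the recursion came back to a route already used
    if r.next_hop is None:
        return r      # reached a connected route
    return resolve(rib, r.next_hop, seen + (r,))

connected = route(LAN, None, "connected", 0)
default = route("0.0.0.0/0", GATEWAY, "static", 1)
bgp = route("147.28.0.0/16", "147.28.0.1", "ebgp", 20)
fix = route("147.28.0.0/16", GATEWAY, "static", 1)

before = build_rib([connected, default])            # peer reached via the default
after = build_rib([connected, default, bgp])        # /16 learned; its next hop is inside itself
fixed = build_rib([connected, default, bgp, fix])   # static /16 beats eBGP on distance

print(resolve(before, "147.28.0.1") is not None)    # True
print(resolve(after, "147.28.0.1") is not None)     # False
print(resolve(fixed, "147.28.0.1").source)          # connected
```

In `after`, the lookup for 147.28.0.1 matches the learned /16 itself, so the recursion loops back onto the same route. In `fixed`, the static /16 wins on distance and the same lookup ends at the connected LAN via the gateway.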
Looks like a classic race condition: upon arrival, 147.28/16 becomes a better (longer) match for the recursed next hop, which until then was really a recursive lookup through your default. So you get 147.28/16 -> 147.28.0.1, and then 147.28.0.1 looks best through the learned route itself.

Of course, this would appear to be a matter of how it is implemented, because the 147 route isn't yet in the routing table at that point, so your default should apply. The static seems to force the recursion to the 666 next hop: with its administrative distance of 1 it beats eBGP's 20, so the routing table entry for 147.28/16 points at your gateway rather than at 147.28.0.1.

I'll wait for your friend to send the implementation details, but at a glance it looks like a defensive (lazy?) attempt to avoid a recursion loop while processing received updates. Btw, this may also happen on a Juniper (or at least it used to); I'll have to check to confirm.

Chris
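The flapping itself can be sketched as a tick-by-tick loop. This is again a hypothetical Python toy, not vendor behavior: it uses the same stand-in gateway and default administrative distances, rebuilds the routing table each tick from whatever is installed, and treats the multihop session as up only if 147.28.0.1 resolves without looping. Real BGP timers (hold time, reconnect backoff) are ignored, and `simulate`, `reachable` and the route tuples are illustrative names.

```python
import ipaddress

GATEWAY = "192.0.2.1"  # hypothetical stand-in for 42.666.77.11 (not a valid IPv4 address)
PEER = "147.28.0.1"

# (prefix, next_hop, administrative distance); next_hop None means connected
CONNECTED = ("192.0.2.0/24", None, 0)
DEFAULT = ("0.0.0.0/0", GATEWAY, 1)
BGP_ROUTE = ("147.28.0.0/16", PEER, 20)
STATIC_FIX = ("147.28.0.0/16", GATEWAY, 1)

def build_rib(candidates):
    """Map each prefix to (next_hop, distance), keeping the lowest distance."""
    best = {}
    for prefix, nh, dist in candidates:
        net = ipaddress.ip_network(prefix)
        if net not in best or dist < best[net][1]:
            best[net] = (nh, dist)
    return best

def reachable(rib, addr, seen=frozenset()):
    """Longest-prefix-match recursion; False on no match or on a resolution loop."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in rib if ip in net]
    if not matches:
        return False
    net = max(matches, key=lambda n: n.prefixlen)
    if net in seen:
        return False  # recursion looped back onto a prefix already used
    nh = rib[net][0]
    if nh is None:
        return True   # reached a connected prefix
    return reachable(rib, nh, seen | {net})

def simulate(static_routes, ticks=6):
    """Record the session state after each tick."""
    session_up = False
    history = []
    for _ in range(ticks):
        candidates = list(static_routes)
        if session_up:
            candidates.append(BGP_ROUTE)  # the full feed includes the peer's own /16
        rib = build_rib(candidates)
        session_up = reachable(rib, PEER)  # the multihop session needs a path to PEER
        history.append("up" if session_up else "down")
    return history

print(simulate([CONNECTED, DEFAULT]))              # ['up', 'down', 'up', 'down', 'up', 'down']
print(simulate([CONNECTED, DEFAULT, STATIC_FIX]))  # ['up', 'up', 'up', 'up', 'up', 'up']
```

Without the static, installing the /16 breaks the path to the peer, the session drops, the /16 is withdrawn, the default takes over again, and the cycle repeats. With the static, the learned /16 never wins the routing table entry, so the session stays up.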
randy