On Tue, 2010-02-16 at 21:50 -0800, Joe Abley wrote:
> On 2010-02-16, at 19:53, Tomas L. Byrnes wrote:
>
>> There's significant theoretical work, backed up with lots of practical experience connecting a lot more nodes in real time in a lot more places than the Internet currently does, that posits that the control and forwarding plane should actually ALWAYS be separate, and control higher priority, so that state management converges faster than the dataflows.
>>
>> I'd like to see the countervailing, peer reviewed, references.
>
> I have no shortage of anecdotes where a non-trivial layer-2 topology at an exchange point has left my router and provider X's router both able to talk to a route server, but unable to talk to each other directly. Since the NEXT_HOP on routes we each learnt from the route server pointed at an address we couldn't talk to, the result was a black hole.
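
The failure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names (blackholed_pairs, rs1, peer_a and so on) are hypothetical, layer-2 reachability is modelled as a plain set of working pairs, and nothing here reflects how rsd or any real route server works. It flags every pair of peers that can both reach the route server but cannot reach each other, which is the condition under which a NEXT_HOP learnt via the route server points at an unreachable address.

```python
from itertools import combinations


def blackholed_pairs(peers, route_server, l2_reachable):
    """Return peer pairs that both reach the route server but not each other.

    l2_reachable is a set of frozenset({a, b}) entries, one per pair of
    devices with working layer-2 connectivity (a hypothetical model).
    """
    def can_talk(a, b):
        return frozenset((a, b)) in l2_reachable

    affected = []
    for a, b in combinations(sorted(peers), 2):
        # Both BGP sessions to the route server stay up ...
        both_have_sessions = can_talk(a, route_server) and can_talk(b, route_server)
        # ... but traffic sent to the other peer's NEXT_HOP has nowhere to go.
        if both_have_sessions and not can_talk(a, b):
            affected.append((a, b))
    return affected


# Hypothetical exchange point: three peers, one route server, and a
# partial fabric failure that has cut peer_a off from peer_b only.
links = {
    frozenset(("rs1", "peer_a")),
    frozenset(("rs1", "peer_b")),
    frozenset(("rs1", "peer_c")),
    frozenset(("peer_a", "peer_c")),
    frozenset(("peer_b", "peer_c")),
}

print(blackholed_pairs({"peer_a", "peer_b", "peer_c"}, "rs1", links))
```

In practice the hard part is obtaining an accurate reachability set during the breakage; the check itself is trivial once that is known.
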
I have similar anecdotes... and I was on the side of running the route-servers. This gets to be a tough nut to crack, especially if you happen to have multiple RSes on opposite ends of a layer-2 failure (a case where intended redundancy resulted in unintended new failure modes). The best solution we came up with at the time was to add some control knobs to rsd so that we could quickly take down the BGP session to the peer on the falsely advertising RS.

Figuring out which third-party negotiated "pairwise peering" was being affected during a switch fabric breakage was done manually at the time, and it was neither very accurate nor, of course, expedient. We attempted to automate that part without much success.

--
/*=================[ Jake Khuon <khuon@NEEBU.Net> ]=================+
 | Packet Plumber, Network Engineers     /| / [~ [~ |) | | --------  |
 | for Effective Bandwidth Utilisation  / |/  [_ [_ |) |_| NETWORKS  |
 +==================================================================*/