On Tue, Jan 08, 2013 at 03:45:10PM +0100, Tim Vollebregt wrote:
Hi,
As a workaround, we now configure a default route towards a core router on 8 x 10G before doing maintenance on an MX box. That route is installed before the BGP sessions come up. This will cause some packet loss during outages at burst hours, but it is fine during maintenance hours.
I've seen cases where it took up to 30 minutes before the full table was installed correctly in the PFEs.
Currently this issue/bug is holding back our Juniper deployments. As far as I know, Juniper created a project group for this bug, and so far they have been able to reproduce the issue. It looks like the issue is now being taken seriously.
PR 836197

I actually have very good luck reproducing it:

http://cluepon.net/ras/rpdstall.png

The issue appears to be that when rpd is busy processing incoming BGP updates (such as when you turn up a large number of peers simultaneously), it starves the rest of the process of the CPU time it needs to actually handle and install the routes.

The graph above shows a plot of the total BGP paths, the number of routes in the "pending" state, and the number of routes actually installed into the forwarding hardware. This is a very simplified example (nothing but IBGP sessions with very simple policies here, not even any EBGP neighbors), using the latest top-of-the-line routing engine, so in real life the issue is much worse.

As you can see, while rpd is still busy receiving and processing the incoming updates, the number of pending routes rises and doesn't fall, and the number of routes installed in the PFE stays almost nonexistent. A few routes actually manage to squeak in before all of the BGP sessions come up, which is why it has any at all for the period between 0 and 330 seconds. After the router finishes receiving the BGP paths, the pending routes clear very quickly, and then the FIB installation process begins. 8 minutes after turning up the BGP sessions, this router finally has a full table installed in hardware. The pending routes actually clear much quicker than this once the BGP routes stop coming in; I need to update this graph with a higher resolution to show it. :)

Juniper actually DOES have a fix for this issue: tweaking the scheduler in rpd so that the router still processes BGP routes even when it's spending a lot of time receiving new routes. Unfortunately they haven't yet decided to prioritize implementing this fix, so it's still stuck in development. If this issue drives you as insane as it does me, I highly encourage you to talk to your account team about PR 836197 and why 8-20+ minutes to install routes to the FIB is not acceptable to you.
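To make the starvation pattern above concrete, here is a toy single-threaded scheduler model in Python. It is only a sketch: it is not rpd's actual scheduler and not Juniper's fix. The function name `simulate`, the per-tick work budget, and the `install_share` knob are all hypothetical, invented purely for illustration, and the numbers are arbitrary.

```python
def simulate(total_routes, arrival_per_tick, budget_per_tick, install_share=0.0):
    """Toy model of one process that both receives updates and installs routes.

    Hypothetical illustration only. Each tick the process has
    `budget_per_tick` units of work. Receiving one update costs one unit and
    moves a route to "pending"; installing one pending route costs one unit.
    `install_share` is the fraction of the budget held back for installs.

    Returns a list of (pending, installed) tuples, one per tick.
    """
    if not 0.0 <= install_share < 1.0:
        raise ValueError("install_share must be in [0, 1)")
    if budget_per_tick <= 0 or arrival_per_tick <= 0:
        raise ValueError("rates must be positive")

    to_arrive = total_routes  # updates the peers have not sent yet
    inbox = 0                 # updates received on the wire, not yet processed
    pending = 0               # routes processed, waiting to be installed
    installed = 0
    history = []

    while installed < total_routes:
        arrived = min(arrival_per_tick, to_arrive)
        to_arrive -= arrived
        inbox += arrived

        # Receive work is served first, minus whatever is reserved for installs.
        reserved = int(budget_per_tick * install_share)
        received = min(inbox, budget_per_tick - reserved)
        inbox -= received
        pending += received

        # Installs only get the budget that receiving did not consume.
        done = min(pending, budget_per_tick - received)
        pending -= done
        installed += done

        history.append((pending, installed))

    return history


# Receive-first scheduling versus reserving half the budget for installs.
starved = simulate(1000, 100, 100, install_share=0.0)
shared = simulate(1000, 100, 100, install_share=0.5)
```

In this toy run, the receive-first policy has 1000 routes pending and 0 installed after tick 10, while the reserved-share policy already has 500 installed at the same point. Both finish at tick 20, so the total work is unchanged; the difference is that forwarding state starts filling in while updates are still arriving.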
-- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)