RE: [j-nsp] Krt queue issues
Look into Static route retain. Should keep the route in the forwarding table.
From Juniper site <<< Route Retention
By default, static routes are not retained in the forwarding table when the routing process shuts down. When the routing process starts up again, any routes configured as static routes must be added to the forwarding table again. To avoid this latency, routes can be flagged as retain, so that they are kept in the forwarding table even after the routing process shuts down. Retention ensures that the routes are always in the forwarding table, even immediately after a system reboot.
Thanks,
Jensen Tyler Sr Engineering Manager Fiberutilities Group, LLC
-----Original Message----- From: juniper-nsp-bounces@puck.nether.net [mailto:juniper-nsp-bounces@puck.nether.net] On Behalf Of Benny Amorsen Sent: Wednesday, October 03, 2012 8:32 AM To: Jared Mauch Cc: Saku Ytti; juniper-nsp@puck.nether.net Subject: Re: [j-nsp] Krt queue issues
Jared Mauch <jared@puck.nether.net> writes:
As far as the fallback 'default' route, if you are purchasing transit from someone, you could consider a last-resort default pointed at them. You can exclude routes like 10/8 etc by routing these to discard + install on your devices.
That only helps if the default gets installed first, though. If the default has to wait at boot in the krt-queue behind the 300k+ Internet-routes, I have not really gained anything...
I suppose it is likely that a static default would be installed before the BGP sessions even come up.
/Benny _______________________________________________ juniper-nsp mailing list juniper-nsp@puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp
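The last-resort default plus discard routes suggested above relies on longest-prefix matching: the more specific 10/8 discard wins over the default for private space, and everything else falls through to transit. A minimal Python sketch of that lookup follows. The table contents are assumptions for illustration (192.0.2.1 is a documentation address standing in for a transit next hop), and this models only the selection logic, not how any router implements it.

```python
import ipaddress

# Hypothetical static table: a last-resort default toward transit plus a
# discard route for space that should never follow the default.
# 192.0.2.1 is a documentation address, not a real transit next hop.
STATIC_ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "next-hop 192.0.2.1",
    ipaddress.ip_network("10.0.0.0/8"): "discard",
}


def lookup(destination, table=STATIC_ROUTES):
    """Return the action of the longest matching prefix for destination."""
    addr = ipaddress.ip_address(destination)
    # Collect every prefix that covers the address, then prefer the most
    # specific one (largest prefix length).
    matches = [net for net in table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]
```

With this table, a lookup for an address inside 10/8 returns the discard action, while any other IPv4 address returns the default's next hop.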
I think route retention might help in the event the table was cleared or the routing process restarted, but I don't think that it will help with a boot because the table structures are being built as part of the system initialization. In reality, I would expect the static routes to get installed very early, as soon as the routing process comes up. Since you will need a route to your BGP neighbor (even though it may be directly connected, it is still a route), routing has to be up BEFORE BGP establishes and by definition your static routes will have to be up before your BGP routes are ready. How well your router responds to traffic during an initial boot and during a 300,000 route update is another story. My experience with very large routers and tables is that you will have a hard time guaranteeing user traffic will pass with very much performance during an event like a full table rebuild. Luckily with the bandwidth we have these days and the CPU power on the routers, it does not take that long to pull in a full internet table and begin handling traffic.
Steven Naslund
-----Original Message----- From: Jensen Tyler [mailto:JTyler@fiberutilities.com] Sent: Wednesday, October 03, 2012 9:45 AM To: nanog@nanog.org Subject: RE: [j-nsp] Krt queue issues
Look into Static route retain. Should keep the route in the forwarding table.
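The ordering question discussed above (does a static default sit behind 300k+ Internet routes, or does it go in first?) can be pictured with a toy FIFO queue. This is only a sketch under stated assumptions: the real krt queue is not necessarily a plain FIFO, the route names are placeholders, and the install rate passed to `wait_seconds` is a hypothetical figure, not a measured one.

```python
from collections import deque

DEFAULT = "0.0.0.0/0"


def installs_before(queue, prefix):
    """Count the entries a strict FIFO queue installs before reaching prefix."""
    return list(queue).index(prefix)


def wait_seconds(position, installs_per_second):
    """Time spent waiting behind `position` entries at an assumed install rate."""
    return position / installs_per_second


# Placeholder names standing in for 300k BGP-learned routes.
bgp_routes = [f"bgp-route-{i}" for i in range(300_000)]

# Case 1: the static default is queued before any BGP route exists.
static_first = deque([DEFAULT] + bgp_routes)
# Case 2: the default is queued behind the full table.
static_last = deque(bgp_routes + [DEFAULT])
```

In the first case the default waits behind nothing; in the second it waits behind all 300,000 entries, so its delay scales directly with whatever the install rate turns out to be.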
Hi,
What we do nowadays as a workaround is to configure a default route towards a core router on 8 x 10G before performing maintenance on an MX box. This default is installed before the BGP sessions come up. It will cause some packet loss during burst-hour outages, but is fine during maintenance hours.
I've seen cases where it took up to 30 minutes before the full table was installed correctly in the PFEs.
Currently this issue/bug is holding back our Juniper deployments. As far as I know Juniper created a project group for this bug, and so far they have been able to reproduce the issue. Looks like the issue is being taken seriously now.
Tim
On Oct 3, 2012, at 11:50 PM, Naslund, Steve wrote:
I think route retention might help in the event the table was cleared or the routing process restarted, but I don't think that it will help with a boot because the table structures are being built as part of the system initialization. In reality, I would expect the static routes to get installed very early, as soon as the routing process comes up. Since you will need a route to your BGP neighbor (even though it may be directly connected, it is still a route), routing has to be up BEFORE BGP establishes and by definition your static routes will have to be up before your BGP routes are ready. How well your router responds to traffic during an initial boot and during a 300,000 route update is another story. My experience with very large routers and tables is that you will have a hard time guaranteeing user traffic will pass with very much performance during an event like a full table rebuild. Luckily with the bandwidth we have these days and the CPU power on the routers, it does not take that long to pull in a full internet table and begin handling traffic.
Steven Naslund
On Tue, Jan 08, 2013 at 03:45:10PM +0100, Tim Vollebregt wrote:
Hi,
What we do nowadays as a workaround is to configure a default route towards a core router on 8 x 10G before performing maintenance on an MX box. This default is installed before the BGP sessions come up. It will cause some packet loss during burst-hour outages, but is fine during maintenance hours.
I've seen cases where it took up to 30 minutes before the full table was installed correctly in the PFEs.
Currently this issue/bug is holding back our Juniper deployments. As far as I know Juniper created a project group for this bug, and so far they have been able to reproduce the issue. Looks like the issue is being taken seriously now.
PR 836197

I actually have very good luck reproducing it: http://cluepon.net/ras/rpdstall.png

The issue appears to be that when rpd is busy processing incoming BGP updates (such as when you turn up a large number of peers simultaneously), it starves the rest of the process from actually spending any CPU time handling/installing the routes. The graph above shows a plot of the total BGP paths, the number of routes in the "pending" state, and the number of routes actually installed into the forwarding hardware. This is a very simplified example (nothing but IBGP sessions with very simple policies here, not even any EBGP neighbors), using the latest top of the line routing engine, so in real life the issue is much worse.

As you can see, while rpd is still busy receiving and processing the incoming updates, the number of pending routes rises and doesn't fall, and the number of routes installed in the PFE stays almost non-existent. A few routes actually manage to squeak in before all of the BGP sessions come up, which is why it has any at all for the period between 0 and 330 seconds. After the router finishes receiving the BGP paths, the pending routes clear very quickly, and then the FIB installation process begins. 8 minutes after turning up the BGP sessions, this router finally has a full table installed in hardware. The pending routes actually clear much quicker than this once the BGP routes stop coming in; I need to update this graph with a higher resolution to show it. :)

Juniper actually DOES have a fix for this issue, tweaking the scheduler in rpd so that the router still processes BGP routes even when it's spending a lot of time receiving new routes. Unfortunately they haven't yet decided to prioritize implementing this fix, so it's still stuck in development. If this issue drives you as insane as it does me, I highly encourage you to talk to your account team about PR 836197 and why 8-20+ minutes to install routes to the FIB is not acceptable to you.
-- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
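The starvation pattern described above can be illustrated with a toy scheduler model in Python. This is a sketch under loud assumptions: it is not rpd's actual scheduler, the per-tick budget and arrival numbers are made up, and `install_share` is a hypothetical knob standing in for the kind of scheduler tweak mentioned in the message. Each tick has a fixed work budget; receive work takes whatever is not reserved for installs.

```python
def simulate(total_routes, arrivals_per_tick, budget_per_tick, install_share):
    """Return the cumulative number of installed routes after each tick.

    install_share is the fraction of each tick's budget reserved for
    installing already-processed routes (0.0 means receive work always wins).
    """
    to_arrive = total_routes
    inbox = 0      # updates received from peers but not yet processed
    pending = 0    # routes processed but not yet installed
    installed = 0
    history = []
    while installed < total_routes:
        arriving = min(arrivals_per_tick, to_arrive)
        to_arrive -= arriving
        inbox += arriving

        # Reserved share: installs run first, up to the reserved amount.
        reserved = int(budget_per_tick * install_share)
        done = min(reserved, pending)
        pending -= done
        installed += done
        budget = budget_per_tick - done

        # Receive work consumes as much of the remaining budget as it can.
        processed = min(budget, inbox)
        inbox -= processed
        pending += processed
        budget -= processed

        # Installs only get whatever budget is left over.
        done = min(budget, pending)
        pending -= done
        installed += done
        history.append(installed)
    return history


# Hypothetical numbers: 10,000 routes arriving as fast as the budget allows.
starved = simulate(10_000, 1_000, 1_000, install_share=0.0)
shared = simulate(10_000, 1_000, 1_000, install_share=0.5)
```

In this model both runs do the same total work and finish on the same tick, but with `install_share=0.0` nothing is installed until the updates stop arriving, while with a reserved share routes reach the forwarding table progressively during the flood.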
Hi,
On Tue, Jan 8, 2013 at 10:20 PM, Richard A Steenbergen <ras@e-gerbil.net> wrote:
PR 836197
That looks like a spanking new PR number to me. The highest PR number I found in 12.2 release notes was 82xxxx. Rather strange that they didn't have an earlier PR number, while the issue has existed for such a long time.
If this issue drives you as insane as it does me, I highly encourage you to talk to your account team about PR 836197
Done. I can't read PR836197 online as it is not public. Can you post it without liability? If you would be liable do not post it.. Also do _not_ email me off list with the PR description....... Thanks.
On Tue, Jan 08, 2013 at 11:10:16PM +0100, bas wrote:
Hi,
On Tue, Jan 8, 2013 at 10:20 PM, Richard A Steenbergen <ras@e-gerbil.net> wrote:
PR 836197
That looks like a spanking new PR number to me. The highest PR number I found in 12.2 release notes was 82xxxx. Rather strange that they didn't have an earlier PR number, while the issue has existed for such a long time.
Oh I have a pile of PR's about a mile long, including some that I opened on this issue 5+ years ago. But I'm not going to harp on the complete absurdity of how long it has taken to finally figure this thing out, or the number of people who have seen this issue while they've claimed all along that nobody else sees it. I'm just going to focus on fixing it. This is the PR that they've chosen for implementing the actual fix, so that's what I'm going with for the sake of simplicity. :)
I can't read PR836197 online as it is not public. Can you post it without liability? If you would be liable do not post it.. Also do _not_ email me off list with the PR description.......
Neither can I, but the basic description of the issue is what I said before. :)
-- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
participants (5)
- bas
- Jensen Tyler
- Naslund, Steve
- Richard A Steenbergen
- Tim Vollebregt