Hi Frederik,
On 09 Apr 2015, at 13:24, Frederik Kriewitz <frederik@kriewitz.eu> wrote:
Thank you very much for all your responses.
First of all, the problems we see are really RIB (processor memory) and CPU related. The TCAM/FIB limits are properly configured. From a FIB capacity point of view they should last a couple more years. Software routing doesn't cause the problem. The most extreme case of Cisco 6500/SUP720 abuse I'm aware of is a setup with 4 full table transit connections + 2 RR sessions + ~20 peerings, no downstreams. Besides the IPv4 and IPv6 peerings it's pretty much only handling a small amount of OSPF and MPLS (<5k prefixes, ~500 routers). No netflow or any other memory hog. Under normal conditions it's running at 20% CPU and 90% processor memory (1G/SUP720 XL).
The main limit here, apart from the rather slow RP CPU, is the amount of memory you can have. I’d set up a CSR1000v as RR and offload the control plane from the 6500 completely. The 6500 is a nice box for very fast hardware forwarding as long as the FIB fits in the TCAMs, which it seems it does in your scenario.
In case a session with a lot of prefixes (e.g. a transit) fails, it takes up to 5 minutes for the BGP Router process to recompute the RIB, etc. During that time it's running at 100% CPU. Low priority processes are completely ignored (e.g. SNMP based monitoring stops working). Occasionally it even drops OSPF neighbours or other BGP sessions due to expired hold timers, causing further havoc.
You can tune this with process time tweaks.
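To illustrate why a stall of that length is a problem, here is a rough back-of-the-envelope sketch in Python. It assumes the usual IOS defaults (BGP hold time 180 s, OSPF dead interval 40 s on broadcast segments) and the worst case where the protocol gets no CPU at all for the whole stall. Both are assumptions, not something measured on your box, and the names (`Adjacency`, `expired_during`) are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjacency:
    name: str
    hold_seconds: int  # silence tolerated before the neighbor is declared down


# Assumed defaults, not taken from the router in question.
DEFAULTS = [
    Adjacency("ospf", 40),   # OSPF dead interval on broadcast networks
    Adjacency("bgp", 180),   # BGP hold time
]


def expired_during(stall_seconds, adjacencies):
    """Names of adjacencies whose hold/dead timer runs out during the stall.

    Worst-case model: no hello or keepalive is sent or processed while the
    CPU is pinned, so a timer expires once the stall exceeds it.
    """
    return [a.name for a in adjacencies if stall_seconds > a.hold_seconds]


if __name__ == "__main__":
    for stall in (30, 120, 300):
        print(stall, expired_during(stall, DEFAULTS))
```

In practice hellos and keepalives usually still get some CPU, which is probably why you only see the drops occasionally. The point is just that a 5 minute recompute is longer than both timers, so anything that shortens the stall or lets other processes run in between helps.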
Applying a /22 filter was suggested. In order to actually save the RIB memory we would have to disable soft-reconfiguration on the corresponding sessions. I don't like that option for various reasons, as it trades lower memory usage for longer convergence times and a significantly bigger impact from route map updates. Due to the IPv4 exhaustion we expect to see more small prefixes in the future which can't be aggregated (considering the AS path). Simply dropping them would result in less optimal routing.
If you have to filter somewhere on something, I’d rather try to filter by AS_PATH (neighbors, etc) than prefix lengths.

-- 
"There's no sense in being precise when |               Łukasz Bromirski
 you don't know what you're talking     |      jid:lbromirski@jabber.org
 about."               John von Neumann |    http://lukasz.bromirski.net