Hello.

I've run into a bit of a snag and I hope some folks here may be able to enlighten. From time to time I check the 'sh platform hardware capacity' command on our Catalyst 6509s and have noticed this item:

CPU Resources
  CPU utilization: Module    5 seconds    1 minute    5 minutes
                   5  RP      1% /  0%          3%           4%
                   5  SP     82% / 27%         62%          73%

This is shown on two 6509 switches that we operate as Core layer devices. This value goes up to 85-90% during periods of peak traffic and I'm concerned that this may be a problem.

Checking 'sh proc cpu' is usually 10% or less.

I've gone over this document backwards and forwards and none of the situations outlined seem to apply here: http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note0918...

One thing to note is that our main ACL for ingress traffic is applied here due to historical reasons. It's roughly 5000 single host entries at present. We also use these devices for NDE.

I'm probably missing some other key details, but what could influence the SP like this? Any insight would be appreciated.

-- Philip L.
This should probably be on cisco-nsp rather than nanog, but...

5000 lines for ACL? I don't have any experience with ACLs of that size, but it sounds like a possible problem.

If you're doing netflow export and not doing sampled netflow, I'm guessing this is where your problem is. 'sh mls netflow table-contention detailed' might be able to confirm or rule this out.

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
* Jon Lewis:
I've run into a bit of a snag and I hope some folks here may be able to enlighten. From time to time I check the 'sh platform hardware capacity' command on our Catalyst 6509s and have noticed this item:
MSFC/PFC version is also relevant.
5000 lines for ACL? I don't have any experience with ACLs of that size, but it sounds like a possible problem.
Yes, but it should be doable. I don't know the commands for the current IOS releases, but "show tcam" (including "show tcam detail") and "show fm interface" were quite helpful for designing ACLs for efficient processing.
This is on a Sup720-3BXL by the way:

'sh mls netflow table-con detailed':

Earl in Module 5
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization           :   100%
ICAM Utilization           :   6%
Netflow TCAM count         :   262024
Netflow ICAM count         :   8
Netflow Creation Failures  :   2085847
Netflow CAM aliases        :   0

I had read about this earlier, along with 100% TCAM usage for the FIB, but that wouldn't be the case here, as we're only showing 25% of the FIB TCAM being used.

-- Philip L.

Jon Lewis wrote:
This should probably be on cisco-nsp rather than nanog, but...
5000 lines for ACL? I don't have any experience with ACLs of that size, but it sounds like a possible problem.
If you're doing netflow export and not doing sampled netflow, I'm guessing this is where your problem is. sh mls netflow table-contention detailed might be able to confirm or rule this out.
----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
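For anyone who wants to watch these counters over time rather than eyeball them, here is a minimal Python sketch that parses the contention output shown in this message. The function names and the 95% threshold are made up for illustration, and the sample text is simply the output pasted above. Note that the creation-failure counter is cumulative, so a single non-zero reading only says failures have happened at some point.

```python
import re

# Output pasted from the thread; in practice this would be collected from the switch.
SAMPLE = """\
Earl in Module 5
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization           :   100%
ICAM Utilization           :   6%
Netflow TCAM count         :   262024
Netflow ICAM count         :   8
Netflow Creation Failures  :   2085847
Netflow CAM aliases        :   0
"""

def parse_contention(text):
    """Return {field name: integer value} for every 'name : value' line."""
    fields = {}
    for line in text.splitlines():
        # Match 'Some Name : 123' or 'Some Name : 100%'; header lines have no colon.
        m = re.match(r"\s*(.+?)\s*:\s*(\d+)%?\s*$", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def table_is_exhausted(fields, tcam_threshold=95):
    """Hypothetical check: TCAM nearly full AND flow creations have failed.

    The 95% threshold is an arbitrary assumption, not a Cisco recommendation.
    """
    return (fields.get("TCAM Utilization", 0) >= tcam_threshold
            and fields.get("Netflow Creation Failures", 0) > 0)

stats = parse_contention(SAMPLE)
print(stats["TCAM Utilization"], stats["Netflow Creation Failures"])
print(table_is_exhausted(stats))
```

Comparing the creation-failure value between two polls would give a better picture than one reading, since only the increase shows whether flows are being dropped right now.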
On Sat, 15 Nov 2008, Philip L. wrote:
This is on a Sup720-3BXL by the way:
'sh mls netflow table-con detailed':

Earl in Module 5
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization           :   100%
ICAM Utilization           :   6%
Netflow TCAM count         :   262024
Netflow ICAM count         :   8
Netflow Creation Failures  :   2085847
Netflow CAM aliases        :   0
This looks like the same issue I ran into not long ago. Switch your netflow over from full to sampled...you lose lots of data, but your hardware can't handle full netflow for your traffic level.

AFAIK, your only other options are to mess with the mls aging timers (shorten them) or buy cards with DFC and hope that gets you enough additional netflow capacity for the interfaces you're collecting on.

http://www.gossamer-threads.com/lists/cisco/nsp/94953

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
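To see why shortening the aging timers can help, a rough back-of-the-envelope model is that the number of entries in the table settles near (new flows per second) x (average time an entry lives before it ages out). The Python sketch below works that through. It assumes a nominal table size of 256K entries, which is consistent with the TCAM count of 262024 reported above but is an assumption here, and the lifetimes and flow rate are hypothetical illustration values, not measured ones. Hash collisions mean the usable capacity is likely lower than the nominal figure.

```python
# Assumed nominal NetFlow table size (256K entries); treat as an assumption.
TABLE_ENTRIES = 256 * 1024

def steady_state_entries(new_flows_per_sec, avg_lifetime_sec):
    """Approximate table occupancy: arrival rate times average entry lifetime."""
    return new_flows_per_sec * avg_lifetime_sec

def max_flow_rate(avg_lifetime_sec, capacity=TABLE_ENTRIES):
    """Highest new-flow rate the table can absorb before it fills."""
    return capacity / avg_lifetime_sec

# Hypothetical average lifetimes in seconds; shorter aging means shorter lifetimes.
for lifetime in (300, 64, 32):
    print(f"lifetime {lifetime:>3}s -> about {max_flow_rate(lifetime):.0f} new flows/s")

# Hypothetical load: 10,000 new flows/s living 32 s each still overflows the table.
print(steady_state_entries(10_000, 32) > TABLE_ENTRIES)
```

The same arithmetic shows why sampling helps as well: fewer packets create entries, so the effective new-flow rate drops.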
Hopefully he is not trying to use netflow for accounting/billing. I use:

mls sampling packet-based 1024 8192

As it is a convenient factor of ~1000 from the real numbers. 1Gbit/s of traffic shows up as 1Mbit/s. This has been accurate enough for anything I have wanted to look at like per-AS traffic.

- Kevin
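The scaling described above can be written out as a few lines of Python. This is only a sketch: it assumes the first argument of the sampling command (1024) means roughly one packet in 1024 is sampled, which is what the "factor of ~1000" remark implies, and the function names are made up for illustration.

```python
# Assumed meaning of 'mls sampling packet-based 1024 8192': 1-in-1024 sampling.
SAMPLING_RATE = 1024

def estimate_actual_bps(reported_bps, sampling_rate=SAMPLING_RATE):
    """Scale a rate seen in sampled NetFlow data back up to an estimate of real traffic."""
    return reported_bps * sampling_rate

def shortcut_error(sampling_rate=SAMPLING_RATE, shortcut=1000):
    """Fraction by which the 'multiply by 1000' shortcut underestimates."""
    return (sampling_rate - shortcut) / sampling_rate

# 1 Mbit/s in the sampled data corresponds to roughly 1 Gbit/s on the wire.
print(estimate_actual_bps(1_000_000))
print(f"{shortcut_error():.2%}")
```

The mental shortcut of multiplying by 1000 is off by a couple of percent, which is small next to the statistical noise of sampling itself and fine for per-AS traffic views, though not for billing.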
On Sat, Nov 15, 2008 at 04:35:28PM -0500, Philip L. wrote:
One thing to note, is that our main ACL for ingress traffic is applied here due to historical reasons. It's roughly 5000 single host entries at present. We also use these devices for NDE.
On a Sup720-3BXL, if your ACL TCAM utilization is fine, this shouldn't impact performance unless you're logging too much. Since you've been over the CPU utilization doc, I'm guessing you know that. "show platform hardware capacity acl" will give you a breakdown on your ACL TCAM usage.
I'm probably missing some other key details, but what could influence the SP like this? Any insight would be appreciated.
Cisco says that Netflow-based features always handle the first packet of a flow in software, but I don't know if this is the RP or the SP. It would make sense if a first-flow packet that didn't need punting hit the SP and not the RP. In that case, your traffic level with netflow enabled could explain your high SP utilization.

--
Ross Vandegrift
ross@kallisti.us

"If the fight gets hot, the songs get hotter. If the going gets tough, the songs get tougher."
  --Woody Guthrie
It is a Sup720-3BXL.

Based on the suggestions here, I went ahead and did 'no ip flow ingress' on all the interfaces just to see, and sure enough, the SP went down to about 10-15%. My colleague implemented packet count-based NetFlow sampling to attempt to reduce the 100% NetFlow TCAM usage, and it appears to be partially effective. It still fills up frequently, so we'll have to do some more tweaking.

I appreciate all the replies, public and private.

-- Philip L.
participants (5): Florian Weimer, Jon Lewis, Kevin Loch, Philip L., Ross Vandegrift