We're running GRE/IPSec transport over a point-to-point DS3. We're also doing some QoS. The traffic mix is voice; our average packet size can be as low as 250 bytes at times.
We are seeing incredibly high CPU, at times over 95%, when the traffic levels approach 30Mb/s and around 11kpps in each direction. We've also seen packet loss in the priority queue.
We recently forklifted the routers on a point-to-point DS3 from 3845s to 3945s, thinking we'd see an improvement in performance. We saw no such improvement, and some on our team argue it's worse.
I'm assuming that the packet rate, coupled with the QoS and GRE, is simply killing the router's CPU. That said, are there any optimizations we could consider before ripping this thing out completely?
Yes, CEF is enabled.
I'm considering changing from GRE/IPSec to VTI, but I suspect this will still have the same switching behavior through the router and may not change anything.
One other possibility was to run IPSec tunnel mode and just exclude EIGRP from the tunnel, but that may be risky (IPSec fails == black hole).
Our fallback plan is to swap the DS3 out for an Ethernet circuit and get an L2 Ethernet encryptor, but that can't happen until 2011-01-01.
Any suggestions, wisdom?
Thanks, -cjp
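The rate figures in the post can be cross-checked with simple arithmetic: bits per second divided by packets per second gives the implied average packet size. A minimal Python sketch; the post does not say whether the 30Mb/s is measured before or after encapsulation, and the 44.736 Mb/s DS3 figure is the nominal line rate with framing overhead ignored.

```python
# Relates bit rate, packet rate and average packet size.
# Assumption: DS3_LINE_RATE is the nominal DS3 rate; framing overhead is ignored.
DS3_LINE_RATE = 44_736_000  # bit/s


def avg_packet_bytes(bits_per_sec: float, pkts_per_sec: float) -> float:
    """Average packet size implied by a bit rate and a packet rate."""
    return bits_per_sec / 8 / pkts_per_sec


def pps_at(bits_per_sec: float, packet_bytes: float) -> float:
    """Packet rate needed to carry bits_per_sec using packets of packet_bytes."""
    return bits_per_sec / (packet_bytes * 8)


if __name__ == "__main__":
    # Observed in the post: ~30 Mb/s at ~11 kpps in each direction.
    print(f"implied average size: {avg_packet_bytes(30e6, 11_000):.0f} bytes")
    # Same 30 Mb/s if every packet were 250 bytes:
    print(f"30 Mb/s at 250 B: {pps_at(30e6, 250):.0f} pps")
    # A full DS3 of 250-byte packets:
    print(f"DS3 at 250 B: {pps_at(DS3_LINE_RATE, 250):.0f} pps")
```

So 30Mb/s at 11kpps implies an average of roughly 341 bytes; if the average really fell to 250 bytes, the same bit rate would need about 15kpps per direction, and a full DS3 of such packets about 22kpps.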
This is probably more appropriate for the cisco-nsp list, but what process is taking up the CPU or is it due to interrupts? To the best of my knowledge the crypto should be hardware accelerated, while everything else is going to be done in software on the 3800.
-Pete
On Thu, Nov 18, 2010 at 11:10 AM, Christopher J. Pilkington <cjp@0x1.net> wrote:
We are seeing incredibly high CPU when the traffic levels approach 30Mb/s and around 11kpps in each direction, at times over 95%. We've seen packet loss as well in the priority queue.
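Pete's question (process or interrupts?) can be answered from the first line of IOS `show processes cpu`: the first five-second figure is total CPU, and the figure after the slash is the share spent at interrupt level, which is broadly where CEF-switched packets are handled. A minimal Python sketch for splitting that line; the sample line used to exercise it is made up, not taken from the poster's routers.

```python
import re

# Header line printed by IOS "show processes cpu", for example:
# "CPU utilization for five seconds: 96%/88%; one minute: 95%; five minutes: 94%"
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%;"
    r"\s*one minute: (\d+)%;\s*five minutes: (\d+)%"
)


def split_cpu(line: str) -> dict:
    """Split the five-second figure into interrupt-level and process-level CPU."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,        # packet switching at interrupt level
        "process_5s": total - interrupt,  # scheduled processes (routing, IP Input, ...)
        "one_min": one_min,
        "five_min": five_min,
    }
```

A high interrupt share points at the per-packet switching path; a high process share points at a specific process, which the rest of the `show processes cpu` output (sorted by CPU) would name.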
On 11/18/2010 14:39, Pete Lumbis wrote:
This is probably more appropriate for the cisco-nsp list, but what process is taking up the CPU or is it due to interrupts? To the best of my knowledge the crypto should be hardware accelerated, while everything else is going to be done in software on the 3800.
The ISR series do have onboard hardware crypto, but I don't know offhand if it can handle a full DS3's worth. My first guess is that fragment reassembly would kill it fast.
~Seth
Do you have the VPN/SSL AIM module? That would offload the crypto work. It is supposedly capable of full 100Mbps line rate; I have them in 2811s.
Sincerely,
Brian A. Rettke
RHCT, CCDP, CCNP, CCIP
Network Engineer, CableONE Internet Services
-----Original Message-----
From: Seth Mattinen [mailto:sethm@rollernet.us]
Sent: Thursday, November 18, 2010 3:48 PM
To: nanog@nanog.org
Subject: Re: Cisco GRE/IPSec performance, 3845 ISR/3945 ISR G2
There are a couple of potential issues that, when looked at as a whole, add up to a significant performance impact.
1) IPSec + GRE involves two forwarding operations: one to send the packet to the tunnel interface, and another to send the now-encapsulated packet out the WAN interface. This effectively halves the total forwarding rate before any other considerations.
2) While the IPSec portion is hardware accelerated, the GRE encapsulation is not, unless this is a Cat6500/CISCO7600 router, or a 7200VXR with a C7200-VSA card. Because of this, the GRE process itself will consume a fairly large amount of CPU, as this is also a per-packet process. The impact is similar to a forwarding decision, so that throughput level is halved again.
3) Other factors like the quantity of tunnels, any routing protocols running, NAT, or other such control protocols all have their own CPU demands too, and can, in aggregate, be a small but significant burden when the router also has to handle the demands of IPSec + GRE.
For reference, here is a guide to VPN performance: http://www.cisco.com/web/partners/downloads/765/tools/quickreference/vpn_performance_eng.pdf
It's slightly old, as it does not include the 39xx routers, but it is still useful for raw 3DES/AES performance on the 1800/2800/3800. See Table 5.
Sam Chesluk | Team Lead - Key Accounts | Network Hardware Resale | T: 805.690.3718 | M:805.450.7469 | F: 805-690-3713
26 Castilian Dr. Santa Barbara, CA 93117
E: sam@networkhardware.com | www.networkhardware.com
- NHR's top global performer 7 years running
- World's largest provider of pre-owned/fully-tested and new/sealed Cisco hardware
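Sam's points 1 and 2 amount to a rule of thumb: each extra per-packet step roughly halves the forwarding rate. A minimal Python sketch that encodes only that rule of thumb; `BASE_PPS` is a hypothetical plain-forwarding capacity chosen for illustration, not a published Cisco figure, and the offered load simply adds the ~11kpps reported in each direction.

```python
# Encodes only the halving rule of thumb from points 1 and 2.
# BASE_PPS is hypothetical, for illustration only; it is not a Cisco number.
BASE_PPS = 100_000  # plain CEF forwarding capacity, packets/s


def estimated_pps(base_pps: float, two_forwarding_ops: bool, software_gre: bool) -> float:
    """Forwarding capacity after applying the halving factors from points 1 and 2."""
    rate = base_pps
    if two_forwarding_ops:  # point 1: tunnel interface, then WAN interface
        rate /= 2
    if software_gre:        # point 2: GRE encapsulation done by the CPU
        rate /= 2
    return rate


def cpu_load(offered_pps: float, capacity_pps: float) -> float:
    """Fraction of the estimated capacity used by the offered packet rate."""
    return offered_pps / capacity_pps


if __name__ == "__main__":
    offered = 2 * 11_000  # ~11 kpps in each direction, per the original post
    cases = [
        ("plain forwarding", False, False),
        ("GRE/IPSec, point 1 only", True, False),
        ("GRE/IPSec, points 1 and 2", True, True),
    ]
    for label, two_ops, sw_gre in cases:
        cap = estimated_pps(BASE_PPS, two_ops, sw_gre)
        print(f"{label}: ~{cap:.0f} pps capacity, load {cpu_load(offered, cap):.0%}")
```

The point is the shape, not the numbers: a load that looks modest against plain forwarding capacity can approach saturation once two halving factors apply.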
On Thu, Nov 18, 2010 at 03:18:04PM -0800, Sam Chesluk wrote:
2) While the IPSec portion is hardware accelerated, the GRE encapsulation is not, unless this is a Cat6500/CISCO7600 router, or 7200VXR with C7200-VSA card. Because of this, the GRE process itself will consume a fairly large amount of CPU, as this is also a per-packet process. The impact is similar to a forwarding decision, so that throughput level is halved again.
I think this is where we're having the issue. It is just shocking that this is occurring in a relatively low kpps situation.
3) Other factors like quantity of tunnels, any routing protocols running, NAT, or other such control protocols all have their own CPU demands too, and can, in aggregate, be a small but significant burden when the router also has to handle the demands of IPSec + GRE.
The number we were given for the 3945 for IMIX 1400 raw IPSec performance was 840Mbps. However, all this extra crypto power is completely useless if the GRE processing is hitting the same limits as its predecessor, the 3845. We're going to give straight IPSec a go to see if that solves things.
-cjp
On Thursday 18 November 2010 18:18:04 Sam Chesluk wrote:
There are a couple potential issues, that when looked at in whole, add up to a significant performance impact.
1) IPSec + GRE involves two forwarding operations, one to send it to the tunnel interface , and another to send the now-encapsulated packet out the WAN interface. This effectively halves the total forwarding rate before any other considerations.
2) While the IPSec portion is hardware accelerated, the GRE encapsulation is not, unless this is a Cat6500/CISCO7600 router, or 7200VXR with C7200-VSA card. Because of this, the GRE process itself will consume a fairly large amount of CPU, as this is also a per-packet process. The impact is similar to a forwarding decision, so that throughput level is halved again.
I would like to question this one. I always thought that the GRE header is precalculated and kept in the CEF adjacency table, so GRE encapsulation involves no additional processing overhead compared to regular Ethernet encapsulation. The only difference from the 6500/7600 is that on the ISR the encapsulation is done by the CPU, not the PFC.
I'm in no way an expert in this, but I'd imagine the whole process looks like this:
1. a single CEF lookup/encapsulation produces a GRE packet
2. packet encryption/ESP encapsulation
3. another CEF lookup/encapsulation to get the encrypted packet out
So the forwarding rate is halved, but just once. Am I wrong?
Michael
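Michael's premise, that the GRE header is precalculated and cached with the adjacency, can be illustrated with a header template built once and prepended per packet, leaving only the outer length and checksum as per-packet work. A minimal Python sketch under that assumption; the `ADJACENCY` dictionary, the `Tunnel0` key and the documentation-range addresses are hypothetical, IOS internals are not modeled, and the ESP step is left out.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 header checksum: one's complement of the one's-complement sum."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_template(src: bytes, dst: bytes) -> bytes:
    """Outer IPv4 header (protocol 47, GRE) plus a basic 4-byte GRE header."""
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0,   # version 4 / IHL 5, TOS
        0,         # total length: patched per packet
        0, 0,      # identification, flags/fragment offset
        255, 47,   # TTL, protocol = GRE
        0,         # header checksum: patched per packet
        src, dst,
    )
    gre = struct.pack("!HH", 0, 0x0800)  # no checksum/key/sequence; payload is IPv4
    return ip + gre


# Hypothetical "adjacency": tunnel name -> prebuilt header, computed once.
ADJACENCY = {
    "Tunnel0": build_template(bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1])),
}


def encapsulate(tunnel: str, inner: bytes) -> bytes:
    """Prepend the cached header; only length and checksum are per-packet work."""
    hdr = bytearray(ADJACENCY[tunnel])  # copy; the template's checksum field is zero
    struct.pack_into("!H", hdr, 2, len(hdr) + len(inner))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr[:20])))
    return bytes(hdr) + inner
```

Even in this idealized form the per-packet copy and checksum are not free, which is why caching the header reduces, but does not eliminate, the cost of GRE on a CPU-switched platform.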
The GRE encap on a software-based router like an ISR should be resolved in CEF from the start, so it shouldn't be two CEF lookups. However, on the software-based platforms every feature you turn on takes a little more CPU, so even with a single lookup I wouldn't expect the same performance from GRE that I would from non-GRE traffic. The 6k/7600 requires recirculation in hardware, and the story is completely different, as you are basically running the packet through the hardware twice.
-Pete
On Fri, Nov 19, 2010 at 12:28 PM, Michael Ulitskiy <mulitskiy@acedsl.com> wrote:
So forwarding rate halved, but just once. Am I wrong?
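Pete's point, that even with a single lookup every enabled feature adds some CPU per packet, can be pictured as an additive cost model, as opposed to the halving rule of thumb from Sam's message. A minimal Python sketch; every cost in `FEATURE_COST_US` and every feature name is a hypothetical illustrative value, not a measurement from any Cisco platform.

```python
from typing import Sequence

# Hypothetical per-packet CPU costs in microseconds; made-up illustrative
# values, not measurements from any Cisco platform.
FEATURE_COST_US = {
    "cef_lookup": 10.0,
    "gre_encap": 4.0,
    "qos": 6.0,
    "crypto_handoff": 5.0,
}


def capacity_pps(features: Sequence[str]) -> float:
    """Packets/s a single CPU could switch if per-feature costs simply add up."""
    per_packet_us = sum(FEATURE_COST_US[name] for name in features)
    return 1_000_000 / per_packet_us


if __name__ == "__main__":
    plain = capacity_pps(["cef_lookup"])
    full = capacity_pps(["cef_lookup", "gre_encap", "qos", "crypto_handoff"])
    print(f"plain: {plain:.0f} pps, GRE + QoS + crypto: {full:.0f} pps")
```

Under an additive model the penalty for GRE depends on how expensive it is relative to the lookup, rather than being a fixed halving.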
On Thu, Nov 18, 2010 at 02:47:35PM -0800, Seth Mattinen wrote:
The ISR series do have onboard hardware crypto, but I don't know offhand if it can handle a full DS3 worth.
My first guess is fragment reassembly would probably kill it fast.
We're not seeing fragmentation. The MTU of the physical DS3 is set arbitrarily large (over 9000) specifically to avoid this.
-cjp
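Whether an encapsulated packet fragments depends on how much GRE and ESP add to it compared with the MTU. A minimal Python sketch assuming ESP transport mode protecting the GRE packet, AES-CBC (16-byte IV and block) and HMAC-SHA1-96 (12-byte ICV); the thread does not say which transform set is in use, and other transforms change the constants.

```python
# Assumed transform set: ESP transport mode over GRE, AES-CBC with
# HMAC-SHA1-96. Other transforms (3DES, SHA-256, GCM) change these numbers.
OUTER_IP = 20      # GRE delivery IPv4 header, no options
GRE_HDR = 4        # basic GRE header, no key/sequence/checksum
ESP_HDR = 8        # SPI + sequence number
ESP_IV = 16        # AES-CBC initialization vector
CIPHER_BLOCK = 16  # AES block size
ESP_TRAILER = 2    # pad-length and next-header bytes
ESP_ICV = 12       # HMAC-SHA1-96 integrity check value


def gre_ipsec_size(inner_len: int) -> int:
    """On-the-wire IP packet size for an inner IP packet of inner_len bytes."""
    protected = GRE_HDR + inner_len + ESP_TRAILER
    padded = -(-protected // CIPHER_BLOCK) * CIPHER_BLOCK  # round up to block size
    return OUTER_IP + ESP_HDR + ESP_IV + padded + ESP_ICV


def needs_fragmentation(inner_len: int, mtu: int) -> bool:
    """True if the encapsulated packet would not fit in the given MTU."""
    return gre_ipsec_size(inner_len) > mtu


if __name__ == "__main__":
    for inner in (250, 1500):
        print(inner, "->", gre_ipsec_size(inner),
              "| fragments at MTU 1500:", needs_fragmentation(inner, 1500),
              "| at MTU 9000:", needs_fragmentation(inner, 9000))
```

Under these assumptions a 250-byte voice packet grows to 312 bytes (about 25% overhead), and a full 1500-byte packet grows to 1576 bytes, which would fragment on a 1500-byte MTU but fits easily within a 9000+ byte MTU.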
participants (6)
- Christopher J. Pilkington
- Michael Ulitskiy
- Pete Lumbis
- Rettke, Brian
- Sam Chesluk
- Seth Mattinen