Encountered an issue with an MX204 using all 4x100G ports and a logical tunnel to hairpin a VRF. The tunnel started dropping packets at around 8Gbps. I bumped the tunnel-services bandwidth up from 10G to 100G, which made the problem worse; the tunnel was then limited to around 1.3Gbps. To my knowledge, with a Trio PFE you shouldn't have to disable a physical port to allocate bandwidth for tunnel-services. Any helpful info is appreciated.
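For context, a minimal sketch of the kind of lt- (logical tunnel) VRF hairpin being described; the unit numbers, VRF name, addresses, and bandwidth value here are hypothetical, not from the original post:

```
# Hypothetical reconstruction of an lt- hairpin into a VRF (not the poster's config).
set chassis fpc 0 pic 0 tunnel-services bandwidth 10g
# Two lt- units form the hairpin; each is the other's peer.
set interfaces lt-0/0/0 unit 0 encapsulation ethernet
set interfaces lt-0/0/0 unit 0 peer-unit 1
set interfaces lt-0/0/0 unit 0 family inet address 192.0.2.0/31
set interfaces lt-0/0/0 unit 1 encapsulation ethernet
set interfaces lt-0/0/0 unit 1 peer-unit 0
set interfaces lt-0/0/0 unit 1 family inet address 192.0.2.1/31
# One side lands in the VRF; the other stays in the main instance.
set routing-instances HAIRPIN-VRF instance-type vrf
set routing-instances HAIRPIN-VRF route-distinguisher 64512:1
set routing-instances HAIRPIN-VRF interface lt-0/0/0.1
```

All hairpinned traffic traverses the tunnel PIC, so the tunnel-services bandwidth bounds the hairpin throughput.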
AIUI, with Trio, you don’t have to disable a physical port, but that comes at the cost of “Tunnel gets whatever bandwidth is left after physical port packets are processed” and likely some additional overhead for managing the sharing. Could that be what’s happening to you? Owen
On Oct 2, 2023, at 09:24, Jeff Behrns via NANOG <nanog@nanog.org> wrote:
-----Original Message-----
From: Delong.com <owen@delong.com>
Sent: Monday, October 2, 2023 5:47 PM
To: behrnsjeff@yahoo.com
Cc: nanog@nanog.org
Subject: Re: MX204 tunnel services BW
Aggregate throughput for the box was less than 100Gbps while the tunnel was being starved.
On Oct 2, 2023, at 20:18, behrnsjeff@yahoo.com wrote:
Yeah, it doesn't quite work that way. The tunnel is assigned to one particular PFE. What was the aggregate throughput on that PFE? Depending on the card, a PFE may well top out at 40Gbps or even 10Gbps, though that's not likely on most Trio-based cards; that's more of a DPC-era limitation, and DPC cards did require you to sacrifice a port for tunnel bandwidth. Owen
AIUI, with Trio, you don’t have to disable a physical port, but that comes at the cost of “Tunnel gets whatever bandwidth is left after physical port packets are processed” and likely some additional overhead for managing the sharing.
This was pretty much my understanding as well, the last time I dealt with this. On MPC/Trio, you just enabled tunnel-services on a given PIC and landed your tunnel there; the tunnel capacity was simply part of the PFE capacity. It was only on pre-Trio hardware that the bandwidth keyword was required, and there it actually reserved that much capacity strictly for the tunnel. On Mon, Oct 2, 2023 at 6:48 PM Delong.com via NANOG <nanog@nanog.org> wrote:
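On classic MPC/Trio line cards, the enablement described above amounts to something like the following (a sketch; the FPC/PIC numbers and addresses are examples, not from the thread):

```
# Enable tunnel-services on a PIC with no bandwidth argument (MPC/Trio style).
set chassis fpc 0 pic 0 tunnel-services
# A GRE tunnel landed on that PIC's gr- interface.
set interfaces gr-0/0/0 unit 0 tunnel source 198.51.100.1
set interfaces gr-0/0/0 unit 0 tunnel destination 198.51.100.2
set interfaces gr-0/0/0 unit 0 family inet address 10.255.0.1/30
```

With no bandwidth argument, the tunnel simply draws from the PFE's shared capacity, which is the behavior described.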
On Mon, 2 Oct 2023 at 20:21, Jeff Behrns via NANOG <nanog@nanog.org> wrote:
You might have more luck in j-nsp. But yes, you don't need any physical interface in Trio to do tunneling. I can't explain your problem, and you probably need JTAC help. I would appreciate it if you'd circle back and tell us what the problem was.

How it works is that when the PPE decides it needs to tunnel the packet, the packet is sent back to the MQ via SERDES (which will then send it again to some PPE, not the same one). I think what that bandwidth command does is change the stream allocation; you should see it in 'show <MQ/XM...> <#> stream'.

In theory, because a PPE can process a packet forever (well, until the watchdog kills the PPE for thinking it is stuck), you could very cheaply do outer+inner at the local PPE, but I think that would mean certain features like QoS would not work on the inner interface. So I think all this expensive recirculation and SERDES consumption exists to satisfy a quite limited need, and it should be possible to implement some 'performance mode' for tunneling where these MQ/XM-provided features are not available but the performance cost in most cases is negligible.

In parallel to opening the JTAC case, you might want to experiment with which FPC/PIC you set the tunneling bandwidth on. I don't understand how the tunneling would work if the MQ/XM is remote: would you then also steal fabric capacity every time you tunnel, not just MQ>LU>MQ>LU SERDES, but MQ>LU>MQ>FAB>MQ>LU? So intuitively I would recommend ensuring you have the bandwidth configured at the local PFE; if you don't know what the local PFE is, just configure it everywhere. You could also consult several counters to see if some stream or fabric is congested and these tunneled packets are being sent over a congested fabric every time with lower fabric QoS.

I don't understand why the bandwidth command is a thing, and why you can configure where it is. To me it seems obvious they should always handle tunneling strictly locally, never over fabric, because you always end up stealing more capacity if you send it to a remote MQ. That is, implicitly it should be on for every MQ, with every PPE tunneling via its local MQ. -- ++ytti
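The "just configure it everywhere" suggestion amounts to repeating the statement on every tunnel-capable PIC, e.g. (a sketch; the bandwidth value is an example):

```
# Allocate tunnel bandwidth on both PICs so the tunnel can land on either PFE-local stream.
set chassis fpc 0 pic 0 tunnel-services bandwidth 10g
set chassis fpc 0 pic 1 tunnel-services bandwidth 10g
```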
You can configure tunnel bandwidth everywhere, but you can't configure a given tunnel everywhere; you have to assign it to a particular FPC/PIC/0. For example, with:

set chassis fpc 2 pic 3 tunnel-services bandwidth 10g

you need to create gr-2/3/0 interfaces for tunnels to use that PFE. You can create multiple tunnel-services bandwidth entries on multiple PICs, but you can only put a given tunnel on one gr-x/y/0 interface. Owen
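A tunnel anchored to that particular PFE would then be configured along these lines (a sketch; the addresses are placeholders):

```
# gr-2/3/0 corresponds to the fpc 2 / pic 3 tunnel-services allocation above.
set interfaces gr-2/3/0 unit 0 tunnel source 203.0.113.1
set interfaces gr-2/3/0 unit 0 tunnel destination 203.0.113.2
set interfaces gr-2/3/0 unit 0 family inet address 10.0.0.1/30
```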
On Oct 2, 2023, at 23:21, Saku Ytti <saku@ytti.fi> wrote:
JTAC says we must disable a physical port to allocate bandwidth for tunnel-services. Also, leaving the tunnel-services bandwidth unspecified is not possible on the 204. I haven't independently tested or validated this in the lab yet, but this is what they have told me. I advised JTAC to update the MX204 "port-checker" tool with a tunnel-services knob to make this caveat more apparent.
Looks like the MX204 is a bit of an odd duck in the MX series. It probably shares some hardware characteristics under the hood with earlier platforms (even the MX80 had MIC slots, though there was a variant with pre-installed interfaces). The MX204 appears to be an entirely fixed-configuration chassis, and from the literature it looks like it is based on pre-Trio chipset technology. Interesting that there are 100GbE interfaces implemented with this seemingly older technology, but yes, it looks like the PFE on the MX204 has all the same restrictions as a DPC-based line card in other MX-series routers. Owen
On Oct 16, 2023, at 12:49, Jeff Behrns via NANOG <nanog@nanog.org> wrote:
On Tue, 17 Oct 2023 at 00:28, Delong.com <owen@delong.com> wrote:
The MX-204 appears to be an entirely fixed configuration chassis and looks from the literature like it is based on pre-trio chipset technology. Interesting that there are 100Gbe interfaces implemented with this seemingly older technology, but yes, looks like the PFE on the MX-204 has all the same restrictions as a DPC-based line card in other MX-series routers.
It is 100% normal Trio EA. -- ++ytti
According to https://www.juniper.net/documentation/us/en/software/junos/interfaces-encryption/topics/topic-map/configuring-tunnel-interfaces.html#id-configuring-tunnel-interfaces-on-mx-204-routers:

"The MX204 router supports two inline tunnels - one per PIC. To configure the tunnel interfaces, include the tunnel-services statement and an optional bandwidth of 1 Gbps through 200 Gbps at the [edit chassis fpc fpc-slot pic number] hierarchy level. If you do not specify the tunnel bandwidth, then the tunnel interface can have a maximum bandwidth of up to 200 Gbps."

If JTAC is saying it's no longer optional, they need to update their docs.

AFAIK, tunnel services doesn't directly take bandwidth from physical ports, but it does take from the total available PFE bandwidth. Disabling a port may be required because the MX204 has a maximum PFE bandwidth of 400G and the fixed physical ports can oversubscribe that. I just checked a production config as an example; note how et-0/0/3 is not configured, so the total bandwidth adds up to 400G:

set chassis fpc 0 pic 0 tunnel-services bandwidth 20g
set chassis fpc 0 pic 0 port 0 speed 100g
set chassis fpc 0 pic 0 port 1 speed 100g
set chassis fpc 0 pic 0 port 2 speed 100g
set chassis fpc 0 pic 1 port 0 speed 10g
set chassis fpc 0 pic 1 port 1 speed 10g
set chassis fpc 0 pic 1 port 2 speed 10g
set chassis fpc 0 pic 1 port 3 speed 10g
set chassis fpc 0 pic 1 port 4 speed 10g
set chassis fpc 0 pic 1 port 5 speed 10g
set chassis fpc 0 pic 1 port 6 speed 10g
set chassis fpc 0 pic 1 port 7 speed 10g

Regards,
Ryan

-------- Original Message --------
On Oct. 16, 2023, 12:49, Jeff Behrns via NANOG <nanog@nanog.org> wrote:
On 10/17/23 03:20, Ryan Kozak wrote:
"The MX204 router supports two inline tunnels - one per PIC. To configure the tunnel interfaces, include the tunnel-services statement and an optional bandwidth of 1 Gbps through 200 Gbps at the \[edit chassis fpc fpc-slot pic number\] hierarchy level. If you do not specify the tunnel bandwidth then, the tunnel interface can have a maximum bandwidth of up to 200 Gbps."
If JTAC is saying it's no longer optional they need to update their docs.
We can commit "tunnel-services" on an MX204 without caveat. Mark.
On Mon, 16 Oct 2023 at 22:49, <behrnsjeff@yahoo.com> wrote:
Did they explain why you need to disable the physical port? I'd love to hear that explanation. The MX204 is a single Trio EA, so you can't even waste SERDES sending the packet to a remote PFE after the first lookup; it would only bounce between the local XM/MQ and LU/XL, wasting that SERDES. -- ++ytti
participants (7)
- behrnsjeff@yahoo.com
- Delong.com
- Mark Tinka
- Owen DeLong
- Ryan Kozak
- Saku Ytti
- Tom Beecher