Cisco ASR9902 SNMP polling ... is interesting

Hello,
We purchased an ASR9902, I think almost 2 years ago now, intending to replace 4 routers with them.
We had a history of, let's just say, design decision quirks with the router that prevented us from deploying it until recently.
When we were finally able to implement it, we noticed something strange about how SNMP polling works in the router.
If we poll SNMP on any interface that isn't one of the built-in management Ethernet interfaces, the response takes 8x-16x longer and exactly 62% of the polls time out.
If we poll SNMP on the built-in MGMT interfaces, the responses are still slower than on the ASR9001s that we used to use, but they don't seem to time out.
I've had a TAC case open with Cisco over this for weeks, and they are now saying that the slow responses and the 62% poll timeouts are intentional and that they don't see any problem with the design.
I understand the security implications of having control plane stuff responding on all interfaces, but the part I don't understand is why bind the SNMP daemon to the non-MGMT* interfaces at all if they are making a moral or ethical decision to not allow SNMP to work on non-MGMT interfaces. Shouldn't it just not work at all then? Who came up with a 62% timeout as the right number?
The larger implication is that I still can't find another router from another vendor that does this.
Has anyone else run into this, or did you guys all avoid the ASR 9902 like we should have?
Thanks, -Drew

How often are you polling the interfaces? SNMP was never meant for high-frequency polling (e.g., once per second), yet I often see people using SNMP as if it were a SCADA service, which is used in industrial automation for high-frequency supervisory control and data acquisition. Device designers typically anticipate SNMP probes at 30-second or 60-second intervals. -mel
_______________________________________________ NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/HUP4BJYN...

Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62% is as good a sampling rate as any other. -mel

I don't know if you're speaking specifically about the ASR 9902 or all routers, but I can tell you that after doing this for 26 years I've never seen another router handle SNMP responses differently depending on what interface the request comes in on. I can name 8 vendors, and even models from Cisco, that don't do this. So I'm not sure this is standard practice, as you seem to be implying.
Thanks, -Drew

Each device type will have different internals which might influence the results, but the bottom line is that SNMP doesn’t scale well for rich information retrieval combined with frequent polling. It’s similar to running a heavy SQL query against a database over and over, and neither the SNMP servers nor the protocol are really optimized for it. Polling doesn’t cut it. See if the devices support any form of telemetry, where the device itself takes care of collecting the data and sending it to a central server. Pedro Martins Prado pedro.prado@gmail.com / +353 83 036 1875

On Fri, 1 Aug 2025 at 16:44, Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
Absolutely not. We expect to process 100% of legitimate control-plane traffic, e.g. BGP, ISIS, LDP, ARP, SNMP etc.
62% would be devastating.
In fair weather this is easy; in bad weather you need hardware-based discrimination between expected good traffic and unexpected bad traffic.
Drew is right to expect functioning SNMP, and he is experiencing a significant regression in behaviour compared to previous devices from the same vendor.
It would take a very long time to explain how to troubleshoot this, as it is an extremely complicated topic with a lot of nuance that even the best experts at Cisco are unaware of. I've regularly had TAC handwave problems away with 'sometimes it be like that' because they didn't want to do the work. Once our NOC spent months on a case where TAC was blaming our QoS configuration for BGP flaps. By the time I got on it, I escalated it to Xander, and initially even Xander agreed with TAC that we needed to look into the QoS configuration, until I reminded him that LPTS is not subject to QoS or ACL (which is a terrible design choice, for reasons I'm happy to elaborate). That immediately reminded him how LPTS works, and the TAC case finally got some traction. This is a completely untenable situation: IOS-XR regularly has complicated problems that TAC is not equipped to solve, and the expectation is that the user has deep enough knowledge to rebuff them.
-- ++ytti

Hi,
Just to correct:
I was saying that 62% of the polls time out and that only 38% actually result in responses, and those 38% of responses take several times longer to complete if polling on an in-line interface.
This is just with a simple bash script running "time check_interfaces <args>" from the Nagios-Tools package and doing hundreds of poll runs in a row with various pauses between polls.
It would be a little less of a concern if any other product did this, but the idea that they just sort of left it 62% broken and shipped it that way is really making me wonder what else only functions at 38%.
We don't have a huge budget, and the ASR9902 costs almost twice as much as the Arista devices we would've preferred to buy (the Arista device in question has 30x100GE ports, and the ASR9902 is basically an 8x100GE router with a very poorly configured midplane/gearbox that ties into some sort of switch; nobody at Cisco seems to know how any of that works, either).
If we had an unlimited budget we'd just mulligan this thing and buy the DCS devices that we want, but we're stuck with it, and if we're stuck with it I don't think it's insane to expect it to operate at least as well as an ASR9001.
Thanks, -Drew
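A measurement like the one Drew describes (many timed runs of a check command with pauses in between) could be sketched roughly as follows. This is not the script from the thread: the function names `run_polls` and `summarize`, the default timeout, and the pause length are all assumptions made for illustration, and the command to run (for example `check_interfaces` with whatever arguments apply) has to be supplied by the caller. It counts a run as a timeout when the process does not finish within `timeout_s`; a plugin that enforces its own internal timeout and exits non-zero is counted under `errors` instead.

```python
import statistics
import subprocess
import time


def run_polls(cmd, runs=100, timeout_s=10.0, pause_s=1.0):
    """Run `cmd` repeatedly and record how each run ended.

    Returns a dict with the number of runs killed for exceeding timeout_s,
    the number that exited non-zero, and the wall-clock latency in seconds
    of every run that finished before the timeout.
    """
    result = {"timeouts": 0, "errors": 0, "completed": 0, "latencies": []}
    for i in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
                check=False,
            )
        except subprocess.TimeoutExpired:
            # The child was killed because it ran longer than timeout_s.
            result["timeouts"] += 1
        else:
            result["completed"] += 1
            result["latencies"].append(time.monotonic() - start)
            if proc.returncode != 0:
                result["errors"] += 1
        if i < runs - 1:
            time.sleep(pause_s)  # pause between polls, as in the description
    return result


def summarize(result):
    """Reduce the raw counts to a timeout percentage and latency figures."""
    total = result["timeouts"] + result["completed"]
    latencies = result["latencies"]
    return {
        "runs": total,
        "timeout_pct": 100.0 * result["timeouts"] / total if total else 0.0,
        "error_pct": 100.0 * result["errors"] / total if total else 0.0,
        "median_s": statistics.median(latencies) if latencies else None,
        "max_s": max(latencies) if latencies else None,
    }
```

Running the same command list once against a management-interface address and once against an in-line interface address, then comparing the two `summarize` results, would give the kind of side-by-side timeout and latency figures quoted in this thread.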

Drew, As I said elsewhere, the control plane was invented to separate management functions from the data forwarding process. In-band SNMP to data forwarding interfaces violates that separation. I’d say all bets are off. As they say in mathematics, this behavior is undefined. :) -mel via cell

So you're saying that for you at your shop, something that you've done for decades across multiple generations of products, from the VXR 7206, to the Cisco 7600/6500, to the GSRs, to the ASR9000, not to mention all of the non-Cisco deployments from other vendors, and suddenly the ASR99xx just "can't handle it", would be fine and expected? I'm just trying to grasp the root of what you're saying.
Thanks, -Drew

As I said elsewhere, the control plane was invented to separate management functions from the data forwarding process. In-band SNMP to data forwarding interfaces violates that separation.
Uh, no it doesn't. A control plane is just a separate compute/processing space that isn't used for traffic forwarding. In many cases, the 'management interface' is, by itself, not even part of the control plane. It's just a separate forwarding plane that isn't supposed to be able to send or receive anything from the 'main' forwarding plane. (But as many of us have seen over time, that isn't always true.)
Drew,
As I said elsewhere, the control plane was invented to separate management functions from the data forwarding process. In-band SNMP to data forwarding interfaces violates that separation. I’d say all bets are off. As they say in mathematics, this behavior is undefined. :)
-mel via cell
On Aug 1, 2025, at 11:42 AM, Drew Weaver via NANOG < nanog@lists.nanog.org> wrote:
Hi,
Just to correct:
I was saying that 62% of the polls timeout and that only 38% actually result in responses and those 38% responses take multiples of time longer to actually complete if polling on an in-line interface.
This is just with a simple bash script running "time check_interfaces <args>" from the Nagios-Tools package and doing hundreds of poll runs in a row with various pauses between pollings.
It would be a little less of a concern if any other product did this but the idea that they just sort of left it 62% broken and shipped it that way is really making me wonder what else only functions at 38%.
We don't have a huge budget and the ASR9902 costs almost twice as much as the Arista devices we would've preferred to buy [the Arista device in question has 30x100GE ports and the ASR9902 is basically an 8x100GE router with a very poorly configured midplane/gearbox that ties into some sort of switch [that nobody seems to know how any of that works at Cisco, either].
If we had an unlimited budget we'd just mulligan this thing and buy the DCS devices that we want but we're stuck with it and if we're stuck with it I don't think it's insane to expect it to operate at least as well as an ASR9001.
Thanks, -Drew
-----Original Message----- From: Saku Ytti via NANOG <nanog@lists.nanog.org> Sent: Friday, August 1, 2025 2:28 PM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: Saku Ytti <saku@ytti.fi> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
On Fri, 1 Aug 2025 at 16:44, Mel Beckman via NANOG < nanog@lists.nanog.org> wrote:
Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
Absolutely not. We expect to process 100% of legitimate control-plane traffic, e.g. BGP, ISIS, LDP, ARP, SNMP etc.
62% would be devastating.
In fair weather this is easy, in bad weather you need hardware based discrimination on what is expected good traffic and what is unexpected bad traffic.
Drew is in the right to expect functioning SNMP and is experiencing significant regression in behaviour compared to previous devices from the same vendor.
It would take a very long time to explain how to troubleshoot this, as it is an extremely complicated topic with a lot of nuance that even the best experts at Cisco are unaware of. I've regularly had TAC handwave problems away with 'sometimes it be like that' because they didn't want to do the work. Once our NOC spent months on a case where TAC was blaming our QoS configuration for BGP flaps. By the time I got on it, I escalated it to Xander, and initially even Xander agreed with TAC that we needed to look into the QoS configuration, until I reminded him that LPTS is not subject to QoS or ACL (which is a terrible design choice, for reasons I'm happy to elaborate on). That immediately reminded him how LPTS works, and the TAC case finally got some traction. This is a completely untenable situation: IOS-XR regularly has complicated problems that TAC is not equipped to solve, and the expectation is that the user has deep enough knowledge to rebuff them.
-- ++ytti _______________________________________________ NANOG mailing list

If we poll SNMP on any interface that isn't one of the built in management ethernet interfaces the response takes 8x-16x longer to respond and exactly 62% of the polls time out.
If we poll SNMP on the built-in MGMT interfaces the responses are still slower than the ASR9001s that we used to use but they don't seem to time out.
Drew-
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS, something like that.
On Fri, Aug 1, 2025 at 2:42 PM Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
Hi,
Just to correct:
I was saying that 62% of the polls timeout and that only 38% actually result in responses and those 38% responses take multiples of time longer to actually complete if polling on an in-line interface.

On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that.
There are various complicated reasons for this. An LPTS policer is an unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcomes; most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In the common case this would be the SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, and some community member piled on with what can only be described as a bizarre drivel.
-- ++ytti

Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil; people can disagree without stooping to name-calling.
-mel

Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel

I’ll just let the incivility of you both stand.
-mel

Mel,
The control plane receives 100% of the packets, providing the control plane policies allow it to. The control plane is likely connected to the ASIC via a mix of a PCI-E interface (providing the programming interface and an emulated NIC) and/or a specialized NIC port. If the CPU port is experiencing packet loss (my stance is very unlikely), that can be a separate discussion. I agree that an escalation is the appropriate response here, where TAC should try to reproduce the issue with Drew's config.
Kind regards,
Ryan Hamel

Thanks for that informative, and civil, explanation. In my experience, packets can "drop out" of ASIC processing under unexpected and unusual circumstances, resulting in high CPU loads. Hopefully escalation of the case works, and TAC discovers yet another bug that can be addressed in a firmware update.
-mel via cell

The control plane receives 100% of the packets, providing the control plane policies allow it to. The control plane is likely connected to the ASIC via a mix of a PCI-E interface (providing the programming interface and an emulated NIC) and/or a specialized NIC port.
Different vendors and platforms do this differently, but generally the forwarding complex element has two egress paths: one goes to the device fabric used for through traffic, the other goes to the control plane.
Packets that egress to the fabric are usually chopped up into cells that are reassembled at the forwarding complex connected to the outbound interface. The control plane connection is usually just a standard ethernet to an internal control plane switch. The forwarding complex wraps up the packet and transmits it that direction, and it's passed over to the RP/RE that way. There is internal policing here to prevent elements from running the CP over. Some of those are user configurable, others are not. (Vendor dependent.)
When you say 'the control plane receives 100% of the packets', it sort of depends on what you define as the 'control plane'. That's usually defined as 'did the packet get to the RE/RP to process it'.
There are many scenarios by which this can break:
- Interface buffers may be full
- Interface buffers may not be drained fast enough
- Oversubscription of forwarding complex
- Poorly designed QoS
- Incorrect config/bugs of control plane policer on internal interface to CP switch
- Central CPU (RE, RP, etc) overwhelmed
- Internal CP switch malfunctioning
On Sun, Aug 3, 2025 at 12:35 AM Ryan Hamel <ryan@rkhtech.org> wrote:
Mel,
The control plane receives 100% of the packets, providing the control plane policies allow it to. The control plane is likely connected to the ASIC via a mix of a PCI-E interface (providing the programming interface and an emulated NIC) and/or a specialized NIC port. If the CPU port is experiencing packet loss (my stance is very unlikely), that can be a separate discussion. I agree that an escalation is the appropriate response here, where TAC should try to reproduce the issue with Drew's config.
Kind regards,
Ryan Hamel
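To make the policing point concrete, here is a small Python sketch of a generic token-bucket policer. It only shows how a rate limiter in front of a CPU turns a steady packet stream into a roughly fixed loss ratio. It is not a model of LPTS or of any Cisco implementation, and it is not a diagnosis of the 62% figure (as noted in the thread, a policer is considered an unlikely culprit). The function name and the numbers (100 pps offered, 38 pps allowed, burst of 5) are invented purely for illustration.

```python
def policer_drop_fraction(offered_pps, rate_pps, burst, duration_s):
    """Simulate a token-bucket policer fed at a constant packet rate.

    Returns the fraction of packets dropped over the simulated period.
    """
    tokens = float(burst)          # bucket starts full
    last = 0.0                     # arrival time of the previous packet
    total = int(offered_pps * duration_s)
    dropped = 0
    for i in range(total):
        now = i / offered_pps
        # Refill for the elapsed time, never exceeding the burst size.
        tokens = min(float(burst), tokens + (now - last) * rate_pps)
        last = now
        if tokens >= 1.0:
            tokens -= 1.0          # packet conforms and is passed up
        else:
            dropped += 1           # packet exceeds the policed rate
    return dropped / total if total else 0.0


# Hypothetical numbers: 100 pps offered against a 38 pps policer drops
# roughly 62% of packets, while 10 pps offered is never dropped.
print(policer_drop_fraction(100, 38, 5, 10))
print(policer_drop_fraction(10, 38, 5, 10))
```

If a policer were involved, the loss ratio would be expected to change with the offered rate, which is one reason polling at different intervals is a useful data point to hand to TAC.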

Tom,

I 100% agree with this, thank you for going into greater detail on that.

Kind regards,
Ryan Hamel

________________________________
From: Tom Beecher <beecher@beecher.cc>
Sent: Sunday, August 3, 2025 6:39 AM
To: Ryan Hamel <ryan@rkhtech.org>
Cc: North American Network Operators Group <nanog@lists.nanog.org>; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

The control plane receives 100% of the packets, providing the control plane policies allow it to. The control plane is likely connected to the ASIC via a mix of a PCI-E interface (providing the programming interface and an emulated NIC) and/or a specialized NIC port.

Different vendors and platforms do this differently, but generally the forwarding complex element has two egress paths; one goes to the device fabric used for through traffic, the other goes to the control plane. Packets that egress to the fabric are usually chopped up into cells that are reassembled at the forwarding complex connected to the outbound interface.

The control plane connection is usually just a standard ethernet to an internal control plane switch. The forwarding complex wraps up the packet and transmits it that direction, and it's passed over to the RP/RE that way. There is internal policing here to prevent elements from running the CP over. Some of those are user configurable, others are not. (Vendor dependent.)

When you say 'the control plane receives 100% of the packets', it sort of depends on what you define as the 'control plane'. That's usually defined as 'did the packet get to the RE/RP to process it'. There are many scenarios by which this can break:

* Interface buffers may be full
* Interface buffers may not be drained fast enough
* Oversubscription of forwarding complex
* Poorly designed QoS
* Incorrect config/bugs of control plane policer on internal interface to CP switch
* Central CPU (RE, RP, etc) overwhelmed
* Internal CP switch malfunctioning

On Sun, Aug 3, 2025 at 12:35 AM Ryan Hamel <ryan@rkhtech.org> wrote:

Mel,

The control plane receives 100% of the packets, providing the control plane policies allow it to. The control plane is likely connected to the ASIC via a mix of a PCI-E interface (providing the programming interface and an emulated NIC) and/or a specialized NIC port. If the CPU port is experiencing packet loss (my stance is very unlikely), that can be a separate discussion. I agree that an escalation is the appropriate response here, where TAC should try to reproduce the issue with Drew's config.

Kind regards,
Ryan Hamel

________________________________
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Saturday, August 2, 2025 4:23 PM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

I’ll just let the incivility of you both stand.

-mel

On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:

Mel-

Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.

Probably just want to take the L here.

On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

Saku,

What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.

-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that.
There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
--
++ytti

Here is an illustration I use to show the internal “receive path” that the control, management, and signal planes take. Note the risk: it is very easy to hit a router/switch control/management plane. You do not need to flood the link to overload the control plane.
On Aug 4, 2025, at 06:14, Ryan Hamel via NANOG <nanog@lists.nanog.org> wrote:
Tom,
I 100% agree with this, thank you for going into greater detail on that.
Kind regards,
Ryan Hamel

On Sun, 3 Aug 2025 at 16:40, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
When you say 'the control plane receives 100% of the packets', it sort of depends on what you define as the 'control plane'. That's usually defined as 'did the packet get to the RE/RP to process it' There are many scenarios by which this can break :
- Interface buffers may be full
- Interface buffers may not be drained fast enough
- Oversubscription of forwarding complex
- Poorly designed QoS
- Incorrect config/bugs of control plane policer on internal interface to CP switch
- Central CPU (RE, RP, etc) overwhelmed
- Internal CP switch malfunctioning
A good starting point is:

RP/0/RSP0/CPU0:leruuter#show lpts pifib hardware police location 0/3/CPU0 | i SNMP
SNMP  25  Static  300  300  0  0  01234567

RP/0/RSP0/CPU0:leruuter#show snmp request drop summary
NMS Address  INQ  Encode  Duplicate  Stack  AIPC  Overload  Timeout  Internal  Threshold
192.0.2.1    0    0       0          0      0     0         4        0         0
192.0.2.2    0    0       0          0      0     0         2        0         0

Further 'show snmp trace ...' will help. But this gets really tricky, really fast, you really need your account team on your side.
--
++ytti
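For anyone collecting that second command from several routers, here is a minimal Python sketch that picks out the non-zero drop counters per NMS address. It assumes the column order matches the header shown in the message above (nine counters after the address); the real layout may differ between IOS XR releases, and the names `parse_drop_summary` and `nonzero_drops` are made up for this illustration.

```python
# Assumed column order, copied from the header in the message above.
COLUMNS = ["INQ", "Encode", "Duplicate", "Stack", "AIPC",
           "Overload", "Timeout", "Internal", "Threshold"]


def parse_drop_summary(text):
    """Return {nms_address: {column: count}} for every data row.

    A data row is any line with one address field followed by exactly
    len(COLUMNS) integer fields; prompts and the header are skipped.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue
        address, *counts = fields
        if not all(c.isdigit() for c in counts):
            continue
        result[address] = dict(zip(COLUMNS, map(int, counts)))
    return result


def nonzero_drops(summary):
    """Keep only the counters that are greater than zero."""
    return {addr: {k: v for k, v in cols.items() if v}
            for addr, cols in summary.items()
            if any(cols.values())}


SAMPLE = """\
NMS Address  INQ  Encode  Duplicate  Stack  AIPC  Overload  Timeout  Internal  Threshold
192.0.2.1    0    0       0          0      0     0         4        0         0
192.0.2.2    0    0       0          0      0     0         2        0         0
"""

if __name__ == "__main__":
    for addr, drops in nonzero_drops(parse_drop_summary(SAMPLE)).items():
        print(addr, drops)
```

Run against the sample above, only the Timeout column is non-zero for both pollers, which is the kind of hint to take to TAC or the account team.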

Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,

Joe

On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote:
I’ll just let the incivility of you both stand.
-mel

Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe

Mel-

You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:

1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62% is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.

You have attempted to frame these comments as:

honest and sincere attempts by other members to help identify the possible problem.

While your attempts to help the OP may have been honest and sincere, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.

Saku made 2 comments that addressed these falsehoods:

It might be easier to contribute, if there is familiarity to the subject matter.

some community member piled on with what can only be described as a bizarre drivel.

The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong".

Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.

There is a massive difference between the following statements:

1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]

It seems to me that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.

On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell

Sorry, Tom. I’m not taking the bait.
-mel via cell

Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion.

For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)

In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.

No one uses the same terms for anything, so some terminology... We (Cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.

Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why.

In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.

I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations.

I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.

As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet".
I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't this from TAC, let me know and I will attempt to shake that tree. At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side... * which interface(s) are being polled? MgmtEth, loopback, physical? * at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that) * can this rate be changed? * how much stuff (i.e. MIBs) are you polling? Anyway... hopefully that points you at least somewhat in the right direction. --lj -----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting Sorry, Tom. I’m not taking the bait. -mel via cell On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote: Mel- You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages : 1. 
Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other. 2. Cisco is likely to say that the control plane is only fully supported on the management port. 3. In-band SNMP to data forwarding interfaces violates that separation. You have attempted to frame these comments as : honest and sincere attempts by other members to help identify the possible problem. While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*. Saku made 2 comments that addressed these falsehoods : It might be easier to contribute, if there is familiarity to the subject matter. some community member piled on with what can only be described as a bizarre drivel. The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it. There is a massive difference between the following statements : 1. You are an idiot. [ Attacking the person ] 2. What you said was idiotic. [ Attacking the statements ] It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation. On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote: Thanks. 
I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :( -mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
--
++ytti
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/
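For readers following along: the behaviour LJ describes above (a policer or meter in the punt path dropping some inbound requests, and a short but very fast burst being enough to trip it) can be sketched as a token bucket. This is only an illustrative model in Python, not Cisco's LPTS implementation. The TokenBucket class, the unanswered_fraction helper, and the 100 pps / 20-packet figures are all assumptions invented for the example, not ASR9902 defaults. It shows why the same poller can see no loss at one request rate and a large, steady share of unanswered requests at another; each dropped request would surface on the client side as a timeout.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Toy policer: `rate` packets/s sustained, up to `burst` packets at once."""
    rate: float
    burst: float
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous packet, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # packet is punted to the (hypothetical) SNMP process
        return False      # packet is dropped; the client only ever sees a timeout


def unanswered_fraction(policer: TokenBucket, requests: int, interval: float) -> float:
    """Send `requests` packets spaced `interval` seconds apart; return the share dropped."""
    dropped = sum(not policer.allow(i * interval) for i in range(requests))
    return dropped / requests


if __name__ == "__main__":
    # Hypothetical numbers: a 100 pps policer with a 20-packet burst allowance.
    fast = unanswered_fraction(TokenBucket(rate=100, burst=20), requests=1000, interval=0.002)
    slow = unanswered_fraction(TokenBucket(rate=100, burst=20), requests=1000, interval=0.02)
    print(f"500 req/s offered: {fast:.0%} unanswered")
    print(f" 50 req/s offered: {slow:.0%} unanswered")
```

In this model the unanswered share depends on the offered rate relative to the policer rate, which is why the request rate and burstiness of the polling station are among the first things worth measuring.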

More like this, please. This was fun :-). Thank you.
Mark.
On 8/5/25 17:45, LJ Wobker (lwobker) via NANOG wrote:
Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)

On Tue, 5 Aug 2025 15:45:57 +0000 "LJ Wobker (lwobker) via NANOG" <nanog@lists.nanog.org> wrote:
Ah, a breath of fresh air. Thank you for your response. I definitely agree with "No one uses the same terms for anything" having worked at F5 for a while. Trunks... what are trunks? Depends on who you ask and where they work.
Wow, what a food fight this became. No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.

On 05/08/2025 21:37, ab.nanog--- via NANOG wrote:
Reminded me of the days of Tony Li.
-Hank
On Tue, 5 Aug 2025 15:45:57 +0000 "LJ Wobker (lwobker) via NANOG" <nanog@lists.nanog.org> wrote:
Ah, a breath of fresh air. Thank you for your response. I definitely agree with "No one uses the same terms for anything" having worked at F5 for a while. Trunks... what are trunks? Depends on who you ask and where they work.
Wow, what a food fight this became.

Hi there,
It has since been identified that the traffic is being dropped because the SNMP policer in LPTS seems to just be discarding it.
I didn't configure it to do this, and it doesn't show up in the running configuration. TAC still hasn't figured that out, and 4 weeks later they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS.
So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform, but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.
Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being.
Thanks though for stopping by.
-Drew
-----Original Message-----
From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org>
Sent: Tuesday, August 5, 2025 11:46 AM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)
In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system.
The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.

No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.

Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like.

First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.

I'm not remotely surprised that the behavior is different from the 9901 to the 9902.
At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that.

But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.

As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on.

That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct.
We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.

At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...

* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?

Anyway... hopefully that points you at least somewhat in the right direction.

--lj

-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Sorry, Tom. I’m not taking the bait.

-mel via cell

On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:

Mel-

You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:

1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.

You have attempted to frame these comments as:

    honest and sincere attempts by other members to help identify the possible problem.
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.

Saku made 2 comments that addressed these falsehoods:

    It might be easier to contribute, if there is familiarity to the subject matter.

    some community member piled on with what can only be described as a bizarre drivel.

The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong".

Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.

There is a massive difference between the following statements:

1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]

It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.

On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(

-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
--
++ytti

One other quick note for the zero people that are interested.

It seems like LPTS is at best half implemented in XR 24, as it seems to have no problem dynamically generating its own <<totally hidden>> configuration for configured BGP sessions, etc., but it can't look at the fact that there is an ACL on the snmp-community and simply whitelist the IPs attached to that ACL in the LPTS/CoPP policer?

It can't throw a log entry on boot that says "Hi most likely we're just silently discarding traffic due to the default LPTS policer". It can't further go on to specifically say "Hey we know you didn't configure this but we noticed that the SNMP policer is dropping like 60% of the requests...."

By default there isn't even a control-plane configuration in IOS XR 24:

    show run control-plane
    Wed Aug 6 13:44:59.244 EDT
    % No such configuration item(s)

I bet the smart licensing daemon gets a real nice carveout in the default LPTS policer configuration. It's very important that smart licensing never fails to function but basic network telemetry isn’t that important.

Thanks,
-Drew
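For readers trying to relate a policer drop rate to the poll timeouts seen on the monitoring side: as noted earlier in the thread, the router has no notion of a poll timeout, so a timeout is just what the client reports when every attempt for a poll goes unanswered. The Python sketch below is only back-of-the-envelope arithmetic. It assumes each attempt is dropped independently with the same probability, which a real policer will not strictly do (drops tend to come in bursts), and the function names and the retry counts are made up for illustration.

```python
def poll_timeout_probability(drop_prob: float, retries: int) -> float:
    """Chance that a poll gets no answer after 1 + retries attempts.

    Assumption: every attempt is dropped independently with probability
    drop_prob. Only drops of the inbound request are modelled.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be between 0 and 1")
    if retries < 0:
        raise ValueError("retries must be >= 0")
    return drop_prob ** (retries + 1)


def per_attempt_drop_from_timeouts(timeout_rate: float, retries: int) -> float:
    """Invert the formula above: per-attempt drop rate implied by a timeout rate."""
    if not 0.0 <= timeout_rate <= 1.0:
        raise ValueError("timeout_rate must be between 0 and 1")
    if retries < 0:
        raise ValueError("retries must be >= 0")
    return timeout_rate ** (1.0 / (retries + 1))


if __name__ == "__main__":
    # Hypothetical client settings: what per-attempt drop rate would a
    # 62% poll timeout rate imply for 0 to 3 retries?
    for retries in range(4):
        implied = per_attempt_drop_from_timeouts(0.62, retries)
        print(f"retries={retries}: implied per-attempt drop rate {implied:.2f}")
```

The practical point is that the percentage the poller reports depends on its own timeout and retry settings, so it is worth comparing it against the drop counters on the router rather than reading it as the policer's drop rate.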

Some more background might be useful here...

"Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is:

- identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things)
- check those incoming packets against a policer
- drop packets that exceed that policer

The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumptions made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config.

I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always.

This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS policers (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms.
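As a rough illustration of the identify / police / drop flow described above, here is a minimal Python sketch of per-flow policing using a token bucket. It is not Cisco's implementation, and real LPTS policers may meter traffic differently. The flow names, rates, and burst sizes are hypothetical values picked only to show how a short, fast burst gets clipped even when the average request rate is low.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucketPolicer:
    rate_pps: float            # sustained packets per second (hypothetical value)
    burst: float               # bucket depth in packets (hypothetical value)
    tokens: float = field(init=False)
    last_time: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.burst  # the bucket starts full

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def police(arrivals, policers):
    """arrivals: iterable of (time_seconds, flow_type), sorted by time.

    Returns {flow_type: [accepted, dropped]}. Flow types with no policer
    configured are dropped in this sketch.
    """
    counts = {}
    for now, flow in arrivals:
        accepted_dropped = counts.setdefault(flow, [0, 0])
        policer = policers.get(flow)
        if policer is not None and policer.allow(now):
            accepted_dropped[0] += 1
        else:
            accepted_dropped[1] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical numbers: 100 SNMP requests arriving in the same instant
    # against a policer that allows 10 pps with a burst of 5.
    policers = {"snmp": TokenBucketPolicer(rate_pps=10, burst=5)}
    burst_of_requests = [(0.0, "snmp")] * 100
    print(police(burst_of_requests, policers))
```

The same 100 requests spread over ten seconds would all pass this policer, which is why the time window of the polling burst matters as much as the average rate.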
Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer can optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.

If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists.

I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.

For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box.
It's MUCH safer to figure out a combination of:
- slowing down the requests from the external thing
- knowingly opening up the policers to a different / faster rate
than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes"
I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup".
On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works.
--lj
-----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hi there,
It has since been identified that the reason that the traffic is being dropped is that the SNMP policer in LPTS seems to just be discarding the traffic.
I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet, and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later.
So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.
Still waiting for TAC to tell me how to whitelist a single /32 in the policer.
In 9 more weeks I'll let you know what the result ends up being.
Thanks though for stopping by. -Drew
-----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)
In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.
No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting".
Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.
Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like.
First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.
I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations.
I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing.
The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.
As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.
At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?
Anyway...
hopefully that points you at least somewhat in the right direction.
--lj
-----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Sorry, Tom. I’m not taking the bait.
-mel via cell
On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages :
1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62% is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.
You have attempted to frame these comments as :
honest and sincere attempts by other members to help identify the possible problem.
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*.
Saku made 2 comments that addressed these falsehoods :
It might be easier to contribute, if there is familiarity to the subject matter.
some community member piled on with what can only be described as a bizarre drivel.
The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong".
Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.
There is a massive difference between the following statements :
1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]
It seems to me that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack.
Nobody is bullying you, or anybody else, in this conversation.
On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote:
I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS, something like that.
There are various complicated reasons for this; an LPTS policer is an unlikely culprit, but possible. A bug search will show various DDTS with poor SNMP performance as the outcome, and most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In the common case this would be the SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel. -- ++ytti _______________________________________________ NANOG mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org _archives_list_nanog-40lists.nanog.org_message_&d=DwIGaQ&c=euGZstcaT DllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1 o4LtA3tMGmuw&m=ZRQKyw0amYQuJDrOUoUtJCSVZKWvb764kPF4UjLJKuQ_I4NVhCMVb VNzC8h9aWfc&s=HVfyN6javj5uX9ryxhOPxQSiMh2CkQJi_x885vQNB0M&e= 7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/

If only the customers would recognize how awesome this is!
No, I'm not bashing Cisco (worked for them, including XR & NXOS development, good people, had a good time), but there is an attitude that Cisco helped develop, and meanwhile it is everywhere. (On a lighter note, it reminds me of this: https://e-fun.blogspot.com/2006/06/dilbert-at-cisco-ii.html)
Back to LJ's email: Just a few quotes.
But there are A LOT of assumptions that have to be made here [...]
[...] very likely not great assumptions for a system with a MUCH smaller/simpler config.
Then why make these assumptions? Especially with XR - not your mom & dad IT box but one for ISPs or IT departments - you could provide the mechanism and either "do nothing as default" or "block everything as default". And then provide documentation and service$$$ to the customers.
Sure, this was discussed back then. And your reply is showing a big problem (IMHO):
* "we have a good idea for default rate limitations" You did not say this but I assume this sums it up, as someone decided on default values. Problem: no, you don't have a good idea. Often enough not even the customer has a good understanding. What you probably know from customer interactions is how to measure-and-adjust and _reach_ a good configuration. With the customer.
* the elephant in the room is that if you are a small(er) ISP ... let's be honest, if you are not one of the big guys, then the focus of vendors is small(er) too. I get it to some extent, but I have seen this too often from engineers too.
Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes"
Ah, XR. A "real" OS (QNX back then, Linux now?). And "real" engineers do multi-threading. If they will ever learn how to do it right ... . Frankly, there is a much bigger problem when you need rate limiting to keep your RP CPU from melting, i.e. from becoming unable to keep the box & network stable. SNMP may be toast if you flood it, the CPU may run at very high utilization - the rest should keep going. The whole complexity of XR and the hardware can have only this justification (in reality: engineers get carried away with the new toy).
There is another twist to it: if your router behaves "badly", then it was the customer's fault for not limiting aggressively enough? And if you do rate-limit "tight", then establishing the baseline is a full project - is this realistic? And the customer is left alone. (remember, "taking the blame" is part of the vendor's job ;-)
Don't get me wrong, I am sure you worked hard for the best outcome. I like that you protect "your little router" :-) We should simply stop sugar-coating the situation. Not sure any "pressure" of brutally-honest talk has an impact on Cisco's router business - it's a commodity now, except the 400G interfaces and such - but at least it acknowledges the real problems colleagues like Drew have.
Taking a deep breath. My god, what have we done. With technology, with the Internet. The simple (and often older) stuff works, everything on top is too often complex and fragile, and it is not always obvious what it achieves or if the effort is worth the outcome. Just look at the SPF/DKIM discussion we recently had on NANOG. Just look at XR and this LPTS discussion. And if you ever work(ed) on the code, it is quite shocking too. We may have more/better tools, routers and software have improved, but getting a basic engineering job done seems as hard, if not harder, compared to the old days. Back then we did not know (and had to learn), now we (well, the organizations, vendors) do not care anymore.
I better distract myself with a cat video on YT :-)
Marc
On Wed, 6 Aug 2025 19:51:21 +0000, LJ Wobker (lwobker) via NANOG wrote:
Some more background might be useful here...
"Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is:
- identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things)
- check those incoming packets against a policer
- drop packets that exceed that policer
The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumptions made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config.
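For readers who want to see the identify / check / drop flow in concrete terms, here is a minimal sketch in Python. It is a toy token-bucket model, not Cisco's implementation: the flow names, the classification rules, and every rate in `make_policers` are made-up assumptions for illustration and do not reflect any real LPTS default.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Toy policer: admits up to `burst` packets at once, refilled at `rate_pps`."""
    rate_pps: float
    burst: float
    tokens: float = field(init=False)
    last: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous packet, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the rate: the packet is dropped before it reaches the CPU


def make_policers() -> dict:
    # Hypothetical flow types and rates, chosen only to make the example readable.
    return {
        "bgp-known": TokenBucket(rate_pps=2000, burst=2000),
        "snmp": TokenBucket(rate_pps=100, burst=100),
        "default": TokenBucket(rate_pps=50, burst=50),
    }


def classify(pkt: dict) -> str:
    """Step 1: identify the traffic type (hypothetical, heavily simplified rules)."""
    if pkt.get("proto") == "udp" and pkt.get("dport") == 161:
        return "snmp"
    if pkt.get("proto") == "tcp" and pkt.get("dport") == 179 and pkt.get("known_peer"):
        return "bgp-known"
    return "default"


def admit(pkt: dict, now: float, policers: dict) -> bool:
    """Steps 2 and 3: check the packet against its policer; False means dropped."""
    return policers[classify(pkt)].allow(now)


if __name__ == "__main__":
    snmp_get = {"proto": "udp", "dport": 161}

    # 300 requests arriving in the same instant: only the bucket depth gets through.
    burst = sum(admit(snmp_get, 0.0, make_policers_once) for make_policers_once in [make_policers()] for _ in range(300))
    print("burst admitted:", burst)

    # The same 300 requests spaced 10 ms apart stay within the configured rate.
    policers = make_policers()
    paced = sum(admit(snmp_get, i * 0.01, policers) for i in range(300))
    print("paced admitted:", paced)
```

The point of the sketch is the asymmetry between the two runs: the same number of requests is either mostly dropped or fully admitted depending only on how tightly the poller bunches them, which is why a default tuned for one deployment can look arbitrary in another.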
I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always.
This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms.
Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.
If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists.
I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.
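To make the "how fast does the polling station want to ask" question concrete, here is a back-of-the-envelope sketch in Python. All of the numbers (object count, varbinds per request, send window) are hypothetical, and the helper names are invented for this example; the only thing it shows is the arithmetic for turning a polling job into a packets-per-second figure that could be compared against a policer rate.

```python
import math


def requests_per_poll(num_objects: int, varbinds_per_request: int) -> int:
    """Number of SNMP request packets needed to fetch `num_objects` values."""
    return math.ceil(num_objects / varbinds_per_request)


def rate_pps(requests: int, window_s: float) -> float:
    """Request rate if `requests` packets are sent within `window_s` seconds."""
    return requests / window_s


if __name__ == "__main__":
    # Hypothetical poller: 500 interfaces x 10 counters, 20 varbinds per request.
    reqs = requests_per_poll(500 * 10, 20)

    # Averaged over a 60 second polling interval the load looks tiny...
    print("average pps:", round(rate_pps(reqs, 60.0), 2))

    # ...but if the poller fires everything within 2 seconds, the policer
    # sees a much higher instantaneous rate.
    print("burst pps:", rate_pps(reqs, 2.0))
```

The average and burst figures differ by more than an order of magnitude for the same job, so a policer sized from the polling interval alone may still drop requests if the poller sends them back to back.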
For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of:
- slowing down the requests from the external thing
- knowingly opening up the policers to a different / faster rate
than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes"
I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup".
On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works.
--lj
-----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hi there,
It has since been identified that the reason that the traffic is being dropped is that the SNMP policer in LPTS seems to just be discarding the traffic.
I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet, and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later.
So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.
Still waiting for TAC to tell me how to whitelist a single /32 in the policer.
In 9 more weeks I'll let you know what the result ends up being.
Thanks though for stopping by. -Drew
-----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)
In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.
No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.
Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like.
First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.
I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations.
I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.
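The client-side nature of a "timeout" can be sketched in a few lines of Python. This is a generic model, not any particular SNMP library: `send` is a hypothetical callable that returns a response or `None` when nothing came back in time. The probability helper assumes each attempt is dropped independently, which a real policer does not guarantee (drops tend to cluster in bursts), and the thread does not say whether the 62% figure is per attempt or per poll, so the number is used purely as an illustration.

```python
from typing import Callable, Optional, Tuple


def poll_with_retries(send: Callable[[], Optional[bytes]], retries: int) -> Tuple[Optional[bytes], int]:
    """Send a request, retrying after each timeout; return (response, attempts used)."""
    attempts = 0
    for _ in range(retries + 1):
        attempts += 1
        response = send()  # None models "no reply before the client's timer expired"
        if response is not None:
            return response, attempts
    return None, attempts  # the client gives up and reports a timeout


def poll_failure_probability(drop_probability: float, retries: int) -> float:
    """Chance that every attempt is lost, ASSUMING independent drops per attempt."""
    return drop_probability ** (retries + 1)


if __name__ == "__main__":
    # A path that silently drops the first two requests and answers the third.
    replies = iter([None, None, b"response"])
    print(poll_with_retries(lambda: next(replies), retries=2))

    # Illustrative only: how retries change the poll-level failure rate.
    for r in range(4):
        print(r, "retries ->", round(poll_failure_probability(0.62, r), 3))
```

In this model the router never "times out" anything; it simply never sees the dropped requests, and the failure rate the operator observes depends on the client's retry count as much as on the drop rate in the path.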
As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.
At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?
Anyway... hopefully that points you at least somewhat in the right direction.
--lj
-----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Sorry, Tom. I’m not taking the bait.
-mel via cell
On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages :
1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62% is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.
You have attempted to frame these comments as :
honest and sincere attempts by other members to help identify the possible problem.
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*.
Saku made 2 comments that addressed these falsehoods :
It might be easier to contribute, if there is familiarity to the subject matter.
some community member piled on with what can only be described as a bizarre drivel.
The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.
There is a massive difference between the following statements :
1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]
It seems to me that you may be struggling to identify that difference, and taking *any* criticism as a personal attack.
Nobody is bullying you, or anybody else, in this conversation.
On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also.
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote:
I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel', which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
-- ++ytti
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/KLOXVK36...

On Thu, 7 Aug 2025 at 15:08, Marc Binderberger via NANOG <nanog@lists.nanog.org> wrote:
Then why making these assumptions? Especially with XR - not your mom & dad IT box but for ISPs or IT departments - you could provide the mechanism and either "do nothing as default" or "block everything as default". And then provide documentation and service$$$ to the customers
Because while Cisco can't dimension the box well, operators do an even worse job at it.

On cXR we had issues where occasionally LPTS would admit too much BGP. After LPTS admits BGP traffic it is hashed to one of 8 XIPC worker processes, before it is handed over to BGP. Because we had a busy device, XIPC didn't get the CPU cycles it needed to service the LPTS admitted packets, causing XIPC to drop packets. This meant a couple times a month we lost on some router 1/8th of BGP speakers, and Cisco explicitly refused to fix it. They literally said maybe it works better in eXR (it does). The funny thing is, this CPU demand was created by BGP, so because XIPC didn't have priority for CPU over BGP, it caused BGP to demand more CPU, due to flaps. If XIPC had had priority over BGP, the symptoms would have been lessened. I pointed this out to Cisco, they agreed, but said they've previously explored process priorities in cXR, but ended up having just more unstable devices (unmanageable complexity for people to understand what the priorities should be). All this while pitching that RTOS is mandatory for carrier grade NOS, while behind the scenes nothing for said RTOS was used, it's just flat priority all around.

Additionally LPTS is exclusively an NPU level policer: if port1 congests some policer, port2 also suffers, as there isn't a more-specific fall-back policer at IFD or IFL level. So what can you do, if port1 has an L2 loop and is spewing ARP to you, killing port2? You can't MQC to 10pps, you can't ACL it, as LPTS bypasses MQC and ACL, so your only option is to shut down port1; you cannot a priori ensure one port won't take out other ports. There was an excessive flow trap, which could be used with success in this scenario, but that feature was retired, because I guess someone in Cisco who knew why it was needed had left, and remaining people didn't understand its use case and didn't want to carry the complexity.

All of these are actually solvable, you can deliver NOS where port1 in the same NPU won't take down port2, out-of-the-box, without configuration. But it requires deep understanding on what the platform can do, how it can do it, and how the actual customer network works. This person doesn't exist. Cisco or Nokia cannot be even configured like this by an operator, Juniper can be, but it's way too complicated for operators to do.

So if you have a casual understanding how these devices work, you can bring down any core devices no matter how it's protected from trivial size single VPC DoS. Only reason the Internet works is because there isn't motivation to break it, not because it is well protected. Which is fine, because the same is true for personal safety, and focus should be on the motivation mitigation, rather than absolute safety.

Of course this thread isn't about protecting devices in bad weather, it is about trying to make devices work in fair weather, which is a much more reasonable ask.

-- ++ytti
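The port1/port2 point can be put into numbers with a toy fluid model in Python. This is a sketch under a loud assumption: the policer is blind to the ingress port, so when the aggregate exceeds the policer rate, drops fall on each port in proportion to what it offers. The function name and all traffic figures are invented for illustration and are not taken from any Cisco documentation.

```python
def admitted_per_port(offered_pps, policer_pps):
    """Split a shared policer's budget across ports in proportion to load.

    offered_pps: dict mapping port name to offered packets per second.
    Returns a dict mapping port name to admitted packets per second.
    """
    total = sum(offered_pps.values())
    # Below the policer rate nothing is dropped; above it every port keeps
    # the same fraction of its traffic, because the policer cannot tell
    # the ports apart (assumption of this model).
    share = 1.0 if total <= policer_pps else policer_pps / total
    return {port: pps * share for port, pps in offered_pps.items()}


# Hypothetical numbers: port2 carries 50 pps of legitimate ARP, port1 has
# an L2 loop and offers 10,000 pps into the same 1,000 pps policer.
quiet = admitted_per_port({"port1": 0, "port2": 50}, policer_pps=1000)
looped = admitted_per_port({"port1": 10_000, "port2": 50}, policer_pps=1000)

print(f"port2 without the loop: {quiet['port2']:.1f} pps admitted")
print(f"port2 with the loop:    {looped['port2']:.1f} pps admitted")
```

Under these assumptions port2 keeps only about a tenth of its traffic once port1 starts looping, even though port2 itself did nothing wrong. A per-port fallback policer would cap port1 before it reaches the shared budget.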

I'm just replying here to let you know that this was "solved".

lpts pifib hardware police flow snmp rate 2000
!

I want to point out that if you set it to its max configuration value (4294967295) it ignores it entirely even though IOS XR seems to know that its maximum for this hardware is 50000.

It couldn't be bothered to simply set it to 50000 if you set it to the configured maximum of 4294967295.
It couldn't be bothered to simply say: "Hey we know the max for this platform is 50000 so we set it to 50000 but you probably shouldn't be using 50000 for this value anyway".
It could be bothered to do absolutely nothing and silently reject the command, which made me laugh for about 5 minutes this morning.

So thanks for that Cisco and more sincerely thank you to everyone that took any time to try and assist me with this.

I still would have preferred to just tell it what IP addresses to expect SNMP traffic to come from and use that instead of a PPS policer but hey it's 2025 and preferences are luxuries.

-Drew
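For illustration only, the behaviour Drew says he would have preferred (clamp to the platform maximum and say so) looks roughly like this in Python. It is not how IOS XR validates the command. The two limits are the values quoted in the message above, and `effective_rate` is a hypothetical helper.

```python
import warnings

CLI_MAX_PPS = 4_294_967_295   # largest value the CLI accepts, per the message above
PLATFORM_MAX_PPS = 50_000     # hardware maximum on this platform, per the message above


def effective_rate(requested_pps, platform_max=PLATFORM_MAX_PPS):
    """Return the policer rate to program, clamping instead of ignoring."""
    if not 0 <= requested_pps <= CLI_MAX_PPS:
        raise ValueError(f"rate must be between 0 and {CLI_MAX_PPS}")
    if requested_pps > platform_max:
        # Tell the operator what happened rather than silently doing nothing.
        warnings.warn(
            f"requested {requested_pps} pps exceeds the platform maximum; "
            f"using {platform_max} pps"
        )
        return platform_max
    return requested_pps


print(effective_rate(2000))         # within range, used as given
print(effective_rate(CLI_MAX_PPS))  # clamped to the platform maximum, with a warning
```

The design choice is simply that an out-of-range value produces a visible result (a clamped rate plus a message) instead of a command that is accepted and has no effect.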

One other note I'd like to make on this just for future reference:

The default for SNMP in LPTS on this platform is 300 (I'm assuming that is 300pps)

We aren't sending 300pps of SNMP traffic at this device so nothing should have been policed by it.

There might be an issue with how it's counting or it's duplicating packets.

Anyway setting it to 500 made everything work properly.

(We aren't sending 500pps of SNMP at the machine either).

Thanks,
-Drew
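One way an average well below 300 pps can still lose packets is burstiness. The Python sketch below runs a made-up poller through a plain token bucket. Everything in it is an assumption for illustration: the thread does not describe how the LPTS policer is implemented, the bucket depth of 10 packets is invented, and so is the poller timing. It is one hypothesis, not a diagnosis; a counting bug or duplicated packets, as suggested above, would look the same from the poller's side.

```python
ONE_PACKET = 1_000_000  # micro-tokens per packet, keeps the arithmetic in integers


def police(arrivals_us, rate_pps, bucket_pkts):
    """Run packet arrival times (in microseconds) through a token bucket.

    Returns (admitted, dropped). The bucket starts full.
    """
    cap = bucket_pkts * ONE_PACKET
    tokens = cap
    last = 0
    admitted = dropped = 0
    for t in sorted(arrivals_us):
        # rate_pps tokens per second equals rate_pps micro-tokens per microsecond.
        tokens = min(cap, tokens + (t - last) * rate_pps)
        last = t
        if tokens >= ONE_PACKET:
            tokens -= ONE_PACKET
            admitted += 1
        else:
            dropped += 1
    return admitted, dropped


# Hypothetical poller: 100 requests spaced 2.5 ms apart (400 pps while the
# burst lasts), repeated every 10 s, so the long-run average is only 10 pps.
burst = [k * 2_500 for k in range(100)]
arrivals = [cycle * 10_000_000 + t for cycle in range(3) for t in burst]

for rate in (300, 500):
    ok, lost = police(arrivals, rate, bucket_pkts=10)
    print(f"policer {rate} pps: admitted {ok}, dropped {lost}")
```

In this toy scenario the 300 pps policer drops 16 of every 100 requests while the 500 pps policer drops none, even though the poller averages 10 pps in both cases.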

I would chase this further with Cisco, if you have the cycles. Often it pays dividends in the future to have a proper understanding of anatomy of the issue. So it's not purely for curiosity's sake.

-- ++ytti

I'm not sure I have the minerals tbh. -Drew -----Original Message----- From: Saku Ytti <saku@ytti.fi> Sent: Friday, August 8, 2025 9:55 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com>; Marc Binderberger <marc+lists@sniff.es>; Drew Weaver <drew.weaver@thenap.com> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting I would chase this further with Cisco, if you have the cycles. Often it pays dividends in the future to have a proper understanding of anatomy of the issue. So it's not purely for curiosity's sake. On Fri, 8 Aug 2025 at 16:51, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
One other note I'd like to make on this just for future reference:
The default for SNMP in LPTS on this platform is 300 (I'm assuming that is 300pps)
We aren't sending 300pps of SNMP traffic at this device so nothing should have been policed by it.
There might be an issue with how it's counting or it's duplicating packets.
Anyway setting it to 500 made everything work properly.
(We aren't sending 500pps of SNMP at the machine either).
Thanks, -Drew
-----Original Message----- From: Drew Weaver via NANOG <nanog@lists.nanog.org> Sent: Friday, August 8, 2025 9:32 AM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: 'LJ Wobker (lwobker)' <lwobker@cisco.com>; 'Marc Binderberger' <marc+lists@sniff.es>; Drew Weaver <drew.weaver@thenap.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
I'm just replying here to let you know that this was "solved".
lpts pifib hardware police flow snmp rate 2000 !
I want to point out that if you set it to it's max configuration value (4294967295) it ignores it entirely even though IOS XR seems to know that it's maximum for this hardware is 50000.
It couldn't be bothered to simply set it to 50000 if you set it to the configured maximum of 4294967295 It couldn't be bothered to simply say: "Hey we know the max for this platform is 50000 so we set it to 50000 but you probably shouldn't be using 50000 for this value anyway" It could be bothered to do absolutely nothing and silently reject the command which made me laugh for about 5 minutes this morning.
So thanks for that Cisco and more sincerely thank you to everyone that took any time to try and assist me with this.
I still would have preferred to just tell it what IP addresses to expect SNMP traffic to come from and use that instead of a PPS policer but hey it's 2025 and preferences are luxuries.
-Drew
-----Original Message----- From: Saku Ytti via NANOG <nanog@lists.nanog.org> Sent: Friday, August 8, 2025 3:34 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com>; Marc Binderberger <marc+lists@sniff.es>; Saku Ytti <saku@ytti.fi> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
On Thu, 7 Aug 2025 at 15:08, Marc Binderberger via NANOG <nanog@lists.nanog.org> wrote:
Then why making these assumptions? Especially with XR - not your mom & dad IT box but for ISPs or IT departments - you could provide the mechanism and either "do nothing as default" or "block everything as default". And then provide documentation and service$$$ to the customers
Because while Cisco can't dimension the box well, operators do an even worse job at it.
On cXR we had issues where occasionally LPTS would admit too much BGP, after LPTS admits BGP traffic it is hashed to 1/8 XIPC worker processes, before it is handed over to BGP. Because we had a busy device, XIPC didn't get the CPU cycles it needed to service the LPTS admitted packets, causing XIPC to drop packets. This meant a couple times a month we lost on some router 1/8th of BGP speakers, and Cisco explicitly refused to fix it. They literally said maybe it works better in eXR (it does). The funny thing is, this CPU demand was created by BGP, so because XIPC didn't have priority for CPU over BGP, it caused BGP to demand more CPU, due to flaps. If XIPC had had priority over BGP, the symptoms would have been lessen. I pointed this out to Cisco, they agreed, but said they've previously explored process priorities in cXR, but ended up having just more unstable devices (unmanageable complexity for people to understand what the priorities should be). All this while pitching that RTOS is mandatory for carrier grade NOS, while behind the scene nothing for said RTOS was used, it's just flat priority all around.
Additionally LPTS is exclusively NPU level policer, if port1 congests some policer, also port2 suffers, there isn't a more-specific fall-back policer into IFD, IFL levels. So what can you do, if port1 has an L2 loop and is spewing ARP to you, killing port2? You can't MQC to 10pps, you can't ACL it, as LPTS bypasses MQC and ACL, so your only option is to shutdown port1, you cannot a-priori ensure one port won't take out other ports. There was an excessive flow tap, which could be used with success in this scenario, but that feature was retired, because I guess someone in cisco who knew why it was needed had left, and remaining people didn't understand its use case and didn't want to carry the complexity.
All of these are actually solvable: you can deliver a NOS where port1 on the same NPU won't take down port2, out of the box, without configuration. But it requires a deep understanding of what the platform can do, how it can do it, and how the actual customer network works. This person doesn't exist. Cisco or Nokia cannot even be configured like this by an operator; Juniper can be, but it's way too complicated for operators to do.
So if you have a casual understanding of how these devices work, you can bring down any core device, no matter how it's protected, with a trivially sized DoS from a single VPC. The only reason the Internet works is because there isn't motivation to break it, not because it is well protected. Which is fine, because the same is true for personal safety, and the focus should be on mitigating the motivation rather than on absolute safety.
Of course this thread isn't about protecting devices in bad weather, it is about trying to make devices work in fair weather, which is a much more reasonable ask.
-- ++ytti

On Fri, 8 Aug 2025, Drew Weaver via NANOG wrote:
It could be bothered to do absolutely nothing and silently reject the command which made me laugh for about 5 minutes this morning.
This wouldn't be the first time. Another case where this happened was when configuring traffic shaping: the specific setup didn't have enough traffic manager bandwidth (4-port 10GE into a 35 gigabit/s traffic manager, if I remember correctly), so XR chose the path of least resistance. It accepted the commit, decided it couldn't be implemented in HW, and got on with its life without saying anything.
-- Mikael Abrahamsson email: swmike@swm.pp.se

Drew Weaver via NANOG wrote on 08/08/2025 14:31:
It couldn't be bothered to simply set it to 50000 if you set it to the configured maximum of 4294967295. It couldn't be bothered to simply say: "Hey, we know the max for this platform is 50000 so we set it to 50000, but you probably shouldn't be using 50000 for this value anyway". It could be bothered to do absolutely nothing and silently reject the command, which made me laugh for about 5 minutes this morning.
Some years ago I was fighting with a low-level pps rate limiter for a telemetry service on a long-obsolete platform. The default limit caused packets to be dropped, and we finally settled on an updated figure based on the usual compromise of performance vs consequence. But: if we increased the limiter above what we had measured to be reasonable, this fairly quickly caused a performance cliff which affected other services, e.g. snmp / lacp timeouts, etc, so production impact. Although this was in the days of in-house NOS schedulers, I'd be fairly cautious in this area - particularly on RTOS platforms like XR.

If Cisco have implemented a pps limiter of 50k/s, that's a lot of snmp pps. Is this a realistic amount of requests to be properly serviced per second? SNMP packet encapsulation / general handling is one thing, but stats collection / intermediation can be more heavyweight. Bear in mind that the failure modes in this sort of situation are often non-linear.

For sure it's a bit annoying that they don't warn that this is the maximum (possibly a platform / LC limit? i.e. possible that this is not a generic limit across all SPs on all types of unit), but at least the box won't fall over in production just because someone tweaked a parameter beyond what the hardware was likely capable of handling.

Nick

By design, the LPTS default values are set to be on the "slow but safe" side. As I've already mentioned, picking default values is incredibly hard for stuff like this because you've got a dramatic range of system sizes, shapes, use cases, blah blah. The general consensus is we'd rather force people to open up policers explicitly than have them be too open by default. Feel free to dismiss me as a crybaby apologist, but that's how we got here. Said another way: feel free to say that we made terrible choices and you hate our defaults. But you can't justly accuse us of not thinking about it and/or just making shit up.

Another challenge here (yeah, I know... there goes LJ apologizing again...) is that a "500 pps" policer is not actually 500 packets per second. It's a token bucket meter where the actual parameters are the token fill rate and a burst size. Choosing THESE values is yet another messy problem... if we assumed that the bucket gets filled up once per second, then what you'd end up with is a meter that would allow 500 packets through as fast as they could be dequeued, but then let nothing else through for the rest of that one-second time window. Then we add 500 "tokens" at T = 1 sec, and lather rinse repeat.

In the real world every hardware based meter is slightly different as far as what burst sizes are available, how fast the token interval fills, and more stuff. But if we circle back to this particular case, we might well not have a 500 PPS policer, but rather we might have a policer that is "50 packets every tenth of a second" or "5 packets every hundredth of a second". This is where you have to know something about the other side (here - the SNMP client)... does it send a burst of packets all at once? How large is that burst? Does that burst overrun the policer on the router? If it times out and re-tries, does that make it worse or better?

Your complaint about the thing silently accepting a value that can't be supported in the hardware is 100% valid. We should not let you say "police this to a rate of 4 billion" and reply with "OK no problem" when in reality we're not doing that. Please ask your TAC engineer to file a bug for this ... we might or might not ever get around to fixing it, but it at least needs to be documented somewhere. (I would do it myself but they deemed me too dangerous to allow continued access to the DDTS database many years ago...)

As to why it takes so much longer to do the same thing on a non-management interface, I'm truly curious as to this one. 5 seconds is a bonkers amount of time on a system like this... my best guesses at this point are things like:

- because the rate limiters are different for mgmt vs non, somehow we're getting a partial completion each "cycle" and we've got tons of retries in there.

Drew, did you ever get the output of something like "debug snmp packet" or whatever it was the TAC guys asked for? I'd be specifically interested in comparing those traces for the two {mgmt, not} cases... once the SNMP process generates its replies the data plane on the way OUT is pretty much non-blocking, so I'd want to see if somehow we're pacing the arrival of the requests into the SNMP process, and/or if it thinks it's generating the responses in the same amount of time....

--lj
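The token-bucket point in LJ's message can be illustrated with a minimal sketch. It assumes a textbook continuously refilled token bucket, not the actual ASR9k meter, and the class and function names are hypothetical. All three buckets below refill at the same 500 tokens per second; only the burst depth differs.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate_pps: float      # token fill rate, tokens per second
    burst: float         # bucket depth: the most tokens that can be stored
    tokens: float = 0.0  # current fill level
    last: float = 0.0    # time of the previous arrival, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def admit(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admitted_from_burst(bucket: TokenBucket, burst_size: int, gap_s: float = 0.0) -> int:
    """Count how many packets of a client burst get through, starting at t = 0."""
    return sum(bucket.admit(i * gap_s) for i in range(burst_size))


if __name__ == "__main__":
    # A poller sends 100 requests back to back: far below 500 pps on average.
    for depth in (500, 50, 5):
        passed = admitted_from_burst(TokenBucket(rate_pps=500, burst=depth), 100)
        print(f"fill 500/s, burst depth {depth}: {passed} of 100 admitted")
```

With a back-to-back burst of 100 requests, the three depths admit 100, 50 and 5 packets respectively, which is why a poller averaging well under the nominal rate can still see drops and timeouts.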

On Fri, 8 Aug 2025 at 18:45, Nick Hilliard via NANOG <nanog@lists.nanog.org> wrote:
If Cisco have implemented a pps limiter of 50k/s, that's a lot of snmp pps. Is this a realistic amount of requests to be properly serviced per second? SNMP packet encapsulation / general handling is one thing, but stats collection / intermediation can be more heavyweight. Bear in mind that the failure modes in this sort of situation are often non-linear.
In this case something less obvious is happening: OP isn't pushing 300 pps, yet the policer is firing. This could be a legitimate bug, and might require a peek into what actually gets programmed into the BRCM.

In PTX PE (Paradise) there isn't a PPS policer in the hardware, yet ddos-protection can only be configured as PPS. So as a compromise the developer decided to program a (1500*8*pps) bps policer. So out of the box, in a standard configuration, the box will admit far too many small packets, more than the VoQ from ASIC -> LC_CPU can admit, congesting the whole VoQ, which is shared by most things. Unfortunately the user cannot change the 1500 into 64, nor can the user decide which ddos-protocols go into which VoQ, making it very tricky to get reasonable punt results in poor weather.
-- ++ytti
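The arithmetic behind the (1500*8*pps) compromise can be worked through in a few lines. This is only a sketch of the formula quoted above: the function names are hypothetical, the configured value of 1000 pps is an example, and L2 framing overhead is ignored.

```python
ASSUMED_PACKET_BYTES = 1500  # packet size baked into the conversion, per the message above


def programmed_bps(configured_pps: int) -> int:
    """Bit rate programmed into hardware for a policer configured in pps."""
    return ASSUMED_PACKET_BYTES * 8 * configured_pps


def effective_pps(configured_pps: int, actual_packet_bytes: int) -> float:
    """Packets per second the bps policer really admits at a given packet size."""
    return programmed_bps(configured_pps) / (actual_packet_bytes * 8)


if __name__ == "__main__":
    configured = 1000  # example value only
    print(f"programmed rate: {programmed_bps(configured)} bps")
    for size in (1500, 512, 64):
        print(f"{size}-byte packets: ~{effective_pps(configured, size):.1f} pps admitted")
```

For a policer configured at 1000 pps, 64-byte packets are admitted at 23437.5 pps, roughly 23 times the configured figure.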

Hello,

One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What?

-Drew

-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Wednesday, August 6, 2025 3:51 PM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is:

- identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things)
- check those incoming packets against a policer
- drop packets that exceed that policer

The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumptions made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config.

I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values.
This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always.

This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.

If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists.

I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number.
This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.

For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of:

- slowing down the requests from the external thing
- knowingly opening up the policers to a different / faster rate

Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes". I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup".

On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end.

These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works.

--lj

-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Wednesday, August 6, 2025 1:28 PM
To: 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ...
is interesting

Hi there,

It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later.

So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all. Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being.

Thanks though for stopping by.

-Drew

-----Original Message-----
From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org>
Sent: Tuesday, August 5, 2025 11:46 AM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion.

For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)

In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system.
The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.

No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.

Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like.

First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.

I'm not remotely surprised that the behavior is different from the 9901 to the 9902.
At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations.

I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.

As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on.

That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct.
We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.

At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...

* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?

Anyway... hopefully that points you at least somewhat in the right direction.

--lj

-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Sorry, Tom. I’m not taking the bait.

-mel via cell

On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:

Mel-

You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:

1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.

You have attempted to frame these comments as: honest and sincere attempts by other members to help identify the possible problem.
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.

Saku made 2 comments that addressed these falsehoods:

It might be easier to contribute, if there is familiarity to the subject matter.

some community member piled on with what can only be described as a bizarre drivel.

The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.

There is a massive difference between the following statements:

1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]

It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.

On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(

-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote: Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
-- ++ytti
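The LPTS flow described in LJ's quoted message (identify the traffic type, check it against a policer, drop the excess) can be sketched as follows. This is an illustration only: the flow names, per-flow limits and peer address are hypothetical, and a fixed one-second counting window stands in for the hardware token-bucket meters a real platform uses.

```python
from collections import Counter

KNOWN_BGP_PEERS = {"192.0.2.1"}  # hypothetical configured neighbor
LIMITS_PPS = {                   # hypothetical per-flow limits, packets per second
    "bgp-known": 2000,
    "bgp-default": 100,
    "snmp": 300,
    "arp": 200,
    "default": 50,
}


def classify(pkt: dict) -> str:
    """Step 1: identify the traffic type of a packet headed for the control plane."""
    proto = pkt.get("proto")
    if proto == "arp":
        return "arp"
    if proto == "udp" and pkt.get("dport") == 161:
        return "snmp"
    if proto == "tcp" and 179 in (pkt.get("sport"), pkt.get("dport")):
        return "bgp-known" if pkt.get("src") in KNOWN_BGP_PEERS else "bgp-default"
    return "default"


def police_one_second(packets: list[dict]) -> tuple[list[dict], Counter]:
    """Steps 2 and 3: count each flow within one window and drop what exceeds its limit."""
    seen: Counter = Counter()
    dropped: Counter = Counter()
    punted = []
    for pkt in packets:
        flow = classify(pkt)
        seen[flow] += 1
        if seen[flow] <= LIMITS_PPS[flow]:
            punted.append(pkt)   # handed up to the CPU
        else:
            dropped[flow] += 1   # silently discarded; the application never sees it
    return punted, dropped


if __name__ == "__main__":
    snmp = {"proto": "udp", "src": "198.51.100.7", "dport": 161}
    bgp = {"proto": "tcp", "src": "192.0.2.1", "sport": 40000, "dport": 179}
    punted, dropped = police_one_second([snmp] * 500 + [bgp])
    print(f"punted {len(punted)} packets, dropped per flow: {dict(dropped)}")
```

Because the drop happens before the SNMP process ever sees the request, the only symptom visible from outside is a client-side timeout.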

No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :) BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe. BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time. Following my own suggestion of earlier, google returned this: https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/ad... and this: https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/ip-addresses/... Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this: configure lpts pifib hardware police flow [snmp] rate [something] You should be able to futz around with the rates until you figure out what eliminates the drops. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Thursday, August 7, 2025 9:02 AM To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... 
is interesting Hello, One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What? -Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is: - identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things) - check those incoming packets against a policer - drop packets that exceed that policer The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumption made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config. I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. 
This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always. This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS policers (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals. If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists. I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. 
This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing. For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of: - slowing down the requests from the external thing - knowingly opening up the policers to a different / faster rate Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes" I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup". On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... 
is interesting Hi there, It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later. So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all. Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being. Thanks though for stopping by. -Drew -----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-) In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. 
The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place. No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing. I'm not remotely surprised that the behavior is different from the 9901 to the 9902. 
At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it. As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. 
We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree. At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side... * which interface(s) are being polled? MgmtEth, loopback, physical? * at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that) * can this rate be changed? * how much stuff (i.e. MIBs) are you polling? Anyway... hopefully that points you at least somewhat in the right direction. --lj -----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting Sorry, Tom. I’m not taking the bait. -mel via cell On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote: Mel- You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages : 1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other. 2. Cisco is likely to say that the control plane is only fully supported on the management port. 3. In-band SNMP to data forwarding interfaces violates that separation. You have attempted to frame these comments as : honest and sincere attempts by other members to help identify the possible problem. 
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*. Saku made 2 comments that addressed these falsehoods : It might be easier to contribute, if there is familiarity to the subject matter. some community member piled on with what can only be described as a bizarre drivel. The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it. There is a massive difference between the following statements : 1. You are an idiot. [ Attacking the person ] 2. What you said was idiotic. [ Attacking the statements ] It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation. On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote: Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :( -mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc<mailto:beecher@beecher.cc>> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote: Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel. -- ++ytti
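[Editorial illustration] The LPTS flow described in the message above (classify the punted traffic, check it against a policer, drop whatever exceeds the policer) can be sketched with a generic token bucket. This is only a toy model under stated assumptions: the class and function names, the rates, and the burst size are all made up for illustration, and nothing here reflects how the ASR9k hardware policers are actually implemented or what their defaults are. It simply shows why a steady request stream above the policed rate loses a roughly fixed fraction of packets, which the SNMP client then sees as timeouts, and why raising the rate makes the drops go away.

```python
from dataclasses import dataclass


@dataclass
class TokenBucketPolicer:
    """Generic token-bucket policer (hypothetical model, not Cisco's code)."""
    rate_pps: float          # tokens (packets) replenished per second
    burst: float             # bucket depth, in packets
    tokens: float = 0.0
    last_seen: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst   # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous packet, capped at burst.
        elapsed = now - self.last_seen
        self.last_seen = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0     # packet conforms: punt it to the CPU
            return True
        return False               # packet exceeds the policer: drop it


def steady_arrivals(pps: int, seconds: int) -> list[float]:
    """Arrival timestamps for a constant-rate request stream."""
    return [i / pps for i in range(pps * seconds)]


def drop_fraction(policer: TokenBucketPolicer, arrivals: list[float]) -> float:
    """Fraction of packets the policer discards for the given arrivals."""
    dropped = sum(1 for t in arrivals if not policer.allow(t))
    return dropped / len(arrivals)


if __name__ == "__main__":
    # Assumed numbers: a poller sending 400 requests/s for 10 s.
    requests = steady_arrivals(pps=400, seconds=10)
    for rate in (100, 200, 500):
        policer = TokenBucketPolicer(rate_pps=rate, burst=10)
        print(f"policer {rate} pps -> {drop_fraction(policer, requests):.0%} dropped")
```

In this model the drop fraction depends only on the ratio between the offered rate and the policed rate, so either slowing the poller or raising the policer rate reduces the loss, which matches the two options suggested in the thread.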

Yes but there is no functional difference between using MGMT eth ports and simply specifying the IP address of the NMS somewhere. (which we already did in the ACL applied to the damn snmp-community command in the first place.......) Your argument was that someone who is on that 'whitelisted IP' could burn the control plane down if there was a whitelisted IP. From that standpoint the same applies to mgmteth ports, so what is the actual benefit of any of this then? From an operational standpoint I don't want to specify a pps or bps rate that all SNMP traffic can pass the policer through. I want to specify an IP address that bypasses the policer entirely like you can do on all other network operating systems CoPP configuration. I appreciate you googling the docs. I will try to increase the rate on the policer just for SNMP but as I've mentioned at least 15 times in these threads I can't even see the way it's currently configured. Just completely blind poking around in the internals of this thing. Cool. -Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Thursday, August 7, 2025 10:33 AM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :) BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe. 
BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time. Following my own suggestion of earlier, google returned this: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cisco.com_c_en_us_td_docs_routers_asr9000_software_asr9k-5Fr5-2D1_addr-5Fserv_configuration_guide_b-5Fipaddr-5Fcg51xa9k_b-5Fipaddr-5Fcg51xa9k-5Fchapter-5F01000.pdf&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=RKik23YswQ8Vloi9G2xpsfb4rF_YYEJhMDf-1BoIk3jLIVM8_kpNvOIyf2Zc8j5t&s=xyh9Ub_u-ByRGArTKOUFAY_VG7nbtFkgwg8dyvPG8Og&e= and this: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cisco.com_c_en_us_td_docs_routers_asr9000_software_ip-2Daddresses_command_reference_b-2Dip-2Daddresses-2Dcr-2Dasr9000_b-2Dipaddr-2Dcr-2Dasr9k-5Fchapter-5F0111.html&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=RKik23YswQ8Vloi9G2xpsfb4rF_YYEJhMDf-1BoIk3jLIVM8_kpNvOIyf2Zc8j5t&s=HzsiarzWMDm6qryzhiWgfx3qfEDVuKc2DjWnyiGtH1o&e= Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this: configure lpts pifib hardware police flow [snmp] rate [something] You should be able to futz around with the rates until you figure out what eliminates the drops. 
--lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Thursday, August 7, 2025 9:02 AM To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Hello, One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What? -Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is: - identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things) - check those incoming packets against a policer - drop packets that exceed that policer The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumption made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config. 
I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always. This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS policers (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals. If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. 
Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists. I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing. For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of: - slowing down the requests from the external thing - knowingly opening up the policers to a different / faster rate Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes" I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup". On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... 
but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Hi there, It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later. So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all. Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being. Thanks though for stopping by. -Drew -----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. 
I also worked in TAC back in the day so I have some familiarity with their processes. ;-) In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place. No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. 
This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.
I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations.
I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.
As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on.
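[Editorial illustration, not part of the original message.] The point above, that a "timeout" only exists on the client, can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each request is dropped independently with the same probability and that responses are never lost; a real policer drops in bursts, so treat this as a rough model. The function name and the retry counts are hypothetical.

```python
def poll_timeout_probability(request_drop_prob: float, retries: int) -> float:
    """Probability that a poll times out on the client.

    Assumption (hypothetical model): every attempt is dropped independently
    with probability request_drop_prob, and the client gives up only after
    the first attempt plus `retries` retries have all gone unanswered.
    """
    if not 0.0 <= request_drop_prob <= 1.0:
        raise ValueError("request_drop_prob must be between 0 and 1")
    if retries < 0:
        raise ValueError("retries must be non-negative")
    attempts = retries + 1
    return request_drop_prob ** attempts


if __name__ == "__main__":
    # Compare how the same drop rate looks to clients with different retry settings.
    for retries in (0, 1, 2, 5):
        prob = poll_timeout_probability(0.62, retries)
        print(f"retries={retries}: timeout probability {prob:.3f}")
```

Under these assumptions the timeout rate a client reports depends as much on its retry setting as on the router, which is why the request rate and the drop counters need to be looked at together.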
That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.
At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?
Anyway... hopefully that points you at least somewhat in the right direction.
--lj

-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Sorry, Tom. I’m not taking the bait.
-mel via cell
On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:
1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.
You have attempted to frame these comments as:
"honest and sincere attempts by other members to help identify the possible problem."
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.
Saku made 2 comments that addressed these falsehoods:
"It might be easier to contribute, if there is familiarity to the subject matter."
"some community member piled on with what can only be described as a bizarre drivel."
The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.
There is a massive difference between the following statements:
1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]
It seems to me that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.
On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote:
Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc<mailto:beecher@beecher.cc>> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote: Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel. -- ++ytti _______________________________________________ NANOG mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org _archives_list_nanog-40lists.nanog.org_message_&d=DwIGaQ&c=euGZstcaT DllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1 o4LtA3tMGmuw&m=ZRQKyw0amYQuJDrOUoUtJCSVZKWvb764kPF4UjLJKuQ_I4NVhCMVb VNzC8h9aWfc&s=HVfyN6javj5uX9ryxhOPxQSiMh2CkQJi_x885vQNB0M&e= 7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/

In my mind, management ports should be on an entirely separated fabric, immune to the 'general public' fabric. That's what this seems to "loosely" achieve.

-----Original Message-----
From: Drew Weaver via NANOG <nanog@lists.nanog.org>
Sent: Thursday, August 7, 2025 10:47 AM
To: 'LJ Wobker (lwobker)' <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Yes but there is no functional difference between using MGMT eth ports and simply specifying the IP address of the NMS somewhere. (which we already did in the ACL applied to the damn snmp-community command in the first place.......)
Your argument was that someone who is on that 'whitelisted IP' could burn the control plane down if there was a whitelisted IP. From that standpoint the same applies to mgmteth ports, so what is the actual benefit of any of this then?
From an operational standpoint I don't want to specify a pps or bps rate at which all SNMP traffic can pass through the policer. I want to specify an IP address that bypasses the policer entirely, like you can do in every other network operating system's CoPP configuration.
I appreciate you googling the docs. I will try to increase the rate on the policer just for SNMP but as I've mentioned at least 15 times in these threads I can't even see the way it's currently configured. Just completely blind poking around in the internals of this thing. Cool.
-Drew

-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Thursday, August 7, 2025 10:33 AM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :)
BUT - the policer rates are different based on where the packets arrive.
XR is designed to handle distributed systems where you could have lots (20+) of linecards and even more (100+) forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe. BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time.
Following my own suggestion of earlier, google returned this:
https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/addr_serv/configuration/guide/b_ipaddr_cg51xa9k/b_ipaddr_cg51xa9k_chapter_01000.pdf
and this:
https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/ip-addresses/command/reference/b-ip-addresses-cr-asr9000/b-ipaddr-cr-asr9k_chapter_0111.html
Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID.
So if the port your SNMP polls arrive on is on card 1, you'll need something like this:
configure lpts pifib hardware police flow [snmp] rate [something]
You should be able to futz around with the rates until you figure out what eliminates the drops.
--lj

-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Thursday, August 7, 2025 9:02 AM
To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hello,
One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What?
-Drew

-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Wednesday, August 6, 2025 3:51 PM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Some more background might be useful here...
"Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them.
The flow very roughly is:
- identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things)
- check those incoming packets against a policer
- drop packets that exceed that policer
The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here...
and any assumptions made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config.
I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always.
This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS policers (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms.
Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer can optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.
If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this.
Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists. I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.
For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of:
- slowing down the requests from the external thing
- knowingly opening up the policers to a different / faster rate
than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes".
I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup".
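[Editorial illustration, not part of the original message.] The trade-off described above (slow the poller down, or open the policer up) can be explored with a generic token-bucket model. This is only a sketch of the general policing idea: it is not how LPTS is implemented, and the rates, burst size, and function name are all made up for illustration.

```python
def policed_drop_fraction(offered_pps: float, policer_pps: float,
                          burst_pkts: float, duration_s: float = 60.0) -> float:
    """Fraction of evenly spaced packets dropped by a token-bucket policer.

    Hypothetical model: the bucket starts full with `burst_pkts` tokens,
    refills at `policer_pps` tokens per second, and each packet that finds
    at least one token consumes it; otherwise the packet is dropped.
    """
    interval = 1.0 / offered_pps           # time between consecutive packets
    total = int(offered_pps * duration_s)  # packets offered during the run
    tokens = float(burst_pkts)
    dropped = 0
    for _ in range(total):
        # Refill for the time elapsed since the previous packet, capped at the burst size.
        tokens = min(float(burst_pkts), tokens + policer_pps * interval)
        if tokens >= 1.0:
            tokens -= 1.0   # packet conforms and is punted
        else:
            dropped += 1    # packet exceeds the policer and is discarded
    return dropped / total


if __name__ == "__main__":
    # Illustrative numbers only; none of these are measured from the thread.
    for offered in (300, 500, 1000):
        frac = policed_drop_fraction(offered, policer_pps=380, burst_pkts=100)
        print(f"offered={offered} pps -> dropped fraction {frac:.2f}")
```

In this model an offered load well above the policer rate produces a steady drop fraction of roughly 1 - policer/offered (about 62% for 1000 pps against 380 pps, numbers picked only to illustrate), while a load below the policer rate sees no drops at all. That is the behavior the "slow down the requests or raise the rate" advice relies on.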
On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end.
These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works.
--lj

-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Wednesday, August 6, 2025 1:28 PM
To: 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hi there,
It has since been identified that the traffic is being dropped because the SNMP policer in LPTS seems to just be discarding it. I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet, and 4 weeks later they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS.
So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform, but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.
Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being.
Thanks though for stopping by.
-Drew -----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-) In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place. No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. 
First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing. I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it. 
As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't this from TAC, let me know and I will attempt to shake that tree. At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side... * which interface(s) are being polled? MgmtEth, loopback, physical? * at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that) * can this rate be changed? * how much stuff (i.e. MIBs) are you polling? Anyway... hopefully that points you at least somewhat in the right direction. --lj -----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting Sorry, Tom. I’m not taking the bait. 
-mel via cell On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote: Mel- You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages : 1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other. 2. Cisco is likely to say that the control plane is only fully supported on the management port. 3. In-band SNMP to data forwarding interfaces violates that separation. You have attempted to frame these comments as : honest and sincere attempts by other members to help identify the possible problem. While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*. Saku made 2 comments that addressed these falsehoods : It might be easier to contribute, if there is familiarity to the subject matter. some community member piled on with what can only be described as a bizarre drivel. The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it. There is a massive difference between the following statements : 1. You are an idiot. [ Attacking the person ] 2. What you said was idiotic. [ Attacking the statements ] It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. 
Nobody is bullying you, or anybody else, in this conversation. On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote: Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :( -mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc<mailto:beecher@beecher.cc>> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizzard drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote: Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
--
++ytti
_______________________________________________
NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/5F2ASQ5X...
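For what it's worth, the policer hypothesis raised above can be sanity-checked with simple arithmetic. The following is a minimal sketch, assuming a plain rate policer under sustained load; it is not a description of how LPTS is actually implemented. The function names and the 100 pps / 38 pps figures are hypothetical, and only the 62% figure comes from the thread.

```python
def steady_state_drop_fraction(offered_pps: float, policed_pps: float) -> float:
    """Fraction of packets dropped when a sustained offered rate meets a rate policer.

    Assumption: the policer passes at most `policed_pps` and drops the rest.
    """
    if offered_pps <= 0:
        raise ValueError("offered_pps must be positive")
    if policed_pps < 0:
        raise ValueError("policed_pps must not be negative")
    return max(0.0, 1.0 - policed_pps / offered_pps)


def implied_policed_rate(offered_pps: float, observed_drop_fraction: float) -> float:
    """Invert the relation: the policed rate that would explain an observed loss."""
    if not 0.0 <= observed_drop_fraction <= 1.0:
        raise ValueError("observed_drop_fraction must be between 0 and 1")
    return offered_pps * (1.0 - observed_drop_fraction)


if __name__ == "__main__":
    # Hypothetical poller sending 100 requests per second and seeing 62% loss.
    print(round(implied_policed_rate(100.0, 0.62), 3))
    # The same poller slowed to 10 requests per second against that policer.
    print(steady_state_drop_fraction(10.0, 38.0))
```

If the loss fraction stays constant while the polling rate is held constant, and falls when the poller is slowed down, that is at least consistent with a rate policer rather than a slow SNMP process.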

I would consider this instead:

interface MgmtEth0/RP0/CPU0/0
 ipv4 address 172.16.5.1 255.255.255.252
!
interface MgmtEth0/RP1/CPU0/0
 ipv4 address 172.16.6.1 255.255.255.252
!

Unless there is some insane MGMT IP clustering service built into this thing (and given the state of SNMP polling I cannot imagine that there would be) the router is not magically going to move the IP address 172.16.5.1 255.255.255.252 over to MgmtEth0/RP1/CPU0/0 when RP0 dies on Christmas morning at 4:41 AM. So if you're relying on the IP 172.16.5.1 for SSH/SNMP Polling (or whatever else) good luck with that dude. This is why for 20 years we've only ever known our routers by their loopback IP addresses.

Thanks,
-Drew

-----Original Message-----
From: Gary Sparkes <gary@kisaracorporation.com>
Sent: Thursday, August 7, 2025 2:24 PM
To: North American Network Operators Group <nanog@lists.nanog.org>; 'LJ Wobker (lwobker)' <lwobker@cisco.com>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

In my mind, management ports should be on an entirely separated fabric, immune to the 'general public' fabric. That's what this seems to "loosely" achieve.

-----Original Message-----
From: Drew Weaver via NANOG <nanog@lists.nanog.org>
Sent: Thursday, August 7, 2025 10:47 AM
To: 'LJ Wobker (lwobker)' <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Yes but there is no functional difference between using MGMT eth ports and simply specifying the IP address of the NMS somewhere. (which we already did in the ACL applied to the damn snmp-community command in the first place.......) Your argument was that someone who is on that 'whitelisted IP' could burn the control plane down if there was a whitelisted IP. From that standpoint the same applies to mgmteth ports, so what is the actual benefit of any of this then?
From an operational standpoint I don't want to specify a pps or bps rate that all SNMP traffic can pass the policer through. I want to specify an IP address that bypasses the policer entirely like you can do on all other network operating systems CoPP configuration. I appreciate you googling the docs. I will try to increase the rate on the policer just for SNMP but as I've mentioned at least 15 times in these threads I can't even see the way it's currently configured. Just completely blind poking around in the internals of this thing. Cool. -Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Thursday, August 7, 2025 10:33 AM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :) BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe. BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time. 
Following my own suggestion of earlier, google returned this:
https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/addr_serv/configuration/guide/b_ipaddr_cg51xa9k/b_ipaddr_cg51xa9k_chapter_01000.pdf
and this:
https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/ip-addresses/command/reference/b-ip-addresses-cr-asr9000/b-ipaddr-cr-asr9k_chapter_0111.html

Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this:

configure
 lpts pifib hardware police
  flow [snmp] rate [something]

You should be able to futz around with the rates until you figure out what eliminates the drops.

--lj

-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Thursday, August 7, 2025 9:02 AM
To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Hello,

One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What?
-Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is: - identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things) - check those incoming packets against a policer - drop packets that exceed that policer The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumption made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config. I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always. 
This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.

If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists.

I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.

For what it's worth... even if it's possible (and I'm not sure it is?)
I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of: - slowing down the requests from the external thing - knowingly opening up the policers to a different / faster rate Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes" I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup". On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Hi there, It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this. 
This doesn't show up in the running configuration TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later. So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big as problem as TAC being fully unaware that it exists at all. Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being. Thanks though for stopping by. -Drew -----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-) In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place. No one uses the same terms for anything, so some terminology... 
We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing. I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... 
somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.

As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on.

That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.

At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?

Anyway... hopefully that points you at least somewhat in the right direction.

--lj

-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Sorry, Tom. I’m not taking the bait.

-mel via cell

On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:

Mel-

You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:

1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.

You have attempted to frame these comments as:

honest and sincere attempts by other members to help identify the possible problem.

While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.

Saku made 2 comments that addressed these falsehoods:

It might be easier to contribute, if there is familiarity to the subject matter.
some community member piled on with what can only be described as a bizarre drivel. The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it. There is a massive difference between the following statements : 1. You are an idiot. [ Attacking the person ] 2. What you said was idiotic. [ Attacking the statements ] It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation. On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote: Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :( -mel via cell
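As a rough illustration of LJ's remark that time windows matter and that a short, fast burst of requests might trip the meter, here is a minimal token-bucket sketch. It assumes a generic token-bucket policer; the class, the function name, and the 20 pps / burst-of-5 figures are hypothetical and are not taken from IOS XR or from any LPTS default.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class TokenBucket:
    """Generic token-bucket policer (an assumption, not the LPTS implementation)."""

    rate_pps: float  # tokens added per second
    burst: float     # maximum tokens the bucket can hold
    tokens: float = field(init=False)
    last_time: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full bucket.
        self.tokens = self.burst

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def count_dropped(arrival_times: Iterable[float], rate_pps: float, burst: float) -> int:
    """Count the requests the policer would drop for a given arrival pattern."""
    bucket = TokenBucket(rate_pps=rate_pps, burst=burst)
    return sum(1 for t in arrival_times if not bucket.allow(t))


if __name__ == "__main__":
    # The same 100 requests, sent two different ways.
    paced = [i * 0.1 for i in range(100)]     # spread evenly over 10 seconds
    bursty = [i * 0.001 for i in range(100)]  # all sent within 0.1 seconds
    print("paced drops:", count_dropped(paced, rate_pps=20.0, burst=5.0))
    print("bursty drops:", count_dropped(bursty, rate_pps=20.0, burst=5.0))
```

In this model the paced pattern stays under the policed rate and loses nothing, while the bursty pattern loses most of its requests. From the poller's side each dropped request simply shows up as a timeout, which matches the point made above that the timeout is a client-side event.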

Also can you explain why it automatically adds entries for NTP and DNS servers that are configured in the router's configuration but not for SNMP/NMS hosts? I didn't configure this:

L4 Protocol        : UDP
VRF ID             : 0x00000000
Destination IP     : any
Source IP/BFD Disc : ip address redacted
Port/Type          : Port:123
Source Port        : any
Is Fragment        : 0
Is SYN             : any
Is Bundle          : na
Is Virtual         : na
Interface          : any
Slice              : 0
V/L/T/F            : 0/IPv4_LISTENER/0/NTP-known
DestNode           : LU(0x30)
DestAddr           : LU(0x30)
Accepted/Dropped   : 0/0
Po/Ar/Bu           : 80/200pps/200ms
State              : pl_pifib_state_complete

-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Thursday, August 7, 2025 10:33 AM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :)

BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe.

BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time.
Following my own suggestion of earlier, google returned this: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cisco.com_c_en_us_td_docs_routers_asr9000_software_asr9k-5Fr5-2D1_addr-5Fserv_configuration_guide_b-5Fipaddr-5Fcg51xa9k_b-5Fipaddr-5Fcg51xa9k-5Fchapter-5F01000.pdf&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=RKik23YswQ8Vloi9G2xpsfb4rF_YYEJhMDf-1BoIk3jLIVM8_kpNvOIyf2Zc8j5t&s=xyh9Ub_u-ByRGArTKOUFAY_VG7nbtFkgwg8dyvPG8Og&e= and this: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cisco.com_c_en_us_td_docs_routers_asr9000_software_ip-2Daddresses_command_reference_b-2Dip-2Daddresses-2Dcr-2Dasr9000_b-2Dipaddr-2Dcr-2Dasr9k-5Fchapter-5F0111.html&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=RKik23YswQ8Vloi9G2xpsfb4rF_YYEJhMDf-1BoIk3jLIVM8_kpNvOIyf2Zc8j5t&s=HzsiarzWMDm6qryzhiWgfx3qfEDVuKc2DjWnyiGtH1o&e= Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this: configure lpts pifib hardware police flow [snmp] rate [something] You should be able to futz around with the rates until you figure out what eliminates the drops. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Thursday, August 7, 2025 9:02 AM To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Hello, One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What? 
-Drew
-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Wednesday, August 6, 2025 3:51 PM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Some more background might be useful here...
"Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is:
- identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things)
- check those incoming packets against a policer
- drop packets that exceed that policer
The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumptions made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config.
I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always.
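The identify / check / drop flow listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LPTS implementation: it assumes a token-bucket policer (a common way to build a packets-per-second limit; the thread does not say which algorithm LPTS uses), and the flow names, the port-to-flow table, and every rate and burst value are hypothetical.

```python
class TokenBucket:
    """Packets-per-second policer: refills at rate_pps, holds at most burst tokens."""

    def __init__(self, rate_pps: float, burst: float) -> None:
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # arrival time of the previous packet, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous packet, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # conforming packet: spend one token
            return True
        return False            # exceeding packet: dropped before the CPU sees it


# Hypothetical flow table keyed by UDP destination port (step 1: identify).
FLOW_BY_UDP_PORT = {161: "snmp", 123: "ntp"}


def classify(udp_dst_port: int) -> str:
    return FLOW_BY_UDP_PORT.get(udp_dst_port, "default")


def simulate(offered_pps: int, seconds: int, policers: dict, udp_dst_port: int):
    """Offer evenly spaced packets of one flow; return (accepted, dropped)."""
    policer = policers[classify(udp_dst_port)]  # step 2: pick the policer to check against
    accepted = dropped = 0
    for i in range(offered_pps * seconds):
        if policer.allow(i / offered_pps):
            accepted += 1
        else:
            dropped += 1                        # step 3: drop what exceeds the policer
    return accepted, dropped


if __name__ == "__main__":
    # All rates and bursts below are made up for the example.
    policers = {
        "snmp": TokenBucket(rate_pps=38, burst=5),
        "ntp": TokenBucket(rate_pps=200, burst=40),
        "default": TokenBucket(rate_pps=10, burst=2),
    }
    accepted, dropped = simulate(100, 10, policers, udp_dst_port=161)
    total = accepted + dropped
    print(f"accepted={accepted} dropped={dropped} ({dropped / total:.1%} dropped)")
```

With these made-up numbers, a poller sending a steady 100 requests per second into a 38 pps policer loses a little over 60% of them. The point is only that a fixed polling rate hitting a fixed policer produces a stable drop percentage; it says nothing about the real ASR9902 defaults.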
This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.
If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists.
I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.
For what it's worth... even if it's possible (and I'm not sure it is?)
I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of:
- slowing down the requests from the external thing
- knowingly opening up the policers to a different / faster rate
Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes"
I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup".
On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works.
--lj
-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Wednesday, August 6, 2025 1:28 PM
To: 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hi there,
It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this.
This doesn't show up in the running configuration. TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later.
So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.
Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being.
Thanks though for stopping by.
-Drew
-----Original Message-----
From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org>
Sent: Tuesday, August 5, 2025 11:46 AM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion.
For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)
In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.
No one uses the same terms for anything, so some terminology...
We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.
Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why.
In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address?
This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.
I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations.
I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works...
somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.
As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on.
That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.
At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?
Anyway... hopefully that points you at least somewhat in the right direction.
--lj
-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Sorry, Tom. I’m not taking the bait.
-mel via cell
On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:
1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62% is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.
You have attempted to frame these comments as:
honest and sincere attempts by other members to help identify the possible problem.
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.
Saku made 2 comments that addressed these falsehoods:
It might be easier to contribute, if there is familiarity to the subject matter.
some community member piled on with what can only be described as a bizarre drivel.
The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong".
Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.
There is a massive difference between the following statements:
1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]
It seems to me that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.
On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote:
I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS, something like that.
There are various complicated reasons for this; an LPTS policer is an unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
--
++ytti
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/

Okay I configured LPTS like this:
  lpts pifib hardware police
   flow snmp rate 4294967295
  !
These tests were run on a machine connected directly to the ports on the ASR9902.
  interface TwentyFiveGigE0/0/0/4
   ipv4 address 192.168.101.1 255.255.255.252
  !
  [drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 192.168.101.1 -x 195,261
  OK: host '192.168.101.1 ', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0
  real 0m5.779s
  user 0m0.335s
  sys 0m0.011s
Takes 5.779 seconds to respond.
  interface MgmtEth0/RP0/CPU0/0
   ipv4 address 172.16.5.1 255.255.255.252
  !
  [drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
  OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0
  real 0m0.790s
  user 0m0.333s
  sys 0m0.012s
  [drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
  OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0
  real 0m0.776s
  user 0m0.322s
  sys 0m0.024s
Takes 0.776 seconds to respond from MgmtEth0.
Why does it take so much longer to respond? Is that also a cool LPTS feature to protect the control plane? How do I configure it to not delay the responses exponentially?
Thank you,
-Drew
-----Original Message-----
From: Drew Weaver via NANOG <nanog@lists.nanog.org>
Sent: Thursday, August 7, 2025 10:54 AM
To: 'LJ Wobker (lwobker)' <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Also can you explain why it automatically adds entries for NTP and DNS servers that are configured in the router's configuration but not for SNMP/NMS hosts?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that) * can this rate be changed? * how much stuff (i.e. MIBs) are you polling? Anyway... hopefully that points you at least somewhat in the right direction. --lj -----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting Sorry, Tom. I’m not taking the bait. -mel via cell On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote: Mel- You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages : 1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other. 2. Cisco is likely to say that the control plane is only fully supported on the management port. 3. In-band SNMP to data forwarding interfaces violates that separation. You have attempted to frame these comments as : honest and sincere attempts by other members to help identify the possible problem. While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*. Saku made 2 comments that addressed these falsehoods : It might be easier to contribute, if there is familiarity to the subject matter. 
some community member piled on with what can only be described as a bizarre drivel. The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it. There is a massive difference between the following statements : 1. You are an idiot. [ Attacking the person ] 2. What you said was idiotic. [ Attacking the statements ] It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation. On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote: Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :( -mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel. -- ++ytti _______________________________________________ NANOG mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org _archives_list_nanog-40lists.nanog.org_message_&d=DwIGaQ&c=euGZstcaT DllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1 o4LtA3tMGmuw&m=ZRQKyw0amYQuJDrOUoUtJCSVZKWvb764kPF4UjLJKuQ_I4NVhCMVb VNzC8h9aWfc&s=HVfyN6javj5uX9ryxhOPxQSiMh2CkQJi_x885vQNB0M&e= 7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/
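The policer behaviour LJ describes earlier in this thread (a short, very fast burst of requests tripping the meter while a slower poll passes untouched) can be illustrated with a generic token bucket. This is only a sketch: it is not the LPTS implementation, and the rate, burst depth and packet spacing used below are hypothetical numbers chosen to show the effect.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Generic token-bucket policer (illustrative, NOT the actual LPTS code)."""
    rate_pps: float      # tokens (packets) added per second; hypothetical value
    burst: float         # bucket depth in packets; hypothetical value
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self):
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # packet exceeds the policer and is dropped


def drop_fraction(rate_pps, burst, n_packets, interval_s):
    """Fraction of n_packets dropped when one is sent every interval_s seconds."""
    bucket = TokenBucket(rate_pps=rate_pps, burst=burst)
    dropped = sum(0 if bucket.allow(i * interval_s) else 1
                  for i in range(n_packets))
    return dropped / n_packets


if __name__ == "__main__":
    # Same 100 requests, same policer: only the sender's pacing differs.
    print("1 ms spacing :", drop_fraction(100, 20, 100, 0.001))
    print("20 ms spacing:", drop_fraction(100, 20, 100, 0.020))
```

With these made-up numbers the tightly spaced burst loses most of its packets once the bucket is empty, while the paced sender loses none. From the client's side every dropped request just looks like a timeout, which is why the sending rate of the SNMP station matters as much as the policer value.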

Hi Drew,

You can check what’s going on with SNMP with the following commands.

show snmp request incoming-queue detail
show snmp request drop summary

It’s unlikely there is anything in the actual packet forwarding path that’s delaying something 5 seconds, there aren’t really buffers to do that in the hardware. If you look at the LPTS entries, are you seeing the SNMP packets show up under the SNMP LPTS entry or the default UDP one?

show lpts pifib hardware police location 0/0/CPU0 | inc UDP
show lpts pifib hardware police location 0/0/CPU0 | inc SNMP

Phil

From: Drew Weaver via NANOG <nanog@lists.nanog.org>
Date: Thursday, August 7, 2025 at 10:17
To: 'North American Network Operators Group' <nanog@lists.nanog.org>, 'LJ Wobker (lwobker)' <lwobker@cisco.com>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Okay I configured LPTS like this:

lpts pifib hardware police
 flow snmp rate 4294967295
!

These tests were run on a machine connected directly to the ports on the ASR9902.

interface TwentyFiveGigE0/0/0/4
 ipv4 address 192.168.101.1 255.255.255.252
!

[drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 192.168.101.1 -x 195,261
OK: host '192.168.101.1 ', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

real 0m5.779s
user 0m0.335s
sys 0m0.011s

Takes 5.779 seconds to respond.

interface MgmtEth0/RP0/CPU0/0
 ipv4 address 172.16.5.1 255.255.255.252
!
[drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

real 0m0.790s
user 0m0.333s
sys 0m0.012s

[drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

real 0m0.776s
user 0m0.322s
sys 0m0.024s

Takes 0.776 seconds to respond from MgmtEth0.

Why does it take so much longer to respond? Is that also a cool LPTS feature to protect the control plane? How do I configure it to not delay the responses exponentially?

Thank you,
-Drew

-----Original Message-----
From: Drew Weaver via NANOG <nanog@lists.nanog.org>
Sent: Thursday, August 7, 2025 10:54 AM
To: 'LJ Wobker (lwobker)' <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Also can you explain why it automatically adds entries for NTP and DNS servers that are configured in the router's configuration but not for SNMP/NMS hosts?
I didn't configure this:

L4 Protocol : UDP
VRF ID : 0x00000000
Destination IP : any
Source IP/BFD Disc: ip address redacted
Port/Type : Port:123
Source Port : any
Is Fragment : 0
Is SYN : any
Is Bundle : na
Is Virtual : na
Interface : any
Slice : 0
V/L/T/F : 0/IPv4_LISTENER/0/NTP-known
DestNode : LU(0x30)
DestAddr : LU(0x30)
Accepted/Dropped : 0/0
Po/Ar/Bu : 80/200pps/200ms
State : pl_pifib_state_complete

-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Thursday, August 7, 2025 10:33 AM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :)

BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe.

BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time.
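The fan-in argument above comes down to simple arithmetic, sketched here. The CPU budget and source counts are hypothetical numbers for illustration only; they are not Cisco defaults and the function names are made up for this example.

```python
def worst_case_punt_pps(per_source_pps: int, active_sources: int) -> int:
    """Upper bound on packets/s reaching the RP CPU if every source
    punts at its own policer limit at the same time."""
    return per_source_pps * active_sources


def per_source_limit(cpu_budget_pps: int, possible_sources: int) -> int:
    """Per-source policer rate that keeps the worst case inside the budget."""
    return cpu_budget_pps // possible_sources


if __name__ == "__main__":
    budget = 10_000  # hypothetical pps the RP CPU can absorb for one flow type

    # One known source (e.g. the management ethernet): it can have the lot.
    print("single pipe:", per_source_limit(budget, 1), "pps")

    # 100 NPUs that might all punt at once: each gets a small slice.
    print("100 NPUs   :", per_source_limit(budget, 100), "pps each")

    # A per-NPU default of 200 pps looks modest, but across 100 NPUs:
    print("aggregate  :", worst_case_punt_pps(200, 100), "pps")
```

Under these assumed numbers a default that is safe for a fully loaded chassis ends up far lower per interface than what a single management port is allowed, which is consistent with the in-band versus MgmtEth difference reported in this thread.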
Following my own suggestion of earlier, google returned this:

https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/addr_serv/configuration/guide/b_ipaddr_cg51xa9k/b_ipaddr_cg51xa9k_chapter_01000.pdf

and this:

https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/ip-addresses/command/reference/b-ip-addresses-cr-asr9000/b-ipaddr-cr-asr9k_chapter_0111.html

Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this:

configure
lpts pifib hardware police
 flow [snmp] rate [something]

You should be able to futz around with the rates until you figure out what eliminates the drops.

--lj

-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Thursday, August 7, 2025 9:02 AM
To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Hello,

One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What?
-Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is: - identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things) - check those incoming packets against a policer - drop packets that exceed that policer The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumption made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config. I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always. 
This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS policers (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer can optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.

If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists. I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.

For what it's worth... even if it's possible (and I'm not sure it is?)
I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of: - slowing down the requests from the external thing - knowingly opening up the policers to a different / faster rate Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes" I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup". On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Hi there, It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this. 
This doesn't show up in the running configuration TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later. So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big as problem as TAC being fully unaware that it exists at all. Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being. Thanks though for stopping by. -Drew -----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-) In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place. No one uses the same terms for anything, so some terminology... 
We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing. I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... 
somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.

As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet".

I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on.

That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.

At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...

* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?

Anyway... hopefully that points you at least somewhat in the right direction.

--lj

-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Sorry, Tom. I’m not taking the bait.

-mel via cell

On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:

Mel-

You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages :

1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.

You have attempted to frame these comments as :

honest and sincere attempts by other members to help identify the possible problem.

While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.

Saku made 2 comments that addressed these falsehoods :

It might be easier to contribute, if there is familiarity to the subject matter.

some community member piled on with what can only be described as a bizarre drivel.

The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.

There is a massive difference between the following statements :

1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]

It seems to me that you may be struggling to identify that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.

On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(

-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
--
++ytti
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/MPLDGAVF...

Our experience has been that although the parser accepts the command, a very large rate-limit value doesn’t seem to get programmed. It's important to run

    show lpts pifib hardware police | inc SNMP

to verify that the rate limit was programmed as you intended. Note that the number will be slightly different from the one you specify. This command also lets you see how many packets LPTS dropped before they reached the SNMP process. If LPTS is the cause of your issue, I am guessing the default 523 rate is still in place and dropping SNMP, and the delay is due to retries.

Tnx
Chris

From: Phil Bedard via NANOG <nanog@lists.nanog.org>
Date: Thursday, August 7, 2025 at 3:12 PM
To: North American Network Operators Group <nanog@lists.nanog.org>, 'LJ Wobker (lwobker)' <lwobker@cisco.com>
Cc: Phil Bedard <bedard.phil@gmail.com>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Hi Drew,

You can check what’s going on with SNMP with the following commands.

    show snmp request incoming-queue detail
    show snmp request drop summary

It’s unlikely there is anything in the actual packet forwarding path that’s delaying something 5 seconds, there aren’t really buffers to do that in the hardware.

If you look at the LPTS entries, are you seeing the SNMP packets show up under the SNMP LPTS entry or the default UDP one?

    show lpts pifib hardware police location 0/0/CPU0 | inc UDP
    show lpts pifib hardware police location 0/0/CPU0 | inc SNMP

Phil

From: Drew Weaver via NANOG <nanog@lists.nanog.org>
Date: Thursday, August 7, 2025 at 10:17
To: 'North American Network Operators Group' <nanog@lists.nanog.org>, 'LJ Wobker (lwobker)' <lwobker@cisco.com>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Okay I configured LPTS like this:

    lpts pifib hardware police
     flow snmp rate 4294967295
    !

These tests were run on a machine connected directly to the ports on the ASR9902.

    interface TwentyFiveGigE0/0/0/4
     ipv4 address 192.168.101.1 255.255.255.252
    !

    [drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 192.168.101.1 -x 195,261
    OK: host '192.168.101.1 ', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

    real 0m5.779s
    user 0m0.335s
    sys 0m0.011s

Takes 5.779 seconds to respond.

    interface MgmtEth0/RP0/CPU0/0
     ipv4 address 172.16.5.1 255.255.255.252
    !

    [drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
    OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

    real 0m0.790s
    user 0m0.333s
    sys 0m0.012s

    [drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
    OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

    real 0m0.776s
    user 0m0.322s
    sys 0m0.024s

Takes 0.776 seconds to respond from MgmtEth0.

Why does it take so much longer to respond? Is that also a cool LPTS feature to protect the control plane? How do I configure it to not delay the responses exponentially?

Thank you,
-Drew

-----Original Message-----
From: Drew Weaver via NANOG <nanog@lists.nanog.org>
Sent: Thursday, August 7, 2025 10:54 AM
To: 'LJ Wobker (lwobker)' <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: Drew Weaver <drew.weaver@thenap.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Also can you explain why it automatically adds entries for NTP and DNS servers that are configured in the router's configuration but not for SNMP/NMS hosts?
I didn't configure this:

    L4 Protocol        : UDP
    VRF ID             : 0x00000000
    Destination IP     : any
    Source IP/BFD Disc : ip address redacted
    Port/Type          : Port:123
    Source Port        : any
    Is Fragment        : 0
    Is SYN             : any
    Is Bundle          : na
    Is Virtual         : na
    Interface          : any
    Slice              : 0
    V/L/T/F            : 0/IPv4_LISTENER/0/NTP-known
    DestNode           : LU(0x30)
    DestAddr           : LU(0x30)
    Accepted/Dropped   : 0/0
    Po/Ar/Bu           : 80/200pps/200ms
    State              : pl_pifib_state_complete

-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Thursday, August 7, 2025 10:33 AM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :)

BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU.

If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe.

BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time.
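To make the single-pipe versus many-sources argument concrete, here is a rough arithmetic sketch. The CPU budget of 2000 pps and the source counts are made-up illustration values, not ASR9k defaults, and real LPTS does not necessarily split rates evenly; the function names are hypothetical.

```python
def per_source_rate(cpu_budget_pps: int, possible_sources: int) -> int:
    """Split a total punt budget evenly across every source that might punt at once.

    Both inputs are hypothetical planning numbers, not values read from a router.
    """
    if possible_sources < 1:
        raise ValueError("need at least one source")
    return cpu_budget_pps // possible_sources


def worst_case_load(per_source_pps: int, active_sources: int) -> int:
    """Aggregate rate reaching the CPU if every active source sends at its limit."""
    return per_source_pps * active_sources


if __name__ == "__main__":
    budget = 2000  # assumed total pps the RP CPU should ever see for this flow type

    # One trusted pipe (e.g. the management ethernet): it can have the whole budget.
    print("single source:", per_source_rate(budget, 1), "pps")

    # 100 NPUs that might all punt at once: each gets a much smaller default.
    rate = per_source_rate(budget, 100)
    print("per NPU:", rate, "pps; worst case:", worst_case_load(rate, 100), "pps")
```

The point of the sketch is only that the same CPU budget yields a far lower per-source default once the designer has to assume many simultaneous sources.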
Following my own earlier suggestion, google returned this:

https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/addr_serv/configuration/guide/b_ipaddr_cg51xa9k/b_ipaddr_cg51xa9k_chapter_01000.pdf

and this:

https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/ip-addresses/command/reference/b-ip-addresses-cr-asr9000/b-ipaddr-cr-asr9k_chapter_0111.html

Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this:

    configure
    lpts pifib hardware police flow [snmp] rate [something]

You should be able to futz around with the rates until you figure out what eliminates the drops.

--lj

-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Thursday, August 7, 2025 9:02 AM
To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Hello,

One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports.

This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What?
-Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is: - identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things) - check those incoming packets against a policer - drop packets that exceed that policer The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumption made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config. I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always. 
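The identify / police / drop flow described above can be sketched as a toy token-bucket policer. This is only an illustration of the general technique, not how LPTS is actually implemented; the 200 pps rate, the 40-packet burst, and the names `classify`, `TokenBucket` and `simulate` are all assumptions made for the example.

```python
SNMP_PORT = 161  # standard SNMP agent port


def classify(dst_port: int) -> str:
    """Step 1: identify the traffic type (here only SNMP vs. everything else)."""
    return "snmp" if dst_port == SNMP_PORT else "other"


class TokenBucket:
    """Step 2: a policer that refills at rate_pps and holds at most `burst` tokens."""

    def __init__(self, rate_pps: float, burst: float) -> None:
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # time of the previous packet, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # Step 3: out of tokens, the packet is dropped


def simulate(arrivals, rate_pps: float, burst: float):
    """Run (time_seconds, dst_port) arrivals through one SNMP policer.

    Returns (accepted, dropped) for the SNMP class; other classes are skipped
    because in a real system they would hit their own policers.
    """
    bucket = TokenBucket(rate_pps, burst)
    accepted = dropped = 0
    for now, dst_port in arrivals:
        if classify(dst_port) != "snmp":
            continue
        if bucket.allow(now):
            accepted += 1
        else:
            dropped += 1
    return accepted, dropped


if __name__ == "__main__":
    # 100 requests fired within 10 ms vs. the same 100 requests paced over 1 second.
    bursty = [(i * 0.0001, SNMP_PORT) for i in range(100)]
    paced = [(i * 0.01, SNMP_PORT) for i in range(100)]
    print("bursty:", simulate(bursty, rate_pps=200, burst=40))
    print("paced: ", simulate(paced, rate_pps=200, burst=40))
```

The same number of requests passes cleanly when paced and loses a large share when sent as a burst, which is why the time window of the poller's requests matters as much as its average rate.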
This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS policers (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms.

Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer can optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.

If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this.

Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists. I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.

For what it's worth... even if it's possible (and I'm not sure it is?)
I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of: - slowing down the requests from the external thing - knowingly opening up the policers to a different / faster rate Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes" I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup". On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Hi there, It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this. 
This doesn't show up in the running configuration. TAC still hasn't figured that out yet, and 4 weeks later they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS.

So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.

Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being.

Thanks though for stopping by.

-Drew

-----Original Message-----
From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org>
Sent: Tuesday, August 5, 2025 11:46 AM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion.

For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)

In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.

No one uses the same terms for anything, so some terminology...
We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing. I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... 
somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it. As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't this from TAC, let me know and I will attempt to shake that tree. At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side... * which interface(s) are being polled? MgmtEth, loopback, physical? 
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that) * can this rate be changed? * how much stuff (i.e. MIBs) are you polling? Anyway... hopefully that points you at least somewhat in the right direction. --lj -----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting Sorry, Tom. I’m not taking the bait. -mel via cell On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote: Mel- You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages : 1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other. 2. Cisco is likely to say that the control plane is only fully supported on the management port. 3. In-band SNMP to data forwarding interfaces violates that separation. You have attempted to frame these comments as : honest and sincere attempts by other members to help identify the possible problem. While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*. Saku made 2 comments that addressed these falsehoods : It might be easier to contribute, if there is familiarity to the subject matter. 
some community member piled on with what can only be described as a bizarre drivel. The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it. There is a massive difference between the following statements : 1. You are an idiot. [ Attacking the person ] 2. What you said was idiotic. [ Attacking the statements ] It seems to be that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation. On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote: Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :( -mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc<mailto:beecher@beecher.cc>> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizzard drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote: Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org><mailto:nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>>> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel. -- ++ytti _______________________________________________ NANOG mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org _archives_list_nanog-40lists.nanog.org_message_&d=DwIGaQ&c=euGZstcaT DllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1 o4LtA3tMGmuw&m=ZRQKyw0amYQuJDrOUoUtJCSVZKWvb764kPF4UjLJKuQ_I4NVhCMVb VNzC8h9aWfc&s=HVfyN6javj5uX9ryxhOPxQSiMh2CkQJi_x885vQNB0M&e= 7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/

Drew - the extra 5 seconds do look like a timeout somewhere (sorry I am not a specialist on those platforms). I see your inband connection is 25Gbps. Perhaps some other rate is being exceeded. Are you able to limit your NMS’s speed to communicate at the same speed as the management interface (supposedly 1Gbps)? If yes, do you still see the 5-second extra delay? Pedro Martins Prado pedro.prado@gmail.com / +353 83 036 1875
On 7 Aug 2025, at 16:16, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
Okay I configured LPTS like this:
lpts pifib hardware police flow snmp rate 4294967295
!

These tests were run on a machine connected directly to the ports on the ASR9902.

interface TwentyFiveGigE0/0/0/4
 ipv4 address 192.168.101.1 255.255.255.252
!

[drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 192.168.101.1 -x 195,261
OK: host '192.168.101.1 ', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

real 0m5.779s
user 0m0.335s
sys 0m0.011s
Takes 5.779 seconds to respond.
interface MgmtEth0/RP0/CPU0/0
 ipv4 address 172.16.5.1 255.255.255.252
!

[drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

real 0m0.790s
user 0m0.333s
sys 0m0.012s

[drew@cisconightmaretesthost.localdomain ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261
OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0

real 0m0.776s
user 0m0.322s
sys 0m0.024s
Takes 0.776 seconds to respond from MgmtEth0
Why does it take so much longer to respond?
Is that also a cool LPTS feature to protect the control plane? How do I configure it to not delay the responses exponentially?
Thank you, -Drew
-----Original Message----- From: Drew Weaver via NANOG <nanog@lists.nanog.org> Sent: Thursday, August 7, 2025 10:54 AM To: 'LJ Wobker (lwobker)' <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: Drew Weaver <drew.weaver@thenap.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Also can you explain why it automatically adds entries for NTP and DNS servers that are configured in the router's configuration but not for SNMP/NMS hosts?
I didn't configure this:
L4 Protocol        : UDP
VRF ID             : 0x00000000
Destination IP     : any
Source IP/BFD Disc : ip address redacted
Port/Type          : Port:123
Source Port        : any
Is Fragment        : 0
Is SYN             : any
Is Bundle          : na
Is Virtual         : na
Interface          : any
Slice              : 0
V/L/T/F            : 0/IPv4_LISTENER/0/NTP-known
DestNode           : LU(0x30)
DestAddr           : LU(0x30)
Accepted/Dropped   : 0/0
Po/Ar/Bu           : 80/200pps/200ms
State              : pl_pifib_state_complete
-----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Thursday, August 7, 2025 10:33 AM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :)
BUT - the policer rates are different based on where the packets arrive.
XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU.
If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe.
BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time.
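To make the reasoning above concrete, here is a minimal arithmetic sketch. It assumes the CPU has one fixed punt budget that is split evenly across every source that could be active at once; the 2000 pps budget and the source counts are invented for illustration and are not Cisco's defaults or its actual allocation method.

```python
def per_source_policer_pps(cpu_budget_pps: int, possible_sources: int) -> int:
    """Split a fixed punt budget evenly across every source that might be active at once."""
    if possible_sources < 1:
        raise ValueError("need at least one source")
    return cpu_budget_pps // possible_sources


# Hypothetical budget: assume the RP CPU can safely absorb 2000 punted packets per second.
BUDGET_PPS = 2000

# One trusted pipe (management ethernet) versus many possible punt sources.
for sources in (1, 20, 100):
    print(f"{sources:>3} possible sources -> {per_source_policer_pps(BUDGET_PPS, sources)} pps each")
```

Under these made-up numbers, a single management port could be given the whole budget, while each of 100 NPUs would get only a small fraction of it, which is the shape of the trade-off described above.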
Following my own suggestion of earlier, google returned this: https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/addr_serv/configuration/guide/b_ipaddr_cg51xa9k/b_ipaddr_cg51xa9k_chapter_01000.pdf and this: https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/ip-addresses/command/reference/b-ip-addresses-cr-asr9000/b-ipaddr-cr-asr9k_chapter_0111.html
Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this:
configure lpts pifib hardware police flow [snmp] rate [something]
You should be able to futz around with the rates until you figure out what eliminates the drops.
--lj
-----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Thursday, August 7, 2025 9:02 AM To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hello,
One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports.
This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network?
What?
-Drew
-----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Some more background might be useful here...
"Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is:
- identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things)
- check those incoming packets against a policer
- drop packets that exceed that policer
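The three steps above can be sketched as classification followed by a per-flow token bucket. This is only an illustration of the general technique, not how LPTS is implemented in hardware; the flow names, rates, and burst sizes are made-up values, and `classify`, `punt`, and `make_policers` are hypothetical helpers.

```python
from dataclasses import dataclass, field


@dataclass
class Policer:
    """Token bucket: refills at rate_pps, holds at most `burst` tokens."""
    rate_pps: float
    burst: float
    tokens: float = field(init=False)
    last_seen: float = 0.0
    accepted: int = 0
    dropped: int = 0

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous packet, capped at the burst size.
        elapsed = now - self.last_seen
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            self.accepted += 1
            return True
        self.dropped += 1  # over the policer: the packet never reaches the CPU
        return False


def classify(packet: dict) -> str:
    """Step 1: identify the traffic type (only two flows, for illustration)."""
    if packet.get("proto") == "udp" and packet.get("dport") == 161:
        return "snmp"
    if packet.get("proto") == "tcp" and packet.get("dport") == 179:
        return "bgp"
    return "default"


def make_policers() -> dict:
    # Hypothetical rates, not the platform's defaults.
    return {"snmp": Policer(10, 5), "bgp": Policer(100, 50), "default": Policer(20, 10)}


def punt(packet: dict, now: float, policers: dict) -> bool:
    """Steps 2 and 3: check the packet against its flow's policer, drop if it exceeds it."""
    return policers[classify(packet)].allow(now)


policers = make_policers()
snmp_get = {"proto": "udp", "dport": 161}
burst = [punt(snmp_get, 0.0, policers) for _ in range(20)]
print(burst.count(True), "accepted,", burst.count(False), "dropped")  # 5 accepted, 15 dropped
```

With these invented numbers, a burst of 20 requests arriving at the same instant loses 15 of them even though the sustained rate is low, and the SNMP process never learns that those requests existed.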
The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumptions made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config.
I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always.
This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms.
Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.
If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists.
I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing.
For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of:
- slowing down the requests from the external thing
- knowingly opening up the policers to a different / faster rate
Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes"
I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup".
On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works.
--lj
-----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hi there,
It has since been identified that the reason that the traffic is being dropped is the SNMP policer in LPTS seems to just be discarding the traffic.
I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later.
So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.
Still waiting for TAC to tell me how to whitelist a single /32 in the policer.
In 9 more weeks I'll let you know what the result ends up being.
Thanks though for stopping by. -Drew
-----Original Message----- From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org> Sent: Tuesday, August 5, 2025 11:46 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)
In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place.
No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words.
Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like.
First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing.
I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations.
I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it.
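The client-side nature of a timeout can be illustrated with a small model of one SNMP request. This is a sketch under stated assumptions: the 5-second timeout, the single retry, and the 10 ms round trip are invented values (the thread does not state what the polling client uses), and `poll_once` is a hypothetical helper, not part of any SNMP library.

```python
from typing import Iterable, Tuple


def poll_once(dropped_attempts: Iterable[bool], rtt_s: float,
              timeout_s: float, retries: int) -> Tuple[float, bool]:
    """Model one request. dropped_attempts[i] is True if attempt i is dropped before SNMP sees it.

    Returns (seconds the client spent, whether it got an answer).
    """
    elapsed = 0.0
    for attempt, dropped in enumerate(dropped_attempts):
        if attempt > retries:       # initial attempt plus `retries` retransmissions
            break
        if dropped:
            elapsed += timeout_s    # the router never answers, so the client waits out its timer
        else:
            elapsed += rtt_s        # request reached SNMP and a response came back
            return elapsed, True
    return elapsed, False


# Assumed client settings for illustration only.
RTT_S, TIMEOUT_S, RETRIES = 0.010, 5.0, 1

print(poll_once([False], RTT_S, TIMEOUT_S, RETRIES))        # answered on the first attempt
print(poll_once([True, False], RTT_S, TIMEOUT_S, RETRIES))  # one drop costs a full timeout
print(poll_once([True, True], RTT_S, TIMEOUT_S, RETRIES))   # both attempts dropped: a failed poll
```

In this model the router side never "times out"; a single dropped request simply makes the client spend its whole timeout before the retry succeeds, and a poll is reported as failed only when every attempt is dropped.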
As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.
At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?
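On the point that time windows matter: the same polling station can look slow when averaged over its polling interval and very fast when measured over a short window. A minimal sketch, assuming request send times are available (for example from a packet capture); the timestamps, the 60-second interval, and the 100 ms window below are invented, and both functions are hypothetical helpers.

```python
from typing import List


def average_rate_pps(timestamps: List[float], period_s: float) -> float:
    """Requests per second averaged over the whole observation period."""
    return len(timestamps) / period_s


def peak_rate_pps(timestamps: List[float], window_s: float) -> float:
    """Highest request rate seen in any sliding window of window_s seconds."""
    ts = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ts)):
        # Shrink the window until it spans less than window_s seconds.
        while ts[end] - ts[start] >= window_s:
            start += 1
        best = max(best, end - start + 1)
    return best / window_s


# Invented workload: 40 requests sent 1 ms apart, once per 60-second polling cycle, for 3 cycles.
sent = [cycle * 60.0 + i * 0.001 for cycle in range(3) for i in range(40)]

print("average pps over 180 s:", round(average_rate_pps(sent, 180.0), 2))
print("peak pps in any 100 ms:", round(peak_rate_pps(sent, 0.1), 2))
```

With this made-up workload the long-run average is under one request per second, while the short-window peak is in the hundreds, which is the kind of burst that could trip a meter sized for the average.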
Anyway... hopefully that points you at least somewhat in the right direction.
--lj
-----Original Message----- From: Mel Beckman via NANOG <nanog@lists.nanog.org> Sent: Monday, August 4, 2025 10:42 AM To: Tom Beecher <beecher@beecher.cc> Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Sorry, Tom. I’m not taking the bait.
-mel via cell
On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages :
1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.
You have attempted to frame these comments as :
honest and sincere attempts by other members to help identify the possible problem.
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions , if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious ; you're telling Drew that his observations are *normal*.
Saku made 2 comments that addressed these falsehoods :
It might be easier to contribute, if there is familiarity to the subject matter.
some community member piled on with what can only be described as a bizarre drivel.
The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.
There is a massive difference between the following statements :
1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]
It seems to me that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack.
Nobody is bullying you, or anybody else, in this conversation.
On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel. -- ++ytti

Our actual NMS is like 2 hops away from this router and is connected at 1Gbps. This test machine that has the 25Gbps connection and the 1Gbps connection to the MGMT ethernet port is sitting right below the router in the rack. So the speed of the remote device really doesn’t make much of a difference.

I’ve also tried various versions of the check_ifstatus script from the Nagios-Tools package but my assumption is that if the script isn’t killing the ASR9001s and ASR9010 RPs it shouldn’t kill the ASR9902 RP. I’m even excluding all of the interfaces that don’t actually exist on the ASR9902 but that the MIB incorrectly reports as existing (another little lol).

Thanks,
-Drew

From: Pedro Prado <pedro.prado@gmail.com> Sent: Friday, August 8, 2025 7:45 AM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: LJ Wobker (lwobker) <lwobker@cisco.com>; Drew Weaver <drew.weaver@thenap.com> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Drew - the extra 5 seconds do look like a timeout somewhere (sorry I am not a specialist on those platforms). I see your inband connection is 25Gbps. Perhaps some other rate is being exceeded. Are you able to limit your NMS’s speed to communicate at the same speed as the management interface (supposedly 1Gbps)? If yes, do you still see the 5-second extra delay?

Pedro Martins Prado pedro.prado@gmail.com / +353 83 036 1875

On 7 Aug 2025, at 16:16, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:

Okay I configured LPTS like this:

lpts pifib hardware police flow snmp rate 4294967295
!

These tests were run on a machine connected directly to the ports on the ASR9902.

interface TwentyFiveGigE0/0/0/4
 ipv4 address 192.168.101.1 255.255.255.252
!
[drew@cisconightmaretesthost.localdomain<mailto:drew@cisconightmaretesthost.localdomain> ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 192.168.101.1 -x 195,261 OK: host '192.168.101.1 ', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0 real 0m5.779s user 0m0.335s sys 0m0.011s Takes 5.779 seconds to respond. interface MgmtEth0/RP0/CPU0/0 ipv4 address 172.16.5.1 255.255.255.252 ! [drew@cisconightmaretesthost.localdomain<mailto:drew@cisconightmaretesthost.localdomain> ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261 OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0 real 0m0.790s user 0m0.333s sys 0m0.012s [drew@cisconightmaretesthost.localdomain<mailto:drew@cisconightmaretesthost.localdomain> ~]# time /usr/local/nagios/libexec/check_ifstatus -C testASR9902 -H 172.16.5.1 -x 195,261 OK: host '172.16.5.1', interfaces up: 36, down: 0, dormant: 0, excluded: 36, unused: 0 |up=36 down=0 dormant=0 excluded=36 unused=0 real 0m0.776s user 0m0.322s sys 0m0.024s Takes 0.0776 seconds to respond from MgmtEth0 Why does it take so much longer to respond? Is that also a cool LPTS feature to protect the control plane? How do I configure it to not delay the responses exponentially? Thank you, -Drew -----Original Message----- From: Drew Weaver via NANOG <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> Sent: Thursday, August 7, 2025 10:54 AM To: 'LJ Wobker (lwobker)' <lwobker@cisco.com<mailto:lwobker@cisco.com>>; 'North American Network Operators Group' <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> Cc: Drew Weaver <drew.weaver@thenap.com<mailto:drew.weaver@thenap.com>> Subject: RE: Cisco ASR9902 SNMP polling ... 
is interesting

Also can you explain why it automatically adds entries for NTP and DNS servers that are configured in the router's configuration but not for SNMP/NMS hosts? I didn't configure this:

L4 Protocol : UDP
VRF ID : 0x00000000
Destination IP : any
Source IP/BFD Disc: ip address redacted
Port/Type : Port:123
Source Port : any
Is Fragment : 0
Is SYN : any
Is Bundle : na
Is Virtual : na
Interface : any
Slice : 0
V/L/T/F : 0/IPv4_LISTENER/0/NTP-known
DestNode : LU(0x30)
DestAddr : LU(0x30)
Accepted/Dropped : 0/0
Po/Ar/Bu : 80/200pps/200ms
State : pl_pifib_state_complete

-----Original Message-----
From: LJ Wobker (lwobker) <lwobker@cisco.com>
Sent: Thursday, August 7, 2025 10:33 AM
To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :)

BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe.

BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time.
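The single-pipe versus many-sources argument above can be put into rough numbers. The sketch below is only an illustration of that reasoning: the CPU budget and the source counts are made-up values, not Cisco defaults, and the function name is hypothetical.

```python
# Illustration of why a per-source policer default shrinks as the number of
# possible punt sources grows. All numbers are assumptions for the example.

def per_source_rate(cpu_budget_pps: int, max_sources: int) -> int:
    """Largest per-source policer rate (packets/second) that keeps the
    worst-case aggregate within the CPU budget if every source punts at once."""
    if max_sources < 1:
        raise ValueError("max_sources must be at least 1")
    return cpu_budget_pps // max_sources


ASSUMED_BUDGET_PPS = 2000  # hypothetical amount of one flow type the RP CPU can absorb

# Single pipe (management ethernet): the whole budget can go to one policer.
print(per_source_rate(ASSUMED_BUDGET_PPS, 1))

# 100 NPUs that might all punt at the same time: each one gets a small share.
print(per_source_rate(ASSUMED_BUDGET_PPS, 100))
```

With these assumed values the single source gets the full 2000 pps while each of 100 sources gets 20 pps, which is the kind of gap that would make in-band polling look throttled on a box that only has a few ports in use.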
Following my own suggestion of earlier, google returned this:

https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/addr_serv/configuration/guide/b_ipaddr_cg51xa9k/b_ipaddr_cg51xa9k_chapter_01000.pdf

and this:

https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/ip-addresses/command/reference/b-ip-addresses-cr-asr9000/b-ipaddr-cr-asr9k_chapter_0111.html

Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this:

configure
lpts pifib hardware police
 flow [snmp] rate [something]

You should be able to futz around with the rates until you figure out what eliminates the drops.

--lj

-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Thursday, August 7, 2025 9:02 AM
To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Hello,

One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports.
This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What? -Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com<mailto:lwobker@cisco.com>> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com<mailto:drew.weaver@thenap.com>>; 'North American Network Operators Group' <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is: - identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things) - check those incoming packets against a policer - drop packets that exceed that policer The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumption made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config. I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. 
Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always.

This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS policers (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer can optimize for all three, so what we have today is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals.

If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on http://xrdocs.io, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists.

I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number.
This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing. For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of: - slowing down the requests from the external thing - knowingly opening up the policers to a different / faster rate Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes" I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup". On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works. --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com<mailto:drew.weaver@thenap.com>> Sent: Wednesday, August 6, 2025 1:28 PM To: 'North American Network Operators Group' <nanog@lists.nanog.org<mailto:nanog@lists.nanog.org>> Cc: LJ Wobker (lwobker) <lwobker@cisco.com<mailto:lwobker@cisco.com>> Subject: RE: Cisco ASR9902 SNMP polling ... 
is interesting

Hi there,

It has since been identified that the reason the traffic is being dropped is that the SNMP policer in LPTS seems to just be discarding the traffic. I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later.

So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all. Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being.

Thanks though for stopping by.
-Drew

-----Original Message-----
From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org>
Sent: Tuesday, August 5, 2025 11:46 AM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting

Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion.

For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source. I also worked in TAC back in the day so I have some familiarity with their processes. ;-)

In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system.
The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place. No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing. I'm not remotely surprised that the behavior is different from the 9901 to the 9902. 
At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it. As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. 
We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.

At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?

Anyway... hopefully that points you at least somewhat in the right direction.

--lj

-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Sorry, Tom. I’m not taking the bait.

-mel via cell

On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:

Mel-

You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:

1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.
You have attempted to frame these comments as:

honest and sincere attempts by other members to help identify the possible problem.

While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.

Saku made 2 comments that addressed these falsehoods:

It might be easier to contribute, if there is familiarity to the subject matter.

some community member piled on with what can only be described as a bizarre drivel.

The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong". Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.

There is a massive difference between the following statements:

1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]

It seems to me that you may be struggling in identifying that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.

On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(

-mel via cell

On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:

Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,

Joe

On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote:

I’ll just let the incivility of you both stand.

-mel

On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:

Mel-

Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that. Probably just want to take the L here.

On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

Saku,

What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.

-mel

On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:

On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:

I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find. This almost sounds like a default control plane DDOS policer / LPTS, something like that.

There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS. But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.

It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel.
-- ++ytti
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/MPLDGAVFS3ZYV4ELQHMS6GJL6BUUTFZT/
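
As a rough sanity check on the timings quoted in the message above, the gap between the in-band run (5.779 s) and the management-port run (0.790 s) can be compared against a client retry timeout. This is only a back-of-the-envelope sketch: the 5.0 second timeout is an assumption (the thread only says the extra delay looks like a timeout somewhere), and the function names are made up for the example.

```python
# Does "one dropped request plus one client timeout" explain the extra delay?
# ASSUMPTION: the SNMP client waits 5.0 seconds before retrying a lost request.

def estimated_drops(inband_s: float, mgmt_s: float, timeout_s: float) -> int:
    """Number of full client timeouts that would account for the extra delay."""
    return round((inband_s - mgmt_s) / timeout_s)


def expected_runtime(base_s: float, drops: int, timeout_s: float) -> float:
    """Total run time if `drops` requests each cost one full client timeout."""
    return base_s + drops * timeout_s


ASSUMED_TIMEOUT_S = 5.0
INBAND_S = 5.779  # measured against TwentyFiveGigE0/0/0/4
MGMT_S = 0.790    # measured against MgmtEth0/RP0/CPU0/0

drops = estimated_drops(INBAND_S, MGMT_S, ASSUMED_TIMEOUT_S)
print(drops)
print(round(expected_runtime(MGMT_S, drops, ASSUMED_TIMEOUT_S), 3))
```

Under that assumption a single dropped request lands within about 0.01 s of the measured in-band time, which would fit a policer discarding one packet of the poll rather than the router answering slowly. It does not prove that is what happened; the drop counters on the router are the real evidence.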

I cannot, but I can guess.... Possibly because of directionality? NTP and DNS are things where you tell the router "establish some kind of client-server relationship with this thing where you're the client". For SNMP it's the other way around. </guess> --lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Thursday, August 7, 2025 10:54 AM To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Also can you explain why it automatically adds entries for NTP and DNS servers that are configured in the router's configuration but not for SNMP/NMS hosts? I didn't configure this: L4 Protocol : UDP VRF ID : 0x00000000 Destination IP : any Source IP/BFD Disc: ip address redacted Port/Type : Port:123 Source Port : any Is Fragment : 0 Is SYN : any Is Bundle : na Is Virtual : na Interface : any Slice : 0 V/L/T/F : 0/IPv4_LISTENER/0/NTP-known DestNode : LU(0x30) DestAddr : LU(0x30) Accepted/Dropped : 0/0 Po/Ar/Bu : 80/200pps/200ms State : pl_pifib_state_complete -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Thursday, August 7, 2025 10:33 AM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting No, it's never okay to burn the CPU down, that's the whole raison d'etre for the existence of LPTS. :) BUT - the policer rates are different based on where the packets arrive. XR is designed to handle distributed systems where you could have lots (20+) linecards and more lots (100+) of forwarding NPUs - each of these is a potential source of traffic punted to the RP CPU. If you know you have a single source (the management ethernet) you can set the policer value and "trust" it - all the traffic coming to you is coming down a single pipe. 
BUT - in a distributed system, you have to consider how many sources you might have and (presumably) set the policer rate to be lower, because you don’t know how many of those might be active at the same time. There's no way for the poor bastard who has to pick the default to know what a "reasonable" number is for how many interfaces sprinkled across the system will be trying to send packets to the CPU at the same time. Following my own suggestion of earlier, google returned this: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cisco.com_c_en_us_td_docs_routers_asr9000_software_asr9k-5Fr5-2D1_addr-5Fserv_configuration_guide_b-5Fipaddr-5Fcg51xa9k_b-5Fipaddr-5Fcg51xa9k-5Fchapter-5F01000.pdf&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=RKik23YswQ8Vloi9G2xpsfb4rF_YYEJhMDf-1BoIk3jLIVM8_kpNvOIyf2Zc8j5t&s=xyh9Ub_u-ByRGArTKOUFAY_VG7nbtFkgwg8dyvPG8Og&e= and this: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cisco.com_c_en_us_td_docs_routers_asr9000_software_ip-2Daddresses_command_reference_b-2Dip-2Daddresses-2Dcr-2Dasr9000_b-2Dipaddr-2Dcr-2Dasr9k-5Fchapter-5F0111.html&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=RKik23YswQ8Vloi9G2xpsfb4rF_YYEJhMDf-1BoIk3jLIVM8_kpNvOIyf2Zc8j5t&s=HzsiarzWMDm6qryzhiWgfx3qfEDVuKc2DjWnyiGtH1o&e= Which appears to be how you manually configure/override the policer values. The semantics are that the location is the linecard (node) ID. So if the port your SNMP polls arrive on is on card 1, you'll need something like this: configure lpts pifib hardware police flow [snmp] rate [something] You should be able to futz around with the rates until you figure out what eliminates the drops. 
--lj -----Original Message----- From: Drew Weaver <drew.weaver@thenap.com> Sent: Thursday, August 7, 2025 9:02 AM To: LJ Wobker (lwobker) <lwobker@cisco.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Hello, One thing you seem to be forgetting (I'm not sure if you are or not) is that SNMP polling appears to work just fine if it is sourced from a machine connected to the mgmteth ports. This seems to imply that the policy at Cisco is it's okay to burn the CPU to the ground if the requests come from the MGMTETH* ports but not if they come across the network? What? -Drew -----Original Message----- From: LJ Wobker (lwobker) <lwobker@cisco.com> Sent: Wednesday, August 6, 2025 3:51 PM To: Drew Weaver <drew.weaver@thenap.com>; 'North American Network Operators Group' <nanog@lists.nanog.org> Subject: RE: Cisco ASR9902 SNMP polling ... is interesting Some more background might be useful here... "Back In The Day" ... IOS XR was designed at least in some part to "automatically" protect the control plane from misconfiguration or malicious activity. The LPTS architecture is built around a whole bunch of what I will call "automated policers" -- depending on the platform you might have dozens or hundreds of them. The flow very roughly is: - identify traffic types (BGP, BGP from a known peer, SNMP, ARP, and god only knows how many other things) - check those incoming packets against a policer - drop packets that exceed that policer The whole idea here is that we don't want lots of packets from some "not totally trusted" thing to melt the box. But there are A LOT of assumptions that have to be made here... and any assumption made to protect the box when it has 2,000 BGP peers and 10,000 interfaces (which the asr9k can actually do) are very likely not great assumptions for a system with a MUCH smaller/simpler config. 
I was around back in the very early 2000's when we discussed, specifically, whether or not we should try to find a way to put the LPTS policer values into the configuration. There's no perfect answer here. One of the fundamental choices in XR (which is not *always* followed but pretty close) is to not put things in the config that are default values. This prevents the config from being a bazillion lines long. Another fundamental choice is that we only put things in the configuration that the user has actually configured... which sort of seems obvious but definitely isn't always. This gets to your complaint, which is at the very least partially legitimate: the system is doing things (policing) that on other platforms have to be explicitly configured. But on XR systems, these LPTS (i.e. control plane policers) are IMPLICIT, and therefore they're a lot less visible than you might see on other platforms. Not that it matters, but 20+ years ago we spent quite a few heated meetings kicking around how to handle this, and balance the need for visibility, configurability, and simplicity. No answer is possible to optimize for all three, so what we have to day is more or less what we landed on. My apologies that it's not super obvious, but we did our best to balance those conflicting goals. If you google for queries like "asr 9000 lpts policers" or "configure lpts policer rates" you should find at least a few config guides, maybe a decent doc or two on https://urldefense.proofpoint.com/v2/url?u=http-3A__xrdocs.io&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=ViYAfFMekaPRzBt-zWcVDHiKWQ4O9Du98Z8hWqR_hF-9SBi2VFQlUvC-R-DyJx63&s=q6Zww2ZNpyrRNWp2Kcf30WxUiUOmyK8S3QDcvLCWmbA&e=, and god willing even a ciscolive presentation or two (hell, one of them might even be mine) that talks about this. 
Again, I'll apologize as my experience with the box was from roughly 2007-2013 so I can't quote you chapter and verse here -- but I do KNOW some of that capability exists. I do know that there are ways to configure the policer rates for specific protocols... I can't swear on my life that SNMP is one of those that is configurable, hopefully the answer is "yes" -- at least in theory if we know how fast the polling station wants to ask, we can open the policer to that number. This is often trial and error as the exact numbers and unit conversions aren't obvious unless you want to put a damn packet sniffer on the thing. For what it's worth... even if it's possible (and I'm not sure it is?) I would advise against pure-whitelisting any host or netblock in LPTS. If you completely turned off the LPTS policers and you either accidentally (or someone else maliciously) got into that machine and did something {Accidental, Stupid, Devious} -- you risk melting the box. It's MUCH safer to figure out a combination of: - slowing down the requests from the external thing - knowingly opening up the policers to a different / faster rate Than to say "machine a.b.c.d has totally unfettered access to my RP CPU and can melt me if he likes" I would push more towards "find me a way to open this policer up so I can choose how to balance my own risks like a grownup". On the TAC side, if it's true that you've had a case open forever and haven't been able to get to SOMEONE who knows enough about IOS XR to get you at least remotely close to "you need to twiddle with the LPTS policers to get the behavior that you want" -- then something is pretty badly broken. If you could unicast me a case number and whatever other specific info might help, I will do a little digging on the back end. These routers are pretty fuckin' complex and troubleshooting them definitely isn't easy (again, sorry - we tried to err on the side of "be overly careful") ... 
but we also can't have the support org be some total black hole that can't get you reasonably quickly to someone who sorta knows how the thing works.
--lj
-----Original Message-----
From: Drew Weaver <drew.weaver@thenap.com>
Sent: Wednesday, August 6, 2025 1:28 PM
To: 'North American Network Operators Group' <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Hi there,
It has since been identified that the traffic is being dropped because the SNMP policer in LPTS seems to just be discarding it. I didn't configure it to do this. This doesn't show up in the running configuration. TAC still hasn't figured that out yet, and they still haven't provided me with a way to simply whitelist traffic from a single /32 in LPTS 4 weeks later.
So yes, I will admit that I am somewhat ignorant on what you guys call CoPP in this platform, but I don't think me being ignorant about it is as big a problem as TAC being fully unaware that it exists at all.
Still waiting for TAC to tell me how to whitelist a single /32 in the policer. In 9 more weeks I'll let you know what the result ends up being. Thanks though for stopping by.
-Drew
-----Original Message-----
From: LJ Wobker (lwobker) via NANOG <nanog@lists.nanog.org>
Sent: Tuesday, August 5, 2025 11:46 AM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: LJ Wobker (lwobker) <lwobker@cisco.com>
Subject: RE: Cisco ASR9902 SNMP polling ... is interesting
Wow, what a food fight this became. At risk of wading into the middle school cafeteria and wearing ketchup, I'll attempt to possibly return to some semblance of a technical discussion. For background, I was the first TME here at Cisco who worked on the ASR9k program back in the mid-2000s - so my memory might be a bit rusty but at least to some degree I can present myself as a knowledgeable source.
I also worked in TAC back in the day so I have some familiarity with their processes. ;-) In all IOS XR systems, there's an architecture designed to make sure that control plane traffic coming from the very high speed interfaces doesn't overwhelm the processing capacity of the system. The whole thing is relatively complex and the exact implementation differs from system to system in the fine details, but the idea is that you want to funnel down traffic headed for the RP or linecard CPU so that by the time it gets there you're as confident as you can be that the traffic is legitimate and in the right place. No one uses the same terms for anything, so some terminology... We (cisco) broadly call the infrastructure "LPTS": Local Packet Transport Service. The act of identifying that a packet needs to go up to the control plane we call "punting". Every modern system from every vendor has SOME form or fashion for this, otherwise it's trivial to melt the system with traffic pointed at the control CPU. But no one uses the same words. Drew - I'm sorry you don't like the way my router works. This hurts my feelings, because he's really a pretty good little router. Let's see if we can figure out why. In this case, there's lots of possible places things can behave in ways you don't like. First question... when you say "we poll SNMP on any interface" -- do you mean you're changing the target IP address for where you point the SNMP manager, where sometimes it's the management ethernet address and sometimes a regular interface address? This matters because IN GENERAL (yes, I know...) the system behaves differently here. Packets pointed at the management ethernet are run through a different set of policers than if you're pointed at a data plane interface. IN GENERAL the "best" way to do something like this is with a loopback interface, as the defaults are "better" tuned for that config compared to a direct zap at the actual interface IP. 
This also has the benefit of virtualizing the loopback so you aren't tied to a single point of failure, but that's a separate thing. I'm not remotely surprised that the behavior is different from the 9901 to the 9902. At risk of being an apologist for my implementation, even within a product family there are always (sometimes stupid) differences in the implementations. I can ABSOLUTELY ASSURE you that there is nowhere in the code that says "make 62% of the SNMP polls fail because we hate Drew". This is not how our system works... somewhere in the path there's a policer or a meter that is either dropping some of the inbound requests, or the SNMP process is choking on something and timing out, or something like that. But there is no such thing on the router side as an SNMP polling timeout - that is a client side thing. The SNMP process on the router gets a request, and it sends a response, that's all. If something (either external or within the labyrinth of internal protections) drops the request on the way in, SNMP never sees it, so it can't respond. Then the client has to figure out what to do, which often is throw a timeout and/or retry -- but this is dependent on the implementation of the SNMP client, and there's nothing that the router OS can do about it. As someone mentioned along the way, the right way to troubleshoot this is to find the commands in XR that will show you the counters and potential drops between "the packet arrives at the box" and "SNMP did its thing with the packet". I have to sadly admit that here I'm one of those old-ass Air Force Colonels who USED to be a hot-shit pilot, but now I fly a desk. 12 years ago I could have told you chapter and verse what the commands are and where all the drop/meter counters live, but father time is undefeated and now I spend time apologizing on NANOG lists instead of having an actual lab to work on. 
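The point that a "timeout" exists only on the client side can be sketched as a small simulation. It is a hypothetical model, not a description of any real SNMP manager or of the router: the function name, the policer rate, the burst size, and the timeout and retry values are all assumptions chosen for illustration. The router side either answers a request or silently drops it; the client then retries after its own timeout and eventually gives up.

```python
import heapq


def simulate_polls(request_times, rate_pps, burst, timeout=1.0, retries=1):
    """Return (answered, timed_out) as seen by a hypothetical SNMP client.

    request_times: times (seconds, >= 0) at which the client first sends each request.
    rate_pps / burst: parameters of an assumed inbound token-bucket policer.
    timeout / retries: client-side settings; the router knows nothing about them.
    """
    tokens = float(burst)
    last = 0.0
    # Each event is (send_time, request_id, retries_left); the heap keeps time order.
    events = [(t, i, retries) for i, t in enumerate(request_times)]
    heapq.heapify(events)
    answered = timed_out = 0
    while events:
        now, req_id, left = heapq.heappop(events)
        tokens = min(float(burst), tokens + (now - last) * rate_pps)
        last = now
        if tokens >= 1.0:
            tokens -= 1.0
            answered += 1          # request reached the SNMP process, response sent
        elif left > 0:
            # Dropped before reaching SNMP: the client waits out its timer and retries.
            heapq.heappush(events, (now + timeout, req_id, left - 1))
        else:
            timed_out += 1         # client gives up and reports a timeout
    return answered, timed_out
```

With these made-up numbers, ten requests fired at once against a 1 pps policer with a burst of 4 mostly time out even with a retry, while the same ten requests paced one per second are all answered. The sketch makes no claim about where the 62% figure in this thread comes from; it only shows how inbound drops surface as client-side timeouts.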
That said, your expectation that someone in TAC can figure out what's happening and explain it to you is totally reasonable, and if you're not getting those answers then escalating is correct. We might not be able (or willing) to change the behavior to do things the way you like them, but we absolutely owe you an explanation of what's actually happening. If you can't get this from TAC, let me know and I will attempt to shake that tree.
At LEAST the following things would need to be chased down, some of which we'd have to get from the customer side...
* which interface(s) are being polled? MgmtEth, loopback, physical?
* at what rate does the SNMP station generate and send request packets? (Time windows matter here. A short but very fast burst of requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling?
Anyway... hopefully that points you at least somewhat in the right direction.
--lj
-----Original Message-----
From: Mel Beckman via NANOG <nanog@lists.nanog.org>
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@lists.nanog.org; Mel Beckman <mel@beckman.org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Sorry, Tom. I’m not taking the bait.
-mel via cell
On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
You have made multiple technical assertions in this thread that are demonstrably false. Quoting your earlier messages:
1. Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
2. Cisco is likely to say that the control plane is only fully supported on the management port.
3. In-band SNMP to data forwarding interfaces violates that separation.
You have attempted to frame these comments as:
honest and sincere attempts by other members to help identify the possible problem.
While your attempts to help may have been honest and sincere attempts to help the OP, they actually achieved the opposite effect. Your incorrect technical assertions, if anything, only hindered the OP's attempt to understand and identify their issue. Comment #1 is especially egregious; you're telling Drew that his observations are *normal*.
Saku made 2 comments that addressed these falsehoods:
It might be easier to contribute, if there is familiarity to the subject matter.
some community member piled on with what can only be described as a bizarre drivel.
The first was a polite way of calling out the technical inaccuracies. The second was a more forceful way of stating "what you said was wrong".
Most people, when they are corrected on a factual point, tend to reply with "Oh hey, I got that wrong, thanks for setting me straight" and move on. You seem to have just ignored it.
There is a massive difference between the following statements:
1. You are an idiot. [ Attacking the person ]
2. What you said was idiotic. [ Attacking the statements ]
It seems to me that you may be struggling to identify that difference, and taking *any* criticism as a personal attack. Nobody is bullying you, or anybody else, in this conversation.
On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Thanks. I knew we were not so out to lunch! If you don’t push back on bullies, they take over the community. It crops up on nanog periodically. :(
-mel via cell
On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG <nanog@lists.nanog.org> wrote:
Hi Mel, for what it's worth, I could not figure out what they were referring to by Saku's comments. I saw no justification for their complaint. A bit out of character for Saku, also,
Joe
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote: I’ll just let the incivility of you both stand.
-mel
On Aug 2, 2025, at 3:52 PM, Tom Beecher <beecher@beecher.cc> wrote:
Mel-
Saku did not call *you* any names. He called your *incorrect statements* in this thread 'bizarre drivel'. Which he is absolutely correct about. While your intentions may certainly have been to help, your statements here have been frankly dead wrong and did not accomplish that.
Probably just want to take the L here.
On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Saku,
What is actually appalling is that a member of NANOG calls “bizarre drivel” the honest and sincere attempts by other members to help identify the possible problem. There’s no cause to be uncivil, people can disagree without stooping to name-calling.
-mel
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:
I don't have in depth knowledge of Cisco's SNMP implementations, or even the ASR platform specifically, but if Cisco TAC is telling you this is 'normal', they are completely full of shit, and you should click any and every 'escalate' button you can find.
This almost sounds like a default control plane DDOS policer / LPTS , something like that. There are various complicated reasons for this, LPTS policer is unlikely culprit, but possible. Bug search will show various DDTS with poor SNMP performance outcome, most of them are unrelated to LPTS.
But absolutely correct, the right solution is to escalate. In common case this would be SE from your account team, who would fight for you internally.
It is appalling that OP came to nanog after correctly suspecting TAC is gaslighting them, some community member piled on with what can only be described as a bizarre drivel. -- ++ytti _______________________________________________ NANOG mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org _archives_list_nanog-40lists.nanog.org_message_&d=DwIGaQ&c=euGZstcaT DllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1 o4LtA3tMGmuw&m=ZRQKyw0amYQuJDrOUoUtJCSVZKWvb764kPF4UjLJKuQ_I4NVhCMVb VNzC8h9aWfc&s=HVfyN6javj5uX9ryxhOPxQSiMh2CkQJi_x885vQNB0M&e= 7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/

Cisco is likely to say that the control plane is only fully supported on the management port. After all, the control plane was invented to separate management functions from the data forwarding process. -mel via cell
On Aug 1, 2025, at 11:28 AM, Saku Ytti <saku@ytti.fi> wrote:
On Fri, 1 Aug 2025 at 16:44, Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
Absolutely not. We expect to process 100% of legitimate control-plane traffic, e.g. BGP, ISIS, LDP, ARP, SNMP etc.
62% would be devastating.
In fair weather this is easy, in bad weather you need hardware based discrimination on what is expected good traffic and what is unexpected bad traffic.
Drew is in the right to expect functioning SNMP and is experiencing significant regression in behaviour compared to previous devices from the same vendor.
It would take a very long time to explain how to troubleshoot this, as it is an extremely complicated topic with a lot of nuance that even the best experts of Cisco are unaware of. I've regularly had TAC handwave problems away 'sometimes it be like that' because they didn't want to do the work. Once our NOC spent months on a case where TAC was blaming our QoS configuration for BGP flaps, by the time I got on it, I escalated it to Xander, and initially even Xander agreed with TAC that we need to look into QoS configuration, until I reminded him that LPTS is not subject to QoS or ACL (which is terrible design choice, for reasons I'm happy to elaborate), which immediately reminded him how LPTS works and the TAC case finally got some traction. This is a completely untenable situation, IOS-XR regularly has complicated problems that TAC is not equipped to solve and the expectation is that the user has deep enough knowledge to rebuff them.
-- ++ytti

On Fri, 1 Aug 2025 at 21:45, Mel Beckman <mel@beckman.org> wrote:
Cisco is likely to say that the control plane is only fully supported on the management port. After all, the control plane was invented to separate management functions from the data forwarding process.
Cisco will 100% fully support the control plane on in-line ports; before the cloud shops, in-line was the norm and the MGMT port the exception. Management ports to this day are extremely dangerous and I consider using them an anti-pattern. If you have a MGMT L2 broadcast domain, you can potentially break every control plane by having L2 storms (an actual risk that has happened), because you cannot protect the control plane on the MGMT ETH port, for obvious reasons. And you can protect (some platforms better, some worse) the control plane on in-line ports by a combination of QoS, ACL, control-plane ACL, control-plane police/shape/ACL. It might be easier to contribute, if there is familiarity with the subject matter. -- ++ytti

90 seconds... but also we can poll Supervisor 720s at the same rate and they don't time out or delay responses. 😊 -----Original Message----- From: Mel Beckman <mel@beckman.org> Sent: Friday, August 1, 2025 9:37 AM To: nanog@lists.nanog.org Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org Subject: Re: Cisco ASR9902 SNMP polling ... is interesting How often are you polling the interfaces? SNMP was never meant for high frequency polling (e.g., once per second), yet I often see people using SNMP as if it were a SCADA service, which is used in industrial automation for high frequency supervisory control and data acquisition. SNMP probes are typically anticipated by device designers to occur at 30 second or 60 second intervals. -mel

Could this be somehow related to control plane policing? You might be hitting some default policy threshold, and may have to adjust it to allow SNMP from your specific sources at a higher rate. IIRC on IOS XR that's called LPTS or sdr (but it has been a while...)
On Fri, Aug 1, 2025, 6:59 AM Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
90 seconds... but also we can poll Supervisor 720s at the same rate and they don't time out or delay responses.
😊
-----Original Message----- From: Mel Beckman <mel@beckman.org> Sent: Friday, August 1, 2025 9:37 AM To: nanog@lists.nanog.org Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
How often are you polling the interfaces? SNMP was never meant for high frequency polling (e.g., once per second), yet I often see people using SNMP as if it were a SCADA service, which is used in industrial automation for high frequency supervisory control and data acquisition. SNMP probes are typically anticipated by device designers to occur at 30 second or 60 second intervals.
-mel

They have the configuration. They seem to be saying that there is just some invisible hand inside the router controlling the responses on a per interface basis. It’s pretty heavy handed tbh.
-Drew
From: Arie Vayner <ariev@vayner.net>
Sent: Friday, August 1, 2025 10:15 AM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: Mel Beckman <mel@beckman.org>; Drew Weaver <drew.weaver@thenap.com>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Could this be somehow related to control plane policing? You might be hitting some default policy threshold, and may have to adjust it to allow SNMP from your specific sources at a higher rate. IIRC on IOS XR that's called LPTS or sdr (but it has been a while...)
On Fri, Aug 1, 2025, 6:59 AM Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
90 seconds... but also we can poll Supervisor 720s at the same rate and they don't time out or delay responses. 😊
-----Original Message-----
From: Mel Beckman <mel@beckman.org>
Sent: Friday, August 1, 2025 9:37 AM
To: nanog@lists.nanog.org
Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
How often are you polling the interfaces? SNMP was never meant for high frequency polling (e.g., once per second), yet I often see people using SNMP as if it were a SCADA service, which is used in industrial automation for high frequency supervisory control and data acquisition. SNMP probes are typically anticipated by device designers to occur at 30 second or 60 second intervals.
-mel

I'm guessing you're hitting default SNMP LPTS rate limits, which you may be able to change, but for various reasons that may not be a good idea. Here is some good info on the LPTS architecture written by Xander himself. https://community.cisco.com/t5/service-providers-knowledge-base/asr9000-xr-l... You might want to look at streaming telemetry.
__ Jim
On 8/1/25, 11:01 AM, "Arie Vayner via NANOG" <nanog@lists.nanog.org> wrote:
Could this be somehow related to control plane policing? You might be hitting some default policy threshold, and may have to adjust it to allow SNMP from your specific sources at a higher rate. IIRC on IOS XR that's called LPTS or sdr (but it has been a while...)
On Fri, Aug 1, 2025, 6:59 AM Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
90 seconds... but also we can poll Supervisor 720s at the same rate and they don't time out or delay responses.
😊
-----Original Message-----
From: Mel Beckman <mel@beckman.org>
Sent: Friday, August 1, 2025 9:37 AM
To: nanog@lists.nanog.org
Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
How often are you polling the interfaces? SNMP was never meant for high frequency polling (e.g., once per second), yet I often see people using SNMP as if it were a SCADA service, which is used in industrial automation for high frequency supervisory control and data acquisition. SNMP probes are typically anticipated by device designers to occur at 30 second or 60 second intervals.
-mel
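The rate-limit theory raised above (LPTS, or control-plane policing in general) can be illustrated with a toy token-bucket policer. This is only a sketch of the general technique, not Cisco's LPTS implementation, and every number in it (offered rate, limit, burst) is a made-up assumption. What it shows is that a fixed packets-per-second limit applied to a steady stream of requests produces a steady, repeatable drop percentage, which is one way a suspiciously exact timeout figure could arise.

```python
def policer_drop_fraction(offered_pps, limit_pps, burst, duration_s):
    """Fraction of evenly spaced packets dropped by a simple token bucket.

    offered_pps: packets per second arriving at the policer (assumed value)
    limit_pps:   token refill rate, i.e. the policed rate (assumed value)
    burst:       bucket depth in packets (assumed value)
    """
    tokens = float(burst)            # bucket starts full
    interval = 1.0 / offered_pps     # time between consecutive packets
    total = int(duration_s * offered_pps)
    dropped = 0
    for _ in range(total):
        # Refill for the time elapsed since the previous packet, capped at burst.
        tokens = min(float(burst), tokens + limit_pps * interval)
        if tokens >= 1.0:
            tokens -= 1.0            # packet conforms and is punted to the CPU
        else:
            dropped += 1             # packet exceeds the rate and is dropped
    return dropped / total if total else 0.0


# Hypothetical numbers: 1000 pps offered against a 380 pps limit.
# In the long run the drop fraction approaches 1 - 380/1000, about 0.62.
print(round(policer_drop_fraction(1000, 380, 10, 60), 2))
```

If the offered rate stays below the limit, the same function returns 0.0, which would be consistent with one site seeing no loss at default settings while another, polling more aggressively or from more sources, sees a fixed loss percentage. Whether that is what is happening here can only be confirmed from the router's own drop counters.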

On 2025-08-01 08:09, Drew Weaver via NANOG wrote:
Has anyone else run into this or did you guys all avoid the ASR 9902 like we should have?
Hi Drew, We recently stood up a couple of pairs of ASR 9902s. We poll SNMP heavily in-band to a loopback interface, never the management LAN ports. We're not seeing the issue you've mentioned when testing with a few snmpwalks to each router. We've got lots of different ASR9k models, and the 9902 doesn't seem to be any different as far as SNMP querying goes. We use default LPTS settings for these units. I also hit up my teammate working on that deployment, and he hasn't seen any issues. We're using XR 24.3.2 if that helps.
HTH, -Brian
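For anyone wanting to put numbers on an in-band versus management-port comparison like the one described above, here is a minimal sketch. It assumes the net-snmp `snmpget` command is installed and uses only its standard `-v2c`, `-c`, `-t` and `-r` options; the addresses, community string and OID in the usage comment are placeholders, and the helper names (`snmpget_probe`, `summarize`, `compare`) are invented for this example.

```python
import statistics
import subprocess
import time


def snmpget_probe(host, community, oid, timeout_s=2):
    """Run one snmpget with no retries; return latency in seconds, or None on failure."""
    cmd = ["snmpget", "-v2c", "-c", community,
           "-t", str(int(timeout_s)), "-r", "0", host, oid]
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s + 1)
    except subprocess.TimeoutExpired:
        return None
    elapsed = time.monotonic() - start
    return elapsed if result.returncode == 0 else None


def summarize(samples):
    """Timeout percentage and median latency for a list of probe results."""
    answered = [s for s in samples if s is not None]
    lost = len(samples) - len(answered)
    return {
        "polls": len(samples),
        "timeout_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "median_s": statistics.median(answered) if answered else None,
    }


def compare(targets, probe, polls=50):
    """Probe each named target `polls` times and summarize the results per target."""
    return {name: summarize([probe(addr) for _ in range(polls)])
            for name, addr in targets.items()}


# Usage with placeholder addresses (documentation ranges), community and OID:
# targets = {"mgmt": "192.0.2.1", "loopback": "198.51.100.1"}
# probe = lambda addr: snmpget_probe(addr, "public", "1.3.6.1.2.1.1.3.0")
# print(compare(targets, probe))
```

Retries are disabled on purpose so that each lost request counts as a loss instead of being hidden by a retransmission. A poller that retries will report a lower timeout rate than the router is actually dropping.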

On Friday, 1 August 2025 at 15:10, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
Hello,
Hi Drew.
I haven't worked with IOS-XR for a few years but I have had problems with SNMP in the past.
A few years ago I was deploying 9904 chassis with a modest amount of services on them (not thousands of services per chassis, but hundreds, so they weren't idle, but certainly not under any mentionable load control-plane wise).
We noticed that SNMP polling was returning nothing for some of the services, and it ended up being a couple of problems compounding. At that time we had virtually every 9xxx and 99xx chassis in the network. This problem only existed on these boxes, but they were also the only routers in the network with this exact combination of services on them. So nothing chassis-specific, I believe; this was on IOS-XR 6.something, for reference.
When the SNMP process receives the poll request, it in turn fires off requests internally to other processes to get the stats being asked for. There is/was (I'm out of touch now) a maximum amount of time SNMP would wait for the other processes to respond. If they didn't respond in time, the SNMP response was sent without those details, or the query which was pending an answer was just dropped and no response sent. So problem number one was those other processes taking too long to respond.
Problem number two was that those other processes had a bug: after provisioning services, those processes hadn't picked up on the changes. The request came from the SNMP process to the other processes for stats relating to X, and the other processes had no knowledge of X.
TAC provided us with a short-term workaround, which was to restart some processes after provisioning new services, to ensure the processes were aware of the new services and would respond to the SNMP process with the requested stats. Long term they created a DDTS and SMU to fix the inter-process timeout issues and missing stats issues.
I don't know exactly what you're polling, and like I said, I'm a bit out of touch here, but I can say that it took quite a lot of digging and working with TAC to bottom out the problem. We could replicate the issue in the lab, which always helps. So if you can replicate the issue in the lab, and turn all debugging settings up to 11, you might be able to find something like we did (TAC sent some debug commands and we could trace the issue in the lab; IPC debugging is hard on these boxes!). Even if TAC are trying to fob you off by saying "oh yeah this is dropped by LPTS as expected", get them to prove it to you: replicate the issue in the lab and gather the debug info which shows how/where the request is being dropped. If they can't find the drop in LPTS, then LPTS isn't the problem and you need to look elsewhere, like IPC/EOBC.
Cheers, James.
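The behaviour described in this message (an SNMP front end that fans requests out to other processes, waits up to a maximum time, and then answers with whatever came back) can be sketched roughly as follows. This is a generic illustration using Python threads, not IOS-XR's actual IPC; `answer_poll`, the collector callables and the deadline value are all hypothetical names and numbers chosen for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def answer_poll(oids, collectors, deadline_s):
    """Ask one collector per OID and keep only the answers that arrive in time.

    collectors maps an OID name to a zero-argument callable standing in for
    the internal process that owns that statistic (an assumption of this sketch).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(oids)))
    futures = {pool.submit(collectors[oid]): oid for oid in oids}
    # Wait for all collectors, but never longer than the deadline.
    done, _late = wait(futures, timeout=deadline_s)
    pool.shutdown(wait=False)  # do not block the reply on slow collectors
    # Late collectors are left out; so are collectors that failed, for
    # example because they have no knowledge of the requested object.
    return {futures[f]: f.result() for f in done if f.exception() is None}


# Example collectors (all hypothetical):
# fast = lambda: 12345                    # answers immediately
# slow = lambda: time.sleep(1.0) or 99    # answers after the deadline
# stale = lambda: {}["new-service"]       # raises KeyError: service unknown
# answer_poll(["a", "b", "c"], {"a": fast, "b": slow, "c": stale}, 0.3)
# would return only the entry for "a".
```

An empty result here corresponds to the case where nothing useful can be sent back, which the poller simply sees as a timeout. That is why a slow or stale internal process and a packet dropped by a policer can look identical from the NMS side.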

On Sun, Aug 3, 2025 at 3:12 AM James Bensley via NANOG <nanog@lists.nanog.org> wrote:
When the SNMP process receives the poll request, it in turn fires off requests internally to other processes to get the stats being asked for. There is/was (I'm out of touch now) a maximum amount of time SNMP would wait for the other processes to respond. If they didn't respond in time, the SNMP response was sent without those details, or the query which was pending an answer was just dropped and no response sent. So problem number one was those other processes taking too long to respond.
This is generally true with multiple vendors. The main SNMP process is responsible for receiving and replying to requests, and separate processes handle the actual collection of data from the elements. If those collector processes wedge or bog down (on their own, or because the element being polled is bogged down, etc.), that timeout bubbles up and you get nothing. It's a pretty standard design to segment things this way.