
Hello,
We purchased an ASR9902, I think almost 2 years ago now, intending to replace 4 routers with them.
We had a history of, let's just say, design-decision quirks with the router that prevented us from deploying it until recently.
When we were finally able to implement it, we noticed something strange about how SNMP polling works on the router.
If we poll SNMP on any interface that isn't one of the built-in management Ethernet interfaces, responses take 8x-16x longer and exactly 62% of the polls time out.
If we poll SNMP on the built-in MGMT interfaces, the responses are still slower than on the ASR9001s we used to use, but they don't seem to time out.
I've had a TAC case open with Cisco over this for weeks now, and they are now saying that the slow responses and the 62% poll timeouts are intentional and that they don't see any problem with the design.
I understand the security implications of having control-plane services respond on all interfaces, but what I don't understand is why they bind the SNMP daemon to the non-MGMT* interfaces at all if they've made a moral or ethical decision not to allow SNMP to work on non-MGMT interfaces. Shouldn't it just not work at all, then? Who came up with a 62% timeout rate as the right number?
The larger implication is that I still can't find another router from another vendor that does this.
Has anyone else run into this, or did you all avoid the ASR9902 like we should have?
Thanks, -Drew

How often are you polling the interfaces? SNMP was never meant for high-frequency polling (e.g., once per second), yet I often see people using SNMP as if it were a SCADA service, the kind used in industrial automation for high-frequency supervisory control and data acquisition. Device designers typically anticipate SNMP probes at 30-second or 60-second intervals. -mel
On Aug 1, 2025, at 6:10 AM, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
_______________________________________________ NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/HUP4BJYN...
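Mel's point about polling frequency can be made concrete with a little arithmetic. The sketch below is a rough model, not anything a vendor publishes: it estimates how many SNMP request PDUs per second a single poller sends to one device. The interface count, OIDs per interface, and varbinds per PDU are illustrative assumptions; real pollers that use GETBULK or table walks will differ, and retries are ignored.

```python
import math


def snmp_pdus_per_second(interfaces: int, oids_per_interface: int,
                         varbinds_per_pdu: int, interval_s: float) -> float:
    """Average request PDUs per second from one poller to one device.

    Assumes every OID is fetched once per polling cycle and packed into GET
    requests carrying `varbinds_per_pdu` varbinds each. Retries are ignored.
    """
    if min(interfaces, oids_per_interface, varbinds_per_pdu) < 1 or interval_s <= 0:
        raise ValueError("counts must be >= 1 and interval_s must be > 0")
    pdus_per_cycle = math.ceil(interfaces * oids_per_interface / varbinds_per_pdu)
    return pdus_per_cycle / interval_s


if __name__ == "__main__":
    # Hypothetical device: 100 interfaces, 10 counters each, 20 varbinds per GET.
    for interval in (1, 30, 60, 90):
        rate = snmp_pdus_per_second(100, 10, 20, interval)
        print(f"every {interval:>2} s -> {rate:.2f} PDUs/s")
```

Under these made-up numbers, moving from 1-second to 90-second polling (the interval Drew reports later in the thread) cuts the request rate by a factor of 90, from 50 PDUs/s to well under one per second.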

Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other. -mel
On Aug 1, 2025, at 6:38 AM, Mel Beckman <mel@beckman.org> wrote:
NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/YFBCZDFS...

I don't know if you're speaking specifically about the ASR9902 or about all routers, but after doing this for 26 years I can tell you I've never seen another router handle SNMP responses differently depending on which interface the request comes in on. I can name 8 vendors, and even models from Cisco, that don't do this. So I'm not sure this is standard practice, as you seem to be implying.
Thanks, -Drew
-----Original Message----- From: Mel Beckman <mel@beckman.org> Sent: Friday, August 1, 2025 9:43 AM To: nanog@lists.nanog.org Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Each device type will have different internals that might influence the results, but bottom line, SNMP doesn't scale well for rich information retrieval combined with frequent polling. It's similar to running a heavy SQL query against a database over and over. Neither the SNMP agents nor the protocol are really optimized for that, so polling doesn't cut it. See if the devices support some form of telemetry, where the device itself takes care of collecting the data and sending it to a central server. Pedro Martins Prado pedro.prado@gmail.com / +353 83 036 1875
On 1 Aug 2025, at 15:00, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/OEOY5K7F...

On Fri, 1 Aug 2025 at 16:44, Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62 % is as good a sampling rate as any other.
Absolutely not. We expect to process 100% of legitimate control-plane traffic, e.g. BGP, IS-IS, LDP, ARP, SNMP, etc.
62% would be devastating.
In fair weather this is easy; in bad weather you need hardware-based discrimination between expected good traffic and unexpected bad traffic.
Drew is right to expect functioning SNMP, and he is experiencing a significant regression in behaviour compared to previous devices from the same vendor.
It would take a very long time to explain how to troubleshoot this, as it is an extremely complicated topic with a lot of nuance that even the best experts at Cisco are unaware of. I've regularly had TAC handwave problems away ('sometimes it be like that') because they didn't want to do the work. Once our NOC spent months on a case where TAC was blaming our QoS configuration for BGP flaps. By the time I got on it, I escalated it to Xander, and initially even Xander agreed with TAC that we needed to look into the QoS configuration, until I reminded him that LPTS is not subject to QoS or ACLs (which is a terrible design choice, for reasons I'm happy to elaborate on). That immediately reminded him how LPTS works, and the TAC case finally got some traction. This is a completely untenable situation: IOS-XR regularly has complicated problems that TAC is not equipped to solve, and the expectation is that the user has deep enough knowledge to rebuff them.
-- ++ytti
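Saku's "62% would be devastating" can be quantified with a simple model. The sketch below assumes each SNMP request is dropped independently with probability `loss` and that the poller retransmits `retries` times; both are assumptions, and policer drops are usually bursty and correlated, which would make real results worse than this model predicts. It also assumes a check needing several request/response exchanges succeeds only if every exchange eventually gets an answer.

```python
def request_failure_probability(loss: float, retries: int) -> float:
    """Probability that one request gets no answer after 1 + `retries` attempts.

    Assumes every attempt is lost independently with probability `loss`.
    """
    if not 0.0 <= loss <= 1.0 or retries < 0:
        raise ValueError("loss must be in [0, 1] and retries must be >= 0")
    return loss ** (retries + 1)


def check_success_probability(loss: float, retries: int, exchanges: int) -> float:
    """Probability that a check needing `exchanges` request/response pairs completes."""
    return (1.0 - request_failure_probability(loss, retries)) ** exchanges


if __name__ == "__main__":
    for retries in range(4):
        p = request_failure_probability(0.62, retries)
        print(f"retries={retries}: single request fails {p:.1%}")
    # Hypothetical check that needs 5 round trips, with 2 retries each.
    print(f"5-exchange check succeeds {check_success_probability(0.62, 2, 5):.1%}")
```

The point of the model is the compounding: a per-packet loss rate that retries only partly hide gets multiplied across every exchange a monitoring check depends on.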

Hi,
Just to correct:
I was saying that 62% of the polls time out and only 38% actually result in responses, and those 38% take several times longer to complete when polling via an in-line interface.
This is just with a simple bash script running "time check_interfaces <args>" from the Nagios-Tools package, doing hundreds of poll runs in a row with various pauses between polls.
It would be a little less concerning if any other product did this, but the idea that they just left it 62% broken and shipped it that way really makes me wonder what else only functions at 38%.
We don't have a huge budget, and the ASR9902 costs almost twice as much as the Arista devices we would've preferred to buy (the Arista device in question has 30x100GE ports, while the ASR9902 is basically an 8x100GE router with a very poorly configured midplane/gearbox that ties into some sort of switch that nobody at Cisco seems to understand either).
If we had an unlimited budget we'd just mulligan this thing and buy the DCS devices we want, but we're stuck with it, and if we're stuck with it I don't think it's insane to expect it to operate at least as well as an ASR9001.
Thanks, -Drew
-----Original Message----- From: Saku Ytti via NANOG <nanog@lists.nanog.org> Sent: Friday, August 1, 2025 2:28 PM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: Saku Ytti <saku@ytti.fi> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
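Drew's measurement (a bash loop timing `check_interfaces <args>` hundreds of times with pauses) can be reproduced in a slightly more structured way. The sketch below is a hypothetical Python harness, not his script: it times any poll callable, counts failures, and reports the failure rate and the median latency of successful polls. `command_poll` treats a non-zero exit status or a local timeout as a failure. That is an assumption: Nagios-style plugins report status through their exit code, but check how yours reports SNMP timeouts before relying on it.

```python
import statistics
import subprocess
import time
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class PollStats:
    runs: int = 0
    failures: int = 0
    latencies_s: List[float] = field(default_factory=list)  # successful polls only

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0

    @property
    def median_latency_s(self) -> float:
        return statistics.median(self.latencies_s) if self.latencies_s else float("nan")


def command_poll(argv: Sequence[str], timeout_s: float) -> Callable[[], bool]:
    """Wrap an external check command; success means exit status 0 within timeout_s."""
    def poll() -> bool:
        try:
            result = subprocess.run(list(argv), capture_output=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0
    return poll


def measure(poll: Callable[[], bool], runs: int, pause_s: float = 0.0) -> PollStats:
    """Run `poll` repeatedly, timing each call and counting failures."""
    stats = PollStats()
    for i in range(runs):
        start = time.monotonic()
        ok = poll()
        elapsed = time.monotonic() - start
        stats.runs += 1
        if ok:
            stats.latencies_s.append(elapsed)
        else:
            stats.failures += 1
        if pause_s > 0 and i < runs - 1:
            time.sleep(pause_s)
    return stats
```

To compare paths, run `measure(command_poll(argv, timeout_s=10), runs=300, pause_s=5)` once with `argv` targeting the management address and once targeting an in-band address, then compare `failure_rate` and `median_latency_s`. The `check_interfaces` argument list is left to you, since the original elides it.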

Drew, As I said elsewhere, the control plane was invented to separate management functions from the data forwarding process. In-band SNMP to data forwarding interfaces violates that separation. I’d say all bets are off. As they say in mathematics, this behavior is undefined. :) -mel via cell
On Aug 1, 2025, at 11:42 AM, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
_______________________________________________ NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/F2466J65...

So you're saying that, at your shop, something you've done for decades across multiple generations of products, from the 7206VXR to the Cisco 7600/6500, the GSRs, and the ASR9000 (not to mention all of the non-Cisco deployments), suddenly "can't handle it" on the ASR99xx, and that would be fine and expected? I'm just trying to grasp the... root of what you're saying.
Thanks, -Drew
-----Original Message----- From: Mel Beckman <mel@beckman.org> Sent: Friday, August 1, 2025 2:47 PM To: nanog@lists.nanog.org Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Cisco is likely to say that the control plane is only fully supported on the management port. After all, the control plane was invented to separate management functions from the data forwarding process. -mel via cell
On Aug 1, 2025, at 11:28 AM, Saku Ytti <saku@ytti.fi> wrote:

On Fri, 1 Aug 2025 at 21:45, Mel Beckman <mel@beckman.org> wrote:
Cisco is likely to say that the control plane is only fully supported on the management port. After all, the control plane was invented to separate management functions from the data forwarding process.
Cisco will 100% support the control plane on in-line ports. Before the cloudy shops, in-line was the norm and the MGMT port the exception. Management ports to this day are extremely dangerous, and I consider using them an anti-pattern. If you have a MGMT L2 broadcast domain, you can potentially break every control plane with an L2 storm (a risk that has actually materialized), because you cannot protect the control plane on the MGMT ETH port, for obvious reasons. On in-line ports, by contrast, you can protect the control plane (some platforms better, some worse) with a combination of QoS, ACLs, control-plane ACLs, and control-plane policing/shaping/ACLs. It might be easier to contribute if one were familiar with the subject matter. -- ++ytti
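The control-plane ACL Saku mentions boils down to first-match filtering on source address before packets reach the SNMP agent. The sketch below models that generically with Python's standard `ipaddress` module. It is not IOS-XR's implementation; the rule format and the documentation prefixes (192.0.2.0/24, 2001:db8::/32) are placeholders for your own NMS addresses, and it assumes the rules are applied only to SNMP (UDP/161) traffic.

```python
import ipaddress
from typing import Iterable, Tuple

# (action, prefix) pairs evaluated top-down; first match wins, as in most router ACLs.
Rule = Tuple[str, str]


def acl_permits(src: str, rules: Iterable[Rule], default_permit: bool = False) -> bool:
    """Return True if source address `src` is permitted by the first matching rule."""
    addr = ipaddress.ip_address(src)
    for action, prefix in rules:
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            return action == "permit"
    return default_permit  # implicit deny unless told otherwise


if __name__ == "__main__":
    snmp_acl = [
        ("permit", "192.0.2.0/24"),   # hypothetical NMS subnet
        ("permit", "2001:db8::/32"),  # hypothetical IPv6 NMS range
        ("deny", "0.0.0.0/0"),
    ]
    for src in ("192.0.2.10", "198.51.100.7", "2001:db8::1", "2001:db9::1"):
        print(src, "permit" if acl_permits(src, snmp_acl) else "deny")
```

An ACL like this only decides who may reach the agent; it does nothing to limit how fast permitted sources can send, which is the job of the policer Saku lists alongside it.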

90 seconds... but also we can poll Supervisor 720s at the same rate and they don't time out or delay responses. 😊 -----Original Message----- From: Mel Beckman <mel@beckman.org> Sent: Friday, August 1, 2025 9:37 AM To: nanog@lists.nanog.org Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
On Aug 1, 2025, at 6:10 AM, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
Hello,
We purchased an ASR9902 I think almost 2 years ago now intending to replace 4 routers with them.
We had a history of lets just say design decision quirks with the router that prevented us from deploying it until recently.
Then when we finally were able to implement it we've noticed something strange about how SNMP polling works in the router.
If we poll SNMP on any interface that isn't one of the built in management ethernet interfaces the response takes 8x-16x longer to respond and exactly 62% of the polls time out.
If we poll SNMP on the built-in MGMT interfaces the responses are still slower than the ASR9001s that we used to use but they don't seem to time out.
I've had a TAC case with Cisco open over this for weeks now and they are now saying that the slow responses and the 62% poll timeouts are intentional and that they don't see any problem with the design.
I understand the security implications of having control plane stuff responding on all interfaces but the part I don't understand is why bind the SNMP daemon to the non MGMT* interfaces at all if they are making a moral or ethical decision to not allow SNMP to work on non MGMT interfaces. Shouldn't it just not work at all then? Who came up with 62% timeout as the right number?
The larger implication is that I still can't find another router from another vendor that does this.
Has anyone else run into this or did you guys all avoid the ASR 9902 like we should have?
Thanks, -Drew
_______________________________________________ NANOG mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org_archives_list_nanog-40lists.nanog.org_message_HUP4BJYN3E7YQZKMDT6PLM3XBTK7DCJU_&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=ysryPUJQffffnj7NA86CIwOOPWsLq5M3v5_s4HOyDNvnNLv1f3rVKsrdYPpBqkBS&s=4ACrFXyyWFX_bxDa3z7o9aQNmNy6DiDi3Xn9hjKjKJY&e=

Could this be somehow related to control-plane policing? You might be hitting some default policy threshold and may have to adjust it to allow SNMP from your specific sources at a higher rate. IIRC on IOS-XR that's called LPTS or SDR (but it has been a while...)
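To set Arie's policing hypothesis next to Drew's numbers, here is a back-of-the-envelope sketch: if a constant offered request rate exceeds a fixed policed rate, the long-run drop fraction is about 1 - policed/offered. This assumes one request packet per poll, no retries, and a steady arrival rate, none of which the thread establishes. Nothing in the thread confirms that policing is the cause, either; Brian reports no problem on 9902s with default LPTS settings. The function names are hypothetical.

```python
def steady_state_drop(offered_pps: float, policed_pps: float) -> float:
    """Long-run fraction dropped when a constant offered load meets a fixed policer."""
    if offered_pps <= policed_pps:
        return 0.0  # the policer keeps up; nothing is dropped in steady state
    return 1.0 - policed_pps / offered_pps


def implied_overload(drop: float) -> float:
    """Offered-to-policed rate ratio that would produce the given drop fraction."""
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop fraction must be in [0, 1)")
    return 1.0 / (1.0 - drop)
```

Under these assumptions, implied_overload(0.62) is about 2.63, so a 62% timeout rate would correspond to offering roughly 2.6 times the policed rate. Treat it as a sanity check on the hypothesis, not a diagnosis.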

They have the configuration. They seem to be saying that there is just some invisible hand inside the router controlling the responses on a per-interface basis. It's pretty heavy-handed, tbh. -Drew

I'm guessing you're hitting the default SNMP LPTS rate limits, which you may be able to change, but for various reasons that may not be a good idea. Here is some good info on the LPTS architecture written by Xander himself. https://community.cisco.com/t5/service-providers-knowledge-base/asr9000-xr-l... You might want to look at streaming telemetry. -- Jim

Hi Drew, We recently stood up a couple of pairs of ASR 9902s. We poll SNMP heavily in-band to a loopback interface, never the management LAN ports, and we're not seeing the issue you've described when testing with a few snmpwalks to each router. We've got lots of different ASR9k models, and the 9902 doesn't seem to be any different as far as SNMP querying goes. We use default LPTS settings for these units. I also hit up my teammate working on that deployment, and he hasn't seen any issues. We're using XR 24.3.2, if that helps.
HTH, -Brian
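Brian's check (a few snmpwalks against each router) and Drew's report (8x-16x slower, 62% timeouts) both reduce to two numbers per target: the timeout rate and the typical latency. Below is a minimal sketch for summarizing repeated polls against, say, an in-band loopback versus the MGMT address. The input format (one latency in milliseconds per poll, None for a timeout) and the function names are hypothetical, and collecting the timings is left to whatever poller is in use.

```python
from statistics import median
from typing import Optional


def summarize_polls(latencies_ms: list[Optional[float]]) -> tuple[float, Optional[float]]:
    """Return (timeout percentage, median latency in ms of the answered polls).

    Each entry is one poll: a response time in milliseconds, or None on timeout.
    """
    if not latencies_ms:
        raise ValueError("no polls recorded")
    answered = [x for x in latencies_ms if x is not None]
    timeout_pct = 100.0 * (len(latencies_ms) - len(answered)) / len(latencies_ms)
    med = median(answered) if answered else None
    return timeout_pct, med


def slowdown(inband: list[Optional[float]], mgmt: list[Optional[float]]) -> Optional[float]:
    """Ratio of in-band to MGMT median latency, or None if it cannot be computed."""
    _, inband_med = summarize_polls(inband)
    _, mgmt_med = summarize_polls(mgmt)
    if inband_med is None or mgmt_med is None or mgmt_med == 0:
        return None
    return inband_med / mgmt_med
```

Comparing medians rather than means keeps a few slow outliers from dominating the ratio, and timeouts are counted separately instead of being folded into the latency figure.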
participants (7)
- Arie Vayner
- Brian Knight
- Drew Weaver
- Mel Beckman
- Pedro Prado
- Rampley, Jim F
- Saku Ytti