
So you're saying that at your shop, something you've done for decades across multiple generations of products, from the VXR 7206, to the Cisco 7600/6500, to the GSRs, to the ASR9000 (not to mention all of the non-Cisco deployments from other vendors), suddenly stops working because the ASR99xx just "can't handle it", and that would be fine and expected?
I'm just trying to grasp the root of what you're saying.
Thanks, -Drew
-----Original Message----- From: Mel Beckman <mel@beckman.org> Sent: Friday, August 1, 2025 2:47 PM To: nanog@lists.nanog.org Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@lists.nanog.org Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
Drew,
As I said elsewhere, the control plane was invented to separate management functions from the data forwarding process. In-band SNMP to data forwarding interfaces violates that separation. I’d say all bets are off. As they say in mathematics, this behavior is undefined. :)
-mel via cell
On Aug 1, 2025, at 11:42 AM, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
Hi,
Just to correct:
I was saying that 62% of the polls time out and only 38% actually result in responses, and those 38% take several times longer to complete when polling on an in-line interface.
This is just with a simple bash script running "time check_interfaces <args>" from the Nagios-Tools package and doing hundreds of poll runs in a row with various pauses between pollings.
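A measurement loop of the kind described above could be sketched roughly as follows. This is an illustrative Python sketch, not the bash script that was actually used: the function name `run_polls`, its parameters, and the choice to have the harness enforce the timeout itself (rather than relying on the plugin's own timeout or exit code) are all assumptions made for the example.

```python
import subprocess
import time


def run_polls(cmd, runs=100, pause=0.0, timeout=10.0):
    """Run `cmd` repeatedly and summarise how many runs timed out.

    cmd     -- the poll command as an argument list (hypothetical example:
               ["check_interfaces", ...] with whatever arguments you use)
    runs    -- number of consecutive polls
    pause   -- seconds to sleep between polls
    timeout -- seconds after which the harness kills a poll (assumption:
               the harness, not the plugin, decides what counts as a timeout)
    """
    durations = []  # wall-clock seconds for polls that finished in time
    timeouts = 0
    for _ in range(runs):
        start = time.monotonic()
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            durations.append(time.monotonic() - start)
        except subprocess.TimeoutExpired:
            timeouts += 1
        time.sleep(pause)
    return {
        "runs": runs,
        "timeouts": timeouts,
        "timeout_pct": 100.0 * timeouts / runs if runs else 0.0,
        "mean_seconds": sum(durations) / len(durations) if durations else None,
    }
```

Running the same command list against the management interface address and against an in-line interface address, then comparing `timeout_pct` and `mean_seconds`, would give the two figures being discussed here.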
It would be a little less of a concern if any other product did this but the idea that they just sort of left it 62% broken and shipped it that way is really making me wonder what else only functions at 38%.
We don't have a huge budget, and the ASR9902 costs almost twice as much as the Arista devices we would've preferred to buy. [The Arista device in question has 30x100GE ports, and the ASR9902 is basically an 8x100GE router with a very poorly configured midplane/gearbox that ties into some sort of switch; nobody at Cisco seems to know how any of that works, either.]
If we had an unlimited budget we'd just mulligan this thing and buy the DCS devices that we want, but we're stuck with it, and if we're stuck with it I don't think it's insane to expect it to operate at least as well as an ASR9001.
Thanks, -Drew
-----Original Message----- From: Saku Ytti via NANOG <nanog@lists.nanog.org> Sent: Friday, August 1, 2025 2:28 PM To: North American Network Operators Group <nanog@lists.nanog.org> Cc: Saku Ytti <saku@ytti.fi> Subject: Re: Cisco ASR9902 SNMP polling ... is interesting
On Fri, 1 Aug 2025 at 16:44, Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Also, non-management interfaces do packet processing in silicon at the ASIC level and don’t have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62% is as good a sampling rate as any other.
Absolutely not. We expect to process 100% of legitimate control-plane traffic, e.g. BGP, ISIS, LDP, ARP, SNMP etc.
62% would be devastating.
In fair weather this is easy; in bad weather you need hardware-based discrimination between expected good traffic and unexpected bad traffic.
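The idea of discriminating expected from unexpected control-plane traffic can be illustrated with a toy model: known peers get their own generous policer, and everything else shares a small default one. The sketch below is only a conceptual illustration in Python; the class names, rates, and addresses are hypothetical, and it does not describe how LPTS or any vendor's hardware actually implements this.

```python
class TokenBucket:
    """Simple token-bucket policer driven by caller-supplied timestamps."""

    def __init__(self, rate_pps, burst):
        self.rate = float(rate_pps)   # tokens (packets) added per second
        self.burst = float(burst)     # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # timestamp of the previous packet

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PuntPolicer:
    """Toy classifier: known (source, protocol) flows get their own bucket,
    everything else shares one small default bucket."""

    def __init__(self, known_flows, known_rate_pps, default_rate_pps):
        self.known = {
            flow: TokenBucket(known_rate_pps, known_rate_pps)
            for flow in known_flows
        }
        self.default = TokenBucket(default_rate_pps, default_rate_pps)

    def admit(self, src, proto, now):
        bucket = self.known.get((src, proto), self.default)
        return bucket.allow(now)


# Hypothetical example: one configured SNMP poller, one unknown flooder.
policer = PuntPolicer({("192.0.2.10", "snmp")}, known_rate_pps=100, default_rate_pps=10)
good = sum(policer.admit("192.0.2.10", "snmp", i / 50) for i in range(50))
bad = sum(policer.admit("203.0.113.66", "snmp", i / 1000) for i in range(1000))
print(f"known poller admitted {good}/50, unknown source admitted {bad}/1000")
```

In this model the flood from the unknown source exhausts only the default bucket, so the configured poller's requests are still admitted; that is the "bad weather" property being described.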
Drew is right to expect functioning SNMP and is experiencing a significant regression in behaviour compared to previous devices from the same vendor.
It would take a very long time to explain how to troubleshoot this, as it is an extremely complicated topic with a lot of nuance that even the best experts at Cisco are unaware of. I've regularly had TAC handwave problems away with 'sometimes it be like that' because they didn't want to do the work. Our NOC once spent months on a case where TAC was blaming our QoS configuration for BGP flaps. By the time I got on it, I escalated it to Xander, and initially even Xander agreed with TAC that we needed to look into the QoS configuration, until I reminded him that LPTS is not subject to QoS or ACLs (which is a terrible design choice, for reasons I'm happy to elaborate on). That immediately reminded him how LPTS works, and the TAC case finally got some traction. This is a completely untenable situation: IOS-XR regularly has complicated problems that TAC is not equipped to solve, and the expectation is that the user has deep enough knowledge to rebuff them.
-- ++ytti
_______________________________________________
NANOG mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org_archives_list_nanog-40lists.nanog.org_message_KK73RTHMIZXLUMICYPEECO2AQXILKHIQ_&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=d_XQ0w1ltWzu7JBKSWfGAfci8ywpv0Vz_Lg6Q-eS5pZAWpgoZ9PBnm_qnf2BAqbd&s=CmbeUcr_Ltz9nrzW2h4l3azL_KBEqloxrF9Rl9GuEpQ&e=