
Hi,

Just to correct: I was saying that 62% of the polls time out and only 38% actually result in responses, and those 38% take multiple times longer to complete when polling on an in-line interface. This is just with a simple bash script running "time check_interfaces <args>" from the Nagios-Tools package, doing hundreds of poll runs in a row with various pauses between polls.

It would be a little less of a concern if any other product did this, but the idea that they just sort of left it 62% broken and shipped it that way is really making me wonder what else only functions at 38%.

We don't have a huge budget, and the ASR9902 costs almost twice as much as the Arista devices we would've preferred to buy. [The Arista device in question has 30x100GE ports, and the ASR9902 is basically an 8x100GE router with a very poorly configured midplane/gearbox that ties into some sort of switch (nobody at Cisco seems to know how any of that works, either).] If we had an unlimited budget we'd just mulligan this thing and buy the DCS devices that we want, but we're stuck with it, and if we're stuck with it I don't think it's insane to expect it to operate at least as well as an ASR9001.

Thanks,
-Drew

-----Original Message-----
From: Saku Ytti via NANOG <nanog@lists.nanog.org>
Sent: Friday, August 1, 2025 2:28 PM
To: North American Network Operators Group <nanog@lists.nanog.org>
Cc: Saku Ytti <saku@ytti.fi>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

On Fri, 1 Aug 2025 at 16:44, Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

> Also, non-management interfaces do packet processing in silicon at the ASIC level and don't have the capacity to do anything more than statistical sampling of packets that require CPU-level processing to retrieve counters and generate SNMP responses. 62% is as good a sampling rate as any other.

Absolutely not. We expect to process 100% of legitimate control-plane traffic, e.g. BGP, ISIS, LDP, ARP, SNMP etc. 62% would be devastating. In fair weather this is easy; in bad weather you need hardware-based discrimination between expected good traffic and unexpected bad traffic.

Drew is right to expect functioning SNMP, and he is experiencing a significant regression in behaviour compared to previous devices from the same vendor.

It would take a very long time to explain how to troubleshoot this, as it is an extremely complicated topic with a lot of nuance that even the best experts at Cisco are unaware of. I've regularly had TAC handwave problems away with 'sometimes it be like that' because they didn't want to do the work.

Once our NOC spent months on a case where TAC was blaming our QoS configuration for BGP flaps. When I got on it, I escalated it to Xander, and initially even Xander agreed with TAC that we needed to look into the QoS configuration, until I reminded him that LPTS is not subject to QoS or ACL (which is a terrible design choice, for reasons I'm happy to elaborate on). That immediately reminded him how LPTS works, and the TAC case finally got some traction.

This is a completely untenable situation: IOS-XR regularly has complicated problems that TAC is not equipped to solve, and the expectation is that the user has deep enough knowledge to rebuff them.

--
  ++ytti
_______________________________________________
NANOG mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.nanog.org_archives_list_nanog-40lists.nanog.org_message_KK73RTHMIZXLUMICYPEECO2AQXILKHIQ_&d=DwIGaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OPufM5oSy-PFpzfoijO_w76wskMALE1o4LtA3tMGmuw&m=d_XQ0w1ltWzu7JBKSWfGAfci8ywpv0Vz_Lg6Q-eS5pZAWpgoZ9PBnm_qnf2BAqbd&s=CmbeUcr_Ltz9nrzW2h4l3azL_KBEqloxrF9Rl9GuEpQ&e=
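
For anyone wanting to reproduce the kind of measurement Drew describes (running the same poll command hundreds of times, then counting timeouts and timing the runs that complete), a rough sketch in Python might look like the following. It is not the bash script from the thread. The function name `poll_repeatedly`, the default timeout and pause values, and the choice to count non-zero exit codes separately are all assumptions made for illustration, and the `check_interfaces` arguments are not given in the thread, so the command is passed in unchanged by the caller.

```python
import statistics
import subprocess
import sys
import time


def poll_repeatedly(cmd, runs=100, timeout_s=10.0, pause_s=0.0):
    """Run `cmd` `runs` times and summarize timeouts and completion times.

    Hypothetical helper: `cmd` is an argument list, for example the
    check_interfaces invocation with whatever arguments you normally use.
    """
    durations = []  # wall-clock seconds for runs that finished in time
    timeouts = 0    # runs killed after exceeding timeout_s
    nonzero = 0     # runs that finished but exited non-zero (assumption:
                    # a plugin may report its own SNMP timeout this way)
    for _ in range(runs):
        start = time.monotonic()
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            timeouts += 1
        else:
            durations.append(time.monotonic() - start)
            if result.returncode != 0:
                nonzero += 1
        time.sleep(pause_s)

    return {
        "runs": runs,
        "completed": len(durations),
        "timeouts": timeouts,
        "nonzero_exit": nonzero,
        "timeout_pct": 100.0 * timeouts / runs if runs else 0.0,
        "median_s": statistics.median(durations) if durations else None,
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Everything after the script name is treated as the poll command.
        print(poll_repeatedly(sys.argv[1:]))
    else:
        print("usage: pass the poll command and its arguments")
```

Running this once against an in-line interface and once against the management interface, with the same timeout, would give two sets of numbers that can be compared directly. Note that a plugin which enforces its own SNMP timeout will usually exit non-zero rather than hang, so the `nonzero_exit` count may be the more relevant figure than `timeouts`.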