
When the SNMP process receives a poll request, it in turn fires off requests internally to other processes to get the stats being asked for. There is/was (I'm out of touch now) a maximum amount of time SNMP would wait for those other processes to respond. If they didn't respond in time, the SNMP response was sent without those details, or the query that was pending an answer was just dropped and no response was sent. So problem number one was those other processes taking too long to respond.
This is generally true with multiple vendors. The main SNMP process is responsible for receiving and replying to requests, and separate processes handle the actual collection of the data from the elements. If those collector processes wedge or bog down (on their own, or because the element being polled is bogged down, etc.), that timeout bubbles up and you get nothing. It's a pretty standard design to segment things this way.

On Sun, Aug 3, 2025 at 3:12 AM James Bensley via NANOG <nanog@lists.nanog.org> wrote:
On Friday, 1 August 2025 at 15:10, Drew Weaver via NANOG <nanog@lists.nanog.org> wrote:
Hello,
Hi Drew.
I haven't worked with IOS-XR for a few years, but I have had problems with SNMP in the past.
A few years ago I was deploying 9904 chassis with a modest number of services on them (not thousands of services per chassis, but hundreds, so they weren't idle, but certainly not under any mentionable load control-plane-wise).
We noticed that SNMP polling was returning nothing for some of the services, and it ended up being a couple of problems compounding. At that time we had virtually every 9xxx and 99xx chassis variant in the network, and this problem only showed up on these boxes; but they were also the only routers in the network with this exact combination of services on them, so I don't believe it was anything chassis specific. This was on IOS-XR 6.something, for reference.
When the SNMP process receives a poll request, it in turn fires off requests internally to other processes to get the stats being asked for. There is/was (I'm out of touch now) a maximum amount of time SNMP would wait for those other processes to respond. If they didn't respond in time, the SNMP response was sent without those details, or the query that was pending an answer was just dropped and no response was sent. So problem number one was those other processes taking too long to respond.
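To make that concrete, here's a minimal sketch of the dispatcher-plus-collectors pattern (just Python to illustrate the idea, nothing to do with Cisco's actual implementation; the collector names, OID prefixes and timeout are all made up). The SNMP-facing process farms each poll out to per-subsystem collectors and only waits a bounded time for each answer, so a slow collector means an empty varbind or a dropped query:

# Illustrative sketch only: SNMP-facing dispatcher with per-collector timeout.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

COLLECTOR_TIMEOUT = 2.0  # made-up IPC deadline in seconds

def interface_collector(oid):
    # Stand-in for a separate stats-collection process; a wedged or busy
    # collector simply takes too long to come back.
    time.sleep(5)
    return 12345

def qos_collector(oid):
    return 678

def handle_snmp_get(requested_oids):
    collectors = {".1.3.6.1.2.1.2": interface_collector,   # fake prefix map
                  ".1.3.6.1.4.1.9": qos_collector}
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {oid: pool.submit(collectors[prefix], oid)
                   for oid in requested_oids
                   for prefix in collectors if oid.startswith(prefix)}
        for oid, fut in futures.items():
            try:
                results[oid] = fut.result(timeout=COLLECTOR_TIMEOUT)
            except FutureTimeout:
                # Collector missed the deadline: either reply without this
                # varbind or drop the whole query -- the two failure modes
                # described above.
                results[oid] = None
    return results

print(handle_snmp_get([".1.3.6.1.2.1.2.2.1.10.1", ".1.3.6.1.4.1.9.9.1.1"]))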
Problem number two was that those other processes had a bug: after provisioning services, they hadn't picked up on the changes. When the request came from the SNMP process for stats relating to service X, the other processes had no knowledge of X.
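As a toy illustration of that second problem (again just a Python sketch with invented names, not the real IOS-XR code path): think of a collector that snapshots the provisioned-service list once at startup and never refreshes it, so anything provisioned afterwards simply doesn't exist as far as it's concerned:

# Toy illustration of "collector has no knowledge of X".
class StatsCollector:
    def __init__(self, provisioned_services):
        # Snapshot taken when the process starts; the bug is that nothing
        # ever updates it after new services are provisioned.
        self.known_services = set(provisioned_services)

    def get_stats(self, service_id):
        if service_id not in self.known_services:
            return None          # "no knowledge of X" -> empty SNMP answer
        return {"in_octets": 1000, "out_octets": 2000}  # made-up counters

collector = StatsCollector(["svc-1", "svc-2"])   # state at process start
# ... operator provisions svc-3 later; the collector never hears about it ...
print(collector.get_stats("svc-2"))   # answered normally
print(collector.get_stats("svc-3"))   # None: newly provisioned, unknown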
TAC provided us with a short-term workaround, which was to restart some processes after provisioning new services, to ensure those processes were aware of the new services and would respond to the SNMP process with the requested stats. Long term, they created a DDTS and a SMU to fix the inter-process timeout and missing-stats issues.
I don't know exactly what you're polling, and like I said, I'm a bit out of touch here, but I can say that it took quite a lot of digging and working with TAC to bottom out the problem. We could replicate the issue in the lab, which always helps. So if you can replicate the issue in the lab and turn all the debugging settings up to 11, you might be able to find something like we did (TAC sent some debug commands and we could trace the issue in the lab; IPC debugging is hard on these boxes!). Even if TAC are trying to fob you off by saying "oh yeah, this is dropped by LPTS as expected", get them to prove it to you: replicate the issue in the lab and gather the debug info which shows how/where the request is being dropped. If they can't find the drop in LPTS, then LPTS isn't the problem and you need to look elsewhere, like IPC/EOBC.
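If it helps, one low-tech way to catch the symptom in the lab is to hammer the same varbind in a loop and log every timeout or missing-instance answer, then line the timestamps up against the debugs on the box. A rough sketch, assuming net-snmp's snmpget is installed and SNMPv2c is enabled on the test router (hostname, community and OID below are placeholders):

# Rough lab poller: log timeouts and missing instances for one varbind.
import subprocess
import time

HOST = "lab-asr9904"              # placeholder
COMMUNITY = "public"              # placeholder
OID = "IF-MIB::ifHCInOctets.1"    # placeholder varbind

def poll_once():
    proc = subprocess.run(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-t", "2", "-r", "0", HOST, OID],
        capture_output=True, text=True)
    out = proc.stdout + proc.stderr
    if proc.returncode != 0 or "Timeout" in out:
        return "TIMEOUT / NO RESPONSE"
    if "No Such" in out:          # noSuchInstance / noSuchObject in the reply
        return "MISSING INSTANCE"
    return "OK: " + proc.stdout.strip()

while True:                        # Ctrl-C to stop
    print(time.strftime("%H:%M:%S"), poll_once())
    time.sleep(10)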
Cheers, James.