The problem I am trying to solve is being able to tell a customer accurately whether their home internet connection was up or down. For example, a customer calls in and says, "My internet was down for 2 minutes yesterday." We need to be able to verify that their connection was indeed down. Right now we have no easy way to do this.
Getting metrics like packet loss and jitter would be great too, though I realize the ICMP data path does not always equal customer experience, as many network devices handle ICMP traffic differently from regular traffic. However, ICMP pings over the internet do usually tell accurately whether a customer's modem is online or not.
Most devices out in the field, like ONTs and DSL modems, do not support SNMP but rather use TR-069 for management. Most of these devices only check in with the TR-069 ACS server once a day.
If the consumer device does support SNMP, it usually has a weak Broadcom or Qualcomm SoC, an outdated embedded Linux kernel, and limited RAM and storage. Most of these can't handle SNMP walks every 5 minutes, let alone every minute. We are talking about sub-$100 routers here, not Juniper, Cisco, Arista, etc.
Almost all of these consumer devices are connected to a carrier aggregation device like a DSLAM, OLT, Ethernet switch, or wireless access point. These access devices do support SNMP, but most manufacturers recommend SNMP polling no more often than every 5 minutes, so a 2 minute outage would not easily be detected. Plus, it's hard to correlate that consumer X is on port Y of a given access switch, and to get that right for a tier 1 CSR.
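To put a rough number on the polling problem, here is a minimal sketch in Python. It assumes each poll only sees the link state at the instant it runs and that the outage starts at a random point relative to the poll schedule. The function name `detection_probability` is made up for illustration.

```python
def detection_probability(outage_s: float, poll_interval_s: float) -> float:
    """Chance that at least one evenly spaced poll lands inside an outage.

    Assumes the outage start is uniformly random relative to the poll
    schedule and that a poll only observes the state at that instant.
    """
    if outage_s < 0 or poll_interval_s <= 0:
        raise ValueError("outage must be >= 0 and poll interval must be > 0")
    # An outage at least as long as the interval always contains a poll.
    return min(1.0, outage_s / poll_interval_s)


# A 2 minute outage against 5 minute polling is seen about 40% of the time.
print(detection_probability(120, 300))
```

Under those assumptions, more than half of all 2 minute outages would leave no trace in 5 minute poll data, and the ones that are caught cannot be timed more precisely than the poll interval.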
The only two ways I think I can accomplish this are:
1. ICMP pings to each device every so many seconds. Almost every device supports responding to ICMP pings on its WAN side.
or
2. IPFIX sampling at the core router, and then drilling down by customer IP. I think this will tell me if any data was flowing to this customer's IP on a second-by-second basis, but it won't necessarily give us an up or down indicator. It requires nothing from the consumer's router.
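To make option 1 more concrete, here is a minimal Python sketch of how periodic ping results could be turned into outage windows a CSR can look up. Everything in it is an assumption for illustration: the names `ping_once`, `Outage`, and `find_outages`, the use of the system `ping` command with Linux iputils flags (`-c`, `-W`), and the rule that three consecutive failed probes count as an outage (so a single lost packet is not reported as downtime).

```python
import subprocess
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo via the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


@dataclass
class Outage:
    start: float          # timestamp of the first failed probe in the run
    end: Optional[float]  # first successful probe afterwards; None if still down


def find_outages(
    samples: Iterable[Tuple[float, bool]], min_failures: int = 3
) -> List[Outage]:
    """Collapse (timestamp, success) samples into outage windows.

    A run of failed probes is reported only if it contains at least
    min_failures consecutive failures.
    """
    outages: List[Outage] = []
    run_start: Optional[float] = None
    run_len = 0
    for ts, ok in samples:
        if not ok:
            if run_len == 0:
                run_start = ts
            run_len += 1
            continue
        if run_len >= min_failures:
            outages.append(Outage(run_start, ts))
        run_start, run_len = None, 0
    if run_len >= min_failures:
        # Samples ended while the device was still unreachable.
        outages.append(Outage(run_start, None))
    return outages


# Example with probes every 10 seconds: one isolated loss at t=10,
# then the device is unreachable from t=30 until it answers at t=150.
samples = [(0, True), (10, False), (20, True)]
samples += [(t, False) for t in range(30, 150, 10)]
samples += [(150, True)]
print(find_outages(samples))
```

In practice a scheduler would call something like `ping_once` per customer IP every N seconds and store the results. The resolution of any reported outage is limited by the probe interval, so a 10 second interval can confirm a 2 minute outage but can only time it to within about 10 seconds at each end.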