Hi,

The link is not the only component that can fail; routers and routing protocols contribute at least as much. If your customers have redundant connections, you will also want to look at convergence times. An end-to-end measurement by a probe in the customer's network could therefore give you a truer picture.

Given that even sub-second outages can annoy participants in a video meeting, you may want to poll more often than once per second.

Once you realize that your "internet service" depends on the quality of all the other service providers, and you start monitoring that too, you understand that you are "in deep shit" ;-)

I did a small-scale global inter-domain measurement and found that the sheer number of small outages is far too high. Many of them might be routing changeovers in multi-redundant networks.

cheers
Olav

On 15.12.2018 18:55, Tim Pozar wrote:
> At one of my client companies, we use LibreNMS. It is normally used
> to collect SNMP data, but we also have it configured to ping the
> routers of our more "high-touch" clients. That way we can record
> performance metrics such as latency and packet loss. It generates
> graphs that we can pass on to the client, and it can also be set to
> alert us if a client's router is not pingable.
>
> LibreNMS can also integrate Smokeping if you want Smokeping-style
> graphs showing standard deviation, etc.
>
> Currently I am running LibreNMS on a VM with a couple of cores on a
> Proxmox cluster. It is probing 385 devices every 5 minutes and
> keeping up with that. In polling, SNMP is what really eats time and
> CPU, whereas ping is pretty low impact.
>
> Tim
>
> On 12/15/18 9:37 AM, Baldur Norddahl wrote:
>> You could configure BFD to send out an SNMP alert when three packets
>> have been missed on a 50 ms cycle, or instantly if the interface
>> changes state to down. This way you would know that they are down
>> within 150 ms.
>>
>> BFD is the hardware solution. A Linux box that has to ping 1000
>> addresses per second will be very taxed and likely unable to do that
>> in a stable way. You will have seconds where it fails to do them all,
>> followed by seconds where it attempts to do them more than once. The
>> result is that the statistics gathered are worthless. If you do
>> something like this, it is much better to have a less ambitious
>> 1-minute cycle.
>>
>> Take a look at Smokeping. If you want a graph showing the quality of
>> the line, Smokeping makes very good graphs for that.
>>
>> Regards,
>> Baldur
>>
>> On 15 Dec 2018 at 16:49, "Colton Conor" <colton.conor@gmail.com> wrote:
>>
>> How much compute and network resources does it take for an NMS to:
>>
>> 1. ICMP ping a device every second,
>> 2. record these results, and
>> 3. report an alarm after so many seconds of missed pings?
>>
>> We are looking for a system to monitor in near real time whether an
>> end customer's router is up or down. I assume SNMP would be too
>> resource-intensive, so ICMP pings seem like the only logical
>> solution.
>>
>> The question is: is pinging once a second too much polling for an
>> NMS and a consumer-grade router? Does it take much network bandwidth
>> and CPU from both the NMS and the CPE side?
>>
>> Let's say this is for a 1,000-customer ISP.
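Here is a minimal Python sketch that puts rough numbers on Colton's question. It assumes Linux iputils-style echo requests with the default 56-byte payload. That makes each packet 84 bytes at the IPv4 layer, with link-layer framing ignored.

All helper names and the consecutive-miss alarm class are hypothetical. They are not taken from LibreNMS or any other NMS. Detection time is approximated as probe interval × number of misses. The same rule covers Baldur's BFD example (3 × 50 ms) and a once-a-second ping with a 3-miss threshold.

```python
"""Rough numbers for once-a-second ICMP monitoring (sketch, not an NMS).

Assumption: iputils-style echo requests with the default 56-byte payload,
so each packet is 56 + 8 (ICMP) + 20 (IPv4) = 84 bytes at the IP layer.
Link-layer framing is ignored. All names here are hypothetical.
"""
from dataclasses import dataclass, field
from typing import Optional

IPV4_HEADER = 20      # bytes, no IP options
ICMP_HEADER = 8       # bytes, echo request/reply header
DEFAULT_PAYLOAD = 56  # bytes, iputils ping default data size


def probe_bandwidth_bps(hosts: int, interval_s: float = 1.0,
                        payload: int = DEFAULT_PAYLOAD) -> float:
    """Bits per second of echo requests; replies add the same again."""
    packet_bytes = IPV4_HEADER + ICMP_HEADER + payload
    return hosts * packet_bytes * 8 / interval_s


def detection_time_s(interval_s: float, misses: int) -> float:
    """Approximate time to declare a host down after `misses` lost probes."""
    return interval_s * misses


@dataclass
class MissedPingAlarm:
    """Per-host alarm raised after `threshold` consecutive missed pings."""
    threshold: int = 3
    misses: dict = field(default_factory=dict)  # host -> consecutive misses
    down: set = field(default_factory=set)      # hosts currently alarmed

    def record(self, host: str, replied: bool) -> Optional[str]:
        """Feed one probe result; return "DOWN" or "UP" on a state change."""
        if replied:
            self.misses[host] = 0
            if host in self.down:
                self.down.discard(host)
                return "UP"
            return None
        self.misses[host] = self.misses.get(host, 0) + 1
        if self.misses[host] >= self.threshold and host not in self.down:
            self.down.add(host)
            return "DOWN"
        return None


if __name__ == "__main__":
    print(probe_bandwidth_bps(1000))   # 672000.0 bit/s of requests
    print(detection_time_s(1.0, 3))    # 3.0 s with 1 s pings, 3 misses
    alarm = MissedPingAlarm(threshold=3)
    for ok in [True, False, False, False, False, True]:
        event = alarm.record("cpe-1", ok)
        if event:
            print("cpe-1", event)      # prints "cpe-1 DOWN", then "cpe-1 UP"
```

Under these assumptions, 1,000 hosts at one ping per second come to about 672 kbit/s of requests, plus the same again for replies. That is a modest load on most uplinks. The harder part, as Baldur points out, is keeping the probes on schedule.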
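Baldur's warning about a box trying to ping 1,000 addresses per second is essentially a scheduling problem. Below is a minimal Python sketch of one common mitigation. The function name, the one-slot lateness tolerance and the injectable clock are hypothetical, and none of it is taken from LibreNMS or Smokeping.

Each host gets a fixed slot in the cycle, computed from the cycle start rather than from the previous probe. A slot that has already been missed is skipped instead of being fired late. That way an overloaded second does not spill into the next one as duplicate probes.

```python
"""Spread N probes evenly over a fixed cycle using absolute deadlines.

Hypothetical helper, not part of LibreNMS or Smokeping. Host i gets the slot
start + k*cycle + i*cycle/N in cycle k. If the loop falls more than
`late_tolerance_s` behind a slot, that probe is skipped and reported instead
of being fired late, so a slow cycle never turns into a burst of catch-up
probes in the next one.
"""
import time
from typing import Callable, List, Optional, Sequence


def run_staggered(hosts: Sequence[str], cycle_s: float, cycles: int,
                  send: Callable[[str], None],
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep,
                  late_tolerance_s: Optional[float] = None) -> List[str]:
    """Probe each host at most once per cycle; return the skipped hosts."""
    step = cycle_s / len(hosts)
    if late_tolerance_s is None:
        late_tolerance_s = step          # allow up to one slot of lateness
    start = clock()
    skipped: List[str] = []
    for k in range(cycles):
        for i, host in enumerate(hosts):
            deadline = start + k * cycle_s + i * step
            now = clock()
            if now < deadline:
                sleep(deadline - now)    # early: wait for this host's slot
            elif now - deadline > late_tolerance_s:
                skipped.append(host)     # too late: drop rather than burst
                continue
            send(host)
    return skipped


if __name__ == "__main__":
    # Fake clock so the example runs instantly; each send costs 0.6 s,
    # which is deliberately slower than the 0.25 s slot spacing.
    t = [0.0]
    sent: List[str] = []

    def fake_send(host: str) -> None:
        sent.append(host)
        t[0] += 0.6

    missed = run_staggered(["a", "b", "c", "d"], 1.0, 2, fake_send,
                           clock=lambda: t[0],
                           sleep=lambda d: t.__setitem__(0, t[0] + d))
    print("sent:", sent)
    print("skipped:", missed)
```

The fake clock only exists so the example runs without sleeping. With a real `send` you would keep the default `time.monotonic` and `time.sleep`. Skipped slots are returned separately rather than counted as lost pings. That keeps a busy poller from being mistaken for a dead CPE, which is one way to avoid the worthless statistics Baldur describes.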