On Tue, 30 Oct 2007, Nesser, Phil wrote:
It has been a while since I have had to seriously think about network/system/application monitoring and now I have got to look at it. Can anyone point me towards:
1. Serious documents on monitoring (i.e. not vendor whitepapers)
I think there have been several sets of slides presented at previous NANOG meetings that may be of interest, but I'll have to locate specific URLs.
2. Open Source Tools that you use or would recommend (I know the obvious smokeping, mrtg, nagios).
As much as I hate to give a wishy-washy answer like "it depends", in this case, that's a reasonable start. What tools you use would depend on many factors, such as:

* hardware and OS platforms that are realistic for your organization

  Put another way, if your IT or net mgmt organization standardizes on some flavor of Windows as part of a regular server build, it might not make sense to use tools that require Linux, *BSD, etc, unless you have the people and processes to handle that. Since you mentioned tools like nagios and MRTG, I'm assuming you're working in the unix/Linux/*BSD world, but you know what they say about assumptions :)

* goals and metrics

  What information do you want to get out of your monitoring setup?

  Do you need to produce regular reports from your NM tools?

  Do they need to integrate with tools you already use?

  Do you want the tools to automatically trigger certain actions? If X consecutive pings to $router_ip fail, send out a page, email the NOC, etc...

  What data do you want to collect from your network devices? SNMP traps? Netflow records? Syslog messages? RMON?

  Do you need to visualize the data, i.e. generate usage graphs, top-talker scoreboards, etc?

  Do you need to store the output in a central SQL database so other apps can work with it, do reports, etc?

This is by no means an all-inclusive list, but I think it covers some of the important points.

jms
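
For what it's worth, the "if X consecutive pings to $router_ip fail, send out a page" rule mentioned above can be sketched in a few lines of Python. This is only a rough illustration, not how nagios or any other tool does it. The names ConsecutiveFailureAlarm, ping_once and watch are made up for this example, the notify callback stands in for whatever pager or email hook you actually have, and ping_once assumes a Linux-style ping where -c is the packet count and -W is the reply timeout in seconds (other platforms use different flags).

```python
import subprocess
import time
from typing import Callable, Optional


class ConsecutiveFailureAlarm:
    """Counts consecutive failed probes (hypothetical helper for this sketch)."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result.

        Returns True only on the probe that brings the failure streak up to
        the threshold, so a long outage triggers a single notification.
        """
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.threshold


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping; True if a reply came back.

    Assumption: Linux-style flags (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(
    host: str,
    threshold: int,
    probe: Callable[[str], bool] = ping_once,
    notify: Callable[[str], None] = print,
    interval_s: float = 60.0,
    max_probes: Optional[int] = None,
) -> None:
    """Probe host repeatedly and call notify once per outage.

    notify is a placeholder for a real pager or email hook. max_probes=None
    means run until interrupted.
    """
    alarm = ConsecutiveFailureAlarm(threshold)
    sent = 0
    while max_probes is None or sent < max_probes:
        if alarm.record(probe(host)):
            notify(f"{host}: {threshold} consecutive pings failed")
        sent += 1
        if max_probes is None or sent < max_probes:
            time.sleep(interval_s)
```

Something like watch("192.0.2.1", threshold=3) would then probe once a minute and print a message after the third failure in a row (192.0.2.1 is a documentation address, substitute your own router). Keeping the counting logic separate from the probe and the notification is deliberate: the same alarm could be fed from SNMP polls or syslog checks instead of pings.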