Rev. Jeffrey Paul (sneak) writes:
1) Is SNMP the best way to do this? Obviously some of the data (service checks) will need to be collected other ways.
SNMP with the vendor MIBs and SNMP extensions for monitoring hardware specifics (PSU, etc.), plus something like Nagios to do the TCP/network checks.
2) Is there any good solution that does both logging/trending of this data and also notification/monitoring/alerting? I've used both Nagios and Cacti in the past, and, due to the number of individual things being monitored (3-5 items per OS instance, 5-10 items per physical server, 10-50 things per network device), setting them both up independently seems like a huge pain. Also, I've never really liked Nagios that much.
Well, you could look at Zabbix, Hyperic, ZenOSS, OpenNMS and see if they cut it better for you, but the trick with Nagios is to use a DB and generate the include files automatically, then have some other, more user-friendly tools to populate the DB. Or use templates extensively. Then make sure your plugins output performance data, and graph it with something like NagiosGraph or PNP4Nagios:
http://nagiosgraph.wiki.sourceforge.net/
http://www.pnp4nagios.org/pnp/about#system_requirements
http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN203
http://www.pnp4nagios.org/pnp/screenshots
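To make the "use a DB and generate the include files" idea concrete, here is a minimal sketch in Python. The SQLite schema (`hosts`, `services`), the sample rows, and the template names `generic-host` and `generic-service` are assumptions for illustration only; adapt them to whatever inventory and templates you actually have.

```python
import sqlite3

# Nagios object definitions; the "use" lines refer to templates that are
# assumed to exist already in your Nagios configuration.
HOST_TEMPLATE = """define host {{
    use         {template}
    host_name   {name}
    address     {address}
}}
"""

SERVICE_TEMPLATE = """define service {{
    use                  {template}
    host_name            {host}
    service_description  {description}
    check_command        {command}
}}
"""


def render_config(conn):
    """Return Nagios object definitions for every host and service row."""
    parts = []
    for name, address in conn.execute(
            "SELECT name, address FROM hosts ORDER BY name"):
        parts.append(HOST_TEMPLATE.format(
            template="generic-host", name=name, address=address))
    for host, description, command in conn.execute(
            "SELECT host, description, command FROM services "
            "ORDER BY host, description"):
        parts.append(SERVICE_TEMPLATE.format(
            template="generic-service", host=host,
            description=description, command=command))
    return "\n".join(parts)


def demo_db():
    """Build a hypothetical in-memory inventory to demonstrate the idea."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hosts (name TEXT PRIMARY KEY, address TEXT);
        CREATE TABLE services (host TEXT, description TEXT, command TEXT);
        INSERT INTO hosts VALUES ('web01', '192.0.2.10');
        INSERT INTO hosts VALUES ('db01', '192.0.2.20');
        INSERT INTO services VALUES ('web01', 'HTTP', 'check_http');
        INSERT INTO services VALUES ('web01', 'SSH', 'check_ssh');
        INSERT INTO services VALUES ('db01', 'SSH', 'check_ssh');
    """)
    return conn


if __name__ == "__main__":
    # In practice you would write this to a file that nagios.cfg includes.
    print(render_config(demo_db()))
```

The point of the design is that the DB stays the single source of truth: whatever friendlier tool populates it, the include file is regenerated from scratch and never edited by hand.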
I recently entertained the idea of writing a CGI that output all of this information in a standard format (csv?), distributing and installing it, then collecting it periodically at a central location and doing all the rrd/notification myself, but then realized that this problem must've been solved a million times already.
Yes :) But check out the above links, and with a bit of planning and a small amount of coding/adapting existing components, it will work out.
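As an example of the "small amount of coding", here is a rough sketch of a plugin that emits performance data after the `|` separator, so that a grapher such as NagiosGraph or PNP4Nagios has something to trend. The script itself, its thresholds, and the label `time` are made up for illustration; the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the `label=value[UOM];warn;crit;min` layout follow the usual plugin conventions.

```python
import socket
import sys
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def format_result(seconds, warn, crit):
    """Map a connect time to an exit code and a status line with perf data."""
    if seconds >= crit:
        code = CRITICAL
    elif seconds >= warn:
        code = WARNING
    else:
        code = OK
    # Perf data: label=value[UOM];warn;crit;min
    perfdata = "time={:.3f}s;{:.3f};{:.3f};0".format(seconds, warn, crit)
    line = "TCP {} - {:.3f}s response time | {}".format(
        LABELS[code], seconds, perfdata)
    return code, line


def check_tcp(host, port, warn=1.0, crit=5.0):
    """Time a TCP connect; thresholds are hypothetical defaults in seconds."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=crit):
            pass
    except OSError as exc:
        return CRITICAL, "TCP CRITICAL - {}".format(exc)
    return format_result(time.monotonic() - start, warn, crit)


# Runs the check only when invoked as: check_tcp_sketch.py HOST PORT
if __name__ == "__main__" and len(sys.argv) == 3:
    code, line = check_tcp(sys.argv[1], int(sys.argv[2]))
    print(line)
    sys.exit(code)
```

Nagios reads the exit code for alerting and hands everything after the `|` to the performance-data processor, so one check feeds both notification and trending.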
There's got to be a better way. What do you guys use?
We rewrote our own NMS from scratch :)
(I'm not opposed to non-free solutions, provided they work better.)
We sell our solution, so I'm biased, but do check out the Nagios route. It works well enough for small to medium installations, and for larger ones with careful planning (the problem with Nagios is making it perform with thousands of hosts).
HTH,
Phil