recommendations for external monitoring services?
I'm not looking to monitor a massive infrastructure: 3 web sites, 2 mail servers (POP, IMAP, submission port, HTTPS webmail), 4 DNS servers (including lookups to ensure they're not listening but not talking), and one inbound MX, plus a few network points to ping to ensure connectivity throughout my system.

Scheduled notification windows (for example, during work hours I don't want my phone pinged unless everything is going offline; off hours I do. Secondary notifications to other users if a problem persists, or in the event of many triggers. That sort of thing).

Sensitivity settings (if web server 1 shows down for 5 minutes, that's not a big deal; for another one, failing to respond to repeated queries within 1 minute is a big deal).

A weekly summary of issues would be nice (especially the 'well, it was down for a short bit but we didn't notify as per settings' cases).

I don't have a lot of money to throw at this. I DO have detailed internal monitoring of our systems, but sometimes that is not entirely useful because there are a few 'single points of failure' within our network/notification system, not to mention that if the monitor itself goes offline it's not exactly going to be able to tell me about it (and that happened once, right before the mail server decided to stop receiving mail).

__________________________
Eric Esslinger
Information Services Manager - Fayetteville Public Utilities
http://www.fpu-tn.com/
(931)433-1522 ext 165

This message may contain confidential and/or proprietary information and is intended for the person/entity to whom it was originally addressed. Any use by others is strictly prohibited.
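For readers comparing services against these requirements, the notification-window and sensitivity rules in the request could be sketched roughly as follows. This is only an illustration under stated assumptions: the 08:00-17:00 weekday work hours, the function names, and the threshold values are all hypothetical and do not come from any particular monitoring product.

```python
from datetime import datetime, time

# Assumed work hours (not stated in the request): weekdays, 08:00-17:00.
WORK_START = time(8, 0)
WORK_END = time(17, 0)


def in_work_hours(now: datetime) -> bool:
    """True on Monday-Friday between WORK_START and WORK_END."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def should_page(now: datetime, down_count: int, total_checks: int) -> bool:
    """Decide whether to page the phone.

    During work hours, page only if every monitored check is down.
    Off hours, page as soon as anything is down.
    """
    if in_work_hours(now):
        return total_checks > 0 and down_count == total_checks
    return down_count > 0


def exceeded_tolerance(down_seconds: float, tolerance_seconds: float) -> bool:
    """Per-check sensitivity: alert only once the outage has lasted
    at least as long as that check's tolerance."""
    return down_seconds >= tolerance_seconds
```

With these rules, a single failed check at 10:00 on a Monday would not page anyone, while the same failure at 22:00 would. A check with a 300-second tolerance stays quiet through a 4-minute blip, which is the kind of event the weekly summary would need to report.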
Take a look at Panopta - we use it to complement our internal monitoring and find it great compared to some of the systems we've used in the past (Pingdom, Binary Canary). The interface is easy to use and responsive, we don't get false positives, and there is a good range of checks. There's an API as well if you want to integrate it. I'd stay clear of the software agent though; we've had a few issues with that. For remote service checks we love it.

Edward Dore
Freethought Internet
Nagios and Zabbix are the ones I am most familiar with. Zabbix is a bit involved to set up and may not be what you need at this scale. Nagios is a bit cumbersome to keep up with rapidly changing systems of any size, but is good for small (and large) setups that are more static. It is not without its quirks, mind, and takes a bit of work to set up if you've never done it before, but it doesn't require a DB backend or anything else, just a server to put it on. No agent is needed as long as everything you want to check is "gettable" from the server, such as checking that a mail server is available for connections. It can also use agent checks, or pretty much any other kind of check.

-- http://neon-buddha.net
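To illustrate what an agentless "is the mail server available for connections" check amounts to, here is a minimal Python sketch using only the standard library. It is not Nagios code or a Nagios plugin; the helper names and the host names are hypothetical placeholders. The port numbers are the standard ones for POP3 (110), IMAP (143), mail submission (587), and HTTPS (443).

```python
import socket


def tcp_service_up(host, port, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the function can be exercised without
    touching the network; by default it is socket.create_connection.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
    conn.close()
    return True


# Hypothetical targets; replace with real hosts before use.
TARGETS = [
    ("mail1.example.net", 110),  # POP3
    ("mail1.example.net", 143),  # IMAP
    ("mail1.example.net", 587),  # submission
    ("mail1.example.net", 443),  # HTTPS webmail
]


def run_checks(targets=TARGETS):
    """Return a dict mapping (host, port) to True/False."""
    return {(host, port): tcp_service_up(host, port) for host, port in targets}
```

A successful connect only shows that something is listening. It would not catch a server that accepts connections but never answers, so a fuller check would also read the protocol banner or issue a real query.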
I have been passively looking for external monitoring with similar requirements, though I'm curious about one more requirement if people chiming in can share it - whether or not said vendor supports IPv6 for both external service checks and potentially for agent communications as well.
You may want to check out http://www.panopta.com/ - it works well for me, with reasonable pricing.

Derrick
+1 to Panopta. We have been using them for the past two years and they have been very solid. We have even put in a few feature requests (voice notifications was one we specifically requested) and they have had them implemented and pushed out for beta testing in a couple of weeks. I would highly recommend them.
Two I know and have used are Alertra and SiteRecon.
Some external monitoring services I could recommend:

https://circonus.com/
http://mon.itor.us/
http://pingdom.com/
I'll throw another into the list: http://www.watchmouse.com/ They have some nice features and monitoring over IPv4 and IPv6. -DMM
Hi Eric. The feature set you are describing should be in any monitoring system worthy of the name. I've used Nagios to good effect for the best part of the last 12 years or so. Before that I used Big Brother, which sucked in various ways.

I did an evaluation of a wide variety of FOSS monitoring systems 2-3 years ago and Nagios won at the time (again). Generally I found the alternatives had problems that I considered to be quite serious (such as being overly complicated or doing checks so frequently that they loaded the systems they were supposed to be monitoring[1]).

I'm currently trialing Icinga, a fork of Nagios.

Puppet can be set up to manage Nagios/Icinga config, which cuts down on the admin overhead.

Nagios/Icinga can be hooked up to Collectd to provide performance data as well as alert monitoring.

One concern about external monitoring services is the level of visibility they need to have into your network to adequately monitor it.

My recommendation is to do a proper risk assessment on the available options.

On your point about the internal monitor itself going offline: there are a couple of ways to deal with this. Some monitoring applications can fail over to a standby server if the primary fails. But this isn't even really necessary. You will arguably gain higher reliability by running multiple _independent_ monitors and having them monitor each other[2]. I have often used this approach.

The principal aim here is to guarantee that you are alerted to any single failure (a production service, system or a monitor). Multiple simultaneous failures could still produce a blackspot. It is possible to design a system that will discover multiple simultaneous failures, but it takes more effort and resources.

[1] Sometimes I wonder if the people developing certain systems have any operational experience at all.

[2] A system designed to fail-over on certain conditions may fail to fail-over, ah, so to speak.

Cheers,

Rob

-- Email: robert@timetraveller.org Linux counter ID #16440 IRC: Solver (OFTC & Freenode) Web: http://www.practicalsysadmin.com Director, Software in the Public Interest (http://spi-inc.org/) Free & Open Source: The revolution that quietly changed the world "One ought not to believe anything, save that which can be proven by nature and the force of reason" -- Frederick II (26 December 1194 – 13 December 1250)
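The idea of independent monitors watching each other can be sketched in a few lines of Python. This is only an illustration of the principle, not how Nagios or Icinga implement it: the function name, the monitor names, and the 120-second threshold are assumptions made for the example. Each monitor would run this logic on its own, using the timestamps at which it last heard from each of its peers.

```python
def stale_peers(last_seen, now, max_age=120.0):
    """Return the names of peer monitors that have gone quiet.

    last_seen maps a peer monitor's name to the time (in seconds) at
    which this monitor last got a successful response from it. A peer
    is considered stale if that was more than max_age seconds ago.
    """
    return sorted(
        name for name, seen_at in last_seen.items() if now - seen_at > max_age
    )


# Hypothetical view from "monitor-a": it last heard from monitor-b
# 150 seconds ago and from monitor-c 50 seconds ago.
peers_seen_by_a = {"monitor-b": 1000.0, "monitor-c": 1100.0}
alerts = stale_peers(peers_seen_by_a, now=1150.0)
# alerts == ["monitor-b"]
```

Because every monitor runs the same check against the others, the loss of any single monitor is noticed by the survivors. If all of them fail at once there is nobody left to raise the alert, which is the blackspot described above.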
SolarWinds, as you send in the specific MIB you need to monitor and a week later it's in the general release.
participants (11)
- David Miller
- Derrick H.
- Edward Dore
- Eric J Esslinger
- Express Web Systems
- Jim Richardson
- Mark Gauvin
- Michiel Klaver
- Robert Brockway
- Ryan Rawdon
- Scott Berkman