Hi all, curious if anyone has recommendations on software that helps manage routine duties assigned to operations staff?

For example, let’s say we have a P&P that says someone from the netops group must check that Rancid is successfully backing up all router configs bi-weekly. Ideally, it would send an email reminder to this pre-defined group of people saying hey, it’s Monday, someone needs to check this and come acknowledge the task as having been completed. If that doesn’t occur, pre-defined manager X is notified on Tuesday. If manager X doesn’t get someone to complete the task, director Y is notified, so on and so forth. Then, perhaps periodically it emails manager X anyway and says hey, it’s been three months, you need to audit netops to ensure they’re actually doing the Rancid audit and not just checking that it was done. This could be applied to the staff who check on backup failures, backup internet circuit status, out of band interfaces, etc.

A data center I looked at recently had QR code stickers on all of their infrastructure stuff and there were staff assigned to check and log certain displayed values each day. The software would at least ensure they actually visited the equipment by requiring they scan the relevant QR code when in front of it. So I figure something that does what I’m looking for properly already exists.

Thanks,
David
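The escalation rule described in this post can be sketched in a few lines of shell. This is only an illustration under assumptions: the recipient names (`netops-group`, `manager-x`, `director-y`), the one-day escalation step, and the 90-day audit interval are placeholders, not taken from any existing product.

```shell
#!/bin/sh
# Hypothetical escalation ladder: who gets notified depends on how many
# days a task has gone unacknowledged. All names are placeholders.

# escalation_target DAYS_OVERDUE -> prints the recipient for that day
escalation_target() {
    case $1 in
        0) echo "netops-group" ;;   # Monday: remind the team
        1) echo "manager-x" ;;      # Tuesday: still unacknowledged
        *) echo "director-y" ;;     # Wednesday onward
    esac
}

# needs_audit DAYS_SINCE_LAST_AUDIT -> succeeds when the periodic
# manager audit (assumed here to be every 90 days) is due
needs_audit() {
    [ "$1" -ge 90 ]
}
```

A daily cron job could call `escalation_target` with the number of days since the reminder went out and mail whoever it prints.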
Been meaning to dig into this one:
https://www.upguard.com/blog/guardrail-tasks-a-lightweight-tracking-system-f...

--srs
On 7/27/16, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
> Hi all, curious if anyone has recommendations on software that helps manage routine duties assigned to operations staff?

Have computers do the routine scut work - not people.

> For example, let’s say we have a P&P that says someone from the netops group must check that Rancid is successfully backing up all router configs bi-weekly.

You've got the source code for rancid, so change rancid-run to do something like

    LOGFILE=$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S`; export LOGFILE

change the

    ) >$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S` 2>&1

to

    ) >$LOGFILE 2>&1

and then in control_rancid do something like

    grep "clogin error:" $LOGFILE | sort | uniq -c >$TMP.fail
    if [ -s $TMP.fail ]; then
      # got some output, mail the report
    ...

Do the same type thing for checking on

> backup failures, backup internet circuit status, out of band interfaces, etc.

Automate the checks, put the scripts in crontab & mail out an "OhNoes!" or "all clear" msg at the end. At which point you're left with the problem of making sure the managers are looking at the emails & making sure whatever problems are found actually get fixed :)

Regards,
Lee
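A self-contained variant of the check Lee outlines might look like the sketch below. It is not part of rancid: the function name `summarize_rancid_log` is made up, and treating an unreadable log as a failure is an added assumption. Only the "clogin error:" pattern and the "OhNoes!" / "all clear" wording come from the message above.

```shell
#!/bin/sh
# summarize_rancid_log LOGFILE
# Prints an "OhNoes!" report with a count of each distinct
# "clogin error:" line, or "all clear" when there are none.
# Returns 1 on any problem so a cron wrapper can act on it.
summarize_rancid_log() {
    logfile=$1
    # Assumption: a missing or unreadable log is itself a failure.
    if [ ! -r "$logfile" ]; then
        echo "OhNoes! cannot read $logfile"
        return 1
    fi
    failures=$(grep "clogin error:" "$logfile" | sort | uniq -c)
    if [ -n "$failures" ]; then
        printf 'OhNoes! rancid failures in %s:\n%s\n' "$logfile" "$failures"
        return 1
    fi
    echo "all clear: no clogin errors in $logfile"
}
```

From cron, the output could be piped to whatever mailer is installed, for example `summarize_rancid_log "$LOGFILE" | mail -s "rancid check" netops@example.com` (the address is a placeholder).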
Full automation is planned but does not eliminate the need for the software. Zero human auditing of fully automated processes and data collection is not acceptable to various certifying entities, the relevant auditors, or the inevitably involved lawyers, and automation alone won’t pick up on bad data, like a bad thermometer or SNMP counter that says a CRAC is 65 degrees when it’s really 90. So I’m still going to need a management solution to the issue, whether it’s to tell someone to do the work or to tell someone to check the automated work.

David
On 7/27/16, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
> Full automation is planned but does not eliminate the need for the software. Zero human auditing of fully automated processes and data collection are not acceptable to various certifying entities, the relevant auditors, the inevitably involved lawyers, and won’t pick up on bad data, like a bad thermometer or snmp counter that says a CRAC is 65 degrees when it’s really 90. So I’m still going to need a management solution to the issue whether it’s to tell someone to do the work or to tell someone to check the automated work.

You have a ticketing system - right? Create a cron job that creates a ticket to check whatever.

Regards,
Lee
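One wrinkle with the cron approach: cron has no native "every other week" schedule, so a bi-weekly task needs a gate inside the script. The sketch below is one way to do that, assuming weeks are counted from the Unix epoch. `create_ticket` is a hypothetical stand-in for whatever your ticketing system provides, and the script path in the comment is a placeholder.

```shell
#!/bin/sh
# is_ticket_week EPOCH_SECONDS
# Succeeds on alternate weeks, counted from the Unix epoch. Which
# parity counts as the "on" week is arbitrary.
is_ticket_week() {
    weeks=$(( $1 / 604800 ))      # 604800 = seconds in 7 days
    [ $(( weeks % 2 )) -eq 0 ]
}

# Hypothetical stand-in for the real ticket creation step
# (an API call, a mail to the ticket queue, ...).
create_ticket() {
    echo "TICKET: $1"
}

# run_biweekly EPOCH_SECONDS SUBJECT
run_biweekly() {
    if is_ticket_week "$1"; then
        create_ticket "$2"
    fi
}

# Example crontab line (Mondays 08:00):
#   0 8 * * 1  /usr/local/bin/rancid-ticket.sh
# and inside that script (date +%s is widely available, not strictly POSIX):
#   run_biweekly "$(date +%s)" "Verify Rancid backed up all router configs"
```

Because consecutive Mondays are exactly seven days apart, the week counter increases by one each run and the ticket is created every second Monday.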
Jira works well as a task tracking system for ops: customizable workflows, decent integration with LDAP, etc. It is also good for tracking software projects. Having both software and ops tasks in one place has many benefits.
We use Redmine, combined with scripts that call its API to create automated tickets/tasks that NOC or engineers need to attend to. It has email notifications, wiki, documents, files, code repo, calendar, and customisable fields all built in.

--
Jeroen Wunnink
IP Engineering Manager
Hibernia Networks - Amsterdam Office
Main numbers (Ext: 1011):
USA +1.908.516.4200 | Canada +1.902.442.1780
Ireland +353.1.867.3600 | UK +44.1704.322.300 | Netherlands +31.208.200.622
24/7/365 IP NOC Phone: +31.20.82.00.623
Jeroen.Wunnink@hibernianetworks.com
www.hibernianetworks.com
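For anyone wanting to try the Redmine route, a script along these lines could create the ticket. It is a sketch, not Jeroen's actual tooling: it relies on Redmine's REST endpoint `POST /issues.json` with the `X-Redmine-API-Key` header, and assumes `REDMINE_URL` and `REDMINE_API_KEY` are set by the caller. The function names are made up for illustration.

```shell
#!/bin/sh
# redmine_issue_json PROJECT_ID SUBJECT
# Builds the JSON body for Redmine's "create issue" call.
# PROJECT_ID is assumed to be numeric. Only backslashes and double
# quotes in SUBJECT are escaped, which is enough for plain one-line
# subjects but not for arbitrary text.
redmine_issue_json() {
    subject=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"issue":{"project_id":%s,"subject":"%s"}}' "$1" "$subject"
}

# create_redmine_issue PROJECT_ID SUBJECT
# Sends the payload to the Redmine instance at $REDMINE_URL.
create_redmine_issue() {
    redmine_issue_json "$1" "$2" |
        curl -sS -X POST \
             -H "Content-Type: application/json" \
             -H "X-Redmine-API-Key: $REDMINE_API_KEY" \
             --data-binary @- \
             "$REDMINE_URL/issues.json"
}
```

Called from cron as `create_redmine_issue 3 "Check Rancid backups"`, where `3` is a placeholder project id.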
On 27 July 2016 at 21:16, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:

Hey,

> Hi all, curious if anyone has recommendations on software that helps manage routine duties assigned to operations staff?

I'd solicit opinions as well. There are a few features I'd like to see:

1) ability to create parent+child tickets: if all children are closed, the parent closes; if the parent is closed, the children close
2) ability to create dependencies: perhaps I have some design change I want to make, but it can't be done until a large bunch of operational work is done. I could create tickets for ops, then create a ticket for myself and make it depend on the ops tickets being solved. It wouldn't be seen in my work queue until all solve-dependencies are solved.
3) user (non-admin) access to the API: if the UI is bad, like it probably is for the very small subset of things I need, I could create my own CLI UI addressing solely the use cases that are relevant to me, in a streamlined, low-time-cost UI. In the dream scenario the shipped web UI is dog-fooding the documented API, so anything I can do there, I can do from my own CLI UI.

There are probably others, but those are the main things I think I need.

--
  ++ytti
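The dependency rule in item 2 is simple enough to sketch. The example below assumes a made-up plain-text format, one ticket per line as `ID STATUS [DEP,DEP,...]`, and prints only the open tickets whose dependencies are all closed. It does not cover the parent/child closing behaviour from item 1.

```shell
#!/bin/sh
# workable_tickets < TICKET_FILE
# Input format (hypothetical): "ID STATUS [DEP,DEP,...]" per line.
# Prints the IDs of open tickets whose dependencies are all closed,
# i.e. the tickets that belong in someone's work queue right now.
workable_tickets() {
    awk '
        { status[$1] = $2; deps[$1] = $3; order[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                id = order[i]
                if (status[id] != "open") continue
                ready = 1
                n = split(deps[id], d, ",")
                # An unknown or still-open dependency blocks the ticket.
                for (j = 1; j <= n; j++)
                    if (status[d[j]] != "closed") ready = 0
                if (ready) print id
            }
        }'
}
```

With tickets `ops1 closed`, `ops2 open`, and `design open ops1,ops2`, the design ticket stays hidden until `ops2` is closed.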
participants (6)
- David Hubbard
- Jeroen Wunnink
- Lee
- Matt Ryanczak
- Saku Ytti
- Suresh Ramasubramanian