On 7/27/16, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
> Hi all, curious if anyone has recommendations on software that helps manage routine duties assigned to operations staff?
Have computers do the routine scut work - not people.
> For example, let’s say we have a P&P that says someone from the netops group must check that Rancid is successfully backing up all router configs bi-weekly.
You've got the source code for rancid, so change rancid-run to do something like

    LOGFILE=$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S`; export LOGFILE

change the

    ) >$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S` 2>&1

to

    ) >$LOGFILE 2>&1

and then in control_rancid do something like

    grep "clogin error:" $LOGFILE | sort | uniq -c >$TMP.fail
    if [ -s $TMP.fail ]; then
        # got some output, mail the report
        ...

Do the same type thing for checking on backup failures, backup internet circuit status, out of band interfaces, etc.
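As a rough, standalone sketch of that kind of check (not a patch to rancid itself), something like the following could be run from cron against a saved log. The function name summarize_failures, the message wording, and the assumption that failed logins show up as lines containing "clogin error:" are all illustrative; adjust the pattern to whatever your logs actually contain.

```shell
#!/bin/sh
# Sketch only: summarize "clogin error:" lines found in one log file.
# summarize_failures is a hypothetical helper, not something rancid ships.

summarize_failures() {
    # $1 = path to a log file written by a rancid run
    logfile=$1

    # Count identical failure lines. grep exits non-zero when nothing
    # matches, which is the good case here, so do not treat it as fatal.
    fails=$(grep "clogin error:" "$logfile" | sort | uniq -c) || true

    if [ -n "$fails" ]; then
        # Got some output: print a report the caller can mail out.
        printf 'OhNoes! clogin failures in %s:\n%s\n' "$logfile" "$fails"
        return 1
    fi

    printf 'all clear: no clogin errors in %s\n' "$logfile"
    return 0
}
```

From a cron job you might pipe the function's output into whatever mailer you already use, so that both the "OhNoes!" and the "all clear" cases produce a message.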
Automate the checks, put the scripts in crontab & mail out an "OhNoes!" or "all clear" msg at the end. At which point you're left with the problem of making sure the managers are looking at the emails & making sure whatever problems are found actually get fixed :)

Regards,
Lee
> Ideally, it would send an email reminder to this pre-defined group of people saying hey, it’s Monday, someone needs to check this and come acknowledge the task as having been completed. If that doesn’t occur, pre-defined manager X is notified on Tuesday. If manager X doesn’t get someone to complete the task, director Y is notified, so on and so forth. Then, perhaps periodically it emails manager X anyway and says hey, it’s been three months, you need to audit netops to ensure they’re actually doing the Rancid audit and not just checking that it was done. This could be applied to the staff who check on backup failures, backup internet circuit status, out of band interfaces, etc.
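If you end up scripting the reminders yourself, the escalation ladder described above could be sketched roughly as below. Everything here is an assumption for illustration: the recipient names, the idea that someone acknowledges a task by creating a marker file, and the choice to escalate to the director from the second overdue day onward (the original only says the manager is notified on Tuesday).

```shell
#!/bin/sh
# Sketch only: decide who should get today's reminder for one task.
# Recipient names and the ack-file convention are hypothetical.

escalation_target() {
    # $1 = whole days since the task came due without acknowledgement
    days=$1
    if [ "$days" -le 0 ]; then
        echo "netops-group"      # due day (e.g. Monday): remind the group
    elif [ "$days" -eq 1 ]; then
        echo "manager-x"         # one day overdue (e.g. Tuesday)
    else
        echo "director-y"        # assumed: two or more days overdue
    fi
}

notify_target() {
    # $1 = path to the acknowledgement marker file
    # $2 = whole days since the task came due
    if [ -e "$1" ]; then
        echo "none"              # task was acknowledged, nobody to nag
    else
        escalation_target "$2"
    fi
}
```

A daily cron job could call notify_target for each task and mail whoever it names; the periodic three-month audit reminder would be a separate, simpler cron entry.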
> A data center I looked at recently had QR code stickers on all of their infrastructure stuff and there were staff assigned to check and log certain displayed values each day. The software would at least ensure they actually visited the equipment by requiring they scan the relevant QR code when in front of it. So I figure something that does what I’m looking for properly already exists.
> Thanks,
> David