I am trying to take stock of all the network tools we employ on our network and come up with a concise list of metrics I can compile from them to report to management on a rolling basis, reflecting the health of our enterprise network. I've started a list and was hoping others might help me out by adding to it. If a measurement you suggest is not achievable with a tool I have, I'll probably pester you to find out what you are using. But to keep it simple, I'm just looking for things to measure.

Assuming a monthly reporting schedule, here's the list I have so far:

1. Uptime per WAN or Internet circuit
2. Number and average length of outages
3. Bandwidth utilization per WAN/Internet circuit and "important" VLANs
4. Overall network latency: RTT measured from various parts of the network (Cisco IPM) to various other parts
5. Top talkers per WAN circuit
6. Top destinations per WAN circuit
7. Top 10 most utilized WAN circuits (% burst above CIR, etc.)
8. Protocol distribution per WAN circuit
9. Syslog/Sniffer alarms by severity
10. Application response time for key apps (e.g., SAP, HTTP)
11. Security incidents
12. TACACS reports on number of logins, changes, etc.
13. Bandwidth/latency trending

What am I missing?

Thanks!
-BM
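For items 1 and 2, a minimal sketch of how monthly uptime and average outage length could be derived from a list of outage start/end times. The function name and the input format are assumptions made for illustration; they are not tied to any particular monitoring tool, and outages are assumed not to overlap.

```python
from datetime import datetime, timedelta

def circuit_availability(outages, period_start, period_end):
    """Return (uptime_percent, outage_count, mean_outage) for one circuit.

    outages: list of (start, end) datetime pairs, assumed non-overlapping.
    """
    period = period_end - period_start
    durations = []
    for start, end in outages:
        # Clip each outage to the reporting period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            durations.append(clipped_end - clipped_start)
    downtime = sum(durations, timedelta())
    uptime_percent = 100.0 * (1 - downtime / period)
    mean_outage = downtime / len(durations) if durations else timedelta()
    return uptime_percent, len(durations), mean_outage
```

Clipping to the reporting period keeps an outage that straddles a month boundary from being counted in full in both months.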
On Fri, 2 Nov 2001, Murphy, Brennan wrote:
> What am I missing?
If you're going through all this trouble, you might as well measure some interface statistics, such as:

- CRC errors: these are an important clue indicating lower-layer problems
- collisions, if you use any non-switched Ethernet (or even if it's switched: duplex mismatch is a bad thing, and it happens)
- overruns and underruns: these indicate (transient) high CPU loads
- input/output drops, to see if you are experiencing congestion

And:

- router CPU load

These should all be easy to measure with MRTG if you can find out how to read the info from the box using SNMP.
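Interface error and drop statistics are usually exposed over SNMP as ever-increasing counters (for example, ifInErrors in the standard interfaces MIB is a 32-bit counter), so a report needs the difference between two polls rather than the raw value. Here is a rough sketch of that arithmetic, including counter wrap. The sample dictionaries and their keys are hypothetical, and actually fetching the values from the box is left out.

```python
COUNTER32_MODULUS = 2 ** 32

def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two samples of a wrapping SNMP counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    return (current - previous) % modulus

def error_ratio(prev_sample, curr_sample):
    """Input errors as a fraction of all input packets seen in one interval.

    Each sample is a dict with the hypothetical keys 'in_errors' and
    'in_packets' (packets received without error).
    """
    errors = counter_delta(prev_sample["in_errors"], curr_sample["in_errors"])
    packets = counter_delta(prev_sample["in_packets"], curr_sample["in_packets"])
    total = errors + packets
    return errors / total if total else 0.0
```

A ratio that stays above zero month after month on one interface is a reasonable thing to flag in the report, even when the absolute numbers look small.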
Participants (2): Iljitsch van Beijnum; Murphy, Brennan