On Wed, Feb 22, 2006 at 12:50:34PM -0600, Tom Sands wrote:
> A lot of smaller folks check the counter every 5 min and use that same value for the 95th percentile. Most of us larger folks need to check more often to prevent 32bit counters from rolling over too often.
Actually, a lot of people do 5 minutes... and I would say that larger companies don't check more often because they are using 64-bit counters, as anyone with more than about 100 Mbps of traffic should be.
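To put a number on that threshold, here's a back-of-the-envelope sketch (the function names are mine, purely for illustration): how long a saturated link takes to wrap a 32-bit octet counter, and how to take a wrap-safe delta between two readings, assuming at most one wrap between polls.

```python
def seconds_to_wrap(counter_bits: int, rate_bps: float) -> float:
    """Time for a link running flat-out at rate_bps to wrap an
    octet counter of the given width."""
    return (2 ** counter_bits) * 8 / rate_bps

def counter_delta(prev: int, curr: int, counter_bits: int = 32) -> int:
    """Wrap-safe difference between two successive counter readings.
    Only valid if the counter wrapped at most once between polls."""
    modulus = 2 ** counter_bits
    return (curr - prev) % modulus

# At 100 Mbps a 32-bit octet counter wraps in under six minutes,
# so a 5-minute poll is already on the edge:
print(seconds_to_wrap(32, 100e6))   # ~343.6 seconds
```

That ~343-second figure is where the "about 100 Mbps" rule of thumb comes from; a 64-bit octet counter at the same rate takes thousands of years to wrap.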
Counter size is an incomplete reason for choosing a polling interval. If you need a 5-minute average and poll your routers once every five minutes, what happens if an SNMP packet gets lost? In the best case, a retransmission after Y seconds sees it through, but now you've got 300+Y seconds in what was supposed to be a 300-second average... and your next datapoint will be a 300-Y-second average unless you reschedule it into the future. In the worst case, you've lost the datapoint entirely, which costs you not just the datapoint ending in that five-minute span, but the next one as well. Sure, you can synthesize two 5-minute averages from one 10-minute average (presuming your counters wouldn't roll), but this is still a loss of data - one of those two datapoints should have been higher than the other.

At a place of previous employ, we solved this problem with a 30-second (!) polling interval and a home-written polling engine (C, linking to the UCD-SNMP library, now net-snmp) that did its best to emit and receive as many queries in as short a space of time as it could, without flooding the monitored devices. In those circumstances we could lose several datapoints and still construct valid 5-minute averages from the pieces - combinations of 30-, 60-, 90-second (etc.) averages, weighting each by the number of seconds it represents within the 300-second span.

Our operations staff also enjoyed being able to see the graphical response to changes in traffic balancing within half a minute... better, faster feedback. Another factor that makes counter size a poor basis for choosing a polling interval.
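The weighted reconstruction described above is simple enough to sketch. This is my own illustration of the idea, not the original poller's code: each surviving sub-interval average is weighted by the number of seconds it covers within the 300-second span.

```python
def combine_averages(pieces):
    """Rebuild one average from sub-interval averages of varying lengths.

    pieces: list of (seconds, avg_bps) tuples covering a 300-second span,
    e.g. a mix of 30-, 60-, and 90-second averages when some polls were
    lost. Each piece is weighted by the seconds it represents.
    """
    total_seconds = sum(s for s, _ in pieces)
    weighted_sum = sum(s * avg for s, avg in pieces)
    return weighted_sum / total_seconds

# Eight clean 30-second samples at 1 Mbps, plus one 60-second average
# at 2 Mbps spanning a lost poll:
pieces = [(30, 1e6)] * 8 + [(60, 2e6)]
print(combine_averages(pieces))   # 1200000.0 bits/s over the full 300 s
```

Losing a single 30-second poll here only blurs one-tenth of the window, instead of wiping out two whole 5-minute datapoints.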
> In our setup, as with a lot of people likely, any data that is older than 30 days is averaged. However, we store the exact maximums for the most current 30 days.
You keep no record? What do you do if a customer challenges their bill? Synthesize 5-minute datapoints out of the larger averages? I recommend keeping the 5-minute averages in perpetuity, even if that means having an operator burn the data to CD and store it in a safe (not under his desk in the pizza boxes, nor under his soft drink as a coaster).

--
David W. Hankins
Software Engineer
Internet Systems Consortium, Inc.

"If you don't do it right the first time, you'll just have to do it again." -- Jack T. Hankins