[ On Monday, June 4, 2001 at 00:21:31 (+1000), Geoff Huston wrote: ]
Subject: Re: 95th Percentile again (was RE: C&W Peering Problem?)
> No, it's not obvious. The SNMP byte counters are odometers: as long as you get two clean samples per counter wrap, you can accurately count bytes. The trick is to ensure that you get a minimum of two clean samples of the odometer reading per counter wrap. For high-speed interfaces that typically implies reading the MIB-2 64-bit interface counters, or triggering an SNMP poll at relatively tight time intervals.
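A minimal Python sketch of the odometer arithmetic follows. It assumes the poll interval is shorter than the counter's wrap period, so at most one wrap separates two readings. The function names and link speeds are illustrative and not taken from any particular collector. A 64-bit octet counter (such as IF-MIB's ifHCInOctets) is modelled simply as arithmetic modulo 2**64.

```python
# Octet counters behave like odometers: they only count up and wrap to zero.
COUNTER32_MOD = 2**32
COUNTER64_MOD = 2**64


def counter_delta(prev: int, curr: int, modulus: int) -> int:
    """Octets counted between two readings.

    Assumes at most one wrap happened between the polls, i.e. the poll
    interval is shorter than the counter's wrap period.
    """
    return (curr - prev) % modulus


def wrap_period_seconds(modulus: int, link_bps: float) -> float:
    """Seconds for an octet counter to wrap with the link running flat out."""
    return modulus * 8 / link_bps


# A 32-bit counter read just before and just after a wrap.
print(counter_delta(4_294_967_000, 200, COUNTER32_MOD))   # 496
print(wrap_period_seconds(COUNTER32_MOD, 100_000_000))    # about 343.6 s at 100 Mbit/s
print(wrap_period_seconds(COUNTER32_MOD, 1_000_000_000))  # about 34.4 s at 1 Gbit/s
```

At 100 Mbit/s a 32-bit octet counter can wrap in under six minutes, and at 1 Gbit/s in about 34 seconds. A 64-bit counter pushes the wrap period out to centuries even at 10 Gbit/s.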
The worst problem with using SNMP counters is not the wrap-around (properly implemented, that happens "rarely" even on high-speed links, since the `standard' does `mandate' use of 64-bit counters for truly high-speed links), but rather accidental resets caused by improper agent implementations, or reboots (or both). You have to detect not only counter roll-over but also resets, and you can only do the latter if the agent's uptime value is also reset when the counters are reset. Otherwise you have to do what MRTG and recent versions of Cricket do and simply ignore all roll-over and reset events (and thus take the loss on the counter deltas for those intervals).

That is why taking measurements of even MIB-2 64-bit counters very frequently (e.g. as often as every five minutes) is "wise" even if you're simply billing on bulk throughput per period. It's not very hard to scale a collection engine that can run in parallel (on parallel hardware if necessary) to do this, and indeed the data volume should not be an issue even at a one-minute collection interval!

Another problem I've seen is with SNMP agents that can't scale to handle a full complement of ports on their host routers/switches. This is an important consideration to keep in mind when choosing a hardware vendor. I think this is still an area that needs covering by an independent test lab too...
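The difference between a wrap and a reset can be sketched in Python, assuming (as stated above) a well-behaved agent that resets its uptime (sysUpTime) whenever it resets its counters. The helper names are hypothetical. The second function shows the cruder approach described above for MRTG and Cricket: discard any interval in which the counter appears to go backwards.

```python
COUNTER64_MOD = 2**64


def interval_octets(prev, curr, modulus=COUNTER64_MOD):
    """prev and curr are (uptime_ticks, octets) tuples from consecutive polls.

    Returns the octets counted in the interval, or None if the agent's
    uptime went backwards, meaning it restarted and its counters began
    again from an unknown base.
    """
    prev_uptime, prev_octets = prev
    curr_uptime, curr_octets = curr
    if curr_uptime < prev_uptime:
        # Reset detected: the true delta for this interval is unknowable.
        return None
    # Uptime kept increasing, so a smaller counter value means a wrap.
    return (curr_octets - prev_octets) % modulus


def interval_octets_conservative(prev_octets, curr_octets):
    """Cruder approach: drop any interval where the counter went backwards,
    whether because of a wrap or a reset, and take the loss."""
    if curr_octets < prev_octets:
        return None
    return curr_octets - prev_octets


print(interval_octets((1000, 500), (31000, 1500)))     # 1000
print(interval_octets((1000, 2**64 - 10), (31000, 5)))  # 15 (counter wrapped)
print(interval_octets((1000, 500), (200, 100)))         # None (agent reset)
```

This sketch ignores sysUpTime's own wrap (it is a 32-bit TimeTicks value in hundredths of a second, so it wraps after roughly 497 days). It also cannot catch an agent that resets its counters without resetting its uptime, which is exactly the improper-implementation case that forces the conservative approach.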
> (My previous comments a month or so back about the inaccuracies inherent in 95% systems still apply: given a particular (extreme-case) traffic load pattern, it is possible for two measurement systems that are not phase-locked, using precisely the same sampling technique and computation, to deliver values for the 95% point where one is up to twice the other.)
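To make the phase-locking effect concrete, here is a deliberately extreme traffic pattern constructed in Python. It is an illustration, not the example from the earlier post. It assumes five-minute averaging buckets and the common billing convention of sorting the samples, discarding the top 5%, and taking the next-highest value. The two samplers run identical code and differ only by a 150-second phase offset. Each 300-second burst lands entirely inside one of sampler A's buckets but straddles two of sampler B's, so B sees twice as many bursty buckets at half the rate.

```python
BUCKET = 300  # five-minute averaging interval, in seconds


def bucket_averages(per_second, offset):
    """Average the per-second rates over consecutive complete buckets
    that start `offset` seconds into the series."""
    averages = []
    start = offset
    while start + BUCKET <= len(per_second):
        averages.append(sum(per_second[start:start + BUCKET]) / BUCKET)
        start += BUCKET
    return averages


def percentile_95(samples):
    """Sort descending, discard the top 5% of samples, return the next one."""
    ordered = sorted(samples, reverse=True)
    return ordered[len(ordered) * 5 // 100]


RATE = 10_000_000  # hypothetical burst rate, bits per second
traffic = [0] * (100 * BUCKET)
# Six 300 s bursts, each aligned with one of sampler A's buckets and separated
# by an idle bucket, so sampler B (offset by half a bucket) splits each in two.
for k in range(6):
    start = (2 * k + 1) * BUCKET
    traffic[start:start + BUCKET] = [RATE] * BUCKET

a = percentile_95(bucket_averages(traffic, offset=0))
b = percentile_95(bucket_averages(traffic, offset=BUCKET // 2))
print(a, b, a / b)  # 10000000.0 5000000.0 2.0
```

Sampler A keeps 100 complete buckets and discards the top 5, so its sixth-highest bucket is still a full burst. Sampler B keeps 99 complete buckets and discards the top 4, so its fifth-highest bucket holds only half a burst.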
Well, IIRC, your example was one of true extremes of the "coarse" variety, and one that any ISP (or customer, if it's the other way around) who's paying attention will spot and nix immediately (because they're well aware of the wicked ways of the world and will clearly have anticipated them in their contracts). I.e. you can't play games with the system because you can't be a customer if you do! ;-) (Unless maybe all your customers play the same game and you mandate that they play "in sync" with each other, thus guaranteeing your own utilisation is flat.... :-)

-- 
Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>