[ On Saturday, June 2, 2001 at 22:23:50 (-0700), David Schwartz wrote: ]
Subject: RE: 95th Percentile again!
Pretty much every billing scheme is based upon statistical sampling in some form.
Huh? No proper scheme of usage-based accounting, be it a bulk-throughput measurement or a 95th percentile measurement, is in any way based on "statistical sampling"! Both schemes involve counting each and every byte passed through the pipe, and indeed keeping an accurate timestamp for each sample too (if you're interested in being able to audit your results). So long as there's no loss/noise on the pipe, both schemes must mathematically produce the same results on both ends of the pipe. I.e. the total byte counts per billing period must match, as must the 95th percentiles of the rates calculated from these samples.

Although there are some schemes that seem to allow you to divide your billing period into segments and "drop" most of the samples which calculate to rates under the Nth percentile after each segment, even they do not amount to "statistical sampling". All of the data is considered in detail and none is actually thrown away or ignored until after the necessary calculations and checks have been made with it -- it's just that the resulting data set can't be audited after the fact.
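To make the point concrete, here is a rough sketch (in Python) of how both figures fall out of the same complete set of timestamped counter readings. The function names, the 300-second interval, and the sample values are all made up for illustration, and the "nearest-rank" percentile rule is only an assumption; a real contract may define the percentile differently, and real code would also have to cope with counter wraps and resets.

```python
def usage_from_counters(samples):
    """Turn timestamped cumulative byte-counter readings into usage figures.

    samples: list of (unix_time_seconds, byte_counter) tuples in time order.
    Returns (total_bytes, rates), where rates holds one bits-per-second
    value for each interval between consecutive readings.
    Assumption: the counter never wraps or resets within the data.
    """
    total_bytes = 0
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0                       # bytes moved in this interval
        total_bytes += delta
        rates.append(delta * 8 / (t1 - t0))   # bits per second
    return total_bytes, rates


def percentile_95(rates):
    """Nearest-rank 95th percentile (assumed convention): sort the rates,
    set aside the top 5%, and return the highest remaining value."""
    ordered = sorted(rates)
    rank = (95 * len(ordered) + 99) // 100    # integer ceil(0.95 * n)
    return ordered[rank - 1]


# Hypothetical readings taken every 300 seconds.
samples = [(0, 0), (300, 3_000_000), (600, 4_500_000),
           (900, 12_000_000), (1200, 12_750_000)]

total, rates = usage_from_counters(samples)
print(total)                 # bulk-throughput figure, in bytes
print(percentile_95(rates))  # 95th percentile figure, in bits per second
```

Because every reading is kept along with its timestamp, anyone holding the same readings can rerun the calculation and should arrive at the same two numbers.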
It's not exactly fair to ignore sampling errors in your favor and then cry foul should the odds go against you.
Indeed. Fortunately it's not necessary to regularly put up with such sampling errors (at least not so long as your router/switch/whatever has a properly implemented SNMP agent or other reliable means to access its interface byte counters).
On the other hand, providers that use statistical sampling should disclose that to their customers so that they understand that they're being billed using systems that aren't necessarily 100% reproducible.
The phrase "statistical sampling" would suggest that you're thinking of some scheme where periodic samples are taken of the counters, these values are used on the spot to calculate throughput, and those throughput numbers are then archived and used periodically to estimate the average throughput over time. I suppose this is in effect what you might end up with if you used the "consolidated" part of RRDtool data, such as from the monthly graph generated by Cricket (i.e. if you don't keep all samples for at least your full billing period, if not two periods). MRTG results are probably similarly unauditable from an accounting point of view. However, as we already know, it's not very wise to use even a properly and carefully configured Cricket, let alone MRTG, for billing purposes.

-- 
Greg A. Woods
+1 416 218-0098  VE3TCP  <gwoods@acm.org>  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;  Secrets of the Weird <woods@weird.com>