RE: 95th Percentile again!

3 Jun 2001

      [ On Saturday, June 2, 2001 at 22:23:50 (-0700), David Schwartz wrote: ]
...
> Subject: RE: 95th Percentile again!
> Pretty much every billing scheme is based upon statistical
> sampling in some form.
Huh?  No proper scheme of usage-based accounting, be it a
bulk-throughput measurement or a 95th percentile measurement, is in any
way based on "statistical sampling"!

Both schemes involve counting each and every byte passed through the
pipe, and indeed keeping an accurate timestamp for each sample too (if
you're interested in being able to audit your results).  So long as
there's no loss/noise on the pipe, both schemes must mathematically
produce the same results at both ends of the pipe.  I.e. the total byte
counts per billing period must match, as must the 95th percentile
values of the rates calculated from those samples.
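To make the "count every byte" point concrete, here is a minimal sketch, assuming each poll yields a (unix_time_seconds, octet_counter) pair read from an interface byte counter.  The function names, the sample format, and the nearest-rank definition of the 95th percentile are illustrative assumptions, not any particular provider's billing code.  Given identical timestamped counter readings, both ends of the pipe get identical totals and identical percentiles.

```python
# Sketch only: assumes each poll gives (unix_time_seconds, octet_counter)
# read from an interface byte counter. All names here are hypothetical.

def counter_delta(prev, curr, counter_bits=64):
    """Octets counted between two readings, allowing for one counter wrap."""
    return (curr - prev) % (2 ** counter_bits)


def rates_bps(samples, counter_bits=64):
    """Per-interval rates in bits/second, stamped with the interval's end time."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        octets = counter_delta(c0, c1, counter_bits)
        rates.append((t1, octets * 8 / (t1 - t0)))
    return rates


def total_octets(samples, counter_bits=64):
    """Bulk-throughput figure: every octet counted over the billing period."""
    return sum(counter_delta(c0, c1, counter_bits)
               for (_, c0), (_, c1) in zip(samples, samples[1:]))


def percentile_nearest_rank(values, pct=95):
    """Nearest-rank percentile: sort ascending, take the ceil(pct% * N)-th value.
    pct must be an integer so the rank is computed exactly."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * N / 100)
    return ordered[rank - 1]


# Usage: four 5-minute polls of one interface counter.
samples = [(0, 0), (300, 37_500_000), (600, 75_000_000), (900, 187_500_000)]
rates = rates_bps(samples)
print(total_octets(samples))                           # 187500000
print(percentile_nearest_rank([r for _, r in rates]))  # 3000000.0
```

Nothing here is estimated: every octet between polls is accounted for, and the timestamps are kept so the rates can be recomputed and audited later.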

Although there are some schemes that seem to allow you to divide your
billing period into segments and "drop" most of the samples which
calculate to rates under the Nth percentile after each segment, even
they do not equate to a "statistical sampling".  All of the data is
considered in detail and none is actually thrown away or ignored until
after the necessary calculations and checks have been made with it --
it's just that the resulting data set isn't possible to audit after the
fact.
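One way such a segmented scheme could work is sketched below, assuming the total number of samples N in the billing period is known in advance (e.g. 30 days of 5-minute polls gives N = 8640).  Under the nearest-rank definition the billed value is the k-th largest rate, with k = N - ceil(0.95 N) + 1, so after each segment everything below the current k largest can be dropped.  The class and method names are hypothetical.  Every rate is still examined, but the discarded ones are gone, which is exactly why the result can't be audited afterwards.

```python
import heapq


class SegmentedPercentile:
    """Hypothetical helper: nearest-rank percentile over a billing period whose
    total sample count is known in advance. Only the k largest rates seen so
    far are retained, since nothing smaller can become the billed value."""

    def __init__(self, expected_samples, pct=95):
        rank = (pct * expected_samples + 99) // 100  # integer ceil(pct% * N)
        self.k = expected_samples - rank + 1          # billed value = k-th largest
        self.expected = expected_samples
        self.seen = 0
        self.top = []  # min-heap holding the k largest rates seen so far

    def add_segment(self, rates):
        """Examine every rate in one segment, keeping only the k largest overall."""
        for r in rates:
            self.seen += 1
            if len(self.top) < self.k:
                heapq.heappush(self.top, r)
            elif r > self.top[0]:
                heapq.heapreplace(self.top, r)  # drop the smallest retained rate

    def result(self):
        if self.seen != self.expected:
            raise ValueError(f"saw {self.seen} samples, expected {self.expected}")
        return self.top[0]


meter = SegmentedPercentile(expected_samples=20)
meter.add_segment(range(1, 11))    # first segment of the period
meter.add_segment(range(11, 21))   # second segment
print(meter.k, meter.result())     # 2 19
```

For a 30-day month polled every 5 minutes, only the 433 largest of the 8640 rates would need to be kept.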
...
> It's not exactly fair to ignore sampling errors in your favor and then
> cry foul should the odds go against you.
Indeed.  Fortunately it's not necessary to regularly put up with such
sampling errors (at least not so long as your router/switch/whatever has
a properly implemented SNMP agent or other reliable means to access its
interface byte counters).
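Part of "reliable means to access its interface byte counters" is making sure the counter can't wrap more than once between polls, since a modulo delta can only recover a single wrap.  The sketch below, with hypothetical function names, computes how long an octet counter takes to wrap at a sustained line rate.  It assumes the IF-MIB objects ifInOctets (Counter32) and ifHCInOctets (Counter64) from RFC 2863, and that the agent actually implements the 64-bit one.

```python
# Sketch: time for an interface octet counter to wrap at a sustained rate,
# and whether a polling interval still yields an unambiguous delta.
# Function names are hypothetical.

def wrap_seconds(link_bps, counter_bits):
    """Seconds for a `counter_bits`-bit octet counter to wrap at link_bps."""
    return (2 ** counter_bits) * 8 / link_bps


def poll_interval_is_safe(poll_seconds, link_bps, counter_bits):
    """True if even a saturated link cannot advance the counter by a full
    2**counter_bits octets within one polling interval."""
    return poll_seconds < wrap_seconds(link_bps, counter_bits)


for bits, name in ((32, "ifInOctets (Counter32)"), (64, "ifHCInOctets (Counter64)")):
    for bps in (100e6, 1e9):
        print(f"{name} at {bps / 1e6:.0f} Mbit/s wraps every "
              f"{wrap_seconds(bps, bits):.1f} s; 300 s poll safe: "
              f"{poll_interval_is_safe(300, bps, bits)}")
```

A saturated 100 Mbit/s link wraps a 32-bit counter in about 343.6 seconds, barely longer than a 5-minute poll, and at 1 Gbit/s it wraps in about 34.4 seconds, which is why the 64-bit counters matter for billing.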
...
> On the other hand, providers that
> use statistical sampling should disclose that to their customers so that
> they understand that they're being billed using systems that aren't
> necessarily 100% reproducible.
The phrase "statistical sampling" would suggest that you're thinking of
some scheme where periodic samples are taken of the counters, these
values are used on the spot to calculate throughput, and those
throughput numbers are then archived over time and used periodically to
estimate the average throughput.

I suppose this is in effect what you might end up with if you used the
"consolidated" part of RRDtool data, such as from the monthly graph
generated by Cricket (i.e. if you don't keep all samples for at least
your full billing period, if not two periods).  MRTG results are
probably similarly unauditable from an accounting point of view.
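Why consolidated data is unsuitable for percentile billing can be seen with a toy example.  The sketch below mimics, in simplified form, an RRDtool AVERAGE consolidation by averaging runs of consecutive rates; it does not model RRDtool's heartbeat or xff handling, and the function names and data are hypothetical.  The average (and hence the bulk byte count) survives consolidation, but the short peaks that determine the 95th percentile are smeared away.

```python
# Sketch: averaging fine-grained rates (a simplified stand-in for an RRDtool
# AVERAGE consolidation) preserves the mean but not the 95th percentile.

def consolidate_average(values, step):
    """Average each run of `step` consecutive values into one value."""
    return [sum(values[i:i + step]) / len(values[i:i + step])
            for i in range(0, len(values), step)]


def percentile_nearest_rank(values, pct=95):
    """Nearest-rank percentile with an integer pct."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * N / 100)
    return ordered[rank - 1]


# 24 five-minute rates in Mbit/s: mostly 10, with a 100 Mbit/s burst each half hour.
fine = [10, 10, 10, 10, 10, 100] * 4
coarse = consolidate_average(fine, 6)  # one averaged value per 30 minutes

print(sum(fine) / len(fine), sum(coarse) / len(coarse))  # 25.0 25.0
print(percentile_nearest_rank(fine))    # 100
print(percentile_nearest_rank(coarse))  # 25.0
```

Once only the consolidated values are kept, there is no way to recover the billed figure that the raw samples would have produced.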

However, as we already know, it's not very wise to use even a properly
and carefully configured Cricket, let alone MRTG, for billing purposes.

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>
