RE: 95th Percentile again!

3 Jun 2001

...
[ On Saturday, June 2, 2001 at 22:23:50 (-0700), David Schwartz wrote: ]
...
...
Subject: RE: 95th Percentile again!
...
...
Pretty much every billing scheme is based upon statistical
sampling in some form.
...
Huh?  No proper scheme of usage-based accounting, be it a bulk-throughput
measurement or a 95th percentile measurement, is in any way
based on "statistical sampling"!
...
Both schemes involve counting each and every byte passed through the
pipe, and indeed keeping an accurate timestamp for each sample too
(if you're interested in being able to audit your results).  So long as
there's no loss/noise on the pipe then both schemes mathematically must
produce the same results on both ends of the pipe.  I.e. both the total
byte counts per billing period must match, as must the level of the 95th
percentiles of rates calculated from these samples.
	I don't agree that this is so for 95th percentile. Exactly which five-minute
interval a packet is counted in will affect the results, and there is no way
for both ends to agree completely on which interval a given packet belongs
in. Similarly, where the five-minute intervals begin and end is arbitrary and
affects the final numbers.
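
	To make the boundary effect concrete, here is a rough sketch in Python. The
traffic is invented (a steady 100 bytes/sec with a ten-minute burst), the
function name p95_rate is hypothetical, and the percentile convention (sort
the interval rates and take the value at position ceil(0.95 * n)) is only one
of several in use. The only thing that changes between the two calls is where
the five-minute boundaries fall.

```python
def p95_rate(packets, offset=0, interval=300):
    """95th-percentile rate in bytes/sec over fixed-length intervals.

    packets:  list of (timestamp_seconds, size_bytes) tuples
    offset:   where an interval boundary falls, in seconds (assumption)
    interval: interval length in seconds (five minutes by default)
    """
    buckets = {}
    for ts, size in packets:
        idx = (ts - offset) // interval      # floor division, also for idx < 0
        buckets[idx] = buckets.get(idx, 0) + size

    # Include empty intervals between the first and last one seen.
    first, last = min(buckets), max(buckets)
    rates = sorted(buckets.get(i, 0) / interval for i in range(first, last + 1))

    # Position ceil(0.95 * n), computed in integers, as a 0-based index.
    n = len(rates)
    k = (95 * n + 99) // 100 - 1
    return rates[k]


# Invented traffic: 100 bytes every second for 100 minutes,
# plus an extra 900 bytes every second from t=600 to t=1199.
packets = [(t, 100) for t in range(6000)]
packets += [(t, 900) for t in range(600, 1200)]

print(p95_rate(packets, offset=0))     # burst lines up with two intervals
print(p95_rate(packets, offset=150))   # burst is smeared over three intervals
```

	With the boundaries at offset 0 the burst fills two whole intervals and the
result is 1000 bytes/sec. Shift the boundaries by 150 seconds and the same
bytes give 550 bytes/sec. Every byte was counted both times.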

	Now it's perfectly reasonable for both ends to agree that the provider will
do the sampling and the provider's results, unless in actual error, shall be
the basis for the billing. Nevertheless, the agreement is to use a billing
scheme based upon statistical sampling.
...
...
It's not exactly fair to ignore sampling errors in your favor and then
cry foul should the odds go against you.
...
Indeed.  Fortunately it's not necessary to regularly put up with such
sampling errors (at least not so long as your router/switch/whatever has
a properly implemented SNMP agent or other reliable means to access its
interface byte counters).
	The interface byte counters won't tell you where the packets went, so any
such billing scheme would ultimately be based upon statistical sampling. The
provider would determine that typically some of your packets are local and
cost very little and some are remote and may cost much more. Rather than
counting each packet and figuring out its cost, the provider relies upon
prior statistical sampling to come up with some 'average' cost on which he
bases your bill.

	Sometimes what happens in this case is that the customer or the provider
realizes that this particular traffic pattern does not match the statistical sample
on which the billing was based. Richard Steenbergen told me a story about a
company that colocated all their servers at POPs of the same provider and
paid twice for traffic between their machines. Needless to say, they had to
negotiate new pricing. Why? Because their traffic pattern made the
statistical sampling upon which their billing was based inappropriate.

	If a billing scheme were not based upon statistical sampling, it would
require the provider to somehow accurately determine how much each packet
cost him to deliver to you or to accept from you, and to bill you for that on
something like a cost-plus basis.
...
However as we already know it's not very wise to use even a properly and
carefully configured Cricket, let alone MRTG, for billing purposes.
	I agree, but all of the alternatives are ultimately based upon statistical
sampling. NetFlow, for example, loses a certain percentage of its accounting
records because the export is UDP-based. The provider compensates for this by
raising his rates. If he expects 3% of his accounting records to be lost, he
raises his rates to 103%, hoping that he'll get a fair statistical sample. If
that does not hold, for example if export packets are more likely to drop at
peak times and a particular customer passes most of their traffic at peak
times, then the statistical assumptions upon which the billing is based will
be violated, and the ISP will get taken advantage of.
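
	The arithmetic can be sketched in a few lines of Python. The loss rates
below (5% at peak, 1% off-peak) are made up for illustration, chosen so that a
customer with evenly spread traffic sees the 3% average; the name
billed_fraction is hypothetical too. The markup is the flat 103% from the
paragraph above.

```python
# Hypothetical loss rates for accounting records; not measured figures.
LOSS = {"peak": 0.05, "offpeak": 0.01}
MARKUP = 1.03  # the flat "103%" adjustment


def billed_fraction(traffic):
    """Fraction of a customer's true usage that ends up on the bill.

    traffic maps a period name to the bytes actually sent in that period.
    """
    true_total = sum(traffic.values())
    # Bytes that survive in the accounting records after loss.
    recorded = sum(b * (1 - LOSS[period]) for period, b in traffic.items())
    return recorded * MARKUP / true_total


even_customer = {"peak": 500, "offpeak": 500}    # 3% of records lost overall
peaky_customer = {"peak": 900, "offpeak": 100}   # 4.6% of records lost overall

print(billed_fraction(even_customer))
print(billed_fraction(peaky_customer))
```

	Under these assumptions the even customer is billed for roughly 99.9% of
what he actually sent and the peak-heavy customer for roughly 98.3%. The same
flat markup under-bills the second customer simply because his traffic does
not look like the sample the markup was based on.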

	If he counts bytes out of an Ethernet port, he'll be billing you for some
broadcast traffic that costs him nothing. He'll be billing you for some
local traffic that costs him nothing. He'll be billing you for some
short-range traffic that costs him very little. But he uses statistical
sampling to come up with some 'per byte' cost. If, for example, most of a
particular customer's traffic is from another customer in the same POP,
again the statistical assumptions upon which the billing is based will be
violated, and the customer will likely have to negotiate some other billing
mechanism.
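
	As a sketch of how such a 'per byte' figure comes about, and how it goes
wrong for an atypical customer, consider the Python below. Every number in it
is invented: the per-gigabyte costs, the traffic classes, and both traffic
mixes are assumptions for illustration, not anyone's real pricing.

```python
# Hypothetical cost to the provider per gigabyte, by where the traffic goes.
COST_PER_GB = {"local": 0.0, "short_haul": 1.0, "long_haul": 10.0}


def blended_cost(mix):
    """Average cost per gigabyte for a traffic mix.

    mix maps a traffic class to its share of total bytes (shares sum to 1).
    """
    return sum(share * COST_PER_GB[kind] for kind, share in mix.items())


# Mix the provider measured across his customer base (assumed).
typical_mix = {"local": 0.1, "short_haul": 0.3, "long_haul": 0.6}

# A customer whose traffic mostly stays inside the same POP (assumed).
same_pop_customer = {"local": 0.8, "short_haul": 0.1, "long_haul": 0.1}

print(blended_cost(typical_mix))        # what the per-byte price is built on
print(blended_cost(same_pop_customer))  # what this customer actually costs
```

	With these figures the price is built on a cost of about 6.3 per gigabyte,
while the same-POP customer costs about 1.1. That gap is what sends the
customer back to negotiate.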

	Every billing scheme I have ever seen has been based upon statistical
sampling. The closest to an exception I've seen is Level3's distance-based
scheme.

	DS