[ On Saturday, June 2, 2001 at 22:23:50 (-0700), David Schwartz wrote: ]
Subject: RE: 95th Percentile again!
Pretty much every billing scheme is based upon statistical sampling in some form.
Huh? No proper scheme of usage-based accounting, be it a bulk-throughput measurement, or a 95th percentile measurement, is in any way based on "statistical sampling"!
Both schemes involve counting each and every byte passed through the pipe, and indeed keeping an accurate timestamp for each sample too (if you're interested in being able to audit your results). So long as there's no loss/noise on the pipe then both schemes mathematically must produce the same results on both ends of the pipe. I.e. both the total byte counts per billing period must match, as must the level of the 95th percentiles of rates calculated from these samples.
I don't agree that this is so for 95th percentile. Exactly which five-minute interval a packet is counted in will affect the results. There is no way to totally agree on which such interval a packet belongs in. Similarly, where the five-minute intervals begin and end is arbitrary and affects the final numbers. Now it's perfectly reasonable for both ends to agree that the provider will do the sampling and the provider's results, unless in actual error, shall be the basis for the billing. Nevertheless, the agreement is to use a billing scheme based upon statistical sampling.
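To make the boundary effect concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than any provider's actual method: the function names, the nearest-rank percentile rule, the synthetic traffic (a steady 1000 bytes per second plus one 300-second burst), and the 150-second clock offset between the two ends.

```python
def percentile_95(rates):
    """Nearest-rank 95th percentile (an assumed rule; providers differ)."""
    ordered = sorted(rates)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def interval_rates(packets, interval=300, offset=0):
    """Bucket (timestamp_sec, size_bytes) pairs into fixed intervals.

    Buckets start at `offset`, `offset + interval`, and so on.
    Returns the average rate of each bucket in bits per second.
    """
    totals = {}
    for ts, size in packets:
        bucket = (ts - offset) // interval
        totals[bucket] = totals.get(bucket, 0) + size
    return [total * 8 / interval for total in totals.values()]


# Synthetic traffic: 1000 bytes every second for 6000 seconds, plus a burst
# of 9000 extra bytes per second during [3000, 3300).
packets = [(t, 1000) for t in range(6000)]
packets += [(t, 9000) for t in range(3000, 3300)]

# Same packets, two choices of where the five-minute intervals begin.
p95_aligned = percentile_95(interval_rates(packets, offset=0))
p95_shifted = percentile_95(interval_rates(packets, offset=150))

print(p95_aligned)  # 8000.0: the burst fills one interval, which is discarded
print(p95_shifted)  # 44000.0: the burst straddles two intervals, one survives
```

With aligned intervals the whole burst lands in a single sample, which falls in the discarded top 5%. Shift the boundaries by 150 seconds and the same burst is split across two samples, only one of which is discarded, so the billed figure changes even though every byte was counted on both ends.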
It's not exactly fair to ignore sampling errors in your favor and then cry foul should the odds go against you.
Indeed. Fortunately it's not necessary to regularly put up with such sampling errors (at least not so long as your router/switch/whatever has a properly implemented SNMP agent or other reliable means to access its interface byte counters).
The interface byte counters won't tell you where the packets went. So any such billing scheme would be based ultimately upon statistical sampling. The provider would determine that typically some of your packets are local and cost very little and some are remote and may cost much more. Rather than counting each packet and figuring out its cost, the provider relies upon prior statistical sampling to come up with some 'average' cost which he bills you on the basis of. Sometimes what happens in this case is that the customer or the provider realizes that this particular traffic pattern does not match the statistical sample on which the billing was based. Richard Steenbergen told me a story about a company that colocated all their servers at POPs of the same provider and paid twice for traffic between their machines. Needless to say, they had to negotiate new pricing. Why? Because their traffic pattern made the statistical sampling upon which their billing was based inappropriate. If a billing scheme were not based upon statistical sampling, it would require the provider to somehow accurately determine how much each packet cost him to get to you or hand off from you and bill you based upon that on something like a cost-plus basis.
However as we already know it's not very wise to use even a properly and carefully configured Cricket, let alone MRTG, for billing purposes.
I agree, but all of the alternatives are ultimately based upon statistical sampling. NetFlow, for example, loses a certain percentage of its accounting records because its export is UDP-based. The provider compensates for this by raising his rates. If he expects 3% of his accounting records to be lost, he raises his rates to 103% hoping that he'll get a fair statistical sample. If this assumption is violated, for example if packets are more likely to drop at peak times and a particular customer passes most of their traffic at peak times, then the statistical assumptions upon which the billing is based will be violated, and the ISP will get taken advantage of. If he counts bytes out of an Ethernet port, he'll be billing you for some broadcast traffic that costs him nothing. He'll be billing you for some local traffic that costs him nothing. He'll be billing you for some short-range traffic that costs him very little. But he uses statistical sampling to come up with some 'per byte' cost. If, for example, most of a particular customer's traffic is from another customer in the same POP, again the statistical assumptions upon which the billing is based will be violated, and the customer will likely have to negotiate some other billing mechanism. Every billing scheme I have ever seen has been based upon statistical sampling. The closest to an exception I've seen is Level3's distance-based scheme.
DS