[ On Saturday, June 2, 2001 at 23:59:17 (-0700), David Schwartz wrote: ]
Subject: RE: 95th Percentile again!
> I don't agree that this is so for 95th percentile. Exactly which five minute interval a packet is counted in will affect the results. There is no way to totally agree on which such interval a packet belongs in. Similarly, where the five-minute intervals begin and end is arbitrary and affects the final numbers.
Perhaps you should sit down with a table of numbers and compare the results by hand. I think you'll find that you are gravely mistaken. (I can provide you with some raw numbers that are guaranteed to have been sampled out-of-sync at the ends of the same pipe if you'd like.) The only time there can ever be a discrepancy is at the "edge". I.e. if during the last sample time in the billing period the ISP sees a huge count of bytes, but the customer (because his last full sample was five minutes less one second before the end of the period) sees zero bytes, *AND* iff this one large sample throws the Nth percentile calculation for the entire billing period over the next billing increment, then the lack of synchronisation will cause a "problem" (for the customer in this case :-). However the chances of this kind of error happening in real life are so tiny as to be almost impossible (at least if the billing period is orders of magnitude larger than the sample period, which of course is what we're supposing here). I count over three orders of magnitude difference for a 30-day billing period and a 5-min sample period. For the customer it's easy to avoid too -- just unplug your network (scheduled down time) during the 10-minute period between billing cycle roll-overs. :-)
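A minimal sketch of the claim above (the traffic numbers and the 137-second offset are made up, not from the post): two 5-minute samplers that are out of sync by an arbitrary amount still produce nearly identical 95th-percentile figures when the billing period is orders of magnitude longer than the sample period.

```python
import random

def nth_percentile(samples, n=95):
    """Classic Nth-percentile billing: sort the per-interval totals and
    discard the top (100 - n) percent."""
    ordered = sorted(samples)
    idx = max(0, int(len(ordered) * n / 100) - 1)
    return ordered[idx]

def five_min_totals(per_second, offset):
    """Aggregate per-second byte counts into 5-minute buckets that start
    `offset` seconds into the data -- i.e. an unsynchronised sampler."""
    step = 300
    return [sum(per_second[i:i + step])
            for i in range(offset, len(per_second) - step, step)]

random.seed(1)
# One byte count per second for a (scaled-down) billing period.
traffic = [random.randint(0, 10_000) for _ in range(30 * 24 * 36)]

isp = nth_percentile(five_min_totals(traffic, 0))
customer = nth_percentile(five_min_totals(traffic, 137))  # 137 s out of sync

# The two figures differ only negligibly, as argued above.
print(isp, customer)
```

Only at the very edge of the billing period, as described above, could the two samplers disagree by a whole sample's worth of bytes.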
> The interface byte counters won't tell you where the packets went.
Clearly if the ISP is at one end of the pipe and the customer's at the other then the out/in (and in/out at the other end) counters are an extremely accurate count of where the packets went! Obviously such a scheme "limits" in some ways the viable alternatives for connecting customers, and it certainly forces you to do your data collection at specific points.
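To illustrate the auditability being claimed here (a sketch with invented counter readings, not figures from the post): on a point-to-point pipe the ISP's "out" counter and the customer's "in" counter count the very same packets, so an audit is a straightforward comparison, modulo counter wrap.

```python
# Classic SNMP ifInOctets/ifOutOctets counters are 32-bit and wrap at 2^32.
COUNTER_WRAP = 2 ** 32

def delta(prev, curr, wrap=COUNTER_WRAP):
    """Bytes transferred between two counter readings, handling one wrap."""
    return (curr - prev) % wrap

# Hypothetical readings taken five minutes apart at each end of the pipe.
isp_out = delta(4_294_000_000, 1_500_000)   # ISP side: wrapped past 2^32
cust_in = delta(4_293_999_900, 1_499_900)   # customer side, same packets

# Both ends count the same bytes; the bill is 100% auditable.
assert isp_out == cust_in
print(isp_out)
```

Counter wrap is the one bookkeeping wrinkle: at high rates a 32-bit octet counter must be read often enough that it cannot wrap twice between samples.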
> So any such billing scheme would be based ultimately upon statistical sampling.
Please try and talk sense, man! Regardless of what you're buying or selling there's absolutely NOTHING "statistical" about byte counting! It's pure accounting, plain and simple. It's 100% auditable and 100% verifiable too!
> The provider would determine that typically some of your packets are local and cost very little and some are remote and may cost much more. Rather than counting each packet and figuring out its cost, the provider relies upon prior statistical sampling to come up with some 'average' cost which he bills you on the basis of.
The only way to do that is to count flows instead of bytes and the only way I know of doing that is indeed based only on statistical sampling. Any customer who'd be willing to suffer under such a scheme is either not very clueful or getting one heck of a deal on their pricing....
> Sometimes what happens in this case is the customer or the provider realizes that this particular traffic pattern does not match the statistical sample on which the billing was based. Richard Steenbergen told me a story about a company that colocated all their servers at POPs of the same provider and paid twice for traffic between their machines. Needless to say, they had to negotiate new pricing. Why? Because their traffic pattern made the statistical sampling upon which their billing was based inappropriate.
You're talking apples and oranges -- please stop misdirecting the topic in an apparent attempt to "call the kettle black".
> If a billing scheme were not based upon statistical sampling, it would require the provider to somehow accurately determine how much each packet cost him to get to you or handoff from you and bill you based upon that on something like a cost plus basis.
Iff. But that's not what we're talking about here.
> I agree, but all of the alternatives are ultimately based upon statistical sampling. NetFlow, for example, loses a certain percentage of the packets because it's UDP based. The provider compensates for this by raising his rates. If he expects 3% of his accounting records to be lost, he raises his rates to 103% hoping that he'll get a fair statistical sample. If this assumption is violated, for example if packets are more likely to drop at peak times and a particular customer passes most of their traffic at peak times, then the statistical assumptions upon which the billing is based will be violated, and the ISP will get taken advantage of.
Duh. But this isn't what we're talking about.
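For what it's worth, the compensation arithmetic quoted above can be sketched in a couple of lines (the 3% loss rate and the byte totals are just the quoted example, not real NetFlow figures); note that the exact break-even markup is 1/(1 - loss), slightly more than the quoted 103%:

```python
loss = 0.03                       # fraction of accounting records dropped
naive_markup = 1 + loss           # 1.03, as in the quoted text
break_even = 1 / (1 - loss)       # ~1.0309, the exact compensation factor

observed_bytes = 970_000_000      # what the lossy export reported
actual_bytes = 1_000_000_000      # what was really sent (3% loss)

# Scaling the observed count by the break-even factor recovers the true total.
print(round(observed_bytes * break_even))
```

The scheme only averages out correctly if losses are uniform across customers and time, which is precisely the assumption being questioned in the quoted text.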
> If he counts bytes out an Ethernet port, he'll be billing you for some broadcast traffic that costs him nothing. He'll be billing you for some local traffic that costs him nothing. He'll be billing you for some short-range traffic that costs him very little. But he uses statistical sampling to come up with some 'per byte' cost. If, for example, most of a particular customer's traffic is from another customer in the same POP, again the statistical assumptions upon which the billing is based will be violated, and the customer will likely have to negotiate some other billing mechanism.
I don't see the problem. It's a very simple matter to adjust the pricing to fit. You can do some "statistical sampling" to set the price, just like anyone might do in any form of cost estimation, but what's on the invoice in the end is a pure accounting of the actual traffic. You can do the same for packet loss too. It's only the price/unit that's based on statistical sampling and cost estimates. Why is this so difficult for some people to understand?
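The distinction being drawn here fits in a few lines (all figures invented for illustration): the rate per unit may come from cost estimation, but the invoice itself is plain arithmetic on counted bytes.

```python
# The rate is set in advance from cost estimates ("statistical sampling"
# at price-setting time only); it is a hypothetical number here.
rate_per_gb = 0.42

# The invoice input is a pure accounting figure from the byte counters.
bytes_counted = 123_456_789_012

invoice = (bytes_counted / 10**9) * rate_per_gb
print(f"${invoice:.2f}")
```

Nothing on the invoice is sampled: change the rate and only the price changes; the byte count remains a fully auditable measurement.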
> Every billing scheme I have ever seen has been based upon statistical sampling. The closest to an exception I've seen is Level3's distance-based scheme.
You've obviously never looked beyond the silly schemes you're apparently stuck on talking about. I know of many billing systems that are based on pure bulk-throughput accounting and several that are based on true Nth percentile usage. None of them, not a single one, is based on statistical samples of anything -- *ALL* are pure 100% byte-counting and all of them count each and every byte.

-- 
Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;  Secrets of the Weird <woods@weird.com>