Thanks all for the responses. I think I'm going to use Cacti and plugins to aggregate. Aggregated billing would be nice to have but wasn't required. It's good to know there are concerns with using Cacti for this.

My last question: is there any easy/automated way to pull interfaces into Cacti and configure graphs for them, either via SNMP or by reading from a MySQL DB? I suddenly remember how much I hate importing large routers into Cacti and configuring the graphs.

2012/1/20 Leo Bicknell <bicknell@ufp.org>
In a message written on Fri, Jan 20, 2012 at 12:16:14AM -0600, Jimmy Hess wrote:
Except Cacti/RRDTOOL is really just a great visualization tool; while you can build stacks, it is not something that accurately meters data for billing purposes. The right kind of tool to use would be a netflow or network tap-based billing tool that actually meters/samples specific datapoints at a specific interval and applies the billing business logic for reporting based on sampled data points, instead of smoothed averages of approximations.
To suggest Netflow is more accurate than rrdtool seems rather strange to me. It can be as accurate, but that is not the way most people deploy it.
RRDTool pulls the SNMP counters from an interface and records them to a file. With no aggregation, and assuming your device has accurate SNMP, this should be 100% accurate. While you are right that the defaults for RRDTOOL aggregate data (after a day, week, and month, approximately), those aggregates can be disabled, keeping the raw data. I know several ISPs that keep the raw data and use it for billing using these tools.
Netflow often suffers right at the source. If you want to bill off netflow data, 1:1 netflow is almost required, while most ISPs do sampled Netflow at 1:100 or 1:1000. Those sampling levels produce more inaccuracy than RRDTool's aggregation function. What's more, once the data is put into the Netflow collector, the collectors all do aggregation as well, just like RRDTool. Again, you can disable much of it with careful configuration.
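To make the sampling point concrete, here is a minimal sketch of how a 1:N sampled byte count is scaled back up, and why the result drifts from the true total. It assumes deterministic sampling (every N-th packet) and a made-up repeating packet-size pattern; the names `sampled_estimate`, `relative_error`, and `packets` are illustrative only and do not come from any NetFlow implementation.

```python
def sampled_estimate(packet_sizes, n):
    """Estimate total bytes from 1:n sampling by scaling the sampled bytes by n.

    Assumption: deterministic sampling that keeps every n-th packet,
    starting with the first one.
    """
    return n * sum(packet_sizes[::n])


def relative_error(packet_sizes, n):
    """Signed relative error of the scaled estimate against the true total."""
    true_total = sum(packet_sizes)
    return (sampled_estimate(packet_sizes, n) - true_total) / true_total


# Hypothetical traffic: 10,000 packets following a repeating mix of sizes.
SIZE_PATTERN = [1500, 64, 64, 576, 1500, 64, 40]
packets = [SIZE_PATTERN[i % len(SIZE_PATTERN)] for i in range(10_000)]

if __name__ == "__main__":
    for n in (1, 100, 1000):
        print(f"1:{n} sampling, relative error {relative_error(packets, n):+.2%}")
```

With 1:1 the estimate equals the true total by construction; at 1:100 or 1:1000 it depends on which packets happen to be sampled.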
But let's compare apples to apples. Let's consider RRDTool configured to not aggregate with 1:1 netflow configured to not aggregate. RRDTool polls a monotonically increasing counter. Should a poll be missed, no data is lost about the total number of bytes transferred. Thus you can bill by the number of bytes transferred with 100% accuracy, even with missed polls. If you bill by the bit-rate, you can interpolate a single missing data point with high accuracy as well.
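The counter argument can be sketched in a few lines. This is only an illustration under stated assumptions: readings arrive as a list with `None` marking a missed poll, the counter wraps at most once between two successful polls, and polls are evenly spaced for the interpolation step. The function names are hypothetical.

```python
def counter_delta(prev, curr, bits=64):
    """Bytes counted between two readings, allowing for one counter wrap."""
    return (curr - prev) % (1 << bits)


def total_bytes(readings, bits=64):
    """Total bytes over the period, ignoring missed polls (None entries).

    Because the counter only ever increases, skipping a reading loses
    time resolution but not byte volume.
    """
    good = [r for r in readings if r is not None]
    return sum(counter_delta(a, b, bits) for a, b in zip(good, good[1:]))


def fill_single_gaps(readings, bits=64):
    """Linearly interpolate isolated missed polls for rate-based billing.

    Assumes evenly spaced polls; runs of two or more missed polls are
    left as None.
    """
    filled = list(readings)
    for i in range(1, len(filled) - 1):
        prev, nxt = filled[i - 1], filled[i + 1]
        if filled[i] is None and prev is not None and nxt is not None:
            filled[i] = (prev + counter_delta(prev, nxt, bits) // 2) % (1 << bits)
    return filled


if __name__ == "__main__":
    polls = [1000, 1500, None, 2700, 3000]  # third poll was missed
    print(total_bytes(polls))
    print(fill_single_gaps(polls))
```

The byte total is the same whether or not the third poll succeeded; only the per-interval rate has to be estimated.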
Netflow is a continuous stream of UDP across the network. If a UDP packet is lost between the router and the collector there is no way to reconstruct that data, and it is lost forever. Thus any network event means you won't have the data to bill your customer, and you're pretty much stuck always underbilling them with the data actually collected.
If data is not gathered using a mechanism that communicates a timestamp to the poller, datapoints will still be imprecise. SNMP would be an example -- the cacti application may assume the SNMP response is current data, but on the actual hardware the internal MIB may have last been updated 10 seconds ago, which means there will be small spikes in traffic rate graphs that do not represent actual spikes in traffic.
Most of the large ISPs I know of moved away from both of the solutions above to proprietary, custom solutions. They SNMP poll the counters and store that data in a database with high resolution counters, forever, never aggregated. The necessary Perl/Python/Ruby code to do that and stick it in MySQL or Postgres is only a few pages long and easy to audit.
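As a rough idea of what such a collector might look like, here is a sketch that stores raw counter readings without any aggregation. It is not anyone's production code. It uses Python's built-in sqlite3 as a stand-in for MySQL or Postgres, and `fetch_counters` is a hypothetical callable (supplied by you, e.g. wrapping your SNMP library of choice) that returns `{if_index: (in_octets, out_octets)}` for a device. Counters are stored as text because SQLite integers are signed 64-bit and cannot hold the top half of an unsigned 64-bit counter range.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS counter_samples (
    polled_at  REAL    NOT NULL,
    device     TEXT    NOT NULL,
    if_index   INTEGER NOT NULL,
    in_octets  TEXT    NOT NULL,
    out_octets TEXT    NOT NULL,
    PRIMARY KEY (device, if_index, polled_at)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the sample store; in-memory by default for testing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_poll(conn, device, fetch_counters, now=None):
    """Store one raw reading per interface; nothing is averaged or rolled up."""
    polled_at = time.time() if now is None else now
    rows = [
        (polled_at, device, if_index, str(in_oct), str(out_oct))
        for if_index, (in_oct, out_oct) in fetch_counters(device).items()
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO counter_samples VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def bytes_between(conn, device, if_index, start, end, bits=64):
    """Return (in_bytes, out_bytes) between start and end, inclusive.

    Assumes at most one counter wrap between consecutive stored samples.
    """
    cur = conn.execute(
        "SELECT in_octets, out_octets FROM counter_samples "
        "WHERE device = ? AND if_index = ? AND polled_at BETWEEN ? AND ? "
        "ORDER BY polled_at",
        (device, if_index, start, end),
    )
    rows = [(int(i), int(o)) for i, o in cur]
    modulus = 1 << bits
    total_in = sum((b[0] - a[0]) % modulus for a, b in zip(rows, rows[1:]))
    total_out = sum((b[1] - a[1]) % modulus for a, b in zip(rows, rows[1:]))
    return total_in, total_out
```

Calling `record_poll` from cron or a loop every few minutes and running `bytes_between` at invoice time would cover the byte-volume case; rate-based billing would need the per-interval deltas instead of the sum.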
-- 
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/