Polling Bandwidth as an Aggregate
Has anyone had to aggregate bandwidth data from multiple interfaces for billing? For example, I'd like to poll with an open source tool and aggregate data from multiple interfaces connected to the same customer, or to multiple customers, for the purpose of billing and capacity management. Is there an easy way to do this with cacti/rrd or another open source kit? Keegan Holley ▪ Network Architect ▪ SunGard Availability Services
Hi Keegan, On Jan 19, 2012, at 9:50 PM, Keegan Holley wrote:
Has anyone had to aggregate bandwidth data from multiple interfaces for billing. For example I'd like to poll with an open source tool and aggregate data from multiple interfaces connected to the same customer or multiple customers for the purpose of billing and capacity management. Is there an easy way to do this with cacti/rrd or another open source kit?
With the rrdtool backend, you can certainly define and add multiple sources from different files together. Using 'AREA' first and subsequently 'STACK' to view multiple data sources is particularly nice for visualization. Otherwise, the RRDs and Statistics::Descriptive libraries in Perl can probably go a long way towards what you might be wanting for reporting. Dale
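As a rough illustration of the DEF/CDEF/AREA/STACK approach described above, the following Python sketch only builds the argument list for an `rrdtool graph` call; it does not run rrdtool. The data source name `traffic_in`, the file names, and the colors are assumptions for the example, and paths containing colons would need escaping.

```python
def build_graph_args(png_path, sources, ds_name="traffic_in", cf="AVERAGE"):
    """Build an argv list for `rrdtool graph` that stacks several RRD files.

    sources: list of (legend, rrd_path) tuples, one per interface.
    ds_name: data source name inside each RRD (assumed; check your files).
    """
    if not sources:
        raise ValueError("need at least one source")
    colors = ["#3366CC", "#DC3912", "#FF9900", "#109618"]  # arbitrary palette
    args = ["rrdtool", "graph", png_path, "--start", "-1d"]
    names = []
    for i, (_legend, rrd_path) in enumerate(sources):
        vname = f"src{i}"
        names.append(vname)
        # DEF:<vname>=<rrdfile>:<ds-name>:<CF> pulls one series out of one file
        args.append(f"DEF:{vname}={rrd_path}:{ds_name}:{cf}")
    # RPN expression that adds every series together: a,b,+,c,+ ...
    rpn = names[0] + "".join(f",{n},+" for n in names[1:])
    args.append(f"CDEF:total={rpn}")
    for i, (legend, _path) in enumerate(sources):
        color = colors[i % len(colors)]
        # The first series is a plain AREA; later ones are stacked on top of it
        stack = ":STACK" if i > 0 else ""
        args.append(f"AREA:src{i}{color}:{legend}{stack}")
    args.append("LINE1:total#000000:Total")
    return args


if __name__ == "__main__":
    for arg in build_graph_args("customer.png",
                                [("port-1", "r1_ge0.rrd"), ("port-2", "r2_ge3.rrd")]):
        print(arg)
```

The resulting list could be handed to `subprocess.run` on a host where rrdtool is installed. Note that the CDEF sum becomes unknown for any interval in which one of the inputs is unknown.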
On Thu, Jan 19, 2012 at 10:48 PM, Dale W. Carder <dwcarder@wisc.edu> wrote:
With the rrdtool backend, you can certainly define and add multiple sources from different files together. Using 'AREA' first and subsequently 'STACK' to view multiple data sources is particularly nice for visualization.
Except Cacti/RRDTOOL is really just a great visualization tool. While you can build stacks, it is not something that accurately meters data for billing purposes. The right kind of tool to use would be a netflow or network tap-based billing tool that actually meters/samples specific datapoints at a specific interval and applies the billing business logic for reporting based on sampled data points, instead of smoothed averages of approximations.

RRDTOOL is clearly not designed to accurately report on information for billing. To a great extent, RRDTOOL aggregates, averages, interpolates, and smooths what it reports. See "Data Resampling" in http://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html

Aggregation could be mitigated by including a large number of data rows at step=1 while creating the RRD file, e.g. for 5-minute polling 288*(ndays) data rows (1440*(ndays) for 1-minute polling), enough rows to include the whole bill period plus some number of days without aggregating. That does not address the rest of the issues with RRD, and including so many rows greatly increases .rrd file size. I would look at Torrus or RTG before RRDTOOL for that, but even then...

If data is not gathered using a mechanism that communicates a timestamp to the poller, datapoints will still be imprecise. SNMP would be an example: the cacti application may assume the SNMP response is current data, but on the actual hardware the internal MIB may have been updated 10 seconds ago, which means there will be small spikes in traffic rate graphs that do not represent actual spikes in traffic.

-- -JH
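The row-count arithmetic for an unaggregated archive can be sketched as follows. This is only a sizing helper: the 7 days of slack and the 0.5 xfiles factor are assumed example values, and the string it produces follows the `RRA:CF:xff:steps:rows` form used by `rrdtool create`.

```python
def rra_rows(bill_days, step_seconds=300, slack_days=7):
    """Rows needed to keep one un-consolidated value per polling step."""
    rows_per_day = 86400 // step_seconds  # 288 for 5-minute polling
    return rows_per_day * (bill_days + slack_days)


def rra_spec(bill_days, step_seconds=300, slack_days=7, cf="AVERAGE"):
    """RRA definition with steps=1, i.e. no consolidation across polls."""
    rows = rra_rows(bill_days, step_seconds, slack_days)
    return f"RRA:{cf}:0.5:1:{rows}"


if __name__ == "__main__":
    # A 31-day bill period plus a week of slack at 5-minute polling
    print(rra_spec(31))
```

Keeping steps=1 for the whole period avoids consolidation across polls, but it does not change how RRDTool fits each update to its step boundaries.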
In a message written on Fri, Jan 20, 2012 at 12:16:14AM -0600, Jimmy Hess wrote:
Except Cacti/RRDTOOL is really just a great visualization tool, while you can build stacks, it is not something that accurately meters data for billing purposes. The right kind of tool to use would be a netflow or network tap-based billing tool, that actually meters/samples specific datapoints at a specific interval and applies the billing business logic for reporting based on sampled data points, instead of smoothed averages of approximations.
To suggest Netflow is more accurate than rrdtool seems rather strange to me. It can be as accurate, but that is not the way most people deploy it.

RRDTool pulls the SNMP counters from an interface and records them to a file. With no aggregation, and assuming your device has accurate SNMP, this should be 100% accurate. While you are right that the defaults for RRDTOOL aggregate data (after a day, week, and month, approximately), those aggregates can be disabled, keeping the raw data. I know several ISPs that keep the raw data and use it for billing using these tools.

Netflow often suffers right at the source. If you want to bill off netflow data, 1:1 netflow is almost required, while most ISPs do sampled Netflow at 1:100 or 1:1000. Those sampling levels produce more inaccuracy than RRDTool's aggregation function. What's more, once the data is put into the Netflow collector, they all do aggregation as well, just like RRDTool. Again, you can disable much of it with careful configuration.

But let's compare apples to apples. Let's consider RRDTool configured to not aggregate with 1:1 netflow configured to not aggregate. RRDTool polls a monotonically increasing counter. Should a poll be missed, no data is lost about the total number of bytes transferred. Thus you can bill by the number of bytes transferred with 100% accuracy, even with missed polls. If you bill by the bit-rate, you can interpolate a single missing data point with high accuracy as well.

Netflow is a continuous stream of UDP across the network. If a UDP packet is lost between the router and the collector there is no way to reconstruct that data, and it is lost forever. Thus any network event means you won't have the data to bill your customer, and you're pretty much stuck always underbilling them with the data actually collected.
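The point about missed polls can be shown with a small sketch: because the counter only ever increases, the total is the sum of successive differences, and skipping a reading does not change that sum. The function names are made up for the example, and the wrap handling assumes at most one wrap between readings; a device reboot that resets the counter looks the same as a wrap and would need separate detection (for example via sysUpTime).

```python
def counter_delta(prev, curr, bits=64):
    """Bytes transferred between two readings of an increasing counter."""
    if curr >= prev:
        return curr - prev
    # Counter rolled over once between the two readings (assumption)
    return curr + 2 ** bits - prev


def total_bytes(readings, bits=64):
    """Sum the deltas over a time-ordered list of counter readings."""
    return sum(counter_delta(a, b, bits) for a, b in zip(readings, readings[1:]))


if __name__ == "__main__":
    every_poll = [100, 400, 900, 1500]
    one_missed = [100, 400, 1500]       # the third poll never arrived
    print(total_bytes(every_poll))      # same total either way
    print(total_bytes(one_missed))
```

Only the rate for the missed interval has to be estimated; the byte total for the period is unaffected.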
If data is not gathered using a mechanism that communicates timestamp to the poller, datapoints will still be imprecise, SNMP would be an example -- the cacti application may assume the SNMP response is current data, but possibly on the actual hardware, the internal MIB on the device was actually updated 10 seconds ago, which means there will be small spikes in traffic rate graphs that do not represent actual spikes in traffic.
Most of the large ISPs I know of moved away from both of the solutions above to proprietary, custom solutions. They SNMP poll the counters and store that data in a database with high resolution counters, forever, never aggregated. The necessary perl/python/ruby code to do that and stick it in mysql or postgres is only a few pages long and easy to audit. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
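A minimal sketch of that store-everything approach, covering only the storage and per-customer aggregation side. It uses Python's built-in sqlite3 in place of mysql or postgres so that it is self-contained; the table layout, column names, and function names are all assumptions for illustration, and the SNMP polling itself is left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a table of raw, never-aggregated counter samples."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS samples (
               customer   TEXT    NOT NULL,
               device     TEXT    NOT NULL,
               ifindex    INTEGER NOT NULL,
               ts         INTEGER NOT NULL,  -- poll time, seconds since epoch
               in_octets  INTEGER NOT NULL,  -- raw counter value as polled
               out_octets INTEGER NOT NULL,
               PRIMARY KEY (device, ifindex, ts))"""
    )
    return db


def record(db, customer, device, ifindex, ts, in_octets, out_octets):
    """Store one poll result exactly as read from the device."""
    # Note: SQLite integers are signed 64-bit, so very large 64-bit counter
    # values would need a different column type in a real deployment.
    db.execute("INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)",
               (customer, device, ifindex, ts, in_octets, out_octets))


def customer_bytes(db, customer, start_ts, end_ts):
    """Total (in, out) bytes across all of a customer's interfaces.

    Uses last-minus-first per interface, so it ignores counter wraps and
    resets inside the window (a simplification for this sketch).
    """
    rows = db.execute(
        "SELECT device, ifindex, in_octets, out_octets FROM samples "
        "WHERE customer = ? AND ts BETWEEN ? AND ? "
        "ORDER BY device, ifindex, ts",
        (customer, start_ts, end_ts))
    first, last = {}, {}
    for device, ifindex, in_o, out_o in rows:
        key = (device, ifindex)
        first.setdefault(key, (in_o, out_o))
        last[key] = (in_o, out_o)
    total_in = sum(last[k][0] - first[k][0] for k in first)
    total_out = sum(last[k][1] - first[k][1] for k in first)
    return total_in, total_out


if __name__ == "__main__":
    db = open_store()
    record(db, "acme", "r1", 1, 0, 1000, 500)
    record(db, "acme", "r1", 1, 300, 4000, 900)
    record(db, "acme", "r2", 7, 0, 10, 10)
    record(db, "acme", "r2", 7, 300, 110, 60)
    print(customer_bytes(db, "acme", 0, 300))
```

Because the customer is just a column, aggregating several interfaces for one customer is a matter of grouping rows rather than combining graphs.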
Thanks all for the responses. I think I'm going to use cacti and plugins to aggregate. Aggregated billing is kind of something that would be nice to have but wasn't required. It's nice to know there are concerns with using cacti for this. My last question is if there is any easy/automated way to pull interfaces into cacti and configure graphs for them either via SNMP or reading from a mysql DB. I suddenly remember how much I hate importing large routers into cacti and configuring the graphs.
On 20/01/2012 15:36, Keegan Holley wrote:
using cacti for this. My last question is if there is any easy/automated way to pull interfaces into cacti and configure graphs for them either via SNMP or reading from a mysql DB. I suddenly remember how much I hate importing large routers into cacti and configuring the graphs.
No. This is one of cacti's major failings: there is no externally accessible API. You're going to end up injecting SQL directly into the cacti database and hoping that version upgrades don't screw up the schema layout too much. Nick
On 20/01/2012 15:44, "Nick Hilliard" <nick@foobar.org> wrote:
No. This is one of cacti's major failings: there is no externally accessible API.
Not an external API but scripts have been available for some time now: http://www.cacti.net/downloads/docs/html/scripts.html Ian
In a message written on Fri, Jan 20, 2012 at 10:36:38AM -0500, Keegan Holley wrote:
using cacti for this. My last question is if there is any easy/automated way to pull interfaces into cacti and configure graphs for them either via SNMP or reading from a mysql DB. I suddenly remember how much I hate importing large routers into cacti and configuring the graphs.
I find using MRTG is easier than Cacti for _automation_ purposes. Its cfgmaker script will generate a config file for a single router. I've written about 5 different versions of a small script that's basically a customized cfgmaker so the graphs get named with customer names or the like. The job can be fully automated with a few hours of coding; run it out of cron to rebuild your interface list automatically and you'll never miss a customer turn-up because someone forgot to configure a graph. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
Is there a plugin for MRTG that allows you to go back to specific times? I like MRTG better for this as well, but cacti's graphs are much more flexible.
On 20/01/2012 15:48, Leo Bicknell wrote:
I find using MRTG is easier than Cacti for _automation_ purposes.
It also has another slightly subtle but hugely useful advantage: the primary index reference of a graph does not refer to an interface name or a number, but can be defined as an arbitrary unique token. This is ridiculously useful when it comes to 3rd party scripting and moving customers around the place. Nick
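A customized config generator along these lines might look like the sketch below. It is not the script described above: the circuit list, token names, host names, and community string are invented, and in practice the list would come from a provisioning database or an SNMP walk. The `Target`, `MaxBytes`, `Title`, and `PageTop` keywords are standard MRTG configuration keys, and the bracketed token is the arbitrary unique key for each graph.

```python
def mrtg_stanza(token, router, community, ifindex, customer, max_bytes):
    """One MRTG target, keyed by an arbitrary token instead of the interface."""
    return "\n".join([
        f"Target[{token}]: {ifindex}:{community}@{router}",
        f"MaxBytes[{token}]: {max_bytes}",  # interface speed in bytes per second
        f"Title[{token}]: {customer} ({router} ifIndex {ifindex})",
        f"PageTop[{token}]: <h1>{customer}</h1>",
    ]) + "\n"


def generate_config(workdir, circuits):
    """Render a whole config file from a list of circuit dictionaries."""
    parts = [f"WorkDir: {workdir}\n"]
    for c in circuits:
        parts.append(mrtg_stanza(c["token"], c["router"], c["community"],
                                 c["ifindex"], c["customer"], c["max_bytes"]))
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical inventory; the token stays the same if the customer moves ports
    circuits = [
        {"token": "cust-acme-1", "router": "r1.example.net", "community": "public",
         "ifindex": 3, "customer": "Acme Corp", "max_bytes": 125000000},
    ]
    print(generate_config("/var/www/mrtg", circuits))
```

Run from cron, a script like this would regenerate the file whenever the inventory changes. If a customer moves to another port, only the Target line changes while the token, and therefore the graph history, stays the same.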
Once upon a time, Leo Bicknell <bicknell@ufp.org> said:
To suggest Netflow is more accurate than rrdtool seems rather strange to me. It can be as accurate, but is not the way most people deploy it.
Comparing Netflow to RRDTool is comparing apples to cabinets; one is a source of information and one is a way of storing information.
RRDTool pulls the SNMP counters from an interface and records them to a file.
No, RRDTool stores data given to it by a front end such as MRTG, Cricket, Cacti, etc. That front end can fetch data from any number of sources, including (but not limited to) SNMP. RRDTool then stores information in its database.
With no aggregation, and assuming your device has accurate SNMP, this should be 100% accurate. While you are right that the defaults for RRDTOOL aggregate data (after a day, week, and month, approximately) those aggregates can be disabled keeping the raw data.
RRDTool does not store the raw data. Even for 5-minute intervals, it adjusts the data vs. the timestamp to fit the desired interval. Since you don't read every counter at the exact time of your interval, RRDTool is always manipulating the numbers to fit. The only numbers that are not changed before storing are the timestamp and value for the most recent update (which get overwritten at each update); everything else is adjusted to fit. -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble.
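The effect described above can be illustrated with a simplified model of fitting irregularly timed counter readings onto fixed step boundaries. This is a sketch of the idea only, not RRDTool's actual algorithm or code: it spreads the rate between each pair of readings over the step-aligned bins it overlaps and reports a time-weighted average for every fully covered bin.

```python
def normalize(samples, step=300):
    """Fit (timestamp, counter) readings onto step-aligned intervals.

    samples: time-ordered list of (unix_seconds, counter_value) with integer
    timestamps. Returns {bin_start: average_rate} for fully covered bins.
    """
    byte_sum = {}  # bytes attributed to each bin
    covered = {}   # seconds of each bin for which a rate is known
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rate = (c1 - c0) / (t1 - t0)  # bytes per second between two polls
        t = t0
        while t < t1:
            bin_start = (t // step) * step
            seg_end = min(t1, bin_start + step)
            byte_sum[bin_start] = byte_sum.get(bin_start, 0.0) + rate * (seg_end - t)
            covered[bin_start] = covered.get(bin_start, 0) + (seg_end - t)
            t = seg_end
    return {b: byte_sum[b] / step
            for b in sorted(byte_sum) if covered[b] == step}


if __name__ == "__main__":
    # Polls arrive 10 seconds after each 5-minute boundary
    polls = [(10, 0), (310, 3000), (610, 9000)]
    print(normalize(polls))
```

With polls arriving 10 seconds after each boundary, the measured rates are 10 and 20 bytes per second, but the one fully covered bin ends up with a blend of the two. Neither measured value is stored as-is.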
On 01/20/2012 10:53 AM, Chris Adams wrote:
I suggest reading http://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html -- Stephen Clark *NetWolves* Director of Technology Phone: 813-579-3200 Fax: 813-882-0209 Email: steve.clark@netwolves.com http://www.netwolves.com
2012/1/20 Chris Adams <cmadams@hiwaay.net>
Once upon a time, Leo Bicknell <bicknell@ufp.org> said:
To suggest Netflow is more accurate than rrdtool seems rather strange to me. It can be as accurate, but is not the way most people deploy it.
Comparing Netflow to RRDTool is comparing apples to cabinets; one is a source of information and one is a way of storing information.
I assumed he meant an RRDTool kit that creates graphs with RRDTool. Technically, mysql is the "way of storing information". RRDTool processes it and has the ability to make it pretty for us humans.
RRDTool pulls the SNMP counters from an interface and records them to a file.
No, RRDTool stores data given to it by a front end such as MRTG, Cricket, Cacti, etc. That front end can fetch data from any number of sources, including (but not limited to) SNMP. RRDTool then stores information in its database.
Same as above
With no aggregation, and assuming your device has accurate SNMP, this should be 100% accurate. While you are right that the defaults for RRDTOOL aggregate data (after a day, week, and month, approximately) those aggregates can be disabled keeping the raw data.
RRDTool does not store the raw data. Even for 5-minute intervals, it adjusts the data vs. the timestamp to fit the desired interval. Since you don't read every counter at the exact time of your interval, RRDTool is always manipulating the numbers to fit. The only numbers that are not changed before storing are the timestamp and value for the most recent update (which get overwritten at each update); everything else is adjusted to fit.
I think every graphing tool does this. I pretty much ignored this though, since I was asking about aggregating data from multiple objects, not aggregating data over time. Cheers
RTG uses MySQL for its backend, so you can basically set up queries however you like, and you can use RTGPOLL to graph multiple interfaces as well. It's a super good tool and I think there is a group working on RTG2 at googlecode (I think). -Drew
RTG uses MySQL for its backend, so you can basically set up queries however you like, and you can use RTGPOLL to graph multiple interfaces as well.
It's a super good tool and I think there is a group working on RTG2 at googlecode (I think).
Another RTG user! I didn't know many of us existed! RTG is a great tool. Its design (perl and PHP and MySQL) lends itself to being modified at will; integration with tools like PHP NetworkWeathermap is very straightforward (http://pastebin.com/9RiZx4A8), and the MySQL backend makes it super flexible. There's no aggregation of data, unless you hack it in yourself with some fancy queries. RTG's data is ideal for doing MySQL partitioning, and there are some indexes that need to be added. But when you get those things in place, it becomes fast and powerful, and it's easy to drop out old data without a lengthy query (just drop the partition). The fact that each SNMP device gets its own table is also a big performance win over the more popular tools. The web interface allows for interface aggregation, and the code for doing that could probably be reverse engineered easily enough for other reporting mechanisms as well. Nathan Eisenberg
On Jan 20, 2012, at 12:49, Nathan Eisenberg <nathan@atlasnetworks.us> wrote:
The web interface allows for interface aggregation, and the code for doing that could probably be reverse engineered easily enough for other reporting mechanisms as well.
On this point (of nice aggregation UIs), is anyone here using Graphite as a backend for their time series data stores? You have to supply/write the poller yourself, but it seems an ideal backend for a "just graph everything" approach, which allows the poller to use SNMP get-bulk requests, something I haven't seen other pollers (rtg/mrtg/spine) doing. ~Matt
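For anyone writing their own poller, the hand-off to Graphite can be quite small. The sketch below formats readings in Carbon's plaintext line protocol (`path value timestamp`, one metric per line) and sends them over TCP; port 2003 is Carbon's default plaintext port, while the metric naming scheme and host name are assumptions, and the SNMP polling that produces the readings is not shown.

```python
import socket


def format_metrics(prefix, readings, timestamp):
    """Render readings as Carbon plaintext lines: '<path> <value> <timestamp>'."""
    lines = [f"{prefix}.{name} {value} {int(timestamp)}"
             for name, value in sorted(readings.items())]
    return "".join(line + "\n" for line in lines)


def send_to_carbon(payload, host="localhost", port=2003):
    """Send preformatted lines to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical naming scheme: customer.device.interface.counter
    payload = format_metrics("net.acme.r1.ge0",
                             {"in_octets": 4000, "out_octets": 900},
                             1327080000)
    print(payload, end="")
    # send_to_carbon(payload, host="graphite.example.net")  # needs a live Carbon
```

Putting the customer name early in the metric path is one way to make aggregation across interfaces a matter of a wildcard on the Graphite side.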
Matt Addison <matt.addison@lists.evilgeni.us> wrote:
On this point (of nice aggregation UIs) is anyone here using Graphite as a backend for their time series data stores?
I'm not personally, but I know some of our support clients are happily using it along with OpenNMS' support for outboarding of data storage via TCP and Google protobuf. -jeff
On Fri, Jan 20, 2012 at 08:15:45AM -0500, Drew Weaver wrote:
RTG uses MySQL for its backend, so you can basically set up queries however you like, and you can use RTGPOLL to graph multiple interfaces as well.
It's a super good tool and I think there is a group working on RTG2 at googlecode (I think).
-Drew
I agree with Drew -- I have several functions that do their best to correlate readings among multiple interfaces, combine them with other readings near the same time intervals, and output a single set of aggregate bandwidth data. One of RTG's big problems is scalability -- as you monitor more and more devices, going further and further back in time, you end up with a gigantic MySQL dataset that can be difficult to manage. Fortunately, there are open-source tools to help manage this. There's a Ruby program that automates consolidation of multiple rows into single rows based on configuration data -- allowing you to keep 5-minute readings of interface data for 2 months, then condensing it to 1-hour readings after that, with the flexibility to identify specific tables and specific timeframes to give you maximum control. -- Brandon Ewing (nicotine@warningg.com)
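The consolidation idea (keep 5-minute rows for a while, then fold older rows into hourly rows) can be sketched in a few lines. This is not the Ruby program mentioned above; it assumes each row is a (timestamp, bytes transferred in that interval) pair and works on an in-memory list rather than on MySQL tables, with a 60-day retention chosen as an approximation of "2 months".

```python
def consolidate(rows, now, keep_seconds=60 * 86400, bucket=3600):
    """Fold rows older than the retention window into one row per bucket.

    rows: list of (timestamp, byte_count) pairs at fine resolution.
    Rows newer than now - keep_seconds are returned unchanged; older rows
    are summed into buckets aligned to `bucket` seconds.
    """
    cutoff = now - keep_seconds
    recent = [(ts, value) for ts, value in rows if ts >= cutoff]
    buckets = {}
    for ts, value in rows:
        if ts < cutoff:
            start = ts - ts % bucket  # align to the start of the hour
            buckets[start] = buckets.get(start, 0) + value
    return sorted(buckets.items()) + sorted(recent)


if __name__ == "__main__":
    now = 100 * 86400
    old = [(ts, 10) for ts in range(0, 3600, 300)]  # twelve 5-minute rows
    new = [(now - 300, 7)]
    print(consolidate(old + new, now))
```

Summing per-interval byte counts keeps the billing total intact while shrinking the row count by a factor of twelve. Per-interval detail such as short peaks is lost for the consolidated range, which matters for rate-based billing.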
participants (13)
- Brandon Ewing
- Chris Adams
- Dale W. Carder
- Drew Weaver
- Ian Goodall
- Jeff Gehlbach
- Jimmy Hess
- Keegan Holley
- Leo Bicknell
- Matt Addison
- Nathan Eisenberg
- Nick Hilliard
- Steve Clark