Open source Netflow analysis for monitoring AS-to-AS traffic
What's presently the most commonly used open source toolset for monitoring AS-to-AS traffic?

I want to see with which ASes I am exchanging the most traffic across my transits and IX links. I want to look for opportunities to peer so I can better sell expansion of peering to upper management.

Our routers are mostly $VENDOR_C_XR so Netflow support is key.

In the past, I've used AS-Stats [1] for this purpose. However, it is particularly CPU and disk IO intensive. Also, it has not been actively maintained since 2017.

InfluxDB wants to sell me [2] on Telegraf + InfluxDB + Chronograf + Kapacitor, but I can't find any clear guide on what hardware I would need for that, never mind how to set up the software. It does appear to have an open source option, however.

pmacct seems to be good at gathering Netflow, but doesn't seem to analyze data. I don't see any concise howto guides for setting this up for my purpose, however.

I'm aware Kentik does this very well, but I have no budget at the moment, my testing window is longer than the 30 day trial, and we are not prepared to share our Netflow data with a third party.

Elastiflow [3] appears to have been open source [4] at one time in the past, but no longer. Since it too appears to be hosted, I have the same objections as I do with Kentik above.

On-list and off-list replies are welcome.

Thanks,

-Brian

Links:
------
[1] https://github.com/manuelkasper/AS-Stats
[2] https://www.influxdata.com/what-are-netflow-and-sflow/
[3] https://www.elastiflow.com/
[4] https://github.com/robcowart/elastiflow?tab=readme-ov-file
I'm using Akvorado for netflow and I'm pretty happy with it. I'm seeing it recommended more frequently on Reddit and elsewhere lately, too.

https://github.com/akvorado/akvorado

John Stitt
Brian,

Take a peek at Akvorado: https://github.com/akvorado/akvorado

We recently set up a lab instance, and it seems to check the boxes below.
On Mar 26, 2024, at 19:04, Brian Knight via NANOG <nanog@nanog.org> wrote:
What's presently the most commonly used open source toolset for monitoring AS-to-AS traffic?
Brian,

I have used Akvorado in an environment with ~80G of traffic and I was super happy. It can be set up easily via a docker-compose file, and among its key benefits is the user-friendly UI that lets you gain insight into your network traffic.

There is also a demo instance available if you want to find out what to expect: https://demo.akvorado.net/

My only "concern" was that it did not provide an API for consuming data externally.

- Marinos

On 3/27/2024 2:55 AM, Andrew Hoyos wrote:
Brian,
Take a peek at Akvorado - https://github.com/akvorado/akvorado We recently set up a lab instance, and seems to check the boxes below.
On 2024-03-27 09:09, Marinos Dimolianis wrote:
My only "concern" was that it did not provide an API for consuming data externally.
This is very high on my todo list, notably because I don't want to reimplement Grafana. The API already exists (the current web interface uses it) but it is not "stable" (it may change in future versions).
Interested in responses to this as well. Something informative that I can also adopt for zero $$ would be amazing.

In case you do get pointers off-list, kindly share them. We can walk the journey together and compare notes :)
Try FlowViewer: http://flowviewer.net

Free, complete, graphical netflow analysis tool. Developed for NASA. Runs on top of SiLK, a powerful open-source netflow capture and analysis tool developed by Carnegie Mellon for DoD.

Supports IPFIX, netflow v5, sflow, IPv6. Text reports, graphing and long-term tracking via graphs. Automatic storage control capability.

In general, as you probably know, it's amazing what you can get from netflow.

Best,

Joe
Brian, you may want to see if your routers support sFlow (vendors have added the feature over the last few years).

In particular, see if it includes support for the sFlow extended_gateway structure:

    /* Extended Gateway Data */
    /* opaque = flow_data; enterprise = 0; format = 1003 */

    struct extended_gateway {
       next_hop nexthop;           /* Address of the border router that should
                                      be used for the destination network */
       unsigned int as;            /* Autonomous system number of router */
       unsigned int src_as;        /* Autonomous system number of source */
       unsigned int src_peer_as;   /* Autonomous system number of source peer */
       as_path_type dst_as_path<>; /* Autonomous system path to the destination */
       unsigned int communities<>; /* Communities associated with this route */
       unsigned int localpref;     /* LocalPref associated with this route */
    }

The dst_as_path field is particularly valuable since it allows you to see who your customers are peering with.

While not a complete solution, you might want to take a look at sflowtool, https://github.com/sflow/sflowtool, to decode the sFlow records and convert them to JSON. It's not hard to write a Python script to calculate BGP peering metrics and push the results into a time series database (Prometheus, InfluxDB, etc.) and build dashboards in Grafana. The following article gives a few examples:

https://blog.sflow.com/2018/12/sflow-to-json.html
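As a rough illustration of the "Python script" idea above, here is a minimal sketch that sums estimated traffic per destination AS from one-JSON-object-per-line flow records. The keys `bytes`, `sampling_rate` and `dst_as_path` are hypothetical names chosen for this example, not the field names sflowtool emits; the decoder's real output would have to be mapped onto them first. The AS numbers come from the documentation range.

```python
import json
from collections import Counter
from typing import Iterable


def bytes_by_dst_as(lines: Iterable[str]) -> Counter:
    """Sum estimated bytes per destination AS.

    Each line is assumed to be a JSON object with the hypothetical keys
    'bytes', 'sampling_rate' and 'dst_as_path' (list of AS numbers,
    destination AS last).
    """
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        path = rec.get("dst_as_path") or []
        if not path:
            continue  # sample carried no BGP information
        # One sampled frame stands in for roughly sampling_rate frames.
        totals[path[-1]] += rec["bytes"] * rec["sampling_rate"]
    return totals


# Made-up sample records, for illustration only.
sample_lines = [
    '{"bytes": 1500, "sampling_rate": 1000, "dst_as_path": [64500, 64510]}',
    '{"bytes": 500, "sampling_rate": 1000, "dst_as_path": [64501, 64510]}',
    '{"bytes": 1000, "sampling_rate": 1000, "dst_as_path": [64500, 64511]}',
    '{"bytes": 64, "sampling_rate": 1000}',
]
totals = bytes_by_dst_as(sample_lines)
for asn, est_bytes in totals.most_common():
    print(asn, est_bytes)
```

In practice the function would read from the decoder's output stream instead of a list, and the totals would be pushed to the time series database on an interval.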
In the same vein, if you can get your devices exporting sFlow (or for others reading who do have sFlow-capable devices): the sFlow-RT team has built ready-to-deploy, all-in-one Docker containers using Grafana and Prometheus. You can stand them up within minutes to start visualizing and easily querying/processing sFlow data from your routers, with no prior experience with the underlying software needed.

https://blog.sflow.com/2023/07/deploy-real-time-network-dashboards.html
https://github.com/sflow-rt/prometheus-grafana
On Wed, 27 Mar 2024 at 21:02, Peter Phaal <peter.phaal@gmail.com> wrote:
Brian, you may want to see if your routers support sFlow (vendors have added the feature over the last few years).
Why is this a solution? What does it solve for OP? Why is it meaningful what the wire format of the records is? I read OP's question at a much higher level: how to interact with and reason about the data, rather than how to emit it.

Ultimately sFlow is a perfect subset of IPFIX: when you run IPFIX without caching you get the functional equivalent of sFlow (there is an IPFIX entity for emitting n bytes from the frame as well as data).

--
++ytti
I hope my comments were useful. I was trying to raise awareness that BGP as-path information is an option and might be helpful in addressing Brian's requirements: "I want to see with which ASes I am exchanging the most traffic across my transits and IX links. I want to look for opportunities to peer so I can better sell expansion of peering to upper management."

Possible reports that could be of interest are:

1. destination AS numbers by traffic volume and as-path length
2. destination AS numbers by traffic volume and second to last AS in path (AS of peering with destination)
3. traffic volume by transit AS
4. traffic volume passing through AS allow / deny ASN list

What other types of report might be interesting?

sFlow was mentioned because I believe Brian's routers support the feature and may well export the as-path data directly via sFlow (I am not aware that it is a feature widely supported in vendor NetFlow/IPFIX implementations?). However, some of the tools mentioned (pmacct, Kentik, Akvorado) can enrich flow data downstream (through a BGP / BMP peering session with the router) if it isn't present in the sFlow/Netflow/IPFIX records, although downstream enrichment does add a level of operational complexity.
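The first three reports in the list above can be sketched in a few lines once each flow record carries an estimated byte count and an AS path. This is only an illustration under assumptions: the `(bytes, path)` tuples are made-up data, consecutive repeats in a path are treated as prepending and collapsed, and "transit AS" is taken to mean every AS in the path other than the destination.

```python
from collections import defaultdict


def strip_prepends(path):
    """Collapse consecutive repeats of an AS (AS-path prepending)."""
    out = []
    for asn in path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def peering_reports(flows):
    """Return the three reports as plain dicts keyed as described below."""
    by_dst_and_len = defaultdict(int)       # (dst AS, path length) -> bytes
    by_dst_and_upstream = defaultdict(int)  # (dst AS, second-to-last AS) -> bytes
    by_transit = defaultdict(int)           # transit AS -> bytes
    for est_bytes, raw_path in flows:
        path = strip_prepends(raw_path)
        if not path:
            continue
        dst = path[-1]
        upstream = path[-2] if len(path) > 1 else None
        by_dst_and_len[(dst, len(path))] += est_bytes
        by_dst_and_upstream[(dst, upstream)] += est_bytes
        for asn in set(path[:-1]):
            by_transit[asn] += est_bytes
    return dict(by_dst_and_len), dict(by_dst_and_upstream), dict(by_transit)


# Hypothetical input: (estimated bytes, AS path with destination AS last).
flows = [
    (4000, [64500, 64505, 64510]),
    (1000, [64501, 64510]),
    (2000, [64500, 64500, 64511]),
]
by_len, by_upstream, by_transit = peering_reports(flows)
print(by_len)
print(by_upstream)
print(by_transit)
```

A destination AS that shows high volume at path length 3 or more, always behind the same second-to-last AS, is the kind of entry that would suggest a peering candidate.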
Hey,

On Thu, 28 Mar 2024 at 17:49, Peter Phaal <peter.phaal@gmail.com> wrote:
sFlow was mentioned because I believe Brian's routers support the feature and may well export the as-path data directly via sFlow (I am not aware that it is a feature widely supported in vendor NetFlow/IPFIX implementations?).

Exporting AS information is a wire-format-agnostic feature; whether it's supported or not, it can equally be injected into sFlow, NetFlow v5 (src and dst only), NetFlow v9 and IPFIX. The cost is that you need to program the information into FIB entries, so that it becomes available at lookup time for record creation.

In OP's case (IOS-XR) this means enabling 'attribute-download' for BGP, and I believe IOS-XR will never download any ASN other than src and dst; therefore full information cannot be injected into any emitted wire format.

--
++ytti
Yeah, the cost to implement dst_as_path lookups far outweighs the usefulness IMO. If you really want that, it's much better to get it via BMP. (Same with communities and localpref in the extended gateway definition of sFlow.)

Fundamentally I've always disagreed with how sFlow aggregates flow data with network state data. IMO you collect the two things separately, and join them off-device should you need to for analysis.
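The "collect separately, join off-device" approach can be sketched as a longest-prefix-match lookup of each flow's destination address against a routing table snapshot. Everything here is assumed for illustration: the `rib` dict stands in for whatever a BMP or BGP collector would provide, the flow records are made up, and the linear scan would be replaced by a radix trie at any real scale.

```python
import ipaddress
from collections import defaultdict

# Hypothetical RIB snapshot: prefix -> AS path (destination AS last).
rib = {
    ipaddress.ip_network("198.51.100.0/24"): [64500, 64510],
    ipaddress.ip_network("198.51.100.128/25"): [64501, 64511],
    ipaddress.ip_network("2001:db8::/32"): [64500, 64512],
}


def lookup_as_path(rib, address):
    """Return the AS path of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix in rib:
        if prefix.version != ip.version or ip not in prefix:
            continue
        if best is None or prefix.prefixlen > best.prefixlen:
            best = prefix
    return rib[best] if best is not None else None


# Hypothetical flow records that carry no BGP information of their own.
flows = [
    {"dst": "198.51.100.10", "bytes": 3000},
    {"dst": "198.51.100.200", "bytes": 1200},
    {"dst": "2001:db8::1", "bytes": 800},
    {"dst": "203.0.113.1", "bytes": 50},
]

bytes_by_dst_as = defaultdict(int)
for flow in flows:
    path = lookup_as_path(rib, flow["dst"])
    dst_as = path[-1] if path else None  # None collects unrouted traffic
    bytes_by_dst_as[dst_as] += flow["bytes"]

print(dict(bytes_by_dst_as))
```

The trade-off raised in the thread still applies: the join needs a RIB snapshot that is reasonably in sync with the time the flows were observed.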
Tom Beecher wrote on 28/03/2024 18:35:
Fundamentally I've always disagreed with how sFlow aggregates flow data with network state data.

"can aggregate" rather than "aggregates" - this is implementation dependent and most implementations don't bother with it.

Overall, sflow has one major advantage over netflow/ipfix, namely that it's a stateless sampling mechanism. Once you have hardware that can reliably pick out one in N frames, the rest of the protocol is straightforward enough, which means that it's cheap to implement in hardware. If you're ok with 1. sampling and 2. the set of data that sflow provides, then sflow is great.

Netflow / ipfix, on the other hand, assumes that it's learning about flow state. For this, you need both a flow lookup mechanism and flow storage memory. Usually the flow lookup mechanism is implemented using the same technology as the packet forwarding lookup mechanism due to performance requirements, i.e. expensive. Similarly, the storage mechanism needs to be fast, which often precludes being large. Often both the lookup and storage mechanism are linked, e.g. tcam.

Obviously, not all netflow/ipfix implementations implement flow state, but most do; some implement stateless sampling ala sflow. Also many netflow implementations don't export mac address information, which limits usefulness in certain situations. But this is an implementation gap rather than a protocol weakness.

Tools should be chosen to fit the job. There are plenty of situations where sflow is ideal. There are others where netflow is preferable.

Nick
On Fri, 29 Mar 2024 at 02:15, Nick Hilliard <nick@foobar.org> wrote:
Overall, sflow has one major advantage over netflow/ipfix, namely that it's a stateless sampling mechanism. Once you have hardware that can
Obviously, not all netflow/ipfix implementations implement flow state, but most do; some implement stateless sampling ala sflow. Also many
Tools should be chosen to fit the job. There are plenty of situations where sflow is ideal. There are others where netflow is preferable.
This seems like a long-winded way of saying, sFlow is a perfect subset of IPFIX.

We will increasingly see IPFIX implementations omit state, because state doesn't do anything anymore in high-volume networks: you will only ever create the flow in cache, then delay exporting the information for some seconds, but the flow is never hit twice. You therefore pay a massive cost for caching without getting anything out of it. Anyone who actually needs caching will have to buy specialised devices, as it will no longer be economical for peering routers to offer the memory bandwidth and cache sizes at which caches actually do something.

In a particular network we tried 1:5000 and 1:500, and in both cases flow records were 1 packet long, at which point we hit the record export policer limit and couldn't determine at which sampling rate we would start to see the cache being useful.

I've wondered for a long time what a graph would look like where you plot sampling ratio against percentage of flows observed. It will be linear up to very high sampling ratios, but eventually it will start to taper off; I just don't have any intuitive idea when. And I don't think anyone really knows what ratio of flows they are observing in sFlow/IPFIX. If you keep the sampling ratio static over a period of time, say a decade, you will continuously reduce your resolution, seeing a smaller percentage of flows.

This worries me a lot, because a statistician would say that you need this share of volume or this share of flows if you want to use the data like this with this confidence. Therefore, if we think about the problem formally, we should constantly adjust our sampling ratios to fit our statistical model and keep the same promises about data quality.

--
++ytti
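One way to put numbers on the per-flow part of this question is a simple binomial model. Assuming each packet is sampled independently with probability 1/N (an idealisation of 1:N sampling, not a description of any particular router), a flow of n packets is seen at least once with probability 1 - (1 - 1/N)^n, and a cache entry is only ever "hit twice" if at least two of its packets are sampled. The sketch below computes both; it says nothing about the aggregate curve, which depends on the flow size distribution of the network in question.

```python
def p_seen(n_packets: int, rate: int) -> float:
    """Probability that a flow of n_packets is sampled at least once at 1:rate."""
    p = 1.0 / rate
    return 1.0 - (1.0 - p) ** n_packets


def p_seen_twice(n_packets: int, rate: int) -> float:
    """Probability that at least two packets of the flow are sampled.

    Only in this case does a flow cache entry get updated after creation.
    Expects n_packets >= 1.
    """
    p = 1.0 / rate
    q = 1.0 - p
    return 1.0 - q ** n_packets - n_packets * p * q ** (n_packets - 1)


if __name__ == "__main__":
    for rate in (500, 5000):
        for n in (10, 1_000, 100_000):
            print(
                f"1:{rate:<5} flow of {n:>7} packets: "
                f"seen once {p_seen(n, rate):.4f}, "
                f"seen twice {p_seen_twice(n, rate):.4f}"
            )
```

Under this model a flow needs to be several times longer than N packets before a second sample becomes likely, which is consistent with the observation above that records stayed one packet long at both 1:500 and 1:5000.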
On Fri, 2024-03-29 at 00:15 +0000, Nick Hilliard wrote:
Overall, sflow has one major advantage over netflow/ipfix, namely that it's a stateless sampling mechanism.
Precisely.

From my corner of the industry, my use case for flow data is extremely limited: I need (sampled) frame information: src-mac, dst-mac, qtag, ethernet protocol, framesize, sample rate. sFlow provides that in every sample, in a straightforward manner. (Never mind that the vendor we use does interesting things with the way they sample.)

IPFIX, by comparison, is a nightmare: to understand the data records, you need to have seen (and stored) the corresponding data template first. Those records will contain most of the information I need, *except* the sampling rate, which comes from an options data record... which you first have to match to an options template. Then, the sampling rate may not be present, but the sampling probability can be. Slightly different semantics. So that's four types of records your collector may receive. There is also at least one vendor that believes it's perfectly fine to export those over different transport sessions (read: different UDP source ports), which makes it really hard to do load balancing on the receiving side.

To top it off, both the sFlow and IPFIX specs are sufficiently vague about the meaning of the "frame size", so vendors can implement whatever they want (include/exclude padding, include/exclude FCS). This implies that you shouldn't trust these fields.

Ah, well.

--
Steven
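The collector-side bookkeeping described above can be sketched without any wire parsing. This is a hypothetical outline, not how any particular collector works: the class, its method names and the decoded field names (`sampling_interval`, `sampling_probability`) are invented for the example. State is keyed by exporter address and observation domain only, deliberately ignoring the UDP source port, as one possible workaround for exporters that spread templates and data across ports.

```python
class TemplateCache:
    """Per-exporter state needed before IPFIX data records can be interpreted."""

    def __init__(self):
        self.templates = {}  # (exporter, domain, template_id) -> field names
        self.scale = {}      # (exporter, domain) -> packets represented per sample

    def add_template(self, exporter, domain, template_id, fields):
        self.templates[(exporter, domain, template_id)] = list(fields)

    def add_sampling_options(self, exporter, domain, options):
        # Either an interval (1 in N) or a probability may be reported.
        if "sampling_interval" in options:
            self.scale[(exporter, domain)] = float(options["sampling_interval"])
        elif "sampling_probability" in options:
            self.scale[(exporter, domain)] = 1.0 / options["sampling_probability"]

    def decode(self, exporter, domain, template_id, values):
        """Return a dict for the record, or None if its template is unknown."""
        fields = self.templates.get((exporter, domain, template_id))
        if fields is None:
            return None  # caller must buffer or drop until the template arrives
        record = dict(zip(fields, values))
        # scale stays None until an options record has been seen.
        record["scale"] = self.scale.get((exporter, domain))
        return record


cache = TemplateCache()
early = cache.decode("192.0.2.1", 0, 256, ["aa:bb", "cc:dd", 1514])
cache.add_template("192.0.2.1", 0, 256, ["src_mac", "dst_mac", "frame_size"])
unscaled = cache.decode("192.0.2.1", 0, 256, ["aa:bb", "cc:dd", 1514])
cache.add_sampling_options("192.0.2.1", 0, {"sampling_probability": 0.25})
scaled = cache.decode("192.0.2.1", 0, 256, ["aa:bb", "cc:dd", 1514])
print(early, unscaled, scaled, sep="\n")
```

The three `decode` calls show the states a collector passes through: record unusable, record readable but not scalable, and record fully usable. sFlow avoids the first two because every sample carries its own sampling rate.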
The sFlow frame_length field isn't intended to be vague. If you are seeing non-conforming sFlow implementations, please raise the issue with the vendor so they can fix the issue. Verifying that the frame_length and stripped fields are correctly implemented is one of the tests performed by the sFlow Test tool and running the tool can be helpful in persuading a vendor that they are out of compliance:

https://blog.sflow.com/2015/11/sflow-test.html

The following language is included in the sFlow Version 5 spec, https://sflow.org/sflow_version_5.txt.

    /* Raw Packet Header */
    /* opaque = flow_data; enterprise = 0; format = 1 */

    struct sampled_header {
       header_protocol protocol;       /* Format of sampled header */
       unsigned int frame_length;      /* Original length of packet before
                                          sampling.
                                          Note: For a layer 2 header_protocol,
                                                length is total number of octets
                                                of data received on the network
                                                (excluding framing bits but
                                                including FCS octets).
                                                Hardware limitations may
                                                prevent an exact reporting
                                                of the underlying frame length,
                                                but an agent should attempt to
                                                be as accurate as possible. Any
                                                octets added to the frame_length
                                                to compensate for encapsulations
                                                removed by the underlying hardware
                                                must also be added to the stripped
                                                count. */
       unsigned int stripped;          /* The number of octets removed from
                                          the packet before extracting the
                                          header<> octets. Trailing encapsulation
                                          data corresponding to any leading
                                          encapsulations that were stripped must
                                          also be stripped. Trailing encapsulation
                                          data for the outermost protocol layer
                                          included in the sampled header must be
                                          stripped.

                                          In the case of a non-encapsulated 802.3
                                          packet stripped >= 4 since VLAN tag
                                          information might have been stripped off
                                          in addition to the FCS.

                                          Outer encapsulations that are ambiguous,
                                          or not one of the standard header_protocol
                                          must be stripped. */
       opaque header<>;                /* Header bytes */
    }
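For anyone wanting to inspect these two fields on the collector side, the structure quoted from the spec is small enough to decode by hand. The sketch below assumes standard XDR encoding (big-endian 32-bit integers, variable-length opaque prefixed by its length) and that `buf` starts at the first field of `sampled_header`; the function names and the plausibility check are my own additions for illustration, not part of the sFlow Test tool.

```python
import struct

ETHERNET_ISO8023 = 1  # header_protocol value for Ethernet in sFlow v5


def parse_sampled_header(buf: bytes) -> dict:
    """Decode protocol, frame_length, stripped and header bytes from XDR."""
    if len(buf) < 16:
        raise ValueError("buffer too short for sampled_header")
    protocol, frame_length, stripped, header_len = struct.unpack_from("!IIII", buf, 0)
    header = buf[16:16 + header_len]
    if len(header) != header_len:
        raise ValueError("truncated header bytes")
    return {
        "protocol": protocol,
        "frame_length": frame_length,
        "stripped": stripped,
        "header": header,
    }


def looks_plausible(rec: dict) -> bool:
    """Cheap sanity checks derived from the spec text quoted above."""
    # Non-encapsulated 802.3: at least the 4 FCS octets must be stripped.
    if rec["protocol"] == ETHERNET_ISO8023 and rec["stripped"] < 4:
        return False
    # The sampled header cannot be longer than what is left after stripping.
    return len(rec["header"]) <= rec["frame_length"] - rec["stripped"]


# Hand-built example: 1518-octet frame, FCS stripped, 6 header bytes + XDR padding.
good = struct.pack("!IIII", 1, 1518, 4, 6) + bytes(6) + bytes(2)
bad = struct.pack("!IIII", 1, 1518, 0, 6) + bytes(6) + bytes(2)
print(looks_plausible(parse_sampled_header(good)))
print(looks_plausible(parse_sampled_header(bad)))
```

A record that fails such a check is a hint that the agent is not following the definition, which is the kind of evidence that could be taken to the vendor.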
Hi Peter,

Thanks for that link. I did read the spec, and while the definition itself is clear, the escape clause gives a lot of wiggle room:

"Hardware limitations may prevent an exact reporting of the underlying frame length, but an agent should attempt to be as accurate as possible."

I read that as, "the vendor will do whatever it pleases, and you should be grateful to receive a non-negative integer at all." I could be too cynical, though.

Anyway, this particular vendor does other funny things (such as sometimes stripping the q-tag headers from the sampled frame; throttling the frame sampling on the box, but not adjusting the sampling interval in the sFlow exports) that make it a true joy to work with this gear. ;-)

Cheers,

--
Steven
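On the throttling point: sFlow v5 flow samples carry `sequence_number` and `sample_pool` counters alongside the advertised `sampling_rate`, so a collector can at least estimate the rate actually achieved between two samples from the same data source. The sketch below is only that, an estimate under the assumption that the agent maintains both counters as the spec describes (which a misbehaving implementation may not); the dict layout and function name are hypothetical.

```python
MOD = 2 ** 32  # both counters are 32-bit unsigned and may wrap


def effective_sampling_rate(prev: dict, curr: dict):
    """Estimate packets-per-sample between two flow samples of one data source.

    prev and curr are assumed to be decoded flow samples with the keys
    'sequence_number' and 'sample_pool'. Returns None if no new samples
    were generated between them.
    """
    d_pool = (curr["sample_pool"] - prev["sample_pool"]) % MOD
    d_seq = (curr["sequence_number"] - prev["sequence_number"]) % MOD
    if d_seq == 0:
        return None
    return d_pool / d_seq


# Made-up example: agent advertises 1:1000 but delivers far fewer samples.
prev = {"sequence_number": 100, "sample_pool": 1_000_000, "sampling_rate": 1000}
curr = {"sequence_number": 200, "sample_pool": 1_500_000, "sampling_rate": 1000}
estimate = effective_sampling_rate(prev, curr)
print(estimate, "vs advertised", curr["sampling_rate"])
```

If the estimate sits well above the advertised rate for sustained periods, scaling by the advertised rate will under-count traffic, and the estimate may be the better multiplier.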
Hi everyone,

I've been trying to get Akvorado to work in my environment, but flow collection keeps stopping. It seems like the issue is related to the number of exporters I have sending data. Can someone please share the maximum number of exporters they have gotten to work, and the flows/s rate, without the system crashing?

Thanks in advance for your answers.
Without much information, I think it is more likely that you are running out of disk space. On 2024-06-05 23:15, Javier Gutierrez wrote:
Hi everyone, I've been trying to get Akvorado to work in my environment, but flow collection keeps stopping. The issue seems to be related to the number of exporters I have sending data. Can someone please share the maximum number of exporters, and the flows/s rate, that they have gotten to work without the system crashing?
Thanks in advance for your answers.
After some troubleshooting, I ended up having to increase my Kafka partitions as well as my ClickHouse collectors, as it seemed like ClickHouse would lag behind quite a bit. I also had some issues with the server, where CPU and RAM would max out all the time. My RAM usage is still quite high and seems to grow exponentially as the day goes by, but I don't think it's causing the issues anymore. Storage-wise I'm good, though. Thanks for the advice. Kind regards, Javier Gutierrez
On Fri, 29 Mar 2024 at 20:10, Steven Bakker <steven.bakker@ams-ix.net> wrote:
To top it off, both the sFlow and IPFIX specs are sufficiently vague about the meaning of the "frame size", so vendors can implement whatever they want (include/exclude padding, include/exclude FCS). This implies that you shouldn't trust these fields.
I share this concern, but in my experience the market simply does not care at all what the data means. People happily graph L3 rate from Junos and L2 rate from other boxes, using them interchangeably and also using them to determine whether or not there is congestion, while in reality what you want is L1 speed, so you can actually see whether the interface is full. Luckily, we are starting to see more and more devices also support peak-buffer-util over the previous N seconds, which is far more useful for congestion monitoring. Unfortunately, it is not in IF-MIB, so most will never collect it.
Note that it is possible to get most Juniper gear to report L2 rate as IF-MIB specifies, but it's a non-standard configuration option and therefore very rarely used.
I also wholeheartedly agree that inline templates are near peak insanity: huge complexity for an upside that is completely beyond my understanding. If I decide to collect a new metric, then punching in the metric number+name somewhere is the least of my worries. The idea that costs are lowered by having machines dynamically determine what is being collected and monitored is just bizarre. Most of the cost of starting to collect a new metric is figuring out how it is actionable, what needs to happen to the metric to trigger a given action, and how exactly we are extracting value from this action.
Definitely, Netflow v9/v10 should have done out-of-band templates and left it to the operator to communicate to the collector what it is seeing.
Even exceedingly trivial things in v9/v10 entities can be broken for years and years before anyone notices. For example, the original sampling entities are deprecated and have been replaced with new entities that communicate 'every N packets, sample C packets'. This is very good, because it allows you to do stateless sampling while still filling the export packet to MTU size or larger, keeping the export PPS rate the same before and after axing the cache. However, by the time I was looking into this, only pmacct correctly understood how to use these entities; nfcapd and Arbor either didn't understand them or understood them incorrectly (both were fixed in a timely manner by responsible maintainers, thank you).
-- ++ytti
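To put numbers on two of the points above, here is a small sketch of (a) how far apart L3, L2 and L1 rates are for the same traffic on Ethernet, and (b) how a collector would scale counts when C packets are sampled out of every N. The function names are made up for illustration, the arithmetic ignores padding of short frames to the Ethernet minimum, and how N and C map onto specific v9/IPFIX entities is deliberately left out.

```python
# Per-frame Ethernet overhead, in bytes.
L2_HEADER = 14    # destination MAC + source MAC + EtherType (untagged)
VLAN_TAG = 4      # each 802.1Q tag
FCS = 4           # frame check sequence
L1_OVERHEAD = 20  # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap


def l2_bps(l3_bps: float, pps: float, vlan_tags: int = 0) -> float:
    """L2 rate (frames including FCS) from an L3 rate and a packet rate."""
    per_frame = L2_HEADER + VLAN_TAG * vlan_tags + FCS
    return l3_bps + pps * per_frame * 8


def l1_bps(l2_rate_bps: float, pps: float) -> float:
    """L1 rate: what actually occupies the wire, including preamble and gap."""
    return l2_rate_bps + pps * L1_OVERHEAD * 8


def scale_sampled(count: float, n: int, c: int) -> float:
    """Estimate the real total when C packets are sampled out of every N."""
    if not 0 < c <= n:
        raise ValueError("need 0 < C <= N")
    return count * n / c


# One million 46-byte IP packets per second:
l3 = 46 * 8 * 1_000_000         # 368 Mb/s at L3
l2 = l2_bps(l3, 1_000_000)      # 512 Mb/s at L2 (64-byte frames)
l1 = l1_bps(l2, 1_000_000)      # 672 Mb/s at L1 (84 bytes on the wire)
```

With small packets the L1 rate here is almost twice the L3 rate, which is why graphing one and reading it as the other says little about whether the interface is full.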
The documentation for IOS-XR suggests that enabling extended-router in the sFlow configuration should export "Autonomous system path to the destination", at least on the 8000 series routers: https://www.cisco.com/c/en/us/td/docs/iosxr/cisco8000/netflow/command/refere... I couldn't find a similar option in the NetFlow/IPFIX configuration guide, but I might have missed it. On Thu, Mar 28, 2024 at 10:48 AM Saku Ytti <saku@ytti.fi> wrote:
Hey,
On Thu, 28 Mar 2024 at 17:49, Peter Phaal <peter.phaal@gmail.com> wrote:
sFlow was mentioned because I believe Brian's routers support the feature and may well export the as-path data directly via sFlow (I am not aware that it is a feature widely supported in vendor NetFlow/IPFIX implementations?).
Exporting AS information is a wire-format-agnostic feature: if the platform supports it, the information can equally be injected into sFlow, Netflow v5 (src and dst only), Netflow v9 and IPFIX. The cost is that you need to program the information into the FIB entries, so that it is available at lookup time for record creation.
In OP's case (IOS-XR) this means enabling 'attribute-download' for BGP, and I believe IOS-XR will never download any ASN other than src and dst, therefore full information cannot be injected into any emitted wire format. -- ++ytti
On Thu, 28 Mar 2024 at 20:36, Peter Phaal <peter.phaal@gmail.com> wrote:
The documentation for IOS-XR suggests that enabling extended-router in the sFlow configuration should export "Autonomous system path to the destination", at least on the 8000 series routers: https://www.cisco.com/c/en/us/td/docs/iosxr/cisco8000/netflow/command/refere... I couldn't find a similar option in the NetFlow/IPFIX configuration guide, but I might have missed it.
Hope this clarifies.
-------
https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k-r7-9/co...
Use the record ipv4 [peer-as] command to record peer AS. Here, you collect and export the peer AS numbers.
Note: Ensure that the bgp attribute-download command is configured. Else, no AS is collected when the record ipv4 or record ipv4 peer-as command is configured.
------------
-- ++ytti
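If the router only puts a limited set of AS numbers into its flow records, the rest of the path has to be added on the collector side from a separate BGP feed. The following is a rough sketch of that idea using a longest-prefix match; it is not how any particular collector implements it. The RIB contents, prefixes and AS numbers are invented (documentation ranges), and lookup_as_path is a hypothetical helper.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical RIB extract: prefix -> AS path. In practice this would be
# built from a BGP feed rather than hard-coded.
RIB = {
    ipaddress.ip_network("198.51.100.0/24"): (64496, 64497),
    ipaddress.ip_network("198.51.100.128/25"): (64501, 64502, 64503),
    ipaddress.ip_network("2001:db8::/32"): (64496, 64510),
}


def lookup_as_path(addr: str) -> Optional[Tuple[int, ...]]:
    """Return the AS path of the most specific prefix covering addr, if any."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, path in RIB.items():
        if net.version != ip.version or ip not in net:
            continue
        # Keep the longest (most specific) matching prefix.
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, path)
    return best[1] if best else None
```

A linear scan is fine for a sketch; a full table would need a radix tree or similar structure.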
On 27/03/24 01:04, Brian Knight via NANOG wrote:
What's presently the most commonly used open source toolset for monitoring AS-to-AS traffic?
I want to see with which ASes I am exchanging the most traffic across my transits and IX links. I want to look for opportunities to peer so I can better sell expansion of peering to upper management. … pmacct seems to be good at gathering Netflow, but doesn't seem to analyze data. I don't see any concise howto guides for setting this up for my purpose, however.
pmacct will do what you want and it's not particularly difficult to set it up. For example, you can aggregate data into a database using:
  aggregate[in]: src_as,src_net,src_mask
  aggregate[out]: dst_as,dst_net,dst_mask
Now you can issue SQL queries that tell you which ASes or prefixes you send/receive the most bits or packets to/from.
Tore
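As an illustration of the kind of SQL query Tore means, here is a self-contained sketch using an in-memory SQLite database. The table name acct_out and the columns as_dst, bytes and packets are assumptions made for this example; the actual schema depends on the pmacct SQL plugin and version in use, and the rows are invented.

```python
import sqlite3

# In-memory stand-in for the database the collector would write to.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE acct_out (as_dst INTEGER, bytes INTEGER, packets INTEGER)"
)
conn.executemany(
    "INSERT INTO acct_out VALUES (?, ?, ?)",
    [
        (64496, 5_000_000, 4000),
        (64497, 9_000_000, 7000),
        (64496, 2_500_000, 2000),
        (64498, 1_000_000, 900),
    ],
)


def top_destination_ases(db: sqlite3.Connection, limit: int = 10):
    """Destination ASes ranked by total bytes sent to them."""
    return db.execute(
        "SELECT as_dst, SUM(bytes) AS total_bytes "
        "FROM acct_out GROUP BY as_dst "
        "ORDER BY total_bytes DESC LIMIT ?",
        (limit,),
    ).fetchall()


top = top_destination_ases(conn, 2)
```

The same query against the inbound table, grouped by source AS, would give the receive side.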
Thanks to all who took the time to comment and make suggestions.
To summarize the private messages, one respondent suggested Argus as a collector. Another mentioned that they are still using AS-Stats.
I'm drawn to Akvorado. I like the self-contained nature of the application. NF collector, database, and modern web GUI are all bundled in one docker container. The full-featured demo [5] is fantastic. That the app can enrich the Netflow data with BMP is an added bonus.
The best part is, the GUI has the report viz I need, and it is actually the default visualization in the demo. It also has the graph types that I didn't know I needed, like the Sankey graph.
FlowViewer looks interesting as well. I suspect getting the reports right may take some time, given the number of GUI filtering options.
pmacct and Argus seem to be capable tools that have been around for a long time, but I haven't seen a concise stack building guide to get Netflow data into a good GUI using these. Looks like there are some older Docker images available for both. I could write my own SQL or roll my own stack, but I'd much rather spend my time on other things.
I appreciate the conversation around sFlow. I actually wasn't aware that XR supported it. AS path probably doesn't add a whole lot of value given that I'm focused on flows across our IP transit circuits. I'm able to determine my next AS hop simply by looking at the flow's associated tuple of (flow exporter, interface). I can use other tools like RouteViews or RIPE's RIS to determine the destination AS's upstreams if needed. The rest of the path is probably not too helpful for determining peering opportunities.
I think I'm going to get Akvorado running in my environment. If that doesn't pan out, I'll likely go back to AS-Stats.
Can those running Akvorado comment on their system specs? The only spec I've seen is a mention in this blog post [6]: "Akvorado is performant enough to handle 100 000 flows per second with 64 GB of RAM and 24 vCPU. With 2 TB of disk, you should expect to keep data for a few years."
Thanks again all,
-Brian
Links:
------
[1] https://github.com/manuelkasper/AS-Stats
[2] https://www.influxdata.com/what-are-netflow-and-sflow/
[3] https://www.elastiflow.com/
[4] https://github.com/robcowart/elastiflow?tab=readme-ov-file
[5] https://demo.akvorado.net/
[6] https://vincent.bernat.ch/en/blog/2022-akvorado-flow-collector
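Brian's point about identifying the next AS hop from the (flow exporter, interface) tuple can be sketched as a lookup table plus an aggregation. Everything concrete here is hypothetical: the exporter addresses, interface indexes, link names and flow tuples are invented, and a real collector would take them from configuration and decoded flow records.

```python
from collections import defaultdict

# Hypothetical mapping of (exporter address, interface index) to an external link.
LINKS = {
    ("192.0.2.1", 10): "transit-a",
    ("192.0.2.1", 11): "ix-1",
    ("192.0.2.2", 7): "transit-b",
}


def bytes_by_link_and_as(flows):
    """Sum bytes per (link, remote AS); flows on unknown interfaces are skipped.

    Each flow is a tuple (exporter, ifindex, remote_as, byte_count).
    """
    totals = defaultdict(int)
    for exporter, ifindex, remote_as, nbytes in flows:
        link = LINKS.get((exporter, ifindex))
        if link is None:
            continue  # internal or unmapped interface
        totals[(link, remote_as)] += nbytes
    return dict(totals)


def peering_candidates(totals, link_prefix="transit"):
    """Rank remote ASes by bytes exchanged over links matching link_prefix."""
    per_as = defaultdict(int)
    for (link, asn), nbytes in totals.items():
        if link.startswith(link_prefix):
            per_as[asn] += nbytes
    return sorted(per_as.items(), key=lambda item: item[1], reverse=True)
```

ASes that rank high on transit links but carry little or nothing over the IX links would be the ones to look at first.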
We are in the process of adding Netflow collection to LibreQoS. Are there any potential testers out there using any of the backends described in Brian's summary?
participants (16)
- Andrew Hoyos
- Brian Knight
- Dave Taht
- Javier Gutierrez
- Joe Loiacono
- John Stitt
- Marinos Dimolianis
- Nick Hilliard
- Nick Plunkett
- Pascal Masha
- Peter Phaal
- Saku Ytti
- Steven Bakker
- Tom Beecher
- Tore Anderson
- Vincent Bernat