Real world sflow vs netflow?
Can anyone on or off list give me some real world thoughts on sflow vs netflow for border routers? (multi-homed, BGP, straight v4 & v6 only for web hosting, no mpls, vpns, vlans, etc.) Finding it hard to decipher the vendor version of the answer to that question. We use netflow v9 currently but are considering hardware that would be sflow. We don't use it for billing purposes, mostly for spotting malicious remote hosts doing things like scans, spotting traffic such as weird ports in use in either direction that warrant further investigation, watching for ddos/dos destinations to act on mitigation, or investigating the nature of unusual levels of traffic on switch ports that set off alarms. I'm concerned things like port scans, etc. won't be picked up by the NMS if fed by sflow due to the sampling nature, or similar concern if 500 ssh connections by the same remote host are sampled as 1 connection, etc. Of course these concerns were put in my head by someone interested in me continuing to use equipment that happens to output netflow data, hence me wanting some real people answers. :-) Thanks!
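To put rough numbers on the sampling concern above, here is a minimal sketch. It assumes an idealized model in which every packet is sampled independently with probability 1/N; real agents typically use a randomized skip count, so treat the results as approximate. Under that assumption, a host that sends n packets appears in at least one sample with probability 1 - (1 - 1/N)^n. The function names and the example packet counts are illustrative only.

```python
def p_seen(packets: int, sampling_rate: int) -> float:
    """Probability that at least one of `packets` packets is sampled at 1-in-`sampling_rate`.

    Assumes independent per-packet sampling with probability 1/sampling_rate.
    """
    return 1.0 - (1.0 - 1.0 / sampling_rate) ** packets


def expected_samples(packets: int, sampling_rate: int) -> float:
    """Expected number of samples drawn from `packets` packets."""
    return packets / sampling_rate


if __name__ == "__main__":
    # Hypothetical workloads: 500 SSH connection attempts (counting one SYN each)
    # and a scan that sends one probe to every TCP port on a host.
    for label, packets in [("500 SSH SYNs", 500), ("65535-port scan", 65535)]:
        for rate in (100, 1000):
            print(
                f"{label:>16} at 1:{rate}: "
                f"P(seen)={p_seen(packets, rate):.3f}, "
                f"expected samples={expected_samples(packets, rate):.1f}"
            )
```

The takeaway under this model is that the sampling rate relative to the attacker's packet count decides visibility: a noisy scan still produces many samples, while a short burst of connections may produce none at a coarse rate.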
On 2012-07-13 19:30, David Hubbard wrote: [..]
We don't use it for billing purposes, mostly for spotting malicious remote hosts doing things like scans, spotting traffic such as weird ports in use in either direction that warrant further investigation, [..]
The primary difference between NetFlow/IPFIX and sFlow is that NetFlow is unsampled while sFlow is sampled. As such, for these kinds of cases it might be more worthwhile to have NetFlow than sFlow, as you get all the source/dest ports. On the other hand, sFlow can give you packet headers, and that might be useful if you get, say, the first 200 bytes of every flow. Though depending on the hardware, traffic volume and traffic mix, you might have to sample anyway. Oh, and there is a small difference in the packet formats and the idea behind why something exists, but that won't hurt you too much. Greets, Jeroen
Hi David, I'm not sure that sflow is going to get you the granularity that you are looking for. It's usually better to start more granular and then aggregate into larger flows when you graph or reference for historic values. Have you looked at other options, such as argus [1], to collect flow data outside of the networking gear? This way the networking gear can do its primary job and flow collection can happen elsewhere. There's a whole argus community that discusses the information security topics you're interested in, and Carter, the guy who wrote all (?) of the code, is very responsive. Argus can also take in NetFlow flows from your routers. There are obviously other tools available that may work as well or better, but argus is one I've been using with great success in a fairly heavily trafficked environment. Cheers, Harry [1] http://www.qosient.com/argus/ On 07/13/2012 01:30 PM, David Hubbard wrote:
Can anyone on or off list give me some real world thoughts on sflow vs netflow for border routers? (multi-homed, BGP, straight v4 & v6 only for web hosting, no mpls, vpns, vlans, etc.)
Finding it hard to decipher the vendor version of the answer to that question. We use netflow v9 currently but are considering hardware that would be sflow. We don't use it for billing purposes, mostly for spotting malicious remote hosts doing things like scans, spotting traffic such as weird ports in use in either direction that warrant further investigation, watching for ddos/dos destinations to act on mitigation, or investigating the nature of unusual levels of traffic on switch ports that set off alarms. I'm concerned things like port scans, etc. won't be picked up by the NMS if fed by sflow due to the sampling nature, or similar concern if 500 ssh connections by the same remote host are sampled as 1 connection, etc. Of course these concerns were put in my head by someone interested in me continuing to use equipment that happens to output netflow data, hence me wanting some real people answers. :-)
Thanks!
Hi David,

The main architectural difference between sFlow and Netflow is the location of the flow cache:

1. NetFlow: Packets are decoded on the router, flow keys are extracted and used to lookup/create an entry in a flow cache which is then updated based on values in the packet. Records are exported from the flow cache in the form of Netflow datagrams when the flow completes or based on a timeout.
2. sFlow: Packets are randomly sampled in hardware and the packet headers are immediately exported as sFlow datagrams - there is no flow cache on the switch/router. In addition to exporting the packet header, the sFlow agent captures the FIB state associated with forwarding the sampled packet, exporting information such as next hop router, AS-path, communities etc. An sFlow agent also periodically sends all the MIB-II interface counters, eliminating the need for SNMP polling - this isn't very important if you are only monitoring a few links, but makes a big difference if you are monitoring large chassis switches or tens or hundreds of thousands of ports in a data center or campus environment.

Moving the flow cache off the router has a number of benefits:

1. You are no longer limited by the hardware/firmware capabilities of the router - your analysis software decides which fields to decode and how to accumulate results. For example, if you are managing a mixed IPv4/IPv6 environment you can decide to use sFlow to look into v6 over v4 and v4 over v6 tunnels (to do the same thing with Netflow would likely require a hardware upgrade). You can even feed sFlow into Wireshark for detailed analysis of protocols and packet headers.
2. Operational complexity is greatly reduced since the configuration options and resource management issues associated with the flow cache are eliminated.
3. Low latency. Measurements aren't delayed by the flow cache - you can detect DDoS attacks/large flows within seconds.
4. Scalability - you can turn on sFlow on every link (even 100G links), on every device for a comprehensive view of traffic.
5. Multi-vendor interoperability. The sFlow measurements are interoperable across vendors (since very little processing is performed on the devices). With NetFlow, different vendors and devices have different hardware limitations affecting the fields that they can export.

Unsampled Netflow is only practical for moderate traffic levels. If you carry significant traffic you would want to enable sampling anyway, even with Netflow. However, there are a wide range of Netflow sampling implementations, many of which yield questionable results. In contrast, the sFlow standard specifies how sampling must be performed and ensures that information is included that allows the sampled data to be correctly scaled and produce unbiased measurements.

Cheers,
Peter

On Fri, Jul 13, 2012 at 10:30 AM, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
Can anyone on or off list give me some real world thoughts on sflow vs netflow for border routers? (multi-homed, BGP, straight v4 & v6 only for web hosting, no mpls, vpns, vlans, etc.)
Finding it hard to decipher the vendor version of the answer to that question. We use netflow v9 currently but are considering hardware that would be sflow. We don't use it for billing purposes, mostly for spotting malicious remote hosts doing things like scans, spotting traffic such as weird ports in use in either direction that warrant further investigation, watching for ddos/dos destinations to act on mitigation, or investigating the nature of unusual levels of traffic on switch ports that set off alarms. I'm concerned things like port scans, etc. won't be picked up by the NMS if fed by sflow due to the sampling nature, or similar concern if 500 ssh connections by the same remote host are sampled as 1 connection, etc. Of course these concerns were put in my head by someone interested in me continuing to use equipment that happens to output netflow data, hence me wanting some real people answers. :-)
Thanks!
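The two architectures described in the message above can be sketched side by side. This is only an illustration under stated assumptions: the class and function names (`FlowCache`, `sflow_style_export`), the 5-tuple key and the single inactive timeout are hypothetical simplifications, not any vendor's implementation. A real sFlow agent also randomizes which packet it picks; a fixed stride is used here only to keep the sketch deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str, int, int, int]  # src, dst, protocol, src port, dst port


@dataclass
class Packet:
    ts: float
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    length: int


@dataclass
class FlowRecord:
    first: float
    last: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """NetFlow-style: per-flow state lives on the router; records leave on timeout."""

    def __init__(self, inactive_timeout: float = 15.0) -> None:
        self.inactive_timeout = inactive_timeout
        self.entries: Dict[FlowKey, FlowRecord] = {}

    def update(self, pkt: Packet) -> None:
        key = (pkt.src, pkt.dst, pkt.proto, pkt.sport, pkt.dport)
        rec = self.entries.get(key)
        if rec is None:
            rec = self.entries[key] = FlowRecord(first=pkt.ts, last=pkt.ts)
        rec.last = pkt.ts
        rec.packets += 1
        rec.octets += pkt.length

    def expire(self, now: float) -> List[Tuple[FlowKey, FlowRecord]]:
        """Remove and return flows idle for at least the inactive timeout."""
        done = [(k, r) for k, r in self.entries.items()
                if now - r.last >= self.inactive_timeout]
        for key, _ in done:
            del self.entries[key]
        return done


def sflow_style_export(packets: List[Packet], sampling_rate: int) -> List[Tuple[Packet, int]]:
    """sFlow-style: no per-flow state; each selected packet is exported immediately,
    tagged with the sampling rate so the collector can scale it."""
    return [(pkt, sampling_rate)
            for i, pkt in enumerate(packets, start=1)
            if i % sampling_rate == 0]
```

In the first case nothing reaches the collector until `expire` fires; in the second, every selected packet leaves right away and any flow accounting happens on the collector.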
Peter Phaal <peter.phaal@gmail.com> wrote on 07/13/2012 04:20:45 PM:
2. sFlow: Packets are randomly sampled in hardware and the packet headers are immediately exported as sFlow datagrams - there is no flow cache on the switch/router. In addition to exporting the packet header, the sFlow agent captures the FIB state associated with forwarding the sampled packet, exporting information such as next hop router, AS-path, communities etc
What about byte counts? Just those in the sampled packet (i.e., no running totals per flow)?
In contrast, the sFlow standard specifies how sampling must be performed and ensures that information is included that allows the sampled data to be correctly scaled and produce unbiased measurements.
Does sflow software typically recreate the total byte count per flow (e.g., TCP session) by scaling? Thanks, Joe
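As an illustration of the scaling being asked about: each sFlow flow sample carries the original frame length and the sampling rate in effect, so a collector can estimate totals by multiplying. The sketch below assumes a simplified sample tuple and a hypothetical `estimate_totals` helper; it is not taken from any particular collector.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, int, int]  # (flow key, frame length in bytes, sampling rate N)


def estimate_totals(samples: Iterable[Sample]) -> Dict[str, Dict[str, int]]:
    """Scale sampled packets up to estimated per-key packet and byte totals.

    Each sample taken at 1-in-N stands in for roughly N packets of its size.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"samples": 0, "est_packets": 0, "est_bytes": 0}
    )
    for key, frame_len, rate in samples:
        entry = totals[key]
        entry["samples"] += 1
        entry["est_packets"] += rate
        entry["est_bytes"] += frame_len * rate
    return dict(totals)
```

Note that the estimate for a key backed by only one or two samples is very noisy, so a scaled byte count for a single short TCP session is far less trustworthy than the scaled total for a busy host or port.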
On 7/13/12 10:20 PM, Peter Phaal wrote:
1. NetFlow: Packets are decoded on the router, flow keys are extracted and used to lookup/create an entry in a flow cache which is then updated based on values in the packet. Records are exported from the flow cache in the form of Netflow datagrams when the flow completes or based on a timeout.
This is because NetFlow is based on flows, whereas the sFlow name is misleading - it's actually a PACKET monitoring technology, not a FLOW monitoring one. So the difference in the way both mechanisms work is in line with their definitions.
2. sFlow: Packets are randomly sampled in hardware and the packet headers are immediately exported as sFlow datagrams - there is no flow cache on the switch/router.
And that's the biggest problem with sFlow. Packets are sampled, not flows. You may miss the big or important flow; you don't have visibility into every conversation going through the device. sFlow and randomized sampling rely heavily on statistics, and as soon as you agree to that, you'll lose accuracy right away.
Moving the flow cache off the router has a number of benefits: 1. You are no longer limited by the hardware/firmware capabilities of the router - your analysis software decides which fields to decode and how to accumulate results. For example, if you are managing a mixed IPv4/IPv6 environment you can decide to use sFlow to look into v6 over v4 and v4 over v6 tunnels (to do the same thing with Netflow would likely require a hardware upgrade). You can even feed sFlow into Wireshark for detailed analysis of protocols and packet headers.
NetFlow supports IPv6. As well as L2 traffic (v9), MPLS, multicast and so on.
2. Operational complexity is greatly reduced since the configuration options and resource management issues associated with the flow cache are eliminated.
That will depend on the device and the options. It takes around 3-4 commands to configure the export and then one to activate it, without any templates, on an interface on a Cisco device. What's more important, you can have multiple monitors on one interface monitoring & exporting different sets of traffic to different groups within the company (Security, Network Monitoring, Traffic Engineering). sFlow gives just sampled packets.
3. Low latency. Measurements aren't delayed by the flow cache - you can detect DDoS attacks/large flows within seconds.
The same with NetFlow. Cache can be actively flushed.
4. Scalability - you can turn on sFlow on every link (even 100G links), on every device for a comprehensive view of traffic.
Same with NetFlow & sampling turned on.
However, there are a wide range of Netflow sampling implementations, many of which yield questionable results. In contrast, the sFlow standard specifies how sampling must be performed and ensures that information is included that allows the sampled data to be correctly scaled and produce unbiased measurements.
The measurements provided by sFlow are only an approximation of the real traffic, and while that may be acceptable on LAN links where details don't matter as much, it's hardly good enough to present a real view on the WAN links. sFlow was built to work on switches and provide "some" accuracy; it's not good enough (unless you do sampling on a 1:5-1:10 basis) to do billing or some detailed analysis of traffic: http://www.inmon.com/pdf/sFlowBilling.pdf You can use it to *estimate* the traffic, detect DDoS, sure. But the data & scaling used by sFlow (and additionally tricks used by ASIC vendors implementing it in the hardware) can't change the fundamental difference - sFlow is really sPacket, as it doesn't deal with flows. NetFlow, jFlow, IPFIX deal with flows. You can discuss sampling accuracy and things like that, but working with flows is more accurate.
--
"There's no sense in being precise when you don't know what you're talking about." John von Neumann
Łukasz Bromirski | jid:lbromirski@jabber.org | http://lukasz.bromirski.net
On Sat, 14 Jul 2012, Łukasz Bromirski wrote:
NetFlow, jFlow, IPFIX deal with flows. You can discuss sampling accuracy and things like that, but working with flows is more accurate.
If you do 1:1000 sampling with both Netflow and sFlow, why would one of them be more accurate than the other? If you analyze the flow on the device or on the collector (as might be done with sFlow), I don't see why one would be better than the other. -- Mikael Abrahamsson email: swmike@swm.pp.se
On 7/14/12 11:15 AM, Mikael Abrahamsson wrote:
On Sat, 14 Jul 2012, Łukasz Bromirski wrote:
NetFlow, jFlow, IPFIX deal with flows. You can discuss sampling accuracy and things like that, but working with flows is more accurate.
If you do 1:1000 sampling with both Netflow and sFlow, why would one of them be more accurate than the other? If you analyze the flow on the device or on the collector (as might be done with sFlow), I don't see why one would be better than the other.
Sure, but with sampling you'll lose accuracy anyway. The difference is subtle, and depends on the (Net|j)Flow implementation - on some devices for sampled NetFlow you'll still get sampled FLOWS (1:x) not sampled PACKETS (thus disregarding the flow advantage).
--
"There's no sense in being precise when you don't know what you're talking about." John von Neumann
Łukasz Bromirski | jid:lbromirski@jabber.org | http://lukasz.bromirski.net
On Sat, Jul 14, 2012 at 10:30:25AM +0200, Łukasz Bromirski wrote:
NetFlow supports [ .. ] As well as L2 traffic (v9) [ .. ]
Let's be real and speak implementations: where is L2 information in NetFlow for routed traffic on the bigger platforms typically used for peering at internet exchanges - ASR9K, C7600 (i.e. hopefully without having to invest more money in such a platform to upgrade to Sup2T), MX, CRS? Cheers, Paolo PS: Let's not return to the point of availability of MAC accounting, since that is not the solution.
On 14/07/2012 09:30, Łukasz Bromirski wrote:
And that's the biggest problem with sFlow. Packets are sampled, not flows. You may miss the big or important flow, you don't have visibility into every conversation going through the device.
Unless you enable sampling, which is pretty much necessary for non-trivial traffic volumes.
NetFlow supports IPv6. As well as L2 traffic (v9), MPLS, multicast and so on.
It does, depending on hardware variety, but you need specific platform support for each packet variety (v4 / v6 / mpls / etc), and platform support for this can be very dodgy. You don't need this with sflow - it just punts 1 in N raw packets out to your collector, and the statistical assumptions which were made by the networking device are well documented. I've never seen documentation on the sampling technique used for each netflow implementation.
The measurements provided by sFlow are only approximation of the real traffic and while it may be acceptable on LAN links where details don't matter as much, it's hardly good enough to present a real view on the WAN links.
sFlow was built to work on switches and provide "some" accuracy, it's not good enough (unless you do sampling on a 1:5-1:10 basis) to do billing or some detailed analysis of traffic:
Depends on how detailed your requirements are. For billing, most people don't classify by packet analysis, but rather by byte count which can be handled by snmp port counters. If you need to do something fancier, non-sampled netflow is indeed good enough for billing.
http://www.inmon.com/pdf/sFlowBilling.pdf
You can use it to *estimate* the traffic, detect DDoS, sure. But the data & scaling used by sFlow (and additionally tricks used by ASIC vendors implementing it in the hardware) can't change the fundamental difference - sFlow is really sPacket, as it doesn't deal with flows.
agreed, the name is wrong.
NetFlow, jFlow, IPFIX deal with flows. You can discuss sampling accuracy and things like that, but working with flows is more accurate.
Depends on your use case. For large traffic values, you run into the law of large numbers and you can get accurate visibility into what's happening on your network. Certainly, netflow _can_ offer amazingly precise visibility into your network. But the trade-off is that you need specialised hardware to do this on your line cards or your forwarding engine. This drives up both the capex (extra hardware) and the opex (tcam is power hungry) of your network. sflow is much cheaper to implement as you're not maintaining any state on your chassis. You're just picking out a packet every so often. The current generation of high end service provider hardware (juniper mx-3d, cisco sup2t / n7k / asr9k) is pretty much the first generation of hardware which doesn't have crippling netflow limitations, such as poor support for v6 / mpls, too small cache sizes, etc. This fact alone should provide a good indication of how difficult it is to implement it well on fast boxes. sflow is simpler, cheaper and in many cases is simply a better choice if you don't need drill-down into every single flow going through your networking. Nick
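The law-of-large-numbers point above can be made concrete with the rule of thumb commonly quoted in the sFlow literature: with 95% confidence, the percentage error of an estimate for a traffic class is at most about 196 * sqrt(1/c), where c is the number of samples seen in that class. The sketch below simply evaluates that approximation; the function names are illustrative, and the formula is an approximation that assumes random packet sampling.

```python
import math


def max_pct_error(samples_in_class: int) -> float:
    """Approximate 95%-confidence error bound (in percent) for a class with c samples."""
    if samples_in_class <= 0:
        raise ValueError("need at least one sample")
    return 196.0 * math.sqrt(1.0 / samples_in_class)


def samples_needed(target_pct_error: float) -> int:
    """Smallest sample count whose approximate error bound is at or below the target."""
    if target_pct_error <= 0:
        raise ValueError("target error must be positive")
    return math.ceil((196.0 / target_pct_error) ** 2)


if __name__ == "__main__":
    for c in (10, 100, 1_000, 10_000, 100_000):
        print(f"{c:>7} samples -> error bound about {max_pct_error(c):.2f}%")
```

The bound depends on how many samples land in the class being measured, not directly on the sampling rate, which is why busy links and large aggregates come out accurate while a single small conversation does not.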
Dear All,

Around a year ago I had the same debate: sflow vs netflow vs snmp port counters. I read lots of stories, lots of myths and lots of good information.

My conclusion: in the end I did real-life testing comparing each platform.

We routed live traffic (about 250mbits) from our Cisco 7200 G2 routers through Brocade MLXe routers and exported netflow from the Cisco platform and sFlow from the Brocade platform.

Each router sent netflow/sflow traffic to two collectors on independent hardware (same specifications) running the same collection netflow analyzer software.

The end result, after hours or even days and weeks of testing, was that there was no significant difference between the traffic volumes netflow was showing vs sFlow, i.e. less than 0.5% variance between each environment.

That being said, both netflow and sflow under-read by about 3% when compared to snmp port counters, which we put down to broadcast traffic etc. which the routers didn't see / flow.

Regardless, if you're going to bill from netflow or sflow, in our test environment we saw no significant difference between either platform.

Hope that helps.

Kindest Regards,
James Braunegg
W: 1300 769 972 | M: 0488 997 207 | D: (03) 9751 7616
E: james.braunegg@micron21.com | ABN: 12 109 977 666

-----Original Message-----
From: Nick Hilliard [mailto:nick@foobar.org]
Sent: Monday, July 16, 2012 6:53 AM
To: nanog@nanog.org
Subject: Re: Real world sflow vs netflow?

On 14/07/2012 09:30, Łukasz Bromirski wrote:
And that's the biggest problem with sFlow. Packets are sampled, not flows. You may miss the big or important flow, you don't have visibility into every conversation going through the device.
Unless you enable sampling, which is pretty much necessary for non-trivial traffic volumes.
NetFlow supports IPv6. As well as L2 traffic (v9), MPLS, multicast and so on.
It does, depending on hardware variety, but you need specific platform support for each packet variety (v4 / v6 / mpls / etc), and platform support for this can be very dodgy. You don't need this with sflow - it just punts 1 in N raw packets out to your collector, and the statistical assumptions which were made by the networking device are well documented. I've never seen documentation on the sampling technique used for each netflow implementation.
The measurements provided by sFlow are only approximation of the real traffic and while it may be acceptable on LAN links where details don't matter as much, it's hardly good enough to present a real view on the WAN links.
sFlow was built to work on switches and provide "some" accuracy, it's not good enough (unless you do sampling on a 1:5-1:10 basis) to do billing or some detailed analysis of traffic:
Depends on how detailed your requirements are. For billing, most people don't classify by packet analysis, but rather by byte count which can be handled by snmp port counters. If you need to do something fancier, non-sampled netflow is indeed good enough for billing.
http://www.inmon.com/pdf/sFlowBilling.pdf
You can use it to *estimate* the traffic, detect DDoS, sure. But the data & scaling used by sFlow (and additionally tricks used by ASIC vendors implementing it in the hardware) can't change the fundamental difference - sFlow is really sPacket, as it doesn't deal with flows.
agreed, the name is wrong.
NetFlow, jFlow, IPFIX deal with flows. You can discuss sampling accuracy and things like that, but working with flows is more accurate.
Depends on your use case. For large traffic values, you run into the law of large numbers and you can get accurate visibility into what's happening on your network. Certainly, netflow _can_ offer amazingly precise visibility into your network. But the trade-off is that you need specialised hardware to do this on your line cards or your forwarding engine. This drives up both the capex (extra hardware) and the opex (tcam is power hungry) of your network. sflow is much cheaper to implement as you're not maintaining any state on your chassis. You're just picking out a packet every so often. The current generation of high end service provider hardware (juniper mx-3d, cisco sup2t / n7k / asr9k) is pretty much the first generation of hardware which doesn't have crippling netflow limitations, such as poor support for v6 / mpls, too small cache sizes, etc. This fact alone should provide a good indication of how difficult it is to implement it well on fast boxes. sflow is simpler, cheaper and in many cases is simply a better choice if you don't need drill-down into every single flow going through your networking. Nick
From: James Braunegg [mailto:james.braunegg@micron21.com]
Dear All
Around a year ago I had the same debate sflow vs netflow vs snmp port counters. read lots of stories lots of myths lots of good information. My Conclusion
In the end I did real life testing comparing each platform
We routed live traffic (about 250mbits) from our Cisco 7200 G2 routers through Brocade MLXe routers and exported netflow from the Cisco platform and sFlow from the Brocade platform.
Each router sent netflow/sflow traffic to two collectors on independent hardware (same specifications) running the same collection netflow analyzer software.
The end result was after hours of testing, or even days and weeks of testing there was no significant difference between traffic volumes netflow was showing vs sFlow. Ie less than 0.5% variance between each environment.
That being said both netflow and sflow both under read by about 3% when compared to snmp port counters, which we put to the conclusion was broadcast traffic etc which the routers didn't see / flow.
Regardless if you're going to bill from netflow or sflow in our test environment we saw no significant difference between either platform.
What are your thoughts on the non-billing aspects after your comparison testing; if you are/were using it for those purposes? We don't use our current netflow for billing, just for security investigation and (ideally) early alerting of abnormal activity like port scans, compromised apps on servers, etc. Thanks, David
Dear David
From a visibility point of view, we obtain as much information as we require to know exactly what's occurring on our network where and when in real-time.
We know what's happening on any interface, on any network, at any time. That being said, for us the most important visibility is all about the flow of traffic and packet counts; the security side should be done at the firewall level!

If anyone wants a demo of our sFlow setup, I'm happy to show you via a TeamViewer session or something.

By the way, we are using sFlow now.

Kindest Regards,
James Braunegg

-----Original Message-----
From: David Hubbard [mailto:dhubbard@dino.hostasaurus.com]
Sent: Tuesday, July 17, 2012 8:26 AM
To: nanog@nanog.org
Subject: RE: Real world sflow vs netflow?

From: James Braunegg [mailto:james.braunegg@micron21.com]
Dear All
Around a year ago I had the same debate sflow vs netflow vs snmp port counters. read lots of stories lots of myths lots of good information. My Conclusion
In the end I did real life testing comparing each platform
We routed live traffic (about 250mbits) from our Cisco 7200 G2 routers through Brocade MLXe routers and exported netflow from the Cisco platform and sFlow from the Brocade platform.
Each router sent netflow/sflow traffic to two collectors on independent hardware (same specifications) running the same collection netflow analyzer software.
The end result was after hours of testing, or even days and weeks of testing there was no significant difference between traffic volumes netflow was showing vs sFlow. Ie less than 0.5% variance between each environment.
That being said both netflow and sflow both under read by about 3% when compared to snmp port counters, which we put to the conclusion was broadcast traffic etc which the routers didn't see / flow.
Regardless if you're going to bill from netflow or sflow in our test environment we saw no significant difference between either platform.
What are your thoughts on the non-billing aspects after your comparison testing; if you are/were using it for those purposes? We don't use our current netflow for billing, just for security investigation and (ideally) early alerting of abnormal activity like port scans, compromised apps on servers, etc. Thanks, David
James Braunegg writes:
In the end I did real life testing comparing each platform
Great, thanks for sharing your results! (It would be nice if you could tell us a little bit about the configuration, i.e. what kind of sampling you used.) [...]
That being said both netflow and sflow both under read by about 3% when compared to snmp port counters, which we put to the conclusion was broadcast traffic etc which the routers didn't see / flow.
That's one reason, but another reason would be that at least in Netflow (but sFlow may be similar depending on how you use it), the reported byte counts only include the sizes of the "L3" packets, i.e. starting at the IP header, while the SNMP interface counters (ifInOctets etc.) include L2 overhead such as Ethernet frame headers and such. -- Simon.
On 17/07/2012 16:32, Simon Leinen wrote:
That's one reason, but another reason would be that at least in Netflow (but sFlow may be similar depending on how you use it), the reported byte counts only include the sizes of the "L3" packets, i.e. starting at the IP header, while the SNMP interface counters (ifInOctets etc.) include L2 overhead such as Ethernet frame headers and such.
sflow includes both figures. Nick
In the case of sFlow, the collector determines how to report bytes. The sFlow agent reports the size of the sampled layer 2 frame (along with the first 128 bytes of the frame) and the collector can choose whether to report L2 bytes, L3 bytes, L4 bytes etc. by subtracting the sizes of the headers. It seems likely that the sFlow collector used in the tests was reporting L3 bytes since the numbers were in agreement with the numbers reported by NetFlow. Peter On Tue, Jul 17, 2012 at 8:32 AM, Simon Leinen <simon.leinen@switch.ch> wrote:
James Braunegg writes:
That being said both netflow and sflow both under read by about 3% when compared to snmp port counters, which we put to the conclusion was broadcast traffic etc which the routers didn't see / flow.
That's one reason, but another reason would be that at least in Netflow (but sFlow may be similar depending on how you use it), the reported byte counts only include the sizes of the "L3" packets, i.e. starting at the IP header, while the SNMP interface counters (ifInOctets etc.) include L2 overhead such as Ethernet frame headers and such. -- Simon.
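A back-of-envelope check of the L2 overhead explanation, as a minimal sketch. It assumes plain Ethernet framing (14-byte header plus 4-byte FCS, optionally a 4-byte 802.1Q tag) and assumes the reported frame length includes the FCS; the 600-byte average frame size is a hypothetical figure chosen for illustration, not a number from the test described in this thread.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4     # optional 802.1Q tag
ETH_FCS = 4      # frame check sequence


def l3_bytes(frame_length: int, vlan_tagged: bool = False, includes_fcs: bool = True) -> int:
    """Derive the IP-layer byte count from a sampled L2 frame length."""
    overhead = ETH_HEADER
    if vlan_tagged:
        overhead += VLAN_TAG
    if includes_fcs:
        overhead += ETH_FCS
    return max(frame_length - overhead, 0)


def l2_overhead_pct(avg_frame_length: int, vlan_tagged: bool = False) -> float:
    """Share of interface-counter bytes that L3-only reporting would not count."""
    l3 = l3_bytes(avg_frame_length, vlan_tagged=vlan_tagged)
    return 100.0 * (avg_frame_length - l3) / avg_frame_length


if __name__ == "__main__":
    for size in (200, 600, 1000, 1518):
        print(f"avg frame {size:>4} bytes -> L2 overhead {l2_overhead_pct(size):.1f}%")
```

With an average frame of around 600 bytes the overhead works out to 3%, the same order as the gap reported against the SNMP counters, so framing overhead alone could plausibly account for much of it.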
On Sat, Jul 14, 2012 at 1:30 AM, Łukasz Bromirski <lukasz@bromirski.net> wrote:
sFlow is really sPacket, as it doesn't deal with flows.
NetFlow, jFlow, IPFIX deal with flows.
I am puzzled by the orthodoxy that seems to prevail around the value of "flows" as a measure of network traffic in packet switched networks. The following article contains some thoughts on flow oriented and packet oriented measurements. Apologies to NANOG readers for the simplistic analogies used to describe packet switching; the article is also intended for server administrators and application developers who often don't really know what happens when they write some bytes to a TCP socket. http://blog.sflow.com/2012/09/packets-and-flows.html The article positions flows as a useful abstraction for characterizing host and application performance, but as a poor fit for understanding packet traffic and measuring the performance of packet switches and routers. This isn't really an issue of sFlow vs. NetFlow/IPFIX etc. Either protocol can be used to export both types of measurements; the question is what types of measurement should be exported. What do people think? Peter
On Thu, 20 Sep 2012, Peter Phaal wrote:
I am puzzled by the orthodoxy that seems to prevail around the value of "flows" as a measure of network traffic in packet switched networks.
What platforms actually do real unsampled netflow today, and do it well for multi-10gigabit worth of typical Internet traffic? Most of the platforms I know of do sampled netflow at 1:100-1:1000 or so, and then I don't really see the fundamental difference in doing the flow analysis on the router itself (classic netflow) or doing the same but at the sFlow collector. -- Mikael Abrahamsson email: swmike@swm.pp.se
On Thu, Sep 20, 2012 at 11:21 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
Most of the platforms I know of do sampled netflow at 1:100-1:1000 or so, and then I don't really see the fundamental difference in doing the flow analysis on the router itself (classic netflow) or doing the same but at the sFlow collector.
There is no difference in the flow records you would obtain in either case. However, moving the flow generation out of the router gives a lot of flexibility. You can now choose how you want to generate flows, rather than depend on the router vendor. You are also guaranteed multi-vendor interoperability since problems associated with differences in how each vendor generates flows are eliminated.

For a real world example of the need for flexibility in monitoring, consider the challenge posed by IPv6 migration and virtualization, as they greatly increase the amount of layer 2, 3 and 4 tunneled traffic. With external software based flow generation you can easily upgrade the software to report flows within the tunnels etc. http://blog.sflow.com/2012/05/tunnels.html

There are many other things you can do with packet oriented (sFlow) data besides flow generation and analysis that I think are worth being aware of:

1. Route analytics. Packet forwarding decisions are made on a packet by packet basis and sFlow accurately captures the forwarding decision made for each sampled packet (flows are not a good way to report forwarding decisions since you are forced to assume that all the packets in the flow took the same forwarding path, which may not be the case). With packet oriented measurements you can build a route cache and use it to understand traffic forwarding based on AS-path, next hop router etc.
2. Analysis of multi-path forwarding. Detailed visibility into per-packet forwarding lets you diagnose issues with unbalanced LAG groups, ECMP paths, TRILL paths etc.
3. Packet sizes. With packet oriented data you can easily calculate packet size distributions by protocol, DSCP class, egress port etc.
4. DDoS detection and mitigation. Analysis of the sampled packet stream can detect DDoS attacks within seconds and an automatic response can be constructed using packet forwarding and header information to find a signature for the attack, point of ingress etc. You can also use packet analyzers like Wireshark and tcpdump to look at the sFlow packet header records, http://blog.sflow.com/2011/11/wireshark.html
5. Packet counters. MIB-2 interface counters are included in the set of measurements that sFlow exports. Eliminating SNMP polling reduces CPU load on the router (I have seen very high router CPU loads associated with SNMP) and provides much faster updates on link utilizations, packet discard rates etc.

I think Nick Hilliard put it well:
Flows are good for measuring some things; raw packet sampling is good for measuring others.
Decide on what you're trying to measure, then pick the best tool for the job.
However, to choose intelligently requires an understanding of the fundamental differences between packet oriented and flow oriented measurements, particularly as to how those differences relate to the problem you are trying to solve. The two types of measurement are related, but not the same.
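To make the "flow generation at the collector" idea above concrete, here is a minimal sketch of how sampled packet headers could be turned into flow records off-box. The `Sample` layout, the field names and the simple multiply-by-N scaling are assumptions made for illustration; they are not taken from any particular sFlow collector.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    """One sampled packet header plus exported metadata (hypothetical layout)."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    in_if: int          # ingress ifIndex reported by the router
    out_if: int         # egress ifIndex reported by the router
    length: int         # original frame length in bytes
    sampling_rate: int  # N, for 1-in-N sampling


def build_flows(samples):
    """Aggregate samples into flow records, scaling each sample by N."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for s in samples:
        key = (s.src, s.dst, s.proto, s.sport, s.dport, s.in_if, s.out_if)
        flows[key]["packets"] += s.sampling_rate
        flows[key]["bytes"] += s.length * s.sampling_rate
    return dict(flows)


samples = [
    Sample("192.0.2.1", "198.51.100.7", 6, 51000, 80, 1, 2, 1500, 1000),
    Sample("192.0.2.1", "198.51.100.7", 6, 51000, 80, 1, 2, 500, 1000),
    Sample("203.0.113.9", "198.51.100.7", 17, 4000, 53, 3, 2, 100, 1000),
]
flows = build_flows(samples)
web_key = ("192.0.2.1", "198.51.100.7", 6, 51000, 80, 1, 2)
print(flows[web_key])
```

Because the flow key is chosen in the collector, changing it (for example, dropping the ports or adding a VLAN) is a software change on the server rather than a router feature request. The scaled counts are estimates, not exact totals.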
On Sep 22, 2012, at 12:40 AM, Peter Phaal wrote:
However, moving the flow generation out of the router gives a lot of flexibility.
Actually, moving it out of the router creates huge problems and destroys a lot of the value of the flow telemetry - it nullifies your ability to traceback where traffic is ingressing your network, which is key for both security as well as traffic engineering, peering analysis, etc. It is far, far better to get your flow telemetry from your various edge routers, if at all possible, rather than probes. Scales better, too - and is less expensive in terms of both capex and opex.
-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>
Luck is the residue of opportunity and design. -- John Milton
On Fri, Sep 21, 2012 at 10:02 PM, Dobbins, Roland <rdobbins@arbor.net> wrote:
On Sep 22, 2012, at 12:40 AM, Peter Phaal wrote:
However, moving the flow generation out of the router gives a lot of flexibility.
Actually, moving it out of the router creates huge problems and destroys a lot of the value of the flow telemetry - it nullifies your ability to traceback where traffic is ingressing your network, which is key for both security as well as traffic engineering, peering analysis, etc.
It is far, far better to get your flow telemetry from your various edge routers, if at all possible, rather than probes. Scales better, too - and is less expensive in terms of both capex and opex.
Roland, I probably wasn't as clear as I should have been in describing how sFlow works. Here are some comments and links to additional information that address each of your concerns:

1. There are no probes involved when using sFlow; the architecture looks very similar to NetFlow, with UDP records streaming from multiple routers to a software collector.
   http://blog.sflow.com/2009/05/choosing-sflow-analyzer.html

2. The sFlow records exported by the router include telemetry that allows you to trace traffic paths through the network (ingress port, egress port, FIB entry etc.).
   http://blog.sflow.com/2009/05/packet-paths.html

3. sFlow has a lower CAPEX: the flow cache resides in inexpensive memory on a commodity server instead of limited, expensive TCAM memory on the router. The sFlow instrumentation is included in ASICs and is a base feature of the device, unlike NetFlow, which often requires upgraded supervisor cards etc. sFlow is widely supported in merchant silicon, further reducing costs and increasing multi-vendor interoperability - Cisco supports sFlow in the merchant silicon based Nexus 3k series.
   http://blog.sflow.com/2010/09/superlinear.html
   http://blog.sflow.com/2011/12/merchant-silicon.html
   http://blog.sflow.com/2012/09/vendor-support.html
   http://blog.sflow.com/2012/08/cisco-adds-sflow-support.html

4. sFlow has lower OPEX: the architecture is simpler, has lower operational complexity and provides much greater scalability.
   http://blog.sflow.com/2010/11/complexity-kills.html
   http://blog.sflow.com/2010/09/superlinear.html

Peter
On Sep 23, 2012, at 1:51 AM, Peter Phaal wrote:
Here are some comments and links to additional information that address each of your concerns:
You have misinterpreted what I said. I was saying that flow telemetry of any variety must be exported from edge devices, which in most cases are routers (in some cases layer-3 switches), in response to your 'move it out of the router' comment. I disagree quite strongly with your comments regarding s/Flow vs. NetFlow, but am not interested in spamming the list with an extended discussion thereof. Let's just agree to disagree on that issue.
On Sat, Sep 22, 2012 at 4:41 PM, Dobbins, Roland <rdobbins@arbor.net> wrote:
You have misinterpreted what I said. I was saying that flow telemetry of any variety must be exported from edge devices, which in most cases are routers (in some cases layer-3 switches), in response to your 'move it out of the router' comment.
I am sorry I misunderstood your comment. I agree that it is important to gather telemetry directly from your edge devices. The comment "move it out of the router" referred to the location of the flow cache in the following scenario.
On Thu, Sep 20, 2012 at 11:21 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
Most of the platforms I know of do sampled netflow at 1:100-1:1000 or so, and then I don't really see the fundamental difference in doing the flow analysis on the router itself (classic netflow) or doing the same but at the sFlow collector.
In both cases the router is generating the telemetry. In the NetFlow case, packets are sampled on the router, the router builds flow records based on the contents of the sampled packets, and the flow records are exported. In the sFlow case, the raw sampled packet headers are exported to external software, which builds the flow records. In both cases the router is making the primary measurements and you end up with the same measurements.
On Fri, Sep 21, 2012 at 10:02 PM, Dobbins, Roland <rdobbins@arbor.net> wrote:
Actually, moving it out of the router creates huge problems and destroys a lot of the value of the flow telemetry - it nullifies your ability to traceback where traffic is ingressing your network, which is key for both security as well as traffic engineering, peering analysis, etc.
It is far, far better to get your flow telemetry from your various edge routers, if at all possible, rather than probes. Scales better, too - and is less expensive in terms of both capex and opex.
I agree completely, probes are expensive, difficult to manage and can't accurately tell you how the traffic passed through the router.
On Sep 23, 2012, at 12:43 AM, Peter Phaal wrote:
In both cases the router is generating the telemetry. In the NetFlow case, packets are sampled on the router, the router builds flow records based on the contents of the sampled packets, and the flow records are exported. In the sFlow case, the raw sampled packet headers are exported to external software, which builds the flow records. In both cases the router is making the primary measurements and you end up with the same measurements.
Actually, you don't... If the *flow* generation process is not performed on the router (or otherwise conveyed by some metadata outside of "raw [sampled] packet headers") then you lose visibility to ingress and egress ifIndex (interface) information -- information which is required if/when deploying controls on those systems to squelch various traffic flows. This is _part_ of the point Roland was trying to make.
-danny
On Sep 23, 2012, at 7:55 PM, Danny McPherson wrote:
If the *flow* generation process is not performed on the router (or otherwise conveyed by some metadata outside of "raw [sampled] packet headers") then you lose visibility to ingress and egress ifIndex (interface) information -- information which is required if/when deploying controls on those systems to squelch various traffic flows.
Thanks, Danny - I guess I should've spelled it out, thanks for clarifying, heh. It should also be noted that generating the flows directly from the data plane of the router/switch or doing it offboard (as long as sufficient ingress/egress ifindex metadata are collected and exported, as you note) is just an implementation detail - it isn't inherent to s/Flow, NetFlow, IPFIX, et al. So, claiming this as some kind of advantage for a particular flow telemetry format is a non sequitur. ;>
On Sun, Sep 23, 2012 at 8:16 AM, Dobbins, Roland <rdobbins@arbor.net> wrote:
On Sep 23, 2012, at 7:55 PM, Danny McPherson wrote:
If the *flow* generation process is not performed on the router (or otherwise conveyed by some metadata outside of "raw [sampled] packet headers") then you lose visibility to ingress and egress ifIndex (interface) information -- information which is required if/when deploying controls on those systems to squelch various traffic flows.
Thanks, Danny - I guess I should've spelled it out, thanks for clarifying, heh.
It should also be noted that generating the flows directly from the data plane of the router/switch or doing it offboard (as long as sufficient ingress/egress ifindex metadata are collected and exported, as you note) is just an implementation detail - it isn't inherent to s/Flow, NetFlow, IPFIX, et al. So, claiming this as some kind of advantage for a particular flow telemetry format is a non sequitur.
Exporting packet oriented measurements doesn't mean that you have to lose ingress/egress interface data. In the specific example being discussed (sFlow export), detailed forwarding information from the router forwarding plane is exported with each sampled packet header (full AS-path if you are using BGP). An external flow generator in this case can produce flow records that are identical to those that the device would produce, i.e. include ingress/egress ports.

The difference between packet oriented or flow oriented export is an "implementation detail" if your only requirement is to obtain layer IP flow records, but becomes significant if you want to create customized flow records or create packet oriented metrics. Applications for packet oriented metrics mentioned earlier in this thread included route analytics, analysis of ECMP/LAG/TRILL forwarding, packet size distribution vs. DSCP, DDoS mitigation.

The problem with having the router perform the flow analysis is that once data is aggregated, it can't be disaggregated. It's like the difference between receiving eggs or an omelette. If you like the omelette, great! But if you want a different omelette or would like to poach, boil, scramble or bake your eggs then getting the raw eggs is a lot more versatile.
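As an illustration of a packet oriented metric of the kind listed above, the following is a minimal sketch of a packet size distribution per DSCP class computed from sampled packets. The `(dscp, length)` tuple input and the 512-byte bucket width are assumptions chosen for the example. A flow record that carries only total packets and bytes would give the mean packet size, but not this histogram.

```python
from collections import Counter


def size_histogram(samples, bucket=512):
    """Count sampled packets per (DSCP, size bucket).

    `samples` is an iterable of (dscp, frame_length) pairs; the bucket is
    identified by its lower bound in bytes.
    """
    hist = Counter()
    for dscp, length in samples:
        hist[(dscp, (length // bucket) * bucket)] += 1
    return hist


# Hypothetical samples: (DSCP value, frame length in bytes)
samples = [(0, 64), (0, 1500), (46, 200), (46, 180), (0, 1400)]
hist = size_histogram(samples)
for (dscp, lower), count in sorted(hist.items()):
    print(f"dscp={dscp} size>={lower}: {count} samples")
```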
On Sep 23, 2012, at 11:23 PM, Peter Phaal wrote:
The difference between packet oriented or flow oriented export is an "implementation detail" if your only requirement is to obtain layer IP flow records, but becomes significant if you want to create customized flow records or create packet oriented metrics. Applications for packet oriented metrics mentioned earlier in this thread included route analytics, analysis of ECMP/LAG/TRILL forwarding, packet size distribution vs. DSCP, DDoS mitigation.
It might be a good idea to read up on Flexible NetFlow, IPFIX, and PSAMP over IPFIX, since everything you mention above can be done by collecting/analyzing those telemetry formats. In fact, it might be a good idea to read up on plain old classical NetFlow v5 and v9, too, as almost all of what's mentioned above is accomplished every day using them, as well, heh.
The problem with having the router perform the flow analysis is that once data is aggregated, it can't be disaggregated.
Nobody in this thread has advocated aggregated NetFlow. I certainly don't. At any rate, I knew this would happen if we started talking about the merits of s/Flow vs. NetFlow. For some reason, s/Flow advocates seem to feel compelled to come up with straw-man arguments and misstatements, and try to use them to 'prove' what they view as the inherent superiority of s/Flow - when any unbiased individual who's worked with both formats at length knows that this simply isn't true. In this particular instance, I guess it's natural to feel compelled to present one's own creations in a positive light. However, it just isn't cricket to make incorrect, incomplete, and/or misleading statements about perceived competitors to one's own creations, you know?
It's like the difference between receiving eggs or an omelette. If you like the omelette, great! But if you want a different omelette or would like to poach, boil, scramble or bake your eggs then getting the raw eggs is a lot more versatile.
At any rate, I've wasted enough of everyone's time/bandwidth as a result of this particular instance of flow telemetry format trolling; I won't be providing anything more in the way of sustenance. ;>
Peter Phaal <peter.phaal@gmail.com> wrote on 09/23/2012 12:23:57 PM:
Exporting packet oriented measurements doesn't mean that you have to lose ingress/egress interface data. In the specific example being discussed (sFlow export), detailed forwarding information from the router forwarding plane is exported with each sampled packet header (full AS-path if you are using BGP).
Wrt AS-path, I don't get how this happens. Since this is important to this community, could you explain? Thanks, Joe
On 2012-09-24 14:48 , Joe Loiacono wrote:
Peter Phaal <peter.phaal@gmail.com> wrote on 09/23/2012 12:23:57 PM:
Exporting packet oriented measurements doesn't mean that you have to lose ingress/egress interface data.
Note that you get these in NetFlow too. Depends on which version you pick or how you combine your template and of course if the hard and software allows it, but it is there.
In the specific example being discussed (sFlow export), detailed forwarding information from the router forwarding plane is exported with each sampled packet header (full AS-path if you are using BGP).
Wrt AS-path, I don't get how this happens. Since this is important to this community, could you explain?
As sFlow runs on the same box that knows the BGP tables, the sFlow packets get that information too. No magic there. This can also be done with NetFlow/IPFIX though, by combining a BGP feed with the NetFlow/IPFIX feed, as shown in: http://www.pmacct.net/building_traffic_matrices_n49.pdf There is of course a small chance in such a setup that the tables mismatch, so that the result is not the same as what the router itself would have produced. Then again, with sFlow you typically sample, and thus you have windows of loss anyway... Note that there are IPFIX/NetFlow enabled boxes which also include BGP details if one is worried about that, though if your path changes mid-flow you have a slight error there too. Greets, Jeroen
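The "combine a BGP feed with the flow feed" approach mentioned above boils down to a longest-prefix match of each flow's destination against a copy of the BGP table held by the collector. Below is a minimal sketch under stated assumptions: the table contents, AS numbers and the `lookup_as_path` helper are hypothetical, and a linear scan stands in for the trie a real collector would more likely use.

```python
import ipaddress

# Hypothetical snapshot of a BGP table held by the collector: prefix -> AS path
bgp_table = {
    ipaddress.ip_network("198.51.100.0/24"): [64500, 64501],
    ipaddress.ip_network("198.51.0.0/16"): [64500, 64502, 64503],
    ipaddress.ip_network("2001:db8::/32"): [64510],
}


def lookup_as_path(dst, table):
    """Return the AS path of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best_net, best_path = None, None
    for net, path in table.items():
        if net.version != addr.version or addr not in net:
            continue
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_path = net, path
    return best_path


for dst in ("198.51.100.7", "198.51.7.7", "2001:db8::1", "192.0.2.1"):
    print(dst, lookup_as_path(dst, bgp_table))
```

The mismatch risk described above shows up here as staleness: if the collector's table lags the router's, the looked-up path can differ from the one the router actually used.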
On Mon, Sep 24, 2012 at 5:48 AM, Joe Loiacono <jloiacon@csc.com> wrote:
Peter Phaal <peter.phaal@gmail.com> wrote on 09/23/2012 12:23:57 PM:
Exporting packet oriented measurements doesn't mean that you have to lose ingress/egress interface data. In the specific example being discussed (sFlow export), detailed forwarding information from the router forwarding plane is exported with each sampled packet header (full AS-path if you are using BGP).
Wrt AS-path, I don't get how this happens. Since this is important to this community, could you explain?
Sure. I think it's worth discussing in some detail since this is relevant to the NANOG community and it is important to understand how it works.

When a switch/router decides to sample a packet it records the ingress/egress interfaces and accumulates information about how it decided to forward the packet by examining its FIB tables. Each packet may take a different path, some may be switched at layer 2, others may be forwarded based on a local routing protocol like OSPF, and still others may be forwarded based on BGP.

The forwarding data associated with each packet is irregular (e.g. a switched packet won't have BGP information), and so sFlow doesn't try to flatten it into tables, but instead encodes the data using XDR (RFC 1832), expressing each element of the forwarding decision as a tag, length, value encoded structure that contains attributes relevant to each type of forwarding decision. The AS-Path itself is a fairly complicated, variable length structure and again, this is encoded as XDR.

These are all optional fields in sFlow, so you should check with your switch vendor to see which ones they support. If they don't currently export the FIB data you are looking for, you should ask them to upgrade their agent because as Jeroen pointed out, populating each structure is just an extra lookup performed by the management CPU on the router. FYI I have seen full AS-path data exported from a busy 100G router, so there should be no problem collecting these measurements in a production setting.

The following extract from the sFlow version 5 specification shows what forwarding information is exported:

/* Extended Flow Data

   Extended data types provide supplementary information about the
   sampled packet. All applicable extended flow records should be
   included with each flow sample. */

/* Extended Switch Data */
/* opaque = flow_data; enterprise = 0; format = 1001 */
/* Note: For untagged ingress ports, use the assigned vlan and priority
         of the port for the src_vlan and src_priority values.
         For untagged egress ports, use the values for dst_vlan and
         dst_priority that would have been placed in the 802.1Q tag
         had the egress port been a tagged member of the VLAN instead
         of an untagged member. */

struct extended_switch {
   unsigned int src_vlan;     /* The 802.1Q VLAN id of incoming frame */
   unsigned int src_priority; /* The 802.1p priority of incoming frame */
   unsigned int dst_vlan;     /* The 802.1Q VLAN id of outgoing frame */
   unsigned int dst_priority; /* The 802.1p priority of outgoing frame */
}

/* IP Route Next Hop
   ipForwardNextHop (RFC 2096) for IPv4 routes.
   ipv6RouteNextHop (RFC 2465) for IPv6 routes. */

typedef next_hop address;

/* Extended Router Data */
/* opaque = flow_data; enterprise = 0; format = 1002 */

struct extended_router {
   next_hop nexthop;          /* IP address of next hop router */
   unsigned int src_mask_len; /* Source address prefix mask
                                 (expressed as number of bits) */
   unsigned int dst_mask_len; /* Destination address prefix mask
                                 (expressed as number of bits) */
}

enum as_path_segment_type {
   AS_SET      = 1,           /* Unordered set of ASs */
   AS_SEQUENCE = 2            /* Ordered set of ASs */
}

union as_path_type (as_path_segment_type) {
   case AS_SET:
      unsigned int as_set<>;
   case AS_SEQUENCE:
      unsigned int as_sequence<>;
}

/* Extended Gateway Data */
/* opaque = flow_data; enterprise = 0; format = 1003 */

struct extended_gateway {
   next_hop nexthop;           /* Address of the border router that should
                                  be used for the destination network */
   unsigned int as;            /* Autonomous system number of router */
   unsigned int src_as;        /* Autonomous system number of source */
   unsigned int src_peer_as;   /* Autonomous system number of source peer */
   as_path_type dst_as_path<>; /* Autonomous system path to the destination */
   unsigned int communities<>; /* Communities associated with this route */
   unsigned int localpref;     /* LocalPref associated with this route */
}
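To show how the variable-length `extended_gateway` structure above might be read on the collector side, here is a minimal decoding sketch. It follows the field order in the extract and uses only big-endian 32-bit reads. One thing is assumed rather than taken from the extract: that the `address` type is encoded as a 32-bit type tag (1 = IPv4, 2 = IPv6) followed by 4 or 16 address bytes. The `XdrReader` helper and the sample record are hypothetical.

```python
import ipaddress
import struct

SEGMENT_TYPES = {1: "AS_SET", 2: "AS_SEQUENCE"}


class XdrReader:
    """Minimal cursor over big-endian XDR data."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def uint(self) -> int:
        (value,) = struct.unpack_from(">I", self.data, self.pos)
        self.pos += 4
        return value

    def raw(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("truncated record")
        self.pos += n
        return chunk

    def uint_array(self) -> list:
        count = self.uint()  # XDR variable-length array: count, then items
        return [self.uint() for _ in range(count)]


def read_address(r: XdrReader):
    # Assumption: address is a union tagged 1 = IPv4 (4 bytes), 2 = IPv6 (16 bytes).
    kind = r.uint()
    if kind not in (1, 2):
        raise ValueError(f"unsupported address type {kind}")
    return ipaddress.ip_address(r.raw(4 if kind == 1 else 16))


def decode_extended_gateway(data: bytes) -> dict:
    r = XdrReader(data)
    record = {
        "nexthop": read_address(r),
        "as": r.uint(),
        "src_as": r.uint(),
        "src_peer_as": r.uint(),
    }
    segments = []
    for _ in range(r.uint()):  # dst_as_path<> is an array of tagged segments
        seg_type = SEGMENT_TYPES[r.uint()]
        segments.append((seg_type, r.uint_array()))
    record["dst_as_path"] = segments
    record["communities"] = r.uint_array()
    record["localpref"] = r.uint()
    return record


# Hand-built example record (all values hypothetical)
payload = struct.pack(">I4s", 1, bytes([192, 0, 2, 1]))      # IPv4 next hop
payload += struct.pack(">III", 64500, 64496, 64497)           # as, src_as, src_peer_as
payload += struct.pack(">I", 1)                               # one path segment
payload += struct.pack(">II3I", 2, 3, 64501, 64502, 64503)    # AS_SEQUENCE of 3 ASNs
payload += struct.pack(">I2I", 2, 100, 200)                   # two communities
payload += struct.pack(">I", 100)                             # localpref
gateway = decode_extended_gateway(payload)
print(gateway)
```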
Peter Phaal <peter.phaal@gmail.com> wrote on 09/24/2012 10:39:26 AM:
When a switch/router decides to sample a packet it records the ingress/egress interfaces and accumulates information about how it decided to forward the packet by examining its FIB tables. Each packet may take a different path, some may be switched at layer 2, others may be forwarded based on a local routing protocol like OSPF, and still others may be forwarded based on BGP.
OK, Well I guess I was thinking sFlow was primarily a switch oriented technology versus on a layer-3 peering router.
On Mon, Sep 24, 2012 at 11:19 AM, Joe Loiacono <jloiacon@csc.com> wrote:
OK, Well I guess I was thinking sFlow was primarily a switch oriented technology versus on a layer-3 peering router.
The sFlow technology is a good fit for any device that performs a packet forwarding function (including routers) and the sFlow.org web site maintains a list of switches and routers that implement the technology, http://sflow.org/products/network.php However, you are correct that today sFlow is more broadly implemented in switching platforms than routing platforms, but I expect this will change as network speeds increase and platforms converge.
On Mon, Sep 24, 2012 at 11:52:28AM -0700, Peter Phaal wrote:
On Mon, Sep 24, 2012 at 11:19 AM, Joe Loiacono <jloiacon@csc.com> wrote:
OK, Well I guess I was thinking sFlow was primarily a switch oriented technology versus on a layer-3 peering router.
The sFlow technology is a good fit for any device that performs a packet forwarding function (including routers) and the sFlow.org web site maintains a list of switches and routers that implement the technology,
Minus a whole pile of babble from people who don't actually know what a router vs layer 3 switch is... The difference at this point is mostly that NetFlow has provisions to allow exporting all data about an ENTIRE flow, whereas sFlow is designed to only take statistical samples for overall traffic analysis. Tracking an entire flow is much harder, since it requires keeping state on the router, so if you only care about overall traffic analysis, sampling is just fine. Originally sFlow introduced features like raw packet export (including layer 2 headers) and extensible formatting, which NetFlow later copied with v9 and v10/IPFIX. At this point they're "mostly" on the same footing technically, though sFlow does have a "counter export" feature which is essentially a "push" version of polling SNMP IF-MIB counters. Only Cisco and Juniper are still trying to push NetFlow though; sFlow has been adopted by nearly every other vendor at this point. Even some Juniper products, like EX (which is really Marvell ASICs with a JUNOS wrapper), support sFlow only.
--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
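For the concern raised at the start of the thread about whether sampling hides scans or repeated connections, a rough estimate helps. The sketch below assumes independent random 1-in-N packet sampling, which is an idealization of what hardware actually does; the function names are made up for the example.

```python
def p_at_least_one_sample(packets: int, n: int) -> float:
    """Probability that at least one of `packets` is sampled at 1-in-n."""
    return 1.0 - (1.0 - 1.0 / n) ** packets


def expected_samples(packets: int, n: int) -> float:
    """Expected number of sampled packets at 1-in-n."""
    return packets / n


# 500 small probes versus a full 65535-port sweep, both at 1:1000
for packets in (500, 65535):
    p = p_at_least_one_sample(packets, 1000)
    print(f"{packets} packets: P(seen) = {p:.3f}, "
          f"expected samples = {expected_samples(packets, 1000):.1f}")
```

Under these assumptions, a source sending only 500 packets is seen at least once with a probability of roughly 0.39 at 1:1000, while a sweep of tens of thousands of packets is almost certain to produce many samples. Low-volume events are the ones sampling can miss, whichever export format carries the samples.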
http://www.plixer.com/blog/netflow/netflow-vs-sflow-for-network-monitoring-a... Regards, Benoit.
participants (15)
- Benoit Claise
- Danny McPherson
- David Hubbard
- Dobbins, Roland
- Harry Hoffman
- James Braunegg
- Jeroen Massar
- Joe Loiacono
- Mikael Abrahamsson
- Nick Hilliard
- Paolo Lucente
- Peter Phaal
- Richard A Steenbergen
- Simon Leinen
- Łukasz Bromirski