Re: Open source Netflow analysis for monitoring AS-to-AS traffic

30 Mar 2024

      On Fri, 29 Mar 2024 at 20:10, Steven Bakker <steven.bakker@ams-ix.net> wrote:
...
> To top it off, both the sFlow and IPFIX specs are sufficiently vague about the meaning of the "frame size", so vendors can implement whatever they want (include/exclude padding, include/exclude FCS). This implies that you shouldn't trust these fields.

I share this concern, but in my experience the market simply does not
care at all what the data means. People happily graph L3 rate from
Junos and L2 rate from other boxes, using them interchangeably and
also using them to determine whether or not there is congestion.
In reality, what you want is L1 speed, so you can actually see
whether the interface is full. Luckily we are starting to see more and
more devices also support peak-buffer-util over the previous N seconds,
which is far more useful for congestion monitoring. Unfortunately it is
not in IF-MIB, so most will never collect it.
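
To make the L3/L2/L1 difference concrete, here is a minimal sketch of the
conversion. It assumes untagged Ethernet, treats the "L2 rate" as the frame
from destination MAC through FCS, and needs the packet rate alongside the
bit rate. The function names are made up for illustration and a real device
may count differently (VLAN tags, padding, FCS).

```python
# Per-frame Ethernet overhead in bytes (assumption: untagged frames).
L2_HEADER = 14        # destination MAC + source MAC + EtherType
FCS = 4               # frame check sequence
PREAMBLE_SFD = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP = 12   # minimum gap between frames, in byte times


def l3_to_l2_bps(l3_bps: float, pps: float) -> float:
    """Add the Ethernet header and FCS to an L3 (IP packet) bit rate."""
    return l3_bps + pps * (L2_HEADER + FCS) * 8


def l2_to_l1_bps(l2_bps: float, pps: float) -> float:
    """Add preamble, SFD and inter-frame gap to an L2 bit rate."""
    return l2_bps + pps * (PREAMBLE_SFD + INTERFRAME_GAP) * 8


def l1_utilisation(l2_bps: float, pps: float, link_bps: float) -> float:
    """Fraction of the physical link actually occupied."""
    return l2_to_l1_bps(l2_bps, pps) / link_bps


# 10GE saturated with minimum-size (64-byte) frames: each frame occupies
# 64 + 20 = 84 byte times on the wire.
link_bps = 10e9
pps = link_bps / (84 * 8)
l2_bps = pps * 64 * 8
l3_bps = pps * 46 * 8   # 64 bytes minus 14-byte header and 4-byte FCS
```

In this worst case the link is completely full, yet an L2 graph shows
roughly 76% and an L3 graph roughly 55% of line rate.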

Note, it is possible to get most Juniper gear to report L2 rate like
IF-MIB specifies, but it's a non-standard configuration option,
therefore very rarely used.

I also wholeheartedly agree on inline templates being near peak
insanity. Huge complexity for upside that is completely beyond my
understanding. If I decide to collect a new metric, then punching in
the metric number+name somewhere is the least of my worries. The idea
that costs are lowered by having machines dynamically determine what is
being collected and monitored is just bizarre. Most of the cost of
starting to collect a new metric is figuring out how it is actionable,
what needs to happen to the metric to trigger a given action, and how
exactly we are extracting value from this action.
Netflow v9/v10 definitely should have done out-of-band templates, and
left it to the operator to communicate to the collector what it is
seeing.
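
As a rough illustration of what out-of-band templates could look like on
the collector side, here is a minimal sketch. The template is a static
table the operator maintains by hand; STATIC_TEMPLATE, decode_record and
the record layout are hypothetical and not taken from any existing
collector. The field names follow the IPFIX information element names.

```python
import ipaddress

# Hypothetical operator-maintained template: (field name, length in bytes,
# kind). It is configured on the collector, never learned from the exporter.
STATIC_TEMPLATE = [
    ("sourceIPv4Address", 4, "ipv4"),
    ("destinationIPv4Address", 4, "ipv4"),
    ("bgpSourceAsNumber", 4, "uint"),
    ("bgpDestinationAsNumber", 4, "uint"),
    ("octetDeltaCount", 8, "uint"),
    ("packetDeltaCount", 8, "uint"),
]

RECORD_LENGTH = sum(length for _, length, _ in STATIC_TEMPLATE)


def decode_record(payload: bytes) -> dict:
    """Decode one fixed-length data record using the static template."""
    if len(payload) != RECORD_LENGTH:
        raise ValueError(
            f"expected {RECORD_LENGTH} bytes, got {len(payload)}"
        )
    record = {}
    offset = 0
    for name, length, kind in STATIC_TEMPLATE:
        raw = payload[offset:offset + length]
        if kind == "ipv4":
            record[name] = str(ipaddress.IPv4Address(raw))
        else:
            # All counters and AS numbers are network byte order.
            record[name] = int.from_bytes(raw, "big")
        offset += length
    return record


# Example record: one 1500-byte packet from AS 64496 to AS 64511.
example = (
    bytes([192, 0, 2, 1])
    + bytes([198, 51, 100, 2])
    + (64496).to_bytes(4, "big")
    + (64511).to_bytes(4, "big")
    + (1500).to_bytes(8, "big")
    + (1).to_bytes(8, "big")
)
decoded = decode_record(example)
```

The collector has no template cache or template-refresh handling here; a
record that does not match the configured layout is simply rejected.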

Even exceedingly trivial things in v9/v10 entities can be broken for
years and years before anyone notices. For example, the original
sampling entities are deprecated and have been replaced with new
entities which communicate 'every N packets, sample C packets'. This is
very good, because it allows you to do stateless sampling while still
filling the export packet to MTU size or larger, keeping the export PPS
rate the same before and after axing the cache. However, by the time I
was looking into this, only pmacct correctly understood how to use these
entities; nfcapd and Arbor either didn't understand them or understood
them incorrectly (both were fixed in a timely manner by responsible
maintainers, thank you).
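
For what the collector has to do with 'every N packets, sample C
packets', here is a minimal sketch. The function names are made up. The
mapping in period_from_interval_and_space is an assumption based on my
reading of the IPFIX samplingPacketInterval and samplingPacketSpace
definitions (C consecutive packets sampled, then a gap of unsampled
packets), so check it against what your exporter actually sends.

```python
def scale_factor(sampled_packets: int, period_packets: int) -> float:
    """Multiplier to turn sampled counters into estimated totals.

    sampled_packets is C and period_packets is N in
    'every N packets, sample C packets'.
    """
    if sampled_packets <= 0 or period_packets < sampled_packets:
        raise ValueError("need 0 < sampled_packets <= period_packets")
    return period_packets / sampled_packets


def period_from_interval_and_space(interval: int, space: int) -> int:
    """Assumption: interval = packets sampled back to back,
    space = unsampled packets before the next interval starts."""
    return interval + space


def estimate_totals(flow_bytes: int, flow_packets: int,
                    sampled_packets: int, period_packets: int):
    """Scale one flow record's byte and packet counters."""
    factor = scale_factor(sampled_packets, period_packets)
    return flow_bytes * factor, flow_packets * factor


# 1 in 1000 and 10 in 10000 give the same multiplier, but the second lets
# the exporter emit ten sampled packets per period.
one_in_thousand = scale_factor(1, 1000)
ten_in_ten_thousand = scale_factor(10, 10000)
```

A collector that ignores C and only looks at N would overestimate traffic
by a factor of C, which is the kind of silent error described above.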

-- 
  ++ytti

Saku Ytti