Maybe a slightly off-topic math-geek kind of question to take time out from the ARIN/death-of-IPv4/IPv6-evangelist thread of the week.

Has anyone found any value in examining network utilization numbers with Fourier analyses? After staring at pretty MRTG graphs for a bit too long today, I'm wondering if there are some interesting periodic characteristics in the data that could be easily teased out beyond, "Well, the diurnal fluctuations are obvious, but looks like we may have some hourly traffic spikes in there too. And maybe some of those are bigger every fourth hour."

A quick Google search turned up nothing at all. With many EE-types who find their way into network operations and wannabe-EEs already there, I found that maybe a little surprising. I know the EEs love Fourier transforms.
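To make the question concrete, here is a minimal sketch of the kind of analysis being asked about. It assumes utilization samples at 5-minute intervals (as MRTG typically stores them) and uses a synthetic series with a daily and an hourly component. The series, the names `dft_magnitudes` and `dominant_periods`, and the amplitudes are all hypothetical; real data would be noisier and would not line up exactly with DFT bins.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return |X[k]| for k = 0..N/2 using a direct O(N^2) DFT (fine for small N)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the mean so bin 0 does not dominate
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centered))
        mags.append(abs(acc))
    return mags

def dominant_periods(samples, step_seconds, top=2):
    """Return the periods (in seconds) of the `top` strongest non-DC bins."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    ranked = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)[:top]
    return [n * step_seconds / k for k in ranked]

# Hypothetical data: two days of 5-minute samples with a diurnal swing
# and a smaller hourly ripple on top of a constant baseline.
STEP = 300   # seconds between samples
N = 576      # 2 days * 288 samples/day
series = [100
          + 40 * math.sin(2 * math.pi * t * STEP / 86400)
          + 10 * math.sin(2 * math.pi * t * STEP / 3600)
          for t in range(N)]

periods = dominant_periods(series, STEP)
print(periods)  # strongest period first
```

On this synthetic input the two strongest bins correspond to the 24-hour and 1-hour cycles. A "bigger every fourth hour" pattern would show up as an additional peak near a 4-hour period.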
Hi Crist,

On Tue, Apr 21, 2009 at 05:12:04PM -0700, Crist Clark wrote:
Has anyone found any value in examining network utilization numbers with Fourier analyses? After staring at pretty MRTG graphs for a bit too long today, I'm wondering if there are some interesting periodic characteristics in the data that could be easily teased out beyond, "Well, the diurnal fluctuations are obvious, but looks like we may have some hourly traffic spikes in there too. And maybe some of those are bigger every fourth hour."
A quick Google search turned up nothing at all.
Such techniques are used in the area of network anomaly detection. For instance, a search for "network anomaly detection" at scholar.google.com will yield very many results.

Our 2002 paper, "A Signal Analysis of Network Traffic Anomalies" [ACM SIGCOMM Internet Measurement Workshop 2002, Barford, et al.], is one such work. We mention that we use wavelet analysis rather than Fourier analysis because wavelet/framelet analysis is able to localize events both in the frequency and time domains, whereas Fourier analysis would localize the events only in frequency, so an iterative approach (with varying intervals of time) would be necessary. In general, this is the reason why Fourier analysis has not been a common technique used in network anomaly detection.

That work used data stored in RRD files at five-minute intervals. Our subsequent work used data stored at one-second intervals, again in RRD files.

Dave

--
plonka@cs.wisc.edu http://net.doit.wisc.edu/~plonka/ Madison, WI
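The localization argument can be illustrated with a toy example. This is a minimal sketch, not the method from the paper: it uses only the finest level of a Haar wavelet transform (the paper's framelet analysis is more elaborate), and the series, the spike position, and the function names are hypothetical. A single spike produces one large wavelet detail coefficient at a known time, while its Fourier magnitude is spread evenly across every frequency bin.

```python
import cmath
import math

def haar_detail(samples):
    """Finest-level Haar detail coefficients: scaled differences of adjacent pairs."""
    return [(samples[i] - samples[i + 1]) / math.sqrt(2)
            for i in range(0, len(samples) - 1, 2)]

def dft_magnitudes(samples):
    """Return |X[k]| for k = 0..N/2 using a direct O(N^2) DFT."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)]

# Hypothetical data: a flat series with one short spike at sample 37.
N = 64
series = [10.0] * N
series[37] += 5.0

# Wavelet view: the largest detail coefficient points at the pair holding the spike.
detail = haar_detail(series)
spike_pair = max(range(len(detail)), key=lambda i: abs(detail[i]))
spike_window = (2 * spike_pair, 2 * spike_pair + 1)  # sample indices covered by that pair

# Fourier view: the non-DC magnitudes are all (nearly) equal, so no bin says "when".
mags = dft_magnitudes(series)[1:]
spread = max(mags) - min(mags)

print(spike_window, spread)
```

A single level like this would miss a step change that falls exactly on a pair boundary; a full multi-level transform looks at coarser scales as well.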
Gents,

On Tue, Apr 21, 2009 at 5:30 PM, Dave Plonka <plonka@doit.wisc.edu> wrote:
Hi Crist,
On Tue, Apr 21, 2009 at 05:12:04PM -0700, Crist Clark wrote:
Has anyone found any value in examining network utilization numbers with Fourier analyses? After staring at pretty
In short, yup!
there are some interesting periodic characteristics in the data that could be easily teased out beyond, "Well, the
Indeed, there are. Interesting things emerge in frequency (or phase) space - bits/sec, packets/sec, average packet size, etc. all take on new meaning, often revealing subtle details otherwise missed. The UW paper [Barford/Plonka et al.] is one of my favorites and is often referenced in other publications.

Along similar lines, I presented a lightning talk at NANOG that demonstrates using windowed FTs (mostly Gaussian or Hamming) in three-axis graphs (i.e. 'waterfalls') available in common tools (baudline, SigView, LabVIEW, etc.) for characterizing round trip times through various network queues and queue states. Unexpectedly, interesting details regarding host IP stacks and OS scheduler behavior became visible. Find the talk slides and video here (look for 'kapela'): http://www.nanog.org/meetings/nanog37/agenda.php
A quick Google search turned up nothing at all.
Signal analysis, sadly, isn't as fun as going shopping or posting to webhosting talk, etc. so you won't likely find much there.
Such techniques are used in the area of network anomaly detection. For instance, a search for "network anomaly detection" at scholar.google.com will yield very many results.
I would also mention CiteSeer (http://citeseer.ist.psu.edu/) and IEEE Xplore (http://ieeexplore.ieee.org) - there are lots of related applications of FTs and wavelet/FIR filters in various disciplines, all of which can apply to the analysis of time-series data.
is one such work. We mention that we use wavelet analysis rather than Fourier analysis because wavelet/framelet analysis is able to localize events both in the frequency and time domains, whereas Fourier analysis would localize the events only in frequency, so an iterative approach (with varying intervals of time) would be necessary. In general, this is the reason why Fourier analysis has not been a common technique used in network anomaly detection.
I want to suggest that a time-windowed FT might be a reasonable middle ground, certainly for Crist's case. Naturally, the trade-offs will be in frequency accuracy (i.e. longer window) vs. temporal accuracy (i.e. shorter window). Another solution for your needs might be cascaded FIR "bandpass" filters, but again, you're subject to time/frequency error trade-offs as related to a filter's bandwidth.

While you're at it, consider processing your time series data into histogram stacks, or nested histograms. I haven't specifically seen a paper covering this, but another UW gent (DW, are you reading this?) used to process their 30-second IF-MIB data into a raw .ps file, and printed this out weekly/daily. The trends visible here were quite interesting, but I don't think much further work was done to see if anything super-interesting was more/less visible in this form than traditional ones.

-Tk
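A minimal sketch of the time-windowed FT idea, assuming a Hamming window and non-overlapping frames. The series is synthetic and the names `hamming` and `peak_bin_per_window` are hypothetical. The input changes its dominant period halfway through; a single FT over the whole series would show both peaks but not when the change happened, whereas the per-window peak shows the switch.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def peak_bin_per_window(samples, window_len, hop):
    """For each window, return the non-DC DFT bin with the largest magnitude."""
    win = hamming(window_len)
    peaks = []
    for start in range(0, len(samples) - window_len + 1, hop):
        frame = [samples[start + i] * win[i] for i in range(window_len)]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * t / window_len)
                        for t, x in enumerate(frame)))
                for k in range(window_len // 2 + 1)]
        peaks.append(max(range(1, len(mags)), key=lambda k: mags[k]))
    return peaks

# Hypothetical data: period of 32 samples for the first half, 8 samples afterwards.
series = [math.sin(2 * math.pi * t / 32) if t < 128 else math.sin(2 * math.pi * t / 8)
          for t in range(256)]

WINDOW = 64
peaks = peak_bin_per_window(series, WINDOW, hop=64)
periods = [WINDOW / k for k in peaks]  # dominant period, in samples, per window

print(peaks, periods)
```

The trade-off shows up directly in `WINDOW`: a 64-sample window only resolves periods of the form 64/k samples, and it cannot place the change more precisely than one window. Shrinking the window sharpens the timing but coarsens the frequency grid.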
On Thu, Apr 23, 2009 at 2:48 PM, Anton Kapela <tkapela@gmail.com> wrote:
Indeed, there are. Interesting things emerge in frequency (or phase) space - bits/sec, packets/sec, and ave size, etc. - all have new
Forgot to mention one point - since packets/bits/etc data is more monotonic than not (math wizards, please debate/chime in) and since it's not a 'signal' in the continuous sense, you might find value in differentially filtering the input data *before* FT or wavelet processing. This would serve to remove the weird-looking "DC" offset in the output simply by creating a semi-even distribution of both positive and negative input sample values.

-Tk
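A minimal sketch of that suggestion, assuming "differentially filtering" means taking the first difference of consecutive samples. The rate series and the function names are hypothetical. The large positive baseline dominates bin 0 of the raw series; after differencing, bin 0 shrinks to the difference between the last and first samples, while the periodic component is still visible.

```python
import cmath
import math

def dft_bin(samples, k):
    """Magnitude of a single DFT bin k, computed directly."""
    n = len(samples)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(samples)))

def first_difference(samples):
    """y[t] = x[t+1] - x[t]; the output is one sample shorter than the input."""
    return [b - a for a, b in zip(samples, samples[1:])]

# Hypothetical data: a rate with a large baseline and a ripple of period 16 samples.
rates = [500 + 50 * math.sin(2 * math.pi * t / 16) for t in range(128)]
diffed = first_difference(rates)

dc_before = dft_bin(rates, 0)     # dominated by the baseline (500 * 128)
dc_after = dft_bin(diffed, 0)     # equals |rates[-1] - rates[0]|
peak_after = dft_bin(diffed, 8)   # bin nearest the 16-sample period in 127 samples

print(dc_before, dc_after, peak_after)
```

One caveat: differencing is a high-pass filter, so it also attenuates slow cycles relative to fast ones. Subtracting the mean is an alternative that removes the offset without reshaping the rest of the spectrum.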
As IP traffic is assumed to be self-similar, my EE origins tell me to look for parameters from stochastic process theory that could measure it. A Google search turned up this paper, which sounded interesting: http://www.sparc.uni-mb.si/OPNET/PDF/IWSSIP2007Fras.pdf

(...) We estimated the Hurst parameter (H) for the arrival process, and the fitted distributions for the measured data (packet size and inter-arrival processes). Using the autocorrelation function of the process, we determined long-range or short-range dependence. The Hurst parameter was estimated using three graphical methods (variance, R/S, and periodogram methods). Distribution and its parameters were estimated using fitting tools. (...)

Doing it in RRD-time seems like a challenge, though. It might be easier to plot fractals from the data source if your target audience is made of humans, because they will spot patterns real fast with much less number crunching.

Rubens
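Of the three estimators named in the quoted abstract, the variance method is the simplest to sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the series is averaged over blocks of size m, and for a self-similar process the variance of those block means scales roughly as m^(2H-2), so H can be read off the slope of a log-log fit. The function names, block sizes, and the white-noise test input are hypothetical.

```python
import math
import random

def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def hurst_aggregated_variance(samples, block_sizes):
    """Estimate H from the slope of log(var of block means) vs. log(block size)."""
    xs, ys = [], []
    for m in block_sizes:
        blocks = len(samples) // m
        means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(blocks)]
        xs.append(math.log(m))
        ys.append(math.log(variance(means)))
    # Ordinary least-squares slope of ys against xs.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1 + slope / 2  # slope = 2H - 2

# Hypothetical input: uncorrelated Gaussian noise, for which H should be near 0.5.
random.seed(1)
noise = [random.gauss(0, 1) for _ in range(4096)]
h = hurst_aggregated_variance(noise, [1, 2, 4, 8, 16, 32, 64])
print(round(h, 2))
```

For uncorrelated input the estimate should land near 0.5; values closer to 1 would suggest long-range dependence. With consolidated RRD data the coarse averaging already done by the archive limits which block sizes are meaningful, which is one reason doing this "in RRD-time" is awkward.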
participants (4): Anton Kapela, Crist Clark, Dave Plonka, Rubens Kuhl