RE: What percentage of the Internet Traffic is junk?
Very very very near to, but not quite 100%. Since almost all of the traffic on the Internet isn't sourced by or destined for me, I consider it junk.
Also remember that to a packet kid, that insane flood of packets destined for his target is the most important traffic in the world. And to a spammer, the very mailings that are making him millions are more important than pictures of someone's grandkids.
I guess my point is junk is a very relative term. A study would need to first be done to identify what junk actually is, then measuring it is trivial.
-Mike
-----Original Message-----
From: William B. Norton [mailto:wbn@equinix.com]
Sent: Wednesday, May 05, 2004 11:21 AM
To: nanog@merit.edu
Subject: What percentage of the Internet Traffic is junk?
With all the spam, infected e-mails, DOS attacks, ultimately blackholed traffic, etc. I wonder if there has been a study that quantifies
What percentage of the Internet traffic is junk?
Bill
So instead of trying to determine what percentage of internet traffic is junk, why don't we set up categories (I saw someone make a start at it a couple of messages back) and figure out what percentage of traffic fits under each category. We can come up with our own opinions as to which of those categories is junk.
So I guess we would start with stuff that stands as a major category: e-mail, nntp, ftp, telnet, ssh, web... and then you start doing a lot of subcategorizations. I imagine it would start looking like a hierarchical org chart.
** Reply to message from Mike Damm <MikeD@irwinresearch.com> on Wed, 5 May 2004 11:51:19 -0700
--
Jeff Shultz
A railfan pulls up to a grade crossing hoping that there will be a train.
Jeff Shultz wrote:
So instead of trying to determine what percentage of internet traffic is junk, why don't we set up categories (I saw someone make a start at it a couple of messages back) and figure out what percentage of traffic fits under each category. We can come up with our own opinions as to which of those categories is junk.
So I guess we would start with stuff that stands as a major category: e-mail, nntp, ftp, telnet, ssh, web... and then you start doing a lot of subcategorizations. I imagine it would start looking like a hierarchical org chart.
I imagine there are places that already produce statistics by protocol, and I am reluctant to endorse a program that says one protocol is junk and another is not.
I would prefer (but have no clue as to how to do) a categorization that has handles like "business transactions", "student research", "warehouse transfers", "recreational", and so on until whatever is left is counted as "junk" or some euphemistically similar term.
--
Requiescas in pace o email
Ex turpi causa non oritur actio
http://members.cox.net/larrysheldon/
If a few of you can stop being so pedantic for a second, the definition looks pretty easy to me: traffic unlikely to be wanted by the recipient. Presumably, if it's being sent that means somebody wanted to send it, so the senders' desires are a pretty meaningless metric.
The harder pieces are going to be defining what traffic is unwanted in a way that scales to large-scale measurement. Worm traffic is presumably measurable with Netflow, as are various protocol-types used mainly in DOS attacks. Spam is harder to pinpoint by watching raw traffic, but perhaps comparing the total volume of TCP/25 traffic to the SpamAssassin hit rates at some representative sample of mail servers could provide some reasonable numbers there.
So, any of you security types have a list of the protocols that are more likely to be attack traffic than legitimate?
-Steve
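The TCP/25 comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the input figures are placeholders rather than measurements, and it assumes spam and legitimate mail have similar average message sizes, which may well not hold in practice.

```python
def estimate_spam_share(smtp_share, spam_hit_rate):
    """Estimate the fraction of total traffic that is spam.

    smtp_share    -- fraction of total traffic carried on TCP/25 (0..1)
    spam_hit_rate -- fraction of mail flagged as spam at the sampled
                     mail servers (0..1)
    """
    for name, value in (("smtp_share", smtp_share),
                        ("spam_hit_rate", spam_hit_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Scale the mail share of traffic by the share of mail that is spam.
    return smtp_share * spam_hit_rate


# Hypothetical placeholder inputs, not measured values.
share = estimate_spam_share(smtp_share=0.02, spam_hit_rate=0.5)
print(f"estimated spam share of all traffic: {share:.2%}")
```

The estimate is only as good as the sample of mail servers behind the hit rate, which is the "representative sample" caveat in the message above.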
At 12:55 PM 5/5/2004, Steve Gibbard wrote:
If a few of you can stop being so pedantic for a second, the definition looks pretty easy to me: traffic unlikely to be wanted by the recipient. Presumably, if it's being sent that means somebody wanted to send it, so the senders' desires are a pretty meaningless metric.
Thanks Steve - good point. I have to believe that some of those that have solutions to some of these problems have made *some* measures so they can quantify the value of their solution.
The harder pieces are going to be defining what traffic is unwanted in a way that scales to large-scale measurement. Worm traffic is presumably measurable with Netflow, as are various protocol-types used mainly in DOS attacks. Spam is harder to pinpoint by watching raw traffic, but perhaps comparing the total volume of TCP/25 traffic to the SpamAssassin hit rates at some representative sample of mail servers could provide some reasonable numbers there.
Yea, we can't get absolute #'s, but I think it would be helpful to have a defensible approximation.
So, any of you security types have a list of the protocols that are more likely to be attack traffic than legitimate?
Or maybe those in the Research Community that have been doing traffic capture and analysis?
Whenever I hear a question like this, I think of the weekly I2 netflow reports
http://netflow.internet2.edu/weekly/
http://netflow.internet2.edu/weekly/20040426/
Look at Tables 6, 7 and 8 - email, for example, is 1/2 %, so even if all email is spam, it's not that big a flow. Unidentified is typically about 30%, but most of that is probably file sharing.
My opinion, from looking at these tables, is that probably little is junk, at least in the eyes of the receiver.
Regards
Marshall Eubanks
On Wed, 05 May 2004 13:17:45 -0700 "William B. Norton" <wbn@equinix.com> wrote:
At 12:55 PM 5/5/2004, Steve Gibbard wrote:
If a few of you can stop being so pedantic for a second, the definition looks pretty easy to me: traffic unlikely to be wanted by the recipient. Presumably, if it's being sent that means somebody wanted to send it, so the senders' desires are a pretty meaningless metric.
Thanks Steve - good point. I have to believe that some of those that have solutions to some of these problems have made *some* measures so they can quantify the value of their solution.
The harder pieces are going to be defining what traffic is unwanted in a way that scales to large-scale measurement. Worm traffic is presumably measurable with Netflow, as are various protocol-types used mainly in DOS attacks. Spam is harder to pinpoint by watching raw traffic, but perhaps comparing the total volume of TCP/25 traffic to the SpamAssassin hit rates at some representative sample of mail servers could provide some reasonable numbers there.
Yea, we can't get absolute #'s, but I think it would be helpful to have a defensible approximation.
So, any of you security types have a list of the protocols that are more likely to be attack traffic than legitimate?
Or maybe those in the Research Community that have been doing traffic capture and analysis?
At 01:56 PM 5/5/2004, Marshall Eubanks wrote:
Look at Tables 6, 7 and 8 - email, for example, is 1/2 %, so even if all email is spam, it's not that big a flow. Unidentified is typically about 30%, but most of that is probably file sharing.
Thanks Marshall - a few others have said (paraphrasing): On average we have seen about 30% by packets (but only 10% by bandwidth) are junk, with higher %'s during major attacks and worm infestations.
For those who say things like "can't define 'junk' precisely", I would agree, but I think we also can agree that we all have a general idea of what junk is. Just looking for round #'s really. It isn't 0%, and it isn't 90% (although it seems that way sometimes).
I would also agree that it would be valuable for the community to track this # over time. You can't manage it if you can't measure it.
Bill
My opinion, from looking at these tables, is that probably little is junk, at least in the eyes of the receiver.
Regards Marshall Eubanks
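As a rough aside on the paraphrased figures above (about 30% by packets but only 10% by bandwidth): taken at face value, they imply that junk packets are much smaller on average than other packets. A small sketch of that arithmetic, using a hypothetical helper name and only the two paraphrased percentages as input:

```python
def relative_packet_size(packet_share, byte_share):
    """Mean packet size inside a subset, relative to packets outside it.

    packet_share -- subset's fraction of all packets (strictly between 0 and 1)
    byte_share   -- subset's fraction of all bytes (strictly between 0 and 1)
    """
    if not (0.0 < packet_share < 1.0 and 0.0 < byte_share < 1.0):
        raise ValueError("shares must be strictly between 0 and 1")
    # Mean packet size in units of the overall mean packet size.
    inside = byte_share / packet_share
    outside = (1.0 - byte_share) / (1.0 - packet_share)
    return inside / outside


ratio = relative_packet_size(packet_share=0.30, byte_share=0.10)
print(f"junk packets average about {ratio:.2f}x the size of other packets")
```

That would be consistent with junk being dominated by small probe or attack packets, though the paraphrased figures are too loose to lean on.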
William B. Norton wrote:
For those who say things like "can't define 'junk' precisely", I would agree, but I think we also can agree that we all have a general idea of what junk is. Just looking for round #'s really. It isn't 0%, and it isn't 90% (although it seems that way sometimes).
I would also agree that it would be valuable for the community to track this # over time. You can't manage it if you can't measure it.
There is also a lot of "background Internet radiation" coming from p2p applications which seem to remember their peers for a week or two. These usually account for most of the unidirectional traffic knocking on doors unanswered. (not counting large DDoS).
Pete
There is also a lot of "background Internet radiation" coming from p2p applications which seem to remember their peers for a week or two. These usually account for most of the unidirectional traffic knocking on doors unanswered. (not counting large DDoS).
Pete
While working on a private network, I captured some packets trying to reach off-net destinations. After the initial panic that something might be leaking, we figured out that these packets were being generated by applications which were trying to communicate with their mother ships for software updates. These automatic update requests would qualify as junk for some, not for others I suppose.
Petri Helenius wrote:
There is also a lot of "background Internet radiation" coming from p2p applications which seem to remember their peers for a week or two. These usually account for most of the unidirectional traffic knocking on doors unanswered. (not counting large DDoS).
Martian packets, idiots who configure non-RFC 1918 IPs into their LANs and then leak these out to the world, random spoofed source address traffic and/or DDoS traffic as you say (insert BCP 38 thread here) - all far more common than they ought to be.
But junk p2p applications written by people who can read /. far better than they can code, and who will be first up against the wall when the coding revolution begins, are definitely the major factor.
--
suresh ramasubramanian suresh@outblaze.com gpg EDEDEFB9
manager, security and antispam operations, outblaze ltd
On Wed, 05 May 2004 16:56:59 EDT, Marshall Eubanks said:
Look at Tables 6, 7 and 8 - email, for example, is 1/2 %, so even if all email is spam, it's not that big a flow. Unidentified is typically about 30%, but most of that is probably file sharing.
Note that this is biased by a very significant factor - we're looking here at Internet2 traffic *only*, which basically ends up meaning that email isn't seen unless both the sender *and* recipient are at one of the 200 or so universities that are members, or one of the 50 or so corporate/associate members. For starters, if the sender *or* recipient is at a commercial ISP, it won't have been included in those numbers. It's basically the same error as monitoring traffic on some of the DoD's telephone network that connects military bases, and from that concluding that 87% of *all* phone calls involve military matters. You'd get different numbers if you monitored the trunks that connect military bases with the outside world, and still different ones if you measured trunks that connect different parts of the outside world.
--- Steve Gibbard <scg@gibbard.org> wrote:
If a few of you can stop being so pedantic for a second, the definition looks pretty easy to me: traffic unlikely to be wanted by the recipient. Presumably, if it's being sent that means somebody wanted to send it, so the senders' desires are a pretty meaningless metric.
I'm not sure that I'd agree with this statement. What about the traffic from compromised sources? The pps floods or spam emails are not being created with the knowledge of the source, so it would be hard to say that the source "wanted" to send it.
-David Barak
-Fully RFC 1925 Compliant-
I'm not sure that I'd agree with this statement. What about the traffic from compromised sources? The pps floods or spam emails are not being created with the knowledge of the source, so it would be hard to say that the source "wanted" to send it.
Exactly. A great example is a web server struggling to continue to accept connections in the face of a spoofed SYN flood. The SYN/ACK packets are junk.
The definition of "junk" is that the sender would not have wanted to send it or the receiver would not have wanted to receive it if either had had a chance to have the appropriate human or humans investigate the transaction in full detail.
Traffic you are duped into sending by traffic you wish you hadn't received or cannot distinguish from legitimate traffic is junk.
Perhaps now I'm the one being pedantic, but you're confusing "somebody" with the owner of the resources involved in the sending. What I said was, "presumably, if it's being sent that means *somebody* wanted to send it."
Otherwise, we have to consider somebody doing what would otherwise be legitimate web browsing from an unintentionally open wireless access point to be junk traffic, which is both very hard to figure out in any large-scale analysis, and gives the numbers a very different meaning.
-Steve
On Wed, 5 May 2004, David Schwartz wrote:
I'm not sure that I'd agree with this statement. What about the traffic from compromised sources? The pps floods or spam emails are not being created with the knowledge of the source, so it would be hard to say that the source "wanted" to send it.
Exactly. A great example is a web server struggling to continue to accept connections in the face of a spoofed SYN flood. The SYN/ACK packets are junk.
The definition of "junk" is that the sender would not have wanted to send it or the receiver would not have wanted to receive it if either had had a chance to have the appropriate human or humans investigate the transaction in full detail.
Traffic you are duped into sending by traffic you wish you hadn't received or cannot distinguish from legitimate traffic is junk.
--------------------------------------------------------------------------------
Steve Gibbard                           scg@gibbard.org
+1 415 717-7842 (cell)                  http://www.gibbard.org/~scg
+1 510 528-1035 (home)
Perhaps now I'm the one being pedantic, but you're confusing "somebody" with the owner of the resources involved in the sending.
Look, we're the ones asking what percentage of Internet traffic is junk, so we're the somebody. We know what we mean and can do a reasonably good job of explaining it.
Basically, it's junk if the sender wouldn't have wanted to send it, the receiver wouldn't have wanted to receive it, the owner of a computer was duped or tricked into sending it, or it's an attack, and so on. It's not complicated.
We do have to pass some value judgments. But any number of things we measure require such value judgments.
DS
On Wed, 05 May 2004 12:55:04 PDT, Steve Gibbard said:
Presumably, if it's being sent that means somebody wanted to send it, so the senders' desires are a pretty meaningless metric.
Actually, there are two cases:
1) The sender intended to send it, so the sender's desires don't matter as we "know" a priori what the answer was...
2) The sender has malware on the box - I am including in here everything from viruses, worms, and trojans to the popular software that tries to register an RFC1918 address in the DNS (resulting in traffic to the root DNS servers). Here, the sender's desires don't matter, since they aren't aware they're even doing it until somebody *tells* them....
On 5-mei-04, at 21:55, Steve Gibbard wrote:
If a few of you can stop being so pedantic for a second, the definition looks pretty easy to me: traffic unlikely to be wanted by the recipient. Presumably, if it's being sent that means somebody wanted to send it, so the senders' desires are a pretty meaningless metric.
Exactly.
The harder pieces are going to be defining what traffic is unwanted in a way that scales to large-scale measurement.
I think if someone sends something back that isn't an error, then at some level the traffic is desired. However, this only works at one layer in the stack: DDoS packets aren't replied to, so they can be categorized as abusive at the IP level. However, even though spam emails aren't replied to (hopefully, and not counting bounces), the TCP port 25 packets flow in both directions, so at the IP level spam isn't abusive. (There are a few corner cases where legitimate traffic only flows in one direction but this is very unusual.)
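The "did anything come back" test described above could be sketched against flow records roughly like this. It is only a sketch: the `Flow` record is a hypothetical, simplified stand-in for a real NetFlow export, the example records are made up (documentation addresses), and it treats any reverse flow as an answer, so it does not tell an error response (an RST, say) apart from a real reply.

```python
from collections import namedtuple

# Hypothetical, simplified flow record; real NetFlow exports carry more fields.
Flow = namedtuple("Flow", "src dst proto sport dport packets bytes")


def split_unanswered(flows):
    """Return (answered, unanswered) lists of flows.

    A flow counts as answered if a flow with the endpoints reversed
    (same protocol, swapped addresses and ports) is also present.
    """
    keys = {(f.src, f.dst, f.proto, f.sport, f.dport) for f in flows}
    answered, unanswered = [], []
    for f in flows:
        reverse = (f.dst, f.src, f.proto, f.dport, f.sport)
        (answered if reverse in keys else unanswered).append(f)
    return answered, unanswered


def share(part, whole, field):
    """Fraction of `field` ('packets' or 'bytes') in `part` relative to `whole`."""
    total = sum(getattr(f, field) for f in whole)
    return sum(getattr(f, field) for f in part) / total if total else 0.0


# Made-up example records, not real measurements.
flows = [
    Flow("192.0.2.1", "198.51.100.5", "tcp", 40000, 25, 20, 12000),  # SMTP, client side
    Flow("198.51.100.5", "192.0.2.1", "tcp", 25, 40000, 18, 2000),   # SMTP, server side
    Flow("203.0.113.9", "198.51.100.5", "tcp", 1234, 80, 10, 600),   # never answered
]
answered, unanswered = split_unanswered(flows)
print(f"unanswered share by packets: {share(unanswered, flows, 'packets'):.1%}")
print(f"unanswered share by bytes:   {share(unanswered, flows, 'bytes'):.1%}")
```

Note that the SMTP exchange counts as answered whether or not the message was spam, which is the layering limitation pointed out above.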
participants (13)
- David Barak
- David Schwartz
- Iljitsch van Beijnum
- Jeff Shultz
- Laurence F. Sheldon, Jr.
- Mark Borchers
- Marshall Eubanks
- Mike Damm
- Petri Helenius
- Steve Gibbard
- Suresh Ramasubramanian
- Valdis.Kletnieks@vt.edu
- William B. Norton