OT: question re. the Volume of unwanted email (fwd)
Hi Folks,

Someone on the cybertelecom list raised a question about the real costs of handling spam (see below) in terms of computer resources, transmission, etc. This dovetailed with a discussion I had recently with several former BBN colleagues, in which someone pointed out that email is not a very high percentage of total Internet traffic compared to all the multimedia and video floating around these days.

Since a lot of the arguments about spam hinge on the various costs it imposes on ISPs, it seems like it would be a good thing to get a handle on quantitative data. It occurs to me that a lot of people on this list might have that sort of quantitative data - so... any comments?

Regards,
Miles Fidelman

---------- Forwarded message ----------
Date: Wed, 18 Jun 2003 09:15:08 -0400
From: Timothy Denton <tim@TMDENTON.COM>
Reply-To: Telecom Regulation & the Internet <CYBERTELECOM-L@LISTSERV.AOL.COM>
To: CYBERTELECOM-L@LISTSERV.AOL.COM
Subject: Issue: the Volume of unwanted email

Cybertelecomers:

I want the advice and knowledge of people on this list. I dared not use the word spam lest I be filtered out, but the issue is the economic cost of spam for ISPs.

There has been much to-do about spam of late. Figures from Canarie show that SMTP transmissions account for about 0.5% of the volume of Internet traffic. This may be typical of backbone networks, or not. Commercial networks are jealous of revealing information of this nature.

ISPs report that spam is now about 46% of email, and that it adds to the cost of transmissions because of the extra machines that have to be bought and operated.

Question: What is the economic cost of handling all this spam, in terms of additional boxes, software, transmission costs, etc.?

I am aware that spam adds large costs in terms of time and attention at the user end. Is there evidence of what it adds in terms of hardware and software?

As we head toward legislative remedies in the US and Canada, I would like to have a better idea of the economic impact of spam.

Timothy Denton, BA, BCL
37 Heney Street
Ottawa, Ontario
Canada K1N 5V6
www.tmdenton.com
1-613-789-5397
tmdenton@magma.ca
While the question (metrics for operators, backbone-to-retail, spam) is current on the asrg list, it is posed (informally) by the (outgoing) secretary of the ICANN Registrar's Constituency to a listserv in the AOL playpen. The question is not current in the Registrar's Constituency, nor is it likely to be, IMHO.

There are several ways nanog'ers can take it: back to the AOL listserv, over the fence to the irtf/asrg playpen, or yawn.

There is one modality of spam that interests me technically, one that Bill touched on in his note in the "rr style scanning" thread, and that Sean and others have touched on in the "use trojans" thread: buffering up hosts (acquired via technical means), and expending hosts (sending until some terminal condition occurs) at a rate approximating the rate of buffer-fill. Anyone else interested, drop me a line.

Better still would be a peer-reviewed paper in the open literature that answers all the questions I've thought of, and the ones I haven't.

Eric
Miles Fidelman wrote:
Since a lot of the arguments about spam hinge on the various costs it imposes on ISPs, it seems like it would be a good thing to get a handle on quantitative data.
While there is a cost to ISPs regarding spam, the highest cost is still on the recipient: end users who are outraged by their children getting pornography in email, or who have trouble finding their legitimate emails due to the sheer volume of spam that fills their inbox. There are cases where emails are so far out of RFC 822 compliance that mail clients lock up or crash when attempting to read the message. Time is expended across the board in handling, blocking, verifying, or deleting spam.

In this day and age, time is often more valuable than money, and the assigned value is dependent on the individual. Unfortunately, end users cannot just highlight and hit delete on spam. They must look at almost every email to verify that it is spam and not a business or personal email. The misleading subject lines and forgeries are making this even more necessary.

-Jack
jbates@brightok.net (Jack Bates) writes:
While there is a cost to ISPs regarding spam, the highest cost is still on the recipient: end users who are outraged by their children getting pornography in email, or who have trouble finding their legitimate emails due to the sheer volume of spam that fills their inbox.
yes.

lartomatic=# select date(entered),count(*) from spam where date(entered)>now()-'20 days'::interval group by date(entered) order by date(entered) desc;
    date    | count
------------+-------
 2003-06-18 |   505
 2003-06-17 |   873
 2003-06-16 |   644
 2003-06-15 |   621
 2003-06-14 |   667
 2003-06-13 |   396
 2003-06-12 |   696
 2003-06-11 |   517
 2003-06-10 |   673
 2003-06-09 |   616
 2003-06-08 |   421
 2003-06-07 |   398
 2003-06-06 |   558
 2003-06-05 |   534
 2003-06-04 |   616
 2003-06-03 |   464
 2003-06-02 |   555
 2003-06-01 |   677
 2003-05-31 |   378
 2003-05-30 |   642
(20 rows)

that's actually not too bad. the trend is flattening after the Q1'03 surge.
In this day and age, time is often more valuable than money, and the assigned value is dependent on the individual. Unfortunately, end users cannot just highlight and hit delete on spam. They must look at almost every email to verify that it is spam and not a business or personal email. The misleading subject lines and forgeries are making this even more necessary.
let's not lose sight of the privacy and property issues, though. even if all spam were accurately marked with "SPAM:" (or "ADV:") in its subject line and there were no false positives, there is no implied right to send it since it still shifts costs toward the recipient(s). all communication should be by mutual consent, and one way or another, some day it will be.
--
Paul Vixie
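For anyone who wants to summarize the 20 daily counts in the query output above, here is a minimal Python sketch. The list is copied from the table; the variable names are illustrative, and 2003-06-18 may be a partial day, since the message was sent on that date. Twenty days is a short window, so this only describes the sample and says nothing either way about the comparison with the Q1'03 surge.

```python
from statistics import mean

# Daily spam counts copied from the query output above, newest first.
# Assumption: 2003-06-18 (the first entry) may cover only part of a day.
daily_counts = [505, 873, 644, 621, 667, 396, 696, 517, 673, 616,
                421, 398, 558, 534, 616, 464, 555, 677, 378, 642]

total = sum(daily_counts)          # messages over the whole window
average = mean(daily_counts)       # mean messages per day
lowest = min(daily_counts)         # quietest day in the window
highest = max(daily_counts)        # busiest day in the window

print(f"days={len(daily_counts)} total={total} mean={average:.1f}")
print(f"min={lowest} max={highest}")
```

Dropping the first entry before averaging is a reasonable variation if the partial-day assumption matters for your purposes.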
value is dependent on the individual. Unfortunately, end users cannot just highlight and hit delete on spam. They must look at almost every
Isn't "highlight and hit delete" exactly what has been implemented since Mozilla 1.3, and doesn't it work with almost perfect accuracy after you give it a few dozen messages to build up the "good and bad" database?

Pete
Petri Helenius wrote:
Isn't "highlight and hit delete" exactly what has been implemented since Mozilla 1.3, and doesn't it work with almost perfect accuracy after you give it a few dozen messages to build up the "good and bad" database?
Actually, I find that 1.3 and 1.4 still have issues with determining spam. While they are fairly decent, one still has to go through looking for false positives. The other issue is that spammers have been doing a good job of designing emails to fool filters. I'm starting to see more and more spam designed to defeat Bayesian filters. By including "good" words in their emails, they either make good words spammy, so that you get more FPs, or they make their email clean enough that it's still in your inbox. The worst part is that spam is quickly becoming unreadable, so that legitimate emails, which are readable, are the ones more likely to be filtered.

-Jack
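The "good words" trick described above can be illustrated with a toy naive Bayes scorer. This is a minimal sketch, not Mozilla's implementation: the token probabilities, the `UNKNOWN_PROB` default, and the function name are all invented for illustration, and equal prior odds for spam and non-spam are assumed.

```python
import math

# Hypothetical per-token spam probabilities, as a trained filter might hold.
# These numbers are made up for this example.
TOKEN_SPAM_PROB = {
    "viagra": 0.99, "free": 0.90, "click": 0.85,
    "meeting": 0.05, "nanog": 0.02, "routing": 0.03,
}
UNKNOWN_PROB = 0.4  # assumed probability for tokens never seen in training


def spam_score(text: str) -> float:
    """Combine per-token probabilities with the naive Bayes product rule."""
    log_spam = 0.0
    log_ham = 0.0
    for token in set(text.lower().split()):
        p = TOKEN_SPAM_PROB.get(token, UNKNOWN_PROB)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # P(spam) = prod(p) / (prod(p) + prod(1 - p)), evaluated in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


plain = "free viagra click"
padded = plain + " meeting nanog routing"  # same spam, padded with "good" words

print(f"plain:  {spam_score(plain):.4f}")
print(f"padded: {spam_score(padded):.4f}")
```

With these invented numbers the padded message scores well below the plain one, which is the "clean enough to stay in your inbox" case. The other half of the problem, good words drifting toward spammy once the user trains on padded spam, is not modeled here.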
On Wed, 2003-06-18 at 17:09, Jack Bates wrote:
The worst part of it is that spam is quickly becoming unreadable, so that legitimate emails that are readable are the emails more likely filtered.
-Jack
On the upside, this means replacing the spam filter with a spell checker will move us toward 100% accuracy! :-)

-Paul

--
Paul Timmins
paul@timmins.net / http://www.timmins.net/
H: 313-586-9514 / C: 248-379-7826 / DC: 130*116*24495
AIM: noweb4u / Callsign: KC8QAY
Actually, I find that 1.3 and 1.4 still have issues with determining spam. While they are fairly decent, one still has to go through looking for false positives. The other issue is that spammers have been doing a good job of designing emails to fool filters. I'm starting to see more and more spam designed to defeat Bayesian filters. By including "good" words in their emails, they either make good words spammy, so that you get more FPs, or they make their email clean enough that it's still in your inbox. The worst part is that spam is quickly becoming unreadable, so that legitimate emails, which are readable, are the ones more likely to be filtered.
I hope I never get your "legitimate" email. :)

After about 100 messages I practically stopped checking the Junk folder, because no false positives were occurring. Just for the sake of this message, I peeked into the folder and scrolled through the last ~300 messages, and all were spam. About one in 50 does not get flagged, and this stream has already gone through the basic checks, such as requiring that the sender have a legit domain name.

So I'm a happy camper, and I hope that legislation catches up with spammers before they figure out a surefire way to defeat Bayesian filters.

Pete
Jack Bates wrote:
Petri Helenius wrote:
Isn't "highlight and hit delete" exactly what has been implemented since Mozilla 1.3, and doesn't it work with almost perfect accuracy after you give it a few dozen messages to build up the "good and bad" database?
Actually, I find that 1.3 and 1.4 still have issues with determining spam. While they are fairly decent, one still has to go through looking for false positives. The other issue is that spammers have been doing a good job of designing emails to fool filters. I'm starting to see more and more spam designed to defeat Bayesian filters. By including "good" words in their emails, they either make good words spammy, so that you get more FPs, or they make their email clean enough that it's still in your inbox. The worst part is that spam is quickly becoming unreadable, so that legitimate emails, which are readable, are the ones more likely to be filtered.
I have not found this to be the case. While I don't manage an abuse mailbox, I do manage a busy mailing list. The mailing list address and administrative addresses have been picked up by spammers and are probably now on all those "millions of email addresses" CDs. Both are also regularly forged (used to send spam), so I get all the undeliverable spam mixed in with all the undeliverable actual list email.

Until I started using the Bayesian filters in Mozilla, weeding through the spam to find the actual administrative emails that needed my attention was a very big chore, and my false positive rate using JHD ("just hit delete") was fairly high. Now Mozilla filters for me, and has a much lower false positive rate.

Note that I fed Mozilla's Bayesian filters two folders, each containing over 1000 emails, one full of spam and one full of legitimate administrative email, to train it on what was and wasn't acceptable email. Hand sorting until I had these two seed folders took a fair bit of time, but it was clearly worth it!

The Bayesian filters are the main reason I'm using Mozilla. Eudora does some things much better than Mozilla, but I can't live without the spam filters anymore!

jc
on 6/18/2003 9:51 AM Miles Fidelman wrote:
Someone on the cybertelecom list raised a question about the real costs of handling spam (see below) in terms of computer resources, transmission, etc. This dovetailed with a discussion I had recently with several former BBN colleagues, in which someone pointed out that email is not a very high percentage of total Internet traffic compared to all the multimedia and video floating around these days.
The major cost items I've seen are increased bandwidth costs (measured rate), equipment, filtering software/services, and personnel. These costs vary depending on the size of the organization and the kinds of service it provides (as a dramatic example, the cost burden is proportionally higher for an email house like pobox than it would be for yahoo). There are other indirect costs too; lots of organizations have stopped sharing backup MX services because of problems with asymmetrical filtering, which can translate into more outages, which can lead to ...

My feeling is that any organization with at least one full-time spam staffer could probably come up with a minimal cost estimate of $0.01 per message. End users with measured-rate services (e.g., cellular) can also reach similar loads with little effort. But due to the variables and competitive concerns, you'll probably have to go door-to-door with a non-disclosure agreement to get people to cough up their exact costs, assuming they are tracking them.
There has been much to-do about spam of late. Figures from Canarie show that SMTP transmissions account for about 0.5% of the volume of Internet traffic. This may be typical of backbone networks, or not. Commercial networks are jealous of revealing information of this nature.
The backbone utilization isn't going to be relevant unless it is high enough to affect the price of offering the connection. The mailstore is where the pressure is. Companies and users who sink capital and time into unnecessary maintenance have always been the victims. These costs also have secondary effects, like permanently delaying rate reductions (sorry your tuition went up again, but we had to buy another cluster), which in turn affect other parties, but the bulk of the pressure is wherever the mailstore is.

--
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
On Wed, 18 Jun 2003, Miles Fidelman wrote:
It occurs to me that a lot of people on this list might have that sort of quantitative data - so... any comments?
Regards,
Miles Fidelman

For my little corner: http://mrtg.snark.net/spam/

It seems >1:1 is the norm these days, at least at my scale.

matto

--mghali@snark.net------------------------------------------<darwin><
Flowers on the razor wire/I know you're here/We are few/And far between/I was thinking about her skin/Love is a many splintered thing/Don't be afraid now/Just walk on in.
#include <disclaim.h>
Interesting pattern. Kind of looks like "cutting z's." :-)

curtis

just me said:
On Wed, 18 Jun 2003, Miles Fidelman wrote:
It occurs to me that a lot of people on this list might have that sort of quantitative data - so... any comments?
Regards,
Miles Fidelman
For my little corner: http://mrtg.snark.net/spam/
It seems >1:1 is the norm these days, at least at my scale.
matto
On Wed, 18 Jun 2003, just me wrote:
For my little corner: http://mrtg.snark.net/spam/
It seems >1:1 is the norm these days, at least at my scale.
How do you get your mail delivery attempts to occur so linearly? :)

I think something's busted with your mrtg script...

Here's the stats for one of the smtp boxes in our cluster (83% rejection rate...and it's +/- 1% across the other boxes in the cluster):

Postfix log summaries for Jun 18

Grand Totals
------------
messages

 396087   received
 148369   delivered
      0   forwarded
    672   deferred  (9504 deferrals)
   1636   bounced
   718k   rejected (83%)
      0   reject warnings
      0   held
      0   discarded (0%)

Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---
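The 83% figure can be roughly reproduced from the summary above. The sketch below reads the summary as 148369 delivered, about 718k rejected, and 0 discarded, and assumes the percentage is rejected over (delivered + rejected + discarded). That formula is an assumption that happens to fit the posted numbers, not a statement of how the summary tool computes it, and 718k is a rounded input.

```python
def rejected_percent(delivered: int, rejected: int, discarded: int = 0) -> float:
    """Share of rejected messages among delivered, rejected, and discarded ones."""
    total = delivered + rejected + discarded
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty log
    return 100.0 * rejected / total


# Figures taken from the posted summary; 718_000 stands in for "718k".
pct = rejected_percent(delivered=148_369, rejected=718_000, discarded=0)
print(f"{pct:.1f}% rejected")
```

Because the rejected count is only known to the nearest thousand, the result should be read as "about 83%" and not as an exact match.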
On Thu, 19 Jun 2003, Jack Bates wrote:
Andy Dills wrote:
How do you get your mail delivery attempts to occur so linearly? :)
I think something's busted with your mrtg script...
Depends on which stats he wants. He's showing the total since midnight in the graph instead of the count since the last run.
Yeah, mea culpa :)

Don't know why you have your graphs set up that way, unless you have no other way of reporting aggregate scores for the day...

http://people.ee.ethz.ch/~oetiker/webtools/mrtg/reference.html

"In the absence of 'gauge' or 'absolute' options, MRTG treats variables as counters and calculates the difference between the current and the previous value and divides that by the elapsed time between the last two readings to get the value to be plotted."

Sounds like you have the 'gauge' option set where you shouldn't...unless that is exactly how you want the graphs to behave, in which case I'll shut up and respect your right to run mrtg any way you want. :)

Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---
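The difference between the two behaviours described in the quoted MRTG documentation can be sketched in a few lines of Python. This illustrates the arithmetic only and is not MRTG's code: the readings, the 300-second interval, and the function names are hypothetical, and counter wrap or reset handling is left out.

```python
def counter_rate(prev_value: float, curr_value: float, elapsed_seconds: float) -> float:
    """Counter-style plotting: change between two readings divided by elapsed time."""
    return (curr_value - prev_value) / elapsed_seconds


def gauge_value(curr_value: float) -> float:
    """Gauge-style plotting: the reading is plotted exactly as reported."""
    return curr_value


# Hypothetical running totals since midnight, sampled every 300 seconds.
INTERVAL = 300
readings = [100, 160, 250, 400]

gauge_series = [gauge_value(v) for v in readings]
counter_series = [counter_rate(a, b, INTERVAL)
                  for a, b in zip(readings, readings[1:])]

print("gauge:  ", gauge_series)    # climbs steadily through the day
print("counter:", counter_series)  # per-second rate between samples
```

A script that reports a total since midnight and is plotted as a gauge produces the steadily rising line discussed in this thread, while the same readings treated as a counter show the rate over time.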
On Thu, 19 Jun 2003, Andy Dills wrote:
Yeah, mea culpa :) Don't know why you have your graphs set up that way, unless you have no other way of reporting aggregate scores for the day...
http://people.ee.ethz.ch/~oetiker/webtools/mrtg/reference.html
"In the absence of 'gauge' or 'absolute' options, MRTG treats variables as counters and calculates the difference between the current and the previous value and divides that by the elapsed time between the last two readings to get the value to be plotted."
Sounds like you have the 'gauge' option set where you shouldn't...unless that is exactly how you want the graphs to behave, in which case I'll shut up and respect your right to run mrtg any way you want. :)

My configuration lets me see daily totals as well as rate vs. time-of-day pretty easily. Using "absolute", the only thing I'd be able to see is a running total. I like the ability to compare traffic between days, as well as see when the bulk of my mail is delivered; any anomalous traffic is pretty easy to spot.

matto

--mghali@snark.net------------------------------------------<darwin><
Flowers on the razor wire/I know you're here/We are few/And far between/I was thinking about her skin/Love is a many splintered thing/Don't be afraid now/Just walk on in.
#include <disclaim.h>
Not a lot to break; here's the script in its entirety:

#!/usr/local/bin/bash
grep -c mailer=local /var/log/maillog
egrep -c 'uce@ftc|reject|njabl' /var/log/maillog

A lot of mail traffic on my box is mailing lists; perhaps that's why the graphs look so smooth.

matto

On Thu, 19 Jun 2003, Andy Dills wrote:
On Wed, 18 Jun 2003, just me wrote:
For my little corner: http://mrtg.snark.net/spam/
It seems >1:1 is the norm these days, at least at my scale.
How do you get your mail delivery attempts to occur so linearly? :)
I think something's busted with your mrtg script...
Here's the stats for one of the smtp boxes in our cluster (83% rejection rate...and it's +/- 1% across the other boxes in the cluster):

Postfix log summaries for Jun 18

Grand Totals
------------
messages

 396087   received
 148369   delivered
      0   forwarded
    672   deferred  (9504 deferrals)
   1636   bounced
   718k   rejected (83%)
      0   reject warnings
      0   held
      0   discarded (0%)

Andy
---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---

--mghali@snark.net------------------------------------------<darwin><
Flowers on the razor wire/I know you're here/We are few/And far between/I was thinking about her skin/Love is a many splintered thing/Don't be afraid now/Just walk on in.
#include <disclaim.h>
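For readers who want the same two counts without grep, the script earlier in this message can be approximated in Python. This is a sketch under assumptions: the two patterns are copied from the script, but the function name and the sample lines are placeholders invented for the example and do not reflect real maillog formatting.

```python
import re

# Patterns copied from the two grep commands in the script.
LOCAL_DELIVERY = re.compile(r"mailer=local")
REJECTED = re.compile(r"uce@ftc|reject|njabl")


def count_matches(lines):
    """Return (local deliveries, rejections), counting matching lines like grep -c."""
    delivered = sum(1 for line in lines if LOCAL_DELIVERY.search(line))
    rejected = sum(1 for line in lines if REJECTED.search(line))
    return delivered, rejected


# Placeholder lines, NOT real log output; they only exercise the patterns.
sample = [
    "placeholder line containing mailer=local",
    "placeholder line containing reject",
    "placeholder line containing njabl",
    "placeholder line containing mailer=esmtp",
]

delivered, rejected = count_matches(sample)
print(f"delivered={delivered} rejected={rejected}")
```

To run it against a real log, pass an open file object to `count_matches` in place of the list; like the original script, it counts from the start of the file, so the numbers are running totals.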
On Wed, 18 Jun 2003, Miles Fidelman wrote:
It occurs to me that a lot of people on this list might have that sort of quantitative data - so... any comments?
You might find this useful.

http://zebulon.miester.org/spam/

Justin
participants (12)
- Andy Dills
- Curtis Maurand
- Eric A. Hall
- Eric Brunner-Williams in Portland Maine
- Jack Bates
- JC Dill
- just me
- Justin Shore
- Miles Fidelman
- Paul Timmins
- Paul Vixie
- Petri Helenius