for a measurement experiment, i would like O(100k) *headers* from spam from europe and a similar sample from the states. this would be a straight sample, before filtering, ip address blocking, etc. if you can help, please drop me a note and we can discuss how the sample is taken and how delivered. thanks! randy
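Since the request is for headers only, a volunteer would need to strip message bodies before handing over a sample. The Python sketch below is one way to do that. It is an illustration, not a method proposed in the thread: the function names `header_block` and `received_chain` are made up here, and it assumes each message is available as raw bytes exactly as it arrived.

```python
from email.parser import BytesHeaderParser


def header_block(raw: bytes) -> bytes:
    """Return everything before the first blank line of a raw message.

    Handles both CRLF and bare-LF line endings; if no blank line is
    found, the whole input is treated as headers.
    """
    cuts = [i for i in (raw.find(b"\r\n\r\n"), raw.find(b"\n\n")) if i != -1]
    return raw[:min(cuts)] if cuts else raw


def received_chain(raw: bytes) -> list:
    """Return the values of all Received: headers, in the order they appear."""
    headers = BytesHeaderParser().parsebytes(header_block(raw))
    return headers.get_all("Received", [])


if __name__ == "__main__":
    sample = (
        b"Received: from a.example by b.example\r\n"
        b"Subject: hi\r\n"
        b"\r\n"
        b"body text\r\n"
    )
    print(header_block(sample).decode("ascii"))
    print(received_chain(sample))
```

Cutting at the first blank line rather than re-serialising parsed headers keeps the original header order and folding untouched, which seems preferable when the headers themselves are the data.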
On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
for a measurement experiment, i would like O(100k) *headers* from spam from europe and a similar sample from the states.
Request for clarification: do you mean "spam originating at IP addresses believed to be in Europe" or "spam received at a mail server located in Europe" or "spam putatively from domains in Europe" or something else? ---Rsk
On Thu, Apr 10, 2008 at 08:55:21AM -0400, Rich Kulawiec wrote:
On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
for a measurement experiment, i would like O(100k) *headers* from spam from europe and a similar sample from the states.
Request for clarification: do you mean "spam originating at IP addresses believed to be in Europe" or "spam received at a mail server located in Europe" or "spam putatively from domains in Europe" or something else?
One thing that happened when I moved to Europe and started doing business in Germany is that relatively soon I began receiving spam in German (which seems to have quite different content, and sales strategy, actually, perhaps reflecting cultural differences in the manner of buying and selling between the anglophone world and Germany).
Trying to separate out what "in" Europe means in this case seems to come down to having given out email addresses to web sites and colleagues in a different language environment rather than physical presence of either myself or my mailserver in either North America or Europe. I guess the German spam I have been receiving is only European in that German speakers happen to be mostly in Europe, which is not true of English speakers.
I wonder, is the (English language) spam set that one is likely to receive in Australia statistically different than what one is likely to receive in the US?
-w
On Apr 10, 2008, at 9:35 AM, William Waites wrote:
On Thu, Apr 10, 2008 at 08:55:21AM -0400, Rich Kulawiec wrote:
On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
for a measurement experiment, i would like O(100k) *headers* from spam from europe and a similar sample from the states.
Request for clarification: do you mean "spam originating at IP addresses believed to be in Europe" or "spam received at a mail server located in Europe" or "spam putatively from domains in Europe" or something else?
One thing that happened when I moved to Europe and started doing business in Germany is that relatively soon I began receiving spam in German (which seems to have quite different content, and sales strategy, actually, perhaps reflecting cultural differences in the manner of buying and selling between the anglophone world and Germany).
I receive serious amounts of spam in Hebrew and Russian, and haven't even been to either Israel or Russia recently. Regards Marshall
Trying to separate out what "in" Europe means in this case seems to come down to having given out email addresses to web sites and colleagues in a different language environment rather than physical presence of either myself or my mailserver in either North America or Europe. I guess the German spam I have been receiving is only European in that German speakers happen to be mostly in Europe, which is not true of English speakers.
I wonder, is the (English language) spam set that one is likely to receive in Australia statistically different than what one is likely to receive in the US?
-w
s/recently/ever/
I'd be happy if I could tell Gmail to delete anything in a non Roman character set. I don't read Hebrew, Arabic, Kanji, Hangul, Cyrillic, or any of the other various character sets I get spam in.
-----Original Message-----
From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Marshall Eubanks
Sent: Thursday, April 10, 2008 9:39 AM
To: William Waites
Cc: Rich Kulawiec; North American Network Operators Group
Subject: Re: spam wanted :)
On Apr 10, 2008, at 9:35 AM, William Waites wrote:
On Thu, Apr 10, 2008 at 08:55:21AM -0400, Rich Kulawiec wrote:
On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
for a measurement experiment, i would like O(100k) *headers* from spam from europe and a similar sample from the states.
Request for clarification: do you mean "spam originating at IP addresses believed to be in Europe" or "spam received at a mail server located in Europe" or "spam putatively from domains in Europe" or something else?
One thing that happened when I moved to Europe and started doing business in Germany is that relatively soon I began receiving spam in German (which seems to have quite different content, and sales strategy, actually, perhaps reflecting cultural differences in the manner of buying and selling between the anglophone world and Germany).
I receive serious amounts of spam in Hebrew and Russian, and haven't even been to either Israel or Russia recently. Regards Marshall
Trying to separate out what "in" Europe means in this case seems to come down to having given out email addresses to web sites and colleagues in a different language environment rather than physical presence of either myself or my mailserver in either North America or Europe. I guess the German spam I have been receiving is only European in that German speakers happen to be mostly in Europe, which is not true of English speakers.
I wonder, is the (English language) spam set that one is likely to receive in Australia statistically different than what one is likely to receive in the US?
-w
-----Original Message-----
From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Marshall Eubanks
Sent: Thursday, April 10, 2008 9:39 AM
To: William Waites
Cc: Rich Kulawiec; North American Network Operators Group
Subject: Re: spam wanted :)
[ clip ]
I receive serious amounts of spam in Hebrew and Russian, and haven't even been to either Israel or Russia recently.
Regards Marshall
I started getting spam in Icelandic < 24 hours after my account was set up. I get Russian, Chinese, and Hebrew spam all the time. The most spam I receive is from an old domain that I turned off the MX records. Every now and then I turn them back on to see what's flowing and it never changes. Within seconds.
[obOp] I think that the language change defeats many of the heuristics found in common spam appliances.
--
Martin Hannigan
Verne Global
Keflavik, Iceland
http://www.verneglobal.com/
e: hannigan@verneglobal.com
p: +16178216079
Rich Kulawiec wrote:
On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
for a measurement experiment, i would like O(100k) *headers* from spam from europe and a similar sample from the states.
Request for clarification: do you mean "spam originating at IP addresses believed to be in Europe"
yes. and, because i have gotten a lot of well-meaning but non-reading offers, to repeat
this would be a straight sample, before filtering, ip address blocking, etc.
i realize this is difficult, as all of us go through much effort to reject this stuff as early as possible. but it will be a sample unbiased by your filtering techniques. randy
Request for clarification: do you mean "spam originating at IP addresses believed to be in Europe"
yes.
<blush> aiiii! speaking of non-reading <blush> i mean spam arriving at port 25 on a european host. and an unfiltered unblocked port 25, no dnsbl, ... it looks like i have a great stateside volunteer source, though the proof will be known when we have the data. and we're in asia and have data from here. so it's europe i need. randy
Randy Bush <randy@psg.com> writes:
this would be a straight sample, before filtering, ip address blocking, etc.
i realize this is difficult, as all of us go through much effort to reject this stuff as early as possible. but it will be a sample unbiased by your filtering techniques.
How do you classify email as spam without adding bias? Bjørn
Randy Bush <randy@psg.com> writes:
this would be a straight sample, before filtering, ip address blocking, etc.
i realize this is difficult, as all of us go through much effort to reject this stuff as early as possible. but it will be a sample unbiased by your filtering techniques.
How do you classify email as spam without adding bias?
You can always claim bias.
There's often been debate, even in the anti-spam community, about what "spam" actually means. The meaning has repeatedly been diluted over the years, to a point where some now define it merely as "that which we do not want," an attitude supported in code by some service providers who now sport great big "Easy Buttons" (with apologies to any office supply chain) labelled "This Is Spam." Even so, there's some complexity - users making typos, for example.
However, the easiest way to avoid bias is to look for a mail stream that has the quality of not having any valid recipients. There will be, of course, someone who will disagree with me that mail sent to an address that hasn't been valid in years, and whose parent domain was unresolvable in DNS for at least a year is spam. However, it's as unbiased as I can reasonably imagine being.
... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
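The "no valid recipients" criterion above can be sketched as a simple selection rule. This is a minimal illustration under stated assumptions, not Joe's actual setup: the record layout (a dict with an `rcpt_to` list of envelope recipients) and the function name `trap_only` are hypothetical, and the list of valid addresses is assumed to be known to the operator.

```python
def trap_only(deliveries, valid_recipients):
    """Keep only deliveries whose envelope recipients are all invalid.

    deliveries: iterable of dicts with an "rcpt_to" list (hypothetical layout).
    valid_recipients: addresses that currently accept mail.
    """
    valid = {addr.lower() for addr in valid_recipients}
    kept = []
    for delivery in deliveries:
        rcpts = [r.lower() for r in delivery["rcpt_to"]]
        # A delivery with no recipients at all is skipped rather than guessed at.
        if rcpts and not any(r in valid for r in rcpts):
            kept.append(delivery)
    return kept


if __name__ == "__main__":
    log = [
        {"id": 1, "rcpt_to": ["gone@old.example"]},
        {"id": 2, "rcpt_to": ["alice@live.example"]},
        {"id": 3, "rcpt_to": ["gone@old.example", "Alice@live.example"]},
    ]
    print([d["id"] for d in trap_only(log, ["alice@live.example"])])
```

The rule never looks at message content or sending address, so no content or IP heuristic decides what counts as spam. Typos by legitimate senders, as noted above, would still slip in.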
this would be a straight sample, before filtering, ip address blocking, etc.
i realize this is difficult, as all of us go through much effort to reject this stuff as early as possible. but it will be a sample unbiased by your filtering techniques.
How do you classify email as spam without adding bias?
reasonable question. i suspect you pull out the 0.5% of the inbound you actually wanted and consider the bias small. as the dnsbls alone block way over 90% of the inbound here, i would not classify that as small. randy
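To put rough numbers on that comparison, the arithmetic can be worked through for an illustrative inbound volume. The 0.5% and 90% figures come from the message above ("way over 90%" is treated as a lower bound on what the DNSBLs block). The 100,000-message volume and the function name `sample_sizes` are assumptions made only for this example.

```python
def sample_sizes(inbound, wanted_per_thousand=5, dnsbl_blocked_pct=90):
    """Return (wanted, unfiltered_spam, seen_after_dnsbl) message counts.

    wanted_per_thousand=5 is the 0.5% of inbound that was actually wanted;
    dnsbl_blocked_pct=90 is a lower bound on what DNSBLs alone reject.
    """
    wanted = inbound * wanted_per_thousand // 1000
    unfiltered_spam = inbound - wanted
    seen_after_dnsbl = inbound - inbound * dnsbl_blocked_pct // 100
    return wanted, unfiltered_spam, seen_after_dnsbl


if __name__ == "__main__":
    print(sample_sizes(100_000))  # (500, 99500, 10000)
```

On these assumed numbers, pulling out the wanted mail removes 500 messages from the sample, while sampling behind the DNSBLs would hide at least 90,000 of them.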
participants (8)
- Bjørn Mork
- Jamie Bowden
- Joe Greco
- Marshall Eubanks
- Martin Hannigan
- Randy Bush
- Rich Kulawiec
- William Waites