Randy Bush <randy@psg.com> writes:
this would be a straight sample, before filtering, ip address blocking, etc.
i realize this is difficult, as all of us go through much effort to reject this stuff as early as possible. but it will be a sample unbiased by your filtering techniques.
How do you classify email as spam without adding bias?
You can always claim bias.

There's often been debate, even in the anti-spam community, about what "spam" actually means. The meaning has repeatedly been diluted over the years, to a point where some now define it merely as "that which we do not want," an attitude supported in code by some service providers who now sport great big "Easy Buttons" (with apologies to any office supply chain) labelled "This Is Spam."

Even so, there's some complexity - users making typos, for example.

However, the easiest way to avoid bias is to look for a mail stream that has the quality of not having any valid recipients. There will be, of course, someone who will disagree with me that mail sent to an address that hasn't been valid in years, and whose parent domain was unresolvable in DNS for at least a year is spam. However, it's as unbiased as I can reasonably imagine being.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam (CNN)
With 24 million small businesses in the US alone, that's way too many apples.