Deepak Jain wrote: Can someone explain to me (publicly or privately) why someone would send spam with no product to sell, no position to pitch, nothing except text designed to get by a spam filter -- without even HTML to KNOW it got by a spam filter..
Likely two different goals here:

1. Reduce the efficiency of Bayesian-like filters: The trouble with this kind of email is that the messages a) are of sufficient length, b) contain only "real" words, and c) contain none of the words regularly used by spammers, such as the v. word. It's a lose-lose situation for the spam engine:

- If this message is marked as spam, it increases the likelihood of false positives, as the message shares several common points with real email in spam-measuring metrics such as length, percentage of real words, etc.
- If this message is marked as legit, it reduces the catching abilities of the spam engines, as it shares similar patterns with a spam that would be essentially the same text altered to carry spam content.

You can bet that it won't be long until we see such messages that not only use only dictionary words, but furthermore are constructed with a valid grammar (and still mean nothing). One of the next fronts in spam detection is based on grammatical correctness. What we are looking at is the eternal battle between the shield and the weapon: as soon as someone invents a new shield, someone else develops a new weapon that will pierce it.

2. It might be a statistical probe spam: Spammer xyz has a list of 1 million addresses, out of which 500,000 are invalid and bounce. By using a return address that is actually not bogus, the spammer can indirectly measure the efficiency of Bayesian outsmarting strategies.

First the spammer sends a spam that will be blocked by the majority of spam-detection engines (even the dumber ones) by including correctly spelled, well-known spam words in both subject and text. You know, the stuff that promises to put a foot in your pants that features "always-on" service. Let's say that our spammer gets 150,000 bounces out of this one. The math is simple: out of 500,000 potential bounces they got only 150,000, which means that 350,000 have been blocked by spam engines prior to the non-existing-user bounce.
Then, our spammer sends the kind of email you referred to to the same list and measures the bounce rate one more time. If this time he gets 450,000 bounces, it means that only 50,000 out of the 500,000 potential bounces have been blocked by spam engines, which in turn means that the same email, slightly altered, will reach a large part of its targets.

This is a simplified view, as bounce rate alone is not a valid measurement of outsmarting strategies, but correlating two or three metrics of that kind gives a reasonably precise picture of which spamming techniques still work, and which have become a waste of bandwidth.

Michel.

When someone dies, it's a tragedy. When millions die, it's a statistic. -- Josef Stalin --
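The bounce arithmetic in the message above can be written out as a small Python sketch. The function names are hypothetical and chosen only for illustration, and the figures are the ones quoted in the message. It keeps the same simplification the message admits to: every missing bounce is assumed to mean a filter dropped the mail first.

```python
def blocked_before_bounce(potential_bounces: int, observed_bounces: int) -> int:
    """Mail to invalid addresses that never bounced, presumed filtered first."""
    if not 0 <= observed_bounces <= potential_bounces:
        raise ValueError("observed bounces must be between 0 and potential bounces")
    return potential_bounces - observed_bounces


def filter_block_rate(potential_bounces: int, observed_bounces: int) -> float:
    """Estimated share of the mailing stopped by spam engines."""
    return blocked_before_bounce(potential_bounces, observed_bounces) / potential_bounces


# Figures from the message: 500,000 invalid addresses on the list.
POTENTIAL = 500_000
obvious_spam = filter_block_rate(POTENTIAL, 150_000)  # first, easily caught mailing
probe_text = filter_block_rate(POTENTIAL, 450_000)    # second, filter-evading mailing

print(f"obvious spam blocked: {obvious_spam:.0%}")
print(f"probe text blocked:   {probe_text:.0%}")
```

Comparing the two rates is the whole measurement: the larger the gap, the better the probe text evades filters.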
On Wed, 31 Mar 2004, Michel Py wrote:
Deepak Jain wrote: Can someone explain to me (publicly or privately) why someone would send spam with no product to sell, no position to pitch, nothing except text designed to get by a spam filter -- without even HTML to KNOW it got by a spam filter..
I'm surprised you only got it now. I have been receiving emails like that for probably at least a year.
Likely two different goals here:
1. Reduce the efficiency of Bayesian-like filters: Trouble with this kind of email is that they are a) of sufficient length b) contain only "real" words c) contain none of the words regularly used by spammers such as the v. word.

Have to agree, this is foremost the reason.
You can bet that it won't be long until we see such messages that not only use only dictionary words, but furthermore are constructed with a valid grammar (and still mean nothing).

I already saw it. Right now it's just random phrases being put together and not yet entire text. And somewhere (actually several years ago), I've read of an AI program capable of creating complete stories when it's given some key phrases to start with; I would not be surprised if the same or similar algorithms began to be used.

It's interesting, however, that spammers are doing it not for their own companies' specific interest but for the interest of their spamming industry in general.

Personally I do not believe that Bayesian filtering (or text filtering in general) is the way to fight spam; there is too much chance of false positives along the way (and it is only increasing, as is evident from what is discussed in this thread). It's better to focus on authentication of the source and on trust mechanisms for legitimate mail senders. Spammers have a problem in that they are often operating against the law or the policies of their providers and have to try to hide their identity; the mechanisms they use for that can be identified and the loopholes closed as much as possible.

-- William Leibzon Elan Networks william@elan.net
On Wed, 31 Mar 2004, Michel Py wrote:
1. Reduce the efficiency of Bayesian-like filters: Trouble with this kind of email is that they are a) of sufficient length b) contain only "real" words c) contain none of the words regularly used by spammers such as the v. word.
Good Bayesian filters do not score on single words alone; they also score on "phrases" (i.e. multiple words). Random strings of words will result in neutral scores (presuming those words are also used in non-spam), while the phrases will score slightly higher. Re-used gibberish (i.e. apparently random) strings of words will result in "phrases" from that gibberish having high scores.

Also, a good Bayesian filter should regularly prune from its database phrases (including one-word phrases) that have not had their score updated recently, further reducing "pollution" by random words and phrases.

Noise is just noise. The spam-specific stuff will still be statistically significant, hopefully.

regards,
-- Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune: It's currently a problem of access to gigabits through punybaud. -- J. C. R. Licklider
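The two ideas in the message above, scoring phrases as well as single words and pruning entries that have not been updated recently, can be illustrated with a toy Python sketch. This is not any particular filter's implementation. The class name, the use of word pairs as "phrases", the add-one smoothing, and the message-count "clock" used for ageing are all assumptions made for the example.

```python
import math
from collections import defaultdict


def features(text: str) -> set:
    """Single words plus adjacent word pairs (the 'phrases')."""
    words = text.lower().split()
    pairs = [" ".join(p) for p in zip(words, words[1:])]
    return set(words + pairs)


class PhraseFilter:
    def __init__(self):
        self.spam = defaultdict(int)   # feature -> number of spam messages containing it
        self.ham = defaultdict(int)    # feature -> number of legit messages containing it
        self.n_spam = 0
        self.n_ham = 0
        self.last_seen = {}            # feature -> clock value of last update
        self.clock = 0                 # assumption: age is counted in trained messages

    def train(self, text: str, is_spam: bool) -> None:
        self.clock += 1
        counts = self.spam if is_spam else self.ham
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1
        for f in features(text):
            counts[f] += 1
            self.last_seen[f] = self.clock

    def feature_score(self, f: str) -> float:
        """Smoothed spam probability for one feature; 0.5 is neutral."""
        p_spam = (self.spam.get(f, 0) + 1) / (self.n_spam + 2)
        p_ham = (self.ham.get(f, 0) + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)

    def score(self, text: str) -> float:
        """Combine known features in log-odds space; unknown ones are ignored."""
        log_odds = 0.0
        for f in features(text):
            if f in self.last_seen:
                p = self.feature_score(f)
                log_odds += math.log(p / (1 - p))
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))

    def prune(self, max_age: int) -> int:
        """Drop features not updated within max_age trained messages."""
        stale = [f for f, t in self.last_seen.items() if self.clock - t > max_age]
        for f in stale:
            self.spam.pop(f, None)
            self.ham.pop(f, None)
            del self.last_seen[f]
        return len(stale)


flt = PhraseFilter()
flt.train("buy cheap pills now", is_spam=True)
flt.train("cheap lunch with the routing working group", is_spam=False)
print(flt.feature_score("cheap"), flt.feature_score("cheap pills"))
```

In this toy data the word "cheap" appears in both classes and comes out neutral, while the pair "cheap pills" has only been seen in spam and scores higher, which is the behaviour the message describes.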
participants (3)
- Michel Py
- Paul Jakma
- william(at)elan.net