On Mon, 5 Apr 2004, Michel Py wrote:
Paul,
I hope you forgive my bluntness, but this is the worst argument you have ever made in the hundreds of postings I had the privilege to exchange with you on other mailing lists over the years.
Bluntness forgiven ;). How about you put the "obviously doomed to extinction" part down to subtle humour (though that's a cover-up on my part; really it was due to severe jetlag ;) ). My point still stands, though: if we're going to discuss Bayesian filtering, there is no point deliberately constraining ourselves to poor implementations of it when considering its weaknesses or attacks on it.
Especially on _this_ mailing list, if you were right, Microsoft would be extinct.
;) We can't stop people using technically poor implementations. In the case of spam filtering, though, this might well be done ISP-side rather than client-side, and hence the filtering solution might be chosen by more technically astute minds rather than by Joe Six-Pack.

Anyway, read the rest of my post. Text-stuffing is not per se a problem for Bayesian filtering. So long as an email still contains phrases which are sufficiently good indicators of either spam or non-spam, it will be classed as such. Obviously, in the face of such attacks, the Bayesian filter will no longer count general phrases as signs of non-spam. However, spam will _always_ differentiate itself somehow: it must, in order to deliver its "spam" payload, be it a URL, an image, whatever.

Also, one thing I like to do is add X-RBL-Warning: headers and have the Bayesian filter consider them as part of its analysis. In time, this causes the different DNSBLs I use to be perfectly weighted (by means of the header) according to the statistical probability of each DNSBL being "correct" in indicating a mail as spam.
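To make the two points above concrete, here is a minimal sketch, not a description of any particular filter. It assumes a Graham-style token filter with add-one smoothing; the `TokenStats` class, the `rbl:` token prefix, the DNSBL names `good.example` and `sloppy.example`, and the payload URL are all hypothetical and chosen only for illustration. Each X-RBL-Warning header value is fed to the filter as one more token, so its learned probability ends up tracking how often that list is right, and stuffed words that appear in both spam and ham drift towards a neutral 0.5.

```python
from collections import Counter


def tokens(headers, body):
    """Body words plus one synthetic token per X-RBL-Warning header value."""
    toks = set(body.lower().split())
    for name, value in headers:
        if name.lower() == "x-rbl-warning":
            toks.add("rbl:" + value.strip().lower())
    return toks


class TokenStats:
    """Per-token counts over trained spam and ham messages (illustrative only)."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.n_spam = self.n_ham = 0

    def train(self, toks, is_spam):
        if is_spam:
            self.spam.update(toks)
            self.n_spam += 1
        else:
            self.ham.update(toks)
            self.n_ham += 1

    def p_spam(self, tok):
        """Spam probability of one token, with add-one smoothing."""
        s = (self.spam[tok] + 1) / (self.n_spam + 2)
        h = (self.ham[tok] + 1) / (self.n_ham + 2)
        return s / (s + h)

    def score(self, toks, keep=15):
        """Combine the `keep` tokens whose probability is furthest from 0.5."""
        probs = sorted((self.p_spam(t) for t in toks),
                       key=lambda p: abs(p - 0.5), reverse=True)[:keep]
        prod_s = prod_h = 1.0
        for p in probs:
            prod_s *= p
            prod_h *= 1 - p
        return prod_s / (prod_s + prod_h)


# Made-up training data: 10 ham and 10 spam messages.
stats = TokenStats()
for i in range(10):
    # Ham: ordinary words; the sloppy list wrongly flags 4 of 10.
    ham_headers = [("X-RBL-Warning", "sloppy.example")] if i < 4 else []
    stats.train(tokens(ham_headers, "meeting agenda attached"), is_spam=False)

    # Spam: stuffed with the same ordinary words, plus the payload URL.
    spam_headers = []
    if i < 9:  # the good list catches 9 of 10 spam
        spam_headers.append(("X-RBL-Warning", "good.example"))
    if i < 6:  # the sloppy list catches 6 of 10 spam
        spam_headers.append(("X-RBL-Warning", "sloppy.example"))
    stats.train(tokens(spam_headers, "meeting agenda http://pills.example/buy"),
                is_spam=True)

print(stats.p_spam("meeting"))             # stuffed word, seen equally in both
print(stats.p_spam("rbl:good.example"))    # list that is rarely wrong
print(stats.p_spam("rbl:sloppy.example"))  # list that is often wrong
print(stats.score(tokens([], "meeting agenda http://pills.example/buy")))
```

With this toy data the stuffed words carry no weight either way, the payload URL alone is enough to score a stuffed message as spam, and the sloppy list's header counts for much less than the good list's. Real filters tokenise and combine probabilities in more elaborate ways, so treat the numbers as illustrative only.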
Michel.
regards,
-- 
Paul Jakma  paul@clubi.ie  paul@jakma.org  Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
My family history begins with me, but yours ends with you.
        -- Iphicrates