On April 13, 2008 at 14:24 jgreco@ns.sol.net (Joe Greco) wrote:
I would have thought it was obvious, but to see this sort of enlightened ignorance(*) suggests that it isn't: The current methods of spam filtering require a certain level of opaqueness.
Indeed, that must be the problem.
But then you proceed to suggest:
So, on one hand, we have the "filtering by heuristics," which requires a level of opaqueness, because if you respond "567 BODY contained www.sex.com, mail blocked" to their mail, you have given the spammer feedback to get around the spam.
Giving the spammer feedback?
In the first place, I think s/he/it knows what domain they're using if they're following bounces at all. Perhaps they have to guess among whether it was the sender, body string, sending MTA, but really that's about it and given one of those four often being randomly generated (sender) and another (sender MTA) deducible by seeing if multiple sources were blocked on the same email...my arithmetic says you're down to about two plus or minus.
In many (even most) cases, that is only useful if you're sending a lot of mail towards a single source, a variable which introduces yet *another* ambiguity, since volume is certainly a factor in blocking decisions. Further, if you look at the average mail message, you have domains based on multiple factors, such as services to do open tracking (1x1/invisible pixels, etc), branding, and many other reasons that there could be more than a single domain in a single message. Further, once you're being blocked, it may be implemented by-IP even though there was some other metric that triggered the block. Having records that allow a sender to go back and unilaterally determine what was amiss may not be considered desirable by the receiving site.
But even that is naive since spammers of the sort anyone should bother worrying about use massive bot armies numbering O(million) and generally, and of necessity, use fire and forget sending techniques.
Do you mean to suggest that your definition of "spammer" only includes senders using massive bot armies? That'd be mostly pill spammers, phishers, and other really shady operators. There are whole other classes of spam and spammer.
Perhaps you have no conception of the amount of spam the major offenders send out. It's on the order of 100B/day, at least.
I have some idea. However, I will concede that my conception of current spam volumes is based mostly on what I'm able to quantify, which is the ~4-8GB/day of spam we receive here.
That's why you and your aunt bessie and all the people on this list get the same exact spam. Because they're being sent out in the hundreds of billions. Per day.
Actually, we see significant variation in spam received per address.
Now, on what exactly do you base your interesting theory that spammers analyze return codes to improve their techniques for sending through your own specific (not general) mail blocks? Sure, they do some Bayesian scrambling and so forth, but that's general and will work on zillions of sites running SpamAssassin or similar, so that's worthwhile to them.
I'm sure that if you were to talk to the Postmasters at any major ISP/mail provider, especially ones like AOL, Hotmail, Yahoo, and Earthlink, you would discover that they're familiar with businesses which claim to be in the business of "enhancing deliverability." However, what I'm saying was pretty much the inverse of the theory that you attribute to me: I'm saying that receivers often do NOT provide feedback detailing the specifics of why a block happened. As a matter of fact, I think I can say that the most common feedback provided in the mail world would be notice of listing on a DNS blocking list, and this is primarily because the default code and examples for implementation usually provide some feedback about the source (or, at least, source DNSBL) of the block. You'll see generic guidance such as the Yahoo! error message that started this thread ("temporarily deferred due to user complaints", IIRC), but that's not particularly helpful, now, is it? It doesn't tell you which user, or how many complaints, etc.
But on what, exactly, do you base your interesting theory that, if a site returned "567 BODY contained www.sex.com", spammers in general, and in numbers worthy of concern, would use this information to tune their efforts?
Because there are businesses out there that claim to do that very sort of thing, except that they do it by actually sending mail and then checking canary e-mail boxes on the receiving site to measure effectiveness of their delivery strategy. Failures result in further tuning. Being able to simply analyze error messages would result in a huge boost for their effectiveness, since they would essentially be able to monitor the deliverability of entire mail runs, rather than assuming that the deliverability percentage of their canaries, plus any open tracking, indicated the actual delivery success rate. I would have expected this to be stunningly obvious to anyone discussing deliverability.
This is not an existence proof; one example is not sufficient. It has to be evidence worthy of concern given O(100 billion) spams per day overwhelmingly sent by botnets, which are the actual core of the actual problem.
No, it doesn't. Don't be silly. There are spammers who are flooding the system, and hope to get mail through using sheer bulk. These guys aren't caring to stick around and listen to the result code. They've got their infected PC armies with however many hundreds of threads of spam-blasting goodness they can squeeze out of each, and they're pounding the hell out of recipients. They have a vested interest in not being easy to track back, so that's why we get so much fun broken spam with broken payloads. OBVIOUSLY they're not going to be listening for result codes. But that doesn't mean that every spammer works that way. There are entire e-mail service providers based on the principles of sending vast amounts of non-opt-in email. Spamhaus has a lot of information on the biggest of these. They exist.
I say you're guessing, and not very convincingly either.
I'm not guessing. Go visit Spamhaus.
So you have two opaque components to filtering. And senders are deliberately left guessing - is the problem REALLY that a mailbox is full, or am I getting greylisted in some odd manner?
Except that most sites return some indication that a mailbox is full. It's just unfortunately in the realm of heuristics.
There are sites that return "mailbox full" for a variety of cases.
But look into popular mailing list software packages (mailman, majordomo) and you'll see modules for classifying bounce backs heuristically and automatic list removal (or not if it seems like a temporary failure, e.g., mailbox full.)
Right. Except that it's quite a bit more complex than that. A typical E-Mail Service Provider ("ESP") has an extensive system for dealing with known brokenness at various mailbox providers, and very few ESP's are willing to drop a subscriber from a list for a single bounce. Now, of course, ESP's range from the whitehat (for those who missed it, Rodney Joffe founded "whitehat.com" a long time ago) to the greys, and all the way on down to the blackhats. There are certainly a lot of ESP's that attempt to implement various levels of "opt in" and "permission based" e-mailing, but there are also those that pretty much spam unapologetically. Bounce processing is complicated for them all. Even the blackhats have significant cause to carefully analyze return codes and try to divine some greater meaning, because if they get blocked, their delivery rates go down.
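To make the bounce-processing point concrete, here is a minimal sketch of heuristic classification with a "never drop on one bounce" rule. It is an illustration under stated assumptions, not how Mailman, Majordomo, or any ESP actually does it: the pattern lists, the `HARD_BOUNCE_LIMIT` threshold, and the `BounceTracker` name are all hypothetical. Text patterns are checked before the numeric code because, as noted above, some sites return a permanent-looking code for what is really a temporary condition.

```python
import re
from collections import defaultdict

# Hypothetical text patterns; real bounce processors carry far larger,
# provider-specific tables.
SOFT_PATTERNS = [r"mailbox (is )?full", r"over quota", r"try again later", r"deferred"]
HARD_PATTERNS = [r"user unknown", r"no such user", r"mailbox unavailable"]

HARD_BOUNCE_LIMIT = 3  # assumed threshold: never drop on a single bounce


def classify(reply: str) -> str:
    """Return 'soft', 'hard', or 'unknown' for one SMTP reply line."""
    text = reply.lower()
    # Wording wins over the numeric code, since codes are used inconsistently.
    if any(re.search(p, text) for p in SOFT_PATTERNS):
        return "soft"
    if any(re.search(p, text) for p in HARD_PATTERNS):
        return "hard"
    # Fall back to the reply class: 4xx is transient, 5xx is permanent.
    if reply[:1] == "4":
        return "soft"
    if reply[:1] == "5":
        return "hard"
    return "unknown"


class BounceTracker:
    """Counts consecutive hard bounces per address."""

    def __init__(self, limit: int = HARD_BOUNCE_LIMIT):
        self.limit = limit
        self.hard_streak = defaultdict(int)

    def record(self, address: str, reply: str) -> bool:
        """Record one delivery result; return True if the address should be dropped."""
        kind = classify(reply)
        if kind == "hard":
            self.hard_streak[address] += 1
        elif reply[:1] == "2":
            # A successful delivery resets the streak.
            self.hard_streak[address] = 0
        # Soft and unknown results leave the streak unchanged.
        return self.hard_streak[address] >= self.limit
```

Even this toy version has to guess: a "mailbox full" reply is treated as temporary regardless of its code, which is exactly the kind of heuristic that goes wrong when a site uses that text for something else.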
Filtering stinks. It is resource-intensive, time-consuming, error-prone, and pretty much an example of something that is desperately flagging "the current e-mail system is failing."
And standardized return codes (for example) will make this worse, how?
Standardized return codes (assuming any meaningful amount of detail was included) would make it easier for spammers to determine how their mail was being filtered, and to evade accordingly. That's a tragedy, because for legitimate senders, it means that they /also/ do not get automatic feedback on what they could do differently. I *suspect* that avoiding providing too much feedback may be why a certain percentage of e-mail simply vanishes at certain mailbox providers (cough, Hotmail, cough).
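As a rough sketch of the tradeoff, consider a receiver that keeps the filter's reasoning in its own logs and hands the sender only a generic reply. Everything here is an assumption for illustration: the verdict names, the `smtp_reply` function, and the `verbose` switch are hypothetical, and the reply strings are just plausible uses of the generic enhanced status codes, not any provider's actual wording.

```python
import logging

log = logging.getLogger("filter")

# Hypothetical internal verdicts mapped to deliberately generic replies.
PUBLIC_REPLY = {
    "accept": "250 2.0.0 OK",
    "defer": "451 4.7.1 Message temporarily deferred",
    "reject": "550 5.7.1 Message refused",
}


def smtp_reply(verdict: str, detail: str, verbose: bool = False) -> str:
    """Build the reply sent to the client; the detail is logged locally either way."""
    log.info("verdict=%s detail=%s", verdict, detail)
    reply = PUBLIC_REPLY[verdict]
    # verbose=True models the "detailed return code" proposal: the sender
    # (legitimate or not) learns exactly which rule fired.
    if verbose and verdict != "accept":
        reply += " (" + detail + ")"
    return reply
```

With `verbose=False`, `smtp_reply("reject", "body URL on local blocklist")` gives the sender nothing beyond "Message refused"; with `verbose=True` the same call appends the rule that fired. The legitimate sender and the spammer see identical text in both modes, which is the whole problem.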
You want to define standards? Let's define some standard for establishing permission to mail. If we could solve the permission problem, then the filtering wouldn't be such a problem, because there wouldn't need to be as much (or maybe even any). As a user, I want a way to unambiguously allow a specific sender to send me things, "spam" filtering be damned. I also want a way to retract that permission, and have the mail flow from that sender (or any of their "affiliates") to stop.
Sure, but this is pie in the sky.
Sure. :-)
For starters you'd have to get the spammers to conform, which would almost certainly take a design which was very difficult not to conform to; it would have to be technologically involuntary. Whitelists are the closest I can think of but they haven't been very popular and for good reasons.
Sure. The spammers stand to lose. Given a system where end users can revoke permission, they know that end users will. The current system, even at 99% rejection rates, is preferable because they can get through to a small percentage. Unfortunately, legitimate senders suffer under the current model.
Anyhow, the entire planet awaits your design.
I didn't say I had a design. Certainly there are solutions to the problem, but any solution I'm aware of involves paradigm changes of some sort, changes that apparently few are willing to make.
A set of standardized return codes was carefully chosen by me as something which could be (other than the standards process itself) adopted practically overnight and with virtually zero backwards compatibility problems (oh there'll always be an exception.)
Sure. Anyone could do this. It's trivial. Perhaps there's a reason that virtually no one implements something like this. (Hm!)
Right now I've got a solution that allows me to do that, but it requires a significant paradigm change, away from single-e-mail-address.
There's nothing new in disposable, single-use addresses (or credit card numbers for that matter, a different realm) if that's what you mean but if you have something more clever the world (i.e., the big round you see when you look down) is your oyster.
I'm currently working towards a model where I deploy an address per site, which isn't a single-use model by any means. As a matter of fact, it's a model that allows that address to be "shared" (even abusively) by the senders, but at the point I decide to revoke permission, permission goes away for _everyone_ sending to that address. So it _is_ disposable, in the conventional sense. It brings the permission control aspect back squarely under my control, not under some random ESP's decision about whether or not to send to me.
Consider the benefits for deliverability if a major ISP implemented something like this. Provide a facility for users to be able to get disposable addresses (preferably ones where the "disposable" portion could be handled prior to hitting the mail server, i.e. in DNS), and then guarantee to both users and senders that no mail sent to these addresses would be subject to spam filtering, rate limits, or other arbitrary things, on the basis that the subscriber clearly asked for the material. Revocation of permission would be available to the user, through the simple process of eliminating the DNS record for that particular disposable address.
Quite frankly, this is almost the scenario that started me on this in the first place, because I was having such a devil of a time with getting our anti-spam measures to not trip on invoices and other "legitimate" stuff that arrives here, much of which is nearly indistinguishable, at the machine level, from spam.
Despite being a viable solution to a large portion of the e-mail deliverability puzzle, my best guess is that no ISP actually wants to incur the cost and support hit of trying to get their users to use such a system. The current system, where users simply sigh and accept that they may not get their e-mail, is apparently preferable. It's certainly easier. Lower the expectations rather than try to fix the problem.
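A minimal sketch of the address-per-site idea, with a plain dictionary standing in for the DNS zone. This is an assumption-laden illustration, not the actual setup described above: the `PermissionZone` class, the `site.user.base_domain` hostname layout, and the example names are all hypothetical. The point is only that each issued address has its own record, and deleting that record revokes permission for every sender using the address at once.

```python
class PermissionZone:
    """In-memory stand-in for a DNS zone holding one record per issued address."""

    def __init__(self, base_domain: str):
        self.base_domain = base_domain.lower()
        self.records = {}  # hostname -> mailbox owner

    def issue(self, user: str, site: str) -> str:
        """Create a per-site address; the hostname layout is an assumed convention."""
        host = f"{site}.{user}.{self.base_domain}".lower()
        self.records[host] = user
        return f"{user}@{host}"

    def revoke(self, address: str) -> None:
        """Remove the record; mail to this address from any sender now fails."""
        host = address.split("@", 1)[1].lower()
        self.records.pop(host, None)

    def deliverable(self, address: str) -> bool:
        """True while the record exists, i.e. while permission stands."""
        host = address.split("@", 1)[1].lower()
        return host in self.records
```

For example, `PermissionZone("mail.example.net").issue("alice", "acme")` yields `alice@acme.alice.mail.example.net`; after `revoke()` on that address, `deliverable()` returns `False` for it no matter who is sending. In a real deployment the lookup would fail in DNS before the sending MTA ever connects, which is what keeps the rejected traffic off the mail server.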
That's fine, but then I'd really like them to be honest about it, and just admit that they're not so concerned about actually delivering desired mail as they are about keeping their costs as low as possible (etc.)
Addressing "standards" of the sort you suggest is relatively meaningless in the bigger picture, I think. Nice, but not that important.
Well, first you'd have to indicate that you actually have a view of the problem which supports such a judgment.
At any rate you're quibbling the example as I forewarned.
But standardizing receiving MTA fail codes is, I suspect, more useful than you give it credit for. It would be some progress at little to no cost in the large.
By all means, then. Go ahead. You'll amaze me if you can actually get this implemented at any major ISP or mailbox provider. It would be nice for my cold and cynical viewpoints to be disproven, rather than to be proven as too optimistic.
It deals less with spam filtering and more with effective MTA to MTA operation.
That's not how the large ISP/mailbox providers will see it.
At least it's sticking to the realm of improving standards in a way that can be accomplished.
I don't see how I could have given a better example without a lot of hand-waving and vagaries.
Look, I certainly agree that it'd be *nice*, but there are lots of things that are *nice* that aren't going to happen. Shall we beat the BCP38 horse any further? There's a long history of things that would be nice that never come to pass. I've already written off reliable deliverability at large ISP's as one of those things. I'm now looking towards solutions to enable reliable deliverability at smaller sites where principles might still matter enough that people haven't completely written off e-mail as unusable.
... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam (CNN)
With 24 million small businesses in the US alone, that's way too many apples.