On Wed, 4 Dec 2019 at 16:38, John R. Levine <johnl@iecc.com> wrote:
Though I agree that Gmail spam filtering is top grade, or close to it, it still sends to spam a statistically significant number of emails from IETF and ICANN mailing lists I'm subscribed to. It also depends on which account receives those emails.
Yes, that's mostly the DMARC problem. We're painfully familiar with it.
In this case, the mail origin is DMARC signed, and Gmail accepts all other messages. It simply *appears* that they've decided the URLs in mailman's admin/moderator messages are suspicious enough to warrant outright rejection of the message, and not just labelling it as spam or suspicious in the recipient's mailbox.

Someone up-thread noted that my personal domain is hosted on google groups. I've noticed in the past that the behaviour of gmail.com can be very different from the behaviour of a paid mail domain like mine... I've seen the same sorts of messages accepted by one and refused by the other on more than one occasion, and it's not always the same one that is stricter.
While I understand and totally accept that there might be issues with the respective senders' configuration, with mailing lists at least, spam filtering is more the duty of the mailing list admins. ...
One day I asked a guy at Google why they don't just whitelist incoming mailing list mail, since they clearly have a good idea where the list hosts are. He said that legit lists send spam (actual ugly spam, not filter errors) all the time, either because a subscriber's account is compromised or the list itself is compromised. Accurate filtering is remarkably complicated.
Agreed that spam filtering today is a hard problem, and given Google's scale their problem with it is bigger than most others'. My assertion is that given how ubiquitous mailman's administrative messages are (as opposed to random list traffic), and given that those messages haven't changed in structure in aeons, it should be trivial for a company with Google's resources to avoid false positives on those messages. Their heuristics and learning algorithms should be primed with a ton of samples of such messages to inform their decision making, if not to whitelist them outright.