On Thu, Feb 15, 2007 at 09:03:17PM -0800, Peter Moody wrote:
Dr. Cerf wasn't speaking for Google when he said this, so I'm not sure why you're looking that direction for answers. But since you ask, his data came from informal conversations with A/V companies and folks actually in the trenches of dealing with botnet ddos mitigation. The numbers weren't taken from any sort of scientific study, and they were in fact mis-quoted (he said more like 10%-20%).
Then I think they're too small -- actually, I thought 140M was also too small, but plausible.

A couple of years ago, I had a series of conversations with some people who have insight into very large system populations. The question at hand was "how many zombie'd boxes are out there?" and was intended to yield some concept of how distributed the spam problem had become. We kept in mind the following:

(a) zombies which do nothing observable will escape external detection

(b) zombies which do things, but direct those things against hosts that aren't paying attention, will also escape external detection and

(c) zombies which do things, and direct those things against hosts that are paying attention, but which are sufficiently clever about how they do it, will also escape external detection.

Everyone used their own methods and reasoning. We concurred that there were probably on the order of ~100M zombies *just based on the spam we were seeing*, i.e. ignoring everything else. (As in "order of magnitude". I thought the number was perhaps 50% low; others thought it was perhaps 50% high. So call it a ballpark estimate, no better.)

That was during the spring of 2005. I can't think of anything that's happened since then to give me the slightest reason to think the number's gone down. I can think of a lot of reasons to think the number's gone up.

I suggest everyone run their own experiment. Deploy something that does passive OS fingerprinting (e.g. OpenBSD's pf) and just look at SMTP: then correlate (a) whether the host tried to deliver spam or not (b) detected OS type and (c) rDNS (if any exists). If you want to fold in data from ssh brute-force attempts and the like, sure, go ahead. Let it run for a month and collate results.

Alternatively, look at SYN packet rates and destination diversity for outbound port 25 connections from those portions of your own networks ostensibly populated with end users. Compare to what "normal" should look like.
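For the outbound port 25 check, a minimal Python sketch might look like the following. It assumes you have already extracted (source, destination) pairs for outbound SYNs to port 25 from whatever capture or flow tool you use. The function names and the two thresholds are hypothetical, chosen only for illustration; what counts as "normal" has to be calibrated against your own network.

```python
from collections import defaultdict


def summarize_outbound_smtp(syn_events):
    """Summarize outbound SMTP connection attempts per source host.

    syn_events: iterable of (src_ip, dst_ip) pairs, one pair per outbound
    SYN to port 25 seen during the observation window (assumed input format).
    Returns {src_ip: (syn_count, distinct_destination_count)}.
    """
    syn_count = defaultdict(int)
    destinations = defaultdict(set)
    for src, dst in syn_events:
        syn_count[src] += 1
        destinations[src].add(dst)
    return {src: (syn_count[src], len(destinations[src])) for src in syn_count}


def flag_suspects(summary, max_syns=100, max_destinations=20):
    """Return sources exceeding either threshold, sorted.

    Both thresholds are placeholder assumptions, not recommended values.
    """
    return sorted(
        src
        for src, (syns, dsts) in summary.items()
        if syns > max_syns or dsts > max_destinations
    )


if __name__ == "__main__":
    # Made-up events using documentation address ranges.
    events = [("192.0.2.10", "198.51.100.1")] * 5
    events += [("192.0.2.20", f"203.0.113.{i}") for i in range(1, 51)]
    summary = summarize_outbound_smtp(events)
    for src in flag_suspects(summary):
        print(src, summary[src])
```

An end-user host that talks to one or two mail servers stays under both thresholds; one that opens connections to dozens of distinct destinations in the same window stands out on destination diversity even if its raw SYN count is modest.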
I've concluded three things (by doing experiments like that).

(a) Where there are Windows boxes, there are zombies. "Securing Microsoft operating systems adequately for use on the Internet" is not a solved problem in computing.

(b) As of the moment, "the spam problem" nearly equates to "the Microsoft insecurity problem". (Yes, there are non-Windows spam-sending hosts, but most of those seem to be dedicated spammer servers, quickly identified and blacklisted, thus not a serious threat to anyone who's using a sane combination of DNSBLs.)

(c) Amusingly, it's possible to detect new end-user allocations and service rollouts by noting when spam starts to arrive from them. (e.g. the Verizon FIOS deployment, if I may use hostnames of the form *.fios.verizon.net as a guide, is going well in NYC, Dallas, DC, Tampa, Philly, LA, Boston and Newark, but lags behind in Seattle, Pittsburgh, Buffalo and Syracuse.)

---Rsk