On Fri, Jun 19, 2020 at 9:15 AM Christopher Tyler <chris@totalhighspeed.net> wrote:
We run a smaller ISP of about 7.5k customers and the other day we got an email (excerpt below) from one of Google's automated tools.
We are seeing automated scraping of Google Web Search from a large number of your IPs. Automated scraping violates our /robots.txt file and also our Terms of Service. We request that you terminate this traffic immediately. Failure to do so may cause your network to be blocked by our abuse systems.
To allow you to identify the traffic, we are providing a list of your IPs they used today (Source field), as well as the most common destination (Google) IP and port and a timestamp of a recent request (in UTC) to aid in your identification. Note that this list may not be exhaustive, and we request that you terminate all such traffic, not just traffic from IPs in this list.
All of the destination ports are either 80 or 443, so they at least appear to be legit web traffic on the surface. They are obviously spoofed IP addresses, as there are network addresses in the list and the IP belongs to a router that doesn't appear to be compromised in any way. The initial letter included 700+ IP addresses from our network.
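One way to sanity-check a report like this is to sort the listed source addresses by what they are on the network: subnet network or broadcast addresses, router interfaces, or ordinary hosts. The following is a minimal sketch using Python's standard ipaddress module, assuming IPv4. The function name classify_reported, the prefixes (taken from the RFC 5737 documentation ranges), and the router list are all hypothetical stand-ins, not anything from the actual report. The prefixes passed in should be the subnets as actually deployed, not the aggregate allocation, or the network-address check will miss most cases.

```python
import ipaddress


def classify_reported(sources, prefixes, router_ips=()):
    """Label each reported source IP relative to our deployed subnets.

    sources    -- iterable of IP strings from the abuse report
    prefixes   -- iterable of CIDR strings for subnets as deployed (assumption)
    router_ips -- iterable of IP strings known to be router interfaces (assumption)
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    routers = {ipaddress.ip_address(r) for r in router_ips}
    labels = {}
    for src in sources:
        addr = ipaddress.ip_address(src)
        # First deployed subnet that contains this address, if any.
        net = next((n for n in nets if addr in n), None)
        if net is None:
            labels[src] = "not-ours"
            continue
        # /31 and /32 subnets have no separate network/broadcast address.
        has_edges = net.prefixlen < net.max_prefixlen - 1
        if has_edges and addr == net.network_address:
            labels[src] = "network-address"
        elif has_edges and addr == net.broadcast_address:
            labels[src] = "broadcast-address"
        elif addr in routers:
            labels[src] = "router"
        else:
            labels[src] = "host"
    return labels


if __name__ == "__main__":
    report = ["192.0.2.0", "192.0.2.1", "192.0.2.57", "192.0.2.255"]
    for ip, label in classify_reported(report, ["192.0.2.0/24"], ["192.0.2.1"]).items():
        print(ip, label)
```

If the report covers essentially every address in a subnet, including ones that no host can legitimately use as a source, that points away from individual compromised customers and toward something acting on the address space as a whole.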
Hi Christopher,

Presumably Google is smart enough to know the difference between spoofed port scanning and completed TCP connections performing a web search. If you take Google's report at face value, the addresses aren't spoofed; something else is happening. The question is how.

There was a company revealed on NANOG earlier this year (or maybe last year, I'm not great with dates) which contracts with small ISPs and virtual server providers to use their "spare bandwidth" to pseudonymously originate web requests. They don't require you to assign them IP addresses because they overload their activity on all of your IP addresses.

In theory they do this without disturbing your customers and only access web sites whose owners have contracted them to do so, generally to test connectivity. In practice, there's a device inline with your traffic flow that injects TCP connections and captures the associated return packets across your entire address space. Including, for example, your routers' IP addresses.

Do you, or perhaps your upstream, have such a contract?

Regards,
Bill Herrin

--
William Herrin
bill@herrin.us
https://bill.herrin.us/