
On 7/7/25 13:46, Dan Lowe via NANOG wrote:
For the crawler mitigations I've personally been involved with, this would not have worked. The source IPs numbered at least in the tens of thousands. They didn't repeat; each source IP made one request and was never seen again. They didn't aggregate into prefixes we could filter. They didn't carry any common identifier we could filter on, including the user-agents (which were valid-looking and randomized).
In my case, we were able to simply put 99% of the site behind a login, which mitigated the problem. Many sites don't have that option.
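For anyone wondering what that looks like in practice, here's a minimal sketch of the idea, using Flask purely for illustration; the route names and the public-path list are assumptions, not our actual setup.

```python
# Sketch: gate most of a site behind a login while leaving a small
# public subset (landing page, login, robots.txt, static assets) open.
from flask import Flask, request, session, redirect, url_for

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder; use a real secret in production

# Hypothetical public subset that stays reachable without a session.
PUBLIC_PREFIXES = ("/login", "/robots.txt", "/static/")

@app.before_request
def require_login():
    # Let the public subset through unauthenticated.
    if request.path == "/" or any(request.path.startswith(p) for p in PUBLIC_PREFIXES):
        return None
    # Everything else requires a logged-in session.
    if not session.get("user"):
        return redirect(url_for("login", next=request.path))

@app.route("/login", methods=["GET", "POST"])
def login():
    # Real credential checking omitted; this just marks the session authenticated.
    if request.method == "POST":
        session["user"] = request.form.get("username", "anonymous")
        return redirect(request.args.get("next", "/"))
    return '<form method="post"><input name="username"><button>Log in</button></form>'

@app.route("/")
def index():
    return "Public landing page"

@app.route("/content/<path:page>")
def content(page):
    return f"Protected content: {page}"
```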
Perhaps a more interesting question is how the entities behind this crawling have come to control so much IP space. It doesn't seem like a use case that readily justifies it. (Yes, I know that hardly matters.)