
As far as I'm aware, or as far as anyone's ever made me aware when I asked, there remains zero evidence that the high-intensity high-anonymity bots some sites are seeing have anything to do with AI. If they are AI, a court just ruled that AI scraping is fair use, so maybe you should offer them a zipped copy of your site and they won't have to scrape it.

The actual reason for CAPTCHAs is revenue. Site operators would like to give a zipped copy of their sites to OpenAI - for $100,000. And they want that to be the only way OpenAI can get a copy.

On 2 July 2025 12:50:28 pm GMT+02:00, niels=nanog--- via NANOG <nanog@lists.nanog.org> wrote:
* Constantine A. Murenin [Wed 02 Jul 2025, 05:23 CEST]:
But the bots are not a problem if you're doing proper caching and throttling.
Have you been following the news at all lately? Website operators are complaining left and right about the load from scrapers related to AI companies. They're seeing 10x, 100x the normal visitor load, with not just User-Agents but also source IP addresses masked to present as regular visitors. Captchas are unfortunately one of the more visible ways to address this, even if not perfect.
For example, https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-for...
-- Niels.
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/L5WNOGOA...