Andrew,

The issue is that it is hitting a Gitea instance and a MediaWiki instance, then getting lost in the commit/change diff system. I should just figure out how to disable the diff tools on Gitea and MediaWiki to keep the bots from going down a rabbit hole of a decade or more of commits and page edits. Those diff views are CPU intensive, which is my real issue at the moment.
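Not a drop-in fix, but to sketch the shape of it: a small WSGI filter in front of both apps that turns away crawler requests to the diff/history endpoints. The URL patterns are my guesses at the default Gitea and MediaWiki layouts, and the user-agent check is a crude placeholder; equivalent rules in the reverse proxy would do the same job.

#!/usr/bin/env python3
# Sketch: refuse crawler requests to CPU-heavy diff/history views.
import re
from wsgiref.simple_server import make_server

# Gitea: per-commit diffs, branch comparisons, blame views (default layout).
GITEA_EXPENSIVE = re.compile(r"^/[^/]+/[^/]+/(commit|compare|blame)/")
# MediaWiki: revision diffs, old revisions, page histories (query string).
MW_EXPENSIVE = re.compile(r"(?:^|&)(?:diff=|oldid=|action=history)")
# Crude heuristic; the worst offenders rotate user agents too.
BOT_UA = re.compile(r"bot|crawl|spider|scrape", re.I)

def block_diffs(upstream):
    """Wrap a WSGI app; 403 bot traffic to the expensive endpoints."""
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        query = environ.get("QUERY_STRING", "")
        ua = environ.get("HTTP_USER_AGENT", "")
        if BOT_UA.search(ua) and (
            GITEA_EXPENSIVE.search(path) or MW_EXPENSIVE.search(query)
        ):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"diff/history views are closed to crawlers\n"]
        return upstream(environ, start_response)
    return app

if __name__ == "__main__":
    # Stand-in upstream so the sketch runs on its own.
    def demo(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello\n"]
    make_server("", 8000, block_diffs(demo)).serve_forever()

On Sat, Mar 21, 2026 at 9:53 AM Andrew Kirch via NANOG <nanog@lists.nanog.org> wrote: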
Get a small version of a very old, very fast, very inaccurate LLM. Have it generate a couple of terabytes of endless nonsense.
Redirect scrapers to it, and poison whatever LLM they are trying to train.
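You do not even need a real model. A toy like this, seeded word salad that links back into itself and drips out slowly, already gives a crawler an infinite crawl space at almost no cost to you. (A sketch only: the vocabulary, link scheme, and one-second drip are placeholders.)

#!/usr/bin/env python3
# Sketch of an endless-nonsense tarpit: every page is gibberish
# that links to more gibberish, streamed one line per second.
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ("packet route peer prefix flap announce withdraw nexthop "
         "latency jitter fabric spine leaf optic transceiver").split()

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        rng = random.Random(self.path)  # same URL -> same "page"
        try:
            while True:
                line = " ".join(rng.choices(WORDS, k=12))
                # Every page links onward: an infinite crawl space.
                self.wfile.write(
                    f'<p>{line} <a href="/{rng.randrange(1 << 32):08x}">more</a></p>\n'
                    .encode())
                self.wfile.flush()
                time.sleep(1.0)  # slow drip ties up the scraper's worker
        except (BrokenPipeError, ConnectionResetError):
            pass  # the scraper gave up

if __name__ == "__main__":
    HTTPServer(("", 8080), Tarpit).serve_forever()

Because each page is seeded by its URL, a recrawl sees the same content every time, so the maze looks like ordinary static pages.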
Andrew
On Wed, Jul 16, 2025 at 12:49 PM Andrew Latham via NANOG <nanog@lists.nanog.org> wrote:
I just had an issue with a web server where I had to block a /18 belonging to a large scraper. I have a few topics I could use some input on.
1. What tools or setups have people found most successful for dealing with bots/scrapers that, for example, do not respect robots.txt?
2. What response rate-limiting tools can deal with bots/scrapers that cycle through a large variety of IPs while presenting the exact same user agent?
3. Has anyone written or found a tool to concentrate IP addresses into networks for iptables or nftables? (If 60% of the IPs for network X are on the list, add network X and remove the individual IP entries.) A rough sketch of what I mean follows this list.
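To make number 3 concrete, here is the shape of what I mean, as a rough IPv4-only sketch; the 60% threshold, the /24 granularity, and the nft set name are placeholders:

#!/usr/bin/env python3
# Sketch: promote a /24 to a network-wide block once a threshold
# share of its addresses is on the offender list.
from collections import Counter
from ipaddress import IPv4Address, ip_network

def aggregate(ips, prefixlen=24, threshold=0.60):
    """Return (networks_to_block, leftover_single_ips)."""
    addrs = [IPv4Address(ip) for ip in ips]
    per_net = Counter(
        ip_network(f"{a}/{prefixlen}", strict=False) for a in addrs
    )
    capacity = 2 ** (32 - prefixlen)  # naive: assumes the full prefix is in play
    nets = {n for n, seen in per_net.items() if seen / capacity >= threshold}
    leftovers = sorted({a for a in addrs if not any(a in n for n in nets)})
    return sorted(nets), leftovers

if __name__ == "__main__":
    bad = [f"192.0.2.{i}" for i in range(160)] + ["198.51.100.7"]
    nets, singles = aggregate(bad)
    # Assumes an existing set, e.g.:
    #   nft add set inet filter blocked '{ type ipv4_addr; flags interval; }'
    for n in nets:      # one rule per network...
        print(f"nft add element inet filter blocked {{ {n} }}")
    for a in singles:   # ...individual entries for the rest
        print(f"nft add element inet filter blocked {{ {a} }}")

The capacity check is deliberately naive; against real data you would compare hits per prefix with the addresses actually observed or announced, not the full 2^(32-prefixlen).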
-- - Andrew "lathama" Latham -
-- - Andrew "lathama" Latham -