
I've thought about using client-side tokens with behavioral analysis and a points system, but haven't implemented it on one of my sites to test its reliability just yet. The basic idea: issue a signed token to each client and watch that client with JS instantiated on each page of the site. Give actions (or lack thereof) scores based on their likelihood of being a bot, and fail2ban clients past a certain threshold. Look at mouse movement, scroll depth, keypress timings, page transitions, dwell time, JS events, start/end routes, etc. For example:

- client is on a page for under 100ms: +10 'bot' pts
- no EventListener JS events: +40 pts
- client joins the site at a nested page and drills down: +5 pts (low because people could bookmark a specific page that triggers this)
- etc.

Then decay these points when the opposite behaviors happen (a rough sketch of the client-side piece is below, after the quoted headers).

There's no guaranteed solution, and this would require careful tweaking of the ban threshold and whatnot, with real testing, so you don't accidentally block your users, but I feel it could mitigate a lot of bot situations. For banned users, just redirect them to a page with a little unban form they can quickly fill out or something.

-- Ryland

------ Original Message ------
From "Tom Beecher via NANOG" <nanog@lists.nanog.org>
To "North American Network Operators Group" <nanog@lists.nanog.org>
Cc "Tom Beecher" <beecher@beecher.cc>
Date 7/16/2025 2:43:51 PM
Subject Re: Correctly dealing with bots and scrapers.
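To make the client-side piece of the idea above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the /api/telemetry endpoint, the meta tag carrying the signed token, and all of the point values and decay amounts. The server side (accumulating and decaying points per token over time, and handing persistent offenders to fail2ban) is left out.

```js
// Rough client-side sketch of the behavioral scoring idea.
// All endpoints, names, and point values here are made up for illustration.
(function () {
  const signals = {
    pageLoadedAt: performance.now(),
    mouseMoves: 0,
    scrollDepth: 0,
    anyListenerFired: false,
  };

  let lastKeypress = null;
  const keypressGaps = [];

  document.addEventListener('mousemove', () => {
    signals.mouseMoves++;
    signals.anyListenerFired = true;
  }, { passive: true });

  document.addEventListener('scroll', () => {
    const doc = document.documentElement;
    const depth = (window.scrollY + window.innerHeight) / doc.scrollHeight;
    signals.scrollDepth = Math.max(signals.scrollDepth, depth);
    signals.anyListenerFired = true;
  }, { passive: true });

  document.addEventListener('keydown', () => {
    const now = performance.now();
    if (lastKeypress !== null) keypressGaps.push(now - lastKeypress);
    lastKeypress = now;
    signals.anyListenerFired = true;
  });

  // Turn the raw signals into "bot points". The numbers are guesses;
  // real thresholds would come from testing against live traffic.
  function scoreVisit() {
    const dwellMs = performance.now() - signals.pageLoadedAt;
    let points = 0;

    if (dwellMs < 100) points += 10;              // left the page almost instantly
    if (!signals.anyListenerFired) points += 40;  // no JS events fired at all
    if (document.referrer === '' &&
        location.pathname.split('/').filter(Boolean).length > 1) {
      points += 5;                                // entered deep in the site (could be a bookmark)
    }

    // Decay: human-looking behavior subtracts points.
    if (signals.mouseMoves > 20) points -= 10;
    if (signals.scrollDepth > 0.5) points -= 5;
    if (keypressGaps.length > 3) points -= 5;

    return Math.max(points, 0);
  }

  // Report on page hide; the server would keep the running total per token
  // and decide when a client crosses the ban threshold.
  window.addEventListener('pagehide', () => {
    const tokenMeta = document.querySelector('meta[name="client-token"]');
    const payload = JSON.stringify({
      token: tokenMeta ? tokenMeta.content : null,
      points: scoreVisit(),
      path: location.pathname,
    });
    navigator.sendBeacon('/api/telemetry', payload);
  });
})();
```

The point is just that each signal maps to a small plus or minus, and the actual ban decision lives server-side where the threshold can be tuned against real traffic.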
As Chris states, broad IP-based blocking is unlikely to be very effective, and likely more problematic down the line anyway.
For the slightly more 'honorable' crawlers, they'll respect robots.txt, and you can block their UAs there.
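As a sketch, a robots.txt entry that tells a couple of well-known AI crawler UAs to stay away entirely could look like the following (the UA names here are just examples; check which crawlers actually show up in your logs):

```
User-agent: GPTBot
User-agent: CCBot
Disallow: /
```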
Fail2ban is a very good option right now. It will be even better if nepenthes eventually integrates with it. Then you can have some real fun.
On Wed, Jul 16, 2025 at 3:39 PM Andrew Latham via NANOG <nanog@lists.nanog.org> wrote:
Chris
Spot on, and I am getting the feeling this is where a geo-IP service that offers defined "eyeball networks" to allow shows its value.
On Wed, Jul 16, 2025 at 12:57 PM Chris Adams via NANOG <nanog@lists.nanog.org> wrote:
Once upon a time, Marco Moock <mm@dorfdsl.de> said:
Place a link to a file that is hidden from normal people. Exclude the directory via robots.txt.
Then use fail2ban to block all IP addresses that poll the file.
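A minimal sketch of that trap, with everything here assumed for illustration (nginx combined logs at /var/log/nginx/access.log and a hidden /bot-trap/ path), might be a fail2ban filter plus jail along these lines:

```
# /etc/fail2ban/filter.d/bot-trap.conf  (sketch; log format, trap path assumed)
[Definition]
failregex = ^<HOST> .* "GET /bot-trap/

# /etc/fail2ban/jail.d/bot-trap.local
[bot-trap]
enabled  = true
port     = http,https
filter   = bot-trap
logpath  = /var/log/nginx/access.log
maxretry = 1
bantime  = 86400
```

with a matching Disallow: /bot-trap/ entry in robots.txt and the hidden link pointing somewhere under /bot-trap/, so anything well-behaved never requests it.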
The problem with a lot of the "AI" scrapers is that they're apparently using botnets and will often only make a single request from a given IP address, so reactive blocking doesn't work (and can cause other issues, like trying to block 100,000 IPs, which fail2ban for example doesn't really handle well). -- Chris Adams <cma@cmadams.net>
-- - Andrew "lathama" Latham -