
On Wed, 2 Jul 2025 at 14:38, William Kern via NANOG <nanog@lists.nanog.org> wrote:
On 7/1/25 8:22 PM, Constantine A. Murenin via NANOG wrote:
But the bots are not a problem if you're doing proper caching and throttling.
Not all site traffic is cacheable or can be farmed out to a CDN.
That's just an excuse for inadequate planning and misplaced priorities. If you start with the requirement that it all be cacheable, then EVERYTHING can be cached, especially for ecommerce and catalogue stuff. OSS nginx is free and relatively easy to use, with excellent documentation, and it offers superb caching functionality; you don't need an external CDN to do the caching. You can even cache search results, especially for non-logged-in users. Why would you NOT?

If, to quote Ars Technica, "a GitLab link is shared in a chat room", why would you want ANYONE to wait an extra millisecond, let alone "having to wait around two minutes" for Anubis proof-of-work, to access the result, when the result was already computed and known, because it was already assembled for the person who posted the link in the first place?

These things could even be cached in the app itself, and even shared between all logged-in and non-logged-in users, if performance and web scale are paramount. Otherwise, it can be architected to be cacheable with nginx; a minimal sketch follows.
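For illustration, a rough OSS nginx sketch of what I mean; paths, names and the upstream are all hypothetical, and it assumes the app sits behind nginx on the same box:

    # hypothetical fragment of nginx.conf (inside the http{} block)
    proxy_cache_path /var/cache/nginx/app levels=1:2 keys_zone=app_cache:100m
                     max_size=1g inactive=60m use_temp_path=off;

    server {
        listen 80;
        server_name shop.example.com;          # assumption: your site

        location / {
            proxy_pass http://127.0.0.1:8080;  # assumption: app upstream
            proxy_cache app_cache;
            proxy_cache_key $scheme$host$request_uri;
            proxy_cache_valid 200 301 10m;     # keep good answers for 10 min
            proxy_cache_lock on;               # collapse concurrent misses
            proxy_cache_use_stale error timeout updating;
            # anonymous users share the cache; a session cookie bypasses it
            proxy_cache_bypass $cookie_session;
            proxy_no_cache $cookie_session;
            add_header X-Cache-Status $upstream_cache_status;
        }
    }

With proxy_cache_lock on, even a thundering herd of bots hitting the same uncached URL results in exactly one request to the backend.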
Dynamic (especially per-session) requests (think ecommerce) can't be cached.
Putting an item into the shopping cart is typically one of the more resource-driven events. We have seen bots that will select the buy button and put items into the cart, possibly to see any discounts given. You end up with hundreds of active 'junk' cart sessions on a small site that was not designed for that much traffic.
Why is the simple act of placing an item in a shopping cart a resource-driven event? This can literally be done on the front-end, without any server requests at all, let alone resource-driven ones; see the sketch below. If you DO store an expensive session on the server for this, instead of in the browser, then you also likely expire said carts even for users who intended to return and complete the purchase. Does the owner know? Yes, it's more work to have a separate cookie cart for anonymous users, but if that's a business requirement, why not? This way, even if someone comes back many months later, provided they've never cleared their cookies, their cart will still be there, waiting for them, at zero cost to your shopping-cart database. Isn't that how it should be? Stores that empty your cart in 3 days, or that require captchas for basic product viewing, are the best example of misplaced priorities. I usually click the X button before they can complete their captcha, and won't bother adding anything to the shopping cart again if the store is known for data loss.
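A minimal front-end sketch, with hypothetical names throughout, using localStorage rather than a literal cookie (the principle is the same: the cart lives in the browser until checkout):

    // anonymous cart, stored entirely client-side; nothing for the
    // server to create, expire, or garbage-collect
    interface CartItem { sku: string; qty: number; }

    const CART_KEY = "anon_cart_v1";   // assumption: versioned storage key

    function loadCart(): CartItem[] {
      try {
        return JSON.parse(localStorage.getItem(CART_KEY) ?? "[]");
      } catch {
        return [];                     // corrupted storage: start fresh
      }
    }

    function addToCart(sku: string, qty = 1): void {
      const cart = loadCart();
      const item = cart.find((i) => i.sku === sku);
      if (item) { item.qty += qty; } else { cart.push({ sku, qty }); }
      localStorage.setItem(CART_KEY, JSON.stringify(cart));
    }

    // the server first sees the cart at checkout, in a single request

The server-side cart table then only ever contains carts that actually reached checkout, bots or no bots.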
Forcing the bot (or a legit customer) to create yet another login to create a cart can help, but that generates pushback from the store owner. The owners don't want that until the payment-details phase, or they want purchasers to be able to do a guest checkout. They will point out that on amazon.com you don't have to log in to put an item in the cart.
Rate limiting is not effective when they come from different IP ranges. The old days of using a Class C (/24) as a rate-limit key are no longer effective. The bots come from all over a provider's space (often Azure), but can be from any of the larger providers, and often from different regions. If you throttle EVERYONE, then legit customers can get locked out with 429s or even 503s.

And, as has been pointed out, relying on the browser string is no longer effective. They use common strings and change them dynamically.

Rate limiting would make sense for expensive things like search (and `git blame`), which should also be combined with caching, too, especially if you aren't even personalising results with AI or past purchases/views. Things like adding an item to a cart should be a local event for anonymous users, so it should be impossible to rate-limit that. Product listings and categories should 100% be cached, absolutely no exceptions. Search pages also absolutely have to be cached; I don't know who ever thought of the brilliant idea that search somehow isn't cacheable, especially on all those sites where it's 100% deterministic and identical for all users. If someone wants to get the entire site of all the products, I don't see a good reason to preclude that. In the old days, any vendor would be happy to send you the entire catalogue of their offerings, all at once: in print form in the US for major brands, and in Microsoft Excel for the more local vendors. But now suddenly we want to prevent people from viewing several products at a time, or being able to shop the way they want to, or seeing the prices for more than a handful of products at a time?! Misplaced priorities, 100%. A sketch of limiting only the expensive endpoint follows.
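To make that concrete, a hypothetical nginx fragment that throttles only the expensive endpoint, keyed on the full client address rather than a /24, and caches it as well; the names, paths, and rates are all assumptions:

    # inside the http{} block
    limit_req_zone $binary_remote_addr zone=search:10m rate=2r/s;
    proxy_cache_path /var/cache/nginx/search keys_zone=search_cache:50m
                     inactive=30m;

    server {
        listen 80;
        server_name shop.example.com;           # assumption

        location /search {
            limit_req zone=search burst=5 nodelay;  # per-IP, this URL only
            limit_req_status 429;
            proxy_pass http://127.0.0.1:8080;   # assumption: app upstream
            proxy_cache search_cache;
            proxy_cache_key $scheme$host$request_uri;
            proxy_cache_valid 200 5m;  # deterministic queries are shared
        }

        # listings, categories, and product pages are served from the
        # cache configured earlier, so they need no limit at all
    }

Distributed bots will defeat any per-source key eventually, which is exactly why the cache, not the limiter, has to absorb the bulk of the traffic.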
Best regards,
Constantine.

Sincerely,
William Kern
PixelGate Networks.