On Wed, Jul 12, 2006 at 02:50:54PM -0700, Malcolm Staudinger wrote:
> Google is your friend? They're a search engine. robots.txt and
> forget it.
>
> Malcolm

That's assuming whoever designed their software actually adheres to
robots.txt. The robots exclusion standard is purely advisory;
adherence is voluntary, operationally "optional", and some crawlers
simply ignore it. I can't find a single reference to what standards
"GigaBlast" adheres to, or any technical data about how their engine
works. The way their site is designed, it looks like a total
fly-by-night operation. (If their crawler does honor robots.txt,
excluding it is a two-line change; see the sketch below.)

If "GigaBlast" really is "indexing" his site, they have to be basing
their GET requests on something (the equivalent of a normal browser's
Referer header; but again, who knows if they pass that along?). The
requests Jim is seeing appear to be garbage, similar in composition
to spam, not based on actual references/indexes. Logging the Referer
and User-Agent headers would settle that one way or the other (second
sketch below). I could be outright wrong here.

Additionally, how does this solve the issue of Jim's bandwidth, CPU,
and memory, if not his time, being wasted on HTTP requests which
shouldn't even be arriving at his boxes (which is what he's
essentially complaining about)? "So filter upstream, or on the
machine itself." Okay, that's a solution, but a host-level filter
only suppresses the responses; the incoming requests still cross his
pipe (example below).

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
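
For completeness, if their crawler does honor robots.txt, blocking it
is trivial. This is a sketch only: I'm assuming the bot identifies
itself as "Gigabot", since I can't find that documented anywhere;
check the User-Agent strings in the access logs first and substitute
whatever actually shows up:

    # robots.txt at the site root; bot name is an assumption
    User-agent: Gigabot
    Disallow: /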
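
As for checking what these requests claim as a referring page: if
Jim's running Apache (an assumption on my part), the stock "combined"
log format already records both Referer and User-Agent, e.g. in
httpd.conf:

    # Log Referer and User-Agent alongside each request
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    CustomLog /var/log/httpd-access.log combined

If the Referer field comes back "-" on every one of these hits, the
requests aren't being driven by indexed links, which would back up
the spam-composition theory.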
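
And to illustrate the filtering point: a host-level pf rule like the
following (interface name and netblock are made up; pull real
addresses from the logs) drops the packets before the web server ever
sees them, which saves CPU, memory, and the outbound responses, but
the inbound packets have already crossed the wire by the time pf sees
them:

    # /etc/pf.conf -- hypothetical values; substitute your own
    ext_if = "fxp0"
    table <badbots> { 192.0.2.0/24 }
    block in quick on $ext_if proto tcp from <badbots> to any port 80

Only filtering upstream, at the provider's edge, actually keeps the
traffic off his pipe.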