At 12:44 -0400 2001-10-26, Wojtek Zlobicki wrote:
1b) If I wget the whole site, is that wrong?
Sure is. I've given you no right to pull down my site. Copyright law rules here (depending on what the copyright of the site is).
Yes, copyright law applies. But how is using wget to get the whole site different to me navigating the whole site... rights based on user agents? Hmm.
1c) wget it once an hour?
You'll show up in my traffic logs, expect to be ACL'd.
wget on the whole site, yes, probably not nice. wget on a single page?
1d) Request web pages as fast as my system allows?
If you're legitimately surfing, sure, if not, ACL once again.
How do you detect "legitimately surfing"?
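For what it's worth, about the only thing a server operator can easily measure is request rate per source address. A minimal sketch of that heuristic follows; the class name, threshold, and window are made up for illustration and nothing here is taken from any real server's ACL mechanism. It says nothing about intent: a proxy fronting thousands of users looks like one very busy client.

```python
from collections import defaultdict, deque


class RateFlagger:
    """Flag clients whose request count in a sliding window exceeds a limit.

    Hypothetical helper: the defaults below are arbitrary assumptions.
    """

    def __init__(self, max_requests=50, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Client address -> timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def record(self, client, timestamp):
        """Record one request; return True if the client is over the limit."""
        hits = self._hits[client]
        hits.append(timestamp)
        # Discard timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

Feeding it (address, time) pairs from the access log would be enough to produce a candidate list for an ACL, which someone should still review by hand.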
Also important is the notion of transaction, which seems to have been lost in this discussion. If a user requests a web page it is quite possible that the web server may attempt to use a mechanism other than HTTP to communicate with the client. In the simple example, consider a web server that for each page downloaded pings the client once and uses that data to improve the client experience. In my opinion, that ping is part of the transaction of getting the web page that the user requested, and as such cannot be considered abusive. This is particularly true when the volume is high. I've seen queries before from sites hosting thousands of users accessing popular sites who complain that the site then sends back a couple of hundred pings.
I know of no standard that incorporates ICMP probes with HTTP transfers. If I ask for HTTP data, that's all that I expect, nothing less, nothing more. I am not opposed to such a standard, but am opposed to people trying such schemes without my knowledge or permission.
Funny, I seem to recall that the default for CERN and NCSA httpd's (yes, I know, years old) was to send an ident request back to the requesting host. If memory serves it's also trivially simple (and painfully dumb in most cases these days) to configure Apache to do so. Am I right to assume that doing a reverse lookup on the requesting host is also bad? I'm not aware of any standard that states that's acceptable either...
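To be concrete about what a reverse lookup involves: the server sends a PTR query for the client's address, which is traffic the client never asked for in its HTTP request. A rough sketch of that step, assuming Python's standard socket module (the function name and the injectable resolver parameter are mine, added only so the lookup can be exercised without a network):

```python
import socket


def reverse_lookup(address, resolver=socket.gethostbyaddr):
    """Return the hostname the resolver reports for address, or None.

    The resolver argument is a hypothetical hook for testing; by default
    the real system resolver is used.
    """
    try:
        hostname, _aliases, _addresses = resolver(address)
    except (socket.herror, socket.gaierror):
        # No PTR record, or the lookup failed.
        return None
    return hostname
```

A server doing this for every request logs a name in place of an address, at the cost of one extra DNS query per new client.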