On Fri, 6 Aug 2004, John K Lerchey wrote:
Hi NANOG folks,
We have a situation (which has come up in the past) that I'd like some opinions on.
Periodically, we have researchers who develop projects which will do things like randomly port probe off-campus addresses...
Here are some observations based on an internal corporate R&D project we ran about 4 years ago that crawled all the websites on the Internet for use with a search engine.

* Lower your impact. Limit the number of requests sent to a specific IP within a time period. Limit how fast you make requests. Don't assume adjacent IPs aren't the same server; don't make parallel requests to IPs within the same /24. Limit the total number of requests you make to a specific IP. Limit the amount of data transferred from each IP.

* Make sure to implement a block list to avoid scanning people who ask you to stop.

* Make your hostname something that helps explain what you are doing.

* Make sure that other people in your group know that you are running the experiment and who to forward phone calls to.

* Run a webserver on the IP or IPs that are doing the scanning, explaining what you are doing.

* Honor robots.txt and other "access denied" type responses or error codes.

* Don't assume the data returned is valid or nonhostile. Some people run search engine traps (infinitely large programmatically generated websites) to try to salt the search engines with their bogus advertising data. Some people want to crash any program that scans them. Some people will do things you didn't think of.

* Expect some people to send automated complaints without knowing that they are sending them and without understanding the contents of the complaints they are sending.

* Expect some people to complain about you attacking them on port 53 when you look up the address for their domain name, even if you never scan their website or otherwise interact with any of their IPs. (During the experiment this was the largest source of complaints.)

* If you run the project 24 x 7, you need to respond 24 x 7.

Mike.
+----------------- H U R R I C A N E - E L E C T R I C -----------------+
| Mike Leber          Direct Internet Connections    Voice 510 580 4100 |
| Hurricane Electric  Web Hosting Colocation           Fax 510 580 4151 |
| mleber@he.net                                       http://www.he.net |
+-----------------------------------------------------------------------+
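
For anyone wanting to turn the "lower your impact", block list, and robots.txt points above into code, here is a minimal Python sketch. It is not the crawler described in the post. The names PoliteScheduler, may_request, record, and allowed_by_robots are hypothetical, the interval and cap values are placeholders, and the /24 grouping assumes IPv4 targets.

```python
import ipaddress
import time
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Hypothetical helper: per-/24 pacing, per-IP caps, opt-out block list."""

    def __init__(self, min_interval=5.0, max_requests_per_ip=100,
                 clock=time.monotonic):
        self.min_interval = min_interval              # seconds between hits to one /24
        self.max_requests_per_ip = max_requests_per_ip  # lifetime cap per address
        self.clock = clock                            # injectable for testing
        self.blocked = []                             # networks that asked us to stop
        self.last_hit = {}                            # /24 network -> time of last request
        self.count = {}                               # address -> requests made so far

    def block(self, cidr):
        """Add an address or CIDR range to the block list."""
        self.blocked.append(ipaddress.ip_network(cidr, strict=False))

    @staticmethod
    def _subnet(addr):
        # Treat the whole /24 as one target, since adjacent IPs may be one server.
        return ipaddress.ip_network(f"{addr}/24", strict=False)

    def may_request(self, ip):
        """Return True only if a request to ip is currently acceptable."""
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in self.blocked):
            return False
        if self.count.get(addr, 0) >= self.max_requests_per_ip:
            return False
        last = self.last_hit.get(self._subnet(addr))
        if last is not None and self.clock() - last < self.min_interval:
            return False
        return True

    def record(self, ip):
        """Note that a request to ip was just sent."""
        addr = ipaddress.ip_address(ip)
        self.count[addr] = self.count.get(addr, 0) + 1
        self.last_hit[self._subnet(addr)] = self.clock()


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A caller would check may_request(ip) before each fetch, call record(ip) right after sending, and call block(cidr) whenever someone asks to be left alone. Keeping the block list check first means an opt-out takes effect immediately, regardless of the pacing state.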