Research - Valid Data Gathering vs Annoying Others
Hi NANOG folks,

We have a situation (which has come up in the past) that I'd like some opinions on.

Periodically, we have researchers who develop projects which will do things like randomly port probe off-campus addresses. The most recent instance of this is a group studying "bottlenecks" on the internet. Thus, they hit hosts (again, semi-randomly) on both the commodity internet and on I2 (Abilene) to look for places where there is "traffic congestion".

The problem is that many of their "random targets" consider the probes to be either malicious in nature or outright attacks. As a result of this, we, of course, get complaints.

One suggestion that I received from a co-worker to help mitigate this is to have the researchers run the experiments off of a www host, and to have the default page explain the experiment and also provide contact info. We also discussed having the researchers contact ISPs and other large providers to see if they can get permission to use addresses in their space as targets, and then providing the ISPs with info from the testing.

How do you view the issue of experiments that probe random sites? Should this be accepted as "reasonable", or should it be disallowed? Something in between? What other suggestions might you have about how such experiments could be run without triggering alarms?

Please send any suggestions directly to me, and once I have some answers I'll post a compilation to the list.

Thanks!

John

John K. Lerchey
Computer and Network Security Coordinator
Computing Services
Carnegie Mellon University
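[A note on the www-host suggestion above: the idea is that anyone who traces probe traffic back to its source IP finds an explanation page and contact address rather than a silent host. Below is a minimal sketch in Python of such a server; the page text and contact address are placeholders, not details from the original post.]

```python
# Minimal sketch: serve a static explanation page from the probing host,
# so anyone investigating the source of the probes finds a description of
# the experiment and contact info. Page text and address are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPLANATION = b"""<html><body>
<h1>Network measurement experiment</h1>
<p>This host sends low-rate probes as part of an academic study of
Internet path bottlenecks. The probes are not an attack.</p>
<p>To ask questions or opt out, contact: research-contact@example.edu</p>
</body></html>
"""

class ExplainHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same explanation page for every requested path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(EXPLANATION)))
        self.end_headers()
        self.wfile.write(EXPLANATION)

if __name__ == "__main__":
    # Binding port 80 normally requires privileges; use 8080 for testing.
    HTTPServer(("", 80), ExplainHandler).serve_forever()
```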
On Fri, 6 Aug 2004, John K Lerchey wrote:
Hi NANOG folks,
We have a situation (which has come up in the past) that I'd like some opinions on.
Periodically, we have researchers who develop projects which will do things like randomly port probe off-campus addresses...
Here are some observations based on an internal corporate R&D project we ran about 4 years ago that crawled all the websites on the Internet for use with a search engine.

* Lower your impact. Limit the number of requests sent to a specific IP within a time period. Limit how fast you make requests. Don't assume adjacent IPs aren't the same server; don't make parallel requests to IPs within the same /24. Limit the total number of requests you make to a specific IP. Limit the amount of data transferred from each IP. (A sketch of these checks follows this message.)

* Make sure to implement a block list to avoid scanning people that ask you to stop.

* Make your hostname something that helps explain what you are doing.

* Make sure that other people in your group know that you are running the experiment and who to forward phone calls to.

* Run a webserver on the IP or IPs that are doing the scanning explaining what you are doing.

* Honor robots.txt, and other "access denied" type responses or error codes. (See the robots.txt sketch after this message.)

* Don't assume the data returned is valid or nonhostile. Some people run search engine traps (infinitely large, programmatically generated websites) to try to salt the search engines with their bogus advertising data. Some people want to crash any program that scans them. Some people will do things you didn't think of.

* Expect some people to send automated complaints without knowing that they are sending them and without understanding the contents of the complaints they are sending.

* Expect some people to complain about you attacking them on port 53 when you look up the address for their domain name, even if you never scan their website or otherwise interact with any of their IPs. (During the experiment this was the largest source of complaints.)

* If you run the project 24 x 7, you need to respond 24 x 7.

Mike.

+----------------- H U R R I C A N E - E L E C T R I C -----------------+
| Mike Leber            Direct Internet Connections   Voice 510 580 4100 |
| Hurricane Electric    Web Hosting  Colocation         Fax 510 580 4151 |
| mleber@he.net                                      http://www.he.net |
+-----------------------------------------------------------------------+
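[To make the "lower your impact" items above concrete, here is a hedged sketch of the per-IP throttling and block-list checks Mike describes. All limits, the class name, and the one-CIDR-per-line block-list format are illustrative assumptions, not details from his post.]

```python
# Sketch of the per-IP throttling and block-list checks described above.
# Limits and the block-list format are illustrative assumptions; IPv4
# targets are assumed for the /24 spacing check.
import ipaddress
import time
from collections import defaultdict

MAX_REQUESTS_PER_IP = 100      # assumed lifetime cap per target IP
MAX_REQUESTS_PER_WINDOW = 5    # assumed cap per target IP per time window
WINDOW_SECONDS = 60.0
SLASH24_GAP_SECONDS = 2.0      # assumed minimum spacing between hits to a /24

class PoliteScheduler:
    def __init__(self, blocklist_path):
        # Block list: one network (CIDR) or bare IP per line, listing
        # address space whose owners asked us to stop probing.
        with open(blocklist_path) as f:
            self.blocked = [ipaddress.ip_network(line.strip())
                            for line in f if line.strip()]
        self.total = defaultdict(int)    # lifetime probe count per IP
        self.recent = defaultdict(list)  # recent probe timestamps per IP
        self.last_slash24 = {}           # last probe time per /24

    def may_probe(self, ip_str):
        ip = ipaddress.ip_address(ip_str)
        now = time.monotonic()

        # Never probe networks whose owners asked to be excluded.
        if any(ip in net for net in self.blocked):
            return False

        # Lifetime cap on requests to a single IP.
        if self.total[ip_str] >= MAX_REQUESTS_PER_IP:
            return False

        # Sliding-window rate cap per IP.
        window = [t for t in self.recent[ip_str] if now - t < WINDOW_SECONDS]
        self.recent[ip_str] = window
        if len(window) >= MAX_REQUESTS_PER_WINDOW:
            return False

        # Adjacent IPs may be the same server: space out hits within a /24.
        net24 = str(ipaddress.ip_network(ip_str + "/24", strict=False))
        if now - self.last_slash24.get(net24, 0.0) < SLASH24_GAP_SECONDS:
            return False

        # Record and allow the probe.
        self.total[ip_str] += 1
        self.recent[ip_str].append(now)
        self.last_slash24[net24] = now
        return True
```

A caller would check may_probe() before each request; capping bytes transferred per IP, which Mike also recommends, is omitted from this sketch.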
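[For the robots.txt item, Python's standard library already includes a parser, so honoring it takes only a few lines. A minimal sketch follows; the user-agent string and URLs are placeholders.]

```python
# Minimal robots.txt check using the standard library; the user-agent string
# and target URLs are placeholders. A descriptive user-agent that points at
# the experiment's explanation page makes complaints easier to field.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

if rp.can_fetch("bottleneck-study-bot", "http://www.example.com/some/page"):
    print("allowed to fetch")
else:
    print("disallowed by robots.txt; skip this site")
```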
participants (2)
- John K Lerchey
- Mike Leber