On 2012-03-20 16:53 , Nick Hilliard wrote:
On 20/03/2012 14:54, Jeroen Massar wrote:
For everybody who is "monitoring" other people's websites, please please please, monitor something static like /robots.txt as that can be statically served and is kinda appropriate as it is intended for robots.
Depends on what you are monitoring. If you're looking for layer 4 ipv6 connectivity then robots.txt is fine. If you're trying to determine whether a site is serving active content on ipv6 and not serving http errors, then it's pretty pointless to monitor robots.txt - you need to monitor /.
And as can be seen with the monitoring of ipv6.level3.com, it will tell you 'it is broken', but as the person who is monitoring has no relation or contact with them, it only leads to public complaints which do not get resolved.... If the site themselves cannot be arsed to monitor their own, then why would you bother to do so? Indeed, I agree that it can be useful, especially as an access ISP, to monitor popular websites so that you know that you can reach them, but that does not mean you need to pull large amounts of data. (For determining MTU issues, yes, but you likely have a full 1500-byte path anyway, so these should rarely happen.) But unless you have a contact at the site it will be tough to resolve the issue anyway.
Oh and of course do set the User-Agent to something logical and, to be super nice, include a contact address so that people who do check their logs once in a while for fishy things at least know what is happening there and that it is not a process run amok or something.
Good policy, yes. Some robots do this but others don't.
Of course, asking before doing tends to be a good idea too.
Depends on the scale. I'm not going to ask permission to poll someone else's site every 5 minutes, and I would be surprised if they asked me the same. OTOH, if they were polling to the point that it was causing issues, that might be different.
I was not talking about that low a rate; not many people will notice that. But the 1000 qps from 500 sources was quite noticeable, so at first they got blocked, then we tried to find out who was doing it, then they repointed to robots.txt, we unblocked them, and all was fine.
The IPv6 Internet already consists way too much out of monitoring by pulling pages and doing pings...
"way too much" for what? IPv6 is not widely adopted.
In comparison to real traffic. There has been a saying since the 6bone days already that IPv6 is just ICMPv6...
Fortunately that should heavily change in a few months.
We've been saying this for years. World IPv6 day 2012 will come and go, and things are unlikely to change a whole lot. The only thing that World IPv6 day 2012 will ensure is that people whose ipv6 configuration actively interferes with their daily Internet usage will be self-flagged and their configuration issues can be dealt with.
Fully agree, but at least at that point nobody will be able to claim that they can't deploy IPv6 on the access side as there is no content ;)
(who noticed a certain s....h company performing latency checks against one of his sites, which was no problem, but the fact that they were causing almost more hits/traffic/load than normal clients was a bit on the much side
If that web page is configured to be as top-heavy as this, then I'd suggest putting a cache in front of it. nginx is good for this sort of thing.
nginx does not help if your content is not cacheable by nginx, for instance if you simply show the IP address of the client and thus whether they have IPv6 or IPv4. In our case, indeed, everything that is static is served by nginx, which is why hammering on /robots.txt is not an issue at all...
On 2012-03-20 21:45 , Charles N Wyble wrote:
On 03/20/2012 09:54 AM, Jeroen Massar wrote:
On 2012-03-20 15:40 , Vinny_Abello@Dell.com wrote:
For everybody who is "monitoring" other people's websites, please please please, monitor something static like /robots.txt as that can be statically served and is kinda appropriate as it is intended for robots.
This could provide a false positive if one is interested in ensuring that the full application stack is working.
As stated above and given the example of the original subject of ipv6.level3.com, what exactly are you going to do when it does not? And again, if the owner does not care, why should you? Also, maybe they do a redesign of the site and remove the keywords or other metrics you are looking for. It is not your problem to monitor it for them, unless they hire you to do so of course.
Oh and of course do set the User-Agent to something logical and, to be super nice, include a contact address so that people who do check their logs once in a while for fishy things at least know what is happening there and that it is not a process run amok or something.
A server side process? Or client side?
Take a guess what something that polls an HTTP server is.
If the client side monitoring is too aggressive, then your rate limiting firewall rules should kick in and block it. If you don't have a rate limiting firewall on your web server (on the server itself, not in front of it), then you have bigger problems.
You indeed will have a lot of problems when you are doing connection tracking on your website, be that on the box itself or in front of it in a separate TCP state engine.
Of course, asking before doing tends to be a good idea too.
If you are running a public service, expect it to get monitored/attacked/probed etc. If you don't want traffic from certain sources then block it.
That is exactly what happened, but if they had set a proper User-Agent it would not have taken time to figure out why they were doing it. There is a big difference between malicious and good traffic; people tend to want to serve the latter.
The IPv6 Internet already consists way too much out of monitoring by pulling pages and doing pings...
Who made you the arbiter of acceptable automated traffic levels?
I did! And as you state yourself, if you do not like it, block it, which is what we do. But that was not what this thread was about; if you recall, it started with noting that you might want to ask for permission and that you might want to provide proper contact details in the probing.
(who noticed a certain s....h company performing latency checks against one of his sites, which was no problem, but the fact that they were causing almost more hits/traffic/load than normal clients was a bit on the much side,
Again. Use a firewall and limit them if the traffic isn't in line with your site policies.
I can only suggest running a site once with more than a few hits per second that is distributed around the world and with actual users ;)
And for the few folks putting nagios's on other people's sites, they obviously do not understand that even if the alarm goes off that something is broken that they cannot fix it anyway, thus why bother...
You obviously do not understand why people are implementing these monitors.
Having written various monitoring systems I know exactly why they are doing it. I also know that they are monitoring the wrong thing.
It's to serve as a canary for v6 connectivity issues.
Just polling robots.txt is good enough for that. Asking the site operator if it is good with them is also a good idea. Providing contact details in the User-Agent is also a good idea.
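As a rough illustration of the points above (poll /robots.txt, identify yourself, include a contact), a probe could be sketched like this in Python. The agent name, the contact address and the helper names `build_probe` and `probe` are made up for the example and are not from anyone's actual monitoring setup. The sketch also does not pin the address family; a real canary would need to run it once over IPv4 and once over IPv6.

```python
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical values: replace with your own agent name and a mailbox
# that somebody actually reads.
CONTACT = "noc@example.net"
USER_AGENT = f"example-v6-canary/0.1 (+mailto:{CONTACT})"


def build_probe(host: str) -> urllib.request.Request:
    """Build a HEAD request for the static /robots.txt of `host`."""
    return urllib.request.Request(
        f"http://{host}/robots.txt",
        headers={"User-Agent": USER_AGENT},
        method="HEAD",
    )


def probe(host: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(build_probe(host), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no HTTP code at all.
        return None
```

A HEAD on a static file keeps the load on the far end minimal, and the User-Agent tells whoever reads the logs who to mail.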
If I was implementing a monitor like this, I'd use the following logic:
HTTP 200 returned via v4/v6 == all is well
HTTP 200 returned via v4 or v6, no HTTP code returned via v4 or v6 (ie one path works) == v6/v4 potentially broken.
no HTTP code returned via either method == end site problem. nothing we can do. don't alert.
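The decision logic above could be written down roughly as follows. This is only a sketch: the function name `classify` and the result labels are invented for illustration, `None` stands for "no HTTP code returned", and the final "http-error" case (some non-200 code came back) is an assumption, since the logic as posted does not say what to do then.

```python
from typing import Optional


def classify(v4_status: Optional[int], v6_status: Optional[int]) -> str:
    """Map the per-protocol HTTP results onto the cases listed above.

    A status of None means no HTTP code was returned over that protocol.
    """
    if v4_status == 200 and v6_status == 200:
        return "ok"              # all is well
    if v4_status is None and v6_status is None:
        return "site-down"       # end site problem, do not alert
    if v4_status == 200 and v6_status is None:
        return "v6-suspect"      # one path works, v6 potentially broken
    if v6_status == 200 and v4_status is None:
        return "v4-suspect"      # one path works, v4 potentially broken
    # Assumption: not covered by the posted logic (e.g. a 500 on one path).
    return "http-error"
```

Only the two "suspect" results would raise an alert under this scheme.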
And then you get an alert, who are you going to call?
Presumably you'd also implement a TCP 80 check as well.
Ehmmm, you do realize that if you are able to get an HTTP response that you have (unless doing HTTPS) actually already contacted port 80 over TCP? :)
Greets,
Jeroen