Sorry! Here's the URL content (re. Paging Google...)
Doh! I had no idea my thread would require a login and be hidden from general view! (A robots.txt info site had directed me there...) It seems I fell for an SEO scam... how ironic. I guess that's why I haven't heard from Google. Anyway, here's the page content (with some editing and paraphrasing):

Subject: Paging Google! robots.txt being ignored!

Hi. My robots.txt has been in place since August, but Google still has tons of results that violate the file. The checker at http://www.searchengineworld.com/cgi-bin/robotcheck.cgi doesn't complain (other than about the use of Google's nonstandard extensions described at http://www.google.com/webmasters/remove.html).

That page says it's OK that

    #per [[AdminRequests]]
    User-agent: Googlebot
    Disallow: /*?*

comes last (after User-agent: *), and seems to suggest that the syntax is OK. I also tried

    User-agent: Googlebot
    Disallow: /*?

but it hasn't helped.

I asked Google to review it via the automatic URL removal system (http://services.google.com/urlconsole/controller). The result:

    URLs cannot have wild cards in them (e.g. "*").
    The following line contains a wild card:
    DISALLOW: /*?

How insane is that? Oh, and while /*?* wasn't per their example, it was legal per their syntax, just as /*? is!

The site has around 35,000 pages, and I don't think a small robots.txt that does what I want is possible without using the wildcard extension.
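For what it's worth, here's a quick Python sketch of how I read the wildcard matching described on the remove.html page. This is just my interpretation of their description, not Google's actual matcher, and the example paths are made up:

    import re

    def googlebot_match(pattern, path):
        # My reading of the extension: '*' matches any (possibly empty)
        # run of characters, and a Disallow rule applies to any URL path
        # that begins with the expanded pattern.
        regex = ''.join('.*' if ch == '*' else re.escape(ch) for ch in pattern)
        return re.match(regex, path) is not None

    # Any URL with a query string is caught by "Disallow: /*?" ...
    assert googlebot_match('/*?', '/wiki/SomePage?action=edit')
    # ... while plain static pages are left alone:
    assert not googlebot_match('/*?', '/wiki/SomePage')
    # And "Disallow: /*?*" blocks exactly the same set of URLs, since
    # the trailing '*' can match the empty string:
    assert googlebot_match('/*?*', '/wiki/SomePage?action=edit')

Under that reading, a single Disallow: /*? line covers every dynamic URL on the site, which is exactly why I don't see how to do this for ~35,000 pages without the extension.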
Matthew Elvey