Quote:
Originally Posted by Dephormation
Robots.txt doesn't allow (check the spec), it disallows.
It's a denial mechanism.
What's required is a mechanism of consent, where no consent (i.e., explicit consent is not present) means no consent.
Pete.
The original robots.txt spec only does Disallow. However, the benchmark they have set is Google, and Google's bots support an Allow extension. Google's bots also check for robots meta tags in the documents. Checking those would require intercepting the page first, though.
Interestingly, if they do obey robots.txt (at all), then they won't be able to use searches done on Google! What a shame.
http://www.google.com/robots.txt disallows all the actual search pages for all user agents.
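
For anyone who wants to see how that plays out, here is a rough sketch using Python's standard urllib.robotparser. The rules, the "ExampleBot" user agent and the example.com URLs are all made up for illustration (this is not a copy of Google's actual file), and note that this parser applies the first matching rule, which is not necessarily how any particular crawler resolves Allow against Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT a copy of any real robots.txt.
SAMPLE_RULES = """\
User-agent: *
Allow: /search/about
Disallow: /search
"""


def build_parser(rules_text):
    """Build a parser from robots.txt text, without any network access."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(SAMPLE_RULES)

# "ExampleBot" and the example.com URLs are assumed names, not real ones.
search_blocked = not parser.can_fetch(
    "ExampleBot", "http://www.example.com/search?q=test"
)
about_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/search/about"
)
home_allowed = parser.can_fetch("ExampleBot", "http://www.example.com/")

# With no rules at all, nothing is denied: absence of a rule means "go ahead".
empty_allows_all = build_parser("").can_fetch(
    "ExampleBot", "http://www.example.com/anything"
)

print(search_blocked, about_allowed, home_allowed, empty_allows_all)
```

The last check is really Pete's point: with an empty file everything is fetchable, so the mechanism only ever expresses denial, never consent.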