Quote:
Originally Posted by AlexanderHanff
When you send out a web request to a web site, Phorm (not you) will go off and look for robots.txt (providing it is not already cached) to check if search engines are allowed to spider. This stage is the one where they refuse to tell us what user-agent they will use.
This has just given me an idea, although not exactly a straightforward proposition.
If/when the BT trial happens, it's not going to be that difficult to work out what that user agent is: just visit your own website (assuming you are Phormed) and then check your web logs.
Now, assuming they don't forge a Googlebot user agent and instead use their own unique one, it should be fairly simple to configure a web server to serve robots.txt from a script (I am sure I could set this up easily with Apache/PHP) and return different content based on the user agent: if it's a Phorm user agent, deny the entire site; if not, serve your usual robots.txt.
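Something along these lines ought to do it, purely as a sketch: the Apache rewrite rule, the "phorm" substring match and the robots.real.txt filename are all my own assumptions, since nobody knows the actual user-agent string yet, so swap in whatever turns up in your logs once the trial starts.

<?php
// Sketch only. Assumed Apache rewrite (e.g. in .htaccess) routing robots.txt here:
//   RewriteEngine On
//   RewriteRule ^robots\.txt$ robots.php [L]

header('Content-Type: text/plain');

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Hypothetical pattern - replace "phorm" with the real string from your logs.
if (stripos($ua, 'phorm') !== false) {
    // Deny the entire site to the Phorm crawler.
    echo "User-agent: *\n";
    echo "Disallow: /\n";
} else {
    // Everyone else (including legitimate crawlers) gets the normal file.
    readfile('robots.real.txt');
}
?>

Your usual robots.txt content just lives in robots.real.txt, so Googlebot and friends never see the deny rule.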
Although this still doesn't get round the implied consent/default opt-in issue for webmasters/content authors, it's something to think about.
Regards...
T