Quote:
Originally Posted by AlexanderHanff
I don't for one minute think Phorm would honour robots.txt if it explicitly denies them access. This is exactly why they won't tell us what user-agent they plan to use: they don't want to be denied access.
Let's not forget that robots.txt is not an access control mechanism. It is an honour-based system which robots can either adhere to or ignore; it doesn't physically stop them accessing pages.
If their user-agent ever does get discovered, it would be useful to add a script to your site which checks the user-agent and, if the Phorm user-agent is detected, builds a page which says something like "Get your hands off me, you dirty ape!" or "Phorm is not welcome here, please go away." etc.
Alexander Hanff
|
I agree that this still relies on them honouring the robots.txt they get served. I am just going by what they have said so far and trying to come up with something that will block Phorm but allow Google. I know we shouldn't have to resort to this; I was just throwing the idea out there.
My suggestion was just for the robots.txt request. For any other page (e.g. stage two, as you put it) I believe they just pass on the end user's user agent, which is useless in this situation.
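For anyone who wants to try it, here is a rough Python sketch of the idea: pick the robots.txt body based on the User-Agent header of the request. The token in BLOCKED_UA_TOKENS is a made-up placeholder, since nobody knows the real Phorm user-agent yet, and robots_txt_for is just a name I picked. It still only works if they send something unique and then honour what they are served.

```python
# Hypothetical placeholder: the real Phorm user-agent has not been published.
BLOCKED_UA_TOKENS = ("phorm-placeholder",)

# An empty Disallow means "everything is allowed".
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# "Disallow: /" asks the robot to stay away from the whole site.
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent):
    """Return the robots.txt body to serve for a given User-Agent header."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return DENY_ALL
    return ALLOW_ALL


if __name__ == "__main__":
    # Googlebot (and everyone else) gets the permissive file.
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    # A request carrying the placeholder token gets the deny-all file.
    print(robots_txt_for("Phorm-Placeholder/1.0"))
```

You would call this from whatever handles requests for /robots.txt on your server and send the result back as text/plain.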
Regards...
T
---------- Post added at 10:13 ---------- Previous post was at 10:08 ----------
Quote:
Originally Posted by R Jones
That's assuming that there IS a legitimate Phorm/Webwise user agent. My personal view is that there won't be one, based on analysing the silences and fudges from my ISP. What they DON'T say is far more revealing than what they DO say, which is why I keep asking them awkward questions: to find out which ones they don't answer.
|
I did say it relies on them having a unique user agent but, as Alexander puts it, if they do not have a cached version of robots.txt they will make a request for one. I would imagine this request originates from the Phorm equipment and not the end user, so do they forge the user's user-agent, or Google's etc., or use their own? This is the key question, as you already know, but if/when the trial goes live, it should be answered very quickly by looking at your own web server logs.
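As a starting point for the log check, here is a small Python sketch that pulls every request for /robots.txt out of an access log and counts them by source IP and user-agent. It assumes your server writes the common "combined" log format; the sample lines are invented (documentation IP ranges, made-up agent names) purely to show the shape of the input, and robots_txt_requesters is a hypothetical helper name.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def robots_txt_requesters(lines):
    """Count robots.txt requests per (ip, user-agent) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        # Ignore any query string when comparing the path.
        path = match.group("path").split("?", 1)[0]
        if path == "/robots.txt":
            counts[(match.group("ip"), match.group("ua"))] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2000:00:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [01/Jan/2000:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Jan/2000:00:00:02 +0000] "GET /robots.txt HTTP/1.0" 200 26 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (ip, ua), n in robots_txt_requesters(SAMPLE).items():
        print(n, ip, ua)
```

To run it on a real log, pass an open file instead of SAMPLE. A robots.txt request carrying an ordinary browser user-agent, or arriving from an address that never fetches anything else, would be the sort of thing worth a closer look.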