13-05-2008, 09:35 #6428
AlexanderHanff
Re: Virgin Media Phorm Webwise Adverts [Updated: See Post No. 1, 77, 102 & 797]

Quote:
Originally Posted by rryles
They've said that robots.txt will be cached* and not fetched for every Phormed user. So it seems unlikely to me that they would pick a random user and forge her user-agent string.

Also: if, by some miracle, they actually followed the robots.txt standard, then the user-agent they match against in robots.txt and the one they send in the HTTP headers would have to be the same:

"The name token a robot chooses for itself should be sent
as part of the HTTP User-agent header, and must be well documented."**

Sources:

* From http://www.cl.cam.ac.uk/~rnc1/080404phorm.pdf "40. Once the robots.txt file (if any) has been fetched, it will be cached. The cache retention period will be value set by the website using standard HTTP cache-control mechanisms, or for one month if no period is specified. The minimum period that the file will be cached for is two hours."

** From http://www.robotstxt.org/norobots-rfc.txt
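A minimal Python sketch of the caching rule quoted in footnote * above. It assumes that only the Cache-Control max-age directive counts as the period "set by the website" and takes "one month" to mean 30 days; both are assumptions, not Phorm's wording. The function name robots_cache_ttl is hypothetical.

```python
import re
from typing import Optional

# Figures from paragraph 40 of the document cited in footnote *.
# ASSUMPTION: "one month" is treated as 30 days.
MIN_TTL_SECONDS = 2 * 60 * 60            # never cached for less than two hours
DEFAULT_TTL_SECONDS = 30 * 24 * 60 * 60  # used when the site sets no period

# ASSUMPTION: only max-age is honoured; Expires, s-maxage, no-store etc. are ignored.
_MAX_AGE = re.compile(r"max-age\s*=\s*(\d+)", re.IGNORECASE)


def robots_cache_ttl(cache_control: Optional[str]) -> int:
    """Return how many seconds a fetched robots.txt would stay cached."""
    ttl = DEFAULT_TTL_SECONDS
    if cache_control:
        match = _MAX_AGE.search(cache_control)
        if match:
            ttl = int(match.group(1))  # site-supplied period in seconds
    # A short site-supplied period is raised to the two-hour floor.
    return max(ttl, MIN_TTL_SECONDS)
```

For example, robots_cache_ttl("max-age=60") gives 7200 because of the two-hour floor, while robots_cache_ttl(None) falls back to the 30-day default. The practical upshot is that a site cannot make Phorm re-check its robots.txt more often than every two hours.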
I think you misunderstood me. I was trying to explain that the system will consist of two stages. When you send a web request to a website, Phorm (not you) will go off and look for robots.txt (provided it is not already cached) to check whether search engines are allowed to spider the site. This is the stage where they refuse to tell us which user-agent they will use.

The second stage is them actually forwarding your original request (yes, there are some redirects and other things going on in between, but let's keep it simple), and here we can only assume your real user-agent will be used. Phorm has certainly given no indication that they will use a different user-agent for these requests (and realistically they wouldn't want to, as they could then easily be identified and blocked).
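A rough Python sketch of the two stages as described above. It uses a made-up robot token, ExampleProfiler/1.0, because Phorm has not disclosed the real one, and relies on the standard library's urllib.robotparser for the robots.txt matching. The function names are hypothetical, nothing is sent over the network, and stage two keeping the browser's user-agent is an assumption based on the lack of any statement from Phorm to the contrary.

```python
from urllib.parse import urljoin
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robot token: the real one has not been disclosed.
ROBOT_TOKEN = "ExampleProfiler/1.0"


def robots_request(page_url: str) -> Request:
    """Stage 1a: the robots.txt fetch, identifying itself by the robot token."""
    return Request(urljoin(page_url, "/robots.txt"),
                   headers={"User-Agent": ROBOT_TOKEN})


def stage_one_allowed(robots_txt: str, url: str) -> bool:
    """Stage 1b: decide from a (possibly cached) robots.txt whether spidering is allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # robotparser compares the part of the token before "/" with the
    # User-agent lines, so this is the same token sent in robots_request().
    return parser.can_fetch(ROBOT_TOKEN, url)


def stage_two_request(url: str, browser_user_agent: str) -> Request:
    """Stage 2: the forwarded request, ASSUMED to carry the user's own User-Agent."""
    return Request(url, headers={"User-Agent": browser_user_agent})
```

With a robots.txt of "User-agent: ExampleProfiler\nDisallow: /private/\n", stage_one_allowed() returns False for http://example.com/private/page.html and True for http://example.com/news.html. The stage-two request carries whatever user-agent string the browser supplied, which is why only the stage-one token would single the system out.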

Alexander Hanff