Quote:
Originally Posted by Phormic Acid
That certainly has potential. However, I think it would be very tricky to apply it as a general technique to all websites. Cookies can be set by JavaScript (and Java), so you can’t rely on the Set-Cookie headers from the server alone. If you were to look at the Cookie headers sent by the browser, you won’t have information on the path, domain or, most importantly, the expiry time.
If you’re not going to store raw information, but only one-way hash values, even using Set-Cookie headers has limitations.
- Cookies are assigned not just to a particular domain, but to a particular path within each domain. I don’t usually allow cookies to be set, but I’ve gone around trying to pick up a representative sample. They all had path=/. So, in practice, you might be able to recognise the same user accessing any page of a website, using the cookie hash for any other page.
- Cookies have an expiry time. If you wanted to recognise the same user using both the same browsing session and a different one, you would need to create two different hash values – one containing all cookies and one containing only long-lived cookies. The occasional cookie with an expiry time in the very near future could be treated the same as a session cookie.
- Cookies for a domain can be set by more than one website. For example, site1.example.com and site2.example.com can both set cookies for the same domain of .example.com. So, site1.example.com could return cookies set by site2.example.com, and vice versa.
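To make the two-hash idea from the expiry point concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not anything a real system is known to do: the function names, the one-hour cut-off for "very near future" and the choice of SHA-256 are all made up for the example, and path and domain are ignored entirely.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Iterable, Optional

SHORT_LIVED_SECONDS = 3600  # assumed cut-off; shorter lifetimes count as session cookies


def parse_set_cookie(header: str, now: datetime) -> tuple[str, str, Optional[float]]:
    """Return (name, value, lifetime in seconds); lifetime is None for a session cookie."""
    parts = [p.strip() for p in header.split(";")]
    name, _, value = parts[0].partition("=")
    lifetime: Optional[float] = None
    for attr in parts[1:]:
        key, _, val = attr.partition("=")
        key = key.strip().lower()
        try:
            if key == "max-age":
                lifetime = float(int(val))  # Max-Age wins over Expires
            elif key == "expires" and lifetime is None:
                expiry = parsedate_to_datetime(val.strip())
                if expiry.tzinfo is None:  # no zone given: assume UTC
                    expiry = expiry.replace(tzinfo=timezone.utc)
                lifetime = (expiry - now).total_seconds()
        except (TypeError, ValueError):
            pass  # unparseable attribute: leave lifetime unchanged
    return name.strip(), value.strip(), lifetime


def cookie_hashes(set_cookie_headers: Iterable[str], now: datetime) -> tuple[str, str]:
    """Return (hash of all cookies, hash of long-lived cookies only)."""
    all_pairs, long_pairs = [], []
    for header in set_cookie_headers:
        name, value, lifetime = parse_set_cookie(header, now)
        all_pairs.append(f"{name}={value}")
        if lifetime is not None and lifetime > SHORT_LIVED_SECONDS:
            long_pairs.append(f"{name}={value}")

    def digest(pairs: list[str]) -> str:
        # Sort so the hash does not depend on header order.
        return hashlib.sha256("\n".join(sorted(pairs)).encode()).hexdigest()

    return digest(all_pairs), digest(long_pairs)
```

The first hash would only match within one browsing session; the second should survive a browser restart, because session cookies and cookies about to expire are left out of it.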
You could try to hash the cookie header sent with the request for the final object within a page that’s stored within the same domain as the page itself. However, it might be better to teach the system which cookies are important. If an international user clicks on, say, http://news.bbc.co.uk/2/hi/africa/7470304.stm, you’ll see something like:
GET /adj/bbccom.live.site.news/news_africa_content;... HTTP/1.1
Host: ad.doubleclick.net
Referer: http://news.bbc.co.uk/2/hi/africa/7470304.stm
Cookie: id=80000282f0e0ca4
If the user got to that BBC page by clicking through one of your doctored search pages, even if DoubleClick aren’t one of your advertising networks, you can now link that user’s DoubleClick identifier to your own one for that user. You now get to track them across all websites that use DoubleClick, which I believe is a lot.
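As a sketch of what that linking step amounts to, the Python below pulls the third-party identifier and the referring page out of a request like the one quoted. Everything beyond the request itself is an assumption for illustration: the function names, the `links` table and the `user-123` identifier are hypothetical.

```python
from typing import Optional


def parse_headers(raw_request: str) -> dict[str, str]:
    """Parse the header lines of a raw HTTP request into a lower-cased dict."""
    headers: dict[str, str] = {}
    for line in raw_request.strip().splitlines()[1:]:  # skip the request line
        if not line.strip():
            break  # blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def third_party_id(raw_request: str, tracker_host: str,
                   cookie_name: str) -> Optional[tuple[str, str]]:
    """Return (referring page, tracker cookie value) if the request goes to tracker_host."""
    headers = parse_headers(raw_request)
    if headers.get("host") != tracker_host:
        return None
    cookies: dict[str, str] = {}
    for pair in headers.get("cookie", "").split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    if cookie_name not in cookies:
        return None
    return headers.get("referer", ""), cookies[cookie_name]


# The request from the post; the path is elided there and is left elided here.
REQUEST = """GET /adj/bbccom.live.site.news/news_africa_content;... HTTP/1.1
Host: ad.doubleclick.net
Referer: http://news.bbc.co.uk/2/hi/africa/7470304.stm
Cookie: id=80000282f0e0ca4
"""

links: dict[str, set[str]] = {}  # hypothetical: own user id -> tracker ids seen
found = third_party_id(REQUEST, "ad.doubleclick.net", "id")
if found is not None:
    referer, tracker_id = found
    links.setdefault("user-123", set()).add(tracker_id)
```

Note that the observer never needs to set or read the DoubleClick cookie itself; seeing the request go past is enough.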
Now that's a deviously brilliant idea. I'd figured it would be better to manually pick out persistent cookie values on sites such as Google, Amazon, eBay, BBC etc. that you could use to uniquely identify the user. With a lot of work you might eventually cover a fair number of the most popular sites that way, but I completely overlooked all the tracking networks' cookies!
Combine that lot together, fill in the blanks using the referrer and IP address + agent string, and I think you'd have a very effective, transparent tracking system. You'd probably only need to occasionally tamper with the user's data stream to check for your opt-out cookie (in fact, you might just as well only check for it when their browser requests one of your adverts).
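Roughly what I mean by filling in the blanks, as a Python sketch: use a known cookie value when one is seen, and fall back to a hash of IP address + agent string otherwise. The function names, the `fp:` prefix and the use of SHA-256 are assumptions for the example only.

```python
import hashlib
from typing import Optional


def fallback_key(ip: str, user_agent: str) -> str:
    """Weak identifier built from IP address and agent string."""
    return hashlib.sha256(f"{ip}\n{user_agent}".encode()).hexdigest()


def identify(known_ids: dict[str, str], cookie_value: Optional[str],
             ip: str, user_agent: str) -> str:
    """Prefer a recognised cookie value; otherwise fall back to the IP + agent hash."""
    if cookie_value is not None and cookie_value in known_ids:
        return known_ids[cookie_value]
    return "fp:" + fallback_key(ip, user_agent)
```

The fallback is much weaker than a cookie match: users behind the same NAT with the same browser collide, and a user whose IP address changes gets a new key.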
I guess it wouldn't be a bad idea to trick the user's browser into requesting various tracking sites by tampering with the traffic while they are visiting a site such as Google. That way you could more easily link together all the tracking cookies you're monitoring, their Google/search engine login cookie and your own tracking cookie.