
List:       bugtraq
Subject:    privacy problems with HTTP cache-control
From:       Martin Pool <mbp () LINUXCARE ! COM>
Date:       2000-03-29 5:19:21

  executive summary

     HTTP cache-control headers such as If-Modified-Since allow servers
     to track individual users in a manner similar to cookies, but with
     fewer constraints. This is a problem for user privacy against which
     browsers currently provide little protection.

  problem statement

     Alice is browsing the web; Bob runs a number of otherwise-unrelated
     web servers. Alice makes several requests to Bob's servers over
     time. Bob would like to tie together as many as possible of the
     requests made by Alice to learn more about Alice's usage patterns
     and identity: we call this identifying the request chain. Alice
     would like to access Bob's servers but not give away this
     information.

  existing approaches

    cookies

   The standard approach for associating a user's requests with one
   another is the HTTP `Cookie' state-management extension. The
   Set-Cookie response header allows a server to ask the client to store
   arbitrary short opaque data, which should be returned on future
   requests to that server matching particular criteria. Cookies are
   commonly used to store per-user form defaults, to manage web
   application sessions, and to associate requests between executions of
   the user agent.

   The user agent always has the option to just ignore the Set-Cookie
   response header, but most implementations default to obeying it to
   preserve functionality. A cookie can optionally specify an expiry
   time after which it should no longer be used, that it should persist
   on disk between client sessions, or that it should only be passed
   over transmission-level-secure connections.

   The privacy implications of cookies have been [1]extensively
   discussed, and several problems have been found and rectified in the
   past. One example of privacy compromise through cookies is the use of
   cookies attached to banner images downloaded from a central banner
   server: the same cookie is used within images linked from several
   servers, and so the user can be tracked as they move around.

    other approaches

   An obvious means to associate requests is by source IP address. Over
   the short term this will generally work quite well, as a client is
   likely to use a single IP address during a browsing session. Even then
   it is complicated by proxies acting for multiple clients, network
   address translation, or multiuser machines. Over a longer term, the
   information is confounded by dynamically-assigned IPs, mobile computers
   moving between networks, dialup pools and the like. Indeed, cookies
   were proposed in large part to allow legitimate stateful applications
   to cope with the impossibility of uniquely identifying users by IP
   address.

  the meantime exploit

   The basis of the meantime exploit is that the server wishes to
   `tag' the client with some information that will later be reported
   back, allowing the server to identify a chain. Cookies are a good
   approach to this, but their privacy implications are well known and so
   Bob requires a more surreptitious approach.

   The HTTP cache-control headers are perfect for this: the data is
   provided by the server, stored but not verified by the client, and
   then provided verbatim back to the server on the next matching
   request.

   Two headers in particular are useful: Last-Modified and ETag. Both are
   designed to help the client and server negotiate whether to use a
   cached copy or fetch the resource again.

   The general approach of meantime is that rather than using the headers
   for their intended purpose, Bob's servers will instead send down a
   unique tag for the client.

   Last-Modified is constrained to be a date, and therefore is somewhat
   inflexible. Nevertheless, the server can reasonably choose any second
   since the Unix epoch, which allows it to tag on the order of one
   billion distinct clients.
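
   As a minimal sketch of that encoding, the tag can be treated as a
   count of seconds since the epoch and rendered as an HTTP date. The
   function names are hypothetical, and the sketch assumes the tag is a
   non-negative integer no larger than the current Unix time.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def tag_to_last_modified(client_tag: int) -> str:
    """Render a hypothetical integer client tag as an HTTP date string."""
    if client_tag < 0:
        raise ValueError("client_tag must be non-negative")
    moment = datetime.fromtimestamp(client_tag, tz=timezone.utc)
    # usegmt=True yields the "GMT" suffix used in HTTP dates.
    return format_datetime(moment, usegmt=True)


def last_modified_to_tag(header_value: str) -> int:
    """Recover the tag from the date the client echoes in If-Modified-Since."""
    return int(parsedate_to_datetime(header_value).timestamp())


if __name__ == "__main__":
    header = tag_to_last_modified(954307161)
    print(header)
    print(last_modified_to_tag(header))
```

   Because the client stores the date without interpreting it, the
   round trip loses nothing at one-second resolution.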

   ETag allows an arbitrary short string to be stored and passed back.
   It is not yet widely implemented in user agents, however, and so is
   currently the less reliable choice.
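
   Where ETag is supported, the same idea might be sketched as below:
   the server issues a quoted opaque string, and the client returns it
   in If-None-Match. The helper names and the random 16-hex-digit tag
   are assumptions for illustration; only the first entity tag in the
   header is examined.

```python
import secrets
from typing import Optional


def make_etag(tag: Optional[str] = None) -> str:
    """Build an ETag value carrying a (hypothetical) per-client tag."""
    tag = tag or secrets.token_hex(8)  # 16 hex digits, chosen arbitrarily
    return '"' + tag + '"'            # entity tags are quoted strings


def tag_from_if_none_match(value: str) -> Optional[str]:
    """Extract the tag from the first entity tag in If-None-Match, if any."""
    first = value.split(",")[0].strip()
    if first.startswith("W/"):        # drop the weak-validator prefix
        first = first[2:]
    if len(first) >= 2 and first[0] == '"' and first[-1] == '"':
        return first[1:-1]
    return None                       # e.g. "*" or a malformed value


if __name__ == "__main__":
    etag = make_etag("client-42")
    print(etag, tag_from_if_none_match(etag))
```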

   In both cases the tag will be lost if the client discards the resource
   from its cache, or if it does not request the exact same resource in
   the future, or if the request is unconditional. (For example, Netscape
   sends an unconditional request when the user presses Shift+Reload.)
   Bob has less control over this than he has with cookies, which can be
   instructed to persist for an arbitrarily long period.

   The date is only sent back for the exact same URL, including any query
   parameters. By contrast, cookies can be returned for all resources in
   a site or section of a site. This makes Bob's job a little harder.
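
   These limits can be illustrated with a toy model of a client cache.
   This is a simplified sketch, not the behaviour of any particular
   browser: the class and the example.invalid URLs are hypothetical,
   and the cache keeps only the validator, keyed by the exact URL.

```python
class ToyClientCache:
    """Toy model: remembers one Last-Modified value per exact URL."""

    def __init__(self):
        self._validators = {}  # exact URL (with query string) -> date string

    def store(self, url, last_modified):
        self._validators[url] = last_modified

    def evict(self, url):
        self._validators.pop(url, None)

    def request_headers(self, url, force_reload=False):
        """Headers the client would add when requesting `url`."""
        if force_reload or url not in self._validators:
            return {}  # unconditional request: no tag is reported
        return {"If-Modified-Since": self._validators[url]}


if __name__ == "__main__":
    cache = ToyClientCache()
    url = "http://bob.example.invalid/pixel.gif?v=1"
    cache.store(url, "Thu, 01 Jan 1970 00:16:40 GMT")
    print(cache.request_headers(url))                     # tag echoed
    print(cache.request_headers(url, force_reload=True))  # tag withheld
    print(cache.request_headers("http://bob.example.invalid/pixel.gif?v=2"))
```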

   Bob therefore should make sure that all pages link to a small common
   resource: perhaps a one-pixel image. This image is generated by a
   script that issues a unique timestamp to each new client and records
   it, and that logs any timestamp a returning client sends back.
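
   A rough sketch of such a script follows. It is not the code behind
   the demonstration referenced below; the class, the sequential tag
   counter, and the `page' argument (standing in for whatever Bob knows
   about the referring page) are all assumptions made for illustration.

```python
import itertools
from email.utils import formatdate, parsedate_to_datetime


class PixelTracker:
    """Hypothetical handler for the shared one-pixel image."""

    def __init__(self, first_tag=1):
        self._tags = itertools.count(first_tag)  # one second per new client
        self.log = []                            # (tag, page) observations

    def handle(self, page, if_modified_since=None):
        """Return (status, headers) for one request for the image."""
        tag = None
        if if_modified_since is not None:
            try:
                when = parsedate_to_datetime(if_modified_since)
                tag = int(when.timestamp())
            except (TypeError, ValueError):
                tag = None                       # unparseable date: treat as new
        if tag is not None:
            self.log.append((tag, page))         # returning client: extend chain
            return 304, {}                       # client keeps its cached copy
        tag = next(self._tags)
        self.log.append((tag, page))             # new client: issue a fresh tag
        headers = {"Last-Modified": formatdate(tag, usegmt=True),
                   "Content-Type": "image/gif"}
        return 200, headers


if __name__ == "__main__":
    tracker = PixelTracker(first_tag=1000)
    status, headers = tracker.handle("/index.html")
    tracker.handle("/products.html", headers["Last-Modified"])
    print(tracker.log)
```

   Answering a conditional request with 304 leaves the cached copy,
   and so the tag, in place for the next visit.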

For a demonstration, more explanation and details, please see

  http://www.linuxcare.com.au/mbp/meantime/

Cheers!
--
Martin Pool, Linuxcare, Inc.
+61 2 6262 8990
mbp@linuxcare.com, http://www.linuxcare.com/
Linuxcare. Support for the revolution.
