Lest you think the recent high-profile info leak cases are unique, new research finds that many popular websites leak your information to third parties — often on purpose.
And while each website might be leaking only a small portion of your information, the powerful tracking tools that receive it can piece all those small tidbits together into a pretty clear picture of who you are and what you are interested in.
That's the conclusion of a study of more than 100 popular websites, used by tens of millions of people, which found that three-quarters of them directly leak either private information or users' unique identifiers to third-party tracking sites, according to research co-author Craig Wills, professor of computer science at Worcester Polytechnic Institute (WPI).
Wills's research also demonstrated how the leakage of private information by many different sites (including email addresses, physical addresses, and even the configuration of a user's web browser) could permit tracking sites to link many disparate pieces of information, including browsing histories contained in tracking cookies and the contents of searches on health and travel sites, to create detailed profiles of individuals.
"Despite a number of proposals and reports put forward by researchers, government agencies and privacy advocates, the problem of privacy has worsened significantly," Wills said of his study, which he presented last week at the Web 2.0 Security and Privacy Conference in Oakland, Calif. "With the growing disconnect between the existing and proposed privacy protection measures and the increasing and increasingly worrisome linkage of personal information from all sorts of websites, we believe it is time to move beyond what is clearly a losing battle with third-party aggregators and examine what roles first-party sites can play in protecting the privacy of their users."
Creating a profile of you
The researchers, who had previously brought attention to the leakage of personal information from many popular social networking sites, decided to explore the handling of private information by conventional websites, an area that has gone largely unexamined, Wills said. They focused on sites that encourage users to register, since users often share personal and personally identifiable information, including their names and physical and email addresses, during the registration process. They also examined popular health and travel sites, since users conduct searches on these sites that can point to their health issues or reveal their travel plans.
They found that information is leaked to third-party sites that track users' browsing behavior for advertisers through a number of routes. In some cases, information was passed deliberately to the third-party sites. In others, it was included, either deliberately or inadvertently, as part of routine information exchanges with these sites. Depending on the site, the leakage occurred as users were creating, viewing, editing or logging into their accounts, or while navigating the websites. The researchers also observed sensitive search terms (such as "pancreatic cancer") being leaked by health sites and travel itineraries being leaked by travel sites.
The researchers examined the types of information being leaked by the websites and rated each according to its sensitivity and its ability to identify users. A user's name, phone number or email address rated highest on the "identifiability" scale, for example, while health information and travel itineraries rated highest on the sensitivity scale. Although most of the leaked information rated low on both scales, the authors cautioned that this does not mean users can dismiss privacy leaks from websites, since trackers can combine many individually minor items into a detailed profile.
No easy solution
The study also evaluated a range of actions that web users could take to prevent their information from being leaked, including blocking the setting of cookies and using an advertising-blocking utility or the blocking features built into the newest versions of some popular browsers. They found that all of these techniques miss some types of leakage. Ad blockers, for example, do not reliably block leakage to so-called hidden third-party sites and also impair the usability of some websites.
They also reviewed proposals included in a December 2010 report on online privacy released by the Federal Trade Commission (FTC).
"The report advocates the Privacy by Design initiative, which seeks proactive embedding of privacy at the design stage, defaults to be set to private, transparence about users' information, and access to all user-related sensitive data stored in aggregators," the study said. But even these proposals fail to provide safeguards against the linkage of user information by third-party sites or leakage to hidden third parties, and they do not include methods for either verifying that third-party sites abide by the guidelines or penalizing those that do not, according to the researchers.
"A key failure of the FTC report is that it largely ignores the responsibility of websites in safeguarding the privacy of their users," Wills said. "These sites should play a custodial role in protecting their users and preventing the leakage of their sensitive or identifiable information. Third-party sites have a powerful economic incentive to continue to collect and aggregate user information, so relying on them to protect user privacy will continue to be a losing battle. It is time to put the focus on what first-party sites can and should do."