MADRID, Spain — Regardless of how much consumers do to protect themselves, many accept that their privacy is at risk every time they go online. A new report on website tracking likely won't ease that perception. Researchers say their review found that more than 150 million websites contain sensitive content. Even more concerning, information about your activity on these sites can be tracked and shared with others.
Researchers say this is a major issue in light of laws restricting the collection and processing of personal data online. The European Union's General Data Protection Regulation (GDPR) specifically protects information revealing racial or ethnic origin, political opinions, religious beliefs, or trade union membership. The regulation also covers data concerning a person's health, sex life, or sexual orientation.
Over two years, the researchers used machine learning to analyze one billion sites across the English-speaking web. The team, from TU Berlin and the Cyprus University of Technology, developed a way to identify sensitive URLs. These websites include content related to health, politics, and sexual orientation, yet visits to them are tracked and shared just like visits to the rest of the internet.
“Tracking people when they visit websites with content that belongs to the GDPR sensitive categories is the true ‘Elephant in the Room’ of privacy,” Nikolaos Laoutaris of IMDEA Networks Institute says in a media release.
"Most people don't mind being tracked about things that they consider innocent, but would be very upset to know that their visits to sensitive websites are being logged and released to unknown third parties. Our study is, by far, the biggest study about tracking of sensitive topics on the web. It shows that a good part of the web includes content of sensitive character. Unfortunately, these sensitive pages appear to be as tracked as the rest of the web."
Making online privacy a priority
Researchers point out that machine learning makes it possible, for the first time, to classify websites quickly and assess the risks they pose to users. They say a web browser or a browser add-on could warn users before they open a website with sensitive content that may be tracked. When someone visits such a site, trackers could be blocked and complaints filed automatically under privacy laws.
The main obstacle is classifying these URLs correctly. The study's authors note that terms like "health" are ambiguous: they appear both in legally sensitive documents and on websites that merely discuss healthy eating, sports, or diseases.
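To see why that ambiguity is a problem, consider the simplest possible approach: flag any page whose text contains a sensitive keyword, and warn the user only if the page also loads third-party trackers. The sketch below is purely illustrative and is not the researchers' machine learning method. The keyword lists and the function names `naive_categories` and `should_warn` are hypothetical.

```python
import re
from typing import Set

# Hypothetical keyword lists for illustration only; NOT the study's classifier.
SENSITIVE_KEYWORDS = {
    "health": {"health", "diagnosis", "hiv", "depression"},
    "politics": {"election", "party", "vote"},
    "sexual_orientation": {"lgbt", "gay", "lesbian"},
}


def naive_categories(text: str) -> Set[str]:
    """Return every sensitive category whose keywords appear as whole words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {cat for cat, keywords in SENSITIVE_KEYWORDS.items() if words & keywords}


def should_warn(text: str, has_third_party_trackers: bool) -> bool:
    """Hypothetical browser-side check: warn only if the page looks sensitive AND is tracked."""
    return has_third_party_trackers and bool(naive_categories(text))


# A healthy-eating blog and an HIV clinic page both get the "health" label,
# even though only the second is plausibly GDPR-sensitive.
print(naive_categories("Health tips: eating more vegetables"))
print(naive_categories("Find an HIV testing clinic"))
```

Both pages come back as `{'health'}`, which is exactly the false positive the authors describe. Telling them apart requires looking at context rather than isolated words, which is one motivation for using a trained classifier instead of a keyword list.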
Laoutaris and the team will present their results at the ACM Internet Measurement Conference 2020 in Pittsburgh.