
Science Shouldn’t Give Data Brokers Cover for Stealing Your Privacy

2023-06-16 19:10:49

When SafeGraph got caught selling location information on Planned Parenthood visitors last year, the data broker responded to public outcry by removing its family planning center data. But CEO Auren Hoffman tried to flip the script, claiming his company’s practice of harvesting and sharing sensitive data was actually an engine for beneficial research on abortion access—brandishing science as a shield for shredding people’s privacy.

SafeGraph’s move to cloak its privacy pillaging behind science is just one example of an industry-wide dodge. Other companies such as Veraset, Cuebiq and X-Mode also operate so-called “data for good” programs with academics and seized on the COVID pandemic to expand them. These brokers provide location data to academic researchers with prestigious publications in venues such as Nature and the Proceedings of the National Academy of Sciences USA. Yet in 2020 Veraset also gave Washington, D.C., officials bulk location data on hundreds of thousands of people without their consent. And a proposed class-action lawsuit this year named Cuebiq, X-Mode and SafeGraph among data brokers that bought location data from the family tracking app Life360 without users’ consent.

Data brokers are buying and selling hundreds of millions of people’s location information, and too many researchers are inadvertently providing public-relations cover to this massive privacy invasion by using the data in scientific studies.

Researchers must carefully consider whether such data make them accomplices to this dubious practice. Lawmakers must act now to halt this trampling of Americans’ privacy rights. And the legal barricades that prevent full scrutiny of data brokers’ abuses must be dismantled.

SafeGraph’s removal of the clinic data was the real problem, Hoffman argued in a May 2022 interview with the now-defunct tech news site Protocol: “Once we decided to take it down, we had hundreds of researchers complain,” he said. Yet when pressed, he could not name any—and the fact remains that the data put actual abortion seekers, providers and advocates in danger in the wake of the U.S. Supreme Court’s ruling on Dobbs v. Jackson Women’s Health Organization.

Location data brokers such as SafeGraph, Veraset and the others simply don’t meet the standards for human subjects demanded of researchers, starting with the fact that meaningful “opt in” consent is consistently missing from their business practices. Data brokers often argue that the data they collect are opt in because users have agreed to share that information with an app—even though the overwhelming majority of users have no idea that it’s being sold on the side to brokers who, in turn, sell it to businesses, governments, local law enforcement and others.

In fact, Google concluded that SafeGraph’s practices were so out of line that it banned any apps using the company’s code from its Google Play app store, and both Apple and Google banned X-Mode from their respective app stores.

Furthermore, the data feeding into data brokers’ products can easily be linked to identifiable people despite the companies’ weak claims of anonymization. Information about where a person has been is itself enough: One widely cited study from 2013 found that researchers could uniquely characterize 50 percent of people using only two randomly chosen time and location data points.
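The intuition behind that finding is easy to reproduce on synthetic data. The sketch below is a toy illustration only — the population size, number of locations and sampling scheme are all invented, and this is not the 2013 study’s data or method — but it shows how quickly a handful of coarse time-and-place points singles out one trace among many:

```python
import random

random.seed(0)

# Toy population of "anonymized" traces: each person is a set of
# (hour, cell-id) points over one day. All parameters here are made
# up for illustration.
N_PEOPLE, N_CELLS, N_HOURS = 1_000, 40, 24
traces = [
    {(h, random.randrange(N_CELLS)) for h in range(N_HOURS)}
    for _ in range(N_PEOPLE)
]

def unique_fraction(k):
    """Fraction of traces that are the only one in the data set
    containing k randomly chosen points from that trace."""
    unique = 0
    for trace in traces:
        # random.sample needs a sequence, so sort the set first
        points = set(random.sample(sorted(trace), k))
        matches = sum(1 for t in traces if points <= t)
        if matches == 1:  # only the trace itself contains these points
            unique += 1
    return unique / N_PEOPLE

for k in (1, 2, 4):
    print(f"{k} point(s) pin down {unique_fraction(k):.0%} of traces")
```

Even with “anonymized device IDs,” the trace itself acts as a fingerprint: the more points an attacker knows about a person (a home, a workplace, a clinic visit), the fewer other traces can possibly match.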

Due to rapid growth of social media and smartphone use, data brokers today collect sensitive user data from a much wider variety of sources than in 2013, including hidden tracking in the background of mobile apps. While techniques vary and are often obscured behind nondisclosure agreements (NDAs), the resulting raw data they collect and process are based on sensitive, individual location traces.

Aggregating location data can sometimes preserve individual privacy, with safeguards accounting for the size of the data set and the type of data it includes. But no privacy-preserving aggregation protocols can justify the initial collection of location data from people without their consent.
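One common safeguard of this kind is small-count suppression: publish per-area totals only when enough distinct people fall into each bucket. The sketch below is a generic illustration — the place names, visitor IDs and threshold are hypothetical, and real privacy-preserving protocols (e.g., differential privacy) are considerably more involved:

```python
# Hypothetical raw records: (person_id, area) visit events.
visits = [
    ("u1", "riverside"), ("u2", "riverside"), ("u3", "riverside"),
    ("u4", "riverside"), ("u5", "riverside"),
    ("u6", "old-town"), ("u7", "old-town"),
]

K_MIN = 5  # suppress any bucket with fewer than K_MIN distinct people

def aggregate(events, k_min=K_MIN):
    """Count distinct visitors per area, dropping small buckets
    that could single out individuals."""
    people_per_area = {}
    for person, area in events:
        people_per_area.setdefault(area, set()).add(person)
    return {
        area: len(people)
        for area, people in people_per_area.items()
        if len(people) >= k_min
    }

print(aggregate(visits))  # "old-town" (2 visitors) is suppressed
```

Note what the safeguard cannot do: it only constrains what is published downstream. The raw, individual-level traces still had to be collected first, which is exactly the step that proceeds without meaningful consent.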

Data brokers’ products are notoriously easy to reidentify, especially when combined with other data sets—and that’s exactly what some academic studies are doing. Studies have combined data broker locations with Census data, real-time Google Maps traffic estimates, local household surveys and figures from the Federal Highway Administration. While researchers appear intent on building the most reliable and comprehensive possible data sets, this merging is also a first step to reidentifying the data.

Behind layers of NDAs, data brokers typically hide their business practices—and the web of data aggregators, ad tech exchanges and mobile apps that their data stores are built on—from scrutiny. This should be a red flag for institutional review boards (IRBs), which oversee proposed research involving human subjects, and IRBs need visibility into whether and how data brokers and their partners actually obtain consent from users. Likewise, academics themselves have an interest in confirming the integrity and provenance of the data on which their work relies.

Without such verification, some researchers describe data broker information in language that mirrors the companies’ own marketing. For example, one paper described SafeGraph data as “anonymized human mobility data,” and another called them “foot traffic data from opt-in smartphone GPS tracking.” A third described data broker Spectus as providing “anonymous, privacy-compliant location data” with an “ironclad privacy framework.” None of this is close to the whole truth.

One Nature paper even paradoxically characterized Veraset’s location data as being both “fine-grained” and “anonymized.” Its specific data points included “anonymized device IDs” and “the timestamps, and precise geographical coordinates of dwelling points” where a device spent more than five minutes. Such fine-grained data cannot be anonymous.

Academic data sharing programs will remain disingenuous public relations ploys until companies obey data privacy and transparency requirements. The sensitive location data that brokers provide should only be collected and used with specific, informed consent, and subjects must have the right to withdraw that consent at any time.

We need comprehensive federal consumer data privacy legislation to enforce these standards—far more comprehensive than what Congress has put on the table to date. Such a bill must not preempt even stricter state laws; it should serve as a floor instead of a ceiling. And it must include a private right of action so that ordinary people can sue data brokers who violate their privacy rights, as well as strong minimization provisions that prohibit companies from processing a person’s data except as strictly necessary to provide them the service they asked for. The bill also must prohibit companies from processing a person’s data except with their informed, voluntary, specific, opt-in consent—not the opt-out scenario that often exists now—and must prohibit pay-for-privacy schemes in which companies charge more to, or provide lower-quality service to, those who refuse to waive their privacy rights.

And we must strip away the NDAs to allow research into the data brokers themselves: their business practices, their partners, the ways their data can be abused, and the steps that can be taken to protect the people they put in harm’s way.

Data brokers claim they are bringing transparency to tech or “democratizing access to data.” But their scientific data sharing programs are nothing more than attempts to control the narrative around their unpopular and nonconsensual business practices. Critical academic research must not become reliant on profit-driven data pipelines that endanger the safety, privacy and economic opportunities of millions of people without their meaningful consent.

This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.
