Benefits & Risks of Using Anonymized Health Data in Research

Dessislava Fessenko discusses some of the ethical challenges of access to and use of anonymized health data and how risks and benefits to society could be balanced.

__________________________________________

The use of Big Data and algorithms in drug discovery is gaining ground. Research shows that, in 2021, more than 100 regulatory submissions of drug and biological products to the United States Food and Drug Administration (“FDA”) used machine learning or another artificial intelligence (AI) technique to screen large amounts of anonymized health data, to automate data analysis, to predict outcomes, or for other uses during the drug development process. AI was used, for example, to screen patient demographics, lab values, or clinical outcomes. The use of computational techniques and vast troves of data promises societal benefits, as well as risks and ethical challenges, as the FDA cautions in a recent discussion paper.

The collection and analysis of anonymized health data can be beneficial if the data is collated into large datasets and processed computationally, that is, if it is turned into “Big Data”. Health-related Big Data holds high predictive value for biomedical research because it may reveal patterns and correlations between phenomena, such as between symptoms or conditions and likely health outcomes. This predictive capacity can enable investigators to design and conduct their studies in ways that better target specific outcomes, which can increase the success rate, accuracy, and speed of biomedical research, expand its scope (e.g., to rare diseases), and decrease costs.
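
To make this mechanism concrete, here is a minimal sketch in Python of the kind of pattern-finding described above. The data is entirely synthetic and the variable names are invented for illustration; this is not drawn from any actual study or dataset.

```python
# Minimal, hypothetical sketch: mining a large anonymized dataset for a
# symptom-outcome correlation. All data below is synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n = 100_000  # "Big Data": many records, no direct identifiers

# Invented anonymized records: a symptom flag and an outcome that is
# (by construction) more likely when the symptom is present.
symptom = rng.integers(0, 2, size=n)
outcome = (rng.random(n) < 0.10 + 0.15 * symptom).astype(int)
df = pd.DataFrame({"symptom": symptom, "adverse_outcome": outcome})

# The kind of pattern such data can reveal: the outcome rate conditioned
# on the symptom. At this scale the difference is statistically stable,
# which is what gives Big Data its predictive value for study design.
print(df.groupby("symptom")["adverse_outcome"].mean())
print("correlation:", round(df["symptom"].corr(df["adverse_outcome"]), 3))
```

In this synthetic construction, the conditional outcome rates (roughly 10 percent without the symptom and 25 percent with it) emerge clearly from the noise at 100,000 records; in a small sample, the same signal could easily be missed, which is one reason scale matters for predictive value.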

Photo Credit: Christiaan Colen/flickr. Image Description: Binary code.

The use of anonymized health data, and of Big Data in particular, is not, however, entirely risk-free, given the technical, methodological, and broader ethical and social implications of its processing. Research subjects’ privacy can be compromised if the data has not undergone irreversible anonymization and can be re-identified. If data is not adequately curated and de-biased, it may prove unrepresentative of the health needs of prospective patients, and the use of biased data may unintentionally harm patients, for example through inaccurate diagnoses. In the United States, the use of anonymized health data for primary or secondary research would not require consent because the data is de-identified. However, unintended (re)use might undermine research subjects’ trust if they are not informed of it. Trust might also be undermined when the data is accidentally disclosed or monetized.
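
As a rough, hypothetical illustration of why anonymization can be reversible: even with names and record numbers stripped, combinations of ordinary attributes (so-called quasi-identifiers) can single out an individual. The Python sketch below, using invented data, measures this with k-anonymity, a standard metric; a record whose quasi-identifier combination is unique (k = 1) can potentially be re-identified by linkage to outside data sources.

```python
# Hypothetical illustration of re-identification risk in "anonymized" data.
# k-anonymity counts how many records share each combination of
# quasi-identifiers; k == 1 means a record is uniquely exposed.
import pandas as pd

# Invented records: no names, but ZIP code, birth year, and sex remain.
records = pd.DataFrame({
    "zip":        ["02115", "02115", "02115", "90210", "90210"],
    "birth_year": [1984,    1984,    1990,    1975,    1975],
    "sex":        ["F",     "F",     "M",     "F",     "F"],
})

quasi_identifiers = ["zip", "birth_year", "sex"]
sizes = records.groupby(quasi_identifiers).size().rename("k").reset_index()
records = records.merge(sizes, on=quasi_identifiers)

print("k-anonymity of the dataset:", records["k"].min())
# Records with k == 1 are singled out by quasi-identifiers alone and
# could be linked to a named individual via external data sources:
print(records[records["k"] == 1])
```

Irreversible anonymization, in this framing, would require generalizing or suppressing such attributes until every record shares its quasi-identifier combination with a sufficiently large group.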

Benefits to society and risks from the (re)use of anonymized health data should be balanced in ways that uphold research subjects’ privacy, as well as their own and future patients’ fair treatment and safety. Preserving privacy would entail irreversible anonymization. To justify research subjects’ trust and to treat them fairly, investigators would need to disclose upfront, in understandable language, the possible purposes and manner of data use, including monetization. Adequate data curation and de-biasing would be needed to avert harm to future patients from unrepresentative data. Robust technical and cyber-security protocols are required to forestall accidental disclosure, data breaches, and error-prone processing that might undermine research subjects’ trust or harm patients. Data monetization in and of itself may not result in exploitation or injustice, because data holds no tangible value outside Big Data sets, as scholars and researchers have argued with regard to tissue and other biological specimens. Even so, trust-building and fair treatment through disclosure and protection against accidental or unintended use remain important.

The use of anonymized health data for research purposes could therefore be both a “greater good” and a liability. If collected and processed at large scale, such data holds high predictive value for biomedical research. Yet it may also pose risks to research subjects’ privacy and to their own and future patients’ fair treatment and safety because of the technical, methodological, and social challenges associated with data use. To balance the risks and benefits for society, data should be collected and used in fair, privacy-preserving, and trust-preserving ways, for example through greater transparency about the potential (re)use of anonymized health data and through forestalling, or at least mitigating, bias, unintended use, and accidental disclosure.

__________________________________________

Dessislava Fessenko is a Master of Bioethics candidate in the Center for Bioethics at Harvard Medical School. She is also a technology lawyer and a policy researcher working on AI and data policy and governance. @DessiFessenko