Almost two dozen repositories of research and public health data supported by the National Institutes of Health are marked for “review” under the Trump administration’s direction, and researchers and archivists say the data is at risk of being lost forever if the repositories go down.
“The problem with archiving this data is that we can’t,” Lisa Chinn, Head of Research Data Services at the University of Chicago, told 404 Media. Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement (DUA) between a researcher’s institution and the agency, and those agreements are carefully administered through a disclosure risk review process.
A message appeared at the top of multiple NIH websites last week that says: “This repository is under review for potential modification in compliance with Administration directives.”
Repositories with the message include archives of cancer imagery, Alzheimer’s disease research, sleep studies, HIV databases, and COVID-19 vaccination and mortality data. A list identified by an archivist includes:
- DASH Data and Specimen Hub
- National COVID Cohort Collaborative
- The DANDI Archive
- The Brain Image Library
- The Cancer Imaging Archive
- BioData Catalyst
- National Sleep Research Resource
- National Alzheimer’s Coordinating Center
- AgingResearchBiobank
- Seattle Alzheimer’s Disease Brain Cell Atlas
- ApoE Pathobiology in Aging & Alzheimer’s Disease
- Child Language Data Exchange System
- LDbase
- Cellular Senescence Network (SenNet)
- The National Center for Advancing Translational Sciences’ OpenData Portal
- Catalog of the NINDS Human Cell and Data Repository
- The Brain Research Through Advancing Innovative Neurotechnologies Initiative
- HIV databases
- The Neuroscience Multi-Omic Archive
- The Human Health Exposure Analysis Resource Data Center
- Mouse Models of Human Cancer Database
Based on archived versions of the websites, the message was added to most of the sites last week, around March 26 or 27. On March 28, Health and Human Services Secretary Robert F. Kennedy Jr. announced that HHS and agencies it oversees, including NIH, would lay off 10,000 full-time employees as part of a “reduction in force” plan. On Tuesday, at least five directors of NIH’s 27 institutes and centers were told they were put on leave. Kennedy’s plan outlines 1,200 layoffs at NIH alone. Yesterday, Kennedy said some of the cuts to programs will be reinstated. “Personnel that should not have been cut were cut. We’re reinstating them,” he said. “Part of the DOGE—we talked about this from the beginning—is we’re going to do 80 percent cuts, but 20 percent of those are going to have to be reinstalled, because we’ll make mistakes.” Earlier this week, researchers filed a lawsuit challenging the cancellation of research grants totaling more than $2.4 billion over the past month by NIH.
Under the Trump administration’s purge of public government websites and health resources, archivists have been diligently saving what they can. But there are limits to what can be archived by volunteers, and many of these databases marked for “potential modification” can’t be saved.
Even if someone does have access through a DUA, they might not have long-term access, or the data might only be accessible through secure devices that aren’t connected to external networks, so it can’t be downloaded or backed up. And much of the data contains personally identifying information or health information that’s protected under HIPAA, which complicates volunteers’ efforts to store it.
Henrik Schönemann, a historian who started the Safeguarding Research & Culture archivist project, told 404 Media that the project relies on institutions to contribute storage. If the archivists can’t guarantee that all of the data is legal to download and store, they can’t preserve it in partnership with an institution when the opportunity arises.
“In general it’s very important for us to be able to say to institutions, ‘yes we got public data, we did not break paywalls, we did not break any agreements, it’s fine for you to contribute with hosting,’” Schönemann said. The group is using BitTorrent to store and seed archived pages for now. But the NIH datasets under threat potentially contain multiple petabytes of data, and archivists need hosts to help with storage. “All of this is only possible for the publicly funded institutions if they can be sure they don’t host any infringing material,” he said.
“So far, it seems like what is happening is less that these data sets are actively being deleted or clawed back and more that they are laying off the workers whose job is to maintain them, update them and maintain the infrastructure that supports them,” a librarian affiliated with the Data Rescue Project told 404 Media. “In time, this will have the same effect, but it’s really hard to predict. People don’t usually appreciate, much less our current administration, how much labor goes into maintaining a large research dataset.”
“The impacts that I’ve personally seen are that researchers lose five years of research because they once had access and now their DUA is up, and there’s no one in office, because they’ve been fired, to renew their DUA,” Chinn said. “This means researchers can’t publish (de-identified versions) papers based on data analysis they’ve already completed.” She gave an example of research from the Department of Education, which has decades of studies that some researchers use to compare student performance and learning outcomes that teach us about how wealth and location impact education. In a scenario where that data is lost, “we will not have access to that data to compare year over year shifts in performance,” she said. “We will also not be able to compare, on a national scale, where we stand in comparison to other nations.”
“Right now, the best I can do is advise the researchers that they need to get copies of the data that they are researching with that’s restricted,” the librarian-archivist said.