The National Institutes of Health (NIH) announced on Tuesday the launch of the world’s largest database combining human genomic data with detailed clinical records, a major step toward personalized medicine.
The All of Us program, launched in 2018, recruits participants from a wide range of backgrounds and links their DNA profiles with real‑world health information, including electronic medical records and data from wearable devices, to help researchers uncover the causes of disease and develop treatments.
At the time of the announcement, more than 747,000 Americans had contributed data. Among them, 535,000 individuals had whole‑genome sequences linked to 482,000 electronic health records that encompass physicians’ notes, diagnoses, and laboratory results. The repository also integrates health surveys covering socioeconomic status and environmental exposures such as air quality.
In contrast, the UK Biobank—widely regarded as a leading genomic archive—contains data for around 500,000 participants, predominantly of white European ancestry, which limits its applicability to more diverse populations.
“The program’s rich diversity is one of its most exciting features,” said Alicia Martin, a statistical geneticist at the Broad Institute who uses All of Us data to refine risk‑prediction tools. “It provides innovative opportunities to explore not only who is predisposed to disease but also how disease progresses, who is responsive to specific treatments, and how various factors influence these outcomes.”
This milestone arrives as the 21st Century Cures Act, a major funding source for All of Us, is slated to expire at the end of the fiscal year. Since 2023, the program’s budget has already been slashed by 72 percent, and a coalition of over 50 medical societies has urged Congress to establish a new funding mechanism to safeguard the database’s future.
For decades, genetic research was conducted largely in isolation from environmental and other health data. Personalized care requires a database that layers biological, behavioral, and environmental information, allowing scientists to study how disease develops in a more integrated context.
The program aims to enroll at least one million volunteers, collecting data over a minimum of ten years to illuminate how genes interact with factors such as sleep patterns and geographic location. To date, the collection contains over 1.3 billion genetic variants and has contributed to the development of multiple genetic tests, including one that predicts inherited risk of cardiovascular conditions and another, currently in clinical trials, that could improve early detection of prostate cancer.
Because the U.S. health system is fragmented, the electronic health records in the dataset may not be as deep or continuous as those in older repositories built on national health systems, such as the UK Biobank, Dr. Martin said.
Nonetheless, the All of Us database’s strength lies in its breadth. More than 86 percent of participants belong to groups that have historically been underrepresented in biomedical research, including racial and ethnic minorities, rural residents, and people with disabilities. Analyses of the data have already identified genetic variants that lower the risk of kidney disease among people of African ancestry.
Researchers have long sought to study environmental hazards and diseases that disproportionately affect marginalized communities. However, the fragmented nature of American healthcare has made assembling data on such a scale nearly impossible—until now.
“You might have data at Vanderbilt and at Mount Sinai,” Dr. Martin noted, “but integrating them into a single, comprehensive resource is the unique value of All of Us.”

