Two University of Chicago research groups will help build the pilot phase of an ambitious new National Institutes of Health initiative to make U.S. biomedical research data and tools accessible to more scientists.
The NIH Data Commons, a shared virtual space where scientists can work with the digital objects of biomedical research, will launch a four-year pilot phase, the agency announced Nov. 6. Globus, the UChicago-based non-profit research data management platform, and the Center for Data Intensive Science at UChicago are both part of the multi-institutional consortium receiving 12 awards totaling $9 million to implement this powerful new platform.
“Harvesting the wealth of information in biomedical data will advance our understanding of human health and disease,” said NIH Director Francis S. Collins. “However, poor data accessibility is a major barrier to translating data into understanding. The NIH Data Commons Pilot Phase is an important effort to remove that barrier.”
Researchers in medicine and biology increasingly work with massive datasets to better understand disease, find new treatments, and decode the basics of life. These data are rich with information, but create technical challenges due to their size, complexity, privacy requirements, and the specialized analytic tools needed for their analysis.
A “data commons” helps eliminate these barriers by creating a virtual, cloud-based platform in which researchers can easily access and work with otherwise intractable datasets. For example, scientists at multiple institutions could share and compare patient genetic sequences to find potential new drug targets for a disease. Scientists can also extract more value from federally funded research, as data collected by a single laboratory will be available for others to discover and build upon in their own work.
Other data-heavy sciences, such as astronomy and climate research, have constructed data commons, and last year the National Cancer Institute, one of 27 institutes and centers at the NIH, announced its Genomic Data Commons, built and managed by the Center for Data Intensive Science (CDIS) and the University of Chicago.
But building a data commons for the nearly $30 billion of research funded by the NIH each year is an even larger enterprise. The pilot phase for the NIH Data Commons will explore the feasibility of, and best practices for, making digital objects available through collaborative platforms.
Globus, a widely used platform for transferring, sharing and discovering research data developed by the University of Chicago and Argonne National Laboratory, will partner with the USC Information Sciences Institute to provide cloud-based services that enable key capabilities for the NIH Data Commons pilot. Those services include new privacy and security measures for controlled-access data, which build on tools for managing Protected Health Information that Globus is concurrently developing in an NCI-funded project. Globus also led the creation of the Materials Data Facility, a commons-like environment that enables researchers in the Materials Genome Initiative to share datasets.
“Globus is used by thousands of researchers in other scientific fields with intensive computational and data needs, and our platform is ready to help support the architecture of the new NIH Data Commons,” said Ian Foster, co-founder and director of Globus and the Arthur Holly Compton Distinguished Service Professor of Computer Science at UChicago. “We’re excited to bring our mission of accelerating research to this important effort that will unlock new discoveries.”
The Center for Data Intensive Science, led by Jim and Karen Frank Director Robert L. Grossman, will partner with the University of California, Santa Cruz and the Broad Institute for its contribution to the pilot phase. Each institution has a strong track record of developing production-grade software platforms that currently support flagship scientific efforts, including the CDIS-developed NCI Genomic Data Commons at the University of Chicago. They will align these individual efforts in a collaboration called the Commons Alliance so that data commons can be the foundation for an open ecosystem of software applications and services developed by a research community.
“We have developed eight data commons that are used by thousands of researchers each day and that all interoperate with each other,” said Robert L. Grossman, the Frederick H. Rawson Professor of Medicine and Computer Science at the University of Chicago. “For this project, the Commons Alliance will be building an open platform so that researchers anywhere in the world can easily build their own custom applications over the NIH Data Commons to advance their own research.”
Three NIH-funded datasets on genotype-tissue expression, trans-omics for precision medicine, and model organism genomes will serve as test cases for the NIH Data Commons Pilot Phase. More data resources will be added once the pilot phase has achieved its primary objectives, the NIH announced.