Big data is recognized as essential to efforts in understanding and treating cancer. Cancer is as complex as it is devastating. It involves a host of genetic, lifestyle and environmental factors, and is now known to comprise hundreds of diseases, each with unique features, driving forces and vulnerabilities to treatment. Large sample sizes are required to provide the statistical power to understand which combinations of drugs are effective against which combinations of mutations that drive cancer.
Breaking barriers
While enormous amounts of genomic and clinical data have been gathered by NCI-funded research, several barriers have prevented researchers from making full use of them. Genomic data from different projects, clinical trials and cancer types are siloed in different locations with local management systems, making data sharing difficult. These large datasets can take months to download, and not all researchers have access to the sophisticated tools needed to study them. In addition, disparate collection and analysis approaches by separate research groups inhibit collaborative work.
The Genomic Data Commons breaks down these barriers by bringing cancer genomics datasets and associated clinical data into one location that any researcher may access. It harmonizes the data by processing them with a common set of analytic pipelines. In the past, this information was typically available only as separate datasets analyzed with separate pipelines, which made it harder to study. By making these data available using modern computing and network technology, the GDC makes it possible for any researcher to ask new and fundamental questions about cancer.
Built and managed by Grossman’s team at the University of Chicago, the Genomic Data Commons will:
- Serve as a central unified repository for cancer genomic data and associated clinical data.
- Clean, standardize and harmonize data, as well as provide quality control, so that analyses can be conducted using common algorithms and pipelines.
- Support basic research and clinical trials by making data easily accessible, findable, interoperable and reusable.
- Provide powerful data transfer, search, Application Programming Interface (API) and analysis tools to researchers at no cost.
A foundation for the future
As the first step in a next-generation knowledge system for cancer, the Genomic Data Commons enables and accelerates efforts to identify both high- and low-frequency cancer driver mutations, assists in revealing the genetic determinants of response to therapy, and informs the composition of clinical trial cohorts.
The Genomic Data Commons will help bridge silos by providing researchers with access to high-quality data, the tools needed to share and study them, and support to submit their own data. It will house data from a new era of programs that will sequence the DNA of patients enrolled in NCI clinical trials. These datasets will lead to a much deeper understanding of which therapies are most effective for different cancers. The GDC will support clinical trials that focus on single patients, known as “n of 1” clinical trials, and will become an important component in how precision medicine is used to treat individual patients.
The Genomic Data Commons also creates a foundation for future cloud-based technologies that could allow researchers to analyze large-scale datasets and perform experiments remotely, such as through the NCI’s Cancer Cloud Pilots Program. In addition, the open-source software being developed by the CDIS has the potential to become a model for data-intensive research efforts for other diseases, such as Alzheimer’s and diabetes, which would greatly benefit from similar large-scale, data-driven approaches to develop cures.
“We are at a crossroads today in whether we will have the critical mass of cancer-related data needed to power new discoveries and improve cancer care,” Grossman said. “Over time, I expect the GDC will play a more and more important role in providing the data required at the scale required so that precision medicine fulfills its promise.”