Computation Institute to bulk up data analysis capability with $1.5 million grant

The Computation Institute, a joint effort of the University of Chicago and the U.S. Department of Energy's Argonne National Laboratory, has received a grant for a computer system that will enable researchers to store, access and analyze massive data sets.

The system, called the Petascale Active Data Store (PADS), is funded by a $1.5 million National Science Foundation grant that includes cost-sharing support from the University of Chicago. PADS has been optimized for rapid data transactions, both on campus and around the globe.

Petascale computing involves the manipulation of petabytes of data. A petabyte is roughly the amount of data held on 1.5 million CD-ROMs.
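The CD-ROM comparison can be checked with a little arithmetic. The sketch below assumes a decimal petabyte (10^15 bytes) and disc capacities of 650 MB and 700 MB; neither assumption comes from the PADS announcement, and the function name is purely illustrative.

```python
# Rough check of the CD-ROM comparison.
# Assumptions (not from the announcement): a decimal petabyte and
# two common CD-ROM capacities.
PETABYTE_BYTES = 10**15
CD_CAPACITIES_MB = (650, 700)


def cds_per_petabyte(cd_capacity_mb: int) -> float:
    """Return how many discs of the given capacity hold one petabyte."""
    return PETABYTE_BYTES / (cd_capacity_mb * 10**6)


for capacity in CD_CAPACITIES_MB:
    millions = cds_per_petabyte(capacity) / 1e6
    print(f"{capacity} MB discs: about {millions:.2f} million per petabyte")
```

Under these assumptions the count falls between about 1.4 and 1.6 million discs, in line with the figure quoted above.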

The PADS design resulted from a study of the storage and analysis requirements of groups in astronomy and astrophysics, computer science, economics, evolutionary and organismal biology, geosciences, high-energy physics, linguistics, materials science, neuroscience, psychology and sociology.

For these groups, according to the PADS team, PADS represents a significant opportunity to look at their data in new ways, enabling new scientific insights and collaborations across disciplines. PADS also will serve as a vehicle for computer science research into active data storage systems and will supply rich data sets for investigating new techniques.

The resulting software will be released as open source, free for interested users to download and adapt for other purposes.

"PADS will bring a significant analysis resource to the University of Chicago campus and provide a testbed for research on high-performance analysis, a likely bottleneck in the scientific pipeline of the future," said Michael Papka, Deputy Associate Laboratory Director for Computing, Environment and Life Sciences at Argonne. Papka led the interdisciplinary team of University of Chicago researchers who developed the PADS proposal.

Several NVIDIA Tesla graphics processing units (GPUs) will be integrated with traditional CPUs in the PADS system. These GPUs can perform certain operations many times faster than general-purpose processors.

"The Tesla nodes will allow us to experiment with algorithms that combine traditional CPUs and special-purpose GPUs to extract results from data faster than in the past," said Ian Foster, Director of the Computation Institute and the Arthur Holly Compton Distinguished Service Professor in Computer Science at the University of Chicago. "For example, in neuroscience, we will be using the system to accelerate Magnetic Resonance Imaging algorithms to diagnose traumatic brain injury."

PADS will be a hybrid system with many layers of storage. These layers range from a large, tape-based system at Argonne to individual computers on campus and elsewhere. The intermediate layer is a rack of computer disks at Argonne containing duplicate data sets as insurance against hard-drive failure.
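The layered arrangement can be pictured as a lookup that tries the nearest storage layer first and falls back to slower ones. The following is a minimal sketch of that idea only, not the PADS software: the `Tier` class, the `locate` function, the tier names and the data set names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tier:
    """One storage layer (hypothetical model, not the PADS design)."""
    name: str
    datasets: set = field(default_factory=set)


def locate(dataset: str, tiers: list) -> Optional[str]:
    """Return the name of the first tier holding the data set, or None.

    Tiers are expected in order from nearest/fastest to farthest/slowest.
    """
    for tier in tiers:
        if dataset in tier.datasets:
            return tier.name
    return None


# Illustrative layers, ordered from an individual computer to tape.
tiers = [
    Tier("campus computer", {"survey-a"}),
    Tier("disk rack", {"survey-a", "survey-b"}),
    Tier("tape system", {"survey-a", "survey-b", "survey-c"}),
]

for name in ("survey-a", "survey-b", "survey-c", "survey-d"):
    print(name, "->", locate(name, tiers))
```

In this toy model a data set present in several layers is served from the nearest one, while data held only on tape is still reachable, just more slowly.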

To University of Chicago scientists, PADS represents a dramatic improvement over current practice, which requires them to quickly analyze data and then remove it from the system to make room for new data sets. With the storage that PADS provides, groups will be able to keep data active for longer periods of analysis.

"PADS will allow us to share unique data sets with a larger community of researchers, enabling analysis of the data in different ways without the necessity to quickly remove the data because we need the space," said Don Lamb, Director of the Center for Astrophysical Thermonuclear Flashes and the Louis Block Professor in Astronomy & Astrophysics at the University of Chicago.

The Computation Institute was founded in 2000 as a joint effort between Argonne and the University. Its mission is to address the most challenging problems arising in the use of strategic computation and communications.

For technical specifications and other information on PADS, see www.ci.uchicago.edu/pads.