National Cancer Institute

Star-P and SGI® Altix® Accelerate Genomic Correlation 200-Fold

Star-P and SGI Altix are helping researchers at the National Cancer Institute (NCI) plumb a vast public database of genomic information for potential discoveries. This genomic profiling effort may help researchers better understand genetic risk factors for cancer. Or it might help them develop new procedures for testing the genomic profiles of tumors – procedures that might advance the cause of personalized medicine, in which a patient's genetic information may be used to customize the detection, treatment, or prevention of disease.

But an explosion in the amount of genomic data available to NCI researchers has made their work increasingly difficult. Their tasks require more computing power, more system memory, and – all too often – more time. And in the race to understand how genetics and cancer are linked, time is precious.

A project underway at NCI, however, exemplifies how servers from SGI and software from Interactive Supercomputing, Inc. are shattering the limits of traditional computing. And as a result, the answers to some researchers' questions are arriving faster – up to 200 times faster – than ever before.

CORR4DB: Understanding the Relationship of Genes
Using a specialized software application called CORR4DB (as in "correlation for database"), researchers calculate the correlation between genes in microarray gene expression studies. CORR4DB taps a microarray database with more than 40,000 probe entries. The correlations help researchers better understand the relationship of genes, and the resulting findings can provide the basis for additional genomic research. Until recently, however, a CORR4DB correlation running on a desktop system could take days to complete. The more probes involved, the longer the "one-to-all" pattern-recognition routine takes, and some microarrays run into the tens of thousands of probes.
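The case study does not say which correlation measure CORR4DB computes. As a minimal sketch, assume it is the Pearson coefficient and that each probe is represented by its expression profile across samples. Under those assumptions, a "one-to-all" pass correlates a single query profile with every probe and ranks the results. The function names `pearson` and `one_to_all` are hypothetical, and the code is written in plain Python.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (assumed measure)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)

def one_to_all(query, probes):
    """Correlate one query profile with every probe, strongest |r| first.

    `probes` maps a probe name to its expression profile (hypothetical layout).
    """
    scores = [(name, pearson(query, profile)) for name, profile in probes.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

probes = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
for name, r in one_to_all([1, 2, 3, 4], probes):
    print(f"{name}: {r:+.2f}")
```

Run once per probe, this single query becomes the all-against-all correlation described below. The total work therefore grows with the square of the probe count.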

CORR4DB was developed at the Advanced Biomedical Computing Center (ABCC), an NCI contractor lab that serves as the NCI’s supercomputing facility. Developed with MATLAB® from MathWorks, CORR4DB is an interactive desktop application whose performance is constrained by the amount of system memory available to the problem. Because the desktop system running CORR4DB supported only 2GB of memory, extensive correlations proved problematic for some projects.

Turnaround times of up to a week had proven to be the practical limit for researchers. And with bioinformatics on a steep growth curve, the problem was only going to get worse.

Desktop speed limitations were eroding the interactivity benefits of MATLAB and CORR4DB. Researchers knew that larger correlations could be completed faster if CORR4DB could be parallelized to run on a high-performance computing (HPC) system outfitted with a shared-memory architecture.

There was only one problem: interactive applications like MATLAB don't run on parallel HPC systems. Scientists would therefore be forced to reprogram CORR4DB's algorithms for parallel execution, most likely rewriting the software in C or Fortran, perhaps using MPI.

Bridging Desktop and Parallel HPC Systems
The Star-P™ interactive parallel computing platform and the SGI® Altix® 3700 server help bridge desktop systems with high-performance parallel computing resources without manually reprogramming algorithms written for the desktop. Star-P is designed to automate the process of parallelizing models and algorithms developed in MATLAB, adding the computational power of scalable SGI Altix servers to the desktop interactivity that MATLAB users rely on.

The Altix system, housed at the ABCC facility in Frederick, Md., is one of two SGI Altix resources owned and operated by NCI. Running SUSE® Linux Enterprise Server 9 from Novell®, the 64-processor system had been recently upgraded to 256GB of memory – all of which could be made available to solve CORR4DB problems once the application was parallelized. With a few simple modifications to their code, most MATLAB users can quickly and easily harness the proven HPC capabilities of SGI Altix systems to run even the largest, most complicated algorithms.

The Star-P approach had its advantages, notes Dr. Mark Potts, principal of HPC Applications, Inc., a consulting firm contracted to get NCI's software up and running on the SGI Altix. "If your goal is to take the same interactive environment and transfer it to a parallel processing system with a lot more memory, then you'll look for the easiest way to get there," says Potts. "NCI is accustomed to working in MATLAB and with certain formatted files, and this approach retains that environment."

Even more important was the impact on NCI's scientific workflow. A typical CORR4DB routine might involve a 30,000-by-30,000 data matrix, in which each probe is correlated with every other probe in the sample. On the desktop, the routine took more than two days to complete. But using just six processors and 25GB of memory on the Altix system, the entire correlation is done in 15 minutes or less. That means typical CORR4DB routines are running on the order of 200 times faster than before.
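The "on the order of 200 times" figure can be checked directly from the two run times quoted above. Both inputs are taken at their conservative ends: exactly two days for the desktop run and a full 15 minutes for the Altix run. The helper name `speedup` is hypothetical.

```python
def speedup(before_minutes, after_minutes):
    """How many times faster the new run is than the old one."""
    return before_minutes / after_minutes

desktop_minutes = 2 * 24 * 60  # "more than two days": lower bound, 2,880 minutes
altix_minutes = 15             # "15 minutes or less": upper bound

print(f"at least {speedup(desktop_minutes, altix_minutes):.0f}x")  # prints: at least 192x
```

Because the desktop time is a lower bound and the Altix time an upper bound, 192x is a floor on the speedup. That floor is consistent with the rounded 200x claim.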

That acceleration is significant to researchers looking to continually extend the reach of their studies. As correlations grow in size, so do their computational requirements: doubling the dimensions of the data matrix more than doubles the time needed to complete the correlation. On the desktop, the size and capability of the machine determined the scope of the problem. But today, when even the largest problems can be solved in hours or minutes, researchers can run 10 correlations in a day.
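A simple pair count shows why doubling the problem more than doubles the run time. Assume that an all-against-all correlation compares every distinct pair of probes once. In that case the number of comparisons is n(n-1)/2, which roughly quadruples when n doubles. This is an idealized count, not a measurement of CORR4DB. The helper name `probe_pairs` is hypothetical.

```python
def probe_pairs(n_probes):
    """Distinct probe pairs in an all-against-all correlation: n(n-1)/2."""
    return n_probes * (n_probes - 1) // 2

base = probe_pairs(30_000)     # the typical routine described above
doubled = probe_pairs(60_000)  # twice as many probes
print(f"{base:,} pairs -> {doubled:,} pairs (x{doubled / base:.2f})")
# prints: 449,985,000 pairs -> 1,799,970,000 pairs (x4.00)
```

Each pair also costs time proportional to the number of samples in a profile, so the total work scales roughly with the square of the probe count times the sample count.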

A Return to Interactivity
The combination of Star-P and SGI Altix has given researchers the ability to run more samples, and possibly approach problems differently than they would have before. Parallelizing CORR4DB, in effect, has given them more flexibility to find what they want and in greater detail.

With a more powerful parallel system at their disposal, researchers may also try even more complex searches that previously weren’t an option. For example, the group has estimated that, using today’s database, the largest potential CORR4DB correlation – with a data matrix of 100,000 by 100,000 – would require more than 256GB of memory to solve. For that task, they would transfer the correlation to another ABCC resource – an SGI Altix 3700 system powered by 256 Intel® Itanium® 2 processors and equipped with 1TB of memory.
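To get a rough sense of scale, consider only the dense n-by-n result matrix. This sketch assumes values are stored in double precision, MATLAB's default numeric type, at 8 bytes each. The case study does not break down its 25GB and 256GB figures. Both are larger than the numbers below, which suggests they also cover input data and working storage. The helper name `result_matrix_gb` is hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: IEEE 754 double precision, MATLAB's default

def result_matrix_gb(n_probes):
    """Size in GB (10**9 bytes) of a dense n-by-n matrix of doubles."""
    return n_probes * n_probes * BYTES_PER_DOUBLE / 1e9

for n in (30_000, 100_000):
    print(f"{n:,} probes: {result_matrix_gb(n):.1f} GB")
```

The results are about 7.2 GB for a 30,000-probe matrix and 80 GB for a 100,000-probe matrix. Both already exceed the 2GB desktop system described earlier.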

Altix systems in particular offer an advantage for running software applications parallelized with Star-P. The systems can make any number of processors or any amount of memory available to the application, supporting both fine-grained (global) and coarse-grained (distributed) parallelism. For example, an Altix system can apply eight processors to a single problem, or commit a single processor to handle computational tasks while still accessing dozens of gigabytes of memory to work with a large data matrix.
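The case study does not describe how Star-P divides a correlation among processors. The sketch below shows one generic coarse-grained approach instead: a row-block data decomposition. The probes are split into contiguous row blocks, each block is correlated against all probes independently, and the blocks are reassembled into the full matrix. The blocks run one after another here; a parallel runtime would dispatch them concurrently. All function names are hypothetical, and Pearson correlation is again an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_rows(n, workers):
    """Divide row indices 0..n-1 into `workers` contiguous, near-equal blocks."""
    size, extra = divmod(n, workers)
    blocks, start = [], 0
    for w in range(workers):
        end = start + size + (1 if w < extra else 0)
        blocks.append(range(start, end))
        start = end
    return blocks

def correlate_block(data, rows):
    """Correlate each probe in `rows` with every probe in `data`.

    A block needs only its row indices plus read access to `data`,
    so separate blocks could be assigned to separate processors.
    """
    return [[pearson(data[i], data[j]) for j in range(len(data))] for i in rows]

def blocked_correlation(data, workers):
    """Build the full matrix from independently computed row blocks (run sequentially here)."""
    matrix = []
    for rows in split_rows(len(data), workers):
        matrix.extend(correlate_block(data, rows))
    return matrix

profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4], [3, 1, 4, 2]]
print(len(blocked_correlation(profiles, workers=2)))  # one row per probe
```

On a shared-memory machine, every block can read the same input matrix in place. This is one reason large memory configurations suit this kind of decomposition.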

Built on industry-standard components and a 64-bit Linux operating environment, the flexible SGI Altix platform is available in a vast range of configurations, from entry-level and mid-range cluster solutions to larger shared-memory clusters, servers, and supercomputers that scale as users' needs evolve. SGI Altix servers are powered by dual- and quad-core Intel® Xeon® processors and single- and dual-core Intel Itanium 2 processors, and feature an independently scalable system architecture that allows users to individually scale CPUs, memory, or I/O to meet application requirements.

Learn how Star-P and SGI Altix systems can transform your workflow and extend your desktop MATLAB environment.

For more information on the Star-P interactive parallel computing platform, visit www.sgi.com/products/software/starp/. Or visit Interactive Supercomputing at www.interactivesupercomputing.com.

For more information on flexible and scalable SGI Altix servers, visit www.sgi.com/altix.


Images courtesy of Interactive Supercomputing.