Due to the size of our experiments, our microarray studies of juvenile idiopathic arthritis have outstripped our divisional hardware resources. Specifically, we need approximately a terabyte of disk space to store our data, plus 18 GB of RAM and multiple processors to analyze them. These requirements make the use of desktop machines infeasible. To access sufficient computing space and power economically, we have worked closely with BMI staff to get the specialized programs we require (GeneSpring, R/Bioconductor, PLINK, SNPGWA, and Haploview) installed and running on the cluster. Our specific requirements have presented challenges along the way, and the BMI staff has been extremely responsive to our needs. Without the hardware and personnel support from BMI, our research would be much more difficult.
Michael Barnes, PhD
Division of Rheumatology, Cincinnati Children's
We use the cluster to conduct microarray data analysis on a project with Dr. Tracy Glauser titled "Microarray analysis of newly onset epilepsy." Microarray data typically contain tens of thousands of data points on each patient, and pre-processing and analyzing these data is time- and resource-consuming. For example, standard desktop computers with 4 GB of RAM do not allow us to conduct quality assessment of even 50 arrays using a Bioconductor package (QCReport) because they run out of memory. By moving the analysis to the Linux cluster, we can get the results in several hours. In addition to standard data analyses, we run some computationally intensive procedures such as penalized logistic regression. With the cluster, this takes a few days, but it would be impossible on a desktop computer. Additionally, using the cluster frees up our desktop machines for other analyses.
Todd Nick, PhD
Center for Epidemiology and Biostatistics, Cincinnati Children's
To explore the effects of factors determining the power and efficiency of genome-wide association (GWA) studies, we tested the empirical power of case-control studies by extensive simulations using empirical genotype data from the HapMap ENCODE Project. The computational burden of such a study can be great. In part, this is due to the large number of genetic markers that need to be simulated and the correspondingly large number of association tests that need to be performed for a single replication. Before we had access to the computational cluster, the time-consuming nature of the computations limited us to considering only a few realizations, rather than a full-scale simulation study. Fortunately, the computational power provided by the cluster allowed us to run the simulations on dozens of nodes simultaneously, which made it possible for us to examine the effects of different factors on the power of GWA studies under a wide range of scenarios.
Ge Zhang, PhD
Departments of Environmental Health and Family Medicine, University of Cincinnati College of Medicine
The Rance laboratory studies the structure and dynamics of proteins, protein complexes and protein/nucleic acid complexes using nuclear magnetic resonance (NMR) spectroscopy methods and molecular dynamics (MD) simulations. Our research is largely computational in nature and quite demanding of computer resources. The BMI computational cluster allows us to perform a large number of serial NMR structure and dynamics calculations in a short amount of time, as well as to use parallelization strategies for long MD simulations. Our calculations, which can take many days to weeks to complete on our workstation computers, now finish in hours to days. The BMI cluster facility is first-class and integral to our research program, and access to these resources significantly speeds up our ability to produce high-quality results.
Douglas Kojetin, PhD
Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati College of Medicine
Our group uses the BMI cluster in several ways. In collaboration with experimental and clinical colleagues, we use docking software packages, such as AUTODOCK, to perform in silico screening in the context of rational drug design. We also use the cluster for other molecular simulations as part of the services offered by the Protein Informatics Core. In the context of genotype-phenotype association and gene expression studies, we use the cluster to identify predictive fingerprints of phenotypes of interest, which involves computationally expensive applications of machine learning techniques to feature selection and aggregation, and other intermediate problems. We also use the cluster to develop and validate our own methods for protein structure and function prediction, genome annotation, analysis of genome rearrangements, etc. In addition, the cluster is used to run jobs spawned by outside users via Web servers, such as SABLE, POLYVIEW or CINTENY, increasing the visibility of Cincinnati Children's and UC as a hub for bioinformatics.
Jarek Meller, PhD
Department of Environmental Health, University of Cincinnati College of Medicine
The Child Policy Research Center uses state and national data for health services research. In contrast to genomic studies, our processing difficulties stemmed from the length of our datasets (number of records) rather than the width (number of variables). Most of our data are large transaction sets such as Medicaid enrollment and claims. Our studies often examine multiple years, and even when the scope is limited to children, we end up needing to process more than 200 million records. On a desktop, simple procedures took hours or days. Overnight processing was extremely frustrating and inefficient. Now that we can process on the cluster using SAS Enterprise Guide, we can turn analyses around very quickly. Even many of the state agencies that collect the data don't have such processing capacity. This tool allows us to operate within the constant flux of the policy arena, meeting the demand for evidence behind critical decisions. BMI has been exceptionally helpful in making sure our needs are met.
Joseph Schuchter, MPH, and Gerry Fairbrother, PhD
Child Policy Research Center, Cincinnati Children's