Bio
Curriculum Vitae
I'm a fourth-year Computer Science Ph.D. student at Stanford in the
Folding@home research group. I am primarily advised by Prof. Vijay Pande
and co-advised in the CS department by Professor Daphne Koller.
My primary research focus is computational drug design, but I also have interests
in data visualization, computational biology, and distributed systems.
Before coming to Stanford, I graduated from the University of California, Berkeley, with a degree
in Electrical Engineering and Computer Science
(Go Bears!). I did undergraduate research with Professors Kathy Yelick,
Bora Nikolic, and John Wawrzynek. I was also a member and officer for
several semesters at the Berkeley Mu Chapter of
Eta Kappa Nu.
Even further back, I graduated from Bellarmine College Preparatory
in San Jose (Go Bells!). I doubt any high school students will care to
read this page, but if you do, I strongly encourage you to do speech and
debate, as I did. Without a doubt, the skills I gained there have been extremely useful to me.
Outside the lab, I sing in the Stanford University Singers, and while I think long walks on the beach are rather boring and slow, I find such things much more interesting on a bike.
Publications
- Imran S. Haque and Vijay S. Pande. Hard Data on Soft Errors - A Large-Scale Assessment of Real-World Error Rates in GPGPU. Accepted to Resilience 2010: 3rd Workshop on Resiliency in High Performance Computing (held in conjunction with CCGrid 2010).
- Imran S. Haque and Vijay S. Pande. PAPER -- Accelerating Parallel Evaluations of ROCS. Journal of Computational Chemistry 31(1), 117-132 (2010).
- Jed Pitera, Imran Haque, and William Swope. Absence of reptation in the high-temperature folding of the trpzip2 beta-hairpin peptide. Journal of Chemical Physics 124, 141102 (2006).
Current Projects
In silico cheminformatic prediction of toxicity
Predicting activity and toxicity of prospective drugs in silico is the major goal of computational drug discovery. Current state-of-the-art experimental techniques for drug development typically include a "high-throughput screening" (HTS) step in which (hundreds of) thousands of compounds are simultaneously tested for activity against a desired target. These experimental screens are labor-intensive, expensive, and time-consuming. I am interested in accurate computational approaches to improve this procedure. In collaboration with my advisors, I am using machine learning techniques and relative descriptors of molecules in order to predict biological activity of lead compounds for drug discovery.
Investigating soft error rates in GPU memory
GPUs originated in error-tolerant graphics applications, but are now used for error-intolerant scientific computing. In particular, current generation GPUs do not have error protection (parity or ECC) on their memory subsystems. To investigate the impact of this design, we wrote a custom test code, MemtestG80, and ran it on over 50,000 GPUs on the Folding@home distributed computing network.
Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our survey over cards on Folding@home found that, in their installed environments, two-thirds of tested GPUs exhibit a detectable, pattern-sensitive rate of memory soft errors. We demonstrate that these errors persist after controlling for overclocking and environmental proxies for temperature, but depend strongly on board architecture.
MemtestG80 source code (LGPL) is available at http://simtk.org/home/memtest. Precompiled binaries are available on SimTK or through Folding@home at http://folding.stanford.edu/English/DownloadUtils
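To give a flavor of what a pattern-based memory test does, here is a minimal Python sketch of a "moving inversions" style pass over a buffer. It is an illustration only, not MemtestG80 code: the function and class names are hypothetical, it runs on host memory rather than on a GPU, and the `StuckBitMemory` class injects a permanently stuck bit purely to exercise the checker (real soft errors are transient).

```python
def moving_inversions(memory, pattern=0x55):
    """Run one moving-inversions pass over a byte buffer.

    Writes `pattern` everywhere, walks upward checking it while writing
    its complement, then walks downward checking the complement.
    Returns the number of mismatching reads.
    """
    inverse = pattern ^ 0xFF
    errors = 0
    # Pass 1: fill the buffer with the pattern.
    for i in range(len(memory)):
        memory[i] = pattern
    # Pass 2: verify the pattern, then overwrite with its complement.
    for i in range(len(memory)):
        if memory[i] != pattern:
            errors += 1
        memory[i] = inverse
    # Pass 3: verify the complement, walking downward.
    for i in reversed(range(len(memory))):
        if memory[i] != inverse:
            errors += 1
    return errors


class StuckBitMemory:
    """Hypothetical fault model: one cell whose masked bits always read as 1."""

    def __init__(self, size, bad_index, stuck_mask=0x01):
        self._cells = bytearray(size)
        self._bad_index = bad_index
        self._stuck_mask = stuck_mask

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, i):
        value = self._cells[i]
        if i == self._bad_index:
            value |= self._stuck_mask  # injected fault
        return value

    def __setitem__(self, i, value):
        self._cells[i] = value


healthy_errors = moving_inversions(bytearray(1024))
faulty_errors = moving_inversions(StuckBitMemory(1024, bad_index=10))
```

The healthy buffer yields no mismatches, while the injected fault is caught on the complement pass. A real tester would run many patterns and many iterations, since a given fault may only show up under particular data values.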
Haque IS and Pande VS. Hard Data on Soft Errors - A Large-Scale Assessment of Real-World Error Rates in GPGPU. Accepted to Resilience 2010: 3rd Workshop on Resiliency in High Performance Computing (held in conjunction with CCGrid 2010).
Haque IS and Pande VS. GPUs - TeraFLOPs or TeraFLAWED? Poster presented at 2009 ACM/IEEE Conference on High Performance Computing, Networking, Storage, and Analysis (SC'09).
Methods for virtual high-throughput screening
Biophysical methods that computationally estimate binding affinity and compound activity can, in theory, recommend promising compounds for previously uncharacterized targets. The availability of structural information for relevant drug targets, combined with data about the interaction networks present within biological organisms, may make it possible to specifically design chemical agents with higher potency and lower toxicity. These methods are applicable not only to the design of pharmaceuticals, but also to the design of agents to interact with specific cellular systems for research work in chemical biology. I am particularly interested in combinations of docking and free energy approaches.
Docking techniques (also known as virtual high-throughput screening, or vHTS) trade accuracy for speed, with the goal of being an in silico alternative to wet-bench HTS methods. In collaboration with Dr. Kim Branson, I am interested in improving the accuracy of vHTS techniques, primarily through improved scoring techniques. Free-energy methods, by contrast, tend to be slow. However, they are usually more accurate at predicting the Gibbs free energy of an interaction, the physical parameter that determines the binding affinity between a chemical agent and its target (colloquially speaking, how strongly the two "stick to" one another). That affinity is in turn a critical determinant of the potency of a particular chemical. I am interested in improving the accuracy and performance of free-energy techniques to make them more applicable to drug design.
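The link between free energy and affinity can be made concrete with the standard thermodynamic relation ΔG° = RT ln(Kd / c°), where Kd is the dissociation constant and c° is the 1 M standard concentration. The short Python sketch below applies it. The function names and the default temperature are illustrative choices, not part of any free-energy package.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # R in kcal / (mol K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); tighter binders (smaller Kd) give
    more negative values.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Inverse relation: Kd (molar) from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


nanomolar_dg = binding_free_energy(1e-9)   # a 1 nM binder
tenfold_gap = binding_free_energy(1e-8) - binding_free_energy(1e-9)
```

At room temperature a 1 nM binder corresponds to roughly -12.3 kcal/mol, and each tenfold change in Kd is worth only about 1.36 kcal/mol, which is why free-energy predictions need to be quite accurate to rank compounds usefully.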
Imran Haque, John D. Chodera, Michael R. Shirts, David L. Mobley, Vijay S. Pande. Toward Quantitative Prediction of Binding Affinities to JNK3 by Alchemical Free Energy Methods. Poster presented at the CUP IX conference, Santa Fe, NM, 17 Mar 2008.
PAPER - Accelerating Parallel Evaluations of ROCS
PAPER is a GPU-accelerated implementation of Gaussian molecular shape overlay (the algorithm in OpenEye ROCS) running on NVIDIA graphics cards. We have demonstrated multiple-order-of-magnitude speedups relative to a CPU-based implementation of the same algorithm, and a 5x speedup relative to OpenEye ROCS even on low-end graphics hardware (an NVIDIA 8600GT).
PAPER source code (GPL-licensed) is available at http://simtk.org/home/paper.
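For readers unfamiliar with Gaussian shape overlay, the following Python sketch scores how well two molecules overlap in one fixed pose. It is not PAPER or ROCS code: it assumes every atom is the same spherical Gaussian with hypothetical `alpha` and `weight` values, keeps only the pairwise (first-order) overlap terms, and omits the rigid-body optimization over rotations and translations that an actual overlay performs.

```python
import math


def pair_overlap(center_a, center_b, alpha=1.0, weight=1.0):
    """Closed-form overlap integral of two identical spherical Gaussians.

    Each atom is modeled as weight * exp(-alpha * |r - center|^2).
    alpha and weight are placeholder values, not fitted parameters.
    """
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    prefactor = weight * weight * (math.pi / (2.0 * alpha)) ** 1.5
    return prefactor * math.exp(-0.5 * alpha * dist_sq)


def overlap_volume(mol_a, mol_b):
    """First-order overlap volume: sum over all atom pairs."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto for one fixed pose: O_AB / (O_AA + O_BB - O_AB)."""
    o_ab = overlap_volume(mol_a, mol_b)
    o_aa = overlap_volume(mol_a, mol_a)
    o_bb = overlap_volume(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)


# Two toy "molecules" given as lists of (x, y, z) atom centers.
molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
far_away = [(x + 50.0, y, z) for (x, y, z) in molecule]

self_score = shape_tanimoto(molecule, molecule)
distant_score = shape_tanimoto(molecule, far_away)
```

A molecule scores 1 against itself and essentially 0 against a copy moved far away. An overlay search would repeatedly evaluate this kind of score while adjusting the pose, and those many independent evaluations are the sort of workload that maps well onto a GPU.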
Imran Haque and Vijay Pande. PAPER -- Accelerating Parallel Evaluations of ROCS. Journal of Computational Chemistry 31(1), 117-132 (2010).
Past Projects
gCensus(-GT): Free Online GIS with Google Earth
Poor organization and expensive software should not restrict the public's access to public data. gCensus and gCensus-GT are my effort to make geographic data freely and easily accessible to the public, without the need for expensive GIS software, by leveraging Google's excellent free mapping program Google Earth.
Online since late 2006, gCensus exposes the entire 2000 US Census Summary Files 1 and 3. It is widely used, with over 7,500 unique maps generated in 2008. It has received extensive press coverage on (among others) ExtremeTech, Digg, Slashdot, and the San Jose Mercury News. gCensus can be found at http://gecensus.stanford.edu
gCensus-GT solves a parallel problem. While Google Earth Pro lets you load geotagged GeoTIFF images into Google Earth for a fee, the conversion to KML is in fact a very simple process. gCensus-GT converts from GeoTIFF (and a variety of other geotagged formats) to KML/KMZ for free.
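To show why the KML side of such a conversion is simple, here is a minimal Python sketch that wraps an image in a KML `GroundOverlay`. It is not gCensus-GT code. The function name is hypothetical, and it assumes the image bounds have already been read from the GeoTIFF and are expressed in WGS84 latitude/longitude; reading the georeferencing tags and any reprojection are left out.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def ground_overlay_kml(name, image_href, north, south, east, west):
    """Return a KML document draping one image over a lat/lon bounding box."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace

    def tag(local_name):
        return "{%s}%s" % (KML_NS, local_name)

    kml = ET.Element(tag("kml"))
    overlay = ET.SubElement(kml, tag("GroundOverlay"))
    ET.SubElement(overlay, tag("name")).text = name

    # The image itself is referenced, not embedded.
    icon = ET.SubElement(overlay, tag("Icon"))
    ET.SubElement(icon, tag("href")).text = image_href

    # Bounding box in decimal degrees.
    box = ET.SubElement(overlay, tag("LatLonBox"))
    bounds = (("north", north), ("south", south), ("east", east), ("west", west))
    for edge, value in bounds:
        ET.SubElement(box, tag(edge)).text = repr(float(value))

    return ET.tostring(kml, encoding="unicode")


# Example values are made up for illustration.
document = ground_overlay_kml(
    "Example overlay", "overlay.png",
    north=37.5, south=37.25, east=-122.0, west=-122.25,
)
```

Writing the returned string to a `.kml` file next to the image gives something Google Earth can open; zipping the two together produces a KMZ.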
Protein folding mechanics
The mechanism by which proteins fold into their native shapes is an open problem in biophysics. In work I performed with Drs. Jed Pitera and Bill Swope (of IBM Almaden Research Center), I investigated the mechanisms of beta-hairpin rearrangement using molecular dynamics simulations of a model peptide, trpzip2.
Jed Pitera, Imran Haque, and William Swope. Absence of reptation in the high-temperature folding of the trpzip2 beta-hairpin peptide. Journal of Chemical Physics 124, 141102 (2006).
Computational analysis of genome regulation
Genome sequences alone do not tell us how genes are expressed in vivo, but computational analysis of gene expression levels can offer insight into the higher-level organization controlling cell biology. This research area (also known as computational systems biology) seeks to determine the structure of the systems which regulate the activity levels of genes and their products in order to produce biological function. Further understanding of this field would have effects not only on our understanding of biology, but also on medicine and pharmacology (by granting better understanding of the mechanisms of disease) and on synthetic biology (through a better understanding of the "architecture" behind existing biological systems).
In collaboration with fellow students and Professor Daphne Koller, I have worked on a machine learning-based model seeking to explain the relevance of DNA copy-number variation to the regulatory network and phenotype of cancer cells.
Brad Gulko, Imran Haque, Sharareh Noorbaloochi, and Keyan Salari. Role of DNA Copy Number Alterations in the trans-Regulatory Network of Cancer Cells. Poster presented at National Cancer Institute Integrative Cancer Biology Program meeting at Stanford, 13 Feb 2007.
Architecture and Implementation of LDPC Codecs
Low-density parity-check codes closely approach the Shannon limit, but their maximum-likelihood decoding is NP-hard. With Professor Bora Nikolic and Zhengya Zhang at Berkeley, I worked on hardware architectures for efficient iterative decoding of LDPC codes, as well as algorithms for hardware real-time analysis of our noise simulation.
Acknowledged in Z. Zhang, L. Dolecek, B. Nikolic, V. Anantharam, M. J. Wainwright, Investigation of error floors of structured low-density parity-check codes by hardware emulation. Proceedings of IEEE Global Communications Conference (GLOBECOM), San Francisco, CA, November 2006. (Best Paper Award Finalist).
Talks
- Of Jacquard Looms and Jaccard Coefficients: multithreading biomolecular simulations in a GPU world. Presented at NSF-NAIS Workshop on Intelligent Software, Edinburgh, UK, 19-21 Oct 2009.
- Do GPUs really need ECC? A global-scale assessment of GPU Memory Soft Error Rates. Presented at NVIDIA Corporation, Santa Clara, CA, 2 Dec 2009.
Stanford Coursework
- Biochemistry 224 - Cell Biology of Physiological Processes (audited)
- Biochemistry 230 - Molecular Interventions in Human Disease
- Bioengineering 331 - Protein Engineering
- CS 148 - Introduction to Computer Graphics
- CS 229 - Machine Learning
- CS 279 - Computational Analysis and Reconstruction of Biological Networks
- ME 334 - Statistical Mechanics
- Structural Biology 241 - Biological Macromolecules
Thumbnail for GPU SER project used under Creative Commons Attribution-Noncommercial License: