
IBM Research Life Sciences Projects

March 2009

Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, www.research.ibm.com/compsci/compbio

Life Sciences Research @ IBM

IBM Research

IBM invests approximately $6 billion annually in research and development. The IBM Research Division has eight research laboratories around the world and employs over 3,000 people in disciplines ranging from chemistry and electrical engineering to computer science and advanced mathematics:
• Watson Research Center (Yorktown Heights, New York; Hawthorne, New York; and Cambridge, Massachusetts)
• Austin Research Laboratory (Austin, Texas)
• Almaden Research Center (San Jose, California)
• Zurich Research Laboratory (Zurich, Switzerland)
• Haifa Research Laboratory (Haifa, Israel)
• India Research Laboratory (Delhi and Bangalore, India)
• China Research Laboratory (Beijing and Shanghai, China)
• Tokyo Research Laboratory (Yamato, Japan)

IBM Research has continuously been at the forefront of key innovations that have progressively transformed the industry. Our innovations include Magnetic Disk Drives, DRAM memory, the Winchester Disk, FORTRAN, the Relational Database, Speech Recognition, RISC Architecture, Scalable Parallel Systems, CMOS, Silicon-on-Insulator, and Silicon Germanium semiconductor technologies.

IBM was awarded 4,186 US patents in 2008. For sixteen years in a row, IBM has received more US patents than any other corporation. Our portfolio includes a rich collection of patents, among them the composition-of-matter patent for single-walled carbon nanotubes. IBM Research has also had five Nobel laureates in its community, including Gerd Binnig and Heinrich Rohrer for the invention (in 1981) of the scanning tunneling microscope, a tool that opened up the atomic-scale world and started the modern era of nanotechnology.

Computational Biology at IBM Research

The Computational Biology Center (CBC) at IBM Research consists of approximately 40 full-time researchers. Most of IBM’s expertise is located at the T.J. Watson Research Center (NY), with additional presence at the Almaden Research Center (San Jose, CA) and the Zurich Research Laboratory (Rueschlikon, Switzerland). The CBC was formed in 1995 as a group engaged in exploratory research and rapidly became vital to IBM’s business in healthcare, life sciences, and high performance (“deep”) computing. CBC researchers have extensive backgrounds in computer science, mathematics, chemistry, physics, and biology. Our research model includes active collaboration with industrial, academic, and government research organizations around the world. The nature of these collaborations varies greatly and is customized in each instance, but is always centered on scientist-to-scientist interaction, with the goals of validating our technology, enhancing our understanding of relevant scientific challenges, and, if appropriate, jointly publishing new scientific results.


CBC’s mission is to engage in basic and exploratory research at the interface of information technology and biology. This research, frequently conducted in collaboration with partners in universities, medical research centers, and the biotechnology and pharmaceutical industries, aims to have impact in the following strategic life sciences areas:
• Understand biological systems with predictive models
• Translate molecular biology research into improved clinical care
• Develop new, more effective drugs faster and more cheaply
CBC’s current research agenda covers bioinformatics, pattern discovery, functional genomics, systems biology, structural biology, computational chemistry, medical imaging, and computational neuroscience.

Members of CBC hold visible leadership roles in the computational biology research community. In addition to leading the organization of research conferences in computational biology, members have received recognition such as:
• Fellow of: American Physical Society, New York Academy of Sciences, American Institute for Medical and Biological Engineering
• Editorial Board of journals: PLoS One, J Experimental Biology & Medicine, Gene Therapy & Molecular Biology, Genomics, Human Genomics, Bioinformatics, Proceedings of the Royal Society A: Math, Physics & Modeling, Omics, Molecular Simulations
• Adjunct Faculty at: M.I.T., Columbia University, Johns Hopkins University, New York University

In addition to the specific projects described below, the CBC serves as a portal to the broader interests of IBM Research in areas relevant to the life sciences, such as grid and cloud computing, high performance computing and storage, knowledge management, and data mining. It does so in association with the Deep Computing Institute and the Computational Science Center in IBM Research.

References

General description and links to much of our computational biology research: http://www.research.ibm.com/compsci/compbio

Structural Biology:
http://www.research.ibm.com/bluegene
http://domino.research.ibm.com/comm/research_projects.nsf/pages/bluegene.pubs.html

Functional and Medical Genomics:
http://domino.research.ibm.com/comm/research_projects.nsf/pages/cancermodeling.index.html
http://domino.research.ibm.com/comm/research_projects.nsf/pages/bioinformatics.overview.html
http://www.nationalgeographic.com/genographic
http://www.ibm.com/genographic
http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=DetailsSearch&Term=(Genographic[Corporate+Author])

Systems Biology:
http://domino.research.ibm.com/comm/research_projects.nsf/pages/cardiacmodeling.index.html
http://www.research.ibm.com/journal/rd/521/djurfeldt.html
http://bluebrain.epfl.ch/page26906.html

Blue Gene @Watson: http://domino.research.ibm.com/comm/research_projects.nsf/pages/bluegene.bgw.html

Massively Parallel Simulations on Blue Gene: http://www.research.ibm.com/journal/rd52-12.html

There is Only One IBM. Bio-IT World, September 2008: http://www.bio-itworld.com/issues/2008/sept/cover-story-evolution-of-ibm.html


Computational Biology Projects at IBM Research

Description of IBM’s recent and current work in computational biology and computational chemistry relevant to pharmaceutical discovery and development.

Structural Biology

GPCR Activation Mechanisms

Understanding the detailed mechanisms by which GPCRs (G protein-coupled receptors) are activated is central to many endeavors in biology, including the study of signal transduction and the development of new medicines. Our research seeks insight into GPCR activation by analyzing very large-scale, all-atom molecular dynamics simulations of GPCRs in native-like conditions. These membrane protein systems are very difficult to study experimentally. Insights gained from simulation have led to very specific experiments that confirmed our predictions, establishing large-scale simulation as an effective tool for experimental design and as a means to advance our understanding of how these systems work. Specific systems of study are rhodopsin, the cannabinoid CB2 receptor, the beta2-adrenoreceptor, and other GPCRs. A key finding is that water is central to the activation of rhodopsin and may be central to the activation of GPCRs across the family. This result would strongly affect rational approaches to designing ligands that target GPCRs, which account for about half of the targets of drugs currently on the market. We are currently studying the differences between agonism and antagonism1 as they relate to analgesia.

By large-scale simulation, we mean microsecond-scale all-atom simulation of the protein in an explicit membrane environment. Such simulations require supercomputing facilities and simulation software designed for this mode of operation. We have developed the BlueMatter simulation software for this purpose and have tuned its performance for Blue Gene architectures. This allows a rigorous level of classical molecular dynamics to be applied to membrane protein systems for microsecond-scale simulations, using many thousands of processors to achieve production rates of 40 to 60 ns/day. Currently, for membrane protein systems, our production rates at this level of theory are unmatched by any other available means, giving us a unique ability to investigate these systems. We have also developed in-house methods for preparing and analyzing these systems. Projects typically require a minimum of four racks of Blue Gene/L for several months of CPU time.

1 From Wikipedia: In pharmacology the term agonist-antagonist is used to refer to a drug (usually, if not exclusively, psychoactive) that exhibits some properties of an agonist (a substance that fully activates the neuronal receptor it attaches to) and some properties of an antagonist (a substance that attaches to a receptor but does not activate it, or, if it displaces an agonist at that receptor, seemingly deactivates it, thereby reversing the effect of the agonist). The best known agonists and antagonists are opioids: morphine is an agonist at opioid receptors, while naloxone (Narcan) is an antagonist.
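The arithmetic below converts a production rate in ns/day into wall-clock days for a target trajectory length. This is a minimal sketch. The 40 to 60 ns/day rates come from the text. The 1 µs and 5 µs targets are illustrative assumptions, and the calculation ignores queueing, restarts, and analysis time.

```python
def wall_clock_days(target_ns: float, rate_ns_per_day: float) -> float:
    """Days of continuous running needed to accumulate target_ns of trajectory."""
    if rate_ns_per_day <= 0:
        raise ValueError("production rate must be positive")
    return target_ns / rate_ns_per_day


if __name__ == "__main__":
    # Production rates quoted for BlueMatter on Blue Gene (ns/day).
    for rate in (40.0, 60.0):
        # Illustrative trajectory lengths in microseconds (assumed, not from the text).
        for target_us in (1.0, 5.0):
            days = wall_clock_days(target_us * 1000.0, rate)
            print(f"{target_us:.0f} us at {rate:.0f} ns/day: {days:.1f} days")
```

At these rates a single microsecond takes roughly 17 to 25 days of continuous running. A few microseconds of trajectory therefore already amounts to months of machine time.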

Research areas of interest include working with experimentalists and theoreticians to investigate membrane protein-ligand interactions, and studies of GPCR structure and dynamics. Microsecond-scale simulations are far longer than is typical in the simulation field. Accordingly, we have developed a broad range of methods and ways of conducting research that enable joint projects between IBM Research and collaborators. We prefer to collaborate with experimentalists working on the systems under study. Frequently, we design and conduct experiments guided by the interpretations and predictions from the simulation. The central role played by water in GPCR activation is one of the published findings that emerged from comparing simulation data, analysis, and predictions with experimental results.

Molecular Modeling of Avian Flu

The Spanish flu (1918), Asian flu (1957), and Hong Kong flu (1968) pandemics wreaked havoc on the world community in the last century, causing significant social and economic disruption and loss of life. Whereas the latter two mainly affected people at the extremes of age, the 1918 influenza strain mainly targeted healthy young adults. Although vaccines were available during the 1957 and 1968 pandemics, they could not be prepared in time to halt the spread of the disease, which circled the globe in less than a year. All of the elements are currently in place for an influenza pandemic of global proportions. Recently, a highly pathogenic avian influenza (“bird flu”, H5N1) strain has emerged in Southern China. When humans are infected by birds, they show symptoms reminiscent of the 1918 Spanish flu: abnormal liver function, multiple organ failure, and a high mortality rate. Current research has identified the potential for H5N1 to undergo genetic changes that may enable direct human-to-human airborne transmission. Should that occur, experts estimate that the ensuing pandemic would cause over 2 million deaths in the U.S. The pandemic would also have a devastating impact on healthcare systems, national security, and the economy; the Department of Homeland Security conservatively estimates economic losses of more than $200 billion.

This project aims to anticipate genetic changes in the virus that might pose a threat to human health. We will draw on several computational technologies developed at IBM:
• algorithms for analyzing influenza genomes
• methods for protein structural simulation, in particular free energy perturbation (FEP) simulations of viral surface protein binding
• IBM’s Blue Gene supercomputers
With these, we aim to understand the biological impact of variations in key influenza proteins such as hemagglutinin (HA), which has emerged as one of the major targets for both drug and vaccine development. Predicted outcomes of particular variations could then be made available for validation in the laboratory.
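The core relation behind FEP is the Zwanzig exponential-averaging formula, ΔA = -kT ln⟨exp(-(U₁ - U₀)/kT)⟩₀, where the average is over configurations sampled in state 0. The sketch below illustrates that estimator only; it is not IBM's FEP protocol. It assumes you already have per-frame energy differences U₁ - U₀ from a simulation. Production FEP studies typically split the change into many intermediate λ windows and use more robust estimators such as BAR. The Gaussian test data are synthetic.

```python
import math
import random
from typing import Sequence


def zwanzig_free_energy(delta_u: Sequence[float], kT: float) -> float:
    """Exponential-averaging (Zwanzig) estimate of A_1 - A_0.

    delta_u: samples of U_1(x) - U_0(x) for configurations x drawn from state 0.
    kT: thermal energy in the same units as delta_u.
    """
    if not delta_u:
        raise ValueError("need at least one energy-difference sample")
    # Work with exponents -dU/kT and subtract their maximum (log-sum-exp trick)
    # so that large energy differences do not overflow or underflow.
    x = [-du / kT for du in delta_u]
    x_max = max(x)
    mean_exp = sum(math.exp(xi - x_max) for xi in x) / len(x)
    return -kT * (x_max + math.log(mean_exp))


if __name__ == "__main__":
    kT = 0.0019872041 * 300.0  # Boltzmann constant (kcal/(mol K)) times 300 K
    # Synthetic Gaussian energy differences (not real data). For a Gaussian with
    # mean mu and standard deviation sigma the exact result is mu - sigma**2 / (2 kT).
    rng = random.Random(0)
    mu, sigma = 2.0, 0.5
    samples = [rng.gauss(mu, sigma) for _ in range(100_000)]
    print("estimate:", zwanzig_free_energy(samples, kT))
    print("exact:   ", mu - sigma**2 / (2 * kT))
```

The Gaussian case is a convenient check because its exact answer is known in closed form. With real data, the estimator converges poorly when the two states overlap little, which is why staged λ windows are used.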

Protein Structure Prediction

Protein structure prediction is one of the most challenging research areas at the forefront of molecular biology. It is highly important both in conventional drug design and in biotechnology, for example in the design of novel enzymes. There are enormous worldwide efforts to tackle this scientific challenge, such as the Critical Assessment of Techniques for Protein Structure Prediction (CASP). Since 1994, CASP has been held every two years as a blind test of protein structures predicted by research groups all over the world. Despite this intense effort, the problem remains largely unsolved.

The Protein Structure Prediction project aims to develop new strategies and algorithms for protein structure prediction using technologies developed within IBM. The goal is to go beyond homology modeling by developing novel algorithms for fold recognition and enhanced sampling methods for structure refinement. The proposed fold recognition method will use neural network and dynamic programming techniques to map the amino acid sequence of a protein onto 3D protein folds. To do this, it will combine sequence data with secondary structure information, hydrophobicity (hydrophobic profiling), solvent accessibility, and other features. The enhanced sampling algorithms include the recently developed Hydrophobic-Aided Replica Exchange Method (HAREM) for refining model structures.
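In fold recognition, dynamic programming is the step that aligns a query sequence onto a template fold under a scoring function. The sketch below is a toy global (Needleman-Wunsch) alignment whose score combines residue identity with a Kyte-Doolittle hydropathy term. Several choices are illustrative assumptions: the weights, the gap penalty, and the use of a plain template sequence instead of a structural profile. The neural-network-derived features described above are not modeled.

```python
import math

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

GAP = -2.0      # hypothetical linear gap penalty
W_HYDRO = 0.2   # hypothetical weight on hydropathy mismatch


def pair_score(q: str, t: str) -> float:
    """Toy substitution score: identity bonus minus a hydropathy-mismatch penalty."""
    identity = 2.0 if q == t else 0.0
    return identity - W_HYDRO * abs(KD[q] - KD[t])


def _close(a: float, b: float) -> bool:
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)


def align(query: str, template: str) -> tuple[float, str, str]:
    """Global alignment; returns (score, aligned query, aligned template)."""
    n, m = len(query), len(template)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        h[i][0] = i * GAP
    for j in range(1, m + 1):
        h[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                h[i - 1][j - 1] + pair_score(query[i - 1], template[j - 1]),
                h[i - 1][j] + GAP,  # query residue placed opposite a gap
                h[i][j - 1] + GAP,  # template position placed opposite a gap
            )
    # Trace back from the bottom-right cell, preferring the diagonal move on ties.
    aq, at = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and _close(
                h[i][j], h[i - 1][j - 1] + pair_score(query[i - 1], template[j - 1])):
            aq.append(query[i - 1])
            at.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and _close(h[i][j], h[i - 1][j] + GAP):
            aq.append(query[i - 1])
            at.append("-")
            i -= 1
        else:
            aq.append("-")
            at.append(template[j - 1])
            j -= 1
    return h[n][m], "".join(reversed(aq)), "".join(reversed(at))


if __name__ == "__main__":
    print(align("ACDE", "ADE"))  # (4.0, 'ACDE', 'A-DE')
```

A real threading score would replace `pair_score` with position-specific terms, such as predicted secondary structure and burial, evaluated against each template position.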

Protein Misfolding and Aggregation: Single Mutation Effects

Understanding the mechanisms behind fatal diseases related to protein misfolding and amyloid formation, such as Alzheimer's disease, is one of the most challenging problems remaining in molecular biology. Recent experiments have shown that amyloids and fibrils can form not only from the traditional beta-amyloid peptides but from almost any protein under appropriate conditions. This has opened a new and exciting window of research into the mechanisms behind diseases related to protein misfolding and aggregation. Most interestingly, recent experiments show that a single mutation can cause some proteins, such as lysozyme, to misfold and form amyloids because of the loss of key long-range hydrophobic interactions.

This project aims to use large-scale molecular dynamics simulations to study this important phenomenon, which should significantly improve our understanding of the mechanism of amyloid formation. In particular: how does a single mutation (W62G) cause lysozyme to lose local and non-local contacts during misfolding, and how does this misfolding affect the aggregation process? The project also offers a unique opportunity to better understand water-mediated hydrophobic interactions during folding and misfolding. It will also serve as a showcase for high performance computing (HPC) on Blue Gene.
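A common way to quantify the loss of local and non-local contacts described above is the fraction of native contacts retained in each simulation frame. This fraction can be reported separately for residue pairs close in sequence and pairs far apart. The sketch below assumes one C-alpha coordinate per residue, in angstroms. The parameters are illustrative choices, not values taken from the project: an 8 Å contact cutoff, a minimum sequence separation of 3, and a split at 10 residues between local and non-local contacts.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def contacts(coords: list[Point], cutoff: float = 8.0,
             min_sep: int = 3) -> set[tuple[int, int]]:
    """Residue pairs (i, j) with j - i >= min_sep whose C-alpha atoms lie within cutoff."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff
    }


def native_contact_fractions(native: list[Point], frame: list[Point], cutoff: float = 8.0,
                             min_sep: int = 3, nonlocal_sep: int = 10) -> tuple[float, float]:
    """Return (local, non-local) fractions of native contacts still formed in frame."""
    ref = contacts(native, cutoff, min_sep)
    formed = contacts(frame, cutoff, min_sep)
    local = {p for p in ref if p[1] - p[0] < nonlocal_sep}
    non_local = ref - local

    def fraction(pairs: set[tuple[int, int]]) -> float:
        return len(pairs & formed) / len(pairs) if pairs else float("nan")

    return fraction(local), fraction(non_local)


# Toy 20-residue "hairpin": two antiparallel strands 5 angstroms apart,
# compared with a fully extended chain (synthetic coordinates).
hairpin = [(3.8 * i, 0.0, 0.0) for i in range(10)] + \
          [(3.8 * (19 - i), 5.0, 0.0) for i in range(10, 20)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(20)]

if __name__ == "__main__":
    print(native_contact_fractions(hairpin, hairpin))   # (1.0, 1.0)
    print(native_contact_fractions(hairpin, extended))  # (0.0, 0.0)
```

Tracking the two fractions over a trajectory shows whether long-range contacts are lost before or after local ones. That ordering is one of the questions raised for the W62G mutant.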

Protein-Ligand Binding: HIV/AIDS

The design and development of anti-HIV drugs has been of great interest in recent decades because of the widespread prevalence of HIV/AIDS2. One of the key enzymes packaged within the HIV virion capsid is reverse transcriptase (RT). RT plays an essential role in the replication of the virus and has emerged as one of the prime targets for the development of drugs for HIV/AIDS therapy. Currently, the best method for controlling HIV infection is to use HIV-1 RT nucleoside inhibitors, non-nucleoside inhibitors, HIV protease inhibitors, fusion inhibitors, or a combination of these. In this project, we focus on non-nucleoside inhibitors of HIV-1 RT (NNRTIs), particularly nevirapine and HEPT (1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine) analogues. Nevirapine was the first FDA-approved non-nucleoside inhibitor, and the HEPT analogue MKC-442 (also known as emivirine) was previously chosen as a drug candidate for clinical trials. Unfortunately, MKC-442 was reported to induce the liver enzyme cytochrome P450, leading to drug interactions between MKC-442 and protease inhibitors. Noncompliance and non-ideal pharmacokinetics are major factors in the rise of drug resistance. Thus, more potent and reliable analogues are in great demand for lead optimization.

2 According to a 2008 UNAIDS Global Report, approximately 33 million people worldwide are living with HIV.

This project aims to develop a new Linear Interaction Energy (LIE) method based on a Surface Generalized Born (SGB) continuum solvent model (LIE-SGB) to predict binding affinities. The LIE method is an approximation to the free energy perturbation (FEP) method. It combines molecular mechanics calculations with experimental data to build an empirical scoring function for fast evaluation of ligand-protein binding free energies. We will apply this LIE-SGB method to nevirapine and HEPT analogues binding to HIV-1 RT. Initial tests on a set of more than 50 ligands show very encouraging results. The binding mechanism learned from the LIE-SGB predictions can then be used to design novel ligands for lead optimization.
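In its classic form, LIE estimates a binding free energy as ΔG ≈ α Δ⟨V_vdW⟩ + β Δ⟨V_elec⟩ + γ. Each Δ⟨V⟩ is the difference between the ligand-environment interaction energy averaged in the bound state and averaged in free solution, and the coefficients are fitted to experimental affinities. The sketch below shows only that fitting step, by ordinary least squares. It assumes the averaged energy differences have already been computed, and it omits the SGB-specific terms of LIE-SGB (such as a cavity term). The training numbers are synthetic, generated from arbitrary coefficients so that the fit can be checked.

```python
from typing import Sequence


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lie(d_vdw: Sequence[float], d_elec: Sequence[float],
            dg_exp: Sequence[float]) -> list[float]:
    """Least-squares fit of dG = alpha * d_vdw + beta * d_elec + gamma (normal equations)."""
    rows = [[v, e, 1.0] for v, e in zip(d_vdw, d_elec)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, dg_exp)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict_lie(params: Sequence[float], d_vdw: float, d_elec: float) -> float:
    alpha, beta, gamma = params
    return alpha * d_vdw + beta * d_elec + gamma


if __name__ == "__main__":
    # Synthetic training set (kcal/mol) generated from alpha=0.18, beta=0.33, gamma=-1.0.
    d_vdw = [-20.0, -25.0, -30.0, -35.0, -28.0]
    d_elec = [-5.0, -8.0, -3.0, -10.0, -6.0]
    dg = [0.18 * v + 0.33 * e - 1.0 for v, e in zip(d_vdw, d_elec)]
    params = fit_lie(d_vdw, d_elec, dg)
    print("alpha, beta, gamma =", [round(p, 3) for p in params])
    print("predicted dG for a new ligand:", predict_lie(params, -27.0, -7.0))
```

Once fitted on a training set, the same three coefficients score new analogues at the cost of two short simulations each (ligand bound and ligand free). That speed is what makes LIE attractive for lead optimization compared with full FEP.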

Protein Folding with Molecular Dynamics

Protein folding is one of the most fundamental problems in molecular biology. Recent advances in experimental techniques that probe proteins at different stages of the folding process have shed light on the physical mechanisms and interactions that determine the kinetics of folding, binding, function, and thermodynamic stability. However, many details of protein folding pathways remain unknown. Computer simulations can supplement experiment and fill in some of the gaps in our knowledge of folding pathways. They are performed at various levels of complexity, from simple lattice models through models with continuum solvent to all-atom models with explicit solvent. In this project, we study the folding thermodynamics and kinetics of a series of proteins, including the beta-hairpin, Trp-cage, Trp-zipper, lambda repressor, lysozyme, and gamma-crystallin. We also develop efficient techniques for sampling protein conformation space, such as the hydrophobic-aided replica exchange method (HAREM), replica exchange with solute tempering (REST), and multiple time step algorithms (RESPA/PME).
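The replica exchange methods listed above share one ingredient: they periodically attempt to swap configurations between simulations run at neighbouring temperatures. Each swap is accepted with the Metropolis probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where β = 1/(k_B T) and E is potential energy. The sketch below shows that exchange step for standard temperature REMD only. HAREM and REST modify which energy terms enter the criterion, and that is not reproduced here. The energy units (kcal/mol) and the even/odd neighbour-pair scheme are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(e_i: float, t_i: float, e_j: float, t_j: float) -> float:
    """Metropolis acceptance for exchanging configurations between temperatures t_i and t_j."""
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def exchange_sweep(configs: list, energies: list[float], temps: list[float],
                   rng: random.Random, offset: int = 0) -> int:
    """Attempt swaps between neighbouring temperature slots (k, k+1), k = offset, offset+2, ...

    configs[k] and energies[k] hold the configuration currently simulated at temps[k]
    and its potential energy. Accepted swaps are applied in place; returns their count.
    """
    accepted = 0
    for k in range(offset, len(temps) - 1, 2):
        p = swap_probability(energies[k], temps[k], energies[k + 1], temps[k + 1])
        if rng.random() < p:
            configs[k], configs[k + 1] = configs[k + 1], configs[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return accepted
```

In use, `exchange_sweep` would be called between short MD segments, alternating `offset` between 0 and 1 so that every neighbouring pair of temperatures gets exchange attempts.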

We characterize the conformational states observed at equilibrium at various temperatures using replica exchange molecular dynamics. Recent work on the 80-residue, five-helix-bundle lambda repressor protein has shown complex folding behavior, with several distinct folding transitions corresponding to the development of different types of local order. We also characterize the dynamics of proteins in equilibrium at various temperatures to capture the timescales for folding and unfolding of localized secondary and tertiary structure. For the same lambda repressor protein, this work has captured different structural elements unfolding at different temperatures. This implies that “folding” as observed by experimental probes will be sensitive to the temperature and to the structure being observed. Ongoing simulations of the high-temperature unfolded states of six different proteins are revealing elements of residual structure in the unfolded state.

Protein folding is generally too slow to be observed directly in simulation, even on supercomputers. We have been active in developing methods that analyze protein simulation data to extract states suitable for Markov models, which can describe long-timescale dynamical behavior. The goal is to run thousands of short simulations and extract the information needed to characterize millisecond behavior. We have successfully applied this approach to small peptides, and it has allowed us to elucidate the dynamics of misfolding. We are also interested in, and have begun to carry out, simulation studies of peptide-surface interactions. The ultimate goal is to determine whether peptides and proteins can form regular patterns with long-range order on different types of surfaces. Such patterns are a possible route to nanopatterning for various technologies. Models of increasing complexity are being explored, starting with an analysis of the conformational preferences of peptides at various distances from idealized, inert surfaces. The work is intended to evolve to include the adsorption of proteins on lipid bilayers and vesicles.
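The sketch below assumes that simulation frames have already been assigned to discrete states (the state-extraction step described above). A Markov model is then built by counting transitions at a chosen lag time across short trajectories and normalizing each row. The code shows three pieces: that counting step, the stationary distribution, and, for a two-state model, the implied timescale -τ / ln λ₂, where λ₂ is the second eigenvalue of the transition matrix. It is a minimal illustration. It skips the detailed-balance enforcement and lag-time validation that a production Markov state model would need, and the toy trajectory is made up.

```python
import math


def transition_matrix(dtrajs: list[list[int]], n_states: int,
                      lag: int = 1) -> list[list[float]]:
    """Row-normalized transition counts at the given lag, pooled over trajectories."""
    if lag < 1:
        raise ValueError("lag must be a positive number of frames")
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total else [0.0] * n_states)
    return matrix


def stationary_distribution(t: list[list[float]], iterations: int = 10_000) -> list[float]:
    """Power iteration p <- p T, starting from the uniform distribution."""
    n = len(t)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * t[i][j] for i in range(n)) for j in range(n)]
    return p


def implied_timescale_two_state(t: list[list[float]], lag: float) -> float:
    """For a 2x2 transition matrix, lambda_2 = T00 + T11 - 1 and t_2 = -lag / ln(lambda_2)."""
    lam2 = t[0][0] + t[1][1] - 1.0
    if not 0.0 < lam2 < 1.0:
        raise ValueError("implied timescale undefined for this matrix")
    return -lag / math.log(lam2)


if __name__ == "__main__":
    # A single toy trajectory over states 0 ("unfolded") and 1 ("folded").
    dtrajs = [[0, 0, 0, 1, 1, 0, 0, 0, 1, 1]]
    t = transition_matrix(dtrajs, n_states=2, lag=1)
    print([[round(v, 3) for v in row] for row in t])          # [[0.667, 0.333], [0.333, 0.667]]
    print([round(v, 3) for v in stationary_distribution(t)])  # [0.5, 0.5]
    print(round(implied_timescale_two_state(t, lag=1.0), 3))  # 0.91
```

The implied timescale is in units of the lag. In practice it is recomputed for increasing lags, and the model is trusted once that value stops changing.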

Nanoscale Dewetting in Biological Systems

Hydrophobicity is believed to be the main driving force in protein folding, a process that remains largely a mystery. Understanding the nature of hydrophobic collapse is an important step toward solving the protein folding problem. For simple nanoscale solutes, such as paraffin-like plates, hydrophobicity induces a strong drying transition in the gap between the hydrophobic surfaces as they approach each other. This transition, although occurring on a microscopic scale, is analogous to a first-order phase transition from liquid to vapor. This project addresses whether a similar dewetting transition occurs when proteins fold or form large multi-protein complexes. If it does, we ask what physical interactions govern the critical dewetting distance and the speed of collapse. A deeper understanding might help (1) to design novel water nanopores (similar to the membrane protein aquaporin); (2) to design nanoscale molecular switches; and (3) to better understand the mechanism behind subcellular self-assembly.

To our surprise, we recently observed such a dramatic dewetting transition inside a nanoscale channel formed by the melittin tetramer. Melittin, a 26-residue polypeptide, is a small toxic protein found in honey bee venom that often self-assembles into a tetramer. The strong dewetting transition occurs on a subnanosecond time scale and a subnanometer length scale (up to 2 to 3 water diameters). The transition is also very sensitive to single mutations of the three highly hydrophobic isoleucine residues to less hydrophobic residues. Such mutations in the right locations can switch the channel from dry to wet, acting as a "molecular switch". Thus, quite subtle changes in hydrophobic surface topology can have a pronounced influence on the drying transition. This study shows that, even in the presence of the polar protein backbone, sufficiently hydrophobic protein surfaces can induce a liquid-vapor transition, which can then provide an enormous driving force toward further collapse. Our earlier study also showed that protein-water electrostatic forces are largely responsible for the much slower collapse of a multi-domain protein compared with idealized nanoscale hydrophobic plates. The same study showed that van der Waals interactions largely account for that protein's smaller critical dewetting distances.
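A dry-to-wet switch like the one described above is typically detected by counting the water molecules inside the channel in each simulation frame. The sketch below assumes water oxygen coordinates for each frame and approximates the channel as an axis-aligned box. The box bounds and the threshold of 2 waters for calling a frame dry are hypothetical choices for illustration, not values from the melittin study.

```python
Point = tuple[float, float, float]


def waters_in_box(oxygens: list[Point], lo: Point, hi: Point) -> int:
    """Count water oxygens inside the axis-aligned box [lo, hi] (angstroms)."""
    return sum(all(l <= c <= h for c, l, h in zip(o, lo, hi)) for o in oxygens)


def dry_fraction(frames: list[list[Point]], lo: Point, hi: Point, dry_max: int = 2) -> float:
    """Fraction of frames whose channel region holds at most dry_max waters."""
    counts = [waters_in_box(frame, lo, hi) for frame in frames]
    return sum(c <= dry_max for c in counts) / len(counts)


if __name__ == "__main__":
    lo, hi = (-3.0, -3.0, -10.0), (3.0, 3.0, 10.0)           # hypothetical channel box
    wet = [(0.0, 0.0, float(z)) for z in range(-9, 10, 2)]  # 10 waters along the pore axis
    dry = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]               # 1 water inside, 1 outside
    print(dry_fraction([wet, dry, dry, wet], lo, hi))        # 0.5
```

Comparing the dry fraction (or the per-frame count time series) between wild-type and mutant simulations is one simple way to expose the switching behavior.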

Force Field Assessment and Development

We have been using molecular dynamics free energy simulations, performed on the Blue Gene supercomputer, to calculate the hydration free energies of approximately 300 molecules with drug-like chemical functionality. In collaboration with a major pharmaceutical company3, we have assessed the quality of force fields for applications such as ligand binding free energy prediction. In this project, we have identified a number of molecules that pose major challenges for the current generation of fixed-charge force fields. We are investigating these molecules with polarizable force fields and QM/MM4 treatments.
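The text does not say which free energy estimator was used. The sketch below therefore uses thermodynamic integration (TI), one common choice. Along an alchemical path with λ running from 0 to 1, ΔG = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ, evaluated here with the trapezoidal rule over a set of λ windows. A second helper compares computed and experimental hydration free energies and flags molecules whose error exceeds a threshold. This is one simple way to single out molecules that challenge a force field. The λ schedule, the 1.5 kcal/mol threshold, the molecule names, and all numbers in the example are hypothetical.

```python
import math


def ti_free_energy(lambdas: list[float], mean_dudl: list[float]) -> float:
    """Trapezoidal-rule integral of <dU/dlambda> over the lambda windows."""
    if len(lambdas) != len(mean_dudl) or len(lambdas) < 2:
        raise ValueError("need matching lambda and <dU/dlambda> lists of length >= 2")
    return sum(
        0.5 * (mean_dudl[k] + mean_dudl[k + 1]) * (lambdas[k + 1] - lambdas[k])
        for k in range(len(lambdas) - 1)
    )


def assess(computed: dict[str, float], experiment: dict[str, float],
           threshold: float = 1.5) -> tuple[float, list[str]]:
    """RMS error over molecules present in both dicts, plus names with |error| > threshold."""
    names = sorted(computed.keys() & experiment.keys())
    if not names:
        raise ValueError("no molecules in common")
    errors = {n: computed[n] - experiment[n] for n in names}
    rmse = math.sqrt(sum(e * e for e in errors.values()) / len(errors))
    outliers = [n for n in names if abs(errors[n]) > threshold]
    return rmse, outliers


if __name__ == "__main__":
    lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
    mean_dudl = [10.0, 6.0, 3.0, 1.0, 0.0]  # made-up window averages (kcal/mol)
    print(ti_free_energy(lambdas, mean_dudl))  # 3.75
    computed = {"mol_a": -5.0, "mol_b": -2.0, "mol_c": -9.5}    # hypothetical
    experiment = {"mol_a": -5.5, "mol_b": -2.2, "mol_c": -6.8}  # hypothetical
    rmse, outliers = assess(computed, experiment)
    print(round(rmse, 2), outliers)  # 1.59 ['mol_c']
```

With about 300 molecules, the outlier list is the natural starting point for closer study with polarizable or QM/MM models.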

We have begun a project that uses a newer generation of force fields that include the physics of polarization. We intend to assess existing force fields and, potentially, develop new polarizable force fields for molecules that are problematic for fixed-charge force fields, including those with nitro (-NO2) groups.

We have also developed a novel capability for the correct treatment of periodicity in systems in which a small region treated quantum mechanically is embedded in a larger region treated classically. This capability is being used to provide benchmark data for force field development. It is also being used to investigate solvent effects that cannot be treated properly with the implicit solvent models in common use in the quantum chemistry community, such as the effect of hydrogen-bonding solvents on torsional energy barriers.

Reaction mechanisms and energetics for catalysis of (bio)polymerization reactions

This project uses high-quality quantum chemistry methods to elucidate the reaction mechanisms of polymerization reactions catalyzed by a new generation of nonmetallic, organic catalysts. It also characterizes the nature and energetics of the transition states and intermediates involved, so that new and better catalysts can be designed. We have also developed a capability to query a database of conformationally flexible molecules for those similar to a given query molecule. Similarity is captured by a characterization that includes essential chemical attributes such as hydrogen bonding and electrostatics.
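A flexible-molecule similarity search of the kind described above can be sketched as follows. Each molecule is represented as a set of conformers, each conformer is described by a feature vector, and a database molecule is scored by its best-matching conformer pair. The feature vectors (for example counts of hydrogen-bond donors and acceptors, or binned electrostatic properties), the cosine measure, and the max-over-conformers rule are all assumptions for illustration. The text does not describe IBM's actual characterization.

```python
import math

Conformers = list[list[float]]  # one feature vector per conformer (hypothetical descriptors)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity(query: Conformers, candidate: Conformers) -> float:
    """Best cosine similarity over all query/candidate conformer pairs."""
    return max(cosine(q, c) for q in query for c in candidate)


def rank(query: Conformers, database: dict[str, Conformers],
         top: int = 5) -> list[tuple[str, float]]:
    """Database molecules sorted by decreasing similarity to the query."""
    scored = [(name, similarity(query, confs)) for name, confs in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    query = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # two conformers, hypothetical descriptors
    database = {
        "mol_x": [[0.0, 1.0, 1.0], [3.0, 3.0, 0.0]],
        "mol_y": [[1.0, 1.0, 0.0]],
    }
    for name, score in rank(query, database):
        print(name, round(score, 3))  # mol_x 1.0, then mol_y 0.5
```

Taking the maximum over conformer pairs is what lets a flexible molecule match the query even if only one of its conformations resembles it.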

3 GlaxoSmithKline
4 Quantum Mechanics / Molecular Mechanics


Neuroimaging

Digital imaging technologies are among the most powerful and useful tools available for deciphering the brain and for assessing the health of a patient’s nervous system. Imaging technologies have been developed for measuring neuroanatomical structure at scales ranging from smaller than a synaptic vesicle to the entire brain. Furthermore, brain function can be imaged from the molecular level with PET5, to tissues with IR6, to cognition with fMRI7. Voltage-sensitive dyes and calcium imaging permit examination of neural processes on the time scale of electrophysiological function. With an ever-increasing array of transgenic animals, molecular markers, contrast agents, dyes, and experimental protocols, nearly any imaginable property can be captured. Finally, the measured properties are mapped onto a reference frame (usually physical space) to make their values and relationships amenable to analysis and interpretation.

Many factors prevent realization of the full potential of imaging for research and clinical applications. These include:
• Acquisition degradation: The quality of acquired data is often degraded, for example by patient motion in brain scanners, blurred contributions from outside the focal plane in traditional optical microscopy, and noise in all modalities. Sophisticated preprocessing techniques, such as registration, deconvolution, and filtering, can often be applied to compensate.
• Invalid assumptions: Images from many modalities, such as MRI and PET, must be reconstructed. For computational expediency, simplifying but erroneous assumptions are often applied, and such shortcuts may result in artifacts. For example, in MRI, discontinuities violate the requirements of Fourier transform-based reconstruction algorithms, leading to magnetic susceptibility artifacts; in PET and CT, photon scatter violates the line integral assumption.
• Desired information is not directly measured: Most often the measured properties do not constitute the information sought; they are merely projections of a system we wish to understand or characterize. To bridge this gap, models, statistical analysis techniques, and machine learning tools are explicitly or implicitly applied to extract usable features and infer the desired information. For example, a clinician may infer the presence of Alzheimer’s disease from morphological measurements of proton density in MRI volumes.
• Costs and difficulty of data acquisition: Acquiring sufficient quantities of reliable, unbiased data can be extremely costly and difficult. Imaging devices are expensive, working with patients, living animals, and biological tissue is challenging, and the experimental protocol itself influences brain function. Specialized expertise and analysis techniques are required to remove biases, exploit the available information, and draw valid inferences.
• Computation, data management, and visualization challenges: Information technology often constrains the techniques applied, the reuse of data, and the ability to gain valuable insights. High performance computing platforms, high capacity storage solutions, and distributed visualization systems can overcome these limitations.

5 Positron Emission Tomography
6 InfraRed
7 functional Magnetic Resonance Imaging

IBM employs high performance computing systems, such as Blue Gene, in research to overcome the limiting factors described above. In each of the projects described below, multiple limiting factors are addressed:

3D processing and reconstruction from neural slice data

IBM is seeking collaborations with researchers who can obtain high-resolution serial cross-section data from brain tissue, to engage in an automated connectomics research project. Modalities of interest range from transmission electron microscopy to ‘Brainbow.’ We have explored the computational challenges of processing the large volumes of high-resolution imagery of successive slices through neural tissue. We have shown that parallel architectures, such as IBM Blue Gene, are well suited to the complex communication and parallel computation tasks characteristic of this processing.

HRRT PET Reconstruction

The HRRT (High Resolution Research Tomograph) PET scanner is the world’s highest-resolution PET scanner with human brain capacity. It has 10 times the detector crystals of a typical scanner and requires 100 times the computation, because the number of possible lines of response grows roughly with the square of the number of crystals. For computational tractability, reconstructions rely on simplifications and assumptions that are not strictly valid. This project, in collaboration with the Stockholm Brain Institute, exploits Blue Gene to model image generation more faithfully and to explore otherwise infeasible techniques. These include high-resolution scatter simulation, statistical reconstruction techniques, true list-mode reconstruction, use of prior information, and 4D image reconstruction to directly estimate tracer kinetic parameters.
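Statistical reconstruction, one of the techniques listed above, is often introduced through the maximum-likelihood expectation-maximization (MLEM) update for Poisson data, x ← x / (Aᵀ1) · Aᵀ(y / (A x)). Here A is the system matrix mapping image voxels to detector bins and y holds the measured counts. The sketch below is a dense, toy-sized illustration of that update only. An HRRT-scale reconstruction would use a sparse or on-the-fly projector, model scatter and randoms, and typically use an accelerated (ordered-subsets) variant; none of that is shown.

```python
def mlem(a: list[list[float]], y: list[float], iterations: int = 100) -> list[float]:
    """Maximum-likelihood EM reconstruction for y ~ Poisson(A x) with a dense system matrix.

    a[i][j]: probability that an emission in voxel j is detected in bin i.
    y[i]: measured counts in bin i.
    """
    n_bins, n_vox = len(a), len(a[0])
    sensitivity = [sum(a[i][j] for i in range(n_bins)) for j in range(n_vox)]  # A^T 1
    x = [1.0] * n_vox  # strictly positive starting image
    for _ in range(iterations):
        forward = [sum(a[i][j] * x[j] for j in range(n_vox)) for i in range(n_bins)]  # A x
        ratio = [y[i] / forward[i] if forward[i] > 0 else 0.0 for i in range(n_bins)]
        back = [sum(a[i][j] * ratio[i] for i in range(n_bins)) for j in range(n_vox)]
        x = [x[j] * back[j] / sensitivity[j] if sensitivity[j] > 0 else 0.0
             for j in range(n_vox)]
    return x


if __name__ == "__main__":
    # Toy 3-bin, 3-voxel system with a little cross-talk between neighbours.
    a = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.2], [0.0, 0.3, 1.0]]
    x_true = [1.0, 2.0, 3.0]
    y = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in a]  # noiseless counts
    print([round(v, 3) for v in mlem(a, y, iterations=500)])
```

The multiplicative update keeps the image non-negative automatically. It also costs one forward and one back projection per iteration, which is where the 100-fold computational burden of the HRRT shows up.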

Characterization of macaque cortical representations

This project seeks to characterize the representation of information in macaque cortex by applying machine learning techniques to IR imagery. The project uses support vector machines to predict features from small overlapping neighborhoods across the visual cortex. With this approach, it has shown that color and orientation representations are non-overlapping, and that the spatial statistics of cortical representations are consistent with those of natural images. To explore more complex stimuli, new protocols using voltage-sensitive dyes are being developed.

fMRI analysis

This collection of projects seeks to characterize classes of brains and brain states from fMRI data. We apply the conventional GLM8 and novel techniques based on statistical network theory to extract networks of functional interactions between brain regions. These networks expose new classes of features that have proven effective for characterizing hallucinations and for developing schizophrenia biomarkers. We develop and apply sparse machine learning techniques to draw valid, interpretable inferences from small data sets. We have also developed techniques for discovering data acquisition biases, characterizing network properties, and evaluating the significance of network motifs.
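As footnote 8 notes, the GLM amounts to a linear regression fitted independently at each voxel. The sketch below fits the simplest such model, a single block-design regressor plus an intercept, to one voxel's time series. It returns the effect size and its t-statistic. A real fMRI GLM would also convolve the regressor with a hemodynamic response function, add drift and nuisance regressors, and handle temporally autocorrelated noise. The block length and the synthetic signal in the example are hypothetical.

```python
import math


def boxcar(n_scans: int, block: int) -> list[float]:
    """Alternating off/on blocks of `block` scans each, starting with 'off'."""
    return [1.0 if (t // block) % 2 == 1 else 0.0 for t in range(n_scans)]


def voxel_glm(y: list[float], x: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares y = b0 + b1 * x + noise; returns (b1, b0, t-statistic of b1)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # unbiased residual variance
    se_b1 = math.sqrt(sigma2 / sxx)
    t_stat = b1 / se_b1 if se_b1 > 0 else math.inf
    return b1, b0, t_stat


if __name__ == "__main__":
    x = boxcar(40, 5)
    # Synthetic voxel: baseline 100, task effect 2, plus a period-5 "noise" pattern.
    noise = [0.2, -0.1, 0.0, 0.1, -0.2]
    y = [100.0 + 2.0 * xi + noise[t % 5] for t, xi in enumerate(x)]
    b1, b0, t_stat = voxel_glm(y, x)
    print(f"effect={b1:.2f} baseline={b0:.2f} t={t_stat:.1f}")
```

Repeating this fit for every voxel and thresholding the t-map gives the classic activation map. The network techniques described above start instead from correlations between regional time series.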

Neuroscience Analysis and Modeling

The brain is perhaps the least understood of all organs. Its inaccessibility and its complexity at nearly every level of integration, from ion channels to neural processes to anatomically distinct substructures to the full nervous system, pose innumerable challenges. Even the notion of “understanding” is poorly defined when applied to the brain. Suppose “understanding” is achieved when emergent, system-level phenomena can be predicted and explained in terms of the interactions and mechanisms of their underlying components. Then the brain’s virtually unlimited situational capabilities, its many levels of organization, and uncertainty about the proper level of grounding make both the phenomena to be explained and their foundations difficult to identify.

This lack of understanding impedes progress in the characterization and treatment of neurological and neuropsychiatric disorders, in improvements in individual performance, education, and quality of life, and even in the process of neuroscientific inquiry itself. Recognizing that the clearest demonstration of understanding is achieved through predictive models grounded in accepted components and mechanisms, IBM Research is working to develop models of the brain. To chip away at the brain’s inaccessibility and complexity, we seek to exploit any available observations to constrain and validate models at corresponding levels of neural system integration. We develop novel and sophisticated analysis and machine learning techniques to turn observations into features and metrics suitable for constraining and validating our models. Our modeling efforts range from replicating observations and functions of neural tissue based on detailed measurements, as with the Blue Brain Project [9], to realizing emergent properties of neural subsystems, such as map formation in visual cortex, to organism-level capabilities like behavior selection, perception, and learning. These models may be tightly bound to biological observations to address questions about specific tissues, or they may be grounded in biologically plausible components and mechanisms suitable for answering questions about how biological capabilities may be achieved or modified. Gaps between levels of modeling abstraction are bridged by imposing the predictions of higher-level models as constraints on the emergent phenomena of lower-level models.

[8] GLM (General Linear Model): linear regression analysis performed on individual voxels over specific periods of an experiment. For more details see http://www.fil.ion.ucl.ac.uk/~mgray/Presentations/General%20Linear%20Model.ppt#256,1,General Linear Model
[9] EPFL, Lausanne, Switzerland

Tackling the scope and complexity of neural system modeling requires high performance computing platforms, neuroscientific knowledge spanning multiple levels of integration, skills and perspectives drawn from many disciplines, access to experimental data, and computational tools for simulation and analysis. IBM’s multi-disciplinary team includes expertise spanning neuroscience, mathematics, physics, engineering, and computer science. Furthermore, IBM has developed a neural system simulation environment that runs on a variety of computational platforms, including Blue Gene, and supports arbitrary modeling abstractions. This tool is used, or is planned for use, in all but one of our modeling projects (the exception is for legacy reasons). IBM does not experiment on humans or animals directly, nor is the team large enough to include experts from every relevant domain. For these reasons, IBM routinely engages in complementary collaborations.

Examples of ongoing modeling efforts include (from low to high levels of integration):

Volumetric neuropil modeling

This effort departs from the neuron-centric modeling paradigm to model volumes of neuropil at the compartmental level. By combining implicit and explicit modeling techniques, this approach enables gap junction modeling and promises computational scalability.
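To make gap-junction coupling concrete, the sketch below uses the explicit (forward) Euler method to integrate two passive compartments joined by a gap junction. It is not the volumetric neuropil method described above. The membrane capacitance, leak conductance and reversal potential, coupling conductance, and injected current are hypothetical values in arbitrary but consistent units.

```python
def simulate_gap_junction(g_gap, i_inject=1.0, c_m=1.0, g_leak=0.1, e_leak=-65.0,
                          dt=0.01, t_end=200.0):
    """Two passive compartments coupled by a gap junction (explicit Euler).

    C dV0/dt = -g_leak (V0 - E_leak) + g_gap (V1 - V0) + I_inject
    C dV1/dt = -g_leak (V1 - E_leak) + g_gap (V0 - V1)
    Current is injected only into compartment 0; returns (V0, V1) at t_end.
    """
    v0 = v1 = e_leak
    for _ in range(int(round(t_end / dt))):
        i_gap = g_gap * (v1 - v0)  # junction current flowing into compartment 0
        dv0 = (-g_leak * (v0 - e_leak) + i_gap + i_inject) / c_m
        dv1 = (-g_leak * (v1 - e_leak) - i_gap) / c_m
        v0 += dt * dv0
        v1 += dt * dv1
    return v0, v1
```

At steady state, the driven compartment sits I/(g_leak + g_leak·g_gap/(g_leak + g_gap)) above rest, and the coupled compartment is depolarized only through the junction. With `g_gap=0`, it stays at rest.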

Neural development

This effort models how signaling between neurons during neural development shapes the formation of contacts or potential synapses between neurons in a microcircuit. The approach borrows techniques from molecular dynamics to model the influences of molecular signaling mechanisms, such as reelin, where force field gradients are analogs of chemical gradients and the action of the forces is analogous to the neurite growth cone response to molecular signals. The complex, asymmetric effects of pseudo-forces (each corresponding to a specific signaling molecule) on the synaptic networks will be explored and parameterized to replicate biological observations.
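The force-field analogy can be illustrated with a toy model. A growth cone climbs the gradient of a Gaussian concentration field centered on a hypothetical source, so the pseudo-force is proportional to the chemical gradient. The field shape, the `gain` step factor and the attractive/repulsive switch are illustrative assumptions. A repulsive cue simply flips the sign of the force.

```python
import math


def concentration_gradient(p, source, sigma):
    """Gradient of c(p) = exp(-|p - source|^2 / (2 sigma^2)) evaluated at point p."""
    dx, dy = p[0] - source[0], p[1] - source[1]
    c = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    return (-dx / sigma ** 2 * c, -dy / sigma ** 2 * c)


def grow_neurite(start, source, sigma=5.0, gain=20.0, steps=200, attractive=True):
    """Move a growth cone along the pseudo-force +/- gain * grad(c); return its path."""
    sign = 1.0 if attractive else -1.0
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = concentration_gradient((x, y), source, sigma)
        x += sign * gain * gx
        y += sign * gain * gy
        path.append((x, y))
    return path
```

In a model with several signaling molecules, each would contribute its own pseudo-force term, and the growth cone would follow their weighted sum.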

Spike-timing-dependent synaptic plasticity (STDP)

Building upon an earlier project that showed, mathematically and computationally, how spike-timing-dependent plasticity can influence the topology of brain networks, and in particular the prevalence of loops, this effort explores two additional potential roles for STDP: 1) differential modulation of STDP at the cortico-striatal synapse to modify cortico-striatal-thalamo-cortical loop strength and striatal control of the frontal lobe, and 2) spiking-based feature binding in circuits with feed-forward and feedback connectivity.

Cortical map formation

While computational models for cortical map formation abound, none has all of the following properties: support for multiple, arbitrary input attributes; multi-map compatibility; biological plausibility; rigorous grounding in theory and mathematics; capacity for over-complete representations; and biologically consistent, topographic representations. This effort seeks to develop a model satisfying all of these properties, in order to gain deeper insight into cortical map formation, to implement global brain models, and to make experimentally testable predictions about cortical representations.

Cortical signaling

This effort uses a mathematically principled approach to investigate the multi-map, spatio-temporal signaling in cortex that is necessary to resolve representational ambiguities (bind features) and to maximize representational discrimination given constraints on neural population size and connectivity. Such a model is necessary for understanding cortical signaling, implementing global brain models, and making predictions about neural signaling in cortex.
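The STDP effort above builds on the notion of a spike-timing-dependent learning window. The sketch below implements the common pair-based exponential window. A presynaptic spike shortly before a postsynaptic spike potentiates the synapse, and the reverse order depresses it. The amplitudes and time constants are typical illustrative values, not parameters from the projects above. Summing over all spike pairs is only one of several conventions in use.

```python
import math


def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, where dt_ms = t_post - t_pre (milliseconds)."""
    if dt_ms > 0:  # pre fires before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:  # post fires before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def total_dw(pre_spikes, post_spikes, **params):
    """Sum the pair-based rule over all pre/post spike pairs (all-to-all convention)."""
    return sum(stdp_dw(t_post - t_pre, **params)
               for t_pre in pre_spikes for t_post in post_spikes)
```

Because causal pairings strengthen connections and anti-causal pairings weaken them, repeated application tends to reshape which recurrent paths (loops) survive in a network.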

Cerebellar system modeling

This effort seeks to develop a system-level model incorporating nearly all brain regions related to the cerebellum (e.g., the inferior olive, the red nucleus, the cortex, the basal ganglia) that realizes or explains a broad range of behavioral, cognitive, and associative learning functions attributed to the cerebellum. This model has yielded insights into a system-level functional role for the cerebellum; it is also needed for global brain modeling, and it may yield insights into disorders such as tremor.

Emotion control system modeling

This collaboration with Martin Ingvar, Arne Ohman, Anders Lansner and David Silverstein at the Stockholm Brain Institute seeks to first develop a predictive, phenomenological model of the functional relationships between brain regions involved in emotional control based on fMRI studies. It will then expand the model with other modalities of observation to develop computational, meso-level models with biologically consistent functional neuroanatomy.

Global brain modeling

This effort explores the brain-behavior relationship with self-organizing global brain models capturing and exploiting complex relationships between perceptions, interoceptive measures, and behavioral and environmental states for selecting behaviors. These models build upon component models of specific brain regions, their relationships, biological constraints, and theoretical requirements.

Systems Biology

Modeling Biological pathways and cellular processes

We create mathematical models of biological processes that give us a quantitative understanding of the workings of specific cellular processes, such as p53 signaling and cardiac function. This work is connected to our other lines of research: in many cases, the data that we analyze feed into the models that we develop, and the models in turn allow us to interpret the results that arise from data mining.
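As an example of this kind of mathematical model, the sketch below integrates a deliberately simplified two-variable p53-Mdm2 negative feedback loop. p53 is produced at a constant rate and degraded in proportion to Mdm2, while p53 drives Mdm2 production. The equations and rate constants are illustrative assumptions, not one of the group's published models.

```python
import math


def simulate_p53_mdm2(s=1.0, a=1.0, b=1.0, c=1.0, p0=0.1, m0=0.1, dt=0.01, t_end=50.0):
    """Forward-Euler integration of a toy p53/Mdm2 negative feedback loop.

    dp/dt = s - a*m*p   (constant synthesis, Mdm2-dependent degradation of p53)
    dm/dt = b*p - c*m   (p53-induced synthesis, first-order decay of Mdm2)
    Returns a list of (t, p, m) tuples.
    """
    p, m = p0, m0
    trajectory = [(0.0, p, m)]
    for k in range(1, int(round(t_end / dt)) + 1):
        dp = s - a * m * p
        dm = b * p - c * m
        p, m = p + dt * dp, m + dt * dm
        trajectory.append((k * dt, p, m))
    return trajectory


def steady_state(s=1.0, a=1.0, b=1.0, c=1.0):
    """Analytic fixed point: m* = b p*/c and s = a m* p*  =>  p* = sqrt(s c / (a b))."""
    p_star = math.sqrt(s * c / (a * b))
    return p_star, b * p_star / c
```

For the default parameters, the linearized system at the fixed point has eigenvalues -1 ± i. The simulated trajectory therefore approaches the analytic steady state as a damped oscillation, a simple signature of negative feedback.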

Page 15 Life Sciences Research @ IBM

Inference and Reverse Engineering of cellular circuits

One of the present challenges in biological research is organizing the data generated by high-throughput technologies. One way to organize this information is as networks of influences, physical or statistical, between cellular components. We have ongoing efforts to develop methods for inferring biological networks from high-throughput data, with the goal of reconstructing network architecture from transcriptomics and proteomics data. We are also the organizers of the DREAM (Dialogue for Reverse Engineering Assessments and Methods) project, a community-based effort to reverse engineer cellular networks.
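One of the simplest inference strategies is a "relevance network", shown here only as a baseline and not as the group's method. Two genes are connected when the absolute Pearson correlation of their expression profiles across samples meets or exceeds a threshold. The threshold value and gene names below are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relevance_network(profiles, threshold=0.9):
    """Return undirected edges (gene1, gene2, r) whose |correlation| >= threshold."""
    edges = []
    for g1, g2 in itertools.combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges
```

Correlation cannot distinguish direct from indirect influence or recover edge direction. More elaborate inference methods aim to address exactly these limitations.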

Mining for topological motifs in cellular networks

One of the ways in which biology is understood today is via graphical, circuit-like models called pathway diagrams. These diagrams can be represented and analyzed using concepts from graph theory, and viewed from this perspective they raise a number of interesting questions. We are interested in detecting recurring motifs, such as feedback loops and cycles, in the biological circuits associated with gene regulatory networks, metabolic networks, and cell signaling circuits.
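As an illustration of this graph-theoretic view, the sketch below enumerates simple directed cycles (feedback loops) up to a given length in a pathway graph stored as an adjacency dictionary. Each cycle is reported once, starting from its lowest-ranked node. This plain depth-first search is adequate for small graphs; Johnson's algorithm is the usual choice for enumerating all cycles in larger ones. The node names are hypothetical.

```python
def simple_cycles(graph, max_len):
    """Enumerate simple directed cycles with at most max_len nodes.

    graph maps each node to an iterable of its successors. To report each cycle
    once, a search from `start` only visits nodes ranked after `start`.
    """
    nodes = sorted(graph)
    rank = {n: i for i, n in enumerate(nodes)}
    cycles = []

    def dfs(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                cycles.append(list(path))  # closed a loop back to the start node
            elif rank.get(nxt, -1) > rank[start] and nxt not in path and len(path) < max_len:
                path.append(nxt)
                dfs(start, nxt, path)
                path.pop()

    for s in nodes:
        dfs(s, s, [s])
    return cycles
```

To find recurring motifs, cycles found this way would then be counted by length or by edge sign and compared with their counts in randomized networks.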

Tissue level cardiac models on HPC

High performance computing is required to simulate whole-organ models of the heart with biophysically detailed cellular models. Given that Moore’s Law scaling can no longer be assumed at the processor level, further increases in compute power will require greater levels of parallelism. We present a data decomposition strategy that scales cardiac simulations to thousands of processors and can be implemented without a shared-memory architecture. We demonstrate that the simulation of a large-scale cardiac model with biophysically detailed cellular descriptions and high-resolution anatomical structure scales to 16,384 processors with speedup factors above the theoretical value. This opens up new opportunities for cardiac models in research and clinical applications to increase diagnostic information and value in cardiology. We also look for ways in which new data at the molecular scale can be combined with more classical data and with data collected at higher levels of organization, such as the myofibril, the muscle cell, and the whole heart. In this way, we hope to produce models that can be applied across spatial scales, a factor that is often key to understanding cardiac phenomena. For example, many cardiac drugs act at the molecular scale on ion channels, whereas one wants to understand their effects on arrhythmias and sudden cardiac death, that is, their effects at the whole-heart and organism level over much longer time scales.
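The general idea of a data decomposition can be sketched for the simplest case: a regular 3-D grid split into bricks, one per processor. This sketch rests entirely on that assumption. Realistic heart geometries are irregular, and the decomposition described above may balance load quite differently. The function names are hypothetical.

```python
def split_axis(n, p):
    """Split n cells into p contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    bounds, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def block_decomposition(shape, procs):
    """Map each rank to the ((x0, x1), (y0, y1), (z0, z1)) index ranges of its brick.

    shape = grid size per axis, procs = processor-grid size per axis. In a
    stencil-based solver each brick would typically also exchange a halo of
    boundary cells with its neighbors every time step (not shown).
    """
    xs, ys, zs = (split_axis(n, p) for n, p in zip(shape, procs))
    bricks = {}
    rank = 0
    for bx in xs:
        for by in ys:
            for bz in zs:
                bricks[rank] = (bx, by, bz)
                rank += 1
    return bricks
```

Since each rank owns a disjoint brick and communicates only boundary data, no shared memory is needed; message passing between neighboring ranks suffices.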

Multiscale model of the cardiac contraction and electromechanics

The availability of increased computing power will make possible new classes of biological models that include detailed representations of proteins and protein complexes with spatial interactions. Along these lines, we are developing a model of the interaction of actin and myosin within one pair of thick and thin filaments in the cardiac sarcomere. The model includes explicit representations of actin, myosin, and regulatory proteins (see Hussan et al., 2006). Although it is not an atomic-scale model, as would be the case for molecular dynamics simulations, it seeks to represent the spatial interactions between protein complexes that are thought to produce characteristic cardiac muscle responses at larger scales. While the model simulates the microscopic scale, its results, when extrapolated to larger structures, recapitulate complex, non-linear behavior. By bridging spatial scales, the model provides a plausible and quantitative explanation for several unexplained phenomena observed at the tissue level in cardiac muscle. Model execution entails Monte Carlo simulations of Markov representations of calcium regulation and actin-myosin interactions. Because it includes explicit representations of actin, myosin, and regulatory proteins, the detailed model is too computationally expensive for large-scale tissue simulations. However, it served as a guide for a more approximate, phenomenological model implemented with ordinary differential equations that has high computational efficiency (see Rice et al., Biophys J., 2008). This approximate model is currently being implemented in electromechanical models by our group and others.
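To show what Monte Carlo simulation of a Markov representation means in practice, the sketch below simulates many independent crossbridges in discrete time. Each crossbridge is a two-state Markov chain (detached or attached). This hypothetical two-state simplification is far simpler than the spatially explicit actin-myosin and calcium-regulation model described above, and the attachment rate `f` and detachment rate `g` are illustrative. For independent crossbridges, the simulated attached fraction should approach f/(f+g).

```python
import random


def simulate_crossbridges(n=1000, f=0.5, g=1.5, dt=0.01, t_end=10.0, seed=0):
    """Discrete-time Monte Carlo of n independent two-state crossbridges.

    Per step, a detached bridge attaches with probability f*dt and an attached
    bridge detaches with probability g*dt (assumes (f + g)*dt << 1).
    Returns the attached fraction after every step.
    """
    rng = random.Random(seed)
    attached = [False] * n
    fractions = []
    for _ in range(int(round(t_end / dt))):
        for i in range(n):
            if attached[i]:
                if rng.random() < g * dt:
                    attached[i] = False
            elif rng.random() < f * dt:
                attached[i] = True
        fractions.append(sum(attached) / n)
    return fractions
```

Spatially explicit models drop the independence assumption, since each unit's transition probabilities depend on the states of its neighbors. Such cooperative interactions are what make them costly to simulate and what an ODE approximation must capture in aggregate.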

Modeling bacterial chemotaxis: from molecules to behaviors

Our main goal is to develop quantitative, molecular-level models based on networks of protein-protein interactions that explain specific experimental behaviors of biological systems. We have worked on the chemosensory system of E. coli, studying the molecular machinery and mechanisms by which a biological system processes information and reacts to external signals. Eventually, we want to develop an in silico chemotaxis system that can predict the detailed response of the cell to different stimuli, for direct comparison with experiments. For a bacterium, this amounts to explaining its sensory system (“nerve”), signal processing system (“brain”), and locomotion (“muscle”) together. Besides understanding the specific properties of the bacterial chemotaxis system, another goal is to examine more general questions in modeling biological systems, such as functional stability (robustness), discrete versus continuous mathematics in biological modeling, the effects of noise in signaling systems, and the importance of spatial information (transport, spatial coupling).
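The sketch below is a minimal model of adaptation in the chemosensory pathway, loosely in the spirit of MWC-type receptor-cluster models but much simpler. Receptor activity a = 1/(1 + e^F) follows a two-state free energy F that depends on ligand concentration and methylation level m. Methylation changes as dm/dt = kR(1 - a) - kB·a, which returns the activity to kR/(kR + kB) after a step in attractant (perfect adaptation). All parameter values are illustrative assumptions.

```python
import math


def receptor_activity(ligand, m, n=6, alpha=1.7, m0=1.0, k_i=18.0, k_a=3000.0):
    """Activity of a receptor cluster from an MWC-style free energy (illustrative values).

    F = n * (alpha*(m0 - m) + ln((1 + L/k_i) / (1 + L/k_a))); k_i and k_a are the
    ligand dissociation constants of the inactive and active states (arbitrary units).
    """
    free_energy = n * (alpha * (m0 - m) + math.log((1 + ligand / k_i) / (1 + ligand / k_a)))
    return 1.0 / (1.0 + math.exp(free_energy))


def adaptation_response(step_time=10.0, ligand_after=100.0, k_r=0.1, k_b=0.1,
                        dt=0.01, t_end=150.0):
    """Activity over time for an attractant step from 0 to ligand_after at step_time."""
    m = 1.0  # with default parameters and no ligand this gives F = 0, i.e. the adapted a = 0.5
    activities = []
    for k in range(int(round(t_end / dt))):
        ligand = ligand_after if k * dt >= step_time else 0.0
        a = receptor_activity(ligand, m)
        m += dt * (k_r * (1.0 - a) - k_b * a)  # methylation feedback (forward Euler)
        activities.append(a)
    return activities
```

The attractant step sharply suppresses activity. Methylation then rises until the activity returns to its pre-stimulus level, which is why the adapted output is independent of the background ligand concentration.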

Functional and Medical Genomics

Data Analysis, noise characterization and novel biotechnologies

We develop algorithms to mine high-throughput data arising in biological labs, with specific emphasis on gene expression array data, ChIP-on-chip data, and proteomics. We typically use these data sets, which we obtain through our academic and industrial collaborators, to discover which biological pathways are involved in specific cellular processes. In particular, we have been actively engaged in the analysis of cancer data and in the characterization of noise in the technologies involved. We are also involved in the development of a DNA nanopore sequencer.
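One common way to characterize measurement noise in replicate or two-channel expression arrays is an MA analysis. For each probe, M = log2(x/y) is the log-ratio and A = (log2 x + log2 y)/2 is the mean log-intensity. Between replicates M should be near zero, so the spread of M within bins of A shows how noise depends on signal level. The sketch below is a generic illustration, not the group's noise model, and the equal-count binning scheme is an assumption.

```python
import math
import statistics


def ma_values(x, y):
    """Log-ratio M and mean log-intensity A for paired positive intensities."""
    m = [math.log2(a / b) for a, b in zip(x, y)]
    avg = [0.5 * (math.log2(a) + math.log2(b)) for a, b in zip(x, y)]
    return m, avg


def noise_by_intensity(x, y, n_bins=4):
    """Standard deviation of M within equal-count bins of increasing A.

    Returns a list of (median A, stdev of M) pairs, one per bin, from low to high
    intensity. Each bin needs at least two probes.
    """
    m, a = ma_values(x, y)
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = len(order) // n_bins
    result = []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size] if b < n_bins - 1 else order[b * size:]
        result.append((statistics.median([a[i] for i in idx]),
                       statistics.stdev([m[i] for i in idx])))
    return result
```

A typical finding is that low-intensity probes are much noisier in relative terms than bright ones. This is one reason simple fold-change cutoffs can be misleading.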

RNA interference (RNAi)

Perhaps the best-known early observation of this phenomenon is Rich Jorgensen’s effort in the late 1980s to engineer deep purple petunias by introducing extra copies of the gene for chalcone synthase, a key enzyme in anthocyanin biosynthesis. Instead, the engineered petunias turned out white or patterned. In the late 1990s, scientists realized that the petunia results were due to a mechanism now known as post-transcriptional gene silencing (PTGS), which decreased the expression of both the introduced and the endogenous copies of the chalcone synthase gene, leading to white or patterned flowers. Since Jorgensen’s serendipitous discovery, scientists have learned a great deal about PTGS and RNAi. Initially, RNAi was thought to be a defense mechanism that protects organisms from viruses and from transposable elements that have invaded their genomes. Today, however, RNAi is believed to play an important role in regulating many of the cellular processes characterized over decades of study.

Through our work, we try to address questions such as "how many microRNAs are encoded by a given genome?" and "given a microRNA, how many and which genes are its targets?". To this end, we have developed "rna22", a pattern-based method for addressing these questions. Our computational analyses to date suggest several hypotheses that paint a picture of cell regulation substantially different from what is currently believed. First, there may be as many as a few tens of thousands of endogenously encoded microRNAs (and their respective precursors) in the human genome. Second, as many as 90% of the known protein-coding human genes may be targets of one or more microRNAs. Third, a single microRNA may target as many as a few thousand genes.
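rna22's pattern-based approach is not reproduced here. For comparison, the sketch below shows the much simpler and widely used seed-matching heuristic. It finds sites in a target RNA that are perfectly complementary to the microRNA "seed" (nucleotides 2-8, counted from the 5' end). The example uses the human let-7a mature sequence. The target fragment in the test is made up, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=2, end=8):
    """Target-site sequence (5'->3') complementary to miRNA nucleotides start..end (1-based)."""
    seed = mirna[start - 1:end]
    return "".join(COMPLEMENT[nt] for nt in reversed(seed))


def find_seed_sites(target, mirna):
    """0-based positions in a target RNA where the 7-nt seed site occurs (overlaps allowed)."""
    site = seed_site(mirna)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]
```

For let-7a (`UGAGGUAGUAGGUUGUAUAGUU`), the seed `GAGGUAG` yields the site `CUACCUC`. Pure seed matching produces many false positives and misses non-canonical sites, which is part of the motivation for more elaborate, pattern-based methods.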

"Junk" DNA

In recent work, we described a large-scale computational analysis of the human genome aimed at revealing the underlying connections between coding and non-coding DNA. We discovered a very large number of short, highly conserved blocks that we termed pyknons. Pyknons were originally discovered in the intergenic and intronic regions of the human genome and were shown to have additional copies in the 5'UTRs, CDSs, and 3'UTRs of almost all known protein-coding human genes. Our studies also showed that pyknons are connected to biological processes and to RNA interference. Notably, this work predicted the existence of piRNAs, which were later reported experimentally by three different groups. We continue this work in order to better understand this very extensive layer of cell process regulation.
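The published pyknon analysis relied on genome-wide pattern discovery. The toy sketch below only conveys the two-part criterion in simplified form: a short block must recur many times in non-coding sequence and must also appear in at least one transcript region (UTR or CDS). The fixed k-mer length, the copy-number threshold and the example sequences are hypothetical, and far smaller than anything used on real genomes.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every length-k substring across a collection of sequences (overlaps allowed)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def pyknon_like_blocks(noncoding, transcripts, k=8, min_copies=3):
    """k-mers with >= min_copies copies in non-coding sequence that also occur in a transcript."""
    transcript_kmers = set(kmer_counts(transcripts, k))
    return sorted(kmer for kmer, n in kmer_counts(noncoding, k).items()
                  if n >= min_copies and kmer in transcript_kmers)
```

On real genomes, exact counting of fixed-length words would be replaced by pattern discovery that tolerates variable block lengths, and the thresholds would be chosen to make chance recurrence unlikely.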


Population Genetics and Personalized Medicine

We develop algorithms and techniques to analyze data in the study of genetic variation in large human populations, and we apply them to pharmacogenomics and personalized medicine. Recent work includes: data mining and statistical analysis of large health-record collections (several hundred thousand patients) to identify potential novel disease associations; algorithms to discover patterns of associated genetic markers in stratified, case vs. control populations; algorithms to reconstruct non-recombinant phylogenies; and algorithms to reconstruct ancestral recombination graphs, with application to the reconstruction of human recombinant phylogeny.
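As a concrete example of a case vs. control association test, the sketch below computes Pearson's chi-square statistic for a 2x2 table of allele counts (risk allele vs. other allele, in cases vs. controls). It also computes the p-value with one degree of freedom, using the identity P(χ²₁ > x) = erfc(√(x/2)). This textbook single-marker test is shown for illustration only. The group's pattern-discovery algorithms for stratified populations are more involved, and the example counts are made up.

```python
import math


def allelic_chi_square(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

For instance, 30 risk alleles out of 100 in cases vs. 10 out of 100 in controls gives χ² = 12.5 and p ≈ 4e-4. In a genome-wide scan, that p-value would still need correction for the very large number of markers tested.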


Computational Biology Selected publications, 2000 - Present

Structural Biology

• A. Grossfield, M.C. Pitman, S.E. Feller, O. Soubias and K. Gawrisch. Internal Hydration Increases during Activation of the G-Protein-Coupled Receptor Rhodopsin. J. Mol. Biol. (2008) 381, 478-486
• G. Khelashvili, A. Grossfield, S.E. Feller, M.C. Pitman, and H. Weinstein, Structural and dynamic effects of cholesterol at preferred sites of interaction with rhodopsin identified from microsecond length molecular dynamics simulations. Submitted to Proteins.
• B.G. Fitch, A. Rayshubskiy, M. Eleftheriou, T. J. C. Ward, M. E. Giampapa, M. C. Pitman, J. W. Pitera, W. C. Swope, , and R. S. Germain. Blue matter: Scaling of n-body simulations to one atom per node. IBM Journal of Research and Development, 52(1/2):145-158, 2008. (doi:10.1147/rd.521.0145)
• P.-W. Lau, A. Grossfield, S.E. Feller, M.C. Pitman, M.F. Brown. Dynamic structure of retinal inverse agonist of rhodopsin probed by molecular dynamics, J. Mol. Biol. 2007, 372, 906-917
• A. Grossfield, S.E. Feller and M.C. Pitman. Convergence of molecular dynamics simulations of membrane proteins, Proteins: Struc. Func. Bioinf., 2007, 67, 31-40
• H. Lan, X. Huang, R. Zhou and B. J. Berne, Water dynamics in the solvation shell of a multi-domain protein, J. Phys. Chem. B., 110, 3704-3711, 2006
• T. Z. Lwin, R. Zhou and R. Luo, Is Poisson-Boltzmann theory insufficient for protein folding simulations? J. Chem. Phys. 124, 34902-34907, 2006
• R. Zhou, A. Royyuru, P. Athma, F. Suits and B.D. Silverman, Magnitude and Spatial Orientation of the Hydrophobic Moments of Multi-Domain Proteins, Int. J. Bioinf. Res. Appl. 2, 161-176, 2006
• K. Martínez-Mayorga, M.C. Pitman, A. Grossfield, S.E. Feller and M.F. Brown. Retinal counterion switch mechanism in vision evaluated by molecular simulation, J. Am. Chem. Soc., 2006, 128, 16502-16503
• A. Grossfield, S.E. Feller, and M.C. Pitman, A role for direct interactions in the modulation of rhodopsin by omega-3 polyunsaturated lipids. Proc. Natl. Acad. Sci. U S A, 2006. 103(13): p. 4888-4893.
• A. Grossfield, S.E. Feller, and M.C. Pitman, Contribution of omega-3 Fatty acids to the thermodynamics of membrane protein solvation. J Phys Chem B Condens Matter Mater Surf Interfaces Biophys, 2006. 110(18): p. 8907-9.
• X. Huang, R. Zhou and B. J. Berne, Drying and Hydrophobic Collapse of Paraffin Plates, J. Phys. Chem. B. 109, 3546-3552, 2005
• J. Li, Y. Lei, T. Liu, Z. Wu, X. Tang, and R. Zhou, Water hydration near graphite-CH3 and graphite-COOH plates, J. Phys. Chem. B. 109, 13639-13648, 2005
• L. Parida and R. Zhou, Combinatorial Pattern Discovery for Protein Folding Trajectory Analysis, PLoS J. Comp. Biol., 1, 32-40, 2005
• Y. Lei, J. Li, T. Liu, Z. Wu and R. Zhou, Wavelets approach for protein trajectory analysis, J. Bioinfo. Comp. Biol. 3, 1351-1370, 2005
• P. Liu, X. Huang, R. Zhou and B. J. Berne, Drying and Hydrophobic Collapse of Melittin Tetramer, Nature, 437, 159-162, 2005
• R.F. Enenkel, B.G. Fitch, R.S. Germain, F.G. Gustavson, A. Martin, M. Mendell, J. Pitera, M.C. Pitman, A. Rayshubskiy, F. Suits, W.C. Swope, and T.J.C. Ward, Custom math functions for molecular dynamics. IBM Journal of Res. and Dev., 2005. 49(3/4): p. 465-474.
• M.C. Pitman, A. Grossfield, F. Suits, and S.E. Feller, Role of cholesterol and polyunsaturated chains in lipid-protein interactions: Molecular dynamics simulation of rhodopsin in a realistic membrane environment. J. Am. Chem. Soc., 2005. 127: p. 4576-4577.
• M.C. Pitman, F. Suits, K. Gawrisch, and S.E. Feller, Molecular dynamics investigation of dynamical properties of phosphatidylethanolamine lipid bilayers. J. Chem. Phys., 2005. 122(24): p. 244715.
• F. Suits, M.C. Pitman, and S.E. Feller, Molecular dynamics investigation of the structural properties of phosphatidylethanolamine lipid bilayers. J Chem Phys, 2005. 122(24): p. 244714.
• F. Suits, M.C. Pitman, J. Pitera, W.C. Swope, and R.S. Germain, Overview of molecular dynamics techniques and early scientific results from the Blue Gene project. IBM Journal of Res. and Dev., 2005. 49(2/3): p. 475-488.
• R. Zhou, Sampling Protein Folding Free Energy Landscape: Coupling Replica Exchange Method with P3ME/RESPA Algorithm, J. Mol. Graph. Model. 22, 451-463, 2004
• R. Zhou, G. Krilov and B. J. Berne, Comment on "Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?": The Poisson-Boltzmann Model, J. Phys. Chem. B 108, 7528-7530, 2004
• W. Swope, J. Pitera, F. Suits, M. Pitman, M. Eleftheriou, B. Fitch, R. Germain, A. Rayshubskiy, T. J. C. Ward, Y. Zhestkov, and R. Zhou, Describing Protein Folding Kinetics by Molecular Dynamics Simulations: II. Application to a beta-hairpin Peptide, J. Phys. Chem. B 108, 6582-6594, 2004
• R. Zhou, X. Huang, C. Margulis and B. J. Berne, Dewetting and Hydrophobic Collapse in Multi-domain Protein Folding, Science, 305, 1605-1609, 2004
• M.C. Pitman, F. Suits, A.D. Mackerell, Jr., and S.E. Feller, Molecular-level organization of saturated and polyunsaturated fatty acids in a phosphatidylcholine bilayer containing cholesterol. Biochemistry, 2004. 43(49): p. 15318-28.
• G. Kaminski, R. A. Friesner and R. Zhou, A computationally inexpensive modification of the point dipole electrostatic polarization model for molecular simulation, J. Comp. Chem. 24, 267-276, 2003
• R. Zhou, B. D. Silverman, A. Royyuru, and P. Athma, Spatial Profiling of Protein Hydrophobicity: Native vs. Decoy Structures, Proteins, 52, 561-572, 2003
• R. Zhou, Folding free energy landscape of protein folding in water: explicit vs. implicit solvent, Proteins 53, 148-161, 2003
• B.G. Fitch, R. S. Germain, M. Mendell, J. Pitera, M. Pitman, A. Rayshubskiy, Y. Sham, F. Suits, W. Swope, T. J. C. Ward, Y. Zhestkov, and R. Zhou, Blue Matter, An Application Framework for Molecular Simulation on Blue Gene, J. Parallel & Distrib. Comput. 63, 759-773, 2003
• R. Zhou, Trp-cage: Folding Free Energy Landscape in Explicit Water, Proc. Natl. Acad. Sci., 100, 13280-13285, 2003
• G. A. Kaminski, H. A. Stern, B. J. Berne, R. A. Friesner, Y. Cao, R. B. Murphy, R. Zhou, and T. A. Halgren, Development of a Polarizable Force Field for Proteins via ab initio Quantum Chemistry: First Generation Model and Gas phase Tests, J. Comp. Chem. 23, 1515-1531, 2002
• R. Zhou and B. J. Berne, Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?, Proc. Natl. Acad. Sci. 99, 12777-12782, 2002
• F. Allen, et al., R. Zhou, Blue Gene: A vision for protein science using a petaflop supercomputer, IBM Systems Journal 40, 310-327, 2001
• R. Zhou, E. Harder, H. Xu and B. J. Berne, Efficient multiple time step method for use with Ewald and Particle-Mesh Ewald for large biomolecular systems, J. Chem. Phys. 115, 2348-2358, 2001
• R. Zhou, R. A. Friesner, A. Ghosh, R. C. Rizzo, W. L. Jorgensen, and R. M. Levy, New Linear Interaction Method for Binding Affinity Calculations using a Continuum Solvent Model, J. Phys. Chem. B 105, 10388-10397, 2001
• R. Zhou, B. J. Berne and R. Germain, Free energy landscape of a beta-hairpin folding in explicit water, Proc. Natl. Acad. Sci. 98, 14931-14936, 2001
• M.C. Pitman, W.K. Huber, H. Horn, A. Kramer, J.E. Rice, and W.C. Swope, FLASHFLOOD: a 3D field-based similarity search and alignment method for flexible molecules. J Comput Aided Mol Des, 2001. 15(7): p. 587-612.

Neuroimaging

• M. Carroll, G.A. Cecchi, I. Rish, R. Garg, A.R. Rao (2008) Prediction and interpretation of distributed neural activity with sparse models. NeuroImage, 44(1):112-122.
• Y. Xiao, A.R. Rao, G.A. Cecchi, E. Kaplan (2008) Improved mapping of information distribution across the cortical surface with the Support Vector Machine, Neural Networks, 21(2/3):341-348.
• “Network related challenges and insights from neuroscience,” C.C. Peck, J.R. Kozloski, G.A. Cecchi, S. Hill, F. Schuermann, H. Markram, A.R. Rao, appeared in BioWire, published by Springer Verlag, 2008.
• “Inferring brain dynamics using Granger causality on fMRI data”, G.A. Cecchi, R. Garg and A.R. Rao, IEEE Intl. Symposium on Biomedical Imaging (ISBI), 2008, pp. 604-607.
• “Unsupervised segmentation with dynamical units,” A.R. Rao, G.A. Cecchi, C.C. Peck, J. Kozloski, IEEE Trans. Neural Networks, Vol 19, No. 1, Jan 2008, pp. 168-182.
• “Prediction of Brain Activity Based on the Elastic Net Algorithm I”, G.A. Cecchi, R. Garg, A.R. Rao, I. Rish. Human Brain Mapping Conference 2007, Pittsburgh Brain Activity Interpretation Competition.
• “Prediction of Brain Activity Based on the Elastic Net Algorithm II”, I. Rish, G.A. Cecchi, R. Garg, A.R. Rao. Human Brain Mapping Conference 2007, Pittsburgh Brain Activity Interpretation Competition.
• “Topographic Infomax in a Neural Multigrid”, J. Kozloski, G.A. Cecchi, C. Peck, A.R. Rao, ISNN 2007, International Symposium on Neural Networks, pp. 500-509.
• “Emergence of Topographic Cortical Maps in a Parameterless Local Competition Network”, A.R. Rao, G.A. Cecchi, C.C. Peck, J. Kozloski, ISNN 2007, International Symposium on Neural Networks, pp. 552-561.
• “Performance characterization of an oscillatory neural network that achieves binding through phase synchronization”, A.R. Rao, G.A. Cecchi, C.C. Peck, J.R. Kozloski, Proceedings of Dynamic Brain Forum, Japan, 2007.
• “Cortical representation of information about visual attributes: one network or many?” Y. Xiao, A.R. Rao, G.A. Cecchi, E. Kaplan, in International Joint Conference on Neural Networks, IJCNN 2007, pp. 1785-1789.
• “An optimization approach to achieve unsupervised segmentation in a network using dynamical units”, Rao, Cecchi, Peck, Kozloski, Proceedings of IEEE Intl. Joint Conf on Neural Networks, July 2006, pp. 4159-4166.
• “The use of parameterless self-organizing maps in modeling the visual cortex”, Rao, Cecchi, to appear in International Symposium on Neural Networks, June 2007.
• Evaluation of the effect of input stimuli on the quality of orientation maps produced through self organization, A. R. Rao, G. Cecchi, C. Peck and J. Kozloski, 14th Scandinavian Conference on Image Analysis, Springer Verlag LNCS 3540, pp. 810-820

Neuroscience Analysis and Modeling

• J. Kozloski, K. Sfyrakis, S. Hill, F. Schürmann, H. Markram (2008) Identifying, tabulating, and analyzing contacts between branched neuron morphologies, IBM Journal of Research and Development special issue on Massively Parallel Computing, 52(1/2):43-55.
• R. Rao, G. A. Cecchi (2008) Spatio-temporal Dynamics during Perceptual Processing in an Oscillatory Neural Network, International Conference on Artificial Neural Networks, 2:685-694.
• Y. Xiao, A.R. Rao, G.A. Cecchi, E. Kaplan (2008) Improved mapping of information distribution across the cortical surface with the Support Vector Machine, Neural Networks, 21(2/3):341-348.
• “Network related challenges and insights from neuroscience,” C.C. Peck, J.R. Kozloski, G.A. Cecchi, S. Hill, F. Schuermann, H. Markram, A.R. Rao, appeared in BioWire, published by Springer Verlag, 2008.
• “Unsupervised segmentation with dynamical units,” A.R. Rao, G.A. Cecchi, C.C. Peck, J. Kozloski, IEEE Trans. Neural Networks, Vol 19, No. 1, Jan 2008, pp. 168-182.
• “Prediction of Brain Activity Based on the Elastic Net Algorithm I”, G.A. Cecchi, R. Garg, A.R. Rao, I. Rish. Human Brain Mapping Conference 2007, Pittsburgh Brain Activity Interpretation Competition.
• “Prediction of Brain Activity Based on the Elastic Net Algorithm II”, I. Rish, G.A. Cecchi, R. Garg, A.R. Rao. Human Brain Mapping Conference 2007, Pittsburgh Brain Activity Interpretation Competition.
• “Topographic Infomax in a Neural Multigrid”, J. Kozloski, G.A. Cecchi, C. Peck, A.R. Rao, ISNN 2007, International Symposium on Neural Networks, pp. 500-509.
• “Emergence of Topographic Cortical Maps in a Parameterless Local Competition Network”, A.R. Rao, G.A. Cecchi, C.C. Peck, J. Kozloski, ISNN 2007, International Symposium on Neural Networks, pp. 552-561.
• “Performance characterization of an oscillatory neural network that achieves binding through phase synchronization”, A.R. Rao, G.A. Cecchi, C.C. Peck, J.R. Kozloski, Proceedings of Dynamic Brain Forum, Japan, 2007.
• “Cortical representation of information about visual attributes: one network or many?” Y. Xiao, A.R. Rao, G.A. Cecchi, E. Kaplan, in International Joint Conference on Neural Networks, IJCNN 2007, pp. 1785-1789.
• “An optimization approach to achieve unsupervised segmentation in a network using dynamical units”, Rao, Cecchi, Peck, Kozloski, Proceedings of IEEE Intl. Joint Conf on Neural Networks, July 2006, pp. 4159-4166.
• “The use of parameterless self-organizing maps in modeling the visual cortex”, Rao, Cecchi, to appear in International Symposium on Neural Networks, June 2007.
• Evaluation of the effect of input stimuli on the quality of orientation maps produced through self organization, A. R. Rao, G. Cecchi, C. Peck and J. Kozloski, 14th Scandinavian Conference on Image Analysis, Springer Verlag LNCS 3540, pp. 810-820
• “Computational models of adult neurogenesis”, G.A. Cecchi & M.O. Magnasco, Physica A 356, 43-47 (2005).
• “Scale-free brain functional networks”, V.M. Eguiluz, D.R. Chialvo, G.A. Cecchi, M. Baliki, A.V. Apkarian, Physical Review Letters 94, 018102 (2005).
• “Global properties of the Wordnet lexicon”, M. Sigman & G.A. Cecchi, Proceedings of the National Academy of Sciences USA 99 (3): 1742-7 (2002).
• Self organizing cortical color maps, A. R. Rao, G. Cecchi, C. Peck and J. Kozloski, in proceedings, Human Vision and Electronic Imaging, Jan 2005, SPIE Vol. 5666 pp. 17-26.
• “Simulation system architecture”, C. Peck, G. Cecchi, A. R. Rao, and J. Kozloski in Intl. Conf on Computer Systems, Springer Verlag, pp. 1127-1136 June 2003.

Systems Biology

• G. Meacci and Yuhai Tu. “Dynamics of the bacterial flagellar motor with multiple stators”, PNAS, accepted, 2009. • Y.V. Kalinin, L. Jiang, Y. Tu and M. Wu. “Logarithmic sensing in Escherichia coli bacteria chemotaxis”, Biophysical Journal, to appear, 2009. • M. Reumann, V. Gurev and J. Jeremy Rice, Computational model of cardiac disease: potential for personalized medicine, Personalized Medicine, 6(1), 45-66 (2009). • J. Wagner and G. Stolovitzky. Stability and time-delay modeling of negative feedback loops. Proceedings of the IEEE, 96(8):1398-1410, 2008.
• J.J. Rice, F. Wang, D.M. Bers and P.P. de Tombe. Approximate model of cooperative activation and crossbridge cycling in cardiac muscle using ordinary differential equations, Biophys J. 95(5):2368-90 (2008). • A. Ma’ayan, G. Cecchi, J. Wagner, R. Rao, R. Iyengar, G. Stolovitzky, Ordered cyclic motifs contribute to dynamic stability in biological and engineered networks, Proc Natl Acad Sci U S A., 105(49):19235-39 (2008). • P.N. Ayittey, J.S. Walker, J.J. Rice and P.P. de Tombe. Glass microneedles for force measurements: a finite-element analysis model, Pflugers Arch. - Eur J Physiol (2008). • Y. Tu, T. S. Shimizu and H. Berg. “Modeling the chemotactic response of E. coli to time-varying stimuli”, PNAS, 105(39), 14855-14860 (2008). • Y. Tu. “The nonequilibrium mechanism for a biological switch: Sensing by Maxwell’s demons”, PNAS, 105(33), 11737-11741 (2008). • C. Christin, A.K. Smilde, H.C. Hoefsloot, F. Suits, R. Bischoff and P.L. Horvatovich, Optimized time alignment algorithm for LC-MS data: correlation optimized warping using component detection algorithm-selected mass chromatograms, Anal Chem. 15;80(18):7012-21 (2008). • J.J. Rice, Y. Tu, C. Poggesi and P.P. de Tombe. Spatially-compressed cardiac myofilament models generate hysteresis that is not found in real muscle, Pac Symp Biocomput. 366-77 (2008). • P. Du, G. Stolovitzky, P. Horvatovich, R. Bischoff, J. Lim and F. Suits, A Noise Model for Mass Spectrometry Based Proteomics, Bioinformatics 24(8):1070-7 (2008). • M. Reumann, J. Bohnert, G. Seemann, B. Osswald and O. Dössel. Preventive Ablation Strategies in a Biophysical Model of Atrial Fibrillation Based on Realistic Anatomical Data. IEEE Transactions on Biomedical Engineering, 2008;55(2):399-406. • M. Reumann, B. G. Fitch, A. Rayshubskiy, D. L. Weiss, G. Seemann, O. Dössel, M. C. Pitman and J. J. Rice. Large-scale parallel and distributed memory implementation of a bidomain cardiac model on the Blue Gene/L supercomputer.
In Proc Computers in Cardiology 2008, 2008;35:81-84. • M. Reumann, B. G. Fitch, A. Rayshubskiy, D. U. J. Keller, D. L. Weiss, G. Seemann, O. Dössel, M. C. Pitman and J. J. Rice. Simulation framework for high resolution cardiac models on the distributed memory Blue Gene supercomputer. Conf Proc IEEE Eng Med Biol Soc. 2008;2008:577-580. • G.A. Stolovitzky, D. Monroe and A. Califano. Dialogue on Reverse Engineering Assessment and Methods: the DREAM of high throughput pathway inference, Ann N Y Acad Sci., Oct 9; (2007). • W. Hu, Z. Feng, J. Wagner, L. Ma, G. Stolovitzky and A.J. Levine, A single nucleotide polymorphism in the MDM2 gene disrupts the p53-Mdm2 oscillation, Cancer Res., 67(6):2757-65 (2007). • S. Polonsky, S. Rossnagel and G. Stolovitzky, Nanopore in metal-dielectric sandwich for DNA position control, Applied Physics Letters, 91, 153103 (2007). • P. Horvatovich, N.I. Govorukhina, T.H. Reijmers, A.G. van der Zee, F. Suits and R. Bischoff, Chip-LC-MS for label-free profiling of human serum, Electrophoresis. 2007 Dec;28(23):4493-505.
• G. Stolovitzky and A. Califano, book editors: Reverse Engineering Biological Networks, Annals of the NY Academy of Sciences, vol. 1115, (2007). • B.A. Mello and Y. Tu. “Effects of adaptation in maintaining high sensitivity over a wide range of backgrounds for E. coli chemotaxis”, Biophysical Journal, 92(4), 2329-2337 (2007). • Y. Tu and J. Tersoff. “Coarsening, Mixing and Motion: The complex evolution of epitaxial islands”, Physical Review Letters 98, 096103-096106 (2007). • G. A. Held, K. Duggar and G. Stolovitzky, "Comparison of Amersham and Agilent microarray technologies through quantitative noise analysis," OMICS, 10(4), 532-544 (2006). • J. Hussan, P.P. de Tombe and J.J. Rice, A spatially detailed myofilament model as a basis for large-scale biological simulations, IBM Journal of Research and Development, 50(6) (Issue on Systems Biology) (2006). • G. Stolovitzky, A. Kundaje, G.A. Held, K. Duggar, C. Haudenschild, D. Zhou, T. Vasicek, K. Smith, A. Aderem and J. Roach, Statistical analysis of MPSS measurements: application to the study of LPS activated macrophage gene expression, Proc. Natl. Acad. Sci. USA, 102 (5), 1402-1407, (2005). • K. Basso, A. Margolin, G. Stolovitzky, U. Klein, R. Dalla Favera, and A. Califano, Reverse engineering of regulatory networks in human B cells, Nature Genetics, 37(4), 382-90 (2005). • J.J. Rice, Y. Tu and G. Stolovitzky, Reconstructing synthetic biological networks using conditional correlation analysis, Bioinformatics (Epub 2004 Oct 14), 21(6):765-73 (2005). • A. Ma’ayan, S. L. Jenkins, S. Neves, A. Hasseldine, E. Grace, B. Dubin-Thaler, N. J. Eungdamrong, G. Weng, P. Ram, J. J. Rice, A. Kershenbaum, G. Stolovitzky, R. D. Blitzer and R. Iyengar, Formation of regulatory patterns during signal propagation in a mammalian cellular network, Science, 309, 1078-1083 (2005). • J.J. Rice, A. Kershenbaum and G. Stolovitzky, Lasting impressions: Motifs in protein-protein maps may provide footprints of evolutionary events, Proc Natl Acad Sci U S A. 1;102(9):3173-4 (2005). • J. Wagner, L. Ma, J.J. Rice, W. Hu, A.J. Levine and G.A. Stolovitzky, p53-Mdm2 loop controlled by a balance of its feedback strength and effective dampening using ATM and delayed feedback, IEE Proc Sys Biol, 152, 3, 109-118 (2005). • L. Ma, J. Wagner, J.J. Rice, W. Hu, A. Levine and G. Stolovitzky, A plausible model for the digital response of p53 to DNA damage, Proc Natl Acad Sci U S A. 1;102(40):14266-14271 (2005). • J.J. Rice and D.M. Bers. The response of cardiac muscle to stretch: The role of calcium. In Cardiac Mechano-electric Feedback and Arrhythmias: From Pipette to Patient. Kohl, Franz, and Sachs, eds. Elsevier: Philadelphia (2005). • B.A. Mello and Yuhai Tu. “An allosteric model for heterogeneous receptor complexes: Understanding bacterial chemotaxis responses to multiple stimuli”, Proceedings of National Academy of Sciences, 102(48), 17354-17359 (2005). • Y. Tu and G. Grinstein. “How white noise generates power-law switching in bacterial motors”, PRL, 94, 208101 (2005). • J.J. Rice and G. Stolovitzky, Making the most of it: Pathway reconstruction and
integrative simulation using the data at hand, Biosilico 2(2):70-7 (2004). • J. Lepre, J.J. Rice, Y. Tu, and G. Stolovitzky, Genes@Work: an efficient algorithm for pattern discovery and multivariate feature selection in gene expression data, Bioinformatics 20(7):1033-44 (2004). • K. Basso, U. Klein, H. Niu, G. Stolovitzky, Y. Tu, A. Califano, G. Cattoretti, R. Dalla Favera, Tracking CD40 signaling during normal germinal center development, Blood 104(13):4088-96 (2004). • J.J. Rice and P.P. de Tombe. Approaches to modeling crossbridges and calcium-dependent activation in cardiac muscle. Prog Biophys Mol Biol. Jun-Jul;85(2-3):179-95 (2004). • B.A. Mello, L. Shaw and Y. Tu. “Effects of receptor interaction in bacterial chemotaxis”, Biophysical Journal, 87(3), 1578-1595 (2004). • D.F. Jelinek, R.C. Tschumper, G. Stolovitzky, S.J. Iturria, Y. Tu, J. Lepre, N. Shah, and N.E. Kay, Identification of a global Gene Expression Signature of B-Chronic-Lymphocytic Leukemia, Molecular Cancer Research, 1(5):346-61 (2003). • U. Klein, Y. Tu, G. Stolovitzky, J.L. Keller, J. Haddad Jr., V. Miljkovic, G. Cattoretti, A. Califano, and R. Dalla Favera, Transcriptional analysis of the germinal-center reaction, Proc. Nat. Acad. Sci. USA, 100(5):2639-44, (2003). • G. Stolovitzky, Gene selection in microarray data: the elephant, the blind men and our algorithms. Current Opinion in Structural Biology, 13:370–376 (2003). • R. Kuppers, U. Klein, I. Schwering, V. Distler, A. Brauninger, G. Cattoretti, Y. Tu, G. Stolovitzky, A. Califano, M.L. Hansmann, R. Dalla Favera, Identification of Hodgkin and Reed-Sternberg cell-specific genes by gene expression profiling. J Clin Invest. 111(4):529-37 (2003). • J.J. Rice, G. Stolovitzky, Y. Tu, and P.P. de Tombe, Ising Model of cardiac thin filament activation with nearest neighbor interactions, Biophys. Journal, 84(2), 897-909, (2003). • B.A. Mello and Y. Tu. “Quantitative Modeling of Sensitivity in Bacterial Chemotaxis: The Role of Coupling Between Different Chemoreceptor Species”, PNAS, 100(14), 8223-8228 (2003). • B.A. Mello and Y. Tu. “Perfect and near perfect adaptation in a model of bacterial chemotaxis”, Biophysical Journal, 84(5), 2843-2856 (2003). • S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y. Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen, D. Zagzag, J.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander, T.R. Golub, Prediction of central nervous system embryonal tumour outcome based on gene expression, Nature, Jan 24;415(6870):436-42 (2002). • U. Klein, Y. Tu, G. Stolovitzky, M. Mattioli, G. Cattoretti, H. Husson, A. Freedman, G. Inghirami, L. Cro, L. Baldini, A. Neri, A. Califano and R. Dalla Favera, Gene expression profiling of B cell chronic lymphocytic leukemia reveals a homogeneous phenotype related to memory B cells, J Exp Med. Dec
3;194(11):1625-38 (2001). • J.J. Rice and M.S. Jafri. Modeling calcium handling in cardiac cells. Philosophical Transactions of the Royal Society London, 359: 1143-1157 (2001).

Functional and Medical Genomics

• Silverman, B.D. Modeling the Effect of Growth Rate and Survivability Trade-Offs on Species Coexistence and Spatial Topology at a Traveling Invasive Wave-Front. Ecological Modeling. In Press. • Kirino, Y., N. Kim, M. de Planell-Saguer, E. Khandros, S. Chiorean, P. S. Klein, I. Rigoutsos, T. A. Jongens and Z. Mourelatos, Methylation of Piwi proteins is conserved across phyla and in D. melanogaster it is catalyzed by PRMT5 and required for Ago3 and Aub stability. Nature Cell Biology. To Appear. • Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, Haber M, Xue Y, Izaabel H, Bosch E, Adams SM, Arroyo E, López-Parra AM, Aler M, Picornell A, Ramon M, Jobling MA, Comas D, Bertranpetit J, Wells RS, Tyler-Smith C; Genographic Consortium. Identifying genetic traces of historical expansions: Phoenician footprints in the Mediterranean. Am J Hum Genet. 2008 Nov;83(5):633-42. • Parida L, Melé M, Calafell F, Bertranpetit J; Genographic Consortium. Estimating the ancestral recombinations graph (ARG) as compatible networks of SNP patterns. J Comput Biol. 2008 Nov;15(9):1133-54. • Rosset S, Wells RS, Soria-Hernanz DF, Tyler-Smith C, Royyuru AK, Behar DM; Genographic Consortium. Maximum-likelihood estimation of site-specific mutation rates in human mitochondrial DNA from partial phylogenetic classification. Genetics. 2008 Nov;180(3):1511-24. • Behar DM, Blue-Smith J, Soria-Hernanz DF, Tzur S, Hadid Y, Bormans C, Moen A, Tyler-Smith C, Quintana-Murci L, Wells RS; Genographic Consortium. A novel 154-bp deletion in the human mitochondrial DNA control region in healthy individuals. Hum Mutat. 2008 Dec;29(12):1387-91. • Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, Metspalu E, Scozzari R, Makkan H, Tzur S, Comas D, Bertranpetit J, Quintana-Murci L, Tyler-Smith C, Wells RS, Rosset S; Genographic Consortium. The dawn of human matrilineal diversity. Am J Hum Genet. 2008 May;82(5):1130-40.
• Zalloua PA, Xue Y, Khalife J, Makhoul N, Debiane L, Platt DE, Royyuru AK, Herrera RJ, Hernanz DF, Blue-Smith J, Wells RS, Comas D, Bertranpetit J, Tyler-Smith C; Genographic Consortium. Y-chromosomal diversity in Lebanon is structured by recent historical events. Am J Hum Genet. 2008 Apr;82(4):873-82. • Gan RJ, Pan SL, Mustavich LF, Qin ZD, Cai XY, Qian J, Liu CW, Peng JH, Li SL, Xu JS, Jin L, Li H; Genographic Consortium. Pinghua population as an exception of Han Chinese's coherent genetic structure. J Hum Genet. 2008;53(4):303-13. • Detection of Subtle Variations as Consensus Motifs, Matteo Comin, Laxmi Parida, Theoretical Computer Science, 395(2-3), pp 158-170, May, 2008. • Christin C, Smilde AK, Hoefsloot HC, Suits F, Bischoff R, Horvatovich PL., Optimized time alignment algorithm for LC-MS data: correlation optimized
warping using component detection algorithm-selected mass chromatograms, Anal Chem. 15;80(18):7012-21 (2008). • Frank Suits, Jorge Lepre, Peicheng Du, Rainer Bischoff, and Peter Horvatovich, A New Two-Dimensional Method for Time Aligning Liquid Chromatography-Mass Spectrometry Data, Anal Chem. 80(9):3095-104 (2008). • Peicheng Du, Gustavo Stolovitzky, Peter Horvatovich, Rainer Bischoff, Jihyeon Lim, Frank Suits, A Noise Model for Mass Spectrometry Based Proteomics, Bioinformatics 24(8):1070-7 (2008). • Adam A. Margolin, Teresa Palomero, Pavel Sumazin, Andrea Califano, Adolfo Ferrando, Gustavo Stolovitzky, ChIP-on-chip significance analysis reveals large-scale binding and regulation by human transcription factor oncogenes, Proc Natl Acad Sci U S A. 106(1):244-9 (2008). • Tay, Y., A. Thomson, T. Huynh, J. Zhang, B. Lim and I. Rigoutsos, “MicroRNAs to Nanog, Oct4 & Sox2 coding regions modulate embryonic stem cell differentiation.” Nature, 455(7216):1124-8, October 2008. Epub September 17 2008. • Kalyuzhnaya, M. G., A. Lapidus, N. Ivanova, A. C. Copeland, A. C. McHardy, E. Szeto, A. Salamov, I. V. Grigoriev, D. Suciu, S. R. Levine, V. M. Markowitz, I. Rigoutsos, S. G. Tringe, D. C. Bruce, P. M. Richardson, M. E. Lidstrom and L. Chistoserdova, “High-resolution metagenomics targets major functional types in complex microbial communities.” Nature Biotechnology, 26(9):1029-34, September 2008. • Holland, L. Z., R. Albalat, K. Azumi, E. Benito-Gutierrez, M. J. Blow, M. Bronner-Fraser, F. Brunet, T. Butts, S. Candiani, L. J. Dishaw, D. E. Ferrier, J. Garcia-Fernandez, J. J. Gibson-Brown, C. Gissi, A. Godzik, F. Hallbook, D. Hirose, K. Hosomichi, T. Ikuta, H. Inoko, M. Kasahara, J. Kasamatsu, T. Kawashima, A. Kimura, M. Kobayashi, Z. Kozmik, K. Kubokawa, V. Laudet, G. W. Litman, A. C. McHardy, D. Meulemans, M. Nonaka, R. P. Olinski, Z. Pancer, L. Pennacchio, M. Pestarino, J. P. Rast, I. Rigoutsos, M. Robinson-Rechavi, G. Roch, H. Saiga, Y. Sasakura, M. Satake, Y. Satou, M. Schubert, N. Sherwood, T. Shiina, N. Takatori, J. Tello, P. Vopalensky, S. Wada, A. Xu, Y. Ye, K. Yoshida, F. Yoshizaki, J. K. Yu, Q. Zhang, C. M. Zmasek, P. J. de Jong, K. Osoegawa, N. H. Putnam, D. S. Rokhsar, N. Satoh and P. W. Holland, “The amphioxus genome illuminates vertebrate origins and cephalochordate biology.” Genome Research, 18(7):1100-11, July 2008. Epub June 18 2008. • Tsirigos, A. and I. Rigoutsos, “Human and mouse introns are linked to the same processes and functions through each genome's set of most frequent non-conserved motifs.” Nucleic Acids Research, 36(10):3484-3493, June 2008. Epub May 03, 2008. • Styczynski, M., K. Jensen, I. Rigoutsos and G. Stephanopoulos, “Miscalculations in BLOSUM62 surprisingly improve search performance.” Nature Biotechnology, 26(3):274-275, March 2008. • Wang, W. X., B. W. Rajeev, A. J. Stromberg, N. Ren, G. Tang, Q. Huang, I. Rigoutsos and P. T. Nelson, “The expression of microRNA miR-107 decreases early in Alzheimer's disease and may accelerate disease progression through regulation of beta-site amyloid precursor protein-cleaving enzyme 1.” J Neurosci,
28(5):1213-23, January 2008. • Motif Patterns in 2D, Alberto Apostolico, Laxmi Parida, Simona E. Rombo, Theoretical Computer Science, vol 390, No. 1, pp 40-55, 22 January 2008. • Stas Polonsky, Steve Rossnagel, and Gustavo Stolovitzky, Nanopore in metal-dielectric sandwich for DNA position control, Applied Physics Letters, 91, 153103 (2007). • Statistical Significance of Large Gene Clusters, Laxmi Parida, Journal of Computational Biology, 14(9), pp 1145–1159, 2007. • Behar DM, Rosset S, Blue-Smith J, Balanovsky O, Tzur S, Comas D, Mitchell RJ, Quintana-Murci L, Tyler-Smith C, Wells RS; Genographic Consortium. The Genographic Project public participation mitochondrial DNA database. PLoS Genet. 2007 Jun;3(6):e104. Erratum in: PLoS Genet. 2007 Sep 14;3(9):1785. • Horvatovich P, Govorukhina NI, Reijmers TH, van der Zee AG, Suits F, Bischoff R., Chip-LC-MS for label-free profiling of human serum, Electrophoresis. 2007 Dec;28(23):4493-505. • Warnecke, F., P. Luginbühl, N. Ivanova, M. Ghassemian, T. Richardson, J. Stege, M. Cayouette, G. Djordjevic, N. Aboushadi, R. Sorek, S. Tringe, M. Podar, H. Garcia Martin, V. Kunin, D. Dalevi, J. Madejska, E. Kirton, D. Platt, E. Szeto, A. Salamov, K. Barry, N. Mikhailova, N. Kyrpides, E. Matson, E. Ottesen, X. Zhang, A. McHardy, M. Hernández, C. Murillo, L. Acosta, I. Rigoutsos, G. Tamayo, B. Green, C. Chang, E. Rubin, E. Mathur, D. Robertson, P. Hugenholtz, and J. Leadbetter, “Metagenomic and functional analysis of hindgut microbiota of a wood feeding higher termite.” Nature, 450(7169):560-5, November 2007. • Tay, Y., W.-L. Tam, Y.-S. Ang, P. Gaughwin, H. Yang, W. Weijia, L. Rubing, J. George, H.-H. Ng, R. Perera, T. Lufkin, I. Rigoutsos, A. Thomson, and B. Lim, “MicroRNA-134 modulates the differentiation of mouse embryonic stem cells where it causes post-transcriptional attenuation of Nanog and LRH1.” Stem Cells, 26(1):17-29, January 2008. Epub Oct 4 2007. • McHardy, A. and I. Rigoutsos, “What’s in the mix: phylogenetic classification of metagenome sequence samples.” Curr Opinion in Microbiology, 10(5):499-503, October 2007. • Mavromatis, K., N. Ivanova, K. Barry, H. Shapiro, E. Goltsman, A.C. McHardy, I. Rigoutsos, A. Salamov, F. Korzeniewski, M. Land, A. Lapidus, I. Grigoriev, P. Richardson, P. Hugenholtz and N.C. Kyrpides, “Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.” Nature Methods, 4(6):495-500, June 2007. Epub Apr 29 2007. • Discovering Topological Motifs Using a Compact Notation, Laxmi Parida, Journal of Computational Biology, 14(3), pp 46–69, 2007. • Gapped Permutation Pattern Discovery for Gene Order Comparisons, Laxmi Parida, Journal of Computational Biology, vol 14, No 1, pp 46-56, 2007. • McHardy, A. C., H. G. Martín, A. Tsirigos, P. Hugenholtz and I. Rigoutsos, “Accurate phylogenetic classification of DNA fragments based on sequence composition.” Nature Methods, 4(1):63-72, January 2007. Epub Dec 10 2006. • Mullins IM, Siadaty MS, Lyman J, Scully K, Garrett CT, Miller WG, Muller R, Robson B, Apte C, Weiss S, Rigoutsos I, Platt D, Cohen S, Knaus WA. Data mining and clinical data repositories: Insights from a 667,000 patient data set.
Comput Biol Med. 2006 Dec;36(12):1351-77. • G. A. Held, Keith Duggar and G. Stolovitzky, "Comparison of Amersham and Agilent microarray technologies through quantitative noise analysis," OMICS, 10(4), 532-544 (2006). • Using PQ Structures for Genomic Rearrangement Phylogeny, Laxmi Parida, Journal of Computational Biology, 13(10), pp 1685-1700, 2006. • Krause, L., A.C. McHardy, T.W. Nattkemper, A. Puhler, J. Stoye and F. Meyer. GISMO--gene identification using a support vector machine for ORF classification. Nucleic Acids Research, December 2006. • Lan, H., X. Huang, R. Zhou and B.J. Berne. Water dynamics in the solvation shell of a multi-domain protein, J. Phys. Chem. B., 110, 3704-3711, 2006. • Lwin, T.Z., R. Zhou and R. Luo. Is Poisson-Boltzmann theory insufficient for protein folding simulations? J. Chem. Phys. 124, 34902-34907, 2006. • Loose, C., K. Jensen, I. Rigoutsos and G. Stephanopoulos, A Linguistic Model for the Rational Design of Antimicrobial Peptides, Nature, 443(7113):867-9, October 2006. • Krause, A., A. Ramakumar, D. Bartels, F. Battistoni, T. Bekel, J. Boch, M. Bohm, F. Friedrich, T. Hurek, L. Krause, B. Linke, A.C. McHardy, A. Sarkar, S. Schneiker, A.A. Syed, R. Thauer, F.J. Vorholter, S. Weidner, A. Puhler, B. Reinhold-Hurek, O. Kaiser and A. Goesmann. Complete genome of the mutualistic, N2-fixing grass endophyte Azoarcus sp. strain BH72. Nature Biotechnology, 24(11):1385-91, November 2006. Epub 2006 Oct 22. • Eleftheriou, M., R. Germain, A. Royyuru and R. Zhou. Thermal denaturing of mutant lysozyme with both OPLSAA and CHARMM force fields, J. Am. Chem. Soc. 128, 13388-13395, 2006. • McHardy, A.C., H. Neuweger, L. Krause and F. Meyer. REGANOR: a gene prediction server for prokaryotic genomes and a database of high quality gene predictions for prokaryotes. Applied Bioinformatics, 5(3):193-8, 2006. • Martin, H.G., N. Ivanova, V. Kunin, F. Warnecke, K. Barry, A. C. McHardy, C. Yeates, S. He, A. Salamov, E. Szeto, E. Dalin, N. Putnam, H. J. Shapiro, J. L. Pangilinan, I. Rigoutsos, N. C. Kyrpides, L. L. Blackall, K. D. McMahon and P. Hugenholtz, Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nature Biotechnology. 24(10):1263-9, October 2006. Epub Sep 24 2006. • Miranda, K., T. Huynh, Y. Tay, Y.-S. Ang, W.-L. Tam, A. Thomson, B. Lim and I. Rigoutsos, A pattern-based method for the identification of micro-RNA target sites and their corresponding heteroduplexes. Cell. 126, 1203-1217, September 2006. • Li, J., X. Li, M. Eleftheriou, and R. Zhou. Hydration and Dewetting near Fluorinated Superhydrophobic Plates, J. Am. Chem. Soc. 128, 12439-12447, 2006. • Zhou, R., L. Parida, K. Kapila, and S. Mudur. PROTERAN: Animated Terrain Evolution for Visual Analysis of Patterns in Protein Folding Trajectory, Bioinformatics, in press, 2006. • Silverman, B.D. Hydrophobic and acidic moments of a nucleoplasmin NP-core chaperone. Journal of Biomolecular Structure and Dynamics, 24(1):49-56, August
2006. • Zhou, R., A. Royyuru, P. Athma, F. Suits and B.D. Silverman. Magnitude and Spatial Orientation of the Hydrophobic Moments of Multi-Domain Proteins, Int. J. Bioinf. Res. Appl. 2, 161-176, 2006. • Schneiker, S., V.A. Martins dos Santos, D. Bartels, T. Bekel, M. Brecht, J. Buhrmester, T.N. Chernikova, R. Denaro, M. Ferrer, C. Gertler, A. Goesmann, O.V. Golyshina, F. Kaminski, A.N. Khachane, S. Lang, B. Linke, A.C. McHardy, F. Meyer, T. Nechitaylo, A. Puhler, D. Regenhardt, O. Rupp, J.S. Sabirova, W. Selbitschka, M.M. Yakimov, K.N. Timmis, F.J. Vorholter, S. Weidner, O. Kaiser and P.N. Golyshin. Genome sequence of the ubiquitous hydrocarbon-degrading marine bacterium Alcanivorax borkumensis. Nature Biotechnology, 24(8):997-1004, August 2006. Epub 2006 Jul 30. • Liu, P., X. Huang, R. Zhou and B.J. Berne. Hydrophobic Aided Replica Exchange Method for Protein Folding, J. Phys. Chem. B 110, 19018-19022, 2006. • Rigoutsos, I., T. Huynh, K. Miranda, A. Tsirigos, A. McHardy, and D. Platt. Short Blocks from the Noncoding Parts of the Human Genome have Instances within Nearly All Known Genes and Relate to Biological Processes. PNAS, 103(17):6605-10, April 2006. • Liu, T., L. Ye, H. Chen, J. Li, Z. Wu and R. Zhou. A Combined Steepest Descent and Genetic Algorithm (SD/GA) Approach for the Optimization of Solvation Parameters, Mol. Simul. 32, 427-435, 2006. • Zhou, R. and B. J. Berne, Recent Progress in Protein Folding Kinetics, Ann. Rev. of Phys. Chem., 2006. • Jensen, K., M. Styczynski, I. Rigoutsos and G. Stephanopoulos, "A Generic Motif Discovery Algorithm for Sequential Data." Bioinformatics, 22(1):21-8, January 2006. • G. Stolovitzky, A. Kundaje, G.A. Held, K. Duggar, C. Haudenschild, D. Zhou, T. Vasicek, K. Smith, A. Aderem and J. Roach, Statistical analysis of MPSS measurements: application to the study of LPS activated macrophage gene expression, Proc. Natl. Acad. Sci. USA, 102 (5), 1402-1407, (2005). • Lan, H., X. Huang, R. Zhou and B. J. Berne, Water dynamics in the solvation shell of a multi-domain protein, J. Phys. Chem. B. 109, 2005. • Silverman, B. David, "Asymmetry in the burial of hydrophobic residues along the histone chains of Eukarya, Archaea, and a transcription factor." BMC Structural Biology, 5:20, 2005. • Gene Proximity Analysis Across Whole Genomes via PQ Trees, G M Landau, L Parida, O Weimann, to appear in Journal of Computational Biology, 2005. • Malware Phylogeny Generation Using Permutations of Code, A Lakhotia, M E Karim, A Walenstein, L Parida, Journal in Computer Virology, 2005. • An inexact suffix tree based algorithm for extensible pattern discovery, Abhijit Chattaraj, Laxmi Parida, Theoretical Computer Science, 335:1, pp 3-14, 2005. • Lei, L., J. Li, T. Liu, Z. Wu and R. Zhou, Wavelets approach for protein trajectory analysis. J. Bioinfo. Comp. Biol., 3, 2005. • Thieme F, Koebnik R, Bekel T, Berger C, Boch J, Buttner D, Caldana C, Gaigalat L, Goesmann A, Kay S, Kirchner O, Lanz C, Linke B, McHardy AC, Meyer F, Mittenhuber G, Nies DH, Niesbach-Klosgen U, Patschkowski T, Ruckert C, Rupp
O, Schneiker S, Schuster SC, Vorholter FJ, Weber E, Puhler A, Bonas U, Bartels D, Kaiser O. Insights into Genome Plasticity and Pathogenicity of the Plant Pathogenic Bacterium Xanthomonas campestris pv. vesicatoria Revealed by the Complete Genome Sequence. J. Bacteriol. 187:7254-66, 2005. • Zhou, R., A. Royyuru, P. Athma, F. Suits and B. D. Silverman, Magnitude and Spatial Orientation of the Hydrophobic Moments of Multi-Domain Proteins. Int. J. Bioinfo. Res. Appl., 2, in press, 2005. • Mullins, I. M., M. S. Siadaty, J. Lyman, K. Scully, C. T. Garrett, W. G. Miller, R. Muller, B. Robson, C. Apte, S. Weiss, I. Rigoutsos, D. Platt, S. Cohen and W. A. Knaus, "Data mining and clinical data repositories: Insights from a 667,000 patient data set." Computers in Biology and Medicine, 36(12):1351-77, December 2005. Epub Dec 22, 2005. • Darzentas, N., I. Rigoutsos and C. Ouzounis, "Sensitive Detection of Sequence Similarity Using Combinatorial Pattern Discovery: a Challenging Study of Two Distantly Related Protein Families." Proteins, 61(4):926-37, December 2005. • Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucl. Acids Res. 33:5691-702, 2005. • Silverman, B. David, "Underlying Hydrophobic Sequence Periodicity of Protein Tertiary Structure." Journal of Biomolecular Structure and Dynamics 22, 411-423, 2005. • Tsirigos, A and I. Rigoutsos, "A Sensitive Method Capable of Detecting Horizontal Gene Transfers in Viral, Archaeal and Bacterial Genomes."
Nucleic Acids Research, 33(12):3699-3707, July 2005. • Silverman, B. David, "The Hydrophobicity of the H3 Histone Fold differs from the Hydrophobicity of the other three Folds." Journal of Molecular Evolution, 60, 354-364, 2005. • Wouters, M., I. Rigoutsos, C. Chu, L. Feng, D. Sparrow and S. Dunwoodie, "Evolution of Distinct EGF Domains with Specific Functions." Protein Science, 14(4):1091-1103, April 2005. • Tsirigos, A. and I. Rigoutsos, "A New Computational Method for the Detection of Horizontal Gene Transfer Events." Nucleic Acids Research, 33(3):922-933, February 2005. • Huang, X., R. Zhou, and B. J. Berne, Drying and Hydrophobic Collapse of Paraffin Plates, J. Phys. Chem. B 109, 3546-3552, 2005. • Parida, L., and R. Zhou, Combinatorial Pattern Discovery Approach for the Folding Trajectory Analysis of a beta-hairpin, PLoS Comp. Biol. 1, 32-40, 2005. • Li, J., H. Chen, T. Liu, L. Ye, H. Fang, X. Tang, Z. Wu and R. Zhou, Water Hydration Near Graphite-CH3 and Graphite-COOH Surfaces, J. Phys. Chem. B 109, 13639-13648, 2005. • Liu, P., X. Huang, R. Zhou and B. J. Berne, Drying and Hydrophobic Collapse of
Melittin Tetramer, Nature, 437, 159-162, 2005. • J. Lepre, J.J. Rice, Y. Tu, and G. Stolovitzky, Genes@Work: an efficient algorithm for pattern discovery and multivariate feature selection in gene expression data, Bioinformatics 20(7):1033-44 (2004). • K. Basso, U. Klein, H. Niu, G. Stolovitzky, Y. Tu, A. Califano, G. Cattoretti, R. Dalla Favera, Tracking CD40 signaling during normal germinal center development, Blood 104(13):4088-96 (2004). • Zhou, R., Sampling Protein Folding Free Energy Landscape: Coupling Replica Exchange Method with P3ME/RESPA Algorithm, J. Mol. Graph. Model. 22, 451-463, 2004. • A.C. McHardy, A. Goesmann, A. Pühler, F. Meyer. Development of joint application strategies for two microbial gene finders, Bioinformatics, 20:1622-31, 2004. • Zhou, R., G. Krilov and B. J. Berne, Comment on "Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?": The Poisson-Boltzmann Model, J. Phys. Chem. B 108, 7528-7530, 2004. • Swope, W., J. Pitera, F. Suits, M. Pitman, M. Eleftheriou, B. Fitch, R. Germain, A. Rayshubskiy, T. J. C. Ward, Y. Zhestkov, and R. Zhou, Describing Protein Folding Kinetics by Molecular Dynamics Simulations: II. Application to a beta-hairpin Peptide, J. Phys. Chem. B 108, 6582-6594, 2004. • A.C. McHardy, J. Kalinowski, A. Pühler, F. Meyer. Comparing expression level-dependent features in codon usage with protein abundance: An analysis of predictive proteomics, Proteomics 4: 46-58, 2004. • Zhou, R., X. Huang, C. Margulis and B. J. Berne, Hydrophobic Collapse in Multi-domain Protein Folding, Science, 305, 1605-1609, 2004. • Eres, R., G.M. Landau and L. Parida, "Permutation Pattern Discovery in Biosequences." In Journal of Computational Biology. To Appear. 2004. • Chattaraj A. and L. Parida, "An inexact suffix tree based algorithm for extensible pattern discovery." In Theoretical Computer Science. To Appear. 2004. • Huynh, T. and I. Rigoutsos, "The Web Server of IBM's Bioinformatics and Pattern Discovery Group: 2004 update." In Nucleic Acids Research, 32:10-15, July 2004. • Paredes, C., I. Rigoutsos and E. Papoutsakis, "Transcriptional Organization of the Clostridium acetobutylicum Genome." In Nucleic Acids Research, 32(6):1973-1981, April 2004. • Wong, M. S., R. M. Raab, I. Rigoutsos, G. N. Stephanopoulos, and J. K. Kelleher, "Metabolic and Transcriptional Patterns Accompanying Glutamine Depletion and Repletion in Mouse Hepatoma Cells: A model for Physiological Regulatory Networks." In Physiological Genomics, 16(2):247-55, January 2004. • Apostolico, A. and L. Parida, "Incremental Paradigms of Motif Discovery." In Journal of Computational Biology 11(1):15-25, January 2004. • D.F. Jelinek, R.C. Tschumper, G. Stolovitzky, S.J. Iturria, Y. Tu, J. Lepre, N. Shah, and N.E. Kay, Identification of a global Gene Expression Signature of B-Chronic-Lymphocytic Leukemia, Molecular Cancer Research, 1(5):346-61 (2003). • U. Klein, Y. Tu, G. Stolovitzky, J.L. Keller, J. Haddad Jr., V. Miljkovic, G.
Cattoretti, A. Califano, and R. Dalla Favera, Transcriptional analysis of the germinal-center reaction, Proc. Nat. Acad. Sci. USA, 100(5):2639-44, (2003). • The IBM Bioinformatics Group Web Server - Tools and Content, Tien Huynh, Isidore Rigoutsos, Laxmi Parida, Daniel Platt, Tetsuo Shibuya, Nucleic Acids Research, 31(13):3645-3650, July 2003. • G. Stolovitzky, Gene selection in microarray data: the elephant, the blind men and our algorithms. Current Opinion in Structural Biology, 13:370–376 (2003). • R. Kuppers, U. Klein, I. Schwering, V. Distler, A. Brauninger, G. Cattoretti, Y. Tu, G. Stolovitzky, A. Califano, M.L. Hansmann, R. Dalla Favera, Identification of Hodgkin and Reed-Sternberg cell-specific genes by gene expression profiling. J Clin Invest. 111(4):529-37 (2003). • Kaminski, G., R. A. Friesner and R. Zhou, A computationally inexpensive modification of the point dipole electrostatic polarization model for molecular simulation, J. Comp. Chem. 24, 267-276, 2003. • Silverman, B. David, "Hydrophobicity of Transmembrane Proteins: Spatially profiling the distribution." Protein Science 12, 586-599, 2003. • A. Wilke, C. Rückert, D. Bartels, M. Dondrup, A. Goesmann, A. Hüser, S. Kespohl, B. Linke, M. Mahne, A. McHardy, A. Pühler, F. Meyer. Bioinformatics support for high-throughput proteomics, J. Biotechnol. 106: 147-56, 2003. • Zhou, R., B. D. Silverman, A. Royyuru, and P. Athma, Spatial Profiling of Protein Hydrophobicity: Native vs. Decoy Structures, Proteins, 52, 561-572, 2003. • Zhou, R., Folding free energy landscape of protein folding in water: explicit vs. implicit solvent, Proteins 53, 148-161, 2003. • A. Goesmann, B. Linke, O. Rupp, L. Krause, D. Bartels, M. Dondrup, A.C. McHardy, A. Wilke, A. Pühler, F. Meyer. Building a BRIDGE for the integration of heterogeneous data from functional genomics into a platform for systems biology, J. Biotechnol. 106: 157-67, 2003. • Fitch, B. R., R. S. Germain, M. Mendell, J. Pitera, M. Pitman, A. Rayshubskiy, Y. Sham, F. Suits, W. Swope, T. J. C. Ward, Y. Zhestkov, and R. Zhou, Blue Matter, An Application Framework for Molecular Simulation on Blue Gene, J. Parallel & Distrib. Comput. 63, 759-773, 2003. • Zhou, R., Trp-cage: Folding Free Energy Landscape in Explicit Water, Proc. Natl. Acad. Sci., 100, 13280-13285, 2003. • Murphy, E., I. Rigoutsos, T. Shibuya and T. Shenk, "Re-evaluation of Human Cytomegalovirus Coding Potential." Proc. Nat. Acad. Sci. USA, 100(23):13585-13590, November 2003. • Azumi, K., R. De Santis, A. De Tomaso, I. Rigoutsos, F. Yoshizaki, M. R. Pinto, R. Marino, K. Shida, M. Ikeda, M. Arai, Y. Inoue, T. Shimizu, N. Satoh, D.S. Rokhsar, L. Du Pasquier, M. Kasahara, M. Satake and M. Nonaka, "Genomic Analysis of Immunity in a Urochordate and the Emergence of the Vertebrate Immune System: Waiting for Godot." In Immunogenetics, 55(8):570-581, November 2003. • Silverman, B. David, "Hydrophobic Moments of Protein Tertiary Structure." Proteins: Structure, Function and Genetics 53, 880-888, 2003. • J. Kalinowski, B. Bathe, D. Bartels, N. Bischoff, M. Bott, A. Burkovski, N.
Dusch, L. Eggeling, B.J. Eikmanns, L. Gaigalat, A. Goesmann, M. Hartmann, K. Huthmacher, R. Krämer, B. Linke, A.C. McHardy, F. Meyer, B. Mockel, W. Pfefferle, A. Pühler, D.A. Rey, C. Rückert, O. Rupp, H. Sahm, V.F. Wendisch, I. Wiegräbe, A. Tauch. The complete Corynebacterium glutamicum ATCC 13032 genome sequence and its impact on the production of L-aspartate-derived amino acids and vitamins, J. Biotechnol. 104: 5-25, 2003. • Platt, D., C. Guerra, G. Zanotti and I. Rigoutsos, "Global secondary structure packing angle bias in proteins." In Proteins: Structure, Function and Genetics, 53(2):252-261, November 2003. • Miranda, K.C., J.R. Shannon, A.S. Yap, R.D.Teasdale, and J.L. Stow, "Contextual Binding of p120ctn to E-cadherin at the Basolateral Plasma Membrane in Polarized Epithelia." In Journal of Biological Chemistry, 278:43480 - 43488, October 2003. • Rigoutsos, I., P. Riek, R. M. Graham and J. Novotny, "Structural Details (Kinks and Non-a Conformations) in Transmembrane Helices are Intrahelically Determined and can be Predicted by Sequence Pattern Descriptors." In Nucleic Acids Research, 31(15):4625-31, August 2003. • Huynh, T., I. Rigoutsos, D. Platt, L. Parida and T. Shibuya, "The Web Server of IBM's Bioinformatics and Pattern Discovery Group." Nucleic Acids Research, 31(13):3645-3650, July 2003. • A.C. McHardy, A. Tauch, C. Rückert, A. Pühler, J. Kalinowski. Genome-based analysis of biosynthetic aminotransferase genes of Corynebacterium glutamicum, J. Biotechnol. 104: 229-40, 2003. • Grimmond S.M., K.C. Miranda, Z. Yuan, M.J. Davis, D.A. Hume, K. Yagi, N. Tominaga, H. Bono, Y. Hayashizaki, T. Okazaki, R.D. Teasdale, "The Mouse Secretome: Functional Classification of the Proteins Secreted Into the Extracellular Environment." In Genome Research, 13(6b):1350-1359, June 2003. • Iliopoulos, I., S. Tsoka, M. A. Andrade, A. J. Enright, M. Carroll, P. Poullet, V. Promponas, T. Liakopoulos, G. Palaios, C. Pasquier, S. Hamodrakas, J. Tamames, A. T. Yagnik, A. 
Tramontano, D. Devos, C. Blaschke, A. Valencia, D. Brett, D. Martin, C. Leroy, I. Rigoutsos, C. Sander and C. A. Ouzounis, "Evaluation of Annotation Strategies Using an Entire Genome Sequence. " In Bioinformatics 19(6):717-726, June 2003. • Rigoutsos, I., J. Novotny, T. Huynh, S. Chin-Bow, L. Parida, D. Platt, D. Coleman and T. Shenk, "In Silico Pattern-based Analysis of the Human Cytomegalovirus (HHV5) Genome." In Journal of Virology, 77(7):4326-4344, April 2003. • F. Meyer, A. Goesmann, A.C. McHardy, D. Bartels, T. Bekel, J. Clausen, J. Kalinowski, B. Linke, O. Rupp, R. Giegerich, A. Pühler. GenDB--an open source genome annotation system for prokaryote genomes, Nucleic Acids Res. 31: 2187- 95, 2003. • Apostolico, A. and L. Parida, "Compression and the Wheel of Fortune." In Proceedings of IEEE Data Compression Conference (DCC '03). Snowbird, Utah, March 2003. • Incremental Paradigms ofMotif Discovery, Alberto Apostolico, Laxmi Parida, to appear in Journal of Computational Biology, 2003.
• S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y. Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen, D. Zagzag, J.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander, T.R. Golub, Prediction of central nervous system embryonal tumour outcome based on gene expression, Nature, Jan 24;415(6870):436-42 (2002).
• Kaminski, G., H. A. Stern, B. J. Berne, R. A. Friesner, Y. Cao, R. B. Murphy, R. Zhou, and T. A. Halgren, Development of a Polarizable Force Field for Proteins via ab initio Quantum Chemistry: First Generation Model and Gas phase Tests, J. Comp. Chem. 23, 1515-1531, 2002.
• Zhou, R., and B. J. Berne, Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?, Proc. Natl. Acad. Sci. 99, 12777-12782, 2002.
• Silverman, B. David, "A Two-Component Nucleation Model of Protein Hydrophobicity." Journal of Theoretical Biology 216, 130-146, 2002.
• Dehal, P., Y. Satou, R. K. Campbell, J. Chapman, B. Degnan, A. De Tomaso, B. Davidson, A. Di Gregorio, M. Gelpke, D. M. Goodstein, et al., "The Draft Genome of Ciona intestinalis: Insights into Chordate and Vertebrate Origins." In Science, 2157-2167, December 2002.
• Wang, X., J. T.L. Wang, D. Shasha, B. A. Shapiro, I. Rigoutsos and K. Zhang, "Finding Patterns in Three Dimensional Graphs: Algorithms and Applications to Scientific Data Mining." In IEEE Transactions on Knowledge and Data Engineering, 14(4):731-749, July/August 2002.
• Rigoutsos, I., T. Huynh, A. Floratos, L. Parida and D. Platt, "Dictionary-driven Protein Annotation." In Nucleic Acids Research, 30(17), September 2002.
• Shibuya, T. and I. Rigoutsos, "Dictionary-driven Prokaryotic Gene Finding." In Nucleic Acids Research, 30(12), July 2002.
• U. Klein, Y. Tu, G. Stolovitzky, M. Mattioli, G. Cattoretti, H. Husson, A. Freedman, G. Inghirami, L. Cro, L. Baldini, A. Neri, A. Califano and R. Dalla Favera, Gene expression profiling of B cell chronic lymphocytic leukemia reveals a homogeneous phenotype related to memory B cells, J Exp Med. Dec 3;194(11):1625-38 (2001).
• Zhou, R., B. J. Berne and R. Germain, Free energy landscape of a beta-hairpin folding in explicit water, Proc. Natl. Acad. Sci. 98, 14931-14936, 2001.
• An Output-sensitive Flexible Pattern Discovery Algorithm, Laxmi Parida, Isidore Rigoutsos, Dan Platt, Combinatorial Pattern Matching (CPM 2001), LNCS vol 2089, pp 131–142, 2001.
• Novotny, J., I. Rigoutsos, D. Coleman and T. Shenk, "In Silico Structural and Functional Analysis of the Human Cytomegalovirus (HHV5) Genome." In Journal of Molecular Biology, 310(5):1151-1166, July 2001.
• Platt, D.E., "Are Mitochondria Mesoscopic?" In Biophysical Chemistry, 91(3):245-252 (2001).
• Floratos, A., I. Rigoutsos, L. Parida and Y. Gao, "DELPHI: A pattern-based method for detecting sequence similarity." In IBM Journal of Research and Development, 45(3/4):455-474, May/July 2001.
• Platt, D. E., L. Parida, Y. Gao, A. Floratos and I. Rigoutsos, "QSAR in Grossly Underdetermined Systems: Opportunities and Issues." In IBM Journal of Research and Development, 45(3/4):533-544, May/July 2001.
• Miranda, K.C., T. Khromykh, P. Christy, T.L. Le, C.J. Gottardi, A.S. Yap, J.L. Stow and R.D. Teasdale, "A Dileucine Motif Targets E-cadherin to the Basolateral Cell Surface in Madin-Darby Canine Kidney and LLC-PK1 Epithelial Cells." In Journal of Biological Chemistry, 276(25):22565-72, June 2001.
• Silverman, B. David, "Hydrophobic Moments of Protein Structures: Spatially Profiling the Distribution." Proceedings of the National Academy of Sciences 98, 4996-5001, 2001.
• Riek, R. P., I. Rigoutsos, J. Novotny and R. M. Graham, "Non-α-Helical Elements Modulate Polytopic Membrane Architecture." In Journal of Molecular Biology, 306(2):349-362, February 2001.
• Allen, F., G. Almasi, W. Andreoni, D. Beece, B. J. Berne, A. Bright, J. Brunheroto, C. Cascaval, J. Castanos, P. Coteus, P. Crumley, A. Curioni, M. Denneau, W. Donath, M. Eleftheriou, B. Fitch, B. Fleischer, C. J. Georgiou, R. Germain, M. Giampapa, D. Gresh, M. Gupta, R. Haring, H. Ho, P. Hochschild, S. Hummel, T. Jonas, D. Lieber, G. Martyna, K. Maturu, J. Moreira, D. Newns, M. Newton, R. Philhower, T. Picunko, J. Pitera, M. Pitman, R. Rand, A. Royyuru, V. Salapura, A. Sanomiya, R. Shah, Y. Sham, S. Singh, M. Snir, F. Suits, R. Swetz, W. C. Swope, N. Vishnumurthy, T. J. C. Ward, H. Warren, and R. Zhou, Blue Gene: A vision for protein science using a petaflop supercomputer, IBM Systems Journal 40, 310-327, 2001.
• Zhou, R., E. Harder, H. Xu and B. J. Berne, Efficient multiple time step method for use with Ewald and Particle-Mesh Ewald for large biomolecular systems, J. Chem. Phys. 115, 2348-2358, 2001.
• Zhou, R., R. A. Friesner, A. Ghosh, R. C. Rizzo, W. L. Jorgensen, and R. M. Levy, New Linear Interaction Method for Binding Affinity Calculations using a Continuum Solvent Model, J. Phys. Chem. B 105, 10388-10397, 2001.
• Silverman, B. David, "Molecular Moments for Computer-Aided Drug Discovery." Inaugural Issue of Mini-Reviews in Medicinal Chemistry 1(1), 1-4, 2001.
• Parida, L., "Some Results on Flexible-pattern Discovery." In Proceedings 11th Annual Symposium on Combinatorial Pattern Matching. June 2000. Montreal, Canada.
• Silverman, B. David, "Three-Dimensional Moments of Molecular Property Fields." Journal of Chemical Information and Computer Sciences 40, 1470-1476, 2000.
• Rigoutsos, I., A. Floratos, L. Parida, Y. Gao and D. E. Platt, "The Emergence of Pattern Discovery Techniques in Computational Biology." In Metabolic Engineering, 2(3):159-177, July 2000.
• Silverman, B. David, "The Thirty-one Benchmark Steroids Revisited: Comparative Molecular Moment Analysis (CoMMA) with Principal Component Regression." QSAR (Quantitative Structure Activity Relations) 19, 237-246, 2000.
• Some Results on Flexible-pattern Discovery, Laxmi Parida, Combinatorial Pattern Matching (CPM 2000), LNCS vol 1848, pp 33–45, 2000.