BIOMEDICAL COMPUTATION @ STANFORD 2000 SYMPOSIUM PROCEEDINGS

BCATS 2000 SYMPOSIUM PROCEEDINGS

Copyright © 2000 Biomedical Computation at Stanford (BCATS)

Printed in United States of America

Editors: David Paik, Jonathan Dugan Associate Editors: Brooke Steele, Olga Troyanskaya

“Hands” artwork courtesy of Biomedical Information Technology at Stanford (BITS)

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. copyright law for private use of patrons.

Ordering information: [email protected] Web Site: http://bcats.stanford.edu

BIOMEDICAL COMPUTATION AT STANFORD 2000

Symposium Co-Chairs Jonathan Dugan David Paik Brooke Steele Olga Troyanskaya

Administrative Help Stanley Jacobs Kevin Lauderdale Rosalind Ravasio Darlene Vian

Symposium Volunteers Michael Cantor Jeffrey Chang Carol Cheng David Elgart Valerie Favier Yueyi (Irene) Liu Jodi Elgart Paik Bill Petitt Rosalind Ravasio Tomoko Shintani Matt Stocksiek

Symposium Sponsorship Biomedical Information Technology at Stanford (BITS) & The National Library of Medicine Northern California Pharmaceutical Discussion Group DoubleTwist InforMax Incyte Genomics GeneLogic Skjervan, Morril, MacPherson, LLP Genencor International Guidant SGI Sun Microsystems

TABLE OF CONTENTS

I. Symposium Information
   a. Acknowledgements
   b. Symposium Schedule and Map
II. Keynote Speakers
   a. David Haussler, Ph.D.
   b. Richard Satava, M.D., F.A.C.S.
III. Abstract List
IV. Scientific Talks Session I
V. Scientific Talks Session II
VI. Poster Session / Software Demonstration
VII. Symposium Participant List
VIII. Symposium Sponsors

SYMPOSIUM INFORMATION

BCATS 2000 Symposium Proceedings: Symposium Information

ACKNOWLEDGEMENTS

This symposium would not have been possible without the help of many people and organizations, who contributed both financial support and their time.

We’d like to acknowledge the Biomedical Information Technology at Stanford (BITS) faculty group for the initial idea of creating this symposium and for their suggestions in various aspects of the process of planning the symposium.

We’d like to thank Dr. Richard Satava and Dr. David Haussler for taking the time out of their very busy schedules to give the keynote speeches at the symposium.

We’d also like to acknowledge the National Library of Medicine for its financial support through the Medical Informatics Training Grant supplement T15-LM 07033.

We’d also like to acknowledge the Northern California Pharmaceutical Discussion Group, DoubleTwist, InforMax, Inc., Incyte Genomics, GeneLogic, Skjervan, Morril, MacPherson, LLP, Genencor International, Guidant, SGI, and Sun for their co-sponsorship of the symposium.

We’d also like to acknowledge the following people for their generous help in organizing the conference: Michael Cantor, Jeffrey Chang, Carol Cheng, David Elgart, Valerie Favier, Yueyi (Irene) Liu, Jodi Elgart Paik, Bill Petitt, Rosalind Ravasio, Tomoko Shintani, and Matt Stocksiek.

Last but not least, the co-chairs would like to thank their friends and family for being supportive and understanding when all we seemed to do and talk about was BCATS.


SYMPOSIUM SCHEDULE AND MAP

Saturday, October 28, 2000

8:00 am - 9:00 am     On Site Registration and Badge Pickup; Poster and Software Demonstration Setup
9:00 am - 9:30 am     Opening Comments
9:30 am - 10:15 am    Keynote Address I
10:15 am - 10:30 am   Break
10:30 am - 12:00 pm   Scientific Talks Session I
12:00 pm - 1:00 pm    Lunch (Stone Pine Plaza)
1:15 pm - 2:00 pm     Keynote Address II
2:00 pm - 2:15 pm     Break
2:15 pm - 3:45 pm     Scientific Talks Session II
3:45 pm - 5:15 pm     Poster Session / Software Demonstrations
5:15 pm - 5:30 pm     Closing Presentation and Awards

[Map of the symposium venue. Labeled areas: Main Auditorium, Overflow Auditorium, Posters, Registration & Check In, Men, Women, Lunch (Stone Pine Plaza).]

KEYNOTE SPEAKERS


David Haussler, Ph.D. U.C. Presidential Chair in Computer Science University of California Santa Cruz

A WORKING DRAFT OF THE HUMAN GENOME

We discuss the bioinformatic challenges in creating and using the current public working draft of the human genome, and look at what lies ahead as the genome is finished and comparisons with other vertebrate genomes are made. Working with Francis Collins and the major public sequencing centers, an international group led by Eric Lander and John Sulston has produced the initial working draft of the genome (http://www.ncbi.nlm.nih.gov/genome/central/). Jim Kent at UCSC and David Kulp at Neomorphic, Inc., among many others, made substantial contributions to this effort (http://genome.ucsc.edu/). We look at the current state of this draft genome, discussing assembly and genefinding methods, and methods for mapping sequences from other vertebrates onto the human genome. It is our hope that this work will soon lead to a significantly better understanding of the functional organization of our genome.


Richard Satava, M.D., F.A.C.S. Professor of Surgery Yale University

THE BIOINTELLIGENCE AGE: MEDICINE AFTER THE INFORMATION AGE

The Information Age is NOT the Future, the Information Age is the present. There is something else in the future. As a place holder, the name BioIntelligence Age is suggested. The Information Age is already a century old and the biotechnology revolution is 40-50 years old. The future will belong to the interdisciplinary sciences that are emerging at the interface of traditional sciences. Thus, nanotechnology (physical and information sciences), embedded biosensors (biologic and physical sciences) and rational drug design (biologic and information sciences) are paving the way. The most complex of the new technologies will incorporate all three sciences - biologic, physical and information. An example is tissue engineering, which is beginning to grow synthetic organs. Thus the future belongs to the interdisciplinary team of researchers, the hallmark of the BioIntelligence Age. They will create a world that is populated by transparent, microscopic, ubiquitous sensors that are networked together to change our current dumb and unconnected world into a smart and networked one.

Even as we struggle to understand these revolutionary changes, there are words of caution from respected scientists. By 2030 or 2040, computers will have the same computational power as a human brain, but will such systems be intelligent, have emotions or even be controllable by humans? As Bill Joy suggests in “Why the Future Doesn't Need Us”, genetics, nanotechnology and robotics will become self assembling and self maintaining - thus they may not need the human species which created them. Scientists must proactively consider the consequences of their progress before entering into a Faustian bargain. The future is bright, but we must enter the BioIntelligence Age with our eyes wide open.

ABSTRACT LIST


SCIENTIFIC TALKS SESSION I

A First Look at the tRNA and snoRNA Genes Sets in the Human & Arabidopsis Genomes Todd M. Lowe

Relational Data Mining: Using Probabilistic Relational Models to Discover Patterns in Epidemiological Data Lise Carol Getoor, Benjamin Taskar, Jeanne Rhee, Peter Small, and Daphne Koller

Blind Prediction of Structure: What Have We Learnt and Where Do We Go? Ram Samudrala, Yu Xia, and Michael Levitt

A Fluoroscopic X-Ray Registration Process for Three-Dimensional Surgical Navigation Hamid Reza Abbasi, Sanaz Hariri, Shao Chin, Robert Grzeszczuk, Daniel Kim, Gary Steinberg, and Ramin Shahidi

Evaluation of Musculoskeletal Models Derived from Magnetic Resonance Images Silvia S. Blemker, Allison S. Arnold, Deanna S. Asakawa, and Scott L. Delp

Mathematical Analysis of Ensemble Dynamics: Making Simulations of Protein Folding Feasible Michael Randall Shirts, and Vijay S. Pande


SCIENTIFIC TALKS SESSION II

Discovering DNA Motifs in Upstream Regulatory Regions of Co-expressed Genes Xiaole Liu, Jun Liu, and Doug Brutlag

Developing and Evaluating Assessment Measures From a Simulation Tool: Tinkering Towards Utopia? Carla Marie Pugh

A One-Dimensional Finite Element Method For Simulation-Based Medical Planning For Cardiovascular Disease Jing Wan, Brooke Steele, Thomas J.R. Hughes, and Charles A. Taylor

Finding Distinctive Expression Patterns in Microarray Data with Independent Components Analysis Joshua Michael Stuart, Soumya Raychaudhuri, Xiaole Liu, and Russ Altman

Automated Quantification of 4D Ultrasound for Carotid Artery Disease Haobo Xu, David S. Paik, Barbara Ross, Thilaka Sumanaweera, John Hossack, R. Brooke Jeffrey, and Sandy Napel

Designing a Knowledge Base for Pharmacogenomics: an Ontology for Genetic Information Daniel Rubin, and Russ Altman


POSTER SESSION / SOFTWARE DEMONSTRATION

01 Project Mothra: Designing a System for Video-based, Markerless, Human Motion Analysis in an Arbitrary Environment Ajit M Chaudhari, Richard W Bragg, Eugene J Alexander, and Thomas P Andriacchi

02 Using Metacomputing Tools To Facilitate Large-Scale Analyses of Biological Databases Allison Waugh, Glenn A. Williams, and Russ B. Altman

03 Cluster Comparisons Alok Saldanha

04 Prediction of Novel Functional Domains Using Rates of Evolution Alexander Simon

05 Recognizing Polyps From 3D CT Colon Data Salih Burak Gokturk, Burak Acar, David Paik, Carlo Tomasi, Christopher Beaulieu, and Sandy Napel

06 Folding of a Coarse-grained Model of the Tetrahymena Ribozyme Bradley J. Nakatani, and Vijay S. Pande

07 Use of Multiple Clustering Algorithms for Analysis of Human Lung Cancer Gene Expression Data Jessica Ross, and Glenn Rosen

08 In Vivo Validation of Cardiovascular Blood Flow Simulations Joy Ku, Gregory Wai Mong Chan, Mary Draney, Frank Arko, Chris Zarins, and Charles Taylor

09 Computational Analyses of the Differences between Daily and Intermittent Alendronate Treatment Christopher J. Hernandez, Gary S. Beaupre, Robert Marcus, and Dennis R. Carter

10 Mechanical Influences on Oblique Pseudarthrosis Formation Elizabeth G. Loboa Polefka, Gary S. Beaupre, and Dennis R. Carter

11 Internal and Relative Structural Conservation of Discrete Protein Sequence Motifs Steven Paul Bennett, and Douglas Brutlag

12 Constrained Global Optimization for Estimating Molecular Structure from Atomic Distances Glenn A. Williams, Jonathan M. Dugan, and Russ B. Altman


13 Automated Individualized Decision Support George Christopher Scott, Ross Shachter, and Leslie Lenert

14 A Comparative Statistical Error Analysis of Neuronavigation Systems in a Clinical Setting Hamid Reza Abbasi, Sanaz Hariri, David Martin, Daniel Kim, John Adler, Gary Steinberg, and Ramin Shahidi

15 Neuronavigational Epilepsy Focus Mapping Hamid Reza Abbasi, Sanaz Hariri, David Martin, Michael Risinger, and Gary Heit

16 Comparative Tracking Error Analysis of Five Different Optical Tracking Systems Jeremy Johnson, Rasool Khadem, Clement C Yeh, Mohammad Sadeghi-Tehrani, Michael R Bax, Jacqueline Nerney Welch, Eric P Wilkinson, and Ramin Shahidi

17 Use of XML/RDF to Create Structured Metadata for Medical Images John Joseph Michon

18 A Real-Time Freehand 3D Ultrasound System for Image Guided Surgery Jacqueline Nerney Welch, Jeremy A. Johnson, Michael R. Bax, and Ramin Shahidi

19 Bridging the Gap: Simulated Dynamics of Lipid Bilayers at Boundaries Peter M. Kasson, and Vijay S. Pande

20 KB-Driven Model Building: Challenges and Approaches Mike Cantor, Peter Karp, and Masaru Tomita

21 MOTIFFEATURE: Automated Construction of 3D Models from Sequence Motifs Mike Hsin-Ping Liang, and Russ Altman

22 Guideline Interchange Format: A Representation for Sharable, Computer-Interpretable Guidelines Mor Peleg

23 A Finite Element Model of the Human Cornea Assad Anshuman Oberai, Peter M. Pinsky, and Thomas A. Silvestrini

24 Mechanical Regulation of Growth Plate Morphology Sandra Shefelbine, and Dennis R. Carter

25 The Importance of Swing Phase Initial Conditions in Stiff-knee Gait: A Case Study Saryn Goldberg, Steven Piazza, and Scott Delp


26 A New Twist on the -Coil Transition: A Non-biological Helix with Protein-like Intermediates Sidney P. Elmer, and Vijay S. Pande

27 Sequence Analysis and Structure Comparison of the SH3 Domain Family Stefan M. Larson, and Alan R. Davidson

28 Representing Contextually Changing Decision Making Behavior in Medical Organizations Carol HF Cheng, and Raymond E Levitt

29 Medline Query-by-Example Elmer Bernstam, Olga Troyanskaya, and Jeff Chang

30 Offline Testing of a Computerized Decision Support System for Management of Hypertension Susana Martins, MK Goldstein, BB Hoffman, RW Coleman, SW Tu, R Shankar, M O'Connor, MA Musen, SB Martins, N Hastings

31 Comparison of Ribosomal Models to Experimental Data with the RiboWeb System Michelle Whirl Carrillo, and Russ B. Altman

32 The Mouse SNP Database: Mapping QTLs in silico Jonathan Usuka

33 A New Method for Determining Protein Function Similarity based on Keywords and Gene Ontology Yueyi Liu, and Russ Altman

34 Optimizing Knowledge-based Energy Functions. From Lattice Study to Real Yu Xia, and Michael Levitt

35 Monte Carlo Simulations of Folding of Simple Alpha Helices Bojan Zagrovic, Jessica Shapiro, and Vijay Pande

36 Automatic Detection and Quantification of Abdominal Aortic Thrombus in CT Angiograms Based on Clustering and Global Geometric Information Feng Zhuge, Sandy Napel, David Paik, and Geoffrey D. Rubin

37 Quantification of the Hydrophobic Interaction by Simulations of the Aggregation of Small, Hydrophobic Solutes in Water Tanya M. Raschke, Jerry Tsai and Michael Levitt

38 ViewFeature: Integrated Feature Analysis and Visualization D. Rey Banatao, Conrad C. Huang, Patricia C. Babbitt, Russ B. Altman, and Teri E. Klein

39 Using Human Language Ability to Learn and Recognize Protein Folds Neil F. Abernethy

40 Structure and Stability of Sean Mooney, Teri Klein

41 Combining Kinetic Inference to Extract Parameters and Predictor-Corrector Method to Develop Genetic Regulatory Circuits that are Consistent with Heterogeneous Experimental Data Nizar Batada, Mike Laub, Harley McAdams

Demonstrations:

42D An Interactive Biomechanical Model of the Human Hand Robert Pao-Feng Cheng, Jean Heegaard, Parvati Dev, Sakti Srivastava, Leroy Heinrichs, and Tonia Sengelin

43D Implementation of a Radio-Frequency Intravascular Ultrasound System for Quantitative Tissue Characterization in Coronary Arteries Brian Courtney, Abel L. Robertson, Paul G. Yock, and Peter J. Fitzgerald

44D Two Sided Clustering for Yeast Gene Expression Using Probabilistic Relational Models Eran Segal, Ben Taskar, and Daphne Koller

45D Web Applications for Microarray Data Analysis and Presentation Christian A. Rees, Charles M. Perou, Douglas T. Ross, Jonathan R. Pollack, J. Michael Cherry, Patrick O. Brown, and David Botstein

46D IRaCS: A Literature Mining Tool for Fast Interpretation of Microarray Data Sep Kamvar, Eldar Giladi, Jeanne Loring, and Mike Walker

SCIENTIFIC TALKS SESSION I

A FIRST LOOK AT THE tRNA AND snoRNA GENES SETS IN THE HUMAN & ARABIDOPSIS GENOMES
Todd M. Lowe

Introduction
Transfer RNA (tRNA) genes make up one of the largest gene families in all organisms. Taking into account known types of “wobble” base pairings between the third position of the mRNA codon & tRNA anticodon, eukaryotes require just 46 different tRNAs in theory. In reality, they have many, many more. Previous research has suggested some reasons why eukaryotes require such high tRNA redundancy. Now that we have complete gene sets for organisms from four diverse eukaryotic phyla, we will be able to directly address these hypotheses. Relationships between tRNA gene copy number, intracellular tRNA concentration, and protein codon usage are examined.

SnoRNA genes are probably the second largest gene family in eukaryotes. These non-coding RNAs are required for processing and modification of ribosomal RNA, each one pairing with a particular site of post-transcriptional modification. snoRNAs can be grouped into two families: the H/ACA box family, which directs pseudouridylation of rRNA, and the C/D box family, which guides 2'-O-ribose methylation of rRNA. Approximately 100 genes are anticipated in yeast, and over 200 in humans, based on the number of modifications found in the respective ribosomal RNAs. In a previous study, I computationally identified C/D box snoRNAs for nearly all of the 55 ribose methylation sites in yeast. Now, I seek to do the same for the human genome, a much more challenging task due to the 215-fold increased search space.

Materials and Methods
tRNAscan-SE v. 1.21 was used to scan all completed eukaryotic genomes, and the results were manually inspected for low-scoring pseudogenes. All results were deposited in an on-line database of tRNAs, the Genomic tRNA Database (http://rna.wustl.edu/GtRDB/), which I maintain.

The probabilistic snoRNA scanning program was re-trained on biochemically identified human or plant snoRNAs, and used to scan the human and Arabidopsis genomes. The results were manually inspected to identify the highest scoring candidate snoRNAs for each known or phylogenetically inferred methylation site in ribosomal RNA.

Both tRNAscan-SE and the snoRNA scanning programs were designed and implemented as part of my graduate thesis.

Results
How many more tRNA genes would you guess humans need relative to “lower” eukaryotes such as the roundworm C. elegans? If you guessed 2-3 times as many, you'd be wrong. The current draft of the human genome (~90% completed) contains approximately 39 fewer tRNAs than the worm (540 human vs. 579 worm). The “completed” fly genome contains just 11 more tRNAs than the single celled baker's yeast (285 fly vs. 274 yeast). As the first completed genome of a plant, Arabidopsis shows the most tRNAs in any organism to date: 614 genes. Contributing reasons for these somewhat unexpected results will be discussed. For example, Arabidopsis contains a nearly complete complement of its mitochondrial tRNAs within the nuclear genome. Also, an array of 81 tRNAs was found in just 40 Kbp in a highly amplified 3-tRNA repeat region.

SnoRNA searches of the Arabidopsis genome turned up over 60 new snoRNA gene predictions, in addition to the 21 previously identified genes. In many cases, these gene predictions were supported as they appear to be part of polycistronic arrays of multiple snoRNA genes. From these results, we can predict with some certainty the existence of 50+ corresponding ribose methylations in ribosomal RNA.

SnoRNA searches of the human genome turned up strong candidates for nearly all of the 100 known ribose methylation sites. These results are still being analyzed. As expected, most snoRNAs occur in multiple copies, spread widely across the genome in many cases. A collaborating experimental snoRNA lab is in the process of verifying these 55+ new gene candidates.
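The gene-count comparisons quoted in the Results can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the dictionary and function names are illustrative and do not come from tRNAscan-SE or the Genomic tRNA Database.

```python
# tRNA gene counts as quoted in the Results of this abstract.
trna_counts = {
    "human": 540,
    "worm": 579,
    "fly": 285,
    "yeast": 274,
    "Arabidopsis": 614,
}


def difference(a, b):
    """Return how many more tRNA genes organism a has than organism b."""
    return trna_counts[a] - trna_counts[b]


human_vs_worm = difference("human", "worm")    # negative: human has fewer
fly_vs_yeast = difference("fly", "yeast")
richest = max(trna_counts, key=trna_counts.get)

# Density of the amplified repeat region: 81 tRNAs in 40 kbp.
array_density_per_kbp = 81 / 40
```

The repeat region works out to roughly two tRNA genes per kilobase.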

Web Page http://rna.wustl.edu/GtRDB/

RELATIONAL DATA MINING: USING PROBABILISTIC RELATIONAL MODELS TO DISCOVER PATTERNS IN EPIDEMIOLOGICAL DATA
Lise Carol Getoor, Benjamin Taskar*, Jeanne Rhee, Peter Small, Daphne Koller

Biological data sets are often characterized by their rich relational structure. Such a data set might contain: demographic, clinical, and genomic information about patients; genomic and drug-resistance information about infectious agents; drug treatment history; and epidemiologic contact tracing for patients. Traditional approaches to statistical data analysis often have difficulty dealing with such complex structured datasets. Probabilistic relational models (PRMs) are a recent development that extends the standard attribute-based Bayesian network representation to incorporate a much richer relational structure. A PRM specifies a template for a probability distribution over a relational database. It specifies, for each type of entity in the domain, a dependency model for each attribute in that table. This model encodes the way in which the attribute of an object in that table depends on other attributes, including those of related objects.

In our work, we have developed algorithms for learning PRMs directly from structured data. Our methods build on the work in learning Bayesian networks, and provide a powerful and flexible method for learning from relational data. Our algorithm takes as input a relational database and tries to detect the most significant direct correlations in the data. It performs a heuristic search over the space of possible dependency structures using a Bayesian scoring function.

We have applied our algorithm to a database of epidemiological data gathered at the San Francisco Tuberculosis Clinic (1991-1999), containing 1843 patients and their approximately 21,000 contacts. The database contains patient demographic and clinical attributes. Additionally, sputum samples are obtained for each patient, and undergo a genetic marker analysis determining the strain of Mtb that is causing disease in the patient. A contact investigation is performed for each patient to identify persons with whom the patient has been in contact during his/her infectious period. Data for each contact include the relationship of the contact to the case (e.g., family member, co-worker) and the contact's age.

The learned PRM contains rich dependency structure both within classes and between attributes in different classes (see http://www-cs/~getoor/tb.ps). The domain experts who developed the database found the model interesting, and most of the dependencies quite reasonable. Examples are the dependence of age at diagnosis on HIV status (typically, HIV-positive patients are younger, and are infected with TB as a result of AIDS) and the dependence of the contact's age on the type of contact (contacts who are coworkers are likely to be older than contacts who are school friends). There are also dependencies that indicate a bias in the TB control procedures: contacts who were screened at the TB clinic were much more likely to be diagnosed with TB and receive treatment than those screened by their private medical doctor.

There are also correlations that are clearly relational, and that would have been difficult to detect using a non-relational learning algorithm. For example, there is a dependence between the patient's HIV result and whether he transmits the same strain to a contact: HIV positive patients are much more likely to transmit the disease. Another example is the correlation between the ethnicity of the patient and the number of patients infected by the strain: Asian patients are more likely to be infected with a strain which is unique in the population, whereas other ethnicities more often have strains that recur in several patients. The reason is that Asian patients are more often immigrants, who arrive in the U.S. with a latent strain of TB, whereas other ethnicities are often infected locally.

Our learning algorithms for PRMs provide a powerful technique for discovering the statistical dependencies in a relational domain. These methods are particularly well suited to data encountered in many biomedical domains, where our goal is scientific discovery from a rich relational dataset.
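The idea of a heuristic search over dependency structures guided by a Bayesian score can be illustrated with a much-simplified sketch. It is not the authors' PRM learner: it works on a single flat table rather than a relational database, it uses the standard K2 score (uniform Dirichlet priors) as a stand-in for whatever scoring function the authors used, and it greedily chooses parents for one attribute only. The toy rows and the attribute names (`contact_type`, `age_group`, `screened_at`) are invented for illustration.

```python
from collections import Counter
from math import lgamma


def k2_score(rows, target, parents):
    """Log K2 (Bayesian, uniform-prior) score of `target` given `parents`."""
    states = {row[target] for row in rows}
    r = len(states)
    joint = Counter((tuple(row[p] for p in parents), row[target]) for row in rows)
    per_config = Counter(tuple(row[p] for p in parents) for row in rows)
    score = 0.0
    for config, n_j in per_config.items():
        score += lgamma(r) - lgamma(n_j + r)
        for state in states:
            score += lgamma(joint[(config, state)] + 1)
    return score


def greedy_parents(rows, target, candidates):
    """Hill-climb: repeatedly add the parent that most improves the score."""
    parents = []
    best = k2_score(rows, target, parents)
    improved = True
    while improved:
        improved = False
        best_candidate = None
        for candidate in candidates:
            if candidate in parents:
                continue
            score = k2_score(rows, target, parents + [candidate])
            if score > best:
                best, best_candidate, improved = score, candidate, True
        if improved:
            parents.append(best_candidate)
    return parents, best


# Invented toy data: contact type fully determines age group,
# while the screening location is unrelated to it.
rows = []
for i in range(16):
    contact_type = "coworker" if i < 8 else "school"
    rows.append({
        "contact_type": contact_type,
        "age_group": "older" if contact_type == "coworker" else "younger",
        "screened_at": "clinic" if i % 2 == 0 else "private",
    })

parents, score = greedy_parents(rows, "age_group", ["contact_type", "screened_at"])
```

On this toy table the search keeps `contact_type` as the only parent of `age_group`, because adding the unrelated attribute lowers the score. A relational learner would additionally consider attributes reached through links between tables, such as a contact's patient.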

BLIND PREDICTION OF PROTEIN STRUCTURE: WHAT HAVE WE LEARNT AND WHERE DO WE GO?
Ram Samudrala, Yu Xia, Michael Levitt

The Critical Assessment of protein Structure Prediction (CASP) methods conference was instigated to ensure that protein structure prediction approaches are tested rigorously without advance knowledge of the experimental answer. We have made predictions at all three CASP meetings, each time improving upon previously developed methodologies. In the recent CASP3, we made ab initio predictions based on a lattice-based exhaustive enumeration technique to sample protein conformational space, and an all-atom conditional probability discriminatory function to select native-like conformations. Using this approach, we successfully predicted the topology of small proteins, or fragments of a protein (up to ~60 residues), for more than 50% of the sequences modeled. The results represent marked progress in bona fide ab initio prediction since the first CASP in 1994. We have taken the methodologies one step further by using predicted structure to predict function and guide experimental work for a 67-residue fragment of the DNA polymerase alpha-associated protein. A discussion on the utility of our approach for solving relevant biological problems will be presented, as well as the new approaches we have implemented at the fourth CASP, which recently ended.
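The enumerate-then-select strategy described above can be sketched on a toy scale. The example below is an illustration under strong assumptions, not the authors' method: it enumerates self-avoiding chains on a 2D square lattice instead of their lattice, and it ranks conformations with a simple hydrophobic-contact count (the HP model) in place of the all-atom conditional probability function. All names are hypothetical.

```python
def enumerate_conformations(n_residues):
    """Yield every self-avoiding chain of n_residues points on the 2D square
    lattice, with the first residue fixed at the origin."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(path, occupied):
        if len(path) == n_residues:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                occupied.remove(nxt)
                path.pop()

    yield from extend([(0, 0)], {(0, 0)})


def contact_energy(conformation, sequence):
    """Score -1 for each pair of 'H' residues that are lattice neighbors
    but not consecutive in the chain (toy HP-model energy)."""
    energy = 0
    for i in range(len(conformation)):
        for j in range(i + 2, len(conformation)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = conformation[i], conformation[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


sequence = "HPPH"  # invented four-residue toy sequence
conformations = list(enumerate_conformations(len(sequence)))
best = min(conformations, key=lambda c: contact_energy(c, sequence))
```

Exhaustive enumeration grows exponentially with chain length, which is why a discriminatory function that ranks the sampled conformations matters as much as the sampling itself.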

A FLUOROSCOPIC X-RAY REGISTRATION PROCESS FOR THREE-DIMENSIONAL SURGICAL NAVIGATION
Hamid Reza Abbasi, Sanaz Hariri (CandMed), Shao Chin, Robert Grzeszczuk, Daniel Kim, Gary Steinberg, Ramin Shahidi

Back pain has a lifetime incidence of about 80% and is the second leading reason why Americans see physicians. Causing suffering and stress, it costs as much as $50 billion a year for medical care, workers' compensation payments, and time lost from work. Surgical procedures are performed to alleviate pain and neurological deficits; accurately placed transpedicular screws may allow secure and reliable fixation of an unstable spine. However, often the surgeon has no direct visual guidance during the procedure. Alignment of the drill and the decision to proceed to a certain depth depend on the skill of the surgeon. The surgeon must infer the 3D positions and dimensions of critical anatomic structures based on their relationship to exposed anatomical landmarks, aided by 2D imaging data (e.g. plain films, fluoroscopy, and ultrasound). Studies show that the rate of incorrect screw placement ranges from 10% to 40%. Neurologic complications due to imprecisely placed screws range from 1.5% to 6%; inadequate biomechanical fixation is reported in up to 31% of cases.

Traditional cranial neuronavigation systems are inappropriate for use in spinal surgeries because the marker-to-bone relationship changes significantly in the spinal region from the time of preoperative CT image collection to the time of intraoperative marker registration. There is thus a need for intraoperative 3D real-time visualization of spinal anatomy. To fill this technological gap while addressing the unique constraints of spinal anatomy, the Stanford Image Guidance Laboratory (IGL) has developed a surgical navigation registration algorithm to allow the surgeon to precisely locate surgical tools with respect to the patient’s anatomy during spinal surgeries using a computed tomography (CT) scanner pre-operatively and a C-arm fluoroscope intra-operatively. Use of this algorithm involves three steps: calibration, tracking, and registration.

The IGL spinal registration algorithm (SRA) addresses the problem of fine registration of such a dynamic region (coarse registration being similar to the cranial registration). The SRA uses the original 2D axial planes of the CT scans to create a 3D reconstructed image of the patient. The SRA can then act as a virtual fluoroscope, obtaining virtual 2D fluoroscopic images from this 3D reconstruction in any plane (e.g. lateral, AP). These virtual fluoroscopic images are called digitally reconstructed radiographs (DRRs). Intra-operatively, two real oblique fluoroscopic images of the patient are obtained. The SRA matches the 2 real fluoroscopic images with the 2 DRRs. This match enables the navigation system to assign spatial positions acquired from preoperative CT images to the actual anatomical position of the patient through the following logic: a) The relationship of the camera to the real fluoroscope is known (the tracking step). b) The relationship of the real fluoroscope (i.e. OI) to the CT is known through the virtual fluoroscope (fine registration). The fine registration step is repeated each time the patient is moved and each time a large piece of equipment is moved in the OR (since this changes the magnetic field and thus changes the image). IGL is currently in the process of testing the algorithm’s accuracy using a phantom patient vertebra. Ultimately, utilization of a spinal navigation system with this noninvasive registration method will provide greater surgical precision in spine procedures, especially in more sensitive anatomic areas such as the cervical and upper thoracic spine.
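The two-step logic in a) and b) amounts to chaining coordinate transforms: camera to fluoroscope (tracking), then fluoroscope to CT (fine registration). The sketch below shows only that composition with 4x4 homogeneous matrices. It assumes both transforms are rigid and already known; the matrix values, the function names, and the tool-tip point are invented, and the DRR matching itself is not implemented.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    """Homogeneous transform that shifts a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotation_z_90():
    """Homogeneous transform rotating 90 degrees about z (x -> y, y -> -x)."""
    return [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def apply(transform, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = point
    v = [x, y, z, 1]
    out = [sum(transform[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(out[:3])


# Step a) tracking: camera coordinates -> fluoroscope coordinates (made-up values).
fluoro_from_camera = translation(10, 0, 0)
# Step b) fine registration: fluoroscope coordinates -> CT coordinates (made-up values).
ct_from_fluoro = rotation_z_90()

# Chaining the two gives camera coordinates -> CT coordinates.
ct_from_camera = matmul(ct_from_fluoro, fluoro_from_camera)
tool_tip_ct = apply(ct_from_camera, (1, 2, 3))
```

Because only `ct_from_fluoro` depends on the image match, repeating the fine registration after the patient moves would replace that one matrix while the tracking transform is handled separately.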

EVALUATION OF MUSCULOSKELETAL MODELS DERIVED FROM MAGNETIC RESONANCE IMAGES
Silvia S. Blemker, Allison S. Arnold, Deanna S. Asakawa, Scott L. Delp

Introduction outlined manually in the two-dimensional The medial hamstrings and psoas muscles (2D) images (Fig. 1-A), and 3D surface are frequently lengthened surgically in an models were created from these boundaries attempt to improve walking in children with for each series. The surfaces from cerebral palsy. Previous studies have overlapping series were registered (Fig. 1- suggested that analysis of muscle lengths B), generating an accurate representation of during gait may be helpful in deciding when the musculoskeletal anatomy at a single limb a muscle should be surgically lengthened position (Fig. 1-C). To estimate the muscle (Hoffinger et al. 1993, Delp et al. 1996). moment arms for a range of limb positions, These studies have relied on a computer models of hip and knee kinematics were model of the lower limb that represents the scaled to the specimens’ bones (Fig. 2-A). musculoskeletal geometry of an average- The hip was assumed to be a ball-and-socket sized adult male. It is not clear how joint, and the hip center was estimated by variations in subject size or the presence of fitting a sphere to the femoral head using a musculoskeletal deformities may affect the nonlinear least-squares algorithm. The knee accuracy of the muscle lengths estimated model was based on published 3D using the average-sized model. Therefore, measurements of tibiofemoral kinematics techniques to accurately and non-invasively (Walker et al. 1988, Nisell et al. 1986). The characterize muscle lengths of individual musculotendon paths were derived from the subjects must be developed to test the results 3D muscle surfaces (Fig. 2-B), and of previous simulation studies. ellipsoidal wrapping surfaces were defined The goals of this study were to: (i) develop for each muscle to simulate wrapping over methods to construct subject-specific underlying structures (Fig 2-C). 
biomechanical models from magnetic resonance (MR) images, (ii) create models The moment arms estimated from the of three lower extremity cadaver specimens, models were compared to the moment arms and (iii) test the accuracy of muscle lengths determined experimentally on the same and moment arms estimated using these specimens using the tendon displacement models. To test the accuracy of the models, method (An et al. 1984). The specimens the hip and knee flexion moment arms were mounted in a jig that provided control estimated from models of the three of hip flexion, adduction, rotation, and knee specimens were compared to the moment flexion. Joint angles were monitored by arms determined experimentally on the same tracking the locations of infrared emitters specimens. Because a muscle’s moment that were fixed to the bones. For each arm determines its change in length with muscle, a wire was connected to the tendon, joint rotation, these comparisons also tested routed through a suture anchor at the muscle the accuracy with which the models could origin, and attached to a position transducer. estimate muscle lengths over a range of hip Fourth order polynomials were fit to the and knee motions. tendon excursion vs. flexion data, and the hip and knee flexion moment arms were Methods obtained from the first derivatives of the Models of three lower limb cadaver polynomial fits averaged over multiple specimens were constructed from six series trials. of T1-weighted spin-echo images (Fig. 1). Boundaries of the bones and muscles were

Results

The moment arms estimated from the models compared favorably with the experimental data (Figs. 3 and 4). For the psoas (Fig. 3-A), the average errors between the experimentally determined hip flexion moment arms and the calculated moment arms ranged from 1.1 mm, or 5% of the experimental moment arms, to 2.7 mm (8%). For the medial hamstrings (Fig. 3-B), the hip extension moment arm errors ranged from 1.0 mm (2%) to 3.8 mm (9%). The average knee flexion moment arm errors for the medial hamstrings (Fig. 3-C) ranged from 0.1 mm (<1%) to 3.9 mm (9%). We also determined that the models could accurately estimate muscle lengths during walking (not shown, Arnold et al., 2000a).

Conclusion

Generic musculoskeletal models that compute the lengths and moment arms of soft tissues have been used to study the treatment of a wide range of movement abnormalities and to plan orthopaedic surgical procedures. However, before any generic model can be used to guide patient-specific treatment decisions, the accuracy of the model must be tested. This study demonstrates that the combination of MR imaging and graphics-based musculoskeletal modeling is a promising approach for accurately estimating muscle lengths and moment arms in vivo (errors within 10%). Using the methods presented in this study, MR-based models of children with cerebral palsy have been developed and used to examine the causes of movement abnormalities (Arnold et al., 2000b).

References

1. An KN, Takahashi K, Harrigan TP, Chao EY: Determination of muscle orientations and moment arms. J. Biomech. Eng. 106: 280-282, 1984.
2. Arnold AS, Asakawa DJ, Delp SL: Do the hamstrings and adductors contribute to excessive internal rotation of the hip in persons with cerebral palsy? Gait and Posture 11: 181-190, 2000a (awarded Best Paper of 1999).
3. Arnold AS, Salinas S, Asakawa DJ, Delp SL: Accuracy of muscle moment arms estimated from MRI-based musculoskeletal models of the lower extremity. Computer Aided Surgery 5: 108-119, 2000b.
4. Hoffinger SA, Rab GT, Abou-Ghaida H: Hamstrings in cerebral palsy crouch gait. J. Pediatr. Orthop. 13: 722-726, 1993.
5. Nisell R, Nemeth G, Ohlsen H: Joint forces in extension of the knee. Acta. Orthop. Scand. 57: 41-46, 1986.
6. Walker PS, Rovic JS, Robertson DD: The effects of knee brace hinge design and placement on joint mechanics. J. Biomech. 21: 965-974, 1988.

Acknowledgements

We are grateful to JoAnn Mason, Erik King, Mahi Durbhakula, Norman Fung, and Emil Davchev. Funded by NIH, NSF, and the United Cerebral Palsy Foundation.

MATHEMATICAL ANALYSIS OF ENSEMBLE DYNAMICS: MAKING SIMULATIONS OF PROTEIN FOLDING FEASIBLE
Michael Randall Shirts, Vijay S. Pande

Ensemble dynamics is a new methodology for extending simulations to very long time scales by efficiently parallelizing the calculation among many machines. Ensemble dynamics is shown to scale nearly as fast as the number of processors in many physical situations, rendering previously intractable problems within reach of large computer clusters. Interestingly, it is possible to obtain speedup greater than the number of processors under some conditions, although other systems are limited to sublinear speedup. One of the most important applications of this new work is the computer simulation of protein folding, and small peptides have very recently been folded within our group at Stanford using this method.

Ensemble dynamics is suited for problems such as molecular dynamics simulations of condensed phase systems which are characterized by infrequent crossings of energy or free-energy barriers alternating with long persistence times in energy or free-energy minima. An ensemble dynamics simulation consists of M simulations running in parallel. When one of the M simulations crosses an energy barrier, all of the M simulations are reset to (or near) the point of phase space of this barrier-crossing simulation, and the M simulations are continued from there.

We present a formalism for calculating the computational advantage (speedup using M processors) as well as interpreting the simulation data to predict rates. A heuristic argument has previously been presented demonstrating that with exponential rate processes, ensemble dynamics should give an exactly linear speedup of rates with number of processors, and that the distribution of different pathway frequencies is preserved. We find that tremendous speedups can be obtained, rendering previously intractable problems within reach.

Ensemble dynamics is a powerful technique which offers highly scalable speedup of simulations. In many physical cases, superlinear speedup can be obtained, yielding effective efficiency greater than 100%. Deviations from linearity are representative of physical trajectories that are faster or slower than the usual trajectory, but are still physical trajectories. Under the right conditions, an ensemble dynamics simulation can ignore traps and proceed in a more direct manner along the productive reaction pathway. This method should be highly effective for the simulation of long time scales by use of hundreds or even thousands of computers, limited only by the ratio of typical simulation times to the fastest simulation times. Consider again the simulation of protein folding dynamics. While the fastest proteins fold in ~10 microseconds, a single CPU can only simulate 1 ns/day, thus requiring about 10,000 CPU days (roughly 27 CPU years). With a 1000-processor cluster and ensemble dynamics, one can simulate 1 microsecond/day, rendering the problem tractable.

Web Page http://foldingathome.stanford.edu

SCIENTIFIC TALKS SESSION II

DISCOVERING DNA MOTIFS IN UPSTREAM REGULATORY REGIONS OF CO-EXPRESSED GENES
Xiaole Liu, Jun Liu, Doug Brutlag

The development of genome sequencing and DNA microarray analysis of gene expression gives rise to the demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. The significance of each motif found is judged based on a motif score distribution estimated by a Monte Carlo method. In addition, BioProspector modifies the motif model used in the earlier Gibbs samplers to allow for the modeling of gapped motifs and motifs with palindromic patterns. All these modifications greatly improve the performance of the program. Although testing and development are still in progress, the program has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP. We are currently working on combining BioProspector with a clustering program to explore gene expression networks and regulatory mechanisms.
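To make the idea of a Markov background model concrete, here is a minimal Python sketch of estimating a model of a chosen order from a set of sequences and scoring a sequence against it. It is not BioProspector's implementation (which is a C program whose internals are not described here); the function names, the pseudocount, and the toy sequences are assumptions for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_background(sequences, order, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from sequences."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0.0))
    for seq in sequences:
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions that contain ambiguous characters.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        model[context] = {b: (row[b] + pseudocount) / total for b in ALPHABET}
    return model

def log_prob(model, order, seq):
    """Log-probability of seq[order:] under the background model.
    Contexts never seen in training fall back to a uniform distribution.
    Assumes seq contains only characters from ALPHABET."""
    total = 0.0
    for i in range(order, len(seq)):
        row = model.get(seq[i - order:i])
        total += math.log(row[seq[i]] if row else 1.0 / len(ALPHABET))
    return total

upstream = ["ACGTACGTTTGACA", "TTGACAACGTACGA"]  # toy sequences, not real data
bg = train_background(upstream, order=3)
score = log_prob(bg, 3, "ACGTTTGACA")
```

A motif finder can compare a candidate site's probability under the motif model with its probability under a background model like this one; a higher-order background captures more of the local sequence composition.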

DEVELOPING AND EVALUATING ASSESSMENT MEASURES FROM A SIMULATION TOOL: TINKERING TOWARDS UTOPIA?
Carla Marie Pugh

Purpose

With the advent of simulation technology, the medical profession can expect significant changes in the ability to train health care professionals. From surgical procedures to basic physical exam skills, simulation and virtual reality technology bring the promise of a new era for medical education. However, for quality assurance purposes, these new teaching tools must be evaluated. The purpose of this study is to evaluate the assessment measures developed from the pelvic exam simulator.

Materials and Methods

We have designed a new teaching tool, the pelvic exam simulator, which consists of a partial manikin - umbilicus to mid thigh - constructed in the likeness of an adult human female. The device is instrumented internally with several electronic sensors and has the ability to provide the user with immediate visual feedback regarding performance. While the user is performing an exam, sensor inputs are sampled at a rate of 30 hertz and the outputs are captured and stored in a data file. Figure 1 depicts a sample of the data generated from the simulator.

http://www.stanford.edu/~cpugh/BCATS.html

Because the sensor data represents student information that has never been collected while performing clinical pelvic exams, we developed a method of analyzing the data. Our purpose was to extract meaningful indicators of student performance from large data files containing the electrical signals captured during the exam.

The variables developed from the sensor data include: 1) length of time required to perform a complete exam, 2) number of pressure points touched during the exam, 3) the frequency at which a given pressure point was touched, and 4) the maximum amount of pressure used while touching each pressure point. These variables were used as indicators, or measures, of student performance.

To better understand the variables we created, we conducted a controlled randomized study using eighty-seven medical students. The study protocol consisted of a training phase and an assessment phase. Only the treatment group (33 students) trained on the simulator. During the assessment phase, all students performed clinical pelvic exams on three different simulators and sensor data was collected to generate the variables discussed above.

After examining each simulator, the students filled out assessment forms regarding their exam findings. These forms were evaluated and an accuracy variable was created. As part of our data analysis, the accuracy variable was correlated with the simulator variables to determine if the variables we created were significant indicators of student performance.

Results

Pearson's correlations showed that the accuracy variable was significantly correlated with the pressure point and maximum pressure variables, p < .05, establishing the validity of these two measures as indicators of student performance. The frequency variable only achieved a moderate correlation with student accuracy, p = .056. Time did not correlate with accuracy. The results also showed statistically significant inter-item correlations for the time, pressure point and the maximum pressure variables, p < .01, further establishing the potential use of these variables as measures of student performance. The reliability coefficients for the simulator variables are as follows: Time = .7240, Pressure Points = .6329, Maximum Pressure = .7701, Frequency = .5011, and Accuracy = .6007.

Conclusion

We have developed a method of analyzing raw sensor data for the purposes of generating valid measures of student performance. Although two of the variables created seem to be valid measures of student performance, more studies need to be done.
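The four sensor-derived variables can be illustrated with a short Python sketch. The abstract does not describe the file format or how a "touch" was detected, so everything below other than the 30 Hz sampling rate is an assumption: the data layout (one list of sensor readings per sample), the threshold rule, the function name `exam_variables`, and the interpretation of "frequency" as the number of separate touches per sensor.

```python
SAMPLE_RATE_HZ = 30  # sampling rate stated in the abstract

def exam_variables(samples, touch_threshold):
    """samples: one list of sensor readings per sampling tick.
    A sensor counts as touched while its reading exceeds touch_threshold."""
    n_sensors = len(samples[0])
    touches = [0] * n_sensors          # separate touches per sensor
    max_pressure = [0.0] * n_sensors   # peak reading per sensor
    was_down = [False] * n_sensors
    for reading in samples:
        for s, value in enumerate(reading):
            is_down = value > touch_threshold
            if is_down and not was_down[s]:
                touches[s] += 1        # rising edge: a new touch begins
            was_down[s] = is_down
            max_pressure[s] = max(max_pressure[s], value)
    return {
        "time_s": len(samples) / SAMPLE_RATE_HZ,
        "points_touched": sum(1 for t in touches if t > 0),
        "touch_frequency": touches,
        "max_pressure": max_pressure,
    }

# Toy recording: 3 sensors, 6 ticks; values are in arbitrary units.
recording = [
    [0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.7, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
result = exam_variables(recording, touch_threshold=0.5)
```

In this toy recording the first sensor is touched twice, the second once, and the third never, so two pressure points count as touched.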

Acknowledgements

I wish to acknowledge the following people for their guidance and support in this research project - Sakti Srivastava, M.D., Richard Shavelson, Ph.D., Decker Walker, Ph.D., Teresa Cotner, Ph.D., Beth Scarloss, MS, Merry Kuo, MS, Chantal Rawn, BS, Parvati Dev, Ph.D., Thomas H. Krummel, M.D., and Leroy H. Heinrichs, M.D., Ph.D.

A ONE-DIMENSIONAL FINITE ELEMENT METHOD FOR SIMULATION-BASED MEDICAL PLANNING FOR CARDIOVASCULAR DISEASE
Jing Wan, Brooke Steele, Thomas J.R. Hughes, Charles A. Taylor

Purpose

Current methods for vascular treatment planning rely on diagnostic and empirical data to guide the decision-making process. This approach does not enable a physician to preoperatively assess the efficacy of alternate therapies in determining the preferred treatment for an individual. We have previously described a new approach to planning treatments for cardiovascular disease, Simulation-Based Medical Planning, whereby a physician utilizes computational tools to construct and evaluate a combined anatomic/physiologic model to predict the outcome of alternative treatment plans for an individual patient. Current systems for Simulation-Based Medical Planning utilize finite element methods to solve the time-dependent, three-dimensional equations governing blood flow and provide detailed data on blood flow distribution, pressure gradients and locations of flow recirculation, low wall shear stress and high particle residence time. However, these methods are computationally expensive and often require hours of time on parallel computers.

Materials and Methods

We describe, herein, a system for Simulation-Based Medical Planning based on the solution of the one-dimensional equations of blood flow using a space-time finite element method. We applied a Galerkin/least-squares stabilization method in space and a discontinuous Galerkin method in time, a combination that has been proven to be stable and robust.

Results

This system is applied to compute flow rate and pressure in a single-segment model, an idealized model of the abdominal aorta, three alternate treatment plans for a case of aorto-iliac occlusive disease, and a vascular bypass graft. We demonstrate that, based on flow rate, this method can be used to rank treatments in the same order as our fully three-dimensional method.

Conclusion

Compared with the three-dimensional method, the one-dimensional method has the advantage of low computational cost. Although it cannot resolve details such as flow recirculation and local blood flow distribution, it can still give fairly accurate flow rate and pressure distributions along different paths. We also showed that, for cases of vascular treatment planning, the one-dimensional model can be used to rank treatments in the same order as our fully three-dimensional method. Further work still needs to be done to precisely calibrate the role of the one-dimensional model in vascular treatment planning.
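The abstract does not write out the one-dimensional equations of blood flow. For orientation, a commonly used form is given below as an assumption; it is not necessarily the exact formulation the authors solved. Here A(z,t) is the lumen cross-sectional area, Q(z,t) the volumetric flow rate, p(z,t) the pressure, ρ the blood density, α a momentum-flux correction coefficient, and K_R a viscous resistance coefficient (all symbols are introduced here for illustration).

```latex
\begin{aligned}
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial z} &= 0, \\
\frac{\partial Q}{\partial t}
  + \frac{\partial}{\partial z}\!\left(\alpha\,\frac{Q^{2}}{A}\right)
  + \frac{A}{\rho}\,\frac{\partial p}{\partial z}
  &= -K_R\,\frac{Q}{A}.
\end{aligned}
```

The first equation expresses conservation of mass for an impermeable vessel and the second conservation of axial momentum. For a parabolic (Poiseuille) velocity profile, α = 4/3 and K_R = 8πν, with ν the kinematic viscosity. The system is closed by a constitutive relation p = p(A) describing the vessel wall.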

FINDING DISTINCTIVE EXPRESSION PATTERNS IN MICROARRAY DATA WITH INDEPENDENT COMPONENTS ANALYSIS
Joshua Michael Stuart, Soumya Raychaudhuri, Xiaole Liu, Russ B Altman

We introduce the application of Independent Components Analysis (ICA) for identifying distinctive expression profiles within microarray data. Instead of clustering, we use ICA to find axes of the data that distinguish a small number of genes whose expression profiles are similar to each other while dissimilar to the entire remaining set of profiles. Each collection of outlier profiles along one axis determined by ICA is naturally contrasted to the set of profiles on the opposite end of the same axis. For each outlier cluster, an “anti-cluster” is also defined. The upstream sequences belonging to a cluster of outliers can be searched for the presence of common regulatory sites. In addition, sequences from the anti-cluster can also be used as negative examples during motif identification. We provide an example where such additional information significantly increases the performance of a motif-finding algorithm. The performance of the method is demonstrated on a well-studied yeast sporulation data set. Applied to this data, ICA successfully rediscovers the MSE regulatory element known to play a role in sporulation while also increasing the number of genes known to contain such sites.
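Once ICA has produced a score for every gene along one component axis, picking the outlier cluster and its anti-cluster is a simple thresholding step. The Python sketch below shows one way to do it, assuming the scores are already computed; the two-standard-deviation cutoff, the function name, and the gene names are hypothetical and are not taken from the abstract. Because the sign of an independent component is arbitrary, which end is called the cluster is a matter of convention.

```python
import statistics

def cluster_and_anticluster(loadings, n_sd=2.0):
    """Split genes by their position along one component axis.
    loadings: dict mapping gene name to its score on that axis.
    Genes more than n_sd standard deviations above the mean form the
    cluster; genes more than n_sd below the mean form the anti-cluster."""
    values = list(loadings.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    cluster = sorted(g for g, v in loadings.items() if v > mean + n_sd * sd)
    anti = sorted(g for g, v in loadings.items() if v < mean - n_sd * sd)
    return cluster, anti

# Toy scores: twenty unremarkable genes plus three outliers.
scores = {f"gene{i:02d}": 0.0 for i in range(20)}
scores.update({"geneA": 5.0, "geneB": 5.0, "geneC": -5.0})
cluster, anti = cluster_and_anticluster(scores)
```

The upstream sequences of the genes in `cluster` would then be passed to a motif finder as positive examples and those in `anti` as negative examples.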

AUTOMATED QUANTIFICATION OF 4D ULTRASOUND FOR CAROTID ARTERY DISEASE
Haobo Xu, David S. Paik, Barbara Ross, Thilaka S. Sumanaweera, John Hossack, R. Brooke Jeffrey, Sandy Napel

Purpose

To develop and test the feasibility of a technique for automatically categorizing carotid artery disease into those patients who are normal and those who require more definitive tests and, possibly, surgery, from a rapidly-acquired four-dimensional ultrasound examination.

Materials and Methods

3D ultrasound data (B-mode and color Doppler energy) were collected with an Acuson Sequoia 512 using a modified linear array transducer, which was translated along the elevation direction to acquire 3D ultrasonic data sets. We acquired, in addition, the electrocardiogram (ECG) of the subject and automatically annotated each acquired image with its cardiac phase. Speckle data between successive images was analyzed using a computer-based tracking technique to accurately position the successive 2D image planes in 3D space. We then used the ECG phase to parcel the images into ten separate 3D volumes, each with nearly constant ECG phase. A computer program automatically determined the medial centerline of the common/internal carotid artery from the most systolic phase volume, and the cross-sectional areas of images perpendicular to this centerline were plotted. From these area vs. distance plots we determined the maximum percent stenosis for 8 assumed-normal volunteers and 8 patients (5 M, 3 F; ages 52-76; mean=66), each of whom had a comparative study (MRA: 4, angio: 4). We classified the results of the ultrasound and comparative studies into 4 grades (1: <30%; 2: 31%-60%; 3: 61%-99%; 4: occluded).

Results

4D ultrasound acquisition times averaged 12 minutes per subject. All 10 subjects with <30% stenosis (either assumed-normal volunteers or from comparative examinations) were correctly identified by ultrasound, as were two patients with complete occlusions. In addition, 2 of the 3 grade 3 patients were correctly identified by ultrasound; one was underestimated as a grade 2. The patient with a grade 2 stenosis was overestimated by ultrasound as a grade 3.

Conclusion

It is feasible to acquire and process 4D ultrasound data of the extracranial carotid artery to compute cross-sectional area stenoses. Examination time is approximately 1/4 of what is required for a conventional bilateral duplex ultrasound examination. In this preliminary study, all normal patients and those with complete occlusions were correctly identified. Although there were some misclassifications between grades 2 and 3, all patients with mild-to-significant disease were correctly separated from the normals. This rapid, operator-independent technique shows promise for identifying patients with carotid disease, though further study in asymptomatic populations is required.
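The last step of the pipeline, turning an area vs. distance profile into a stenosis grade, can be sketched as follows. This is an illustration under assumptions rather than the authors' program: the abstract does not say which area serves as the normal reference, so the sketch uses the largest area along the profile, and the handling of values that fall between the listed grade ranges (for example exactly 30%) is a guess. Function names and the sample areas are hypothetical.

```python
def percent_area_stenosis(areas, reference_area):
    """Maximum percent area stenosis along the centerline."""
    narrowest = min(areas)
    return 100.0 * (1.0 - narrowest / reference_area)

def stenosis_grade(percent):
    """Map percent stenosis to the four grades listed in the abstract."""
    if percent >= 100.0:
        return 4  # occluded
    if percent > 60.0:
        return 3  # 61%-99%
    if percent >= 30.0:
        return 2  # 31%-60% (assumption: exactly 30% is placed here)
    return 1      # <30%

# Cross-sectional areas (mm^2) at successive centerline positions; toy values.
areas_mm2 = [38.0, 36.5, 30.0, 19.0, 28.0, 35.0]
reference = max(areas_mm2)  # assumed choice of normal reference area
pct = percent_area_stenosis(areas_mm2, reference)
grade = stenosis_grade(pct)
```

In this toy profile the narrowest section has half the reference area, giving a 50% area stenosis and grade 2.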

DESIGNING A KNOWLEDGE BASE FOR PHARMACOGENOMICS: AN ONTOLOGY FOR GENETIC INFORMATION
Daniel Rubin, Russ Altman

With the human genome now nearly completely sequenced, attention is focusing on learning the medical significance of this genetic information. Large-scale studies in pharmacogenetics are being done to find variations in genotype in order to understand the variability in drug response among individuals. But to make sense of this information, computational tools capable of efficiently accessing and analyzing these data are needed. Genetic data are complex, and simply storing raw sequences in a relational database will be inadequate to answer the complex queries that are needed to discover the links between genotype and phenotype. We need to represent the varied features of genetic sequences and their genomic structure to allow a broad range of queries that are needed to analyze pharmacogenetics data. Ontologies specify the concepts and relationships in a given field, and they provide a means of modeling a complex domain. In this study, we developed an ontology for genetic information to represent genes, alleles, sequences, genomic structure, polymorphisms, and their relationships. We implemented this ontology in Protégé-2000, an environment for developing ontologies and knowledge bases. We tested our model by representing genetic data obtained from a research center that is actively collecting genetic data for a pharmacogenetics study. We were able to store all the data collected for a single gene, and we could reconstruct various views of the data, similar to those the study center currently constructs by hand. We believe our ontology is a rich yet flexible model of genetic information, and may be suitable for storing data and supporting queries in pharmacogenetics studies.

POSTER SESSION / SOFTWARE DEMONSTRATIONS

PROJECT MOTHRA: DESIGNING A SYSTEM FOR VIDEO-BASED, MARKERLESS, HUMAN MOTION ANALYSIS IN AN ARBITRARY ENVIRONMENT
Ajit M Chaudhari, Richard W Bragg, Eugene J Alexander, Thomas P Andriacchi

Purpose

While current approaches to motion analysis are valuable to obtain sensitive, quantitative, kinematic and kinetic measurements, most are limited to a laboratory environment in which a subject must perform activities on a force plate while wearing reflective markers fixed to relevant anatomical landmarks. These restrictions can affect a subject’s comfort, which may result in data that differs from natural motion. The laboratory environment also makes it difficult to accurately reproduce sports activities. Clearly, there is a need for a method to obtain both kinematic and kinetic quantities for human motion in an arbitrary setting without markers or force plates. The purpose of Project Mothra is to achieve this goal.

Methodology

A logical approach to calculate the forces and torques associated with human motion obtained from video data is to match a subject-specific, 3-dimensional model to the recorded motion. If inertial properties are then associated with each limb segment in the model, it is possible to use inverse dynamics to calculate kinetic quantities without a force plate. Currently Project Mothra has three distinct areas of development: 1) an application for building 3-dimensional, subject-specific models, 2) a model-based visual tracking algorithm, and 3) a dynamics engine to calculate the kinematics and kinetics of the observed motion. This poster focuses on the first two areas of development, while the third is left as a long-term goal.

Model Builder

The model-building application will be implemented as a GUI that allows the user to create a 3-dimensional subject-specific model by attaching body segments to one another with idealized joints. Body segments will be chosen from a library ranging in detail from geometric primitives to realistic segments reconstructed from human subjects, then scaled and positioned to match the subject. Inertial properties will be associated with each segment, either by using the constant density assumption over the segment volume or by heuristic values published in the literature. Body joints will enforce constraints on the relative positions of the connected segments, and will range in complexity from a basic hinge joint to a joint that models the complexities of an articulating surface. Once built, the model will be used by: 1) the tracking algorithm to obtain the kinematic data, and 2) the inverse dynamics engine to calculate the kinetic data.

Tracking Algorithm

Markerless visual tracking techniques will be used to obtain kinematic data for the motion of the body segments. While most studies will only be interested in kinetic data for one or a small subset of the joints, inverse dynamics calculations necessitate tracking the motion of every segment of the body simultaneously. Therefore, a multi-mode model will be used to track the most critical segments most accurately, while saving processing time for the less critical segments. For all segments, the tracking problem will be solved by matching a model, created by the model-building application, to the images seen by multiple cameras. Subjects will initially wear colored tights, where each non-critical segment will be a unique solid color to easily identify it and determine its orientation and position. In contrast, each critical segment will have many smaller, non-repeating patterns on it that can be tracked individually as points on the segment. Once these points are identified, the motion of the segment can be determined with much greater accuracy using an existing algorithm such as the point-cluster technique (Andriacchi et al, 1998). By using this approach, it should be possible to obtain enough information to calculate the kinematics of motion while minimizing processing time.

Summary

A framework is presented for a system to acquire motion data in an arbitrary setting without the use of markers or force plates. A model-building application and a tracking algorithm are proposed as key elements.
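As a concrete example of the constant density assumption, the inertial properties of a body segment modeled as a geometric primitive follow directly from its dimensions. The Python sketch below uses a solid cylinder; the choice of primitive, the function name, and the numerical values (a 40 cm segment of 5 cm radius with a density of 1000 kg/m^3) are illustrative assumptions, not parameters from the project.

```python
import math

def cylinder_inertia(radius_m, length_m, density_kg_m3):
    """Mass and principal moments of inertia of a solid cylinder about
    its center of mass, assuming uniform density."""
    volume = math.pi * radius_m ** 2 * length_m
    mass = density_kg_m3 * volume
    # Moment about the long (symmetry) axis.
    i_long = 0.5 * mass * radius_m ** 2
    # Moment about any transverse axis through the center of mass.
    i_trans = mass * (3.0 * radius_m ** 2 + length_m ** 2) / 12.0
    return mass, i_long, i_trans

mass, i_long, i_trans = cylinder_inertia(0.05, 0.40, 1000.0)
```

For segments reconstructed from human subjects, the same quantities would instead be obtained by integrating over the reconstructed volume, or taken from published tables as the text notes.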

USING METACOMPUTING TOOLS TO FACILITATE LARGE-SCALE ANALYSES OF BIOLOGICAL DATABASES
Allison Waugh, Glenn A. Williams, Russ B. Altman

Given the high rate at which biological data are being collected and made public, it is essential that computational tools be developed that are capable of efficiently accessing and analyzing these data. High-performance distributed computing resources can play a key role in enabling large-scale analyses of biological databases. I will discuss using a distributed computing environment, Legion, to enable large-scale computations on the Protein Data Bank (PDB). In particular, we employed the Feature program to scan all protein structures in the PDB in search of unrecognized potential cation binding sites. I will talk about the efficiency of Legion’s parallel execution capabilities and report on the initial biological implications that result from having a site annotation scan of the entire PDB. Four interesting proteins with unannotated, high-scoring candidate cation binding sites will be highlighted.

CLUSTER COMPARISONS
Alok Saldanha

Purpose

There are many available clustering methods, but very few ways to compare the resulting clusterings. I have implemented a visualization scheme which allows easy identification of the major differences between several clusterings.

Materials and Methods

My visualization is a web-based application which allows the user to select several clusterings and/or gene lists and displays the membership of the clusters and lists in a visual fashion. As a test case, breast cancer expression data were clustered using many methods and with multiple gene lists to determine which clusters were robust under various perturbations.

Results

Several clusters of breast cancer subtypes remained under various clusterings, and some were split. This correlated well with subtypes that were known to be heterogeneous. These heterogeneous groups sometimes appear as outliers of a more stable group. Finally, the experiment clustering under different gene lists was found to be very robust, although intentional selection of a gene list which was expressed only in epithelial cells radically changed the clustering to reflect epithelial/non-epithelial origin instead of tumor subtype.

Conclusion

This method is a good way of getting a qualitative idea of how clusterings compare, and can give one a sense of which clusters are robust to different methods of clustering. However, it does not give one a quantitative measure of cluster robustness, and can be difficult to interpret.

Web Page http://gort.stanford.edu:8000/alok/presentations/retreat2000/

PREDICTION OF NOVEL FUNCTIONAL DOMAINS USING RATES OF EVOLUTION
Alexander Simon

I have developed a method to predict putative functional regions and/or structural domains of a protein and rank their relative importance. It is based on the principle that the rate of evolution of a domain is inversely related to the strength of functional constraints on it. Thus, slowly evolving regions are more likely to be functionally important than regions that accumulate more amino acid substitutions. My technique does not require any information other than a multiple sequence alignment. Rates of evolution are calculated in a moving window along an alignment using the maximum likelihood phylogenetic tree of a gene family and plotted as a function of sequence position. Regions that are evolving more slowly are candidates for being “functional domains.” The method has been validated with several functionally diverse protein families. Its predictions are accurate and consistent with known biological information.

For example, in the Notch – Delta/Serrate protein families, a receptor-ligand system important in cell fate specification, my method correctly predicts 70/71 known domains. The Notch receptor has 36 tandem EGF-like repeats in its extracellular domain, which are thought to bind multiple ligands. Repeats #10-12 are required to bind the two known ligands, Delta and Serrate; however, they are less evolutionarily constrained than repeat #26. The conservation of a glycosylation consensus site in this repeat that is potentially modified by Fringe, a glycosyltransferase which modulates Notch signaling, as well as Abruptex mutations which cluster near this repeat, support the prediction that it is a critical functional domain.

My analysis also reveals two uncharacterized domains present in both Delta and Serrate that are as constrained as the DSL domain, which is reported to be necessary for Notch interaction. Furthermore, these two domains are part of a large N-terminal region of Delta and Serrate that exhibits a strikingly conserved pattern of evolutionary rates. Identification of other such signatures may allow the detection of non-orthologous proteins that are functionally similar.
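The moving-window step can be sketched independently of how the rates themselves are estimated. The Python example below assumes a per-column rate has already been obtained (in the method described above this comes from a maximum likelihood phylogenetic tree, which is not reproduced here) and simply averages it over a sliding window to locate the most constrained region. The function names, window size, and rate values are hypothetical.

```python
def windowed_rates(site_rates, window):
    """Mean rate in each window of `window` consecutive alignment columns."""
    return [
        sum(site_rates[i:i + window]) / window
        for i in range(len(site_rates) - window + 1)
    ]

def slowest_window(site_rates, window):
    """Start index and mean rate of the most constrained window."""
    means = windowed_rates(site_rates, window)
    start = min(range(len(means)), key=means.__getitem__)
    return start, means[start]

# Toy per-column rates: columns 3-5 evolve much more slowly than the rest.
rates = [1.2, 1.0, 0.9, 0.2, 0.1, 0.2, 0.8, 1.1]
start, mean_rate = slowest_window(rates, 3)
```

Plotting `windowed_rates` against alignment position gives the kind of rate profile the abstract describes, with troughs marking candidate functional domains.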

RECOGNIZING POLYPS FROM 3D CT COLON DATA
Salih Burak Gokturk, Burak Acar, David Paik, Carlo Tomasi, Christopher Beaulieu, Sandy Napel

Colon cancer is the second most fatal cancer type in the USA. Early diagnosis is crucial for a successful treatment. The common method is fiber-optic colonoscopy, which is invasive, time consuming and does not allow re-assessment. A non-invasive, fast and sensitive method, which is also suitable for screening, is required. It should also be able to assess the information in a high dimensional feature space. The method proposed attempts to meet these criteria for CT Colonoscopy.

Since the first medical diagnosis systems, the basic questions have been: How to (i) represent the observed data, (ii) incorporate the medical know-how, (iii) deal with the inaccuracy and uncertainty of the observed data?

The expert systems in radiology generally rely on subjective findings of a radiologist. Besides their poor reproducibility, the amount of data rapidly increasing with technological advances is not manageable with such an approach. The Bayesian approach has been the most accepted method with a sound mathematical justification. However, in this approach every single data point contributes to the decision function, irrespective of its information content. The idea of identifying and using the data points that carry the relevant information, thus focusing on the construction of the classifier itself, is utilized by Support Vector Machines (SVM).

SVM, originally proposed by Vapnik [1], constructs a classifying surface that minimizes the training error and maximizes the generalization capability of the classifier [2]. It determines the data points (Support Vectors, SV) that are closest to such a surface and defines it using the SVs only. The smaller the number of SVs, the more generalizable the classifier. A trade-off between the generalization capability and the classification performance on the training set can easily be established via a single parameter. SVM is not a clustering algorithm; it constructs the classifying surface itself, which can also be used to learn the characteristics of polyps.

Furthermore, the dimension of the classification domain can be increased indefinitely without adding much computational cost because SVM uses a kernel function in the observation space (much lower dimension) to perform the inner product in the classification space. The kernel can be designed to enhance the discrimination between polyps and non-polyp structures.

In this study, the 3D CTC data is preprocessed [3] and the subvolumes at the candidate polyp locations are extracted. Each volume is sampled with 700 random slices on which four parameters are measured: (i) Distance to closest fitting circle, (ii) Distance to closest fitting line, (iii) 2nd order moment and (iv) 3rd order moment. The random slicing eliminates any possible bias due to the volume orientation and position. A histogram with 10 bins is created over all slices for each parameter. The 40-dimensional feature vector composed of these histograms for each polyp candidate is used as the input to SVM. There are 8 true polyps and 34 non-polyps in the data set. Three artificial polyps are created from each true polyp by applying a small perturbation. We use the exponential radial basis function as kernel.

The preliminary results presented demonstrate the capability of SVM in constructing classifying surfaces even in the case of inseparable data sets and in utilizing high dimensional feature spaces. Selecting the features and the kernel is the key issue in SVM applications.

The future work includes: (i) Designing polyp descriptors and SVM kernels. (ii) Using a much more significant training population. (iii) Clinical evaluation. (iv) Clinical interpretation of the classifying surface.

References

1. Vapnik V: The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995
2. http://www.support-vector.net
3. Gokturk SB, Tomasi C: A graph method for the conservative detection of polyps in the colon. 2nd Inter Symp on Virtual Colonoscopy, Boston, October 2000

FOLDING OF A COARSE-GRAINED MODEL OF THE TETRAHYMENA RIBOZYME
Bradley J. Nakatani, Vijay S. Pande

While protein folding remains one of the most intensely-studied problems in computational biology, relatively little work is being done in the related field of RNA folding. RNA folding is complicated by the large size of the polymers and the increased role of electrostatics, but at the same time simplified by the lower number of constituent monomers (4 different nucleotides in RNA compared to 20 naturally-occurring amino acids in proteins) and the primarily hierarchical folding process (secondary structure → tertiary structure). We have focused our studies on the Tetrahymena ribozyme, one of the few RNA molecules for which a three dimensional model and an abundance of experimental data exist. As a first step in the study of the folding process, we have chosen to forgo a full atomic representation of the branched polymer in favor of a coarse-grained approach. In our model, we have reduced the 421-nucleotide ribozyme (8000+ atoms) to just 36 spheres, where each sphere typically represents 5-6 base-paired nucleotides in the model structure (see figure). We have simulated the folding of the ribozyme using Langevin dynamics.

In spite of the relative simplicity of our model, we have been able to reproduce many of the experimentally observed results including (1) the early collapse to a compact state, followed by a conformational search for the correct topology and (2) the existence of folding intermediates and misfolded kinetic “traps”. From our simulation results, we have also been able to characterize the folding pathways and the nature of on- and off-pathway intermediates. These initial findings provide an important basis for further study in this relatively unexplored field.
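To make the simulation method concrete, here is a minimal Python sketch of a single Langevin dynamics step for a set of coarse-grained beads. It is not the authors' implementation: the Euler-Maruyama discretization, the reduced units, the caller-supplied forces, and the name `langevin_step` are all assumptions, since the abstract gives no integrator or force field details.

```python
import math
import random

def langevin_step(positions, velocities, forces, mass, gamma, kT, dt):
    """One Euler-Maruyama step of dv = (F/m - gamma*v) dt + sqrt(2*gamma*kT/m) dW."""
    sigma = math.sqrt(2.0 * gamma * kT * dt / mass)  # noise amplitude per step
    new_pos, new_vel = [], []
    for x, v, f in zip(positions, velocities, forces):
        # Deterministic force and friction, plus a Gaussian random kick.
        v_next = [vi + dt * (fi / mass - gamma * vi) + sigma * random.gauss(0.0, 1.0)
                  for vi, fi in zip(v, f)]
        # Advance the position with the updated velocity.
        x_next = [xi + dt * vi for xi, vi in zip(x, v_next)]
        new_vel.append(v_next)
        new_pos.append(x_next)
    return new_pos, new_vel

# Hypothetical single bead with no force and no thermal noise (kT = 0):
pos, vel = langevin_step([[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]],
                         [[0.0, 0.0, 0.0]], mass=1.0, gamma=0.5, kT=0.0, dt=0.1)
```

In a 36-sphere model the `forces` argument would come from whatever bonded and non-bonded potentials the model defines; those are outside this sketch.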

USE OF MULTIPLE CLUSTERING ALGORITHMS FOR ANALYSIS OF HUMAN LUNG CANCER GENE EXPRESSION DATA
Jessica Ross, Glenn Rosen

Purpose
The use of tools for researchers to analyze global gene expression, such as the DNA microarray, generates large amounts of biological data. Computers are necessary to analyze experimental output in order to make sense of this mass of data. Classic biological clustering methods are being assessed to determine their utility in detecting non-random patterns in large data sets. Furthermore, modified versions of the classical methods of cluster analysis, as well as new algorithms based on mathematical techniques used for identifying patterns in complex data in a variety of other fields, are being applied to the problem of interpreting gene expression data generated from high throughput systems.

Initial experiments predominantly generated data composed of temporal expression patterns in manipulated cell lines. Algorithms used in clustering these data are based on similarities in gene expression patterns over time in response to a stimulus. Later, experiments that compare two different tissue populations in vivo were performed. Unlike time course experiments, analysis of these data is complicated by the fact that it is not possible to synchronize cells in a population, and there are multiple cell types in each tissue sample. Analyzing data of this type is a new challenge, and there is no consensus as to the optimal method of analysis at this point in time. Furthermore, there have been no studies in the literature that compare clusters derived from different algorithms on a given set of data. Such a study might be helpful in determining both the strengths and weaknesses of groups of algorithms with respect to these types of data.

Materials and Methods
This study will attempt to qualify groups of clustering algorithms in terms of their effectiveness in determining underlying patterns of gene expression in these types of data sets. We compare differential gene expression data from DNA microarray analysis of both tumor specimens and uninvolved lung tissue procured from patients at the time of surgery. Presently, twelve lung cancer specimens have been obtained, and we anticipate reaching a total sample size of at least one hundred. In analyzing these data we are initially focusing on algorithms that have been well reviewed in the literature with respect to microarray data, and/or used to analyze experimental microarray data. These include two types of hierarchical methods, agglomerative clustering and divisive clustering, and two types of partitioning methods, self-organizing maps and k-means clustering.

Results
Graphical representations of these data are shown in addition to a qualitative analysis.

Conclusion
Clustering algorithms have specific strengths and weaknesses when applied towards identifying non-random patterns in these data.
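Of the four methods named above, k-means is the simplest to sketch. The Python example below is an illustrative, standard-library version only: the toy expression profiles, the caller-supplied starting centroids, the Euclidean distance, and the fixed iteration count are assumptions, not details taken from the study.

```python
import math

def kmeans(profiles, centroids, iterations=20):
    """Basic k-means: assign each profile to its nearest centroid, then re-average."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (Euclidean distance).
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in profiles]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical two-condition expression profiles for four genes.
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(profiles, centroids=[[0.0, 0.0], [10.0, 10.0]])
```

Because the result depends on the starting centroids and on the chosen number of clusters, comparing it against hierarchical methods on the same data is one way to probe the strengths and weaknesses the abstract refers to.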

IN VIVO VALIDATION OF CARDIOVASCULAR BLOOD FLOW SIMULATIONS
Joy Ku, Gregory Wai Mong Chan, Mary Draney, Frank Arko, Chris Zarins, Charles Taylor

Purpose
We have previously proposed a new paradigm for treatment planning whereby physicians utilize computer simulations, based on patient-specific data, to evaluate the effectiveness of alternate treatment plans for individual patients. We have also described a simulation-based medical planning system for planning cardiovascular surgery (Taylor et al.). In order for such a system to be clinically useful, it is necessary to validate the system and demonstrate the accuracy of the simulation predictions in vivo.

Materials and Methods
We have performed in vivo validation studies by creating an aortic coarctation in pigs to simulate occlusive vascular disease and then utilizing a thoraco-thoraco aortic bypass graft to treat this condition. Anatomic data is acquired using magnetic resonance angiography (MRA); physiologic data is acquired utilizing cine phase-contrast magnetic resonance imaging (cine PC-MRI) and MR-compatible pressure catheters for both the pre-operative and post-operative states. We have utilized this data to generate computational models and perform blood flow simulations using a finite element method. Cine PC-MRI velocity measurements were taken at locations downstream of the inlet to examine the accuracy of the computational simulations in vivo.

Results
Computed solutions compared very favorably with experimental data. Flow distribution between the native aorta and bypass graft was predicted within 10% as compared to experimental data.

Conclusion
Computational methods have significant application in predicting changes in blood flow due to vascular surgical intervention. In vivo validation studies are essential for the development of these methods.

COMPUTATIONAL ANALYSES OF THE DIFFERENCES BETWEEN DAILY AND INTERMITTENT ALENDRONATE TREATMENT
Christopher J. Hernandez, Gary S. Beaupré, Robert Marcus, Dennis R. Carter

Purpose
Alendronate has been shown to be an effective method of increasing bone mineral density (BMD) and reducing the risk of fracture in osteoporosis patients. Due to the chronic nature of osteoporosis and the special dosing requirements for alendronate, a reduction in the frequency of treatment would be an effective way to improve patient compliance (1, 2). A recent year-long clinical study has indicated that weekly and twice-weekly alendronate treatments of the same cumulative dose are therapeutically equivalent to daily treatment (2). Given these findings it is possible that even less frequent treatment could be equivalent. Computational methods are an attractive way of addressing this question because alendronate influences the bone remodeling process, a biological system that has been well-quantified in the past 30 years. In this study we use a model of bone remodeling to simulate alendronate treatment and to predict whether or not intermittent (non-daily) alendronate treatments are equivalent to daily treatment in terms of increases in BMD.

Materials and Methods
Bone remodeling is a focal process that occurs through the action of groups of cells organized into basic multicellular units (BMUs). BMUs move through bone resorbing and later reforming mineralized tissue. We use a time-dependent, non-linear feedback model of cancellous bone that quantifies BMU size, shape, origination and rate of bone resorption and formation (3). Model parameters are determined from measurements reported in the literature. Alendronate treatment is simulated by decreasing the BMU origination frequency according to the changes observed clinically (4). Intermittent treatments are simulated by allowing the origination frequency to return to pre-treatment levels between doses. This occurs at the same rate that alendronate is eliminated from the bone. Starting from an equilibrium condition corresponding to a healthy post-menopausal woman, alendronate treatment is simulated over 10 years for daily, twice-weekly, weekly, twice-monthly and monthly treatments of the same cumulative dose.

Results
The predicted increase in BMD found for intermittent treatments was less than that predicted for daily treatment (figure 1). As the frequency of dose administration decreased, the percent increase in BMD was reduced. The difference in BMD increase between the daily model and the intermittent models was observed to increase with time, but the efficacy of the intermittent treatments (BMD increase from intermittent treatment/BMD increase from daily treatment) was dependent only on the dose frequency. The efficacies for the different intermittent treatments were as follows: twice-weekly (96%), weekly (92%), twice-monthly (88%) and monthly (84%).

Conclusion
In this study we have used a computational model of bone remodeling to predict whether or not intermittent alendronate treatments increase BMD to the same degree as daily treatments. Decreasing the frequency of alendronate dosage resulted in reduced BMD changes as compared to daily treatment, although some treatments showed only small differences. Previous studies have used a ±1.5% BMD difference to define a range for equivalence (2). By this definition our model would predict that twice-weekly, weekly and twice-monthly treatments are all equivalent to daily treatment over the 10 year period studied. Monthly treatments are predicted to be equivalent in short-term studies (1 year) but would not be equivalent in long-term studies (10 years). For this reason we recommend the use of efficacy for defining equivalence since the length of the study does not influence it. An efficacy range between 100 and 85% would result in similar but consistent equivalence conclusions to those mentioned above. The model is useful for comparing treatments because it is a mechanistic simulation relating cellular activity to changes in BMD. Although over long time periods other factors could influence the net change in BMD, the trends identified in this comparison model would remain the same, making this model useful during the development of clinical studies.
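The argument for an efficacy-based definition can be illustrated with a short Python sketch. The efficacy values are those reported in the abstract; the daily BMD increases passed in (5.0 and 10.0) are hypothetical, and reading the ±1.5% criterion as an absolute difference in percent BMD increase is an assumption.

```python
# Efficacy = BMD increase from intermittent treatment / BMD increase from daily treatment.
EFFICACY = {
    "twice-weekly": 0.96,
    "weekly": 0.92,
    "twice-monthly": 0.88,
    "monthly": 0.84,
}

def bmd_difference(daily_increase, efficacy):
    """Shortfall in percent BMD increase relative to daily treatment."""
    return daily_increase * (1.0 - efficacy)

def equivalent_treatments(daily_increase, tolerance=1.5):
    """Treatments whose shortfall stays within the assumed +/-1.5% criterion."""
    return [name for name, eff in EFFICACY.items()
            if bmd_difference(daily_increase, eff) <= tolerance]

# Hypothetical daily-treatment increases for a short and a long study.
short_study = equivalent_treatments(5.0)
long_study = equivalent_treatments(10.0)
```

Because the efficacy is fixed while the daily increase grows with study length, a fixed absolute tolerance admits fewer treatments in longer studies: with these illustrative inputs, monthly treatment passes at 5.0 but not at 10.0.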

References

1. Bone, et al. (2000) Clin Ther 22: 15-28.
2. Schnitzer, et al. (2000) Aging (Milano) 12: 1-12.
3. Hernandez, et al. (2000) J Rehab Res Develop 37: 235-44.
4. Chavassieux, et al. (1997) J Clin Invest 100: 1475-80.

MECHANICAL INFLUENCES ON OBLIQUE PSEUDARTHROSIS FORMATION
Elizabeth G. Loboa Polefka, Gary S. Beaupré, Dennis R. Carter

Purpose
Through finite element (FE) analysis, this study analyzes the effects of stresses and strains on tissue differentiation in oblique pseudarthrosis development. A tissue differentiation hypothesis previously developed in our laboratory proposes that: 1) hydrostatic pressure directs the pluripotential mesenchymal tissue of a fracture callus down a chondrogenic pathway; 2) significant shear or tensile strain leads to fibrogenesis; and 3) given adequate vascularity, minimal levels of hydrostatic stress and shear/tensile strain allow direct intramembranous bone formation (1,3). In a low strain environment, intramembranous bone formation can be accelerated by slight hydrostatic tension (1). The objective of the present study was to test this tissue differentiation hypothesis with an FE model of an oblique fracture to determine if pseudarthrosis formation could be predicted based on stress and strain distributions within the fracture callus.

Methods
Model I: An idealized 2-D FE model of an oblique fracture was created (Fig. 1A) based upon the geometry of a typical oblique pseudarthrosis (Fig. 1B) (2). Both cortical bone and pluripotential callus were assumed to be linear elastic and isotropic materials. Cortical bone was assigned an elastic modulus (E) of 18.5 GPa and a Poisson’s ratio (ν) of 0.3. Pluripotential callus was assigned E = 1 MPa and ν = 0.49. A compressive axial force was applied to the cortical bone ends and plane strain analysis was performed to determine patterns of hydrostatic stress and maximum tensile strain.

Model II: A contact model was then developed incorporating sliding contact surfaces within the interfragmentary gap corresponding to locations of high tensile strain and regions of callus failure predicted in Model I. The same material properties and loads were assigned to Model II and patterns of maximum tensile strain and hydrostatic stress were again determined.

Results
Model I: Stress distributions showed low levels of hydrostatic tension at two periosteal corners of the fracture ends, high levels of hydrostatic pressure at the opposing periosteal corners, and intermediate levels of hydrostatic pressure throughout the interfragmentary gap (Fig. 2A). Maximum tensile strains were highest within the interfragmentary gap and lowest within the external callus (Fig. 2B). Based upon a maximum tensile strain failure criterion, callus failure is predicted within the interfragmentary gap. Bone formation is predicted at the periosteal corners exposed to low hydrostatic tension and fibrocartilage formation is predicted in the interfragmentary gap region exposed to both hydrostatic pressure and tensile strain.

Model II: Hydrostatic stress distributions in the contact model were similar to those of Model I. However, maximum tensile strain distributions for Model II were quite different. Tensile strains decreased within the interfragmentary gap and increased within the external callus (Fig. 3A). These results would predict fibrocartilage maintenance within the interfragmentary gap and bone formation at the two periosteal corners experiencing low hydrostatic tension and low tensile strain (Figs. 2A, 2B and 3A).

Discussion
In this study, we have predicted interfragmentary tissue failure, fibrocartilage formation, and locations of bone formation and resorption (4) consistent with initial stages of pseudarthrosis development seen in vivo (Figs. 1B and 3B) (2,3). These results suggest that pseudarthrosis formation can be explained based upon the stresses and strains occurring within an oblique fracture.

Although pseudarthroses can arise from a multitude of factors including compromised vascularity, large fracture gaps and metabolic status, fracture geometry and load/motion at the fracture site seem to be key. To our knowledge, no prior studies have attempted to predict pseudarthrosis formation in an oblique fracture based upon the stresses and strains occurring within the fracture callus. Results from this study provide us with a better understanding of how the stress and strain distributions at a fracture site may cause delayed union, nonunion, and pseudarthrosis formation. This information may lead to improved fixation techniques and clinical outcomes for patients undergoing fracture treatment.
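The two quantities mapped in the FE models, hydrostatic stress and maximum tensile strain, can be computed at a single material point with standard linear isotropic elasticity. The Python sketch below does this for plane strain using the callus properties stated in the abstract (E = 1 MPa, ν = 0.49). The strain values are hypothetical, the function names are illustrative, and the sign convention (positive = tension) is an assumption; this is a point calculation, not the authors' FE analysis.

```python
import math

def bulk_modulus(E, nu):
    """Bulk modulus K = E / (3 (1 - 2 nu)) for a linear isotropic material."""
    return E / (3.0 * (1.0 - 2.0 * nu))

def hydrostatic_stress(E, nu, eps_xx, eps_yy):
    """Mean normal stress; in plane strain eps_zz = 0, so the volumetric
    strain is eps_xx + eps_yy. Positive values denote tension (assumed)."""
    return bulk_modulus(E, nu) * (eps_xx + eps_yy)

def max_principal_strain(eps_xx, eps_yy, eps_xy):
    """Largest in-plane principal strain (eps_xy is the tensor shear strain)."""
    center = 0.5 * (eps_xx + eps_yy)
    radius = math.hypot(0.5 * (eps_xx - eps_yy), eps_xy)
    return center + radius

# Callus properties from the abstract; the strain state is hypothetical.
E_callus, nu_callus = 1.0, 0.49  # MPa, dimensionless
K = bulk_modulus(E_callus, nu_callus)
p = hydrostatic_stress(E_callus, nu_callus, 0.01, -0.005)
e1 = max_principal_strain(0.01, -0.005, 0.0)
```

The near-incompressible Poisson's ratio makes the bulk modulus much larger than E, so small volumetric strains in the callus translate into comparatively large hydrostatic stresses.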

References

1. Carter et al. (1998) CORR 355S:S41.
2. McLean and Urist (1968) Bone. Chicago, Univ of Chicago Press 234.
3. Pauwels (1980) Biomechanics of the Locomotor Apparatus. Berlin, Springer-Verlag 106-137.
4. Robertsson et al. (1997) Acta Orthop Scand 68(3):231.

Acknowledgements

Supported by VA Rehabilitation R&D grant A501-4RA.

Web Page http://guide.stanford.edu/People/polefka/polefka.html

INTERNAL AND RELATIVE STRUCTURAL CONSERVATION OF DISCRETE PROTEIN SEQUENCE MOTIFS
Steven Paul Bennett, Douglas L. Brutlag

Purpose
Protein multiple sequence alignments provide considerable information about conservation among related proteins, particularly in demonstrating which regions of a sequence are important. These alignments have allowed bioinformatics researchers to construct sensitive and highly specific motifs to describe these important subsequences. Building on the database of discrete sequence motifs previously constructed in the Brutlag bioinformatics group, this work extends the information resulting from sequence conservation to make inferences about structural conservation as well.

Materials and Methods
Conservation of individual eMOTIFs. Accepted structures in the PDB_SELECT subset of structures were analyzed to compile a data structure relating each eMOTIF and the structures in PDB_SELECT that contain it, as well as the converse data structure relating each structure and the given eMOTIFs it contains. For each eMOTIF present in two or more of the structures in the data set, the coordinates of each residue specified in the eMOTIF were collected and rigidly aligned in pairwise fashion using a quaternion transformation algorithm. Alignments were made using alpha-carbon coordinates of the specified positions in the eMOTIF, and were scored with the resulting root mean square deviation (RMSD).

Conservation of eMOTIF pairs. As above, data structures were compiled relating eMOTIFs and PDB_SELECT structures. Here we relate pairs of eMOTIFs observed in multiple structures; for each eMOTIF-pair found in two or more structures, the alpha-carbon coordinates from each specified residue in the eMOTIF pairs were collected and rigidly aligned as described above.

Controls. Alignment results were analyzed against a background of randomly generated eMOTIFs. For each individual eMOTIF structural alignment, as well as for each eMOTIF-pair structural alignment, a population of 50 random structural alignments was generated. These random alignments were created by applying the same specification template as the eMOTIF (or eMOTIF pair) in the experiment to a randomly selected piece of sequence in a randomly chosen PDB and chain within the PDB_SELECT data set. Structural alignments were calculated using the same protocol described above for the experiments. 50 control alignments were performed in this way for each experimental alignment, in order to generate a sufficient number of samples to evaluate the experimental alignment score. Z-scores were calculated as a measure of each result’s significance against a random population, and to control for eMOTIF length variation effects on the RMSD scores.

Results
We have shown that within a set of structures that are globally dissimilar in structure (having less than 25% or 30% sequence identity with one another), our set of discrete motifs, called eMOTIFs, are observed to be highly conserved structurally, as measured by rigid structural alignment. In addition to individual eMOTIFs, we also show that eMOTIF pairs are structurally conserved with respect to one another.

Conclusion
These results indicate that discrete motifs such as eMOTIFs can be expected to imply structural conservation with a high degree of confidence, both internally and with respect to one another in pairs. Implicit in the relationship between discrete sequence motifs and structural conservation is the hypothesis that multiple conserved subsequences within a protein may interact in the folded structure, even if distant in sequence. We have analyzed the structure database, and have demonstrated a significant propensity for interaction among eMOTIFs occurring in the same structure, as measured by distances between them.
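The control procedure described under Materials and Methods amounts to scoring each experimental RMSD against its own population of random-alignment RMSDs. A minimal Python sketch of that Z-score follows; the use of the sample standard deviation and all numeric values are assumptions for illustration, since the abstract does not give the exact formula or any raw scores.

```python
import statistics

def z_score(observed_rmsd, control_rmsds):
    """Number of standard deviations the observed RMSD lies from the control mean.
    Uses the sample standard deviation (an assumption)."""
    mean = statistics.mean(control_rmsds)
    spread = statistics.stdev(control_rmsds)
    return (observed_rmsd - mean) / spread

# Hypothetical values: three control RMSDs stand in for the 50 used in the study.
controls = [4.0, 5.0, 6.0]
z = z_score(2.0, controls)
```

A strongly negative Z-score means the motif aligns far better than random fragments matched to the same template, which is how a score can be compared across eMOTIFs of different lengths.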

CONSTRAINED GLOBAL OPTIMIZATION FOR ESTIMATING MOLECULAR STRUCTURE FROM ATOMIC DISTANCES
Glenn A. Williams, Jonathan M. Dugan, Russ B. Altman

Finding optimal three-dimensional molecular configurations based on a limited amount of experimental and/or theoretical data requires very efficient nonlinear optimization algorithms. Optimization methods must be able to find atomic configurations that are close to the absolute, or global, minimum error and also satisfy known physical constraints such as van der Waals separation distances. The most difficult obstacles in these types of problems are that using a limited amount of input data leads to many possible local optima, and that while introduction of constraints such as van der Waals helps to limit the search space, it often makes convergence to a global minimum more difficult.

We investigate several commonly used optimization methods, and introduce a constrained global optimization algorithm that is robust and efficient in yielding optimal three-dimensional configurations that are guaranteed to satisfy known van der Waals constraints. The algorithm uses an atomic-based approach that reduces the dimensionality and allows for tractable enforcement of constraints while maintaining good global convergence properties. We evaluate the new optimization algorithm using synthetic data on the yeast phenylalanine tRNA and several proteins from the Protein Data Bank (PDB), all with known crystal structure. We compare the results to commonly used global optimization methods such as simulated annealing, continuation, and smoothing. Results show that compared to the standard optimization approaches, our algorithm is able to combine sparse input data with known physical constraints in an efficient manner to yield more accurate structures in terms of RMSD.
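The problem statement above can be made concrete with a small Python sketch of the two ingredients it names: an error function over sparse distance data and a check of van der Waals separation constraints. This shows only the problem setup, not the authors' optimization algorithm; the squared-error form, the single uniform minimum separation, and the function names are assumptions.

```python
import math

def distance_error(coords, restraints):
    """Sum of squared differences between modeled and target distances.
    Each restraint is (i, j, target_distance)."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for i, j, d in restraints)

def vdw_violations(coords, min_separation):
    """Pairs of atoms closer than the allowed separation (one shared cutoff,
    a simplification; real cutoffs depend on the atom types)."""
    bad = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < min_separation:
                bad.append((i, j))
    return bad

# Hypothetical three-atom configuration with two distance restraints.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
restraints = [(0, 1, 3.0), (0, 2, 5.0)]
err = distance_error(coords, restraints)
clashes = vdw_violations(coords, 3.5)
```

In this toy case the configuration fits the distance data exactly yet still has a clash, which is the situation that motivates enforcing the constraints during the search and not only checking them afterwards.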

AUTOMATED INDIVIDUALIZED DECISION SUPPORT
George Christopher Scott, Ross Shachter, Leslie Lenert

Purpose
The field of decision science has approached the problem of formally modeling decisions and analysis of decision outcomes. These methods are often applied to groups or populations to assess cost-effectiveness. However, medical decisions for individuals are often dominated by their personal preferences. By learning more about a patient's preferences and how they change over time, one should be able to determine what makes them different from the “average” patient and whether or not appropriate guidelines might need to be modified for that patient.

Obtaining estimates of how patients value different health states relative to each other is a time-consuming and, for the patient, fatiguing task. As the number of health states increases, the likelihood of mis-assignment, error or inconsistency increases. It is therefore very desirable to obtain a recommendation with certainty from a decision model with as few assessments as possible.

Materials and Methods
We developed a method of estimating the variance contribution of each of the model parameters to the final prediction of the model. This was then used in an algorithm to reduce the number of assessments necessary by terminating a patient-computer decision analytic dialogue once one of the treatments being considered was believed to be better with 95% certainty.

A population of 100 patients was simulated for preference value assessments using the system. Each simulated patient was assigned a set of consistent preference values randomly drawn from population distributions determined by meta-analysis. The preference values of the virtual patients were then assessed by the system, applying the appropriate stopping criteria. The number of assessments and the agreement of the abbreviated assessment method with the conventional method were recorded.

Results
Using the variance reduction algorithm and 95% certainty interval criteria, we found the mean number of assessments to be 4.24 (SD = 1.97) utilities out of the seven described in the model. Further, 38% of the virtual patients required only 2 out of the 7 health states to be assessed in order to have their 95% certainty interval exclude zero. Only 47% of the simulated patients required 5 assessments or more, and 12% required all 7 assessments.

Conclusion
Although the sensitivity of each of the value preferences to the model and the individual's value for each health state strongly determines the efficiency of this approach, we feel that this method is a simple and effective way to improve automated decision support systems. Due to the dependency of the algorithm's ability to reduce the mean number of assessments on external parameters, it is not possible to reliably determine this prior to its use. Any reduction in the number of assessments is desirable, and the worst case results in the current approach. In light of this, we feel that this approach is worthwhile even if its effects are not known ahead of time.
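The stopping rule described under Materials and Methods can be sketched in Python as follows. This is an illustrative reconstruction under strong assumptions, not the authors' algorithm: the decision model is taken to be linear in the utilities, the 95% interval uses a normal approximation (z = 1.96), the next utility assessed is the one with the largest variance contribution, and every number and name below is hypothetical.

```python
import math

def interval_excludes_zero(mean_diff, variance, z=1.96):
    """True if the approximate 95% interval for the treatment difference excludes 0."""
    half_width = z * math.sqrt(variance)
    return mean_diff - half_width > 0 or mean_diff + half_width < 0

def assess_until_certain(weights, prior_means, prior_vars, true_values):
    """Assess utilities one at a time until one treatment is favored with ~95% certainty."""
    means, variances = list(prior_means), list(prior_vars)
    assessed = []
    while True:
        mean_diff = sum(w * m for w, m in zip(weights, means))
        variance = sum(w * w * v for w, v in zip(weights, variances))
        if interval_excludes_zero(mean_diff, variance) or len(assessed) == len(weights):
            return assessed, mean_diff
        # Ask next about the utility contributing most to the remaining variance.
        nxt = max((k for k in range(len(weights)) if k not in assessed),
                  key=lambda k: weights[k] ** 2 * variances[k])
        means[nxt] = true_values[nxt]  # the patient's assessed value replaces the prior
        variances[nxt] = 0.0
        assessed.append(nxt)

# Hypothetical three-utility model; the difference is a weighted sum of utilities.
order, diff = assess_until_certain(
    weights=[1.0, -0.5, 0.2],
    prior_means=[0.5, 0.5, 0.5],
    prior_vars=[0.04, 0.04, 0.04],
    true_values=[0.9, 0.4, 0.5],
)
```

In this toy run a single assessment is enough to settle the recommendation; if no interval ever excludes zero, the loop falls back to assessing every utility, which corresponds to the conventional full assessment.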

A COMPARATIVE STATISTICAL ERROR ANALYSIS OF NEURONAVIGATION SYSTEMS IN A CLINICAL SETTING
Hamid Reza Abbasi, Sanaz Hariri (CandMed), David Martin, Daniel Kim, John Adler, Gary Steinberg, Ramin Shahidi

The use of neuronavigation (NN) in neurosurgery has become ubiquitous. A growing number of neurosurgeons are utilizing NN for a wide variety of purposes, including optimizing the surgical approach (macrosurgery) and locating small areas of interest (microsurgery). Experimental and methodological assumptions (e.g. using more markers increases accuracy) have been retained in the use of NN over its development. While rapid advances in hardware and software have emerged in the last few years, there have been only a few attempts at challenging the old NN tenets and applying new technology to update these systems. To identify possible areas in which new technology may improve the surgical applications of NN and to test these old NN tenets, we conducted system-independent accuracy tests of neuronavigational measurements in two currently used systems: Radionics™ and BrainLab™. An immediate goal of this project is to give surgeons information about the accuracy of NN machines; surgeons should be able to estimate the accuracy of images generated by the system to optimize their surgical accuracy.

We obtained a phantom skull to most realistically simulate the surgical setting, removed the calvaria, and installed 3 Plexiglas square rods of different heights in each of the 3 anatomical fossae (anterior, middle and posterior). We used the edges of these rods as our targets. We installed a Plexiglas ball of known diameter on the phantom's sella turcica. Replacing the calvaria, we placed a total of 12 markers bilaterally on the exterior of the skull in the following regions: 6 frontal, 2 mastoid, 2 occipital and 2 high parietal. We performed a CT of the skull in 1.25 mm slices and sent the data over the network to the two NN machines evaluated in this study. The systems utilized their respective registration and tracking systems to localize the probe's tip in 3D.

When the probe tip was placed at the edge of a rod, the NN systems visualized the probe's position on their screens in the original axial plane of the CT scans and in the sagittal and coronal planes reconstructed from the CT scans. In each of the three cross-sections, we measured how far from the actual edge of the rod (x=0, y=0) the monitor was representing the probe tip. Paging through the images on the monitor to find the largest diameter of the Plexiglas sphere, we used the known diameter of the sphere to establish a system-independent scale for measurements. These measurements were acquired for both the Radionics and BrainLab NN systems with three different marker counts. Thus, we obtained 12 series of measurements, each series consisting of 218 separate measurements, totaling 2616 discrete measurements of accuracy.

We found that, despite current NN tenets, 4 or 8, but not 6, markers yield most efficient accuracy. We are aware of the counterintuitive nature of this finding, and our lab is currently investigating this result. Additionally, the movement of skin on the skull is not included in this study and will theoretically aggravate the overall error in each setting. We also found that: placing fewer markers around the region of interest (ROI) decreases registration error at the ROI; active tracking does not necessarily increase accuracy; the spread marker setting increases accuracy; and accuracy of the NN machines differs both overall and in different axes. As researchers continue to apply recent developments in hardware and software technology to the NN field, an increasing number of currently held tenets will be challenged and revised, rapidly and dramatically changing the field.
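The system-independent scale described in this abstract can be expressed as a short Python sketch: the sphere's known diameter, divided by its diameter as measured on the monitor, converts on-screen offsets into millimeters. The sphere diameter, the screen measurements and the function names are hypothetical; only the 12 x 218 measurement count comes from the abstract.

```python
import math

def mm_per_screen_unit(known_diameter_mm, measured_diameter_units):
    """Scale factor derived from the reference sphere of known diameter."""
    return known_diameter_mm / measured_diameter_units

def localization_error_mm(x_units, y_units, scale):
    """Distance of the displayed probe tip from the rod edge at (x=0, y=0)."""
    return math.hypot(x_units, y_units) * scale

# Hypothetical values: a 40 mm sphere spanning 200 screen units.
scale = mm_per_screen_unit(40.0, 200.0)
error = localization_error_mm(3.0, 4.0, scale)

# Measurement count reported in the abstract: 12 series of 218 measurements.
total_measurements = 12 * 218
```

Deriving the scale from an object imaged by each system keeps the error measurements comparable across the two machines regardless of their display settings.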

NEURONAVIGATIONAL EPILEPSY FOCUS MAPPING
Hamid Reza Abbasi, Sanaz Hariri (CandMed), David Martin, Michael Risinger, Gary Heit

Epilepsy occurs with a prevalence of about 0.5 percent and a cumulative lifetime incidence of 3 percent. Approximately one quarter of these patients eventually become refractory to pharmacotherapy despite the introduction of a number of new and relatively improved drugs. For medically intractable patients, resective surgery represents the next therapeutic intervention.

In extratemporal epilepsy, an exact localization of the resection target is crucial to surgery but is often not possible with noninvasive data collection (e.g. electroencephalography) alone. Resections in extratemporal cortex require definition of a seizure focus that often lacks anatomical boundaries. In contrast, the resection site for medial temporal sclerosis is clearly demarcated as the pes hippocampi, amygdala, and lateral temporal cortices. Further complicating the definition of an extratemporal resection site are structural lesions that may have a complex spatial relationship to the actual ictal focus. Additionally, the potential presence of cortex involved in language, primary sensory processing, motor control or cognition can provide further constraints on the extent of tissue to extirpate.

In response to these constraints, we report on the use of a surgical navigation system (SNS) in both cases of absent structural abnormalities as well as in cases of “normal” cortex to determine the precise relation between the subdural electrodes and the underlying anatomy. This correlation is achieved by co-registering the “electrographic map” generated during sub-acute intracranial recordings to the images of the three-dimensional MRI patient gyral anatomy.

On the day of the surgery, 6 to 8 MR-compatible skin registration markers are placed on the patient's head, and an MRI of the patient is obtained and sent via DICOM transfer protocols to the SNS (Radionics Maynard MA, software OTS version 2.2). Intraoperatively, following a craniotomy and registration of the patient to the SNS using the markers, the SNS is used to center the subdural electrodes over an area of known pathology and/or to co-register the electrode to the gyral anatomy. With the SNS probe touching a representative contact on an electrode, a display screen capture is performed. This precisely localizes each electrode in the axial, sagittal, and coronal planes of the MR. The operative field is secured in the standard fashion, and the patient is taken to the telemetry unit after an appropriate post-operative recovery interval.

Post-operatively, continuous intracranial EEG and simultaneous video are recorded. Two functional maps are generated: one based on inter-ictal and ictal events recorded from specific electrode pairs and the other identifying which electrodes, if any, overlie the eloquent cortex. These maps are co-registered via common electrode contacts to the SNS maps, providing assignment of electrographic pathology and function (e.g. speech) to a specific cortical surface anatomy. Based on functional as well as anatomical criteria, this mapping permits more precise a priori surgical resection planning and better assessment of potential risk.

COMPARATIVE TRACKING ERROR ANALYSIS OF FIVE DIFFERENT OPTICAL TRACKING SYSTEMS
Jeremy Johnson, Rasool Khadem, Clement C. Yeh, Mohammad Sadeghi-Tehrani, Michael R. Bax, Jacqueline Nerney Welch, Eric P. Wilkinson, Ramin Shahidi

Purpose
The positional and angular precision of five different optical tracking system (OTS) configurations are measured. The dependence of the two precision measurements on position within the digitizing volume and on the angle between the dynamic reference frame (DRF) and camera is examined. The maximum positional and angular error for all measurements and for 95% of all measurements are also presented.

Materials and Methods
Optical Tracking Systems (OTS): Four cameras from two manufacturers were tested: the FlashPoint™ (Image Guided Technology, Boulder, Colorado) and the Polaris™ (Northern Digital Inc., Ontario, Canada). Three different sizes of FlashPoint™ cameras were tested, and the Polaris™ camera was tested in both active and passive configurations.

Linear Testing Apparatus (LTA): A precision-machined assembly consisting of a movable, vertical plate with uniformly spaced holes on which the DRF was mounted.

Stepper Motor Assembly: The assembly allowed the DRF to be mounted to the LTA and be rotated about the vertical axis.

Jitter is defined as the standard deviation of a sequence of measurements about the mean of the measurements.

Positional jitter measurements were obtained at positions uniformly spaced throughout a three-dimensional volume for each OTS. The spatial x, y and z coordinates were consecutively sampled 100 times at each sensor position.

Angular jitter was measured throughout a subset of the volume, and for angles between 0° and the maximum viewable angle. The angle step size was determined by the minimum rotation of the stepper motor, 1.8°. For each position and angle, 100 angle measurements were taken and the jitter calculated.

Results
Positional Jitter for All Systems
· Dominated by the z component (camera look direction).
· Relatively constant over single z-plane (independent of x, y, and theta).
· Increases with increasing z.
· Relatively constant for varying angles up to some cutoff angle.
· Best jitter obtained with 300 mm FlashPoint™ due to proximity of digitizing volume to OTS camera.

Angular Jitter for All Systems
· Relatively constant over single z-plane (independent of x, y, and theta).
· Relatively constant for a given depth up to some angle (60° for active configurations, 40° for passive).

Differences Between Systems
· For IGT systems, positional jitter increases with z; for NDI systems, it remains relatively constant over a given range of z and theta.
· Both passive and active configurations of the Polaris™ camera have much larger outliers for both positional and angular measurements than do any of the FlashPoint™ systems (Figure 5).
· When considering all data, the maximum error for the NDI cameras is far larger than the error for any of the IGT configurations; when the worst 5% of outliers are ignored, the performance of the NDI configurations significantly improves and nearly reaches that of the IGT systems.
· Both positional and angular jitter of the IGT systems were more predictable and well-behaved than that of either NDI configuration.
· Passive NDI behaves differently than the four active OTS configurations. Positional and angular jitters increase dramatically for orientations larger than 40°. The variation in jitter for the NDI passive configuration is also much larger than for the active configurations.

Conclusion
The precision of position and angle measurements made by five commercially available optical tracking systems has been quantified throughout a volume. The easiest way to reduce both positional and angular jitter of measurements made by an optical tracking system is to minimize the distance between the camera and the tracked instrument while staying in the camera's digitizing volume.

The method presented for jitter measurement and analysis is independent of the tracking technology, and can be used for investigating the precision of future tracking systems.
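The jitter definition used in this abstract (standard deviation of repeated readings about their mean) and the "all measurements" versus "95% of measurements" error figures can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, the sample (n-1) form of the standard deviation is an assumption, and "error" is taken here to mean Euclidean distance from the mean position.

```python
import math
import statistics


def positional_jitter(samples):
    """Per-axis jitter for repeated (x, y, z) readings at one sensor position.

    Jitter is the standard deviation about the mean; the sample (n-1)
    form is an assumption, since the abstract does not specify it.
    """
    xs, ys, zs = zip(*samples)
    return tuple(statistics.stdev(axis) for axis in (xs, ys, zs))


def error_bounds(samples, fraction=0.95):
    """Return (max deviation, deviation bounding `fraction` of the readings).

    Deviation is the Euclidean distance of a reading from the mean position.
    """
    mean = tuple(statistics.fmean(axis) for axis in zip(*samples))
    deviations = sorted(math.dist(s, mean) for s in samples)
    # Index of the smallest deviation that covers `fraction` of the readings.
    k = max(0, math.ceil(fraction * len(deviations)) - 1)
    return deviations[-1], deviations[k]


# Synthetic readings: 99 identical samples plus one large outlier along z.
readings = [(0.0, 0.0, 0.0)] * 99 + [(0.0, 0.0, 100.0)]
jx, jy, jz = positional_jitter(readings)
worst, bound_95 = error_bounds(readings)
```

With the synthetic readings, the single outlier dominates the maximum error but does not affect the 95% bound, which mirrors the abstract's observation that ignoring the worst 5% of measurements changes the picture for systems with large outliers.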

USE OF XML/RDF TO CREATE STRUCTURED METADATA FOR MEDICAL IMAGES
John J. Michon

Purpose
We have built an XML (Extensible Markup Language) schema to describe ophthalmic images. Using the knowledge-modeling tool Protégé, we have created a schema for ophthalmic images using the Resource Description Framework (RDF), an XML application for encoding and exchange of structured metadata. The schema describes the types of resources and property values allowed for a wide variety of images commonly used in clinical ophthalmology and is being used to populate an ophthalmic image database. This schema for clinical markup is expected to become incorporated into the DICOM (Digital Imaging and Communications in Medicine) standard for ophthalmology, an internationally recognized standard for the encoding and transmission of digital images.

Materials and Methods
We created the imaging schema using the knowledge-modeling tool Protégé (http://smi.stanford.edu/projects/protege).

Results
In our model, the class Patient has child classes MedicalClassification, Observations&Exam and Therapy. The class Eye_Image is a subclass of Observations&Exam and is the parent class for all of the imaging modalities in ophthalmology (see Fig.). The primary image classes are Angio_Image (fluorescein and indocyanine green angiography), Laser_Image (optical coherence tomography), MRI_Image (MRI), Radiog_Image (CT and X-Ray), Ultrasound_Image (A scan, B scan, and biomicroscopic) and VisibleLight_Image (external, biomicroscopic, pathologic).

An RDF schema derived from Protégé describes the semantics and allowed syntax of data elements in the document that refer to it. The schema identifies the values, or value ranges, that are permitted for each property, and the types of resources that it can describe.

An RDF statement becomes part of each image document, and specifies the semantics of each property in the document. Thus, clinical meaning can be extracted from the document metadata.

Conclusion
We have built a draft schema for ophthalmic images for inclusion as a more general DICOM standard for ophthalmology. The schema defines resources and properties of the most common image types. It creates a structured data model that allows for automatic severity assessment and complex querying of image metadata.

We have begun to instantiate an image database using this schema. Further testing and validation will be performed on image sets contributed by Stanford and other institutions. Iterative development of the full schema with DICOM Working Group 9 will result in an international standard for describing ophthalmic images. We will further validate this model using another imaging domain to show its general applicability. A wide variety of medical domains can benefit from user-defined clinical criteria based on image metadata to guide diagnosis and therapy.

A long-term goal is the creation of a platform for integration of ophthalmic clinical data sets, images and genomic data for data mining, meta-analysis, and customized therapy (Fig. 5). For example, it is known that those with a first-degree relative with open angle glaucoma are at a tenfold greater risk of the disease than the general population. There are also likely to be important genetic factors in the response to drug therapy, progression of diabetic retinopathy, and progression of age related macular degeneration from the less severe atrophic form to the blinding neovascular form. Thus the integration of genetic data with rich clinical and imaging data sets will be a powerful tool for the prevention of blindness in the future.
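The class hierarchy named in the Results, and the idea that a schema restricts which values a property may take, can be sketched in a few lines. This is an illustration only, not the Protégé or RDF schema itself: the property name `technique` and the helper functions are hypothetical, and the permitted values are simply the modalities listed in parentheses in the abstract.

```python
# Parent of each class, as listed in the Results paragraph of the abstract.
PARENT = {
    "MedicalClassification": "Patient",
    "Observations&Exam": "Patient",
    "Therapy": "Patient",
    "Eye_Image": "Observations&Exam",
    "Angio_Image": "Eye_Image",
    "Laser_Image": "Eye_Image",
    "MRI_Image": "Eye_Image",
    "Radiog_Image": "Eye_Image",
    "Ultrasound_Image": "Eye_Image",
    "VisibleLight_Image": "Eye_Image",
}

# Hypothetical property constraints: class -> property -> permitted values.
ALLOWED_VALUES = {
    "Angio_Image": {"technique": {"fluorescein", "indocyanine green"}},
    "Ultrasound_Image": {"technique": {"A scan", "B scan", "biomicroscopic"}},
}


def ancestors(cls):
    """Return the chain of parent classes from `cls` up to the root."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_permitted(cls, prop, value):
    """Check `value` against the nearest constraint on `prop`.

    The class itself is consulted first, then its ancestors; a property
    with no constraint anywhere in the chain is treated as unrestricted.
    """
    for c in [cls] + ancestors(cls):
        allowed = ALLOWED_VALUES.get(c, {}).get(prop)
        if allowed is not None:
            return value in allowed
    return True
```

For example, `ancestors("Angio_Image")` walks up through Eye_Image and Observations&Exam to Patient, and `is_permitted("Angio_Image", "technique", "B scan")` is rejected because that value belongs to a different image class.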

Web Page http://smi-web.stanford.edu/people/michon/APAMI-paper.htm

A REAL-TIME FREEHAND 3D ULTRASOUND SYSTEM FOR IMAGE GUIDED SURGERY
Jacqueline Nerney Welch, Jeremy A. Johnson, Michael R. Bax, Ramin Shahidi

Current freehand 3D ultrasound techniques separate the scanning or acquisition step from the visualization step. The process leads to a single image volume dataset that can be rendered for viewing later. While satisfactory for diagnostic purposes, the method is not useful for surgical guidance where the anatomy must be visualized in real time. The Image Guidance Laboratories are currently developing a freehand 3D ultrasound system that will allow real-time updates to the scanned volume data as well as the capability to simultaneously view cross-sections through the volume and a volume rendered perspective view. The equipment used is not unlike other freehand 3D ultrasound systems: an optical tracking system for locating the position and orientation of the ultrasound probe, a video frame-grabber for capturing ultrasound frames, and a high-performance computer for performing real-time volume updates and volume rendering. The system incorporates novel methods for inserting new frames into, and removing expired frames from, the volume dataset in real time. The position and orientation of a surgical instrument can be tracked and used for viewing the instrument's position or trajectory with respect to the imaged region, or can be used to determine the viewpoint of the perspective image. This paper reports on current work in progress, and focuses on methods unique to achieving real-time 3D visualization using freehand 3D ultrasound.

BRIDGING THE GAP: SIMULATED DYNAMICS OF LIPID BILAYERS AT BOUNDARIES
Peter M. Kasson, Vijay S. Pande

Simulation of lipid bilayers in full-atom detail is challenging because of the large number of atoms involved, on the order of 25,000 for a 60x60 angstrom patch of membrane. Furthermore, many processes of interest, such as membrane fusion, involve rearrangements of large areas of the bilayer on timescales spanning several seconds. If we assume Moore's law to hold, such calculations will become feasible for supercomputers in approximately 40 years. In this report, we investigate a much smaller-scale rearrangement of lipid bilayers. Recently developed experimental techniques allow the removal of a narrow strip from a supported bilayer, leaving a gap with bilayer on either side (1). Surfaces micropatterned in this manner can be used for the construction of biosensors or chemically defined and manipulable cell-surface interfaces. The bilayer expands only slightly but does not fill this gap, creating a situation where there are two water-bounded edges to the bilayer. No experimental approaches attempted to date can determine the structure of these edges. Molecular dynamics simulation is therefore particularly well-suited for developing a model for the behavior of such bilayer boundaries.

Using molecular dynamics, we approximated the blotting process by which such gapped bilayers are created as follows. A simulated bilayer of 128 or 256 dimyristoylphosphatidylcholine (DMPC) molecules was equilibrated in a three-dimensionally periodic box filled with water molecules for 250 ps. A 14 angstrom gap was created in the bilayer by deleting all DMPC molecules that fell within a designated region. Subsequent to gap creation, six molecular dynamics simulations were run for a minimum of 2 ns each.

Within two nanoseconds of gap creation, the simulated bilayer edges rearranged to form micelle-like structures. Although such a rearrangement is perhaps to be anticipated, it is surprising in two respects. First, the micellization occurs very quickly on the scale of normal bilayer motions: the experimentally measured mean lateral displacement of a DMPC molecule in a bilayer is less than 5 angstroms over this time period (2). Second, pure phospholipids do not normally form micelles, as this presents a packing problem for the hydrocarbon tail groups. In our model, the bilayer edges avoid this problem by bulging slightly at the end. Such a structure would solve the hydrophobicity challenge created by a new water-bilayer interface but would nevertheless be somewhat energetically strained. The DMPC molecules at the bilayer boundary are structurally and dynamically similar to molecules in an ungapped fluid-phase bilayer, suggesting that no phase transition or drastic conformational transition has occurred. This model is consistent with the experimental observation that vesicles containing labeled lipids preferentially fuse to the boundary region, suggesting that the bilayer edges are energetically disfavored even though they are kinetically stable over times greater than one week (1, 3).

In summary, our molecular dynamics data suggest a model for the structure of bilayer boundaries in which the edges of the bilayer have micellized. This micelle-like structure also resembles hemifusion intermediates postulated to occur during membrane fusion. Although a full-atom model of membrane fusion is computationally infeasible at this time, it is hoped that further experiments on the hemifusion-like structures we have generated may shed light on the larger process.
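The "approximately 40 years" estimate in this abstract can be sanity-checked with a back-of-the-envelope calculation. The sketch below rests on assumptions that the abstract does not state: that available computing power doubles every 18 months, that the currently reachable simulated time is about 2 ns (the length of the runs reported here), and that "several seconds" can be represented by 1 s for a system of the same size.

```python
import math


def years_until_feasible(current_s, target_s, doubling_years=1.5):
    """Years of Moore's-law growth needed to extend reachable simulated time.

    current_s:      simulated time reachable today, in seconds (assumed)
    target_s:       simulated time required, in seconds (assumed)
    doubling_years: assumed doubling period of computing power
    """
    doublings = math.log2(target_s / current_s)
    return doublings * doubling_years


# From about 2 ns today to about 1 s: a speedup of 5e8, or roughly 29 doublings.
estimate = years_until_feasible(2e-9, 1.0)
```

Under these assumptions the estimate lands in the low forties of years, the same order as the figure quoted in the text; a larger membrane patch or a longer target time would push it higher.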

References

1. Hovis, J.S. and Boxer, S.G. Patterning Barriers to Lateral Diffusion in Supported Lipid Bilayer Membranes by Blotting and Stamping. Langmuir 2000, 16, 894-897.
2. Scandella, C.J., Devaux, P., and McConnell, H.M. Rapid lateral diffusion of phospholipids in rabbit sarcoplasmic reticulum. Proc Natl Acad Sci USA 1972 Aug;69(8):2056-60.
3. Hovis, J.S. and Boxer, S.G. Personal communication.

KB-DRIVEN MODEL BUILDING: CHALLENGES AND APPROACHES
Mike Cantor, Peter Karp (Bioinformatics Research Group, SRI International), Masaru Tomita (Laboratory for Bioinformatics, Keio University, Japan)

Our research explores the following question: How can we best leverage the power of knowledge bases to help experts build simulation models?

Knowledge bases (KBs) are playing an increasingly important role in biological research. Loosely defined, a knowledge base is a collection of information that is organized into a structured representation (sometimes called an "ontology"), designed to enable the automation of complex query and reasoning tasks. KBs like EcoCyc, PubMed, and GenBank are used by researchers to solve problems as varied as predicting a metabolic pathway from a genome, finding recent references in the literature on a given disease, or searching for a homologous sequence or structure to a recently discovered gene.

Another method of increasing importance in molecular biology is the computer simulation of cellular processes. Simulation packages such as E-cell, MIST, Scamp, DBSolve, and Gepasi are currently used by researchers in the modeling and design (in the case of bioengineering) of metabolic, signaling, transport, and genetic regulatory pathways in a variety of organisms. Simulation tools provide sophisticated engines and interface tools for observing the behavior of mathematical models of biological processes. However, actual construction of these models remains time-consuming and labor-intensive, as the fidelity and/or scope of the model requires the synthesis of large amounts of data and information from multiple sources.

We explore the question of how this difficulty might be alleviated by taking advantage of the structured information in knowledge bases to automate or semi-automate the generation of models for simulation engines. As our test system, we are attempting to use EcoCyc, a metabolic and regulatory KB of E. coli cellular function, to generate models for E-cell, a powerful cellular simulation package. Our poster outlines some of the challenges and issues involved in attempting this connection, and the current state of our progress.

MOTIFFEATURE: AUTOMATED CONSTRUCTION OF 3D MODELS FROM SEQUENCE MOTIFS
Mike Hsin-Ping Liang, Russ B. Altman

Purpose
Identifying important physical-chemical properties around functional sites can provide information about the biochemical environment required for protein function. Machine learning methods that build 3-dimensional models from these features are powerful tools for protein function prediction. These methods promise to be useful even for detecting functional similarity between proteins of little or no sequence similarity. However, choosing a good training set for the learning task is often manually done. Manual selection of the training set is not only laborious, but is also error prone since it can lead to biases in the model due to disproportionate representation of sites. With the ever-increasing number of resolved structures, there is a growing need to quickly build accurate models of functional sites from relatively well-characterized proteins and to scan uncharacterized proteins for them. We introduce MOTIFFEATURE, an algorithm for automating the task of selecting training examples from a database of protein structures. MOTIFFEATURE also implements a weighting scheme over the selected examples to account for under- and over-representation of sites.

Materials and Methods
Given a set of sequence motifs, MOTIFFEATURE scans the Protein Data Bank for an appropriate set of sites and non-sites to include in the training set. To correct for unbalanced representation in the training set, we use an Expectation Maximization algorithm to weight the examples based on their sequence. The weighted training set can be used to build a 3D model of the functional site. The model can then be used to score previously unseen proteins for potential functional sites.

Conclusion
Automated construction of models from sequence motifs is potentially useful in detecting functional sites on proteins without detectable sequence similarity. MOTIFFEATURE eliminates the laborious manual selection of the training set as well as corrects for disproportionate representation of sites, thus providing a faster and more accurate 3D model of functional sites. Evaluation of the algorithms used to select the sites, non-sites, and weights is currently in progress.

GUIDELINE INTERCHANGE FORMAT: A REPRESENTATION FOR SHARABLE, COMPUTER-INTERPRETABLE GUIDELINES
Mor Peleg

The GuideLine Interchange Format (GLIF) is a format for sharing computer-interpretable clinical guidelines independent of platforms and systems. GLIF is based on an object-oriented logical model of concepts that can be used to model a guideline, and has an RDF-based syntax. The ability to share guidelines is central to the GLIF methodology. Sharing is supported by: (1) a multi-level representation that facilitates sharing guidelines across different institutions and software applications; (2) a consensus-based multi-institutional process for developing GLIF that involves research groups from Stanford, Harvard, and Columbia Universities; (3) an open process resulting in a product that is not proprietary; and (4) a data model that is designed to support multiple vocabularies and medical knowledge bases.

GLIF version 2 (GLIF2), published in 1998, enabled modeling of a guideline as a flowchart of structured steps, representing clinical actions and decisions. However, the attributes of structured constructs were defined as text strings that could not be parsed, and therefore such guidelines could not be used for computer-based execution that required matching of guideline criteria to patient-specific data.

GLIF3 is a developing version of GLIF, designed to support computer-based execution. GLIF3 builds upon the GLIF2 framework but augments it by introducing several new constructs and requiring a more formal definition of decision criteria, action specifications and patient data.

There are three different levels at which a GLIF3-encoded guideline may be represented. The first is the author/viewer level that models the guideline as a conceptual flowchart of temporally ordered clinical steps, which facilitates human understanding. Different guideline steps are possible. They represent clinical actions, decisions and patient states, as well as branch and synchronization steps that enable concurrency. The model is hierarchical and allows action and decision steps to contain sub-guidelines. This enables the viewer to browse the flowchart at different levels of granularity.

The second representation level is a formal representation of decision criteria and actions that can be analyzed for correctness and executed by an interpreter. In order to support a formal model, GLIF3 uses a formal expression language and a medical domain object model. The formal expression language is a superset of the Health-Level 7 (HL-7) Arden Syntax's logic grammar that is used by GLIF3 for specifying decision criteria and patient states. GLIF3's medical domain object model is being designed to enable GLIF3 steps to refer to patient data items that are defined by a controlled terminology. The controlled terminology includes standard medical vocabularies that include concept definitions and codes (e.g., the Unified Medical Language System (UMLS) of the National Library of Medicine) as well as standard data models for medical concepts and their attributes (e.g., HL-7's Unified Service Action Model, which is GLIF3's default medical domain object model).

The third representation level, which is not supported yet, will represent application-specific mappings and modifications that facilitate integration into application environments. Other features of GLIF3 include a flexible decision model, event-based control flow, and iterations in actions and decisions.

We are using Protégé, a knowledge-engineering environment developed at Stanford Medical Informatics, as a GLIF3 authoring tool. We have added to it features that enable import and export of GLIF3-encoded guidelines that are devoid of visualization-specific details. Protégé automatically lays out guideline flowcharts that are imported. In order to validate the expressiveness of GLIF3, we are using Protégé to encode several clinical guidelines. These include: (1) Managing Cough as a Defense Mechanism and as a Symptom, a Consensus Panel Report of the American College of Chest Physicians, (2) Prevention and Control of Influenza, of the Advisory Committee on Immunization Practices, and (3) Pharmacologic Treatment of Acute Major Depression and Dysthymia of the American College of Physicians-American Society of Internal Medicine.
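The author/viewer level described in this abstract (a flowchart of typed steps in which action and decision steps may contain sub-guidelines) can be sketched as a small data structure. This is a hypothetical illustration only: the class names, field names and the `outline` helper are not GLIF3's object model or its RDF syntax, and the example guideline content is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Step kinds named in the abstract; the identifiers themselves are hypothetical.
STEP_KINDS = {"action", "decision", "patient_state", "branch", "synchronization"}


@dataclass
class Step:
    name: str
    kind: str
    sub_guideline: Optional["Guideline"] = None  # only action/decision steps

    def __post_init__(self):
        if self.kind not in STEP_KINDS:
            raise ValueError(f"unknown step kind: {self.kind}")
        if self.sub_guideline is not None and self.kind not in ("action", "decision"):
            raise ValueError("only action and decision steps may nest a sub-guideline")


@dataclass
class Guideline:
    name: str
    steps: List[Step] = field(default_factory=list)


def outline(guideline, max_depth, depth=0):
    """List steps, expanding nested sub-guidelines down to `max_depth` levels."""
    lines = []
    for step in guideline.steps:
        lines.append("  " * depth + f"{step.kind}: {step.name}")
        if step.sub_guideline is not None and depth < max_depth:
            lines.extend(outline(step.sub_guideline, max_depth, depth + 1))
    return lines


# Invented example content, not taken from any published guideline.
workup = Guideline("Symptom work-up", [
    Step("Take history", "action"),
    Step("Further testing needed?", "decision"),
])
top = Guideline("Symptom management", [
    Step("Patient presents with symptom", "patient_state"),
    Step("Evaluate symptom", "action", sub_guideline=workup),
])
```

Calling `outline(top, 0)` shows only the two top-level steps, while `outline(top, 1)` also expands the nested work-up, which corresponds to browsing the flowchart at different levels of granularity.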

Web Page http://smi-web.stanford.edu/projects/intermed-web/

A FINITE ELEMENT MODEL OF THE HUMAN CORNEA
Assad Anshuman Oberai, Peter M. Pinsky, Thomas A. Silvestrini

Purpose
The Intrastromal Corneal Ring (ICR), Ring Segments, and Ring Arcs are devices developed for correcting refractive defects including myopia, hyperopia and astigmatism. Simulation of the mechanics of device-cornea interaction can help in identifying parameters affecting alterations in corneal topography due to these devices, thereby providing a useful tool in designing these devices.

Materials and Methods
A finite element model of the human cornea based on a mathematical description of the ultrastructural features and material characteristics of the corneal tissue has been developed. This model incorporates sliding contact conditions to model the stroma-device interface. Shifts in corneal power are calculated from the finite element deformations using a least-squares methodology.

Results
The model has been used to simulate the instantaneous non-linear, elastic response of the cornea to implantation of Intrastromal Corneal Rings and Segments with varying design parameters. The shifts obtained from these parameters have been analyzed and explained.

Conclusion
The finite element model has provided results that correlate well with clinical measurements. Further, these results provide a valuable insight into the mechanics of tissue deformation and an explanation for the observed shifts in power. This application is indicative of the role of this technique in assisting the development of innovative ideas in related fields.

MECHANICAL REGULATION OF GROWTH PLATE MORPHOLOGY
Sandra Shefelbine, Dennis R. Carter

Introduction
Long bones grow by endochondral ossification, the process in which cartilage is replaced by bone. The ossification process begins in the center of the bone and progresses toward the ends of the bone. Distinct changes occur in growth front morphology during long bone growth and development. In most mammalian long bones the growth front becomes convex when it approaches the end of the bone shaft. After the secondary ossific center appears at the end of the bone, the growth front forms a concave dip in the center.

Many have suggested that these changes in morphology are caused by mechanical stresses at the growth front in the developing cartilage. Carter and colleagues proposed that the endochondral growth and ossification process is accelerated by intermittent shear stresses and inhibited by intermittent hydrostatic pressure and that these factors influence growth plate morphogenesis. Using the theoretical framework of Carter et al., the objectives of this study are to determine the effects of (1) material compliance of the newly formed bone behind the growth front and (2) presence of the secondary ossific center on stresses in the developing cartilage and resulting growth front progression.

Methods
An axi-symmetric finite element model was created to represent the growth front as it approaches the end of a generic long bone. The model consisted of isoparametric hybrid elements with the material properties of cartilage (E=6 MPa, Poisson's ratio=0.49). All material properties were assumed linear elastic, homogeneous, and isotropic. Compliance of the newly formed bone under the growth front was modeled by varying boundary conditions at the growth front. A compressive load of 0.5 MPa was placed on the joint surface to represent joint loading and muscle contractions.

The specific growth rate (dε/dt) represents the rate of longitudinal growth relative to an initial length. The specific growth rate was determined by contributions from the biological growth rate, the stimulatory effects of octahedral shear stress, and the inhibitory effects of hydrostatic compression. Octahedral shear stress is always positive, thereby increasing the specific growth rate. In this model hydrostatic stress is always negative (compressive) and decreases the specific growth rate. The model was grown using orthonormal expansion of the elements in the growth region by an amount determined by the specific growth rate. A parametric study of boundary conditions was conducted to determine the effects of the compliance of the newly mineralized bone under the growth front on stresses in the cartilage and growth front morphology. In addition, a secondary ossific center composed of cancellous bone (E=500 MPa, Poisson's ratio=0.2) was introduced to determine its influence on growth front progression.

Results
With a compliant interface between the cartilage and newly formed bone at the growth front, octahedral shear promoted growth more than hydrostatic stress inhibited it at the center of the growth front. This resulted in the development of a convex growth front as the specific growth rate was higher in the center than at the periphery of the bone. This convexity was reduced and even reversed when the interface was made more rigid.

The appearance of the secondary ossific center introduced higher hydrostatic pressure and lower octahedral shear in the center of the model, causing the growth front to become concave as bone growth was inhibited more at the center than at the periphery.

Discussion
The results demonstrate that growth front and growth plate morphology can be influenced by material properties of the newly formed bone under the growth front as well as the presence of the secondary ossific center. The growth predictions in this study are consistent with clinical observations: the convex growth front occurs when the bone is growing relatively fast and the newly formed bone is relatively compliant; the growth front becomes concave as growth slows and the secondary ossific center forms. These findings indicate the important role of mechanics in skeletal morphogenesis.
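The way the two stress measures enter the specific growth rate can be made concrete with a short sketch. The octahedral shear and hydrostatic stress formulas are the standard definitions in terms of principal stresses; the linear combination and the coefficients `a` and `b` are assumptions for illustration, since the abstract names the three contributions but does not give the functional form or the weights used in the study.

```python
import math


def hydrostatic_stress(s1, s2, s3):
    """Mean of the principal stresses; negative values are compressive."""
    return (s1 + s2 + s3) / 3.0


def octahedral_shear_stress(s1, s2, s3):
    """Octahedral shear stress from principal stresses; never negative."""
    return math.sqrt((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2) / 3.0


def specific_growth_rate(biological, s1, s2, s3, a=1.0, b=1.0):
    """Assumed linear form: biological rate plus mechanical contributions.

    a and b are hypothetical weights. With b > 0, compressive (negative)
    hydrostatic stress lowers the rate, while the shear term raises it.
    """
    shear_term = a * octahedral_shear_stress(s1, s2, s3)
    pressure_term = b * hydrostatic_stress(s1, s2, s3)
    return biological + shear_term + pressure_term


# Example: uniaxial compression of 0.5 MPa (principal stresses in MPa).
rate = specific_growth_rate(1.0, -0.5, 0.0, 0.0)
```

In this example the shear term slightly outweighs the compressive hydrostatic term, so the rate rises above the biological baseline; with the same weights, a purely hydrostatic compression (three equal negative principal stresses) would have zero octahedral shear and would only reduce the rate.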

Acknowledgments

This work was supported by the NSF Fellowship, the Stanford Graduate Fellowship, and the Veterans Affairs RR&D Center (Palo Alto, CA). We thank Gary Beaupré for his suggestions.

THE IMPORTANCE OF SWING PHASE INITIAL CONDITIONS IN STIFF-KNEE GAIT: A CASE STUDY
Saryn Goldberg, Steven Piazza, Scott Delp

Introduction
Persons with cerebral palsy often walk with stiff-knee gait, a condition characterized by insufficient knee flexion during the swing phase of the gait cycle. When accompanied by over-activity of the rectus femoris muscle, stiff-knee gait is commonly treated by a rectus femoris transfer. However, this surgery is sometimes unsuccessful, in part because the factors that lead to stiff-knee gait have not been adequately characterized. We believe that examining the dynamics of stiff-knee gait in individual subjects will allow us to identify these factors and aid in the determination of the most appropriate treatments.

Previously, a dynamic simulation of swing phase showed that several factors can limit knee flexion, including excessive knee extension moment, diminished hip flexion moment, and insufficient knee flexion velocity at toe-off [1]. In the present study, a dynamic simulation of a subject with stiff-knee gait revealed the influential role of swing phase conditions at toe-off (swing phase initial conditions) in the unilateral improvement of the subject after receiving bilateral rectus femoris transfers.

Methodology
We studied an eighteen-year-old female with spastic diplegic cerebral palsy who exhibited a bilateral stiff-knee pattern and swing phase activity of the rectus femoris. Following bilateral rectus transfer (and no other surgery), gait analysis showed that swing phase knee motion improved on the left side (range of motion increased by 15°), but did not improve on the right side (Fig. 1).

A computer model of swing phase was created in which the swing leg was represented by three segments (thigh, shank, and foot) suspended from a translating hip (Fig. 2). All motion was confined to the sagittal plane. The model was used to perform inverse and forward dynamic analyses of both limbs before and after surgery. The same dynamic analyses were performed using kinematics from 13 normal subjects.

Pre- and post-operative hip, knee, and ankle kinematics recorded during gait analysis were input into the inverse dynamic model to compute pre- and post-operative muscular joint moments about each of the three joints. Combinations of these moments and swing phase initial conditions served as input into the forward dynamic simulation. Pre- and post-operative moments were paired with their respective initial conditions as input to calculate the contribution of each joint moment to the total knee angular acceleration. Preoperative moments were then combined with normal initial conditions of interest (knee angle, knee velocity, and hip velocity) as input into the forward dynamic simulation to evaluate the resulting knee kinematics.

Results
The average knee angular accelerations due to moments about the hip and knee were multiple standard deviations above normal (Fig. 3). This observation is consistent with an over-active rectus femoris muscle. However, the total knee extension acceleration was smaller than normal, suggesting that the large knee extension acceleration induced by the knee extension moment was approximately balanced by the large knee flexion acceleration induced by the hip flexion moment. Thus, these abnormal hip and knee moments were not the cause of diminished knee flexion in this subject.

Of the swing phase initial conditions, knee angle, knee velocity, and hip velocity were found to have the strongest influence on knee flexion. In both limbs, these initial values were significantly lower than normal both pre- and post-operatively (Table 1). The right limb values were further from normal than those for the left limb. When normal swing phase initial conditions of interest and the subject’s abnormal preoperative moments were used as input into the forward dynamic simulation, the resulting knee flexion kinematics were near normal (Fig. 4).

Discussion
This study demonstrates the importance of swing phase initial conditions in the determination of the stiff-knee gait pattern. The subject’s low preoperative initial knee angle, knee velocity, and hip velocity values appear to be responsible for the diminished preoperative knee flexion and knee range of motion. When these values were raised to normal levels, the resulting knee flexion was near normal. We believe that the limited post-operative increase of these initial condition values for the right limb resulted in negligible post-operative improvement in right knee flexion, while marginal increase in these values in the left limb resulted in some improvement in post-operative left knee flexion.

The strong influence of initial swing phase conditions on swing phase knee flexion points to the importance of stance phase in the generation of the stiff-knee gait pattern, and demonstrates the need to study terminal stance phase to understand the causes of stiff-knee gait.
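The dependence of swing phase knee flexion on toe-off conditions can be illustrated with a much reduced forward dynamic sketch. The code below is not the three-segment model of this study: it assumes a single point-mass shank swinging about a fixed knee under gravity alone, with no joint moments and no hip translation. The function name, segment length, time step, and duration are all hypothetical choices made for illustration.

```python
import math


def peak_knee_flexion(theta0_deg, omega0_deg_s, length_m=0.25,
                      g=9.81, dt=0.001, duration_s=0.4):
    """Integrate a point-mass pendulum (a stand-in for the shank) forward in
    time and return the peak angle reached, in degrees.

    theta0_deg   -- initial angle from vertical (illustrative "knee angle")
    omega0_deg_s -- initial angular velocity (illustrative "knee velocity")
    All default parameter values are assumptions, not values from the study.
    """
    theta = math.radians(theta0_deg)
    omega = math.radians(omega0_deg_s)
    peak = theta
    for _ in range(int(duration_s / dt)):
        # Gravity is the only torque in this reduced model.
        alpha = -(g / length_m) * math.sin(theta)
        # Semi-implicit Euler step: update velocity first, then angle.
        omega += alpha * dt
        theta += omega * dt
        peak = max(peak, theta)
    return math.degrees(peak)


if __name__ == "__main__":
    for omega0 in (150.0, 300.0):
        print(omega0, round(peak_knee_flexion(40.0, omega0), 1))
```

Even in this toy setting, a larger initial angular velocity gives a larger peak angle, which is the qualitative behavior the abstract attributes to the swing phase initial conditions.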

References

1. Piazza et al. J. Biomech. 29(6): 723-733, 1996.

Acknowledgments

Funded by NIH and a Whitaker Foundation Graduate Fellowship.

A NEW TWIST ON THE HELIX-COIL TRANSITION: A NON-BIOLOGICAL HELIX WITH PROTEIN-LIKE INTERMEDIATES
Sidney P. Elmer, Vijay S. Pande

Polyphenylacetylene, hereafter referred to as pPA, is a nonbiological polymer which has been shown experimentally to fold into a helix, with typical folding times of tens of nanoseconds and nonexponential kinetics [1]. Nonexponential kinetics is indicative of a complex free energy landscape with intermediates, traps, and multiple pathways. Therefore, pPA demonstrates many of the same kinetic properties that proteins and other complex biological systems exhibit. Since proteins fold on micro- to millisecond time scales, a full molecular dynamics (MD) trajectory is very difficult to obtain under physiological conditions. However, the time scale for pPA to fold is easily attainable on modern processors, allowing us to collect 2228 all-atom MD trajectories of a 12-mer of pPA. We characterize the thermodynamic and kinetic properties of this synthetic polymer, which has relatively simple interactions, and then use these results to gain insights into the molecular details of the folding mechanism for more complex biological structures.

Our simulations show that this model of a 12-mer also folds with nonexponential folding rates, in agreement with the experiments mentioned previously. The mean folding time was calculated to be 8.7 ± 3.0 ns, which is on the same order of magnitude as the experiments. Analysis of individual trajectories uncovers an intermediate state containing configurations with little conservation of local structure, suggesting there are multiple pathways to the folded state. A search for a parameter that accurately describes the progress of the folding of the polymer reveals that the length of consecutive cis dihedral angles, D, is a good reaction coordinate. In addition, a non-linear least-squares analysis of the fluctuations of the polymer reduces the complex motions of the polymer to a subspace of essential motions described by principal components of the system. The motions of the polymer projected onto the two primary components reveal a rugged free energy landscape in the folded basin, thus providing a microscopic view of the complex mechanism for folding and offering a physical interpretation of the nonexponential kinetics.

For many decades, the Helix-Coil Transition Theory has stood as a model for the formation of helical structures in biology. Its main tenets describe the folding of helices via a rate-limiting step of nucleation of a few local residues into helical configurations. Once this event occurs, the helix formation will rapidly propagate in both directions to the ends of the polymer. The result of this theory is an exponential distribution of folding times for the helix, denoting a simple mechanism and a smooth free energy surface. We have shown that the Helix-Coil Theory does not hold for even simple helices, such as polyphenylacetylene. For more complex systems, such as proteins and nucleic acids, the Helix-Coil Theory clearly cannot be a reasonable model for helix formation and growth. Therefore, new theories are needed that can take into account the complex nature of the folding dynamics of helices and other secondary structures in complex biological systems.
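The reaction coordinate D described above, the length of consecutive cis dihedral angles, could be computed from a single frame along the following lines. This is a minimal sketch: the abstract does not state how a dihedral is classified as cis, so the 90° cutoff and the function name are assumptions.

```python
def longest_cis_run(dihedrals_deg, cis_cutoff_deg=90.0):
    """Return the length of the longest run of consecutive cis dihedrals.

    A dihedral is treated as cis when its magnitude, after wrapping into
    [-180, 180), is below cis_cutoff_deg. The cutoff is an assumed value.
    """
    best = run = 0
    for angle in dihedrals_deg:
        wrapped = (angle + 180.0) % 360.0 - 180.0
        if abs(wrapped) < cis_cutoff_deg:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a trans dihedral breaks the run
    return best


if __name__ == "__main__":
    frame = [10.0, -20.0, 170.0, 5.0, 15.0, -30.0, 180.0]
    print(longest_cis_run(frame))
```

Evaluating this per frame along a trajectory would give D as a function of time; a fully folded helix corresponds to D equal to the number of backbone dihedrals.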

References

1. Yang, WY; Prince, RB; Sabelko, J; Moore, JS; Gruebele, M. JACS, 2000, 122, 3248-49.

SEQUENCE ANALYSIS AND STRUCTURE COMPARISON OF THE SH3 DOMAIN FAMILY
Stefan M. Larson, Alan R. Davidson

As biological research rushes through the genomic and proteomic era, the resulting accumulation of protein sequences and structures is creating huge demand for efficient analytical methods. For example, the complexity of the protein-folding problem requires computational analysis of protein sequences and structures to provide insights into their relationships and to aid in experimental interpretation and design. In this study, we developed and applied a rigorous set of computational and statistical analyses to a single protein family, the SH3 domain. The SH3 domain was chosen as a model system due to its biological importance as a ubiquitous mediator of protein-protein interactions, its relatively small size and simple fold, and its well-behaved experimental nature. These factors have also led to it being well characterized experimentally in our lab and many others, allowing for direct comparison of theoretical and experimental results.

To start, a non-redundant alignment of 266 SH3 domains was carefully assembled. Henikoff weighting was used to reduce sequence bias and Shannon entropy was calculated at each position as a measure of residue conservation. Eighteen SH3 domain structures were aligned and a number of all-vs-all comparisons were performed to quantify structural variation in the domain. No direct correlations between sequence identity or positional entropy and RMSD between structures were observed. However, conserved residues were found to consistently play important structural and/or functional roles in the SH3 domain. It was found that residues playing consistent structural roles in ligand-binding were much more highly conserved than those which contact the ligand differently in different structures. This points to the less conserved residues as being responsible for the exquisite binding specificity of SH3 domains.

To further understand residue interactions in the SH3 domain, an algorithm for covariation analysis was developed. Because of its potential to aid in structure prediction, a focus of this work was on elimination of artifactual covariations and accurate prediction of residue contacts. The vast majority of covariations in the SH3 domain involved residues in the hydrophobic core and in the ligand-binding pocket. Several networks of three covarying residues were also identified. Two of these triplets have been dramatically confirmed experimentally, by combining three destabilizing mutations into a triplet mutant of near wild-type (Fyn SH3) stability. Contact prediction was successful: 84% of the highly covarying residue pairs are within 8 angstroms in at least one of the eighteen SH3 structures studied. Fifteen additional domain alignments were analyzed using the covariation algorithm. Six of these produced significantly accurate contact predictions.

Sequence alignment analysis and structure comparison of the SH3 domain produced useful data not previously available through structural or experimental studies. Much of this data has already been incorporated into other studies to interpret results and design new experiments. New work in the Pande group at Stanford aims at large-scale sequence design of SH3 domains (among others). By reconciling the results of computational sequence design with detailed analysis of naturally occurring sequences and structures, we hope to more rigorously define what features of a protein sequence are necessary to define its fold.
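The per-position conservation measure mentioned above can be sketched as follows. This is an unweighted illustration only: the study applied Henikoff sequence weights before computing entropy, which this sketch omits, and the toy alignment, gap handling, and function names are assumptions.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(sequences):
    """Entropy at every position of an alignment of equal-length sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


if __name__ == "__main__":
    toy_alignment = ["ALYD", "ALFD", "ALWE", "ALYE"]  # hypothetical sequences
    print(alignment_entropies(toy_alignment))
```

Low entropy marks a conserved position and high entropy a variable one; adding sequence weights would replace the raw counts with sums of per-sequence weights.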

REPRESENTING CONTEXTUALLY CHANGING DECISION MAKING BEHAVIOR IN MEDICAL ORGANIZATIONS
Carol HF Cheng, Raymond E Levitt

In medical practice and scientific research, many members of a team work together to synthesize data about a patient or problem. As problems become more complex, the communication and coordination tasks among team members become non-trivial. Individuals participating in this process are ill equipped to understand their role in it, and efforts to improve process frequently focus on the elimination of local errors. A recent Institute of Medicine report [1] argues that local errors may be caused by systematic factors, and an understanding of the global work processes may assist in diagnosing probable areas of error commission [2]. Such diagnostic tools are especially lacking in medicine, a highly flexible service industry with varied roles and many concurrent processes.

We have created a simulation environment, the Virtual Design Team (VDT) [3, 4], to represent the many professions who coordinate in the care of a case-mix of patients. This discrete-event simulation tool provides a virtual test-bed for designers of clinical protocols to assess their impact on the workflow of the organization. VDT allows the description of the actors in the organization, their roles, skills, and experience levels, as well as the activities of the organization. We make explicit the responsibilities of each actor, the relationships between the activities, and the communication and coordination requirements of each activity. Using an information-processing framework [5], we assume that each activity can be represented by the amount of time it requires in direct work, coordination, and rework. Protocol designers can thus describe their protocols succinctly and create computer simulation models of organizational behavior, a controlled, cheap, and quick alternative to experimental studies.

However, in industries as diverse as call centers, banks, and healthcare, managers have already attempted to institute quality standards through standardized protocols, best practice guidelines, and workflow management systems [6]. These devices outline the ideal process, but inadequately anticipate the contexts in which decision makers systematically deviate from the ideal. These contexts include the time of day, the workload of the decision maker, the workload of collaborators, and the schedule status of a process. In these contexts, decision makers' behavior deviates in predictable ways from the ideal protocol, in order to fulfill objectives not described by the protocol. In general, failure to consider context-dependent changes in decision making behavior can lead to unanticipated results in the load on individual workers, the communication and coordination requirements of activities, and the amount of error and rework necessary. The alternatives workers develop to the desired workflow thus impact service quality and efficiency.

In keeping with information-processing theory, we model contexts induced by the activities and environment of the worker, but not directly related to the content of the work. Our goals for incorporating contextual effects are two-fold: to describe the heuristics used by decision makers to respond to recurring contexts, and to measure the effects of such local behaviors on the organization-level process performance. We do not pretend to describe contexts exhaustively, but focus on those which have had a documented effect on decision making behavior in the medical domain. We model the effects of delaying work, allowing decisions to be made by lower-skilled workers, and hurrying through tasks. In our initial scenarios, we find that although such contextual behavior may decrease the time spent on direct work for activities, it may lower quality through fewer attempts at coordination, and increase the total time of the project because of necessary rework. Thus, the aggregation of small, isolated decisions to optimize individual work can lead to significant changes in the “macro-behavior” of the organization.

Contextual change in decision making is an understudied phenomenon in workflow analysis with potentially large effects on process execution. We have defined a small subset of contexts and their effects on decision-making behavior. We represent these in a framework that highlights the communication requirements of coordinated work processes. We plan to evaluate the representation of contexts by modeling a medical clinic, and evaluate its generality by modeling an airline service organization. We hope that the investigation of contextual responses will inform protocol design, prevent implementation failures, and lead to more flexible decision-making capabilities in existing workflow management software.

References

1. Kohn, L., J. Corrigan, and M. Donaldson, eds. To Err is Human: Building a Safer Health System. Institute of Medicine. 1999, National Academy Press: Washington, DC.
2. Chen, R. and R. Altman, Automated diagnosis of data-model conflicts using metadata. JAMIA, 1999. 6(5): p. 374-392.
3. Fridsma, D. Representing the Work of Medical Protocols for Organizational Simulation. in AMIA Annual Symposium. 1998. Orlando, FL.
4. Kunz, J., et al., The Virtual Design Team. Communications of the ACM, 1998. 41(11): p. 84-91.
5. Galbraith, J., Designing complex organizations. 1973, Reading, MA: Addison-Wesley.
6. Massaro, T., Introducing physician order entry at a major academic medical center. Acad Med, 1993. 68(1): p. 25-30.

Web Page http://www.stanford.edu/group/CIFE/VDT/index.html

MEDLINE QUERY-BY-EXAMPLE
Elmer Bernstam, Olga Troyanskaya, Jeff Chang

Purpose
Medline is a database of over 10,000,000 citations to the world's biomedical literature and is growing at a rate of over 30,000 new citations per month. Although Medline has proven extremely valuable, retrieval from Medline is difficult. Recall and precision rates are quite variable, but 25 - 60% are typical rates for both parameters. The goal of the MedlineQBE project is to facilitate the use of Medline by: (1) allowing non-expert users to query Medline by giving examples of what they want, rather than by specifying a query using traditional query languages, and (2) creating a flexible, extendible framework that allows developers to easily create modules implementing novel search strategies.

The general paradigm for MedlineQBE use is: (1) perform initial search, (2) select relevant articles from the retrieved set, (3) repeat as necessary.

Materials and Methods
We implemented MedlineQBE using industry standard technologies. The system is currently running on a Linux based Tomcat WWW Server on a Dell Inspiron 7000 (Pentium II, 300 MHz) notebook computer.

The user interface is written in Java. A Java Servlet controls the display of multiple Java Server Pages (JSPs). JSPs are responsible for interacting with modules implementing specific search strategies.

There are two general classes of modules: (1) modules to perform the initial search and (2) modules to refine searches (i.e., input includes a list of relevant articles as selected by the user). Modules can be written in any compiled or interpreted language that can execute from the Linux command-line, but the example modules are implemented in Perl and Python.

We implemented modules to perform “Power Search”, where the user's input string is sent directly to PubMed; “Simple Search”, where the user is able to fill out a form to issue a structured boolean query to PubMed; and “Related-by-MeSH”, a module that allows search refinement using the combination of MeSH terms of user-selected articles. The base module, which handles all interaction with PubMed, is written in Python.

Results
The system, as described above, has been successfully implemented. As of September 2000, it has not been made available to the public, though there are plans to do so.

Modules written in Python and Perl have been successfully integrated into MedlineQBE. Given example modules, the developer only has to write enough code to construct a valid PubMed query given user input.

Conclusion
We have created a flexible, extendible framework for a Query-by-Example interface to Medline. Evaluation of usability and performance is planned.
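As an illustration of what a refinement module has to do, the sketch below builds a PubMed query string from the MeSH terms of user-selected articles. It is not the MedlineQBE “Related-by-MeSH” module: the abstract does not say how terms are combined, so keeping terms shared by a minimum fraction of the selected articles and joining them with AND is an assumption, as are the function name and the input format. The `[MeSH Terms]` suffix is the standard PubMed field tag.

```python
from collections import Counter


def related_by_mesh_query(selected_articles, min_fraction=0.5):
    """Build a boolean PubMed query from MeSH terms of selected articles.

    selected_articles -- list of MeSH term lists, one per selected article
    min_fraction      -- keep terms present in at least this fraction of the
                         articles (an assumed heuristic, not from the abstract)
    """
    if not selected_articles:
        raise ValueError("at least one selected article is required")
    n_articles = len(selected_articles)
    # Count each term once per article.
    counts = Counter(term for terms in selected_articles for term in set(terms))
    kept = sorted(t for t, c in counts.items() if c / n_articles >= min_fraction)
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in kept)


if __name__ == "__main__":
    articles = [
        ["Hypertension", "Humans"],
        ["Hypertension", "Calcium Channel Blockers"],
    ]
    print(related_by_mesh_query(articles, min_fraction=1.0))
```

The returned string would then be passed to whatever component submits queries to PubMed; that step is deliberately left out here.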

OFFLINE TESTING OF A COMPUTERIZED DECISION SUPPORT SYSTEM FOR MANAGEMENT OF HYPERTENSION
Susana Martins, MK Goldstein, BB Hoffman, RW Coleman, SW Tu, R Shankar, M O'Connor, MA Musen, SB Martins, N Hastings
VA Palo Alto Health Care System, Palo Alto, CA
Stanford University, Stanford, CA.

Complex decision support systems (DSS) require evaluation before they can be safely deployed for clinical uses, either prospectively to make recommendations or retrospectively for quality review.

Methods
We developed ATHENA, a DSS implementing the JNC 6 guidelines for hypertension. A physician (MD) previously unrelated to the project developed with us by consensus a written document (RULES) detailing our operationalization of the JNC 6 rules. We selected from electronic medical records a random sample of 100 hypertensive cases, stratified by comorbid disease. The MD reviewed the same case material available to ATHENA and made recommendations based on the RULES. A physician and a pharmacist compared MD recommendations with those from ATHENA with another physician adjudicating disagreements. After identifying and correcting problems in the DSS, physician and pharmacist carried out a second review of all cases.

Results
In the 87 cases that met inclusion criteria for ATHENA review, 224 drug recommendations were made by MD and/or ATHENA. Of these, there were 87 disagreements. In 25 of the 81 disagreements ATHENA (12) and the MD (13) deviated from the RULES prescribed recommendation. 11/12 ATHENA deviations were due to incorrect coding of a drug dosage form and 1 case was due to a wrong entry in the pharmacy database. No errors in program logic were observed. In 10 of 13 MD disagreements the MD did not note all technically possible drug recommendations. These omissions were without clinical significance (e.g., MD noted “substitute X for Y, or A for B” but did not also note “substitute X for B, or Y for A.”). In 32 disagreements, MD applied different interpretation (30) or used additional medical knowledge (2) to make a clinically appropriate recommendation. For example, in 14 heart failure cases MD recommended a beta blocker while the RULES stated that this issue was beyond the scope of ATHENA recommendations, and in 4 cases ATHENA recommended either DHP or NDHP calcium channel blockers while MD decided on only one of these subclasses. After corrections were made to the DSS, a second review revealed that disagreements previously noted due to error in the knowledge base or drug tables were corrected.

Conclusions
A complex hypertension DSS can work remarkably well. As expected, the DSS was more complete in listing all possible combinations of recommendations. It is interesting to note that the MD who participated in development of the RULES document consciously deviated from it in many cases, suggesting that the MD's overall impression of the best therapeutic decision overrode the rules. The evaluation of a DSS before implementation in clinical practice is imperative to detect errors that could affect clinicians' confidence in using information from the DSS.

COMPARISON OF RIBOSOMAL MODELS TO EXPERIMENTAL DATA WITH THE RIBOWEB SYSTEM
Michelle Whirl Carrillo, Russ B. Altman

The RiboWeb system was designed to provide a web-based computational environment facilitating ribosomal modeling and evaluation. It links a knowledge base of experimental structural data regarding the ribosome to molecular modeling programs and other computational tools. One available tool supports the comparison of molecular ribosomal models with experimental data in the knowledge base. This type of comparison has important implications for modeling. Determining which data are consistently compatible, or not, with models can be a clue to understanding the reliability of the data for model building. Trends in data satisfaction may show data that are consistently incompatible with other data, or suggest that certain data types are not being interpreted accurately. This information is valuable for future model building.

We compared five widely accepted models of the 30S ribosomal subunit to all of the footprinting, crosslinking and cleavage experimental data in our knowledge base. We saw trends in the overall satisfaction of the data by the models. We were also able to rank models by overall data satisfaction, and view the “problem areas” in each model, according to the data comparison.

THE MOUSE SNP DATABASE: MAPPING QTLS IN SILICO
Jonathan Usuka

Understanding the underlying genetics of human diseases is the focus of many current research projects. Because of the experimental limitations in human genetics, mouse intercross models exhibiting phenotypes observed in human disease are analyzed instead. Genes that are identified in mouse experiments often belong to the same pathways involved in the human disease and therefore yield a better understanding of the human disease process. Two of the slower steps in genetic analysis involve determining the appropriate mouse cross and the subsequent genotyping of the intercross generation. In order to accelerate these steps we developed two computational tools: a searchable mouse SNP database with allele information for the 13 most commonly used inbred mouse strains, and a SNP based linkage prediction program that predicts the most likely QTLs from quantitative phenotype data across three or more mouse strains. The computational QTL prediction method correctly predicted the experimentally identified QTLs from six published mouse models with various phenotypes.

A NEW METHOD FOR DETERMINING PROTEIN FUNCTION SIMILARITY BASED ON KEYWORDS AND GENE ONTOLOGY
Yueyi Liu, Russ Altman

Sequence homology search programs such as BLAST are very useful in getting some idea of the function of a gene or protein when nothing except the sequence is known. They have also been used for clustering genes or proteins based on function, since similarity in sequence tends to lead to similarity in function. For genes or proteins that have been studied experimentally, text documents are a useful source in determining their function. Natural language processing (NLP) is one way of clustering based on documents, but it is usually computationally intensive. We propose to use Swiss-Prot keywords in comparing protein functions and in clustering based on function. Swiss-Prot is a protein sequence database that provides keywords for the function of a protein. The keywords of proteins with similar function are more closely related than keywords of proteins with completely different functions. The relatedness of two keywords is captured by their mappings on Gene Ontology, which consists of three distinct ontologies for three areas: molecular function, biological process and cellular component. Since Gene Ontology is hierarchical, we can measure the pair-wise distance for the mappings of two keywords. The keyword distance is defined as the minimum number of edges from the two mappings to their first common root. The closer their mappings are, the smaller the keyword distance. The pair-wise distances between all pairs of mapped keywords are then calculated. We downloaded the keywords for all proteins of five genomes from Swiss-Prot and calculated the pair-wise distance between each pair of proteins. The distance between the keywords for a particular protein and some other protein is defined as the sum of minimum keyword distances between each keyword for this protein with every keyword for the other protein, divided by its total number of keywords. We found that this distance is sensitive enough to pick up proteins with similar function. We hope this method will complement the sequence comparison methods in clustering proteins with similar function, since not every protein involved in similar function has similar sequence.
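The two distances defined above can be sketched on a toy hierarchy. The sketch makes several assumptions that go beyond the abstract: the ontology is simplified to a tree with one parent per term (Gene Ontology itself allows multiple parents), "minimum number of edges to the first common root" is read as the total edge count from both terms up to their nearest common ancestor, and all term names are hypothetical.

```python
def ancestor_distances(term, parent):
    """Map each ancestor of term (including itself) to its edge distance."""
    distances = {}
    steps = 0
    while term is not None:
        distances[term] = steps
        term = parent.get(term)
        steps += 1
    return distances


def keyword_distance(term_a, term_b, parent):
    """Edges from both terms up to their nearest common ancestor."""
    up_a = ancestor_distances(term_a, parent)
    up_b = ancestor_distances(term_b, parent)
    common = set(up_a) & set(up_b)
    if not common:
        return float("inf")  # terms live in unrelated hierarchies
    return min(up_a[t] + up_b[t] for t in common)


def protein_distance(keywords_a, keywords_b, parent):
    """Sum over keywords of A of the minimum distance to any keyword of B,
    divided by the number of keywords of A (not symmetric in A and B)."""
    total = sum(
        min(keyword_distance(a, b, parent) for b in keywords_b)
        for a in keywords_a
    )
    return total / len(keywords_a)


# Hypothetical toy hierarchy: child -> parent, root has parent None.
TOY_PARENT = {
    "root": None,
    "binding": "root",
    "catalysis": "root",
    "dna_binding": "binding",
    "rna_binding": "binding",
    "kinase": "catalysis",
}

if __name__ == "__main__":
    print(keyword_distance("dna_binding", "rna_binding", TOY_PARENT))
    print(protein_distance(["dna_binding", "kinase"], ["rna_binding"], TOY_PARENT))
```

Because the protein-level distance is normalized by the keyword count of the first protein only, swapping the arguments can give a different value, which matches the directional wording of the definition.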

OPTIMIZING KNOWLEDGE-BASED ENERGY FUNCTIONS: FROM LATTICE STUDY TO REAL PROTEINS
Yu Xia, Michael Levitt

Given the increasing number of known protein structures and the limited success of physical potentials in discriminating native structures against misfolded structures, knowledge-based energy functions extracted from a database of known protein native structures have been widely used in protein structure prediction.

We propose a general framework for extracting knowledge-based energy functions. We assume that the total energy for a protein structure is a linear combination of certain basis functions, and that a set of native protein structures with corresponding libraries of decoy structures is known. In our scheme, the energy function is optimal when the chance that a random structure has a lower energy than the corresponding native structure is smallest. The optimal energy parameters depend on the distribution of decoy structures in the structure space. Subject to certain approximations of this distribution, most current database-derived energy functions fall within this framework, including mean-field potentials, Z-score optimization, and constraint satisfaction methods.

We propose a fast and effective method for energy function parameterization based on this framework. We go on to compare our method to other methods using a simple lattice model in the context of three different energy function scenarios. We show that our method, which is based on the most stringent criteria, performs best in all cases. Z-score optimization also performs well.

We go on to derive energy parameters for real proteins with optimal performance. We choose residue-residue contact potential as an example. We select a representative set of protein sequences with experimentally determined structures from the Protein Data Bank. For these sequences, we use fast Monte Carlo methods and off-lattice models to generate over forty million randomly sampled misfolded conformations that have protein-like features such as self-avoidance, compactness and preferred bond length, angle and dihedral angle values. We then compare these misfolded conformations to their corresponding native conformations, and optimal energy parameters are derived from these data.

Our method is optimal in that, given a specific energy function representation and a large set of randomly sampled misfolded structures, this approach will find the energy function parameters that give the best discriminating power. We apply our optimal energy function parameters to discriminatory tests and compare their performance to other energy functions.
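The quantities this framework works with can be made concrete with a small sketch. It only evaluates a given parameter set; it does not perform the optimization described in the abstract. The feature vectors stand in for basis-function values (for example, contact counts), and every name and number below is a made-up illustration.

```python
from statistics import mean, pstdev


def energy(weights, features):
    """Total energy as a linear combination of basis-function values."""
    return sum(w * f for w, f in zip(weights, features))


def native_z_score(weights, native, decoys):
    """Z-score of the native energy relative to the decoy energies.
    More negative means the native structure is better separated."""
    decoy_energies = [energy(weights, d) for d in decoys]
    return (energy(weights, native) - mean(decoy_energies)) / pstdev(decoy_energies)


def fraction_below_native(weights, native, decoys):
    """Fraction of decoys whose energy is lower than the native energy,
    an empirical stand-in for the chance that a random structure beats
    the native one."""
    native_energy = energy(weights, native)
    decoy_energies = [energy(weights, d) for d in decoys]
    return sum(e < native_energy for e in decoy_energies) / len(decoy_energies)


if __name__ == "__main__":
    weights = [1.0, 2.0]                      # hypothetical energy parameters
    native = [-3.0, -1.0]                     # hypothetical basis-function values
    decoys = [[0.0, 0.0], [-1.0, 0.0], [-2.0, -2.0], [1.0, 1.0]]
    print(native_z_score(weights, native, decoys))
    print(fraction_below_native(weights, native, decoys))
```

A parameter search would vary `weights` to drive `fraction_below_native` toward zero across many native structures and their decoy libraries; Z-score optimization instead minimizes `native_z_score`.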

MONTE CARLO SIMULATIONS OF FOLDING OF SIMPLE ALPHA HELICES
Bojan Zagrovic, Jessica Shapiro, Vijay Pande

Purpose
Alpha helices are basic elements of protein structure, but the manner in which they fold is still far from fully understood. In this study, Monte Carlo (MC) simulations of capped 21-residue peptides, (Ala)21 and (Ala)5-(AlaAlaAlaArgAla)3-Ala, were conducted to analyze the preferred location of helix nucleation sites, speed and direction of helical propagation, and the influence of bulky arginine side-chains on these attributes of folding. In addition, the results were compared with the results of molecular dynamics (MD) simulations of the same systems.

Materials and Methods
The simulations involved standard Metropolis Monte Carlo using the OPLS force field and Tinker software for energy evaluation. The simulations were performed in implicit solvent starting from an antiparallel beta sheet configuration with no pre-equilibration. Temperature was set at 300 K. The simulations involved 3 kinds of Monte Carlo moves: a) major backbone phi/psi moves (dihedral angles locked to values characteristic of alpha helices, 3/10 helices, parallel beta sheets or antiparallel beta sheets); b) minor backbone phi/psi moves; c) rotamer moves for arginines. Data analysis was performed using Mathematica software.

Results and Conclusions
Polyalanine peptides exhibit a preference for nucleation at the C-terminus and, concomitantly, tend to extend in the C to N direction. Their folding times are roughly exponentially distributed with a mean of 20,000 MC steps. Finally, the residues at both ends of polyalanine helices adopt helical conformation more quickly than the residues in the middle of the helix. The dynamics of folding of the arginine-containing peptides depends strongly on the characteristics of the allowed moves for arginines. “Slow arginine” peptides (1 backbone angle move per 81 rotamer moves) exhibit no preferred nucleation sites, fold in ~120,000 MC steps, and tend to get trapped in collapsed states. “Fast arginine” peptides (1 backbone move for each rotamer move) fold in a manner that is indistinguishable from the polyalanine helices.
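For readers unfamiliar with the acceptance rule behind "standard Metropolis Monte Carlo", a minimal sketch follows. It shows only the accept/reject decision for a proposed move; the move sets, the OPLS energy evaluation, and the Tinker interface used in the study are not reproduced. Energies are assumed to be in kcal/mol, and the function and parameter names are illustrative.

```python
import math
import random

BOLTZMANN_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol K)


def metropolis_accept(delta_energy, temperature_k=300.0, rng=random.random):
    """Decide whether to accept a trial move with energy change delta_energy.

    Downhill (or neutral) moves are always accepted; uphill moves are
    accepted with probability exp(-delta_energy / kT).
    rng is injectable so the decision can be made deterministic in tests.
    """
    if delta_energy <= 0.0:
        return True
    boltzmann_factor = math.exp(
        -delta_energy / (BOLTZMANN_KCAL_PER_MOL_K * temperature_k)
    )
    return rng() < boltzmann_factor


if __name__ == "__main__":
    random.seed(0)
    accepted = sum(metropolis_accept(1.0) for _ in range(10000))
    print(accepted / 10000)  # fraction of +1 kcal/mol moves accepted at 300 K
```

At 300 K the thermal energy kT is about 0.6 kcal/mol, so a move that raises the energy by 1 kcal/mol is accepted a little under one time in five.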

AUTOMATIC DETECTION AND QUANTIFICATION OF ABDOMINAL AORTIC THROMBUS IN CT ANGIOGRAMS BASED ON CLUSTERING AND GLOBAL GEOMETRIC INFORMATION
Feng Zhuge, Sandy Napel, David Paik, Geoffrey D. Rubin

Purpose
Detection of thrombus based on CT attenuation alone will unavoidably generate errors in local regions with relatively low contrast-to-noise ratios. This problem is aggravated by adjacent tissues, such as bowel loops, that have similar attenuation and should be excluded from the detection result. The purpose of this research is to develop an algorithm that detects the aortic thrombus edge in the presence of noise and other interfering structures.

Materials and Methods
Our method uses a classical edge detector to find all possible edges based on gray level information only. Edge candidates are organized by their distance and angle relative to a given center point. Our segmentation model assumes that the real thrombus edge should not contain high frequency components; the variation of these distances with angle is therefore restricted. Edge candidates are then clustered according to their distance and angle relative to the center point. Edges caused by noise and other structures fall into a different cluster than the true edge because they produce sharp changes of distance within a small angle range. The surfaces composed of these false edges are assumed to have smaller area than the true edge surface, so we reject clusters corresponding to small surface areas. Interpolation is applied where edge candidates are judged to be false.

To evaluate our algorithm, we used a helical CT simulation program to simulate a human abdomen including a thrombosed aortic aneurysm with adjacent vena cava, bowel loops, and spine at various locations. We performed 4 simulations: one without noise and 3 with realistic CT noise.

Results
We evaluated detection results by two metrics: (1) false negative volume fraction (FNVF = undetected thrombus volume / true thrombus volume) and (2) false positive volume fraction (FPVF = falsely included thrombus volume / true thrombus volume).

For the noiseless case, FNVF = 0.06% and FPVF = 5.22%. For the noisy cases, FNVF = 0.14% ± 0.02% and FPVF = 5.56% ± 0.05%. Residual error is due to noise and interpolation; all false edges were rejected by our algorithm.

Conclusion
Adding global geometry constraints to gray level information improves detection and quantification of aortic thrombus in a phantom model. Accurate delineation of abdominal aortic thrombus will allow accurate and reproducible quantification of aortic aneurysm size and growth, which has become a critical issue in the era of stent-graft therapy.
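
The two evaluation metrics defined in the Results can be illustrated with a minimal Python sketch. It assumes the true and detected thrombus are available as equal-length sequences of 0/1 voxel labels; the function name and the toy labels are hypothetical and are not part of the original work.

```python
def volume_fractions(truth, detected):
    """Return (FNVF, FPVF) for two equal-length sequences of 0/1 voxel labels.

    FNVF = undetected thrombus volume / true thrombus volume
    FPVF = falsely included volume / true thrombus volume
    Assumes the true thrombus contains at least one voxel.
    """
    true_volume = sum(1 for t in truth if t)
    missed = sum(1 for t, d in zip(truth, detected) if t and not d)
    extra = sum(1 for t, d in zip(truth, detected) if d and not t)
    return missed / true_volume, extra / true_volume


# Toy example (made-up labels): one true voxel missed, one false voxel added.
truth = [1, 1, 1, 1, 0, 0]
detected = [1, 1, 1, 0, 1, 0]
fnvf, fpvf = volume_fractions(truth, detected)
```

Both fractions share the true thrombus volume as denominator, so FPVF can exceed 100% when a detector over-segments heavily.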

QUANTIFICATION OF THE HYDROPHOBIC INTERACTION BY SIMULATIONS OF THE AGGREGATION OF SMALL, HYDROPHOBIC SOLUTES IN WATER
Tanya M. Raschke, Jerry Tsai, Michael Levitt

We have used molecular dynamics (MD) simulations to investigate the aggregation of a series of small hydrocarbon molecules in water. MD simulations were performed on systems containing increasing numbers of solute molecules in water-filled boxes of different sizes, sampling a hundred-fold range of solute concentrations. Throughout these simulations, the formation and disruption of solute clusters were observed. By treating the data from the trajectories as a series of equilibrium measurements, we directly measured the free energy of adding a single solute molecule to a cluster. This derived free energy is proportional to the loss in exposed molecular surface area that occurs when a solute molecule joins a pre-existing cluster. Furthermore, the constant of proportionality (45 cal/Å²) is in complete agreement with experimental measurements of the hydrophobic effect. This is the first direct calculation of the hydrophobic interaction from MD simulations; the excellent agreement with experiment indicates that force fields with van der Waals interactions and atomic point-charge electrostatics account for the most important driving force in biology.
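
The abstract does not state how the free energy was extracted from the trajectories. As a hedged illustration only, the sketch below applies the standard thermodynamic relation ΔG = -RT ln K to an equilibrium constant for adding one solute molecule to a cluster. The function name, the default temperature, and the idea of estimating K from observed cluster populations are assumptions, not details from the original work.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)


def free_energy_from_equilibrium(k_eq, temperature_k=298.15):
    """Standard relation dG = -R T ln(K), returned in kcal/mol.

    k_eq is a dimensionless equilibrium constant for the step
    cluster(n) + solute <-> cluster(n + 1); how it is estimated from a
    trajectory is left open here (an assumption of this sketch).
    """
    return -GAS_CONSTANT_KCAL * temperature_k * math.log(k_eq)


# K > 1 means the larger cluster is favoured, so the free energy is negative.
delta_g = free_energy_from_equilibrium(10.0)
```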

VIEWFEATURE: INTEGRATED FEATURE ANALYSIS AND VISUALIZATION
D. Rey Banatao, Conrad C. Huang, Patricia C. Babbitt, Russ B. Altman, Teri E. Klein

Visualization interfaces for high performance computing systems pose special problems due to the complexity and volume of data these systems manipulate. In the post-genomic era, scientists must be able to quickly gain insight into structure-function problems, and require flexible computing environments to quickly create interfaces that link the relevant tools. Feature, a program for analyzing protein sites, takes a set of 3-dimensional structures and creates statistical models of sites of structural or functional significance. Until now, Feature has provided no support for visualization, which can make understanding its results difficult. We have developed an extension to the molecular visualization program Chimera that integrates Feature's statistical models and site predictions with 3-dimensional structures viewed in Chimera. We call this extension ViewFeature, and it is designed to help users understand the structural features that define a site of interest.

We applied ViewFeature in an analysis of the enolase superfamily, a functionally distinct class of proteins that share a common fold, the α/β barrel, in order to gain a more complete understanding of the conserved physical properties of this superfamily. In particular, we wanted to define the structural determinants that distinguish the enolase superfamily active site scaffold from other α/β barrel superfamilies, and particularly from other metal-binding α/β barrel proteins. Through the use of ViewFeature, we have found that the C-terminal domain of the enolase superfamily does not differ at the scaffold level from metal-binding α/β barrels. We are, however, able to differentiate between the metal-binding sites of α/β barrels and those of other metal-binding proteins. We describe the overall architectural features of enolases in a radius of 10 Angstroms around the active site.

USING HUMAN LANGUAGE ABILITY TO LEARN AND RECOGNIZE PROTEIN FOLDS
Neil F. Abernethy

Purpose
Although a wealth of protein sequence and structural data is now widely available, this information remains difficult to digest for typical biologists. Even displays of protein structures often seem complex and difficult to recognize, understand, and remember. The question this theoretical research seeks to address is whether innate human language skills can be used as a mechanism to help people learn to recognize protein folds. Our existing pattern-recognition skills may be useful for both raw amino-acid sequences and the two-dimensional projection of three-dimensional folds. For example, the 1996 CASP2 competition in protein-fold recognition was won by Alexey G. Murzin, a protein structure expert who scored higher than any of the competing bioinformatics applications. Is it possible that all biologists could become “experts” by using their natural language skills?

Methods
As a first test, a simplified set of known secondary and tertiary structural rules corresponding to particular amino-acid residues and sequences will be selected. These rules will be taught to non-biologists, who will then be tested on their ability to recognize the simplified patterns in amino acid sequences. For instance, one rule could be that a string of characters consisting of (A,D,E,F,H,I,L,K,M,Q,W,V) would be classified as “” (alpha-helix-forming), since these are the amino acids with alpha-helix-forming tendencies. Use of non-biologists will control for background experience and encourage a natural approach to the pattern-recognition problem.

A second test will examine a subject's ability to recognize either the two-dimensional icons representing the topology of proteins or the actual projection of a three-dimensional structure. In both cases the subjects would attempt to identify the fold class of a protein. Prior work in the field has generated hand-drawn iconographs representing protein secondary structure and basic fold.

Applications
If successful, this research could lead to better education for molecular biologists, enabling them to visually recognize protein motifs. This could greatly deepen their understanding of the sequence/structure/function relationship in proteins, and help them better utilize the structure images generated by display software. Such skills would prove valuable in the coming era of therapeutic protein design.

A “legible” protein iconography would be extremely useful to scientists attempting to visually understand gene/protein networks on a page. Finally, it may motivate earlier training of structural rules as a part of the rapid language acquisition of childhood years.

Future work
If non-biologists show an ability to recognize these patterns, we may suspect that they are using linguistic or spatial-cognitive abilities. To further refine our understanding of which cognitive abilities are in use, this ability could be explored with functional neuroimaging. This technique pinpoints the areas of the brain being used to process information or perform tasks. The results could then be compared and contrasted with existing data from various language and spatial information processing tasks.

This work is in a largely theoretical stage; interested potential collaborators are encouraged to contact the author.

STRUCTURE AND STABILITY OF COLLAGEN
Sean Mooney, Teri Klein

Collagen is the most abundant protein in mammals, comprising over 28% of the total dry protein weight. Unlike globular proteins, collagen is a fibrillar protein identified by the presence of a triple helical domain that has a regular X-Y-Gly repeating amino acid sequence. Interruptions in the X-Y-Gly repeat in collagen can cause diseases such as Osteogenesis Imperfecta and disorders such as Ehlers-Danlos Syndrome. An understanding of the structure and the factors that contribute to the stability of collagen will lead us to a better understanding of how mutations in collagen cause disease.

Because of its regular structure, the triple helix of collagen can be modeled using short, idealized collagen-like peptides. We are using clinical, experimental and theoretical results to guide molecular mechanics studies of collagen-like peptides, with the goal of building a model that can predict structural and thermodynamic changes that occur when mutations are introduced into the triple helix of collagen.

We have built models of several collagen-like peptides to address these questions. Our models quantitatively reproduce the thermodynamics of introducing mutations into the position of the repeating X-Y-Gly triplet motif (Klein, et al. Biopolymers, 1999 and Mooney, et al. Biopolymers, In Press). We are currently using our models to better understand how hydroxyproline stabilizes the triple helix and to better understand the structural changes that occur when mutations that cause lethal Osteogenesis Imperfecta are present.

COMBINING KINETIC INFERENCE WITH A PREDICTOR-CORRECTOR METHOD TO MODEL GENETIC REGULATORY CIRCUITS THAT ARE CONSISTENT WITH HETEROGENEOUS EXPERIMENTAL DATA
Nizar Batada, Mike Laub, Harley McAdams

Purpose
To infer kinetic parameters regarding gene transcription and translation and to investigate regulatory circuitry by correlating and checking the consistency of heterogeneous experimental data (genomics, proteomics) using mathematical modeling and simulations.

Materials and Methods
Microarray experiments were done with mRNA samples taken from synchronized Caulobacter cells at ten time points, at 15-minute intervals over the 150-minute cell cycle, to obtain gene expression data on genes involved in flagellar biosynthesis (Laub, M et al, Science, 2000, in press). The resulting time-series expression profiles of genes involved in flagellar biosynthesis are compared to a delay differential equation model of the flagellar gene regulatory cascade, simulated using Matlab/Simulink software. The simulation model includes synthesis and degradation kinetics of proteins and mRNA as well as estimated time delays associated with transcription initiation and protein folding events.

Results
We have developed a functional simulation model that implements the differential equation model of a three gene class regulatory cascade involved in flagellar biosynthesis, and have extended it to incorporate several regulatory feedback mechanisms that regulate mRNA stability as variant regulatory architectures. We have demonstrated the utility of the predictor-corrector method, in which results predicted from the model are compared to the time series to check for consistency and to propose and test new regulatory circuits.

Conclusion
We have demonstrated the utility of taking a systems-level perspective by combining forward modeling with the reverse problem of network inference from the rich data generated from genomics and proteomics research. The potential of this predictor-corrector approach, which takes heterogeneous data into consideration, is enormous, and it enables identification of genes subject to posttranscriptional regulation. Current work is focused on investigating whether it is possible to distinguish between autoregulated and non-autoregulated genes by identifying distinctive “signature” time series profiles.
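
To make the idea of a delay differential equation with synthesis and degradation kinetics concrete, here is a minimal Python sketch of a single stage of such a cascade. It is not the Matlab/Simulink model described above. The equation form, the step-like upstream activator, the function name, and all rate constants are assumptions chosen for illustration.

```python
def simulate_delayed_expression(k_syn, k_deg, delay, t_end, dt=0.01):
    """Fixed-step Euler integration of

        dm/dt = k_syn * u(t - delay) - k_deg * m,   m(0) = 0,

    where u is an upstream activator that switches from 0 to 1 at t = 0.
    Times are in minutes; returns the list of m values at every step.
    """
    n_steps = int(round(t_end / dt))
    delay_steps = int(round(delay / dt))
    m = 0.0
    trace = [m]
    for step in range(n_steps):
        # The activator's effect only arrives after the delay has elapsed.
        u_delayed = 1.0 if step >= delay_steps else 0.0
        m += dt * (k_syn * u_delayed - k_deg * m)
        trace.append(m)
    return trace


# Hypothetical parameters: a 15-minute delay over a 150-minute cell cycle.
trace = simulate_delayed_expression(k_syn=2.0, k_deg=0.1, delay=15.0, t_end=150.0)
```

In a predictor-corrector setting, a predicted trace like this would be compared against the measured time series, and the delay or rate constants adjusted when the two disagree.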

AN INTERACTIVE BIOMECHANICAL MODEL OF THE HUMAN HAND
Robert Pao-Feng Cheng, Jean Heegaard, Parvati Dev, Sakti Srivastava, Leroy Heinrichs, Tonia Sengelin

Purpose
The human hand is controlled by a complex interaction of muscles, tendons, and other soft tissues. The manner in which the tendons act on the bones is not always obvious. In order to understand hand function, medical students often require the opportunity to interact with cadaver specimens. Due to the limited availability of specimens, students typically rely on static anatomic images to gain insight into the function. With current graphics hardware and software, it is possible to create a virtual model of the hand. A computational model also allows for repeated simulations of various tendon lesions, while a specimen can only be used once. In this project, we are developing a software application for the interactive manipulation of a 3D hand model. The model is designed to behave with the appropriate biomechanical constraints to produce realistic motions.

Materials and Methods
A full model of the human hand has been obtained from Primal Pictures (London, UK). The model contains detailed geometric representations of the bones, cartilage, tendons, ligaments, muscles, and nerves. These models were placed into a 3D environment, using the CosmoWorlds software (SGI, Mountain View, CA), where the joint axes could be defined. The models were also decimated (up to 40%) to increase the responsiveness of the model during interaction. Only the bone and cartilage models are used because we are mainly concerned with the motion of rigid bodies in the system. Soft tissue deformation algorithms are required if the tendons, muscles, and skin are to be included.

We are using open source graphics software, the visualization toolkit (vtk), for development of the application and interface. The flow of data in the application goes from an input motion or force applied to the model to a dynamics solver, which computes the new position of each of the objects in the system. The updated positions are then relayed back to the visualization software for display. The dynamics equation solver is implemented in C++ to ensure adequate update rates for the graphics display.

Results
A set of animations demonstrating the function of normal fingers, as well as fingers with tendon lesions, has been created. The series of motions includes: flexion at the MCP (metacarpophalangeal) joint, flexion of the DIP (distal interphalangeal) and PIP (proximal interphalangeal) joints, thumb abduction/adduction, thumb flexion/extension, and thumb opposition. These 3D animations can be viewed with any VRML (Virtual Reality Modeling Language) enabled web browser. An image from the animation is shown in Figure 1. The functions of the flexor digitorum profundus, flexor digitorum superficialis, extensor digitorum, and intrinsics are included.

A prototype of the application interface has been developed using the Python programming language and the Tkinter toolkit, together with vtk. The prototype uses slide bars to control the joint angles of the finger (shown in Figure 2). The joint rotations are constrained to lie within physical limits.

Conclusion
While the primary work accomplished has been based on preset animations, continued work with vtk is aimed at permitting interaction with the model. The inclusion of soft tissue behavior will be required in the computational model to deliver physiologic motion. Though the prototype includes the behavior of a single finger, the goal is to have a model of the complete hand. In the next step, the dynamics model will be interfaced with a haptics device to provide force feedback while interacting with the hand.

Web Page http://www.stanford.edu/~alief/bcats.html
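
The constraint that slider-driven joint rotations stay within physical limits can be sketched in a few lines of Python. This is an illustration under assumptions, not the prototype's code: the function name is hypothetical, and the limit values are placeholder numbers that do not come from the abstract.

```python
# Placeholder flexion ranges in degrees (illustrative values only).
JOINT_LIMITS_DEG = {
    "MCP": (0.0, 90.0),
    "PIP": (0.0, 110.0),
    "DIP": (0.0, 80.0),
}


def clamp_joint_angle(joint, requested_deg, limits=JOINT_LIMITS_DEG):
    """Return the requested angle, clipped to the joint's allowed range."""
    low, high = limits[joint]
    return max(low, min(high, requested_deg))


# A slider callback would pass its raw value through the clamp before
# handing the angle to the model.
mcp_angle = clamp_joint_angle("MCP", 120.0)
```

Applying the clamp in one place, between the user interface and the model, keeps every input path (sliders now, a haptics device later) subject to the same limits.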

IMPLEMENTATION OF A RADIO-FREQUENCY INTRAVASCULAR ULTRASOUND SYSTEM FOR QUANTITATIVE TISSUE CHARACTERIZATION IN CORONARY ARTERIES
Brian Courtney, Abel L. Robertson, Paul G. Yock, Peter J. Fitzgerald

Introduction
A method to perform in vivo characterization of tissues within coronary arteries would have several important applications. Clinically, knowledge of the structure and composition of an atherosclerotic lesion provides valuable information with respect to the likelihood of the onset of acute myocardial infarction. Tools to assist in the identification of vulnerable plaques in vivo would therefore assist greatly in the clinical management of coronary artery disease. Similarly, research on coronary artery disease can benefit from in vivo methods to monitor changes in the composition of coronary arteries over time. Such methods would provide new insights into the progression of disease and facilitate studies involving different modalities of intervention, such as angioplasty, atherectomy, brachytherapy, stenting and pharmacological agents.

Intravascular ultrasound (IVUS) is a method that produces two-dimensional cross-sectional images of arteries and is currently used for assessing coronary lesions in many clinical and research centers. Such assessments generally consist of qualitative descriptions of real-time video images and geometric measurements within the images.

In order to enable more quantitative measurements of intravascular ultrasound, and to extract different measurements from the ultrasound signals used to produce the image, a radio-frequency ultrasound data acquisition and analysis system has been developed. The system enables quantitative measurements through an ultrasound image-based interface. The hardware and software acquire high frequency (500 MHz) digitally sampled records of unprocessed ultrasound signals that may contain more information about the tissue structure than the envelope of highly processed ultrasound signals used to produce traditional IVUS images.

Method and Equipment
The system consists of a personal computer equipped with a 500 MHz 8-bit analog to digital converter, controlled by custom software and connected to a radio-frequency output connector of an IVUS console. The software has several additional functions, including digital filtering, image reconstruction, transducer calibration and the interactive measurement of several ultrasound parameters. Measurable parameters include backscatter intensity, statistics regarding the envelope of the signal, attenuation, geometric information of the vessel components and frequency content. Video reconstruction, high volume data management and data exportation make the system flexible for several research purposes.

Uses and Future Directions
This system has been and continues to be used in several in vivo and in vitro models and has provided insights into the interactions of ultrasound with different tissue types under various conditions. New measurements and signal processing techniques will continue to be added to the software. Ultimately, it would be desirable to add an inference engine to the software so that tissue components and important geometric features could be algorithmically detected based on a collection of several measurements within segments of the vessel cross-section.

Although greater processing power, data storage mechanisms and higher resolution of the digitized ultrasound signals will eventually be incorporated to facilitate widespread adoption of radio-frequency IVUS, the current system is easy to use and provides important information for the development of minimally-invasive tissue characterization methods based on IVUS.
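
Of the measurable parameters listed above, backscatter intensity is the simplest to illustrate. The Python sketch below shows one common way to express it: the mean power of a window of radio-frequency samples, in decibels relative to a reference power. The abstract does not say how the system computes this parameter, so the definition, the function name, and the reference value are all assumptions.

```python
import math


def backscatter_db(rf_samples, reference_power=1.0):
    """Mean power of an RF sample window, in dB relative to a reference.

    rf_samples: signed sample amplitudes from one window of one scan line.
    Raises ValueError for an all-zero window, whose power has no finite dB value.
    """
    mean_power = sum(s * s for s in rf_samples) / len(rf_samples)
    if mean_power == 0.0:
        raise ValueError("window contains no signal")
    return 10.0 * math.log10(mean_power / reference_power)


# Doubling the amplitude of every sample raises the result by about 6 dB.
level = backscatter_db([2.0, -2.0, 2.0, -2.0])
```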

TWO SIDED CLUSTERING FOR YEAST GENE EXPRESSION USING PROBABILISTIC RELATIONAL MODELS
Eran Segal, Ben Taskar, Daphne Koller

DNA microarray technology is currently producing a wealth of gene expression data on a genome-wide scale. Much work has focused on clustering genes and experiments with similar expression levels. However, most methods perform clustering of genes and experiments separately and then combine the results. Furthermore, these methods ignore significant information about both genes and experiments that can aid in discovering more accurate and significant clusters. For example, cellular role, biochemical function, and localization may be known for some genes, and type of tissue and conditions are often known about the experiments. We present a novel approach that incorporates information about genes and experiments in a unified probabilistic model and allows both genes and experiments to be clustered simultaneously.

Our methods are based on probabilistic relational models (PRMs), which extend the standard attribute-based Bayesian network representation to incorporate a rich relational structure. A PRM specifies a template for a probability distribution over a set of (complex) objects. It specifies, for each type of entity in the domain, a dependency model for each attribute of that entity. This model encodes the dependence of the attribute of an object on other attributes of this and related objects.

Our PRM schema consists of an object for each gene and an object for each experiment, with a many-to-many relation between them containing the expression level measured for the gene in the experiment. Each gene and each experiment has a hidden attribute corresponding to its cluster, and both influence the expression level of the gene for that experiment. Thus, the expression level depends only on the cluster assignments of the gene and experiment. We can also consider richer structures where both the experiment object and the gene object have additional attributes that can influence the expression level. In that case, the cluster attribute captures the residual dependence not explained by the observed attributes (e.g., the experiment type).

Given the measured expression levels for all genes, we learn the model parameters using EM. The EM procedure requires that we compute, in each iteration, the distribution over the hidden attributes. For gene expression data, the probabilistic model resulting from our two-sided clustering schema consists of tens of thousands of highly dependent objects, making exact inference computationally intractable. We therefore use an approximate belief propagation algorithm due to Pearl. Several groups have recently reported excellent experimental results using this approximation scheme.

We ran our experiments on yeast gene expression data (http://rana/ clustering). Each column represents an experiment, and each row a probe on the microarray designed to detect the expression level of a particular gene. We clustered this data using our two-sided clustering algorithm, and compared the result to a standard clustering approach (EM on a Naïve Bayes model) applied to genes and experiments separately. The results, see http://robotics/ ~erans/twosided.html, show that the clusters obtained by our approach are substantially more coherent.

PRMs provide a flexible general-purpose framework for representing models of complex biological processes such as gene expression. They can represent additional attributes, time series for the experiments, and gene expression pathways. We show that we can effectively learn even these very complex models from data.

WEB APPLICATIONS FOR MICROARRAY DATA ANALYSIS AND PRESENTATION
Christian A. Rees, Charles M. Perou, Douglas T. Ross, Jonathan R. Pollack, J. Michael Cherry, Patrick O. Brown, David Botstein

Microarray experiments generate large datasets that require computational tools for analysis and visualisation. Making these tools available as web applications allows researchers simple access with a web browser. We have developed two web applications for visualisation, analysis and publication of microarray data.

GeneXplorer uses a web browser as its interface to visualize large matrices of microarray gene expression datasets. A dataset matrix can contain thousands of genes and hundreds of experiments. The matrix is displayed as a color coded image. This makes it easy to detect patterns of similarly expressed genes if the matrix has previously been ordered by a clustering algorithm.

GeneXplorer provides different ways of analyzing the data:
1) The user can visually browse by clicking on the matrix image in an overview frame. A zoom of the selected region is then displayed in a separate frame together with gene names and hyperlinks to more detailed gene information.
2) A keyword search on the gene descriptions can be performed. All genes with a matching keyword in their description are displayed in the zoom frame. Gene information is easily configurable for different organisms. For human genes the following fields are searchable: names, gene symbols, cDNA clone identifier, UniGene cluster identifier, GenBank accession numbers, chromosome and cytoband. The display images and hyperlinks of this information can be configured with a stylesheet.
3) To find the neighbors of a gene of interest, the user can click on the image representation of an individual gene. This will display the most similar genes in order of similarity together with the correlation value.

CaryoScope is a visualisation and analysis tool for microarray based Comparative Genomic Hybridisation - aCGH (Pollack et al., 1999). While determining genomic copy number changes with CGH has a resolution at the megabase level, array CGH can increase the resolution by orders of magnitude. Copy number changes can be mapped at the single gene level. Taking the CGH technique to the microarray platform requires a new approach to visualisation and analysis. The genomic location of cDNA clones is obtained by comparing their sequence - usually ESTs - to the human genome draft assembly (data provided by Jim Kent, UCSC). The ratios of normal DNA vs. cancer DNA are drawn by the CaryoScope web application as barcharts representing the chromosomes. Genetic markers are included as reference landmarks in the human genome. This way, chromosomal regions of amplification or deletion in human cancer cells can be determined. The graphical output of CaryoScope is provided in GIF, PostScript and Portable Document Format (PDF). Hyperlinks from each clone to additional gene information are added to the PDF output.

Web Page http://genome.stanford.edu/~rees
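
The neighbor search described for GeneXplorer (listing the most similar genes together with a correlation value) can be sketched as follows. This is a minimal illustration assuming Pearson correlation over expression profiles. The abstract does not name the correlation measure, and the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def nearest_genes(expression, query, top_n=5):
    """Return up to top_n (gene, correlation) pairs, most similar first."""
    scored = [
        (pearson(expression[query], profile), gene)
        for gene, profile in expression.items()
        if gene != query
    ]
    scored.sort(reverse=True)
    return [(gene, r) for r, gene in scored[:top_n]]


# Toy expression matrix (made-up values): rows are genes, columns experiments.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
neighbors = nearest_genes(expression, "geneA", top_n=2)
```

With these toy profiles, geneB ranks first because it is a scaled copy of geneA, and the anti-correlated geneC is excluded from the top two.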

IRACS: A LITERATURE MINING TOOL FOR FAST INTERPRETATION OF MICROARRAY DATA
Sep Kamvar, Eldar Giladi, Jeanne Loring, Mike Walker

Purpose
The advent of the microarray has made large-scale gene expression studies thousands of times faster than previously possible. However, meaningful interpretation of this expression data is still a bottleneck to the discovery process, often taking several weeks or months of literature review. We seek to automate this interpretation procedure by developing a literature-mining tool to organize and summarize the literature in a manner helpful to biologists analyzing large-scale expression data.

Materials and Methods
We aim to (a) retrieve all MedLine articles pertaining to coexpressed or differentially expressed genes in a microarray experiment, and (b) find clusters of closely related articles within this document collection and summarize these clusters.

For document retrieval, we use standard term-matching methods. Clustering is achieved using Principal Direction Divisive Partitioning (Boley, 1998), a clustering algorithm based on generalizations of graph partitioning.

Results
This tool has proven to be highly effective in trials. In all trials, the analysis of large-scale gene expression data took under an hour, while the analysis of these same results took several weeks using traditional methods. In addition, this tool has led to novel insights in Alzheimer's disease and osteoporosis.

Conclusion
We present a new tool to aid biologists in interpreting gene expression data. This tool has been shown to be highly effective in organizing and summarizing the relevant literature in far less time than traditional methods used by biologists. Currently, gene expression data can be produced much faster than it can be interpreted, and we suggest that this tool can be significant in widening the bottleneck that slows the discovery process in functional genomics.
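
A single splitting step in the spirit of Principal Direction Divisive Partitioning can be sketched in plain Python. The sketch mean-centers the document vectors, estimates their leading principal direction by power iteration, and splits the documents by the sign of their projection onto it. It is an illustration under assumptions, not the IRACS implementation: the function name, the power-iteration shortcut, and the toy term counts are all hypothetical, and a full tool would apply the split recursively.

```python
def pddp_split(docs, n_iter=200):
    """Split document vectors into two groups of row indices.

    docs: list of equal-length term-count vectors, one per document.
    Returns (left, right): indices with non-positive and positive
    projection onto the leading principal direction, respectively.
    """
    n, d = len(docs), len(docs[0])
    mean = [sum(row[j] for row in docs) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in docs]

    # Power iteration on C^T C, where C is the centered document matrix.
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start
    for _ in range(n_iter):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # all documents identical; no direction to split on
        v = [x / norm for x in w]

    proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    left = [i for i, p in enumerate(proj) if p <= 0.0]
    right = [i for i, p in enumerate(proj) if p > 0.0]
    return left, right


# Toy term counts (made-up): documents 0-1 and 2-3 use disjoint vocabularies.
docs = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
left, right = pddp_split(docs)
```

Which group is labelled left or right depends on the sign of the estimated direction, so only the grouping itself is meaningful.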

SYMPOSIUM PARTICIPANT LIST

102 103 BCATS 2000 Symposium Proceedings Symposium Participant List Neil Abernethy Mechanical Engineering Center for Computational Genetics Biomedical Informatics [email protected] and Biological Modeling [email protected] [email protected] Allison Arnold Rami Aburomia BME Division, ME Department Elmer Victor Bernstam Genetics [email protected] General Internal Medicine: Stanford [email protected] Medical Informatics Blake Ashby [email protected] Burak Acar Mechanical Engineering Dept. of Radiology, School of [email protected] Gail Binkley Medicine Department of Genetics [email protected] Srinivasan B [email protected] Asia Pacific Research Centre, Inst of Annette Adler International Studies Jon Binkley Agilent Technologies [email protected] Genetics [email protected] [email protected] Brian Babcock Aneel Advani Computer Science Terrence F Blaschke Medical Informatics [email protected] Medicine/Clinical Pharmacology [email protected] [email protected] Virginia Bachrach Susanne Elizabeth Ahmari Pediatrics Silvia Salinas Blemker Molecular and Cellular Physiology [email protected] Biomechanical Engineering [email protected] Division, Mechanical Engineering Pierre Barbero Department Ahmed Murad Akhter [email protected] Computer Science [email protected] [email protected] Jason Bock Leah Ortiz-Luis Barrera Molecular and Cellular Physiology Gene Alexander, PhD Math and Computational Science [email protected] Mechanical Engineering [email protected] [email protected] Roy Bohenzky Dr. 
John Bashkin Roche Diagnostics Russ Biagio Altman SRI International [email protected] Medicine [email protected] [email protected] David Botstein Sanmit Basu Genetics Dong Anton An Mechanical Engineering, Division of [email protected] Computer Science Biomechanical Engineering [email protected] [email protected] Leah Bowser biology Kirk Anders Nizar Batada [email protected] Genetics Developmental Biology [email protected] [email protected] Edward Stuart Boyden Neurosciences Clay Anderson, Ph.D. Gary Beaupré [email protected] Mechanical Engineering Biomechanical Engineering Division [email protected] [email protected] Richard William Bragg Mechanical Engineering Nancy Anderson Donia Larissa Bencke [email protected] Undergraduate Advising Center Biology [email protected] [email protected] Dena Bravata, M.D., M.S. Primary Care & Outcomes Research Kok Long Ang Sandra Elizabeth Bendeck [email protected] Gyn & Ob medicine [email protected] [email protected] Andrew Broderick Stanford Research Institute Mehmet Serkan Apaydin Steve Bennett [email protected] Electrical Engineering Biochemistry [email protected] [email protected] Igor Eric Brodsky Microbiology and Immunology Lauren Marie Aquino Aviv Bergman [email protected]

104 BCATS 2000 Symposium Proceedings Symposium Participant List Pediatrics Michael Brudno David Yu Chen [email protected] Computer Science CS [email protected] [email protected] Evan Chou Computer Science Anh Bui Kenneth Chen [email protected] Biological Sciences Statistics [email protected] [email protected] Grace Chou SRI International Elaine Carlson Mingying Chen [email protected] Buck Institute Computer Science [email protected] [email protected] Douglas Chow Graduate School of Business Michelle Whirl Carrillo Christopher Paochung Cheng [email protected] Biophysics Mechanical Engineering [email protected] [email protected] Andrzej Chruscinski Medicine Dennis R. Carter Carol Hsen-Fae Cheng [email protected] Mechanical Engineering Biomedical Informatics [email protected] [email protected] Su Chung San Diego Supercomputer Center John Cavallaro Jason Cheng [email protected] Management Science and Biology and Computer Science Engineering [email protected] Janice Yu-Hsin Chyou [email protected] Undeclared Phillip Ming-Da Cheng [email protected] Urszula Chajewska BMI Computer Science [email protected] Erin Cline [email protected] Molecular and Cellular Physiology Robert Cheng [email protected] Dr. Albert S Chan Mechanical Engineering Department of Family Medicine [email protected] Brian Courtney [email protected] School of Medicine Christine Cheng [email protected] Lap Fung Chan Computer Science Electrical Engineering [email protected] Craig Anthony Cummings [email protected] Microbiology and Immunology Yu-Che Eddie Cheng [email protected] Ravi A. Chandrasekaran EE Biological Sciences and Chemistry [email protected] Dr. Ronald L. Dalman [email protected] Surgery Wendy Cheng [email protected] Jeffrey Chang Mechanical Engineering Medical Informatics [email protected] Oranee Daniels, MD. [email protected] Division of Clinical Pharmacology Mike Cherry [email protected] Doug N. 
Chang Genetics cs [email protected] Mindy Davis [email protected] Chemistry Steve Chervitz [email protected] Chi Chang Neomorphic, Inc. (Affymetrix) Mechanical Engineering: [email protected] Jerel Clayton Davis Biomechanics Division Biological Sciences [email protected] Ming Chiang [email protected] ISIS Pharmaceuticals Celene Chang [email protected] Ed Davis Business/Engineering SRI International [email protected] Kyeongjae Cho [email protected] Mechanical Engineering Austin Che [email protected] Ulrike DeMarco Computer Science Psychology [email protected] Edward Choice [email protected]

[email protected] Peter Dehlinger Sid Elmer Iota Pi Law Group Chemistry Jie Gao [email protected] [email protected] Computer Science [email protected] Scott L Delp Gerald Engel Mechanical Engineering Mechanical Engineering Margarita Garcia [email protected] [email protected] TAIR/Carnegie Institution of Washington Corrie Detweiler Carol C. Epstein, Ph.D. [email protected] Stanford Department of BioInfoStrategies Microbiology [email protected] Audrey Gasch [email protected] Biochemistry Christian Eversull [email protected] Parvati Dev Medicine SUMMIT, School of Medicine [email protected] Lise Carol Getoor [email protected] Computer Science Rob Ewing [email protected] Erlind Nasufi Dine Carnegie Inst Graduate School of Business [email protected] Dr. Gary Gilbert [email protected] Telemedicine and Advanced Larry Fagan Technology Research Center Kara Dolinski Stanford University [email protected] Genetics [email protected] [email protected] David McPherson Goehring Zhenbin Fan Biological Sciences Magdalena Dorywalska Urology [email protected] Structural Biology [email protected] [email protected] Salih Burak Gokturk Daryl Faulds Electrical Engineering Mary Therese Draney Berlex Biosciences [email protected] Mechanical Engineering [email protected] [email protected] Saryn Goldberg Chris Feezor Mechanical Engineering, Katerina Athena Drouvalakis Guidant Corporation Biomechanics Division Medicine [email protected] [email protected] [email protected] Gonzalo Raul Feijoo Seri Gomberg Chenggang Duan Mechanical Engineering iKnowMed Computer Science [email protected] [email protected] [email protected] Yanan Feng Matthew Gonzales Chris Duffield Genetics Div. of Infectious Diseases Stanford Materials Science & [email protected] [email protected] Engineering Dept. [email protected] Tracy Ferea, Ph.D.
Justin Graham Applied Biosystems Stanford Medical Informatics Jonathan Dugan [email protected] [email protected] BMI [email protected] Dr. Michael John Fero Randy Grow School of Medicine Applied Physics Maitreya Dunham [email protected] [email protected] Genetics [email protected] Patrick Alexander Fleisch Ming Gu Engineering Computer Science Phillip Marks Ecker [email protected] [email protected] med [email protected] Kelly Frazer Kyle Alan Gurley Affymetrix, Inc. Developmental Biology David Elgart [email protected] [email protected] Genencor [email protected] Robert French

Katsuyuki Hoshina [email protected] Oliver Kaljuvee Vascular Surgery Computer Science [email protected] Allison Yen-Ling Hsieh [email protected] [email protected] Scot Lee Haire Sep Kamvar Mechanical Engineering Dept, Flow Jerry Yungchi Hsu SCCM Physics and Computation Div Cancer Biology [email protected] [email protected] [email protected] Kenneth Sye-young Kang Joan Hebert Kurt Huang Computer Science Genetics Biomedical Informatics [email protected] [email protected] [email protected] John Kang Adolff Theodorus van Der Heide Suttiporn Janenawasin Alpha Innotech Corporation Mechanical Engineering (Division psychiatry [email protected] of Biomechanics) [email protected] [email protected] Rami Kantor Guha Jayachandran Division of Infectious Diseases W LeRoy Heinrichs Computer Science [email protected] Gynecology & Obstetrics/SUMMIT [email protected] [email protected] Fiona Kaper Michael Christopher Jewett Radiation Oncology Christopher John Hernandez Chemical Engineering [email protected] Mechanical Engineering [email protected] [email protected] Peter Kasson Xuhuai Ji Biophysics Program Catherine Hettinger Department of Medicine/Division of [email protected] eng Gastroenterology [email protected] [email protected] Brett Taketsugu Kawakami Civil and Environmental Carol A. Hill Audrey Jia Engineering CliniCon Protein Design Labs, Inc [email protected] [email protected] [email protected] David Kiang Tad Hogg Rong Jiang Medicine/Gastroenterology Xerox PARC computer science [email protected] [email protected] [email protected] Charlie Kim Karin Hollerbach, Ph.D. Jeremy Aaron Johnson Microbiology and Immunology [email protected] Electrical Engineering [email protected] [email protected] Bret Alan Holley Dong-Hyun Kim Biological Sciences Betsy Johnston Electrical Engineering [email protected] Infectious Diseases [email protected] [email protected] Bret Holley Teri E.
Klein Biological Sciences Keith Joho Stanford Medical Informatics [email protected] Abgenix, Inc. [email protected] [email protected] Lena Hong Uwe Klein [email protected] Sunghae Joo Advanced Medicine, Inc Nanogen, Inc. [email protected] Zachary Dolph Hornby [email protected] Biological Sciences Tod Klingler [email protected] Andy Kacsmar Prospect Genomics Computer Science [email protected] Brita Hornung [email protected] Anesthesia Pete Klosterman [email protected] Herbert Kaizer UC Berkeley / Lawrence Berkeley Medicine Lab Gabriel Howles [email protected] [email protected] Biology

Alex Kobler Biology Civil and Environmental GSB [email protected] Engineering [email protected] [email protected] Choonghyun Lee Robert Kohlenberger Computer Science Yueyi (Irene) Liu Applied Biosystems [email protected] Biomedical Informatics [email protected] [email protected] Ann Lee-Karlon Daphne Koller Business May Liu Computer Science [email protected] Biomechanics Division, Mechanical [email protected] Engineering Michael Levitt [email protected] Charlene Kon Structural Biology Developmental Biology [email protected] Xiaole Liu [email protected] Stanford Medical Informatics Kaijun Li [email protected] Christian Johannes Korth Pathology MS and E [email protected] Shuo Liu [email protected] Biomedical Informatics Wenchuan Liang [email protected] Kalpagam Kowsik Biochemistry Chabot College [email protected] Michael Liu [email protected] Biology/Computer Science Mike Hsin-Ping Liang [email protected] John R Koza Stanford Medical Informatics Stanford Medical Informatics [email protected] H.F. Machiel Van der Loos, PhD [email protected] Functional Restoration (Consulting DeYong Liang Assistant Professor) Lee G. Kozar Neurobiology [email protected] Bioinformatics Resource [email protected] [email protected] Rita Lopatin Yung S. Lie Cygnus, Inc. Aruna V. Krishnan Biological Sciences [email protected] Medicine/Endocrinology [email protected] [email protected]. Ann Loraine edu Jason Lih Neomorphic Genetics [email protected] Tom Krummel [email protected] Surgery Hui-Ling Lu [email protected] Min Chin Lim EE Biological Sciences [email protected] Christopher Kueny [email protected] Lawrence Livermore National Charity Yueh-chwen Lu Laboratory Connie Lin Computer Science [email protected] Biology [email protected] [email protected] David Kulp Walter Jaren Luh Neomorphic, Inc Zhen Lin Computer Science Medicine, Stanford Medical [email protected] Eric J. Kunkel Informatics Pathology [email protected] Terry S.
Desser M.D. [email protected] Radiology Richard Lin [email protected] Chuck Pui Lam Computer Science Electrical Engineering [email protected] Jiong Ma [email protected] Biology Yuhong Liu [email protected] Stefan Larson geological and environmental Biophysics science Gregory Marsden [email protected] [email protected] Computer Science [email protected] Ivy Ann Lee Michael Ming-Cheng Liu

Susana Martins Las Positas College VA Palo Alto Health Care System [email protected] Rakesh Nigam [email protected] Mathematics Dan Morris [email protected] Mary Mata Computer Science Nanogen, Inc. [email protected] Dave Nix, Ph.D. [email protected] Medicine Joseph M. Morris [email protected] John Charles Matese Affymetrix/Neomorphic Genetics [email protected] Nassim Nouri [email protected] Affymetrix Willy Moss [email protected] Alexander F. Mayer Microbiology and Immunology Affymetrix [email protected] Patrick John O'Brien [email protected] Biochemistry Yannick Moy [email protected] Frederic Mazzella CS National Biocomputation Center [email protected] Mary O'Connell [email protected] Biomechanical Engineering Lukas A Mueller [email protected] Scott E Mcphillips Carnegie Institution of Washington Stanford Synchrotron Radiation [email protected] Brian O'Connor Laboratory iScribe [email protected] Mark A Musen [email protected] Medicine (Medical Informatics) Craig Meyer [email protected] Assad Anshuman Oberai Electrical Engineering Mechanical Engineering [email protected] Ankur Nagaraja [email protected] Biology John Joseph Michon [email protected] Christine Olsson Biomedical Informatics Deltagen, Inc. [email protected] Rob Nail [email protected] Velocity11 Nesanet Senaite Mitiku [email protected] John G. Olyarchuk, MD Medicine, Genetics Medicine [email protected] Brad Nakatani [email protected] Chemistry Subhasish Mitra [email protected] Jessica Hammond Owens Electrical Engineering Cancer Bio [email protected] Girish Narayan [email protected] Cardiology Shannon Elizabeth Moffett [email protected] Ramesh Padala medicine CS [email protected] Rosa Ines Navarro [email protected] Biological Sciences Joshua Irving Molho [email protected] Rasmus Pagh Mechanical Engineering Computer Science [email protected] Krishna S.
Nayak [email protected] Electrical Engineering Ja Moon [email protected] Jodi Paik Cooley Godward LLP HRP [email protected] David Neale [email protected] Applied Biosystems Sean Mooney [email protected] David Paik Stanford Medical Informatics Stanford Medical Informatics [email protected] Kelvin Ming-Wei Neu [email protected] Immunology Edward William Moore [email protected] Chana Palmer Human Biology Genetics [email protected] Lan T. Nguyen [email protected] Graduate School of Business Katherine Moore [email protected]

Shyam N. Panchal School of Education [email protected] Cardiovascular Medicine [email protected] [email protected] Daniel Rubin, MD Prasanth Pulavarthi Stanford Medical Informatics Hoi-Cheung Pang Computer Science [email protected] [email protected] [email protected] Daniel B. Russakoff You-Wen Qian Computer Science Hyunsun Park Medicine [email protected] Iconix Pharmaceuticals, Inc. [email protected] [email protected] Michael G. Shulman Attila Racz Biomedical Consulting Aarati Parmar UCSF [email protected] Computer Science [email protected] [email protected] Pinkesh Sachdev Tanya M. Raschke Electrical Engg Phil Payne Structural Biology [email protected] Protein Design Labs [email protected] [email protected] Bauback Safa Rosalind M Ravasio School of Medicine Boris Peker Medicine (Medical Informatics) [email protected] Biophysics [email protected] [email protected] Khaled Nabil Salama Soumya Raychaudhuri EE Mor Peleg Medicine [email protected] Medicine [email protected] [email protected] Alok Jerome Saldanha Paul Reicherter, MD Genetics Kent Peterson Dermatology [email protected] SRI International [email protected] [email protected] Peter Salzman Leonore Reiser MS&E William Petitt The Arabidopsis Information [email protected] Biomedical Informatics Resource/Carnegie Institution of [email protected] Washington Ram Samudrala [email protected] Structural Biology Nicolas Peyret [email protected] Applied Biosystems Martin Axel Reznek [email protected] Surgery Brynnen Noelle Sandoval [email protected] Biology Hamid R Abbasi MD PhD [email protected] Neurosurgery Wito Richter [email protected] GYN/OB Kavita Yang Sarin [email protected] Medicine Jan Benjamin Pietzsch [email protected] Management Science and Gabriel del Rio Engineering The Buck Institute Serge Saxonov [email protected] [email protected] Biomedical Informatics [email protected] Zachary Pincus Adam Josef Rodriguez Biological Sciences [email
protected] Peter Leif Schilling [email protected] Rob Rogers [email protected] Elizabeth G. Loboa Polefka Business Mechanical Engineering [email protected] Thomas Michael Schmid [email protected] Business Jessica Ross [email protected] Murali Prakriya BMI Molecular and Cellular Physiology [email protected] George Christopher Scott [email protected] Biomedical Informatics Michael Ross [email protected] Carla Marie Pugh Computer Science

Eran Azriel Segal Arend Sidow [email protected] Computer Science Pathology and Genetics [email protected] [email protected] Daniel Steines Radiology Adam Seiver Mark Siegal [email protected] Surgery Biological Sciences [email protected] [email protected] Fredrik Sterky Department of Plant Biology, Peter K. Seperack Natalie Simmons Carnegie Institution of Washington Skjerven Morrill MacPherson, LLP [email protected] [email protected] [email protected] Alexander Simon Dr. Veronika Stoka Anand Sethuraman Pathology and Genetics, Program in Buck Institute Biochemistry Cancer Biology [email protected] [email protected] [email protected] Renee Patricia Stokowski Ross D Shachter Nita Singh Genetics Management Science and EE [email protected] Engineering [email protected] [email protected] Derek Stonich Rohit Singh [email protected] Maulik Kamlesh Shah Computer Science Computer Science [email protected] John David Storey [email protected] Statistics Department Amit P. Singh [email protected] John Sheehan Biomedical Informatics Affymetrix, Inc. [email protected] Joshua Michael Stuart [email protected] Biomedical Informatics Sheela Singla [email protected] Sandra Joan Shefelbine medicine Mechanical Engineering - [email protected] Ted Su Biomechanics Chemistry, Economics [email protected] Katharine Elise Skillern [email protected] Medicine Earl R. Shelton [email protected] Cenk Sumen Kowa Research Institute Microbiology & Immunology [email protected] David Alan Socks [email protected] GSB Smadar Shiffman [email protected] Ray-Hon Sun Psychiatry SCCM [email protected] Manoon Somrantin [email protected] Cardiovascular Medicine Hidetoshi Shimodaira [email protected] Patrick David Sutphin Department of Statistics Radiation Oncology [email protected] Ruchira Sood [email protected] Biochemistry Michael Randall Shirts [email protected] Alrik Suvari Chemistry Department Genentech, Inc.
[email protected] Alexis Sowa [email protected] Human Biology Eiketsu Sho [email protected] Srilatha R. Swami Vascular surgery Medicine/Endocrinology [email protected] Kunju Joshi Sridhar [email protected] Hematology John Shon [email protected] Jim Swartz Internal Medicine Chemical Engineering [email protected] Brooke Noelani Steele [email protected] Mechanical Engineering Jennifer Ann Shumilla [email protected] Michael Sykes Pediatrics Biophysics [email protected] Carl Steeves [email protected] Agilent Technologies

Yuichiro Takagi [email protected] structural biology Yi-shin Weng [email protected] Matthew Tsang health research and policy Biological Science [email protected] Mary Tang [email protected] Electrical Engineering Jason Brian Whitt [email protected] James Turner Business School Molecular Dynamics [email protected] Hua Tang [email protected] Statistics Sutanto Widjaja [email protected] Jonathan Andrew Usuka WineShopper.com chemistry [email protected] Ashish Tara [email protected] Graduate School of Business Gio Wiederhold [email protected] Priya Venkatesan CSD and Medicine Symbolic Systems and Biology [email protected] Benjamin M Taskar [email protected] Computer Science Eric P. Wilkinson [email protected] Hugo O Villar Image Guidance Laboratory, Telik, Inc. Department of Neurosurgery Charles Anthony Taylor [email protected] [email protected] Surgery [email protected] Jing Wan Marna Williams Petroleum Engineering Pathology Kavitha Thangavelu [email protected] [email protected] INCYTE GENOMICS [email protected] Justin Wan Kim Williams SCCM Program Biological Science Yvonne Thorstenson [email protected] [email protected] Stanford Genome Technology Center [email protected] Ping Wang Glenn Williams CS/EE/Bio Medical Informatics Rabin Tirouvanziam [email protected] [email protected] Psychology [email protected] Alfred Yu-Leen Wang Cyrus A.
Wilson Molecular Pharmacology Biochemistry Carlo Tomasi [email protected] [email protected] Computer Science [email protected] James Warren Lisa Wong Scientific Computing and Biophysics Simon Tong Computational Mathematics [email protected] Computer Science [email protected] [email protected] Stacey Woo Allison Waugh Human Biology Thodoros Topaloglou Computer Science [email protected] Gene Logic Inc [email protected] [email protected] Jim Wood Thomas Scott Wehrman Crosby, Heafey, Roach & May Lorenzo Torresani Molecular Pharmacology [email protected] Computer Science [email protected] [email protected] Kim Woodrow Silvia Weinberger Chemical Engineering Kristina Nikolova Toutanova San Jose State University [email protected] Computer Science [email protected] [email protected] Kristina Nicole Woods Jacqueline Nerney Welch biophysics Joseph D. Towles Medicine, Mechanical Engineering [email protected] Mechanical Engineering [email protected] [email protected] Dr. John Wooley Peizhong Wen University of California San Diego Olga G Troyanskaya Cardiovascular medicine [email protected] Biomedical Informatics [email protected]

Shu-Hsing Wu Jian Yang Medicine Carnegie Institution Iconix Pharmaceuticals, Inc [email protected] [email protected] [email protected] Hong Zhang Jenny Wu Iwei Yeh Genetics Ciphergen Biosystems, Inc Biomedical Informatics [email protected] [email protected] [email protected] Lu Zhang Yu Xia Krishna C Yeshwant Department of Medicine, Division of Department of Structural Biology Computer Science Hematology [email protected] [email protected] [email protected]

Qunong Xiao Golan Yona Jian Zhang Computer Science Dept. of Structural Biology Pherin Pharmaceuticals [email protected] [email protected] [email protected]

Wenzhong Xiao Elizabeth M Yu Kemin Zhou biochemistry [email protected] Neomorphic [email protected] [email protected] Ron Yu Yu Katherine Xu SCCM Ji Zhu Developmental Biology [email protected] Statistics [email protected] [email protected] Xiang Yu Chengpei Xu, MD, PhD psychiatry Feng Zhuge Surgery [email protected] Electrical Engineering [email protected] [email protected] Bojan Zagrovic Haobo Xu Biophysics Jenny H. Zou Applied Physics [email protected] medicine [email protected] [email protected] James Francis Zawada Sanae Yamada Chemical Engineering Graduate School of Business [email protected] [email protected] Shuli Zhang

SPONSOR PARTICIPANTS / CONTACTS

DoubleTwist Sun Microsystems Chris Campbell Jon Arikata [email protected] [email protected] Andrew Kasarkis Guidant James Hong InforMax [email protected] Dennis Bittner [email protected] Christopher Feezor [email protected] Andrew Cogill [email protected] Reid Hayashi [email protected] Tim O'Brien [email protected] Northern California Jim Dickey Pharmaceutical Discussion [email protected] Group Joel Haaf Eric Schuur [email protected] [email protected]

Jeffrey Flatgaard Incyte Genomics [email protected] Timothy Nelson [email protected] Genencor International Tofoi Yandal-Moore Molly B. Schmid Cindy Georgette [email protected] Karen Wood Rick Silvers Donald Naki [email protected]

GeneLogic Jian Yao Thodoros Topaloglou [email protected] [email protected]

Madhavan Ganesh Skjerven Morrill [email protected] MacPherson LLP Peter Seperack Kevin McLoughlin [email protected] [email protected] Christopher Allenby Krishna Papaniapan [email protected] [email protected] Gregory Powell SGI [email protected] Jeffrey Hausch [email protected] Signe Holmbeck [email protected]

SYMPOSIUM SPONSORS


Full Sponsors

· DoubleTwist
· Incyte
· InforMax
· GeneLogic
· Guidant
· SGI
· Sun Microsystems

Half Sponsors

· Genencor International
· Northern California Pharmaceutical Discussion Group
· Skjerven Morrill MacPherson LLP

Please refer to the end of the Participant List for individual contact information.


DoubleTwist.com (privately held) Vertical ASP www.doubletwist.com Oakland, CA

DoubleTwist is an application service provider (ASP) devoted to empowering life scientists. The company provides research environments that leverage information technology and the World Wide Web to simplify and accelerate genomic research.

The company’s leading product, DoubleTwist.com™, is a secure and comprehensive online research environment that enables life scientists to perform sophisticated genomic analysis without requiring bioinformatics expertise. Subscribers to DoubleTwist.com receive access to intelligent and automated analysis tools and advanced, interactive software for the visualization of their research results. In addition, DoubleTwist.com provides a number of resources that support life science research, as well as e-commerce functionality and value-added content of relevance to life scientists.

The technology platform underlying DoubleTwist.com integrates more than 25 disparate genomic databases. These databases include public databases, databases licensed from third parties and strategic partners, and the DoubleTwist, Inc. proprietary databases, such as the Annotated Human Genome Database and the Annotated Human Gene Index, which are created by processing and annotating the public genomic data. DoubleTwist has established several strategic relationships as a means of integrating additional features and content into DoubleTwist.com. Included in these strategic relationships are Derwent Information Ltd., Myriad Genetics, Inc., Molecular Simulations, Inc., Chemdex, BioTools Inc., and Eragen Biosciences, Inc.

Launched in January 2000, DoubleTwist.com is located in Oakland, California, with additional offices in Germany and Switzerland. Current DoubleTwist.com customers include Affymetrix, Inc., Bristol-Myers Squibb Company, Chiron Corporation, E.I. du Pont de Nemours and Company, Elan Pharmaceuticals, Hitachi Ltd., Merck & Co., Inc., Millennium Pharmaceuticals Inc. and Monsanto Company.

DoubleTwist is currently hiring for a variety of positions at our Oakland offices. Within a burgeoning new scientific discipline at the intersection of computer and life sciences, DoubleTwist is looking for people with experience in areas such as bioinformatics, molecular biology, chemistry, computer science, sales, and customer support. We offer the opportunity to join a leading-edge company in a field that is only beginning to take off.

For a complete listing of open positions, please log onto our website at: www.doubletwist.com. You may also e-mail your resume directly to DoubleTwist at: [email protected].

Since 1983 the NCPDG has provided a forum for the Bay Area pharmaceutical/biotechnology industry for development of the community and discussion of topics important to our industry. The NCPDG holds monthly dinner meetings that are attended by individuals representing every aspect of industry life and from nearly every pharmaceutical/biotechnology company in the Bay Area to hear talks on subjects ranging from genomics to contract manufacturing to financing company operations. Our unique combination of fellowship and education provides several material benefits to our members.

· Opportunities for effective networking
· Increased understanding of industry issues
· Expanded knowledge of various pharmaceutical/biotechnology businesses
· Self-Improvement and education

Your company can also take advantage of NCPDG involvement. Our membership spans the length and breadth of the pharmaceutical/biotechnology industry in the Bay Area, and we regularly have presenters and attendees from as far away as Europe and Japan. By becoming an NCPDG Sponsor, your company can gain increased visibility in the Bay Area and beyond while helping to support individual professional development. Benefits of sponsorship include:

· Name exposure on our printed materials, web site and e-mail distributions
· Tailored sponsorship opportunities
· Distribution of job announcements and other materials at NCPDG meetings
· Another way to help your employees develop professional and social skills

For more information on the NCPDG visit our web site or contact Ben Borson at (415) 362-3800, Eric Schuur at (650) 224-4178, or Helen Wang at (415) 922-3868.

WWW.NCPDG.ORG

www.skjerven.com

An Interdisciplinary Firm

For an Interdisciplinary Field

SKJERVEN MORRILL MACPHERSON LLP

A law firm serving high technology clients from offices in San Francisco, San Jose, Newport Beach, and Austin is pleased to sponsor BCATS - Biomedical Computation @ Stanford 2000

Genencor International, Inc. is proud to have provided sponsorship for the BCATS 2000 Symposium.

Genencor is a diversified biotechnology company that develops and delivers products into the health care, agriculture and industrial chemicals markets. Using an integrated set of technology platforms, our products deliver innovative and sustainable solutions to many of the problems of everyday life.

Find more information about us at: www.genencor.com