Data Science at OLCF

Total Page:16

File Type:pdf, Size:1020Kb

Data Science at OLCF Data Science at OLCF Bronson Messer Scientific Computing Group Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory Mallikarjam “Arjun” Shankar Group Leader – Advanced Data and Workflows Group Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory ORNL is managed by UT-Battelle, LLC for the US Department of Energy OLCF Data/Learning Strategy & Tactics 1. Engage with applications – Summit Early Science Applications (e.g., CANDLE) – INCITE projects (e.g., Co-evolutionary Networks: From Genome to 3D Proteome, Jacobson, et al.) – Directors Discretionary projects (e.g., Fusion RNN, MiNerva) 2. Create leadership-class analytics capabilities – Leadership analytics (e.g., Frameworks: pbdR, TensorFlow + Horovod) – Algorithms requiring scale (e.g., non-negative matrix factorization) 3. Enable infrastructure for analytics/AI and data-intensive facilities – Workflows to include data from observations for analysis within OLCF – Analytics enabling technologies (e.g., container deployments for rapidly changing DL/ML frameworks, analytics notebooks, etc.) 2 Data Science at the OLCF Applications Supported through DD/ALCC: Selected Machine Learning Projects on Titan: 2016-2017 Program PI PI Employer Project Name Allocation (Titan core-hrs) Discovering Optimal Deep Learning and Neuromorphic Network Structures using Evolutionary ALCC Robert Patton ORNL 75,000,000 Approaches on High Performance Computers ALCC Gabriel Perdue FNAL Large scale deep neural network optimization for neutrino physics 58,000,000 ALCC Gregory Laskowski GE High-Fidelity Simulations of Gas Turbine Stages for Model Development using Machine Learning 30,000,000 High-Throughput Screening and Machine Learning for Predicting Catalyst Structure and Designing ALCC Efthimions Kaxiras Harvard U. 17,500,000 Effective Catalysts ALCC Georgia Tourassi ORNL CANDLE Treatment Strategy Challenge for Deep Learning Enabled Cancer Surveillance 10,000,000 DD Abhinav Vishnu PNNL Machine Learning on Extreme Scale GPU systems 3,500,000 DD J. Travis Johnston ORNL Surrogate Based Modeling for Deep Learning Hyper-parameter Optimization 3,500,000 DD Robert Patton ORNL Scalable Deep Learning Systems for Exascale Data Analysis 6,500,000 DD William M. Tang PPPL Big Data Machine Learning for Fusion Energy Applications 3,000,000 DD Catherine Schuman ORNL Scalable Neuromorphic Simulators: High and Low Level 5,000,000 DD Boram Yoon LANL Artificial Intelligence for Collider Physics 2,000,000 DD Jean-Roch Vlimant Caltech HEP DeepLearning 2,000,000 DD Arvind Ramanathan ORNL ECP Cancer Distributed Learning Environment 1,500,000 DD John Cavazos U. Delaware Large-Scale Distributed and Deep Learning of Structured Graph Data for Real-Time Program Analysis 1,000,000 DD Abhinav Vishnu PNNL Machine Learning on Extreme Scale GPU systems 1,000,000 DD Gabriel Perdue FNAL MACHINE Learning for MINERvA 1,000,000 TOTAL 220,500,000 - Highlighted rows are Algorithm and Infrastructure Examples; Rest are Primarily Science Applications Acknowledgement: J. Wells, SC 2017 3 Data Science at the OLCF Gordon Bell Prizes in 2018: Peak Performance on Summit Attacking the Opioid Epidemic: Determining the Exascale Deep Learning for Climate Analytics Epistatic and Pleiotropic Genetic Architectures Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur for Chronic Pain and Opioid Addiction Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston, Wayne Joubert, Deborah Weighill, David Kainer, Sharlee Climer, Amy Justice, Kjiersten Fagnan, Daniel Jacobson 4 Data Science at the OLCF Methods: Leadership Data Analytics Capabilities • Engage parallel math libraries at scale • R language unchanged • New distributed concepts • New profiling capabilities • New interactive SPMD parallel • Significantly improved http://pbdr.org • In-situ distributed capability • In-situ staging capability via ADIOS throughput compared to, e.g., Apache Spark HPC libraries and their R/pbdR connections – “PCA of a 134 GB matrix: ‘hours on . Apache Spark, . less than a minute using R.’” – June 2016, HPCWire 5 Data Science at the OLCF Scaling Deep Learning for Science with ORNL’s MENNDL ORNL-designed algorithm leverages Titan to create high-performing neural networks Evolve d An image generated from neutrino scattering data captured by the MINERvA detector at Fermi National Accelerator Laboratory. Researchers are using MENNDL and the Titan supercomputer to generate deep neural networks that can classify high-energy physics data and improve the ORNL Data Analytics Group used Titan to develop an efficiencies of measurements. evolutionary algorithm to search for optimal hyper- parameters and topologies for ML networks. Steven R. Young, Derek C. Rose, Travis Johnston, William T. Heller, Thomas P. Karnowski, Thomas E. Potok, Robert M. Patton, Gabriel Perdue, and Jonathan Miller, “Evolving Deep Networks Using HPC.” In Proceedings of the Machine Learning on HPC Environments. Paper presented at The International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, Colorado (November 6 2017), doi: 10.1145/3146347.3146355. Data Science at the OLCF Key Data Science and Learning Methods PCA, K-Means, CORAL 2 Components etc. excel on Benchmark Suite “traditional HW” part of the node Big Data Analytic PCA, K-Means, and due to the node’s Suite (BDAS) SVM (based on memory, CPU, pbdR) and on-chip Deep Learning Suite CANDLE, CNN, bandwidth (DLS) RNN, and ResNet-50 (distributed memory) Deep Learning Codes (CNN; ResNet50; ..) excel here with NVM and GPUs enabling tensor operations. Code suites are in the CORAL2 (Collaboration of Oak Ridge, Argonne, Livermore laboratories) benchmark suite: https://asc.llnl.gov/coral-2-benchmarks/ 7 Data Science at the OLCF Weak Scaling of Data Benchmarks on Big Data Analytic Suite Titan 1000 Speedup Over Titan Baseline for CORAL-2 Big Data Benchmarks (based on pbdR) PCA K-Means SVM 41 100 1 node 2 nodes 4 nodes 36 Wall time (seconds) time Wall 10 16 32 64 128 31 # of nodes 26 Strong Scaling of Data Benchmarks on SummitDev 21 100 16 10 11 1 6 (seconds) time Wall 1 2 4 # of nodes PCA K-Means SVM H2O-PCA 1 H2O-K-Means PCA K-Means SVM PCA K-Means SVM PCA K-Means SVM SummitDev Summit 8 Data Science at the OLCF Deep Learning Suite Speedup Over Titan Baseline for CORAL-2 Deep Learning Benchmarks 15 x6 SummitDev Summit x6 10 x6 x4 x6 x4 x4 x4 5 x5.9 x6 x3.5 x4 0 CANDLE RNN CNN-googlenet CNN-vgg CNN-alexnet CNN-overfeat Strong Scaling of ResNet-50 on Summit 2000 Scaling of Resnet-50 based on Keras (Tensorflow backend) and 200 Horovod on ImageNet data actual ideal seconds/epoch 20 # of GPUs 9 24 48 96 192 384 Data Science at the OLCF Infrastructure - Cross-Facility Design Pattern From: Policy Considerations when Federating Facilities for Experimental and Observational Data Analysis, Mallikarjun (Arjun) Shankar, Suhas Somnath, Sadaf Alam, Derek Feichtinger, Leonardo Sala, and Jack Wells, (2018, Submitted Book Chapter) 10 Data Science at the OLCF CADES runs BEAM and Pycroscopy for SNS and CNMS CADES: Compute and Data Environment for Science Web and Supercomputing Data Server Interface/Visualization Tier Access from anywhere Compute Resource Logic Tier Parallel Data PHP Web Service Analysis Routines & Serial Data Analysis HTTPS Data Tier MySQL Data Database & Artifacts LOCAL Scientific Instrument Tier High Speed Supercomputing Tier CNMS Resources Secure Data Transfer Scanning Transmission Electron Microscopy (STEM) Scanning Probe Microscopy (SPM) 11 Lingerfelt et al., Procedia Computer Science 80, 2276-2280 (2016). Data Science at the OLCF Towards Data Service Offerings and Easier Data Access Across Facilities • Categories of Data Services – Type 1 data repository program for “data-only” projects. – Type 2 data services program for user communities. – Type 3 computational and data science end station program. • “DataFed” prototype to enable federated data access across facilities (currently being tested) 13 Data Services Program POC: Val Anantharaj; DataFed POC: Dale Stansberry Data Science at the OLCF.
Recommended publications
  • Sophie Voisin
    Sophie Voisin Contact Geographic Information Science & Technology Office: (865) 574-8235 Information Oak Ridge National Laboratory E-mail: [email protected] Oak Ridge, TN 37831-6017 Citizenship: France US Permanent Resident Biosketch Dr. Sophie Voisin received her PhD Degree in Computer Science and Image Processing from the University of Burgundy, France, in 2008. She was a visiting scholar with the Imaging, Robotics, and Intelligent Systems Laboratory at The University of Tennessee from October 2004 to December 2008 to work on her PhD research, and subsequently was involved in program development activities. In September 2010, she joined the Oak Ridge National Laboratory (ORNL) as a Postoctoral Reseach Associate to support various efforts related to applied signal processing, 2D and 3D image understanding, and high performance computing. Firstly, she worked at the ORNL Spallation Neutron Source performing quantitative analysis of neutron image data for various industrial and academic applications related to quality control, process monitoring, and to retrieve the structure of objects. Then, she joined the ORNL Biomedical Science & Engineering Center to develop on one hand image processing algorithms for eyegaze data analysis and on the other hand text processing techniques for social media data mining to correlate individuals' health history and their geographical exposure. Noteworthily she was part of the team that received a R&D 100 award for the developement of a personalized computer aid diagnostic system relying on eyegaze analysis for decision making. Since April 2014, she has been working for the Geographic Information Science & Technology group. Her research focuses on developing multispectral image processing algorithms for CPU and GPU platforms for high performance computing of satellite imagery.
    [Show full text]
  • Dr. Georgia Tourassi Director, Health Data Sciences Institute Group
    Dr. Georgia Tourassi Director, Health Data Sciences Institute Group Leader, Biomedical Science, Engineering and Computing Group Computing, Science, and Engineering Division Oak Ridge National Laboratory Dr. Georgia Tourassi is the founding Director of the Health Data Sciences Institute and Group Leader of Biomedical Sciences, Engineering, and Computing at the Oak Ridge National Laboratory (ORNL). Concurrently, she holds appointments as an Adjunct Professor of Radiology at Duke University and the University of Tennessee Graduate School of Medicine and as a joint UT-ORNL Professor of the Bredesen Center Data Sciences Program and Mechanical, Aerospace, and Biomedical Engineering at the University of Tennessee at Knoxville. Her research interests include artificial intelligence for biomedical applications, medical imaging, biomedical informatics, clinical decision support systems, digital epidemiology, and data-driven biomedical discovery. She served on the FDA Advisory Committee and Review Panel on Computer-Aided Diagnosis Devices, and she has served as appointed Charter member and ad hoc reviewer at several NIH Study sections. Her scholarly work has led to 11 US patents and innovation disclosures and more than 250 peer-reviewed journal articles, conference proceedings articles, editorials, and book chapters. Her research has been featured in numerous high-profile publications such as the MIT Science and Technology Review, Oncology Times and the Economist. Dr. Tourassi has served as Associate Editor of the scientific journals Radiology and Neurocomputing and as a Guest Associate Editor of Medical Physics and IEEE Journal of Biomedical and Health Informatics. She is elected Fellow of the American Institute of Medical and Biological Engineering (AIMBE), the American Association of Medical Physicists (AAPM) and the International Society for Optics and Photonics (SPIE).
    [Show full text]
  • Profiling Top Most Influential Blogger Using Synonym Substitution Approach
    VASANTHAKUMAR G U et al.: PTMIBSS: PROFILING TOP MOST INFLUENTIAL BLOGGER USING SYNONYM SUBSTITUTION APPROACH PTMIBSS: PROFILING TOP MOST INFLUENTIAL BLOGGER USING SYNONYM SUBSTITUTION APPROACH Vasanthakumar G.U1, Priyanka R, Vanitha Raj K.C, Bhavani S, Asha Rani B.R, P. Deepa Shenoy and Venugopal K. R. Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, India E-mail: [email protected] Abstract fulfill their needs this way is adversely affecting the human day- Users of Online Social Network (OSN) communicate with each other, to-day lives, relationships and even families. exchange information and spread rapidly influencing others in the In recent days online social network has gained huge attention network for taking various decisions. Blog sites allow their users to from businesses point of view, where industries consider knowing create and publish thoughts on various topics of their interest in the their audience as the key to their success. Since to remain form of blogs/blog documents, catching the attention and letting readers to perform various activities on them. Based on the content of competitive and noticed in the market is a need for industries and the blog documents posted by the user, they become popular. In this it has made them to engage in social networking which involves work, a novel method to profile Top Most Influential Blogger (TMIB) all age group people from various sectors. Many companies is proposed based on content analysis. Content of blog documents of provide various tools for their customers to engage and discuss bloggers under consideration in the blog network are compared and about their experience on their products, creating interactive analyzed.
    [Show full text]
  • MD Program.Fm
    Doctor of Medicine Program 24 Doctor of Medicine Program Mission Statement and the Medical Curriculum The mission of the Duke University School of Medicine is: To prepare students for excellence by first assuring the demonstration of defined core competencies. To complement the core curriculum with educational opportunities and advice regarding career planning which facilitates students to diversify their careers, from the physician-scientist to the primary care physician. To develop leaders for the twenty-first century in the research, education, and clinical practice of medicine. To develop and support educational programs and select and size a student body such that every student participates in a quality and relevant educational experi- ence. Physicians are facing profound changes in the need for understanding health, dis- ease, and the delivery of medical care changes which shape the vision of the medical school. These changes include: a broader scientific base for medical practice; a national crisis in the cost of health care; an increased number of career options for physicians yet the need for more generalists; an emphasis on career-long learning in investigative and clinical medicine; the necessity that physicians work cooperatively and effectively as leaders among other health care professionals; and the emergence of ethical issues not heretofore encountered by physicians. Medical educators must prepare physicians to respond to these changes.The most successful medical schools will position their stu- dents to take the lead addressing national health needs. Duke University School of Med- icine is prepared to meet this challenge by educating outstanding practitioners, physician scientists, and leaders. Continuing at the forefront of medical education requires more than educating Duke students in basic science, clinical research, and clinical programs for meeting the health care needs of society.
    [Show full text]
  • Use of Natural Language Processing to Extract Clinical Cancer Phenotypes From
    Author Manuscript Published OnlineFirst on August 8, 2019; DOI: 10.1158/0008-5472.CAN-19-0579 Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records Authors and Affiliations: Guergana K. Savova 1,2, Ioana Danciu 3, Folami Alamudun 3, Timothy Miller 1,2, Chen Lin 1, Danielle S. Bitterman 2,4, Georgia Tourassi 3, Jeremy L. Warner 5 1. Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 2. Harvard Medical School, Boston, MA 3. Oak Ridge National Lab, Knoxville, TN 4. Dana Farber Cancer Institute, Boston, MA 5. Vanderbilt University Medical Center, Nashville, TN Running title: Natural Language Processing for Cancer Phenotypes from EMRs Corresponding author: Guergana Savova, PhD, FACMI PI Natural Language Processing Lab Boston Children's Hospital and Harvard Medical School 300 Longwood Avenue Mailstop: BCH3092 Enders 144.1 Boston, MA 02115 Tel: (617) 919-2972 Fax: (617) 730-0817 [email protected] Conflict of Interest Statement: none Downloaded from cancerres.aacrjournals.org on September 26, 2021. © 2019 American Association for Cancer Research. Author Manuscript Published OnlineFirst on August 8, 2019; DOI: 10.1158/0008-5472.CAN-19-0579 Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Abstract Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for cancer patients. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment.
    [Show full text]
  • Using Case-Level Context to Classify Cancer Pathology Reports
    University of Kentucky UKnowledge Kentucky Cancer Registry Faculty Publications Cancer Registry 5-12-2020 Using Case-Level Context to Classify Cancer Pathology Reports Shang Gao Oak Ridge National Laboratory Mohammed Alawad Oak Ridge National Laboratory Noah Schaefferkoetter Oak Ridge National Laboratory Lynne Penberthy National Cancer Institute Xiao-Cheng Wu Louisiana State University See next page for additional authors Follow this and additional works at: https://uknowledge.uky.edu/kcr_facpub Part of the Computational Engineering Commons, Data Science Commons, and the Oncology Commons Right click to open a feedback form in a new tab to let us know how this document benefits ou.y Repository Citation Gao, Shang; Alawad, Mohammed; Schaefferkoetter, Noah; Penberthy, Lynne; Wu, Xiao-Cheng; Durbin, Eric B.; Coyle, Linda; Ramanathan, Arvind; and Tourassi, Georgia, "Using Case-Level Context to Classify Cancer Pathology Reports" (2020). Kentucky Cancer Registry Faculty Publications. 1. https://uknowledge.uky.edu/kcr_facpub/1 This Article is brought to you for free and open access by the Cancer Registry at UKnowledge. It has been accepted for inclusion in Kentucky Cancer Registry Faculty Publications by an authorized administrator of UKnowledge. For more information, please contact [email protected]. Using Case-Level Context to Classify Cancer Pathology Reports Digital Object Identifier (DOI) https://doi.org/10.1371/journal.pone.0232840 Notes/Citation Information Published in PLOS ONE, v. 15, no. 5, 0232840, p. 1-21. This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
    [Show full text]
  • I V Computer Aided Detection of Masses in Breast Tomosynthesis
    Computer Aided Detection of Masses in Breast Tomosynthesis Imaging Using Information Theory Principles by Swatee Singh Department of Biomedical Engineering Duke University Date:_______________________ Approved: ___________________________ Joseph Y. Lo, Ph.D., Supervisor ___________________________ Jay A. Baker, M.D. ___________________________ Kathryn R. Nightingale, Ph.D. ___________________________ Loren W. Nolte, Ph.D. ___________________________ Georgia D. Tourassi, Ph.D. Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biomedical Engineering in the Graduate School of Duke University 2008 i v ABSTRACT Computer Aided Detection of Masses in Breast Tomosynthesis Imaging Using Information Theory Principles by Swatee Singh Department of Biomedical Engineering Duke University Date:_______________________ Approved: ___________________________ Joseph Y. Lo, Ph.D., Supervisor ___________________________ Jay A. Baker, M.D. ___________________________ Kathryn R. Nightingale, Ph.D. ___________________________ Loren W. Nolte, Ph.D. ___________________________ Georgia D. Tourassi, Ph.D. An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biomedical Engineering in the Graduate School of Duke University 2008 i v Copyright by Swatee Singh 2008 Abstract Breast cancer screening is currently performed by mammography, which is limited by overlying anatomy and dense breast tissue. Computer aided detection (CADe) systems can serve as a double reader to improve radiologist performance. Tomosynthesis is a limited-angle cone-beam x-ray imaging modality that is currently being investigated to overcome mammography’s limitations. CADe systems will play a crucial role to enhance workflow and performance for breast tomosynthesis. The purpose of this work was to develop unique CADe algorithms for breast tomosynthesis reconstructed volumes.
    [Show full text]
  • Envisioning Computational Innovations for Cancer Challenges
    ENVISIONING COMPUTATIONAL INNOVATIONS FOR CANCER CHALLENGES: SCOPING MEETING March 6-7, 2019 Livermore Valley Open Campus PRESENTER BIOSKETCHES Livermore, California WELCOME AND ORIENTATION Name Emily Greenspan, Ph.D. Title Program Director Affiliation National Cancer Institute Email [email protected] Dr. Greenspan works as a biomedical informatics program director in the Center for Biomedical Informatics and Information Technology (CBIIT) at NCI. Her primary role is NCI program lead for the NCI/Department of Energy collaborations, including the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium and the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program. Prior to her role at CBIIT, Emily served as a program director for the NCI's Provocative Questions Initiative in the Center for Strategic Scientific Initiatives where she also helped organize and lead several of NCI's cancer technology focused programs. Emily's grad school and postdoctoral research focused on cancer chemoprevention and arachidonic acid metabolism. 3 | P a g e WELCOME AND ORIENTATION Name Carolyn Lauzon, Ph.D. Title Program Manager Affiliation US Department of Energy Office of Science Advanced Scientific Computing Research Email [email protected] Phone (301) 903-6139 Dr. Carolyn Vea Lauzon serves in the Department of Energy, Office of Science, Office of Advanced Scientific Computing Research. Carolyn is the Facilities Program Manager for the NERSC high performance computing facility at Lawrence Berkeley National Laboratory and the DOE Office of Science lead for DOE partnerships with the National Institutes of Health and the Department of Veterans Affairs. Before joining federal service, Carolyn completed her post doctoral fellowship in Computational Medical Image Analysis at Vanderbilt University and received her PhD in Molecular Biophysics from the Johns Hopkins School of Medicine.
    [Show full text]
  • Opportunities and Obstacles for Deep Learning in Biology and Medicine
    bioRxiv preprint doi: https://doi.org/10.1101/142760; this version posted January 19, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Opportunities and obstacles for deep learning in biology and medicine A DOI-citable preprint of this manuscript is available at https://doi.org/10.1101/142760. This manuscript was automatically generated from greenelab/deep-review@a01dd71 on January 19, 2018. Authors 1,☯ 2 3 4 Travers Ching , Daniel S. Himmelstein , Brett K. Beaulieu-Jones , Alexandr A. Kalinin , 5 2 6 7 2 Brian T. Do , Gregory P. Way , Enrico Ferrero , Paul-Michael Agapow , Michael Zietz , 8,9,10 11 12 13 Michael M. Hoffman , Wei Xie , Gail L. Rosen , Benjamin J. Lengerich , Johnny 14 15 12 16 Israeli , Jack Lanchantin , Stephen Woloszynek , Anne E. Carpenter , Avanti 17 18 19,20 21 Shrikumar , Jinbo Xu , Evan M. Cofer , Christopher A. Lavender , Srinivas C. 22 17 23 24 25 Turaga , Amr M. Alexandari , Zhiyong Lu , David J. Harris , Dave DeCaprio , 15 17,26 23 27 28 Yanjun Qi , Anshul Kundaje , Yifan Peng , Laura K. Wiley , Marwin H.S. Segler , 29 30 31 32,33,† Simina M. Boca , S. Joshua Swamidass , Austin Huang , Anthony Gitter , 2,† Casey S. Greene ☯ — Author order was determined with a randomized algorithm † — To whom correspondence should be addressed: [email protected] (A.G.) and [email protected] (C.S.G.) 1. Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI 2.
    [Show full text]
  • Hierarchical Neural Architectures for Classifying Cancer Pathology Reports
    University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Doctoral Dissertations Graduate School 12-2019 Hierarchical Neural Architectures for Classifying Cancer Pathology Reports Shang Gao University of Tennessee, [email protected] Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss Recommended Citation Gao, Shang, "Hierarchical Neural Architectures for Classifying Cancer Pathology Reports. " PhD diss., University of Tennessee, 2019. https://trace.tennessee.edu/utk_graddiss/5722 This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a dissertation written by Shang Gao entitled "Hierarchical Neural Architectures for Classifying Cancer Pathology Reports." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the equirr ements for the degree of Doctor of Philosophy, with a major in Computer Science. Georgia Tourassi, Major Professor We have read this dissertation and recommend its acceptance: Arvind Ramanathan, Hairong Qi, Russell Zaretzki Accepted for the Council: Dixie L. Thompson Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) Hierarchical Neural Architectures for Classifying Cancer Pathology Reports A Dissertation Presented for the Doctor of Philosophy Degree The University of Tennessee, Knoxville Shang Gao December 2019 c by Shang Gao, 2019 All Rights Reserved. ii This work is dedicated to Nala, my landlord's cat, who thinks she is the queen of the universe.
    [Show full text]
  • BHI 2016 Workshop Proposal
    BHI 2016 Workshop Proposal Type: Half day Georgia Tourassi, PhD - IEEE Member Georgia Tourassi is the Director of the Health Data Sciences Institute at the Oak Ridge National Laboratory and Adjunct Professor of Radiology at Duke University Medical Center and the University of Tennessee at Knoxville. She holds a Ph.D. in Biomedical Engineering from Duke University. She is a senior member of IEEE, INNS, and SPIE, Fellow of the American Institute of Medical and Biological Engineering (AIMBE) and the American Association of Physicists in Medicine (AAPM). Bradford Hesse, PhD Bradford W. Hesse (Brad) is Chief of the Health Communication and Informatics Research Branch at the National Cancer Institute (NCI). Dr. Hesse has spent most of his career working to improve the ways in which mediated communication environments can be used to improve decision making, enhance the user experience, influence group outcomes, and to support adaptive and healthy behaviors. He has authored or co-authored over 160 publications including peer-reviewed journal articles, technical reports, books, and book chapters. His latest edited book, titled “Oncology Informatics,” is scheduled for release by Elsevier Publishing in the spring of 2016. TITLE: Web-Based Public Health Informatics Theme : Behavior and Health Informatics Keywords (Max 5): web mining, population health, outcomes research, knowledge discovery Abstract— The World Wide Web has been transforming radically the landscape of medical research and healthcare delivery. Online sources including digital social networks have facilitated a wide range of novel applications in the public health domain, including disease surveillance, early-warning, and rapid response, monitoring of population health-related interests, sentiment, and behaviors, drug performance monitoring and others.
    [Show full text]
  • Organization Oak Ridge National Laboratory (ORNL)
    Opportunity Title: ORNL Deep Learning for Biomedical and Health Informatics Post-Master's Research Associate Opportunity Reference Code: ORNL17-48-CSED Organization Oak Ridge National Laboratory (ORNL) Reference Code ORNL17-48-CSED Description The Biomedical Science, Engineering, and Computing (BSEC) Group in the Computational Science and Engineering Division (CSED) at the Oak Ridge National Laboratory (ORNL) is seeking a Post-Master's Research Associate with a focus on deep learning for biomedical applications. We are seeking creative people with solid analytical capabilities, broad machine learning and statistical modeling skills, and strong programming experience. Our group is committed to conducting research in computational, data, and engineering sciences in the context of biomedical, clinical, and public health disciplines and applying this knowledge to support the nation’s leading health initiatives. Job Duties and Responsibilities • Research and develop new and novel methods in deep learning as applied to biomedical applications. • Collaborate with BSEC/CSED scientists in a wide variety of data analytics and architecture research. • Author peer reviewed papers, technical papers, reports and proposals for internal and external release and represent the organization by giving technical presentations in large public forums. • Be a part of a vibrant collaborative research environment which will provide the opportunity to perform cutting-edge research in high-performance computing, using ORNL’s Leadership Class Supercomputer. Technical Questions: Georgia Tourassi Tel: (865) 576-4829 Email: [email protected] Qualifications M.S. in Computer Science, Applied Mathematics, Statistics, Engineering or a closely related field. Previous research experience in deep learning methods. Preferred: • Solid theoretical foundation, algorithmic skills, and hands-on experience with deep learning, in the biomedical domain.
    [Show full text]