Text Mining of Mutations and Their Impact from Biomedical Literature

TEXT MINING OF MUTATIONS AND THEIR IMPACT FROM BIOMEDICAL LITERATURE

by A. S. M. Ashique Mahmood

A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

Fall 2018

© 2018 A. S. M. Ashique Mahmood. All Rights Reserved.

Approved: Kathleen F. McCoy, Ph.D., Chair of the Department of Computer and Information Sciences

Approved: Babatunde A. Ogunnaike, Ph.D., Dean of the College of Engineering

Approved: Douglas J. Doren, Ph.D., Interim Vice Provost for Graduate and Professional Education

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: Vijay K. Shanker, Ph.D., Professor in charge of dissertation

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: Cathy H. Wu, Ph.D., Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: Li Liao, Ph.D., Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: Peter McGarvey, Ph.D., Member of dissertation committee

ACKNOWLEDGEMENTS

This research work is the outcome of years of dedication and patience, but it would not have been possible without the support of many people around me.

First of all, I express my gratitude towards my advisor and mentor, Prof. Vijay K. Shanker. Throughout the research journey, his continuous advisement, mentoring and encouragement played an integral role in shaping this dissertation. In particular, Prof. Shanker taught me how to think critically about a research problem, how to write research papers effectively and how to present research in front of peers. These skills have helped me in my research in many ways, and I believe I will continue to benefit from them in my future career. I am truly grateful for all he has done for me.

I thank my dissertation committee members: Prof. Cathy Wu, Prof. Li Liao and Dr. Peter McGarvey. Despite their busy schedules, they were kind enough to serve on my dissertation committee and helped me with their suggestions and insights regarding the applicability of my research work. I am grateful for their invaluable time and attention towards this dissertation.

I have spent many wonderful years in the BioTM lab, where I came across wonderful minds who also played a role in shaping my research. I am thankful to Oana (Catalina Tudor) for mentoring me when I first joined the lab. She helped me get into the world of NLP research. I fondly remember former and present members of the BioTM lab: Gang Li, Yifan Peng, Samir Gupta, Ruoyao Ding, Jia Ren and Peng Su. We spent a lot of time together, be it for "research" or leisurely activities. We had fun together hacking on cool new tools as well as watching UCL matches. Thank you guys!

Since I moved to the USA, I have been lucky to have wonderful friends who were always there for me. It would take pages if I started listing why they are special to me. Instead, I just express my heartfelt gratitude to Farzana Khair, Musawir Chowdhury, Shermin Ashraf, Saif Tahsin, Samara Saif, Tareque Aziz, Firdous Saleheen, Purujit Saha, Laura Moum, Sonia Jahan, Fazle Rob, Mahfuzur Khan, Zannatun Noor, Rifat Lutful, Shafique Ahmed, Mithub Deb and Dabojani Das for their kind friendship. They all made me feel at home while away from home.

I thank my family and relatives for their unconditional love and support that shaped my entire life. My parents, Shaheen Sultana and Mahbubul Hoq, have always believed in me and encouraged me at every step of my academic journey. I cannot thank my parents enough for this. In particular, without the love, care and sacrifices of my mother, I would not be the person that I am today. Thank you mom!

And last but not least, I thank my wife Nancy (Tanjima Ferdous). We got married while I was a PhD student, and since then she has supported my PhD journey through unconditional love, sacrifices, encouragement and patience. She is the best partner and companion I could wish for. I love her and I am grateful for all she has done for me.

In addition, I would like to thank my department (CIS, UDEL) for this wonderful opportunity of graduate education as well as for the financial support at the beginning. I also thank the funding agencies who continued to fund the research projects that I was involved with. I am grateful to our research collaborators at Georgetown University, George Washington University, Delaware Biotechnology Institute (DBI) and University of Delaware for the countless fruitful discussions, from which I learned a lot.

In a nutshell, I am grateful to each and every one who supported my journey, in one way or another. To everyone I mentioned and forgot to mention, thank you.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1 INTRODUCTION
  1.1 Motivation
  1.2 Thesis contributions
    1.2.1 Mutation detection
    1.2.2 Mutation-disease association
    1.2.3 Impact of genomic anomalies on drug responses
    1.2.4 Mutation impact on PPI
  1.3 Outline of the dissertation

2 MUTATION DETECTION
  2.1 Introduction
  2.2 Related works
  2.3 Approach
    2.3.1 Mutation detection
    2.3.2 Genotype/Allele detection
    2.3.3 Mutation-gene association
  2.4 Evaluation
    2.4.1 Evaluation setup
    2.4.2 Evaluation metrics
  2.5 Results and discussion
    2.5.1 Results on mutation detection
    2.5.2 Results on mutation-gene association
  2.6 Conclusion

3 MUTATION-DISEASE ASSOCIATION
  3.1 Introduction
  3.2 Related works
  3.3 Approach
    3.3.1 General relation extraction system
    3.3.2 CAIR relations
    3.3.3 MF relations
    3.3.4 Statistical relations
    3.3.5 Co-occurrence in title/conclusion
    3.3.6 Extracting specific information
      3.3.6.1 Extracting mutations
      3.3.6.2 Extracting diseases
      3.3.6.3 Patient Context (PC) sentence
    3.3.7 Extracting additional information
      3.3.7.1 Rhetorical zones
      3.3.7.2 Patient related information
  3.4 System implementation
  3.5 Evaluation
    3.5.1 Evaluation setup
    3.5.2 Evaluation metrics
  3.6 Results and discussion
    3.6.1 Results on annotated datasets
    3.6.2 Full-scale processing
  3.7 Conclusion

4 IMPACT OF GENOMIC ANOMALIES ON DRUG RESPONSES
  4.1 Introduction
  4.2 Related works
  4.3 Approach
    4.3.1 Different information types
      4.3.1.1 Association
      4.3.1.2 Comparison
      4.3.1.3 Biomarker
      4.3.1.4 Sensitization
    4.3.2 Syntactic processing
    4.3.3 Entity recognition
    4.3.4 Typing of phrases
    4.3.5 Pattern matching
    4.3.6 Extracting specific information
      4.3.6.1 Extracting drugs
      4.3.6.2 Extracting diseases
    4.3.7 Extracting additional information
  4.4 System implementation
  4.5 Evaluation
    4.5.1 Evaluation setup
    4.5.2 Evaluation metrics
  4.6 Results and discussion
    4.6.1 Results on annotated datasets
  4.7 Conclusion

5 MUTATION IMPACT ON PROTEIN-PROTEIN INTERACTIONS
  5.1 Introduction
  5.2 Related works
  5.3 Approach
    5.3.1 Extraction of PPI relation

Recommended publications
  • BMC Bioinformatics Biomed Central

    BMC Bioinformatics (BioMed Central). Introduction. Open Access.
    Proceedings of the Second International Symposium for Semantic Mining in Biomedicine
    Sophia Ananiadou (1)* and Juliane Fluck (2)*
    Address: (1) School of Computer Science, National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester, Oxford Road, M13 9PL, Manchester, UK and (2) Fraunhofer Institute SCAI, Schloss Birlinghoven, 53754 St. Augustin, Germany
    * Corresponding authors
    From the Second International Symposium on Semantic Mining in Biomedicine (SMBM), Jena, Germany, 9–12 April 2006
    Published: 24 November 2006
    BMC Bioinformatics 2006, 7(Suppl 3):S1 doi:10.1186/1471-2105-7-S3-S1
    © 2006 Ananiadou and Fluck; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    Introduction
    With an overwhelming amount of biomedical knowledge recorded in texts, it is not surprising that there is so much interest in techniques which can identify, extract, manage, integrate and exploit this knowledge, and moreover discover new, hidden or unsuspected knowledge. For this reason, in the past five years, there has been an upsurge of

    • discovery of semantic relations between entities
    • event discovery

    The current limitations of using existing terminological and ontological resources such as the Gene Ontology, Swiss-Prot, Entrez Gene, UMLS, and Mesh etc.
  • AI and Bioinformatics

    AI Magazine Volume 25 Number 1 (2004) (© AAAI). Articles. Editorial Introduction.
    AI and Bioinformatics
    Janice Glasgow, Igor Jurisica, and Burkhard Rost

    ■ This article is an editorial introduction to the research discipline of bioinformatics and to the articles in this special issue. In particular, we address the issue of how techniques from AI can be applied to many of the open and complex problems of modern-day molecular biology.

    This special issue of AI Magazine focuses on some areas of research in bioinformatics that have benefited from applying AI techniques. Undoubtedly, bioinformatics is a truly interdisciplinary field: Although some researchers continuously affect wet labs in life science through collaborations or provision of tools, others are rooted in the theory departments of exact sciences (physics, chemistry, or engineering) or computer sciences. This wide variety creates many different perspectives and terminologies. One result of this Babel of lan-

    [...] modern-day biology is far more complex than suggested by the simplified sketch presented here. In fact, researchers in life sciences live off the introduction of new concepts; the discovery of exceptions; and the addition of details that usually complicate, rather than simplify, the overall understanding of the field.

    Possibly the most rapidly growing area of recent activity in bioinformatics is the analysis of microarray data. The article by Michael Molla, Michael Waddell, David Page, and Jude Shavlik (“Using Machine Learning to Design and Interpret Gene-Expression Microarrays”) introduces some background information and provides a comprehensive description of how techniques from machine learning can be used to help understand this high-dimensional and prolific gene-expression data.
  • Alliheedi Mohammed.Pdf (7.910Mb)

    Procedurally Rhetorical Verb-Centric Frame Semantics as a Knowledge Representation for Argumentation Analysis of Biochemistry Articles
    by Mohammed Alliheedi
    A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science
    Waterloo, Ontario, Canada, 2019
    © Mohammed Alliheedi 2019

    Examining Committee Membership
    External Examiner: Vlado Keselj, Professor, Faculty of Computer Science, Dalhousie University
    Supervisor(s): Robert E. Mercer, Professor, Dept. of Computer Science, The University of Western Ontario; Robin Cohen, Professor, School of Computer Science, University of Waterloo
    Internal Member: Jesse Hoey, Associate Professor, School of Computer Science, University of Waterloo
    Internal-External Member: Randy Harris, Professor, Dept. of English Language and Literature, University of Waterloo
    Other Member(s): Charles Clarke, Professor, School of Computer Science, University of Waterloo

    I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

    Abstract
    The central focus of this thesis is rhetorical moves in biochemistry articles. Kanoksilapatham has provided a descriptive theory of rhetorical moves that extends Swales' CARS model to the complete biochemistry article. The thesis begins the construction of a computational model of this descriptive theory. Attention is placed on the Methods section of the articles. We hypothesize that because authors' argumentation closely follows their experimental procedure, procedural verbs may be the guide to understanding the rhetorical moves. Our work proposes an extension to the normal (i.e., VerbNet) semantic roles especially tuned to this domain.
  • Are You an Invited Speaker? a Bibliometric Analysis of Elite Groups for Scholarly Events in Bioinformatics

    Are You an Invited Speaker? A Bibliometric Analysis of Elite Groups for Scholarly Events in Bioinformatics
    Senator Jeong, Sungin Lee, and Hong-Gee Kim
    Biomedical Knowledge Engineering Laboratory, Seoul National University, 28–22 YeonGeon Dong, Jongno Gu, Seoul 110–749, Korea. E-mail: {senator, sunginlee, hgkim}@snu.ac.kr

    Participating in scholarly events (e.g., conferences, workshops, etc.) as an elite-group member such as an organizing committee chair or member, program committee chair or member, session chair, invited speaker, or award winner is beneficial to a researcher’s career development. The objective of this study is to investigate whether elite-group membership for scholarly events is representative of scholars’ prominence, and which elite group is the most prestigious. We collected data about 15 global (excluding regional) bioinformatics scholarly events held in 2007. We sampled (via stratified random sampling) participants from elite groups in each event. Then, bibliometric indicators (total citations and h index) of seven elite groups and a non-elite group, consisting of authors who submitted at least one paper to an event but were

    [...] evaluation, but it would be hard to claim that they have provided comprehensive lists of evaluation measurements. This article aims not to provide such lists but to add to the current practices an alternative metric that complements existing performance measures to give a more comprehensive picture of scholars’ performance.

    By one definition (Jeong, 2008), a scholarly event is “a sequentially and spatially organized collection of scholars’ interactions with the intention of delivering and sharing knowledge, exchanging research ideas, and performing related activities.” As such, scholarly events are communication channels from which our new evaluation tool can draw its supporting evidence.
  • Syntactic Analyses and Named Entity Recognition for Pubmed and Pubmed Central — Up-To-The-Minute

    Syntactic analyses and named entity recognition for PubMed and PubMed Central — up-to-the-minute
    Kai Hakala (1,2)*, Suwisa Kaewphan (1,2,3)*, Tapio Salakoski (1,3) and Filip Ginter (1)
    1. Dept. of Information Technology, University of Turku, Finland
    2. The University of Turku Graduate School (UTUGS), University of Turku, Finland
    3. Turku Centre for Computer Science (TUCS), Finland

    Abstract
    Although advanced text mining methods specifically adapted to the biomedical domain are continuously being developed, their applications on large scale have been scarce. One of the main reasons for this is the lack of computational resources and workforce required for processing large text corpora. In this paper we present a publicly available resource distributing preprocessed biomedical literature including sentence splitting, tokenization, part-of-speech tagging, syntactic parses and named entity recognition.

    Various community efforts, mainly in the form of shared tasks, have resulted in steady improvement in biomedical text mining methods (Kim et al., 2009; Segura Bedmar et al., 2013). For instance the GENIA shared tasks focusing on extracting biological events, such as gene regulations, have consistently gathered wide interest and have led to the development of several text mining tools (Miwa et al., 2012; Björne and Salakoski, 2013). These methods have been also successfully applied on a large scale and several biomedical text mining databases are publicly available (Van Landeghem et al., 2013a; Franceschini et al., 2013; Müller et al., 2004). Although these resources exist, their number does not reflect the
  • Adding Value to Scholarly Communications Through Text Mining

    Adding Value to Scholarly Communications through Text Mining
    Enhancing User Experience of Scholarly Communication through Text Mining
    Sophia Ananiadou

    UK National Centre for Text Mining
    • first national text mining centre in the world (www.nactem.ac.uk)
    • Remit: Provision of text mining services to support UK research
    • Funded by
    • University of Manchester, collaboration with Tokyo

    From Text to Knowledge
    Applications, users and techniques

    Scholarly Communication Requirements
    • What is needed in the repositories
      – Annotation and curation assistance
        • Creation of metadata, consistent manner
      – Name authorities
        • Merging and mapping existing resources
        • Prediction lists based on named entity recognition
        • Disambiguation
      – Semantic metadata creation and enhancement

    Provision of semantic metadata to support search
    • Extraction of terms and named entities (names of people, organisations, diseases, genes, etc)
    • Discovery of concepts allows semantic annotation and enrichment of documents
      – Improves information access by going beyond index terms, enabling semantic querying
      – Improves clustering, classification of documents
    • Going a step further: extracting relationships, events from text
      – Enables even more advanced semantic applications

    Semantic metadata for whom?
    • end users
      – adds value to library content
      – allows enhanced searching functionalities
      – allows interaction with content, living document
    • automated content aggregators
      – access to data-driven, quality metadata derived from text
    • librarians
      – enhanced capability for semantic indexing, cross-referencing between Library collections and classification

    Terminology Services: TerMine
    • Identifies the most significant terms
    • Used as metadata
    • Suggests similar areas of interest
    • Refines index terms for document classification
    • Used for ontology building (Protégé TerMine plug-in)

    Semantic metadata: terms
    Term Based Applications
    Tag Cloud based on terms automatically extracted from the blog of BBSRC Chief Executive Professor Kell.
  • Text Mining for Biomedicine an Overview: Selected Bibliography

    Text Mining for Biomedicine an Overview: selected bibliography
    Sophia Ananiadou (a) & Yoshimasa Tsuruoka (b)
    University of Manchester (a), University of Tokyo (b), National Centre for Text Mining (a,b)
    http://www.nactem.ac.uk/

    (i) Overviews on Text Mining for Biomedicine

    Ananiadou, S. & J. McNaught (eds) (2006) Text Mining for Biology and Biomedicine, Artech House.

    MacMullen, W.J., and S.O. Denn, “Information Problems in Molecular Biology and Bioinformatics,” Journal of the American Society for Information Science and Technology, Vol. 56, No. 5, 2005, pp. 447-456.

    Lars Juhl Jensen, J. Saric and P. Bork (2006) "Literature mining for the biologist: from information retrieval to biological discovery", In Nature Reviews Genetics, Vol. 7, Feb. 2006, pp. 119-129.

    Blaschke, C., L. Hirschman, and A. Valencia, “Information Extraction in Molecular Biology,” Briefings in Bioinformatics, Vol. 3, No. 2, 2002, pp. 1-12.

    Cohen, A. M., and W. R. Hersh, “A Survey of Current Work in Biomedical Text Mining,” Briefings in Bioinformatics, Vol. 6, 2005, pp. 57-71.

    Nédellec, C., “Machine Learning for Information Extraction in Genomics: State of the Art and Perspectives.” In Text Mining and its Applications, pp. 99-118, S. Sirmakessis (ed.), Berlin: Springer-Verlag, Studies in Fuzziness and Soft Computing 138, 2004.

    Rebholz-Schuhmann, D., H. Kirsch, and F. Couto, “Facts from Text—Is Text Mining Ready to Deliver?” PLoS Biology, Vol. 3, No. 2, 2005, pp. 0188-0191, http://www.plosbiology.org, June 2005.

    Shatkay, H., and R. Feldman, “Mining the Biomedical Literature in the Genomic Era: An Overview,” Journal of Computational Biology, Vol. 10, No. 6, 2004, pp.
  • Dear Delegates,History of Productive Scientific Discussions of New Challenging Ideas and Participants Contributing from a Wide Range of Interdisciplinary fields

    3rd ISCB Student Council Symposium

    Welcome To The 3rd ISCB Student Council Symposium!
    Welcome to the Student Council Symposium 3 (SCS3) in Vienna. The ISCB Student Council's mission is to develop the next generation of computational biologists. We would like to thank and acknowledge our sponsors and the ISCB organisers for their crucial support. The SCS3 provides an exciting environment for active scientific discussions and the opportunity to learn vital soft skills for a successful scientific career. In addition, the SCS3 is the biggest international event targeted to students in the field of Computational Biology. We would like to thank our hosts and participants for making this event educative and fun at the same time.

    Student Council meetings have had a rich history of productive scientific discussions of new challenging ideas and participants contributing from a wide range of interdisciplinary fields. Such meetings have proved useful in providing students and postdocs innovative inputs and an increased network of potential collaborators. We are extremely excited to have you here and the vibrant city of Vienna welcomes you to our SCS3 event.

    Dear Delegates,
    We are very happy to welcome you all to the ISCB Student Council Symposium in Vienna. After the successful symposiums at ECCB 2005 in Madrid and at ISMB 2006 in Fortaleza we are determined to continue our efforts to provide an event for students and young researchers in the Computational Biology community. Like in previous years our intention is to create an opportunity for students to meet their peers from all over the world for exchange of ideas and networking.
  • Themes in Biomedical Natural Language Processing: Bionlp08

    Edinburgh Research Explorer
    Themes in biomedical natural language processing: BioNLP08
    Citation for published version: Demner-Fushman, D, Ananiadou, S, Cohen, KB, Pestian, J, Tsujii, J & Webber, BL 2008, 'Themes in biomedical natural language processing: BioNLP08', BMC Bioinformatics, vol. 9, no. S-11. https://doi.org/10.1186/1471-2105-9-S11-S1
    Digital Object Identifier (DOI): 10.1186/1471-2105-9-S11-S1
    Document Version: Publisher's PDF, also known as Version of record
    Published In: BMC Bioinformatics

    BMC Bioinformatics (BioMed Central). Research. Open Access.
    Themes in biomedical natural language processing: BioNLP08
    Dina Demner-Fushman (1)*, Sophia Ananiadou (2), K Bretonnel Cohen (3), John Pestian (4), Jun'ichi Tsujii (5) and Bonnie Webber (6)
    Address: (1) US National Library of Medicine, 8600 Rockville Pike, Bethesda,
  • Development and Analysis of NLP Pipelines in Argo

    Development and Analysis of NLP Pipelines in Argo
    Rafal Rak, Andrew Rowley, Jacob Carter, and Sophia Ananiadou
    National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess St, M1 7DN, Manchester, UK
    {rafal.rak,andrew.rowley,jacob.carter,sophia.ananiadou}@manchester.ac.uk

    Abstract
    Developing sophisticated NLP pipelines composed of multiple processing tools and components available through different providers may pose a challenge in terms of their interoperability. The Unstructured Information Management Architecture (UIMA) is an industry standard whose aim is to ensure such interoperability by defining common data structures and interfaces. The architecture has been gaining attention from industry and academia alike, resulting in a large volume of UIMA-compliant processing components. In this paper, we demonstrate Argo, a Web-based workbench for the development and processing of NLP pipelines/workflows.

    [...] in text is preceded by text segmentation, part-of-speech recognition, the recognition of named entities, and dependency parsing. Currently, the availability of such atomic processing components is no longer an issue; the problem lies in ensuring their compatibility, as combining components coming from multiple repositories, written in different programming languages, requiring different installation procedures, and having incompatible input/output formats can be a source of frustration and poses a real challenge for developers.

    Unstructured Information Management Architecture (UIMA) (Ferrucci and Lally, 2004) is a framework that tackles the problem of interoperability of processing components. Originally developed by IBM, it is currently an Apache Software Foundation open-source project that is also registered at the Organization for the Advance-
  • Improved Prediction of Protein Secondary Structure by Use Of

    Proc. Natl. Acad. Sci. USA, Vol. 90, pp. 7558-7562, August 1993. Biophysics.
    Improved prediction of protein secondary structure by use of sequence profiles and neural networks
    (protein structure prediction/multiple sequence alignment)
    BURKHARD ROST AND CHRIS SANDER
    Protein Design Group, European Molecular Biology Laboratory, D-6900 Heidelberg, Germany
    Communicated by Harold A. Scheraga, April 5, 1993

    ABSTRACT
    The explosive accumulation of protein sequences in the wake of large-scale sequencing projects is in stark contrast to the much slower experimental determination of protein structures. Improved methods of structure prediction from the gene sequence alone are therefore needed. Here, we report a substantial increase in both the accuracy and quality of secondary-structure predictions, using a neural-network algorithm. The main improvements come from the use of multiple sequence alignments (better overall accuracy), from "balanced training" (better prediction of β-strands), and from "structure context training" (better prediction of helix and strand lengths). This method, cross-validated on seven different test sets purged of sequence similarity to learning sets, achieves a three-state prediction accuracy of 69.7%, signi-

    [...] test set (7-fold cross-validation). The use of multiple cross-validation is an important technical detail in assessing performance, as accuracy can vary considerably, depending upon which set of proteins is chosen as the test set. For example, Salzberg and Cost (3) point out that the accuracy of 71.0% for the initial choice of test set drops to 65.1% "sustained" performance when multiple cross-validation is applied, i.e., when the results are averaged over several different test sets. We suggest the term sustained performance for results that have been multiply cross-validated. The importance of multiple cross-validation is underscored by the difference in accuracy of up to six percentage points between two test sets for the reference network (58.3-63.8%).