
Natural language processing
Week 03

Contents

1 Information extraction
   1.1 History
   1.2 Present significance
   1.3 Tasks and subtasks
   1.4 World Wide Web applications
   1.5 Approaches
   1.6 Free or open source software and services
   1.7 Commercial software and services
   1.8 See also
   1.9 References
   1.10 External links

2 Named-entity recognition
   2.1 Problem definition
      2.1.1 Formal evaluation
   2.2 Approaches
   2.3 Problem domains
   2.4 Current challenges and research
   2.5 Software
   2.6 See also
   2.7 References
   2.8 External links

3 Part-of-speech tagging
   3.1 Principle
   3.2 History
      3.2.1 The Brown Corpus
      3.2.2 Use of hidden Markov models
      3.2.3 Dynamic programming methods
      3.2.4 Unsupervised taggers
      3.2.5 Other taggers and methods
   3.3 Issues
   3.4 See also
   3.5 References
   3.6 External links

4 Phrase chunking
   4.1 See also
   4.2 External links

5 Relationship extraction
   5.1 Applications
   5.2 Approaches
   5.3 See also
   5.4 References

6 Sentence boundary disambiguation
   6.1 Strategies
   6.2 Software
   6.3 See also
   6.4 References
   6.5 External links

7 Shallow parsing
   7.1 References
   7.2 External links
   7.3 See also

8 Stemming
   8.1 Examples
   8.2 History
   8.3 Algorithms
      8.3.1 The production technique
      8.3.2 Suffix-stripping algorithms
      8.3.3 Lemmatisation algorithms
      8.3.4 Stochastic algorithms
      8.3.5 n-gram analysis
      8.3.6 Hybrid approaches
      8.3.7 Affix stemmers
      8.3.8 Matching algorithms
   8.4 Language challenges
      8.4.1 Multilingual stemming
   8.5 Error metrics
   8.6 Applications
      8.6.1 Information retrieval
      8.6.2 Domain analysis
      8.6.3 Use in commercial products
   8.7 See also
   8.8 References
   8.9 Further reading
   8.10 External links

9 Text segmentation
   9.1 Segmentation problems
      9.1.1 Word segmentation
      9.1.2 Sentence segmentation
      9.1.3 Topic segmentation
      9.1.4 Other segmentation problems
   9.2 Automatic segmentation approaches
   9.3 See also
   9.4 References
   9.5 External links

10 Tokenization (lexical analysis)
   10.1 Methods and obstacles
   10.2 Software
   10.3 See also
   10.4 References

11 Parsing
   11.1 Human languages
      11.1.1 Traditional methods
      11.1.2 Computational methods
      11.1.3 Psycholinguistics
   11.2 Computer languages
      11.2.1 Parser
      11.2.2 Overview of process
   11.3 Types of parsers
   11.4 Parser development software
   11.5 Lookahead
   11.6 See also
   11.7 References
   11.8 Further reading
   11.9 External links

12 Parse tree
   12.1 Constituency-based parse trees
   12.2 Dependency-based parse trees
   12.3 Phrase markers
   12.4 See also
   12.5 Notes
   12.6 References
   12.7 External links

13 Constituent (linguistics)
   13.1 Constituency tests
      13.1.1 Topicalization (fronting)
      13.1.2 Clefting
      13.1.3 Pseudoclefting
      13.1.4 Pro-form substitution (replacement)
      13.1.5 Answer ellipsis (answer fragments, question test)
      13.1.6 Passivization
      13.1.7 Omission (deletion)
      13.1.8 Coordination
   13.2 Constituency tests and disambiguation
   13.3 Competing theories
   13.4 See also
   13.5 Notes
   13.6 References

14 Dependency grammar
   14.1 History
   14.2 Dependency vs. constituency
   14.3 Dependency grammars
   14.4 Representing dependencies
   14.5 Types of dependencies
      14.5.1 Semantic dependencies
      14.5.2 Morphological dependencies
      14.5.3 Prosodic dependencies
      14.5.4 Syntactic dependencies
   14.6 Linear order and discontinuities
   14.7 Syntactic functions
   14.8 See also
   14.9 Notes
   14.10 References
   14.11 External links

15 Phrase structure grammar
   15.1 Constituency relation
   15.2 Dependency relation
   15.3 Non-descript grammars
   15.4 See also
   15.5 Notes
   15.6 References

16 Verb phrase
   16.1 Verb phrases in phrase structure grammars
   16.2 Verb phrases in dependency grammars
   16.3 Verb phrases narrowly defined
   16.4 See also
   16.5 Notes
   16.6 References

17 Information retrieval
   17.1 Overview
   17.2 History
   17.3 Model types
      17.3.1 First dimension: mathematical basis
      17.3.2 Second dimension: properties of the model
   17.4 Performance and correctness measures
      17.4.1 Precision
      17.4.2 Recall
      17.4.3 Fall-out
      17.4.4 F-score / F-measure
      17.4.5 Average precision
      17.4.6 Precision at K
      17.4.7 R-Precision
      17.4.8 Mean average precision
      17.4.9 Discounted cumulative gain
      17.4.10 Other measures
      17.4.11 Visualization
   17.5 Timeline
   17.6 Awards in the field
   17.7 Leading IR Research Groups
   17.8 See also
   17.9 References
   17.10 Further reading
   17.11 External links

18 Vector space model
   18.1 Definitions
   18.2 Applications
   18.3 Example: tf-idf weights
   18.4 Advantages
   18.5 Limitations
   18.6 Models based on and extending the vector space model
   18.7 Software that implements the vector space model
      18.7.1 Free open source software
   18.8 Further reading
   18.9 See also
   18.10 References

19 tf–idf
   19.1 Motivation
      19.1.1 Term frequency
      19.1.2 Inverse document frequency
   19.2 Definition
      19.2.1 Term frequency
      19.2.2 Inverse document frequency
      19.2.3 Term frequency–inverse document frequency
   19.3 Justification of idf
   19.4 Example of tf–idf
   19.5 tf–idf beyond terms
   19.6 tf–idf derivates
   19.7 See also
   19.8 References
   19.9 External links and suggested reading

20 Synonym
   20.1 Examples
   20.2 See also
   20.3 References
   20.4 External links

21 Relevance
   21.1 Definition
   21.2 Epistemology
   21.3 Relevance logic
   21.4 Application
      21.4.1 Politics
      21.4.2 Economics
      21.4.3 Cognitive science and pragmatics
      21.4.4 Law
      21.4.5 Library and information science
   21.5 See also
   21.6 References
   21.7 External links

22 Library and information science
   22.1 Relations between library science, information science and LIS
   22.2 Difficulties defining LIS
      22.2.1 A multidisciplinary, interdisciplinary or monodisciplinary field?
      22.2.2 A fragmented adhocracy
      22.2.3 Scattering of the literature
   22.3 The unique concern of library and information science
   22.4 LIS theories
   22.5 Journals
   22.6 Conferences
   22.7 Common subfields
   22.8 See also
   22.9 References
   22.10 Further reading

23 Relevance (information retrieval)
   23.1 History
   23.2 Evaluation
   23.3 Clustering and relevance
   23.4 Problems and alternatives
   23.5 References
   23.6 Additional reading

24 Web search engine
   24.1 History
   24.2 How web search engines work
   24.3 Market share
      24.3.1 East Asia and Russia
   24.4 Search engine bias
   24.5 Customized results and filter bubbles
   24.6 Christian, Islamic and Jewish search engines
   24.7 Search engine submission
   24.8 See also
   24.9 References
   24.10 Further reading
   24.11 External links
   24.12 Text and image sources, contributors, and licenses
      24.12.1 Text
      24.12.2 Images
      24.12.3 Content license

Chapter 1

Information extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio and video, can also be seen as information extraction. Due to the difficulty of the problem, current approaches to IE focus on narrowly restricted domains. An example is the extraction from newswire reports of corporate mergers, such as denoted by the formal relation:

MergerBetween(company1, company2, date) from an online news sentence such as:

“Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp.”

A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context. Information extraction is part of a greater puzzle which deals with the problem of devising automatic methods for text management, beyond its transmission, storage and display. The discipline of information retrieval (IR)[1] has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents. The complementary approach of natural language processing (NLP) has modelled human language processing with considerable success, given the magnitude of the task. In terms of both difficulty and emphasis, IE deals with tasks in between IR and NLP. In terms of input, IE assumes the existence of a set of documents in which each document follows a template, i.e. describes one or more entities or events in a manner that is similar to those in other documents but differs in the details. As an example, consider a group of newswire articles on American terrorism, with each article presumed to be based upon one or more terroristic acts. For any given IE task, we also define a template, which is a case frame (or set of case frames) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terroristic act, and the date on which the event happened. An IE system for this problem is required to “understand” an attack article only enough to find data corresponding to the slots in this template.
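To make the template idea concrete, here is a minimal Python sketch, not from the original text: the pattern, helper name, and the way the date is supplied are illustrative assumptions, standing in for the far more robust machinery a real IE system would use.

    import re

    # Hypothetical single-pattern extractor for the
    # MergerBetween(company1, company2, date) relation described above.
    COMPANY = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"
    PATTERN = re.compile(
        r"(?P<company1>" + COMPANY + r") announced (?:their|its) acquisition of "
        r"(?P<company2>" + COMPANY + r")")

    def extract_merger(sentence, date):
        # Fill the template's slots if the hand-written pattern matches.
        match = PATTERN.search(sentence)
        if match is None:
            return None
        return ("MergerBetween", match.group("company1"), match.group("company2"), date)

    print(extract_merger(
        "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp.",
        "yesterday"))
    # -> ('MergerBetween', 'Foo Inc.', 'Bar Corp.', 'yesterday')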

1.1 History

Information extraction dates back to the late 1970s in the early days of NLP.[2] An early commercial system from the mid-1980s was JASPER built for Reuters by the Carnegie Group with the aim of providing real-time financial news to financial traders.[3] Beginning in 1987, IE was spurred by a series of Message Understanding Conferences. MUC is a competition-based conference that focused on the following domains:


• MUC-1 (1987), MUC-2 (1989): Naval operations messages.
• MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.
• MUC-5 (1993): Joint ventures and microelectronics domain.
• MUC-6 (1995): News articles on management changes.
• MUC-7 (1998): Satellite launch reports.

Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), who wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism.

1.2 Present significance

The present significance of IE pertains to the growing amount of information available in unstructured form. Tim Berners-Lee, inventor of the World Wide Web, refers to the existing Internet as the web of documents[4] and advocates that more of the content be made available as a web of data.[5] Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by means of transformation into relational form, or by marking up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted.[6]

1.3 Tasks and subtasks

Applying information extraction to text is linked to the problem of text simplification, in order to create a structured view of the information present in free text. The overall goal is to create text that is more easily machine-readable for processing the sentences. Typical subtasks of IE include:

• Named entity extraction, which could include:
   • Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims to detect entities without having any existing knowledge about the entity instances. For example, in processing the sentence “M. Smith likes fishing”, named entity detection would denote detecting that the phrase “M. Smith” does refer to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or, “might be”) the specific person whom that sentence is talking about.
   • Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously-extracted named entities. For example, “International Business Machines” and “IBM” refer to the same real-world entity. If we take the two sentences “M. Smith likes fishing. But he doesn't like biking”, it would be beneficial to detect that “he” is referring to the previously detected person “M. Smith”.
   • Relationship extraction: identification of relations between entities, such as:
      • PERSON works for ORGANIZATION (extracted from the sentence “Bill works for IBM.”)
      • PERSON located in LOCATION (extracted from the sentence “Bill is in France.”)
• Semi-structured information extraction, which may refer to any IE that tries to restore some kind of information structure that has been lost through publication, such as:
   • Table extraction: finding and extracting tables from documents.
   • Comment extraction: extracting comments from the actual content of an article in order to restore the link between the author of each sentence and its content.

• Language and vocabulary analysis
   • Terminology extraction: finding the relevant terms for a given corpus
• Audio extraction
   • Template-based music extraction: finding relevant characteristics in an audio signal taken from a given repertoire; for instance, time indexes of occurrences of percussive sounds can be extracted in order to represent the essential rhythmic component of a music piece.[7]

Note that this list is not exhaustive, that the exact meaning of IE activities is not commonly accepted, and that many approaches combine multiple subtasks of IE in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE. IE on non-text documents is an increasingly researched topic, and information extracted from multimedia documents can now be expressed in a high-level structure as is done for text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources.

1.4 World Wide Web applications

IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for developing IE systems that help people cope with the enormous amount of data that is available online. Systems that perform IE from online text should meet the requirements of low cost, flexibility in development and easy adaptation to new domains. MUC systems fail to meet those criteria. Moreover, linguistic analysis performed for unstructured text does not exploit the HTML/XML tags and layout formats that are available in online text. As a result, less linguistically intensive approaches have been developed for IE on the Web using wrappers, which are sets of highly accurate rules that extract a particular page’s content. Manually developing wrappers has proved to be a time-consuming task, requiring a high level of expertise. Machine learning techniques, either supervised or unsupervised, have been used to induce such rules automatically.
Wrappers typically handle highly structured collections of web pages, such as product catalogs and telephone directories. They fail, however, when the text type is less structured, which is also common on the Web. Recent effort on adaptive information extraction motivates the development of IE systems that can handle different types of text, from well-structured to almost free text (where common wrappers fail), including mixed types. Such systems can exploit shallow natural language knowledge and thus can also be applied to less structured text.
A recent development is Visual Information Extraction,[8][9] which relies on rendering a webpage in a browser and creating rules based on the proximity of regions in the rendered web page. This helps in extracting entities from complex web pages that may exhibit a visual pattern, but lack a discernible pattern in the HTML source code.
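A hand-written wrapper of the kind described above can be sketched in a few lines of Python; the HTML fragment and field names below are invented for illustration, and real pages would of course require more forgiving rules.

    import re

    # Invented fragment of a highly structured product-catalog page.
    HTML = '''
    <tr><td class="name">Widget A</td><td class="price">$9.99</td></tr>
    <tr><td class="name">Widget B</td><td class="price">$4.50</td></tr>
    '''

    # The wrapper: a highly specific extraction rule tied to this page's layout.
    ROW = re.compile(
        r'<td class="name">(?P<name>[^<]+)</td><td class="price">(?P<price>[^<]+)</td>')

    records = [m.groupdict() for m in ROW.finditer(HTML)]
    print(records)
    # -> [{'name': 'Widget A', 'price': '$9.99'}, {'name': 'Widget B', 'price': '$4.50'}]

The brittleness is the point: the rule breaks as soon as the page layout changes, which is why learning such rules automatically (wrapper induction) is attractive.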

1.5 Approaches

Three standard approaches are now widely accepted:

• Hand-written regular expressions (perhaps stacked)
• Using classifiers
   • Generative: naïve Bayes classifier
   • Discriminative: maximum entropy models such as multinomial logistic regression
• Sequence models
   • Hidden Markov model (HMM)
   • Conditional Markov model (CMM) / maximum-entropy Markov model (MEMM)
   • Conditional random fields (CRF), which are commonly used in conjunction with IE for tasks as varied as extracting information from research papers[10] to extracting navigation instructions.[11]

Numerous other approaches exist for IE, including hybrid approaches that combine some of the standard approaches previously listed.
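The following is a toy sketch, not from the original article, of the generative (naïve Bayes) classifier route listed above; the training pairs and word-shape features are invented for illustration.

    import math
    from collections import Counter, defaultdict

    # Toy training data: (token, is_entity_token) pairs.
    TRAIN = [("Foo", True), ("Inc.", True), ("Bar", True), ("Corp.", True),
             ("announced", False), ("their", False), ("acquisition", False), ("of", False)]

    def features(token):
        return ["capitalized" if token[:1].isupper() else "lowercase",
                "has_period" if "." in token else "no_period"]

    # Count class and feature occurrences.
    class_counts = Counter()
    feat_counts = defaultdict(Counter)
    for token, label in TRAIN:
        class_counts[label] += 1
        for f in features(token):
            feat_counts[label][f] += 1
    vocab = {f for token, _ in TRAIN for f in features(token)}

    def predict(token):
        # Generative scoring: argmax over classes of log P(c) + sum_f log P(f | c),
        # with add-one smoothing over the feature vocabulary.
        scores = {}
        for c in class_counts:
            logp = math.log(class_counts[c] / sum(class_counts.values()))
            total = sum(feat_counts[c].values())
            for f in features(token):
                logp += math.log((feat_counts[c][f] + 1) / (total + len(vocab)))
            scores[c] = logp
        return max(scores, key=scores.get)

    print(predict("Acme"), predict("bought"))  # -> True False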

1.6 Free or open source software and services

• General Architecture for Text Engineering (GATE), which is bundled with a free information extraction system

• OpenNLP Apache OpenNLP is a Java machine learning toolkit for natural language processing

• OpenCalais Automated information extraction web service from Thomson Reuters (Free limited version)

• Machine Learning for Language Toolkit (Mallet) is a Java-based package for a variety of natural language processing tasks, including information extraction.

• DBpedia Spotlight is an open source tool in Java/Scala (and free web service) that can be used for named entity recognition and name resolution.

• Natural Language Toolkit (NLTK) is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language

• See also CRF implementations

1.7 Commercial software and services

• IBM Watson[12][13]

• Wolfram Language[12][14]

1.8 See also

• AI effect

• Applications of artificial intelligence

• DARPA TIPSTER Program

• Faceted search

• Named entity recognition

• Nutch

• Semantic

• Text mining

• Web scraping

Lists

• List of emerging technologies

• Outline of artificial intelligence

1.9 References

[1] Freitag, Dayne. “Machine Learning for Information Extraction in Informal Domains” (PDF). Kluwer Academic Publishers, 2000. Printed in The Netherlands.

[2] Andersen, Peggy M.; Hayes, Philip J.; Huettner, Alison K.; Schmandt, Linda M.; Nirenburg, Irene; Weinstein, Steven P. “Automatic Extraction of Facts from Press Releases to Generate News Stories”. CiteSeerX 10.1.1.14.7943.

[3] Cowie, Jim; Wilks, Yorick. “Information Extraction”. CiteSeerX 10.1.1.61.6480 .

[4] “Linked Data - The Story So Far” (PDF).

[5] “Tim Berners-Lee on the next Web”.

[6] R. K. Srihari, W. Li, C. Niu and T. Cornell, “InfoXtract: A Customizable Intermediate Level Information Extraction Engine”, Journal of Natural Language Engineering, Cambridge U. Press, 14(1), 2008, pp. 33–69.

[7] A. Zils, F. Pachet, O. Delerue and F. Gouyon, “Automatic Extraction of Drum Tracks from Polyphonic Music Signals”, Proceedings of WedelMusic, Darmstadt, Germany, 2002.

[8] Chenthamarakshan, Vijil; Desphande, Prasad M; Krishnapuram, Raghu; Varadarajan, Ramakrishnan; Stolze, Knut. “WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Information Extraction”. arXiv:1506.08454.

[9] Baumgartner, Robert; Flesca, Sergio; Gottlob, Georg. “Visual Web Information Extraction with Lixto”. CiteSeerX 10.1.1.21.8236 .

[10] Peng, F.; McCallum, A. (2006). “Information extraction from research papers using conditional random fields”. Information Processing & Management. 42 (4): 963. doi:10.1016/j.ipm.2005.09.002.

[11] Shimizu, Nobuyuki; Hass, Andrew (2006). “Extracting Frame-based Knowledge Representation from Route Instructions” (PDF).

[12] Jiang, Jing (2012). “Information Extraction from Text” (PDF). Ohio State University Department of Statistics. Retrieved July 13, 2016.

[13] “IBM Watson Information”. IBM. Retrieved July 13, 2016.

[14] “Wolfram Data Framework: Take Data and Make It Meaningful”. www.wolfram.com. Retrieved 2016-07-13.

1.10 External links

• MUC
• ACE (LDC)

• ACE (NIST)
• Alias-I “competition” page: a listing of academic and industrial toolkits for natural language information extraction
• Gabor Melli’s page on IE: detailed description of the information extraction task

• CRF++: Yet Another CRF toolkit
• A Survey of Web Information Extraction Systems: a comprehensive survey

• A multilingual corpus of news annotated with event information

Chapter 2

Named-entity recognition

Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. Most research on NER systems has been structured as taking an unannotated block of text, such as this one:

Jim bought 300 shares of Acme Corp. in 2006.

And producing an annotated block of text that highlights the names of entities:

[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.

In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified. State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.[1][2]
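Annotated output like the example above is commonly serialized token by token in the IOB (inside/outside/beginning) scheme used by the CoNLL shared tasks. A small sketch of the conversion; the helper name, span encoding, and tag abbreviations are illustrative assumptions:

    def spans_to_iob(tokens, spans):
        # spans: list of (start, end_exclusive, entity_type) over token indexes.
        tags = ["O"] * len(tokens)
        for start, end, etype in spans:
            tags[start] = "B-" + etype          # first token of the entity
            for i in range(start + 1, end):
                tags[i] = "I-" + etype          # continuation tokens
        return list(zip(tokens, tags))

    tokens = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "in", "2006", "."]
    print(spans_to_iob(tokens, [(0, 1, "PER"), (5, 7, "ORG"), (8, 9, "TIME")]))
    # -> [('Jim', 'B-PER'), ('bought', 'O'), ('300', 'O'), ('shares', 'O'), ('of', 'O'),
    #     ('Acme', 'B-ORG'), ('Corp.', 'I-ORG'), ('in', 'O'), ('2006', 'B-TIME'), ('.', 'O')]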

2.1 Problem definition

In the expression named entity, the word named restricts the task to those entities for which one or many rigid designators, as defined by Kripke, stand for the referent. For instance, the automotive company created by Henry Ford in 1903 is referred to as Ford or Ford Motor Company. Rigid designators include proper names as well as terms for certain biological species and substances.[3] Full named-entity recognition is often broken down, conceptually and possibly also in implementations,[4] into two distinct problems: detection of names, and classification of the names by the type of entity they refer to (e.g. person, organization, location and other[5]). The first phase is typically simplified to a segmentation problem: names are defined to be contiguous spans of tokens, with no nesting, so that “Bank of America” is a single name, disregarding the fact that inside this name, the substring “America” is itself a name. This segmentation problem is formally similar to chunking. Temporal expressions and some numerical expressions (i.e., money, percentages, etc.) may also be considered as named entities in the context of the NER task. While some instances of these types are good examples of rigid designators (e.g., the year 2001) there are also many invalid ones (e.g., I take my vacations in “June”). In the first case, the year 2001 refers to the 2001st year of the Gregorian calendar. In the second case, the month June may refer to the month of an undefined year (past June, next June, June 2020, etc.). It is arguable that the named entity definition is loosened in such cases for practical reasons. The definition of the term named entity is therefore not strict and often has to be explained in the context in which it is used.[6] Certain hierarchies of named entity types have been proposed in the literature. BBN categories, proposed in 2002, is used for question answering and consists of 29 types and 64 subtypes.[7] Sekine’s extended hierarchy, proposed in 2002, is made of 200 subtypes.[8] More recently, in 2011 Ritter used a hierarchy based on common Freebase entity types in ground-breaking experiments on NER over social media text.[9]

2.1.1 Formal evaluation

To evaluate the quality of a NER system’s output, several measures have been defined. While accuracy on the token level is one possibility, it suffers from two problems: the vast majority of tokens in real-world text are not part of entity names as usually defined, so the baseline accuracy (always predict “not an entity”) is extravagantly high, typically >90%; and mispredicting the full span of an entity name is not properly penalized (finding only a person’s first name when their last name follows is scored as ½ accuracy). In academic conferences such as CoNLL, a variant of the F1 score has been defined as follows:[5]

• Precision is the number of predicted entity name spans that line up exactly with spans in the gold standard evaluation data. I.e. when [Person Hans] [Person Blick] is predicted but [Person Hans Blick] was required, precision for the predicted name is zero. Precision is then averaged over all predicted entity names.

• Recall is similarly the number of names in the gold standard that appear at exactly the same location in the predictions.

• F1 score is the harmonic mean of these two.

It follows from the above definition that any prediction that misses a single token, includes a spurious token, or has the wrong class, is a hard error and does not contribute to either precision or recall. Evaluation models based on a token-by-token matching have been proposed.[10] Such models are able to handle also partially overlapping matches, yet fully rewarding only exact matches. They allow a finer grained evaluation and comparison of extraction systems, taking into account also the degree of mismatch in non-exact predictions.
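A minimal sketch of this exact-match scoring in Python; the function name and span encoding are assumptions for the example:

    def span_f1(gold, predicted):
        # gold, predicted: sets of (start, end, type) spans. Exact match only, so any
        # boundary or type mismatch is a hard error, as described above.
        true_pos = len(gold & predicted)
        precision = true_pos / len(predicted) if predicted else 0.0
        recall = true_pos / len(gold) if gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    gold = {(0, 2, "PER")}                 # [Person Hans Blick]
    pred = {(0, 1, "PER"), (1, 2, "PER")}  # [Person Hans] [Person Blick]
    print(span_f1(gold, pred))             # -> (0.0, 0.0, 0.0): both predictions are hard errors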

2.2 Approaches

NER systems have been created that use linguistic grammar-based techniques as well as statistical models, i.e. machine learning. Hand-crafted grammar-based systems typically obtain better precision, but at the cost of lower recall and months of work by experienced computational linguists. Statistical NER systems typically require a large amount of manually annotated training data. Semisupervised approaches have been suggested to avoid part of the annotation effort.[11][12] Many different classifier types have been used to perform machine-learned NER, with conditional random fields being a typical choice.[13]

2.3 Problem domains

Research indicates that even state-of-the-art NER systems are brittle, meaning that NER systems developed for one domain do not typically perform well on other domains.[14] Considerable effort is involved in tuning NER systems to perform well in a new domain; this is true for both rule-based and trainable statistical systems. Early work in NER systems in the 1990s was aimed primarily at extraction from journalistic articles. Attention then turned to processing of military dispatches and reports. Later stages of the automatic content extraction (ACE) evaluation also included several types of informal text styles, such as weblogs and text transcripts from conversational telephone speech conversations. Since about 1998, there has been a great deal of interest in entity identification in the molecular biology, bioinformatics, and medical natural language processing communities. The most common entity of interest in that domain has been names of genes and gene products. There has been also considerable interest in the recognition of chemical entities and drugs in the context of the CHEMDNER competition, with 27 teams participating in this task.[15]

2.4 Current challenges and research

Despite the high F1 numbers reported on the MUC-7 dataset, the problem of named entity recognition is far from being solved. The main efforts are directed at reducing annotation labor by employing semi-supervised learning,[11][16] robust performance across domains,[17][18] and scaling up to fine-grained entity types.[8][19] In recent years, many projects have turned to crowdsourcing, which is a promising solution to obtain high-quality aggregate human judgments for supervised and semi-supervised machine learning approaches to NER.[20] Another challenging task is devising models to deal with linguistically complex contexts such as Twitter and search queries.[21] Some researchers have compared NER performance across different statistical models, such as HMM (hidden Markov model), ME (maximum entropy), and CRF (conditional random fields), and feature sets.[22] Others have recently proposed graph-based semi-supervised learning models for language-specific NER tasks.[23] A recently emerging task of identifying “important expressions” in text and cross-linking them to Wikipedia[24][25][26] can be seen as an instance of extremely fine-grained named entity recognition, where the types are the actual Wikipedia pages describing the (potentially ambiguous) concepts. Below is an example output of a Wikification system:

Michael Jordan is a professor at Berkeley

2.5 Software

• GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API
• OpenNLP includes rule-based and statistical named-entity recognition
• Stanford University also has the Stanford Named Entity Recognizer
• Baleen, a framework for rule-based and statistical named-entity and relationship extraction
• Cogcomp-NER, a state-of-the-art NER tagger that tags plain text with an 18-label type set (based on the OntoNotes corpus). It uses gazetteers extracted from Wikipedia, word class models derived from unlabeled text, and expressive non-local features.

2.6 See also

• Coreference resolution
• Entity linking (aka named entity normalization, entity disambiguation)
• Information extraction
• Knowledge extraction
• Controlled vocabulary
• Onomastics
• Record linkage
• Smart tag (Microsoft)

2.7 References

[1] Elaine Marsh, Dennis Perzanowski, “MUC-7 Evaluation of IE Technology: Overview of Results”, 29 April 1998 PDF

[2] MUC-07 Proceedings (Named Entity Tasks)

[3] Nadeau, David; Sekine, Satoshi (2007). A survey of named entity recognition and classification (PDF). Lingvisticae Investigationes.

[4] Carreras, Xavier; Màrquez, Lluís; Padró, Lluís (2003). A simple named entity extractor using AdaBoost. CoNLL.

[5] Tjong Kim Sang, Erik F.; De Meulder, Fien (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. CoNLL.

[6] Named Entity Definition. Webknox.com. Retrieved on 2013-07-21.

[7] Brunstein, Ada. “Annotation Guidelines for Answer Types”. LDC Catalog. Linguistic Data Consortium. Retrieved 21 July 2013.

[8] Sekine’s Extended Named Entity Hierarchy. Nlp.cs.nyu.edu. Retrieved on 2013-07-21.

[9] Ritter, A.; Clark, S.; Mausam; Etzioni, O. (2011). Named Entity Recognition in Tweets: An Experimental Study (PDF). Proc. Empirical Methods in Natural Language Processing.

[10] Esuli, Andrea; Sebastiani, Fabrizio (2010). Evaluating Information Extraction (PDF). Cross-Language Evaluation Forum (CLEF). pp. 100–111.

[11] Lin, Dekang; Wu, Xiaoyun (2009). Phrase clustering for discriminative learning (PDF). Annual Meeting of the ACL and IJCNLP. pp. 1030–1038.

[12] Nothman, Joel; et al. (2013). “Learning multilingual named entity recognition from Wikipedia”. Artificial Intelligence. 194: 151–175. doi:10.1016/j.artint.2012.03.006.

[13] Jenny Rose Finkel; Trond Grenager; Christopher Manning (2005). Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling (PDF). 43rd Annual Meeting of the Association for Computational Linguistics. pp. 363–370.

[14] Poibeau, Thierry; Kosseim, Leila (2001). “Proper Name Extraction from Non-Journalistic Texts”. Language and Computers. 37 (1): 144–157.

[15] Krallinger, M; Leitner, F; Rabal, O; Vazquez, M; Oyarzabal, J; Valencia, A. “Overview of the chemical and drug name recognition (CHEMDNER) task”. Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, vol. 2. pp. 6–37.

[16] Turian, J., Ratinov, L., & Bengio, Y. (2010, July). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 384–394). Association for Computational Linguistics. PDF

[17] Ratinov, L., & Roth, D. (2009, June). Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 147–155). Association for Computational Linguistics.

[18] Frustratingly Easy Domain Adaptation.

[19] Fine-Grained Named Entity Recognition Using Conditional Random Fields for Question Answering.

[20] Web 2.0-based crowdsourcing for high-quality gold standard development in clinical Natural Language Processing.

[21] Eiselt, Andreas; Figueroa, Alejandro (2013). A Two-Step Named Entity Recognizer for Open-Domain Search Queries. IJCNLP. pp. 829–833.

[22] Han, Li-Feng Aaron; Wong, Fai; Chao, Lidia Sam (2013). Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics. Proceedings of the International Conference of Language Processing and Intelligent Information Systems. M.A. Klopotek et al. (Eds.): IIS 2013, LNCS Vol. 7912, pp. 57–68.

[23] Han, Li-Feng Aaron; Zeng, Xiaodong; Wong, Derek Fai; Chao, Lidia Sam (2015). Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model. In Proceedings of the SIGHAN workshop at ACL-IJCNLP 2015.

[24] Linking Documents to Encyclopedic Knowledge.

[25] Learning to link with Wikipedia.
[26] Local and Global Algorithms for Disambiguation to Wikipedia.

26. Abedini, Farhad; Mahmoudi, Fariborz; Jadidinejad, Amir Hossein. “From text to knowledge: Semantic entity extraction using ontology.” International Journal of Machine Learning and Computing 1.2 (2011): 113.

27. Abedini, Farhad; Mahmoudi, Fariborz; Mirhashem, Seyedeh Masoumeh. “Using Semantic Entity Extraction Method for a New Application.” International Journal of Machine Learning and Computing 2.2 (2012): 178.

2.8 External links

• Named entity recognition for Arabic - Issues and challenges in morphologically rich languages such as Arabic

• CoNLL Language-independent NER shared tasks (2002) and (2003): NER data sets and methods for Spanish, Dutch, English and German

• Chemical compound and drug name recognition - Community challenge on the recognition of chemical compound and drug entity mentions in text

Chapter 3

Part-of-speech tagging

In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill’s tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms.

3.1 Principle

Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken. This is not rare—in natural languages (as opposed to many artificial languages), a large percentage of word-forms are ambiguous. For example, even “dogs”, which is usually thought of as just a plural noun, can also be a verb:

The sailor dogs the hatch.

Correct grammatical tagging will reflect that “dogs” is here used as a verb, not as the more common plural noun. Grammatical context is one way to determine this; semantic analysis can also be used to infer that “sailor” and “hatch” implicate “dogs” as 1) in the nautical context and 2) an action applied to the object “hatch” (in this context, “dogs” is a nautical term meaning “fastens (a watertight door) securely”). Schools commonly teach that there are 9 parts of speech in English: noun, verb, article, adjective, preposition, pronoun, adverb, conjunction, and interjection. However, there are clearly many more categories and sub-categories. For nouns, the plural, possessive, and singular forms can be distinguished. In many languages words are also marked for their "case" (role as subject, object, etc.), grammatical gender, and so on; while verbs are marked for tense, aspect, and other things. Linguists distinguish parts of speech to various fine degrees, reflecting a chosen “tagging system”. In part-of-speech tagging by computer, it is typical to distinguish from 50 to 150 separate parts of speech for English. For example, NN for singular common nouns, NNS for plural common nouns, NP for singular proper nouns (see the POS tags used in the Brown Corpus). Work on stochastic methods for tagging Koine Greek (DeRose 1990) has used over 1,000 parts of speech, and found that about as many words were ambiguous there as in English. A morphosyntactic descriptor in the case of morphologically rich languages is commonly expressed using very short mnemonics, such as 'Ncmsan' for Category=Noun, Type=common, Gender=masculine, Number=singular, Case=accusative, Animate=no.

3.2 History


3.2.1 The Brown Corpus

Research on part-of-speech tagging has been closely tied to corpus linguistics. The first major corpus of English for computer analysis was the Brown Corpus, developed at Brown University by Henry Kučera and W. Nelson Francis in the mid-1960s. It consists of about 1,000,000 words of running English prose text, made up of 500 samples from randomly chosen publications. Each sample is 2,000 or more words (ending at the first sentence-end after 2,000 words, so that the corpus contains only complete sentences).
The Brown Corpus was painstakingly “tagged” with part-of-speech markers over many years. A first approximation was done with a program by Greene and Rubin, which consisted of a huge handmade list of what categories could co-occur at all. For example, article then noun can occur, but article verb (arguably) cannot. The program got about 70% correct. Its results were repeatedly reviewed and corrected by hand, and later users sent in errata, so that by the late 70s the tagging was nearly perfect (allowing for some cases on which even human speakers might not agree).
This corpus has been used for innumerable studies of word frequency and of part of speech, and inspired the development of similar “tagged” corpora in many other languages. Statistics derived by analyzing it formed the basis for most later part-of-speech tagging systems, such as CLAWS and VOLSUNGA. However, by this time (2005) it had been superseded by larger corpora such as the 100-million-word British National Corpus.
For some time, part-of-speech tagging was considered an inseparable part of natural language processing, because there are certain cases where the correct part of speech cannot be decided without understanding the semantics or even the pragmatics of the context. This is extremely expensive, especially because analyzing the higher levels is much harder when multiple part-of-speech possibilities must be considered for each word.

3.2.2 Use of hidden Markov models

In the mid-1980s, researchers in Europe began to use hidden Markov models (HMMs) to disambiguate parts of speech, when working to tag the Lancaster-Oslo-Bergen Corpus of British English. HMMs involve counting cases (such as from the Brown Corpus), and making a table of the probabilities of certain sequences. For example, once you've seen an article such as 'the', perhaps the next word is a noun 40% of the time, an adjective 40%, and a number 20%. Knowing this, a program can decide that “can” in “the can” is far more likely to be a noun than a verb or a modal. The same method can of course be used to benefit from knowledge about following words. More advanced (“higher order”) HMMs learn the probabilities not only of pairs, but triples or even larger sequences. So, for example, if you've just seen a noun followed by a verb, the next item may be very likely a preposition, article, or noun, but much less likely another verb. When several ambiguous words occur together, the possibilities multiply. However, it is easy to enumerate every combination and to assign a relative probability to each one, by multiplying together the probabilities of each choice in turn. The combination with highest probability is then chosen. The European group developed CLAWS, a tagging program that did exactly this, and achieved accuracy in the 93–95% range. It is worth remembering, as Eugene Charniak points out in Statistical techniques for natural language parsing (1997),[1] that merely assigning the most common tag to each known word and the tag "proper noun" to all unknowns will approach 90% accuracy because many words are unambiguous. CLAWS pioneered the field of HMM-based part of speech tagging, but was quite expensive since it enumerated all possibilities. It sometimes had to resort to backup methods when there were simply too many options (the Brown Corpus contains a case with 17 ambiguous words in a row, and there are words such as “still” that can represent as many as 7 distinct parts of speech (DeRose 1990, p. 82)). HMMs underlie the functioning of stochastic taggers and are used in various algorithms, one of the most widely used being the bi-directional inference algorithm.[2]
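The "enumerate every combination" idea can be made concrete in a few lines of Python; the transition and emission probabilities below are invented in the spirit of the figures quoted above, not estimates from a real corpus.

    from itertools import product

    # Toy model: P(tag | previous tag) and P(word | tag), invented for illustration.
    transition = {("START", "DET"): 0.8, ("DET", "NOUN"): 0.4,
                  ("DET", "VERB"): 0.1, ("DET", "MODAL"): 0.01}
    emission = {("the", "DET"): 0.6, ("can", "NOUN"): 0.01,
                ("can", "VERB"): 0.005, ("can", "MODAL"): 0.2}
    tags_for = {"the": ["DET"], "can": ["NOUN", "VERB", "MODAL"]}

    def best_sequence(words):
        # Score every tag combination by multiplying the probability of each choice.
        best, best_p = None, 0.0
        for tags in product(*(tags_for[w] for w in words)):
            p, prev = 1.0, "START"
            for word, tag in zip(words, tags):
                p *= transition.get((prev, tag), 0.0) * emission.get((word, tag), 0.0)
                prev = tag
            if p > best_p:
                best, best_p = tags, p
        return best

    print(best_sequence(["the", "can"]))  # -> ('DET', 'NOUN'): "can" after "the" is a noun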

3.2.3 Dynamic programming methods

In 1987, Steven DeRose[3] and Ken Church[4] independently developed dynamic programming algorithms to solve the same problem in vastly less time. Their methods were similar to the Viterbi algorithm, known for some time in other fields. DeRose used a table of pairs, while Church used a table of triples and a method of estimating the values for triples that were rare or nonexistent in the Brown Corpus (actual measurement of triple probabilities would require a much larger corpus). Both methods achieved accuracy over 95%. DeRose’s 1990 dissertation at Brown University included analyses of the specific error types, probabilities, and other related data, and replicated his work for Greek, where it proved similarly effective.

These findings were surprisingly disruptive to the field of natural language processing. The accuracy reported was higher than the typical accuracy of very sophisticated algorithms that integrated part-of-speech choice with many higher levels of linguistic analysis: syntax, morphology, semantics, and so on. CLAWS, DeRose’s and Church’s methods did fail for some of the known cases where semantics is required, but those proved negligibly rare. This convinced many in the field that part-of-speech tagging could usefully be separated out from the other levels of processing; this in turn simplified the theory and practice of computerized language analysis, and encouraged researchers to find ways to separate out other pieces as well. Markov models are now the standard method for part-of-speech assignment.
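A minimal Viterbi tagger in this spirit, again over an invented toy model: the dynamic-programming trellis keeps, for each position and tag, only the best-scoring path so far, instead of enumerating all combinations.

    def viterbi(words, tags_for, transition, emission):
        # trellis[i][tag] = (best probability of any tag sequence for words[:i+1]
        # ending in tag, backpointer to the previous tag on that best path)
        trellis = [{tag: (transition.get(("START", tag), 0.0)
                          * emission.get((words[0], tag), 0.0), None)
                    for tag in tags_for[words[0]]}]
        for i in range(1, len(words)):
            column = {}
            for tag in tags_for[words[i]]:
                e = emission.get((words[i], tag), 0.0)
                p, prev = max(((trellis[i - 1][pt][0] * transition.get((pt, tag), 0.0) * e, pt)
                               for pt in trellis[i - 1]), key=lambda x: x[0])
                column[tag] = (p, prev)
            trellis.append(column)
        tag = max(trellis[-1], key=lambda t: trellis[-1][t][0])
        path = [tag]
        for i in range(len(words) - 1, 0, -1):  # follow backpointers
            tag = trellis[i][tag][1]
            path.append(tag)
        return path[::-1]

    tags_for = {"the": ["DET"], "can": ["NOUN", "VERB", "MODAL"], "rusts": ["VERB", "NOUN"]}
    transition = {("START", "DET"): 0.8, ("DET", "NOUN"): 0.4, ("DET", "VERB"): 0.1,
                  ("DET", "MODAL"): 0.01, ("NOUN", "VERB"): 0.3, ("VERB", "VERB"): 0.05,
                  ("MODAL", "VERB"): 0.5, ("NOUN", "NOUN"): 0.05, ("VERB", "NOUN"): 0.2,
                  ("MODAL", "NOUN"): 0.05}
    emission = {("the", "DET"): 0.6, ("can", "NOUN"): 0.01, ("can", "VERB"): 0.005,
                ("can", "MODAL"): 0.2, ("rusts", "VERB"): 0.1, ("rusts", "NOUN"): 0.01}
    print(viterbi(["the", "can", "rusts"], tags_for, transition, emission))
    # -> ['DET', 'NOUN', 'VERB']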

3.2.4 Unsupervised taggers

The methods already discussed involve working from a pre-existing corpus to learn tag probabilities. It is, however, also possible to bootstrap using “unsupervised” tagging. Unsupervised tagging techniques use an untagged corpus for their training data and produce the tagset by induction. That is, they observe patterns in word use, and derive part-of-speech categories themselves. For example, statistics readily reveal that “the”, “a”, and “an” occur in similar contexts, while “eat” occurs in very different ones. With sufficient iteration, similarity classes of words emerge that are remarkably similar to those human linguists would expect; and the differences themselves sometimes suggest valuable new insights. These two categories can be further subdivided into rule-based, stochastic, and neural approaches.

3.2.5 Other taggers and methods

Some current major algorithms for part-of-speech tagging include the Viterbi algorithm, Brill tagger, Constraint Grammar, and the Baum-Welch algorithm (also known as the forward-backward algorithm). Hidden Markov model and visible Markov model taggers can both be implemented using the Viterbi algorithm. The rule-based Brill tagger is unusual in that it learns a set of rule patterns, and then applies those patterns rather than optimizing a statistical quantity. Unlike the Brill tagger, where the rules are ordered sequentially, the POS and morphological tagging toolkit RDRPOSTagger stores rules in the form of a ripple-down rules tree. Many machine learning methods have also been applied to the problem of POS tagging. Methods such as SVM, maximum entropy classifier, perceptron, and nearest-neighbor have all been tried, and most can achieve accuracy above 95%. A direct comparison of several methods is reported (with references) at the ACL Wiki.[5] This comparison uses the Penn tag set on some of the Penn Treebank data, so the results are directly comparable. However, many significant taggers are not included (perhaps because of the labor involved in reconfiguring them for this particular dataset). Thus, it should not be assumed that the results reported there are the best that can be achieved with a given approach; nor even the best that have been achieved with a given approach. A more recent development is using the structure regularization method for part-of-speech tagging, achieving 97.36% on the standard benchmark dataset.[6]

3.3 Issues

While there is broad agreement about basic categories, a number of edge cases make it difficult to settle on a single “correct” set of tags, even in a single language such as English. For example, it is hard to say whether “fire” is an adjective or a noun in

the big green fire truck

A second important example is the use/mention distinction, as in the following example, where “blue” could be replaced by a word from any POS (the Brown Corpus tag set appends the suffix "-NC” in such cases):

the word “blue” has 4 letters.

Words in a language other than that of the “main” text are commonly tagged as “foreign”, usually in addition to a tag for the role the foreign word is actually playing in context. There are also many cases where POS categories and “words” do not map one to one, for example:

David’s
gonna
don't
vice versa
first-cut
cannot
pre- and post-secondary
look (a word) up

In the last example, “look” and “up” arguably function as a single verbal unit, despite the possibility of other words coming between them. Some tag sets (such as Penn) break hyphenated words, contractions, and possessives into separate tokens, thus avoiding some but far from all such problems. It is unclear whether it is best to treat words such as “be”, “have”, and “do” as categories in their own right (as in the Brown Corpus), or as simply verbs (as in the LOB Corpus and the Penn Treebank). “be” has more forms than other English verbs, and occurs in quite different grammatical contexts, complicating the issue. The most popular “tag set” for POS tagging for American English is probably the Penn tag set, developed in the Penn Treebank project. It is largely similar to the earlier Brown Corpus and LOB Corpus tag sets, though much smaller. In Europe, tag sets from the Eagles Guidelines see wide use, and include versions for multiple languages. POS tagging work has been done in a variety of languages, and the set of POS tags used varies greatly with language. Tags usually are designed to include overt morphological distinctions, although this leads to inconsistencies such as case-marking for pronouns but not nouns in English, and much larger cross-language differences. The tag sets for heavily inflected languages such as Greek and Latin can be very large; tagging words in agglutinative languages such as Inuit may be virtually impossible. At the other extreme, S. Petrov, D. Das, and R. McDonald (“A Universal Part-of-Speech Tagset”, http://arxiv.org/abs/1104.2086) have proposed a “universal” tag set, with 12 categories (for example, no subtypes of nouns, verbs, punctuation, etc.; no distinction of “to” as an infinitive marker vs. preposition, etc.). Whether a very small set of very broad tags or a much larger set of more precise ones is preferable depends on the purpose at hand. Automatic tagging is easier on smaller tag-sets. A different issue is that some cases are in fact ambiguous. Beatrice Santorini gives examples in “Part-of-speech Tagging Guidelines for the Penn Treebank Project” (3rd rev., June 1990), including the following (p. 32) case in which entertaining can be either an adjective or a verb, and there is no syntactic way to decide:

The Duchess was entertaining last night.

3.4 See also

• Semantic net
• Sliding window based part-of-speech tagging
• Trigram tagger
• Word sense disambiguation

3.5 References

[1] Eugene Charniak

[2] CLL POS-tagger

[3] DeRose, Steven J. 1988. “Grammatical category disambiguation by statistical optimization.” Computational Linguistics 14(1): 31–39.

[4] Kenneth Ward Church (1988). “A stochastic parts program and noun phrase parser for unrestricted text”. ANLC '88: Proceedings of the second conference on Applied natural language processing. Association for Computational Linguistics Stroudsburg, PA. doi:10.3115/974235.974260.

[5] POS Tagging (State of the art)

[6] Xu Sun (2014). Structure Regularization for Structured Prediction (PDF). Neural Information Processing Systems (NIPS). pp. 2402–2410.

• Charniak, Eugene. 1997. “Statistical Techniques for Natural Language Parsing”. AI Magazine 18(4):33–44.

• Hans van Halteren, Jakub Zavrel, Walter Daelemans. 2001. Improving Accuracy in NLP Through Combination of Machine Learning Systems. Computational Linguistics 27(2): 199–229. PDF
• DeRose, Steven J. 1990. “Stochastic Methods for Resolution of Grammatical Category Ambiguity in Inflected and Uninflected Languages.” Ph.D. Dissertation. Providence, RI: Brown University Department of Cognitive and Linguistic Sciences. Electronic Edition available at

3.6 External links

• RDRPOSTagger - a robust rule-based toolkit for POS and morphological tagging (Python & Java). RDRPOSTagger supports pre-trained POS and morphological tagging models for 13 languages, as well as pre-trained Universal POS tagging models for 40 languages.
• SMILE POS tagger - free online service, includes an HMM-based POS tagger (Java API)
• Overview of available taggers
• Resources for Studying English Syntax Online
• CLAWS
• LingPipe Commercial Java natural language processing software including trainable part-of-speech taggers with first-best, n-best and per-tag confidence output
• Apache OpenNLP AL 2.0, includes a POS tagger based on maxent and perceptron classifiers
• CRFTagger Conditional Random Fields (CRFs) English POS tagger
• JTextPro A Java-based Toolkit
• Citar LGPL C++ Hidden Markov Model trigram POS tagger, a Java port named Jitar is also available
• Ninja-PoST PHP port of GPoSTTL, based on Eric Brill’s rule-based tagger
• ComplexityIntelligence, LLC Free and Commercial NLP Web Services for Part Of Speech Tagging (and Named Entity Recognition)
• Part-of-Speech tagging based on Soundex features
• FastTag - LGPL Java POS tagger based on Eric Brill’s rule-based tagger
• jspos - LGPL Javascript port of FastTag
• Topia TermExtractor - Python implementation of the UPenn BioIE parts-of-speech algorithm
• Stanford Log-linear Part-Of-Speech Tagger
• Northwestern MorphAdorner POS Tagger
• Part of speech tagger for Spanish
• Stagger – The Stockholm Tagger, for Swedish
• TnT -- Statistical Part-of-Speech Tagging, with one German and one English model
• petraTAG Open-source POS tagger written in Java with special features for tagging translated texts
• Rosette linguistics platform Commercial POS tagger, lemmatizer, base noun phrase extractor and other morphological analysis in Java and C++
• spaCy Open-source (MIT) Python NLP library including trainable part-of-speech tagger

Chapter 4

Phrase chunking

Phrase chunking is a natural language processing task that separates and segments a sentence into its subconstituents, such as noun, verb, and prepositional phrases.

4.1 See also

• Terminology extraction • Part-of-speech tagging

4.2 External links

• TermExtractor

• TreeTagger Chunker

Chapter 5

Relationship extraction

A relationship extraction task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships.

5.1 Applications

Application domains where relationship extraction is useful include gene-disease relationships,[1] protein-protein interactions,[2] etc.

5.2 Approaches

One approach to this problem involves the use of domain ontologies.[3][4] Another approach involves visual detection of meaningful relationships in parametric values of objects listed on a data table that shift positions as the table is permuted automatically as controlled by the software user. The poor coverage, rarity and development cost of structured resources such as semantic lexicons (e.g. WordNet, UMLS) and domain ontologies (e.g. the Gene Ontology) have given rise to new approaches based on broad, dynamic background knowledge on the Web. For instance, the ARCHILES technique[5] uses only Wikipedia and search engine page counts for acquiring coarse-grained relations to construct lightweight ontologies. The relationships can be represented using a variety of formalisms/languages. One such representation language for data on the Web is RDF.
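The simplest baseline in this family is sentence-level co-occurrence against entity lexicons; the gene and disease lists below are illustrative stand-ins for what an NER step or a resource like the Gene Ontology would supply, and the relation label is an assumption of the sketch.

    import re
    from itertools import product

    GENES = {"BRCA1", "TP53"}                    # illustrative lexicons
    DISEASES = {"breast cancer", "lung cancer"}

    SENTENCES = [
        "Mutations in BRCA1 are associated with breast cancer.",
        "TP53 is frequently inactivated in lung cancer.",
    ]

    # Emit a candidate relation whenever a gene and a disease co-occur in a sentence.
    def cooccurrence_relations(sentences):
        for s in sentences:
            for gene, disease in product(GENES, DISEASES):
                if re.search(r"\b%s\b" % re.escape(gene), s) and disease in s:
                    yield (gene, "associated_with", disease)

    print(list(cooccurrence_relations(SENTENCES)))
    # -> [('BRCA1', 'associated_with', 'breast cancer'),
    #     ('TP53', 'associated_with', 'lung cancer')]

Real systems replace bare co-occurrence with lexical or syntactic patterns, ontology constraints, or trained classifiers to filter out spurious pairs.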

5.3 See also

• Text analytics • Semantic analytics • • Information extraction • Business Intelligence 2.0

5.4 References

[1] Hong-Woo Chun; Yoshimasa Tsuruoka; Jin-Dong Kim; Rie Shiba; Naoki Nagata; Teruyoshi Hishiki; Jun-ichi Tsujii (2006). “Extraction of Gene-Disease Relations from Medline Using Domain Dictionaries and Machine Learning”. Pacific Symposium on Biocomputing.

[2] Minlie Huang; Xiaoyan Zhu; Yu Hao; Donald G. Payan; Kunbin Qu; Ming Li (2004). “Discovering patterns to extract protein-protein interactions from full texts”. Bioinformatics. 20 (18): 3604–3612. doi:10.1093/bioinformatics/bth451.

[3] T. C. Rindflesch, L. Tanabe, J. N. Weinstein and L. Hunter (2000). “EDGAR: Extraction of drugs, genes, and relations from the biomedical literature”. Proc. Pacific Symposium on Biocomputing. pp. 514–525.

[4] C. Ramakrishnan, K. J. Kochut and A. P. Sheth (2006). “A Framework for Schema-Driven Relationship Discovery from Unstructured Text”. Proc. International Semantic Web Conference. pp. 583–596.

[5] W. Wong; W. Liu; M. Bennamoun (2009). “Acquiring Semantic Relations using the Web for Constructing Lightweight Ontologies”. Proc. 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). doi:10.1007/978-3-642-01307-2_26.

Chapter 6

Sentence boundary disambiguation

Sentence boundary disambiguation (SBD), also known as sentence breaking, is the problem in natural language processing of deciding where sentences begin and end. Natural language processing tools often require their input to be divided into sentences; however, sentence boundary identification is challenging because punctuation marks are often ambiguous. For example, a period may denote an abbreviation, a decimal point, an ellipsis, or part of an email address, rather than the end of a sentence. About 47% of the periods in the Wall Street Journal corpus denote abbreviations.[1] Likewise, question marks and exclamation marks may appear in embedded quotations, emoticons, computer code, and slang. By contrast, languages like Japanese and Chinese have unambiguous sentence-ending markers.

6.1 Strategies

The standard 'vanilla' approach to locating the end of a sentence:

(a) If it is a period, it ends a sentence.

(b) If the preceding token is in the hand-compiled list of abbreviations, then it does not end a sentence.

(c) If the next token is capitalized, then it ends a sentence.

This strategy gets about 95% of sentences correct.[2] Most of the remaining 5% involves things such as shortened names, e.g. "D. H. Lawrence" (with whitespace between the individual initials), idiosyncratic orthographical spellings used for stylistic purposes (often referring to a single concept, e.g. an entertainment product title like ".hack//SIGN"), and non-standard usage of punctuation.

Another approach is to automatically learn a set of rules from a set of documents where the sentence breaks are pre-marked. Solutions have been based on a maximum entropy model.[3] The SATZ architecture uses a neural network to disambiguate sentence boundaries and achieves 98.5% accuracy.
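A minimal implementation of the three 'vanilla' rules in Python is sketched below; the abbreviation list is a small hypothetical sample, and the tokenization by whitespace is itself a simplification.

ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "etc."}  # hand-compiled

def split_sentences(text):
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if token.endswith((".", "?", "!")):
            # Rule (b): a known abbreviation does not end a sentence.
            if token.lower() in ABBREVIATIONS:
                continue
            # Rule (c): break only before a capitalized token (or at the end).
            following = tokens[i + 1] if i + 1 < len(tokens) else ""
            if following == "" or following[0].isupper():
                sentences.append(" ".join(current))
                current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Dr. Smith went to Washington. He arrived late."))
# ['Dr. Smith went to Washington.', 'He arrived late.']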

6.2 Software

Perl compatible regular expressions (“PCRE”)

• ((?<=[a-z0-9][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])

• $sentences = preg_split('/((?<=[a-z0-9][.?!])|(?<=[a-z0-9][.?!]"))(\s|\r\n)(?="?[A-Z])/', $text); (PHP, using the pattern above)

Online use, libraries, and APIs

• sent_detector - Java


• Lingua-EN-Sentence - Perl

• Sentence.pm - Perl

• SATZ - An Adaptive Sentence Segmentation System, by David D. Palmer - C

Toolkits that include sentence detection

• Apache OpenNLP

• Freeling (software)

• Natural Language Toolkit

• Stanford NLP

• GExp

6.3 See also

• Sentence spacing

• Word divider

• Syllabification

• Punctuation

• Text segmentation

• Translation memory

• Multiword expression

6.4 References

[1] E. Stamatatos; N. Fakotakis; G. Kokkinakis. “Automatic Extraction of Rules for Sentence Boundary Disambiguation”. University of Patras. Retrieved 2009-01-03.

[2] “Doing Things with Words, Part Two: Sentence Boundary Detection”. Retrieved 2009-01-03.

[3] “A Maximum Entropy Approach to Identifying Sentence Boundaries” (PDF). Retrieved 2009-01-03.

6.5 External links

• Search for 'sentence boundary disambiguation', Google Scholar.

Chapter 7

Shallow parsing

Shallow parsing (also chunking, “light parsing”) is an analysis of a sentence which first identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and then links them to higher order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.). While the most elementary chunking algorithms simply link constituent parts on the basis of elementary search patterns (e.g. as specified by regular expressions), approaches that use machine learning techniques (classifiers, topic modeling, etc.) can take contextual information into account and thus compose chunks in such a way that they better reflect the semantic relations between the basic constituents.[1] That is, these more advanced methods get around the problem that combinations of elementary constituents can have different higher-level meanings depending on the context of the sentence.

Shallow parsing is widely used in natural language processing, and is similar in spirit to lexical analysis for computer languages. Under the name of the Shallow Structure Hypothesis, it is also used as an explanation for why second language learners often fail to parse complex sentences correctly.[2]
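For contrast with hand-written patterns, the snippet below extracts base noun-phrase chunks with spaCy, whose chunker is driven by a trained statistical parse. It assumes the spacy package and its small English model (en_core_web_sm) are installed; the printed output is indicative rather than guaranteed across model versions.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumped over the lazy dog.")

for chunk in doc.noun_chunks:
    # Each chunk exposes its text and the grammatical role of its head word.
    print(chunk.text, "->", chunk.root.dep_)
# The quick brown fox -> nsubj
# the lazy dog -> pobj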

7.1 References

[1] Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing. Singapore: Pearson Education Inc. pp. 577–586.

[2] Clahsen, Harald; Felser, Claudia (2006). “Grammatical Processing in Language Learners”. Applied Psycholinguistics. 27: 3–42. doi:10.1017/S0142716406060024.

• “NP Chunking (State of the art)”. Association for Computational Linguistics. Retrieved 2016-01-30.

• Abney, Steven (1991), Parsing By Chunks (PDF), Kluwer Academic Publishers, pp. 257–278.

7.2 External links

• Apache OpenNLP - OpenNLP includes a chunker.

• GATE (General Architecture for Text Engineering) - GATE includes a chunker.

• NLTK chunking

• Illinois Shallow Parser - Shallow Parser Demo

7.3 See also

• Parser

• Semantic role labeling

• Named entity recognition

Chapter 8

Stemming

For the skiing technique, see Stem (skiing). For the climbing technique, see Glossary of climbing terms § stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form (generally a written word form). The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms, as a kind of query expansion, a process called conflation. Stemming programs are commonly referred to as stemming algorithms or stemmers.

8.1 Examples

A stemmer for English, for example, should identify the string “cats” (and possibly “catlike”, “catty” etc.) as based on the root “cat”, and “stems”, “stemmer”, “stemming”, “stemmed” as based on “stem”. A stemming algorithm reduces the words “fishing”, “fished”, and “fisher” to the root word, “fish”. On the other hand, “argue”, “argued”, “argues”, “arguing”, and “argus” reduce to the stem “argu” (illustrating the case where the stem is not itself a word or root) but “argument” and “arguments” reduce to the stem “argument”.
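These examples can be reproduced with the Porter stemmer implementation shipped in NLTK (assuming the nltk package is installed); the expected outputs are shown in comments.

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["cats", "fishing", "fished", "argue", "argued", "argument"]:
    print(word, "->", stemmer.stem(word))
# cats -> cat
# fishing -> fish
# fished -> fish
# argue -> argu
# argued -> argu
# argument -> argument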

8.2 History

The first published stemmer was written by Julie Beth Lovins in 1968.[1] This paper was remarkable for its early date and had great influence on later work in this area. A later stemmer was written by Martin Porter and was published in the July 1980 issue of the journal Program. This stemmer was very widely used and became the de facto standard algorithm used for English stemming. Dr. Porter received the Tony Kent Strix award in 2000 for his work on stemming and information retrieval. Many implementations of the Porter stemming algorithm were written and freely distributed; however, many of these implementations contained subtle flaws. As a result, these stemmers did not match their potential. To eliminate this source of error, Martin Porter released an official (mostly BSD-licensed) implementation[2] of the algorithm around the year 2000. He extended this work over the next few years by building Snowball, a framework for writing stemming algorithms, and implemented an improved English stemmer together with stemmers for several other languages.

8.3 Algorithms

There are several types of stemming algorithms which differ in respect to performance and accuracy and how certain stemming obstacles are overcome.


A simple stemmer looks up the inflected form in a lookup table. The advantages of this approach are that it is simple, fast, and easily handles exceptions. The disadvantages are that all inflected forms must be explicitly listed in the table: new or unfamiliar words are not handled, even if they are perfectly regular (e.g. iPads ~ iPad), and the table may be large. For languages with simple morphology, like English, table sizes are modest, but highly inflected languages like Turkish may have hundreds of potential inflected forms for each root. A lookup approach may use preliminary part-of-speech tagging to avoid overstemming.[3]

8.3.1 The production technique

The lookup table used by a stemmer is generally produced semi-automatically. For example, if the word is “run”, then the inverted algorithm might automatically generate the forms “running”, “runs”, “runned”, and “runly”. The last two forms are valid constructions under the productive rules of English morphology, but they are unlikely to occur as real words.

8.3.2 Suffix-stripping algorithms

Suffix stripping algorithms do not rely on a lookup table that consists of inflected forms and root form relations. Instead, a typically smaller list of “rules” is stored which provides a path for the algorithm, given an input word form, to find its root form. Some examples of the rules include:

• if the word ends in 'ed', remove the 'ed'

• if the word ends in 'ing', remove the 'ing'

• if the word ends in 'ly', remove the 'ly'

Suffix stripping approaches enjoy the benefit of being much simpler to maintain than brute force algorithms, assuming the maintainer is sufficiently knowledgeable about the challenges of linguistics and morphology and about encoding suffix stripping rules. Suffix stripping algorithms are sometimes regarded as crude, given their poor performance when dealing with exceptional relations (like 'ran' and 'run'). The solutions produced by suffix stripping algorithms are limited to those lexical categories which have well-known suffixes with few exceptions. This is a problem, as not all parts of speech have such a well-formulated set of rules. Lemmatisation attempts to improve upon this challenge.

Prefix stripping may also be implemented. Of course, not all languages use prefixing or suffixing.

Additional algorithm criteria

Suffix stripping algorithms may differ in results for a variety of reasons. One such reason is whether the algorithm constrains whether the output word must be a real word in the given language. Some approaches do not require the word to actually exist in the language lexicon (the set of all words in the language). Alternatively, some suffix stripping approaches maintain a database (a large list) of all known morphological word roots that exist as real words. These approaches check the list for the existence of the term prior to making a decision. Typically, if the term does not exist, alternate action is taken. This alternate action may involve several other criteria. The non-existence of an output term may serve to cause the algorithm to try alternate suffix stripping rules.

It can be the case that two or more suffix stripping rules apply to the same input term, which creates an ambiguity as to which rule to apply. The algorithm may assign (by human hand or stochastically) a priority to one rule or another. Or the algorithm may reject one rule application because it results in a non-existent term, whereas the other overlapping rule does not. For example, given the English term friendlies, the algorithm may identify the ies suffix, apply the appropriate rule, and achieve the result friendl. friendl is likely not found in the lexicon, and therefore the rule is rejected.

One improvement upon basic suffix stripping is the use of suffix substitution. Similar to a stripping rule, a substitution rule replaces a suffix with an alternate suffix. For example, there could exist a rule that replaces ies with y. How this affects the algorithm depends on the algorithm’s design. To illustrate, the algorithm may identify that both the ies suffix stripping rule and the suffix substitution rule apply. Since the stripping rule results in a non-existent term in the lexicon, but the substitution rule does not, the substitution rule is applied instead. In this example, friendlies becomes friendly instead of friendl.

Diving further into the details, a common technique is to apply rules in a cyclical fashion (recursively, as computer scientists would say). After applying the suffix substitution rule in this example scenario, a second pass is made to identify matching rules on the term friendly, where the ly stripping rule is likely identified and accepted. In summary, friendlies becomes (via substitution) friendly, which becomes (via stripping) friend.

This example also helps illustrate the difference between a rule-based approach and a brute force approach. In a brute force approach, the algorithm would search for friendlies in the set of hundreds of thousands of inflected word forms and ideally find the corresponding root form friend. In the rule-based approach, the three rules mentioned above would be applied in succession to converge on the same solution. Chances are that the rule-based approach would be slower, since lookup algorithms have direct access to the solution, while a rule-based algorithm must try several rules, and combinations of rules, before choosing whichever result seems best.
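The friendlies walkthrough above can be traced with a toy rule-based stemmer like the one below; the lexicon, substitution rule, and stripping rules are illustrative assumptions standing in for a real ruleset and root database.

LEXICON = {"friend", "friendly", "run", "runner"}   # stand-in root database
SUBSTITUTIONS = [("ies", "y")]                      # e.g. friendlies -> friendly
STRIPS = ["ies", "ed", "ing", "ly"]                 # plain stripping rules

def stem(word):
    while True:
        # Substitution rules are tried first, as in the example above.
        for suffix, replacement in SUBSTITUTIONS:
            candidate = word[: -len(suffix)] + replacement
            if word.endswith(suffix) and candidate in LEXICON:
                word = candidate
                break
        else:
            # A stripping rule is rejected if its output is not a known term.
            for suffix in STRIPS:
                if word.endswith(suffix) and word[: -len(suffix)] in LEXICON:
                    word = word[: -len(suffix)]
                    break
            else:
                return word   # no rule fired: the stem has converged

print(stem("friendlies"))   # friend (via friendly, then the 'ly' strip)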

8.3.3 Lemmatisation algorithms

A more complex approach to the problem of determining a stem of a word is lemmatisation. This process involves first determining the part of speech of a word, and applying different normalization rules for each part of speech. The part of speech is first detected prior to attempting to find the root since for some languages, the stemming rules change depending on a word’s part of speech. This approach is highly conditional upon obtaining the correct lexical category (part of speech). While there is overlap between the normalization rules for certain categories, identifying the wrong category or being unable to produce the right category limits the added benefit of this approach over suffix stripping algorithms. The basic idea is that, if the stemmer is able to grasp more information about the word being stemmed, then it can apply more accurate normalization rules (which unlike suffix stripping rules can also modify the stem).

8.3.4 Stochastic algorithms

Stochastic algorithms involve using probability to identify the root form of a word. Stochastic algorithms are trained (they “learn”) on a table of root-form-to-inflected-form relations in order to develop a probabilistic model. This model is typically expressed in the form of complex linguistic rules, similar in nature to those used in suffix stripping or lemmatisation. Stemming is performed by inputting an inflected form to the trained model and having the model produce the root form according to its internal ruleset. This again is similar to suffix stripping and lemmatisation, except that the decisions involved (which rule to apply, whether to stem the word at all or simply return it unchanged, and whether to apply two different rules in sequence) are made so that the output word has the highest probability of being correct (that is, the smallest probability of being incorrect, which is how correctness is typically measured).

Some lemmatisation algorithms are stochastic in that, given a word which may belong to multiple parts of speech, a probability is assigned to each possible part. This may take into account the surrounding words, called the context; context-free methods do not take into account any additional information. In either case, after assigning the probabilities to each possible part of speech, the most likely part of speech is chosen, and from there the appropriate normalization rules are applied to the input word to produce the normalized (root) form.

8.3.5 n-gram analysis

Some stemming techniques use the n-gram context of a word to choose the correct stem for a word.[4]

8.3.6 Hybrid approaches

Hybrid approaches use two or more of the approaches described above in unison. A simple example is a suffix tree algorithm which first consults a lookup table using brute force. However, instead of trying to store the entire set of relations between words in a given language, the lookup table is kept small and is only used to store a minute amount of “frequent exceptions” like “ran => run”. If the word is not in the exception list, apply suffix stripping or lemmatisation and output the result.

8.3.7 Affix stemmers

In linguistics, the term affix refers to either a prefix or a suffix. In addition to dealing with suffixes, several approaches also attempt to remove common prefixes. For example, given the word indefinitely, identify that the leading “in” is a prefix that can be removed. Many of the same approaches mentioned earlier apply, but go by the name affix stripping. Jongejan and Dalianis published a study of affix stemming for several European languages.[5]

8.3.8 Matching algorithms

Such algorithms use a stem database (for example a set of documents that contain stem words). These stems, as mentioned above, are not necessarily valid words themselves (but rather common sub-strings, as the “brows” in “browse” and in “browsing”). In order to stem a word the algorithm tries to match it with stems from the database, applying various constraints, such as on the relative length of the candidate stem within the word (so that, for example, the short prefix “be”, which is the stem of such words as “be”, “been” and “being”, would not be considered as the stem of the word “beside”).

8.4 Language challenges

While much of the early academic work in this area focused on the English language (with significant use of the Porter stemmer algorithm), many other languages have been investigated.[6][7][8][9][10] Hebrew and Arabic are still considered difficult research languages for stemming. English stemmers are fairly trivial (with only occasional problems, such as “dries” being the third-person singular present form of the verb “dry”, or “axes” being the plural of both “axe” and “axis”); but stemmers become harder to design as the morphology, orthography, and character encoding of the target language become more complex. For example, an Italian stemmer is more complex than an English one (because of a greater number of verb inflections), a Russian one is more complex still (more noun declensions), a Hebrew one is even more complex (due to nonconcatenative morphology, a writing system without vowels, and the requirement of prefix stripping: Hebrew stems can be two, three or four characters, but not more), and so on.

8.4.1 Multilingual stemming

Multilingual stemming applies morphological rules of two or more languages simultaneously instead of rules for only a single language when interpreting a search query. Commercial systems using multilingual stemming exist.

8.5 Error metrics

There are two error measurements in stemming algorithms: overstemming and understemming. Overstemming is an error where two separate inflected words are stemmed to the same root when they should not have been (a false positive). Understemming is an error where two separate inflected words that should be stemmed to the same root are not (a false negative). Stemming algorithms attempt to minimize each type of error, although reducing one type can increase the other.

For example, the widely used Porter stemmer stems “universal”, “university”, and “universe” to “univers”. This is a case of overstemming: though these three words are etymologically related, their modern meanings lie in widely different domains, so treating them as synonyms in a search engine will likely reduce the relevance of the search results. An example of understemming in the Porter stemmer is “alumnus” → “alumnu”, “alumni” → “alumni”, and “alumna”/“alumnae” → “alumna”. This English word keeps Latin morphology, and so these near-synonyms are not conflated.
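Both failure modes can be observed directly by running the article's own examples through NLTK's Porter stemmer (assuming the nltk package is installed):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

print([stemmer.stem(w) for w in ["universal", "university", "universe"]])
# ['univers', 'univers', 'univers']   <- overstemming: unrelated senses conflated

print([stemmer.stem(w) for w in ["alumnus", "alumni", "alumnae"]])
# ['alumnu', 'alumni', 'alumna']      <- understemming: related forms kept apart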

8.6 Applications

Stemming is used as an approximate method for grouping words with a similar basic meaning together. For example, a text mentioning “daffodils” is probably closely related to a text mentioning “daffodil” (without the s). But in some cases, words with the same morphological stem have idiomatic meanings which are not closely related: a user searching for “marketing” will not be satisfied by most documents mentioning “markets” but not “marketing”.

8.6.1 Information retrieval

Stemmers are common elements in query systems such as Web search engines. The effectiveness of stemming for English query systems was soon found to be rather limited, however, and this led early information retrieval researchers to deem stemming irrelevant in general.[11] An alternative approach, based on searching for n-grams rather than stems, may be used instead. Also, stemmers may provide greater benefits in languages other than English.[12][13]

8.6.2 Domain Analysis

Stemming is used to determine domain vocabularies in domain analysis.[14]

8.6.3 Use in commercial products

Many commercial companies have been using stemming since at least the 1980s and have produced algorithmic and lexical stemmers in many languages.[15][16] The Snowball stemmers have been compared with commercial lexical stemmers, with varying results.[17][18]

Google adopted word stemming in 2003.[19] Previously a search for “fish” would not have returned “fishing”. Other software search algorithms vary in their use of word stemming. Programs that simply search for substrings will obviously find “fish” in “fishing”, but when searching for “fishes” will not find occurrences of the word “fish”.

8.7 See also

• Root (linguistics) - linguistic definition of the term “root”

• Stem (linguistics) - linguistic definition of the term “stem”

• Morphology (linguistics)

• Lemma (morphology) - linguistic definition

• Lemmatization

• Inflection

• Derivation - stemming is a form of reverse derivation

• Natural language processing - stemming is generally regarded as a form of NLP

- stemming algorithms play a major role in commercial NLP software

• Computational linguistics

• Snowball (programming language) - designed for creating stemming algorithms

8.8 References

[1] Lovins, Julie Beth (1968). “Development of a Stemming Algorithm”. Mechanical Translation and Computational Linguistics. 11: 22–31.

[2] http://tartarus.org/~{}martin/PorterStemmer/

[3] Yatsko, V. A.; Y-stemmer

[4] McNamee, Paul (September 21–22, 2005). “Exploring New Languages with HAIRCUT at CLEF 2005” (PDF). CEUR Workshop Proceedings. 1171. Retrieved 3/6/15.

[5] Jongejan, B.; and Dalianis, H.; Automatic Training of Lemmatization Rules that Handle Morphological Changes in pre-, in- and Suffixes Alike, in the Proceedings of ACL-2009, Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Singapore, August 2–7, 2009, pp. 145–153

[6] Dolamic, Ljiljana; and Savoy, Jacques; Stemming Approaches for East European Languages (CLEF 2007)

[7] Savoy, Jacques; Light Stemming Approaches for the French, Portuguese, German and Hungarian Languages, ACM Symposium on Applied Computing, SAC 2006, ISBN 1-59593-108-2

[8] Popovič, Mirko; and Willett, Peter (1992); The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data, Journal of the American Society for Information Science, Volume 43, Issue 5 (June), pp. 384–390

[9] Stemming in Hungarian at CLEF 2005

[10] Viera, A. F. G. & , J. (2007); Uma revisão dos algoritmos de radicalização em língua portuguesa, Information Research, 12(3), paper 315

[11] Baeza-Yates, Ricardo; and Ribeiro-Neto, Berthier (1999); Modern Information Retrieval, ACM Press/Addison Wesley

[12] Kamps, Jaap; Monz, Christof; de Rijke, Maarten; and Sigurbjörnsson, Börkur (2004); Language-Dependent and Language- Independent Approaches to Cross-Lingual Text Retrieval, in Peters, C.; Gonzalo, J.; Braschler, M.; and Kluck, M. (eds.); Comparative Evaluation of Multilingual Information Access Systems, Springer Verlag, pp. 152–165

[13] Airio, Eija (2006); Word Normalization and Decompounding in Mono- and Bilingual IR, Information Retrieval 9:249–271

[14] Frakes, W.; Prieto-Diaz, R.; & Fox, C. (1998); DARE: Domain Analysis and Reuse Environment, Annals of Software Engineering (5), pp. 125-141

[15] Language Extension Packs, dtSearch

[16] Building Multilingual Solutions by using Sharepoint Products and Technologies, Microsoft Technet

[17] CLEF 2003: Stephen Tomlinson compared the Snowball stemmers with the Hummingbird lexical stemming (lemmatization) system

[18] CLEF 2004: Stephen Tomlinson “Finnish, Portuguese and Russian Retrieval with Hummingbird SearchServer”

[19] The Essentials of Google Search, Web Search Help Center, Google Inc.

8.9 Further reading

• Dawson, J. L. (1974); Suffix Removal for Word Conflation, Bulletin of the Association for Literary and Linguistic Computing, 2(3): 33–46

• Frakes, W. B. (1984); Term Conflation for Information Retrieval, Cambridge University Press

• Frakes, W. B. & Fox, C. J. (2003); Strength and Similarity of Affix Removal Stemming Algorithms, SIGIR Forum, 37: 26–30

• Frakes, W. B. (1992); Stemming algorithms, Information retrieval: data structures and algorithms, Upper Saddle River, NJ: Prentice-Hall, Inc.

• Hafer, M. A. & Weiss, S. F. (1974); Word segmentation by letter successor varieties, Information Processing & Management, 10 (11/12), 371–386

• Harman, D. (1991); How Effective is Suffixing?, Journal of the American Society for Information Science, 42 (1), 7–15

• Hull, D. A. (1996); Stemming Algorithms – A Case Study for Detailed Evaluation, JASIS, 47(1): 70–84

• Hull, D. A. & Grefenstette, G. (1996); A Detailed Analysis of English Stemming Algorithms, Xerox Technical Report

• Kraaij, W. & Pohlmann, R. (1996); Viewing Stemming as Recall Enhancement, in Frei, H.-P.; Harman, D.; Schauble, P.; and Wilkinson, R. (eds.); Proceedings of the 17th ACM SIGIR conference held at Zurich, August 18–22, pp. 40–48

• Krovetz, R. (1993); Viewing Morphology as an Inference Process, in Proceedings of ACM-SIGIR93, pp. 191–203

• Lennon, M.; Pierce, D. S.; Tarry, B. D.; & Willett, P. (1981); An Evaluation of some Conflation Algorithms for Information Retrieval, Journal of Information Science, 3: 177–183

• Lovins, J. (1971); Error Evaluation for Stemming Algorithms as Clustering Algorithms, JASIS, 22: 28–40

• Lovins, J. B. (1968); Development of a Stemming Algorithm, Mechanical Translation and Computational Linguistics, 11: 22–31

• Jenkins, Marie-Claire; and Smith, Dan (2005); Conservative Stemming for Search and Indexing

• Paice, C. D. (1990); Another Stemmer, SIGIR Forum, 24: 56–61

• Paice, C. D. (1996); Method for Evaluation of Stemming Algorithms based on Error Counting, JASIS, 47(8): 632–649

• Popovič, Mirko; and Willett, Peter (1992); The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data, Journal of the American Society for Information Science, Volume 43, Issue 5 (June), pp. 384–390

• Porter, Martin F. (1980); An Algorithm for Suffix Stripping, Program, 14(3): 130–137

• Savoy, J. (1993); Stemming of French Words Based on Grammatical Categories, Journal of the American Society for Information Science, 44(1), 1–9

• Ulmschneider, John E.; & Doszkocs, Tamas (1983); A Practical Stemming Algorithm for Online Search Assistance, Online Review, 7(4), 301–318

• Xu, J.; & Croft, W. B. (1998); Corpus-Based Stemming Using Coocurrence of Word Variants, ACM Transactions on Information Systems, 16(1), 61–81

8.10 External links

• Apache OpenNLP includes Porter and Snowball stemmers

• SMILE Stemmer - free online service, includes Porter and Paice/Husk Lancaster stemmers (Java API)

• Themis - open source IR framework, includes Porter stemmer implementation (PostgreSQL, Java API)

• Snowball - free stemming algorithms for many languages, includes source code, including stemmers for five romance languages

• Snowball on C# - port of Snowball stemmers for C# (14 languages)

• Python bindings to Snowball API

• Ruby-Stemmer - Ruby extension to Snowball API

• PECL - PHP extension to the Snowball API

• Oleander Porter’s algorithm - stemming library in C++ released under BSD

• Unofficial home page of the Lovins stemming algorithm - with source code in a couple of languages

• Official home page of the Porter stemming algorithm - including source code in several languages

• Official home page of the Lancaster stemming algorithm - Lancaster University, UK

• Official home page of the UEA-Lite Stemmer - University of East Anglia, UK

• Overview of stemming algorithms

• PTStemmer - A Java/Python/.Net stemming toolkit for the Portuguese language

• jsSnowball - open source JavaScript implementation of Snowball stemming algorithms for many languages

• Snowball Stemmer - implementation for Java

• hindi_stemmer - open source stemmer for Hindi

• czech_stemmer - open source stemmer for Czech

• Comparative Evaluation of Arabic Language Morphological Analysers and Stemmers

• Tamil Stemmer

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the “relicensing” terms of the GFDL, version 1.3 or later.

Chapter 9

Text segmentation

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial, because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages. Compare speech segmentation, the process of dividing speech into linguistically meaningful portions.

9.1 Segmentation problems

9.1.1 Word segmentation

See also: Word § Word boundaries

Word segmentation is the problem of dividing a string of written language into its component words.

In English and many other languages using some form of the Latin alphabet, the space is a good approximation of a word divider (word delimiter). (Some examples where the space character alone may not be sufficient include contractions like won't for will not.)

However, the equivalent to this character is not found in all written scripts, and without it word segmentation is a difficult problem. Languages which do not have a trivial word segmentation process include Chinese and Japanese, where sentences but not words are delimited; Thai and Lao, where phrases and sentences but not words are delimited; and Vietnamese, where syllables but not words are delimited. In some writing systems, however, such as the Ge'ez script used for Amharic and Tigrinya among other languages, words are explicitly delimited (at least historically) with a non-whitespace character.

The Unicode Consortium has published a Standard Annex on Text Segmentation, exploring the issues of segmentation in multiscript texts.

Word splitting is the process of parsing concatenated text (i.e. text that contains no spaces or other word separators) to infer where word breaks exist. Word splitting may also refer to the process of hyphenation.
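A classic baseline for scripts without word delimiters is greedy longest-match ("maximum matching") segmentation against a dictionary. The sketch below uses a toy English-letter dictionary purely for readability; the algorithm is the same one traditionally applied to Chinese, and its characteristic failure is committing too early to a long dictionary word.

DICTIONARY = {"the", "them", "theme", "me", "men", "us"}  # toy lexicon

def max_match(text, dictionary, longest=5):
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words

print(max_match("themeus", DICTIONARY))   # ['theme', 'us']
print(max_match("themen", DICTIONARY))    # ['theme', 'n'] - greedy error ('the men')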

9.1.2 Sentence segmentation

See also: Sentence boundary disambiguation

Sentence segmentation is the problem of dividing a string of written language into its component sentences. In English and some other languages, punctuation, particularly the full stop/period character, is a reasonable approximation.


However, even in English this problem is not trivial, due to the use of the full stop character for abbreviations, which may or may not also terminate a sentence. For example, Mr. is not its own sentence in "Mr. Smith went to the shops in Jones Street.” When processing plain text, tables of abbreviations that contain periods can help prevent incorrect assignment of sentence boundaries.

As with word segmentation, not all written languages contain punctuation characters which are useful for approximating sentence boundaries.

9.1.3 Topic segmentation

Main articles: Topic analysis and Document classification

Topic analysis consists of two main tasks: topic identification and text segmentation. While the first is a simple classification of a specific text, the latter case implies that a document may contain multiple topics, and the task of computerized text segmentation may be to discover these topics automatically and segment the text accordingly. The topic boundaries may be apparent from section titles and paragraphs. In other cases, one needs to use techniques similar to those used in document classification.

Segmenting the text into topics or discourse turns might be useful in some natural language processing tasks: it can significantly improve information retrieval or speech recognition (by indexing and recognizing documents more precisely, or by returning the specific part of a document corresponding to the query). It is also needed in topic detection and tracking systems and in text summarization. Many different approaches have been tried:[1][2] e.g. HMMs, lexical chains, passage similarity using word co-occurrence, clustering, topic modeling, etc.

It is quite an ambiguous task: people evaluating text segmentation systems often differ on topic boundaries. Hence, evaluating text segmentation is also a challenging problem.

9.1.4 Other segmentation problems

Processes may be required to segment text into segments other than those mentioned, including morphemes (a task usually called morphological analysis) or paragraphs.

9.2 Automatic segmentation approaches

Automatic segmentation is the problem in natural language processing of implementing a computer process to segment text.

When punctuation and similar clues are not consistently available, the segmentation task often requires fairly non-trivial techniques, such as statistical decision-making, large dictionaries, and consideration of syntactic and semantic constraints. Effective natural language processing systems and text segmentation tools usually operate on text from specific domains and sources. As an example, processing the text used in medical records is a very different problem than processing news articles or real estate advertisements.

The process of developing text segmentation tools starts with collecting a large corpus of text in an application domain. There are two general approaches:

• Manual analysis of text, followed by writing custom software

• Annotating the sample corpus with boundary information and using machine learning

Some text segmentation systems take advantage of any markup like HTML, and known document formats like PDF, to provide additional evidence for sentence and paragraph boundaries.

9.3 See also

• Hyphenation

• Natural language processing

• Speech segmentation

• Lexical analysis

• Word count

• Line breaking

9.4 References

[1] Freddy Y. Y. Choi (2000). “Advances in domain independent linear text segmentation” (PDF). Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (ANLP-NAACL-00). pp. 26–33.

[2] Jeffrey C. Reynar (1998). “Topic Segmentation: Algorithms and Applications” (PDF). IRCS-98-21. University of Pennsylvania. Retrieved 2007-11-08.

9.5 External links

• Word Segment An open source software tool for word segmentation in Chinese.

• Word Split An open source software tool designed to split conjoined words into human-readable text.

• Stanford Segmenter An open source software tool for word segmentation in Chinese or morpheme segmentation in Arabic.

• KyTea An open source software tool for word segmentation in Japanese and Chinese.

• Chinese Notes A Chinese–English dictionary that also does word segmentation.

• Zhihuita Segmentor A high precision and high performance Chinese segmentation freeware.

• Python wordsegment module An open source Python module for English word segmentation.

Chapter 10

Tokenization (lexical analysis)

In lexical analysis, tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation), and in computer science, where it forms part of lexical analysis.

10.1 Methods and obstacles

Typically, tokenization occurs at the word level. However, it is sometimes difficult to define what is meant by a “word”. Often a tokenizer relies on simple heuristics, for example:

• Punctuation and whitespace may or may not be included in the resulting list of tokens.

• All contiguous strings of alphabetic characters are part of one token; likewise with numbers

• Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters.

In languages that use inter-word spaces (such as most that use the Latin alphabet, and most programming languages), this approach is fairly straightforward. However, even here there are many edge cases, such as contractions, hyphenated words, emoticons, and larger constructs such as URIs (which for some purposes may count as single tokens). A classic example is “New York-based”, which a naive tokenizer may break at the space even though the better break is (arguably) at the hyphen.

Tokenization is particularly difficult for languages written in scriptio continua, which exhibit no word boundaries, such as Ancient Greek, Chinese,[1] or Thai. Agglutinative languages, such as Korean, also make tokenization tasks complicated.

Some ways to address the more difficult problems include developing more complex heuristics, querying a table of common special cases, or fitting the tokens to a language model that identifies collocations in a later processing step.
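A heuristic tokenizer along the lines just described can be written with a single regular expression; the pattern below (numbers first, then words with an optional apostrophe part, then any other non-space character) is an illustrative simplification, not a production ruleset.

import re

TOKEN = re.compile(r"""
      \d+(?:\.\d+)?             # integer or decimal number
    | [A-Za-z]+(?:'[A-Za-z]+)?  # word, optionally with a clitic (don't)
    | [^\w\s]                   # any other single non-space character
""", re.VERBOSE)

def tokenize(text):
    return TOKEN.findall(text)

print(tokenize("Don't pay $12.50 in New York-based shops!"))
# ["Don't", 'pay', '$', '12.50', 'in', 'New', 'York', '-', 'based', 'shops', '!']

Note that this tokenizer breaks “New York-based” at the hyphen rather than at the space, matching the preference discussed above.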

10.2 Software

• Apache OpenNLP includes rule based and statistical tokenizers which support many languages

• U-Tokenizer is an API over HTTP that can cut Mandarin and Japanese sentences at word boundary. English is supported as well.

• HPE Haven OnDemand Text Tokenization API (Commercial product, with freemium access) uses Advanced Probabilistic Concept Modelling to determine the weight that the term holds in the specified text indexes


10.3 See also

• Tokenization (data security)

10.4 References

[1] Huang, C., Simon, P., Hsieh, S., & Prevot, L. (2007) Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Word break Identification

• “The Art of Tokenization”, developerWorks, Jan 23, 2013.

Chapter 11

Parsing

“Parse” redirects here. For other uses, see Parse (disambiguation). “Parser” redirects here. For the computer programming language, see Parser (CGI language).

Parsing (US /ˈpɑːrsɪŋ/; UK /ˈpɑːrzɪŋ/), syntax analysis or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).[1][2]

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence or word, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) “in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc.”[2] This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters. The term may also be used to describe a split or separation.

11.1 Human languages

Main category: Natural language parsing

11.1.1 Traditional methods

The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part.[3] This is determined in large part from study of the language’s conjugations and declensions, which can be quite intricate for heavily inflected languages. To parse a phrase such as 'man bites dog' involves noting that the singular noun 'man' is the subject of the sentence, the verb 'bites’ is the third person singular of the present tense of the verb 'to bite', and the singular noun 'dog' is the object of the sentence. Techniques such as sentence diagrams are sometimes used to indicate relations between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and widely regarded as basic to the use and understanding of written language. However, the general teaching of such techniques is no longer current.

11.1.2 Computational methods

In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, whose usage is to convey meaning (or semantics) among a potentially unlimited range of possibilities, of which only some are germane to the particular case. So an utterance “Man bites dog” versus “Dog bites man” is definite on one detail, but in another language might appear as “Man dog bites”, with a reliance on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behaviour, even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance, some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However, such systems are vulnerable to overfitting and require some kind of smoothing to be effective.

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties, as is possible with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CYK algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However, some systems trade speed for accuracy using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking, in which the parser proposes some large number of analyses and a more complex system selects the best option.

11.1.3 Psycholinguistics

In psycholinguistics, parsing involves not just the assignment of words to categories, but the evaluation of the meaning of a sentence according to the rules of syntax drawn by inferences made from each word in the sentence. This normally occurs as words are being heard or read. Consequently, psycholinguistic models of parsing are of necessity incremental, meaning that they build up an interpretation as the sentence is being processed, which is normally expressed in terms of a partial syntactic structure. Creation of initially wrong structures occurs when interpreting garden-path sentences.

11.2 Computer languages

11.2.1 Parser

A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyser, which creates tokens from the sequence of input characters; alternatively, these can be combined in scannerless parsing. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a parser generator. Parsing is complementary to templating, which produces formatted output. These may be applied to different domains, but often appear together, such as the scanf/printf pair, or the input (front end parsing) and output (back end code generation) stages of a compiler.

The input to a parser is often text in some computer language, but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as scanf, to complex programs such as the frontend of a C++ compiler or the HTML parser of a web browser. An important class of simple parsing is done using regular expressions, in which a group of regular expressions defines a regular language and a regular expression engine automatically generates a parser for that language, allowing pattern matching and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser.

The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or XML text; these examples are markup languages. In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend. Programming languages tend to be specified in terms of a deterministic context-free grammar because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes – see one-pass compiler and multi-pass compiler.

The implied disadvantages of a one-pass compiler can largely be overcome by adding fix-ups, where provision is made for fix-ups during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Obviously, a backward GOTO does not require a fix-up.

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently.
Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step. For example, in Python the following is syntactically valid code:

x = 1
print(x)

The following code, however, is syntactically valid in terms of the context-free grammar, yielding a syntax tree with the same structure as the previous, but is syntactically invalid in terms of the context-sensitive grammar, which requires that variables be initialized before use:

x = 1
print(y)

Rather than being analyzed at the parsing stage, this is caught by checking the values in the syntax tree, hence as part of semantic analysis: context-sensitive syntax is in practice often more easily analyzed as semantics.

11.2.2 Overview of process

The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as “12*(3+4)^2” and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like “12*" or "(3” will not be generated.

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.

The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.
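The following sketch implements both stages for a cut-down version of this calculator language (the ^ operator is omitted): a regular-expression lexer produces the tokens, and a recursive-descent parser checks them against a small context-free grammar while directly evaluating the result. The grammar, written informally, is: expr -> term (('+'|'-') term)*; term -> factor (('*'|'/') factor)*; factor -> NUMBER | '(' expr ')'.

import re

TOKENS = re.compile(r"\d+|[+\-*/()]")   # lexical level: numbers and operators

def parse(text):
    tokens = TOKENS.findall(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def factor():                        # factor -> NUMBER | '(' expr ')'
        if peek() == "(":
            eat("(")
            value = expr()
            eat(")")
            return value
        token = eat()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)

    def term():                          # '*' and '/': higher precedence
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if eat() == "*" else value / factor()
        return value

    def expr():                          # '+' and '-': lower precedence
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if eat() == "+" else value - term()
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(parse("12*(3+4)"))   # 84
print(parse("1+2*3"))      # 7: '*' binds tighter because term() sits below expr()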

11.3 Types of parsers

The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:

• Top-down parsing - Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.[4]

• Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is Shift-Reduce parsing.

LL parsers and recursive-descent parsers are examples of top-down parsers that cannot accommodate left recursive production rules. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left-recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz, and Callaghan[5][6] which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with regard to a given context-free grammar.

An important distinction with regard to parsers is whether a parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).[4]

11.4 Parser development software

Some of the well known parser development tools include the following. Also see comparison of parser generators.

• ANTLR

• Bison

• Coco/R

• GOLD

• JavaCC

• LuZc

• Parsec


• Spirit Parser Framework

• Syntax Definition Formalism

• SYNTAX

• XPL

11.5 Lookahead

Lookahead establishes the maximum number of incoming tokens that a parser can use to decide which rule it should apply. Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).

Most programming languages, the primary target of parsers, are carefully defined in such a way that a parser with limited lookahead, typically one token, can parse them, because parsers with limited lookahead are often more efficient. One important change to this trend came in 1990 when Terence Parr created ANTLR for his Ph.D. thesis, a parser generator for efficient LL(k) parsers, where k is any fixed value.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

• It helps the parser take the correct action in case of conflicts. For example, parsing the if statement in the case of an else clause.

• It eliminates many duplicate states and eases the burden of an extra stack. A C language non-lookahead parser will have around 10,000 states. A lookahead parser will have around 300 states.

Example: Parsing the Expression 1 + 2 * 3

The expression parsing rules (the grammar) referenced in the traces below are:

Rule1: E → E + E (an expression is the sum of two expressions)

Rule2: E → E * E (an expression is the product of two expressions)

Rule3: E → number (an expression is a simple number)

Rule4: * has higher precedence than +

Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.

Simple non-lookahead parser actions

Initially Input = [1,+,2,*,3]

1. Shift “1” onto stack from input (in anticipation of Rule3). Input = [+,2,*,3] Stack = [1]

2. Reduce “1” to expression “E” based on Rule3. Stack = [E]

3. Shift "+" onto stack from input (in anticipation of Rule1). Input = [2,*,3] Stack = [E,+]

4. Shift “2” onto stack from input (in anticipation of Rule3). Input = [*,3] Stack = [E,+,2]

5. Reduce stack element “2” to expression “E” based on Rule3. Stack = [E,+,E]

6. Reduce stack items [E,+] and new input “E” to “E” based on Rule1. Stack = [E]

7. Shift "*" onto stack from input (in anticipation of Rule2). Input = [3] Stack = [E,*]

8. Shift “3” onto stack from input (in anticipation of Rule3). Input = [] (empty) Stack = [E,*,3]

9. Reduce stack element “3” to expression “E” based on Rule3. Stack = [E,*,E]

10. Reduce stack items [E,*] and new input “E” to “E” based on Rule2. Stack = [E]

The parse tree and the code resulting from it are not correct according to the language semantics. To parse correctly without lookahead, there are three solutions:

• The user has to enclose expressions within parentheses. This often is not a viable solution.

• The parser needs to have more logic to backtrack and retry whenever a rule is violated or incomplete. A similar method is followed in LL parsers.

• Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. It correctly parses the expression, but with many more states and increased stack depth.

Lookahead parser actions

1. Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
2. Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on the path to E +, so we can reduce the stack to E.
3. Shift + onto stack on input + in anticipation of rule1.
4. Shift 2 onto stack on input 2 in anticipation of rule3.
5. Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
6. Now the stack has E + E and the input is still *. There are two choices: either shift based on rule2 or reduce based on rule1. Since * has higher precedence than + based on rule4, we shift * onto the stack in anticipation of rule2.
7. Shift 3 onto stack on input 3 in anticipation of rule3.
8. Reduce stack item 3 to Expression after seeing end of input based on rule3.
9. Reduce stack items E * E to E based on rule2.
10. Reduce stack items E + E to E based on rule1.

The parse tree generated is correct, and the parser is simply more efficient than a non-lookahead parser. This is the strategy followed in LALR parsers.
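The same grammar can be handled by a shift-reduce loop with one token of lookahead, reducing only while the operator already on the stack binds at least as tightly as the incoming token (rule4). A minimal sketch with illustrative names, not a full LALR table:

```python
# Shift-reduce with one token of lookahead: reduce E op E -> E only while the
# operator on the stack has precedence >= that of the next input token.
PRECEDENCE = {'+': 1, '*': 2}   # rule4: '*' binds tighter than '+'

def parse_lookahead(tokens):
    stack = []
    for tok in tokens + ['$']:                 # '$' marks end of input
        while (len(stack) >= 3 and stack[-2] in PRECEDENCE
               and PRECEDENCE[stack[-2]] >= PRECEDENCE.get(tok, 0)):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((op, left, right))    # reduce
        if tok != '$':
            stack.append(tok)                  # shift
    return stack[0]

print(parse_lookahead(['1', '+', '2', '*', '3']))
# ('+', '1', ('*', '2', '3')) -- the correct grouping (1 + (2 * 3))
```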

11.6 See also

• Compiler-compiler

• Generating strings

• LALR parser

• Lexical analysis

• Pratt parser

• Shallow parsing

• Left corner parser

• Parsing expression grammar

• ASF+SDF Meta Environment

• DMS Software Reengineering Toolkit

• Program transformation

• Source code generation

11.7 References

[1] “Bartleby.com homepage”. Retrieved 28 November 2010.

[2] “parse”. dictionary.reference.com. Retrieved 27 November 2010.

[3] “Grammar and Composition”.

[4] Aho, A.V., Sethi, R. and Ullman, J.D. (1986) Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

[5] Frost, R., Hafiz, R. and Callaghan, P. (2007) “Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars.” 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, pages 109–120, June 2007, Prague.

[6] Frost, R., Hafiz, R. and Callaghan, P. (2008) “Parser Combinators for Ambiguous Left-Recursive Grammars.” 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, pages 167–181, January 2008, San Francisco.

11.8 Further reading

• Chapman, Nigel P., LR Parsing: Theory and Practice, Cambridge University Press, 1987. ISBN 0-521-30413-X

• Grune, Dick; Jacobs, Ceriel J.H., Parsing Techniques – A Practical Guide, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0-13-651431-6

11.9 External links

• The Lemon LALR Parser Generator

• Stanford Parser – The Stanford Parser

• Turin University Parser – Natural language parser for Italian, open source, developed in Common Lisp by Leonardo Lesmo, University of Torino, Italy.

• Short history of parser construction

Flow of data in a typical parser

Chapter 12

Parse tree

A parse tree or parsing tree[1] or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax the term syntax tree is more common.
Parse trees concretely reflect the syntax of the input language, making them distinct from the abstract syntax trees used in computer programming. They are also distinct from the sentence diagrams, such as Reed-Kellogg diagrams, used for teaching grammar.
Parse trees are usually constructed based on either the constituency relation of constituency grammars (phrase structure grammars) or the dependency relation of dependency grammars. Parse trees may be generated for sentences in natural languages (see natural language processing), as well as during processing of computer languages, such as programming languages.
A related concept is that of phrase marker or P-marker, as used in transformational generative grammar. A phrase marker is a linguistic expression marked as to its phrase structure. This may be presented in the form of a tree, or as a bracketed expression. Phrase markers are generated by applying phrase structure rules, and are themselves subject to further transformational rules.

12.1 Constituency-based parse trees

The constituency-based parse trees of constituency grammars (= phrase structure grammars) distinguish between terminal and non-terminal nodes. The interior nodes are labeled by non-terminal categories of the grammar, while the leaf nodes are labeled by terminal categories. The image below represents a constituency-based parse tree; it shows the syntactic structure of the English sentence John hit the ball:

The parse tree is the entire structure, starting from S and ending in each of the leaf nodes (John, hit, the, ball). The following abbreviations are used in the tree:

• S for sentence, the top-level structure in this example


• NP for noun phrase. The first (leftmost) NP, a single noun “John”, serves as the subject of the sentence. The second one is the object of the sentence.

• VP for verb phrase, which serves as the predicate

• V for verb. In this case, it’s a transitive verb hit.

• D for , in this instance the definite article “the”

• N for noun

Each node in the tree is either a root node, a branch node, or a leaf node.[2] A root node is a node that doesn't have any branches on top of it. Within a sentence, there is only ever one root node. A branch node is a mother node that connects to two or more daughter nodes. A leaf node, however, is a terminal node that does not dominate other nodes in the tree. S is the root node, NP and VP are branch nodes, and John (N), hit (V), the (D), and ball (N) are all leaf nodes. The leaves are the lexical tokens of the sentence.[3]
A node can also be referred to as a parent node or a child node. A parent node is one that has at least one other node linked by a branch under it. In the example, S is a parent of both NP and VP. A child node is one that has at least one node directly above it to which it is linked by a branch of the tree. From the example, hit is a child node of V. The terms mother and daughter are also sometimes used for this relationship.
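For readers who want to experiment with such trees programmatically, the sketch below builds the tree for John hit the ball using the NLTK library (an assumption: NLTK is installed, and the bracketing used is one plausible rendering of the tree described above):

```python
# Build the constituency tree and recover the root, leaves, and local subtrees.
from nltk import Tree

tree = Tree.fromstring("(S (NP (N John)) (VP (V hit) (NP (D the) (N ball))))")

print(tree.label())    # 'S'  -- the root node
print(tree.leaves())   # ['John', 'hit', 'the', 'ball'] -- the lexical tokens

# Print each mother node together with its daughters.
for subtree in tree.subtrees():
    daughters = " ".join(
        child.label() if isinstance(child, Tree) else child
        for child in subtree)
    print(subtree.label(), "->", daughters)
```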

12.2 Dependency-based parse trees

The dependency-based parse trees of dependency grammars[4] see all nodes as terminal, which means they do not acknowledge the distinction between terminal and non-terminal categories. They are simpler on average than constituency-based parse trees because they contain fewer nodes. The dependency-based parse tree for the example sentence above is as follows:

This parse tree lacks the phrasal categories (S, VP, and NP) seen in the constituency-based counterpart above. Like the constituency-based tree, however, this analysis acknowledges constituent structure. Any complete sub-tree of the tree is a constituent. Thus this dependency-based parse tree acknowledges the subject noun John and the object noun phrase the ball as constituents just like the constituency-based parse tree does.
The constituency vs. dependency distinction is far-reaching. Whether the additional syntactic structure associated with constituency-based parse trees is necessary or beneficial is a matter of debate.

12.3 Phrase markers

Phrase markers, or P-markers, were introduced in early transformational generative grammar, as developed by Noam Chomsky and others. A phrase marker representing the deep structure of a sentence is generated by applying phrase structure rules; this may then undergo further transformations. Phrase markers may be presented in the form of trees (as in the above section on constituency-based parse trees), but are often given instead in the form of bracketed expressions, which occupy less space. For example, a bracketed expression corresponding to the constituency-based tree given above may be something like:

[S [NP John][VP [V hit][NP the [N ball]]]]
As with trees, the precise construction of such expressions and the amount of detail shown can depend on the theory being applied and on the points that the author wishes to illustrate.

12.4 See also

• Constituent (linguistics)

• Dependency grammar

• Computational linguistics

• Terminal and non-terminal functions

• Parsing

• Phrase structure grammar

• Sentence diagram

• Verb phrase

• Parse Thicket

12.5 Notes

[1] See Chiswell and Hodges 2007: 34.

[2] See Carnie (2013:118ff.) for an introduction to the basic concepts of syntax trees (e.g. root node, terminal node, non-terminal node, etc.).

[3] See Alfred et al. 2007.

[4] See for example Ágel et al. 2003/2006.

12.6 References

• Ágel, V., Ludwig Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans Heringer, and Hennig Lobin (eds.) 2003/6. Dependency and valency: An international handbook of contemporary research. Berlin: Walter de Gruyter.

• Carnie, A. 2013. Syntax: A generative introduction, 3rd edition. Malden, MA: Wiley-Blackwell.

• Chiswell, Ian and Wilfrid Hodges 2007. Mathematical logic. Oxford: Oxford University Press.

• Aho, Alfred et al. 2007. Compilers: Principles, techniques, & tools. Boston: Pearson/Addison Wesley.

12.7 External links

• Syntax Tree Editor

• Linguistic Tree Constructor

• phpSyntaxTree – Online parse tree drawing site

• phpSyntaxTree (Unicode) – Online parse tree drawing site (improved version that supports Unicode)

• Qtree – LaTeX package for drawing parse trees

• TreeForm Syntax Tree Drawing Software

• rSyntaxTree – Enhanced version of phpSyntaxTree in Ruby with Unicode and vectorized graphics

• Visual Introduction to Parse Trees – Introduction and Transformation

• OpenCourseOnline Dependency Parse Introduction (Christopher Manning)

Chapter 13

Constituent (linguistics)

In syntactic analysis, a constituent is a word or a group of words that function(s) as a single unit within a hierarchical structure. The analysis of constituent structure is associated mainly with phrase structure grammars, although dependency grammars also allow sentence structure to be broken down into constituent parts. The constituent structure of sentences is identified using constituency tests. These tests manipulate some portion of a sentence and, based on the result, deliver clues about the immediate constituent structure of the sentence.
Many constituents are phrases. A phrase is a sequence of one or more words (in some theories two or more) built around a head lexical item and working as a unit within a sentence. A word sequence is shown to be a phrase/constituent if it exhibits one or more of the behaviors discussed below.[1]

13.1 Constituency tests

Constituency tests are diagnostics used to identify the constituent structure of sentences.[2] There are numerous constituency tests applied to English sentences, many of which are listed here: 1. topicalization (fronting), 2. clefting, 3. pseudoclefting, 4. pro-form substitution (replacement), 5. answer ellipsis (question test), 6. passivization, 7. omission (deletion), 8. coordination, etc.
These tests are rough-and-ready tools which grammarians employ to reveal clues about syntactic structure. A word of caution is warranted when employing them, since they often deliver contradictory results. Some syntacticians even arrange the tests on a scale of reliability, with less-reliable tests treated as useful to confirm constituency though not sufficient on their own.[3] Failing a single test does not mean that the unit is not a constituent, and conversely, passing a single test does not necessarily mean that the unit is a constituent. It is best to apply as many tests as possible to a given unit in order to prove or to rule out its status as a constituent.

13.1.1 Topicalization (fronting)

Topicalization involves moving the test sequence to the front of the sentence. It is a simple movement operation:[4]

He is going to attend another course to improve his English. To improve his English, he is going to attend another course.

13.1.2 Clefting

Clefting involves placing a sequence of words X within the structure beginning with It is/was: It was X that...[5]

She bought a pair of gloves with silk embroidery. It was a pair of gloves with silk embroidery that she bought.


13.1.3 Pseudoclefting

Pseudoclefting (also preposing) is similar to clefting in that it puts emphasis on a certain phrase in a sentence. It involves inserting a sequence of words before is/are what or is/are who:[6]

She bought a pair of gloves with silk embroidery. A pair of gloves with silk embroidery is what she bought.

13.1.4 Pro-form substitution (replacement)

Pro-form substitution, or replacement, involves replacing the test constituent with the appropriate pro-form (e.g. pronoun). Substitution normally involves using a definite pro-form like it, he, there, here, etc. in place of a phrase or a clause. If such a change yields a grammatical sentence where the general structure has not been altered, then the test sequence is a constituent:[7]

I don't know the man who is sleeping in the car.
*I don't know him who is sleeping in the car. (ungrammatical)
I don't know him.

The ungrammaticality of the first changed version and the grammaticality of the second one demonstrate that the whole sequence, the man who is sleeping in the car, and not just the man, is a constituent functioning as a unit.

13.1.5 Answer ellipsis (answer fragments, question test)

The answer ellipsis test refers to the ability of a sequence of words to stand alone as a reply to a question. It is often used to test the constituency of a verbal phrase but can also be applied to other phrases:[8]

What did you do yesterday? - Worked on my new project.
What did you do yesterday? - *Worked on. (unacceptable, so worked on is not a constituent)

Linguists do not agree whether passing the answer ellipsis test is sufficient, though at a minimum they agree that it can help confirm the results of another constituency test.

13.1.6 Passivization

Passivization involves changing an active sentence to a passive sentence, or vice versa. The object of the active sentence is changed to the subject of the corresponding passive sentence:[9]

A car driving too fast nearly hit the little dog.
The little dog was nearly hit by a car driving too fast.

If passivization results in a grammatical sentence, the phrases which have been moved can be regarded as constituents.

13.1.7 Omission (deletion)

Omission checks whether a sequence of words can be omitted without influencing the grammaticality of the sentence — in most cases, local or temporal adverbials can be safely omitted and thus qualify as constituents.[10]

Fred relaxes at night on his couch.
Fred relaxes on his couch.
Fred relaxes at night.

Since they can be omitted, the prepositional phrases at night and on his couch are constituents.

13.1.8 Coordination

The coordination test assumes that only constituents can be coordinated, i.e., joined by means of a coordinator such as and:[11]

He enjoys [writing sentences] and [reading them].
[He enjoys writing] and [she enjoys reading] sentences.
[He enjoys] but [she hates] writing sentences.

Based on the fact that writing sentences and reading them are coordinated using and, one can conclude that they are constituents. The validity of the coordination test is challenged by additional data, however. The latter two sentences, which are instances of so-called right node raising, suggest that the bracketed sequences should be understood as constituents. Most grammars do not view sequences such as He enjoys, to the exclusion of the VP writing sentences, as a constituent. Thus while the coordination test is widely employed as a diagnostic for constituent structure, it faces major difficulties and is therefore perhaps the least reliable of all the tests mentioned.[12]

13.2 Constituency tests and disambiguation

Syntactic ambiguity characterizes sentences which can be interpreted in different ways depending solely on how one perceives syntactic connections between words and arranges them into phrases. Possible interpretations of the sentence They killed the man with a gun are:

'The man was shot.'
'The man who was killed had a gun with him.'

The ambiguity of this sentence results from two possible arrangements into constituents:

They killed [the man] [with a gun].
They killed [the man with a gun].

In the first sentence, with a gun is an independent constituent with instrumental meaning. In the second sentence, it is embedded in the noun phrase the man with a gun and is modifying the noun man. The autonomy of the unit with a gun in the first interpretation can be tested by the answer ellipsis test:

How did they kill the man? - With a gun.

However, the same test can be used to prove that the man with a gun in the second sentence should be treated as a unit:

Who(m) did they kill? - The man with a gun.

The ability of constituency tests to disambiguate certain sentences in this manner bears witness to their utility. Most if not all syntacticians employ constituency tests in some form or another to arrive at the structures that they assign to sentences.
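The attachment ambiguity just discussed can also be demonstrated mechanically. The sketch below (assuming the NLTK library; the toy grammar is purely illustrative) licenses both attachments of with a gun and prints the two resulting parse trees:

```python
import nltk

# A toy grammar in which a PP may attach under VP (instrumental reading)
# or inside an NP (modifier reading).
grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> Pro | Det N | NP PP
  VP  -> V NP | VP PP
  PP  -> P NP
  Pro -> 'They'
  V   -> 'killed'
  Det -> 'the' | 'a'
  N   -> 'man' | 'gun'
  P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = ['They', 'killed', 'the', 'man', 'with', 'a', 'gun']
for tree in parser.parse(sentence):
    print(tree)
# One parse attaches the PP under VP ('The man was shot.'); the other embeds
# it in the NP "the man with a gun" ('The man who was killed had a gun.').
```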

13.3 Competing theories

Alternate theoretical approaches to syntax make different assumptions regarding what is considered a constituent. In mainstream phrase structure grammar (and its derivatives), individual words are constituents in and of themselves as well as being parts of other constituents, whereas in dependency grammar,[13] certain core words in each phrase are not a constituent by themselves, but only members of a phrasal constituent.
The following trees show the same sentence in two different theoretical representations, with a phrase structure representation on the left and a dependency grammar representation on the right. In both trees, a constituent is understood to be the entire tree or any labelled subtree (a node plus all the nodes dominated by that node); note that words like killed and with, for instance, form subtrees (and are considered constituents) in the phrase structure representation but not in the dependency structure representation.[14]

13.4 See also


• Non-finite verb

13.5 Notes

[1] Tests for constituent structure can be found in most textbooks on syntax. See for instance Sobin (2011: 30).

[2] See for instance Burton-Roberts (1997:7–23) and Carnie (2002:51-53).

[3] April 22, 2006 Language Log posting by Eric Bakovic of University of California, San Diego

[4] For examples of topicalization used as a constituency test, see for instance Allerton (1979:114), Borsley (1991:24), Napoli (1993:422), Burton-Roberts (1997:17), Poole (2002:32), Radford (2004:72), Haegeman (2006:790).

[5] For examples of clefting used as a constituency test, see Brown and Miller (1980:25), Borsley (1991:24), Napoli (1993:148), McCawley (1997:64), Haegman and Guéron (1999:49), Santorini and Kroch (2000), Akmajian et al. (2001:178); Carnie (2002:52), Haegeman (2006:85).

[6] For examples of pseudoclefting used as a constituency test, see Brown and Miller (1980:25), Borsley (1991:24), McCawley (1997:661), Haegeman and Guéron (1999:50), Haegeman (2006).

[7] For examples of pro-form substitution used as a constituency test, see Radford (1988:92, 1997:109), Haegeman and Guéron (1999:46), Lasnik (2000:9), Santorini and Kroch (2000), Dalrymple (2001:48), Carnie (2002:51), Poole (2002:29), Radford (2004:71), Haegeman (2006:74).

[8] For examples of answer ellipsis used as a constituency test, see Brown and Miller (1980:25), Radford (1988:91, 96), Burton-Roberts (1997:16), Radford (1997:107), Haegeman and Guéron (1999:46), Santorini and Kroch (2000), Carnie (2002:52), Haegeman (2006:82).

[9] For an example of passivization used as a test for constituent structure, see Borsley (1991:24).

[10] For examples of omission used as a constituency test, see Allerton (1979:101f.), Burton-Roberts (1997:15), and Haegeman and Guéron (1999:49).

[11] For examples of coordination used as a test for constituent structure, see Radford (1988:90), Borsley (1991:25), Cowper (1992:34), Napoli (1993:165), Ouhalla (1994:17), Jacobson (1996:60), McCawley (1997:58), Radford (1997:104), Lasnik (2000:11), Akmajian et al. (2001:179), Poole (2002:31).

[12] The problems with coordination as a test for constituent structure have been pointed out in numerous places in the literature. See for instance Brinker (1972:52), Dalrymple (2001:48), Nerbonne (1994:120f.), Carnie (2002:53).

[13] Two prominent sources on dependency grammar are Tesnière (1959) and Ágel, et al. (2003/2006).

[14] For a comparison of these two competing views of constituent structure, see Osborne (2008:1126-32).

13.6 References

• Ágel, V., L. Eichinger, H.-W. Eroms, P. Hellwig, H. Heringer, and H. Lobin (eds.) 2003/6. Dependency and valency: An international handbook of contemporary research. Berlin: Walter de Gruyter.

• Akmajian, A., R. Demers, A. Farmer and R. Harnish. 2001. Linguistics: An introduction to language and communication, 5th edn. Cambridge: MIT Press.

• Allerton, D. 1979. Essentials of grammatical theory: A consensus view of syntax and morphology. London: Routledge and Kegan Paul.

• Borsley, R. 1991. Syntactic theory: A unified approach. London: Edward Arnold.

• Brinker, K. 1972. Konstituentengrammatik und operationale Satzgliedanalyse: Methodenkritische Untersuchungen zur Syntax des einfachen deutschen Satzes. Frankfurt a. M.: Athenäum.

• Brown, K. and J. Miller 1980. Syntax: A linguistic introduction to sentence structure. London: Hutchinson.

• Burton-Roberts, N. 1997. Analysing sentences: An introduction to English syntax. 2nd edition. Longman.

• Carnie, A. 2002. Syntax: A generative introduction. Oxford: Blackwell.

• Carnie, A. 2010. Constituent Structure. Oxford: Oxford University Press.

• Cowper, E. 1992. A concise introduction to syntactic theory: The government-binding approach. Chicago: The University of Chicago Press.

• Dalrymple, M. 2001. Lexical functional grammar. Syntax and semantics 34. San Diego: Academic Press.

• Haegeman, L. 2006. Thinking syntactically: A guide to argumentation and analysis. Malden, MA: Blackwell.

• Haegeman, L. and J. Guéron 1999. English grammar: A generative perspective. Oxford: Basil Blackwell.

• Jacobson, P. 1996. Constituent structure. In Concise encyclopedia of syntactic theories. Cambridge: Pergamon.

• Lasnik, H. 2000. Syntactic structures revisited: Contemporary lectures on classic transformational theory. Cambridge: MIT Press.

• McCawley, J. 1997. The syntactic phenomena of English, 2nd edn. Chicago: University of Chicago Press.

• Napoli, D. 1993. Syntax: Theory and problems. New York: Oxford University Press.

• Nerbonne, J. 1994. Partial verb phrases and spurious ambiguities. In: J. Nerbonne, K. Netter and C. Pollard (eds.), German in Head-Driven Phrase Structure Grammar, CSLI Lecture Notes Number 46, 109–150. Stanford: CSLI Publications.

• Osborne, T. 2008. Major constituents: And two dependency grammar constraints on sharing in coordination. Linguistics 46, 6, 1109–1165.

• Ouhalla, J. 1994. Introducing transformational grammar: From rules to principles and parameters. Oxford: Oxford University Press.

• Poole, G. 2002. Syntactic theory. New York: Palgrave.

• Radford, A. 1988. Transformational grammar: A first course. Cambridge, UK: Cambridge University Press.

• Radford, A. 1997. Syntactic theory and the structure of English: A minimalist approach. Cambridge, UK: Cambridge University Press.

• Radford, A. 2004. English syntax: An introduction. Cambridge, UK: Cambridge University Press.

• Santorini, B. and A. Kroch 2000. The syntax of natural language: An online introduction using the trees program. Available at (accessed on March 14, 2011): http://www.ling.upenn.edu/~beatrice/syntax-textbook/00/index.html.

• Sobin, N. 2011. Syntactic analysis: The basics. Malden, MA: Wiley-Blackwell.

• Tesnière, L. 1959. Éléments de syntaxe structurale. Paris: Klincksieck.

Chapter 14

Dependency grammar

Dependency grammar (DG) is a class of modern syntactic theories that are all based on the dependency relation (as opposed to the constituency relation) and that can be traced back primarily to the work of Lucien Tesnière. Dependency is the notion that linguistic units, e.g. words, are connected to each other by directed links. The (finite) verb is taken to be the structural center of clause structure. All other syntactic units (words) are either directly or indirectly connected to the verb in terms of the directed links, which are called dependencies. DGs are distinct from phrase structure grammars (constituency grammars) since DGs lack phrasal nodes although they acknowledge phrases. Structure is determined by the relation between a word (a head) and its dependents. Dependency structures are flatter than constituency structures in part because they lack a finite verb phrase constituent, and they are thus well suited for the analysis of languages with free , such as Czech, Turkish, and Warlpiri.

14.1 History

The notion of dependencies between grammatical units has existed since the earliest recorded grammars, e.g. Pāṇini, and the dependency concept therefore arguably predates the constituency notion by many centuries.[1] Ibn Maḍāʾ, a 12th-century linguist from Córdoba, Andalusia, may have been the first grammarian to use the term dependency in the grammatical sense that we use it today. In early modern times, the dependency concept seems to have coexisted side by side with the constituency concept, the latter having entered Latin, French, English and other grammars from the widespread study of term logic of antiquity.[2] Dependency is also concretely present in the works of Sámuel Brassai (1800–1897), a Hungarian linguist, and of Heimann Hariton Tiktin (1850–1936), a Romanian linguist.[3]
Modern dependency grammars, however, begin primarily with the work of Lucien Tesnière. Tesnière was a Frenchman, a polyglot, and a professor of linguistics at the universities in Strasbourg and Montpellier. His major work Éléments de syntaxe structurale was published posthumously in 1959 – he died in 1954. The basic approach to syntax he developed seems to have been seized upon independently by others in the 1960s[4] and a number of other dependency-based grammars have gained prominence since those early works.[5] DG has generated a lot of interest in Germany[6] in both theoretical syntax and language pedagogy. In recent years, the great development surrounding dependency-based theories has come from computational linguistics and is due, in part, to the influential work that David Hays did in machine translation at the RAND Corporation in the 1950s and 1960s. Dependency-based systems are increasingly being used to parse natural language and generate tree banks. Interest in dependency grammar is growing at present, international conferences on dependency linguistics being a relatively recent development (Depling 2011, Depling 2013, Depling 2015).

14.2 Dependency vs. constituency

Dependency is a one-to-one correspondence: for every element (e.g. word or morph) in the sentence, there is exactly one node in the structure of that sentence that corresponds to that element. The result of this one-to-one correspondence is that dependency grammars are word (or morph) grammars. All that exist are the elements and the dependencies that connect the elements into a structure. This situation should be compared with the constituency relation of phrase structure grammars. Constituency is a one-to-one-or-more correspondence, which means that, for every element in a sentence, there are one or more nodes in the structure that correspond to that element. The result of this difference is that dependency structures are minimal[7] compared to their constituency structure counterparts, since they tend to contain many fewer nodes.

These two trees illustrate just two possible ways to render the dependency and constituency relations (see below). This dependency tree is an “ordered” tree, i.e. it reflects actual word order. Many dependency trees abstract away from linear order and focus just on hierarchical order, which means they do not show actual word order. This constituency tree follows the conventions of bare phrase structure (BPS), whereby the words themselves are employed as the node labels.
The distinction between dependency- and constituency-based grammars derives in large part from the initial division of the clause. The constituency relation derives from an initial binary division, whereby the clause is split into a subject noun phrase (NP) and a predicate verb phrase (VP). This division is certainly present in the basic analysis of the clause that we find in the works of, for instance, Leonard Bloomfield and Noam Chomsky. Tesnière, however, argued vehemently against this binary division, preferring instead to position the verb as the root of all clause structure. Tesnière’s stance was that the subject-predicate division stems from term logic and has no place in linguistics.[8] The importance of this distinction is that if one acknowledges the initial subject-predicate division in syntax as something real, then one is likely to go down the path of constituency grammar, whereas if one rejects this division, then the only alternative is to position the verb as the root of all structure, which means one has chosen the path of dependency grammar.

14.3 Dependency grammars

The following frameworks are dependency-based:

• Algebraic syntax

• Operator grammar

• Functional generative description

• Lexicase

• Meaning–text theory

• Word grammar

• Extensible dependency grammar

Link grammar is based on the dependency relation, but link grammar does not include directionality in the dependencies between words, and thus does not describe head-dependent relationships. Hybrid dependency/constituency grammar uses dependencies between words, but also includes dependencies between phrasal nodes – see for example the Quranic Arabic Dependency Treebank. The derivation trees of tree-adjoining grammar are dependency-based, although the full trees of TAG are constituency-based, so in this regard, it is not clear whether TAG should be viewed more as a dependency or a constituency grammar.

Hybrid constituency/dependency tree from the Quranic Arabic Corpus

There are major differences between the grammars just listed. In this regard, the dependency relation is compatible with other major tenets of theories of grammar. Thus like constituency grammars, dependency grammars can be mono- or multistratal, representational or derivational, construction- or rule-based.

14.4 Representing dependencies

There are various conventions that DGs employ to represent dependencies. The following schemata (in addition to the tree above and the trees further below) illustrate some of these conventions:

The representations in (a–d) are trees, whereby the specific conventions employed in each tree vary. Solid lines are dependency edges and lightly dotted lines are projection lines. The only difference between tree (a) and tree (b) is that tree (a) employs the category class to label the nodes whereas tree (b) employs the words themselves as the node labels.[9] Tree (c) is a reduced tree insofar as the string of words below and the projection lines are deemed unnecessary and are hence omitted. Tree (d) abstracts away from linear order and reflects just hierarchical order.[10] The arrow arcs in (e) are an alternative convention used to show dependencies and are favored by Word Grammar.[11] The brackets in (f) are seldom used, but are nevertheless quite capable of reflecting the dependency hierarchy; dependents appear enclosed in more brackets than their heads. And finally, the indentations like those in (g) are another convention that is sometimes employed to indicate the hierarchy of words.[12] Dependents are placed underneath their heads and indented. Like tree (d), the indentations in (g) abstract away from linear order.
The point to these conventions is that they are just that, namely conventions. They do not influence the basic commitment to dependency as the relation that is grouping syntactic units.
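Convention (g) in particular is easy to mechanize. The following sketch (illustrative names and encoding) stores a dependency tree as (word, head-index) pairs and prints each dependent indented beneath its head, abstracting away from linear order just as convention (g) does:

```python
# A dependency tree for "John hit the ball" as (word, head-index) pairs;
# head index 0 marks the root, and words are indexed from 1 in list order.
TREE = [("hit", 0), ("John", 1), ("ball", 1), ("the", 3)]

def print_indented(head, depth=0):
    """Print every dependent of `head`, each above its own dependents."""
    for i, (word, h) in enumerate(TREE, start=1):
        if h == head:
            print("  " * depth + word)
            print_indented(i, depth + 1)

print_indented(0)
# hit
#   John
#   ball
#     the
```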

14.5 Types of dependencies

The dependency representations above (and further below) show syntactic dependencies. Indeed, most work in dependency grammar focuses on syntactic dependencies. Syntactic dependencies are, however, just one of three or four types of dependencies. Meaning–text theory, for instance, emphasizes the role of semantic and morphological dependencies in addition to syntactic dependencies.[13] A fourth type, prosodic dependencies, can also be acknowledged. Distinguishing between these types of dependencies can be important, in part because if one fails to do so, the likelihood that semantic, morphological, and/or prosodic dependencies will be mistaken for syntactic dependencies is great. The following four subsections briefly sketch each of these dependency types. During the discussion, the existence of syntactic dependencies is taken for granted and used as an orientation point for establishing the nature of the other three dependency types.

14.5.1 Semantic dependencies

Semantic dependencies are understood in terms of predicates and their arguments.[14] The arguments of a predicate are semantically dependent on that predicate. Often, semantic dependencies overlap with and point in the same direction as syntactic dependencies. At times, however, semantic dependencies can point in the opposite direction of syntactic dependencies, or they can be entirely independent of syntactic dependencies. The hierarchy of words in the following examples shows standard syntactic dependencies, whereas the arrows indicate semantic dependencies:

The two arguments Sam and Sally in tree (a) are dependent on the predicate likes, whereby these arguments are also syntactically dependent on likes. What this means is that the semantic and syntactic dependencies overlap and point in the same direction (down the tree). Attributive adjectives, however, are predicates that take their head noun as their argument, hence big is a predicate in tree (b) that takes bones as its one argument; the semantic dependency points up the tree and therefore runs counter to the syntactic dependency. A similar situation obtains in (c), where the preposition predicate on takes the two arguments the picture and the wall; one of these semantic dependencies points up the syntactic hierarchy, whereas the other points down it. Finally, the predicate to help in (d) takes the one argument Jim but is not directly connected to Jim in the syntactic hierarchy, which means that that semantic dependency is entirely independent of the syntactic dependencies.

14.5.2 Morphological dependencies

Morphological dependencies obtain between words or parts of words.[15] When a given word or part of a word influences the form of another word, then the latter is morphologically dependent on the former. Agreement and concord are therefore manifestations of morphological dependencies. Like semantic dependencies, morphological dependencies can overlap with and point in the same direction as syntactic dependencies, overlap with and point in the opposite direction of syntactic dependencies, or be entirely independent of syntactic dependencies. The arrows are now used to indicate morphological dependencies.

The plural houses in (a) demands the plural of the demonstrative determiner, hence these appears, not this, which means there is a morphological dependency that points down the hierarchy from houses to these. The situation is reversed in (b), where the singular subject Sam demands the appearance of the agreement suffix -s on the finite verb works, which means there is a morphological dependency pointing up the hierarchy from Sam to works. The type of determiner in the German examples (c) and (d) influences the inflectional suffix that appears on the adjective alt. When the indefinite article ein appears, it lacks gender, so the strong masculine ending -er appears on the adjective. When the definite article der appears, in contrast, it shows masculine gender, which means the weak ending -e appears on the adjective. Thus since the choice of determiner impacts the morphological form of the adjective, there is a morphological dependency pointing from the determiner to the adjective, whereby this morphological dependency is entirely independent of the syntactic dependencies. Consider further the following French sentences:

The masculine subject le chien in (a) demands the masculine form of the predicative adjective blanc, whereas the feminine subject la maison demands the feminine form of this adjective. A morphological dependency that is entirely independent of the syntactic dependencies therefore points again across the syntactic hierarchy.
Morphological dependencies play an important role in typological studies. Languages are classified as mostly head-marking (Sam work-s) or mostly dependent-marking (these houses), whereby most if not all languages contain at least some minor measure of both head and dependent marking.[16]

14.5.3 Prosodic dependencies

Prosodic dependencies are acknowledged in order to accommodate the behavior of clitics.[17] A clitic is a syntactically autonomous element that is prosodically dependent on a host. A clitic is therefore integrated into the prosody of its host, meaning that it forms a single word with its host. Prosodic dependencies exist entirely in the linear dimension (horizontal dimension), whereas standard syntactic dependencies exist in the hierarchical dimension (vertical dimension). Classic examples of clitics in English are reduced auxiliaries (e.g. -ll, -s, -ve) and the possessive marker -s. The prosodic dependencies in the following examples are indicated with the hyphen and the lack of a vertical projection line:

The hyphens and the lack of projection lines indicate prosodic dependencies. A hyphen that appears on the left of the clitic indicates that the clitic is prosodically dependent on the word immediately to its left (He'll, There’s), whereas a hyphen that appears on the right side of the clitic (not shown here) indicates that the clitic is prosodically dependent on the word that appears immediately to its right. A given clitic is often prosodically dependent on its syntactic dependent (He'll, There’s) or on its head (would've). At other times, it can depend prosodically on a word that is neither its head nor its immediate dependent (Florida’s).

14.5.4 Syntactic dependencies

Syntactic dependencies are the focus of most work in dependency grammar, as stated above. How the presence and the direction of syntactic dependencies are determined is of course often open to debate. In this regard, it must be acknowledged that the validity of syntactic dependencies in the trees throughout this article is being taken for granted. However, these hierarchies are such that many dependency grammars can largely support them, although there will certainly be points of disagreement. The basic question about how syntactic dependencies are discerned has proven difficult to answer definitively. One should acknowledge in this area, however, that the basic task of identifying and discerning the presence and direction of the syntactic dependencies of dependency grammars is no easier or harder than determining the constituent groupings of constituency grammars. A variety of heuristics are employed to this end, basic constituency tests being useful tools; the syntactic dependencies assumed in the trees in this article are grouping words together in a manner that most closely matches the results of standard permutation, substitution, and ellipsis constituency tests. Etymological considerations also provide helpful clues about the direction of dependencies. A promising principle upon which to base the existence of syntactic dependencies is distribution.[18] When one is striving to identify the root of a given phrase, the word that is most responsible for determining the distribution of that phrase as a whole is its root.

14.6 Linear order and discontinuities

Traditionally, DGs have had a different approach to linear order (word order) than constituency grammars. Dependency-based structures are minimal compared to their constituency-based counterparts, and these minimal structures allow one to focus intently on the two ordering dimensions.[19] Separating the vertical dimension (hierarchical order) from the horizontal dimension (linear order) is easily accomplished. This aspect of dependency-based structures has allowed DGs, starting with Tesnière (1959), to focus on hierarchical order in a manner that is hardly possible for constituency grammars. For Tesnière, linear order was secondary to hierarchical order insofar as hierarchical order preceded linear order in the mind of a speaker. The stemmas (trees) that Tesnière produced reflected this view; they abstracted away from linear order to focus almost entirely on hierarchical order. Many DGs that followed Tesnière adopted this practice, that is, they produced tree structures that reflect hierarchical order alone, e.g.

The traditional focus on hierarchical order generated the impression that DGs have little to say about linear order, and it has contributed to the view that DGs are particularly well-suited to examine languages with free word order. A negative result of this focus on hierarchical order, however, is that there is a dearth of dependency-based explorations of particular word order phenomena, such as standard discontinuities. Comprehensive dependency grammar accounts of topicalization, wh-fronting, scrambling, and extraposition are mostly absent from many established dependency-based frameworks. This situation can be contrasted with constituency grammars, which have devoted tremendous effort to exploring these phenomena.
The nature of the dependency relation does not, however, prevent one from focusing on linear order. Dependency-based structures are as capable of exploring word order phenomena as constituency-based structures. The following trees illustrate this point; they represent one way of exploring discontinuities using dependency-based structures. The trees suggest the manner in which common discontinuities can be addressed. An example from German is used to illustrate a scrambling discontinuity:

The a-trees on the left show projectivity violations (= crossing lines), and the b-trees on the right demonstrate one means of addressing these violations. The displaced constituent takes on a word as its head that is not its governor. The words in red mark the catena (=chain) of words that extends from the root of the displaced constituent to the governor of that constituent.[20] Discontinuities are then explored in terms of these catenae. The limitations on topicalization, wh-fronting, scrambling, and extraposition can be explored and identified by examining the nature of the catenae involved.
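The “crossing lines” (projectivity violations) mentioned above can also be checked mechanically. A small sketch, treating each dependency as an arc between 1-based word positions (the arcs below are illustrative, not taken from the German trees):

```python
# An ordered dependency tree drawn above the sentence is projective when no
# two arcs cross; crossing arcs correspond to discontinuities.
def crossing(arc1, arc2):
    """True if the two arcs interleave, i.e. cross when drawn as arcs."""
    (a, b), (c, d) = sorted(arc1), sorted(arc2)
    return a < c < b < d or c < a < d < b

def is_projective(arcs):
    return not any(crossing(x, y)
                   for i, x in enumerate(arcs) for y in arcs[i + 1:])

print(is_projective([(2, 1), (2, 4), (4, 3)]))  # True: the arcs nest
print(is_projective([(2, 1), (2, 4), (3, 5)]))  # False: (2,4) and (3,5) cross
```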

14.7 Syntactic functions

Traditionally, DGs have treated the syntactic functions (= grammatical functions, grammatical relations) as primitive. They posit an inventory of functions (e.g. subject, object, oblique, determiner, attribute, predicative, etc.). These functions can appear as labels on the dependencies in the tree structures, e.g.[21]

The syntactic functions in this tree are shown in green: ATTR (attribute), COMP-P (complement of preposition), COMP-TO (complement of to), DET (determiner), P-ATTR (prepositional attribute), PRED (predicative), SUBJ (subject), TO-COMP (to complement). The functions chosen and abbreviations used in the tree here are merely representative of the general stance of DGs toward the syntactic functions. The actual inventory of functions and designations employed varies from DG to DG.
As a primitive of the theory, the status of these functions is much different than in some constituency grammars. Traditionally, constituency grammars derive the syntactic functions from the constellation. For instance, the object is identified as the NP appearing inside finite VP, and the subject as the NP appearing outside of finite VP. Since DGs reject the existence of a finite VP constituent, they were never presented with the option to view the syntactic functions in this manner. The issue is a question of what comes first: traditionally, DGs take the syntactic functions to be primitive and then derive the constellation from these functions, whereas constituency grammars traditionally take the constellation to be primitive and then derive the syntactic functions from the constellation.
This question about what comes first (the functions or the constellation) is not an inflexible matter. The stances of both grammar types (dependency and constituency grammars) are not narrowly limited to the traditional views. Dependency and constituency are both fully compatible with both approaches to the syntactic functions. Indeed, monostratal systems, be they dependency- or constituency-based, will likely reject the notion that the functions are derived from the constellation or that the constellation is derived from the functions. They will take both to be primitive, which means neither can be derived from the other.

14.8 See also

• Catena

• Constituent

• Dependency relation (in mathematics)

• Discontinuity

• Finite verb

• Lucien Tesnière

• Phrase structure grammar

• Predicate

• Verb phrase

14.9 Notes

[1] Concerning the history of the dependency concept, see Percival (1990).

[2] Concerning the influence of term logic on the theory of grammar, see Percival (1976).

[3] Concerning dependency in the works of Brassai, see Imrényi (2013), and concerning dependency in the works of Tiktin, see Coseriu (1980).

[4] Concerning early dependency grammars that may have developed independently of Tesnière’s work, see for instance Hays (1960), Gaifman (1965), and Robinson (1970).

[5] Some prominent dependency grammars that were well established by the 1980s are from Hudson (1984), Sgall, Hajičová and Panevová (1986), Mel’čuk (1988), and Starosta (1988).

[6] Some prominent dependency grammars from the German schools are from Heringer (1996), Engel (1994), and Eroms (2000); Ágel et al. (2003/6) is a massive two-volume collection of essays on dependency and valence grammars from more than 100 authors.

[7] The minimality of dependency structures is emphasized, for instance, by Ninio (2006) and by Osborne et al. (2011).

[8] Concerning Tesnière’s rejection of the subject-predicate division of the clause, see Tesnière (1959:103–105), and for discussion of empirical considerations that support Tesnière’s point, see Matthews (2007:17ff.), Miller (2011:54ff.), and Osborne et al. (2011:323f.).

[9] The conventions illustrated with trees (a) and (b) are preferred by Osborne et al. (2011, 2013).

[10] Unordered trees like (d) are associated above all with Tesnière’s stemmas and with the syntactic strata of Mel’čuk’s Meaning-Text Theory.

[11] Three major works on Word Grammar are Hudson (1984, 1990, 2007).

[12] Lobin (2003) makes heavy use of these indentations.

[13] For a discussion of semantic, morphological, and syntactic dependencies in Meaning-Text Theory, see Melʹc̆uk (2003:191ff.).

[14] Concerning semantic dependencies, see Melʹc̆uk (2003:192f.).

[15] Concerning morphological dependencies, see Melʹc̆uk (2003:193ff.).

[16] The distinction between head- and dependent-marking was established by Nichols (1986). Nichols was using a dependency-based understanding of these distinctions.

[17] Concerning prosodic dependencies and the analysis of clitics, see Groß (2011).

[18] Distribution is the primary principle used by Owens (1984:36), Schubert (1988:40), and Mel’čuk (2003:200) for discerning syntactic dependencies.

[19] Concerning the importance of the two ordering dimensions, see Tesnière (1959:16ff).

[20] See Osborne et al. (2012) concerning catenae.

[21] For discussion and examples of the labels for syntactic functions that are attached to dependency edges and arcs, see for instance Mel'cuk (1988:22, 69) and van Valin (2001:102ff.).

14.10 References

• Ágel, Vilmos; Eichinger, Ludwig M.; Eroms, Hans Werner; Hellwig, Peter; Heringer, Hans Jürgen; Lobin, Henning, eds. (2003). Dependenz und Valenz: Ein internationales Handbuch der zeitgenössischen Forschung [Dependency and Valency: An International Handbook of Contemporary Research] (in German). Berlin: de Gruyter. ISBN 978-3110141900. Retrieved 24 August 2012.

• Coseriu, E. 1980. Un précurseur méconnu de la syntaxe structurale: H. Tiktin. In Recherches de Linguistique : Hommage à Maurice Leroy. Éditions de l’Université de Bruxelles, 48–62.

• Engel, U. 1994. Syntax der deutschen Sprache, 3rd edition. Berlin: Erich Schmidt Verlag.

• Eroms, Hans-Werner (2000). Syntax der deutschen Sprache. Berlin [u.a.]: de Gruyter. ISBN 978-3110156669. Retrieved 24 August 2012.

• Groß, T. 2011. Clitics in dependency morphology. Depling 2011 Proceedings, 58–68.

• Helbig, Gerhard; Buscha, Joachim (2007). Deutsche Grammatik: ein Handbuch für den Ausländerunterricht [German Grammar: A Handbook for Teaching Foreigners] (6th ed.). Berlin: Langenscheidt. ISBN 978-3-468-49493-2. Retrieved 24 August 2012.

• Heringer, H. 1996. Deutsche Syntax dependentiell. Tübingen: Stauffenburg.

• Hays, D. 1960. Grouping and dependency theories. P-1910, RAND Corporation.

• Hays, D. 1964. Dependency theory: A formalism and some observations. Language, 40: 511-525. Reprinted in Syntactic Theory 1, Structuralist, edited by Fred W. Householder. Penguin, 1972.

• Hudson, Richard (1984). Word grammar (1st publ. ed.). Oxford, OX, England: B. Blackwell. ISBN 978-0631131861.

• Hudson, R. 1990. An English Word Grammar. Oxford: Basil Blackwell.

• Hudson, R. 2007. Language Networks: The New Word Grammar. Oxford University Press.

• Imrényi, A. 2013. Constituency or dependency? Notes on Sámuel Brassai’s syntactic model of Hungarian. In Szigetvári, Péter (ed.), VLlxx. Papers Presented to László Varga on his 70th Birthday. Budapest: Tinta. 167–182.

• Liu, H. 2009. Dependency Grammar: from Theory to Practice. Beijing: Science Press.

• Lobin, H. 2003. Koordinationssyntax als prozedurales Phänomen. Tübingen: Gunter Narr-Verlag.

• Matthews, P. H. (2007). Syntactic Relations: a critical survey (1. publ. ed.). Cambridge: Cambridge University Press. ISBN 9780521608299. Retrieved 24 August 2012.

• Melʹc̆uk, Igor A. (1987). Dependency syntax: theory and practice. Albany: State University of New York Press. ISBN 978-0-88706-450-0. Retrieved 24 August 2012.

• Melʹc̆uk, I. 2003. Levels of dependency in linguistic description: Concepts and problems. In Ágel et al., 170–187.

• Miller, J. 2011. A critical introduction to syntax. London: continuum.

• Nichols, J. 1986. Head-marking and dependent-marking languages. Language 62, 56–119.

• Ninio, A. 2006. Language and the learning curve: A new theory of syntactic development. Oxford: Oxford University Press.

• Osborne, T., M. Putnam, and T. Groß 2011. Bare phrase structure, label-less trees, and specifier-less syntax: Is Minimalism becoming a dependency grammar? The Linguistic Review 28, 315–364.

• Osborne, T., M. Putnam, and T. Groß 2012. Catenae: Introducing a novel unit of syntactic analysis. Syntax 15, 4, 354–396.

• Owens, J. 1984. On getting a head: A problem in dependency grammar. Lingua 66, 25–42. 14.11. EXTERNAL LINKS 63

• Percival, K. 1976. On the historical source of immediate-constituent analysis. In: Notes from the linguistic underground, James McCawley (ed.), Syntax and Semantics 7, 229–242. New York: Academic Press.

• Percival, K. 1990. Reflections on the history of dependency notions in linguistics. Historiographia Linguistica 17, 29–47.

• Robinson, J. 1970. Dependency structures and transformational rules. Language 46, 259–285.

• Schubert, K. 1988. Metataxis: Contrastive dependency syntax for machine translation. Dordrecht: Foris.

• Sgall, P., E. Hajičová, and J. Panevová 1986. The meaning of the sentence in its semantic and pragmatic aspects. Dordrecht: D. Reidel Publishing Company.

• Starosta, S. 1988. The case for lexicase. London: Pinter Publishers.

• Tesnière, L. 1959. Éléments de syntaxe structurale. Paris: Klincksieck.

• Tesnière, L. 1966. Éléments de syntaxe structurale, 2nd edition. Paris: Klincksieck.

• Tesnière, L. 2015. Elements of structural syntax [English translation of Tesnière 1966]. Amsterdam: John Benjamins.

• van Valin, R. 2001. An introduction to syntax. Cambridge, UK: Cambridge University Press.

14.11 External links

• Universal Dependencies – a set of treebanks in a harmonized dependency grammar representation

Chapter 15

Phrase structure grammar

The term phrase structure grammar was originally introduced by Noam Chomsky as the term for grammars as defined by phrase structure rules,[1] i.e. rewrite rules of the type studied previously by Emil Post and Axel Thue (Post canonical systems). Some authors, however, reserve the term for more restricted grammars in the Chomsky hierarchy: context-sensitive grammars, or context-free grammars. In a broader sense, phrase structure grammars are also known as constituency grammars. The defining trait of phrase structure grammars is thus their adherence to the constituency relation, as opposed to the dependency relation of dependency grammars.

15.1 Constituency relation

In linguistics, phrase structure grammars are all those grammars that are based on the constituency relation, as opposed to the dependency relation associated with dependency grammars; hence phrase structure grammars are also known as constituency grammars.[2] Any of several related theories for the parsing of natural language qualify as constituency grammars, and most of them have been developed from Chomsky’s work, including

• Government and Binding Theory,

• Generalized Phrase Structure Grammar,

• Head-Driven Phrase Structure Grammar,

• Lexical Functional Grammar,

• The Minimalist Program, and

• Nanosyntax.

Further grammar frameworks and formalisms also qualify as constituency-based, although they may not think of themselves as having spawned from Chomsky’s work, e.g.

• Arc Pair Grammar.

The fundamental trait that these frameworks all share is that they view sentence structure in terms of the constituency relation. The constituency relation derives from the subject-predicate division of Latin and Greek grammars that is based on term logic and reaches back to Aristotle in antiquity. Basic clause structure is understood in terms of a binary division of the clause into subject (noun phrase NP) and predicate (verb phrase VP). The binary division of the clause results in a one-to-one-or-more correspondence. For each element in a sentence, there are one or more nodes in the tree structure that one assumes for that sentence. A two-word sentence such as Luke laughed necessarily implies three (or more) nodes in the syntactic structure: one for the noun Luke (subject NP), one for the verb laughed (predicate VP), and one for the entirety Luke laughed (sentence S). The constituency grammars listed above all view sentence structure in terms of this one-to-one-or-more correspondence.

64 15.2. DEPENDENCY RELATION 65

15.2 Dependency relation

By the time of Gottlob Frege, a competing understanding of the logic of sentences had arisen. Frege rejected the binary division of the sentence and replaced it with an understanding of sentence logic in terms of predicates and their arguments. On this alternative conception of sentence logic, the binary division of the clause into subject and predicate was not possible. It therefore opened the door to the dependency relation (although the dependency relation had also existed in a less obvious form in traditional grammars long before Frege). The dependency relation was first acknowledged concretely and developed as the basis for a comprehensive theory of syntax and grammar by Lucien Tesnière in his posthumously published work Éléments de syntaxe structurale (Elements of Structural Syntax).[3] The dependency relation is a one-to-one correspondence: for every element (word or morph) in a sentence, there is just one node in the syntactic structure. The distinction is thus a graph-theoretical distinction. The dependency relation restricts the number of nodes in the syntactic structure of a sentence to the exact number of syntactic units (usually words) that that sentence contains. Thus the two-word sentence Luke laughed implies just two syntactic nodes, one for Luke and one for laughed. Some prominent dependency grammars are listed here:

• Algebraic Syntax
• Functional Generative Description
• Lexicase
• Meaning-Text Theory
• Operator Grammar
• Word Grammar

Since these grammars are all based on the dependency relation, they are by definition not phrase structure grammars.

15.3 Non-descript grammars

Other grammars generally avoid attempts to group syntactic units into clusters in a manner that would allow classification in terms of the constituency vs. dependency distinction. In this respect, the following grammar frameworks do not come down solidly on either side of the dividing line:

• Construction grammar
• Cognitive grammar

15.4 See also

• Dependency grammar

• Gottlob Frege

• Lucien Tesnière
• Predicate

• Subject
• Verb phrase

15.5 Notes

[1] See Chomsky (1957).

[2] Matthews (1981:71ff.) provides an insightful discussion of the distinction between constituency- and dependency-based grammars. See also Allerton (1979:238f.), McCawley (1988:13), Mel'cuk (1988:12-14), Borsley (1991:30f.), Sag and Wasow (1999:421f.), van Valin (2001:86ff.).

[3] See Tesnière (1959).

15.6 References

• Allerton, D. 1979. Essentials of grammatical theory. London: Routledge & Kegan Paul.
• Borsley, R. 1991. Syntactic theory: A unified approach. London: Edward Arnold.

• Chomsky, Noam. 1957. Syntactic structures. The Hague/Paris: Mouton.
• Matthews, P. 1981. Syntax. Cambridge, UK: Cambridge University Press. ISBN 978-0521297097.

• McCawley, J. 1988. The syntactic phenomena of English, Vol. 1. Chicago: The University of Chicago Press.
• Mel'cuk, I. 1988. Dependency syntax: Theory and practice. Albany: SUNY Press.

• Sag, I. and T. Wasow. 1999. Syntactic theory: A formal introduction. Stanford, CA: CSLI Publications.
• Tesnière, Lucien. 1959. Éléments de syntaxe structurale. Paris: Klincksieck.

• van Valin, R. 2001. An introduction to syntax. Cambridge, UK: Cambridge University Press.

Chapter 16

Verb phrase

In linguistics, a verb phrase (VP) is a syntactic unit composed of at least one verb and its dependents—objects, complements and other modifiers—but not always including the subject. Thus in the sentence A fat man put the money quickly in the box, the words put the money quickly in the box are a verb phrase; it consists of the verb put and its dependents, but not the subject a fat man. A verb phrase is similar to what is considered a predicate in more traditional grammars. Verb phrases are generally divided into two types: finite, in which the head of the phrase is a finite verb; and nonfinite, where the head is a nonfinite verb, such as an infinitive, participle or gerund. Phrase structure grammars acknowledge both types, but dependency grammars treat the subject as just another verbal dependent, and they do not recognize the finite verbal phrase constituent. Understanding verb phrase analysis thus depends on knowing which theory is in play in a given context.

16.1 Verb phrases in phrase structure grammars

In phrase structure grammars such as generative grammar, the verb phrase is one headed by a verb. It may be composed of only a single verb, but typically it consists of combinations of main and auxiliary verbs, plus optional specifiers, complements (not including subject complements), and adjuncts. For example:

Yankee batters hit the ball well enough to win their first World Series since 2000.

Mary saw the man through the window.

David gave Mary a book.

The first example contains the long verb phrase hit the ball well enough to win their first World Series since 2000; the second is a verb phrase composed of the main verb saw, the complement phrase the man (a noun phrase), and the adjunct phrase through the window (a prepositional phrase). The third example presents three elements, the main verb gave, the noun Mary, and the noun phrase a book, all of which make up the verb phrase. Note that the verb phrase described here corresponds to the predicate of traditional grammar. Current views vary on whether all languages have a verb phrase; some schools of generative grammar (such as Principles and Parameters) hold that all languages have a verb phrase, while others (such as Lexical Functional Grammar) take the view that at least some languages lack a verb phrase constituent, including those languages with a very free word order (the so-called non-configurational languages, such as Japanese, Hungarian, or Australian aboriginal languages), and some languages with a default VSO order (several Celtic and Oceanic languages). Phrase structure grammars view both finite and nonfinite verb phrases as constituent phrases and, consequently, do not draw any key distinction between them. Dependency grammars (described below) are much different in this regard.


16.2 Verb phrases in dependency grammars

While phrase structure grammars (constituency grammars) acknowledge both finite and non-finite VPs as constituents (complete subtrees), dependency grammars reject the former. That is, dependency grammars acknowledge only non-finite VPs as constituents; finite VPs do not qualify as constituents in dependency grammars. For example:

John [has finished the work]. – Finite VP in brackets
John has [finished the work]. – Non-finite VP in brackets

Since has finished the work contains the finite verb has, it is a finite VP, and since finished the work contains the non-finite verb finished but lacks a finite verb, it is a non-finite VP. Similar examples:

They [do not want to try that]. – Finite VP in brackets
They do not [want to try that]. – One non-finite VP in brackets
They do not want [to try that]. – Another non-finite VP in brackets

These examples illustrate well that many clauses can contain more than one non-finite VP, but they generally contain only one finite VP. Starting with Lucien Tesnière (1959),[1] dependency grammars challenge the validity of the initial binary division of the clause into subject (NP) and predicate (VP), which means they reject the notion that the second half of this binary division, i.e. the finite VP, is a constituent. They do, however, readily acknowledge the existence of non-finite VPs as constituents. The two competing views of verb phrases are visible in the following trees:

The constituency tree on the left shows the finite VP has finished the work as a constituent, since it corresponds to a complete subtree. The dependency tree on the right, in contrast, does not acknowledge a finite VP constituent, since there is no complete subtree there that corresponds to has finished the work. Note that the analyses agree concerning the non-finite VP finished the work; both see it as a constituent (complete subtree). Dependency grammars point to the results of many standard constituency tests to back up their stance.[2] For instance, topicalization, pseudoclefting, and answer ellipsis suggest that non-finite VP does, but finite VP does not, exist as a constituent:

*...and [has finished the work], John. – Topicalization

*What John has done is [has finished the work]. – Pseudoclefting

What has John done? – *[Has finished the work]. – Answer ellipsis

The * indicates that the sentence is bad. These data must be compared to the results for non-finite VP:

...and [finished the work], John (certainly) has. – Topicalization

What John has done is [finished the work]. – Pseudoclefting

What has John done? – [Finished the work]. – Answer ellipsis

The strings in brackets are the ones in focus. Attempts to isolate the finite VP in this way fail, but the same attempts with the non-finite VP succeed.[3]

16.3 Verb phrases narrowly defined

Verb phrases are sometimes defined more narrowly, admitting only strictly verbal elements; on this narrower view, a verb phrase consists only of main and auxiliary verbs, plus infinitive or participle constructions.[4] For example, in the following sentences only the words in brackets would be used in forming the verb phrase:

John [has given] Mary a book.
The picnickers [were being eaten] alive by mosquitos.
She [kept screaming] like a football maniac.
Thou [shalt] not [kill].

This narrower definition is often applied in functionalist frameworks and traditional European reference grammars. It is incompatible with the phrase structure model, because the strings in brackets are not constituents under that analysis. It is, however, compatible with dependency grammars and other grammars that view the verb catena (verb chain) as the fundamental unit of syntactic structure, as opposed to the constituent. Furthermore, the verbal elements in brackets are syntactic units consistent with the understanding of predicates in the tradition of predicate calculus.

16.4 See also

• Auxiliary verb

• Constituent

• Dependency grammar

• Finite verb

• Non-configurational language

• Non-finite verb

• Phrase

• Phrase structure grammar

• Predicate (grammar)

16.5 Notes

[1] Concerning Tesnière’s rejection of a finite VP constituent, see Tesnière (1959:103–105).

[2] For a discussion of the evidence for and against a finite VP constituent, see Matthews (2007:17ff.), Miller (2011:54ff.), and Osborne et al. (2011:323f.).

[3] Attempts to motivate the existence of a finite VP constituent tend to confuse the distinction between finite and non-finite VPs. They mistakenly take evidence for a non-finite VP constituent as support for the existence of a finite VP constituent. See for instance Akmajian and Heny (1980:29f., 257ff.), Finch (2000:112), van Valin (2001:111ff.), Kroeger (2004:32ff.), Sobin (2011:30ff.).

[4] Klammer and Schulz (1996:157ff.), for instance, pursue this narrow understanding of verb phrases.

16.6 References

• Akmajian, A. and F. Heny. 1980. An introduction to the principles of transformational syntax. Cambridge, MA: The MIT Press.

• Finch, G. 2000. Linguistic terms and concepts. New York: St. Martin’s Press.
• Klammer, T. and M. Schulz. 1996. Analyzing English grammar. Boston: Allyn and Bacon.

• Kroeger, P. 2004. Analyzing syntax: A lexical-functional approach. Cambridge, UK: Cambridge University Press.

• Matthews, P. 2007. Syntactic relations: A critical survey. Cambridge, UK: Cambridge University Press.
• Miller, J. 2011. A critical introduction to syntax. London: Continuum.

• Osborne, T., M. Putnam, and T. Groß 2011. Bare phrase structure, label-less structures, and specifier-less syntax: Is Minimalism becoming a dependency grammar? The Linguistic Review 28: 315–364.

• Sobin, N. 2011. Syntactic analysis: The basics. Malden, MA: Wiley–Blackwell.
• Tesnière, Lucien. 1959. Éléments de syntaxe structurale. Paris: Klincksieck.

• van Valin, R. 2001. An introduction to syntax. Cambridge, UK: Cambridge University Press.

Chapter 17

Information retrieval

Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing. Automated information retrieval systems are used to reduce what has been called "information overload". Many universities and public libraries use IR systems to provide access to books, journals and other documents. Web search engines are the most visible IR applications.

17.1 Overview

An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy. An object is an entity that is represented by information in a content collection or database. User queries are matched against the database information. However, as opposed to classical SQL queries of a database, in information retrieval the results returned may or may not match the query, so results are typically ranked. This ranking of results is a key difference of information retrieval searching compared to database searching.[1] Depending on the application the data objects may be, for example, text documents, images,[2] audio,[3] mind maps[4] or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata. Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.[5]

17.2 History

The idea of using computers to search for relevant pieces of information was popularized in the article As We May Think by Vannevar Bush in 1945.[6] It would appear that Bush was inspired by patents for a 'statistical machine' - filed by Emanuel Goldberg in the 1920s and '30s - that searched for documents stored on film.[7] The first description of a computer searching for information was given by Holmstrom in 1948,[8] which contains an early mention of the Univac computer. Automated information retrieval systems were introduced in the 1950s; one even featured in the 1957 romantic comedy Desk Set. In the 1960s, the first large information retrieval research group was formed by Gerard Salton at Cornell. By the 1970s several different retrieval techniques had been shown to perform well on small text corpora such as the Cranfield collection (several thousand documents).[6] Large-scale retrieval systems, such as the Lockheed Dialog system, came into use early in the 1970s.
In 1992, the US Department of Defense, along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. The aim of this was to support research within the information retrieval community by supplying the infrastructure that was needed for evaluation of text retrieval methodologies on a very large text collection. This catalyzed research on methods that scale to huge corpora. The introduction of web search engines has boosted the need for very large scale retrieval systems even further.

17.3 Model types

Categorization of IR-models (translated from German entry, original source Dominik Kuropka).

For effectively retrieving relevant documents by IR strategies, the documents are typically transformed into a suitable representation. Each retrieval strategy incorporates a specific model for its document representation purposes. The figure captioned above illustrates the relationship of some common models, categorized according to two dimensions: the mathematical basis and the properties of the model.

17.3.1 First dimension: mathematical basis

• Set-theoretic models represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets. Common models are:
  • Standard Boolean model
  • Extended Boolean model
  • Fuzzy retrieval
• Algebraic models represent documents and queries usually as vectors, matrices, or tuples. The similarity of the query vector and document vector is represented as a scalar value.
  • Vector space model
  • Generalized vector space model
  • (Enhanced) Topic-based Vector Space Model
  • Extended Boolean model
  • Latent semantic indexing, a.k.a. latent semantic analysis
• Probabilistic models treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query. Probabilistic theorems like Bayes’ theorem are often used in these models.
  • Binary Independence Model
  • Probabilistic relevance model, on which the Okapi (BM25) relevance function is based
  • Uncertain inference
  • Language models
  • Divergence-from-randomness model
  • Latent Dirichlet allocation

• Feature-based retrieval models view documents as vectors of values of feature functions (or just features) and seek the best way to combine these features into a single relevance score, typically by learning to rank methods. Feature functions are arbitrary functions of document and query, and as such can easily incorporate almost any other retrieval model as just another feature.

17.3.2 Second dimension: properties of the model

• Models without term-interdependencies treat different terms/words as independent. This fact is usually represented in vector space models by the orthogonality assumption of term vectors or in probabilistic models by an independency assumption for term variables.

• Models with immanent term interdependencies allow a representation of interdependencies between terms. However, the degree of the interdependency between two terms is defined by the model itself. It is usually directly or indirectly derived (e.g. by dimensional reduction) from the co-occurrence of those terms in the whole set of documents.

• Models with transcendent term interdependencies allow a representation of interdependencies between terms, but they do not allege how the interdependency between two terms is defined. They rely on an external source for the degree of interdependency between two terms (for example, a human assessor or a sophisticated algorithm).

17.4 Performance and correctness measures

Further information: Evaluation measures (information retrieval)

The evaluation of an information retrieval system is the process of assessing how well a system meets the infor- mation needs of its users. Traditional evaluation metrics, designed for Boolean retrieval or top-k retrieval, include precision and recall. Many more measures for evaluating the performance of information retrieval systems have also been proposed. In general, measurement considers a collection of documents to be searched and a search query. All common measures described here assume a ground truth notion of relevancy: every document is known to be either relevant or non-relevant to a particular query. In practice, queries may be ill-posed and there may be different shades of relevancy. Virtually all modern evaluation metrics (e.g., mean average precision, discounted cumulative gain) are designed for ranked retrieval without any explicit rank cutoff, taking into account the relative order of the documents retrieved by the search engines and giving more weight to documents returned at higher ranks. The mathematical symbols used in the formulas below mean:

• X ∩ Y - Intersection - in this case, specifying the documents in both sets X and Y
• |X| - Cardinality - in this case, the number of documents in set X
• ∫ - Integral
• ∑ - Summation
• Δ - Symmetric difference

17.4.1 Precision

Main article: Precision and recall

Precision is the fraction of the documents retrieved that are relevant to the user’s information need.

\[
\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}
\]

In binary classification, precision is analogous to positive predictive value. Precision takes all retrieved documents into account. It can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.
Note that the meaning and usage of “precision” in the field of information retrieval differs from the definition of accuracy and precision within other branches of science and statistics.

17.4.2 Recall

Main article: Precision and recall

Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.

\[
\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}
\]

In binary classification, recall is often called sensitivity. It can thus be looked at as the probability that a relevant document is retrieved by the query.
It is trivial to achieve recall of 100% by returning all documents in response to any query. Therefore, recall alone is not enough: one also needs to measure the number of non-relevant documents retrieved, for example by computing the precision.

17.4.3 Fall-out

The proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:

\[
\text{fall-out} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|}
\]

In binary classification, fall-out is closely related to specificity and is equal to (1 − specificity). It can be looked at as the probability that a non-relevant document is retrieved by the query.
It is trivial to achieve fall-out of 0% by returning zero documents in response to any query.
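As a rough illustration of these three set-based measures, here is a minimal Python sketch (the function and variable names are illustrative, not from any established library; non-empty retrieved and relevant sets are assumed):

```python
def precision(relevant: set, retrieved: set) -> float:
    """Fraction of the retrieved documents that are relevant (retrieved assumed non-empty)."""
    return len(relevant & retrieved) / len(retrieved)

def recall(relevant: set, retrieved: set) -> float:
    """Fraction of the relevant documents that are retrieved (relevant assumed non-empty)."""
    return len(relevant & retrieved) / len(relevant)

def fallout(relevant: set, retrieved: set, collection: set) -> float:
    """Fraction of the non-relevant documents that are retrieved."""
    non_relevant = collection - relevant
    return len(non_relevant & retrieved) / len(non_relevant)

# Toy collection of ten document ids
collection = set(range(10))
relevant = {0, 1, 2, 3}
retrieved = {0, 1, 5, 6}

print(precision(relevant, retrieved))            # 2/4 = 0.5
print(recall(relevant, retrieved))               # 2/4 = 0.5
print(fallout(relevant, retrieved, collection))  # 2/6 ≈ 0.333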

17.4.4 F-score / F-measure

Main article: F-score

The weighted harmonic mean of precision and recall, the traditional F-measure or balanced F-score is:

\[
F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]

This is also known as the F1 measure, because recall and precision are evenly weighted. The general formula for non-negative real β is:

\[
F_\beta = \frac{(1 + \beta^2) \cdot (\text{precision} \cdot \text{recall})}{\beta^2 \cdot \text{precision} + \text{recall}}
\]

Two other commonly used F measures are the F2 measure, which weights recall twice as much as precision, and the F0.5 measure, which weights precision twice as much as recall.

The F-measure was derived by van Rijsbergen (1979) so that Fβ “measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision”. It is based on van Rijsbergen’s effectiveness measure

\[
E = 1 - \frac{1}{\frac{\alpha}{P} + \frac{1 - \alpha}{R}}
\]

Their relationship is

\[
F_\beta = 1 - E \quad \text{where} \quad \alpha = \frac{1}{1 + \beta^2}
\]

The F-measure can be a better single metric than precision or recall alone: precision and recall give different, complementary information, and the F-measure combines them, so an imbalance between the two is reflected in the score.
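A minimal sketch of the general Fβ formula in Python (names are illustrative; the guard for the all-zero case is an added assumption, since the formula is undefined there):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.
    beta > 1 weights recall more heavily; beta < 1 weights precision more."""
    if precision == 0 and recall == 0:
        return 0.0  # assumption: define Fβ as 0 when both inputs are 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.5, 0.5))          # F1 = 0.5
print(f_beta(0.4, 0.8, beta=2))  # F2 weights recall twice as much ≈ 0.667
```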

17.4.5 Average precision

Precision and recall are single-value metrics based on the whole list of documents returned by the system. For systems that return a ranked sequence of documents, it is desirable to also consider the order in which the returned documents are presented. By computing a precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision p(r) as a function of recall r . Average precision computes the average value of p(r) over the interval from r = 0 to r = 1 :[9]

\[
\text{AveP} = \int_0^1 p(r)\,dr
\]

That is the area under the precision-recall curve. This integral is in practice replaced with a finite sum over every position in the ranked sequence of documents:

\[
\text{AveP} = \sum_{k=1}^{n} P(k)\,\Delta r(k)
\]

where k is the rank in the sequence of retrieved documents, n is the number of retrieved documents, P(k) is the precision at cut-off k in the list, and Δr(k) is the change in recall from items k − 1 to k.[9] This finite sum is equivalent to:

\[
\text{AveP} = \frac{\sum_{k=1}^{n} \big(P(k) \times \text{rel}(k)\big)}{\text{number of relevant documents}}
\]

where rel(k) is an indicator function equaling 1 if the item at rank k is a relevant document, and zero otherwise.[10] Note that the average is over all relevant documents, so relevant documents that are not retrieved get a precision score of zero.
Some authors choose to interpolate the p(r) function to reduce the impact of “wiggles” in the curve.[11][12] For example, the PASCAL Visual Object Classes challenge (a benchmark for computer vision object detection) computes average precision by averaging the precision over a set of evenly spaced recall levels {0, 0.1, 0.2, ..., 1.0}:[11][12]

\[
\text{AveP} = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1.0\}} p_{\text{interp}}(r)
\]

where p_interp(r) is an interpolated precision that takes the maximum precision over all recalls greater than or equal to r:

\[
p_{\text{interp}}(r) = \max_{\tilde{r} : \tilde{r} \geq r} p(\tilde{r})
\]

An alternative is to derive an analytical p(r) function by assuming a particular parametric distribution for the underlying decision values. For example, a binormal precision-recall curve can be obtained by assuming decision values in both classes to follow a Gaussian distribution.[13]
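The finite-sum form of average precision translates directly into code. A sketch, assuming relevance judgments are given as a 0/1 list in retrieved order; the optional total count of relevant documents implements the convention that relevant documents never retrieved contribute zero:

```python
def average_precision(ranking, n_relevant=None):
    """ranking: 0/1 relevance judgments in retrieved order.
    n_relevant: total number of relevant documents for the query; relevant
    documents that were never retrieved then contribute a precision of zero."""
    hits, score = 0, 0.0
    for k, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision at this relevant rank
    total = n_relevant if n_relevant is not None else hits
    return score / total if total else 0.0

# Relevant documents retrieved at ranks 1, 3 and 6 (all 3 relevant docs found)
print(average_precision([1, 0, 1, 0, 0, 1]))  # (1/1 + 2/3 + 3/6) / 3 ≈ 0.72
```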

17.4.6 Precision at K

For modern (Web-scale) information retrieval, recall is no longer a meaningful metric, as many queries have thousands of relevant documents, and few users will be interested in reading all of them. Precision at k documents (P@k) is still a useful metric (e.g., P@10 or “Precision at 10” corresponds to the number of relevant results on the first search results page), but fails to take into account the positions of the relevant documents among the top k. Another shortcoming is that on a query with fewer relevant results than k, even a perfect system will have a score less than 1.[14] It is easier to score manually since only the top k results need to be examined to determine if they are relevant or not.

17.4.7 R-Precision

R-precision requires knowing all documents that are relevant to a query. The number of relevant documents, R, is used as the cutoff for calculation, and this varies from query to query. For example, if there are 15 documents relevant to “red” in a corpus (R = 15), R-precision for “red” looks at the top 15 documents returned, counts the number that are relevant (r), and turns that into a relevancy fraction: r/R = r/15.[15] Precision is equal to recall at the R-th position.[14] Empirically, this measure is often highly correlated to mean average precision.[14]
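Both P@k and R-precision reduce to counting relevant items in a prefix of the ranking. A sketch under the same 0/1-list convention as above (function names are illustrative):

```python
def precision_at_k(ranking, k):
    """Fraction of the top-k results that are relevant (ranking: 0/1 list)."""
    return sum(ranking[:k]) / k

def r_precision(ranking, n_relevant):
    """Precision at rank R, where R is the total number of relevant documents."""
    return precision_at_k(ranking, n_relevant)

ranking = [1, 0, 1, 1, 0, 0, 1, 0]
print(precision_at_k(ranking, 5))  # 3/5 = 0.6
print(r_precision(ranking, 4))     # top 4 contain 3 relevant -> 0.75
```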

17.4.8 Mean average precision

Mean average precision for a set of queries is the mean of the average precision scores for each query.

\[
\text{MAP} = \frac{\sum_{q=1}^{Q} \text{AveP}(q)}{Q}
\]

where Q is the number of queries.
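MAP is then just the arithmetic mean of per-query average precision. A small sketch reusing the average_precision helper from above:

```python
def mean_average_precision(rankings, n_relevant):
    """rankings: one 0/1 relevance list per query, in retrieved order;
    n_relevant: total number of relevant documents per query.
    Reuses the average_precision helper sketched earlier."""
    aps = [average_precision(r, n) for r, n in zip(rankings, n_relevant)]
    return sum(aps) / len(aps)

# Two queries: AP = 1.0 and AP = (1/2 + 2/3) / 2 ≈ 0.583 -> MAP ≈ 0.79
print(mean_average_precision([[1, 1, 0], [0, 1, 1]], [2, 2]))
```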

17.4.9 Discounted cumulative gain

Main article: Discounted cumulative gain

DCG uses a graded relevance scale of documents from the result set to evaluate the usefulness, or gain, of a document based on its position in the result list. The premise of DCG is that highly relevant documents appearing lower in a search result list should be penalized as the graded relevance value is reduced logarithmically proportional to the position of the result. The DCG accumulated at a particular rank position p is defined as:

\[
\text{DCG}_p = \text{rel}_1 + \sum_{i=2}^{p} \frac{\text{rel}_i}{\log_2 i}.
\]

Since result sets may vary in size among different queries or systems, to compare performances the normalised version of DCG uses an ideal DCG. To this end, it sorts the documents of a result list by relevance, producing an ideal DCG at position p (IDCG_p), which normalizes the score:

\[
\text{nDCG}_p = \frac{\text{DCG}_p}{\text{IDCG}_p}.
\]

The nDCG values for all queries can be averaged to obtain a measure of the average performance of a ranking algorithm. Note that for a perfect ranking algorithm, DCG_p will be the same as IDCG_p, producing an nDCG of 1.0. All nDCG calculations are then relative values on the interval 0.0 to 1.0 and so are cross-query comparable.
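A sketch of DCG and nDCG as defined above, assuming graded relevance values in ranked order and at least one non-zero grade (otherwise the normalization would divide by zero):

```python
import math

def dcg(rels):
    """DCG_p = rel_1 + sum_{i=2..p} rel_i / log2(i), as defined above."""
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

def ndcg(rels):
    """Normalize by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = sorted(rels, reverse=True)
    return dcg(rels) / dcg(ideal)  # assumes at least one non-zero grade

# Graded relevance of six results in ranked order
print(ndcg([3, 2, 3, 0, 1, 2]))  # ≈ 0.93
```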

17.4.10 Other measures

• Mean reciprocal rank
• Spearman’s rank correlation coefficient
• bpref - a summation-based measure of how many relevant documents are ranked before irrelevant documents[15]
• GMAP - geometric mean of (per-topic) average precision[15]
• Measures based on marginal relevance and document diversity - see Relevance (information retrieval) § Problems and alternatives

17.4.11 Visualization

Visualizations of information retrieval performance include:

• Graphs which chart precision on one axis and recall on the other[15]
• Histograms of average precision over various topics[15]
• Receiver operating characteristic (ROC curve)
• Confusion matrix

17.5 Timeline

• Before the 1900s
  1801: Joseph Marie Jacquard invents the Jacquard loom, the first machine to use punched cards to control a sequence of operations.
  1880s: Herman Hollerith invents an electro-mechanical data tabulator using punch cards as a machine-readable medium.
  1890: Hollerith cards, keypunches and tabulators used to process the 1890 US Census data.
• 1920s-1930s
  Emanuel Goldberg submits patents for his “Statistical Machine”, a document search engine that used photoelectric cells and pattern recognition to search the metadata on rolls of microfilmed documents.
• 1940s–1950s
  late 1940s: The US military confronted problems of indexing and retrieval of wartime scientific research documents captured from the Germans.
  1945: Vannevar Bush's As We May Think appeared in Atlantic Monthly.
  1947: Hans Peter Luhn (research engineer at IBM since 1941) began work on a mechanized punch card-based system for searching chemical compounds.
  1950s: Growing concern in the US for a “science gap” with the USSR motivated, encouraged funding and provided a backdrop for mechanized literature searching systems (Allen Kent et al.) and the invention of citation indexing (Eugene Garfield).
  1950: The term “information retrieval” was coined by Calvin Mooers.[19]
  1951: Philip Bagley conducted the earliest experiment in computerized document retrieval in a master thesis at MIT.[20]
  1955: Allen Kent joined Case Western Reserve University, and eventually became associate director of the Center for Documentation and Communications Research. That same year, Kent and colleagues published a paper in American Documentation describing the precision and recall measures as well as detailing a proposed “framework” for evaluating an IR system which included statistical sampling methods for determining the number of relevant documents not retrieved.[21]

  1958: International Conference on Scientific Information, Washington DC, included consideration of IR systems as a solution to problems identified. See: Proceedings of the International Conference on Scientific Information, 1958 (National Academy of Sciences, Washington, DC, 1959)
  1959: Hans Peter Luhn published “Auto-encoding of documents for information retrieval.”

• 1960s:

  early 1960s: Gerard Salton began work on IR at Harvard, later moved to Cornell.
  1960: Melvin Earl Maron and John Lary Kuhns[22] published “On relevance, probabilistic indexing, and information retrieval” in the Journal of the ACM 7(3):216–244, July 1960.
  1962:
    • Cyril W. Cleverdon published early findings of the Cranfield studies, developing a model for IR system evaluation. See: Cyril W. Cleverdon, “Report on the Testing and Analysis of an Investigation into the Comparative Efficiency of Indexing Systems”. Cranfield Collection of Aeronautics, Cranfield, England, 1962.
    • Kent published Information Analysis and Retrieval.
  1963:
    • Weinberg report “Science, Government and Information” gave a full articulation of the idea of a “crisis of scientific information.” The report was named after Dr. Alvin Weinberg.
    • Joseph Becker and Robert M. Hayes published a text on information retrieval: Becker, Joseph; Hayes, Robert Mayo. Information storage and retrieval: tools, elements, theories. New York, Wiley (1963).
  1964:
    • Karen Spärck Jones finished her thesis at Cambridge, Synonymy and Semantic Classification, and continued work on computational linguistics as it applies to IR.
    • The National Bureau of Standards sponsored a symposium titled “Statistical Association Methods for Mechanized Documentation.” Several highly significant papers, including G. Salton’s first published reference (we believe) to the SMART system.
  mid-1960s:
    • National Library of Medicine developed MEDLARS Medical Literature Analysis and Retrieval System, the first major machine-readable database and batch-retrieval system.
    • Project Intrex at MIT.
  1965: J. C. R. Licklider published Libraries of the Future.
  1966: Don Swanson was involved in studies at University of Chicago on Requirements for Future Catalogs.
  late 1960s: F. Wilfrid Lancaster completed evaluation studies of the MEDLARS system and published the first edition of his text on information retrieval.
  1968:
    • Gerard Salton published Automatic Information Organization and Retrieval.
    • John W. Sammon, Jr.'s RADC Tech report “Some Mathematics of Information Storage and Retrieval...” outlined the vector model.
  1969: Sammon’s “A nonlinear mapping for data structure analysis” (IEEE Transactions on Computers) was the first proposal for a visualization interface to an IR system.

• 1970s

  early 1970s:
    • First online systems—NLM’s AIM-TWX, MEDLINE; Lockheed’s Dialog; SDC’s ORBIT.
    • Theodor Nelson, promoting the concept of hypertext, published Computer Lib/Dream Machines.

  1971: Nicholas Jardine and Cornelis J. van Rijsbergen published “The use of hierarchic clustering in information retrieval”, which articulated the “cluster hypothesis.”[23]
  1975: Three highly influential publications by Salton fully articulated his vector processing framework and term discrimination model:
    • A Theory of Indexing (Society for Industrial and Applied Mathematics)
    • A Theory of Term Importance in Automatic Text Analysis (JASIS v. 26)
    • A Vector Space Model for Automatic Indexing (CACM 18:11)
  1978: The First ACM SIGIR conference.
  1979: C. J. van Rijsbergen published Information Retrieval (Butterworths). Heavy emphasis on probabilistic models.
  1979: Tamas Doszkocs implemented the CITE natural language user interface for MEDLINE at the National Library of Medicine. The CITE system supported free form query input, ranked output and relevance feedback.[24]
• 1980s
  1980: First international ACM SIGIR conference, joint with British Computer Society IR group in Cambridge.
  1982: Nicholas J. Belkin, Robert N. Oddy, and Helen M. Brooks proposed the ASK (Anomalous State of Knowledge) viewpoint for information retrieval. This was an important concept, though their automated analysis tool proved ultimately disappointing.
  1983: Salton (and Michael J. McGill) published Introduction to Modern Information Retrieval (McGraw-Hill), with heavy emphasis on vector models.
  1985: David Blair and Bill Maron publish An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System.
  mid-1980s: Efforts to develop end-user versions of commercial IR systems.
  1985–1993: Key papers on and experimental systems for visualization interfaces. Work by Donald B. Crouch, Robert R. Korfhage, Matthew Chalmers, Anselm Spoerri and others.
  1989: First World Wide Web proposals by Tim Berners-Lee at CERN.
• 1990s
  1992: First TREC conference.
  1997: Publication of Korfhage's Information Storage and Retrieval[25] with emphasis on visualization and multi-reference point systems.
  late 1990s: Web search engines implement many features formerly found only in experimental IR systems. Search engines become the most common and maybe best instantiation of IR models.

17.6 Awards in the field

• Tony Kent Strix award
• Gerard Salton Award

17.7 Leading IR Research Groups

• Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts Amherst[26]
• Information Retrieval Group at the University of Glasgow[27]
• Information and Language Processing Systems (ILPS) at the University of Amsterdam[28]
• Language Technologies Institute (LTI) at Carnegie Mellon University
• Text Information Management and Analysis Group (TIMAN) at the University of Illinois at Urbana-Champaign

17.8 See also

• Adversarial information retrieval

• Collaborative information seeking

• Controlled vocabulary

• Cross-language information retrieval

• Data mining

• European Summer School in Information Retrieval

• Human–computer information retrieval (HCIR)

• Information extraction

• Information Retrieval Facility

• Knowledge visualization

• Multimedia information retrieval

• Personal information management

• Relevance (Information Retrieval)

• Relevance feedback

• Rocchio Classification

• Search index

• Social information seeking

• Special Interest Group on Information Retrieval

• Subject indexing

• Temporal information retrieval

• tf-idf

• XML-Retrieval

17.9 References

[1] Jansen, B. J. and Rieh, S. (2010) The Seventeen Theoretical Constructs of Information Searching and Information Retrieval. Journal of the American Society for Information Sciences and Technology. 61(8), 1517-1534.

[2] Goodrum, Abby A. (2000). “Image Information Retrieval: An Overview of Current Research”. Informing Science. 3 (2).

[3] Foote, (1999). “An overview of audio information retrieval”. Multimedia Systems. Springer.

[4] Beel, Jöran; Gipp, Bela; Stiller, Jan-Olaf (2009). Information Retrieval On Mind Maps - What Could It Be Good For?. Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09). Washington, DC: IEEE.

[5] Frakes, William B. (1992). Information Retrieval Data Structures & Algorithms. Prentice-Hall, Inc. ISBN 0-13-463837-9.

[6] Singhal, Amit (2001). “Modern Information Retrieval: A Brief Overview” (PDF). Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. 24 (4): 35–43.

[7] Mark Sanderson & W. Bruce Croft (2012). “The History of Information Retrieval Research”. Proceedings of the IEEE. 100: 1444–1451. doi:10.1109/jproc.2012.2189916. 17.10. FURTHER READING 81

[8] JE Holmstrom (1948). “Section III. Opening Plenary Session”. The Royal Society Scientific Information Conference, 21 June–2 July 1948: report and papers submitted: 85.

[9] Zhu, Mu (2004). “Recall, Precision and Average Precision” (PDF).

[10] Turpin, Andrew; Scholer, Falk (2006). “User performance versus precision measures for simple search tasks”. Proceedings of the 29th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Seattle, WA, August 06–11, 2006). New York, NY: ACM: 11–18. doi:10.1145/1148170.1148176. ISBN 1-59593-369-7.

[11] Everingham, Mark; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew (June 2010). “The PASCAL Visual Object Classes (VOC) Challenge” (PDF). International Journal of Computer Vision. Springer. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4. Retrieved 2011-08-29.

[12] Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich (2008). Introduction to Information Retrieval. Cambridge University Press.

[13] K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010). The binormal assumption on precision-recall curves. Proceedings of the 20th International Conference on Pattern Recognition, 4263-4266.

[14] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2009). “Chapter 8: Evaluation in information re- trieval” (PDF). Retrieved 2015-06-14. Part of Introduction to Information Retrieval

[15] http://trec.nist.gov/pubs/trec15/appendices/CE.MEASURES06.pdf

[16] Fawcett, Tom (2006). “An Introduction to ROC Analysis” (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010.

[17] Powers, David M W (2011). “Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation” (PDF). Journal of Machine Learning Technologies. 2 (1): 37–63.

[18] Ting, Kai Ming (2011). Encyclopedia of machine learning. Springer. ISBN 978-0-387-30164-8.

[19] Mooers, Calvin N.; The Theory of Digital Handling of Non-numerical Information and its Implications to Machine Economics (Zator Technical Bulletin No. 48), cited in Fairthorne, R. A. (1958). “Automatic Retrieval of Recorded Information”. The Computer Journal. 1 (1): 37. doi:10.1093/comjnl/1.1.36.

[20] Doyle, Lauren; Becker, Joseph (1975). Information Retrieval and Processing. Melville. pp. 410 pp. ISBN 0-471-22151-1.

[21] “Machine literature searching X. Machine language; factors underlying its design and development”. doi:10.1002/asi.5090060411.

[22] Maron, Melvin E. (2008). “An Historical Note on the Origins of Probabilistic Indexing” (PDF). Information Processing and Management. 44 (2): 971–972. doi:10.1016/j.ipm.2007.02.012.

[23] N. Jardine, C.J. van Rijsbergen (December 1971). “The use of hierarchic clustering in information retrieval”. Information Storage and Retrieval. 7 (5): 217–240. doi:10.1016/0020-0271(71)90051-9.

[24] Doszkocs, T.E. & Rapp, B.A. (1979). “Searching MEDLINE in English: a Prototype User Interface with Natural Language Query, Ranked Output, and Relevance Feedback,” In: Proceedings of the ASIS Annual Meeting, 16: 131-139.

[25] Korfhage, Robert R. (1997). Information Storage and Retrieval. Wiley. pp. 368 pp. ISBN 978-0-471-14338-3.

[26] “Center for Intelligent Information Retrieval | UMass Amherst”. ciir.cs.umass.edu. Retrieved 2016-07-29.

[27] “University of Glasgow - Schools - School of Computing Science - Research - Research overview - Information Retrieval”. www.gla.ac.uk. Retrieved 2016-07-29.

[28] “ILPS - information and language processing systems”. ILPS. Retrieved 2016-07-29.

17.10 Further reading

• Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

• Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press, Cambridge, Mass., 2010.

17.11 External links

• ACM SIGIR: Information Retrieval Special Interest Group

• BCS IRSG: British Computer Society - Information Retrieval Specialist Group
• Text Retrieval Conference (TREC)

• Forum for Information Retrieval Evaluation (FIRE)

• Information Retrieval (online book) by C. J. van Rijsbergen
• Information Retrieval Wiki

• Information Retrieval Facility
• Information Retrieval @ DUTH

• TREC report on information retrieval evaluation techniques
• How eBay measures search relevance

• Information retrieval performance evaluation tool @ Athena Research Centre

Chapter 18

Vector space model

Vector space model or term vector model is an algebraic model for representing text documents (and any objects, in general) as vectors of identifiers, such as, for example, index terms. It is used in information filtering, information retrieval, indexing and relevancy rankings. Its first use was in the SMART Information Retrieval System.

18.1 Definitions

Documents and queries are represented as vectors.

\[
d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})
\]
\[
q = (w_{1,q}, w_{2,q}, \ldots, w_{n,q})
\]

Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best known schemes is tf-idf weighting (see the example below).
The definition of term depends on the application. Typically terms are single words, keywords, or longer phrases. If words are chosen to be the terms, the dimensionality of the vector is the number of words in the vocabulary (the number of distinct words occurring in the corpus).
Vector operations can be used to compare documents with queries.

18.2 Applications

Relevance rankings of documents in a keyword search can be calculated, using the assumptions of document sim- ilarities theory, by comparing the deviation of angles between each document vector and the original query vector where the query is represented as the same kind of vector as the documents. In practice, it is easier to calculate the cosine of the angle between the vectors, instead of the angle itself:

\[
\cos\theta = \frac{d_2 \cdot q}{\|d_2\|\,\|q\|}
\]

where $d_2 \cdot q$ is the intersection (i.e. the dot product) of the document vector $d_2$ and the query vector $q$, $\|d_2\|$ is the norm of vector $d_2$, and $\|q\|$ is the norm of vector $q$. The norm of a vector is calculated as:

\[
\|q\| = \sqrt{\sum_{i=1}^{n} q_i^2}
\]

83 84 CHAPTER 18. VECTOR SPACE MODEL

As all vectors under consideration by this model are elementwise nonnegative, a cosine value of zero means that the query and document vector are orthogonal and have no match (i.e. the query term does not exist in the document being considered). See cosine similarity for further information.
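A minimal sketch of this cosine computation over plain weight vectors (pure Python, no libraries; names are illustrative):

```python
import math

def cosine_similarity(d, q):
    """Cosine of the angle between two term-weight vectors (lists of floats)."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm_d = math.sqrt(sum(di * di for di in d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_d * norm_q)

doc = [0.0, 1.2, 0.0, 3.4]
query = [0.0, 1.0, 0.0, 1.0]
print(cosine_similarity(doc, query))  # ≈ 0.90
```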

18.3 Example: tf-idf weights

In the classic vector space model proposed by Salton, Wong and Yang[1] the term-specific weights in the document vectors are products of local and global parameters. The model is known as the term frequency–inverse document frequency model. The weight vector for document d is $v_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$, where

\[
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{|D|}{|\{d' \in D \mid t \in d'\}|}
\]

and

• $\mathrm{tf}_{t,d}$ is the term frequency of term t in document d (a local parameter)
• $\log \frac{|D|}{|\{d' \in D \mid t \in d'\}|}$ is the inverse document frequency (a global parameter); $|D|$ is the total number of documents in the document set, and $|\{d' \in D \mid t \in d'\}|$ is the number of documents containing the term t.

Using the cosine, the similarity between document $d_j$ and query $q$ can be calculated as:

\[
\mathrm{sim}(d_j, q) = \frac{d_j \cdot q}{\|d_j\|\,\|q\|} = \frac{\sum_{i=1}^{N} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2}\, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}
\]
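Putting the pieces together, the following sketch builds tf–idf weight vectors for a toy corpus (as sparse dicts rather than dense lists) and ranks documents against a query by cosine similarity. It follows the w = tf · log(|D|/df) scheme above; the helper names and the toy corpus are illustrative:

```python
import math
from collections import Counter

def idf(docs):
    """Global parameter: log(|D| / df(t)) per term, from the whole corpus."""
    N = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(N / df[t]) for t in df}

def weight(doc, idf_):
    """Local * global: tf(t,d) * idf(t) for each term in the document."""
    tf = Counter(doc)
    return {t: tf[t] * idf_.get(t, 0.0) for t in tf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["the", "brown", "cow"],
        ["the", "brown", "fox"],
        ["the", "lazy", "dog"]]
idf_ = idf(docs)
doc_vecs = [weight(d, idf_) for d in docs]
q_vec = weight(["brown", "cow"], idf_)
# Rank document indices by cosine similarity to the query
print(sorted(range(len(docs)), key=lambda j: cosine(doc_vecs[j], q_vec), reverse=True))
```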

18.4 Advantages

The vector space model has the following advantages over the Standard Boolean model:

1. Simple model based on linear algebra
2. Term weights not binary
3. Allows computing a continuous degree of similarity between queries and documents
4. Allows ranking documents according to their possible relevance
5. Allows partial matching

Most of these advantages are a consequence of the difference in the density of the document collection representation between Boolean and tf-idf approaches. When using Boolean weights, any document lies in a vertex of an n-dimensional hypercube. Therefore, the number of possible document representations is $2^n$ and the maximum Euclidean distance between pairs is $\sqrt{n}$. As documents are added to the document collection, the region defined by the hypercube's vertices becomes more populated and hence denser. Unlike Boolean, when a document is added using tf-idf weights, the idfs of the terms in the new document decrease while those of the remaining terms increase. On average, as documents are added, the region where documents lie expands, regulating the density of the entire collection representation. This behavior models the original motivation of Salton and his colleagues that a document collection represented in a low density region could yield better retrieval results.

18.5 Limitations

The vector space model has the following limitations:

1. Long documents are poorly represented because they have poor similarity values (a small scalar product and a large dimensionality)
2. Search keywords must precisely match document terms; word substrings might result in a "false positive match"
3. Semantic sensitivity: documents with similar context but different term vocabulary won't be associated, resulting in a "false negative match"
4. The order in which the terms appear in the document is lost in the vector space representation
5. The model theoretically assumes terms are statistically independent
6. Weighting is intuitive but not very formal

Many of these difficulties can, however, be overcome by the integration of various tools, including mathematical techniques such as singular value decomposition and lexical databases such as WordNet.

18.6 Models based on and extending the vector space model

Models based on and extending the vector space model include:

• Generalized vector space model

• Latent semantic analysis
• Term Discrimination
• Rocchio Classification
• Random Indexing

18.7 Software that implements the vector space model

The following software packages may be of interest to those wishing to experiment with vector models and implement search services based upon them.

18.7.1 Free open source software

• Apache Lucene. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
• Gensim. Gensim is a Python+NumPy framework for Vector Space modelling. It contains incremental (memory-efficient) algorithms for tf–idf, Latent Semantic Indexing, Random Projections and Latent Dirichlet Allocation.
• Weka. Weka is a popular data mining package for Java including WordVectors and Bag Of Words models.

18.8 Further reading

• G. Salton, A. Wong, and C. S. Yang (1975), “A Vector Space Model for Automatic Indexing,” Communications of the ACM, vol. 18, nr. 11, pages 613–620. (Article in which a vector space model was presented)
• David Dubin (2004), The Most Influential Paper Gerard Salton Never Wrote (Explains the history of the Vector Space Model and the non-existence of a frequently cited publication)
• Description of the vector space model
• Description of the classic vector space model by Dr E. Garcia
• Relationship of vector space search to the “k-Nearest Neighbor” search

18.9 See also

• Bag-of-words model
• Compound term processing
• Conceptual space
• Eigenvalues and eigenvectors
• Inverted index
• Nearest neighbor search
• Sparse distributed memory
• w-shingling

18.10 References

[1] G. Salton, A. Wong, C. S. Yang, A vector space model for automatic indexing, Communications of the ACM, v.18 n.11, p.613-620, Nov. 1975

Chapter 19

tf–idf

In information retrieval, tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.[1] It is often used as a weighting factor in information retrieval, text mining, and user modeling. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general. Nowadays, tf-idf is one of the most popular term-weighting schemes; for instance, 83% of text-based recommender systems in the domain of digital libraries use tf-idf.[2]
Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document’s relevance given a user query. tf–idf can be successfully used for stop-words filtering in various subject fields, including text summarization and classification.
One of the simplest ranking functions is computed by summing the tf–idf for each query term; many more sophisticated ranking functions are variants of this simple model.

19.1 Motivation

19.1.1 Term frequency

Suppose we have a set of English text documents and wish to determine which document is most relevant to the query “the brown cow”. A simple way to start out is by eliminating documents that do not contain all three words “the”, “brown”, and “cow”, but this still leaves many documents. To further distinguish them, we might count the number of times each term occurs in each document and sum them all together; the number of times a term occurs in a document is called its term frequency. The first form of term weighting is due to Hans Peter Luhn (1957) and is based on the Luhn Assumption:

• The weight of a term that occurs in a document is simply proportional to the term frequency. [3]

19.1.2 Inverse document frequency

Because the term “the” is so common, term frequency will tend to incorrectly emphasize documents which happen to use the word “the” more frequently, without giving enough weight to the more meaningful terms “brown” and “cow”. The term “the” is not a good keyword to distinguish relevant and non-relevant documents and terms, unlike the less common words “brown” and “cow”. Hence an inverse document frequency factor is incorporated which diminishes the weight of terms that occur very frequently in the document set and increases the weight of terms that occur rarely.
Karen Spärck Jones (1972) conceived a statistical interpretation of term specificity called Inverse Document Frequency (IDF), which became a cornerstone of term weighting:

• The specificity of a term can be quantified as an inverse function of the number of documents in which it occurs. [4]

87 88 CHAPTER 19. TF–IDF

19.2 Definition

tf–idf is the product of two statistics, term frequency and inverse document frequency. Various ways for determining the exact values of both statistics exist.

19.2.1 Term frequency

In the case of the term frequency tf(t,d), the simplest choice is to use the raw frequency of a term in a document, i.e. the number of times that term t occurs in document d. If we denote the raw frequency of t by $f_{t,d}$, then the simple tf scheme is tf(t,d) = $f_{t,d}$. Other possibilities, sketched in code after the list, include:[5]:128

• Boolean “frequencies": tf(t,d) = 1 if t occurs in d and 0 otherwise;

• logarithmically scaled frequency: tf(t,d) = 1 + log ft,d, or zero if ft,d is zero;

• augmented frequency, to prevent a bias towards longer documents, e.g. raw frequency divided by the maximum raw frequency of any term in the document:

\[
\mathrm{tf}(t,d) = 0.5 + 0.5 \cdot \frac{f_{t,d}}{\max\{f_{t',d} : t' \in d\}}
\]
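As a rough sketch of these weighting variants (the natural logarithm is an assumption; the text does not fix a base, and the function names are illustrative):

```python
import math
from collections import Counter

def tf_raw(count):
    """Raw frequency: tf(t,d) = f_{t,d}."""
    return count

def tf_boolean(count):
    """Boolean 'frequency': 1 if the term occurs in the document, 0 otherwise."""
    return 1 if count > 0 else 0

def tf_log(count):
    """Logarithmically scaled frequency: 1 + log f_{t,d}, or 0 if absent."""
    return 1 + math.log(count) if count > 0 else 0

def tf_augmented(count, counts):
    """Raw frequency divided by the maximum raw frequency in the document."""
    return 0.5 + 0.5 * count / max(counts.values())

counts = Counter("this is a a sample".split())
print(tf_raw(counts["a"]))                   # 2
print(tf_log(counts["a"]))                   # 1 + ln 2 ≈ 1.69
print(tf_augmented(counts["a"], counts))     # 0.5 + 0.5*2/2 = 1.0
print(tf_augmented(counts["this"], counts))  # 0.5 + 0.5*1/2 = 0.75
```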

19.2.2 Inverse document frequency

The inverse document frequency is a measure of how much information the word provides, that is, whether the term is common or rare across all documents. It is the logarithmically scaled inverse fraction of the documents that contain the word, obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient.

\[
\mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}
\]

with

• N : total number of documents in the corpus N = |D|

• |{d ∈ D : t ∈ d}| : number of documents where the term t appears (i.e., tf(t, d) ≠ 0 ). If the term is not in the corpus, this will lead to a division-by-zero. It is therefore common to adjust the denominator to 1 + |{d ∈ D : t ∈ d}| .

19.2.3 Term frequency–Inverse document frequency

Then tf–idf is calculated as

tfidf(t, d, D) = tf(t, d) · idf(t, D)

A high weight in tf–idf is reached by a high term frequency (in the given document) and a low document frequency of the term in the whole collection of documents; the weights hence tend to filter out common terms. Since the ratio inside the idf’s log function is always greater than or equal to 1, the value of idf (and tf-idf) is greater than or equal to 0. As a term appears in more documents, the ratio inside the logarithm approaches 1, bringing the idf and tf-idf closer to 0. 19.3. JUSTIFICATION OF IDF 89

19.3 Justification of idf

Idf was introduced, as “term specificity”, by Karen Spärck Jones in a 1972 paper. Although it has worked well as a heuristic, its theoretical foundations have been troublesome for at least three decades afterward, with many researchers trying to find information theoretic justifications for it.[6] Spärck Jones’s own explanation did not propose much theory, aside from a connection to Zipf’s law.[6] Attempts have been made to put idf on a probabilistic footing,[7] by estimating the probability that a given document d contains a term t as the relative document frequency,

\[
P(t|d) = \frac{|\{d \in D : t \in d\}|}{N},
\]

so that we can define idf as

\[
\mathrm{idf} = -\log P(t|d) = \log \frac{1}{P(t|d)} = \log \frac{N}{|\{d \in D : t \in d\}|}
\]

Namely, the inverse document frequency is the logarithm of the “inverse” relative document frequency.
This probabilistic interpretation in turn takes the same form as that of self-information. However, applying such information-theoretic notions to problems in information retrieval leads to problems when trying to define the appropriate event spaces for the required probability distributions: not only documents need to be taken into account, but also queries and terms.[6]

19.4 Example of tf–idf

Suppose that we have the term counts of a corpus consisting of only two documents: document 1 contains 5 terms, with “this” occurring once; document 2 contains 7 terms, with “this” occurring once and “example” occurring three times. The calculation of tf–idf for the term “this” is performed as follows: In its raw frequency form, tf is just the frequency of “this” for each document. In each document, the word “this” appears once; but as document 2 has more words, its relative frequency is smaller.

\[
\mathrm{tf}(\text{“this”}, d_1) = \frac{1}{5} = 0.2
\]
\[
\mathrm{tf}(\text{“this”}, d_2) = \frac{1}{7} \approx 0.14
\]

An idf is constant per corpus, and accounts for the ratio of documents that include the word “this”. In this case, we have a corpus of two documents and all of them include the word “this”.

\[
\mathrm{idf}(\text{“this”}, D) = \log\left(\frac{2}{2}\right) = 0
\]

So tf–idf is zero for the word “this”, which implies that the word is not very informative as it appears in all documents.

\[
\mathrm{tfidf}(\text{“this”}, d_1) = 0.2 \times 0 = 0
\]
\[
\mathrm{tfidf}(\text{“this”}, d_2) = 0.14 \times 0 = 0
\]

A slightly more interesting example arises from the word “example”, which occurs three times but only in the second document:

\[
\mathrm{tf}(\text{“example”}, d_1) = \frac{0}{5} = 0
\]
\[
\mathrm{tf}(\text{“example”}, d_2) = \frac{3}{7} \approx 0.429
\]
\[
\mathrm{idf}(\text{“example”}, D) = \log\left(\frac{2}{1}\right) = 0.301
\]

Finally,

\[
\mathrm{tfidf}(\text{“example”}, d_1) = \mathrm{tf}(\text{“example”}, d_1) \times \mathrm{idf}(\text{“example”}, D) = 0 \times 0.301 = 0
\]
\[
\mathrm{tfidf}(\text{“example”}, d_2) = \mathrm{tf}(\text{“example”}, d_2) \times \mathrm{idf}(\text{“example”}, D) = 0.429 \times 0.301 \approx 0.13
\]

(using the base 10 logarithm).
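The whole worked example can be reproduced in a few lines. The original term-count tables are not shown in this text, so the two toy documents below are reconstructions consistent with the counts used above (5 terms with “this” once in d1; 7 terms with “this” once and “example” three times in d2):

```python
import math

# Reconstructed toy documents matching the counts in the worked example
d1 = ["this", "is", "a", "a", "sample"]
d2 = ["this", "is", "another", "another", "example", "example", "example"]
corpus = [d1, d2]

def tf(term, doc):
    return doc.count(term) / len(doc)  # relative frequency, as in the example

def idf(term, corpus):
    # Note: a term absent from every document would divide by zero here;
    # the text suggests adjusting the denominator to 1 + df in that case.
    n_containing = sum(term in doc for doc in corpus)
    return math.log10(len(corpus) / n_containing)  # base-10 log, as in the text

def tfidf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

print(tfidf("this", d1, corpus))     # 0.2   * 0     = 0
print(tfidf("this", d2, corpus))     # 0.14  * 0     = 0
print(tfidf("example", d2, corpus))  # 0.429 * 0.301 ≈ 0.13
```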

19.5 tf-idf Beyond Terms

The idea behind tf–idf has also been applied to entities other than terms. In 1998, the concept of idf was applied to citations.[8] The authors argued that “if a very uncommon citation is shared by two documents, this should be weighted more highly than a citation made by a large number of documents”. In addition, tf–idf was applied to “visual words” for the purpose of object matching in videos,[9] and to entire sentences.[10] However, the concept of tf–idf did not prove to be more effective in all cases than a plain tf scheme (without idf). When tf–idf was applied to citations, researchers could find no improvement over a simple citation-count weight that had no idf component.[11]

19.6 tf–idf derivatives

A number of term-weighting schemes have been derived from tf–idf. One of them is TF–PDF (term frequency * proportional document frequency).[12] TF–PDF was introduced in 2001 in the context of identifying emerging topics in the media. The PDF component measures the difference of how often a term occurs in different domains. Another derivative is TF–IDuF.[13] In TF–IDuF, idf is not calculated based on the document corpus that is to be searched or recommended; instead, idf is calculated based on users’ personal document collections. The authors report that TF–IDuF was as effective as tf–idf but could also be applied in situations when, e.g., a user-modeling system has no access to a global document corpus.
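Since the text describes TF–IDuF only at a high level, the following Python sketch should be read as an illustration of the stated idea - computing idf over a user's personal document collection instead of the global corpus - rather than as the authors' exact formulation; all names are mine:

import math

def idf_over_user_collection(term, user_docs):
    # Standard idf, but computed over the user's personal collection
    # (sets of terms) rather than the global corpus; df must be > 0.
    df = sum(1 for d in user_docs if term in d)
    return math.log10(len(user_docs) / df)

def tf_iduf(term, doc_tokens, user_docs):
    # doc_tokens: list of tokens of the document being scored.
    tf = doc_tokens.count(term) / len(doc_tokens)
    return tf * idf_over_user_collection(term, user_docs)

personal_docs = [{"retrieval", "ranking"}, {"ranking", "evaluation"}]
print(tf_iduf("retrieval", ["ranking", "models", "for", "retrieval"], personal_docs))  # ≈ 0.075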

19.7 See also

• Okapi BM25

• Noun phrase

• Word count

• Vector space model

• PageRank

• Kullback–Leibler divergence

• Mutual information

• Latent semantic analysis

• Latent semantic indexing

• Latent Dirichlet allocation

19.8 References

[1] Rajaraman, A.; Ullman, J. D. (2011). “Data Mining”. Mining of Massive Datasets (PDF). pp. 1–17. doi:10.1017/CBO9781139058452.002. ISBN 9781139058452.

[2] Breitinger, Corinna; Gipp, Bela; Langer, Stefan (2015-07-26). “Research-paper recommender systems: a literature survey”. International Journal on Digital Libraries. 17 (4): 305–338. doi:10.1007/s00799-015-0156-0. ISSN 1432-5012.

[3] Luhn, Hans Peter (1957). “A Statistical Approach to Mechanized Encoding and Searching of Literary Information” (PDF). IBM Journal of Research and Development. IBM. 1 (4): 315. doi:10.1147/rd.14.0309. Retrieved 2 March 2015. There is also the probability that the more frequently a notion and combination of notions occur, the more importance the author attaches to them as reflecting the essence of his overall idea.

[4] Spärck Jones, K. (1972). “A Statistical Interpretation of Term Specificity and Its Application in Retrieval”. Journal of Documentation. 28: 11–21. doi:10.1108/eb026526.

[5] Manning, C. D.; Raghavan, P.; Schutze, H. (2008). “Scoring, term weighting, and the vector space model”. Introduction to Information Retrieval (PDF). p. 100. doi:10.1017/CBO9780511809071.007. ISBN 9780511809071.

[6] Robertson, S. (2004). “Understanding inverse document frequency: On theoretical arguments for IDF”. Journal of Documentation. 60 (5): 503–520. doi:10.1108/00220410410560582.

[7] See also Probability estimates in practice in Introduction to Information Retrieval.

[8] Bollacker, Kurt D.; Lawrence, Steve; Giles, C. Lee (1998-01-01). “CiteSeer: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications”. Proceedings of the Second International Conference on Autonomous Agents. AGENTS '98. New York, NY, USA: ACM: 116–123. doi:10.1145/280765.280786. ISBN 0897919831.

[9] Sivic, Josef; Zisserman, Andrew (2003-01-01). “Video Google: A Text Retrieval Approach to Object Matching in Videos”. Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2. ICCV '03. Washington, DC, USA: IEEE Computer Society: 1470–. ISBN 0769519504.

[10] Seki, Yohei. “Sentence Extraction by tf/idf and Position Weighting from Newspaper Articles” (PDF). National Institute of Informatics. http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings3/NTCIR3-TSC-SekiY.pdf

[11] Beel, Joeran; Breitinger, Corinna (2017). “Evaluating the CC-IDF citation-weighting scheme - How effectively can 'Inverse Document Frequency' (IDF) be applied to references?" (PDF). Proceedings of the 12th iConference.

[12] Khoo Khyou Bun; Bun, Khoo Khyou; Ishizuka, M. “Emerging Topic Tracking System”. Proceedings Third International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems. WECWIS 2001. doi:10.1109/wecwis.2001.933900.

[13] Langer, Stefan; Gipp, Bela (2017). “TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Personal Document Collections” (PDF). iConference.

• Salton, G; McGill, M. J. (1986). Introduction to modern information retrieval. McGraw-Hill. ISBN 978-0070544840.

• Salton, G.; Fox, E. A.; Wu, H. (1983). “Extended Boolean information retrieval”. Communications of the ACM. 26 (11): 1022–1036. doi:10.1145/182.358466.

• Salton, G.; Buckley, C. (1988). “Term-weighting approaches in automatic text retrieval”. Information Processing & Management. 24 (5): 513–523. doi:10.1016/0306-4573(88)90021-0.

• Wu, H. C.; Luk, R. W. P.; Wong, K. F.; Kwok, K. L. (2008). “Interpreting TF-IDF term weights as making relevance decisions”. ACM Transactions on Information Systems. 26 (3): 1. doi:10.1145/1361684.1361686.

19.9 External links and suggested reading

• TFxIDF Repository: A definitive guide to the variants and their evolution.

• Gensim is a Python library for vector space modeling and includes tf–idf weighting.

• Robust Hyperlinking: An application of tf–idf for stable document addressability.

• A demo of using tf–idf with PHP and Euclidean distance for Classification

• Anatomy of a search engine

• tf–idf and related definitions as used in Lucene

• TfidfTransformer in scikit-learn

• Text to Matrix Generator (TMG): a MATLAB toolbox that can be used for various tasks in text mining (TM), specifically (i) indexing, (ii) retrieval, (iii) dimensionality reduction, (iv) clustering, (v) classification. The indexing step offers the user the ability to apply local and global weighting methods, including tf–idf.

• Pyevolve: A tutorial series explaining the tf–idf calculation.

• TF/IDF with Google n-Grams and POS Tags

Chapter 20

Synonym

This article is about the general meaning of “synonym”. For its use in biology, see Synonym (taxonomy).

A synonym is a word or phrase that means exactly or nearly the same as another word or phrase in the same language. Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy. The word comes from Ancient Greek sýn (σύν; “with”) and ónoma (ὄνομα; “name”). Examples of synonyms are the words begin, start, commence, and initiate. Words can be synonymous when meant in certain senses, even if they are not synonymous in all of their senses. For example, if one talks about a long time or an extended time, long and extended are synonymous within that context. Synonyms with exact meaning share a seme or denotational sememe, whereas those with inexactly similar meanings share a broader denotational or connotational sememe and thus overlap within a semantic field. Some academics call the former type cognitive synonyms to distinguish them from the latter type, which they call near-synonyms.[2]

Some lexicographers claim that no synonyms have exactly the same meaning (in all contexts or social levels of language) because etymology, orthography, phonic qualities, ambiguous meanings, usage, etc. make them unique. Different words that are similar in meaning usually differ for a reason: feline is more formal than cat; long and extended are only synonyms in one usage and not in others (for example, a long arm is not the same as an extended arm). Synonyms are also a source of euphemisms.

In the figurative sense, two words are sometimes said to be synonymous if they have the same connotation:

...a widespread impression that ... Hollywood was synonymous with immorality...[3] — Doris Kearns Goodwin

Metonymy can sometimes be a form of synonymy, as when, for example, the White House is used as a synonym of the administration in referring to the U.S. executive branch under a specific president. Thus a metonym is a type of synonym, and the word metonym is a hyponym of the word synonym. The analysis of synonymy, polysemy, hyponymy, and hypernymy is inherent to taxonomy and ontology in the information- science senses of those terms. It has applications in pedagogy and machine learning, because they rely on word-sense disambiguation and schema.

20.1 Examples

Synonyms can be any part of speech (such as nouns, verbs, adjectives, adverbs or prepositions), as long as both words belong to the same part of speech. Examples:

• verb

• buy and purchase

• adjective

• big and large


Synonym list in cuneiform on a clay tablet, Neo-Assyrian period[1]

• adverb

• quickly and speedily

• preposition

• on and upon

Note that synonyms are defined with respect to certain senses of words; for instance, pupil as the aperture in the iris of the eye is not synonymous with student. Likewise, he expired means the same as he died, yet my passport has expired cannot be replaced by my passport has died.

In English, many synonyms emerged in the Middle Ages, after the Norman conquest of England. While England's new ruling class spoke Norman French, the lower classes continued to speak Old English (Anglo-Saxon). Thus, today we have synonyms like the Norman-derived people, liberty and archer, and the Saxon-derived folk, freedom and bowman. For more examples, see the list of Germanic and Latinate equivalents in English.

The purpose of a thesaurus is to offer the user a listing of similar or related words; these are often, but not always, synonyms.

• The word poecilonym is a rare synonym of the word synonym. It is not entered in most major dictionaries and is a curiosity or piece of trivia for being an autological word because of its meta quality as a synonym of synonym.

• Antonyms are words with opposite or nearly opposite meanings. For example: hot ↔ cold, large ↔ small, thick ↔ thin, synonym ↔ antonym

• Hypernyms and hyponyms are words that refer to, respectively, a general category and a specific instance of that category. For example, vehicle is a hypernym of car, and car is a hyponym of vehicle.

• Homophones are words that have the same pronunciation, but different meanings. For example, witch and which are homophones in most accents (because they are pronounced the same).

• Homographs are words that have the same spelling, but have different pronunciations. For example, one can record a song or keep a record of documents.

• Homonyms are words that have the same pronunciation and spelling, but have different meanings. For exam- ple, rose (a type of flower) and rose (past tense of rise) are homonyms.

20.2 See also

• -onym

• Synonym ring

• Cognitive synonymy

• Elegant variation, the gratuitous use of a synonym in prose

20.3 References

[1] K.4375

[2] Stanojević, Maja (2009), “Cognitive synonymy: a general overview” (PDF), Facta Universitatis, Linguistics and Literature series, 7 (2): 193–200.

[3] The Fitzgeralds and the Kennedys. Macmillan. 1991. p. 370. ISBN 9780312063542. Retrieved 27 May 2014.

20.4 External links

Tools which graph word relations:

• Graph Words - Online tool for visualizing word relations

• Synonyms.net - Online reference resource that provides instant synonyms and antonyms definitions, including visualizations and voice pronunciations

• English/French Semantic Atlas - Graph words relations in English, French and gives cross representations for translations - offers 500 searches per user per day.

Plain-word synonym finders:

• Synonym Finder - Synonym finder including hypernyms in search results

• Thesaurus - Online synonyms in English, Italian, French and German

• Woxikon Synonyms - Over 1 million synonyms - English, German, Spanish, French, Italian, Portuguese, Swedish and Dutch

• Power Thesaurus - Thesaurus with synonyms ordered by rating

• FindMeWords Synonyms - Online Synonym Dictionary with definitions

Chapter 21

Relevance

Relevance is the concept of one topic being connected to another topic in a way that makes it useful to consider the first topic when considering the second. The concept of relevance is studied in many different fields, including cognitive sciences, logic, and library and information science. Most fundamentally, however, it is studied in epistemology (the theory of knowledge). Different theories of knowledge have different implications for what is considered relevant and these fundamental views have implications for all other fields as well.

21.1 Definition

“Something (A) is relevant to a task (T) if it increases the likelihood of accomplishing the goal (G), which is implied by T.” (Hjørland & Sejer Christensen, 2002).[1] A thing might be relevant; a document or a piece of information may be relevant. The basic understanding of relevance does not depend on whether we speak of “things” or “information”. For example, the Gandhian principles are of great relevance in today’s world.

21.2 Epistemology

If you believe that schizophrenia is caused by bad communication between mother and child, then family interaction studies become relevant. If, on the other hand, you subscribe to a genetic theory of schizophrenia, then the study of genes becomes relevant. If you subscribe to the epistemology of empiricism, then only intersubjectively controlled observations are relevant. If, on the other hand, you subscribe to feminist epistemology, then the sex of the observer becomes relevant. Epistemology is not just one domain among others. Epistemological views are always at play in any domain. Those views determine or influence what is regarded as relevant.

21.3 Relevance logic

In formal reasoning, relevance has proved an important but elusive concept. It is important because the solution of any problem requires the prior identification of the relevant elements from which a solution can be constructed. It is elusive because the meaning of relevance appears to be difficult or impossible to capture within conventional logical systems. The obvious suggestion that q is relevant to p if q is implied by p breaks down because under standard definitions of material implication, a false proposition implies all other propositions. However, though 'iron is a metal' may be implied by 'cats lay eggs', it doesn't seem to be relevant to it in the way in which 'cats are mammals' and 'mammals give birth to living young' are relevant to each other.

If one states “I love ice cream,” and another person responds “I have a friend named Brad Cook,” then these statements are not relevant. However, if one states “I love ice cream,” and another person responds “I have a friend named Brad Cook who also likes ice cream,” this statement now becomes relevant because it relates to the first person’s idea.

97 98 CHAPTER 21. RELEVANCE

Graphic of relevance in digital ecosystems

More recently a number of theorists have sought to account for relevance in terms of “possible world logics” in intensional logic. Roughly, the idea is that necessary truths are true in all possible worlds, contradictions (logical falsehoods) are true in no possible worlds, and contingent propositions can be ordered in terms of the number of possible worlds in which they are true. Relevance is argued to depend upon the “remoteness relationship” between an actual world in which relevance is being evaluated and the set of possible worlds within which it is true.

21.4 Application

21.4.1 Politics

During the 1960s, relevance became a fashionable buzzword, meaning roughly 'relevance to social concerns’, such as racial equality, poverty, social justice, world hunger, world economic development, and so on. The implication was that some subjects, e.g., the study of medieval poetry and the practice of corporate law, were not worthwhile because they did not address pressing social issues.

21.4.2 Economics

The economist John Maynard Keynes saw the importance of defining relevance to the problem of calculating risk in economic decision-making. He suggested that the relevance of a piece of evidence, such as a true proposition, should be defined in terms of the changes it produces in estimates of the probability of future events. Specifically, Keynes proposed that new evidence e is irrelevant to a proposition p, given old evidence q, if and only if p/(q & e) = p/q, and relevant otherwise (where, in Keynes’s notation, p/q denotes the probability of p given q).

There are technical problems with this definition, for example, the relevance of a piece of evidence can be sensitive to the order in which other pieces of evidence are received.

21.4.3 Cognitive science and pragmatics

Further information: Relevance theory

In 1986, Dan Sperber and Deirdre Wilson drew attention to the central importance of relevance decisions in reasoning and communication. They proposed an account of the process of inferring relevant information from any given utterance. To do this work, they used what they called the “Principle of Relevance": namely, the position that any utterance addressed to someone automatically conveys the presumption of its own optimal relevance. The central idea of Sperber and Wilson’s theory is that all utterances are encountered in some context, and the correct interpretation of a particular utterance is the one that allows most new implications to be made in that context on the basis of the least amount of information necessary to convey it.

For Sperber and Wilson, relevance is conceived as relative or subjective, as it depends upon the state of knowledge of a hearer when they encounter an utterance. Sperber and Wilson stress that this theory is not intended to account for every intuitive application of the English word “relevance”. Relevance, as a technical term, is restricted to relationships between utterances and interpretations, and so the theory cannot account for intuitions such as the one that relevance relationships obtain in problems involving physical objects. If a plumber needs to fix a leaky faucet, for example, some objects and tools are relevant (e.g., a wrench) and others are not (e.g., a waffle iron). And, moreover, the latter seems to be irrelevant in a manner which does not depend upon the plumber’s knowledge, or the utterances used to describe the problem.

A theory of relevance that seems to be more readily applicable to such instances of physical problem solving has been suggested by Gorayska and Lindsay in a series of articles published during the 1990s. The key feature of their theory is the idea that relevance is goal-dependent. An item (e.g., an utterance or object) is relevant to a goal if and only if it can be an essential element of some plan capable of achieving the desired goal. This theory embraces both propositional reasoning and the problem-solving activities of people such as plumbers, and defines relevance in such a way that what is relevant is determined by the real world (because what plans will work is a matter of empirical fact) rather than the state of knowledge or belief of a particular problem solver.

21.4.4 Law

Main article: Relevance (law)

The meaning of “relevance” in U.S. law is reflected in Rule 401 of the Federal Rules of Evidence. That rule defines relevance as “having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.” In other words, if a fact were to have no bearing on the truth or falsity of a conclusion, it would be legally irrelevant.

21.4.5 Library and information science

Main article: Relevance (information retrieval)

This field has considered when documents (or document representations) retrieved from databases are relevant or non-relevant. Given a conception of relevance, two measures have been applied:

Recall = a / (a + c) × 100%, where a = number of retrieved, relevant documents and c = number of non-retrieved, relevant documents (sometimes termed “silence”). Recall is thus an expression of how exhaustive a search for documents is.

Precision = a / (a + b) × 100%, where b = number of retrieved, non-relevant documents (often termed “noise”). Precision is thus a measure of the amount of noise in document retrieval.

Relevance itself has in the literature often been based on what is termed “the system’s view” and “the user’s view”. Hjørland (2010) criticizes these two views and defends a “subject knowledge view of relevance”.
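In code, the two measures are a direct transcription, assuming the retrieved and relevant documents are available as sets of identifiers (a minimal sketch, not drawn from the text):

def recall(retrieved, relevant):
    # a = retrieved & relevant; c = relevant - retrieved (the "silence")
    return len(retrieved & relevant) / len(relevant) * 100

def precision(retrieved, relevant):
    # a = retrieved & relevant; b = retrieved - relevant (the "noise")
    return len(retrieved & relevant) / len(retrieved) * 100

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
print(recall(retrieved, relevant))     # ≈ 66.7: two of the three relevant documents were found
print(precision(retrieved, relevant))  # 50.0: two of the four retrieved documents are relevant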

21.5 See also

• Source criticism

• Description

• Distraction

• Information-action ratio

• Information overload

• Intention

• Relevance theory

21.6 References

[1] Hjørland, B. & Sejer Christensen, F. (2002). Work tasks and socio-cognitive relevance: a specific example. Journal of the American Society for Information Science and Technology, 53(11), 960-965.

• Gorayska B. & R. O. Lindsay (1993). The Roots of Relevance. Journal of Pragmatics 19, 301–323. Los Alamitos: IEEE Computer Society Press.

• Hjørland, Birger (2010). The foundation of the concept of relevance. Journal of the American Society for Information Science and Technology, 61(2), 217-237.

• Keynes, J. M. (1921). Treatise on Probability. London: MacMillan

• Lindsay, R. & Gorayska, B. (2002). Relevance, Goals and Cognitive Technology. International Journal of Cognitive Technology, 1(2), 187–232

• Sperber, D. & D. Wilson (1986/1995). Relevance: Communication and Cognition. 2nd edition. Oxford: Blackwell.

• Sperber, D. & D. Wilson (1987). Précis of Relevance: Communication and Cognition. Behavioral and Brain Science, 10, 697–754.

• Sperber, D. & D. Wilson (2004). Relevance Theory. In Horn, L.R. & Ward, G. (eds.) 2004 The Handbook of Pragmatics. Oxford: Blackwell, 607-632. http://www.dan.sperber.fr/?p=93

• Zhang, X. H. (1993). A Goal-Based Relevance Model and its Application to Intelligent Systems. Ph.D. Thesis, Oxford Brookes University, Department of Mathematics and Computer Science, October, 1993.

21.7 External links

• Malcolm Gladwell - Blink - full show: TVOntario interview regarding “snap judgements” and Blink

Chapter 22

Library and information science

Library and information science (LIS) (sometimes given as the plural library and information sciences)[1][2] or as "library and information studies"[3] is a merging of library science and information science. The joint term is associated with schools of library and information science (abbreviated to “SLIS”). In the last part of the 1960s, schools of librarianship, which generally developed from professional training programs (not academic disciplines) to university institutions during the second half of the 20th century, began to add the term “information science” to their names. The first school to do this was at the University of Pittsburgh in 1964.[4] More schools followed during the 1970s and 1980s, and by the 1990s almost all library schools in the USA had added information science to their names. Although there are exceptions, similar developments have taken place in other parts of the world. In Denmark, for example, the 'Royal School of Librarianship' changed its English name to The Royal School of Library and Information Science in 1997.

Exceptions include Tromsø, Norway, where the term documentation science is the preferred name of the field; France, where information science and communication studies form one interdiscipline;[5] and Sweden, where the fields of Archival science, Library science and Museology have been integrated as Archival, Library and Museum studies.

In spite of various trends to merge the two fields, some consider the two original disciplines, library science and information science, to be separate.[6][7] However, the tendency today is to use the terms as synonyms or to drop the term “library” and to speak about information departments or I-schools. There have also been attempts to revive the concept of documentation and to speak of Library, information and documentation studies (or science).[8]

22.1 Relations between library science, information science and LIS

Tefko Saracevic (1992, p. 13)[6] argued that library science and information science are separate fields:

“The common ground between library science and information science, which is a strong one, is in the sharing of their social role and in their general concern with the problems of effective utilization of graphic records. But there are also very significant differences in several critical respects, among them in: (1) selection of problems addressed and in the way they were defined; (2) theoretical questions asked and frameworks established; (3) the nature and degree of experimentation and empirical development and the resulting practical knowledge/competencies derived; (4) tools and approaches used; and (5) the nature and strength of interdisciplinary relations established and the dependence of the progress and evolution of interdisciplinary approaches. All of these differences warrant the conclusion that librarianship and information science are two different fields in a strong interdisciplinary relation, rather than one and the same field, or one being a special case of the other.”

Another indication of the different uses of the two terms is the indexing in UMI’s Dissertations Abstracts. In Dissertations Abstracts Online in November 2011, 4,888 dissertations were indexed with the descriptor LIBRARY SCIENCE and 9,053 with the descriptor INFORMATION SCIENCE. For the year 2009 the numbers were 104 LIBRARY SCIENCE and 514 INFORMATION SCIENCE. 891 dissertations were indexed with both terms (36 in 2009). It should be considered that information science grew out of documentation science and therefore has a tradition of considering scientific and scholarly communication, bibliographic databases, subject knowledge and terminology,


etc. Library science, on the other hand, has mostly concentrated on libraries and their internal processes and best practices. It is also relevant to consider that information science used to be done by scientists, while librarianship has been split between public libraries and scholarly research libraries. Library schools have mainly educated librarians for public libraries and not shown much interest in scientific communication and documentation. When information scientists entered library schools from 1964 onwards, they brought with them competencies in relation to information retrieval in subject databases, including concepts such as recall and precision, Boolean search techniques, query formulation and related issues. Subject bibliographic databases and citation indexes provided a major step forward in information dissemination - and also in the curriculum at library schools.

Julian Warner (2010)[9] suggests that the information and computer science tradition in information retrieval may broadly be characterized as query transformation, with the query articulated verbally by the user in advance of searching and then transformed by a system into a set of records. From librarianship and indexing, on the other hand, there has been an implicit stress on selection power, enabling the user to make relevant selections.

22.2 Difficulties defining LIS

“The question, 'What is library and information science?' does not elicit responses of the same internal conceptual coherence as similar inquiries as to the nature of other fields, e.g., 'What is chemistry?', 'What is economics?', 'What is medicine?' Each of those fields, though broad in scope, has clear ties to basic concerns of their field. [...] Neither LIS theory nor practice is perceived to be monolithic nor unified by a common literature or set of professional skills. Occasionally, LIS scholars (many of whom do not self-identify as members of an interreading LIS community, or prefer names other than LIS), attempt, but are unable, to find core concepts in common. Some believe that computing and internetworking concepts and skills underlie virtually every important aspect of LIS, indeed see LIS as a sub-field of computer science! [Footnote III.1] Others claim that LIS is principally a social science accompanied by practical skills such as ethnography and interviewing. Historically, traditions of public service, bibliography, documentalism, and information science have viewed their mission, their philosophical toolsets, and their domain of research differently. Still others deny the existence of a greater metropolitan LIS, viewing LIS instead as a loosely organized collection of specialized interests often unified by nothing more than their shared (and fought-over) use of the descriptor information. Indeed, claims occasionally arise to the effect that the field even has no theory of its own.” (Konrad, 2007, p. 652-653).

22.2.1 A multidisciplinary, interdisciplinary or monodisciplinary field?

The Swedish researcher Emin Tengström (1993)[10] described cross-disciplinary research as a process, not a state or structure. He differentiates three levels of ambition regarding cross-disciplinary research:

• The "Pluridisciplinary" or "multidisciplinarity" level

• The genuine cross-disciplinary level: "interdisciplinarity"

• The discipline-forming level "transdisciplinarity"

What is described here is a view of social fields as dynamic and changing. Library and information science is viewed as a field that started as a multidisciplinary field based on literature, psychology, sociology, management, computer science etc., which is developing towards an academic discipline in its own right. However, the following quote seems to indicate that LIS is actually developing in the opposite direction:

Chua & Yang (2008)[11] studied papers published in the Journal of the American Society for Information Science and Technology in the period 1988-1997 and found, among other things: “Top authors have grown in diversity from those being affiliated predominantly with library/information-related departments to include those from information systems management, information technology, business, and the humanities. Amid heterogeneous clusters of collaboration among top authors, strongly connected crossdisciplinary coauthor pairs have become more prevalent. Correspondingly, the distribution of top keywords’ occurrences that leans heavily on core information science has shifted towards other subdisciplines such as information technology and sociobehavioral science.”

As a field with its own body of interrelated concepts, techniques, journals, and professional associations, LIS is clearly a discipline. But by the nature of its subject matter and methods LIS is just as clearly an interdiscipline, drawing on many adjacent fields (see below).

22.2.2 A fragmented adhocracy

Richard Whitley (1984,[12] 2000)[13] classified scientific fields according to their intellectual and social organization and described management studies as a ‘fragmented adhocracy’, a field with a low level of coordination around a diffuse set of goals and a non-specialized terminology; but with strong connections to the practice in the business sector. Åström (2006)[14] applied this conception to the description of LIS.

22.2.3 Scattering of the literature

Meho & Spurgin (2005)[15] found that in a list of 2,625 items published between 1982 and 2002 by 68 faculty members of 18 schools of library and information science, only 10 databases provided significant coverage of the LIS literature. Results also show that restricting the data sources to one, two, or even three databases leads to inaccurate rankings and erroneous conclusions. Because no database provides comprehensive coverage of the LIS literature, researchers must rely on a wide range of disciplinary and multidisciplinary databases for ranking and other research purposes. Even when the nine most comprehensive databases in LIS were searched and combined, 27.0% (or 710 of 2,635) of the publications remained unfound.

“The study confirms earlier research that LIS literature is highly scattered and is not limited to standard LIS databases. What was not known or verified before, however, is that a significant amount of this literature is indexed in the interdisciplinary or multidisciplinary databases of Inside Conferences and INSPEC. Other interdisciplinary databases, such as America: History and Life, were also found to be very useful and complementary to traditional LIS databases, particularly in the areas of archives and library history."(Meho & Spurgin, 2005, p.1329).

22.3 The unique concern of library and information science

“Concern for people becoming informed is not unique to LIS, and thus is insufficient to differentiate LIS from other fields. LIS are a part of a larger enterprise.” (Konrad, 2007, p. 655).[16]

“The unique concern of LIS is recognized as: Statement of the core concern of LIS: Humans becoming informed (constructing meaning) via intermediation between inquirers and instrumented records. No other field has this as its concern.” (Konrad, 2007, p. 660)

“Note that the promiscuous term information does not appear in the above statement circumscribing the field’s central concerns: The detrimental effects of the ambiguity this term provokes are discussed above (Part III). Furner [Furner 2004, 427] has shown that discourse in the field is improved where specific terms are utilized in place of the i-word for specific senses of that term.” (Konrad, 2007, p. 661).

Michael Buckland wrote: “Educational programs in library, information and documentation are concerned with what people know, are not limited to technology, and require wide-ranging expertise. They differ fundamentally and importantly from computer science programs and from the information systems programs found in business schools.”[17]

22.4 LIS theories

Julian Warner (2010, p. 4-5)[9] suggests that

"Two paradigms, the cognitive and the physical, have been distinguished in information retrieval research, but they share the assumption of the value of delivering relevant records (Ellis 1984, 19;[18] Belkin and Vickery 1985, 114[19]). For the purpose of discussion here, they can be considered a single heterogeneous paradigm, linked but not united by this common assumption. The value placed on query transformation is dissonant with common practice, where users may prefer to explore an area and may value fully informed exploration. Some dissenting research discussions have been more congruent with practice, advocating explorative capability - the ability to explore and make discriminations between representations of objects - as the fundamental design principle for information retrieval systems”.

The domain analytic approach (e.g., Hjørland 2010[20]) suggests that the relevant criteria for making discriminations in information retrieval are scientific and scholarly criteria. In some fields (e.g. evidence-based medicine)[21] the relevant distinctions are very explicit. In other cases they are implicit or unclear. At the basic level, the relevance of bibliographical records is determined by epistemological criteria of what constitutes knowledge.

Among other approaches, Evidence Based Library and Information Practice should also be mentioned.

22.5 Journals

(see also List of LIS Journals in India page, Category:Library science journals and Journal Citation Reports for listing according to Impact factor) Some core journals in LIS are:

• Annual Review of Information Science and Technology (ARIST) (1966–2011)

• El Profesional de la Información (es) (EPI) (1992-) (Formerly Information World en Español)

• Information Processing and Management

• Information Research: An international electronic journal (IR) (1995-)

• Italian Journal of Library and Information Studies (JLIS.it)

• Journal of Documentation (JDoc) (1945-)

• Journal of Information Science (JIS) (1979-)

• Journal of the Association for Information Science and Technology (Formerly Journal of the American Society for Information Science and Technology) (JASIST) (1950-)

• Knowledge Organization (journal)

• The Library Quarterly (LQ) (1931-)

• Library Trends (1952-)

• Scientometrics (journal) (1978-)

• Library Literature and Information Science Retrospective (1901-1983)

Important bibliographical databases in LIS are, among others, Social Sciences Citation Index and Library and Information Science Abstracts.

22.6 Conferences

This is a list of some of the major conferences in the field.

• Annual meeting of the American Society for Information Science and Technology

• Conceptions of Library and Information Science

• i-Schools’ iConferences

• ISIC - the Information Behaviour Conference http://informationr.net/isic/index.html

• The International Federation of Library Associations and Institutions (IFLA): World Library and Information Congress, http://web.archive.org/web/20150706164140/http://conference.ifla.org/

• The international conferences of the International Society for Knowledge Organization (ISKO), http://www. isko.org/events.html 22.7. COMMON SUBFIELDS 105

22.7 Common subfields

An advertisement for a full Professor in information science at the Royal School of Library and Information Science, spring 2011, provides one view of which subdisciplines are well-established:[22] “The research and teaching/supervision must be within some (and at least one) of these well-established information science areas

• a. Knowledge organization

• b. Library studies

• c.

• d. Information behavior

• e. Interactive information retrieval

• f. Information systems

• g. Scholarly communication

• h. Digital literacy (cf information literacy)

• i. Bibliometrics or scientometrics

• j. Interaction design and user experience"

• k.

There are other ways to identify subfields within LIS, for example bibliometric mapping and comparative studies of curricula. Bibliometric maps of LIS have been produced by, among others, Vickery & Vickery (1987, frontispiece),[23] White & McCain (1998),[24] Åström (2002,[25] 2006) and Hassan-Montero & Herrero-Solana (2007).[26] An example of a curriculum study is Kajberg & Lørring, 2005.[27] In this publication the following data are reported (p. 234): “Degree of overlap of the ten curricular themes with subject areas in the current curricula of responding LIS schools

• Information seeking and Information retrieval 100%

• Library management and promotion 96%

86%

• Knowledge organization 82%

• Information literacy and learning 76%

• Library and society in a historical perspective (Library history) 66%

• The Information society: Barriers to the free access to information 64%

• Cultural heritage and digitisation of the cultural heritage (Digital preservation) 62%

• The library in the multi-cultural information society: International and intercultural communication 42%

• Mediation of culture in a special European context 26% "

There is often an overlap between these subfields of LIS and other fields of study. Most information retrieval research, for example, belongs to computer science. Knowledge management is considered a subfield of management or organizational studies.[28]

22.8 See also

• Archival science

• Authority control

• Bibliography

• Digital Asset Management (DAM)

• Documentation science

• Education for librarianship

• Glossary of library and information science

• I-school

• Information history

• Information systems

• Knowledge management

• Library and information scientist

• Museology

• Museum informatics

• Records Management

22.9 References

[1] Bates, M.J. and Maack, M.N. (eds.). (2010). Encyclopedia of Library and Information Sciences. Vol. 1-7. CRC Press, Boca Raton, USA. Also available as an electronic source.

[2] Library and Information Sciences is the name used in the Dewey Decimal Classification for class 20 from the 18th edition (1971) to the 22nd edition (2003)

[3] “Canada Library School University Programs”. www.canadian-universities.net. Retrieved 23 November 2014.

[4] Galvin, T. J. (1977). Pittsburgh. University of Pittsburgh Graduate School of Library and Information Sciences. IN: Encyclopedia of Library and Information Science (Vol. 22). Ed. by A. Kent, H. Lancour & J.E.Daily. New York: Marcel Dekker, Inc. (pp. 280–291)

[5] Mucchielli, A., (2000), La nouvelle communication : épistémologie des sciences de l’information-communication. Paris, Armand Colin, 2000. Collection U. Sciences de la communication

[6] Saracevic, Tefko (1992). Information science: origin, evolution and relations. In: Conceptions of library and information science. Historical, empirical and theoretical perspectives. Edited by Pertti Vakkari & Blaise Cronin. London: Taylor Graham (pp. 5-27).

[7] Miksa, Francis L. (1992). Library and information science: two paradigms. In: In: Conceptions of library and information science. Historical, empirical and theoretical perspectives. Edited by Pertti Vakkari & Blaise Cronin. London: Taylor Graham (pp. 229-252).

[8] Rayward, W. B. (Ed.) (2004). Aware and responsible. Papers of the Nordic-International Colloquium on Social and Cultural Awareness and responsibility in Library, Information, and Documentation Studies (SCARLID). Lanham, MD: Scarecrow Press.

[9] Warner, Julian (2010). Human information retrieval. Cambridge, MA: The MIT Press

[10] Tengström, E. (1993). Biblioteks- och informationsvetenskapen - ett fler- eller tvär-vetenskapligt område? Svensk Bib- lioteksforskning,(1), 9-20.

[11] Chua, A. & Yang, C.C. (2008). The shift towards multi-disciplinarity in information science, Journal of the American Society for Information Science and Technology, 59(13), 2156–2170. 22.10. FURTHER READING 107

[12] Whitley, R. (1984). The fragmented state of management studies: Reasons and consequences. Journal of management studies, 21(3), 331-348.

[13] Whitley, R. (2000). The intellectual and social organization of the sciences. Oxford University Press, Oxford.

[14] Åström, F. (2006). The social and intellectual development of library and information science. Doctoral theses at the Department of Sociology, Umeå University, No. 48, 2006. http://www.diva-portal.org/smash/get/diva2:145144/FULLTEXT01

[15] Meho, Lokman I. & Spurgin, Kristina M. (2005). Ranking the Research Productivity of Library and Information Sci- ence Faculty and Schools: An Evaluation of Data Sources and Research Methods. Journal of the American Society for Information Science and Technology, 56(12), 1314–1331.

[16] Konrad, A. (2007). On inquiry: Human concept formation and construction of meaning through library and information science intermediation (Unpublished doctoral dissertation). University of California, Berkeley. Retrieved from http:// escholarship.org/uc/item/1s76b6hp

[17] Buckland, Michael K. (2004). Reflections on social and cultural awareness and responsibility in library, information and documentation - Commentary on the SCARLID colloquium. In: Rayward, W. B. (Ed.). Aware and responsible. Papers of the Nordic- International Colloquium on Social and Cultural Awareness and responsibility in Library, Information, and Documentation Studies (SCARLID). Lanham, MD: Scarecrow Press. (pp. 169-175).

[18] Ellis, David (1984). Theory and explanation in information retrieval research. Journal of Information Science, 8, 25-38

[19] Belkin, N. J. & Vickery, A. (1985)- Interaction in information systems: A review of research from document retrieval to knowledge-based systems. London: British Library (Library and Information Research Report 35).

[20] Hjørland, Birger (2010). The foundation of the concept of relevance. Journal of the American Society for Information Science and Technology. 61(2), 217-237.

[21] Hjørland, Birger (2011). Evidence based practice: An analysis based on the philosophy of science. Journal of the American Society for Information Science and Technology, 62(7), 1301-1310.

[22] Advertisement for a full Professor in information science at the Royal School of Library and Information Science, spring 2011: http://www.job-i-staten.dk/SearchResults/position-as-full-professor-in-information-science-lja-3723916.aspx?jobId= LJA-3723916&list=SearchResultsJobsIds&index=6&querydesc=SearchJobQueryDescription&viewedfrom=1

[23] Vickery, Brian & Vickery, Alina (1987). Information science in theory and practice. London: Bowker-Saur.

[24] White, H. D., & McCain, K. W. (1998). Visualizing a discipline: An author co-citation analysis of information science, 1972-1995. Journal of the American Society for Information Science, 49(4), 327-355.

[25] Åström, Fredrik (2002) Visualizing Library and Information Science concept spaces through keyword and citation based maps and clusters. In: Bruce, Fidel, Ingwersen & Vakkari (Eds.). Emerging frameworks and methods: Proceedings of the fourth international conference on conceptions of Library and Information Science (CoLIS4), pp 185-197. Greenwood Village: Libraries unlimited.

[26] Hassan-Montero, Y., Herrero-Solana, V. (2007). Visualizing Library and Information Science from the practitioner’s perspective. 11th International Conference of the International Society for Scientometrics and Informetrics, June 25–27, 2007, Madrid (Spain). http://yusef.es/Visualizing_LIS.pdf

[27] Kajberg, Leif & Lørring, Leif (eds.). (2005). European Curriculum Reflections on Library and Information Science Edu- cation. Copenhagen: The Royal School of Library and Information Science. http://library.upt.ro/LIS_Bologna.pdf

[28] Clegg, Stewart; Bailey, James R., eds. (2008). International Encyclopedia of Organizational Studies. Los Angeles, Calif.: Sage Publications Inc. pp. 758–762. ISBN 978-1-4129-5390-0.

22.10 Further reading

• Hjørland, B. (2000). Library and Information Science: Practice, theory, and philosophical basis. Information Processing and Management, 36(3), 501-531.

• Hjørland, B. (2013). Information science and its core concepts: Levels of disagreement. In lbekwe-SanJuan, F., & Dousa, T.(ed.), Fundamental notions of information communication and knowledge (pp. 205–235). Dordrecht: Springer Science+Business Media B.V. 108 CHAPTER 22. LIBRARY AND INFORMATION SCIENCE

• Järvelin, K. & Vakkari, P. (1993). The Evolution of Library and Information Science 1965-1985: A Content Analysis of Journal Articles. Information Processing & Management, 29(1), 129-144.

• Kajberg, L. (1992). Library and Information Science Research in Denmark 1965-1989: A Content Analysis of R&D Publications. IN: Teknologi och kompetens. Proceedings. 8:de Nordiska konferencen för Information och Dokumentation 19-21/5 1992 i Helsingborg. Stockholm: Tekniska Litteratursällskapet, 233-237.

• McNicol, S. (2003). LIS: The Interdisciplinary Research Landscape. Journal of Librarianship and Information Science, 35(1), 23-30.

• McClure, C. R. & Hernon, P. (eds.). (1991). Library and Information Science Research: Perspectives and Strategies for Improvement. Norwood, N.J.: Ablex.

• Åström, Fredrik (2008). Formalizing a discipline: The institutionalization of library and information science research in the Nordic countries. Journal of Documentation, Vol. 64, Iss: 5, 721-737.

Chapter 23

Relevance (information retrieval)

For other uses, see Relevance (disambiguation).

In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.

23.1 History

The concern with the problem of finding relevant information dates back at least to the first publication of scientific journals in the 17th century. The formal study of relevance began in the 20th century with the study of what would later be called bibliometrics. In the 1930s and 1940s, S. C. Bradford used the term “relevant” to characterize articles relevant to a subject (cf., Bradford’s law). In the 1950s, the first information retrieval systems emerged, and researchers noted the retrieval of irrelevant articles as a significant concern. In 1958, B. C. Vickery made the concept of relevance explicit in an address at the International Conference on Scientific Information.[1] Since 1958, information scientists have explored and debated definitions of relevance. A particular focus of the debate was the distinction between “relevance to a subject” or “topical relevance” and “user relevance”.

23.2 Evaluation

Main article: Information retrieval § Performance and correctness measures

The information retrieval community has emphasized the use of test collections and benchmark tasks to measure topical relevance, starting with the Cranfield Experiments of the early 1960s and culminating in the TREC evaluations that continue to this day as the main evaluation framework for information retrieval research. In order to evaluate how well an information retrieval system retrieved topically relevant results, the relevance of retrieved results must be quantified. In Cranfield-style evaluations, this typically involves assigning a relevance level to each retrieved result, a process known as relevance assessment. Relevance levels can be binary (indicating a result is relevant or that it is not relevant), or graded (indicating results have a varying degree of match between the topic of the result and the information need). Once relevance levels have been assigned to the retrieved results, information retrieval performance measures can be used to assess the quality of a retrieval system’s output. In contrast to this focus solely on topical relevance, the information science community has emphasized user studies that consider user relevance. These studies often focus on aspects of human-computer interaction (see also human-computer information retrieval).
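The text does not name a particular performance measure, but discounted cumulative gain (DCG) is one widely used way to score a ranking against graded relevance assessments; the following Python sketch is illustrative rather than prescriptive:

import math

def dcg(relevance_levels):
    # relevance_levels: graded assessments for ranked results, e.g. [3, 2, 0, 1],
    # with higher grades for better matches between result and information need.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance_levels, start=1))

def ndcg(relevance_levels):
    # Normalize against the ideal (best possible) ordering of the same grades.
    ideal = dcg(sorted(relevance_levels, reverse=True))
    return dcg(relevance_levels) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 0, 1]))  # ≈ 0.985; 1.0 would mean the ranking matched the ideal order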


23.3 Clustering and relevance

The cluster hypothesis, proposed by C. J. van Rijsbergen in 1979, asserts that two documents that are similar to each other have a high likelihood of being relevant to the same information need. With respect to the embedding similarity space, the cluster hypothesis can be interpreted globally or locally.[2] The global interpretation assumes that there exists some fixed set of underlying topics derived from inter-document similarity. These global clusters or their representatives can then be used to relate the relevance of two documents (e.g. two documents in the same cluster should both be relevant to the same request). Methods in this spirit include:

• cluster-based information retrieval[3][4]

• cluster-based document expansion, such as latent semantic analysis or its language modeling equivalents.[5] It is important to ensure that clusters – either in isolation or combination – successfully model the set of possible relevant documents.

A second interpretation, most notably advanced by Ellen Voorhees,[6] focuses on the local relationships between documents. The local interpretation avoids having to model the number or size of clusters in the collection and allows relevance at multiple scales. Methods in this spirit include:

• multiple cluster retrieval[4][6]

• spreading activation[7] and relevance propagation[8] methods

• local document expansion[9]

• score regularization[10]

Local methods require an accurate and appropriate document similarity measure.
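The text leaves the choice of similarity measure open; cosine similarity over term-weight vectors (for example, tf-idf scores) is one common choice, sketched here in Python as an assumption rather than a prescription:

import math

def cosine(u, v):
    # u, v: dicts mapping terms to weights (e.g. tf-idf scores).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Under the cluster hypothesis, a high cosine(d1, d2) suggests that d1 and d2
# are likely to be relevant to the same information need.
print(cosine({"eclipse": 0.5, "lunar": 0.3}, {"eclipse": 0.4, "solar": 0.2}))  # ≈ 0.77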

23.4 Problems and alternatives

The documents which are most relevant are not necessarily those which are most useful to display in the first page of search results. For example, two duplicate documents might be individually considered quite relevant, but it is only useful to display one of them. A measure called “maximal marginal relevance” (MMR) has been proposed to overcome this shortcoming. It considers the relevance of each document only in terms of how much new information it brings given the previous results.[11] In some cases, a query may have an ambiguous interpretation, or a variety of potential responses. Providing a diversity of results can be a consideration when evaluating the utility of a result set.[12]
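A minimal Python sketch of the MMR idea described above, assuming the surrounding retrieval system supplies a query-relevance function and a document-similarity function (both names are mine, and lam is the usual relevance/novelty trade-off parameter):

def mmr_rerank(candidates, query_sim, doc_sim, lam=0.7, k=5):
    # candidates: document ids ranked by the base system;
    # query_sim(d): relevance of d to the query;
    # doc_sim(d1, d2): similarity between two documents.
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr_score(d):
            # Penalize documents that resemble something already selected.
            redundancy = max((doc_sim(d, s) for s in selected), default=0.0)
            return lam * query_sim(d) - (1 - lam) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

With lam = 1 this reduces to ranking purely by relevance; lowering lam penalizes documents that repeat information already shown, which is how the duplicate-document problem above is avoided.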

23.5 References

[1] Mizzaro, S. (1997). Relevance: The Whole History. Journal of the American Society for Information Science. 48, 810‐832.

[2] F. Diaz, Autocorrelation and Regularization of Query-Based Retrieval Scores. PhD thesis, University of Massachusetts Amherst, Amherst, MA, February 2008, Chapter 3.

[3] W. B. Croft, “A model of cluster searching based on classification,” Information Systems, vol. 5, pp. 189–195, 1980.

[4] A. Griffiths, H. C. Luckhurst, and P. Willett, “Using interdocument similarity information in document retrieval systems,” Journal of the American Society for Information Science, vol. 37, no. 1, pp. 3–11, 1986.

[5] X. Liu and W. B. Croft, “Cluster-based retrieval using language models,” in SIGIR ’04: Proceedings of the 27th annual international conference on Research and development in information retrieval, (New York, NY, USA), pp. 186–193, ACM Press, 2004.

[6] E. M. Voorhees, “The cluster hypothesis revisited,” in SIGIR ’85: Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 188–196, ACM Press, 1985. 23.6. ADDITIONAL READING 111

[7] S. Preece, A spreading activation network model for information retrieval. PhD thesis, University of Illinois, Urbana- Champaign, 1981.

[8] T. Qin, T.-Y. Liu, X.-D. Zhang, Z. Chen, and W.-Y. Ma, “A study of relevance propagation for web search,” in SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 408–415, ACM Press, 2005.

[9] A. Singhal and F. Pereira, “Document expansion for speech retrieval,” in SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 34–41, ACM Press, 1999.

[10] F. Diaz, “Regularizing query-based retrieval scores,” Information Retrieval, vol. 10, pp. 531–562, December 2007.

[11] Carbonell, Jaime; Goldstein, Jade (1998). “The use of MMR, diversity-based reranking for reordering documents and producing summaries”. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. doi:10.1145/290941.291025.

[12] http://www.dcs.gla.ac.uk/workshops/ddr2012/

23.6 Additional reading

• Hjørland, B. (2010). The foundation of the concept of relevance. Journal of the American Society for Infor- mation Science and Technology, 61(2), 217-237.

• Relevance : communication and cognition. by Dan Sperber; Deirdre Wilson. 2nd ed. Oxford; Cambridge, MA: Blackwell Publishers, 2001. ISBN 978-0-631-19878-9

• Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. Journal of the American Society for Information Science and Technology, 58(3), 1915-1933. (pdf)

• Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance. Journal of the American Society for Information Science and Technology, 58(13), 2126-2144. (pdf)

• Saracevic, T. (2007). Relevance in information science. Invited Annual Thomson Scientific Lazerow Memorial Lecture at School of Information Sciences, University of Tennessee. September 19, 2007. (video)

• Introduction to Information Retrieval: Evaluation. Stanford. (presentation in PDF)

Chapter 24

Web search engine

“Search engine” redirects here. For other uses, see Search engine (disambiguation).

A web search engine is a software system that is designed to search for information on the World Wide Web. The search results are generally presented in a line of results often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.

The results of a search for the term “lunar eclipse” in a web-based image search engine

24.1 History

Further information: Timeline of web search engines

Internet search engines themselves predate the debut of the Web in December 1990. The Whois user search dates back to 1982[1] and the Knowbot Information Service multi-network user search was first implemented in 1989.[2]


The first well documented search engine that searched content files, namely FTP files, was Archie, which debuted on 10 September 1990.

Prior to September 1993 the World Wide Web was entirely indexed by hand. There was a list of webservers edited by Tim Berners-Lee and hosted on the CERN webserver. One historical snapshot of the list in 1992 remains,[3] but as more and more web servers went online the central list could no longer keep up. On the NCSA site, new servers were announced under the title “What’s New!"[4]

The first tool used for searching content (as opposed to users) on the Internet was Archie.[5] The name stands for “archive” without the “v”. It was created by Alan Emtage, Bill Heelan and J. Peter Deutsch, computer science students at McGill University in Montreal. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites since the amount of data was so limited it could be readily searched manually.

The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy’s Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers. While the name of the search engine “Archie” was not a reference to the Archie comic book series, “Veronica” and “Jughead” are characters in the series, thus referencing their predecessor.

In the summer of 1993, no search engine existed for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web’s first primitive search engine, released on September 2, 1993.[6]

In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called 'Wandex'. The purpose of the Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web’s second search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format.

NCSA’s Mosaic was not the first Web browser, but it was the first to make a major splash. In November 1993, Mosaic v 1.0 broke away from the small pack of existing browsers by including features - like icons, bookmarks, a more attractive interface, and pictures - that made the software easy to use and appealing to “non-geeks”.

JumpStation (created in December 1993[7] by Jonathon Fletcher) used a web robot to find web pages and to build its index, and used a web form as the interface to its query program. It was thus the first WWW resource-discovery tool to combine the three essential features of a web search engine (crawling, indexing, and searching) as described below.
Because of the limited resources available on the platform it ran on, JumpStation’s indexing, and hence its searching, was limited to the titles and headings found in the web pages its crawler encountered.

One of the first “all text” crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it allowed users to search for any word in any webpage, which has become the standard for all major search engines since. It was also the first search engine widely known by the public. Also in 1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial endeavor.

Soon after, many search engines appeared and vied for popularity. These included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Yahoo! was among the most popular ways for people to find web pages of interest, but its search function operated on its web directory rather than on full-text copies of web pages. Information seekers could also browse the directory instead of doing a keyword-based search.

In 1996, Netscape was looking to give a single search engine an exclusive deal as the featured search engine on Netscape’s web browser. There was so much interest that Netscape instead struck deals with five of the major search engines: for $5 million a year, each search engine would be in rotation on the Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite.[8][9]

Google adopted the idea of selling search terms in 1998 from a small search engine company named goto.com. This move had a significant effect on the search engine business, which went from struggling to one of the most profitable businesses on the Internet. Search engines were also known as some of the brightest stars in the Internet investing frenzy of the late 1990s.[10] Several companies entered the market spectacularly, receiving record gains during their initial public offerings. Some, such as Northern Light, have since taken down their public search engines and market enterprise-only editions. Many search engine companies were caught up in the dot-com bubble, a speculation-driven market boom that peaked in 1999 and ended in 2001.

Around 2000, Google’s search engine rose to prominence.[11] The company achieved better results for many searches with an innovation called PageRank, as explained in the paper Anatomy of a Search Engine written by Sergey Brin and Larry Page, the eventual founders of Google.[12] This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link there, on the premise that good or desirable pages are linked to more than others (a toy sketch of the iteration appears at the end of this section). Google also maintained a minimalist interface to its search engine; in contrast, many of its competitors embedded a search engine in a web portal. The Google search engine became so popular, in fact, that spoof engines emerged, such as Mystery Seeker.

By 2000, Yahoo! was providing search services based on Inktomi’s search engine. Yahoo! acquired Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! relied on Google’s search engine until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.

Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early 1999 the site began to display listings from Looksmart, blended with results from Inktomi; for a short time in 1999, MSN Search used results from AltaVista instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot). Microsoft’s rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.
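To make the PageRank idea concrete, here is a minimal sketch of the iteration on a toy three-page link graph, written in Python. The damping factor, graph, and fixed iteration count are illustrative assumptions, not Google’s production algorithm.

```python
# Minimal sketch of PageRank power iteration on a toy link graph
# (hypothetical pages and damping factor; not Google's production code).
damping = 0.85
links = {           # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start with uniform rank

for _ in range(50):  # iterate until the ranks (approximately) converge
    new_rank = {}
    for p in pages:
        # rank flowing into p from every page q that links to p,
        # shared equally among q's outgoing links
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - damping) / len(pages) + damping * incoming
    rank = new_rank

print(rank)  # "C" ends up highest: both "A" and "B" link to it
```

Each pass redistributes rank along the links; after enough passes the values stop changing, and the page with the most (and best-ranked) inbound links ends up ranked highest.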

24.2 How web search engines work

A search engine maintains the following processes in near real time:

1. Web crawling (see the sketch after this list)

2. Indexing

3. Searching[13]
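A toy sketch of the crawling stage may help fix ideas. In the Python sketch below, an in-memory dictionary stands in for HTTP fetches, and all URLs and names are hypothetical; a real spider would fetch pages over HTTP and consult robots.txt, as described next.

```python
# Toy breadth-first crawl over a simulated in-memory "web".
# The `web` dictionary stands in for HTTP fetches; URLs are hypothetical.
from collections import deque

web = {  # url -> (page text, outgoing links)
    "http://example.org/":  ("home page", ["http://example.org/a",
                                           "http://example.org/b"]),
    "http://example.org/a": ("page a",    ["http://example.org/"]),
    "http://example.org/b": ("page b",    []),
}

def crawl(seed):
    """Visit every page reachable from `seed` once; return url -> text."""
    queue, seen, store = deque([seed]), {seed}, {}
    while queue:
        url = queue.popleft()
        text, outlinks = web[url]   # a real spider would fetch over HTTP
                                    # and check robots.txt first
        store[url] = text           # hand the content to the indexer
        for link in outlinks:
            if link not in seen:    # never enqueue a page twice
                seen.add(link)
                queue.append(link)
    return store

print(crawl("http://example.org/"))  # all three pages, each fetched once
```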

Web search engines get their information by web crawling from site to site. The “spider” checks for the standard filename robots.txt, addressed to it, before sending certain information back to be indexed, depending on many factors, such as the titles, page content, JavaScript, Cascading Style Sheets (CSS), and headings, as evidenced by the standard HTML markup of the informational content, or its metadata in HTML meta tags.

Indexing means associating words and other definable tokens found on web pages to their domain names and HTML-based fields (a toy inverted-index sketch follows this passage). The associations are made in a public database, made available for web search queries. A query from a user can be a single word. The index helps find information relating to the query as quickly as possible.[13] Some of the techniques for indexing and caching are trade secrets, whereas web crawling is a straightforward process of visiting all sites on a systematic basis.

Between visits by the spider, the cached version of a page (some or all of the content needed to render it) stored in the search engine’s working memory is quickly sent to an inquirer. If a visit is overdue, the search engine can act as a web proxy instead; in that case the page may differ from the search terms indexed.[13] The cached page holds the appearance of the version whose words were indexed, so a cached version of a page can be useful when the actual page has been lost; this situation is also considered a mild form of linkrot.

Typically, when a user enters a query into a search engine, it is a few keywords.[14] The index already has the names of the sites containing the keywords, and these are instantly obtained from the index. The real processing load is in generating the web pages that make up the search results list: every page in the entire list must be weighted according to information in the indexes.[13] The top search result item then requires the lookup, reconstruction, and markup of the snippets showing the context of the keywords matched. These are only part of the processing each search results web page requires, and further pages (next to the top) require more of this post-processing.

Beyond simple keyword lookups, search engines offer their own GUI- or command-driven operators and search parameters to refine the search results. These provide the necessary controls for the feedback loop users create by filtering and weighting while refining the results, given the initial pages of the first search results.
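Continuing the sketch, the indexing and searching stages can be illustrated the same way: a hypothetical tokenizer fills an inverted index mapping each word to the set of pages that contain it, and a keyword query is answered by intersecting those sets. This is only a minimal illustration of the idea, not any engine’s actual index layout.

```python
# Toy indexing and keyword lookup: an inverted index maps each word
# to the set of pages containing it. Corpus and tokenizer are
# illustrative assumptions.
import re
from collections import defaultdict

pages = {
    "http://example.org/a": "Search engines crawl the web and index pages.",
    "http://example.org/b": "An inverted index maps each word to pages.",
    "http://example.org/c": "Crawlers check robots.txt before fetching pages.",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

index = defaultdict(set)            # word -> set of URLs
for url, text in pages.items():
    for word in tokenize(text):
        index[word].add(url)

def search(query):
    """Return the pages containing every keyword in the query."""
    terms = tokenize(query)
    results = index.get(terms[0], set()).copy() if terms else set()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

print(search("index pages"))        # matches pages a and b
```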

[Figure: High-level architecture of a standard Web crawler. A scheduler feeds URLs from a queue to a multi-threaded downloader; the downloader fetches Web pages from the World Wide Web, sends text and metadata to storage, and returns newly discovered URLs to the queue.]

For example, since 2007 the Google.com search engine has allowed one to filter by date by clicking “Show search tools” in the leftmost column of the initial search results page and then selecting the desired date range.[15] It is also possible to weight by date, because each page has a modification time.

Most search engines support the use of the boolean operators AND, OR and NOT to help end users refine the search query (a set-based sketch of these operators appears at the end of this section). Boolean operators are for literal searches that allow the user to refine and extend the terms of the search; the engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords.[13] There is also concept-based searching, where the research involves using statistical analysis on pages containing the words or phrases you search for. As well, natural language queries allow the user to type a question in the same form one would ask it of a human;[16] a site like this is ask.com.[17]

The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results so as to provide the “best” results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another.[13] The methods also change over time as Internet usage changes and new techniques evolve.

Two main types of search engine have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively; the other is a system that generates an “inverted index” by analyzing the texts it locates. The second form relies much more heavily on the computer itself to do the bulk of the work.

Most web search engines are commercial ventures supported by advertising revenue, and thus some of them allow advertisers to have their listings ranked higher in search results for a fee. Search engines that do not accept money for their search results make money by running search-related ads alongside the regular search engine results. The search engines make money every time someone clicks on one of these ads.[18]
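The boolean operators described above map naturally onto set algebra over an inverted index. The following minimal sketch (with hypothetical index contents) shows AND, OR and NOT as set intersection, union, and complement:

```python
# Boolean operators as set algebra over a toy inverted index.
# The index contents below are illustrative assumptions.
index = {
    "apple":  {"page1", "page2", "page4"},
    "banana": {"page2", "page3"},
    "cherry": {"page3", "page4"},
}
all_pages = {"page1", "page2", "page3", "page4"}

def AND(a, b):  return a & b           # pages matching both terms
def OR(a, b):   return a | b           # pages matching either term
def NOT(a):     return all_pages - a   # pages not matching the term

# "apple AND NOT banana": pages mentioning apple but not banana
print(AND(index["apple"], NOT(index["banana"])))   # {'page1', 'page4'}
```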

24.3 Market share

Google is the world’s most popular search engine, with a market share of 75.97 percent as of December 2016.[19] The world’s most popular search engines (those with more than 1% market share) are:

24.3.1 East Asia and Russia

In some East Asian countries and Russia, Google is not the most popular search engine, since its algorithmic search applies regional filtering and hides most results. Yandex commands a market share of 61.9 percent in Russia, compared to Google’s 28.3 percent.[20] In China, Baidu is the most popular search engine.[21] South Korea’s homegrown search portal, Naver, is used for 70 percent of online searches in the country.[22] Yahoo! Japan and Yahoo! Taiwan are the most popular avenues for internet search in Japan and Taiwan, respectively.[23]

24.4 Search engine bias

Although search engines are programmed to rank websites based on some combination of their popularity and relevancy, empirical studies indicate various political, economic, and social biases in the information they provide[24][25] and in the underlying assumptions about the technology.[26] These biases can be a direct result of economic and commercial processes (e.g., companies that advertise with a search engine can also become more popular in its organic search results) and of political processes (e.g., the removal of search results to comply with local laws).[27] For example, Google will not surface certain Neo-Nazi websites in France and Germany, where Holocaust denial is illegal.

Biases can also be a result of social processes, as search engine algorithms are frequently designed to exclude non-normative viewpoints in favor of more “popular” results.[28] The indexing algorithms of major search engines skew towards coverage of U.S.-based sites rather than websites from non-U.S. countries.[25] Google bombing is one example of an attempt to manipulate search results for political, social or commercial reasons.

Several scholars have studied the cultural changes triggered by search engines,[29] and the representation of certain controversial topics in their results, such as terrorism in Ireland[30] and conspiracy theories.[31]

24.5 Customized results and filter bubbles

Many search engines, such as Google and Bing, provide customized results based on the user’s activity history. This leads to an effect that has been called a filter bubble. The term describes a phenomenon in which websites use algorithms to selectively guess what information a user would like to see, based on information about the user (such as location, past click behaviour and search history). As a result, websites tend to show only information that agrees with the user’s past viewpoint, effectively isolating the user in a bubble that tends to exclude contrary information. Prime examples are Google’s personalized search results and Facebook's personalized news stream.

According to Eli Pariser, who coined the term, users get less exposure to conflicting viewpoints and are isolated intellectually in their own informational bubble. Pariser related an example in which one user searched Google for “BP” and got investment news about British Petroleum, while another searcher got information about the Deepwater Horizon oil spill; the two search results pages were “strikingly different”.[32][33][34] The bubble effect may have negative implications for civic discourse, according to Pariser.[35] Since this problem was identified, competing search engines have emerged that seek to avoid it by not tracking or “bubbling” users, such as DuckDuckGo. Other scholars do not share Pariser’s view, finding the evidence in support of his thesis unconvincing.[36]

24.6 Christian, Islamic and Jewish search engines

The global growth of the Internet and electronic media in the Arab and Muslim world during the last decade has encouraged Islamic adherents in the Middle East and the Asian subcontinent to attempt their own search engines: filtered search portals that enable users to perform safe searches.

Going beyond the usual safe-search filters, these Islamic web portals categorize websites as being either “halal” or “haram”, based on a modern, expert interpretation of Islamic law. I’mHalal came online in September 2011, and Halalgoogling came online in July 2013; these apply haram filters to the collections from Google, Bing, and others.[37]

While a lack of investment and the slow pace of technological development in the Muslim world have hindered progress and thwarted the success of an Islamic search engine targeting Islamic adherents as its main consumers, projects like Muxlim, a Muslim lifestyle site, did receive millions of dollars from investors like Rite Internet Ventures, and it also faltered.

Other religion-oriented search engines are Jewgle, the Jewish version of Google, and SeekFind.org, which is Christian. SeekFind filters sites that attack or degrade their faith.[38]

24.7 Search engine submission

Search engine submission is a process in which a webmaster submits a website directly to a search engine. While submission is sometimes presented as a way to promote a website, it generally is not necessary, because the major search engines use web crawlers that will eventually find most web sites on the Internet without assistance. Webmasters can either submit one web page at a time or submit the entire site using a sitemap, but it is normally only necessary to submit the home page, as search engines are able to crawl a well-designed website. There are two remaining reasons to submit a web site or web page to a search engine: to add an entirely new web site without waiting for a search engine to discover it, and to have a web site’s record updated after a substantial redesign.

Some search engine submission software not only submits websites to multiple search engines, but also adds links to those websites from its own pages. This could appear helpful in increasing a website’s ranking, because external links are one of the most important factors determining a website’s ranking. However, John Mueller of Google has stated that this “can lead to a tremendous number of unnatural links for your site”, with a negative impact on site ranking.[39]

24.8 See also

• Comparison of web search engines
• Information retrieval
• Question answering
• Google effect
• Use of web search engines in libraries
• Semantic Web
• Web development tools
• Search engine manipulation effect

24.9 References

[1] “RFC 812 - NICNAME/WHOIS”. ietf.org.

[2] http://ftp.sunet.se/pub/Internet-documents/matrix/services/KIS-id.txt

[3] “World-Wide Web Servers”. W3.org. Retrieved 2012-05-14.

[4] “What’s New! February 1994”. Home.mcom.com. Retrieved 2012-05-14.

[5] “Internet History - Search Engines” (from Search Engine Watch), Universiteit Leiden, Netherlands, September 2001, web: LeidenU-Archie.

[6] Oscar Nierstrasz (2 September 1993). “Searchable Catalog of WWW Resources (experimental)".

[7] “Archive of NCSA what’s new in December 1993 page”. Web.archive.org. 2001-06-20. Archived from the original on 2001-06-20. Retrieved 2012-05-14.

[8] “Yahoo! And Netscape Ink International Distribution Deal” (PDF)

[9] “Browser Deals Push Netscape Stock Up 7.8%". Los Angeles Times. 1 April 1996

[10] Gandal, Neil (2001). “The dynamics of competition in the internet search engine market”. International Journal of Industrial Organization. 19 (7): 1103–1117. doi:10.1016/S0167-7187(01)00065-0.

[11] “Our History in depth”. W3.org. Retrieved 2012-10-31.

[12] Brin, Sergey; Page, Larry. “The Anatomy of a Large-Scale Hypertextual Web Search Engine” (PDF).

[13] Jawadekar, Waman S (2011), “8. Knowledge Management: Tools and Technology”, Knowledge Management: Text & Cases, New Delhi: Tata McGraw-Hill Education Private Ltd, p. 278, ISBN 978-0-07-07-0086-4, retrieved November 23, 2012

[14] Jansen, B. J., Spink, A., and Saracevic, T. 2000. Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing & Management. 36(2), 207-227.

[15] Chitu, Alex (August 30, 2007). “Easy Way to Find Recent Web Pages”. Google Operating System. Retrieved 22 February 2015.

[16] "Versatile question answering systems: seeing in synthesis", Mittal et al., IJIIDS, 5(2), 119-142, 2011.

[17] http://www.ask.com. Retrieved 10 September 2015.

[18] “FAQ”. RankStar. Retrieved 19 June 2013.

[19] “Desktop Search Engine Market Share”. NetMarketShare. Retrieved 30 December 2016.

[20] “Live Internet - Site Statistics”. Live Internet. Retrieved 2014-06-04.

[21] Arthur, Charles (2014-06-03). “The Chinese technology companies poised to dominate the world”. The Guardian. Retrieved 2014-06-04.

[22] “How Naver Hurts Companies’ Productivity”. The Wall Street Journal. 2014-05-21. Retrieved 2014-06-04.

[23] “Age of Internet Empires”. Oxford Internet Institute. Retrieved 2014-06-04.

[24] Segev, Elad (2010). Google and the Digital Divide: The Biases of Online Knowledge. Oxford: Chandos Publishing.

[25] Vaughan, Liwen; Mike Thelwall (2004). “Search engine coverage bias: evidence and possible causes”. Information Processing & Management. 40 (4): 693–707. doi:10.1016/S0306-4573(03)00063-3.

[26] Jansen, B. J. and Rieh, S. (2010). The Seventeen Theoretical Constructs of Information Searching and Information Retrieval. Journal of the American Society for Information Science and Technology. 61(8), 1517-1534.

[27] Berkman Center for Internet & Society (2002), “Replacement of Google with Alternative Search Systems in China: Documentation and Screen Shots”, Harvard Law School.

[28] Introna, Lucas; Helen Nissenbaum (2000). “Shaping the Web: Why the Politics of Search Engines Matters”. The Information Society: An International Journal. 16 (3). doi:10.1080/01972240050133634.

[29] Hillis, Ken; Petit, Michael; Jarrett, Kylie (2012-10-12). Google and the Culture of Search. Routledge. ISBN 9781136933066.

[30] Reilly, P. (2008-01-01). Spink, Prof Dr Amanda; Zimmer, Michael, eds. ‘Googling’ Terrorists: Are Northern Irish Terrorists Visible on Internet Search Engines?. Information Science and Knowledge Management. Springer Berlin Heidelberg. pp. 151–175. doi:10.1007/978-3-540-75829-7_10. ISBN 978-3-540-75828-0.

[31] Ballatore, A. “Google chemtrails: A methodology to analyze topic representation in search engines”. First Monday.

[32] Parramore, Lynn (10 October 2010). “The Filter Bubble”. The Atlantic. Retrieved 2011-04-20. Since Dec. 4, 2009, Google has been personalized for everyone. So when I had two friends this spring Google “BP,” one of them got a set of links that was about investment opportunities in BP. The other one got information about the oil spill....

[33] Weisberg, Jacob (10 June 2011). “Bubble Trouble: Is Web personalization turning us into solipsistic twits?". Slate. Retrieved 2011-08-15.

[34] Gross, Doug (May 19, 2011). “What the Internet is hiding from you”. CNN. Retrieved 2011-08-15. I had friends Google BP when the oil spill was happening. These are two women who were quite similar in a lot of ways. One got a lot of results about the environmental consequences of what was happening and the spill. The other one just got investment information and nothing about the spill at all.

[35] Zhang, Yuan Cao; Séaghdha, Diarmuid Ó; Quercia, Daniele; Jambor, Tamas (February 2012). “Auralist: Introducing Serendipity into Music Recommendation” (PDF). ACM WSDM.

[36] O'Hara, K. (2014-07-01). “In Worship of an Echo”. IEEE Internet Computing. 18 (4): 79–83. doi:10.1109/MIC.2014.71. ISSN 1089-7801.

[37] “New Islam-approved search engine for Muslims”. News.msn.com. Retrieved 2013-07-11.

[38] “Halalgoogling: Muslims Get Their Own “sin free” Google; Should Christians Have Christian Google? - Christian Blog”. Christian Blog.

[39] Schwartz, Barry (2012-10-29). “Google: Search Engine Submission Services Can Be Harmful”. Search Engine Roundtable. Retrieved 2016-04-04.

24.10 Further reading

• Steve Lawrence; C. Lee Giles (1999). “Accessibility of information on the web”. Nature. 400 (6740): 107–9. doi:10.1038/21987. PMID 10428673.

• Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer. ISBN 3-540-37881-2

• Bar-Ilan, J. (2004). The use of Web search engines in information science research. ARIST, 38, 231-288.
• Levene, Mark (2005). An Introduction to Search Engines and Web Navigation. Pearson.

• Hock, Randolph (2007). The Extreme Searcher’s Handbook. ISBN 978-0-910965-76-7
• Javed Mostafa (February 2005). “Seeking Better Web Searches”. Scientific American.

• Ross, Nancy; Wolfram, Dietmar (2000). “End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine”. Journal of the American Society for Information Science. 51 (10): 949–958. doi:10.1002/1097-4571(2000)51:10<949::AID-ASI70>3.0.CO;2-5.
• Xie, M.; et al. (1998). “Quality dimensions of Internet search engines”. Journal of Information Science. 24 (5): 365–372. doi:10.1177/016555159802400509.
• Information Retrieval: Implementing and Evaluating Search Engines. MIT Press. 2010.

24.11 External links

• Search Engines at DMOZ

24.12 Text and image sources, contributors, and licenses

24.12.1 Text • Information extraction Source: https://en.wikipedia.org/wiki/Information_extraction?oldid=761844477 Contributors: The Anome, Ed- ward, Michael Hardy, Kku, MichaelJanich, Ronz, Geraki, Dbabbitt, Owen, Phil Boswell, Dmolla, Solipsist, Beland, Robertbowerman, Mike Schwartz, Leondz, ScottDavis, Bkkbrad, Gmelli, Intgr, Spencerk, Sfrancoeur, Cedar101, Dongxun~enwiki, SmackBot, Mladi- filozof, JonHarder, Natecull, Will Beback, Kuru, Dreftymac, Searchtools, MaxEnt, Alexander Wilks, JAnDbot, The Transhumanist, David Eppstein, Ronbarak, DomBot, Falazar, Francesco sclano, Nevalicori, Jamelan, HamishCunningham, Sebastjanmm, Gdupont, Alle- borgoBot, Omerod, Icognitiva, Dasterner, Jojalozzo, CharlesGillingham, Disooqi, Bidoll, Plastikspork, Niceguyedc, Dtunkelang, Pixel- Bot, Pablomendes, Ost316, Duffbeerforme, Texterp, Addbot, Belmond, OlEnglish, Incola, Yobot, Fraggle81, Tiffany9027, George1975, Fran.sansalone, SebastianHellmann, Al Maghi, FrescoBot, Mark Renier, Hosszuka, Jandalhandler, Supersun511, Khazakistyle, Bangla11, John of Reading, Animorphus, Lawrykid, DaTribe, Shaalank, ClueBot NG, Lawrence87, Pushpinder12, Astronautguo, Rubengra, DBigXray, BG19bot, Yiyeguhu, Lucyinthesky45, Khazar2, Dexbot, Pintoch, Brandon Bertelsen, Me, Myself, and I are Here, Robyvd, Lisa Beck, Aasasd, Hajasu, Preetansh9, H.dryad, Daniel kenneth, Blane from Cinbar and Anonymous: 88 • Named-entity recognition Source: https://en.wikipedia.org/wiki/Named-entity_recognition?oldid=761065837 Contributors: Paul A, Ronz, Jogloran, Dmolla, T0m, Macrakis, Beland, Powdahound, Echuck215, Leondz, Sandius, Bkkbrad, Apokrif, Simsong, Qwertyus, Rjwilmsi, Feydey, Dmccreary, Spencerk, Msbmsb, RussBot, Rjlabs, Tony1, Cedar101, Gabr~enwiki, Moquist, Ckatz, Chwalker, Cm- drObot, Megannnn, Chrisahn, Cs california, Kevin.cohen, MER-C, Activelink, Cander0000, Jfroelich, Francis Tyers, Ttague, Erikt~enwiki, Pythonner, Davidmakovoz, Rhhender, Synthebot, Legoktm, Chaotix63, Icognitiva, Jojalozzo, Strife911, JBrookeAker, UKoch, Schreiber- Bike, Carriearchdale, Mpawel, Texterp, Addbot, Favonian, Luckas-bot, Yobot, Themfromspace, Ptbotgourou, Sumail, AnomieBOT, Sdmonroe, Brightgalrs, Vuongvina, FrescoBot, Nainawalli, Kwiki, DrilBot, Danyaljj, Mean as custard, EmausBot, Goldwas1, ZéroBot, Iropark, Shaalank, ClueBot NG, Karkand23, Ngocminh.oss, Ae David, BG19bot, Compfreak7, Ratinov, ChrisGualtieri, TextMech, Pe- tecog, Melonkelon, Ocky7, Monkbot, Fake ones, Brokkolie, Iwasaki hirofumi, Hughesonline, Lemborio, ChrisManning and Anonymous: 101 • Part-of-speech tagging Source: https://en.wikipedia.org/wiki/Part-of-speech_tagging?oldid=756514674 Contributors: Stevertigo, Michael Hardy, Kku, Dino, Furrykef, Arkuat, Babbage, BenFrantzDale, Ds13, Khalid hassani, Dfrankow, Beland, Cagri, Venu62, Vacindak, 4pq1injbok, Rama, Cmdrjameson, Grutness, Facopad, Woohookitty, Mumpitz~enwiki, FBarber, Marudubshinki, Qwertyus, Koavf, Dmccreary, Hermione1980, Brendan642, Sderose, Spencerk, Msbmsb, Wavelength, Taejo, Philopedia, Ritchy, Bkil, Thnidu, Closed- mouth, Kostmo, JorisvS, IvanLanin, Cydebot, Skittleys, Cs california, Qwyrxian, Handicapper, PhilKnight, Magioladitis, Brett, Mar- tinBot, R'n'B, Francis Tyers, Katalaveno, AntiSpamBot, Serge925, Bonadea, Soshial, Sandman2007, TXiKiBoT, Paladin Artix, Tlieu, Enviroboy, RaseaC, Kehrbykid, Legoktm, Matt Gerber, SieBot, Poi dog pondering, AlanUS, CharlesGillingham, NastalgicCam, ClueBot, Mild Bill Hiccup, UKoch, Goodvac, DumZiBoT, Addbot, Fluffernutter, Yobot, TaBOT-zerem, Legobot II, THEN WHO WAS PHONE?, 
AnomieBOT, DemocraticLuntz, Sdmonroe, Jim1138, Maxis ftw, Xqbot, Farazv, 10metreh, EmausBot, Yatsko, LWG, Mozzy66, Elaz85, ClueBot NG, Kasirbot, Granadajose, Karkand23, Verhoevenben, BG19bot, Soheila 3155, Chmarkine, Davebs, Murhaff, ChrisGualtieri, YFdyh-bot, Me, Myself, and I are Here, Terrance26, Loraof, LingLass and Anonymous: 87 • Phrase chunking Source: https://en.wikipedia.org/wiki/Phrase_chunking?oldid=574661004 Contributors: Thorwald, Spencerk, Wave- length, SmackBot, Francesco sclano, Legoktm, CharlesGillingham, Dana boomer, Lam Kin Keung and Anonymous: 2 • Relationship extraction Source: https://en.wikipedia.org/wiki/Relationship_extraction?oldid=723833388 Contributors: Kku, Scott, Sandius, Gmelli, RussBot, Breno, Legoktm, SimonTrew, Rcartic, DumZiBoT, DOI bot, Yobot, Fortdj33, Rlistou, Citation bot 1, RjwilmsiBot, Dcirovic and Anonymous: 4 • Sentence boundary disambiguation Source: https://en.wikipedia.org/wiki/Sentence_boundary_disambiguation?oldid=751822675 Con- tributors: Benwing, DragonflySixtyseven, TreveX, BD2412, Sderose, Spencerk, Cedar101, MER-C, Dlwh, Musically ut, Truthanado, Legoktm, Ve4ernik, Niceguyedc, AnomieBOT, Unready, Dcirovic, Shaddim, Karkand23, BonzaiThePenguin, Flinter and Anonymous: 10 • Shallow parsing Source: https://en.wikipedia.org/wiki/Shallow_parsing?oldid=721226487 Contributors: Michael Hardy, Kku, Ronz, Charles Matthews, Grm wnr, Rama, AdamAtlas, Jonsafari, Ish ishwar, Banazir, Spencerk, SmackBot, Uthbrian, Aranduil, Alaibot, Head- bomb, Francis Tyers, Jamelan, Legoktm, Niceguyedc, Addbot, Yobot, GrouchoBot, Piotrks, AJCham, Erik9bot, Jonesey95, RjwilmsiBot, Goldwas1, Karkand23, Ngocminh.oss, Terrance26, Robo-Kyon, Raidex.sym, Rilinger and Anonymous: 11 • Stemming Source: https://en.wikipedia.org/wiki/Stemming?oldid=763558125 Contributors: Maury Markowitz, Mrwojo, Edward, Michael Hardy, Zeno Gantner, Nohat, Altenmann, Babbage, Stewartadcock, KellyCoinGuy, Diberri, Macrakis, Gdm, Beland, Urhixidur, Kurisu, Rich Farmbrough, ESkog, Kwamikagami, Aaronbrick, Spalding, Jonsafari, Shabble, Gothick, Ruud Koot, Byronknoll, Gwil, Qwer- tyus, Rjwilmsi, Salix alba, Mazzmn, Nihiltres, Mendicott, Fmccown, 2over0, SmackBot, Moralis, Sundaryourfriend, Ewok Slayer, Mirokado, Acmeacme, Salamurai, Nemonemo~enwiki, CapitalR, ILikeThings, CRGreathouse, Searchtools, Cs california, Malleus Fatuo- rum, Thijs!bot, Plausible deniability, Vtcondo, KP Botany, Alphachimpbot, JAnDbot, David Eppstein, Jim Carnicelli, R'n'B, Jfroelich, TottyBot, ChesMartin, Maghnus, Legoktm, AlanUS, Disooqi, Vacio, Mild Bill Hiccup, Ray3055, Xodarap00, Addbot, Lon of Oakdale, Halloleo, Lightbot, Teles, Moderngirllive, Luckas-bot, Yobot, AnomieBOT, Xqbot, Kracekumar, FrescoBot, Felix.middendorf, Dian- naa, Luismsgomes, Uanfala, EmausBot, John of Reading, Yatsko, Donner60, Sotnyk, ClueBot NG, Stevenxlead, Frietjes, DBigXray, Doszkocs, BG19bot, Tensorylabs, BattyBot, Khazar2, Xmu YHLiu, Lade271, Faizan, SpaceScape, Cato The Censor, Nahden, Fj2c, JonaathanKatz and Anonymous: 81 • Text segmentation Source: https://en.wikipedia.org/wiki/Text_segmentation?oldid=744684004 Contributors: Fnielsen, BenKovitz, Bab- bage, Jorge Stolfi, Scode, Serapio, Querent, Statusquo, Leondz, Woohookitty, Ruud Koot, BD2412, Rjwilmsi, Thangalin, Spencerk, Trondtr, Daniel Mietchen, Tony1, Sandwich, SmackBot, Took, Nbarth, Whomp, IvanLanin, Jausel, Alexamies, Alaibot, Cs cali- fornia, David Eppstein, Soshial, Rei-bot, Jamelan, Cnilep, Legoktm, Mark l watson, Niceguyedc, DragonBot, Leontios, PixelBot, Addbot, Quercus 
solaris, Jarble, Legobot II, , VernoWhitney, Born2bgratis, Helpful Pixie Bot, Hieukieng~enwiki, Impsswoon, Grantmjenks, Metasyn and Anonymous: 18 • Tokenization (lexical analysis) Source: https://en.wikipedia.org/wiki/Tokenization_(lexical_analysis)?oldid=759379009 Contributors: Beorhast, Alvestrand, Rich Farmbrough, Foolip, Sleske, Leondz, Sderose, Mahahahaneapneap, Malcolma, Laser2k, King Mir, Ulfmatts- son, Maghnus, DonBarredora, Legoktm, BearMachine, Chininazu12, Addbot, Luckas-bot, Yobot, AnomieBOT, Artnowo, Suasysar, Ben- zolBot, LittleWink, Yacht Travler, Wei2912, BG19bot, Fadirra, ChrisGualtieri, StevenRRusso, Yanis ahmed, Impsswoon, Hughesonline, Yijisoo and Anonymous: 22 24.12. TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES 121

• Parsing Source: https://en.wikipedia.org/wiki/Parsing?oldid=759112319 Contributors: Damian Yerrick, Vicki Rosenzweig, The Anome, -Vt-aoe, Jaredwf, Shoesfullofdust, Mu ,דוד ,K.lee, Michael Hardy, TakuyaMurata, CesarB, Ahoerstemeier, Mac, Glenn, Ralesk, Furrykef lukhiyya, Martin Hampl~enwiki, Pps, Tea2min, Giftlite, Tom harrison, Dmb000006, Jason Quinn, Macrakis, Neilc, Gadfium, Beland, MarkSweep, Billposer, Irrelevant, Paulbmann, Zeman, Sylvain Schmitz~enwiki, Spayrard, Liberatus, Richard W.M. Jones, Bobo192, Slomo~enwiki, Larryv, Zetawoof, Jonsafari, Obradovic Goran, Alansohn, Liao, Seans Potato Business, Chrisjohnson, Pekinensis, Ruud Koot, Slike2, Knuckles, Wikiklrsc, BD2412, Knudvaneeden, Nekobasu, MarSch, Jimworm, Salix alba, RobertG, Gurch, Quuxplusone, GreyCat, BradBeattie, Chobot, Hairy Dude, Arado, Epolk, Van der Hoorn, Yrithinnd, Mccready, Jstrater, Mikeblas, Maunus, Cadillac, Pietdesomere, Modify, TuukkaH, SmackBot, NickyMcLean, Sebesta, Gilliam, Chris the speller, Nbarth, Sephiroth BCR, Fchaumartin, Nixeagle, Sommers, SundarBot, Flyguy649, Downwards, Andrei Stroe, Derek farn, John, HVS, Ourai, SilkTork, Catapult, IronGargoyle, Pelzi~enwiki, 16@r, Rkmlai, Alatius, MrDolomite, Clarityfiend, Paul Foxworthy, RekishiEJ, Vanisaac, FatalError, Ahy1, E-boy, Devin- Cook, MarsRover, HenkeB, Myasuda, Cydebot, Valodzka, Agentilini, Farzaneh, Msnicki, QTJ, 218, Zalgo, Thijs!bot, JEBrown87544, AntiVandalBot, Seaphoto, FedericoMP, Natelewis, Hermel, JAnDbot, MER-C, Gavia immer, AlmostReadytoFly, Tedickey, Usien6, Aakin, Cic, Bubba hotep, CapnPrep, Nicsterr, Scruffy323, Nathanfunk, Mmh, Dantek, W3stfa11, Magicjazz, Cometstyles, Kjmjds, DorganBot, Brvman, CardinalDan, Vipinhari, Pulsar.co.nr, Technopat, Sylvia.wei, Lbmarshall, DonBarredora, SieBot, YoCmos~enwiki, Timhowardriley, Til Eulenspiegel, Flyer22 Reborn, Diego Grez-Cañete, AlanUS, FghIJklm, Gabilaw, Denisarona, Stokito, Benoît Sagot, Thisisnotapipe, DragonBot, Estirabot, Nikolasmorton, Roxy the dog, AngelHaf, Libcub, Addbot, Download, Dougbast, OlEnglish, Jar- ble, TaBOT-zerem, Wonderfl, LucidFool, AnomieBOT, Jim1138, Пика Пика, Materialscientist, Citation bot, Xtremejames183, Xqbot, St.nerol, J04n, WordsAndNumbers, Thehelpfulbot, FrescoBot, Borsotti, OgreBot, Robinlrandall, I dream of horses, HRoestBot, Romek1, Lotje, Callanecc, Ruebencampbell, Spakin, RjwilmsiBot, GDBarry, Garfieldnate, J36miles, EmausBot, Oliverlyc, WikitanvirBot, Eek- erz, ZxxZxxZ, Hpvpp, Sirthias, Ashowen1701, Peterh5322, Jeroendv, Donner60, MainFrame, Sebbes333, ChuispastonBot, ClueBot NG, Satellizer, Jeana7, Frietjes, DBSand, MerlIwBot, Leonxlin, Architectual, Claytoncarney, ChrisGualtieri, Steamerandy, Pintoch, Samlan- ning, JohnZofSydney, Watchforever, YarLucebith, Romavikt, AddWittyNameHere, Akshaynexus, Rooke, Aasasd, Equinox, ProprioMe OW, Wrath abyss, Imadeluz, Bender the Bot and Anonymous: 244 • Parse tree Source: https://en.wikipedia.org/wiki/Parse_tree?oldid=759757439 Contributors: Bryan Derksen, The Anome, Smelialichu, Cadr, Emperorbma, Dysprosia, Fredrik, Pps, Ruakh, Beorhast, Tamur, Giftlite, Falstaft, BenFrantzDale, Beland, Ganymead, Mathi- asl26, Spayrard, Zscout370, Jonsafari, Alansohn, RJFJR, Angr, LOL, Qwertyus, Chobot, YurikBot, Wavelength, RussBot, David Pierce, Modify, Frigoris, Donhalcon, TuukkaH, Frap, Ioscius, Dono, BrainMagMo, Jafet, FatalError, Cydebot, Egriffin, Stannered, Wootery, Ryan Postlethwaite, Alan U. 
Kennington, VanishedUserABC, AlleborgoBot, EmxBot, AHMartin, BotMultichill, Flyer22 Reborn, Bgal- itsky, OKBot, Denisarona, RMFan1, Addbot, Yakiv Gluck, JakobVoss, Luckas-bot, AnomieBOT, Arjun G. Menon, The High Fin Sperm Whale, SassoBot, Fetchmaster, LucienBOT, Tjo3ya, Extra999, Arabismo, EmausBot, Bollyjeff, Chris857, ClueBot NG, Joefromrandb, ChrisGualtieri, Steamerandy, Jochen Burghardt, Theo’s Little Bot, François Robere, Rtran, Buffbills7701, W. P. Uzer, Kimberly.Ling300, Jenny129, Cherisec, Wafa Al-Ali, Johnathan jones, Some Gadget Geek, Bender the Bot and Anonymous: 58 • Constituent (linguistics) Source: https://en.wikipedia.org/wiki/Constituent_(linguistics)?oldid=713560228 Contributors: Fransvannes, Ruakh, GreatWhiteNortherner, Andycjp, Discospinster, Woohookitty, TaivoLinguist, Polina Khabina, Dorothea~enwiki, YurikBot, Russ- Bot, DanMS, Donald Albury, Antonielly, Doug Weller, Garik, Jsteph, Thijs!bot, JustAGal, Comhreir, Wizymon, Anaxial, Dbraasch, VolkovBot, Toddy1, Ddxc, Alexbot, Darkicebot, Addbot, LaaknorBot, Zorrobot, Zien3, Luckas-bot, Yobot, Fraggle81, 4th-otaku, AnomieBOT, Rjanag, 90 Auto, MauritsBot, HRoestBot, Tjo3ya, Skomakar'n, EmausBot, Socialservice, ClueBot NG, Pacerier, Stjep, Dnmaxwell, Sykling, YiFeiBot, W. P. Uzer, Chelseanne and Anonymous: 30 • Dependency grammar Source: https://en.wikipedia.org/wiki/Dependency_grammar?oldid=743621571 Contributors: Michael Hardy, Peak, Waltpohl, Jason Quinn, Jonsafari, Linas, Uncle G, Pitan, Chobot, YurikBot, Tony1, Bbenzon, Trickstar, Gilliam, RichardHudson, Byelf2007, U-571, Chenli, Alaibot, JamesAM, Informatician, MezzoMezzo, Linguistlist, Ddxc, UKoch, MystBot, Addbot, Metavivo, Yobot, Pcap, KamikazeBot, AnomieBOT, JackieBot, The Wiki ghost, Attardi, FrescoBot, Tjo3ya, Kielbasa1, Arabismo, Ripchip Bot, EfGee, Cmfraser, John of Reading, Qsdfqsdf, Zuky79, ClueBot NG, Fortelle65, Frietjes, BG19bot, Pacerier, Whym, Dnmaxwell, Erans- gran, GabeIglesia, MaiyaH78, Jamesmcmahon0, Bohnetbd, Dough34, Christian Nassif-Haynes, Odysseus71, Malves98, Sunmist, JMP EAX, Engulfing and Anonymous: 33 • Phrase structure grammar Source: https://en.wikipedia.org/wiki/Phrase_structure_grammar?oldid=733503443 Contributors: Domi- nus, Pnm, Rp, Kku, Cadr, Burschik, Saccade, Dr Zen, Linas, SmackBot, Eskimbot, Gregbard, Alaibot, Vantelimus, Nxavar, Ganna24, MenoBot, Libcub, Addbot, AlexandrDmitri, Luckas-bot, Yobot, Rubinbot, Oliverbeatson, Erik9bot, Tjo3ya, Kyoakoa, ChrisGualtieri, Mjshusain, Some Gadget Geek and Anonymous: 7 • Verb phrase Source: https://en.wikipedia.org/wiki/Verb_phrase?oldid=763003916 Contributors: Waveguy, AdamRaizen, Angela, Cadr, Dduck, Tallus, Ruakh, Beland, OverlordQ, Burschik, Szyslak, Bobo192, TACD, Alansohn, Thryduulf, Nivix, Malhonen, DTOx, Cac- tus.man, KEJ, Ste1n, PaulGarner, Jcvamp, Wotnarg, Nikkimaria, Pb30, SmackBot, Mm100100, Gilliam, Mazeface, Haplology, Sjock, Ergative rlt, A. 
Parrot, Courcelles, FilipeS, Alaibot, JamesAM, Thijs!bot, Epbr123, Jobber, Bobblehead, Widefox, PhilKnight, Kir- rages, Yehuda Falk, Vokaler, Learningnerd, CapnPrep, R'n'B, Toon05, Juliancolton, RJASE1, VolkovBot, Fences and windows, Ma- linaccier, Hqb, Anna Lincoln, Dendodge, Pishogue, BigDunc, Koldito, Logan, Jauerback, Keilana, BenoniBot~enwiki, Mygerardro- mance, Xiaq, Atif.t2, ClueBot, SuperHamster, GoRight, Estirabot, Thingg, Aitias, SoxBot III, Vinceducut, Jbeans, Addbot, Cuaxdon, Glane23, West.andrew.g, Tide rolls, Luckas-bot, Rjanag, Materialscientist, Cureden, RibotBOT, The Wiki ghost, Ebalder, Griffinofwales, MarkkuP, HRoestBot, Tjo3ya, Richsiffer, Reach Out to the Truth, Shabidoo, Tommy2010, K6ka, Jordantrew, Basketball4998, Osh- iokhaienega, Sonicyouth86, ClueBot NG, Sitka1000, MusikAnimal, Clavaine, Victor Yus, Katealli and Anonymous: 165 • Information retrieval Source: https://en.wikipedia.org/wiki/Information_retrieval?oldid=763392430 Contributors: The Anome, LA2, Marian, Michael Hardy, Kku, Ronz, Notheruser, Nichtich~enwiki, Hike395, Charles Matthews, Nickg, Greenrd, Silvonen, DJ Clay- worth, Espertus, Ggrefenstette, AaronSw, Robbot, Altenmann, Psychonaut, Dmolla, Masao, Smb1001, Enochlau, Giftlite, Christopher Parham, Sepreece, Andris, AlistairMcMillan, Macrakis, SWAdair, Worldguy~enwiki, Decoy, Utcursch, Pgan002, Beland, MarkSweep, ChaTo, Urhixidur, Rich Farmbrough, Rama, Kaisershatner, Flyskippy1, Serapio, Wikinaut, Themindset, Jonsafari, Mdd, Kessler, Stephen Turner, Nkour, MIT Trekkie, Dominik Kuropka~enwiki, Ceyockey, Oleg Alexandrov, Linas, Apokrif, Burkhard~enwiki, Male1979, KKramer~enwiki, Stoni, Graham87, BD2412, Qwertyus, LanguageMan, Rjwilmsi, Gmelli, KYPark, Runarb, Intgr, Sderose, Bmicomp, Planetneutral, Chobot, Vmenkov, Msbmsb, YurikBot, Wavelength, Borgx, Laurentius, Waitak, Appler~enwiki, Wimt, Fmccown, Modify, GraemeL, Allens, Tobi Kellner, NeilN, Marregui, DSiv, That Guy, From That Show!, Lwives~enwiki, Chrissi~enwiki, Unyoyega, Eskim- bot, Pfaff9, London25, Bluebot, JackyR, EncMstr, Nbarth, Srchvrs, AntiVan, Gabr~enwiki, JonHarder, Cache22, Vina-iwbot~enwiki, Spiritia, NewTestLeper79, ThomasHofmann, Accurizer, Ckatz, Clark Mobarry, Packerliu, RichardF, SimonD, B7T, GerryWolff, Cm- drObot, Tamarkot, Indigo1300, Myasuda, Krauss, Evenmadderjon, Thijs!bot, Andyjsmith, CharlotteWebb, Niduzzi, AnAj, LazyEditor, 122 CHAPTER 24. WEB SEARCH ENGINE

Clamster5, Barek, The Transhumanist, Sanchom, Ph.eyes, Herr blaschke, Magioladitis, Anþony, Buettcher, U608854, Jodi.a.schneider, Gwern, MartinBot, R'n'B, Jfroelich, Thirdright, Rbrewer42, Mderijke, Theo Mark, Yannick56, Textminer, Falazar, AKA MBG, Neil Dodgson, Dominich01, Funandtrvl, ShahChirag, VolkovBot, Rodrigoluk, Mjbinfo, Bobareann, Drrprasath, Sebastjanmm, Gdupont, PhysPhD, Wavehunter, AlleborgoBot, SieBot, Gorpik, Sonermanc, Tiptoety, Artod, Maynelaw, Disooqi, Pinkadelica, Vanished user qkqknjitkcse45u3, WakingLili, Shodanium, UKoch, Dtunkelang, Erahana, Ray3055, Glendac, Gavinsam1994, Hiemstra, Rainmannn, Drmeier8, Armando49, Johnuniq, Puvar, Boleyn, XLinkBot, Fastily, Chickensquare, SilvonenBot, PL290, DOI bot, Fgnievinski, St73ir, Hdez.maria~enwiki, Erhard002, Tanhabot, OZJ, Josevellezcaldas, MrOllie, Favonian, Torla42, Prashantmore 1, Zorrobot, Johnchal- lis, Yobot, WikiDan61, Ptbotgourou, Anypodetos, AnomieBOT, Rodrigobartels, Ciphers, Citation bot, Devantheryv, Awesomeness, Xqbot, StuffyProf, Ameliablue, Vuongvina, Aragor~enwiki, Rami ghorab, LatentDrK, FrescoBot, Spadarabdon, Trimaine, Mark Renier, Hosszuka, X7q, Moiencore, Citation bot 1, PrincessofLlyr, ErinM, C messier, Eupraxis, Aoidh, Schubi87, Gregman2, Mean as cus- tard, RjwilmsiBot, TigerHokieFan, Helwr, EmausBot, Baseball1015, John of Reading, Zollerriia, Primefac, Riclas, Custard Pie Tarlet, Slightsmile, Pacung, Summertime30, Sue Myburgh, Erianna, Pintaio, ClueBot NG, Marek.rei, Helpful Pixie Bot, Arraycom, Doszkocs, Nigel V Thomas, Eidenberger, Nprieve, ChrisGualtieri, Helensol, TwoTwoHello, Abtin.zo, Hawlkeye1997, Me, Myself, and I are Here, Phamnhatkhanh, Michipedian, Benjamin Großmann, Param Mudgal, FooCow, Somipam r shimray, Kandreyev, MRD2014, Cynulliad, Deepakagrawal075, KasparBot, CAPTAIN RAJU, Akoutsou77, Polm23, Researcher9999, Theodorelaporie, Udomxsor and Anonymous: 279 • Vector space model Source: https://en.wikipedia.org/wiki/Vector_space_model?oldid=744792705 Contributors: Michael Hardy, Kku, Dcljr, Stan Shebs, Jitse Niesen, Gdm, Beland, Thorwald, Rama, Mykhal, ESkog, Aaronbrick, .:Ajvol:., Jonsafari, Gary, Dominik Kuropka~enwiki, Bjh~enwiki, Ruud Koot, GregorB, Qwertyus, LanguageMan, Rjwilmsi, Gmelli, YurikBot, Conscious, Fmccown, Mike Dillon, SmackBot, MalafayaBot, Morecore~enwiki, JohnWhitlock, Stiang, Hankat, Padvi~enwiki, Thijs!bot, Oliver202, Remaire, AnAj, SamatJain, Jone- merson, Destynova, Ezani, Unkx80, Cometstyles, Dominich01, VolkovBot, Amroamroamro, Philip Trueman, InformationSpace, Synthe- bot, Luc.denys, Disooqi, Maxalbanese, Dspattison, UKoch, PixelBot, Sir Tobek, Dwiddows, XLinkBot, Addbot, Favonian, Vuongvina, LatentDrK, Hyju, Suffusion of Yellow, TigerHokieFan, Riclas, Boraas, ZéroBot, Donner60, Tbear1234, PenelopeKit, Justincheng12345- bot, Lxcythian, Biogeographist, SergioJimenez, Alenrooni, Vítor and Anonymous: 54 • Tf–idf Source: https://en.wikipedia.org/wiki/Tf%E2%80%93idf?oldid=762551426 Contributors: Damian Yerrick, AxelBoldt, Fnielsen, Kku, Dcoetzee, Greenrd, Topbanana, Metasquares, Psychonaut, Beorhast, Schmmd, Sepreece, Beland, Sam Hocevar, Urhixidur, Thor- wald, Rich Farmbrough, Rama, Syp, Mkosmul, Araste, Jasonzhuocn, .:Ajvol:., Physicistjedi, Jonsafari, Pearle, CyberSkull, Rickyp, Ruud Koot, Triddle, GregorB, Qwertyus, Nat5an, Rjwilmsi, Jehochman, Winterstein, Sderose, RussBot, Gareth Jones, Mugunth Ku- mar, Fmccown, Thnidu, Jingjun, Cedar101, Mcld, Bluebot, Alexdow, Colonies Chris, Talia ali, P199, Woodshed, Only2sea, Farzaneh, Rkrish67, Yellowdesk, Leedude, 
Pax:Vobiscum, Absurdburger, Unkx80, Ranboii, Jsundram, RichardSocher~enwiki, Amroamroamro, VVVBot, K-nakayama, Disooqi, Melcombe, DonAByrd, Pkalmar, Dsimic, Addbot, DOI bot, Josevellezcaldas, MrOllie, Halloleo, Ebban- dari, O76923, Yobot, SwisterTwister, Eric-Wester, AnomieBOT, Xqbot, GrouchoBot, Tarantulae, Kyng, LatentDrK, Scott A Herbert, Ndudeja, Citation bot 1, Rickyphyllis, Kmels, Thái Nhi, Ursula Huggenbichler, Dinamik-bot, ThaddeusB-public, Ripchip Bot, Kien- jakenobi, Dixtosa, Cskudzu, Tashuhka, Yatsko, Julienhamonic, EdwardLas, Mjbmrbot, Integr8e, ClueBot NG, Mataglap, DrDooBig, Pankajb64, Rezabot, Helpful Pixie Bot, Chafe66, Intervallic, Ibid17, Saturdayswiki, Dexbot, Yissel Espinosa, Yinlongzhao, Monkbot, Kcgoo, CosineP, Velvel2, Xiaoming online, Leopeng1995, Kaleida, Svgspnr, Fmadd, Kakkeshyor and Anonymous: 102 • Synonym Source: https://en.wikipedia.org/wiki/Synonym?oldid=764129490 Contributors: XJaM, Ortolan88, Ben-Zin~enwiki, Dieter Simon, Jaknouse, Stevertigo, DennisDaniels, Patrick, RTC, Michael Hardy, GTBacchus, Cyp, Mac, TUF-KAT, Jebba, Александър, Glenn, Nikai, Raven in Orbit, Hashar, Nohat, Hydnjo, Haukurth, Paul-L~enwiki, Shizhao, Robbot, Psmith, Halthecomputer, Academic Challenger, Borislav, Adam78, Marc Venot, Sethoeph, Aphaia, NeoJustin, Mboverload, Khalid hassani, Jackol, Alexf, Wleman, Gdr, Noe, JoJan, Icairns, Tail, Burschik, Wyllium, Trevor MacInnis, Chepry, Discospinster, Rich Farmbrough, KillerChihuahua, Bender235, Bobo192, Circeus, Richi, Greenleaf~enwiki, Numerousfalx, Nsaa, Jumbuck, Alansohn, Blahma, Duffman~enwiki, AzaToth, Lightdark- ness, Bart133, Velella, Ringbang, Toby D, Abanima, Camw, Tbc2, Macaddct1984, Gimboid13, Stefanomione, HappyCamper, Dou- bleBlue, FlaBot, Ian Pitchford, Alphachimp, Chobot, Deyyaz, Bgwhite, Roboto de Ajvol, YurikBot, RobotE, Lissoy, Stephenb, Mike Young, Dysmorodrepanis~enwiki, Wiki alf, Haoie, Moe Epsilon, Nescio, Siyavash, ArielGold, GrinBot~enwiki, DVD R W, Sintonak.X, SmackBot, Brya, Prodego, Hydrogen Iodide, Bomac, Jacek Kendysz, EncycloPetey, Ricadus, Xaosflux, Gilliam, Ohnoitsjamie, Keegan, LinguistAtLarge, MalafayaBot, Gracenotes, Jahiegel, Crboyer, Evlekis, SofieElisBexter, SashatoBot, Valfontis, Kuru, Mr.K., DIEGO RICARDO PEREIRA, Ckatz, 16@r, InedibleHulk, Nehrams2020, Tawkerbot2, Jh12, Rouseaubade, Cydebot, Eu.stefan, Fifo, Naudefj, Chrislk02, Jalen~enwiki, Zalgo, Epbr123, Olahus, HappyInGeneral, James086, TXiKi, Whoda, AntiVandalBot, Hjherbert~enwiki, Luna Santin, Cchhrriiss, Nancy Vandal, Dreaded Walrus, JAnDbot, Plantsurfer, Andonic, Hut 8.5, Mladen.adamovic, Yahel Guhan, James- BWatson, Singularity, Studios, Lošmi, Bugtrio, Vssun, DerHexer, MartinBot, Arjun01, Anaxial, Hasanisawi, J.delanoy, Pharaoh of the Wizards, Kimse, Trusilver, Belovedfreak, Johnmccrae, NewEnglandYankee, Shoessss, Bonadea, SoCalSuperEagle, 28bytes, Tolone, ABF, Satani, Locamomof5, Philip Trueman, TXiKiBoT, Zidonuke, Asarlaí, Drake Redcrest, Saber girl08, Qxz, Anna Lincoln, Atelaes, RandomXYZb, Rjgodoy, Wolfrock, Carinemily, Synthebot, RaseaC, WatermelonPotion, Weirdalfan1, Newbyguesses, Regregex, Dan Polansky, SieBot, BotMultichill, Gerakibot, Yintan, Jessdingding, Georgi87, Chridd, Rhanyeia, Allmightyduck, Oxymoron83, Antonio Lopez, Nuttycoconut, Crisis, Techman224, Seaniedan, ClueBot, The Thing That Should Not Be, Mild Bill Hiccup, Alexbot, Razor- flame, INTERSTREAMER, Otr500, XLinkBot, SilvonenBot, MystBot, Addbot, Proofreader77, Vakeger~enwiki, Basilicofresco, Willk- ing1979, Lofty2, Betterusername, Download, LaaknorBot, Favonian, 
Quercus solaris, Tide rolls, BrianKnez, Nguyễn Thanh Quang, JackieMoon2, Luckas-bot, KamikazeBot, Las vegas12, Synchronism, AnomieBOT, Sonia, Jim1138, Neptune5000, Glenfarclas, Materi- alscientist, Alexsheksna, Neurolysis, Xqbot, Sionus, Addihockey10, Tomdo08, Omnipaedista, Backpackadam, Mark Schierbecker, Ribot- BOT, Wikieditor1988, Shadowjams, Sesu Prime, 13alexander, LucienBOT, Paine Ellsworth, Wikieditor754, Jamesooders, Pinethicket, DARTH SIDIOUS 2, EmausBot, RA0808, K6ka, Fæ, EdEColbert, Kilopi, Sahim, Donner60, Chuck3r, ClueBot NG, Gareth Griffith- Jones, O.Koslowski, Alexhangartner, Bear030702, Widr, Rkrgwergto, Trans2011, BG19bot, Murphyc65, Hashem sfarim, Bolatbek, Sylvain.maurin, MusikAnimal, Davidiad, Altaïr, Snow Blizzard, YVSREDDY, Verbcatcher, Amitswarup, David.moreno72, The Illu- sive Man, EuroCarGT, JYBot, Cwobeel, Lugia2453, Frosty, Fox2k11, Trollerboi203, Cadillac000, Faizan, Epicgenius, Caveman12, F12X21, Talkjohn, BDawgonnit, Supriya Desai, Everymorning, DavidLeighEllis, Ugog Nizdast, Quenhitran, Jianhui67, AddWittyName- Here, Kwicbaez, Ilopez0000, JaconaFrere, Sherrond28, Scarbom2014, Thewickedkid, KH-1, Silentkhajiit, Hhhhhhhhjjjjjjjj, Supdiop, KasparBot, CLCStudent, Fuortu, Shaneicemaldonado, Es1326, Dusade, Whynot99, WU TAN CLAN FAN and Anonymous: 438 • Relevance Source: https://en.wikipedia.org/wiki/Relevance?oldid=755224399 Contributors: Edward, Ihcoyc, Ahoerstemeier, Scott, Charles Matthews, Hyacinth, Metasquares, Pingveno, Micru, Macrakis, Lucidish, Rich Farmbrough, Pmsyyz, Aecis, EmilJ, Stesmo, Smalljim, Foobaz, Adrian~enwiki, SpeedyGonsales, PWilkinson, Runner1928, John Quiggin, RainbowOfLight, Brookie, Tabletop, Magister Math- ematicae, BD2412, Tommy Kronkvist, FlaBot, Nihiltres, YurikBot, Hairy Dude, RL0919, Roger Lindsay, Paul Erik, GrinBot~enwiki, 24.12. TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES 123

SmackBot, McGeddon, WillAndrews, Gilliam, Silly rabbit, Rklawton, JorisvS, Physis, Dreftymac, Mpoulshock, Megatronium, Gregbard, Themightyquill, Thijs!bot, AntiVandalBot, JAnDbot, Ecurrey, Arno Matthias, Father Goose, Cpl Syx, Mcfar54, Dan Pelleg, MartinBot, Arjun01, AstroHurricane001, Yonidebot, SimDarthMaul, Vranak, Zmnsr1, Fences and windows, Nedelisky, Michaeldsuarez, Maran- lar, Neparis, Flyer22 Reborn, JSpung, Mr. Stradivarius, ClueBot, Mike Klaassen, Blanchardb, RenamedUser jaskldjslak903, Awick- ert, Excirial, Jusdafax, PixelBot, BirgerH, Rebele, BarretB, Noctibus, Gunnex, Addbot, Ezekiel 7:19, Ccacsmss, West.andrew.g, Tide rolls, Lightbot, OlEnglish, Zorrobot, Jarble, Luckas-bot, Yobot, THEN WHO WAS PHONE?, AnomieBOT, Jim1138, Materialscien- tist, ArthurBot, Shadowjams, Wissling, Pinethicket, Lotje, Reach Out to the Truth, John of Reading, Tommy2010, Stefania75~enwiki, Bertman3, L Kensington, ClueBot NG, Anmccaff, Fauzan, MerlIwBot, Helpful Pixie Bot, HMSSolent, Leonxlin, Marcocapelle, Brian- condron, Sriharsh1234, New worl, WikiEnthusiastNumberTwenty-Two, Grey.dreyk, Qwertyxp2000, QKsu, XLSXANDER24, Layla, the remover, Lilybaizer, John “Hannibal” Smith, Bender the Bot, -- and Anonymous: 102 • Library and information science Source: https://en.wikipedia.org/wiki/Library_and_information_science?oldid=760445740 Contrib- utors: Ijon, Scott, Discospinster, BDD, Ruud Koot, Quiddity, Kmccook, Bgwhite, Wavelength, Themightyquill, TonyBrooke, R'n'B, Funandtrvl, Niceguyedc, Research84, BirgerH, Royksprekk, SchreiberBike, WikHead, Addbot, Yobot, Amirobot, DisillusionedBit- terAndKnackered, AnomieBOT, Gutam2000, Tomwsulcer, Omnipaedista, ChanakaW, FrescoBot, , Marchitelli, RA0808, Clue- Bot NG, ClaretAsh, Baiget, Widr, Lawsonstu, Strike Eagle, Cseanburns, BG19bot, Mark Arsten, Auteny, Tabrezalamalig, Azad li- brary, Rainanaina, Ghazala yasmeen, Consider42, Wildtoast, Hassan.zamir, Khanparveen, Tabiveed5, Cyberathenaeum, Achalamunigal, Prakashjyotibharti, KasparBot, Sweepy, Mungopark, Rasomu, InternetArchiveBot, Mmaximov1986nnov, Angelamcreynolds, 123456789toooooo, Wikishovel and Anonymous: 21 • Relevance (information retrieval) Source: https://en.wikipedia.org/wiki/Relevance_(information_retrieval)?oldid=758504944 Con- tributors: Kku, Nickg, Greenrd, Metasquares, Twang, Filip nohe, Pgan002, Beland, Karol Langner, E090, Jehochman, Nihiltres, RexNL, YurikBot, Gaius Cornelius, Bbbozzz, SmackBot, Floridi~enwiki, John254, Wbuchan, VictorAnyakin, Hut 8.5, Dan Pelleg, Jodi.a.schneider, DGG, Mycroft7, ShlomoS, Jamelan, Jludwig, Dtunkelang, DragonBot, Igorberger, BirgerH, Gjnaasaa, Addbot, Nobunobu, Josevellezcal- das, Jelsas, Bddavison, Yobot, AnomieBOT, Lezhao, John of Reading, ClueBot NG, CaroleHenson, Helpful Pixie Bot, BG19bot, New worl, Bejvisek and Anonymous: 21 • Web search engine Source: https://en.wikipedia.org/wiki/Web_search_engine?oldid=764666751 Contributors: Lquilter, Haakon, Mac, Ronz, Xcohen, Tpbradbury, Chuunen Baka, ZimZalaBim, Nurg, Smb1001, Plandu, Alan Liefting, Giftlite, Chris Wood, Macrakis, Beland, James A. 
Donald, Bumm13, Oknazevad, Andreas Kaufmann, Mvuijlst, Discospinster, Bender235, ESkog, MBisanz, Bjelli, EurekaLott, Vipul, Bobo192, Smalljim, John Vandenberg, Blakkandekka, NeonLego, Elipongo, Espoo, Alansohn, Gary, Smarteralec, SnowFire, Arthena, Steele~enwiki, Wtmitchell, Velella, Geraldshields11, Tomlzz1, Bsadowski1, Versageek, Brookie, BryanStrome, Woohookitty, Waldir, Toussaint, Mandarax, Rjwilmsi, Koavf, Strait, Bruce1ee, Mitul0520, Vegaswikian, Bhadani, Yoursvivek, Gurch, Chobot, Benlisquare, DVdm, Bgwhite, Banaticus, Wavelength, Sceptre, StuffOfInterest, Phantomsteve, Stephenb, CambridgeBayWeather, Rsrikanth05, NawlinWiki, Arichnad, Porthugh, LodeRunner, Klutzy, Elkman, Fmccown, Zzuuzz, Ketsuekigata, Carlosguitar, Cmglee, SmackBot, Samdutton, Ma8thew, Hydrogen Iodide, McGeddon, Edgar181, Gilliam, Jdfoote, Ohnoitsjamie, MalafayaBot, Deli nk, Jf- samper, MercZ, A. B., Милан Јелисавчић, Frap, Kazastankas, SundarBot, Popsup, Runefurb, Makemi, Jiddisch~enwiki, Legalea- gle86, Spotworks, CristianoMacaluso, Mwtoews, DMacks, Janhoy, Wikiolap, Valfontis, Kuru, General Ization, Francis Irving, Silk- Tork, Accurizer, Bjankuloski06en~enwiki, Ckatz, 16@r, Hvn0413, Optakeover, TastyPoutine, Caiaffa, Hu12, Levineps, BranStark, Iri- descent, Plenderj, Blehfu, INkubusse, ^, JForget, Jonathan A Jones, Leevanjackson, Dgw, Alandavidson, WeggeBot, Gogo Dodo, Nick2253, DumbBOT, Headbomb, EdJohnston, Nick Number, Seaphoto, Aliweb, Lfstevens, ClassicSC, JAnDbot, Leuko, Barek, MER- C, Rothorpe, Kerotan, Freshacconci, Magioladitis, Xangis, Andropod, VoABot II, Carlwev, JNW, JamesBWatson, Think outside the box, Buettcher, Tedickey, Jatkins, Midgrid, Theroadislong, Elinruby, Hoverfish, Kgfleischmann, Thompson.matthew, DGG, DRogers, Cotton2, Poeloq, Yegg13, Ggrefen, Jfroelich, J.delanoy, ChrisfromHouston, Terrek, Athaenara, Tdadamemd, Scurless, Cpiral, Ajmint, McSly, Gurchzilla, Kmmhasan, KylieTastic, WJBscribe, Jamesontai, Janderie, Tagus, Bonadea, JavierMC, Halmstad, Inas, Idioma-bot, Ruukasu2005, VolkovBot, DSRH, Jeff G., Maghnus, Mathiaslylo, Fences and windows, Philip Trueman, TXiKiBoT, Oshwah, Zidonuke, Newtown11, Dbenford, Gihangamos, CoJaBo, Nexus501, Martin451, Jackfork, Wiae, Larklight, Enigmaman, Synthebot, CoolKid1993, Coldmachine, Vchimpanzee, Gepcsirke, Cnilep, Insanity Incarnate, LittleBenW, Thunderbird2, Logan, Biscuittin, SieBot, Account- ing4Taste, ATS, Rlendog, Aep itah, Josconklin, Dawn Bard, GoHuskies990411, Srushe, SiegeLord, Simulacrum01, Bentogoa, Happy- sailor, Flyer22 Reborn, Radon210, Oscar.nierstrasz, Edward Elric 1308, Yerpo, Steven Crossin, RW Marloe, Chansonh, PhoenixLight- Inc, UncleMartin, IdreamofJeanie, Benaya, DancingPhilosopher, Rathee, Mattmnelson, Searchmaven, HPJoker, Ggallucci, Francvs, De- marie, Doxin45, Alfons Åberg, Afnecors, ClueBot, Caffeinejolt, Professorbond, Schwarzenneger, The Thing That Should Not Be, PLA y Grande Covián, MIDI, Ndenison, Unbuttered Parsnip, Saddhiyama, Drmies, VQuakr, SuperHamster, TarzanASG, Trivialist, Shantu123, Puchiko, Accl.news, Ray3055, K4m1y4, Excirial, Anvilmedia, Resoru, Rhododendrites, Sonicdrewdriver, NuclearWarfare, ClashThe- Bunny, Aseld, Titustimuli, Foogus, Rui Gabriel Correia, 7, Qwfp, Johnuniq, Apparition11, SF007, Classicrockfan42, DumZiBoT, Tem- plarion, Crazy Boris with a red beard, Brethvoice, XLinkBot, Boyd Reimer, PseudoOne, Pnm123, Pgallert, Avoided, Jingle bigballs, Drmadskills, Badgernet, Another-sailor, SDSandecki, Wmartin08, Rajesh.patchala, DOI bot, Tcncv, Fyrael, Captain-tucker, 123b, 123c, 123f, Kiranoush, 
Skapoor007, Ronhjones, Cut Bravo, Cst17, Wikipedian314, MrOllie, Chamal N, Jreconomy, Foreigner82, Favonian, West.andrew.g, 84user, Ehrenkater, Apteva, Nurasko, Teles, Gail, Capone7722, Ben Ben, Gemirates, Yobot, WikiDan61, Fraggle81, ZeeknayTzfat, DisillusionedBitterAndKnackered, Steve.bassey, Mbelaunde, Jose Gervasio, JeanCaffou, Bugnot, Amrikbhat, Anspar, Manwichosu, AnomieBOT, Mhha, Jim1138, Dwayne, Piano non troppo, BIGGOOGIES, ChristopheS, Ddemetrios5, Materialscientist, Kc03, Loderuner, Citation bot, Srinivas, Arctic Fox, ArthurBot, Ambassador29, MauritsBot, Xqbot, StuffyProf, Vuongvina, Sman24, Jozef.kutej, Capricorn42, Nasnema, Regisbates, George.boeck, Connorthecat, Maximus2000, Hi878, Ceramic catfish, JanDeFietser, Wizardist, Mark Schierbecker, Seeleschneider, Aragor~enwiki, Dan6hell66, Nenya17, PakRise, Prari, FrescoBot, Kiransarv99, Kjpocon- nor, Credibly Witless, X7q, Rosariomorgan, Mangaman27, Searchman2, Nainawalli, Bebo77, Haeinous, Tegel, Llamafirst, Semio7, Car- tel7, Hillwilliam6, DivineAlpha, Jakesyl, Citation bot 1, Biker Biker, Pinethicket, I dream of horses, Epipkin, Vicenarian, HRoestBot, 10metreh, MJ94, Skapoor 92, Nadeem12345, Xiaoshuang, RedBot, Blogger11, XDaniX, Serols, Fixer88, Ltkmerlini, Beao, Crows1985, Bloxxy, Brat22, Cnwilliams, Mlo0352, Nubicsearch, FoxBot, Mjs1991, ConcernedVancouverite, HFadeel, TobeBot, Wotnow, Dgiul, GrantGD, Xlxfjh, Heavyweight Gamer, Lotje, Vancouver Outlaw, Ginadavis, Aoidh, Fzamith, Richard31415, K-ray913, David Hedlund, Reaper Eternal, Kendalfong, Ddloe, Gregman2, Luv len, Xin0427, Suzukiboy04, Yamaha07, Dillonpg1, G-Yenn123, MoeenKhurshid, Moeenkhurshids, Inetmonster, Codename.venice, Likmo123, Thinktdub, RazorXX8, Rz1115, Sharon08tam, Hkreiger, Lsolan, Xmark- manx, Nono-1966, Mooglesearch, Mstrehlke, Ooyyo, Onel5969, Mean as custard, Cac united, Qnxkuba, Searchprochina, Kvasilev, Ajkovacs, Indian2493, चंद्रकांत धुतडमल, Rollins83, DASHBot, EmausBot, Meemore, Thomas humphrey12, Akjar13, Dewritech, Racerx11, GoingBatty, RA0808, Pincerr, Dem1995, AlanS1951, Moswento, Tommy2010, Dcirovic, Entalpia2, Thecheesykid, InfoS- ources, Shuipzv3, Elandy2009, Mnhweb, Pickuptha'Musket, Uniltìranyu, HarleyULTRArider, Friendocity, Jimbo16454, Appledandy, 124 CHAPTER 24. WEB SEARCH ENGINE

Erianna, Lokpest, W163, Sfoske70, Mulva111, Schnoatbrax, Champion, Kapil.xerox, Gsarwa, Ajit garga, Nlyte.Software, Nimoegra, Orange Suede Sofa, Gaganmasoun, Fronier, LScriv, Mapelpark, Cmcardle720, Llightex, DASHBotAV, Rakeitin, Danieltabak, Sllim jon, Calvinklein911, ClueBot NG, Angeld89, Hashim2010, ES IRM, Patience2, Backtous2012, Hagreyman, MelbourneStar, Achugg, Satellizer, Cirsam, Griffbo, Willonthemove, Loginnigol, Grablev~enwiki, Tch5416339, Lostzenfound, Dhua315, Lesley.latham, Frietjes, Ty27rv, Miladz7560, S2009qw, Riveravaldez, Widr, Antiqueight, Jim the Techie, Coolaij, Nowo11, Helpful Pixie Bot, Alemafut, Rozbif, Saha zapaat, Thistrackted, BG19bot, 321ylzzirg, Ltcoconut88, Jackr1909, Ccpedia, Northamerica1000, Luriflax, Outlinekiller, Red- dogsix, MusikAnimal, AvocatoBot, RikkiAaron, Compfreak7, Dephnit45, Cncmaster, CitationCleanerBot, A0tv23, Crh23, Harizotoh9, Rachell36, United States Man, FrankyFrank101, Robertnettleton, Sandbergja, Bigbluebeaver, Klilidiplomus, Kanggotan, Gustavo Destro, Dubleeble, Williamjhonson45, Esv123, Aneesprince, Benhall2121, ChrisGualtieri, Roggerladislau, Cklein1209, Barnabas321, Pepdeal, Quant18, Coolblue75759, Scofield190, Misty fungus, L;kasd;fweiotr4, Mogism, 331dot, Thokara, NOnash61, MiguelAraujoS, Number- maniac, Lugia2453, SFK2, Jc86035, Chanuka25, Awp9633, VanishedUser 2313214sad1, Dave Braunschweig, Purbitaditecha, Krushialk, Zalunardo8, Tcarnes2, Taniki122, SEVAGIRI REAL ESTATE CONSULTANTS, Ajit.tripathy14, Ggwine, Antar Fathy Antar Amer, Asrosen, Kolophon, JohnJohnson5, Cphisher, E1510sf, Thevideodrome, Roma shah, Darkesthoursoflife, MDavid.me, Agasthya12345, Lornefade, Rekowo, Menelaosc, Crispit, Sharma Hrishi, 9k7kq3, J3ts9ij, Majid661, Pilgrimnet, JaconaFrere, Skr15081997, John D’silva, Dbsseven, Concord hioz, Monkbot, Likhary, Thecorbaman, Klaaskizito, Perpetualuche, Kaytav, Alvandria, Sights40, Vineydhiman, Lucky1620, Kinetic37, Kentthegreats, Lilred234, Employerspain15, Pradeeprv123, Wikiinfosub, Azirann, Evolutionvisions, Elloge6, P2g4k, Jakebohall, Renjithrajeevvk, Asdklf;, Buzzdarkmatter, Elimash, KH-1, Broweuli, Ryanopoku123, Royt75, Krishnachaitan, Jack- pison, Jozefsanders, Bullwinkle2003, StudiousStudent, Areeshanoor2020, Eml.web.search, Pixillated, BlueFire25, Sdxu, Chrislolololol- lolol, Tony y stark, Anwenparrott, Megalegit, Mp3wallet, Broido, Maryfrench, Stillalivelong, 2911ashish, Sves lab, Newwikieditor678, Elenctic, Eric0928, Anil bhatiwal, Bookaccount, Nksp20z, CAPTAIN RAJU, Feminist, Kts sports cars, Alarbash, Samira zeynali, Be- lajarhebat, Thareshkum, Pinny house, Nvmemory, Micmactictac, StraboVarenius, Hipkik, Researcher9999, RSR CABS, Serversulti- .log, Seo Doktorum, Expert Computers, Kavitha reddy b, Pizzarollsxx, Vivekavardhanou, Ibrahim loknathpur, Cr7 wikitech, Omni Flames, Shashibharanger, Gopykamrai, Sinamalleki, Damar Ramadhan, Thomas , John “Hannibal” Smith, Doctormukisa, Seosefi, Sagarahmed0172, StrayKitty29, Hedayatyazdani, Buyshop corp, Bender the Bot, Vlady000, MoshiKun, Richard614, Imminent77, Sel- vankalai, Jrjohn2012, John smith web, Vinay7737, Abhijeet saawant, Stikkyy, Florarlk, Glixx express, Zingyi512, Raybrighton2016, Arhajati, Joe Wiz, Jeos149, Ravigupta.winworld and Anonymous: 835

24.12.2 Images

• File:Ambox_important.svg Source: https://upload.wikimedia.org/wikipedia/commons/b/b4/Ambox_important.svg License: Public domain Contributors: Own work, based off of Image:Ambox scales.svg Original artist: Dsmurat (talk · contribs)
• File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: PD Contributors: ? Original artist: ?
• File:Conventions.jpg Source: https://upload.wikimedia.org/wikipedia/commons/9/9e/Conventions.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Dg-new-1.jpg Source: https://upload.wikimedia.org/wikipedia/commons/3/39/Dg-new-1.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Dg-new-2.jpg Source: https://upload.wikimedia.org/wikipedia/commons/6/6d/Dg-new-2.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Edit-clear.svg Source: https://upload.wikimedia.org/wikipedia/en/f/f2/Edit-clear.svg License: Public domain Contributors: The Tango! Desktop Project. Original artist: The people from the Tango! project. And according to the meta-data in the file, specifically: "Andreas Nilsson, and Jakub Steiner (although minimally)."
• File:Emoji_u1f4bb.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/d7/Emoji_u1f4bb.svg License: Apache License 2.0 Contributors: https://code.google.com/p/noto/ Original artist: Google
• File:Information-Retrieval-Models.png Source: https://upload.wikimedia.org/wikipedia/commons/c/c3/Information-Retrieval-Models.png License: CC-BY-SA-3.0 Contributors: ? Original artist: ?
• File:Johnhasfinishedthework-1.jpg Source: https://upload.wikimedia.org/wikipedia/commons/e/e7/Johnhasfinishedthework-1.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:LampFlowchart.svg Source: https://upload.wikimedia.org/wikipedia/commons/9/91/LampFlowchart.svg License: CC-BY-SA-3.0 Contributors: vector version of Image:LampFlowchart.png Original artist: svg by Booyabazooka

• File:Library-logo.svg Source: https://upload.wikimedia.org/wikipedia/commons/5/53/Library-logo.svg License: CC0 Contributors: Own work Original artist: Mononomic
• File:Library_of_Ashurbanipal_synonym_list_tablet.jpg Source: https://upload.wikimedia.org/wikipedia/commons/6/64/Library_of_Ashurbanipal_synonym_list_tablet.jpg License: CC BY-SA 3.0 Contributors: Fæ (Own work) Original artist: ?
• File:Linguistics_stub.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/dc/Linguistics_stub.svg License: Public domain Contributors: ? Original artist: ?
• File:Lock-green.svg Source: https://upload.wikimedia.org/wikipedia/commons/6/65/Lock-green.svg License: CC0 Contributors: en:File:Free-to-read_lock_75.svg Original artist: User:Trappist the monk
• File:Mayflower_Wikimedia_Commons_image_search_engine_screenshot.png Source: https://upload.wikimedia.org/wikipedia/commons/b/ba/Mayflower_Wikimedia_Commons_image_search_engine_screenshot.png License: GPL Contributors: Screenshot of a search for lunar eclipse. Original artist: Mayflower was written by User:Tangotango.
• File:Merge-arrow.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/aa/Merge-arrow.svg License: Public domain Contributors: ? Original artist: ?

• File:Mophological_dependencies_1.png Source: https://upload.wikimedia.org/wikipedia/commons/6/6b/Mophological_dependencies_1.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Morphological_dependencies_2'.png Source: https://upload.wikimedia.org/wikipedia/commons/e/e5/Morphological_dependencies_2%27.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Parse2.jpg Source: https://upload.wikimedia.org/wikipedia/commons/8/8c/Parse2.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Parse_tree_1.jpg Source: https://upload.wikimedia.org/wikipedia/commons/5/54/Parse_tree_1.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Parser_Flowո.gif Source: https://upload.wikimedia.org/wikipedia/commons/d/d6/Parser_Flow%D5%B8.gif License: Public domain Contributors: Aho, Sethi, Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986. ISBN 0-201-10088-6 Original artist: DevinCook at English Wikipedia
• File:Prosodic_dependencies'.png Source: https://upload.wikimedia.org/wikipedia/commons/5/55/Prosodic_dependencies%27.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Question_book-new.svg Source: https://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg License: Cc-by-sa-3.0 Contributors: Created from scratch in Adobe Illustrator. Based on Image:Question book.png created by User:Equazcion Original artist: Tkgd2007
• File:Quranic-arabic-corpus.png Source: https://upload.wikimedia.org/wikipedia/commons/5/5c/Quranic-arabic-corpus.png License: CC BY 3.0 Contributors: Own work Original artist: Arabismo
• File:Relevance.jpg Source: https://upload.wikimedia.org/wikipedia/commons/7/77/Relevance.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: GinsuText
• File:Semantic_dependencies.png Source: https://upload.wikimedia.org/wikipedia/commons/a/ac/Semantic_dependencies.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Split-arrows.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a7/Split-arrows.svg License: Public domain Contributors: ? Original artist: ?
• File:Syntactic_functions_1.png Source: https://upload.wikimedia.org/wikipedia/commons/c/c3/Syntactic_functions_1.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Text_document_with_red_question_mark.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a4/Text_document_with_red_question_mark.svg License: Public domain Contributors: Created by bdesham with Inkscape; based upon Text-x-generic.svg from the Tango project. Original artist: Benjamin D. Esham (bdesham)
• File:Theykilledthemanwithagun-1b.jpg Source: https://upload.wikimedia.org/wikipedia/commons/7/74/Theykilledthemanwithagun-1b.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Thistreeisillustratingtherelation(PSG).png Source: https://upload.wikimedia.org/wikipedia/commons/8/8e/Thistreeisillustratingtherelation%28PSG%29.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:Vector_space_model.jpg Source: https://upload.wikimedia.org/wikipedia/commons/f/ff/Vector_space_model.jpg License: CC BY 3.0 Contributors: Own work Original artist: Riclas
• File:Wearetryingtounderstandthedifference_(2).jpg Source: https://upload.wikimedia.org/wikipedia/commons/0/0d/Wearetryingtounderstandthedifference_%282%29.jpg License: CC BY-SA 3.0 Contributors: Own work Original artist: Tjo3ya
• File:WebCrawlerArchitecture.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/df/WebCrawlerArchitecture.svg License: CC-BY-SA-3.0 Contributors: self-made, based on image from PhD. Thesis of Carlos Castillo, image released to public domain by the original author. Original artist: Vector version by dnet based on image by User:ChaTo
• File:Wikiquote-logo.svg Source: https://upload.wikimedia.org/wikipedia/commons/f/fa/Wikiquote-logo.svg License: Public domain Contributors: Own work Original artist: Rei-artur
• File:Wikiversity-logo.svg Source: https://upload.wikimedia.org/wikipedia/commons/9/91/Wikiversity-logo.svg License: CC BY-SA 3.0 Contributors: Snorky (optimized and cleaned up by verdy_p) Original artist: Snorky (optimized and cleaned up by verdy_p)
• File:Wiktionary-logo-v2.svg Source: https://upload.wikimedia.org/wikipedia/commons/0/06/Wiktionary-logo-v2.svg License: CC BY-SA 4.0 Contributors: Own work Original artist: Dan Polansky based on work currently attributed to Wikimedia Foundation but originally created by Smurrayinchester

24.12.3 Content license

• Creative Commons Attribution-Share Alike 3.0