Unsupervised Language Acquisition: Theory and Practice

Alexander Simon Clark

Submitted for the degree of D.Phil.
University of Sussex
September 2001

Declaration

I hereby declare that this thesis has not been submitted, either in the same or a different form, to this or any other university for a degree.

Signature:

Acknowledgements

First, I would like to thank Bill Keller for his supervision over the past three years. I would like to thank my colleagues at ISSCO for making me welcome, and especially Susan Armstrong for giving me the job in the first place, as well as various people for helpful comments and discussions, including Chris Manning, Dan Klein, Eric Gaussier, Nicola Cancedda, Franck Thollard, Andrei Popescu-Belis, Menno van Zaanen and numerous others, including Sonia Halimi for checking my Arabic. I would also like to thank Gerald Gazdar and Geoffrey Sampson for their helpful comments as part of my thesis committee.

I would also like to thank all of the people who have worked on the various software packages and operating systems that I have used, such as LaTeX, GNU/Linux and the gcc compiler, as well as the people who have worked on the preparation of the various corpora I have used: in particular, I am grateful to John McCarthy and Ramin Nakisa for allowing me to use their painstakingly collected data sets for Arabic.

Finally, I would like to thank my wife, Dr Clare Hornsby, for her unfailing support and encouragement (in spite of her complete lack of understanding of the subject), and my children, Lily and Francis, for putting up with my absences, both physical and mental, and for showing me how language acquisition is really done – over-regularisation of past-tense verbs included.

Abstract

In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models.
The scientific objective of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus, advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem, which purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars.

I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, which can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with non-concatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.

Contents

1 Introduction  2
  1.1 The Scientific Question  3
  1.2 Basic assumptions  3
  1.3 Outline of the thesis  4
  1.4 Prerequisites  5
  1.5 Literature review  5
  1.6 Bibliographic note  5
  1.7 Acknowledgments  5

2 Nativism and the Argument from the Poverty of the Stimulus  6
  2.1 Introduction  7
  2.2 Nativism  7
  2.3 The Argument from the Poverty of the Stimulus  9
  2.4 A learning algorithm as a refutation of the APS  11
    2.4.1 Data Limitations  12
    2.4.2 Absence of Domain-Specific information  14
    2.4.3 Absence of Language Specific Assumptions  14
    2.4.4 Plausibility of output  14
    2.4.5 Excluded sources of information  15
  2.5 Possible Counter-arguments  16
    2.5.1 Inadequate data  16
    2.5.2 Incomplete learning  16
    2.5.3 Undesirable Properties of Algorithm  17
    2.5.4 Incompatible with psychological evidence  17
    2.5.5 Incompatible with Developmental Evidence  18
    2.5.6 Storage of entire data set  18
    2.5.7 Argument from truth of the Innateness Hypothesis  19
    2.5.8 Argument from the inadequacy of empiricist models of language acquisition  20
    2.5.9 Use of tree-structured representations  21
    2.5.10 Not a general purpose learning algorithm  22
    2.5.11 Fails to learn deep structures  22
    2.5.12 Fails to guarantee convergence  24
    2.5.13 Doesn't use semantic evidence  24
    2.5.14 Fails to show that children actually use these techniques  24

3 Language Learning Algorithms  26
  3.1 Introduction  27
  3.2 Machine Learning of Natural Languages  27
  3.3 Supervised and Unsupervised Learning  28
  3.4 Statistical Learning  28
  3.5 Neural Networks  29
  3.6 Maximum Likelihood estimation  30
  3.7 The Expectation-Maximisation Theorem  30
  3.8 Bayesian techniques  32
  3.9 Evaluation  34
  3.10 Overall architecture of the language learner  35
    3.10.1 Categorisation  35
    3.10.2 Morphology  35
    3.10.3 Syntax  36
    3.10.4 Interactions between the models  36
  3.11 Areas of language not discussed in this thesis  36
    3.11.1 Acoustic processing  37
    3.11.2 Phonology  37
    3.11.3 Segmentation  37
    3.11.4 Semantics and Discourse  37

4 Formal Issues  39
  4.1 Introduction  40
  4.2 The A Priori APS  40
  4.3 Learnability in Formal language theory  41
    4.3.1 Gold  42
  4.4 Measure-one Learnability  42
    4.4.1 Horning  42
  4.5 Extension to Ergodic Processes  45
  4.6 Arguments from Gold's Theorem  47
  4.7 Statistical Methods in Linguistics  51
  4.8 Formal Unity of Statistical Grammars and Algebraic grammars  52

5 Syntactic Category Induction  55
  5.1 Introduction  56
  5.2 Syntactic Categories  56
    5.2.1 Necessity of Syntactic Categories  56
    5.2.2 Learning categories  57
  5.3 Previous Work  58
    5.3.1 Harris and Lamb  58
    5.3.2 Finch and Chater  59
    5.3.3 Schütze  59
    5.3.4 Brown et al.  59
    5.3.5 Pereira, Tishby and Lee  59
    5.3.6 Dagan et al.  60
    5.3.7 Li and Abe  60
    5.3.8 Brent  60
  5.4 Context Distributions  60
    5.4.1 Clustering  62
    5.4.2 Algorithm Description  62
  5.5 Ambiguous Words  63
  5.6 Rare words  64
  5.7 Results  64
  5.8 Discussion  73
    5.8.1 Limitations  73
    5.8.2 Independence assumptions  74
    5.8.3 Similarity with Hidden Markov Models  75
    5.8.4 Hierarchical clusters  75
    5.8.5 Use of orthography and morphology  75
    5.8.6 Multi-dimensional classes  76

6 Morphology Acquisition  77
  6.1 Introduction  78
  6.2 Computational Morphology  78
  6.3 Computational Models of Learning Morphology  80
    6.3.1 Supervised  80
    6.3.2 Unsupervised Learning  81
    6.3.3 Partially supervised  82
    6.3.4 Desiderata  …
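Chapter 5, as outlined above, induces syntactic categories from the context distributions of words in unlabelled text. As a purely illustrative sketch of that general idea – a toy example, not the algorithm the thesis develops, with all function names invented for illustration – one can represent each word by the distribution of its immediate left and right neighbours and compare words by the similarity of those distributions:

```python
# Toy sketch of distributional similarity: words that occur in similar
# contexts (here, immediate neighbours) are candidates for the same
# syntactic category. Illustrative only; not the thesis's algorithm.
from collections import Counter, defaultdict
from math import sqrt


def context_distributions(tokens):
    """Map each word to a Counter over its (side, neighbour) contexts."""
    ctx = defaultdict(Counter)
    for i, w in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else "<s>"
        right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        ctx[w][("L", left)] += 1
        ctx[w][("R", right)] += 1
    return ctx


def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1)
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def nearest_neighbour(word, ctx):
    """Return the distributionally most similar other word."""
    return max((w for w in ctx if w != word),
               key=lambda w: cosine(ctx[word], ctx[w]))
```

On a toy corpus such as `"the dog sat on the mat the cat sat on the mat"`, `dog` and `cat` occur in identical contexts, so `nearest_neighbour("dog", ctx)` returns `"cat"`. A full clustering of such distributions (rather than nearest-neighbour lookup) is closer to the spirit of the thesis's approach.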
