People-Centric Natural Language Processing

David Bamman
CMU-LTI-15-007

Language Technologies Institute
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave., Pittsburgh, PA 15213
www.lti.cs.cmu.edu

Thesis committee
  Noah Smith (chair), Carnegie Mellon University
  Justine Cassell, Carnegie Mellon University
  Tom Mitchell, Carnegie Mellon University
  Jacob Eisenstein, Georgia Institute of Technology
  Ted Underwood, University of Illinois

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies

© David Bamman, 2015

Contents

1 Introduction
  1.1 Structure of this thesis
    1.1.1 Variation in content
    1.1.2 Variation in author and audience
  1.2 Evaluation
  1.3 Thesis statement
2 Methods
  2.1 Probabilistic graphical models
  2.2 Linguistic structure
  2.3 Conditioning on metadata
  2.4 Notation in this thesis

I Variation in content

3 Learning personas in movies
  3.1 Introduction
  3.2 Data
    3.2.1 Text
    3.2.2 Metadata
  3.3 Personas
  3.4 Models
    3.4.1 Dirichlet Persona Model
    3.4.2 Persona Regression
  3.5 Evaluation
    3.5.1 Character Names
    3.5.2 TV Tropes
    3.5.3 Variation of Information
    3.5.4 Purity
  3.6 Exploratory Data Analysis
  3.7 Conclusion and Future Work
4 Learning personas in books
  4.1 Introduction
  4.2 Literary Background
  4.3 Data
    4.3.1 Character Clustering
    4.3.2 Pronominal Coreference Resolution
    4.3.3 Dimensionality Reduction
  4.4 Model
    4.4.1 Hierarchical Softmax
    4.4.2 Inference
  4.5 Evaluation
  4.6 Experiments
  4.7 Analysis
  4.8 Conclusion
5 Learning events through people
  5.1 Introduction
  5.2 Data
  5.3 Model
  5.4 Evaluation
  5.5 Analysis
  5.6 Additional Analyses
    5.6.1 Correlations among events
    5.6.2 Historical distribution of events
  5.7 Related Work
  5.8 Conclusion

II Variation in the author and audience

6 Learning ideal points of propositions through people
  6.1 Introduction
  6.2 Task and Data
    6.2.1 Data
    6.2.2 Extracting Propositions
    6.2.3 Human Benchmark
  6.3 Models
    6.3.1 Additive Model
    6.3.2 Single Membership Model
  6.4 Comparison
    6.4.1 Principal Component Analysis
    6.4.2 ℓ2-Regularized Logistic Regression
    6.4.3 Co-Training
  6.5 Evaluation
  6.6 Convergent Validity
    6.6.1 Correlation with Self-declarations
    6.6.2 Estimating Media Audience
  6.7 Conclusion
7 Learning word representations through people
  7.1 Introduction
  7.2 Model
  7.3 Evaluation
    7.3.1 Qualitative Evaluation
    7.3.2 Quantitative Evaluation
  7.4 Conclusion
8 Improving sarcasm detection with situated features
  8.1 Introduction
  8.2 Data
  8.3 Experimental Setup
  8.4 Features
    8.4.1 Tweet Features
    8.4.2 Author Features
    8.4.3 Audience Features
    8.4.4 Environment Features
  8.5 Evaluation
  8.6 Analysis
  8.7 Conclusion
9 Conclusion
  9.1 Summary of contributions
  9.2 Future directions
    9.2.1 Richer models
    9.2.2 Supervision
    9.2.3 Fine-grained opinion mining
    9.2.4 Stereotyping
    9.2.5 Richer context
10 Bibliography

Acknowledgments

The work presented in this thesis addresses the complexity of the interaction between people and text. The words on these pages naturally hold a similarly complex relationship; while I am their author, I count my words lucky to be influenced by many people: most immediately, my advisor, Noah Smith, and my collaborators Justine Cassell, Chris Dyer, Jacob Eisenstein, Tom Mitchell, Brendan O’Connor, Tyler Schnoebelen and Ted Underwood. In my time at Carnegie Mellon, you have each shaped my thinking profoundly; the work in this thesis has benefited not only from the clarity and depth of your thoughts on it, but also in the interests and ideals you have each inspired in me. There are many others whose influence I am grateful to acknowledge: my colleagues Waleed Ammar, ...

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    136 pages
