
SEGMENTATION AND INTEGRATION IN TEXT COMPREHENSION: A MODEL OF CONCEPT NETWORK GROWTH

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by
Manas Sudhakar Hardas
May 2012

Dissertation written by
Manas Sudhakar Hardas
B.E., University of Mumbai, 2003
M.S., Kent State University, 2006
Ph.D., Kent State University, 2012

Approved by
Dr. Javed I. Khan, Chair, Doctoral Dissertation Committee
Dr. Austin Melton, Members, Doctoral Dissertation Committee
Dr. Arvind Bansal
Dr. Katherine Rawson
Dr. Denise Bedford

Accepted by
Dr. Javed I. Khan, Chair, Department of Computer Science
Dr. John Stalvey, Dean, College of Arts and Sciences

TABLE OF CONTENTS

LIST OF FIGURES . vii
LIST OF TABLES . xi
Acknowledgements . xii
Dedication . xiv

1 Introduction: Text Comprehension . 1
1.1 Text comprehension in literature . 9
1.1.1 Knowledge representation in text comprehension . 10
1.1.2 Cognitive processes in text comprehension . 12
1.1.3 Other factors in text comprehension . 17
1.2 Thesis contributions . 18
1.3 Thesis organization . 19
2 Computational segmentation and integration . 20
2.1 An example of concept network growth by SI . 20
2.2 Computational model for SI . 23
2.2.1 Model parameters . 24
2.2.2 Notation . 25
2.2.3 Segmentation of concepts . 27
2.2.4 Integration of concepts . 28
2.2.5 Constraints formation in SI . 28
2.3 Computing the model parameters . 31
2.4 Solvability issues: mutual exclusivity . 33
2.4.1 Mutual exclusivity between two individual readers . 34
2.4.2 Mutual exclusivity due to delayed concept recognition . 36
2.5 Implementation and computability . 38
2.6 Exercise: Computing a base concept network for a text . 40
2.7 Conclusion . 40
3 Function of concept sequence . 43
3.1 Motivation . 43
3.2 Experiment . 44
3.2.1 Design . 44
3.2.2 Assumptions . 46
3.2.3 Statistics obtained from individual concept networks . 47
3.2.4 Statistics of base concept network for different sequences . 50
3.3 Conclusion . 58
4 Structure of a grown concept network . 59
4.1 Algorithm for the SI model of growth . 60
4.2 Important properties of the growth algorithm . 61
4.3 Structural properties . 63
4.3.1 Size of the network . 64
4.3.2 Connectivity/sparsity . 65
4.3.3 Neighborhood clustering . 66
4.3.4 Reachability . 69
4.3.5 Degree distribution . 71
4.4 Implications of structural properties . 72
4.4.1 Small worlds . 72
4.4.2 Proportional connectivity . 73
4.4.3 Other implications . 74
4.5 Conclusion . 75
5 Connectedness of a concept network . 76
5.1 Motivation . 77
5.2 Component analysis of simulated concept networks . 78
5.2.1 Number of components . 80
5.2.2 Fraction of nodes in the giant component . 83
5.2.3 Average size of non-giant component U . 84
5.2.4 Phase transition point versus average degree z . 86
5.3 Implications . 89
5.4 Conclusion . 90
6 Conclusion . 91
6.1 Some thoughts about the mathematical model . 91
6.2 Short term versus long term memory . 93
6.3 What is a concept? . 95
6.4 Constructivism and the SI model . 96
6.5 Concept network as an objective measure of quality of comprehension . 97
6.6 Conclusion . 99
7 Appendix A: Data Collection . 100
8 Appendix B: Derivation for degree distribution . 103
BIBLIOGRAPHY . 107

LIST OF FIGURES
2.1 Example of concept network growth. (a) A toy concept network constructed for example sentence 1. (b) Interaction with background concepts. (c) An example construction of the concept network after comprehending example sentence 2; the concept network in (b) is now the background concept knowledge for (c). (d) Textbase and situation model for example sentence 1 [Kintsch 1994]. 22
2.2 Example of constraints formation in segmentation and integration. 28
2.3 Transformation of a concept network in an episode. 31
2.4 The issue of linear separability between individuals. 34
2.5 Solution to mutual exclusivity between two individuals. 36
2.6 Solution to mutual exclusivity due to delayed concept recognition. 37
2.7 Example of a computed base concept network. The nodes named std1, std2, etc. represent the background knowledge of each of the students. The paragraph of text used and the ICNs from which this BCN is computed are shown in Appendix A. 42
3.1 Age of acquisition against concept weight for four different sequences. 53
3.2 Concept id against concept weight in four different sequences. 53
3.3 Age of acquisition against variance in concept weight within a sequence, plotted for four different sequences. 55
3.4 Concept id against variance in concept weight within a sequence, plotted for four different sequences. 56
3.5 Plot of the average concept weight of a concept across four different sequences against the variance in concept weight across four different sequences. 58
4.1 Analysis of the size of simulated concept networks. (a) Number of concepts recognized (nrec) as a function of node recognition rate (#). (b) Number of edges recognized (lrec) as a function of node recognition rate (#). The size of the graph is constant at n = 100. 64
4.2 Analysis of connectivity/sparsity of simulated concept networks. (a) Average degree (z) as a function of node recognition rate (#). (b) Average weighted degree (zw) as a function of node recognition rate (#). The size of the graph is constant at n = 100. 65
4.3 Analysis of clustering coefficients of simulated concept networks. (a) Clustering coefficient 1 (CC1) [Barrat, Weigt 2000] against node recognition rate (#). (b) Clustering coefficient 2 (CC2) [Watts, Strogatz 1998] against node recognition rate (#). The size of the graph is constant at n = 100. 67
4.4 Analysis of reachability of simulated concept networks. (a) Average shortest path length (L) as a function of node recognition rate (#). (b) Diameter (D) as a function of node recognition rate (#). The size of the graph is constant at n = 100. 69
4.5 Degree distribution and cumulative degree distribution. Values of k are generated over 10000 runs. The size of the graph is constant at n = 200. 70
5.1 Evolution of a network structure and the phenomenon of phase transition. The lower figure shows the fraction of nodes in the giant component against time. The upper three figures show snapshots of the evolving network taken at time steps t = 5, 20, 40. β = 0.5 is the association threshold. Red nodes are part of the giant component. Phase transition occurs at t = 20. At t = 40 most of the nodes in the network are in the giant component, with very few disconnected components. 79
5.2 Number of components (Cn) versus time for a graph of 200 nodes. In figure (a) n0 = 15% of the nodes are initially assumed to be included in the graph as disconnected components, in (b) n0 = 50%, and in (c) n0 = 85%.
In each figure, Cn versus t is plotted for three values of the association threshold β = [0.15, 0.5, 0.85]. 81
5.3 Ratio of the size of the giant component to the total number of nodes in the network (S/n) against time, for a graph of n = 200 nodes. In figure (a) n0 = 15% of the nodes are initially assumed to be included in the graph as disconnected components, in (b) n0 = 50%, and in (c) n0 = 85%. In each figure, S/n versus t is plotted for three values of the association threshold β = [0.15, 0.5, 0.85]. 83
5.4 Average size of non-giant components (U) over time for a graph of n = 200. In figure (a) n0 = 15% of the nodes are initially assumed to be included in the graph as disconnected components, in (b) n0 = 50%, and in (c) n0 = 85%. In each figure, U versus t is plotted for three values of the association threshold β = [0.15, 0.5, 0.85]. 85
5.5 Relationship between average degree and phase transition point. Phase transition is taken to occur when (a) 50%, (b) 70%, and (c) 90% of the nodes are in the giant component. The initial graph contains n0 = 50 nodes as disconnected components. In each figure, the phase transition point (pT) is plotted against the average degree (z) for three values of the association threshold β = [0.15, 0.5, 0.85]. 86
7.0 Data collection: Drawn ICNs. 102

LIST OF TABLES

3.1 Correlation between the presentation episode for concepts across sequences. 46
3.2 Average number of concepts recognized (n) and average number of associations made (l) by each of the four groups of readers, and their standard deviations. 47
3.3 Moving averages of the number of concepts and associations in each of the 8 episodes, fitted to a linear regression model. 48
3.4 Correlation between node weights of each concept in different sequences. 51
3.5 Correlation values (r) between concept weights and the time (age) of their acquisition in different sequences. 54
4.1 Statistics for simulated concept networks. 63

Acknowledgements

At the outset I would like to thank Dr. Javed I. Khan, my graduate advisor, for teaching me everything I know. I have known Dr. Khan for almost 9 years now, first as my advisor for my Master's degree and then for my doctoral degree. I am still amazed by his enthusiasm for new ideas, his ability to think critically, and his unabashed belief in audacious ideas. He has a tremendous ability to inspire people to dream with him.