A Descriptive Profile of Process in Serendipity

A Narrative and Network Study of Information Behaviour in Context

Abigail McBirnie Department of Information Studies Aberystwyth University

Submitted in partial fulfilment of the requirements of the Degree of Doctor of Philosophy

2012 Recommended citation:

McBirnie, A. (2012). A Descriptive Profile of Process in Serendipity: A Narrative and Network Study of Information Behaviour in Context, PhD thesis, Aberystwyth University, Aberystwyth.

BIBTEX version:

@phdthesis{mcbirnie12aber, author = {Mc{B}irnie, Abigail}, year = {2012}, title = {A Descriptive Profile of Process in Serendipity: A Narrative and Network Study of Information Behaviour in Context}, type = {Ph{D} thesis}, school = {Aberystwyth University}, address = {Aberystwyth} note = {Includes a supplementary volume} } Declaration

This work has not previously been accepted in substance for any de- gree and is not being concurrently submitted in candidature for any degree.

Signed ...... (candidate) Date ......

Statement One

This thesis is the result of my own investigations, except where oth- erwise stated. No correction services have been used. Sources are acknowledged by citations giving explicit references. A bibliography is appended.

Signed ...... (candidate) Date ......

Statement Two

I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loan, and for the title and summary to be made available to outside organisations.

Signed ...... (candidate) Date ...... Electronic Thesis Deposit Agreement

Details of the Work I hereby authorise deposit of this work item in the digital repository maintained by Aberystwyth University, and/or in any other repository authorized for use by Aberystwyth University. This item is a product of my own research endeavours and is covered by the agreement below in which the item is referred to as ‘the Work’. It is identical in content to that deposited in the Library, subject to point 4 below.

Non-exclusive Rights Rights granted to the digital repository through this agreement are entirely non-exclusive. I am free to publish the Work in its present version or future versions elsewhere. I agree that Aberystwyth University may electronically store, copy or translate the Work to any approved medium or format for the purpose of future preservation and accessibility. Aberystwyth University is not under any obligation to reproduce or display the Work in the same formats or resolutions in which it was originally deposited.

AU Digital Repository I understand that work deposited in the digital repository will be accessible to a wide variety of people and institutions, including auto- mated agents and search engines via the World Wide Web. I understand that once the Work is deposited, the item and its metadata may be incorporated into public access catalogues or services, national of electronic theses and dissertations such as the British Library’s EThOS or any service provided by the National Library of Wales. I understand that the Work may be made available via the National Library of Wales Online Electronic Theses Service under the declared terms and conditions of use. I agree that as part of this service the National Library of Wales may electronically store, copy or convert the Work to any approved medium or format for the purpose of future preservation and accessibility. The National Library of Wales is not under any obligation to reproduce or display the Work in the same formats or resolutions in which it was originally deposited.

I agree as follows:

1. That I am the author or have the authority of the author/s to make this agreement and do hereby give Aberystwyth University the right to make available the Work in the way described above. 2. That the electronic copy of the Work deposited in the digital re- pository and covered by this agreement, is identical in content to the paper copy of the Work deposited in the Library of Aberys- twyth University and the National Library of Wales, subject to point 4 below. 3. That I have exercised reasonable care to ensure that the Work is original and, to the best of my knowledge, does not breach any laws including those relating to defamation, libel and copyright. 4. That I have, in instances where the intellectual property of other authors or copyright-holders is included in the work, and as ap- propriate, either: ˆ gained explicit permission for the inclusion of that material in the electronic form of the Work as accessed through the open access digital repository OR ˆ limited it to amounts allowed for by current legislation OR ˆ established that the material is out of copyright OR ˆ removed that material from the electronic version to be de- posited OR ˆ highlighted that material which needs to be removed from the electronic version and informed the repository adviser appropriately (see guidance note 4.). 5. That Aberystwyth University does not hold any obligation to take legal action on behalf of the Depositor, or other rights hold- ers, in the event of a breach of intellectual property rights, or any other right, in the material deposited. 6. That if, as a result of my having knowingly or recklessly given a false statement at points 1, 2, 3 or 4 above, the University or the National Library of Wales suffers loss, I will make good that loss and indemnify Aberystwyth University and the National Library of Wales for all actions, suits, proceedings, claims, demands and costs occasioned in consequence of my false statement.

Signed ...... (candidate) Date ...... Abstract

This research describes information behaviour in context: exper- iences of serendipity in research. The study contributes to the un- derstanding of serendipity as a complex phenomenon by looking at process in serendipity through a relational, phenomenological and so- ciological lens. The research asks: what linked events of doing and happening do people recount when they talk about their experiences of serendipity? and, how do they make sense of the circumstances surrounding these events? The research investigates a sample of fifty first-person narratives of lived experiences of serendipity recounted in the Citation Classics online dataset. A mixed methods parallel conversion design opera- tionalises the research: one strand of the study focuses on description of contextual data, the other, on descriptions of two different event structure models. To meet its descriptive aims, the research draws on multiple methods: narrative approaches, network analysis and stat- istical techniques, including network topology inference and motif de- tection. A descriptive profile of process in serendipity, a portfolio, which collects the network drawings and data for the one hundred event structures modelled by the study, and a research credibility audit stand as the study’s substantive outcomes. The research findings make a four-fold contribution to serendip- ity theory: they provide new insight into experiences of process in serendipity; add concrete, precise detail to fuzzy, abstract process- related serendipity constructs; highlight problems with existing the- oretical assumptions; and present evidence for normality in serendip- ity. Methodologically, the research opens alternative avenues into serendipity’s complexity and brings fresh perspectives to the practice of serendipity research.

Acknowledgements

I am grateful to Dr Christine Urquhart, for her useful advice, readings of drafts and steady presence and guidance throughout the course of the research; to Dr Allen Foster, for his comments at earlier stages of the study and his later reviews of some drafts; to Professor Mike Thelwall, for his time and thought-provoking questions; to an expert, who, by request, shall remain unnamed, for an upsetting, yet ulti- mately, useful, fight over the network component of the research; to Dr Tony Olden, for his reassuring remarks and encouragement; to Dr Stephann Makri and his colleagues at the SerenA project, for valuable discussions and insight; to Dr Angela Winnett, for her early practical pointers on statistical aspects of the study; to Dr Eugene Garfield and Meher Mistry, for their work on establishing the online Citation Clas- sics dataset; to all in the TEX and R communities, for their time, effort and commitment to such valuable open source projects; to Professor Vladimir Batagelj, Professor Andrej Mrvar, Professor Kathleen Car- ley, Dr J¨urgenPfeffer, Florian Rasche, and Dr Sebastian Wernicke, for creating and making freely available the software tools that made the research possible; to Gregory W. Tauer, for his ever so helpful LATEX table converter; to Emma Hastings, Nina Docking, and David Lakey, for assistance with administrative aspects; to my family, for their care and interest; and to my husband, Dr Andrew McBirnie, for so many things.

Stories are the moments that you didn’t see coming, that are what live in you and burn in you forever.

Steven Moffat, speaking on BBC Radio Five Live, 11 May 2011

Contents

List of Figures xxv

List of Tables xxix

1 Exposition 1 1.1 Sixty second summary ...... 1 1.2 Prologue to the research ...... 2 1.2.1 A problematic keyword ...... 2 1.2.2 An intriguing dataset ...... 3 1.2.3 Studying serendipity ...... 4 1.3 Serendipity ...... 5 1.3.1 Both a word and a pattern ...... 5 1.3.2 Accident and sagacity ...... 8 1.3.3 As sociological ...... 9 1.3.4 Related concepts ...... 11 1.4 Research overview ...... 13 1.4.1 Pragmatic stance ...... 13 1.4.2 Motivations and methodological concerns ...... 14 1.4.2.1 Motivations for the study ...... 14 1.4.2.2 Methodological concerns ...... 16 1.4.2.3 Significance of methodology ...... 18 1.4.3 Key terms ...... 18 1.4.3.1 Process ...... 18 1.4.3.2 Event structure ...... 19 1.4.4 Questions and theoretical aim ...... 20

xiii CONTENTS

1.4.4.1 Research questions ...... 20 1.4.4.2 Theoretical aim ...... 21 1.4.5 The experiential stage ...... 22 1.4.6 Substantive outcomes ...... 22 1.4.7 Original contribution ...... 23 1.5 The thesis document ...... 23 1.5.1 Preparation ...... 23 1.5.2 Reader’s roadmap ...... 23

2 Conceptual Framework 25 2.1 Comments on the literature review ...... 25 2.1.1 The reporting approach ...... 25 2.1.2 The serendipity literature ...... 26 2.2 Serendipity as information behaviour in context ...... 31 2.2.1 Context ...... 31 2.2.1.1 Understandings and debates ...... 31 2.2.1.2 Research as context ...... 32 2.2.1.3 Relational context ...... 32 2.2.2 Information behaviour ...... 33 2.2.2.1 Literature reviews and collections ...... 33 2.2.2.2 Wilson’s Nested Model ...... 33 2.3 Serendipity as process ...... 39 2.3.1 Nature ...... 39 2.3.2 Associated circumstances ...... 41 2.3.2.1 When, where and how? ...... 41 2.3.2.2 Time and location ...... 41 2.3.2.3 Serendipity classifications/patterns ...... 42 2.3.2.4 Sociological filters ...... 45 2.3.2.5 Phenomenological interpretations ...... 47 2.3.3 Process and participants ...... 53 2.3.3.1 Who and what? ...... 53 2.3.3.2 Participants ...... 53 2.3.3.3 Process itself ...... 55

xiv CONTENTS

2.3.3.4 Events ...... 55 2.3.3.5 Event structures ...... 57

3 Methodological Framework 61 3.1 Relationalism ...... 61 3.1.1 The relational dyad ...... 61 3.1.2 The relational perspective ...... 62 3.2 Phenomenology ...... 64 3.2.1 Introduction and definitions ...... 64 3.2.2 Historical development ...... 67 3.2.3 The Sch¨utzianperspective ...... 68 3.2.4 The structure of intentionality ...... 70 3.2.5 Problems and limitations ...... 71 3.3 Narrative theory ...... 72 3.3.1 Overview ...... 72 3.3.1.1 Narrative in social science research ...... 72 3.3.1.2 A definition of narrative ...... 73 3.3.1.3 Approaches to narrative ...... 74 3.3.2 Event narrative as structure ...... 76 3.3.2.1 The Labovian narrative analysis framework . . . 76 3.3.2.2 Franzosi’s Model of Narrative Elements ...... 77 3.3.2.3 Multiple levels of structure ...... 78 3.3.3 The semantic triplet ...... 79 3.3.3.1 Definition and notation ...... 79 3.3.3.2 Three points ...... 79 3.3.3.3 The syntactic/semantic subject-object discrepancy 81 3.3.3.4 Semantic grammar ...... 81 3.3.4 Event narrative and the semantic triplet ...... 83 3.3.4.1 The semantic triplet as process ...... 83 3.3.4.2 Aggregation approaches ...... 84 3.3.4.3 Matrix-based approaches ...... 84 3.4 Modelling theory ...... 89 3.4.1 Modelling as methodology ...... 89

xv CONTENTS

3.4.2 Structuring structure ...... 90 3.4.3 Balancing accuracy and simplicity ...... 91 3.4.4 Subjective interpretations ...... 91 3.4.5 Wrong, but useful ...... 92 3.5 Network theory ...... 92 3.5.1 Introduction ...... 92 3.5.2 Mathematical foundations of networks ...... 94 3.5.3 Measuring and comparing relational structures ...... 97 3.5.4 Networks in the real world ...... 100 3.5.4.1 A taxonomy of network classes ...... 100 3.5.4.2 Fuzzy boundaries ...... 101 3.5.4.3 Social semantic networks ...... 101 3.5.5 Network research ...... 102 3.5.5.1 The network perspective ...... 102 3.5.5.2 Information studies ...... 103 3.5.5.3 Creativity and innovation studies ...... 106 3.5.6 Network methods ...... 107 3.5.6.1 Historical context ...... 107 3.5.6.2 Approach ...... 108 3.5.6.3 Model ...... 108 3.5.6.4 Mode ...... 110 3.5.6.5 Measures of degree ...... 111 3.5.6.6 Debates ...... 116 3.5.7 Inferential statistical network modelling ...... 120 3.5.7.1 Random graph models ...... 120 3.5.7.2 Network topology inference ...... 122 3.5.7.3 Network motif detection ...... 124 3.5.7.4 Potential for indirect network comparison . . . . 127 3.6 A multimethodological framework ...... 128 3.6.1 Re/orientation ...... 128 3.6.2 Multimethodology ...... 128 3.6.3 Shared themes ...... 129 3.6.4 The framework in a ‘nutshell’ ...... 130

xvi CONTENTS

4 Research Design 131 4.1 Research aims and objectives ...... 131 4.1.1 Methodological concerns and objectives ...... 131 4.1.2 Theoretical aim of the research ...... 132 4.1.3 Objectives designed to address the theoretical aim . . . . . 132 4.1.3.1 Objective set one—contextual elements ...... 132 4.1.3.2 Objective set two—event structures ...... 133 4.2 Study data ...... 134 4.2.1 The Citation Classics dataset ...... 134 4.2.2 The Citation Classics as phenomenological data ...... 136 4.2.3 Serendipity in the Citation Classics ...... 137 4.2.4 Advantages and limitations of the dataset ...... 138 4.2.5 A subpopulation of serendipity narratives ...... 140 4.2.6 Sampling the serendipity subpopulation ...... 145 4.3 Overview of the research methods ...... 146 4.3.1 Mixed methods design ...... 146 4.3.2 Study stages ...... 148 4.3.3 The pilot studies ...... 149 4.4 Event structure coding ...... 151 4.4.1 Event structure as normal form ...... 151 4.4.2 Use of narrative analysis ...... 152 4.4.3 Use of relational text analysis ...... 154 4.4.3.1 What is relational text analysis? ...... 154 4.4.3.2 Relational text analysis coding choices ...... 155 4.4.4 Two approaches to modelling event structure ...... 161 4.4.5 Advantages of the two model approach ...... 162 4.4.6 A weak (and weaker) semantic approach ...... 164 4.4.7 Overview of event structure coding decisions ...... 165 4.5 Network analysis ...... 167 4.5.1 Networks as translations of event structures ...... 167 4.5.2 Network methods decisions as predetermined ...... 167 4.5.3 Some comments on working with networks ...... 167 4.5.4 From abstract constructs to concrete measures ...... 170

xvii CONTENTS

4.6 Statistical analysis ...... 173 4.6.1 Use of descriptive statistics ...... 173 4.6.2 Model difference and correlation ...... 175 4.7 Research quality ...... 176 4.7.1 Quality issues ...... 176 4.7.2 An Integrative Framework for Inference Quality ...... 177 4.8 Software ...... 182 4.8.1 Pragmatics ...... 182 4.8.2 R ...... 182 4.8.3 PAJEK ...... 183 4.8.4 EXCEL2PAJEK ...... 184 4.8.5 ORA ...... 187 4.8.6 FANMOD ...... 187

5 Data Collection and Analysis 191 5.1 Timeline ...... 191 5.2 Contextual data ...... 192 5.2.1 Collection ...... 192 5.2.2 Analysis ...... 192 5.2.2.1 Classification and pattern data ...... 192 5.2.2.2 Filter data ...... 193 5.2.2.3 Process interpretation data ...... 193 5.2.2.4 Discipline data ...... 193 5.2.2.5 Rich descriptive data ...... 193 5.3 Collection of the event structures ...... 194 5.3.1 Overview of the collection process ...... 194 5.3.2 Narrative proper ...... 195 5.3.3 Model one ...... 195 5.3.4 Model two ...... 202 5.4 Pre-analysis network manipulation ...... 205 5.4.1 Adjustments to model one ...... 205 5.4.2 Adjustments to model two ...... 206 5.4.3 Summary of model similarities/differences ...... 206

xviii CONTENTS

5.5 Generic analysis procedures ...... 208 5.5.1 Network analysis ...... 208 5.5.2 Descriptive statistics ...... 212 5.6 Analysis related to complexity ...... 213 5.6.1 Event structure magnitude ...... 213 5.6.2 Event structure cohesion ...... 213 5.6.3 Event structure event contexts ...... 214 5.6.4 Semantic triplet participant attributes ...... 214 5.7 Analysis related to load ...... 214 5.7.1 Pushed, pulled and total load ...... 214 5.7.2 The activity and passivity of ego ...... 215 5.8 Analysis related to interaction ...... 216 5.8.1 Interaction and sequence ...... 216 5.8.1.1 Two types of interaction ...... 216 5.8.1.2 Analysis of true interaction ...... 217 5.8.1.3 Analysis of dependent interaction ...... 218 5.8.2 Local interaction patterns ...... 218 5.8.3 Global interaction patterns ...... 224 5.9 Comparison of the network models ...... 225 5.9.1 Visual inspection ...... 225 5.9.2 Statistical tests ...... 226

6 Descriptive Inferences 231 6.1 Why describe? ...... 231 6.2 The study’s inference process ...... 232 6.2.1 Inference as outcome and process ...... 232 6.2.2 Multiple levels and multiple steps ...... 233 6.2.3 Levels of inference ...... 233 6.2.4 Steps in the inference process ...... 234 6.2.5 Restatement of questions and aims ...... 235 6.3 Contextual, process-related elements of serendipity ...... 237 6.3.1 Classifications of lived experience ...... 237 6.3.1.1 Occurrence and associated outcomes ...... 237

xix CONTENTS

6.3.1.2 Campanario, van Andel, Foster and Ford . . . . . 238 6.3.1.3 Cross-typology patterning ...... 244 6.3.2 Sociological elements as obstacles ...... 246 6.3.2.1 Sociological filters in the literature ...... 246 6.3.2.2 Sociological filters in lived experience ...... 246 6.3.3 Retrospective interpretations and sensemaking ...... 248 6.3.3.1 Intention versus realisation ...... 248 6.3.3.2 Recognition ...... 250 6.3.3.3 (Non)linearity ...... 250 6.3.3.4 Multiple disciplinarity ...... 251 6.4 Phenomenological event structures of serendipity ...... 255 6.4.1 Event structure description ...... 255 6.4.1.1 Visualisations ...... 255 6.4.1.2 Magnitude ...... 256 6.4.1.3 Cohesion ...... 264 6.4.1.4 Event contexts ...... 268 6.4.1.5 Participant attributes ...... 272 6.4.2 Relational aspects of process ...... 278 6.4.2.1 Ego’s load ...... 278 6.4.2.2 Event structure-level load ...... 290 6.4.2.3 Ego’s activity ...... 294 6.4.2.4 Interaction ...... 301 6.4.2.5 Interaction patterns ...... 309 6.4.3 Evidence for normality ...... 319 6.4.3.1 Normal structure ...... 319 6.4.3.2 Normal load ...... 322 6.4.3.3 Normal patterns of interaction ...... 327 6.5 A descriptive profile of process in serendipity ...... 330

7 Credibility Audit 345 7.1 Introduction to the audit ...... 345 7.2 Design quality ...... 346 7.2.1 Design suitability ...... 346

xx CONTENTS

7.2.2 Design fidelity ...... 347 7.2.3 Within-design consistency ...... 351 7.2.4 Analytic adequacy ...... 352 7.3 Interpretive rigour ...... 355 7.3.1 Interpretive consistency ...... 355 7.3.2 Theoretical consistency ...... 355 7.3.3 Interpretive agreement ...... 356 7.3.4 Interpretive distinctiveness ...... 358 7.3.5 Integrative efficacy ...... 360 7.3.6 Interpretive correspondence ...... 360 7.4 Conclusions of the audit ...... 360

8 Contributions and Prospects 363 8.1 One minute overview ...... 363 8.2 Transferability ...... 364 8.2.1 To whom? ...... 364 8.2.2 In what context? ...... 366 8.2.3 Under what circumstances? ...... 368 8.3 Contributions ...... 370 8.3.1 Serendipity theory ...... 370 8.3.2 Serendipity research ...... 370 8.4 Prospects ...... 371 8.4.1 Recommendations for practice ...... 371 8.4.2 Further research ...... 372 8.5 Reflections on unintended consequences ...... 374

Bibliography 377

Appendices 433

Appendix A The LATEX packages used in the study 435

Appendix B Citation Classics count 437

xxi CONTENTS

Appendix C Estimation of unique Citation Classics count 441

Appendix D Citation Classics in the study sample 443

Appendix E Survey instrument used to collect the contextual data 449

Appendix F Model one coding walk-through 455

Appendix G Model two coding walk-through 465

Appendix H Model one supplementary coding rules 469

Appendix I Estimation procedures for absolute total load 473

Appendix J Estimation procedures for activity 475

Appendix K Multidimensional scaling procedures 479

Appendix L Campanario/Foster and Ford interrelations 483

Appendix M Journal title derived narrow disciplines 485

Appendix N Multidimensional scaling matrices 487

xxii CONTENTS

Appendix O Pilot stage hand-drawn event structure 491

Appendix P Research-related publications and presentations 493

xxiii CONTENTS

xxiv List of Figures

1.1 An analogy: from islands to an archipelago ...... 15

2.1 Wilson’s (1999) Nested Model ...... 34

3.1 Franzosi’s (2010) Model of Narrative Elements ...... 78 3.2 A network ...... 94 3.3 A directed network ...... 96 3.4 A weighted directed network ...... 96 3.5 An ego network extracted from a complete network ...... 109 3.6 Degree in undirected and directed networks ...... 113 3.7 The sixteen possible patterns for a three-vertex directed graph . . 126

4.1 Study keyword search method ...... 143 4.2 The study’s parallel conversion mixed methods research design . . 147 4.3 The nine stages of the study ...... 150 4.4 PAJEK’s user interface ...... 184 4.5 A PAJEK .net file ...... 185 4.6 A PAJEK .net file with added time event records ...... 186 4.7 ORA’s user interface ...... 188 4.8 FANMOD’s user interface ...... 189

5.1 An example of coded model one event structure data ...... 197 5.2 Model one coded data prepared for import into EXCEL2PAJEK ... 201 5.3 The EXCEL2PAJEK settings used to import the model one data . . 201 5.4 Example model two event structure lists and matrices ...... 204 5.5 A workspace loaded in ORA ...... 209

xxv LIST OF FIGURES

5.6 Behind the Generate Reports. . . button ...... 210 5.7 The selection of report parameters ...... 211 5.8 Choices for report output ...... 212 5.9 A FANMOD input file ...... 219 5.10 The FANMOD setup used in the study ...... 220 5.11 The FANMOD export options used in the study ...... 221 5.12 An example of FANMOD’s HTML output ...... 222 5.13 A visual comparison of paired networks ...... 227

6.1 Magnitude for models one and two ...... 257 6.2 Magnitude distribution for models one and two ...... 258 6.3 Magnitude correlation for models one and two ...... 263 6.4 Cohesion for models one and two ...... 264 6.5 Cohesion distribution for models one and two ...... 265 6.6 Cohesion correlation for models one and two ...... 266 6.7 Event context counts for models one and two ...... 269 6.8 Event context count distributions for models one and two . . . . . 270 6.9 Event context correlation for models one and two ...... 271 6.10 Attribute data for models one and two ...... 273 6.11 Attribute data distributions for models one and two ...... 274 6.12 Median attribute percentages for models one and two ...... 276 6.13 Attribute distributions for models one and two ...... 277 6.14 Ego’s absolute pulled/pushed load ...... 279 6.15 Estimates of ego’s absolute total load ...... 281 6.16 Distributions of ego’s estimated absolute total load ...... 282 6.17 Ego absolute total load estimate correlation ...... 284 6.18 Ego’s normalised pulled/pushed load ...... 285 6.19 Ego/average normalised total load ...... 288 6.20 Event structure-level normalised total load ...... 291 6.21 Comparison of ego’s absolute load and activity ...... 295 6.22 Comparison of ego’s normalised load and activity ...... 298 6.23 Ego’s direct interaction ...... 304 6.24 Attribute distribution in the interaction ego networks ...... 306

xxvi LIST OF FIGURES

6.25 Changes in attribute with type of direct interaction ...... 307 6.26 Clustering of dominant motifs ...... 313 6.27 The four dominant motifs ...... 315 6.28 Clustering of event structures and motifs ...... 318 6.29 Quantile-quantile plots of the ‘person’ attribute data ...... 321 6.30 Distribution of ego’s summed pulled and pushed load ...... 323 6.31 Quantile-quantile plot of ego’s summed pulled and pushed load . . 324 6.32 Quantile-quantile plots of ego’s normalised total load ...... 325 6.33 Distribution of ego/average normalised total load ...... 326

F.1 The second pass model one coding worksheet for Cleaver (1981) . 461

G.1 The second pass model two element lists for Cleaver (1981) . . . . 466 G.2 The AA matrix for Cleaver (1981) ...... 466 G.3 The AE and EE matrices for Cleaver (1981) ...... 467 G.4 The AK matrix for Cleaver (1981) ...... 468

O.1 An early hand-drawn event structure ...... 492

xxvii LIST OF FIGURES

xxviii List of Tables

3.1 A participant analysis table ...... 87 3.2 A binary matrix ...... 95 3.3 A matrix for a directed network ...... 97 3.4 A matrix for a weighted directed network ...... 97

4.1 Keyword search results ...... 142 4.2 Relational text analysis coding decisions ...... 157 4.3 Relational text analysis coding decisions applied in the study . . . 166 4.4 Network methods decisions ...... 168 4.5 Matches between serendipity constructs and network measures . . 172 4.6 Integrative Framework for Inference Quality ...... 178

6.1 Descriptions of van Andel serendipity patterns ...... 240 6.2 Descriptions of Foster and Ford serendipity classes ...... 242 6.3 Class interrelations between Campanario and Foster and Ford . . 245 6.4 Intention versus realisation patterns ...... 249 6.5 Narrower discipline spread based on metadata ...... 253 6.6 Complexity data summary ...... 259 6.7 Attribute data summary ...... 275 6.8 Ego’s absolute load data summary ...... 280 6.9 Ego’s normalised pulled/pushed load data summary ...... 286 6.10 Ego/average normalised total load data summary ...... 289 6.11 Event structure-level normalised total load data summary . . . . . 292 6.12 Ego’s absolute activity data summary ...... 296 6.13 Ego’s normalised activity/passivity data summary ...... 299

xxix LIST OF TABLES

6.14 Interaction data summary ...... 303 6.15 True/dependent attribute data summary ...... 308 6.16 Topology inference scores by random graph model ...... 310 6.17 The ten motifs involving ego ...... 312

B.1 Count of the Citation Classics in the online ...... 439

M.1 Narrower discipline spread based on journal title ...... 485

N.1 Matrix underlying Figure 6.26 ...... 488 N.2 Matrix underlying Figure 6.28 ...... 490

xxx Chapter 1

Exposition

This chapter sets forth the principal themes of the research. The discussion considers the motivations for the study: specific triggers and key theoretical in- fluences. In respect of the latter, time is spent introducing serendipity, especially as a pattern and sociological phenomenon. Attention then turns to the aims and contributions of the research. Practical information about the thesis concludes the chapter.

1.1 Sixty second summary

This study describes an information behaviour in context: serendipity in research. Specifically, the description details process in serendipity, outer experiences of doing and happening rather than inner experiences of thoughts, motivations or feelings. The research asks, what events of doing and happening do people recount when they talk about their experiences of serendipity and, how do they make sense of the circumstances surrounding these events? Significantly, the research seeks to describe these recounted events in relation to one another. This relational perspective is important because outer experience of serendipity is not a static state, but rather, a process, comprising linked events and their associated circumstances. The question of how to capture this process systematically is nontrivial. As such, the research is motivated by methodological concerns as much as, if not more than, theoretical concerns.

1 1. EXPOSITION

The research contributes to the understanding of serendipity as a complex phenomenon: methodologically, by bringing fresh perspectives and new tools to serendipity research; and, theoretically, by introducing detailed and precise descriptions of process to serendipity theory.

1.2 Prologue to the research

1.2.1 A problematic keyword

During studies for my librarianship qualification, I undertook research that res- ulted in a model of serendipity in information seeking (McBirnie, 2008a). The research led to the opportunity to publish in a journal (McBirnie, 2008b), a novel and exciting experience for me at the time. As part of the pre-publication pro- cess, I supplied a list of carefully-considered keywords to the editors, along with the final version of my paper. Several months later, the journal issue was published. There was the paper— in print! However, as I scanned down the first page, a change in the keyword list caught my eye: the editors’ addition of individual behaviour. I knew im- mediately that this new keyword bothered me as a label for my research; the question was, why? I had certainly reported findings based on individual cases of serendipity. Surely then, labelling serendipity as individual behaviour should be unproblematic? With this question in mind, I revisited my research data and findings. Three points stood out. The first was that the individualistic aspects of serendipity were important. Crucially, this importance implied that any meaningful approach to serendipity must acknowledge the lived, experienced and personal nature of the phenomenon. The second point was that the research participants consistently described serendipity relationally: they reported serendipity in terms of interlinked ele- ments. From this perspective, serendipity seemed concerned with individuals in relations, not individuals per se. The final point was that, despite differences in the details, the participants’ accounts seemed somehow similar. Indeed, the study outcome, the model of

2 1.2 Prologue to the research serendipity in information seeking, was based on these perceived similarities. Might the similarities be markers of normal (i.e. typical or average) aspects of serendipity? These three points led me to conclude that, although the individual behaviour keyword was not technically ‘wrong’, it was problematic: explicitly highlighting serendipity as individual behaviour implicitly forces other relevant understand- ings of serendipity, such as relational or normal interpretations, into the back- ground. What insight into serendipity might be gleaned from placing these other interpretations of serendipity under the empirical spotlight?

1.2.2 An intriguing dataset

Shortly after completing my first study of serendipity, I came across a paper that I had overlooked: Campanario (1996). Campanario uses the Citation Classics to study incidences of serendipity in research. Section 4.2 provides extensive discussion of the Citation Classics; suffice it to say here that a Citation Classic is a one-page first-person narrative that provides the author of a highly-cited research paper the opportunity to discuss events and experiences surrounding the research project not reported in the formal research report. Significantly, some authors use their Citation Classics to recount stories of serendipity. Campanario examines only part of the Citation Classics dataset; however, in his recommendations for future research, Campanario (1996, p. 22) encourages others interested in serendipity to work with “the very special first hand testi- monials included in the Citation Classics database” and stresses the potential usefulness of “a detailed exploration of the whole Citation Classics database”. Campanario is not alone in his belief in the value of the Citation Classics dataset for empirical research: Eugene Garfield, who has a close connection with the Citation Classics (see section 4.2.1), notes the usefulness of the dataset for researchers interested in studying the research process. Garfield (1993d,e) high- lights the potential for content analysis of the Citation Classics, especially the digital version of the dataset. Garfield (1993a, p. 319) also argues that such analysis could contribute to understandings of the “human side of discovery”.

3 1. EXPOSITION

1.2.3 Studying serendipity

Serendipity research has a recognised problem—or rather, set of problems. Sun, Sharples and Makri (2011, ‘Introduction’) take stock of the situation:

There is much . . . interest in this subject [serendipity] from different disciplines . . . There are no cohesive theories about serendipity, as it is associated with a wide range of meanings . . . There is little consensus on theoretical frameworks . . . [and] a lack of empirical evidence to demonstrate the nature of serendipity.

While some of these problems stem from conceptual issues (see section 1.3), others relate to methodological concerns. Serendipity is considered a difficult topic to investigate empirically (Bj¨orne- born, 2004, p. 6–7; Erdelez, 2010; Toms, 2010). Two approaches are used most commonly to study serendipity (Rubin, Burkell and Quan-Haase, 2011): qualitat- ive, interview-based research (for example, Foster and Ford, 2003) and controlled experiments (see Erdelez, 2004; Toms, 2000a,b; Toms and McCay-Peet, 2009).1 Recent explicit calls for new methods in serendipity research (Erdelez, 2010; Toms, 2010) suggest that the more familiar approaches may have reached the limits of their potential contribution to the further advancement of the serendipity research field. Put bluntly, serendipity researchers are stuck in a rut.2 As Cunha, Clegg and Mendon¸ca(2010, p. 328) note, “A major challenge for researchers will consist in finding adequate methods to do . . . research on a process that is both unpredictable and unsought”.

1Design interventions also exist that seek to facilitate and encourage serendipity (for ex- amples, see Campos and de Figueiredo, 2001; Eagle, 2004; Eagle and Pentland, 2004; Hangal, Nagpal and Lam, 2012). These applied projects can be placed into the experimental category for the purposes of the discussion here. 2Note that the recent Information Research, Special Issue on the Opportunistic Discovery of Information (see Erdelez and Makri, 2011) suggests that this situation is beginning to change.

4 1.3 Serendipity

1.3 Serendipity

1.3.1 Both a word and a pattern

The word serendipity, “. . . making discoveries, by accidents and sagacity, of things which [one is] not in quest of . . . ” (Walpole in Merton and Barber, 2004, p. 2), was born into the English language on January 28, 1754, coined in a letter by Horace Walpole, the British correspondent, author and politician.3 As Remer (1965, p. 15) pointedly notes, “About this fact [the coinage] there is a unan- imity of agreement . . . that is unique vis-`a-visalmost any other fact relating to serendipity”. Likely due, not only to the inept examples Walpole offered as introductions to his fledgling word (Campa, 2008; Remer, 1965), but also to the changes in se- mantics and use that have affected serendipity (Merton and Barber, 2004; Remer, 1965), the question ‘what is serendipity?’ is not easily answered. As a contem- porary empirical concept, serendipity is, at worst, “. . . merely an attractive label designating ever more diverse . . . phenomena . . . ” (Merton, 2004, p. 294), and at best, assigned “. . . a kind of continuum of understandings . . . ” (Andr´e,Schraefel, Teevan and Dumais, 2009, p. 313). This brief introduction to serendipity does not intend to become bogged down in matters related to definition; many, the author included, (McBirnie, 2008a; Merton and Barber, 2004; Remer, 1965) have spent ink pursing that subject, yet the lack of consensus remains. Instead, the present discussion aims to highlight a sociological stance that views serendipity “. . . both as a pattern of behavior and as a word” (Merton and Barber, 2004, p. 20). The American sociologist, Robert K Merton formally introduced serendipity “. . . into the vocabulary of the social sciences . . . ” (Merton and Barber, 2004, p. 140) in two texts that explore the interconnection between sociological theory and empirical research (Merton, 1945, 1948).4 Perry and Edwards (2010, p. 858, emphasis in original) describe Merton’s stance on serendipity:

3Horace was also the youngest son of Sir Robert Walpole, the statesman generally regarded as the first British Prime Minister. 4In the 1945 paper, the reference to serendipity is in a footnote; however, serendipity moves up to the main discussion in the 1948 paper.

5 1. EXPOSITION

Robert Merton makes a subtle differentiation between serendipity and serendipity pattern. Serendipity is the discovery of an unsought find- ing, whereas a serendipity pattern involves observing a surprising and irregular finding, recognizing that is is potentially strategic, and using it to develop a new theory or advance an existing theory.

Prior to Merton, the word serendipity had mostly remained in the care of collectors, literary scholars and lexicographers, although evidence (Rivers, 1920) indicates that, early in the 20th century, the word began to circulate in the med- ical profession, especially amongst those interested in understanding the process of scientific discovery (see Cannon, 1945). Merton claims that he took a detailed interest in serendipity for two reasons (Merton and Barber, 2004, p. 141):

1. his work on the unanticipated consequences of purposive social action (Mer- ton, 1936); and,

2. his belief that neologistic sociological terms should be clearly and tightly conceptualised initially and remain so in ongoing use (Campa, 2008; Merton, 1987, p. 4; Portes, 2010, p. 33–34).

However, a third reason also presents—Merton’s interest in the sociology of sci- ence (see Merton, 1973),5 especially the process of research. It is this third reason that provides the most useful clue to understanding Merton’s stance on serendipity. Yes, in some contexts, serendipity can be con- ceived usefully, properly and simply as a word. An appropriate analogy from information studies for this interpretation of serendipity might be the discovery by an information seeker of a useful, but unexpected, result in the first top ten links returned by a Google search aimed at a different topic. The serendipity pattern, however, unfolds in a richer context, specifically the research process, and results in a significant outcome—the development or exten- sion of theory—with application potential above and beyond localised individual needs. The discovery of penicillin is often presented as a ‘quick and dirty’ example of the serendipity pattern (Rosenman, 2002, p. 189).6

5Merton uses ‘science’ as a blanket term to cover both the natural and social sciences. 6It is worth noting that Cunha et al. (2010, p 325) term this example the “Fleming Myth”.

6 1.3 Serendipity

Long after its initial appearance in the sociology literature, the serendipity pattern continues to be recognised in sociological circles for its potential contri- bution to the theory and practice of research (for examples, see Fine and Deegan, 1996; Perry and Edwards, 2010). Yet, despite the “Mertonian influence on in- formation studies” (Garfield, 2004a, p. 57), explicit references to the serendipity pattern have not featured frequently in the information studies literature;7 in- deed, citations to Merton in the information studies literature on serendipity are rarer full-stop than might be expected given his role in establishing serendipity as a formal concept in the social sciences.8 There is, however, a notable exception to this general trend. Garfield (2004a, p. 54) highlights Foster and Ford’s (2003, p. 322) citation of Merton in their paper on serendipity in information seeking:

In the social sciences, serendipity appears in a . . . “connection build- ing” role. Merton describes this process within sociological research ....

This exception may, in the long run, prove important because the frequent citation of Foster and Ford (2003) by serendipity researchers demonstrates the continuing importance of this paper to serendipity research; however, to date, there is little, if any, evidence that Foster and Ford’s passing citation to Merton has stimulated an interest in the serendipity pattern per se amongst researchers in information studies. A point noted by Garfield (2004a, p. 54) hints at a possible reason for why the serendipity pattern continues to remain out of view in information research:

This label is intended to make the point that, although Fleming made the initial discovery, the story does not stop there; it was “the use of those ideas by a group of people (the so-called Oxford team) that gave them full use in medical practice” (Cunha et al., 2010, p. 325, emphasis added; see also Meyer, 2007, p. 78–81). As discussed above, a discovery that fits the serendipity pattern serves to advance theory (or, as with the case of penicillin, practice). 7A recent exception is McCay-Peet and Toms’s (2011b) citation of Merton’s (1948) discussion of the serendipity pattern. 8This claim is based on the author’s review of the bibliographies of information studies papers on serendipity written within the last fifteen years. As is suggested later (see section 8.4.2), a systematic citation analysis of this question might make a useful and significant contribution.

7 1. EXPOSITION

The reference [Foster and Ford (2003, p. 322)] is to the 1968 edi- tion of Merton’s Social Theory and Social Structure (STSS), where he discusses the concept of “Serendipity Patterns” in some detail. Unfor- tunately, like most references to STSS, the citation is pageless, that is, it fails to cite chapter V or pages 157–162 explicitly.

All of this fuss over ‘serendipity the word’ versus ‘serendipity the pattern’ may seem somewhat irrelevant to the study of serendipity in information re- search; however, as section 2.2 argues, serendipity research, even that focused specifically on information retrieval or system design, is increasingly recognising the implications of the complexity and richness of serendipity as an information behaviour in context. ‘Serendipity the word’ (e.g. as understood in the Google analogy presented above) may never properly accommodate the complexity and richness of the concept information researchers seek to grasp. If awareness of Mer- ton’s interpretation of serendipity as a pattern was more widespread, then this might lead to more useful and mature conceptualisations of serendipity within the information studies field.

1.3.2 Accident and sagacity

Whether understood as a word or as a pattern, serendipity involves accident and sagacity—not one or the other, but both. To stress this point for her non- specialist audience,9 Alcock (2008, p. 13, 15) differentiates between serendipity lite, “the workings of chance and fortune”, and serendipity strong, “the stronger brew of accidents and sagacity”. The present research focused exclusively on serendipity strong. The idea of serendipity as the product of interactions between accidents and sagacity is clearly articulated in the literature. Merton (Tilly, 2010, p. 54) por- trays serendipity as an interaction of environment and cognition. Fine and Dee- gan (1996, p. 9) note that “serendipity involves planned insight coupled with un- planned events”. McBirnie (2008a,b) terms serendipity’s combination of events

9Discussions of serendipity appear in a range of literature, from academic papers to popular essays, , opinion pieces and personal anecdotes. In recognition of the pragmatic stance of the present study (see section 1.4.1), the literature review was not restricted to empirical peer-reviewed material, an approach that created both opportunities and challenges (see section 2.1.2).

8 1.3 Serendipity and insight, process/perception duality. Crucially, it is a specific10 combination of events and insight that results in serendipity: “. . . for serendipitous outcomes to occur, both the internal and external conditions must be right” (Makri and Blandford, 2011, ‘Themes that arose from discussion’, emphasis in original).

1.3.3 As sociological

Although serendipity involves both accident and sagacity, it is not uncommon to find these aspects explored separately in the literature; for example, Andr´e et al. (2009) divide their review of serendipity into discussions of chance elements and of sagacity elements. Related to this approach, is the conceptualisation of serendipity as either psychological or sociological. The idea of a psychological or sociological approach to a concept should sit comfortably with those involved in information studies, a research area influ- enced by “sociology, mass communication, and psychology” that represents an overlap between the psychological and the sociological (Case, 2007, p. 149–151). Indeed, a recent meta-synthesis (Urquhart, 2011) confirms the attention to both psychological approaches and constructivism in information behaviour research. There is some evidence, as Olsson (2005, ‘Social approaches to information behavior’) argues, that “cognitive ‘internal approaches’ to the study of informa- tion behavior” have prevailed over sociological approaches in the recent history of information research. This said, Olsson (2005, ‘Social approaches to information behavior’) goes on to cite Pettigrew, Fidel and Bruce (2001, p. 54):

Approaches to studying information behavior that focus on social con- text emerged slowly during the early 1990s and are becoming increas- ingly prominent. . . . social approaches were developed to address in- formation behavior phenomena that lie outside the realm of cognitive frameworks.

Despite this social turn, some information research (for examples, see Hein- str¨om,2005; Quinn, 2003) remains focused on the influence of personality, mo- tivation and other internal, psychological aspects of information behaviour; as

10Note that the term specific as used here should not be confused conceptually with the term unique.

9 1. EXPOSITION

Urquhart (2011, ‘Findings’, emphasis added) remarks, “. . . interest in psycholo- gical contributions to information behaviour research is still strong”. Although examples do present that explore sociological aspects of serendipity (see Fine and Deegan, 1996; Merton and Barber, 2004), there has been a notice- able tendency in serendipity research, both in information studies (see Erdelez, 1999; McCay-Peet and Toms, 2011b) and in other disciplines (see Andr´e et al., 2009; de Rond, 2005), to focus on the ‘sagacity’ component of serendipity, con- ceived as the cognitive, internal, psychological aspects of the phenomenon. Of course, these psychological aspects are important; however, there is no logical reason that the sociological aspects, the ‘external conditions’ (Makri and Bland- ford, 2011), deserve any less attention. As Thagard and Croft (1999, p. 136, emphasis added) note, in serendipity “. . . physical and social processes introduce unexpected elements into the cognitive process of problem solving”. A possible reason that the sociological conceptualisation of serendipity is sometimes treated as the ‘poor cousin’ of the psychological may stem from the (mis)understanding of serendipity’s ‘accident’ component, its interpretation as mere chance, Alcock’s (2008) serendipity lite. De Rond (2005) argues that the sagacity component rather than the accident element of serendipity is key. He also proclaims “human agency” rather than chance as the proper “focus of atten- tion” for serendipity research (de Rond, 2005, p. 2). But does human agency only relate to sagacity? And, is serendipity’s ‘accident’ really the same as chance? Merton (2004, p. 257), who chaperoned the concept during its introduction to the social sciences (see section 1.3.1), argues strongly against the idea of serendip- ity as a “faculty, capacity, gift, or talent”. Merton (2004, p. 257–258, emphasis in original) rejects any understanding of the phenomenon that is limited to psy- chological aspects:

By thus conceptualizing serendipity as a personal disposition of the discoverer, such definitions tacitly claim that psychology can exhaust- ively and exclusively account for the complex phenomenon of serendip- itous discovery. In short, these recurrent definitions lead to a reductive reconceptualization. They lead by strong implication only to psycho- logical questions that are then boiled down to this one: “What are the special psychological attributes of the individuals who make serendip- itous discoveries?”

10 1.3 Serendipity

Instead, Merton urges a focus on the ‘external conditions’ of serendipity, a per- spective that views serendipity “ . . . as an event that occurs in . . . ‘serendipitous sociocognitive microenvironments’ ” (Tilly, 2010, p. 54). Merton’s (2004, p. 261) conceptualisation of ‘accident’ understands that component as much more than chance, considering it as key to the serendipity event:

. . . Walpole’s intuitive coinage of serendipity as “accidental sagacity” had in effect, if unwittingly, implied the sociological importance of the specific and instinctive component of “accident” along with the psy- chological individual attribute of the generic component of “sagacity.” . . . “sagacity” is a . . . psychological attribute that is involved in any sort of discovery, and not specifically of the serendipitous sort. It is the other component, “accident,” the unanticipated circumstance that provides the opportunity for putting that sagacity to work, that gives “serendipity” its altogether distinctive meaning. And that brings us to the sociological perspective. The sociological analysis of serendipity presents us with a special case of the generic sociological problem of how differing social environments increase or decrease the probability of particular behavioral outcomes. (That is known in the sociological trade as the “agent-social structure problem.”)

This argument illustrates how Merton views both ‘human agency’ and ‘acci- dent’ very differently from de Rond (2005). In fact, Merton’s stance on serendipity reflects his more general interest in social structure, that is, in “specifying the antecedent structural conditions that give rise to social phenomenon” (Sztompka in Merton, 1996, p. 10). It is this structural analysis approach that Merton brings to bear on his seminal work in the sociology of scientific discovery (see Calhoun, 2010; Merton, 1973, 1996). Merton’s sociological conceptualisation of serendipity, a phenomenon which he classes as a sub-type of discovery (Merton, 2004, p. 261), is grounded firmly in this same structural analysis tradition.

1.3.4 Related concepts

The literature frequently links serendipity to certain other concepts. Section 1.3.1 introduced Alcock’s (2008) serendipity lite, which broadly encompasses the concept of chance. Other concepts that the literature relates to serendipity in- clude: synchronicity and coincidence (Boden, 2004; Cunha et al., 2010), luck

11 1. EXPOSITION

(Roberts, 1989), abduction (Merton, 1968; van Andel, 1994), error (Eco, 1999), bricolage (Cunha et al., 2010), imagination (Barber, 1953) and improvisation (Cunha et al., 2010; Laborde, 2009; McBirnie, 2008a,b). Two concepts appear most often alongside serendipity in the literature: serendipity may be linked to creativity (Bawden, 2011; Eaglestone, Ford, Brown and Moore, 2007; Foster and Ford, 2003), innovation (de Rond, 2005; Johnson, 2010; Koenig, 2000) or both (Anderson, 2011; Thagard and Croft, 1999). Although a tendency to disguise serendipity “under other labels” is noted in the literature (Cunha et al., 2010, p. 320; see also de Figueiredo and Campos, 2001),11 many authors make a concerted effort to differentiate serendipity from related concepts; for example, authors may place serendipity in a supporting role for creativity (Anderson, 2011; Cunha et al., 2010; de Figueiredo and Campos, 2001; Mueller, 1990; Ram, Wills, Domesck, Nersessian and Kolodner, 1995), an approach that relates the two concepts yet allows them to remain distinct. Possible explanations exist for the overlap between creativity and serendipity in the literature. Like serendipity, creativity can be viewed as a psychological (see Hewett, Czerwinski, Terry, Nunamaker, Candy, Kules and Sylvan, 2005) or as a sociological concept (see section 3.5.5.3). Models of creativity (Isaksen, 1987; Lee, Theng and Goh, 2005) include process, product and contextual elements, as do models of serendipity (Cunha, 2005; Erdelez, 2004, 2005; McBirnie, 2008a; McCay-Peet and Toms, 2010). Nonetheless, a closer look, reveals at least one crucial distinction between serendipity and creativity. Merton highlighted the significance of the accident component of serendipity (see section 1.3.3). As Cunha et al. (2010, p. 323) pointedly remark, “. . . there may be nothing accidental in creativity . . . ”. Given the strong logic of this conceptual argument against treating serendipity and creativity as synonymous, it seems reasonable to assume that any lack of differentiation in the literature stems, at least in part, from an unrecognised need “to ‘establish’ the phenomenon” (Merton, 1987, p. 4). Concerns related to

11The trend can be seen in some information research, where the terms serendipity and cre- ativity are employed with little concern for differentiating them as distinct concepts for the reader (for a recent example, see Bawden (2011)). Other examples are found in the applied business and management literature (for examples, see Holder and McKinney, 1993; Schum- peter, 2010; Tebbutt, 2007).

12 1.4 Research overview the problem of establishing the phenomenon have been discussed, not only in the general context of information behaviour research (Dervin, Rosenbaum and Urquhart, 2010), but also in the local context of serendipity research (Erdelez and Makri, 2011; Smith and Burns, 2011). In the case of serendipity, even if it is difficult to establish the phenomenon (see section 1.3.1), perhaps it is at least possible to agree on what the phenomenon is not. Certainly within contemporary serendipity research, the call increasingly resounds that serendipity differs from both creativity and innovation: “. . . being distinctive, serendipity should become a subject of research in its own right” (de Figueiredo and Campos, 2001, p. 1).

1.4 Research overview

1.4.1 Pragmatic stance

Teddlie and Tashakkori (2009, p. 7–8) define the philosophical orientation of pragmatism:12

. . . [pragmatism] focuses . . . on “what works” as the truth regard- ing the research questions under investigation. Pragmatism rejects . . . either/or choices . . . advocates for the use of mixed methods in re- search, and acknowledges that the values of the researcher play a large role . . . .

With its “down-to-earth philosophy” (Scott and Marshall, 2009, p. 594), a prag- matic approach is both practical and pluralistic.13 The present research adopted the pragmatic stance of focusing on ‘what works’ for two reasons:

1. the research questions—or, more specifically, the mixed methods approach they demanded; and,

12Space here does not permit anything other than the briefest of introductions to pragmatism. Readers seeking in depth discussions may wish to explore Goodman (2005). 13Classical pragmatism is sometimes presented as a framework, the four Ps of pragmatism: practical, pluralistic, participatory and provisional (see Brendel, 2006, p. 29). For further discussion of this framework and the debates it raises for contemporary practice, see Whetsell and Shields (2011).

13 1. EXPOSITION

2. the choice of serendipity as the research topic.

Teddlie and Tashakkori (2009, p. 7) note that pragmatism is “the philosoph- ical orientation most often associated with MM [mixed methods]”, an approach which advocates “the use of whatever methodological tools are required to an- swer the research questions under study”. As a research topic, serendipity invites a pragmatic approach. Serendipity is recognised both inside and outside of the academy: while academic treatises and empirical research consider serendipity, so do blogs and personal anecdotes. As well, serendipity features in multiple dis- ciplines and in numerous contexts. A pluralistic and practical approach helps to accommodate such diversity.

1.4.2 Motivations and methodological concerns

1.4.2.1 Motivations for the study

Four seemingly unrelated motivations drove the research:

1. questions raised by the author’s previous study;

2. the call for ‘new’ methods in serendipity research;

3. a call to study serendipity in the Citation Classics; and,

4. influences from the literature, especially Merton’s ideas on serendipity.

The study’s methodological concerns emerged from these disparate motivations. An analogy may help the reader to understand this process of emergence.

From islands to an archipelago Imagine each of the four motivations for the research as an island. Figure 1.1 provides a pictorial representation. Note that each island is named after one of the research motivations. At the outset of her travels, the explorer knows only that all of these islands are found in the Sea of Serendipity. However, as she sails amongst the islands, making repeated visits to each, the explorer notices several trends.

14 1.4 Research overview

Figure 1.1: An analogy: from islands to an archipelago - This figure shows each of the four motivations for the study as an island in the sea of serendipity. The explorer sails amongst the islands, noticing the connections she finds between them. The island image used in the figure is an edited version of public domain clip art available in the Open Clip Art Library (2012).

15 1. EXPOSITION

Trend 1: Something found on one island may be found on another; for example, the idea of serendipity as relational resides on both Previous Study and Literature.

Trend 2: Something found on one island may necessitate a journey to another; for example, a visit to New Methods is required to seek out a means to explore the idea of serendipity as relational because the methods native to Previous Study and so widespread on Literature are insufficient for rela- tional approaches.

Trend 3: Something found on one island may complement something found on another; for example, the large dataset found on Citation Classics may offer the potential to address the question of normal description found on Previous Study.

Noting these trends helps the explorer to view the islands from a different per- spective, to see the islands as an archipelago instead of as four unconnected islands.

1.4.2.2 Methodological concerns

As noted in section 1.4.2.1, the study’s methodological concerns emerged from multiple, initially disparate, motivations. Four methodological concerns were identified:

Concern 1: the decision to focus on the description of process (see section 1.4.3.1) in serendipity, and especially, precise and detailed description of process;

(rationale) This concern related to a specific interest in serendipity as sociolo- gical as well as a more general interest in serendipity as outer experience, as an event. The focus on precise and detailed description implied at least some quantitative description. For example, the literature suggests that serendipity is both active and passive, that it involves both ‘pull’ and ‘push’; however, serendipity theory lacks precision concerning the proportion of pull to push. Is the ratio 50 : 50 ? Or, does one normally dominate the other?

16 1.4 Research overview

If so, then by how much? The quantification of concepts such as push and pull may be important, especially in applied interventions, including systems design, that involve programming or the use of algorithms.

Concern 2: the conceptualisation of process in serendipity as relational;

(rationale) This concern stemmed from the sociological perspective on serendip- ity and from the questions raised by the author’s previous research. The relational perspective (see section 3.1) differs fundamentally from other ap- proaches in that individuals are only studied in relations. Such an approach has the potential to offer a view on serendipity invisible to non-relational approaches.

Concern 3: the hypothesis that serendipity is both individualistic and normal;

(rationale) Both the author’s previous research and aspects of serendipity the- ory contributed to the development of this hypothesis by stressing the need to acknowledge the lived, experienced and personal nature of serendipity. This need highlighted the suitability of a phenomenological approach. The theoretical literature, especially material referencing classifications/patterns of serendipity, and the author’s previous empirical findings also suggested the possibility of normality in serendipity. Testing such an idea required a dataset large enough to allow a sample-based approach.

Concern 4: the desire to answer two existing calls: one to study serendipity in the Citation Classics dataset, and the other, to expand the methods used in serendipity research.

(rationale) The use of the Citation Classics complemented both the need for a large dataset to test for normality and the desire, stemming from Merton’s ideas, to investigate serendipity in the context of research. As narratives of personal experience, the Citation Classics were also highly phenomenolo- gical in nature. The involvement of narratives and the conceptualisation of process as relational implied the use of certain methods. The use of these methods, narrative and network analysis, represented a step away from the approaches commonly used in serendipity research.

17 1. EXPOSITION

As noted in section 1.4.1, the pragmatic stance acknowledges that researcher values play a large role in research. The methodological concerns represented the methodological interests of the informed researcher, the questions and aspects that she believed, based on the background information and literature, were im- portant, meaningful and useful to study in relation to serendipity. The label methodological concern (i.e. rather than ‘methodological aim’) was selected to make explicit this involvement of researcher values. A set of four methodolo- gical objectives (see section 4.1.1) served to operationalise the methodological concerns.

1.4.2.3 Significance of methodology

It is a convention of academic research to define the research aim before turning to methodological matters; however, in the case of the present study, not only did the methodological concerns come first, but also they subsequently openly guided the development of the theoretical research aim. Such an approach was appropriate, indeed, necessary, due to the methodological implications intrinsically tied in with the initial motivations for the study. Returning to the island analogy for a moment (see section 1.4.2.1, Figure 1.1), the reader will note that two of the four motivations, the islands of Cita- tion Classics and New Methods, were inherently methodological. The remaining two motivations for the study combined theoretical and methodological elements. This distribution of methodology and theory across the islands highlights a key point: the research was motivated by methodological concerns as much as, if not more than, theoretical concerns. The study’s contribution to new knowledge—not only what we know but also how we know—should be judged accordingly

1.4.3 Key terms

1.4.3.1 Process

This study was concerned with process in serendipity. The focus was on the ‘doing and happening’ in serendipity. This ‘doing and happening’ was captured from the Citation Classics narratives using event-based (i.e. structural, linguistic)

18 1.4 Research overview narrative analysis (see section 3.3.2). As Ricoeur (1984, p. 56) remarks, “ . . . there is no structural analysis of narrative that does not borrow from an explicit or implicit phenomenology of ‘doing something’ ”. More specifically, the study was concerned with the sociological aspects— external actions and events rather than internal feelings, thoughts or motivations— of the ‘doing and happening’ in serendipity. Halliday and Matthiessen (2004, p. 170) note, “The prototypical form of the ‘outer’ experience is that of actions and events: things happen, and people or other actors do things, or make them happen”. In other words, actions and events represent process as opposed to stasis: “Process statements are in the mode of DO or HAPPEN . . . Stasis state- ments are in the mode of IS” (Chatman, 1978, p. 32). Halliday (1994, p. 108) clarifies this linguistic conceptualisation of process:

A process consists, in principle, of three components:

(i.) the process itself; (ii.) participants in the process; and, (iii.) circumstances associated with the process.

In this interpretation of what is going on, there is a doing, a doer, and a location where the doing takes place. This tripartite interpretation of processes is what lies behind the grammatical distinction of word classes into verbs, nouns, and the rest.

Halliday’s use of the term ‘location’ should not be taken too literally. As Franzosi (2010, p. 23) remarks, “For Halliday (1994), the circumstances of a process refer to ‘the location of an event in time or space, its manner, or its cause’ ”. It is also important to note that in any given process statement, both the process itself (i.e. the doing) and the participants may have attributes.14

1.4.3.2 Event structure

As stated in section 1.4.2.2, the research was concerned, not simply with studying process in serendipity, but specifically with studying this process as relational;

14Note also that the number of participants involved in a process statement depends on the transitivity/intransitivity of the verb (see section 3.3.3.4).

19 1. EXPOSITION therefore, for each Citation Classic narrative explored, process statements were studied as dyadic relations (see section 3.1.1), that is as {Participant → Process → Participant} statements termed semantic triplets (see sections 3.3.3 and 3.3.4). Linked together, these process statements formed an event structure, a network of semantic triplets found within the narrative. The research borrowed both the event structure label and the event structure concept from an existing work: Labov’s (2003) study entitled, “Uncovering the event structure of narrative”. Although Labov does not define event structure formally, in practice, he uses the term to label the semantic structure formed by the set of process statements found within a narrative. To avoid confusion, it should be noted that event structures, “relations on sets of events” (Winskel, 1980, p. ii), also feature as semantic models of process in mathematics and computer science (see Winskel, 1987, 1989). In particular, the concurrency theory literature recognises several classes of formal event struc- ture models (Virbitskaite and Dubtsov, 2008, p. 125). Because, as formal models, “. . . they explicitly exhibit the interplay between concurrency and nondetermin- ism”, event structures are especially useful for the study of distributed systems (Dubtsov, 2005, p. 42).

1.4.4 Questions and theoretical aim

1.4.4.1 Research questions

The research questions centered on the components involved in the tripartite in- terpretation of process outlined in section 1.4.3.1: the process itself (i.e. the doing or happening), the participants in the process, and the circumstances associated with the process. Some questions related to the relational event structures formed by process and participant in serendipity:

ˆ What event structures are described in personal accounts of serendipity in research?

ˆ How does description of these event structures contribute to understandings of relational aspects of process in serendipity?

20 1.4 Research overview

ˆ To what extent, if at all, do the described event structures support the idea of average or normal process in serendipity?

Other questions addressed the circumstances associated with process in serendip- ity, especially issues related to context and manner of occurrence:

ˆ How do recounted lived experiences of serendipity relate to classifications and/or patterns of serendipity proposed in the literature?

ˆ What sociological elements are proposed in the literature as obstacles to serendipity? To what extent do recounted lived experiences of serendipity support/refute these propositions?

ˆ How do people retrospectively interpret and make sense of their experiences of process in serendipity above and beyond a basic recounting of events?

1.4.4.2 Theoretical aim

Based on pre-existing methodological concerns (see section 1.4.2.2) and the re- search questions (see section 1.4.4.1), the study aimed to describe process in serendipity, conceived as phenomenological and relational, as recounted by the narrators of a random sample of research narratives drawn from a subpopula- tion of narratives of experiences of serendipity found within the Citation Classics dataset. This aim comprised two elements, each designed to investigate different components of process:

1. the characterisation of selected contextual, process-related elements of the sample narratives (i.e. circumstances associated with process15); and,

2. the description of phenomenological event structures of serendipity extrac- ted from the sample narratives (i.e. process itself and the participants in the process16).

15These associated circumstances are henceforth referred to as ‘contextual’ and/or ‘process- related’ elements due to the potential for semantic misunderstanding involved with the use of the term ‘circumstantial’. 16Note that, because of the relational perspective adopted in the study, these two components were studied in relation rather than individually.

21 1. EXPOSITION

Each element of the theoretical aim was operationalised by a distinct set of the- oretical objectives (see sections 4.1.3.1 and 4.1.3.2).

1.4.5 The experiential stage

Two frameworks—one conceptual, one methodological—supported the experien- tial stage of the research. Key conceptual constructions included: serendipity as a sociological pattern; serendipity as information behaviour in context; and, serendipity as process. From a methodological perspective, both relationalism and phenomenology played fundamental roles in the research. Modelling theory, with its pluralistic and practical implications, was also important to the study. In line with its pragmatic stance, the research followed a parallel conversion mixed methods design: ‘parallel’ because the two components of the theoretical aim required different strands of investigative work; and, ‘conversion’ due to the need to translate the qualitative Citation Classic data into quantitative data. Linked through relational text analysis, a form of content analysis, narrative and network approaches served as the study’s primary methods. Additionally, the research involved survey methods, inferential statistical network modelling, and traditional statistical analysis. As appropriate for a mixed methods study, the research combined qualitative (e.g. narrative analysis), quantitative (e.g. statistical analysis) and inherently mixed (e.g. network analysis) approaches.

1.4.6 Substantive outcomes

The experiential stage led to three substantive outcomes:

1. a descriptive profile of process in serendipity;

2. a network portfolio; and,

3. a research credibility audit.

Given that theoretical and methodological motivations drove the study, all of these outcomes should be considered ‘research results’.

22 1.5 The thesis document

1.4.7 Original contribution

The research contributes both methodologically and theoretically to the under- standing of serendipity as a complex phenomenon. The markers of the research, that in combination make the study unique, are the interest in a relational view of serendipity, the focus on the outer experience of the phenomenon, and the attention—especially unusual in serendipity research—to both qualitative and quantitative description.

1.5 The thesis document

1.5.1 Preparation

The thesis was written in the LATEX (Lamport, 2012) document mark-up lan- 17 guage and processed with the MiKTEX (Schenk, 2012) typesetting system. BIB- TEX (Patashnik and Lamport, 2012) managed the references. A thesis template (Suckale, 2007), which is in turn a corrected and extended version of Bhanderi (2002), lies behind the present document. Appendix A lists the LATEX packages (see DANTE, 2012; Heffron, 2012; Mittelbach, Sch¨opfand Downes, 2001) added to customise the thesis template. Many tools helped to create the thesis fig- ures (see section 4.8); however, JING® (TechSmith Corporation, 2012) played a special role by capturing the .png image files incorporated into this document.

1.5.2 Reader’s roadmap

As well as more familiar territory, the thesis covers some conceptual and meth- odological grounds not commonly visited in information behaviour research. Al- though it makes for an interesting landscape, the terrain can be challenging at times; every attempt has been made to provide ample signposts for the reader and to smooth the ride over the bumpy bits.

17Version 2.9 of MiKTEX was used in the study.

23 1. EXPOSITION

Here is a roadmap of the route ahead:

↓ Chapter 2 comments on the literature review and assembles the study’s conceptual framework.

↓ Chapter 3 builds the study’s methodological framework.

↓ Chapter 4 details the mixed methods research design.

↓ Chapter 5 traces the operationalisation of the research.

↓ Chapters 6 and 7 present and discuss the research results.

↓ Chapter 8 draws the study to a close, emphasising contributions made and new doors opened.

A supplementary volume offers the reader an extra side-trip: the Network Portfo- lio collects the study’s event structure visualisations, along with the quantitative matrix data that underlie these illustrations.

24 Chapter 2

Conceptual Framework

This chapter considers the conceptual framework that supported the research. After comments on the literature review, especially clarifications concerning the study’s approach to the serendipity literature, the discussion concentrates on two constructions that helped to shape the conceptual framework:1 serendipity as information behaviour in context; and, serendipity as process.

2.1 Comments on the literature review

2.1.1 The reporting approach

Given its theoretical and methodological motivations, the research needed to review literature from multiple areas and disciplines. For reporting purposes, the literature reviewed is presented within the two frameworks that supported the experiential stage of the study. Chapter 3 concentrates on the methodological framework, while the present chapter, along with section 1.3 from Chapter 1, sets out the conceptual framework. Due to practical concerns related to the study methods, the conceptual and methodological frameworks were finalised around the mid-way point of the re- search (see section 5.1). In practice, this meant that literature published from summer 2010 did not influence the development of the study frameworks. This

1The third construction that contributed to the conceptual framework—serendipity as a sociological pattern—is discussed in section 1.3.

25 2. CONCEPTUAL FRAMEWORK

later literature only impacted the descriptive inference process and the credibility audit; however, to avoid the need to introduce new literature in the results and discussion sections of the report (i.e. in Chapters 6, 7 and 8), the post-summer 2010 literature is incorporated into Chapters 2 and 3.2 It is important for the reader to be aware that some areas of the literature have taken significant steps forward very recently. As such, the ratio of pre- to post-summer 2010 literature in Chapters 2 and 3 may seem unbalanced, biased towards the newer research that did not contribute to the study frameworks. This point applies especially to the serendipity literature, a body of work that has experienced a recent “swell” in publications (McCay-Peet and Toms, 2011a, ‘Previous work’) and benefited from new, concerted efforts to advance the research area (Erdelez and Makri, 2011; Smith and Burns, 2011).

2.1.2 The serendipity literature

The literature on serendipity presented as a special case in the context of the research. Several factors affected the approach to the serendipity literature:

1. the number of existing literature reviews on serendipity;

2. the breadth of the serendipity literature;

3. the study’s pragmatic stance;

4. the grounding of the research in information studies;

5. the focus on process; and,

6. the researcher’s previous work on serendipity. 2In actuality, later sections of the report, especially inference evaluations, transferability discussions and the research credibility audit, do reference a limited amount of additional literature. The author recognises that some readers find the introduction of new literature after the methods section of a thesis undesirable; however, with methods-related matters and transferability issues, it is sometimes only possible to fully appreciate problems and possibilities retrospectively.

26 2.1 Comments on the literature review

Existing literature reviews Several reviews that reference the serendipity literature are included in the re- cent wave of publications on serendipity (see section 2.1.1). As Bawden (2011, p. 1) notes in his own selective review, the year 2010 produced three such sum- maries (Anderson, 2010; Makri and Warwick, 2010; Nutefall and Ryder, 2010) to expand the choice already available (Bates, 2007; Bawden, 1993; Chang and Rice, 1993; Foster and Ford, 2003; McBirnie, 2008b; Rice, McCreadie and Chang, 2001).3 Andr´e et al.’s (2009) review of serendipity in the design context can be added to Bawden’s (2011) list.4 Given the plethora of secondary sources avail- able to the reader seeking an overview of the serendipity literature, a reporting approach intent on simply ‘introducing’ the main primary sources—yet again— seemed unuseful.

Breadth of the literature The serendipity literature is broad. It encompasses the academic domains of the sciences, social sciences and arts and humanities (Foster and Ford, 2003; Merton and Barber, 2004). Disciplines with an interest in serendipity include: information studies, human computer interaction, organisational studies, history, psychology, art, chemistry, astronomy and medicine (de Rond and Morley, 2008; Erdelez and Makri, 2011; Garc´ıa,2009; Merton and Barber, 2004; van Andel and Bourcier, 2009).5 Serendipity is explored in several contexts, such as everyday life, chance meet- ings, and the research process (Blandford and Attfield, 2010; van Andel, 1994). Foci of discussions include: theoretical frameworks/models (McCay-Peet and Toms, 2010); empirical research findings (Foster and Ford, 2003); personal ex- periences (Winchester, 2008); design interventions (Campos and de Figueiredo, 2001); and, the benefits/impacts of serendipity (Morton and Swindler, 2005; P. H. A., 1963).

3Some of these reviews focus almost entirely on serendipity, while others discuss serendipity in the company of creativity and/or browsing. Sections 1.3.4 and 2.2.2.2 outline the links between serendipity and creativity/browsing. 4The present author has benefited considerably from reading another extensive, but as yet, unpublished, literature review on serendipity. That review’s author has requested that the unpublished version not be cited, a wish that is respected here. 5This list is not intended to be exhaustive.

27 2. CONCEPTUAL FRAMEWORK

Discussions of serendipity take many forms: from personal anecdotes and opinion pieces (de Rond and Morley, 2008), to research reports Erdelez and Makri (2011) and theoretical treatises (Merton and Barber, 2004). As well, discussions may be intended for a variety of audiences: popular (de Rond and Morley, 2008), academic (Merton, 1948) and professional (Snowden, 2003).

Pragmatic stance Because of its pragmatic stance, the research sought to explore serendipity beyond information studies and outside of the empirical research literature; as highlighted above, discussions of serendipity, and therefore, by default, discus- sions of process in serendipity, are neither restricted to the academy nor limited to information studies. The study’s pragmatic stance also helped to overcome two limitations that often affect more traditionally situated literature reviews of serendipity. One limitation of some existing serendipity literature reviews is the tendency to focus on a particular discipline; for example, Nutefall and Ryder (2010) concen- trate on serendipity in library and information science. While this discipline-tied approach is a useful, and often necessary, means for summarising a research area, in the case of serendipity, such an approach can cause problems, especially the perpetuation of a lack of awareness of the true extent of the serendipity literature. A quick look at Garfield’s (2004b) HistCite citation analysis of papers with the stem serendipit* in the title published during the period 1950–2003 shows that, in fact, a fair amount has been written about serendipity;6 however, both the citation matrix and the graph from this analysis7 indicate that what material does exist, certainly from the period 1950–2003,8 is not very well connected. In other words, the material does not demonstrate a lot of intercitation. This lack of intercitation results in part from intra- as opposed to interdiscip- linary citation. It also stems from the fact that much writing on serendipity is of

6Note that the analysis includes 832 nodes (i.e. documents). Note also that Garfield’s (2004b) HistCite does not take account of the recent swell in serendipity research (see section 2.1.1). 7The hot links to these can be found at the top of the page in Garfield (2004b). 8As section 8.4.2 suggests, a new Hist Cite for the period 2003–present might provide an interesting comparison. Has the level of connectedness across the serendipity literature improved in recent years?

28 2.1 Comments on the literature review the ‘historical account’ type (Garc´ıa,2009). The use of citation in such personal anecdotes may be limited or nonexistent. Garfield’s (2004b) HistCite emphasises the dominance of historical accounts in the serendipity literature: whereas anecdotes, opinion pieces and theoret- ical discussions about serendipity are relatively numerous, empirical research on serendipity is less plentiful. Thus, in citing material to support its arguments, a serendipity literature review may find itself relying heavily on non-empirical literature. This point serves to introduce the second limitation of literature reviews of serendipity: some reviews fail to make a distinction between theoretical and em- pirical literature. The result is that the reader cannot determine to what extent a given claim is supported by systematically-collected evidence (as opposed to ‘untested’ assumption). A related problem is that the serendipity literature’s many ‘ways of appearing’ can leave the nonexpert reader of a review9 uncertain about the nature and/or quality of a cited source: is the reference in the review to an anecdote or an empirical study? if an anecdote, then is the narrative a first- or second-hand account? Such questions are not always readily answered by bibliographic data alone. The ‘ways of appearing’ problem is exasperated by the sometimes fuzzy boundaries between formal academic works and informed opinion pieces in the serendipity literature; for example, parts of de Rond and Morley’s (2008) edited collection of popular lectures seem no less or more ‘academic’ or ‘informed’ than parts of Merton and Barber’s (2004) scholarly treatise on serendipity. The study’s pragmatic approach ensured that the serendipity literature re- view was not tied to a single discipline; this said, some disciplines did contribute more than others. Claims supported by systematically collected research evid- ence are highlighted as such. As well, the report specifies the intended audience and/or ‘way of appearing’ when it is deemed important for the reader to recognise that the discussion has stepped outside the formal academic literature normally expected in a thesis.

9Literature reviews are aimed to a large extent at such nonexpert readers (i.e. readers who require high-level overviews of a topic, or readers at the early stages of exploring a subject).

29 2. CONCEPTUAL FRAMEWORK

Information studies grounding The present study considered serendipity through the lens of information be- haviour research. As such, the information studies literature provided a constant backdrop for the review’s interpretation of the serendipity literature. The thesis assumes a reader familiar both with common information behaviour concepts and with the broad information research literature.

Focus on process In terms of serendipity, the research aims demanded both a sociological per- spective and a focus on process. The literature included in the thesis was limited accordingly.

Researcher experience The present author’s (i.e. the researcher’s) previous work on serendipity could not be ignored in terms of its impact on the literature review process. A reason- able existing familiarity and confidence with the information studies literature on serendipity left the researcher free to explore areas of outside literature that limits on time or lack of knowledge might have otherwise prohibited. Also, as noted in section 1.4.1, the researcher’s empirical background with serendipity influenced the adoption of the study’s pragmatic stance, along with the resulting decision to explore outside of information studies and beyond the empirical research liter- ature. This decision was guided in large part by the researcher’s curiosity about what contributions to serendipity research might be found by taking a broader view.

30 2.2 Serendipity as information behaviour in context

2.2 Serendipity as information behaviour in context

2.2.1 Context

2.2.1.1 Understandings and debates

As Case (2007, p. 243) notes, “An increasingly emphasized generalization is that information behavior takes place in a context”. As evidence for the importance of context in information behavior research, Case (2007, p. 244–245) goes on to highlight the number of conferences devoted to the topic. Despite this interest, context remains a problematic term: “. . . no term [is] more often used, less often defined, and when defined, defined so variously as context” (Dervin, 1997, p. 14). The limited space here does not permit anything more than a brief introduction to context; Agarwal, Xu and Poo (2009), Courtright (2007), Dervin (1997), Johnson (2003) and Talja, Keso and Pietil¨ainen(1999) provide more in depth reviews. While theoretical understandings of context range from the rather general, “frameworks of meaning” (Cool, 2001, p. 8) and “some kind of background for something the researcher wishes to understand and explain (Alajoutsij¨arviand Eriksson, 1998; Pettigrew, 1987)” (Talja et al., 1999, p. 752), to the more spe- cific, “the collection of of events, histories, culture, knowledge and understand- ing, which exist together at a point in time (Dervin, 1997; Sonnenwald, 1999)” (Prekop, 2002, p. 535), “. . . the most common meaning of context in [empirical] research . . . [is] the concrete context in which use of information is studied” (Talja et al., 1999, p. 752). These ‘concrete contexts’ include, amongst other external elements: settings, environments, events, activities, roles and groups (Agarwal et al., 2009; Allen and Kim, 2001; Case, 2007; Prekop, 2002). A single external element may suffice as context for a given study’s purpose. Alternatively, a contextual focus may encompass several external elements, for ex- ample, by concentrating on a certain group within a specific setting. Lopatovska, Basen, Duneja, Kwong, Pasquinelli, Sopab, Stokes and Weller’s (2011) study of the information behaviour of New York City subway commuters provides an ex- ample of such a multi-element approach. Yet a different approach (for example, see T¨ottermanand Wid´en-Wulff,2007) considers relationships and the struc-

31 2. CONCEPTUAL FRAMEWORK tures they form as context. Sections 2.2.1.2 and 2.2.1.3 link the multi-element and relational approaches to the present study.

2.2.1.2 Research as context

As noted in section 2.2.1.1, a multi-element contextual focus may encompass settings, activities and groups. This approach to context serves as the foundation for information behaviour studies that explore the activity of academic research (for examples, see Ellis, 1993; Foster, 2003) and, more specifically, serendipity in academic research (Foster and Ford, 2003; Nutefall and Ryder, 2010). In these studies, academic researchers are the group, the research study is the setting, and the research process is the activity. It is important to note that the use of the term research varies in informa- tion studies. Writing in French, Ertzscheid and Gallezot (2003, section 1.2) refer to this situation, highlighting the difference in meaning between rechercher, ‘to search’, and rechercher, ‘to research’: search finds new information (e.g. in an in- formation retrieval context), while research produces new knowledge and requires a richer context, such as that of academic research. Based on Merton’s interpret- ation (see section 1.3.1), serendipity can occur in a search context; however, the serendipity pattern must play out in the richer context of research.

2.2.1.3 Relational context

A relational perspective (see section 3.1) offers another understanding of context (see section 2.2.1.1). As with a focus on settings, activities or groups, information behaviour remains contextually situated; however, with the relational approach, a network of relations is ‘the context’. Information behaviour is both enabled and restricted by the structure of the network. The relational approach to context can be combined with an interest in settings, activities or groups; however, in such combined situations, the theory that underlies relational approaches demands that the focus remains first and foremost on the relational aspects of the context.

32 2.2 Serendipity as information behaviour in context

The relational approach to context is especially evident in studies of inform- ation seeking (Borgatti and Cross, 2003; Cross, Rice and Parker, 2001; Evans, Kairam and Pirollo, 2009; Johnson, 2004; Mackenzie, 2005); however, other in- formation behaviour research does draw on established relational concepts, in- cluding social networks and small worlds (Fisher and Naumer, 2006; Hargittai and Hinnant, 2006; Stutzman, 2011). Section 3.5.5.2 provides further discussion on the use of network theory and methods in information studies research. In- terpretations of context as relational also present in innovation and creativity research, areas related to serendipity research (see section 3.5.5.3).

2.2.2 Information behaviour

2.2.2.1 Literature reviews and collections

This thesis does not attempt a review of information behaviour research; instead, the discussion concentrates on one model of the field. For the reader interested in undertaking an extensive exploration of information behaviour research, the existing literature includes several reviews and collections that, in sum, provide an excellent compendium of the field (see Case, 2002, 2007; Fisher, Erdelez and McKechnie, 2005; Spink and Cole, 2006; Wilson, 1997, 2000).

2.2.2.2 Wilson’s Nested Model

Overview of the model Figure 2.1 illustrates Wilson’s (1999, p. 263) Nested Model of information be- haviour research fields. The model places the study of information search, seeking and behaviour in nested circles. Wilson clarifies the model’s terminology:

1. information behaviour is “the general field of investigation” interested in “. . . those activities a person may engage in when identifying his or her own needs for information, searching for such information in any way, and using or transferring that information” (1999, p. 263, 249);

2. information seeking is “. . . a sub-set of [information behaviour] . . . concerned with the variety of methods people employ to discover, and gain access to information . . . ” (1999, p. 263); and,

33 2. CONCEPTUAL FRAMEWORK

Figure 2.1: Wilson’s (1999) Nested Model - The model in this figure is reproduced from Wilson (1999, p. 263). The model conceives information searching behaviour as a subset of information seeking behaviour, which is in turn conceived as a subset of information behaviour. Each circle represents a research field.

34 2.2 Serendipity as information behaviour in context

3. information searching is “. . . concerned with the interactions between in- formation user . . . and computer-based information systems . . . ” (1999, p. 263).

Bearing in mind the date of the model, the abundant theoretical and empir- ical work conducted in the relevant sub-fields since the model’s introduction, and the variety of newer perspectives now explored in information behaviour research (see Spink and Cole, 2006), these definitions are not without their problems; for example, in the case of information behaviour, researchers interested in collab- orative information behaviour (Prekop, 2002; Talja and Hansen, 2006) find the restriction to a single “person” with “his or her own needs” problematic. This said, recall from section 2.2.2.1 that a review of information behaviour research— including, by default, the many debates about terminology—is not the goal here; for the purposes of the present discussion, Wilson’s definitions suffice. In relation to the model, Wilson (1999, p. 263) argues:

We might also extend the nested model further by showing that in- formation behaviour is a part of human communication behaviour: given the amount of information-related research in various aspects of communication studies . . . it may be particularly useful to remember this in certain contexts.

It is important to note that while he highlights the “connection” between inform- ation and communication behaviours, Wilson (1999, p. 264) does not conceive communication studies as an “all-embracing” field for information behaviour, hence, the absence of a fourth level (i.e. an outer circle labelled ‘communication behaviour’) in the Nested Model. The idea of the larger circles embracing the smaller circles within the model needs to be interpreted with care. Larger circles do not equate with greater value or importance: information search is no more or less worthy of research attention than information seeking. As well, the nested structure is not intended to isolate. As Wilson (1999, p. 264–265) notes:

. . . an investigation could, in fact, explore the relationships across the fields . . . taking a slice across the circles to explore the behaviour of a group or an individual in terms of . . . information behaviour, information-seeking . . . and information searching . . . .

35 2. CONCEPTUAL FRAMEWORK

Despite its age, the Nested Model remains a useful tool for understanding in- formation behaviour research. The extended version is especially able to accom- modate perspectives that draw on communications theory, such as sensemaking (see Dervin, 1992, endnote 2) and play and entertainment theories (see Case, 2007, p. 161–164). The model also provides a convenient space in which to locate the study of serendipity, both within and beyond information studies.

Information studies of serendipity in relation to the model As noted in section 2.1.2, serendipity is studied by many disciplines. Within information studies, discussions of serendipity tend to self-locate within the two inner circles of Wilson’s (1999) model, information search and information seek- ing; for example, Toms (2000a, Title) considers “serendipitous information re- trieval” while Foster and Ford (2003, Title) explore “serendipity and information seeking”. Because browsing,10 which is sometimes linked to serendipity in the literature (for a recent example, see Bawden, 2011), is considered by many “to be a type of information seeking” (Case, 2007, p. 90), it, too, traditionally sits in the middle circle of Wilson’s (1999) model. A growing tendency for information behaviour research to self-locate incor- rectly (e.g. ‘search’ as ‘seeking’, or even, as ‘behaviour’) is noted in the literature (see Case, 2007, p. 323–324); however, some evidence exists to suggest that in- formation researchers are, not only aware of the bias towards search and seeking in information studies of serendipity, but also, that they believe work beyond these areas is needed. 10Selected related terms include: encountering, scanning, foraging, and grazing (see Case, 2007, p. 91–93).

36 2.2 Serendipity as information behaviour in context

For example, Erdelez (2010), in a discussion that differentiated between in- formation encountering (Erdelez, 1999, 2005) and the broader opportunistic dis- covery of information (Erdelez and Makri, 2011),11 highlighted the tendency to favour search and seeking in serendipity research. She noted a need for more serendipity research to be explicitly situated in the wider arena of information behaviour.12 This said, examples of studies that consider serendipity explicitly as informa- tion behaviour do exist. Sun et al. (2011) stands as a clear example of serendipity research that looks beyond search and seeking. Other recent work, although still revealing some ‘apron string’-like ties to search and seeking, also attempts to self-locate in the larger circle of information behaviour and/or the related area of communication behaviour through references to frameworks such as informa- tion grounds (P´alsd´ottir,2011), everyday life (Rubin et al., 2011; Yadamsuren and Heinstr¨om,2011), and dynamic digital and entertainment environments (McCay- Peet and Toms, 2011a; Yadamsuren and Heinstr¨om,2011).

Other serendipity literature in relation to the model Wilson’s (1999) Nested Model also provides a space in which to consider the serendipity literature found outside of information behaviour research. One sub-

11For the purposes of the present study, these two terms are treated as synonymous with serendipity; however, it is important to stress that Erdelez is actively promoting the term opportunistic discovery of information (see Smith and Burns, 2011). Her efforts reflect desires “to resolve the challenge of conflicting terminology” (Smith and Burns, 2011, ‘Day one’) and to encourage researchers to explore the wider (and richer) domain of information discovery (i.e. rather than search or seeking). While the present author shares Erdelez’s concerns, she prefers the term serendipity, even with its problematic semantic issues, for two reasons:

1. the connection with serendipity research in fields outside of information studies, discip- lines where the term serendipity is in established use, is more easily maintained; and, 2. the link back to Merton’s (1948) formal introduction of serendipity, and especially the serendipity pattern, to the social sciences (see section 1.3.1) remains explicit.

Figure 1 (Smith and Burns, 2011), a word cloud of the above mentioned “conflicting termino- logy”, suggests that the author is not alone in her preference: serendipity is the most dominant word in the word cloud. 12At her seminar, Erdelez illustrated this point by first drawing a cloud-like shape to represent the opportunistic discovery of information, and then colouring in a small area of that shape to represent the limited space encompassed by information encountering.

37 2. CONCEPTUAL FRAMEWORK

set of this literature relates to computing, systems design and human-computer interaction; another comprises business, management and organisational studies. All of these areas are connected by formal ties, if not directly to information behaviour research, then at least indirectly via links with information and/or communication studies. This situation results in some degree of common usage of terminology, concepts and frameworks. For example, Blandford and Attfield (2010, p. 7), noting that sensemaking ap- pears not only in information behaviour research but also in organisational stud- ies and the human computer interaction literature, argue that all three areas link sensemaking with “mapping out the processes through which people construct meaning from the information they experience”. The links between information behaviour research and other literatures make it possible to map empirical stud- ies of serendipity from fields such as organisational studies or human computer interaction onto the Nested Model. What about the remaining serendipity literature—for example, theoretical discussions from sociology (see Fine and Deegan, 1996; Merton and Barber, 2004) or collections of anecdotes from the sciences (see Hannan, 2006; Roberts, 1989)? Can this literature be mapped onto Wilson’s (1999) Nested Model? Literature from sociology and the sciences normally does not use formal in- formation behaviour terminology; nonetheless, recognisable information beha- viour concepts and frameworks are often present. For example, an anecdote discussing serendipity in scientific research may refer to ‘the research questions’ or ‘the aims of the experiment’; the information behaviour researcher may map these onto the concept of information needs. Many theoretical and anecdotal discussions of serendipity consider the phe- nomenon in the context of research, a rich context that accommodates the serendip- ity pattern (see sections 1.3.1 and 2.2.1.2). Such discussions do display links to models that fall into Wilson’s (1999, p. 263) “information-seeking behaviour . . . sub-set”; for example, links to both process-based and problem solving in- formation seeking models are especially evident in the case of research anecdotes, narratives in which the scientific method may feature heavily. Despite these links, the theoretical and anecdotal literature’s focus on the research context implies that much of this literature primarily reflects the interests of the information

38 2.3 Serendipity as process behaviour research field, the largest circle of the Nested Model.

A clearer picture A review of the serendipity literature in the space of Wilson’s (1999) Nested Model results in a clearer picture of the major research foci and gaps in serendipity research both within and across disciplines; for example, the picture drawn con- trasts the attention paid to serendipity in seeking and search within information studies with the focus on information behaviour in the theoretical and anecdotal literature in sociology and the sciences. As well, the Nested Model acts as a fa- miliar travelling companion for the information behaviour specialist who seeks to step out and explore the territory of serendipity research in seemingly unfamiliar and distant domains and disciplines. This latter point is especially important given the abundance of serendipity literature outside of information studies (see section 2.1.2) and the potential contribution of this literature to understandings of serendipity as information behaviour in context.

2.3 Serendipity as process

2.3.1 Nature

Sun et al. (2011) note that people have a perception of serendipity, including what it is, how it happens, and what its characteristics are. People describe not only process in serendipity—the actions, events, participants and associated circumstances—but also the nature of this process. The literature suggests that the process of serendipity is endowed with experiential, temporal and relational characteristics.

Lived, experienced and personal The large number of anecdotes found in the literature (see section 2.1.2) sug- gests that serendipity is perceived as lived personal experience. The non-anecdotal serendipity literature reflects this perception; for example, Leong, Wright, Vetere and Howard’s (2010, p. 256, 261) empirical study conceives serendipity as “lived experience” that is “personally meaningful”, while Hangal et al.’s (2012) design

39 2. CONCEPTUAL FRAMEWORK intervention presents an experience-infused browser for serendipity that incor- porates the personal data of the user. More generally, the experiential nature of serendipity is implied whenever serendipity is labelled as ‘phenomenon’ (for examples, see Foster and Ford, 2003; Johnson, 2010; Leong et al., 2010; Merton and Barber, 2004), a point discussed further in section 3.2.1.

Longitudinal and retrospective Solomon (1997) links time and timing with information behaviour. Temporal aspects of serendipity are noted in the literature. Serendipity is perceived as lon- gitudinal: in other words, serendipity, like other information behaviours, “takes time” (see Solomon, 1997, p. 1107). Some literature assigns time a necessary, but neutral role in serendipity; for example, serendipity is understood both as a process that moves through time (Sawaizumi, Katai, Kawakami and Shiose, 2009) and as a process that happens in time (Cunha, 2005; Erdelez, 1999). Other literature assigns a positive, non- neutral role to time: serendipity benefits from the passage of time (Barber, 1953; Johnson, 2010; Moghnieh, Arroyo and Blat, 2008). The literature also suggests that serendipity requires retrospective sensemak- ing. Makri and Blandford (2011, ‘Themes that arose from discussion’, emphasis in original) argue:

the serendipitous event . . . must be recognised and mentally la- belled as serendipitous by the person experiencing it . . . Whilst it is possible to realise immediately that a serendipitous event has happened . . . there may be a lag ranging from seconds to years for someone to realise that serendipity has occurred . . . .

Note that the mental labelling of serendipity—either immediate or with a time lag—only occurs after the serendipitous event ‘has happened’. As such, serendip- ity is, by default, a past experience.

Relational The literature presents numerous references to the relational nature of serendip- ity. These references are marked by the use of terms such as ‘networks’ (Andr´e et al., 2009; Cunha et al., 2010; Johnson, 2010; McCay-Peet and Toms, 2010;

40 2.3 Serendipity as process

Schumpeter, 2010; Snowden, 2003), ‘links’ (Foster and Ford, 2003), ‘ties’ (Fine and Deegan, 1996; Sawaizumi et al., 2009), ‘relationships’ (Cunha et al., 2010; Fine and Deegan, 1996), ‘interactions’ (Eagle, 2004; Thagard and Croft, 1999), ‘connections’ (Cunha et al., 2010; Fine and Deegan, 1996; Foster and Ford, 2003) or ‘adjacencies’ (Johnson, 2010; Thudt, Hinrichs and Carpendale, 2011). It is im- portant to note that the perception of serendipity as relational applies, not only to connection building within the internal, psychological realm of the mind, but also to interaction in the external, sociological world: as Cunha et al. (2010, p. 321) argue, “The process of serendipity . . . involves . . . relational ability . . . whether this ability belongs to the serendipitous hero in a tale of discoveries or is vested in the surrounding networks”.

2.3.2 Associated circumstances

2.3.2.1 When, where and how?

Section 1.4.3.1 introduced Halliday’s (1994) tripartite interpretation of process: process as associated circumstances, the process itself, and the participants in process. Franzosi (2010, p. 23) notes, “For Halliday (1994), the circumstances of a process . . . refer to notions such as ‘when, where, how and why’ something happens”. The present study was descriptive in nature. As emphasised in section 1.4.4.1, the research questions concerned with the circumstances associated with process in serendipity focused on context and manner of occurrence rather than on reason for occurrence; therefore, questions of when? where? and how? received the bulk of the investigative attention. With its causal implications, why? was a secondary question, when asked at all.

2.3.2.2 Time and location

Section 2.3.1 noted that serendipity is perceived as longitudinal. This perception implies that serendipity process is associated with a minimum of two temporal contexts. The description of temporal contexts may range from the general—“one day” (Interviewee 6, Foster and Ford, 2003, p. 332), “in the morning” (Participant

41 2. CONCEPTUAL FRAMEWORK

1, Sun et al., 2011, ‘Example 2: review of the iPhone’)—to the specific—“Labour Day in . . . 1998” (Winchester, 2008, p. 137). Serendipity is also associated with locations. Either one or several locations may be involved. As with time, locations may be general: “in a bookshop” (In- terviewee 37, Foster and Ford, 2003, p. 331), “next to me” (Participant 5, Sun et al., 2011, ‘Example 1: making unexpected connections between information’); or, specific: “in box 27” (Interviewee 30, Foster and Ford, 2003, p. 334), “about 150 miles north of New York City” (Winchester, 2008, p. 133), “by the Victoria and Albert museum” (Participant 11, Sun et al., 2011, ‘Example 4: making unexpected connections between people’s physical environment and people’s ex- isting memories and ideas’). Section 2.3.3.4 examines how the concepts of time and place are intimately tied to the concept of event: events happen in temporal and locational contexts. In light of this relationship, the present research adopted the term event context to label the context created by the combined circumstances of time and location associated with serendipity process (see section 4.8.3).

2.3.2.3 Serendipity classifications/patterns

As discussed in section 1.3.1, a definitional approach to serendipity is problematic. The use of classifications and patterns offers an alternative approach. Indeed, this alternative approach underlies Merton’s differentiation between ‘serendipity the word’ and ‘serendipity the pattern’. As with definitions, classifications and patterns are inherently descriptive: when applied to serendipity, they describe how serendipity is experienced and/or how it plays out. Nonetheless, some serendipity classifications and patterns serve to shed light on the ‘whys’ of serendipity: why is serendipity experienced the way it is? and, why does serendipity play out the way it does? A class represents a “group of . . . things having some characteristic in com- mon” while a pattern describes an “ideal example”, “regular form” or “model” (Sykes, 1976, p. 184, 809). The literature employs serendipity classifications and patterns to label typical or normal examples of serendipity. In other words, the literature makes the assumption that multiple cases of serendipity, despite any

42 2.3 Serendipity as process heterogeneity across the cases, can be described in terms of representative ex- amples. Several serendipity classifications and patterns are proposed in the literat- ure. Schemes may group serendipity examples according to psychological, (e.g. individualistic) (for example, see Austin, 2003, p. 76) and/or sociological (e.g. process-based) aspects (for example, see de Figueiredo and Campos, 2001). It is also important to note that some classifications presented in the literature are inherently context-tied, for example to libraries (Liestman, 1992) or to digital environments (Nothhaft Jr., 2010). The present study considered two classifications of serendipity (Campanario, 1996; Foster and Ford, 2003) and one set of serendipity patterns (van Andel, 1994). These schemes were selected for several reasons:

1. their process bases;

2. their applicability to the context of research (see section 2.2.1.2);

3. their empirical derivations;

4. their explicit interrelations; and,

5. their potential interrelations.

Foster and Ford Foster and Ford’s (2003) classification derives from an empirical study of serendip- ity as experienced by interdisciplinary researchers.13 The scheme includes four classes, the last of which divides into two subclasses:

1. reinforcing or strengthening the researcher’s existing problem conception or solution; or 2. taking the researcher in a new direction, in which the problem conception or solution is re-configured in some way. 3. The unexpected finding of information the existence and/or loc- ation of which was unexpected, rather than the value.

13More precisely, the classification stems from a wider study of the information seeking of interdisciplinary researchers. See Foster (2003) for the classification presented as part of that wider study.

43 2. CONCEPTUAL FRAMEWORK

4. The unexpected finding of information that also proved to be of unexpected value: (a) by looking in “likely” sources; (b) by chance (Foster and Ford, 2003, p. 330, 332, emphasis in original).

Foster and Ford’s (2003) classification divides into two parts: Classes One and Two “[detail] the impact . . . of serendipitous encounters”, while Classes Three and Four “[describe] the nature of these encounters” (Foster and Ford, 2003, p. 332). As such, it is possible for a serendipity experiences to fall into more than one class.

Van Andel Van Andel (1994, p. 637) notes that his serendipity patterns stem from a study of “more than one thousand examples of serendipity”. The scheme comprises seventeen patterns:

1. Analogy with same or different context 2. One surprising observation 3. Repetition of a surprising observation 4. Successful error 5. From side-effect to main effect 6. From by-product to main product 7. Wrong hypothesis 8. No hypothesis 9. Inversion 10. Testing a popular ‘belief’ 11. Discover[y] by child, student or outsider 12. Disturbance or interference 13. Scarcity 14. Interruption of work 15. Playing 16. Joke

44 2.3 Serendipity as process

17. Dream or ‘forgetting-hypothesis’ (summary of van Andel, 1994, sec. 9 in Campanario, 1996, p. 12, Table 2).

Note that van Andel (1994, p. 640) states that the various patterns can “coexist, overlap and/or cooperate”. This point is confirmed empirically by Campanario (1996, p. 10).

Campanario Campanario’s (1996) is the broadest of the three schemes employed by the present research. Campanario’s classification is based on van Andel (1994) and comprises two categories:

1. The goal of a research project is reached accidentally; 2. In the course of an investigation something is discovered that does not have to do with the original research project (Cam- panario, 1996, p. 10).

Campanario (1996, p. 10) remarks, not only that “his scheme can be complemen- ted by using some of the 17 patterns of serendipity . . . described by van Andel (1994)”, but also that van Andel’s serendipity patterns are “often compatible with both of [the] categories” of his own (i.e. Campanario’s) scheme.

2.3.2.4 Sociological filters

The literature refers to non-serendipity, cases of ‘serendipity lost’ (Barber and Fox, 1958; Merton, 1975). Such cases never become serendipity: either the po- tential for serendipity is not recognised, or the potential is recognised but then purposively ignored or not acted upon. The literature suggests that aspects of the social environment in which serendip- ity plays out affect both the occurrence and non-occurrence of serendipity. The present research labels these aspects sociological filters.14 This section considers four types of sociological filter:

1. time (e.g. limits, deadlines);

14Psychological filters (e.g. emotions, mental ability) may also affect serendipity; however, such filters are beyond the scope of the present study.

45 2. CONCEPTUAL FRAMEWORK

2. financial (e.g. money, funding);

3. need to follow intention through (e.g. aims, priorities); and,

4. social responsibility (e.g. to a team).15

The concept of sociological filters is recognised in information behaviour re- search: Prabha, Connaway, Olszewski and Jenkins (2007, p. 77) argue that the need to avoid “additional time and expenditure” plays a role in satisficing; Prekop (2002, p. 536) suggests that a “group’s norms, social rules and social structures” affect the activity of collaborative information seeking; Spink (2004, p. 347, em- phasis in original) finds that “the level of planning and priorities by the informa- tion seeker in relation to their information tasks” and “the pros and cons or effects on effectiveness, efficiency and productivity” affect task switching in information behaviour; and, Foster (2005, ‘External context’) notes that information seeking is “framed by the resolution of information problems, and by limits to time and financial resources”. The serendipity literature also recognises sociological filters. A reference to a filter may be general: for example, a comment on the need to have “less pres- sure” in order for serendipity to occur (Sun et al., 2011, ‘The role of context in serendipitous experience’), or a call to “keep the flexibility, keep the freedom” in research in order to promote serendipity (Langmuir in Merton and Barber, 2004, p. 201). Alternatively, references to filters may be specific. Examples include references to pressures of time (Makri and Blandford, 2011; McCay-Peet and Toms, 2010; Morton and Swindler, 2005; Toms and McCay-Peet, 2009), money (Campanario, 1996; Merton and Barber, 2004; Morton and Swindler, 2005), goals, priorities or plans (McCay-Peet and Toms, 2010; Mueller, 1990) or responsibility to a team or organisation (Campanario, 1996; Cunha, 2005; Cunha et al., 2010; Merton and Barber, 2004). Note that multiple specific filters may be cited; for example,

15Although these four filters are acknowledged in the literature, additional sociological filters may exist. The present research focused on these four filters because, as a set, they represented a finding of McBirnie (2008a,b). The author sought to examine this finding further in the present study. To stress the point that filters are recognised by the wider literature (i.e. that the author’s finding is not an isolated result), the examples in this section do not cite McBirnie (2008a,b).

46 2.3 Serendipity as process

Morton and Swindler (2005) mention timeliness, investment, organisations and research goals in relation to serendipity. Although filters are commonly considered in terms of their negative effects, the serendipity literature also references the positive effects of sociological fil- ters. Time may have either positive or negative effects: Cunha et al. (2010, p. 324–325) suggest that “tight deadlines” with minimally-defined processes facil- itate serendipity while “clear time limits with well-defined processes” discourage serendipity. As well, responsibility to a team need not necessarily limit oppor- tunities to follow up the unexpected: Cunha et al. (2010, p. 325) go on to note, “Serendipity may also benefit from more general attributes that are commonly associated with teams, such as high levels of organizational trust and support”.

2.3.2.5 Phenomenological interpretations

Although not strictly ‘circumstances associated with process’, phenomenological interpretations reflect how people make sense of their experiences of process in serendipity. Such sense making involves questions of how? when? and where? This section reviews four phenomenological interpretations relevant to process in serendipity:

1. intention versus realisation: how are intended goal and process and realised outcome and process related (e.g. goal and outcome differ but process remains the same; process differs but goal and outcome remain the same; process, goal and outcome all differ; etc.)?

2. recognition: when is serendipity recognised (e.g. immediately, with a delay)?

3. (non)linearity: how does process in serendipity play out (e.g. in sequential steps, as multiple events happening at once)? and,

4. multiple disciplinarity: how and where does serendipity process occur with respect to disciplinary boundaries (e.g. within a single discipline; across multiple disciplines; in new disciplinary spaces; etc.)?

47 2. CONCEPTUAL FRAMEWORK

Intention versus realisation As some of the classifications discussed in section 2.3.2.3 indicate, the literat- ure differentiates between intention and realisation in serendipity. For example, Campanario’s (1996) classification captures two commonly recognised relation- ships between process, goal and outcome in serendipity: process differs while goal and outcome remain the same; and, goal and outcome differ while process remains the same. In a research context, these relationships can be understood more specifically in terms of links between research aims (i.e. goals), findings (i.e. outcomes) and methods (i.e. process). The recognition of a difference between intention and realisation appears in several guises in the literature. Perhaps most frequently, the recognition is evident simply from descriptions presented by a text. Consider this example: “For in order to absorb the unanticipated datum, he had to be prepared to spur his horse to a full gallop directly away from his previously intended path” (Shulman, 2004, p. xxiv). The recognition is also captured in terminology; for example, Roberts (1989, p. x) introduces the term pseudoserendipity to specify serendipity in which goal and outcome remain the same, but process, the “ways to achieve an end sought for”, differs. De Figueiredo and Campos’s (2001) serendipity equations present yet another means for differentiating intention and realisation in serendipity; for example, the authors’ logical equation (de Figueiredo and Campos, 2001, p. 3)

is semantically equivalent to Robert’s pseudoserendipity.16

Recognition Recall Makri and Blandford’s (2011) argument that serendipity ‘must be re-

16In de Figueiredo and Campos’s (2001) equation, P 1 represents problem one and S1 repres- ents solution one, indicating that the intended goal and the realised outcome are the same. In other serendipity equations, de Figueiredo and Campos (2001) use the notation P 1, P 2 and S2, where P 1 represents the initial problem and P 2 represents the actual problem that the realised solution, S2, addresses. Note that in such cases of serendipity, S1, the intended solution to the initial problem P 1, does not factor in the equation because S1 never actually results.

48 2.3 Serendipity as process cognised . . . by the person experiencing it’ (see section 2.3.1). The literature ac- knowledges that the recognition of serendipity may be either immediate or delayed (Andr´e et al., 2009; Cunha et al., 2010; Makri and Blandford, 2011; McCay-Peet and Toms, 2010; van Andel, 1994). Empirical studies provide some evidence that immediate (or near-immediate) recognition may be indicated by the use of the word ‘suddenly’ (e.g. in anecdotes of serendipity); for example, “I suddenly realised” (Interviewee 36, Foster and Ford, 2003, p. 332), or “I suddenly noticed” (Participant 2, Sun et al., 2011, ‘Example 1: unexpected course module’). Theor- etical discussions of serendipity propose that delayed recognition occurs because “a period of incubation is sometimes necessary before recognition” (McCay-Peet and Toms, 2010, p. 377). As Cunha et al. (2010, p. 325) suggest, “discoverers . . . may need significant periods of time to interpret . . . their discoveries”.

(Non)linearity Models of information behaviour describe discovery as linear, a process of steps or stages through time (for examples, see Ellis, 1989; Kuhlthau, 1993) and as nonlinear, a “wholly interactive and non-sequential” (Foster, 2006, p. 156) process in time, “concurrent, continuous, cumulative, and looped” (Foster, 2005, ‘Results’).17 It is important to note that the concept of time is intimately bound up in the concepts of linearity and nonlinearity.18 Foster (2006, p. 162) argues, “The accepted norm for information seeking research has been to understand the world as a movement from one point to an- other stage-by-stage . . . ”. This perspective perceives time as linear, as Edding- ton’s (1928, p. 68) “arrow” of time; however, other perspectives accommodate nonlinear views of time. For example, concerning the construction of time in sensemaking, Dervin (1992, sec. 2) writes, “When seen from the actor’s perspect- ive, time can be constructed in a variety of ways, linear, cyclically and otherwise”. As noted in section 2.3.1, serendipity ‘takes time’. The serendipity literature

17Note that these sociological interpretations are ‘borrowed’ from the formal, mathematical understandings of linearity and nonlinearity (see Bertuglia and Vaio, 2005, Part 1). 18It is also important to note that time is not the only concept bound up in linearity and non-linearity; for example, change and emergence are also relevant. This section focuses on time because the temporal aspects of serendipity process were especially relevant to the present research.

49 2. CONCEPTUAL FRAMEWORK suggests that ‘taking time’ may be understood as linear or nonlinear. The anec- dotal literature provides numerous accounts of serendipity as a process moving through time. However, a caveat must be stressed concerning such examples: personal narratives take a normal form that potentially favours the reporting of process as a linear, sequential series of events (see section 3.3.2). Outside of the anecdotal literature, the focus is more often on the nonlin- ear aspects of serendipity process; for example, Cunha (2005, p. 12) describes serendipity as “an emergent process” (i.e. a complex or nonlinear process), while Holder and McKinney (1993, p. 57) refer to “the myth of linear time” as an obstacle to understanding serendipity. The trend emphasising the nonlinearity of serendipity is especially notice- able in discussions of the problems with conventional research reporting. Merton writes of the “standard scientific article” (see Merton, 2004, p. 269):

When mistakenly taken as an account of how the reported research actually proceeded, that representation of scientific inquiry not only leads specifically to the omission of serendipitous episodes . . . but gives rise more generally to a misleading image of scientific work as nothing but linear hypothetico-deductive inquiry; this, presumably, in accord- ance with the so-called scientific method . . . the etiquette governing the writing of scientific (or scholarly) papers requires them to be works of vast expurgation, stripping the complex events that cumulated in the published reports . . . it invites an imagery of scientists and schol- ars moving coolly, methodically, and unerringly to their reported ideas and findings . . . (Merton, 2004, p. 272)

Note that although Merton refers to cumulative ‘complex events’, he does not attach the label ‘nonlinear’ to serendipity process here; instead, he relies on im- plication to suggest that serendipity is something other than ‘linear inquiry’. The evidence that Merton labels serendipity as ‘nonlinear’ lies elsewhere—in his 2004 description of how he decided, in his seminal 1945/1948 papers (see section 1.3.1), to apply the term ‘serendipity’ to empirical research:

My generic interest in the unanticipated had led to specific focus on the unanticipated (and hence, nonlinear) phases in much scientific inquiry. So it was that I soon adopted the—for me—variously en- compassing word and went on to remark of an unexpected scientific

50 2.3 Serendipity as process

finding that “this might be deemed the ‘serendipity’ component of research . . . (Merton, 1945, p. 469)” (Merton, 2004, p. 236, emphasis added)

It is interesting to note that, in her discussion of information encountering, Er- delez (1997, p. 418) refers to “nonlinear movement” in information behaviour. Merton’s reference to ‘nonlinear phases in scientific inquiry’ can be seen as a context-specified version of Erdelez’s general point.

Multiple disciplinarity The literature associates serendipity with the involvement of multiple discip- lines; for example, Merton (2004, p. 263) refers to the “ . . . wide-ranging explor- ation of developing interests and interaction with fellow scientists and scholars drawn from other disciplines (and subdisciplines) that make[s] for serendipity in the domain of learning”. The discussion here considers two aspects of the serendipity literature’s reference to the involvement of multiple disciplines:

1. the nature or type of this involvement; and,

2. the strength of the theoretical assumption concerning this involvement versus the amount of empirical evidence available to support this assump- tion.

The terms multi-, inter-, trans- and crossdisciplinarity all refer to the involve- ment of multiple disciplines. However, as Choi and Pak (2006, p. 259) argue, the first three terms differ in meaning:

Multidisciplinarity draws on knowledge from different disciplines but stays within the boundaries of those fields (Natural Sciences and En- gineering Research Council of Canada (NSERC), 2004). Interdisciplinarity analyzes, synthesizes and harmonizes links between disciplines into a coordinated and coherent whole (Canadian Institutes of Health Research (CIHR), 2005). Transdisciplinarity integrates the . . . sciences in a humanities con- text, and in so doing transcends each of their traditional boundaries (Soskolne, 2000).

51 2. CONCEPTUAL FRAMEWORK

With respect to the fourth term, crossdisciplinarity, Hulme and Toye (2005, p. 6) argue that crossdisciplinary approaches can be decomposed into “multidisciplin- ary and interdisciplinary approaches”. In terms of explicit use, the theoretical serendipity literature favours the inter- and cross- prefixes. Examples include: D¨ork,Carpendale and Williamson’s (2011, p. 1215, 1217) conceptualisation of the information flaneur, a “human-centred view on information seeking that is grounded in interdisciplinary research”, as “the embodiment of . . . serendipity”; Blackwell, Weison, Street, Boulton and Knell’s (2009, p. 15) finding of “the key role of serendipity in interdisciplinary in- novation” in their report on crossing knowledge boundaries with interdisciplinary teams; Cunha’s (2005, p. 14) association of “cross-functional teams, boundaryless structures and interdepartmental communication” with serendipity; and, Andr´e et al.’s (2009) suggestions that crossing domains and disciplines aids serendipity. Nonetheless, close readings suggest that implicit references to multi- and transdisciplinarity are also to be found in theoretical discussions of serendip- ity: Friend’s (2008, p. 97) first rule for serendipity, “Look for technology outside the field [of physics]”, implies a multidisciplinary approach (i.e. the outside tech- nology is brought into and used within physics); and, Shulman’s (2004, p. xvii) association of serendipity with the “exchange between the scientific and human- istic communities” represents an implicit reference to transdisciplinarity. Whether referenced explicitly as cross- or interdisciplinarity, or implicitly as multi- or transdisciplinarity, there is a normal bias in the theoretical literature to treat the involvement of multiple disciplines as beneficial to serendipity. Some empirical evidence exists to support this theoretical assumption. Foster and Ford (2003, p. 336) found that “[s]erendipity was widely experienced amongst inter- disciplinary researchers”, while McBirnie (2008a, p. 42) found that “[i]nput from or influence by an ‘outsider’ (from the community of practice) . . . occurred often” in serendipity. The present research sought to add to the empirical evidence base available to support/refute the theoretical assumption of a beneficial association between the involvement of multiple disciplines and serendipity. As noted above, the literature suggests, not only that multi-, inter-, cross- and transdisciplinarity may all be relevant to serendipity, but also that these terms are not always applied

52 2.3 Serendipity as process explicitly. In respect of the three base terms,19 Choi and Pak (2006, p. 359) argue:

. . . we propose that the terms “multidisciplinary”, “interdisciplinary” and “transdisciplinary” are to be used to describe the multiple dis- ciplinary approaches to varying degrees on the same continuum. We further propose that when the exact nature of a multiple disciplinary effort is not known, the specific terms “multidisciplinary”, “interdis- ciplinary” and “transdisciplinary” should be avoided, and the general term “multiple disciplinary” used instead.

Following, Choi and Pak’s suggestion, the present study preferred the broader term multiple disciplinarity. The research sought only to determine to what ex- tent multiple disciplinarity was associated with serendipity. Serendipity research is still in the early stages of collecting empirical evidence in respect of multiple disciplinarity; if the evidence base expands in support of an association between multiple disciplinarity and serendipity, then it may be possible, at that stage, to drill down and explore in detail the potentially distinctive roles of multi-, inter- and transdisciplinarity.

2.3.3 Process and participants

2.3.3.1 Who and what?

Whereas the circumstances associated with process relate to when, where and how (see section 2.3.2.1), the participants and process itself refer to who and what: who are the participants? and, what are they doing? Related questions (Ricoeur, 1984, p. 54–55) include: with whom? and, against whom? In the case of happening rather than doing, the questions must be rephrased: what is happening? and, to whom?

2.3.3.2 Participants

Section 2.3.1 discussed serendipity as personal lived experience. As lived and personal experience, serendipity must, by default, involve at least one participant in process: the person who experiences the serendipity, he or she who brings

19Recall that Hulme and Toye (2005) decompose crossdisciplinarity into multi- and interdis- ciplinarity.

53 2. CONCEPTUAL FRAMEWORK

the serendipity into existence as a phenomenon (see section 3.2.1). Beyond this significant person, the literature recognises three broad types of participants in serendipity process: other people, objects and information.20 The theoretical serendipity literature frequently assigns a role to people in serendipity process: for example, Fine and Deegan (1996) refer to the impacts of social relations on serendipity, while Cunha et al. (2010) discuss the benefits of social networks. The empirical literature also provides evidence to support a role for people in serendipity: for example, the research participants in Sun, Sharples and Makri’s (2011, ‘The nature of serendipitous experience’) recent study vari- ously recounted serendipity experiences that involved the participation of the “sister of a Swiss colleague”, a “lecturer”, and a “chap called John”. McBirnie and Urquhart (2011) argue that the other people role assignment is so strongly assumed in the literature that some researchers essentially treat social interaction as a necessary condition for serendipity (for examples, see Eagle, 2004; Eagle and Pentland, 2004). The literature also assigns a role to information in serendipity process. The information role may be linked to the social role; for example, Cunha (2005, p. 13) refers to “interaction between people with different types of knowledge”. Alternatively, information may be seen as a participatory “node” (Toms, 2000b) in its own right. Anecdotes of serendipity may refer to information in both physical (e.g. books) and non-physical forms (e.g. comments). Although they generally receive less attention in the literature than people or information, physical objects represent the third type of participant in serendipity process; for example, Wills and Kolodner (1994) focus on objects in serendipity process, while Boden (2004, p. 237) gives an example of an object, “a new gadget”, playing a role in the process of serendipity. A more unusual example is Mueller’s (1990) computer program, DAYDREAMER, which explicitly includes an object-driven serendipity protocol. Mueller (1990, p. 14) demonstrates the code for DAYDREAMER’s object-driven serendipity protocol by assigning the object roles to a lampshade and to a com- puter terminal. Mueller’s choice of the latter object serves to illustrate an im-

20The author acknowledges that information is a problematic term (see Case, 2007, Chapter 3). The term is employed here in its broadest sense, to include data and knowledge.

54 2.3 Serendipity as process portant point: although in some cases of serendipity it may be appropriate to class a physical object as information (e.g. as in the example of a ‘book as information’ given in the paragraph above), in other cases, an object may be a valuable serendipity process participant in its own right. In his example serendip- ity scenario, Mueller intends the computer as an object-type participant, not as an information-type participant.

2.3.3.3 Process itself

Section 1.4.3.1 stressed the point that action and events represent process as opposed to stasis. In his tripartite interpretation of process, Halliday (1994, p. 108) argues that process itself is represented grammatically by verbs. Process itself is the action that serendipity participants do or have happen to them. This action may or may not have physical manifestations; for example, in an anecdote of serendipity, Winchester (2008, p. 133) recounts, not only ‘remembering’ a telephone number, but also, ‘picking up’ the phone and ‘dialing’ the number. As well, the action need not necessarily be ‘acted’ by a human actor. Later in his anecdote, Winchester (2008, p. 134) tells of pages ‘coming though’ a fax machine. References to action are also found in the non-anecdotal serendipity literature; for example, McCay-Peet and Toms (2010) make multiple references to “task” (i.e. doing) as a “precipitating condition” of serendipity. More generally, several authors explicitly acknowledge serendipity as process (Cunha, 2005; Cunha et al., 2010; Fine and Deegan, 1996; Makri and Blandford, 2012a,b; McCay-Peet and Toms, 2010).

2.3.3.4 Events

The present research described event structures of process in serendipity. As explained in section 1.4.3.2, an event structure represents a collection of events. The literature presents three broad interpretations of the term event. An event can be defined simply as “something that happens” (Rimmon- Kenan, 2002, p. 2–3).21 Although not made explicit in the definition, this in-

21Franzosi (2010, p. 17) highlights Rimmon-Kenan’s reliance ”on the OED definition of event as a ‘thing that happens’ ”.

55 2. CONCEPTUAL FRAMEWORK terpretation of event implies change: “ . . . there is some recognized state or set of conditions, and . . . something happens, causing a change to that state” (Toolan, 2001, p. 6). In respect of this simplest interpretation of event, Franzosi (2010, p. 21) notes, “It is the action [i.e. process itself] that defines the event, although without characters [i.e. participants] there can be no action”. In other words, only participant and process linked together form an event. Closely tied to the simplest interpretation of event, is the more specific un- derstanding of an event as “something that happens at a particular time and place” (Allan, Papka and Lavrenko, 1998, p. 38). This second interpretation places an event in a temporal and locational context and makes the concept of change explicit. For example, consider an arrow of time (see Eddington, 1928) that points forward: when something happens at a particular time, then, by definition, that something changes the temporal context (i.e. the previous ‘par- ticular time’ becomes the new ‘particular time’, which in turn will be followed by the next ‘particular time’, and so on). The third interpretation of event differs from the first two in that it associates an event with significance:22 an event is “something (non-trivial) happening” (Yang, Pierce and Carbonell, 1998, p. 29). The significance may be personal (e.g. getting married) or societal (e.g. fighting a war). Although some studies of nontrivial events focus on news stories (for examples, see Allan, 2002; Waring, 2010), ‘significant’ need not imply a story worthy of the nine o’clock news: turning right rather than left can be significant to the right person at the right time in the right place. Two further points worth stressing about events understood as significant are that they may be labeled as nouns and that they may describe a collection of events; for example, from a process-perspective, we understand ‘war’—a noun— as a collection of actions—fighting and fleeing, winning and losing, to name a few. In this sense, events are treated as objects (Quine, 1996) and regarded as “bounded regions of space-time” (Zacks and Tversky, 2001, p. 4). As Winchester’s (2008) anecdote (see section 2.3.3.3) illustrates, all three in- terpretations of event are relevant to serendipity. ‘Remembering’, ‘picking up’,

22The reader familiar with event-history analysis may recognise this interpretation of an event as a “socially significant event” or “life-event” (Scott and Marshall, 2009, p. 232, 417–418).

56 2.3 Serendipity as process

‘dialing’ and ‘coming through’ each indicate something that happens. Further- more, these somethings happen at particular times and places: the actions oc- cur sequentially—from ‘remembering’ to ‘coming through’—and are locationally- situated—the first three actions “in the bath” and the fourth, “by the fax ma- chine” (Winchester, 2008, p. 133, 134). Evidence for a link between event and significance is also demonstrated in Winchester’s (2008) anecdote: the phrase, “And here’s the serendipitous moment” (p. 133), implies an event that is non- trivial; and, the anecdote’s concluding sentence, “So there you have it: a clas- sic example of serendipity . . . ” (p. 137), treats a collection of recounted events, ‘serendipity’, as an object of bounded space-time.

2.3.3.5 Event structures

As a collection of events, an event structure is a relational description. Event structures are relational on two levels: they describe relations between parti- cipants and process itself as well as relations between events. While the relational description of the event structure elements—participants, process itself, and events—can be of interest in its own right, it may be possible to obtain additional descriptive data by ‘asking questions’ of the event structure. This section explores several questions that, asked of serendipity event structures, might serve to develop understanding of serendipity as a complex phenomenon.

Complexity Perhaps the most obvious set of questions to ask about an event structure relates to its complexity: how many participants, processes and events (along with related event contexts) comprise the event structure? how connected is the event structure (i.e. is it sparse or dense in terms of relations)? and, where attributes are relevant—for example, in the case of participant-types—which attributes are present and how are they distributed within the event structure? The literature suggests that all of these questions are relevant to process in serendipity. Wills and Kolodner (1994, p. 940) argue that a “rich environment of stimuli” supports serendipity. Cunha et al. echo this sentiment: “serendipity can occur by virtue of rich networks” (2010, p. 324).

57 2. CONCEPTUAL FRAMEWORK

The term ‘rich’ has many semantic associations, including full, elaborate, deep, abundant and ample (Sykes, 1976, p. 967).23 Applied to a relational object such as an event structure, the term ‘rich’ relates to the notion of complexity:24 how intricate or complicated is the event structure?

Load and activity Event structures can provide information on load and activity in process. The concept of load can be understood as how much of something someone or some- thing carries. Information behaviour research recognises the concept of load, most frequently in references to information overload, a state resulting from “too much information” (Case, 2007, p. 333). The broad concept of activity relates to the concept of load: load may be pushed or pulled, and carried frequently or infrequently. Activity is also acknow- ledged in information behaviour research; for example, Boyd (2004, p. 85, Fig. 3, Table IV) notes that information seeking involves “pull methods” and “push methods”—with pull understood as “active” (i.e. seeking) and push as “passive” (i.e. receiving)—and that these methods may be more or less frequent.25 The serendipity literature presents conflicting views on load.26 Many authors suggest that serendipity benefits from or requires a larger load (Bj¨orneborn, 2008; Rice, 1988; Sawaizumi et al., 2009; Snowden, 2003; Thudt et al., 2011; Wills and Kolodner, 1994); however, others propose that a smaller, limited load is preferable (Andr´e et al., 2009; Moghnieh et al., 2008). Occasionally, the benefits and disadvantages of load amount are considered in

23The term ‘rich’ is also associated with the concept of value; however, the context surround- ing the use of ‘rich’ in Wills and Kolodner’s (1994) and Cunha et al.’s (2010) texts strongly suggests that the authors intend to refer first and foremost to ideas such as abundance, fullness, and elaborateness. 24Note that the present study does not employ the term ‘complexity’ in a mathematical sense, for example, as used in complexity science (see Bertuglia and Vaio, 2005, Part 4). 25A caveat should be noted here concerning the term ‘passive’: due to its cognitive aspects, information behaviour is never truly ‘passive’. As such, in both Boyd’s (2004) model and the present research, the term ‘passive’ is used in a looser sense, to contrast with ‘active’. 26Although not directly relevant to the present study, it is worth highlighting that this conflict is especially evident in discussions of the effect of the Web, and the related issues of Web personalisation and filtering, on serendipity (for examples, see Beacham, 2001; Campos and de Figueiredo, 2001; Hangal et al., 2012; Helft, 2007; Johnson, 2010; Leong, Vetere and Howard, 2007).

58 2.3 Serendipity as process the context of a single discussion; for example, Beale (2007, p. 421) notes that quantity, not only leads to the “increased opportunity of being able to discover”, but also “mitigates against us being able to discover”. As well, the benefits of load may be discussed in terms of the balance or distribution of load type: for example, Cunha (2005, p. 12–14) argues that an increased load of people, but a restricted load of knowledge, facilitates serendipity. The broad concept of activity also features in the serendipity literature. Some discussions refer to push and pull; for example, Koenig (2000, p. 11) argues for “push” that is “not too aggressive” for serendipity, while Schumpeter (2010, p. 61) refers to the “power of pull” for serendipity. Other discussions consider the active and passive aspects of serendipity: Foster and Ford (2003, p. 323) describe serendipity as “in some way both passive and yet capable of ‘efficiency’ ”; McBirnie (2008b) suggests that serendipity is exper- ienced as both active and passive; and, Sun et al. (2011, ‘The role of context in serendipitous experiences’, ‘Understanding serendipity’) note both that a role may be “active” or “less active” in serendipity and that activity in serendipity may take a “passive form”. Based on their empirical findings, Sun et al. (2011, ’Under- standing serendipity’) even issue a call concerning the concept of activity: “. . . we suggest that studying the nature of serendipity should include people’s activity, which can help to describe what happens when people experience serendipity”. Frequency is also acknowledged as relevant to serendipity; for example, John- son (2010, p. 113–117) describes storing information in such a way that it can be used again, at some later time, to facilitate serendipity. Hangal et al.’s (2012) applied intervention, an experience-infused browser for serendipity, is based on this same idea of repeated use. It is important to note that as with load, frequency may be active (i.e. pulled) or passive (i.e. pushed). The present study employed the labels activity and passivity to differentiate active and passive frequency. The label activity should not be confused semantically with the concept of activity: the concept encom- passes several facets of serendipity process while the label describes only one facet of serendipity process.

59 2. CONCEPTUAL FRAMEWORK

Interaction As noted in section 2.3.1, serendipity is associated with interaction. Interaction is defined by the process links between participants. The relational nature of event structures accommodates questions focused, not only on the concept of interaction, but also on the related concept of interaction patterns. Interactions can collectively form a whole. The event structure itself is perhaps the most obvious example of a whole formed by a collection of interactions;27 however, within event structures, smaller ‘wholes’ of interaction are possible. A pattern of interaction is a special type of ‘whole’ that, because it repeats across multiple event structures, is deemed significant. The present study referred to such significant ‘wholes’ of interaction as interaction patterns. The fact that the literature classifies serendipity according to patterns (see section 2.3.2.3), some of which reference interaction, provides support for the idea of interaction patterns in serendipity. As van Andel (1994, p. 640) remarks, serendipity patterns represent “ways in which unsought findings have been made”. Crucially, these are not just any ‘ways’, but rather ‘ways’ that are deemed rep- resentative or typical. Nonetheless, despite the explicit acknowledgment of typical ‘ways’ of serendip- ity, the description of interaction patterns in the literature remains relatively general. For example, consider van Andel’s (1994) eleventh serendipity pattern, which refers, in part, to interaction with an ‘outsider’ in serendipity process, a proposition that is supported by the findings of several empirical studies (Foster and Ford, 2003; McBirnie, 2008b; Sun et al., 2011). What of the specific inter- action events of this outsider involvement? Consider a collection of event structures described by serendipity experiences where each experience matches Van Andel’s outsider pattern. Questions related to specific interaction events at both the local and global level may be asked of the event structures: do the local interactions between outsider and non-outsider(s) follow the same (or similar) pattern in every event structure? and, does every event structure describe the same (or similar) global interaction pattern?

27Note that labelling a relational structure as a ‘whole’ does not imply that the structure is ‘holistic’. See section 3.1.2 for further discussion of this point.

60 Chapter 3

Methodological Framework

This chapter assembles the study’s methodological framework. The discussion considers relationalism, phenomenology, narrative theory, modelling and network theory, and suggests how these interrelate. Although concerned with methodo- logy, the chapter does dip into methods, especially narrative and network meth- ods, when information about how concepts are applied in practice has the poten- tial to aid methodological understanding.

3.1 Relationalism

3.1.1 The relational dyad

What is the smallest unit of analysis in relational research? It may seem odd to open a methodology chapter with a measurement question that would normally be placed in a methods section; however, the answer to this question is fundamental to understanding the relational perspective. Non-relational approaches take the individual, or some analogous holistic sys- tem, as the smallest unit of analysis. Relational approaches treat the relational dyad, conceived as two elements and an interrelation between them, as the smal- lest unit of analysis. The study of dyadic relations, rather than individuals or holistic systems, fundamentally affects the types of questions asked in relational research and how these questions are studied.

61 3. METHODOLOGICAL FRAMEWORK

3.1.2 The relational perspective

Relational research describes structure. Indeed, Wellman and Berkowitz (2008) refer to the relational approach as structural analysis. Relational research also describes trans-action,1 a conception of process that differs from both self-action and inter-action (Emirbayer, 1997). Of course, many theories in the humanities and the sciences focus on structure and process (Wellman and Berkowitz, 2008, p. 4); for introductions to sociolo- gical examples, see Craib (1992), Craib (1997, parts I–II), Layder (2006) and Wellman and Berkowitz (2008). So, what distinguishes the relational perspective on structure and process from the non-relational perspective? As Crossley (2011, p. 4) notes, this question has received considerable attention from social theorists. The discussion here draws on three such discussions: Emirbayer (1997), Crossley (2011) and Wellman (2008).

Emirbayer In his comparison of substantialism, which he considers the dominant soci- ological perspective, and relationalism, Emirbayer (1997) refers to Dewey and Bentley’s (1949) concepts of self-action, inter-action and trans-action. Emirbayer (1997) conceives both individualistic relationships, such as those focused on by Rational Actor Theory,2 and holistic relationships, such as those studied in Sys- tems Thinking,3 as self-actional relationships. In such relationships, the interest in relations is secondary: the primary interest is in how individuals or systems benefit from relations. Stepping briefly outside of Emirbayer (1997), it is worth stressing that the re- lational perspective does not preclude the consideration of structures as ‘wholes’, the sum of their parts (for example, see Lazer and Friedman, 2007). Indeed,

1Emirbayer (1997, p. 285) stresses that because, for example, the terms interaction and transaction are sometimes used as synonyms in everyday language, he maintains Dewey and Bentley’s (1949) use of the hyphen to emphasise the specialised meanings of these three terms. The discussion in the present chapter follows Emirbayer’s (1997) example by including the hyphen in trans-action here, but removing it in later sections. 2Emirbayer (1997, p. 284) cites Game Theory (see Ross, 2010) as an example of Rational Actor Theory. 3Actor Network Theory (for example, see Latour, 2005) provides an example of a holistic approach.

62 3.1 Relationalism one relational model, the network architecture model (see section 3.5.6.3), con- ceives structures as dependency structures. This said, a systemic viewpoint that considers structures more than the sum of their parts is fundamentally holistic rather than relational in nature. Inter-actional relationships are variable-based relationships. With inter-actional relationships, the focus is on “causal interconnection” (Dewey and Bentley, 1949, p. 108 in Emirbayer, 1997, p. 285). Emirbayer (1997) argues that only trans- actional relationships, those that focus on actors in relations, fall within the re- lational perspective. He notes that such relationships can be studied using social network analysis.

Crossley Crossley (2011) argues his case for relational sociology by considering three dichotomies: individualism/holism, agency/structure and micro/macro. His ar- guments concerning individualism and holism echo Emirbayer’s (1997) comments. In terms of agency and structure, Crossley (2011, p. 5) notes that the relational perspective studies both, albeit with a specific understanding: “ . . . inter-actors make history (agency) but not in circumstances of their choosing (structure)”.4 Crossley (2011, p. 181–182) considers the micro/macro dichotomy in sociology a “problem of scale” and notes that relational structures scale both up and down.5

Wellman Wellman’s (2008) discussion is notable for its emphasis on the differences between relational and non-relational perspectives on normative behaviour. Well- man (2008, p. 33) argues:

. . . [structural analysts] interpret behavior in terms of structural con- straints on activity instead of assuming that inner forces (i.e., intern- alized norms) impel actors in voluntaristic, sometimes teleological, behavior toward desired goals. Thus, they treat norms as effects of

4The terminology here is confusing given Emirbayer’s (1997) use of both interaction and transaction. Although Crossley (2011) prefers the term, interaction, his use of the term is more in line with Emirbayer’s (1997) understanding of transaction. See Crossley (2011, p. 28–35). 5For example, network motifs (see section 3.5.7.3) represent a scaling down of relational structure.

63 3. METHODOLOGICAL FRAMEWORK

structural location, not causes . . . .

Wellman stresses that the relational perspective focuses on external social, as opposed to internal psychological, structures. As such, structural analysts are not concerned with internal motivations for action: “ . . . accounting for individual motives is a job better left to psychologists” (Wellman, 2008, p. 33). Two points related to the external focus of relationalism should be noted: the focus does not deny the internal experiences of individuals, but rather, examines such experiences only in the context of external relations; and, the focus does not assume that social structure works towards a collective goal or results in intended consequences.

3.2 Phenomenology

3.2.1 Introduction and definitions

Section 2.3 specified the nature and features of process in serendipity. Recall that, as well as being fundamentally defined by the duality at its core, serendip- ity is perceived as lived, experiential, cumulative, longitudinal, retrospective and relational. The use of the term ‘perceived’ is significant. The present study in- vestigated serendipity as phenomenon, serendipity as perceived by the individual experiencing it. Phenomenon is defined as: “Thing that appears or is perceived . . . ; that of which a sense or the mind directly takes note . . . ” (Sykes, 1976, p. 828). Phenomena are in the mind of the beholder, irrespective of their ex- istence, or not, in any external world beyond the beholder. Thus, the study of serendipity as phenomenon is concerned with appearance rather than reality in any positivist sense. To study serendipity as a phenomenon, the research adopted a phenomenological stance.

64 3.2 Phenomenology

In a paper examining phenomenology in information research,6 Budd (2005, p. 45) notes:

. . . phenomenology is complex and it has a rich and complicated his- tory. There is no single agreed-on definition of phenomenology; a number of philosophers have developed phenomenological programs that have similarities, but they also have distinct identities.

This is not the place to attempt a review of the numerous discussions and debates that abound in both philosophy and sociology concerning the nature, conceptions and branches of phenomenology. Such a review is a substantial study in its own right; however, with so many distinct understandings of phenomenology presented in the literature, an explication of the present study’s interpretation of phenomenology is necessary. In its most basic definition, phenomenology means ‘the science of the phe- nomenon’ (Lewis and Staehler, 2010, p. 1; Moran, 2002, p. 4). Moran (2002, p.4) elaborates further, referring to phenomenology as, “ . . . the science which stud- ies appearances, and specifically the structure of appearing . . . ”. Smith (2008, sec. ‘Phenomenology’) continues with the theme of structure, defining phenomen- ology as, “ . . . the study of structures of conscious experience as experienced from the first-person point of view”. Section 3.2.4 examines one understanding of phenomenological structure. Defining conscious experience as experience that an individual lives through, acts or performs, Smith (2008, sec. 2) writes:

How shall we study conscious experience? We reflect on various types of experiences just as we experience them. That is to say, we proceed from the first-person point of view. However, we do not normally char- acterize an experience at the time we are performing it. . . . Rather, we acquire a background of having lived through a given type of ex- perience, and we look to our familiarity with that type of experience

6Because of the limited space available here, the decision was taken not to review the lit- erature on the use of phenomenology in information research. The present study adopts a phenomenological stance overwhelmingly due to the nature of serendipity rather than the in- volvement of information behaviour. Readers interested in analyses of phenomenology in rela- tion to information research, especially work in information seeking and information behaviour, are referred to Budd’s (2005) review and Wilson’s (2002) discussion.

65 3. METHODOLOGICAL FRAMEWORK

. . . The practice of phenomenology assumes such familiarity with the type of experiences to be characterized. Importantly, also, it is types of experience that phenomenology pursues, rather than a particular fleeting experience—unless its type is what interests us.

Smith highlights several points fundamental to phenomenology’s study of con- scious experience. Firstly, conscious experience is always articulated from the first-person perspective. Secondly, conscious experience is retrospective and cu- mulative. Finally, conscious experience can be categorised into types of experi- ence, especially familiar types of experience. In relation to his last point, the possible types of experience, Smith (2008, sec. 2) elaborates:

We all experience various types of experience including perception, imagination, thought, emotion, desire, volition, and action. Thus, the domain of phenomenology is the range of experiences including these types (among others).

Note that the experience types studied in phenomenology are familiar and ex- perienced by ‘all’. Its categorisation of common and familiar experience into types highlights phenomenology’s interest in the normative aspects of conscious experience. Phenomenology aims for presuppositionless7 description of conscious experi- ence and avoids a causal, explanatory focus. Moran (2002, p. 1) notes that, “The phenomenological approach is primarily descriptive, seeking to illuminate issues in a radical, unprejudiced manner, paying close attention to the evidence that presents itself . . . ”. Moran goes on to comment that phenomenology’s goal of presuppositionlessness is impossible to attain in science, a discipline which must always rely to some degree on the use of cumulated human knowledge; he argues that the more attainable aim of “ . . . freedom from undisclosed prejudices” is normally the recognised and more realistic goal of phenomenology (Moran, 2002, p. 2).8

7Moran (2002, p. 2) credits this term to Husserl. 8All italics in the preceding paragraph are Moran’s own.

66 3.2 Phenomenology

3.2.2 Historical development

As a way of perceiving the social world, the phenomenological perspective is evid- ent in ancient Greek philosophy (Heidegger, 2002; Lewis and Staehler, 2010); how- ever, modern scholars recognise that, “ . . . as a distinctive philosophical method, phenomenology emerged gradually only in the context of post-Cartesian mod- ern philosophy and specifically in post-Kantian German philosophy . . . ” (Moran, 2002, p. 9). Throughout the eighteenth and much of the nineteenth century, the term phenomenology was used in conjunction with the idea of phenomena as facts that could be observed empirically, especially in scientific contexts; however, late in the nineteenth century, the meaning of phenomenon began to change (Smith, 2008). In 1874, the German philosopher and psychologist, Franz Brentano (1838- 1917), wrote his Psychologie vom Empirischen Standpunkte [Psychology from an Empirical Standpoint] (see Brentano, 1874), in which he refers to phenomena as occurring in the mind: phenomena can be either physical or mental, but both exist as intentional acts of consciousness (Huemer, 2010). With Brentano, phenomena begin to lose their purely positivist connotations. Edmund Husserl (1859-1938), a German mathematician, is recognised as the father of the formal discipline of phenomenology (Lewis and Staehler, 2010, p. 2). Put simply, Husserl sought to study phenomena, understood as fundamentally subjective, objectively. Husserl had a strong positivist background, studying astronomy, as well as mathematics and physics; however, he also had an active interest in philosophy. After completing his PhD in mathematics in 1883, Husserl went to study philo- sophy with Brentano and was strongly influenced by the psychologist’s anti- Hegelian approach to phenomena. Husserl sought to combine Brentano’s ideas with his own positivist back- ground and new approaches to the concepts of objectivity and subjectivity that were emerging in the field of logic (Beyer, 2011; Moran, 2002). In his Logische Untersuchungen [Logical Investigations] Parts One and Two (see Husserl, 1900, 1901), Husserl introduced his version of phenomenology that aimed to combine the ideas of phenomena as both subjective and objective.

67 3. METHODOLOGICAL FRAMEWORK

Husserl’s writing on phenomenology impacted the work of other philosophers in Germany, France and America (although not extensively in the UK) (Moran, 2002, p. 16–22); several branches and approaches to phenomenology developed as a result.

3.2.3 The Sch¨utzianperspective

Alfred Sch¨utz(1899-1959), an Austrian-born philosopher and social scientist who settled in America after fleeing Nazi Europe, is credited with entrenching phe- nomenology within sociology (Barber, 2010; Wilson, 2002), beginning with a work entitled, Der sinnhafte Aufbau der sozialen Welt [The Phenomenology of the So- cial World] (see Sch¨utz,1932). In his pragmatic and sociologically relevant inter- pretation of phenomenology, Sch¨utzargued that subjective acts of consciousness construct an objective social world. Psathas (1999, p. 47) writes of Sch¨utz:

The problems of the social sciences he [Sch¨utz]says are posed by two questions: How is it possible to grasp subjective meaning scien- tifically? and how is it possible to grasp by a system of objective knowledge subjective meaning structures?

This interest in a scientific, objective approach to subjective structures defines Sch¨utzianphenomenology. Phenomenology is other than anti-positivist (Compton, 1988; Embree, 1992); however, although both share an interest in norms, phenomenology also stands away from positivism. Positivism concentrates on observing facts and events by removing the sub- jective human factor from the consideration of these facts and events. Phenomen- ology views the world differently, considering the impact of the human factor:

it [the world] is from the outset meaningful to the human beings living their everyday lives within it. This is so because our common-sense thinking has pre-interpreted and pre-structurized the world (Sch¨utz, 1999, p. ix).

As well, while positivism aims at causal explanation, phenomenology focuses on description: pure description; description in combination with interpretation

68 3.2 Phenomenology of context; or, analysis of forms of description (Moran, 2002; Smith, 2008). Despite the desire to investigate subjective, conscious experience as described in the first-person by individuals, Sch¨utzplaces absolutely and exclusively with the researcher all decisions on the structural description of this conscious exper- ience:

. . . structurization will depend upon knowledge of problems solved, of their still hidden implications and open horizons of other still not formulated problems. The scientist takes for granted what he defines to be a datum, and this is independent of the beliefs accepted by any in-group in the world of everyday life. The scientific problem, once established, determines alone the structure of relevance (Sch¨utzin Psathas, 1999, p. 48).

Sch¨utzalso openly values the independence of the investigator. Wilson (2002, sec. “Schutz’s methodological position”) summarises Sch¨utz’sthinking regarding this “disinterested observer”:

In Schutz’s words . . . The social scientist . . . is operating on the basis of a scientifically-determined set of relevances, choosing those aspects of the situation that are appropriate for the objectives of the research. As a result, the social scientist may focus on aspects of behaviour that are taken-for-granted by the ordinary person, but that are topics of cognitive interest to the social scientist. In behaving in this way, the researcher develops models of human action . . . .

Sch¨utz’snotions of defining data, assuming investigator independence, and establishing a scientific problem that focuses on structure, ‘models of human ac- tion’, highlight the differences between phenomenology and other social science approaches interested in the social world as human experience, including nat- uralistic inquiry (Lincoln and Guba, 1985) and grounded theory (Glaser, 1992; Glaser and Strauss, 1967). Although, in common with phenomenology, natur- alistic inquiry is not interested in generalisations concerning cause and effect, the naturalistic approach differs from the phenomenological approach in several key ways: naturalistic inquiry considers the act of research as fundamentally subjective, rather than objective; the naturalistic perspective understands the world as comprising multiple realities, therefore making the description of a phe-

69 3. METHODOLOGICAL FRAMEWORK

nomenon as a single predictable reality impossible; and the naturalistic approach concentrates on developing hypotheses concerning individual cases rather than generalising to norms (Lincoln and Guba, 1985). Unlike naturalistic inquiry, grounded theory shares phenomenology’s focus on “ . . . fundamental latent patterns . . . ” (Glaser, 2004, sec. 3.1); however, while phenomenology is interested in the description of patterns, grounded theory is concerned with the conceptualisation of theory based on the patterns. As well, the bottom-up, inductive approach of grounded theory is in direct opposition to the Sch¨utzianapproach to the deductive definition of data and establishment of scientific problems.

3.2.4 The structure of intentionality

Sch¨utz’sreferences to structure (‘pre-structurized’, ‘structurization’, ‘structure of relevance’, see section 3.2.3) and the definition of phenomenology examined in section 3.2.1 serve as reminders that phenomenology focuses on structure. But what is meant by structure in the phenomenological sense? Structure relates to the idea of intentionality. Recall from section 3.2.2 that Brentano considered all phenomena as occurring in the mind, as intentional acts of consciousness; therefore, in its very existence, conscious experience is an expres- sion of intentionality. Intentionality implies that experience is always of some- thing and directed at something (Lewis and Staehler, 2010, p. 22). The relation between the first person, ‘I’, who articulates the conscious experience, and the object receiving the intentionality of that experience is the fundamental structure of interest in phenomenology. Consider a phenomenological description of one type of conscious experience: action. At a minimum, all phenomenologically relevant action can be expressed in the form of a sentence with the structure {Subject:Verb:Object} (Smith, 2008, sec. 2). Now focus in to consider a specific sentence: ‘I walk the dog.’ The subject, ‘I’, is crucial because of the first-person perspective required in phenomenology: it is ‘I ’ who proceeds with intent. This intentionality is carried through in my activity of walking, and ‘I’ intend my activity onto something, ‘the dog’. The

70 3.2 Phenomenology intentionality of my conscious experience, the action of walking the dog, only has meaning as an expression when the subject and object link through the verb to form a relation. In phenomenology, this intentionality relation, a relational dyad, is the smal- lest structural unit of study. Of course, most phenomenological description, in- terpretation and analysis in the context of sociological research focuses on struc- tures many times more complex than a single sentence; nonetheless, the simple example presented above illustrates how intentionality is evident even in the most basic phenomenological action unit, conscious experience expressed as a {Subject:Verb:Object} relation.

3.2.5 Problems and limitations

Phenomenology is not without its problematic aspects and limitations. The phe- nomenologist must not only accept methodologically inherent limitations on the description of conscious experience but also take steps to outline accurately the boundary of that conscious experience. The boundary of conscious experience is affected by several concerns. Firstly, conscious experience will never include subconscious experience unless such experience is first made conscious (Reber, 1993). Frameworks focusing on cognitive activities tend to distinguish between two modes of control, the atten- tional mode, which guides conscious cognitive activity, and the schematic mode, which regulates subconscious activity (Reason, 1990, p. 50–51). Cognitive activ- ity in the attentional mode requires constant effort to sustain, whereas cognitive activity in the schematic mode uses schemas, established patterns of past think- ing, experience and awareness, to ease the strain of cognitive processing, removing the effort required in the attentional mode. Patterning as schema is highly pre- valent in human behaviour, action and thought. We are unlikely to describe patterning, which in most cases we are unaware of, as conscious experience. Secondly, humans are subject to the combined effects of motivated thinking (Molden and Higgins, 2005) and the use of mental models (Johnson-Laird, 2005). Motivated thinking implies that humans seek desired outcomes and shape their thinking accordingly (Molden and Higgins, 2005, p. 297). Mental models are

71 3. METHODOLOGICAL FRAMEWORK perceptions of the external world created by an individual for the purpose of in- ternal manipulation. This internal manipulation allows the individual to anticip- ate possible outcomes (Johnson-Laird, 2005, p. 185). Results of the combination of motivated thinking and the use of mental models are the human tendencies to filter internal and external environments, to restrict attention to environmental elements considered relevant, and to see the world in a preconceived way, poten- tially not as it really is. Finally, the limits of human memory (see Morrison, 2005) bound the articu- lation of conscious experience. Because conscious experience is always defined as past experience, the risk is ever present that the articulation of conscious experi- ence omits forgotten elements. As such, conscious experience must be understood to encompass only remembered experience.

3.3 Narrative theory

3.3.1 Overview

3.3.1.1 Narrative in social science research

Barthes (1977, p. 79) notes that in its “ . . . almost infinite diversity of forms, narrative is present in every age, in every place, in every society . . . ”. Franzosi (1998, p. 519) argues that this diversity and omnipresence makes narrative highly relevant to social science research: “It is precisely because (a) narrative texts are packed with sociological information and (b) much of our empirical evidence is in narrative form that sociologists should be concerned with narrative”. Indeed, since the early 1990s, the study of narrative has become notably more popular in disciplines across the social sciences (Elliott, 2005; Squire, 2008); for example, in information-related disciplines, narrative approaches have been used to study information behaviour (Bates, 2004a,b; Gainor, 2011), information re- trieval and storage (Beghtol, 1997), information systems (Alvarez and Urla, 2002; Gazan, 2005; Wagner, 2003) and information/knowledge management practice (Brophy, 2009; Watson, 2001). The use of narrative to which Elliott (2005), Franzosi (1998) and Squire (2008) refer is not the study of data that ‘just happen’ to be in narrative format, but

72 3.3 Narrative theory rather the explicit study of narrative in its own right. Elliott suggests that this explicit interest in narrative is driven by several themes:

1. An interest in people’s lived experiences and an appreciation of the temporal nature of that experience; 2. A desire to empower research participants and allow them to contribute to determining what are the most salient themes in an area of research; 3. An interest in process and change over time; 4. An interest in the self and representations of the self; 5. An awareness that the researcher him- or herself is also a nar- rator. (2005, p. 6)

A single study may explore all of these themes or choose to focus on only one or two. As the themes highlight, both a phenomenological approach and a focus on process are compatible with the study of narrative.

3.3.1.2 A definition of narrative

What is narrative? Many definitions of narrative exist; in fact, the definition is a source of controversy because what narrative is depends to some extent on the approach used to study it (Squire, 2008). Hinchman and Hinchman (1997, p. xvi) offer a useful starting point:

Narratives (stories) in the human sciences should be defined provision- ally as discourse with a clear sequential order that connects events in a meaningful way for a definite audience and thus offer insights about the world and/or people’s experiences of it.

Note that this definition endows narrative with what Squire (2008, p. 9) terms, structural, semantic and pragmatic features. Narratives have structure; an essential component of this structure is a chro- nologically ordered sequence of events that cumulates in at least one reportable event (Franzosi, 2010; Labov, 1997; Labov and Waletzky, 1967). Narratives also have meaning—in some cases, multiple meanings; as such, narrative research- ers need to be clear about which meanings they propose to study (Squire, 2008,

73 3. METHODOLOGICAL FRAMEWORK

p. 34–35). Finally, narrative is told to someone, a defined audience, even if that audience is imagined; thus, narratives are social, both performed in local social contexts and “ . . . bound into wider negotiated social worlds” (Plummer, 1995, p. 24).

3.3.1.3 Approaches to narrative

As discourse, narratives are stories that communicate events and experiences to an audience. These stories may be told from the first- or third-person perspective, and may range from fact to fiction. Narrative research in the social sciences tends to focus more on factual stories— or, at least, stories with some basis in real-life experience—told in the first-person; however, the research rarely, if ever, conceives such narrative as an objective rep- resentation of reality in a positivist sense. For example, narrative researchers ac- knowledge that, due to the inherently retrospective nature of narrative data, the informant accuracy problem (Bernard, Killworth, Kronenfeld and Sailer, 1984; Marsden, 1990) affects all narrative research; nonetheless, narrative researchers also argue that, in the context of narrative research, the issue of informant ac- curacy need not necessarily be considered problematic in terms of its impact on any research findings (Elliott, 2005, p. 22–28). At least two typologies of narrative approaches have been established (Elli- ott, 2005, p. 37). One framework classifies narrative approaches according to Halliday’s (1973) three functions of language: syntax, or structure; semantics, or meaning; and pragmatics, or performance context (Elliott, 2005; Mishler, 1995; Squire, 2008). Structural approaches base their analyses on the form underlying narrative and draw heavily on linguistic concepts. Semantic approaches are in- terested in the meaning of narratives—what experience does narrative describe? The semantic approach to narrative is grounded in phenomenology. Squire (2008, p. 16) notes that the experience-based approach “ . . . rests on the phenomenolo- gical assumption that experience can, through stories, become part of conscious- ness”. Pragmatic approaches see narratives as performance and/or as classed into genres or types. Questions in this tradition of narrative research seek to address the social and cultural causes, effects and impacts of narrative.

74 3.3 Narrative theory

Somewhat confusingly, as Squire (2008) illustrates, the ‘function of language’ framework is commonly used to classify narratives as well as the approaches to them; thus, researchers may refer to event narratives, narratives of experience, or cultural or context-bound narratives. It is important to realise that a single narrative can be classed into any one of these three categories: thus, narrative classification is not absolute, but rather, entirely dependent on the perspective adopted and analysis approach taken. It is for this reason that the definition of narrative depends to some extent on the approach used to study it (see section 3.3.1.2). Lieblich, Tuval-Maschiach and Zilber (1998) suggest a second framework for classifying narrative approaches. Their framework describes narrative approaches in two dimensions (Elliott, 2005, p. 38). In the first dimension, narrative ap- proaches are classed into those that concentrate on form and those that focus on content. Thus, this dimension is similar to the first two categories of the ‘function of language’ framework discussed above. In the second dimension, narratives ap- proaches are typed as holistic or categorical: holistic analyses study the complete narrative, while categorical analyses focus on extracted sections of text. Elliott (2005, p. 38) describes the difference between these two analytic approaches:

Whereas holistic approaches attempt to understand sections of the text in the context of other parts of the narrative, categorical ap- proaches do not attempt to preserve the integrity of the whole account. As Lieblich et al. argue, . . . these categorical approaches resemble tra- ditional content analysis and are more amenable to quantitative or statistical methods of analysis.

While these two typologies of narrative approaches are helpful for understand- ing how narrative is studied, the frameworks should not be seen as implying the need for rigidity or exclusivity in narrative approaches. Elliott (2005, p. 38) notes, “As with all typologies, these two rather different frameworks both repres- ent something of an oversimplification”. In practice, narrative approaches may overlap or be used in combination. For example, the present study relied heavily on the use of categorical analysis that focused on structural aspects of narrative; however, this analysis and focus would have been impossible (and irrelevant) without consideration of narrative content.

75 3. METHODOLOGICAL FRAMEWORK

3.3.2 Event narrative as structure

3.3.2.1 The Labovian narrative analysis framework

Labovian analysis is a well-known structural approach to event narrative (Elliott, 2005, p. 42). The original version of the formal analysis framework was presented in Labov and Waletzky (1967) and re-published in Labov and Waletzky (1997). The original framework was concerned primarily with temporal organisation and evaluation in narrative (Labov, 1997). Alongside the Labov and Waletzky (1997) re-publication, Labov published an extended version of the original framework (Labov, 1997), that, amongst other issues, considers reportability and credibility (see section 4.4.2). Labov and Waletsky (1967; 1997) divide the overall structure of an oral nar- rative of personal experience into six sections:

1. an abstract, which provides a summary of the story;

2. an orientation, which provides the setting for the story;

3.a complicating action, which sets out the sequence of events of the story;

4. an evaluation, which describes the result or impact of the events;

5.a resolution, which provides an ending to the story; and,

6.a coda, which brings the story back to the present.

Of these six sections, the complicating action and evaluation are especially relevant to the present study. A narrative complicating action comprises a min- imum of two events that the narrator perceives as related in some way. Although a complicating action is theoretically “ . . . both necessary and sufficient to consti- tute a narrative” (Franzosi, 2010, p. 14), in practice, a narrative comprising only a complicating action would be “abnormal” (Labov and Waletzky, 1997, p. 13) because such a narrative would lack a main point, an element normally presented in the evaluation section. Indeed, it is the main point that helps an audience to understand the sig- nificance of the relationships between the individual events in the complicating action. Elliott (2005, p. 43) notes:

76 3.3 Narrative theory

. . . the evaluation demonstrates what meaning events have for the nar- rator and makes the point or purpose of the story clear to the audi- ence. It is because the evaluative dimension of a narrative provides an insight into how the narrator has chosen to interpret the events recounted that these evaluative elements can be of particular interest for sociologists in the hermeneutic tradition.

Event narrative can thus offer much to the researcher grounded in the phe- nomenological tradition: as Labov (1997, p. 396) emphasises, personal experience narratives are not tall tales or elaborated exaggerations of experiences, but rather “ . . . an attempt [by the narrators] to convey simply and seriously the most im- portant experiences of their own lives”.

3.3.2.2 Franzosi’s Model of Narrative Elements

Franzosi (2010) considers event narrative a mix of action, description and evalu- ation. This mix is evident at all levels of narrative structure. Figure 3.1 illustrates Franzosi’s (2010, p. 19) Model of Narrative Elements, which depicts action, de- scription and evaluation as three overlapping circles of different sizes. Given the minimal definition of narrative as a sequence of events (Elliott, 2005; Franzosi, 2010; Labov and Waletzky, 1967, 1997), action resides in the largest circle; non- etheless, the overlapping circles serve as a reminder that the different aspects of narrative mix to some degree. Franzosi (2010, p. 19) labels the largest circle the narrative proper, an element that focuses on “ . . . what people do, not on what people say, feel, [or] think . . . ”. Action events in the narrative proper are described as semantic triplets (see section 3.3.3.1). Franzosi (2010, p. 22–23) classes action in terms of Halliday’s process groups,9 arguing that material processes, along with some verbal and behavioural processes, are found most frequently in narrative propers. Franzosi’s narrative proper is similar conceptually to Labov and Waletzky’s complicating action—both encompass narrative events; however, due to the ana- lysis methods involved in its delineation, the complicating action is perhaps best

9See (Halliday, 1985; Halliday, 1994; or Halliday and Matthiessen, 2004) for more detail on these process groups.

77 3. METHODOLOGICAL FRAMEWORK

Figure 3.1: Franzosi’s (2010) Model of Narrative Elements - The model in this figure is reproduced from Franzosi (2010, p. 19). The model realises narrative as comprising three elements, narrative proper, or action, evaluation and descrip- tion, that may overlap. As an essential component of all narrative, action, in the form of the narrative proper, occupies the largest circle. conceived as a formal concept found only within the bounds of a Labovian ana- lysis. As such, the present study preferred the less specific term, narrative proper.

3.3.2.3 Multiple levels of structure

It is important to note that the structural approaches discussed in sections 3.3.2.1 and 3.3.2.2 conceive event narrative as multi-level. The complete narrative rep- resents the highest structural level. The complicating action or narrative proper is described at a more finely-tuned structural level. The narrative clause and action events in the form of the semantic triplet (see section 3.3.3.1) are found in the deepest structural level.

78 3.3 Narrative theory

3.3.3 The semantic triplet

3.3.3.1 Definition and notation

The semantic triplet is a semantic relationship expressed as, {Subject:Predicate: Object}, that comprises “ . . . a directed link from the subject node to the ob- ject node, where the predicate link describes the relationship between them [the subject and object nodes]” (Kinsella, Harth and Breslin, 2007, p. 1–2). One- way direction is a defining feature of the semantic triplet: semantic meaning is only established by the movement from subject through predicate to object as in the statement, {Subject → Predicate → Object}. Direction allows the semantic triplet to convey meaning in structural form. For this reason, semantic triplets play a role in semantic grammars (see section 3.3.3.4). An example of the use of the semantic triplet in an information-related area is the Resource Description Framework (RDF),10 the metadata model used to label the conceptual linking structure of the Web, a structure commonly termed the Semantic Web11 (Er´et´eo,Gandon, Buffa and Corby, 2009; Kinsella et al., 2007; W3C®, 2004b, sec. 6.1). Alternative notations for the semantic triplet, used especially in semantic grammar and event-based analysis contexts (see sec- tion 3.3.4), include: {Subject:Verb:Object} (Roberts, 1997, p. 90); and {Subject: Action:Object} (Franzosi, 1989, p. 263; Franzosi, 2004, p. 49; Franzosi, 2010, p. 24) or {Participant:Process:Participant} (Franzosi, 2010, p. 24). The present study used the semantic triplet notations {Subject → Action → Object} and {Participant → Process → Participant} interchangeably.

3.3.3.2 Three points

Three points about the semantic relationship {Subject:Predicate:Object} are es- pecially relevant to the present study. The first point is that a collection of related semantic triplets forms a relational structure. This structure is normally conceived and studied as a network. The second point is that the elements of

10See W3C® (2004b) for an introduction to RDF concepts and W3C® (2004a) for an overview of the RDF vocabulary. For more information on the current status of the RDF, see W3C® (2011a). 11See W3C® (2011b) for information about the Semantic Web.

79 3. METHODOLOGICAL FRAMEWORK

the semantic triplet can have attributes. These attributes can be generalised or highly specified. The third point is that that the semantic triplet can be either intransitive or transitive. An example from the literature serves to illustrate all of these points. Kinsella et al. (2007) aimed to study how objects form implicit links between people in object-centred networks. The researchers extracted semantic triplet data, including Friend of a Friend (FOAF) descriptions12 based on {Subject: Predicate:Object} triples, from the Semantic Web in order to investigate the networks of two people, Tim Berners-Lee and Andreas Harth. The research considered two types of networks for these people, people-only networks and object-centred networks. Each type of network was determined by the attributes assigned to the semantic triplets: the people-only network comprised triplets re- stricted to a {PersonSubject:Predicate:PersonObject} format, while the object- centred network allowed subject and object to have any attribute, {Subject: Pre- dicate:Object}. The FOAF data collected in the study formed relational structures, networks, because the data were expressed in the FOAF vocabulary. This vocabulary, which includes terms for classes, the subject and object in the semantic triplet, and properties, the predicate in the semantic triplet, follows the Resource Description Framework’s linking structure, or triple model.13 While many elements of the FOAF vocabulary combine into either statements of association (e.g. triplets that associate an image or email address of a person with that person) or copula clause-type statements,14 others describe transitive semantic triplets; for example, the class, ‘agent’, notated as foaf:Agent, in com- bination with the property, ‘made’, notated as foaf:made, and the class ‘doc- ument’, notated as foaf:Document, outlines an action as the semantic triplet, {foaf:Agent → foaf:made → foaf:Document} . Section 3.3.4 explores the rel- evance of such transitive semantic triplets to the event-based study of narrative.

12An FOAF description is a machine-readable document that follows the RDF linking struc- ture model. A FOAF description describes a person in terms of that person’s Web-based links to their creations, activities and other people (Brickley, 2011a). FOAF descriptions make up part of the Semantic Web. See Brickley (2011b) for more information on the FOAF project. 13See Brickley and Miller (2010) for more information on the FOAF vocabulary. 14’The author is a researcher’ is an example of a copula clause.

80 3.3 Narrative theory

3.3.3.3 The syntactic/semantic subject-object discrepancy

Despite its notation in syntactic terms, the semantic triplet is, as its name sug- gests, semantic. Work in the area of natural language processing recognises the existence of several levels of natural language understanding, from words to syn- tax and semantics, to discourse and pragmatics (Liddy, 2003). While automated “thematic content analysis has yielded considerable suc- cess”, the automated coding of semantic relationships such as semantic triplets “has proven more difficult” (van Atteveldt, Kleinnijenhuis and Ruigrok, 2008, p. 428). Because they cannot rely on syntax alone to capture semantic relation- ships, automated systems can struggle with the ambiguity possible in a natural language semantic triplet. Van Atteveldt et al. (2008, p. 432) refer to this am- biguity as the “syntactic subject-object semantic subject-object discrepancy”. Semantic grammars offer systematic ways to address this discrepancy. Section 3.3.3.4 introduces one such grammar, Dixon’s (2005) framework.

3.3.3.4 Semantic grammar

In his, A Semantic Approach to English Grammar, Dixon (2005, p. xv) aims “ . . . to put forward a semantically oriented framework for grammatical analysis, and to indicate how this framework can be applied”. Dixon relies heavily on the concepts of semantic types and semantic roles. Semantic types are groups of words arranged at the semantic level that have the same meaning component. Dixon labels each semantic type after import- ant members of the group of words belonging to that type. For example , the verbs ‘begin’, ‘start’, ‘finish’, ‘stop’ and ‘continue’ all have a common meaning component and are collectively labelled, the BEGINNING type (Dixon, 2005, p. 7). At the level of grammar, Dixon places words in word classes, or parts of speech. An example word class is VERB. Semantic types can belong to only one word class or to several word classes. Dixon (2005, p. 9) notes:

A verb is the centre of a clause. A verb may refer to some activity and there must be a number of participants who have roles in that activity . . . or a verb may refer to a state, and there must be a participant to

81 3. METHODOLOGICAL FRAMEWORK

experience the state . . . .

In the VERB word class, Dixon (2005, p. 10) argues that the verbs of each semantic type have a distinct set of roles; for example, GIVING semantic type verbs demand a Donor, a Gift and a Recipient while AFFECT verbs require an Agent, a Target and a Manip, something which is manipulated by the Agent to affect the Target. Note that the Manip need not be explicitly stated in the case of the AFFECT semantic type. Dixon (2005, p. 10) proposes that language has a limited number of syntactic relations, with Subject and Object being “probably universal”. Semantic roles can often be mapped onto syntactic relations in different ways. As an example, Dixon (2005, p. 11) gives three different Subject:Object mappings of AFFECT verbs:

1. Agent as Subject, Target as Object, Manip marked by a prepos- ition, as in John hit the pig with his stick; 2. Manip in Object slot, as in John hit his stick against the lamp post; 3. Manip in subject relation, as in John’s stick hit Mary (when he was swinging it as she walked by, unnoticed by him.)

While Dixon (2005) also reviews nouns, adjectives, and adverbs, his attention to verbs is especially relevant to the present study. Dixon (2005, p. 11–12) argues that verbs can be divided into those demanding only one role and those demand- ing two or more roles. Verbs with only one role are intransitive and map onto an S relation, intransitive subject. Such a relation is not a semantic triplet, but rather a ‘duple’. Verbs with two or more roles are transitive and must map one role onto an A relation, transitive subject, and another role onto an O relation, transitive object, thereby creating a transitive semantic triplet. Dixon argues that some transitive verbs, termed extended transitive or ditrans- itive verbs, including ‘give’, ‘show’ and ‘put’, must map onto an E relation, or extension to the core, as well as onto A and O relations; for example, the sen- tence, ‘I put the plate in the dishwasher’, requires the verb to be mapped onto A, O and E relations. An extended transitive mapping effectively results in two

82 3.3 Narrative theory semantic triplets, one with an explicit predicate and one with an implicit predic- ate. Dixon also notes that some verbs are ambitransitive because they can be either transitive or intransitive, depending on how they are used. In the process of developing his semantic grammar, Dixon (2005, p. 13) clas- sified around nine hundred English verbs into semantic types and mapped the roles of the semantic types onto syntactic relations. A list of these verbs and their semantic roles is presented as an appendix in Dixon (2005, p. 484–491). Dixon’s appendix list is a practical tool, in part, because it allows the quick identification of the transitive subject and object, even in complex sentence struc- tures, but more significantly, because it offers a systematic means for approaching the extraction of semantic triplets from semantically ambiguous sentences. This practical aspect of Dixon’s semantic grammar has been noted previously in the context of applied network research involving semantic triplets (van Atteveldt et al., 2008, p. 432). The present study used Dixon’s (2005) list as a tool to aid the collection of the semantic triplets underlying the event structure models (see sections 5.3.3 and 5.3.4).

3.3.4 Event narrative and the semantic triplet

3.3.4.1 The semantic triplet as process

The definition supplied in section 3.3.3.1 presents the semantic triplet as a generic concept for representing semantic relationships; however, in an event narrative analysis context, the semantic triplet has a more specific meaning—it repres- ents process. As such, the semantic triplet predicate is normally an action verb. Exceptions to this norm involve extended transitive verbs (see section 3.3.3.4). The semantic triplet is a useful concept in event narrative analysis for three reasons:

1. the semantic triplet reduces the description of narrative events to a bare minimum, potentially allowing researchers to notice information that might otherwise remain hidden in more complex descriptions;

2. the semantic triplet is the smallest relational unit describing process in an event narrative; and,

83 3. METHODOLOGICAL FRAMEWORK

3. the complicating action or narrative proper, as an event structure described by the interrelation of multiple semantic triplets, can be studied using net- work methods.

3.3.4.2 Aggregation approaches

One approach to semantic triplets is to aggregate triplets from multiple texts by classifying the triplets into categories based on semantic content. Roberts (1997, p. 90) describes an aggregated semantic triplet approach:

First, the researcher isolates a population of texts germane to the phenomenon under investigation. Second, a semantic grammar must be acquired that specifies the relations that may be encoded among themes in the texts. Finally, the texts’ themes are encoded according to the relations specified in the semantic grammar. Encoded inter- relations may then be used as indicators of various characteristics of the phenomenon under study.

In aggregation approaches, once semantic triplets are identified, they are cat- egorised according to themes or concepts. Statistical methods may also be applied to the counts of semantic triplets in each theme or concept class. The important thing to note about aggregation approaches is, that although specific semantic triplets are extracted from individual texts, the research interest is not in the individual triplet or text, but rather, in the aggregated triplets as a collective representation of multiple texts. As such, aggregation approaches are not suitable for research that seeks to capture individualistic aspects of texts.

3.3.4.3 Matrix-based approaches

A collection of semantic triplets can also be expressed as a matrix. Section 3.5.2 introduces mathematical matrices; in the social sciences, matrices, normally conceived as tables, are used to collect relational data, including semantic triplet data. This section explores three matrix-based approaches to collecting semantic triplets from narrative: aggregated semantic triplet matrices, participant ana- lysis tables, and meta-matrix models. Because they produce matrices from single,

84 3.3 Narrative theory rather than multiple, narratives, the latter two approaches were especially relev- ant to the present study.

Aggregated semantic triplet matrices Franzosi (2010, p. 109) comments, “There is a class of statistical tools that is perfectly suited to dealing with relationships between actors, tools that mirror the SAO [Subject:Action:Object] structure: network models”. One use of net- work models in an event narrative analysis context relies on the description of aggregated semantic triplets in matrix format. These matrices underlie networks that can be analysed using network methods (for examples, see Franzosi (2010, p. 112–113) and Wada (2004)). Note that automation of the description of aggregated semantic triplet matrices is possible: for example, PC-ACE (Program for Computer-Assisted Coding of Events) (Franzosi, Elliott-Smith and Cunial, 2012) stores semantic triplets in tables from which the software then computes the matrices underlying the net- work models (Franzosi, 2010, p. 110).

Participant analysis tables Labov (2003) reports on research that aimed to uncover the event structure of narrative. Although some of Labov’s research objectives, especially his attempts to infer causality from rewritten semantic triplets, are potentially problematic (Squire, 2008, p. 13–14), Labov’s methods are of interest because they capture in matrix format complicating actions represented as collections of semantic triplets. The present discussion focuses only on the matrix aspect of Labov’s (2003) work with event structures. Labov constructs matrices of sequences of events found in the complicating actions of narratives. Each event structure matrix contains narrative actors, ex- tracted from the complicating action, in the rows, and narrative action, also extracted from the complicating action, ordered sequentially in the columns. Re- lations between actors and action are marked in the cells of the matrix. Labov presents several example matrices, which he terms participant analysis tables. Table 3.1 reproduces a participant analysis table from Labov’s (2003) study. In addition to the participant analysis table, Labov (2003, p. 65) provides

85 3. METHODOLOGICAL FRAMEWORK

the reader with the complicating action from part one of the underlying narrative, The first man killed by a car in this town:

Complicating action

e And this son—I guess he must’ve got drunk because he drove through town with a chauffeur with one of those old touring cars without, you know— open tops and everything, big cars, first ones— f and they—they come thr-through town in a—late in the night. g And they went pretty fast, I guess, h and they come out here to the end of a– where–uh–Pontiac Trail turns right or left in the road i and they couldn’t make the turn j and they turned left k and they tipped over in the ditch, l steerin’ wheel hit this fellow in the heart, this chauffeur, m killed him.

Each event (e–m) from the complicating action, with the exception of event (i), is labelled in sequence across the top of the matrix in Table 3.1.15 Where relevant, the semantic triplet {Subject → Action → Object} structure of the events can be found in the columns that contain both a y, an active participant (i.e. the semantic triplet subject), and a z, the participant affected by the action (i.e. the semantic triplet object.). Thus, the matrix contains three semantic triplets: {chauffeur → drove → car} from event (e), {steering wheel → hit → chauffeur} from event (l), and {steering wheel → killed → chauffeur} from event (m). Note another significant aspect of Labov’s matrix: it allows an inanimate object, a steering wheel, to sit in the subject position of the semantic triplet.

15Note that Labov does not give a reason for the omission of event (i); however, it seems likely that event (i) was omitted because it is negative (i.e. the turn didn’t happen).

86 3.3 Narrative theory - This y y , and patients directly y The first man killed by a car in The first man killed by a car in this town . The first action is in parentheses, since the narrator qualifies it with x z x x x x y y y y y y z z e f g h j k l m (y) x x x x gets drove came went came turned tipped hit killed drunk through through fast out left over him him ’ ”. Car ; other participants are denoted as Judge z Chauffeur . Labov (2003, p. 68) explains the table’s content: “Active causal agents are entered as Judge’s son Steering Wheel I guess he must’ve Table 3.1: Labov’s participanttable analysis reproduces table Labov’s for (2003, part 1 p. of 68, Table 5)affected participant as analysis table‘ for part 1 of this town

87 3. METHODOLOGICAL FRAMEWORK

The meta-matrix model Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus (2009, p. 86–93) present another matrix-based approach, one designed specifically to map nar- ratives as networks. The method takes a narrative and describes it as several distinct matrices. Each matrix represents a network. Taken together, these net- works form a meta-network of the narrative. Diesner and Carley (2005) term this meta-network method the meta-matrix model. The method involves three steps. The first step creates lists for classes of elements found in the narrative. Suggested element classes include agents, loca- tions, events, tasks, knowledge and resources; however, this list is not meant to be exhaustive. Analysts can include additional/fewer/different elements as re- quired (Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus, 2009, p. 88). An example element list is an agent list, which includes all of the people in the narrative. In the next step, element lists are placed in the rows and columns of matrices. Due to the directed nature of semantic triplets, relations are always mapped from row elements to column elements, {row element → column element}; therefore, it is essential that the directed relation between the selected elements is a viable one. For example, agents can be placed in the rows opposite any other elements in the columns because agents can relate directly to all elements: agents can have ties with other agents, agents can participate in events, agents can be in locations, agents can use resources, and so on. Viable links between non-agent elements tend to be more limited; for example, consider events and tasks. Events can be placed opposite events and tasks can be placed opposite tasks because both events and tasks can occur sequentially, {task → task} or {event → event}; however, the relation {event → task} is not viable because events cannot perform tasks (Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus, 2009, p. 92). The final step records the relations between the elements located in the rows and columns. Relations are recorded in the cells of the matrix using numbers. A number indicates the number of times a relation occurs; for example, a one would indicate a single occurrence of a relation between two elements while a zero would indicate the absence of a relation between two elements.

88 3.4 Modelling theory

When used to its fullest extent, the meta-matrix approach collects much more information than simply the process and participants in narrative; however, the analyst can choose to apply an abridged form of the method when the analytical aim is to capture only semantic triplets from narrative. The present study used an abridged form of the meta-matrix approach to collect event structure model two (see section 5.3.4). It is important to note that the meta-matrix approach relies heavily on de- cisions taken by the analyst. Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus (2009, p. 87) emphasise this point:

It is up to you [the analyst] . . . HOW to construct the individual NodeSets [element lists] and Networks [matrices] and how to combine the Networks into a MetaNetwork.

3.4 Modelling theory

3.4.1 Modelling as methodology

In their empirical study of the modelling process, Srinivasan and Te’eni (1995, p. 419) note:

Modelling is an approach that is used in many disciplines to manage complex problems . . . It is a process of abstraction in which important details are represented while irrelevant ones are omitted. The problem solver can thereby focus on the critical elements of the problem . . . .

Modelling, especially conceptual modelling that relies on words, diagrams or flowcharts, is established practice in information research; for related discussions, see Wilson’s (1999) review of models in information behaviour research, J¨arvelin and Wilson’s (2003) consideration of models for information seeking and retrieval, and Ford’s (2004) exploration of models of cognitive processing in information seeking.16

16These examples provide broad discussions of the use of modelling in information research. Section 2.2 presents selected information studies models; however, many more exist, too nu- merous to review in the limited space available here.

89 3. METHODOLOGICAL FRAMEWORK

The purpose of all models, including those used in information research, is to describe a representation of reality in a useful way. In other words, models are practical tools. Models develop and inform theoretical and practical understand- ing of a given problem or a domain by limiting attention to specific aspects of that problem or domain. Because models are tools, modelling can be considered a method; however, modelling is also a methodology. The choice to conceive social concepts as mod- elled is only one of many valid approaches to framing the social world. As a methodology, modelling is highly flexible. Apostel (1961, p. 1–2) ar- gues that models, as “ . . . instruments in the service of many different needs . . . ”, demonstrate an “ . . . undeniable diversity . . . ”. Models can be of many types and can be described to varying levels of abstraction; for example, Raftery (1998, p. 297) presents a taxonomy of models that ranges from iconic models, “ . . . essentially scale transformations of the real world . . . ”, to abstract symbolic and conceptual models. Despite their diversity, all models share three features:

1. they are structural representations;

2. they are simplified representations; and,

3. they are subjective representations.

3.4.2 Structuring structure

Models are comprised of parts, parameters, connected into a whole, the model. Models aim to present these connected parts in a structured manner. It is this concept of structure that contributes to the flexibility of modelling as a method- ology. Pullan (2000, p. 1) writes of structure:

Certain words spring to mind: organisation, arrangement, connec- tion, orientation, framework, and order—especially order, which is more encompassing but provides the ground for structure. Indeed, it is impossible to conceive of structure without order. Too often the im- plication is of something rigid and complete, yet structure, like order,

90 3.4 Modelling theory

can be open and dynamic, a matter of structuring as much as struc- ture . . . without the singular notion of some structure or framework it is difficult to make sense of anything, but it is in the articulating of structure, resulting in a potentially infinite plurality of structures, that meaning resides.

Modelling enables the examination of structure, in an ordered, although not necessarily linear, flexible framework; more significantly, modelling potentially en- ables this infinitely. In a very real sense, one can ‘play’ with a model’s structure— add to it, subtract from it, shift it, adapt it, turn it upside down and inside out—and through the process arrive at a more detailed understanding of what is being modelled.

3.4.3 Balancing accuracy and simplicity

Useful modelling involves establishing a balance between accuracy and simplicity. It is impossible to include every detail of a complex, real-world phenomenon or problem in a single model. Raftery (1998, p. 301) argues:

The purpose of the model is not to describe reality but to reduce it to a more manageable form . . . Models thus could be regarded as translations of reality into a different language which has fewer words and which addresses itself only to the features of reality relevant to the problem under scrutiny. Models frequently are criticized for not taking account of variables x or y. The criticism is, of course, justified only if the missing variables are important to the problem.

Relevancy relies on both accuracy and simplicity: include everything neces- sary, exclude everything unnecessary ‘to the problem under scrutiny’. As well, because models are “ . . . studied with a certain purpose in mind . . . ”, relevancy is also determined by factors such as decisions on scale and perspective (Apostel, 1961, p. 15).

3.4.4 Subjective interpretations

While it is very tempting to think of some models, for example, mathematical models, as purely objective, the issue of relevancy highlights the fact that there is

91 3. METHODOLOGICAL FRAMEWORK always, somewhere at some stage in the modelling process, a human interpreter (of ‘the problem under scrutiny’) making decisions that affect model description. No matter how carefully considered, human decisions are always fundamentally subjective. The very nature of decision implies that a choice was made. As such, all models are subjective interpretations.

3.4.5 Wrong, but useful

“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful” (Box and Draper, 1987, p. 74). This oft paraphrased comment is attributed to the statistician, George Box, and serves as a reminder of the utilitarian purpose of models. The three features discussed in sections 3.4.2, 3.4.3 and 3.4.4 common to all models ensure that a model is always wrong:

1. given the flexible nature of structure, no one model can have an absolute claim to structuring structure;

2. as a simplification, every model leaves something out; and,

3. as a subjective interpretation, no model can claim an objective, unbiased representation of reality.

Paradoxically, these three features that make a model wrong also make it useful. Structure gives meaning to our social world, our existence in it and our experience of it; however, for this structure to be useful, we must interpret it, place boundaries on it, and recognise its re-structuring potential. Modelling allows us to capture the order present in the structure of our social world in a flexible, useful and interpreted way.

3.5 Network theory

3.5.1 Introduction

Section 3.5 considers aspects of the theory, methods and debates relevant to the study of networks. In general, a non-network specialist, information studies

92 3.5 Network theory audience is assumed; however, due to the multidisciplinary nature of the research, the report may also be of interest to network specialists. The need to cater to both specialist and non-specialist audiences raised con- flicting concerns in terms of content. On the one hand, a considerable amount of introductory content was needed in order to support an understanding of the research by the non-specialist:

Network theory, like some other branches of . . . sociology, has de- veloped its own specialized terminology and technology. An unfor- tunate result has been that many non-specialists have lost touch with advances and debates in this branch. (Alder in Abrahamson and Rosenkopf, 1997, p. 289)

On the other hand, introductory content would be common knowledge to a net- work specialist, who would expect the discussion to engage with more advanced network theory issues. Given these conflicting content concerns, the discussion in section 3.5 aims to strike a three-way balance between:

1. ensuring that the non-specialist reader is introduced to the network termin- ology and techniques needed to understand the research study;

2. engaging with network theory and debates at an advanced level appropriate to specialist knowledge; and,

3. restricting the content given that the report must necessarily review a large amount of material in a limited space.

Three network studies texts are recommended to readers interested in a more in-depth discussion of network theory than is provided here. Wasserman and Faust (1994) is a foundational text that reviews network theory and its applic- ation in the social sciences. Newman (2010) is a multidisciplinary text that provides an extensive survey of network measures and includes discussion of re- cent developments in network theory. A noteworthy point is that both of these books discuss network measures with little or no reference to their operational- isation in network analysis software, an approach that is especially helpful for the

93 3. METHODOLOGICAL FRAMEWORK reader seeking to understand the mathematical formulae underlying the network measures. For readers seeking an open access option, Hanneman and Riddle’s (2005) online text on social network methods is recommended, especially for its broad, yet succinct, coverage and its readability.

3.5.2 Mathematical foundations of networks

Networks are a type of mathematical model used to study relations. A network is “ . . . a collection of elements and their inter-relations” (Kolaczyk, 2009, p. 15). As visualisations, networks are commonly conceived as dots connected by lines. Figure 3.2 illustrates a network.

Figure 3.2: A network - The figure shows a network visualised as a collection of dots connected by lines. This network is formed by the interrelation of five elements (v1–v5).

Networks are mathematical objects, graphs, that derive from the branch of mathematics known as graph theory. Network elements are termed, vertices or nodes, and network relations, lines or edges. A network can be represented quant- itatively as a square matrix, an array of numbers, and measured and manipulated using matrix algebra. For readers unfamiliar with matrix algebra, Namboodiri (1984) provides an introduction aimed at the social scientist.

94 3.5 Network theory

Table 3.2 presents the matrix for the network illustrated in Figure 3.2. The ones and zeros in the matrix indicate the presence (1) or absence (0) of an edge between two vertices. This type of matrix, incorporating only ones and zeros, is termed a binary matrix.

v1 v2 v3 v4 v5

v1 0 0 1 1 1 v2 0 0 0 1 0 v3 1 0 0 0 0 v4 1 1 0 0 1 v5 1 0 0 1 0

Table 3.2: A binary matrix - This table presents the binary matrix underlying the network illustrated in Figure 3.2.

Lines in a network can be undirected or directed. In a directed network, all lines must have direction, but they may be either one-way (→ or ←) or bidirectional (↔). Directed lines are termed arcs instead of edges. Bidirectional arcs represent the combination of two arcs linking in opposite directions (). Figure 3.3 shows the undirected network from Figure 3.2 transformed into a directed network. Table 3.3 presents the matrix corresponding to the directed network. Note that the matrix has changed; unlike the undirected example in Table 3.2, the matrix in Table 3.3 is not symmetrical in terms of the ones and zeros, confirming quantitatively that some arcs in the network are one-way. Network lines can also carry weights. Although weights can have several different interpretations, they are often used to indicate frequency, the number of times two vertices link. In the case of directed networks, the two one-way arcs comprising a bidirectional arc can each have a different weight. Figure 3.4 shows a weighted version of the directed network from Figure 3.3. Table 3.4 provides the matrix corresponding to this weighted version of the directed network. Note from the matrix that weights expand the limited binary matrix options of link (1) or no link (0); the numbers in the matrix now represent the number of times a link occurs.

95 3. METHODOLOGICAL FRAMEWORK

Figure 3.3: A directed network - The figure shows the network from Figure 3.2 transformed into a directed network.

Figure 3.4: A weighted directed network - The figure shows the network from Figure 3.3 with weights on the lines. The bidirectional arcs are separated into two distinct arcs, each with its own weight. The weights represent frequency, the number of times an arc occurs.

96 3.5 Network theory

v1 v2 v3 v4 v5

v1 0 0 1 1 1 v2 0 0 0 1 0 v3 1 0 0 0 0 v4 0 1 0 0 0 v5 0 0 0 1 0

Table 3.3: A matrix for a directed network - This table presents the matrix for the directed network illustrated in Figure 3.3. Note that the distribution of ones and zeros is no longer symmetrical.

v1 v2 v3 v4 v5

v1 0 0 1 1 1 v2 0 0 0 1 0 v3 2 0 0 0 0 v4 0 1 0 0 0 v5 0 0 0 3 0

Table 3.4: A matrix for a weighted directed network - This table presents the matrix for the weighted directed network illustrated in Figure 3.4. Note that the matrix is no longer binary. The numbers in the matrix indicate the arc weights.

3.5.3 Measuring and comparing relational structures

Networks are analysed quantitatively using network counts and network measures. Examples of counts include order, the number of vertices in a network, and size, the number of edges in a network.17 Network counts are a standard starting point for network analysis; however, because networks are relational, network measures, rather than counts, are required to capture structural meaning. The smallest unit

17Note that outside the mathematical literature, size is often used to refer to the number of vertices, rather than edges, in a network; for example, the social network literature refers to the comparison of networks of different ‘sizes’ (i.e. networks that have different vertex counts). To avoid confusion, the present document uses order and size in the formal mathematical sense and refers to networks with different vertex counts as networks of different magnitudes.

97 3. METHODOLOGICAL FRAMEWORK

of measurement (i.e. not unit of counting) possible in a network is the network dyad (• − •), a unit describing the relation between two elements. Note that the relation between two elements can be null (as indicated by the zeros in the matrices presented in section 3.5.2), producing a null dyad (••). Network analysis permits both the measurement and comparison of relational structure. In terms of the measurement of relational structure, the distinction between measures and counts is an important one: only network measures provide relational information. Such information cannot be provided by a count. For example, neither order nor size in isolation can define a network as a relational object; the two counts can only do so in combination, a fact written into the formal graph notation for a network:

graph G = (V,E)

where

V = a set of vertices (the order of the graph)

and

E = a set of edges (the size of the graph)

and

the elements of E are pairs of distinct vertices { u, v }.

In contrast, density, a network measure, allows the direct calculation of a relational aspect of a graph, its cohesion. Cohesion refers to the ‘connectedness’ of a graph—how dense or sparse its relations are. Density measures cohesion by calculating the number of links present in a network relative to the number of potential links possible in that network. The potential for links is in turn dependent on the number of vertices in the network. Density presents as a ratio of actual links to possible links. In a directed network, this ratio implies that density is equal to the total number of lines present between pairs of vertices in the network divided by the number of pairs

98 3.5 Network theory of vertices in the network. Note that the term ‘line’ is used here in a literal sense: direction becomes irrelevant—there either is or isn’t a ‘line’ between two network vertices. The formula for density in a directed network18 is

e D = dir v(v − 1) where

Ddir = the density for a directed network

and

e = the number of edges present in the graph

(where all edges are considered undirected)

and

v = the number of vertices in the graph.

The result of a density calculation can range from 0.00 to 1.00. In other words, as a percentage, density ranges from 0%, when a network only contains null dyads, to 100%, when every vertex in the network connects to every other vertex. The comparison of relational structures is also important in some network analysis contexts. Because its value ranges from zero to one, density is a norm- alised measure. Some network measures, including degree (see section 3.5.6.5), have both a normalised and an absolute variant. Counts, such as order and size, are always absolute. Normalised measures are useful in network analysis because they allow the direct structural comparison of networks of different magnitudes, a comparison that is impossible using counts or absolute measures. As real networks often vary in magnitude, the means to compare networks of different magnitudes is a valuable asset in empirical research contexts that involve the investigation of multiple networks.

18The density of a directed network also equates to that network’s average degree centrality. See sections 3.5.6.5 and 7.2.4.

99 3. METHODOLOGICAL FRAMEWORK

3.5.4 Networks in the real world

3.5.4.1 A taxonomy of network classes

Network methods focus on the empirical study of several classes of network. The network class tends to impact the research questions asked. Kolaczyk (2009) and Newman (2010) outline a typology comprising four network classes: technological networks, biological networks, networks of information, and social networks. Lan- guage networks19 (see Mehler, 2008, sec. 3; Mihalcea and Radev, 2011, chap. 4) comprise a fifth network class. Research examining technological networks, such as the Internet, the tele- phone network, power grids, and transportation and distribution networks, is often concerned with determining network topology, or the overall structure of the network, and the related issue of structural robustness. Biological networks include biochemical networks, neural networks, and ecological networks. The fo- cus of much biological network research is on patterns of interaction. The Web and citation networks are examples of networks of information. A notable feature of information networks is that they tend to be directed, some in only one dir- ection. Because, like technological networks, information networks can be very large, many of the topological questions asked of technological networks are also relevant to information networks. Social networks are networks that involve people, animals or larger combined social units, such as kinship groups or organisations. Other examples usually classed under social networks include networks and networks formed by social media. Many social network research questions focus on the concept of flow and its relation to social capital. Of work in all the network classes, research with social networks comes closest to addressing issues of cause and effect. As discussed in sections 3.5.5.1 and 3.5.6.6, causality cannot be addressed through network analysis alone. For this reason, social network studies often include non-network components designed to investigate actor attributes deemed relevant to the research questions. Attribute investigation and network analysis can then be synthesised to inform hypothesis development.

19Language networks may also be termed linguistic, lexical or text networks.

100 3.5 Network theory

Examples of language networks include syntactic, similarity and semantic net- works. A language network can be described from a single text or from a collec- tion of texts. Language networks are often studied in natural language processing contexts; however, semantic networks, in particular, are also studied outside of computational linguistics (see section 3.5.4.3).

3.5.4.2 Fuzzy boundaries

The network typology presented in section 3.5.4.1, while a useful starting point, must be understood in the context of the messiness of both real-world networks and real-world research: as Newman (2010, p. 63) stresses, “The classification of networks as social networks, information networks, and so forth is a fuzzy one, and there are plenty of examples that . . . straddle the boundaries”. Secondary typological labels also cross boundaries; for example, both social and technological networks may be described as geospatial networks.

3.5.4.3 Social semantic networks

A semantic network describes concepts found in text and the semantic links between the concepts (Sowa, 1983). Studies such as Popping’s (2003) research on the extraction of knowledge graphs from texts and Feldstein’s (2009) narrative approach to the extraction of mental maps from online texts provide examples of semantic network analysis. While some semantic network analysis is concerned primarily with the se- mantic understanding of text, other research takes the stance that semantic structure sheds light on the social structure underlying text. This social structure stance mirrors the viewpoint of Carley and Kaufer (1993), who argue that con- cepts and relations explicit in the text are symbolic, inferences of implicit social understanding. Hennig (2006) provides an example of a basic social semantic network ap- proach. She analyses Thomas Mann’s (1924), Der Zauberberg, by mapping the social interactions between characters in the novel, an approach that results in a social network (albeit, a fictional one) derived from a semantic text. The im- portant point to note about this research is that it treats the semantic and social

101 3. METHODOLOGICAL FRAMEWORK networks, if not as synonymous, then certainly as closely related. A range of possible approaches to the analysis of semantic and social semantic networks exists. As discussed in section 3.5.4.1, natural language processing se- mantic network research is often driven by lexical or linguistic aims; however, sociological research is usually more interested in social aspects of semantic net- works. Some sociological research investigates semantic networks primarily to explore underlying social structure, meaning or culture (Carley, 1994; Diesner and Car- ley, 2004; Palmquist, Carley and Dale, 1997; Scholand, Tausczik and Pennebaker, 2010). Other research, especially studies of online social semantic networks, fo- cuses equally on social and semantic aspects, aiming to overlay or merge separate analyses for optimum effect (Downes, 2005; Jung and Euzenat, 2007).

3.5.5 Network research

3.5.5.1 The network perspective

Network research focuses on the description of existing relational structure; as such, network methods are appropriate for research questions that have rela- tional, structural bases and do not seek to prove causal relationships. In prac- tice, network research normally includes some non-network aspects, including the study of attributes and wider contextual elements. Network methods may also be used in combination with other research methods, an approach that is essential if network methods are employed in research defined by explanatory aims (for example, see Fischer, 2011). While some network research continues to focus on theoretical issues, other work aims to apply abstract theory in concrete settings: for example, Wilson, Dougherty, Sutter, Mitchell-Jackson, Wellner and Hall (2008) recommend that network methods be introduced into best practice in programme evaluation and design, while Thimbleby and Oladimeji (2009) consider the potential network methods offer for interaction programming and device design. A recent report (Scearce, 2011), commissioned by the Knight Foundation, examines networks in society and provides a good example of how network theory has moved beyond the academic arena into real-world settings.

102 3.5 Network theory

3.5.5.2 Information studies

This section begins with an overview of a recent study, Schultz-Jones (2009), that surveyed the use of network theory and methods in information behaviour research. The overview concentrates on the survey’s findings, rather than on the studies reviewed. The remainder of the section examines selected research not reviewed in Schultz-Jones (2009). The selected research is either representative of a major research area related to both information and network studies or relevant to the present study.

Schultz-Jones Schultz-Jones (2009) reviews the post-1996 use of network theory and methods in information behaviour research. The review was inspired by Haythornthwaite’s (1996) discussion of the usefulness of network methods for information research and practice. Amongst other recommendations, Haythornthwaite (1996, p. 340) suggests that network methods be used as a tool by information professionals to assess information needs and to modify the delivery of information. Schultz-Jones (2009) examines a sample of information and non-information science research reports referring to network theory or analysis. She finds that content in information science journals does reflect an interest in network theory and analysis, with the largest number of reports of network research published in the journal, Information Research (Schultz-Jones, 2009, p. 599); however, she also argues that the growth of interest in network methods, especially network analysis, in information science is more recent than that in non-information sci- ence fields, a finding that suggests that information science has been slower than other fields to embrace network methods (Schultz-Jones, 2009, p. 601). Schultz-Jones (2009, p. 605) identifies four foci for network research in inform- ation behaviour:

1. the impact of or value received from network structures;

2. diffusion across networks;

3. influence in networks; and,

4. case-studies of information seeking in networks.

103 3. METHODOLOGICAL FRAMEWORK

In line with her expectations, Schultz-Jones finds some research focuses on cita- tion analysis; however, she suggests that most network research in information science explores social networks and draws on the social capital tradition. Many studies focus on the impact or value created by structure; for example, studies examine strong and weak ties as sources of information, constraint in information-seeking networks, information sharing in academic environments, and the social aspects of information behaviour. Research into diffusion, despite be- ing a popular network research focus outside of information science (see section 3.5.5.3), is limited in information science to a few studies examining how inform- ation diffuses in information provision contexts and knowledge work. Research exploring influence is highly focused on who influences whom in publication and citation networks, although some studies focus on influence patterns in collab- orative social networks. Finally, a small number of studies investigate cases of information seeking networks, especially cases of networks created by information needs; however, none of these case studies employ quantitative network analysis or report their results as network measures. Schultz-Jones (2009) concludes her review with suggestions for future network research in information science. She notes that network research especially offers potential for the investigation of information behaviour in context, with the net- work providing a useful relational context. Echoing Haythornthwaite’s (1996) call for the increased use of network methods in information science, Schultz-Jones (2009, p. 626) suggests that, although progress has been made, more remains to be done, with limits to the application of network theory and methods existing “ . . . only in the imagination of the researcher”.

Selected research not reviewed in Schultz-Jones As Schultz-Jones (2009) notes, the studies she reviewed indicate that network methods are used in both information seeking and knowledge management re- search. The wider literature supports this finding. Several examples from the information seeking literature focus on social search: Borgatti and Cross (2003) adopt a relational perceptive to study information seeking in social networks; Kossinets, Kleinberg and Watts (2008) study the effect of network structure on information flow potential in communication contexts; Cross et al. (2001) consider

104 3.5 Network theory the information seeking benefits gained from networks of social relationships; and, Evans et al. (2009) explore the role of social interaction in information seeking tasks. The knowledge management literature is notable for its combined use of social and semantic networks (Liebowitz, 2005; Scott, 2005). Information retrieval research also draws on network theory. Work in the areas of link analysis, bibliometrics and webometrics (de Bellis, 2009; Osareh, 1996a,b; Thelwall, 2004, 2008, 2009, 2010; Thelwall, Vaughan and Bj¨orneborn, 2005) is grounded in network methods. It is important to note that network research in information retrieval may not necessarily refer explicitly to the term ‘social networks’;20 however, social networks and social issues are certainly studied (for a recent example, see Chmiel, Sienkiewicz, Thelwall, Paltoglou, Buckley, Kappas and Holyst, 2011). One area where network theory is highly relevant is the development of search algorithms. Examples of search algorithms based on network theory include: The HITS, or hyperlink-induced topic search algorithm, which relies on the net- work concept of hubs and authorities (Kleinberg, 1999; Newman, 2010); Google’s PageRank, a type of centrality measure (Brin and Page, 1998; Newman, 2010); and ‘find similar’ algorithms (for an example, see Smucker and Allan, 2007). Network theory, especially its interpretation of the small world problem (Mil- gram, 1967), is also relevant to another area of information retrieval research, the study of local search in unstructured, decentralised networks. Such networks are often termed power law, or scale-free networks because of their power law degree distributions (Newman, 2010, p. 249).21 Research suggests that many modern technology-based information and communication networks, including the Web, peer-to-peer file sharing networks, and some email networks, are power law networks (Adamic and Adar, 2005; Adamic, Lukose and Huberman, 2002; Adamic, Lukose, Puniyani and Huberman, 2001; Barab´asi,Albert and Jeong, 2000; Bj¨orneborn, 2004; Thelwall, 2003; Thelwall and Wilkinson, 2003).

20Note that Schultz-Jones (2009, p. 598) searched only on the keyword phrases, ’social net- work theory‘ and ’social network analysis’, during the course of her review—a decision that potentially made her study less likely to locate information retrieval network research. 21See section 3.5.7.1 for a definition of degree distribution.

105 3. METHODOLOGICAL FRAMEWORK

3.5.5.3 Creativity and innovation studies

As argued in section 1.3.4, creativity and innovation differ from serendipity and from each other; nonetheless, in practice, the three concepts are sometimes stud- ied interchangeably. In recognition of this situation, it is pertinent that the present study briefly considers the use of network research in the context of cre- ativity and innovation studies. Network research exploring creativity and innovation draws heavily on the concept of social capital and usually involves social networks. Like information behaviour network research (see section 3.5.5.2), creativity and innovation net- work research can be classified according to research focus. Innovation studies often focus on diffusion while studies of creativity tend to concentrate on either influence or structural impact and value received; however, it is important to note that research foci may overlap, especially in research that approaches innovation and creativity as synonymous. Innovation studies seek to explore how knowledge or ideas diffuse through so- cial networks. Questions asked include: are certain network actors keys to innov- ation (Tsai, 2001)? do some areas of a network participate in innovations faster, or not at all (Gilsing, Nooteboom, Vanhaverbeke, Duysters and van den Oord, 2008; Nooteboom, Gilsing, Vanhaverbeke, Duysters and van den Oord, 2006)? and, how does network structure affect the extent of innovation (Abrahamson and Rosenkopf, 1997)? Network research examining creativity focuses on the influence of vertices on other vertices in the network (Kijkuit and van den Ende, 2007; Zhou, Shin, Brass, Choi and Zhang, 2009); the local patterns of interaction and clustering within the network (Fleming, Mingo and Chen, 2007); the impact of global network structure (Uzzi and Spiro, 2005); and, the advantages network position affords vertices (Perry-Smith and Shalley, 2003). A key assumption underlying creativity studies network research is that creativity is not an internal, psychological activity, but rather, an activity that relies on social interactions as sources of new ideas.

106 3.5 Network theory

3.5.6 Network methods

3.5.6.1 Historical context

Social network analysis, the formal method employed to measure networks, is used to study networks in all classes—not just social networks; so why is the term ‘so- cial’ found in the method’s name? The term’s inclusion reflects the historical development of network studies as an interdisciplinary discipline born from work in several fields, including mathematics and the social sciences (Borgatti, Mehra, Brass and Labianca, 2009; Freeman, 2004, 2011; Newman, 2010). Mathematics’ involvement with networks traditionally traces back to 1736 when Euler’s work on the K¨onigsburg bridge problem inspired his idea of graph theory. As Freeman (1996) notes, in the social sciences, the first empirical studies documented as re- lying on recognisable network methods derive from the 1920s and 1930s (Almack, 1922; Bott, 1928; Chevaleva-Janovskaja, 1927; Hagman, 1933; Hubbard, 1929; Moreno, 1934; Wellman, 1926). Throughout much of the twentieth century network developments in vari- ous fields, including mathematics, physics, sociology, anthropology, psychology, biology and computer science, intermingled and interinfluenced, processes that continue today. The limited space available here does not permit a fuller discus- sion of the history of network theory and methods; readers interested in more information about the development of network analysis in the social sciences are referred to Borgatti et al.’s (2009) review.22 Despite some problems associated with a lack of knowledge about network methods (Mergel and Henning, 2007, p. 2) and the dispersion of network think- ing across disparate fields (Newman, 2010, p. x), the use of network approaches is flourishing (Marin and Wellman, 2011, p. 15): Everett (2010, slides 2, 27) notes the “rise and rise” of social network analysis and argues that the method is “here to stay”. The increased availability of large datasets and capability of contem- porary network analysis software to handle these datasets effectively (Newman, 2010, p. 275) has also played a role in promoting the continued interest in network research. 22Note that the longer, pre-publication version of this review is the most in-depth.

107 3. METHODOLOGICAL FRAMEWORK

3.5.6.2 Approach

Networks are studied using sociocentric or egocentric approaches. The socio- centric approach adopts a global perspective and examines a complete network, one that is bounded and finite, with no intended focus on a given vertex. By ‘complete’, network theory assumes that no vertices or lines are missing from the network; however, network practice accepts that true completeness is often difficult to achieve in empirical research contexts due to problems with boundary specification and missing data (see section 3.5.6.6). The egocentric approach centres on one vertex and examines the local en- vironment of that vertex. The localised structure that results is termed centred (Freeman, 1982). The egocentric approach is not concerned about network com- pleteness beyond this local, centred environment. The classical egocentric approach studies ego, all of ego’s direct links to alters, and any direct links between the alters. Together these vertices describe an ego network. Although ego networks are most often described at a distance of one link from ego, they can be described at any predetermined distance or, indeed, at no predetermined distance from ego. Data can be collected specifically with the intention of describing an ego network; alternatively, as illustrated in Figure 3.5, an ego network can be extracted from an existing complete network. The research questions guide the choice of sociocentric or egocentric approach. One approach normally dominates a study; however, both approaches may be used in contexts where the study data can be conceived as either ego or complete networks and the research questions address both local and global aspects of the data.

3.5.6.3 Model

The research questions and study data also influence the choice of network model. Borgatti and Lopez-Kidwell (2011) define two network models, the flow model and the architecture model. The flow model conceives the network as a series of links along which something passes from vertex to vertex. Given the right circumstances, the ‘something’ can enter the network, travel through it, and emerge out the other side essentially unchanged. The architecture model, also

108 3.5 Network theory

Figure 3.5: An ego network extracted from a complete network - The figure presents two networks. The top network is a complete network. The network in the bottom right-hand corner is an ego network at a distance of one centring on vertex five (v5) that has been extracted from the larger, complete network.

109 3. METHODOLOGICAL FRAMEWORK

termed the bond or coordination model (Borgatti and Halgin, 2011), realises the network differently, as a dependency structure. Rather confusingly, both models accommodate the transactional concept of flow. Borgatti and Lopez-Kidwell (2011) use the example of the communication of orders to illustrate the difference emphases of the two models. Communication that repeats down the line, for example, orders to change the course heading of a ship, follows a flow model. Communication that changes down the line fol- lows an architecture model; for example, in management structures, the orders that pass from senior management to middle management may differ from those passed from middle management to the staff on the shop floor. As Borgatti and Lopez-Kidwell (2011, p. 46) note, alignment is key in the architecture model: “Communication is involved, but the coordination, not the message, is the mech- anism”. Borgatti and Halgin (2011, p. 6) argue that the flow model has dominated network theory, but stress that non-flow models offer the potential for new the- oretical developments:

The flow model is the most developed theoretical platform in network theory, but it is not the only one. The field has clearly identified phenomena and developed theoretical explanations that cannot be reduced to the flow model.23

It is important that the reader of network-related texts is aware of this strong bias towards the flow model. For example, some network measures commonly presented in standard network studies textbooks are only appropriate for flow model networks; however, this limitation is rarely acknowledged or discussed in the texts because the flow model assumption is so deeply ingrained.

3.5.6.4 Mode

Networks can be conceived as one-mode or multi-mode. In a one-mode network, any vertex in the network can theoretically link directly to any other vertex in

23Examples of such phenomena include structual (Lorrain and White, 1971), automorphic (Winship, 1974, 1988) and regular (Sailer, 1978; White and Reitz, 1983) equivalence, concepts that are studied using blockmodels (see Doreian, Batagelj and Ferligoj, 2005).

110 3.5 Network theory that network. In a multi-mode network, vertices within the same mode may not link directly; direct links are only permitted between vertices belonging to different modes. Like approach and model, mode is a modelling choice. For example, given a network with vertices that represent doctors and nurses, should doctors link to nurses and other doctors (i.e. one-mode), or only to nurses (i.e. two-mode)? The answer depends on the research questions asked, the nature of the study data, and the effect of mode choice on the measurement of that data. As such, network methods ultimately place the decision to use a single or multi-mode approach with the individual analyst (Borgatti and Everett, 1997, p. 244). It is important not to confuse the concept of network mode with that of vertex attribute; the two are distinct concepts that are not required to relate. In the example presented above, ‘doctor’ and ‘nurse’ are vertex attributes, even in the two-mode case, where ‘doctor’ and ‘nurse’ are also network modes. The decision to adopt a one- or two-mode approach to a network has no effect on the attributes of its vertices.

3.5.6.5 Measures of degree

Numerous network measures, calculations which allow the quantitative study of relational aspects of networks, are available to the network analyst (see section 4.5.4). The present study involved directed networks and relied heavily on the network concept of degree. As such, the discussion here focuses on measures of degree, especially those calculable from directed networks. The degree of a vertex is equal to the number of lines directly connected to that vertex, or, in other words, the number of dyads in which a vertex participates. As such, degree is a dyad level network measure. A vertex in an undirected network simply has a degree; however, a vertex in a directed network has three degree measures—an indegree, measuring incoming direct links, an outdegree, measuring outgoing direct links, and a total degree, the sum of the in and outdegrees. Figure 3.6 presents an example of degree measured in both undirected and directed versions of a network. Readers may wish to refer back to this figure as a visual aid during the course of some of the mathematical discussion that follows in this

111 3. METHODOLOGICAL FRAMEWORK

section. The maximum degree possible for a vertex in an undirected network is equal to the total number of vertices in the network minus one:

kv = n − 1

where

kv = the maximum degree of vertex v

and

n = the total number of vertices in the network.

A vertex with maximum degree is connected to every other vertex in the network except itself (hence, the −1). It is important to note that maximum degree is entirely dependent on network order. A vertex in a network of order forty can have a maximum degree of thirty-nine whereas a vertex in a network of order five can only ever have a maximum degree of four; nonetheless, with degrees of thirty-nine and four respectively, both vertices are said to be at maximum degree. In a directed network, maximum indegree and maximum outdegree are cal- culated using the same general formula for defining maximum degree in an un- directed network; however, the formula for maximum total degree in a directed network requires a crucial change. Instead of n − 1, the formula now becomes 2(n − 1). The change is because total degree is the sum of indegree and out- degree. Consider one of the examples given above—the vertex in the network of order five—but now imagine that the vertex is maximally connected to every other vertex in the network by both incoming links

in kv = n − 1 = 5 − 1 = 4

112 3.5 Network theory

Figure 3.6: Degree in undirected and directed networks - The illustration is divided into two halves. The top half shows an undirected network and the bottom half presents a directed version of that network. In the undirected network, vertex 5 simply has a degree of two; however, in the directed network, vertex 5 has an indegree of one, an outdegree of one, and a total degree of two.

113 3. METHODOLOGICAL FRAMEWORK

and outgoing links

out kv = n − 1 = 5 − 1 = 4.

total in out Maximum total degree is defined as kv = kv + kv ; therefore, in this example, maximum total degree is 4 + 4 = 8. Note that the calculation of maximum degree using the 2(n − 1) formula produces the same result:

total kv = 2(n − 1) = 2(5 − 1) = 8.

The calculation of maximum degree results in absolute measures that are not comparable across networks of different magnitudes; however, normalised meas- ures of degree, which are comparable across networks of different magnitudes, are defined at both the dyad and network levels. The dyad level normalised degree measure is termed degree centrality. The degree centrality of a vertex is the ratio of the actual degree of that vertex to the maximum degree possible for that vertex. Degree centrality measures range from 0.0 (0%) to 1.00 (100%). Equations 3.1, 3.2 and 3.3 present the formulae for the indegree centrality, outdegree centrality, and total degree centrality of a vertex respectively:24

in in dv Cv = in (3.1) kv

out out dv Cv = out (3.2) kv 24Note that approaches for the calculation of degree centrality for weighted networks also exist. For example, the present study used a procedure that incorporates a weighting constant into the calculation of maximum degree centrality (i.e. into the denominator of the degree centrality formulae in Equations 3.1, 3.2 and 3.3). See Wei, Pfeffer, Reminga and Carley (2011, p. 10–11) for the weighted versions of the formulae. See sections 6.4.2.3 for further discussion of general issues related to the calculation of weighted degree centrality.

114 3.5 Network theory

total total dv Cv = total (3.3) kv where

dv = the actual degree of vertex v

and

kv = the maximum degree of vertex v

and

Cv = the degree centrality of vertex v.

As mentioned earlier, degree centrality is useful for the comparison of vertex degree across networks of different magnitudes; for example, consider the com- parison of the indegree of two egos found in two networks of different orders. If Ego One has an indegree centrality of 0.35 (35%) and Ego Two has an indegree centrality of 0.80 (80%), then the analysis can conclude that Ego Two has the higher indegree centrality. This conclusion, which accommodates the relational nature of the measured networks, cannot be drawn from the comparison of the absolute measures of indegree for the two egos. The network level normalised degree measure is termed degree centralisation. As with absolute degree and degree centrality, degree centralisation has in, out and total variants for directed networks. Degree centralisation is best understood as a ratio of the variation in degree centrality across a network to the maximum possible variation in degree centrality in that network. Like degree centrality, degree centralisation ranges between 0% and 100% and is directly comparable across networks of different magnitudes. It is important to remember the dis- tinction between the two normalised measures of degree: centrality applies to individual vertices while centralisation applies to the network as a whole.

115 3. METHODOLOGICAL FRAMEWORK

3.5.6.6 Debates

Network methods are not without their problems. The discussion here does not aim to present an extensive review of the controversies and ongoing debates sur- rounding network methods, but rather to develop a basic awareness of four cur- rent concerns in network research. Readers may find familiarity with these issues useful in the context of the research credibility audit presented in Chapter 7.

Boundary specification A network boundary delineates the “ . . . set(s) of objects that lie within a network . . . ” (Marsden, 2005, p. 9). Both sociocentric and egocentric network approaches require decisions concerning network boundaries. Normally, a com- bination of research aim-related needs and pragmatic concerns and limitations guide these decisions. As such, the specification of network boundaries involves both simplification and subjectivity, a situation that highlights the network re- searcher’s role as modeller. In a paper that remains a gold standard discussion, Laumann, Marsden and Prensky (1989, p. 70)25 present a typology comprising eight network boundary specification strategies. Each of the eight strategies is based on one of two broad approaches to boundary specification: nominalist or realist. With a nominalist approach, “. . . an analyst self-consciously imposes a conceptual framework con- structed to serve his or her own analytic purposes” (Laumann et al., 1989, p. 66). An analyst employing a realist approach adopts the network actors’ point of view. As well as being appropriate for many egocentric network studies,26 a realist approach to boundary specification is essential for network studies grounded in phenomenology. Laumann et al. (1989, p. 65) note:

Thus [with a realist approach], the network is treated as a social fact only in that it is consciously experienced as such by the actors composing it; Braithwaite (1959) refers to this as a phenomenolist conception of facts.

25The 1989 paper is a reprint of Laumann, Marsden and Prensky’s (1983) paper. 26Laumann et al. (1989, p. 66) note that a realist approach can be problematic in sociocentric networks because the approach assumes a shared, single point of view amongst network actors.

116 3.5 Network theory

In addition to the two broad approaches, Laumann et al. (1989, p. 70) further specify their typology by means of three definitional foci, one of which is “par- ticipation in an event or activity”. The resulting eight boundary specification strategies represent various combinations of the broad approaches and definitional foci; for example, the present study adopted strategy five—a realist event-based approach. Each boundary specification strategy assumes different rules, which in turn impact the possible choices for the analysis and affect the results. Laumann et al. (1989, p. 63–64) argue that this situation has two implications:

. . . the appropriate choice of rules remains contingent on the object of explanation for a given study . . . irrespective of the solution chosen, the problem of boundary definition should be given conscious atten- tion when studies using a network approach are designed.

In respect of the second implication, Laumann et al. (1989, p. 79) note the “little attention in the past decade [the 1970s] ” paid to boundary specification by network researchers and call for “more explicit attention” to the topic. Recent literature indicates that the problem of network boundaries and their specification remains (Crow, 2004; Heath, Fuller and Johnston, 2009; Karafillidis, 2008; Tilly, 2004).

Missing/inaccurate network data Missing/inaccurate network data can result from boundary specification de- cisions and/or data collection methods (Kossinets, 2006; Marsden, 1990); as such, many real networks studied empirically include data error (Frantz and Carley, 2005; Gile and Handcock, 2006; Handcock and Gile, 2007). If ignored, the miss- ing/inaccurate data problem can result in seriously biased network research. Of course, data error problems are not restricted to network studies; however, the relational nature of network data means that “ . . . the structure of the network is especially sensitive to missing data . . . ” (Huisman, 2009, p. 2). Research suggests that the missing/inaccurate data problem affects network analysis results differently, depending on network topology, the measures used, and the nature and extent of the missing or inaccurate data (Frantz and Car- ley, 2005; Kossinets, 2006). Work aimed at addressing the missing/inaccurate

117 3. METHODOLOGICAL FRAMEWORK data problem includes imputation methods (Huisman, 2009) and estimation ap- proaches involving exponential random graph models (ERGMs)27 (Butts, 2003; Gile and Handcock, 2006; Handcock and Gile, 2007; Koskinen, 2007; Koskinen, Robins and Pattison, 2008; Robins, Pattison and Woolcock, 2004); however, rigourous studies of the problem remain relatively scare in the network literature (Huisman, 2009, p. 2).

Qualitative, quantitative or mixed? As discussed in section 3.5.6.1, network methods have both qualitative and quantitative roots; however, many advances since the 1970s in network methods have related to quantitative aspects, a trend that has been enforced by the devel- opment of network analysis software (Crossley, 2010; Edwards, 2010). The result, as Edwards (2010, p. 5) notes, “ . . . has therefore been a tendency for mainstream discussion of ‘SNA’ [social network analysis] to have an implicit orientation to the more dominant quantitative tradition”. Despite this implicit quantitative orientation, some network research grounds itself firmly and explicitly in the qualitative tradition (Heath et al., 2009; Hollstein and Straus, 2006). Other network research (Conti and Doreian, 2009; Coviello, 2005; Crossley, 2010; Edwards and Crossley, 2009; Jack, 2010) has made a con- scious effort to “ . . . bring qualitative concerns and issues back into social network analysis . . . ” (Edwards and Crossley, 2009, p. 37), aiming for an integrated ap- proach. Support for an integrated approach is also found outside of the network literature; for example, Teddlie and Tashakkori (2009, p. 256) argue that network methods represent a “necessarily” mixed methods approach. In a recent review of mixed method approaches to network analysis, Edwards (2010, p. 3) identifies three ways in which qualitative and quantitative elements are integrated in network methods:

27Kolaczyk (2009, p. 180–191) provides an overview of the main aspects of this model. See also Holland and Leinhardt’s (1981) foundational paper and Robins and Morris (2007), an edited collection of five papers devoted to ERGMs.

118 3.5 Network theory

1. Using qualitative approaches to inform the use of quantitative SNA, and quantitative approaches to inform the use of qualitat- ive SNA; 2. Mixing quantitative and qualitative methods of data collection and analysis; [and,] 3. Mixing qualitative methods of data collection with mixed-method data analysis.

Edwards stresses that research questions—not assumptions about data or methods—should ultimately guide the approach in network research. In respect of point three above, she notes a trend in sociological research to view networks as stories, narrative descriptions that have both ‘insides’ and ‘outsides’ (Knox, Savage and Harvey, 2006; Riles, 2001). The ‘insides’ are the narrators’ descrip- tions of network elements and relations, and the ‘outsides’ are the network model representations of these descriptions. Edwards (2010, p. 23–24) argues:

The specific opportunity that SNA provides for combining quantitat- ive and qualitative approaches does not . . . come from the fact that networks have different features—the structure (outside) and process (inside)— . . . but because the outside and the inside are essentially versions of the same thing.

Edwards (2010, p. 24) concludes her review with a comment concerning the in- tegration of qualitative and quantitative approaches in network analysis: “. . . there is both a great deal of innovation in existing research designs, and a lot more room for methodological development in the future”.

Causality, dynamics and agency As discussed in section 3.5.5.1, network methods are descriptive and, used in isolation, cannot prove cause and effect; however, networks are useful for hypo- thesis development and, as such, can be a component of exploratory research (de Nooy, Mrvar and Batagelj, 2005). Work on the modelling of dynamics and agency in networks represents at- tempts to improve the capability of network methods to ask the question, how (Burt, 2010, p. 223; Carley, 2003, p. 137)? Significantly, how? is important, not only in its own right, but also because of its relation to why? As Doreian (2001,

119 3. METHODOLOGICAL FRAMEWORK p. 111) argues, “Understanding generative mechanisms [in networks] . . . seems the most promising way to proceed . . . ” to address the issue of causality in networks. McCulloh and Carley (2009, p. 4) outline four dynamic network states repres- ented in the network literature: stability, evolution, shock, and mutation. Time- event (Sampson, 1969) and other longitudinal network approaches (see Doreian and Stokman, 1997; Snijders, 2007), established practices in network research, focus on network evolution, the endogenous changes visible in a network cap- tured at different points in time; however, this focus on network evolution has not normally included explicit attention to agency (Burt, 2010, p. 221). More recently, methods such as dynamic network analysis (Carley, 2003) and social network change detection (McCulloh, 2009) have emerged, allowing the study of dynamic change in networks under conditions of uncertainty and the investigation of exogenous effects on network changes.28 Carley (2003, p. 143) argues that “thinking about networks from a dynamic perspective is absolutely essential to understanding the modern world”, while McCulloh (2009) and McCul- loh and Carley (2009) note that a better understanding of dynamics in networks has useful real-world applications; however, dynamic network methods are even more complex than standard network methods (Carley, 2003, p. 135), a potential obstacle to the wide-spread use of dynamic methods in empirical contexts.

3.5.7 Inferential statistical network modelling

3.5.7.1 Random graph models

The inferential statistical network modelling techniques introduced in sections 3.5.7.2 and 3.5.7.3 use random graph models as abstract frames of reference for assessing statistical significance in real networks. Newman (2010, p. 398) defines a random graph as “ . . . a model network in which some specific set of parameters take fixed values, but the network is random in other respects”. Newman (2010, p. 399) elaborates further:

28Note that agent-based simulation (see Gilbert (2008)), another dynamic technique involving networks, is traditionally viewed as a distinct discipline separate from dynamic network analysis.

120 3.5 Network theory

Strictly, in fact, the random graph model is not defined in terms of a single randomly generated network, but as an ensemble of networks, i.e. a probability distribution over possible networks . . . When one talks about the properties of random graphs one typically means the average properties of the ensemble.

Kolaczyk (2009, p. 154–159) divides random graph models into two categories, classical and generalised. Classical random graph models fix the order (n) and size (m) (i.e. the number of edges) of a graph (G), G(n, m). Alternatively, they fix the order (n) of the graph, and the probability (p) of edges, G(n, p). The latter variant, G(n, p), termed the Erd¨os-R´enyimodel, or simply, ‘the random graph’, is the more common in inferential statistical network modelling (Kolaczyk, 2009, p. 156; Newman, 2010, p. 399, 400).29 In the G(n, p) model, p affects both the degree distribution30 and component31 sizes of the graph, basic underlying mathematical properties of the graph that “ . . . are two of the model’s most illuminating characteristics” (Newman, 2010, p. 400). Generalised random graph models generalise the G(n, p) model: they retain the fixed order (n) and condition the graph to possess a given characteristic (i.e. other than the probability (p) of edges) with equal probability (Kolaczyk, 2009, p. 158). Kolaczyk (2009, p. 158) notes that “ . . . the most commonly chosen condition is that of a fixed degree sequence”.32

29Kolaczyk (2009) and Newman (2010) disagree over who first suggested/studied the G(n, p) model, with Kolaczyk (2009, p. 156) stating it was Gilbert (1959) and Newman (2010, p. 400) claiming it was Solomonoff and Rapoport (1951). The present study is content to leave this debate to the mathematicians. Regardless of their disagreement over who first introduced the G(n, p) model, Kolaczyk and Newman agree that the model was established in a series of important papers by Erd¨osand R´enyi (1959; 1960; 1961). 30The degree distribution of an undirected network can be conceived as a histogram with bins of size one where the x-axis represents the absolute degree of the network vertices and the y-axis represents the number of vertices with that absolute degree. Note that directed networks have an indegree distribution, an outdegree distribution and a joint distribution of the in and out degrees (Newman, 2010, p. 246–247). 31Components are connected subgroups of vertices in a network. Components can be classed according to strength of connection—weak or strong—and by size—small, large and giant. The giant component and its relation to small components is especially relevant to the compon- ent structure of the G(n, p) random graph model. For more information see Newman (2010, sec. 6.11, 12.5–12.6). 32 The degree sequence of a network is the ordered “ . . . set {k1, k2, k3,... } of degrees for all the vertices” in a network where k represents the absolute degree of vertex 1, 2, 3, etc. (Newman, 2010, p. 244). Directed networks have in-, out-, and joint in/out degree sequences.

121 3. METHODOLOGICAL FRAMEWORK

Random graph models are used in inferential stasticial network modelling to assess the significance of a given measure (Mobserved) in an observed network. Significance is assessed by comparing (Mobserved) with the average of the same measure (Mrandom) calculated from a random graph model. Importantly, the analyst can choose which elements in the random graph to control. For example, controlling for network order is essential for comparing a random graph model with a real network because the control ensures that only networks of the same magnitude are compared.33 The discussion here is necessarily limited in both scope and technical detail. Readers seeking a more extensive overview of random graph models are referred to Kolaczyk (2009, sec. 6.2) and Newman (2010, chap. 12). For a thorough, albeit highly technical, review of random graph theory, see Bollob´as(2001).

3.5.7.2 Network topology inference

As an umbrella term, network topology inference34 encompasses several methods that employ inferential statistical network modelling to support inferences about the topology of real networks; network tomography (see Castro, Coates, Liang, Nowak and Yu, 2004) and exponential random graph modelling (see Kolaczyk, 2009, sec. 6.5) are two such methods. The present study employed a third method, a simpler, albeit somewhat simplistic, approach that compares certain network measures taken on a real network with those same measures taken on a random graph model (see Airoldi, Bai and Carley, 2011; Airoldi and Carley, 2005). The remainder of the discussion in this section focuses on this specific understanding of network topology inference. Network topology inference involves generating a random graph model, meas- uring both the random graph model and the real network on certain network measures, and testing for statistically significant differences between the means of the real network and random graph model measurements. Importantly, the simulated random graph model is assigned a given topology a priori. The idea

33Recall from sections 3.5.3 and 3.5.6.5 that the direct comparison of different magnitude networks is theoretically impossible on some measures. 34This term is used inconsistently in the literature, a trend that this document admits to following. Here, network topology inference is used initially as a general term and later as a specific term.

122 3.5 Network theory underlying network topology inference is that if a real network is not significantly different from a topologically-conditioned random graph model on a given meas- ure, then the real network echoes the topology of the random graph model on the measure tested. Airoldi and Carley (2005) and Airoldi et al. (2011) outline six network to- pologies that can condition random graph models. Although each topology has a distinct structure, some overlap of topological properties occurs across the six topologies. Certain network measures, including betweenness, closeness and in- verse closeness centrality,35 can be used to separate the topologies, even given the overlap across the topologies (Airoldi and Carley, 2005, p. 20). The present study inferred real network topology from two random graph models, each of which has a different topology and a distinctive degree distri- bution: the Erd¨os-R´enyi model36 and a core-periphery model.37 The degree distribution of the random graph model used in network topology inference is important: as Newman (2010, p. 397) notes, “It turns out that properties like degree distributions can in fact have huge effects on networked systems, which is one of the main reasons we are interested in them [degree distributions]”. The two random graph models employed in the present study were selec- ted because of empirical evidence drawn from the literature. Network research indicates that most real networks have right-skewed degree distributions that differ noticeably from the degree distribution of the Erd¨os-R´enyi model. Net-

35These network measures all belong to the general class of normalised centrality measures, a class to which degree centrality (see section 3.5.6.5) also belongs. 36The Erd¨os-R´enyi model has a random topology, one of Airoldi et al.’s (2011) six topologies. For a technical discussion of the Erd¨os-R´enyi model’s degree distribution, a binomial degree distribution with a Poisson approximation, see Kolaczyk (2009, p. 157) and Newman (2010, p. 401–402). 37The core-periphery model has a core-periphery topology, another one of Airoldi et al.’s (2011) six topologies. In broad terms, a core-periphery network has a densely connected core of vertices and a periphery of vertices that connects to this core. Vertices in the periphery connect only to the core, not to other periphery vertices. In more precise terms, several variants of the core-periphery model are possible (see Borgatti and Everett, 1999). The general shape of the degree distribution of a core-periphery model is defined by two noticeable groupings of vertices (i.e. into the core or periphery); however, the exact shape of the degree distribution depends on the extent of the ‘core-peripheriness’ of the network. Figure 3 in Brede and de Vries (2009, p. 2114) illustrates this point.

123 3. METHODOLOGICAL FRAMEWORK

work research also suggests that some real networks, especially social networks, commonly display a core-periphery structure (Newman, 2010, p. 246, 230). It is important to note that network topology inference, as an aid to un- derstanding real networks, is no do-it-all tool. Empirical real-world networks are fundamentally different from formal random graph models: the topology of a real network is normally much more complex than that of a topologically-conditioned random graph model (Kolaczyk, 2009, p. 191). As such, it is highly unlikely that a real network will show ‘no significant difference’ from a topologically-conditioned random graph model on all possible network measures; it is much more likely that a real network will represent a “mixture of types” of topologies (Airoldi and Carley, 2005, p. 17).

3.5.7.3 Network motif detection

A motif is a recurring structure or pattern appearing within the bounds of a larger structure. In some contexts, for example, in works of music or narrative, motifs are considered to inform and develop larger works; the implication is that motifs, higher-order structures that occupy the region between elemental and overall structure, play an important role in advancing a work’s transition, both structurally and semantically, from a collection of basic fundamental elements to an advanced, complex state (Dar´anyi and Lendvai, 2010, p. 5). In this sense, motifs are ‘building blocks’ of larger structure. Network motif detection is an inferential statistical network modelling tech- nique that draws on this general concept of the motif. Network motifs are subnet- works that occur statistically significantly more frequently in a real network than in a random graph model (Milo, Shen-Orr, Itzkovitz, Kashtan, Chklovskii and Alon, 2002). Network motifs have theoretical applications; for example, research suggests that motifs define and predict network topology (V´azquez,Dobrin, Sergi, Eckmann, Oltvai and Barab´asi,2004). Network motifs are also used in empirical contexts for modelling and simulation purposes. Studies of network motifs present most often in the bioinformatics literat- ure (for examples, see Gao, Mathee, Narasimhan and Wang, 1999; Zhu and Qin, 2005). Information research involving network motifs includes studies of

124 3.5 Network theory email networks (Juszczyszyn and Musial, 2011), blog networks (Kamaliha, Riahi, Qazvinian and Adibi, 2008; Kolaczyk, 2009) and the Web (Milo et al., 2002). The social network literature’s attention to the triad census38 is considered a precursor to network motif detection (Kolaczyk, 2009, p. 167). Recent research demonstrates that work with the triad census continues, albeit it with methods and an orientation that differ from those of network motif detection (Faust, 2006, 2010).39 Although, in theory, network motifs can be of any size, much network motif detection focuses on motifs of size-3 or -4, connected subnetworks of three or four vertices respectively. The reason for this focus is that larger motifs often contain these smaller motifs; as such, larger motifs are less “ . . . likely candidates to serve as elementary ‘building block’ structures” (Kolaczyk, 2009, p. 167). As well as being of different sizes, network motifs can be directed or undirected; the discussion that follows focuses on directed network motifs of size-3, the size and type of motif investigated in the present study. The number of size-3 directed motifs is mathematically bounded by the num- ber of possible combinations of three vertices in a network. There are sixteen possible patterns for a directed graph with an order of three. Figure 3.7 sets out these sixteen patterns. Network motif detection does not detect subgraphs con- taining isolate vertices. Once patterns 1–3 in Figure 3.7 are discarded, thirteen possible size-3 directed motifs remain. A key point about network motifs is that they are statistically significant— they present more frequently than expected by chance alone. Problematically, a directed network with three or more vertices and no isolates will definitely contain one or more of the thirteen patterns presented in Figure 3.7; so, how to determine which, if any, of the presenting patterns are statistically significant?

38The triad census is a vector of length 16 that is the sum of the frequencies of occurrences of the sixteen patterns illustrated in Figure 3.7 in a network. Wasserman and Faust (1994, p. 567) note that Davis and Leinhardt (1968; 1972) introduced the triad census into social network research. See Wasserman and Faust (1994, chap. 14) for a detailed discussion of the triad census. The concept of the triad census also underlies the role census, an approach which interprets triads as roles (see section 6.4.3.3). 39In fact, Faust (2011), in her recent keynote presentation at the 7th UK Social Network Con- ference, called for social network researchers to consider a stronger focus on triads in networks, especially for network comparison purposes.

125 3. METHODOLOGICAL FRAMEWORK

Figure 3.7: The sixteen possible patterns for a three-vertex directed graph - This figure sets out the sixteen possible ways of linking three vertices in a directed network. Note that vertices can link by one-way, bidirectional and null dyads. Note also that patterns 1–3 contain isolate vertices. This figure is reproduced from McBirnie and Urquhart (2011, Figure 6).

126 3.5 Network theory

Network motif detection overcomes this problem by generating a random graph model,40 counting the number of appearances of each of the thirteen pos- sible patterns in both the real network and the random graph model,41 and test- ing for statistically significant differences between the real network and random graph model pattern counts. The network motifs are those subgraphs in the real network that have a statistically significant higher count. Because of the large number of random graphs required42 and the multiple steps involved, network motif detection techniques rely on the computational power of computers. Several network motif detection tools and algorithms are described in the literature (see Milo, Kashtan, Itzkovitz, Newman and Alon, 2003; Zuba, 2009). The edge-switching algorithm employed in the present study43 controls the random graph model for degree sequence: that is, each vertex in each of the generated random graphs has the same number of incoming, outgoing and bidirectional links as the equivalent vertex in the real network. Like network topology inference, network motif detection is a limited tool for understanding real networks. It is important to remember the proviso that stat- istical importance does not automatically equate to non-statistical importance: as Milo et al. (2002, p. 825) comment, “Patterns that are functionally import- ant but not statistically significant could exist which would be missed by our approach [network motif detection]”. As such, network motif detection is best combined with other methods.44

3.5.7.4 Potential for indirect network comparison

Neither of the techniques explored in sections 3.5.7.2 and 3.5.7.3 accommodate the direct comparison of two or more real networks: the comparisons involved

40Recall from section 3.5.7.1 that a random graph model is actually an ensemble of graphs. 41Technically, this aspect of network motif detection involves three steps: finding the sub- graphs; determining which subgraphs are isomorphic and grouping these together; and, counting the subgraphs (Zuba, 2009, p. 1). 42For example, research accounts from the literature report the sampling of 1, 000 (Milo et al., 2002; Zhu and Qin, 2005) and 10, 000 (Kolaczyk, 2009) random graphs. 43See section 4.8.6 for citations to this algorithm and a discussion of FANMOD, the motif detection software used in the present study. 44For an example of research that employs network motif detection along side other methods, see Zhu and Qin (2005).

127 3. METHODOLOGICAL FRAMEWORK

in network topology inference and network motif detection match a single real network with an ensemble of random graphs. However, real networks of different magnitudes can be compared indirectly using the results of inferential statistical network modelling. For example, net- work motifs can be detected separately in several real networks and the results correlated using traditional statistical methods or a qualitative method, such as multidimensional scaling (see section 4.6.1). The restrictions limiting the compar- ison of different magnitude networks are not relevant to such correlations because, once detected, network motifs are no longer dependent on network magnitude.

3.6 A multimethodological framework

3.6.1 Re/orientation

The reader having reached this point in the chapter may feel somewhat disori- entated by the preceding sections’ wide methodological swings. This last section of the chapter seeks to re/orientate the reader to the present study’s multimeth- odological framework. Wide though the swings between them have been, the methodologies discussed in this chapter do have shared themes and can be com- bined into a unified methodological framework.

3.6.2 Multimethodology

Teddlie and Tashakkori (2009, p. 21) note that there is little consensus across the research community “ . . . concerning what constitutes mixed methodology in broad terms . . . ”. As such, the discussion here is content to limit itself to developing a specific understanding of the narrower term, multimethodology. The present study’s understanding of multimethodology followed Mingers and Brocklesby’s (1997, p. 503): “The essence of multimethodology is linking together parts of methodologies . . . ”. Although Mingers and Brocklesby (1997, Table 1, p. 491) present a range of possibilities for mixing methodologies, they focus on multimethodology. A key point about multimethodology is that it draws on elements of multiple

128 3.6 A multimethodological framework methodologies—it does not seek to combine entire methodologies. Mingers and Brocklesby (1997, p. 489) argue that the desirability of multi- methodology is its usefulness for “ . . . dealing with the richness of the real world . . . ”. However, they also note that multimethodology raises philosophical, theor- etical and practical feasibility issues: why use mutlimethodology? what parts of which methodologies to combine? and, how to put multimethodology into prac- tice? In respect of this last issue, Mingers and Brocklesby (1997, p. 490) comment on the pivotal role of the agent, highlighting “ . . . the wide range of knowledge, skills and flexibility required of practitioners”. A multimethodological framework was highly appropriate for the present study: multimethodology sat comfortably with the pragmatic stance of the re- search (see section 1.4.1); elements of methodologies were selected based on shared themes that were relevant to the research questions (see section 3.6.3); and, a mixed methods research design (see section 4.3.1) supported the operationalisa- tion of the research in an empirical setting. Chapter 7 returns to these points in more detail, especially in relation to the pivotal role of the agent.

3.6.3 Shared themes

Three themes dominate this chapter’s methodological discussion:

1. The relational dyad

ˆ represents the smallest unit of analysis in relationalism; ˆ translates to the network dyad in network theory; ˆ structures the simplest form of the phenomenological type, action; and, ˆ reduces narrative proper action events to their fundamental elements.

2. Process

ˆ characterises the transactional perspective intrinsic to relationalism; ˆ informs phenomenology’s interest in the type, action; ˆ represents an essential focus in event narrative approaches; and,

129 3. METHODOLOGICAL FRAMEWORK

ˆ permeates the concept of flow inherent in both the flow and architec- ture network model.

3. Structure

ˆ relates explicitly to both relationalism’s relational dyad and event nar- rative’s semantic triplet; ˆ informs the analysis of event narrative at multiple levels; ˆ represents a major focus of network analysis; and, ˆ plays a key role in modelling.

3.6.4 The framework in a ‘nutshell’

The present study’s multimethodological framework had relationalism and phe- nomenology as its foundations. Narrative, network, and modelling theories were employed to support the empirical aspects of the study. The research conceived process, expressed in the form of narrative action events, as transactional; as such, the analytic focus was not on individuals, but rather, on individuals in relations. Single action events in narrative were rep- resented as semantic triplets. Multiple semantic triplets combined to form the narrative proper, or event structure. Narrative action is an example of the phenomenological type, action: the phenomenological structure of action, {Subject:Verb:Object}, and the semantic triplet structure of action, {Subject:Action:Object}, are equivalent. With this equivalence in mind, the study labelled the event structure that is the narrative proper, the phenomenological event structure. Network methods, which neither focus on internal concepts such as motivation nor require an intended consequence or outcome, were well placed to study and describe the relational nature of the phenomenological event structure.

130 Chapter 4

Research Design

This chapter states the research aims and objectives, introduces the study dataset and overviews the research design and methods. The chapter also considers the concept of research quality and reviews the software used by the study. With respect to methods, specifics of the research implementation are presented in Chapter 5, which traces the operationalisation of the research; in the present chapter, the methods discussion is more general in nature, aimed at a survey of the event structure coding and analysis approach adopted and the decisions taken in relation to this approach.

4.1 Research aims and objectives

4.1.1 Methodological concerns and objectives

Recall from section 1.4.1 that the study was conceived in the pragmatic tradition. As such, several pre-existing methodological concerns openly guided the theoret- ical research aim: the decision to focus on description of process in serendipity, especially with the goals of contributing detail and precision to this description; the conception of process in serendipity as fundamentally intertwined and con- nected; the hypothesis that serendipity is both normative and individualistic; and the desire to answer two existing calls, one for further investigation of serendip- ity in the Citation Classics dataset, and the other for new approaches to study serendipity.

131 4. RESEARCH DESIGN

These concerns gave rise to the study’s four methodological objectives.

1. To investigate serendipity in the Citation Classics dataset.

2. To integrate the methodologies of phenomenology and relationalism into a methodological framework.

3. To apply the methodological framework in an empirical context using meth- ods new to serendipity research.

4. To audit the research, not only to establish the extent of its theoretical cred- ibility, but also to assess its methodological value.

4.1.2 Theoretical aim of the research

The study aimed to describe process in serendipity, conceived as phenomenolo- gical and relational, as recounted by the narrators of a random sample of research narratives drawn from a subpopulation of narratives of experiences of serendip- ity found within the Citation Classics dataset. This theoretical research aim encompassed two components.

1. The characterisation of selected contextual, process-related elements of the sample narratives.

2. The description of phenomenological event structures of serendipity extrac- ted from the sample narratives.

The theoretical research aim focused on the elements of the conceptual framework for specifying process in serendipity presented in section 2.3.

4.1.3 Objectives designed to address the theoretical aim

4.1.3.1 Objective set one—contextual elements

Objective set one addressed the first component of the theoretical aim: the char- acterisation of contextual elements related to process in serendipity. Unless in- dicated otherwise, the objectives in set one employed survey methods.

132 4.1 Research aims and objectives

1. To review, apply and relate the serendipity classifications presented in the literature, to examine the appearance of serendipity patterns proposed in the literature, and to clarify the relationship, if any, between patterns and classifications in the study sample.

2. To establish and correlate the appearance of serendipity filters in the study sample, and, drawing on rich textual description, to assess the impact of filters on process in serendipity as interpreted by the sample narrators.

3. To reveal how process in serendipity is interpreted by the sample narrators in reference to intention versus realisation, linearity versus non-linearity, multiple disciplinarity, and immediacy versus delay in recognition.

4.1.3.2 Objective set two—event structures

Objective set two focused on the second component of the theoretical aim: the description of phenomenological event structures of serendipity. This set of ob- jectives demanded the use of mixed methods.

1. To identify, code and extract the phenomenological event structures of serendip- ity underlying the sample narratives.

2. To specify descriptive statistics for the elements and relations, structural cohesion, and contexts of these event structures.

3. To quantify from the event structures the load carried locally by narrators, to determine how this load is divided between active (pulled) and passive (pushed) load, and to measure the activity and passivity of narrators in relation to this load.

4. To estimate, compare and evaluate interaction, including the occurrence and effects of sequence and indirect versus direct interaction, the existence of local and global normative interaction patterns, and the distribution of attributes in these patterns, in the event structures.

133 4. RESEARCH DESIGN

4.2 Study data

4.2.1 The Citation Classics dataset

The Citation Classics commentaries online database (Garfield, 2012) provided the raw data for the study. This database was established at the University of Pennsylvania by the information scientist, Eugene Garfield, who is best known for his contributions to the field of citation analysis. The database contains the full- text of the Citation Classics commentaries published weekly from 1977 to 1991 and every two weeks in 1992 and 1993 in all seven editions of Current Contents.

Current Contents In its published format during the period 1977-1993, Current Contents collec- ted together the table of contents from a large number of preselected academic journals and redistributed them under a single cover with the aim of providing a current awareness and alerting service to readers. As the scientific publisher Sandy Grimwade once remarked, Current Contents “. . . was pretty much the best way to find out what was going on in the scientific literature [at that time] without going to the library and digging through every single journal” (in Constans, 2005, p. 7). Seven editions of Current Contents were produced:

1. Agriculture, biology and environmental sciences

2. Arts and humanities

3. Clinical medicine

4. Engineering, technology and applied sciences

5. Life sciences

6. Physical, chemical and earth sciences

7. Social and behavioural sciences

134 4.2 Study data

Current Contents began in the 1950s with early versions of the Social and behavioural sciences and the Life sciences editions (Garfield, 1993f). Today, Cur- rent Contents continues in print published by Thomson Reuters and on the Web as Current Contents Connect, integrated into the Web of Knowledge.

Citation Classics As well as listings of table of contents, Current Contents also contain original contributions, including short essays and Citation Classic commentaries. A Cita- tion Classic commentary (henceforth, Citation Classic) is a one-page, first-person narrative that provides the author of a highly-cited research paper or book the opportunity to discuss aspects of the research retrospectively, especially events and experiences informing and surrounding the research that were not reported in the formal research report. Discussing the content of Citation Classics, Garfield (1993e, p. 389) writes:

Authors of highly cited papers [were] invited to provide us with the story behind their research. For example, what prompted them to un- dertake a project? What obstacles did they encounter?—in short, the human and personal details rarely revealed in formal journal articles.

Other than being subject to the prompting inherent in the editors’ invitation, authors were free to decide the contents and slants of their Citation Classics. Most Citation Classics were equally short—printed on a single page—and were not generally subjected to peer review prior to publication. Each Citation Classic began with a brief author-supplied abstract intended to provide context and an introduction to the highly-cited paper. In cases of highly-cited papers with multiple authors, the first author was normally asked to write the commentary. Initially, the papers meriting Citation Classics were drawn from a list of five hundred papers comprising the most-cited papers during the period 1961-1975 (Garfield, 1993c, p. 1). The life sciences, especially the areas of molecular biology and biochemistry, dominated the list. Garfield (1993b, p. 1) notes that in 1977, the first year of the Citation Classics, thirty-seven of the fifty-two commentaries were discussions of research in the life sciences. From 1979, the criteria for determining Citation Classics changed. Discipline was taken into account to allow for the differing sizes of literature output in dif-

135 4. RESEARCH DESIGN

ferent fields. This change gave papers from disciplines with traditionally smaller bodies of literature, including medicine, technology and applied sciences, social and behavioural sciences, and the arts and humanities, fairer chances of becoming Citation Classics (Garfield, 1993b,e). The literature is contradictory regarding the exact number of Citation Classics published; Garfield himself refers varyingly to counts of 5000 (Garfield, 1993d,e) and 4000 (Garfield, 2007, 2012). This discrepancy most likely results from the journal overlap between different editions of Current Contents. For example, in 1992, the table of contents for the journal, Science, was in- cluded in three editions of Current Contents: Life sciences, Arts and humanities, and Social and behavioural sciences. Thus, a Citation Classic written in that year by an author of a paper published in Science would have been published in all three Current Contents editions (and, consequently, now has three entries, each tagged with different edition metadata, in the online Citation Classics com- mentaries database). Some quantitative calculations in the present study relied on a count of the number of Citation Classics contained in the online database. Appendix B out- lines how this count was achieved. The resulting count of 5082 indicated that neither of Garfield’s figures for the Citation Classics was accurate. Because of the multiple Citation Classics publications, the number of unique Citation Classics in the online database is lower than 5082. This study’s estim- ate of the number of unique Citation Classics (3945) suggests that, of his two estimates, Garfield’s figure of 4000 is a more realistic count. See Appendix C for the calculations used in the study to estimate the number of unique Citation Classics in the online database.

4.2.2 The Citation Classics as phenomenological data

In his 1977 introduction to the Citation Classics, published in the same Current Contents issue as the first ever Citation Classic, Garfield (1993c, p. 1) writes:

Citation Classics will enable the authors of these [highly-cited] papers to discuss their work retrospectively. It is the kind of science ‘review- ing’ rarely seen in scientific journals. It will provide a kind of living

136 4.2 Study data

history. Authors will discuss interesting aspects in the development of their techniques, the role played by co-authors or others, and the encouragement received from colleagues. This is the human side of science.

Garfield (1993e, p. 389) maintains that the Citation Classics provide a collection of “personal recollections”, each of which is a “mini-autobiography” of research, as performed, lived through, and experienced, that “capture[s]” the “unique per- spective” of the researcher. Garfield’s comments emphasise the phenomenological nature of the Citation Classics. Firstly, the Citation Classics, as discussions of past work, are retro- spective and consciously temporally bounded. Secondly, the commentaries are structures of past experience, ‘a kind of living history’. Thirdly, the accounts discuss normal aspects of research such as technique development and interaction with colleagues; whether he intended to or not, Garfield uses a phrase very at- tuned to phenomenological principles when, generalising, he refers to the normal experiences of research as ‘the human side of science’. Finally, as Garfield points out, the Citation Classics are personal and autobiographical, intended to capture the ‘unique perspective’ of the individual researcher involved and not the kind of depersonalised ‘reviewing’ accounts presented in formal research reports.

4.2.3 Serendipity in the Citation Classics

The Citation Classics have been studied previously in serendipity research by Campanario (1996) and Hannan (2006).

Hannan Supported by a collection of anecdotal accounts of serendipity, Hannan (2006) presents a discussion of serendipity, luck and wisdom in research. Hannan (2006, p. xiii) describes using the Citation Classics, in combination with other source material, as examples in his review because of the informal, personal nature of the accounts and the fact that “many” mentioned accidental discovery; however, Hannan (2006, p. ix) does not clarify how he selected Citation Classics as ex- amples for his review, other than commenting that he browsed through years of

137 4. RESEARCH DESIGN hard copies of Current Contents in a library, a tactic which suggests he did not use the online Citation Classics database.

Campanario Campanario (1996) reported on research designed to examine a subpopula- tion of 205 Citation Classics for accounts of serendipity. Concerned primarily with classification matters and drawing heavily on the work of van Andel (1994), Campanario found that 8.29% of his subpopulation of Citation Classics (17 out of 205 papers) mentioned serendipitous factors, defined as accidental, luck, or chance events. Like Hannan, Campanario does not provide details of a specific method (e.g. keywords) he employed to determine if a Citation Classic was an example of serendipity.

Links with the present research Unlike the present study, Campanario (1996) applied a broad definition of serendipity (i.e. encompassing accident, luck, and chance). The present study re-examined Campanario’s findings, checking his seventeen examples of serendip- ity against the criteria presented in section 4.2.5 that were used to extract the serendipity subpopulation from the Citation Classics database. Six of Cam- panario’s study’s seventeen papers qualified for inclusion, indicating that 2.9% of Campanario’s study subpopulation (6 out of 205 Citation Classics) met the criteria for serendipity as specified in the present study. As expected, the present study’s inclusion criteria were narrower than those of Campanario’s study. The figure of 2.9% estimated from Campanario’s research was used to compare related figures calculated from the present study’s results (see section 6.3.1).

4.2.4 Advantages and limitations of the dataset

As a research dataset, the Citation Classics offers advantages; however, it also presents limitations.

Advantages As discussed, the Citation Classics are specifically intended to be phenomen- ological and narrative in nature. Because of their frequent focus on action and

138 4.2 Study data interaction, the narratives contain good process descriptions presented in well- developed and detailed narrative propers. Some may dispute the use of data not gathered specifically to study serendip- ity; however, the Citation Classics writers’ freedom to decide on the topics and slants of their commentaries, tempered only by limited prompting by the Current Contents editors, can be viewed as an advantage for the study purposes. Some phenomenological narrative methods, including the narrative interview, specific- ally aim for this unstructured approach coupled with minimal intrusion by the researcher (Bauer, 1996; Pfaff, 2006; Sch¨utze,1983). Precedent also exists for the use of the Citation Classics dataset to invest- igate serendipity, including Campanario’s study, research that provides useful comparison data. In fact, Campanario (1996, p. 22) specifically calls for follow- up research, “a detailed exploration of the whole Citation Classics database”, to expand his findings on serendipity. On a practical level, the Citation Classics dataset files are indexed and fully searchable online. As well, each file is a manageable length, usually limited to one page of text.

Limitations One potential limitation of the Citation Classics dataset is that serendipity described in a Citation Classic may represent ‘expert’ serendipity. Only highly- cited papers merit Citation Classics, and a high citation rate means that many other researchers have referenced the paper. It seems reasonable to assume that researchers producing highly-cited papers are more likely to be experts and/or have expertise knowledge. Garfield (1993c, p. 1) supports this assumption, pre- dicting that Citation Classics “. . . authors will not only be among the most cited, but also the most highly qualified in their respective fields”. Theory and empirical research suggest that experts think, behave and per- form differently from novices and non-experts (Ericsson, Charness, Feltovich and Hoffman, 2006); however, care must be taken over the expertise assumption in the context of the Citation Classics. Garfield (1993a,c) notes that some papers were written by researchers at the start of careers or as the result of doctoral work. In these early career cases, the authors would not have yet had the opportunity to

139 4. RESEARCH DESIGN

accumulate the years of knowledge and training often associated with expertise. A second significant potential limitation is the accuracy, both general and spe- cific, of the Citation Classics. As narrative commentaries, the Citation Classics are subject to the general limitations and conditions of narrative, including the informant accuracy problem discussed in section 3.3.1.2; however, many of these concerns are irrelevant in a phenomenological approach. Intentional inaccuracy may also occur in the Citation Classics. Garfield (1993e, p. 390) comments that the accuracy of Citation Classics cannot be guar- anteed as the commentaries are not subject to peer review, acknowledges that some Citation Classics contain disputed claims, and notes that co-authors of a highly-cited paper sometimes disagree with the Citation Classics author (nor- mally the highly-cited paper’s first author) over the version of research study events described in the Citation Classic. Other limitations to the dataset exist. Despite the broader discipline inclu- sion from 1979, many Citation Classics still focus on the sciences and lab-based research. This focus, even though non-biased in terms of citation rates has the potential to introduce bias into a study of serendipity because there are simply fewer narratives available from non-science disciplines. It is also important to emphasise that the Citation Classics recount research conducted pre-Web, and in some cases, pre-Internet. Potential for findings based on this dataset to generalise to the modern-day digital context may be limited. Section 8.2.2 considers this limitation in more detail.

4.2.5 A subpopulation of serendipity narratives

Only some of the Citation Classics discuss serendipity; as such, the study re- quired a subpopulation of narratives describing narrators’ experiences of process in serendipity to be extracted from the larger set of commentaries.

The boundary problem Due to the difficulty of defining serendipity (see section 1.3.1) and the overlap between serendipity and related concepts (see section 1.3.4), the boundary prob- lem for the serendipity subpopulation was nontrivial. Should the research follow

140 4.2 Study data the leads of Campanario (1996) and Hannan (2006) and include commentaries referring to luck and chance in the study population? If so, then how should these commentaries be identified and where should the investigation draw the line between non-serendipitous luck or chance and serendipitous luck or chance? The phenomenological foundations of the research resolved the boundary problem: these foundations demanded that the Citation Classic narrator expli- citly identify an experience as serendipity. Thus, the present study’s serendipity subpopulation was rather different to those defined by Hannan (2006) and Cam- panario (1996): since no objective means existed to clarify from the Citation Classics that, by mentioning luck, chance or related concepts, narrators were equating these concepts with serendipity, the research ignored all such potential, implicit references to serendipity.

Keyword searching Keyword searching of the Citation Classics database spotted explicit references to serendipity in the narratives. The Citation Classics commentaries are available online in full-text as .pdf files. The database interface only supports author or year index searches of the commentaries, not direct full-text searching of the .pdf files; however, using Google (Google, 2012)1 and the advanced operator site:, it is possible to search the full-text of the database .pdf files by year. The study search sought keywords containing the stem serendip*. Figure 4.1 illustrates an example keyword search from the study. Eighty keywords were located by the keyword search. This figure was probably slightly lower than the actual number of appearances of the selected keywords in the Citation Classic database because keywords could occur more than once in the same commentary. Table 4.1 presents the distribution of the keyword results by year.

1The UK version of Google was used in the study.

141 4. RESEARCH DESIGN

Year Serendipity Serendip Serendipitous Serendipitously

1977 2 1 0 0 1978 2 0 0 0 1979 3 0 1 0 1980 8 0 0 0 1981 4 0 4 1 1982 2 0 3 1 1983 1 0 2 1 1984 1 1 1 1 1985 4 0 2 1 1986 1 0 1 1 1987 4 0 2 0 1988 1 0 4 0 1989 3 0 1 0 1990 2 0 1 0 1991 0 0 0 0 1992 3 0 1 1 1993 1 0 4 2

Table 4.1: Keyword search results - This table presents the counts of the stem serendip* keywords in the online Citation Classics database. The counts are listed by year.

142 4.2 Study data

Figure 4.1: Study keyword search method - The figure presents the results of a Google search for the keyword serendipity in the 1977 Citation Classics .pdf files.

Semantic meaning As a purely text-based method, keyword searching fails to capture semantic meaning. Opening each .pdf file and locating keywords in the text using the find function helped to address this weakness by allowing the investigation of semantic context. Problematically, a few narratives included keywords but not in reference to the process of serendipity as experienced by the narrators. For example, consider this comment by a Citation Classics narrator, Harris (1980, p. 22): “Here is a field rich in serendipity for the researcher open to discovery”! Drawing on event-based narrative approaches (see section 3.3.2), the study placed this phrase in the evaluation section of the narrative rather than in the complicating action. In other examples, narrators mentioned the word ‘serendipity’ in relation to process, but only when providing historical background to their own narratives (e.g. when referring to serendipity experienced by other researchers). In such cases, the keyword was not used in reference to narrators’ own experiences of process in serendipity. Keywords that appeared only in the Citation Classics abstracts were also ignored.

143 4. RESEARCH DESIGN

Additional checks As a systematic semantic check on the keyword search results, the study em- ployed Campanario’s (1996) two-category taxonomy (see section 2.3.2.3) to test the suitability of the narratives highlighted by the keyword search for inclusion in the subpopulation. This taxonomy is broad enough to capture examples of serendipity, yet specific enough to exclude examples of non-serendipity. Recall that Campanario uses two categories to classify serendipity. Both cat- egories are process-based, making them suitable for classifying the process aspects of the narratives. The serendipity subpopulation included only narratives that fell into either of the two categories. A further stumbling block remained to identifying the subpopulation of serendip- ity narratives: authorship. A few Citation Classics are written, not in the first- person, but rather by a third party (e.g. after a highly-cited paper’s author’s death). Because of the study’s grounding in phenomenology, the subpopulation needed to comprise only first-person narratives. Occasionally, Citation Classics listed multiple authors; if these narratives used the first-person, either the singular (I) or the plural (we), then they were included in the subpopulation. Although assumed to be written by the first author in accordance with normal Citation Classic practice, the narratives presented in the first-person plural required spe- cial treatment (see sections 5.3.3 and 5.3.4).

The subpopulation In summary, extracting a study subpopulation of serendipity narratives from the Citation Classics commentaries was nontrivial and required several steps. The final subpopulation comprised seventy-six Citation Classics and was bounded by the following three conditions:

1. Citation Classics included in the subpopulation contained the keyword(s) serendip, serendipity, serendipitous, and/or serendipitously;

2. Citation Classics included in the subpopulation were narratives written in the first-person singular or plural;

144 4.2 Study data

3. Citation Classics included in the subpopulation described narrators’ own experiences of process in serendipity, and this process was classifiable using Campanario’s (1996) taxonomy.

4.2.6 Sampling the serendipity subpopulation

Statistical approaches were key to the research. As such, a random sample of sufficient size suitable for statistical analysis was needed. The study drew a ran- dom sample of fifty narratives from the serendipity subpopulation. Because the data seemed highly heterogeneous, the research preferred a large sample relative to the subpopulation;2 however, as the study methods did not rely on the use of classical inferential statistics, a statistically-exact-sized sample was not necessary. Sampling involved assigning each Citation Classic a number from one to seventy-six before randomly selecting fifty numbered commentaries. The random number generator, random.org (Haahr, 2012), supplied the selection numbers. Once the random selection process had achieved the desired sample size, the fifty .pdf files were examined for duplicate content. Recall that, because of journal overlap between Current Contents editions, the same Citation Classic could appear multiple times in the online database. The study required a sample comprised of distinct cases; if multiple appearances of a Citation Classic were sampled, then all but one of these were discarded and replacements randomly selected. The iterative steps of sampling and review continued until the sample comprised fifty distinct Citation Classics. During the course of the data collection, some small changes to the sample were necessary. Four of the fifty narratives—Ahlquist (1978), Barnstaple (1992), Dewar (1985), and Wasserman (1988) (see Garfield, 2012)—were discarded and replaced through additional random sampling because they contained too little useful process data to allow network description. One sample narrative, Weiss- man and Koe (1990) (see Garfield, 2012), comprised three distinct serendipity narratives, resulting in two extra narratives being added to the sample; to re-

2Note that seventeen duplicates were present in the subpopulation of seventy-six Citation Classics. The removal of these duplicates from the subpopulation would have effectively de- creased the subpopulation size to fifty-nine. Given this information, the sample size of fifty was even larger than first thought relative to the actual size of the subpopulation.

145 4. RESEARCH DESIGN turn the sample size to fifty, two of the Citation Classics awaiting data collection were selected at random and discarded. Appendix D lists the Citation Classics comprising the final study sample.

4.3 Overview of the research methods

4.3.1 Mixed methods design

Teddlie and Tashakkori (2009) describe five families of mixed methods research design, including the parallel mixed design and the conversion mixed design fam- ilies. Teddlie and Tashakkori (2009, p. 151) summarise the characteristics that define these two families of mixed methods research designs:

Parallel mixed designs—In these designs, mixing occurs in a parallel manner, either simultaneously or with some time lapse; planned and implemented QUAL [qualitative] and QUAN [quantitative] phases an- swer related aspects of the same questions. Conversion mixed designs—In these parallel designs, mixing occurs when one type of data is transformed and analyzed both qualitatively and quantitatively; this design answers related aspects of the same questions.

Teddlie and Tashakkori (2009) note five points concerning the mixed methods designs. Firstly, the approach to data analysis must be appropriate for the type of mixed methods design used; for example, conversion mixed designs employ con- version mixed data analysis, an approach which can involve quantitising narrative data, qualitising, or profiling, quantitative data, and/or the use of inherently and neccesarily mixed techniques, including social network analysis. Secondly, data collection and sampling strategies are directly informed by the intended data ana- lysis methods. Thirdly, permutations are possible within and across the families of research design. Fourthly, the mixed designs all have at least two strands, but may have more than two. Finally, all true mixed methods designs integrate qualitative and quantitative aspects at the inferential and/or conceptualisation stages of a study, not just at the experiential stage.

146 4.3 Overview of the research methods

The present study employed a conversion mixed design nested within a par- allel mixed design. As such, all elements of the research design were aimed at addressing related aspects of the study’s overarching research question, which focused on the description of process in serendipity. Qualitative and quantitative aspects were integrated at the conceptual (i.e. question formulation), experiential and inferential stages of the study. Figure 4.2 presents a high-level diagram of the parallel conversion mixed method research design structure adopted in the study.

Figure 4.2: The study’s parallel conversion mixed methods research design - This figure illustrates the high-level structure of the parallel conversion mixed methods research design followed in the study. The three study stages are indicated on the left-hand side of the diagram.

147 4. RESEARCH DESIGN

The parallel design component involved two strands. One strand, primarily quantitative in nature, focused on contextual elements of process in serendipity, addressed objective set one, and employed survey methods. The other strand, fundamentally mixed method in nature, comprised the conversion design com- ponent. The conversion design component focused on the description of the serendip- ity event structures, involved two strands of its own—one for each event struc- ture model—and addressed objective set two. All three conversion mixed data analysis approaches were employed in the conversion component of the research design: the coded qualitative narratives were quantitised into network models; social network analysis, which produced both qualitative and quantitative results, was used to study the narrative event structures; and, some of the quantitative results of the social network and statistical analysis were qualitised to construct a descriptive profile for the study population.

4.3.2 Study stages

The research design incorporated a total of nine stages. The two parallel strands of the study diverged in stage three and re-converged in stage eight. Figure 4.3 presents a flow-chart of the study stages.

Stage 1: The research extracted and sampled a subpopulation of serendipity narratives.

Stage 2: A set of pilot studies trialled the methods intended for use in stage three.

Stage 3: The study employed narrative analysis and relational text analysis to code the event structures and survey methods to collect the contextual data.

Stage 4: The research translated the event structures into networks and ana- lysed these using network analysis and inferential statistical network mod- elling techniques.

Stage 5: The research analysed the contextual data collected in stage three. This analysis was primarily quantitative.

148 4.3 Overview of the research methods

Stage 6: Descriptive statistical analysis of the network analysis results was un- dertaken.

Stage 7: Additional statistical analysis served as a check on the study’s two model event structure approach. This check contributed to the credibility audit conducted in stage nine.

Stage 8: The outputs from stages four to seven informed, both separately and in integration, the inference process. In line with the research aim, the inference process focused on description.

Stage 9: An existing integrative framework guided a credibility audit of the research. The audit evaluated the quality of the research inferences prior to the consideration of transferability issues and recommendations for practice and further research.

4.3.3 The pilot studies

Three consecutive pilot studies, each an attempt to improve upon the previous, involved five Citation Classics from the sample of fifty. The pilot work tested the methods and established useful and consistent approaches to the narrative coding. The studies highlighted the importance of coding for direction and frequency, provided examples of data collection consistency problems, and illustrated the vital role of computer software.

149 4. RESEARCH DESIGN

Figure 4.3: The nine stages of the study - This figure presents a flow-chart of the nine study stages. Note the divergence in stage three. The study stages reconverged in the inference process.

150 4.4 Event structure coding

4.4 Event structure coding

4.4.1 Event structure as normal form

Despite its detailed analysis of each sample narrative, the research should not be labelled a case-study design: the primary investigative focus was on the ag- gregated and average description of phenomenological event structures across the sample narratives. The research considered the phenomenological event structure as the semantic normal form for process in serendipity in the study population. It was this normal form, translated into network format, that was investigated. Gopnik (1972)3 explored the concept of the normal form of a text in her study of the linguistic structures of scientific texts. She studied twenty-eight scientific research reports with the aim of determining a linguistically computable underlying semantic normal form for the texts. This underlying form equated to the normal form for the set of texts. According to Gopnik (1972, p. 10–12), three requirements underlie the de- termination of a normal form from a set of texts:

1. the texts in the set must be stylistically uniform;

2. the texts in the set must contain explicit semantic content;

3. the texts in the set must be similar in certain practical respects, including length, completeness and purpose.

The study’s sample Citation Classic texts met Gopnik’s three requirements. The commentaries were all narratives, all told in the first person, and all phe- nomenological in nature (requirement 1). Although the story details and the spe- cific words used to describe them were highly heterogeneous, the sample texts all incorporated narrative propers containing explicit semantic content, description of process in serendipity as a sequence of events (requirement 2). The Citation Classics, normally one page long,4 were (almost)5 all complete, stand-alone texts

3The author wishes to thank S. Urquhart (1977, p. 28) for drawing her attention to Gopnik’s study and for pointing out its methodological value. 4One text in the study sample was two pages long: Kroto (1993). 5Recall from section 4.2.6 that one of the sampled narratives, Weissman and Koe (1990), actually contained three separate serendipity stories.

151 4. RESEARCH DESIGN written in response to a similar request and to serve a similar purpose (require- ment 3).

4.4.2 Use of narrative analysis

The study adopted an event-based approach to analyse the sample Citation Clas- sic narratives. The primary goal of the analysis was to delineate description, evaluation and narrative proper in the narrative texts. As such, the method used by the study was closer to Franzosi’s (2010) approach to narrative than either the strict (Labov and Waletzky, 1967, 1997) or simplified (Labov, 1997) Labovian approaches. To complement its use of the event-based approach, the study also considered contextual aspects of the narratives (see section 5.2). The relational text analysis (see section 4.4.3) used by the research aimed to code only the narrative proper text that was relevant to the serendipity experi- ence described by a sample Citation Classic narrative. While the relevant text certainly never equated to the full narrative, neither did it always equate to the full narrative proper. As such, one important goal of the narrative analysis was to identify and extract the narrative proper from each sample narrative; the second was to ensure that the parts of the narrative proper relevant to the serendipity experience were identified. The narrative proper normally contains all of the action and sequences of events described in a given narrative; however, in the present study, the term narrative proper was used more restrictively, to encompass all the narrative text relevant to the narrator’s experience of process in serendipity. For example, if a narrator discussed a serendipity experience in research and followed it with an account of how she published the results of that research, then only the narrative text relating to the serendipity experience was coded for the study purposes. Because the sample Citation Classics often contained several mini-stories, usually only one of which related to the serendipity experience, the study needed some way of delineating the narrative component that recounted the serendipity experience. Narrative norms were applied to establish the boundaries of the recounted serendipity experiences. In the case of personal narrative, narrative norms relate to the ‘main point’

152 4.4 Event structure coding of the story and the social context in which the story is told, or more formally, to reportability and credibility (Labov, 1997). Put simply, narrative norms dictate that a narrator will include all (but no more) of the event detail necessary to make the main point of a story. Reportability draws on Sack’s rules of speakership assignment (see Sacks, 1995; Sacks, Schegloff and Jefferson, 1974) and relies on the idea of a reportable event justifying “the automatic reassignment of speaker role to the narrator” (Labov, 1997, sec. 5.2). In other words, an audience grants attention to a narrator because his story seems to be leading to a main point that has some significance or interest. In the sample Citation Classics, serendipity was assumed as the main point of all the narratives. It is important to note that what makes a reportable event significant or interesting is entirely related to the context in which a narrative is told. Credibility rests on the condition of the audience having the right amount of event detail to support belief in the occurrence of the reportable event. Labov (1997, sec. 5.2.1, 6.2) notes two key rules that fundamentally inform narrative norms:

1. “To be an acceptable social act, a narrative of personal experience must contain at least one reportable event”; and,

2. “reportability is inversely correlated with credibility”.

The implication of the first rule is clear: an audience is unlikely to be willing to listen to someone ramble on for no reason. The second rule implies that a narrator must put in more effort to tell a story with a highly reportable event. An audience bases its belief of a story on the event detail given. If the narrator recounts irrelevant event detail, then the audience may wander off the track of the main story-line; however, if not enough event detail is provided, then the audience may not understand how the sequence of events reported is relevant to or cumulates in the reportable event. In either case (wandering off or not understanding), the theoretical (and, often, real-life) result is that the audience deprives the narrator of the right to speaker assignment, either immediately or in future social contexts.

153 4. RESEARCH DESIGN

The theory of narrative norms has been shown to translate to practical con- texts. Roles played by norms are acknowledged in the contexts of written news (Grunwald, 2005) and the narrative interview (Pfaff, 2006), and by research par- ticipants in empirical settings (Riessman, 2002).

4.4.3 Use of relational text analysis

4.4.3.1 What is relational text analysis?

Relational text analysis, also known as network text analysis (Carley, 1997; Pop- ping, 2003), map analysis or mental models (Carley, 1993; Carley and Palmquist, 1992; Palmquist et al., 1997), semantic text analysis (van Atteveldt, 2008; van Atteveldt et al., 2008) and meta-network text analysis (Diesner and Carley, 2005), is a coding approach used to capture semantic and/or social semantic networks. Recall from section 3.5.4 that such networks derive from texts. As a text analysis method, relational text analysis adopts some features of traditional content analysis; however, whereas content analysis codes concepts from texts, relational text analysis codes relational dyads from texts. The coded dyads combine into a relational structure that can be measured using network analysis. A significant strength of relational text analysis is its ability to capture the social structure implicit in the connections between words in text. Diesner and Carley (2005, p. 3) note:

NTA [network (or relational) text analysis] approaches vary on a num- ber of dimensions such as the level of automation, a focus on verbs or nouns, the level of concept generalization, and so on. Nevertheless, in all cases, networks of relations among concepts are used to reveal the structure of the text, meaning, and the views of the authors. Fur- ther, these networks are windows into the structure of the groups, organizations and societies discussed in these texts. This structure is implicit in the connections among people, groups, organizations, resources, knowledge, tasks, events, and places.

Its focus on social structure locates relational text analysis firmly within re- lationalism and sets the method apart from other text analysis methods. For

154 4.4 Event structure coding example, domain and taxonomic coding, a content analysis method, aims to cap- ture semantic relationships (Salda˜na,2009, p. 133–134); however, the focus of this approach is on developing categories and lists rather than on capturing structure. In domain and taxonomic coding, the coded semantic relationships are ultimately analysed individualistically rather than relationally. Like content analysis, relational text analysis can be automated; for example, Carley’s (2012a) AutoMap software is specifically aimed at mining semantic net- works from texts. Automating relational text analysis can allow for the efficient coding of a very large collection of texts, something that is not generally possible when coding by hand. For example, Diesner and Carley (2010) report on a study that mapped the sociotechnological networks of Sudan using 45,000 open source text documents. However, as noted by van Atteveldt (2008), problems remain with automated relational text analysis, especially in terms of the capability of software to correctly differentiate between syntax and semantics. The present study employed relational text analysis to code the semantic triplets underlying the narrative propers of the sample Citation Classics. The coding was done by hand. Attempts were made during the pilot studies to employ AutoMap to code the sample narratives; however, because of the heterogeneity of the narrative proper content across the sample narratives, extensive thesauri were required for even limited success with the automated coding. In the end, given the nature of the Citation Classics, the study found hand coding as efficient, if not more efficient, than automated coding.

4.4.3.2 Relational text analysis coding choices

In her comparison of traditional content analysis and relational text analysis,6 Carley (1993) argues that relational text analysis subsumes content analysis. She reviews eight coding choices necessary for content analysis and an additional four choices required for relational text analysis. The coding choices relate to issues such as the level of analysis, generalisation and the use of inference. Although she discusses coding choices only in relation to analysis issues, Carley reminds her readers that coding choices affect not only analysis results but also their

6Carley uses the term map analysis.

155 4. RESEARCH DESIGN

interpretation. Table 4.2 lists the twelve coding choices reviewed in Carley (1993).

Coding choices shared with traditional content analysis

Concepts: Carley (1993, p. 81) defines a concept as, “ . . . a single idea, or ideational kernel (Carley, 1986), regardless of whether it is represented by a single word or a phrase”. She notes that in content analysis, the researcher must decide if single words or longer phrases are to be coded as concepts (choice 1), if concepts are to be determined in advance or only determined interactively during the course of coding (choice 3), if concepts are to be coded verbatim from the text or if they are to be generalised during the first cycle of coding (choice 4), and if an existing or a specially created thesaurus will be used to translate coded concepts into more general terms (choice 5).

Coding decisions: Carley highlights the need for the researcher to trans- late coding decisions into coding rules. Coding decisions relate to the issue of irrelevant information; for example, will it be ignored or is no information con- sidered irrelevant (choice 2)? Coding decisions clarify if only explicit information is to be coded or if implicit concepts are to be coded based on some prede- termined approach to perceived missing information (choice 6). Decisions also dictate whether to code for existence or frequency: is knowledge of the existence of a concept in the text sufficient or is information required on the frequency of a concept’s appearance in a text in order to address the research questions (choice 7)? Finally, coding decisions establish the number of concepts to code appropriate to the topic of study and given the sociolinguistic environment of the text (choice 8); this decision is normally based to some extent on sociocultural knowledge of the topic domain.

Additional choices required by relational text analysis

Number of concept types: Researchers can choose to code one or more concept types (choice 9); for example, ‘apple’, ‘pear’ and ‘cabbage’ could be classed into two concept type categories, ‘fruits’ and ‘vegetables’, or into a single

156 4.4 Event structure coding

Coding choice Coding choice description

1 Level of analysis—what constitutes a concept

2 Approach to irrelevant information

3 Use of predefined or interactive concepts

4 Level of generalisation

5 Creation of translation rules

6 Level of implication for concepts

7 Existence or frequency

8 Number of concepts

9 How many types of concepts

10 Level of analysis—what constitutes a relation

11 Level of implication for relations

12 Use of social knowledge

Table 4.2: Relational text analysis coding decisions - This table lists the coding decisions that Carley (1993) argues are required in relational text analysis. Coding choices 1–8 apply to both traditional content analysis and relational text analysis. Coding choices 9–12 apply only to relational text analysis.

157 4. RESEARCH DESIGN

concept type category, such as ‘produce’, ‘food’, ‘noun’ or ‘object’. The choice of number of concept types is related to the level of generalisation of the data desired. Carley (1993, p. 93) suggests that the use of a single concept type is often the most appropriate choice, especially in exploratory research or where generalisation needs to be retained:

In many studies, one category of concepts is sufficient. The only justification for multiple categories of concepts occurs if there are predefined classes of concepts, each of which the researcher wants to examine separately.

For example in the present serendipity study, the participants in the narrative proper were assigned the attributes of ‘person’, ‘object’ and ‘knowledge’; how- ever, the participants were treated as belonging to a single concept type category because each participant needed to be treated equally in the analysis. If the study had treated the participants differently at the collection stage by placing them in different categories, then, given the mathematics underlying network measures, equal analysis would not have been possible at the analysis stage.

What constitutes a relationship: Carley (1993, p. 92) defines two terms, relationship and statement: “A relationship is a connection between concepts, whereas a statement is two concepts and the relationship between them”. Note that in the language of network theory, Carley’s terms correspond to relation and relational dyad respectively. Carley (1993, p. 94) explains that choices need to be made about what con- stitutes a relationship (choice 10):

When a map-analytic [relational text analysis] approach is taken, the researcher can choose how much information to preserve about the relationship. The researcher may choose to simply note that two con- cepts are related. This choice implies a specific series of secondary choices—all relationships are of the same strength, all have the same sign, all are bidirectional, and all have the same meaning (i.e. differ- ences in meaning are not preserved).

As Carley notes, if all that is coded is that a relation exists between two con- cepts, then the remaining decisions concerning the relationship are automatically

158 4.4 Event structure coding dictated by the initial coding choice. In some contexts, nothing more may be required; however, in others, additional detail about the relationship is useful. The strength of a relationship can be conceived in at least two ways:

1. value; and,

2. frequency.

In network theory, relationship strength is expressed as weights on the lines of a network. Weights can represent value; for example, a weight of two may rep- resent a more ‘valuable’ relationship than a weight of one, perhaps the idea of ‘best friend’ rather than ‘acquaintance’. Weights can also indicate frequency, the number of times a relation occurs between two elements. The sign of a relationship specifies whether a relationship is negative or pos- itive (e.g. hates versus loves), or more often, simply whether a relationship is present or absent. For example, in network terminology, a null dyad represents a negatively signed, or absent relationship. Carley (1993, p. 96) notes that in many relational text analysis studies all relationships are positively signed because the analysis codes only relationships that are present. Carley (1993, p. 96) writes of direction in relationships:

. . . coding directionality can provide information about . . . the struc- ture of meaning. If a researcher is interested in separating procedural or temporal knowledge from declarative or definitional knowledge, eliminating directionality can make it impossible to locate these dif- ferences.

Direction can provide fundamental information about a relationship. For ex- ample, directionality in the semantic triplet represents the intentionality (in the phenomenological sense) of the action event, or statement, described by the nar- rator. This intentionality encompasses both temporal and procedural knowledge (e.g. the first process element intends action on to the second process element). Thus, coding relationship directionality is absolutely essential for capturing mean- ing from semantic triplets. Like concepts, relationships can be placed into one or more categories. Car- ley (1993, p. 96) suggests that, “In many studies, one type of relationship (one

159 4. RESEARCH DESIGN category of ‘meaning’) is sufficient as the dominant interest of the researcher is simply to determine which concepts are related”. As with concept type, gener- alisation of relationship meaning is often desirable. For example, in the present serendipity study, all relationships were assigned a single attribute, ‘process’: in other words, the relationships all ‘meant’ the same thing.

Inferred information: In addition to decisions about concept types and what constitutes a relationship, researchers employing relational text analysis need to decide what, if any, inferred information to code. For example, are only explicit relationships to be coded, or will the researcher allow the inference of implicit relationships (choice 11)? As with the use of implicit concepts in traditional content analysis, the use of implicit relationships in relational text analysis can add meaning to the analysis; however, in both methods, the risk arises that adding incorrect inferences may negatively affect the results of the analysis.

Implicit social knowledge: A coding choice related to implicit relation- ships is the use of implicit social knowledge (choice 12). Carley (1993, p. 97) defines social knowledge as, “. . . knowledge that one can reasonably expect all members of the sociolinguistic environment to share—for example, that parents are older than their children”. The decision to use implicit social knowledge can significantly alter the results of the relational text analysis. Carley argues that adding implicit social knowledge can enhance the meaning of a text and can emphasise similarities across of set of texts; however, Carley (1993, p. 100) also warns against the use of added implicit social knowledge in some research situations, such as those where the research interest is in how individuals report a phenomenon:

. . . if the researcher is interested in comparing how different individuals portray their information . . . [then] the correct analysis uses informa- tion explicit in the text and not the underlying social knowledge.

In line with Carley’s recommendations, implicit social knowledge was not coded in the present study due to the phenomenological nature of the research and

160 4.4 Event structure coding the interest in comparing multiple, highly individualistic reports of serendipity; however, implicit relationships (i.e. choice 11), and any implicit concepts required by those relationships, were coded in both event structure models.

4.4.4 Two approaches to modelling event structure

Many different approaches are possible to content analysis; for example, Salda˜na (2009) outlines twenty-nine different coding methods. Although not as numerous, different approaches are also possible in relational text analysis. Carley (1993) describes proximity, sociolinguistic (or story-line) and simple semantic approaches to relational text analysis. As in content analysis, each coding method produces a different model of the underlying text. For reasons relating to consistency checking and measurement, the study em- ployed two relational text analysis methods to code the sample Citation Classics. Each method demanded its own set of coding decisions and produced a different event structure model. The model one event structure derived from a sociolin- guistic approach that focused on coding the texts as described linguistically by the narrators and on capturing the story-line of the narrative. The model two event structure resulted from a sociostructural approach that coded the texts based on certain common understandings and norms associated with social structure. More specifically, the model one approach derived from Franzosi’s (2010) story grammar methods while the model two approach followed the methods outlined in Diesner and Carley (2005) and in Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus (2009, p. 87–96). Both event structure models were semantic representations of texts; how- ever, the semantic representations differed. Carley (1993, p. 106–107) notes that some approaches, such as sociolinguistic methods, lean more towards syntax than others, but suggests that the major difference between relational text analysis ap- proaches “. . . is one of emphasis”. As an example of the differences in emphasis between the sociolinguistic and the sociostructural approaches, consider the following two sentences:

1. An idea came to Sue.

161 4. RESEARCH DESIGN

2. Sue came up with an idea.

On some semantic levels these sentences are identical. Each can be conceived as a semantic triplet {subject:action:object}, each presents the description of a process event related to Sue having an idea, and each uses the same verb, ‘came’. However, on at least one semantic level, these two sentences do not mean the same thing: sentence one implies a sense of passivity—that Sue did nothing and the idea just arrived in her head—while sentence two implies that in Sue’s relation to the idea, she was active in her approach. For the purposes of the sociolinguistic approach, how something is said really matters. The approach assumes that the speaker wants to pass a highly spe- cific semantic understanding to the listener—in other words, that the speaker has a reason for choosing to express meaning by selecting a particular semantic structure out of the set of all possible semantic structures. For example, a soci- olinguistic approach to the analysis of the process relations described in the two example sentences produces two relational dyads that, although identical in their elements and process relations, differ in their direction:

1. idea → Sue

2. Sue → idea

The sociostructural approach draws on normal understanding of social struc- ture. For example, we know that our ideas do not really come out of thin air, but rather, that we create ideas within our minds: when we receive input from our external world, we have to actively engage with that information, even if only subconsciously, to have ideas. In this sense, our relations with ideas are always active. Accordingly, the sociostructural approach would map the two example sentences identically: Sue → idea.

4.4.5 Advantages of the two model approach

The study was especially concerned with the phenomenological description of process in serendipity. As such, the sociolinguistic approach seemed the preferred option because text analysed using this approach is likely to reflect more closely

162 4.4 Event structure coding the narrator’s phenomenological experience. Indeed, the study did prefer model one, using it for the majority of the analyses; however, as discussed in section 3.4.3, all models are simplifications and all models leave things out. For certain analyses, model two was the preferred choice. If the study had only examined single sentence structures, such as the two example sentences presented in the previous section, then model one would have been sufficient; however, recall from section 3.3.2 that event structures must comprise at least two relational dyads. The occurrence of two or more relational dyads creates the potential for indirect links in the event structure model. For example, consider the sentence below, which has been coded according to the two approaches, sociolinguistic (model one) and sociostructural (model two), used in the study.

Sentence: Sue looked through the telescope at the star.

Model one: Sue → telescope → star

Model two: star ← Sue → telescope

Note that the two coded results differ in terms of sequence and indirect and direct links: in model two, Sue links directly to the star, while in model one, she does not. Now consider a full range of potential process-orientated research questions that one might ask about Sue looking through the telescope at the star. If a potential research question relates to aspects of sequence or indirect links, then model one is clearly the preferred model. In fact, model two is useless for questions related to sequence or indirect links. However, if a potential research question seeks to measure the load that Sue carries in terms of relations, how many ‘things’ she needs to connect with in the process of looking through the telescope at the star, then model two is a better choice because the model permits an efficient count of the number of elements to which Sue is directly linked. Bear in mind that this example, which uses only two relational dyads, is extremely simplistic. Event structures combining many relational dyads in a variety of connections and interlinks are considerably more complex and highlight to a much greater degree the difference between indirect and direct links and the

163 4. RESEARCH DESIGN

effect of sequence. The important point to note from the example presented here is that different research questions may require the use of different models because the nature of models dictates that no single model can be universally useful.

4.4.6 A weak (and weaker) semantic approach

Through the use of event-based narrative approaches and the relational text cod- ing decisions applied, the study adopted a weak semantic approach to the sample narratives. Gopnik (1972, p. 15) distinguishes between weak semantic approaches and strong semantic approaches:

A strong semantic analysis of a given linguistic segment determines what the segment “means”. A weak semantic analysis of a linguistic segment determines whether or not it “says the same thing” as some other linguistic segment.

As a combination of the semantic triplets underlying the narrative proper, an event structure aims to ‘say the same thing’ as the narrative proper in a weak semantic sense. Event structures are inherently weak (rather than strong) se- mantically for several reasons. Firstly, an event structure describes only the nar- rative proper, not the complete narrative; therefore, full (i.e. strong) semantic understanding of the narrative is impossible. Secondly, no description is included in an event structure; as such, the fullest meaning (in an experience-based narrat- ive analysis sense) of even the narrative proper can never be reached. Finally, an event structure treats all process in the narrative proper equally, as a relational tie between two participants. As its name suggests, an event structure serves primarily a structural purpose; for example, in the present study, although verbs were coded explicitly into the event structures, the actions ‘put’ and ‘drop’ ef- fectively became semantically equivalent once coded—one had no more semantic weight, emphasis or meaning than the other. A weak semantic equivalent can be conceived as more or less weak depending on how it paraphrases, or normalises, the original text. Gopnik (1972, p. 16) introduces the term included paraphrase to label the weaker semantic equivalent: an included paraphrase is a text that “does not say anything which was not said in the original text. . . It may not say everything said in the [original text], but

164 4.4 Event structure coding it does not say anything new”. Following Gopnik’s distinction between weak (paraphrases) and weaker (included paraphrases) semantic models, model two of the sample narratives was considered a weaker semantic equivalent to the narrative proper because, unlike model one, model two captured neither sequence nor multiple interrelations between the same process elements. It is important to note that the formal concept of being ‘weaker than’ se- mantically should not be confused with the informal everyday concept of being ‘weak’. Recall from the discussion of modelling (of which paraphrasing is an example) that models are always simplified. Sometimes a simpler, semantically weaker model is precisely what is needed for a given use. For example, as ex- plained previously, although the model one event structures were used for much of the analysis in the study, model two, the weaker semantic equivalent, was the preferred and more useful model for addressing certain research questions.

4.4.7 Overview of event structure coding decisions

Table 4.3 presents an overview of the decisions taken in the coding of the two event structure models. Decisions were necessary, not only on the eight traditional content analysis choices, but also on the four relational text analysis choices (see section 4.4.3.2). In most cases, the coding decisions taken were identical for both event structure models; only the decisions relating to choices five, seven, and ten differed. Sections 5.3.3 and 5.3.4 provide further details of the two coding procedures used in the study.

165 4. RESEARCH DESIGN Event structure model two Elements (participants) in semantic triplets Ignore Predefined Some generalisation Attributes added Sociostructural approach Implicit Existence All concepts present in narrative proper One Strength: not used Sign: all positive Direction: all directed Meaning: single (process) Implicit No - This table presents an overview of Event structure model one Elements (participants) in semantic triplets Ignore Predefined Some generalisation Attributes added Sociolinguistic approach Implicit Frequency All concepts present in narrative proper One Strength: frequency Sign: all positive Direction: all directed Meaning: single (process) Implicit No Coding choice description Level of analysis– what constitutes a concept Approach to irrelevant information Use of predefined or interactive concepts Level of generalisation Creation of translation rules Level of implication for concepts Existence or frequency Number of concepts How many types of concepts Level of analysis— what constitutes a relation Level of implication for relations Use of social knowledge Coding choice 1 2 3 4 5 6 7 8 9 10 11 12 Table 4.3: Relational textthe relational analysis text coding analysis decisions coding applied decisions in applied the to study the sample narratives.

166 4.5 Network analysis

4.5 Network analysis

4.5.1 Networks as translations of event structures

The networks analysed in the study were translations of the narrative proper normal form, the event structure, into mathematical structure. Sections 5.3.3 and 5.3.4 outline the technical details of this translation process. An important point about the study networks as translations is that the quantitative mathematical structure of a network and the qualitative narrative proper event structure ‘said the same thing’ in terms of their usefulness for addressing the study’s research question; whether words or numbers were used to say this same thing was a matter of pragmatics rather than methodological or theoretical stance.

4.5.2 Network methods decisions as predetermined

As translations of event structures, the networks analysed in the study were predetermined by the use of narrative and the decisions applied to code the nar- ratives; therefore, the network methods decisions applied in the network analysis stage were also predetermined. Boundary specification and the use of an ego- centric approach were determined by the narrative analysis, while vertex and line representation—direction, weights and attributes—and mode and model choice were determined by the relational text analysis. Table 4.4 summarises the network methods decisions applied in the study and gives the reasons for their application.

4.5.3 Some comments on working with networks

A few general points apply when working with networks.

Use of computers The first point relates to the use of computers. While it is possible, given a small enough network, to perform manual network analysis, modern network methods rely heavily on the use of computers, exclusively so when large networks are involved. Sections 4.8.3 and 4.8.5 discuss the specialist network analysis software used in the study.

167 4. RESEARCH DESIGN Reason for decision Participants are the elements thatlate interre- in a semanticProcess triplet. links the twomantic participants triplet. in aThe se- semantic triplet is inherently directed. The coding decisionswhich captured translated frequency, to lineone, weights, but in existence model only (i.e.absence presence [0]), [1] in or modelThe two. coding rules allowed threetribute possible at- labels forcipants: person, semantic object, tripletThe or knowledge parti- codingtriplet rules relations a single assigned attribute:Only process. all one concept semantic type was coded. The codedoverlapping event and cumulative structures process. Narrative represented are told from acentric first-person, perspective. ego- Narrative norms determined the boundary. Method predetermining decision Narrative analysis Narrative analysis Narrative analysis Relational text analysis Relational text analysis Relational text analysis Relational text analysis Relational text analysis Narrative analysis Narrative analysis - This table summarises the study’s decisions in relation to network methods Decision Semantic triplet participants Semantic triplet process Directed Weighted (model one) Binary (model two) Yes Yes One-mode Architectural Egocentric Realist approach Table 4.4: Network methodschoices. decisions The third column notes predetermining analyses. The final column gives reasons for the decisions. Network methods choice What do vertices rep- resent? What do lines repres- ent? Directed or undirec- ted network? Binary (unweighted) or weighted network? Do vertices havetributes? at- Do lines have attrib- utes? One-mode or multi- mode network? Network model? Network approach? How is theboundary specified? network

168 4.5 Network analysis

Network visualisations The second point relates to visualisations of networks. It can be tempting, looking at a network, to make assumptions about its structure based on visual inspection alone. While visual inspection can be a useful starting point, and should never be summarily discarded, it is important to recognise that a network comprising more than a few elements and interrelations can be drawn multiple ways. Visual inspection may suggest a pattern in a network is significant; how- ever, network measures may show that there is nothing significant about the visual pattern. Network software uses spring-embedded algorithms to draw networks consist- ently. The basic premise informing these algorithms is one of a spring creating tension between two vertices, drawing them closer together or letting them posi- tion further apart based on mathematical calculations. Despite the use of these algorithms, the network researcher will normally need to tweak the network pic- ture, by moving vertices or lines slightly, to achieve the clearest possible visual description. For consistency reasons, the study left the task of creating the net- work visualisations up to the computer software as much as possible.

Analysis in and across networks The third point relates to the levels of analysis and types of comparisons pos- sible in and across networks. As discussed in section 3.5.6.5, some network meas- ures possible at the dyadic level, including degree, have a variant at the network level. In some cases, it is also possible to describe measures in both absolute and normalised terms; recall from section 3.5.3 that normalised measures are needed for comparisons across networks of different magnitudes. Other analyses that permit comparison, albeit indirect, across networks of different magnitudes rely on inferential statistical network modelling techniques such as topology inference (see section 3.5.7.2) and motif detection (see section 3.5.7.3). While the network dyad is the smallest possible unit of analysis, it is not the smallest possible unit of measurement; for example, a network’s order and size are recorded in absolute terms (i.e. as counts). Although counts can be useful in some contexts, network measures are needed to analyse a network relation- ally. For example, to understand what order and size signify structurally, the

169 4. RESEARCH DESIGN network measure, density, must be calculated. Thus it is helpful to make a clear distinction between counts in networks, which are inherently non-relational, and network measures, which are inherently relational. Analysis of attributes, which are variables, involves counts, not network measures. As outlined in section 4.5.4, the study made use of both counts and network measures.

Network manipulation The final point concerns the manipulation of networks. As mathematical ob- jects, networks can be transformed. For example, several networks can be ag- gregated, or unioned, to form a single network, or a weighted network can be binarised so that all non-zero line weights are set to one. While all manipulation techniques change a network in some way, some techniques change the network structure more drastically than others. For example, binarising a network only alters line weights; the structural cohesion of the network (i.e. its density) is not affected. Network manipulation is accepted practice in network analysis. The network analyst may choose (or need) to transform a network prior to analysis. Provided the original network is documented and the manipulation is undertaken with an awareness of the effect it will have on network measures and structure, working with a transformed network can sometimes be more beneficial for addressing a research question than working with the original network. The study employed some network manipulation techniques, including unionisation and binarisation.

4.5.4 From abstract constructs to concrete measures

The representation of abstract serendipity constructs as concrete network meas- ures was necessary for the quantitative investigation of the serendipity event structures. Objective set two (see section 4.1.3.2) referred to the measurement of the event structures in terms of constructs such as load, activity and passivity, and interaction. Section 2.3.3 introduced these constructs and located them in the context of a conceptual framework for specifying serendipity as process.

170 4.5 Network analysis

To enable the network analysis of the translated event structures, the research had to determine what network measures best captured the abstract constructs. Table 4.5 matches the counts, network measures and statistical network tech- niques used in the study to the serendipity constructs they measured. The specific analyses are discussed in the relevant sections in Chapter 5.

Complexity = order, size, density Several questions referred to the complexity of an event structure: how big was the event structure? how connected was the event structure? how were its attributes distributed? and, how many contexts did the event structure represent? Complexity was measured using a combination of counts, including vertex and line counts, attribute counts, and time event counts. Structural cohesion was calculated using the network measure, density.

Load, activity = degree Load, and the related construct of activity (see section 2.3.3), were represented by various measures of ego’s (the narrator’s) degree. Load represents the elements with which a person directly interacts in the serendipity process. As discussed in section 3.5.6.5, degree measures the direct dyadic connections a vertex shares with other vertices. In the study, indegree equated to ego’s incoming direct connections, or pushed load, outdegree to ego’s outgoing connections , or pulled load, and total degree, to ego’s total load. Activity, which refers to how often load is carried, was calculated from network line weights. Recall from sections 4.4.7 and 4.5.2 that frequency, or the number of times a relation occurred between two elements in an event structure, translated to line weights in the networks.

Interaction, interaction patterns = ego networks, topology, motifs Other constructs focused on interaction (see section 2.3.3). Questions relating to these constructs included: do any interaction patterns dominate ego’s local environment? are any patterns found equally distributed around the event struc- ture or do they tend to centre on ego? and, how much of the event structure interaction is beyond ego’s direct influence?

171 4. RESEARCH DESIGN - This table matches the counts, Construct measured Complexity Pushed, pulledand and total activity/passivity load tion in to load rela- Complexity Pushed, pulledand and total activity/passivity load tion in to load rela- Global interaction patterns Local interaction patterns cent- ring on ego Potential direct influence of ego Network measure/technique Size Order Attributes Time events Ego in-degree Ego out-degree Ego total degree Density Ego in-degree centrality Ego out-degree centrality Ego total degree centrality Event structure in-degree centralisation Event structure out-degree centralisation Event structure total degree centralisation Topology inference Motif detection Comparison of ego network at nowith set ego distance network set at a distance of one Table 4.5: Matchesnetwork between measures and serendipity statistical constructs networkthe techniques and used extensive network in use the measures of study degree. to the serendipity constructs they measured. Note Type ofemployed network analysis Counts Absolute network measures Normalised network measures Statistical network techniques Ego networks

172 4.6 Statistical analysis

For example, the literature presents collaboration with others as important to serendipity. Is it only ego who must collaborate or is collaboration distributed more equally around the event structure, even when ego is not involved in this distributed collaboration? To address questions related to interaction, the study employed statistical network techniques such as topology inference to examine global interaction pat- terns (i.e. interaction patterns across the event structure) and motif detection to investigate interaction patterns centring on ego. As well, the study compared the complete event structure, or ego network at no set distance, with a reduced ver- sion of the event structure, the ego network set at a distance of one, to establish the range of ego’s potential direct influence in the serendipity process.

A note about network measure selection A large number of network measures of varying degrees of complexity are cal- culable using network analysis; for example, one network program used in the study, ORA (Carley, 2012b), offers the analyst 145 different measures. Use of these measures in empirical contexts is complicated by the fact that some meas- ures are relevant only to a particular network model and/or type, such as a social network conceived as a flow model. In light of this situation, the study’s network analysis actually relied on relat- ively simple, yet fundamental, network measures. For example, degree is applic- able to all networks, regardless of model or type, and, as Newman (2010, p. 169) notes, degree may be a “simple” measure, but “ . . . it can be very illuminating”.

4.6 Statistical analysis

4.6.1 Use of descriptive statistics

The study employed nonparametric descriptive statistics7 to analyse the results of the network analysis. The statistical results informed the descriptive profile of process in serendipity. Graphical presentations accompanied the inferences.

7Section 4.6.2 explains the study’s preference for nonparametric approaches.

173 4. RESEARCH DESIGN

Inferential statistical network modelling techniques, including topology infer- ence and motif detection (see sections 3.5.7.2 and 3.5.7.3), also aided the descrip- tion of process in serendipity; however, these techniques do not come under the area of traditional statistical analysis. The descriptive statistical analysis focused on central tendency and dispersion. Analysis involved measurement of data at different levels: the attribute data were measured at the nominal level; some network measures were ranks, and thus, were measured at the ordinal level; and the network count data, such as vertex and line counts, were measured at the ratio level. In general, network measures result in continuous data while network counts represent discrete data. Absolute network measures, including degree, are also discrete in nature. Concerning the differentiation of continuous and discrete data, Boslaugh and Watters (2008, p. 5) comment:

From a statistical point of view, there is no absolute point when data become continuous or discrete for the purposes of using particular analytic techniques . . . This is another decision to be made on a case- by-case basis, informed by the usual standards and practices of your particular discipline and the type of analysis proposed.

The debate over the use of discrete versus continuous data is sometimes high- lighted by reference to the example of a hypothetical measurement of 2.5 children. A similar issue exists in network studies: a network cannot have 2.5 vertices. Nonetheless, it is accepted practice in network studies to treat both network measures and counts as continuous. For an empirical example, see the descriptive statistics for the measures of order (vertex count) and mean nodal degree (ab- solute degree) in Faust (2006, p. 192, Table 1). The serendipity study adopted this accepted practice and treated all network measures and counts as continuous data. Along with more traditional graphic methods, such as histograms and box plots, the study employed multi-dimensional scaling. Multi-dimensional scaling techniques produce graphical visualisations of patterns of relations found in nu- merical data contained in matrices (Bernard and Ryan, 2010, p. 176). In the visualisations, data that are similar cluster together and data that are dissimilar

174 4.6 Statistical analysis sit further apart, thereby allowing the analyst to see patterns difficult to observe in the numerical data alone. As such, although multidimensional scaling requires numerical calculations, the resulting graphs are considered qualitative in nature.

4.6.2 Model difference and correlation

Each pair of event structure models coded in the study, model one based on the sociolinguistic approach and model two based on the sociostructural approach, derived from a common narrative. The research design anticipated that the differ- ent coding decisions applied to the two models would result in differences between some model parameters but not others. For example, no statistically significant difference between vertex counts in a model pair was expected. If found, a significant difference would have suggested that the number of participants (i.e. the people, knowledge and objects) described in the narrative proper had somehow changed between the coding passes used to extract the two models. Given that the narrative texts were fixed data, any such inconsistency would have had to result from problems with the quality of the research.8 The research credibility audit, discussed in Chapter 7, relied in part on in- formation about the similarities, differences and correlation between the two event structure models used in the study. The use of nonparametric inferential statics, including hypothesis testing and measures of association, provided some of this information. Regarding the decision to employ either nonparametric or classical parametric statistical techniques, Gibbons (1993, p. 63) argues that nonparametric methods should be used whenever any of several conditions are true:

1. The data are counts or frequencies . . . 2. The data are measured on a nominal scale. 3. The data are measured on an ordinal scale. 4. The assumptions required for the validity of the corresponding parametric procedure are not met or cannot be verified.

8See section 4.7 for a discussion of the issues affecting research quality.

175 4. RESEARCH DESIGN

5. The shape of the distribution from which the sample is drawn is unknown. 6. The sample size is small. 7. The measurements are imprecise. 8. There are outliers and/or extreme values in the data, making the median more representative than the mean.

The study data met all of these conditions, partially due to the qualitative nature of the original dataset and partially due to the general nature of network data. Regarding point six, although the sample size was likely ‘large enough’,9 recall from section 4.2.6 that a statistically-exact-sized sample was not drawn from the serendipity subpopulation; therefore, the ‘large enough’ nature of the sample size was not quantitatively confirmed. As such, the study’s preference for nonparametric statistical techniques, both descriptive and inferential, was appropriate. The Wilcoxon Signed Rank Test was employed to test for significant dif- ference between the paired event structure models. As the most conservative nonparametric measure of association, Kendall’s Tau was used to examine stat- istical correlation between the paired models. Section 5.9.2 discusses the study’s application of these statistical tests in more detail.

4.7 Research quality

4.7.1 Quality issues

Because of the mixed methods design adopted, quality issues related to both quantitative and qualitative methods were relevant to the study. As Teddlie and Tashakkori (2009, p. 300) note:

An ostensible obstacle for MM [mixed methods] researchers is that they must employ three sets of standards for assessing the quality of their inferences: 9For example, Crawley (2005, p. 43) suggests: “. . . 30 is a reasonably good sample. Anything less than this is a small sample . . . Anything more than 30 may be an unnecessary luxury . . . ”.

176 4.7 Research quality

ˆ Evaluating the inferences derived from the analysis of QUAN [quantitative] data using QUAN [quantitative] standards ˆ Evaluating the inferences made on the basis of QUAL [qualitative] data using QUAL [qualitative] “standards” ˆ Assessing the degree to which the meta-inferences made on the basis of these two sets of inferences are credible . . . .

All research is concerned with quality issues related to methods, including data collection and analysis, and to conclusions, including the interpretation of results; however, how quality is assessed differs in qualitative and quantitative research. To address quality issues, quantitative research relies heavily on validity, a concept encompassing internal, external and construct validity. Along the lines of Lincoln and Guba (1985), the quality of qualitative research is assessed drawing on the idea of trustworthiness, a concept encompassing credibility, transferability, dependability and confirmability. A significant problem for mixed methods research is that validity and trust- worthiness are not directly comparable because of their different approaches to and conceptions of the role of the human researcher. Additional concerns in mixed methods research include: the integration of qualitative and quantitative methods; and, the synthesis of the results of these methods in order to allow inference formation.

4.7.2 An Integrative Framework for Inference Quality

The literature describes several frameworks aimed at addressing quality issues in mixed methods research (for examples, see Dellinger and Leech, 2007; Onwueg- buzie and Johnson, 2006). The research credibility audit presented in Chapter 7 adopts the structure of Teddlie and Tashakkori’s (2009) Integrative Framework for Inference Quality, a framework that focuses on questioning research rather than on matching research to idealised components or elements. The Integrative Framework addresses two broad concerns: design quality and interpretative rigour. Design quality criteria consider the research design in rela- tion to the research questions. Interpretive rigour criteria examine the research inferences in relation to the research results. It is important to note that the

177 4. RESEARCH DESIGN

Integrative Framework does not incorporate transferability criteria. Teddlie and Tashakkori (2009, p. 311) argue that credibility must be established before any consideration is given to the issue of transferability . Table 4.6 reproduces Teddlie and Tashakkori’s (2009, p. 301–302) Integrative Framework in full.

Table 4.6: Integrative Framework for Inference Quality – This table presents the complete Integrative Framework for Inference Quality proposed by Teddlie and Tashakkori (2009). The framework is reproduced from pages 301–302 of Foundations of Mixed Methods Research.

Aspects of Quality Research Criterion Indicator or Audit

1. Design suitability 1a. Are the methods of study Design quality (appropriateness) appropriate for answering the research questions? Does the design match the research questions?

1b. Does the mixed meth- ods design match the stated purpose for conducting an integrated study?

1c. Do the strands of the mixed methods study address the same research questions (or closely related aspects of questions)?

Continued on next page

178 4.7 Research quality

Table 4.6 – continued from previous page

Aspects of Quality Research Criterion Indicator or Audit

2. Design fidelity 2. Are the QUAL, QUAN, and (adequacy) MM procedures or design com- ponents (e.g. sampling, data collection procedures, data analysis procedures) imple- mented with the quality and rigor necessary for (and cap- able of) capturing the mean- ings, effects, or relationships?

3. Within-design 3a. Do the components of consistency the design fit together in a seamless manner? Is there within-design consistency across all aspects of the study?

3b. Do the strands of the MM study follow each other (or are they linked) in a logical and seamless manner?

4. Analytic adequacy 4a. Are the data analysis procedures/strategies appro- priate and adequate to provide possible answers to research questions?

4b. Are the MM analytic strategies implemented effect- ively . . . ?

Continued on next page

179 4. RESEARCH DESIGN

Table 4.6 – continued from previous page

Aspects of Quality Research Criterion Indicator or Audit

5. Interpretive 5a. Do the inferences closely consistency follow the relevant findings Interpretive rigor in terms of type, scope, and intensity?

5b. Are multiple infer- ences made on the basis of the same findings consistent with each other?

6. Theoretical 6. Are the inferences consist- consistency ent with theory and state of knowledge in the field?

7. Interpretive 7a. Are other scholars likely to agreement reach the same conclusions on the basis of the same results?

7b. Do the inferences match participants’ constructions?

8. Interpretive 8. Is each inference distinct- distinctiveness ively more credible/plausible than other possible conclusions that might be made on the basis of the same results?

Continued on next page

180 4.7 Research quality

Table 4.6 – continued from previous page

Aspects of Quality Research Criterion Indicator or Audit

9. Integrative 9a. Do the meta-inferences efficacy (mixed and adequately incorporate the multiple methods) inferences that are made in each strand of the study?

9b. If there are credible inconsistencies between the inferences made within/across strands, are the theoretical explanations for these in- consistencies explored, and possible explanations offered?

10. Interpretive 10a. Do the inferences correspondence correspond to the stated pur- poses/questions of the study? Do the inferences made in each strand address the purposes of the study in that strand?

10b. Do the meta-inferences meet the stated need for using an MM design? (i.e., is the stated purpose for using MM met?)

181 4. RESEARCH DESIGN

4.8 Software

4.8.1 Pragmatics

The study relied heavily on the use of computer software. As well as the open tools for research introduced in the sections below, the study also employed other software, including EXCEL10 (Microsoft Corporation, 2006) for data collection and JING11 (TechSmith Corporation, 2012) for screen captures. Although many of the programs recognised text files (.txt), other types of files were not always directly transferable between programs. As well, each program required inputted files to follow specific formatting, even in the case of simple .txt files. Thus, the need to move between different software during the course of the analysis meant that considerable effort was spent transferring files from one type to another and reformatting files. One approach could have been to use fewer programs; however, each program used in the study had a particular advantage over the others for the use to which it was put. Network analysis programs compute standard network measures. Different programs produce the same results because the underlying analyses rely on the same mathematical formulae. In general, a network analyst is free to move between software for reasons of convenience or because one program offers a fea- ture that another does not; however, in some cases, when the literature presents several possible versions of a network measure, the analyst needs to consult the software manual to determine the version of the network measure and underlying formula used by the software.

4.8.2 R

Much of the statistical analysis in the study was carried out in R12 (R Development Core Team, 2012a); the remainder was conducted within the network analysis software. R was also used to produce multidimensional scaling models and to graph the results of the statistical analysis.

10EXCEL 2007 was used in the study. 11The free version 2.4.10231 of JING was used in the study. 12Version 2.12.2 of R was used in the study.

182 4.8 Software

R is command driven and allows users to write functions. User contributed packages, available as add-ons to R, offer a wide range of statistical, graphical and other capabilities. The R packages used in the study are referenced under the relevant analyses sections in Chapter 5. Readers interested in further information about the R environment are referred to the R project homepage (R Development Core Team, 2012a) and the textbook literature (for examples, see Crawley, 2005, 2007; Muenchen, 2009; Teetor, 2011).

4.8.3 PAJEK

PAJEK13 (Batagelj and Mrvar, 2012), a program for network analysis, was used in the course of modelling network model one. The online manual (Batagelj and Mrvar, 2011) and the textbook literature (de Nooy et al., 2005) provide extensive details about PAJEK’s features. Figure 4.4 presents a screenshot of PAJEK’s user interface. PAJEK employs drop down menus from which network analysis commands can be accessed. The Draw tab opens another window, which contains the network visualisation; however, a significant drawback of PAJEK is that only a bitmap image can be produced directly in the software. To achieve higher quality visualisation output, the image file needs to be transferred to vector drawing software. The primary file type in PAJEK is the .net file type, which can also be read as text in a text editor. The .net file contains the network information stored as a list of numbered vertices along with the relations between them. Figure 4.5 presents the .net file for one of the study networks. One feature of PAJEK, the ability to model time event networks, was especially important for the study. Time event records can be stored either in the special time event network .tim file type or in the standard .net file type. The .net file type was used in the study. The addition of time event data permits the longitudinal modelling of a net- work, allowing the analyst to see how the network changes over time. Networks containing time event records were used in the study to delineate and capture event contexts, conceived as locations, larger events (i.e. not single process events

13Version 1.25 of PAJEK was used in the study.

183 4. RESEARCH DESIGN

Figure 4.4: PAJEK’s user interface - This figure presents a screenshot of PAJEK’s user interface. The arrow indicates the location of the Draw tab.

in the semantic triplet sense) or times. Each time event record in a network file corresponded to a different context described in the narrative. Figure 4.6 shows the network illustrated in Figure 4.5 with time event records added.

4.8.4 EXCEL2PAJEK

EXCEL2PAJEK (Pfeffer, 2012) is software specifically designed to transfer EXCEL files that take the form of a list of network relations into .net PAJEK files. The software was used in the study to transfer the model one event structures coded using the sociolinguistic approach into PAJEK for network modelling.

184 4.8 Software

Figure 4.5: A PAJEK .net file - The figure shows the contents of a .net file for a network extracted from Blanchard (1986) (see Garfield, 2012), one of the sample narratives. The orange arrow points to the list of vertices, while the blue arrow points to the list of relations.

185 4. RESEARCH DESIGN

Figure 4.6: A PAJEK .net file with added time event records - This figure shows the addition of time event records to the contents of the .net file illustrated in Figure 4.5. The arrows point to the time event data contained in square brackets. This example network contains data for three time events, or contexts. The time event data in the list of relations (left-hand side of the figure) assigns a time event record to each relation. The time event data in the list of vertices (top right-hand side of the figure) indicates the time events in which each vertex participates. For example, ‘Woodcock’ (vertex 2) participates in time events one and two, but not three. The * symbol in ‘Blanchard’s’ (vertex 3) record indicates that he participates in all remaining time events from event one onwards. In the case of this example, this means that Blanchard participates in all of the time events recorded in the network.

186 4.8 Software

4.8.5 ORA

ORA14(Carley, 2012b) was the primary network analysis software used in the study. Support for ORA includes user guides, which are produced annually (Carley, Rem- inga, Storrick and Columbus, 2010, 2011; Carley, Reminga, Storrick and DeReno, 2009). Like PAJEK, ORA incorporates a user interface. Figure 4.7 illustrates the ORA user interface loaded with the network illustrated in Figure 4.6. As well as a drop-down tab menu across the top, the user interface includes three windows: a section in the top left for uploading networks, an area on the right-hand side for working with the networks, and an output area at the bottom of the screen. Three types of output are possible with ORA: reports, visualisations and charts. ORA has at least four advantages over PAJEK. Firstly, visualisations produced in ORA are good quality and can be saved directly as .pdf, .png or .jpeg files so transference of image output to vector drawing software is not necessary. Secondly, ORA offers a range of reports in which network measures are automat- ically accompanied by additional statistical analysis; although some statistical analysis is possible in PAJEK, the range is more limited. Thirdly, ORA facilitates the storage of a collection of networks as a single ORA .ows workspace. The use of workspaces saved time in the study because a report could be set up and run on multiple networks at once. Finally, ORA includes a data import wizard, allowing various file types to be imported. In particular, .csv files (which can be created from EXCEL files) and PAJEK .net files are recognised by the data import wizard.

4.8.6 FANMOD

FANMOD (Rasche and Wernicke, 2012) was used for the motif detection. FANMOD de- rives its name (the abbreviated form of FAst Network MOtif Detection) from the fact that the software does not rely on the generation of ensembles of whole ran- dom networks, but rather on a more efficient edge-switching algorithm (Wernicke, 2006). Wernicke and Rasche (2006) provide a good overview of the software. A more detailed user’s manual is also available (Rasche and Wernicke, 2006).

14Version 2.2.2 of ORA was used in the study.

187 4. RESEARCH DESIGN

Figure 4.7: ORA’s user interface - This figure presents a screenshot of ORA’s user interface loaded with the example network from Figure 4.6. Note the node (vertex) count of ten and the link (relation) count of sixteen. The arrows indicate the report, visualisation and chart buttons.

188 4.8 Software

Figure 4.8 presents FANMOD’s user interface. FANMOD reads .txt files that con- sist of lists of network relations. As well as an area in the top right for inputting these files, the user interface includes drop-down menus for selecting the size of motif and the algorithm settings, and a box for entering the number of simula- tions. Once the algorithm has run, a pop-up box allows the analyst to select the output parameters.

Figure 4.8: FANMOD’s user interface - This figure presents a screenshot of FANMOD’s user interface. Files are inputted in the top right-hand side of the screen. The output appears at the bottom of the screen and as a separate HTML screen. The blue arrow indicates the drop-down box for selecting the motif size. The orange arrow indicates the algorithm choice drop-down box. The white arrow indicates the box for entering the number of random networks to be simulated.

189 4. RESEARCH DESIGN

190 Chapter 5

Data Collection and Analysis

This chapter sets out the data collection and analysis procedures that were em- ployed by the research. The discussion, which is often technical in nature, assumes a reader familiar with the methodologies and methods reviewed in Chapters 3 and 4: explanations of the data collection and analysis procedures are restricted here to details not provided in previous chapters. References to earlier sections are supplied where relevant. Research quality concerns related to the data collection and analysis are explored separately in Chapter 7.

5.1 Timeline

The study data were collected manually in two passes over a five month period, mid-March to mid-August 2010.1 A final pass in August 2010 checked the data for errors. The data analysis, the bulk of which was automated, occurred during September and October 2010. The decision, taken in April 2011, to switch from parametric to nonparametric tests made additional statistical analyses necessary.2 In all cases, a single researcher, this report’s author, carried out the data collection and analysis.

1Recall from section 4.3.3 that pilot studies preceded the full data collection. The pilot studies took place between August 2009 and February 2010. 2This decision was taken due to the nonnormal nature of the study data.

191 5. DATA COLLECTION AND ANALYSIS

5.2 Contextual data

5.2.1 Collection

The contextual data collection and analysis addressed the first set of theoretical research objectives (see section 4.1.3.1). A closed-ended survey instrument aided the extraction of the contextual data from the study sample. Appendix E provides a copy of this instrument. The instrument collected four types of contextual data:

1. serendipity classification and pattern data;

2. serendipity filter data;

3. narrator process interpretation data; and,

4. discipline data.

The collection process involved several passes through each narrative. An EXCEL workbook stored the collected data. Collection of rich textual description supplemented the serendipity filter ele- ment of the survey. In the evaluation sections of their narratives, some narrators explicitly discussed the impact of serendipity filters in relation to the recoun- ted serendipity occurrences; when such rich descriptions appeared, the data were extracted verbatim and stored.

5.2.2 Analysis

5.2.2.1 Classification and pattern data

The analysis focused on both counts and combinations of serendipity classifica- tions and patterns. Counts recorded the number of appearances of a pattern/class from the Campanario, Van Andel, and Foster and Ford taxonomies (see section 2.3.2.3). Combinations of patterns/classes were studied within taxonomies (e.g. joint Van Andel patterns), in relation to taxonomy pairs (e.g. combinations of Campanario/Foster and Ford classes), and across all three taxonomies (e.g. com- binations of Campanario/Van Andel/Foster and Ford patterns and classes).

192 5.2 Contextual data

5.2.2.2 Filter data

As with the classification and pattern data, the analysis of the filter data examined both counts and combinations. The analysis then correlated this information with the filter impact data.

5.2.2.3 Process interpretation data

Intention/realisation The analysis of this data involved four counts:

1. counts of categorical responses;

2. counts of each possible intention combination (e.g. ab, dd, ad, etc.);

3. counts of each possible realisation combination (e.g. ab, dd, ad, etc.); and,

4. counts of each possible overall combination (e.g. aaab, abbc, adbc, etc.).

Linearity/non-linearity, recognition, multiple disciplinarity, outcome The analyses of these data counted the number of categorical responses.

5.2.2.4 Discipline data

Two analyses investigated discipline: one analysis studied the journal title, and the other, the Current Content edition metadata (see section 4.2.1). The first analysis classed the fields of study based on the journal titles both broadly (e.g. sciences, social sciences, humanities, etc.) and more narrowly (e.g. medicine, biology, law, etc.). Counts of each of the seven metadata categories were also taken. The second analysis took special note of any narratives in the sample that had multiple metatdata tags. Narratives with multiple metatdata tags were deemed multiple disciplinary for the study purposes.

5.2.2.5 Rich descriptive data

The collection stage encountered far less rich textual description of serendipity filter impacts than had been expected; nonetheless, a correlation of the limited

193 5. DATA COLLECTION AND ANALYSIS descriptive data with the serendipity filter data helped to inform the discussion of the study results.

5.3 Collection of the event structures

5.3.1 Overview of the collection process

The event structure collection involved three elements: coding, verification and error-checking. Event structure model one was collected first in two passes through the study sample. The collection of event structure model two followed, again in two passes. In both cases, the second pass concentrated on the systematic resolution of consistency issues raised in the first pass. It is important to emphasise that delineation of the narrative proper was an integral part of the first two passes through the sample narratives; however, to avoid redundancy in this report, the narrative proper delineation process is reviewed only once, presented separately in section 5.3.2. A third pass checked, first the model one, and then the model two event structure data for errors. This final pass was restricted to recording-related errors (e.g. typos, incorrect attribute codes, inconsistent semantic triplet participant labelling); the third pass made no attempt to address consistency issues at a deeper semantic level, for example, by substantially re-coding semantic triplets or altering their sequence. It is important for the reader to be aware that the qualitative data collec- tion process described in sections 5.3.2 to 5.3.4 is sanitised and simplified for brief presentation here. In actuality, the study of rich narrative data was messy and inherently complex; for example, the analyst became more familiar with the study data as collection progressed, a situation that may have affected the event structure coding. The limited space in sections 5.3.2 to 5.3.4 does not permit a detailed discus- sion of the numerous and varied decisions taken during the coding of the model one and model two event structures; instead, overviews of the collection and trans- lation processes are provided accompanied by an introduction to some of the more frequent problematic situations encountered. Readers seeking more detail may

194 5.3 Collection of the event structures wish to review Appendices F (model one) and G (model two), ‘walk-throughs’ of event structure codings from a sample narrative.

5.3.2 Narrative proper

By default, the data collection process began with the complete sampled narrat- ive; however, as noted in section 3.3.1.3, the study adopted a categorical event- based approach. The narrative proper was the category of interest. As such, the event structure coding needed to be limited to the sequence of events comprising the narrative proper.3 To aid the delineation of the narrative proper, the analyst read the complete narrative to identify the Labovian event narrative sections (see section 3.3.2.1).4 These sections acted as useful reference points for the event structure coding; for example, the boundary between the complicating action and the evaluation sections was important because it was normally here that the most reportable event was found (see sections 3.3.2.1 and 4.4.2). Once found, the most reportable event marked the end point of the narrative proper. The analyst worked backwards through the sequence of narrative events to locate the initial event. Narrative norms (see section 4.4.2) dictated that all events between these start and end points were coded in the event structure models.

5.3.3 Model one

As discussed in section 4.4.4, the model one event structures were coded us- ing a sociolinguistic approach derived from Franzosi’s (2010) story grammar and quantitative narrative analysis methods. The coding process also drew on other methods (Dixon, 2005; Labov, 1997, 2003; Labov and Waletzky, 1967, 1997). Readers are referred to Table 4.3 for an overview of the model one event struc- 3More exactly, as discussed in section 4.4.2, the study focused on the serendipity-related component of the narrative proper. 4In practice, several read-throughs of a narrative were often required. It is important to stress that the identification of the event narrative sections was not based on formal Labovian analysis (see section 3.3.2.2). Labovian analysis was trialled in the pilot studies; however, the method was deemed unnecessarily complex for the needs of the study.

195 5. DATA COLLECTION AND ANALYSIS

ture coding decisions. Software employed during the collection of the model one event structures included: EXCEL, PAJEK and EXCEL2PAJEK (see sections 4.8.3 and 4.8.4).

Collection process The collection of the model one event structures followed the coding rules set out in Franzosi (2010, p. 84–85); however, coding was restricted to the semantic triplet component of the story grammar. Prior to coding, supplementary coding rules were established. As well as providing instructions for attribute and event context coding, these supplementary coding rules clarified aspects of Franzosi’s (2010) rules and incorporated elements from the other methods (see Dixon, 2005; Labov, 1997, 2003; Labov and Waletzky, 1967, 1997) used in the model one collection process. Appendix H presents the supplementary coding rules. The modelling process began with the semantic triplet coding. The data from each coding pass were recorded on separate worksheets in an EXCEL workbook. Each row in a worksheet represented a single semantic triplet. Fifty workbooks, one for each sample narrative, resulted. Figure 5.1 presents the model one event structure second-pass data coded from one sample narrative. Semantic triplets were coded in sequence, with the initial narrative proper event as the starting point. Depending on the verb type (see section 3.3.3.4) and the complexity of the sentence structure, the analyst extracted one or more se- mantic triplets from each narrative proper sentence. The subject, action/predicate, and object elements of the semantic triplet, along with the attribute codes for these elements, were entered in the relevant columns of the EXCEL worksheet.5

5The reader familiar with traditional content analysis may find it helpful to relate the col- lection process to the concept of unitising (see Krippendorf, 1980, p. 57–63; Franzosi, 2010, p. 85–86). In the study, the sample unit was the Citation Classic narrative and the recording unit was the semantic triplet element (i.e. participant or process). The study considered two context units—the narrative proper and the semantic triplet.

196 5.3 Collection of the event structures - The figure shows the model one event worksheet record the event context data. Note that this sample narrative describes five event contexts. Each Figure 5.1: Anstructure data example collected of onExcel the coded second model pass throughsemantic one triplet Galanter is event (1983) identified by structure (see an(column Garfield, ID data H). number 2012). Columns (column E C) The and andNote I first coded that contain by two ‘colleagues’, the subject columns ‘ideas’ participant (column and of attribute D),(red ‘literature’, codes, action the box). despite (column while their F) column Recall plural and G nature, from objecta records are the the person coded process discussion (blue as attribute in single arrow). code. semantic section triplet 4.4.4 participants that the sociolinguistic coding approach allows ‘ideas’ to come to

197 5. DATA COLLECTION AND ANALYSIS

The analyst aimed to code labels for the semantic triplet elements directly from the Citation Classic texts; however, situations frequently presented that required generalisation. For example, if the narrator referred to ‘the first sample’ and ‘the second sample’, then the analyst coded ‘sample1’ and ‘sample2’. As outlined in the coding rules, some common referents, such as experiments and results, were coded with generic labels across the sample. Note that it was the attribute code rather than the semantic triplet element label that was important in the network analysis stage of the study (for example, see section 5.6.4). Although not used per se in the network analysis, the semantic triplet element qualitative labels served three useful purposes. Firstly, the qualitative labels allowed the analyst to check for semantic coherence; the expectation was that a sequential read-through of the semantic triplets would make sense and present a sequence of events that made up a story. Secondly, the labels allowed the analyst to check the semantic match between coded output and the input, the original narrative.6 Finally, the labels provided a helpful means to identify individual vertices during the network analysis stage. The qualitative codes not only labelled the network vertices in a practical sense (see section 5.4.3) but also helped to trigger the analyst’s recall of the original narrative events. This recall allowed the analyst to remain close to the original qualitative data, even at later, primarily quantitative, stages of the study. During the semantic triplet coding, the analyst noted the semantic boundaries that marked a new event context; for example, a narrator might refer to a different location or begin a sentence with, ‘The next day . . . ’. Several narrative proper events normally contributed to a single event context. The analyst recorded event contexts in the EXCEL worksheet using both a qualitative label and a sequential id number.

Problematic situations The coding process presented numerous problematic situations. Some were restricted to one or two narratives; others were more common across the study

6As Franzosi (2010, p. 90) notes, semantic coherence does not guarantee that output matches input in terms of coding; for example, the story presented by the output data might make perfect sense but be very different semantically from the original story.

198 5.3 Collection of the event structures sample. The discussion here considers a few of the more common problematic situations.

Complex sentence structures: The coding rules addressed many syntactic is- sues, such as the coding of passive verbs and indirect/direct objects; how- ever, sometimes a sentence had such a complex syntactic structure that it was difficult for the analyst to code the semantic triplet(s). Two tools helped to resolve this problem:

1. Dixon’s (2005) Appendix, which provided information about the se- mantic roles for a given verb-type (see section 3.3.3.4); and, 2. the Stanford Parser (The Stanford Natural Language Processing Group, 2012).

The Stanford Parser not only parses sentences but also supplies typed dependencies (see de Marneffe, MacCartney and Manning (2006) and de Marneffe and Manning (2008)). The typed dependencies helped the analyst to identify the relational links between subjects, verbs and objects.

Description: At times, especially when a narrator used technical or scientific terms, the analyst found it difficult to separate out descriptive elements, for example, adjectives, from the narrative proper text. Two methods helped to address this problem:

1. parsing the sentences using the Stanford Parser; and, 2. manually crossing-out descriptive elements on printed copies of the Citation Classics .pdf files.

Subject matter: As alluded to above, some texts proved difficult to understand without specialised knowledge of the subject matter. In such cases, the analyst had to rely on multiple read-throughs and the semantic context of the narrative; outside sources were not consulted.

Sequence: Event sequence was ambiguous in some sentences. Whenever pos- sible, sequence was coded according to the order given in the sentence; for

199 5. DATA COLLECTION AND ANALYSIS

example, in the sentence, ‘I ate the apple and pear’ , ‘I ate the apple’ was coded as the first semantic triplet. Events not assigned a sequence by the narrator were placed at the beginning of the semantic triplet coding list. The use of participant analysis tables (see section 3.3.4.3) also helped to clarify sequence.

Semantic issues: The analyst sometimes needed to rewrite terms based on their semantic meaning; for example, the term ‘we’, when used by the narrator(s), was understood to include all of the researchers referred to in the narrat- ive and/or the authors of the Citation Classic, depending on the semantic context. Thus, a phrase such as, ‘we read the literature’, where ‘we’ rep- resented, say, five people, was coded as five semantic triplets, each with a different person in the subject slot. Note, however, that ‘the literature’, des- pite being plural in nature, was coded as a single object. Another semantic issue was consistency of labelling; for example, a narrator might refer to ‘the literature’, but later, to ‘the existing research’. If it was clear that the narrator used these terms interchangeably, then the analyst selected a single term for coding purposes.

Translation to network format Recall from section 4.5.1 that the networks analysed in the study were trans- lations of the qualitative coded event structure data. To enable this translation using EXCEL2PAJEK, the analyst first created a new, two-column worksheet la- belled ‘exceltopajek’ in the EXCEL workbook. The semantic triplet subject and object data from the second-pass coding worksheet were copied and pasted into the new worksheet—the subjects into column A and the objects into column B. Figure 5.2 shows the coded data from Figure 5.1 prepared for import into EXCEL2PAJEK. As the second step, the analyst loaded the ‘exceltopajek’ worksheet into EXCEL2PAJEK. The worksheet data were imported as one-mode and directed. Fig- ure 5.3 presents a screenshot of the EXCEL2PAJEK settings used to import the data shown in Figure 5.2. The EXCEL2PAJEK output was a PAJEK .net file (for example, see Figure 4.5)

200 5.3 Collection of the event structures

Figure 5.2: Model one coded data prepared for import into EXCEL2PAJEK - This figure presents an example of the elements of the model one coded data that were imported into EXCEL2PAJEK. The data derive from the EXCEL worksheet illustrated in Figure 5.1. Note how the worksheet illustrated here explicitly includes only the semantic triplet participants. The process element of each semantic triplet has been removed; however, the process element is still included implicitly because each row in the spreadsheet represents a single semantic triplet. EXCEL2PAJEK will read column A as ‘linking’ to column B.

Figure 5.3: The EXCEL2PAJEK settings used to import the model one data - The figure shows a screenshot of the EXCEL2PAJEK settings used to import the model one data. In this example, the data illustrated in Figure 5.2 are being imported. Note that the worksheet shown in Figure 5.2 has been selected from the larger workbook (red arrow). The data are being imported as one-mode and directed (blue arrows). The ’directed’ option means that EXCEL2PAJEK will read the data in Figure 5.2 as {column A element → column B element}. The Create Pajek File button (yellow arrow) runs the program, which translates the data into a PAJEK .net file (see Figure 4.5 for an example .net file).

201 5. DATA COLLECTION AND ANALYSIS of the imported data. As the final step in the translation process, the analyst manually added the time event data to the .net file based on the event context data from the second-pass coding worksheet. See Figure 4.6 for an example of a .net file with added time event data.

Outcomes The model one event structure collection process produced fifty network mod- els, each described as two types of PAJEK network files: .net files and .net files with added time event data (see section 4.8.3). Both file types were needed for the analyses (see sections 5.4.1 and 5.8.2). Each time event .net file could be loaded and visualised in PAJEK either as a single cumulative network or as a col- lection of individual time event networks. Note that, because it resulted from the individual time event networks effectively being ‘piled on top of each other’, the cumulative network was weighted. Also note that, due to the coding rules, the model one networks allowed directed dyads in any attribute combination.

5.3.4 Model two

The model two event structure coding followed a sociostructural approach (see section 4.4.4). The coding process relied on the meta-matrix model (see section 3.3.4.3) and adopted the methods outlined in Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus (2009, p. 86–93). Table 4.3 presents an overview of the model two event structure coding decisions. The collection of the model two event structures required the use of both EXCEL and ORA (see section 4.8.5).

Collection process Following the coding rules set out in Carley, Bigrigg, Reminga, Storrick, DeR- eno and Columbus (2009, p. 86–93),7 the analyst created lists of four elements: agents (A), events (E), knowledge (K), and resources (R). The agent (i.e. per- son), knowledge and resource (i.e. object) lists represented the semantic triplet attributes allowable in the study (see section 4.4.3.2). The event lists captured

7Recall from section 3.3.4.3 that the meta-matrix method is flexible. The study adopted a reduced version of the method, using fewer element classes.

202 5.3 Collection of the event structures the narrative event contexts based on the same coding practice used during the model one event structure collection. From these lists, six matrices were described: AK, AA, AE, AR, EE and ER. Note that the matrices were labelled row first; for example, the AK matrix placed the agent elements in the rows and the knowledge elements in the columns. Note also that these matrices represented all allowable relations between the four element classes (see section 3.3.4.3 and Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus (2009, p. 86–93)). The analyst then coded the semantic triplet relations into the AA, AK, and AR matrices by placing a 1 in matrix cells when a directed semantic triplet relation occurred from row element to column element. Because the matrices were eventually imported into ORA, the analyst did not need to enter a 0 into the matrix cells to indicate an absence of a relation between row and column elements (i.e. a null dyad): ORA automatically assumes a 0 entry for any empty matrix cell. The AE, EE and ER matrix relations were also coded. Although these matrices were deleted prior to the network analysis (see section 5.4.2), the matrices offered a useful semantic check on the event context data during the collection stages. The lists and matrices were created in EXCEL. Figure 5.4 presents two coded element lists and two matrices of relations created from these lists. Upon comple- tion of the model two event structure collection process, fifty EXCEL workbooks had been created, one for each sample narrative. Within these workbooks, each element list and matrix was stored as a separate worksheet.

Problematic situations Like the model one event structure coding, the model two event structure coding encountered problematic situations. Complex sentence structures, the separation of description from narrative proper, specialised subject matter, and semantic issues remained concerns; however, issues related to sequence posed fewer problems due to the fact that model two was weaker semantically and captured existence rather than frequency (see section 4.4.6).

203 5. DATA COLLECTION AND ANALYSIS

Figure 5.4: Example model two event structure lists and matrices - The figure presents an agent element list and a knowledge element list (red arrows) extracted from Milstein and Brownlee (1992) (see Garfield, 2012). Two matrices are possible from these element lists, an AA matrix and an AK matrix (blue arrows). Note that, as part of the coding process, each element has been assigned an ID and that this ID retains information about the element class (i.e. either an A or a K).

204 5.4 Pre-analysis network manipulation

Translation to network format Each model two event structure was imported into ORA for translation to net- work format. ORA’s File>Data Import Wizard function enabled this process. The translation of a single event structure involved three sequential steps:

1. each list and matrix worksheet contained in the EXCEL workbook was saved as a separate .csv file;

2. all of the .csv matrix files were imported into ORA; and,

3. all of the .csv list files were imported into ORA.8

Outcomes The model two event structure collection process resulted in fifty ORA meta- network files, one for each sampled narrative. The meta-networks were multi- mode, binary and directed. Note that, ignoring the event attribute,9 only four directed dyad attribute combinations were possible in the meta-networks: A → A (and, therefore, by default, also A ↔ A); A → K; and, A → R. This fact influenced the choice of network model used in the network analysis (for example, see section 5.8.2).

5.4 Pre-analysis network manipulation

5.4.1 Adjustments to model one

ORA was used for all of the network analysis; therefore the PAJEK model one networks needed to be imported into ORA. This task was accomplished using ORA’s File>Data Import Wizard function (see section 4.8.5). Each imported model one network file appeared in ORA as a collection of sep- arate, individual PAJEK time event networks, rather than as a single, cumulative PAJEK network. Because the network analysis measured the cumulative event

8Carley, Bigrigg, Reminga, Storrick, DeReno and Columbus (2009) explain these steps in detail: saving as .csv files (p. 97); importing .csv matrix files (p. 98–102); and, importing .csv list files (p. 103–107). 9The event attribute was not one of the three semantic triplet participant attributes allowed in the study (see section 4.4.3.2).

205 5. DATA COLLECTION AND ANALYSIS structure, each collection of PAJEK time event networks was unioned (see section 4.5.3) into a single ORA network using ORA’s Data Management>Meta-Network Union function. To retain as much information as possible about the original PAJEK time event networks, two types of union were conducted:

1. a binary union, which ignored multiple interactions and assigned relations a weight of either zero or one; and,

2. a summed union, which retained multiple interaction information.

In the summed unioned networks, line weights represented the number of times an interaction occurred.

5.4.2 Adjustments to model two

The network methods analysed only the semantic triplets comprising the event structures. As non-semantic triplet data, the event data collected for model two was not required for the network analysis. All event data, including the event lists and any matrices containing event lists, were deleted from the model two meta-networks. The network analysis treated the event structures as one-mode; however, the model two meta-networks were multi-mode. As such, they needed to be transformed to one-mode for the network analysis. This transformation was ac- complished using ORA’s Data Management>Meta-Network Transform function, which converts the multiple networks of a meta-network into a single one-mode network.

5.4.3 Summary of model similarities/differences

Before the discussion moves on to the network analysis, the reader may find a summary of the similarities and differences between the final versions of the two network models helpful. Post-manipulation, the two network models of each sample narrative shared certain traits:

ˆ both were complete (i.e. sociocentric) networks;

206 5.4 Pre-analysis network manipulation

ˆ both were also ego networks at no set distance;

ˆ both had the narrator as ego;

ˆ both allowed three vertex attributes—‘person’, ‘knowledge’ and ‘object’;

ˆ both allowed only one line attribute—‘process’;

ˆ both were directed;

ˆ both were one-mode;

ˆ both were binary;

ˆ both described cumulative process; and,

ˆ despite originally capturing context in the form of event counts, both models had this data removed prior to network analysis.

Post-manipulation, the two network models also differed in some ways:

ˆ each could have a different order and size;

ˆ each allowed different dyad combinations by attribute (i.e. all combinations of ‘person’, ‘knowledge’ and ‘object’ were possible in model one; however, only the directed dyad combinations AA, AK and AR were possible in model two);

ˆ each was easily visually distinguishable from the other due to different ver- tex labelling (i.e. model one had textual labels [e.g. ‘Blanchard’] while model two had alphanumeric labels [e.g. A01] ); and,

ˆ model one came in a weighted version while model two did not.

See Table 4.4 for an additional overview of the two network models.

207 5. DATA COLLECTION AND ANALYSIS

5.5 Generic analysis procedures

5.5.1 Network analysis

The majority of the network analyses presented in sections 5.6 to 5.8 followed a generic procedure, which is outlined here. The motif detection work described in section 5.8.2 is a notable exception. As the analysis was carried out in ORA, the reader may find it helpful to refer back to the information in section 4.8.5. The analyses of network model one and network model two were conducted separately. It is important to note that while some network analyses employed both network models, others involved only one network model. Recall from sec- tion 4.4.4 that the two event structure models resulted from different relational text analysis approaches; as discussed in section 4.4.5, the usefulness of the two models varied depending on the research question asked and the measurement sought. Prior to the network analysis, two ORA .ows workspaces were created, one for each of the network models. Each workspace contained the fifty event structure networks for the relevant model. The creation of a workspace was accomplished by uploading each of the fifty networks using ORA’s File>Open Meta-Network func- tion and then saving the uploaded networks using ORA’s File>Save Workspace function (i.e. as opposed to the File>Save Meta-Network function). The saved workspace, which could be reloaded at any time, allowed the network analysis to be performed on all fifty networks at once. The generic network analysis procedure involved four steps:

1. loading the .ows workspace using ORA’s File>Open Workspace function;

2. selecting the desired ORA report using the Generate Reports. . . button;

3. setting parameters for the report; and

4. running and saving the report.

Figures 5.5 to 5.8 illustrate these four steps.

208 5.5 Generic analysis procedures . The ORA - This figure presents a screenshot of a workspace loaded in ORA file type and the individual networks, visible in the left-hand window, contained in the .ows Figure 5.5:red A arrows workspace highlight loaded the in workspace. Clicking on any one of these networks brings that network up in the right-hand window.

209 5. DATA COLLECTION AND ANALYSIS button used to select the networks for which Select. . . - This figure shows the pop-up screen generated when the button Generate Reports. . . button is clicked. The red arrows indicate the drop-down menu used to select the type of report—in Generate Reports. . . this case, the ‘All Measures’ report has been selected— and the to run the report. The report will run on any network that has a ticked box. Figure 5.6: Behind the

210 5.5 Generic analysis procedures - This figure shows example parameters for defining a report—in Figure 5.7: Thethis selection case, of as the report redthat parameters arrow indicates, can the be ‘Sphere ofIn defined Influence’ this by report. illustration, the Each the type analyst.arrow). of report report The is The has analyst a set ‘Sphereego different has to of set network selected run of at Influence’ parameters the on one report narrator, the (yellow calculates arrows). network vertex an extracted 3 from ego ‘Blanchard’, Blanchard as network (1986) for the (see ego a Garfield, vertex selected 2012) and vertex. (blue has set the distance of the

211 5. DATA COLLECTION AND ANALYSIS

Figure 5.8: Choices for report output - This figure illustrates the final screen behind the Generate Reports. . . button. The analyst selects the desired output format(s)—in this example, HTML and .csv output files have been selected—(blue arrow), chooses a directory and file name for the saved output (red arrow), and runs the report by clicking the Finish button (yellow arrow). It is worth noting that the analyst may often choose to select more than one type of output format. Each format presents the results differently, and some output types provide more information than others.

5.5.2 Descriptive statistics

Descriptive statistics were calculated in R (see sections 4.8.2) for each set of network analysis results (i.e. across the study sample of fifty networks). As with the network analysis, a generic procedure was adopted. Robust measures of location and scale were employed; these are detailed in section 6.4.1.2. Box plots and histograms were also called to allow visual inspection of the data;

212 5.6 Analysis related to complexity some of these graphical representations are reproduced in Chapter 6 as part of the study results. Unless the discussion indicates otherwise, the reader should assume that the analyses explained in sections 5.6 to 5.8.1 employed the study’s generic statistical procedure. Preparation for the statistical analysis involved formatting the ORA report out- put into .txt files readable by R. New .csv files were created in EXCEL, a process that normally required a combination of reformatting the ORA .csv output files (e.g. transposing rows and columns, replacing spaces with full stops, renaming columns or rows) and copying/pasting data from the ORA HTML output files. The new .csv files were then saved as .txt files, ready for loading into R.

5.6 Analysis related to complexity

5.6.1 Event structure magnitude

Network order and size (see section 3.5.3) defined event structure magnitude. Order provided a count of the number of semantic triplet participants in the event structure. Size gave a count of the number of process links in the event structure. The analysis examined both network models. Order and size were called as part of an ‘All Measures’ report.10

5.6.2 Event structure cohesion

Density (see section 3.5.3) measured event structure cohesion, conceived as the ratio of the number of actual process links between semantic triplet participants to the number of possible process links between semantic triplet participants (see 4.5.4). ‘All Measures’ reports calculated density for the model one and model two networks. 10By default, ORA’s ‘All Measures’ report calculates 145 measures; however, all 145 are rarely needed in a single study (see section 4.5.4). The analyst can limit the measures calculated by the report with ORA’s Analysis>Measures Manager function.

213 5. DATA COLLECTION AND ANALYSIS

5.6.3 Event structure event contexts

Sections 5.3.3 and 5.3.4 explained how the event structure models retained selec- ted event context information; however, as discussed in sections 5.4.1 and 5.4.2, this information, with the exception of cross-event context multiple interactions in model one, was removed from the final network models. Thus, the analysis of the event data involved only a descriptive statistics component. The analysis, which studied both network models, focused on three event counts:

1. the total event context count for the event structure;

2. the ego event context count (i.e. the number of event contexts in which ego participated); and,

3. the ego proportional event context count (i.e. the proportion, expressed as a percentage, of the number of event contexts in which ego participated relative to the total event context count for the event structure).

5.6.4 Semantic triplet participant attributes

Recall from sections 5.3.3 and 5.3.4 that attribute information was coded during event structure collection. The analysis of this attribute data involved the use of counts and proportional distributions. Counts were taken of the number of person, knowledge and object vertices in each model one and model two network. The proportional distribution of attributes across each network was then calcu- lated. Note that some analyses related to interaction also studied vertex attribute data (see sections 5.8.1 and 5.8.2).

5.7 Analysis related to load

5.7.1 Pushed, pulled and total load

Load (see section 2.3.3) was calculated from measures of degree (see sections 3.5.6.5 and 4.5.4) taken on the binary network models. In all cases, the vertex- level degree measurements focused on the ego vertex. The analysis employed ORA’s ‘All Measures’ report.

214 5.7 Analysis related to load

Because of the different coding approaches used to collect the event structure models, the analysis used network model one for all pushed load calculations and network model two, for all pulled load calculations.11 In the case of total load, absolute total load was estimated from the two network models rather than calculated exactly from a single network model. Appendix I outlines the two estimation procedures followed. Statistical tests checked the correlation of the results of the two estimation procedures (see section 5.9.2). Given that the estimation of total load involved two network models, the analysis could only calculate average ego total degree centrality and network total degree centralisation for each network pair. As such, for an additional check on ego total degree centrality, ego’s rank was considered; for example, did the network model one ego and the network model two ego have the highest total degree centrality in their respective networks?

5.7.2 The activity and passivity of ego

The analysis of ego’s activity and passivity (see section 2.3.3) followed aspects of the procedure outlined in section 5.7.1; however, significantly, both binary and summed network models were studied. As in the load analyses, vertex-level measurements focused on ego and ORA’s ‘All Measures’ report ran the calculations. Passivity was calculated from absolute and normalised indegree measures taken on the summed versions of network model one. The analysis preferred the summed version because that model captured multiple interactions, informa- tion essential to the concept of passivity (see sections 4.5.4 and 5.4.1). The use of the summed versions of the networks implied the calculation of weighted degree centrality measures. Note that the procedure (Wei et al., 2011) implemented in ORA to calculate weighted degree centrality relies on the use of a weighting constant for the calculation of maximum degree centrality.12

11These calculations included the absolute and normalised measures of degree listed in Table 4.5. 12See section 3.5.6.5.

215 5. DATA COLLECTION AND ANALYSIS

The analysis of activity needed absolute and normalised outdegree measures calculated from the model two networks; however, summed versions of the model two networks were not available. As such, the analysis estimated the outdegree measures. The estimation procedure, outlined in Appendix J, used the binary model two networks, and the binary and summed versions of the model one networks.

5.8 Analysis related to interaction

5.8.1 Interaction and sequence

5.8.1.1 Two types of interaction

For the study purposes, interaction13 was defined as the extent of ego’s direct interaction with other network vertices. Because the focus was on direct inter- action, the analysis measured ego networks at a distance of one. The analysis investigated two types of interaction: true interaction and dependent interaction. True interaction was calculated from the model two networks while dependent interaction was measured on the model one networks. The attribute distribution was also measured for both types of interaction. It is important to highlight the conceptual difference between true and de- pendent interaction. True interaction represents the actual direct interaction between ego and other network vertices. This type of interaction is best captured by collected semantic triplets that ignore sequence—for example, star ← Sue → telescope (see section 4.4.5)—and capture underlying social structure. Note that in this example of two collected semantic triplets, which represents Sue’s ego net- work at a distance of one, both the telescope and the star would be measured in a direct interaction calculation. 13Note that the study viewed interaction and interaction patterns (see sections 5.8.2 and 5.8.3) as distinct concepts.

216 5.8 Analysis related to interaction

Dependent interaction highlights the potential role played by sequence in the serendipity process (see section 2.3.3). Measures of dependent interaction rely on collected semantic triplets that capture sequence—for example, Sue → telescope → star (see section 4.4.5). Note that in this example of two collected triplets, only the telescope would be measured in the direct interaction calculation. The star lies at a distance of two links from Sue; therefore, Sue’s interaction with the star is considered indirect because it is not measured in the ego network set at a distance of one. Network model one was used to study dependent interaction because the coding process that produced this model retained the information about ego’s perception of sequence in the serendipity process.

5.8.1.2 Analysis of true interaction

The analysis of true interaction employed ORA’s ‘Sphere of Influence’ report with the Citation Classics narrator selected as the ego vertex and the distance of the ego network set at one. The report visualises the ego network and indicates its order. The report also presents the ego network order as a percentage of the complete network order. To collect information on vertex attribute, the analyst manually counted the number of vertices in the ego network by attribute type. These counts were then each represented as a proportion of the total vertex count for the ego network, minus ego:14

aattribute (n − 1) where

aattribute = the vertex count for an attribute

and

n = the total number of vertices in the ego network.

14Ego was always a person; therefore, including ego in the attribute distribution would have automatically weighted the distribution in favour of the person attribute.

217 5. DATA COLLECTION AND ANALYSIS

For each attribute type, this proportion was reported as a percentage. As a collection, the three percentages represented the attribute distribution for true interaction in the ego network.

5.8.1.3 Analysis of dependent interaction

The analysis of dependent interaction repeated the procedure set out in sec- tion 5.8.1.2. The analysis also estimated the percentage of direct interactions in each network that might need to occur in certain sequences in order to result in serendipity. This rough estimate was calculated by subtracting dependent inter- action from true interaction. Radial diagrams were used to examine differences in attribute distribution between true and dependent interaction.

5.8.2 Local interaction patterns

Network motif detection (see section 3.5.7.3) identified statistically significant local interaction patterns centring on ego. To allow the full range of possible dyad attribute combinations, network model one was studied. The analysis employed FANMOD for network motif detection, ORA for ego network visualisation, and R for multidimensional scaling. In addition, some manual analysis was necessary. The analysis was a three step procedure.

Step one: network motif detection Before work in FANMOD could begin, a suitably formatted .txt input file for each model one network was needed. The original .net PAJEK files for model one (i.e. the .net PAJEK files without the added time event data) were used as a starting point. See Figure 4.5 for an example .net file. Only the list of relations from the .net file was retained in the new .txt FANMOD input file. Figure 5.9 presents an example of a FANMOD input file.

218 5.8 Analysis related to interaction

Figure 5.9: A FANMOD input file - This figure shows a .txt FANMOD input file. The file consists entirely of a list of relations.

FANMOD was employed with the setup illustrated in Figure 5.10. Key settings included the detection of size-3 directed motifs, the generation of 1000 random networks, and the preservation of a locally constant number of bidirectional edges for each vertex. FANMOD’s HTML export function was employed for the motif detection output. Figure 5.11 presents the pop-up screen behind the HTML-Export button. The pop-up screen in the Figure has been set to the export options used in the study: create pictures; filter by Z-score higher than 2 and p-value less than 0.05; sort the motif results by Z-score; and, put motifs not found in the random networks at the bottom of the output list. Figure 5.12 shows an example of FANMOD’s HTML output. Note the content of each output row: motifs are identified by id number; a visual depiction is provided; and, the statistical results are presented. An index is also included (not shown here) that lists the setup and output parameters used in the analysis and provides information on the algorithm run. The parameter information is useful for record keeping purposes and the algorithm information is helpful for determining whether or not the analysis needs to be repeated, for example, with a larger number of random networks.

219 5. DATA COLLECTION AND ANALYSIS

Figure 5.10: The FANMOD setup used in the study - This figure presents the FANMOD setup used in the study. The red arrows indicate key algorithm and random graph model settings while the blue arrow points to the button used to run the analysis.

220 5.8 Analysis related to interaction

Figure 5.11: The FANMOD export options used in the study - This figure illustrates the FANMOD export options applied in the study. The red arrow indic- ates the HTML-Export button that pops up the export options screen (white arrow). The blue arrows point to the export options selected in the study. The Generate HTML button (yellow arrow) produces the HTML report.

221 5. DATA COLLECTION AND ANALYSIS

Figure 5.12: An example of FANMOD’s HTML output - This figures reproduces the FANMOD HTML output for one of the networks analysed in the study, the model one network extracted from Blanchard (1986) (see Garfield, 2012). The output data is sorted by motif pattern in rows (blue arrows). The red arrows point to the links to the Index output.

222 5.8 Analysis related to interaction

Step two: manual checks The automated process described above offered no means for limiting motif detection to those motifs containing ego. As such, this check was completed manually. To begin with, each model one network was loaded into ORA and visualised using the Visualize button. Once inside the visualisation screen, the analyst selected ORA’s Tool>Sphere of Influence function to reduce the complete net- work to an ego network set at a distance of two. In many cases, this removed a substantial part of the network, making the process of manually checking for motifs containing ego much easier. Next, each of the motifs returned by FANMOD was checked against the ego network to see if ego participated in a motif of that pattern. The ego network was set at a distance of two rather than one, in order that all size-3 motif possibilities would be visible on the screen; for example, a size-3 motif consisting of a chain of three vertices with ego at the beginning of the chain (i.e. a motif where the third vertex is two links away from ego), could not be visualised in an ego network set at a distance of one. Finally, for all motifs found to contain ego, the attributes of the other two motif vertices and the range of possible positions for all three motif vertices were recorded. Importantly, the analysis linked the attribute and position inform- ation; for example, for the chain motif described above, the analysis captured the information, not only that ego was positioned at the start of the chain, but also that the next vertex along in the chain was knowledge x number of times or an object y number of times. This manual approach to the analysis allowed the extraction of more complex information than would have been possible with automated methods alone.

Step three: data synthesis The final analysis step synthesised the results from step two. Many model one networks contained multiple motifs. The analysis considered three questions:

1. Do any motifs dominate the sample networks? If so, which one(s)?

2. Do any motifs show a tendency to ‘go together’ in the sample networks? If

223 5. DATA COLLECTION AND ANALYSIS

so, which motifs, and in what patternings?

3. Do any sample networks share similar motif patterning? If so, which net- works?

Three multidimensional scaling procedures were used to address these ques- tions.15 The analysis employed Oksanen, Blanchet, Kindt, Legendre, Minchin, O’Hara, Simpson, Solymos, Stevens and Wagner’s (2012) vegan package16 for R and called the function >metaMDS with the distance argument set to either "jaccard" or "bray", depending on the nature of the input data.17 Input data included two matrices in community data format and a similarity matrix.18 For all the analyses, the number of runs was limited to one hundred. Appendix K provides full details of the multidimensional scaling procedures followed by the study.

5.8.3 Global interaction patterns

The analysis employed network topology inference (see section 3.5.7.2) to identify global interaction patterns. Both network model one and network model two were studied. The analysis used ORA’s ‘Context’ report. The ‘Context’ report compared the centrality measures of betweenness, close- ness and inverse closeness for each observed model one and model two network with two random graph models: the Erd¨os-R´enyi model and a core-periphery model. The report output included the mean results for the three centrality measures, the results of a t-test on these means, and a qualitative indication (‘yes’ or ‘no’) of whether or not the observed network showed ‘no significance difference’ from the random graph model based on the significance level of the t-test result. 15Note that to check the multidimensional scaling’s identification of dominant motifs, the counts of motif appearances were also considered in relation to the motif Z-score data. 16Version 2.0–1 of vegan was used in the study. 17See Holland (2008) and R Development Core Team (2012b) for details of the >metaMDS function. See Greenacre (2008) for an overview of non-Euclidean measures of distance between samples, including the Bray-Curtis and Jaccard Index dissimilarities. 18Note that prior to the MDS procedure, the similarity matrix was converted to a dissimilarity matrix. See Kruskal and Wish (1978, p. 77) or Leydesdorff and Vaughan (2006, sec. 3) for discussions of this conversion procedure.

224 5.9 Comparison of the network models

Each real network could score between 0, the lowest score, and 3, the highest score, for a given random graph model topology, depending on how many of the three centrality measures led to a t-test result of ‘no significant difference’ (i.e. a qualitative indication of ‘yes’ rather than ‘no’). The score for each observed network was recorded in an EXCEL spreadsheet, resulting in a collection of 200 scores (i.e. two network models, comprising fifty networks each times two random graph model variants). The analysis then calculated total scores by topology for each model one and model two network pair; for example, the ‘Blanchard’ network pair scored a 3 (model one) and a 2 (model two) for the Erd¨os-R´enyi topology, giving the pair a total score of 5 for that random graph model topology. The analysis interpreted the level of ‘randomness’ or ‘core-peripheriness’ of the real network based on the total score: 0 − 2 was low, 3 or 4, medium and 5 or 6, high.

5.9 Comparison of the network models

5.9.1 Visual inspection

The visual inspection of the network models had two aims:

1. to make an ‘eyeball’ estimate of network topology—for example, does the network look like a star or chain? are vertices evenly distributed or do they group together in places? are a core and periphery evident? etc.; and,

2. to compare the ‘eyeball’ estimates of model one and model two network pairs.

The ‘eyeball’ estimates in combination with the network topology inference results (see section 5.8.3) fed into the descriptive inference process relating to global interaction patterns. The results of this process are reported in sections 6.4.1 and 6.4.2. The comparison of ‘eyeball’ estimates for network model pairs provided qual- itative information about the similarity/difference between the two paired net- works. It is important to note that each network in a pair was drawn by ORA

225 5. DATA COLLECTION AND ANALYSIS using the same spring-embedded algorithm (see section 4.5.3). Although, given the two different coding approaches used to capture the event structures, there was no inherent assumption that two paired networks should look visually sim- ilar, if they did, then this information suggested that, despite their differences, the two coding approaches had produced similar models. Comparison of the two models involved examining the overall topologies of the paired networks as well as more localised patterns. For example, Figure 5.13 illustrates a model one and a model two network pair. Note the overall bow-tie shape of both networks and the similar positioning of ego within this bow-tie shape.

5.9.2 Statistical tests

Because visual comparisons of networks can be unreliable (see section 4.5.3), the study also employed statistical tests to compare paired networks. The tests compared four elements of the complexity data measured on both network models (see section 5.6): 1. order;

2. size;

3. density; and,

4. event context count. The statistical tests also checked the correlation of the two absolute total load estimates (see section 5.7.1) across the study sample. The analysis treated the network model one and network model two collected from a single narrative as a paired sample. As discussed in section 4.6.2, due to the nature of the data, nonparametric statistical tests were preferred. The analysis employed both hypothesis testing and correlation. Because a network pair described a single narrative, the study expected the complexity data not to reject the null hypothesis (i.e. to show no significant difference)19 and to be highly correlated. 19Note that this approach differs from the more standard hypothesis testing focus on rejection of the the null hypothesis.

226 5.9 Comparison of the network models - This figure shows a pair of networks extracted from Figure 5.13:Ashbaugh A (1979) visual (see Garfield, comparisonis 2012). rotated of The approximately red ninety paired arrow degreesNote highlights networks counter-clockwise, also the the the ego similar similar vertex bow-tie positioning in shape each of of model. the ego two in If networks both the is networks. model very evident. two network

227 5. DATA COLLECTION AND ANALYSIS

The hypotheses related to the complexity data were two-tailed and followed the general format:20

H0 : µ1 = µ2

HA : µ1 6= µ2 where

µ = the complexity data element and

1 or 2 refers to the network model. Hypotheses testing was accomplished using Wilcoxon Signed Rank Tests. The analysis employed the R coin package (Hothorn, Hornik, van de Wiel and Zeileis, 2006, 2008). The >wilcoxsign test21 function was called with the settings, (distribution="exact", zero.method="Pratt"). The setting "exact" requests the use of an exact distribution, computed by the Streitberg- R¨ohmelshift algorithm (see Streitberg and R¨ohmel, 1986, 1987). The Pratt (1959) zero method was selected for handling ties because of the prevalence of ties in the data.22 Kendall’s Tau was used to check correlation between network pairs across the study sample. This measure of association was deemed conservative (see section 4.6.2) and appropriate for tied data. The analysis employed R’s >cor.test func- tion, called with the setting, (method="k"). The analysis also used R to visualise the correlations as scatterplots. It is important to note that the statistical comparison of the network models, with the exception of the density comparison, examined non-relational aspects of the paired networks. Two networks with the same order, size and density can still be very different as relational objects; for example, although two networks may

20Note that, occasionally, hypotheses were one-tailed. The discussion of the study results (see Chapter 6) indicates those instances when one-tailed hypotheses were used. 21For details of this function, see Hothorn et al. (2008) and R Development Core Team (2012c). 22Ties in the paired samples were especially common due to the single narrative underlying both network models. As such, ties were a nontrivial concern.

228 5.9 Comparison of the network models have the same number of links, the links might cluster around a few vertices in one network, but be evenly distributed amongst vertices in the other. The non- relational nature of the statistical comparison was another reason that visual comparison of the network models was undertaken (see section 5.9.1). Chapter 7 returns to the issue of network model comparison.

229 5. DATA COLLECTION AND ANALYSIS

230 Chapter 6

Descriptive Inferences

This chapter presents the descriptive inferences of the research. After a brief overview of the inference process, the discussion considers the research results in relation to the two components of the study aim: the contextual, process- related elements of serendipity; and, the phenomenological event structures of serendipity. The final section of the chapter reports the meta-inference of the research, a descriptive profile of process in serendipity.

6.1 Why describe?

The present study draws on descriptive inferences to develop an understanding of serendipity as a complex phenomenon. Teddlie and Tashakkori (2009, p. 114) note that, “Understanding complex phenomena involves considerations of con- text, process, meaning, and so on”; for these reasons, the authors argue, such understandings need not—and often, cannot—focus on causality. Descriptive inference normally precedes causal inference. An unfortunate side- effect of this sequential relationship is a tendency to deem descriptive inference as inferior, a ‘less worthy’ research outcome than causal inference. A remark by Teddlie and Tashakkori (2009, p. 113) succinctly captures the bias towards causality: “This [the development of causal explanations] is, of course, the raison d’etre for post-positivists of all genres”.

231 6. DESCRIPTIVE INFERENCES

However, as King, Keohane and Verba (1994, p. 15, emphasis in original) argue, descriptive inference can be a valuable, useful research focus in its own right:

... a research project should make a specific contribution . . . by increas- ing our collective ability to construct . . . explanations of some aspect of the world. This . . . does not imply that all research that contributes to our stock of social science explanations in fact aim directly at making causal inferences. Sometimes the state of knowledge in a field is such that much fact-finding and description is needed before we can take on the challenge of explanation. Often the contribution of a single project will be descriptive inference. Sometimes the goal may not even be descriptive inference but rather will be the close observation of particular events or the summary of historical detail. These, how- ever, meet our . . . criterion [i.e. make a specific contribution] because they are prerequisites to explanations.

In many ways, the area of serendipity research is in disarray: the serendipity literature lacks ‘cohesive theories’, ‘consensus on theoretical frameworks’ and, crucially, ‘empirical evidence to demonstrate the nature of serendipity’.1 It is precisely this missing empirical evidence—fundamental fact-finding2 and detailed description—that is required to develop understanding of serendipity as a complex phenomenon, and, ultimately, to construct serendipity theories and frameworks less reliant than existing ones on assumptions and untested propositions.

6.2 The study’s inference process

6.2.1 Inference as outcome and process

Teddlie and Tashakkori (2009, p. 288) suggest that the term inference be used in research “to denote both a process and an outcome”. As an outcome, an inference is an interpretation representative of a meaning or understanding. The inference

1See section 1.2.3. 2Of course, in the case of serendipity research, the term ‘fact-finding’ must be understood in the context of a study of a phenomenon. Recall from section 3.2.1 that a phenomenon does not represent a positivist ‘reality’.

232 6.2 The study’s inference process process is a series of steps taken to enable such “meaning making” (Teddlie and Tashakkori, 2009, p. 288).

6.2.2 Multiple levels and multiple steps

The process of meaning making occurs across multiple levels and involves mul- tiple steps. This is especially true in the case of mixed methods research, which requires the convergence of study strands and the integration of qualitative and quantitative results. Teddlie and Tashakkori (2009, p. 288) argue that inference reporting should trace the multiple levels and multiple steps inherent in meaning making:

. . . an effective presentation of research findings emulates the think- aloud process. In such a process, the researcher communicates how he or she reaches an inference by following a question-and-answer style: examining the results, making a conclusion, evaluating the conclusion, and proceeding to the next level of inferences.

The present author does not believe that emulating the think-aloud process—a highly personal process that can be messy, long-winded and difficult to follow— necessarily leads to the most effective presentation of research findings; however, she does agree with Teddlie and Tashakkori that it is important to make both the levels and steps of meaning making explicit in research inference reporting. As such, the reporting in sections 6.3 to 6.5 is designed to take the reader level- by-level and step-by-step through the study’s inference process.

6.2.3 Levels of inference

The reporting focuses on three levels of inference:

1. the local level, which refers directly to specific results of a data analysis;

2. the research question level, which combines local inferences into a higher- order response to a research question; and,

3. the meta-inference level, which combines the research question inferences to address the overall aim (i.e. overarching research question) of the study.

233 6. DESCRIPTIVE INFERENCES

The report’s structure emphasises these different levels of inference, as does the degree of detail in the reporting: higher levels of inference are presented in relation to broader concepts and theory, while the most detailed information is discussed in the deeper levels of the report. It is important to stress that all inferences presented, regardless of level, must be understood as inherently bounded by the inclusion and exclusion criteria that determined the subpopulation of serendipity narratives studied by the research (see section 4.2.5).

6.2.4 Steps in the inference process

As noted in section 6.2.2, the effective communication of inferences requires four steps. These steps apply regardless of inference level. The reporting follows the four-step inference process, and does so for each inference level.

Local inference reporting Local inferences are discussed in the deepest divisions of the report.

Step 1: The report presents and examines specific results of the data analysis.

Step 2: The report makes an inference based on these specific results.

Step 3: The report evaluates the inference made.3

3a) The discussion flags up concerns related to localised methods issues. Localised methods issues are matters immediately related to the results under discussion. The discussion of broader methods issues—as well as all methodological concerns—is reserved for the research credibility audit (see Chapter 7). 3b) The discussion links the inference to the literature.

Step 4: The report proceeds to the next inference level. Note that, in practice, this step equates to the first step of the process of meaning making at the next higher inference level. 3This evaluation may involve one or both aspects of Step 3.

234 6.2 The study’s inference process

The descriptive profile The descriptive profile presented in section 6.5 integrates the lower level in- ferences into higher level inferences and serves as part of the research meta- inference. It is important to note that, at the higher levels of inference making that distinguish the descriptive profile, references to existing understandings may be general—for example, to broad concepts or theoretical propositions—rather than specific. In practice, this means fewer formal citations. The assumption is that, since specific relationships with the literature have already been referenced at the local inference level, these references hold and apply at all higher inference levels. Recall that the descriptive profile is a substantive outcome of the study.4 The descriptive profile encompasses steps one to three of the inference process at the meta-inference level; the discussions related to transferability and contributions (see sections 8.2 and 8.3) can be considered the fourth step of meaning making at the meta-inference level, and indeed, as the final step of the study’s inference process.

A reminder concerning the reporting of methodological matters As this overview of the study’s inference process concludes, one point should be made again: methodological matters beyond localised methods issues are dis- cussed in Chapter 7. As such, the research credibility audit serves as Step 3a of the inference process for both the research question and meta-inference levels of meaning making.

6.2.5 Restatement of questions and aims

The inference levels, the steps of the inference process and the reporting approach are all inherently tied to the study’s aim and research questions. Although the aim and questions have been stated previously, they are restated on the following page for the convenience of the reader. Note that sections 6.3.1 to 6.4.3 each focus on one of the study’s six research questions.

4See section 1.4.6.

235 6. DESCRIPTIVE INFERENCES

The study aimed to describe process in serendipity, conceived as phe- nomenological and relational,5 through

1. the characterisation of contextual, process-related elements of serendip- ity based on the questions:

ˆ How do recounted lived experiences of serendipity relate to classifica- tions and/or patterns of serendipity proposed in the literature?

ˆ What sociological elements are proposed in the literature as obstacles to serendipity? To what extent do recounted lived experiences of serendipity support/refute these propositions?

ˆ How do people retrospectively interpret and make sense of their exper- iences of process in serendipity above and beyond a basic recounting of events? and,

2. the description of phenomenological event structures of serendipity in response to the questions:

ˆ What event structures are described in personal accounts of serendip- ity in research?

ˆ How does description of these event structures contribute to under- standings of relational aspects of process in serendipity?

ˆ To what extent, if at all, do the described event structures support the idea of average, typical or normal process in serendipity?

5Note that, to keep this section brief, shortened versions of the aim and its components are presented. For the full versions, see section 1.4.4.2 or 4.1.2.

236 6.3 Contextual, process-related elements of serendipity

6.3 Contextual, process-related elements of serendipity

6.3.1 Classifications of lived experience

6.3.1.1 Occurrence and associated outcomes

The research investigated the rate of occurrence of serendipity across the Citation Classics6 as well as the type of outcome—findings or methods—associated with the sample serendipity examples.

Occurrence The research found that serendipity7 featured in 1.5% of the Citation Classics.8 Unlike formal research reports, where convention exerts pressure on writers to hide serendipity (Merton, 2004), the Citation Classics encourage narrators to highlight serendipity (Garfield, 1993e). As such, it is reasonable to extrapolate the rate of serendipity reporting to the rate of occurrence. The study’s result (1.5%) can be compared with the corresponding figure (2.9%) calculated from Campanario’s (1996) existing study.9 Up close, the two figures differ significantly: one is almost double the other. However, step back and the two figures illustrate the same overall picture: serendipity is reported infrequently10 in the Citation Classics. Research anecdotes make up a large proportion of the serendipity literature.11 Both their sheer abundance and the serendipity-centric focus of these anecdotes

6In other words, the research studied occurrence by investigating the the complete Citation Classics database, not just the subpopulation of serendipity narratives. 7As noted in section 6.2.3, serendipity as used throughout this Chapter is specifically under- stood as bounded by the inclusion/exclusion criteria that determined the serendipity subpop- ulation studied by the research (see section 4.2.5). 8This figure was calculated by dividing the number of unique Citation Classics comprising the serendipity subpopulation (n = 59, see section 4.2.6) by the estimate of the unique Citation Classics in the online database (n = 3945, see section 4.2.1 and Appendix C). 9See section 4.2.3. 10Note that, given the qualitative understandings underlying the survey instrument, the quantitative survey results presented throughout section 6.3 are qualitised (e.g. infrequent, about equal, almost three quarters, etc.) for inference making purposes. 11See section 2.1.2.

237 6. DESCRIPTIVE INFERENCES

can seem to imply that serendipity is common, that it happens ‘all the time’. This idea has infiltrated serendipity theory; for example, Morton and Swindler (2005, p. 346) refer to serendipity as “a fundamental aspect of . . . research”. As a set of anecdotes about research experience in general (i.e. not exclusively serendipity anecdotes), the Citation Classics allow occurrences of serendipity to be placed within the wider context of research: in the Citation Classics, research ‘without serendipity’ is by far the more common experience.

Outcome Almost three quarters of the sample Citation Classics (74%, 37 out of n = 50) indicated findings as the significant research outcome; the remainder (26%, 13 out of n = 50) reported methods. These results suggest that, in the Citation Classics, findings are the ‘things’ (i.e. after Walpole’s definition)12 more often associated with serendipitous discovery. It must be emphasised that association does not prove cause and effect: the study results provide no information about whether serendipity leads to findings as the primary research outcome.

6.3.1.2 Campanario, van Andel, Foster and Ford

The research classed the sample Citation Classics according to three serendipity typologies selected from the literature: Campanario’s (1996) classification, van Andel’s (1994) patterns and Foster and Ford’s (2003) classification.13

Campanario 46% (23 out of n = 50) of the sample narratives fell into Class One and 54% (27 out of n = 50) into Class Two of Campanario’s (1996) serendipity classification.14 These results suggest that serendipity in the Citation Classics is about equally divided between the two classes. While it is tempting to interpret this division as indicating that serendipity is half unexpected outcomes and half unexpected process, a caveat concerning

12See section 1.3.1. 13See section 2.3.2.3. 14Recall from section 2.3.2.3 that Class One relates to the original research goal reached in an accidental way while Class Two relates to a discovery unrelated to the original research project.

238 6.3 Contextual, process-related elements of serendipity

Campanario’s classification should be noted. Class One inherently rules out the possibility of an unexpected outcome; however, Class Two does not rule out the possibility of unexpected process. This caveat implies that, with respect to outcomes, the results do indeed show a roughly equal division: expected research outcomes are about as common as unexpected research outcomes. However, in terms of process, the results provide, not a ratio, but rather, a minimum figure: about half of the examples—at least— refer to unexpected process. The Campanario classification results make two points concerning serendipity in the Citation Classics:

1. the sought finding is as common as the unsought finding; and,

2. pseudoserendipity (Roberts, 1989) occurs as often as—and potentially, even more often than—so-called ‘true’ serendipity.

Van Andel Table 6.1 lists the number of appearances of each of van Andel’s (1994) sev- enteen serendipity patterns across the study sample.15 The eighty serendipity patterns found in the Citation Classics divided into four groups:

1. a most frequently described pattern (Pattern 2);

2. patterns described about half as frequently as the most frequently described pattern (Patterns 3, 6 and 11);

3. patterns described infrequently (eight patterns); and,

4. patterns never described (five patterns).

The focus on Patterns 2, 3, 6 and 11 was maintained by the joint pattern res- ults.16 A joint pattern was defined as a set of two different van Andel serendipity patterns described by a single Citation Classic. Thirty-one such joint patterns were found. All but one of these involved Patterns 2, 3, 6 and/or 11. Repeated

15See section 2.3.2.3 for details of the patterns. 16Recall from section 2.3.2.3 that van Andel’s (1994) patterns can ‘coexist’.

239 6. DESCRIPTIVE INFERENCES

Serendipity Number of Pattern Descriptions

1 2

2 23

3 13

4 4

5 4

6 11

7 1

8 2

9 0

10 0

11 11

12 2

13 1

14 0

15 6

16 0

17 0

Table 6.1: Descriptions of van Andel serendipity patterns in the study sample - This table lists the number of times each of van Andel’s (1994) seventeen patterns were described by the sample Citation Classics. Note that it was possible for a single Citation Classic to describe more than one serendipity pattern.

240 6.3 Contextual, process-related elements of serendipity appearances of joint patterns were much more limited across the study sample than repeated appearances of single patterns; at just five appearances, Pattern 2 combined with Pattern 6 was the most frequently described joint pattern. The study results indicate that certain serendipity patterns are more common across the Citation Classics than others. Four patterns dominate:

1. one surprising observation;

2. repetition of a surprising observation;

3. from by-product to main product; and,

4. discovery by an outsider.

A single surprising observation features most frequently. This pattern fits closely with both Walpole’s definition and Merton’s conceptualisation of serendip- ity (see section 1.3.1). The literature also regularly refers to the potential people from outside of the community of practice bring to serendipity. Just as telling, are the patterns never described—for example, serendipity resulting from a dream. Despite occasional links in the literature between dream states (such as day- dreaming, see Mueller, 1990) and serendipity, such links are not reported in the lived experiences of serendipity recounted in the Citation Classics. Serendipity patterns do coexist in the Citation Classics; however, joint ap- pearances are restricted almost exclusively to combinations of the four most com- mon patterns, a situation that serves to further accentuate the dominance of these patterns. Interestingly, the one outlier joint pattern combined Pattern 8 (no hypothesis) with Pattern 15 (playing). Common sense suggests that it is entirely reasonable that these two patterns should ‘go together’, a point that has not gone unnoticed in the serendipity literature; this said, the literature can tend to over-emphasise the importance of such undirected, free activity for serendipity (for example, see Langmuir in Merton and Barber, 2004, p. 200). The study result—a single occurrence—suggests that such ‘playing around’ is in fact highly unusual in the serendipity experiences described by the Citation Classics.

241 6. DESCRIPTIVE INFERENCES

Foster and Ford Table 6.2 reports the results related to Foster and Ford’s (2003) serendipity classification.17 In theory, the two-part nature of the classification permits certain

Serendipity Number of Class Descriptions

1 12

2 26

3 0

4a 1

4b 12

Table 6.2: Descriptions of Foster and Ford serendipity classes in the study sample - This table lists the number of times each of Foster and Ford’s (2003) serendipity classes were described by the sample Citation Classics. Note that the two-part nature of Foster and Ford’s (2003) classification permits serendipity to belong to (certain) multiple classes. classes to be described in combination (see section 2.3.2.3); however, in practice, the study found only one such combination, Classes 2 and 4b, which in turn, appeared only once in the study sample. Classes One and Two comprise the first part of Foster and Ford’s classifica- tion, that relating to the impact of serendipity. The counts in Table 6.2 indicate that about half of the sample Citation Classics fell into Class Two and around a quarter fell into Class One; the remaining quarter of the sample fell into neither class. The counts also show that the second part of Foster and Ford’s classifica- tion, which relates to the nature of serendipity, applied to only a quarter of the sample Citation Classics. This latter result strongly suggests that the second part of Foster and Ford’s classification is inadequate for describing the nature of the bulk of the serendipity experiences recounted in the Citation Classics.

17See section 2.3.2.3 for details of this classification.

242 6.3 Contextual, process-related elements of serendipity

A case-by-case consideration of the results sheds additional light on the counts reported in Table 6.2: the counts show almost no overlap between the first and second parts of Foster and Ford’s classification. In other words, the results follow an either/or pattern: the quarter of the study sample not described by Classes One or Two is an almost perfect match to the quarter of the study sample de- scribed by the second part of Foster and Ford’s classification. From Foster and Ford’s (2003, p. 333–334) discussion and diagram of their classification, it seems that the theoretical intention—despite the presence of “conceptually distinct” concepts—is for the classification to allow a holistic, com- plex description of serendipity, one where multiple concepts meld together. The findings of the present research imply the exact opposite: no overlap in concepts and a rigid, either/or description. Given the discrepancy between the study’s findings and the theoretical inten- tions of the Foster and Ford classification, the possibility that the classification was incorrectly applied to the study sample must be considered. Indeed, this possibility highlights another issue raised by the research: the question of the classification’s practical value (i.e. as opposed to its theoretical value). Put simply, the researcher found the classification difficult to apply in practice. Some specific semantic issues arose; for example, did the phrase ‘problem solution’ in the descriptions for Classes One and Two refer to an end (i.e. the solution to a problem), a means (i.e. the process of solving a problem) or both? More generally, a secure semantic understanding of the classification was hindered by inconsistencies between Foster and Ford’s (2003) narrative (p. 330, 332) and diagrammatic (p. 334, Fig. 1) presentations of the classification. Given these concerns, it is possible that semantic misunderstanding may have affected the results presented in Table 6.2 as well as the study’s other results related to the Foster and Ford classification (see section 6.3.1.3); however, regard- less of whether or not semantic misunderstanding occurred, the research finding related to the usability of the classification stands.

243 6. DESCRIPTIVE INFERENCES

6.3.1.3 Cross-typology patterning

Recall from section 2.3.2.3 that the three serendipity typologies studied were selec- ted, in part, because of their potential interrelations.18 The research investigated two such potential interrelations:

1. links between Campanario’s (1996) and Foster and Ford’s (2003) serendipity classifications; and,

2. links between all three selected serendipity typologies.

Campanario/Foster and Ford interrelations Table 6.3 presents the correlations between Campanario (1996) and Foster and Ford (2003). Given the total number of descriptions recorded when the classifications were considered separately (i.e. 27 for Campanario, Class Two and 26 for Foster and Ford, Class Two, see section 6.3.1.2), the count of 23 joint occurrences (see Table 6.3) indicates a strong interrelation between Campanario Class Two and Foster and Ford Class Two. As detailed in Appendix L, the results also provide some evidence for a weaker, either/or-type link between Campanario Class One and Foster and Ford Classes One and 4b.

Interrelations between all three typologies Given the three typologies, 170 distinct class/pattern combinations were pos- sible.19 34 (20%) of the 170 possible combinations were described by the study sample. Of the 34 described combinations, 17 (50%) occurred multiple times. Two points are notable from these results:

1. some experiences of serendipity are described frequently enough across the sample Citation Classics to be deemed representative examples or ‘types’; and,

18Recall also that Campanario (1996) claims an explicit interrelation between his serendipity classification and van Andel’s (1994) serendipity patterns. The present study investigated the question of unrecognised rather than recognised links. As such, the research did not focus on the interrelations between Campanario’s classification and van Andel’s patterns. 19The figure of 170 was calculated by multiplying the numbers of classes/patterns from each typology (i.e. 2 x 17 x 5=170).

244 6.3 Contextual, process-related elements of serendipity

Campanario Foster and Ford Number of Class Class Joint Occurrences

11 12

12 3

1 4a 1

1 4b 7

21 0

22 23

2 4a 0

2 4b 5

Table 6.3: Interrelations in the study sample between the Campanario and the Foster and Ford serendipity classifications - This table lists the number of times correlations occurred in the study sample between classes in Cam- panario’s (1996) and Foster and Ford’s (2003) serendipity classifications. Note that the table omits Foster and Ford, Class 3a, because this class was not described by the sample Citation Classics (see Table 6.2).

245 6. DESCRIPTIVE INFERENCES

2. the sample Citation Classics do not describe a wide-range of ‘types’ of serendipity.

6.3.2 Sociological elements as obstacles

6.3.2.1 Sociological filters in the literature

From its review of the literature,20 the research identified four sociological ele- ments with the potential to impact serendipity:

1. time;

2. money;

3. existing goals, priorities or plans; and

4. responsibility to a team or organisation.

Note that the research labelled these elements sociological filters. The literature referenced both the negative and positive effects of these elements; as such the elements did not necessarily act as ‘obstacles’.

6.3.2.2 Sociological filters in lived experience

Once sociological filters had been identified from the literature review, the re- search examined to what extent these filters featured in the lived serendipity experiences recounted in the Citation Classics. All four types of sociological fil- ters made an appearance in the study sample. Eight (16%, n = 50) of the sample Citation Classics recounted serendipity experiences impacted by sociological fil- ters. Four of these eight examples referenced multiple sociological filters: the study found three cases of mixed time/financial filters and one case where all four types of filter were involved. Given the multiple filter cases, a total of fourteen sociological filters were described across the study sample. The majority (thirteen) of these sociolo- gical filters were linked by the Citation Classics narrators to positive impacts on serendipity process. The impact of the remaining (one) filter was not discussed.

20See section 2.3.2.4.

246 6.3 Contextual, process-related elements of serendipity

Most (six out of eight) of the Citation Classics narrators who explicitly refer- enced sociological filters used the evaluation sections of their narratives to expand upon the effects of the sociological filters. For example, Poulik (1984) (see Gar- field, 2012) provided a description of the (positive) effect of the pressure of time:

This consumption of time was not conducive to a smooth course in either my medical school work or my marriage, even though my wife, Emily, was working with me as a research technician. Consequently, we both tried to devise a less time-consuming procedure.

As well, narrators across the sample Citation Classics occasionally used eval- uation sections of their narratives to reference the idea of sociological filters in a more general sense (i.e. rather than in reference to a specific serendipity ex- perience): discussions referred to matters such as overly-restrictive demands of research organisations or diminishing research budgets. Whereas the sociological filter examples found by the study (i.e. the thirteen examples that discussed specific impacts of filters) all explicitly linked filters to positive impacts, all of the general references across the study sample associated sociological filters with negative impacts. The research findings on sociological filters in the lived serendipity experiences recounted in the Citation Classics supported most of the literature’s theoretical propositions:21 all of the proposed sociological filters were relevant to the lived experiences; the filters did appear in combination, especially as the combined pressures of time and money; and, the filters were discussed both in terms of negative and positive effects. A detail related to this latter point was the one finding slightly at odds with the literature: while the literature makes some general references (i.e. not linked to specific lived experiences) to the positive impact of filters, such general references found in the Citation Classics focused exclusively on the negative impact of filters.

21See section 2.3.2.4.

247 6. DESCRIPTIVE INFERENCES

6.3.3 Retrospective interpretations and sensemaking

6.3.3.1 Intention versus realisation

Table 6.4 presents the intention/realisation patterns recorded across the study sample as well as their frequencies. The research found that specific intentions related to both outcome and method (68%, 34 out of n = 50) were most commonly reported.22 This result was not especially surprising given the formal nature of research designs in the scientific disciplines that dominated the sample Citation Classics (see section 6.3.3.4). More surprising was the number of narrators (12%, 6 out of n = 50) who reported no intentions in relation to both outcome and method.23 The literature documents a tendency for serendipity to be held up as an argument against too much control and rigidity in research (see Merton and Barber, 2004, Chapter 10). The study’s finding concerning the conscious lack of intentions in the serendipity experiences recounted in the Citation Classics provides some support for this argument; however, the finding also contradicts the study’s classification-related finding that ‘playing around’ is unusual in the serendipity experiences recounted in the Citation Classics (see section 6.3.1.2). This contradiction is discussed further in sections 6.5 and 7.3.5. The intention/realisation results also shed light on the degree and type of ‘un- expectedness’ in the serendipity described by the Citation Classics. 56% (28 out of n = 50) of the narrators described the process that actually unfolded during the course of their research as unintended, while only 16% (8 out of n = 50) repor- ted that process played out as intended. This finding supports Merton’s (2004) argument concerning the significance of the accident component of serendipity.24 Note that the nature (i.e. intended or unintended) of the actual process could not be determined in a significant number of examples (28%, 14 out of n = 50); how- ever, even given a hypothetical extreme case (i.e. if this 28% of the study sample had described intended process), unexpectedness in process still dominated.

22A specific intention on both outcome and method is recorded as SS in the first part of the patterns reported in Table 6.4. 23Note that ‘no intentions’ does not imply that a narrator’s intentions could not be determ- ined; rather, ‘no intentions’ means that a narrator explicitly referred to having no intentions. 24See section 1.3.3.

248 6.3 Contextual, process-related elements of serendipity

Pattern Number of Appearances

SS UU 15

SS IU 11

SS UI 8

NN XX 6

SX UX 4

XX XX 3

SV UU 2

SV UX 1

Legend

S=Specific V=Vague N=None I=as Intended U=Unintended X=not discussed/not applicable

Table 6.4: Intention versus realisation patterns across the study sample - This table lists the intention versus realisation patterns found across the sample Citation Classics. Each pattern’s frequency of appearance is also given. Note that the frequencies are listed from highest to lowest. The first group of letters in each pattern refers to reported intentions: letter one to intended aim/goal; and letter two to intended process/method. The second group of letters refers to reported realisation (i.e. what actually happened): letter three to actual outcome/result; and letter four to actual process/method. The legend gives the descriptions for the letter abbreviations.

249 6. DESCRIPTIVE INFERENCES

At 60% (30 out of n = 50), unexpectedness in outcome dominated the study sample to about the same degree as unexpectedness in process. Only 22% (11 out of n = 50) of the narrators explicitly reported an intended outcome. This finding concerning the dominance of unexpected outcomes was inconsistent with the study’s finding that the sought finding was as common as the unsought finding in the serendipity described by the Citation Classics (see section 6.3.1). Sections 6.5 and 7.3.5 make the point that inconsistencies between inferences are not necessarily problematic in mixed methods research.

6.3.3.2 Recognition

Across the sample Citation Classics, immediate (20%, 10 out of n = 50) and delayed (22%, 11 out of n = 50) recognition were about equal; however, cases not referring to (6%, 3 out of n = 50) or unclear about (52%, 26 out of n = 50) recognition outnumbered cases discussing recognition. These results suggest, not only that the recognition of serendipity does not factor strongly in the retrospect- ive interpretations presented in the Citation Classics, but also that, when it does factor, recognition occurs equally in both forms, immediate and delayed. Recall that existing studies provide some evidence that the use of the word ‘suddenly’ may indicate immediate recognition in serendipity.25 As a check on the present study’s weak findings on recognition, the research reexamined the sample Citation Classics using a keyword search on the stem sudden* : only three (6%, n = 50) narratives contained the word ‘suddenly’,26 and each of these stories used the term only once. If ‘suddenly’ really is a good recognition indicator, then the present study’s first result related to immediacy of recognition (i.e. 20%, n = 50) is potentially far too high.27

6.3.3.3 (Non)linearity

The majority of the serendipity experiences (68%, 34 out of n = 50) were repor- ted as linear interpretations (i.e. as a process occurring in steps with one thing

25See section 2.3.2.5. 26None contained the word ‘sudden’. 27The research instrument relied on a semantic reading of the Citation Classics texts. Such a semantic approach can never be as ‘objective’ as a keyword search.

250 6.3 Contextual, process-related elements of serendipity happening after another). 18% (9 out of n = 50) presented nonlinear interpreta- tions of serendipity experiences. The research was unable to determine the type of interpretation for the remaining 14% (7 out of n = 50). Recall that the normal personal narrative form potentially favours a linear re- porting of process.28 It is also worth considering the possibility that the Citation Classics authors, who are used to reporting research linearly as demanded by con- ventional reports (see Merton, 2004), may demonstrate an increased propensity to fall habitually into this familiar sequential style when discussing research-related matters. Given the issues of normal form and increased propensity, how indicative is the study’s result of a tendency towards linear interpretations of serendipity process? Significantly, a third of the sample Citation Classics did not present linear interpretations.29 The implication of this finding is that any bias that may have resulted from the normal form or reporting habit did not overwhelm the study sample. The research findings suggest that the theoretical literature’s focus on the nonlinearity of serendipity process30 may be somewhat misguided: an increased focus on linear aspects of serendipity may be worthwhile. Yes, the present study’s results provide evidence that serendipity can be interpreted as nonlinear process; however, the results also show that linear interpretations are possible and com- mon. In the Citation Classics, linearity in serendipity is certainly not the ‘myth’ proclaimed by Holder and McKinney (1993).

6.3.3.4 Multiple disciplinarity

The research investigated the prevalence of multiple disciplinarity in the study sample by considering both narrators’ references to multiple disciplinarity and the Citation Classics Current Contents metadata. To provide context for the discussion of multiple disciplinarity, the discipline spread of the study sample was also examined.31 28See sections 2.3.2.5 and 3.3.2. 29Granted, not all of these interpretations were explicitly nonlinear; however, all were some- thing other than linear. 30See section 2.3.2.5. 31See section 5.2.2.4.

251 6. DESCRIPTIVE INFERENCES

Prevalence 30% (15 out of n = 50) of the sample narratives’ authors indicated that multiple disciplinarity played a role in their reported serendipity experiences. 25% (12 out of n = 48)32 of the Citation Classics studied had metadata suggestive of multiple disciplinarity.33 Although slightly lower, the metadata result presents a similar picture to that illustrated by the result derived from the narrators’ descriptions: fewer than one third of the sample Citation Classics were indicative of multiple disciplinarity. As noted in section 2.3.2.5, there is a normal bias in the literature towards viewing multiple disciplinarity as beneficial to serendipity; some authors even translate this idea of beneficiality to one of essentiality, arguing that multiple disciplinarity ‘makes for serendipity’ (Merton, 2004) or is the ‘embodiment’ of serendipity (D¨ork et al., 2011). The present study found no strong evidence from the Citation Classics in support of the serendipity literature’s emphasis on multiple disciplinarity. The research findings suggest that multiple disciplinarity is present at only moderate levels in the serendipity described by the Citation Classics: multiple disciplinarity is relevant, but at the same time, neither dominant nor essential.

Discipline spread The journal title information and the Current Contents metadata gave near identical results concerning the broad discipline spread across the sample Citation Classics.34 The sciences clearly dominated both the social sciences and the arts and humanities: 90% (43 out of n = 48) of the journals titles and 88% (42 out of n = 48) of the metadata tags related to the sciences.

32Note that due to the presence of multiple serendipity narratives within in a single Citation Classic text, the study sample comprised forty-eight, rather than fifty, distinct Citation Classics (see section 4.2.6 and Appendix D). 33Recall from section 4.2.1 that the Current Contents metadata refers to the multiple discip- linarity of journals—not individual articles; this said, common sense suggests that publication in a multiple disciplinary journal suggests an increased likelihood that an article incorporates some aspect of multiple disciplinarity. 34See section 5.2.2.4.

252 6.3 Contextual, process-related elements of serendipity

Table 6.5 lists the study results related to the narrower subject areas described by the Current Contents metadata.35 Over half of the sample narratives described

Current Contents Metadata Tags Counts

Life sciences 19

Clinical medicine 11

Agriculture, biology and environmental sciences 7

Engineering, technology and applied sciences 4

Social and behavioural sciences 4

Arts and humanities 2

Physical, chemical and earth sciences 1

Table 6.5: Narrower discipline spread across the study sample based on Current Contents metadata - This table gives the spread of the Current Contents metadata tags across the forty-eight Citation Classics that made up the study sample. The metadata tags are listed beginning with the most frequently occurring . Note that when a Citation Classic is published in several Current Contents, each published version has different metadata (see section 4.2.1). The metadata tag attached to the randomly sampled Citation Classic selected for the study was the one counted. research from the life sciences and clinical practice. Recall from section 4.2.1 that the life sciences dominated the papers initially meriting Citation Classics; however, from 1979, the selection criteria changed to avoid bias against areas of study with traditionally smaller bodies of literature. Indeed, all (8%, 4 out of n = 48) of the sample Citation Classics from 1977 and 1978 were from the life sciences; nonetheless, this did not bias the results. Even if these 1977 and 1978 Citation Classics are disregarded, the conclusion concerning the dominance of the life sciences and clinical practice stands.

35See section 4.2.1 for an introduction to the Current Contents editions.

253 6. DESCRIPTIVE INFERENCES

The study’s attempt to class the sample Citation Classics by narrower subject area according to journal title proved problematic. Journal titles often seemed relevant to several fields; without additional information about a journal’s remit (i.e. information beyond the title), the researcher was essentially left ‘guessing’ at the most appropriate field class. Because of its unreliability, this set of res- ults based on journal title was not included in the research findings; however, Appendix M reproduces the set for readers who wish to review the results. The research cannot comment about either the broad or narrow discipline spread across the entire Citation Classics dataset; however, in terms of the Cita- tion Classics that discuss serendipity, the sciences—and more specifically, the life sciences and medicine—prevail, with the social sciences and the arts and human- ities less in evidence. This situation is mirrored to a large extent by the anecdotal serendipity liter- ature. Several collections of anecdotes focus on serendipity in science (Hannan, 2006; Roberts, 1989) or medicine (Austin, 2003). As well, many of the research anecdotes captured by Garfield’s (2004b) stem serendipit* HistCite analysis are from subject areas in the sciences or medicine. Nonetheless, anecdotes do exist, although in smaller numbers, from the social sciences and the arts and humanities (for example, see de Rond and Morley, 2008). Intriguing questions (beyond the remit of the present research) are raised by the study’s findings. Is serendipity really more common in the sciences? Why do narrower subject areas such as the life sciences and medicine dominate serendipity in the sciences? Does the nature of research in the sciences better accommodate serendipity? Or, does serendipity occur just as often in the social sciences and arts and humanities but under a different name, appearing perhaps as ‘creativity’ or ‘innovation’?36 36See section 1.3.4.

254 6.4 Phenomenological event structures of serendipity

6.4 Phenomenological event structures of serendipity

6.4.1 Event structure description

6.4.1.1 Visualisations

The research modelled a pair of event structures for each of the sample Citation Classics. The resultant one hundred network visualisations, along with their un- derlying matrices, are collected in a network portfolio presented as a supplement- ary volume to the main thesis volume. Only a summary, qualitative description of the network visualisations is provided here. The event structures described by the sample Citation Classics were varied: small and large, sparse and dense, and simple and complicated networks presen- ted. This said, certain features seemed37 to repeat across the collection. The paired models often ‘looked related’: they comprised the same sort of smaller components, such as patterns or clusters, and took similar overall shapes. As might be expected with ego networks—which tend to present as centred38— overall network shapes that followed a star-like pattern were common; however, in many cases, the star-like shape was far from perfect, perhaps hinting at the significance of roles played by ego’s alters in the serendipity experiences. Within model pairs, network model one was the less likely to take a star shape. More generally across the sample, model one seemed to display more variability in shape than model two. In most of the network visualisations, the egos tended to be centrally loc- ated. Again, given the star-like shape of many ego networks, this situation was not especially surprising; however, the situation did not appear to be restricted to ego—the nodes representing other people also tended to occupy the middle spaces of the visualisations. As well, the links between people often seemed the most intricate, going in both directions, knitting tightly together, and crossing

37The use of the word ‘seemed’ here is a conscious choice, one intended, not to indicate lack of commitment to a conclusion, but rather, to highlight the subjectivity inherently involved in the interpretation of network visualisations. 38See section 3.5.6.2.

255 6. DESCRIPTIVE INFERENCES frequently. These points suggested that, even if people did not dominate in terms of their numbers, people in general (i.e. not just ego) were central to the structure of the networks. As discussed in section 4.5.3, reliance on network visualisations alone is prob- lematic. The impressions gleaned from the network visualisations act as a use- ful starting point for understanding the complexity of the event structures of serendipity described by the Citation Classics; however, it is important to bear in mind that these impressions are only a starting point. The quantitatively- derived inferences reported in sections 6.4.1.2 to 6.4.3.3 serve to fill out and check the visually-based inferences. When possible, examples from the supplementary volume (i.e. network port- folio) are provided alongside the descriptive statistics presented in the thesis text in order to link the quantitative inferences to the network visualisations; for ex- ample, an estimate of central tendency is followed by (see Visualisations x1, x2, x3), where the visualisations listed illustrate networks representative of the es- timate. It is important to stress that the networks selected as examples are not necessarily the only networks illustrative of the estimate presented.

6.4.1.2 Magnitude

The magnitude of an event structure was measured by its order and size—its vertex and link count respectively.39 Figure 6.1 provides boxplots of the vertex and link counts for models one and two.40 Figure 6.2 presents histograms41 that show the distributions of the vertex and link count data. Table 6.6 summarises the estimates related to magnitude.

39See section 3.5.3 for an introduction to network order and size. All lines in the event structures were directed: in other words, links in the networks were either one- or two-way. Note that size accounts for two-way links: a bidirectional arc increases the size by two. 40Recall from section 4.6.1 that the study treated network counts as continuous data. 41Note that, throughout Chapter 6, histograms of study data are provided only in cases where, in an ideal situation, the data were expected to have identical distributions. Of course, this ideal situation rarely occurred; as such, the histograms presented are never exactly the same. Nonetheless, in many cases, the histograms are quite similar and their illustration allows the reader to see this similarity.

256 6.4 Phenomenological event structures of serendipity

Model One Model Two

Kappas (1987) ● 40 40 Kappas (1987) ● ● Kroto (1993)

30 30

20 20

Vertex Count Vertex 10 10

0 0

200 200 Kappas (1987) ●

Kroto (1993) ● 150 150 Gocke (1985) ● ● Kappas (1987) Harrison (1987) ● ● ● Kroto (1993) Smythe (1980) ● Smythe (1980) 100 ● Gocke (1985) 100

Link Count 50 50

0 0

Figure 6.1: Magnitude for models one and two - This figure presents boxplots of the data related to magnitude for event structure models one and two. Order is represented by vertex count, and size, by link count. The outlier Citation Classics from the study sample are labelled in grey on the diagrams. Note that the consistent scales of the y axes allow direct comparisons across the two models.

257 6. DESCRIPTIVE INFERENCES

Model One Model Two 15 15

10 10

5 5 Frequency

0 0 0 10 20 30 40 0 10 20 30 40 Vertex Count Vertex Count

25 25

20 20

15 15

10 10

Frequency 5 5

0 0 0 50 100 150 200 0 50 100 150 200 Link Count Link Count

Figure 6.2: Magnitude distribution for models one and two - This figure shows the distributions of the vertex and link counts for event structure models one and two. Note that identical bin choices and consistent y axes scales enable cross-model comparison.

258 6.4 Phenomenological event structures of serendipity 4 . 7 5 5 6 8 8 5 2 9 4 1 3 . 5 1. 2. 3. 2. 2. 6 0.44 0.61 2 . 5 4 . Model 7 6 4 4 0 EVENT 5 7 8 6 1 3 5 . . 24 2 4 8 7. 4 3. CONTEXTS 2422 23 21 6 2. 3. 3. 3. 2. 5 One Two 0.67 0.61 4 2 . 5 1 2 18 19 . . 0 0 4 7 4 1 1 9 0 9 7 5 5 6 0 1 5 15 16 06 09 . . . . 0 0 0 0 0.21 0.43 0.37 0.11 0.06 0.08 0.08 0.07 0.07 0.57 0.52 1 2 11 14 . . 0 0 estimates, and the unitless estimate, n 3 2 Q Model 15 17 . . 0 0 DENSITY 9 3 7 7 6 7 0 9 7 0 7 0 9 4 0 and n 13 15 05 10 . . . . S One Two 0 0 0 0 0.19 0.33 0.28 0.08 0.04 0.07 0.06 0.06 0.05 0.50 0.44 1 4 12 13 . . 0 0 6 . 50 34 1 4 8 3 0 6 3 7 5 9 5 . 31 45. 30. 23. 37. 20. 20. 39 0.74 0.94 2 . 29 9 . Model LINKS 3946 24 4 9 - This table summarises the magnitude, cohesion and event context data. 8 8 2 4 3 6 5 9 . . 6 3 17 15. 15 15. 163157 191 188 One Two 43. 26. 22. 31. 20. 18. 32 37 0.68 0.82 0 . 23 29 2 . 5 . 18 ) forms—Rousseeuw and Croux’s (1993) 5 2 n 17 1 8 3 5 5 7 6 2 2 . . 4. 6. 7. 7. 6. 15 16 0.44 0.47 9 . 13 6 . Model 18 18 12 7 2 VERTICES 5 5 7 3 2 2 . 6 4 9 8. 16 1120 11 3832 19. 41 37 4. 6. 7. 7. 6. One Two 16 0.41 0.44 13 5 . 14 n n n S Q SD Min IQR Max Med RSD the relative median deviationknown (RMD). The as mean, the standardprovided deviation coefficient in (SD), of and grey variation)—all relativenon-robust below standard non-robust measures deviation the measures (RSD) differ equivalent (also not forBCa robust confidence some used measure interval estimates. was by for unstable the The for comparison the median study—are purposes. model and two interspersed event mean Note context in are count how the data, displayed markedly the with list, Percentile the confidence interval is robust intervals. listed instead. and As the Table 6.6: ComplexityThe data five-number summary boxplot summaryinterquartile (min, range. 1st The quartile,(MAD) remaining median, and statistics normalised 3rd are (MAD quartile, measures max) of is scale: listed, the followed median by absolute the deviation—in range non-normalised and Mean RMD MAD 25%Q 75%Q Range MAD

259 6. DESCRIPTIVE INFERENCES

Central tendency Due to the skewed distributions of the data, the median was selected as the 42 estimate of central tendency. The estimated median vertex counts were 13 16 18 for model one (see Visualisations 17.1, 19.1, 20.1) and 12 15 17.5 for model two (see

Visualisations 3.2, 40.2, 43.2). The estimated median link counts were 23 32.5 39 for model one (see Visualisations 19.1, 21.1, 43.1) and 24 31 34 for model two (see Visualisations 14.2, 15.2, 17.2). The median estimates for both models suggested a rough 1 : 2 ratio for node count to link count. Note that the medians and their confidence intervals are presented in Louis and Zeger’s (2007, p. 1) graphical format, which serves to “visually connect an estimate with its uncertainty”. The results incorporate bootstrapped, asymmet- rical 95% confidence intervals.43 The lower bound interval is in subscript to the left of the estimate and the upper bound, to the right. This graphical approach is adopted throughout the thesis. More generally, the thesis text follows the format recommended by Kozak and Krzanowski (2010) for the effective visual presentation of major/minor digits of data in statistical reporting: minor digits are italicised and typeset in a smaller font. Such an approach emphasises the major digits, “those governing the major variation in the data” (Kozak and Krzanowski, 2010, p. 41), yet retains some minor digit information to allow the presentation of more exact values. Depending on the measure/statistic, numbers in the thesis include two or three major digits and up to two minor digits. Major digits can be zeros. Numbers are rounded to the last minor digit. For consistency, zeros may act as as place holders. The major/minor format is not used for p values or in diagrams.

Variability The study also sought to describe variability for the magnitudes of the event structures. While the boxplots together with their five-number summaries gave

42As shall be discussed later, the removal of outliers through the use of the trimmed mean was not desirable given the nature of the study data and the motivations of the research. 43Bootstrapping was carried out using the boot package (Canty and Ripley, 2012) for R. The bootstrapped confidence interval was based on 10 000 replicates. The adjusted bootstrap percentile (BCa) interval was selected. See DiCiccio and Efron (1996) for further information on bootstrap confidence intervals.

260 6.4 Phenomenological event structures of serendipity reasonable pictures of data dispersions, single-value measures of scale were also desired. Given the skewed distributions of the data, robust measures of scale were preferred. The study employed the median absolute deviation (MAD)44 and Rousseeuw 45 and Croux’s (1993) estimators, Sn and Qn. The relative median deviation (RMD),46 was also calculated to allow the comparison of data measured in differ- ent units.47 Only relative median deviations are reported in the text; the other estimates of scale are summarised in tables. The relative median deviations of the node counts for model one (RMD = 0.417 ) and model two (RMD = 0.445 ) suggested similar variability. For the link data, model two (RMD = 0.741 ) was slightly more dispersed than model one (RMD = 0.684 ).48 The relative median deviations indicate that variability was greater in the link count data than in the node count data.

Consistency checks Recall from section 5.9.2 that the study employed both correlation and hypo- thesis testing to check the magnitude results derived from the two event struc- ture models. This check was important given the different relational text analysis methods underlying the models. Because a pair of event structures modelled the same raw data—a single Citation Classic—vertex and link counts were expected to remain roughly consistent across both models.

44Two versions of MAD were calculated using the mad function in R: normalised and non- normalised. The non-normalised version (constant = 1) matches the expectation that the MAD should be approximately one half the value of the interquartile range. The normalised version (constant = 1.4826) serves to make the MAD consistent with the standard deviation parameter at the normal distribution. 45 Sn and Qn were calculated in R using the robustbase package (Rousseeuw, Croux, Todorov, Ruckstuhl, Salibian-Barrera, Verbeke, Koller and Maechler, 2012). 46RMD was calculated as the normalised MAD divided by the median. 47While the MAD has a high breakdown point and a bounded influence function, it is still reliant on a measure of central location (i.e. the median), and, therefore, less appropriate for asymmetric distributions; nonetheless, the MAD may be more familiar to readers than Sn and Qn. As well, the MAD was required for the calculation of the RMD, which was useful for comparison purposes. Because neither Sn nor Qn depends on a measure of location, the estimators are especially suitable for asymmetric distributions. 48 Note that while the Qn estimate agreed with this finding, the Sn estimate was identical for both models. See Table 6.6.

261 6. DESCRIPTIVE INFERENCES

Figure 6.3 presents scatterplots for the vertex and link counts of models one and two. Kendall’s Tau49 indicated a strong correlation for both the vertex

counts (τB = 0.732 ,Z = 7.258 , p < 0.001) and the link counts (τB = 0.743 ,Z = 7.527 , p < 0.001).50 As well, neither the Wilcoxon Signed Rank Test for the vertex counts (Z = 1.360 , p = 0.176) nor for the link counts (Z = −0.304 , p = 0.765) rejected the null hypothesis.51 While these Wilcoxon results could not prove that the two sets of data were ‘the same’, they did indicate that, on this statistical test, the model one and model two data were not significantly different.

Inference evaluation The research chose not to discard outliers (for example, by using the trimmed mean), opting instead to use robust measures. Outliers were identified and recor- ded, thus making them available to other researchers for further investigation. Sample Citation Classics with extreme scores on the complexity measures might represent unusual examples of serendipity; although beyond the remit of the present study, further—and, likely, qualitative—investigation of these examples could shed light on why they are unusual. It is notable that two Citation Classics, Kappas (1987) and Kroto (1993), presented as outliers in both the vertex count and link count data. While the research did not aim to explore the reasons behind outliers, it is worth pointing out that Kroto (1993) was the only narrative in the study sample to stretch to two pages. It is possible that, because Kroto (1993) was longer, it simply had the ‘space’ (i.e. in terms of word count) to incorporate more events. Indeed, the finding that Kroto (1993) appeared as the only outlier for the event context count data (see section 6.4.1.4) adds weight to this argument.

49Due to the ties in the data, Kendall’s tau-b was calculated. 50 Note that the p values presented for τB are not exact. In particular, very small p values are simply given as p < 0.001. 51Unlike with Kendall’s Tau, the Wilcoxon Signed Rank Test p values are exact.

262 6.4 Phenomenological event structures of serendipity

Magnitude Correlation

40 ●

30 ● ● ● ●● ● ● ● ● ● ● 20 ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● 10 ●● ● τ = 0.732 ● B ●● ● ● p < 0.001 ● Vertex Count Model Two Vertex 0

0 10 20 30 40 Vertex Count Model One

200 ●

150

● ● ● ● 100

● ● ●● ● ● τ = ●● B 0.743 50 ● ●● ●● ● ● < ●● ●● ● ● p 0.001 ●● ●●●● ● ● Link Count Model Two ● ●● ●●● ●●● 0 ●

0 50 100 150 200 Link Count Model One

Figure 6.3: Magnitude correlation for models one and two - This figure shows scatterplots of the vertex and link count correlations for models one and two. As indicated by the τB coefficients, both sets of counts were strongly correlated.

263 6. DESCRIPTIVE INFERENCES

6.4.1.3 Cohesion

The structural cohesion, or ‘interconnectedness’, of an event structure was meas- ured by its density.52 It is important to be aware that this measure of cohesion is inherently tied to magnitude:53 as a ratio of actual links to possible links, density is affected by both the number of vertices and the number of links in a network. Figure 6.4 provides boxplots of the density measurements for models one and two. The histograms presented in Figure 6.5 illustrate the distributions of the density measurements for models one and two. The descriptive statistics related to cohesion are summarised in Table 6.6.

Model One Model Two

Gocke (1985) ● ● Myers (1993) 0.4 0.4 Gocke (1985) ● Levy (1981) 0.3 0.3

0.2 0.2 Density

0.1 0.1

0.0 0.0

Figure 6.4: Cohesion for models one and two - This figure presents boxplots of the density measurement data for event structure models one and two. The outlier Citation Classics from the study sample are labelled in grey on the diagrams. Note that the consistent scales of the y axes allow direct comparisons across the two models.

52See section 3.5.3 for the formula for density in directed networks. 53Alternative measures of cohesion, including average degree and clustering coefficient, are presented in the literature as preferable to density (de Nooy et al., 2005; Scott, 2000). While these measures claim not to be tied to magnitude, they are problematic in their own ways and were not deemed suitable for use in the study. Section 7.2.4 considers this point further.

264 6.4 Phenomenological event structures of serendipity

Model One Model Two 15 15

10 10

5 5 Frequency

0 0 0.0 0.2 0.4 0.0 0.2 0.4 Density Density

Figure 6.5: Cohesion distribution for models one and two - This figure shows the distributions of the density measurements for event structure models one and two. Note that identical bin choices and consistent y axes scales enable cross-model comparison.

Central tendency and variability Because the density data were not normal, the median was again the focus for the measures of location and dispersion. The estimated median densit- ies were 0.121 0.139 0.153 for model one (see Visualisations 27.1, 34.1, 41.1) and

0.111 0.154 0.181 for model two (see Visualisations 14.2, 30.3, 31.2). Density in model two (RMD = 0.576 ) was slightly more variable than in model one (RMD = 0.504 ). In terms of variability, cohesion sat within magnitude: the density meas- urements were more dispersed than the node counts but less dispersed than the link counts.

Consistency As with magnitude, the cohesion results were checked for consistency across the event structure pairs. Figure 6.6 provides a scatterplot of the density measure- ments for event structure models one and two. Kendall’s Tau showed a correla- tion (τB = 0.543 ,Z = 5.548 , p < 0.001) between the density data. The Wilcoxon Signed Rank Test result for the density measurements (Z = −1.400 , p = 0.164) supported the null hypothesis of no significant difference.

265 6. DESCRIPTIVE INFERENCES

Cohesion Correlation

● ● 0.4

● 0.3 ● ● ●

● ● ● ● ● ● ● ● ● ● 0.2 ● ● ● ● ● ● ● ● ● ● ● ●● ●

Density Model Two ● ● ● ●●● ● ● τ = 0.543 0.1 ●● ● B ● ● ●● ● ● ● ● ● p < 0.001

0.0

0.0 0.1 0.2 0.3 0.4 Density Model One

Figure 6.6: Cohesion correlation for models one and two - This figure presents a scatterplot of the density measurements for event structure models one and two. As the τB coefficient indicates, the density measurements were correlated.

266 6.4 Phenomenological event structures of serendipity

Inference evaluation Although density is tied to magnitude—and thus, in practice, is rarely compar- able across multiple real-world networks—some normal expectations concerning the measure are noted in the literature. Real-world, sociological networks are generally considered sparse networks (de Nooy et al., 2005; Faust, 2010; New- man, 2010):54 the number of relations that network participants can maintain is thought to be limited by sociological context. But what is an indicative density value for such a real-world ‘sparse’ network? In her empirical study, Faust (2010, p. 224, Table 2) found a mean density of 0.30 (SD 0.19) and a median density of 0.29 in a sample of 159 social networks of different types and magnitudes. Median density in Faust’s (2010) sample is noticeably higher than median density in the present study’s sample. As well, with its relative standard deviation of 0.63, Faust’s density data seems to display higher variability than the present study’s density data.55 These differences sug- gest that the density measured on the serendipity event structures tends towards the lower end of the values normally expected for real-world sociological networks. The outliers in the cohesion data differed from the outliers found in the mag- nitude data: only one overlap, Gocke (1985), between the cohesion and magnitude outliers occurred. As a measure, density is biased in that it is inversely related to network order: as the number of vertices in a network increases, density decreases. Could the bias inherent in density have caused the outliers? The vertex counts for the outliers Gocke (1985) and Myers (1993) fell between the first and third quartiles of the data, suggesting that they were neither unusu- ally large or small in terms of vertex count. The remaining outlier, Levy (1981) Model One, which like the other two cohesion outliers, had an unusually high density, sat at the low end of the vertex count data: all things being equal, given the bias inherent in the density measure, the low vertex count of Levy (1981) could imply a high density. However, five other model one event structures had vertex counts equal to or lower than Levy (1981) and none of these five were outliers on density.

54In fact, Newman (2010) argues that most networks, with the exception of food webs, are sparse networks. 55Note that only relative standard deviations were available for comparison.

267 6. DESCRIPTIVE INFERENCES

Given these points, the cohesion outliers should be considered ‘true’ outliers and not the result of density bias: there is likely something unusual about the cohesion of these outlier event structures compared with the rest of the study sample. As with the magnitude outliers, the reasons for the cohesion outliers might be ascertained through qualitative case studies of the relevant Citation Classics, investigations beyond the remit of the present study.

6.4.1.4 Event contexts

In the research, the term event context was applied to the event structures to label the context created by the combined circumstances of time and location; for example, ‘yesterday in the lab’, ‘today in the lab’ and ‘today in the coffee shop’ constitute three different event contexts.56 Figure 6.7 presents boxplots of the event context count data for event structure models one and two. The distributions of the event context count data are shown as histograms in Figure 6.8. Table 6.6 provides a summary of the estimates related to the event context count data.

Central tendency and variability As with the other complexity data, the highly skewed distributions of the event context count data indicated the need for robust measures of location and dis- persion. The estimated median event context counts were 4 5.5 6 for model one

(see Visualisations 7.1, 11.1, 41.1) and 4 5 6 for model two (see Visualisations 1.2, 6.2, 14.2). The relative median deviations indicated that event context counts in model one (RMD = 0.674 ) were more variable than in model two (RMD = 0.445 ).57

Consistency Figure 6.9 provides a scatterplot of the event context count data for event structures models one and two. Kendall’s Tau suggested a strong correlation

(τB = 0.895 ,Z = 8.422 , p < 0.001) between the data. The Wilcoxon Signed

56See sections 2.3.2.2 and 4.8.3 for more detailed discussions. 57 Note, however, that Qn disagreed, producing an identical estimate for models one and two. See Table 6.6.

268 6.4 Phenomenological event structures of serendipity

Model One Model Two

25 25 Kroto (1993) ● Kroto (1993) ● 20 20

15 15

10 10

5 5 Event Context Count Context Event 0 0

Figure 6.7: Event context counts for models one and two - This figure provides boxplots of the event context count data for event structure models one and two. The outlier sample Citation Classics are labelled in grey. Note that the consistent scales of the y axes allow direct comparisons of the boxplots.

269 6. DESCRIPTIVE INFERENCES

Model One Model Two 35 35

30 30

25 25

20 20

15 15 Frequency 10 10

5 5

0 0

0 5 10 15 20 25 0 5 10 15 20 25 Event Context Count Event Context Count

Figure 6.8: Event context count distributions for models one and two - This figure presents histograms of the event context count data distributions for event structure models one and two. Note that identical bin choices and consistent y axes scales enable cross-model comparison. Both distributions are highly right- skewed and show strong peaks for event counts centring on five (i.e. bin 2).

270 6.4 Phenomenological event structures of serendipity

Rank Test result for the event context counts (Z = −0.047 , p = 1.000) supported the null hypothesis of no significant difference.

Event Context Correlation

25

20

15

● ● ● ● ● ● 10 ● ● ● ● ● τB = 0.895 ● ● ● ● p < 0.001 5 ● ● ● ● ● ● ● ● Event Context Count Model Two Event ● ●

0

0 5 10 15 20 25 Event Context Count Model One

Figure 6.9: Event context correlation for models one and two - This figure presents a scatterplot of the event context count data for event structure models one and two. As the τB coefficient indicates, the event context counts were strongly correlated.

Inference evaluation Kroto (1993) was consistently the only outlier in the event context data. This particular sample Citation Classic has already been discussed in the context of

271 6. DESCRIPTIVE INFERENCES

outliers. The same points made in those discussions stand here. The research findings indicate that multiple event contexts come into play in the serendipity described by the Citation Classics. Even at a minimum, the narratives describe two different times and/or locations; the majority of the nar- ratives describe at least double this number. Merton’s (1948) conceptualisation of the serendipity pattern requires serendip- ity to play out in a richer context.58 The study results suggest that the serendipity patterns of the Citation Classics do indeed display a richness in terms of temporal and locational contexts.

6.4.1.5 Participant attributes

Along with magnitude, cohesion and event contexts, participant attributes factor in the complexity of an event structure.59 The literature recognises three broad types of participants in serendipity process: people, things and information.60 The present study sought to examine the distribution of participant attributes in the modelled serendipity event structures. The attribute type of each parti- cipant (i.e. vertex) in an event structure was coded as ‘person’, ‘knowledge’ or ‘object’. Figure 6.10 provides boxplots of the attribute data recorded for event structure models one and two. Figure 6.11 shows the attribute data as histo- grams. Table 6.7 presents descriptive statistics for the attribute data.

Attribute prevalence Not all of the attribute data followed a normal distribution; as such, the median was selected as the most appropriate measure of central tendency. Figure 6.12 provides a bar chart that illustrates the median attribute percentages for event structure models one and two. The median estimates and confidence intervals of the attribute data for both event structure models are listed in Table 6.7. 58See sections 1.3.1 and 2.2.1.2. 59See section 2.3.3.5. 60See section 2.3.3.2.

272 6.4 Phenomenological event structures of serendipity

Model One Model Two

100 100

80 80 Pedersen (1985) ● Kanner('79),Myers('93) 60 60 ● ● Gocke('85),Sussman('90)

40 40

20 20 Person % of Total

● Hodge (1979) 0 0

100 100

Weissman&Koe3('90) ● 80 ● Buzbee (1992) 80

60 60

40 40

20 20 ● Giblett (1982) Knowledge % of Total Knowledge 0 0

100 100

80 80

60 60

40 40

20 20 Object % of Total

0 0

Figure 6.10: Attribute data for models one and two - This figure shows boxplots of the attribute data for event structure models one and two. The data are presented as percentages of totals (i.e. in the format, X% of the nodes in the event structure have the attribute type ‘Y’). The outlier sample Citation Classics are labelled in grey. Note that the consistent scales of the y axes permit direct comparisons across the six boxplots.

273 6. DESCRIPTIVE INFERENCES

Model One Model Two 25 25 20 20 15 15 10 10

Frequency 5 5 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Person % of Total Person % of Total

25 25 20 20 15 15 10 10

Frequency 5 5 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Knowledge % of Total Knowledge % of Total

25 25 20 20 15 15 10 10

Frequency 5 5 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Object % of Total Object % of Total

Figure 6.11: Attribute data distributions for models one and two - This figure shows histograms of the attribute data distributions for event structure mod- els one and two. The data are percentages of the total event structure. The con- sistent scales of the axes allow comparisons of the histograms.

274 6.4 Phenomenological event structures of serendipity ) n 4 9 . . 30 29 1 2 2 1 3 3 9 9 6 9 8 2 3 1 . . 12. 37. 24. 11. 17. 16. 18. 17. 26 25 0.67 0.67 9 3 . . 19 20 5 0 . . Model 24 25 OBJECT 8 3 6 6 2 7 9 6 7 3 7 5 . . % OF TOTAL 0 0 6060 64. 64. 7. One Two 19 20 30. 22. 11. 17. 15. 16. 16. 0.89 0.76 4 1 . . 13 16 7 3 . . 41 45 4 6 1 8 7 0 9 9 1 4 0 0 8 7 . . 75 7. 17. 31. 38 41 48. 57. 17. 11. 15. 13. 13. 0.29 0.36 3 7 . . 33 36 0 5 . . Model 50 50 4 0 4 4 4 8 4 0 4 4 6 8 9 5 7 . . % OF TOTAL KNOWLEDGE 7. One Two 15. 39. 54. 81. 66. 15. 11. 14. 12. 13. 46 46 0.24 0.31 3 4 . . 43 42 3 0 . . 33 37 2 2 6 1 3 2 5 9 3 9 1 4 1 3 8 . . 9. 5. 8. 9. 9. - This table provides descriptive statistics for participant (i.e. vertex) attribute 28. 58. 49. 10. 10. 33 33 0.24 0.32 6 8 . . 28 30 7 4 . . Model 34 36 PERSON 5 3 6 7 5 4 5 0 9 5 6 0 1 4 . . % OF TOTAL 40 39. 7. 8. One Two 25. 66. 59. 14. 12. 12. 11. 11. 31 33 0.39 0.36 9 6 . . 28 29 estimates, and the relative median deviation (RMD). The mean, standard deviation (SD), and n n n n Q S Q SD Min IQR Med Max RSD Mean RMD MAD 25%Q 75%Q Range MAD and n S type measured as aquartile, median, percentage 3rd of quartile, thestatistics max) event are are structure listed measures first, total of followed attributes. by scale: range The and the five-number interquartile boxplot median range estimates. summaries absolute (min, The deviation—in remaining 1st non-normalised (MAD) and normalised (MAD Table 6.7: Attribute data summary forms— relative standard deviationpurposes. (RSD) Confidence were intervals are not provided used for both by the the median study; and the however, mean. they are provided in grey for comparison

275 6. DESCRIPTIVE INFERENCES

Median Attribute Percentages 50

Model One Model Two 40

30

20 Median Percentage

10

0 Persons Knowledge Objects Attribute Type

Figure 6.12: Median attribute percentages for models one and two - This bar chart shows the median percentages of the ‘person’, ‘knowledge’ and ‘ob- ject’ attributes for event structure models one and two. The bootstrap confidence intervals for the medians are marked. See Table 6.7 for the values of the intervals.

Given the two modelling approaches used by the study, the research expected the results to show that the knowledge attribute was more prevalent in the model one event structures.61 The higher median estimate for model one and the lack of overlap between the confidence intervals for the knowledge attribute data for models one and two seen in Figure 6.12 seemed to support this expectation.

61A higher percentage of knowledge nodes was expected in model one because of the number of semantic triplets with ‘experiment’ or ‘result’ in the subject or object position: both the labels ‘experiment’ and ‘result’ were attribute-coded as ‘knowledge’. Section 7.2.2 considers this point further in the context of the research credibility audit.

276 6.4 Phenomenological event structures of serendipity

A Wilcoxon Signed Rank Test was employed to test the one-tailed hypothesis that knowledge formed a higher percentage of the model one event structures. The test result (Z = 3.524 , p < 0.001) indicated a significant difference: the percentage of nodes coded as ‘knowledge’ in the model one event structures was greater than the percentage coded in model two.

Attribute distribution Figure 6.13 provides horizontal stacked bar charts of the attribute distributions for event structure models one and two. From the attribute distributions in both model one and two, it is clear that, taken together, people and knowledge dom- inate the event structures. This finding agrees with the serendipity literature’s focus on people and information.62

Attribute Distribution

Model One

Persons Knowledge Objects

Model Two

0 20 40 60 80 100 % of Model

Figure 6.13: Attribute distributions for models one and two - This hori- zontal stacked bar shows the distribution of the ‘person’, ‘knowledge’ and ‘object’ attributes (as a percentage of event structure total) for event structure models one and two. The distributions do not sum to the full 100% because median rather than mean percentages are illustrated.

Nonetheless, objects do make up a noticeable part of the distributions il- lustrated in Figure 6.13. This said, the dispersion statistics listed in Table 6.7

62See section 2.3.3.2.

277 6. DESCRIPTIVE INFERENCES

suggest that objects present in the event structures with the greatest variability; the attributes of ‘person’ and ‘knowledge’ display less dispersion. As discussed in section 2.3.3.2, objects in serendipity process generally receive less attention in the literature. The study result related to the ‘object’ attribute raises two points for consideration:

1. objects may receive the least attention in the literature because objects are both the least prevalent and the least consistent in serendipity process; and,

2. objects may factor strongly enough in serendipity process to merit greater attention from serendipity research—certainly in the case of the Citation Classics, serendipity is not only about people and information.

6.4.2 Relational aspects of process

6.4.2.1 Ego’s load

The research investigated load—process interaction—in the event structures.63 Ego’s load refers to ego’s process interaction with alters. The study addressed three questions focused on ego’s load:

1. How much load does ego carry?

2. How does ego’s load compare in terms of pull and push? and,

3. How does ego’s load compare to average load in the event structure?

Question one required absolute measures of load. Questions two and three drew on normalised measures of load.64 In all cases, ‘ego’ referred to the sample Citation Classics narrator.

How much load? The study measured ego’s pulled (i.e. outgoing) and pushed (i.e. incoming) load. The study also calculated two estimates of ego’s total load; both estimates combined data from models one and two.65 63See section 2.3.3.5 for a discussion of the concept of load. 64See section 3.5.3 for a discussion of absolute versus normalised network measures. 65See section 5.7.1.

278 6.4 Phenomenological event structures of serendipity

Pulled/pushed load: Boxplots of the data related to ego’s absolute pulled and pushed load are provided in Figure 6.14. Table 6.8 lists descriptive statistics for the data.

Pulled Pushed

40 Kappas (1987) ● 40

30 30

● Kroto (1993)

20 20

Kroto (1993) ●

Ego's Absolute Load Ego's 10 10 Cohn('89),Gocke('85), ● Kappas('87)

0 0

Figure 6.14: Ego’s absolute pulled/pushed load - These boxplots present the data for ego’s absolute pulled and pushed loads. The scales of the y axes permit comparison of the boxplots.

As the data were not normally distributed, robust measures of location and scale were preferred. The median estimate was 9.5 11.5 12.5 for ego’s pulled load

(see Visualisations 25.1, 38.1, 41.1) and 2 3 3 for ego’s pushed load (see Visualisa- tions 5.2, 7.2, 23.2). The relative median deviations indicated that pushed load (RMD = 0.741 ) was considerably more variable than pulled load (RMD = 0.451 ). These results provide evidence, not only that the narrators of the sample Citation Classics carry more pulled than pushed load, but also that they carry pulled load with more consistency than pushed load. The results also suggest that, while narrators must have some pulled load, having pushed load is not essential: zero values were recorded only for ego’s pushed load.

279 6. DESCRIPTIVE INFERENCES

PULLED PUSHED TOTAL TOTAL ESTIMATE ONE ESTIMATE TWO

Min 3 0 3 2.5 25%Q 8 1.3 9 8.3

Med 9.5 11.5 12.5 2 3 3 12 15 16 10.3 12.8 13.5

Mean 10.6 12.5 14.3 2.6 3.3 4.1 13.3 15.8 18.3 11.2 13.3 15.4 75%Q 15 4 19.5 15 Max 39 12 48 38 Range 36 12 45 35 IQR 7 2.8 10.5 6.8 MAD 3.5 1.5 5.5 3.5

MADn 5.2 2.2 8.2 5.2 SD 6.6 2.7 8.8 7.3

Sn 4.8 2.4 7.2 5.4

Qn 6.2 2.1 8.3 6.2 RMD 0.451 0.741 0.544 0.407 RSD 0.530 0.811 0.559 0.551

Table 6.8: Ego’s absolute load data summary - This table summarises the data related to ego’s absolute pulled, pushed and total load. The five-number box- plot summary (min, 1st quartile, median, 3rd quartile, max) is listed first, followed by the range and interquartile range. The remaining statistics are measures of scale: the two forms of the median absolute deviation, the Sn and Qn estimates, and the relative median deviation (RMD). Non-robust measures, the mean, stand- ard deviation (SD), and relative standard deviation (RSD), are interspersed in the list, provided in grey below the equivalent robust measure for comparison purposes. The median and mean are displayed with confidence intervals.

280 6.4 Phenomenological event structures of serendipity

Total load: The study also considered ego’s absolute total load. Ego’s total load was estimated from measured data.66 The data for the two estimates of ego’s total load are presented as boxplots in Figure 6.15 and as histograms in Figure 6.16. The descriptive statistics for the estimated data are provided in Table 6.8.

Estimate One Estimate Two

50 50 Kappas (1987) ●

40 ● 40 Kroto (1993) Kroto (1993) ● ● Kappas (1987)

30 30 Smythe (1980) ●

20 20

10 10 Ego's Absolute Total Load Absolute Total Ego's 0 0

Figure 6.15: Estimates of ego’s absolute total load - These two boxplots illustrate the estimated data for ego’s absolute total load. The estimation proced- ures that produced these data are outlined in Appendix I. Note that the identical scales of the y axes allow the boxplots to be compared.

Again, robust measures were used. The median estimate was 12 15 16 for the 67 first estimate of ego’s total load and 10.3 12.8 13.5 for the second estimate. Only egos with a total load of 12 or 13 fell within the bounds of both median estimates. Six model one and five model two event structures had an ego with a load of 12 or 13; however, none of these model one and model two event structures were pairs (i.e. none described the same sample Citation Classic). The estimate one data (RMD = 0.544 ) were more variable than the estimate two data (RMD = 0.407 ).

66See Appendix I for explanations of the estimation procedures. 67Note that examples from the network portfolio are not provided for estimated data.

281 6. DESCRIPTIVE INFERENCES

Estimate One Estimate Two 20 20

15 15

10 10 Frequency 5 5

0 0 0 10 20 30 40 50 0 10 20 30 40 50

Ego's Absolute Total Load Ego's Absolute Total Load

Figure 6.16: Distributions of ego’s estimated absolute total load - These histograms illustrate the distributions of the two estimates of ego’s absolute total load. The identical scales of the histograms allow comparison. Note that the distributions are not the same.

282 6.4 Phenomenological event structures of serendipity

By using data derived from both models to estimate ego’s absolute total load, the research had hoped to overcome a problem anticipated with the event struc- ture models: more accurate models of ego’s pulled and pushed load would result in less accurate models of ego’s total load. Ego’s total load was expected to differ in models one and two, with model two having the higher values.68 The aim was for the estimated data to temper this difference and bring the total load measures to a consistent centre; however, the results of the statistical analysis—the limited overlap in the median estimates and the different degrees of dispersion—suggested some difference between the two estimates of ego’s total load. As the analysis had aimed for no difference, the research needed to examine the difference in the estimated data in order to determine, which, if either, of the two estimates of ego’s absolute total load were useful for inference making. Figure 6.17 presents a scatterplot of the estimate one and estimate two data.

Kendall’s Tau indicated a strong correlation (τB = 0.865 ,Z = 8.630 , p < 0.001) between the estimates, suggesting that at least this aspect of consistency was not a concern; however, a one-tailed Wilcoxon Signed Rank Test indicated that the median for estimate one was significantly higher (Z = 5.836 , p < 0.001)69 than the median for estimate two. Indeed, further investigation showed that, while estimate two sat between the measured ego total load data for models one and two,70 estimate one of ego’s total load showed no significant difference (Z = −0.718 , p = 0.502) from ego’s total load measured on model two.71 Given these results, was either estimate useful for inference making? The answer depended on whether an average or maximum value for ego’s total load was desired. As only estimate two met the study’s stated aim of bringing the ego total load measures to a centre, only estimate two was incorporated into the

68Indeed, a one-tailed Wilcoxon Signed Rank Test conducted on the measured ego total load data for models one and two showed that the median of the model two data was significantly higher (Z = 5.836 , p < 0.001). 69Note the similarity between this Wilcoxon result and the Wilcoxon result related to the measured ego total load (see footnote directly above). In actuality, although the Z scores were identical, the exact p values calculated did differ slightly. This cannot be discerned from the result due to the fact that very low p values are presented simply as p < 0.001. 70This point, established from a collection of one-tailed Wilcoxon Signed Rank Tests (the results of which are not reproduced here) was not surprising given that estimate two was an average of the ego total load measures made on model one and model two. 71This result derived from a two-tailed Wilcoxon Signed Rank Test.

283 6. DESCRIPTIVE INFERENCES

Correlation of Ego Absolute Total Load Estimates

50

40 ●

30 ●

● ● ●

20 ● ●

Estimate Two ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● τ = 0.865 10 ● B ● ● ● ● p < 0.001 ● ● ● ● ● ● ● ● 0

0 10 20 30 40 50 Estimate One

Figure 6.17: Ego absolute total load estimate correlation - This figure presents a scatterplot of the two estimates of ego’s total load. As the τB coefficient indicates, the estimated data were correlated.

284 6.4 Phenomenological event structures of serendipity higher level research inferences. This said, estimate one was deemed to have some merit: in a different inference making context, a maximum value could be preferable—for example, to reduce the risk of underestimation.

Pull versus push The research investigated ego’s pulled versus pushed load in the event struc- tures. Figure 6.18 provides boxplots of the study data related to ego’s normalised pulled and pushed load. Table 6.9 presents descriptive statistics for the normal- ised load data.

Pulled Pushed

1.0 1.0

0.8 0.8

Pedersen (1985)

● 0.6 0.6 ● Gocke (1985)

0.4 0.4

Ego's Normalised Load Ego's 0.2 0.2

0.0 0.0

Figure 6.18: Ego’s normalised pulled/pushed load - These boxplots illus- trate the data for ego’s normalised pulled and pushed load. The pulled load meas- ures were made on the model two event structures and the pushed load measures, on the model one event structures (see section 5.7.1).

285 6. DESCRIPTIVE INFERENCES

NORMALISED PULLED NORMALISED PUSHED

Min 0.46 0 25%Q 0.760 0.11

Med 0.827 0.847 0.909 0.12 0.20 0.22

Mean 0.801 0.842 0.883 0.168 0.206 0.244 75%Q 0.994 0.27 Max 1.00 0.636 Range 0.54 0.636 IQR 0.233 0.16 MAD 0.101 0.09

MADn 0.150 0.133 SD 0.145 0.132

Sn 0.149 0.123

Qn 0.138 0.124 RMD 0.178 0.667 RSD 0.172 0.643

Table 6.9: Ego’s normalised pulled/pushed load data summary - This table lists descriptive statistics for ego’s normalised pulled and pushed load. The five-number boxplot summaries, range and interquartile range estimates, and ro- bust estimates of scale are presented. Although not used by the study, non-robust measures are provided in grey below the equivalent robust measures for comparison purposes. Confidence intervals are given for both the median and the mean.

286 6.4 Phenomenological event structures of serendipity

The median estimate was 0.827 0.847 0.909 for ego’s pulled load (see Visualisa- tions 8.2, 26.2, 33.2) and 0.12 0.20 0.22 for ego’s pushed load (see Visualisations 6.1, 10.1, 42.1). Based on these median estimates, the research inferred a 85% : 20% ratio for ego’s pulled to pushed load, where 100% was the maximum possible load on either side of the ratio. The dispersions of ego’s normalised pushed and pulled load supported the inference drawn from the absolute data: normalised pushed load (RMD = 0.643 ) was more variable than normalised pulled load (RMD = 0.172 ). Indeed, nor- malised pulled load displayed even less variability than absolute pulled load (RMD = 0.451 ), further demonstrating the consistency of pull. Outliers only presented for pushed load: two Citation Classics narrators, Gocke (1985) and Pedersen (1985), had unusually high degrees of pushed load. As with the other outliers in the study, further qualitative investigation might shed light on why ego’s pushed load was exceptionally high in these outlier examples.

Ego’s load versus average load The research compared ego’s total load with the average total load. The study used averaged data for the comparison.72 Figure 6.19 presents boxplots of the averaged data for ego’s normalised total load (i.e. measured per event structure) and the average vertex’s normalised total load (i.e. measured per the same event structure). Table 6.10 presents descriptive statistics for the averaged normalised total load data. Both a quantile-quantile plot (not shown) and a Shapiro-Wilk Test result (W = 0.975 , p = 0.354) suggested that the data for ego’s normalised total load were normally distributed; however, the same tests indicated that the data for the average normalised total load did not follow a normal distribution. As such, the study employed robust measures of location and scale.

The median estimate was 0.381 0.431 0.471 for ego’s normalised total load and

0.108 0.155 0.178 for the average normalised total load. The relative median devi- ations suggested that average normalised total load (RMD = 0.526 ) was more dispersed than ego’s normalised total load (RMD = 0.247 ). Based on these results, the research inferred that the Citation Classics narrators carry above

72See section 5.7.1.

287 6. DESCRIPTIVE INFERENCES

Ego Vertex Average Vertex

1.0 1.0

0.8 0.8

0.6 0.6

Gocke (1985) 0.4 0.4 ●

0.2 0.2 Averaged Normalised Total Load Normalised Total Averaged

0.0 0.0

Figure 6.19: Ego/average normalised total load - These boxplots show the averaged normalised total load data for both the ego vertex and the average ver- tex. The identical y axes allow direct comparison of the boxplots. Note that the normalised total load data illustrated by the boxplots represent averages of net- work degree centrality measures that were calculated separately on event structure model one and two (see section 5.7.1).

288 6.4 Phenomenological event structures of serendipity

EGO’S NORMALISED AVERAGE NORMALISED TOTAL LOAD TOTAL LOAD (Averaged) (Averaged)

Min 0.257 0.065 25%Q 0.361 0.10

Med 0.381 0.431 0.471 0.108 0.155 0.178

Mean 0.407 0.436 0.465 0.139 0.160 0.180 75%Q 0.510 0.209 Max 0.685 0.385 Range 0.429 0.32 IQR 0.149 0.109 MAD 0.072 0.055

MADn 0.106 0.082 SD 0.102 0.071

Sn 0.103 0.078

Qn 0.112 0.062 RMD 0.247 0.526 RSD 0.235 0.448

Table 6.10: Ego/average normalised total load data summary - This table summarises the averaged data for ego’s normalised total load and the average vertex’s total load. Five-number boxplot summaries (min, 1st quartile, median, 3rd quartile, max) are presented along with estimates for range and dispersion. Note that non-robust statistics are listed in grey; however, the non-robust statistics were not used by the study since only one set of the normalised load data, ego’s total load, was normally distributed. Confidence intervals are provided for both the median and the mean.

289 6. DESCRIPTIVE INFERENCES

average total loads and carry them with above average consistency. Because the research relied on averaged data to study normalised total load, ego’s rank on load relative to the complete network was also recorded.73 The egos in model one and model two shared the same rank for 38 (76%) of the 50 model pairs. When all non-first ranks were collapsed (i.e. so that the ego could either rank ‘first’ or ‘not first’), the egos in models one and two ranked the same for 39 (78%) of the pairs. It is common in ego networks for the ego vertex to have the highest degree (i.e. to carry the most load). As such, it was not especially surprising to find that the Citation Classics narrators ranked first on load in around three quarters of the event structures. More surprising, was the finding that the narrators did not rank first in all of the event structures: in approximately one quarter of the event structures, someone or something was involved in more process interaction than the narrator. This finding highlights the role played by ‘others’ in serendipity process.

6.4.2.2 Event structure-level load

Event structure-level normalised total load measures the degree to which the structure of a network is dominated by some vertices more than others: a high score on the measure indicates that most of the load in the event structure is carried by only a few vertices, or even a single vertex; a low score indicates that load is more evenly distributed between the vertices.74 Figure 6.20 provides boxplots of the event structure-level normalised total load data for models one and two. The figure also includes a boxplot of averaged data (i.e. the averages of the two event structure-level loads). Table 6.11 presents descriptive statistics for the data illustrated by the boxplots.

73See section 5.7.1. 74Event structure-level normalised total load equates to the total degree centralisation of a network (see section 3.5.6.5).

290 6.4 Phenomenological event structures of serendipity ● Pedersen (1985) Averaged 1.0 0.8 0.6 0.4 0.2 0.0 - This figure shows boxplots of the data for event Model Two 1.0 0.8 0.6 0.4 0.2 0.0 ● Pedersen (1985) Model One

1.0 0.8 0.6 0.4 0.2 0.0 Event Structure−Level Normalised Total Load Total Normalised Structure−Level Event Figure 6.20:structure-level Event normalised structure-level load. normalisedthird The total boxplot first shows load two the boxplots averaged data show estimated the from data the as measured measured data illustrated (i.e. by the on first models two one boxplots. and two); the

291 6. DESCRIPTIVE INFERENCES

MODEL ONE MODEL TWO AVERAGED

Min 0.10 0.23 0.21 25%Q 0.19 0.38 0.305

Med 0.20 0.25 0.28 0.400 0.435 0.480 0.320 0.335 0.350

Mean 0.235 0.267 0.299 0.416 0.440 0.465 0.330 0.354 0.377 75%Q 0.33 0.50 0.393 Max 0.70 0.64 0.67 Range 0.60 0.41 0.46 IQR 0.14 0.12 0.088 MAD 0.075 0.06 0.04

MADn 0.111 0.089 0.059 SD 0.112 0.086 0.083

Sn 0.095 0.095 0.066

Qn 0.103 0.083 0.072 RMD 0.445 0.204 0.177 RSD 0.421 0.195 0.235

Table 6.11: Event structure-level normalised total load data summary - This table summarises the descriptive statistics for the event structure-level nor- malised total load data. Statistics are presented for the data as measured (i.e. on models one and two) and for the averaged data (i.e. as estimated from the model one and model two data). The table lists the five-number boxplot summaries, range and interquartile estimates, and robust estimates of scale. Non-robust measures (in grey) are supplied for comparison purposes only. The medians and means include confidence intervals.

While a quantile-quantile plot (not shown) and the result of a Shapiro-Wilk Test (W = 0.9993 , p = 0.989) suggested that the data for model two followed a normal distribution, tests indicated that neither the model one data nor the av- eraged data were normally distributed. As such, robust measures were preferred. The median estimate of normalised total load at the event structure-level was

0.20 0.25 0.28 for model one (see Visualisations 3.1, 20.1, 38.1) and 0.400 0.435 0.480 for model two (see Visualisations 14.2, 16.2, 33.2). The median estimate of the

averaged event structure-total load data was 0.320 0.335 0.350 . The model one data (RMD = 0.445 ) displayed the highest variability; the lower dispersions of the

292 6.4 Phenomenological event structures of serendipity

model two (RMD = 0.204 ) and averaged data (RMD = 0.177 ) were similar to each other. Event structure-level normalised total load can have any value from 0 to 1: the closer an event structure scores to 1, the more centralised it is—that is, the more likely that load focuses on a single vertex in the event structure. Both of the median estimates for the measured data fell below the halfway point (i.e. 0.50) of this 0 to 1 range, suggesting that the event structure scores tended to be lower rather than higher; even the highest third interquartile estimate only reached a value of 0.50 (see Table 6.11). On average, the event structures displayed about one third of the maximum possible centralisation for load. As a general rule, the higher the score on event structure-level normalised total load, the more star-like the event structure. The visually-based inferences noted that, although star-like patterns were common in the visualisations, these star shapes seemed far from perfect (see section 6.4.1.1). The descriptive statistics for event structure-level normalised total load lent weight to the idea that the serendipity event structures described by the Citation Classics do indeed follow star-like patterns—but, that overall these patterns have a weaker rather than stronger star shape. It was clear from the first two boxplots illustrated in Figure 6.20 and the location and dispersion statistics presented in Table 6.11 that the event structure- level total normalised load data for models one and two differed; for example, neither medians of the data overlapped. Indeed, a one-tailed Wilcoxon Signed Rank Test found that the measures of event structure-level total normalised load were significantly higher in model two than in model one (Z = 5.870 , p < 0.001). Since event structures with higher scores on event structure-level normalised total load tend to have a more star-like structure, the Wilcoxon result supported the visually-based inference that model one was the less likely to take a star shape (see section 6.4.1.1) .

293 6. DESCRIPTIVE INFERENCES

6.4.2.3 Ego’s activity

As a concept, activity describes load frequency, ego’s multiple process interactions with the same alter. As a label, activity refers to the frequency of pulled load. Passivity refers to the frequency of pushed load.75 The study addressed two questions related to ego’s activity:

1. To what extent do multiple interactions factor in ego’s load? and,

2. How do ego’s activity and passivity compare?

The first of these questions drew on absolute network measures, the second, on normalised network measures. In all cases, ‘ego’ represented the Citation Classics narrator.

Multiple interactions The research investigated frequency for ego’s pulled and pushed load. In particular, the study sought to examine how the absolute measures of ego’s pulled/pushed load (see Figure 6.14 and Table 6.8) changed when multiple inter- actions were taken into account. Ego’s passivity was measured on the summed version of the model one event structures. Because no summed version of the model two event structures was available, the study estimated ego’s activity.76 Figure 6.21 shows four boxplots that allow the comparison of ego’s load and activity; the load data are reproduced—albeit, with different y axes—from Figure 6.14. Table 6.12 provides descriptive statistics for the activity data; for ease of comparison, the load statistics, originally presented in Table 6.8, are listed again. As the activity data were not normally distributed, robust measures of location and scale were employed. The median estimate was 12 13 16.5 for ego’s estimated activity and 3 3 5.5 for ego’s passivity (see Visualisations 21.1, 27.1, 40.1). The relative median deviations indicated that the passivity data (RMD = 0.988 ) displayed more than double the variation shown by the activity data (RMD = 0.456 ).

75See section 2.3.3.5 for more detailed discussion. 76See section 5.7.2 for further details and Appendix J for the estimation procedure.

294 6.4 Phenomenological event structures of serendipity

Ego's Pulled Load Ego's Estimated Activity

60 60 Kappas (1987) ●

50 50 ● Kroto (1993)

40 Kappas (1987) ● 40 Gocke (1985) ●

30 30 ● Kroto (1993)

20 20 Absolute Process Interaction 10 10

0 0

Ego's Pushed Load Ego's Measured Passivity

60 60

50 50

40 40

Gocke('85),Kroto('93) 30 30 ● ● Kappas (1987)

20 20

Kroto (1993) Absolute Process Interaction ● Cohn('89),Gocke('85), 10 ● 10 Kappas ('87)

0 0

Figure 6.21: Comparison of ego’s absolute load and activity - The top row boxplots relate to ego’s outgoing, or ‘active’, process interaction: the top left plot illustrates ego’s pulled load and the top right, ego’s activity. The bottom row boxplots relate to ego’s incoming, or ‘passive’, process interaction: the bottom left plot shows ego’s pushed load and the bottom right, ego’s passivity. The identical y axes allow for comparisons across all four boxplots. Unlike the load data, the activity/passivity data takes multiple interactions into account. Note that the pulled/pushed load data is reproduced from Figure 6.14 and that that ego’s activity is estimated rather than measured (see section 5.7.2 and Appendix J).

295 6. DESCRIPTIVE INFERENCES

PULLED ACTIVITY PUSHED PASSIVITY LOAD (Estimated) LOAD

Min 3 3 0 0 25%Q 8 9.3 1.3 2

Med 9.5 11.5 12.5 12 13 16.5 2 3 3 3 3 5.5

Mean 10.6 12.5 14.3 12.8 15.7 18.6 2.6 3.3 4.1 4.1 6.0 7.9 75%Q 15 19 4 8.8 Max 39 57 12 29 Range 36 54 12 29 IQR 7 9.8 2.8 6.8 MAD 3.5 4 1.5 2

MADn 5.2 5.9 2.2 3.0 SD 6.6 10.2 2.7 6.6

Sn 4.8 7.2 2.4 2.4

Qn 6.2 6.2 2.1 4.1 RMD 0.451 0.456 0.741 0.988 RSD 0.530 0.653 0.811 1.098

Table 6.12: Ego’s absolute activity data summary - This table summarises the data related to ego’s activity and passivity (highlighted columns). For com- parison purposes, the table also reproduces the pulled/pushed load statistics from Table 6.8 (non-highlighted columns). Five-number boxplot summaries and estim- ates of range and variability are listed. Although not used by the study, non-robust measures are provided in grey below the equivalent robust measures for comparison purposes. The medians and means are displayed with confidence intervals. Note that the BCa bootstrap confidence intervals for the activity/passivity medians were unstable; the Percentile intervals for these estimates are given instead.

296 6.4 Phenomenological event structures of serendipity

Comparison of the statistical results presented in Table 6.12 indicated that the bounds of the median estimates overlapped for pulled load/activity and also for pushed load/passivity. As well, the relative median deviations showed little change in variation between pulled load and activity; however, this was not the case for pushed load and passivity. Passivity was considerably more variable than pushed load. The absolute activity/passivity results supported the research inferences made in relation to absolute pulled/pushed load (see section 6.4.2.1): when multiple process interactions were considered, pull from ego (i.e.‘doing’) still dominated push to ego (i.e. ‘happening’); and, ego’s pull still demonstrated more consistency than ego’s push. The notable difference between ego’s load and ego’s activity was the increased level of inconsistency once multiple interactions were factored into ego’s ‘passive’ process. A caveat must be noted concerning the results related to absolute activity: because they derived from estimated data, the results related to ego’s activity were inherently less secure than the results related to ego’s passivity. Given the estimation procedure used, the research expected to over- rather than underes- timate ego’s activity; however, the study was not able to quantify the degree of overestimation—or indeed, underestimation—a point discussed further in section 7.2.4. Common sense suggested that a small degree of overestimation would not have changed the descriptive statistics presented in Table 6.12 to such an extent that the research inferences related to activity no longer stood; nonetheless, all considerations of the activity results should bear in mind that the results relied in part on estimation.

Activity versus passivity As well as ego’s absolute activity, the study investigated ego’s relative activ- ity in the event structures. Figure 6.22 presents boxplots of ego’s normalised activity and passivity. Note that, to enable the comparisons of ego’s normalised activity/passivity with ego’s normalised pulled/pushed load, Figure 6.22 includes boxplots reproduced from Figure 6.18. Also note that, like ego’s absolute activity, ego’s normalised activity was estimated rather than measured. Table 6.13 lists descriptive statistics for the four boxplots illustrated in Figure 6.22.

297 6. DESCRIPTIVE INFERENCES

Pulled Load Estimated Activity

1.0 1.0 Dicke('80), ● Galanter('83),Schatzmann('84)

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

Ego's Normalised Process Interaction Ego's 0.0 0.0

Pushed Load Passivity

1.0 1.0

0.8 0.8

Pedersen('85) ● 0.6 0.6 Pedersen('85) ● Gocke('85) ● 0.4 0.4

0.2 0.2

Ego's Normalised Process Interaction Ego's 0.0 0.0

Figure 6.22: Comparison of ego’s normalised load and activity - The top row boxplots relate to ego’s outgoing process interaction: the top left plot illus- trates ego’s normalised pulled load and the top right, ego’s normalised activity. The bottom row boxplots relate to ego’s incoming process interaction: the bottom left plot shows ego’s normalised pushed load and the bottom right, ego’s norm- alised passivity. Identical y axes allow for comparisons across the boxplots. The pulled/pushed load data are reproduced from Figure 6.18.

298 6.4 Phenomenological event structures of serendipity 9 2 17 16 . . 0 0 3 9 7 3 2 8 6 7 0 8 9 4 0 14 11 . . 0.50 0.50 0.08 0 0.21 0.13 0.05 0.07 0.10 0.08 0.09 0.67 0.70 0 9 10 11 . PASSIVITY . 0 0 NORMALISED 4 24 . 22 . 0 0 6 6 6 3 2 3 4 7 3 20 0 20 . . 0 0.11 0.27 0.16 0.09 0 0.63 0.63 0.13 0.13 0.12 0.12 0.66 0.64 8 12 . 0 - This table presents descriptive statistics for 16 . 0 NORMALISED PUSHED LOAD 6 50 55 . . 0 0 5 1 2 6 9 9 8 2 2 2 1 9 9 8 47 49 . . 1.00 0 0.14 0.37 0.56 0.85 0.19 0.10 0.15 0.21 0.18 0.18 0.31 0.42 0 6 5 39 43 . . 0 0 NORMALISED ACTIVITY (Estimated) 9 3 90 88 . . 0 0 7 2 0 4 3 1 0 5 9 8 8 2 84 84 . . 0.46 1.00 0.54 0 0 0.76 0.99 0.23 0.10 0.15 0.14 0.14 0.13 0.17 0.17 7 1 82 80 . . 0 0 NORMALISED PULLED LOAD n n n S Q SD Min IQR Med Max RSD Mean RMD MAD 25%Q 75%Q Range MAD Table 6.13: Ego’sthe normalised four activity/passivity boxplots datanormalised illustrated summary activity/passivity in (highlighted Figure columns);normalised for 6.22. pulled/pushed comparison purposes, load Robust (non-highlighted the measuresare columns). table reproduced of also Note from includes central that Table statistics tendencyand the 6.9. for confidence five-number and ego’s boxplot intervals As dispersion are summaries with given are for other for ego’s listed tables medians load in for and the means. ego’s current chapter, non-robust statistics are provided in grey,

299 6. DESCRIPTIVE INFERENCES

Given the non-normal distributions of the normalised activity data, robust

measures were preferred. The median estimate was 0.396 0.475 0.50 for ego’s es-

timated normalised activity and 0.10 0.114 0.162 for ego’s normalised passivity (see Visualisations 3.1, 21.1, 42.1). Based on these estimates, the research inferred a rough 48% : 11% ratio for ego’s relative activity to relative passivity. For each type of activity, 100% was the maximum possible for ego in the event structure. Recall that ego’s median normalised pulled and pushed loads described a rough 85% : 20% ratio (see section 6.4.2.1). A comparison of this ratio with the normalised activity/passivity ratio of 48% : 11% showed that, when multiple interactions were taken into account, there was an overall drop in the load carried by ego relative to the event structure. Despite this change, multiple interactions did not affect the distribution of ego’s push to pull. The ratio of ego’s median load to median activity effectively remained constant:

47.5 % 11.4 % : 84.7 % 20% or

0.56 : 0.57.

Once again, push displayed more variability than pull: the relative median deviation was higher for ego’s passivity (RMD = 0.678 ) than for ego’s activity (RMD = 0.319 ). This said, pull was more affected by multiple interactions than push: while dispersions for ego’s normalised pushed load (RMD = 0.667 ) and passivity (RMD = 0.678 ) showed little change, variability increased for ego’s normalised activity (RMD = 0.319 ) compared to pulled load (RMD = 0.178 ). Three outliers presented in the activity data whereas none had presented in the pulled load data. Recall that the research expected the estimation procedure used to generate the activity data to over- rather than underestimate. As the three activity outliers were all at the top extreme of the data, it is possible that they resulted from overestimation. The outlier situation for passivity was more consistent with that of pushed load: although only one passivity outlier presented—instead of two—this one passivity outlier was the highest of the two outliers found in the pushed load data.

300 6.4 Phenomenological event structures of serendipity

Overall, the research inferences concerning ego’s normalised activity should be viewed as less secure than those relating to ego’s normalised load. The procedure (Wei et al., 2011) used by the study for the measurement of degree centrality in weighted networks relies on the use of a weighting constant for the calculation of the maximum degree centrality. Because this constant equates to the highest weight found in the network, the assumption implied is that any link in the network has the potential to have that highest weight. This assumption is problematic in the context of real-world networks where sociological aspects—for example, circumstances or environments that create op- portunities for vertices to interact with one another—may inherently constrain the weights on network links: in actuality, not all network links can be guaranteed the potential to carry the maximum weight. The problem of how to approach degree centrality in weighted networks continues to receive attention in network studies research (see Abbasi, Hossain and Chung, 2012; Newman, 2004; Opsahl, 2009; Opsahl, Agneessens and Skvoretz, 2010).

6.4.2.4 Interaction

The research defined interaction as the extent of the Citation Classics narrator’s direct interaction with other participants in the event structure. Two types of interaction were considered: true interaction, or actual interaction, and dependent interaction, which emphasised sequence.77 The study investigated two aspects of interaction:

1. differences between true and dependent interaction; and,

2. changes in attribute distribution with type of interaction.

In all cases, interaction was measured on ego networks of the Citation Classics narrators.78 77See section 5.8.1.1 for in-depth explanations of these concepts. 78See sections 5.8.1.2and 5.8.1.3.

301 6. DESCRIPTIVE INFERENCES

True versus dependent interaction Figure 6.23 presents boxplots of the study data related to ego’s interaction. The first two boxplots show the measured data for ego’s true and dependent interaction; the final boxplot illustrates data for the estimated difference between the two types of interaction.79 Table 6.14 provides descriptive statistics for the interaction data. Quantile-quantile plots (not shown) and Shapiro-Wilk Tests suggested that, although the data for dependent interaction (W = 0.974 , p = 0.330) and the estimated difference (W = 0.974 , p = 0.337) were normally distributed, the data for true interaction (W = 0.892 , p < 0.001) were non-normal. As such, robust measures of central location and variability were employed. The median percent- age estimates were 83.0 85.7 89.4 for ego’s true interaction (see Visualisations 11.2,

27.2, 32.2) and 50.0 53.1 62.3 for ego’s dependent interaction (see Visualisations 25.1, 36.1, 44.1). The median percentage for the estimated difference between true and dependent interaction was 21.8 28.7 36.6 . These results suggest that ego has the potential to reach well over three quar- ters (roughly 86%) of the event structure by means of direct process interaction; only about 14% of the event structure is beyond ego’s direct reach. To access this remaining area, ego is entirely dependent on process interactions that oc- cur between other vertices in the event structure. When the sequence of pro- cess is considered, ego’s reach effectively diminishes, encompassing just over half (roughly 53%) of the total event structure; the study results suggest that at least one quarter (roughly 29%) of ego’s direct involvement with the serendipity process described by the event structure is estimated as dependent on sequence—that is, on process occurring in a certain order. The extreme ranges80 of the interaction data also provide insight into ego’s reach in the complete event structure. In terms of true interaction, ego always interacts with at least 50% of the total event structure. This said, ego’s reach may extend to 100% of the event structure. Indeed, this was not a rare oc- currence: a 100% reach was recorded in 13 (26%, n = 50) of the sample event structures. When dependent interaction is considered, ego may interact with as

79See section 5.8.1.3. 80Note the absence of outliers in the interaction data.

302 6.4 Phenomenological event structures of serendipity little as 28% of the total event structure; however, the maximum possible inter- action still extends to 100%. These differences illustrated by the extreme range statistics are echoed by the dispersion statistics for interaction: true interaction (RMD = 0.164 ) is more homogeneous in terms of variability than dependent in- teraction (RMD = 0.339 ).

TRUE DEPENDENT ESTIMATED INTERACTION INTERACTION DIFFERENCE

Min 50 28 0 25%Q 78.3 43.7 18.4

Med 83.0 85.7 89.4 50.0 53.1 62.3 21.8 28.7 36.6

Mean 81.6 85.5 89.3 51.6 56.3 61.0 24.7 29.5 34.3 75%Q 99.4 66.7 41.9 Max 100 100 63.4 Range 50 72 63.4 IQR 21.1 23.0 23.6 MAD 9.5 12.1 12.1

MADn 14.1 18.0 17.9 SD 13.5 16.6 17.0

Sn 14.6 16.3 18.1

Qn 12.9 17.2 19.1 RMD 0.164 0.339 0.623 RSD 0.158 0.294 0.577

Table 6.14: Interaction data summary - This table summarises the descriptive statistics for the interaction data. Statistics are presented for true and dependent interaction as well as for the estimated difference between the two types of inter- action. The table lists the five-number boxplot summaries, range and interquart- ile estimates, and robust estimates of scale. Non-robust measures (in grey) are supplied for comparison purposes only. The median and mean estimates include confidence intervals.

303 6. DESCRIPTIVE INFERENCES Estimated Difference 0 80 60 40 20 100 Dependent Interaction 0 - The first two boxplots in this figure allow the comparison of ego’s true 80 60 40 20 100 True Interaction True 0

80 60 40 20

100 Ego Network % of Total Event Structure Event Total of % Network Ego and dependent interaction. Theinteraction. third boxplot shows the estimated data for the difference between true and dependent Figure 6.23: Ego’s direct interaction

304 6.4 Phenomenological event structures of serendipity

Changes in attribute distribution The research also examined how the attribute distribution in the Citation Clas- sics narrator’s ego network changed with the type of interaction measured. Figure 6.24 shows boxplots of the ego network attribute data for true and dependent in- teraction. Table 6.15 lists descriptive statistics for the data illustrated in the boxplots. A comparison of the median percentage estimates presented in Table 6.15 indicates that ‘knowledge’ was the most common attribute found in the ego net- works for both true and dependent interaction. The extreme range statistics also single out ‘knowledge’ as distinctive. The attribute was the only one to reach the maximum possible value of 100%. Although this value was only recorded for outliers in the true interaction data, the same was not true for the dependent data, which had a range of 100%. ‘Knowledge’ was also the only attribute not recorded with the lowest possible value of 0%:81 for true interaction, ‘knowledge’ made up at least 18% of the ego network. Broadly speaking, the ‘person’ and ‘knowledge’ attributes displayed similar dispersions; more specifically, variability of the ‘person’ attribute data decreased slightly for dependent interaction, while variability for the ‘knowledge’ attrib- ute data increased slightly for dependent interaction. Compared with the other attributes, the ‘object’ attribute displayed noticeably higher variability as meas- ured by the relative median deviation, especially for dependent interaction. This said, it must be stressed that the relative median deviation statistic can be un- duly affected by extreme values; the dependent interaction data for the ‘object’ attribute did contain a single, yet very extreme outlier. Figure 6.25 provides another perspective on the changes in attribute distribu- tion with type of interaction. The radial diagram shows two triangles formed by the median point estimates for the attribute percentages. Note how the triangle moves further away from an equilateral shape for the dependent interaction estim- ates: ‘persons’ and ‘knowledge’ form a larger proportion of the overall attribute distribution in the dependent interaction ego networks, while the proportion of objects in the dependent interaction ego networks decreases noticeably.

81Obviously, due to the presence of ego, the ‘person’ attribute was never truly 0%; however, recall from section 5.8.1.2 that ego was not considered in the interaction attribute analysis.

305 6. DESCRIPTIVE INFERENCES

Persons Knowledge Objects

100 100 ● 100 Brown('81),Cleaver('81)

80 80 80

60 60 60

40 40 40

20 20 20 Attribute % of True Interaction Ego Network % of True Attribute

0 0 0

100 100 100

80 80 80 Bandurski('81)

60 60 60

40 40 40

20 20 20 Attribute % of Dependent Interaction Ego Network Attribute 0 0 0

Figure 6.24: Attribute distribution in the interaction ego networks - The boxplots in the top row show the percentage data for persons, knowledge and objects in the true interaction ego networks of the sample event structures. The bottom row boxplots show the equivalent data measured on the dependent interaction ego networks. 306 6.4 Phenomenological event structures of serendipity

Changes in Attribute with Type of Direct Interaction

True Interaction Dependent Interaction

Persons 50%

40%

30%

20%

10%

0%

Objects Knowledge

Figure 6.25: Changes in attribute with type of direct interaction - This radial diagram shows how the median estimates for the attributes of ‘person’, ‘knowledge’ and ‘object’ changed with the type of direct interaction measured in the ego networks of the sample event structures. The red and blue dots mark median estimates for the attribute percentage relative to the ego network: red relates to true interaction, and blue, to dependent interaction. Note that because median rather than mean estimates are illustrated, neither set of point estimates (i.e. red dots, blue dots) forming the two triangles sums to exactly 100%.

307 6. DESCRIPTIVE INFERENCES 3 . 7 . 19 16 3 1 4 4 6 6 6 2 3 5 . 6 0 . 75 75 9. 21. 10. 15. 17. 13. 14 1.48 1.23 10 3 . 0 9 5 . 6 . 33 OBJECT 36 2 7 4 9 4 2 2 2 6 7 8 1 . 0 0 34 21. 30 10. 27 44. 69. 69. 15. 22. 19. 21. 20. True Dependent 7 0.75 0.70 . 3 . 17 Interaction Interaction % OF EGO NETWORK 22 9 . 3 . 54 59 4 7 9 4 7 7 7 7 1 7 6 . 0 50 100 33. 48 66. 32. 16. 24. 23. 23. 23. 5 0.49 0.47 . 8 . 37 41 0 5 . . - This table summarises the attribute percentage data for 50 52 1 2 0 3 8 1 5 3 2 8 5 2 3 9 . . KNOWLEDGE 100 100 18. 35. 56. 82. 21. 12. 18. 19. 17. 17. 42 46 True Dependent 0.43 0.42 9 3 . . % OF EGO NETWORK Interaction Interaction 36 41 9 8 . . 42 42 7 5 8 2 3 9 8 8 9 0 2 9 2 . . 25 10. 39 37 48. 77. 77. 23. 16. 19. 19. 17. 0.40 0.51 3 9 . . 33 31 4 0 . . 29 29 PERSON 4 8 1 2 7 3 6 6 7 4 3 4 8 4 . . 0 0 8. 26 25 16. 33. 55. 55. 16. 12. 13. 13. 13. True Dependent 0.47 0.52 0 4 . . % OF EGO NETWORK Interaction Interaction 20 21 n n n S Q SD Min IQR Med Max RSD Mean RMD MAD 25%Q 75%Q Range MAD Table 6.15: True/dependent attributetrue data and summary dependentstatistics interaction as for measured centralcomparison on tendency purposes, the and non-robust ego dispersion statistics networks are are in provided listed in the grey. for sample Confidence the event intervals structures. are attributes given of for Robust ‘person’, medians descriptive and ‘knowledge’ means. and ‘object’. For

308 6.4 Phenomenological event structures of serendipity

Although both people and knowledge increased with dependent interaction, people displayed the greater increase. Overall, the ‘knowledge’ attribute still dominated the ego interaction networks. This result mirrored the research finding that the ‘knowledge’ attribute comprised the largest proportion of the attribute distribution in the total event structure (see section 6.4.1.5).

6.4.2.5 Interaction patterns

The study sought to describe statistically significant interaction patterns, initially, within, and subsequently, across the event structures. Both global and local interaction patterns were investigated. Network topology inference served to identify global interaction patterns. With respect to local interaction patterns, the research focused on those involving ego (i.e. the Citation Classics narrator). Network motif detection was employed to infer local interaction patterns.82

Global interaction Table 6.16 presents the topology inference results. A score of ‘high’ was deemed significant. 27 (54%, n = 50) of the serendipity event structures scored as sig- nificant on at least one random graph topology, with 16 (32%, n = 50) of the 27 scoring as significant on both of the topologies studied. The remaining 23 (46%, n = 50) event structures did not score as significant on either random graph topology. Of the two random graph topologies studied, the core-periphery topology (‘not found’= 2) was identified in the event structures more often than the Erd¨os-R´enyi topology (‘not found’= 7). In a sense, the two topologies studied in the research are opposite extremes of each other: the core-periphery graph is highly centralised and star-like, while the Erd¨os-R´enyi graph is decentralised, not focused on any particular vertices. Recall that the two random graph models studied were selected because of em- pirical evidence from the literature.83 There were implications of this evidence for the study. On the one hand, given the egocentric nature of the Citation Classic event structures, the study expected to find many examples of the star-

82See sections 3.5.7.2 and 3.5.7.3 for introductions to topology inference and motif detection. See sections 5.8.3 and 5.8.2 for the specific procedures used in the study. 83See section 3.5.7.2.

309 6. DESCRIPTIVE INFERENCES

Score Erd¨os-R´enyi Core-Periphery

low 7 11 med 17 13 high 19 24 not found 7 2

Table 6.16: Topology inference scores by random graph model - This table lists the scores (‘low’, ‘med’, ‘high’ and ‘not found’) for the two random graph topologies studied (see section 5.8.3).

like core-periphery structure; on the other hand, if random events played a large role in serendipity, then the research expected to find repeated examples of the decentralised random structure. In fact, the research found the ‘mixture of types’ of topologies (Airoldi and Carley, 2005) common to real-life complex networks.84 The study results suggest that the Citation Classics event structures were almost as likely to reflect neither extreme represented by the two topologies studied as to mirror these extremes. The results also indicate that around one third of the sample event structures matched both extremes; in other words, a notable number of the event structures displayed a complex mix of focus and randomness in process. As the most frequently recorded significant topology, the centralised struc- ture of the core-periphery model did dominate the decentralised structure of the Erd¨os-R´enyi model; however, with an overall prevalence of 48%, the core- periphery structure fell just short of dominating the study sample. This find- ing provides yet more support for the research inference that, although star-like shapes presented across the Citation Classics event structures, the perfection of these star shapes was compromised by the structural influence of other types of patterns (see sections 6.4.1.1 and 6.4.2.2). It should be emphasised that the topology inference method used by the study relies on the network measures of betweenness, closeness and inverse closeness centrality—all flow-based measures—to determine significance.85 The sample

84See section 3.5.7.2. 85See section 5.8.3.

310 6.4 Phenomenological event structures of serendipity event structures were all non-flow network models. While the measures used by the topology inference procedure do still assess structural properties of networks— and structure is structure, whether created by flow or not—it is possible to argue that, inferences made on the basis of non-flow measures would have been prefer- able for the study. Unfortunately, non-flow measures for assessing significance were not available in ORA’s topology inference routine.

Local interaction The study addressed four questions related to the motifs detected as local interaction patterns in the sample event structures:

1. Do any motifs dominate, and, if so, which ones?

2. Do any motifs tend to ‘go together’ and, if so, how?

3. If dominant motifs exist, what link/attribute patternings do they describe?

4. Do the event structures group by motif and, if so, how?

Question three was addressed through manual follow-ups on the automated motif detection. Nonparametric multidimensional scaling enabled the investiga- tion of questions one, two and four.86 Note that some of the research findings related to local interaction patterns are presented in McBirnie and Urquhart (2011), a report which includes a more substantial discussion than is possible in the limited space available here.

Motif dominance and clustering: Ten statistically significant local inter- action patterns involving ego were detected in thirty of forty-eight sample event structures.87 Table 6.17 enables the reader to link these ten motifs, now labelled

86See section 5.8.2 and Appendix K. 87McBirnie and Urquhart (2011) reported that nine motifs were found. A further review of the data showed that an additional motif (id164) involving ego appeared in one of the sample event structures (Harrison 1987). As the links in event structure for Harrison 1987 were very dense around ego (see Visualisation 24.1), the manual follow-ups on the detected motifs (see section 5.8.2) failed originally to spot ego’s involvement in motif id164 in this event structure. Section 7.2.2 returns to the issue of why a combination of automated and manual analysis was necessary in the study.

311 6. DESCRIPTIVE INFERENCES

by FANMOD id numbers, to those previously illustrated in Figure 3.7. Note that faulty FANMOD output for two of the sample event structures, Galanter (1983) and Schaefer (1981), meant that the sample size was effectively reduced to n = 48 for the motif detection.88

FANMOD id number Number from Figure 3.7

id6 no. 5 id12 no. 6 id38 no. 7 id140 no. 8 id164 no. 10 id46 no. 11 id166 no. 12 id102 no. 13 id78 no. 14 id238 no. 16

Table 6.17: The ten motifs involving ego - This table lists the ten motifs involving ego detected in the sample event structures. The FANMOD id number and the corresponding number from Figure 3.7 is provided for each motif.

Figure 6.26 presents a nonparametric multidimensional scaling graph of the data for the ten detected motifs. For the matrix underlying this graph, see Appendix N. The graph illustrates both motif dominance and motif clustering: motifs found at the centre of the graph were the most dominant; and, motifs located closer together on the graph occurred together more often. An examination of Figure 6.26 shows that four motifs lie especially close to the centre of the graph (i.e. the 0.0 point of the x–y axes): id46, id12, id238 and id6. Of these four dominant motifs, two, id46 and id12, appeared together most often in the event structures. 88As checks did not find problems with the FANMOD input files for the two event structures, a definitive reason for the faulty output could not be determined. It should be noted that the study was not able to check for non input-related problems, such as software bugs.

312 6.4 Phenomenological event structures of serendipity 0.6 Schatzmann + Reynolds Cohn o id102 0.4 + Kappas + Fitzgerald id166 o o id78 0.2 + + o Myers Giblett Adams id140 + + Bull Gocke + Field id12 Weissman2 Weissman3 0.0 Milstein o Warner + o + Boyd Mayer + Hudson id238 + + + o NMDS1 + id6 id46 o Kirk Buzbee Ashbaugh Beadle Smythe Weissman1 + Blanchard Dominant Motifs Harrison + - Four dominant motifs (in red) cluster at the centre of this nonpara- Kroto −0.2 o id164 −0.4 o id38 Fogg + −0.6 Poulik

Greenwood

0.4 0.2 0.0 −0.2 NMDS2 Figure 6.26: Clusteringmetric of multidimensional dominant scaling motifs graphclustering that shows of the the motifs eventmeant involving structures ego to (in detected be grey) in read is the in sample shown detail. event in See structures. the Figure The background 6.28 of for the details graph of to the provide event context structure only clustering. and is not

313 6. DESCRIPTIVE INFERENCES

A range of other information on dominance and clustering can be discerned from studying the graph. For example, although not one of the four most domin- ant motifs, motif id78 was more dominant than motif id102; this said, motif id78 was more likely to appear with motif id102 in an event structure than, say, with motifs id164 or id38. It is important to stress that the study determined the dominance of a motif by the number of event structures in which it appeared. This formal assessment of dominance did not incorporate the frequency of a motif’s occurrence within an event structure; for example, a motif detected with a frequency of three in ten event structures (i.e. 3 x 10 = 30) was considered less dominant than a motif detected with a frequency of one in thirty event structures (i.e. 1 x 30 = 30). An informal assessment of the motif detection data suggested that a domin- ant motif was also likely to occur with a higher frequency than a non-dominant motif in any given event structure. In other words, dominant motifs were rarely recorded with a frequency of one in event structures: much more often, dominant motifs had the highest recorded frequencies (i.e. ≥ 5).

Attributes and link patterns of the dominant motifs: Figure 6.27 shows details of the four dominant motifs involving ego detected by the study. Because of the position of ego and the attributes of the other motif vertices, these local interaction patterns were nicknamed the exchange motif (id46), the solo motif (id6), the collaboration motif (id238) and the chain motif (id12). With its appearance in 21 (43.8%, n = 48) of the event structures, the ex- change motif was the most ‘dominant’ of the four dominant motifs. In this pat- tern, ego and another person were connected by a two-way link. The motif owes its nickname to this two-way exchange of process. As well as linking to each other, both ego and the second person linked out to the third vertex in the motif. This third vertex was most often information. The solo motif was detected in 13 (27.1%, n = 48) of the event structures. In this pattern, ego linked out to two other vertices. One of these was always information; although usually also information, the other was occasionally an object or a person. The solo motif gets its nickname from the fact that ego has no incoming links: all of ego’s process is active, outgoing process.

314 6.4 Phenomenological event structures of serendipity

id46 id6 Exchange Motif Solo Motif

id238 Collaboration Motif

Ego positioned at beginning of chain Ego positioned in middle of chain

id12 Chain Motif

Figure 6.27: The four dominant motifs - This figure shows the four dominant motifs involving ego detected in the sample event structures. Note the two versions of the chain motif (id12). Vertices illustrated by thumbnail images have fixed attributes, while vertices illustrated by blue dots have variable attributes.

315 6. DESCRIPTIVE INFERENCES

The collaboration motif occurred in 12 (25%, n = 48) of the event structures. As the pattern’s nickname suggests, all links in the motif were bidirectional; vertices were always people—never information or objects. The chain motif presented in 14 (29.2%, n = 48) of the event structures and displayed more complex behaviour than the other three dominant motifs. The two one-way links in this local interaction pattern formed a chain. The chain motif was more complex because, unlike the other dominant motfis, ego’s position could vary. All fourteen event structures that contained the chain motif had examples of the motif with ego positioned at the beginning of the chain; additionally, three of the fourteen had examples with ego positioned in the middle of the chain. When the motif appeared with ego positioned at the beginning of the chain, the other vertices in the motif were almost exclusively information and objects; however, in two of the three cases where ego sat in the middle of the chain, the vertex occupying the starting position was a person whose only link—not only in the local environment of the motif but also in the larger event structure—was out to ego. All considerations of the motif findings should bear in mind that, just as event contexts can overlap in the cumulative event structures, event contexts can overlap in the local interaction patterns. For example, the two-way process interaction described by the exchange motif need not occur in a single event context: the two-way link may describe a one-way process link in one event context followed by a process link going the other way several event contexts later. The four dominant motifs displayed unexpected consistency with respect to attributes and the position of ego, a point discussed further in section 6.4.3.3. The dominant motifs were also strongly associated with people and knowledge; although the ‘object’ attribute did appear, it did so infrequently in comparison with the ‘person’ and ‘knowledge’ attributes. Like the results related to attribute distribution in the event structures (see section 6.4.1.5), the motif findings lent weight to the literature’s idea of people and information as significant participants in serendipity process.89

89See section 2.3.3.2.

316 6.4 Phenomenological event structures of serendipity

Motif-based clustering of the event structures: Figure 6.28 shows a nonparametric multidimensional scaling graph that illustrates the clustering of the event structures by motif. Note that, although points on the graph are positioned exactly, text label locations are optimised to minimise overlapping.90 Appendix N presents the data underlying the graph. An inspection of Figure 6.28 finds several clusters of event structures. The graph shows a large cluster of approximately twenty event structures and two smaller clusters of three event structures each. A few isolate event structures are also evident. Six of the ten motifs detected by the study are associated with the large event structure cluster. These six motifs include the four dominant motifs. The two smaller groups of event structures each cluster near a motif disassociated from the six motifs found in the area of the larger event structure cluster. The remaining two motifs appear to bridge the isolate event structures. Event structures found towards the centre of the graph are more likely to contain dominant motifs; for example, Kirk (1989), Mayer (1992) and Smythe (1980)—all located near the centre point—contain only dominant motifs. Event structures found at the outskirts of the graph—for example, Fitzgerald (1980) and Greenwood (1977)—contain no dominant motifs. The finding that the sample event structures grouped into three distinct clusters, one of which was notably larger than the others, highlights several points:

1. serendipity experiences described by the Citation Classics share similar underlying local patterns of process;

2. the serendipity experiences can be grouped according to these shared pat- terns;

3. the resulting groups are noticeably distinct from each other;

4. one large group dominates; and

5. serendipity experiences found in this dominant group are also more likely to contain the most dominant local patterns of process.

90The graph was created with the >ordipointlabel function of Oksanen et al.’s (2012) vegan package for R. The graph in Figure 6.26 was also drawn with this function.

317 6. DESCRIPTIVE INFERENCES 2 Weissman2 o Hudson o Ashbaugh Weissman1 Weissman3 1 o Beadle id46 Buzbee + o + Boyd o + o Milstein id12 Mayer Smythe Blanchard id6 o + o id140 Field Gocke o + + o Kirk Adams id164 o 0 o o - This nonparametric multidimensional scaling graph Bull id238 Giblett o Kroto id166 Harrison Myers + NMDS1 Warner + o id78 Fitzgerald −1 o Clustering of Event Structures and Motifs Structures Clustering of Event Kappas id38 Poulik + o o o Fogg Cohn Reynolds + o o o Greenwood −2 id102

Schatzmann

2.0 1.5 1.0 0.5 0.0 −0.5 −1.0 −1.5 NMDS2 Figure 6.28: Clusteringshows event of structures (‘o’ event in structures blue) and and motifs motifs (‘+’ in red). Event structures closer together are more similar.

318 6.4 Phenomenological event structures of serendipity

These points made, a caveat must be stressed. Much of the potential credited to motifs as a means for understanding complex phenomena lies in the idea of motifs as fundamental building blocks of a larger structure. This idea remains open to debate:91 statistical significance does not necessarily equate to functional importance (Milo et al., 2002). Section 7.3.3 returns to the issue of the credibility of inferences made on the basis of motif detection.

6.4.3 Evidence for normality

6.4.3.1 Normal structure

The study found evidence to support the idea of normal process structure in the serendipity described by the Citation Classics. The visualisations (see section 6.4.1.1) provided initial indications that some features appeared to repeat across the sample event structures: star-like patterns and shapes with people clustered towards the centres were common, as were smaller, repeated patterns within the overall event structure patterns. These early indications of normality were subsequently given more support by the study’s statistical results related to load and interaction patterns (see sections 6.4.3.2 and 6.4.3.3). Other evidence for normality arose from the statistical analysis of the com- plexity data. In terms of cohesion, the event structures had density values typical of sparse networks, even if the measured values were at the lower end of those normally expected for real-world, sociological networks. With respect to attrib- utes, although the ‘knowledge’ attribute dominated the event structures in terms of prevalence, only the ‘person’ attribute displayed normality in the statistical sense. Shapiro-Wilk Test results for the ‘person’ attribute percentage data for model one (W = 0.979 , p = 0.497) and model two (W = 0.955 , p = 0.056) did not reject the null hypothesis. Figure 6.29 shows normal quantile-quantile plots of the data for both models. It is clear from the quantile-quantile plots that model one matches the straight line fairly closely; a S-shape is more evident for model two. This more prominent S-shape likely results from the heavier tail of the distribution of the model two data (see Figure 6.11) and the outliers located at

91See section 3.5.7.3 and the ‘Limitations’ section of McBirnie and Urquhart (2011).

319 6. DESCRIPTIVE INFERENCES

both extremes of the model two data (see Figure 6.10). Once these points are taken into consideration, the visual evidence seems in reasonable agreement with the results of the statistical tests. One implication of the normal distribution of the ‘person’ attribute percent- age data is that the measures of central tendency for the data can be considered typical: in other words, in the serendipity experiences described by the Citation Classics, it is typical for people to comprise around one third of the total parti- cipants in an event structure and for the variability of this value to display the same consistency above and below the central point estimate. More broadly, the finding of the normal distribution raises the possibility that the average number of people involved in a serendipity experience described by a Citation Classic can be estimated simply from the total number of participants involved in the experience; for example, in a computational modelling situation, the modeller might begin with the point estimate for central tendency, creating ten participants and assigning the ‘person’ attribute to three or four of these participants.

320 6.4 Phenomenological event structures of serendipity ● 2 ● ● ● ● ● ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● Theoretical Quantiles Theoretical ● ● ● ● - This figure shows normal quantile-quantile ● −2 ● Model Two Person % of Total Person Model Two

60 50 40 30 20 10 Sample Quantiles Sample ● 2 ● ● ● ● ● ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● Theoretical Quantiles Theoretical ● ● ● ● ● −2 ● Model One Person % of Total Model One Person

60 50 40 30 20 10 Sample Quantiles Sample Figure 6.29: Quantile-quantile plotsplots of of the the ‘person’formed ‘person’ attribute by data attribute the percentage plotted data data suggest for that event the structure data are models normally one distributed. and two. The relatively straight lines

321 6. DESCRIPTIVE INFERENCES

6.4.3.2 Normal load

The study found consistent evidence that the Citation Classics narrators carried a normal load. Recall that pulled load was measured on the model two event structure, and pushed load, on the model one event structure.92 When added together, ego’s pulled and pushed load form ego’s total load.93 The histogram in Figure 6.30 showed a bell-curved shape for the distribution of the sum of ego’s normalised pulled and pushed load. Both the quantile-quantile plot illustrated in Figure 6.31 and a Shapiro-Wilk Test result (W = 0.970 , p = 0.234) also suggested that the summed data were normally distributed. A check of the study’s ego normalised total load data reiterated the research finding of statistical normality in the summed data. The quantile-quantile plots of the model one, model two and averaged ego normalised total load data illustrated in Figure 6.32 suggested that, despite some lack of correspondence in the extremes of the data, all sets of ego normalised total load data were normally distributed.94 As well, Shapiro-Wilk Tests performed on the model one (W = 0.961 , p = 0.097), model two (W = 0.974 , p = 0.337) and averaged data (W = 0.975 , p = 0.354) did not reject the null hypothesis. Another perspective on the normality of ego’s total load is provided by Figure 6.33, which shows histograms of the normalised total load data for ego and average vertices. Note the heavily right-skewed distribution of the average vertex data compared to the more bell-curved, normal distribution of the ego vertex data. As the average vertex data illustrate, no purely mathematical reason forces measures of normalised total load to follow a normal distribution; therefore, it is possible that a sociological reason lies behind the normal distribution of the ego vertex data in the sample event structures. An investigation of this possibility was outside the remit of the study.

92See section 5.7.1. 93Strictly speaking, for normalised load measures, this statement is not mathematically cor- rect. In fact, the sum of the normalised pulled and pushed loads must be divided by two to give the correct value for normalised total load (see section 3.5.6.5). However, for the purposes of the discussion here, this point is irrelevant; for example, the same Shapiro-Wilk Test result was recorded by the study, for both the summed data [ i.e. (a + b) ] and the summed data divided by two [ i.e. (a + b) ÷ 2 ]. 94Note that the earlier discussion of the study’s ego normalised total load data (see section 6.4.2.1) only references the averaged data.

322 6.4 Phenomenological event structures of serendipity

Distribution of the Sum of Ego's Measured Pulled and Pushed Load

20

15

10

5

0

0.4 0.6 0.8 1.0 1.2 1.4 1.6

Sum of Pulled and Pushed Load

Figure 6.30: Distribution of ego’s summed pulled and pushed load - This histogram shows the distribution of the sum of ego’s pulled and pushed load.

323 6. DESCRIPTIVE INFERENCES

Sum of Ego's Measured Pulled and Pushed Load

1.6 ●

1.4 ●

● ● ●● ●● 1.2 ●●● ●●● ●● ●●●● ●●● ●● ●●

1.0 ●● ● ●●● ●● ● ●●● Sample Quantiles ● ● ●● ● 0.8 ●

● ●

0.6 ● ●

−2 −1 0 1 2

Theoretical Quantiles

Figure 6.31: Quantile-quantile plot of ego’s summed pulled and pushed load - This figure shows a normal quantile-quantile plot of the sum of ego’s meas- ured pulled and pushed load. The relatively straight line formed by the plotted data suggests that the data are normally distributed.

324 6.4 Phenomenological event structures of serendipity ● 2 ● ● ● ● ● ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● ● ● ● ● Theoretical Quantiles Theoretical Averaged Ego Load Averaged ● −2 ●

0.7 0.6 0.5 0.4 0.3 Sample Quantiles Sample - This figure shows normal quantile-quantile ● 2 ● ● ● ● ● ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● ● ● ● ● Theoretical Quantiles Theoretical ● Model Two Ego Load Model Two −2 ●

0.7 0.6 0.5 0.4 0.3 Sample Quantiles Sample ● 2 ● ● ● ● ● ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● ● ● ● ● Theoretical Quantiles Theoretical ● Model One Ego Load −2 ●

0.7 0.6 0.5 0.4 0.3 0.2 Sample Quantiles Sample Figure 6.32: Quantile-quantile plotsplots of of ego’s the normalised ego totalwere total calculated load from load the data modelThe measured relatively one on straight and event lines two structure data. formed models by In the all one plotted cases, and data the two, suggest data as are that well the the as normalised data the measures are of averaged normally ego’s data, distributed. total which load.

325 6. DESCRIPTIVE INFERENCES 1.0 0.8 0.6 0.4 0.2 Average Vertex Average 0.0 Averaged Normalised Total Load Normalised Total Averaged - These histograms show the distributions of the 5 0 25 20 15 10 1.0 0.8 0.6 0.4 Ego Vertex 0.2 0.0 Averaged Normalised Total Load Normalised Total Averaged

5 0

25 20 15 10 Frequency Figure 6.33: Distribution ofaveraged data ego/average for normalised normalised total total load of load the for histograms. both the These ego data vertex were and originally the presented average vertex. in section Identical scales 6.4.2.1. allow comparison

326 6.4 Phenomenological event structures of serendipity

6.4.3.3 Normal patterns of interaction

The presence of the dominant motifs in the event structures described by the Citation Classics raised the possibility for normality on three levels:

1. the triad census level;

2. the role census level; and,

3. the attribute level.

Normal triads Recall that the sixteen patterns possible for a three-vertex graph are known in the social network studies literature as the triad census. Work with this triad census was a precursor to network motif detection.95 As these two—now, essen- tially parallel96—lines of research originated from the same theoretical starting point, the findings of one area are potentially relevant to the findings of the other. A recent triad census study offers a useful perspective from which to consider the present study’s motif findings. Faust (2010) compared statistically significant triads97 across a sample of 159 social networks of different types and magnitudes.98 She found that networks not only exhibit triadic patterning, but also that different types of networks are associated with different triads.99

95See section 3.5.7.3. 96Comments made by delegates at the 2011 UK Social Networks Conference were indicative of a degree of irritation with network motif detection’s perceived lack of acknowledgment (i.e. through formal citation in published works) of the social network field’s early introduction of (see Davis and Leinhardt, 1968, 1972) and long association with the triad census. As someone who has explored both lines of research, the present author agrees that the two lines seem to move in parallel, missing out on the potential for exchanges of findings and ideas; however, it does seem that, just as the network motif camp neglects the triad census, the triad census camp likewise ignores motif detection. 97Although Faust (2010) looked at all sixteen possible patterns, rather than only the thirteen normally formally considered ‘motifs’, this difference is not important for the purposes of the discussion here. 98Note that an earlier version of the study (Faust, 2006) involved a smaller sample of 51 networks. 99Faust (2010) also highlights limitations of the triad census approach, particularly the fact that lower order graph features (i.e. dyads) inherently constrain the appearance of triads in networks. Section 7.3.3 returns to this point.

327 6. DESCRIPTIVE INFERENCES

Like Faust (2010), the present study found triadic patterning in networks. It is the latter aspect of Faust’s finding—the link between network type and triad type—that has the most relevance for the present study. The four dominant motifs described by the study (see section 6.4.2.5) are equivalent to Faust’s (2010) ‘triad types’. The networks studied by the present research were of different magnitudes; in this regard, the study sample mirrored Faust’s sample. However, unlike Faust’s networks, the present study’s networks were not of different types: the present study’s networks all followed the normal form of the event structure.100 Thus, the four dominant motifs, or triad types, were all associated with the same type of network. Based on the findings of Faust (2010), the inference can be drawn that the four dominant motifs are typical or normal structural markers of the serendipity process described by the Citation Classics event structures.

Normal roles The consistent positioning of ego in the four dominant motifs highlights another aspect of normality in the sample event structures: the possibility of normal roles. The role census (Burt, 1990, 2010; Hummell and Sodeur, 1987) builds on the idea of the triad census. Visually, the role census can be conceived in a manner similar to the sixteen three-vertex patterns illustrated in Figure 3.7; however, the role census consists of the thirty-six possible different three-vertex patterns formed by an ego’s relations with two alters.101 The idea behind the role census is that it allows role equivalence,102 an alternative to structural equivalence103 to be determined. Burt (2010, p. 363) notes:

The important point [about the role census] is that the alters are defined by the pattern of their relations with each other and ego. They are not identified as individuals. The micro-structural orientations toward others represented by the triad types are components in roles. [The role census] is an inventory of the components . . . Two individuals [i.e. egos] then play the same role . . . to the extent that they are equivalently involved in the role components . . . .

100See section 4.4.1. 101For illustrations of the patterns, see Burt (2010, p. 362, Figure G8). 102See Wasserman and Faust (1994, Chapter 12), Burt (1990) and Burt (2010, p. 361–365). 103See section 3.5.6.3.

328 6.4 Phenomenological event structures of serendipity

In other words, the role census takes the triad census one step further. A local interaction pattern involving ego is not simply a process structure; rather, it becomes a process structure that represents a role. The relevance of the role census for the present study centred on whether or not the dominant motifs detected in the event structures could be deemed to provide information, not only about normal process structures, but also about normal process roles.104 Crucially, only ego, the person experiencing the serendip- ity, could retain his or her individuality in terms of these process roles; the indi- viduality of alters would be irrelevant for the roles. While the implications of the role census for the study’s dominant motif find- ings were intriguing, especially when considered in combination with the normal aspects of attribute presentation in the motifs (see below), the research could not go so far as to infer that the dominant motifs represented roles in the event structures described by the Citation Classics; the more modest claim that ego’s consistent positioning in the four dominant motifs added to the evidence for nor- mality in the serendipity experiences had to suffice. The role census derives from the social capital tradition, which in turn relies on the study of flow-model networks. The social capital tradition and the study of process are not entirely unrelated; however, it is questionable whether process patterns described as ‘roles’ in non-cumulative, flow-model networks translate to the same roles—or indeed, to roles at all—when described in cumulative, non-flow network models such as the event structures described by the Citation Classics.

Normal attributes The research found that certain attributes and attribute patternings appeared repeatedly across the four dominant motifs detected in the event structures. Ego’s alters in the motifs were most often people and knowledge. In several of the motifs, attributes were highly consistent in terms of the positions they occupied; for example, both alters in the collaboration motif were always people, while the third vertex in the exchange motif was normally information.

104It should be noted that the role census treats the study’s four dominant motifs as five distinct role patterns (see patterns 3, 8, 17, 22 and 29 in Burt, 2010, p. 362, Figure G8). This occurs because the chain motif describes ego occupying one of two possible positions: the role census counts these two positionings as different role interaction triads (i.e. patterns 8 and 22).

329 6. DESCRIPTIVE INFERENCES

Given the heterogeneity of the storyline details presented in the Citation Clas- sics narratives, the consistency of the attribute presentation in the motifs was surprising. It is possible that some of the consistency arose from the narratives’ relatively narrow grounding in a limited number of scientific contexts. Work and research in these areas of science may involve similar processes and interactions with similar sorts of ‘things’ (see Latour and Woolgar, 1986). Or, to put it an- other way, work and research in these fields may involve similar roles. Although the research could not infer from the evidence available that the dominant mo- tifs represented roles, the appearance—yet again—of the concept of roles was notable.105

6.5 A descriptive profile of process in serendipity

The descriptive inference process aimed to integrate the study’s findings related to to the two components of the theoretical aim into a descriptive profile of process in serendipity. It is important to stress that the descriptive profile presented here applies specifically to the experiences of serendipity in research described by the Citation Classics: the question of the transferability of the profile to other populations and contexts is discussed in section 8.2. It is also important to stress that the serendipity described by the descriptive profile is specifically understood as bounded by the inclusion and exclusion criteria that determined the serendipity subpopulation studied by the research (see sections 4.2.5 and 6.2.3). For emphasis and clarity, the profile employs bullet point summaries of the main inferences drawn in response to each of the study’s six research questions. A discussion of the implications of the inferences for the understanding of serendip- ity as a complex phenomenon follows each summary: inferences are held up in the light of serendipity theory and possible reasons for inference inconsistencies are considered. A brief summary and synthesis section closes the profile.

105Section 8.4.2 considers the potential for further research raised by the possibility of normal roles in serendipity.

330 6.5 A descriptive profile of process in serendipity

Serendipity in the Citation Classics: patterns, classes and types The research explored how lived experiences of serendipity related to serendip- ity classifications and patterns proposed in the literature. The comparison led to several inferences about the experiences of serendipity recounted in the Citation Classics.

ˆ Serendipity is an infrequent research experience.

ˆ Findings—not methods—are the research outcome most often associated with serendipity;

ˆ this said, the sought finding is as common as the unsought finding.

ˆ The ‘unexpectedness’ of serendipity lies in its process—the means to the end—as much as, if not more than in its outcome—the end itself.

ˆ ‘Types’ of serendipity exist;

ˆ however, the range of types is not especially wide.

ˆ Certain types of serendipity are much more common than others, and some types never occur at all.

Relating the lived experiences of serendipity recounted in the Citation Classics to serendipity classifications and patterns proposed in the literature also resulted in some general observations concerning the value and use of classification.

ˆ Serendipity classifications can overlap and interrelate.

ˆ Classifications applied in combination enable a more complex description of serendipity than is possible with a single classification.

ˆ Some serendipity classifications are easier to apply in practice;

ˆ such classifications tend to use uncomplicated descriptions and straight- forward language.

331 6. DESCRIPTIVE INFERENCES

Classification helps to develop a complex understanding of the serendipity ex- periences recounted in the Citation Classics: the rich descriptions possible and insight gained through this approach are above and beyond what a definitional approach allows. Indeed, the study’s classification results lead to questions about the theoretical validity of a definitional approach. For example, the prevalence of the sought finding in the Citation Classics examples illustrates a serendipity very different from the serendipity of historically important and oft-quoted defin- itions: the serendipitous ‘sought finding’ is an altogether different concept from Walpole’s ‘things not in quest of’ or Merton’s ‘unsought finding’. At the same time, classification is well placed to capture the ‘accidental discovery’ aspect of serendipity in the Citation Classics provided ‘discovery’ is understood as a verb (i.e. as process) rather than a noun (i.e. as outcome). With his conceptualisation of serendipity as both a word and pattern (and despite his insistence on the ‘unsought finding’), Merton sought to differen- tiate clearly between a definitional approach and an alternative, more com- plex approach to serendipity. His serendipity pattern may only be a one-class classification—and therefore, not a true classification; nonetheless, it is both a clear call for the recognition of serendipity as a complex phenomenon and a sug- gestion for how such an understanding of serendipity might be framed. Serendipity theory is increasingly heeding this call and turning to classifica- tion as a way to describe the complexity of serendipity: McCay-Peet and Toms (2011a), a study which incorporates Bj¨orneborn’s (2008) serendipity dimensions, and Makri and Blandford (2012a,b), a study which compares new empirical data not only with existing serendipity classifications but also with existing serendip- ity models are recent examples. The findings of the present study strengthen the argument for this turn to classification. The findings suggest that having more classifications to combine could be a good thing for the complex understanding of serendipity. The findings also show that much insight can be gained from existing classifications.

Serendipity filters in the Citation Classics: a contradiction The research considered the idea of sociological elements as obstacles to serendip- ity. Based on the results derived from both the literature review and the experien- tial stage of the study, the research made several inferences about the sociological filters detailed in the serendipity experiences described by Citation Classics.

332 6.5 A descriptive profile of process in serendipity

ˆ Sociological filters proposed in the literature are described in the lived serendipity experiences.

ˆ When they are reported as part of lived serendipity experience—which is not especially often—sociological filters are equally likely to appear in com- bination as singly;

ˆ in particular, pressures of time and money tend to be reported in combin- ation.

ˆ Specific references to sociological filters (i.e. those making up the process of a recounted lived serendipity experience) normally describe the positive impacts of filters;

ˆ in contrast, general references to sociological filters (i.e. comments not made in reference to specific events) focus on the negative impacts of filters.

The research findings suggest that the literature’s idea that certain sociological elements serve to filter serendipity does indeed play out in the lived experiences recounted in the Citation Classics. This said, the contradictory, and somewhat surprising, findings related to the perceived impact (i.e. positive or negative) of sociological filters raises at least two related questions: are negative filter impacts not recounted in lived experiences of serendipity because in actuality such negative filter examples become cases of ‘serendipity lost’? or, is the negative impact of pressures such as time and money more an element of serendipity ‘folklore’, a popular belief or myth perpetuated but only very loosely grounded in real, lived experience?

Retrospective sensemaking of serendipity in the Citation Classics The research examined how people retrospectively interpret and make sense of their experiences of process in serendipity above and beyond a basic recounting of events. Based on narrators’ sensemaking, the research made several inferences about the retrospective interpretations of serendipity presented in the Citation Classics.

ˆ A combination of specific intentions—clearly articulated goals and methods— is the most common starting point for serendipity;

333 6. DESCRIPTIVE INFERENCES

ˆ this said, ‘playing around’—beginning with no articulated intentions—also features.

ˆ The unexpected dominates the expected in serendipity:

ˆ an unintended outcome resulting from unintended process is the most com- mon end point for serendipity.

ˆ The ‘unexpectedness’ of serendipity relates in equal measure to outcome and method.

ˆ Recognition does not factor strongly in serendipity sensemaking;

ˆ when referenced, immediate and delayed recognition present about equally.

ˆ Linear interpretations of serendipity are more common than nonlinear in- terpretations.

Note that several of the phenomenologically-derived research inferences contra- dicted the classification-derived research inferences. This inconsistency highlights the difference that perspective can make to an understanding of serendipity: the ‘inside view’ may differ from the ‘outside view’. Sociological research recognises that the ‘insides’ and ‘outsides’ of narrative descriptions offer unique opportunities for complex descriptions of phenomena (Edwards, 2010; Knox et al., 2006; Riles, 2001). Certainly, an understanding of a complex phenomenon such as serendipity must be prepared to accommodate multiple perspectives: such multiplicity is inherent in complexity. The research also produced findings related to disciplinarity, especially mul- tiple disciplinarity. These inferences relied on both phenomenological data and Citation Classics metadata.

ˆ Multiple disciplinarity, although relevant to serendipity, is neither dominant nor essential.

ˆ Most of the serendipity experiences relate to the sciences, especially the life sciences and medicine.

334 6.5 A descriptive profile of process in serendipity

Taken together, the research findings contribute to a complex understanding of serendipity by suggesting that the real-world serendipity sensemaking described by the Citation Classics differs somewhat from the theoretical ideal presented in the literature. On some points, the real-life descriptions and the literature agree. Unexpectedness is indeed a marker of serendipity; this said, unsought process— accidental doing and happening—is as dominant as the unsought finding, a point not always reflected in the literature. As well, like the large population of re- search anecdotes found in the serendipity literature, most of the Citation Classics serendipity experiences centre on the sciences. On other points, the Citation Classics serendipity experiences differ markedly from the literature’s theoretical ideal: linearity dominates, while multiple discip- linarity does not. As well, the Citation Classics provide little support for the idea of sudden recognition in serendipity—the so-called ‘flash’ of inspiration. Put simply, when the retrospective interpretations and sensemaking presented in the Citation Classics are analysed systematically, the real-life serendipity described proves rather more mundane than the serendipity idealised by theory.

Event structures of serendipity described by the Citation Classics The research described event structures derived from personal accounts of serendipity in research. The literature associates rich networks with process in serendipity.106 As such, the study sought to describe the complexity of the serendipity event structures described by the sample Citation Classics. This complexity was captured in visualisations of the event structures and in measures of magnitude and cohesion made on the event structures. Information related to event contexts and participant attributes contributed further detail to the picture of event structure complexity. The research drew several inferences about the serendipity in the Citation Classics based on the event structure descriptions.

ˆ When their event structures are compared, although the serendipity exper- iences display variety, they also appear to share similar features.

106See section 2.3.3.5.

335 6. DESCRIPTIVE INFERENCES

ˆ Imperfect, yet still discernibly star-like patterns are common in the event structures of the serendipity experiences;

ˆ the persons experiencing the serendipity—indeed, people in general—are often found towards the centres of the event structures, particularly in the star-like examples, suggesting that people play a central role in serendipity process.

ˆ The event structure descriptions enable estimates of the ‘average’ number107 of participants (15–16) and processes (31–33) involved in the serendipity experiences:

ˆ participants to process describe a rough 1 : 2 ratio.

ˆ The number of participants involved in the serendipity experiences is more consistent than the number of processes involved.

ˆ The ‘average’ cohesion (0.14–0.15) of the event structures falls toward the lower end of the values for sparse networks normally expected for real- world sociological networks, suggesting that process in serendipity is a less- interconnected social process.

ˆ Outlier examples of serendipity experiences present;

ˆ the outliers differ for magnitude and cohesion, a situation that emphasises the complexity of the serendipity experiences.

ˆ The serendipity experiences comprise multiple event contexts (5–6, on ‘av- erage’):

ˆ even at a minimum, at least two different times and/or locations are de- scribed.

ˆ All three broad attribute types are found in the serendipity experiences:

ˆ ‘knowledge’ is the most prevalent attribute, followed by ‘person’, and then ‘object’.

107In actuality, the study estimated the median rather than mean.

336 6.5 A descriptive profile of process in serendipity

ˆ The distribution of attributes in the serendipity experiences agrees with the literature’s focus on people and information;

ˆ this said, objects do feature noticeably—the serendipity experiences by no means involve only people and information.

A response to the research question ‘what event structures are described?’ fundamentally relies on the presentation of description. The implication for the present study was that this description should be detailed and precise—and in- deed, that the merit of the description would lie in its detail and precision.108 As such, the inferences, especially the quantitative estimates, presented by the bullet point list above should only be considered together with the more detailed description presented in section 6.4.1.

Relational aspects of serendipity in the Citation Classics The research examined how the description of the event structures contrib- uted to understandings of relational aspects of process in serendipity. Several relational aspects were considered, including load, activity, interaction and inter- action patterns. The study of the event structures led to a number of inferences about the relational aspects of process in the serendipity experiences described by the Citation Classics.

ˆ For narrators (i.e. the persons experiencing serendipity), the ratio of pulled to pushed process is roughly 4 : 1.

ˆ The amount of pull is more consistent than the amount of push;

ˆ indeed, while pull is always present, push may be absent.

ˆ Multiple interactions do not affect the rough 4 : 1 pull to push ratio;

ˆ however, when multiple interactions are considered, the inconsistency of push increases.

ˆ Narrators carry well above average process load—approximately 2.5 times the average—and carry this load with above average consistency.

108See section 1.4.2.2.

337 6. DESCRIPTIVE INFERENCES

ˆ In just over three quarters of the serendipity experiences, narrators have the highest process load;

ˆ in the remaining examples, ‘others’ have higher process loads than narrat- ors, a situation that emphasises the point that process in serendipity need not necessarily focus on the person experiencing the serendipity.

ˆ Irrespective of the higher process loads of either narrators or ‘others’, over- all, the tendency for process to focus on single participants in the serendipity experiences is weaker rather than stronger given the maximum potential for focus.

ˆ Narrators interact with a large proportion of the other participants in serendipity experiences:

ˆ only about 14% of the participants are beyond the direct reach of narrators.

ˆ Sequence—dependence on process occurring in a certain order— is estim- ated to impact approximately one quarter of the process interaction in serendipity, effectively reducing narrator reach.

ˆ Narrators interact with at least half of the other serendipity participants.

ˆ This said, interaction with all other participants is not uncommon:

ˆ about one quarter of the serendipity experiences describe interaction rates of 100%.

ˆ While narrators interact directly with more objects than people in the serendipity experiences, the role of people—and the consistency of this role—increases when sequence is taken into account.

ˆ Knowledge is distinctive in that it is possible for narrators to interact dir- ectly only with knowledge in serendipity experiences.

ˆ As well, sequence impacts narrators interactions with knowledge least of all (i.e. compared to people and objects).

ˆ The serendipity experiences do describe global interaction patterns;

338 6.5 A descriptive profile of process in serendipity

ˆ however, with respect to these patterns, the serendipity experiences display a complex mix of focus and randomness, with no single pattern dominating overall.

ˆ The serendipity experiences also describe local interaction patterns, some of which are more likely to appear together than others.

ˆ In the four local interaction patterns that dominate the serendipity experi- ences, people and information are notable for their presence.

ˆ The serendipity experiences can be grouped according to local interaction patterns:

ˆ one large cluster, which comprises the majority of the serendipity experi- ences, and two much smaller clusters form alongside a few isolate examples.

The study of the relational aspects of the event structures described by the Citation Classics helps to develop an understanding of the complex mix of focus and randomness, of the active ‘doing’ and passive ‘happening’, and of the cent- ralised and distributed process that appears at multiple levels in the serendipity experiences. On the one hand, the person experiencing the serendipity is central to the process and interaction; on the other hand, the distribution of process and interaction amongst others, along with the limited potential for access to this distributed process and interaction, emphasises that there is more to serendipity process than simply ‘being central’. Sequence and process reach seem to play a role as do load level and the distribution of pull and push. Although the research inferences shed light on the level of load carried by persons experiencing serendipity, the study results cannot resolve the literature’s conflicting views on whether a smaller or larger load is beneficial for serendipity.109 While the research findings suggest that the process load of Citation Classics nar- rators is higher on average, the findings also indicate that this load remains well under the maximum load possible: given this situation, is the narrator’s load lar- ger or smaller? A point about load that the research findings emphasise is that an answer to this question is inherently relative, dependent on the perspective

109See section 2.3.3.5.

339 6. DESCRIPTIVE INFERENCES

adopted and the relational process structure of the serendipity experience: cer- tainly, some serendipity experiences offer greater potential for larger absolute loads. Along with the role of load, the literature recognises roles for both pull and push in serendipity; however, disagreement is evident as to how push and pull distribute. Some (for example, see P. H. A., 1963) argue that active ‘doing’ forms the bulk of serendipity, while others (for example, see Schumpeter, 2010) emphasise passive ‘happening’. Certainly in the serendipity described by the Citation Classics, pull dominates: push need not even occur. It is important to stress that the present study focused on describing the frequency, distribution and prevalence of pull and push in serendipity. Importance need not relate to these measures: in theory, a single pushed process could be the ‘magic key’ to serendipity. In practice, given the complex relational aspects of serendipity described by the event structures, the present study finds little, if any, evidence to support the idea of a ‘magic key’ to serendipity: even if a single pushed process were key, it seems unlikely that this single process could have an effective impact without the support of the complex relational structure formed by process in serendipity.

Normality in the serendipity described by the Citation Classics The research considered to what extent, if at all, the described event structures supported the idea of average, typical or normal process in serendipity. The research found evidence for normal structure, normal load and normal patterns of interaction in the event structures. The research drew several inferences that support the idea of normality in the serendipity experiences described by the Citation Classics.

ˆ Evidence of normal structure presents.

ˆ Certain visual features seem to repeat across the event structures.

ˆ The event structures display typically sparse levels of cohesion that fall within the normal range for real-world sociological networks.

340 6.5 A descriptive profile of process in serendipity

ˆ While knowledge is most prevalent, it is typical for an event structure to involve approximately one third people.

ˆ Evidence of normal load also presents.

ˆ Unlike the average total load, the narrator’s total load displays statistical normality;

ˆ the normal distribution of narrator load may result from sociological effects.

ˆ Local patterns of interaction in the event structures display normality on at least two levels: the triad census level and the attribute level.

ˆ The four dominant patterns can be considered typical structural markers of the serendipity process described by the event structures.

ˆ The attribute presentation in these dominant patterns displays consistency with respect to type and positioning of attribute.

ˆ The possibility for normality at the role census level remains open:

ˆ although ego occupies consistent positions in the dominant patterns, given the nature of the event structures, this consistency may or may not be indicative of roles.

The evidence for normality offers an alternative view on the seemingly het- erogeneous serendipity experiences described by the Citation Classics: while at the level of storyline detail, the experiences display great variety, the experiences display a number of similarities at the level of process structure. Some of these similar aspects are deemed normal because of comparisons that can be made with known values—for examples, expected density measures—reported in the literat- ure. Other aspects are statistically normal: certain study data follow a normal distribution when no mathematical reason forces them to do so, a situation that raises the possibility that sociological effects may be at play. As emphasised throughout this chapter, the study focused on descriptive in- ferences. While the findings in support of normality are intriguing, the research

341 6. DESCRIPTIVE INFERENCES

evidence is best viewed conservatively as it remains tied to the descriptive stage of understanding serendipity. The next stage should be to determine what causes, if any, underly the nor- mality found in the serendipity experiences described by the Citation Classics: identifying the reasons for the normality could open new doors to the understand- ing of serendipity.

Summary and synthesis Classification of the serendipity experiences described by the Citation Classics enables complex description of serendipity process, especially when classifications are used in combination. This research finding supports the recent turn to clas- sification in serendipity research: the rich description possible with classification seems to far surpass the simplistic description that can be achieved through the use of definitions, historically, a common approach in the serendipity literature. The serendipity experiences described by the Citation Classics do indeed ref- erence sociological elements as filters for serendipity process; however, the view on filters is contradictory, different for real and abstracted situations. It would seem that the impact of sociological filters can only be understood as entirely context- dependent. As such, sweeping statements bemoaning constraints on serendipity seem unwise: what is a constraint in one context may not be a constraint in another. While the ideal presented by theory and the sensemaking of serendipity pro- cess in real-life share some similarities, they also show some clear discrepancies. For example, both serendipity theory and the lived experiences described by the Citation Classics focus on the unexpectedness of serendipity; however, although theory marks multiple disciplinarity out as highly relevant for serendipity, the lived experiences give the issue of multiple disciplinarity only cursory attention. The event structures unveiled by the research provide rich description of the complex serendipity process described by the Citation Classics. The description, both visual and statistical, is precise and detailed, providing information about serendipity process at multiple levels. The relational nature of the event struc- tures allow relational aspects of serendipity process to be explored, leading to a complex understanding of how process and participants in serendipity are inter-

342 6.5 A descriptive profile of process in serendipity twined and dependent. The event structure descriptions also provide evidence for normality in serendipity. Based as it is, on description alone, this evidence is relatively weak in its current state; however, research focused on identifying potential causes for the normality might strengthen the evidence. Of course, the study’s descriptive profile of process in serendipity must be understood for what it is: a detailed and precise description of a few selected as- pects of serendipity process in a specific population. Even serendipity process is, in turn, only one aspect of the complex phenomenon that is serendipity. The lim- itations of the research and the impacts of these limitations on the transferability of the research are considered systematically in Chapters 7 and 8. Suffice it to say here that some might argue that ‘artificially’ dividing a com- plex phenomenon—which in its very nature can only be fully understood as a whole—into parts is misguided. The author disagrees. A complex phenomenon such as serendipity can be overwhelming, especially as the subject of empirical research. Breaking down a complex phenomenon, even ‘artificially’, provides new perspective and forces the researcher to confront detail. Because complexity—a painting, a symphony, a novel—inherently encom- passes detail—the brush stroke, the melodic motif, the turn of phrase—familiarity with detail can support complex understandings. Once articulated, detail also becomes harder to ignore; certainly, any theory focused on developing a complex understanding of a phenomenon requires some awareness of detail in order to reach maturity.

343 6. DESCRIPTIVE INFERENCES

344 Chapter 7

Credibility Audit

This chapter audits the research. Teddlie and Tashakkori’s (2009) Integrative Framework for Inference Quality structures the main discussion, which addresses the issues of design quality and interpretative rigour. The audit concludes with a summary statement concerning the credibility of the research.

7.1 Introduction to the audit

Before the transferability of research inferences can be considered, the quality of the inferences must be assessed. As discussed in section 4.7.1, quality judgements in quantitative and qualitative research not only rely on different evaluative con- structs but also adopt different perspectives on the role of the researcher. An assessment of mixed methods research needs to integrate these different—and often, conflicting—standards. Section 4.7.2 introduced Teddlie and Tashakkori’s (2009) Integrative Frame- work for Inference Quality as a useful tool for assessing the credibility of mixed methods research:1 “Because the integrative framework incorporates many of the standards of quality from the QUAL [qualitative] and QUAN [quantitative] ap- proaches, it is also applicable to each of the strands and assists the MM [mixed methods] researcher by providing at least partially common sets of standards” (Teddlie and Tashakkori, 2009, p. 300).

1See Table 4.6 for the complete framework.

345 7. CREDIBILITY AUDIT

The framework supports two broad concerns related to research credibility:

1. design quality, which assesses the implementation of the research; and,

2. interpretative rigour, which evaluates the interpretation of the results.

Sections 7.2 and 7.3 consider research criteria related to design quality and in- terpretative rigour. The proper evaluation of research procedures and inferences depends on attention to detail; this said, space in the audit is limited—certainly, a full and detailed discussion of every aspect of the research cannot be accommod- ated. As a compromise, the audit often relies on the use of selected examples from the research to address the questions posed by Teddlie and Tashakkori’s (2009) Integrative Framework. The examples chosen are deemed especially important to the credibility of the research.

7.2 Design quality

7.2.1 Design suitability

The research sought to develop an understanding of serendipity as a complex phenomenon, and in so doing, to make contributions to serendipity theory and serendipity research.2 Recall that certain methodological motivations drove the conceptualisation stage of the research and that the theoretical aim of the research was based in part on four explicit methodological concerns.3 As such, the study was unusual in that its research questions emerged from both methodological and theoretical issues: normally, methodological concerns are expected to come into play only after research questions have been articulated. The research focus on the description of process in serendipity required plur- alistic perspectives: no single method could have captured the relational and phenomenological aspects of serendipity along with the tripartite elements of process. As well, the pragmatic stance of the research and the explicit aim of de- tailed, precise and, in part, quantitative description, demanded the use of mixed methods. 2See section 1.4.7. 3See sections 1.4.2 and 1.4.4.2.

346 7.2 Design quality

The aim of the research comprised two components, each associated with a distinct set of research questions.4 Every research question concentrated on a different aspect of process in serendipity; however, taken in combination, all of the research questions addressed the description of process in serendipity. In other words, the questions all supported the same overarching aim. The mixed methods design was crucial for the implementation of the research. The parallel nature of the research design meant that the research questions in each set could be explored using the method(s) that those questions required; for example, the survey approach adopted for the investigation of the serendipity classifications would have been completely inappropriate for the relational study of the event structures. The conversion aspect of the research design ensured that the study remained ‘true’ to the qualitative nature of the Citation Clas- sics dataset, even as the study unveiled the quantitative information held in the dataset.

7.2.2 Design fidelity

Openness about methods implementations is an important aspect of research credibility. The thesis aimed to make available as many pragmatic details of the study as possible to ensure transparency:

ˆ Appendix D provides links to the online Citation Classics dataset;

ˆ sections 4.2.5 and 4.2.6 provide full details of the sampling procedures;

ˆ section 4.8 lists the software used, while Chapter 5 details the reports and software settings selected;

ˆ the network portfolio presents the matrices for every event structure coded and analysed by the research;

ˆ the portfolio’s Appendix provides the qualitative vertex labels that allow the quantitative matrices to be linked back to the qualitative Citation Classics; and,

4See section 6.2.5.

347 7. CREDIBILITY AUDIT

ˆ information on the R packages employed and the parameter settings for the functions called are provided for the statistical tests.

From these details, and the methods discussions in Chapters 4 and 5, the reader can trace the implementation of the research through the stages of sampling, data collection and data analysis.

Sampling The selection of an appropriate sample that balanced quantitative and qualitat- ive concerns was a key starting point for the research. Because of the quantitative aspects of the study, the sample needed to be suitable for statistical analysis— that is, random, and large enough in relation to the population size. This said, given the qualitative nature of the data, the sample also needed to ‘make sense’: inclusion/exclusion criteria were necessary, as were semantic checks on keyword searches. In practical terms, the research was concerned that the sample effectively be ‘large enough’ to accommodate minor inconsistencies in the data collection and be ‘random enough’ to assess normality. Evidence from the experiential stage of the research suggested that the sample size was adequate for the study needs. The statistical analyses were initially run with an error in the data;5 the large sample size helped to ensure that the error did not affect the research inferences.6 With respect to randomness, a random selection procedure did underlie the study sample. This point, considered alongside the fact that the sample comprised most of the study population,7 meant that any lack of randomness resulting from the inclusion/exclusion criteria or semantic checks was not deemed a major concern: it was highly unlikely that the nine population examples not included in the sample would be so different from those included as to significantly alter the research inferences. 5See Figure/Table 30.2 in the Supplementary Volume. Note that the statistical results presented in section 6.4 all derived from the corrected data. 6The use of nonparametric statistics also factored. 7See section 4.2.6.

348 7.2 Design quality

Data collection The systematic collection of event structures from the Citation Classics texts was a primary concern of the study. Along with standard approaches intended to limit threats to reliability, such as multiple passes and error checking, the study employed a two-model approach; although this approach did have theoretical aims,8 it also had methodological aims. Because the underlying data were the same for model pairs, if the data collection was systematic, then certain measures made on models pairs would be similar. Correlations between the two models were expected on several measures, and in some cases, such as vertex counts, similar estimates of central tendency were also expected. The statistical results related to correlation and hypothesis testing indicated that the data collection procedures met these expectations.9 The visual results seemed to agree.10 Decisions and limitations on the coding procedures were also necessary. For example, the decision to include only the three attributes of ‘person’, ‘knowledge’ and ‘object’ in the event structures was based on evidence from the literature; however, the decision was also based on a need to keep the collection and analysis procedures manageable. Hindsight suggested that other attributes—for example, ‘task’ to type concepts coded as ‘the test’ or ‘the experiment’—might have fine- tuned the research inferences; nonetheless, additional attributes could also have overburdened the research. There are benefits to ‘keeping it simple’ in initial attempts at detailed description. Dealing appropriately with the impact of researcher subjectivity was a major issue for the study; for example, coding decisions needed to be made as objectively as possible. To enhance objectivity, narrative norms were employed to establish boundaries and existing relational text analysis approaches were used. This said, subjective decisions did necessarily come into play due to the semantic complexity of the Citation Classics narratives.11 8For example, some measures had more meaning if captured from model one. 9See section 6.4.1. 10See section 6.4.1.1 and the network portfolio. 11See section 5.3 for a discussions of the problematic situations encountered during coding.

349 7. CREDIBILITY AUDIT

Multiple passes to improve within model reliability and explicit coding rules12 were employed; nonetheless, beyond the results of the statistical tests mentioned above, it was difficult to asses the extent of the impact of the subjective de- cisions taken. If multiple researchers had been available to code the texts (for example, see Krippendorff, 2009), then perhaps a better assessment would have been possible.

Data analysis Note that most of the discussion related to data analysis is found in section 7.2.4; here, the focus is on concerns related to the study’s use of a combination of automated and manual analyses. This combined approach was necessary for two reasons:

1. the qualitative nature of the Citation Classics; and,

2. the volume and complexity of the quantitative analysis.

It is important to recognise that researcher decisions affected not only the manual but also the automated analysis; for example, ORA analysed the networks a certain way because the researcher selected certain software settings and re- ports. Checks showed that, once input instructions were received, the software used by the study normally implemented the analyses systematically and without problems: the two faulty FANMOD output files were the only notable exceptions to this rule.13 The manual analysis was more difficult to implement systematically; for ex- ample, during the manual follow-ups on the motif detection, the researcher ini- tially missed a motif involving ego. As with the data collection, multiple passes were important aspects of the manual analyses. There is no doubt that the automated procedures aided the analysis of the data. Complex computations and large amounts of data were easily handled. As well, the illustration of the data benefitted greatly from automated proced- ures. For example, the network visualisations were drawn using spring-embedded

12See Appendices F to H. 13See section 6.4.2.5.

350 7.2 Design quality algorithms;14 such precise drawings would have been impossible using manual methods—certainly, at least, for the more complex networks. As an illustrative example of this point, Appendix O presents a network drawing made in the pilot stages of the research before automated methods were implemented. Recognising the advantages of automation, the researcher initially considered using automated data collection methods as well as automated data analysis methods;15 however, the researcher found that the semantic complexity of the Citation Classics made automated collection methods cumbersome and often, semantically inaccurate. Of course, the manual collection methods used by the study also had the potential for semantic inaccuracy; nonetheless, the researcher decided that, on balance, the manual methods were more useful because they offered the necessary flexibility to accommodate complex semantic situations.

7.2.3 Within-design consistency

The research adopted a multi-faceted approach to promoting within-design con- sistency: clear definitions, careful integrations and logical progressions all played roles. As the research focused on process description, the explicit articulation of a definition of process was an important early step in ensuring that the two strands of the study contributed to the single overall aim of the research.16 The research also drew on several different methodologies;17 the use of a multimethodological framework that integrated themes from these differing methodologies served to establish consistency in the research.18 The involvement of multiple, different methodologies inevitably meant the use of multiple, different methods. Care was taken to ensure that each method used progressed logically and seamlessly into the next method used. For example, the network methods decisions were all predetermined by method decisions taken earlier in the study:19 boundary decisions were prescribed by narrative norms from the narrative analysis stage, while network model and mode were in turn

14See section 4.5.3. 15See section 4.4.3.1. 16See section 1.4.3.1. 17See Chapter 3. 18See section 3.6. 19See section 4.5.2.

351 7. CREDIBILITY AUDIT dictated by the relational text analysis stage.

7.2.4 Analytic adequacy

The study adopted general measures to improve analytic adequacy; for example, nonparametric statistics were employed due to the non-normal distributions of the study data. As well, whenever possible, automated quantitative analysis procedures were used in a bid to reduce the risk of error resulting from manual calculation (see section 7.2.2). The research also needed to consider study-specific issues related to analytic adequacy. Two such issues included the qualitatively-based quantitative choices made with respect to the network analysis and the effects of estimation and missing data.

Network analysis choices As emphasised in section 7.2.3, the network methods decisions were predeter- mined by earlier stages of the research; for example, although from a network analysis perspective, the single mode, multi-attribute approach adopted to the event structures was unusual, both the aims of the research and the event struc- ture coding decisions necessitated such an approach. As well, the research took sociocentric and egocentric views on the same network data—again, an unusual approach in network studies; however, the two perspectives were required to address the research questions. The network studies literature recognises that ‘unusual’ approaches in network analysis are sometimes necessary: network ana- lysis choices must be based on the aims and needs of the research (Borgatti and Everett, 1997). While network methods decisions were predetermined by the event structure coding, the selection of network measures was not. The analyst had to choose concrete network measures to represent the abstract serendipity constructs of the theoretical framework.20 Sometimes, several network measures presented as pos- sibilities for capturing an abstract construct; for example, the network measures of density, average degree and clustering all measure cohesion.

20See section 4.5.4.

352 7.2 Design quality

Density, although sometimes considered a poor cohesion measure (de Nooy et al., 2005; Scott, 2000), was selected by the study in part to allow comparisons with existing empirical evidence (for example, see Faust, 2010). As well, density in directed networks equates to average degree centrality, which, as a normalised measure, is comparable across networks.21 Some (for example, see de Nooy et al., 2005, p. 63–64) claim that average degree (i.e. the absolute form of average degree centrality) is more comparable than density across networks of different magnitudes. The present author disagrees: a network with an order of four is inherently mathematically more limited in its potential for average degree than a network with an order of forty. Although clustering coefficients22 do capture the idea of cohesion better than density from a sociological rather than purely structural perspective, the Watts Strogatz clustering coefficient (Watts and Strogatz, 1998) employed by ORA (i.e. the clustering coefficient that was available to the study) favours vertices with low degree and, as such, “gives a rather poor picture of the overall properties of any network with a significant number of such vertices” (Newman, 2010, p. 204). The event structures often contained many vertices with low degree as evidenced by the star-like shapes that presented; since the study’s understanding of cohesion focused on overall structural properties, the use of the Watts Strogatz clustering coefficient would not have been an appropriate choice. While decisions on network measures were based on knowledge of serendip- ity theory and network analysis, it is important to stress that the choices made also involved researcher judgement; for example, the argument for translating the concept of load to the measure of degree was not the only argument possible. Meaning making in network analysis is achieved by associating network meas- ures with real-world concepts; in sociological contexts, the associations made are normally representative of choices, not givens.

Estimation and missing data Estimation error and missing data effects were not always quantifiable in the study. The research relied on estimation procedures for the analyses of absolute

21See section 3.5.3 for a discussion of absolute versus normalised measures. 22Several types of clustering coefficient exist. See Newman (2010, p. 198–204).

353 7. CREDIBILITY AUDIT total load and ego’s activity.23 Statistical evaluations of the absolute total load estimates were possible;24 however, no quantitative assessment of the activity estimate was possible. This said, the activity results did suggest that, as expected, over- rather than underestimation had occurred.25 Because the results provided some evidence that the estimation error was as expected, the activity results were incorporated into the research inferences; however, the research explicitly acknowledged that the inferences relating to the activity results were inherently less secure. As is common in network studies (Kossinets, 2006, p. 248), the research could not quantify the effects of missing network data across the study sample.26 This said, an error was quantified in one of the sample networks (see Figure/Table 30.2, Suppl. Vol.): the error increased the link count from 118 to 122, and raised the density from 0.112 to 0.116 . The error made almost no difference to the results of the network measures used by the study; however, the error could have had a greater impact if differ- ent measures had been selected. For example, the error meant that the network comprised two components rather than one,27 a situation that would have signi- ficantly altered some measures of connectedness. It is also important to stress that the error appeared in a larger network; had the error appeared in a smaller network, then the impact would likely have been greater. The effects of this error in the one study network highlight two points about the possible impact of missing data in the networks across the study sample. Firstly, some network measures are more susceptible to certain types of missing data than others (Frantz and Carley, 2005; Kossinets, 2006). The study’s selec- tion of network measures had this point in mind. Secondly, missing data tend to impact smaller networks more than larger networks: on balance, the study

23See Appendices I and J. 24See section 6.4.2.1. 25See section 6.4.2.3. 26Correlation and hypothesis testing helped to evaluate consistency in the coding (see section 7.2.2). Missing data is a different issue. Two ‘consistent’ networks might each have a vertex count of 6; however, if the networks should each have a count of 8, then they are both missing data. 27Components are disconnected subgroups in a network. See Newman (2010, p. 142–145) for a detailed explanation of components.

354 7.3 Interpretive rigour networks—like many sociological real-world networks (Newman, 2010)—were on the smaller side.

7.3 Interpretive rigour

7.3.1 Interpretive consistency

The research drew inferences that were consistent with the results obtained. The results were descriptive; as such, causal inferences were not attempted. Some results were stronger than others, a point reflected in the intensity of the inferences drawn; for example, the inferences related to activity were more conservative than the inferences related to load because the activity results relied heavily on estimation procedures that could not be quantitatively evaluated (see section 7.2.4). As the introduction to the descriptive profile stated,28 inferences were also limited to the population investigated. The research looked out for inconsistencies in inferences drawn from the same results; for example, did any of the inferences based on the motif detection con- flict? The study found that inferences drawn from the same results agreed. Note that inconsistencies did present in inferences drawn from different results. In mixed methods research, such inconsistencies are not necessarily considered prob- lematic (see section 7.3.5). The research also ensured that the inferences reflected not only the type of evidence found but also the nature of the data studied. Quantitative description was an explicit aim of the study, and indeed, many of the lower level research inferences were quantitative in nature; however, in recognition of the nature of the Citation Classics, the higher level inferences were predominantly qualitative.

7.3.2 Theoretical consistency

The research inferences were consistent with the current state of serendipity the- ory; indeed, the fact that problems with the state of serendipity theory are expli- citly recognised in the literature (see Sun et al., 2011) was a major influence on

28See section 6.5.

355 7. CREDIBILITY AUDIT

the study’s decision to focus on descriptive, as opposed to causal, inferences.29 Although not always in specific agreement with other studies’ empirical find- ings, the study’s findings broadly agreed with and complemented existing know- ledge of serendipity. For example, the research did not find the extensive evidence for multiple disciplinarity needed to support some claims made by the serendip- ity literature—say, that multiple disciplinarity is the ‘embodiment’ of serendipity (D¨ork et al., 2011); however, in agreement with existing empirical research (Foster and Ford, 2003), the research did find some evidence that multiple disciplinarity is, at least, relevant to serendipity. It is true that some of the research findings were ‘new’ conceptually to serendip- ity theory. For example, no precedent exists in serendipity research for describing serendipity experiences as event structures. This said, neither the idea of a so- ciological process perspective on serendipity (Cunha et al., 2010; Merton, 2004) nor the concept of the event structure (Labov, 2003) are ‘new’. In other words, the study’s ‘new’ findings did not come out of thin air: they were built on ex- isting theories and empirical research, even if these theories and research were sometimes from outside of serendipity studies.30

7.3.3 Interpretive agreement

Interpretative agreement encompasses both scholarly agreement and agreement with research participants’ constructions. Teddlie and Tashakkori (2009, p. 304) note that a “formal demonstration” of scholarly interpretative agreement is ac- ceptance of research by means of the peer-review process. As the list of research- related papers and presentations presented in Appendix P indicates, some aspects of the research have already been subjected to the scrutiny of other scholars, either through a formal peer-review process or through ad hoc questions raised at seminar presentations. As the research investigated historical data, checks with participants to assess interpretive agreement were not practical, and in many cases, would have been

29See sections 1.2.3 and 6.1. 30Of course, the question of what exactly ‘serendipity studies’ encompasses (or should en- compass) remains very much open. The serendipity literature is certainly broad, yet somehow, disconnected (see section 2.1.2).

356 7.3 Interpretive rigour impossible. In its early stages, the research did consider comparing the events reported in the Citation Classics with the events reported in the associated formal research reports; however, such an approach was deemed incompatible with the phenomenological stance of the research. The discussion here considers two examples of issues related to the interpretat- ive agreement of the research with the Citation Classics narrators’ constructions of serendipity:

1. the relevance of the event-based approach to narrative; and,

2. the idea of motifs as basic building blocks.

Narrative as events Event-based approaches to narrative assume that the events in a narrative represent a credible phenomenological description of experience;31 for example, concerns related to informant accuracy, such as the ‘incorrect’ remembering of events, are not generally considered problematic (Elliott, 2005). As such, inter- pretative agreement in event-based narrative research depends on whether or not the research accurately captures the events as reported in the narrative. The present study’s collection of narrative events was guided by established event- based narrative approaches;32 these approaches were used to ensure an as accurate as possible coding of narrative events.

Motifs as building blocks The idea of motifs as fundamental building blocks of larger structure is open to debate.33 Some of this debate rests on theoretical issues, such as the point that statistical significance need not equate to functional importance (Milo et al., 2002); however, the debate also draws on concerns related to mathematical mat- ters. For example, motifs contain dyads; as such, the presence or absence of motifs in a network is inherently constrained by the presence or absence of cer- tain types of dyads in that network. Recent research (Faust, 2010) addresses the

31See section 3.3.1.3. 32See section 3.3.2. 33See section 3.5.7.3.

357 7. CREDIBILITY AUDIT

issue of mathematical constraints in motif detection; however, many questions remain unanswered. The motifs detected by the present study derived from the event structures. On a basic level, provided the event structures agreed with participant construc- tions (i.e. captured narrative events faithfully), then it followed that the motifs also agreed (i.e. because they were simply smaller components of the larger event structures). However, the key point about motifs is that they are not just smaller components: motifs are fundamental building blocks of a construction. Although the research detected motifs, it could not confirm if the detected motifs contributed to constructions. For example, had the research been able to draw a firm conclusion that the motifs detected equated to roles,34 then an ar- gument could have been made that participants constructed serendipity in terms of roles; however, the research findings only hinted at the possibility of roles. As such, it remains unclear whether or not the motifs detected by the study represent role constructions of the participants.

7.3.4 Interpretive distinctiveness

The research adopted a conservative approach to inference making (see section 7.3.1). The pragmatic stance of the research implied that researcher values were influential in all stages of the research, including inference making; however, the role of researcher values was openly acknowledged and multiple sides to arguments were presented. While the inferences made were deemed the most plausible based on the evid- ence available, important questions remain concerning the wider interpretive dis- tinctiveness of the research inferences. Put simply, the research can defend the question of whether or not its findings describe serendipity; however, the re- search cannot defend the question of whether or not its findings are distinctive of serendipity. A study of experiences of creativity in research might arrive at the same descriptive inferences. Or, an analysis of the event structures of fifty ran- domly selected narratives—irrespective of topic—might produce the same results. Without a control group, the research had no means of checking if the research

34See section 6.4.3.3.

358 7.3 Interpretive rigour

findings were unique to serendipity. The research examined the literature for an appropriate existing control group. As none was found, the research considered creating its own control group of event structures, for example, by selecting fifty narratives at random from the Citation Classics and coding the event structures of these narratives. Although this would likely have produced a reasonable control for the research, it proved beyond the practical limits of a doctoral study: coding two models of another fifty narratives (i.e. creating an additional 100 event structures that would then have needed to be mapped and analysed) would have doubled the workload of what was already a very large project. As an alternative to a sample control group, the research gave serious thought to negative case analysis; however, too many issues presented with such an ap- proach. For example, a sample-based approach was fundamental to the research; a few negative cases analyses would have been inadequate for systematic compar- isons with the statistical results of the study. Indeed, to attempt such comparisons would have been methodologically flawed. There was also the question of what counted as a negative case: would any narrative do? or, did the example need to be more specific, a case of ‘non- serendipity’? and, if so, then what exactly is meant by ‘non-serendipity’? Overall, the research decided the best option was to openly acknowledge the problems with the wider interpretative distinctiveness of the research findings rather than spend precious time and effort on negative case analyses that could not offer rigourous comparisons. It is important to stress that none of the research questions addressed the issue of serendipity’s distinctiveness: the research aimed at description of serendipity, not proof that this description was unique to serendipity. In other words, the lack of wider interpretive distinctiveness is irrelevant to the within-research inter- pretative distinctiveness. As such, the lack of wider interpretative distinctiveness should be seen, not as a threat to the credibility of the research, but rather as a limitation of the research.

359 7. CREDIBILITY AUDIT

7.3.5 Integrative efficacy

The descriptive profile of process in serendipity served to integrate the inferences from the two strands of the research. The profile elaborated on the inferences and linked common elements across the inferences in order to develop a complex de- scription of serendipity. When relevant, the profile also highlighted and explained inference inconsistencies that occurred because of the different perspectives ad- opted by the research questions. Note that in mixed methods research, inferences based on different sets of results may be inconsistent; while this is not necessarily problematic, explanations for such inconsistencies should be proffered.

7.3.6 Interpretive correspondence

The structure of the inference reporting in Chapter 6 emphasised the different levels of inference making required by the research.35 Local inferences were associ- ated with their corresponding research questions. In turn, the research questions were divided into two sets, each of which addressed a different component of the research aim. The descriptive profile served to integrate the lower level inferences and stood as a response to the overall research aim. The research sought to describe both qualitative and quantitative aspects of process in serendipity. The operationalisation of the research through a mixed methods design accommodated the different perspectives of the research ques- tions. Different views on a phenomenon can help to develop a complex under- standing of that phenomenon: the mixed method approach allowed the research to draw inferences that described multiple, distinct aspects of serendipity process and to integrate these inferences into a complex understanding.

7.4 Conclusions of the audit

The research credibility audit considered issues of design quality and interpret- ative rigour. The audit report presented in sections 7.2 and 7.3 employs selected examples to illustrate some of the quality issues that arose during the course of

35See section 6.2.

360 7.4 Conclusions of the audit the study and to explain the measures taken to address these concerns. The selected examples highlight strengths and weaknesses of the research design, im- plementation and inferences. The audit noted that design quality strengths included the accommodation of pluralistic perspectives, significant attention to consistency issues, and the ef- fective integration of different methodologies and methods; weaknesses included the inability to quantify estimation and missing data effects, as well as the po- tential for manual data collection and analysis errors. With respect to interpret- ative rigour, the audit found that the conservative approach to inference making and the qualitative/quantitative balance achieved in the inference reporting were strong points of the research; however, the lack of wider interpretative distinct- iveness of the research inferences was deemed a notable limitation of the study. The audit openly acknowledged that researcher decisions played a necessary role in both the implementation of the research and the interpretation of the study results, and that these decisions increased the potential for bias. From its evaluation of design quality and interpretative rigour, the audit con- cluded that the research was credible. It is important to stress that this conclu- sion was based on an overall assessment of the research—that is, not just on the selected examples presented in the limited space of the audit report. It is also important to recognise that the audit report is necessarily static and retrospective in nature. In actuality, attention to research credibility was an active, ongoing process that encompassed all elements of the research at all stages of the study.

361 7. CREDIBILITY AUDIT

362 Chapter 8

Contributions and Prospects

This chapter concludes the thesis. After a quick review of the research imple- mentation and inferences, attention turns to issues of transferability: “to whom, in what context, [and] under what circumstances” (Teddlie and Tashakkori, 2009, p. 311) does the research transfer? The discussion then moves on to the original contributions of the research and the prospects for change and further research raised by the study and its findings. Some brief reflections on the unintended consequences of the research close the chapter.

8.1 One minute overview

A pragmatic stance accommodated the mixture of methodological and theoret- ical motivations that drove the research. Six research questions addressed the description of process in serendipity: the questions focused on the event struc- tures and circumstances associated with process reported in lived experiences of serendipity in research. Prior to the experiential stage of the study, concep- tual and methodological frameworks were developed based on the literature of multiple fields. The research investigated a sample drawn from a subpopulation of serendipity narratives found in the Citation Classics dataset. A parallel conversion mixed methods design supported the operationalisation of the research: one strand of the study focused on the collection and analysis of contextual data, the other,

363 8. CONTRIBUTIONS AND PROSPECTS

on the coding and analysis of two event structure models. Research methods employed included narrative analysis, relational text analysis, network analysis, statistical analysis and inferential statistical network modelling. The research outcomes included a descriptive profile of process in serendipity, a network portfolio and a credibility audit. The descriptive profile integrated the study’s findings on serendipity process and its associated circumstances. The network portfolio presented the study’s event structure visualisations and their underlying quantitative matrices. The audit of the design quality and inter- pretative rigour of the research highlighted the research inferences’ lack of wider interpretative distinctiveness as a notable limitation of the study but concluded that the research was credible.

8.2 Transferability

8.2.1 To whom?

The research inferences drawn from the Citation Classics narrators’ experiences of serendipity in research have the potential to transfer to certain other populations, especially texts similar to the Citation Classics narratives and people similar to the Citation Classics narrators; as Teddlie and Tashakkori (2009, p. 311, emphasis added) note, “Transferability is a matter of degree and depends highly on the extent of the similarity between the context and people [or entities] in your study [i.e. the ‘sending’ context] and the ‘receiving’ context”.

Texts The Citation Classics narratives are similar to other personal anecdotes found in the serendipity literature.1 Narrative theory proposes that narratives have recognisable structural, semantic and pragmatic features (Squire, 2008). With respect to structural features, event-based approaches (see Labov, 1997; Labov and Waletzky, 1967) argue that narratives follow a normal form. Evidence also

1Of course, the Citation Classics relate only to highly-cited research. Section 4.2.4 considers the issue of expertise in relation to such highly-cited research and explains why the Citation Classics do not necessarily represent cases of ‘expert serendipity’.

364 8.2 Transferability exists to suggest that research-related texts tend to report predictable types of process at predictable points (Gopnik, 1972; Silva, 2010). Retrospective accounts of serendipity in research form a large part of the serendipity literature (Garc´ıa,2009). The Citation Classics narratives roughly echo the broad discipline distribution of other serendipity narratives found in the literature: although the social sciences and arts and humanities present, the sciences dominate.2 Given the discipline distribution of serendipity anecdotes and the normal as- pects of narrative, it is reasonable to assume that the research inferences based on the Citation Classics should transfer to other first-person narratives of serendip- ity in research; indeed, it seems likely that the inferences would transfer especially well to personal research anecdotes of serendipity in the sciences. If the two as- sumptions of narrative form and research grounding were not met—for example, as might be expected in the case of blogged experiences of serendipity in everyday life (see Rubin et al., 2011)—then it seems less likely that the inferences would transfer convincingly.

People Narratives are not just texts: narrative theory argues that first-person nar- ratives voice the phenomenological experiences of narrators (Squire, 2008). It is important to stress that phenomenological experiences are deemed to represent ‘types of experience’, not ‘particular fleeting’ examples (Smith, 2008).3 The present study was concerned with the outer experience of serendipity. More specifically, the research focused on people’s lived experience of process in serendipity. As such, the potential for the research inferences to transfer from the Citation Classics narrators to other narrators of serendipity-related research an- ecdotes arises, not because the narrators of both text populations share similar in- dividualistic traits or attributes (i.e. the narrators are ‘researchers’),4 but rather, because the narrators report similar phenomenological experiences of process in

2See section 6.3.3.4. 3See section 3.2.1. 4Of course, the narrators of both populations may indeed display similar traits and attributes—for example, expertise knowledge (see section 4.2.4); however, in relation to the question of research inference transferability in the present study, this point is irrelevant.

365 8. CONTRIBUTIONS AND PROSPECTS

serendipity (i.e. the narrators share the lived experience of ‘doing research’).

8.2.2 In what context?

The research inferences drawn from the serendipity experiences described by the Citation Classics have the potential to transfer, not only to other populations, but also to other sociological and temporal settings. Once again, the extent of the similarity between the sending and receiving contexts determines the degree of transferability.

The research context If the research inferences may transfer to researchers due to a shared lived experience of ‘doing research’ (see section 8.2.1), then it follows that the infer- ences may also transfer to the context in which research ‘is done’;5 indeed, the sociological similarities of research settings are noted in the literature (Calhoun, 2010; Merton, 1973, 1996). One reason to suppose transferability is the potential part played by roles in research settings: ‘doing research’ normally involves actu- alising well-established expectations concerning interactions with people, things and information (see Latour and Woolgar, 1986).6 Of course, the specifics of both the role expectations and the role interactions differ from one research field to the next. Although the Citation Classics encom- pass research in the arts and humanities and the social sciences, the serendipity experiences described in the narratives primarily relate to research in the sci- ences, especially the life sciences and medicine.7 As such, the study’s descriptive profile of process in serendipity may transfer well to scientific research contexts, but perhaps, less well to artistic or sociological research contexts.

Research, serendipity and information behaviour A second reason to suppose transferability of the study’s descriptive profile to research settings beyond those described in the Citation Classics is the richness

5See section 2.2.1.2 for a discussion of the research context. 6See also section 6.4.3.3. 7See section 6.3.3.4.

366 8.2 Transferability of the research context. Merton (1948) explicitly associated the serendipity pat- tern with empirical research. The present study’s descriptive profile is certainly reflective of serendipity as a pattern; as a word, serendipity relates more to search than research (see Ertzscheid and Gallezot, 2003). An argument can be made that both research and the serendipity pattern fall into the largest circle of Wilson’s (1999) Nested Model:8 in other words, both research and the serendipity pattern should be understood and studied as information behaviour (Erdelez, 2010). If this argument holds, then, while the study’s descriptive profile of process in serendipity may have some limited relev- ance to Wilson’s (1999) middle ground of information seeking, it seems unlikely that the descriptive profile would stand as a good description of serendipity in the inherently more search-like context of information retrieval.

From past to present A final point related to the transferability of the descriptive profile to the broad context of research rests on a temporal issue. The Citation Classics narratives are historical documents. How well, if at all, does the pre-Web research context they describe transfer to the research context of the digital age? This question echoes a much wider debate, one which remains, as yet, unre- solved: while it is widely excepted that new technologies, online communication and digital information all play roles in modern research settings, the evidence base required to assess the impacts of these roles on the process of ‘doing re- search’ is still being established (Dutton and Jeffreys, 2010; Palmer, Teffeau and Pirmann, 2009; Rowlands, Nicholas, Williams, Huntington, Fieldhouse, Gunter, Withey, Jamali, Dobrowolski and Tenopir, 2008). At this early stage, some evidence certainly exists to suggest that ‘doing re- search’ by any means—whether those means be digital, pre-Web or even pre- Internet—remains tied to social and environmental contexts; for example, Walsh and Bayma (1996) and Walsh, Kucker, Maloney and Gabbay (2000) argue that online collaboration differs according to the social structures of research fields, while Thelwall (2002, p. 564) finds “the existence of a geographic component in

8The research makes this argument in section 2.2.2.2.

367 8. CONTRIBUTIONS AND PROSPECTS research . . . first tested for by Katz (1994)”9 still present in the digitally connected world. In the modern research context, the circumstances associated with process, the push and pull of interactions, and the relations between people, things and information still matter. And, if they matter for research, then they are relevant to serendipity. While the details of the study’s descriptive profile might differ for serendipity experiences in modern research, it seems reasonable to suggest that the sociological concepts that underlie the descriptive profile do transfer to the digital age.

8.2.3 Under what circumstances?

The research was implemented under assumptions of certain conceptual and methodological circumstances. For example, the study placed relatively narrow boundaries around serendipity as a theoretical construct; indeed, the study’s focus on sociological aspects of serendipity reduced the conceptual space even further. As well, methodologically, the research modelled serendipity as phenomenological and relational. How do such theoretical and methodological assumptions affect the trans- ferability of the research? In other words, if the conceptual or methodological ground of the research were to shift, would the research still stand? Of course, such a shift could occur in any one of a number of directions. It is impossible to consider all possible shifts in the limited space available here; as such, the discussion limits itself to three examples of how the transferability of the research varies under different conceptual and methodological circumstances.

The study of related concepts The study’s constructs and methods may be useful for the investigation of phe- nomena similar to serendipity. The literature relates several concepts to serendip- ity: overlaps between creativity, innovation and serendipity are especially com-

9Katz (1994, p. 31) found “that research cooperation decreases exponentially with the dis- tance separating the collaborative partners”. Although received for publication in April 1993, Katz’s discussion of research collaboration makes no reference to either the Internet or the Web.

368 8.2 Transferability mon.10 The present research described process in serendipity as relational; a similar approach to process is evident in network studies of creativity and innovation.11 As such, it seems likely that the event structure approach adopted by the present research would transfer to studies of process in experiences of creativity and innovation.

The limits of stories Tilly (1999, p. 256, 262) refers to “the trouble with stories”—that is, “the incompatibility in causal structure between most standard stories [including nar- ratives] and social processes”. The present research did indeed study the struc- ture of social process as set out in standard stories; however, in line with the normal expectations of narrative theory, the research explicitly understood such structure as phenomenological experience rather than as reality in any positivist sense.12 As well, the research drew only descriptive inferences; no claims were made concerning cause and effect. The study’s heavy reliance on standard stor- ies limits the transferability of the research to situations supported by similar phenomenological and descriptive foundations.

Different perspectives Several perspectives on serendipity are possible: certainly, sociological and individualistic—and, more specifically, psychological—approaches to serendipity are recognised in the literature. Merton (2004) argues for a sociological stance on serendipity and fiercely rejects approaches that limit serendipity to ‘personal disposition’ or ‘psychological attributes’.13 In agreement with Merton’s stance, the present study saw serendipity as so- ciological: the research focus was on outer experiences of doing and happening in serendipity rather than on inner experiences such as motivations or feelings. This said, although it explicitly excluded a psychological component, the study’s perspective did not ignore the individualistic aspect of serendipity: the focus on

10See section 1.3.4. 11See section 3.5.5.3. 12See sections 3.2.1 and 3.3.1.3. 13See section 1.3.3.

369 8. CONTRIBUTIONS AND PROSPECTS narratives meant that the research was built upon individual, lived experiences. Given these points, it seems reasonable to assume that the research would have high relevance for other sociological approaches to serendipity, varying degrees of relevance for mixed perspectives on serendipity (i.e. dependent on the distribu- tion of sociological and individualistic interest in the phenomenon), but little, if any, relevance for purely psychological views on serendipity.

8.3 Contributions

8.3.1 Serendipity theory

To serendipity theory, the research contributes a detailed descriptive profile of process in serendipity. The profile makes a four-fold contribution to the under- standing of serendipity as a complex phenomenon:

1. the description provides new insights into outer experience of serendipity;

2. the description adds concrete, precise detail to previously fuzzy, abstract serendipity constructs;

3. the description highlights problems with existing theoretical assumptions; and,

4. the description presents evidence for normality in serendipity.

8.3.2 Serendipity research

Factors at play in serendipity’s complexity include serendipity’s relational aspects and serendipity’s lived, experienced and retrospective nature. Such complexity poses considerable methodological challenges. The research not only developed a methodological framework to support this complexity but also systematically applied the framework in an empirical setting and evaluated this implementa- tion and its outcome. As a result, the research brings fresh perspectives and alternative methods—new ways into serendipity’s complexity— to the practice of serendipity research.

370 8.4 Prospects

8.4 Prospects

8.4.1 Recommendations for practice

Two recommendations for practice arise from the research. The first recommend- ation is directed at practitioners of serendipity research; the second, at those who fund serendipity research.

Stories and numbers Much serendipity research focuses on what Spalter-Roth (2000) terms the “story that shows how the problem [e.g. serendipity] works in daily life and provides for empathetic understanding” rather than the “numbers that define the scope and patterns of the problem” (in Teddlie and Tashakkori, 2009, p. 317, em- phasis in original). Given the lived, experiential, and personal nature of serendip- ity, this focus on serendipity as story is entirely understandable; however, as the present research has shown, there are numbers in the stories. More attention to the numbers of serendipity is merited. Seeking out these numbers does not detract from the stories—indeed, quite the opposite: exploring both the numbers and the stories of serendipity leads to an increased appreciation of serendipity’s complexity.

Vogue versus vague The recent surge in serendipity research makes the point that serendipity is very much ‘in vogue’. A pragmatic argument can be made that research funds, and especially public research funds, should go to projects that are of current interest to society; it follows from such an argument that, as serendipity is ‘in vogue’, it is right and proper that serendipity research should be funded. This said, those who fund serendipity research need to be aware of serendip- ity’s semantic history, especially of the problematic link between serendipity the “vogue word” and serendipity the “vague word” (Merton, 2004, p. 289). Care must be taken that the enticing gleam of serendipity’s vogueness does not blind funders to vagueness. As the present research notes, the serendipity literature is disconnected but substantial. Although serendipity theory still incorporates myths and assump-

371 8. CONTRIBUTIONS AND PROSPECTS

tions, empirical research has laid the firm foundations of an evidence base. Vague proposals that show little awareness of these points should not be allowed to cash in on the sole basis of serendipity being vogue.

8.4.2 Further research

The research and its findings point to numerous investigative avenues; the dis- cussion below maps some promising paths that lead out from the study.

Citation analysis The research suggested a need for two citation studies:

1. an analysis of citations to Merton—and, more specifically, his serendipity pattern—in the information studies serendipity literature; and

2. a post-2003 HistCite analysis of publications with the stem serendipit* in their titles.

The research raised the possibility that, due to its lack of citations of Mer- ton’s work, the information studies literature on serendipity risked a missing link to the serendipity pattern. Of course, Merton’s pattern clearly has relevance for the study of serendipity; however, its resonance with the concept of human in- formation behaviour means that Merton’s pattern also has a wider relevance for information research. A citation analysis, not only would establish the present state of the information studies literature’s link to Merton’s serendipity pattern, but also would make comparisons of present and past states possible. A potential problem for serendipity theory—certainly one faced by the area in the past—is that, although many fields are interested in serendipity, intradis- ciplinary citation practices mean that different fields do not necessarily ‘talk to each other’ about serendipity. As a result, the historical serendipity literature is noticeably disconnected. A post-2003 HistCite analysis would allow a comparison with an existing 1950–2003 HistCite (Garfield, 2004b) of the serendipity literature. In particular, the new HistCite might confirm whether or not interconnectedness across the serendipity literature has improved in recent years.

372 8.4 Prospects

Outliers The research identified various outliers in the sample event structures. Some event structures presented as outliers for several descriptive measures; other event structures made one-off appearances as outliers. The event structure approach to the serendipity narratives meant that some degree of semantic meaning was lost. Qualitative case studies of the outliers detected by the research might provide semantic clues as to why these serendipity experiences appeared as outliers. In turn, these clues could help serendipity theory to identify extreme cases of serendipity.

Normality The research highlighted evidence for normality in the serendipity event struc- tures. The questions raised in relation to the idea of normal roles were particularly striking. Given understandings of the sociological aspects of empirical research— that is, both the process of ‘doing research’ and the research context itself—the idea of normal roles makes sense; however, the study’s event structures were not suited to taking the investigation of normal roles further. Case studies would provide a useful means to investigate the potential role patterns identified by the study. It is important to stress that such case studies would not need to address the question of normality per se; rather, the studies would only need to confirm whether or not the identified patterns were indeed representative of roles.

Wider interpretative distinctiveness A significant limitation of the research was the lack of wider interpretative distinctiveness of the research inferences; in other words, while the research could say that it described serendipity, the research could not say that its descriptive inferences were unique to serendipity. Determining if the study’s descriptive profile highlights any unique mark- ers of serendipity would be a useful next step for serendipity research. The present research has already suggested a potential way forward: the comparison of serendipity event structures with a control. Another possibility would be to compare serendipity event structures with event structures of a related concept

373 8. CONTRIBUTIONS AND PROSPECTS such as creativity. While neither of these approaches could definitively define unique markers of serendipity, such studies might at least help to determine if the idea of unique markers is one worth taking forward in serendipity theory.

8.5 Reflections on unintended consequences

As a static document, this thesis stands still; however, the research it reports was dynamic and lived. All discussion up to this point has had one aim: to drive home the contributions of the study to serendipity theory and research. The byproducts of the lived experience of ‘doing research’ have been entirely ignored. But, as anyone familiar with serendipity knows, research can have unintended consequences. These unintended consequences may be concrete contributions of potential significance for other research. Or, they may be more personal in nature—for example, new skills or awareness.

A unique network dataset The study’s paired event structure sample offers network research a unique dataset. Network data reuse is well-established practice in network studies; in- deed, online network data archives exist to make data as widely available as possible. The event structure dataset is especially suitable for testing analyses of current interest to network studies: comparisons across networks of different magnitudes; comparisons of paired networks; and, the study of network samples rather than cases. Such analyses are far from straight-forward with relational data.

Software and open source The study provided an excellent opportunity to learn new software. The learn- ing curve was steep at first; however, proficiency increased with each command learnt and function called to the point that, even unfamiliar software is now quickly grasped. This said, working with the new software had an impact above and beyond skills development. The research has lead to an appreciation of—indeed, an awe of—the capability and potential open source and other online research software offers to researchers.

374 8.5 Reflections on unintended consequences

Collaborative projects such as R and LATEX and tools such as PAJEK and ORA draw on advanced knowledge and expertise. Contributors and creators spend large amounts of time and effort; yet, their high quality contributions and creations remain freely available for others to use. Such assets should be deeply cherished by all in the research community.

375 8. CONTRIBUTIONS AND PROSPECTS

376 Bibliography

Note: Thesis page numbers for citations are provided at the end of entries.

Abbasi, A., Hossain, L. and Chung, K. (2012). Hybrid centrality measures for binary and weighted networks, Presentation at 3rd Workshop on Complex Networks, 7–9 March 2012, Department of Computer Sciences, Florida In- stitute of Technology, Melbourne, FL. 301

Abrahamson, E. and Rosenkopf, L. (1997). Social network effects on the ex- tent of innovation diffusion: a computer simulation, Organization Science 8(3): 289–309. 93, 106

Adamic, L. A. and Adar, E. (2005). How to search a social network, Social Networks 27(3): 187–203. 105

Adamic, L. A., Lukose, R. M. and Huberman, B. A. (2002). Local search in un- structured networks, In: S. Bornholdt and H. G. Schuster (Eds.), Handbook of Graphs and Networks: From the Genome to the Internet, Wiley-VCH, Berlin, pp. 295–317. 105

Adamic, L. A., Lukose, R. M., Puniyani, A. R. and Huberman, B. A. (2001). Search in power-law networks, Physical Review E 64: 046135. 105

Agarwal, N. K., Xu, Y. and Poo, D. C. C. (2009). Delineating the bound- ary of “context” in information behavior: towards a contextual identity framework, Proceedings of ASIS&T 2009, Thriving on Diversity: Informa- tion Opportunities in a Pluralistic World, 6–11 November 2009, Vancouver, paper 52. Available at: http://www.asis.org/Conferences/AM09/ open-proceedings/papers/52.xml (Accessed on: 21 February 2012). 31

Airoldi, E., Bai, X. and Carley, K. M. (2011). Network sampling and classifica-

377 BIBLIOGRAPHY

tion: an investigation of network model representations, Decision Support Systems 51(3): 506–518. 122, 123

Airoldi, E. M. and Carley, K. M. (2005). Sampling algorithms of pure net- work topologies: stability and separability of metric embeddings, SIGKDD Explorations Newsletter 7: 13–22. Available at: http://www.sigkdd. org/explorations/issues/7-2-2005-12/2-Airoldi.pdf (Accessed on: 8 November 2011). 122, 123, 124, 310

Alajoutsij¨arvi,K. and Eriksson, P. (1998). Paper life: making sense of a forest sector triad in its contexts, In: H. Tikkanen (Ed.), Marketing and Interna- tional Business (Series A–2), Publications of the Turku School of Economics and Business Administration, pp. 9–44. 31

Alcock, S. E. (2008). The stratigraphy of serendipity, In: M. de Rond and I. Morley (Eds.), Serendipity: Fortune and the Prepared Mind, Vol. 22 of The Darwin College Lectures, Cambridge University Press, Cambridge, pp. 11– 25. 8, 10, 11

Allan, J. (Ed.) (2002). Topic Detection and Tracking: Event-based Information Organization, Kluwer Academic Publishers, Dordrecht, The Netherlands. 56

Allan, J., Papka, R. and Lavrenko, V. (1998). On-line new event detection and tracking, SIGIR ’98: Proceedings of the 21st ACM-SIGIR International Conference of Research and Development in Information Retrieval, ACM, New York, pp. 37–45. 56

Allen, B. and Kim, K. (2001). Person and context in information seeking: interac- tions between cognitive and task variables, The New Review of Information Behaviour Research 2: 1–16. 31

Almack, J. C. (1922). The influence of intelligence on the selection of associates, School and Society (16): 529–530. 107

Alvarez, R. and Urla, J. (2002). Tell me a good story: using narrative analysis to examine information requirements interviews during an ERP implementa- tion, Data Base for Advances in Information Systems 33: 38–52. 72

Anderson, T. D. (2010). Kickstarting creativity: supporting the product- ive faces of uncertainty in information practice, Information Research

378 BIBLIOGRAPHY

15(4): colis 721. Available at: http://informationr.net/ir/15-4/ colis721.html (Accessed on: 17 February 2012). 27

Anderson, T. D. (2011). Beyond eureka moments: supporting the invisible work of creativity and innovation, Information Research 16(1): paper 471. Available at: http://informationr.net/ir/16-1/paper471.html (Accessed on: 13 February 2012). 12

Andr´e,P., Schraefel, M. C., Teevan, J. and Dumais, S. T. (2009). Discovery is never by chance: designing for (un)serendipity, In: N. Bryan-Kinns, m. D. Gross, H. Johnson, J. Ox and R. Wakkary (Eds.), Creativity & Cognition, ACM, New York, NY, pp. 305–314. 5, 9, 10, 27, 40, 49, 52, 58

Apostel, L. (1961). Towards the formal study of models in the non-formal sci- ences, In: H. Freudenthal (Ed.), The Concept and the Role of the Model in Mathematics and Natural and Social Sciences, D. Reidel Publishing Com- pany, Dordrecht, The Netherlands, pp. 1–37. 90, 91

Austin, J. H. (2003). Chase, Chance, and Creativity: The Lucky Art of Novelty, MIT Press, Cambridge, MA. 43, 254

Barab´asi,A. L., Albert, R. and Jeong, H. (2000). Scale-free characteristics of random networks: the topology of the World Wide Web, Physica A 281: 69– 77. 105

Barber, B. (1953). Science and the Social Order, George Allen & Unwin Ltd, London. 12, 40

Barber, B. and Fox, R. C. (1958). The case of the floppy-eared rabbits: an instance of serendipity gained and serendipity lost, The American Journal of Sociology 64(2): 128–136. 45

Barber, M. (2010). Alfred Schutz, In: E. N. Zalta (Ed.), The Stanford En- cyclopedia of Philosophy, Center for the Study of Language and Inform- ation (CSLI), Stanford University, Stanford, CA. Available at: http: //plato.stanford.edu/entries/schutz/ (Accessed on: 3 October 2011). 68

Barthes, R. (1977). Introduction to the structural analysis of narratives, Image, Music, Text, Essays selected and translated by Stephen Heath, Fontana, London, pp. 79–124. 72

379 BIBLIOGRAPHY

Batagelj, V. and Mrvar, A. (2011). PAJEK: Program for Analysis and Visualization of Large Networks—Reference Manual, University of Ljubljana, Ljubljana, Slovenia. Available at: http://pajek.imfm.si/lib/exe/fetch.php? media=dl:pajekman204.pdf (Accessed on: 14 September 2011). 183

Batagelj, V. and Mrvar, A. (2012). PAJEK: Program for Analysis and Visualization of Large Networks . Available at: http://pajek.imfm.si/doku.php?id= download (Accessed on: 4 September 2012). 183

Bates, J. A. (2004a). Use of narrative interviewing in everyday information be- havior research, Library & Information Science Research 26(1): 15–28. 72

Bates, J. A. (2004b). Use of the episodic interview method in everyday life information behaviour research, Information Research 10(1): abs 15. Avail- able at: http://informationr.net/ir/10-1/abs15.html (Accessed on: 8 December 2011). 72

Bates, M. J. (2007). What is browsing—really? A model drawing from behavi- oural science research, Information Research 12(4): paper 330. Available at: http://informationr.net/ir/12-4/paper330.html (Accessed on: 17 February 2012). 27

Bauer, M. (1996). The narrative interview: comments on a tech- nique for qualitative data collection, Papers in Social Research Methods, Qualitative series no. 1, London School of Economics and Political Science, Methodology Institute, London. Available at: http://www.lse.ac.uk/collections/methodologyInstitute/pdf/ QualPapers/Bauer-NARRAT1SS.pdf (Accessed on: 21 May 2010). 139

Bawden, D. (1993). Browsing: theory and practice, Perspectives in Information Management 3(1): 71–85. 27

Bawden, D. (2011). Encountering on the road to Serendip? Browsing in new information environments, In: A. Foster and P. Rafferty (Eds.), Innova- tions in Information Retrieval: Perspectives for Theory and Practice, Facet, London, pp. 1–22. 12, 27, 36

Beacham, F. (2001). The loss of serendipity, TV Technology 3 Octo- ber 2001. Available at: http://www.tvtechnology.com/article/ the-loss-of-serendipity/184152 (Accessed on: 30 March 2012). 58

380 BIBLIOGRAPHY

Beale, R. (2007). Supporting serendipity: using ambient intelligence to augment user exploration for data mining and web browsing, International Journal of Human-Computer Studies 65(5): 421–433. 59

Beghtol, C. (1997). Stories: applications of narrative discourse analysis to issues in information storage and retrieval, Knowledge Organization 24(2): 64–71. 72

Bernard, H. R., Killworth, P., Kronenfeld, D. and Sailer, L. (1984). The problem of informant accuracy: the validity of retrospective data, Annual Review of Anthropology 13: 495–517. 74

Bernard, H. R. and Ryan, G. W. (2010). Analyzing Qualitative Data: Systematic Approaches, Sage, London. 174

Bertuglia, C. S. and Vaio, F. (2005). Nonlinearity, Chaos, and Complexity: The Dynamics of Natural and Social Systems, Oxford University Press, Oxford. 49, 58

Beyer, C. (2011). Edmund Husserl, In: E. N. Zalta (Ed.), The Stan- ford Encyclopedia of Philosophy, Center for the Study of Language and Information (CSLI), Stanford University, Stanford, CA. Available at: http://plato.stanford.edu/entries/husserl/ (Accessed on: 3 Octo- ber 2011). 67

Bhanderi, H. (2002). PhD/MPhil Thesis—a LaTeX Template. Avail- able at: http://www-h.eng.cam.ac.uk/help/tpl/textprocessing/ ThesisStyle/ (Accessed on: 10 January 2012). 23

Bj¨orneborn, L. (2004). Small-World Link Structures across an Academic Web Space: A Library and Information Science Approach, PhD thesis, Royal School of Library and Information Science, Denmark. 4, 105

Bj¨orneborn, L. (2008). Serendipity dimensions and users’ information behaviour in the physical library interface, Information Research 13(4): paper 370. Available at: http://informationr.net/ir/13-4/paper370.html (Ac- cessed on: 30 March 2012). 58, 332

Blackwell, A. F., Weilson, L., Street, A., Boulton, C. and Knell, J. (2009). Radical innovation: crossing knowledge boundaries with interdisciplinary teams, Technical Report 760, University of Cambridge, Computer Laborat-

381 BIBLIOGRAPHY

ory, Cambridge. Available at: http://www.cl.cam.ac.uk/techreports/ UCAM-CL-TR-760.pdf (Accessed on: 20 March 2012). 52

Blandford, A. and Attfield, S. (2010). Interacting with Information, Vol. 6 of Synthesis Lectures on Human-Centered Informatics, Morgan & Claypool, San Francisco, CA. 27, 38

Boden, M. A. (2004). The Creative Mind: Myths and Mechanisms, 2nd edn., Routledge, London. 11, 54

Bollob´as,B. (2001). Random Graphs, Vol. 73 of Cambridge Studies in Advanced Mathematics, 2nd edn., Cambridge University Press, Cambridge. 122

Borgatti, S. P. and Cross, R. (2003). A relational view of information seeking and learning in social networks, Management Science 49(4): 432–445. 33, 104

Borgatti, S. P. and Everett, M. G. (1997). Network analysis of 2-mode data, Social Networks 19(3): 243–269. 111, 352

Borgatti, S. P. and Everett, M. G. (1999). Models of core/periphery structures, Social Networks 21(4): 375–395. 123

Borgatti, S. P. and Halgin, D. S. (2011). On network theory, Organiza- tion Science (Articles in Advance), pp. 1–14. Available at: http:// www.steveborgatti.com/papers/orsc.1110.0641.pdf (Accessed on: 29 September 2011). 110

Borgatti, S. P. and Lopez-Kidwell, V. (2011). Network theory, In: J. Scott and P. J. Carrington (Eds.), The Sage Handbook of Social Network Analysis, Sage, London, pp. 40–54. 108, 110

Borgatti, S. P., Mehra, A., Brass, D. J. and Labianca, G. (2009). Network analysis in the social sciences, Science 323(5916): 892–895. Available at: http://www.steveborgatti.com/papers/SNA_Review_for_Science. pdf (Accessed on: 2 September 2010) [ Note that this url links to the longer, pre-publication version of the article.]. 107

Boslaugh, S. and Watters, P. A. (2008). Statistics in a Nutshell, O’Reilly, Se- bastopol, CA. 174

382 BIBLIOGRAPHY

Bott, H. (1928). Observation of play activities in a nursery school, Genetic Psy- chology Monographs 4: 44–88. 107

Box, G. E. P. and Draper, N. R. (1987). Empirical Model-building and Response Surfaces, John Wiley & Sons, Inc., New York, NY. 92

Boyd, A. (2004). Multi-channel information seeking: a fuzzy conceptual model, Aslib Proceedings: New Information Perspectives 56(2): 81–88. 58

Braithwaite, R. (1959). Scientific Explanation: A Study of the Function of The- ory, Probability, and Law in Science, Cambridge University Press, Cam- bridge. 116

Brede, M. and de Vries, B. J. M. (2009). Networks that optimize a trade-off between efficiency and dynamical resilience, Complex Sciences, Vol. 5 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications, Springer, Berlin, pp. 2109–2117. 123

Brendel, D. H. (2006). Healing Psychiatry: Bridging the Science/Humanism Divide, MIT Press, Cambridge, MA. 13

Brentano, F. (1874). Psychologie vom Empirischen Standpunkt, Vol. 1– 2, Verlag von Duncker & Humblot, Leipzig. Available at: http: //ia700404.us.archive.org/12/items/psychologievome02brengoog/ psychologievome02brengoog.pdf (Accessed on: 3 October 2011). 67

Brickley, D. (2011a). About (FOAF), FOAF Project . Available at: http: //www.foaf-project.org/ (Accessed on: 13 October 2011). 80

Brickley, D. (2011b). The Friend of a Friend (FOAF) project, FOAF Project . Available at: http://www.foaf-project.org/ (Accessed on: 13 October 2011). 80

Brickley, D. and Miller, L. (2010). FOAF Vocabulary Specification 0.98 . Available at: http://xmlns.com/foaf/spec/ (Accessed on: 13 October 2011). 80

Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems 30: 107–117. 105

Brophy, P. (2009). Narrative-Based Practice, Ashgate, Farnham. 72

383 BIBLIOGRAPHY

Budd, J. M. (2005). Phenomenology and information studies, Journal of Docu- mentation 61(1): 44–59. 65

Burt, R. S. (1990). Detecting role equivalence, Social Networks 12(1): 83–97. 328

Burt, R. S. (2010). Neighbor Networks: Competitive Advantage Local and Per- sonal, Oxford University Press, Oxford. 119, 120, 328, 329

Butts, C. T. (2003). Network inference, error, and informant (in)accuracy: a Bayesian approach, Social Networks 25(2): 103–140. 118

Calhoun, C. (Ed.) (2010). Robert K. Merton: Sociology of Science and Sociology as Science, Columbia University Press, New York, NY. 11, 366

Campa, R. (2008). Making science by serendipity. A review of Robert K. Merton and Elinor Barber’s The Travels and Adventures of Serendipity, Journal of Evolution and Technology 17(1): 75–83. Available at: http://jetpress. org/v17/campa.htm (Accessed on: 6 February 2012). 5, 6

Campanario, J. M. (1996). Using Citation Classics to study the incidence of serendipity in scientific discovery, Scientometrics 37(1): 3–24. 3, 43, 45, 46, 48, 137, 138, 139, 141, 144, 145, 237, 238, 244, 245, 450, 483

Campos, J. and de Figueiredo, A. D. (2001). Searching the unsearchable: in- ducing serendipitous insights, In: R. Weber and C. Gresse von Wangen- heim (Eds.), Case-based Reasoning: Workshop Program at ICCBR-2001, Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory, Washington, D. C., pp. 159–164. 4, 27, 58

Canadian Institutes of Health Research (CIHR) (2005). Training Program Grant Guide: Strategic Training Initiative in Health Research, CIHR, Ottawa, ON. 51

Cannon, W. B. (1945). The Way of an Investigator: A Scientist’s Experiences in Medical Research, W. W. Norton & Co., Inc., New York, NY. 6

Canty, A. and Ripley, B. D. (2012). boot: Boostrap R (S-Plus) Functions. Available at: http://cran.r-project.org/web/packages/boot/boot. pdf (Accessed on: 2 July 2012). 260

Carley, K. (1986). An approach for relating social structure to cognitive structure,

384 BIBLIOGRAPHY

Journal of Mathematical Sociology 12(2): 137–189. 156

Carley, K. (1993). Coding choices for textual analysis: a comparison of content analysis and map analysis, Sociological Methodology 23: 75–126. Avail- able at: http://www.casos.cs.cmu.edu/publications/papers/carley_ 1993_codingchoices.PDF (Accessed on: 20 September 2011). 154, 155, 156, 157, 158, 159, 160, 161

Carley, K. (1994). Extracting culture through textual analysis, Poetics 22(4): 291–312. 102

Carley, K. (1997). Network text analysis: the network position of concepts, In: C. W. Roberts (Ed.), Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts, Lawrence Erlbaum Associates, Mahwah, NJ, pp. 79–1100. 154

Carley, K., Bigrigg, M., Reminga, J., Storrick, J., DeReno, M. and Columbus, D. (2009). Basic lessons in *ORA and Automap 2009, Technical report CMU-ISR- 09-117, Carnegie Mellon University, School of Computer Science, Institute for Software Research, Pittsburgh, PA. Available at: http://www.casos. cs.cmu.edu/publications/papers/CMU-ISR-09-117.pdf (Accessed on: 4 May 2010). 88, 89, 161, 202, 203, 205

Carley, K. M. (2003). Dynamic network analysis, In: R. Breiger, K. Carley and P. Pattison (Eds.), Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, Committe on Human Factors, National Research Council, National Research Council, Washington, D. C., pp. 133– 145. 119, 120

Carley, K. M. (2012a). AutoMap, Center for Computational Analysis of Social and Organizational Systems (CASOS), Institute for Software Research Interna- tional (ISRI), School of Computer Science (SCS), Carnegie Mellon Univer- sity (CMU), Pittsburgh, PA. Available at: http://www.casos.cs.cmu. edu/projects/automap/software.php (Accessed on: 9 September 2012). 155

Carley, K. M. (2012b). *ORA, Center for Computational Analysis of Social and Organizational Systems (CASOS), Institute for Software Research Interna- tional (ISRI), School of Computer Science (SCS), Carnegie Mellon Univer- sity (CMU), Pittsburgh, PA. Available at: http://www.casos.cs.cmu. edu/projects/ora/software.php (Accessed on: 9 September 2012). 173,

385 BIBLIOGRAPHY

187

Carley, K. M. and Kaufer, D. S. (1993). Semantic connectivity: an approach for analyzing symbols in semantic networks, Communication Theory 3(3): 183– 213. 101

Carley, K. and Palmquist, M. (1992). Extracting, representing, and analyzing mental models, Social Forces 70(3): 601–636. 154

Carley, K., Reminga, J., Storrick, J. and Columbus, D. (2010). ORA user’s guide 2010, Technical report CMU-ISR-10-120, Carnegie Mellon Univer- sity, School of Computer Science, Institute for Software Research, Pitts- burgh, PA. Available at: http://www.casos.cs.cmu.edu/publications/ papers/CMU-ISR-10-120.pdf (Accessed on: 17 August 2010). 187

Carley, K., Reminga, J., Storrick, J. and Columbus, D. (2011). ORA user’s guide 2011, Technical report CMU-ISR-11-107, Carnegie Mellon Univer- sity, School of Computer Science, Institute for Software Research, Pitts- burgh, PA. Available at: http://www.casos.cs.cmu.edu/publications/ papers/CMU-ISR-11-107.pdf (Accessed on: 15 September 2011). 187

Carley, K., Reminga, J., Storrick, J. and DeReno, M. (2009). ORA user’s guide 2009, Technical report CMU-ISR-09-115, Carnegie Mellon Univer- sity, School of Computer Science, Institute for Software Research, Pitts- burgh, PA. Available at: http://www.casos.cs.cmu.edu/publications/ papers/CMU-ISR-09-115.pdf (Accessed on: 12 July 2010). 187

Case, D. O. (2002). Looking for Information: A Survey of Research on Informa- tion Seeking, Needs, and Behavior, Academic Press, San Diego, CA. 33

Case, D. O. (2007). Looking for Information: A Survey of Research on Inform- ation Seeking, Needs, and Behavior, 2nd edn., Elsevier, London. 9, 31, 33, 36, 54, 58

Castro, R., Coates, M., Liang, G., Nowak, R. and Yu, B. (2004). Network tomo- graphy: recent developments, Statistical Science 19(3): 499–517. 122

Chang, S.-J. and Rice, R. E. (1993). Browsing: a multidimensional framework, Annual Review of Information Science and Technology (ARIST) 28: 231– 276. 27

386 BIBLIOGRAPHY

Chatman, S. (1978). Story and Discourse: Narrative Structure in Story and Film, Cornell University Press, Ithaca, NY. 19

Chevaleva-Janovskaja, E. (1927). Groupements spontan´es d’enfants `a l’age pr´escolaire, Archives de Psychologie 20: 219–223. 107

Chmiel, A., Sienkiewicz, J., Thelwall, M., Paltoglou, G., Buckley, K., Kappas, A. and Holyst, J. (2011). Collective emotions online and their influence on com- munity life, PLoS ONE 6(7): e22207. Available at: http://www.plosone. org/article/info:doi%2F10.1371%2Fjournal.pone.0022207 (Accessed on : 8 December 2011). 105

Choi, B. C. K. and Pak, A. W. P. (2006). Multidisciplinarity, interdisciplinarity and transdisciplinarity in health research, services, education and policy: 1. definitions, objective, and evidence of effectiveness, Clinical & Investigative Medicine 29(6): 351–364. 51, 53

Compton, J. J. (1988). Some contributions of existential phenomenology to the philosophy of natural science, American Philosophical Quarterly 25(2): 99– 113. 68

Constans, A. (Ed.) (2005). Garfield at Eighty, The Scientist, New York, NY. Available at: http://images.the-scientist.com/pdfs/corporate/ Garfieldat80.pdf (Accessed on: 26 July 2011). 134

Conti, N. and Doreian, P. (2009). Social network engineering and race in a police academy: a longitudinal analysis, Social Networks 32(1): 30–43. 118

Cool, C. (2001). The concept of situation in information science, Annual Review of Information Science and Technology 35: 5–42. 31

Courtright, C. (2007). Context in information behavior research, Annual Review of Information Science and Technology 41: 273–306. 31

Coviello, N. (2005). Integrating qualitative and quantitative techniques in net- work analysis, Qualitative Market Research 8(1): 39–60. 118

Craib, I. (1992). Modern Social Theory: From Parsons to Habermas, 2nd edn., Pearson, Harlow. 62

Craib, I. (1997). Classical Social Theory: An Introduction to the Thought of

387 BIBLIOGRAPHY

Marx, Weber, Durkheim, and Simmel, Oxford University Press, Oxford. 62

Crawley, M. J. (2005). Statistics: An Introduction to Using R, John Wiley & Sons, Ltd., Chichester, UK. 176, 183

Crawley, M. J. (2007). The R Book, John Wiley & Sons, Ltd., Chichester, UK. 183

Cross, R., Rice, R. E. and Parker, A. (2001). Information seeking in social context: structural influences and receipt of information benefits, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 31(4): 438–448. 33, 104

Crossley, N. (2010). The social world of the network: combining qualitative and quantitative elements in social network analysis, Sociologica 1: 1–34. Available at: http://www.sociologica.mulino.it/journal/article/ index/Article/Journal:ARTICLE:356/Item/Journal:ARTICLE:356 (Ac- cessed on: 2 July 2010). 118

Crossley, N. (2011). Towards Relational Sociology, Routledge, London. 62, 63

Crow, G. (2004). Social networks and social exclusion: an overview of the debate, In: C. Phillipson, G. Allan and D. Morgan (Eds.), Social Networks and Social Exclusion: Sociological and Policy Perspectives, Ashgate, Aldershot, pp. 8–19. 117

Cunha, M. P. (2005). Serendipity: why some organizations are luckier than others, FEUNL Working Paper 472, Faculdade de Economia, Universidade Nova de Lisboa, Lisbon. Available at: http://fesrvsd.fe.unl.pt/WPFEUNL/ WP2005/wp472.pdf (Accessed on: 13 February 2012). 12, 40, 46, 50, 52, 54, 55, 59

Cunha, M. P., Clegg, S. R. and Mendon¸ca,S. (2010). On serendipity and organ- izing, European Management Journal 28(5): 319–330. 4, 6, 7, 11, 12, 40, 41, 46, 47, 49, 54, 55, 57, 58, 356

DANTE (2012). dtklogos.sty (under development). Available at: http:// ctan.mackichan.com/usergrps/dante/dtk/dtklogos.sty (Accessed on: 18 February 2012). 23, 435

Dar´anyi, S. and Lendvai, P. (2010). Foreword, In: S. Dar´anyi and P. Lendvai

388 BIBLIOGRAPHY

(Eds.), Proceedings of the First International AMICUS Workshop on Auto- mated Motif Discovery in Cultural Heritage and Scientific Communication Texts, AMICUS Project, ILK Research Group, Tilburg University, The Netherlands, pp. 5–6. Available at: http://ilk.uvt.nl/amicus/WS01/ amicus_kotet_10_12.pdf (Accessed on: 20 May 2011). 124

Davis, J. A. and Leinhardt, S. (1968). The structure of positive interpersonal rela- tions in small groups, Presentation at the Annual Meeting of the American Sociological Association, August 1968, Boston, MA. 125, 327

Davis, J. A. and Leinhardt, S. (1972). The structure of positive interpersonal relations in small groups, In: J. Berger (Ed.), Sociological Theories in Progress, Volume 2, Houghton Mifflin, Boston, MA, pp. 218–251. 125, 327 de Bellis, N. (2009). Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics, Scarecrow Press, Lanham, MA. 105 de Figueiredo, A. D. and Campos, J. (2001). The serendipity equations, Technical Note AIC–01–003, Navy Center for Applied Research in Arti- ficial Intelligence, Naval Research Laboratory, Washington, D.C. Avail- able at: http://www.aic.nrl.navy.mil/papers/2001/AIC-01-003/ws4/ ws4toc2.pdf (Accessed on: 13 February 2012). 12, 13, 43, 48 de Marneffe, M.-C., MacCartney, B. and Manning, C. D. (2006). Generating typed dependency parses from phrase structure parses, Proceedings of the International Conference on Language Resources and Evaluation, LREC 2006, 22–28 May 2006, Genoa, Italy. Available at: http://www.lrec-conf. org/proceedings/lrec2006/ (Accessed on: 22 November 2011). 199 de Marneffe, M.-C. and Manning, C. D. (2008). Stanford Typed Dependen- cies Manual, The Stanford Natural Language Processing Group, Stan- ford University, Palo Alto, CA. Available at: http://nlp.stanford.edu/ software/dependencies_manual.pdf (Accessed on: 22 November 2011). 199 de Nooy, W., Mrvar, A. and Batagelj, V. (2005). Exploratory Social Network Analysis with Pajek, Cambridge University Press, Cambridge. 119, 183, 264, 267, 353 de Rond, M. (2005). The structure of serendipity, Working Paper 07/2005, Judge Business School, University of Cambridge, Cambridge. Avail-

389 BIBLIOGRAPHY

able at: http://www.jbs.cam.ac.uk/research/working_papers/2005/ wp0507.pdf (Accessed on: 13 February 2012). 10, 11, 12 de Rond, M. and Morley, I. (Eds.) (2008). Serendipity: Fortune and the Prepared Mind, Vol. 22 of The Darwin College Lectures, Cambridge University Press, Cambridge. 27, 28, 29, 254

Dellinger, A. B. and Leech, N. L. (2007). Toward a unified validation framework in mixed methods research, Journal of Mixed Methods Research 1: 309–332. 177

Dervin, B. (1992). The sense-making qualitative-quantitative methodology, Ori- ginally submitted version of chapter published as: Dervin, B. (1992). From the mind’s eye of the user: the sense-making qualitative-quantitative meth- odology, In: J. D. Glazier and R. R. Powell, Qualitative Research in Information Management, Libraries Unlimited, Englewood, CO, pp. 61– 84. Available at: http://www.ideals.illinois.edu/bitstream/handle/ 2142/2281/Dervin1992a.htm (Accessed on: 23 February 2012). 36, 49

Dervin, B. (1997). Given a context by any other name: methodological tools for taming the unruly beast, In: P. Vakkari, R. Savolainen and B. Dervin (Eds.), Information Seeking in Context: Proceedings of an International Conference on Research in Information Needs, Seeking and Use in Different Contexts, Taylor Graham, London, pp. 13–38. 31

Dervin, B., Rosenbaum, H. and Urquhart, C. (2010). What in the world are we talking about: info needs, seeking and use: key concepts: conver- gences and divergences. The differences that definitions make in core con- cepts, Workshop presented at ASIST 2010, 22–27 October 2010, Pittsburgh, PA. Available at: http://www.lib.utk.edu/refs/engineering/sigSI_ and_sigUSEworkshopASIST_2010_finalREV.pdf (Accessed on: 13 Febru- ary 2012). 13

Dewey, J. and Bentley, A. F. (1949). Knowing and the Known, Beacon Press, Boston, MA. 62, 63

DiCiccio, T. J. and Efron, B. (1996). Bootstrap confidence intervals, Statistical Science 11(3): 189–228. 260

Diesner, J. and Carley, K. (2004). Using network text analysis to detect the organizational structure of covert networks, Proceedings

390 BIBLIOGRAPHY

of the NAACSOS 2004 Conference, NAACSOS, Pittsburgh, PA. Available at: http://www.casos.cs.cmu.edu/publications/papers/ diesner_2004_usingnetworktext.pdf (Accesed on: 22 May 2010). 102

Diesner, J. and Carley, K. (2005). Revealing social structure from texts: meta- matrix text analysis as a novel method for network text analysis, In: V. K. Narayanan and D. J. Armstrong (Eds.), Causal Mapping for Research in Information Technology Research, Idea Group Publishing, London, pp. 81– 108. 88, 154, 161

Diesner, J. and Carley, K. (2010). Mapping socio-technical networks of Su- dan from open-source, large-scale text data, Abstract, Presentation at the 29th Annual Conference of the Sudan Studies Association (SSA), May 2010, West Lafayette, IN. Available at: http://www.andrew.cmu.edu/ user/jdiesner/publications/diesner_carley_sudan_2010.html (Ac- cessed on: 20 September 2011). 155

Dixon, R. M. W. (2005). A Semantic Approach to English Grammar, 2nd edn., Oxford University Press, Oxford. 81, 82, 83, 195, 196, 199, 469

Doreian, P. (2001). Causality in social network analysis, Sociological Methods & Research 30(1): 81–114. 119

Doreian, P., Batagelj, V. and Ferligoj, A. (2005). Generalized Blockmodeling, Vol. 25 of Structural Analysis in the Social Sciences, Cambridge University Press, Cambridge. 110

Doreian, P. and Stokman, F. N. (Eds.) (1997). Evolution of Social Networks, Gordon and Breach, Amsterdam. 120

D¨ork,M., Carpendale, S. and Williamson, C. (2011). The information flaneur: a fresh look at information seeking, CHI ’11: Proceedings of the SIGCHI conference on Human Factors in Computing Systems, ACM, New York, NY, pp. 1215–1224. 52, 252, 356

Downes, S. (2005). Semantic networks and social networks, The Learning Organ- ization 12(5): 411–417. 102

Dubtsov, R. (2005). Real-time event structures and Scott domains, In: V. Malyshkin (Ed.), Parallel Computing Technologies, Vol. 3606 of Lecture Notes in Computer Science, Springer, Berlin, pp. 42–48. 20

391 BIBLIOGRAPHY

Dutton, W. and Jeffreys, P. W. (Eds.) (2010). World Wide Research: Reshaping the Sciences and Humanities, The MIT Press, London. 367

Eagle, N. (2004). Can serendipity be planned?, MIT Sloan Management Review 46(1): 10–14. 4, 41, 54

Eagle, N. and Pentland, A. (2004). Social serendipity: proximity sensing and cueing, Technical Note 580, MIT Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA. Available at: http://vismod.media.mit. edu/tech-reports/old-TR-580.pdf (Accessed on: 17 February 2012). 4, 54

Eaglestone, B., Ford, N., Brown, G. J. and Moore, A. (2007). Information systems and creativity: an empirical study, Journal of Documentation 63(4): 443– 464. 12

Eco, U. (1999). Serendipities: Language and Lunacy, Phoenix, London. 12

Eddington, A. S. (1928). The Nature of the Physical World, MacMil- lan, New York, NY. Available at: http://archive.org/details/ natureofphysical00eddi (Accessed on: 28 March 2012). 49, 56

Edwards, G. (2010). Mixed-method approaches to social network analysis, Review Paper NCRM/015, Economic & Social Research Council, National Centre for Research Methods, Southampton. Available at: http://eprints.ncrm. ac.uk/842/1/Social_Network_analysis_Edwards.pdf (Accessed on: 20 May 2010). 118, 119, 334

Edwards, G. and Crossley, N. (2009). Measures and meanings: ex- ploring the ego-net of Helen Kirkpatrick Watts, militant suffra- gette, Methodological Innovations Online 4(1): 37–61. Available at: http://www.pbs.plym.ac.uk/mi/pdf/17-04-09/4.%20Edwards%20and% 20Crossley%20paper%2037-61.pdf (Accessed on: 7 June 2010. 118

Elliott, J. (2005). Using Narrative in Social Research: Qualitative and Quantit- ative Approaches, Sage, London. 72, 73, 74, 75, 76, 77, 357

Ellis, D. (1989). A behavioural approach to information retrieval system design, Journal of Documentation 45(3): 171–212. 49

Ellis, D. (1993). Modeling the information-seeking patterns of academic research-

392 BIBLIOGRAPHY

ers: a grounded theory approach, The Library Quarterly 63(4): 469–486. 32

Embree, L. E. (1992). Preface, In: L. Hardy and L. E. Embree (Eds.), Phe- nomenology of Natural Science, Vol. 9 of Contributions to Phenomenology, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. vii–xiv. 68

Emirbayer, M. (1997). Manifesto for a relational sociology, The American Journal of Sociology 103(2): 281–317. 62, 63

Erdelez, S. (1997). Information encountering: a conceptual framework for acci- dental information discovery, In: P. Vakkari, R. Savolainen and B. Dervin (Eds.), Information Seeking in Context: Proceedings of an International Conference on Research in Information Needs, Seeking and Use in Differ- ent Contexts, Taylor Graham, London, pp. 412–421. 51

Erdelez, S. (1999). Information encountering: it’s more than just bumping into information, Bulletin of the American Society for Information Science 25(3): 25–29. 10, 37, 40

Erdelez, S. (2004). Investigation of information encountering in the controlled research environment, Information Processing & Management 40(6): 1013– 1025. 4, 12

Erdelez, S. (2005). Information encountering, In: K. E. Fisher, S. Erdelez and E. F. McKechnie (Eds.), Theories of Information Behavior, Information Today, Medford, NJ, pp. 179–184. 12, 37

Erdelez, S. (2010). Opportunistic discovery of information: research challenges, Seminar presentation, UCL Interaction Centre, 9 July 2010, University Col- lege London, London. Available at: http://www-typo3.cs.ucl.ac.uk/ events/?eventnum=570 (Accessed on: 17 February 2012). 4, 37, 367

Erdelez, S. and Makri, S. (2011). Introduction to the thematic issue on opportun- istic discovery of information, Information Research 16(3): odiintro. Avail- able at: http://informationr.net/ir/16-3/odiintro.html (Accessed on: 13 February 2012). 4, 13, 26, 27, 28, 37

Erd¨os,P. and R´enyi, A. (1959). On random graphs, Publicationes Mathematicae (Debrecen) 6: 290–297. 121

Erd¨os,P. and R´enyi, A. (1960). On the evolution of random graphs, Publications

393 BIBLIOGRAPHY

of the Mathematical Institute of the Hungarian Academy of Sciences 5: 17– 61. 121

Erd¨os,P. and R´enyi, A. (1961). On the strength of connectedness of a random graph, Acta Mathematica Hungarica 12(1–2): 261–267. 121

Er´et´eo,G., Gandon, F., Buffa, M. and Corby, O. (2009). Semantic social net- work analysis, Proceedings of the WebSci’09: Society On-Line, 18–20 March 2009, Athens, Greece. Available at: http://www-sop.inria.fr/members/ Fabien.Gandon/docs/EreteoEtAl_WebSci2009.pdf (Accessed on: 25 June 2011). 79

Ericsson, K. A., Charness, N., Feltovich, P. J. and Hoffman, R. R. (Eds.) (2006). The Cambridge Handbook of Expertise and Expert Performance, Cambridge University Press, Cambridge. 139

Ertzscheid, O. and Gallezot, G. (2003). Chercher faux et trouver juste: serendipit´e et recherche d’information, Presentation at X° Colloque bilat´eralfranco- roumain, CIFSIC, Communication et Complexit´e,28 June–3 July 2003, University of Bucarest, Bucarest. Available at: http://archivesic. ccsd.cnrs.fr/docs/00/06/22/72/PDF/sic_00000689.pdf (Accessed on: 20 February 2012). 32, 367

Evans, B. M., Kairam, S. and Pirollo, P. (2009). Exploring the cognitive con- sequences of social search, CHI ’09: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, ACM, New York, NY, pp. 3377–3382. 33, 105

Everett, M. (2010). Social network analysis: current developments, Presentation at the Social Networks Day, The Mitchell Centre for Social Network Ana- lysis and methods@manchester joint event, 8 February 2010, University of Manchester, Manchester. Available at: www.methods.manchester.ac.uk/ events/2010-02-08/everett.ppt (Accessed on: 30 September 2011). 107

Faust, K. (2006). Comparing social networks: size, density, and local structure, Metodolo˘skizvezki 3(2): 185–216. 125, 174, 327

Faust, K. (2010). A puzzle concerning triads in social networks: graph constraints and the triad census, Social Networks 32(3): 221–233. 125, 267, 327, 328, 353, 357

394 BIBLIOGRAPHY

Faust, K. (2011). Comparing social networks, Keynote presentation at the 7th UK Social Networks Conference, 7–9 July 2011, University of Greenwich, London. 125

Feldstein, A. P. (2009). Analyzing online communitites: A narrative approach, Presentation at MiT6 Interantional Conference, 24 April 2009, MIT, Bo- ston, MA. 101

Fine, G. and Deegan, J. (1996). Three principles of serendip: insight, chance, and discovery in qualitative research, International Journal of Qualitative Studies in Education 9(4): 434–447. 7, 8, 10, 38, 41, 54, 55

Fischer, M. (2011). Social network analysis and qualitative comparative analysis: their mutual benefit for the explanation of policy network structures, Meth- odological Innovations Online 6(2): 27–51. Available at: http://www.pbs. plym.ac.uk/mi/pdf/31-08-11/3.%20Fischer%20-%20pp27-51.pdf (Ac- cessed on: 4 November 2011). 102

Fisher, K. E., Erdelez, S. and McKechnie, E. F. (Eds.) (2005). Theories of Information Behavior, Information Today, Medford, NJ. 33

Fisher, K. E. and Naumer, C. M. (2006). Information grounds: theoretical basis and empirical findings on information flow in social settings, In: A. Spink and C. Cole (Eds.), New Directions in Human Information Behavior, Vol. 8 of Information Science and Knowledge Management, Springer, Dordrecht, The Netherlands, pp. 93–111. 33

Fleming, L., Mingo, S. and Chen, D. (2007). Collaborative brokerage, gen- erative creativity, and creative success, Administrative Science Quarterly 52(3): 443–475. 106

Ford, N. (2004). Modeling cognitive processes in information seeking: from Pop- per to Pask, Journal of the American Society for Information Science and Technology 55(9): 769–782. 89

Foster, A. (2003). Interdisciplinary Information Seeking Behaviour: A Natural- istic Inquiry, PhD thesis, University of Sheffield, Sheffield. 32, 43

Foster, A. (2005). A non-linear model of information seeking behaviour, Informa- tion Research 10(2): paper 222. Available at: http://informationr.net/ ir/10-2/paper222.html (Accessed on: 28 March 2012). 46, 49

395 BIBLIOGRAPHY

Foster, A. (2006). A non-linear perspective on information seeking, In: A. Spink and C. Cole (Eds.), New Directions in Human Information Behavior, Vol. 8 of Information Science and Knowledge Management, Springer, Dordrecht, The Netherlands, pp. 155–170. 49

Foster, A. and Ford, N. (2003). Serendipity and information seeking: an empirical study, Journal of Documentation 59(3): 321–340. 4, 7, 8, 12, 27, 32, 36, 40, 41, 42, 43, 44, 49, 52, 59, 60, 238, 242, 243, 244, 245, 356, 450, 483

Frantz, T. L. and Carley, K. M. (2005). Relating network topology to the robustness of centrality measures, Technical Report CMU-ISRI-05- 117, Center for Computational Analysis of Social and Organizational Systems (CASOS), Institute for Software Research International (ISRI), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Available at: http://reports-archive.adm.cs.cmu.edu/anon/ isri2005/CMU-ISRI-05-117.pdf (Accessed on: 29 August 2010). 117, 354

Franzosi, R. (1989). From words to numbers: a generalized and linguistics- based coding procedure for collecting textual data, Sociological Methodology 19: 263–298. 79

Franzosi, R. (1998). Narrative analysis—or why ( and how ) sociologists should be interested in narrative, Annual Review of Sociology 24: 517–554. 72

Franzosi, R. (2004). From Words to Numbers: Narrative, Data, and Social Sci- ence, Vol. 22 of Structural Analysis in the Social Sciences, Cambridge Uni- versity Press, Cambridge. 79

Franzosi, R. (2010). Quantitative Narrative Analysis, Sage, London. 19, 41, 55, 56, 73, 76, 77, 78, 79, 85, 152, 161, 195, 196, 198, 462, 469, 471

Franzosi, R., Elliott-Smith, B. and Cunial, F. (2012). PC-ACE: Program for Computer-Assisted Coding of Events, Emory University, Atlanta, GA. Available at: http://www.pc-ace.com/ Accessed on: 4 September 2012). 85

Freeman, L. C. (1982). Centered graphs and the structure of ego networks, Mathematical Social Sciences 3(3): 291–304. 108

Freeman, L. C. (1996). Some antecedents of social network analysis, Connections

396 BIBLIOGRAPHY

19(1): 39–42. Available at: http://www.insna.org/PDF/Connections/ v19/1996_I-1-4.pdf (Accessed on: 30 September 2011). 107

Freeman, L. C. (2004). The Development of Social Network Analysis: A Study in the Sociology of Science, Empirical Press, Vancouver. 107

Freeman, L. C. (2011). The development of social network analysis—with an emphasis on recent events, In: J. Scott and P. J. Carrington (Eds.), The Sage Handbook of Social Network Analysis, Sage, London, pp. 26–39. 107

Friend, R. (2008). Serendipity in physics, In: M. de Rond and I. Morley (Eds.), Serendipity: Fortune and the Prepared Mind, Vol. 22 of The Darwin College Lectures, Cambridge University Press, Cambridge, pp. 91–107. 52

Gainor, R. (2011). The relevant clues: information behaviour and assessment in classic detective fiction, In: P. McKenzie, C. Johnson and S. Steven- son (Eds.), Exploring Interactions of People, Places and Information, Pro- ceedings of the 39th CAIS-ACSI conference, University of New Brun- swick/St. Thomas University, 2–4 June 2011, Fredericton, NB. Available at: http://www.cais-acsi.ca/proceedings/2011/30_Gainor.pdf (Ac- cessed on: 8 December 2011). 72

Gao, Y., Mathee, K., Narasimhan, G. and Wang, X. (1999). Motif detection in protein sequences, In: R. Baeza-Yates, E. Ch´avez and J. Favela (Eds.), SPIRE ’99: Proceedings of String Processing and Information Retrieval Symposium & International Workshop on Groupware, IEEE Computer So- ciety, Washington, D. C., pp. 63–72. 124

Garc´ıa, P. (2009). Discovery by serendipity: a new context for an old riddle, Foundations of Chemistry: Philosophical, Historical, Educational and In- terdisciplinary Studies of Chemistry 11(1): 33–42. 27, 29, 365

Garfield, E. (1993a). Current comments—Creativity and science. Part 2. The process of scientific discovery, Essays of an Information Scientist, Vol. 12 (1989), ISI Press, Philadelphia, pp. 314–320. Available at: http://www. garfield.library.upenn.edu/essays/v12p314y1989.pdf (Accessed on: 7 June 2010). 3, 139

Garfield, E. (1993b). Current comments—Expansion of Citation Classics—250 unique commentaries per year, Essays of an Information Scientist, Vol. 4 (1979-1980), ISI Press, Philadelphia, pp. 1–8. Available at: http:

397 BIBLIOGRAPHY

//www.garfield.library.upenn.edu/essays/v4p001y1979-80.pdf (Ac- cessed on: 12 September 2011). 135, 136

Garfield, E. (1993c). Current comments—Introducing Citation Classics: the human side of scientific reports, Essays of an Information Scientist, Vol. 3 (1977-1978), ISI Press, Philadelphia, pp. 1–2. Available at: http: //www.garfield.library.upenn.edu/essays/v3p001y1977-78.pdf (Ac- cessed on: 1 June 2011). 135, 136, 139

Garfield, E. (1993d). Current comments—Using the Citation Classics database for science studies. Part 1. Helen Astin on gender differences in author productivity, Essays of an Information Scientist, Vol. 15 (1992-1993), ISI Press, Philadelphia, pp. 347–354. Available at: http://www.garfield. library.upenn.edu/essays/v15p347y1992-93.pdf (Accessed on: 8 June 2011). 3, 136

Garfield, E. (1993e). Current contents—From obliteration to immortality—And the role of autobiography in reporting the realities behind high impact re- search, Essays of an Information Scientist, Vol. 15 (1992-1993), ISI Press, Philadelphia, pp. 387–392. Available at: http://www.garfield.library. upenn.edu/essays/v15p387y1992-93.pdf (Accessed on: 7 June 2011). 3, 135, 136, 137, 140, 237

Garfield, E. (1993f). Current contents—Ninth anniversary, Essays of an In- formation Scientist, Vol. 1 (1962-1973), ISI Press, Philadelphia, pp. 12– 15. Available at: http://www.garfield.library.upenn.edu/essays/ V1p012y1962-73.pdf (Accessed on: 12 September 2011). 135

Garfield, E. (2004a). The intended consequences of Robert K. Merton, Sciento- metrics 60(1): 51–61. 7

Garfield, E. (2004b). Papers with “serendipit*” in the title, HistCite . Avail- able at: http://garfield.library.upenn.edu/histcomp/serendipit_ title/ (Accessed on : 17 February 2012). 28, 29, 254, 372

Garfield, E. (2007). Citation classics, Web of Stories (Video number 60, length 3 min 43 sec). Available at: http://www.webofstories.com/play/19481? o=MS (Accessed on: 10 July 2011). 136

Garfield, E. (2012). Citation Classics . Available at: http://www. citationclassics.org (Accessed on: 4 September 2012). 134, 136, 145,

398 BIBLIOGRAPHY

185, 197, 204, 211, 222, 227, 247, 443, 455, 461, 465, 466, 467, 468

Gazan, R. (2005). Imposing structures: narrative analysis and the design of information systems, Library & Information Science Research 27(3): 346– 362. 72

Gibbons, J. D. (1993). Nonparametric Statistics: An Introduction, Sage, London. 175

Gilbert, E. (1959). Random graphs, Annals of Mathematical Statistics 30(4): 1141–1144. 121

Gilbert, N. (2008). Agent-Based Models, Sage, London. 120

Gile, K. and Handcock, M. S. (2006). Model-based assessment of the impact of missing data on inference for networks, Working Paper 66, Center for Statistics and the Social Sciences, University of Washington, Seattle, WA. Available at: http://www.csss.washington.edu/Papers/wp66.pdf (Ac- cessed on 3 November 2011). 117, 118

Gilsing, V., Nooteboom, B., Vanhaverbeke, W., Duysters, G. and van den Oord, A. (2008). Network embeddedness and the exploration of novel technologies: technological distance, betweenness centrality and density, Research Policy 37: 1717–1731. 106

Glaser, B. G. (1992). Basics of Grounded Theory Analysis: Emergence vs. For- cing, Sociology Press, Mill Valley, CA. 69

Glaser, B. G. (2004). “Naturalist inquiry” and grounded theory, Forum Qualit- ative Sozialforschung / Forum: Qualitative Social Research 5(1). Available at: http://www.qualitative-research.net/index.php/fqs/article/ view/652/1412 (Accessed on: 3 October 2011). 70

Glaser, B. G. and Strauss, A. L. (1967). The Discovery of Grounded Theory: Strategies for Qualitative Research, Aldine, Chicago. 69

Goodman, R. B. (Ed.) (2005). Pragmatism: Critical Concepts in Philosophy, Vol. 1–4, Routledge, London. 13

Google (2012). GOOGLE Search Engine, Google, Mountain View, CA. Available at: http://www.google.com (Accessed on: 4 September 2012). 141

399 BIBLIOGRAPHY

Gopnik, M. (1972). Linguistic Structures in Scientific Texts, Mouton, Paris. 151, 164, 365

Greenacre, M. (2008). Measures of distance between samples: non-Euclidean, Course STA254, Correspondence Analysis and Related Methods, Additional reading material, Fall, 2008, Department of Statistics, Stanford Univer- sity, Palo Alto, CA. Available at: http://www.econ.upf.edu/~michael/ stanford/maeb5.pdf (Accessed on: 5 December 2011). 224, 481

Grunwald, E. (2005). Narrative norms in written news, Nordicom Review 26(1). Available at: http://www.nordicom.gu.se/common/publ_pdf/ 180_063-080.pdf (Accessed on: 25 June 2011). 154

Haahr, M. (2012). random.org . Available at: http://www.random.org/ (Ac- cessed on: 7 September 2012). 145

Hagman, E. P. (1933). The companionships of preschool children, University of Iowa Studies in Child Welfare 7(4): 10–69. 107

Halliday, M. (1973). Explorations in the Function of Language, Edward Arnold, London. 74

Halliday, M. A. K. (1985). An Introduction to Functional Grammar, Edward Arnold, London. 77

Halliday, M. A. K. (1994). An Introduction to Functional Grammar, 2nd edn., Edward Arnold, London. 19, 41, 55, 77

Halliday, M. A. K. and Matthiessen, C. M. I. M. (2004). An Introduction to Functional Grammar, 3rd, revised edn., Arnold, London. 19, 77

Handcock, M. S. and Gile, K. (2007). Modeling social networks with sampled or missing data, Working Paper 75, Center for Statistics and the Social Sciences, University of Washington, Seattle, WA. Available at: http:// www.csss.washington.edu/Papers/wp75.pdf (Accessed on: 3 November 2011). 117, 118

Hangal, S., Nagpal, A. and Lam, M. S. (2012). Effective browsing and serendip- itous discovery with an experience-infused browser, IUI ’12: Proceedings of the 2012 International Conference on Intelligent User Interfaces, ACM, New York, NY. 4, 39, 58, 59

400 BIBLIOGRAPHY

Hannan, P. J. (2006). Serendipity, Luck and Wisdom in Research, iUniverse, Inc., New York, NY. 38, 137, 141, 254

Hanneman, R. A. and Riddle, M. (2005). Introduction to Social Network Methods, University of California, Riverside, Riverside, CA. Available at: http: //www.faculty.ucr.edu/~hanneman/nettext/ (Accessed on: 26 October 2011). 94

Hargittai, E. and Hinnant, A. (2006). Toward a social framework for informa- tion seeking, In: A. Spink and C. Cole (Eds.), New Directions in Human Information Behavior, Vol. 8 of Information Science and Knowledge Man- agement, Springer, Dordrecht, The Netherlands, pp. 55–70. 33

Harris, D. B. (1980). Citation classics—Children’s drawings as measures of intellectual maturity: a revision and extension of the Goodenough draw-a-man test, Current Contents: Social and Behavioral Sciences (17, April 28): 22. Available at: http://garfield.library.upenn.edu/ classics1980/A1980JN51000001.pdf (Accessed on: 5 April 2010). 143

Haythornthwaite, C. (1996). Social network analysis: an approach and techniques for the study of information exchange, Library & Information Science Re- search 18(4): 323–342. 103, 104

Heath, S., Fuller, A. and Johnston, B. (2009). Chasing shadows: defining net- work boundaries in qualitative social network analysis, Qualitative Research 9(5): 645–661. 117, 118

Heffron, J. (2012). Search CTAN, AZ: Jim Hefferon’s Web Views of the Compre- hensive TEX Archive Network . Available at: http://www.ctan.org/ (Ac- cessed on: 18 February 2012). 23, 435

Heidegger, M. (2002). The phenomenological method of investigation, In: D. Moran and T. Mooney (Eds.), The Phenomenology Reader, Routledge, London, pp. 278–287. 67

Heinstr¨om,J. (2005). Fast surfing, broad scanning and deep diving: the influ- ence of personality and study approach on students’ information-seeking behavior, Journal of Documentation 61(2): 228–247. 9

Helft, M. (2007). A way to find your corner of the Internet sky, Business section, The New York Times 7 October 2007. 58

401 BIBLIOGRAPHY

Hennig, M. (2006). Netzwerkanalyse literarishcer Texte—am Beispiel Thomas Manns Der Zauberberg, In: B. Hollstein and F. Straus (Eds.), Qualitative Netzwerkanalyse: Konzepte, Methoden, Anwendungen, VS Verlag, Wies- baden, Germany, pp. 465–480. 101

Hewett, T., Czerwinski, M., Terry, M., Nunamaker, J., Candy, L., Kules, B. and Sylvan, E. (2005). Creativity support tool evaluation methods and metrics, Creativity Support Tools: A Workshop Sponsored by the National Science Foundation, 13–14 June 2005, Washington, D.C., pp. 10–24. 12

Hinchman, L. P. and Hinchman, S. K. (1997). Introduction, In: L. P. Hinch- man and S. K. Hinchman (Eds.), Memory, Identity, Community: The Idea of Narrative in the Human Sciences, State University of New York Press, Albany, NY, pp. xiii–xxxii. 73

Holder, R. J. and McKinney, R. N. (1993). Serendipity, the art of discontinuous improvement and staying alive. Wasn’t that new just yesterday?, Journal for Quality and Participation 16(7): 54–63. 12, 50, 251

Holland, P. W. and Leinhardt, S. (1981). An exponential family of probabil- ity distributions for directed graphs, Journal of the American Statistical Association 76(373): 33–65. 118

Holland, S. M. (2008). Non-Metric Multidimensional Scaling (MDS), Department of Geology, University of Georgia, Athens, GA. Available at: http:// strata.uga.edu/software/pdf/mdsTutorial.pdf (Accessed on: 11 July 2011). 224

Hollstein, B. and Straus, F. (Eds.) (2006). Qualitative Netzwerkanalyse: Konzepte, Methoden, Anwendungen, VS Verlag f¨urSozialwissenschaften, Wiesbaden, Germany. 118

Hothorn, T., Hornik, K., van de Wiel, M. A. and Zeileis, A. (2006). A lego system for conditional inference, The American Statistician 60(3): 257–263. 228

Hothorn, T., Hornik, K., van de Wiel, M. A. and Zeileis, A. (2008). Imple- menting a class of permutation tests: the coin package, Journal of Statist- ical Software 28(8): 1–23. Available at: http://www.jstatsoft.org/v28/ i08/ (Accessed on: 21 November 2011). 228

Hubbard, R. M. (1929). A method of studying spontaneous group formation,

402 BIBLIOGRAPHY

In: Dorothy Swaine Thomas and Associates (Ed.), Some New Techniques for Studying Social Behavior, Teachers College, Columbia University, Child Development Monographs, New York, NY, pp. 76–85. 107

Huemer, W. (2010). Franz Brentano, In: E. N. Zalta (Ed.), The Stan- ford Encyclopedia of Philosophy, Center for the Study of Language and Information (CSLI), Stanford University, Stanford, CA. Available at: http://plato.stanford.edu/entries/brentano/ (Accessed on: 3 Octo- ber 2011). 67

Huisman, M. (2009). Imputation of missing network data: some simple pro- cedures, Journal of Social Structure 10(1). Available at: http://www. cmu.edu/joss/content/articles/volume10/huisman.pdf (Accessed on: 3 November 2011). 117, 118

Hulme, D. and Toye, J. (2005). The case for cross-disciplinary social science re- search on poverty, inequality and well-being, Working paper GPRG–WPS– 001, Global Poverty Research Group, Department of Economics, Oxford University, Oxford. Available at: http://economics.ouls.ox.ac.uk/ 14081/1/gprg-wps-001.pdf (Accessed on: 20 March 2012). 52, 53

Hummell, H. J. and Sodeur, W. (1987). Strukturbeschreibung von Positionen sozialen Beziehungsnetzen, In: F. U. Pappi (Ed.), Techniken der Em- pirischen Sozialforschung: 1, Methoden der Netzwerkanalyse, Oldenbourg, Munich, pp. 177–202. 328

Husserl, E. (1900). Logische Untersuchungen,, Vol. 1, Max Niemeyer, Halle A. S. Available at: http://www.freidok.uni-freiburg.de/volltexte/5920/ pdf/Husserl_Logische_Untersuchungen_1_1900.pdf (Accessed on: 3 October 2011). 67

Husserl, E. (1901). Logische Untersuchungen, Vol. 2, Max Niemeyer, Halle A. S. Available at: http://www.freidok.uni-freiburg.de/volltexte/6020/ pdf/Husserl_Logische_Untersuchungen_2_1901.pdf (Accessed on: 3 October 2011). 67

Isaksen, S. G. (1987). Introduction: an orientation to the frontiers of creativity research, In: S. G. Isaksen (Ed.), Frontiers of Creativity Research: Beyond the Basics, Bearly Limited, Buffalo, NY, pp. 1–26. 12

Jack, S. L. (2010). Approaches to studying networks: implications and outcomes,

403 BIBLIOGRAPHY

Journal of Business Venturing 25(1): 120–137. 118

J¨arvelin, K. and Wilson, T. D. (2003). On conceptual models for information seek- ing and retrieval research, Information Research 9(1): paper 163. Available at: http://informationr.net/ir/9-1/paper163.html (Accessed on: 10 October 2011). 89

Johnson, C. (2004). Choosing people: the role of social capital in informa- tion seeking behaviour, Information Research 10(1): paper 201. Available at: http://informationr.net/ir/10-1/paper201.html (Accessed on: 20 February 2012). 33

Johnson, J. D. (2003). On contexts of information seeking, Information Processing & Management 39(5): 735–760. 31

Johnson-Laird, P. N. (2005). Mental models and thought, In: K. J. Holyoak and R. G. Morrison (Eds.), The Cambridge Handbook of Thinking and Reason- ing, Cambridge University Press, Cambridge, pp. 185–208. 71, 72

Johnson, S. (2010). Where Good Ideas Come From: The Natural History of Innovation, Penguin, London. 12, 40, 41, 58, 59

Jung, J. J. and Euzenat, J. (2007). Towards semantic social networks, ESWC ’07: Proceedings of the 4th European Conference on The Semantic Web: Research and Applications, Springer-Verlag, Berlin, pp. 267–280. 102

Juszczyszyn, K. and Musial, K. (2011). Motif analysis and the periodic structural changes in an organizational e-mail-based social network, Virtual Com- munities: Concepts, Methodologies, Tools and Applications, IGI Global, Hershey, PA, pp. 292–302. 125

Kamaliha, E., Riahi, F., Qazvinian, V. and Adibi, J. (2008). Characterizing network motifs to identify spam comments, ICDMW ’08: Proceedings of the 2008 IEEE International Conference on Data Mining Workshops, IEEE Computer Society, Washington, D. C., pp. 919–928. 125

Karafillidis, A. (2008). Networks and boundaries, Presentation at the In- ternational Symposium, Relational Sociology: Transatlantic Impulses for the Social Sciences, 25–26 September 2008, Humboldt Universit¨at zu Berlin, Berlin. Available at: http://www.relational-sociology.de/ karafillidis.pdf (Accessed on: 2 November 2011). 117

404 BIBLIOGRAPHY

Katz, J. S. (1994). Geographical proximity and scientific collaboration, Sciento- metrics 31(1): 31–43. 368

Kijkuit, B. and van den Ende, J. (2007). The organizational life of an idea: integ- rating social network, creativity and decision-making perspectives, Journal of Management Studies 44(6): 863–882. 106

King, G., Keohane, R. O. and Verba, S. (1994). Designing Social Inquiry: Sci- entific Inference in Qualitative Research, Princeton University Press, Prin- ceton, NJ. 232

Kinsella, S., Harth, A. and Breslin, J. G. (2007). Network analysis of semantic connections in heterogeneous social spaces, Presentation at The UK Social Network Conference, 13 July 2007, University of London, London. Avail- able at: http://www.johnbreslin.org/files/publications/20070713_ uksn2007.pdf (Accessed on: 9 September 2011). 79, 80

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment, Journal of the ACM 46(5): 604–632. 105

Knox, H., Savage, M. and Harvey, P. (2006). Social networks and the study of relations: networks as method, metaphor and form, Economy and Society 35(1): 113–140. 119, 334

Koenig, M. (2000). Why serendipity is the key to innovation, Knowledge Man- agement Review 3(2): 10–11. 12, 59

Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Mod- els, Springer, New York, NY. 94, 100, 118, 121, 122, 123, 124, 125, 127

Koskinen, J. (2007). Fitting models to social networks with missing data, Present- ation at Sunbelt XXVII, the International Sunbelt Social Network Confer- ence, 1–6 May 2007, Corfu, Greece. 118

Koskinen, J., Robins, G. and Pattison, P. (2008). Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation, Technical Report 08–04, MelNet Social Networks Laboratory, Department of Psychology, School of Behavioural Science, University of Melbourne, Mel- bourne. Available at: http://www.sna.unimelb.edu.au/publications/ MelNet_Techreport_08_04.pdf (Accessed on: 3 November 2011). 118

405 BIBLIOGRAPHY

Kossinets, G. (2006). Effects of missing data in social networks, Social Networks 28(3): 247–268. 117, 354

Kossinets, G., Kleinberg, J. and Watts, D. (2008). The structure of information pathways in a social communication network, KDD ’08: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, pp. 435–443. Available at: http:// arxiv.org/PS_cache/arxiv/pdf/0806/0806.3201v1.pdf (Accessed on: 28 July 2011). 104

Kozak, M. and Krzanowski, W. J. (2010). Effective presentation of data, European Science Editing 36(2): 41–42. 260

Krippendorf, K. (1980). Content Analysis: An Introduction to its Methodology, Sage, Beverly Hills, CA. 196

Krippendorff, K. (2009). Testing the reliability of content analysis data: what is involved and why, In: K. Krippendorff and M. A. Bock (Eds.), The Content Analysis Reader, Sage, London, pp. 350–357. 350

Kruskal, J. B. and Wish, M. (1978). Multidimensional Scaling, Sage, Beverly Hills, CA. 224

Kuhlthau, C. C. (1993). Seeking Meaning: A Process Approach to Library and Information Services, Ablex, Norwood, NJ. 49

Laborde, D. (2009). Improvisation, s´erendipit´e, ind´etermination, Presenta- tion at La S´erendipit´edans les Sciences, les Arts et la D´ecision,20–30 July 2009, Centre Culturel International de Cerisy-La-Salle. Available at: http://www.ccic-cerisy.asso.fr/serendipite09.html (Accessed on: 13 February 2012). 12

Labov, W. (1997). Some further steps in narrative analysis, Journal of Narrative and Life History 7(1–4): 395–415. 73, 76, 77, 152, 153, 195, 196, 364

Labov, W. (2003). Uncovering the event structure of narrative, In: D. Tannen and J. E. Alatis (Eds.), Georgetown University Round Table on Languages and Linguistics (GURT) 2001. Linguistics, Language, and the Real World: Discourse and Beyond, Georgetown University Press, Washington, DC, pp. 63–83. 20, 85, 87, 195, 196, 356, 469

406 BIBLIOGRAPHY

Labov, W. and Waletzky, J. (1967). Narrative analysis: oral versions of personal experience, In: J. Helm (Ed.), Essays on the Verbal and Visual Arts, University of Washington Press, Seattle, pp. 12–44. 73, 76, 77, 152, 195, 196, 364, 469

Labov, W. and Waletzky, J. (1997). Narrative analysis: oral versions of personal experience, Journal of Narrative and Life History 7(1–4): 3–38. 76, 77, 152, 195, 196, 469

Lamport, L. (2012). LATEX. Available at: http://www.latex-project.org/ (Ac- cessed on: 10 January 2012). 23

Latour, B. (2005). Reassembling the Social: An Introduction to Actor-Network- Theory, Oxford University Press, Oxford. 62

Latour, B. and Woolgar, S. (1986). Laboratory Life: The Construction of Sci- entific Facts, Princeton University Press, Princeton, NJ. 330, 366

Laumann, E. O., Marsden, P. V. and Prensky, D. (1983). The boundary specific- ation problem in network analysis, In: R. S. Burt and M. J. Minor (Eds.), Applied Network Analysis: A Methodological Introduction, Sage, London, pp. 18–34. 116

Laumann, E. O., Marsden, P. V. and Prensky, D. (1989). The boundary spe- cification problem in network analysis, In: L. C. Freeman, D. R. White and A. K. Romney (Eds.), Research Methods in Social Network Analysis, George Mason University Press, Fairfax, VA, pp. 61–87. 116, 117

Layder, D. (2006). Understanding Social Theory, 2nd edn., Sage, London. 62

Lazer, D. and Friedman, A. (2007). The network structure of exploration and exploitation, Administrative Science Quarterly 52(4): 667–694. 62

Lee, S. S., Theng, Y. L. and Goh, D. H. (2005). Creative information seeking. Part I: a conceptual framework, Aslib Proceedings: New Information Perspectives 57(5): 460–475. 12

Leong, T. W., Vetere, F. and Howard, S. (2007). Serendipity doesn’t happen by chance, Unpublished manuscript, Department of Information Systems, The University of Melbourne, Melbourne. Available at: http://disweb.dis.unimelb.edu.au/staff/twleong/

407 BIBLIOGRAPHY

publications/serendipity_doesn’t_happen_by_chance.pdf (Accessed on: 30 March 2012). 58

Leong, T. W., Wright, P., Vetere, F. and Howard, S. (2010). Understanding ex- perience using dialogical methods: the case of serendipity, OZCHI ’10: Pro- ceedings of the 22nd Conference of the Computer-Human Interaction Special Interest Group of Australia on Computer-Human Interaction, ACM, New York, NY, pp. 256–263. 39, 40

Lewis, M. and Staehler, T. (2010). Phenomenology: An Introduction, Continuum, London. 65, 67, 70

Leydesdorff, L. and Vaughan, L. Q. (2006). Co-occurrence matrices and their applications in information science: extending ACA to the Web environ- ment, Journal of the American Society for Information Science and Tech- nology 57(12): 1616–1628. Available at: http://www.leydesdorff.net/ aca/index.htm (Accessed on: 5 December 2011). 224

Liddy, E. D. (2003). Natural language processing, Encyclopedia of Library and Information Science, Marcel Decker, Inc., New York, NY. Available at: http://cnlp.syr.edu/publications/03NLP.LIS. Encyclopedia.pdf (Accessed on: 13 October 2011). 81

Lieblich, A., Tuval-Maschiach, R. and Zilber, T. (1998). Narrative Research: Reading, Analysis, and Interpretation, Sage, Thousand Oaks, CA. 75

Liebowitz, J. (2005). Linking social network analysis with the analytic hier- archy process for knowledge mapping in organizations, Journal of Know- ledge Management 9(1): 76–86. 105

Liestman, D. (1992). Chance in the midst of design: approaches to library re- search. Serendipity, RQ 31(Summer): 524–532. 43

Lincoln, Y. S. and Guba, E. G. (1985). Naturalistic Inquiry, Sage, London. 69, 70, 177

Lopatovska, I., Basen, A. S., Duneja, A. M., Kwong, H., Pasquinelli, D. L., Sopab, S., Stokes, B. J. and Weller, C. (2011). Information behaviour of New York City subway commuters, Information Research 16(4): paper 501. Available at: http://informationr.net/ir/16-4/paper501.html (Accessed on: 21 February 2012). 31

408 BIBLIOGRAPHY

Lorrain, F. and White, H. C. (1971). Structural equivalence of individuals in social networks, Journal of Mathematical Sociology 1: 49–80. 110

Louis, T. A. and Zeger, S. L. (2007). Effective communication of standard errors and confidence intervals, Working Paper 151, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins Uni- versity, Baltimore, MD. Available at: http://biostats.bepress.com/ jhubiostat/paper151 (Accessed on: 27 June 2012). 260

Mackenzie, M. (2005). Managers look to the social network to seek inform- ation, Information Research 10(2): paper 216. Available at: http:// informationr.net/ir/10-2/paper216.html (Accessed on: 20 February 2012). 33

Makri, S. and Blandford, A. (2011). What is serendipity? A workshop report, In- formation Research 16(3): paper 491. Available at: http://informationr. net/ir/16-3/paper491.html (Accessed on: 13 February 2012). 9, 10, 40, 46, 48, 49

Makri, S. and Blandford, A. (2012a). Coming across information serendipitously: part 1—a process model, Journal of Documentation 68(5): Earlycite (Date online 31/7/2012). 55, 332

Makri, S. and Blandford, A. (2012b). Coming across information serendipitously: part 2—a classification framework, Journal of Documentation 68(5): Early- cite (Date online 31/7/2012). 55, 332

Makri, S. and Warwick, C. (2010). Information for inspiration: understanding ar- chitects’ information seeking and use behaviors to inform design, Journal of the American Society for Information Science and Technology 61(9): 1745– 1770. 27

Mann, T. (1924). Der Zauberberg, S. Fischer, Berlin. 101

Marin, A. and Wellman, B. (2011). Social network analysis: an introduction, In: J. Scott and P. J. Carrington (Eds.), The Sage Handbook of Social Network Analysis, Sage, London, pp. 11–25. 107

Marsden, P. V. (1990). Network data and measurement, Annual Review of Soci- ology 16: 435–463. 74, 117

409 BIBLIOGRAPHY

Marsden, P. V. (2005). Recent developments in network measurement, In: P. J. Carrington, J. Scott and S. Wasserman (Eds.), Models and Methods in Social Network Analysis, Vol. 27 of Structural Analysis in the Social Sciences, Cambridge University Press, Cambridge, pp. 8–30. 116

McBirnie, A. (2008a). A Model of Serendipity in Information Seeking, Masters dissertation, Thames Valley University, London. 2, 5, 8, 12, 46, 52

McBirnie, A. (2008b). Seeking serendipity: the paradox of control, Aslib Proceed- ings: New Information Perspectives 60(6): 600–618. 2, 8, 12, 27, 46, 59, 60

McBirnie, A. (2009). Serendipity, Workshop presentation at the Oxford Internet Institute Doctoral Workshop, March 2009, University of Oxford, Oxford. 494

McBirnie, A. (2010a). Social network analysis, Workshop presentation at the Research2 PhD Forum, 8–9 July 2010, Loughborough University, Lough- borough. 494

McBirnie, A. (2010b). The structure of serendipity: research approaches and practical applications, Seminar presentation, UCL Interaction Centre, 27 October 2010, University College London, London. Available at: http: //www.cs.ucl.ac.uk/events/?eventnum=588 (Accessed on: 17 August 2012). 493

McBirnie, A. (2011). Studying serendipity: a multi-model network approach to narrative, Presentation at the 7th UK Social Network Conference, 7–9 July 2011, University of Greenwich, London. 493

McBirnie, A. (2012). Phenomenological event structures of serendipity. A multi-model network approach to information behaviour in con- text, Doctoral Forum presentation, Statistical Cybermetrics Research Group, 14 February 2012, University of Wolverhampton, Wolverhamp- ton. Available at: http://cybermetrics.wlv.ac.uk/abigail.mcbirnie. abstract14feb.pdf (Accessed on: 17 August 2012). 493

McBirnie, A. and Urquhart, C. (2011). Motifs: dominant interaction patterns in event structures of serendipity, Information Research 16(3): paper 494. Available at: http://informationr.net/ir/16-3/paper494.html (Ac- cessed on: 9 November 2011). 54, 126, 311, 319, 493

410 BIBLIOGRAPHY

McCay-Peet, L. and Toms, E. (2010). The process of serendipity in knowledge work, IIiX 20101: Proceedings of the 3rd Symposium on Information In- teraction in Context, New Brunswick, NJ, 18–21 August 2010, ACM, New York, NY, pp. 377–382. 12, 27, 40, 46, 49, 55

McCay-Peet, L. and Toms, E. (2011a). Measuring the dimensions of serendipity in digital environments, Information Research 16(3): paper 483. Available at: http://informationr.net/ir/16-3/paper483.html (Accessed on: 14 December 2011). 26, 37, 332

McCay-Peet, L. and Toms, E. (2011b). The serendipity quotient, Presentation at ASIST 2011, Bridging the Gulf: Communication and Information in Soci- ety, Technology, and Work, 9–13 October 2011, New Orleans, LA. Available at: www.asis.org/asist2011/posters/236_FINAL_SUBMISSION.doc (Ac- cessed on: 13 February 2012). 7, 10

McCulloh, I. (2009). Detecting Changes in a Dynamic Social Net- work, PhD thesis, Carnegie Mellon University, Pittsburgh, PA. Available at: http://www.casos.cs.cmu.edu/publications/papers/ CMU-ISR-09-104.pdf (Accessed on: 4 November 2011). 120

McCulloh, I. and Carley, K. (2009). Longitudinal dynamic network ana- lysis, Technical Report CMU-ISR-09-118, Institute for Software Re- search, School of Computer Science, Carnegie Mellon University, Pitts- burgh, PA. Available at: http://www.casos.cs.cmu.edu/publications/ papers/CMU-ISR-09-118.pdf (Accessed on: 4 November 2011). 120

Mehler, A. (2008). Large text networks as an object of corpus linguistic stud- ies, In: A. L¨udelingand M. Kyt¨o(Eds.), Corpus Linguistics. An In- ternational Handbook of the Science of Language and Society, De Gruyter, Berlin, pp. 328–382. Available at: http://www.hucompute.org/data/pdf/ mehler_2007_a.pdf (Accessed on: 28 October 2011). 100

Mergel, I. and Henning, M. (2007). Market analysis of “social network analysis” books, Working paper PNG07–001(a), Program on Networked Gov- ernance, Kennedy School of Government, Harvard University, Cambridge, MA. Available at: http://www.hks.harvard.edu/netgov/files/png_ workingpaper_series/PNG07_001a_WorkingPaper_MergelHennig_EN. pdf (Accessed on: 26 May 2010). 107

Merton, R. K. (1936). The unanticipated consequences of purposive social action,

411 BIBLIOGRAPHY

American Sociological Review 1(6): 894–904. 6

Merton, R. K. (1945). Sociological theory, American Journal of Sociology 50(6): 462–473. 5, 51

Merton, R. K. (1948). The bearing of empirical research upon the development of social theory, American Sociological Review 13(5): 505–515. 5, 7, 28, 37, 272, 367

Merton, R. K. (1968). Social Theory and Social Structure, 1968 enlarged edn., Free Press, New York, NY. 8, 12

Merton, R. K. (1973). The Sociology of Science: Theoretical and Empirical In- vestigations, University of Chicago Press, Chicago, IL. 6, 11, 366

Merton, R. K. (1975). Thematic analysis in science: notes on Holton’s concept, Science 188(4186): 335–338. 45

Merton, R. K. (1987). Three fragments from a sociologist’s notebook: establish- ing the phenomenon, specified ignorance, and strategic research materials, Annual Review of Sociology 13: 1–29. 6, 12

Merton, R. K. (1996). On Social Structure and Science, University of Chicago Press, Chicago, IL. 11, 366

Merton, R. K. (2004). Afterword, The Travels and Adventures of Serendipity, Princeton University Press, Princeton, NJ, pp. 230–298. 5, 10, 11, 50, 51, 237, 248, 251, 252, 356, 369, 371

Merton, R. K. and Barber, E. (2004). The Travels and Adventures of Serendipity, Princeton University Press, Princeton, NJ. 5, 6, 10, 27, 28, 29, 38, 40, 46, 241, 248

Meyer, M. A. (2007). Happy Accidents: Serendipity in Modern Medical Break- throughs, Arcade, New York, NY. 7

Microsoft Corporation (2006). Microsoft® Office EXCEL® 2007, Microsoft Cor- poration, Redmond, WA. 182

Mihalcea, R. and Radev, D. (2011). Graph-Based Natural Language Processing and Information Retrieval, Cambridge University Press, Cambridge. 100

412 BIBLIOGRAPHY

Milgram, S. (1967). The small world problem, Psychology Today 2(1): 60–67. 105

Milo, R., Kashtan, N., Itzkovitz, S., Newman, M. E. J. and Alon, U. (2003). On the unifrom generation of random graphs with prescribed degree sequences, eprint arXiv:cond-mat/0312028v2 . Available at: http://arxiv.org/ PS_cache/cond-mat/pdf/0312/0312028v2.pdf (Accessed on: 9 November 2011). 127

Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. and Alon, U. (2002). Network motifs: simple building blocks of complex networks, Sci- ence 298(5594): 824–827. 124, 125, 127, 319, 357

Mingers, J. and Brocklesby, J. (1997). Multimethodology: towards a framework for mixing methodologies, Omega: The International Journal of Manage- ment Science 25(5): 489–509. 128, 129

Mishler, E. G. (1995). Models of narrative analysis: a typology, Journal of Nar- rative and Life History 5(2): 87–123. 74

Mittelbach, F., Sch¨opf, R. and Downes, M. (2001). The eucal and euscript packages. Available at: http://www.maths.usyd.edu.au/u/SMS/texdoc/ euscript.pdf (Accessed on: 18 February 2012). 23, 435

Moffat, S. (2011). Doctor Who boss ‘hates’ fans who spoil show’s secrets, BBC Radio Five Live (Broadcast on 11 May, 2011, length 56 sec). Available at: http://www.bbc.co.uk/news/uk-wales-13358447 (Accessed on: 14 October 2011). xi

Moghnieh, A., Arroyo, E. and Blat, J. (2008). The news wall: serendipitous dis- coveries in dynamic information spaces, In: K. S. Mukasa, A. Holzinger and A. Karshmer (Eds.), IUI4AAL ’08: Proceedings of the First Interna- tional Workshop on Intelligent User Interfaces for Ambient Assisted Living, Fraunhofer IRB Verlag, Stuttgart, pp. 77–86. 40, 58

Molden, D. C. and Higgins, E. T. (2005). Motivated thinking, In: K. J. Holyoak and R. G. Morrison (Eds.), The Cambridge Handbook of Thinking and Reas- oning, Cambridge University Press, Cambridge, pp. 295–317. 71

Moran, D. (2002). Editor’s introduction, In: D. Moran and T. Mooney (Eds.), The Phenomenology Reader, Routledge, London, pp. 1–26. 65, 66, 67, 68, 69

413 BIBLIOGRAPHY

Moreno, J. L. (1934). Who Shall Survive?, Nervous and Mental Disease Publish- ing Company, Washington, D.C. 107

Morrison, R. G. (2005). Thinking in working memory, In: K. J. Holyoak and R. G. Morrison (Eds.), The Cambridge Handbook of Thinking and Reason- ing, Cambridge University Press, Cambridge, pp. 457–473. 72

Morton, W. R. and Swindler, K. (2005). Serendipitous insights involving nonhu- man primates, ILAR Journal, National Research Council Institute of Labor- atory Animal Resources 46(4): 346–351. 27, 46, 47, 238

Mueller, E. T. (1990). Daydreaming in Humans and Machines: A Computer Model of the Stream of Thought, Ablex, Norwood, NJ. 12, 46, 54, 241

Muenchen, R. A. (2009). R for SAS and SPSS Users, Springer, New York, NY. 183

Namboodiri, K. (1984). Matrix Algebra: An Introduction, Sage, London. 94

Natural Sciences and Engineering Research Council of Canada (NSERC) (2004). Guidelines for the Preparation and Review of Applications in Interdiscip- linary Research, NSERC, Ottawa, ON. 51

Newman, M. E. J. (2004). Analysis of weighted networks, eprint arXiv:cond- mat/0407503v1 . Available at: http://arxiv.org/pdf/condmat/ 0407503.pdf (Accessed on: 24 July 2012). 301

Newman, M. E. J. (2010). Networks: An Introduction, Oxford University Press, Oxford. 93, 100, 101, 105, 107, 120, 121, 122, 123, 124, 173, 267, 353, 354, 355

Nooteboom, B., Gilsing, V., Vanhaverbeke, W., Duysters, G. and van den Oord, A. (2006). Network embeddedness and the exploration of novel technologies: technological distance, betweenness centrality and dens- ity, Working Paper 21, Dynamics of Institutions and Markets in Europe (DIME), Europe. Available at: http://www.dime-eu.org/files/active/ 2/Nooteboom.pdf (Accessed on: 3 September 2010). 106

Nothhaft Jr., H. (2010). The myth of serendipity, TechCrunch . Available at: http://techcrunch.com/2010/11/27/myth-serendipity/ (Accessed on: 6 February 2012). 43

414 BIBLIOGRAPHY

Nutefall, J. E. and Ryder, P. M. (2010). The serendipitous research process, Journal of Academic Librarianship 36(3): 228–234. 27, 28, 32

Oksanen, J., Blanchet, F. G., Kindt, R., Legendre, P., Minchin, P. R., O’Hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. H. and Wagner, H. (2012). vegan: Community Ecology Package. Available at: http:// cran.r-project.org/web/packages/vegan/vegan.pdf (Accessed on: 4 September 2012). 224, 317, 479

Olsson, M. (2005). Beyond ‘needy’ individuals: conceptualizing information be- havior, Proceedings of the American Society for Information Science and Technology 42(1): n/a. Available at: http://eprints.rclis.org/handle/ 10760/6998#.TzkA2MnKfIs (Accessed on: 13 February 2012). 9

Onwuegbuzie, A. J. and Johnson, R. B. (2006). The validity issue in mixed research, Research in the Schools 13(1): 48–63. 177

Open Clip Art Library (2012). Island palm and the sun, Open Clip Art Library. Available at: http://openclipart.org/detail/ 12857/island-palm-and-the-sun-by-anonymous-12857 (Accessed on: 17 September 2012). 15

Opsahl, T. (2009). Structure and Evolution of Weighted Networks, PhD thesis, Queen Mary College, University of London, London. Available at: http: //toreopsahl.com/publications/thesis/ (Accessed on: 24 July 2012). 301

Opsahl, T., Agneessens, F. and Skvoretz, J. (2010). Node centrality in weighted networks: generalizing degree and shortest paths, Social Networks 32(3): 245–251. 301

Osareh, F. (1996a). Bibliometrics, citation analysis and co-citation analysis: a review of literature i., Libri 46(3): 149–158. 105

Osareh, F. (1996b). Bibliometrics, citation analysis and co-citation analysis: a review of literature ii., Libri 46(4): 217–225. 105

P. H. A. (1963). Serendipity in research, Science 140(3572): editorial. 27, 340

Palmer, C. L., Teffeau, L. C. and Pirmann, C. M. (2009). Scholarly informa- tion practices in the online environment: themes from the literature and

415 BIBLIOGRAPHY

implications for library service, OCLC Research, Dublin, OH. Available at: www.oclc.org/programs/publications/reports/2009-02.pdf (Ac- cessed on: 23 August 2012). 367

Palmquist, M., Carley, K. and Dale, T. (1997). Applications of computer-aided text analysis: analyzing literary and non-literary texts, In: C. W. Roberts (Ed.), Text Analysis for the Social Sciences: Methods for Drawing Statist- ical Inferences from Texts and Transcripts, Lawrence Erlbaum Associates, Mahwah, NJ, pp. 171–189. 102, 154

P´alsd´ottir,A. (2011). Opportunistic discovery of information by elderly Icelanders and their relatives, Information Research 16(3): paper 485. Available at: http://informationr.net/ir/16-3/paper485.html (Accessed on: 23 February 2012). 37

Patashnik, O. and Lamport, L. (2012). BIBTEX. 23

Perry, B. and Edwards, M. (2010). Serendipity pattern, In: A. J. Mills, G. Dure- pos and E. Wiebe (Eds.), Encyclopedia of Case Study Research, Vol. 2, Sage, Thousand Oaks, CA, pp. 858–860. 5, 7

Perry-Smith, J. E. and Shalley, C. E. (2003). The social side of creativity: a static and dynamic social network perspective, Academy of Management Review 28(1): 89–106. 106

Pettigrew, A. M. (1987). Introduction: researching strategic change, In: A. M. Pettigrew (Ed.), The Management of Strategic Change, Basil Blackwell, Oxford. 31

Pettigrew, K. E., Fidel, F. and Bruce, H. (2001). Conceptual frameworks in information behavior, In: M. E. Williams (Ed.), Annual Review of In- formation Science and Technology (ARIST), Vol. 35, American Society for Information Science and Technology (ASIST), Medford, NJ, pp. 43–78. 9

Pfaff, N. (2006). Narratives interview, Einf¨uhrung in die qualitative erziehungswissenschaftliche Forschung, Unpublished lecture given at the Martin-Luther-Universit¨at Halle-Wittenberg, Halle, Germany. Avail- able at: http://www.erzwiss.uni-halle.de/gliederung/paed/allgew/ material/ws05_06/NarrativesInterview.pdf (Accessed on: 20 May 2010). 139, 154

416 BIBLIOGRAPHY

Pfeffer, J. (2012). Networks/PAJEK. Package for Large Network Analysis. How to Convert EXCEL Datasets into PAJEK Format, EXCEL2PAJEK, Down- loadable via the PAJEK Wiki maintained by Vladimir Batagelj. Avail- able at: http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/ excel2Pajek.htm (Accessed on: 4 September 2012). 184

Plummer, K. (1995). Telling Sexual Stories: Power, Change, and Social Worlds, Routledge, London. 74

Popping, R. (2003). Knowledge graphs and network text analysis, Social Science Information 42(1): 91–106. 101, 154

Portes, A. (2010). Reflections on a common theme: establishing the phenomenon, adumbration, and ideal types, In: C. Calhoun (Ed.), Robert K. Merton: Sociology of Science and Sociology as Science, Columbia University Press, New York, NY, pp. 32–53. 6

Prabha, C., Connaway, L. S., Olszewski, L. and Jenkins, L. R. (2007). What is enough? Satisficing information needs, Journal of Documentation 63(1): 74–89. 46

Pratt, J. W. (1959). Remarks on zeros and ties in the Wilcoxon Signed Rank procedures, Journal of the American Statistical Association 54(287): 655– 667. 228

Prekop, P. (2002). A qualitative study of collaborative information seeking, Journal of Documentation 58(5): 533–547. 31, 35, 46

Psathas, G. (1999). On the study of human action: Schutz and Garfinkel on social science, In: L. E. Embree (Ed.), Schutzian Social Science, Vol. 37 of Contributions to Phenomenology, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 47–68. 68, 69

Pullan, W. (2000). Introduction: structuring structure, In: W. Pullan and H. Bhadeshia (Eds.), Structure in Science and Art, (The Darwin College Lectures), Cambridge University Press, Cambridge, pp. 1–8. 90

Quine, W. V. (1996). Events and reification, In: R. Casati and A. C. Varzi (Eds.), Events, Dartmouth, Aldershot, pp. 107–116. 56

Quinn, B. (2003). Overcoming psychological obstacles to optimal online search

417 BIBLIOGRAPHY

performance, The Electronic Library 21(2): 142–153. 9

Raftery, J. (1998). From Ptolemy to Heisenberg: quantitative models and reality, Construction Management and Economics 16(3): 295–302. 90, 91

Ram, A., Wills, L., Domesck, E., Nersessian, N. J. and Kolodner, J. (1995). Understanding the creative mind: a review of Margaret Boden’s Creative Mind, Artificial Intelligence 79(1): 111–128. 12

Rasche, F. and Wernicke, S. (2006). FANMOD: a Tool for FAst Network MOtif Detection—Manual, Institut f¨urInformatik, Friedrich-Schiller-Universit¨at Jena, Jena, Germany. Available at: http://theinf1.informatik. uni-jena.de/~wernicke/motifs/fanmod-manual.pdf (Accessed on: 14 June 2011). 187

Rasche, F. and Wernicke, S. (2012). FANMOD: The FAst Network MOtif De- tection Tool, Institut f¨urInformatik, Friedrich-Schiller-Universit¨atJena, Jena, Germany. Available at: http://theinf1.informatik.uni-jena. de/~wernicke/motifs/ (Accessed on: 4 September 2012). 187

R Development Core Team (2012a). R: A Language and Environment for Stat- istical Computing, R Foundation for Statistical Computing, Vienna, Aus- tria. Available at: http://www.R-project.org (Accessed on: 7 September 2012). 182, 183

R Development Core Team (2012b). Nonmetric multidimensional scaling with stable solution from random starts, axis scaling and species scores, Com- munity Ecology Package . Available at: http://cc.oulu.fi/~jarioksa/ softhelp/vegan/html/metaMDS.html (Accessed on: 4 September 2012). 224

R Development Core Team (2012c). Symmetry tests, Conditional Inference Pro- cedures in a Permutation Test Framework . Available at: http://rss.acs. unt.edu/Rdoc/library/coin/html/SymmetryTests.html (Accessed on: 4 September 2012). 228

Reason, J. (1990). Human Error, Cambridge University Press, Cambridge. 71

Reber, A. S. (1993). Implicit Learning and Tacit Knowledge: An Essay on the Cognitive Unconscious, Vol. 19 of Oxford Psychology Series, Oxford Univer- sity Press, Oxford. 71

418 BIBLIOGRAPHY

Remer, T. G. (1965). Serendipity and the Three Princes, University of Oklahoma Press, Norman, OK. 5

Rice, J. (1988). Serendipity and holism: the beauty of OPACs, Library Journal 113(3): 138–141. 58

Rice, R. E., McCreadie, M. M. and Chang, S. L. (2001). Accessing and Browsing Information and Communication, MIT Press, Cambridge, MA. 27

Ricoeur, P. (1984). Time and Narrative, Vol. 1, University of Chicago Press, Chicago, IL. (Translated by K. McLaughlin and D. Pellauer). 19, 53

Riessman, C. K. (2002). Analysis of personal narratives, In: J. F. Gubrium and J. A. Holstein (Eds.), Handbook of Interview Research: Context and Method, Sage, London, pp. 695–710. 154

Riles, A. (2001). The Network Inside Out, University of Michigan Press, Ann Arbor, MI. 119, 334

Rimmon-Kenan, S. (2002). Narrative Fiction, 2nd edn., Routledge, Abingdon. 55

Rivers, W. C. (1920). Stigmata of predisposition to bone and joint tubercle, The British Journal of Children’s Diseases 17(202–204): 179–186. Available at: http://www.archive.org/details/britishjournalof17londuoft (Ac- cessed on: 6 February 2012). 6

Roberts, C. W. (1997). A generic semantic grammar for quantitative text ana- lysis: applications to East and West Berlin radio news content from 1979, Sociological Methodology 27: 89–129. 79, 84

Roberts, R. M. (1989). Serendipity: Accidental Discoveries in Science, John Wiley & Sons, Chichester. 12, 38, 48, 239, 254

Robins, G. and Morris, M. (2007). Special section: advances in exponential random graph (p*) models, Social Networks 29(2): 169–248. 118

Robins, G., Pattison, P. and Woolcock, J. (2004). Missing data in networks: exponential random graph (p*) models for networks with non-respondents, Social Networks 26(3): 257–283. 118

Rosenman, M. F. (2002). Serendipity and scientific discovery, In: R. D. Norton

419 BIBLIOGRAPHY

(Ed.), Creativity and Leadership in the 21st Century Firm, Vol. 13 of Re- search in Urban Economics, Emerald, Bingley, pp. 187–193. 6

Ross, D. (2010). Game theory, In: E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy, Center for the Study of Language and Information (CSLI), Stanford University, Stanford, CA. Available at: http://plato.stanford. edu/entries/game-theory/ (Accessed on: 11 November 2011). 62

Rousseeuw, P. J. and Croux, C. (1993). Alternatives to the median absolute deviation, Journal of the American Statistical Association 88(424): 1273– 1283. 259, 261

Rousseeuw, P. J., Croux, C., Todorov, V., Ruckstuhl, A., Salibian-Barrera, M., Verbeke, T., Koller, M. and Maechler, M. (2012). robustbase: Ba- sic Robust Statistics. Available at: http://cran.r-project.org/web/ packages/robustbase/robustbase.pdf (Accessed on: 5 July 2012). 261

Rowlands, I., Nicholas, D., Williams, P., Huntington, P., Fieldhouse, M., Gunter, B., Withey, R., Jamali, H. R., Dobrowolski, T. and Tenopir, C. (2008). The Google generation: the information behaviour of the researcher of the future, Aslib Proceedings: New Information Perspectives 60(4): 290–310. 367

Rubin, V. L., Burkell, J. and Quan-Haase, A. (2011). Facets of serendip- ity in everyday chance encounters: a grounded theory approach to blog analysis, Information Research 16(3): paper 488. Available at: http: //informationr.net/ir/16-3/paper488.html (Accessed on: 14 Decem- ber 2011). 4, 37, 365

Sacks, H. (1995). Lectures, various, In: G. Jefferson (Ed.), Lectures on Conver- sation, Vol. 1–2, Blackwell, Oxford. 153

Sacks, H., Schegloff, E. and Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation, Language 50: 696–735. 153

Sailer, L. D. (1978). Structural equivalence: meaning and definition, computation and application, Social Networks 1(1): 73–90. 110

Salda˜na,J. (2009). The Coding Manual for Qualitative Researchers, Sage, Lon- don. 155, 161

420 BIBLIOGRAPHY

Sampson, F. (1969). Crisis in a Cloister, PhD thesis, Cornell University, Ithaca, NY. 120

Sawaizumi, S., Katai, O., Kawakami, H. and Shiose, T. (2009). Use of serendipity power for discoveries and inventions, In: M. Gen, D. Green, O. Katai, B. McKay, A. Namatame, R. A. Sarker and B. Zhang (Eds.), Intelligent and Evolutionary Systems, Vol. 187 of Studies in Computational Intelligence, Springer, Berlin, pp. 163–169. 40, 41, 58

Scearce, D. (2011). Connected Citizens: The Power, Peril and Po- tential of Networks, Report created by Monitor Institute for the John S. and James L. Knight Foundation, Miami, FL. Avail- able at: http://www.knightfoundation.org/publications/ connected-citizens-power-potential-and-peril-netwo (Accessed on: 4 May 2011). 102

Schenk, C. (2012). MiKTEX. Available at: http://www.miktex.org/ (Accessed on: 10 January 2012). 23

Scholand, A. J., Tausczik, Y. R. and Pennebaker, J. W. (2010). Social language network analysis, CSCW ’10: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, ACM, New York, NY, pp. 23–26. 102

Schultz-Jones, B. (2009). Examining information behavior through social net- works: an interdisciplinary review, Journal of Documentation 65(4): 592– 631. 103, 104, 105

Schumpeter (2010). In search of serendipity, The Economist 24 July 2010: 61. 12, 41, 59, 340

Sch¨utz,A. (1932). Der sinnhafte Aufbau der sozialen Welt, Springer, Vienna. 68

Sch¨utz,A. (1999). Frontispiece: the meaning structure of the social world, In: L. E. Embree (Ed.), Schutzian Social Science, Vol. 37 of Contributions to Phenomenology, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. ix–x. 68

Sch¨utze,F. (1983). Biographieforschung und narratives Interview, Neue Praxis 3: 283–293. 139

421 BIBLIOGRAPHY

Scott, J. (2000). Social Network Analysis: A Handbook, 2nd edn., Sage, London. 264, 353

Scott, J. and Marshall, G. (Eds.) (2009). A Dictionary of Sociology, revised 3rd edn., Oxford University Press, Oxford. 13, 56

Scott, P. B. (2005). Knowledge workers: social, task and semantic network ana- lysis, Corporate Communications: An International Journal 10(3): 257– 277. 105

Shulman, J. L. (2004). Introduction, The Travels and Adventures of Serendipity, Princeton University Press, Princeton, NJ, pp. xiii–xxv. 48, 52

Silva, da, E. A. (2010). Verbal and mental processes in science popularization news, Revista Ao ´peda Letra 12(2): 70–90. 365

Smith, C. E. and Burns, C. S. (2011). The first international workshop on oppor- tunistic discovery of information, Information Research 16(3): paper 490. Available at: http://informationr.net/ir/16-3/paper490.html (Ac- cessed on : 13 February 2012). 13, 26, 37

Smith, D. W. (2008). Phenomenology, In: E. N. Zalta (Ed.), The Stan- ford Encyclopedia of Philosophy, Center for the Study of Language and Information (CSLI), Stanford University, Stanford, CA. Available at: http://plato.stanford.edu/entries/phenomenology/ (Accessed on: 3 October 2011). 65, 66, 67, 69, 70, 365

Smucker, M. D. and Allan, J. (2007). Measuring the navigability of document net- works, Presentation at SIGIR ’07 Web Information-Seeking and Interaction Workshop, 27 July 2007, Amsterdam, The Netherlands. Available at: http: //citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.603 (Ac- cessed on: 20 August 2010). 105

Snijders, T. A. B. (2007). Models for longitudinal network data, In: P. J. Carrington, J. Scott and S. Wasserman (Eds.), Models and Methods in Social Network Analysis, Vol. 27 of Structural Analysis in the Social Sciences, Cambridge University Press, Cambridge, pp. 215–247. 120

Snowden, D. (2003). Managing for serendipity: or why we should lay off “best practice” in KM, ARK Knowledge Management 6(8): 1–8. Avail- able at: http://www.cognitive-edge.com/ceresources/articles/39_

422 BIBLIOGRAPHY

Managing_for_Serendipity_final.pdf (Accessed on: 17 February 2012). 28, 41, 58

Solomon, P. (1997). Discovering information behavior in sense making. I. Time and timing, Journal of the American Society for Information Science 48(12): 1097–1108. 40

Solomonoff, R. and Rapoport, A. (1951). Connectivity of random nets, Bulletin of Mathematical Biophysics 13: 107–117. 121

Sonnenwald, D. H. (1999). Evolving perspectives of human information beha- viour: contexts, situations, social networks and information horizons, In: T. Wilson and D. Allen (Eds.), Exploring the Contexts of Information Be- haviour, Taylor Graham, London, pp. 176–190. 31

Soskolne, C. (2000). Transdisciplinary approaches for public health, Epidemiology 11(4): S122. 51

Sowa, J. F. (1983). Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Boston, MA. 101

Spalter-Roth, R. (2000). Gender issues in the use of integrated approaches, In: M. Bamberger (Ed.), Integrating Quantitative and Qualitative Research in Development Projects, The World Bank, Washington, D.C., pp. 47–53. 371

Spink, A. (2004). Multitasking information behavior and information task switch- ing: an exploratory study, Journal of Documentation 60(4): 336–351. 46

Spink, A. and Cole, C. (Eds.) (2006). New Directions in Human Information Be- havior, Vol. 8 of Information Science and Knowledge Management, Springer, Dordrecht, The Netherlands. 33, 35

Squire, C. (2008). Approaches to narrative research, Review Paper NCRM/009, Economic & Social Research Council, National Centre for Research Meth- ods, Southampton. Available at: http://eprints.ncrm.ac.uk/419/1/ MethodsReviewPaperNCRM-009.pdf (Accessed on: 26 May 2010). 72, 73, 74, 75, 85, 364, 365

Srinivasan, A. and Te’eni, D. (1995). Modeling as constrained problem solv- ing: an empirical study of the data modeling process, Management Science 41(3): 419–434. 89

423 BIBLIOGRAPHY

Streitberg, B. and R¨ohmel, J. (1986). Exact distributions for permutation and rank tests: an introduction to some recently published algorithms, Statist- ical Software Newsletter 12(1): 10–17. 228

Streitberg, B. and R¨ohmel,J. (1987). Exakte Verteilungen f¨urRang- und Ran- domisierungstests im allgemeinen c-Stichprobenfall, EDV in Medizin und Biologie 18(1): 12–19. 228

Stutzman, F. D. (2011). Networked Information Behavior in Life Transition, PhD thesis, University of North Carolina, Chapel Hill, NC. Available at: http://fstutzman.com/2011/01/10/ networked-information-behavior-in-life-transition/ (Accessed on: 20 February 2012). 33

Suckale, J. (2007). LaTeX template for PhD Thesis. Available at: http:// openwetware.org/wiki/LaTeX_template_for_PhD_thesis (Accessed on: 10 January 2012). 23, 435

Sun, X., Sharples, S. and Makri, S. (2011). A user-centred mobile diary study approach to understanding serendipity in information research, Information Research 16(3): paper 492. Available at: http://informationr.net/ir/ 16-3/paper492.html (Accessed on: 14 December 2011). 4, 37, 39, 42, 46, 49, 54, 59, 60, 355

Sykes, J. B. (Ed.) (1976). The Concise Oxford Dictionary of Current English Based on The Oxford English Dictionary and its Supplements, 6th edn., Clarendon Press, Oxford. 42, 58, 64

Talja, S. and Hansen, P. (2006). Information sharing, In: A. Spink and C. Cole (Eds.), New Directions in Human Information Behavior, Vol. 8 of Informa- tion Science and Knowledge Management, Springer, Dordrecht, The Neth- erlands, pp. 113–134. 35

Talja, S., Keso, H. and Pietil¨ainen, T. (1999). The production of ‘context’ in in- formation seeking research: a metatheoretical view, Information Processing & Management 35(6): 751–763. 31

Tebbutt, D. (2007). Are we filtering ourselves into an internet ghetto?, Informa- tion World Review 10 April 2007. 12

TechSmith Corporation (2012). JING®, TechSmith Corporation, Okemos, MI.

424 BIBLIOGRAPHY

Available at: http://www.techsmith.com/jing/free/ (Accessed on: 4 September 2012). 23, 182

Teddlie, C. and Tashakkori, A. (2009). Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Approaches in the Social and Be- havioral Sciences, Sage, London. 13, 14, 118, 128, 146, 176, 177, 178, 231, 232, 233, 345, 346, 356, 363, 364, 371, 493

Teetor, P. (2011). R Cookbook, O’Reilly, Sebastopol, CA. 183

Thagard, P. and Croft, D. (1999). Scientific discovery and technological innova- tion: ulcers, dinosaur extinction, and the programming language JAVA, In: L. Magnani, N. Nersessian and P. Thagard (Eds.), Model-based Reasoning in Scientific Discovery, Springer, New York, NY, pp. 125–137. 10, 12, 41

The Stanford Natural Language Processing Group (2012). The Stanford Parser: a statistical parser, The Stanford Natural Language Processing Group, Stanford University, Palo Alto, CA. Available at: http://nlp. stanford.edu/software/lex-parser.shtml (Accessed on: 4 September 2012). 199

Thelwall, M. (2002). Evidence for the existence of geographic trends in university web site interlinking, Journal of Documentation 58(5): 563–574. 367

Thelwall, M. (2003). A layered approach for investigating the topological struc- ture of communities in the Web, Journal of Documentation 59(4): 410–429. 105

Thelwall, M. (2004). Link Analysis: An Information Science Approach, Elsevier, London. 105

Thelwall, M. (2008). Bibliometrics to webometrics, Journal of Information Sci- ence 34(4): 605–621. 105

Thelwall, M. (2009). Introduction to Webometrics: Quantitative Web Research for the Social Sciences, Vol. 1 of Synthesis Lectures on Information Con- cepts, Retrieval, and Services, Morgan & Claypool, San Francisco, CA. 105

Thelwall, M. (2010). Webometrics: emergent or doomed?, Information Re- search 15(4): colis 713. Available at: http://InformationR.net/ir/15-4/ colis713.html (Accessed on: 8 December 2010). 105

425 BIBLIOGRAPHY

Thelwall, M., Vaughan, L. and Bj¨orneborn, L. (2005). Webometrics, Annual Review of Information Science and Technology 39: 81–135. 105

Thelwall, M. and Wilkinson, D. (2003). Graph structure in three national aca- demic Webs: power laws with anomalies, Journal of the American Society for Information Science and Technology 54(8): 706–712. 105

Thimbleby, H. and Oladimeji, P. (2009). Social network analysis and interactive device design analysis, EICS 09: Proceedings of the SIGCHI Symposium on Engineering Interactive Computing Systems, ACM, New York, NY, pp. 91– 100. 102

Thudt, A., Hinrichs, U. and Carpendale, S. (2011). The Bohemian Bookshelf: supporting serendipitous discoveries through visualization, Technical Report 2011–1009–21, University of Calgary, Calgary, AB. Available at: http://dspace.ucalgary.ca/bitstream/1880/48717/1/ 2011-1009-21.pdf (Accessed on: 27 March 2012). 41, 58

Tilly, C. (1999). The trouble with stories, In: B. A. Pescosolido and R. Aminzade (Eds.), The Social Worlds of Higher Education: Handbook for Teaching in a New Century, Pine Forge Press, Thousand Oaks, CA, pp. 256–270. 369

Tilly, C. (2004). Social boundary mechanisms, Philosophy of the Social Sciences 34(2): 211–236. 117

Tilly, C. (2010). Mechanisms of the middle range, In: C. Calhoun (Ed.), Robert K. Merton: Sociology of Science and Sociology as Science, Columbia Uni- versity Press, New York, NY, pp. 54–62. 8, 11

Toms, E. (2000a). Serendipitous information retrieval, Proceedings of the First DELOS Network of Excellence Workshop on Information Seeking, Search- ing and Querying in Digital Libraries, ERCIM, pp. 11–12. Available at: http://www.ercim.eu/publication/ws-proceedings/DelNoe01/3_ Toms.pdf (Accessed on: 17 February 2012). 4, 36

Toms, E. (2000b). Understanding and facilitating the browsing of electronic text, International Journal of Human-Computer Studies 52(3): 423–452. 4, 54

Toms, E. (2010). Taking search to task, Seminar presentation, UCL Interaction Centre, 2 September 2010, University College London, London. 4

426 BIBLIOGRAPHY

Toms, E. and McCay-Peet, L. (2009). Chance encounters in the digital library, ECDL ’09: Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries, Springer, Heidelberg, Germany, pp. 192–202. 4, 46

Toolan, M. (2001). Narrative: A Critical Linguistic Introduction, 2nd edn., Rout- ledge, London. 56

T¨otterman,A.-K. and Wid´en-Wulff,G. (2007). What a social capital perspect- ive can bring to the understanding of information sharing in a univer- sity context, Information Research 12(4): colis 19. Available at: http:// informationr.net/ir/12-4/colis/colis19.html (Accessed on: 21 Feb- ruary 2012). 31

Tsai, W. (2001). Knowledge transfer in intraorganizational networks: effects of network position and absorptive capacity on business unit innovation and performance, The Academy of Management Journal 44(5): 996–1004. 106

Urquhart, A. H. (1977). The Effect of Rhetorical Organization on the Readability of Study Texts, PhD thesis, University of Edinburgh, Edinburgh. 151

Urquhart, C. (2011). Meta-synthesis of research on information seeking be- haviour, Information Research 16(1): paper 455. Available at: http: //informationr.net/ir/16-1/paper455.html (Accessed on: 13 Febru- ary 2012). 9, 10

Uzzi, B. and Spiro, J. (2005). Collaboration and creativity: the small world problem, American Journal of Sociology 111(2): 447–504. 106 van Andel, P. (1994). Anatomy of the unsought finding. Serendipity: origin, history, domains, traditions, appearances, patterns and programmability, The British Journal for the Philosophy of Science 45(2): 631–648. 12, 27, 43, 44, 45, 49, 60, 138, 238, 239, 240, 244, 450 van Andel, P. and Bourcier, D. (2009). De La S´erendipit´edans La Science, La Technique, L’Art et Le Droit, L’Act Mem, Libre science. 27 van Atteveldt, W. (2008). Semantic Network Analysis: Techniques for Extracting, Representing, and Querying Media Content, PhD thesis, Netherlands Re- search School for Information and Knowledge Systems (SIKS), Utrecht Uni- versity, Utrecht, Netherlands. Available at: http://vanatteveldt.com/

427 BIBLIOGRAPHY

dissertation/vanatteveldt_semanticnetworkanalysis.pdf (Accessed on: 18 June 2011). 154, 155

van Atteveldt, W., Kleinnijenhuis, J. and Ruigrok, N. (2008). Parsing, se- mantic networks, and political authority using syntactic analysis to ex- tract semantic relations from Dutch newspaper articles, Political Analysis 16(4): 428–446. 81, 83, 154

V´azquez,A., Dobrin, R., Sergi, D., Eckmann, G. P., Oltvai, Z. N. and Barab´asi, A. L. (2004). The topological relationship between the large-scale attrib- utes and local interaction patterns of complex networks, Proceedings of the National Academy of Sciences 101(52): 17940–17945. 124

Virbitskaite, I. and Dubtsov, R. (2008). Semantic domains of timed event struc- tures, Programming and Computer Software 34(3): 125–137. 20

W3C® (2004a). RDF vocabulary description language 1.0: RDF schema, World Wide Web Consortium (W3C), W3C Recommendation . Available at: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/ (Accessed on: 11 October 2011). 79

W3C® (2004b). Resource description framework (RDF): concepts and abstract syntax. 79

W3C® (2011a). RDF current status, World Wide Web Consortium (W3C), Standards . Available at: http://www.w3.org/standards/techs/rdf (Ac- cessed on: 11 October 2011). 79

W3C® (2011b). Semantic Web, World Wide Web Consortium (W3C), Standards . Available at: http://www.w3.org/standards/semanticweb/ (Accessed on: 11 October 2011). 79

Wada, T. (2004). Event analysis of claim making in Mexico: how are social protests transformed into political protests?, Mobilization: An International Journal 9: 241–257. 85

Wagner, E. L. (2003). Interconnecting information systems narrative research: an end-to-end approach for process-orientated field studies, In: E. H. Wynn, E. A. Whitley, M. D. Myers and J. I. DeGross (Eds.), Global and Organiza- tional Discourse about Information Technology, Kluwer Academic Publish- ers, Dordrect, The Netherlands, pp. 419–435. 72

428 BIBLIOGRAPHY

Walsh, J. P. and Bayma, T. (1996). Computer networks and scientific work, Social Studies of Science 26(3): 661–703. 367

Walsh, J. P., Kucker, S., Maloney, N. G. and Gabbay, S. (2000). Connecting minds: computer-mediated communication and scientific work, Journal of the American Society for Information Science 51(14): 1295–1305. 367

Waring, P. (2010). Human Centred Event Linking, PhD thesis, University of Sheffield, Sheffield. 56

Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications, Vol. 8 of Structural Analysis in the Social Sciences, Cambridge University Press, Cambridge. 93, 125, 328

Watson, J. S. (2001). Making sense of the stories of experience: methodology for research and teaching, Journal of Education for Library and Information Science 42(2): 137–148. 72

Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks, Nature 393(6684): 440–442. 353

Wei, W., Pfeffer, J., Reminga, J. and Carley, K. (2011). Handling weighted, asymmetric, self-looped, and disconnected networks in ora, Technical report CMU-ISR-11-113, Carnegie Mellon University, School of Computer Science, Institute for Software Research, Pittsburgh, PA. Available at: http://www.casos.cs.cmu.edu/publications/papers/ CMU-ISR-11-113.pdf (Accessed on: 24 July 2012). 114, 215, 301, 478

Wellman, B. (1926). The school child’s choice of companions, Journal of Educa- tional Research 14(2): 126–132. 107

Wellman, B. (2008). Structural analysis: from method and metaphor to theory and substance, In: B. Wellman and S. D. Berkowitz (Eds.), Social Struc- tures: A Network Approach, Vol. 15 of Contemporary Studies in Sociology, Emerald, Bingley, pp. 19–61. 62, 63, 64

Wellman, B. and Berkowitz, S. D. (2008). Introduction: studying social struc- tures, In: B. Wellman and S. D. Berkowitz (Eds.), Social Structures: A Network Approach, Vol. 15 of Contemporary Studies in Sociology, Emerald, Bingley, pp. 1–14. 62

429 BIBLIOGRAPHY

Wernicke, S. (2006). Efficient detection of network motifs, IEEE/ACM Trans- actions on Computational Biology and Bioinformatics 3(4): 347–359. Available at: http://theinf1.informatik.uni-jena.de/publications/ motifs-tcbb06.pdf (Accessed on: 13 July 2011). 187

Wernicke, S. and Rasche, F. (2006). FANMOD: a tool for fast net- work motif detection, Bioinformatics 22(9): 1152–1153. Available at: http://bioinformatics.oxfordjournals.org/content/22/9/1152. full.pdf?keytype=ref&ijkey=EGtmBvBvjGHy7Fi (Accessed on: 13 July 2011). 187

Whetsell, T. A. and Shields, P. M. (2011). Reconciling the varieties of pragmatism in public administration, Administration & Society 43(4): 474–483. 13

White, D. R. and Reitz, K. P. (1983). Graph and semigroup homomorphisms on networks of relations, Social Networks 5(2): 193–234. 110

Wills, L. M. and Kolodner, J. L. (1994). Explaining serendipitous recognition in design, In: A. Ram and K. Eiselt (Eds.), Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 940–945. 54, 57, 58

Wilson, C. J., Dougherty, A. E., Sutter, M., Mitchell-Jackson, J., Wellner, P. and Hall, N. (2008). Using Social Network Analysis to Advance Tradi- tional Qualitative Methods in Evaluation and Program Design, Paper ex- tracted from the CPUC sponsored 2006–2008 Statewide Marketing and Outreach Process Evaluation, Opinion Dynamics Corporation, Oakland, CA. Available at: http://www.opiniondynamics.com/papers/using% 20social%20network%20analysis_c%20wilson.pdf (Accessed on: 19 May 2010). 102

Wilson, T. D. (1997). Information behaviour: an interdisciplinary perspective, Information Processing & Management 33(4): 551–572. 33

Wilson, T. D. (1999). Models in information behaviour research, Journal of Documentation 55(3): 249–270. Available at: http://informationr.net/ tdw/publ/papers/1999JDoc.html (Accessed on: 10 October 2011). 33, 34, 35, 36, 37, 38, 39, 89, 367

Wilson, T. D. (2000). Human information behavior, Informing Science 3(2): 49– 55. 33

430 BIBLIOGRAPHY

Wilson, T. D. (2002). Alfred Schutz, phenomenology and research methodo- logy for information behaviour research, Presentation at ISIC4—Fourth In- ternational Conference on Information Seeking in Context, Universidade Lusiada, 11–13 September 2002, Lisbon, Portugal. Available at: http: //informationr.net/tdw/publ/papers/schutz02.html (Accessed on: 3 October 2011). 65, 68, 69

Winchester, S. (2008). The unanticipated pleasures of the writing life, In: M. de Rond and I. Morley (Eds.), Serendipity: Fortune and the Prepared Mind, Vol. 22 of The Darwin College Lectures, Cambridge University Press, Cambridge, pp. 123–141. 27, 42, 55, 56, 57

Winship, C. (1974). Thoughts about roles and relations. Part I: theoretical con- siderations, Unpublished Manuscript, Harvard University, Cambridge, MA. 110

Winship, C. (1988). Thoughts about roles and relations: an old document revised, Social Networks 10(3): 209–231. 110

Winskel, G. (1980). Events in Computation, PhD thesis, University of Edin- burgh, Edinburgh. Available at: http://www.daimi.au.dk/~gwinskel/ Events-in-Computation.pdf (Accessed on: 5 September 2012). 20

Winskel, G. (1987). Event structures, In: W. Brauer, W. Reisig and G. Rozen- berg (Eds.), Petri Nets: Applications and Relationships to Other Models of Concurrency, Vol. 255 of Lecture Notes in Computer Science, Springer, Berlin, pp. 325–392. 20

Winskel, G. (1989). An introduction to event structures, In: J. de Bakker, W. de Roever and G. Rozenberg (Eds.), Linear Time, Branching Time and Partial Order in Logics and Models for Concurrency, Vol. 354 of Lecture Notes in Computer Science, Springer, Berlin, pp. 364–397. 20

Yadamsuren, B. and Heinstr¨om,J. (2011). Emotional reactions to incidental exposure to online news, Information Research 16(3): paper 486. Available at: http://informationr.net/ir/16-3/paper486.html (Accessed on: 23 February 2012). 37

Yang, Y., Pierce, T. and Carbonell, J. (1998). A study of retrospective and on-line event detection, SIGIR ’98: Proceedings of the 21st ACM-SIGIR Interna- tional Conference of Research and Development in Information Retrieval,

431 BIBLIOGRAPHY

ACM, New York, pp. 28–36. 56

Zacks, J. M. and Tversky, B. (2001). Event structure in perception and concep- tion, Psychological Bulletin 127(1): 3–21. 56

Zhou, J., Shin, S. J., Brass, D. J., Choi, J. and Zhang, Z. X. (2009). Social networks, personal values, and creativity: evidence for curvilinear and in- teraction effects, Journal of Applied Psychology 94(6): 1544–1552. 106

Zhu, D. and Qin, Z. S. (2005). Structural comparison of metabolic networks in se- lected single cell organisms, BMC Bioinformatics 6(8). Available at: http: //www.biomedcentral.com/content/pdf/1471-2105-6-8.pdf (Accessed on: 26 April 2010). 124, 127

Zuba, M. (2009). A comparative study of network motif detection tools, Re- port UConn Bio-Grid, REU, Summer 2009, Department of Computer Science & Engineering, University of Conneticut, Storrs, CT. Available at: http://biogrid.engr.uconn.edu/REU/Reports_09/Final_Reports/ Michael_Zuba.pdf (Accessed on: 11 July 2011). 127

432 Appendices

433

Appendix A

The LATEX packages used in the study

This appendix lists the LATEX packages called by the LATEX class (.cls) file to customise the typesetting of the thesis; some packages were included in Suckale’s (2007) template, while others were added by the present author. Unless a citation indicates otherwise, the packages are available at AZ: Jim Hefferons web views of the Comprehensive TEX Archive Network (Heffron, 2012). Note that each package has author(s) and/or maintainer(s); further information can be found on the home pages of the packages.

{amsmath} {graphics} {amssymb} {graphicx} {appendix} {hyperref} {babel} {ifpdf} {caption} {ifthen} {color} {keystroke} {colortbl} {longtable} {dtklogos} (DANTE, 2012) {lscape} {enumerate} {multicol} {epsfig} {multirow} {eucal} (Mittelbach et al., 2001) {natbib} {fancyhdr} {pdflscape} {fixltx2e} {pifont}

435 A. THE LATEX PACKAGES USED IN THE STUDY

{setspace} {tocbibind} {textcomp} {upgreek}

436 Appendix B

Count of the Citation Classics Contained in the Online Database

This appendix presents the method used to count the number of Citation Classics in the online database. The explanation uses the year 1985 as an example.

Step one: The homepage for the online Citation Classics database was loaded and 1985 was selected from the year menu. This action opened the page containing all of the Citation Classics indexed for 1985.

Step two: The entire contents of the 1985 page were selected and copied to an EXCEL file.

Step three: Using the ‘Find All’ function in EXCEL, a search was performed for the text A1985. This search term was selected because each document refer- ence number in the Citation Classics database is made up of three ordered components: the capital letter ‘A’, which begins every reference number; the year the Citation Classic was published; and a unique reference number. As such, all 1985 Citation Classics in the online database have a document reference number that begins A1985. Using the first two components of the document reference number was also a good choice because it was highly

437 B. CITATION CLASSICS COUNT

unlikely that any other text string would prove an exact match for the selected search term.

Step four: The number of cells located by the EXCEL ‘Find All’ search was re- corded. As the search term used likely related only to document reference numbers, the cell count was a close, if not exact, representation of the num- ber of documents for the year. 357 cells were located in the 1985 search.

The above steps were repeated for every year indexed in the online database. The yearly counts summed to a total count of 5082. Table B.1 presents the document counts for each year.

438 Year Count

1977 52 1978 52 1979 312 1980 314 1981 354 1982 312 1983 363 1984 364 1985 357 1986 367 1987 408 1988 415 1989 386 1990 389 1991 202 1992 225 1993 210

Table B.1: Count of the Citation Classics in the online database - This table presents the yearly counts of the number of Citation Classics in the online database. The counts were determined by searching for a text string consisting of the first two components of the Citation Classics database document reference number.

439 B. CITATION CLASSICS COUNT

440 Appendix C

Calculations Used to Estimate the Number of Unique Citation Classics in the Online Database

This appendix gives the calculations used to estimate the number of unique Cita- tion Classics in the online database. These calculations employed figures related to the serendipity subpopulation (see section 4.2.6). Seventy-six Citation Clas- sics comprised the serendipity subpopulation. Seventeen of these were duplicates, leaving fifty-nine unique Citation Classics in the subpopulation. The ratio of the number of unique Citation Classics in the subpopulation to the number of Citation Classics in the whole subpopulation allowed the estima- tion of a similar ratio using the count of Citation Classics in the online database. Appendix B presents the method used to achieve the count of Citation Classics in the online database. The count was 5082.

The estimate of unique Citation Classics in the database was calculated as

59 x = 76 5082 where

x = the estimated number of unique Citation Classics.

441 C. ESTIMATION OF UNIQUE CITATION CLASSICS COUNT

Re-arranging the equation gave

76(x) = 59(5082)

and then

59(5082) x = 76.

The result of the calculation was

x = 3945.

442 Appendix D

List of the Citation Classics Comprising the Study Sample

This appendix lists the forty-eight Citation Classics (extracted from Garfield, 2012) that made up the study sample.1 Each entry listed includes the surname(s) of the author(s), the year published, the Citation Classics database document reference number, and the url link2 to the PDF file.

1. Adams (1981) A1981LC33100001 url: http://www.garfield.library.upenn.edu/classics1981/A1981LC33100001.pdf

2. Almeida (1980) A1980JX53000001 url: http://garfield.library.upenn.edu/classics1980/A1980JX53000001.pdf

3. Ashbaugh (1979) A1979HQ37400001 url: http://www.garfield.library.upenn.edu/classics1979/A1979HQ37400001.pdf

4. Bandurski (1981) A1981MQ15600001 url: http://garfield.library.upenn.edu/classics1981/A1981MQ15600001.pdf

1Recall from section 4.2.6 that Weissman and Koe (1990) reported three separate serendipity narratives. This is the reason that the sample comprises forty-eight Citation Classics rather than fifty. 2All url links were checked and active on 22 September 2011.

443 D. CITATION CLASSICS IN THE STUDY SAMPLE

5. Beadle (1978) A1978FE76900002 url: http://www.garfield.library.upenn.edu/classics1978/A1978FE76900002.pdf

6. Blanchard (1986) A1986E843500001 url: http://garfield.library.upenn.edu/classics1986/A1986E843500001.pdf

7. Boyd (1982) A1982PA24800001 url: http://garfield.library.upenn.edu/classics1982/A1982PA24800001.pdf

8. Brown (1981) A1981MM41700001 url: http://www.garfield.library.upenn.edu/classics1981/A1981MM41700001.pdf

9. Bull (1986) A1986AXY6000001 url:http://www.garfield.library.upenn.edu/classics1986/A1986AXY6000001.pdf

10. Buzbee (1992) A1992JJ54300001 url: http://www.garfield.library.upenn.edu/classics1992/A1992JJ54300001.pdf

11. Cleaver (1981) A1981LX51500001 url: http://www.garfield.library.upenn.edu/classics1981/A1981LX51500001.pdf

12. Cohn (1989) A1989CC64000001 url: http://garfield.library.upenn.edu/classics1989/A1989CC64000001.pdf

13. Dicke (1980) A1980JX54100001 url: http://garfield.library.upenn.edu/classics1980/A1980JX54100001.pdf

14. Ey (1989) A1989T566700001 url: http://garfield.library.upenn.edu/classics1989/A1989T566700001.pdf

15. Field (1984) A1984SB92100001 url: http://www.garfield.library.upenn.edu/classics1984/A1984SB92100001.pdf

444 16. Fitzgerald (1980) A1980KG03500001 url: http://garfield.library.upenn.edu/classics1980/A1980KG03500001.pdf

17. Fogg (1989) A1989AG82900001 url: http://www.garfield.library.upenn.edu/classics1989/A1989AG82900001.pdf

18. Galanter (1983) A1983RT91800001 url: http://garfield.library.upenn.edu/classics1983/A1983RT91800001.pdf

19. Giblett (1982) A1982NQ25600001 url: http://garfield.library.upenn.edu/classics1982/A1982NQ25600001.pdf

20. Gocke (1985) A1985ACJ8900001 url: http://garfield.library.upenn.edu/classics1985/A1985ACJ8900001.pdf

21. Gordon (1981) A1981LG19200001 url: http://www.garfield.library.upenn.edu/classics1981/A1981LG19200001.pdf

22. Graham (1988) A1988Q713700001 url: http://garfield.library.upenn.edu/classics1988/A1988Q713700001.pdf

23. Greenwood (1977) A1977DM03700001 url: http://garfield.library.upenn.edu/classics1977/A1977DM03700001.pdf

24. Harrison (1987) A1987G156200001 url: http://garfield.library.upenn.edu/classics1987/A1987G156200001.pdf

25. Hodge (1979) A1979HZ28200001 url: http://www.garfield.library.upenn.edu/classics1979/A1979HZ28200001.pdf

26. Hudson and Pope (1988) A1988L680000001 url: http://garfield.library.upenn.edu/classics1988/A1988L680000001.pdf

445 D. CITATION CLASSICS IN THE STUDY SAMPLE

27. Kanner (1979) A1979HZ31800001 url: http://garfield.library.upenn.edu/classics1979/A1979HZ31800001.pdf

28. Kappas (1987) A1987H855100001 url: http://garfield.library.upenn.edu/classics1987/A1987H855100001.pdf

29. Kirk (1989) A1989R442500001 url: http://garfield.library.upenn.edu/classics1989/A1989R442500001.pdf

30. Kroto (1993) A1993LT56300001 url: http://garfield.library.upenn.edu/classics1993/A1993LT56300001.pdf

31. Kuwabara (1993) A1993LZ46900001 url: http://www.garfield.library.upenn.edu/classics1993/A1993LZ46900001.pdf

32. Levy (1981) A1981LT86500001 url: http://garfield.library.upenn.edu/classics1981/A1981LT86500001.pdf

33. Mayer (1992) A1992HD13000001 url: http://garfield.library.upenn.edu/classics1992/A1992HC30800001.pdf

34. Milstein and Brownlee (1992) A1992JJ54100001 url: http://garfield.library.upenn.edu/classics1992/A1992JJ54100001.pdf

35. Moorhead (1983) A1983QA20000002 url: http://www.garfield.library.upenn.edu/classics1983/A1983QA20000002.pdf

36. Myers (1993) A1993MB49300001 url: http://www.garfield.library.upenn.edu/classics1993/A1993MB49300001.pdf

37. Niswender (1980) A1980KM40600001 url: http://garfield.library.upenn.edu/classics1980/A1980KM40600001.pdf

446 38. Nowell (1977) A1977DW24000002 url: http://www.garfield.library.upenn.edu/classics1977/A1977DW24000002.pdf

39. Pedersen (1985) A1985ANC3000001 url: http://garfield.library.upenn.edu/classics1985/A1985ANC3000001.pdf

40. Poulik (1984) A1984SJ84800001 url: http://www.garfield.library.upenn.edu/classics1984/A1984SJ84800001.pdf

41. Reynolds (1981) A1981LZ52300001 url: http://garfield.library.upenn.edu/classics1981/A1981LZ52300001.pdf

42. Schaefer (1981) A1981LH86200001 url: http://garfield.library.upenn.edu/classics1981/A1981LH86200001.pdf

43. Schatzmann (1984) A1984TQ21000002 url: http://garfield.library.upenn.edu/classics1984/A1984TQ21000002.pdf

44. Smythe (1980) A1980KT82300001 url: http://www.garfield.library.upenn.edu/classics1980/A1980KT82300001.pdf

45. Sussman (1990) A1990DJ90700001 url: http://www.garfield.library.upenn.edu/classics1990/A1990DJ90700001.pdf

46. Warner (1977) A1977DV63600002 url: http://www.garfield.library.upenn.edu/classics1977/A1977DV63600002.pdf

47. Weissman and Koe (1990) A1990EA25200002 url: http://www.garfield.library.upenn.edu/classics1990/A1990EA25200002.pdf

48. White (1987) A1987J555200001 url: http://garfield.library.upenn.edu/classics1987/A1987J555200001.pdf

447 D. CITATION CLASSICS IN THE STUDY SAMPLE

448 Appendix E

The Closed-Ended Survey Instrument Used to Collect the Contextual Data

This appendix reproduces the closed-ended survey instrument used in the study to collect the contextual data from the sample narratives. The data were collected in three worksheets of an EXCEL workbook. Each row in a worksheet corresponded to a single sampled narrative. Each column in a worksheet held a single response.

Worksheet one

Document identification information

ˆ Citation Classic document reference number, including web link?

ˆ Year?

ˆ Assign the sample narrative a nickname (normally the first author’s sur- name).

Data on outcome type

ˆ What does the narrator present as the primary outcome of the serendipity experience? Enter findings or methods.

449 E. SURVEY INSTRUMENT USED TO COLLECT THE CONTEXTUAL DATA

Data on discipline and multiple disciplinarity

ˆ Field of study of the journal in which the original research report was pub- lished? Take this from the introductory information provided at the top of the published Citation Classic. Be as specific as possible.

ˆ Current Contents edition metadata? Enter the Current Contents edition of the sampled Citation Classic. Note the number of Current Content editions in which the Citation Classic appeared.

ˆ Does the Citation Classic narrator refer to multiple disciplinarity as im- portant to the serendipity occurrence? Enter yes or no.

Worksheet two

Serendipity classification/pattern data Classification/pattern data is based on the researcher’s interpretation of the sample narrative.

ˆ Campanario Classification? Enter 1 or 2 (see Campanario, 1996, p. 10).

ˆ Van Andel pattern(s)? Enter 1–17 (see van Andel, 1994, sec. 9). Note that multiple patterns are allowed. Separate pattern numbers by a semicolon.

ˆ Foster and Ford Classification(s)? Enter 1, 2, 3, 4a or 4b (see Foster and Ford, 2003, p. 330, 332). Note that multiple patterns are allowed. Separate pattern numbers by a semicolon.

Data on intention/realisation Intention/realisation data is narrator-reported (i.e. as discussed by the nar- rator(s) of the Citation Classic).

ˆ Original intention? Select one:

a) Specific b) Vague

450 c) None d) Not discussed

ˆ Intended process/approach? Select one:

a) Specific b) Vague c) None d) Not discussed

ˆ Actual result? Select one:

a) As intended b) Unintended c) Not applicable (i.e. because no original intention/original intention not discussed)

ˆ Actual process or approach? Select one:

a) As intended b) Unintended c) Not applicable (i.e. because no intended process/intended process not discussed)

Worksheet three

All data in this section is narrator-reported.

Serendipity filter data Filter: Time

ˆ Does the discussion refer to the pressure of time? Enter yes or no.

ˆ If yes, the influence of this pressure is described as (select one):

451 E. SURVEY INSTRUMENT USED TO COLLECT THE CONTEXTUAL DATA

a) Negative b) Positive c) Not discussed

Filter: Need to meet original research aim(s)

ˆ Does the discussion refer to the pressure of a need to satisfy the original research aim(s)? Enter yes or no.

ˆ If yes, the influence of this pressure is described as (select one):

a) Negative b) Positive c) Not discussed

Filter: Financial

ˆ Does the discussion refer to financial pressure? Enter yes or no.

ˆ If yes, the influence of this pressure is described as (select one):

a) Negative b) Positive c) Not discussed

Filter: Social responsibility

ˆ Does the discussion refer to pressure of social responsibility? Enter yes or no.

ˆ If yes, the influence of this pressure is described as (select one):

a) Negative b) Positive c) Not discussed

452 Recognition data

ˆ The recognition of serendipity is discussed as (select one):

a) Immediate b) Not immediate c) Not discussed d) Not able to determine from the discussion.

Data on linearity/nonlinearity

ˆ The occurrence of serendipity is described as (select one) :

a) Linear (i.e. a process in steps with one thing leading to the next) b) Non-linear (i.e. several actions or events occurring concurrently) c) Not discussed d) Not able to determine from the discussion.

453 E. SURVEY INSTRUMENT USED TO COLLECT THE CONTEXTUAL DATA

454 Appendix F

A Walk-Through of a Model One Event Structure Coding

This appendix provides a walk-though of a model one event structure coding for the sample narrative Cleaver (1981) (see Garfield, 2012). This narrative was selected for two pragmatic reasons:

1. it is short and, therefore, allows a walk-through of the complete narrative proper; and,

2. it provides examples of the sort of ‘messy’ coding situations encountered in the study, yet remains straight-forward enough to enable the reader to follow the essential elements of the coding approach.

It is important to stress that this narrative resulted in an event structure that was much smaller than the average event structure in the study. It is also important for the reader to recognise that this appendix, which examines the coding of a single narrative from the study sample of fifty, can only provide a limited view of the coding process. The details provided here should not be considered exhaustive. As discussed in section 5.3.1, the narratives were rich, complex data; however, the decisions applied to a single narrative did not always equate to ideal decisions for the coding of that narrative. Section 4.4.1 noted that the research should not be labelled a case study design. As such, coding decisions needed to be applied consistently across the sample, even if this

455 F. MODEL ONE CODING WALK-THROUGH resulted in a less-than-ideal coding of a single case. The reader is asked to bear these points in mind when reviewing the coding details presented below.

Text of Cleaver (1981) Note that neither the introductory material to the text (e.g. summary, journal information, author information, etc.) nor the text’s reference list has been in- cluded here; however, the paragraph structure of the original text has been re- tained.

In scientific research everyone travels in hope of that lucky break; rarely are we privileged to receive it. This paper, now a Citation Classic, describes one of those serendipitous discoveries that has been valuable in defining the role of damage to the genetic material (DNA) in human cancer. It sparked the growth of a large field of currently active, fruitful, and competitive research, and it is this which accounts for the article’s frequent citation. In 1966–1968, we had the beginning of an interest in the way radiation damages human DNA. One direction was clear; to make significant progress we needed to select mutants that were altered in their capacity to recover from radiation damage. This direction was difficult, and has only recently been successful. The discovery of human mutants started from a most unlikely origin. In April 1967, the San Francisco Chronicle ran an article by David Perlman on a hereditary form of skin cancer in man induced by sunlight, xeroderma pigmentosum (XP). From that article I guessed that perhaps XP could be a human mutant of the kind we wanted. The first experiment on XP cells showed that the choice was right, and in just over a year this first report was published, and a second followed the next year. The choice of the right disease was everything, because most of the experiments involved routine techniques. Anyone can repeat these first experiments in a matter of a few days, and the rapidity of the competition to harvest the cream from the subject testified to this. Ironically, an earlier report demonstrating hypersensitivity of XP cells to ultraviolet light by Gartler had been completely ignored. That was fortunate for me, because if it had been pursued the pioneering work on XP would have been over before I left graduate school! There have been many ramifications from defining XP as a DNA repair deficiency in man. The etiology of a complex human disease involving both genetic and environmental components is clearer. Nu-

456 merous researchers have been stimulated to search for other diseases that might have similar characteristics. XP studies have become a starting point and justification for many experiments investigating carcinogen action on DNA The role of DNA repair mechanisms in carcinogenesis is a currently active field of research. A wide range of human diseases have now been identified that exhibit increases in sensitivity to environmental agents (chemicals, radiations) that damage DNA. These hypersensitive diseases—XP, ataxia telangiectasia, Cockayne syndrome, Fanconi’s anemia, etc.— form a diverse class within which XP is a subset involving defective DNA repair. The causes of hypersensitivity in the other diseases are more complex. Our work in XP has been recognized by two awards: the Radi- ation Research Society’s Research Award in 1973, and the American Academy of Dermatology’s Lila Gruber Award for Cancer Research in 1976. It has been generously supported by the Atomic Energy Commission and its successor, the Department of Energy.

Delineation of the serendipity-related narrative proper The delineation of the serendipity-related narrative proper involved several steps:

1. identify Labovian event narrative sections;

2. identify the most reportable event;

3. work backwards from the most reportable event through to the starting event of the sequence;

4. remove evaluation and most description elements from the narrative proper; and,

5. bracket any event context description to assist event context coding.

Step One

All six Labovian event narrative sections were present in the narrative:

1. an abstract (begins, “In scientific research . . . ”);

457 F. MODEL ONE CODING WALK-THROUGH

2. an orientation (begins, “In 1966–1968”);

3. a complicating action (begins, “In April 1967 . . . ”);

4. an evaluation (begins, “The choice of the right disease . . . ”);

5. a resolution (begins, “There have been many ramifications . . . ”); and,

6. a coda (begins, “A wide range of human diseases . . . ”).

The narrative is reproduced below with the sections colour-highlighted. Notice how the complicating action represents only a small part of the complete text.

In scientific research everyone travels in hope of that lucky break; rarely are we privileged to receive it. This paper, now a Citation Classic, describes one of those serendipitous discoveries that has been valuable in defining the role of damage to the genetic material (DNA) in human cancer. It sparked the growth of a large field of currently active, fruitful, and competitive research, and it is this which accounts for the article’s frequent citation. In 1966–1968, we had the beginning of an interest in the way radiation damages human DNA. One direction was clear; to make significant progress we needed to select mutants that were altered in their capacity to recover from radiation damage. This direction was difficult, and has only recently been successful. The discovery of human mutants started from a most unlikely origin. In April 1967, the San Francisco Chronicle ran an article by David Perlman on a hereditary form of skin cancer in man induced by sunlight, xeroderma pigmentosum (XP). From that article I guessed that perhaps XP could be a human mutant of the kind we wanted. The first experiment on XP cells showed that the choice was right, and in just over a year this first report was published, and a second followed the next year. The choice of the right disease was everything, because most of the experiments involved routine techniques. Anyone can repeat these first experiments in a matter of a few days, and the rapidity of the competition to harvest the cream from the subject testified to this. Ironically, an earlier report demonstrating hypersensitivity of XP cells to ultraviolet light by Gartler had been completely ignored. That was fortunate for me, because if it had been pursued the pioneering work on XP would have been over before I left graduate school!

458 There have been many ramifications from defining XP as a DNA repair deficiency in man. The etiology of a complex human disease involving both genetic and environmental components is clearer. Nu- merous researchers have been stimulated to search for other diseases that might have similar characteristics. XP studies have become a starting point and justification for many experiments investigating carcinogen action on DNA The role of DNA repair mechanisms in carcinogenesis is a currently active field of research. A wide range of human diseases have now been identified that exhibit increases in sensitivity to environmental agents (chemicals, radiations) that damage DNA. These hypersensitive diseases—XP, ataxia telangiectasia, Cockayne syndrome, Fanconi’s anemia, etc.— form a diverse class within which XP is a subset involving defective DNA repair. The causes of hypersensitivity in the other diseases are more complex. Our work in XP has been recognized by two awards: the Radi- ation Research Society’s Research Award in 1973, and the American Academy of Dermatology’s Lila Gruber Award for Cancer Research in 1976. It has been generously supported by the Atomic Energy Commission and its successor, the Department of Energy.

Step Two

The analysis focused on the complicating action and located the most report- able event (begins, “The first experiment . . . ”):

In April 1967, the San Francisco Chronicle ran an article by David Perlman on a hereditary form of skin cancer in man induced by sun- light, xeroderma pigmentosum (XP). From that article I guessed that perhaps XP could be a human mutant of the kind we wanted. The first experiment on XP cells showed that the choice was right, and in just over a year this first report was published, and a second followed the next year.

Note that the publishing of the reports is not considered a reportable event for the coding purposes; rather, the reports are considered results of the most reportable event—the experimental finding. This question of whether or not to consider the publication of a result as the most reportable event came up fairly frequently during the event structure coding. For consistency purposes, the de-

459 F. MODEL ONE CODING WALK-THROUGH

cision was taken prior to the second pass coding not to include result publications in the sample event structures, unless a publication occurred prior to the most reportable event.

Step Three

At first glance, the initial event appears to be the newspaper’s running of the article; however, a semantic reading of the text indicates the narrator actually refers to two earlier events—David Perlman’s article and the wanting of a ‘kind’ of mutant. Recall that narrative norms dictate that a narrator will include all, but no more, detail needed to support the understanding of the most reportable event. Steps Four and Five

The analysis removed evaluation and most description, bracketing event con- text information.

[In April 1967], the San Francisco Chronicle ran an article by David Perlman on . . . (XP). From that article I guessed that . . . XP could be a mutant . . . we wanted. The [first] experiment on XP cells showed that the choice was right.

The observant reader will notice at this point that description above and beyond event context information was retained in the narrative proper (for example, the word ‘right’). In this example, some descriptive content was retained right up to the semantic triplet coding stage because context information was needed in order to allow consistency in labelling the semantic triplet participants (see below). Recall from section 3.3.2.2 that descriptive and action elements naturally overlap to some degree; it is not always possible to entirely separate out the two elements.

Coding The next stage coded semantic triplets, attribute data and event context in- formation from the narrative proper. Figure F.1 presents the coded data.

460 Figure F.1: The second pass model one coding worksheet for Cleaver (1981) - This figure shows the second pass coding worksheet for Cleaver (1981) (see Garfield, 2012). The semantic triplet elements are entered in columns D, F and H. The semantic triplet attribute codes are entered in columns E, G and I. Semantic triplet ids are found in column C. Columns A and B hold the event context data.

Semantic triplet id1

ˆ The analyst decided to place the ‘wanting’ of the mutant as the first event in the sequence because this seemed most appropriate given the semantic context; however, it is important to point out that, based on the coding rules, the analyst could just as appropriately have coded Perlman’s writing of the article (coded as the second triplet) as the initial event.

ˆ The narrator uses the phrase ‘we wanted’; however, it is not clear from any of the narrative text who encompasses the plural ‘we’. As such, it was not possible to adopt the normal approach to the coding of ‘we’ (see section 5.3.3). Instead, the analyst assumed that the narrator was including himself in ‘we’ and coded ‘Cleaver’ in place of ‘we’.

Semantic triplets id2 and id3

ˆ Although the narrator did not use the term ‘writes’, this action was se- lected as appropriate based on the narrator’s use of the term ‘by’. Recall from section 5.3.3 that action itself was not translated into network format; rather, only the process link was incorporated into the networks. In other words, provided the coding rules concerning the creation/addition of verbs

461 F. MODEL ONE CODING WALK-THROUGH

were followed, the coded event structure would not be affected by the choice of action. The action ‘writes’ was selected because it made sense semantic- ally and was a process group 4 verb. Note that two semantic triplets were required because ‘write’ is an AFFECT-g type verb that maps onto three semantic roles. The coded label ‘refers to’ simply indicates a relation; the label ‘links to’ would have been just as appropriate.

ˆ Note that ‘XP’ is coded as knowledge.

Semantic triplet id4

ˆ This triplet was especially challenging to code. Initially, the triplet was coded as ‘chronicle ran article’; however, problematically, the triplet gave no indication that the narrator actually read the article, even though the semantic context of the reported events suggested that the narrator knew details concerning the content of the article. This type of situation presen- ted frequently across the sample during the first pass coding. Narrative norms dictate that semantic triplets must have a role to play in support- ing the most reportable event. Given a semantic triplet such as ‘chronicle ran article’, one could effectively ask, so what? Therefore, a decision was taken prior to the second pass coding to code such triplets as ‘narrator read article’, or similar. Again, the choice of action did not affect the event structure since it was the process link that contributed to the relational structure.

Semantic triplets id5, id6 and id7

ˆ Phrases such as ‘the experiment showed’ or ‘the test produced’ were com- mon in the narratives comprising the study sample. Semantic context indic- ated that experiments and tests needed to be performed by someone, even if this someone (in the singular or plural) was not explicitly stated. Follow- ing Franzosi’s (2010) coding methods, an actor, normally the narrator, was explicitly coded in such situations. The process element for these semantic triplets was labelled with a semantically-reasonable verb (e.g. conducts, carries out, performs, etc.). Once again, the choice of action had no effect on the structure coded.

462 ˆ Note that the term ‘cells’ has been replaced in triplet id8 by ‘mutants’, the term used in triplet id1 because the referent is the same in both triplets. Note, however, that ‘XP’, which has a knowledge attribute, represents a different referent.

463 F. MODEL ONE CODING WALK-THROUGH

464 Appendix G

A Walk-Through of a Model Two Event Structure Coding

This appendix provides a walk-though of a model two event structure coding for the sample narrative Cleaver (1981) (see Garfield, 2012). The model two walk- though uses the same example narrative as the model one walk-through in order to allow direct comparisons between the two coding procedures. The comments that introduce Appendix F also apply here, and the reader may find it helpful to review those remarks. Note also that, because it is outlined in detail in Appendix F, the delineation of the narrative proper is omitted here.

Coding The model two procedure involved the creation of element lists and matrices.

The element lists Figure G.1 shows the element lists coded during the second pass through the narrative proper of Cleaver (1981).

ˆ Like model one, the model two event structure contains two agents and four events; however, no resources are coded during the model two procedure. Instead, ‘mutants’ is coded as knowledge. In fact, the model two procedure codes more knowledge attributes in total for this narrative than the model one procedure.

465 G. MODEL TWO CODING WALK-THROUGH

Figure G.1: The second pass model two element lists for Cleaver (1981) - This figure shows the second pass element list coding worksheets for Cleaver (1981) (see Garfield, 2012). Each element type was listed in a separate worksheet. Column A contains a unique id number and column B contains the element label. The resource element list is empty because no resources were coded from the narrative.

The matrices

ˆ Figure G.2 presents the AA matrix. Because the two agents have no direct links with each other, no ‘1’s are entered. Note that self-relations (i.e. along the diagonal of the matrix) do not count as direct links.

Figure G.2: The AA matrix for Cleaver (1981) - The figure reproduces the second pass model two AA matrix coded from Cleaver (1981) (see Garfield, 2012).

466 ˆ Figure G.3 presents the AE and EE matrices. A ‘1’ in a matrix cell indicates a direct link running from row element to column element. Note that the two agents never participate in the same event. Note also that, although the EE matrix captures the sequence of events, it is not possible to discern the event sequence from the AE matrix. It is for this reason that the model two event structures are not considered to capture sequence; in the model one event structure, the coding process captures the sequence for all of the semantic triplets.

Figure G.3: The AE and EE matrices for Cleaver (1981) - This figure reproduces the second pass model two AE and EE matrices coded from Cleaver (1981) (see Garfield, 2012).

467 G. MODEL TWO CODING WALK-THROUGH

ˆ Figure G.4 reproduces the AK matrix, which was the most complicated matrix coded from Cleaver (1981). The narrator, ‘Cleaver’, has a direct link to all of the knowledge elements; however, he also shares two knowledge elements with ‘Perlman’—‘article’ and ‘XP’.

Figure G.4: The AK matrix for Cleaver (1981) - This figure shows the second pass model two AK matrix coded from Cleaver (1981) (see Garfield, 2012).

ˆ No AR or ER matrices are possible for this narrative due to the lack of resource elements.

468 Appendix H

The Supplementary Coding Rules Adopted for Coding the Model One Event Structures

This appendix presents the supplementary coding rules that were established prior to the coding of the model one event structures. As discussed in section 5.3.3, these rules were applied in conjunction with existing rules set out in Franzosi (2010), Dixon (2005), Labov and Waletzky (1967, 1997) and Labov (2003).

Coding semantic triplets:

1. Identify the Labovian event narrative sections, especially the boundary between the complicating action and the evaluation sections.

2. Code only semantic triplets from the serendipity-related narrative proper. Apply narrative norms to determine the start and end points for the serendip- ity-related sequence of events.

3. Code the two semantic triplet participants (as subject and object) and semantic triplet process (as action).

4. Code each and every semantic triplet. Do not aggregate or combine triplets that could be listed separately.

469 H. MODEL ONE SUPPLEMENTARY CODING RULES

5. Code one semantic triplet from transitive verbs with two semantic roles. Code two or more semantic triplets from transitive verbs with more than two semantic roles.

6. Assign each semantic triplet a unique id number, beginning with 1. Order the semantic triplets chronologically. Bear in mind that triplets may not always be recounted chronologically in the narrative.

7. In cases involving two or more separate subjects, verbs and/or objects, code the relevant separate semantic triplets and order the triplets based on their appearances in the narrative. For example, from the sentence, ‘The man ate an apple and a pear’, create two semantic triplets: {man:ate:apple} (triplet 1) and {man:ate:pear} (triplet 2).

8. Semantic triplets are always active, with the direction of the action running from subject to object. In cases where a verb requires an action to be recip- rocated (e.g. consider ‘speak with’ versus ‘speak to’), create two semantic triplets, switching the subject and object in the second triplet, in order to capture the two-way direction.

9. Use semantic context to determine when the narrator uses different words to describe the same referent. Apply a single label in these cases. For example, code the same referent, ‘journal article’ and ‘paper’, with a single label. If it is clear from the semantic context that ‘journal article’ and ‘paper’ are not the same, then code each with a different label. In ambiguous cases, assume different referents.

10. Code common referents using generic labels; for example, experiments and results feature frequently in narratives of the research process.

11. Verb tense may be changed to maintain semantic coherence. Verbs may be created (e.g. the noun ‘collaboration’ converted to the verb ‘collabor- ates with’) or replaced with another verb in the same attribute category (see Coding attributes, rule 2) in order to clarify meaning or maintain semantic coherence. If a verb is created/added, then a transitive verb with

470 two semantic roles should be preferred unless it is clear that the semantic context demands a verb with more than two roles.

12. Code plural subjects/objects (e.g. ‘we’ or ‘experiments’) in the singular whenever possible. For example, if the ‘narrator conducts experiments’ and the semantic context makes it clear that there were three experi- ments, then code three semantic triplets, {narrator:conducts:experiment1}, {narrator:conducts:experiment2} and {narrator:conducts:experiment3}. If the number of ‘experiments’ is not clear from the semantic context, then code ‘narrator conducts experiments’ as one semantic triplet, retaining the plural object.

13. After coding, read through the coded semantic triplets to check semantic coherence and the match between narrative input and coded output.

Coding attributes:

1. Code subject and object attributes: people—3, information/knowledge— 2, object—1. Code methods, techniques, findings and anything spoken, written or sensed as information. Code animates other than people (e.g. rats) as objects. If in doubt about whether something is an object or information, then code it as information.

2. If the subject is a person, then code action attributes according to Hal- liday process group: material—1, behavioural—2, mental—3, verbal—4, relational—5, and existential—6 (see Franzosi, 2010, p. 22–23). If the sub- ject is an object or information, then code the process attribute as 0.

Coding event contexts:

1. Event contexts are usually bounded by times or locations reported in a narrative. The narrator may indicate an event context boundary by means of description added to action (e.g. ‘The next day’, ‘In the lab’, ‘The second time’, etc.).

2. Each event context may comprise several semantic triplets.

471 H. MODEL ONE SUPPLEMENTARY CODING RULES

3. Each semantic triplet belongs to a single event context; however, semantic triplet participants may appear in multiple event contexts.

4. Order event contexts chronologically, beginning with id number 1. If two events contexts occur simultaneously or if the order of event contexts is not made explicit by the narrator, then order event contexts based on their ap- pearance in the narrative or the semantic context of the narrative, whichever seems most appropriate.

472 Appendix I

The Two Procedures Used to Estimate Ego’s Absolute Total Load

This appendix outlines the two procedures used to estimate ego’s absolute total load from the model one and model two networks.

Procedure one The first estimate of ego’s absolute total load was calculated as

z = x + y

where

z = ego’s estimated absolute total load

and

x = the measured outdegree of the model two ego vertex

and

y = the measured indegree of the model one ego vertex.

473 I. ESTIMATION PROCEDURES FOR ABSOLUTE TOTAL LOAD

Procedure two The second estimate of ego’s absolute total load was calculated as

(a + b) c = 2 where

c = ego’s estimated absolute total load

and

a = the measured total degree of the model one ego vertex

and

b = the measured total degree of the model two ego vertex.

474 Appendix J

The Procedure Used to Estimate Ego’s Activity

This appendix outlines the procedure used to estimate ego’s activity from the model one and model two networks. Both binary and summed versions of model one were used. Only a binary version of model two was available for use.

The underlying assumption The estimation procedure was based on the assumption that, due to the coding methods used to collect them, the model one networks underestimated active load in general, and activity more specifically; in other words, the study assumed that the ego vertices in the model two networks tended to have higher outdegrees than the ego vertices in the model one networks. A comparison of the ego vertex outde- gree centralities in the two network models quantitatively supported this assump- tion: the model one ego vertices had a lower outdegree centrality (mean=0.50, median=0.47) than the model two ego vertices (mean/median=0.84).

Estimation The estimate of ego’s activity was calculated in four steps.

1. The difference between the model one summed ego outdegree counts and the model one binary outdegree counts was calculated as

475 J. ESTIMATION PROCEDURES FOR ACTIVITY

r = p − q

where

r = the difference in outdegree

and

p = the measured outdegree of the summed model one ego vertex

and

q = the measured outdegree of the binary model one ego vertex.

2. An outdegree for an estimated ‘summed’ version of the model two binary network ego vertex was calculated as

t = s + r

where

t = the estimated ego outdegree for a ‘summed’ model two version

and

s = the measured outdegree of the binary model two ego vertex

476 and

r = is the result of the calculation from step one.

This step essentially took the extra ‘outdegrees’ found in the summed ver- sion of the model one network and added these extra ‘outdegrees’ to the binary model two network, the only version available, to create an estimated summed version of the model two network. Note that, by using absolute measures, this procedure assumed the same order for the model one and model two network pair. This assumption was not always empirically sup- ported in the study, a fact of which the analyst was well aware (see Chapter 7).

3. An estimate for the outdegree centrality of the newly created ‘summed’ version of the model two network was calculated as

t m = (n − 1)w

where

m = the estimated ego outdegree centrality for

a ‘summed’ version of model two

and

t = the result of the calculation from step two

and

n = the measured order of the binary model two network

and

477 J. ESTIMATION PROCEDURES FOR ACTIVITY

w = the maximum weight of the summed model one network.

This step calculated weighted outdegree centrality according to Wei et al.’s (2011) formula (see section 3.5.6.5). Note that the summed version of net- work model two was assumed to have the same order n as the measured binary model. This was a reasonable assumption given that increases in outdegree relate to links rather than vertices. Based on the logic behind Step 2 of the estimation procedure outlined above, the maximum weight w of the model one summed network was used for the calculation in Step 3.

478 Appendix K

The Multidimensional Scaling Procedures Used in the Study

This appendix outlines the three multidimensional scaling procedures used to study the serendipity interaction motifs (see section 5.8.2). All analyses were completed using the >metaMDS function found in Oksanen et al.’s (2012) vegan package for R. The vegan package was designed primarily for ecological data analysis; how- ever, the package also acts as a wrapper to multidimensional scaling functions available in other R packages (for example, the >isoMDS function in the MASS package). As such, vegan is also a useful package for the analysis of non-ecological data. The data used by vegan’s >metaMDS function can be input either in the form of a n x m community data matrix, where n represents the sampled ‘sites’ and m represents the ‘species’, or as a dissimilarity matrix. The distance argument allows the analyst to specify the type of distance measure to be used in the analysis. The number of dimensions is indicated by the argument k. Additional arguments are also possible; for example, the present study used the argument trymax=100 to limit the number of runs to one hundred.

The motif data in matrix format Prior to the multidimensional scaling procedures, information about the de- tected motifs was stored in community data format. Thirty of the fifty sampled

479 K. MULTIDIMENSIONAL SCALING PROCEDURES networks contained motifs. These thirty networks were the ‘sites’ in the com- munity data matrix. Ten out of the thirteen possible motifs (see section 3.5.7.3) were found across these thirty networks. The motifs were the ‘species’ in the com- munity data matrix. The cells of the n x m matrix contained presence/absence data in the form of either a 1 or 0; for example, if the network in row 3 contained the motif from column 5, then a 1 was placed in the corresponding row 1–column 5 cell. A similarity motif x motif matrix based on the ‘presence’ data was created from the community data. The similarity matrix stored information about the number of times that motifs appeared together as pairs across the thirty motif- containing networks. The data entered in the matrix cells could range from 0, indicating that two motifs never appeared together in the thirty networks, to 30, indicating that two motifs always appeared together. Data was collected for every possible motif pairing, including self-pairing (i.e. the cells comprising the diagonal of the symmetrical matrix); for example, if motif id12 appeared in five networks, then a 5 was entered in the row id12–column id12 cell of the matrix.

Multidimensional scaling procedure one This first procedure illustrated which, if any, motifs dominated the sample and how the motifs grouped together. The procedure involved four steps:

1. the similarity motif x motif matrix was read as a table into R;

2. the function >as.matrix was called to tell R to interpret the data as a matrix object, jj;

3. the similarity matrix, jj was converted to a dissimilarity matrix, kk, by subtracting the similarities from a constant;1 and,

4. the function >metaMDS(kk, distance="bray", k=2, trymax=100) was called. 1Note that the analysis tried both the constant 30 (because the matrix entries could range from a minimum of 0 to a maximum of 30) and the constant max(jj).

480 The Bray-Curtis distance measure was selected because the aggregated pres- ence/absence data was treated as raw counts.2 For the interpretation of the presence/absence data as raw counts, the networks could also be treated as being equal samples in terms of ‘size and volume’—each network could contain a max- imum of ten motifs, one of each ‘species’ of motif, regardless of actual network magnitude.

Multidimensional scaling procedure two This second procedure grouped the networks according to motif patterning. The n x m community data was the input for the analysis. The >metaMDS function automatically calculates the n x n dissimilarity matrix from the input community data using a selected distance measure. The analyst selected the Jaccard Index dissimilarity measure because the input was presence/absence data with many absences.3 The output graph, visualised in two dimensions, showed both networks and motifs.

Multidimensional scaling procedure three This procedure had similar aims to procedure one, but was accomplished by different means. The third procedure acted as a check on the similarity to dis- similarity conversion method used in procedure one and allowed an alternative visualisation of the dominant motif data, one that had the option to include points for both motifs and networks in the output graph. A community data n x m matrix was again the input; however, for this third procedure, the data was pivoted so that n = motifs and m = networks. Pivoting the data meant that the n x n matrix calculated by the >metaMDS function was a motif x motif matrix instead of a network x network matrix (as was used in procedure two). As in procedure two, the >metaMDS function was called with distance="jaccard" and k=2.

2Note that Greenacre (2008) argues that this index is appropriate for certain relative data as well as raw score data. In this present study, the use of the raw scores versus the normalised scores (i.e. the raw scores divided by thirty) made minimal difference to the output graph. 3This distance measure accounts for the fact that there will normally be many ‘absences’ in presence/absence data; as such, a ‘presence’ should be deemed more significant than can be indicated by a simple 1 or 0 entry.

481 K. MULTIDIMENSIONAL SCALING PROCEDURES

482 Appendix L

Interrelations between the Campanario and Foster and Ford Serendipity Classifications

This appendix details the study’s interpretation of the results presented in Table 6.3. The results relate to the interrelations between Campanario’s (1996) and Foster and Ford’s (2003) serendipity classifications. Because the interpretation is somewhat involved, the discussion is presented in an appendix to avoid inter- rupting the flow of the narrative text in the main body of the thesis.

Campanario/Foster and Ford interrelations (detailed interpretation): Given the total number of descriptions recorded when the classifications were con- sidered separately (27 for Campanario, Class Two and 26 for Foster and Ford, Class Two), the count of joint occurrences (23) indicates a strong interrelation between Campanario Class Two and Foster and Ford Class Two. The results also provide some evidence for a weaker, either/or-type link between Campanario Class One and Foster and Ford Classes One and 4b. All of the Foster and Ford Class One descriptions (12) correlate with Campanario Class One. This, of course, means that twelve of the total number of Campanario Class One de- scriptions (23) likewise correlate with Foster and Ford Class One. But what about the remaining Campanario Class One descriptions (23 − 12 = 11)? Recall from section 2.3.2.3 that Campanario Class One refers to something

483 L. CAMPANARIO/FOSTER AND FORD INTERRELATIONS

‘reached accidentally’, while Foster and Ford Class 4b refers to something found ‘by chance’. The two ideas are semantically similar. Do the study results show an interrelation between Campanario Class One and Foster and Ford Class 4b? Indeed, the results do suggest such a link. The number of remaining Cam- panario Class One descriptions (23−12 = 11) and the number of joint occurrences of Campanario Class One and Foster and Ford Class 4b (7), although not equal, are similar enough to be indicative of a possible correlation. Furthermore, the number of joint occurrences of Campanario Class One and Foster and Ford Class One (12), when added to the number of joint occurrences of Campanario Class One and Foster and Ford Class 4b (7), gives a figure (19) fairly close to the total number of Campanario Class One descriptions (23).

484 Appendix M

Journal Title Derived Narrow Disciplines of the Study Sample

This appendix, comprising Table M.1, presents the results derived from the study’s attempt to class the sample Citation Classics by narrower discipline based on journal title. As discussed in section 6.3.3, these results were not incorporated into the research findings.

Discipline Classes Suggested by Journal Titles Counts

Medicine 21

Biology, etc. 11

Chemistry 6

Other science 5

Psychology, etc. 4

Law 1

Table M.1: Narrower discipline spread across the study sample based on journal title - This table gives the narrower discipline spread suggested by journal title across the forty-eight Citation Classics that made up the study sample. The discipline classes are listed beginning with the most frequently occurring class.

485 M. JOURNAL TITLE DERIVED NARROW DISCIPLINES

486 Appendix N

The Multidimensional Scaling Matrices Used in the Study

This appendix presents the matrices underlying Figures 6.26 and 6.28 illustrated in section 6.4.2.5. The two matrices are in a community data format, with ‘sites’ in the rows and ‘species’ in the columns. Note that the first matrix presented covers two pages of this appendix. For more information on the multidimensional scaling procedures used in the study, see Appendix K.

487 N. MULTIDIMENSIONAL SCALING MATRICES - (Cont. next page) Table N.1: Matrix underlying Figure 6.26 100 0000 1110 1000 1000 1000 0100 0000000100 0000 0110101001 000 0110001000 000 0010000000 000 0110111001 0100101001 1000011001 0001000010 0000000001 0000000000 Greenwood Warner Beadle Ashbaugh Fitzgerald Smythe Adams Reynolds Boyd Giblett Field Poulik Schatzmann Gocke id38 id46 id238 id78 id12 id6 id166 id102 id140 id164

488 =⇒ Blanchard Bull Harrison Kappas Hudson Cohn Fogg Kirk Weiss.1 Weiss.2 Weiss.3 Buzbee Mayer Milstein Kroto Myers

id38 000 0001000000010 id46 111 0100111111111 id238 111 0000100001111 id78 010 1000000000000 id12 011 0000110011110 489 id6 001 0100100011011 id166 000 1000000000000 id102 000 1010000000001 id140 000 0000000000000 id164 001 0000000000000 N. MULTIDIMENSIONAL SCALING MATRICES

id38 id46 id238 id78 id12 id6 id166 id102 id140 id164

Greenwood 1 0 0 0 0 0 0 0 0 0 Warner 0 1 1 1 1 0 0 0 0 0 Beadle 0 1 0 0 0 1 0 0 0 0 Ashbaugh 0 1 0 0 0 0 0 0 0 0 Fitzgerald 0 0 0 0 0 0 1 0 0 0 Smythe 0 1 1 0 1 1 0 0 0 0 Adams 0 1 1 1 1 0 0 0 0 0 Reynolds 0 0 0 0 0 0 0 1 0 0 Boyd 0 1 0 0 1 1 0 0 0 0 Giblett 0 0 0 0 1 0 1 0 0 0 Field 0 1 1 0 1 1 1 0 0 0 Poulik 1 0 0 0 0 0 0 0 0 0 Schatzmann 0 0 0 0 0 0 0 1 0 0 Gocke 0 1 0 0 1 1 1 0 1 0 Blanchard 0 1 1 0 0 0 0 0 0 0 Bull 0 1 1 1 1 0 0 0 0 0 Harrison 0 1 1 0 1 1 0 0 0 1 Kappas 0 0 0 1 0 0 1 1 0 0 Hudson 0 1 0 0 0 1 0 0 0 0 Cohn 0 0 0 0 0 0 0 1 0 0 Fogg 1 0 0 0 0 0 0 0 0 0 Kirk 0 1 1 0 1 1 0 0 0 0 Weiss.1 0 1 0 0 1 0 0 0 0 0 Weiss.2 0 1 0 0 0 0 0 0 0 0 Weiss.3 0 1 0 0 0 0 0 0 0 0 Buzbee 0 1 0 0 1 1 0 0 0 0 Mayer 0 1 1 0 1 1 0 0 0 0 Milstein 0 1 1 0 1 0 0 0 0 0 Kroto 1 1 1 0 1 1 0 0 0 0 Myers 0 1 1 0 0 1 0 1 0 0

Table N.2: Matrix underlying Figure 6.28

490 Appendix O

A Hand-Drawn Event Structure from the Pilot Stage of the Study

This appendix comprises Figure O.1, which reproduces an early hand-drawn event structure from the pilot stage of the study. As section 7.2.2 notes, the precise drawings illustrated in the network portfolio (see Suppl. Vol.) would have been impossible using such manual drawing methods.

491 O. PILOT STAGE HAND-DRAWN EVENT STRUCTURE i ' in}e'TekAbn h'e ."eJs oranSe' ftr*^J, 'u'l, d-ireJ r'\ ": ,.- 5 b t;"fo'Jo 'u+t) n"lucnk l^ eon'-;p^.nJ /-th"'tl3xle"[ - This figure reproduces an early hand-drawn event structure ,{:'a;*:2*; coone&A G'y' ',i)':*-" '' Aro'^)ra --^;;':fuSo*E-H!"' a-ru nu.-..J--s +a,'5 +';^51 a@*'e x H: oQ - ^o*e ,n^li r n "'Lud coi PeoP\e' c J ina- +tar.-; \,&orrwhonJ = = F'., -- 'lnu-l^ C A + l^-Vcg.Jiooi \o.5n- t',- No*. Figure O.1: Anproduced early during hand-drawn the event pilot study structure stage of the research. The legend and explanation were included in the original drawing.

492 Appendix P

Publications and Presentations that Derived from the Research

This appendix lists the publications and presentations that derived from the research prior to the thesis submission. The most recent examples are listed first. As section 7.3.3 notes, publications and presentations allow for the scrutiny of research by other scholars. In particular, the successful passage of research through the peer-review process is seen as indicative of a basic acceptance by the scholarly community (Teddlie and Tashakkori, 2009).

ˆ McBirnie (2012), Phenomenological event structures of serendipity. A multi- model network approach to information behaviour in context, Doctoral Forum presentation, Statistical Cybermetrics Research Group, University of Wolverhampton.

ˆ McBirnie and Urquhart (2011), Motifs: dominant interaction patterns in event structures of serendipity, Information Research 16(3): paper 494.

ˆ McBirnie (2011), Studying serendipity: a multi-model network approach to narrative, Conference paper, 7th UK Social Networks Conference, Univer- sity of Greenwich, 7-9 July, 2011.

ˆ McBirnie (2010b), The structure of serendipity: research approaches and practical applications, Seminar presentation, UCL Interaction Centre (UC- LIC), University College London.

493 P. RESEARCH-RELATED PUBLICATIONS AND PRESENTATIONS

ˆ McBirnie (2010a), Social network analysis, Workshop presentation, Research2 PhD Forum, Loughborough University, 8-9 July 2010.

ˆ McBirnie (2009), Serendipity, Workshop presentation, Oxford Internet In- stitute Doctoral Workshop, Oxford Internet Institute, University of Oxford.

494