UNIVERSITY of SOUTHAMPTON Designing Experiments on Networks

UNIVERSITY OF SOUTHAMPTON
FACULTY OF SOCIAL, HUMAN AND MATHEMATICAL SCIENCES
Social Sciences

Designing Experiments on Networks
by Vasiliki Koutra

Thesis for the degree of Doctor of Philosophy
September 2017

ABSTRACT

Designing experiments on networks challenges an assumption common in classical experimental designs, which is that the response observed on a unit is unaffected by treatments applied to other units. This assumption is referred to as 'non-interference'. This thesis aims at improving the design efficiency and validity of networked experiments by relaxing the non-interference assumption, where efficiency stands for low variance of the estimated quantities (precision) and validity for unbiased quantities (accuracy). We develop flexible and effective methods for designing experiments on networks (with a special focus on social networks) by combining the well-established methodology of optimal design theory with the most relevant features of network theory. We provide evidence that conventional designs such as randomised designs are inefficient compared to a systematic approach that accounts for the connectivity structure that underlies the experimental units. We investigate the impact of the network structure on the efficiency and validity of the experimental design. There is evidence that the experimental design is determined by the small-scale properties of networks. We also develop an algorithmic approach for finding efficient designs by utilising the network symmetry as defined by the automorphism group of the underlying graph. This approach reduces considerably the search time for finding a good design in moderate-sized networks.
It works by decomposing the network into symmetric and asymmetric subgraphs and consequently decomposing the design problem into simpler problems on these subgraphs. Moreover, we suggest a framework for finding optimal block designs, while taking into account the interrelations of groups of units within a network. In doing so, the units are initially divided into blocks, using spectral clustering techniques and the concept of modularity, prior to assigning the treatments. We study how the structural properties of the network communities affect the optimal experimental design and its properties.

We also make a transition from experiments on social networks to experiments in agriculture showing the diversity of applications this research can address. In particular, we obtain optimal designs with two blocking factors while handling different definitions of neighbour structures related to either the distance among plots or the farmer operations. Throughout this thesis, several optimal designs on networks are obtained using a simple exchange algorithm, which is implemented in the R programming language.

Contents

List of Figures
List of Tables
Declaration of Authorship
Acknowledgements

1 Introduction
  1.1 Preliminaries
  1.2 Experiments on networks
  1.3 Related literature
    1.3.1 Causes of interference and design properties
    1.3.2 Adjusting for treatment interference
    1.3.3 Adjusting for response interference and spatial or temporal variation
    1.3.4 Recent work on spillover effects and modelling influence in networks
  1.4 Contribution
  1.5 Thesis outline

2 General framework
  2.1 Preliminaries on optimal designs
  2.2 Standard randomised designs
  2.3 Search algorithms for optimal design
  2.4 Basics of graph theory
  2.5 Discussion

3 Optimal designs with network effects
  3.1 Background: modelling interference
  3.2 Designs with linear network effects model (LNM)
  3.3 Some analytical results
  3.4 Patterns of optimally allocated treatments
  3.5 Design efficiency and bias
    3.5.1 Efficiencies of randomised designs
    3.5.2 Bias due to model misspecification
    3.5.3 Misspecification of the network structure
  3.6 Simple exchange algorithm
  3.7 Discussion

4 Design and network symmetry
  4.1 Preliminaries on graph automorphisms
  4.2 Symmetry breaking and design
  4.3 Algorithmic approach
  4.4 Discussion

5 Optimal block designs with network effects
  5.1 Spectral clustering and block definition
  5.2 Clustering algorithm
  5.3 Designs with network effects block model (NBM)
  5.4 Comparison of optimal designs under different models
  5.5 Misspecification of cross-blocking connections
  5.6 Robustness due to misspecified blocking
  5.7 Discussion

6 Optimal row-column designs with network effects
  6.1 Introduction
  6.2 Designs with row-column network effects block model (RCNBM)
  6.3 Optimisation algorithm
  6.4 Adjacency matrix: practical issues and extensions
  6.5 Comparison of optimal designs
  6.6 Discussion

7 Discussion and directions for future research
  7.1 Summary
  7.2 Current work: Autoregressive network effects model
    7.2.1 Autoregressive network effects model (ANM)
    7.2.2 Maximum likelihood estimation
    7.2.3 Asymptotic variance of the maximum likelihood estimators
  7.3 Future work

Appendix
  A Results for Chapter 3
  B Results for Chapter 5
  C Optimal designs from Chapter 6

Bibliography

List of Figures

2.1 A small social network and its adjacency matrix
2.2 Degree distribution of the network
2.3 Special types of graphs
2.4 Popular types of networks
2.5 Degree distribution of the random network
2.6 Transition from a ring to a random network via small-world network
2.7 Degree distribution of the small-world network
2.8 Degree distribution of the scale-free network
3.1 An example social network
3.2 Network of Figure 3.1 (m = 2)
3.3 Network of Figure 3.1 (m = 3)
3.4 Pairwise comparisons of the criterion values for all balanced designs
3.5 Optimal designs for Example 3.4.1 for φ1 and φ2
3.6 Pairs of units of different distances receiving the same or different treatments
3.7 Snapshots of three common types of networks
3.8 Snapshots of three common types of networks, optimal designs (φ1)
3.9 Snapshots of three common types of networks, optimal designs (φ2)
3.10 Boxplots of efficiencies calculated for all balanced designs ignoring network effects
3.11 Bias in treatment effects due to network effects
3.12 Bias for treatment effects when γ1 = γ2
3.13 Degree distribution of the network of Figure 3.1
3.14 Design efficiencies for 100 simulated misspecified networks
3.15 Design efficiencies for 100 simulated misspecified networks (connected graph)
4.1 Graph automorphism corresponding to a relabelling
4.2 All six automorphisms of the star graph of Figure 4.1, Aut(G)
4.3 (a) Original graph; (b) a graph automorphism
4.4 (a) Orbits are indicated by different colours; (b) quotient graph; (c) skeleton
4.5 (a) Partition P1; (b) Partition P2
4.6 Network of size n = 32. Different colours indicate different orbits, with white corresponding to the skeleton
4.7 Optimal designs φ1 and φ2
4.8 Optimal designs φ1 and φ2 for the asymmetric network skeleton
4.9 All optimal designs for φ1 for the asymmetric network skeleton
4.10 Optimal design for φ1 and φ2 (given fixed allocation for the skeleton)
4.11 (a) Original graph; (b) Graph skeleton
4.12 Optimal designs for φ1 and φ2 (given fixed allocation for the skeleton)
4.13 Optimal design for φ1 and φ2 (without restrictions)
4.14 Binary tree
4.15 PhD social network
4.16 Degree distribution of the PhD network
4.17 (a) Original graph; (b) Graph skeleton
4.18 Relative design efficiency versus the number of iterations under φ1
4.19 Relative design efficiency versus the number of iterations under φ2
4.20
