Optimization by Simulation of an Environmental Surveillance Network Application to the Fight Against Rice Pests in the Mekong Delta (Vietnam)

PHD THESIS OF UNIVERSITY PIERRE ET MARIE CURIE Speciality Computer Science, Telecommunication and Electronics (Doctoral School EDITE) Presented by TRUONG XUAN VIET To obtain the degree of DOCTOR OF PHILOSOPHY OF THE UNIVERSITY PIERRE ET MARIE CURIE Title of the thesis: OPTIMIZATION BY SIMULATION OF AN ENVIRONMENTAL SURVEILLANCE NETWORK APPLICATION TO THE FIGHT AGAINST RICE PESTS IN THE MEKONG DELTA (VIETNAM)

Defended on June 24, 2014

EXAMINING COMMITTEE Didier JOSSELIN, Research Director, UMR ESPACE - Equipe d'Avignon (reviewer) Arnaud BANOS, Research Director, UMR 8504 Géographie-Cités (reviewer) Christophe CAMBIER, Associate Professor, University Paris 6 (examiner) Philippe CAILLOU, Associate Professor, University Paris 11 (examiner) Alexis DROGOUL, Research Director, UMI 209 UMMISCO, IRD (thesis co-director) LE NGOC Minh, Professeur, HCMUT (thesis co-director) HUYNH XUAN Hiep, Professeur, Can Tho University (co-encadrant)

Université Pierre & Marie Curie-Paris 6 Tél. Secrétariat: 0144272810 Bureau d’accueil, inscription des doctorants Fax: 0144272395 Esc G, 2ème étage Tél. pour les étudiants de A à EL: 0144272807 15 rue de l’école de médicine Tél. pour les étudiants de EM à MON: 0144272805 75270 ‒ PARIS CEDEX06 Tél. pour les étudiants de MOO à Z: 0144272802 E-mail: [email protected]

Abstract Insect invasions represent major threats for rice ecosystems. In South-East Asia, the two most critical kinds come from the white-back plant hopper (WBPH) and the brown plant hopper (BPH) [1] [2]. In Vietnam in particular, BPHs critically damage rice yields not only through direct attacks on young rice branches but also through various indirect damages, like infecting plants with viruses [3]. Depending on the food source and the population density, BPH can migrate along the wind direction on a very large scale [4][5], and several generations of insects can be produced during one invasion [6][7]. Monitoring the propagation of BPH is therefore a requirement, as it is one of the conditions to design appropriate strategies of prevention against their invasions. In the Vietnamese Mekong Delta, for example, the monitoring of BPH invasions is done through a surveillance network composed of more than 300 light-traps that record the surrounding density of adult insects on a daily basis. However, this network is limited on two important aspects: firstly, the location of traps is fixed and therefore provides a sampling that is not always adapted to the current invasion; secondly, the estimation of BPH densities from this sampling has been done for years using basic models that do not take the (past and current) dynamics of the invasions into account. This particular network is a good example of multiple environmental surveillance networks that have been designed once and would require to be optimized further because either their initial design was faulty or, more often, their environmental context has changed since their deployment. A number of proposals of optimization algorithms have been made in the past years, but it is always complicated, or too costly, to prove their efficiency in a real context, which makes it difficult to convince the end users to apply these “optimal layouts”. Starting from this example, this PhD thesis is dedicated to propose a generic solution to both problems at once: on one hand, it will define a method to build better prediction models of invasions in order to optimize the placement of “nodes” in a surveillance network, and, on the other hand, it will improve the estimation models in order to acquire a better image of a given invasion from the samples provided by a network. To overcome the difficulty of experimentation in a real context, the originality of our approach has been to combine these two aspects in a simulation of the network and its environmental context using agent-based modeling techniques. The outcome of this work is a generic approach, validated on the BPH surveillance network in the Mekong Delta, which can be applied to any environmental surveillance network, and in which our main contributions are: 1) The design of a virtual laboratory called NOVEL (Network Optimization Virtual Environment Laboratory), based on the coupling of a model of the network with one or more models of its environmental context. 2) The automated optimization of the performances of the simulated network using optimization algorithms at different levels of representation against various environmental scenarios (such as scenarios of growth and propagation of BPH). We

also propose a novel optimization algorithm at macro-scale, namely the CDSN- based optimization algorithm. 3) The novel method of experimental assessments of any network layout based on simulation. These assessments are based on some quantitative criteria to evaluate the performances of an environmental surveillance network.

Acknowledgements

First, I would like to express my sincere gratitude to my two advisors, Alexis Drogoul and Le Ngoc Minh, and to my local supervisor (co-encadrant in french), Huynh Xuan Hiep, for their continuous support, patience, motivation, enthusiasm, and immense knowledge. Their guidance helped me in doing research and writing this thesis. I would like to thank them as well for all the activities they organized, such as the training programs, coding camps, which brought me to the world of research. I would like to thank the organizers, professors, and secretaries of the PDI program (Programme Doctoral International) for allowing me to participate to the international network. A special thanks to Jean-Daniel Zucker, Edith Perrier, Christophe Cambier, Didier Josselin, Arnaud Banos, Philippe Caillou, Nicolas Marilleau, Benjamin Roche, Vo Duc An, Evelyne Guilloux, Patricia Zizzo, Patrick Taillandier, Benoit Gaudou, Nathalie Corson and Kathy Baumont for their advices, lessons and supports. I would like to thank my professors at UPMC (Paris), HCMUT (HCMC), IRD (Bondy) and IFI (Hanoi). Especially Pham Tran Vu, Nguyen Duc Thai, Tran Van Hoai, and Le Thanh Van for the support they offered during my PhD studies at HCMUT. In addition, I would like to thank the members of my thesis committee, for their insightful comments, and hard questions. My sincere thanks also go to Le Quyet Thang and Ngo Ba Hung for offering me the initial opportunity to apply to this PhD program, and for their visits to Ho Chi Minh City, and to all my colleagues at Can Tho University Software Center, especially to Nguyen Hoang Viet and Nguyen Phu Truong, who gave me the best conditions for doing research during these almost four years. I thank all my fellow lab mates at DREAM Team, GAMA developers and CIT for the stimulating discussions, Do Thanh Nghi, Nguyen Nhi Gia Vinh, Pham Nguyen Khang, Nguyen Thai Nghe, Pham The Phi, Truong Minh Thai, Truong Chi Quang, Huynh Quang Nghi for their supports, suggestions for my research, and for all the fun we have had together. I thank the members of the CENRES, especially Vo Quang Minh, Nguyen Hieu Trung and Le Anh Tuan for providing me with the data on which this thesis is based. I also thank my friends, Luu Tien Dao and Tran Nguyen Minh Thu, Nguyen Hai Quan, Victor Mose, Gaoussou Camara for their friendship and helpfulness. Last but not least, I would like to thank my family: my parents Truong Van Xuan and Le Thi Tuyet Hanh, for their constant support in my life, and my wife Nguyen Thi Ngoc Thu, for all her love and understanding. Thanks to my son, Truong Xuan Bach, pleasure and solace after each stressful time. I also thank my sisters and brothers for their dedicated help during my study, Pham Van Nhan, Truong Thi Minh Giang, Truong Xuan Nam and Nguyen Thi Le Huyen.

iii

Table of Contents

CHAPTER 1. INTRODUCTION ...... 1 1.1. Introduction ...... 1 1.2. Context ...... 3 1.3. Research questions and organization of manuscript ...... 6 CHAPTER 2. OBJECTIVES AND STATE OF THE ART ...... 10 2.1. Complexity and scales ...... 10 2.2. Modeling and simulating the whole system ...... 11 2.3. Difficulties related to insect population modeling ...... 13 2.3.1. Randomness vs. Autocorrelation ...... 13 2.3.2. Spatial dispersion and nonergodicity ...... 14 2.3.3. Heterogeneity ...... 17 2.4. How to build the virtual laboratory (NOVEL)? ...... 18 2.4.1. Components of NOVEL ...... 19 2.4.2. Which approach can be used? ...... 20 2.5. How to couple the submodels? ...... 21 2.5.1. Spatial coupling ...... 22 2.5.2. Temporal coupling ...... 23 2.6. How to model the BPH trap-density data? ...... 23 2.6.1. How are the BPH trap-density data modeled? ...... 23 2.6.2. Gaussian process ...... 24 2.7. How to estimate the BPH trap-densities of unmeasured locations? ... 25 2.7.1. Kriging estimation ...... 25 2.7.2. Noising and smoothing processes ...... 26 2.8. How to optimize the light-trap density network? ...... 27 2.8.1. Quantifying the uncertainty of the estimated results ...... 27 2.8.2. Typical optimization approaches ...... 28 2.9. Conclusion ...... 30 CHAPTER 3. NETWORK OPTIMIZATION VIRTUAL ENVIRONMENT LABORATORY (NOVEL) ...... 31 3.1. Motivation ...... 31 3.2. Ecological Models ...... 32 3.2.1. Requirements and solutions ...... 32 iv

3.2.2. Entity specifications ...... 33 3.2.3. Environmental submodels ...... 36 3.2.4. Biological submodels ...... 41 3.3. Model of the light-trap network ...... 45 3.3.1. Disjointed Surveillance Network (DSN) ...... 45 3.3.2. Correlated & Disk graph-based Surveillance Network (CDSN) . 46 3.4. Experiments of NOVEL ...... 54 3.4.1. NOVEL in GAMA ...... 54 3.4.2. Verification of the BPH prediction model ...... 56 3.5. Conclusion ...... 59 CHAPTER 4. MULTIPLE SCALES OPTIMIZATION ...... 61 4.1. Introduction ...... 61 4.2. Data description ...... 63 4.3. Heuristic search used in network optimization ...... 65 4.3.1. Why is it necessary to apply a heuristic search? ...... 65 4.3.2. Universal Kriging Standard Error (UKSE) ...... 66 4.4. Micro-scale approach of network optimization ...... 67 4.4.1. Hill-climbing search algorithm ...... 67 4.4.2. Tabu search algorithm ...... 69 4.4.3. Flock problem of micro-scale algorithms ...... 71 4.5. Meso-scale optimization ...... 71 4.5.1. Improved hill-climbing search algorithm ...... 72 4.5.2. Particle Swarm Optimization (PSO) ...... 73 4.6. Macro-scale optimization ...... 76 4.6.1. Optimization strategy ...... 76 4.6.2. Experiments ...... 79 4.7. Conclusion ...... 80 CHAPTER 5. EXPERIMENTAL ASSESSMENT ...... 81 5.1. Why is it necessary to assess the performance of a surveillance network? ...... 81 5.2. Characteristics of a good surveillance network ...... 82 5.3. Performance Indicators of surveillance network ...... 85 5.3.1. Indicator 1: Root mean square error of variogram model ...... 86 5.3.2. Indicator 2: Deviation of cross-validation ...... 88 5.3.3. Indicator 3: Mean universal Kriging variance (MUKV) ...... 89

5.3.4. Aggregate performance indicators ...... 90 5.4. Experiments ...... 90 5.4.1. Improved hill-climbing search algorithm ...... 91 5.4.2. Particle Swarm Optimization (PSO) ...... 93 5.4.3. CDSN-based optimization ...... 93 5.4.4. Comparison between different optimization algorithms ...... 94 5.5. Conclusion ...... 95 CHAPTER 6. METHODOLOGICAL PROPOSAL...... 97 6.1. Introduction ...... 97 6.1.1. Parameters ...... 98 6.1.2. Functional components ...... 99 6.2. Methodological proposal ...... 102 6.3. Implementation and Release ...... 104 CHAPTER 7 ...... 105 CONCLUSION AND PERSPECTIVES ...... 105 REFERENCES ...... 109 PUBLICATIONS ...... 123 SUBMITTED PAPERS ...... 123 Annex 1 - DATA TABLES ...... 124 Annex 2 - CODE SNIPPETS IN GAML (GAMA PLATFORM) ...... 127 Annex 3 - CODE SNIPPETS IN R LANGUAGE...... 147

List of acronyms ACF Autocorrelation Function ANOVA ANalysis Of VAriance ARIMA Autoregressive Integrated Moving Average ARMA Autoregressive Moving Average CA Cellular Automata BPH Brown Plant Hopper CDSN Correlation& Disk Graph-based Surveillance Network DSN Disjointed Surveillance Network EBM Equation-Based Model GIS Geographical Information System GP Gaussian Process GPE Gaussian Process Entropy GPR Gaussian Process Regression GPs Gaussian Processes IBM Individual-Based Model MSE Mean Squared Error ODE Ordinary Differential Equation OK Ordinary Kriging RBF Radial Basis Function RMSE Root Mean Square Error SE Standard Error SK Simple Kriging SNM Surveillance Network Model SRPPC Southern Regional Plant Protection Center UDG Unit Disk Graph UK Universal Kriging UKSE Universal Kriging Standard Error UKV Universal Kriging Variance

vii

List of figures

Figure 1. Annual invasions of brown plant hoppers (BPH) in southeast ASIA...... 3 Figure 2. BPHs attack the young branches of rice...... 4 Figure 3. Insect surveillance network in the Mekong Delta region...... 4 Figure 4. Mean trap-density of BPH collected by 48 light-traps in 3 provinces: Hau Giang, Soc Trang and Bac Lieu in 2010...... 5 Figure 5. Recent studies and difficulties...... 5 Figure 6. Thesis organization...... 9 Figure 7. Appropriate modeling methods for different scales of simulation domains (Course of “Complex systems” of Edith Perrier, IRD/France)...... 11 Figure 8. Two different dispersion data (Source [62])...... 15 Figure 9. Histograms of two different dispersion data (Source [62])...... 15 Figure 10. Variograms of two different dispersion data (Source [62])...... 16 Figure 11. Standardized residuals (Trap-density data at Dai Thanh, Nga Bay Town, Hau Giang Province, Vietnam)...... 16 Figure 12. Time series prediction by ARIMA (Trap-density data at Dai Thanh, Nga Bay Town, Hau Giang Province)...... 17 Figure 13. Scaling out communication to rural farmers: lessons from the “Three Reductions, Three Gains” campaign in Vietnam [27]...... 18 Figure 14. Abstract view of NOVEL...... 19 Figure 15. Estimated trap-densities by using Universal Kriging technique...... 28 Figure 16. Universal Kriging variances...... 28 Figure 17. General architecture of NOVEL...... 32 Figure 18. Three state variables: attractiveness, obstruction indices (Environment) and density vector (Biology)...... 33 Figure 19. Temperature and Humidity indices...... 38 Figure 20. Temperature factor of Bac Lieu meteorological station...... 39 Figure 21. Humidity factor of Bac Lieu meteorological station...... 39 Figure 22. Combined weather index from two single indices...... 40 Figure 23. Rice cultivated regions in the Mekong Delta region (Data source: CENRES/CTU)...... 41

viii

Figure 24. Example of a BPH growth model (Source: Ngo (2008))...... 42 Figure 25. Different stages of BPH (Source [7])...... 42 Figure 26. Density vector V used for maintaining the BPH densities of each day-age in the whole BPH growth process...... 43 Figure 27. BPH migration model ‒ A pictorial view of migration process...... 44 Figure 28. Coupling between Model of the light-trap network and Ecological Models presented in their class diagram...... 45 Figure 29. Different types of surveillance device...... 47 Figure 30. Displacement vs. Path distance...... 49 Figure 31. Connection between two nodes...... 52 Figure 32. Inputs/outputs of node...... 52 Figure 33. Correlation & Disk graph-based Surveillance Network ‒ Macro-scale network organization...... 53 Figure 34. Visualization of BPH prediction model in GAMA platform...... 54 Figure 35. Empirical data vs. Estimated data...... 55 Figure 36. Dispersion of environment against the dimension of cellular automata. .. 56 Figure 37. Empirical data from January 01, 2010 of 48 light-traps in three provinces: Soc Trang, Hau Giang and Bac Lieu...... 58 Figure 38. Prediction data in two different models...... 58 Figure 39. General view of NOVEL...... 60 Figure 40. Micro-scale network organization...... 62 Figure 41. Meso-scale network organization...... 63 Figure 42. Placement of light-traps in three provinces: Hau Giang, Soc Trang and Bac Lieu, Mekong Delta, Vietnam...... 64 Figure 43. Two heuristic cellular automata in 3D visualization...... 67 Figure 44. Hill-climbing search process...... 69 Figure 45. Flock problem of greedy algorithm...... 71 Figure 46. Example of close neighbors, considered in the balance function...... 73 Figure 47. Screen shot of the balance function & micro-approach optimization...... 73 Figure 48. PSO optimization process ‒ Meso-scale network organization...... 76 Figure 49. Layout with only two first objective functions: Gaussian process entropy and Balance of network...... 79

Figure 50. Layout with all three objective functions: Gaussian process entropy, Balance of network and Local-constraints...... 79 Figure 51. Three data scales of surveillance network...... 83 Figure 52. Example of Sea height data (AVISIO datasource)...... 84 Figure 53. Two different empirical variograms of BPH light-trap data...... 85 Figure 54. Example for a random field F(Z,T) and a set of sampling locations L. .... 87 Figure 55. Dependence of MSEs on the number of random sampling locations...... 87 Figure 56. Directional variograms at 0, 45, 90, 135, 180, 225, 270 and 360 degree.88 Figure 57. Cross-validation in Hau Giang Province (Kriging interpolation)...... 89 Figure 58. Ordinary greedy optimization...... 91 Figure 59. Improved hill-climbing search algorithm ‒ Assessing the network performance...... 92 Figure 60. Improved hill-climbing search algorithm ‒ Network performance in 15 simulation instances...... 92 Figure 61. PSO - Network performance in 15 simulation instances...... 93 Figure 62. CDSN-based optimization - Network performance in 15 simulation instances...... 94 Figure 63. Mean value of RMSEs (of variogram) with different optimization algorithms...... 95 Figure 64. Abstract view of the general virtual laboratory...... 98

List of tables Table 1. Two main groups of submodels: Environmental and Biological submodels...... 33 Table 2. Natural agents...... 34 Table 3. Rice cultivated region agents...... 35 Table 4. Cellular automata...... 35 Table 5. Administrative area agents...... 36 Table 6. Affects of local constraint factors into two main environmental indices...... 37 Table 7. Specification of “Node” agent...... 46 Table 8. Specification of “Edge” and “Graph” agents...... 50 Table 9. Three scenarios of BPH characteristics...... 57 Table 10. RMSE of simulation data in three scenarios...... 59 Table 11. Light-traps in three provinces: Hau Giang, Soc Trang and Bac Lieu in the Mekong Delta region, Vietnam (Data source: Departments of Plant Protection)...... 64 Table 12. Optimization algorithms in NOVEL...... 80

Chapter 1 ‒ Introduction

CHAPTER 1

INTRODUCTION

Chapter 1 introduces the context and research problems of this PhD thesis. We also introduce the drawbacks of the traditional approaches used for optimizing environmental surveillance networks, and focus on the modeling and simulation approach as a novel solution to alleviate them. The overview of the application of this research to the optimization of a light-trap network in the Mekong Delta is presented to clarify our objectives.

1.1. Introduction

An efficient surveillance network is an invaluable tool to monitor and assess the different states of a given ecosystem [8] [9]. With the information collected by such a network, predictions can be generated using thorough spatio-temporal analyses, which can then support decision makers and stakeholders. Human dominated ecosystems are highly dynamic and complex , where most of the observed variables have mutual non-linear interactions. In addition, the human activities have a considerable impact on almost all ecosystems they inhabit, where they tend to disrupt the ecological balance in short period of times. The surveillance of such ecosystems by different technical solutions is complex and dynamic where designing an “optimal” surveillance network, i.e., a network that would reflect an almost realtime situation of an ecosystem. Often traditional optimization techniques fail to reflect the evolutions of the reality associated with these ecosystems. An example of such a situation, is the Mekong Delta region of Vietnam, where the provincial agricultural managers are concerned with the regular invasions of Brown Plant Hoppers (BPH), a particularly active rice pest, because of the diseases they carry and transmit to the rice yields. Their biggest concern is having a constantly accurate account of the current distribution of BPH waves, since it is the basis of establishing different prevention strategies. The time frame is short for applying these strategies: at least one week is needed between the moment where a prediction of the density of BPH can be estimated by the experts and warnings are sent to farmers and other end

Chapter 1 ‒ Introduction users and the moment where a strategy can be efficiently applied. To improve the efficiency of the system, the Vietnamese government has established a light-trap network [10]–[12] that can capture multiple kinds of insects, especially BPH, and which data (the density of insects per trap) is collected and analyzed daily. Maintaining this network in a good state of operation has become an important national program of the Ministry of Agriculture and Rural Development of Vietnam since 2006. Although the current light-trap network is considered as a necessity for supporting the fight against various plant pests, it has three restrictions: (1) It misses detailed accounts on the life cycle of the BPH: the density of insects in each trap only reflects the density of adult insects, whereas this insect has three main stages in its life cycle: egg, nymph, and adult stages, the two first forms being highly critical to be observed. (2) It has not evolved since its initial design and has not, therefore, completely adapted to the huge changes that the ecosystem of the Mekong Delta has undergone in the recent years especially due to limitations in management. (3) The network itself is too sparsely distributed: the average distance between two light-traps in the Delta is about dozens of kilometers, which is largely enough to let some BPH slip in between two light-traps. Normally, the data collected by the network are supplemented by a manual gathering of information collected on the field by agents of plant protection stations. This information is supposed to be combined with the light-trap data to gain a better understanding of the status of BPH invasions and to partly help to overcome the shortcomings of the network. However, these shortcomings make data collected by people a more trusted source of information, while the data obtained from light-traps are only considered as an additional source, and a not very reliable. This situation makes the government investment to be underutilized and places the decision makers in a situation where they depend on informally collected of information. Therefore, several proposals have been made to “optimize” this network [13] [14] so that it could constitute the primary and most reliable source of information on the status of insect invasions, become more quickly adaptable to the changing environment of the Delta and produce data that could be analyzed more efficiently. However, it is difficult to prove the performance of these proposals before applying them in reality, so most of them have remained purely theoretical. As a matter of fact, (1) the cost of real experiments is too high and, because most of the farmers in the Delta depend on rice cropping, it is impossible to “play” with the risk of invasions and (2) there are multiple factors that can affect the performance of the network, including human factors (since the collection is done by hand), and it is difficult to take all of them into account. Therefore, proposing solutions to enhance the efficiency of this particular network has become an important challenge in recent years and it forms the basis of this PhD thesis in proving that new approaches are needed. In particular, we will focus on the use of a modern modeling and simulation approaches coupled with a lot of traditional optimization algorithms to provide solutions to this problem.

Chapter 1 ‒ Introduction

In order to assess the true complexity of the problem and the interest of our proposal, it is necessary to gain a deeper understanding of the global context in which this research has taken place. Although our proposal has been designed to be applicable to other contexts, the one on which it has been primarily validated need to be completely presented. 1.2. Context

In Asia, the stability of rice food sources is threatened by different invasions of insects, among the critical ones are the white-back plant hopper (WBPH) and the BPH [1] [2]. In the proceedings of the “International Conference on Threats of Insecticide Misuse on Rice Ecosystems: Exploring Options for Mitigation” dated from December 16, 2011 in Hanoi, Vietnam, a report on the damages caused by the BPH estimate them to be extremely large: China lost 2.7 million tons of rice in 2005 and annually loses about 1.0 million tons; Thailand lost 1.1 million tons from 2007 to 2011; in Vietnam, only in 2007, this country lost about 0.7 million tons.

Figure 1. Annual invasions of brown plant hoppers (BPH) in southeast ASIA.

Figure 1 shows a map of the yearly BPH/WBPH invasions in Asia. In the Mekong Delta, the threat is essentially caused by BPH. And in fact, the plant protection centers encounter many difficulties in controlling this insect because of its complex dynamics. Firstly, the BPH life cycle is rather short with three stages: egg, nymph, and adult [6] [7] [15]. This insect can reproduce not only on rice but also on some other kinds of grass. Secondly, BPHs can migrate following the dominant winds to find their food sources [17] [18]. Several studies made by Otuka in 2009 noted that they can migrate on a very large spatial scale [4] [17]. To efficiently monitor the distribution of BPH, as we have already said before, and following a large invasion in 2005, a surveillance network with more than 300 light-traps has been established in 2006 in the Vietnamese Mekong Delta to help the managers

Chapter 1 ‒ Introduction

recording the densities of adult insect on a daily basis. The initial intention was to build a systematic sampling network, where one light-trap (or several ones) would be considered as the representative(s) for one district (subdivision of provinces). This network has proved its interest in capturing data like the diversity of insects and can monitor the densities of about twenty different insect species. However, its main weakness lies in the spatial sparsity of the collected data, which causes a high variance of all statistical estimations undertaken (e.g., Kriging estimation [19][20]). Characteristics of the light-trap network

Figure 2. BPHs attack the Figure 3. Insect surveillance network young branches of rice. in the Mekong Delta region.

The light-trap network established in the Mekong Delta has not evolved since 2006, since the location and sampling rate of each light-trap are fixed. There are actually several reasons for this stability: the availability of a power source nearby, the necessity to have trained personnel who are available, and some less obvious reasons due to the choices made by the management. Therefore, it cannot easily be adapted to changes of the environment (urbanization of areas, disappearance or appearance of rice fields, climatic changes) that may affect the spreading of BPH, and cannot easily be modified, except if these modifications are proved to improve its functioning on the long-term. Another characteristic of this network, and a difficult one to deal with, lies in the heterogeneity of the data collected, even between light-traps. Of course, the first reason lies in the weak density of the network, which can make each trap actually capture insects in very heterogeneous environmental conditions, but there are other explanations that are less obvious to take into account if we want to optimize it, such as human factors (e.g., keepers forgetting to inspect traps or report their result for several days, errors made in the manual reporting of the data, etc.). This is illustrated in Figure 4, which shows the mean trap-density of BPH collected by 48 light-traps in the year 2010.

Chapter 1 ‒ Introduction

Figure 4. Mean trap-density of BPH collected by 48 light-traps in 3 provinces: Hau Giang, Soc Trang and Bac Lieu in 2010.

In practice, then, as already said, the data collected by this network is not considered as reliable due to two main limitations: (1) low sampling density and (2) many impact factors are not taken into account. Therefore, the analysis and forecasting methods based on light-trap data are also less accurate.

a) Light-trap b) Field monitoring frame

Figure 5. Recent studies and difficulties.

Prior to this research, a number of endeavors on optimizing the light-trap network have been undertaken by researchers at Can Tho University and SRPPC (the Southern Regional Plant Protection Center, owner and manager of the light-trap network). And several actual experiments have been done in dedicated case studies: (1) The setup of a denser network made-up of one light-trap per small town (subdivisions of district) in Tien Giang Province since 2010. (2) The setup of a dense network, coupled with the light traps, of field monitoring frames (about 120 frames / small town) in Trung An, Thot Not district, and Can Tho (operated in

Chapter 1 ‒ Introduction

cooperation between Can Tho University and the Plant Protection Department of Can Tho in 2008) in order to capture the insects at different stages of their life-cycle [14]. (3) The setup of a hybrid network of light-traps and field monitoring frames in Hau Giang Province [13]. Another attempt at improving the usefulness of the network was done in the Vietnamese national project KC.01.15/06-10: “Design of Information Systems for supporting the Protection against Epidemic Diseases of crop plants and aquaculture in major economic areas”, developed by Can Tho University in coordinating with the SRPPC and the IRD/UMI UMMISCO [21]. In this project the researchers tried to increase the number of variables monitored by the existing network nodes in small towns of the Tien Giang and Dong Thap Provinces. Beside a number of specific thematic questions, the researchers principally ended up worrying about the performance of the network regarding the quality of the data and the difficulty to use it as an input for any prediction or estimation models. Their conclusion was that it was urgent to optimize the placement of the nodes so that the network could be used effectively for what it was intended to in the first place. However, establishing and maintaining a strategy that would consist in moving the nodes of the network from time to time in order to adapt it to the changes in the environment or to temporarily increase its density in highly infested areas, which is the most practical approach to solve the problems cited above, requires the mobilization of important financial resources, and the decision makers in charge of this network (mainly the SRPPC) are reluctant to take any decision in that direction until they can be convinced that this is the best one available. 1.3. Research questions and organization of manuscript

Regarding to the available optimization approaches in environmental surveillance networks, the problem is that there are limited options available to convince decision makers: (1) practical experiments at small scales cannot be proved to remain successful at large scales and (2) theoretical works on network optimization neglect, for the most part, the internal or external factors that greatly affect the BPH populations, such as the farming habits, the use of pesticides, the variability of the weather, etc. The approach we propose in this manuscript is an attempt at overcoming the difficulties of the two approaches and at providing sufficient evidences to help the decision makers to make best decision over the future of the network. It is also a way to reconcile the theoretical and practical approaches, by (1) capturing in a model the richness of the complex system represented by the network and its ecosystem; (2) applying mostly theoretical optimization algorithms to simulations of this model in different realistic scenarios in order to find the best possible solutions for the placement of its nodes. Normally, optimizing such a network revolves around two main questions: (1) What would be the most appropriate density of traps?

Chapter 1 ‒ Introduction

(2) What would be its most appropriate design, both in terms of placement and sampling rates of the traps? The difficulty of the first one is that, although it can be easily answered by some field experiments, these will be expensive to run and not so evident to generalize given the heterogeneity of the ecosystems composing the Mekong Delta. The second question raises other difficulties, which are usually addressed by optimization algorithms, among which the approaches known as optimal design [22] and metaheuristic optimization [23]–[25] are the most employed ones. The main difficulty of these approaches is that their performances can only be evaluated with respect to the existing sampling data, which is not always available in the real context. Furthermore, depending on the available data, these approaches may propose solutions that cannot be applied by the managers of the network. On one hand, if the data is too dense, the algorithms often try to remove several sampling locations or to decrease the sampling frequency in order to save costs. But, this solution, which might well be adapted for past sampling data, is potentially harmful regarding future situations, as the density of insects can be affected by human impacts (e.g., using pesticides, fertilizers and seed sowing by farmers [26][27]) or other causes that these algorithms do not take into account. On the other hand, if data is missing, these algorithms often deal with a high uncertainty in their estimations. This second problem appears more often than the first one, and additional optimization can be proposed to resolve it, but the lack of empirical data is clearly the main weakness of the traditional use of optimization algorithms. Finding a methodology that would enable deciders and experts to assess these “optimal” solutions in different scenarios, using both forecasted data and estimations of missing data, is the main purpose of this research, as the development of modeling and simulation techniques provides us with an opportunity to build virtual environments in which we can experimentally verify and assess the efficiency of potential network layouts. This research is therefore divided into five main topics: 1) Proposing a set of quantitative criteria to evaluate the performance of an environmental surveillance network. 2) Designing a virtual laboratory that couples a model of the surveillance network with one or more models of the environment. 3) Designing a set of optimization algorithms that can operate at different levels of representation of the simulated network. 4) Measuring the performances of these optimization algorithms in simulations against a handful of scenarios of growth and propagation of BPH. 5) Generalizing the approach for other environmental surveillance networks.

Chapter 1 ‒ Introduction

The two first chapters constitute the general introduction of the thesis. In Chapter 2, we provide more information about NOVEL: How it is built and its theoretical foundations in both modeling and optimization. In Chapter 3, we introduce the design and the implementation of the laboratory, with a special attention to the components needed to reconstruct a realistic virtual environment. Chapter 4 presents three different scales of network optimization: micro-, meso- and macro-scale approaches. For each scale, we detail how it is implemented in NOVEL and how it is bound to be experimented within it. Chapter 5 contains the most important contribution of this thesis. In this chapter, we introduce some objective indicators in order to evaluate the performance of the surveillance network in NOVEL. This contribution is then shown to provide a useful support for all the users whether or not they want to assess a network inside or outside NOVEL. Finally, the general steps required to generalize NOVEL to related research domains are exposed in Chapter 6. We believe that our methodological proposal can be easily generalized and we show how, taking the examples from several domains. The conclusion and the future directions we intend to take are presented in Chapter 7. Figure 6 shows the organization of this thesis with all the chapters and their principal notions and links.

Chapter 1 ‒ Introduction

Chapters Important notions & Links

Brown Plant Hopper (BPH) (Section 1.1) CHAPTER 1. INTRODUCTION Light-trap network (Section 1.2)

Complex multi-scale models (Section 2.1) CHAPTER 2. Virtual laboratory (Section 2.4) OBJECTIVES & STATE OF THE ART Variogram & Estimation techniques (Section 2.7)

Optimization methods (Section 2.8)

Ecological Models (EMs) (Section 3.2) CHAPTER 3. NETWORK OPTIMIZATION Model of the light-trap Network (Section 3.3) VIRTUAL ENVIRONMENT LABORATORY (NOVEL) Disjointed Surveillance Network (DSN) [3.3.1] Correlated & Disk graph-based Surveillance Network (CDSN) [3.3.2]

CHAPTER 4. Micro-scale optimization (Section 4.4) MULTIPLE Meso-scale optimization (Section 4.5) SCALES OPTIMIZATION Macro-scale optimization (Section 4.6)

Root mean square error of variogram model (Section 5.3.1) CHAPTER 5. Deviation of cross-validation (Section 5.3.2) EXPERIMENTAL ASSESSMENTS Mean universal Kriging variance (MUKV) (Section 5.3.3) Aggregated indicators (Section 5.3.4)

Ecological Models (EMs) (Section 6.1.2.1)

CHAPTER 6. Surveillance network model (SNM) (Section 6.1.2.2) METHODOLOGICAL

PROPOSAL Optimizers (Section 6.1.2.3) Evaluators (Section 6.1.2.4)

CHAPTER 7. CONCLUSION AND PERSPECTIVES

Figure 6. Thesis organization.

Chapter 2 ‒ Objectives and State of the art

CHAPTER 2

OBJECTIVES AND STATE OF THE ART

Chapter 2 gives a simple introduction to the contribution of the thesis, a virtual laboratory for network optimization. We try to define the main components of the laboratory, the most important properties of these components and to understand which previous studies can be reused and which ones have to be completely redesigned to handle these properties. Different states of the art are then presented, on agent-based models, Gaussian processes, as well as in network optimization, centered on two widely used approaches: optimal design [22] [28] and metaheuristic solutions [23]–[25].

2.1. Complexity and scales

As mentioned in Chapter 1, building a virtual laboratory is the approach we have chosen to address surveillance network optimization. However, it relies on the possibility to reproduce, in silico, a part of reality that at least includes the interactions between the insects, their environment, and the surveillance network. This kind of integrated model, considered as a whole, belongs to the class of complex multi-scale models [29]. In this approach to modeling, the relationships between the different parts of a system are represented in all their richness so as to keep most of their significant “natural” properties and relationships. The drawback of this approach is that it often requires a large amount of multidisciplinary knowledge. It has however already been applied with success in numerous research domains, for example in ecology by Grimm et al. [30], economy by Anderson et al. [31], or social sciences by JASSS [32], and it is beginning to be considered as an interesting approach in environmental management by Paegelow [33] and by Wainwright [34].

Chapter 2 ‒ Objectives and State of the art

“Multiscale” is a concept used to divide the different levels observed in a complex system [35] [36]. The meaning of these scales strongly depends on the point of view of the observers. When it comes to modeling, however, the classification of these scales helps modelers to better control the complexity of their models, by defining the properties of each scale in isolation and focusing only on the interaction and data transmission between scales. It also allows them to handle situations where both local and global viewpoints on a phenomenon are needed: for instance, BPH migration will depend on local factors (availability of rice fields, interaction of the groups with pesticides, etc.) but will also obey global rules, for example by following the dominant winds. In a multi-scale complex systems approach, both viewpoints can be described, as well as their relationships. Not all modeling paradigms and techniques allow for a multi-scale representation of a complex system. The figure below shows that analytical models (based on a mathematical representation) are more adapted to representing macro-scales, where agent-based models (ABM) appear to be quite versatile but limited in terms of representing local and global dynamics and entities simultaneously.

Figure 7. Appropriate modeling methods for different scales of simulation domains (Course of “Complex systems” of Edith Perrier, IRD/France).

These limitations of ABM have actually never been conceptual, but mainly due to the limitations of the toolkits and platforms (NetLogo [37], Swarm [38], Repast [39]) supporting this kind of modeling. Recent developments in the domain have proved that ABM is now the approach of choice for supporting arbitrary levels of representation when modeling complex systems. A good example of this new trend can for example be found in the evolution of the GAMA agent-based modeling and simulation platform [36] [40] [41]. 2.2. Modeling and simulating the whole system

The claim that we can study how to optimize the surveillance network using a virtual laboratory is based on the availability of reliable modeling and simulation techniques of the complex system formed by the network and the other components of the real

Chapter 2 ‒ Objectives and State of the art system. Choosing the appropriate modeling and simulation techniques and the appropriate way of coupling them is then an important question, which will directly affect the quality of the results we will obtain. We do not try, in this research, to replace existing optimization methods (e.g., optimal design [22] or sampling techniques [42][43]), but to provide the most appropriate tool to apply and validate these methods in a virtual environment. Three main models seem at first necessary to support our needs: a model of the environmental processes that affect BPH propagation (which include climatic aspects, land-use processes, agricultural submodels, etc.); a model of the BPH themselves, which needs to include at least the life-cycle and migration processes of the insects; and a model of the surveillance network itself, so that it could be easily assessed and optimized in various scenarios. To simplify their presentation, we will categorize them into two main subsets of models: ° Models of the ecosystem: The ecosystem here represents all the components of the real system that are not directly related to the network: environment, insects, climatic conditions, human activity, etc. The most difficult part is of course the models dealing with the BPH migrations. Some typical models of time-series prediction can be found in [44], which use traditional algorithms like ARMA (AutoRegressive Moving Average) or ARIMA (AutoRegressive Integrated Moving Average), or by solving ordinary/partial differential equations (models of prey/predator [45], SIR model [46] [47] or growth models [7] [48]). Other approaches deal with the propagation of species in the specific study regions, such as the multifractal model for spatial variation in species richness [49], agent-based models of BPH propagation [50] [51]. In this thesis, the virtual environment contains a coupled model between two main submodels: BPH Growth Model and BPH Migration Model, which will be detailed in Chapter 3. ° Model of the surveillance network: The light-trap network is modeled as a multi-scale agent-based model that operates synchronously with the aforementioned ecological model and acts as a “virtual observer” of this model. The data obtained from its simulation can then be used for two purposes: (1) verifying, against historical data, the validity of the model of the ecosystem and (2) evaluating the performances of different network optimization strategies. It is enriched with internal operations on statistical criteria, especially geostatistical ones [19] [52] [53]. In order to remain as close as possible to realistic observation activities, we will define the observation process of the surveillance network as a true sampling process and use the vocabulary and methods described by Fisher [42] to conduct experiments. The sampling process will then monitor one or multiple observed variable(s) to provide simulated experimental data for end users, and these data will become the input of different analyses: population estimation, variable forecasting [54], etc. Building these different models raises several challenges, like the gathering of relevant data, but also intrinsic difficulties related to their inherent properties. We review several studies on these difficulties in the sections below.

Chapter 2 ‒ Objectives and State of the art

The data gathered from the light-trap network are used in different stages of modeling and simulation in this thesis. This data source forms a set where each light-trap provides a time series data of insect trap-density. In this thesis, we use the notion of “trap-density” to specify the quantity of a specific insect type captured by the light-trap during a sampling cycle. Some other important data sources are also implemented in the laboratory, such as land-use data (especially agricultural land-use), weather data, e.g., temperature and humidity data, obtained from the meteorological stations and aggregated at the regional level (cf. [55][56] for aggregation methods), administrative boundaries data, and the GIS data of rivers and coastal regions of the Mekong Delta. In conclusion, our main research question is to design a decision-support system that allows decision-makers to simultaneously explore different layouts of the light-trap network in order to optimize its performances and to assess “optimal” configurations in different scenarios: BPH patterns, weather, etc. 2.3. Difficulties related to insect population modeling

Since our research is to be validated on Brown Plant Hopper (Nilaparvata lugens (Stål)) invasions, we will essentially review the difficulties raised by this particular species for plant protection activities [6], without really taking into account the biotic heterogeneity [57] that characterizes insect populations in the region. However, the properties of this species listed below, such as randomness or autocorrelation, spatial dispersion or nonergodicity (in time), or its vulnerability to multiple impacts, are not specific to BPH and can be found in other species.

2.3.1. Randomness vs. Autocorrelation

Randomness and autocorrelation are two opposite properties of the natural system we model in this research, which will partly decide which techniques or approaches can be employed to study it. ° Randomness The Oxford English Dictionary defines “random” as “Having no definite aim or purpose; not sent or guided in a particular direction; made, done, occurring, etc., without method or conscious choice; haphazard.” This concept of randomness suggests a non-order or non-coherence in a sequence of symbols or steps, so that there is no intelligible pattern or combination. Regarding light-trap data, the number of insects trapped every night is never totally random, but need to be considered as such. One of the reasons is that the limitation of the surveillance network hides the cause of their appearance at a given location: for example, the network cannot monitor the number of eggs or nymphs because these individuals cannot fly, but if we knew this number, we could probably forecast the number of insects that would be trapped. That is the reason why we need to consider this information as a random variable. And since the number of insects is a random variable, their dynamics can be considered as a random process.

Chapter 2 ‒ Objectives and State of the art

Of course, if the numbers were completely random, the surveillance network would become useless and there would be no point in trying to optimize it. There are patterns like in every natural “random processes”, where several hyperparameters can be used to predict the values at unknown locations. Well known random processes include: • Discrete time processes: Bernoulli processes, Markov chains, etc. • Continuous time processes: Cox, Lévy, Poisson processes, etc. • Mixed discrete & continuous time processes: Gaussian [58], Markov processes [59], etc. Assimilating insect distribution and spreading to a Gaussian process is one of the main assumptions of this research, which leads us to use the Kriging estimation [19] [20] (or Gaussian process regression) in a number of algorithms, such as insect density estimation, or in some strategies of network optimization. As we will see in this work, autocorrelation is present in the trap network, and is at the heart of the possibility to use techniques such as Kriging evaluation since, without any local correlation, it would be impossible to pretend optimizing the network. ° Autocorrelation The correlation between two different quantities is normally used as a measure of their similarity [60]. In the surveillance data, the cross-correlation [61] between two different surveillance locations or the autocorrelation at one location directly affects the performance of the network. Both time series analyses (e.g., ARMA) and spatial analyses (e.g., Kriging) try to study the similarity of different time lags (e.g., ACF) or distance lags (e.g., variogram) for their prediction functions [52].

2.3.2. Spatial dispersion and nonergodicity

While the term of “spatial dispersion” describes the difference and the irregularity of the density of insects in space, “nonergodicity” describes them in time. These two properties partly reflect the complexity of the ecosystem. The spatio-temporal changes of the insect ecosystem strongly depend on multiple natural and human factors, some of them being difficultly controllable (e.g., climate change, land use, pesticide use, etc.). These heterogeneous factors affect the natural distribution of insects, and consequently the stability of the surveillance network. Hence, in some circumstances, the surveillance network can become completely ineffective. For instance, in the Mekong Delta, some provinces have stopped the maintenance of light-trap network because of the changes in land use (like Ca Mau following the extension of aquaculture activities). More explanations regarding these two concepts are shortly introduced as follows: ° Spatial dispersion Figure 8 shows the contour plots of two surfaces where the relief of Surface 1 seems significantly rougher than one of Surface 2.

Chapter 2 ‒ Objectives and State of the art

a) Surface 1 b) Surface 2

Figure 8. Two different dispersion data (Source [62]).

The histogram analyses of these two surfaces are given in Figure 9. We cannot find a big difference between these two histograms, whereas their variograms (variance diagrams) in Figure 10 show a clearer difference.

a) Histogram of surface 1 b) Histogram of surface 2

Figure 9. Histograms of two different dispersion data (Source [62]).

In Figure 10, the left variogram shows a strong increase of variance in the range from 0 to 15 units of lag distance, whereas the right one shows a slower change of variance.

Chapter 2 ‒ Objectives and State of the art

a) Semivariogram of surface 1 b) Semivariogram of surface 2

Figure 10. Variograms of two different dispersion data (Source [62]).

These figures show that empirical variograms can be used to provide relevant insights about spatial dispersion. Evaluating the regression model of variograms is already an important method to assess the performance of a surveillance network. If a network has a low sampling density and does not satisfy the Nyquist–Shannon condition [43] [63] regarding sampling density, it will be impossible to propose a good regression model, without too much errors, based on its variogram. Conversely, the errors in the regression model can become a quantitative indicator to assess the performance of this network. ° Nonergodicity

Figure 11. Standardized residuals (Trap-density data at Dai Thanh, Nga Bay Town, Hau Giang Province, Vietnam).

Figure 11 shows an example of standardized residuals for trap-density data for one year. These time series data have been collected during 365 days in 2010 at Dai Thanh, Nga Bay town, Hau Giang province, Vietnam.

Chapter 2 ‒ Objectives and State of the art

Figure 12. Time series prediction by ARIMA (Trap-density data at Dai Thanh, Nga Bay Town, Hau Giang Province).

Figure 12 shows an ARIMA prediction for the same time series data. The blue line draws the predicted values for the following 40 days. The red and green lines show the standard deviations around these predicted values. Such analyses could be useful in two tasks of this research: (1) in predicting the outcomes of the growth model of insects and (2) in becoming a criterion during the verification of the predictions made by simulation. However, this approach is not completely effective with high values of standardized residuals, at some points in time, as presented in Figure 12, which means that the time series data are nonergodic.

2.3.3. Heterogeneity

Many extrinsic factors can disrupt the natural distribution of insects. These factors can be caused by natural or human processes and are called the “nuisance parameters” of the prediction models. Some of these factors will be considered in this research:

• Natural factors: temperature, humidity, wind (direction and speed). • Biological factors: predators and parasitoids of insect. • Human factors: land use planning, cultivation planning.

Chapter 2 ‒ Objectives and State of the art

a) Adopting 3R3G in An Giang Province from 2002 to 2008. b) Poster for “Three (Data source: An Giang Department of Agriculture, Long Xuyen, Reductions, Three Gains” Vietnam). program [27].

Figure 13. Scaling out communication to rural farmers: lessons from the “Three Reductions, Three Gains” campaign in Vietnam [27].

For example, in Vietnam, a large campaign named “Three Reductions, Three Gains”, launched in 2001, tried to convince the farmers to follow the strategies proposed by agricultural experts in using different rice seeds, less fertilizers and less pesticides [27]. Figure 13 shows the adoption rates of these strategies by farmers in An Giang Province from 2001 to 2008 and one typical poster for this campaign. We clearly see here that the human impacts on the insect population can become extremely important if coordinated behaviors are put into practice. This campaign demonstrated several successes, but the BPH adapted and their invasions, although different in shape and nature, are still very difficult to control. Multiple factors like this one are known to directly affect the heterogeneity, spatial dispersion and spreading of BPH populations. 2.4. How to build the virtual laboratory (NOVEL)?

Determining the necessary components of NOVEL to support various virtual experiments is the central question of this thesis. Even without precisely knowing the range of virtual experiments that need to be performed, we have to choose an appropriate approach, including the techniques, services, or platforms, etc., that can support modeling, coupling and combining separate functional components. Besides this, it is also required that these models, which will be used to simulate reality and generate simulation data, be as realistic as possible. The satisfaction of this requirement ensures that the outcome of virtual experiments can be verified and furthermore, that the solutions proposed, e.g. potential network layouts, can be possibly applied.

Chapter 2 ‒ Objectives and State of the art

2.4.1. Components of NOVEL

In order to ensure this requirement of realism, we need to rely on the context and the data sources available, especially on the structure and operation of the existing light- trap network (cf. Figure 3). In Section 2.2, we suggest that two groups of models need to be developped and coupled: (1) models of the ecosystem and (2) model of the surveillance network. Obviously, the important properties of BPH populations that have been identified in Section 2.3 must also be present in NOVEL. Virtual Ecology, initially proposed by Grimm et al. in 1999 [64] and refined by Zurell et al. in 2010 [65], proposes to model ecological surveillance tasks using a framework composed of four main components: (1) A virtual ecological model, which produces data, (2) A virtual sampling model, which observes the data, (3) Statistical models to build inferences on the data, and (4) Evaluation models to compare produced and real data. Our proposal for components of NOVEL is based on this approach, but in our case, the goal is to “infer” and assess optimal network layouts. Thus, a system of 4 components is proposed in NOVEL: (1) Ecological models, (2) Model of the light-trap network, (3) Optimization processes, and (4) Assessment processes.

Figure 14. Abstract view of NOVEL.

Figure 14 shows an abstract view of NOVEL with all 4 sub-systems and its internal data flows: 1) “Ecological Models”: This component, which mostly contains models and is responsible for the simulations of the BPH invasions in different scenarios, groups all the factors related to the dynamics of BPH, including food sources (rice-cultivated regions), the weather or the obstacles like rivers/seas, growth and migration models.

Chapter 2 ‒ Objectives and State of the art

All these models are spatially coupled: a cellular automaton is used to model the study region as a lattice of rectangle cells; each cell contains the local information regarding the environmental and biological system it belongs to and is used as a common structure for the coupling between the models. This sub-system will be extensively covered in Chapter 3. 2) “Model of the Light-trap Network”: This sub-system contains and manages the model of the surveillance network, i.e. light-trap network. This model is used to monitor some variables supplied by the cellular automaton of the previous sub-system and is directly manipulated (and possibly modified) by the components of “Optimization Processes”. 3) “Optimization Processes”: Multiple optimization algorithms are implemented to explore different layouts of the light-trap network in order to optimize its performances. 4) “Assessment Processes”: Various indicators assess its performances and serve as the criteria to assess “optimal” configurations in different scenarios (e.g. BPH patterns, weather). It is clear, from this figure, that the “Model of the Light-trap Network” sub-system is a central component of the whole system. As a matter of fact, it plays three roles at once: 1) With respect to the ecological and environmental models, it plays a role of “perception component” of the whole system they form, borrowing information from it as a real network would do. This task is represented by the monitors() relationship on the figure (which will directly translate into the behavior of some components of the model). 2) From the point of view of the “Optimization Processes” sub-system, it must provide a flexible interface to support the re-organization of the network modeled, allowing for the addition or deletion of measuring devices or the reorganization of their locations and frequencies. This task is represented by the optimizes() relationship. 3) Finally, it needs to reflect the actual surveillance network, with respect to both its organization and its operations, as faithfully as possible, so that decision makers can easily translate lessons learned in simulations to possible actions in reality.

2.4.2. Which approach can be used?

Choosing the technique(s) that can Definition. Agent-based model (ABM), presented by Gilbert [70]. support representing the various Agent-based modeling (ABM) (...) involves building a computational models built for NOVEL (including model consisting of agents, each of which represents an actor in the their various properties and local social world, and an “environment” in which the agents act. Agents are able to interact with each other and are programmed to be pro- constraints) is not an easy task. In active, autonomous and able to perceive their virtual world. The fact, there are multiple relations at techniques of ABM are derived from artificial intelligence and computer science, but are now being developed independently in different scales that need to be research centers throughout the world. considered at the same time: the spatial estimation [66], the models of the growth process [7], human impacts [27], ecological impacts [67], survey activities by light-traps [11] [63], etc. Moreover, in case of BPHs, some remarkable properties, such as their heterogeneity or dispersion, should be thoroughly noted, in order to conduct virtual experiments at different scales [35].

Chapter 2 ‒ Objectives and State of the art

Moreover, and besides the difficulties of building individual models, integrating and coupling all the necessary models require a conceptual and operational modeling framework that can support different techniques, different scales, and different point of views. Agent-based modeling (ABM) [68]–[70], in that respect, appears as an appropriate technique for this purpose. A review of the recent research on individual-based models, agent-based models and multi-agent systems underlines the fact that ABMs are increasingly used in non- computing related scientific domains including Life Sciences, Ecological Sciences and Social Sciences [69], precisely because of their versatility and capabilities in coupling different techniques, at different scales of space and time. Furthermore, it has been shown that particular optimization techniques, such as Tabu search [71] or Particle Swarm Optimization (PSO) [72]–[74] are very well adapted to the representation used in agent-based models. The core of NOVEL will then be based on an ABM meta-model, implemented on the GAMA platform. GAMA [38] [70] [71] has been chosen because of many interesting characteristics:

• The possibility to easily couple and combine different models, possibly written using other modeling concepts (ODE, PDE, Cellular Automata, Dynamic Systems). • Its inherent support of multiple-scale models, through the implementation of nested models and its explicit definitions of spatial and temporal scales. • Its support of GIS data (read/write from/into distributed spatial RDBMS). • Some advanced features like the management of multiple topologies (graphs, lattices), a tight interaction with the R language [77] and sophisticated 3D visualization. GAMA is being developed since 2007 by a set of partner laboratories, cooperation between France and Vietnam (IRD/UMMISCO1, IFI/MSI2, CTU/DREAM3, UT1/IRIT4, P11/LRI5, Rouen/IDEES6), which has allowed us to benefit from direct interactions with its developers. A PhD student has also contributed to the platform during this research, namely by designing and implementing the interaction with the R environment. 2.5. How to couple the submodels?

The basic “bricks” for building NOVEL are now more or less determined, including the important properties of the BPH ecosystem (cf. Section 2.3), its main components and the appropriate approach, i.e., ABM (cf. Section 2.4). However, we also need a general design that can give answers to the following questions:

1 www.ummisco.ird.fr 2 www.ifi.auf.org/site/content/view/35/46/ 3 http://www.ctu.edu.vn/dream/ 4 http://www.irit.fr/ 5 https://www.lri.fr/ 6 www.umr-idees.fr 21

Chapter 2 ‒ Objectives and State of the art

(1) How to initialize models using the input data, for example, the BPH trap-density data, the data related to different local or global constraints, etc., given that these data have been collected and are represented at very different scales? (2) How to represent the monitors() task of Figure 14? And how to ensure that this task is realistic enough compared to what happens in reality? (3) How to represent the optimizes() task (i.e., the placement of the light-traps) of Figure 14? How can this interaction be automatically performed inside the laboratory? To answer these three questions, we need to clarify in which way the submodels can be coupled with each other although they are situated at different scales. We do not have the intention to propose a detailed design of the laboratory in this section, but to only focus on the general directions we have chosen for its implementation, which will be presented in Chapter 3. The main choice we have made is to use the smallest temporal and spatial scales available in our data in order to synchronize and couple all the models on them. With such a choice, models and their components are able to easily interact with any other object at similar or higher levels.

2.5.1. Spatial coupling

For example, while a light-trap is managed at the administrative scale of a small town/district, it can represent a very small space in the study region, so the discretization of the space appears necessary. In ABM, cellular automata is often considered as the most interesting choice to represent both the spatial discretization and the interactions between the spaces, while allowing these spatial units to interact with agents at higher levels (the light-traps and their zones of influence, such as the rice-cultivated areas or administrative regions, etc.)

The first cellular automaton Definition. Cellular automata, presented by Wolfram [79]. (CA) was first developed in Cellular automata are mathematical idealizations of physical systems in which the 1940s by Stanislaw space and time are discrete, and physical quantities take on a finite set of Ulam, who was discrete values. A cellular automaton consists of a regular uniform lattice (or “array”), usually infinite in extent, with a discrete variable at each site (“cell”). investigating the growth of The state of a cellular automaton is completely specified by the values of the crystals, and by John Von variables at each site. A cellular automaton evolves in discrete time steps, with the value of the variable at one site being affected by the values of variables at Neumann, who was working sites in its “neighborhood” on the previous time step. They have been applied on the problem of self- and reintroduced for a wide variety of purposes, and referred to by a variety of names, including “tessellation automata”, “homogeneous structures”, “cellular replicating systems [78]. structures”, “tessellation structures”, and “iterative arrays”. This technique is now mainly used for representing the dynamics of different kinds of environments [79]–[81]. In its detailed study of the class of CA based models, Wolfram underlined that “even when the underlying rules are very simple, the behavior they produce can be highly complex, and can mimic many features of what we see in nature” [81]. Synchronous and asynchronous updates are the two main methods applied in cellular automata. Asynchronous methods in which the time variable is not explicitly defined, operate by specifying an updating order of the cells [80]. In the synchronous method, the variables at each site are updated simultaneously, based on the values of the variables in their neighborhood at the preceding time step, and according to a definite

Chapter 2 ‒ Objectives and State of the art

set of “local rules” [79]. We apply two cellular automata as the main solution to couple multiple submodels at different spatial scales in NOVEL (cf. Figure 17), all of them are essentially updated by the synchronous method.

2.5.2. Temporal coupling

Our meteorological data of the meteorological stations are produced on a monthly basis, while the trap-density data are produced daily, (See MARD7 ‒ Meteorological Data by Month, Years, and Stations, Ministry of Agriculture and Rural Development). In our model, we logically chose the day as the lowest scale applied for the models. One day will then correspond to one step in the simulation. 2.6. How to model the BPH trap-density data?

The properties of the distribution of BPH in a spatio-temporal zone, namely randomness and autocorrelation, have already been indicated. The question is how to model this distribution in our model(s), and more concretely, how to map it to an environment based on a cellular automata representation. As aforementioned, cellular automata are often used to model spatial zones, their dynamics and interactions. Regarding time, as mentioned in Section 2.3, there are two representations that are probably adaptable to model both the discrete and continuous time processes, respectively Gaussian processes [58] and Markov processes [59]. It should be noted that both discrete and continuous time processes are necessary to be considered in our work, as they correspond to the need to exchange data between the real environment (i.e., in reality) and the discrete environment (i.e., the collected and simulation-produced data). Both models are therefore intentionally used in NOVEL; Gaussian processes are used to support the estimation of BPH trap-densities in a specific study region, while Markov processes support the dynamics of the models, essentially for the BPH growth and migration models. Because Markov processes are rather easy to understand, and because their implementation is covered in Chapter 3, we will just provide in the following sections some bases about Gaussian Processes.

2.6.1. How are the BPH trap-density data modeled?

As already mentioned, we need to manage different data sets with different characteristics, inside our virtual environment: the data collected and the data produced by the simulation itself; both will be named “sampling data” in the following. Sampling data are defined as the data collected from multiple sampling devices (or measuring devices) of a network. Normally, these data are analyzed by statistical techniques to estimate the population of concerned variables. In our research, the sampling data are considered as a realization of a Gaussian process [90] [91]. Three important charasteristics of sampling data are their spatial, temporal, and multivariate traits. A definition of a spatio-temporal random field is given below:

7 http://fsiu.mard.gov.vn/data/khituong.htm 23

Chapter 2 ‒ Objectives and State of the art

Definition. Spatio-temporal random field. A spatio-temporal random field F is a stochastic process denoted by

(1) = (, ) | ∈ , ∈ , ⊂ , ⊂ } where and respectively denote the space area and the time duration, (, ) denotes the value of random variable at a specific continous space-time point (s, t). Concretely, a sampling design is developed by the end users to monitor the status of a random field F at some specific time and space scales. Sampling data are considered as the realization of F, and it must be a subset of F . Cressie presented the statistical spatio-temporal model in [52], which is also our approach in this thesis. Supposing that Z is the sampling data of a random field F, Z is defined as the following: (2) (, ) = , | ∈ , ∈ , ≤ ≤ , ≤ ≤ where ° s and t denote respectively the space and time of sampling, ° n denotes the number of sampling sites, ° m denotes the sampling times. ° denotes the observed value of random variable at a specific , sampling point . , Although the data model of F is not defined yet, where Z(s, t) is a representative of F, F can have the same model in Equation 2 without any limit on m and n. Using this data model, we can explore some intrinsic relations inside F by a number of covariance functions and some approximation techniques applied on Z(s, t).

2.6.2. Gaussian process

With the assumption of a random Definition. Gaussian process, defined by Rasmussen in 2006 [82]. field, the variable concerned at a A Gaussian process is a collection of random variables, any finite specific spatial site s will be an number of which has a joint Gaussian distribution. independent random variable, denoted as Z(s). In addition, since we assume that F is a Gaussian process, not only the variables but also the covariance functions will be Gaussian processes. A Gaussian process is completely specified by its mean function and covariance function. We define the mean function m(x) and the covariance k(x, x’) of a process as: (3) () = [()]

Chapter 2 ‒ Objectives and State of the art

(4) (, ’) = [(() – ()) ((’) – (’))] and hence, a Gaussian process will be written as: (5) () ~ ((), (, ’)) A Gaussian process is a generalization of the Gaussian probability distribution. While a probability distribution describes random variables (scalar or vector), a stochastic process governs the properties of functions. We denote a random function , i.e, the random function f is distributed as a Gaussian process with mean function~ (μ, )µ and covariance function K. In the case of Kriging estimation (cf. Section 2.7.1), also known as Gaussian process regression (GPR) [20] [82], is used as a covariance function. The = (, ) = ( − ) mean function µ can also be equal to 0 (Simple Kriging), or to a constant mean value µ0 (Ordinary Kriging). 2.7. How to estimate the BPH trap-densities of unmeasured locations?

Estimation techniques are classified into two groups: deterministic solutions (polynomial interpolation [84], radial basis function (RBF) [85]) and stochastic solutions (Kriging estimation [20] [19] and Bayesian inference [86]). In this research, we essentially concentrate on a stochastic solution in which the Kriging interpolation is the main approach. Kriging is the principal technique mentioned in geostatistics. It has first been applied in mineral mining related researches, but is now used in a wide range of applications.

2.7.1. Kriging estimation

Geostatistics [19] is a branch of statistics that especially focuses on spatial analysis instead of spatio-temporal analysis [52]. Kriging (named after Krige, and developed by Matheron [19]) is a popular interpolation method to estimate the values of variables observed at unknown locations from some sampling data set. Nowadays, this technique is not used only in geostatistics anymore, but also in different communities such as environmental sciences, hydrogeology, natural resources management, etc. Multiple kinds of Kriging methods exist, depending on the assumptions they make about the environment (or random field). Simple Kriging (SK) has an assumption of a known mean value, while Ordinary Kriging (OK) requires only a constant mean condition (but unknown). More generally, Universal Kriging (UK) considers that the random field has a non-constant mean value. Ripley, in 1981, and Cressie, in 1993, gave a thorough analysis of Kriging methods in spatial statistics [20] [87]. In the different Kriging methods, the establishment of a variogram is considered as a central factor. Thanks to its approximation function (cf. the variogram model mentioned below) mean values and variances of unmeasured locations can be computed. Variograms were introduced by Matheron in 1963 [19]. It should be noted that early papers by Kolmogorov in 1941 (as the structure function [88]) and by Jowett in 1952 (as the mean-squared difference [89]) defined the variogram in one dimension [52].

Chapter 2 ‒ Objectives and State of the art

In spatial statistics, variograms exist in two different varieties: theoretical and empirical ones. ° Theoretical variogram A theoretical variogram 2γ(x,y) is a function describing the degree of spatial dependence of a stochastic process (such as Gaussian process [82]).

(6) (, ) = () − () = (() − () − (() − ()) ) In case of a constant mean μ of the stochastic process, the expression to calculate the variogram is the following:

(7) (, ) = (|() − ()| ) ° Empirical variogram Empirical variograms are defined in geostatistics as the first estimation of the theoretical variogram needed for spatial interpolation by Kriging technique.

In spatial analysis, for observations zi, i = 1, 2, …, k at locations s1, s2, …, sk an empirical variogram, as defined in Cressie (1993), is: (8) () = − |()| (,)∈() where N(h) denotes the set of pairs of observations i, j. Generally an “approximate distance” h is implemented to represent a certain tolerance. In practice, the empirical variograms allow us to characterize the spatial smoothness or roughness of a data set. Variogram-based analysis consists in computing a from the data and building a variogram model fitted to this data. There are some popular variogram models that are often used: Linear model, Exponential model, Spherical model, Gaussian model and Matérn model. Each of them has a specific set of hyper-parameters that need to be tuned (or fitted) before they are applied. As mentioned before, we will use a feature of GAMA that allows to directly from within a simulation the gstat library [90] with a set of appropriate hyper-parameters from NOVEL. In Annex 3, several related code snippets are presented.

2.7.2. Noising and smoothing processes

In practice, when applying the results of a Kriging estimation, we cannot rigidly apply the estimated mean values for all simulation times. Doing that would transform the status of simulation data from random to deterministic, and this clearly does not satisfy the inherent properties of the BPH trap-density data. Fortunately, Kriging methods provide us with an estimated variance, and this variance can be used as an added noise to the estimated mean value to increase its randomness. This technique is called “Gaussian noise”.

Chapter 2 ‒ Objectives and State of the art

Gaussian noises are determined by the mean value and the variance of the Kriging estimation. A random noise based on the Gaussian variance is added to the mean value for each estimated location; the final value is called the Gaussian noise density. Gaussian noises help to generate various scenarios of the environment during the estimation process. In NOVEL, as will be presented in Chapter 3, Gaussian noise density is separately computed for every cell of the environmental cellular automaton. Thus, the differences of Gaussian noise densities between two adjacent cells can be high. These differences do not exist in a real context, which means that we also need to apply some smoothing process. Concretely, this “smoothing density” is computed by the mean value of all Gaussian noise densities of the cells’ neighbors. This is then used in place of the raw Gaussian noise density. 2.8. How to optimize the light-trap density network?

2.8.1. Quantifying the uncertainty of the estimated results

Quantifying the uncertainty of the results obtained in our laboratory can be considered as a measure of the performance of the whole surveillance network based on a specific estimation technique. We will take advantage of this by the way of performance indicators described in Sections 5.3.2 and 5.3.3. It also provides a solid foundation for optimization strategies; in fact, all the optimization algorithms presented in Chapter 4 will be based on this quantification (or similar ones), where the basic idea will be to install new light traps in locations with high uncertainty level; and in the opposite case, to remove existing light-traps from the network if this removal does not strongly affect the accuracy of the whole network. In each particular algorithm, several additional criteria will of course be considered. In the Kriging methods, an often-used measure of the uncertainty is called the Kriging variance. In this thesis, several measures of uncertainty are used based on the Universal Kriging estimation [20] [91], essentially the Universal Kriging variance (UKV). Furthermore, in Section 4.3.2, Universal Kriging Standard Error (UKSE) is mentioned as an alternative measure of UKV, and this approach is based on the so-called Gaussian process entropy (GPE). Gaussian process entropy (GPE) In information theory, entropy is a measure of the uncertainty associated with a random variable [92]. This notion has firstly been introduced by Shannon in 1948 [93]. In this thesis, the measure of the uncertainty is based on the Kriging estimation, and its variance is a form of entropy. Since Kriging estimation is also known as a Gaussian process regression [83], the entropy of Kriging estimation is the entropy of the Gaussian process (or Gaussian process entropy ‒ GPE).

Chapter 2 ‒ Objectives and State of the art

Figure 15. Estimated trap-densities by Figure 16. Universal Kriging variances. using Universal Kriging technique. Figure 15 shows the estimated trap-densities in the study region by using UK, whereas Figure 16 shows the UK variances in the same context. The accuracy of estimation in this case is very low because the UK variance is too high. The farther the estimated locations are from the sampling locations, the greater Kriging variance is. This measure can for example lead greedy optimization approach, where new measuring devices would be installed at locations of high variance.

2.8.2. Typical optimization approaches

Optimizing an environment surveillance network is the primary goal of this thesis. For several reasons mentioned above, we chose an approach based on simulation to solve this problem. The optimization process comprises two steps: (1) optimization itself of the surveillance network and (2) assessment of its performance. Conventional approaches can help in performing the first step. In our opinion, however, the second step plays a more important role and provides more detailed evidences for or against optimal network layouts. These two processes are installed inside two sub-systems called “Optimization Processes” and “Assessment Processes” (cf. Figure 14). Optimizing the surveillance network aims at proposing an optimal layout by adding, deleting or re-planning the location of measuring devices. Cross-validation, in many cases, can be applied for network optimization. As per this technique, each element is picked out of the training set and compared with its estimated/predicted value. Using cross-validation in optimization is suggested by Cressie (1993), namely using the deletion of a site from the network [20]. Thus, if a node has a serious error in cross- validation, it becomes an important node and should not be removed from the network. Otherwise, if a node has a low error, it can be removed from the network. In our work, optimization algorithms will be classified into local search based algorithms (micro-scale), proximity search based ones (meso-scale) and global search based ones (macro-scale). As we will see in Section 4.1 of Chapter 4, this classification depends on the network organization that is defined inside NOVEL.

Chapter 2 ‒ Objectives and State of the art

2.8.2.1. Experimental design and optimal design Experimental design, or design of experiments ‒ DOE, is defined as a design of any information-gathering exercises where some variation is present depending on different experimental strategies [39] [80] [81]. Optimal designs are one class of DOE [22], which aims at proposing a sampling design with respect to several statistical criteria. This thesis examines a small class of experimental designs, the one concerned with environmental sampling, a technique important in many domains of environmental and ecological applications, such as agricultural management [11] [12], fishery surveillance [96], or forest management [40] [83]. Atkinson et al. thoroughly discussed optimum experimental designs based on a covariance matrix of the empirical data [22]. Numerous optimal designs are proposed as macro-scale strategies, such as A-, D- or E-optimal designs (applied on a covariance matrix) [9] [22]. 2.8.2.2. Metaheuristic optimization

Metaheuristics are divided in two Definition. Metaheuristics, presented by Blum and Roli [24]. categories: trajectory-based Metaheuristics are high level concepts for exploring search spaces by metaheuristics that start with a using different strategies. These strategies should be chosen in such a way that a dynamic balance is given between the exploitation of the single initial solution, and accumulated search experience (which is commonly called population-based metaheuristics intensification) and the exploration of the search space (which is commonly called diversification). This balance is necessary on one side where the initial population is to quickly identify regions in the search space with high quality solutions randomly generated or created and on the other side not to waste too much time in regions of the search space which are either already explored or don’t provide high with a greedy algorithm. All quality solutions. metaheuristic algorithms in this thesis are population-based metaheuristics. Popular ones are Tabu search [71], Ant Colony Optimization [98] or the swarm optimization (e.g., Particle Swarm Optimization [72]). Although metaheuristics is a large class of algorithms that can provide solution for various problems, we will only use them to optimize the surveillance network, using what is called “metaheuristic optimization”. A recent survey of this trend can be found in [25]. Evolutionary algorithms (EA) An evolutionary algorithm is a generic population-based metaheuristic optimization algorithm. It uses some mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. It is to be noted that agent-based approaches share similar characteristics with EA, and their combination, known as Agent-based Evolutionary Search (AES), proves to be an interesting approach in applications similar to ours [99]. For example, Yan and Sarker introduced an evolutionary approach into agent-based simulation, calling this “dynamic optimization” [99][100]. This combination is also applied in many other domains, for example to optimize the placement of evacuation signs on road networks in case of emergency [101] [102].

Chapter 2 ‒ Objectives and State of the art

2.9. Conclusion

In conclusion, Chapter 2 presented the main objectives pertaining to this thesis. Based on the analyses of the studies that had been done to improve the performance of the light-trap network in the Mekong Delta, we showed that, besides the “technical restrictions” of using evolved optimization techniques, the “practical restrictions” in assessing optimal layouts constitute the biggest obstacle for decision makers in deciding how to change the network. However, we showed that, instead of applying an assessment based on real experiments, a solution based on simulation could also provide good evidences to evaluate network layouts before applying them in a real context. So, modeling the whole system, including the BPH ecosystem and the surveillance network, in order to deliver realistic simulations of potential network layouts, is the most important goal of this thesis. This is achieved by building a virtual laboratory that can support multiple network optimization algorithms and experimental assessments. An abstract view of this laboratory has been introduced, with three functional sub-systems: (1) Ecological & Environmental Models (EEMs), detailed in Section 3.2, (2) Surveillance Network Model (SNM) detailed in Section 3.3 and (3) Evaluation & Optimization Component (cf. Section 2.4). We showed that this abstract view opened a range of questions, most of them revolving around the problems of building the models, connecting these different components and choosing their mechanisms, etc. Corresponding to each question, possible solutions have been reviewed and a specific choice has been proposed, as follows: ° To build the models constituting NOVEL (cf. Section 2.1), we primarily chose an Agent-based modeling approach. ° To combine different models into a uniform system (cf. Section 2.5), NOVEL will support the lowest spatial and temporal scales of the data available. A CA-based technique is then proposed to combine the models at a spatial level, whereas the day is chosen as the smallest scale of time. ° To model the trap-density data (cf. Section 2.6), Gaussian and Markov processes are used to model the dynamics of this data. These two processes support both discrete and continuous data. ° To estimate the trap densities at unmeasured locations (cf. Section 2.7), a Kriging estimation (or Gaussian Process Regression), is proposed as an appropriate solution to represent the stochastic processes of this data. ° Finally, to optimize the light-trap network and assess its performances (cf. Section 2.8), two typical optimization techniques have been reviewed, and we will propose, in Chapter 4, some improved and novel approaches related to these two ones. The optimal layouts obtained this way will then be assessed inside NOVEL by using the simulated trap-density data.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

CHAPTER 3

NETWORK OPTIMIZATION VIRTUAL ENVIRONMENT LABORATORY (NOVEL)

Building a virtual laboratory, as mentioned in Section 2.4, is our approach to address surveillance network optimization. In this chapter, the implementation of this laboratory using an agent-based approach is mentioned in the context of light-trap networks. Almost all important factors of the real context are modeled, and divided into two groups: (1) Ecological Models (EMs) and (2) Model of the Light-trap Network. In this chapter, we essentially focus on the modeling process, but a verification process is also introduced to evaluate the accuracy of the BPH prediction model (i.e., the integrated model of EMs). Chapter 4 and 5 will respectively explain how to apply this laboratory for surveillance network optimization and experimental assessement of different potential network layouts.

3.1. Motivation

This chapter is designed as a practical answer to some of the questions introduced in Chapter 2. In Figure 14, an abstract view of NOVEL has been presented, where four principal sub-systems are proposed: (1) Ecological Models (EMs) presented in Section 3.2, (2) Model of the Light-trap Network presented in Section 3.3 (3) Optimization Processes presented in Chapter 4 and (4) Assessment Processes presented in Chapter 5. All these components are integrated into NOVEL.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Figure 17. General architecture of NOVEL.

Figure 17 graphically illustrates (1) how we employ cellular automata to model the ecological and environmental dynamics, and (2) how we represent the interaction between the surveillance network and its environment. Each cellular automaton is used to model the spatial environment as a lattice of cell agents. Each cell agent represents a separate rectangular zone in the study region. EEMs rely on two cellular automata. The first one is used to maintain the information on BPH’s stages/life-cycle, e.g., eggs, nymphs, adults, and, on the other hand, some environmental parameters, e.g., temperature, wind speed, land-use. A synchronous update is applied to this automaton. The second one (Heuristic cellular automaton) helps to maintain the information of UKSE (or a combinational heuristic argument as explained in detail in Section 4.3.2) that is primarily used to support the optimization algorithms (presented in detail in Chapter 4). This heuristic cellular automaton is just updated one time at the beginning of the optimization process. GAMA [38] [70] [71] supports a mechanism that allows implementing multiple cellular automata in parallel with other GIS-based agents. Therefore, in the scope of this thesis, all the models of the virtual labobatory are implemented on this platform. 3.2. Ecological Models

3.2.1. Requirements and solutions

This section concentrates on the modeling process ‒ building all necessary models related to the ecosystem of BPHs using agents. The main goal of Ecological Models is to reproduce all the important properties of the real system listed in Section 2.1, which include: (1) Randomness & Autocorrelation, (2) Spatial dispersion & Nonergodicity, and (3) Heterogeneity. Section 3.2.2 presents all the entities needed to develop the models and couple them. Sections 3.2.3 and 3.2.4 respectively introduce two main groups of submodels: (1) Environmental submodels and (2) Biological submodels. All the submodels of these two groups are listed in Table 1. .

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Table 1. Two main groups of submodels: Environmental and Biological submodels.

Environmental submodels Biological submodels Objectives Objectives Modeling the nuisance factors that affect the Modeling and simulating the life-cycle and dynamics of the population of insects migration of insects Input data Input data (1) Rice cultivated regions in two crop seasons: (1) Light-trap sampling data Summer-Autumn & Winter-Spring (2) Simulated wind data: Direction and Velocity (2) Statistical weather data (meteorological stations): Temperature & Humidity (3) Statistical wind data (4) Sea and river regions Simulation step Simulation step One MONTH One DAY Submodels Submodels (1) Temperature model (1) BPH growth model (2) Humidity model (2) BPH migration model (3) Wind model

3.2.2. Entity specifications

In this section, we provide the descriptions of all the agents installed in Ecological Models (see Figure 14). Components related to Model of the Light-trap Network will be mentioned in Chapter 4.

Figure 18. Three state variables: attractiveness, obstruction indices (Environment) and density vector (Biology).

Figure 18 shows the class diagram with all the entities used by Ecological Models in NOVEL. Environmental_Cellular_Agent entity plays the central role of this diagram, it

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

shares a common space with almost all spatial entities, hence the interactions between different models are performed through it. Two environmental indices and one biological index are used for this purpose: ° Attractiveness and obstruction indices (Environment): Defined as the combination of multiple environmental factors (cf. Table 6). They are updated by Environmental submodels and reused by Biological submodels. ° Density vector (Biology): Containing the density of BPHs at all their day-ages (cf. Figure 26). This vector is updated by Biological submodels. Furthermore, this vector is also a bridge to couple the Ecological Models and the Model of the Light-trap Network (the sum of all number of adult BPHs inside the vector is considered as the virtual trap-density captured by a virtual light-trap). Detailed description of all the agents (i.e. entities in the class diagram) are listed as follows: (1) Meteorological agents

Table 2. Natural agents.

No Property name Type Description 1 Wind_Infomation: Containing the general information pertaining to the dominant wind used for the whole study region in a year. Each property of this agent is an array of 12 elements, respectively the data of 12 months of the year. Mean_Wind_Speed Array Mean value of wind speed Min_Wind_Speed Array Minimum wind speed Max_Wind_Speed Array Maximum wind speed Wind_Direction_From Array Wind direction (1rst edge) Wind_Direction_To Array Wind direction (2sd edge) 2 Meteo_Station_Region: Representing a shape of Voronoi diagram [103] partitioned by the meteorological stations in the study region. All related parameters collected by these stations are assigned to this agent. Similarly, the monthly data are maintained by its properties, each of them is formed as an array of 12 elements. ID String ID Name String Name Temperature Temp_Mean Array Mean value of temperature Temp_Max Array Maximum temperature Temp_Min Array Minimum temperature Humidity Hum_Mean Array Mean value of humidity Hum_Min Array Minimum humidity Hum_No_Days Array Number of days with high humidity Geom Geometry A shape of Voronoi diagram partitioned by the meteorological stations in the study region.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

(2) Rice-cultivated region agent

Table 3. Rice cultivated region agents.

No Property name Type Description 1 WS_rice_region (Winter-Spring rice crop): Representing a splitted rice-cultivated region of the Winter-Spring rice crop. Id String ID Description String Type of cultivated region Geom Geometry Polygon 2 SA_rice_region (Summer-Autumn rice crop): Representing a splitted rice-cultivated region of the Summer-Autumn rice crop. Id String ID Description String Type of cultivated region Geom Geometry Polygon

(3) Cellular agents

Table 4. Cellular automata.

No Property name Type Description 1 environmental_cellular_agent (Dominated on the first surface in Figure 17): Being a cell agent of the environmental cellular automaton, which is used to reproduce the virtual environment, especially the BPH invasion. Almost all important models, such as the BPH growth and BPH migration, are built based on the behaviors of this agent. Id String ID Name String Name Geom Float Polygon Two principal indices attractiveness_index Float Attractiveness index obstruction_index Float Obstruction index BPH data density_vector Array Density vector number_of_adult_BPHs Float Number of adult BPHs Transplantation indices (By default: grass, other plants, etc.) rice_cultivated_index Float WS_transplantation_index Float WS transplantation index SA_transplantation_index Float SA transplantation index Variables of the propagation model out_number_of_BPHs Float Out-coming number of BPHs in_number_of_BPHs Float Incoming number of BPHs 2 heuristic_cellular_agent (Dominated on the second surface in Figure 17): Being an cell agent of the heuristic cellular automaton, which is used to support the optimization process. estimation_standard_error Float UKSE combinational_ standard_error Float Combinational standard error isTabu Boolean Attribute applied in the tabu search - one of micro-scale algorithm

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

(4) Administrative region agents

Table 5. Administrative area agents.

No Property name Type Description 1 smalltown_area: Representing a small town area in the study region. In Vietnam, the hierarchy of administrative boundaries comprises five levels: national, regional, provincial, district and small town. id_1 String Region ID region_name String Region name id_2 String Province ID province_name String Province name id_3 String District ID district_name String District name id_4 String Small town ID smalltown_name String Small town name Geom Geometry Polygon 2 district_area: Representing a district area in the study region. id_1 String Region ID region_name String Region name id_2 String Province ID province_name String Province name id_3 String District ID district_name String District name Geom Geometry Polygon 3 provincial_area: Representing a provincial area in the study region. id_1 String Region ID region_name String Region name id_2 String Province ID province_name String Province name rice_age String Rice age Geom Geometry Polygon 4 regional_area: Representing a regional area in the study region. id_1 String Region ID region_name String Region name Geom Geometry Polygon

3.2.3. Environmental submodels

Environmental submodels are built to model all the nuisance factors that affect the dynamics of the population of insects. The BPH invasion, beside its intrinsic heterogeneity, is also affected by multiple natural [104], biological [105] and human factors [27]. Several significant factors will be considered in NOVEL to increase its accuracy. Table 6 shows a list of factors which are considered in our model(s). All these factors are combined into two important and standardized environmental indices: attractiveness index and obstruction index. These two indices are maintained by the Environmental_Cellular_Agent. They are updated by two environmental submodels (except “Wind model”, which updates the regional variables and does not affect the local variables such as the attractiveness and obstruction indices) and reused by two other biological submodels (cf.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Table 1).

Table 6. Affects of local constraint factors into two main environmental indices.

No Name of factors Obstruction index (OI) Attractiveness index (AI) Weather: humidity, [0..1] 1 (1 - AI) temperature. Sea, river and residential 0 2 1 regions Rice transplantation regions Winter-Spring: α; Summer-Autumn: 3 (1 - AI) (by seasonal crops) β; All other: γ (0 ≤ α + β + γ ≤ 1)

In the context of BPH monitoring, the attractiveness index reflects the existence of ideal conditions for the growth of BPHs. On the other hand, the obstruction index, in some situations, will be considered as the opposite of attractiveness, whereas in other situations, this index will be used to control the possibility of placing light-traps. Determining these indices from local factors is almost heuristic. We can change them inside each simulation scenario, where historical data or expert experiences can be used to assign default values. In the environmental cellular automaton, these two indices are computed for every cell; they are used by the growth and migration models and also in optimization algorithms. The specific indices related to each environmental factor are presented as follows: 3.2.3.1. Humidity and Temperature Moldels The weather data are collected by a network of meteorological stations. The data source is indicated in Section 5 of Annex 1. These data are maintained by Meteo_Station_Region agents and then assigned to the corresponding Environmental_Cellular_Agent agents. The weather index is a combined index of all “separate” indices related to weather (called as the weather indices), such as temperature or humidity, which are combined using an index function. The parameters of this function can be determined using the real value of corresponding factors and the “adaptiveness” of the insects with respect to this natural variable (i.e. the interval in which the insects may develop correctly). We propose an exponential function in Equation 9 to calculate a specific weather index for almost all natural factors.

( ) (9) = × where ° denotes the real value of the specific weather factor, ° denotes the mean values of the adaptiveness, ° denotes the calibration parameter, should be calibrated by thematic experts ( by default), α = 2

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

° denotes the distance from the adaptiveMeanValue to the minimum/maximum value of the adaptiveness interval. Equation 9 is applied for almost all weather indices based on available parameters supplied by specific thematic studies. Generally, these studies determine that BPHs could adapt well in a certain range of weather values [minvalue, maxvalue]; based on this adaptive ranges we can easily determine specific weather indices, which are then aggregated into a common index in order to simplify the model. Two weather indices, one for temperature and one for humitity, are detailed below. The interval of each of these weather factors in which BPH can develop correctly has been found in thematic studies, for example Pham, in 2006 [6], who estimated the adaptive range of temperature between 25-30oC, or the adaptive range of humidity between 70-85% proposed by Dyck et al. in 1979 [104].

a) Temperature index with adaptive range of b) Humidity index with adaptive range of temperature from 25 to 30oC (Pham (2006) [6]) humidity from 70 to 85% (Dyck et al. (1979) [104]; Kulshreshtha et al. (1974) [106])

Figure 19. Temperature and Humidity indices.

Figure 19a shows the relation between the temperature and its index by using Equation 9. Similarly, Figure 19b shows the similar relation in case of the humidity measures. Implementation of weather indices in NOVEL Monthly statistical data at each meteorological station are used for NOVEL. At each station, the mean, minimum and maximum values are computed by using the real data from 2001 to 2010. For each simulation run, the simulated data of a specific weather factor are generated for each month by adding Gaussian noises (based on the minimum and maximum values) into the mean values. ° Temperature model Humidity model is built to update the simulated temperature index.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

a) Monthly data of temperature vs. Simulated temperature b) Simulated temperature index

Figure 20. Temperature factor of Bac Lieu meteorological station.

In Figure 20a shows an example of the simulated temperatures (purple line) in a particular simulation run, whereas Figure 20b gives their corresponding simulated temperature indices by using Equation 9. ° Humidity model Similarly, Figure 21 shows an analysis of humidity in Bac Lieu Province. The simulated humidity is also generated randomly for each simlulation run. The simulated humidity index is updated by the Humidity model.

a) Humidity factors of Bac Lieu Province. b) Humidity index of Bac Lieu Province (According to Dyck et al. (1979), the adaptive humidity of BPH is around 70%-85%).

Figure 21. Humidity factor of Bac Lieu meteorological station.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

To simplify the design of NOVEL and its implementation, we aggregate these indices into one, called the weather index. We suggest that all scalar environmental indices, including the temperature, humidity, rainning, sunning as listed in Table 2, should be indexed and aggregated into one and only one index ‒ Combined weather index. The weather index can be determined by the aggregate function as follows: (10) ∑ = where

th ° Ii denotes the environmental index of i natural factor, ° n denotes the number of weather factors considered in models.

Figure 22. Combined weather index from two single indices.

Figure 22 shows an example of the combined weather index for 12 months of the year calculated by Equation 10. 3.2.3.2. Sea, river and residential factors These factors can be modeled as a binary function where the specific index is assigned values of 0 or 1. For example, if a cell is located in the sea, the sea index of this cell is 1, while both the river and residential indices are assigned to 0. If one of these indices is assigned to 1, the obstruction index is also assigned to 1 and then the cell agent is considered as an “infeasible” location for installing a new measuring device. 3.2.3.3. Rice cultivation factor This factor is considered when assessing the effects of rice age on BPH growth and migration models. An exponential function in Equation 9 is still appropriate for this factor where the adaptive range is the time interval when the rice can easily be attacked by BPH. In the Mekong Delta region of Vietnam, there are two rice crops a year: Winter-Spring crop and Summer-Autumn crop. Figure 23 shows the rice-cultivated regions of these

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL) crops in 2005. We see that rice-cultivated area occupies an important part of the delta of Mekong.

a) Winter-Spring rice crop b) Summer-Autumn rice crop

Figure 23. Rice cultivated regions in the Mekong Delta region (Data source: CENRES/CTU).

According to agricultural experts, this cultivation characteristic is the ideal condition for the existence of BPHs and the other harmful insects. Thus, in our models, rice cultivated regions are considered as an important parameter.

3.2.4. Biological submodels

3.2.4.1. BPH growth model There are already multiple stages in insect growth process [7] [15] [48] where models based on Ordinary Differential Equations (ODE) are popular. LPA model for the flour beetle Tribolium castaneum is a typical insect growth model. It was originally defined by Park in 1965 [107] and reused by Dennis in 2001 [48]. This model keeps only three variables of three important stages of beetle's life cycle: larvae, pupae and adult. Pham introduces a full review of BPHs in [6]. This book presents a lot of studies about the BPHs, almost in Asia, including their life cycle, their enemies and also several environmental factors that can affect their activities. Horgan also presented a deeper analysis of the BPH life cycle [15]. These important materials can provide almost all information to determine the parameters for a BPH growth model. Beside the fundamental knowledge, a number of BPH growth models have already been developed. A workflow model written in Stella was developed by Ngo in 2008 [7], Figure 24 gives a chart representing a simulation outcome of this model. It consists of five parameters: egg number at stage 1 (blue line), egg number at stage 2 (green line), nymph at stage 1 (red line), nymph at stage 2 (orange line) and adult number (purple line). Unfortunately, a detailed implementation of this model was not introduced. A number of agent-based models are also presented in [50] [51]; some parameters and expressions of these two models can be reused in our growth model.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Figure 24. Example of a BPH growth model (Source: Ngo (2008)).

As mentioned above, BPH is an incomplete metamorphosis insect with three stages of life cycle: egg, nymph and adult ones. The growth process can be presented as follows [7]:

Figure 25. Different stages of BPH (Source [7]).

In fact, the growth is a behavior of Environmental_Cellular_Agent, also called a cell agent for short, in the later sections. In the growth model, we apply a deterministic model of D variables, where D is the life cycle (measured in days) of the insect. For example, BPH has a life cycle of maximum 32 days, then D is assigned to 32, where the maximum days of egg, nymph and adult stages are respectively assigned to 7, 13 and 12. These D variables will be stored in a density vector V of length D where an element V[i] marks the number of BPHs at age i (i.e., ith day of BPH life cycle). Each cell agent has its own density vector.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Figure 26. Density vector V used for maintaining the BPH densities of each day-age in the whole BPH growth process.

At the initial step of simulation, the values of V must be filled up based on the empirical trap-density data of D days. D times of Kriging estimation are applied and re-computed for this step. The estimated data of V will become the input of both models, the growth and migration ones. For each simulation step, in the growth model, all the elements of V will be updated by the equation below:

[] ∗ ∗ ( − ), = ∈ (11) [] = [ − ] ∗ ∗ ( − ), = [ − ] ∗ ∗ ( − ), = + where [ − ] ∗ ( − ), ° denotes the number of insects at age i, [] ° denotes the ratio of egg number able to become the nymph, ° denotes the ratio of nymph number able to become the adult, ° denotes the ratio of eggs which can be produced by an adult, ° denotes the ratio of natural mortality, ° denotes the egg time span, ° denotes the nymph time span, ° denotes the egg giving time span. 3.2.4.2. BPH migration model A lot of migration models that were developed for animals [105] [106] [110] or insects [50] [111]. Coupled map lattice approximations are introduced in [112]; this model is based on Individual-Based Model to simulate the reaction of individuals via a cellular automaton. Some studies on BPH migration models in a large scale are developed by Otuka [4] [17]. Recently, some new agent-based BPH migration models are also 43

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

developed in a smaller scale, with several studies of Phan in 2010 [50] and Nguyen in 2011 [51]. It is special that these two models have the same purpose with our research, which aims at building an integrated model of the growth and migration ones. Some epidemic models with spatial diffusion of epidemic are also introduced in [113][114]. Beside the BPH growth, migration is the second process which influences BPH density in the real context. The BPHs migrate if the local conditions are not convenient for them [1] [6]. This migration process is known as a behavior of adult long tail BPHs. The migration process is totally modeled inside the environmental cellular automaton (cf. Figure 17). The migration process of BPHs in the study region is modeled using a dynamic “propagation” process in cellular automata.

Figure 27. BPH migration model ‒ A pictorial view of migration process.

Let x(t) denote the number of adult BPHs of a cell at step t of the simulation. The migration model is essentially concerned with the outcome xout(t) which is the number of BPHs moving out of a cell. These (moving out) BPHs will be distributed to the destination cells at step (t+1) of the simulation. The destination cells are determined by a semi-circle under the wind as in Figure 27. The radius of the circle is the product of the wind velocity and the migration time in one day (i.e. in 24 hours). Assuming that there are n destination cells found at the time point t for a source cell i. The ratio of BPHs from i to the destination j is decided by the attractiveness index. The higher the attractiveness is, the higher the income ratio of BPHs is. 3.2.4.3. Coupling the biological submodels We combine the BPH growth model and the BPH migration model in order to create an integrated model, which helps to reproduce the dynamics of BPHs invasion process. In some cases, this integrated model can be used as a prediction model of the BPH invasion. The coupling of these two models is applied via some variables of the cell agents, essentially via the density vector. The new status of these variables by their own actions is updated in each simulation step of both models. The growth model is implemented based on an internal action of cell agent, whereas the migration model is more complex and implemented based on the interactions between a source cell agent and all

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL) destination cell agents. The density vector of BPH stages in Figure 26 is the main variable of these two models. In the BPH integrated model only k elements of k-day adult age, i.e, the k last elements, of the density vector are updated in the migration model, whereas all the elements are recomputed in the growth model. In general, this integrated model is really a Markov chain [59] and the whole cellular automaton now operates as a Markov Gaussian process [82] [83].

3.3. Model of the light-trap network

Model of the light-trap network is built to model and simulate the surveillance network inside NOVEL with three main objectives: ° Mimicking the insect monitoring of light-traps ° Supporting the network optimization processes ° Supporting the network assessment processes. One important hypothesis related to the surveillance network is that each measuring device can monitor in a very small scope of space, approximately equivalent to the surface of a cell in the cellular automaton. This part details the modeling of the Model of the light-trap network. We follow two approaches: (1) Disjointed Surveillance Network (DSN) and (2) Correlated & Disk graph-based Surveillance Network (CDSN).

3.3.1. Disjointed Surveillance Network (DSN)

DSN reflects exactly the nature of light-trap in the real context. Each light-trap is represented by a node agent, located in a cell.

Figure 28. Coupling between Model of the light-trap network and Ecological Models presented in their class diagram. 45

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Figure 28 is an extension of Figure 18 where the Node entity is added into the class diagram. Each node can be located on one and only one cell, and it can “monitor” the density of adult BPHs “living” on this cell. The following table presents the descriptions for all properties of Node agent:

Table 7. Specification of “Node” agent.

Property name Type Description Id String Node identifier Name String Node name empirical_trap_density Vector Time series vector contains the empirical data simulation_trap_density Vector Time series vector contains the simulation data cellular_agent_pointer environmental_cellular Pointer to the environmental cellular automaton _agent agent located by measuring device heuristic_agent_pointer heuristic_cellular_age Pointer to the heuristic cellular agent located by nt measuring device Geom Geometry(POINT) The location of measuring device in the study region The cellular_agent_pointer is used to indicate the cell in which Node agent is located. The number_of_adult_BPHs at the locating cell can be “trapped” by the node agent and assigned to the simulation_trap_density property of Node agent.

3.3.2. Correlated & Disk graph-based Surveillance Network (CDSN)

This approach considers that light-traps are connected to each other as per geographical distance as well as the correlation of measured data. It is based on a mathematical model named Correlation-based Surveillance Network Diagram (CSND) which itself is an application of the unit disk graph technique (detailed in the following section). Compared to the DSN approach, this approach offers two following advantages: ° Firstly, CSND can provide a new “vision” of the distribution of BPHs, as well as, the internal interaction of light-trap system. ° Secondly, CSND can support the process of optimizing the light-trap network at the macro-scale level (detailed in Section 4.6). 3.3.2.1. Unit disk graph technique Unit disk graph was firstly defined by Clark in 1990 to model the broadcast networks and resolve some problems in computational geometry [115].

The UDG technique was used Definition. Unit Disk Graph, defined by Clark [115]. widely in ad-hoc communication Consider a set of n equal-sized circles in the plane. The intersection [115]. In the broadcast network, it graph of these circles is a n-vertex graph; each vertex corresponds to a circle, and an edge appears between two vertices when the is simple to define the disk radius corresponding circles intersect (tangent circles are assumed to where the broadcast scope can be intersect). Such intersection graphs are called unit disk graph. used. Almost all the studies are in optimum frequency utilization [115][116]. UDG can enable the development of efficient algorithms in case of crucial networking problems. A number of classical NP-hard

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

optimization problems are simply solved with unit disk graphs [117][118], such as maximum independent set, minimum vertex cover, minimum coloring and minimum dominating set. Beside the broadcast network, UDG is also applied for some special purposes. A recent research proposed in [119] in which the UDG is used as a bridge between an Individual- Based Model (IBM) and an Equation-based model (EBM). This approach is rather similar to the clustering technique [120], often used in data mining community. Applying the UDG for a surveillance network is not always easy, which is caused by the characteristics of each network. We classify the surveillance networks into three groups: (1) communication transmitters, (2) radar devices, and (3) sampling devices (e.g., light- trap). This classification is illustrated in Figure 29. In fact, UDG is widely used in the first case while rarely used in two others. CSDN is a contribution for applying the UDG in the surveillance network.

a) Receiver/Transmitter b) Radar device c) Sampling device

Figure 29. Different types of surveillance device.

° Receivers/Transmitters (Active devices): Both the transmitter and the receiver operate actively when communicating. The transmitter has its own scope of transmission, and the same case for the receiver. ° Radar devices (Semi-active devices): Only the observer needs to get the information about the observed object. In that case, the role of observed object is strictly passive and the reflection of the observed object is totally based on the physical waves such as the sound waves, micro waves, etc. ° Sampling devices (Passive devices): Light-traps, the main research object of this thesis, are typical sampling devices. In a sampling network, the role of sampling device is totally passive; an organism (e.g., BPH), which can move to the device, plays an active role. A sampling device only operates as a measuring device, or concretely, a “counter”. In case of light-trap network, the attractiveness scope of a light-trap device (several hundred meters) is significantly less than the distance between two devices (dozen of kilometers). The light-trap network is really a sampling network, applying the UDG requires another view of the “coverage area” of a device. The disk radius now does not

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

depend on the capabilities of the device, but some external parameters of the environment. For example, in a hydrological sensing network, the water flow velocity may be used to calculate the disk radius. In this approach, the UDG is considered as a “clustering graph” [120], which allows extending flexible properties of connectivity between different devices. 3.3.2.2. Correlation analysis Determining the dependence between measuring devices is very important for our research. This factor probably affects the node-placement planning for the surveillance network [55] [78]. In statistics, dependence refers to any statistical relationship between two random variables or two datasets [61][122]. Correlation refers to any class of statistical relationships related to dependence. In practice, correlations are often applied in a wide range of applications because they can support some predictive analyses [122]. The most common measurement of dependence between two quantities is based on the Pearson product-moment correlation coefficient, or “Pearson's correlation” [122]. Equation 12 is used to compute this coefficient.

The correlation coefficient ρXY between two random variables X and Y is defined as:

(, ) [( − )( − )] (12) = (, ) = = where µX and µY respectively denote the expected values of X and Y,

σX and σY respectively denote the standard deviations X and Y, E[∙] denotes the expectation operator, cov(∙,∙) denotes the covariance function, and corr(∙,∙) is widely named for Pearson’s correlation function. The selection and use of Pearson's correlation for a specific problem is reviewed by Rodgers in [122]. In addition, there are other correlation classes namely the Spearman’s and Kendall’s rank correlation coefficients [123]. 3.3.2.3. Definition of CDSN We introduce a mathematical diagram for CDSN, i.e., Correlation-based Surveillance Network Diagram (CSND). In some situations, the term CDSN and its mathematical diagram, the CSND, may be used interchangeably. Definition. Correlation-based Surveillance Network Diagram (CSND). A CSND is represented as a directed and weighted graph G = (V, E) built from a set of vertices V = {v1, v2, …, vn}, where each vertex is related to one measuring device in the network; and the set of edges E = {e1, e2, …, em} is determined by the relationship between any two different vertices via two connected functions: D(vi, vj) and C(vi, vj). Function D(vi, vj) is defined as the function of path distance and C(vi, vj) ‒ correlation function.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

An edge (vi, vj) belongs to E if and only if (vi, vj) satisfies both conditions of the connection as follows:

• Condition of path distance: Path distance from vi to vj must be less than or equal to a threshold of spatial distance . We have: (13) ≤ (, ) ≤ where denotes the threshold of spatial distance (or disk radius) between two vertices and D(vi, vj) denotes the path distance between two different vertices vi and vj. The path distance is different from the displacement between two locations. This distance depends on the possible path between two measuring devices di and dj in the real context. The path distance, or the distance traveled along a path, differs from the displacement. In [124], the distance is defined as a scalar quantity that refers to “how much ground an object has covered” during its motion. Therefore, it should be noted that path distance is not required to be a straight line in the diagram. Path distance is straight if there is no obstacle between them. Figure 30 shows an example of a path distance in case an obstacle exists. The path totally depends on the trajectory of the movement of the observed object. For example, in case of the BPH migration, the path distance coincides with the displacement. In case of the water flow, the path distance depends on the network of rivers, canals or ground heights, etc. The trajectory of the movement can be used to determine the direction of a directed path and the direction of a directed edge in CSND.

Figure 30. Displacement vs. Path distance.

• Condition of autocorrelation: Correlation coefficient of two vertices vi and vj must be greater or equal to a threshold of correlation . (14) (, ) ≥ where denotes the threshold of sample correlation between two vertices.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

3.3.2.4. Why is the CDSN necessary? Tobler defined the first law of geography as “Everything is related to everything else, but near things are more related than distant things” [125]. This law is partly mentioned in many geographical issues. When analyzing an environmental surveillance network, the network topologies and the intrinsic environmental relations are often mentioned in different research questions. However, these two concerns are often observed independently, so some significant properties of the environment cannot be found. The Correlated & Diskgraph-based Surveillance Network (CDSN) is proposed to overcome two big disadvantages of a normal surveillance process: (1) the independence between the network topology and the intrinsic environmental relations and (2) the discontinuity between the data collection process and the data analysis process. This approach is based on two important techniques applied in sensor network (Unit Disk Graph) and data mining (clustering by correlation), which helps to establish the internal relations between different measuring devices. Some basic properties of CDSN are introduced in this section. According to these suggestions, a heuristic optimization algorithm is intentionally proposed by adding k measuring devices into the existing network of n devices. This algorithm is mentioned as a macro-scale optimization technique. 3.3.2.5. Implementation of CSND To deploy the CSND in NOVEL, we need to add two new agents: Edge and Graph. The specification for these two agents are as below:

Table 8. Specification of “Edge” and “Graph” agents.

No Property name Type Description 1 Edge: Representing an edge in the CSND. Id String Edge identifier Name String Edge name Source_Node Node Source node Destination_Node Node Destination node correlationWeight Float Correlation coefficient between the source and destination nodes 2 Graph: Representing a graph in the CSND. working_status Boolean Working status (Working: TRUE) node_list List List of Node agents (Not really implemented as a property of Graph agent, this property is implemented as a global variable) edge_list List List of Edge agents (Not really implemented as a property of Graph agent, this property is implemented as a global variable)

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

The implementation of CSND follows the below algorithm: Algorithm 1: Implementation of CSND Input: Coordinates of sampling locations & Sampling data Ouptut: - Array of vertices, Array of edges - Distance matrix, Correlation matrix Complexity: O(n2) with n is the number of sampling locations. BEGIN Step 1 V = loadData(); E = NULL; /*V: Array of vertices; E: Array of edges*/ distanceMatrix = matrix(0, n, n); correlationMatrix = matrix(0, n, n); /* distanceMatrix: Distance matrix of size n x n; correlationMatrix: Correlation matrix of size n x n;*/ Step 2 FOR i FROM 1 TO n DO/*Building the CSND*/ Step 3 FOR j FROM 1 TO n Step 4 IF(directedDistance(vi, vj)≤ΘspatialDistance) THEN Step 5 distanceMatrix [i, j] = 1; IF(correlationCoefficient(vi, vj)≥Θcorrelation) THEN correlationMatrix [i, j] = 1; E.addEdge(new Edge(vi, vj)); END IF ELSE IF(correlationCoefficient(vi, vj)≥Θcorrelation) THEN correlationMatrix[i, j] = 1; END IF END FOR END FOR RETURN V, E, distanceMatrix, correlationMatrix; END 3.3.2.6. Directed connection based on the wind direction In this section, we propose a directed CSND for the light-trap network, where the direction and the disk radius represent the migration of BPHs. Therefore, as we have presented in the BPH migration model in Section 3.2.4.2, the direction of edge will also depend on the wind direction. The sampling cycle is one day, which corresponds to the real cycle of a light-trap. Now, we observe the maximum distance over which the BPHs can migrate in each cycle. Supposing that v is the wind velocity and t is the migration period of BPHs in each simulation step, we have rmax = v × t. The disk radius θe of UDG must be less than rmax. We apply the case of model using the circle disk. For the light-trap network, t is equal to 4 hours, which is related to the time period from 18h to 22h at night when the lights are turned on (which also corresponds to the migration period of BPHs).

Chapter 3 ‒ Network Optimtimization Virtual Environment Laboratatory (NOVEL)

Figure 31. Connection betwtween two nodes. Figure 32. Inputs/o/outputs of node. The connection between twoo light-traps depends on the velocity and thet direction of wind. Figure 31 explains thishis idea. Supposing that we have two lightht-traps: N1 with coordinates (x1, y1) and N2 witith coordinates (x2, y2). N2 is located in the semicircles under the wind having rmax = v × t.

In this figure, α is called the aangle between the wind direction and the x axis. N1 is the source point, from which thehe BPHs can dispatch. D1 (x'1, y'1) is the farthestfa point to which the BPH can reach iff ththey follow the wind direction in one day. In this case, the distance from N1 to D1 is r. We need to find the equation that refleclects the relation between N1 and N2 based onn α and D1.

The line that connects N1 annd D1 is expressed by the equation . Thenn, based on the equations we can calculate ththe values of x'1 and y'1: (15) ′ = + × (16) ′ = – + × Also, we need to determinee whether N2 meets the condition to be found fo within the semicircle where N1 is the cenenter found under the wind. We call and two directed vectors. = We need= (′ too− determine , ′ − angle) β formed = between = ( −and d , . −If ) , ≤ /2 then N2 meets the condition tto be found within the semicircle in which N1 is the center. In this case, a new directed ededge from N1 to N2 will be added to the CDSNSN. The value of can be determrmined by the expression:

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

(17) ∙ = || ∙ || where is the scalar product of two vectors and , and are respectively ∙ the length of and by Euclidean distance. || || So, the condition of spatial distance to establish a connection from N1 to N2 are: (18) || ≤ ∗ (19) where t denotes the amount of| time| ≤ the/ BPHs can dispatch in one day, i.e t = 4 hours/day and v denotes the wind velocity. 3.3.2.7. Examples of CSND The CSND provides the ability to observe the existence of abundances of BPHs in a specific time period. This existence can be observed via the intrinsic connectivity of nodes, in detail, by the sub-graphs. We can consider that a sub-graph exists in the CDSN that represents a set of surveillance devices located in an insect abundance area in a specific time period. In case of long time period, this abundance area can be known as a spatial region that has higher level of insect distribution. By changing the spatial, temporal scales or correlation scales of the CDSN, we can observe the graph from different side, and the characteristics of the environment are also perceived via CDSN. In other words, CDSN is considered as a descriptive graph for a specific environment and can be used as a “prism” to observe the environment.

a) CDSN with θr of 10 km and θcorrelation of 0.2 b) CDSN with θr of 10 km and θcorrelation of 0.3 Figure 33. Correlation & Disk graph-based Surveillance Network ‒ Macro-scale network organization.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Figure 33a and Figure 33b illustrate two cases of CSND with θcorrelation of 0.2 and 0.3 respectively. The disk radius θr in both cases is set at 10 km. Figure 33a has more connections than Figure 33b due to the correlation condition is easier. 3.4. Experiments of NOVEL

3.4.1. NOVEL in GAMA

Figure 34 shows several screen shots of NOVEL. Figure 34a presents the distribution of the BPH in the winter-spring rice crop in three provinces: Hau Giang, Soc Trang and Bac Lieu. The sea/river regions are also considered in the models. Figure 34b presents a 3D visualization of the BPH distribution. On this figure, the local constraints are combined into the obstruction index; the height of the surface represents the value of this index calculated for all the cells of the environmental cellular automaton. We can see that the BPH density decreases significantly at the areas of high obstruction index.

Without rice Sea River a) BPH trap-density and Rice cultivated regions b) BPH trap-density against obstruction index

Figure 34. Visualization of BPH prediction model in GAMA platform.

This integrated model is presented in Section 3.2.4 as a combination of two previously presented models: BPH growth model (Section 3.2.4.1) and BPH migration model (Section 3.2.4.2). As presented in Section 3.2.4.3, the growth model is implemented based on an intrinsic process inside each environmental cellular automaton agent, whereas the migration model is based on the interactions between different cells in a global cellular automaton.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

a) Empirical data b) Estimated data

Figure 35. Empirical data vs. Estimated data.

Figure 35 shows the empirical and estimated data of 48 light-traps, respectively. In th these two figures, the Ti denotes the i trap in the list of 48 light-traps where the data of each light-trap are represented by one separate line with a particular symbol and color. The number of cells in each cellular automaton is determined by its numbers of rows and columns. These two values are parameterized in NOVEL, so they can be easily configured by the modelers. Figure 36 shows two examples of the virtual environment with different dimension of the cellular automata. Figure 36a shows the cellular automaton of 60x60 cells, whereas Figure 36b shows the cellular automaton of 120x120 cells. The area of one cell in Figure 36a is exactly four times bigger than one in Figure 36b. Thus, the dispersion of the virtual environment are also different, which can directly affect the quality of the prediction models. In theory, if the higher the dimension, the more accuracy the prediction model will be. In comparing between these two figures, it is clearly that Figure 36b models the dispersion of the environment better. In addition, when coupling with the SNM, each node agent will “capture” the BPHs in a smaller space. Hence, the simulation data of trap-density is more precise in the sense of the spatial scale. However, a high resolution of the environment significantly increases the computational complexity, especially in the BPH migration model.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

a) Cellular automaton with 60×60 cells b) Cellular automaton with 120×120 cells

Figure 36. Dispersion of environment against the dimension of cellular automata.

3.4.2. Verification of the BPH prediction model

Verifying the model is a necessary step to assure the accurate operation of NOVEL. As aforementioned in Section 3.2.1, the main goal of the EEMs is to reproduce all important properties of the real system, which comprise the randomness & autocorrelation, the spatial dispersion & nonergodicity and the heterogeneity. However, it is really difficult to separately verify each property because of its abstract nature. Therefore, we apply the black-box testing approach by verifying the outcomes of trap- density data collected from the BPH prediction model by the SNM. Our BPH prediction model is also used as one of case studies of another PhD student ‒ Truong Minh Thai (UT1/IRIT, France). His work aims at building a framework, namely by the Combination Framework of Business intelligence solution and Multi-agent platform (CFBM) [126], that helps to verify various agent-based models based on GAMA platform. Therefore, some further works of this section will be done in a recent future. 3.4.2.1. Scenarios of verification This section presents three different results of prediction validation corresponding to three specific scenarios of BPH cycles (parameters of BPH growth model) and wind factors (parameters of BPH migration model). The prediction estimated by simulation is applied for a short period of one month. In all three scenarios, the wind direction is assigned as per North-East direction and the velocity is assigned at 9.4±3.5km/h. Table 9 list all the parameters of the BPH prediction model as well as their corresponding values. These parameters are essentially used by the BPH growth model and explained in Equation 11.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Table 9. Three scenarios of BPH characteristics. Scenario 2 Scenario 3 Scenario 1 Parameters Description (Integrated (Integrated (Growth model) model) model) Egg giving time span 6 6 12 T1 T2 Egg time span 7 7 7

Nymph time span 13 13 13 T3 T Life cycle 32 32 32 Ratio of egg number able r 0.42 0.42 0.40 en to become the nymph Ratio of nymph number r 0.42 0.42 0.40 na able to become the adult Ratio of eggs which can be r 250.0 250.0 350.0 b produced by an adult

M Ratio of natural mortality 0.245 0.245 0.35 3.4.2.2. Verification process This section presents the operation of NOVEL through different scenarios of the BPH prediction model. We assess the difference between the empirical data and simulation data obtained after running prediction model. We use the RMSE (Root mean square error) index [127] as the indicator of the verification process. ° Empirical data We use the data from 48 light-traps of three typical provinces in the Mekong Delta region: Soc Trang, Hau Giang and Bac Lieu. Figure 37 shows the real trap-density data from January 01, 2010. Each line of the figure presents the data of one light- trap. We use the data of the first 32 days (D = 32) to apply D times of Kriging estimation for all the cells of the cellular automaton. The estimated data from 1rst to Dth days, or simulation steps, are used to compute the values of density vector V[1..D] (cf. Figure 26). As mentioned in Section 4.2.4, this density vector is used to predict the BPH trap-densities in the future, i.e., from (D + 1)th simulation step.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Figure 37. Empirical data from January 01, 2010 of 48 light-traps in three provinces: Soc Trang, Hau Giang and Bac Lieu.

° Simulation data Figure 38 shows the simulation data that were used in the same context in Figure 37. The first 32 days are the estimated trap-densities computed using Kriging estimation. Days from 33th to 64th are are trap-density data generated by the simulation. Figure 38a shows the simulation data of generated by the growth model, i.e., scenario 1, while Figure 38b gives the simulation of generated by the integrated model, i.e., scenario 2. As shown in Figure 38a, the form of predicted data is significantly smooth and looks like a sinusoidal one. The same sinusoidal wave form is found in [7] (cf. Figure 24). However, the changes of empirical data are often sudden, like a peak of insect amount. In both scenarios 2 and 3, we apply the integrated model. In Figure 38b, the form of BPH trap-densities is more realistic where the form of data is not like a sinusoidal wave yet.

a) Growth model b) Integrated model of growth & migration models

Figure 38. Prediction data in two different models.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Table 10 shows the RMSE [127] of simulation data in three scenarios listed in Table 9. The RMSE values are independently calculated for two stages of simulation data:

Table 10. RMSE of simulation data in three scenarios.

Scenario 1 Scenario 2 Scenario 3 RMSE of simulation data (Growth model) (Integrated model) (Integrated model)

Estimated data vs. Empirical data 614.3536 657.2320 323.9902 (From 1rst to 32th simulation step)

Predicted data vs. Empirical data 535.5026 524.7963 443.3628 (From 33rst to 64th simulation step)

In Table 10, we observe two cases of scenarios 1 and 2. In these two scenarios, the same values of parameters used for the BPH growth model (cf. Table 9) are used but the simulation models are different. The BPH growth model is applied for scenario 1 while the integrated model is applied for scenario 2. The accuracy of the predicted data (i.e., the second row in Table 10) generated by the integrated model in scenario 2 is better than ones generated by the growth model in scenario 1 (even if the accuracy of the estimated data (i.e., the first row in Table 10) in scenario 2 is worse than one in scenario 1). This result means that the integrated model operates a little better than the growth model when seperately operating. The RMSE values in the scenario 3 is lowest, and this scenario can be used for the further experiments, essentially on assessing the potential layouts as will be presented in Chapter 5. 3.5. Conclusion

We have introduced the detailed components of NOVEL, which is considered as an integrated model of the EEMs and SNM in agent-based approach. Chapter 4 will introduce three groups of network optimization implemented in this laboratory, classified as the micro-, meso- and macro-scale approaches. In more details, a general view of NOVEL is illustrated as the figure below.

Chapter 3 ‒ Network Optimization Virtual Environment Laboratory (NOVEL)

Figure 39. General view of NOVEL.

Figure 39 is used to explain the interactions between different models inside NOVEL. Principal agents of two groups of models, Ecological Models and Model of the Light-trap Network, are listed and their scopes of action are generally indicated. Regarding to the Ecological Models, beside the BPH migration and growth models, other environmental models are also mentioned, such as Temperature Model, Humidity Model and Wind Model. As mentioned above, the density vector is the main variable of each cell agent that contains the densities of BPH at different stages of their life cycle (cf. Figure 26); it is essentially updated by BPH migration and growth models. The light-traps, or Node agents, are located on several cell agents in the study region, their network can be modeled as Disjointed Surveillance Network Model (DSN) or a more complex one ‒ Correlated & Disk graph-based Surveillance Network (CDSN). An important characteristic of NOVEL is that all sub models can be easily replaced. For example, we can easily plug other BPH migration models, e.g., Nguyen [111] or Phan [50], in order to replace our current migration model.

Chapter 4 ‒ Multiple scales optimization

CHAPTER 4

MULTIPLE SCALES OPTIMIZATION

This chapter introduces three groups of network optimization algorithms. NOVEL defines several optimization algorithms that can be applied to the simulated network in order to improve its performance. These algorithms can modify the network by adding, removing or moving some of its nodes. Depending on different heuristic functions which use information at different scales, these algorithms are classified into three scales: micro-, meso- and macro-scales. Two first groups are known as the metaheuristic algorithms, while the third group presents a novel algorithm based on the graph and correlation theories. Although all the algorithms can return their own “optimal” layouts, as aforementioned, it is difficult to determine their performances in the real context. Therefore, in this thesis we call these “optimal” layouts potential ones. Chapter 5 will show how to undertake the experimental assessment based on simulation. These optimization algorithms use an objective function which is based on the Gaussian process entropy.

4.1. Introduction

Supposing that we have n current measuring devices in a surveillance network. This network is really sparse and gives a poor quality of population estimation. The question is how to add k new measuring devices into this network to achieve the best performance. In this manuscript, the additional optimization is chosen as the main problem to be monitored by NOVEL. Also, all implemented algorithms and related experiments will be concerned with this additional optimization problem. In agent-based modeling, the interaction between agents is the most important concern that decides which emergence pattern will emerge. In this thesis, each potential layout of the light-trap network can be considered as a pattern, which is thoroughly assessed via multiple simulation runs based on one or several performance indicators (will be presented in Chapter 5). The self-organization concept is widely mentioned in agent-

Chapter 4 ‒ Multiple scales optimization

based modeling and simulation techniques. The principles of the self-organizing system were presented by Ashby in 1962 [128]. Some discussions between self-organization and agent-based approach were found by Ferber in 2003 [129] and Drogoul in 2008 [68]. Optimization algorithms are based on different heuristic functions which use information at different scales: ° Micro-scale: Only the local information to each node (its cell and its neighboring environment) is considered. A Node agent does not have any interaction with the other device agents. When deciding to move (in the environment), the device agent considers its own perceptions of the environment. We propose two implementations for the micro-scale optimization technique based on two local search techniques: Hill Climbing search [130] and Tabu search [66] [125].

Figure 40. Micro-scale network organization.

° Meso-scale: At this scale, the relationships or interactions between nodes are considered as well (for example, to “balance” their respective positions). All characteristics at micro-scale are retained in the meso-scale. The limited interactions are determined based on the relationships between the measuring devices in the real context. In [36], Vo et al. proposed three types of scales: spatial, temporal, and decisional scales. For the spatial scale, the interactions between different device agents can be determined by the distance between two different devices, e.g., disk graph, only nearby devices are able to interact and share their own information to each other. In case of decisional scale, the interactions can be established by their administrative owner. For example, all the devices managed by one province can interact between themselves. Some examples of meso-scale are presented in Figure 41. On Section 4.5, we will introduce two meso-scale optimization algorithms where PSO [72] [132] is a typical metaheuristic representative.

Chapter 4 ‒ Multiple scales optimization

a) Decisional meso-scale model b) Spatial meso-scale model (Constraints of administrative (Disk graph-based constraints) management)

Figure 41. Meso-scale network organization.

° Macro-scale: All characteristics at meso-scale are retained in the macro-scale. At this scale, global properties of the network (for example, a minimal connectivity of the graph) are added and there are no constraints on the interactions between two different measuring device agents. The use of CSND (cf. Section 3.3.2.3) is one representative of this scale, this approach is presented in Section 4.6. In this graph- based model, the interactions between the measuring device agents are more diverse and represented by several evidences: disk graph radius, correlation coefficient, degree of nodes, sub-graphs, etc. 4.2. Data description

In this chapter, the case study of Section 3.5.1 is reused. We consider 48 light-traps of three provinces in the Mekong delta region as the nodes of a surveillance network network. The placement of these light-traps in the study region is illustrated in Figure 42.

Chapter 4 ‒ Multiple scales optimization

Figure 42. Placement of light-traps in three provinces: Hau Giang, Soc Trang and Bac Lieu, Mekong Delta, Vietnam.

All optimization algorithms try to add more k light-traps into this network using their own strategies. The BPH trap-densities data in 2010 of all these light-traps are used to generate the UKSEs for all the agents of the heuristic cellular_automaton (firstly mentioned in Figure 17 and presented in detail in Figure 43), which supports the optimization processes. Table 11 shows the detailed information of all these light-traps: light-trap name, district name, especially geographical coordinates.

Table 11. Light-traps in three provinces: Hau Giang, Soc Trang and Bac Lieu in the Mekong Delta region, Vietnam (Data source: Departments of Plant Protection).

ID Light-trap name District name Province name Latitude Longitude 1000 Dong Phuoc Chau Thanh Hau Giang 588,042.0 1,091,566.0 1001 Nga Sau town Chau Thanh Hau Giang 588,700.0 1,097,972.0 1002 Dong Thanh Chau Thanh Hau Giang 581,456.0 1,100,798.0 1006 Tan Hoa Chau Thanh A Hau Giang 567,748.0 1,094,000.0 1007 Truong Long A Chau Thanh A Hau Giang 563,996.0 1,103,511.0 1008 Tan Phu Thanh Chau Thanh A Hau Giang 578,154.0 1,100,258.0 1009 Nhon Nghia A Chau Thanh A Hau Giang 570,485.0 1,095,600.0 1011 Phuong Binh Phung Hiep Hau Giang 567,996.0 1,076,124.0 1012 Binh Thanh Phung Hiep Hau Giang 575,925.0 1,087,283.0 1014 VT Tay Vi Thuy Hau Giang 553,946.0 1,073,125.0 1015 Vi Trung Vi Thuy Hau Giang 560,609.0 1,081,144.0 1022 Phuong 5 Vi Thanh town Hau Giang 553,394.0 1,079,905.0 1028 Dai Thanh Nga Bay town Hau Giang 589,383.0 1,088,204.0 1029 Hiep Loi Nga Bay town Hau Giang 586,823.0 1,088,570.0 1031 Vinh Vien Long My Hau Giang 553,232.0 1,066,126.0 1032 Thuan Hung Long My Hau Giang 560,271.0 1,066,811.0 1033 Long Binh Long My Hau Giang 563,929.0 1,076,660.0 1036 Phong Tan Gia Rai Bac Lieu 549,285.4 1,028,493.9 1037 Hung Phu Phuoc Long Bac Lieu 559,376.5 1,038,970.9 1038 Gia Rai town Gia Rai Bac Lieu 550,663.2 1,021,204.0

Chapter 4 ‒ Multiple scales optimization

1039 Ninh Hoa Hong Dan Bac Lieu 552,401.1 1,054,250.6 1040 Vinh Thanh Phuoc Long Bac Lieu 554,130.0 1,034,740.2 1041 Ninh Quoi A Hong Dan Bac Lieu 556,992.6 1,049,985.1 1042 Vinh My B Hoa Binh Bac Lieu 562,443.4 1,026,776.9 1043 Vinh Hung A Vinh Loi Bac Lieu 563,869.2 1,038,718.9 1044 Hoa Binh town Hoa Binh Bac Lieu 569,101.8 1,025,902.1 1045 Chau Hung Vinh Loi Bac Lieu 581,188.3 1,033,552.5 1046 Bac Lieu town Bac Lieu town Bac Lieu 581,395.0 1,025,596.2 1047 Thanh Tri Thanh Tri Soc Trang 577,102.4 1,045,373.4 1048 Lam Kiet Thanh Tri Soc Trang 589,774.8 1,051,835.6 1049 Vinh Loi Thanh Tri Soc Trang 567,739.0 1,043,484.2 1050 My Binh Nga Nam Soc Trang 565,203.9 1,047,937.0 1051 Vinh Bien Nga Nam Soc Trang 577,102.4 1,045,373.4 1052 Truong Khanh Long Phu Soc Trang 611,007.6 1,070,453.4 1053 Long Phu town Long Phu Soc Trang 623,187.1 1,060,908.3 1054 Lich Hoi Thuong Long Phu Soc Trang 626,024.7 1,046,484.5 1055 Ke Thanh Ke Sach Soc Trang 603,674.3 1,080,272.6 1056 Ke Sach town Ke Sach Soc Trang 607,218.9 1,079,267.1 1057 Dai Hai Ke Sach Soc Trang 596,334.3 1,080,818.6 1058 Phuong 10 Soc Trang city Soc Trang 604,348.7 1,058,654.6 1059 Phuong 8 Soc Trang city Soc Trang 611,358.8 1,063,429.7 1060 Thuan Hung My Tu Soc Trang 593,092.8 1,060,394.8 1061 Long Hung My Tu Soc Trang 587,716.0 1,070,975.2 1062 Thuan Hoa Chau Thanh Soc Trang 599,276.2 1,070,859.5 1063 Ho Dac Kien Chau Thanh Soc Trang 594,372.4 1,074,912.7 1064 Thanh Thoi An My Xuyen Soc Trang 611,800.1 1,049,405.3 1065 Dai Tam My Xuyen Soc Trang 600,882.9 1,055,607.0 1066 Thanh Quoi My Xuyen Soc Trang 587,307.0 1,044,409.4

4.3. Heuristic search used in network optimization

4.3.1. Why is it necessary to apply a heuristic search?

In Chapter 2, we discussed the important properties of an insect ecosystem in agriculture, which comprise the randomness & autocorrelation, the spatial dispersion & nonergodicity and the heterogeneity. These properties describe the complexity of this ecosystem; hence, proposing an optimal layout for its sampling network becomes a complex problem. In a discrete environment, as we model the study region using a cellular automaton, the network optimization is a combinatorial problem of selecting placements [83]. To solve this combinatorial problem, an exhaustive algorithm to set k sampling locations on a discretized spatial zone of n × m cells can be used. Therefore, this algorithm has a factorial complexity. This problem was shown to be NP-hard by Ko et al. [133] and Krause [83]. In this case, heuristic solutions are often used to reduce the computational complexity. Unlike other exhaustive algorithms, only the “experience-based” layouts are considered in heuristic solutions. Thus, only the “near-optimal” layouts (or “approximating optimal” layouts) can be found [24] [83]. In this thesis, all network optimization algorithms are heuristic. By modeling the observed variables as Gaussian processes, we use the variance of the Gaussian

Chapter 4 ‒ Multiple scales optimization

Process Regression (or Gaussian Process Entropy) as an argument of a heuristic search, or a heuristic argument. The principal idea is to place the new sampling devices at the high-entropy locations.

4.3.2. Universal Kriging Standard Error (UKSE)

Adding new measuring devices at the locations of the highest entropies is a popular strategy mentioned in many studies [20] [83]. Minimizing the sum of Kriging variances is one of the most important conditions for an optimal network layout [8]. Concretely, Kriging variance is applied as an objective function of these optimization strategies. Cressie proposed an idea for optimization strategy of a new node to the network ‒ namely by the optimal addition [20]. Krause presented the near-optimal approach with support of GPEs [83]. In [8] [134], some optimization of sample patterns are performed based on the mean universal Kriging variance (MUKV) with the simulated annealing algorithm. The mobility of a surveillance network is often low. Because of such limitations, the optimal solutions must be adaptive for a long time period. For example, regarding the discussion about the nonergodic and heterogeneous properties of the insect ecosystem, both the observed parameters and nuisance parameters often change in time. In this case, a more globalized quantity (for time scale) of Gaussian process entropy would be more appropriate than a specific one. Hence, we propose to use the Universal Kriging Standard Error (UKSE), instead of UKV, in a specific time duration (the mean of n UK standard deviations, where n is the number of time cycles during ). Choosing an appropriate for UKSE is also a difficult question. There is no general answer for such question. The answer depends on the specific system and also on the mobility of the surveillance network. If the network is able to frequently adapt to the changes of system, the value of would be short, otherwise this value would be more prolonged. The value of UKSE at one specific location is given by the equation below:

∑ (, ) (20) (, ∆) = where denotes the value of UKSE at location z during , (, ∆) ∆ denotes the value of UKV at location z at time t = i, (, ) n denotes the number of time cycles during . ∆ Figure 42 shows an example of 48 light-traps in three provinces in the Mekong Delta region. Figure 43a shows its heuristic cellular automaton based on the UKSEs. The time duration is relative to the maximum value of BPH life cycle; is equal to 32 days [7]. With a sampling cycle of one day, n is determined as 32. Therefore, the UKSE at each cell is the mean value of UKVs computed for 32 days at this cell. In this thesis, UKV/UKSE is presented as the Gaussian process entropy ‒ the most important heuristic argument used in all optimization algorithms. In addition, we propose another combinational heuristic argument that combines both UKSE and all local

Chapter 4 ‒ Multiple scales optimization

constraints in a single quantity. Indeed, this combinational heuristic argument is determined as a division of the UKSE by the obstruction index.

a) Value of the UKSE (entropy) b) Heuristic value based on the combination of the 2 arguments: UKSE and obstruction index

Figure 43. Two heuristic cellular automata in 3D visualization.

Hence, the heuristic cellular automaton based on the combination of the 2 arguments has more local maximums or minimums. This characteristic means that there are more choices in the optimization process. Hence, the potential network layouts can be proposed with more constraints and could be adaptive to the reality better. Figure 43 shows a comparison between these two heuristic cellular automata. 4.4. Micro-scale approach of network optimization

4.4.1. Hill-climbing search algorithm

Hill-climbing belongs to the family of greedy local search [135]. This technique starts with an arbitrary solution to a problem, then attempts to find a better solution by incrementally changing a single element of the solution. In continuous search spaces, it generally uses the gradient of the objective function to take local steps in the direction of the greatest possible improvement. Hill-climbing search is also known as a metaheuristic [23]. In NOVEL, the hill-climbing search algorithm is implemented as an action (interaction) of the measuring device agent (i.e., Node agent). Each measuring device agent tries to move to the nearest location having best value of a specific objective function. Heuristic cellular automaton is used for all optimization algorithms proposed in this thesis. UKSE, found in the heuristic automaton, is used as the heuristic information. This implementation allows a measuring device agent to “read” easily either the environmental properties or the UKSE information. Hill-climbing search is chosen as the first optimization algorithm of this thesis by its simplicity. As mentioned above, optimizing the addition of k new measuring devices into an existing network of n devices is the main optimization problem mentioned in all experiments. Two basic optimization criteria for the measuring device locations needed to be considered in the objective functions:

Chapter 4 ‒ Multiple scales optimization

(1) Locating at a location of maximum local/global Gaussian process entropy. This criterion is controlled by the error value found in UKSE cell agent. (2) Locating at a location of minimum obstruction index, and especially avoiding on the “infeasible” locations (e.g. the residential, river or sea regions). In the micro-scale approach, each device agent just asks for the UKSE values of all adjacent cells. The device agent compares these values with the UKSE value of its current cell and then gives a decision of moving. Figure 44a & Figure 44b present two simulation steps of the algorithm. A cell has 8 surrounding cells as neighbours. The red circles denote the measuring devices existing in the surveillance network. These devices are stable through the simulation process. The white circles denote the added devices. At the initial step, k new device agents are randomly added into the network. The pseudo code of the hill-climbing search algorithm is as follows: ALGORITHM: Hill-climbing search algorithm. Inputs: ° location: Old location of node. ° Heuristic Cellular Automaton: The Heuristic Cellular Automaton based on UKSEs. Output: ° location: New location of node. Complexity: O(n2) BEGIN 1 neighbor_cells=this.getNeighborCells(8,1); 2 neighbor_maximal_UKSE=getMaximumLocalUKSE(neighbor_cells); 3 maximal_cell=getMaximumCell(neighbor_cells); 4 current_UKSE=getUKSE(current_cell); IF (current_UKSE

The getMaximumCell(Cell[] cell_list) function returns a cell with the highest UKSE in cell_list, whereas the getMaximumLocalUKSE(Cell[]cell_list) function returns the maximum UKSE value of this cell_list. The getUKSE(Cell the_cell) function returns the UKSE of the_cell and the getLocation(Cell the_cell) function returns the coordinates of the_cell parameter. In fact, the UKSE values are locally minimal at locations of the existing measuring devices and found higher at the farther locations. In the optimization process, the new

Chapter 4 ‒ Multiple scales optimization

device agents will be increasingly far away from the old device agents. The optimization process runs multiple calls of Hill-climbing search algorithm, which is implemented as an action of Node agent. So, each call corresponds to one simulation step. The process stops when there is no change of locations found between two successive simulation steps, i.e. a convergent state is found. Otherwise, a finite number of calls can be assigned for the optimization process. The number of calls tightly depends on the number of cells of the cellular automaton and the other charateristics of the environment.

a) Step 1 of Hill-climbing search process. b) Step 2 of Hill-climbing search process.

Figure 44. Hill-climbing search process.

4.4.2. Tabu search algorithm

Tabu search is a metaheuristic originally developed by Glover in 1986 [67] [126]. This technique is one of the most efficient local search algorithms. In general, local search techniques are widely used to find “near-optimal” solutions in a “reasonable” amount of time. Like the hill-climbing search, the tabu search (TS) is one of general iterative algorithms for combinatorial optimization [131]. Tabu search is chosen as the second optimization algorithm of the micro-scale optimization approach. In this technique, each measuring device agent has a tabu list limited in length to mark the movement trajectory. A heuristic cellular automaton based on the combinational arguments is used for the searching process. A device agent will randomly change its location if one of its adjacent cells is being monitored by another device. Beside two basic objective criteria implemented in the previous greedy algorithm: (1) Locating at a location of high local/global Gaussian process entropy, and (2) Locating on a “feasible” location, tabu search applies one more objective criterion for their optimization process:

Chapter 4 ‒ Multiple scales optimization

(3) Not locating at a “tabu” location. This criterion is controlled by the tabu trace on cell agent, which is marked by the previous searching steps and controlled by the tabu lists of all measuring device agents. The tabu search algorithm is implemented as an action (interaction) of measuring device agent (or Node agent). The pseudo code of this algorithm is as follows: ALGORITHM: Tabu search (Micro-scale approach in network optimization). Inputs: ° location: Old location of node. ° Heuristic Cellular Automaton: The heuristic cellular automaton based on the UKSEs or the combinational arguments. Output: ° location: New location of node. Complexity: O(n) BEGIN 1 neighborCells = this.getNeighborCells(8, 1); 2 IF (existsMonitoredCell(neighborCells)) 3 this.location = getLocation(new); 4 ELSE 5 localBestUKSE = getMaximumLocalUKSE(neighborCells); 6 currentCell = getCellAt(this.x, this.y); 7 potentialCell = getCellAt(x, y); 8 maximumPriority = getPriority(potentialCell); 9 currentPriority = getPriority(currentCell); 10 IF(maximumPriority

The existsMonitoredCell(Cell[] cell_list) function verifies whether any cell in cell_list is monitored by another device agent. If it is a case, TRUE will be returned, otherwise FALSE will be returned. The updateTabuList(Cell the_cell) function adds the_cell into the tabu list and removes the oldest cell from this list if the length of the previous list reaches the maximum threshold.

Chapter 4 ‒ Multiple scales optimization

4.4.3. Flock problem of micro-scale algorithms

The micro-scale algorithm has a big limitation for some flocks of measuring devices at some local maximal GPE locations. Figure 45a shows an example of this problem where some flocks of new devices are circled. Figure 45b shows the same network layout, where the measuring devices are visualized on the map of the virtual environment.

a) Heuristic cellular automaton based on the UKSEs b) Map of real environment & measuring devices. & measuring device agents.

Figure 45. Flock problem of greedy algorithm.

In fact, this problem is mentioned in many works and a lot of solutions are proposed. In [83], beside the objective function, to solve the flock problem, Krause used the mutual information between the chosen locations and the locations which are not selected. A balance function for overcoming this problem will be introduced in the next chapter, we presented another balance function in [136]. An unbalanced network, in practice, can lead to some problems of spatial interpolation, and directly affect its performance. In Kriging interpolation, the empirical variogram may be inexact, because the number of sampling point pairs in each distance bin are not ensured. Some distance bins can have a lot of pairs, but some others have nothing or very few of sampling point pairs. We have introduced two typical micro-scale algorithms, hill-climbing search and tabu search. The flock problem is also indicated as the unbalance of the layouts, which can cause some errors in the estimation techniques. 4.5. Meso-scale optimization

In this section, we introduce two different meso-scale algorithms. The first one is an improved version of the hill-climbing search algorithm. In this algorithm, we define a 71

Chapter 4 ‒ Multiple scales optimization

balance function and apply it to the new measuring device agents. The second one is also a typical metaheuristic approach, the PSO algorithm [72] [132]. The balance function is also applied for this PSO algorithm as an improved solution. All the algorithms presented in this section apply the same heuristic cellular automata mentioned in Figure 43. Meso-scale algorithms consider the interactions between the measuring devices in a limited scope. In this case, a measuring device agent considers not only the local information provided by the heuristic argument, but also some collaborative information from the neighbor devices. Hence, although there are some collaborations between different devices, this approach is still considered as a local optimization method because of the narrow collaborative scope. In some contexts, the interactions between the devices at meso-scale essentially focus on the local competitions or the swarm behaviors. The term of “meso” can be considered for any kind of scale existing in the model, such as the decisional scale, the administrative scale, the spatial distance scale, etc. In the meso-scale optimization algorithms, the measuring device agents have more information than in micro-scale one. In theory, with the supplementary information from the neighbors the device agent may give a better decision in the optimization process.

4.5.1. Improved hill-climbing search algorithm

To overcome the flock problem, we propose to add a new perception skill to the Node agents. With this new skill, each Node agent can be aware of its balance status inside the group of neighbors, and then decides to locate at the most balanced area. We define a balance function balance(z) in Equation 21 to solve this problem. Maximization of balance(z) can be considered as an added objective applied in our improved algorithm. Figure 46 shows an example of neighbor relations of measuring device agent A. This agent has 4 neighbors at the current simulation step; the requirement is that the location of A must be most balanced with their neighbors in the next step. There are two questions for the balance requirements: (1) Why is the balance necessary? and (2) How is the balance? The first question has been answered in Section 4.4 when we discuss the flock problem of the micro-approach. Therefore, there are many answers for the second question. The balance function of a specific site z is defined as below: (21) where neighors(z, h) returns() the= set (of (neighbors, of z (by, distance)) h, d(z, neighbors(z, h)) returns the minimum distance from z to all measuring devices found in neighbors(z, h). In Figure 46, we try to apply this balance function for measuring device A, so the balance(A) will be considered. Assuming that h is equal to 5, we have neighbors(A, h) =

Chapter 4 ‒ Multiple scales optimization

{N1, N2, N3, N4}. Also, we have d(A, neighbors(A, h)) = { d(A, N1), d(A, N2), d(A, N3), d(A, N4)} = {5, 3√2, 3, 3}. Therefore, the balance function of A returns:

() = , √, , = Figure 47 shows a screen shot of new optimization strategy by integrating the balance condition into the objective function of micro-scale approach defined in the previous chapter. The new objective function is defined as follows: (22) where denotes the balance() function = at location() ∗ z and() denotes the Gaussian process entropy at location z. When choosing a new place to move to in the optimization process, a Node agent will select the neighbor location which has the maximum value of function. (∙) By comparing the solutions presented in Figure 45b and Figure 47, it is obvious that the balance of measuring devices in Figure 47 is better. In this figure, the flock problem is already solved indeed. We have not discussed yet about the performance of the network layouts in this section, it will be assessed in Chapter 5.

Figure 46. Example of close neighbors, Figure 47. Screen shot of the balance considered in the balance function. function & micro-approach optimization.

4.5.2. Particle Swarm Optimization (PSO)

PSO is also classified as a population-based metaheuristic optimization algorithm. This technique is originally attributed to Kennedy, Eberhart and Shi [68] [69]. PSO can be considered as a greedy method based on both the individual perception (particle) and the group perception (swarm) of an organism. This technique is inspired by the collaborative behavior of a school of fish or a flock of birds in search for food. Cecconi et al. introduced that PSO is an efficient method for many possible applications [72]. PSO is an interesting selection and needs to be investigated in network optimization where a

Chapter 4 ‒ Multiple scales optimization

measuring device can be considered as a dynamic particle. PSO was also applied in network optimization by Choi [132]. PSO is also considered as a meso-scale optimization algorithm and the second representative at this scale. In agent-based model, PSO algorithm is based on two search levels of node agents: in neighbor cells (i.e. the perception of a node agent ‒ particle search) and in neighbor nodes (i.e., the perception of a group of node agents ‒ swarm search). Unlike the previous meso-scale strategy, i.e., the improved hill-climbing search algorithm presented in Section 4.5.1., PSO algorithm focuses on the “profits” of the neighbor nodes when making decision on moving strategy. 4.5.2.1. PSO algorithm Three basic optimization criteria for the measuring device locations need to be considered in the objective function: (1) Locating at a particular location where the agent believes that there is a maximum local/global Gaussian process entropy. This criterion is decided by two factors: the local observation and the swarm observation; the decision as per location is a sum vector of two other vectors. The first vector points from the current location to the local maximum location. The second vector points from the current location to the swarm maximum location (or global maximum). Random coefficients are also added into the sum vector. (2) Locating at a “feasible” location. This criterion is controlled by the obstruction index of cell agent. (3) Not locating at the location out of the boundary of the study region. After adding k new nodes into the network with arbitrary locations, the PSO algorithm will be invoked for each simulation step. This algorithm is implemented as an interaction of Node agent. The pseudo code of this algorithm is as follows: ALGORITHM:Particle Swarm Optimization(PSO). Inputs: ° location: The old location of node. ° Heuristic Cellular Automaton: The heuristic cellular automaton based on the UKSEs or the combinational arguments. Output: ° location: The new location of node. Complexity: O(n2) BEGIN 1 neighbor_cells = this.getNeighborCells(8,1); 2 neighbor_devices = this.getNeighborDevices(RADIUS); 3 local_best_UKSE = getMaximumLocalUKSE(neighbor_cells); 4 global_best_UKSE = getMaximumGlobalUKSE(neighbor_devices); 5 _vx = 0; _vy = 0; _c1 = 2; _c2 = 2; _w = 1; 6 _vx = velocity_x * _w + _c1 * rnd(1) * (local_best_UKSE.x ‒ this.x) + _c2* rnd(1)*(global_best_UKSE.x ‒ this.x);

Chapter 4 ‒ Multiple scales optimization

ALGORITHM:Particle Swarm Optimization(PSO). Inputs: ° location: The old location of node. ° Heuristic Cellular Automaton: The heuristic cellular automaton based on the UKSEs or the combinational arguments. Output: ° location: The new location of node. Complexity: O(n2) 7 _vy = velocity_y * _w + _c1 * rnd(1) * (local_best_UKSE.y ‒ this.y) + _c2* rnd(1)*(global_best_UKSE.y ‒ this.y); 8 // Verification the new location: 9 _x = fixedToBoundX(this.x + _vx); 10 _y = fixedToBoundY(this.y + _vy); 11 current_cell = getCellAt(this.x, this.y); 12 potential_cell = getCellAt(_x, _y); 13 maximum_priority = getPriority(potential_cell); 14 current_priority = getPriority(current_cell); 15 IF (maximum_priority < current_priority) 16 potential_cell = current_cell; 17 maximum_priority = current_priority; 18 ENDIF 19 this.location = getLocation(potential_cell); 20 RETURN this.location; END

The getNeighborDevices(Float the_radius) function returns a list of measuring devices, where the distance from the current device (this) to these devices is less than or equal to the_radius. The getMaximumGlobalUKSE(Node[] node_list) function is the principal difference between the PSO algorithm and the previous algorithms, where the global heuristic values are considered. This function returns the maximum UKSE value monitored by node_list. The fixedToBoundX(Float x) and fixedToBoundY(Float y) functions help to move the current node back inside the boundary of the study region. Other functions have the same definition presented in Section 6.3. 4.5.2.2. Experiments PSO algorithm requires a particular number of simulation steps to obtain the convergent status. Figure 48 shows an example of PSO process with 24 steps. Because the new measuring devices are generated with random locations, number of steps to be performed to achieve the convergent status will be different for each simulation. In addition, the result of PSO algorithm is also different for each simulation.

Chapter 4 ‒ Multiple scales optimization

New nodes (White)

Existing nodes (Blue)

a) PSO ‒ Initial optimization step b) PSO ‒ 5th optimization step (new nodes with random locations)

c) PSO ‒ 11th optimization step d) PSO ‒ 24th optimization step (Convergent)

Figure 48. PSO optimization process ‒ Meso-scale network organization.

4.6. Macro-scale optimization

This section presents the CDSN-based technique, a macro-scale optimization algorithm. This new algorithm is based on CDSN defined in Section 3.2.2. We present the mechanisms of adding measuring devices into an existing CDSN, and how these devices are integrated into this CDSN within the virtual environment.

4.6.1. Optimization strategy

We propose an agent-based optimization strategy to add k new devices into the existing network of n devices. In fact, these k devices, or k Node agents, will be added to CDSN in sequence; this addition process is applied by k iterations in a loop. Each iteration is used to add one Node agent. 76

Chapter 4 ‒ Multiple scales optimization

Adding a new node into CDSN consists of three main steps: (1) Determining the potential zone on which new node can be located: The number of connections from a node to the others, or the degree of node, is used to used to determine the potential zone. In CDSN-based method, the degree of nodes is the first criterion for optimization. If a node has few connections to the others (low degree), it can be considered to add more nodes around. Thus, a node N0 with the lowest degree must be chosen, and the space around N0, by distance of disk radius, is determined to be the potential zone to place a new node. Otherwise, if we want to delete a node from CDSN, a node with highest degree (or its neighbors) is considered to be removed. This strategy aims at a global balance in the optimal layout. (2) Determining the location inside the potential zone: All cell agents in the potential zone will be examined, then the cell agent with maximum priority will be chosen to place a new node. A priority function is defined to support this process. The priority function receives a location s and returns its priority. Two main factors will be considered by this function: Gaussian process entropy and balance of node. (23) () = () ∗ () where ° denotes the Gaussian process entropy at location z; can be used( as) such a function. (∙) ° denotes the balance function at location z, presented in Equation 21. () The local-constraints are also considered in this process. Like the PSO algorithm, the new node must be located at a “feasible” location, which is controlled by the obstruction index of cell agents. (3) Adding the new node into CDSN: To ensure the consistency of the CDSN-based layout, the condition of autocorrelation [57] [79] must be considered when adding new node into the neighbor location of current node . Krause illustrated a map that contains the correlation coefficients of data between a specific location with all others [83]. This idea is reused in this work to determine the correlation coefficients between the new node and the others.

Denoting Pi as the vector of correlation coefficients between vi and other (n − 1) measuring devices, we have: (24) = , , … , (), , (), … .

Chapter 4 ‒ Multiple scales optimization

The correlation coefficients between ‒ the location of ‒ and all the sites located in the study region can be easily generated from this ve ctor by Kriging technique, where each node vj keeps value of , and only node keeps value of 1. Thus, the correlation coefficient between and is estimated. Value of is used to determine the connection(, ∗) between and∗ . (, ∗) ∗ 4.6.1.1. Optimization algorithm As the optimization strategy presented, the CDSN-based optimization algorithm is implemented as follows: ALGORITHM: Addition of new node in to a CDSN. Inputs: ° network: CDSN with n nodes ° Heuristic Cellular Automaton: The heuristic cellular automaton based on the UKSEs or the combinational arguments. ° k: number of added vertices Output: ° network:CDSN with n + k vertices Complexity: O(n2) BEGIN 1 i=0; 2 LOOP (i<=k) 3 current_node=getMinimumDegree(network); 4 cells=getNeighborCells(current_node,radius); 5 potential_cell= NULL; 6 maximum_priority=0; 7 FOREACH (current_cell in cells) 8 current_priority=getPriority(current_cell); 9 IF (maximum_priority

The getMinimumDegree(CDSN network) function returns a node which has the minimum degree. If there is more than one node found, one of them could be returned in sequence. The getPriority(Cell the_cell) function is an implementation of Equation 23 in the agent-based model. The getNeighborCells(Node node, Integer radius)

Chapter 4 ‒ Multiple scales optimization

function returns all the cells which have the distance to node less than or equal to radius. The createNewNodeAt(Cell cell) function is used to create a new node at location of cell. This optimization algorithm does not consider the third condition related to the local constraints yet. To apply it, we can add the local condition into the 9th code line: IF(maximum_priority < current_priority AND LOCAL_CONSTRAINTS) Local constrains can be heuristic and controlled by the environmental indices (Table 6), specially, the obstruction index. This condition can be defined as LOCAL_CONSTRAINTS = (obstruction_index ≤ threshold).

4.6.2. Experiments

Unlike the metaheuristic solution, CDSN-based algorithm can be executed in only one simulation step. After this optimization step, the surveillance network model begins to collect the trap-density data from the BPH prediction model. These collected data will be stored in the database to apply the verification processes for assessing the performance of this potential network layout.

Figure 49. Layout with only two first Figure 50. Layout with all three objective objective functions: Gaussian process functions: Gaussian process entropy, entropy and Balance of network. Balance of network and Local-constraints.

Figure 49 shows the potential layout in which only the two first parameters of the optimization process are considered: the Gaussian process entropy and the balance of network. The heuristic cellular automaton based on the UKSEs, presented in Figure 43a, is applied in this experiment. The red circles represent 32 new vertices (measuring devices), and the blue ones indicate 48 current vertices in the CDSN. All new vertices are located in the region of high standard error, but they are not necessary in the local maximum region.

Chapter 4 ‒ Multiple scales optimization

In Figure 49, some new vertices are located at “infeasible” locations. Red squares mark seven new vertices which are located at the sea (2 vertices), on the river (2 vertices) and in regions without rice (3 vertices). These locations, of course, cannot be accepted by the end users. The same view of this case study is illustrated in Figure 50, where all the vertices located at “infeasible” locations do not exist. This result is due to the local constraints, via an aggregate obstruction index, applied in the objective function. In fact, the end users can apply different scenarios of these constraints to verify many “what-if” experiments. 4.7. Conclusion

This chapter presents the implementation of multiple optimization algorithms for NOVEL. These optimization algorithms are implemented in the “Optimization Processes” sub-system. Based on the agent-based modeling technique, with supports of self-organization, multiple scales of network optimization are defined, including the micro-, meso- and macro-scales; and their appropriate optimization algorithms are selected to implement in NOVEL. Several representatives of each scale are introduced including the pseudo codes:

Table 12. Optimization algorithms in NOVEL.

Optimization scale Implemented Proposed & implemented algorithms algorithms Micro-scale Metaheuristic (Local optimization) ° Hill-climbing search ° Tabu search Meso-scale Metaheuristic Metaheuristic (Hybrid optimization) ° Particle Swarm ° Improved hill-climbing search Optimization (PSO) Macro-scale ° CDSN-based optimization (Global optimization) algorithm In addition, with supports of several the combinational index, especially the obstruction index as presented Section 3.2.3, the optimization processes have more arguments and adapt the complexity of the environment better. Heuristic search, based on the Gaussian process entropy, is the main approach of all presented optimization algorithms. Furthermore, with an “open” architecture of NOVEL into which the other optimization algorithms can be easily deployed. The results of these algorithms are called as the “potential” network layouts, which will be verified by simulation process presented in Chapter 5.

Chapter 5 ‒ Experimental assessment

CHAPTER 5

EXPERIMENTAL ASSESSMENT

Chapter 5 concentrates on the most important contribution of this thesis, where all potential network layouts will be verified by simulation. Instead of assessing these layouts on the real context, this assessment will be conducted in the virtual environment. Three main performance indicators of surveillance networks are also presented. Beside these single indicators, various aggregate performance indicators at multi-scale criteria are also suggested, including the spatial, temporal and diversity scales. The configuration of the surveillance network proposed by the optimization algorithms are systematically simulated and assessed with respect to corresponding performance indicators that use simulated trap densities.

5.1. Why is it necessary to assess the performance of a surveillance network?

Assessing the performance of a specific environmental surveillance network is the final step of every optimization process. A surveillance network operates as a perception tool for a particular management system. In case of a weak performance network due to the problem of layouts, an optimization is applied. Usually, the performance indicators must belong to a specific estimation technique. There are two reasons according to which the end users need to assess the network performance: (1) If the network performance is good, the end users believe that they know the current status of the environment well, then some related decisions will be more believable. (2) If the network performance is bad, these indicators can be used as different criteria to assess new potential layouts. There are some considerable attentions when assessing a potential network layout: 1) For cases of deleting the measuring devices from the network, we can assess the performance of the potential solution without doing any further experiment. In this case, all retained measuring devices have their own data from the existing network. 81

Chapter 5 ‒ Experimental assessment

2) In other cases (adding new measuring devices, re-planning the placement of measuring devices, or the hybrid solutions), there is not enough data for network assessment. Unfortunately, case (1) is rare, almost all situations coincide with cases (2). Hence, the three following methods are possible for the assessment tasks:

° Finding a verification algorithm without data. This method has an advantage of low cost, but it is not accurate. In fact, it is very difficult to persuade the end users with only theoretical arguments. This method is less believable because it does not use data for decision support.

° Applying the potential solution in the real context to get real data, then evaluate these experimental data. This solution is really ideal. Imaging that any potential solution can be thoroughly assessed in the real contexts, and then the best solution will be selected. But almost all applications do not have an occasion to assess the performance of the potential network layout in real life, except for some special cases: to perform in a very small scale, to do the experiments easily, to invest a very big amount of experimental fund, etc. This method has an advantage of high accuracy. However, there are a lot of disadvantages, which are:

- Very high cost, often impossible for investment. - Low accuracy if the experiments are probably performed at a micro scale. - Not being able to answer the questions “what-if” for many situations.

° Building a virtual laboratory for simulation, and applying the potential network layout inside it. The evaluation can be also applied on the simulation data. Some advantages of this solution can be listed as the followings:

- Answering the questions “what-if” for many situations well. - Low cost. - Working at any scale of application. - Rather high accuracy depending on the validation of NOVEL. 5.2. Characteristics of a good surveillance network

In statistics, the terms of “sampling” and “population” are well defined [40] [59]. To understand any system, the end users often begin from the sampling data, but they usually need to be aware of the whole population. The surveillance network is a tool that provides the sampling data for related issues, hence, it plays an important role in observing the population. Usually three observation scales can determine the quality of a surveillance network: spatial, temporal and diversity scales. Figure 51 shows the relations between the sampling data and the population, where the sampling data are just a set of particular observation points of the population.

Chapter 5 ‒ Experimental assessment

Figure 51. Three data scales of surveillance network.

(1) Spatial scale: This scale is related to the spatial sampling density. In a balanced network, where the distances between different neighbor devices are nearly equal, the sampling density can be easily calculated due to number of devices per a distance unit. Almost all satellite sensing data are observed by a balanced surveillance network, i.e., Aster GDEM8 or AVISIO9. However, it is not always easy to establish such a balanced network because of many reasons. In practice, multiple local constraints are often considered when arranging the devices, such as the reliefs of the environment (rivers, seas, mountains, ...), the power sources for the measuring devices, the residential zones, etc. In fact, the performance of a surveillance network is strongly affected by its node density. In an unbalanced network, this impact also depends on the specific local constraints, and it is really hard to assess its performance. Moreover, these local constraints can differently affect the distribution of the observed variables. (2) Temporal scale: This scale is related to the sampling stages. The sampling time can be set to one, two or full stages depending on the specific purposes of the end users. In some situations, the performance of a surveillance network also depends on the local constraints, e.g., the rain can disable the operation of a light-trap even if the light is still turned on. The greatest concern of the temporal scale is the time interval between two consecutive sampling stages. This interval should be considered based on the characteristics of the observed objects and their dynamics in time. Sometimes, this scale depends on the capability of equipment or the maintenance fee.

8 http://asterweb.jpl.nasa.gov/gdem.asp 9 http://www.aviso.oceanobs.com/index.php?id=1272 83

Chapter 5 ‒ Experimental assessment

(3) Diversity scale: This scale is related to the number of variables monitored by the surveillance network. A measuring device is often a multifunctional equipment that allows simultaneously observing a set of variables at a specific spatio-temporal point. The diversity scale plays an important role in the multivariate research, such as the correlation between different variables or analyses of their interactions. For example, some biodiversity indices, such as the Shannon or Simpson indices [43] [55], play an important role in different environmental evaluations. In addition, some interpolation techniques, e.g., co-kriging [137], require multiple parameters for their computation instead of one. Such multivariate research, in many cases, gives a quality better than a univariate one. In a limited scope of the research, the diversity scale is not thoroughly discussed in this thesis, but it will be mentioned in our future works. Example of AVISIO datasource

a) Sampling locations, AVISIO datasource b) Sea height data, AVISIO datasource

Figure 52. Example of Sea height data (AVISIO datasource).

Figure 52 presents a satellite network of AVISO datasource with a high sampling density. We also concern the sea height parameter in this example. We use the variogram, a technique of geostatistics, to analyze the performance of this network. Variogram of sea height data is drawn in Figure 52b, where the semivariance value (y axis) via the distance lag (x axis) nearly forms a straight line. In this case, the linear regression model for this variogram is clearly rational. With such a network, further techniques for estimating the sea height become more reliable. The network performance can be evaluated in two different contexts: real and simulation. The real performance is determined by actual data of the existing network, whereas the simulation performance is based on simulation data which are collected by the surveillance network model in the simulation process. Only the second context is

Chapter 5 ‒ Experimental assessment mentioned in this research. Some performance indicators will be defined in Section 5.3 to support this work. All potential layouts, generated by different optimization algorithms, will be verified by one of these indicators to determine the “optimal” one. 5.3. Performance Indicators of surveillance network

In the previous section, the physical characteristics of a good surveillance network is essentially discussed. In this section, we discuss some logical methods to assess the performance of a surveillance network. Back to the purpose of the sampling technique, the final goal is to study the population of observed variables. Therefore, if there is a particular interpolation technique that gives the interpolated values with small errors, we will conclude that this is a well designed and high performance surveillance network. Hence, the assessment of surveillance network performance is probably complied with an uncertainty measurement as per specific interpolation technique. As aforementioned, Gaussian process regression [82] (GPR) is the main interpolation technique used in our research, where the performance of surveillance network is essentially assessed by its variance. GPR has its own conveniences as it provides not only the mean values of estimation but also the estimation variance. The Gaussian process entropy (GPE) is well detailed in Section 2.8.4. We know that this GPE will not be changed using a specific covariance function, but we cannot apply only one covariance function for all time-points. For this reason, the performance of a surveillance network may be changed by time. For example in Figure 53, the empirical variogram of light-trap data of each day is significantly changed, as the result, all the hyperparameters of its regression model must be changed consequently. In some cases, even the form of variogram model must be changed (e.g., from the linear model to the exponential model). We will discuss the performance of the surveillance network at different temporal scales in the next section, here we just focus on the network performance at a single time point of one single variable.

a) January 21, 2010 b) January 22, 2010

Figure 53. Two different empirical variograms of BPH light-trap data.

Chapter 5 ‒ Experimental assessment

In general, Kriging estimation consists of two principal steps: (1) Determining the variogram model and (2) Estimating the mean value and variance of the observed variable at a specific spatial point. The precision of each step directly affects, first, the final estimation results, and then the network performance. To assess the performance of a surveillance network, we introduce three indicators which can be applied to a univariate network; other suggestions for various aggregate performance indicators are also presented in Section 5.3.4.

5.3.1. Indicator 1: Root mean square error of variogram model

This performance indicator is directly calculated based on the variogram model (approximation function) of an empirical variogram [62]. Root mean square error (RMSE) is defined as the square root of the mean square error [127]. It is calculated by the equations below:

(25) ∑( − ) ( − ) = (26) where denotes the empirical( variogram− ) = at ith lag( distance− ) bin, denotes the estimated variogram at ith lag distance bin, n denotes the number of lag distance bins used in the empirical variogram. Although the final estimation results are not considered following this indicator, as discussed above, it plays the key role of Kriging estimation. Therefore, the MSE of variogram model has an advantage as per the computational speed. The next example demonstrates the MSEs of the variogram model in dependence on the number of random sampling locations. Sampling density and MSEs of the variogram model Figure 54a shows a scenario of the BPH distribution in three provinces of the Mekong Delta region: Hau Giang, Soc Trang and Bac Lieu. Figure 54b is the virtual surveillance network with 1400 random locations.

Chapter 5 ‒ Experimental assessment

a) Scenario of BPH distribution in three provinces of b)1400 random sampling locations. the Mekong Delta region: Hau Giang, Soc Trang and Bac Lieu.

Figure 54. Example for a random field F(Z,T) and a set of sampling locations L.

Figure 55 shows the dependence of MSEs on the number of random sampling locations in range of 50-1000. The mean of MSE is less than a specific threshold if the number of locations is high enough. For example, if the number of sampling locations is greater than 90, the MSE will be always less than 60. In this case, the number of sampling locations can be 90 with the MSE threshold of 60.

Figure 55. Dependence of MSEs on the number of random sampling locations.

Figure 56 shows various empirical variograms built for different directions; these variograms are determined from the data of random sampling locations. We can recognize that the variograms will be the same if the directions are inverse (e.g., the variograms of 2700 and 900 are the same).

Chapter 5 ‒ Experimental assessment

Figure 56. Directional variograms at 0, 45, 90, 135, 180, 225, 270 and 360 degree.

5.3.2. Indicator 2: Deviation of cross-validation

Cross-validation (Leave-one-out method) is often used to evaluate the quality of an estimation or prediction model [20] [61]. We also propose to use this method to validate the accuracy of Kriging estimation. The performance of the surveillance network will be high if the mean error of cross-validation is low.

(27) ∑| − | (| − |) = where ° n denotes the number of sampling locations, ° denotes the sampled value at location i, ° denotes the Kriging estimated value at location i when is picked out from the sampling set.

Chapter 5 ‒ Experimental assessment

Real densities

Estimated densities

Figure 57. Cross-validation in Hau Giang Province (Kriging interpolation).

Figure 57 shows the results of cross-validation for the Kriging estimation [19] [20] with range of 50 km, and regression model is linear. The idea for cross-validation is rather simple. According to this, we pick one sampling point out of the network and apply the Kriging estimation for the shortened list; the loop is applied for every sampling point. The red points on the figure indicate the real values and the blue ones indicate the estimated values in case of missing this one. At some outlier points, the value of deviation Δ is too high, so the precision of estimation is low at these points. In fact, we cannot remove these outliers because it is impossible to estimate exactly the variable there. On the other hand, a sampling location having low deviation can be removed if needed.

5.3.3. Indicator 3: Mean universal Kriging variance (MUKV)

Why is it necessary to use the universal Kriging? Ordinary Kriging (OK) is assumed that the estimated variable is stationary in the whole study region. In OK, an unknown mean value µ is applied for all estimated locations. This assumption seems not true for many types of environment, especially for the insect ecosystems. Figure 4 shows an example for mean values of sampling locations' calculated trap-densities. On this figure, mean values of light-traps are totally different. So an assumption of non-stationarity is more appropriate for BPH population. An improvement of OK is Universal Kriging, which supports the estimation with non- constant mean assumption. The equation below is used to calculate the MUKV [8] [138] - the third network performance indicator:

Chapter 5 ‒ Experimental assessment

(28) ∑ = where ° n denotes the number of estimated locations in the study region, th ° UKVi denotes the Universal Kriging Variance at the i estimated location.

5.3.4. Aggregate performance indicators

We have discussed some typical network performance indicators in the previous sections. These “single” indicators can be applied for a surveillance network. The “single” word means that these indicators are only applied for a specific point of three scales of a surveillance network (Figure 51): temporal, spatial or diversity scales. For example, the MUKV indicator [8] is applied for one specific day (temporal point), one specific province (spatial point) or one particular insect (diversity point). Among them different aggregate indicators are necessary in many situations. For example, the question “How to evaluate the performance of the light-trap network of Hau Giang province in January 2010 in monitoring the BPH invasion?” requires the aggregation of MUKV in one month (31 days), and the temporal scale is larger than one time point in this situation. For another example, the question of “How to evaluate the performance of the light-trap network of Hau Giang Province in January 2010 in monitoring the BPH & WBPH invasions?” requires an expanded scale pertaining to both time and diversity. Depending on different needs of the end users, the scale of the performance indicators is respectively proposed. There are two factors in determining an aggregate indicator: ° Aggregate function: Beside some first or second moments, some fuzzy functions are appropriate choices, e.g., the fuzzy logic is used to combine all objective criteria into a single fuzzy function in [131]. ° Specific scale: Determining specific single indicators that participate in the aggregate functions. We have introduced three single performance indicators and some suggestions for various aggregate indicators. Three data scales are also analyzed and applied as a part of these aggregate indicators: spatial, temporal and diversity scales. Next section will present some experiments of the assessment process by simulation. 5.4. Experiments

In this section, we introduce the experiments of three optimization algorithms presented in Chapter 4: (1) Improved hill-climbing search algorithm, (2) PSO and (3) CDSN-based optimization. An existing network of 48 light-traps established in three provinces: Soc Trang, Hau Giang and Bac Lieu in the Mekong Delta region, Vietnam will be optimized using these three algorithms and then assessed by NOVEL. RMSE of variogram model is used to evaluate the efficiency of potential layouts. BPH trap-density data of 32 consecutive days from January 1, 2010 are used in these experiments, where the value of 32 means the maximum duration of BPH life cycle. These data are used to initialize the virtual data

Chapter 5 ‒ Experimental assessment

for all the agents before running the prediction actions (by calling the BPH prediction model in 5.4.1) and measuring actions (by calling the Surveillance network model in 5.4.2). The optimization algorithm must be applied before the prediction step when 52 new measuring devices will be added into the existing network. Now, we have a new network with 100 nodes, which is called as the potential network layout. Then, the experimental assessment process follows after this optimization step. To do so, we apply 32 prediction steps for the assessment process, then the RMSE of variogram model based on the virtual data of surveillance network will be calculated for each step. RMSE values of both 48-node existing network and 100-node potential network layout are calculated. For each optimization strategy we apply a batch of 15 simulations, and the performance of network is aggregated from all these simulations.

5.4.1. Improved hill-climbing search algorithm

a) Heuristic cellular automaton based on the b) One potential layout proposed by the ordinary UKSEs and one potential layout hill-climbing search optimization

Figure 58. Ordinary greedy optimization.

Figure 58a presents the Heuristic cellular automaton based on the UKSEs, which provides the heuristic information for the objective function of the greedy algorithm. Figure 58b shows a potential layout based on the ordinary hill-climbing search optimization. In this example, the flock problem is found on some regions of high UKSE with too much measuring devices.

Chapter 5 ‒ Experimental assessment

a) One potential layout generated based on the b) RMSEs of the variogram models calculated for improved hill-climbing search algorithm the current, potential and added layouts

Figure 59. Improved hill-climbing search algorithm ‒ Assessing the network performance.

Figure 59a shows a specific case of the improved hill-climbing search algorithm (with the support of balance function). The RMSEs of the variogram models calculated for the current, “optimal” and added network layouts are presented in Figure 59b.

Figure 60. Improved hill-climbing search algorithm ‒ Network performance in 15 simulation instances.

Figure 60 shows the RMSEs values of the variogram model calculated from 15 simulation instances of the improved hill-climbing search algorithm. The ratio between the RMSE value of the potential network and the current one is observed for each simulation instance. The mean value of these ratios in 15 simulation instances is

Chapter 5 ‒ Experimental assessment

78.91%, i.e., the potential layout is better than the existing one in average. In detail, the minimum ratio of 47.80% is assessed as the best solution, and the mean value of 5 best ratios is 56.50%.

5.4.2. Particle Swarm Optimization (PSO)

Because the PSO is a stochastic algorithm where the random process is implemented inside the moving process, as the previous example, we generate 52 new nodes of arbitrary locations in the study region. Only these new nodes participate in the PSO moving process, while the old ones are fixed. The potential solution is found in case of convergent PSO process, in which no change is observed any more. In our experiments, the PSO often becomes convergent after 20 steps of simulation. The number of simulation steps depends on the dispersion of the heuristic cellular automaton applied for the optimization algorithm. Each simulation will generate a potential layout.

Figure 61. PSO - Network performance in 15 simulation instances.

Figure 61 shows the RMSE values of variogram model calculated from 15 simulation instances of PSO. The ratio between the RMSE value of the potential network and the current one is observed for each simulation instance. The mean value of these ratios in 15 simulation instances is 163.29%, in this case, the “optimal” solution is worse than the current network in average. However, it should be noted that this is the average of 15 different layouts, where the best solution is correspondent to the ratio of 48.48% and the worst solution ‒ ratio of 384.02%. The average of 5 best ratios is 72.76%. We suggest that these top 5 optimal layouts needed to be assessed to determine the best one for the network.

5.4.3. CDSN-based optimization

CDSN-based optimization is a deterministic optimization algorithm. It provides only one potential layout for every simulation instance. Figure 62 shows the RMSE values of variogram model calculated from 15 simulations using CDSN-based optimization.

Chapter 5 ‒ Experimental assessment

However, as mentioned above, all these simulations are applied for only one potential layout. The ratio between the RMSE value of the potential network and the current one is observed for each simulation instance. The mean value of these ratios in 15 simulation instances is 55.63%. This result is even better than one in case of 5 best potential layouts using PSO.

Figure 62. CDSN-based optimization - Network performance in 15 simulation instances.

A significant remark for CDSN-based optimization is the stability of the “optimal” network performance. In the whole simulation process, it is easy to see that the RMSE values of the “optimal” networks are always smaller than ones of the existing network. This characteristic cannot be found in the metaheuristic one.

5.4.4. Comparison between different optimization algorithms

In general, NOVEL can assess the performance of each potential layout via performance indicators. This section gives a brief comparison between different optimization algorithms. The basic condition of an optimization algorithm is nearly satisfied in almost all potential layouts that their performance must be better than one of the current layout.

Chapter 5 ‒ Experimental assessment

Figure 63. Mean value of RMSEs (of variogram) with different optimization algorithms.

Figure 63 illustrates the mean value of RMSEs of different network layouts: Current layout, Hill-climbing search layout, Tabu search layout, PSO layout and CDSN-based layout. In the CDSN-based optimization algorithm (red line, round bullet), the RMSEs are assessed by one and only one layout for every simulation. Otherwise, in case of three other metaheuristic algorithms, the RMSEs are assessed by a newly-generated layout for each simulation. As the result of this comparison, it is obvious that there a consistency in the performance of CDSN-based layout, although its performance is not the best one, i.e., the performance of the new layout is always better than one of the current layout. As per PSO algorithm (green line, diamond bullet), a similar result is obtained, i.e., a rather consistency in performance is observed; but it is not really competent. The improved hill climbing search algorithm (including the balance function) gives an efficient performance (black line, round bullet), while tabu search one has an inconsistent performance (brown line, diamond bullet). As discussed above, this comparison is just for reference. In fact, we suggest that the second round of assessment must be performed. In this round, top five potential layouts of each metaheuristic algorithm should be individually assessed to find out the best one. 5.5. Conclusion

Chapter 5 introduces the final process of the network optimization, where the potential layouts are assessed to identify the best solution. As aforementioned, actually, the assessment of an environmental surveillance network is a difficult work for any optimizer; and our virtual laboratory can support an economic and efficient way in solving this problem. However, all presented modeling and simulation processes are

Chapter 5 ‒ Experimental assessment just applied for the insect ecosystem with a light-trap network. In Chapter 6, we will generalize this approach as a methodological proposal for similar environmental surveillance networks.

Chapter 6 ‒Methodological proposal

CHAPTER 6

METHODOLOGICAL PROPOSAL

This thesis is presented following the bottom-up approach, as per which firstly a concrete case of an insect ecosystem and a light-trap network is studied. Then, secondly, we try to generalize the method applied in the concrete case to make it reusable for other similar cases. This chapter is dedicated to the presentation of the generalization process in this chapter, where all the components and processes will be systematized as a methodological proposal. We hope that this proposal can support the optimizers to make their own virtual laboratory better.

6.1. Introduction

This chapter provides a general methodological proposal for supporting the design and optimization of environmental surveillance network in multi-agents. In fact, our proposal is based on the experience obtained when working with the light-trap network of the insect ecosystem. Hence, this proposal aims at describing a general virtual laboratory for many similar environmental surveillance networks. In practice, each environmental surveillance network has its own individual characteristics, hence, it is necessary to adapt these characteristics to a virtual laboratory. Therefore, there are three main stages needed to be clarified before developing any specific virtual laboratory: (1) System analysis and design: This stage aims at determining the available data sources and designing the architecture of the virtual laboratory. Determining the available data sources, which can be reached by modelers, is an important task in analysis process. In fact, these data sources strongly affect the design of the whole system. The inputs/ouputs and functional components will be identified after these analyses. (2) Construction: Databases and all functional components will be implemented in this stage. The virtual laboratory must support to perform various layouts of

Chapter 6 ‒Methodological proposal

surveillance network. Also, all important characteristics of ecosystem must be well modeled. (3) Optimization by simulation: Applying multiple virtual experiments including proposing new potential layouts of surveillance network and assessing them inside the virtual laboratory. Based on the experiences of the light-trap network applied in insect ecosystem, we propose the architecture of a general virtual laboratory, which can adapt to other similar surveillance networks well. This virtual laboratory, as mentioned above, is intentionally built to support the design and optimization of an environmental surveillance network. An abstract view of the general virtual laboratory is illustrated in Figure 64.

Figure 64. Abstract view of the general virtual laboratory.

Figure 64 shows an extended version of NOVEL in Figure 14. This figure presents an abstract view of the general virtual laboratory. Two significant differences between these two figures are found: (1) Parameters and (2) Functional components.

6.1.1. Parameters

The environmental/ecological system, in a narrow sense, can be assumed as a “semiparametric” model. A “semi-parametric” model contains infinite-dimensional parameters of interest and infinite-dimensional nuisance parameters [139]. As per this assumption, the parameters are classified in 2 groups: parameters of interest and nuisance parameters. In other words, these parameters are divided into two groups: abstract parameters (aggregation of parameters of interest) and combinational parameters (aggregation of nuisance parameters).

Chapter 6 ‒Methodological proposal

Supposing that there are n parameters of interest in the system. We propose that all these n parameters of interest should be analyzed and abstracted. After the process of analyses and abstraction, only m abstract parameters are remained. These m abstract parameters will be used as the main parameters of the dynamic models. In addition, we propose that all nuisance parameters must be combined within k combinational parameters. For example, only two combinational parameters are used for an insect ecosystem: attractiveness and obstruction indices. Finally, a set of m abstract parameters and k combinational parameters becomes the final list of applied parameters of the virtual laboratory.

6.1.2. Functional components

Instead of three components, as presented in Figure 14, we suggest to build a general virtual laboratory with four components: (1) Ecological / Environmental Models (EEMs), (2) Surveillance Network Model (SNM), (3) Optimizers and (4) Evaluators. Two next sections will introduce, in more details, the method to build a virtual laboratory for various application domains. In this section, we try to classify the application domains and to identify the related works for each case. Section 6.2 will present a 5- stage procedure of our methodological proposal. In practice, environmental surveillance networks exist in many forms. They differ from each other not only by the intended use, but also by the way of collecting data. This diversity partly comes from the characteristics of study objects, relating to the inherent properties of the environment. We suggest that our approach, based on simulation, can be applied in a wide range of surveillance networks. 6.1.2.1. Building the EEMs On the other hand, EEMs of each application are more difficult to be modeled because of the complexity of the environment. However, the common similarity of these applications as per our request is that the “parameters of interest” must be installed in a cellular automaton. The environmental cellular automaton in Figure 17 is an example of this important requirement. As mentioned above, a cellular automaton can support a good interaction with measuring device agents. And through this model, it is easier to map the Gaussian process entropy with each locations of the study region. Thus, building the EEMs of virtual laboratories is the most fundamental difference between applications. In Figure 64, we easily realize that these differences essentially depend on their inputs, or the ecological and environmental data. These data are also considered as the “nuisance parameters” in our research. We classify these data into three main groups: (1) Natural phenomena data (used for environmental models), (2) Biological organism data (used for ecological models) and (3) Human impact data (used for both kinds of models). Here we introduce a set of applications for each group:

° Environmental models The meteorological or hydrological monitoring networks [140] [141] are two typical applications. Meteorological stations [142] often provide the information about the wind speed and direction, air temperature, rainfall gauge, humidity, barometric 99

Chapter 6 ‒Methodological proposal

pressure, etc. On the other hand, the hydrological stations10, e.g., stations of Mekong River Commission11, provide the information about the surface water, groundwater, water quality [143] [144]. Some physical-chemical properties of water can be measured by some specialized equipments, such as the salinity, dissolved oxygen, pH. etc [141]. Also, some special phenomena, such as earthquakes, tsunamis, etc. , are included in this kind of data. In [145], Nguyen et. al introduced a water-monitoring activity based on wells digging in Can Tho city, Vietnam. The changes of data of natural phenomena are often concerned with some basic principles of the physics, chemistry or mechanics. MM5 model can be used to predict atmospheric circulation at meso-scale [146]. WRF (Weather Research & Forecasting) model [147] is also an competitor of MM5. MIKE 11 software [148], developed by DHI12 group, can be an interesting reference for simulating the flow and water level, water quality and sediment transport in the rivers, flood plains, irrigation canals, reservoirs and other inland water bodies. Some studies conducted based on this software are developed further for the Mekong delta region. Hydrodynamic models are developed with counting the impacts of climate changes on fluvial flood propagation [149] [150], which can be also implemented into the virtual laboratory.

° Biological models A network of insects monitoring, such as light-trap network, is a typical example for this application [135] [136]. The organism objects can be plants in forest [37] [70], fish in the rivers/seas [96] [152], mosquitoes [153] or birds [110] in the study region. Biological organisms often behave following some specific biological principles; their two main activities are the growth and migration processes. Last but not least, the prey/predator relation [59] [141] is one of strong impacts on the distribution of organisms. We suggest that several growth [7] [15] [48] and migration [49] [50] models, presented in Chapter 2, should be considered in a number of specific case studies. Prey/predator models [45] and SIR models [46] are also the important ones needed to be applied for these biological organisms.

° Models of human impacts The human impacts, in many situations, increasingly affect on the ecological and environmental systems. In many cases, these impacts can totally change the distribution of a species in the environment, for example, the deforestation or land- use change as per warnings of WWF13. Controlling the spraying of pesticides and using of fertilizers in field is another example [26] [27]. According to warnings of Myers in 2003, the rapid worldwide depletion (or even extinction [155]) of predatory fish communities is caused by the ecological effects of industrialized fishing [156].

10 https://www.swfwmd.state.fl.us/data/hydrologic/ 11 http://ffw.mrcmekong.org/ 12 http://www.dhigroup.com/ 13 http://worldwildlife.org/threats/deforestation 100

Chapter 6 ‒Methodologicacal proposal

That is the reason why tthe human factors are required to be cononsidered in the EEMs. 6.1.2.2. Building the SNM As we have presented in SSection 3.3, two kinds of networks can be reused: (1) Disjointed Surveillance Netwtwork (DSN) and (2) Correlated & Diskisk graph-based Surveillance Network (CDSN)N). In case of DSN, as presentedted in Section 3.3.1, only the Node agent is required to be implemented. The most immportant role of this agent is to “monitnitor” the virtual environment, i.e., it reads ththe information of one or several neighboror cell agents of cellular automata. The sampling coordinates, aas presented in Figure 64, are used to determine the locations of Node agents in SSNM. The data structure of m abstract paarameters is the greatest concern for this agenent. In case of univariate variable, where m isi equal to one, we can use a vector, like thee Empirical_trap_density property in Tableble 7, to manage this parameter. Otherwise, iff m is greater than one, we must require a matrixma to manage these ones. A definition of datata matrix is presented as the following: Definition. Data matrix M.

Denoting the data matrix of one measuring deviced d0 if contains the measuring dataa captured by d0, where n denotes the obserervation duration (or the number of measuring ttimes) and m is the number of abstract pararameters. In case of CDSN, as presenented in Section 3.3.2, we need to implemement two more agents, Edge and Graph agegents. We suggest that, in case of multivariatate variables, the correlation condition in Equatation 14 should be replaced by an aggregatate function; this aggregate function is based oon multiple correlation coefficients, where eache coefficient is calculated from one single vvariable. 6.1.2.3. Building “Optimizersrs” “Optimizers” is a componenent which contains multiple optimizatiotion algorithms. Theoretically, the richness of algorithms will provide more choices of potentialpo layouts, and the final one could be betetter. We suggest that all presented algorithmsthm in Section 4 are probably enough for thehe fundamental requirements. However, statandardizing the library of optimization algoriththms is also one of our concerns in future woorks; especially, we will focus on adding moore metaheuristic algorithms due to their efficiency. This support will improve the ququality of optimization strategies as per theth richness of solutions. In general, there arere two important libraries needed be completleted: ° Deterministic algorithms:: EExisting libraries in R language may be the best choices, such as AlgDesign, conf.dnf.design, planor, crossdes, DoE.base, daeae or blockTools (See for more details in [9[95]). Specially, the AlgDesign library is closely clo related to works presented by Atkinsnson in 2007 [22].

101

Chapter 6 ‒Methodological proposal

° Metaheuristic algorithms: These algorithms are well adaptive with the agent-based models. Almost all of algorithms implemented in micro- and meso-scales of this thesis are metaheuristic algorithms. As aforementioned, the heuristic cellular automaton is one of the most important factors which can affect the efficiency of an optimization algorithm. We continue to suggest that the heuristic cellular automaton based on the UKSEs is reused for all the applications. Depending on the specific situation, several combinational indices can be defined and used to build a heuristic cellular automaton based on the combinational arguments, e.g., the obstruction index in case of insect ecosystem. Multivariate variables can be considered when building a heuristic cellular automaton. We suggest that the variance of co-kriging estimation [137] could be the best choice for this map. 6.1.2.4. Building “Evaluators” We suggest to reuse three performance indicators defined in Section 5.3. There is no significant change between the general “Evaluators” with one implemented for insect ecosystem. 6.2. Methodological proposal

In general, our methodological proposal is mainly based on the abstract view illustrated in Figure 64. Thus, a virtual laboratory for a new environmental/ecological surveillance network can be built by a 5-stage procedure as the following: ° Stage 1: Data collection & Parameter analysis This stage is very important as it decides the aspect of the virtual laboratory. To optimize any environmental surveillance network, we need a sufficient amount of empirical data from the existing network. Surely, the parameters of interest are determined by the observed variables of the existing network. For the end users, it is more difficult to determine the nuisance parameters because a lot of candidate parameters exist. Furthermore, the requested data are not always available. The output of this first stage is a set of abstract parameters and combinational parameters. ° Stage 2: Implementation of an integrated model of EEMs This is the most difficult stage for any complex system. Implementing multiple biological and environmental models often requires a multidisciplinary knowledge of the researchers, which must cover both the understanding of each separate science and the understanding of the connection between them. As the result, the combination and coupling techniques are usually preferred. In addition, almost all models are organized as a multi-scale system. Several important properties of the environment must be identified and satisfied in the integrated model. For example, such properties of the insect ecosystem must be

102

Chapter 6 ‒Methodological proposal

satisfied: randomness & autocorrelation, spatial dispersion & nonergodicity, and heterogeneity. If the integrated model of EEMs is well verified and calibrated, it will be a prediction model of the observed variables. However, at the lower extent of the requirement, just satisfying specific properties is sufficient for the further optimization strategies. In the virtual laboratory, the study region is required to be discretized as a grid of square or rectangle cells. Thus, as per this cell grid, the cellular automata are obligatorily deployed for integrated model, and the observed variables must be stored as the properties of cell agents. This constraint is the most important condition for the integration between integrated model and the next Surveillance Network Model. ° Stage 3: Implementation of Surveillance Network Model (SNM) SNM is the core of the virtual laboratory. The current layout of existing surveillance network is considered as the input of the virtual laboratory, and the output is an "optimal" one. The measuring device agents are the basis components of the network. Each measuring device agent will be located in a cell agent. The information of this cell will be visible to the located device. Each cell is measured by at most one measuring device, and surely only a limited number of cells are measured by measuring devices. The SNM will be optimized by the Optimizers (Figure 64). The capability of each potential layout is evaluated by some specific performance indicators. The best one will be chosen for the environmental surveillance network. The CDSN is an extended support for the SNM, this is an optional choice in the virtual laboratory. ° Stage 4: Implementation of optimization algorithms The virtual laboratory supports the optimization algorithms for different scales of surveillance network representation. Some of them are presented in Chapter 4. Although almost all optimization techniques consider only two types of optimization: local or global, the virtual laboratory classifies these optimization algorithms into three scales: micro-, meso- and macro-scales. Any algorithm could be implemented inside this laboratory, some of them are listed as follows: (1) Micro-scale algorithms (Local search): Tabu search, Hill climbing search, Guided Local Search, or greedy randomized adaptive search procedure, etc. (2) Meso-scale algorithms (Proximity search): PSO (Particle Swarm Optimization), Ant Colony Optimization, etc. (3) Macro-scale algorithms (Global search): CDSN-based optimization, D-optimal design (minimizing the log-determinant), A-optimal design (minimizing the trace), and E-optimal design (minimizing the spectral radius of the error covariance), etc. Heuristic argument should be an important parameter of optimization algorithms. This argument is get from a heuristic cellular automaton for whole study region. Two following heuristic arguments should be supported in the virtual laboratory:

103

Chapter 6 ‒Methodological proposal

(1) Gaussian process entropy: The UKV or UKSE can be used for this argument. (2) Combinational heuristics: A combination between Gaussian process entropy and the local constraints. In addition, we can ignore Stage 4 and apply the optimization process outside of the virtual laboratory. In that case, the optimizers just have to assess their individual potential network layouts generated by the external tools. ° Stage 5: Assessment of potential network layouts (Experimental assessment) This function is applied by the Evaluators (Figure 64). Evaluators receive a potential layout of SNM and the simulation data of EEMs, and then provide its performance as per a specific indicator. Three following performance indicators are proposed: (1) RMSE of variogram model, (2) Deviation of cross-validation based on estimation mean, and (3) Mean Universal Kriging Variance (MUKV). The assessment must be repeated at least k times, e.g., in Chapter 5 we perform this process with k = 15. The final performance of a potential layout will be aggregated from the results of all these simulations. The potential layout with best performance will be chosen and considered as the optimal one. 6.3. Implementation and Release

A last point of this methodological proposal is that of its implementation. Since it has been our support tool for the whole PhD thesis, we think that the GAMA platform [29] [117], or an equivalent platform, such as Repast, is needed to implement such a virtual laboratory, essentially because it is based on a number of complex couplings between modeling techniques, optimization algorithms, simulation runs, etc. So a good support of GIS data, multi-scale processes, and Spatial & Distributed RDBMS, R language, is more or less required. GAMA is actually the best platform nowadays to implement all of these, and it is our aim to propose an abstract framework that would support modelers in defining their own “virtual laboratories”. Some detail of the code used in GAMA (GAML) and R language are presented in Annexes 1 and 2.

104

Chapter 7 ‒ Conclusion and perspective

CHAPTER 7

CONCLUSION AND PERSPECTIVES

In this work, we have designed and built a virtual laboratory called NOVEL that allows modeling, simulating, optimizing and assessing a light-trap surveillance network applied to insect invasions in the Mekong delta region of Vietnam. This virtual laboratory contains four main sub-systems: (1) Ecological Models, (2) Model of the Light-trap Network, (3) Optimization Processes and (4) Assessment Processes. In this laboratory, all the important characteristics of BPH invasions are thoroughly modeled; multiple local constraints are also modeled and considered in the dynamics of insects via two combinational indices: attractiveness and obstruction indices. Cellular automata technique is used for modeling the environment, which can interact with different types of agents in a GIS-based and agent-based platform ‒ GAMA. The surveillance network is also modeled as a perception component of the insect invasion inside NOVEL. Surveillance Network Model is separated into two groups: Disjointed Surveillance Network and Correlated & Disk graph-based Surveillance Network (CDSN). Based on the models implemented, multiple optimization techniques have been experimented and assessed inside this virtual laboratory. We consider that this integrated architecture, further generalized in Chapter 6, is the first and more important contribution of this thesis. In this laboratory, the optimization of an environmental surveillance network is performed in two stages: (1) A first stage of modeling, in which three scales of network optimization are defined together with their appropriate optimization algorithms and (2) a second stage of optimization, where potential layouts are experimentally assessed to seek for the “near-optimal” one. These two stages, on both their theoretical and experimental aspects, can be considered as the second and third contributions of this thesis. In the second contribution, concretely introduced in Chapter 4, the optimization processes are based on different heuristic functions which use information at different scales: micro-, meso- and macro-scales. At each scale, different algorithms are at the disposal of the user. Beside the metaheuristic optimization techniques, defined as the micro-scale approaches (e.g., hill climbing search or tabu search) and the meso-scale approaches (e.g., PSO or improved hill climbing search algorithm), we proposed a novel macro-scale optimization technique based on the Correlated & Disk graph-based

105

Chapter 7 ‒ Conclusion and perspective

Surveillance Network (CDSN) ‒ an improved approach of Unit Disk Graph with support of the correlation-based similarity. Finally, multiple local constraints are thoroughly considered in all implemented optimization algorithms. Only “feasible” potential layouts are accepted as the output of our algorithms. Assessing the performance of a network completely relies on simulations and can be considered as a “virtual” assessment. This support overcomes two difficulties of the end users when applying traditional assessment approaches: (1) the cost of real experiments, usually high, and (2) the complexity of the factors which affect the performance of the network. In the specific case of light-trap networks, we have performed all optimization and assessment processes in the most possibly realistic context, using multiple confident data sources listed in Annex 1. We also present some performance indicators to assess the network layouts, where the RMSEs of the variogram model are suggested as the most efficient one given its remarkable features and computation speed. Furthermore, this indicator can be representative of other performance indicators. In addition, our virtual laboratory is flexible enough to allow for the assessment of external network layouts. Finally, and more importantly, our approach is able to be generalized into a methodological proposal, which can be applied to multiple types of environmental surveillance networks. This proposal is based on four main components: (1) Ecological / Environmental Models (EEMs), (2) Surveillance Network Model (SNM), (3) Optimizers and (4) Evaluators. We also listed three groups of possible models that can be deployed for applications inside the “Ecological Models”. These models depend on the types of data used in the virtual laboratory: natural phenomena, biological organisms and human impacts. A 5-stage procedure to implement a new virtual laboratory is also presented including some implementation and release guides in GAMA platform. We consider that this methodological proposal is an extension of the first contribution, where the architecture of a general virtual laboratory is fully explained. Perspectives In the application domain, expanding the application to the whole Mekong Delta region is the first important perspective, especially by supplying NOVEL to the Southern Research Plant Protection Center (SRPPC). In fact, this perspective can be probably executed based on an existing collaboration between CTU/IRD and SRPPC in the Vietnamese national project KC.01.15/06-10 and some other historical relations. Pertaining to the perspectives on NOVEL itself, we will focus one some studies as follows: ° Better integration of heterogeneous models: This work can inherit some studies of Huynh Quang Nghi, PhD student of UPMC/IRD/CTU and co-models in GAMA. ° Integration of more realistic models for the Ecological models. ° Integration of more optimization algorithms. Furthermore, the methodological perspectives of generalization aims at building a virtual laboratory for applying multiple layout strategies of surveillance networks. The initial promising results of our research can probably be extended to other complex surveillance systems. An important advantage of agent-based models is the possibility

106

Chapter 7 ‒ Conclusion and perspective of considering multiple sides of a question, and especially, of evaluating the system at multiple scales. Theorically, implementing new models or replacing the current model by another one is totally possible for EEMs and SNM. Therefore, beside the methodological proposal for similar surveillance networks presented in Chapter 6, we think that the virtual laboratory approach can also be applied for different systems. Among many applications, three groups of applications are examined in this section: Plant protection and Quarantine, Environment protection, Public health & Epidemiology. Suggestions for implementing these applications on EEMs and SNM(s) are presented below: ° Plant protection and quarantine: Back to the plant protection activities in the Mekong region of Vietnam, many research questions are still open for further implementation and improvement. Prey/predator interaction is one of the first relations which can be implemented into the EEMs, such as the role of Cyrtorhinus lividipennis ‒ a direct enemy of BPHs. BPHs can transmit both rice grassy stunt (RGSV) and rice ragged stunt (RRSV) viruses into the rice [3]; these relations can also be implemented into the EEMs. Beside the collected data of the light-trap network [11], sticky traps can be used for insect monitoring [67]. Moreover, to monitor the insect invasion and the other diseases in the rice field, a network of “human sensors” has been organized. This network comprises agents working at plant protection stations and farmers. They frequently visit the rice field and collect necessary data. The concept of “human sensors” is also found in [157] but in a different meaning, where people share their information on several social media, e.g., YouTube videos or Flickr images. Building a virtual laboratory made of several surveillance networks, e.g., light-trap network and "human sensors", appears to be completely feasible in our approach. With it, we can propose an optimal strategy regarding the number of human agents, the number of light-traps, their operations and taking into accounts “qualitative” constraints such as the behavior or activities of people. ° Environment protection Waste production, air pollution, and loss of biodiversity are some of the issues related to environmental protection. Environment protection is an important research domain, which is often a mojor concern at the national level. And the hierarchical management system is rather tight, such as the meteorological or hydrological monitoring networks [131] [132], forest [37] [70] and marine [158] monitoring networks, etc. An outstanding characteristic of these systems is that they require a high accuracy and must operate regularly in collecting, processing data and forecasting, of which the end users are very diverse. In practice, in a study region, there are many environmental surveillances, hence the optimization at the global scale becomes an important attention of the decision makers. Optimizing the common performance of multiple surveillance networks, which are managed by different owners, is a big problem to which our approach can probably bring some solutions.

107

Chapter 7 ‒ Conclusion and perspective

° Public health & Epidemiology: Public health & Epidemiology are areas of interest of many countries, especially developing ones like Vietnam. The main “agents” in these systems include people, other epidemiological vectors (e.g., mosquitoes, birds and poultry) and diseases pathogens (e.g viruses or bacteria). The interaction between different ecosystems, or between the human behaviors and these ecosystems, require more investments in multidisciplinary studies. For example, in [159], Teurlai et al. asked the question: “Can human movements explain heterogeneous propagation of dengue fever in Cambodia?”; this question concerns several ecological communities: dengue viruses, poultry, wild birds, people. Studies on both Avian influenza virus (AIV) epidemics and bird community dynamics via the fecal–oral route in water [160] [161] [162] are also the examples of the propagation of viruses inside these communities. In addition, the impacts of environmental factors of the water, e.g., the temperature, salinity or pH degree, into the AVI pathogens are also discussed in [163]. There are also many studies for building disease propagation models in recent years. An epidemiology ABM for H5N1 propagation at micro-scale (village scale) has been proposed in [164]. In this model, the environment, including the placement of houses, inner-village grounds, rice-fields, road/street and ponds, and the poultry production system (normally in the backyard of each house) are modeled to study the H5N1 propagation. Such a model can be combined with other ones at higher scales to understand and survey the propagation of AVI in water found in [160] [161] [162] [163] and, for example, assess existing epidemiological surveillance networks. The same goes for diseases shared by animals and humans. Usually, at least two types of surveillance networks operate simultaneously in these situations. The first type concerns the infected animals and the second one infected people. While infected animals can be monitored by various ways (e.g., ovitrap for mosquito surveillance [165] or blood samples on birds [166]), infected people are often controlled by diagnostic tests, e.g., dengue cases [159], applied at hospitals and health centres and often integrated in a broader program as recommended by the WHO [161]. Therefore, optimizing such surveillance systems, composed of multiple, possibly conflicting, sub-networks is a clear case study for NOVEL in future works. In conclusion, we defend our “virtual laboratory” approach as a potential solution for designing and optimizing multiple types of surveillance networks; and our methodological proposal can be readily applied at multiple scales of applications. Depending on the scopes and the concerns of the stakeholders, different models can be built and deployed inside the laboratory, or replaced by other ones to achieve their final goal. This also enables the use of multiple models to represent different aspects of the complexity of real situations, thus cutting across different disciplines and supporting more multi-disciplinary approaches and applications. Finally, and this was, at the beginning, our primary goal, since realistic scenarios can be easily simulated, our work has the huge advantage of simultaneously being easily understandable and potentially adaptable by potential end users and policy makers on all the questions related to environmental surveillance.

108

REFERENCES

[1] “Document on Brown Plant Hoppers and the diseases caused by these insects,” Minitry of Agriculture in Vietnam, 2009. [Online]. Available: http://www.agroviet.gov.vn/pls/portal/docs/1077832.PDF. [Accessed: 19-Mar- 2009].

[2] M. Matsumuza, H. Takeuchi, M. Satoh, S. Sanada-Morimura, A. Otuka, T. Watanabe, and D. Van Thanh, “Current status of insecticide resistance in rice planthoppers in Asia,” in in Planthoppers: New Threats to the Sustainability of Intensive Rice Production Systems in Asia, Kong Luen Heong, Ed. Int. Rice Res. Inst., 2009, pp. 233–244.

[3] R. C. ; C. Cabauatan, Pepito Q; Cabunagan and Il-Ryong;, “Rice viruses transmitted by the brown planthopper Nilaparvata lugens Stål,” in in Planthoppers: New Threats to the Sustainability of Intensive Rice Production Systems in Asia, Kong Luen Heong, Ed. Int. Rice Res. Inst., 2009, pp. 357–368.

[4] A. Otuka, T. Watanabe, Y. Suzuki, M. Matsumura, A. Furuno, and M. Chino, “A migration analysis of the rice planthopper Nilaparvata lugens from the Philippines to East Asia with three-dimensional computer simulations,” Soc. Popul. Ecol., vol. 47, no. Springer, 2005.

[5] J. T. Cronin, “Movement and spatial population structure of a prairie planthopper,” Ecology, vol. 84, no. 5, pp. 1179–1188, 2003.

[6] V. L. Pham, Things to know about Brown Plant Hopper and preventive measures. Labour Publishing House, Viet nam (In Vietnamese), 2006.

[7] N. H. Ngo, Principles and Applications in Mathematical model for Biological Studies, Agriculture and Environment. Vietnam: Agriculture Publishing House, 2008.

[8] D. J. Brus and G. B. M. Heuvelink, “Optimization of sample patterns for universal kriging of environmental variables,” Geoderma, vol. 138, no. 1–2, pp. 86–95, 2007.

[9] W. F. Caselton and J. V Zidek, “Optimal monitoring network designs,” Stat. Probab. Lett., vol. 2, no. 4, pp. 223–227, Aug. 1984.

[10] V. X. Truong, H. X. Huynh, M. N. Le, and A. Drogoul, “Estimating the Density of Brown Plant Hoppers from a Light-Traps Network Based on Unit Disk Graph,” in AMT 2011, 2011, vol. 6890, pp. 276–287.

109

[11] A. Spalding and M. Parsons, “Light trap transects – a field method for ascertaining the habitat preferences of night-flying Lepidoptera, using Mythimna turca (Linnaeus 1761) (Lepidoptera: Noctuidae) as an example,” J. Insect Conserv., vol. 8, no. 2–3, pp. 185–190, 2004.

[12] M. Kato, T. Itioka, S. Sakai, K. Momose, S. Yamane, A. A. Hamid, and T. Inoue, “Various population fluctuation patterns of light-attracted beetles in a tropical lowland dipterocarp forest in Sarawak,” Popul. Ecol., vol. 42, no. 1, pp. 97–104, 2000.

[13] D. Tran Hong, “Applying the geographical information system in managing the plant protection database of preventing the rice pest in Hau Giang province, Master thesis (In Vietnamese),” Can Tho university, 2010.

[14] Q. M. Vo, V. Q. Nguyen, C. Q. Truong, and M. H. Nguyen, “An Initial Method for Predicting the Spatial Distribution of Pest and Disease Using a Geographic Information System (GIS). A case study of Brown Backed Rice Plant Hopper (Nivaparvata lugens) in Trung An village, Thotnot province, Cantho city.,” GIS Dev., 2006.

[15] Finbarr Horgan, “Mechanisms of resistance: a major gap in understanding planthopper-rice interactions,” in in Planthoppers: New Threats to the Sustainability of Intensive Rice Production Systems in Asia, Kong Luen Heong, Ed. Int. Rice Res. Inst., 2009, pp. 281–302.

[16] K. L. Heong, Planthoppers: New Threats to the Sustainability of Intensive Rice Production Systems in Asia. International Rice Research Institute (IRRI), 2009.

[17] A. Otuka, “Migration of rice planthoppers and simulation techniques,” in in Planthoppers: New Threats to the Sustainability of Intensive Rice Production Systems in Asia, Kong Luen Heong, Ed. Int. Rice Res. Inst., 2009, pp. 343–356.

[18] J. L. A. Catindig, G. S. Arida, S. E. Baehaki, J. S. Bentur, L. Q. Cuong, M. Norowi, W. Rattanakarn, W. Sriratanasak, J. Xia, and Z. Lu, “Situation of planthoppers in Asia,” in in Planthoppers: New Threats to the Sustainability of Intensive Rice Production Systems in Asia, Kong Luen Heong, Ed. Int. Rice Res. Inst., 2009, pp. 191–220.

[19] G. Matheron, “Principles of geostatistics,” Econ. Geol., vol. 58, no. 8, pp. 1246– 1266, 1963.

[20] N. A. C. Cressie, Statistics for Spatial Data, Revised Ed. Wiley-Interscience, 1993.

[21] L. Q. Thang, “Final report of project KC.01.15/06-10: ‘Design of Information Systems for supporting the Protection against Epidemic Diseases of crop plants

110

and aquaculture in major economic areas’, directed Dr. Le Quyet Thang (in Vietnamese),” Can Tho, Vietnam, 2010.

[22] A. Atkinson, A. Donev, and R. Tobias, Optimum Experimental Designs, with SAS (Oxford Statistical Science Series). Oxford University Press, USA, 2007.

[23] I. Osman and G. Laporte, “Metaheuristics: A bibliography,” Ann. Oper. Res., vol. 63, no. 5, pp. 511–623, 1996.

[24] C. Blum and A. Roli, “Metaheuristics in combinatorial optimization: Overview and conceptual comparison,” ACM Comput. Surv., vol. 35, no. 3, pp. 268–308, Sep. 2003.

[25] I. Boussaïd, J. Lepagnot, and P. Siarry, “A survey on optimization metaheuristics,” Inf. Sci. (Ny)., vol. 237, no. 0, pp. 82–117, 2013.

[26] N. H. Huan, L. V Thiet, H. V Chien, and K. L. Heong, “Farmers’ participatory evaluation of reducing pesticides, fertilizers and seed rates in rice farming in the Mekong Delta, Vietnam,” Crop Prot., vol. 24, no. 5, pp. 457–464, 2005.

[27] K. L. Heong, M. M. Escalada, N. H. Huan, H. V Chien, and P. V Quynh, “Scaling out communication to rural farmers: lessons from the ``Three Reductions, Three Gains’' campaign in Vietnam,” in Research to impact: case studies for natural resource management for irrigated rice in Asia, 2011, pp. 207–220.

[28] H. Chernoff, Sequential Analysis and Optimal Design. Society for Industrial and Applied Mathematics, 1972.

[29] L. S. Fichter, E. J. Pyle, and S. J. Whitmeyer, “Strategies and Rubrics for Teaching Chaos and Complex Systems Theories as Elaborating, Self-Organizing, and Fractionating Evolutionary Systems,” J. Geosci. Educ., vol. 58, no. 2, pp. 65– 85, 2010.

[30] V. Grimm and S. F. Railsback, Individual-based Modeling and Ecology. Princeton University Press, 2005.

[31] P. W. Anderson, K. J. Arrow, D. Pines, and N. M. . Santa Fe Institute (Santa Fe, The Economy as an Evolving Complex System: The Proceedings of the Evolutionary Paths of the Global Economy Workshop, Held September, 1987 in Santa Fe, New Mexico, no. tập 1. Addison-Wesley, 1988.

[32] “Journal of Artificial Societies and Social Simulation.” [Online]. Available: http://jasss.soc.surrey.ac.uk/admin/about.html.

[33] M. Paegelow and M. T. C. Olmedo, Modelling Environmental Dynamics: Advances in Geomatic Solutions. Springer, 2008.

111

[34] J. Wainwright and M. Mulligan, Environmental Modelling: Finding Simplicity in Complexity. Wiley, 2013.

[35] D. Servat, E. Perrier, J.-P. Treuil, and A. Drogoul, “When Agents Emerge from Agents: Introducing Multi-scale Viewpoints in Multi-agent Simulations,” in in Multi- Agent Systems and Agent-Based Simulation, vol. 1534, J. Sichman, R. Conte, and N. Gilbert, Eds. Springer Berlin Heidelberg, 1998, pp. 183–198.

[36] D.-A. Vo, A. Drogoul, and J.-D. Zucker, “An Operational Meta-Model for Handling Multiple Scales in Agent-Based Simulations,” in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2012 IEEE RIVF International Conference on, 2012, pp. 1–6.

[37] U. Wilensky, “NetLogo,” Northwestern University. .

[38] N. Minar, R. Burkhart, C. Langton, and M. Askenazi, “The SWARM simulation system: A toolkit for building multi-agent simulations,” 1996.

[39] A. T. Crooks, “The Repast Simulation/Modelling System for Geospatial Simulation.” .

[40] D.-A. Vo, A. Drogoul, and J.-D. Zucker, “Multi-Level Agent-Based Modeling: a Generic Approach and an Implementation.,” in KES-AMSTA, 2013, vol. 252, pp. 91–101.

[41] A. Drogoul, E. Amouroux, P. Caillou, B. Gaudou, A. Grignard, N. Marilleau, P. Taillandier, M. Vavasseur, D.-A. Vo, and J.-D. Zucker, “GAMA: A Spatially Explicit, Multi-level, Agent-Based Modeling and Simulation Platform,” in in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 7879, Y. Demazeau, T. Ishida, J. Corchado, and J. Bajo, Eds. Springer Berlin Heidelberg, 2013, pp. 271–274.

[42] R. A. Fisher, The design of experiments. Hafner Pub. Co., 1966.

[43] M. Kèohl, S. S. Magnussen, and M. Marchetti, Sampling Methods, Remote Sensing and GIS Multiresource Forest Inventory. Springer Berlin Heidelberg, 2006.

[44] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1994.

[45] H. I. Freedman, Deterministic mathematical models in population ecology. New York: M. Dekker, 1980.

112

[46] E. Kouokam, P. Auger, H. Hbid, and M. Tchuente, “Effect of the number of patches in a multi-patch SIRS model with fast migration on the basic reproduction rate,” Acta Biotheor., vol. 56, no. 1–2, 2008.

[47] P. Auger, C. Lett, and J.-C. Poggiale, Modélisation mathématique en écologie : cours et exercices corrigés. Paris: Dunod, 2010.

[48] B. Dennis, R. A. Desharnais, J. Cushing, S. M. Henson, and R. Costantino, “Estimating chaos and complex dynamics in an insect population,” Ecol. Monogr., vol. 71, no. 2, pp. 277–303, 2001.

[49] H. Laurie and E. Perrier, “A multifractal model for spatial variation in species richness,” Ecol. Complex., vol. 7, no. 1, pp. 32–35, 2010.

[50] C. H. Phan, H. X. Huynh, and A. Drogoul, “An agent-based approach to the simulation of Brown Plant Hopper (BPH) invasions in the Mekong Delta,” in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010 IEEE RIVF International Conference, 2010, pp. 1–6.

[51] V. G. N. Nguyen, A. Drogoul, and H. X. Huynh, “Toward an Agent-Based Multi- scale Recommendation System for Brown Plant Hopper Control,” in Computer Modeling and Simulation (EMS), 2012 Sixth UKSim/AMSS European Symposium on, 2012, pp. 9–14.

[52] N. Cressie and C. K. Wikle, Statistics for Spatio-Temporal Data (Wiley Series in Probability and Statistics), 1st ed. Wiley, 2011.

[53] E. Pebesma, “spacetime: Spatio-Temporal Data in R,” J. Stat. Softw., vol. 51, no. 7, pp. 1–30, 2012.

[54] J. G. De Gooijer and R. J. Hyndman, “25 years of time series forecasting,” Int. J. Forecast., vol. 22, no. 3, pp. 443–473, 2006.

[55] I. Mahfoud, D. Josselin, and B. Fady, “Sensibilité des indices de diversité à l’agrégation,” Rev. Int. Géomatique, vol. 17, no. 3–4, pp. 293–308, 2007.

[56] D. Josselin, “Spatial data exploratory analysis and usability.,” Data Sci. J., vol. 2, pp. 100–116, 2003.

[57] B. A. Melbourne, H. V Cornell, K. F. Davies, C. J. Dugaw, S. Elmendorf, A. L. Freestone, R. J. Hall, S. Harrison, A. Hastings, M. Holland, M. Holyoak, J. Lambrinos, K. Moore, and H. Yokomizo, “Invasion in a heterogeneous world: resistance, coexistence or hostile takeover?,” Ecol. Lett., vol. 10, pp. 248–1461, 2007.

[58] C. E. Rasmussen, Gaussian processes for machine learning. MIT Press, 2006.

113

[59] J. R. Norris, Markov chains. Cambridge University Press, 1998, pp. I–XVI, 1–237.

[60] D. Freedman, Statistical Models: Theory And Practice. Cambridge University Press, 2005.

[61] P. Gouédard, L. Stehly, F. Brenguier, M. Campillo, Y. de Verdière, E. Larose, L. Margerin, P. Roux, F. J. Sánchez-Sesma, N. M. Shapiro, and R. L. Weaver, “Cross-correlation of random fields: mathematical approach and applications,” Geophys. Prospect., vol. 56, no. 3, pp. 375–393, 2008.

[62] B. Randal, “Variogram Tutorial.” .

[63] M. Unser, “Sampling: 60 Years After Shannon,” in Sixteenth International Conference on Digital Signal Processing (DSP’09), 2009.

[64] V. Grimm, “Ten years of individual-based modelling in ecology: what have we learned and what could we learn in the future?,” Ecol. Modell., vol. 115, no. 2–3, pp. 129–148, 1999.

[65] D. Zurell, U. Berger, J. S. Cabral, F. Jeltsch, C. N. Meynard, T. Münkemüller, N. Nehrbass, J. Pagel, B. Reineking, B. Schröder, and V. Grimm, “The virtual ecologist approach: simulating data and observers,” Oikos, vol. 119, no. 4, pp. 622–635, 2010.

[66] V. K. Huynh, J.-D. Zucker, H. X. Huynh, and A. Drogoul, “Spatial Estimator of Brown Plant Hopper Density from Light Traps Data,” in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2012 IEEE RIVF International Conference on, 2012, pp. 1–6.

[67] B. C. . A. S. . C. C. N. . H. C. Y. . R. R. S. Dyck V. A.; Misra, “Ecology of the brown planthopper in the tropics,” in in Planthoppers: New Threats to the Sustainability of Intensive Rice Production Systems in Asia, Kong Luen Heong, Ed. Int. Rice Res. Inst., 2009, pp. 61–98.

[68] A. Drogoul, “Keynote Speech: A Review of the Ontological Status, Computational Foundations and Methodological Processes of Agent-Based Modeling and Simulation Approaches: Open Challenges and Research Perspectives.,” in PRIMA, 2008, vol. 5357, p. 1.

[69] M. Niazi and A. Hussain, “Agent-based computing from multi-agent systems to agent-based models: a visual survey,” Scientometrics, vol. 89, no. 2, pp. 479– 499, Nov. 2011.

[70] G. N. Gilbert, Agent-based models. Los Angeles, CA: Sage, 2008.

114

[71] F. Glover and M. Laguna, Tabu Search. Norwell, MA, USA: Kluwer Academic Publishers, 1997.

[72] F. Cecconi and M. Campenní, “PSO (Particle Swarm Optimization): One Method, Many Possible Applications,” in in Agent-Based Evolutionary Search, vol. 5, R. Sarker and T. Ray, Eds. Springer Berlin Heidelberg, 2010, pp. 229–254.

[73] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Neural Networks, 1995. Proceedings., IEEE International Conference on, 1995, vol. 4, pp. 1942– 1948 vol.4.

[74] Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” in Evolutionary Computation Proceedings, 1998. IEEE World Congress on Computational Intelligence., The 1998 IEEE International Conference on, 1998, pp. 69–73.

[75] UMMISCO, “GAMA simulation Platform - Website: http://code.google.com/p/gama-platform/.” .

[76] P. Taillandier, D.-A. Vo, E. Amouroux, and A. Drogoul, “GAMA: A Simulation Platform That Integrates Geographical Information Data, Agent-Based Modeling and Multi-scale Control,” in in Principles and Practice of Multi-Agent Systems, vol. 7057, N. Desai, A. Liu, and M. Winikoff, Eds. Springer Berlin Heidelberg, 2012, pp. 242–258.

[77] “R language - Website: http://www.r-project.org/.” .

[78] J. Von Neumann, Theory of Self-Reproducing Automata. Champaign, IL, USA: University of Illinois Press, 1966.

[79] S. Wolfram, “Statistical mechanics of cellular automata,” Rev. Mod. Phys., vol. 55, no. 3, pp. 601–644, Jul. 1983.

[80] B. Schönfisch and A. de Roos, “Synchronous and asynchronous updating in cellular automata,” Biosystems, vol. 51, no. 3, pp. 123–143, 1999.

[81] S. Wolfram, Cellular automata and complexity: collected papers. Reading, MA: Addison-Wesley, 1994.

[82] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.

[83] A. Krause, A. Singh, and C. Guestrin, “Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies,” J. Mach. Learn. Res., vol. 9, pp. 235–284, Jun. 2008.

[84] G. M. Phillips, Interpolation and Approximation by Polynomials. Springer, 2003.

115

[85] M. D. Buhmann, Radial Basis Functions: {T}heory and Implementations. Cambridge University Press, 2003.

[86] P. B. Stark and L. Tenorio, “A Primer of Frequentist and Bayesian Inference in Inverse Problems.” .

[87] B. D. Ripley, Spatial Statistics. New York: Wiley, 1981.

[88] A. N. Kolmogorov, “Interpolation und extrapolation von stationaren zufalligen folgen,” Bull. Acad. Sci. USSR, Ser. Math, vol. 5, pp. 3–14, 1941.

[89] G. H. Jowett and J. F. Scott, “Simple Graphical Techniques for Calculating Serial and Spatial Correlations and means Semi-Squared Differences,” J. R. Stat. Soc. Ser. B, vol. 15, no. 1, pp. pp. 81–86, 1953.

[90] E. Pebesma, “Multivariable geostatistics in S: the gstat package,” Comput. Geosci., vol. 30, no. 7, pp. 683–691, Aug. 2004.

[91] H. Wackernagel, Multivariate Geostatistics. Springer, 2003.

[92] S. Ihara, Information Theory for Continuous Systems. World Scientific, 1993.

[93] C. E. Shannon, “A Mathematical Theory of Communication,” SIGMOBILE Mob. Comput. Commun. Rev., vol. 5, no. 1, pp. 3–55, Jan. 2001.

[94] K. Chaloner and I. Verdinelli, “Bayesian Experimental Design: A Review,” Stat. Sci., vol. 10, pp. 273–304, 1995.

[95] “CRAN Task View: Design of Experiments (DoE) & Analysis of Experimental Data. Website: http://cran.r-project.org/web/views/ExperimentalDesign.html,” 2013-03-20. .

[96] J. M. Bellido and N. Pérez, “A new optimal allocation sampling design to improve estimates and precision levels of discards from two different Fishery Units of Spanish trawlers in northeast Atlantic waters (ICES subareas VIIc, j, k),” Boletín del Inst. Español Oceanogr., vol. 23 (1–4), pp. 73–83, 2007.

[97] M. Talvitie, O. Leino, and M. Holopainen, “Inventory of sparse forest populations using adaptive cluster sampling,” Finnish Soc. For. Sci. Silva Fenn., vol. 40, no. 1, pp. 101–108, 2006.

[98] F. Geurts, “From ant colonies to artificial ants: First International Workshop on Ant Colony Optimization,” AI Commun., vol. 11, no. 3,4, pp. 241–242, Dec. 1998.

116

[99] R. Sarker and T. Ray, “Agent Based Evolutionary Approach: An Introduction,” in in Agent-Based Evolutionary Search, vol. 5, R. Sarker and T. Ray, Eds. Springer Berlin Heidelberg, 2010, pp. 1–11.

[100] Y. Yan, S. Yang, D. Wang, and D. Wang, “Agent Based Evolutionary Dynamic Optimization,” in in Agent-Based Evolutionary Search, vol. 5, R. Sarker and T. Ray, Eds. Springer Berlin Heidelberg, 2010, pp. 97–116.

[101] M. Nguyen, T. Ho, and J.-D. Zucker, “A Simulation Model for Optimise the Fire Evacuation Configuration in the Metro Supermarket of Hanoi,” in in Simulated Evolution and Learning, vol. 7673, L. Bui, Y. Ong, N. Hoai, H. Ishibuchi, and P. Suganthan, Eds. Springer Berlin Heidelberg, 2012, pp. 470–479.

[102] T. A. N. Nguyen, Y. Chevaleyre, and J.-D. Zucker, “Optimizing the Placement of Evacuation Signs on Road Network with Time and Casualties in Case of a Tsunami.,” in WETICE, 2012, pp. 394–396.

[103] F. Aurenhammer, “Voronoi diagrams -- a survey of a fundamental geometric data structure,” ACM Comput. Surv., vol. 23, no. 3, pp. 345–405, 1991.

[104] B. Dyck, V.A; Thomas, “The brown planthopper problem,” in in Brown Planthopper: Threat to Rice Production in Asia, Internatio., 1979, pp. 3–17.

[105] P. A. C. Ooi and S. B. M., “Predators and Parasitoids of Rice Insect Pests,” in in Biology and management of rice insects, E. A. Heinrichs, Ed. New York : Wiley ; [Manila, Philippines] : IRRI, 1994, pp. 585–612.

[106] S. Kulshreshtha, JP; Anjaneyulu, A; Padmanabhan, “The disastrous brown planthopper attack in Kerala.,” Indian Farm, vol. 24, pp. 5–7, 1974.

[107] T. Park, Thomas; Mertz, David B.; Grodzinski, Wladyslaw; Prus, “Cannibalistic Predation in Populations of Flour Beetles,” Physiol. Zool., vol. 38, no. 3, pp. 289– 321, 1965.

[108] V. N. Mose, T. Nguyen-Huu, P. Auger, and D. Western, “Modelling herbivore population dynamics in the Amboseli National Park, Kenya: Application of spatial aggregation of variables to derive a master model,” Ecol. Complex., vol. 10, no. 0, pp. 42–51, 2012.

[109] V. N. Mose, T. Nguyen-Huu, D. Western, P. Auger, and C. Nyandwi, “Modelling the dynamics of migrations for large herbivore populations in the Amboseli National Park, Kenya,” Ecol. Modell., vol. 254, no. 0, pp. 43–49, 2013.

[110] C. J. Ralph, J. R. Sauer, and S. Droege, Monitoring Bird Populations by Point Counts. DIANE Publishing Company, 1998.

117

[111] V. G. N. Nguyen, H. X. Huynh, T. T. Vo, and A. Drogoul, “On weather affecting to brown plant hopper invasion using an agent-based model,” in Proceedings of the International Conference on Management of Emergent Digital EcoSystems, 2011, pp. 150–157.

[112] Å. Brännström and D. J. Sumpter, “Coupled map lattice approximations for spatially explicit individual-based models of ecology,” in Bulletin of mathematical biology, 2005, vol. 67, no. Spinger, pp. 663–682.

[113] E. Eliot and É. Daudé, “Diffusion des épidémies et complexités géographiques : perspectives conceptuelles et méthodologiques,” Espac. Popul. sociétés [En ligne], 2006.

[114] D. Badariotti, V. Laperrière, and A. Banos, “Réalisation d’un modèle contingent à base d'agents pour simuler des épidémies de peste à Madagascar Le modèle SIMPEST,” Rev. Int. Géomatique, vol. 19, no. 3, pp. 253–275, 2009.

[115] B. N. Clark, C. J. Colbourn, and D. S. Johnson, “Unit disk graphs,” Discret. Math., vol. 86, no. 1–3, pp. 165–177, Jan. 1991.

[116] H. M. Shin and C. Y. Lee, “Optimal rate allocation and QoS-sensitive admission control in wireless integrated networks,” Wirel. Netw., vol. 17, no. 1, pp. 231–246, Jan. 2011.

[117] M. V Marathe, H. Breu, H. B. H. III, S. S. Ravi, and D. J. Rosenkrantz, “Simple Heuristics for Unit Disk Graphs,” NETWORKS, vol. 25, pp. 59–68, 1995.

[118] H. Breu and D. G. Kirkpatrick, “Unit Disk Graph Recognition is NP-Hard,” Comput. Geom. Theory Appl., vol. 9, 1993.

[119] N. D. Nguyen, T. H. D. Phan, T. A. N. Nguyen, A. Drogoul, and J.-D. Zucker, “Disk Graph-Based Model: A Graph Theoretical Approach for Linking Agent- Based Models and Dynamical Systems,” in RIVF, 2010, pp. 1–4.

[120] H. X. Huynh, F. Guillet, J. Blanchard, P. Kuntz, H. Briand, and R. Gras, “A Graph- based Clustering Approach to Evaluate Interestingness Measures: A Tool and a Comparative Study,” in in Quality Measures in Data Mining, vol. 43, F. Guillet and H. J. Hamilton, Eds. Springer, 2007, pp. 25–50.

[121] L. L. C. Books and G. B. LLC, Statistical Dependence: Independence, Correlation and Dependence, Copula, Spearman’s Rank Correlation Coefficient, Guttman Scale. General Books, 2010.

[122] J. L. Rodgers and A. W. Nicewander, “Thirteen Ways to Look at the Correlation Coefficient,” Am. Stat., vol. 42, no. 1, pp. 59–66, 1988.

118

[123] S. D. Bolboaca and L. Jantschi, “Pearson versus Spearman, Kendall’s Tau Correlation Analysis on Structure-Activity Relationships of Biologic Active Compounds,” Leonardo J. Sci., vol. 5, no. 9, pp. 179–200, 2006.

[124] T. Henderson, “The Physics Classroom: Describing Motion with Words,” vol. 2013, no. 1 December 2013. 2001.

[125] W. Tobler, “A Computer Movie Simulating Urban Growth in the Detroit Region,” Econ. Geogr., vol. 46, no. 2, pp. 234–240, 1970.

[126] M. T. Truong, F. Amblard, and G. Benoit, “Combination Framework of BI solution & Multi-agent platform (CFBM) for multi-agent based simulation with a huge amount of data,” Deuxième édition Atelier à la Décision à tous lé Etages (AIDE @ ECG 2013), Toulouse, pp. 35–42, 2013.

[127] R. J. Hyndman and A. B. Koehler, “Another look at measures of forecast accuracy,” Int. J. Forecast., vol. 22, no. 4, pp. 679–688, 2006.

[128] W. R. Ashby, “Principles of the self-organizing system,” in in Principles of Self- Organization: Transactions of the University of Illinois Symposium, H. v. Foerster and G. W. Zopf, Eds. London: Pergamon, 1962, pp. 255–278.

[129] J. Ferber, O. Gutknecht, and F. Michel, “From Agents to Organizations: An Organizational View of Multi-agent Systems,” in In: LNCS n. 2935: Procs. of AOSE’03, 2003, pp. 214–230.

[130] F. Distante and V. Piuri, “Hill-climbing heuristics for optimal hardware dimensioning and software allocation in fault-tolerant distributed systems,” IEEE Trans. Reliab., vol. 38, no. 1, pp. 28–39, Apr. 1989.

[131] H. Youssef, S. M. Sait, and H. Adiche, “Evolutionary algorithms, simulated annealing, and Tabu search: a comparative study,” Proc. SPIE, vol. 3455. pp. 94– 105, 1998.

[132] J. Choi, J. Lee, and S. Oh, “Swarm intelligence for achieving the global maximum using spatio-temporal Gaussian processes,” in American Control Conference, 2008, 2008, pp. 135–140.

[133] C. W. Ko, J. Lee, and M. Queyranne, “An Exact Algorithm for Maximum Entropy Sampling,” Oper. Res., vol. 43, no. 4, pp. 684–691, Jul. 1995.

[134] S. Zio, L. Fontanella, and L. Ippoliti, “Optimal spatial sampling schemes for environmental surveys,” Environ. Ecol. Stat., vol. 11, no. 4, pp. 397–414, 2004.

[135] B. Selman and C. P. Gomes, “Hill-climbing Search,” in in Encyclopedia of Cognitive Science, John Wiley & Sons, Ltd, 2006.

119

[136] V. Truong, H. Huynh, M. Le, and A. Drogoul, “Optimizing an Environmental Surveillance Network with Gaussian Process Entropy - An Optimization Approach by Agent-based Simulation,” in in Frontiers in Artificial Intelligence and Applications, vol. 252, R. J. H. L. C. J. Dariusz Barbucha Manh Thanh Le, Ed. 2013, pp. 102–111.

[137] D. Myers, “Matrix Formulation of Co-Kriging,” Math. Geol., vol. 14, no. 3, 1982.

[138] O. P. Baume, A. Gebhardt, C. Gebhardt, G. B. M. Heuvelink, and J. Pilz, “Network optimization algorithms and scenarios in the context of automatic mapping,” Comput. Geosci., vol. 37, no. 3, pp. 289–294, 2011.

[139] P. J. Bickel, C. A. J. Klaassen, Y. Ritov, and J. A. Wellner, Efficient and Adaptive Estimation for Semiparametric Models, 1st ed. Springer, 1998.

[140] W. M. Organization, Guide to hydrological practices. Secretariat of the WMO, 1981.

[141] M. J. A. N. Bhuiyan and D. Dutta, “Assessing impacts of sea level rise on river salinity in the Gorai river network, Bangladesh,” Estuar. Coast. Shelf Sci., vol. 96, no. 0, pp. 219–227, 2012.

[142] T. Frei, “Designing meteorological networks for Switzerland according to user requirements,” Meteorol. Appl., vol. 10, no. 4, pp. 313–317, 2003.

[143] M. J. Bhuiyan and D. Dutta, “Analysis of flood vulnerability and assessment of the impacts in coastal zones of Bangladesh due to potential sea-level rise,” Nat. Hazards, vol. 61, no. 2, pp. 729–743, 2012.

[144] T. D. Bui, Duc Long;Phung, “Flash floods, hydro-meteorological forecasting and warning systems in Viet Nam,” in Proceedings of the 7th Annual Mekong Flood Forum, 2009, pp. 3–2–1.

[145] H. T. Nguyen, M. Nguyen, Q. T. Le, D. A. T. Dinh, V. T. Lam, C. D. Trinh, H. L. Nguyen, L. Neumann, C. Stephen, and M. Moglia, A Mapbook of Water System and Environment of Can Tho City. Can Tho University, 2013.

[146] “MM5 Community Model. Website: http://www.mmm.ucar.edu/mm5/.” .

[147] “WRF (Weather Research & Forecasting. Website: http://www.wrf- model.org/index.php.” [Online]. Available: http://www.wrf-model.org/index.php.

[148] “MIKE by DHI software. Website: http://mikebydhi.com/.” [Online]. Available: http://mikebydhi.com/.

120

[149] V. P. Dang Tri, I. Popescu, A. van Griensven, D. Solomatine, N. H. Trung, and A. Green, “A study of the climate change impacts on fluvial flood propagation in the Vietnamese Mekong Delta,” Hydrol. Earth Syst. Sci. Discuss., vol. 9, no. 6, pp. 7227–7270, 2012.

[150] N. V Dung, B. Merz, A. Bárdossy, T. D. Thang, and H. Apel, “Multi-objective automatic calibration of hydrodynamic models utilizing inundation maps and gauge data,” Hydrol. Earth Syst. Sci., vol. 15, no. 4, pp. 1339–1354, 2011.

[151] C. Hébert, L. Jobin, M. Fréchette, G. Pelletier, C. Coulombe, C. Germain, and M. Auger, “An Efficient Pit-light Trap to Study Beetle Diversity,” J. Insect Conserv., vol. 4, no. 3, pp. 189–200, 2000.

[152] J. Stevens L. and A. R. Olsen, “Spatially Restricted Surveys over Time for Aquatic Resources,” J. Agric. Biol. Environ. Stat., vol. 4, no. 4, pp. pp. 415–428, 1999.

[153] H. Townson, “Mosquito ecology: field sampling methods. 2nd edition. By M.W. Service. (London: Chapman & Hall, 1993). xiii+988 pp. ISBN 1 85166-798-9.,” Bull. Entomol. Res., vol. 84, no. 03, pp. 439–440, 1994.

[154] K. Dao Duc, P. Auger, and T. Nguyen Huu, “Predator density dependent prey dispersal in a patchy environment with a refuge for the prey,” S. Afr. J. Sci., vol. 104, no. 5–6, pp. 180–184, 2008.

[155] R. A. Myers and B. Worm, “Extinction, survival or recovery of large predatory fishes,” Philos. Trans. R. Soc. B Biol. Sci., vol. 360, no. 1453, pp. 13–20, 2005.

[156] R. A. Myers and B. Worm, “Rapid worldwide depletion of predatory fish communities,” Nature, vol. 423, no. 6937, pp. 280–283, 2003.

[157] O. Aulov and M. Halem, “Human Sensor Networks for Improved Modeling of Natural Disasters.,” Proc. IEEE, vol. 100, no. 10, pp. 2812–2823, 2012.

[158] M. Karydis and D. Kitsiou, “Marine water quality monitoring: A review,” Mar. Pollut. Bull., no. 0, p. -, 2013.

[159] M. Teurlai, R. Huy, B. Cazelles, R. Duboz, C. Baehr, and S. Vong, “Can Human Movements Explain Heterogeneous Propagation of Dengue Fever in Cambodia?,” PLoS Negl Trop Dis, vol. 6, no. 12, p. e1957, 2012.

[160] B. Roche, C. Lebarbenchon, M. Gauthier-Clerc, C.-M. Chang, F. Thomas, F. Renaud, S. van der Werf, and J.-F. Guégan, “Water-borne transmission drives avian influenza dynamics in wild birds: The case of the 2005–2006 epidemics in the Camargue area,” Infect. Genet. Evol., vol. 9, no. 5, pp. 800–805, 2009.

121

[161] “Influenza at the Human-Animal Interface (HAI),” WHO 2013, 2013. [Online]. Available: http://www.who.int/influenza/human_animal_interface/en/index.html. [Accessed: 18-Sep-2013].

[162] A. Wallensten, V. J. Munster, N. Latorre-Margalef, M. Brytting, J. Elmberg, R. A. Fouchier, T. Fransson, P. D. Haemig, M. Karlsson, A. Lundkvist, A. D. Osterhaus, M. Stervander, J. Waldenström, and O. Björn, “Surveillance of influenza A virus in migratory waterfowl in northern Europe,” Emerg. Infect. Dis., vol. 13, no. 3, pp. 404–411, Mar. 2007.

[163] J. D. Brown, G. Goekjian, R. Poulson, S. Valeika, and D. E. Stallknecht, “Avian influenza virus in water: Infectivity is dependent on pH, salinity and temperature,” Vet. Microbiol., vol. 136, no. 1–2, pp. 20–26, 2009.

[164] E. Amouroux, P. Taillandier, and A. Drogoul, “Complex environment representation in epidemiology ABM: application on H5N1 propagation,” J. Sci. Technol., pp. 13–25, 2010.

[165] C. R. Williams, S. A. Long, R. C. Russell, and S. A. Ritchie, “Optimizing ovitrap use for Aedes aegypti in Cairns, Queensland, Australia: effects of some abiotic factors on field efficacy.,” J Am Mosq Control Assoc, vol. 22, no. 4, pp. 635–640, 2006.

[166] L. Haynes, E. Arzey, C. Bell, N. Buchanan, G. Burgess, V. Cronan, C. Dickason, H. Field, S. Gibbs, P. M. Hansbro, T. Hollingsworth, A. C. Hurt, P. Kirkland, H. McCracken, J. O’Connor, J. Tracey, J. Wallner, S. Warner, R. Woods, and C. Bunn, “Australian surveillance for avian influenza viruses in wild birds between July 2005 and June 2007,” Aust. Vet. J., vol. 87, no. 7, pp. 266–272, 2009.

122

PUBLICATIONS

[1] Truong, X.V., Huynh, X. H., Le, N.M., Drogoul, A. 2011. Estimating the density of Brown Plant Hoppers from a light-traps network based on Unit Disk Graph. The 2011 International Conference on Active Media Technology (AMT 2011), Lanzhou, China. Springer-Verlag Berlin, Heidelberg ©2011. ISBN: 978-3-642-23619-8. [2] Truong, X.V., Huynh, X.H., Le, N.M., Drogoul, A. 2011. Modeling the Brown Plant Hop-pers Surveillance Network using Agent-Based Model ̶ Application for the Mekong Delta Region. In The Proceedings of The Second International Symposium on Information and Communication Technology (SoICT’11), October 13–14 2011, Hanoi, Vietnam. ACM New York, NY, USA ©2011. ISBN: 978-1-4503- 0880-9. DOI:10.1145/2069216.2069243. [3] Truong, X.V., Huynh, X.H., Le, N.M., Drogoul, A. 2012. Modeling a Surveillance Network based on Unit Disk Graph technique - Application for monitoring the invasion of insects in Mekong Delta Region. The 15th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA 2012), September 5–7 2012, Kuching, Malaysia. LNAI 7455, pp. 228–242, 2012. Springer-Verlag Berlin Heidelberg ©2012. [4] Truong, X.V., Huynh, X.H., Le, N.M., Drogoul, A. 2012. Optimizing by Simulation an Environmental Surveillance Network - Application to the Fight against Rice Pest in the Mekong Delta Region, Vietnam (Mid-term report of PhD thesis) . COnférence sur la Modélisation Mathématique et Informatique des Systèmes Complexes (COMMISCO 2012), October 23–25 2012, Bondy, France. [5] Truong, X.V., Huynh, X. H., Le, N.M., Drogoul, A. 2013. Optimizing an Environmental Surveillance Network with Gaussian Process − An optimization approach by agent-based simulation. The Sixth International KES Conference on Agents and Multi-agent Systems – Technologies and Applications (KES AMSTA 2013), Hue City, Vietnam, 27 - 29 May 2013. IOS Press. [6] Truong, M. T. , Truong, X.V., Amblard, F., Drogoul, A., Gaudou, G., Huynh, X.H, Le N.M., Sibertin- Blanc, C. A framework for combining Business Intelligence & Agent-based Simulation. Submitted to The 4th International Symposium on Information and Communication Technology (SoICT 2013), Da Nang City, Vietnam, 5-6 December 2013. SUBMITTED PAPERS

[1] Truong, X.V., Le, N.M., Huynh, X.H., Drogoul, A. 2013. Supporting the Design and Optimization of Environmental Surveillance Networks through Virtual Experiments: a Methodological Proposal. Submitted to Environmental Monitoring and Assessment (Springer Publisher). POSTERS [1] Truong, X.V., Huynh, X.H., Le, N.M., Drogoul, A. 2010. Optimizing by Simulation an Environmental Surveillance Network - Application to the Fight against Rice Pest in the Mekong Delta Region, Vietnam. COnférence sur la Modélisation Mathématique et Informatique des Systèmes Complexes (COMMISCO 2010), October 11–13, 2010, Bondy, France. [2] Truong, M.T, Truong, X.V., Amblard, F., Drogoul, A., Gaudou, B., Huynh, X.H., Le, N.M. and Sibertin- Blanc, C. 2013. A framework for combining Business Intelligence & Agent-based Simulation. The 10th IEEE RIVF International Conference on Computing and Communication Technologies (RIVF-2013), November 10-13, 2013, Hanoi city, Vietnam.

123

Annex 1 - DATA TABLES

1) Light-trap coordinates

ID latitude longitude ID latitude longitude ID latitude longitude ID latitude longitude 1 588042 1091566 13 589383 1088204 25 563869 1038719 37 603674 1080273 2 588700 1097972 14 586823 1088570 26 569102 1025902 38 607219 1079267 3 581456 1100798 15 553232 1066126 27 581188 1033553 39 596334 1080819 4 567748 1094000 16 560271 1066811 28 581395 1025596 40 604349 1058655 5 563996 1103511 17 563929 1076660 29 577102 1045373 41 611359 1063430 6 578154 1100258 18 549285 1028494 30 589775 1051836 42 593093 1060395 7 570485 1095600 19 559377 1038971 31 567739 1043484 43 587716 1070975 8 567996 1076124 20 550663 1021204 32 565204 1047937 44 599276 1070859 9 575925 1087283 21 552401 1054251 33 563583 1052416 45 594372 1074913 10 553946 1073125 22 554130 1034740 34 611008 1070453 46 611800 1049405 11 560609 1081144 23 556993 1049985 35 623187 1060908 47 600883 1055607 12 553394 1079905 24 562443 1026777 36 626025 1046485 48 587307 1044409

Data source: Departments of Plant Protection, Hau Giang, Soc Trang and Bac Lieu Provinces, Mekong Delta region, Vietnam. 2) Example of light-trap sampling data

BPH trap-density of days in January, 2010

No District Small town 1 2 3 4 5 6 7 8 9 10 1 Long My Vinh Vien 0 0 0 6 11 0 0 0 0 0 Thuan Hung 0 0 0 0 0 0 0 0 0 0 Long Binh 0 0 0 0 0 0 0 0 0 0 2 Chau Thanh Dong Phuoc 300 180 140 220 100 7 0 0 0 0 Nga 6 50 30 10 5 12 5 2 0 0 0 Dong Thanh 1 3 0 0 0 0 0 0 0 0 3 Chau Thanh A Tan Hoa 0 0 0 0 0 0 0 0 0 0 Truong Long A 0 0 0 0 0 0 0 0 0 0 Tan Phu Thanh 0 0 0 0 0 0 0 0 0 0 Nhon Nghia A 0 0 0 0 0 0 0 0 0 0 4 Phung Hiep Tan Long 0 0 0 0 0 0 0 0 0 0 Phuong Binh 0 0 0 0 0 0 0 0 0 0 Binh Thanh 0 0 0 0 0 0 0 0 0 0 Hoa An 0 0 0 0 0 0 0 0 0 0 5 Vi Thuy Vi Thuan Tay 0 0 0 0 0 0 0 0 0 0 Vi Trung 0 0 0 0 0 0 0 0 0 0

Data source: Department of Plant Protection, Hau Giang Province, Mekong Delta region, Vietnam.

124

3) List of insects collected by light-trap

No Insect name (English) Insect name (Vietnamese) 1 Brown Plant Hopper (Nilaparvata lugens) Rầy nâu 2 White Back Plant Hopper (Sogatella furcifera) Rầy lưng trắng 3 Green paddy leafhopper Rầy xanh đuôi đen 4 Scotinophra burmeister Bọ xít đen 5 Yellow Stem Borer Bướm sâu đục thân 6 Cnaphalocrosis medinalis Guenee Bướm sâu cuốn lá 7 Microvelia donglasi atrolineata Bọ xít nước 8 Crytohinus lividipennis Bọ xít mù xanh 9 Ladybird Bọ rùa 10 Dragon flies Chuồn chuồn các loại

Data source: Departments of Plant Protection in the Mekong Delta region, Vietnam. 4) Common weather data

Month 1 2 3 4 5 6 7 8 9 10 11 12 Mean_Wind_Speed 9.4 10.0 9.6 11.4 13.8 15.5 15.5 14.9 14.2 11.7 11.2 8.7 Min_Wind_Speed 6.0 6.0 6.0 6.0 6.0 8.0 10.0 10.0 7.0 6.0 7.0 6.0 Max_Wind_Speed 14.0 14.0 14.0 24.0 24.0 30.0 21.0 21.0 28.0 20.0 18.0 13.0 Wind_Direction_From 180 180 180 0 0 0 0 0 0 180 180 180 Wind_Direction_To 225 225 225 45 45 45 45 45 45 225 225 225

Data source: Aggregated from data provided by Dr. Le Anh Tuan (Research Institute for Climate Change, CanTho University) 5) Station weather data

Moc My Cao Cang Chau Can Soc Rach Bac Ca Station name Hoa Tho Lanh Long Doc Tho Trang Gia Lieu Mau Temp_Mean 26.20 25.60 25.50 25.70 26.30 26.00 25.90 25.90 25.80 26.50 Temp_Max 31.20 29.90 30.20 30.20 31.00 30.50 30.60 30.00 30.20 30.30 Temp_Min 23.20 23.10 22.60 22.80 23.00 23.40 23.40 23.40 23.00 24.00 Rain_Amount 18 45 30 31 6 15 70 50 3 1 Rain_Max 10 29 27 16 4 7 63 4 2 1 Rain_No_Days_Max 20 20 20 19 20 20 20 20 19 41 Rain_No_Days 3 3 3 2 4 0 2 3 2 3 Rain_Mean 0.58 1.45 0.97 1.00 0.19 0.48 2.26 1.61 0.10 0.03 Hum_Mean 82 79 85 83 81 80 82 80 82 78 Hum_Min 50 47 60 48 50 52 48 58 53 47 Hum_No_Days 7 31 17 27 16 1 27 17 25 27 Sunning_Hours 239 218 226 223 214 224 225 234 231 165 Sunning_Hours_Mean 7.71 7.03 7.29 7.19 6.90 7.23 7.26 7.55 7.45 5.32

Data source: Meteorological Data by Month Year and Stations (Vietnamese), Ministry of Agriculture and Rural Development (Website: http://fsiu.mard.gov.vn/data/khituong.htm).

125

6) Administrative boundary data Data is downloaded in format of *.shp files.

4) Districts 5) Small town 3) Provinces

Data source: DIVA-GIS, a free computer program for mapping and geographic data analysis (a geographic information system (GIS) (Website: http://www.diva-gis.org/).

126

Annex 2 - CODE SNIPPETS IN GAML (GAMA PLATFORM)

A. Species definitions 1) Global parameters model GlobalParam global { // Sensor number: var no_of_sensor type: int init: 100 ;

// Simulation step: var SIMULATION_STEP type: int init: 0; var DAY type: int init: 1; var MONTH type: int init: 1; var YEAR type: int init: 2010;

// Grid parameters: var GRID_COLUMN_NO type: int init: 60; var GRID_ROW_NO type: int init: 60;

// Optimization modes: var OPTIMIZATION_MODE type: string init: "CDSN";

// Thresholds: var MAX_DENSITY type: float value: 1000000.0 ; var MIN_DENSITY type: float value: 0.0;

// COORDINATES var BOUNDARY_MAX_X type: float value: 0.0; var BOUNDARY_MIN_X type: float value: 0.0; var BOUNDARY_MAX_Y type: float value: 0.0; var BOUNDARY_MIN_Y type: float value: 0.0;

// Disk graph radius: var DISK_RADIUS type: float value: 15000.0; var LOWEST_RADIUS type: float value: 5000.0; var CORRELATION_THRESHOLD type: float value: 0.3;

// IDW_RANGE: var IDW_RANGE type: float value: 40000.0;

// Density data var HISTORICAL_DURATION type: int value: 365;

// LIFE CYCLE OF BROWN PLANT HOPPER var BPH_LIFE_DURATION type: int init: 32 parameter: 'BPH LIFE DURATION:' category: 'BROWN PLANT HOPPER'; var ADULT_EGG_RATE type: float init: 350.0 parameter: 'ADULT-EGG RATE:' category: 'BROWN PLANT HOPPER'; // Long wing: 100; Short wing: 400 var EGG_NYMPH_RATE type: float init: 0.4 parameter: 'EGG-NYMPH RATE:' category: 'BROWN PLANT HOPPER'; var NYMPH_ADULT_RATE type: float init: 0.4 parameter: 'NYMPH-ADULT RATE:' category: 'BROWN PLANT HOPPER';

127

var NATURAL_MORTALITY_RATE type: float init: 0.035 parameter: 'NATURAL MORTALITY RATE:' category: 'BROWN PLANT HOPPER'; var MORTALITY_RATE_BY_PREDACTORS type: float init: 0.215 parameter: 'MORTALITY RATE BY PREDACTORS:' category: 'BROWN PLANT HOPPER'; // Can be calculated by density of other insect var EGG_DURATION type: float init: 7.0 parameter: 'EGG DURATION:' category: 'BROWN PLANT HOPPER'; var NYMPH_DURATION type: float init: 13.0 parameter: 'NYMPH DURATION:' category: 'BROWN PLANT HOPPER'; var ADULT_DURATION type: float init: 12.0 parameter: 'ADULT DURATION:' category: 'BROWN PLANT HOPPER'; var ADULT_DURATION_GIVING_BIRTH_DURATION type: float init: /*S3: */12.0 /*Default: 6.0*/ parameter: 'GIVING BIRTH DURATION:' category: 'BROWN PLANT HOPPER'; // Default: 6.0

// Propagation parameters: var PROPAGATION_DENSITY_THRESHOLD type: float init: 50.0 parameter: 'PROPAGATION DENSITY THRESHOLD:' category: 'PROPAGATION';

// MANAGEMENT PARAMETERS var NUMBER_OF_CURRENT_NODES type: int value: 48; var LIMIT_NUMBER_OF_NODES type: int value: 100; var NUMBER_OF_ADDED_NODES type: int value: 52;

// TRANSPLANTATION (Used for the optimization model): var WINTER_SPRING_SEASON_COEF type: float value: 0.4; var SUMMER_AUTUMN_SEASON_COEF type: float value: 0.4; var BASED_SEASON_COEF type: float value: 0.2;

// TRANSPLANTATION (Used for the migration model): var RICE_AGE type: float value: 60.0; // WINTER_SPRING: 11th month,

// Modified by TRUONG MINH THAI - PHD STUDENT OF "École Supérieure de Commerce de Toulouse", Frane (Moving the *.shp files to PostGreSQL/PostGIS RDBMS) var BOUNDS type:map init: ['host'::'localhost','dbtype'::'postgres','database'::'SurveillanceNetD B','port'::'5432','user'::'postgres','passwd'::'adminadmin', "select":: "select ST_AsBinary(geom) as geo from VNM_ADM2 WHERE ID_2 in (38254,38257,38249)",'srid'::'4326' //,'longitudeFirst'::'false' ];

var PARAMS type:map init: ['host'::'localhost','dbtype'::'postgis','database'::'SurveillanceNetDB ','port'::'5432','user'::'postgres','passwd'::'adminadmin'];

var ADMINISTRATIVE_DISTRICT type:string init: "SELECT id_1,id_2,name_1,name_2,name_3, ST_AsBinary(geom) as geo FROM VNM_ADM3" + " WHERE ID_1 = 3291" ;// 13 provinces of Mekong Delta var ADMINISTRATIVE_PROVINCE type:string init: "SELECT id_1,id_2,name_1,name_2, ST_AsBinary(geom) as geo FROM VNM_ADM2 WHERE ID_1 = 3291" ;

//MAPING 1:1 from shp file (End of Modification by TRUONG MINH THAI) var SEA_REGION type:string init: "SELECT Description, ST_AsBinary(geom) as geo FROM SeaRegion" ; var WS_LAND_USE type:string init: "SELECT id,sdd, ST_AsBinary(geom) as geo FROM LandUse_WS_Region" ; var SA_LAND_USE type:string init: "SELECT id,sdd, ST_AsBinary(geom) as geo FROM LandUse_SA_Region" ; var NODE type:string init: "SELECT id , LightTrap, District, x, y, Province, id_0 , id_1, id_2,id_3, ST_AsBinary(geom) as geo FROM Three_Provinces_Lighttraps_WGS WHERE remarks != 'unused'";

128

var WEATHER type:string init: "SELECT id,name, ST_AsBinary(geom) as geo FROM WeatherRegion"; // Maping CSV to SQL 1:1 var LIGHTTRAP_DATA type:string init: "SELECT * FROM lighttrap_data_2010_Correlation_Var WHERE remarks <> 'unused'"; var STANDARD_DEVIATION_DATA type:string init: "SELECT * FROM stdDeviation" ; var GENERAL_WEATHER_DATA type:string init: "SELECT * FROM general_weather_data"; var STATION_WEATHER_DATA type:string init: "SELECT * FROM station_weather_data03"; var AUTOCORRELATION_DATA type:string init: "SELECT * FROM auto_correlation_between_traps"; }

2) Environmental cellular automaton grid environmental_cellular_agent width: 60 height: 60 neighbours: 8 { const id type: string ; const name type: string ; var square_area type: float ;

// TWO PRINCIPAL INDICES var attractiveness_index type: float init: 0.001 ; var obstruction_index type: float init: 1.0 ;

// Transplantation indices: var based_transplantation_index type: float init: 0.0 ; // By default: grass, other plants ... var WS_transplantation_index type: float init: 0.0; var SA_transplantation_index type: float init: 0.0;

// Weather indices: var humidity_index type: float init: 0.0; var temperature_index type: float init: 0.0; var raining_index type: float init: 0.0; var weather_index type: float init: 0.0;

// Sea & river region: var is_sea_region type: bool init: false; var is_monitored type: bool init: false;

// Brown Plant Hopper: var number_of_BPHs type: float init: 0.0 ; var regression_count type: int init: 0 ; matrix grid_density_matrix size: {1, 32}; // Containing density of BPHs for all the stages of life cycle

// PROPAGATION VARIABLES var number_of_movable_BPHs type: float init: 0.0 ; // Depending on the local conditions (Determined by the attractiveness and obstruction indices) var out_number_of_BPHs type: float init: 0.0 ; var in_number_of_BPHs type: float init: 0.0 ;

var color type: rgb init: rgb('white') ; var _discretized type: int init: 0;

var z type: float init: 0.0;

129

aspect ThreeDirections{..} reflex Step{..} action initializeObstructionAndAttractivenessIndices{..} action updateObstructionAndAttractivenessIndices{..} action updateRiceAge{..} action updateWeatherIndex{..} action updateDensityCycle{..} action updateDensityCycle2{..} action updateAdultDensities{..} action growthCycle{..} action setcolor{..} }

3) Heuristic cellular automaton (Gaussian process entropy and Combinational heuristics) grid heuristic_cellular_agent width: 60 height: 60 neighbours: 8 { // Kriging estimation: var estimation_std_deviation type: float init: 0.001;

// Adding the obstruction index: var combinational_std_deviation type: float init: 0.001;

// Tabu search support: var isTabu type: bool init: false;

var color type: rgb init: rgb('white') ; var maximum_std_deviation type: float init: 2200.0;

var z type: float init: 0.0;

aspect elevation{..} }

4) Correlation & Disk graph-based Surveillance Network

° Node agent species node { var id type: string ; var name type: string ; var id_0 type: string ; var id_1 type: string ; var id_2 type: string ; var province_name type: string ; var id_3 type: string ; var district_name type: string ; matrix density_matrix size: {1, 365};

var number_of_BPHs type: float init: 0.0; var number_of_BPHs_by_day type: float init: 0.0;

var number_of_BPHs_total type: float init: 0.0 ;

// WORKING STATUS: TRUE/FALSE var working_status type: bool init: true; var existing_status type: bool init: true; var model_status type: string init: "ESTABLISHED";

130

// Pointers to callular agents: var cellular_agent_pointer type: environmental_cellular_agent; var correlation_cellular_agent_pointer type: cellula_correlation; var heuristic_cellular_agent_pointer type: heuristic_cellular_agent;

// OPTIMIZATION SUPPORTS var correlation_coefficient type: float init: 0.0; var sample_variance type: float init: 0.0; var degree type: int init: 0;

// For PSO: var velocity_x type: float value: 0.0; var velocity_y type: float value: 0.0;

// For Tabu search: var Tabu_List type: list init: nil; var TABU_LIST_MAXLENGHT type: int init: 24; var Tabu_List_lenght type: int init: 0;

var nodecolor type: rgb value: rgb([0, 0, 255]); var z type: float init: 0.0; reflex run_simulation{..}

action run_greedy_on_gaussian_entropy{..} action run_Tabu_search_on_combinational_index{..} action run_particle_swarm_optimization{..} }

° Edge agent species edge { var id type: string ; var name type: string ; var source type: node init: nil; var destination type: node init: nil; var shape type: geometry init: polygon([{0,0}, {1,1}, {0,0}]); var coefficience type: int init: 0 ; var correlationWeight type: int init: 0 ; var color type: rgb init: rgb([0, 0, 0]); aspect default {..} action setcolor {..} }

° Graph agent species UnitDiskGraph skills: [situated] { var UDgraph type: graph init: nil; var setup type: int init: 0; var alpha type: float init: 135.0;

// Tempo variable used in check_simple_connection action: var connected_temp type: bool init: false; var working_status type: bool value: true; action allocateNewNodeByCorrelation{..} action allocateNewNodeByStdDeviation{..} action add_new_nodes{..} // Add n nodes into the graph action correlation_estimation_for_trap{..} action resetEdgesList{..} }

131

5) Rice region model RiceRegionLandUse ... entities { // WINTER-SPRING RICE CROP species WS_rice_region{ var id type: string ; var description type: string ; var color type: rgb init: rgb([64, 255, 64]); }

// SUMMER-AUTUMN RICE CROP species SA_rice_region{ var id type: string ; var description type: string ; var color type: rgb init: rgb([180, 0, 0]); } } 6) Natural species species meteo_station_region { var id type: string ; var name type: string ;

// Temperature var Temp_Mean type: list init: nil; var Temp_Max type: list init: nil; var Temp_Min type: list init: nil; // Rainning var Rain_Amount type: list init: nil; var Rain_Max type: list init: nil; var Rain_No_Days_Max type: list init: nil; var Rain_No_Days type: list init: nil; var Rain_Mean type: list init: nil;

// Humidity var Hum_Mean type: list init: nil; var Hum_Min type: list init: nil; var Hum_No_Days type: list init: nil;

// Sunning var Sunning_Hours type: list init: nil; var Sunning_Hours_Mean type: list init: nil;

// Wind velocity: var Mean_Wind_Velocity type: list init: nil; var Min_Wind_Velocity type: list init: nil; var Max_Wind_Velocity type: list init: nil;

// Wind direction: var Wind_Direction_From type: list init: nil; var Wind_Direction_To type: list init: nil;

// Aggregation index: var general_weather_index type: list init: nil; var simulation_temperature type: list init: nil; var temperature_index type: list init: nil;

var simulation_humidity type: list init: nil; var humidity_index type: list init: nil;

132

var color type: rgb init: rgb([0, 255, 255]);

reflex WeatherReflex{..} action calculateAggregationIndices{..} } species NaturalEnvironment { var Mean_Wind_Speed type: list init: nil; var Min_Wind_Speed type: list init: nil; var Max_Wind_Speed type: list init: nil; var Wind_Direction_From type: list init: nil; var Wind_Direction_To type: list init: nil; } 7) Boundary regions (Administrative & Sea regions) species regional_area { var id_1 type: string ; var region_name type: string ; } species province_area { var id_1 type: string ; var region_name type: string ; var id_2 type: string ; var province_name type: string ;

// RICE AGE: var rice_age type: float init: 0.0; var color type: rgb init: rgb('white') ; } species district_area { var id_1 type: string ; var region_name type: string ; var id_2 type: string ; var province_name type: string ; var id_3 type: string ; var district_name type: string ; var transplantation_index type: float init: 0.0 ; var number_of_BPHs type: float init: 0.0 ;

var color type: rgb init: rgb('white') ; } species smalltown_area { var id_1 type: string ; var region_name type: string ; var id_2 type: string ; var province_name type: string ; var id_3 type: string ; var district_name type: string ; var id_4 type: string ; var smalltown_name type: string ; var color type: rgb init: rgb('white') ; }

species sea_region{ var description type: string ; var color type: rgb init: rgb('blue') ; }

133

B. Optimization algorithms 1) Greedy algorithm (Heuristic argument: Gaussian process entropy) action run_greedy_on_gaussian_entropy { let _the_std_deviation type: heuristic_cellular_agent value: heuristic_cellular_agent_pointer; let _current_std_deviation type: float value: _the_std_deviation.estimation_std_deviation; let std_deviation_list type: list value: (heuristic_cellular_agent_pointer neighbours_at 1) of_species heuristic_cellular_agent; let node_list type: list value: (list (node)) where ((node (each) distance_to self) <= DISK_RADIUS);

// Calculating the minimal distance to the current node let _current_min_distance type: float value: DISK_RADIUS + 0.001; loop var: i from: 0 to: length(node_list) - 1 { if((node_list at i) != self) { let _distance type: float value: (node_list at i) distance_to self; if(_distance < _current_min_distance) { set _current_min_distance <- _distance; } } }

let _maximum_std_deviationL type: float value: 0.0; let _maximum_cell_std_deviationL type: heuristic_cellular_agent value: nil; let _maximum_std_deviationG type: float value: 0.0; let _maximum_cell_std_deviationG type: heuristic_cellular_agent value: nil;

if(_neighbour_std_deviation > _maximum_std_deviationL) { set _maximum_std_deviationL value: _neighbour_std_deviation; set _maximum_cell_std_deviationL value: std_deviation_list at i; } }

// Global best loop var: i from: 0 to: length(node_list) - 1 { let _neighbour_std_deviation type: float value: 0.0; ask target: node_list at i { ask heuristic_cellular_agent_pointer { set _neighbour_std_deviation value: estimation_std_deviation; }

}

134

if(_neighbour_std_deviation > _maximum_std_deviationG) { set _maximum_std_deviationG value: _neighbour_std_deviation; set _maximum_cell_std_deviationG value: (node_list at i).heuristic_cellular_agent_pointer; } } // Next location: let _vx type: float value: 0; let _vy type: float value: 0; let _c1 type: float value: 2; let _c2 type: float value: 2; let _w type: float value: 1;

set _vx value: velocity_x * _w + _maximum_cell_std_deviationL.location.x - location.x; set _vy value: velocity_y * _w + _maximum_cell_std_deviationL.location.y - location.y; let _x type: float value: location.x +_vx; if(location.x +_vx > BOUNDARY_MAX_X) { set _x <- BOUNDARY_MAX_X; write "MAX_X" + BOUNDARY_MAX_X; } else if(location.x +_vx < BOUNDARY_MIN_X) { write "MIN_X" + BOUNDARY_MIN_X; set _x <- BOUNDARY_MIN_X; }

let _y type: float value: location.y +_vy; if(location.y +_vy > BOUNDARY_MAX_Y) { set _y <- BOUNDARY_MAX_Y; write "MAX_Y" + BOUNDARY_MAX_Y; } else if(location.y +_vy < BOUNDARY_MIN_Y) { set _y <- BOUNDARY_MAX_Y; write "MIN_Y" + BOUNDARY_MAX_Y; }

// Decision of movement: let _current_location_x type: float value: location.x; let _current_location_y type: float value: location.y; let _current_std_deviation type: float value: 0.0; ask heuristic_cellular_agent_pointer { set _current_std_deviation value: estimation_std_deviation; }

set location value: {_x, _y, 0.0};

set node_list value: (list (node)) where ((node (each) distance_to self) <= DISK_RADIUS);

// Calculating the minimal distance to the current node let _moved_min_distance type: float value: DISK_RADIUS + 0.001; loop var: i from: 0 to: length(node_list) - 1 { if((node_list at i) != self) { let _distance type: float value: (node_list at i) distance_to self;

135

if(_distance < _moved_min_distance) { set _moved_min_distance <- _distance; } }

}

// Decision of movement: let std_deviation_list type: list value: (list (heuristic_cellular_agent)) where (each overlaps(self)); let cell_list type: list value: (list (environmental_cellular_agent)) where (each overlaps(self)); if((((std_deviation_list at 0).estimation_std_deviation) < (_current_std_deviation)) or ((cell_list at 0).obstruction_index > 0.6) or ((cell_list at 0).is_monitored)) { set location value: {_current_location_x, _current_location_y, 0.0}; } else { // Re-updating the last location to NOT MONITORED set dominated_cell.is_monitored value: false; set heuristic_cellular_agent_pointer value: std_deviation_list at 0; set cellular_agent_pointer value: cell_list at 0; // Re-updating the new location to MONITORED set dominated_cell.is_monitored value: true; }

do setcolor; }

2) Hill climbing search algorithm (Heuristic argument: Combinational heuristics) action run_hill_climbing { let _the_std_deviation type: heuristic_cellular_agent value: heuristic_cellular_agent_pointer; let _current_std_deviation type: float value: _the_std_deviation.hydrid_std_deviation; let std_deviation_list type: list value: (heuristic_cellular_agent_pointer neighbours_at 1) of_species heuristic_cellular_agent;

let _maximum_std_deviationL type: float value: 0.0; let _maximum_cell_std_deviationL type: heuristic_cellular_agent value: nil;

if(_neighbour_std_deviation > _maximum_std_deviationL) { set _maximum_std_deviationL value: _neighbour_std_deviation; set _maximum_cell_std_deviationL value: std_deviation_list at i; } }

136

// Next location: let _vx type: float value: 0; let _vy type: float value: 0; let _c1 type: float value: 2; let _c2 type: float value: 2; let _w type: float value: 1;

set _vx value: velocity_x * _w + _maximum_cell_std_deviationL.location.x - location.x; set _vy value: velocity_y * _w + _maximum_cell_std_deviationL.location.y - location.y; let _x type: float value: location.x +_vx; if(location.x +_vx > BOUNDARY_MAX_X) { set _x <- BOUNDARY_MAX_X; write "MAX_X" + BOUNDARY_MAX_X; //set _vx <- 0.0; } else if(location.x +_vx < BOUNDARY_MIN_X) { write "MIN_X" + BOUNDARY_MIN_X; set _x <- BOUNDARY_MIN_X; }

set location value: {_x, _y, 0.0};

// Calculating the minimal distance to the current node let _moved_min_distance type: float value: DISK_RADIUS + 0.001;

else

137

{ // Re-updating the last location to NOT MONITORED set dominated_cell.is_monitored value: false; set heuristic_cellular_agent_pointer value: std_deviation_list at 0; set cellular_agent_pointer value: cell_list at 0; // Re-updating the new location to MONITORED set dominated_cell.is_monitored value: true; } do setcolor; }

3) Tabu search algorithm (Heuristic argument: Combinational heuristics) action run_Tabu_search_on_combinational_index { let _the_std_deviation type: heuristic_cellular_agent value: heuristic_cellular_agent_pointer; let _current_std_deviation type: float value: _the_std_deviation.hydrid_std_deviation; let cell_list type: list value: (self.cellular_agent_pointer neighbours_at 1) of_species environmental_cellular_agent where (each.is_monitored = true); if(length(cell_list) > 1) { let cell_count type: int value: length(environmental_cellular_agent);

int _location value: int (rnd(cell_count - 1)); bool allocated value: false; loop while: !allocated { if(((environmental_cellular_agent at _location).obstruction_index <= 0.6) and !(environmental_cellular_agent at _location).is_monitored) { set allocated value: true; set dominated_cell.is_monitored value: false;

set cellular_agent_pointer value: environmental_cellular_agent at _location; set correlation_cellular_agent_pointer value: cellula_correlation at _location; set heuristic_cellular_agent_pointer value: heuristic_cellular_agent at _location; set location value: (environmental_cellular_agent at _location).location; set (environmental_cellular_agent at _location).is_monitored value: true; } else { set _location value: int (rnd(cell_count - 1)); } } loop var: i from: 0 to: length(Tabu_List) - 1 { ask heuristic_cellular_agent(Tabu_List at i) { set isTabu value: false; } } set Tabu_List value: nil; set Tabu_List_lenght value: 0; } else { let std_deviation_list type: list value: (heuristic_cellular_agent_pointer neighbours_at 1) of_species heuristic_cellular_agent where (heuristic_cellular_agent_pointer.isTabu = false);

138

write "STEP: " + SIMULATION_STEP + " LENGTH: " + length(std_deviation_list); if(length(std_deviation_list) != 0) { let _maximum_std_deviationL type: float value: 0.0; let _maximum_cell_std_deviationL type: heuristic_cellular_agent value: nil;

// Local best loop var: i from: 0 to: length(std_deviation_list) - 1 { let _neighbour_std_deviation type: float value: 0.0; ask target: std_deviation_list at i { set _neighbour_std_deviation value: hydrid_std_deviation; } if(_neighbour_std_deviation > _maximum_std_deviationL) { set _maximum_std_deviationL value: _neighbour_std_deviation; set _maximum_cell_std_deviationL value: std_deviation_list at i; } } // Next location: let _vx type: float value: 0; let _vy type: float value: 0; let _c1 type: float value: 2; let _c2 type: float value: 2; let _w type: float value: 1; set _vx value: _maximum_cell_std_deviationL.location.x - location.x; set _vy value: _maximum_cell_std_deviationL.location.y - location.y; let _x type: float value: location.x +_vx; let _y type: float value: location.y +_vy;

// Decision of movement: let _current_location_x type: float value: location.x; let _current_location_y type: float value: location.y; ask heuristic_cellular_agent_pointer { set isTabu value: true; } set location value: {_x, _y, 0.0}; let cell_list type: list value: (list (environmental_cellular_agent)) where (each overlaps(self)); if((cell_list at 0).is_monitored) { set location value: {_current_location_x, _current_location_y, 0.0}; } else { // Re-updating the last location to NOT MONITORED set dominated_cell.is_monitored value: false; set heuristic_cellular_agent_pointer value: _maximum_cell_std_deviationL; set cellular_agent_pointer value: cell_list at 0; // Re-updating the new location to MONITORED set dominated_cell.is_monitored value: true;

// Update Tabu list: if(Tabu_List_lenght < TABU_LIST_MAXLENGHT)

139

{ set Tabu_List value: Tabu_List + heuristic_cellular_agent_pointer; set Tabu_List_lenght value: Tabu_List_lenght + 1; } else { let removeDevCell value: heuristic_cellular_agent(Tabu_List[0]); set removeDevCell.isTabu value: false; set Tabu_List value: Tabu_List - Tabu_List[0];

set Tabu_List value: Tabu_List + heuristic_cellular_agent_pointer; } }

do setcolor; } } }

4) Particle Swarm Optimization (Heuristic argument: Gaussian process entropy) action run_particle_swarm_optimization { let _the_std_deviation type: heuristic_cellular_agent value: heuristic_cellular_agent_pointer; let _current_std_deviation type: float value: _the_std_deviation.estimation_std_deviation; let std_deviation_list type: list value: (heuristic_cellular_agent_pointer neighbours_at 1) of_species heuristic_cellular_agent; let node_list type: list value: (list (node)) where ((node (each) distance_to self) <= DISK_RADIUS);

let _maximum_std_deviationL type: float value: 0.0; let _maximum_cell_std_deviationL type: heuristic_cellular_agent value: nil;

let _maximum_std_deviationG type: float value: 0.0; let _maximum_cell_std_deviationG type: heuristic_cellular_agent value: nil;

if(_neighbour_std_deviation > _maximum_std_deviationL) {

140

set _maximum_std_deviationL value: _neighbour_std_deviation; set _maximum_cell_std_deviationL value: std_deviation_list at i; } }

} if(_neighbour_std_deviation > _maximum_std_deviationG) { set _maximum_std_deviationG value: _neighbour_std_deviation; set _maximum_cell_std_deviationG value: (node_list at i).heuristic_cellular_agent_pointer; } } // Next location: let _vx type: float value: 0; let _vy type: float value: 0; let _c1 type: float value: 2; let _c2 type: float value: 2; let _w type: float value: 1; set _vx value: velocity_x * _w + _c1 * rnd(1) * (_maximum_cell_std_deviationL.location.x - location.x) + _c2 * rnd(1) * (_maximum_cell_std_deviationG.location.x - location.x); set _vy value: velocity_y * _w + _c1 * rnd(1) * (_maximum_cell_std_deviationL.location.y - location.y) + _c2 * rnd(1) * (_maximum_cell_std_deviationG.location.y - location.y); let _x type: float value: location.x +_vx; if(location.x +_vx > BOUNDARY_MAX_X) { set _x <- BOUNDARY_MAX_X; write "MAX_X" + BOUNDARY_MAX_X; } else if(location.x +_vx < BOUNDARY_MIN_X) { write "MIN_X" + BOUNDARY_MIN_X; set _x <- BOUNDARY_MIN_X; } let _y type: float value: location.y +_vy; if(location.y +_vy > BOUNDARY_MAX_Y) { set _y <- BOUNDARY_MAX_Y; write "MAX_Y" + BOUNDARY_MAX_Y; } else if(location.y +_vy < BOUNDARY_MIN_Y) { set _y <- BOUNDARY_MAX_Y; write "MIN_Y" + BOUNDARY_MAX_Y; }

141

set location value: {_x, _y, 0.0}; set node_list value: (list (node)) where ((node (each) distance_to self) <= DISK_RADIUS);

// Decision of movement: let std_deviation_list type: list value: (list (heuristic_cellular_agent)) where (each overlaps(self)); let cell_list type: list value: (list (environmental_cellular_agent)) where (each overlaps(self)); if((((std_deviation_list at 0).estimation_std_deviation * (_moved_min_distance)) < (_current_std_deviation * (_current_min_distance))) or ((cell_list at 0).obstruction_index > 0.6) or ((cell_list at 0).is_monitored)) { set location value: {_current_location_x, _current_location_y, 0.0}; } else { // Re-updating the last location to NOT MONITORED set dominated_cell.is_monitored value: false; set heuristic_cellular_agent_pointer value: std_deviation_list at 0; set cellular_agent_pointer value: cell_list at 0; // Re-updating the new location to MONITORED set dominated_cell.is_monitored value: true; } do setcolor; }

// Not verified: action getCorrelationCoefficient { let type: list name: list_neighbors value: (list (cellula_correlation)) where (each overlaps(self)); loop var: i from: 0 to: length(list_neighbors) - 1 { let the_cell type: cellula_correlation value: list_neighbors at i; if self intersects the_cell {

142

set correlation_coefficient value: the_cell.correlation_coefficient; } } } 5) CDSN-based optimization (Heuristic: Gaussian process entropy) action optimizeByCDSN_Based_Algorithm { write "Optimize by CDSN-based algorithm - GLOBAL STEP: " + SIMULATION_STEP; if condition: (current_udg.setup = 0) { do action: loadCellsCoordinates;

do action: loadLighttrapData; do action: loadStandardDeviation; do action: loadGeneralWeatherData; do action: loadStationWeatherData;

do action: loadCurrentTrapCoordinates; do action: loadOptimalTrapCoordinates; ask target: current_udg { do action: resetEdgesList;

} do action: estimate_IDW_Correlation;

// Surface of standard deviation ask heuristic_cellular_agent as list { let cells_possibles type: list of: heuristic_cellular_agent <- (self neighbours_at 2) + self;

loop i from: 0 to: length(shape.points) - 1 { let geom type: geometry <- square(1.0); set geom <- geom translated_to (shape.points at i); let myCells type: list of: heuristic_cellular_agent <- cells_possibles where (each.shape intersects geom); //write "z value: " + z; let z1 type: float <- mean (myCells collect (each.z)); set shape <- shape add_z_pt {i, (z1^2)}; } } }

// Calculating the correlation: if (SIMULATION_STEP < BPH_LIFE_DURATION) { do action: getsCorrelationByEdges;

do action: getCorrelationTwoDays;

if(SIMULATION_STEP = 0) { do action: assign_cell_to_node; }

do action: estimate_Kriging_by_day_R { arg estimated_day value: SIMULATION_STEP;

143

} do action: MSE_variogram_R { arg estimated_day value: SIMULATION_STEP; }

// Surface of environmental obstruction: ask environmental_cellular_agent as list { let cells_possibles type: list of: environmental_cellular_agent <- (self neighbours_at 2) + self; loop i from: 0 to: length(shape.points) - 1 { let geom type: geometry <- square(1.0); set geom <- geom translated_to (shape.points at i); let myCells type: list of: environmental_cellular_agent <- cells_possibles where (each.shape intersects geom); let z1 type: float <- mean (myCells collect (each.z)); set shape <- shape add_z_pt {i, (z1^2)}; } } } else { do action: update_curent_density { arg estimated_day value: SIMULATION_STEP; } if (SIMULATION_STEP = BPH_LIFE_DURATION) {

do action: optimize_network_by_CDSN { arg added_number value: NUMBER_OF_ADDED_NODES; } do action: loadOptimalTrapCoordinates; do action: dispatchByWind; } else { do action: dispatchByWind; }

do action: MSE_variogram_R { arg estimated_day value: SIMULATION_STEP; } }

set SIMULATION_STEP value: SIMULATION_STEP + 1; }

// Action: optimize_network_by_CDSN // Optimizing the surveillance network by CDSN action optimize_network_by_CDSN { arg added_number type: int;

let min_degree type: int value: (list (node)) min_of (each.degree); let max_degree type: int value: (list (node)) max_of (each.degree);

// Controlling the placement: 144

let number type: int value: added_number; let all_allocated type: bool value: false;

let the_degree type: int value: min_degree; loop while: !all_allocated { let node_list type: list value: (list (node)) where (each.degree = the_degree);

if(number >= length(node_list)) { set number value: number - length(node_list); ask target: current_udg { loop from: 0 to: length(node_list) - 1 var: i { let the_node value: node at i; do allocateNewNodeByStdDeviation { arg center_node value: the_node; } } }

// The next loop for placing the node: if(the_degree = max_degree) { set min_degree value: (list (node)) min_of (each.degree); set max_degree value: (list (node)) max_of (each.degree); set the_degree value: min_degree; }

} else { if(number != 0) { ask target: current_udg { loop from: 0 to: number - 1 var: i { let the_node value: node at i; do allocateNewNodeByStdDeviation { arg center_node value: the_node; } } } }

// Exit set number value: 0; set all_allocated value: true; }

set the_degree value: the_degree + 1; } }

145

C. Screen shot of NOVEL

146

Annex 3 - CODE SNIPPETS IN R LANGUAGE

A. R files called by the operational framwork 1) Kriging estimation with Gaussian noises File: "../includes/RCode/KrigingVariance2010WithGrid.R"

1. library(sp) 2. library(maptools) 3. library(lattice) 4. n<-vectorParam[1] 5. rowNo<-vectorParam[2] 6. columnNo<-vectorParam[3] 7. lighttrap <- data.frame(x=c(1:n), y=c(1:n), density=c(1:n)) 8. lighttrap[,3]<-vectorParam[4: (n+3)] 9. lighttrap[,1]<-vectorParam[(n+4):(2*n + 3)] 10. lighttrap[,2]<-vectorParam[(2*n+4):(3*n + 3)] 11. cellNo <- rowNo * columnNo 12. region_grid<- data.frame(x=c(1:cellNo), y=c(1:cellNo), dist =c(1:cellNo), ffreq=c(1:cellNo), part.a=c(1:cellNo), part.b=c(1:cellNo)) 13. region_grid[,"x"]<-vectorParam[(3*n + 4):(3*n + 4 + cellNo - 1)] 14. region_grid[,"y"]<-vectorParam[(3*n + 4 + cellNo):(3*n + 4 + 2 * cellNo - 1)] 15. region_grid[,"dist"] <- 0 16. region_grid[,"ffreq"] <- 0 17. region_grid[,"part.a"] <- 0 18. region_grid[,"part.b"] <- 0 19. coordinates(lighttrap)=~x+y 20. pts = region_grid[c("x", "y")] 21. predictiongrid = SpatialPixels(SpatialPoints(pts)) 22. gridded(predictiongrid) = TRUE 23. v_uk = gstat::variogram(density~x+y, lighttrap) 24. uk_model = gstat::fit.variogram(v_uk, gstat::vgm(1, "Gau", 30000, 0)) 25. zn_uk = gstat::krige(density~x+y, lighttrap, predictiongrid, model = uk_model) 26. zn = zn_uk 27. zn[["mekong"]] <- zn_uk[["var1.pred"]] 28. zn[["se_mekong"]] = sqrt(zn_uk[["var1.var"]]) 29. #Adding random noises: 30. zn[["noise"]]<-zn_uk[["var1.pred"]] 31. for(i in 1:length(zn[["noise"]])) 32. { 33. zn[["noise"]][i] = rnorm(1, zn[["noise"]][i], zn[["se_mekong"]][i]) 34. while(zn[["noise"]][i] < 0) 35. { 36. zn[["noise"]][i] = 0 37. } 38. 39. } 40. result<-zn[["noise"]]

147

2) MSE Variogram: File: "../includes/RCode/MSE_Variogram_Exponential.R" 1. library(sp) 2. library(maptools) 3. library(lattice) 4. # PARAMETERS 5. n<-length(vectorParam)/3 6. lighttrap <- data.frame(x=c(1:n), y=c(1:n), density=c(1:n)) 7. lighttrap[,3]<-vectorParam[1:n] 8. lighttrap[,1]<-vectorParam[(n+1):(2*n)] 9. lighttrap[,2]<-vectorParam[(2*n+1):(3*n)] 10. # PROCESSING 11. coordinates(lighttrap)=~x+y 12. v_uk = gstat::variogram(density~x+y, lighttrap) 13. uk_model = gstat::fit.variogram(v_uk, gstat::vgm(1, "Exp", 40000, 0)) 14. 15. ##### GET MODEL PARAMETERS####### 16. model_nugget <- data.frame(uk_model)[1,2] 17. model_sil <- data.frame(uk_model)[2,2] 18. model_range <- data.frame(uk_model)[2,3] 19. model_kappa <- data.frame(uk_model)[2,4] 20. estimated_vectors <- NULL 21. sampled_vectors <- NULL 22. mse <- 0 23. for(cnt in 1:length(v_uk[,1])) 24. { 25. distance <- v_uk[cnt, 2] 26. estimated_value <- model_sil * 27. (1 - exp(-(distance)/(model_range))) + model_nugget 28. gamma_value <- v_uk[cnt, 3] 29. estimated_vectors <- c(estimated_vectors, estimated_value) 30. sampled_vectors <- c(sampled_vectors, gamma_value) 31. mse <- mse + (estimated_value - gamma_value) ^ 2 32. }

148

B. Different R files 1) Cross-validation (Figure 57) 1. library(sp) 2. library(maptools) 3. library(lattice) # required for trellis.par.set(): 4. trellis.par.set(sp.theme()) # sets color ramp to bpy.colors() 5. lighttrap <- read.csv(file= 6. "Three_Provinces_Lighttraps.csv",head=TRUE,sep=",") 7. region.grid <- read.csv(file= 8. "Three_Provinces_Grids.csv",head=TRUE,sep=",") 9. rg <- readShapeSpatial("3_Provinces_UTM.shp") 10. rg1<-as(rg,"SpatialPolygons") 11. rg2 = list("sp.polygons", rg1, fill = ifelse(alphaChannelSupported(), "blue", "transparent"), alpha = ifelse(alphaChannelSupported(), 0.1, 1)) 12. coordinates(lighttrap)=~x+y 13. library(gstat, pos = match(paste("package", "sp", sep=":"), search()) + 1) 14. pts = region.grid[c("x", "y")] 15. predictiongrid = SpatialPixels(SpatialPoints(pts)) 16. gridded(predictiongrid) = TRUE 17. rowno <- length(lighttrap[,1]) 18. subset_lighttrap <- lighttrap 19. cellno <- length(pts[,1]) 20. cellsize <- 1000 21. # Keep the validation data (checkpoints): 22. checkpoints <- array(0, c(rowno,4)) 23. for(i in 1:rowno) 24. { 25. # Get check point: 26. checkpoints[i, 1] <- data.frame(lighttrap)[i,1] 27. checkpoints[i, 2] <- data.frame(lighttrap)[i, 2] 28. cnt <- 0 29. locator <- 0 30. for(cnt in 1:cellno) 31. { 32. 33. inside <- point.in.polygon(checkpoints[i, 1], checkpoints[i, 2], c(pts[cnt,1], pts[cnt,1] + cellsize, pts[cnt,1] + cellsize, pts[cnt,1], pts[cnt,1]), c(pts[cnt,2], pts[cnt,2], pts[cnt,2] + cellsize, pts[cnt,2] + cellsize, pts[cnt,2])) 34. if(inside == 1) 35. locator = cnt

149

36. } 37. checkpoints[i, 3] <- lighttrap[["Day21"]][i] 38. # Leave one out: 39. if(i == 1) 40. subset_lighttrap<-lighttrap[2:rowno,] 41. else if(i == rowno) 42. subset_lighttrap<-lighttrap[1:(rowno - 1),] 43. else 44. subset_lighttrap<-lighttrap[c(1:(i-1), (i+1):rowno),] 45. v.ok = gstat::variogram(Day21~1, subset_lighttrap) 46. sil <- var(subset_lighttrap$Day21) 47. ok.model = fit.variogram(v.ok, vgm(sil, "Lin", 25000, 0)) 48. zn.ok = krige(Day21~1, subset_lighttrap, predictiongrid, model = ok.model) 49. checkpoints[i, 4] <- zn.ok[["var1.pred"]][locator] 50. } 51. 52. write.table(checkpoints, file = "ValidationResults_3_Provinces.csv", sep = ",", col.names = c("No","x","y", "sampled_value"), qmethod = "double") 53. cor(checkpoints[,3], checkpoints[,4], method="spearman") 54. cor(checkpoints[,3], checkpoints[,4], method="pearson") 55. cor(checkpoints[,3], checkpoints[,4], method="kendall")

2) Validation of BPH Prediction Model by RMSE (Table 10) 1. empirical_data <- 2. read.csv(file=" EmpiricalData.csv",head=TRUE,sep=",") 3. simulation_data <- read.csv(file="SimulationData.csv",head=TRUE,sep=",") 4. # Estimated data: 5. sum<-0 6. for(i in 1:48) 7. { 8. simulation<-simulation_data[1:32,1] 9. real<-empirical_data[1:32,1] 10. deviation<-simulation - real 11. deviation 12. deviation ^ 2 13. RMSE1<-sqrt(mean(deviation ^ 2)) 14. sum <- sum + RMSE1 15. } 16. RMSE1_mean <- sum/48 17. simulation<-simulation_data[33:64, 1]

150

18. real<-empirical_data[33:64, 1] 19. deviation<-simulation - real 20. deviation 21. deviation ^ 2 22. RMSE2<-sqrt(mean(deviation ^ 2))

151