<<

PoS(ISCC2015)022 e in this http://pos.sissa.it/ In particular, w ining M attern 2511 -ro, , 30019, Korea city, Sejong Sejong-ro, 2511 30019, Korea city, Sejong Sejong-ro, 2511 2511 Sejong-ro, Sejong city, 30019, Korea 30019, city, Sejong Sejong-ro, 2511 P , , , , we propose a novel big data analytics model using extensive volumes study

3 ,HeesuChoi, Lee Jonguk 1 3 2 [email protected] [email protected];[email protected]; [email protected] necessary to develop technology for analysing livestock disease and predicting livestock disease propagation. In this of livestock disease occurrence data accumulated over andescribe extendeda sampleperiod. process based on a specific scenario thatsequential outbreaksdissemination byroutes applyingof HPAI sequential patternelicits mining information that generates paper. Highly pathogenic avian influenza virus(HPAI) can spread rapidly, resulting in high mortality and severe economic damage. To minimize the damage incurred from such diseases, it is Copyright owned by the author(s) under the terms of the Creative Commons the terms the under the owned Creative of by author(s) Copyright

Speaker Correspongding Author This was Basic research by supported Science Program Research the through National Research the Education by of funded Foundation Ministry (NRF- of Korea (NRF) and (Brain2015R1D1A3A01018731) Korea) 21 Plus BK Program.  4.0). License (CC BY-NC-ND 4.0 International Attribution-NonCommercial-NoDerivatives 18-19, December, 2015 18-19, December, ChinaGuangzhou, 1 2 3 ISCC 2015 Yongwha Chung Yongwha Campus Sejong University Korea E-mail: E-mail: [email protected] Daihee Campus Sejong University Korea E-mail: usingSequential ZhenshunXu Campus Sejong University Korea Propagation Routes Analysis of HPAI Outbreaks Analysisof HPAI PropagationRoutes PoS(ISCC2015)022 . Zhenshun Xu Zhenshun studies reported hasstudies the feasibility and test a sample process based on a . We ing describ an extensive variety ofvarietyextensive an in this paper 2 rchitecture

A operations and powerful data-mining techniques on techniquespowerfuldata-mininga operationsand

ystem S provided by the Ministry ofAgriculture in [5] ystem S In particular, we focus on nalysis A operations and powerful data-mining techniques on a multidimensional

nalysis A ethods M utbreak O utbreak utbreak Figure 1 illustrates the overall concept of the proposed analysis system. theproposed analysis concept theoverallof illustrates Figure 1 O

HPAI HPAI

. In this section, we analysisintroduce model anthat HPAI can analyze extensive long-term Unlike the current research perspectives, in this study, we propose an efficient big data Since the virusfirst atdetection theof endthe of HPAI have 2003, five outbreaks of HPAI Materials and Introduction Figure 1: Figure HPAI data usingHPAI OLAP datacube applicability of the proposed HPAI analysis model by implementing a HPAI analysis andsystemanalysis by a implementing HPAI model analysis applicability HPAI the proposed of Korea. Southin outbreaks oftoanalysis applying HPAI it the 2. 2.1HPAI The proposed analysis model provides comprehensive and detailedsimple (OLAP) on analytic processing line analysis results utilizing multidimensional data cube. specific scenario that elicits information that generates sequential dissemination routes of HPAI outbreaks by applying sequential pattern mining model the disease spread byavianpandemicinfluenza[4]. individuals who were susceptible to or were infected with analytics solution for the analysis livestock ofdisease occurrence data HPAI in South Korea using accumulated long-term the virus attached to aerosols produced by livestock[2] . Lee et al. constructed a spreaddirect HPAI network based on the relationships between farms using poultry-relatedand anbusiness indirect data spreadHPAI network using the aerial spread from each farm during the HPAI outbreak in 2008[3]. Tuncer and Le studied the effectsinfluenzaAsian fromAustralian and cities to ofthe United States. Real airair travel data was used totravel on the spread of avian been reported in South Korea [1]. Highly infectious livestock diseases have increased the threat to human life, causing minimize considerablethe damage incurred economic from such damagediseases, it is and date, To data. occurrence livestockdisease necessaryanalyzing to environmental develop technology problems.for To on outbreaks.HPAI Seo et al. used computational fluid dynamics to estimate the dispersion of Propagation Routes Analysis of HPAI Outbreaks Analysis of HPAI Routes Propagation 1. PoS(ISCC2015)022 of HPAI Zhenshun Xu Zhenshun summary Total outbreak sites outbreak Total 10 5 19 25 5 Since the first detection . diagnosis time, farm address, farm’s name, First outbreak site First outbreak Chungbuk.Eumseong Jeonbuk. Jeonbuk. Jeonbuk.Iksan Jeonbuk.Gochang 3

Figure 2 illustrates the HPAI outbreaks in South Korea 361 Numbers of outbreak 19 7 33 53 ining M serveral attributes such as attern attern P ources S Summary of HPAI Outbreak Status Outbreak HPAI ofSummary In this section, we briefly describe a data-mining techniques to retrieve useful information In our experiments, we used the livestock disease occurrence data provided by the As can be seen in this figure, the system consists of four layers from data resources to the Sequential Sequential 2003/2004 (winter) 2003/2004 (winter) 2006/2007 2008 (spring) (winter) 2010/2011 (winter) 2014/2015 Year (starting season) Year algorithm [7-8]. The data set for sequence mining consists of a Eachcollection input ofsequence in inputthe datasequence. set has a unique identifier called TID which it represents the year of occurrence, and each event in a given input sequence also has a unique identifier called EID which it represents the occurrence time in same TID, and items eachwhich it represents eventthe locations of containsoccurrence, in 2the Table presentssequence paper. an example Items. and EID, TID, includingminingpattern dataset for sequential subsequences as patterns. Given a set of sequences, where each sequence consists of eventsa (orlist elements)of and each event consists of a set of items, and support athreshold, sequential pattern mininguser-specified identifies all minimumthe frequent subsequences; that is, the subsequences whose occurrence frequency in the set of sequences is no less than supporta minimum threshold [6]. In this paper we used ‘arulesSequence’ packages in R with SPADE 2.3 for a comprehensive and in-depth understanding of HPAI diseases: sequential pattern mining. Sequential pattern mining is the discovery of frequently occurring ordered events or Table 1: 1: Table occurrence status across the countryoutbreak site, and includingtotal outbreak sites. outbreak year, numbers (untilgraphically. 2003 May to 2015 13 from over2015), years, of outbreak, first Ministry Agricultureof in South Korea over 13 years from 2003 to 2015 (until May 2015) [5]. The row data contains legal disease grade, livestock disease’s name, and variety of livestock of HPAI virus at the end of 2003, five HPAI occurrences have been reported in South 2003/2004,Korea: 2006/2007, 2008, 2010/2011, and 2014/2015.1 Table presents a The results of this disinfection layer policy provideand strategy valuablefor preventing informationand controlling interfaces. visualizationlayer’s toepidemic diseases stronglyin the suggestlast an improved Data 2.2 data analytics with OLAP and data mining techniques, and visualization. The various forms of data repositories in the first layer are delivered to the next layer of storage.ensure To that only data in the standard format is stored in the data warehouse, a process called extract, transform, and load (ETL) is performed to preprocess the data collected from the different data sources. In the third layer, we perform multidimensional analysis using OLAP and data-mining techniques. Propagation Routes Analysis of HPAI Outbreaks Analysis of HPAI Routes Propagation representation of analysis results: source data repositories, data warehouse and pre-processing, PoS(ISCC2015)022 Zhenshun Xu Zhenshun JeanNam.Yeongam Jeonbuk.Iksan Gyeonggi.Pyengtak JeanNam. Jeonbuk.Iksan Gyeonggi.Pyengtak Jeonbuk.Kimje Jeonbuk.Iksan Gyeonggi.Anseng JeanNam.Yeongam JeanNam.Naju Jeonbuk.Iksan JeanNam.Yeongam Gyeongbuk. Gyeonggi.Anseng JeanNam.Yeongam JeanNam.Naju Jeonbuk.Iksan Items ining M 4 ountry C

attern P 4 1 2 3 1 2 3 4 5 EID 1 2 3 1 2 3 1 2 3 cross cross the A equential S the feasibility and applicability of the proposed system by tatus test S ata set for ata set utbreak utbreak 3 (08) 4 (10/11) 4 (10/11) 4 (10/11) 5 (14/15) 5 (14/15) 5 (14/15) 5 (14/15) 5 (14/15) TID 1 (03/04) 1 (03/04) 1 (03/04) 2 (06/07) 2 (06/07) 2 (06/07) 3 (08) 3 (08) 3 (08) D O HPAI HPAI Example In this section, we Results [1,9] , we can observe that the outbreak is propagated rapidly over a wide area during the first elicits information that generates the sequential dissemination routes of HPAI outbreaks from “Gochang” within a specific period. We used the HPAI occurrence data set in“arulesSequences” package2014 in withR thetools for sequential pattern mining analysis. Table 3 presents sample of the obtained sequential considering patternthe nearby miningsites (see rules Figure 2) (mininum to support“Gochang”. Concerning ≥the incubation 0.2)period without 3. describing the experimental results that were applied to the HPAI outbreak data using R tools [8]. We are interested in identifying the sequential dissemination routes of HPAI outbreaks in South Korea. From our trial experiments, we describe a process based on a certain scenario that Table 2: 2: Table Figure 2: Figure Propagation Routes Analysis of HPAI Outbreaks Analysis of HPAI Routes Propagation PoS(ISCC2015)022 Zhenshun Xu Zhenshun ules R 5

attern attern P the feasibility and applicability of the proposed HPAI equential tested S btained O equential routes of HPAI outbreaks from “Gochang” in GIS. in from “Gochang” outbreaks HPAI ofequential routes S ofExample Chungnam.Buyeo(14.01.25)→Chungnam.Chenan(14.01.28)→Chungnam.Chengyang(14.02.17) Chungnam.Chenan(14.01.28)→Chungbuk.Jinchen do.Anseng(14.02.13)→Gyeonggi-do.Pyeongtak(14.02.25) (14.01.29)Gyeongnam.(14.01.30) → Jeonbuk.Imsil(14.01.29) → Jeonbuk.(14.01.24) Chungbuk.Emseong(14.02.07) → Chungbuk.Jinchen(14.01.29) | Chungnam.Nonshan(14.02.24) → → Jeonbuk.Gimje(14.02.19) Jeonbuk.Jeongeup(14.02.17) Gyeonggi-do.Hwaseong(14.01.30)(14.02.28) Jeonnam.Yeongam(14.02.17)→Jeonnam.HamPyeong(14.02.24)→Jeonnam.Yeonggwang | Gyeonggi- Sequential routes of HPAI outbreaks ofHPAI routes Sequential Chungnam.Buyeo(14.01.25) Jeonbuk.Buan(14.01.17) → Jeonbuk.Gochang(14.01.16) → Jeonbuk.Jeongeup(14.01.24) Jeonbuk.Gochang(14.01.16) → Jeonnam.Haenam(14.01.26) Jeonbuk.Gochang(14.01.16) → Jeonnam.Yeongam(14.01.30) Jeonnam.Naju(14.01.28) Jeonbuk.Gochang(14.01.16) → → The detailed analysis of HPAI occurrences and early prediction of the HPAI outbreak Discussions andDiscussions Conclusion 5 6 7 8 9 10 Index 1 2 3 4 information that generated the sequential dissemination routes of outbreaksHPAI by applying sequential pattern mining. We sequence are important issues in the In livestock thiswe industry. paper, proposed a new design methodology for the analysis of HPAI in South Korea.construction of anThe integrated data analysis model occurrencecoreusing dataHPAI accumulated in a of the methodology is the data warehouse over an extended period. The proposed analysis model elicited useful Figure 3: Figure 4. Table 3: Table HPAI HPAI in the first two weeks. Finally, we depicted the sequential propagation (seeFigure 3). Korea version Southa of in GIS “Gochang” outbreaks from routes of HPAI Propagation Routes Analysis of HPAI Outbreaks Analysis of HPAI Routes Propagation two weeks and is spread sparsely after two weeks; it is propagated throughout the nation within six weeks [9]. Therefore, we can state that it is critical to initiate preventive actions against PoS(ISCC2015)022 [J]. [J]. ccessed ccessed Zhenshun Xu Zhenshun [J]. Biosystems Biosystems [J]. [M]. Morgan [M]. Morgan Prediction of the spread the of spread Prediction Pathologic changes in Pathologic changes [J]. Machine learning. [J]. Machine learning. H5N8 highly pathogenic : 115-127(2014) Accessed December 10, 2015 (1) Prediction of the spread of highly of of spread the Prediction 6

Data Mining: Concepts and Techniques Concepts Data Mining: [J]. engineering. 118 Biosystems Effect of air travel on the spread of an avian influenza pandemic to of influenza the an avian Effect of spread air on the travel https://cran.r-project.org/web/packages/arulesSequences/index.html A https://cran.r-project.org/web/packages/arulesSequences/index.html : 160-176(2014) International journal of critical 27-47(2014) International of journal infrastructure 7(1): protection. (1) [J]. SPADE: An efficient algorithm for mining frequent sequences frequent efficientAn algorithm for mining SPADE: [J]. Osong public health Osong research perspectives. and [J]. 106-111(2015) public 6(2): : 31-60(2001) (1) To To the best of our knowledge, this is the first report of OLAP analysis for accumulated avian influenza in the republic of Korea: Epidemiology during the first wave, from January through January through wave, during first Epidemiology the from of Korea: influenzaavian in republic the July 2014 M. J. Zaki. M. J. Zaki. 42 R arulesSequences. December 10, 2015 al. et , H.-Y. Kang, Y.-M. Choi, Jeong, J. W. O.-K. Moon, Yoon, H. KAHIS (Korean Animal System). Health Integration KAHIS (Korean http://www.kahis.go.kr/home/lkntscrinfo/selectLkntsOccrrncList.do ed., J. Pei. 3rd and J. Han,Kamber, M. 125-148,243-272(2012) Burlington. Kaufmann Publishers, MA H.-J. Lee, I.-H. Seo, O.-K. Suh, I.-B. Moon. Lee, Jung, K. N.-S. analysis network network: usingpathogenic a multifactor Part 2–comprehensive influenza avian infection route direct/indirect with T. and N., Le, Tuncer, StatesUnited wild birds infected with highly pathogenic avian influenza a (H5N8) viruses, South Korea, 2014 infected influenza avian South highly pathogenic a (H5N8) with Korea, viruses, birds wild 775-780(2015) diseases. 21(5): infectious Emerging al. et Hong, H.-J. O.-K. Jung, I.-H. Seo, I.-B. Moon, N.-S. Lee, S.-W. Lee, 1–developmentof influenza network: avian highly pathogenic using a multifactor Part and dynamics of computational dispersion airborne simulations application fluid of 121 engineering. H.-R. Kim, Y.-K. Kwon, I. Jang, Y.-J. Lee, Lee, H.-M. al. et Kang, K.-H. Y.-J. Kwon, Jang,I. Y.-K. H.-R. Kim, [7] [8] [9] [5] [6] [3] [4] [2] [1] is required and is a part of our ongoing research. We it. in techniquessystemvarious analysisembeddingprototype data-mining also by plan to increase the power of our References long-term livestock disease occurrence data and data routes analysis of disease miningoutbreaks.The application of big data analytics including data mining analysis for the dissemination methods is appropriate considering the continuouscharacteristic of the newand era of big data. Further largetesting and refinement of the proposed incoming system data stream that is Propagation Routes Analysis of HPAI Outbreaks Analysis of HPAI Routes Propagation analysis model by implementing analysisan systemHPAI using R tools and then applying it to outbreaks Korea. in South HPAI oftheanalysis