Role of Pre-Processing in Textual Data Fusion: Osman, Mohd Haniff; Kaewunruen, Sakdirat
University of Birmingham
DOI: 10.3389/fbuil.2018.00030
License: Creative Commons: Attribution (CC BY)
Document Version: Publisher's PDF, also known as Version of record
Citation for published version (Harvard): Osman, MH & Kaewunruen, S 2018, 'Role of Pre-processing in Textual Data Fusion: Learn from the Croydon Tram Tragedy', Frontiers in Built Environment, vol. 4, 30. https://doi.org/10.3389/fbuil.2018.00030
Published in Frontiers in Built Environment on 09/07/2018
PERSPECTIVE
published: 09 July 2018
doi: 10.3389/fbuil.2018.00030

Role of Pre-processing in Textual Data Fusion: Learn From the Croydon Tram Tragedy

Mohd H. Bin Osman 1,2* and Sakdirat Kaewunruen 1

1 School of Civil Engineering, University of Birmingham, Birmingham, United Kingdom, 2 Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi, Malaysia

Tram and train derailments caused by human error make investments in advanced control rooms and information-gathering systems appear exaggerated. The 2016 disaster in Croydon is recent evidence of how limited the acquired systems are in mitigating human shortcomings under disrupted circumstances. One intriguing resolution could be to fuse continuous online textual data obtained from tram travelers and to use the information for early warning in risk discovery. This resolution draws our attention to data fusion as a resource. The focus of this paper is the role of pre-processing steps in low-level data fusion, which have been identified as a way to avoid wasting time and effort during information retrieval. Trends in online text data pre-processing are reviewed, resulting in a design suggestion that takes in travelers' responses through social media channels. The research outcome, illustrated by a case of data fusion, could act as an impetus for the railway industry to take an active part in data exploration and information analysis.

Edited by: Min An,
University of Salford, United Kingdom
Reviewed by: Grigorios Fountas, University at Buffalo, United States; Ivo Haladin, Faculty of Civil Engineering, University of Zagreb, Croatia
*Correspondence: Mohd H. Bin Osman [email protected]
Specialty section: This article was submitted to Transportation and Transit Systems, a section of the journal Frontiers in Built Environment
Received: 02 February 2018
Accepted: 12 June 2018
Published: 09 July 2018

Keywords: data fusion, text processing, social media, alert system, risk mitigation, disruption, tram accident, microsleep

INTRODUCTION

Data in the railway industry is enormous, covering both train/tram operations and infrastructure management. Acknowledging the many success stories from outside the railway domain associated with the adoption of data-driven innovation, the British railway infrastructure manager has launched the “Challenge Statements” program series, which recognizes data exploration/exploitation as one item on its agenda (Network Rail, 2017). One element with great potential to accelerate the journey toward that worthwhile state is data fusion.

Data fusion is famously defined as (White, 1987): “A multi-process dealing with the association, correlation, combination of data and information from single and multiple sources to achieve refined position, identify estimates and complete and timely assessments of situations, threats and their significance.” The definition carries a spirit of encouragement and inspiration for data analysts to continuously explore and exploit data in order to enrich the value of existing information about an object of interest.
In terms of application, data fusion nowadays extends beyond its dominant and mature domains, remote sensors and signal processing, as case studies can be found in condition monitoring (Raheja et al., 2006), crime analysis (Nokhbeh Zaeem et al., 2017), forest management (Chen et al., 2005), and engineering (Steinberg, 2001). The key to applying data fusion in diverse research domains lies in the way the problem is formulated and in the choice of methods (Hannah et al., 2000). Starr et al. (2002) recognize that the capacity to identify a parallel between the problem under consideration and data fusion models is the key to success. These influential factors draw our attention to the classical data fusion framework, known as the JDL (an acronym for Joint Directors of Laboratories) framework.

The JDL framework puts the major elements of the data fusion definition into five successive steps (pre-processing, object refinement, situation assessment, threat assessment, and process refinement) to realize the encouragement expressed in the definition in a systematic and efficient way. The framework can conveniently be approached hierarchically, according to the user's capacity to accommodate the problem at hand. For example, low-level data fusion, which comprises the pre-processing step, is significant for a new application (or project) in the context of data refinement. Hence, this study is carried out to address design issues of low-level data fusion, especially pre-processing steps for textual data. The findings are expected to serve the future application of an online text data-driven alerting system inspired by the tragedy of the Croydon tram derailment.

On 9th November 2016, the commuter community, particularly in London, was stunned when the unfortunate tram No. 2551 derailed and killed seven passengers. The tragedy can be regarded as a black swan in the clean track record of the tram operator, following 15 years of re-operation in London. According to the early investigation report, human error was identified as a primary contributor to the derailment. The tram driver experienced a blackout, which led to a failure to reduce the tram speed as far as possible before entering the accident area. The report proclamation sparks

source processing due to high workload is applied on data source. Depending on the problem under consideration, data sources can be sensors, a priori knowledge, databases, or human input. The step is metaphorically viewed as a bridge connecting external elements with the data fusion framework, one that must be traversed before the subsequent, more complex data fusion steps can be performed. By imposing barriers to the free flow of data streaming, a compact representation of raw data is sufficient to produce a brief but reliable decision-making process (Eggers and Khuon, 1990). In addition, the efficiency and scalability of object refinement, a subsequent step applied to processed data, can be improved through proper pre-processing (Mitali et al., 2003).

However, this particular benefit is challenging to gain when online text is the data source. Online text is delivered from vast numbers of network points to the servers at different arrival times and in various styles, which are very informal in nature. Language-dependent factors that have no impact on information retrieval are also an identified obstacle (Singh and Kumari, 2016). Nokhbeh Zaeem et al. (2017) point out the challenge of dealing with social media text data when strengthening sentiment classification, especially in terms of short length and internet slang words. In addition, the existence of uninformative text such as HTML tags, scripts, internet abbreviations, and advertisements raises the computational complexity to a higher level compared with well-presented text (Petz et al., 2012). Dos Santos and Ladeira (2014) underline the significance of having a language detector