Semantic Intelligence in Big Data Applications 73

Chapter 4 1 Semantic Intelligence in Big Data 2 Applications 3 Valentina Janev 4 Abstract Today, data are growing at a tremendous rate, and according to the 5 International Data Corporation, it is expected they will reach 175 zettabytes by 6 2025. The International Data Corporation also forecasts that more than 150B devices 7 will be connected across the globe by 2025, most of which will be creating data in 8 real time, while 90 zettabytes of data will be created by Internet of things (IoT) 9 devices. This vast amount of data creates several new opportunities for modern 10 enterprises, especially for analyzing enterprise value chains in a broader sense. In 11 order to leverage the potential of real data and build smart applications on top of 12 sensory data, IoT-based systems integrate domain knowledge and context-relevant 13 information. Semantic intelligence is the process of bridging the semantic gap 14 between human and computer comprehension by teaching a machine to think in 15 terms of object-oriented concepts in the same way as a human does. Semantic 16 intelligence technologies are the most important component in developing artifi- 17 cially intelligent knowledge-based systems, since they assist machines in contextu- 18 ally and intelligently integrating and processing resources. This chapter aims at 19 demystifying semantic intelligence in distributed, enterprise, and Web-based infor- 20 mation systems. It also discusses prominent tools that leverage semantics, handle 21 large data at scale, and address challenges (e.g., heterogeneity, interoperability, and 22 machine learning explainability) in different industrial applications. 23 Keywords Semantic intelligence · Big data applications · Knowledge graphs · 24 Artificial intelligence · Interoperability 25 Key Points 26 • Semantic intelligence is the process of bridging the semantic gap between human 27 and computer comprehension. 28 V. Janev (*) Institute Mihajlo Pupin, University of Belgrade, Belgrade, Serbia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 71 S. Jain, S. Murugesan (eds.), Smart Connected World, https://doi.org/10.1007/978-3-030-76387-9_4 72 V. Janev 29 • There is a need for semantic standards to improve the interoperability of complex 30 systems. 31 • The semantic data lakes supply the data lake with a semantic middleware that 32 allows uniform access to original heterogeneous data sources. 33 • Knowledge graphs is a solution that allows the building of a common under- 34 standing of heterogeneous, distributed data in organizations and value chains, and 35 thus provision of smart data for artificial intelligence applications. 36 • The goal of semantic intelligence is to make business intelligence solutions 37 accessible and understandable to humans. 38 4.1 Introduction 39 Both researchers and information technology (IT) professionals have to cope with a 40 large number of technologies, frameworks, tools, and standards for the development 41 of enterprise Web-based applications. This task has become even more cumbersome 42 as a result of the following events: 43 • The emergence of the Internet of things (IoT) in 1999 (Rahman & Asyhari, 2019) 44 • The development of Semantic Web (SW) technologies as a cornerstone for 45 further development of the Web (Berners-Lee, 2001) 46 • The development of big data solutions (Laney, 2001) 47 Hence, topics such as smart data management (Alvarez, 2020), linked open data 48 (Auer et al., 2014), semantic technologies (Janev & Vraneš, 2009), and smart AU1 49 analytics have spawned a tremendous amount of attention among scientists, software 50 experts, industry leaders, and decision-makers. Table 4.1 defines a few terms related 51 to data, such as open data, big data, linked data, and smart data. t1:1 Table 4.1 Definitions t1:2 Term Definition Open data “The data available for reuse free of charge can be observed as open data” (Janev t1:3 et al., 2018) Big data “‘Big data’ are high-volume, velocity, and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and t1:4 decision making” (Laney, 2001) “Big data are high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight t1:5 discovery, and process optimization” (Manyika, 2011) Linked The term “linked data” refers to a set of best practices for publishing structured data data on the Web. These principles have been coined by Tim Berners-Lee in the design t1:6 issue note Linked Dataa (Berners-Lee, 2006) Smart “Simply put, if big data are a massive amount of digital information, smart data are data the part of that information that is actionable and makes sense. It is a concept that developed along with, and thanks to, the development of algorithm-based technol- t1:7 ogies, such as artificial intelligence and machine learning” (Dallemand, 2020) t1:8 ahttps://www.w3.org/DesignIssues/LinkedData 4 Semantic Intelligence in Big Data Applications 73 Despite the fact that the term IoT (“sensors and actuators embedded in physical 52 objects and connected via wired and wireless networks”) is 20 years old, the actual 53 idea of connected devices is older and dates back to the 1970s. In the last two 54 decades, with the advancement in ITs, new approaches have been elaborated and 55 tested for handling the influx of data coming from IoT devices. On one side, the 56 focus in industry has been on manufacturing and producing the right types of 57 hardware to support IoT solutions. On the other, the software industry is concerned 58 with finding solutions that address issues with different aspects (dimensions) of data 59 generated from IoT networks, including (1) the volume of data generated by IoT 60 networks and the methods of storing data, (2) the velocity of data and the speed of 61 processing, and (3) the variety of (unstructured) data that are communicated via 62 different protocols and the need for adoption of standards. While these three Vs have 63 been continuously used to describe big data, additional dimensions have been added 64 to describe data integrity and quality, such as (4) veracity (i.e., truthfulness or 65 uncertainty of data, authenticity, provenance, and accountability), (5) validity (i.e., 66 correct processing of data), (6) variability (i.e., context of data), (7) viscosity (i.e., 67 latency data transmission between the source and destination), (8) virality (i.e., speed 68 of the data sent and received from various sources), (9) vulnerability (i.e., security 69 and privacy concerns associated with data processing), (10) visualization (i.e., 70 interpretation of data and identification of the most relevant information for the 71 users), and (11) value (i.e., usefulness and relevance of the extracted data in making 72 decisions and capacity to turn information into action). 73 With the rapid development of the IoT, different technologies have emerged to 74 bring the knowledge (Patel et al., 2018) within IoT infrastructures to better meet the 75 purpose of the IoT systems and support critical decision-making (Ge et al., 2018; 76 Jain, 2021). While the term “big data” refers to datasets that have large sizes and 77 complex structures, the term “big data analytics” refers to the strategy of analyzing 78 large volumes of data which are gathered from a wide variety of sources, including 79 different kind of sensors, images/videos/media, social networks, and transaction 80 records. Aside from the analytic aspect, big data technologies include numerous 81 components, methods, and techniques, each employed for a slightly different pur- 82 pose, for instance for pre-processing, data cleaning and transformation, data storage, 83 and visualization. 84 In addition to the emergence of big data, the last decade has also witnessed a 85 technology boost for artificial intelligence (AI)-driven technologies. A key prereq- 86 uisite for realizing the next wave of AI application is to leverage data, which are 87 heterogeneous and distributed among multiple hosts at different locations. Conse- 88 quently, the fusion of big data and IoT technologies and recent advancements in 89 machine learning have brought renewed visibility to AI and have created opportu- 90 nities for the development of services for many complex systems in different 91 industries (Mijović et al., 2019; Tiwari et al., 2018). Nowadays, it is generally 92 accepted that AI methods and technologies bring transformative change to societies 93 and industries worldwide. In order to reduce the latency, smart sensors (sensor 94 networks) are empowered with embedded intelligence that performs pre-processing, 95 reduces the volume, and reacts autonomously. Additionally, in order to put the data 96 74 V. Janev 97 in context, standard data models are associated with data processing services, thus 98 facilitating the deployment of sensors and services in different environments. 99 This chapter explains the need for semantic standards that improve interopera- 100 bility in complex systems, introduce the semantic lake concept, and demystify the 101 semantic intelligence in distributed, enterprise, and Web-based information systems 102 (see the following section). In order to select an appropriate semantic description, 103 processing model, and architecture solution, data architects and engineers need to 104 become familiar with the analytical problem and the business objectives of the 105 targeted application. Therefore, the authors describe four eras of data analytics and 106 introduce different big data tools. 107 4.2 From Data to Big Data to Smart Data Processing 108 Data-driven technologies such as big data and the IoT, in combination with smart 109 infrastructures for management and analytics, are rapidly creating significant oppor- 110 tunities for enhancing industrial productivity and citizen quality of life. As data 111 become increasingly available (e.g., from social media, weblogs, and IoT sensors), 112 the challenge of managing them (i.e., selecting, combining, storing, and analyzing 113 them) is growing more urgent (Janev, 2020).

Load more