sensors

Article
An Analytics Environment Architecture for Industrial Cyber-Physical Systems Big Data Solutions

Eduardo A. Hinojosa-Palafox 1 , Oscar M. Rodríguez-Elías 1,* , José A. Hoyo-Montaño 1 , Jesús H. Pacheco-Ramírez 2 and José M. Nieto-Jalil 3

1 División de Estudios de Posgrado e Investigación, Tecnológico Nacional de México/I. T. de Hermosillo, Hermosillo, Sonora 83170, Mexico; [email protected] (E.A.H.-P.); [email protected] (J.A.H.-M.)
2 Departamento de Ingeniería Industrial, Universidad de Sonora, Hermosillo, Sonora 83000, Mexico; [email protected]
3 Escuela de Ingeniería y Ciencias, ITESM Sonora, Hermosillo, Sonora 83000, Mexico; [email protected]
* Correspondence: [email protected]

Abstract: The architecture design of industrial data analytics systems addresses industrial process challenges and the design phase of the industrial Big Data management drivers that consider the novel paradigm in integrating Big Data technologies into industrial cyber-physical systems (iCPS). The goal of this paper is to support the design of analytics Big Data solutions for iCPS for the modeling of data elements, predictive analysis, inference of the key performance indicators, and real-time analytics, through the proposal of an architecture that will support the integration from IIoT environment, communications, and the cloud in the iCPS. An attribute driven design (ADD) approach has been adopted for architectural design, gathering requirements from smart production planning, manufacturing process monitoring, and active preventive maintenance, repair, and overhaul (MRO) scenarios. The data management drivers presented consider new Big Data analytics techniques that show data is an invaluable asset in iCPS. An architectural design reference for a Big Data analytics architecture is proposed. This architecture supports the Industrial Internet of Things (IIoT) environment, communications, and the cloud in the iCPS context. A fault diagnosis case study illustrates how the reference architecture is applied to meet the functional and quality requirements for Big Data analytics in iCPS.

Citation: Hinojosa-Palafox, E.A.; Rodríguez-Elías, O.M.; Hoyo-Montaño, J.A.; Pacheco-Ramírez, J.H.; Nieto-Jalil, J.M. An Analytics Environment Architecture for Industrial Cyber-Physical Systems Big Data Solutions. Sensors 2021, 21, 4282. https://doi.org/10.3390/s21134282

Keywords: analytics environment architecture; industrial Big Data analytics; industrial Big Data architecture design; CPS analytics

Academic Editors: Salvatore Cavalieri and Nunzio Marco Torrisi

Received: 30 April 2021
Accepted: 20 June 2021
Published: 23 June 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
The Internet of Things (IoT) has been widely adopted by the industry [1], and its impact is transforming manufacturing. The Industrial Internet of Things (IIoT) consists of creating networks of physical objects, environments, vehicles, and machines through integrated electronic devices that allow the collection and exchange of data [2]. It opens the possibility of creating industrial cyber-physical systems (iCPS) through the convergence of information technology (networks, connectivity, data, ecosystems, and information systems) with operational technology (physical plant equipment, machinery, and systems for monitoring and controlling). The construction of iCPS enables the integration of information with manufacturing processes [3]. In this context, iCPS connect machines, systems, and assets, so organizations can create smart networks along the value chain to, among other things, optimize production processes [4]. All this interconnection of objects that generate and process data and information also requires new approaches to manage, and take advantage of, this exponential growth of data [5].

Sensors 2021, 21, 4282. https://doi.org/10.3390/s21134282 https://www.mdpi.com/journal/sensors

Machine learning models are used in Big Data to identify complex patterns and extract information from large amounts of data from industrial processes under observation [6]. Beyond sophisticated data analysis, a design approach that enables the development of analytics system architectures for industry is necessary [7]. Different architectural approaches can be found that address specific technologies or particular areas, but there is a lack of Big Data applications that integrate an analytics environment for iCPS, even though industrial analytics receives the most research interest [8–10].
Big Data solutions for iCPS need to integrate technologies in a consistent ecosystem in an industrial environment, but such solutions are complex. A reference architecture guide for Big Data analytics could facilitate the development, implementation, and operation of Big Data iCPS solutions in the industry [11]. This paper extends and completes a preliminary proposal to address this problem [12], particularly the design of a reference architecture in Industry 4.0 that follows sound architectural design practices of software engineering to facilitate the design and implementation of software-based solutions with a focus on data management.
This paper aims to improve the design of Big Data analytics solutions for iCPS and to support the modeling of data elements, predictive analysis, inference of the key performance indicators, and real-time analytics, through the proposal of an architecture that supports the integration of the IIoT environment, communications, and the cloud in the iCPS. It also presents the design of software-based solutions for iCPS with a case study in the fault diagnosis system domain that illustrates Big Data analytics on industrial cyber-physical systems.
The remainder of this paper is structured as follows: Section 2 presents the state of the art of Big Data analytics in iCPS.
Section 3 deals with general concepts related to industrial Big Data. Section 4 presents a design framework for data management architecture, which precedes a fault diagnosis scenario for optimization data services, real-time equipment monitoring, and forecasting unusual process conditions in Section 5. A discussion of the main findings together with their limitations and future work is presented in Section 6. Finally, the conclusion of this paper is in Section 7. For future reference, a list of acronyms used in this paper is provided in Table 1.

Table 1. List of acronyms and their descriptions.

Acronym | Description
ADD | Attribute driven design
ALMA | Architecture level modifiability analysis
ATAM | Architecture trade-off analysis method
CAD | Computer-aided design
CAE | Computer-aided engineering
CAM | Computer-aided manufacturing
CAS | Computer-aided systems
CBAM | Cost benefit analysis method
CRM | Customer relationship management
ERP | Enterprise resource planning
ETL | Extract-transform-load
HDFS | Hadoop distributed file system
iCPS | Industrial cyber-physical system
IIoT | Industrial Internet of Things
KPI | Key performance indicator
MES | Manufacturing execution system
MIS | Manufacturing information systems
MQTT | Message queue telemetry transport
MRO | Maintenance, repair and overhaul
OLAP | On-line analytical processing
PaaS | Platform as a service
PDM | Precedence diagram method
PLC | Programmable logic controller
RFID | Radio frequency identification
SAAM | Software architecture analysis method
SCADA | Supervisory control and data acquisition
SCM | Supply chain management

2. Related Works
The architectural design of Big Data solutions must meet the challenges of a demanding IIoT ecosystem with cloud services, including the challenge of providing context to support data analysis. These requirements call for agile and flexible architecture integration; hence, proposing a reference architecture is a common approach in various researchers' architecture designs. For example, in [13], the architectural design focus is emphasized, pointing out the complexity of Big Data management compared with small data systems. Big Data-system design (BDD) addresses the design of Big Data architecture by combining data modeling with reference architectures and a technology catalog to facilitate the development of a fitting Big Data platform. In [7], a solution for goal and obstacle analysis for Big Data architecture is proposed, selecting architectural decisions using imperfect information that satisfies the goal drivers of stakeholders at the moment of the decision outcome. In [14], a use case modeling method is presented that abstracts the Big Data architecture to describe the roles, activities, and functional components of Big Data in a better and more neutral way.
Another approach taken by several researchers is the development of Big Data analytics architectures that consider the different points of view in the technological merge that occurs in iCPS and allow the development of Big Data analytics platforms. For example, an architecture that collects and integrates industrial data from IIoT and manufacturing information systems (MIS) is proposed in [15], with an approach for the model-driven generation of architectures to enable subsequent industrial analytics.
A five-level architecture is used for CPS design in [10,16]. The architecture is called 5C because it is built in five levels: intelligent connection, data-to-information conversion, cybernetic, cognition, and configuration. This five-level architecture is presented as an aid to guide the development of iCPS, aiming for smart manufacturing and resilient equipment that delivers quality products and enhanced system performance.
An architecture for the entire life cycle of aluminum industry production and supply chain in Industry 4.0 is presented in [17]. It is a theoretical approach composed of nine layers: seven horizontal layers (from physical equipment to data analysis) and two vertical layers (data models and software systems), each containing several components and subcomponents. This makes the architecture highly complex, since it focuses on integrating all the physical and software elements and systems that could be in operation in an aluminum production company. In [18], the cognitive-oriented IoT Big Data (COIB) framework integrates Big Data and IoT to implement an industrial IoT system. The COIB framework wrangles data to extract the right data and metadata from multisource and heterogeneous data for better data governance. In [19], the authors propose a data architecture based on five layers that attempts to integrate sensors, actuators, networks, the cloud, and IoT technologies for the generation of applications for Industry 4.0. The data response layer maintains persistence among the others through data management between them.
The evolution of manufacturing execution systems (MES) from traditional manufacturing to Industry 4.0 integrates manufacturing activities with information technology advancements that rely on real-time software applications. For example, an MES that connects the world of machines with business systems and considers the particular techniques of production systems is proposed in [20], with software personalized for automating such production systems. An approach for automating the integration of MES with other manufacturing systems, which simplifies design and implementation in the food and beverage industry, is proposed in [21]. The approach in [22] focuses on real-time analytics for activities on the shop floor.
Human tasks can be automated or augmented by the new role of MES in the digital twin of a physical product.
The analysis of the selected articles indicates a lack of development of industrial analytics applications. Therefore, given that the need for analytics in industry is growing fast and that industrial Big Data is still at an early stage, there is room for a contribution that considers an architecture design integrating Big Data technologies with industrial analytics.

3. Industrial Big Data Analytics
In iCPS, data is generated by multiple sources in different contexts such as facilities equipment, process equipment, manufacturing systems, and environment data, including sensors, machine controllers, and manufacturing systems, just to mention a few. This could generate a massive volume of data that arrives at high speed and in different formats, called Big Data (see Figure 1).

Figure 1. Big Data lifecycle.

However, raw data alone does not yield data value. Firstly, data wrangling for iCPS is needed before processing because of data noise, multiple formats, scale differences, and heterogeneous sources, among other issues. Next, high-value data is preserved as historical data and processed for exchanging and sharing at all levels. Then, usually through cloud services, data are processed by data mining and machine learning.
To address these challenges, an analytics architecture for iCPS Big Data was defined following three questions. The architecture considers the novel paradigm of integrating Big Data technologies into iCPS and allows applications to be created from data models (see Figure 2) that support Big Data analytics and address industrial challenges, even though Big Data technology changes rapidly. The first question is how to address industrial process challenges to achieve a Big Data model, and how Big Data systems carry out the design phase of the industrial Big Data management drivers. The architecture design of an industrial analytics system is much more critical than it is for legacy data systems, so the second question centers on how to extend traditional architecture design to the system design of iCPS analytics. The third question integrates the other two: how can the Big Data model and the architecture design be integrated into an analytics system framework in the context of iCPS? An architectonical guide should consider new data modeling analytics techniques that include:


• Diagnosis modeling. It extracts information from historical and transactional industrial data processes to recognize patterns in data by employing pattern recognition approaches.
• Predictive modeling. It uses Big Data mining, applying modern statistical approaches, data visualization, and pattern recognition, to recognize organization threats and produce forecasts.
• Prescriptive modeling. It is used to generate better solutions by means of real-time operation of machine learning-based optimization models that allow data integration and processing, enabling real-time monitoring and better decisions.
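The three modeling styles can be contrasted on a toy sensor series. The sketch below is only illustrative and is not the method used in this paper: it assumes a z-score rule for diagnosis, a least-squares trend line for prediction, and a fixed threshold rule standing in for the machine learning-based optimization of prescriptive modeling. All function names, the `z_limit` parameter, and the action labels are hypothetical.

```python
from statistics import mean, pstdev


def diagnose(history, z_limit=2.0):
    """Diagnosis: return indices of historical readings far from the mean."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return []  # constant signal, nothing stands out
    return [i for i, x in enumerate(history) if abs(x - mu) / sigma > z_limit]


def predict_next(history):
    """Predictive: extrapolate one step ahead with a least-squares line."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(history)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar + slope * (n - x_bar)


def prescribe(forecast, limit):
    """Prescriptive (simplified): map a forecast to a recommended action.

    A threshold rule is assumed here in place of an optimization model.
    """
    return "schedule_maintenance" if forecast > limit else "continue_production"


if __name__ == "__main__":
    vibration = [1.0, 2.0, 3.0, 4.0]  # made-up readings
    forecast = predict_next(vibration)
    print(diagnose(vibration), forecast, prescribe(forecast, limit=4.5))
```

In a real deployment each function would be replaced by a trained model; the point of the sketch is only the division of labor between looking back, looking ahead, and recommending an action.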

Figure 2. Big Data analytics management model.
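The wrangling step mentioned in this section (mixed formats, scale differences, heterogeneous sources) can be sketched as follows. This is a minimal example under stated assumptions: the record layout (`source`, `value`, `unit`), the temperature use case, and the function names are all hypothetical, and noise handling is limited to discarding readings that cannot be parsed.

```python
import math


def to_celsius(value, unit):
    """Bring readings to one scale; only 'C' and 'F' are assumed here."""
    return (value - 32.0) * 5.0 / 9.0 if unit == "F" else value


def wrangle(raw_records):
    """Normalize heterogeneous records into one schema, dropping bad values."""
    clean = []
    for rec in raw_records:
        try:
            value = float(rec["value"])  # accepts "21.5" as well as 21.5
        except (KeyError, TypeError, ValueError):
            continue  # missing or unparseable reading
        if math.isnan(value):
            continue
        clean.append({
            "source": rec.get("source", "unknown"),
            "temp_c": to_celsius(value, rec.get("unit", "C")),
        })
    return clean


def min_max_scale(values):
    """Rescale values to [0, 1] so sources with different ranges compare."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    raw = [
        {"source": "plc", "value": "20.0", "unit": "C"},
        {"source": "scada", "value": 212.0, "unit": "F"},
        {"source": "mes", "value": "n/a"},
    ]
    records = wrangle(raw)
    print(records)
    print(min_max_scale([r["temp_c"] for r in records]))
```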

3.1. Industrial Big Data Management Drivers
Data is an invaluable asset in iCPS [23]. It enables smart manufacturing. Its strategic significance lies in obtaining value for analytics through Big Data processing.
Before the production process begins, smart production planning takes into account resource data of the production process. Next, the relationship of global data allows the optimized production plan to be quickly generated, improving planning speed and accuracy.
In the production process, real-time data should facilitate the monitoring of the production process, so that manufacturers can keep up to date with production deviations and generate optimal operational control plans [24]. Fault diagnosis and operation process optimization can be accomplished by storing and analyzing data for active preventive maintenance, repair, and overhaul (MRO), through Big Data from IIoT.
Figure 3 shows that in iCPS, Big Data emerges from the accumulation of data generated in the manufacturing lifecycle, such as planning, production, and MRO [25].
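As a rough illustration of monitoring production deviations in real time, the sketch below compares a moving average of incoming readings against a planned target. The class name, the `target`, `tolerance`, and `window` parameters, and the numbers used are assumptions made for this example only.

```python
from collections import deque


class DeviationMonitor:
    """Flag when the recent average drifts from the planned target."""

    def __init__(self, target, tolerance, window=5):
        self.target = target
        self.tolerance = tolerance
        self.readings = deque(maxlen=window)  # keeps only the latest readings

    def update(self, value):
        """Add one reading; return (alert, deviation of the moving average)."""
        self.readings.append(value)
        average = sum(self.readings) / len(self.readings)
        deviation = average - self.target
        return abs(deviation) > self.tolerance, deviation


if __name__ == "__main__":
    monitor = DeviationMonitor(target=100.0, tolerance=2.0, window=3)
    for reading in [100.0, 101.0, 99.0, 110.0]:
        alert, deviation = monitor.update(reading)
        print(reading, alert, round(deviation, 2))
```

A moving average is chosen here so that a single noisy reading does not trigger an alert on its own; other smoothing choices would work equally well.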

A simplified data source for industrial analytics for iCPS, useful for understanding industrial Big Data management drivers, is shown below, including real-time data for industrial processes and data for manufacturing information systems. In [23] a complete classification of data types for smart manufacturing can be found.
• Resource data from production processes, which includes (a) data collected from IIoT infrastructure; (b) data collected from material, product, and service systems; (c) environmental data.
• Management data from manufacturing information systems (manufacturing execution system (MES), enterprise resource planning (ERP), customer relationship management (CRM), supply chain management (SCM), precedence diagram method (PDM)) and computer-aided systems (CAS) (computer-aided design (CAD), computer-aided engineering (CAE), and computer-aided manufacturing (CAM)).
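The two-way classification above can be captured in a small lookup structure. The sketch below is one possible encoding, not part of the proposed architecture: the enum, the short source keys (for example `"IIoT"` or `"environment"`), and the helper names are hypothetical labels chosen for this example.

```python
from enum import Enum


class DataCategory(Enum):
    RESOURCE = "resource data from production processes"
    MANAGEMENT = "management data from manufacturing information systems"


# Hypothetical short keys for the sources named in the list above.
SOURCE_CATEGORY = {
    "IIoT": DataCategory.RESOURCE,
    "material": DataCategory.RESOURCE,
    "product": DataCategory.RESOURCE,
    "service": DataCategory.RESOURCE,
    "environment": DataCategory.RESOURCE,
    "MES": DataCategory.MANAGEMENT,
    "ERP": DataCategory.MANAGEMENT,
    "CRM": DataCategory.MANAGEMENT,
    "SCM": DataCategory.MANAGEMENT,
    "PDM": DataCategory.MANAGEMENT,
    "CAD": DataCategory.MANAGEMENT,
    "CAE": DataCategory.MANAGEMENT,
    "CAM": DataCategory.MANAGEMENT,
}


def categorize(source):
    """Return the category of a source key, or None if it is not listed."""
    return SOURCE_CATEGORY.get(source)


def group_by_category(sources):
    """Group source keys by category; unlisted keys go under None."""
    groups = {}
    for source in sources:
        groups.setdefault(categorize(source), []).append(source)
    return groups


if __name__ == "__main__":
    print(group_by_category(["IIoT", "ERP", "CAD", "environment"]))
```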

Figure 3. Manufacturing lifecycle data. (Graphics source: https://online.visual-paradigm.com/, accessed date: 14 June 2021).

3.2. Industrial Big Data Attributes
The scenario technique focuses on identifying the stimulus and how the system should respond to it. It is also related to quality attributes and seeks to highlight the consequences of architectural decisions encapsulated in the design. Quality attributes in this context are related to the design of software products, where these attributes are related to functional requirements that the architecture must satisfy. However, data quality attributes, although an important issue in manufacturing quality, are out of the scope of this work; they are covered in other works such as [26]. The scheme proposed by [27], presented in Figure 4, describes a quality attribute. The stimulus describes an event that reaches the system and represents a condition that requires a response. The stimulus source can affect the way the stimulus is treated by the system. The response is the activity carried out in response to the arrival of a stimulus. The measurement of the response allows determining if the requirement was satisfied. The artifact is the portion of the system that applies the requirement. The environment is the set of circumstances in which the stimulus is carried out.

Figure 4. Scenario definition.

Six main scenarios were considered to identify quality attributes to illustrate the characteristics of interest for data management in Industry 4.0. Those scenarios were first proposed in [12] and were defined following the scenario technique for quality architectural requirements identification and description [28]. Figure 5 describes the stimulus source based on manufacturing lifecycle data that is the same for the six scenarios. Figure 6 shows the scenario definition for each industrial Big Data attribute. Based on the above, the quality attributes are defined in Table 2.
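The six parts of a quality attribute scenario (stimulus source, stimulus, artifact, environment, response, and response measure) map naturally onto a record type. The sketch below is a hypothetical encoding: the field names follow the scheme of Figure 4, but the numeric `measure_limit` field, the `is_satisfied` helper, and every value in the example instance are assumptions for illustration and are not taken from Figure 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    attribute: str          # quality attribute the scenario exercises
    source: str             # where the stimulus comes from
    stimulus: str           # event that reaches the system
    artifact: str           # part of the system that applies the requirement
    environment: str        # circumstances in which the stimulus occurs
    response: str           # activity carried out when the stimulus arrives
    response_measure: str   # how the response is measured
    measure_limit: float    # assumed numeric bound for the measure

    def is_satisfied(self, observed):
        """True if the observed measure stays within the assumed bound."""
        return observed <= self.measure_limit


if __name__ == "__main__":
    # Illustrative values only.
    scenario = QualityAttributeScenario(
        attribute="scalable and elastic data processing",
        source="manufacturing lifecycle data",
        stimulus="burst of real-time sensor messages",
        artifact="stream processing component",
        environment="normal operation",
        response="messages are processed without loss",
        response_measure="end-to-end latency in milliseconds",
        measure_limit=100.0,
    )
    print(scenario.attribute, scenario.is_satisfied(80.0))
```

Making the record immutable (`frozen=True`) reflects that a scenario is a requirement statement; measurements are passed in rather than stored on it.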

Figure 5. Stimulus source for the main attribute scenarios. (Graphics source: https://online.visual-paradigm.com/, accessed date: 14 June 2021).


Figure 6. Scenario definition. (Graphics source: https://online.visual-paradigm.com/, accessed date: 14 June 2021).

4. Industrial Big Data Design Architecture for Analytics
Different methodologies propose distinct approaches for the development of software architectures [29], most of them covering the architecture lifecycle and defining the structure of the system at a high level, but they present few specifications on how to do the design activity. The attribute-driven design approach (ADD) is the first software methodology based on the design of quality attributes to satisfy different scenarios by selecting architectural structures and their representation into views. ADD takes the requirements, quality, and functionality as input and as output produces a conceptual architecture that provides the basis for the architecture design. It also includes architecture analysis and documentation as an integral part of the design process [30]. A detailed architecture may require refining sketches created during early design iterations and then joining those design decisions to implementation options accessible through frameworks. Additionally, ADD extends to use reference architectures matching with a catalog of technology ratings including tactics, patterns, and frameworks.

Table 2. Quality attributes for data management for industrial Big Data.

Quality Attributes | Description
Data source integration from iCPS | The IIoT in a smart factory generates large volumes of data and, considering that data comes from heterogeneous sources such as programmable logic controllers (PLCs), supervisory control and data acquisition (SCADA), and enterprise resource planning (ERP), the combination, integration, and later storage in large-scale Big Data requires an extract, transform, and load (ETL) system.
Data processing scalable and elastic | To ensure Big Data processing with very low latency in real-time coming from IoT technologies (smart sensors, radio frequency identification (RFID)), the hybrid cloud architecture must be scalable and elastic.
comingwith very from low IoT latency in real-time coming Data processing scalable gration,transform, ToTo provide ensure and and Biga later prescriptive Dataload storage (ETL)processing in analyticssystem. large-scale with estimation very Big low Data, latencyfrom it requires real-time in real-time an data extract, comingfrom Data processing scalableTheData and composition processing elastic ofscalable data- from IoT technologiestechnologies (smart (smart sensors, sensors, radio radio frequencyfrequency identification and elastic transform,ToIIoTfrom ensure technologies IoT Bigtechnologiesand Data load (smart processing (ETL) (smart sensors, system. sensors,with RFID) very radio provided low frequencylatency from in theidentificationreal-time expected coming man- Datadrivenand processingelastic events scalable (RFID)),To ensure the Big hybrididentification Data cloudprocessing (RFID)), architecture with the hybridvery must low cloud be latency scalable in and real-time elastic. coming Data processing scalable fromufacturing(RFID)), IoT technologiesthe parameters hybrid cloud (smart in a architecturereliable sensors, response radio must frequency time.be scalable identification and elastic. and elastic Tofrom ensure IoT technologies Bigarchitecture Data processing (smart must be sensors, with scalable very radio and low elastic. frequency latency in identification real-time coming Dataand elastic processing scalable To(RFID)), provide the a hybrid prescriptive cloud analyticsarchitecture estimation must be from scalable real-time and elastic. data from The composition of data- from(RFID)),To To provide provide IoT thetechnologies a Tohybrida predictive prescriptive provide cloud (smart a prescriptiveanalytics architecture analytics sensors, model analytics estimation radiomust from estimation frequencybe IIoT scalablefrom technologies real-time identification and elastic. 
data (smart from andThe elastic composition of data- IIoT technologiesfrom real-time(smart sensors, data from RFID) IIoT technologiesprovided from the expected man- drivenOptimization events data services (RFID)), Tosensors,IIoT provide technologies theRFID) ahybrid prescriptive in hybrid (smartcloud cloudsarchitectureanalyticssensors, processing RFID) estimation must provided ofbe Bigfromscalable Data from real-time andfor the iCPS elastic.expected data in from a relia- man- The composition of data-drivenThedriven composition events events of data- ufacturing To provide parameters a(smart prescriptive sensors, in a analytics RFID)reliable provided responseestimation from time. thefrom real-time data from The composition of data- IIoTbufacturingle response technologies parameters time. (smart in sensors, a reliable RFID) response provided time. from the expected man- driven events ToIIoT provide technologies aexpected prescriptive (smart manufacturing sensors, analytics RFID) parametersestimation provided in from a from real-time the expected data from man- Thedriven composition events of data- Toufacturing provide parametersa predictive in analytics a reliable model response from time. IIoT technologies (smart IIoTufacturingToTo provide providetechnologies parameters specific reliablea predictive (smart responsealgorithms in sensors, analyticsa reliable time. of RFID) data modelresponse analytics provided from time. IIoT adapted from technologies the to expected embedded (smart man- Optimizationdriven events data services sensors, RFID)To in provide hybrid a predictiveclouds processing analytics model of Big from Data for iCPS in a relia- Optimization data services ufacturingTohardwaresensors, provide RFID) thatparameters a predictive produces in hybrid in analytics insightsa clouds reliable processingclmodel responseose to from the time. 
ofprocess/specific IIoT Big technologiesData for iCPS machine (smart in a relia- Embedded analytics bTole provideresponse a IIoTtime. predictive technologies analytics (smart model sensors, from RFID) IIoT in technologies (smart Optimization data servicesOptimization data services sensors,bbasedle response on RFID) own time. generatedin hybrid cloudsdata and processing data-at-rest of Big sources Data infor a iCPSreliable in are- relia- Optimization data services Tosensors, provide RFID) ahybrid predictive in hybrid clouds analytics clouds processing processing model of Big from Data of IIoT forBig Datatechnologies for iCPS (smart in a relia- Toble provide response specific time. algorithms of data analytics adapted to embedded Optimization data services sensors,bsponseTole response provide time.RFID) iCPS specific time. in inhybrid a algorithms re-liable clouds response processingof data time. analytics of Big adapted Data for to iCPS embedded in a relia- hardware thatTo produces provide specific insights algorithms close to of the data process/specific machine Embedded analytics bToForlehardware provideresponse business thatspecific time.decision produces algorithms making insights throug of datacloseh analytics advancedto the process/specific adapted prescriptive to embedded analyticsmachine Embedded analytics bToased provide on own specificanalytics generated algorithms adapted data to and embeddedof datadata -at-restanalytics hardware sources adapted in ato reliable embedded re- hardwareandbased parametric on thatown produces generated analysis insightsof data business and close data key to-at-rest performancethe process/specific sources indicators in a reliable machine (KPIs) re- EmbeddedAnalytics-based analytics decision sponseTohardware provide time. 
that specific that produces produces algorithms insights insights of closecldataose to analyticsto the the process/specific adapted to embedded machine Embedded analytics Embedded analytics bandased estimate on own error/risk generated or data predictions and data of-at-rest these sourcesKPIs, it requiresin a reliable the re-inte- hardwarebsponseased on time. ownthatprocess/specific generatedproduces insights data machine and cl dataose based to-at-rest the on ownprocess/specific sources in a reliable machine re- support For business decision making through advanced prescriptive analytics Embedded analytics sponsegrationFor business time. of manufacturinggenerated decision data making data and data-at-rest comingthrough from advanced sources IIoT in technologies prescriptive a andanalytics manu- andbsponseased parametric on time. own generated analysis of data business and data key-at-rest performance sources indicators in a reliable (KPIs) re- Analytics-based decision Forfacturingand business parametric informationreliable decision analysis response making systems of time.business throug (MIS). h key advanced performance prescriptive indicators analytics (KPIs) Analytics-based decision andsponseFor businessestimate time. 
For error/riskdecision business making or decision predictions throug makingh of advanced through these KPIs, prescriptive it requires analytics the inte- support andand parametric estimate error/risk analysis ofor businesspredictions key of performance these KPIs, itindicators requires the(KPIs) inte- Analytics-basedsupport decision grationForand business parametric of manufacturing advanceddecision analysis prescriptivemaking of data business throug coming analytics keyh fromadvanced performance and IIoT technologiesprescriptive indicators analyticsand (KPIs) manu- Analytics-based decision The ADD and processgration estimate consistsof manufacturing error/risk of seven or predictionsstages data coming (see Figure of fromthese 7), IIoT KPIs, beginning technologies it requires with theandinput inte- manu- iden- support facturingandand parametricestimate informationparametric error/risk analysis systems analysis or of predictions business of(MIS). business key of performancethese key KPIs, it indicators requires the (KPIs) inte- Analytics-basedsupport decision tifications for thegration design of ofmanufacturing performancethe architecture. indicators data These coming (KPIs) inputs from and are estimateIIoT the technologiesdesign purpose, and the manu- main Analytics-based decision support andgrationfacturing estimate of manufacturinginformation error/risk orsystems predictionsdata coming (MIS). of from these IIoT KPIs, technologies it requires andthe inte-manu- support functional requirements,facturing information theerror/risk main scenarios orsystems predictions of(MIS). quality of these attributes, KPIs, it constraints, and archi- The ADD grationfacturingprocess ofconsists informationmanufacturing of seven systems stagesdata coming(MIS). (see Figure from 7),IIoT beginning technologies with and input manu- iden- tecturalThe concerns. 
ADD process With theconsistsrequires review of thesevenof mention integration stagesed (s of entries,ee manufacturing Figure the 7), first beginning step is to with confirm input there iden- tifications for thefacturing design ofinformation thedata architecture. coming systems from These IIoT (MIS). technologies inputs are andthe design purpose, the main istifications sufficientThe ADD for requirement theprocess design consists information of the ofarchitecture. seven through stages These the (see industrial inputs Figure are 7), Big thebeginning Data design lifecycle: purpose,with input data the acqui- iden- main functionalThe ADD requirements, process consists themanufacturing main of sevenscenarios stages information of (squalityee Figure systems attributes, 7), (MIS). beginning constraints, with inputand archi- iden- tificationssition–datafunctional for preservation–datarequirements, the design of thethe architecture.mainprocessing. scenarios These of quality inputs areattributes, the design constraints, purpose, andthe mainarchi- tecturaltificationsThe concerns. ADD for the process design With consists the of thereview architecture. of seven of mention stages Theseed (s eeentries, inputs Figure arethe 7), thefirst beginning design step is purpose, withto confirm input the iden-theremain functionaltecturalThe concerns.second requirements, step With is tothethe choose reviewmain scenariosa ofsystem mention elementof edquality entries, to attributes, split the throughfirst constraints,step the is tomanufacturing confirm and archi- there istificationsfunctional sufficient for requirements, requirement the design ofinformation the the main architecture. scenarios through These ofthe quality industrialinputs attributes, are Big the Datadesign constraints, lifecycle: purpose, data and the acqui- archi-main 4. Industrial Big DatatecturallifecycleisDesign sufficient concerns. 
(smart Architecture requirement production With forthe information Analytics reviewplanning–manuf of mention throughacturinged the entries, industrial process the Bigfirst monitoring–active Data step lifecycle:is to confirm data preven- thereacqui- sition–datafunctionaltectural concerns. requirements, preservation–data With the the review main processing. scenariosof mention ofed quality entries, attributes, the first stepconstraints, is to confirm and archi- there Different methodologiesistivesition–data sufficient maintenance). propose requirement preservation–data distinct information approaches processing. forthrough the development the industrial of Big software Data lifecycle: data acqui- tecturalis sufficientThe concerns. second requirement step With is theto information choosereview aof system throughmention element theed entries,industrial to split the Big firstthrough Data step lifecycle: theis to manufacturing confirm data acqui- there architectures [29], mostsition–data ofIn themThe the second coveringpreservation–datathird step,step the ispotential architecture to choose processing. industrial a lifecycle system Bi defininggelement Data management theto split structure through ofdrivers the aremanufacturing identified, lifecycleissition–data sufficient (smart requirementpreservation–data production information planning–manuf processing. through acturing the industrial process Big monitoring–active Data lifecycle: data preven- acqui- the system at a highranking levellifecycleThe but the second present(smart requirements stepproduction few specificationsis to basedchoose planning–manuf on a onsystemtheir how impa toelementacturing doct theon to designthe process split architecture activity.through monitoring–active the(industry manufacturing business preven- tivesition–data maintenance).The second preservation–data step is to choose processing. 
a system element to split through the manufacturing The attribute-drivenlifecyclefunction, designtive maintenance). approach(smart analytics production (ADD) Big Data, is theplanning–manuf Big first Data software storage,acturing methodology and infrastructure). process based monitoring–activeon preven- lifecycleInThe the (smartsecond third production stepstep, is potential to choose planning–manuf industrial a system Bi elementacturingg Data managementto process split through monitoring–active drivers the manufacturingare identified, preven- the design of qualitytive attributes maintenance).In the tothird satisfy step, different potential scenarios industrial by Bi selectingg Data management architectural drivers are identified, rankinglifecycletive maintenance). the(smart requirements production based planning–manuf on their impaacturingct on theprocess architecture monitoring–active (industry business preven- structures and their representationrankingIn the the third requirements into step, views. potential ADD based industrial takes on their the requirements, Biimpag Datact on management the quality, architecture and drivers (industry are identified, business function,tive maintenance).In the analytics third step, Big Data,potential Big Dataindustrial storage, Big and Data infrastructure). management drivers are identified, functionality as inputranking and as the output requirements produces abased conceptual on their architecture impact on that the provides architecture the (industry business rankingfunction,In the the analyticsthird requirements step, Big potential Data, based Big industrial onData their storage, impaBig Dataandct on infrastructure). management the architecture drivers (industry are identified, business basis for the architecturefunction, design. analytics It also Big includes Data, architectureBig Data storage, analysis and and infrastructure). 
documentation rankingfunction, the analytics requirements Big Data, based Big Dataon their storage, impa andct on infrastructure). the architecture (industry business as an integral part of the design process [30]. A detailed architecture may require refining function, analytics Big Data, Big Data storage, and infrastructure). sketches created during early design iterations and then joining those design decisions to implementation options accessible through frameworks. Additionally, ADD extends to use reference architectures matching with a catalog of technology ratings including tactics, patterns, and frameworks.

The ADD process consists of seven stages (see Figure 7), beginning with the identification of inputs for the design of the architecture. These inputs are the design purpose, the main functional requirements, the main scenarios of quality attributes, constraints, and architectural concerns. After reviewing these inputs, the first step is to confirm there is sufficient requirement information through the industrial Big Data lifecycle: data acquisition–data preservation–data processing.

Figure 7. The process of attribute-driven design approach. (Graphics source: image library of Microsoft Office).

The second step is to choose a system element to split through the manufacturing lifecycle (smart production planning–manufacturing process monitoring–active preventive maintenance). In the third step, potential industrial Big Data management drivers are identified, ranking the requirements based on their impact on the architecture (industry business function, analytics Big Data, Big Data storage, and infrastructure).

Step four involves choosing a design concept that satisfies the architectural drivers. A layered architecture is adopted to fulfill the requirements of industrial Big Data analytics attributes (infrastructure layer, monitoring layer, presentation layer). In step five, elements of the architecture are instantiated, and responsibilities are assigned (infrastructure layer components, monitoring layer components, presentation layer components). In step six, interfaces are defined for the instantiated elements through the component deployment (Apache Spark, Apache Hive, Hadoop, Apache Kafka, Spark Streaming, Grafana, and Python libraries). Step seven includes verifying and refining requirements and making them constraints for the instantiated architecture elements (data source integration from iCPS, scalable and elastic data processing, composition of data-driven events, optimization data services, embedded analytics, analytics-based decision support). Finally, the process is repeated as required.
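The seven steps can be read as one pass of an iterative loop whose outputs are filled in step by step. The following is a minimal Python sketch under that reading, not the authors' tooling: the `AddIteration` class, its field names, and the `missing_steps` helper are hypothetical, while the example values are taken from the step descriptions above.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class AddIteration:
    """Outputs of one ADD pass, one field per step (hypothetical structure)."""
    inputs: list[str] = field(default_factory=list)        # step 1
    element: str = ""                                      # step 2
    drivers: list[str] = field(default_factory=list)       # step 3
    concept: str = ""                                      # step 4
    components: list[str] = field(default_factory=list)    # step 5
    interfaces: list[str] = field(default_factory=list)    # step 6
    constraints: list[str] = field(default_factory=list)   # step 7

    def missing_steps(self) -> list[int]:
        """Return the step numbers (1 to 7) whose output is still empty."""
        outputs = [self.inputs, self.element, self.drivers, self.concept,
                   self.components, self.interfaces, self.constraints]
        return [n for n, out in enumerate(outputs, start=1) if not out]


# A partially completed iteration: steps one to four are done.
iteration = AddIteration(
    inputs=["design purpose", "functional requirements",
            "quality attribute scenarios", "constraints",
            "architectural concerns"],
    element="manufacturing lifecycle",
    drivers=["industry business function", "analytics Big Data",
             "Big Data storage", "infrastructure"],
    concept="layered architecture",
)
remaining = iteration.missing_steps()
```

Here `remaining` lists the steps that still need work before the iteration can be verified and repeated.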

4.1. Data Management Architecture

A layered architecture is adopted to fulfill the requirements of industrial Big Data analytics attributes, and to be able to handle a wide range of workloads and industrial scenarios where low read and write latency is required [12]. This Big Data architecture is composed of three layers of components that attempt to solve the problem of calculating functions in real-time, as depicted in Figure 8.
1. An infrastructure layer that manages an immutable master data set and only calculates query functions called batch views,
2. A monitoring layer that compensates for the high latency of data updates to the presentation layer and only processes recent data, and
3. A presentation layer that indexes batch views for low-latency ad hoc queries.
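The interplay of the three layers can be illustrated with a small in-memory sketch. It is only an illustration under stated assumptions: the `LayeredStore` class, its method names, and the choice of a per-machine running total as the query function are hypothetical and stand in for the distributed components described later.

```python
from collections import defaultdict


class LayeredStore:
    """Toy model of the three layers for one query: total value per machine."""

    def __init__(self):
        self.master = []                # infrastructure: append-only raw records
        self.batch_view = {}            # infrastructure: precomputed batch view
        self.recent = defaultdict(int)  # monitoring: data since the last batch run

    def ingest(self, machine_id, value):
        """Append to the immutable master set and update the recent view."""
        self.master.append((machine_id, value))
        self.recent[machine_id] += value

    def run_batch(self):
        """Recompute the batch view from all raw data, then reset recent data."""
        view = defaultdict(int)
        for machine_id, value in self.master:
            view[machine_id] += value
        self.batch_view = dict(view)
        self.recent.clear()

    def query(self, machine_id):
        """Presentation: merge the batch view with the recent view."""
        return self.batch_view.get(machine_id, 0) + self.recent.get(machine_id, 0)


store = LayeredStore()
store.ingest("m1", 3)
store.ingest("m1", 2)
store.run_batch()        # batch view now covers both records
store.ingest("m1", 4)    # arrives after the batch run
total = store.query("m1")
```

The query stays current between batch runs because the monitoring layer covers only the records that the last batch view has not yet absorbed.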

4.2. Infrastructure Layer

The manufacturing resource data component stores Big Data with low ingestion rates (for example, transactional data) or with high ingestion rates (for example, data streams from sensors) as a historical, append-only set of raw data. A list of features suitable for the learning algorithms of the respective model is then obtained by processing the data. If the criteria change, the data are reprocessed. In the learning model component, the parameters describing the data model depend on the machine learning methods used. Anomaly detection would include time stamp aggregates of manufacturing operation data. When processing large data sets, it is important to use distributed data processing, depending to a large degree on the actual data and the learning method used. The OLAP model and Big Data mining component, through Big Data storage, provides the multidimensional data model (MDM) that enables the analysis of large volumes of structured data to support decision-making processes in a business intelligence context.
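As an illustration of time stamp aggregates, the sketch below groups raw sensor readings into fixed time windows and computes summary features per window. The function name `window_features`, the 60-second window, and the chosen statistics are assumptions made for this example; the source does not specify which aggregates are used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def window_features(records, window_s=60):
    """Aggregate (timestamp_seconds, value) pairs into per-window features.

    Returns a dict keyed by window start time. The raw records are left
    untouched, so the function can be rerun with other criteria.
    """
    buckets = defaultdict(list)
    for ts, value in records:
        start = int(ts // window_s) * window_s
        buckets[start].append(value)
    return {
        start: {
            "count": len(vals),
            "mean": mean(vals),
            "std": pstdev(vals),   # population standard deviation
            "min": min(vals),
            "max": max(vals),
        }
        for start, vals in sorted(buckets.items())
    }


raw = [(0, 1.0), (30, 3.0), (61, 5.0)]          # hypothetical readings
features = window_features(raw)                 # 60-second windows
features_coarse = window_features(raw, 120)     # reprocessing after a criteria change
```

Because the raw set is append-only, changing the window size only requires calling the function again on the same records.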

4.3. Monitoring Layer

The monitoring layer ingests a new data stream from IIoT. Its main function is monitoring the incoming data as it arrives. It deals with real-time data processing that is usually time-dependent, so it is crucial to reduce the latencies added by accessing the saved learning model in the presentation layer. The process stream component processes time-series data to obtain the features from the new data stream. In the increment view component, continuous measurement over time plays an important role in identifying data outliers, since patterns are not supposed to change abruptly unless unusual conditions occur in the process. The obtained model input is then transferred to the real-time monitoring component in the presentation layer for the matching time period, together with the result of the learning model. If the threshold is exceeded, and this is detected consecutively over some time, a data event is presented in the real-time view. Sensor errors or other data imprecision may occur, and these should not be disseminated as an anomaly event. Therefore, an anomaly event takes place only if outliers are detected consecutively over a specific time.
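The rule that an anomaly event requires consecutive outliers can be sketched as a small stateful detector. This is a minimal illustration, assuming a scalar model score per time step; the class name `ConsecutiveOutlierDetector` and the parameters `threshold` and `min_consecutive` are hypothetical and their values would have to come from the learning model and the process.

```python
class ConsecutiveOutlierDetector:
    """Raise an event only after several outliers in a row."""

    def __init__(self, threshold, min_consecutive):
        self.threshold = threshold
        self.min_consecutive = min_consecutive
        self._run = 0  # length of the current run of outliers

    def update(self, score):
        """Feed one score; return True when an anomaly event should be raised."""
        if score > self.threshold:
            self._run += 1
        else:
            self._run = 0  # an isolated spike is treated as noise
        return self._run >= self.min_consecutive


detector = ConsecutiveOutlierDetector(threshold=0.5, min_consecutive=3)
scores = [0.1, 0.9, 0.2, 0.9, 0.9, 0.9, 0.1]   # hypothetical model outputs
events = [detector.update(s) for s in scores]
```

The single spike at the second position does not raise an event, whereas the run of three exceedances does, which matches the intent of ignoring isolated sensor errors.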

Figure 8. Layered data management architecture for iCPS Big Data analytics in Industry 4.0.


4.4. Presentation Layer

The presentation layer shows a view of the data patterns produced by the functions of the infrastructure and monitoring layers through real-time and batch views. The real-time monitoring component analyzes continuously updated information from the increment views component to identify anomalies in data aggregated over time from manufacturing operations, and it identifies serious problems using the learning model parameters of the fault detection model estimated in the model view component. Furthermore, the forecasting unusual process conditions component predicts process performance problems by analyzing data and tracing recurrent patterns; optimization views from the model are used to flag abnormal behaviors, applying data-driven events to proactive monitoring. In the batch views, the fault diagnosis component presents predictive analysis to improve anomaly forecasting and diagnosis of manufacturing systems, with optimization views of process equipment and facilities equipment based on Big Data mining methods implemented in the learning models component. The analytics-based decision support component uses batch views to support on-line analytical processing (OLAP) analysis to inform, observe, and show how big or small the problem is, using a Big Data warehouse from the infrastructure layer.

The figure also shows how the characteristics required in Industry 4.0 are related to the different components of the architecture. Multisource integration is done in coordination with the architectural components of the process data stream, heterogeneous stream, and manufacturing resources data. Scalable and elastic data processing occurs in the manufacturing resources data component. The composition of data-driven events is done in the real-time monitoring, increment views, and model view components. Optimization data services are performed by the incremental view component and model views component. Embedded analytics and analysis-based decision support are calculated in precompute views and presented in the batch view component.

4.5. Component Deployment

Nowadays, different technologies for Big Data analytics are being developed, often with overlapping functional requirements and quality data management attributes. For Big Data architects of iCPS systems, the selection of technology is a challenging question that requires attention to many implementation details and compatibility constraints. Figure 9 shows technologies that are accessible under an open-source license.

Once data is available, the data characteristics extraction component is used for precomputing views for machine learning or business intelligence purposes. A distributed data warehouse through Hive enables the OLAP and data mining component. Hive was developed on top of HDFS to enable distributed storage and processing of large data sets. The learning model component uses data stored in Hadoop to build learning models using the distributed data processing framework Apache Spark, an analytics engine for Big Data that achieves better performance by using the random access memory (RAM) of the server to process machine-learning algorithms.

For data monitoring purposes, the process data stream component obtains the available IIoT data with Spark Streaming interacting with Apache Kafka, offering scalable, high-throughput, fault-tolerant stream processing of live data streams for the composition of data-driven events. The data is then stored or pushed to the data-driven event component for real-time analysis.

For data visualization, after processing data for analytics use with development tools, Grafana integrates the visualized data for presentation purposes. The component for real-time monitoring uses the Spark Streaming processing of data-driven events to prepare data for Grafana. Additionally, the decision support component takes OLAP data from Hive and outputs it to Grafana. Finally, in the same way, the fault diagnosis component uses development tools based on Python libraries to process machine learning algorithms over Big Data stored in Hadoop to prepare data for Grafana.
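The component-to-technology assignments described in this subsection can be collected in a lookup table, which makes it easy to see which components depend on a given technology. The sketch below is only a summary of the text in plain Python; the names `DEPLOYMENT` and `components_using` are hypothetical, and the dictionary is not a configuration format of any of the tools mentioned.

```python
# Component -> technologies, as described in the text of this subsection.
DEPLOYMENT = {
    "OLAP and data mining": ["Apache Hive", "HDFS"],
    "learning model": ["Hadoop", "Apache Spark"],
    "process data stream": ["Spark Streaming", "Apache Kafka"],
    "real-time monitoring": ["Spark Streaming", "Grafana"],
    "decision support": ["Apache Hive", "Grafana"],
    "fault diagnosis": ["Python libraries", "Hadoop", "Grafana"],
}


def components_using(technology):
    """Return the components that rely on the given technology, sorted by name."""
    return sorted(
        component
        for component, technologies in DEPLOYMENT.items()
        if technology in technologies
    )


grafana_users = components_using("Grafana")
```

A lookup of this kind helps when assessing compatibility constraints, for example which components are affected if one technology is replaced.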


Figure 9. Deployment of component technologies. (Graphics source: image library of Microsoft Office).

Figure 9 also shows the architecture components that satisfy industrial Big Data attributes. They are the characteristics required in the analytics system for Industry 4.0.
• The multisource integration requirement is fulfilled in the infrastructure layer with the following components: the architectural components of the heterogeneous stream, process data stream, and manufacturing resources data.
• The scalable and elastic data processing requirement is fulfilled in the infrastructure layer with the manufacturing resources data component.
• The composition of data-driven events is fulfilled in the monitoring layer with the following components: real-time monitoring, increment views, and model view.
• Optimization data services are fulfilled by the incremental view component and model views component.
• Embedded analytics and analysis-based decision support are fulfilled in precompute views and the component of batch view.
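The list above amounts to a traceability check: every quality attribute should map to at least one component. A minimal sketch of such a check follows, assuming the attribute and component names are used as plain strings exactly as they appear in the list; `ATTRIBUTE_COMPONENTS`, `uncovered`, and `attributes_for` are hypothetical names.

```python
# Quality attribute -> components that fulfill it, taken from the list above.
ATTRIBUTE_COMPONENTS = {
    "multisource integration": [
        "heterogeneous stream", "process data stream",
        "manufacturing resources data"],
    "scalable and elastic data processing": ["manufacturing resources data"],
    "composition of data-driven events": [
        "real-time monitoring", "increment views", "model view"],
    "optimization data services": ["incremental view", "model views"],
    "embedded analytics": ["precompute views", "batch view"],
    "analysis-based decision support": ["precompute views", "batch view"],
}


def uncovered(required):
    """Return the required attributes that no component fulfills."""
    return [attr for attr in required if not ATTRIBUTE_COMPONENTS.get(attr)]


def attributes_for(component):
    """Return the attributes a given component contributes to."""
    return [attr for attr, comps in ATTRIBUTE_COMPONENTS.items()
            if component in comps]


gaps = uncovered(list(ATTRIBUTE_COMPONENTS) + ["data lineage"])  # extra attribute is hypothetical
shared = attributes_for("manufacturing resources data")
```

The reverse lookup shows that the manufacturing resources data component carries two attributes at once, which is the kind of coupling a scenario-based evaluation would examine.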

5. Case Study: Fault Diagnosis

A notable gap in the reviewed literature is the lack of applications of iCPS analytics. The data models, learning methods, and architectures are somewhat different from those in other areas relating to maintenance and diagnosis. Implementing a Big Data analytics architecture in factories offers several advantages. Consider a factory that seeks to provide self-awareness and self-prediction from machine data, to control parameters for status monitoring, and to provide a self-comparison capability in a production line consisting of numerous machine tools. This level of knowledge guarantees near-zero downtime production.

The fault diagnosis scenarios are selected to illustrate how the Big Data analytics architecture is applied to meet the functional and quality requirements when designing an analytics architecture solution in iCPS that fulfills the industrial Big Data attributes presented in Table 3. Scenario-based architectural evaluation methods are a well-established approach to validate the architectural design and to analyze the decisions that have been made to achieve the design approach [31]. They are thorough and comprehensive approaches that gather the stakeholders of a system and guide them through a structured process that explores the architectural design options and the resulting implications.

Table 3. Requirements of fault diagnosis scenarios.

Requirement | Industrial Big Data Attribute | Scenario
Provide iCPS analytics with the comparison ability, where machinery performance logs can be compared with and rated among machines. | Optimization data services | Fault diagnosis
Provide fault diagnosis of the machinery based on information about machine performance logs similarities. | Optimization data services | Fault diagnosis
Shift, drift, outliers in underlying components. The virtual sensor is working, and senses shift, drift, outliers. | The composition of data-driven events | Real-time monitoring
Root causes among parameters. Which specific parameters contribute most to the problem? | The composition of data-driven events | Real-time monitoring

The purpose of the scenarios is to show architectural decision sensitivity and tradeoff points of the architectural design for fault diagnosis in industrial plants, through an architectural instance for industrial-analytics Big Data for the use case of fault diagnosis presented in Figure 10.

Figure 10. Architectural instance for industrial-analytics Big Data for the use case of fault diagnosis.

To better illustrate the real-world applicability of the proposed architecture, a set of real cases extracted from the research literature was chosen as a means to show how the components of the architecture fit with the main elements proposed to provide solutions to real situations. The scenarios provide a view of the possible fitting of the elements proposed to the components of the architecture, hypothetically considering the solutions as if they were implemented following the proposed architecture. The works selected are related to the following three application scenarios:


1. Fault diagnosis: shows the benefits and fulfills the data quality attribute named optimization data services.
2. Real-time monitoring in iCPS: shows the benefits and fulfills the data quality attribute named the composition of data-driven events.
3. Forecasting unusual process conditions: shows the benefits of the data quality attributes named optimization data services and data-driven events.

Figure 11 shows the notation proposed to represent architecture components, component groups, repositories, and process flows. These notation icons are used in the different diagrams of the process view to describe interactions between the architecture components, and to show how the elements of the use cases presented below fit the architecture components. The component icon describes elements that fit an architecture component for the use case presented. The component group icon represents several components related to each other. The repository icon represents the area of data, models, or views. Finally, the process flow icon describes the sequence of the process view diagram.

Figure 11. Notation icons for the elements of the process view diagram.

5.1. Fault Diagnosis

Fault diagnosis plays a critical role in industrial systems. A Big Data industrial application for fault diagnosis helps increase process efficiency, prevent accidents, and save costs. Unfortunately, in a traditional plant, a total understanding of the system to be optimized is almost impossible because such systems are composed of numerous components and operate in a variety of conditions. Moreover, most of the available data contain incomplete or missing fault records due to human factors or to monitoring systems that provide inconsistent data (e.g., obsolete format).

In the optimization data services requirement, the development of a high-performance fault diagnosis system for iCPS requires mainly two types of information: first, a thorough understanding of the target system, and second, condition monitoring data/fault recording. A broad level of knowledge about system failures (i.e., mechanisms, fundamental causes) can facilitate effective failure diagnosis for industrial plant systems.

On the other hand, industry has many machine tools, so time-machine records from sensor data of critical components can be used by prescriptive analytics models to provide self-diagnosis. Therefore, a significant amount of fault monitoring through log data, if available, can provide excellent information for data-based diagnostics.

Data-based diagnostics is a common approach that can be applied in different industrial contexts for fault diagnosis. This can be illustrated with the following examples from real scenarios. In [32], an intelligent fault diagnosis approach for a mechanical problem is presented, using Big Data and unsupervised feature learning to provide an accurate prediction in the case of motor bearing logs. An additional data-based fault diagnosis example, related to the petroleum industry, is presented in [33], where it was observed that regular maintenance cannot effectively detect faults in reciprocating compressors. In that work, machine learning methods were applied to analyze data and to diagnose faults in order to predict potential faults in compressors.

To illustrate one of many possible applications of the proposed architecture for Big Data analytics in industry for fault diagnosis scenarios, a real case for the automation problem of the machining process is discussed below, where an effective system for monitoring the condition of the face milling tool is required. In [34], a machine learning approach is presented for faulty conditions based on cutting process data to control the machining process, feeding optimized data to the machine controller. The system aims for a long machine tool life and processing quality to ensure high productivity.

The fitting of the proposed architecture to the machining process with Big Data support and machine learning is described following the process view of the architecture proposed in Section 4.1. The process view describes interactions between the architecture components for industrial analytics and the activities that realize managing Big Data during its life cycle. Figure 12 shows the instances of the architecture components that would interact during a fault diagnosis process scenario. From the analysis of the scenario used in [34], the mapping of the solution proposed to the components of the architecture is as follows. At the architecture component manufacturing resources data, historical data from the component heterogeneous source stream is stored. That stored data includes machinery performance logs from the measurement method. The information extraction process is performed at the component data characteristics extraction by virtual dimensions that capture the conditions of a face milling cutter and monitoring data-fault recording. The precompute views component performs the extraction of information through diagnosis modeling or predictive modeling. The component OLAP model and data mining analyzes the data using statistical parameters about the machinery performance logs and sensor fault detection data to obtain information about the past and present. In turn, the use of the naive Bayes algorithm at the learning model component allows tool condition monitoring for fault diagnosis at the batch views component.
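The role of the learning model component in this scenario can be sketched with a small Gaussian naive Bayes classifier for tool condition monitoring. This is a minimal illustration and not the implementation used in [34]: the two features (a vibration level and a cutting force), the values, and the labels "healthy" and "worn" are hypothetical, and the Gaussian variant of naive Bayes is an assumption.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(samples, labels):
    """Estimate a prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(samples),
            "means": [mean(col) for col in columns],
            # A small constant avoids division by zero for constant features.
            "vars": [pvariance(col) + 1e-9 for col in columns],
        }
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(params):
        score = math.log(params["prior"])
        for value, mu, var in zip(row, params["means"], params["vars"]):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical machinery performance logs: (vibration level, cutting force).
logs = [(0.20, 310.0), (0.22, 305.0), (0.19, 300.0),
        (0.55, 420.0), (0.60, 435.0), (0.58, 410.0)]
conditions = ["healthy"] * 3 + ["worn"] * 3

model = fit_gaussian_nb(logs, conditions)
print(predict(model, (0.21, 308.0)))
print(predict(model, (0.57, 425.0)))
```

In the architecture, the training step would run as a batch job over the logs held by the manufacturing resources data component, and the fitted model would be exposed through the batch views component.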

Figure 12. Instance of the fault diagnosis process.


5.2. Real-Time Equipment Monitoring

The detection of anomalies in real-time process analysis for iCPS has allowed a new way of optimizing industrial systems, supporting analysts and operators in solving possible problems [8].

Industrial process data are usually streams obtained in real time and, thus, fault detection methods must be able to handle the temporal aspect of the data, since the data are only available during the acquisition time window. Table 4 presents different time assumptions for industrial temporal data for fault detection. Data streaming carries a set of values extracted from a data provider over time, such as sensor data collected from different aspects across manufacturing, including facilities equipment, process equipment, and physical defects.

Table 4. Time assumptions for industrial data.

Industrial Data | Data Source | Time Assumption
Facilities equipment | Sensor and environment data | Stream data
Process equipment | Sensor fault detection data | Stream data
Process results | Process history and measurements | Time series
Physical defects | Defect images and characteristics | Stream data
Product | Product test characteristics | Time series

In the scenario of the composition of data-driven events, real-time data determines active preventive maintenance, repair, and overhaul (MRO). Real-time monitoring at an earlier stage should prevent loss of machinery and reduce damage. In [35], a model-based technique is proposed that uses machine learning to identify internal faults of induction machines in real-time. Device faults may have only minor manifestations yet result in lower operating efficiency. In that case, condition monitoring for inferring warnings is quite useful to reduce internal faults of industrial machines. In [36], a real-time fault monitoring system for industries, based on random forest learning methods, is proposed to prevent severe machine damage in due course of time.

To illustrate one of many possible applications of the proposed architecture for industrial Big Data analytics in scenarios of real-time equipment monitoring, a real case reported in [37] is discussed below.

In [37], an autonomous learning method for real-time fault detection in industrial processes based on TEDA (Typicality and Eccentricity Data Analytics) is proposed. The main advantages of the TEDA approach are that it does not require a priori knowledge about the streaming data, which is of particular importance in real-world applications, and that it is very fast online, allowing its use for fault detection in industry.

To validate the scenario for real-time fault monitoring using TEDA for industrial problems, real-world data from industrial plants were used in [37]. The open dataset DAMADICS (Development and Application of Methods for Actuator Diagnosis in Industrial Control Systems) was used in conjunction with a laboratory pilot plant. Data from DAMADICS come from an actuator of the water evaporation process, which consists of the following components: control valve, pneumatic servomotor, and positioner. The pilot plant data come from several sensors and actuators controlled by a programmable logic controller (PLC) that automates the flow between two tanks, two pneumatic control valves, and a centrifugal pump.

The process view of Figure 13 describes interactions between the architecture components for industrial analytics and the use case for real-time equipment monitoring activities that realize managing streaming data. Figure 13 shows an instance of the real-time monitoring process for fault detection. Data from DAMADICS and the laboratory pilot plant for process control are processed in real-time at the architecture component of the process data stream. At the architecture component of increment views, the data provider uses a time window at processing time to perform operations on the data. Data from the pilot plant are collected with a sampling period of 100 ms. That operation is intended to clean up the data, reduce the data frequency, and extract variables. Each data stream has monitored variables shown at the component of increment views. When the tasks are completed, data variables are selected to be sent to the subprocess normal model operations for evaluation. In the component real-time monitoring of the data-driven events, normal model operations are used to apply a multivariate learning model for underlying virtual dimensions that capture the “good” and in-control process data.
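The windowing operation described above (clean up the data, reduce the data frequency, and extract variables) can be sketched as a tumbling-window aggregation over samples collected every 100 ms. This is a minimal sketch under stated assumptions: the one-second window, the summary variables (mean, minimum, maximum), and the function names are illustrative choices and are not taken from [37].

```python
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# (timestamp in ms, sensor value or None when the reading is missing)
Sample = Tuple[int, Optional[float]]


def summarize(start_ms: int, values: List[float]) -> Dict[str, float]:
    """Extract a few variables from the clean readings of one window."""
    return {"window_start_ms": start_ms, "count": len(values),
            "mean": mean(values), "min": min(values), "max": max(values)}


def tumbling_windows(samples: Iterable[Sample],
                     window_ms: int = 1000) -> Iterator[Dict[str, float]]:
    """Group time-ordered samples into fixed windows and emit one summary each."""
    current_start = None
    bucket: List[float] = []
    for timestamp_ms, value in samples:
        start = (timestamp_ms // window_ms) * window_ms
        if current_start is None:
            current_start = start
        if start != current_start:
            if bucket:  # emit the finished window, skipping empty ones
                yield summarize(current_start, bucket)
            current_start, bucket = start, []
        if value is not None:  # clean-up step: drop missing readings
            bucket.append(value)
    if bucket:
        yield summarize(current_start, bucket)


# Two seconds of hypothetical readings at a 100 ms sampling period.
readings: List[Sample] = [(t * 100, 2.0 if t < 10 else 4.0) for t in range(20)]
readings[3] = (300, None)  # simulate one missing reading

features = list(tumbling_windows(readings))
for row in features:
    print(row)
```

Each emitted summary reduces ten raw samples to a single record, which is the kind of reduced-frequency variable set that the increment views component would forward to the normal model operations subprocess.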

Figure 13. Instance of the real-time monitoring process for fault detection.
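The TEDA-based detection used in this scenario can be sketched as a recursive eccentricity test over the incoming stream. The sketch below follows the recursive mean, variance, and eccentricity updates commonly given for TEDA, and flags a sample when its normalized eccentricity exceeds (m² + 1)/(2k). The class name, the choice m = 3, and the synthetic stream are assumptions made for illustration; this is not the code of [37].

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class TedaDetector:
    """Recursive eccentricity-based outlier test (sketch of the TEDA idea)."""
    m: float = 3.0            # sensitivity, comparable to an "m sigma" rule
    k: int = 0                # number of samples seen so far
    mean: List[float] = field(default_factory=list)
    var: float = 0.0

    def update(self, x: Sequence[float]) -> bool:
        """Consume one sample and return True if it is flagged as an outlier."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = list(x)
            self.var = 0.0
            return False
        # Recursive updates: no past samples need to be stored.
        self.mean = [((k - 1) / k) * mu + xi / k for mu, xi in zip(self.mean, x)]
        dist2 = sum((xi - mu) ** 2 for xi, mu in zip(x, self.mean))
        self.var = ((k - 1) / k) * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            return False
        eccentricity = 1.0 / k + dist2 / (k * self.var)
        normalized = eccentricity / 2.0
        return normalized > (self.m ** 2 + 1.0) / (2.0 * k)


# Hypothetical single-variable stream: steady operation followed by a spike.
stream = [[1.0], [1.1], [0.9]] * 7 + [[5.0]]
detector = TedaDetector()
flags = [detector.update(sample) for sample in stream]
print(flags)
```

Because only the running mean, the running variance, and the sample count are kept, the memory cost per stream is constant, which is consistent with the fast online operation attributed to TEDA above. Note that with m = 3 the test cannot flag anything during the first ten samples, since the threshold is then at least as large as the maximum normalized eccentricity.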

The TEDA algorithm does not need any previous knowledge of data processes to learn to detect anomalies. Unsupervised learning methods do not need labeled data and detect rare events or outliers as data very distinct from the majority of the data, identifying new types of anomalies through anomaly behavior analysis [34]. The detection and prediction processes use the prediction models generated during the information extraction process to identify patterns or predict states in real-time. The architecture component of a data-driven event monitors the submitted data in real-time and returns a prediction for an event in DAMADICS or the laboratory pilot plant for process control. This subprocess uses the model generated by the information extraction process to evaluate the data in real-time. The architecture component of real-time views displays charts of input signals for fault detection.

5.3. Forecasting Unusual Process Conditions

The scenario for forecasting unusual process conditions combines optimization data services and data-driven requirements for an adaptive health assessment. Asset-based efficiency improvement, through real-time monitoring combined with the prediction of the future behavior of machinery and the performance log, allows health status monitoring of process conditions.

The fundamental premise is to develop a predictive model based on the sensor data log and to apply the model to measure incoming stream data from sensors in real-time. The scenario challenge for the architecture design is twofold. The first is to develop a learning model based on sensor Big Data. The second is to apply the analytics model to the stream measures of sensors. The Big Data challenge is to face intensive real-time data loads while a machine learning model is computed. The detection of anomalies, in this case, requires both the analysis of massive amounts of historical data and rapid processing based on intermediate results (anomaly detection model) [36].

An adaptive health assessment could be used to diagnose system reliability and forecast the condition of a machine or a system based on health monitoring information. In [38], a generic methodology based on machine learning methods is presented to strongly correlate faults detected in the historical data of the process log with the upcoming data stream, according to a prediction scope. Data come from the aluminum domain, specifically from the aluminum electrolysis process, and represent the flow of the different phases and machines used to prepare the paste and form the anode (carbon blocks used for the aluminum reduction process). There are two data categories: the input signals are analyzed for the forecasting horizon, and then the input data are correlated with the data of the internal system for fault diagnosis. The general approach consists of two parts: processing, and applying the model to the ongoing process to detect outliers in underlying components and virtual dimensions.

Figure 14 shows an instance for forecasting an unusual process. In the part of data-based diagnostics, the architecture component of manufacturing resources stores data records from stop data files. The architecture component for the data characteristics extraction includes process variables for anode production and variables for the baking process. The use of Isolation Forest at the learning model component allows forecasting recorded stops for fault diagnosis at the batch view component. This model can also be applied at the model view component to monitor a breakdown event in process signals coming from the process data stream component. The source of streams includes variables from the cooler and mixer devices. That allows an adaptive health assessment at the architecture component of real-time views for the health forecasting of the cooler, mixer, and non-recorded stops. Finally, it is also possible for the component real-time process monitoring to use normal model operations to apply a learning model that captures the “good” and in-control process data for fault detection.
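The two-part approach described above (learn a model from the historical log, then apply it to incoming stream readings) can be sketched with a compact Isolation Forest. This is a simplified, self-contained sketch and not the implementation of [38]: the two features (a cooler temperature and a mixer power reading), the synthetic historical data, and all function names are hypothetical, and the forest needs at least two historical rows.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful search in a tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    if low == high:
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for unsplit leaf size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if x[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], x, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Batch step: build the forest from historical records."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return {"trees": trees, "sample_size": size}


def anomaly_score(forest, x):
    """Stream step: score in (0, 1); higher means easier to isolate."""
    avg = sum(path_length(t, x) for t in forest["trees"]) / len(forest["trees"])
    return 2.0 ** (-avg / c_factor(forest["sample_size"]))


# Hypothetical historical log: (cooler temperature, mixer power).
data_rng = random.Random(1)
historical = [(data_rng.gauss(60.0, 2.0), data_rng.gauss(1.5, 0.1))
              for _ in range(200)]
forest = fit_forest(historical)

normal_score = anomaly_score(forest, (60.5, 1.52))
abnormal_score = anomaly_score(forest, (85.0, 3.0))
print(normal_score, abnormal_score)
```

In terms of the architecture, `fit_forest` corresponds to the batch work of the learning model component, while `anomaly_score` is the lightweight operation that the model view component would apply to each reading from the process data stream. A production deployment would more likely rely on an existing library implementation than on a hand-written forest.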

Figure 14. Instance for forecasting unusual process conditions for an adaptive health assessment.

6. Discussion, Research Limitations, and Further Work

This paper focuses on an architectural design that supports the integration of the IIoT environment, communications, and the cloud in a Big Data analytics solution for data obtained from iCPS. The related works within the research scope are classified into (i) proposals of reference architectures in industrial contexts, and (ii) the architectural design of Big Data analytics for iCPS. Hence, a discussion is presented concerning research in each classification.

6.1. Reference Architectures in Industrial Contexts

The adoption of reference architectures accelerates the design of system solutions to common industrial process problems. However, proposals within the areas of Big Data are mainly focused on information systems or software in the field of business processes, and those explicitly proposed for Industry 4.0 are scarce. From the literature related to reference architectures for industrial Big Data solutions, it has been observed that they do not provide systematic support for iCPS analytics towards generating alternative architectural decisions according to the industry business function. In [13], Big Data system design can use the architecture proposed in this paper as a reference architecture, and the components of each layer as a catalog of design methods and technologies for Big Data analytics in industrial contexts. A limitation found in [7] shows that validation is required to account for prescriptive analytics in fault diagnosis scenarios that integrate iCPS with Big Data analytics. This paper has presented the functional requirements and quality attributes required by stakeholders, to allow making analytics architectural design decisions for the detection of failure events in industrial plants. Compared with the 1-3-5 model in [14], the reference architecture proposed in this paper can be used as a guide to identify alternative architectural solutions that consider new data modeling analytics techniques, which include diagnosis modeling, predictive modeling, and prescriptive modeling.

6.2. Big Data Analytics Architectures Design Applied in iCPS Contexts The main reason for the complexity of developing Big Data analytics for iCPS is the difficulty in the integration of technologies in a consistent ecosystem in an industrial environment. Additionally, prescriptive analytics is inherently complex due to the need to align experience with model design, process optimization, and manufacturing forecasting. In the main literature review related to this topic, it has been observed that prescriptive analytics is not presented in them. For example, in [15], the model-driven proposal generates architectures for industrial data integration. In [10,16], a CBS adoption conceptual framework to support CPS manufacturing systems architecture for Industry 4.0 is defined. They focus particularly on guiding principles for CPS implementation in the industry through a methodology. Unlike those proposals, industrial Big Data attributes and data management drivers that specify architecturally significant requirements that guide the design of iCPS analytics architecture have been introduced in this paper. In [17], the authors introduce an Industry 4.0 architecture like a general overview of components that include analytics. The main difference between the approach in this paper and the previous study, lies in narrowing the focus on analytics systems as the unit of the data model and design solution architectures to integrate them with data analytics platforms. In the same way, the authors of [18] are trying to integrate the IIoT platform with Big Data analytics. A key feature is to enable IIoT for Big Data. While this approach acknowledges limitations in IIoT platforms enabling Big Data scenarios, and it keeps descriptions of process analytics, it does not represent the data management scenarios from manufacturing information systems. Nor do they address on-line analytical processing that allows structured analytics to support different perspectives in decision making. 
In [19], Big Data for prescriptive analytics is expected to be usable in manufacturing information systems (MIS) for advanced data analysis. Nevertheless, that work does not provide an architectural framework to guide decision making in Big Data architecture design. The data management architecture proposed in this paper shows strong indications of being able to handle a wide range of industrial scenarios for prescriptive analytics, including fault diagnosis, real-time monitoring, and forecasting unusual process conditions, as discussed in the case study (see Section 5).
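To make the real-time monitoring scenario mentioned above more concrete, the following is a minimal sketch of how unusual process conditions might be flagged on a stream of sensor readings. It is not the method used in the case study: the rolling z-score rule, the class name `RollingZScoreDetector`, the helper `detect`, and the window and threshold values are all assumptions chosen for illustration.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flags readings that deviate strongly from a sliding window of recent values.

    Hypothetical helper for illustration; window and threshold are assumed values.
    """

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # recent readings considered normal
        self.threshold = threshold          # allowed deviation in standard deviations

    def update(self, reading: float) -> bool:
        """Return True if `reading` is unusual relative to the current window."""
        unusual = False
        if len(self.values) >= 2:
            mean = sum(self.values) / len(self.values)
            # Sample variance of the readings currently in the window.
            var = sum((v - mean) ** 2 for v in self.values) / (len(self.values) - 1)
            std = sqrt(var)
            if std > 0 and abs(reading - mean) / std > self.threshold:
                unusual = True
        # Only normal readings update the baseline, so a fault does not mask itself.
        if not unusual:
            self.values.append(reading)
        return unusual


def detect(readings, window=20, threshold=3.0):
    """Return the indices of the readings flagged as unusual."""
    detector = RollingZScoreDetector(window, threshold)
    return [i for i, r in enumerate(readings) if detector.update(r)]


# Synthetic readings (not data from the paper): one spike at index 7.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 50.0, 10.0]
flagged = detect(readings, window=5, threshold=3.0)
```

In a deployed architecture, a rule of this kind would typically run inside the stream processing component, with the window and threshold tuned per signal; a trained fault diagnosis model could replace the z-score test without changing the surrounding flow.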

6.3. Research Limitations

As already discussed, manufacturing operations are affected by data management in several ways; however, industrial Big Data analytics projects have tended to suffer from long execution times, which makes them economically unattractive. It is believed that the proposed architecture-based approach can significantly reduce the time spent developing Big Data analytics solutions for iCPS contexts. Although the lack of a real implementation of the proposed architecture could be considered a limitation of this work, it has been shown how the architecture can be implemented with current open-source technologies. Moreover, the feasibility of the proposal to support the main functional and quality requirements was validated by means of the scenario technique, which is perhaps the most widely used approach to evaluate and validate software architecture proposals [28,30,31], since scenarios help to determine whether a proposal can actually support the needed requirements before a costly real implementation.
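As an illustration of the scenario technique, the sketch below records scenarios in the six-part form described by Bass et al. [27] (source, stimulus, artifact, environment, response, response measure) and checks estimated values against each response measure. The scenario names, the thresholds, and the `evaluate` helper are hypothetical and are not taken from the evaluation reported in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class QualityScenario:
    """Six-part quality attribute scenario; all example values below are hypothetical."""
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str
    is_met: Callable[[float], bool]  # predicate encoding the response measure


def evaluate(scenarios: Dict[str, QualityScenario],
             estimates: Dict[str, float]) -> Dict[str, Optional[bool]]:
    """Map each scenario to True/False, or None when no estimate is available yet."""
    return {
        name: scenario.is_met(estimates[name]) if name in estimates else None
        for name, scenario in scenarios.items()
    }


scenarios = {
    "fault_alert_latency": QualityScenario(
        source="vibration sensor",
        stimulus="reading outside the normal operating range",
        artifact="stream processing component",
        environment="normal operation",
        response="alert shown on the monitoring dashboard",
        response_measure="alert latency of at most 2 seconds (assumed target)",
        is_met=lambda seconds: seconds <= 2.0,
    ),
    "batch_retraining_time": QualityScenario(
        source="maintenance engineer",
        stimulus="request to retrain the fault diagnosis model",
        artifact="batch analytics component",
        environment="off-peak hours",
        response="updated model available for scoring",
        response_measure="retraining finishes within 60 minutes (assumed target)",
        is_met=lambda minutes: minutes <= 60.0,
    ),
}

# Estimates would come from analysis or prototypes; this value is made up.
results = evaluate(scenarios, {"fault_alert_latency": 1.4})
```

Keeping unevaluated scenarios as `None` rather than `False` separates requirements that fail from requirements that have not yet been analyzed, which is useful when the evaluation precedes a real implementation.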

6.4. Further Work

The following significant research aspects of architectural design for Big Data analytics in iCPS were found in the literature review but are not addressed in this work, so they should be considered for future research:

- Decentralized control in iCPS. The approach presented in this paper falls within the centralized control paradigm, which integrates iCPS, artificial intelligence, Big Data analytics, and agent-based systems that will enable autonomous control of subsystems. An emerging trend is decentralized control, which uses adaptive models that support rapid model alteration where manual upgrades are not applicable. This new problem arises because processes change over time in complex systems, such as the material flow of product parts and the associated logistic operations. The question that arises is, how can Big Data analytics be integrated into the decentralized control approach in iCPS?
- iCPS analytics privacy concerns. An underlying concern in data security is industrial data privacy, which relates to handling sensitive data without taking security measures, such as accounting for hidden Big Data analytics processing. An external server becomes a potential bridge for accessing confidential information. Thus, conceding data control to the cloud for Big Data analytics raises privacy concerns and decreases confidentiality, since data are probably stored, processed, and analyzed in several cloud centers, which spreads security concerns across distinct data locations. The question that arises is, how does the Big Data analytics architecture handle sensitive data in iCPS contexts?

7. Conclusions

In this paper, a Big Data analytics architecture for iCPS was proposed. An ADD approach was adopted to gather industrial Big Data attributes (functional requirements) and industrial Big Data management drivers (data quality attributes) from planning, production, and maintenance, repair, and overhaul (different stakeholders) for the specification of the architecture. With those elements, an industrial Big Data architecture design for analytics was modeled, covering predictive functions, inference of the key performance indicators, and data-driven events for real-time analysis. From the functional point of view, the architecture is structured into the infrastructure layer, monitoring layer, and presentation layer. Additionally, a component deployment was provided that covers all functional requirements using technologies available under an open-source license. A case study on fault diagnosis conditions in industrial plants was discussed and used as a validation technique for the proposed architecture. It was carried out through failure event detection to show the benefits of fault diagnosis and in-stream analytics, and the need to combine both for forecasting unusual process conditions, which shows the feasibility and usefulness of the proposed architecture. Finally, the contributions and strengths of the proposal compared to related works reported in the literature were discussed, as well as its limitations and future directions.

Author Contributions: E.A.H.-P.: conceptualization, methodology, investigation, writing—original draft; O.M.R.-E.: supervision, conceptualization, methodology, writing—original draft, resources; J.A.H.-M.: conceptualization, writing—review and editing; J.H.P.-R.: conceptualization, writing—review and editing; J.M.N.-J.: conceptualization, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding: CONACYT has partially supported this work with scholarship No. 890778, and Tecnológico Nacional de México under project grant 10980.21-P.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Lade, P.; Ghosh, R.; Srinivasan, S. Manufacturing analytics and industrial Internet of Things. IEEE Intell. Syst. 2017, 32, 74–79.
2. Madakam, S.; Ramaswamy, R.; Tripathi, S. Internet of Things (IoT): A literature review. J. Comput. Commun. 2015, 3, 164–173.
3. Yan, J.; Meng, Y.; Lu, L.; Li, L. Industrial Big Data in an Industry 4.0 Environment: Challenges, Schemes, and Applications for Predictive Maintenance. IEEE Access 2017, 5, 23484–23491.
4. Jeschke, S.; Brecher, C.; Meisen, T.; Özdemir, D.; Eschert, T. Industrial Internet of Things and Cyber Manufacturing Systems. In Industrial Internet of Things; Springer: Berlin/Heidelberg, 2017; pp. 3–19.
5. Chauhan, B.; Bhatt, C. Bigdata Analytics in Industrial IoT. In Internet of Things and Big Data Analytics toward Next-Generation Intelligence; Springer: Berlin/Heidelberg, Germany, 2018; pp. 381–406.
6. Wuest, T.; Weimer, D.; Irgens, C.; Thoben, K.D. Machine learning in manufacturing: Advantages, challenges, and applications. Prod. Manuf. Res. 2016, 4, 23–45.
7. Fahmideh, M.; Beydoun, G. Big data analytics architecture design—An application in manufacturing systems. Comput. Ind. Eng. 2019, 128, 948–963.
8. Atat, R.; Liu, L.; Wu, J.; Li, G.; Ye, C.; Yang, Y. Big Data Meet Cyber-Physical Systems: A Panoramic Survey. IEEE Access 2018, 6, 73603–73636.
9. O’Donovan, P.; Leahy, K.; Bruton, K.; O’Sullivan, D.T.J. Big data in manufacturing: A systematic mapping study. J. Big Data 2015, 2, 20.
10. Lee, J.; Bagheri, B.; Kao, H.A. A Cyber-Physical Systems architecture for Industry 4.0-based manufacturing systems. Manuf. Lett. 2015, 3, 18–23.
11. Hinojosa-Palafox, E.A.; Rodríguez-Elías, O.M.; Hoyo-Montaño, J.A.; Pacheco-Ramírez, J.H. Trends and Challenges of Data Management in Industry 4.0. In LISS2019; Zhang, J., Dresner, M., Zhang, R., Hua, G., Shang, X., Eds.; Springer, 2020.
12. Hinojosa-Palafox, E.A.; Rodriguez-Elias, O.M.; Hoyo-Montano, J.A.; Pacheco-Ramirez, J.H. Towards an Architectural Design Framework for Data Management in Industry 4.0. In Proceedings of the 2019 7th International Conference in Software Engineering Research and Innovation, CONISOFT 2019, Mexico City, Mexico, 23–25 October 2019; pp. 191–200.
13. Chen, H.-M.; Kazman, R.; Haziyev, S.; Hrytsay, O. Big Data System Development: An Embedded Case Study with a Global Outsourcing Firm. In Proceedings of the 2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering, Florence, 23 May 2015; pp. 44–50.
14. Xiaofeng, L.; Jing, L. Research on Big Data Reference Architecture Model. In Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Big Data, ICAIBD 2020, Chengdu, 28–31 May 2020; pp. 205–209.
15. Trunzer, E.; Vogel-Heuser, B.; Chen, J.K.; Kohnle, M. Model-driven approach for realization of data collection architectures for cyber-physical systems of systems to lower manual implementation efforts. Sensors 2021, 21, 745.
16. Lee, J.; Ardakani, H.D.; Yang, S.; Bagheri, B. Industrial Big Data Analytics and Cyber-physical Systems for Future Maintenance & Service Innovation. Procedia CIRP 2015, 38, 3–7.
17. Cao, B.; Wang, Z.; Shi, H.; Yin, Y. Research and practice on Aluminum Industry 4.0. In Proceedings of the 6th International Conference on Intelligent Control and Information Processing, ICICIP 2015, Wuhan, China, 26–28 November 2015; pp. 517–521.
18. Mishra, N.; Lin, C.C.; Chang, H.T. A Cognitive Adopted Framework for IoT Big-Data Management and Knowledge Discovery Prospective. Int. J. Distrib. Sens. Netw. 2015, 2015, 718390.
19. Molano, J.I.R.; Bravo, L.E.C.; Santana, E.R.L. Data architecture for the internet of things and industry 4.0. In Proceedings of the Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Fukuoka, 5–8 November 2017; Volume 10387 LNCS, pp. 283–293.
20. Zwolińska, B.; Tubis, A.A.; Chamier-Gliszczyński, N.; Kostrzewski, M. Personalization of the MES system to the needs of highly variable production. Sensors 2020, 20, 6484.
21. Chen, X.; Nophut, C.; Voigt, T. Manufacturing execution systems for the food and beverage industry: A model-driven approach. Electronics 2020, 9, 2040.
22. Waschull, S.; Wortmann, J.C.; Bokhorst, J.A.C. Manufacturing Execution Systems: The Next Level of Automated Control or of Shop-Floor Support? Springer International Publishing: Berlin/Heidelberg, Germany, 2018; Volume 536, ISBN 9783319997063.
23. Tao, F.; Qi, Q.; Liu, A.; Kusiak, A. Data-driven smart manufacturing. J. Manuf. Syst. 2018, 48, 157–169.
24. Bai, Y.; Sun, Z.; Deng, J.; Li, L.; Long, J.; Li, C. Manufacturing quality prediction using intelligent learning approaches: A comparative study. Sustainability 2017, 10, 85.
25. Li, J.; Tao, F.; Cheng, Y.; Zhao, L. Big Data in product lifecycle management. Int. J. Adv. Manuf. Technol. 2015, 81, 667–684.
26. Gustavsson, M.; Wänström, C. Assessing information quality in manufacturing planning and control processes. Int. J. Qual. Reliab. Manag. 2009, 26, 325–340.
27. Bass, L.; Clements, P.; Kazman, R. Software Architecture in Practice, 3rd ed.; SEI: Upper Saddle River, NJ, USA, 2013; ISBN 0321154959.
28. Kazman, R.; Klein, M.; Clements, P. ATAM: Method for Architecture Evaluation. Cmusei 2000, 4, 83.
29. Capilla, R.; Jansen, A.; Tang, A.; Avgeriou, P.; Babar, M.A. 10 years of software architecture knowledge management: Practice and future. J. Syst. Softw. 2016, 116, 191–205.
30. Bass, L.; Klein, M.; Bachmann, F. Quality attribute design primitives and the attribute driven design method. In Proceedings of the 4th International Workshop, PFE 2001, Bilbao, 3–5 October 2001; Volume 2290, pp. 169–186.
31. Raza, A.; Zafar, S.; Rahman, S.U.; Khattak, U. Software Architecture Evaluation Methods: A Comparative Study. Int. J. Comput. Commun. Netw. 2019, 1, 1–9.
32. Lei, Y.; Jia, F.; Lin, J.; Xing, S.; Ding, S.X. An Intelligent Fault Diagnosis Method Using Unsupervised Feature Learning Towards Mechanical Big Data. IEEE Trans. Ind. Electron. 2016, 63, 3137–3147.
33. Qi, G.; Zhu, Z.; Erqinhu, K.; Chen, Y.; Chai, Y.; Sun, J. Fault-diagnosis for reciprocating compressors using big data and machine learning. Simul. Model. Pract. Theory 2018, 80, 104–127.
34. Madhusudana, C.K.; Budati, S.; Gangadhar, N.; Kumar, H.; Narendranath, S. Fault diagnosis studies of face milling cutter using machine learning approach. J. Low Freq. Noise Vib. Act. Control 2016, 35, 128–138.
35. Vamsi, I.V.; Abhinav, N.; Verma, A.K.; Radhika, S. Random forest based real time fault monitoring system for industries. In Proceedings of the 2018 4th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, 14–15 December 2018; pp. 1–6.
36. Ranjan, G.S.K.; Kumar Verma, A.; Radhika, S. K-Nearest Neighbors and Grid Search CV Based Real Time Fault Monitoring System for Industries. In Proceedings of the 2019 IEEE 5th International Conference for Convergence in Technology, I2CT 2019, Bombay, India, 29–31 March 2019; p. 5.
37. Bezerra, C.G.; Costa, B.S.J.; Guedes, L.A.; Angelov, P.P. An evolving approach to unsupervised and Real-Time fault detection in industrial processes. Expert Syst. Appl. 2016, 63, 134–144.
38. Kolokas, N.; Vafeiadis, T.; Ioannidis, D.; Tzovaras, D. A generic fault prognostics algorithm for manufacturing industries using unsupervised machine learning classifiers. Simul. Model. Pract. Theory 2020, 103, 102109.