applied sciences
Article Dynamic Topology Model of Q-Learning LEACH Using Disposable Sensors in Autonomous Things Environment
Jae Hyuk Cho * and Hayoun Lee
School of Electronic Engineering, Soongsil University, Seoul 06978, Korea; [email protected] * Correspondence: [email protected]; Tel.: +82-2-811-0969
Received: 10 November 2020; Accepted: 15 December 2020; Published: 17 December 2020
Abstract: Low-Energy Adaptive Clustering Hierarchy (LEACH) is a typical routing protocol that effectively reduces transmission energy consumption by forming a hierarchical structure between nodes. LEACH on Wireless Sensor Network (WSN) has been widely studied in the recent decade as one key technique for the Internet of Things (IoT). The main aims of the autonomous things, and one of advanced of IoT, is that it creates a flexible environment that enables movement and communication between objects anytime, anywhere, by saving computing power and utilizing efficient wireless communication capability. However, the existing LEACH method is only based on the model with a static topology, but a case for a disposable sensor is included in an autonomous thing’s environment. With the increase of interest in disposable sensors which constantly change their locations during the operation, dynamic topology changes should be considered in LEACH. This study suggests the probing model for randomly moving nodes, implementing a change in the position of a node depending on the environment, such as strong winds. In addition, as a method to quickly adapt to the change in node location and construct a new topology, we propose Q-learning LEACH based on Q-table reinforcement learning and Fuzzy-LEACH based on Fuzzifier method. Then, we compared the results of the dynamic and static topology model with existing LEACH on the aspects of energy loss, number of alive nodes, and throughput. By comparison, all types of LEACH showed sensitivity results on the dynamic location of each node, while Q-LEACH shows best performance of all.
Keywords: wireless sensor networks; disposable IoT; reinforcement learning; Q-learning; artificial intelligence; LEACH; fuzzy clustering
1. Introduction The most potent and influential technology of 21st century is Wireless Sensor Network (WSN). This is a key technology in the ubiquitous network, and is becoming more important as sensors can connect everything to the Internet and process important information. It also collects information about the surrounding environment (temperature, humidity, etc.) with a specific purpose and transmits it to the base station through various processes. With the advancements of WSN disposable sensors shows enhanced performance on the fields where the user cannot access or include large area, such as forest or deep sea [1]. In addition, the size of the sensor node should be small as it includes batteries, making it impossible to charge or replace, and often forming a wireless network by deploying a large number of sensors. And because it uses a battery, it cannot be charged or replaced. In addition, the size of the sensor node must be small as a large number of sensors are placed to form a wireless network. In addition, the price should be low, so there is a limit to the ability to process data and the amount of power that can be supplied to the nodes [2].
Appl. Sci. 2020, 10, 9037; doi:10.3390/app10249037 www.mdpi.com/journal/applsci Appl. Sci. 2020, 10, 9037 2 of 19
In general, in a wireless network environment, disposable sensor nodes that make up the network must perform routing and sensing roles together, so each sensor node always has an energy burden [3]. Due to these characteristics of disposable sensor network, the efficient use of limited power is of paramount importance in WSN design. Thus, various studies are currently being conducted to maximize energy consumption efficiency and to increase the overall life span. Since routing protocols in WSN are very important to efficiently transmit detected data to the Base Station (BS), clustering-based routing is preferred to obtain the advantage of efficient communication. As a common low-energy cluster-based protocol, Low-Energy Adaptive Clustering Hierarchy (LEACH) was proposed [4]. This is a technique that utilizes random rotation of BS to distribute network sensors and energy loads evenly. Only Cluster Heads (CH) interact with BS, which saves energy and increases network lifetime [5]. Thus, the idea of clustering-based routing is aimed at saving energy on sensor nodes by using CH’s information aggregation mechanism to reduce the amount of data transmission and energy. Thus, in the case of LEACH, the advantage of ensuring even energy use is achieved by using the probability function, but all nodes should take part in the CH election process every round, and, if a few CHs are elected, the number of nodes to be managed increases, resulting in high network traffic and increased energy consumption. Therefore, various LEACHs were developed to compensate for these problems in the preceding studies. In recent years, extensive research has been conducted on WSN’s various systems of artificial intelligence approaches, called Reinforcement Learning (RL), to improve network performance, and there is much interest in applying them [6]. A Q-learning-based cooperative sensing plan is proposed to enhance the performance of spectrum in a wireless network environment [7]. In Autonomous Things environment, the interests on disposable sensors with characteristics that can be easily sprayed and moved is emerging. Therefore, efficient energy management is required considering these topology changes, in real life. As limits exist for applying multiple LEACH algorithm with fixed topology state in previous studies, the development of a new dynamic topology model is required. In addition, as opposed to traditional LEACH and D-LEACH, which performs LEACH in its jurisdiction by dividing the zones, D-LEACH is efficient in the static topology simulation; however, in dynamic topology simulation, with higher uncertainty, the advantage of D-LEACH is not expected. The F-LEACH we developed for testing to secure dynamic uncertainty consists of two processes: first, it performs flexible clustering, centering nodes toward the cluster center by upper/lower membership values which compensate uncertainty and sensitivity, while producing optimized fuzzifier constant by histogram. Then, LEACH method is performed on the clustered node location. Q-LEACH the LEACH method using Q-learning (reinforced learning) rewards the success of the agent progress from multi nodes to the cluster head, and it derives this success in the form of a performance probability. From the results of energy loss, number of alive nodes, and throughput between LEACH, D-LEACH, F-LEACH, and Q-LEACH, finally, we invented the topology model, which updates the changes of sensor location in a dynamic or static autonomous things environment and definitize Q-LEACH, which increases the efficiency of sensor energy consumption, extending sensor life.
2. Materials and Methods
2.1. Wireless Sensor Network (WSN) In simple terms of a wireless sensor network, it means that sensors operating wirelessly form a network to transmit data. As shown in Figure1, when an event occurs in a sensor node in each cluster, event status information and location information are transmitted to the head of the cluster. And then the head of the cluster is configured to immediately send all information to BS or Sink. Appl. Sci. 2020, 10, 9037 3 of 19 Appl. Sci. 2020, 12, x FOR PEER REVIEW 3 of 19
FigureFigure 1. 1.DiagramDiagram of ofWireless Wireless Sensor Sensor Network Network (WSN). (WSN).
TheThe latest latest WSN WSN technology enablesenables two-way two-way communication communication to controlto control sensor sensor activity. activity. The types The of typesWSN of can WSN be roughly can bedivided roughly into divided homogenous into homo [6,7]genous and heterogeneous [6,7] and heterogeneous network [8,9]. network Previous studies[8,9]. Previoususing wireless studies using sensor wireless networks sensor have networks limitedenergy have limited because energy sensor because nodes sensor operate nodes with operate batteries. withTherefore, batteries. for Therefore, efficient use for of efficient the battery, use aof clustering the battery, technique a clustering was developed technique thatwas considers developed network that considersoverlapping network prevention overlapping [8]. Peopleprevention who [8]. use Peop thele technique who use couldthe technique cite several could tasks cite several that deal tasks with thatmonitoring deal with [ 9monitoring–11]. This is[9–11]. not only This for is medicalnot only use,for medical but WSN use, is thebut same WSN in is civil the same areas, in such civil as areas, health suchmonitoring as health and monitoring smart agriculture, and smart as agriculture, well as enemy as well tracking as enemy and military tracking areas. and military Remote sensingareas. Remoteexecution sensing is greatly execution simplified is greatly to be simplified useful in inexpensiveto be useful sensors in inexpensive and essential sensors software and essential packages. softwareTherefore, packages. the user Therefore, can monitor the anduser control can monito the basicr and environment control the basic from environment a remote location, from a and remote it will location,be used and in various it will be ways used besides in various safety ways accident besides and safety prevention accident training and prevention [12,13]. training [12,13]. 2.2. Disposable IoT Sensors 2.2. Disposable IoT Sensors Currently, various technologies have been fused and evolved into IoT technologies. Existing IoT Currently, various technologies have been fused and evolved into IoT technologies. Existing IoT sensors have limitations in price, size, and weight, so it is difficult to precisely monitor a wide range sensors have limitations in price, size, and weight, so it is difficult to precisely monitor a wide range of physical spaces. To solve these problems, technology of disposable IoT that has reduced size, of physical spaces. To solve these problems, technology of disposable IoT that has reduced size, price, price, and weight and has been constantly studied and developed. In the late 1990s, The University and weight and has been constantly studied and developed. In the late 1990s, The University of of California developed a very small sensor with a size of 1–2 mm called “Smart Dust” [14]. It has a California developed a very small sensor with a size of 1–2 mm called “Smart Dust” [14]. It has a technology that can detect and manage surrounding physical information (temperature, humidity, technology that can detect and manage surrounding physical information (temperature, humidity, acceleration, etc.) through a wireless network through disposable IoT sensors (micro sensors). acceleration, etc.) through a wireless network through disposable IoT sensors (micro sensors). Therefore, it is possible to accurately sense the entire space by using a low-cost micro sensor compared Therefore, it is possible to accurately sense the entire space by using a low-cost micro sensor to the existing sensor. In smart dust system using disposable sensors, LEACH is one of the key compared to the existing sensor. In smart dust system using disposable sensors, LEACH is one of the technologies for effective data collecting. In addition, disposable sensors broadcast collected data key technologies for effective data collecting. In addition, disposable sensors broadcast collected data without polling process in order to reduce energy consumption and the size of send/receive payloads without polling process in order to reduce energy consumption and the size of send/receive payloads are relatively small. Disposable IoT sensor can be applied to a wide range of fields, such as weather, are relatively small. Disposable IoT sensor can be applied to a wide range of fields, such as weather, national defense, safety of risks, and detection of forest fire, as it can judge parts that are difficult for national defense, safety of risks, and detection of forest fire, as it can judge parts that are difficult for user to access [15]. user to access [15]. With the advancements of wireless sensor networks, it is anticipated that real time forest fire With the advancements of wireless sensor networks, it is anticipated that real time forest fire detection systems can be developed for high precision and accuracy using wireless sensors data. detection systems can be developed for high precision and accuracy using wireless sensors data. Thousands of disposable sensors can be densely scattered over a disaster-prone area to form a wireless Thousands of disposable sensors can be densely scattered over a disaster-prone area to form a sensor network in a forest [16]. wireless sensor network in a forest [16]. For monitoring the environment, disposable sensors are distributed in large areas by aerial For monitoring the environment, disposable sensors are distributed in large areas by aerial dispersion using airplanes or dispersion in water stream. Distributed disposable sensors can be dispersion using airplanes or dispersion in water stream. Distributed disposable sensors can be dislocated from original point by environmental condition, such as winds or movement of animals, dislocated from original point by environmental condition, such as winds or movement of animals, since its size is only a maximum of 1 cm2, as shown Figure2. since its size is only a maximum of 1 cm2, as shown Figure 2.
Appl. Sci. 2020, 10, 9037 4 of 19 Appl.Appl. Sci. Sci. 2020 2020, ,12 12, ,x x FOR FOR PEER PEER REVIEW REVIEW 44 of of 19 19
FigureFigureFigure 2. 2. Image 2.ImageImage of of disposable ofdisposable disposable sensor. sensor. sensor.
2.3.2.3.2.3. LEACH LEACH LEACH (Low-Energy (Low-Energy (Low-Energy Adaptive Adaptive Adaptive Clustering Clustering Clustering Hierarchy) Hierarchy) Hierarchy) TheTheThe structurestructure structure ofof of aa network anetwork network cancan can bebe be classifiedclassified classified accordingaccording according toto to nodenode node uniformity,uniformity, uniformity, inin in FlatFlat Flat NetworksNetworks Networks RoutingRoutingRouting Protocols Protocols Protocols (FNR (FNR (FNRP)P)P) and and and Hierarchical Hierarchical Hierarchical Networks Networks Networks Routing Routing Routing Protocol Protocol Protocol (HNRP) (HNRP) (HNRP) [17]. [17]. [17 ]. LEACHLEACHLEACH is is one isone one of of ofHNRP HNRP HNRP in in insensor sensor sensor networks networks networks proposed proposed proposed by by byWendi Wendi Wendi Heizehan Heizehan Heizehan [18]. [18]. [18 This ].This This is is a isa typical typical a typical routingroutingrouting protocol protocol protocol that that that effectively effectively effectively reduces reduces reduces transmission transmission transmission energy energy energy consumption consumption consumption by by byforming forming forming a a hierarchical hierarchical a hierarchical structurestructurestructure between between between nodes. nodes. nodes. As As As shown shown shown in in Figure in Figure Figure 3, 3, 3LEACH LEACH, LEACH is is isa a method amethod method in in inwhich which which the the the cluster cluster cluster head head head collects collects collects andandand processes processes processes data data data from from from the the the member member member nodes nodes nodes of of ofthethe the cluster cluster cluster and and and delivers delivers delivers it it directly itdirectly directly to to the tothe the BS. BS. BS. If If this Ifthis this is is is used,used,used, the the the cluster cluster cluster head head head is is isselected selected selected by by by the the the ratio ratio ratio of of of all all all sensor sensor nodes, nodes, and and the thethe cluster clustercluster head head is is determined determined by bybycalculation calculation calculation insideinside inside thethe the sensorsensor sensor node,node node,so ,so soit it itis is isa a adistributed distributed distributed system. system. system.
FigureFigureFigure 3. 3. Diagram 3.DiagramDiagram of of WSN. ofWSN. WSN. ( a(a) ()Flata Flat) Flat networks networks networks routing routing routing protocols. protocols. protocols. ( b(b) ()bHierarchical Hierarchical) Hierarchical routing routing routing protocols protocols protocols (Low-Energy(Low-Energy(Low-Energy Adaptive Adaptive Adaptive Cluste Cluste Clusteringringring Hierarchy Hierarchy Hierarchy (LEACH)). (LEACH)). (LEACH)).
TheTheThe operation operation operation of of LEACH LEACH of LEACH is is divided divided is divided into into rounds, intorounds, rounds, and and each each and round round each roundbegins begins beginswith with a a setup withsetup phase aphase setup when when phase thethewhen cluster cluster the is is clusterformed, formed, is and formed,and consists consists and of consistsof a a steady steady of state astate steady phase phase state when when phase data data when is is transmitted transmitted data is transmitted to to the the base base to station. station. the base InIn station. orderorder toto In minimizeminimize order to minimize overhead,overhead, overhead, thethe steadysteady the statestate steady phphase statease isis phase longerlonger is longerthanthan thethe than setset the phase,phase, set phase, asas shownshown as shown inin EquationEquationin Equation (1) (1) [19]. [19]. (1) [In 19In ].Equation Equation In Equation (1), (1), the (1),the setup thesetup setup is: is: each each is: each node node node decides decides decides to to depend depend to depend on on onCHCH CH of of the ofthe the current current current round.round.round. This This This decision decision decision is is isbased based based on on on choosing choosing choosing a aa random randomrandom number number betweenbetween between 00 0 and and and 1 1 if1 if theif the the number number number is lessis is less less than thanthanthe the the threshold threshold threshold T(n T( ),T( andnn),), and and the the nodethe node node becomes becomes becomes the clusterthe the cluster cluster head head head of the of of current the the current current round. round. round. The threshold The The threshold threshold is set as isis set follows.set as as follows. follows. If the If clusterIf the the cluster cluster head head ishead selected, is is selected, selected, the state the the isstate state reported is is reported reported using using using the Carrier the the Carrier Carrier Sense Sense Sense Multiple Multiple Multiple Access AccessAccess(CSMA) (CSMA) (CSMA) Medium Medium Medium Access Access Access Control Co Controlntrol (MAC) (MAC) (MAC) protocol. protocol. protocol. The The The remained remained remained nodes nodes nodes make make make decisions decisions decisions about about about the thethecurrent currentcurrent round’s round’sround’s cluster clustercluster head headhead according accordingaccording to the toto received ththee receivedreceived signal signalsignal strength strengthstrength of the of advertisementof thethe advertisementadvertisement message. message.message.The Time The The Division Time Time Division Division Multiple Multiple AccessMultiple (TDMA) Access Access (TDMA) scheduling(TDMA) scheduling scheduling is applied is tois applied allapplied members to to all all ofmembers members the cluster of of groupthe the clusterclusterto send group group a message to to send send toa a message themessage CH andto to the the then CH CH from and and then the then cluster fr fromom the headthe cluster cluster to the head head base to to station. the the base base As station. station. soon asAs As a soon clustersoon asas heada a cluster cluster is selected head head is is forselected selected a region, for for a thea region, region, steady the the state stea stea phasedydy state state begins. phase phase Steady:begins. begins. Steady: theSteady: cluster the the iscluster cluster created is is created whencreated the whenwhenTDMA the the schedulingTDMA TDMA scheduling scheduling is fixed andis is fixed fixed data and transferand data data can transf transf begin.erer can can If the begin. begin. node If If alwaysthe the node node has always always the data has has to the send,the data data it sends to to send,send, it it sends sends it it to to the the cluster cluster head head for for the the allo allottedtted transmission transmission time. time. Each Each head head node node can can be be turned turned offoff until until the the node node allocates allocates the the transmission transmission time, time, thus thus minimizing minimizing energy energy lo lossss in in these these nodes nodes [20]. [20].
Appl. Sci. 2020, 10, 9037 5 of 19 it to the cluster head for the allotted transmission time. Each head node can be turned off until the node allocates the transmission time, thus minimizing energy loss in these nodes [20].
ρ 1 , if n G 1 ρ(r mod( ρ )) ∈ T(n) = − (1) 0, otherwise
T(n) = Threshold; r = current round; G = The set of nodes that have not been CH; ρ = the percentage of cluster heads. − 2.3.1. Enhancements of LEACH There are several drawbacks with LEACH. The setup phase is non-deterministic due to randomness. It may be unstable during setup phase that depending on the density of sensor nodes. It is not applicable on large networks as it uses single hop for communication. CH, which is located far from BS, will consume huge amount of energy. It does not guarantee the good CH distribution and it involves assumption of uniform energy consumption of CH during setup phase. And also, the main problems with LEACH lie in the random selection of CH. Various studies have been conducted to improve the LEACH protocol to improve network lifetime, energy savings, and performance and stability [21–28]. LEACH-Centralized (LEACH-C) method was developed for improving the lifetime of the network in the setting stage and also each node transmits information related to the location and energy level to the BS. The BS determines the cluster, as well as the CH and each cluster node. Through this, it was possible to extend the life of the network and improve energy savings [21]. In addition, in S-LEACH, solar power is used to improve WSN life. Typically, the remained energy is maximized to select a node with redundant power, and the node transmits the solar state to the BS along with its energy level. As a mechanism, nodes with higher energy are selected as CH, and the higher the number of recognition nodes, the better the performance of the sensor network. According to a Simulation results, S-LEACH significantly extends the lifetime of WSNs [23]. In addition, in the case of a LEACH, a new node concept that reduces and provides a uniform distribution of dissipated energy by dividing routing work and data aggregation, and supports CH in multi-hop routing is introduced. Energy-efficient multi-hop pathways are established for nodes to reach the BS to extend the life of the entire network and minimize energy dissipation [28]. As a method for saving energy, H-LEACH was developed with various algorithms that consider the concept of minimizing the distance between data [21]. It adds a CH of LEACH to act as the Master Cluster Head (MCH) to pass the data to the BS. In addition, it is proposed I-LEACH to save energy, while communicating within the network. In order to increase the stability of the network, we proposed Optimized-LEACH(O-LEACH), which optimizes CH selection [29]. This means that the selected CH is based only on the remaining dynamic energy. If the energy of the node is greater than 10% of the minimum residual energy, the node is selected as CH, otherwise the existing CH is maintained. In order to deal with situations in which the diameter of the network increases beyond certain limits, D-LEACH randomly places nodes with a high probability of being located close to each other, called twin nodes. It is necessary to keep one node asleep until the energy of another node depletes. Therefore, D-LEACH has uniform distribution of CH so that it does not run out of energy when longer distance transmission takes place [21]. The D-LEACH method ensures that the selected cluster heads are distributed to the network. Therefore, it is unlikely that all cluster heads will be concentrated in one part of the network [25].
2.3.2. Clustering in LEACH Clustering means defining a data group by considering the characteristics of given data and finding a representative point that can be represented. In LEACH, the sensor detects the events and then sends the collected data to a faraway BS. Because the cost of information transmission is higher Appl. Sci. 2020, 10, 9037 6 of 19 than the calculation, nodes are clustered into groups to save energy. It allows data to be communicated only to the CH, then the CH routes the aggregated data to the BS. The CH, which is periodically elected using a clustering algorithm, aggregates the data collected from cluster members and sends it back to the BS, which can be used by the end user. To transmit data over long distances in this way, only a few nodes are required, and most nodes need to complete short distance transmissions. So, more energy is saved, and the WSN life is extended. The main idea of a hierarchical routing protocol is to divide the entire network into two or more hierarchical levels, each of which performs different tasks [20]. In order to create these hierarchical levels, the clustering functions as a critical role in LEACH.
2.4. Interval Type-2 Possibilistic Fuzzy C-Means (IT2-PFCM) It is known that the synthesis of Fuzzy C-Means (FCM) and T2FS gives more room to handle the uncertainties of belongingness caused by noisy environment. In addition, a Possibilistic C-Means (PCM) clustering algorithm was presented, which allocates typicality using an absolute distance between one pattern and one central value. However, the PCM algorithm also has a problem that clustering results respond to the initial parameter values sensitively. To address this sensitivity problem, PFCM algorithm combining FCM and PCM by weighted sum was proposed. However, the PFCM method also has the uncertainty problem of determining the value of the purge constant. It is an important issue to control the uncertainty of the fuzzy constant value because the fuzzy constant value plays a decisive role in obtaining the membership function. To control the uncertainty of these fuzzy constants, hybrid algorithms are suggested, which includes the general type-2 FCM [30–32], Interval Type-2 FCM (IT2-FCM) [33], kernelized IT2-FCM [34], interval type-2 fuzzy c-regression clustering [35], interval type-2 possibilistic c-means clustering [36,37], interval type-2 relative entropy FCM [38], particle swarm optimization based IT2-FCM [39], interval-valued fuzzy set-based collaborative fuzzy clustering [40], and others. These T2FS based algorithms have been successfully applied to areas like image processing, time series prediction and others. Interval Type-2 FCM (IT2-FCM): In fuzzy clustering algorithms, like FCM, the fuzzification coefficient m plays an important role in determining the uncertainty of partition matrix. However, the value of m is usually hard to be decided upon in advance. IT2-FCM considers the fuzzification coefficient as an interval (m1, m2) and solves two optimization problems [41].
Fuzzifier Value IT2 PFCM is expressed as the sum of the weights of FCM and PCM method. Therefore, it is clustered in the direction of minimizing PFCM objective function, as follows in Equation (2). In Equation (3), uik represents a membership value where the input pattern k belongs to cluster
Xn Xc Xc Xn m η 2 η Jm,η(U, T, V; X) = (au +bt ) x v + γ ((1 t ) ), (2) ik × k k − ik i − ik k=1 i=1 ik i=1 k=1
Xc u = 1, 0 u , t 1, m = > 1, η > 1, γ > 0. (3) ik ≤ ik ik ≤ i i=1
i. xi is the k-th input pattern, and vi is the center value of the i-th cluster. m is a constant representing the degree of fuzziness and satisfies the condition of m (1, ). tik represents typicality ∈ ∞ that the input pattern k belongs to cluster i, which is a feature of PFCM method using absolute distance. γi is a scale defining point, where typicality of the i-th cluster is 0.5, and the value is an arbitrary number. For PFCM cluster method, in above Equation (5) the objective function should be minimized with respect to the membership function uik. Membership that is obtained by Equation (2). To draw m, Appl. Sci. 2020, 10, 9037 7 of 19 you must create the lowest and highest membership functions using the primary membership function. The highest and lowest membership functions of PFCM according to m are as follows.
1 = ( ) uik 2 , (4) m 1 P1 dik 1− j=1 dij
1 = ( ) uik 2 , (5) m 1 P1 dik 2− j 1 d − ij
As shown in Equation (6), the lowest and highest membership values where m1 and m2 are representing objective function, the value γi also changes according to the lowest and highest membership functions. Using γi, the lowest and highest typicality is,
2 1 1 PC dij mold − log log1 + uj − k=2 dik γ = (6) d log ij dik
2 m = 1 + (7) jnew γ For updating the center value, as shown in Equation (7), the type reduction process of changing type-2 fuzzy takes up type-1 using the K-means algorithm that it is performed, and the center value of each cluster is updated. N N X X m = m /N, m = m /N (8) 1 1i 2 2i i=1 i=1 Table1 shows the symbol and original values of IT2-PFCM.
Table 1. Explanation of symbols.
Symbol Explanation C Bulky cluster v Cluster center vi,V Cluster prototype m Fuzzifier value u Membership function U Partition matric dik/dij Euclidean distance value δ Threshold of fuzzifier constant Ae Secondary membership degree J PFCM objective function x Input pattern tik Represents typicality, the input pattern k belongs to cluster i γi Scale defining point where typicality of the i-th cluster is 0.5 X Input space i Φ Xj Kernel property space K Input space for kernel S Number of kernels
Figure4 shows that histograms and Footprint of Uncertainty (FOU) examples are determined by class and dimension. Upper membership function (UMF) histogram and lower membership function (LMF) histogram are obtained according to class and dimension. A new membership function from the Gaussian Curve Fitting (GF-F) method can be applied to calculate the adaptive fuzzifier value. Appl. Sci. 2020, 10, 9037 8 of 19 Appl. Sci. 2020, 12, x FOR PEER REVIEW 8 of 19
FigureFigure 4. Histogram 4. Histogram for for fuzzifier fuzzifier value. value. (a (a)) LMF LMF histogram ( b(b) UMF) UMF histogram histogram 2.5. Reinforcement Learning (RL) 2.5. Reinforcement Learning (RL) RL is one of the unsupervised learning methods which tries to find out some policies from RLinteraction is one withof the the unsupervised environment. It learning is the problem methods faced bywhich an agent tries that to has find to learnout some behavior policies through from interactiontrial and with error the interactions environment. with vigorousIt is the environments.problem faced And by itan was agent applied that successfully has to learn in manybehavior throughagent trial systems and error [42–44 interactions]. The aim of reinforcementwith vigorous learning environments. is to find out And a useful it was behavior applied by successfully evaluating in manythe agent reward. systems Q learning [42–44]. is a RLThe method aim of adequately reinforcement appropriate learning for learning is to find from out interaction, a useful wherebehavior the by evaluatinglearning the is performedreward. Q through learning a reward is a mechanism.RL method This adequately method was appropriate applied to WSN for optimizationlearning from interaction,problems where [45]. Researchthe learning into applyingis performed intelligent through and machine a reward learning mechanism. methods This for power method management was applied was considered in Reference [46], with Reference [47] being among those specially targeting the area to WSN optimization problems [45]. Research into applying intelligent and machine learning of dynamic power management for energy harvesting embedded systems. A learner interacted methods for power management was considered in Reference [46], with Reference [47] being among with the environment and autonomously determined required actions [47]. An RL-based Sleep thoseScheduling specially targeting for Coverage the (RLSSC) area of algorithmdynamic ispower for sustainable management time-slotted for energy operation harvesting in rechargeable embedded systems.sensor A learner networks interacted [48]. Then, with the learnerthe environm was rewardedent and by autonomously the reward function determined to respond required to different actions [47]. Anenvironmental RL-based Sleep states. Scheduling RL can be applied for Coverage to both (RLSSC) single and algorithm multi-agent is systemsfor sustainable [49]. In previoustime-slotted operationstudies, in the rechargeable efficiency of RLsensor was improved networks by developing[48]. Then, RL the applications learner was in WSN rewarded [50]. An independentby the reward functionRLapproach to respond for resourceto different management environmental in WSN isstat proposedes. RL [can51]. WSNbe applied tasks through to both random single selection and multi- agentcan systems provide [49]. better In performance previous instudies, terms of the cumulative efficiency compensation of RL was over improved time and residualby developing energy of RL applicationsthe network in WSN [52]. [50]. An independent RL approach for resource management in WSN is proposedQ-Learning [51]. WSN tasks through random selection can provide better performance in terms of cumulative compensation over time and residual energy of the network [52]. The most well-known reinforcement learning technique is Q-learning. Q-learning is a representative Q-Learningalgorithm of RL prosed by Watkins [53]. As shown in Figure5, data is collected directly by the agent acting in a given environment. In other words, the agent takes an action (a) in the current state (s) and Thegets amost reward well-known (r) for it. reinforcement learning technique is Q-learning. Q-learning is a representativeIn Equation algorithm (9), after of RL Q-learning prosed Qby( sWatkinst, at ) is initialized [53]. As toshown an arbitrary in Figure value, 5, itdata is repeatedly is collected directlyupdated by the with agent the acting following in a formula given valueenvironment. according In to other learning words, progresses. the agentRt+1 takesis the an reward action that (a) in the currentgets it fromstate current (s) and state( getss ta) toreward next state (r) for(st+ it.1). γ is a discount factor, which is the largest value among the action value function values that can be obtained in the next state (st+1)[54].
Q( st, at )= Rt+1+ γ maxa+1Q. (9)
Figure 5. Diagram of Q-learning.
Appl. Sci. 2020, 12, x FOR PEER REVIEW 8 of 19
Figure 4. Histogram for fuzzifier value. (a) LMF histogram (b) UMF histogram
2.5. Reinforcement Learning (RL) RL is one of the unsupervised learning methods which tries to find out some policies from interaction with the environment. It is the problem faced by an agent that has to learn behavior through trial and error interactions with vigorous environments. And it was applied successfully in many agent systems [42–44]. The aim of reinforcement learning is to find out a useful behavior by evaluating the reward. Q learning is a RL method adequately appropriate for learning from interaction, where the learning is performed through a reward mechanism. This method was applied to WSN optimization problems [45]. Research into applying intelligent and machine learning methods for power management was considered in Reference [46], with Reference [47] being among those specially targeting the area of dynamic power management for energy harvesting embedded systems. A learner interacted with the environment and autonomously determined required actions [47]. An RL-based Sleep Scheduling for Coverage (RLSSC) algorithm is for sustainable time-slotted operation in rechargeable sensor networks [48]. Then, the learner was rewarded by the reward function to respond to different environmental states. RL can be applied to both single and multi- agent systems [49]. In previous studies, the efficiency of RL was improved by developing RL applications in WSN [50]. An independent RL approach for resource management in WSN is proposed [51]. WSN tasks through random selection can provide better performance in terms of cumulative compensation over time and residual energy of the network [52].
Q-Learning The most well-known reinforcement learning technique is Q-learning. Q-learning is a representative algorithm of RL prosed by Watkins [53]. As shown in Figure 5, data is collected directlyAppl. Sci. by2020 the, 10 agent, 9037 acting in a given environment. In other words, the agent takes an action (a)9 in of 19 the current state (s) and gets a reward (r) for it.
Appl. Sci. 2020, 12, x FOR PEER REVIEW 9 of 19
In Equation (9), after Q-learning Q( st,at )is initialized to an arbitrary value, it is repeatedly updated with the following formula value according to learning progresses. Rt+1 is the reward that gets it from current state(st) to next state (st+1). γ is a discount factor, which is the largest value among the action value function values that can be obtained in the next state(st+1) [54].
Q( st,at )=Rt+1+ γ maxa+1Q. (9) FigureFigure 5. Diagram 5. Diagram of Q-learning. of Q-learning. 2.6. Proposed Modification in LEACH 2.6. Proposed Modification in LEACH FigureFigure 6 shows6 shows the workflow the workflow of this of study, this study, first, first,we define we define the deployment the deployment of sensor of sensor node node in the matrix inthen, the choose matrix then,the model choose accord the modeling to according sensor movement, to sensor movement, if the movement if the movement of sensor topology of sensor is not consideredtopology isas nottraditional considered LEACH as traditional modeling, LEACH static modeling, topology staticmodel topology is chosen; model otherwise, is chosen; the dynamicotherwise, topology the model dynamic is applied topology to model imply is the applied disp toosable imply sensor the disposable movement. sensor In movement.the chosen Inmodel, the LEACHchosen protocol model, algorism LEACH is protocolapplied, algorism and each is applied,LEACH andprotocol each LEACH selects protocolCluster Head selects (CH) Cluster in Headits own way, while(CH) the in its calculation own way, while is repeated the calculation until the is repeated system untilis terminated the system as is the terminated node dying. as the node dying.
FigureFigure 6. Workflow 6. Workflow of ofresearch. research.
2.6.1. Dynamic Topology Modeling The simulation of the dynamic topology model is signed with the changing node location on the field during operation. This reflects the phenomenon that the micro-sensors exposed to the real environment show movements. As round (r) in Equation (10) increase, the position of the node (Xr, Yr),