
Article

Application of Improved LightGBM Model in Blood Glucose Prediction

Yan Wang and Tao Wang *

School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China; [email protected] * Correspondence: [email protected]; Tel.: +86-1733-988-1226

 Received: 6 April 2020; Accepted: 2 May 2020; Published: 6 May 2020 

Abstract: In recent years, with increasing social pressure and irregular schedules, many people have developed unhealthy eating habits, which has resulted in a growing number of patients with diabetes, a disease that cannot be cured under current medical conditions and can only be mitigated by early detection and prevention. Measuring the blood glucose of a large number of people in medical examination requires considerable human and material resources, whereas an integrated learning model can quickly predict blood glucose levels and assist doctors in treatment. Therefore, an improved LightGBM model based on the Bayesian hyper-parameter optimization algorithm, namely HY_LightGBM, is proposed for blood glucose prediction; it optimizes the parameters of LightGBM using a Bayesian hyper-parameter optimization algorithm. The Bayesian hyper-parameter optimization algorithm is a model-based method for finding the minimum value of a function, and it is used here to obtain the optimal parameters of the LightGBM model. Experiments have demonstrated that the parameters obtained by the Bayesian hyper-parameter optimization algorithm are superior to those obtained by a genetic algorithm and random search. The improved LightGBM model based on the Bayesian hyper-parameter optimization algorithm achieves a mean square error of 0.5961 in blood glucose prediction, a higher accuracy than the XGBoost model and CatBoost model.

Keywords: blood glucose prediction; integrated learning; LightGBM; Bayesian hyper-parameter optimization

1. Introduction

Recent years have seen a rapid increase in the incidence of diabetes around the world, due to many factors such as the continuous improvement in people's living standards, changes in dietary structure, an increasingly rapid pace of life, and a sedentary lifestyle. Diabetes has become the third major chronic disease that seriously threatens human health, following cancer and cardiovascular disease [1,2]. According to statistics from the International Diabetes Federation (IDF), there were approximately 425 million patients with diabetes across the world in 2017. One in every 11 adults has diabetes, and one in every two patients is undiagnosed [3]. As of 2016, diabetes directly caused 1.6 million deaths [4], and it is estimated that by 2045, nearly 700 million people worldwide will suffer from diabetes, which will pose an increasing economic burden on the health systems of most countries. It is forecast that by 2030, at least $490 billion will be spent on diabetes globally [5]. Currently, the diagnosis rate of diabetes is low, as patients exhibit no obvious symptoms in the early onset of the disease, and many people do not realize they have it [6]; thus, early detection and diagnosis are particularly needed. It is relatively simple to measure a person's blood glucose under existing medical conditions, but it takes a lot of human and material resources to detect the blood glucose of a large number of people in medical examination.

Therefore, predicting the blood glucose of a large number of people in medical examination by machine learning can save a lot of unnecessary expense (see, for example, [7]). With the application of machine learning in medical fields, more and more people are applying emerging prediction methods to different medical problems, which greatly reduces the workload of the related medical staff and improves the diagnosis efficiency of doctors. For example, Yu Daping and Liu applied the XGBoost model to the early diagnosis of lung cancer [8]. Tjeng Wawan Cenggoro used the XGBoost model to predict and analyze colorectal cancer in Indonesia [9]. Chang Wenbing forecasted the prognosis of hypertension using the XGBoost model [10]. Ogunleye Adeola Azeez applied the XGBoost model to the diagnosis of chronic kidney disease [11]. Wenbing Chang used the XGBoost model and a clustering algorithm to analyze the probability of hypertension-related symptoms [12]. Wang Bin predicted severe hand, foot, and mouth disease using the CatBoost model [13]. As the XGBoost model generates a decision tree using the level-wise method [14], it splits the leaves on the same layer simultaneously, but the splitting gain of most leaf nodes is low. In many cases where further splitting is unnecessary, the XGBoost model will continue to split, incurring needless cost. Compared with the LightGBM model, the XGBoost model occupies more memory and consumes more time when the dataset is large. The traditional genetic algorithm and random search algorithm also have their own defects. The genetic algorithm is a natural adaptive optimization method that models the problem to be solved as a process of biological evolution, and gradually eliminates solutions with low fitness values by generating next-generation solutions through operations such as replication, crossover, and mutation [15]. The genetic algorithm uses the search information of multiple search points at the same time, and adopts probabilistic search technology to obtain the optimal or sub-optimal solution of the optimization problem. It offers good search flexibility and global search capability, and is easily implemented [16]. However, it has poor local search ability, a complicated process caused by many control variables, and no definite termination rules. The random searching algorithm (RandomizedSearchCV) performs a random search in a set parameter search space. It samples a fixed number of parameter settings from a specified distribution instead of trying all of the parameter values [17]. However, the random searching algorithm performs poorly when the dataset is small. Therefore, this paper proposes an improved LightGBM model for blood glucose prediction. The main contributions of this study are as follows: by preprocessing the data on various medical examination indicators of people receiving medical examination, this paper proposes a LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm to predict the blood glucose level of people receiving medical examination.
The LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm achieves a higher accuracy than the XGBoost model, the CatBoost model, the LightGBM model optimized by the genetic algorithm, and the LightGBM model optimized by the random searching algorithm. It can help doctors give early warning to those potentially suffering from diabetes, so as to reduce the incidence of diabetes, increase the diagnosis rate, and provide new ideas for the in-depth study of diabetes. The proposed method was verified on diabetes examination data collected from September to October 2017 at a grade-three first-class hospital in China.

2. Materials and Methods

2.1. LightGBM Model

LightGBM is an improved framework based on the decision tree algorithm, released by Microsoft in 2017. LightGBM and XGBoost both support parallel computation, but LightGBM is more powerful than the earlier XGBoost model, with a faster training speed and lower memory occupation, which reduces the communication cost of parallel learning. LightGBM is mainly characterized by a decision tree algorithm based on gradient-based one-side sampling (GOSS), exclusive feature bundling (EFB), the histogram algorithm, and a leaf-wise growth strategy with a depth limit.

The basic idea of GOSS (gradient-based one-side sampling) is to keep all of the large-gradient samples and to perform random sampling on the small-gradient samples according to a fixed proportion.

The basic idea of the EFB (exclusive feature bundling) algorithm is to divide the features into a smaller number of mutually exclusive bundles. Since it is impossible to find an exact solution in polynomial time, an approximate solution is used: a small number of sample points on which features are not mutually exclusive are allowed (for example, some corresponding sample points are not non-zero at the same time). Allowing a small amount of conflict yields a smaller number of feature bundles, which further improves computational efficiency [18].

The basic idea of the histogram algorithm is to discretize continuous floating-point features into k integers and, at the same time, to construct a histogram with a width of k. As the data are traversed, statistics are accumulated in the histogram with the discretized value as the index. After the data have been traversed once, the histogram has accumulated the required statistics, and the optimal segmentation point can then be found by traversing the discrete values in the histogram, as shown in Figure 1:

Figure 1. Schematic diagram of the histogram algorithm.
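To make the histogram algorithm concrete, the following minimal sketch (an illustration only, not LightGBM's internal implementation; the function name, the equal-width binning, and the default of k = 32 bins are choices made here) discretizes one continuous feature into k bins, accumulates the per-bin gradient statistics, and scans the bin boundaries using the split gain of Equation (12) introduced below:

```python
# Illustration of the histogram idea: bin a continuous feature into k integers,
# accumulate gradient/hessian sums per bin, then scan the k bin boundaries
# instead of every raw feature value when searching for a split.
import numpy as np

def histogram_split_gain(feature, grad, hess, k=32, lam=1.0):
    # Discretize the continuous feature into k equal-width bins.
    edges = np.linspace(feature.min(), feature.max(), k + 1)
    bins = np.clip(np.digitize(feature, edges[1:-1]), 0, k - 1)

    # Accumulate G (sum of gradients) and H (sum of hessians) per bin.
    G, H = np.zeros(k), np.zeros(k)
    for b, g, h in zip(bins, grad, hess):
        G[b] += g
        H[b] += h

    # Scan the bin boundaries and evaluate the split gain of Equation (12).
    G_total, H_total = G.sum(), H.sum()
    best_gain, best_bin = -np.inf, 0
    GL = HL = 0.0
    for b in range(k - 1):
        GL += G[b]
        HL += H[b]
        GR, HR = G_total - GL, H_total - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G_total**2 / (H_total + lam))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, edges[best_bin + 1]   # best gain and split threshold
```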

Most decision tree learning algorithms grow the tree by the level-wise method, such as XGBoost, as shown in Figure 2:

Figure 2. Schematic diagram of the level-wise strategy growth tree.

LightGBM instead uses a leaf-wise growth strategy with a depth limit: it finds the leaf node with the largest split gain among all of the current leaf nodes, splits it, and so on, as shown in Figure 3:

Figure 3. Schematic diagram of the leaf-wise strategy growth tree.

Compared with the level-wise growth strategy, leaf-wise tree growth can reduce large errors and achieve a higher accuracy, and it has therefore been applied to many problems. For example, Xile Gao used Stacked Denoising Auto Encoder (SDAE) and LightGBM models to recognize human activity [19]; Sunghyeon Choi employed a random forest, an XGBoost model, and a LightGBM model to predict solar energy output [20]; João Rala Cordeiro forecasted children's height using an XGBoost model and a LightGBM model [21]; Ma Xiaojun et al. adopted a LightGBM model and an XGBoost model to predict the default of P2P network loans [22]; Chen Cheng et al. used LightGBM and a multi-information fusion model to predict the interaction between proteins [23]; and Vikrant A. Dev applied a LightGBM model to stratum lithology classification [24].

The following introduces the theory behind the LightGBM model's objective function, where $y_i$ is the target value, $\hat{y}_i$ is the predicted value, $T$ represents the number of leaf nodes, $q$ denotes the structure function of the tree, and $w$ is the leaf weight. The objective function is as follows:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{i=1}^{t} \Omega(f_i) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{i=1}^{t} \Omega(f_i) \quad (1)$$

Logistic loss:
$$L(\theta) = \sum_{i} \left[ y_i \ln\left(1 + e^{-\hat{y}_i}\right) + (1 - y_i) \ln\left(1 + e^{\hat{y}_i}\right) \right] \quad (2)$$
Use the Taylor expansion to define the objective function:

$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x + \frac{1}{2} f''(x)\,\Delta x^2 \quad (3)$$
At this time, the objective function is the following:

$$Obj^{(t)} = \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \quad (4)$$

where:
$$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right), \qquad h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right) \quad (5)$$
Use the accumulation of n samples to traverse all of the leaf nodes:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \quad (6)$$

where $I_j$ is the sample set in leaf node $j$, and:

$$G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i \quad (7)$$

Therefore:

$$Obj^{(t)} = \sum_{j=1}^{T} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \right] = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] \quad (8)$$

The partial derivative with respect to the output $w_j$ of the $j$th leaf node is taken as follows:

$$\frac{\partial}{\partial w_j} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \right] = \sum_{i \in I_j} g_i + \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j \quad (9)$$

Setting this derivative to zero gives the solution that minimizes the objective:

$$w_j = -\frac{G_j}{H_j + \lambda} \quad (10)$$

When the structure q(x) of the tree is determined, the function $L_t(q)$ is obtained as follows:

$$L_t(q) = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2}(H_j + \lambda) w_j^2 \right] = \sum_{j=1}^{T} \left[ G_j \Big(-\frac{G_j}{H_j + \lambda}\Big) + \frac{1}{2}(H_j + \lambda)\Big(-\frac{G_j}{H_j + \lambda}\Big)^2 \right] = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} \quad (11)$$

The calculated gain of a split is the following:

$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] \quad (12)$$

2.2. Bayesian Hyper-Parameter Optimization Algorithm

The basic idea of the Bayesian hyper-parameter optimization algorithm is to build a substitute (surrogate) function based on the past evaluation results of the objective, and to use it to find the minimum value of the objective function. The substitute function built in this process is easier to optimize than the original objective function, and the next input value to be evaluated is selected by applying a certain criterion to the surrogate function [25]. Although the genetic algorithm is currently widely used in the parameter optimization of machine learning [26–29], its disadvantages are also obvious, such as poor local search ability, many control variables, a complicated process, and no determined termination rules. The random searching algorithm is a traditional parameter optimization algorithm in machine learning [30]. It uses random sampling within the search range for parameter optimization, but the effect is poor when the dataset is small. In contrast, Bayesian hyper-parameter optimization takes the results of the previous evaluations into account when trying another set of hyper-parameters, and its process of optimizing model parameters is simpler than that of the genetic algorithm, which saves a lot of time.

The HY_LightGBM model proposed in this paper uses Hyperopt, one of the Bayesian optimization libraries in Python, which uses Tree Parzen Estimation (TPE) as the optimization algorithm. The optimization process of the Bayesian optimization algorithm is shown in Figure 4.

(Figure 4 flow: start → define the objective function → define the domain space, i.e., the value range of each parameter → construct the substitution function → obtain the optimal parameters → end.)

Figure 4. Flow chart of parameter optimization.
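The flow of Figure 4 can be written down as a short Hyperopt sketch. The search ranges, the five-fold cross-validation, and the variable names below are illustrative assumptions made for this example; the paper reports only the resulting optimal parameter values, not the exact ranges searched:

```python
# Minimal Hyperopt/TPE sketch of the optimization flow in Figure 4.
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

# X_train, y_train: preprocessed training features and blood glucose targets
# from Section 2.4 (assumed to already exist).

def objective(params):
    # Objective function: cross-validated MSE of LightGBM for one parameter set.
    model = LGBMRegressor(
        learning_rate=params["learning_rate"],
        n_estimators=int(params["n_estimators"]),
        min_data_in_leaf=int(params["min_data_in_leaf"]),
        bagging_fraction=params["bagging_fraction"],
        feature_fraction=params["feature_fraction"],
    )
    mse = -cross_val_score(model, X_train, y_train,
                           scoring="neg_mean_squared_error", cv=5).mean()
    return {"loss": mse, "status": STATUS_OK}

# Domain space: the value range of each parameter (ranges are assumptions).
space = {
    "learning_rate": hp.uniform("learning_rate", 0.01, 0.2),
    "n_estimators": hp.quniform("n_estimators", 50, 500, 1),
    "min_data_in_leaf": hp.quniform("min_data_in_leaf", 5, 100, 1),
    "bagging_fraction": hp.uniform("bagging_fraction", 0.5, 1.0),
    "feature_fraction": hp.uniform("feature_fraction", 0.5, 1.0),
}

# TPE builds the substitution (surrogate) function from past evaluations and
# proposes the next parameter set to try; fmin returns the best one found.
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)
print(best)
```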

By defining the objective function and the domain space, and by constructing the substitution function, the optimal parameters are finally obtained, with the mean square error (MSE) as the evaluation indicator. The obtained optimal parameters are then input into the LightGBM model to further improve its prediction ability.

2.3. Improved LightGBM Model Based on Bayesian Hyper-Parameter Optimization Algorithm

The experiment was conducted on a computer with an Intel i5-8400 2.8 GHz six-core six-thread central processing unit (CPU), 16 GB of random-access memory (RAM), and the Windows 10 operating system. The simulation platform was PyCharm, and Python was used for programming, with the sklearn, pandas, and numpy libraries adopted.

Because LightGBM has many parameters, manual adjustment of the parameters is complicated, and considering that its parameters have a great impact on the experimental results, it is particularly necessary to use the Bayesian hyper-parameter optimization algorithm for parameter optimization. The Hyperopt library used in this paper is one of the Bayesian optimization libraries in Python. It is also a class library for distributed asynchronous algorithm configuration in Python, and it finds the optimal parameters of the model faster and more effectively than traditional parameter optimization algorithms.

The specific steps are as follows:

(1) Divide the dataset into a training set and a test set, process the missing values, analyze the weight of the influence of the eigenvalues on the results, delete useless eigenvalues, and delete outliers;
(2) Use the Bayesian hyper-parameter optimization algorithm for the parameter optimization of the LightGBM model, and construct and train the HY_LightGBM model;
(3) Use the HY_LightGBM model for prediction and output the prediction results.

The specific experimental process is shown in Figure 5.

(Figure 5 flow: start → input the dataset → divide the dataset into a training set and a test set → delete eigenvalues with many missing values → delete eigenvalues that have no effect on the experimental results → delete outliers → optimize the model parameters with the Bayesian hyper-parameter optimization algorithm, then train and predict → output the prediction results → end.)

Figure 5. Experiment flow chart.

2.4. Data Preprocessing

The dataset in this paper is the diabetes data from September to October 2017 in a grade-three first-class hospital, provided by the Tianchi competition platform as the data source, with a total of 7642 records and 42 eigenvalues, as shown in Table 1. The eigenvalues include the following:

Table 1. Introduction to eigenvalues.

Eigenvalue Name | Explanation (Unit)
ID | Physical examination personnel ID
Gender | Male/female
Age | Age
Date of physical examination | Date of physical examination
Aspartate aminotransferase | Aspartate aminotransferase (U/L)
Alanine aminotransferase | Alanine aminotransferase (U/L)
Alkaline phosphatase | Alkaline phosphatase (U/L)
R-Glutamyltransferase | R-Glutamyltransferase (U/L)
Total protein | Total serum protein (g/L)
Albumin | Serum albumin (g/L)
Globulin | Globulin (g/L)
White ball ratio | Ratio of albumin to globulin
Triglyceride | Serum triglyceride (mmol/L)
Total cholesterol | Total cholesterol in lipoproteins (mmol/L)
High density lipoprotein cholesterol | High density lipoprotein cholesterol (mg/dl)
LDL cholesterol | LDL cholesterol (mg/dl)
Urea | Urea (mmol/L)
Creatinine | Products of muscle metabolism in the human body (µmol/L)
Uric acid | Uric acid (µmol/L)
Hepatitis B surface antigen | Hepatitis B surface antigen (ng/mL)
Hepatitis B surface antibody | Hepatitis B surface antibody (mIU/mL)
Hepatitis B e antigen | Hepatitis B e antigen (PEI/mL)
Hepatitis B e antibody | Hepatitis B e antibody (P/mL)
Hepatitis B core antibody | Hepatitis B core antibody (PEI/mL)
Leukocyte count | Leukocyte count (×10⁹/L)
RBC count | RBC count (×10¹²/L)
Hemoglobin | Hemoglobin (g/L)
Hematocrit | Hematocrit
Mean corpuscular volume | Mean corpuscular volume (fl)
Mean corpuscular hemoglobin | Mean corpuscular hemoglobin (pg)
Mean corpuscular hemoglobin concentration | Mean corpuscular hemoglobin concentration (g/L)
Red blood cell volume distribution width | Red blood cell volume distribution width
Platelet count | Platelet count (×10⁹/L)
Mean platelet volume | Mean platelet volume (fl)
Platelet volume distribution width | Platelet volume distribution width (%)
Platelet specific volume | Platelet specific volume (%)
Neutrophils | Neutrophils (%)
Lymphocyte | Lymphocyte (%)
Monocyte | Monocyte (%)
Eosinophils | Eosinophils (%)
Basophilic cell | Basophilic cell (%)
Blood glucose | Blood glucose level (mg/dl)

In this paper, the dataset was divided, with 6642 records as the training set and the remaining 1000 records as the test set.

Firstly, the missing values in the original dataset were analyzed to obtain the proportion of missing data. It can be seen from Figure 6 that the missing proportion of five basic features (hepatitis B surface antigen, hepatitis B surface antibody, hepatitis B core antibody, hepatitis B e antigen, and hepatitis B e antibody) is over 70%, significantly exceeding the missing proportion of the other basic features. Therefore, these features with large missing proportions were deleted, and the remaining missing values were filled with medians.
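A minimal sketch of this missing-value step, assuming the raw examination records have been loaded into a pandas DataFrame named df (the 0.7 threshold mirrors the roughly 70% missing proportion of the hepatitis B features reported above):

```python
# Sketch of the missing-value handling: inspect missing proportions, drop
# features missing in more than 70% of records, fill the rest with medians.
import pandas as pd

def handle_missing(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    # Proportion of missing values per feature (cf. Figure 6).
    missing_ratio = df.isnull().mean().sort_values(ascending=False)
    print(missing_ratio.head(10))

    # Drop features whose missing proportion exceeds the threshold
    # (the five hepatitis B features in this dataset).
    to_drop = missing_ratio[missing_ratio > threshold].index
    df = df.drop(columns=to_drop)

    # Fill the remaining missing values with the median of each numeric column.
    return df.fillna(df.median(numeric_only=True))
```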

Figure 6. Feature missing scale.

Secondly, the weight of the influence of the data features was analyzed. The weight of the eigenvalues can be obtained through related functions. According to the weight of each eigenvalue, the eigenvalues that have no effect on the results can be found, and removing these invalid eigenvalues can further improve the accuracy of the experiment. It can be seen from Figure 7 that the date of the medical examination and gender have no practical influence on the prediction results of the model. It is also known through common sense that the ID has no impact on the experimental results, so these features were deleted.
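The paper does not name the exact function used to obtain these weights, so the following is only one plausible realization: fit a LightGBM regressor on the (already numeric) training data and read its feature importances, which can then be plotted as in Figure 7:

```python
# One possible way to obtain the influence weight of each eigenvalue
# (an assumption; the paper only says the weights come from "related functions").
import pandas as pd
from lightgbm import LGBMRegressor

def feature_weights(X_train: pd.DataFrame, y_train) -> pd.Series:
    # Assumes categorical columns (e.g., gender, examination date) have been
    # encoded or dropped beforehand.
    model = LGBMRegressor(n_estimators=100).fit(X_train, y_train)
    weights = pd.Series(model.feature_importances_, index=X_train.columns)
    return weights.sort_values(ascending=False)

# Features with (near-)zero weight, such as ID, gender, and examination date,
# can then be dropped before training the final model.
```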


Figure 7. Eigenvalue influence weight graph.

Again, the correlation coefficient of each eigenvalue was analyzed; the darker the color, the stronger the correlation. From Figure 8, we can see the correlation between each pair of eigenvalues.
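A small sketch of the correlation analysis behind Figure 8 (the use of seaborn for the heat map is an assumption; any heat-map plotting function would serve):

```python
# Compute the pairwise correlation coefficients of the numeric eigenvalues and
# draw them as a heat map, as in Figure 8.
import matplotlib.pyplot as plt
import seaborn as sns

def plot_correlation(df):
    corr = df.select_dtypes("number").corr()      # correlation coefficient matrix
    plt.figure(figsize=(12, 10))
    sns.heatmap(corr, cmap="Blues", square=True)  # darker cells = stronger correlation
    plt.title("Eigenvalue correlation coefficient matrix")
    plt.tight_layout()
    plt.show()
```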


Figure 8. Matrix heat map of the eigenvalue correlation coefficients.


Finally, the basic features and the distribution of the blood glucose levels were examined, which provides a more intuitive reference for feature engineering. The blood glucose values were used as the Y-axis coordinates and the other basic features as the X-axis coordinates, and each basic feature was plotted against the distribution of the blood glucose value for analysis, as shown in Figure 9, where aspartate aminotransferase and the distribution of blood glucose are displayed. It can be seen from the figure that the distribution of the blood glucose value against aspartate aminotransferase includes outliers, so these outliers were deleted. The outliers of the remaining features were also deleted.
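The paper does not state the exact rule used to delete outliers, so the following quantile-based filter is only one plausible realization of this step:

```python
# One possible outlier rule (an assumption): drop records whose feature value
# falls outside a wide quantile interval, as suggested by scatter plots such as
# Figure 9.
def drop_outliers(df, column, lower=0.001, upper=0.999):
    lo, hi = df[column].quantile(lower), df[column].quantile(upper)
    return df[(df[column] >= lo) & (df[column] <= hi)]

# Example usage: remove extreme aspartate aminotransferase records, then repeat
# for the remaining numeric features.
# train = drop_outliers(train, "Aspartate aminotransferase")
```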

Figure 9. Scatter diagram of aspartate aminotransferase and blood glucose value.

2.5. Parameter Optimization Based on Bayesian Hyper-Parameter Optimization Algorithm

The optimal parameters of the LightGBM model were found by the Bayesian hyper-parameter optimization algorithm. Firstly, Hyperopt's own functions were used to define the parameter space; then the model and score acquirer were created; and finally, the MSE was used as the evaluation indicator to obtain the optimal parameters of the LightGBM model. The optimal parameters of the LightGBM model obtained through the Bayesian hyper-parameter optimization algorithm are shown in Table 2.

Table 2. The optimal parameters of the LightGBM model.

Parameter Name | Default Value | Optimal Parameters | Parameter Implication
learning_rate | 0.1 | 0.052 | Learning rate
n_estimators | 10 | 376 | Number of basic learners
min_data_in_leaf | 20 | 18 | The smallest number of records a leaf may have
bagging_fraction | 1 | 0.9 | Proportion of data used in each iteration
feature_fraction | 1 | 0.5 | Proportion of randomly selected features in each iteration

2.6. Blood Glucose Prediction by HY_LightGBM Model

Through the data preprocessing of Section 2.4 and the Bayesian hyper-parameter optimization of Section 2.5, the optimal parameters of the LightGBM model were determined and input into the LightGBM model for training and prediction. The blood glucose values it output and the blood glucose values in the test set were evaluated by three evaluation indicators, namely the mean square error (MSE), the root mean square error (RMSE), and the determination coefficient R² (R-Square).
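A minimal sketch of this training and prediction step, assuming the preprocessed training and test sets (X_train, y_train, X_test) from Section 2.4 already exist as pandas/numpy objects, and using the optimal parameters of Table 2:

```python
# Train the final HY_LightGBM model with the Table 2 parameters and predict the
# blood glucose values of the test set.
from lightgbm import LGBMRegressor

best_params = {
    "learning_rate": 0.052,
    "n_estimators": 376,
    "min_data_in_leaf": 18,
    "bagging_fraction": 0.9,
    "feature_fraction": 0.5,
}

hy_lightgbm = LGBMRegressor(**best_params)
hy_lightgbm.fit(X_train, y_train)      # 6642 preprocessed training records
y_pred = hy_lightgbm.predict(X_test)   # 1000 test records
```

The predicted values y_pred can then be scored against the test labels with the indicators of Section 2.7.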

2.7. Evaluation Indicators

The performance of the blood glucose prediction model was evaluated by the following three indicators: mean square error (MSE), root mean square error (RMSE), and determination coefficient R² (R-Square). These are commonly used in regression tasks, and the smaller the mean square error (MSE) and root mean square error (RMSE) values, the more accurate the prediction results.

$$MSE = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \quad (13)$$

$$RMSE = \sqrt{ \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 } \quad (14)$$

$$R_{squared} = 1 - \frac{ \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }{ \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2 } \quad (15)$$
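For reference, the three indicators can be computed directly from Equations (13)–(15) as printed; note that the paper's MSE and RMSE include a factor of 1/(2n) rather than the conventional 1/n. This sketch is an illustration, not the authors' original code:

```python
# Direct implementations of Equations (13)-(15) as printed above.
import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # Note the 1/(2n) factor used by the paper's Equation (13).
    return np.sum((y_true - y_pred) ** 2) / (2 * len(y_true))

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

def r_squared(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot
```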

3. Experimental Results and Discussion

3.1. Experimental Results

After the parameter optimization of the LightGBM model by the Bayesian hyper-parameter optimization algorithm, the parameters of the LightGBM model were set as follows: the learning_rate was 0.052, the number of basic learners n_estimators was 376, the minimum number of records a leaf may have min_data_in_leaf was 18, the proportion of data used in each iteration bagging_fraction was 0.9, and the proportion of randomly selected features in each iteration feature_fraction was 0.5. This set of optimal parameters was input into the LightGBM model to predict the blood glucose values. The experimental results are shown in Table 3, and the scatter diagram of the actual blood glucose values and the predicted blood glucose values is shown in Figure 10.

Table 3. Prediction results of HY_LightGBM model.

Model Name | MSE | RMSE | R-Square | Training Time
HY_LightGBM | 0.5961 | 0.7721 | 0.2236 | 26.7938 s

Figure 10. Scatter diagram of the prediction results of the HY_LightGBM model.

It can be found from Table 3 that the HY_LightGBM model had a mean square error of 0.5961, a root mean square error of 0.7721, a determination coefficient of 0.2236, and a training time of 26.7938 s in the prediction of the blood glucose values.

3.2. Comparative Experiments

Two groups of comparative experiments were set up in this study. The first group was as follows: comparing the HY_LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm with the LightGBM model, the XGBoost model, and the CatBoost model without parameter optimization. The second group was as follows: a comparison among the LightGBM model with parameter optimization by the genetic algorithm, the LightGBM model with parameter optimization by the random searching algorithm, and the LightGBM model with parameter optimization by the Bayesian hyper-parameter optimization algorithm used in this study. The comparative experiments verified the improvement in prediction accuracy of the HY_LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm and the feasibility of the HY_LightGBM model.

3.2.1. Comparison between the HY_LightGBM Model and LightGBM Model

This experiment verified the effectiveness of the Bayesian hyper-parameter optimization algorithm for improving the LightGBM model by comparing the LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm with the LightGBM model without parameter optimization. The optimal parameters found by the Bayesian hyper-parameter optimization algorithm were input into the LightGBM model, and the prediction result of the improved HY_LightGBM model was compared with that of the LightGBM model. The specific experimental results are shown in Table 4 and Figure 11.

Table 4. Prediction performance comparison between the HY_LightGBM model and LightGBM model.

Model Name | MSE | RMSE | R-Square | Training Time
LightGBM | 0.6159 | 0.7848 | 0.1978 | 3.5039 s
HY_LightGBM | 0.5961 | 0.7721 | 0.2236 | 26.7938 s


Figure 11. Performance comparison between HY_LightGBM and LightGBM.

The experiment demonstrated that although the LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm takes a little more time than the LightGBM model without parameter optimization, it significantly outperforms the LightGBM model without any parameter optimization in terms of prediction accuracy.

3.2.2. Comparison between HY_LightGBM Model and Other Classification Models

In order to further verify the prediction performance of the HY_LightGBM model, two other models of the same type were selected for comparison, namely the XGBoost model and the CatBoost model. The experiment confirmed that the HY_LightGBM model has a significantly higher prediction accuracy than the other two models of the same type. The experimental results are shown in Table 5 and Figure 12.

Table 5. Performance comparison of the other models.

Model Name | MSE | RMSE | R-Square | Training Time
HY_LightGBM | 0.5961 | 0.7721 | 0.2236 | 26.7938 s
XGBoost | 0.6284 | 0.7927 | 0.1815 | 6.2837 s
CatBoost | 0.6483 | 0.8051 | 0.1556 | 73.3301 s

Figure 12. Performance comparison chart of HY_LightGBM, XGBoost, and CatBoost.


The experiment proved that the LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm used in this paper is significantly superior to the XGBoost model and CatBoost model in prediction accuracy.
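As a sketch of how this baseline comparison can be reproduced (the default-parameter settings, package versions, and the reuse of the metric helpers from Section 2.7 are assumptions; the paper does not list its exact baseline configuration):

```python
# Train default-parameter XGBoost and CatBoost regressors on the same data and
# score them with the same indicators as HY_LightGBM (cf. Table 5).
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

# X_train, y_train, X_test, y_test and the mse/rmse/r_squared helpers from
# Section 2.7 are assumed to already exist.
for name, model in [("XGBoost", XGBRegressor()),
                    ("CatBoost", CatBoostRegressor(verbose=0))]:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name, mse(y_test, y_pred), rmse(y_test, y_pred), r_squared(y_test, y_pred))
```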

3.2.3. Comparison of Parameter Tuning among the Bayesian Hyper-Parameter Optimization Algorithm, Genetic Algorithm, and Random Searching Algorithm

This experiment verified the feasibility of the Bayesian hyper-parameter optimization algorithm for parameter optimization by comparing the performance of the genetic algorithm, the random searching algorithm, and the Bayesian hyper-parameter optimization algorithm in parameter optimization.

The parameters obtained by the genetic algorithm were as follows: learning_rate 0.05, the number of basic learners n_estimators 400, the minimum number of records a leaf may have min_data_in_leaf 60, the proportion of data used in each iteration bagging_fraction 0.9, and the proportion of randomly selected features in each iteration feature_fraction 0.5.


3.2.3. Comparison of Parameter Tuning among Bayesian Hyper-Parameter Optimization Algorithm, Genetic Algorithm, and Random Searching Algorithm This experiment verified the feasibility of the Bayesian hyper-parameter optimization algorithm for parameter optimization by comparing the performance of genetic algorithm, random searching algorithm, and the Bayesian hyper-parameter optimization algorithm in the parameter optimization. The parameters obtained by genetic algorithm were as follows: learning_rate was 0.05, the number of basic learners n_estimators was 400, the minimum row tree that the leaf may have min_data_in_leaf 60, the data ratio used in each iteration bagging_fraction 0.9, and the ratio of randomly selected features in each iteration feature_fraction 0.5. The parameters obtained by the random searching algorithm were as follows: learning_rate Appl. Sci. 2020, 10, 3227 14 of 16 0.05, the number of basic learners n_estimators 370, the minimum row tree that the leaf may have min_data_in_leaf 36, the data ratio used in each iteration bagging_fraction 0.9, and the ratio of randomlyThe selected parameters features obtained in each by iteration the random featu searchingre_fraction algorithm 0.98, as were shown as follows: in Table learning_rate 6. 0.05, the number of basic learners n_estimators 370, the minimum row tree that the leaf may have min_data_in_leafTable 6. 36, Parameters the data ratio obtained used in from each parameter iteration bagging_fraction optimization of 0.9,genetic and algorithm the ratio of. randomly selected features in each iteration feature_fraction 0.98, as shown in Table6. Parameter Name GA_LightGBM RS_LightGBM Table 6. Parameterslearning_rate obtained from parameter0.05 optimization0.05 of genetic algorithm. n_estimators 400 370 Parameter Name GA_LightGBM RS_LightGBM min_data_in_leaf 60 36 bagging_fractionlearning_rate 0.050.9 0.050.9 n_estimators 400 370 feature_fraction 0.5 0.98 min_data_in_leaf 60 36 bagging_fraction 0.9 0.9 The prediction resultsfeature_fraction obtained by the LightGBM 0.5 model optimized 0.98 by the genetic algorithm and random searching algorithm, that is, the GA_LightGBM and RS_LightGBM model, were comparedThe with prediction those results of the obtained LightGBM by the LightGBMmodel optimized model optimized by the by theBayesian genetic algorithmhyper-parameter and optimizationrandom searching algorithm algorithm,, and thatthe is,comparison the GA_LightGBM results and are RS_LightGBM shown in model,Table were7 and compared Figure 13 respectivelywith those. of the LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm, and the comparison results are shown in Table7 and Figure 13 respectively. Table 7. Comparison of GA_LightGBM, RS_LightGBM, and HY_LightGBM. Table 7. Comparison of GA_LightGBM, RS_LightGBM, and HY_LightGBM. Model Name MSE RMSE R-Square Model Name MSE RMSE R-Square GA_LightGBM 0.6116 0.7821 0.2033 GA_LightGBM 0.6116 0.7821 0.2033 RS_LightGBMRS_LightGBM 0.6094 0.6094 0.78060.7806 0.2063 0.2063 HY_LightGBMHY_LightGBM 0.5961 0.5961 0.77210.7721 0.2236 0.2236

Figure 13. Comparison chart of the predicted performance of HY_LightGBM, GA_LightGBM, and RS_LightGBM.

The experiment demonstrated that the LightGBM model optimized by the Bayesian hyper-parameter optimization algorithm achieves significantly better prediction results than the LightGBM models optimized by the genetic algorithm and the random searching algorithm.

4. Conclusions

This paper proposes an improved LightGBM prediction model based on the Bayesian hyper-parameter optimization algorithm, namely the HY_LightGBM model, in which Bayesian hyper-parameter optimization is employed to find the optimal parameter combination for the model, thereby improving the prediction accuracy of the LightGBM model. The experiments proved that the method proposed in this paper achieves a higher prediction accuracy than the XGBoost model and CatBoost model, and a higher efficiency than the genetic algorithm and the random searching algorithm. Previously, obtaining blood glucose values required measuring each person's blood glucose one by one, which consumes a lot of manpower and material resources. After training, the model no longer needs the blood glucose value itself: the HY_LightGBM model can predict the blood glucose value of a person undergoing a medical examination directly from their other examination indicators. The HY_LightGBM model, with a strong generalization ability, can also be applied to other types of auxiliary diagnosis and treatment, but its overall performance can be further improved. Future work will focus on further optimization of the model using the idea of model fusion to improve its prediction accuracy.

Author Contributions: Conceptualization, T.W.; methodology, T.W.; software, T.W.; validation, T.W.; formal analysis, T.W.; investigation, T.W.; resources, T.W.; writing (original draft preparation), T.W.; writing (review and editing), T.W. and Y.W.; visualization, T.W.; supervision, T.W.; project administration, T.W.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding: This research was supported by the Key R&D Plan of Gansu Province (18YF1GA060).

Conflicts of Interest: The authors declare no conflict of interest.

References

1. American Diabetes Association. Standards of Medical Care in Diabetes—2019. Diabetes Care 2019, 42, S1–S2. [CrossRef] [PubMed]
2. Kerner, W.; Brückel, J. Definition, Classification and Diagnosis of Diabetes Mellitus. Exp. Clin. Endocrinol. Diabetes 2014, 122, 384–386. [CrossRef] [PubMed]
3. Cho, N.; Shaw, J.; Karuranga, S.; Huang, Y.; Fernandes, J.D.R.; Ohlrogge, A.; Malanda, B. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res. Clin. Pract. 2018, 138, 271–281. [CrossRef] [PubMed]
4. WHO.int. Diabetes. Available online: https://www.who.int/news-room/fact-sheets/detail/diabetes (accessed on 26 March 2020).
5. Zhang, P.; Zhang, X.; Brown, J.; Vistisen, D.; Sicree, R.; Shaw, J.; Nichols, G. Global healthcare expenditure on diabetes for 2010 and 2030. Diabetes Res. Clin. Pract. 2010, 87, 293–301. [CrossRef]
6. Tripathi, B.; Srivastava, A. Diabetes mellitus complications and therapeutics. Med. Sci. Monit. 2006, 12, RA130–RA147.
7. He, J. Blood Glucose Concentration Prediction Based on Canonical Correlation Analysis. In Proceedings of the 38th China Control Conference, Guangzhou, China, 27–30 July 2019; pp. 1354–1359.
8. Yu, D.; Liu, Z.; Su, C.; Han, Y.; Duan, X.; Zhang, R.; Liu, X.; Yang, Y.; Xu, S. Copy number variation in plasma as a tool for lung cancer prediction using Extreme (XGBoost) classifier. Thorac. Cancer 2020, 11, 95–102. [CrossRef]
9. Cenggoro, T.; Mahesworo, B.; Budiarto, A.; Baurley, J.; Suparyanto, T.; Pardamean, B. Features Importance in Classification Models for Colorectal Cancer Cases Phenotype in Indonesia. Procedia Comput. Sci. 2019, 157, 313–320. [CrossRef]
10. Chang, W.; Liu, Y.; Xiao, Y.; Yuan, X.; Xu, X.; Zhang, S.; Zhou, S. A Machine-Learning-Based Prediction Method for Hypertension Outcomes Based on Medical Data. Diagnostics 2019, 9, 178. [CrossRef]
11. Azeez, O.; Wang, Q. XGBoost Model for Chronic Kidney Disease Diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019. [CrossRef]
12. Chang, W.; Liu, Y.; Xiao, Y.; Xu, X.; Zhou, S.; Lu, X.; Cheng, Y. Probability Analysis of Hypertension-Related Symptoms Based on XGBoost and Clustering Algorithm. Appl. Sci. 2019, 9, 1215. [CrossRef]
13. Wang, B.; Feng, H.; Wang, F. Application of cat boost model based on machine learning in prediction of severe HFMD. Chin. J. Infect. Control 2019, 18, 18–22.
14. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 785–794.

15. Rani, S.; Suri, B.; Goyal, R. On the Effectiveness of Using Elitist Genetic Algorithm in Mutation Testing. Symmetry 2019, 11, 1145. [CrossRef]
16. Fernández, J.; López-Campos, J.; Segade, A.; Vilán, J. A genetic algorithm for the characterization of hyperelastic materials. Appl. Math. Comput. 2018, 329, 239–250. [CrossRef]
17. Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
18. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; pp. 3146–3154.
19. Gao, X.; Luo, H.; Wang, Q.; Zhao, F.; Ye, L.; Zhang, Y. A Human Activity Recognition Algorithm Based on Stacking Denoising Autoencoder and LightGBM. Sensors 2019, 19, 947. [CrossRef]
20. Choi, S.; Hur, J. An Ensemble Learner-Based Bagging Model Using Past Output Data for Photovoltaic Forecasting. Energies 2020, 13, 1438. [CrossRef]
21. Cordeiro, J.R.; Postolache, O.; Ferreira, J.C. Child's Target Height Prediction Evolution. Appl. Sci. 2019, 9, 5447. [CrossRef]
22. Ma, X.; Sa, J.; Wang, D.; Yu, Y.; Yang, Q.; Niu, X. Study on a prediction of P2P network loan default based on the machine learning and XGBoost algorithms according to different high dimensional data cleaning. Electron. Commer. Res. Appl. 2018, 31, 24–39. [CrossRef]
23. Cheng, C.; Qing, M.; Zhang, Q.; Ma, B. LightGBM-PPI: Predicting protein-protein interactions through LightGBM with multi-information fusion. Chemom. Intell. Lab. Syst. 2019, 191, 54–64. [CrossRef]
24. Dev, V.A.; Eden, M.R. Formation lithology classification using scalable gradient boosted decision trees. Comput. Chem. Eng. 2019, 128, 392–404. [CrossRef]
25. Letham, B.; Karrer, B.; Ottoni, G.; Bakshy, E. Constrained Bayesian optimization with noisy experiments. Bayesian Anal. 2019, 14, 495–519. [CrossRef]
26. Zhou, T.; Lu, H.; Wang, W.; Yong, X. GA-SVM based feature selection and parameter optimization in hospitalization expense modeling. Appl. Soft Comput. J. 2018, 75, 323–332.
27. Ma, C.; Yang, S.; Zhang, H.; Xiang, M.; Huang, Q.; Wei, Y. Prediction models of human plasma protein binding rate and oral bioavailability derived by using GA–CG–SVM method. J. Pharm. Biomed. Anal. 2008, 47, 677–682. [CrossRef] [PubMed]
28. Raman, M.; Somu, N.; Kirthivasan, K.; Liscano, R.; Sriram, V. An efficient intrusion detection system based on hypergraph—Genetic algorithm for parameter optimization and feature selection in support vector machine. Knowl. Based Syst. 2017, 134, 1–12. [CrossRef]
29. Su, B.; Wang, Y. Genetic algorithm based feature selection and parameter optimization for support vector regression applied to semantic textual similarity. J. Shanghai Jiaotong Univ. (Sci.) 2015, 20, 143–148. [CrossRef]
30. Putatunda, S.; Rama, K. A Comparative Analysis of Hyperopt as Against Other Approaches for Hyper-Parameter Optimization of XGBoost. In Proceedings of the 2018 International Conference on Signal Processing and Machine Learning (SPML '18); Association for Computing Machinery: New York, NY, USA, 2018; pp. 6–10. [CrossRef]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).