Medical & Biological Engineering & Computing https://doi.org/10.1007/s11517-019-01978-z

ORIGINAL ARTICLE

Ensemble learning algorithm based on multi-parameters for sleep staging

Qiangqiang Wang 1 · Dechun Zhao 1 · Yi Wang 1 · Xiaorong Hou 2

1 Chongqing University of Posts and Telecommunications, Chongqing, China
2 Chongqing Medical University, Chongqing, China

* Correspondence: Dechun Zhao, [email protected]

Received: 21 May 2018 / Accepted: 4 April 2019
© International Federation for Medical and Biological Engineering 2019

Abstract  The aim of this study is to propose a high-accuracy and high-efficiency sleep staging algorithm using single-channel electroencephalograms (EEGs). The process consists of four parts: signal preprocessing, feature extraction, feature selection, and classification algorithms. In the preconditioning of the EEG, a wavelet function and an IIR filter are used for noise reduction. In feature extraction, 15 feature algorithms in the time domain, the time-frequency domain, and nonlinear analysis are applied to obtain 30 feature parameters. Feature selection is very important for eliminating irrelevant and redundant features, so feature selection algorithms such as Fisher score, Sequential Forward Selection (SFS), Sequential Floating Forward Selection (SFFS), and Fast Correlation-Based Filter Solution (FCBF) were used. The paper establishes a new ensemble learning algorithm based on a stacking model. The base layer consists of k-Nearest Neighbor (KNN), Random Forest (RF), Extremely Randomized Trees (ERT), Multi-layer Perceptron (MLP), and Extreme Gradient Boosting (XGBoost), and the second layer is a logistic regression. Compared with the RF, Gradient Boosting Decision Tree (GBDT), and XGBoost classifiers, the proposed method reaches an accuracy of 96.67% and a kappa coefficient of 0.96, higher than the other classification algorithms. The results show that the proposed method can accurately perform sleep staging using a single-channel EEG and has a high ability to predict sleep stages.

Keywords  EEG signal · Sleep stage · Feature selection · Ensemble learning algorithm · Stacking

1 Introduction

Sleep is an important physiological phenomenon and a necessary physiological process. Sleep analysis can be used to assess sleep quality and to detect certain sleep-related diseases such as neurasthenia and cardiovascular disease, so it has important clinical significance and broad application prospects. Sleep staging is an important prerequisite for understanding sleep status. In 1968, Rechtschaffen and Kales proposed the R&K staging rules [26] based on factors such as the EEG, the electrooculogram (EOG), eye movements, the electromyogram (EMG), and muscle tone during sleep. According to these rules, sleep is divided into the awake stage, non-rapid eye movement sleep (NREM), and rapid eye movement sleep (REM); NREM can be further divided into N-REM1, N-REM2, N-REM3, and N-REM4.

Sleep stage classification is classically performed by identifying characteristics extracted from cerebral rhythms. The awake stage contains alpha wave activity (more than 50%) and low-amplitude mixed-frequency activity. N-REM1 is the transitional phase of the brain from the awake phase to the sleep phase; it is manifested by the transition from the alpha wave (8–13 Hz) to the theta wave (4–7 Hz). In N-REM2, the low-amplitude EEG is mainly composed of sleep spindles, K-complex waves, and delta (0.5–2 Hz) waves (less than 20%). The delta (0.5–2 Hz) wave is the dominant wave in N-REM3. In N-REM4, the proportion of delta (0.5–2 Hz) waves is higher than 50% and saw-tooth waves appear. In REM, the brainwave waveform is similar to that of NREM1/2, but there are periods of rapid eye movements; this is also the main period of dreaming.

Figure 1 shows the brain activity during different periods of sleep. NREM and REM sleep occur in alternating cycles, each lasting approximately 90–110 min in adults, with approximately 4–6 cycles during the course of a normal 6–8-h sleep period.

Fig. 1 EEG signals at different sleep stages

In clinical practice, sleep stages are widely classified by manual judgment. Manual staging is based upon visual inspection of the EEG as well as the EOG and EMG traces. It has a high sleep recognition rate, but it must be completed by expert visual analysis; it is subjective and inefficient and can easily lead to misjudgment. Automated sleep analysis has been around for almost 30 years [10]; it uses modern technology to achieve effective and accurate staging. Arthur et al. [15] extracted complexity and correlation coefficients from C3 and C4 EEG data and used a hidden Markov model (HMM); the accuracy of the algorithm was 80%. Luay et al. [18] extracted wavelet coefficients from Pz-Oz EEG signals and used a regression tree classification algorithm to classify sleep into six periods; in this regard, they obtained a 75% accuracy rate. Using the C4-A1 EEG signal and a decision tree classification algorithm (DT), Salih et al. [19] extracted Welch coefficients and divided EEG waves into six periods; the authors finally obtained an 82.15% accuracy rate. In another study, Thiago et al. [11] used Pz-Oz EEG signals to extract variance, kurtosis, and skewness, and then obtained a 90.5% accuracy rate by using a random forest classification algorithm (RF) to classify sleep into six periods. Farideh et al. [13], who extracted wavelet packet coefficients from the Pz-Oz EEG signal, used an artificial neural network classification algorithm (ANN) to reach a 93% accuracy rate over six periods. Using the Pz-Oz EEG signal and the Difference Visibility Graph (DVG) as the feature algorithm, Zhu et al. [56] used a Support Vector Machine classification algorithm (SVM) to obtain an 87.5% accuracy rate. To obtain a higher accuracy rate, Kaveh et al. [42] used Fpz-Cz EEG signals with RDSTFT as the feature algorithm and then used a random forest to perform four-stage sleep classification; a 92.5% accuracy rate was obtained.

The most important aspect of a sleep staging algorithm is the accuracy of identification [41]. Features are important factors that affect recognition, and selecting the feature parameters that yield high accuracy is the key to the algorithm. Most papers do not specify this point in detail and base their staging algorithms on their own particular feature parameters. In this paper, 30 feature parameters derived from 15 feature algorithm groups, which constitute a highly reliable feature set, are synthesized. Such an integrated feature set inevitably brings redundancy and time consumption, so feature selection algorithms are then used to obtain the optimal feature set. For sleep staging, most of the early classification algorithms were SVM and NN [1]; currently, random forests are widely used in this field with good results [18]. Ensemble algorithms have many advantages, and the random forest is a prominent representative. In this paper, an ensemble learning algorithm based on stacking is proposed to integrate the random forest with other algorithms. Compared with several ensemble algorithms such as RF, GBDT, and XGBoost, the sleep staging results are greatly improved.

The paper is organized in the following manner: The first part introduces the current research status and the main research results in automatic sleep staging. The second part introduces the data and the detailed algorithm, including data preprocessing, feature extraction, feature selection, and classification algorithms. The third part provides the results of the feature selection algorithms and the ensemble classification algorithms through simulations in MATLAB and Python. The fourth part compares and discusses the accuracy. The fifth part summarizes the results of this paper and looks forward to future research directions in automatic sleep staging.

2 Methods

2.1 Data description and preprocessing

Fig. 2 The raw signal and the noise-reduced signal for an awake-period epoch in Pz-Oz (3000 samples). The raw EEG signal is shown in the first plot, followed by the noise-reduced signal

The data set used in the study was provided by the Sleep-EDF database [23, 28]. It was obtained from Caucasian males and females (21–35 years old) without any medication. The recordings contain horizontal EOG, Fpz-Cz, and Pz-Oz EEG, each sampled at 100 Hz. Hypnograms were manually scored according to Rechtschaffen & Kales on the basis of the Fpz-Cz/Pz-Oz EEG instead of the C4-A1/C3-A2 EEG [50]. In this study, the Pz-Oz channel EEG signal was selected for analysis and identification of the sleep stages, because it provides better automatic classification accuracy than the Fpz-Cz channel [56]. Each segment (or epoch) in this study is defined as 30 s and contains 3000 data points. The data composition is shown in Table 1.

The study uses an adaptive threshold discrete wavelet function and an IIR filter to reduce the noise of the EEG signals, which can effectively improve the signal-to-noise ratio. The wavelet base is "db4" and the wavelet decomposition has four layers; the threshold and threshold function are the adaptive threshold and the soft threshold function, respectively. The IIR filter is a 20th-order Butterworth filter with a pass band of 1–60 Hz. The performance of the denoising is measured by the SNR, $\mathrm{SNR} = 10\log_{10}(P_s/P_n)$, where $P_s$ and $P_n$ represent the effective power of the signal and of the noise. The study obtains a maximum SNR of 14 dB when using the "db4" wavelet basis and four-level decomposition. Figure 2 shows the comparison between the original signal and the noise-reduced signal.

Table 1 Data sets from the Sleep-EDF data

Sleep stage | Awake | N1 | N2 | N3 | N4 | REM | Total
Number (epochs) | 3853 | 826 | 1185 | 991 | 831 | 537 | 8233
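To make the preprocessing concrete, below is a minimal Python sketch, assuming PyWavelets and SciPy stand in for the paper's MATLAB toolchain. The universal soft threshold is an assumption, since the paper's exact adaptive-threshold rule is not given, and the upper band edge is clipped to 45 Hz because the stated 60 Hz exceeds the Nyquist limit at 100 Hz sampling.

```python
# Hedged sketch of the preprocessing stage: 4-level db4 soft-threshold
# denoising, Butterworth band-pass, and SNR = 10*log10(Ps/Pn).
import numpy as np
import pywt
from scipy.signal import butter, sosfiltfilt

def denoise_epoch(x, fs=100.0):
    # 4-level 'db4' decomposition with soft thresholding of the details
    coeffs = pywt.wavedec(x, "db4", level=4)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise scale estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))           # universal threshold (assumption)
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    x_dn = pywt.waverec(coeffs, "db4")[: len(x)]
    # Paper reports a 20th-order Butterworth, 1-60 Hz; 45 Hz used here
    # because 60 Hz is above the 50 Hz Nyquist limit at fs = 100 Hz.
    sos = butter(20, [1.0, 45.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x_dn)

def snr_db(x, x_dn):
    # SNR = 10 log10(Ps/Pn), taking the removed component as noise
    p_signal = np.mean(x_dn ** 2)
    p_noise = np.mean((x - x_dn) ** 2)
    return 10.0 * np.log10(p_signal / p_noise)
```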
2.2 Feature extraction

Three types of characteristic signals, in the time domain, the time-frequency domain, and nonlinear analysis, are used in this paper as features for sleep staging. Hjorth activity, Hjorth complexity, Hjorth mobility, Energy (δ/α), LZ complexity, and the energy of the four rhythmic waves (α, β, θ, and δ) are taken from [29]. Fractal dimension, largest Lyapunov exponent, and Hurst exponent are taken from [37]. Kurtosis and skewness are taken from [45]. Tsallis entropy and permutation entropy are taken from [43]. Fuzzy entropy is taken from [8], and sample entropy from [22]. Based on the important influence of the four rhythmic waves on sleep staging, this paper additionally selects the standard deviation of the four rhythm waves (α, β, θ, and δ) and the maximum value of the four rhythm waves (α, β, θ, and δ); all the rhythmic waves here are obtained by wavelet decomposition. In total, this paper obtains 30 features through 15 feature extraction algorithms. The feature algorithms are shown in Table 2.

2.2.1 Time domain features

For the specificity of EEG signals, Bo Hjorth [25] introduced the Hjorth parameters, which comprise Activity, Mobility, and Complexity. Let y(t) be the signal:

$\mathrm{Activity} = \mathrm{var}(y(t))$ (1)

$\mathrm{Mobility} = \sqrt{\mathrm{var}\!\left(\mathrm{d}y(t)/\mathrm{d}t\right) / \mathrm{var}(y(t))}$ (2)

$\mathrm{Complexity} = \mathrm{Mobility}\!\left(\mathrm{d}y(t)/\mathrm{d}t\right) / \mathrm{Mobility}(y(t))$ (3)

Skewness [21, 32] describes the symmetry of the distribution of the data. Kurtosis [21, 32] describes whether the peak of the distribution of the data is abrupt or flat. Variance [31] measures the degree of deviation between a random variable and its mathematical expectation (the mean). The rhythm waves are obtained from the wavelet transform described in the time-frequency domain section, so the skewness, kurtosis, variance, and maximum formulas for a rhythm wave are as follows:

$\mathrm{Skewness} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\mu\right)^3/\mathrm{SD}^3$ (4)

$\mathrm{Kurtosis} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\mu\right)^4/\mathrm{SD}^4$ (5)

$\mathrm{Variance} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\mu\right)^2$ (6)

$\mathrm{MaxV} = \max\left[x_i\right]$ (7)

where n is the number of samples, $x_i$ (i = 1, 2, 3, ...) is the sample sequence, μ is the mean of the sample, and SD is the standard deviation.
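As an illustration of Eqs. (1)–(7), the following sketch computes the time-domain features with NumPy/SciPy; the normalization details (1/n vs. 1/(n−1)) are a minor assumption and negligible for 3000-point epochs.

```python
# Sketch of the Sect. 2.2.1 features: Hjorth activity, mobility, and
# complexity (Eqs. 1-3) plus skewness, kurtosis, variance, and maximum.
import numpy as np
from scipy.stats import kurtosis, skew

def hjorth_parameters(y):
    dy = np.diff(y)                       # discrete first derivative
    ddy = np.diff(dy)                     # discrete second derivative
    activity = np.var(y)                                        # Eq. (1)
    mobility = np.sqrt(np.var(dy) / np.var(y))                  # Eq. (2)
    complexity = np.sqrt(np.var(ddy) / np.var(dy)) / mobility   # Eq. (3)
    return activity, mobility, complexity

def moment_features(x):
    # SciPy normalises by 1/n rather than the paper's 1/(n-1); the
    # difference is negligible at this epoch length.
    return skew(x), kurtosis(x, fisher=False), np.var(x, ddof=1), np.max(x)
```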

Table 2 Feature distribution table

Feature category | Features
Time domain | Hjorth activity (1), Hjorth complexity (2), Hjorth mobility (3), Kurtosis (4), Skewness (5), Standard deviation (α, β, θ, δ) (6–9), Max value (α, β, θ, δ) (10–13)
Time-frequency domain | Energy (α, β, θ, δ) (14–17), Energy (δ/α) (18)
Non-linear | Fuzzy entropy (19), Sample entropy (20), Fractal dimension (21), Sample entropy (α, β, θ, δ) (22–25), Tsallis entropy (26), LZ complexity (27), Hurst exponent (28), Largest Lyapunov exponent (29), Permutation entropy (30)

2.2.2 Time-frequency domain

The wavelet transform (WT) [14] is the main time-frequency analysis method. An important property of the wavelet transform is that it has good localization characteristics in both the time domain and the frequency domain and can provide frequency information for each frequency sub-band of the target signal. The continuous wavelet transform formula is as follows:

$W_f(a,b) = \langle f, \Psi_{a,b}\rangle = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} f(t)\,\Psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t$ (8)

Generally, the computer implements a binary discretization of the wavelet transform. The discretized wavelet and its corresponding wavelet transform are called the discrete wavelet transform (DWT) [48]. In fact, the DWT is obtained by discretizing the scale and displacement of the continuous wavelet transform according to powers of two. Low-pass and high-pass filters in the DWT generate the approximation signals and the detail signals [12], and each decomposition layer halves the data frequency. For the present signals, the original sampling frequency is 100 Hz. Wavelet base selection and the number of decomposition layers are very important for the DWT. This paper adopts the "db4" wavelet basis, which is smooth and compactly supported; compared with "db2", "sym2", and "coif4", "db4" gives higher accuracy and is more suitable for processing EEG signals. The signal's main frequency components and the frequencies required for classification determine the number of decomposition layers, so a 4-layer wavelet decomposition can effectively extract the required rhythm-wave frequencies. As shown in Fig. 3, the D2 band (12.5–25 Hz) lies within the beta band at the standard frequency (13–30 Hz) and can effectively represent beta rhythm waves. Similarly, D3 (6.25–12.5 Hz) represents the alpha rhythm wave, and D4 (3.125–6.25 Hz) represents the theta rhythm wave. Since the detail signals D5 and D6 cannot effectively represent delta rhythm waves, the approximation signal A4 (0–3.125 Hz) is used to represent the delta rhythm wave. Figure 4 shows the wavelet decomposition of data in the awake stage. With the wavelet decomposition, the rhythm-wave (α, β, θ, δ) energies can be extracted as follows:

$E_{ij} = \int \left|f_{ij}(t_j)\right|^2 \mathrm{d}t = \sum_{j}\left|x_{ij}\right|^2$ (9)

$E_{ij}$ represents the frequency-band energy of the j-th node on the i-th layer of the sleep EEG decomposition, and $x_{ij}$ represents the amplitude of the discrete points of the signal $f_{ij}(t_j)$, with j = 0, 1, 2, ..., $2^i - 1$.

Fig. 3 Frequency of the sub-bands obtained by the fourth level of DWT decomposition of EEG data
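A short sketch of this energy extraction follows, assuming the band energies of Eq. (9) are computed directly from the DWT coefficients, which is the usual reading of the formula.

```python
# Hedged sketch of the rhythm-wave energies: 4-level 'db4' DWT of a 100-Hz
# epoch, with D2 ~ beta, D3 ~ alpha, D4 ~ theta, and A4 ~ delta.
import numpy as np
import pywt

def rhythm_wave_energies(x):
    cA4, cD4, cD3, cD2, cD1 = pywt.wavedec(x, "db4", level=4)
    e = {
        "beta": np.sum(cD2 ** 2),    # 12.5-25 Hz band
        "alpha": np.sum(cD3 ** 2),   # 6.25-12.5 Hz band
        "theta": np.sum(cD4 ** 2),   # 3.125-6.25 Hz band
        "delta": np.sum(cA4 ** 2),   # 0-3.125 Hz band
    }
    e["delta/alpha"] = e["delta"] / e["alpha"]   # Energy (δ/α) feature
    return e
```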

Fig. 4 Four-level wavelet decomposition of a 3000-point epoch. For this epoch, the subject's sleep is scored as awake according to the Sleep-EDF. The raw EEG signal is shown in the first plot, while the details of each wavelet decomposition level are presented in the remaining ones

2.2.3 Non-linear

Sample entropy  Sample entropy (SampEn) is a measure of time series complexity proposed by Richman and Moorman [39]. It is an improvement over the approximate entropy algorithm [5]: sample entropy reduces the error of approximate entropy and is more closely related to the known random part, so approximate entropy has gradually been replaced in the feature processing of sleep EEG staging. In this study, sample entropy was extracted as a feature. The lower the value of the sample entropy, the higher the sequence self-similarity; the larger the sample entropy, the more complex the sample sequence. Therefore, the sample entropy is largest during NREM4 sleep, followed by REM. Let the original data be a time series of length N:

$\mathrm{SampEn}(N,m,r) = -\ln\left[\varphi^{m+1}(r)/\varphi^{m}(r)\right]$ (10)

$c_i^{m+1}(r) = N^{m+1}(i)/(N-m+1)$ (11)

$\varphi^{m+1}(r) = \frac{1}{N-(m+1)}\sum_{i} c_i^{m+1}(r)$ (12)

More details can be found in [46].

Fuzzy entropy  Fuzzy entropy (FuzzyEn) is a time series complexity calculation method proposed by Chen [9]. It has been successfully used in the feature extraction and classification of EMG. Importing the concept of fuzzy sets, the similarity of vectors is fuzzily defined in FuzzyEn on the basis of exponential functions and their shapes. Besides possessing the good properties of SampEn, FuzzyEn also succeeds in giving the entropy definition in the case of small parameters. Because it characterizes signal complexity, this article uses it to extract features of EEG signals. The activity curve of fuzzy entropy is similar to that of sample entropy: during the awake period, fuzzy entropy is the smallest, and during the NREM period it reaches its maximum. Let the original data be a time series of length N, expressed as {u(i): 1 ≤ i ≤ N}:

$\mathrm{FuzzyEn}(m,n,r) = \ln\varphi^{m}(n,r) - \ln\varphi^{m+1}(n,r)$ (13)

$\varphi^{m}(n,r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1,\,j\neq i}^{N-m} D_{ij}^{m}\right)$ (14)

$D_{ij}^{m} = \exp\left(-\left(d_{ij}^{m}\right)^{n}/r\right)$ (15)

where $d_{ij}^{m}$ is the maximum absolute difference between the vectors $X_i^m$ and $X_j^m$. More details can be found in [9].

The values of sample entropy and fuzzy entropy are determined by the embedding dimension (m) and the similarity tolerance (r). Pincus [34] pointed out that when r is 0.1 to 0.25 times the standard deviation of the original data and m is 1 or 2, the calculated entropy has more reasonable statistical characteristics. The study uses this conclusion and the Pearson correlation coefficient [33] to calculate the sample entropy, the fuzzy entropy, and their correlation with the initial data for each parameter setting, as shown in Table 3. Therefore, the optimal parameters of the sample entropy are m = 2 and r = 0.15, and the optimal parameters of the fuzzy entropy are m = 1 and r = 0.15. Using these settings, six features were extracted: the sample entropy of the overall data, the sample entropy of the four rhythm waves, and the fuzzy entropy of the overall data.
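The following is a reference sketch of SampEn with the parameters selected above (m = 2, r = 0.15 times the signal standard deviation); it is written for clarity rather than speed (pairwise comparisons, O(N²)) and assumes matched pair counting, a common convention.

```python
# Hedged SampEn sketch (Eq. 10): count template pairs within Chebyshev
# tolerance r at dimensions m and m+1 and take -ln of their ratio.
import numpy as np

def sample_entropy(x, m=2, r_factor=0.15):
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)

    def match_count(mm):
        # number of template pairs (i < j) within tolerance r
        n = len(x) - mm + 1
        emb = np.array([x[i:i + mm] for i in range(n)])
        count = 0
        for i in range(n - 1):
            d = np.max(np.abs(emb[i + 1:] - emb[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    B, A = match_count(m), match_count(m + 1)
    return -np.log(A / B)   # Eq. (10)
```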

Table 3 Correlation between sample entropy, fuzzy entropy, and the initial data under various parameters

Correlation | m = 1, r = 0.15 | m = 1, r = 0.2 | m = 2, r = 0.15 | m = 2, r = 0.2
Sample | 0.8346 | 0.8794 | 0.8815 | 0.8766
Fuzzy | 0.8869 | 0.8418 | 0.8766 | 0.8621

Fractal dimension  Higuchi [30] proposes a method for the fractal dimension of EEG signals. In general, signals that exhibit self-similarity are considered fractal. Autocorrelation is a part of the signal that has similarity with the whole signal, and this similarity repeats in a recursive manner; the fractal dimension is a quantitative measure of this similarity. Let the original data be a time series of length N, expressed as {X(i): 1 ≤ i ≤ N}, and construct the sub-series

$X_k^m: X(m), X(m+k), X(m+2k), \ldots,\quad m = 1, \ldots, k.$

The purpose is to calculate the "length" of the input signal at different time intervals k and estimate the fractal dimension by the following relation:

$L(k) \propto k^{-D}$ (16)

The estimate for the length is as follows:

$L_m(k) = \frac{1}{k}\left[\sum_{i=1}^{M}\left|X(m+ik)-X(m+(i-1)k)\right|\,\frac{N-1}{Mk}\right]$ (17)

In general, the standard least-squares fit can be used to find the slope of the best-fitting straight line, which gives the fractal dimension. For a simple curve, D is close to 1; for highly irregular curves, D is close to 2. The value of d used in this study is 1.
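A compact sketch of this estimator follows; k_max = 10 is an illustrative assumption, not a value reported by the paper.

```python
# Hedged Higuchi fractal dimension sketch (Eqs. 16-17): compute normalised
# curve lengths L(k) and fit D as the slope of log L(k) vs log(1/k).
import numpy as np

def higuchi_fd(x, k_max=10):
    x = np.asarray(x, dtype=float)
    N = len(x)
    log_k, log_L = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, N, k)          # sub-series X(m), X(m+k), ...
            if len(idx) < 2:
                continue
            # Eq. (17): normalised length of the sub-series starting at m
            norm = (N - 1) / ((len(idx) - 1) * k)
            lengths.append(np.sum(np.abs(np.diff(x[idx]))) * norm / k)
        log_k.append(np.log(1.0 / k))
        log_L.append(np.log(np.mean(lengths)))
    D, _ = np.polyfit(log_k, log_L, 1)        # slope = fractal dimension
    return D
```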

Tsallis entropy  According to Plastino's [35] description, Tsallis entropy is a special case of spectral entropy based on the concept of the generalized entropy of a probability distribution, so EEG complexity can be reflected in the Tsallis entropy. The formula is as follows:

$S_q = k\,\frac{1-\sum_{i=1}^{m} p_i^{q}}{q-1},\quad q\in R$ (18)

In this formula, k is the Boltzmann constant, usually taken as 1. In sleep EEG analysis, the value of q is generally taken as 2; this study uses this conclusion.

Hurst exponent  The Hurst exponent [6] is a measure used to evaluate self-similarity and correlation, with values from 0 to 1. It is used in EEG time series analysis to characterize the non-stationary electrical state of the brain observed during sleep. H is defined as:

$H = \log(R/S)/\log(T)$ (19)

where T is the duration of the data sample and R/S is the corresponding value of the rescaled range. The above expression is obtained from Hurst's generalized equation for time series, which is also valid for Brownian motion; it is given by $R/S = k \times T^H$, where k is a constant. The Hurst exponent thus measures the smoothness of a fractal time series based on the asymptotic behavior of the rescaled range of the process. The Hurst exponent (H) is generally divided into three cases: (1) when H is 0.5, the system is in a stable state; (2) when H is more than 0.5, the data flow shows a positive effect, indicating that the future trend is consistent with the past, and the closer H is to 1, the stronger the persistence; (3) when H is less than 0.5, the data stream has a negative effect, indicating that the future trend of the sleep state is contrary to the past, and the closer H is to 0, the stronger the anti-persistence.
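The sketch below estimates H by fitting log(R/S) against log T across several window sizes, a common rescaled-range estimator consistent with Eq. (19); the window sizes are illustrative assumptions.

```python
# Hedged rescaled-range Hurst sketch: average R/S per window size T and
# fit H as the slope of log(R/S) versus log(T).
import numpy as np

def hurst_rs(x, window_sizes=(100, 200, 300, 500, 750, 1000)):
    x = np.asarray(x, dtype=float)
    log_T, log_RS = [], []
    for T in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - T + 1, T):
            w = x[start:start + T]
            z = np.cumsum(w - w.mean())      # cumulative deviations
            R = z.max() - z.min()            # range of the cumulative series
            S = w.std()                      # standard deviation
            if S > 0:
                rs_vals.append(R / S)
        log_T.append(np.log(T))
        log_RS.append(np.log(np.mean(rs_vals)))
    H, _ = np.polyfit(log_T, log_RS, 1)      # slope = Hurst exponent
    return H
```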

Permutation entropy  Bandt and Pompe [3] proposed an average entropy parameter to measure the complexity of one-dimensional time series: permutation entropy. Compared with the fractal dimension and the correlation dimension, which also reflect one-dimensional time series, it is characterized by high robustness, low computational cost, and a strong ability to resist noise and interference. It quantifies the complexity of EEG activity; owing to the complexity of sleep activity, permutation entropy can identify EEG information very well. The main process is as follows. Let the original data be a time series of length N, expressed as {u(i): 1 ≤ i ≤ N}. The reconstruction matrix is established using the embedding dimension (m) and the delay time (τ); the choice of these parameters has been discussed in the correlation-dimension context above. Each row of the reconstruction matrix is then rearranged in increasing order:

$x\left[i+(j_1-1)\tau\right] \le x\left[i+(j_2-1)\tau\right] \le \cdots \le x\left[i+(j_m-1)\tau\right]$ (20)

A set of symbol sequences [s(l), l = 1, ..., m] can thus be obtained for any time series; the m-dimensional phase space map has at most m! different symbol sequences. The probability of occurrence of each symbol sequence [$P_i$ (i = 1, 2, ..., k)] is calculated, and according to the Shannon entropy model [3], the permutation entropy is defined as:

$H_p(m) = -\sum_{j=1}^{k} P_j \ln P_j$ (21)
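A minimal sketch of Eqs. (20)–(21) follows; m = 3 and τ = 1 are illustrative assumptions, as the paper does not restate its values here.

```python
# Hedged permutation entropy sketch: count ordinal patterns of the embedded
# series and return the Shannon entropy of their distribution.
import numpy as np
from collections import Counter
from math import factorial, log

def permutation_entropy(x, m=3, tau=1, normalize=True):
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    patterns = Counter(
        tuple(np.argsort(x[i:i + m * tau:tau])) for i in range(n)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n
    H = -np.sum(probs * np.log(probs))       # Eq. (21)
    return H / log(factorial(m)) if normalize else H
```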

The largest Lyapunov exponent  The largest Lyapunov exponent [17] is an important quantitative measure of the dynamics of a system. It characterizes the average exponential rate of convergence or divergence between adjacent orbits of the system in phase space; if the system is dynamically chaotic, the maximum Lyapunov exponent is greater than zero. In sleep electroencephalography, a small amount of data is generally used. Computing the largest Lyapunov exponent requires determining the time delay, the average period, and the embedding dimension. The time delay is generally obtained by the autocorrelation function method or the C-C method; usually the correlation dimension (d) is calculated first, and the embedding dimension (m) is then determined according to the formula m ≥ 2d + 1. Here, the C-C method is used, giving a time delay of 17, an average period of 256, and an embedding dimension of 1.5 for the largest Lyapunov exponent.

Complexity  Wu and others [20] introduced the LZ complexity into the study of EEG signals. It can quantitatively evaluate the complexity of intuitive signal curve changes and can be effectively used in the analysis of sleep EEG. Its advantages are that it is simple, easy to implement, and inexpensive in computation time. Let the original signal [$S_i$ (i = 1, 2, ..., n)] be a sequence normalized to the range from 0 to 1:

$\lim_{n\to\infty} c(n) = b(n) = n/\log_2 n$ (22)

The complexity is the normalized result C(n) = c(n)/b(n). More details can be found in [20].

2.3 Feature selection algorithms

2.3.1 Fisher score

The Fisher score (FS) [27] is a common feature correlation criterion. First, the Fisher score F is found for each feature; then a threshold θ is set. If F is greater than θ, the feature is selected; otherwise it is not. The characteristics of this algorithm are simple calculation, high accuracy, strong operability, and economical running time. The definition is as follows:

$F(f_i) = \frac{\sum_{k=1}^{c} n_k\left(\mu_{f_i}^{k}-\mu_{f_i}\right)^2}{\sum_{k=1}^{c}\sum_{y_j=k}\left(f_{j,i}-\mu_{f_i}^{k}\right)^2}$ (23)

Here $\mu_{f_i}$ represents the average of the i-th feature ($f_i$) over the whole sample, $\mu_{f_i}^{k}$ represents the average value of the i-th feature in the k-th class, $n_k$ is the number of samples in the k-th class, c is the number of classes, and $f_{j,i}$ represents the value of the i-th feature in the j-th sample.
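The following sketch implements Eq. (23) directly in NumPy as a per-feature score over a labeled feature matrix.

```python
# Hedged Fisher score sketch (Eq. 23): between-class scatter of the class
# means divided by the summed within-class scatter, per feature.
import numpy as np

def fisher_score(X, y):
    # X: (n_samples, n_features) feature matrix, y: integer class labels
    classes = np.unique(y)
    mu = X.mean(axis=0)                      # overall per-feature mean
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for k in classes:
        Xk = X[y == k]
        num += len(Xk) * (Xk.mean(axis=0) - mu) ** 2
        den += np.sum((Xk - Xk.mean(axis=0)) ** 2, axis=0)
    return num / den                         # larger score = more relevant
```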

2.3.2 SFFS

Sequential Floating Forward Selection (SFFS) [52] is a heuristic search-based feature selection algorithm. It starts from the empty set, and in each round it selects a subset x among the unselected features such that the evaluation function J is optimal after x joins the selected subset. Because of feature redundancy, SFFS then selects a subset z among the selected features such that the evaluation function is optimized after z is eliminated. Its advantage lies in its high computational efficiency; its disadvantage is that it sacrifices the global optimum and does not necessarily yield optimal results.

2.3.3 FCBF

The full name of the FCBF algorithm is Fast Correlation-Based Filter Solution [55]; it is a fast filtering feature selection algorithm based on symmetrical uncertainty (SU). The algorithm steps are as follows. First, calculate the correlation between each feature ($F_i$) and the target (C):

$\mathrm{SU}(F_i,Y) = \frac{2\,\mathrm{IG}(F_i\mid Y)}{H(F_i)+H(Y)}$ (24)

$\mathrm{IG}(F_i\mid Y) = H(F_i) - H(F_i\mid Y)$ (25)

$H(F) = -\sum_{i=1}^{c} p(f_i)\log_2 p(f_i)$ (26)

$H(F\mid Y) = -\sum_{j} p(y_j)\sum_{i} p(f_i\mid y_j)\log_2 p(f_i\mid y_j)$ (27)

Here IG represents the information gain and H represents the information entropy; $p(f_i)$ is the probability that the feature takes the value $f_i$, and c is the number of categories.

First, the features whose correlation $\mathrm{SU}_{F_i,c}$ with the target exceeds a preset threshold δ are selected. $\mathrm{SU}_{F_i,c}$ is sorted in descending order, and the correlation between each feature $F_i$ and every other feature whose SU is smaller than $\mathrm{SU}_{F_i,c}$ is calculated. Features are then selected as follows:

Step 1: If $\mathrm{SU}_{F_i,c} > \mathrm{SU}_{F_j,c}$, then choose $F_i$. (28)

Step 2: If $\mathrm{SU}_{F_j,F_i} > \mathrm{SU}_{F_j,c}$, then delete feature $F_j$. (29)

The advantage of this method is that, when comparing a pair of redundant features, it retains the feature more relevant to the target C and eliminates the less relevant one. At the same time, it uses the features with higher correlation to filter the other features, which reduces the time complexity. It is therefore a fast filtering feature selection algorithm.

2.4 Classification algorithm

2.4.1 RF

The random forest was proposed by Breiman [4]; its meta-learner is composed of decision trees, and it adopts the bagging technique of ensemble learning [36]. There are two main steps in the algorithm. In the first step, the original training data set is sampled with replacement N times to obtain N subsets (N is the size of the training data set). In the second step, each sampled data set generates a decision tree: from the root node to the subsequent nodes, a subset of m input features is randomly selected, and among these m variables the test that most effectively divides the samples into two separate groups is chosen for the node. Each decision tree produces a decision, and a simple voting mechanism yields the final classification result.
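A minimal usage sketch of this stage follows. The feature matrix is simulated with make_classification as a placeholder for the 30 extracted features, the 70/30 split mirrors Sect. 3, and the hyperparameter values are illustrative (Sect. 3.2.1 reports the tuned settings).

```python
# Hedged random-forest sketch on placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, n_informative=12,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=60, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)
print("OOB score:", rf.oob_score_, "test accuracy:", rf.score(X_te, y_te))
```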
2.4.2 GBDT

Gradient Boosting Decision Tree (GBDT) [54], also known as Multiple Additive Regression Tree (MART), is an iterative decision tree algorithm built with the Boosting type of ensemble learning [36]. Boosting is an algorithm that promotes weak learners to strong learners and belongs to the category of ensemble learning. The main idea of the GBDT algorithm is to train a series of classifiers iteratively, where the sample distribution used by each classifier is related to the results of the previous round of learning. After constructing a decision tree, it uses the residual principle to iteratively generate new decision trees.

2.4.3 XGBoost

Extreme Gradient Boosting (XGBoost) [7] is an optimization algorithm based on GBDT, proposed by Dr. Tianqi Chen. The main improvements of XGBoost are as follows: traditional GBDT uses CART as the base classifier, while XGBoost also supports linear classifiers; it adds regularization terms to the cost function to control the complexity of the model; it supports parallel processing, so the speed of the algorithm is greatly improved; it has high flexibility, allowing users to customize optimization goals and evaluation criteria; and it has rules for handling samples with missing feature values and pruning operations that can effectively avoid overfitting.

2.4.4 Stacking

Stacking [49] is an ensemble learning technique whose principle is to build a meta-classifier or meta-regressor to aggregate multiple classification or regression models. The base-level models are trained on the complete training set with cross-validation; the meta-model is then trained on the outputs of the base-level models, and the final result is obtained. The base level is usually composed of different learning algorithms, so a stacking ensemble is usually heterogeneous. The model is shown in Fig. 5.

Fig. 5 The stacking model
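The sketch below assembles such a two-layer stack with the mlxtend library used in Sect. 3; the base learners mirror those named later (KNN, RF, ERT, MLP, XGBoost) with a logistic regression meta-learner, while all hyperparameters shown are illustrative defaults rather than the paper's tuned values.

```python
# Hedged two-layer stacking sketch (cf. Fig. 5 and Fig. 10).
from mlxtend.classifier import StackingCVClassifier
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# First layer: KNN, RF, ERT, MLP, XGBoost; second layer: logistic regression
stack = StackingCVClassifier(
    classifiers=[
        KNeighborsClassifier(),
        RandomForestClassifier(n_estimators=60, random_state=0),
        ExtraTreesClassifier(n_estimators=60, random_state=0),
        MLPClassifier(max_iter=500, random_state=0),
        XGBClassifier(),
    ],
    meta_classifier=LogisticRegression(max_iter=1000),
    cv=5)
# stack.fit(X_tr, y_tr); stack.score(X_te, y_te)  # using the earlier split
```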

2.5 Classification evaluation index

This section briefly describes the evaluation indices of the classification algorithms used in this paper. The concepts of positive and negative come from two-class learning and are also suitable for multi-class problems. True positives (TP) is the number of correctly classified positive cases; false positives (FP) is the number of incorrectly classified positive cases; false negatives (FN) is the number of incorrectly classified negative cases; true negatives (TN) is the number of correctly classified negative cases. The confusion matrix [11] is a situation analysis table summarizing the prediction results of a classification model: the records in the data set are summarized in matrix form according to the actual classes and the classes predicted by the model. Accuracy, precision, recall, and F1 score [24] are composed of the parameters TP, TN, FP, and FN. The formulas are as follows:

$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$ (30)

$\mathrm{Precision} = \frac{TP}{TP+FP}$ (31)

$\mathrm{Recall} = \frac{TP}{TP+FN}$ (32)

$F_1\,\mathrm{score} = \frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$ (33)

In addition, the kappa coefficient [11] is commonly used to judge the difference between the classifier's result and a random classification. The formula is as follows:

$k = \frac{\mathrm{ACC}-\mathrm{ACC}_0}{1-\mathrm{ACC}_0}$ (34)

ACC indicates classification accuracy, and the chance level is $\mathrm{ACC}_0 = 1/N_Y$, where $N_Y$ is the number of classes. A kappa coefficient of 0 means that the classification accuracy is at the chance level, and a kappa coefficient of 1 means the best classification effect.
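All of Eqs. (30)–(34) are available in scikit-learn, as the short sketch below shows; y_true and y_pred are small placeholder label vectors.

```python
# Hedged evaluation-index sketch for Eqs. (30)-(34).
from sklearn.metrics import (accuracy_score, classification_report,
                             cohen_kappa_score, confusion_matrix)

y_true = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
y_pred = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 2]

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))   # precision, recall, F1
print("accuracy:", accuracy_score(y_true, y_pred))
print("kappa:", cohen_kappa_score(y_true, y_pred))
```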

3 Results

This study uses the scikit-learn, XGBoost, and mlxtend [38] libraries in Python for the stacking algorithms and MATLAB 2017a for signal preprocessing and feature extraction. Sleep stages are divided into the Awake, NREM-1, NREM-2, NREM-3, NREM-4, and REM stages according to the R&K rules. For classification, the original data are divided into a 70% training set and a 30% prediction set, and k-fold cross-validation [40] with k = 5 is used. The main evaluation criteria are classification accuracy, the confusion matrix, and the kappa coefficient.

3.1 Results of feature selection

After feature extraction, 30 sets of feature data were obtained. Three feature selection algorithms, Fisher score, SFFS, and FCBF, were used to compute the weight of each feature (ranked from high to low). Table 4 shows the results of feature selection.

3.2 Classification results

3.2.1 RF

Using the three feature selection results, feature data sets were selected in order from the highest to the lowest weight, and the RF algorithm was used to perform sleep staging; the results are shown in Fig. 6. The main parameters of the random forest algorithm are the number of trees, the maximum number of iterations of the weak learner (n_estimators), and oob_score. The more trees, the higher the accuracy of the algorithm, but too many trees easily cause over-fitting. Tree counts from 1 to 100 with an interval of 5 were compared, with the other parameters at their defaults; the results are shown in Fig. 7. With 25 trees, the accuracy of 0.8949 is the best the algorithm achieves. Using cross-validation and grid search, the optimal number of iterations is 60, and the oob_score parameter is set to true; the reason is that the out-of-bag samples can then be used to evaluate the quality of the fitted model and improve the generalization ability of the algorithm.

As shown in Fig. 6, the first ten features selected by the FS algorithm give the maximum accuracy of 0.9222. With the first ten features, SFFS obtained an accuracy of 0.8823 and reached its maximum accuracy of 0.9222 with 15 features. With the first ten features, FCBF obtained 0.8234 and reached its maximum accuracy of 0.9211 with 17 features. After 17 feature groups, the results of the three feature selection algorithms converge to about 0.91. Therefore, using the FS algorithm with RF achieves the smallest feature set (10) and the maximum accuracy (0.9222). The selected features are 25-Sample entropy (δ), 30-Permutation entropy, 2-Hjorth complexity, 19-Fuzzy entropy, 28-Hurst exponent, 18-Energy (δ/α), 29-Largest Lyapunov exponent, 16-Energy (θ), 27-LZ complexity, and 23-Sample entropy (β).

3.2.2 GBDT

With the GBDT classification algorithm, features were again selected in order from the highest to the lowest weight for sleep staging; the results are shown in Fig. 8. This study uses cross-validation and grid search for the main parameters of GBDT to obtain the optimal values: the maximum number of features (max_features) is 10, the learning rate is 0.001, the maximum depth of the decision tree (max_depth) is 7, the maximum number of iterations of the weak learner (n_estimators) is 5000, the minimum number of leaf samples (min_samples_leaf) is 90, the minimum number of samples required to split an internal node (min_samples_split) is 1000, and the other parameters are defaults.

From Fig. 8, the maximum accuracy of 0.8422 is obtained by selecting the first ten features of the FCBF algorithm. With the first 10 features, the Fisher score gives an accuracy of 0.8232 and reaches its maximum of 0.8340 with the first 14 features. With the first 10 features, SFFS gives 0.8122 and reaches its maximum of 0.8359 with the first 16 features. After 17 feature groups, the sleep staging results of the three feature selection algorithms converge to about 0.83. Therefore, choosing the FCBF filtering algorithm with GBDT achieves the smallest feature set (10) and the maximum accuracy (0.8422). The features are 23-Sample entropy (β), 25-Sample entropy (δ), 4-Kurtosis, 5-Skewness, 2-Hjorth complexity, 1-Hjorth activity, 18-Energy (δ/α), 19-Fuzzy entropy, 22-Sample entropy (α), and 29-Largest Lyapunov exponent.
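The tuning just described can be reproduced along the lines of the sketch below; the grid is abbreviated, X_tr/y_tr refer to the training split from the earlier RF sketch, and only the fixed values follow the reported settings.

```python
# Hedged GBDT grid-search sketch with the settings reported in Sect. 3.2.2.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

gbdt = GradientBoostingClassifier(learning_rate=0.001, n_estimators=5000,
                                  max_features=10, min_samples_split=1000,
                                  random_state=0)
# Abbreviated illustrative grid around the reported optimum (depth 7, leaf 90)
param_grid = {"max_depth": [5, 7, 9], "min_samples_leaf": [60, 90, 120]}
search = GridSearchCV(gbdt, param_grid, cv=5)
# search.fit(X_tr, y_tr); print(search.best_params_, search.best_score_)
```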

Table 4 Feature selection results

Selection algorithm | Selection feature (weights from high to low)

Fisher score | 25 30 2 19 28 18 29 16 27 23 10 9 6 5 12 13 22 8 3 11 14 24 17 26 4 7 20 21 1 15
SFFS | 2 11 14 15 20 23 25 27 22 6 8 16 18 19 7 17 13 5 4 29 30 12 1 3 9 10 21 24 28 26
FCBF | 23 25 4 5 2 1 18 19 22 29 9 14 21 26 30 6 11 20 7 27 12 28 13 16 8 15 3 24 17 10

Fig. 6 Results of different feature selection algorithms in the RF classification

3.2.3 XGBoost

Fig. 7 Accuracy under different numbers of trees in a random forest

The XGBoost algorithm selects feature data sets from the highest to the lowest weight in order to perform sleep staging; the results are shown in Fig. 9. The XGBoost algorithm is derived from GBDT, and the main parameters are similar. The maximum tree depth is 12 (the greater the depth of the tree, the easier it is to overfit). The L2 regularization parameter is set to 2, because the larger this parameter, the less likely over-fitting becomes. The learning rate is 0.001, the number of boosting iterations is 100, and the other parameters are equivalent to GBDT.

Figure 9 shows that with the first 11 features selected by the Fisher score algorithm the maximum accuracy is 0.9023, while SFFS and FCBF give 0.8933 and 0.8122. The SFFS maximum accuracy of 0.8983 is obtained with the first 13 features. After the first 17 features of FCBF, its maximum accuracy was 0.9015, and after 17 feature groups, the results of the three selection algorithms converge to about 0.90. Therefore, the smallest feature set (11) and the maximum accuracy (0.9023) are achieved by using the FS algorithm with XGBoost; the selected features are equivalent to those of the random forest.

3.2.4 Stacking

The stacking model consists of multiple layers of classifiers. In this work, three-layer and deeper stacks showed poor performance and poor stability, so the stacking model of this paper consists of two layers of classifiers. Due to the excellent performance of the random forest and XGBoost ensemble classifiers on this data, they were selected as first-level learners.

Fig. 8 Results of different feature selection algorithms in the GBDT classification

Fig. 9 Results of different feature selection algorithms in the XGBoost classification

Fig. 10 Stacking model

Extremely randomized trees, as a special kind of random forest, have characteristics distinct from the general random forest, so ERT was also chosen as a first-level learner. K-Nearest Neighbor (KNN) and neural networks are often used to build stacking models and differ greatly from the ensemble learning algorithms, so they were selected as well; this model uses a Multi-layer Perceptron (MLP) to represent the NN. Therefore, the five first-level learning classifiers of this algorithm are KNN, RF, ERT, MLP, and XGBoost, and the second layer is a logistic regression. The model is shown in Fig. 10.

For the feature selection results, sleep staging was performed using the parameterized stacking algorithm; the accuracy of the algorithm is shown in Fig. 11. The Fisher score algorithm with the first nine features gives the maximum accuracy of 0.9667, while SFFS and FCBF give 0.8906 and 0.7992. After 17 feature groups, the results of the three selection algorithms converge to about 0.95. Therefore, the FS feature selection algorithm with stacking achieves the smallest feature set (9) and the maximum accuracy (0.9667). The features are 25-Sample entropy (δ), 30-Permutation entropy, 2-Hjorth complexity, 19-Fuzzy entropy, 28-Hurst exponent, 18-Energy (δ/α), 29-Largest Lyapunov exponent, 16-Energy (θ), and 27-LZ complexity.

3.3 Algorithm comparison and performance evaluation

This paper uses four ensemble learning algorithms: random forest, GBDT, XGBoost, and stacking. Three feature selection algorithms are used to eliminate the redundancy of the feature sets and improve the generalization ability of the algorithms. Based on the above findings, Table 5 shows the highest accuracy of the different ensemble algorithms in sleep staging. Based on the R&K sleep stage rules, sleep was divided into six phases (Awake, N-REM1, N-REM2, N-REM3, N-REM4, and REM), and a six-phase confusion matrix of the stacking model's sleep staging results was established.

Table 5 The highest accuracy of different integration algorithms in sleep staging

Classification algorithm | Classification accuracy (%) | No. of features | Selection algorithm

RF | 92.22 | 10 | FS
GBDT | 84.22 | 10 | FCBF
XGBoost | 90.23 | 11 | FS
Stacking | 96.67 | 9 | FS

Fig. 11 Accuracy of feature selection under the stacking algorithm

Table 6 Sleep staging results confusion matrix

Expert score | Proposed method
 | Awake | N1 | N2 | N3 | N4 | REM
Awake | 1135 | 14 | 4 | 0 | 0 | 2
N1 | 0 | 236 | 6 | 0 | 0 | 6
N2 | 0 | 8 | 342 | 0 | 0 | 5
N3 | 0 | 0 | 2 | 284 | 1 | 10
N4 | 0 | 0 | 2 | 6 | 242 | 0
REM | 0 | 3 | 10 | 3 | 0 | 145
ACC | 98.27% | 95.16% | 96.33% | 95.62% | 96.8% | 91.93%
Total ACC 96.67%

The sleep confusion matrix of Table 6 shows that the algorithm classifies more than 90% of every sleep period correctly. The distinction between the awake period and the sleep period reaches 98.27%, which means that the classification algorithm can recognize the two states of sleep and wakefulness with high accuracy. At the same time, the N-REM1, N-REM2, and REM phases have similar amplitudes and waveforms in the sleep EEG, and it is difficult to identify them from the EEG signals in the time domain alone. This study uses feature parameters from different domains to improve the recognition rate, and the results prove the effectiveness of this method: the recognition rate is 0.9516 for N-REM1, 0.9633 for N-REM2, and 0.9193 for REM.

The classification accuracy reflects the recognition rate of the classification algorithm, but it is not the only evaluation index of its recognition ability. To assess the generalization ability of the classification algorithm and verify the authenticity of the classification accuracy, the precision, recall, F1 score, and kappa coefficient were computed, as shown in Table 7. The kappa coefficient of the classification algorithm is 0.96. Table 6 and the kappa coefficient results indicate that the stacking sleep staging algorithm used in this study yields reliable classification accuracy, high consistency, and high generalization capability.

Table 7 Precision, recall, and F1 score comparison chart

 | Accuracy | Precision | Recall | F1 score
Awake | 0.9827 | 1 | 0.9826 | 0.9912
NREM stage 1 | 0.9516 | 0.9328 | 0.9516 | 0.9421
NREM stage 2 | 0.9633 | 0.9395 | 0.9633 | 0.9513
NREM stage 3 | 0.9562 | 0.9693 | 0.9562 | 0.9627
NREM stage 4 | 0.968 | 0.9565 | 0.968 | 0.9627
REM | 0.9193 | 0.9177 | 0.9006 | 0.9090

4 Discussion

In comparison with other sleep staging papers, three factors that affect the accuracy of sleep staging need attention: sleep data, feature algorithms, and classification algorithms.

Sleep EEG is not the only way to identify sleep stages. Physiological signals such as EOG, EMG, and HRV [51] are also used for sleep staging. Willemen et al. [53] used heart rate (HR), breathing rate (BR), and movement for four-stage sleep classification (Wake-REM-N1/N2-N3/N4); the accuracy rate of the staging was 69%. Fonseca et al. [16] extracted features from the ECG and thoracic respiratory effort measured with respiratory inductance plethysmography; they achieved a Cohen's kappa coefficient of 0.49 and an accuracy of 69% in the classification of wake, REM, light, and deep sleep. Non-EEG signal acquisition is very convenient, but its disadvantage is that the exact state of sleep cannot be determined accurately, which leads to less accurate sleep staging. Single-channel EEG signals can produce very competitive results compared with methods based on several PSG channels [11, 56].

A multi-domain feature algorithm set can describe the details of a signal better than a single feature or a feature set from the same domain, and it improves the accuracy of the algorithm. The selection of multiple features brings redundancy, so feature selection algorithms are needed. Thiago et al. [11] extracted time-domain and wavelet features; the stability of the feature set was confirmed with ReliefF tests, which show a performance reduction when any individual feature is removed. Baha et al. [44] used 20 attribute algorithms in four categories, obtaining 41 feature parameters from these algorithms. Effective feature selection algorithms such as minimum redundancy maximum relevance (mRMR), fast correlation-based feature selection (FCBF), ReliefF, the t test, and the Fisher score are preferred at the feature selection stage for choosing the set of features that best represents the EEG signals; a 97.03% classification accuracy was obtained. Popular sleep classification algorithms include NN [13], SVM [2], and RF [44].

Table 8 Sleep staging algorithms in recent work

Authors | Electrode | Sleep stages | Feature extraction | Classification | Accuracy (%)

Arthur et al. | C3/C4 | 1-NREM/Wake/REM = 3 | Reflection coefficients, stochastic complexity | HMMs | 80
Luay et al. | Pz-Oz | 4-NREM/Wake/REM = 6 | Wavelet coefficients | Regression trees | 75
Salih et al. | C4-A1 | 4-NREM/Wake/REM = 6 | Welch spectral analysis | k-NN, C4.5 | 82.15
Luay et al. | C3-A1 | 3-NREM/Wake/REM = 5 | CWD/CWT/HHT | RF | 80
Thiago et al. | Pz-Oz | 4-NREM/Wake/REM = 6 | Var/Kurt/Skew | RF | 90.5
Farideh et al. | Pz-Oz | 3-NREM/Wake/REM = 5 | Wavelet packet coefficients | ANN | 93.0 ± 4.0
Khald et al. | Fpz-Cz | 3-NREM/Wake/REM = 5 | Energy/Entropy/STD | DT | 97.3
Zhu et al. | Pz-Oz | 4-NREM/Wake/REM = 6 | Difference visibility graph | SVM | 87.5
Ahnaf et al. | Pz-Oz | 4-NREM/Wake/REM = 6 | Spectral features | AdaBoost | 80.34
Kaveh et al. | Fpz-Cz | Wake, N1/REM, N2, N3/N4 | RDSTFT | RF | 92.5
Proposed method | Pz-Oz | 4-NREM/Wake/REM = 6 | Multiple features | Stacking | 96.67

In recent years, deep learning algorithms have also been used in sleep staging. Sors et al. [47] introduced the use of a deep convolutional neural network (CNN) on raw EEG samples for supervised learning of 5-class sleep stage prediction. The network has 14 layers, and its performance metrics reach the state of the art, with an accuracy of 0.87 and a Cohen kappa of 0.81. In deep learning, features are self-learned and constantly updated and may be more in line with the real structure of the data. The high complexity, high randomness, and high non-linearity of EEG signals keep the nature of the signals deeply hidden; such signals are difficult to categorize with hand-crafted feature algorithms but can be addressed by deep learning methods. Table 8 contrasts this paper with recent work on sleep staging based on sleep EEG and machine learning.

5 Conclusion

This study presents a new method for sleep stage classification according to the R&K standard using a single EEG channel. The methodology was broadly validated using a dataset composed of 8233 EEG epochs of 30 s; the open-access signals are available in the Sleep-EDF database. An adaptive threshold wavelet function and an IIR filter were used to reduce the noise of the EEG signals.

There are three main contributions in this paper. First, feature algorithms are drawn from the time domain, the time-frequency domain, and non-linear analysis, and the parameters of the sleep EEG are analyzed and selected by these feature algorithms. This increases the applicability of many feature algorithms in this field, and multi-feature extraction improves the accuracy of the sleep staging algorithm. Compared with algorithms proposed in other papers, the algorithm has better comprehensiveness and stability and better reflects the information in the sleep EEG. Second, the FS, SFFS, and FCBF feature selection algorithms are used to filter the multi-feature set and obtain the best feature algorithm group. This method greatly reduces the risk of selecting features and helps to find efficient algorithms; in addition, it not only improves the accuracy of the sleep staging algorithm but also reduces its time consumption. Third, this paper establishes a stacking two-layer ensemble learning model, in which the base layer is RF, ERT, XGBoost, KNN, and MLP and the second layer is a logistic regression model. Compared with RF, GBDT, and XGBoost, the proposed method has a higher recognition rate of sleep stages.

The best result uses the nine optimal feature algorithms selected by the FS algorithm, followed by the stacking two-layer ensemble learning model for sleep staging; the sleep staging recognition rate and kappa coefficient are 96.67% and 0.96. Compared with the sleep staging results of the other papers in Table 8, this model has a higher classification accuracy. In view of the advantages of this algorithm, the automatic sleep staging algorithm of this paper should find a wide range of applications and scenarios in the field of sleep staging.

References

1. Aboalayon KAI, Almuhammadi WS, Faezipour MA (2015) Comparison of different machine learning algorithms using single channel EEG signal for classifying human sleep stages. In: Systems, Applications and Technology Conference, pp 1–6
2. Alickovic E, Subasi A (2018) Ensemble SVM method for automatic sleep stage classification. IEEE Trans Instrum Meas 60:1258–1265
3. Bandt C, Pompe B (2002) Permutation entropy: a natural complexity measure for time series. Phys Rev Lett 88:174102
4. Breiman L (2001) Random forests. Mach Learn 45
5. Burioka N, Miyata M, Cornélissen G, Halberg F, Takeshima T, Kaplan DT, Suyama H, Endo M, Maegaki Y, Nomura T (2005) Approximate entropy in the electroencephalogram during wake and sleep. Clin EEG Neurosci 36:21–24
6. Carbone A, Castelli G, Stanley HE (2012) Time-dependent Hurst exponent in financial time series. Phys A Stat Mech Its Appl 344:267–271

7. Chen T, He T, Benesty M, Khotilovich V, Tang Y (2016) xgboost: Extreme Gradient Boosting
8. Chen W, Wang Z, Xie H, Yu W (2007) Characterization of surface EMG signal based on fuzzy entropy. IEEE Trans Neural Syst Rehabil Eng 15:266–272
9. Chen W, Wang Z, Xie H, Yu W (2007) Characterization of surface EMG signal based on fuzzy entropy. IEEE Trans Neural Syst Rehabil Eng 15:266–272
10. Crawford C (1986) Sleep recording in the home with automatic analysis of results. Eur Neurol 25:30–35
11. Da ST, Kozakevicius AJ, Rodrigues CR (2016) Single-channel EEG sleep stage classification based on a streamlined set of statistical features in wavelet domain. Med Biol Eng Comput 55:1–10
12. Denk TC, Parhi KK (1997) VLSI architectures for lattice structure based orthonormal discrete wavelet transforms. IEEE Trans Circuits Syst II: Analog Digit Signal Process 44:129–132
13. Ebrahimi F, Mikaeili M, Estrada E, Nazeran H (2008) Automatic sleep stage classification based on EEG signals by using neural networks and wavelet packet coefficients. In: International Conference of the IEEE Engineering in Medicine & Biology Society, p 1151
14. Farge M (1992) Wavelet transform and their application to turbulence. Annu Rev Fluid Mech 56:68–68
15. Flexer A, Gruber G, Dorffner G (2005) A reliable probabilistic sleep stager based on a single EEG signal. Artif Intell Med 33:199–207
16. Fonseca P, Long X, Radha M, Haakma R, Aarts RM, Rolink J (2015) Sleep stage classification with ECG and respiratory effort. Physiol Meas 36:2027–2040
17. Frøyland J (1992) Introduction to chaos and coherence. Institute of Physics Publishing, Bristol
18. Fraiwan L, Lweesy K, Khasawneh N, Wenz H, Dickhaus H (2012) Automated sleep stage identification system based on time-frequency analysis of a single EEG channel and random forest classifier. Comput Methods Prog Biomed 108:10–19
19. Güneş S, Polat K, Yosunkaya Ş (2010) Efficient sleep stage recognition system based on EEG signal using k-means clustering based feature weighting. Expert Syst Appl 37:7922–7928
20. Gabrel V, Murat C, Wu L (2013) New models for the robust shortest path problem: complexity, resolution and generalization. Ann Oper Res 207:97–120
21. Gandhi TK, Chakraborty P, Roy GG, Panigrahi BK (2012) Discrete harmony search based expert model for epileptic seizure detection in electroencephalography. Expert Syst Appl 39:4055–4062
22. Ge J, Peng Z, Xin Z, Wang M (2007) Sample entropy analysis of sleep EEG under different stages. In: IEEE/ICME International Conference on Complex Medical Engineering
23. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101:E215
24. Goutte C, Gaussier E (2005) A probabilistic interpretation of precision, recall and F-score, with implications for evaluation. Int J Radiat Biol Relat Stud Phys Chem Med 51:952–952
25. Hjorth B (1975) An on-line transformation of EEG scalp potentials into orthogonal source derivations. Electroencephalogr Clin Neurophysiol 39:526–530
26. Hobson JA (1969) A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects: A. Rechtschaffen and A. Kales (editors). (Public Health Service, U.S. Government Printing Office, Washington, D.C., 1968, 58 p., $4.00). Electroencephalogr Clin Neurophysiol 26:644–644
27. Jin X, Bo T, He H, Hong M (2016) Semisupervised feature selection based on relevance and redundancy criteria. IEEE Trans Neural Netw Learn Syst 28:1974–1984
28. Kemp B, Zwinderman AH, Tuk B, Kamphuisen HA, Oberyé JJ (2000) Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. IEEE Trans Biomed Eng 47:1185–1194
29. Koley B, Dey D (2012) An ensemble system for automatic sleep stage classification using single channel EEG signal. Comput Biol Med 42:1186–1195
30. Mandelbrot BB (1983) The fractal geometry of nature, revised and enlarged edition. W.H. Freeman & Co, New York, p 1
31. Mohseni HR, Maghsoudi A, Shamsollahi MB (2008) Seizure detection in EEG signals: a comparison of different approaches. In: Engineering in Medicine and Biology Society, 2006. EMBS '06. International Conference of the IEEE, pp 6724–6727
32. Ozsen (2013) Classification of sleep stages using class-dependent sequential feature selection and artificial neural network. Neural Comput Appl 23:1239–1250
33. Pearson RG, Dawson TP, Berry PM, Harrison PA (2002) SPECIES: a spatial evaluation of climate impact on the envelope of species. Ecol Model 154:289–300
34. Pincus SM (1991) Approximate entropy as a measure of system complexity. Proc Natl Acad Sci U S A 88:2297–2301
35. Plastino AR, Plastino A (1993) Stellar polytropes and Tsallis' entropy. Phys Lett A 174:384–386
36. Quinlan R (1996) Bagging, boosting, and C4.5, vol 1, pp 725–730
37. Rajendra AU, Oliver F, Kannathal N, Tjileng C, Swamy L (2005) Non-linear analysis of EEG signals at various sleep stages. Comput Methods Prog Biomed 80:37–45
38. Raschka S, Nakano R, Bourbeau J, Mcginnis W, Poiriermorency G, Fernandez P, Bahnsen AC, Peters M, Savage M, Abramowitz M (2018) rasbt/mlxtend: Version 0.11.0
39. Richman JS, Lake DE, Moorman JR (2004) Sample entropy. Methods Enzymol 384:172
40. Rodriguez JD, Perez A, Lozano JA (2010) Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Computer Society
41. Roebuck A, Monasterio V, Gederi E, Osipov M, Behar J, Malhotra A, Penzel T, Clifford GD (2014) A review of signals used in sleep analysis. Physiol Meas 35:R1–R57
42. Samiee K, Kovács P, Kiranyaz S, Gabbouj M, Saramaki T (2015) Sleep stage classification using sparse rational decomposition of single channel EEG records. In: Signal Processing Conference, pp 1860–1864
43. Şen B, Peker M, Çavuşoğlu A, Çelebi FV (2014) A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. J Med Syst 38:1–21
44. Şen B, Peker M, Çavuşoğlu A, Çelebi FV (2014) A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. J Med Syst 38:18
45. Silveira TLTD, Kozakevicius AJ, Rodrigues CR (2017) Single-channel EEG sleep stage classification based on a streamlined set of statistical features in wavelet domain. Med Biol Eng Comput 55:1–10
46. Song Y, Crowcroft J, Zhang J (2012) Automatic epileptic seizure detection in EEGs based on optimized sample entropy and extreme learning machine. J Neurosci Methods 210:132–146
47. Sors A, Bonnet S, Mirek S, Vercueil L, Payen JF (2018) A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomed Signal Process Control 42:107–114
48. Subasi A (2005) Automatic recognition of alertness level from EEG by using neural network and wavelet coefficients. Expert Syst Appl 28:701–711

49. Tang B, Chen Q, Wang X, Wang X (2010) Reranking for stacking ensemble learning. In: International Conference on Neural Information Processing: Theory and Algorithms, pp 575–584
50. Van Sweden B, Kemp B, Kamphuisen HAC, Van der Velde EA (1990) Alternative electrode placement in (automatic) sleep scoring
51. Vanoli E, Adamson PB, Ba-Lin, Pinna GD, Lazzara R, Orr WC (2001) Heart rate variability during specific sleep stages. In: Computers in Cardiology, vol 2001, pp 461–464
52. Ververidis D, Kotropoulos C (2006) Fast sequential floating forward selection applied to emotional speech features estimated on DES and SUSAS data collections. In: Signal Processing Conference, 2006 European, pp 1–5
53. Willemen T, Van Deun D, Verhaert V, Vandekerckhove M, Exadaktylos V, Verbraecken J, Van Huffel S, Haex B, Sloten JV (2017) An evaluation of cardiorespiratory and movement features with respect to sleep-stage classification. IEEE J Biomed Health Inform 18:661–669
54. Xie J, Coggeshall S (2010) Prediction of transfers to tertiary care and hospital mortality: a gradient boosting decision tree approach. Stat Anal Data Min 3:253–258
55. Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Twentieth International Conference on Machine Learning, pp 856–863
56. Zhu G, Li Y, Wen P (2014) Analysis and classification of sleep stages based on difference visibility graphs from a single-channel EEG signal. IEEE J Biomed Health Inform 18:1813–1821

Dechun Zhao obtained his doctoral degree from the Bioengineering College, Chongqing University, in 2008. After that, he was with the Chongqing University of Posts and Telecommunications, where he became a Professor in 2016. His research interests include brain-computer interfaces, micro diagnosis and treatment systems, and electromagnetic safety.

Yi Wang is a postgraduate student at Chongqing University of Posts and Telecommunications (CQUPT), China. Her current research interests include digital signal processing and machine learning algorithms.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiaorong Hou is an Associate Professor and Supervisor of Postgraduates in the College of Medical Informatics, Chongqing Medical University, China. She received her master's degree in management from Chongqing University, China. Her research interests are in the fields of health information dissemination and information services.

Qiangqiang Wang is an MD student in BME at Chongqing University of Posts and Telecommunications, China. His interests include machine learning algorithms and digital signal processing.