Support Vector Regression Approach to Predict the Design Space for the Extraction Process of Pueraria Lobata
Total Page:16
File Type:pdf, Size:1020Kb
molecules Article Support Vector Regression Approach to Predict the Design Space for the Extraction Process of Pueraria lobata Yaqi Wang 1,2, Yuanzhen Yang 2, Jiaojiao Jiao 1, Zhenfeng Wu 2 and Ming Yang 1,2,* 1 College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 610072, China; [email protected] (Y.W.); [email protected] (J.J.) 2 Key Laboratory of Modern Preparation of Traditional Chinese Medicine, Ministry of Education, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China; [email protected] (Y.Y.); [email protected] (Z.W.) * Correspondence: [email protected]; Tel.: +86-791-8711-8658 Received: 2 August 2018; Accepted: 18 September 2018; Published: 20 September 2018 Abstract: A support vector regression (SVR) method was introduced to improve the robustness and predictability of the design space in the implementation of quality by design (QbD), taking the extraction process of Pueraria lobata as a case study. In this paper, extraction time, number of extraction cycles, and liquid–solid ratio were identified as critical process parameters (CPPs), and the yield of puerarin, total isoflavonoids, and extracta sicca were the critical quality attributes (CQAs). Models between CQAs and CPPs were constructed using both a conventional quadratic polynomial model (QPM) and the SVR algorithm. The results of the two models indicated that the SVR model had better performance, with a higher R2 and lower root-mean-square error (RMSE) and mean absolute deviation (MAD) than those of the QPM. Furthermore, the design space was predicted using a grid search technique. The operational range was extraction time, 24–51 min; number of extraction cycles, 3; and liquid–solid ratio, 14–18 mL/g. This study is the first reported work optimizing the design space of the extraction process of P. lobata based on an SVR model. SVR modeling, with its better prediction accuracy and generalization ability, could be a reliable tool for predicting the design space and shows great potential for the quality control of QbD. Keywords: Pueraria lobata; SVR; QPM; extraction process; QbD; design space 1. Introduction Radix puerariae (RP) is the root of Pueraria lobata, known as ‘Gegen’ in Chinese. RP was one of the earliest herbal resources used for food and medicine: it was firstly documented in “Shen Nong’s Herbal Classic” and classified as middle grade for the prevention and treatment of fever, diabetes, diarrhea, and cardiovascular and cerebrovascular diseases. Furthermore, modern studies have shown that RP extract exhibits potential bioactivity for the treatment of several immune disorders, such as atopic dermatitis [1], osteoporosis [2], and Alzheimer’s disease [3]. Isoflavonoids are believed to be the major active components in RP. Puerarin is the major and most important component in RP with extensive pharmacological activities such as hepatoprotection [4], anti-atherogenic effects [5], and anti-cancer effects [6]. The content of puerarin (≥2.4%) is regarded as the quality indicator of RP according to the Pharmacopeia of the People’s Republic of China. Extraction is a key operation process in the manufacturing of health food, dietary supplements, and medicine, especially for traditional Chinese medicine (TCM). The quality by design (QbD) concept has been applied in the pharmaceutical field as a well-established tool for both formulation and manufacturing process development. According to ICH guideline Q8, QbD is implemented with Molecules 2018, 23, 2405; doi:10.3390/molecules23102405 www.mdpi.com/journal/molecules Molecules 2018, 23, 2405 2 of 11 several steps, including risk analysis, diagnosis of potential critical quality attributes (CQAs) and process parameters (CPPs), construction of mathematical models, optimization of the design space, selection of a control strategy, and continual improvement in the product lifecycle [7,8]. Among these steps, design space, as a reliable operation range, has gained increasing attention in the food and drug industries. According to the definition of design space, the quality is guaranteed at any combination of independent variables (process parameters) within the space. Consequently, the design space is regarded as a zone of robustness, as no significant fluctuations should be observed within the space. Design space is established on the basis of a sound understanding of the effect of the interaction of CQAs and CPPs on the quality of the product. Therefore, statistical and multivariate analysis models are essential in the implementation of QbD. The predictive ability of the mathematical model shows the robustness and predictability of the design space. Furthermore, a well-fitted model helps us not only to gain a clear understanding of the connection and the intrinsic regular pattern between CPPs and CQAs, but also to gain regulatory flexibility. Thus, constructing a reliable model is first and foremost. The robustness of the model directly influences the batch-to-batch consistency of the quality of products. Response surface methodology (RSM), due to its data visualization and handling ability, has become the most widely used method to express multidimensional relationships between CQAs and CPPs. The quadratic polynomial model (QPM) is the most common algorithm for response surface optimization. Design of experiment (DoE) analysis techniques such as analysis of variance and fitted regression models are used frequently. A multivariate knowledge space may be delineated to find regions of risk or optimal performance, which are often graphically illustrated by figures and known as response surfaces. The literature has shown that the QPM has good performance when used for relatively simple and linear cases [9,10]. The QPM also has some well-known limitations that may affect the prediction accuracy [11]. The multidimensional relationships observed in the pharmaceutical area are often complex and nonlinear. The predictions of models based on the linear regression algorithm exhibit poor estimation [12]. Therefore, the QPM may not be the most applicable algorithm for accurate prediction of the design space of QbD. Support vector regression (SVR) is a promising kernel-based machine learning algorithm developed by Vapnik and Cortes [13]. The SVR approach can optimize complex nonlinear problems by using an exclusive objective function that minimizes the structural risk of the model. The introduction of the kernel function allows nonlinear problems to be linearly solved in a higher dimension compared with its original dimensional feature space. Thus, SVR has a global optimum and exhibits excellent prediction accuracy. Considering its remarkable generalization performance, SVR has attracted particular attention and been extensively used in applications including atmospheric science prediction [14], drug design [15], credit rating analysis [16], protein structure and function prediction [17], and metabolomics [18]. In most of these cases, the performance of the SVR model is better than that of traditional machine learning approaches. To our knowledge, few studies of the design space model have been developed based on the SVR model. The aim of this present study was to explore the practicability of using an SVR model for predicting a design space. For the RP extraction process, there are several analytical methods based on the QPM for response surface optimization [19]. A single algorithm may not be credible enough for model development. To the best our knowledge, there is no study in the literature comparing the QPM and the SVR algorithm for the extraction process of RP. Thus, in this paper, the RP extraction process was optimized as a case study. The extraction time, number of extraction cycles, and liquid–solid ratio were identified as CPPs, and the yield of puerarin, total isoflavonoids, and extracta sicca were the CQAs. Models between CQAs and CPPs were constructed using both the QPM and the SVR algorithm based on the Box–Behnken design. The performance of the two models was analyzed and compared. Then, the design space was calculated and optimized using a grid search technique. This is the first study on optimization of the design space of the extraction process of RP using the SVR algorithm. Molecules 2018, 23, 2405 3 of 11 2. Results 2.1. Box–Behnken Design Box–Behnken design is one of the most commonly used DoEs for RSM. Three influential factors (independent variables) for the extraction process of RP were investigated, including extraction time (X1, min), extraction cycles (X2, cycles), and liquid–solid ratio (X3, mL/g). Seventeen experimental runs were arranged using Box–Behnken design. Their experimental results were considered as dependent variables. In this work, Y1, Y2, and Y3 represented the yields of puerarin (%), extracta sicca (%), and total isoflavonoids (%), respectively. The entire dataset obtained using Box–Behnken design was considered as the training set and adopted to establish the QPM and SVR fitted models. To further evaluate the performance of the two fitted models, cross-validation and external validation approaches were adopted. Four sets of external validation values—random combinations of independent variables along with experimental responses—were used as the test set to evaluate the quality of the fitted models. The experimental datasets are shown in Table1. Table 1. Results of training and test set. Factors Response Variables No. X1 (min) X2 (cycles) X3 (mL/g) Y1 (%) Y2 (%) Y3 (%) Training set 1 35 3 5 3.93 28.64 16.00 2 35 3 15 5.22 34.11 43.86 3 10 3 10 3.63 27.53 28.95 4 35 2 10 4.03 28.06 16.40 5 60 3 10 4.74 34.44 26.82 6 35 2 10 3.90 27.97 20.60 7 35 1 5 1.22 13.52 39.20 8 10 2 15 3.24 25.18 27.57 9 60 2 5 3.24 26.67 13.37 10 10 1 10 1.76 14.82 7.73 11 35 2 10 3.93 28.25 15.41 12 60 1 10 2.40 23.47 12.66 13 35 2 10 4.03 30.40 23.03 14 60 2 15 4.28 33.46 36.94 15 35 2 10 3.77 28.90 21.69 16 10 2 5 2.32 19.67 8.46 17 35 1 15 2.56 20.74 20.70 Test set 1 25 2 10 3.50 25.92 22.70 2 30 2 8 3.45 25.37 17.62 3 15 2 15 3.46 26.35 34.41 4 20 2 15 3.45 27.78 35.40 2.2.