1

Real-time Identification Using Deep Learning for Linear Processes with Application to Unmanned Aerial Vehicles Abdulla Ayyad*1, Mohamad Chehadeh*1, Mohammad I. Awad1, and Yahya Zweiri1,2

1Khalifa University Center for Autonomous Robotic , Khalifa University, Abu Dhabi United Arab Emirates 2Faculty of Science, Engineering and computing, Kingston University, London SW15 3DW, UK *Equal contribution authors

Abstract—This paper proposes a novel parametric identifi- depending on the control requirements and design constraints. cation approach for linear systems using Deep Learning (DL) Parametric are -driven approaches where model and the Modified Relay Feedback Test (MRFT). The proposed of a Process Under Test (PUT) are identified methodology utilizes MRFT to reveal distinguishing frequencies about an unknown process; which are then passed to a trained based on observation data. Such approaches include prediction DL model to identify the underlying process parameters. The error methods (PEM) [6], [7], [8], maximum likelihood (ML) presented approach guarantees stability and performance in methods [9], [10], least square (LS) methods [11], the identification and control phases respectively, and requires response identification methods [12], [13], [14], and neural few seconds of observation data to infer the dynamic system network based methods [15], [16], [17]. Several studies in parameters. Quadrotor Unmanned Aerial Vehicle (UAV) attitude and altitude dynamics were used in and experimen- the literature applied these techniques to UAV operation with tation to verify the presented methodology. Results show the accurate identification results [18], [19], [20], [21], [22], [23]. effectiveness and real-time capabilities of the proposed approach, Nonetheless, these methods require extensive data generation which outperforms the conventional Prediction Error Method in and accurate selection of optimizer initial conditions, which terms of accuracy, robustness to biases, computational efficiency demand human experience and cause susceptibility to data and data requirements. biases and overfitting. Furthermore, most of these methods Index Terms—System Identification, Unmanned Aerial Vehi- are computationally expensive and not suitable for real-time cles, Robot Learning, Sliding Control, Process Control. application. On the other hand, non-parametric identification includes methods which rely on the partial knowledge of the PUT to MULTIMEDIA MATERIAL tune a predefined controller structure. Such methods include, A supplementary video for the work presented in this for example, the classical Ziegler-Nichols method [24], the paper can be accessed at: https://www.youtube.com/watch?v= Relay Feedback Test (RFT) [25], and the Modified Relay dz3WTFU7W7c Feedback Test (MRFT) [26]. In all these methods, the knowl- edge of the PUT response to a single excited frequency is enough to get controller parameters tuning. It was shown in I.INTRODUCTION recent work [27] that a near optimal controller for quadcopter Since the third industrial revolution, system identification attitude dynamics can be designed using MRFT based tuning arXiv:2004.08603v2 [eess.SY] 21 Apr 2020 has been a key element in the development of autonomous rules. However, non-parametric methods in the literature are technologies in a wide set of industrial applications. Accurate limited to PID tuning and do not provide full insight to the knowledge of enables the design of robust system dynamics as they cannot be used to obtain model and high-performance systems for prediction, planning and parameters. control. Unmanned Aerial Vehicles (UAVs) are an example of Recent advancements in the fields of Iterative Learning an autonomous system that has seen diverse utilization in areas Control (ILC), Reinforcement Learning (RL) and Deep Learn- of agriculture, disaster relief, remote sensing, surveillance, etc. ing (DL), and the growth of computational capabilities have [1], [2], [3], [4], [5]. UAVs are often deployed in uncontrolled given rise to new approaches of controller design and tuning environments, and are hence required to adapt to dynamic [28], [29], [30], [31], [32], [33], [34]. These approaches conditions in real-time with minimal sacrifice to functionality have introduced advantages in regards to accuracy of models and performance. and controllers, adaptation time, and the ability to handle To meet the aforementioned requirements of autonomy, nonlinearities in the PUT; with the limiting requirement of extensive research have been carried out to develop effective abundant observation data. Similar to non-parametric tuning, methods of system identification and adaptation. These meth- these approaches do not generate explicit estimates of model ods are generally classified as parametric or non-parametric parameters, but rather implicitly consider these parameters in 2

Fig. 1: The proposed system identification scheme. The control switch starts at position (a) where MRFT is used to excite an unknown plant G(s) with stable oscillations. These oscillations are pre-processed and forwarded to a DL classifier to identify the system parameters as Gˆ(s). After the PUT was identified, a suitable controller C∗(s) can be designed and applied to the plant by shifting the control switch to position (b). controller design. which highlight its advantages over other existing methods. This paper bridges the gap between parametric and non- First, the required amount of data needed for identification parametric methods and presents a novel methodology to is minimal as it consists of a single excited frequency of the infer accurate estimates of model parameters with the prime system. In contrast, other data driven classical identification motivation of designing high-performance controllers online methods used in the literature require extensive data generation and in real-time. The novelty of the proposed methodology lies [6], [7], [8], [9], [10], [13], [14]. Due to the reduction in in utilizing self-excited oscillations (i.e. chattering) resulting data requirements, the proposed methodology does not re- from a sliding mode controller, which is the MRFT, to reveal quire human experience for data generation, and the model distinguishing information about the PUT. The information fitting process is not prone to data bias. This makes the revealed by MRFT are then fed to a DL classifier which proposed method precise and accurate in identifying unknown selects model parameters that best represents the PUT. Fig. model parameters. Other advantages of data reduction include 1 illustrates the proposed comprehensive system identification minimizing computational requirements and shortening the approach. We show that this identification method can be period of the identification phase. In fact, we have found performed in real-time such that a UAV adapts to changes that the required computational time on modern commercial to its own physical dynamics during a flight mission. The processors is in the order of milliseconds; and the identification suggested online identification methodology is safe with guar- phase lasts for a maximum of a few seconds (this depends on anteed stability, and results in controllers with assessable levels the PUT dynamics, e.g. mass). of robustness and performance. The presented approach is The second inherent feature of the presented identification mainly applied to Second Order with Integrator Plus Time method is the guarantee of stability during the identification Delay (SOIPTD) linear systems; but is also applicable to other phase. This relieves the need for hand-tuned initial stabilizing system models with equal or lower number of model parame- controllers as opposed to classical identification methods, ters. Due to its real-time capabilities, the proposed technique and makes the presented method less prone to estimation can handle static nonlinearities by applying the identification biases and non-optimality caused by the selection of initial process in multiple operation modes (e.g. near ground hovering control parameters (i.e. initial parameters for the optimizer or drag dynamics caused by large translational speeds); to decision variables). As stability is guaranteed, the PUT can obtain locally linear descriptions of the system. be directly started in the identification phase as demonstrated There are two inherent features of the proposed approach in the results section; thus further minimizing the required 3

TABLE I: Qualitative comparison with selected recent methods from literature addressing the problem of automatic controller tuning and adaptation for UAVs

Method Experimental data gen- Computational Stability Comments eration Resources

This work Single steady-state oscil- A few millisec- Guaranteed in data gener- Not investigated for lateral lation at a specified phase onds with mod- ation phase by Loeb crite- UAV motion yet. Provides ern on-board pro- rion [35], [26] PUT model parameters cessors

Sim-to-(Multi)-Real using Not required Inference model No stability guarantees This work’s problem proximal policy optimiza- is running in real- statement is closest to tion (PPO) RL [32] time the one presented in this paper, but with no guarantee of stability or explicit inference of model parameters

ILC [28] Requires a lot of iterations Computationally Stability guaranteed Feedforward expensive within ILC iterations compensation terms

Deep Model-Based Rein- Requires a lot of experi- Computationally Not guaranteed Early adaptation of forcement Learning [29] mental data expensive model-based RL

Learning through Requires a lot of experi- Computationally Guaranteed stability dur- Gaussian processes with mental data expensive ing learning within model Bayesian optimization uncertainty margins [30]

Non-parametric tuning of Single steady-state oscil- Negligible Guaranteed stability dur- Does not provide model UAV inner loops [27], lation at a specified phase ing identification parameters [36]

Heuristics based tuning Requires a lot of experi- Low Not guaranteed and [37] mentation data computational subject to selected resources optimization parameters constraints operation time to estimate model . Table I provides cation and the non-parametric tuning rules of [27]. Finally, a qualitative comparison between the method suggested in this Section V summarizes the findings of this paper and provides paper and other related work in literature. Additionally, the concluding remarks. Results section provides a quantitative comparison between this paper’s method and two other identification methods: PEM II.PROBLEM STATEMENT and non-parametric tuning based on MRFT. Considering an LTI system G(s) with known model struc- The main contributions of this paper can be summarized as: ture and unknown set of bounded model parameters ~p ∈ • We introduce a novel approach of parametric system iden- (D ⊂ RQ) where Q corresponds to the number of unknown tification with guaranteed stability, real time capabilities, model parameters in G(s). Let us also assume a feedback and minimal requirements of observation data. controller C(s) that acts on G(s). Given vector X ∈ R2N • We optimize the identification phase to reveal distinctive that contains N uniformly sampled measurements of both information about the PUT by means of finding the the process variable pv and controller output u signals. Inner distinguishing phase for a set dynamic systems. system states are considered to be unobservable. We wish to • We present a discretization technique to address system find the mapping Γ: X → D¯ where D¯ is a discretization of identification as a classification problem by utilizing the the subspace D. concept of controller performance deterioration. In this work, we limit the order of the LTI system to SOIPTD, • We devise a modified formulation of the Softmax ac- which corresponds to multirotor attitude and altitude dynamics tivation function that adds a meaningful discrepancy to as presented in [27]: the cost of misclassification, leading to faster and more K e−τs accurate training of the DL model. G(s) = eq (1) s(T s + 1)(T s + 1) The remainder of this paper is organized as follows. The prop body system identification problem is first formulated in Section where Keq is the overall lumped gain of the system, τ is II. Section III describes the comprehensive identification ap- the overall observed delay in the system, Tprop is the time proach proposed in this paper. In section IV, simulation and constant associated with propulsion dynamics, and Tbody is the experimental results for the suggested approach are presented, time constant associated with body dynamics. These dynamics discussed and compared against PEM based system identifi- relate motor commands sent by the flight controller to the 4

observed roll, pitch, or altitude. The considered attitude and where b1 = −βemin and b2 = βemax. emax > 0 and emin < altitude dynamics are subject to measurement noise ℵ and 0 are respectively the last maximum and minimum values of forced bias u0 caused by external disturbances (e.g. gravity) the error signal after crossing the zero level; and uM (t−) = or sensors bias. lim→0+ uM (t − ) is the previous control signal. Prior to the start of MRFT the maximum and minimum error values are set III.METHODOLOGY as: emax = emin = 0. β is a constant parameter that dictates the phase of the excited oscillations as: A. Finding the Distinguishing Phase The identification method presented in this paper builds on ϕ = arcsin (β) (5) two propositions. Using the Describing Function (DF) method, it could be Proposition 1: There is a distinguishing phase ϕd at which shown that the MRFT achieves oscillations at a specified phase the sustained self-excited oscillation characteristics can be angle by satisfying the Harmonic Balance (HB) equation [39]: used to identify the corresponding processes in D¯. Proposition 2: The distinguishing phase ϕd corresponds to Nd(a0)Wp(jΩ0) = −1 (6) the optimal tuning rules of [26]. As such, ϕ can be deter- d N W a Ω mined by the process of designing optimal non-parametric where d is the DF, p is the process under test, 0 and 0 tuning rules as outlined in [26], [38]. are the amplitude and frequency of the steady state oscillations, respectively. The DF of MRFT is presented in [26] as: For completeness, we summarize the steps to obtain optimal non-parametric tuning rules as follows [27], [26], [38]: 4h p 2 Nd(a0) = ( 1 − β − jβ) (7) 1) Select process model and the of the normalized πa0 model parameters. The DF method provides an approximate solution that is 2) Discretize the selected range of model parameters to a valid only if Wp(s) has sufficient low pass filtering properties. finite set of dynamics processes. It is worth mentioning that the MRFT control signal uM (t) 3) Select tuning rule specifications based on gain margin has a phase lead relative to the error signal e(t) in the case of or phase margin requirements. β < 0, and lags in the case of β > 0. The MRFT DF intersects 4) Generate a locally optimal tuning rule for every process the Nyquist plot in the second quadrant for β < 0; while this in the range. intersection occurs in the third quadrant when β > 0. The 5) Apply every locally optimal tuning rule to all other pro- Relay Feedback Test (RFT) [25] could be thought of as a cesses in the range. Note the performance deterioration special case of the MRFT algorithm where β = 0. For our for every process due to the application of the non- case, the value of the MRFT parameter β that corresponds to ◦ optimal tuning. ϕd = −46.89 is βd = sin(ϕd) = −0.73. 6) Select the tuning rule with the least deterioration in performance as the global optimum. B. Discretization of System Parameters’ Subspace In our case, the model structure is SOIPTD; therefore, the vector ~p that contains the model parameters is defined as: System identification has been generally considered in the literature as a regression problem [40], [41], [42]. However, T   training a deep learning regression model can raise several in- ~p = , ~p ∈R3 (2) Tprop Tbody τ stability and complexity concerns as suggested in [43], where a DL network was used for age prediction. To avoid these and the model parameters range is selected to be shortcomings with DL regression, the system identification  T  T conundrum in this study is formulated as a classification 0.015, 0.2, 0.0005 ≤ ~p ≤ 0.3, 2, 0.1 (3) problem by discretizing the parameter space D into N unique sets of system parameters D¯ = {G1,G2, ..., GN }. System which includes a wide variety of multirotor UAV designs and identification hence becomes the problem of selecting a candi- sizes from racing quadrotors to larger multirotor UAVs having date set of process parameters G within D¯ that best resembles a takeoff weight of up to 50Kg. i the dynamics of the ground truth dynamic system Gact. Using the aforementioned method to generate optimal non- A trivial approach to obtain D¯ would be discretizing D ϕ −46.89◦ parametric tuning rules, we found d to be . In based on an equispaced distance of the model parameters ~p. practice, a self-sustained oscillation with a specific phase can Assuming that the equi-space distance was small enough to be excited using MRFT. MRFT is an algorithm that can be represent all the pivotal processes, D¯ would end up being an realized with the following equation [26]: over-discretized representation of D. Adjacent processes of a given subspace of D¯ would have similar frequency response uM (t) =  characteristics while adjacent processes in another subspace  of D¯ would have vastly different frequency response charac-  h if e(t) ≥ b1 or (e(t) > −b2 and uM (t−) = h) teristics. Thus, a trained classifier would be biased towards  the subspace where adjacent processes have similar frequency  −h if e(t) ≤ −b2 or (e(t) < b1 and uM (t−) = −h) response characteristics. Therefore, a meaningful criterion for (4) discretization must be developed to ensure a balance between 5

several weeks depending on the selected parameters range of D). For that we propose reducing the dimensions of the parameters space by transforming the describing subspace D from rectangular to spherical coordinates. The transformation is given by:

q 2 2 2 r0 = Tprop + Tbody + τ

θ = arctan ( Tbody ) (10) Tprop

τ φ = arccos ( r ) It is worth noting that the parameter r in (10) represents time scaling of process parameters ~p as in s0 = rs. This allows us to introduce two properties of the spherical representation that will make the discretization process more efficient. Property 1: For subsequent time scaling of a system G(s) Fig. 2: Showing a representation of D and S with discretized i along the radial direction G(αs) ,G(α2s) , the joint cost processes in S¯ shown in green and discretized processes in j k between successive scaled systems remain constant as in: D¯ shown in red. The discretized processes in D¯ are denser at τ τ Jij = Jjk,Jji = Jkj for α ∈ R>0. parts of D where the ratio of T and T are highest. prop body Property 2: Considering two radially scaled systems: G(s)i and G(α1s)j with a joint cost Jij, and another pair of radially scaled systems G(s) and G(α s) with the same joint cost the distinguishability of these processes (i.e. in terms of k 2 l J = J ; then J ∗ = J = J for α , α ∈ . their frequency response characteristics) and their accuracy kl ij (ik) (jl) 1 2 R>0 Property 1 allows us to discretize a subsurface of a sphere in representing D. For this purpose, a joint cost function is S that we choose its radius to satisfy r = ||~p ||, where introduced based on the concept of controller performance 0 min ~p corresponds to the minimum model parameters set in deterioration. To illustrate such joint cost function, let us min D. This effectively reduces the discretization problem by one consider {G (s),G (s)} ∈ D¯; the joint cost associated with i j dimension. Fig. 2 provides a three-dimensional illustration applying C (s), which is the optimal controller of process i of D and S. To discretize S we set J ∗ = 10% and we G (s), to the process G (s) would be given by: i j proceed with the discretization by varying the values of φ J(Ci(s),Gj(s)) − J(Cj(s),Gj(s)) and θ. To prevent excessive discretization and to increase Jij = × 100% (8) J(Cj(s),Gj(s)) robustness of controllers against varying model parameters and linearization assumptions, we impose phase margin constraints where J is a cost function relating a controller C(s) to a on controllers used to find joint cost function J(ij). The process G(s) (i.e. IAE, ISE, etc.). The self-joint cost is defined phase margin constraint can be imposed using a set of three by the case of i = j, where Jij = 0. For the case where equations. The first equation relates PID parameters with the i 6= j, Jij > 0 by definition. It must be noted that the joint PUT amplitude and frequency responses when a steady state cost function is non-commutative, i.e. Jij 6= Jji. Therefore, oscillation is excited at a certain phase [38]: J(ij) = max{Jij,Jji} is used as the discretization criteria to provide a performance guarantee among adjacent members of 4h 2π 2π Kc = c1 ,Ti = c2 ,Td = c3 (11) D¯. In this paper, we have chosen ISE as a system performance πa0 Ω0 Ω0 index. ISE cost function is given by: where c1, c2 and c3 are called the homogeneous tuning rules parameters. These parameters are related with the excitation 1 Z Ts J (C,G) = e(C,G)2dt (9) phase ψ characterized by the MRFT parameter β through the ISE T d s 0 two following equations [38]: where in this paper we use J := JISE for convenience. 1 For discretizing D, we choose a desired value of the joint β = sin(φm + arctan( − 2πc3)) (12) 2πc cost between adjacent processes J ∗. In this paper, we use 2 ∗ and: J = 10% as it provides sufficient accuracy without requiring r 1 2 excessive simulation time (simulation time have a cubical c1 1 + (2πc3 − ) = 1 (13) relationship with the reciprocal of J ∗). To find the process 2πc2 adjacent to a known one, Gi, we use an optimizer (Nelder- where φm is the imposed phase margin constraint. In this ◦ Mead simplex algorithm realized by ”fminsearch” function in work, we choose φm = 20 . A higher imposed value of MATLAB R have been used) that takes a vector of model the phase margin constraint will result in lower number of parameters ~pj as the set of decision variables and uses E = discretized processes. A modified version of Nelder-Mead ∗ 2 (J −J(ji)) as a cost function. We have found that discretizing simplex algorithm that accepts constraints on optimization D requires excessive simulation time (might take days to decision variables is used to realize (12) and (13). Then we 6

in practical settings. Thirty were performed for each candidate system in D¯ to produce a training set of size 6240. Five additional simulations per system were carried out to generate 1040 samples to be used as a verification set. The pre-processing steps undertaken to prepare the DL input data can be summarized as: adjustment, cropping, zero-padding, amplitude normalization, and concatenation. Fig. 4 illustrates these pre-processing steps. Sampling period was fixed to be 1ms. The size of the input vector was set to be X ∈ R2×2260 to accommodate the response of the slowest system in D¯ (i.e. a period of 2.26s). Fig. 5 shows the structure of the developed deep learning model. The DL network consists of two hidden layers of size 3000 and 1000 respectively. This structure was chosen upon testing with several DL models of different depth and Fig. 3: A projected side view showing parameters space D width up to four layers and 10000 neurons, as it showed the ∗ and the surface S with radius r0. J is achieved by using the best performance with suitable performance on a signle-core ¯ time scales α1, α2, α3. Note that {G1(s),G4(s),G8(s)} ∈ S processor. Convolutional Neural Networks were also tested ¯ ∗ and {G2(s),G5(s),G6(s),G9(s)} ∈ D. J = J45 = J56 with no noticeable performance improvements. Rectified Lin- ∗ illustrates property 1. Property 2 is illustrated by J = J14 = ear Units (ReLU) were utilized as the activation function for J35. G2(s) and G9(s) represent a scaled version of processes both hidden layers due to its simplicity, reliability, and to avoid ¯ G1(s) and G8(s) respectively in D. gradient vanishing [45]. Dropout is used after each layer for its regularization effect to avoid overfitting and prompt a noise rejection behavior [46]. Batch normalization is applied to the proceed by finding S¯ which is the set of the discretized outputs of the hidden layers to accelerate training and to add processes in S. Once we have S¯, we find the scaling parameter a slight regularization effect [47]. The output layer consists of α for every process in S¯ as proposed in property 1. Fig 3 208 neurons, one for each system in D¯. illustrates these steps with the properties 1 and 2. Property 2 The Softmax activation function and the Cross-entropy loss guarantees that all adjacent systems have a joint cost within function are among the most utilized combinations for training J ∗. Fig 2 shows the set of discretized processes. deep learning networks. For the case of system identification, a For the parameters range presented in (3), we found the conventional application of this combination lacks in the sense size of D¯ to be 208 processes. The discretized processes are that the cost of incorrect classifications is identical regardless denser at the parts of D were the ratio between the time of the corresponding error in parameter space. To undermine delay τ and the other process time constants is the highest. this shortcoming, we introduce a modified formulation of the Therefore, the identification and control of small UAVs with Softmax function. The modified formulation utilizes the joint sensors and actuators that have high delays is found to be more cost function presented in (8) to add a meaningful discrepancy challenging. th to the cost of misclassification. For the i logit ai in the output layer corresponding to a process Gi, the modified Softmax C. Deep Learning Model Development and Training probability pi is introduced as: In this section, the process of developing and training a deep learning model for system parameter identification is eγiT ·ai pi = N (14) discussed. The objective of the DL model in this study is P γjT ·aj j=1 e to find the mapping Γ: X → D¯ as illustrated in Section II. The input X to the DL classifier is a uniformly time sampled where T is the class corresponding to the ground truth system vector concatenating the controller and plant response while, GT , N is the size of the output layer, and γiT = 1 + JiT . the output of the DL netwrok is one of N sets of process parameters in D¯. The DL network was trained using a Stochastic Gradient Training data was generated based on members of D¯. The Descent approach with the cross-entropy L = ¯ PN MRFT response of each system within D was simulated − i=1 yi log (pi), where y is a one-hot encoded vector that multiple times according to the process diagram shown in indicates the ground truth class T . The partial derivative of L Fig. 1. Measurement noise power ℵ was randomly varied with respect to output layer logits ai when using the Softmax between different simulations to add a regularization effect function in (14) is calculated as: and prevent over-fitting [44]. To further prompt robustness and ∂L generalization against varied testing conditions, simulations = JiT × (pi − yi) (15) ∂ai were carried out with varied values of forced input bias u0, which introduces asymmetry to the MRFT controller output. For an exact derivation of (15), readers can refer to The values of u0 were limited to half the relay amplitude Appendix A. h of the MRFT controller as a reasonable bias magnitude 7

Fig. 4: Pre-processing the DL classifier input vector: (a) The system’s MRFT response is obtained and sampling time is adjusted to be 1ms. (b) A single cycle of the steady state oscillation is selected, zero-padding is applied elsewhere. (c) The response is zero-center and scaled to an amplitude of 1. (d) PV and u are concatenated to form a 1D vector

measurement noise ℵ and input bias u0. A single inference run of the DL model on a single core of an i5-6300U processor requires 5 ms, which reflects the suitability of the developed deep learning framework for real-time identification applications.

IV. RESULTS

This section evaluates the effectiveness of the comprehen- sive system identification approach proposed in this paper. The testing approach follows the framework indicated by Fig. 1; where the MRFT response of a dynamic system is obtained and passed to a DL model to predict the parameters of the PUT. Evaluation was performed using simulation data in addition to experimental tests on a UAV. Simulation results Fig. 5: DL model architecture. The input vector length is 4520. are compared against PEM as a well-established system iden- Layer 1 contains 3000 neurons and Layer 2 contains 1000 tification method and the non-parametric tuning rules as an neurons. optimal controller design criteria.

TABLE II: DL verification set results on simulation data A. Simulation Results Modified Softmax Conventional Softmax Classification Accuracy 38.46% 30.38% Fifty different system parameter combinations were ran- domly sampled from the parameter space D to form a testing Average JpT 0.30% 0.41% set D¯ test. Unlike the DL verification set in III-C, these systems Maximum JpT 5.03% 13.29% are not members of the discretized parameter space D¯, and are Minimum φm 13.73 4.89 thus better suited to evaluate the generalization performance of the inclusive system identification solution. Testing data is generated by simulating the MRFT response for each system Table II demonstrates the classifier’s performance on the in D¯ test under varied u0 and ℵ. Data is then pre-processed verification set utilizing both the modified and conventional and passed to the DL classifier to predict the parameters of a Softmax formulations. Although the DL network was used as system Gp. a classifier, classification accuracy is not a suitable measure The controller performance deterioration JpT is utilized of the performance as it does not reflect the error between the to evaluate the accuracy of identification. The and predicted system Gp and GT . As such, controller performance maximum deterioration values for the testing set are shown in deterioration JpT is considered as a better and impartial eval- Table III using both the conventional Softmax function and the uation criteria, especially as it was the basis for discretizing modified formulation in (14). Both the average and maximum ∗ the system identification problem. JpT are within J = 10%, which validates the proposed Results show that although classification accuracy is rela- means of system identification. The modified Softmax formu- ∗ tively low, the average joint cost JpT is well below J = 10%. lation outperforms the conventional one as it results in lower The modified Softmax formulation results in a generally controller performance deterioration and provides a larger lower JpT than the conventional Softmax function, particularly margin for φm. These results demonstrate the generalization when comparing the maximum cost of misclassification. These capabilities of the system identification DL framework to results highlight a promising performance of the developed accommodate for the full parameter space D under different DL scheme in system identification applications under varied conditions of system bias and measurement noise. 8

TABLE III: DL testing set Dtest results on simulation data PV u Modified Softmax Conventional Softmax

Average JpT 0.53% 0.75%

Maximum JpT 3.51% 6.38%

Minimum φm 15.53 12.15

B. Comparison with the prediction error method The system identification performance of the proposed method was benchmarked against PEM using the same testing set D¯ test. PEM was implemented using Matlab’s System (a) Identification Toolbox [48] with Sequential Quadratic Pro- gramming as the search algorithm. To generate input/output estimation data for PEM, the closed-loop response of each PV u system in D¯ test was simulated under varied u0 and ℵ. MRFT was chosen as the closed-loop controller is due to its guarantee of stability for all systems in D. Accordingly, two different sets of estimation data were simulated for each PUT in order to assess PEM’s performance against the amount of observation data. The first set (Estimation Set-I) consists of a single cycle of the MRFT response with the distinguishing phase β = −0.73 of section III-A. Fig. 6-a shows a sample PEM estimation data of the first set, which is on par with the (b) requirements of the DL approach proposed in this paper. The second set (Estimation Set-II) consists of 20 seconds of the Fig. 6: Sample of PEM input/output estimation data (a) simulated MRFT response with the β parameter continuously Estimation Set-I with a single cycle of MRFT response (b) swept from βmax = −0.1 to βmin = −0.9; resulting in Estimation Set-II with multiple cycles of MRFT response with oscillations of different magnitude and frequency as shown in varied β. Fig. 6-b. Finally, the most sensitive system in D for parameters variation which is Ginitial = [0.1, 0.015, 0.2] was selected as the initial guess for PEM predictions in this study unless data enhances the prediction accuracy of PEM as indicated explicitly stated otherwise. by results on Estimation Set-II; but it requires significantly larger processing time and still does not satisfy the maximum TABLE IV: Comparison of system identification performance deterioration target of J ∗ = 10%. on simulation data for testing set D¯ test By examining the cases where PEM fails to produce accu- rate estimations, two main factors affecting PEM performance PEM were identified. The first of which is PEM’s requirement of DL a good initial guess. Table V shows PEM predictions for a Estimation Set-I Estimation Set-II system GT : {Tprop = 0.02,Tbody = 0.3, τ = 0.001} with ∗ ∗ Average JpT 10.82% 4.41% 0.52% different initial guesses. PEM predictions differ significantly with the initial guess and do not consistently converge to a Maximum J unstable unstable 3.51% pT suitable solution. By contrast, our proposed approach does Number of unstable 5 1 0 not require an initial guess or prior knowledge of the PUT predictions to generate appropriate system identification results. Computation time 1.8936 4.8740 0.005 The second factor disturbing PEM predictions is the input per inference (s) bias u0. Table VI compares identification results for a system GT : {Tprop = 0.02,Tbody = 0.3, τ = 0.001} under different *: Excluding unstable predictions. values of input bias using PEM and the suggested deep Table IV compares PEM identification results against our learning method. Referring to the results in Table VI, PEM DL approach in terms of controller performance deterioration predictions worsen as input bias get larger; by contrast, no J and computation time per inference. Unstable predictions significant differences are observed using our deep learning pT system identification method. are defined as those that results in JpT which grows to infinity with time (note that a condition to handle steady-state error is used). PEM results on Estimation Set-I show that unlike C. Comparison with non-parametric tuning rules the deep learning approach, PEM fails to generate reliable In order to assess the significance of the deep learning predictions of system parameters from just a single cycle of system identification solution for controller synthesis, it was the MRFT response. Increasing the amount of observation benchmarked against the non-parametric tuning rules of [27] 9

TABLE V: PEM system identification results with different D. Experimental Results initial guesses for a simulated process GT : {Tprop = The approach presented in this paper was implemented 0.02,Tbody = 0.3, τ = 0.001} to independently identify the altitude and attitude dynamics

Initial Guess Controller Performance Deterioration JpT of a UAV. We utilized the Quanser QDrone as the testing platform for our . The onboard IMU data was Gp: {Tprop, Tbody, τ} Estimation Set-I Estimation Set-II fused with Optitrack’s motion capture system to estimate the {0.015, 0.2, 5 × 10−4} 0.0% 6.7% pose of the drone. The procedure summarized in Fig. 1 is applied to a single control loop of the QDrone to identify its {0.015, 0.2, 0.1} unstable 21.2% underlying process parameters and optimal PD controller. It {0.3, 2, 0.0005} unstable unstable must be noted that an optimal controller is initially designed offline for each system in D¯ to form a lookup table of {0.3, 2, 0.1} unstable 26.0% optimal controller parameters. During operation, the process {0.15, 1, 0.05} unstable 23.9% parameters are identified in real-time and optimal controller parameters are selected from the pre-designed lookup-table. One advantage of our approach is the guarantee of stability TABLE VI: Comparison of system identification performance during the identification phase by the MRFT controller [26]. on simulation data for process GT : {Tprop = 0.02,Tbody = Therefore, the identification procedure can be safely carried 0.3, τ = 0.001} with varied input bias out without the need for prior knowledge of the underlying system dynamics. To illustrate this feature, identification of Controller Performance Deterioration J pT the altitude dynamics was done with the UAV starting from Input Bias u0 PEM DL the ground with no PD controller. A generalized controller consisting of the summation of a MRFT controller and an Estimation Set-I Estimation Set-II Identification integrator was used to elevate the UAV and excite stable 0.0 0.0% 0.0% 0.0% oscillations around a predefined set-point. The objective of the integrator action is to counter the gravitational force. The −0.1 × hmrft 0.0% 0.0% 0.0% following equation illustrates the quadrotor takeoff controller:

−0.2 × hmrft 0.0% 11.13% 0.0%   R  ki (zref − z)dt if z < zref or z˙ < z˙max −0.3 × hmrft 0.0% 15.95% 0.0%  ui(t) = −0.4 × h  mrft 7.49% 16.09% 0.0%  ui(t−) otherwise (16) where ki is a constant gain, z and z˙ are the altitude and altitude as it has comparable data and computational time require- change rate respectively, zref is the set point for altitude, ments. These tuning rules were used to infer optimal PD z˙max is the maximum allowed altitude rate, and ui(t−) is controllers for each system in D¯ test. Using these rules, an av- the previous controller output. erage performance deterioration of 1.67% and a maximum of Once a steady state oscillation is acquired, the identification 13.29% were observed on the testing set. By comparison, the scheme takes place using the pre-trained DL classifier. Based approach presented in this paper reduces these deterioration on the identified process parameters, an optimal PD controller values to a third as indicated in Table III. These performance is inferred and applied to the plant. Fig. 7 shows the altitude improvements would be difficult to observe in practice as both and the controller action during the end-to-end identification approaches result in a low performance deterioration value. and control process. The proposed take-off method was ca- The main advantage the work presented in this paper holds pable of stably lifting the UAV while simultaneously excit- over the non-parametric tuning rules lies in its scalability to ing oscillations. The DL network then identified the process various controller structures and system models. The proposed parameters as Gh : {Tprop = 0.0321,Tbody = 1.6886, τ = approach identifies the dynamic model parameters, which 0.0237}. Accordingly, the ISE optimal controller parameters ∗ enables the design of a wide set of controllers fitting to specific were selected as Ch : {Kp = 59.0220,Kd = 9.0356}. The practical and performance requirements; as opposed to the PID controlled system response demonstrates a stable and smooth structure limitation of the tuning rules. Accurate knowledge of performance. In the absence of a ground truth system, this the model parameters can further be utilized in designing other favorable controller performance indicates the effectiveness sub-systems such as: trajectory generation, state estimation, or of the presented identification technique in targeting realistic multi-loop cascaded controllers. The presented solution can be control applications. extended to any parametric system identification or controller The identification for the altitude dynamics was tuning problem with minimal modifications to the approach repeated with a payload of 400g attached to the drone; which given the constraint on the number of unknown parameters. corresponds to a 30% increase in the drone’s mass. The In contrast, adapting the tuning rules for different model optimal PD controller parameters identified under the increase structures would require extensive theoretical adjustments and in mass were {Kp = 69.9732,Kd = 11.0002}, which shows ∗ analysis. a reasonable inflation over the nominal parameters in Ch. 10

TABLE VII: cross performance deterioration matrix showing controller performance deterioration Jij for DL system identi- fication of different cycles of the experimental MRFT response in Fig. 8 X XX cycle j XX 1 2 3 4 5 cycle i XXX 1 - 0.19% 0.0% 0.9% 0.0%

2 0.77% - 0.0% 1.58% 0.01%

3 1.25% 0.27% - 1.97% 0.35%

4 0.0% 1.03% 0.45% - 0.0%

5 0.14% 0.64% 0.23% 0.95% - Fig. 7: UAV experimental results for the altitude control loop. In the takeoff phase the algorithm presented in (16) was used. Note that MRFT takes a few oscillations to reach steady state. A single oscillation at steady state is selected as the input to drone’s position. To undermine the translational drift, a bias the trained DL classifier. compensation technique was implemented where the output of the roll channel’s sub-optimal controller C(s) is filtered and stored prior to initiating the identification procedure. The To assess the precision of the proposed identification filtered output is then used to offset the MRFT controller scheme, the DL framework was tested with multiple cycles during the system identification process. of the MRFT response as indicated in Fig. 8. Each cycle Fig. 9 shows the results of the roll channel identification serves as an independent input to the DL model, which experiments during the subsequent identification and control predicts system parameters accordingly. Table VII shows the phases with and without compensating for biases. When ap- cross-performance deterioration matrix among the multiple plying the bias compensation technique, the DL classifier iden- identified systems from multiple steady-state oscillations cor- tified the process parameters as Gr : {Tprop = 0.02,Tbody = responding to a single MRFT run. The maximum joint cost 1.6889, τ = 0.0121}, and the corresponding optimal PD con- ∗ observed was 1.97% despite noticeable noise and variations troller parameters were Cr : {Kp = 1.1000,Kd = 0.0985}. among subsequent cycles of the response, which illustrates The PD control stage shows a positive response of the attitude the precision and noise rejection capability of the trained DL dynamics to the designed controller and no substantial drift in classifier. the drone’s position was observed. For the experiment where bias is not compensated for, the identified optimal controller ∗ parameters were Cr− = {Kp = 1.1082,Kd = 0.1013}. 1.05 100 ∗ ∗ Cr− is almost identical to Cr , and would cause a practi- 1 2 3 4 5 80 cally negligible performance deterioration of less than 0.2% when applied to Gr. This demonstrates the robustness of 60 the developed identification framework under different testing 1 conditions, which can widen its scope of application to a broad 40 range of practical control problems. Altitude (m) 20 E. supplementary material Motor Commands (%) 0.95 0 For better visualization of the experimental results, readers 0 0.5 1 1.5 2 2.5 are encouraged to refer to the supplemental video in [49], Time (s) which better highlights the performance of the proposed Fig. 8: Multiple cycles of the experimental MRFT response identification and control framework when applied to altitude for the UAV height control loop. and attitude loops of a quadrotor UAV. In addition to showing the capability of devising high-performance controllers in The identification process was further applied to the roll real-time, the video demonstrates the robustness of these dynamics of the QDrone. The closed loop system starts with controllers to several artificial disturbances applied during a sub-optimal stabilizing controller C(s) that enables that operation time. These disturbances mimic practical conditions UAV to safely take-off; the system identification procedure a UAV might encounter during a mission flight; and include is then applied and an optimal controller C∗(s) is inferred weight changes, external nudges and induced wind speeds up accordingly. It must be noted that the outer-loop position con- to 5 m/s. The UAV sustains stability and performance despite troller is disabled during the identification phase to maintain the extreme conditions; which highlights the robust capabilities a constant reference value for the MRFT controller. As a of the presented methodology and its applicability to practical result, inherent system biases can cause a lateral drift in the identification and control problems. 11

Fig. 9: UAV experimental results for the roll control loop: (a) Without bias compensation for the outer-loop, inherent system biases cause a drift in the lateral position of the drone. (b) With bias compensation, lateral drift is minimized. Proper system and controller parameters for the roll dynamics were identified in both scenarios.

V. CONCLUSION of which makes the presented approach applicable to a wide This paper introduced a novel approach for linear systems set of practical control problems. identification of a degree up to SOIPTD. The proposed method For future work, we aim to evaluate the proposed iden- combines MRFT and DL to obtain a distinctive frequency tification scheme for higher order systems by extending the response from an unknown plant, and map this response to techniques of finding the distinguishing phase and parameter a set of process parameters. The design of the identification space discretization to higher dimensions. procedure is presented as finding the distinguishing phase for a family of processes with the same model structure. APPENDIX A System identification is then approached as a classification DERIVATIVE OF THE MODIFIED SOFTMAX FUNCTION WITH problem by using the principle of controller performance THECROSS-ENTROPYLOSSFUNCTION deterioration. Subsequently, a DL classifier is constructed and trained on noisy simulated MRFT responses. The end-to-end The derivative of the Softmax probability pi from (14) with identification process takes place online requiring few seconds respect to the kth logit can be obtained using the quotient rule: of observation data and microsecond level inference. The suggested approach was verified through simulation and experimentation. Experiments were carried out to identify the ∂p i = altitude and attitude dynamics of a UAV. Results show the ∂ak N N effectiveness of the presented techniques by demonstrating ∂ (eγiT ·ai ) · P eγjT ·aj − ∂ (P eγjT ·aj ) · eγiT ·ai stability in the adaptation phase, accuracy of identification, and ∂ak j=1 ∂ak j=1 N P γjT ·aj 2 real-time computation capabilities. The proposed method was ( j=1 e )

∂ γiT ·ai PN γjT ·aj γkT ·ak γiT ·ai bench-marked against PEM and the optimal tuning rules; and (e ) · e − γkT e · e ∂ak j=1 exhibited advantages in accuracy, robustness to input biases, = N (17) P γjT ·aj 2 less observation data requirements, and faster inference. All ( j=1 e ) 12

In the case i = k: REFERENCES

∂pi ∂pk [1] C. Wu, B. Ju, Y. Wu, X. Lin, N. Xiong, G. Xu, H. Li, and X. Liang, = “Uav autonomous target search based on deep reinforcement learning ∂ak ∂ak in complex disaster scene,” IEEE Access, vol. 7, pp. 117 227–117 245, N γkT ·ak P γjT ·aj γkT ·ak γkT ·ak 2019. γkT e · j=1 e − γkT e · e [2] J. A. Shaffer, E. Carrillo, and H. Xu, “Hierarchal application of receding = N P γjT ·aj 2 ( j=1 e ) horizon synthesis and dynamic allocation for uavs fighting fires,” IEEE Access, vol. 6, pp. 78 868–78 880, 2018. γ ·a PN γjT ·aj γkT ·ak e kT k j=1 e − e [3] H. Shakhatreh, A. H. Sawalmeh, A. Al-Fuqaha, Z. Dou, E. Almaita, = γkT N · N I. Khalil, N. S. Othman, A. Khreishah, and M. Guizani, “Unmanned P γjT ·aj P γjT ·aj j=1 e j=1 e aerial vehicles (uavs): A survey on civil applications and key research challenges,” IEEE Access, vol. 7, pp. 48 572–48 634, 2019. = γkT pk(1 − pk) (18) [4] J. Kim, S. Kim, C. Ju, and H. I. Son, “Unmanned aerial vehicles in agri- culture: A review of perspective of platform, control, and applications,” Alternatively, for i 6= k, ∂ (eγiT ·ai ) = 0. (17) hence ∂ak IEEE Access, vol. 7, pp. 105 100–105 115, 2019. simplifies as: [5] I. Kalyaev, S. Kapustyan, D. Ivanov, I. Korovin, L. Usachev, and G. Schaefer, “A novel method for distribution of goals among uavs for oil field monitoring,” 09 2017, pp. 1–4. γ ·a γiT ·ai ∂pi 0 − γkT e kT k · e [6] L. Ljung, “System Identification.” Birkhauser,¨ Boston, MA, 1998, = N pp. 163–173. [Online]. Available: http://link.springer.com/10.1007/978- ∂a P γjT ·aj 2 k ( j=1 e ) 1-4612-1768-8{ }11 −eγkT ·ak eγiT ·ai [7] L. Ljung, “PREDICTION ERROR ESTIMATION METHODS*,” Tech. Rep., 2002. [Online]. Available: https://link.springer.com/content/pdf/ = γkT N · N = −γkT · pk · pi (19) P γjT ·aj P γjT ·aj j=1 e j=1 e 10.1007/BF01211648.pdf [8] P. M. Van den Hof, A. Dankers, P. S. Heuberger, and X. Bombois, Therefore, the derivative of the modified Softmax formula- “Identification of dynamic models in complex networks with prediction tion is: error methodsbasic methods for consistent module estimates,” Automat- ica, vol. 49, no. 10, pp. 2994–3006, 2013.  [9] A. Hagenblad, L. Ljung, and A. Wills, “Maximum likelihood identifi-  −γ · p (1 − p ) if i = k cation of wiener models,” Automatica, vol. 44, no. 11, pp. 2697–2705, ∂pi  kT k k 2008. = (20) [10] L. Tong and S. Perreau, “Multichannel blind identification: From ∂ak   −γkT pkpi if i 6= k subspace to maximum likelihood methods,” Proceedings of the IEEE, vol. 86, no. 10, pp. 1951–1968, 1998. PN [11] Y. Hu, J. Wu, and C. Zeng, “Robust adaptive identification of linear The cross-entropy function L = − i=1 yi log(pi) has the time-varying systems under relaxed excitation conditions,” IEEE Access, following derivative with respect to the kth logit: vol. 8, pp. 8268–8274, 2020. [12] R. Pintelon, P. Guillaume, Y. Rolain, J. Schoukens, and H. Van Hamme, N “Parametric identification of transfer functions in the - ∂L X 1 ∂pi a survey,” IEEE Transactions on Automatic Control, vol. 39, no. 11, pp. = − yi (21) ∂ak pi ∂ak 2245–2260, Nov 1994. i=1 [13] N. Nevaranta, S. Derammelaere, J. Parkkinen, B. Vervisch, T. Lindh, Where y is a One-Hot encoded vector that points out the K. Stockman, M. Niemel, O. Pyrhnen, and J. Pyrhnen, “Online identi- fication of a mechanical system in frequency domain using sliding dft,” ground truth class T . IEEE Transactions on Industrial Electronics, vol. 63, no. 9, pp. 5712– ∂p By plugging i from (20) to (21), the backpropagation 5723, Sep. 2016. ∂ak term ∂L becomes: [14] S. Villwock and M. Pacas, “Application of the welch-method for the ∂ak identification of two- and three-mass-systems,” IEEE Transactions on Industrial Electronics, vol. 55, no. 1, pp. 457–466, Jan 2008. ∂L [15] S. Chen, S. Billings, and P. Grant, “Non-linear system identification X International journal of control = −ykγkT (1 − pk) + yi(γkT · pk) using neural networks,” , vol. 51, no. 6, ∂ak pp. 1191–1214, 1990. i6=k [16] S. Lu and T. Basar, “Robust nonlinear system identification using neural- X network models,” IEEE Transactions on Neural networks, vol. 9, no. 3, = γkT [−yk + ykpk + yipk] pp. 407–429, 1998. i6=k [17] A. Delgado, C. Kambhampati, and K. Warwick, “Dynamic recurrent X = γ [−y + p (y + y )] (22) neural network for system identification and control,” IEE Proceedings- kT k k k i and Applications, vol. 142, no. 4, pp. 307–314, 1995. i6=k [18] D. Grymin and M. Farhood, “Two-step system identification and trajec- P tory tracking control of a small fixed-wing uav,” Journal of Intelligent As y is a one-hot encoded vector, yk + i6=k yi = 1. The and Robotic Systems, vol. 83, 11 2015. backpropogation term hence becomes: [19] M. Ahsan and M. Choudhry, “System identification of an airship using trust region reflective algorithm,” International Journal of ∂L Control, Automation and Systems, vol. 15, pp. 1–10, 05 2017. = γ (y − p ) (23) ∂a kT k k [20] W. Wei, M. Tischler, and K. Cohen, “System identification and controller k optimization of a quadrotor unmanned aerial vehicle in hover,” Journal of the American Helicopter Society, vol. 62, 10 2017. ACKNOWLEDGMENT [21] S. Bhandari, P. Navarro, and A. Ruiz, “Flight testing, , We would like to thank Quanser for the generous support and system identification of a multicopter uav,” 01 2017. [22] M. Patil, L. Hale, and C. Roy, “Aerodynamic parameter identification and technical assistance. We also thank Mohamad Wahbah and uncertainty quantification for small unmanned aircraft,” 01 2015. and AbdulRahaman Al-Marzooqi for their technical assistance [23] J. Guo, B. Mu, L. Wang, G. Yin, and L. Xu, “Decision-based system with the experiments. This publication is based upon work sup- identification and adaptive resource allocation,” IEEE Transactions on Automatic Control, vol. PP, pp. 1–1, 09 2016. ported by the Khalifa University of Science and Technology [24] J. G. Ziegler, N. B. Nichols et al., “Optimum settings for automatic under Award No. RC1-2018-KUCARS. controllers,” trans. ASME, vol. 64, no. 11, 1942. 13

[25] K. Astr˚ om¨ and T. Hagglund,¨ “Automatic tuning of simple regulators R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 597– with specifications on phase and amplitude margins,” Automatica, 607. [Online]. Available: http://papers.nips.cc/paper/6662-convergence- vol. 20, no. 5, pp. 645–651, sep 1984. [Online]. Available: analysis-of-two-layer-neural-networks-with-relu-activation.pdf https://www.sciencedirect.com/science/article/pii/0005109884900141 [46] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and [26] I. Boiko, “Modified relay feedback test (MRFT) and tuning of R. Salakhutdinov, “Dropout: A simple way to prevent neural PID controllers,” in Non-parametric Tuning of PID Controllers, networks from overfitting,” Journal of Machine Learning 2013, no. 9781447144649, pp. 25–79. [Online]. Available: http: Research, vol. 15, pp. 1929–1958, 2014. [Online]. Available: //link.springer.com/10.1007/978-1-4471-4465-6{ }3 http://jmlr.org/papers/v15/srivastava14a.html [27] M. S. Chehadeh and I. Boiko, “Design of rules for in-flight non- [47] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep parametric tuning of PID controllers for unmanned aerial vehicles,” network training by reducing internal covariate shift,” 2015, pp. Journal of the Franklin Institute, vol. 356, no. 1, pp. 474–491, jan 448–456. [Online]. Available: http://jmlr.org/proceedings/papers/v37/ 2019. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/ ioffe15.pdf S0016003218306604 [48] T. MathWorks, System Identification Toolbox, Natick, Massachusetts, [28] A. P. Schoellig, F. L. Mueller, ·. Raffaello D’andrea, A. P. Schoellig, United State, 2019. [Online]. Available: https://www.mathworks.com/ F. L. Mueller, ·. R. D’andrea, and R. D’andrea, “Optimization- products/sysid.html based iterative learning for precise quadrocopter trajectory tracking,” [49] A. Ayyad, Real-time System Identification Using Deep Learning vol. 33, pp. 103–127, 2012. [Online]. Available: https://link.springer. for Linear Processes, 2020. [Online]. Available: https://youtu.be/ com/content/pdf/10.1007{%}2Fs10514-012-9283-2.pdf dz3WTFU7W7c [29] N. O. Lambert, D. S. Drew, J. Yaconelli, S. Levine, R. Calandra, and K. S. Pister, “Low-level control of a quadrotor with deep model-based reinforcement learning,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4224–4230, 2019. [30] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safe Controller Optimization for Quadrotors with Gaussian Processes.” [Online]. Available: http://tiny.cc/icra16 [31] W. Koch, R. Mancuso, R. West, and A. Bestavros, “Reinforcement learning for uav attitude control,” ACM Transactions on Cyber-Physical Systems, vol. 3, no. 2, p. 22, 2019. [32] A. Molchanov, T. Chen, W. Honig,¨ J. A. Preiss, N. Ayanian, and G. S. Sukhatme, “Sim-to-(multi)-real: Transfer of low-level robust control policies to multiple quadrotors,” arXiv preprint arXiv:1903.04628, 2019. [33] J. Hwangbo, I. Sa, R. Siegwart, and M. Hutter, “Control of a quadrotor with reinforcement learning,” IEEE Robotics and Automation Letters, vol. 2, no. 4, pp. 2096–2103, 2017. [34] S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin, “Learning quadrotor dynamics using neural network for flight control,” in 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016, pp. 4653–4660. [35] J. Loeb, “Recent advances in nonlinear servo theory,” Frequency re- sponse, pp. 260–268, 1965. [36] P. Poksawat, L. Wang, and A. Mohamed, “Automatic tuning of attitude control system for fixed-wing unmanned aerial vehicles,” IET Control Theory & Applications, vol. 10, no. 17, pp. 2233–2242, 2016. [37] W. Giernacki, D. Horla, T. Ba´ca,ˇ and M. Saska, “Real-time model- free minimum-seeking autotuning method for unmanned aerial vehicle controllers based on fibonacci-search algorithm,” Sensors, vol. 19, no. 2, p. 312, 2019. [38] I. Boiko, “Design of non-parametric process-specific optimal tuning rules for PID control of flow loops,” Journal of the Franklin Institute, vol. 351, no. 2, pp. 964–985, feb 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0016003213003487 [39] D. P. Atherton and G. M. Siouris, “Nonlinear control engineering,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 7, no. 7, pp. 567– 568, July 1977. [40] R. E. Kopp and R. Orford, “ Applied to System Identification for Adaptive Control Systems ,” AIAA Journal, vol. 1, pp. 2300–2306, 1963. [41] M. Jansson and B. Wahlberg, “A linear regression approach to state- space subspace system identification,” , vol. 52, pp. 103–129, 1996. [42] F. M. Juang and C. D. Hsieh, “A Locally Recurrent Fuzzy Neural Net- work With Support Vector Regression for Dynamic System Modeling,” IEEE Transactions on Fuzzy Systems, vol. 18, pp. 261 – 273, 2010. [43] R. Rothe, R. Timofte, and V. Gool, “Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks,” Inter- national Journal of Computer Vision, vol. 126, p. 144157, 2016. [44] H. Noh, T. You, J. Mun, and B. Han, “Regularizing deep neural networks by noise: Its interpretation and optimization,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 5109–5118. [Online]. Available: http://papers.nips.cc/paper/7096-regularizing-deep- neural-networks-by-noise-its-interpretation-and-optimization.pdf [45] Y. Li and Y. Yuan, “Convergence analysis of two-layer neural networks with relu activation,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and