A Deep Reinforcement Learning-Based Approach to Intelligent Powertrain Control for Automated Vehicles *
2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, NZ, October 27-30, 2019

A Deep Reinforcement Learning-Based Approach to Intelligent Powertrain Control for Automated Vehicles*

I-Ming Chen, Member, IEEE, Cong Zhao, and Ching-Yao Chan, Member, IEEE

Abstract—The development of a powertrain controller for automated driving systems (ADS) using a learning-based approach is presented in this paper. The goal is to create an intelligent agent that learns from driving experience and integrates the operation of each powertrain unit to benefit vehicle driving performance and efficiency. The feature of reinforcement learning (RL) that the agent interacts with the environment to improve its policy mimics the learning process of a human driver. Therefore, we have adopted the RL methodology to train the agent for the intelligent powertrain control problem in this study. In this paper, a vehicle powertrain control strategist named Intelligent Powertrain Agent (IPA), based on a deep Q-learning network (DQN), is proposed. In the application for an ADS, the IPA receives trajectory maneuver demands from a decision-making module of the ADS, observes the vehicle states and driving environment as its inputs, and outputs control commands to the powertrain system. As a case study, the IPA is applied to a parallel hybrid vehicle to demonstrate its capability. Through the training process, the IPA learns how to operate the powertrain units in an integrated way to deal with varied driving conditions, and the vehicle's speed-tracking ability and power management improve along with its driving experience.

I. INTRODUCTION

The development of autonomous vehicles (AVs) and automated driving systems (ADS) is a multi-disciplinary integration, as depicted in the exemplar system architecture in Fig. 1. The system requires efforts across many technological fields, including computer vision, sensor fusion, information connection, planning, and control. While the topics of perception and planning for AVs have drawn a great deal of attention from researchers, the lower-level control problem related to the operational control of steering, throttle, and brake has received relatively little notice. This is because the conventional control approaches used by the vehicle industry have long been proven solid and efficient for low-level controllers, and are regarded as sufficient for ADS applications. However, an ADS involves a higher level of integration and coordination, which is an essential difference from conventional vehicles.

Figure. 1 Architecture for ADS technology

The approaches for vehicle powertrain control can be classified into three categories, as shown in Fig. 2. Rule-based methods use principles developed with heuristics, experiments, and analysis to formulate a control strategy, with look-up tables of previously saved parameters used for real-time control. Being simple, fast, and stable, the rule-based approach is the most commonly used in practical applications. However, rule-based methods are usually inflexible by nature and cannot adapt to changes in the driving situation. Therefore, the system performance is far from optimal, and calibration is required to ensure system performance under different conditions. The optimization-based methods use an established mathematical system model with an optimal or sub-optimal algorithm to calculate the control strategy. Some examples of optimization-based methods are the equivalent consumption minimization strategy (ECMS) [1], dynamic programming (DP), stochastic dynamic programming (SDP) [2], and model predictive control (MPC). While these methods offer a mechanism for optimization, they require highly accurate system models, high computational effort, and prediction of future driving conditions, rendering the implementation of this type of approach impractical.
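To make the rule-based category concrete, the sketch below splits a demanded torque between ICE and MG by interpolating a pre-calibrated look-up table keyed by vehicle speed. The breakpoints and split fractions are illustrative assumptions, not values from the paper.

```python
from bisect import bisect_left

# Hypothetical calibration table: fraction of the torque demand sent to the
# ICE as a function of vehicle speed [m/s]. Values are illustrative only.
SPEED_GRID = [0.0, 5.0, 10.0, 20.0, 30.0]
ICE_SHARE = [0.0, 0.2, 0.5, 0.8, 0.9]

def split_torque(t_demand: float, speed: float) -> tuple[float, float]:
    """Return (T_ice, T_mg) by linear interpolation over the fixed table."""
    if speed <= SPEED_GRID[0]:
        share = ICE_SHARE[0]
    elif speed >= SPEED_GRID[-1]:
        share = ICE_SHARE[-1]
    else:
        i = bisect_left(SPEED_GRID, speed)
        x0, x1 = SPEED_GRID[i - 1], SPEED_GRID[i]
        y0, y1 = ICE_SHARE[i - 1], ICE_SHARE[i]
        share = y0 + (y1 - y0) * (speed - x0) / (x1 - x0)
    return t_demand * share, t_demand * (1.0 - share)

t_ice, t_mg = split_torque(100.0, 10.0)  # at 10 m/s the table gives a 50/50 split
```

Such tables are fast and stable at run time, but, as noted above, they are fixed by offline calibration and cannot adapt to changes in the driving situation.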
Therefore, the powertrain performance of an ADS has the potential to be further improved in a systematic manner, in comparison with the conventional component-level optimization.

Figure. 2 Control category of vehicle powertrain control

The third category is the learning-based method, an approach that has emerged in recent years. Learning-based methods structure and train a neural network (NN) to establish the relationship between the input and output data, and are able to adapt and optimize powertrain operation for different driving conditions. With the data-driven mechanism, the difficulty of deriving accurate system models is dismissed, and real measurement data are utilized to practically improve the system control strategy. When the training process is completed, the stored NN can be applied to real-time control. The learning-based method thus possesses the advantages of both the rule-based and optimization-based approaches. The NN can be used for driving environment prediction, as suggested in [3], [4], or for power management, as illustrated in [5], [6]. However, the aforementioned studies did not consider the integration of the NN with lower-level controllers, namely the throttle, brake, and gear shift operation. Since an integrated powertrain controller is beneficial to an ADS, we propose a learning-based control approach to establish an adaptive and intelligent agent.

In this paper, we use the name Intelligent Powertrain Agent (IPA) to refer to the learning-based integrated control agent for ADS. The development of this agent is based on the concept of the deep Q-learning network (DQN). Fig. 3 shows the architecture of the IPA. The IPA receives motion planning commands from a high-level decision maker in the ADS. The IPA uses the motion command, as well as the vehicle state and the surrounding environment, as its inputs. The IPA then outputs control commands to the powertrain units, including the internal combustion engine (ICE), motor/generator (MG), transmission, and brake. The IPA utilizes an NN to learn the relationship between the inputs and outputs through its training process. The task of the IPA is to learn a powertrain operation strategy that maximizes a reward defined to reflect the vehicle performance and efficiency under varied driving conditions. The IPA can be applied to various types of powertrain configurations, including the conventional ICE vehicle, the electric vehicle (EV), and the hybrid electric vehicle (HEV). In this paper, a parallel HEV composed of one ICE, one MG, and one 5-speed transmission is used as an example to demonstrate the IPA's functionality.

Figure. 3 Architecture for intelligent powertrain agent

This paper is organized as follows: Section II introduces the mathematical models of the powertrain system in the IPA. Section III presents the DQN architecture, states, actions, reward, and hyperparameters. A rule-based PID controller is also introduced at the end of that section to establish a benchmark for comparison. Section IV describes the learning results and powertrain operation of the IPA. Finally, the findings and future work of this study are summarized in the conclusion section.

*Research supported by the Ministry of Science and Technology, Taiwan, under project number 107-2917-I-564-039.
I-M. Chen and C.-Y. Chan are with California PATH, University of California, Berkeley, CA 94804, USA (e-mail: [email protected]; [email protected]). C. Zhao is with the Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, Shanghai 201804, China (e-mail: [email protected]).
978-1-5386-7024-8/19/$31.00 ©2019 IEEE

II. PARALLEL HYBRID ELECTRIC VEHICLE

In this study, a model of a parallel HEV is used as a case study to explain the formulation and development of the IPA. It should be noted that in real applications the IPA does not require a mathematical model in its algorithm; the IPA interacts directly with the real environment and obtains feedback from the onboard sensors of a vehicle. However, in order to conduct preliminary training before driving the vehicle on a real road, the powertrain model of a parallel hybrid system is built in a simulation scenario.

The parallel hybrid system investigated in this paper consists of one ICE, one MG, and one 5-speed transmission; its configuration is depicted in Fig. 4, where I_ice and I_mg are the inertias of the ICE and MG, N_t and N_f represent the ratios of the 5-speed transmission and the final drive, M_v is the vehicle mass, and T_ice and T_mg are the torques applied to the ICE and MG.

Figure. 4 Configuration of parallel hybrid system

The speed equation of the powertrain is shown in (1), where ω_w, ω_mg, and ω_ice denote the angular velocities of the wheel, MG, and ICE. The dynamic equation of the system is arranged in (2), where M_e is the vehicle equivalent mass, r_w is the wheel radius, T_b is the torque applied to the brake, g is the gravitational acceleration (9.81 m/s^2), C_r is the rolling resistance coefficient, ρ is the density of air, A_v is the vehicle frontal area, and C_d is the aerodynamic drag coefficient. Table I lists the parameters and their values used for the simulation. Please note that the MG possesses two operation modes: a positive MG torque corresponds to motor mode, and a negative MG torque corresponds to generator mode.

ω_w = ω_mg / N_f = ω_ice / (N_t N_f)    (1)

(M_e r_w^2 + I_mg N_f^2 + I_ice N_t^2 N_f^2) ω̇_w = T_ice N_t N_f + T_mg N_f − T_b − M_v g C_r r_w − 0.5 ρ A_v C_d r_w^3 ω_w^2    (2)

TABLE I. PARAMETERS OF PARALLEL HYBRID VEHICLE MODEL

Parameters                         Value
Vehicle mass                       1400 kg
Vehicle equivalent mass            1600 kg
Transmission ratios                [3.46, 1.75, 1.10, 0.86, 0.71]
Final drive ratio                  3.21
Tire radius                        0.28 m
Maximum ICE torque                 101.7 Nm @ 4000 rpm
Maximum MG torque (motor)          305.0 Nm @ 940 rpm
Minimum MG torque (generator)      -305.0 Nm @ 940 rpm
Maximum brake torque               2000 Nm
Battery capacity                   1.3 kWh
Rolling resistance coefficient     0.009
Air density                        1.2 kg/m^3
Vehicle frontal area               2.3 m^2
Aerodynamic drag coefficient       0.3

III. METHODOLOGY

In order to develop an integrated and intelligent powertrain controller for an ADS to perform the driving task

After the optimal action-value function Q*(s, a) is found, the optimal action under a given state can be selected as a = argmax_a Q*(s, a), and an ε-greedy policy can be applied to ensure adequate exploration of the state space.
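The ε-greedy action selection just described can be sketched as follows. Here `q_values` stands in for the DQN's output for the current state, and the function name is illustrative, not from the paper.

```python
import random

def select_action(q_values: list[float], epsilon: float, rng=random) -> int:
    """Epsilon-greedy policy: explore with probability epsilon,
    otherwise exploit a = argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

best = select_action([0.1, 0.9, 0.4], epsilon=0.0)  # → 1 (purely greedy)
```

During training, epsilon is typically annealed from a large value toward a small one, so that the agent explores broadly at first and exploits its learned Q-function later.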
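The vehicle dynamics of equation (2) can also be checked numerically with the Table I values. The ICE and MG rotational inertias are not given in this excerpt, so the values below are assumptions for illustration only.

```python
# Parameters from Table I of the paper.
M_V, M_E = 1400.0, 1600.0              # vehicle mass / equivalent mass [kg]
R_W = 0.28                             # tire radius [m]
N_T = [3.46, 1.75, 1.10, 0.86, 0.71]   # 5-speed transmission ratios
N_F = 3.21                             # final drive ratio
G, C_R = 9.81, 0.009                   # gravity [m/s^2], rolling resistance coeff.
RHO, A_V, C_D = 1.2, 2.3, 0.3          # air density, frontal area, drag coeff.
I_ICE, I_MG = 0.15, 0.05               # ASSUMED inertias [kg m^2], not in the paper

def wheel_accel(t_ice, t_mg, t_b, omega_w, gear):
    """Wheel angular acceleration [rad/s^2] per the dynamic equation (2)."""
    n_t = N_T[gear - 1]
    drive = t_ice * n_t * N_F + t_mg * N_F - t_b          # drive minus brake torque
    roll = M_V * G * C_R * R_W                            # rolling resistance torque
    drag = 0.5 * RHO * A_V * C_D * R_W**3 * omega_w**2    # aerodynamic drag torque
    inertia = M_E * R_W**2 + I_MG * N_F**2 + I_ICE * n_t**2 * N_F**2
    return (drive - roll - drag) / inertia
```

With zero applied torques the vehicle decelerates under rolling resistance and drag, as expected, while a modest ICE/MG torque in first gear produces a positive wheel acceleration.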