IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 69, NO. 1, JANUARY 2020

Non-Cooperative and Cooperative Optimization of Scheduling With Vehicle-to-Grid Regulation Services

Xiangyu Chen, Student Member, IEEE, and Ka-Cheong Leung, Member, IEEE

Abstract: Due to the increasing popularity of electric vehicles (EVs) and technological advancements of EV electronics, the vehicle-to-grid (V2G) technique, which utilizes EVs to provide ancillary services for the power grid, stimulates new ideas in current research. Since EVs are selfish individuals owned by different parties, how to motivate them to provide ancillary services becomes an issue. In this paper, game theoretic approaches using non-cooperative and cooperative games are proposed to motivate EVs to provide frequency regulation services for the power grid. In a non-cooperative V2G system, the interaction between the EV aggregator and EVs is formulated as a non-cooperative Stackelberg game. The EV aggregator as the leader decides the electricity trading price, and EVs as the followers determine their charging/discharging strategies. In a cooperative V2G system, a potential game is formulated to achieve the optimal social welfare of the V2G system. The existence and uniqueness of the Nash equilibrium of these two games are validated. Our simulation results show that the proposed game theoretic approaches can motivate EVs to smooth out the power fluctuations from the grid while EVs schedule their charging/discharging activities to maximize their utilities. This demonstrates the effectiveness of the V2G game in providing regulation services to the grid. Through cooperation and extra information exchange, the social welfare of EVs and the EV aggregator can be improved to the global optimum and the V2G regulation services can also achieve near-optimal performance.

Index Terms: Electric vehicles (EVs), frequency regulation, game theory, vehicle-to-grid (V2G).

Manuscript received January 16, 2019; revised June 28, 2019, August 27, 2019, and October 10, 2019; accepted October 10, 2019. Date of publication November 11, 2019; date of current version January 15, 2020. This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Grant 17261416. The review of this article was coordinated by Prof. S. Manshadi. (Corresponding author: Ka-Cheong Leung.)
X. Chen is with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong.
K.-C. Leung is with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.
Digital Object Identifier 10.1109/TVT.2019.2952712

I. INTRODUCTION

Balancing power supply and demand in real time is critical for stable and reliable operation in power grids. Due to the increasing penetration of renewables, power generation becomes difficult to forecast and follow. This brings great challenges to real-time power balance in power grids. Frequency regulation, which aims to stabilize the utility frequency within its nominal range through active power compensation, can keep real-time power balance in power grids. Frequency regulation services include regulation-up, which requires ramping up of generation assets, and regulation-down, which requires ramping down of generation assets. Traditionally, these regulation services are provided by controllable generators, which are expensive to operate. Therefore, in recent years, the research community has been exploring cost-effective and efficient approaches to providing regulation in place of the controllable generators.

Due to the development of intelligent electric vehicle (EV) technology, the vehicle-to-grid (V2G) technique, which utilizes EVs to provide frequency regulation services,¹ has become a hot research topic in recent years. Some recent studies show that the bidirectional EV chargers [1], [2] and batteries inside EVs [3] can compensate well for frequency regulation-down and regulation-up signals through battery charging and discharging. Hence, an aggregation of EVs, which constitutes a distributed energy storage system known as a vehicle-to-grid (V2G) system, can bring substantial capabilities for providing frequency regulation services.

¹ The regulation services provided by EVs are secondary frequency regulation services, which balance the grid power within 5–15 minutes.

When utilizing a V2G system for providing frequency regulation services, control techniques are required to coordinate the charging/discharging powers of EVs. Existing control techniques for V2G regulation services include the grid measurement approach [4], [5] and the optimization-based approach [6], [7]. These approaches take the perspective of grid optimization and assume that EVs follow the schedules dispatched by the grid operator or aggregators. Nevertheless, in real-world operations, EVs are selfish individuals owned by different EV owners. These owners may be more concerned about the utilities of their EVs and the degradation of the EV batteries than about grid operations. Therefore, EVs may fail to follow the instructions of the grid because of their conflict of interests. How to motivate EVs to participate in the V2G system is thus a pressing issue.

To solve the aforementioned issue, game theory provides an effective framework to analyze the relationship between the individual utility and the system goal. Instead of following the instructions from a centralized controller, in game theory, each decision-maker chooses its own strategy to maximize its utility. Applying game theoretic approaches to V2G regulation ensures the fulfillment of each EV's utility, since each EV can choose its optimal charging/discharging strategy to maximize its utility. Therefore, game theoretic approaches can motivate the participation of EVs in V2G regulation. Existing work has studied game theoretic approaches to the EV scheduling problem [8]–[15], which can be further categorized into

non-cooperative game approaches [8], [10], [12]–[14] and cooperative game approaches [9], [11], [15].

However, there remain some research gaps in the current game theoretic approaches. First, the authors in [10]–[13], [15] only considered the games among EVs and the authors in [8] only considered the bidding game among EV aggregators in an electricity market. A hierarchical game among the grid operator, aggregators, and EVs in an electricity market was not studied. Second, in [8]–[13], EV charging behaviours were coordinated while EV discharging was not considered. This means that EVs can neither provide V2G regulation services nor sell electricity back to the power grid. Third, though the coordination of EV charging and discharging was investigated in [14], [15] using the non-cooperative game approach [14] and the cooperative game approach [15], the strategy set of an individual EV was assumed to have three states only, namely, charging, idle, and discharging. This assumption oversimplifies the strategy set of EVs and thus fails to consider the case in which the strategy set of EVs has infinitely many elements. Besides, [14] only considers a non-cooperative game and [15] only considers a cooperative game. They do not compare the V2G regulation performance achieved by the non-cooperative and cooperative game approaches, so the merits and drawbacks of these two approaches cannot be validated. Moreover, the approaches in [14], [15] suppose that EVs get payment from the grid if they respond to the regulation requests. These approaches sacrifice some benefits of the power grid companies so as to explicitly motivate EVs to provide regulation services. This inspires us to devise a smart pricing model that can motivate EVs to provide regulation services implicitly. The smart pricing model sets the electricity price based on real-time regulation signals. Specifically, the electricity price is set high when the power grid requires regulation-up. This motivates EVs to discharge, i.e., sell electricity back to the grid, so that the regulation-up signals are responded to. Similarly, the electricity price becomes low when the power grid requires regulation-down, which motivates EVs to charge from the grid so that the regulation-down signals are responded to. In this way, the power grid companies can obtain frequency regulation services without directly giving EVs payments, which saves the costs of these power grid companies.

In this work, a hierarchical game framework, which includes a grid operator, an EV aggregator, and EVs, is proposed for providing V2G regulation services. In this framework, both a non-cooperative game and a cooperative game are studied to coordinate the aggregator and EVs to provide V2G regulation services. In the non-cooperative game, the interaction between competitive EVs and the EV aggregator is described as a hierarchical Stackelberg game, in which the EV aggregator is the leader that determines the electricity trading price and the EVs are the followers that decide their charging/discharging strategies. In the cooperative game, the EV aggregator collaborates with its subordinate EVs to maximize their social welfare function. The social welfare maximization problem is equivalent to a potential game, in which the incentives of EVs to change their strategies are reflected in the social welfare function. For both the V2G non-cooperative game and the cooperative game, the existence and uniqueness of the Nash equilibria are validated, and algorithms are then devised to find the Nash equilibria. An electricity pricing model is also devised to motivate EVs to provide regulation services implicitly. The performance of these two game theoretic approaches is compared. Our simulation results show that the proposed non-cooperative game theoretic approach, together with the devised electricity pricing model, can autonomously motivate EVs to smooth out the power fluctuations from the grid when EVs maximize their own utilities. By using the cooperative game approach, the social welfare of EVs and the EV aggregator can be further improved to the global optimum and the V2G regulation services can also obtain near-optimal performance, though with a small communication overhead.

The contributions of this work are summarized as follows:
1) Different from [10]–[13], [15], which have only considered games at the EV level, this work studies a hierarchical game which consists of a V2G game among EVs and a pricing game of the EV aggregator. This helps to understand how EVs and the aggregator interact at different levels of the V2G market.
2) Different from the existing work [14], [15], our proposed games are infinite games in which the number of alternatives available to each EV is a continuum. This continuous game model is more general and practical than [14], [15] since EVs can choose any charging and discharging powers within their feasible regions.
3) Instead of explicitly giving payments to EVs for V2G regulation services [14], [15], we devise an electricity pricing model to coordinate EVs to provide regulation services implicitly. By reacting to the electricity price, EVs can achieve good performance in providing V2G regulation services.
4) Both non-cooperative and cooperative games between EVs and the EV aggregator are studied in a V2G system. We find that, through cooperation, the optimal social welfare of EVs and the EV aggregator can be achieved in a distributed fashion. Furthermore, V2G regulation services can also obtain near-optimal performance.
5) To operate the V2G system in real time, real-time V2G games are extended from the dynamic V2G games. In real-time V2G games, forecasting information of the regulation request is not required and decisions are made among those EVs that have arrived at the EV aggregator. Therefore, real-time games are more practical to implement in V2G systems.

The rest of this paper is organized as follows. The related work is summarized in Section II. In Section III, the system architecture and system model, which includes the electric vehicle model, EV aggregator model, and trading price model, are introduced. The V2G non-cooperative game and cooperative game are studied in Sections IV and V, respectively. In Section VI, real-time V2G games are extended from the dynamic games in Sections IV and V. The performance evaluation is conducted to examine the V2G game in Section VII and the paper is concluded in Section VIII.

II. RELATED WORK

The preliminary version of this work can be found in [16]. In [16], we proposed a non-cooperative game theoretic approach to vehicle-to-grid scheduling. Different from [16], in this article, we propose and study both non-cooperative game and cooperative game approaches to motivate EVs to provide V2G regulation services. Moreover, simulation results are provided to evaluate the advantages of these two approaches based on regulation performance and social welfare. We also extend these two games so as to support V2G regulation in real time.

To provide V2G regulation services, several existing algorithms have considered the coordination of charging/discharging behaviours of EVs from the perspective of system optimization. Based on the control techniques, these algorithms can be divided into 1) the grid measurement approach [4], [5] and 2) the optimization-based approach [6], [7]. The former measures the grid frequency and then employs droop characteristics to drive the frequency deviation to zero. It allows EVs to respond quickly to frequency deviation with very limited communication [4], [5]. However, since this approach relies on the frequency signal and lacks a sophisticated coordination scheme for EVs, the global optimum cannot be achieved for V2G regulation services. For the optimization-based approach, a global optimization problem is formulated to guide the charging/discharging schedules of EVs toward the global optimum. In [6], the optimal V2G scheduling strategies were proposed based on both forecast-based scheduling and online scheduling. A decentralized algorithm was designed to solve the V2G scheduling problems in a distributed manner based on the gradient projection method. In [7], a two-level charging scheduling framework was proposed for a large population of EVs. The Benders decomposition technique was applied so as to solve the hierarchical charging scheduling problem.

In order to study the relationship between the behaviours of EVs and the system goal, game theory has recently been applied to the EV scheduling problem [8]–[15]. The coordination of EV charging behaviours using game theoretic approaches has been considered in [8]–[13]. In [9], the authors studied the EV charging problem in a power system composed of an aggregator and multiple EVs using the robust Stackelberg game approach, in which the aggregator and EVs are considered to be the leader and followers, respectively. The authors in [10] have studied the parking-lot EV charging scheduling problem, which can be considered as a non-cooperative game with coupled constraints among EVs. The Nikaido-Isoda relaxation algorithm was used to find the Nash equilibrium point of the game with coupled constraints. In [11], an optimal charging strategy for EVs was proposed based on stochastic mean field game theory. In [8], the authors considered an optimal bidding problem among EV aggregators in day-ahead electricity markets. The interaction among EV aggregators was modelled as an incomplete game and distributed algorithms were devised to find the Nash equilibrium of the game. In [12], [13], potential game theory was applied to study the existence and uniqueness of the Nash equilibrium in non-cooperative EV charging games. In [12], the proposed EV charging game was formulated as an ordinal potential game to guarantee the existence of the Nash equilibrium of the game. In [13], a potential game framework for PHEV charging scheduling was studied. The uniqueness of the Nash equilibrium in the game was validated by studying the convexity of the potential function. Nevertheless, the potential functions in [12], [13] were devised with no physical meaning in terms of the system performance, e.g., the social welfare. Therefore, the social optimum cannot be guaranteed in their formulated potential games. In [8]–[13], EV charging coordination has been considered, but EV discharging is not supported. This means that EVs can neither provide V2G regulation services nor sell electricity back to the power grid.

EV charging/discharging control for V2G regulation services has been studied in [14], [15]. In [15], a cooperative game theoretic model was proposed to investigate the interaction between the aggregator and EVs in a V2G market. The designed interaction game is a single-level game in which EVs are players who determine their charging or discharging strategies in response to the electricity price. A decentralized mechanism was designed to achieve the optimal performance in providing frequency regulation in a distributed fashion. In [14], a non-cooperative two-level game theoretic framework was proposed for V2G regulation services. In the upper level, the frequency regulation capacity bids among aggregators are modelled as a non-cooperative game. In the lower level, the charging coordination of EVs was formulated as a Markov game. The authors in [14], [15] assume that the strategy set of an individual EV only includes three states, namely, charging, idle, and discharging. This assumption oversimplifies the strategy set of EVs and thus fails to consider the case in which EVs can choose any feasible charging/discharging power within their power limits.

III. SYSTEM MODEL

A. System Architecture

Fig. 1. System Model.

Our designed V2G system is composed of three components, namely, the grid operator, an EV aggregator, and a fleet of grid-connected EVs. The V2G system aims at providing frequency regulation services to the power grid by coordinating the charging/discharging schedules of EVs. Meanwhile, EVs should meet their charging requirements during their plug-in periods. As shown in Fig. 1, the EV aggregator receives the regulation request $P_r(t)$ kW from the grid operator at Time Slot $t$. It sets the electricity trading price with EVs as $p_{EV}(t)$ dollars per kWh. EVs then determine their charging/discharging power in response to the electricity trading price. Note that, due to the coupling

effect of the behaviours of competitive EVs, each individual EV should also consider the charging/discharging schedules of other EVs when making its own decision. Based on the charging/discharging schedules of EVs, the EV aggregator sells or buys the amount of energy traded with EVs, namely, $E_g(t)$ kWh, to or from the grid operator, at the grid electricity price of $p_g(t)$ dollars per kWh.

To facilitate the pricing and scheduling processes in the V2G system, each EV is deployed with a smart V2G agent inside it. The smart V2G agent has two basic functions. First, each V2G agent has a computing unit to compute the charging/discharging schedules of the EV. The optimal schedules of EVs in the game, i.e., the equilibrium point, can be achieved through the distributed computation among V2G agents. Second, each V2G agent can communicate with the EV aggregator as well as other V2G agents. This allows a V2G agent to receive price signals from the EV aggregator and share scheduling information with other V2G agents. The information flow in the V2G system with smart V2G agents is illustrated in Fig. 2.

Fig. 2. Agent-based Information Flow in the V2G System.

In the operation of the regulation market, the regulation requests are generated by the grid operator based on the mismatch between the generation and the load in the power grid, and then dispatched to the EV aggregator. The regulation requests are either an estimated series of signals for a day-ahead market, or real-time generated signals for a real-time market. As illustrated in Fig. 3, after the aggregator receives regulation-up (regulation-down) signals, it needs to coordinate the discharging (charging) activities of EVs so as to provide regulation-up (regulation-down) services. A V2G game between the aggregator and EVs is then conducted, in which the charging/discharging power of EVs and the electricity price are determined. The grid operator finally pays the aggregator for the V2G regulation services provided, based on the regulation performance.

Fig. 3. V2G Frequency Regulation for Active Power Balance.

B. Basic Model

Consider that the V2G system operates over the time horizon $[T_{begin}, T_{end}]$, which is divided equally into $T$ time slots, each of which has a duration of $\Delta t$ minutes. Each $t \in \{1, 2, \ldots, T\} = \mathcal{T}$ denotes a time slot in the scheduling period. The set of EVs participating in the V2G system is denoted as $\mathcal{N} = \{1, 2, \ldots, N\}$. EV $n \in \mathcal{N}$ plugs in at the beginning of Time Slot $t_{n,in}$ with its initial state of charge (SOC), denoted by $S_{n,in}$. The SOC of EV $n$ at Time Slot $t$ is given as $S_n(t) \in [\underline{S}_n, \overline{S}_n]$, where $\underline{S}_n$ and $\overline{S}_n$ indicate its lower and upper bounds, respectively. EV $n$ departs from the EV aggregator at the beginning of Time Slot $t_{n,out}$. Each EV $n$ needs to fulfill its charging requirement $S_{n,req}$ kWh before its departure. We define the set of the charging-discharging vectors of all EVs as $\mathbf{P}_{\mathcal{N}}(\mathcal{T}) \triangleq \{P_1(\mathcal{T}), P_2(\mathcal{T}), \ldots, P_N(\mathcal{T})\}$, where the vector $P_n(\mathcal{T}) := (P_n(1), P_n(2), \ldots, P_n(T))$ denotes the charging-discharging schedule of EV $n$ over the scheduling period $\mathcal{T}$, for $n = 1, 2, \ldots, N$. The charging and discharging limits of EV $n$ are denoted as $\overline{P}_n$ kW and $\underline{P}_n$ kW, respectively.

To investigate the interaction between the EV aggregator and EVs, the utility functions of EVs and the EV aggregator as well as the operational constraints of EVs are defined and modelled. The details of these models are explained in the following subsections.

C. Electric Vehicle Model

1) Utility Function of EV: For each EV $n \in \mathcal{N}$, we define a utility function $U_n(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T}), p_{EV}(\mathcal{T}))$ which reflects the revenue of selling electrical energy, the cost of buying electrical energy, and the satisfaction of charging:

$$
U_n(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T}), p_{EV}(\mathcal{T})) = -\sum_{t=1}^{T} P_n(t)\,\Delta t\, p_{EV}(\mathbf{P}_{\mathcal{N}}(t)) + \gamma_n \log\left(S_{b,n} + S_n(t_{n,out})\right) \tag{1}
$$

where $\mathbf{P}_{-n}(\mathcal{T}) \triangleq \{P_1(\mathcal{T}), \ldots, P_{n-1}(\mathcal{T}), P_{n+1}(\mathcal{T}), \ldots, P_N(\mathcal{T})\}$ denotes the set of charging-discharging vectors of all the EVs except EV $n$. The first term in (1) gives the revenue/cost of selling/buying electrical energy in dollars, where $P_n(t) > 0$ and $P_n(t) < 0$ denote buying electrical energy from the aggregator (charging) and selling electrical energy to the aggregator (discharging), respectively. The trading price $p_{EV}(\mathbf{P}_{\mathcal{N}}(t))$, which gives both the electricity buying and selling prices, is decided not only by the aggregator's pricing scheme, but also by the charging/discharging powers of all EVs $\mathbf{P}_{\mathcal{N}}(t)$. The detailed trading price model will be discussed in Section III-E.

The second term in (1) reflects the satisfaction of EV charging, where $\gamma_n$ denotes the sensitivity index of satisfaction and $S_{b,n}$ reflects the basic satisfaction level of EV $n$. Note that this function is increasing and strictly concave with respect to the SOC at the departure time, namely, $S_n(t_{n,out})$. The weight $\gamma_n$ maps the EV's satisfaction of charging to its economic benefit in dollars so that the multiple objectives are represented by a single utility function. $\gamma_n$ is a predefined parameter that specifies the tradeoff between the economic benefits and the satisfaction of charging. Therefore, it also selects a point from the Pareto front as the optimal solution of the multi-objective problem.

2) Operational Constraints of EV: In terms of charger limitations, the charging/discharging power $P_n(t)$ of EV $n \in \mathcal{N}$ should satisfy:

$$
\begin{aligned}
\underline{P}_n \le P_n(t) \le \overline{P}_n, &\quad t \in [t_{n,in}, t_{n,out}], \\
P_n(t) = 0, &\quad t \notin [t_{n,in}, t_{n,out}].
\end{aligned} \tag{2}
$$

According to (2), during the plug-in period of EV $n$, $P_n(t)$ should be within the maximum charging and discharging limits. $P_n(t)$ is equal to zero when EV $n$ is unplugged.

Considering the energy efficiencies of the battery, the evolution of the SOC $S_n(t)$ of EV $n$ is given as:

$$
S_n(t+1) = S_n(t) + \eta(P_n(t))\,P_n(t)\,\Delta t, \quad t \in \{1, 2, \ldots, T\}. \tag{3}
$$

The energy efficiency of the battery, $\eta(x)$, is defined as:

$$
\eta(x) =
\begin{cases}
\eta_{ch} & \text{if } x > 0, \\
1 & \text{if } x = 0, \\
\dfrac{1}{\eta_{dch}} & \text{if } x < 0,
\end{cases} \tag{4}
$$

where $\eta_{ch}$ and $\eta_{dch}$ denote the charging and discharging efficiencies, respectively. We have:

$$
0 < \eta_{ch} \le 1 \quad \text{and} \quad 0 < \eta_{dch} \le 1. \tag{5}
$$

There are two constraints that confine the SOC of EV $n$. Firstly, to protect the battery life, the SOC should lie within the battery's lower and upper limits:

$$
\underline{S}_n \le S_n(t+1) \le \overline{S}_n, \quad t \in \{1, 2, \ldots, T\}. \tag{6}
$$

Secondly, EV $n$ should be charged to at least its minimum charging requirement upon its departure:

$$
S_{n,req} \le S_n(t_{n,out}). \tag{7}
$$

We further define the feasible set of $P_n(\mathcal{T})$, which includes any feasible sequence of charging and discharging powers $P_n(\mathcal{T}) = \{P_n(1), \ldots, P_n(T)\}$, as $\mathcal{Q}_n$:

$$
\mathcal{Q}_n = \{P_n(\mathcal{T}) \mid P_n(\mathcal{T}) \text{ satisfies (2), (3), (6), and (7)}\}. \tag{8}
$$

Hereinafter, we use $\mathcal{Q}_n$ to denote the strategy set of EV $n$ for simplicity.

D. EV Aggregator Model

For the EV aggregator, its utility consists of three parts, namely, the cost/revenue from buying/selling electrical energy from/to the grid operator, the revenue/cost from trading with EVs, and the income of providing regulation services for the grid operator. Hence, the utility function of the aggregator is formulated as:

$$
U_A(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), p_{EV}(\mathcal{T}), p_g(\mathcal{T})) = -\sum_{t=1}^{T} E_g(t)\,p_g(t) + \sum_{t=1}^{T}\sum_{n=1}^{N} P_n(t)\,\Delta t \cdot p_{EV}(\mathbf{P}_{\mathcal{N}}(t)) + Q_{fr}(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})), \tag{9}
$$

where $E_g(t) = \sum_{n=1}^{N} P_n(t)\,\Delta t$.

The first and the second terms denote the revenue/cost from trading electrical energy with the grid operator and the revenue/cost from trading with EVs, respectively.

The third term denotes the income of providing regulation services for the grid operator. After the EV aggregator receives regulation-down (regulation-up) signals, the AGC (automatic generation control) signals should be tracked and compensated by coordinating the charging (discharging) power of EVs. To quantify this, we define the quality of smoothing function $F(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T}))$ [6], which measures to what extent the frequency regulation requests (AGC signals) are tracked and compensated by EVs:

$$
F(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})) = \mathrm{Var}(P_{total}(\mathcal{T})) = \frac{1}{T}\sum_{i=1}^{T}\left(P_{total}(i) - \frac{1}{T}\sum_{j=1}^{T} P_{total}(j)\right)^{2}, \tag{10}
$$

where $\mathrm{Var}(\cdot)$ denotes the function for calculating variance. $P_{total}(t)$ defines the sum of the regulation request and the total charging/discharging powers of EVs at Time Slot $t$:

$$
P_{total}(t) = P_r(t) + \sum_{n \in \mathcal{N}} P_n(t). \tag{11}
$$

Equation (10) measures the variance of the total power $P_{total}(t)$. Intuitively, a lower variance of the total power indicates better regulation services, which corresponds to a higher payment for the regulation services. The income of providing frequency regulation services, $Q_{fr}(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T}))$, is thus expressed as:

$$
Q_{fr}(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})) = Q_{base} - p_{fr}\,F(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})), \tag{12}
$$

where $Q_{base}$ is the base capacity income for providing regulation services and $p_{fr}$ is the penalty factor for regulation performance.

The utility function model reflects the income of the EV aggregator under a performance-based regulation market. In a performance-based regulation market, the revenues of regulation services typically consist of the revenue from the energy market, the income of committed regulation capacity, and the income based on regulation performance [17]. The first and second terms in (9) give the total revenue of the EV aggregator in the energy market. The third term in (9) measures the total revenue of the EV aggregator in providing regulation services, which includes the capacity income and the income based on regulation performance. Specifically, the capacity income of frequency regulation is given by $Q_{base}$ in (12). This is the payment from the grid operator for the commitment of providing a certain capacity of regulation services and takes units of

dollars per MW of capacity. The degradation of income based on regulation performance is measured by the second term of (12). The function $F(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T}))$ measures the penalty on regulation income when the grid power deviates from its mean value, in which case the regulation signals are not compensated well by EVs.

E. Electricity Trading Price Model

The electricity trading price between EVs and the EV aggregator, $p_{EV}(t)$, is defined as follows:

$$
\begin{aligned}
p_{EV}(t) &= p_A\,\alpha(P_n(t))\left(L_{base}(t) + P_r(t) + \sum_{k \in \mathcal{N}} P_k(t)\right) \\
&= p_A\,\alpha(P_n(t))\left(L_{base}(t) + P_r(t) + P_n(t) + \sum_{k \in \mathcal{N},\, k \ne n} P_k(t)\right),
\end{aligned} \tag{13}
$$

where $p_A$ is the fixed price set by the EV aggregator. The variable price indicator $L_{base}(t) + P_r(t) + \sum_{k \in \mathcal{N}} P_k(t)$ denotes the equivalent total load when the V2G technique is incorporated, where $L_{base}(t)$ is the base load at the EV aggregator. A piecewise function $\alpha(P_n(t))$ is defined to differentiate the buying and the selling prices:

$$
\alpha(P_n(t)) =
\begin{cases}
1 & \text{if } P_n(t) \ge 0, \\
\delta & \text{if } P_n(t) < 0,
\end{cases} \tag{14}
$$

where $\delta$ denotes the ratio of the selling price to the buying price.

The proposed real-time pricing model in (13) takes the equivalent load as a price indicator. Therefore, the regulation request $P_r(t)$ as well as the charging/discharging powers of all EVs $\mathbf{P}_{\mathcal{N}}(t)$ jointly affect the trading price $p_{EV}(t)$. This pricing model can drive the EVs to provide regulation-up and regulation-down services. From (13), we can see that EVs are motivated to discharge to the grid when the grid requires regulation-up, i.e., $P_r(t) > 0$. This is because the selling price becomes higher when $P_r(t)$ becomes higher, which gives EVs an incentive to sell electricity back to the grid. Similarly, EVs are motivated to charge from the grid when the grid requires regulation-down ($P_r(t) < 0$), since the buying price becomes lower when $P_r(t)$ becomes lower. Therefore, without any control or instruction from the grid, the EVs can autonomously balance the grid power and provide frequency regulation services by reacting to the trading prices. In Section VII-D1, we use our simulation results to validate this claim.

Similarly, it is assumed that the electricity trading price between the grid operator and the EV aggregator, namely, $p_g(t)$, also follows the price model in (13). Therefore, it holds that:

$$
\begin{aligned}
p_g(t) &= p_G\,\alpha(P_n(t))\left(L_G(t) + P_r(t) + \sum_{k \in \mathcal{N}} P_k(t)\right) \\
&= p_G\,\alpha(P_n(t))\left(L_G(t) + P_r(t) + P_n(t) + \sum_{k \in \mathcal{N},\, k \ne n} P_k(t)\right),
\end{aligned} \tag{15}
$$

where $p_G$ is the fixed price given by the grid operator and $L_G(t)$ is the base load of the power grid, i.e., the total load of the power grid without the incorporation of V2G. This pricing scheme for the grid operator can be regarded as a [...]-based real-time pricing scheme [18].

IV. V2G GAME: NON-COOPERATIVE APPROACH

In this section, we study the V2G scheduling problem for a non-cooperative system in a day-ahead market, in which all players, including EVs and the EV aggregator, are selfish and aim to maximize their own utilities. The interaction between EVs and the EV aggregator is formulated as a Stackelberg game. We consider a dynamic Stackelberg game in which the EV aggregator and EVs plan their strategies over the daily scheduling period $\mathcal{T}$. The existence and uniqueness of the Nash equilibrium of the game are proved.

A. Nash Equilibrium Problem for EVs

We denote by $\mathbf{P}_{\mathcal{N}}(\mathcal{T})$ the set of the charging-discharging vectors for all EVs over the scheduling period $\mathcal{T}$. Moreover, we use the notation $\mathbf{P}_{-n}(\mathcal{T})$ to denote the set of the charging-discharging vectors over $\mathcal{T}$ for all EVs except EV $n$. The aim of EV $n$, given the other EVs' strategies $\mathbf{P}_{-n}(\mathcal{T})$ and the electricity trading price $p_{EV}(\mathcal{T})$, is to determine its charging-discharging schedule $P_n(\mathcal{T})$ that maximizes its utility function. That is,

$$
\begin{aligned}
\mathrm{P1}_n: \quad \underset{P_n(\mathcal{T})}{\text{maximize}} &\quad U_n(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T}), p_{EV}(\mathcal{T})) \\
\text{subject to} &\quad P_n(\mathcal{T}) \in \mathcal{Q}_n.
\end{aligned} \tag{16}
$$

The game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$, with the strategy set $\mathcal{Q} \triangleq \prod_{n=1}^{N} \mathcal{Q}_n$ and the payoff function $\mathbf{f} \triangleq (U_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T})))_{n=1}^{N}$, is the set of coupled optimization problems (16) of all EVs in $\mathcal{N}$. Game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ is also called a Nash equilibrium problem (NEP). A solution of the NEP is a feasible point $\mathbf{P}^{*}_{\mathcal{N}}(\mathcal{T}) \triangleq \{P^{*}_1(\mathcal{T}), \ldots, P^{*}_n(\mathcal{T}), \ldots, P^{*}_N(\mathcal{T})\}$ such that:

$$
U_n(P^{*}_n(\mathcal{T}), \mathbf{P}^{*}_{-n}(\mathcal{T}), p_{EV}(\mathcal{T})) \ge U_n(P_n(\mathcal{T}), \mathbf{P}^{*}_{-n}(\mathcal{T}), p_{EV}(\mathcal{T})), \quad \forall P_n(\mathcal{T}) \in \mathcal{Q}_n \tag{17}
$$

holds for every EV $n \in \mathcal{N}$. In words, a Nash equilibrium solution is a feasible point $\mathbf{P}^{*}_{\mathcal{N}}(\mathcal{T})$ with the property that no individual EV can gain more benefit by unilaterally deviating from $P^{*}_n(\mathcal{T})$ if the strategies of all the other EVs remain unchanged. The strategy of EV $n$ at the Nash equilibrium, i.e., $P^{*}_n(\mathcal{T})$, is also called the best response of EV $n$.

B. Pricing Game for EV Aggregator

Based on the best responses of all EVs, the EV aggregator aims to maximize its own utility by choosing an appropriate fixed price $p_A$. The pricing game optimization problem for the EV aggregator is formulated as:

$$
\begin{aligned}
\mathrm{P2}: \quad \underset{p_A}{\text{maximize}} &\quad U_A(\mathbf{P}^{*}_{\mathcal{N}}(\mathcal{T}), p_{EV}(\mathcal{T}), p_g(\mathcal{T})) \\
\text{subject to} &\quad p_{min} \le p_A \le p_{max}.
\end{aligned} \tag{18}
$$
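To make the objective of the pricing game concrete, the following is a minimal sketch of how the aggregator utility in (9) could be evaluated for one candidate fixed price and one given set of EV schedules. It combines the price model (13)-(15), the quality of smoothing function (10)-(11), and the regulation income (12). All function and parameter names are hypothetical, the time step `dt` is assumed to be expressed in hours, and the utility is evaluated with delta = 1, so that a single trading price applies to all EVs in a slot. It is an illustration only, not the authors' implementation.

```python
from statistics import pvariance


def alpha(p_n, delta=1.0):
    """Factor in (14): 1 when EV n buys (p_n >= 0), delta when it sells."""
    return 1.0 if p_n >= 0 else delta


def trading_price(p_fixed, base_load, p_r, powers, n, delta=1.0):
    """Price faced by EV n in one slot, following the form of (13)."""
    return p_fixed * alpha(powers[n], delta) * (base_load + p_r + sum(powers))


def smoothing_quality(p_r, schedules):
    """F in (10): population variance of P_total(t) as defined in (11)."""
    total = [p_r[t] + sum(s[t] for s in schedules) for t in range(len(p_r))]
    return pvariance(total)


def aggregator_utility(p_a, p_g_fixed, l_base, l_grid, p_r, schedules,
                       dt, q_base, p_fr):
    """U_A in (9) with delta = 1, so every EV sees the same price per slot."""
    energy_terms = 0.0
    for t in range(len(p_r)):
        fleet_power = sum(s[t] for s in schedules)
        e_g = fleet_power * dt                                  # E_g(t)
        p_ev = p_a * (l_base[t] + p_r[t] + fleet_power)         # form of (13)
        p_g = p_g_fixed * (l_grid[t] + p_r[t] + fleet_power)    # form of (15)
        energy_terms += -e_g * p_g + e_g * p_ev
    q_fr = q_base - p_fr * smoothing_quality(p_r, schedules)    # (12)
    return energy_terms + q_fr


# Illustrative numbers only: one EV that exactly offsets a +2 kW / -2 kW request.
u = aggregator_utility(0.02, 0.01, [10.0, 10.0], [100.0, 100.0],
                       [2.0, -2.0], [[-2.0, 2.0]], dt=1.0, q_base=5.0, p_fr=1.0)
```

In the pricing game itself, the EV schedules passed to `aggregator_utility` would be the best responses to each candidate price, so a search over the allowed price range would need to recompute them for every candidate rather than reuse one fixed set of schedules.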

C. Existence and Uniqueness of Equilibrium for V2G Game

We prove the existence and uniqueness of the Nash equilibrium for the proposed V2G non-cooperative game by using the variational inequality (VI) theory. Due to the intricacy of the formulated V2G game, the existence and the uniqueness of the Nash equilibrium may not be guaranteed. To guarantee the existence and uniqueness of the Nash equilibrium, we make the following assumptions:

Assumption 1: $\delta = 1$ in (14).

Assumption 2: The charging and discharging efficiencies, namely, $\eta_{ch}$ and $\eta_{dch}$, are both equal to one.

Assumption 1 means that it is reasonable to have the price of buying electricity from the EV aggregator to EVs equal to the price of selling electricity from EVs to the EV aggregator. In fact, it has been studied in [19] that, in an energy market, setting the electricity buying price equal to the selling price is a necessary condition for a non-profit local trading center to achieve the optimal social welfare of energy buyers and sellers. Moreover, we also study two cases when the buying and selling prices are not equal, in which the selling price is higher (lower) than the buying price. When the selling price is higher than the buying price, EVs can always make profits through energy arbitrage, i.e., buying electricity at a lower price and then selling it at a higher price. This is not a desirable case in the V2G market. When the selling price is lower than the buying price, EVs lose their incentives to sell electricity back to the grid, which means that EVs fail to provide regulation-up services in the V2G market. These two cases are indeed studied with our simulation experiments in Section VII-D3. Hence, it is reasonable to set the buying price equal to the selling price in the V2G market. Assumption 2 is reasonable since the Lithium-ion battery installed in an EV can achieve an energy efficiency of over 95% [20].

To facilitate the proof, some lemmas based on the VI theory [21] and the game theory are introduced as follows:

Lemma 1: Given the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ defined by (16), with $\mathcal{Q} \triangleq \prod_{n=1}^{N} \mathcal{Q}_n$ and $\mathbf{f} \triangleq (U_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T})))_{n=1}^{N}$, suppose that for each EV $n$:
1) the strategy set $\mathcal{Q}_n$ is convex and closed;
2) the utility function $U_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T}))$ is convex in $P_n(\mathcal{T})$ for every fixed $\mathbf{P}_{-n}(\mathcal{T})$ and continuously differentiable in $\mathbf{P}_{\mathcal{N}}(\mathcal{T})$.
Then, the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ is equivalent to $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$, where $\mathbf{F}(\mathbf{P}_{\mathcal{N}}(\mathcal{T})) \triangleq (\nabla_{P_n(\mathcal{T})} U_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T})))_{n=1}^{N}$.

Lemma 2: Given $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$, suppose that:
1) the set $\mathcal{Q}$ is compact (closed and bounded) and convex;
2) the function $\mathbf{F}$ is continuous.
Then, the set of solutions is compact and nonempty.

Lemma 3: Given $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$, if $\mathbf{F}$ is strongly monotone on $\mathcal{Q}$, $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$ admits a unique solution.

The following theorems are provided for the proof of the existence and uniqueness of the Nash equilibrium.

Theorem 1: For any $p_A$ chosen by the EV aggregator, there exists a Nash equilibrium for the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$.

Proof: It can be easily validated that the strategy set $\mathcal{Q}_n$ for each $n \in \mathcal{N}$ is closed and convex. Moreover, the payoff function $U_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T}))$ is continuously differentiable in $\mathbf{P}_{\mathcal{N}}(\mathcal{T})$ and convex in $P_n(\mathcal{T})$ for every fixed $\mathbf{P}_{-n}(\mathcal{T})$, where $n \in \mathcal{N}$. Therefore, by Lemma 1, the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ can be converted to an equivalent variational inequality $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$. Since $\mathbf{F}$ is continuous and $\mathcal{Q}$ is convex and compact in $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$, we can conclude that, by Lemma 2, there exists at least one solution for $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$, or equivalently, for the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$.

Theorem 2: For any $p_A$ given by the EV aggregator, the Nash equilibrium for the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ is unique.

Proof: In Appendix A, it is proved that the Hessian matrix of $\mathbf{f}$ in $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ is negative definite. Therefore, $\mathbf{F}$ is strongly monotone on $\mathcal{Q}$ based on the convex optimization theory. By Lemma 3, $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$ admits a unique solution. Since the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ is equivalent to $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$, the Nash equilibrium for the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ is also unique.

As a result, the unique Nash equilibrium for the game $\mathcal{G}(\mathcal{Q}, \mathbf{f})$ can be computed by solving the strongly monotone $\mathrm{VI}(\mathcal{Q}, \mathbf{F})$. The EV aggregator then sets its optimal price $p_{EV}(t)$ based on the best responses of EVs, which represents the equilibrium of the Stackelberg V2G game.

D. Algorithms for V2G Non-Cooperative Game

In Section IV-C, we have proved the existence and the uniqueness of the Nash equilibrium of the V2G non-cooperative game. In this section, we devise algorithms for EVs and the EV aggregator to find the Nash equilibrium.

A Jacobi best response-based algorithm, shown in Algorithm 1, is proposed for each EV to find the solution of the NEP. The Jacobi best response-based EV algorithm is a distributed algorithm devised for each EV and operates in an iterative fashion. In each iteration, each EV $n$ solves $\mathbf{P1}_n$ based on the strategies of other EVs simultaneously, and then broadcasts its strategy to other EVs. The algorithm terminates when the strategies of EVs converge to the Nash equilibrium. Note that the Jacobi best response-based algorithm is guaranteed to converge to the Nash equilibrium given that the uniqueness of the equilibrium is guaranteed. The detailed proof has been provided in both [21], [22] and is omitted here due to space limitation. In Algorithm 1, each EV $n$ simultaneously solves its own optimization $\mathbf{P1}_n$ in a distributed manner. The time complexity of the algorithm thus only depends on the number of scheduling slots $T$. Suppose that the optimization for each EV is solved using the interior point method; the time complexity of the algorithm in each iteration is then $O(T^{3.5})$.

Given the solution of the NEP, namely, the best responses of EVs, from Algorithm 1, the EV aggregator sets the optimal electricity trading price to maximize its own profit, which follows Algorithm 2. In Algorithm 2, the EV aggregator iteratively updates $p_A$ toward the equilibrium point using the gradient ascent approach. When Algorithm 2 converges, the optimal price $p^*_A$ and the best responses of EVs $\mathbf{P}^*_{\mathcal{N}}(\mathcal{T})$ constitute the Nash equilibrium of the Stackelberg V2G game.

V. V2G GAME: COOPERATIVE APPROACH

Due to the competitiveness among EVs and the selfishness of EVs and the EV aggregator, both the utilities of EVs and the
EV aggregator suffer losses in the non-cooperative game. In this section, we study the V2G scheduling problem for a cooperative system, where all players are willing to cooperate to maximize the social welfare of the system. A game theoretic approach based on potential game is then proposed to achieve the optimal social welfare in a distributed manner.

Algorithm 1: Jacobi Best Response-Based EV Algorithm.
1: For EV $n \in \mathcal{N}$, choose any feasible starting point $P_n^{(0)}(\mathcal{T})$, and set iteration index $i = 0$.
2: Compute the optimal solution $P_n^{(i+1)}(\mathcal{T})$ based on $\mathbf{P1}_n$.
3: Broadcast $P_n^{(i+1)}(\mathcal{T})$ to other EVs.
4: Set $i \leftarrow i + 1$.
5: Update $\mathbf{P}_{-n}^{(i)}(\mathcal{T}) \triangleq (P_1^{(i)}(\mathcal{T}), \ldots, P_{n-1}^{(i)}(\mathcal{T}), P_{n+1}^{(i)}(\mathcal{T}), \ldots, P_N^{(i)}(\mathcal{T}))$.
6: For each $n \in \mathcal{N}$ at $t \in \mathcal{T}$, if $|P_n^{(i)}(t) - P_n^{(i-1)}(t)| \leq \varepsilon$, terminate the algorithm. Otherwise, repeat Steps 2–5 until the condition is satisfied.

Algorithm 2: Pricing Algorithm for the EV Aggregator.
1: Choose any feasible starting point $p_A^{(0)}$, set $j = 0$ and price update rate $\gamma$.
2: Choose an infinitesimal variable $\Delta$. Given $p_A^{(j)} - \Delta$ and $p_A^{(j)} + \Delta$, execute Algorithm 1 and obtain the best responses from EVs as $\mathbf{P}^*_{\mathcal{N}}(\mathcal{T}, p_A^{(j)} - \Delta)$ and $\mathbf{P}^*_{\mathcal{N}}(\mathcal{T}, p_A^{(j)} + \Delta)$, respectively.
3: Calculate $U_A^{(j)}(p_A^{(j)} - \Delta)$ and $U_A^{(j)}(p_A^{(j)} + \Delta)$ based on (9).
4: Update $p_A^{(j+1)} = p_A^{(j)} + \gamma\,\dfrac{U_A^{(j)}(p_A^{(j)} + \Delta) - U_A^{(j)}(p_A^{(j)} - \Delta)}{2\Delta}$, set $j \leftarrow j + 1$.
5: If $|p_A^{(j)} - p_A^{(j-1)}| \leq \varsigma$, terminate the algorithm. Otherwise, repeat Steps 2–4 until the condition is satisfied.

A. Social Welfare Maximization Problem

To explore the V2G scheduling problem for a cooperative system, we first need to define the social welfare maximization problem for the system. The social welfare of the whole V2G system is defined as the sum of the utilities of all EVs plus the utility of the EV aggregator. We then consider the following social welfare maximization problem:

$$\begin{aligned} \mathbf{P3}: \quad \underset{\mathbf{P}_{\mathcal{N}}(\mathcal{T})}{\text{maximize}} \ \Phi_c(\mathbf{P}_{\mathcal{N}}(\mathcal{T})) &= \sum_{n \in \mathcal{N}} U_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T})) + U_A(\mathbf{P}_{\mathcal{N}}(\mathcal{T})) \\ &= \sum_{n \in \mathcal{N}} \gamma_n \log(S_{b,n} + S_n(t_{n,\mathrm{out}})) - \sum_{t=1}^{T} E_g(t)\,p_g(t) + Q_{fr}(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})) \\ \text{subject to } & P_n(\mathcal{T}) \in \mathcal{Q}_n, \ \forall\, n \in \mathcal{N}, \end{aligned} \tag{19}$$

where $E_g(t) = \sum_{n=1}^{N} P_n(t)\Delta t$ and $\mathcal{Q}_n$ denotes the feasible set of $P_n(\mathcal{T})$ as defined in (8). It can be easily shown that Problem $\mathbf{P3}$ is a convex programming problem.

B. Potential Game

The social welfare maximization problem in $\mathbf{P3}$ is closely related to a concept in game theory, called the potential game. In game theory, a game is said to be a potential game if the incentive of players to change their strategies can be reflected in a single global function, called the potential function.

By mathematical definition, a game $\mathcal{G}_c(\mathcal{Q}, \mathbf{f})$, with $\mathcal{Q} \triangleq \prod_{n=1}^{N} \mathcal{Q}_n$ and $\mathbf{f} \triangleq (U_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T})))_{n=1}^{N}$, is an exact potential game if there exists a function $\Phi: \mathcal{Q} \rightarrow \mathbb{R}$ such that $\forall\, \mathbf{P}_{-n}(\mathcal{T}) \in \mathcal{Q}_{-n}$, $\forall\, P_n(\mathcal{T}), P'_n(\mathcal{T}) \in \mathcal{Q}_n$, it holds that:

$$\Phi(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) - \Phi(P'_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) = U_n(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) - U_n(P'_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})). \tag{20}$$

In other words, when player $n$ switches from Action $P'_n(\mathcal{T})$ to Action $P_n(\mathcal{T})$, the change in the potential $\Phi$ equals the change in the utility of that player.

For an exact potential game, the Nash equilibrium of the game also indicates the optimal point of the potential function. For instance, in an exact potential game, if the social welfare function serves as the potential function, the optimal social welfare can be achieved in the equilibrium point of the game.

C. V2G Cooperative Game

To achieve the optimal social welfare in a cooperative system, we define the utility function of EV in a cooperative system as follows:

$$\begin{aligned} V_n(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) &= \gamma_n \log(S_{b,n} + S_n(t_{n,\mathrm{out}})) + Q_{fr}(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})) - \sum_{t=1}^{T} P_n(t)\Delta t\,\tilde{p}_g(t) \\ &= \gamma_n \log(S_{b,n} + S_n(t_{n,\mathrm{out}})) + Q_{fr}(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})) \\ &\quad - \sum_{t=1}^{T} P_n(t)\Delta t\,p_G\,\alpha(P_n(t))\Big(L_G(t) + P_r(t) + P_n(t) + \sum_{k \in \mathcal{N},\, k \neq n} 2P_k(t)\Big), \end{aligned} \tag{21}$$

where the first and second terms correspond to the satisfaction function of EV charging and the revenue of providing regulation services for the grid operator. The third term denotes the cost/revenue of an individual EV from buying/selling electrical energy from/to the EV aggregator. $\tilde{p}_g(t)$ is the adjusted grid electricity price from (15) and is set as the trading price between EVs and the EV aggregator. This ensures the structure of the V2G cooperative game as a potential game. Note that the actual utility of an individual EV is the sum of the first and the third terms. The second term is not part of the actual utility of an individual EV and is included in (21) to facilitate achieving the optimal social welfare. Therefore, (21) can be regarded as the reinforced utility function of an individual EV in a cooperative system.

A V2G cooperative game can then be formulated as $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$, with $\mathcal{Q} \triangleq \prod_{n=1}^{N} \mathcal{Q}_n$ and $\mathbf{g} \triangleq (V_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T})))_{n=1}^{N}$. Due to the property of the utility function $V_n(\mathbf{P}_{\mathcal{N}}(\mathcal{T}))$, $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ is an exact potential game since $\forall\, \mathbf{P}_{-n}(\mathcal{T}) \in \mathcal{Q}_{-n}$, $\forall\, P_n(\mathcal{T}), P'_n(\mathcal{T}) \in \mathcal{Q}_n$, the following equality holds:

$$\Phi_c(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) - \Phi_c(P'_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) = V_n(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) - V_n(P'_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})), \tag{22}$$

where $\Phi_c$ is defined in (19). In other words, the incentive of EVs to change their strategies can be reflected in a single potential function, which is exactly the social welfare function in the cooperative system.

Based on potential game theory, the solution of Problem $\mathbf{P3}$, namely, $\mathbf{P}^*_{\mathcal{N}}(\mathcal{T})$, can be obtained as the collection of the optimal solutions of the following problem (23):

$$\mathbf{P4}_n: \quad \underset{P_n(\mathcal{T})}{\text{maximize}} \ V_n(P_n(\mathcal{T}), \mathbf{P}_{-n}(\mathcal{T})) \quad \text{subject to } P_n(\mathcal{T}) \in \mathcal{Q}_n. \tag{23}$$

The trading price between the EV aggregator and EVs at the equilibrium, $p^*_A(t)$, is then determined as follows based on (21):

$$p^*_A(t) = p_G\,\alpha(P_n(t)) \times \Big(\sum_{k \in \mathcal{N},\, k \neq n} 2P^*_k(t) + P^*_n(t) + L_G(t) + P_r(t)\Big). \tag{24}$$

Note that, from (24), it can be proved that the sum of electricity bills charged from all EVs, namely, $\sum_{t=1}^{T} p^*_A(t) \sum_{n \in \mathcal{N}} P_n(t)\Delta t$, is always larger than or equal to the total cost which the EV aggregator pays to the utility grid, $\sum_{t=1}^{T} p_g(t) E_g(t)$. Therefore, the EV aggregator can always make a profit in the V2G cooperative game.

D. Existence and Uniqueness of Nash Equilibrium

We now prove the existence and uniqueness of the Nash equilibrium of $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ under certain assumptions by introducing Theorem 3.

Theorem 3: The Nash equilibrium of $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ always exists. Given Assumptions 1 and 2 hold, the Nash equilibrium of $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ is unique.

Proof: Note that $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ is a potential game in which the potential function is the social welfare function $\Phi_c$. Due to the properties of potential game, there exists at least one Nash equilibrium in a potential game. Therefore, the existence of the Nash equilibrium of $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ can be guaranteed.

The uniqueness of the Nash equilibrium can be obtained by exploring the property of $\mathbf{P3}$. It is easy to verify that the potential function $\Phi_c(\mathbf{P}_{\mathcal{N}}(\mathcal{T}))$ is strictly concave in $\mathbf{P}_{\mathcal{N}}(\mathcal{T})$ and the feasible set of $\mathbf{P3}$ is convex given Assumptions 1 and 2 hold. Therefore, $\mathbf{P3}$ admits a unique optimal solution. Since $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ shares the same solution with $\mathbf{P3}$, it can be concluded that $\mathcal{G}_c(\mathcal{Q}, \mathbf{g})$ admits a unique Nash equilibrium solution.

E. Algorithm for V2G Cooperative Game

Note that Problem (23) is a NEP which is similar to Problem (16). Therefore, a Jacobi best response-based algorithm can be applied to find the solution of the NEP, as shown in Algorithm 3. In each iteration, each EV $n$ solves $\mathbf{P4}_n$ based on the strategies of other EVs and then broadcasts its strategy to other EVs. The algorithm terminates when the strategies of EVs converge to the Nash equilibrium.

Algorithm 3: Jacobi Best Response-Based EV Algorithm.
1: For EV $n \in \mathcal{N}$, choose any feasible starting point $P_n^{(0)}(\mathcal{T})$, and set iteration index $i = 0$.
2: Compute the optimal solution $P_n^{(i+1)}(\mathcal{T})$ based on $\mathbf{P4}_n$.
3: Broadcast $P_n^{(i+1)}(\mathcal{T})$ to other EVs.
4: Set $i \leftarrow i + 1$.
5: Update $\mathbf{P}_{-n}^{(i)}(\mathcal{T}) \triangleq (P_1^{(i)}(\mathcal{T}), \ldots, P_{n-1}^{(i)}(\mathcal{T}), P_{n+1}^{(i)}(\mathcal{T}), \ldots, P_N^{(i)}(\mathcal{T}))$.
6: For each $n \in \mathcal{N}$ at $t \in \mathcal{T}$, if $|P_n^{(i)}(t) - P_n^{(i-1)}(t)| \leq \varepsilon$, terminate the algorithm. Otherwise, repeat Steps 2–5 until the condition is satisfied.

Given the solution of the NEP, namely, the best responses of EVs, from Algorithm 3, the EV aggregator sets the trading price at equilibrium point based on (24). The best response of EVs, $\mathbf{P}^*_{\mathcal{N}}(\mathcal{T})$, yields the optimal solution for the social welfare maximization Problem $\mathbf{P3}$.

F. Discussions on Cooperative and Non-Cooperative Games

Based on (21), the utility function of EV in the cooperative game is reinforced with $Q_{fr}(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T}))$. Therefore, to solve $\mathbf{P4}_n$, each EV is required to receive the regulation request $P_r(\mathcal{T})$ over $\mathcal{T}$ from the EV aggregator. This brings communication overhead between the EV aggregator and EVs compared to the non-cooperative game. In return, the social welfare $\Phi_c(\mathbf{P}_{\mathcal{N}}(\mathcal{T}))$ can be optimized in the cooperative game and the V2G regulation services can also achieve near-optimal performance. These improvements will be discussed in Section VII.

VI. EXTENSION TO REAL-TIME GAME

The V2G market discussed in Sections III–V is a day-ahead market, in which the EV aggregator and EVs plan their strategies over the daily scheduling horizon $[T_{\mathrm{begin}}, T_{\mathrm{end}}]$. In a day-ahead market, it is assumed that the forecasting information of regulation request is available and the mobility information of EVs, e.g., arrival and departure times, is known in the planning period. Nevertheless, in real-world operation, the forecasting information of regulation request is hard to obtain and there exists large uncertainty in EVs' mobility behaviours. Therefore, a real-time solution is preferred to plan the strategies of players at each time slot based on current information. In this section, real-time V2G games in a real-time market are extended from the dynamic V2G games in a day-ahead market discussed in Sections III–V. In real-time V2G games, forecasting information of regulation request is not required and decisions are made among those EVs that have arrived at the EV aggregator. Therefore, real-time games are more practical for implementation in V2G system.

A. Real-Time V2G Non-Cooperative Game

1) Real-Time Game Formulation: In real-time V2G game, the utility function of EVs and the EV aggregator, as well as the strategy set, should be decoupled and redefined in each Time Slot $t$. For real-time V2G non-cooperative game, the utility function of EV $n$ at Time Slot $t$ is decoupled from (1) and defined as follows:

$$U_n(P_n(t), \mathbf{P}_{-n}(t), p_{EV}(t)) = -P_n(t)\Delta t\,p_{EV}(\mathbf{P}_{\mathcal{N}}(t)) + \gamma_n \log(S_{b,n} + S_n(t) + \eta(P_n(t))P_n(t)\Delta t) - \gamma_n \log(S_{b,n} + S_n(t)). \tag{25}$$

Based on (9), the utility function of the EV aggregator at Time Slot $t$ is decoupled from the whole scheduling period and defined as follows:

$$U_A(\mathbf{P}_{\mathcal{N}}(t), p_{EV}(t), p_g(t)) = \sum_{n=1}^{N} P_n(t)\Delta t\,p_{EV}(\mathbf{P}_{\mathcal{N}}(t)) - E_g(t)\,p_g(t) + Q_{fr,t}(\mathbf{P}_{\mathcal{N}}(t), P_r(t)), \tag{26}$$

where $E_g(t) = \sum_{n=1}^{N} P_n(t)\Delta t$. $Q_{fr,t}$ is the income of providing regulation services at Time Slot $t$ and is formulated as:

$$Q_{fr,t}(\mathbf{P}_{\mathcal{N}}(t), P_r(t)) = Q_{\mathrm{base},t} - p_{fr,t}\,F_t(\mathbf{P}_{\mathcal{N}}(t), P_r(t)), \tag{27}$$

where $Q_{\mathrm{base},t}$ is the base income for providing regulation service at Time Slot $t$ and $p_{fr,t}$ is the penalty factor for power fluctuation at Time Slot $t$. $F_t(\mathbf{P}_{\mathcal{N}}(t), P_r(t))$ measures the squared deviation between the total power and its mean value:

$$F_t(\mathbf{P}_{\mathcal{N}}(t), P_r(t)) = \big(P_{\mathrm{total}}(t) - \overline{P}_{\mathrm{total}}\big)^2, \tag{28}$$

where $P_{\mathrm{total}}$ is defined in (11). $\overline{P}_{\mathrm{total}}$ is the mean value of the total power over $\mathcal{T}$. Due to the zero-mean property of regulation request [6], $\overline{P}_{\mathrm{total}}$ is also the time-averaged total charging energy of EVs and can be estimated using historical EV charging data.

Similar to the dynamic V2G game, the real-time V2G game at Time Slot $t$ can then be formulated based on (16) and (18), with utility functions (1) and (9) replaced, respectively, by (25) and (26):

$$\mathbf{P5}_n: \quad \underset{P_n(t)}{\text{maximize}} \ U_n(P_n(t), \mathbf{P}_{-n}(t), p_{EV}(t)) \quad \text{subject to } P_n(t) \in \mathcal{Q}_n(t). \tag{29}$$

$$\mathbf{P6}: \quad \underset{p_A(t)}{\text{maximize}} \ U_A(\mathbf{P}^*_{\mathcal{N}}(t), p_{EV}(t), p_g(t)) \quad \text{subject to } p_{\min} \leq p_A(t) \leq p_{\max}, \tag{30}$$

and the feasible set of EV $n$ at Time Slot $t$, $\mathcal{Q}_n(t)$, follows:

$$\mathcal{Q}_n(t) = \{P_n(t) \mid P_n(t) \text{ satisfies (2), (6), and (32)}\}, \tag{31}$$

where

$$S_n(t) + \eta(P_n(t))P_n(t)\Delta t + (t_{n,\mathrm{out}} - t - 1)\,\eta_{ch}\overline{P}_n \geq S_{n,\mathrm{req}}, \quad t \leq t_{n,\mathrm{out}} - 1. \tag{32}$$

Constraint (32) guarantees that the charging requirement of EV $n$ can be fulfilled. Note that the original time-coupling constraints (3) and (7) in the feasible set $\mathcal{Q}_n$ are relaxed by introducing (32). Therefore, the feasible set of EV $n$ at Time Slot $t$, $\mathcal{Q}_n(t)$, is free of time-coupling component.

In the real-time V2G game, the EV aggregator determines the optimal $p_A(t)$ at each Time Slot $t \in \mathcal{T}$ based on $\mathbf{P6}$. Then EVs decide their charging/discharging powers at Time Slot $t$ based on $\mathbf{P5}_n$ correspondingly. The existence and uniqueness of the Nash equilibrium for the real-time V2G game can be easily proved using the VI theory discussed in Section IV-C. Similar to Algorithm 1, Algorithm 4 is devised for each EV $n \in \mathcal{N}$ to find the equilibrium solution at Time Slot $t$, $P^*_n(t)$. As an online algorithm, Algorithm 4 yields the time complexity of $O(1)$ to find the optimal solution of EV $n$ at Time $t$ in each iteration. Algorithm 2 can then be applied to find the real-time price at the equilibrium point, $p^*_A(t)$, using the gradient ascent approach.

Algorithm 4: Jacobi Best Response-Based Real-Time EV Algorithm (Non-Cooperative).
1: At Time Slot $t$, for EV $n \in \mathcal{N}$, choose any feasible starting point $P_n^{(0)}(t)$, and set iteration index $i = 0$.
2: Compute the optimal solution $P_n^{(i+1)}(t)$ based on $\mathbf{P5}_n$.
3: Broadcast $P_n^{(i+1)}(t)$ to other EVs.
4: Set $i \leftarrow i + 1$.
5: Update $\mathbf{P}_{-n}^{(i)}(t) \triangleq (P_1^{(i)}(t), \ldots, P_{n-1}^{(i)}(t), P_{n+1}^{(i)}(t), \ldots, P_N^{(i)}(t))$.
6: For each $n \in \mathcal{N}$, if $|P_n^{(i)}(t) - P_n^{(i-1)}(t)| \leq \varepsilon$, terminate the algorithm. Otherwise, repeat Steps 2–5 until the condition is satisfied.

2) Analytic Solutions of $\mathbf{P5}_n$: To find the Nash equilibrium of the real-time V2G non-cooperative game, we need to solve $\mathbf{P5}_n$ for optimal $P_n(t)$ given that $\mathbf{P}_{-n}(t)$ is available. Here, we obtain the closed-form solution of $\mathbf{P5}_n$ for $P_n(t)$ when $\mathbf{P}_{-n}(t)$ is given. Therefore, $\mathbf{P5}_n$ can be solved without the help of optimization solvers.

We first calculate the partial derivative of $U_n$ with respect to $P_n(t) \geq 0$:

$$\frac{\partial U_n(P_n(t), \mathbf{P}_{-n}(t))}{\partial P_n(t)} = -p_A \Delta t\Big(L_{\mathrm{base}} + P_r(t) + \sum_{k \in \mathcal{N},\, k \neq n} P_k(t)\Big) - 2p_A P_n(t)\Delta t + \gamma_n \frac{\eta_{ch}\Delta t}{S_{b,n} + S_n(t) + \eta_{ch}P_n(t)\Delta t}. \tag{33}$$

The partial derivative (33) is a strictly decreasing function with respect to $P_n(t)$. Setting the partial derivative to zero gives the maximum point of the utility function (25) if $P_n(t) \geq 0$, which is given in (34) below. Similarly, if $P_n(t) < 0$, the maximum point of the utility function (25) is given by (35). Therefore, combining (34) and (35) gives the maximum point of the utility function (25) as (36). Since $U_n(P_n(t), \mathbf{P}_{-n}(t))$ is strictly concave with respect to $P_n(t)$, its absolute maximum can only be attained at (36) or on the boundary. By confining $P_n(t)$ within its feasible region, the absolute maximum point of $\mathbf{P5}_n$ is obtained as (37). Equation (37) gives the closed-form solution of $\mathbf{P5}_n$ for EV $n$ when $\mathbf{P}_{-n}(t) \triangleq \{P_1(t), \ldots, P_{n-1}(t), P_{n+1}(t), \ldots, P_N(t)\}$ is given. By (37), the equilibrium solution of $\mathbf{P5}_n$ for each EV, i.e., $P^*_n(t)$ for $n \in \mathcal{N}$, can be obtained through the iteration steps in Algorithm 4. The equilibrium solution of $\mathbf{P5}_n$ is achieved after Algorithm 4 converges.

$$P_n(t) = \frac{p_A A(t)\eta_{ch}\Delta t + 2p_A B(t) - \sqrt{p_A^2\,[2B(t) + A(t)\eta_{ch}\Delta t]^2 - 8p_A \Delta t\,\eta_{ch}\,[p_A \cdot A(t) \cdot B(t) - \gamma_n \eta_{ch}]}}{-4p_A \Delta t\,\eta_{ch}} \tag{34}$$

where $A(t) = L_{\mathrm{base}} + P_r(t) + \sum_{k \in \mathcal{N},\, k \neq n} P_k(t)$ and $B(t) = S_{b,n} + S_n(t)$.

$$P_n(t) = \frac{p_A \delta A(t)\frac{1}{\eta_{dch}}\Delta t + 2p_A B(t) - \sqrt{p_A^2\,[2B(t) + \delta A(t)\frac{1}{\eta_{dch}}\Delta t]^2 - 8p_A \Delta t\,\frac{1}{\eta_{dch}}\,[p_A \cdot \delta A(t) \cdot B(t) - \gamma_n \frac{1}{\eta_{dch}}]}}{-4p_A \Delta t\,\frac{1}{\eta_{dch}}} \tag{35}$$

$$\check{P}_n(t) = \frac{p_A \cdot C(t) + 2p_A B(t) - \sqrt{p_A^2\,[2B(t) + C(t)]^2 - 8p_A \Delta t\,\eta(P_n(t))\,[p_A \alpha(P_n(t)) A(t) B(t) - \gamma_n \eta(P_n(t))]}}{-4p_A \Delta t\,\eta(P_n(t))} \tag{36}$$

where $C(t) = \alpha(P_n(t)) A(t)\,\eta(P_n(t))\Delta t$.

$$P^*_n(t) = \max\left\{ \min\left\{ \check{P}_n(t),\ \overline{P}_n,\ \frac{\overline{S}_n - S_n(t)}{\Delta t\,\eta_{ch}} \right\},\ \underline{P}_n,\ \frac{\underline{S}_n - S_n(t)}{\Delta t\,\frac{1}{\eta_{dch}}},\ \frac{S_{n,\mathrm{req}} - S_n(t) - (t_{n,\mathrm{out}} - t - 1)\,\eta(\overline{P}_n)\overline{P}_n}{\Delta t\,\eta_{ch}} \right\} \tag{37}$$

B. Real-Time V2G Cooperative Game

1) Real-Time Game Formulation: Similar to the procedures in Section VI-A, the real-time V2G cooperative game can also be formulated.

Based on (21), the utility function of EV in a cooperative system at Time Slot $t$ is decoupled and formulated as follows:

$$\begin{aligned} V_n(P_n(t), \mathbf{P}_{-n}(t)) &= \gamma_n \log(S_{b,n} + S_n(t) + \eta(P_n(t))P_n(t)\Delta t) - \gamma_n \log(S_{b,n} + S_n(t)) \\ &\quad + Q_{fr,t}(\mathbf{P}_{\mathcal{N}}(t), P_r(t)) - P_n(t)\Delta t\,\tilde{p}_g(t) \\ &= \gamma_n \log(S_{b,n} + S_n(t) + \eta(P_n(t))P_n(t)\Delta t) - \gamma_n \log(S_{b,n} + S_n(t)) + Q_{fr,t}(\mathbf{P}_{\mathcal{N}}(t), P_r(t)) \\ &\quad - P_n(t)\Delta t\,p_G\,\alpha(P_n(t))\Big(\sum_{k \in \mathcal{N},\, k \neq n} 2P_k(t) + P_n(t) + L_G + P_r(t)\Big), \end{aligned} \tag{38}$$

where $Q_{fr,t}(\mathbf{P}_{\mathcal{N}}(t), P_r(t))$ is given by (27).

The real-time V2G cooperative game is then formulated as the following NEP:

$$\mathbf{P7}_n: \quad \underset{P_n(t)}{\text{maximize}} \ V_n(P_n(t), \mathbf{P}_{-n}(t)) \quad \text{subject to } P_n(t) \in \mathcal{Q}_n(t). \tag{39}$$

Similar to $\mathbf{P4}_n$, $\mathbf{P7}_n$ admits a unique equilibrium solution. Algorithm 5 can find the equilibrium solution for each EV in the real-time V2G cooperative game at each time slot. Equation (24) then gives the real-time price at Time Slot $t$, $p^*_A(t)$.

Algorithm 5: Jacobi Best Response-Based Real-Time EV Algorithm (Cooperative).
1: At Time Slot $t$, for EV $n \in \mathcal{N}$, choose any feasible starting point $P_n^{(0)}(t)$, and set iteration index $i = 0$.
2: Compute the optimal solution $P_n^{(i+1)}(t)$ based on $\mathbf{P7}_n$.
3: Broadcast $P_n^{(i+1)}(t)$ to other EVs.
4: Set $i \leftarrow i + 1$.
5: Update $\mathbf{P}_{-n}^{(i)}(t) \triangleq (P_1^{(i)}(t), \ldots, P_{n-1}^{(i)}(t), P_{n+1}^{(i)}(t), \ldots, P_N^{(i)}(t))$.
6: For each $n \in \mathcal{N}$, if $|P_n^{(i)}(t) - P_n^{(i-1)}(t)| \leq \varepsilon$, terminate the algorithm. Otherwise, repeat Steps 2–5 until the condition is satisfied.

2) Analytic Solutions of $\mathbf{P7}_n$: Similar to the real-time V2G non-cooperative game, we obtain the closed-form solution of $\mathbf{P7}_n$ for $P_n(t)$ when $\mathbf{P}_{-n}(t)$ is given. The partial derivative of $V_n$ with respect to $P_n(t)$ is given as (40) when $P_n(t) \geq 0$:

$$\begin{aligned} \frac{\partial V_n(P_n(t), \mathbf{P}_{-n}(t))}{\partial P_n(t)} &= -p_G \Delta t\Big(L_G + P_r(t) + \sum_{k \in \mathcal{N},\, k \neq n} 2P_k(t)\Big) - 2p_G P_n(t)\Delta t + \gamma_n \frac{\eta_{ch}\Delta t}{S_{b,n} + S_n(t) + \eta_{ch}P_n(t)\Delta t} \\ &\quad - 2p_{fr}\Big(P_r(t) + \sum_{k \in \mathcal{N},\, k \neq n} P_k(t) + P_n(t) - \overline{P}_{\mathrm{total}}\Big). \end{aligned} \tag{40}$$
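Setting (40) to zero and clearing the denominator leaves a quadratic in $P_n(t)$, and the closed form (41) is the root of that quadratic on the charging branch. The sketch below checks this numerically. The intermediate names `a` and `c` and all parameter values are illustrative assumptions, not taken from the paper; `B`, `G`, `H`, and `K` follow the shorthands defined with (41).

```python
import math


def dV_dP(P, p_G, p_fr, dt, eta_ch, gamma_n, B, G, H):
    """Partial derivative (40) for P >= 0, written with the shorthands of (41):
    B = S_b,n + S_n(t), G = P_r + sum_{k != n} P_k - mean total power,
    H = L_G + P_r + sum_{k != n} 2 P_k."""
    return (-p_G * dt * H - 2.0 * p_G * P * dt
            + gamma_n * eta_ch * dt / (B + eta_ch * P * dt)
            - 2.0 * p_fr * (G + P))


def closed_form_charging_root(p_G, p_fr, dt, eta_ch, gamma_n, B, G, H):
    """Closed form (41); a and c are helper names introduced for this sketch."""
    a = 2.0 * dt * p_G + 2.0 * p_fr
    K = (2.0 * B * p_fr + 2.0 * eta_ch * p_fr * G * dt
         + eta_ch * p_G * dt ** 2 * H + 2.0 * B * dt * p_G)
    c = 2.0 * p_fr * B * G + B * p_G * dt * H - gamma_n * eta_ch * dt
    disc = K * K - 4.0 * a * eta_ch * dt * c
    return (K - math.sqrt(disc)) / (2.0 * eta_ch * (-a) * dt)


# Hypothetical parameter values chosen only so that the root is positive.
params = dict(p_G=0.01, p_fr=0.05, dt=1.0 / 12.0, eta_ch=1.0,
              gamma_n=60.0, B=10.0, G=2.0, H=30.0)
root = closed_form_charging_root(**params)
residual = dV_dP(root, **params)
```

Under these assumed values the derivative vanishes at the computed root up to rounding error. The feasibility clipping of (44) would still have to be applied afterwards and is not part of this sketch.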

Setting the partial derivative to zero gives the maximum point of the utility function (38) if $P_n(t) \geq 0$, which is given in (41) below. Similarly, the maximum point of the utility function (38) is given by (42) if $P_n(t) < 0$. Therefore, combining (41) and (42) gives the maximum point of the utility function (38) as (43). Since $V_n(P_n(t), \mathbf{P}_{-n}(t))$ is strictly concave with respect to $P_n(t)$, its absolute maximum can only be attained at (43) or on the boundary. By confining $P_n(t)$ within its feasible region, the absolute maximum point of $\mathbf{P7}_n$ is obtained as (44). Equation (44) gives the closed-form solution of $\mathbf{P7}_n$ for EV $n$ when $\mathbf{P}_{-n}(t) \triangleq \{P_1(t), \ldots, P_{n-1}(t), P_{n+1}(t), \ldots, P_N(t)\}$ is given. By (44), the solution of $\mathbf{P7}_n$ for each EV, i.e., $P^*_n(t)$ for $n \in \mathcal{N}$, can be obtained through the iteration steps in Algorithm 5. The equilibrium solution of $\mathbf{P7}_n$ is achieved after Algorithm 5 converges.

$$P_n(t) = \frac{K(t) - \sqrt{K^2(t) - 4(2\Delta t\,p_G + 2p_{fr})\,\eta_{ch}\Delta t\,[2p_{fr}B(t) \cdot G(t) + B(t)p_G \Delta t\,H(t) - \gamma_n \eta_{ch}\Delta t]}}{2\eta_{ch}(-2\Delta t\,p_G - 2p_{fr})\Delta t} \tag{41}$$

where $G(t) = P_r(t) + \sum_{k \in \mathcal{N},\, k \neq n} P_k(t) - \overline{P}_{\mathrm{total}}$, $B(t) = S_{b,n} + S_n(t)$, $H(t) = L_G + P_r(t) + \sum_{k \in \mathcal{N},\, k \neq n} 2P_k(t)$, and $K(t) = 2B(t)p_{fr} + 2\eta_{ch}p_{fr}G(t)\Delta t + \eta_{ch}p_G(\Delta t)^2 H(t) + 2B(t)\Delta t\,p_G$.

$$P_n(t) = \frac{M(t) - \sqrt{M^2(t) - 4(2\Delta t\,p_G\delta + 2p_{fr})\frac{1}{\eta_{dch}}\Delta t\,[2p_{fr}B(t) \cdot G(t) + B(t)p_G\delta\Delta t\,H(t) - \gamma_n \frac{1}{\eta_{dch}}\Delta t]}}{2\frac{1}{\eta_{dch}}(-2\Delta t\,p_G\delta - 2p_{fr})\Delta t} \tag{42}$$

where $M(t) = 2B(t)p_{fr} + 2\frac{1}{\eta_{dch}}p_{fr}G(t)\Delta t + \frac{1}{\eta_{dch}}p_G\delta(\Delta t)^2 H(t) + 2B(t)\Delta t\,p_G\delta$.

$$\hat{P}_n(t) = \frac{R(t) - \sqrt{R^2(t) - 4(2\Delta t\,p_G\alpha(P_n(t)) + 2p_{fr})\,\eta(P_n(t))\Delta t\,[2p_{fr}B(t) \cdot G(t) + B(t)p_G\alpha(P_n(t))\Delta t\,H(t) - \gamma_n \eta(P_n(t))\Delta t]}}{2\eta(P_n(t))(-2\Delta t\,p_G\alpha(P_n(t)) - 2p_{fr})\Delta t} \tag{43}$$

where $R(t) = 2B(t)p_{fr} + 2\eta(P_n(t))p_{fr}G(t)\Delta t + \eta(P_n(t))p_G\alpha(P_n(t))(\Delta t)^2 H(t) + 2B(t)\Delta t\,p_G\alpha(P_n(t))$.

$$P^*_n(t) = \max\left\{ \min\left\{ \hat{P}_n(t),\ \overline{P}_n,\ \frac{\overline{S}_n - S_n(t)}{\Delta t\,\eta_{ch}} \right\},\ \underline{P}_n,\ \frac{\underline{S}_n - S_n(t)}{\Delta t\,\frac{1}{\eta_{dch}}},\ \frac{S_{n,\mathrm{req}} - S_n(t) - (t_{n,\mathrm{out}} - t - 1)\,\eta(\overline{P}_n)\overline{P}_n}{\Delta t\,\eta_{ch}} \right\} \tag{44}$$

VII. PERFORMANCE EVALUATION

A. Simulation Setup

We evaluate the performance of the proposed game theoretic V2G scheduling approaches in a V2G system, which includes one EV aggregator and 50 EVs supporting regulation services for a power grid. We consider a time horizon of 1000 minutes (from 07:00 in the morning to 23:40 in the evening), which is divided equally into $T = 200$ slots of length $\Delta t = 5$ minutes.

Among EVs currently in the market, Chevrolet Volt with 18.4 kWh battery pack [23] is chosen to conduct our simulation. For bidirectional V2G, the charging/discharging power of the Volt EV ranges from −3.6 kW to 3.6 kW according to the standard Level 2 charging in the USA [24]. The charging and discharging efficiencies are both set to one. Based on [6], [25], the distributions of plug-in and departure times of EVs are close to the normal distributions. The plug-in time of an EV follows a normal distribution with the mean at 08:00 and the standard deviation of 60 minutes. Any plug-in time before 07:00 is set to be 07:00. Similarly, the departure time of an EV is assumed to follow a normal distribution with the mean at 22:00 and the standard deviation of 60 minutes. Any departure time after 23:40 is set to be 23:40. In the electricity price model $p_g(t)$, $p_G$ is set as 0.01 so that the average grid electricity price is 0.3 dollar/kWh. $\delta$ in (14) is set to 1 if not specified. The regulation signals from the PJM market on Dec. 1, 2017 [26] are adopted in our simulation. The base capacity price for providing regulation services is set as 0.1 dollar/kW. To match with the V2G capacity, the original regulation power is normalized with standard deviation $\sigma = 71.62$ kW in Sections VII-D-1), 2), 5), and 6) and standard deviation $\sigma = 57.30$ kW in Sections VII-D-3) and 4). The simulation is conducted in MATLAB Release 2015b, using the Gurobi optimization solver under the YALMIP toolbox.

B. Performance Metrics

Two performance metrics are applied to the evaluation of different algorithms: Quality of Smoothing and Social Welfare.

The Quality of Smoothing metric measures the performance of providing frequency regulation services [6]. The Quality of Smoothing function $F(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T}))$ [6] reflects to what extent the V2G system can smooth out the fluctuation of regulation request:

$$F(\mathbf{P}_{\mathcal{N}}(\mathcal{T}), P_r(\mathcal{T})) = \mathrm{Var}(P_{\mathrm{total}}(\mathcal{T})) = \frac{1}{T}\sum_{i=1}^{T}\Big(P_{\mathrm{total}}(i) - \frac{1}{T}\sum_{j=1}^{T} P_{\mathrm{total}}(j)\Big)^2, \tag{45}$$

where $P_{\mathrm{total}}$ is defined in (11). Equation (45) measures the variance of the total power, which is the sum of the regulation request and the total charging or discharging power of EVs. Intuitively, the lower the variance of the total power becomes, the better the frequency regulation services will be.

The Social Welfare of the whole V2G system is defined as the sum of the utilities of all EVs plus the utility of the EV

TABLE I COMPARISONS OF DIFFERENT ALGORITHMS

Fig. 5. Grid Frequency Response Under Different Algorithms.

Fig. 4. Smoothing Effect of V2G Game.

aggregator. The Social Welfare function is given by Φc(PN (T )) in (19).
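The Quality of Smoothing metric in (45) is the population variance of the total power profile. A minimal sketch of the computation follows; the function name and the four-slot profiles are illustrative assumptions, not data from the paper.

```python
def quality_of_smoothing(p_total):
    """Population variance of the total power over the scheduling slots, as in (45)."""
    T = len(p_total)
    mean = sum(p_total) / T
    return sum((p - mean) ** 2 for p in p_total) / T


# Hypothetical 4-slot example: EV powers that exactly cancel the regulation request.
regulation = [2.0, -2.0, 1.0, -1.0]
ev_power = [-2.0, 2.0, -1.0, 1.0]
total = [r + p for r, p in zip(regulation, ev_power)]

without_v2g = quality_of_smoothing(regulation)
with_v2g = quality_of_smoothing(total)
```

In this toy case the regulation request alone has a variance of 2.5, while the combined profile is flat and has a variance of zero, which corresponds to the best possible smoothing.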

C. Algorithms for Comparison

To test the achieved performance of the proposed game theoretic approaches, three algorithms to implement V2G scheduling are compared in the simulation:
(S1): Non-cooperative game approach (NCG);
(S2): Cooperative game approach (CG);
(S3): System optimization approach (SO) [6].

(S1) and (S2) are the approaches proposed in our work, which are introduced in Sections IV and V, respectively. (S3) is the approach devised in [6]. Specifically, (S3) takes Quality of Smoothing (45) as the control objective and optimizes the charging/discharging schedules of EVs from the perspective of the grid operator. Therefore, (S3) can achieve the optimal Quality of Smoothing in the V2G scheduling. However, since (S3) only takes Quality of Smoothing as the optimization target, it fails to consider the utility of individual EVs. To better understand the differences among the algorithms, Table I illustrates the time complexity and information exchange required by the different algorithms.

D. Simulation Results

1) Effectiveness of V2G Game for Providing Regulation Services (Quality of Smoothing): In the V2G non-cooperative game and cooperative game, EVs autonomously determine their charging/discharging strategies to maximize their utility functions. We show that their charging/discharging strategies can absorb the power fluctuations from the regulation requests. Fig. 4 illustrates the total grid power, which is the sum of the regulation request and the charging/discharging powers of all EVs, under different operation algorithms. We can see from Fig. 4 that all three algorithms show good performance in providing frequency regulation services. Among the three algorithms, (S3) achieves the optimal Quality of Smoothing since it takes Quality of Smoothing as the control objective. Compared to (S1), (S2) achieves a better and near-optimal Quality of Smoothing since the profile of total grid power achieved by (S2) is quite close to that of (S3). Therefore, it is concluded that the Quality of Smoothing can be improved when EVs and the EV aggregator collaborate with each other in the V2G system.

To better show the performance of regulation services, we further conduct a test in a small-scale islanded grid with installed capacity of 3000 kW. Fig. 5 illustrates the grid frequency under different algorithms. It is shown that all three algorithms can stabilize the grid frequency within the range of [59.9, 60.1] Hz. Compared to (S1), (S2) and (S3) achieve better performance in stabilizing the grid frequency.

Fig. 6. Convergence of V2G Algorithms. (a) Convergence of Algorithm 1. (b) Convergence of Algorithm 3.

2) Convergence Analysis: We test the convergence of the algorithm for both (S1) and (S2). Fig. 6(a) and 6(b) illustrate the total V2G power of EVs under different numbers of iterations in Algorithm 1 and Algorithm 3. From Fig. 6(a) and 6(b), it is shown that Algorithm 1 can converge to the Nash equilibrium within around 20 iterations and Algorithm 3 can converge to the Nash equilibrium within 30 iterations. Therefore, Algorithms 1 and 3 are efficient in finding the Nash equilibrium of the game. One interesting finding is that Algorithm 1 suffers oscillations in its iterative process until it converges. This is due to the competitive behaviours of EVs in the non-cooperative game.
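The oscillation of a Jacobi best-response iteration can be reproduced on a toy game. The sketch below is not the V2G game of this paper: it assumes a hypothetical quadratic utility $u_n = c\,x_n - \tfrac{a}{2}x_n^2 - b\,x_n\sum_{k \neq n} x_k$, whose best response is $x_n = (c - b\sum_{k \neq n} x_k)/a$, and all names and numbers are illustrative.

```python
def jacobi_best_response(n_players, a, b, c, eps=1e-9, max_iter=1000):
    """Jacobi iteration for the toy quadratic game described in the lead-in.

    Every player best-responds to the same previous profile, mirroring
    the simultaneous update of a Jacobi scheme. Returns the final profile
    and the list of all visited profiles.
    """
    x = [0.0] * n_players          # zero initialization
    history = [x[:]]
    for _ in range(max_iter):
        total = sum(x)
        # Best response of player n against the others' previous strategies.
        new_x = [(c - b * (total - x[n])) / a for n in range(n_players)]
        history.append(new_x)
        if max(abs(u - v) for u, v in zip(new_x, x)) <= eps:
            return new_x, history
        x = new_x
    return x, history


# Hypothetical parameters: 3 players, contraction factor (N - 1) * b / a = 0.6.
profile, history = jacobi_best_response(n_players=3, a=1.0, b=0.3, c=1.0)
equilibrium = 1.0 / (1.0 + 2 * 0.3)   # c / (a + (N - 1) * b)
```

With these assumed parameters the iterates overshoot and undershoot the symmetric equilibrium on alternate steps before settling, because each player reacts to the others' previous move at the same time. This is one simple mechanism that produces oscillation in a competitive best-response scheme.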

Fig. 8. Impact of δ on Quality of Smoothing.

In comparison, Algorithm 3 shows monotone convergence in its iterative process. This is because the potential function in the cooperative game implicitly guides the behaviours of EVs towards the equilibrium. Therefore, the V2G power converges monotonically to the equilibrium.

The convergence of the Jacobi best response-based algorithm is generally sensitive to the choice of the initial point. In this test, we study the effect of different initialization methods on the convergence of the Jacobi algorithm, and compare two initialization methods, i.e., I1) zero initialization and I2) initialization with approximation.

For I1, the starting point of the charging/discharging power for each EV $n \in \mathcal{N}$ is initialized with $P_n^{(0)}(\mathcal{T}) = (0)_{1 \times T}$. For I2, we initialize $P_n^{(0)}(\mathcal{T})$ so that it approximates the value of $P_n(\mathcal{T})$ at the equilibrium point:

$$P_n^{(0)}(\mathcal{T}) = \left( \frac{P_{total,avg} - P_r(t)}{N(t)} \right)_{t=1}^{T}, \tag{46}$$

where $N(t)$ is the number of EVs that participate in the V2G game at time $t$, and $P_{total,avg}$ is the estimated mean of the total grid power over $\mathcal{T}$, i.e., $P_{total,avg} = \frac{1}{T} \cdot \sum_{t=1}^{T} \left( P_r(t) + \sum_{n \in \mathcal{N}} P_n(t) \right)$. Since frequency regulation services are zero-energy services in which $\sum_{t=1}^{T} P_r(t) \approx 0$, $P_{total,avg}$ can be approximated by the time-average of the total charged energy of EVs:

$$P_{total,avg} = \frac{1}{T} \cdot \frac{E_{EV}}{\Delta t}, \tag{47}$$

where $E_{EV}$ is the total charged energy of all EVs over $\mathcal{T}$. It can be easily estimated with historical data of $E_{EV}$.

I2 aims to make $P_n^{(0)}(\mathcal{T})$ closer to its equilibrium point compared with I1. Fig. 7 illustrates the convergence curve of Algorithm 1 under these two initialization methods. It is shown that I1 and I2 converge to the same value, i.e., the Nash equilibrium point. Therefore, the convergence performance of these two methods can be evaluated by their convergence speed. Intuitively, a faster convergence speed yields a better performance. Compared with I1, which converges within 25 iterations, I2 achieves a faster convergence speed and converges in around 15 iterations. Therefore, I2 achieves faster and more stable convergence than I1. Averaged over 20 simulation runs, I2 saves 479.7 s in finding the equilibrium point of the game compared with I1 (1189.5 s). It can be concluded that, by approximating the equilibrium solution, I2 outperforms I1 in Algorithm 1 in terms of convergence performance.

Fig. 7. Effect of Initialization on Convergence.

3) Impact of δ on Quality of Smoothing: So far, it has been assumed that the electricity buying price equals the electricity selling price of EVs, i.e., δ = 1 in (14). Nevertheless, the buying price and selling price can differ due to different pricing policies of governments and market operators. To study this situation, we consider two extreme cases in the V2G non-cooperative game (S1): one in which the buying price is much higher than the selling price (δ = 0), and one in which the buying price is much lower than the selling price (δ = 10). To better illustrate how δ affects Quality of Smoothing, the charging requirements of EVs are set to zero. This relaxes the charging constraints, so that the changes in Quality of Smoothing are caused purely by the electricity prices. The results are illustrated in Fig. 8. From Fig. 8(a), it can be concluded that when the selling price is relatively low, EVs respond only to regulation-down signals while ignoring the regulation-up signals, which results in unidirectional regulation services. This is because EVs are not willing to sell electricity to the grid, since they cannot gain benefits from selling when the selling price is low. When the selling price is much higher than the buying price, as illustrated in Fig. 8(b), EVs can gain much more benefit through arbitrage, i.e., buying electricity in the regulation-down period and then selling electricity in the regulation-up period. Therefore, (S1) can still achieve good Quality of Smoothing and provide bidirectional V2G regulation services when the selling price is much higher than the buying price.

4) Game With Irresponsive EVs: In the above discussion, it is assumed that all of the EVs are responsive, i.e., they choose their strategies to maximize their own utilities. However, in real-world operation, there may exist irresponsive EVs in the game that do not adjust their strategies to maximize their utilities. For instance, some EVs may adopt deterministic charging strategies without responding to electricity prices. They are regarded as "noise users" in the V2G game who can deteriorate the V2G performance (reflected by Quality of Smoothing). This inspires us to study how the irresponsive EVs affect the performance of providing regulation services. In this test, we evaluate the Quality of Smoothing achieved by (S1) with different proportions of irresponsive EVs in the non-cooperative game. It is assumed that all the irresponsive EVs use a constant charging scheme to meet their charging requirements, in which the charging power of the EV remains constant during the plug-in period. The results are illustrated in Fig. 9.

Fig. 9. V2G Non-cooperative Game with Irresponsive EVs. (a) 20% EVs are Irresponsive. (b) 50% EVs are Irresponsive.

Fig. 11. Utilities of the EV Aggregator and EVs.

Fig. 12. Smoothing Effect of Real-Time V2G Game. (a) Real-Time V2G Non-cooperative Game. (b) Real-Time V2G Cooperative Game.

Fig. 10. Social Welfare.
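The constant charging scheme assumed for irresponsive EVs in the test of Fig. 9 can be sketched as below. This is a hedged illustration only: the function name, the slot-based arrival and departure indices, and the units are assumptions, and charger power limits and charging efficiency are ignored.

```python
from typing import List


def constant_charging_profile(
    energy_required_kwh: float,
    arrival_slot: int,
    departure_slot: int,
    num_slots: int,
    slot_hours: float,
) -> List[float]:
    """Per-slot charging power (kW): constant while plugged in, zero otherwise."""
    plugged_slots = departure_slot - arrival_slot
    if plugged_slots <= 0:
        raise ValueError("departure_slot must be later than arrival_slot")
    # Spread the required energy evenly over the plug-in period.
    power_kw = energy_required_kwh / (plugged_slots * slot_hours)
    return [
        power_kw if arrival_slot <= t < departure_slot else 0.0
        for t in range(num_slots)
    ]
```

Because such a profile is fixed in advance, it does not react to prices or to the regulation request, which is what makes these EVs act as "noise users" from the viewpoint of the responsive EVs.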

In Fig. 9(a), the achieved Quality of Smoothing deteriorates only slightly when 20 percent of EVs become irresponsive. This means that (S1) can still achieve good Quality of Smoothing when a small portion of EVs are irresponsive. This is because the responsive EVs can sense the behaviours of irresponsive EVs during the iterative process in Algorithm 1. These responsive EVs then adjust their strategies accordingly, which still keeps the total power flat. As the proportion of irresponsive EVs increases from 20 percent to 50 percent, the total power fluctuates, as the yellow dotted curve shows in Fig. 9(b). This is because the deterministic charging behaviours of a large number of irresponsive EVs reduce the V2G capacity of the V2G system, which in turn deteriorates the Quality of Smoothing achieved. We can conclude from this study that the V2G system has a certain degree of tolerance to a small portion of irresponsive EVs.

5) Social Welfare Evaluation: The Social Welfare achieved by the different algorithms is compared in Fig. 10. Among the three algorithms, (S2) achieves the highest value of Social Welfare. (S3) achieves the second highest value of Social Welfare, since it takes Quality of Smoothing as the control objective and fails to optimize the Social Welfare. (S1) achieves the lowest value of Social Welfare because of the non-cooperative behaviours of different parties in the system. Another interesting finding is that, as the number of iterations in Algorithm 1 increases, the Social Welfare achieved by (S1) decreases until it reaches the equilibrium point. In each iteration of Algorithm 1, the achieved Social Welfare is lower than in the previous iteration, since the competition in the charging/discharging activities of self-interested EVs becomes more severe after each iteration, which has a negative effect on the Social Welfare of the system. In comparison, as the number of iterations in Algorithm 3 increases, the Social Welfare achieved by (S2) keeps increasing until it reaches the global optimum. This is because, in the potential cooperative game, each individual EV optimizes its utility function in the cooperative system while the potential function (Social Welfare function) is also enhanced. After each iteration, the Social Welfare of the system is improved compared to the previous iteration, and it stops changing once it reaches the equilibrium point, which is also the global optimum of the Social Welfare.

Furthermore, Fig. 11 illustrates the utilities of the EV aggregator as well as EVs achieved by (S1) and (S2). It shows that not only the Social Welfare but also the utility of each party is improved through cooperation. Compared to the non-cooperative game, the utilities of EVs and the EV aggregator increase by 10.25% and 5.34%, respectively. Therefore, the EV aggregator and EVs both have incentives to collaborate in the V2G game.

6) Real-Time V2G Game: In this test, we consider the V2G cooperative and non-cooperative games operated in real-time, as discussed in Section VI. The simulation results are illustrated in Fig. 12(a) and 12(b), where the solid curve denotes the total grid power when the V2G game is incorporated. We can see from Fig. 12 that, similar to the dynamic games, EVs can smooth out the power fluctuations from the grid in the real-time games. Through cooperation, the EVs and the EV aggregator can achieve a better Quality of Smoothing in the real-time game. Therefore, through real-time electricity trading, the V2G system can effectively provide regulation services for the power grid in the real-time games.

VIII. CONCLUSION

This paper proposes non-cooperative and cooperative game theoretic approaches for a V2G system that aims to provide frequency regulation services. In the non-cooperative game, the interaction between competitive EVs and the EV aggregator is described as a Stackelberg game, in which the EV aggregator is the leader that determines the electricity trading price and the EVs are the followers that decide their charging/discharging strategies. In the cooperative game, a potential game for EVs and the EV aggregator is formulated to achieve the optimal social welfare of the V2G system. For both the V2G non-cooperative game and cooperative game, the existence and uniqueness of the Nash equilibria of the games are validated, and algorithms are then devised to find the Nash equilibria. To operate the V2G system in real-time, in Section VI, real-time V2G games are extended from the dynamic V2G games proposed in Sections IV and V. Our simulation results show that the proposed non-cooperative game can autonomously motivate EVs to smooth out the power fluctuations from the grid when EVs maximize their own utilities. By using the cooperative game approach, the social welfare of EVs and the EV aggregator can be further improved to the global optimum and the Quality of Smoothing can also achieve near-optimal performance, with only a slight increase in the communication overhead between the aggregator and EVs.

APPENDIX

A. Negative Definiteness of Hessian Matrix of f

Proof: $\forall\, 1 \le i \le N$, the first-order partial derivative of $U_n(P_{\mathcal{N}}(t))$ with respect to $P_i(t)$ is calculated as follows:

$$\frac{\partial U_n(P_{\mathcal{N}}(t))}{\partial P_i(t)} = -P_i(t) - \sum_{k \in \mathcal{N}} P_k(t) + \gamma_n \frac{1}{S_{b,i} + S_{i,in} + P_i(t)} \tag{48}$$

$\forall\, 1 \le i, j \le N$ with $i \ne j$, the second-order derivatives are derived from (48) as follows:

$$\frac{\partial^2 U_n(P_{\mathcal{N}}(t))}{\partial P_i(t)^2} = -2 - \gamma_n \frac{1}{(S_{b,i} + S_{i,in} + P_i(t))^2} \tag{49}$$

$$\frac{\partial^2 U_n(P_{\mathcal{N}}(t))}{\partial P_i(t)\, \partial P_j(t)} = -1 \tag{50}$$

Therefore, the Hessian matrix is calculated as:

$$\nabla^2 U_n(P_{\mathcal{N}}(t)) = -\begin{bmatrix} 2+\gamma_n\zeta(t) & 1 & 1 & \cdots & 1 \\ 1 & 2+\gamma_n\zeta(t) & 1 & \cdots & 1 \\ 1 & 1 & \ddots & \ddots & \vdots \\ \vdots & \vdots & \ddots & 2+\gamma_n\zeta(t) & 1 \\ 1 & 1 & \cdots & 1 & 2+\gamma_n\zeta(t) \end{bmatrix}, \tag{51}$$

where $\zeta(t) = \frac{1}{(S_{b,i} + S_{i,in} + P_i(t))^2}$.

Note that, for the matrix in brackets in (51), all the principal minors of order $i$ are equal to the leading principal minor of the same order $i$. Since $\gamma_n, \zeta(t) > 0$, it can then be proved by mathematical induction that all the leading principal minors of the bracketed matrix are positive, so this matrix is positive definite. Therefore, the Hessian matrix of f, being its negative, is negative definite. ∎

REFERENCES

[1] M. Kwon and S. Choi, "An electrolytic capacitorless bidirectional EV charger for V2G and V2H applications," IEEE Trans. Power Electron., vol. 32, no. 9, pp. 6792–6799, Sep. 2017.
[2] M. Restrepo, J. Morris, M. Kazerani, and C. A. Canizares, "Modeling and testing of a bidirectional smart charger for distribution system EV integration," IEEE Trans. Smart Grid, vol. 9, no. 1, pp. 152–162, Jan. 2018.
[3] C. H. Merrill et al., "Modeling and simulation of fleet vehicle batteries for integrated logistics and grid services," in Proc. Syst. Inf. Eng. Des. Symp., Apr. 2015, pp. 255–260.
[4] T. N. Pham, H. Trinh, and L. V. Hien, "Load frequency control of power systems with electric vehicles and diverse transmission links using distributed functional observers," IEEE Trans. Smart Grid, vol. 7, no. 1, pp. 238–252, Jan. 2016.
[5] H. Yang, C. Y. Chung, and J. Zhao, "Application of plug-in electric vehicles to frequency regulation based on distributed signal acquisition via limited communication," IEEE Trans. Power Syst., vol. 28, no. 2, pp. 1017–1026, May 2013.
[6] J. Lin, K.-C. Leung, and V. O. K. Li, "Optimal scheduling with vehicle-to-grid regulation service," IEEE Internet Things J., vol. 1, no. 6, pp. 556–569, Dec. 2014.
[7] C. Shao, X. Wang, X. Wang, C. Du, and B. Wang, "Hierarchical charge control of large populations of EVs," IEEE Trans. Smart Grid, vol. 7, no. 2, pp. 1147–1155, Mar. 2016.
[8] H. Wu, M. Shahidehpour, A. Alabdulwahab, and A. Abusorrah, "A game theoretic approach to risk-based optimal bidding strategies for electric vehicle aggregators in electricity markets with variable wind energy resources," IEEE Trans. Sustain. Energy, vol. 7, no. 1, pp. 374–385, Jan. 2016.
[9] H. Yang, X. Xie, and A. V. Vasilakos, "Noncooperative and cooperative optimization of electric vehicle charging under demand uncertainty: A robust Stackelberg game," IEEE Trans. Veh. Technol., vol. 65, no. 3, pp. 1043–1058, Mar. 2016.
[10] L. Zhang and Y. Li, "A game-theoretic approach to optimal scheduling of parking-lot electric vehicle charging," IEEE Trans. Veh. Technol., vol. 65, no. 6, pp. 4068–4078, Jun. 2016.
[11] Z. Zhu, S. Lambotharan, W. H. Chin, and Z. Fan, "A mean field game theoretic approach to electric vehicles charging," IEEE Access, vol. 4, pp. 3501–3510, 2016.
[12] O. Beaude, S. Lasaulce, and M. Hennebel, "Charging games in networks of electrical vehicles," in Proc. 6th Int. Conf. Netw. Games, Control, Optim., 2012, pp. 96–103.
[13] S. Bahrami and V. W. S. Wong, "A potential game framework for charging PHEVs in smart grid," in Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process., 2015, pp. 28–33.
[14] J. Tan and L. Wang, "A game-theoretic framework for vehicle-to-grid frequency regulation considering smart charging mechanism," IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2358–2369, Sep. 2017.
[15] C. Wu, H. Mohsenian-Rad, and J. Huang, "Vehicle-to-aggregator interaction game," IEEE Trans. Smart Grid, vol. 3, no. 1, pp. 434–442, Mar. 2012.
[16] X. Chen and K.-C. Leung, "A game theoretic approach to vehicle-to-grid scheduling," in Proc. IEEE Global Commun. Conf., Dec. 2018, pp. 1–6.
[17] John A. Dutton e-Education Institute, PSU, EBF 483: Introduction to Electricity Markets, Apr. 2019. [Online]. Available: https://www.e-education.psu.edu/ebf483/node/705
[18] S. Kim and G. B. Giannakis, "An online convex optimization approach to real-time energy pricing for demand response," IEEE Trans. Smart Grid, vol. 8, no. 6, pp. 2784–2793, Nov. 2017.
[19] Y. Wu, X. Tan, L. Qian, D. H. K. Tsang, W. Song, and L. Yu, "Optimal pricing and energy scheduling for hybrid energy trading market in future smart grid," IEEE Trans. Ind. Inform., vol. 11, no. 6, pp. 1585–1596, Dec. 2015.
[20] Battery University, Coulombic and Energy Efficiency with the Battery, Jan. 2018. [Online]. Available: http://batteryuniversity.com/learn/article/bu_808c_coulombic_and_energy_efficiency_with_the_battery
[21] F. Facchinei and J.-S. Pang, "Nash equilibria: The variational approach," in Convex Optimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds., Cambridge, U.K.: Cambridge Univ. Press, 2009, ch. 12, pp. 443–493.

[22] J. Pang, G. Scutari, D. P. Palomar, and F. Facchinei, "Design of cognitive radio systems under temperature-interference constraints: A variational inequality approach," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3251–3271, Jun. 2010.
[23] Chevrolet, Chevy volt specifications, Apr. 2018. [Online]. Available: http://www.plugincars.com/chevrolet-volt
[24] SAE Electric Vehicle and Plug-in Hybrid Electric Vehicle Conductive Charge Coupler, SAE Standard J1772, Oct. 2010.
[25] S. Xie, W. Zhong, K. Xie, R. Yu, and Y. Zhang, "Fair energy scheduling for vehicle-to-grid networks using adaptive dynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 8, pp. 1697–1707, Aug. 2016.
[26] PJM, Regulation signal data, Aug. 2017. [Online]. Available: http://www.pjm.com/-/media/markets-ops/ancillary/mkt-based-regulation/regulation-data.ashx?la=en

Xiangyu Chen (S'16) received the B.S. degree in information engineering from Shanghai Jiao Tong University, Shanghai, China, in 2015 and the Ph.D. degree in electrical and electronic engineering from the University of Hong Kong, Hong Kong, in 2019. His research interests include smart grids, vehicle-to-grid, optimization theory and algorithms, game theory, and reinforcement learning.

Ka-Cheong Leung (S'95–M'01) received the B.Eng. degree in computer science from the Hong Kong University of Science and Technology, Hong Kong, in 1994, the M.Sc. degree in electrical engineering (computer networks) and the Ph.D. degree in computer engineering from the University of Southern California, Los Angeles, CA, USA, in 1997 and 2000, respectively. He was with Nokia Research Center, Nokia, Inc., Irving, TX, USA, from 2001 to 2002, Texas Tech University, Lubbock, TX, USA, from 2002 to 2005, and the University of Hong Kong, Hong Kong, from 2005 to 2019. He is currently an Associate Professor with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. His research interests include smart grids, vehicle-to-grid (V2G), transport layer protocol design, active queue management, congestion control, and wireless communications.