Media-TCP: A Quality-Centric TCP-Friendly Congestion Control for Multimedia Transmission

Hsien-Po Shiang and Mihaela van der Schaar
Department of Electrical Engineering, UCLA, Los Angeles, USA
{hpshiang, mihaela}@ee.ucla.edu

Abstract: In this paper, we propose a quality-centric congestion control for multimedia streaming over IP networks, which we refer to as media-TCP. Unlike existing congestion control schemes that adapt a user's sending rate merely to the network condition, our solution adapts the sending rate to both the network condition and the application characteristics by explicitly considering the distortion impacts, delay deadlines, and interdependencies of different video packet classes. Hence, our media-aware solution is able to provide differential services for transmitting various packet classes and thereby further improves the multimedia streaming quality. We model this problem using a Finite-Horizon Markov Decision Process (FHMDP) and determine the optimal congestion control policy that maximizes the long-term multimedia quality, while adhering to the horizon-$K$ TCP-friendliness constraint, which ensures long-term fairness with existing TCP applications. We show that the FHMDP problem can be decomposed into multiple optimal stopping problems, which admit a low-complexity threshold-based solution. Moreover, unlike existing congestion control approaches, which focus on maintaining throughput-based fairness among users, the proposed media-TCP aims to achieve quality-based fairness among multimedia users. We also derive sufficient conditions for multiple multimedia users to achieve quality-based fairness using media-TCP congestion control. Our simulation results show that the proposed media-TCP achieves more than 3 dB improvement in terms of PSNR over the conventional TCP congestion control approaches, with the largest improvements observed for real-time streaming applications requiring stringent playback delays.

Keywords: Quality-centric congestion control, TCP-friendly congestion control for multimedia, finite-horizon Markov decision process, quality-based fairness.

I. INTRODUCTION

Transmission Control Protocol (TCP) is the most widely used protocol for data transmission at the transport layer. However, existing TCP congestion control provides dramatically varying throughput that is unsuitable for delay-sensitive, bandwidth-intense, and loss-tolerant multimedia applications [3] (e.g. real-time video streaming, videoconferencing, etc.). This is because current TCP congestion control aggressively increases the congestion window until congestion occurs, and then adopts an exponential backoff mechanism to mitigate the congestion. The fluctuating throughput results in long end-to-end delays, which can easily violate the hard delay deadlines required by various multimedia applications. Hence, numerous multimedia transmission solutions over IP networks adopt User Datagram Protocol (UDP) at the transport layer [27]. However, UDP provides connectionless, unreliable services without guaranteed delivery, which limits the quality of service (QoS) support for multimedia applications at the transport layer. Therefore, multimedia applications need to rely on error resilience [8], forward error correction [9][10] and/or source coding rate control solutions [11][27], which need to be implemented at the application layer to achieve a desirable streaming quality. Moreover, the lack of congestion control mechanisms in UDP can lead to severe congestion. Therefore, a significant body of existing multimedia streaming research over the past decade has focused on applying UDP-based congestion control schemes that are TCP-friendly [4], which are being standardized as the Datagram Congestion Control Protocol (DCCP) [28]. However, these solutions often ignore the specific characteristics and requirements of multimedia applications, thereby leading to a sub-optimal performance for these applications.

Importantly, multimedia applications have several unique characteristics which need to be taken into account when designing a suitable congestion control mechanism. First, multimedia applications are loss-tolerant, and graceful quality degradation can be achieved if multimedia packets having lower distortion impacts are not received. Hence, various scheduling strategies [18][19] were proposed to optimize the received multimedia quality for multimedia streaming by prioritizing packets for transmission over error-prone IP networks. Such solutions, which explicitly consider multimedia packets' distortion impacts, have also been adopted in order to improve the performance of congestion control mechanisms for multimedia applications [7]. Second, multimedia applications are delay-sensitive, i.e. multimedia packets have hard delay deadlines by which they must be decoded. If multimedia packets cannot be received at the destination before their delay deadlines, they should be purged from the senders' transmission buffers to avoid wasting precious resources. Third, in order to remove the temporal correlation existing in the source data, multimedia data are often encoded interdependently using prediction-based coding solutions (as in [20][21]). This introduces sophisticated dependencies between multimedia packets across time. Hence, if a multimedia packet is not received at the destination before its delay deadline, all the packets that depend on that packet should be purged from the transmission buffer to avoid unnecessary congestion, since these packets are not usable at the decoder side.

A. Limitations of current transport layer solutions for multimedia transmission

Supporting real-time multimedia transmission over IP networks is an important, yet challenging problem. Various approaches have been proposed to adapt the existing transport layer protocols such that they can better support delay-sensitive, loss-tolerant multimedia applications. However, most current approaches still exhibit several key limitations.

1) Multimedia quality unaware adaptation. Conventional transport layer congestion control approaches are application-agnostic, meaning that they merely attempt to avoid the network congestion by adjusting the sending rates, without considering the impact on the application's performance. For example, many TCP-friendly approaches apply analytical models [2] on the long-term TCP throughput and adapt the sending rate to the periodically updated TCP throughput [3][4]. These model-based approaches aim to optimize the bandwidth utilization, which may fail to maximize the multimedia application performance (e.g. video quality) since they do not consider multimedia characteristics, such as distortion impact, delay deadline, etc.

2) Flow-based models for multimedia traffic without delay consideration. Various approaches are proposed to adapt multimedia applications to the available TCP throughput by applying rate-distortion optimization [11], forward error correction (FEC) [9][10], or frame dropping [25]. These solutions often adopt flow-based models for multimedia traffic that only consider the high-level flow rate (e.g. the average rate and peak rate of the flow/frame [27]). They do not explicitly consider the specific distortion impact and delay deadline of each packet, or the interdependencies existing among packets of multimedia applications. Hence, these congestion control approaches only provide suboptimal solutions for multimedia transmission [18][29].

3) Myopic adaptation. Prediction/estimation of the network condition is widely used in congestion control mechanisms based on network information feedback, e.g. in [4][11]. However, these solutions adapt the congestion window myopically, i.e. based only on the current network condition. Considering the packets' delay deadlines and dependencies in the transmission buffer, the congestion window size not only impacts the immediate multimedia quality, but also impacts the available packets in the buffer for future transmission. Hence, it is important to consider not only the instantaneous multimedia quality, but also how the immediate congestion window size impacts the long-term expected quality in the subsequent time slots. In [12], it was shown that the quality can be improved by allowing temporary violation of the TCP-friendliness, while later compensating the congestion control to maintain long-term TCP-friendliness. The authors proposed a joint source rate control and QoS-aware congestion control scheme. However, this solution only considers multimedia source rates and adopts a heuristic rate-compensation that cannot optimally determine the required congestion window size.

In summary, a media-aware congestion control mechanism that optimally determines the required congestion window size to maximize the long-term multimedia quality in a look-ahead (foresighted) rather than myopic manner is still missing.

B. Contribution of our solution and paper organization

In this paper, to overcome the abovementioned limitations, the proposed media-TCP aims to make the following contributions:

1) Quality-centric packet-based congestion control. The proposed media-TCP congestion control is quality-centric, meaning that it aims specifically at maximizing the received multimedia quality. Instead of applying a flow-based multimedia model, our solution takes into account the distortion impact and delay deadline of each packet, as well as the packets' interdependencies using a directed acyclic graph (DAG) [17]. Importantly, instead of reactively adapting the throughput, the proposed media-TCP actively and jointly optimizes the congestion window size as well as the transmission scheduling to provide differential services for different packet classes. Performing this joint optimization is very important in order to maximize the multimedia quality, because the optimal congestion window size depends on the transmission order of the multimedia packets in the transmission buffer.

2) Foresighted adaptation using a Markov decision process framework. We formulate the congestion control problem using a Finite-Horizon Markov decision process (FHMDP) framework in order to maximize the expected long-term multimedia quality, under a long-term TCP-friendliness constraint over the subsequent $K$ time slots (i.e. horizon-$K$ TCP-friendliness). Such foresighted planning is essential for multimedia streaming since it can consider, predict, and exploit the dynamic characteristics of the multimedia traffic in order to optimize the application performance over dynamic IP networks. We show that the complex FHMDP formulation can be decomposed into multiple optimal stopping problems [23]. Based on the structural results obtained from the decomposition, we present low-complexity threshold-based solutions when the multimedia packets are coded either independently or interdependently.

3) Quality-based fairness among coexisting streams. Preserving the fairness among the coexisting streams represents an important issue [1][6]. However, even though the throughput/bandwidth is equally shared by the users, multimedia users can still experience very different qualities since various applications and source data may result in different traffic characteristics. Hence, instead of the throughput-based fairness proposed by most existing congestion control solutions, we focus in this paper on quality-based fairness. We adopt Jain's fairness index [26] on the multimedia qualities and show that the proposed media-TCP is able to achieve quality fairness among multimedia users. In [25], the authors also proposed a frame dropping scheme for min-max distortion fairness. However, the frame dropping approach is determined myopically, without considering the resulting TCP-friendliness to other flows.

In Table I, we compare the features of the proposed media-TCP with the existing TCP-friendly congestion control solutions for multimedia streaming.

TABLE I. COMPARISONS OF CURRENT CONGESTION CONTROL SOLUTIONS FOR MULTIMEDIA STREAMING.

| Approach | Name of the adopted congestion control | Type of TCP-friendliness | Multimedia support | Distortion impact consideration | Delay deadline / content dependency | Decision type |
|---|---|---|---|---|---|---|
| Towsley 2008 [5] | TCP-streaming | TCP | Playback buffering | No | No | Myopic |
| Bohacek 2003 [6] | TCP | TCP | Playback buffering | No | No | Myopic |
| Rejaie 1999 [13] | RAP | AIMD-based | Source rate adaptation: layered encoding | No | No | Myopic |
| Mark 2005 [14] | DTAIMD | AIMD-based | Optimal source rate is bounded due to buffer underflow at the receiver | No | No | Myopic |
| Seferoglu 2009 [10] | TFRC/FEC | Model-based | Application layer FEC | Yes | No | Myopic |
| Zakhor 1999 [8] | TFRC | Model-based | Source rate adaptation: packet size adaptation | Yes | No | Myopic |
| Zhang 2001 [11] | MSTFP | Model-based | Source rate adaptation: distortion minimization s.t. rate budget | Yes | No | Myopic |
| Our approach | Media-TCP | Model-based | Quality-centric congestion control | Yes | Yes | Foresighted |
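Contribution 3) above relies on Jain's fairness index applied to multimedia qualities. As a minimal sketch, the index for $n$ values $x_1, \dots, x_n$ is $(\sum_i x_i)^2 / (n \sum_i x_i^2)$, which equals 1 when all values are equal and $1/n$ when a single user receives everything. The function name `jain_index` and the use of per-user PSNR values as input are illustrative assumptions, not part of the paper.

```python
def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    `values` is assumed here to hold one quality figure per user
    (for example PSNR in dB); this input choice is an assumption.
    """
    total = sum(values)
    squares = sum(v * v for v in values)
    if squares == 0:
        raise ValueError("at least one value must be non-zero")
    return total * total / (len(values) * squares)


if __name__ == "__main__":
    print(jain_index([30.0, 30.0, 30.0]))      # equal qualities
    print(jain_index([1.0, 0.0, 0.0, 0.0]))    # one user gets everything
```

Applying the index to qualities rather than to throughputs is what distinguishes quality-based fairness from throughput-based fairness in the discussion above.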

The paper is organized as follows. In Section II, we first formulate the packet-based media-TCP congestion control problem for one media-TCP user. In Section III, we present the FHMDP framework used by the media-TCP user to determine the optimal transmission scheduling and congestion window size. In Section IV, we investigate how to decompose the FHMDP problem and provide structural results for solving this problem in different transmission scenarios. In Section V, we investigate multiple media-TCP users interacting in the same network with regular TCP users and discuss the quality-based fairness among the multimedia users. Simulation results are shown and discussed in Section VI, and Section VII concludes the paper.

II. MEDIA-TCP CONGESTION CONTROL PROBLEM FOR ONE MEDIA-TCP USER

A. Transport layer model

As in TCP, each packet transmission is acknowledged after a round-trip time (RTT) $Rtt$. The packet loss rate $p$ can be measured based on the packets' acknowledgements. We assume a model-based congestion control as in [4][8][11] that adapts the congestion window size to a long-term available TCP throughput calculated from the packet loss rate $p$ and the round-trip time $Rtt$ (the adaptation will be discussed in Section II.D). Let $l$ represent the packet size. The long-term available TCP throughput can be approximated by

$$R_{AIMD}(a, b, Rtt, p) = \frac{l}{Rtt}\sqrt{\frac{a(2-b)}{2bp}} \quad \text{(bits/sec) [14]},$$

where the TCP congestion control is modeled as a special case of generic Additive Increase Multiplicative Decrease (AIMD) based congestion control with parameters $(a, b)$¹ [14]. By setting $(a, b)$ to $(1, 0.5)$, we have the well-known TCP response function²

$$R_{TCP}(Rtt, p) = \frac{l}{Rtt}\sqrt{\frac{3}{2p}} \quad \text{(bits/sec)}.$$

We assume a time-slotted system and set the time slot duration $T$³ as $Rtt$. Denote the measured packet loss rate in time slot $k$ as $p^k$. We define the expected TCP window size in time slot $k$ as

$$W_{TCP}^k(p^k) = \frac{Rtt}{l} R_{TCP}(Rtt, p^k) = \sqrt{\frac{3}{2p^k}} \quad \text{(pkts/time slot)},$$

which can be viewed as a metric describing the network congestion in time slot $k$.

¹ Given the current congestion window size $W$, the user increases its congestion window size as $W + a$ per RTT when there is no packet loss. When a packet loss event occurs, the TCP congestion control decreases the congestion window size as $(1-b)W$ and retransmits the packets. A packet loss event can be either a timeout or receiving three duplicated acknowledgements in a row [14].
² A more sophisticated TCP response function can be obtained by Markov chain modeling [2], which considers the timeout duration.
³ For simplicity, we assume a fixed RTT (time slot) in this paper. However, the proposed approach can be easily extended to a time-varying RTT environment.

B. Application layer multimedia model

Multimedia data is encoded and packetized into multiple data units at the application layer. A data unit usually encapsulates a video slice, which contains a set of macroblocks or an entire video frame (see e.g. the H.264 standard [21]). We assume that these data units will be packetized into RTP packets with the same size $l$ for transmission at the transport layer [4][14]. We also assume that these packets are classified into $M$ multimedia classes $\{CL_1, \dots, CL_M\}$. These packets are queued in different transmission (post-encoding) buffers for transmission. A class $CL_m$ in time slot $k$ is characterized by the set of parameters $\psi_m^k = \{N_m^k, N_m^{A,k}, N_m^{D,k}, Q_m, depth_m^k\}$. These parameters are discussed next.

(a) Packet number: Let $N_m^k$ represent the number of packets in the transmission buffer of class $CL_m$ in time slot $k$.

(b) Arrival rate and discard rate: Let $N_m^{A,k}$ denote the arrival rate, which represents the number of packets in class $CL_m$ that arrive in time slot $k$. Let $N_m^{D,k}$ denote the discard rate, which represents the number of packets in class $CL_m$ whose delay deadline expires in time slot $k$. A packet is purged from the buffer if 1) it is successfully transmitted or 2) its delay deadline has expired. Based on the packet arrivals and departures, the number of packets

$N_m^k$ varies over time. In practice, if the multimedia data is pre-encoded, the arrival rate and discard rate in each time slot can be computed a priori⁴. In the case of real-time multimedia transmission, e.g. video conferencing, the arrival time and the delay deadline can be stochastically modeled [18].

(c) Distortion impact: We assume an additive distortion reduction for the packets similar to the one employed in [17][18]. Let the distortion reduction when the packets in class $CL_m$ are received and decoded at the receiver be $N_m^k Q_m$, where $Q_m$ represents the distortion impact of the class $CL_m$.

(d) Depth: Some classes of packets need to be received before others. Such interdependencies among the multimedia classes can be represented using a DAG [17]. Figure 1 gives an example of a DAG, in which MPEG video frames are classified into classes. More examples can be found in [17][18], and our solution is not restricted to any packet classification method. Based on the DAG, if there is a path from class $CL_m$ to $CL_n$, we say the class $CL_m$ is an ancestor of $CL_n$, and $CL_n$ is a descendent of $CL_m$. Denote $Anc_m$ and $Des_m$ as the ancestor set and descendent set of the class $CL_m$. Let $depth_m^k$ represent the depth (the maximum distance) from class $CL_m$ to the root in the DAG in time slot $k$. For classes at the root of the DAG, we define the depth to be 0. Depth captures the importance of a class in terms of interdependency, which depends on the depths of the ancestor classes, i.e. $depth_m = \max_{CL_n \in Anc_m} depth_n + 1$. The DAG structure varies over time as a traveling tree in [18]. Figure 1 also shows the variation of the DAG when the packets in class $CL_1$ and $CL_3$ are transmitted in time slot $k$.

Fig. 1 Travelling tree example with MPEG IBPBP video frames. [Figure: DAG of classes 1 to 5 (I, B and P frames) at depths 0 to 3, shown for time slot $k$ and time slot $k+1$.]

Let $\pi_m^k \in \{1, 0\}$ represent the transmission permission for transmitting the packets in class $CL_m$ in time slot $k$. We assume that if $\pi_m^k = 1$, then all of the $N_m^k$ packets of class $CL_m$ are transmitted in time slot $k$; if $\pi_m^k = 0$, then no packets in $CL_m$ are transmitted⁵. Denote $r_m^k(\pi_m^k) = N_m^k \pi_m^k l$ as the source rate of class $CL_m$ in time slot $k$, and denote $\mathbf{r}^k = [r_m^k, m = 1, \dots, M]$ as the vector of rates for all the classes in time slot $k$. In addition, let $\boldsymbol{\pi}_{APP}^k = [\pi_m^i, i = 1, \dots, k, m = 1, \dots, M]$ represent the transmission permissions of all the classes from time slot 1 to time slot $k$. Hence, the availability $\rho_m^k$ of the class $CL_m$ at the receiver in time slot $k$ can be computed by $\rho_m^k(\boldsymbol{\pi}_{APP}^k) = I\left(\sum_{i=1}^{k} \pi_m^i \ge 1\right)$, where $I(\cdot)$ represents an indicator function. Based on the DAG, the actual distortion reduction for a class depends on whether or not its ancestors are available at the receiver. Hence, the actual distortion reduction of class $CL_m$ can be written as $Q_m^{act}(\boldsymbol{\pi}_{APP}^k) = Q_m \prod_{CL_n \in Anc_m} \rho_n^k(\boldsymbol{\pi}_{APP}^k)$ and the resulting multimedia distortion reduction in time slot $k$ can be represented by:

$$Q^k(\mathbf{r}^k(\boldsymbol{\pi}_{APP}^k)) = \sum_{m=1}^{M} Q_m^{act}(\boldsymbol{\pi}_{APP}^k)\, r_m^k(\pi_m^k). \tag{1}$$

C. Conventional flow-based solutions

Most existing TCP-friendly congestion control solutions for multimedia streaming reactively adopt the available TCP throughput as a rate-budget constraint and maximize the immediate multimedia quality at the application layer (e.g. the rate-distortion optimization in [11] and the packet size adaptation in [7]). These solutions can be formulated using the following flow-based optimization.

Myopic-Flow-Based Optimization⁶:

$$\begin{aligned} \underset{[\pi_m^k,\, m=1,\dots,M]}{\text{maximize}} \quad & Q^k([\pi_m^k, m = 1, \dots, M]) \\ \text{s.t.} \quad & \sum_{m=1}^{M} r_m^k \le R_{TCP}(Rtt, p^k). \end{aligned} \tag{2}$$

Note that these solutions passively adapt the available TCP throughput $R_{TCP}(Rtt, p^k)$ to the network condition $p^k$ in each time slot. In contrast, we aim to propose a congestion control mechanism that adapts the congestion window size to both the network congestion and the specific characteristics

and requirements of multimedia applications. Instead of shaping the traffic at the application layer to match the available throughput $R_{TCP}(Rtt, p^k)$, the proposed media-TCP jointly optimizes the congestion window size $W^k$ and the transmission permissions $\pi_m^k$ of classes at the transport layer to maximize the expected multimedia quality. In the next subsection, we discuss the media-TCP congestion control problem in more detail.

⁴ Assuming that the packets in class $CL_m$ have an arrival time $t_m$ (the time when the packets are ready for transmission in the buffer) and a delay deadline $d_m$, the arrival rate can then be calculated by $N_m^{A,k} = N_m^0\, I(kT \le t_m \le (k+1)T)$, where $N_m^0$ represents the initial size of the class. The discard rate can be written as $N_m^{D,k} = N_m^k\, I(kT \le d_m \le (k+1)T)$.
⁵ For simplicity of illustration, we consider only a binary transmission permission $\pi_m^k$ in this paper, i.e. whether to transmit the entire class or not. However, a similar approach can be applied to transmit partial data of a class by considering $\pi_m^k \in [0, 1]$.
⁶ We assume that the information of the actual distortion reduction $Q_m^{act}(\boldsymbol{\pi}_{APP}^k)$ is available in both equation (2) and (4). Moreover, because of the retransmission error control in the transport layer, we ignore the impact of packet loss in the distortion reduction model.

D. Proposed packet-based media-TCP solution

The proposed media-TCP congestion control is illustrated in Figure 2.

At the application layer: Multimedia RTP packets are classified into $M$ classes based on their interdependencies. Sophisticated packet classification can be performed in the application layer based on the different video coding structures. Based on the packet classification, the attributes $[\psi_m^k, m = 1, \dots, M]$ for each class can be determined as in [18][19][20].

At the transport layer: Media-TCP adopts the same error control as TCP, which retransmits the lost packets based on negative acknowledgements. However, unlike TCP, which keeps retransmitting the lost packets until success, media-TCP will drop all the expired packets in the transmission buffer. Moreover, unlike TCP, which adopts an AIMD-based congestion control, media-TCP adjusts the congestion window size relying on the following two components:

(a) Transmission scheduler: The transmission scheduler selects the classes of packets to transmit and also determines the number of packets to be sent in time slot $k$. Specifically, the packet scheduler computes the priority metrics $\mathbf{x}^k = [PM_1^k, \dots, PM_M^k] \in \mathbb{R}^M$ for all the classes to capture the marginal benefit (in terms of decreasing the expected distortion) when the packets in class $CL_m$ are transmitted in time slot $k$. In Section IV, we will discuss how to optimally determine these priority metrics based on the application attributes $[\psi_m^k, m = 1, \dots, M]$ and the network conditions. Then, the transmission scheduler sends the packets in the classes with positive priority metrics⁷. In other words, the transmission permission $\pi_m^k$ for each class and the resulting congestion window size $W^k$ can be determined as:

$$\pi_m^k(\mathbf{x}^k) = I(PM_m^k > 0), \qquad W^k(\mathbf{x}^k) = \sum_{m=1}^{M} N_m^k\, \pi_m^k(\mathbf{x}^k). \tag{3}$$

(b) Network estimator: The network estimator updates the packet loss rate⁸ $p^k$ and evaluates the network congestion metrics in each time slot as in TCP. For example, a simple updating rule of the packet loss rate can be written as [11]: $p^{k+1}(p^k, W^k) = \alpha p^k + (1 - \alpha)\, \hat{p}^k(W^k)$, where $\hat{p}^k(W^k)$ represents the realization of the packet loss rate in time slot $k$, and $\alpha$ represents the updating rate. In this paper, we focus on the joint optimization of the congestion window size and transmission scheduling by the transmission scheduler. For simplicity of exposition, we assume that the only network congestion metric is the expected TCP window size $W_{TCP}^k(p^k)$. Note that other congestion metrics and more sophisticated updating rules (as in [4][11]) can be easily integrated into the proposed media-TCP.

Fig. 2 System diagram of media-TCP in time slot $k$ and $k+1$. [Figure: in each time slot, the application-layer multimedia classes in the transmission buffer (with packets dropped due to expiration) feed the transport-layer TX scheduler and network estimator, which send RTP packets and receive acknowledgements and congestion feedback.]

We assume that the proposed media-TCP congestion control adheres to the following TCP-friendliness constraint.

Definition 1 Horizon-$K$ TCP-friendliness: A congestion control scheme is horizon-$K$ TCP-friendly if and only if the congestion control window sizes from time slot $i$ to time slot $i+K-1$ satisfy the condition $\sum_{k=i}^{i+K-1} \frac{W^k}{W_{TCP}^k(p^k)} \le K$.

The horizon-$K$ TCP-friendliness constraint keeps the average TCP-friendliness ratio $W^k / W_{TCP}^k$ close to 1 over the horizon $K$. Based on this definition, in time slot $i$, the proposed media-TCP congestion control solves the following packet-based optimization.

Foresighted-Packet-Based Optimization:

$$\begin{aligned} \underset{\mathbf{x}^i \in \mathbb{R}^M}{\text{maximize}} \quad & \sum_{k=i}^{i+K-1} E\left[Q^k([\pi_m^k(\mathbf{x}^k), m = 1, \dots, M])\right] \\ \text{s.t.} \quad & \frac{1}{K} \sum_{k=i}^{i+K-1} E\left[\frac{W^k(\mathbf{x}^k)}{W_{TCP}^k(p^k)}\right] \le 1, \quad 0 \le W^k(\mathbf{x}^k) \le W^{\max}, \end{aligned} \tag{4}$$

where $E[\cdot]$ represents the expected value and $W^{\max}$ represents the maximum congestion window size.

Comparing our media-TCP using the foresighted-packet-based optimization in equation (4) with the conventional solutions using the myopic-flow-based optimization in equation (2), the differences are:

1) Conventional solutions passively adapt the congestion window size to the network condition (e.g. the expected packet loss rate $p^k$). Our proposed media-TCP solution takes one step further by adapting the congestion window size to the network conditions as well as to the application characteristics by optimizing the transmission scheduling. This allows the user to adapt the congestion window size to the characteristics of the multimedia packets in the buffer to provide differential services for various packet classes. Hence, a packet class with a higher distortion impact or more stringent delay deadline has a higher chance to be transmitted, which is desirable for maximizing the received multimedia quality.

2) Media-TCP maximizes the expected long-term distortion reduction over a horizon $K$ instead of solely maximizing the immediate distortion reduction as in the conventional solutions. This is especially important for multimedia applications with content dependencies. For example, in the IBPBP frame structure in Figure 1, the media-TCP user may want to plan the congestion window sizes for transmitting I and P frames, instead of myopically determining window sizes for the B frames.

3) Instead of performing a constrained rate optimization myopically at every time slot, media-TCP adopts the horizon-$K$ TCP-friendliness constraint. Note that the horizon-$K$ TCP-friendliness becomes the traditional rate budget constraint in equation (2) when $K = 1$. Larger horizons provide long-term TCP-friendliness, which leads to more flexible window sizes and a better expected long-term quality. However, the short-term TCP-unfriendliness can be high (especially when the network condition is good, i.e. $W_{TCP}^k$ is large) and needs to be compensated in the subsequent time slots [12].

In the next section, we discuss how the foresighted-packet-based optimization in equation (4) can be solved by applying an FHMDP framework.

⁷ A positive priority metric for a class indicates that the benefit (i.e. the resulting expected distortion reduction) of transmitting the packets in that class is greater than the cost of transmitting the packets. Hence, transmitting the packets of that class increases the total utility.
⁸ Here, we assume that $Rtt$ remains constant. In practice, the round-trip time can also be updated using a similar updating rule. Our Theorems and Lemmas still hold in such cases.

III. FINITE-HORIZON MARKOV DECISION PROCESS

In this section, we first formulate the media-TCP congestion control problem in equation (4) using an FHMDP with Markovian state transition. The complexity of the FHMDP can be high due to the large state space. Hence, in the next section, we will decompose the problem into simpler sub-problems having smaller state spaces. We model this problem as an FHMDP for the following reasons:

1) In numerous multimedia applications, multimedia traffic can be described by Markov models (as in [24]).

2) TCP operations are commonly modeled by discrete-time finite-state Markov chains (see e.g. [2][5][15]). Hence, the average TCP window size can be described by Markovian models based on the states of all the users (or the aggregate states as in [9]) in the network.

The FHMDP framework can be defined by the tuple $\{\mathcal{A}, \mathcal{S}, P, u, \gamma, \lambda, K\}$. The various components of the framework are described next:

(a) Action: We denote the action of the FHMDP in time slot $k$ as $a^k = [\pi_m^k, m = 1, \dots, M] \in \mathcal{A} = \{0, 1\}^M$.

(b) State and state transition: We denote the state of the FHMDP in time slot $k$ as $s^k = \{W_{TCP}^k, \mathbf{N}^k\} \in \mathcal{S} = \mathcal{S}^{Net} \times \mathcal{S}^{App}$, where the expected TCP window size $W_{TCP}^k$ represents the network state and the number of packets $\mathbf{N}^k = [N_m^k, m = 1, \dots, M]$ in all the packet classes represents the application state. Let $\mathcal{S}^{Net} = \{0, \dots, W^{\max}\}$ represent the state space of the network state $W_{TCP}^k$, and let $\mathcal{S}^{App} = \{0, \dots, N^{\max}\}^M$ represent the state space of the application state, where $W^{\max}$ represents the maximum window size and $N^{\max}$ represents the maximum number of packets in a class. Let $P: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ denote the state transition function, which can be described for the network state and application state as follows:

1) The network state transition is described by the state transition probabilities $P(W_{TCP}^{k+1} \mid W_{TCP}^k)$, which can be evaluated by estimating the next possible packet loss rate $p^{k+1}$ given the current feedback $p^k$ as in [9][15]. In general, the number of TCP users in the network is large enough such that the network state transition is not impacted by a single user's action.

2) The application state transition is described by the state transition probabilities $P(\mathbf{N}^{k+1} \mid \mathbf{N}^k, a^k)$. The number of packets in each class varies over time depending on the action $a^k$. Note that each class can have its own arrival rate per time slot $N_m^{A,k} \ge 0$ and its own discard rate per time slot $N_m^{D,k} \ge 0$. Therefore, the application state transition can be computed as:

$$\begin{aligned} N_m^{k+1} &= (1 - \pi_m^k) N_m^k + N_m^{A,k} - N_m^{D,k}(1 - \pi_m^k) \\ &= \underbrace{(N_m^k - N_m^{D,k})(1 - \pi_m^k)}_{\text{remaining number of packets after packet departure}} + \underbrace{N_m^{A,k}}_{\text{packet arrivals}}. \end{aligned} \tag{5}$$

This framework can be applied to both pre-encoded and real-time multimedia applications. For pre-encoded multimedia applications, the media-TCP user knows the state transitions for the entire multimedia session and solves the finite-horizon dynamic programming problem. For real-time multimedia applications, media-TCP can apply stochastic models to capture the state transitions of the applications [18].

Since the network state transition and the application state transition are independent, we define the overall state transition probabilities⁹ as:

$$P(s^{k+1} \mid s^k, a^k) = P(W_{TCP}^{k+1} \mid W_{TCP}^k)\, P(\mathbf{N}^{k+1} \mid \mathbf{N}^k, a^k).$$

(c) Utility, discount factor, and horizon definitions: First, we apply a positive Lagrangian multiplier $\lambda$ and modify equation (4) into an unconstrained optimization¹⁰:

$$\underset{a^i \in \mathcal{A}}{\text{maximize}} \sum_{k=i}^{i+K-1} Q^k(a^k, s^k) - \lambda \left( \sum_{k=i}^{i+K-1} \frac{W^k}{W_{TCP}^k} - \sum_{k=i}^{i+K-1} 1 \right) = \underset{a^i \in \mathcal{A}}{\text{maximize}} \sum_{k=i}^{i+K-1} u^k(a^k, s^k), \tag{6}$$

where $u^k(a^k, s^k) = Q^k(a^k, s^k) - \lambda \left( W^k / W_{TCP}^k - 1 \right)$ is referred to as the instantaneous utility in time slot $k$. The second term of the instantaneous utility can be interpreted as the TCP window size deviation cost. The Lagrangian multiplier¹¹ $\lambda$ determines how much the media-TCP user favors the TCP-friendliness over the multimedia quality. Based on the unconstrained optimization, the objective of the FHMDP in time slot $i$ is defined as:

$$\chi^i(s^i) = \arg\max_{a \in \mathcal{A}} \sum_{k=i}^{i+K-1} \gamma^{k-i}\, u^k(s^k, a^k), \tag{7}$$

where $\chi^i(s^i)$ represents the optimal congestion control policy given the state $s^i$ at the time slot $i$, and $\gamma$ represents the discount factor ($0 \le \gamma \le 1$). Note that equation (7) is equivalent to the unconstrained optimization in equation (6) when $\gamma = 1$. Since the Markovian models of the network state and application state transition may not be accurate, the discount factor $\gamma$ [...]

[...] utility-to-go at the last time slot of the horizon as $J_\mu^{i+K-1}(s^{i+K-1}) = u(s^{i+K-1}, \mu(s^{i+K-1})), \forall s^{i+K-1} \in \mathcal{S}$, where $\mu: \mathcal{S} \to \mathcal{A}$ represents a stationary mapping from the given state to an action. We define the expected utility-to-go in time slot $k = i, \dots, i+K-2$ as:

$$J_\mu^k(s^k) = u(s^k, \mu(s^k)) + \gamma \sum_{s^{k+1} \in \mathcal{S}} P(s^{k+1} \mid s^k, \mu(s^k))\, J_\mu^{k+1}(s^{k+1}).$$

The optimal congestion control policy in time slot $i$ can then be rewritten as:

$$\chi^i(s^i) = \arg\max_{\mu} J_\mu^i(s^i). \tag{8}$$

The system diagram of the proposed media-TCP congestion control using the FHMDP framework is shown in Figure 3. The media-TCP user repeats the following steps at each time slot $i$:

1) Calculate the application state and network state transition probabilities.

2) Evaluate the expected utility-to-go at the various time slots of the entire horizon, i.e. from $J_\mu^{i+K-1}(s^{i+K-1}), \forall s^{i+K-1} \in \mathcal{S}$ to $J_\mu^{i+1}(s^{i+1}), \forall s^{i+1} \in \mathcal{S}$.

3) Based on the expected utility-to-go, the user updates the policy using

$$\mu(s^i) \leftarrow \arg\max_{a \in \mathcal{A}} \left\{ u(s^i, a) + \gamma \sum_{s^{i+1} \in \mathcal{S}} P(s^{i+1} \mid s^i, a)\, J_\mu^{i+1}(s^{i+1}) \right\}, \tag{9}$$

and obtains the optimal action given the current state $s^i$.

The approach is similar to the "receding horizon control" in the optimal control literature [15]. Note that the complexity of solving the FHMDP directly is extremely high: it is proportional to the square of the number of states, and the number of states is exponential in the number of classes, i.e. $(N^{\max})^M$.
Hence, it is important to decompose the FHMDP is set smaller than 1 to alleviate the impact of the problem into sub-problems with smaller state space to reduce inaccurate future utilities. The tradeoff between the TCP- the complexity of solving the problem. friendliness and multimedia quality with different λ and γ will be discussed in Section VI. Multimedia traffic classifier Application layer

Transport layer Note that the media-TCP not only maximizes the Current application state instantaneous utility, but also the expected future utilities, Priority PM Schedule classes Calculate application metric which are expressed using the expected utility-to-go, which is state transitions Decision defined next. J k making blocks Utility-to-go k Current J evaluation Definition 2. Expected utility-to-go: Define the expected k Application J state Calculate network Determine k W state transition 9 congestion window In this work, we assume that the state transition probabilities can be size policy calculated a priori. In fact, the state transition probabilities can be learned on Current Current the fly as long as the transitions of the states are semi-stationary. network Network feedback network 10 State state To simplify notation, we ignore the expectation E[]i in our FHMDP formulation. Fig. 3 System diagram of the proposed media-TCP congestion 11 control. λ can be chosen based on the dual problem of equation (4) with the M IV. STRUCTURAL SOLUTIONS OF MEDIA-TCP constraint QWN−=λ kk0 , which suggests ∑m=1()m TCP m CONGESTION CONTROL λ = WQTCP m , where Qm represents the average distortion reduction In this section, we decompose the FHMDP problem in k Section III. We derive structural results that provide the per packets in class CLm , and WTCP is the time average WTCP over the i* horizon. optimal values of the priority metrics x for the media-TCP 8 congestion control problem described in Section II.D to Theorem 2. Structural results of the proposed media-TCP facilitate a low-complexity threshold-based congestion control with independent packets: Given the state si in time slot i , as shown in equation (3). In Section IV.A, we discuss the the optimal policy of the transmission permission for a class decomposition of the FHMDP when the applications are coded CL is πii**()sIPMs=> ( ii () 0), where the optimal independently. 
Then, in Section IV.B, we take the m mm ii* interdependencies among packets into account and discuss how priority metric PMm () s is computed by: to derive the structural results in this case. ⎛⎞ ii* ⎜ λ ⎟ i PMmm() s=−⎜ Q⎟ N m + ⎝⎠⎜ W i ⎟ A. Decomposition with independently coded packets TCP ⎛⎞iiAi++11, We first discuss the case when all the packets are ⎜JWμ,mm(,)TCP N ⎟ γ PW(|)ii+1 W ⎜ ⎟ ∑ TCP TCP ⎜ iii++11 DiAi , ,⎟ independently coded (for example, video streams are coded i+1 ⎜ ⎟ s ∈S ⎝⎠⎜−−−JWμ,mmmm(,TCP N ( N N ))⎟ using motion-JPEG), i.e. all the classes have the same depth, (10) depthm ==0, m 1,..., M . We first examine the structure of kk Jsμ() and define the following property of the expected Proof: See Appendix C. utility-to-go. Theorem 2 indicates that when the packets are coded independently, the media-TCP congestion control problem Definition 3. Separable expected utility-to-go: The expected kk becomes an optimal stopping problem [23], where the media- utility-to-go Jsμ() in time slot k is separable if and only if TCP user transmits the packets of a certain class if and only if it can be written in the form of the priority metric of the class is positive. The priority metrics kkM k k k k ii** ii Js()=+ J ( W , N ) CW ( ) , where x (sPMsmM )== [m ( ), 1,..., ] quantify the benefits of μμ∑m=1 ,m TCP m TCP JWkkk(,) N represents the utility-to-go component of a transmitting packets from various classes as opposed to not μ,mTCPm transmitting them in time slot i . Importantly, in addition to k specific class CLm , and CW()TCP represents a term that only the distortion impact Qm , the media-TCP user needs to k depends on the network state WTCP . consider the arrival rates and discard rates of the various classes. If a class CLm has numerous expiring packets (i.e. Next, we show that the expected utility-to-go is separable for Di,, Ai multimedia applications with independently-coded packets. 
NNmm− is large), it can be shown that the respective class has a larger PMii*() s to be transmitted in time slot i , Theorem 1: The expected utility-to-go m instead of waiting for a future time slot. Moreover, it can be Jskk( ),∀∈ s k Sk , = i ,..., i +− K 1 over the horizon are μ shown that at a time slot with a better network state (larger separable, and the utility-to-go component JWkkk(,) N i μ,mTCPm WTCP ), we have larger priority metrics from equation (10) for k of class CLm is a nondecreasing function of Nm . all the classes and hence, more classes are able to obtain the transmission permissions. Based on Theorem 2, we have the Proof: See Appendix B. following remarks: Based on Theorem 1, we have the following remarks: Remark 3: In equation (10), as γ approaches 0, the user kk Remark 1: The separation of the expected utility-to-go Jsμ() prefers to prioritize the packet classes based on their distortion impact values Q . As γ approaches 1, the user increasingly in Theorem 1 suggests that the utility-to-go evaluation can be m decomposed into M independent FHMDP sub-problems. weights the impact from the arrival rate and discard rate on the Each sub-problem computes a utility-to-go component future expected utility. Note that when γ = 0 , the FHMDP kkk JWμ,mTCPm(,) Nfor a class (see equation (17) for the problem in equation (7) becomes a myopic optimization that merely optimize the instantaneous utility, which is equivalent optimization sub-problems). The decomposition significantly to solving an unconstrained optimization of the conventional reduces the overall complexity originally proportional to solution in equation (2). ()N max M to a complexity proportional to MN max . Remark 4: Note that the optimal congestion control policy kkk ii Remark 2: The nondecreasing property of JWμ,m(,) TCP N m in χ ()s in equation (7) includes the optimal priority metrics k ii* ii* Nm in Theorem 1 shows that the more packets in the x ()s and the congestion window size Ws(). 
Theorem 2 transmission buffer of class CLm , the media-TCP user has provides the optimal priority metrics ii** ii higher expected utility for the specific class when the x (sPMsmM )== [m ( ), 1,..., ] . Based on this, the application is independently coded. Next, we investigate how ii* the user determines the optimal congestion control policy optimal congestion window size Ws() of the media-TCP when it knows the number of packets in the transmission user can be written as buffer of each class. The following theorem presents the ii**M i ii Ws()=>∑ NIPMsmm ( () 0) and the resulting structural results of solving the FHMDP problem. m=1 9 expected multimedia quality can be computed by maximum depth D and the number of classes per depth is M M Qsii()=> QNIPMs i ( i* () i 0). Note that the on average, the complexity of the algorithm can be represented ∑m=1 mm m max max 2 optimal policy varies with both the application state and the by OMDKW()() N . iii network state in time slot i (sW= {,}TCP N ). V. FAIRNESS AMONG MULTIPLE MEDIA-TCP STREAMS In Appendix A, Algorithm 1 provides the specific In the previous sections, we focus on only one media-TCP procedures for computing the optimal congestion policy ii user interacting with multiple regular TCP users in the same χ ()s when the packets are independent. The time network. In this section, we assume that there are V media- complexity of the algorithm is OKMW()()max N max 2 . TCP users interacting with other TCP users in a network and investigate the competition among the multimedia users. B. Decomposition with interdependently coded packets Denote V =={Vnn , 1,..., V } as the set of the media-TCP In this subsection, we investigate the decomposition of users. Denote user Vn ’s expected multimedia distortion k FHMDP problem when the packets have interdependencies, reduction in time slot k as Qn . We assume a saturated described by a DAG as introduced in Section II.A. The condition (i.e. 
all the users continuously have their source following theorem presents the structural results of solving the traffic fed into their transmission buffers). We apply the well- FHMDP problem. known Jain’s fairness index [26] to quantify the fairness Theorem 3. Structural results of media-TCP with among the V media-TCP users: 2 i V k interdependent packets: Given the DAG and the state s in Qn k ()∑n=1 time slot i , the FHMDP problem can be solved by repeating F = V 2 . (12) VQk the following two phases: ∑n =1()n Phase 1. Select packet classes to transmit at the current time The fairness index F k measures the quality deviation of the i slot at the depth depthn =− j 1 : multimedia applications. It varies as the media-TCP users ii** ii i πmm()sIPMs=>∀∈=− ( () 0), CLCLdepthj mnn { , 1}, make their own decisions at each time slot. Note that the index where is always bounded by 1. The quality-based fairness is reached, ⎛⎞ i.e. F = 1 , only when all the media-TCP users have the ii* ⎜ actλ ⎟ i PMmm() s=−⎜ Q⎟ N m + same multimedia quality. ⎝⎠⎜ W i ⎟ TCP ,(11) Following the TCP response function introduced in Section ⎛⎞iiAi++11, ⎜JWμ,mm(,)TCP N − ⎟ II.B and the packet loss rate updating rules in Section II.D, the γ PW(|)ii+1 W ⎜ ⎟ ∑ TCP TCP ⎜ iii++11 DiAi , ,⎟ i+1 ⎜ ⎟ expected network state of a user V in the next time slot can s ∈S ⎝⎠⎜JWμ,mmmm(,TCP N−− ( N N ))⎟ n and j represents the number of iterations. kkk+1 1.5 be expressed as EW[(,)]TCP, n pnn W = kkk+1 . Phase 2. Update the actual distortion impact of each class: ppWnnn(, ) act i* k +1 QQmm= i π n, mM= 1,..., . Similarly, we denote EPM[]mn as the expected priority ∏∀∈CLnmAnc , Proof: The proof can be easily provided by considering the metric of CLmn (the m -th class of user Vn ) in time slot k DAG structure. We omit the proof due to space limitations. k + 1 , which can be shown as a function of Wn . Then, we In Phase 1, the media-TCP user selects packet classes for can prove the following lemma. 
transmission by applying Theorem 3 starting from the classes Lemma 1: For user Vn , its priority metrics kk+1 at the root of the DAG. Since classes with the same depth are {[EPMmn ( W n )],∀∈ CL mn V n } are all nonincreasing independent of each other, Theorem 2 can be applied to Phase functions of W k , if 1 for classes with the same depth. Phase 2 indicates that if a n 1) γ = 0 , or class has no transmission permission, i.e. πii*()s = 0, the m 2) 01<≤γ , NNDk, = k, ∀∈=CL V, k 1,..., K media-TCP user set all its descendents’ distortion impact to 0, mmmn n Proof: For both conditions, the priority metric of class CL i.e. Qact = 0 for ∀∈CL Des (see Section II.B). Based on mn n nm kk+1 the DAG, since the distortion impact of a class is only can be written by EPM[()]mn W n = kkk+1 influenced by the ancestors, the greedy algorithm in Theorem 3 QEWpWmn− λ [(,)]TCP n n based on equation (10). Since the starting from the root provides the optimal congestion policy. kkk+1 The two phases are repeated until the maximum depth of the estimated packet loss rate ppW(, ) is in general a DAG is reached. monotonically nondecreasing function 12 of the congestion k In Appendix A, Algorithm 2 provides the procedures for window size Wn [11], it is straightforward that both the computing the optimal congestion control policy χii()ssWs= {x i** (), i i ()} i when the packets are 12 The packet loss rate can be modeled as an M/M/1/K queue at the bottleneck link that reacts to the summation of the window sizes of all the interdependently coded. Assuming that the DAG has the users. 10 kkk+1 fairness. expected window size EW[(,)]TCP pnn W and the expected kk+1 Theorem 4: The fairness index of multiple media-TCP users priority metrics EPM[()]mn W n in the next time slot are ∞ Vn ∈ V converges to 1, i.e. F = 1 , if the following k nonincreasing functions of Wn . 
■ sufficient conditions are satisfied: Lemma 1 indicates that the priority metrics 1) QQmn== mn' Q m ∀∈VVnn, ' V EPM[()],kk+1 W∀∈ CL V are nonincreasing functions of mn n mn n 2) NNmn≥ m'' n for any QQmm≥ ' , ∀∈VVnn, ' V . k kk+1 Wn when the users apply the myopic media-TCP or when the 3) EPM[()],,mn W n∀∈∀∈ CL mn V n V n V are all packets in the buffer expire in the next time slot. In these two nonincreasing functions of W k using the same λ . cases, the priority metrics are dominated by the first term in n Proof: See Appendix D. equation (10). The second condition requires the delay The first condition in Theorem 4 indicates that the media- deadlines of the classes to be stringent, which is more likely to be true in the case of real-time streaming, as opposed to the TCP users apply the same set of [Qmm ,= 1,..., M ] to pre-encoded streaming applications. In these two cases, classify their M classes of the multimedia applications. The Lemma 1 indicates that the competition among users makes it second condition indicates that for all the users in the network, impossible for a user to excessively increase its congestion users always have more packets in a class with higher window size in order to improve its own quality. The increase distortion impact than a class with lower distortion impact. k This condition is commonly seen in many video coding of the congestion window size Wn may decrease the priority techniques. For example, in MPEG video frames, I-frames metrics and hence, reduce the resulting distortion reduction in usually contain much more information bits than P-frames and the next time slot. Following Remark 4 in Section IV, the B-frames. The third condition is discussed in Lemma 1. 
Based multimedia distortion reduction of user V in the next time n on equation (14), as long as the priority metrics slot can be written as kk+1 k EPM[()]mn W n are nonincreasing functions of Wn for all kk++11 kk QWn() n=>∑ QNIEPMW mn mn ([ mn ()]0) n ,(13) the classes, we can show that a user with a larger multimedia ∀∈CLmn V n quality Qn will always have a smaller quality change ΔQn where Qmn represents the distortion impact of class CLmn . when users applying media-TCP to change their congestion kk+1 Comparing QWnn() with the current multimedia distortion window size. In Appendix D, we prove that this allows the k proposed media-TCP to satisfy the sufficient condition in reduction Qn , the variation appears only for the classes whose k Lemma 2 and hence the quality-based fairness index priority metrics change sign. If we denote Mn as the set of converges to 1. Finally, the three conditions in Theorem 4 lead kk+1 ∞ classes whose priority metrics follow PMmn E[]0 PM mn < , to F = 1 for media-TCP users Vn ∈ V . we can rewrite equation (13) as ⎧ kk VI. SIMULATION RESULTS ⎪Qnn, if M =∅ ⎪ In this section, we simulate the proposed congestion control k +1 ⎪ kk k k+1 k Q = ⎨QQPMEPMCLnn+Δ, if mnmnmnn < [ ], ∀ ∈M n ⎪ scheme using different video sequences: “Forman”, “Mobile”, ⎪ and “Coastguard” (at a frame rate of 30 Hz, CIF format). The ⎪QQPMEPMCLkk−Δ, if k > [ k+1 ], ∀ ∈M k ⎩⎪ n n mn mn mn n sequences are encoded using an embedded scalable video , (14) codec [20] at the bitrate of 1500Kbps. We assume that each kk+1 k Group of Picture (GOP) contains 16 frames and each of them where Δ=QQnn − Q n =∑ k QN mnmn ≥0 ∀∈CLmnM n can tolerate a playback delay of {133, 266, 400, 533} ms. We represents the difference of the expected distortion reduction set the packet length up to 1000 bytes and the video packets are of user Vn in time slot k . 
Based on equation (14), we next classified into sixteen classes based on their spatial and prove the sufficient condition for achieving the discussed temporal interdependencies as in [19][20]. Table II provides a quality-based fairness. summary of the classifications of the sequences. We simulate Lemma 2: The difference of the fairness index is nonnegative, the video transmission using MATLAB using the simulation i.e. Δ=FFkk+1 − F k ≥0 , if settings in Figure 4. There are 20 regular TCP users and the resulting average RTT for the video packets is 133ms. kkkkk2 ()QQQQQnnnnnΔ≥ Δ. (15) ∑∑ ∑∑ Multimedia VV∈∈VV VV ∈∈ VV Multimedia app sink nn nn user 1 Proof: We omit the proof here due to space limitations. A 10Mbps 10Mbps similar proof can be found in [16]. Multimedia Regular TCP app sink Lemma 2 provides a sufficient condition that ensures a user 2 N regular nondecreasing fairness index ΔF k . Since the index is TCP users bounded by 1, the interaction among users asymptotically Fig. 4 The simulation settings. drives the fairness index to 1 [16]. Based on Lemma 2, the following Theorem provides the sufficient conditions for A. Tradeoff between multimedia performance and TCP multiple myopic media-TCP users to reach the quality-based friendliness 11 First, we simulate the case without the media-TCP user 2. scheduling and congestion window size; 2) a flow-based rate- We focus on the media-TCP user 1 streaming the “Coastguard” distortion optimization approach (RD) [11] to optimize the sequence using the proposed media-TCP congestion control transmission scheduling by adapting the sending rate to the with different Lagrangian multipliers and discount factors. available TCP throughput; 3) passive multimedia transmission Based on the measured packet loss rate, the media-TCP user directly over TCP connections (PA) as in [5]. 
applies a Markov chain model (similar to the model applied in Figure 6 shows the average video quality of various [9]) on the network states with an expected TCP window size k approaches using different playback delays and clearly WTCP = 16 per RTT, and the horizon K = 4 RTT. Figure 5 demonstrates that the joint transmission scheduling and shows the tradeoff between multimedia quality and the congestion control optimization is essential for real-time horizon- K TCP-friendliness. Larger λ provides better TCP- multimedia transmission. Our proposed approach significantly friendliness, but achieves lower multimedia quality, because outperforms the others especially when the playback delay is the quality gain is weighed less than the cost within the smaller than 400ms (which is common in numerous real-time instantaneous utility. The results also show that the foresighted video streaming and videoconferencing applications), because approach with larger γ significantly improves the multimedia it is able to jointly optimize the congestion window size as well quality while maintaining moderate TCP-friendliness. as the transmission scheduling by considering the distortion However, in this paper, we focus on deriving the optimal impacts, delay deadlines, and interdependencies of the packets. solution when the environment (i.e. the state transition probabilities) and the utility are perfectly known. If the transition probabilities are not perfect, a larger γ can lead to a 38 worse learning performance. The selection of λ and γ for 36 Forman media-TCP using online learning for the case when the 34 environment is unknown represents an interesting future Coastguard research direction. 32

TABLE II. CLASSIFICATION OF THE SEQUENCES. 30 Coastguard, PA

Class CL 1 2 3 4 5~8 9~16 (dB) Y-PSNR Average m 28 Coastguard, RD Q (dB/pkt) range 0.154 0.153 0.09 0.08 ~0.072 ~0.053 Coastguard, MT m Forman, PA 0 26 Forman, RD Nm of “Coastguard” per GOP 17 17 12 12 5 4 Forman, MT 0 Nm of “Forman” per GOP 34 34 8 8 4 0 24 100 200 300 400 500 600 0 Playback Delay (ms) Nm of “Mobile” per GOP 30 30 13 13 1 0 Fig. 6 Average received video quality using different TCP congestion control for multimedia transmission (For MT (a) 32 γ = 0.8 approach, λ ==10, γ 0.8 , K = 4 RTT). γ = 0.4 30 γ = 0 C. Fairness among multiple media-TCP streams 28 In this subsection, we validate the quality-based fairness

Average Y-PSNR Average among multiple users using media-TCP. We simulate the case 26 10 20 30 40 when multimedia user 1 streams “Coastguard” sequence and (b) λ multimedia user 2 streams “Mobile” sequence simultaneously 1.15 γ = 0.8 using the same simulation settings in Figure 4 with 20 regular γ = 0.4 TCP users. The playback delay is set as 533ms. Figure 7 1.1 γ = 0 shows the congestion window size and the video quality over 1.05 time when the multimedia users apply the passive multimedia Friendliness TCP transmission approach (PA) directly over TCP connections as 1 10 20 30 40 in [5]. It is shown that although the congestion window sizes λ of the multimedia users follow the average TCP window size Fig. 5 (a) Average Y-PSNR of Coastguard sequence, (b) of the 20 regular TCP users, the video quality gap between the resulting TCP-friendliness versus different Lagrangian two sequences is always larger than 3 dB. multipliers (playback delay: 266ms, K = 4 RTT). On the other hand, Figure 8 shows the congestion window B. Comparisons against alternative congestion control size and the video quality over time when the multimedia solutions for multimedia applications users apply the proposed media-TCP congestion control (MT) algorithms. We classify the video packets of both sequences to We simulate separately the streaming of “Coastguard” satisfy the first two sufficient conditions in Theorem 4. We sequence as well as “Forman” sequence using three different also set a small discount factor γ = 0.1 to ensure that the approaches: 1) our proposed packet-based media-TCP kk+1 congestion control (MT) that jointly optimizes the transmission priority metrics EPM[()]mn W n are nonincreasing functions 12 sequences as the number of users in the network increases, (a) 50 while having limited impact on the other regular TCP users. 
This is because the media-TCP is able to better utilize the 40 Regular TCP Coastguard, PA resource by prioritizing the video classes for transmission, in 30 Mobile, PA addition to merely adapting the congestion window size. 20 VII. CONCLUSIONS Window size 10 In this paper, we formulate a media-aware congestion 0 20 40 60 80 100 control for multimedia transmission using FHMDP that explicitly considers the distortion impacts, delay deadlines and (b) 40 interdependencies of the various multimedia classes. The Coastguard, PA Mobile, PA proposed approach not only adapts the congestion window 35 size given the measured packet loss rate, but also optimally 30 prioritizes the multimedia classes for transmission and hence, further improves the multimedia quality. We show that this 25 complex FHMDP problem can be decomposed into simpler

Video quality(PSNR) optimal stopping problems, thereby significantly reducing the 20 20 40 60 80 100 complexity of solving the problem. The simulation results Time slot (RTT) show that the proposed foresighted media-TCP significantly Fig. 7 (a) Congestion window sizes over time for the two outperforms the conventional TCP-friendly congestion control multimedia users and the average congestion window size of the schemes in terms of quality, especially for real-time streaming 20 regular TCP users. (b) Video quality over time for the two with a small playback delay. Moreover, unlike the multimedia users using PA approach (playback delay: 533ms). conventional congestion control approaches focusing on the throughput-based fairness, our solution maintains the quality- (a) 50 Regular TCP Coastguard, MT based fairness among the multimedia users, which improves 40 Mobile, MT the overall streaming quality by utilizing the available 30 bandwidth resources more efficiently. TABLE III. THE COMPARISONS OF THE VIDEO QUALITIES USING THE PA 20 APPROACH AND THE PROPOSED MT APPROACH WHEN THE NUMBER OF Window sizeWindow 10 USERS IN THE NETWORK INCREASES.

0 (playback MT approach ( λ = 10 , 20 40 60 80 100 delay: PA approach 533ms) γ = 0.1 , K = 4 RTT) (b) Avg. Avg. Avg. Avg. 40 Avg. Avg. Avg. Avg. PSNR window PSNR window Number of PSNR PSNR PSNR PSNR of the size of of the size of 35 TCP users of user of user of user of user users TCP users TCP 1(dB) 2 (dB) 1(dB) 2 (dB) 30 (dB) users (dB) users N = 20 32.07 27.46 29.76 17.00 31.58 30.43 31.00 16.95 25 Coastguard, MT N = 25 31.14 25.61 28.37 14.14 31.43 30.40 30.91 13.80 Mobile, MT

Video quality(PSNR) N = 30 29.88 23.38 26.63 11.70 30.29 30.22 30.25 11.51 20 20 40 60 80 100 Time slot (RTT) APPENDIX A

Fig. 8 (a) Congestion window sizes over time for the two Algorithm 1 Media-TCP congestion control with independent packets multimedia users and the average congestion window size of the For time slot ki= , given the current state sKi ,,,λγ 20 regular TCP users. (b) Video quality over time for the two Set kiK=+ −1 ; multimedia users using MT approach (λ = 10 , γ = 0.1 , While ki≥ K = 4 RTT , playback delay: 533ms). For all classes CLm of W k for all the classes. It is shown that media-TCP results kk k n Compute all JWmTCPm(,) N from equation (17), in closer streaming qualities for the two multimedia users at for NNWWkk∈∈{0,...,max }, {0,..., max } ; the cost of moderate TCP unfriendliness. Hence, by applying m m TCP TCP kk* kk* the proposed media-TCP, the multimedia users fairly share the Compute PMm () s and πm ()s in equation (10); resources in terms of video quality. End for We further increase the number of the regular TCP users Set kk=−1 ; using the same simulation settings. The resulting congestion End while iki** window sizes and the video qualities are summarized in Table Set PMmm= PM() s for mM= 1,..., ; III. It is shown that the proposed media-TCP congestion ii* Set WN= ∑ i* m ; control is able to maintain the streaming qualities for both ∀≥mPM,0m 13
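The backward induction of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`next_buffer`, `priority_metrics`, `window_size`) are hypothetical, arrivals and discards are taken as constant per class, the buffer is clipped to `n_max`, the utility-to-go beyond the horizon is taken as zero, window states are strictly positive, and the utility-to-go under the optimal policy is taken as the larger of the "transmit" and "hold" branches of equation (18).

```python
from typing import List, Sequence


def next_buffer(n: int, arrivals: int, discards: int, transmit: bool, n_max: int) -> int:
    """Equation (5) for one class: (N - N_D) * (1 - pi) + N_A.

    Clipping to [0, n_max] is an assumption of this sketch.
    """
    remaining = 0 if transmit else max(n - discards, 0)
    return min(remaining + arrivals, n_max)


def priority_metrics(q: float, lam: float, gamma: float, horizon: int,
                     windows: Sequence[int], trans: Sequence[Sequence[float]],
                     arrivals: int, discards: int, n_max: int) -> List[List[List[float]]]:
    """Backward induction for a single class.

    Returns pm[k][wi][n]: the priority metric of equation (10) in slot k
    (k = 0 is the current slot) for window state windows[wi] and buffer size n.
    trans[wi][wj] is the assumed probability of moving from windows[wi] to windows[wj].
    """
    n_w = len(windows)
    # Assumption: no utility is collected after the end of the horizon.
    j_next = [[0.0] * (n_max + 1) for _ in range(n_w)]
    pm: List[List[List[float]]] = []
    for _ in range(horizon):  # last slot of the horizon first
        j_cur = [[0.0] * (n_max + 1) for _ in range(n_w)]
        pm_k = [[0.0] * (n_max + 1) for _ in range(n_w)]
        for wi, w in enumerate(windows):
            for n in range(n_max + 1):
                n_sent = next_buffer(n, arrivals, discards, True, n_max)
                n_kept = next_buffer(n, arrivals, discards, False, n_max)
                fut_sent = sum(trans[wi][wj] * j_next[wj][n_sent] for wj in range(n_w))
                fut_kept = sum(trans[wi][wj] * j_next[wj][n_kept] for wj in range(n_w))
                j1 = (q - lam / w) * n + gamma * fut_sent  # transmit the class now
                j0 = gamma * fut_kept                      # hold the class
                pm_k[wi][n] = j1 - j0
                j_cur[wi][n] = max(j1, j0)
        pm.append(pm_k)
        j_next = j_cur
    pm.reverse()  # pm[0] now refers to the current time slot
    return pm


def window_size(pm_now: Sequence[float], buffers: Sequence[int]) -> int:
    """Remark 4: sum the buffer sizes of the classes with a positive priority metric."""
    return sum(n for metric, n in zip(pm_now, buffers) if metric > 0)


if __name__ == "__main__":
    windows = [8, 16]
    stay = [[1.0, 0.0], [0.0, 1.0]]  # toy chain: the network state never changes
    pm_a = priority_metrics(1.0, 10.0, 0.0, 4, windows, stay, 1, 1, 5)
    pm_b = priority_metrics(0.5, 10.0, 0.0, 4, windows, stay, 1, 1, 5)
    print(window_size([pm_a[0][1][4], pm_b[0][1][3]], [4, 3]))
```

Because the classes decouple (Theorem 1), the table for each class has only `horizon * len(windows) * (n_max + 1)` entries, which is what keeps the per-slot computation small. With `gamma = 0` the metric reduces to the myopic term $(Q_m - \lambda / W_{TCP}) N_m$, in line with Remark 3.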

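As a companion to the fairness discussion in Section V, the following short Python sketch evaluates Jain's fairness index of equation (12) and the sufficient condition of Lemma 2 in equation (15). The helper names `jain_index` and `lemma2_condition` are hypothetical, and the sketch assumes that the quality changes passed to `lemma2_condition` are the per-user changes $\Delta Q_n^k$ between two consecutive time slots.

```python
from typing import Sequence


def jain_index(qualities: Sequence[float]) -> float:
    """Equation (12): (sum of Q_n)^2 / (V * sum of Q_n^2)."""
    total = sum(qualities)
    squares = sum(q * q for q in qualities)
    return total * total / (len(qualities) * squares)


def lemma2_condition(qualities: Sequence[float], deltas: Sequence[float]) -> bool:
    """Equation (15): sum(Q^2) * sum(dQ) >= sum(Q) * sum(Q * dQ)."""
    lhs = sum(q * q for q in qualities) * sum(deltas)
    rhs = sum(qualities) * sum(q * d for q, d in zip(qualities, deltas))
    return lhs >= rhs


if __name__ == "__main__":
    # Average PSNR values of users 1 and 2 for N = 20, taken from Table III.
    print(jain_index([32.07, 27.46]))  # PA approach
    print(jain_index([31.58, 30.43]))  # MT approach
    # The lower-quality user gains quality: the condition of Lemma 2 holds.
    print(lemma2_condition([2.0, 1.0], [0.0, 1.0]))
```

On the Table III values the index is closer to 1 for the MT approach than for the PA approach, which matches the smaller quality gap reported in Section VI.C.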
Algorithm 2 Media-TCP congestion control with interdependent packets
For time slot $k = i$, given the current state $s^i$, $K$, $\lambda$, $\gamma$
  Set $k = i + K - 1$;
  While $k \ge i$
    Set $W^k = 0$ and $depth = 0$;
    For all classes $CL_m$ with $depth_m = depth$
      Compute all $J_m^k(W_{TCP}^k, N_m^k)$ from equation (17), for $N_m^k \in \{0, \ldots, N_m^{\max}\}$, $W_{TCP}^k \in \{0, \ldots, W_{TCP}^{\max}\}$;
      Compute $PM_m^{k*}(s^k)$ and $\pi_m^{k*}(s^k)$ in equation (11);
      If $PM_m^{k*}(s^k) < 0$, set $Q_n = 0$ for $\forall CL_n \in Des_m$;
      Set $depth = depth + 1$;
    End for
    Set $k = k - 1$;
  End while
  Set $PM_m^{i*} = PM_m^{k*}(s^i)$ for $m = 1, \ldots, M$;
  Set $W^{i*} = \sum_{\forall m,\ PM_m^{i*} > 0} N_m^i$;

APPENDIX B

Proof of Theorem 1: Without loss of generality, we assume $i = 1$. Since packets are independent, the distortion reduction $Q^k(a^k, s^k)$ in equation (6) can be computed by $Q^k(a^k, s^k) = \sum_{m=1}^{M} Q_m N_m^k \pi_m^k$. First, we see that when $k = K$, $J_\mu^K(s^K)$ can be rewritten as:

$$J_\mu^K(s^K) = \sum_{m=1}^{M} Q_m N_m^K \pi_{\mu,m}^K - \frac{\lambda}{W_{TCP}^K} \left( \sum_{m=1}^{M} N_m^K \pi_{\mu,m}^K - W_{TCP}^K \right) = \sum_{m=1}^{M} \left( Q_m - \frac{\lambda}{W_{TCP}^K} \right) N_m^K \pi_{\mu,m}^K + C(W_{TCP}^K), \qquad (16)$$

where $[\pi_{\mu,m}^K,\ m = 1, \ldots, M] = \mu(s^K)$ denotes the vector of transmission permissions given the policy $\mu(s^K)$. Based on this, the expected utility-to-go $J_\mu^K(s^K) = \sum_{m=1}^{M} \left( Q_m - \lambda / W_{TCP}^K \right) N_m^K \pi_{\mu,m}^K + \lambda$ is separable and also a nondecreasing function of $N_m^K$. Then, by assuming $J_\mu^{k+1}(s^{k+1}) = \sum_{m=1}^{M} J_{\mu,m}^{k+1}(W_{TCP}^{k+1}, N_m^{k+1}) + C(W_{TCP}^{k+1})$ and assuming that $J_{\mu,m}^{k+1}(W_{TCP}^{k+1}, N_m^{k+1})$ are nondecreasing functions of $N_m^{k+1}$ for all classes, we have

$$J_\mu^k(s^k) = u(s^k, \mu(s^k)) + \gamma \sum_{s^{k+1} \in S} P(s^{k+1} \mid s^k)\, J_\mu^{k+1}(s^{k+1})$$

$$= \sum_{m=1}^{M} \left( Q_m - \frac{\lambda}{W_{TCP}^k} \right) N_m^k \pi_{\mu,m}^k + \lambda + \gamma \sum_{s^{k+1} \in S} P(W_{TCP}^{k+1} \mid W_{TCP}^{k}) \left( \sum_{m=1}^{M} J_{\mu,m}^{k+1}\big(W_{TCP}^{k+1}, N_m^{k+1}(\pi_{\mu,m}^k)\big) + C(W_{TCP}^{k+1}) \right)$$

$$= \sum_{m=1}^{M} J_{\mu,m}^k(W_{TCP}^k, N_m^k) + C(W_{TCP}^k),$$

where

$$J_{\mu,m}^k(W_{TCP}^k, N_m^k) = \left( Q_m - \frac{\lambda}{W_{TCP}^k} \right) N_m^k \pi_{\mu,m}^k + \gamma \sum_{s^{k+1} \in S} P(W_{TCP}^{k+1} \mid W_{TCP}^{k})\, J_{\mu,m}^{k+1}\big(W_{TCP}^{k+1}, N_m^{k+1}(\pi_{\mu,m}^k)\big), \qquad (17)$$

and $[\pi_{\mu,m}^k,\ m = 1, \ldots, M] = \mu(s^k)$. Hence, the expected utility-to-go $J_\mu^k(s^k)$ is also separable and $J_{\mu,m}^k(W_{TCP}^k, N_m^k)$ is also a nondecreasing function of $N_m^k$. By backward induction, $J_\mu^k(s^k),\ \forall s^k \in S,\ k = 1, \ldots, K$ are all separable. ■

APPENDIX C

Proof of Theorem 2: From equation (9), the optimal policy in time slot $i$ with state $s^i$ is the action that maximizes the expected utility. Based on Theorem 1, we can see that when the packets in the buffer are independent, the utility-to-go is separable. From equation (17), the optimization problem becomes

$$[\pi_1^{i*}(s^i), \ldots, \pi_M^{i*}(s^i)] = \arg\max_{[\pi_1, \ldots, \pi_M]} \sum_{m=1}^{M} \left( \left( Q_m - \frac{\lambda}{W_{TCP}^i} \right) N_m^i \pi_m + \gamma \sum_{s^{i+1} \in S} P(W_{TCP}^{i+1} \mid W_{TCP}^{i})\, J_{\mu,m}^{i+1}\big(W_{TCP}^{i+1}, N_m^{i+1}(\pi_m)\big) \right).$$

In other words, we have

$$J_{\mu,m}^i(W_{TCP}^i, N_m^i) = \begin{cases} J_1 = \left( Q_m - \dfrac{\lambda}{W_{TCP}^i} \right) N_m^i + \gamma \displaystyle\sum_{s^{i+1} \in S} P(W_{TCP}^{i+1} \mid W_{TCP}^{i})\, J_{\mu,m}^{i+1}(W_{TCP}^{i+1}, N_m^{A,i}), & \text{if } \pi_m^{i*} = 1 \\ J_0 = \gamma \displaystyle\sum_{s^{i+1} \in S} P(W_{TCP}^{i+1} \mid W_{TCP}^{i})\, J_{\mu,m}^{i+1}(W_{TCP}^{i+1}, N_m^{i} - N_m^{D,i} + N_m^{A,i}), & \text{if } \pi_m^{i*} = 0 \end{cases} \qquad (18)$$

Hence, by defining $PM_m^{i*}(s^i) = J_1 - J_0$ in equation (18), the problem becomes an optimal stopping problem [23] that leads to the utility-to-go in equation (9). ■

APPENDIX D

Proof of Theorem 4: For $V = 1$, the sufficient condition in equation (15) is satisfied. Without loss of generality, let $Q_1 \ge Q_2 \ge \ldots \ge Q_h \ge Q_{h+1} = Q$. Suppose that the condition is satisfied when $V \le h$, i.e.

$$\sum_{n=1}^{h} (Q_n)^2 \sum_{n=1}^{h} \Delta Q_n \;\ge\; \sum_{n=1}^{h} Q_n \sum_{n=1}^{h} Q_n \Delta Q_n. \qquad (19)$$

For $V = h + 1$, since $Q \le Q_n,\ \forall V_n \in \mathbf{V}$, if the sufficient conditions are satisfied, the priority metrics give

$$\Delta Q = \sum_{\forall m \in M_{h+1}^k} Q_m N_{m,h+1} \;\ge\; \sum_{\forall m \in M_n^k} Q_m N_{mn} = \Delta Q_n,\quad \forall V_n \in \mathbf{V}.$$

Hence, we have

$$\sum_{n=1}^{h} \Delta Q\, Q_n (Q_n - Q) \;\ge\; \sum_{n=1}^{h} \Delta Q_n\, Q\, (Q_n - Q)$$

$$\Rightarrow\; \Delta Q \sum_{n=1}^{h} Q_n^2 + Q^2 \sum_{n=1}^{h} \Delta Q_n \;\ge\; Q\, \Delta Q \sum_{n=1}^{h} Q_n + Q \sum_{n=1}^{h} Q_n \Delta Q_n. \qquad (20)$$

Adding equations (19) and (20), we have

$$\left( \sum_{n=1}^{h} Q_n^2 + Q^2 \right) \left( \sum_{n=1}^{h} \Delta Q_n + \Delta Q \right) \;\ge\; \left( \sum_{n=1}^{h} Q_n + Q \right) \left( \sum_{n=1}^{h} Q_n \Delta Q_n + Q\, \Delta Q \right),$$

which satisfies the condition in equation (15). By induction, we prove that the sufficient condition of the lemma is fulfilled for all $V \ge 1$. ■

REFERENCES

[1] J. Mo and J. Walrand, "Fair End-to-End Window-Based Congestion Control," IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 556-567, Oct. 2000.
[2] J. Padhye, V. Firoiu, D. F. Towsley, J. F. Kurose, "Modeling TCP Reno Performance: a simple model and its empirical validation," IEEE/ACM Trans. on Networking, vol. 8, no. 2, pp. 133-145, Apr. 2000.
[3] J. Padhye, V. Firoiu, D. F. Towsley, J. F. Kurose, "A Model Based TCP-Friendly Rate Control Protocol," In Proc. of NOSSDAV'99, 1999.
[4] S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based Congestion Control for Unicast Applications," ACM SIGCOMM Computer Communication Review, vol. 30, no. 4, pp. 43-56, Oct. 2000.
Friendly Transport Layer," In Proc. of Int. Packet Video Workshop, 2009.
[11] Q. Zhang, W. Zhu, Y. Zhang, "Resource Allocation for Multimedia Streaming over the Internet," IEEE Trans. on Multimedia, vol. 3, no. 3, Sep. 2001.
[12] P. Zhu, W. Zeng, C. Li, "Joint Design of Source Rate Control and QoS-Aware Congestion Control for Video Streaming over the Internet," IEEE Trans. on Multimedia, vol. 9, no. 2, pp. 366-376, Feb. 2007.
[13] R. Rejaie, M. Handley, and D. Estrin, "RAP: An End-to-end Rate-based Congestion Control Mechanism for Real-time Streams in the Internet," IEEE INFOCOM 1999.
[14] L. Cai, X. Shen, J. Pan, and J. W. Mark, "Performance Analysis of TCP-Friendly AIMD Algorithms for Multimedia Applications," IEEE Trans. on Multimedia, vol. 7, no. 2, pp. 339-355, Apr. 2005.
[15] G. Wu, E. K. P. Chong, R. Givan, "Burst-Level Congestion Control Using Hindsight Optimization," IEEE Trans. on Automatic Control, vol. 47, no. 6, pp. 979-991, June 2002.
[16] N. R. Sastry and S. S. Lam, "CYRF: A Theory of Window-Based Unicast Congestion Control," IEEE/ACM Trans. Networking, vol. 13, no. 2, pp. 330-342, Apr. 2005.
[17] P. Chou, and Z. Miao, "Rate-distortion optimized streaming of packetized media," IEEE Trans. Multimedia, vol. 8, no. 2, pp. 390-404, 2005.
[18] F. Fu, M. van der Schaar, "Structural solutions of cross-layer optimization of wireless multimedia transmission," Technical Report, 2009. http://medianetlab.ee.ucla.edu/papers/UCLATechReport_CLO.pdf
[19] H. Shiang and M. van der Schaar, "Multi-user video streaming over multi-hop wireless networks: A distributed, cross-layer approach based on priority queuing," IEEE J. Sel. Areas Commun., vol. 25, no. 4, pp. 770-785, May 2007.
[20] J.-R. Ohm, M. van der Schaar, and J. W. Woods, "Interframe wavelet coding - motion picture representation for universal scalability," EURASIP Signal Processing: Image Communication, Special issue on Digital Camera, vol. 19, no. 9, pp. 877-908, Oct. 2004.
[21] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July, 2003.
[22] M. Dai, D. Loguinov, and H. Radha, "Rate distortion modeling for scalable video coders," in Proc.
of ICIP, 2004. [5] B. Wang, J. Kurose, P. Shenoy, D. Towsley, “Multimedia [23] D. P. Bertsekas, Dynamic programming and Optimal Control. Streaming via TCP: An analytic Performance Study,” ACM vol. I, Belmont, MA: Athena Scientific, 1995. Trans. on Multimedia Computing, Communications, and [24] D. S. Turaga and T. Chen, "Hierarchical Modeling of Variable Applications, vol. 4, no. 2, May 2008. Bit Rate Video Sources," Packet Video, 2001. [6] S. Bohacek, “A Stochastic Model of TCP and Fair Video [25] Y. Li, Z. Li, M. Chiang, and A. R. Calderbank, “Content-aware Transmission,” IEEE INFOCOM 2003. distortion fair video streaming in networks,” IEEE Trans. [7] I. V. Bajic, O. Tickoo, A. Balan, S. Kalyanaraman, and J. W. Multimedia, to appear. Woods, “Integrated end-to-end buffer management and [26] R. Jain, D.-M. Chiu, and W. Hawe, “A quantitative measure of congestion control for scalable video communications,” Proc. fairness and discrimination for resource allocation in shared IEEE ICIP 2003, Barcelona, Spain, vol. 3, pp. 257–260, Sept. computer systems,” Digital Equipment Corporation, DEC 2003. Research Report, Tech. Rep. TR-301, Sep. 1984. [8] W. Tan and A. Zakhor, “Real-time Internet Video Using Error [27] P. A. Chou, “Streaming media on demand and live broadcast,” Resilient Scalable Compression and TCP-Friendly Transport Multimedia over IP and Wireless Networks: Compression, Protocol,” IEEE Trans. on Multimedia, vol. 1, no. 2, June 1999. Networking, and Systems, Academic Press, 2007. [9] T. Nguyen, A. Zakhor, “Distributed video streaming with [28] E. Kohler, M. Handley, and S. Floyd. Datagram Congestion forward error correction,” In Proc. of Int. Packet Video Control Protocol (DCCP). Proposed standard RFC 4340, IETF, Workshop, 2002. http://www.ietf.org/rfc/rfc4340, March 2006. [10] H. Seferoglu, U. C. Kozat, M. R. Civanlar, and J. Kempf, [29] Y. Li, M. Chiang, and A. R. 
Calderbank, “Congestion control in “Congestion State-Based Dynamic FEC Algorithm for Media networks with delay sensitive traffic,” Proc. IEEE GLOBECOM, Washington D.C., Nov. 2007. 15
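As an illustration of the per-class recursion in equation (18) of Appendix B, the following is a minimal sketch that evaluates $J_1$, $J_0$, and the priority metric $J_1 - J_0$ by backward recursion for a single class. It is not the authors' implementation. The function name `make_class_solver`, the finite list of $W_{TCP}$ values, the transition matrix `P`, and the constant per-slot `arrivals` and `expired` counts are all assumptions introduced for the example, as is the choice of a zero utility-to-go beyond the horizon $K$.

```python
from functools import lru_cache


def make_class_solver(q, lam, gamma, K, w_states, P, arrivals, expired):
    """Return (J, PM) for one packet class, following the shape of (18).

    Hypothetical inputs (not from the paper): w_states lists the possible
    values of W_TCP, P[a][b] is the transition probability between them,
    and arrivals/expired are constant per-slot packet counts.
    """

    @lru_cache(maxsize=None)
    def J(i, a, n):
        # Utility-to-go of this class in slot i, W_TCP index a, buffer size n.
        if i > K:
            return 0.0  # assumed: nothing is earned beyond the horizon
        return max(J1(i, a, n), J0(i, a, n))

    def expect_next(i, a, n_next):
        # gamma * sum_b P(W^{i+1} = w_b | W^i = w_a) * J^{i+1}(w_b, n_next)
        return gamma * sum(P[a][b] * J(i + 1, b, n_next)
                           for b in range(len(w_states)))

    def J1(i, a, n):
        # Transmit: gain (Q_m - lambda / W_TCP) per packet; afterwards the
        # buffer holds only the new arrivals.
        return (q - lam / w_states[a]) * n + expect_next(i, a, arrivals)

    def J0(i, a, n):
        # Hold: expired packets are dropped and new arrivals are added.
        return expect_next(i, a, max(n - expired, 0) + arrivals)

    def PM(i, a, n):
        # Priority metric: gain of transmitting over holding.
        return J1(i, a, n) - J0(i, a, n)

    return J, PM
```

With a single TCP state `w_states=[4.0]`, `P=[[1.0]]`, `q=1.0`, `lam=2.0`, `gamma=1.0`, `K=2`, no arrivals and one expiring packet per slot, `PM(1, 0, 3)` evaluates to `0.5`, so under these assumed numbers transmitting the three buffered packets in the first slot is preferred to holding them.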

TABLE IV. NOMENCLATURE

| Symbol | Description |
| --- | --- |
| $Q_m$ | Distortion impact of the class $CL_m$ |
| $N_m^k$ | Number of packets in the transmission buffer of class $CL_m$ in time slot $k$ |
| $N_m^{k,A}$ | Number of packets in class $CL_m$ that arrive in time slot $k$ |
| $N_m^{k,D}$ | Number of packets in class $CL_m$ whose delay deadline is expired in time slot $k$ |
| $depth_m^k$ | Maximum distance from class $CL_m$ to the root in the DAG in time slot $k$ |
| $Q^k$ | Expected multimedia distortion reduction in time slot $k$ |
| $PM_m^k$ | Priority metric of class $CL_m$ in time slot $k$ |
| $x^k$ | $x^k = [PM_1^k, \ldots, PM_M^k] \in R^M$ |
| $\pi_m^k$ | Transmission permission for transmitting the packets in class $CL_m$ in time slot $k$ |
| $W^k$ | Congestion window size in time slot $k$ |
| $p^k$ | Estimated packet loss rate in time slot $k$ |
| $W_{TCP}^k(p^k)$ | Expected TCP window size (network congestion metric) |
| $N^k$ | $N^k = [N_m^k, m = 1, \ldots, M]$ |
| $a^k$ | Action at time slot $k$ of the FHMDP problem: $a^k = [\pi_m^k, m = 1, \ldots, M] \in \{0,1\}^M$ |
| $s^k$ | State at time slot $k$ of the FHMDP problem: $s^k = \{W_{TCP}^k, N^k\} \in S$ |
| $\lambda$ | Positive Lagrangian multiplier of the FHMDP problem |
| $\gamma$ | Discount factor of the FHMDP problem |
| $K$ | Finite horizon of the FHMDP problem |
| $J^k(s^k)$ | Expected utility-to-go in time slot $k$ for the FHMDP problem |
| $J_m^k(W_{TCP}^k, N_m^k)$ | Expected utility-to-go component specific to the class $CL_m$ |
| $\mathcal{V}$ | A set of media-TCP users in the network, $\mathcal{V} = \{V_n, n = 1, \ldots, V\}$ |
| $F^k$ | Jain’s fairness index of the media-TCP users in the network |
| $Q_n^k$ | User $V_n$’s expected multimedia distortion reduction in time slot $k$ |
| $\Delta Q_n^k$ | Expected distortion reduction variation of user $V_n$ in time slot $k$, $\Delta Q_n^k = Q_n^{k+1} - Q_n^k$ |
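The fairness quantities in the table ($F^k$, $Q_n^k$, $\Delta Q_n^k$) and the induction step in the proof of Theorem 4 can be checked numerically. The sketch below uses the standard definition of Jain's index from [26], $(\sum_n Q_n)^2 / (V \sum_n Q_n^2)$, and tests the inequality in the form of equation (19). The function names `jain_index`, `condition_19`, and `add_user` are hypothetical, and the sketch assumes nonnegative inputs that are not all zero.

```python
import math


def jain_index(q):
    """Jain's fairness index (sum q)^2 / (V * sum q^2) of allocations q."""
    total = sum(q)
    squares = sum(x * x for x in q)
    return total * total / (len(q) * squares)


def condition_19(q, dq):
    """Check (sum Q_n^2)(sum dQ_n) >= (sum Q_n)(sum Q_n dQ_n)."""
    lhs = sum(x * x for x in q) * sum(dq)
    rhs = sum(q) * sum(x * d for x, d in zip(q, dq))
    # isclose guards against floating-point rounding when both sides are equal
    return lhs >= rhs or math.isclose(lhs, rhs)


def add_user(q, dq, q_new, dq_new):
    """Induction step: append a user with the smallest Q and the largest dQ."""
    assert q_new <= min(q) and dq_new >= max(dq)
    return q + [q_new], dq + [dq_new]


if __name__ == "__main__":
    q, dq = [3.0, 2.0, 1.0], [1.0, 2.0, 3.0]
    print(jain_index(q), condition_19(q, dq))
    q2, dq2 = add_user(q, dq, 0.5, 4.0)
    print(jain_index(q2), condition_19(q2, dq2))
```

For a small step along $\Delta Q$, the first-order change of the index above has the same sign as the left side of (19) minus its right side, which is how this sketch reads the condition. A single user always satisfies it with equality, matching the base case $V = 1$ of the proof.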