Goodput-Aware Load Distribution for Real-Time Traffic Over Multipath


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2014.2347031, IEEE Transactions on Parallel and Distributed Systems

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. X, 2014

Goodput-Aware Load Distribution for Real-time Traffic over Multipath Networks

Jiyan Wu, Student Member, IEEE, Chau Yuen, Senior Member, IEEE, Bo Cheng, Yanlei Shang, and Junliang Chen

Abstract—Load distribution is a key research issue in deploying the limited network resources available to support traffic transmissions. Developing an effective solution is critical for enhancing traffic performance and network utilization. In this paper, we investigate the problem of load distribution for real-time traffic over multipath networks. Due to the path diversity and unreliability in heterogeneous overlay networks, large end-to-end delay and consecutive packet losses can significantly degrade the traffic flow's goodput, whereas existing studies mainly focus on the delay or throughput performance. To address these challenging problems, we propose a Goodput-Aware Load distribuTiON (GALTON) model that includes three phases: (1) path status estimation to accurately sense the quality of each transport link, (2) flow rate assignment to optimize the aggregate goodput of input traffic, and (3) deadline-constrained packet interleaving to mitigate consecutive losses. We present a mathematical formulation for multipath load distribution and derive the solution based on utility theory. The performance of the proposed model is evaluated through semi-physical emulations in Exata involving both real Internet traffic traces and H.264 video streaming. Experimental results show that GALTON outperforms existing traffic distribution models in terms of goodput, video PSNR (Peak Signal-to-Noise Ratio), end-to-end delay, and aggregate loss rate.

Index Terms—Load Distribution, Goodput, Multipath Networks, Real-time Traffic, Multihoming.

• Jiyan Wu, Bo Cheng, Yanlei Shang, and Junliang Chen are with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, P. R. China.
• Jiyan Wu and Chau Yuen are with the SUTD-MIT International Design Center, Singapore University of Technology and Design, 20 Dover Drive, Singapore 138682.
E-mail: [email protected], [email protected], {chengbo, shangyl, chjl}@bupt.edu.cn.

1045-9219 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

The advancements of various network infrastructures have reached an unprecedented height during the past few years. The network heterogeneity and high-degree connectivity to different access media provide increased opportunities to establish multiple paths between end devices [1], as illustrated in Fig. 1. However, a single communication path is not capable of completely satisfying the stringent QoS (Quality of Service) requirements (e.g., delay, bandwidth and reliability) imposed by current and emerging real-time multimedia applications [2][3] (e.g., multi-player online gaming and live sports programs). With the rise of multihomed or multinetwork clients [4] (e.g., the Mushroom products [5]), new research trends [6][7] have moved towards simultaneously exploiting these multiple paths for enhanced transmission reliability and throughput. The IETF MONAMI6 working group [8], which is focused on enhancing mobile IPv6 with multihoming support, has identified the benefits that multihomed hosts offer to both individual mobiles and network operators.

Fig. 1. Illustration of multipath connectivity in heterogeneous overlay networks. The communication paths can be established either in the wired domain (e.g., using the SCTP - Stream Control Transmission Protocol [44]) or in hybrid wireless networks (e.g., cooperative packet delivery in cellular, Wi-Fi, WiMAX and ad hoc networks).

The key research issue in utilizing the available paths between multihomed communication terminals is to effectively distribute input traffic load for providing adequate QoS perceived by end users [1][2][9]. Indeed, inefficient load distribution can significantly degrade the traffic performance and network utilization, e.g., load imbalance, packet reordering, large end-to-end delay, etc. Therefore, many algorithms [1][2][10][11] have been proposed to optimize the delay or throughput performance. However, these network-level criteria cannot properly indicate the benefits of upper-layer applications. For instance, a live streaming video application cannot effectively leverage the throughput gains since its streaming rate is typically fixed or bounded by the encoding schemes. Furthermore, the increased throughput may lead to larger end-to-end delays, which in turn induce video quality degradation. Consequently, the load distribution of real-time traffic to achieve excellent QoS still remains problematic.

Distinct from previous studies, we present in this paper a load distribution model to optimize the goodput [12] performance of real-time traffic over multipath networks. Goodput differs from throughput as it represents the amount of data successfully received by the destination within the imposed deadline. The path diversity and unreliability in heterogeneous overlay networks [13], in concert with the stringent QoS requirements, pose crucial challenges to achieving this goal. To effectively aggregate the available capacity of different network paths, we have to seriously consider: (1) how to guarantee that the input traffic is delivered within the delay constraint, and (2) how to alleviate the burst data losses (see Footnote 1) frequently encountered in wired/wireless packet switching networks.

Motivated by addressing the above issues, we propose a Goodput-Aware Load distribuTiON model (GALTON) that includes three phases: (1) path status estimation to capture the physical characteristics of each transport link, (2) flow rate assignment to optimize the aggregate goodput of input traffic, and (3) deadline-constrained packet interleaving to alleviate consecutive losses. The detailed descriptions of the proposed solution will be presented in Section 4.

Specifically, the contributions of this paper can be summarized as follows:
• A load distribution model that effectively integrates the path status estimation, flow rate assignment, and packet interleaving to optimize the goodput performance of real-time traffic over multipath networks.
• A mathematical formulation for load distribution of multiple deadline-constrained flows over parallel communication paths to maximize the aggregate goodput. The utility theory is employed to derive the solution for flow rate assignment.
• Extensive semi-physical emulations in Exata involving both real Internet traffic and H.264 video streaming over wired/wireless multipath networks. Experimental results show that: (1) GALTON improves the goodput by up to 0.25, 0.48, and 0.64 Mbps compared to the OPI [2], E-DCLD [1], and THR [45], respectively. (2) GALTON increases the average video PSNR by up to 6.1, 8.6, and 12.1 dB compared to the OPI, E-DCLD, and THR, respectively. (3) GALTON reduces the average end-to-end delay by up to 33.1, 12.3, and 46.5 ms compared to the OPI, E-DCLD, and THR, respectively. (4) GALTON mitigates the aggregate loss rates by up to 5.1%, 9.5%, and 10.9% compared to the OPI, E-DCLD, and THR, respectively. Furthermore, the superiority of GALTON over the competing schemes becomes more obvious as the number of available paths increases.

The remainder of this paper is structured as follows. In Section 2, we briefly review and discuss the related work. The system model and problem formulation are presented in Section 3. Section 4 describes the solution procedure of the proposed GALTON. Performance evaluation is provided in Section 5 and concluding remarks are given in Section 6. The basic notations used throughout this paper are presented in Table 1.

Footnote 1. In the context of heterogeneous overlay networks, the packet losses can be classified into three categories: 1) the losses caused by congestion due to the bandwidth limitation or buffer overflow; 2) the errors caused by noise or interference in the wireless networks; and 3) the path failure loss or handover loss. In wireless networks, most packet losses are due to the wireless channel fluctuations or path failure and not caused by the link congestion.

TABLE 1
Basic notations used throughout this paper.

Symbol                          Definition
$\mathbb{P}$, $\mathbb{E}$      the probability value, expectation value.
$\mathcal{P}$, $p$              the set of available paths, a path element.
$\mathcal{F}$, $f$              the set of traffic flows, a flow element.
$P$, $F$                        the number of available paths, traffic flows.
$\mathbf{R}$, $R_p^f$           the flow rate assignment matrix, an element.
$T$                             the delay constraint for the input traffic.
$RTT_p$                         the round trip time of $p$.
$\mu_p$, $\nu_p$                the available, residual bandwidth of $p$.
G/B                             the Good/Bad state of $p$.
$\pi_p^B$, $\pi_p^G$            the stationary probability that $p$ is in B/G state.
$\xi_p^G$                       the state transition probability of $p$ from B to G.
$\pi_p$                         the transmission loss rate of $p$.
$\Pi_p^f$                       the effective packet loss rate of flow $f$ over $p$.
$R_p$, $Pkt_p$                  the probing traffic rate/size of $p$.
$M$, $M_p$                      the total number of packets, dispatched onto $p$.
$D_p^f$                         the end-to-end delay of flow $f$ over path $p$.
$\Theta$                        the aggregate goodput.
$\mathbf{U}$, $U_p^f$           the system utility matrix, an element.

2 RELATED WORK

Traffic load distribution over multipath networks has been an active research area in recent years and the general reviews can be found in [9][14]. The existing distribution models can be categorized into the flow and packet based traffic splitting approaches.

The packet based models generally dispatch packets onto different paths based on the channel status information, packet delay constraint, etc. Although these scheduling policies are able to reduce the queueing delay over a single path, they may also induce serious packet reordering problems, which in turn result in large end-to-end delay. The Effective Delay Controlled Load Distribution (E-DCLD) [1] aims at minimizing the difference among end-to-end delays of different paths, thereby reducing packet delay variation and risk of packet reordering. The authors formulate the queueing delay for each path with a hybrid M/M/1 model and split the traffic to minimize the cost of delay variations. In [18], each path is assigned a weight associated with


the flow rate assignment. An opportunistic scheduling-based scheduler was proposed to send packets to multiple paths while keeping the fraction of bytes transmitted on each path.

The flow-based traffic splitting models address the packet reordering problem by assigning the packets belonging to the same flow onto the same path. Although the risk of packet out-of-order may be decreased, load imbalance among the communication paths frequently occurs and the queueing delay over a single path significantly increases. The Load Balancing for Parallel Forwarding (LBPF) [15] model selects the path for a flow according to the hashed result of the packet identifier in the ordinary mode, similar to the schedule of Direct Hashing. The FlowLet Aware Routing Engine (FLARE) [16] switches packet bursts named flowlets onto available paths. Flowlets are spaced by a minimum interval $\delta$, chosen to be larger than the delay difference between parallel paths to reduce packet reordering.

The MultiPath LOss Tolerant (MPLOT) [19] transport protocol addresses the burst data losses in wireless networks by introducing a hybrid proactive/reactive FEC (Forward Error Correction) mechanism. The Encoded Multipath Streaming (EMS) [20] model adjusts the FEC redundancy in a progressive manner based on the information loss rate. This study reveals the tradeoff between loss and delay performance in real-time applications.

The work closest to ours is conducted by Bui et al. [2]. The authors formulate the multipath data distribution as a Markov Decision Process and propose an Online Policy Iteration (OPI) algorithm. The proposed algorithm aims at improving the performance metrics, e.g., delay, loss, and throughput. However, goodput is able to better indicate the QoS requirements of real-time applications. Both the traffic and network information should be comprehensively analyzed. Towards this end, we formulate the data distribution aiming to optimize the goodput performance and develop solutions for flow rate assignment. Besides, we take into account the burst path losses in heterogeneous overlay networks and propose an interleaving scheme to alleviate consecutive losses. In [21], we propose a Sub-Frame Level (SFL) scheduling approach to optimize the delay performance of high definition video streaming over heterogeneous wireless networks.

3 SYSTEM MODEL AND PROBLEM FORMULATION

We first describe the system overview of the proposed GALTON, which is shown in Fig. 2. The proposed model is an end-to-end scheduling framework and the implementation requires modifications on the sender and receiver sides.

The key working component on the sender side is the parameter control unit, which takes in the input parameters (including the path status and deadline constraint) and outputs the scheduling parameters (including the flow assignment vector, interleaving level, and probing traffic adaptation parameters). The flow rate allocator is responsible for partitioning the input traffic (which may contain multiple deadline-constrained flows) into several sub-flows and dispatching each of them to the available paths. The allocated sub-flows will be temporarily stored in the sending buffers for different communication paths. In the process of deadline-constrained packet interleaving, the packets scheduled onto the same path will be spread out within the delay constraint.

The receiving buffer at the receiver side is used for collecting the packets. In order to restore the original traffic, packet-to-flow mapping and inter-packet resequencing are two necessary steps. The problem of traffic load distribution involves the models of communication path and real-time traffic, which will be described in the rest of this section.

3.1 Communication Path Model

We consider a heterogeneous overlay network integrating multiple communication paths between two terminals. The end-to-end connection can be constructed by binding a pair of IP addresses from the source and destination node, respectively. Each communication path $p \in \mathcal{P}$ is considered to be an independent transport link uncorrelated with others, and this assumption can be justified as follows:
• In the environment of wired multipath networks, a method commonly used in existing studies (e.g., [22][36][46]) is to detect the shared congestion paths with the end-to-end measurement techniques (e.g., the cross-correlation based approaches [47], and wavelet based schemes [48]). Then, the paths sharing bottleneck links can be treated as one communication path. Another method to construct the end-to-end independent paths is to use the overlay relay nodes to forward the traffic [49]. In this case, the available communication paths are disjoint. The general review of setting the independent paths can be found in [50].
• In the context of heterogeneous wireless networks (e.g., WLAN, Cellular, WiMAX), the last-hop wireless links are most likely to be the bottleneck links of the end-to-end paths due to the limited capacity and time-varying channel status [6][39]. These access networks use different wireless spectrums and do not interfere with each other. Therefore, we can assume the available capacities and loss rates are independent of each other [6][7].

The end-to-end communication paths are characterized by the following properties:
• the available bandwidth $\mu_p$. This metric does not indicate the raw per-path capacity, but the time-varying share of that bandwidth as perceived by the end-to-end flow. It can also be viewed as the sending rate on path $p$ that is achievable with the current sending buffer and round trip time.


Fig. 2. Overview of Goodput-Aware Load Distribution (GALTON) model. [Figure not reproduced. Sender: flow rate allocator, per-path sending buffers (Path 1 to Path P), deadline-constrained packet interleaving, and a parameter control unit fed by the path status monitor (pathChirp algorithm; available bandwidth, path loss rate, round trip time). Receiver: receiving buffer, packet-to-flow mapping, inter-packet resequencing, and traffic reassembler. Arrows distinguish traffic flow from information flow.]

• the round trip time $RTT_p$, which represents the length of time it takes for a data packet to be sent plus the delay it takes for an acknowledgment of that packet to be received. This metric therefore consists of the packet delivery time, processing latency, and path propagation delay (see Footnote 2).
• the average packet loss rate $\pi_p^B$, assumed to be an independent and identically distributed (i.i.d.) process, uncorrelated with the input traffic rate.

Footnote 2. In this work, RTT can be expressed as RTT = 2 × packet transmission time + 2 × propagation delay + processing delay.

Similar to the previous work [22], we assume that the background traffic is much larger than our own, and thus the traffic load we impose on a path does not affect its loss statistics.

In addition to the above metrics, we model burst loss on each path using the Gilbert loss model [23], which can be expressed as a two-state stationary continuous time Markov chain. The state $X_p(t)$ at time $t$ assumes one of two values: G (Good) or B (Bad). If a packet is sent at time $t$ and $X_p(t) = G$, then the packet can be successfully delivered; if $X_p(t) = B$, then the packet is lost. We denote by $\pi_p^G$ and $\pi_p^B$ the stationary probabilities that path $p$ is good or bad. Let $\xi_p^G$ and $\xi_p^B$ represent the transition probability from B to G and G to B, respectively. In this work, we adopt two system-dependent parameters to specify the continuous time Markov chain packet loss model: (1) the channel loss rate $\pi_p^B$, and (2) the average burst loss length $1/\xi_p^B$. Then, we will have

$$ \pi_p^G = \frac{\xi_p^G}{\xi_p^B + \xi_p^G}, \quad \text{and} \quad \pi_p^B = \frac{\xi_p^B}{\xi_p^B + \xi_p^G}. \tag{1} $$

3.2 Real-time Traffic Model

Without loss of generality, we assume the input traffic is composed of multiple flows $f \in \mathcal{F}$ and each flow is characterized by a data rate $R^f$ that denotes its minimum required bandwidth. To have a stable system, the sum of flow rates should not exceed the aggregate bandwidth of all communication paths. Due to the latency differences in available communication paths, the receiving buffer at the client side is used to absorb delay jitters of the arriving packets. In real-time applications (e.g., video conferencing, multi-player online gaming), the input traffic is associated with a deadline $T$ [24], which is the time when it is extracted from the receiving buffer for upper-layer applications (e.g., decoding for display). The differentiated scheduling for flows with heterogeneous delay constraints can be future research work [25]. In such applications, a packet will only stay in the receiving buffer for at most $T$ seconds.

A packet may be lost either due to transmission errors or dropped at the receiver side because it is overdue. Both types of packet losses are undesirable in terms of application QoS requirements. A larger receiving buffer can reduce the overdue packet ratio but may result in a larger end-to-end delay. Therefore, the effective packet loss rate for flow $f$ considered in this paper includes both transmission and overdue loss.

The scheduled packets may arrive at the multihomed client out-of-order due to variations in path delays or the non-First Come First Serve (FCFS) service discipline at intermediate routing nodes. The packet reordering (out-of-order arrivals) problem also has a significant impact on the end-to-end performance perceived by users, and reportedly, is not a sporadic event if there is no mechanism to maintain packet order [1]. The probability of packet out-of-order is likely to increase in a network with a high degree of parallelism. The earlier arriving packets have to wait for late packets in the receiving buffer at the destination. If late packets arrive within the imposed deadline, the transmission is successful. However, the waiting time causes additional packet delay and any overdue packet is treated as a lost one.

3.3 Problem Formulation

As analyzed in Section 3.2, the effective loss rate represents the combined probability of channel losses and expired arrivals of packets. In a capacity-limited transport link, such loss rate can be approximated by the M/G/1 queueing model [54]. Hence, the packet delay over a single link follows an exponential distribution [26] and can be modeled as

$$ \mathbb{P}\{D > T\} = \exp\{-\omega \cdot T\} \tag{2} $$


In Equation (2), $\omega$ represents the arriving rate and is determined by the average delay, i.e.,

$$ \omega = \frac{1}{\mathbb{E}\{D\}}, \tag{3} $$

where $\mathbb{E}\{\cdot\}$ represents the expectation value. Generally, $\omega$ can be empirically determined from end-to-end delay statistics, but the fitting process needs a large number of samples. In order to derive a general solution for efficient online operation, we construct a model to approximate the average packet delay. We suppose the flow rate assignment to be expressed in a matrix form, i.e., $\mathbf{R} = \{R_p^f\}_{p \in \mathcal{P}, f \in \mathcal{F}}$, in which element $R_p^f$ represents the assigned rate of flow $f$ over the communication path $p$. Therefore, the aggregated assigned flow rate over path $p$ is $R_p = \sum_{f \in \mathcal{F}} R_p^f$, and the totally assigned flow rate for flow $f$ is $R^f = \sum_{p \in \mathcal{P}} R_p^f$. We denote the residual bandwidth of $p$ by $\nu_p$. Then, we can have

$$ \nu_p = \mu_p - \sum_{f \in \mathcal{F}} R_p^f. \tag{4} $$

The available bandwidth for flow $f$ over path $p$ can be obtained with

$$ \mu_p^f = \mu_p - \sum_{f' \ne f,\, f' \in \mathcal{F}} R_p^{f'}. \tag{5} $$

As the assigned rate on each path approaches the maximum achievable rate, the average packet delay typically increases due to network congestion. We employ a fractional function to approximate the delay of the allocated sub-flow rate $R_p^f$ over $p$, i.e.,

$$ \mathbb{E}\{D_p^f\} = \frac{R_p^f}{\mu_p^f} + \frac{\rho_p}{\nu_p}, \tag{6} $$

in which $\rho_p$ can be interpreted as the available source for the classical M/G/1 queueing model. The value of $\rho_p$ can be estimated from the latest observations of the path status information

$$ \rho_p = \frac{\nu'_p \cdot RTT'_p}{2}. \tag{7} $$

If $\nu'_p$ is equal to the latest observed residual bandwidth of path $p$, i.e., $\nu'_p = \nu_p$, the one-way delay is $RTT_p/2$. Then, substituting Equations (3), (5), (6) and (7) into Equation (2), we can have

$$ \mathbb{P}\{D_p^f > T\} = \exp\left\{ -\frac{2 \cdot \nu_p \cdot T \cdot \left(\mu_p - \sum_{f' \ne f,\, f' \in \mathcal{F}} R_p^{f'}\right)}{\nu'_p \cdot RTT'_p \cdot \left(\mu_p - \sum_{f' \ne f,\, f' \in \mathcal{F}} R_p^{f'}\right) + 2 \cdot \nu_p \cdot R_p^f} \right\}. \tag{8} $$

In conjunction with $\pi_p$, which is the transmission loss rate, the effective packet loss rate $\Pi_p^f$ of a single flow $f$ over path $p$ can be obtained with

$$ \Pi_p^f = \pi_p + (1 - \pi_p) \cdot \mathbb{P}\{D_p^f > T\}. \tag{9} $$

Therefore, the expectedly received packets of flow $f$ through $p$ can be estimated with $\sum_{f \in \mathcal{F}} R_p^f \cdot (1 - \Pi_p^f)$.

Therefore, the aggregate goodput ($\Theta$) can be expressed as the successfully delivered traffic over all the communication paths within the delay constraint

$$ \Theta = \sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}} R_p^f \cdot \left(1 - \Pi_p^f\right). \tag{10} $$

Furthermore, it is proved in existing studies [1][17][21] that the optimal flow allocation to minimize end-to-end delay is to eliminate the delay differences of available communication paths. Now, we are ready to formulate the following linear constraint optimization problem for given communication paths $\mathcal{P}$ on maximizing the aggregate goodput of real-time traffic $\mathcal{F}$ while satisfying the path capacity, delay constraint, etc.

For each distribution interval, determine the values of $\mathbf{R} = \{R_p^f\}_{P \times F}$ and $\Omega = \{\omega_p\}_{P}$

$$
\begin{aligned}
\text{to maximize: } & \Theta = \sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}} R_p^f \cdot \left(1 - \Pi_p^f\right), \\
\text{s.t. } & \Pi_p^f = \pi_p + (1 - \pi_p) \cdot \mathbb{P}\{D_p^f > T\}, \quad p \in \mathcal{P},\ f \in \mathcal{F}, \\
& \mathbb{P}\{D_p^f > T\} = \text{Equation (8)}, \\
& \nu_p = \mu_p - \sum_{f \in \mathcal{F}} R_p^f, \quad p \in \mathcal{P}, \\
& \frac{\mu_p^f \cdot RTT_p}{2 \cdot \left(\mu_p - \sum_{f \in \mathcal{F}} R_p^f\right)} \le T, \quad p \in \mathcal{P},\ f \in \mathcal{F}, \\
& \sum_{f \in \mathcal{F}} R_p^f \le \mu_p, \quad p \in \mathcal{P}, \\
& \mathbb{E}\{D_p\} = \mathbb{E}\{D_{p'}\}, \quad \text{for } p \ne p',\ \{p, p'\} \in \mathcal{P}.
\end{aligned}
\tag{11}
$$

The distribution interval is correlated with the imposed deadline (e.g., 0.25 seconds, which is the duration of a Group of Pictures). To obtain a near-optimal result with fast convergence suited to online operation, we propose a progressive flow rate assignment algorithm to solve the rate allocation optimization based on the utility theory [27]. The proposal iteratively makes a locally optimal decision on rate assignment for each flow in each path. The flow rate assignment function is used to approximate the above goal function. The function is the convex union of many small hypercubes, and an approximately globally optimal solution of the original problem confined in this union can be found in the set of local solutions. In many cases, the number of such unions may be much less than that of all smaller partitioned hypercubes. Hence, the proposed algorithm can substantially reduce the computational complexity. In the next section, we will describe the solution for this optimization problem in detail.

4 PROPOSED SOLUTION

The three steps of the proposed GALTON model are interdependent: 1) In the path status estimation period, the feedback information is collected from the heterogeneous networks; 2) In the flow rate assignment phase, the effective loss rate can be derived with Equation (9) with the estimated path status. Then, the rate assignment vector can be obtained following Algorithm 2; 3) In the packet interleaving scheme, the number of packets


allocated for the available communication paths can be determined by the rate assignment vector with Equation (19). The indexes of the scheduling packets and transmission intervals can be obtained with Algorithm 3 based on the path status and rate assignment vector.

4.1 Path Status Estimation

To effectively utilize the channel resources available in heterogeneous overlay networks, it is fairly important to accurately estimate the status of each communication path. In this paper, the pathChirp algorithm [28] is employed to estimate the available bandwidth of each communication path. We propose a per-path probing traffic adjustment algorithm to improve the measurement accuracy and reduce the network overhead. Specifically, we employ the ZigZag scheme [51] (with the decaying factor $\alpha = 1/32$) to distinguish the congestion losses from wireless losses. The mean value ($\overline{RTT}$) and deviation ($\sigma_{RTT}$) of round trip time are obtained as follows [51]

$$ \overline{RTT} = (1 - \alpha) \cdot \overline{RTT} + \alpha \cdot RTT, \quad \text{and} \quad \sigma_{RTT} = (1 - 2\alpha) \cdot \sigma_{RTT} + 2\alpha \cdot |RTT - \overline{RTT}|. $$

The rate and gap adjustments are performed at per-path level in each probing cycle. The per-path probing traffic adjustment mechanism is presented in Algorithm 1. If path congestion occurs, the proposed model adaptively decreases the probing rate and spreads the packets. As the pathChirp algorithm generally performs better with larger packet size, we also dynamically vary the probing packet size when the measured bandwidth frequently fluctuates (i.e., the latest measured value is not in the confidence interval of the recorded data). The existing estimation models mainly adjust the probing rate and gap at the aggregate level. Due to the path diversity in heterogeneous overlay networks, the aggregate level estimation strategy may degrade the measurement accuracy and induce extra overhead.

Algorithm 1 Per-path Probing Traffic Adjustment
Require: $\{n_p, \gamma_p, R_p, Pkt_p, \mu_p, RTT_p\}_{p \in \mathcal{P}}$ // $n_p$ - number of consecutive losses over $p$;
Ensure: $\{\gamma_p, R_p, Pkt_p\}_{p \in \mathcal{P}}$;
1: $t$ = current time;
2: for each communication path $p$ in $\mathcal{P}$ do
3:   if ($n_p == 1$ && $RTT_p < \overline{RTT}_p - \sigma_{RTT_p}$) || ($n_p == 2$ && $RTT_p < \overline{RTT}_p - \sigma_{RTT_p}$) || ($n_p == 3$ && $RTT_p < \overline{RTT}_p$) || ($n_p > 3$ && $RTT_p < \overline{RTT}_p - \sigma_{RTT_p}/2$) then
4:     Update the loss parameters of path $p$; // wireless loss occurs;
5:   else
6:     $R_p \Leftarrow R_p - \Delta R$; // decrease probing rate;
7:     $\gamma_p \Leftarrow \gamma_p + \Delta\gamma$; // spread the probing packets;
8:   end if
9:   if $\mu'_p \notin$ Confidence Interval$\{\mu_p\}$ then
10:    $Pkt_p = Pkt_p + \Delta Pkt$; // increase the probing packet size to improve accuracy;
11:  else
12:    $Pkt_p = Pkt$; // set the packet size to the normal value;
13:  end if
14:  Send the probing packets (size $Pkt_p$) to path $p$ at time $t + (\gamma_p)^i$, for $i = 1, \ldots, \lfloor R_p \cdot ProbeCycle / Pkt_p \rfloor$;
15: end for

With the purpose of further improving the accuracy of network measurements, we calculate the confidence interval [29] per path by combining the historic interval samples. Suppose the values of the sampled channel status information of path $p$ are $s_p^1, s_p^2, s_p^3, \ldots, s_p^n$; we can calculate the mean value of the samples with $S_N = (\sum_{i=1}^{N} s_i)/N$, in which $N$ denotes the number of samples. Therefore, the latest mean value of status information can be obtained with

$$ S_{N+1} = \frac{S_N \cdot N + s_{N+1}}{N + 1}. \tag{12} $$

We use the previous mean value $S_N$ and the new sampling value $s_{N+1}$ to calculate the current mean value $S_{N+1}$. The standard deviation can be obtained with

$$ \sigma_N = \sqrt{\frac{\sum_{i=1}^{N} (s_i - S_N)^2}{N - 1}}. \tag{13} $$

The above equation presents the general formula to calculate the standard deviation. We also employ an iterative method to calculate the standard deviation, which avoids storing all the collected samples at the sender and reduces the computational complexity

$$ \sigma_{N+1} = \sqrt{\frac{\sigma_N^2 \cdot (N - 1)}{N} + \frac{(s_{N+1} - S_N)^2}{N + 1}}. \tag{14} $$

After obtaining the mean value and the standard deviation from Equations (12) and (14), we use the Central Limit Theorem to calculate the confidence interval

$$ \mathbb{P}\left\{ S - Z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{N}} \le u \le S + Z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{N}} \right\} = 1 - \alpha, \tag{15} $$

in which $1 - \alpha$ is the confidence level. Assuming the probability of transmission with no packet loss is set to 95%, we can have $\alpha = 0.05$ and $Z_{1-\alpha/2} = 1.96$. The confidence interval is also used as a reference to assist picking appropriate paths out of all candidates, because involving an unreliable communication path in the transmission will only degrade the aggregate goodput. In order to evaluate the efficacy of the proposed estimation model, we compare its performance with the Traceband [52] and SigMon [53]. The results are presented in the complementary file.

4.2 Flow Rate Assignment

To maximize the aggregate goodput of input traffic load, we use the continuous piecewise linear approach for the flow rate assignment based on utility theory [27]. Note that the utility theory based algorithms are inclined to assign loads to communication paths with higher quality. This indicates that GALTON tends to assign a large amount of flow rates to network paths with higher quality and transmission capability. However, this


policy will in turn result in load imbalance in the traffic distribution. To alleviate severe imbalance problems, we introduce a load imbalance parameter $L_p$ to indicate whether path $p$ is overloaded. It is expressed as

$$L_p = \frac{\mu_p - \sum_{f \in \mathcal{F}} R_p^f}{\left(\sum_{p \in \mathcal{P}} \mu_p - \sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}} R_p^f\right) / P}. \tag{16}$$

When the value of $L_p$ clearly exceeds a threshold limit value (TLV) [26], path $p$ is overloaded. Let $\Delta R_p^f$ denote the rate variation of flow $f$ over path $p$ at each iteration, and let $R_p^f + \Delta R_p^f$ represent the transition to the next allocation. (The value of $\Delta R_p^f$ can be dynamically adjusted if $R_p^f + \Delta R_p^f \le \mu_p$. In this work, we set $\Delta R_p^f$ to $\mu_p/3$ as the initial value.) The utility of this transition can be expressed as [27]:

$$U_p^f = \frac{\varphi(R_p^f + \Delta R_p^f) - \varphi(R_p^f)}{\Delta R_p^f}, \tag{17}$$

in which $\varphi(\cdot)$ [27] represents the approximate linear function for $\Theta$ in the interval $[R_p^f, R_p^f + \Delta R_p^f]$. The utility improvement matrix can be expressed as $U = \{U_p^f\}$. In each iteration, the proposed flow rate assignment algorithm obtains the $R_p^f$ that brings the highest utility $U = \{U_p^f\}$, i.e.,

$$U = \arg\max_{R} \{U\}. \tag{18}$$

The proposed algorithm allocates the channel resources available in heterogeneous overlay networks in a progressive manner. Once the resources of path $p$ are exhausted, the algorithm seeks a different communication path that can release the required resources for flow $f$ by allocating part of its available rate. This operation is repeated until the utility value of the system cannot be improved or the available channel resources are depleted. The sketch of the progressive flow rate assignment algorithm based on utility theory is presented in Algorithm 2.

Algorithm 2 Utility Theory based Flow Rate Assignment
Require: $\{RTT_p, \mu_p, \pi_p^B\}_{p \in \mathcal{P}}$, $T$, $\Delta R_p^f$;
Ensure: $R = \{R_p^f\}_{f \in \mathcal{F}, p \in \mathcal{P}}$;
for each path $p$ in $\mathcal{P}$ do
    for each flow $f$ in $\mathcal{F}$ do
        $U_p^f \Leftarrow \left(\varphi(R_p^f + \Delta R_p^f) - \varphi(R_p^f)\right) / \Delta R_p^f$;
        $L_p \Leftarrow \left(\mu_p - \sum_{f \in \mathcal{F}} R_p^f\right) / \left(\left(\sum_{p \in \mathcal{P}} \mu_p - \sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}} R_p^f\right) / P\right)$;
        $\Delta R_p^f \Leftarrow \Delta R_p^f / U_p^f$;
        $R_p^f \Leftarrow R_p^f + \Delta R_p^f$;
        Update the approximate function $\varphi(R_p^f)$;
    end for
end for
$U \Leftarrow \arg\max_{R} \{U\}$;
if $L_p \le$ TLV then
    $R_p^f \Leftarrow R_p^f + \Delta R_p^f$ // Intra-path allocation;
    Update the free resources of the communication path;
else
    Find another flow that can transfer part of its rate to a path $p' \ne p \in \mathcal{P}$ with the maximum transition utility improvement $\Delta U$ // Inter-path allocation;
    if $\Delta U > 0$ then
        $R_p^f \Leftarrow R_p^f + \Delta R_p^f$;
        Update the free resources of paths $p$ and $p'$;
    end if
end if

In this algorithm, the intra-path allocation process always attempts to increase the system's utility by assigning some resources in path $p$. If the available resources are not adequate, this procedure finds a new flow that can release enough resources by allocating part of its assigned rate through another path. The time complexity of Algorithm 2 is $O(P \times F)$, in which $P$ is the number of available paths and $F$ is the number of input flows. As $P$ and $F$ are small finite integers in practice, the running time of Algorithm 2 is acceptable in most cases and is significantly lower than the traffic transmission delay. Thus, the overhead of executing the proposed flow rate assignment algorithm can reasonably be ignored in the experiments. According to Algorithm 2, we have the following two propositions.

Proposition 1: If the maximal flow rate of all traffic flows is less than or equal to the minimal available bandwidth among all communication paths, the inter-path allocation rate is zero.

Proof: First, the maximal capacity requirement of all input flows is no more than the minimal available bandwidth of all network paths ($\max_{f \in \mathcal{F}} \{R^f\} \le \min_{p \in \mathcal{P}} \{\mu_p\}$). Second, the imposed traffic load does not exceed the aggregate capacity ($\sum_{f \in \mathcal{F}} R^f \le \sum_{p \in \mathcal{P}} \mu_p$). We can conclude that any single path is able to deliver a single flow, i.e., $\sum_{p \in \mathcal{P}} R_p^f \le \mu_p$, for $f \in \mathcal{F}$. Thus, the inter-path allocation procedure will not be executed.

Proposition 2: Assuming a flow $f$ with rate $R^f$ that cannot be satisfied by any single path in a heterogeneous overlay network, Algorithm 2 initially allocates flow $f$ to the path $p$ with the highest estimated bandwidth. In this case, the system achieves the maximal aggregate goodput.

Proof: A valid rate allocation scheme should respect the utility constraints. Suppose $p$ denotes the path with the largest available bandwidth. If the flow rate allocated to $p$ is not the largest one, the proposed algorithm searches for a better solution by transferring the assigned rates from other paths. As the total flow rate remains unchanged during the distribution interval, the rate transfer process does not violate the network capacity constraints. However, this process changes the effective packet loss rate.

Note that the reference allocation scheme (i.e., the goodput optimization based on utility theory but without using the load imbalance factor) achieves better performance at medium transmission rates. Under high or close-to-capacity traffic loads, such a solution cannot guarantee the optimal goodput because network congestion and burst losses


frequently occur. In order to show the approximation ratio of the proposed solution to the reference rate allocation scheme at medium transmission rates, we present an example in the supplementary file.

4.3 Deadline-Constrained Packet Interleaving

Given the flow rate assignment vector $\{R_p^f\}_{P \times F}$, the number of packets allocated to each path can be expressed as

$$M_p = \left\lfloor \frac{\sum_{f \in \mathcal{F}} R_p^f \cdot M}{\sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}} R_p^f} \right\rfloor, \tag{19}$$

in which $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$. Both the maximum deadline constraint $T$ and the flow assignment vector $R_p$ impose important scheduling constraints. The packet scheduling policy is feasible if the following conditions are satisfied:

1) $\max_{p \in \mathcal{P}} \{E\{D_p\} + (M_p - 1) \cdot \omega_p\} \le T$, i.e., all the packets dispatched onto different paths should arrive at the destination before the deadline.
2) $E\{D_p^f\} = E\{D_{p'}^f\}$, for $\{p, p'\} \in \mathcal{P}$, $p \ne p'$, i.e., all the sub-flows belonging to the same flow should be estimated to arrive at the destination simultaneously so as to minimize the risk of packet reordering.
3) All the packets belonging to the same sub-flow should be delivered in order to reduce the waiting time of earlier-arriving packets at the receiver side.

[Figure 3: effective loss rate (%) versus interleaving level (ms) at a delay constraint of 150 ms, and goodput (Mbps) versus delay constraint (ms) at an interleaving level of 20 ms.]

Fig. 3. Tradeoff between effective loss rate and interleaving level on one side, goodput performance and delay constraint on the other side.

Determining the interleaving interval is challenging because of the tradeoff between the transmission and overdue losses. Fig. 3 shows the tradeoff, and it can be observed from the blue line that there is a sudden increase in the effective loss when the interleaving level is above 20 ms. On the other side, the goodput performance increases with a larger delay constraint. The increase is rapid for tight delay constraints but becomes very slow for loose deadlines. Therefore, we propose to spread out the packets' departures within the delay constraint $T$ over different communication paths in an alternating manner. In the traditional 'back-to-back' transmission schemes, all data packets are delivered immediately after they are generated. In order to guarantee scheduling feasibility, we define the packet transmission approach as follows. First, we rank the communication paths according to the estimated end-to-end delay in descending order. (When two paths have the same end-to-end delay, we take the path with the higher assigned rate.) Similarly, the traffic flows are reordered by the number of packets in ascending order. Second, we schedule the packets in an 'out-of-order' fashion over different paths (following the rate assignment vector) to strive for in-order arrivals. This mechanism is able to leverage the path diversity to alleviate packet reordering while mitigating consecutive losses.

The outline of the deadline-constrained packet interleaving scheme is presented in Algorithm 3. A concrete example of the improved packet interleaving scheme is presented in the supplementary file.

Algorithm 3 Deadline-Constrained Packet Interleaving
Require: $\{RTT_p, \mu_p, \pi_p^B\}_{p \in \mathcal{P}}$, $R = \{R_p^f\}_{P \times F}$, $T$, $M$;
Ensure: $\Omega = \{\omega_p^f\}_{P \times F}$;
Rank $\mathcal{P}$ according to the estimated end-to-end delay in descending order.
$D_{max} = \max_{p \in \mathcal{P}} \{E\{D_p\}\}$;
for each communication path $p$ in $\mathcal{P}$ do
    $M_p = \sum_{f \in \mathcal{F}} R_p^f \cdot M / \sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}} R_p^f$;
    $\omega_p = D_{max} / (M_p - 1)$;
    for each flow $f$ in $\mathcal{F}$ do
        $M_p^f = \left(R_p^f / \sum_{f' \in \mathcal{F}} R_p^{f'}\right) \cdot M_p$; // packets to be served in $f$ over path $p$
        Indexes of packets $= \{i * P * f_{index}\}$, for $i \in \mathbb{N}$ and $\sum i = M_p^f$ // $\mathbb{N}$: set of natural numbers
        $\omega_p^f = \omega_p \cdot (P - 1)$;
        Remove the scheduled packets from $f$;
    end for
end for
Schedule all the packets over different paths with interval $\Omega = \{\omega_p^f\}_{P \times F}$;

Note that the deadline-constrained packet interleaving scheme with a fixed interval $\omega_p$ is not always optimal in mitigating consecutive losses. For instance, consider the transmission of 4 packets on a single path with loss rate $\pi^B = 2.5\%$ and burst length $1/\xi_p = 10$ ms. The delay constraint is 45 ms. The resulting packet loss rate with even path interleaving (0, 15, 30, 45) is 1.33%, whereas the optimal interleaving obtained by the Mathematica [30] optimization tool is (0, 21.48, 37.53, 45), with a packet loss rate of 1.06%. This indicates that the alternating path interleaving yields a near-optimal rather than optimal solution. However, the scheme is practical and easy to implement in a real system since it does not require explicit information on the burst loss length, which is difficult to measure and predict with high accuracy in practice. Although the even packet interleaving is not always optimal, it still outperforms the 'back-to-back' transmission schemes, as stated in Proposition 3.
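The packet allocation in (19), the even departure spacing of the (0, 15, 30, 45) example, and feasibility condition 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `rates[p][f]` layout, and the sample numbers are hypothetical. The last helper additionally assumes a two-state continuous-time Gilbert chain with stationary loss rate `loss_rate` and mean burst length `mean_burst_ms`, and uses the standard transient solution of such a chain to show how the chance of losing two consecutive packets shrinks as their departure gap grows. It does not attempt to reproduce the 1.33% and 1.06% figures, which depend on the model in the supplementary file.

```python
from math import exp, floor


def packets_per_path(rates, total_packets):
    """Eq. (19): M_p = floor(sum_f R_p^f * M / sum_p sum_f R_p^f).

    rates[p][f] is the rate assigned to flow f on path p (hypothetical layout).
    """
    aggregate = sum(sum(flow_rates) for flow_rates in rates)
    return [floor(sum(flow_rates) * total_packets / aggregate)
            for flow_rates in rates]


def even_departures(num_packets, window_ms):
    """Departure offsets spread evenly over [0, window_ms] (assumed policy)."""
    if num_packets <= 1:
        return [0.0] * num_packets
    omega = window_ms / (num_packets - 1)
    return [i * omega for i in range(num_packets)]


def is_feasible(expected_delays_ms, packets, intervals_ms, deadline_ms):
    """Condition 1: max_p (E{D_p} + (M_p - 1) * omega_p) <= T."""
    worst = max(d + (m - 1) * w
                for d, m, w in zip(expected_delays_ms, packets, intervals_ms))
    return worst <= deadline_ms


def conditional_loss(gap_ms, loss_rate, mean_burst_ms):
    """P(packet lost | previous packet lost gap_ms earlier).

    Assumes a two-state continuous-time Gilbert chain (an assumption of this
    sketch): xi_b is the rate of leaving the bad state, xi_g the rate of
    leaving the good state, chosen so the stationary loss rate is loss_rate.
    """
    xi_b = 1.0 / mean_burst_ms
    xi_g = loss_rate * xi_b / (1.0 - loss_rate)
    return loss_rate + (1.0 - loss_rate) * exp(-(xi_b + xi_g) * gap_ms)


if __name__ == "__main__":
    # Three paths, two flows, rates in Kbps (illustrative values only).
    print(packets_per_path([[300, 200], [500, 0], [600, 400]], 20))
    # Four packets spread over a 45 ms window, as in the example above.
    print(even_departures(4, 45))
    print(is_feasible([40, 60], [5, 5], [20, 20], 150))
    for gap in (0.0, 15.0, 30.0):
        print(gap, round(conditional_loss(gap, 0.025, 10.0), 4))
```

Under these assumptions, a zero gap (back-to-back departure) leaves the second packet lost with probability close to one once the first is lost, while larger gaps bring that probability down toward the stationary loss rate, which is the intuition behind spreading departures within the deadline.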


Proposition 3: Given the number of packets to be scheduled ($M_p$) and the interleaving level ($\omega_p > 0$) for each communication path, the proposed deadline-constrained packet interleaving scheme outperforms the 'back-to-back' scheduling policy in mitigating transmission losses under the assumption of burst path losses.

Proof: See the supplementary file.

We compare the performance of the proposed packet interleaving scheme with three representative multipath packet spreading approaches, i.e., Smarttunnel [36], EMS [20], and Spread [55]. The evaluation results are provided in the supplementary file. These packet interleaving schemes are used for enhancing the FEC performance under the burst loss assumption. We present a summary of the reference schemes in Table 2.

TABLE 2
Summary of multipath packet interleaving schemes.

Scheme             Path status   Deadline   Multiple flows   Reordering
Smarttunnel [36]   √             ×          ×                ×
EMS [20]           √             √          ×                ×
Spread [55]        √             √          ×                ×
Proposed           √             √          √                √

4.4 Analysis

The proposed GALTON model optimizes the goodput performance by taking advantage of the packet interleaving and flow rate assignment. As discussed in Section 3.3, the goodput performance of real-time traffic is determined by the effective packet loss rate ($\Pi$). This loss probability over each communication path includes the transmission ($\pi_p$) and overdue ($P\{D_p > T\}$) losses. We impose the deadline $T$ in the transmission constraints as shown in (11). Within the delay constraint $T$, the effective loss rate is expected to decrease with a larger interleaving level. The proposed packet interleaving scheme strikes a good balance between the transmission and overdue loss rates to minimize the effective loss rate, as it spreads out the packets' departures as far as possible while respecting the delay constraint.

From the perspective of rate assignment, the delay-oriented distribution models focus on minimizing the end-to-end latency of input traffic while ignoring the transmission loss. The goodput performance of input traffic will seriously degrade if such distribution models dispatch packets onto network paths with a high loss rate; on the other hand, the throughput-oriented schemes may allocate the flows onto long-delay communication paths, so that many packets arrive at the destination after the deadline. For the proposed flow rate assignment algorithm, we have the following proposition.

Proposition 4: The proposed utility theory based flow rate assignment algorithm is able to obtain a near-optimal result to maximize the aggregate goodput of deadline-constrained traffic over multipath networks.

Proof: See the supplementary file.

In conclusion, the rate assignment and packet interleaving schemes in GALTON significantly reduce the effective loss rate, and thus improve the goodput performance of real-time traffic over multipath networks.

5 PERFORMANCE EVALUATION

In this section, we evaluate the efficacy of the proposed GALTON by comparing it with the existing traffic distribution models over multipath networks. We first describe the evaluation methodology and results with real Internet traffic over wired multipath networks. Then, we present and analyze the emulation results with H.264 video streaming in heterogeneous wireless networks.

5.1 Evaluations with Real Internet Traffic Traces

[Figure 4: block diagram. The sender and receiver hosts connect through local network connections to the emulation server, where they are mapped into the Exata emulation (traffic generators C1 ... C4 at the edge nodes). The original traffic trace (.trc file) is input at the sender, and the received traffic trace (.trc file) is passed to the Exata Analyzer.]

Fig. 4. System architecture for evaluations with real Internet traffic.

5.1.1 Emulation Setup

Network emulator. Exata 2.1 [31] is adopted as the network emulator. Exata is an advanced edition of QualNet [32] in which we can perform semi-physical emulations. In the emulation topology, the server has one network interface while the client is equipped with multiple wired interfaces. The server and client are mapped to real computers in local networks and are connected to the emulation server through the Exata 2.1 Connection Manager. We can construct an end-to-end path by binding a pair of IP addresses from the server and the client respectively, e.g., $P_1$ is constructed by $\{IP_S, IP_D^1\}$. The input traffic can be concurrently transmitted through all the available paths to the client. The architecture of the evaluation system is presented in Fig. 4. As depicted in the figure, each router is attached to one edge node, which is single-homed and introduces background traffic. Each of the edge nodes has four traffic generators producing


cross traffic with a Pareto distribution. The background traffic packet sizes are chosen to resemble the distribution found on the Internet: 50% of them are 44 bytes long, 25% are 576 bytes long, and 25% are 1500 bytes long [33]. The aggregate cross traffic loads on the available network paths are similar and vary randomly between 0 and 30 percent of the bottleneck links' bandwidth.

Traffic trace. In order to conduct the emulations with real-life traffic, we download the Internet traces from [34] and convert them to (.trc) files. The trace files are input to the "Trace-based Traffic Generator" Exata application and the generated traffic lasts for 1 hour in order to obtain statistically meaningful results. The profiles of the 4 traffic traces are presented in Fig. 5. According to the ITU-T G.114 recommendation [35], the worst-case one-way delay should not exceed 400 ms to achieve acceptable media quality. Therefore, we set the "QoS constraint" for end-to-end delay to 300 ms. The data distribution algorithm is executed every 0.25 seconds.

5.1.2 Reference Schemes

• Online Policy Iteration (OPI) [2]. The OPI algorithm also uses a path state monitoring mechanism that complies with the end-to-end principle and captures congestion states of overlay paths. The Join the Shortest Queue (JSQ) algorithm is selected as the starting policy. When a new data bin arrives, OPI observes the current system state and takes an appropriate action following the current policy to distribute the data bin. Then, it evaluates the immediate reward for the action taken.
• Effective Delay Controlled Load Distribution (E-DCLD) [1]. The objective of E-DCLD is to minimize the latency differences among different network paths, so as to reduce packet reordering at the receiver, and to efficiently balance load across these paths. E-DCLD consists of three functional components: a traffic splitter to derive flow rate allocation ratios for different paths, a path selector to select an appropriate path for each packet, and a load adaptor to dynamically estimate the end-to-end delays on each path.
• Table-based Hashing with Reassignments (THR) [45]. The control policy of THR is based on both traffic and network status. In each scheduling phase, one of the super-flows assigned to the most over-utilized path is moved to the most under-utilized path (having a small queue length) by updating the flow-to-path mapping table accordingly. THR has a pre-determined key parameter, $\beta$, which determines the priority between improving load imbalance and preventing packet reordering.

5.1.3 Performance Metrics

• Goodput [12]. Goodput is the application-level throughput, i.e., the number of useful information bits delivered to the multihomed client within the imposed deadline. The amount of data considered excludes protocol overhead bits.
• Average end-to-end delay. The end-to-end delay is counted from the departure of a data packet to its arrival at the destination node. The reordering time is also taken into account.
• Aggregate loss rate. The aggregate loss rate considered in this work includes both the transmission and overdue losses of all input flows over the available paths. With regard to the delay constraint (300 ms), overdue packets that arrive at the destination are considered lost.

5.1.4 Emulation Scenarios

To evaluate the effect of the number of available paths, we conduct emulations for traffic distribution over different numbers of communication paths ($P = 2, 3, 4, 5$). The configurations of the path parameters in the emulations are presented in Table 3. The client reports the path status to the sender every 0.25 seconds and the sample size ($N$) is set to 5.

TABLE 3
Parameters of different wired links.

Parameter                        Value
Path capacity (Mbps)             0.5, 0.75, 1, 1.25, 1.5
Path propagation delay (ms)      20, 40, 60, 80, 100
Path loss rate (%)               2.5, 5, 7.5, 10, 12.5
Average burst loss length (ms)   5, 10, 15, 20, 25
Threshold Limit Value (TLV)      1.2 [26]

To obtain statistically confident results, we repeat each set of emulations for more than 10 runs and present the averaged results with a 95% confidence interval. For the microscopic results and time series analysis, we present the data of a single run with finer granularity.

5.1.5 Evaluation Results

[Figure 6: (a) goodput (Mbps) versus number of available paths for the available capacity, GALTON, OPI, E-DCLD, and THR; (b) goodput (Mbps) versus time (second) for GALTON, OPI, E-DCLD, and THR.]

Fig. 6. Comparison of goodput performance: (a) mean values and confidence intervals, (b) instantaneous values.

Goodput. Fig. 6a plots the average goodput obtained by different models in different emulations, and it can be observed that GALTON achieves higher goodput with lower variations than the reference schemes. With the increase in the number of available paths, the superiority of GALTON over other competing models becomes more obvious, as the traffic load distribution algorithm is able to effectively leverage the path diversity to optimize the aggregate goodput. OPI outperforms E-DCLD and THR as it periodically monitors the paths' status and adjusts


[Figure 5: four panels (Trace 1, Trace 2, Trace 3, Trace 4) plotting traffic rate (Mbps) or number of packets versus time (second).]

Fig. 5. Profile of input traffic traces.

the data distribution on the fly. The results indicate the importance of path loss estimation and mitigation schemes for traffic distribution. Although the E-DCLD model can effectively reduce the end-to-end delay by minimizing the risk of packet reordering, it takes insufficient account of the burst path losses that often occur in the real-life Internet [36]. Therefore, the scheduled packets may encounter consecutive channel losses during transmission with this 'back-to-back' transmission scheme. As the emulations are performed under highly dynamic network conditions, it is difficult for the THR algorithm to continually balance the tradeoff between packet reordering and load balancing. In order to have a microscopic view of the results, the instantaneous goodput values in the case of four communication paths between the transmission ends are plotted in Fig. 6b.

[Figure 7: (a) end-to-end delay (ms) versus number of available paths; (b) out-of-order packets versus time (second); curves for GALTON, OPI, E-DCLD, and THR.]

Fig. 7. Comparison of delay performance: (a) average end-to-end delay, (b) out-of-order packets.

Average End-to-End Delay. The results of average end-to-end delay are shown in Fig. 7a, and the pattern is nearly the opposite of that in Fig. 6a. Larger end-to-end packet delay incurs more overdue packets, which in turn results in lower goodput. It can be observed that GALTON outperforms the other competing models and the gaps become larger as more usable paths become available. In contrast to the goodput results, E-DCLD outperforms OPI in reducing the end-to-end delay. To depict the results of packet reordering, we also show the number of out-of-order packets in Fig. 7b. This metric is measured by the offset between two consecutively received packets (the difference between the sequence number of the current packet and that of the latest received packet). As shown in Fig. 7b, the other competing models induce more out-of-order packets than GALTON. GALTON periodically estimates the latest available path status information and distributes the traffic loads according to the predicted arrival time. In this fashion, GALTON reduces the out-of-order packets belonging to different sub-flows and thus performs better than the reference schemes. In real-time multimedia applications, the delay performance gains can significantly reduce playback buffer starvation, which is critical for the user-perceived media quality. For instance, large end-to-end delay can induce video stalls/glitches during the playback process and result in a poor user experience.

[Figure 8: (a) aggregate loss rate (%) versus number of available paths; (b) loss rate (%) versus time (second); curves for GALTON, OPI, E-DCLD, and THR.]

Fig. 8. Comparison of loss performance: (a) mean values and confidence intervals, (b) instantaneous values in the interval of [0, 1000] seconds.

Aggregate Loss Rate. The aggregate loss rates of the competing models are depicted in Fig. 8a. GALTON significantly reduces the overall packet losses as it reduces the end-to-end delay and mitigates the consecutive losses. Although OPI also distributes the traffic loads based on path status to optimize delay and loss performance, the transmitted packets may encounter consecutive losses due to the 'back-to-back' fashion. OPI is still able to outperform E-DCLD and THR in reducing the aggregate packet losses because it dispatches fewer packets onto the paths with lower quality. In Fig. 8b, we sketch the evolution of the aggregate loss rates in [0, 1000] seconds to give a close-up view. It can be seen that GALTON achieves a significantly lower overall loss rate than the competing models.

5.2 Evaluations with H.264 Video Streaming

5.2.1 Emulation Setup

H.264/SVC reference software JSVM 9.18 [37] is adopted as the video codec. (We choose JSVM for convenience of source code integration, as both Exata and JSVM are developed in C++ while the H.264/AVC JM [38] software is developed in C.) In order to implement the real-video-streaming based emulations, we integrate the source code of JSVM [as an Object File Library (.LIB)] with Exata


and develop an application layer protocol of 'Video Transmission' (as shown in Fig. 9). Detailed descriptions of the development steps can be found in the Exata Programmer's Guide [31]. The generated video stream is encoded at 30 frames per second and a GoP consists of 8 frames. The test video sequences are Foreman, Mother & Daughter, Hall, and Container in QCIF (Quarter Common Interchange Format) with 300 frames. Each of the sequences features a different pattern of temporal motion and spatial characteristics, which is reflected in their corresponding video quality versus encoding rate dependencies. We concatenate the video sequences 10 times to be 3000 frames long to obtain statistically meaningful results. The video encoding rate is set to 1400, 1600, and 1800 Kbps in different emulations. The decoding deadline for each video frame is set to 250 ms in order to achieve excellent perceived quality [39].

[Figure 9: block diagram. At the server, the original video passes through the video encoder and FEC encoder, and the FEC packets are sent through the Exata emulation over the HSDPA, WLAN, and WiMAX links to the client. The received FEC packets pass through the FEC decoder, video decoder, and error concealment to give the reconstructed video, which is compared with the original by the PSNR evaluation tool (PSNR Static).]

Fig. 9. System architecture for evaluations with H.264 video streaming.

Forward Error Correction (FEC) coding is commonly adopted for data protection in implementing loss-resilient video transmission systems. Therefore, the Reed-Solomon code [40] is employed in the emulations to achieve this goal. The FEC redundancy adaptation algorithm is described in our previous work [41]. We dynamically adjust the redundancy based on the video bit rate, path status information, and loss requirement.

5.2.2 Performance Metrics

PSNR [42] (Peak Signal-to-Noise Ratio) is a standard metric of video quality and is a function of the mean square error between the original and the received video frames. If a video frame is lost or arrives past the deadline, it is considered lost but may be concealed by copying from the last received frame before it, i.e., error concealment. This metric is evaluated using the PSNR static tool in the JSVM software.

5.2.3 Emulation Scenarios

In the emulation topology depicted in Fig. 9, the video server has one wired network interface while the mobile client has three wireless network interfaces, i.e., WLAN, WiMAX and HSDPA. We conduct emulations in mobile scenarios in which the client moves at walking speed along a prefixed trajectory within the coverage of these wireless networks. To emulate the burst channel loss behavior, all three wireless links are set to have random faults. The parameter configurations for HSDPA, WiMAX and WLAN are presented in Table 4.

TABLE 4
Parameters of heterogeneous wireless networks.

Parameter                        Value
Path capacity (Kbps)             300, 500, 800
Path propagation delay (ms)      40, 60, 80
Path loss rate (%)               2.5, 5, 7.5
Average burst loss length (ms)   10, 15, 20

5.2.4 Evaluation Results

We first depict the channel status information obtained by the path monitoring algorithm in Fig. 10. It can be observed that HSDPA provides a relatively stable link while the available bandwidth of WiMAX and WLAN experiences frequent fluctuations during the client mobility. It is well known that cellular networks exhibit better performance in sustaining user mobility than WiMAX and WLAN but provide a lower peak data rate [6][43].

[Figure 10: (a) available bandwidth (Kbps) versus time (second); (b) loss rate (%) versus time (second); curves for WLAN, HSDPA, and WiMAX.]

Fig. 10. Profile of channel status information: (a) available bandwidth, (b) path loss rate.

Fig. 11a plots the average PSNR values and confidence intervals of the competing models in different emulations. As expected, GALTON achieves higher PSNR values under various video streaming rates than the reference schemes due to its excellent goodput performance. Higher goodput values generally lead to lower channel distortion of the streaming video: more video frames are delivered within the decoding deadline, which results in the PSNR gains. Generally, the PSNR values achieved by the competing schemes degrade with the increase in video encoding rate because the aggregate bandwidth is insufficient compared to the input traffic rate. On one hand, increasing the video encoding rate can reduce the source distortion caused by the data compression process; on the other hand, higher transmission rates lead to larger end-to-end delays and give rise to more channel distortion. It is a challenging task to dynamically determine the video source and FEC coding rates due to the tradeoff between loss and delay performance. The instantaneous PSNR values for


video frames indexed from 1000 to 1400 measured from the Foreman sequence are shown in Fig. 11b to give a microscopic view.

[Figure 11: (a) PSNR (dB) versus video encoding rate (Kbps); (b) PSNR (dB) versus video frame index; curves for GALTON, OPI, E-DCLD, and THR.]

Fig. 11. Comparison of PSNR results: (a) mean values, (b) PSNR per video frame indexed from 1000 to 1400 measured from the Foreman sequence.

6 CONCLUSION AND DISCUSSION

The exponential growth of data traffic over the Internet has become a major driving force for multihomed communications over parallel network paths. Despite the rapid development of network infrastructures, it is still a crucial challenge to effectively deliver real-time applications with stringent delay, throughput, and reliability requirements.

This paper proposes a novel Goodput-Aware Load Distribution (GALTON) model to optimize the goodput performance of real-time traffic over multipath networks. Through modeling and analysis, we have developed solutions for path status estimation, flow rate assignment, and deadline-constrained packet interleaving. The emulation results demonstrate that GALTON is able to outperform existing traffic distribution models in terms of goodput, video PSNR, delay and loss performance. The superiority of the proposed model in enhancing transmission reliability is theoretically analyzed based on the Gilbert loss model and a continuous-time Markov chain. As future work, we will study the challenging problem of differentiated scheduling for real-time traffic flows with heterogeneous delay constraints.

ACKNOWLEDGEMENT

The research reported in this paper is partly supported by the Multi-plAtform Game Innovation Centre (MAGIC), funded by the Singapore National Research Foundation under its IDM Futures Funding Initiative and administered by the Interactive & Digital Media Programme Office, Media Development Authority; National Grand Fundamental Research 973 Program of China under Grant Nos. 2011CB302506, 2013CB329102, 2012CB315802; National High-tech R&D Program of China (863 Program) under Grant No. 2013AA102301; National Natural Science Foundation of China under Grant Nos. 61003067, 61171102, 61001118, 61132001.

We thank the anonymous reviewers and the Associate Editor for their valuable comments.

REFERENCES

[1] S. Prabhavat, H. Nishiyama, N. Ansari, and N. Kato, "Effective Delay-Controlled Load Distribution over Multipath Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 10, pp. 1730-1741, 2011.
[2] V. Bui, W. Zhu, et al., "A Markovian approach to Multipath Data Transfer in Overlay Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 10, pp. 1398-1411, 2010.
[3] K. Chebrolu and R. R. Rao, "Bandwidth aggregation for real-time applications in heterogeneous wireless networks," IEEE Transactions on Mobile Computing, vol. 5, no. 4, pp. 388-403, 2006.
[4] N. M. Freris, C. H. Hsu, J. P. Singh, and X. Zhu, "Distortion-aware Scalable Video Streaming to Multinetwork Clients," IEEE/ACM Transactions on Networking, vol. 21, no. 2, pp. 469-481, 2013.
[5] Mushroom Networks Inc., Wireless broadband bonding network appliance, 2014. [Online]. Available: http://www.mushroomnetworks.com
[6] S. Han, H. Joo, D. Lee, and H. Song, "An End-to-End Virtual Path Construction System for Stable Live Video Streaming over Heterogeneous Wireless Networks," IEEE Journal on Selected Areas in Communications, vol. 29, no. 5, pp. 1032-1041, 2011.
[7] C. Xu, T. Liu, J. Guan, et al., "CMT-QA: Quality-Aware Adaptive Concurrent Multipath Data Transfer in Heterogeneous Wireless Networks," IEEE Transactions on Mobile Computing, vol. 12, no. 11, pp. 2193-2205, 2013.
[8] T. Ernst, N. Montavont, R. Wakikawa, and K. Kuladinithi, Motivations and Scenarios for Using Multiple Interfaces and Global Addresses, Internet-Draft, IETF MONAMI6 Working Group, 2008.
[9] S. Prabhavat, H. Nishiyama, N. Ansari, and N. Kato, "On Load Distribution over Multipath Networks," IEEE Communications Surveys & Tutorials, vol. 14, no. 3, pp. 662-680, 2012.
[10] L. Shi, B. Liu, C. Sun, Z. Yin, L. N. Bhuyan, and H. J. Chao, "Load Balancing Multipath Switching System with Flow Slice," IEEE Transactions on Computers, vol. 61, no. 3, pp. 350-365, 2012.
[11] C. C. Hui and S. T. Chanson, "Hydrodynamic Load Balancing," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 11, pp. 1118-1137, 1999.
[12] http://en.wikipedia.org/wiki/Goodput
[13] D. Ma and M. Ma, "A QoS Oriented Vertical Handoff Scheme for WiMAX/WLAN Overlay Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 4, pp. 598-606, 2012.
[14] A. Makela, S. Siikavirta, and J. Manner, "Comparison of load-balancing approaches for multipath connectivity," Computer Networks, vol. 56, no. 8, pp. 2179-2195, 2012.
[15] W. Shi, M. H. MacGregor, and P. Gburzynski, "Load Balancing for Parallel Forwarding," IEEE/ACM Transactions on Networking, vol. 13, no. 4, pp. 790-801, 2005.
[16] S. Kandula, D. Katabi, S. Sinha, and A. Berger, "Dynamic Load Balancing without Packet Reordering," ACM SIGCOMM Computer Communications Review, vol. 37, no. 2, pp. 53-62, 2007.
[17] S. Mao, S. S. Panwar, et al., "On optimal partitioning of Realtime traffic over multiple paths," in Proc. of IEEE INFOCOM, 2005.
[18] C. Cetinkaya and E. W. Knightly, "Opportunistic traffic scheduling over multiple network paths," in Proc. of IEEE INFOCOM, 2004.
[19] V. Sharma, K. Kar, K. K. Ramakrishnan, et al., "A Transport Protocol to Exploit Multipath Diversity in Wireless Networks," IEEE/ACM Trans. Netw., vol. 20, no. 4, pp. 1024-1039, 2012.
[20] A. L. H. Chow, H. Yang, C. H. Xia, et al., "Ems: Encoded Multipath Streaming for Real-time Live Streaming Applications," in Proc. of IEEE ICNP, 2009.
[21] J. Wu, J. Yang, X. Wu, and J. Chen, "A Low Latency Scheduling Approach for High Definition Video Streaming over Heterogeneous Wireless Networks," in Proc. of IEEE GLOBECOM, 2013.
[22] L. Golubchik, J. Lui, T. Tung, et al., "Multi-path continuous media streaming: what are the benefits?" Performance Evaluation, vol. 49, no. 1, pp. 429-449, 2002.
[23] E. Gilbert, "Capacity of a burst-noise channel," Bell System Technical Journal, vol. 39, no. 9, pp. 1253-1265, 1960.
[24] R. Li and A. Eryilmaz, "Scheduling for End-to-End Deadline-Constrained Traffic with Reliability Requirements in Multi-Hop Networks," in Proc. of IEEE INFOCOM, 2011.
[25] A. Zhou, M. Liu, Z. Li, et al., "Cross-layer design for proportional delay differentiation and network utility maximization in multi-hop wireless networks," IEEE Transactions on Wireless Communications, vol. 11, no. 4, pp. 1446-1455, 2012.
[26] X. Zhu, P. Agrawal, J. P. Singh, et al., "Distributed Rate Allocation Policies for Multihomed Video Streaming Over Heterogeneous Access Networks," IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 752-764, 2009.
[27] F. Kelly and T. Voice, "Stability of end-to-end algorithm for joint routing and rate control," ACM SIGCOMM Comput. Commun. Rev., vol. 32, no. 2, pp. 5-12, 2005.


[28] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell, “pathChirp: Efficient available bandwidth estimation for network paths,” in Proc. of Passive and Active Measurement Workshop, 2003.
[29] Y. L. Hong, W. Q. Meeker, et al., “The Relationship between Confidence Intervals for Failure Probabilities and Life Time Quantiles,” IEEE Trans. Reliability, vol. 57, no. 2, pp. 260-266, 2008.
[30] Mathematica, http://www.wolfram.com/mathematica/
[31] Exata, http://www.scalable-networks.com/exata
[32] QualNet, http://www.scalable-networks.com/qualnet
[33] C. Fraleigh, S. Moon, B. Lyles et al., “Packet-Level Traffic Measurements from the Sprint IP Backbone,” IEEE Network, vol. 17, no. 6, pp. 6-16, 2003.
[34] Traffic Traces, Columbia University, [Online]. Available: http://www.cs.columbia.edu/~hgs/internet/traces.html
[35] ITU-T G.114: One-way transmission time, International Telecommunication Union, Recommendation, 2003.
[36] Y. Li, Y. Zhang, L. L. Qiu, et al., “Smarttunnel: Achieving reliability in the internet,” in Proc. of IEEE INFOCOM, 2007.
[37] H.264/SVC JSVM software, [Online]. Available: http://ip.hhi.de/imagecom-G1/savce/downloads/SVC-Reference-Software.htm
[38] H.264/AVC JM reference software, [Online]. Available: http://iphome.hhi.de/suehring/tml/
[39] W. Song, W. Zhuang, “Performance analysis of probabilistic multipath transmission of video streaming traffic over multi-radio wireless devices,” IEEE Trans. Wireless Commun., vol. 11, no. 4, pp. 1554-1564, 2012.
[40] P. Frossard, “FEC performances in multimedia streaming,” IEEE Communications Letters, vol. 5, no. 3, pp. 122-124, 2001.
[41] J. Wu, Y. Shang, J. Huang, et al., “Joint source-channel coding and optimization for mobile video streaming in heterogeneous wireless networks,” EURASIP Journal on Wireless Communications and Networking, pp. 1-16, 2013.
[42] ANSI T1.TR.74-2001, “Objective video quality measurement using a peak-signal-to-noise-ratio (PSNR) full reference technique,” [Online]. Available: http://webstore.ansi.org/RecordDetail.aspx?sku=T1.TR.74-2001.
[43] P. Si, H. Ji, and F. R. Yu, “Optimal network selection in heterogeneous wireless multimedia networks,” Wireless Networks, vol. 16, no. 5, pp. 1277-1288, 2010.
[44] R. Stewart, “Stream Control Transmission Protocol (SCTP),” IETF RFC 4960, 2007.
[45] T. W. Chim, K. L. Yeung, and K. S. Lui, “Traffic distribution over equal-cost-multi-paths,” Computer Networks, vol. 49, no. 4, pp. 465-475, 2005.
[46] J. R. Iyengar, P. Amer, and R. Stewart, “Concurrent Multipath Transfer Using SCTP Multihoming over Independent End-to-End Paths,” IEEE/ACM Trans. Netw., vol. 14, no. 5, pp. 951-964, 2006.
[47] D. Rubenstein, J. Kurose, and D. Towsley, “Detecting shared congestion of flows via end-to-end measurement,” IEEE/ACM Transactions on Networking, vol. 10, no. 3, pp. 381-395, 2002.
[48] M. S. Kim, T. Kim, Y. Shin, S. Lam, and E. Powers, “A wavelet-based approach to detect shared congestion,” in Proc. of ACM SIGCOMM, 2004.
[49] J. Han, D. Watson, F. Jahanian, “Topology aware overlay networks,” in Proc. of IEEE INFOCOM, 2005.
[50] S. Fashandi, S. O. Gharan, A. K. Khandani, “Path diversity over packet switched networks: performance analysis and rate allocation,” IEEE/ACM Transactions on Networking, vol. 18, no. 5, pp. 1373-1386, 2010.
[51] S. Cen, P. C. Cosman, G. M. Voelker, “End-to-end differentiation of congestion and wireless losses,” IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 703-717, 2003.
[52] C. D. Guerrero, M. A. Labrador, “Traceband: A fast, low overhead and accurate tool for available bandwidth estimation and monitoring,” Computer Networks, vol. 54, no. 6, pp. 977-990, 2010.
[53] J. C. Kim, Y. Lee, “An end-to-end measurement and monitoring technique for the bottleneck link capacity and its available bandwidth,” Computer Networks, vol. 58, no. 15, pp. 158-179, 2014.
[54] U. K. Sarkar, S. Ramakrishnan, and D. Sarkar, “Modeling full-length video using Markov-modulated Gamma-based framework,” IEEE/ACM Trans. Netw., vol. 11, no. 4, pp. 638-649, 2003.
[55] M. Kurant, “Exploiting the path propagation time differences in multipath transmission with FEC,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 5, pp. 1021-1031, 2011.

Jiyan Wu received the Ph.D. degree in computer science and technology from the Beijing University of Posts and Telecommunications (supervisor: Prof. Junliang Chen) in June 2014, and the Master's degree from China University of Mining and Technology (Beijing) in June 2011. Since March 2014, he has been working as a postdoctoral research fellow (supervisor: Prof. Chau Yuen) in the SUTD-MIT International Design Center, Singapore University of Technology and Design (SUTD). His research interests include video coding, heterogeneous networks, multipath transmission, etc.

Chau Yuen received the BEng and PhD degrees from Nanyang Technological University, Singapore, in 2000 and 2004, respectively. He was a postdoc fellow in Lucent Technologies Bell Labs, Murray Hill during 2005. He was also a visiting assistant professor of Hong Kong Polytechnic University in 2008. During the period of 2006-2010, he was at the Institute for Infocomm Research (Singapore) as a senior research engineer. He joined Singapore University of Technology and Design as an assistant professor in June 2010. He also serves as an associate editor for IEEE Transactions on Vehicular Technology. In 2012, he received the IEEE Asia-Pacific Outstanding Young Researcher Award.

Bo Cheng received his Ph.D. degree in computer science and engineering in July 2006 from University of Electronic Science and Technology of China. He has been working in the Beijing University of Posts and Telecommunications (BUPT) since 2008. He is now an associate professor of the Research Institute of Networking Technology of BUPT. His current research interests include network services and intelligence, Internet of Things technology, communication software and distributed computing, etc.

Yanlei Shang received his Ph.D. degree in computer science and technology from Beijing University of Posts and Telecommunications (BUPT) in 2006. Then he worked in the Nokia Research Center as a postdoctoral research fellow. He is currently an associate professor in the State Key Laboratory of Networking and Switching Technology, BUPT. His research interests include cloud computing, service computing, distributed systems, and virtualization technology.

Junliang Chen is the chairman and a professor of the Research Institute of Networking and Switching Technology at Beijing University of Posts and Telecommunications (BUPT). He has been working in BUPT since 1955. He received the B.S. degree in electrical engineering from Shanghai Jiaotong University, China, in 1955, and his Ph.D. degree in electrical engineering in May 1961 from Moscow Institute of Radio Engineering, formerly Soviet Russia. His research interests are in the area of communication networks and next generation service creation technology. Prof. Chen was elected as a member of the Chinese Academy of Science in 1991, and a member of the Chinese Academy of Engineering in 1994.