A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks

Sanaa Hamid Mohamed, Student Member, IEEE, Taisir E.H. El-Gorashi, Member, IEEE, and Jaafar M.H. Elmirghani, Senior Member, IEEE

Abstract— This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source implementation, Hadoop, are enabling the development of a large number of cloud-based services and big data applications. MapReduce and Hadoop thus enable efficient and accelerated data-intensive computations and analytics. These services usually utilize commodity clusters within geographically-distributed data centers and provide cost-effective and elastic solutions. However, the increasing traffic between and within the data centers that migrate, store, and process big data is becoming a bottleneck that calls for enhanced infrastructures capable of reducing congestion and power consumption. Moreover, enterprises with multiple tenants requesting various big data services are challenged by the need to optimize the leasing of their resources at reduced running costs and power consumption while avoiding under- or over-utilization. In this survey, we present a summary of the characteristics of various big data programming models and applications and provide a review of cloud computing infrastructures and related technologies, such as virtualization and software-defined networking, that increasingly support big data systems. Moreover, we provide a brief review of data center topologies, routing protocols, and traffic characteristics, and emphasize the implications of big data on such cloud data centers and their supporting networks. Wide-ranging efforts have been devoted to optimizing systems that handle big data in terms of various application performance metrics and/or infrastructure energy efficiency. This survey aims to summarize some of these studies, which are classified according to their focus into applications-level, networking-level, or data center-level optimizations. Finally, some insights and future research directions are provided.

Index Terms— Big Data, MapReduce, Machine Learning, Data Streaming, Cloud Computing, Cloud Networking, Software-Defined Networking (SDN), Virtual Machines (VM), Network Function Virtualization (NFV), Containers, Data Center Networking (DCN), Energy Efficiency, Completion Time, Scheduling, Routing.

I INTRODUCTION The evolving paradigm of big data is essential for critical advancements in data processing models and the underlying acquisition, transmission, and storage infrastructures [1]. Big data differs from traditional data in being potentially unstructured, rapidly generated, continuously changing, and massively produced by a large number of distributed users or devices. Typically, big data workloads are transferred into powerful data centers containing sufficient storage and processing units for real-time or batch computations and analysis. A widely used characterization for big data is the "5V" notion which describes big data through its unique attributes of Volume, Velocity, Variety, Veracity, and Value [2]. In this characterization, the volume refers to the vast amount of data produced, which is usually measured in Exabytes (i.e. 2^60 or 10^18 bytes) or Zettabytes (i.e. 2^70 or 10^21 bytes), while the velocity reflects the high speed or rate of data generation and hence potentially the short-lived useful lifetime of data. Variety indicates that big data can be composed of different types of data which can be categorized into structured and unstructured. An example of structured data is bank transactions which can fit into relational database systems, and an example of unstructured data is social media content that could be a mix of text, photos, animated Graphics Interchange Format (GIF) images, audio files, and videos contained in the same element (e.g. a tweet, or a post). The veracity measures the trustworthiness of the data as some generated portions could be erroneous or inaccurate, while the value measures the ability of the user or owner of the data to extract useful information from the data.

The global data volume in 2020 is predicted to be around 40,000 Exabytes, which represents a growth factor of 300 compared to the global data volume in 2005 [3]. The global data volume was estimated at about 640 Exabytes in 2010 [4] and at about 2,700 Exabytes in 2015 [5]. This huge growth in data volumes is the result of continuous developments in various applications that generate massive and rich content related to a wide range of human activities. For example, online business transactions are expected to reach a rate of 450 Billion transactions per day by 2020 [4]. Social media platforms such as Facebook, LinkedIn, and Twitter, which have between 300 Million and 2 Billion subscribers accessing them through web browsers on personal computers (PCs) or through applications installed on tablets and smart phones, are enriching the Internet with content in the range of several Terabytes (2^40 bytes) per day [5]. Analyzing the thematic connections between subscribers, for example by grouping people with similar interests, is opening remarkable opportunities for targeted marketing and e-commerce. Moreover, the subscribers' behaviours and preferences, tracked through their activities, clickstreams, requests, and collected web log files, can be analyzed with big data mining tools for profound psychological, economic, business-oriented, and product improvement studies [6], [7]. To accelerate the delay-sensitive operations of web searching and indexing, distributed programming models for big data such as MapReduce were developed [8]. MapReduce is a powerful, reliable, and cost-effective programming model that performs parallel processing for large distributed datasets. These features have enabled the development of different distributed programming big data solutions and cloud computing applications.

Fig. 1. Big data communication, networking, and processing infrastructure, and examples of big data applications.

A wide range of applications are considered big data applications, ranging from data-intensive scientific applications that require extensive computations to applications with massive datasets that require manipulation, such as in earth sciences, astronomy, nanotechnology, genomics, and bioinformatics [9]. Typically, the computations, simulations, and modelling in such applications are carried out in High Performance Computing (HPC) clusters with the aid of distributed and grid computing. However, the growth of datasets beyond the capacities of these systems, in addition to the desire to share datasets for scientific research collaborations in some disciplines, is encouraging the utilization of big data applications in cloud computing infrastructures with commodity devices for scientific computations despite the resultant performance and cost tradeoffs [10].

With the prevalence of mobile applications and services that have extensive computational and storage demands exceeding the capabilities of current smart phones, emerging technologies such as Mobile Cloud Computing (MCC) were developed [11]. In MCC, the computational and storage demands of applications are outsourced to powerful remote servers (or nearby servers, as in Mobile Edge Computing (MEC)) over the Internet. As a result, on-demand rich services such as video streaming, interactive video, and online gaming can be effectively delivered to capacity- and battery-limited devices. Video content accounted for 51% of the total mobile data traffic in 2012 [11], and is predicted to account for 78% of an expected total volume of 49 Exabytes by 2021 [12]. Due to these huge demands, in addition to the large sizes of video files, big video data platforms are facing several challenges related to video streaming, storage, and management, while needing to meet strict quality-of-experience (QoE) requirements [13]. In addition to mobile devices, the wide range of everyday physical objects that are increasingly interconnected for automated operations has formed what is known as the Internet-of-Things (IoT). In IoT systems, the underlying communication and networking infrastructure is typically integrated with big data computing systems for data collection, analysis, and decision-making. Several technologies such as RFID, low power communication technologies, Machine-to-Machine (M2M) communications, and wireless sensor networking (WSN) have been suggested for improved IoT communications and networking infrastructures [14]. To process the big data generated by IoT devices, different solutions such as cloud and fog computing were proposed [15]-[31]. Existing cloud computing infrastructures could be utilized by aggregating and processing big data in powerful central data centers. Alternatively, data could be processed at the edge, where fog computing units, typically with limited processing capacities compared to the cloud, are utilized [32]. Edge computing reduces both the traffic in core networks and the latency, by being closer to end devices. The connected devices could be sensors gathering different real-time measurements, or actuators performing automated control operations in industrial, agricultural, or smart building applications. IoT can support vehicle communication to realize smart transportation systems. IoT can also support medical applications such as wearables and telecare applications for remote treatment, diagnosis, and monitoring [33]. With this variety in IoT devices, the number of Internet-connected things is expected to exceed 50 Billion by 2020, and the services provided by IoT are expected to add $15 Trillion to the global Gross Domestic Product (GDP) in the next 20 years [14]. Figure 1 provides a generalized big data communication, networking, and processing infrastructure and examples of applications that can utilize it.

Fig.2. Classification of big data applications optimization studies.

Achieving the full potential of big data requires a multidisciplinary collaboration between computer scientists, engineers, data scientists, as well as statisticians and other stakeholders [4]. It also calls for huge investments and developments by enterprises and other organizations to improve big data processing, management, and analytics infrastructures to enhance decision making and service offerings. Moreover, there is an urgent need to integrate new big data applications with existing Application Program Interfaces (API) such as the Structured Query Language (SQL), and the R language for statistical computing. More than $15 billion have already been invested in big data systems by several leading Information Technology (IT) companies such as IBM, Oracle, Microsoft, SAP, and HP [34]. One of the challenges of big data in enterprise and cloud infrastructures is the existence of various workloads and tenants with different Service Level Agreement (SLA) requirements that need to be hosted on the same set of clusters. An early solution to this challenge at the application level is to utilize a distributed system to control the access to and sharing of data within the clusters [35]. At the infrastructure level, solutions such as Virtual Machines (VMs) or containers dedicated to each application or tenant were utilized to support the isolation between their assigned resources [1], [34]. Big data systems are also challenged by security, privacy, and governance related concerns. Furthermore, as the computational demands of the increasing data volumes are exceeding the capabilities of existing commodity infrastructures, enhanced and energy efficient processing and networking infrastructures for big data have to be investigated and optimized.

This survey paper aims to summarize a wide range of studies that use different state-of-the-art and emerging networking and computing technologies to optimize and enhance big data applications and systems in terms of various performance metrics such as completion time, data locality, load balancing, fairness, reliability, and resource utilization, and/or their energy efficiency. Due to the popularity and wide applicability of the MapReduce programming model, its related optimization studies will be the main focus in this survey. Moreover, optimization studies for big data management and streaming, in addition to generic cloud computing applications and bulk data transfer, are also considered. The optimization studies in this survey are classified according to their focus into applications-level, cloud networking-level, and data center-level studies as summarized in Figure 2. The first category, at the application level, targets the studies that extend or optimize existing framework parameters and mechanisms, such as optimizing job and data placements and scheduling, in addition to developing benchmarks, traces, and simulators [36]-[119]. As the use of big data applications is evolving from clusters with controlled environments into cloud environments with geo-distributed data centers, several additional challenges are encountered. The second category, at the networking level, focuses on optimizing cloud networking infrastructures for big data applications and includes inter-data center networking, virtual machine assignment, and bulk data transfer optimization studies [120]-[203]. The increasing data volumes and processing demands are also challenging the data centers that store and process big data. The third category, at the data center level, targets optimizing the topologies, routing, and scheduling in data centers for big data applications, in addition to the studies that utilize, demonstrate, and suggest scaled-up computing and networking infrastructures to replace commodity hardware in the future [204]-[311]. For the performance evaluations in the aforementioned studies, either realistic traces or deployments in experimental testbed clusters are utilized.

Although several big data related surveys and tutorials are available, to the best of our knowledge, none has extensively addressed optimizing big data applications while considering the technological aspects of their hosting cloud data centers and networking infrastructures. The tutorial in [1] and the survey in [312] considered mapping the role of cloud computing, IoT, data centers, and applications to the acquisition, storage, and processing of big data. The authors in [313]-[316] extensively surveyed the advances in big data processing frameworks and compared their components, usage, and performance. A review of benchmarking big data systems is provided in [317]. The surveys in [318], [319] focused on optimizing job scheduling at the application level, while the survey in [320] additionally tackled extensions, tuning, hardware acceleration, security, and energy efficiency for MapReduce. The environmental impacts of big data and its use in greening applications and systems were discussed in [321]. Security and privacy concerns of MapReduce in cloud environments were discussed in [322], while the challenges and requirements of geo-distributed batch and streaming big data frameworks were outlined in [323]. The surveys in [324]-[326] addressed the use of big data analytics to optimize wired and wireless networks, while the survey in [327] overviewed big data mathematical representations for networking optimizations. The scheduling of flows in data centers for big data is addressed in [328], and the impact of data center frameworks on scheduling and resource allocation is surveyed in [329] for three big data applications.

This survey paper is structured as follows: For the convenience of the reader, brief overviews of the state-of-the-art and advances in big data programming models and frameworks, cloud computing and its related technologies, and cloud data centers are provided before the corresponding optimization studies. Section II reviews the characteristics of big data programming models and existing batch processing, stream processing, and storage management applications, while Section III summarizes the applications-focused optimization studies. Section IV discusses the prominence of cloud computing for big data applications, and the implications of big data applications on cloud networks. It also reviews some related technologies that support big data and cloud computing systems such as machine and network virtualization, and Software-Defined Networking (SDN). Section V summarizes the cloud networking-focused optimization studies. Section VI briefly reviews data center topologies, traffic characteristics, and routing protocols, while Section VII summarizes the data center-focused optimization studies. Finally, Section VIII provides future research directions, and Section IX concludes the survey. Key acronyms are listed in Appendix A below.

APPENDIX A LIST OF KEY ACRONYMS
AM Application Master.
ACID Atomicity, Consistency, Isolation, and Durability.
BASE Basically Available Soft-state Eventual consistency.
CAPEX Capital Expenditure.
CPU Central Processing Unit.
CSP Cloud Service Provider.
DAG Directed Acyclic Graph.
DCN Data Center Networking.
DFS Distributed File System.
DVFS Dynamic Voltage and Frequency Scaling.
EC2 Elastic Compute Cloud.
EON Elastic Optical Network.
ETSI European Telecom Standards Institute.
HDFS Hadoop Distributed File System.
HFS Hadoop Fair Scheduler.
HPC High Performance Computing.
I/O Input/Output.
ILP Integer Linear Programming.
IP Internet Protocol.
ISP Internet Service Provider.
JT Job Tracker.
JVM Java Virtual Machine.
MILP Mixed Integer Linear Programming.
NFV Network Function Virtualization.
NM Node Manager.
NoSQL Not only SQL.
OF OpenFlow.
O-OFDM Optical Orthogonal Frequency Division Multiplexing.
OPEX Operational Expenses.
OVS Open vSwitch.
P2P Peer-to-Peer.
PON Passive Optical Network.
QoE Quality of Experience.
QoS Quality of Service.
RAM Random Access Memory.
RDBMS Relational Data Base Management System.
RDD Resilient Distributed Datasets.
RM Resource Manager.
ROADM Reconfigurable Optical Add Drop Multiplexer.
SDN Software Defined Networking.
SDON Software Defined Optical Networks.
SLA Service Level Agreement.
SQL Structured Query Language.
TE Traffic Engineering.
ToR Top-of-Rack.
TT Task Tracker.
VM Virtual Machine.
VNE Virtual Network Embedding.
VNF Virtual Network Function.
WAN Wide Area Network.
WC WordCount.
WDM Wavelength Division Multiplexing.
WSS Wavelength Selective Switch.
YARN Yet Another Resource Negotiator.

II PROGRAMMING MODELS, PLATFORMS, AND APPLICATIONS FOR BIG DATA ANALYTICS: This Section reviews some of the programming models developed to provide parallel computation for big data. These programming models include MapReduce [8], Dryad [330], and Cloud Dataflow [331]. The input data in these models can be generally categorized into bounded and unbounded data depending on the application's processing speed requirements. In applications that have no strict processing speed requirements, the input data can be aggregated, bounded, and then processed in batch mode. In applications that require instantaneous analysis of continuous data flows (i.e. unbounded data), streaming mode is utilized. The MapReduce programming model, with the support of open source platforms such as Hadoop, has been widely used for efficient, scalable, and reliable computations at reduced costs, especially for batch processing. The maturity of such platforms has supported the development of several scalable commercial or open-source big data applications and services for processing and data storage management. The rest of this Section is organized as follows: Subsection II-A illustrates the aforementioned programming models, while Subsection II-B briefly describes the characteristics and components of Apache Hadoop and its related applications. Subsection II-C focuses on big data storage management applications, while Subsection II-D summarizes some of the in-memory big data applications. Subsection II-E covers distributed graph processing. Finally, Subsections II-F and II-G elaborate on big data stream processing applications and the Lambda hybrid architecture, respectively. A. Programming Models: The MapReduce programming model was introduced by Google in 2003 as a cost-effective solution for processing massive data sets. MapReduce utilizes distributed computations in commodity clusters that run in two phases, map and reduce, which are adapted from the Lisp functional programming language [8]. The MapReduce user defines the required functions of each phase by using a programming language such as C++, Java, or Python, and then submits the code as a single MapReduce job to process the data. The user also defines a set of parameters to configure the job. Each MapReduce job consists of a number of map tasks that depends on the input data size, and a number of reduce tasks that depends on the configuration. Each map task is assigned to process a unique portion of the input data set, preferably available locally, and hence can run independently from other map tasks. The processing starts by transforming the input data into the key-value schema and applying the map function to compute other key-value pairs, also known as the intermediate results. These results are then shuffled to reduce tasks according to their keys, where each reduce task is assigned to process intermediate results with a unique set of keys. Finally, each reduce task generates the final outputs. The internal operational details of MapReduce, such as assigning the nodes within the cluster to map or reduce tasks, partitioning the input data, task scheduling, fault tolerance, and inter-machine communications, are typically performed by the run-time system and are hidden from the users. The input and output data files are typically managed by a Distributed File System (DFS) that provides a unified view of the files and their details, and allows various operations such as replication, read, and write operations.
An example of DFSs is the Google File System (GFS) [332], which is a fault-tolerant, reliable, and scalable chunk-based distributed file system designed to support MapReduce in Google's commodity servers. The typical chunk size in GFS is 64 MB and each chunk is replicated in different nodes, with a default of 3 replicas, to support fault-tolerance. The components of a typical MapReduce cluster and the implementation details of the MapReduce programming model are illustrated in Figure 3. One of the nodes in the cluster is set to be a Master, while the others are set to be Workers and are assigned to either map or reduce tasks by the master. Besides task assignment, the master is also responsible for monitoring the performance of the running tasks and for checking the statuses of the nodes within the cluster. The master also manages the information about the location of the running jobs, data replicas, and intermediate results. The detailed steps of implementing a MapReduce job are as follows [8]: 1) The MapReduce code is copied to all cluster nodes. In this code, the user defines the map and reduce functions and provides additional parameters such as input and output data types, names of the output files, and the number of reduce workers. 2) The master assigns map and reduce tasks to available workers, where typically there are more map workers than reduce workers and each map worker is assigned several map tasks. 3) The input data, in the form of key-value pairs, is split into smaller partitions. The splits (S) and their replicas (SR) are distributed in the map workers' local disks as illustrated in Figure 3. The splits are then processed concurrently in their assigned map workers according to their scheduling. Each map function produces intermediate results (IR) consisting of the intermediate key-value pairs. These results are then materialized (i.e. saved persistently) in the local disks of the map workers. 4) The intermediate results are divided into (R) parts to be processed by R reduce workers. Partitioning can be done through hash functions (e.g. hash(key) mod R) to ensure that each key is assigned to only one reduce worker. The locations of the hashed intermediate results and their file sizes are sent to the master node.

Fig. 3. Google’s MapReduce cluster components and the programming model implementation.

5) A reduce task is composed of shuffle, sort, and reduce phases. The shuffle phase can start when 5% of the map results are generated; however, the final reduce phase cannot start until all map tasks are completed. For shuffling, each reduce worker obtains the locations of the intermediate pairs with the keys assigned to it and fetches the corresponding results from the map workers' local disks, typically via the HyperText Transfer Protocol (HTTP). 6) Each reduce worker then sorts its intermediate results by the keys. The sorting is performed in the Random Access Memory (RAM) if the intermediate results can fit; otherwise, external sort-merge algorithms are used. The sorting groups all the occurrences of the same key and forms the shuffled intermediate results (SIR). 7) Each reduce worker applies the assigned user-defined reduce function on the shuffled data to generate the final key-value pairs output (O). The final output files are then saved in the distributed file system. In MapReduce, fault-tolerance is achieved by re-executing the failed tasks. Failures can occur due to hardware causes such as disk failures, running out of disk space or memory, and socket timeouts. Each map or reduce task can be in one of three states: idle, in-progress, or completed. If an in-progress map task fails, the master changes its status to idle to allow it to be re-scheduled on an available map node containing a replica of the data. If a map worker fails while having some completed map tasks, all its map tasks must be re-scheduled as the intermediate results, which are only saved on local disks, are no longer accessible. In this case, all reduce workers must be re-scheduled to obtain the correct and complete set of intermediate results. If a reduce worker fails, only its in-progress tasks are re-scheduled as the results of completed reduce tasks are saved in the distributed file system. To improve MapReduce performance, speculative execution can be activated, where backup tasks are created to speed up the lagging in-progress tasks known as stragglers [8]. The MapReduce programming model is capable of solving several common programming problems such as word count and sort, in addition to implementing complex graph processing, data mining, and machine learning applications. However, the speed requirements of some computations might not be satisfied by MapReduce due to several limitations [333]. Moreover, developing efficient MapReduce applications requires advanced programming skills to fit the computations into the map and reduce pipeline, and deep knowledge of the underlying infrastructure to properly configure and optimize a wide range of parameters [334]-[337]. One of MapReduce's limitations is that transferring non-local input data to map workers and shuffling intermediate results to reduce workers typically require intensive networking bandwidth and disk I/O operations. Early efforts to minimize the effects of these bottlenecks included maximizing data locality, where the computations are carried out closer to the data [8]. Another limitation is due to the fault-tolerance mechanism that requires materializing the entire output of MapReduce jobs in the disks managed by the DFS before it becomes accessible for further computations. Hence, MapReduce is generally less suitable for interactive and iterative computations that require repetitive access to results.
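To make the map, shuffle, and reduce phases above concrete, the following is a minimal WordCount sketch in the style of Hadoop Streaming, where user-defined map and reduce functions read key-value pairs from standard input and write them to standard output (Python is assumed here purely for illustration; file names and cluster configuration are omitted):

```python
#!/usr/bin/env python3
"""Illustrative WordCount in the Hadoop Streaming style; not tied to a specific cluster.
Run with argument "map" for the map phase and "reduce" (or no argument) for the reduce phase."""
import itertools
import sys


def mapper():
    # Each map task reads its local input split and emits intermediate (word, 1) pairs.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def reducer():
    # The framework routes each key to one reduce worker (e.g. hash(word) mod R) and sorts
    # by key, so all pairs for a word arrive contiguously and can be summed in one pass.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

When such scripts are handed to the framework, the run-time system takes care of splitting the input, shuffling and sorting the intermediate pairs, and re-executing failed tasks, as described in the steps above.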
An implementation variation known as MapReduce Online [338] supports shuffling by utilizing RAM resources to pipeline intermediate results between the map and reduce stages before the materialization. Several other programming models were developed as variants of MapReduce [314]. One of these variants is Dryad, which is a high-performance general-purpose distributed programming model for parallel applications with coarse-grain data [330]. In Dryad, a Directed Acyclic Graph (DAG) is used to describe the jobs by representing the computations as the graph vertices and the data communication patterns as the graph edges. The Job Manager within Dryad schedules the vertices, which contain sequential programs, to run concurrently on a set of machines that are available at run time. These machines can be different cores within the same multi-core PC or can be thousands of machines within a large data center. Dryad provides fault-tolerance and efficient resource utilization by allowing graph modification during the computations. Unlike MapReduce, which restricts the programmer to a single input file and a single output file, Dryad allows an arbitrary number of input and output files. As a successor to MapReduce, Google has recently introduced "Cloud Dataflow", which is a unified programming model with enhanced processing capabilities for bounded and unbounded data. It provides a balance between correctness, latency, and cost when processing massive, unbounded, and out-of-order data [331]. In Cloud Dataflow, the data is represented as tuples containing the key, value, event-time, and the required time window for the processing. This supports sophisticated user requirements such as event-time ordering of results by utilizing data windowing, which divides the data streams into finite chunks to be processed in groups. Cloud Dataflow utilizes the features of both FlumeJava, which is a batch engine, and MillWheel, which is a streaming engine [331]. The core primitives of Cloud Dataflow are ParDo, which is an element-wise generic parallel processing function, and GroupByKey and GroupByKeyAndWindow, which aggregate data with the same key according to the user requests. B. Apache Hadoop Architecture and Related Software: Hadoop, which is currently under the auspices of the Apache Software Foundation, is an open source software framework written in Java for reliable and scalable distributed computing [339]. This framework was initiated by Doug Cutting, who utilized the MapReduce programming model for indexing web crawls, and was open-sourced by Yahoo in 2005. Beside Apache, several organizations have developed and customized Hadoop distributions tailored to their infrastructures, such as HortonWorks, Cloudera, Amazon Web Services (AWS), Pivotal, and MapR Technologies. The Hadoop ecosystem allows other programs to run on the same infrastructure as MapReduce, which made it a natural choice for enterprise big data platforms [35]. The basic components of the first versions of Hadoop (Hadoop 1.x) are depicted in Figure 4. These versions contain a layer for the Hadoop Distributed File System (HDFS) and a layer for the MapReduce 1.0 engine, which resembles Google's MapReduce, and can host other applications in the top layer. The MapReduce 1.0 layer follows the master-slave architecture. The master is a single node containing a Job Tracker (JT), while each slave node contains a Task Tracker (TT). The JT handles job assignment and scheduling and maintains the data and metadata of jobs, in addition to resource information.
It also monitors the liveness of TTs and the availability of their resources through periodic heartbeat messages, typically every 3 seconds. Each TT contains a predefined set of slots. Once it accepts a map or a reduce task, it launches a Java Virtual Machine (JVM) in one of its slots to perform the task, and periodically updates the JT with the task status [339]. The HDFS layer consists of a name node in the master and a data node in each slave node. The name node stores the details of the data nodes and the addresses of the data blocks and their replicas. It also checks the data nodes via heartbeat messages and manages load balancing. For reliability, a secondary name node is typically assigned to save snapshots of the primary name node. As in GFS, the default block size in HDFS is 64 MB and three replicas are maintained for each block for fault-tolerance, performance improvement, and load balancing. Beside GFS and HDFS, several distributed file systems were developed, such as Amazon's Simple Storage Service (S3), the Moose File System (MFS), the Kosmos distributed file system (KFS), and Colossus [314], [340]. The default task scheduling mechanisms in Hadoop are First-In First-Out (FIFO), the capacity scheduler, and the Hadoop Fair Scheduler (HFS). FIFO schedules the jobs according to their arrival time, which leads to undesirable delays in environments with a mix of long batch jobs and small interactive jobs [319]. The capacity scheduler, developed at Yahoo, reserves a pool containing minimum resource guarantees for each user, and hence suits systems with multiple users [319]. FIFO scheduling is then used for the jobs of the same user. The fair scheduler, developed at Facebook, dynamically allocates the resources equally between jobs. It thus improves the response time of small jobs [46].
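The fair sharing idea can be illustrated with a simplified max-min allocation of task slots (a sketch only, not the actual HFS implementation): small jobs receive their full demand while large batch jobs absorb the remaining slots.

```python
# Simplified max-min style sharing of task slots among running jobs; an illustration
# of the fair scheduling idea, not the Hadoop Fair Scheduler code.
def fair_share(total_slots, demands):
    """demands: dict mapping job name -> number of tasks it can currently run."""
    alloc = {job: 0 for job in demands}
    remaining = total_slots
    active = {job for job, demand in demands.items() if demand > 0}
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)
        for job in list(active):
            give = min(share, demands[job] - alloc[job], remaining)
            alloc[job] += give
            remaining -= give
            if alloc[job] >= demands[job]:
                active.discard(job)   # this job's demand is fully met
            if remaining == 0:
                break
    return alloc

# A small interactive job gets its full demand; the batch job absorbs the rest.
print(fair_share(10, {"batch_job": 20, "interactive_job": 3}))
# {'batch_job': 7, 'interactive_job': 3}
```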

Fig. 4. Framework components in Hadoop: (a) Hadoop 1.x, and (b) Hadoop 2.x.

Hadoop 2.x, which is also depicted in Figure 4, introduced a resource management platform named YARN (Yet Another Resource Negotiator) [341]. YARN decouples the resource management infrastructure from the processing components and enables the coexistence of different processing frameworks beside MapReduce, which increases the flexibility in big data clusters. In YARN, the JT and TT are replaced with three components, which are the Resource Manager (RM), the Node Manager (NM), and the Application Master (AM). The RM is a per-cluster global resource manager which runs as a daemon on a dedicated node. It contains a scheduler that dynamically leases the available cluster resources in the form of containers (further explained in Subsection IV-B4), which are considered as logical bundles (e.g. 2 GB RAM, 1 Central Processing Unit (CPU) core), among competing MapReduce jobs and other applications according to their demands and scheduling priorities. An NM is a per-server daemon that is responsible for monitoring the health of its physical node, tracking its container assignments, and managing the containers' lifecycle (i.e. starting and killing). An AM is a per-application container that manages the resource consumption and the execution flow of the jobs, and also handles the fault-tolerance tasks. The AM, which typically needs to harness resources from several nodes to finish its job, issues a resource request to the RM indicating the required number of containers, the required resources per container, and the locality preferences. Figure 5 illustrates the differences between Hadoop 1.x and Hadoop 2.x with YARN. A detailed study of various releases of Hadoop is presented in [342]. The authors also provided a brief comparison of these releases in terms of their energy efficiency and performance.
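The resource-request abstraction can be pictured with a small illustrative model (the class and field names below are hypothetical and do not correspond to the actual YARN client API), in which an AM asks the RM for a number of containers with given resources and locality hints:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContainerRequest:
    """Toy model of an AM's request to the RM; not the real YARN API."""
    num_containers: int
    memory_mb: int                 # RAM per container
    vcores: int                    # CPU cores per container
    preferred_nodes: List[str] = field(default_factory=list)  # data-locality hints
    priority: int = 0


# An AM asking for 10 containers of 2 GB / 1 vCore each, preferring the nodes that
# hold its input splits so that tasks can be scheduled close to their data.
request = ContainerRequest(num_containers=10, memory_mb=2048, vcores=1,
                           preferred_nodes=["node-17", "node-42"])
print(request)
```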

Fig. 5. Comparison of clusters with (a) Google's MapReduce, (b) Hadoop 1.x, and (c) Hadoop 2.x with YARN.

YARN increases the resource allocation flexibility in MapReduce 2 as it utilizes flexible containers for resource allocation and hence eliminates the fixed resource slots used in MapReduce 1. However, this advantage comes at the expense of added system complexity and a slight increase in power consumption when compared to MapReduce 1 [342]. Another difference is that the shuffling of intermediate results in MapReduce 2 is performed via auxiliary services that preserve a container's output before killing it. The communications between the AMs, NMs, and the RM are heartbeat-based. If a node fails, the NM sends a heartbeat indicating the failure to the RM, which in turn informs the affected AMs. If an in-progress job fails, the NM marks it as idle and re-executes it. If an AM fails, the RM restarts it and synchronizes its tasks. If an RM fails, an old checkpoint is used and a secondary RM is activated [341]. A wide range of applications and programming frameworks can run natively in Hadoop as depicted in Figure 4. The difference in their implementation between the Hadoop versions is that in the 1.x versions they are forced to follow the MapReduce framework, while in the 2.x versions they are no longer restricted to it. Examples of these applications and frameworks are Pig [343], Tez [344], Hive [345], HBase [314], Storm [346], Giraph for graph processing, and Mahout for machine learning [315], [340]. Due to the lack of built-in declarative languages in Hadoop, Pig, Tez, and Hive were introduced to support querying and to replace ad-hoc user-written programs which were hard to maintain and reuse. Pig is composed of an execution engine and a declarative scripting language named Pig Latin that compiles SQL-like queries into an equivalent set of sequenced MapReduce jobs. Pig hides the complexity of MapReduce and directly provides advanced operations such as filtering, joining, and ordering [343]. Tez is a flexible input-processor-output runtime model that transforms queries into an abstracted DAG where the vertices represent the parallel tasks, and the edges represent the data movement between different map and reduce stages [344]. It supports in-memory operations which makes it more suitable for interactive processing than MapReduce. Hive [345] is a data warehouse software developed at Facebook. It contains a declarative language, HiveQL, that automatically generates MapReduce jobs from SQL-like user queries.
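To illustrate this declarative style, the snippet below uses Spark SQL as an analogous stand-in for HiveQL (the input file clicks.json and its columns are hypothetical); the engine plans and executes the query as distributed jobs instead of requiring hand-written MapReduce code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("declarative-query-example").getOrCreate()

# Register a (hypothetical) clickstream dataset as a table that can be queried declaratively.
spark.read.json("clicks.json").createOrReplaceTempView("clicks")

# The SQL below is compiled into a distributed execution plan by the engine,
# much as HiveQL queries are compiled into MapReduce (or Tez) jobs.
top_pages = spark.sql("""
    SELECT page, COUNT(*) AS hits
    FROM clicks
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
top_pages.show()
```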

C. Big Data Storage Management Applications: As with the programming models, the increasing data volume is challenging legacy storage management systems. Using centralized relational database systems with big data will typically lead to several inefficiencies. This has encouraged the use of large-scale distributed cloud-based data storage management systems known as "key-value stores" or "Not only SQL (NoSQL)" systems. It is well known that no single computing system is capable of providing effective processing for all types of workloads. To design a generic platform that suits several types of requests and workloads, some trade-offs have to be considered. A set of measures for these trade-offs in data storage management systems follows the Consistency, Availability, and Partition-tolerance (CAP) theorem, which states that any distributed system can only satisfy two of these three properties. Consistency reflects the fact that all replicas of an entry must have the same value at all times, and that read operations should return the latest value of that entry. Availability implies that the requested operations should always be allowed and performed promptly, while partition-tolerance indicates that the system can function if some parts of it are disconnected [313]. Most of the traditional Relational Data Base Management Systems (RDBMS), which utilize SQL for querying, are classified as "CA", which stands for consistency and availability. As partition-tolerance is a key requirement in cloud distributed infrastructures, NoSQL systems are classified as either "CP", where the availability is relaxed, or "AP", where the consistency is relaxed and replaced by eventual, timeline, or session consistency. For transactions to be processed concurrently, RDBMS are required to provide the Atomicity, Consistency, Isolation, and Durability (ACID) guarantees. Most RDBMS rely on expensive shared-memory or shared-disk hardware to ensure high performance [314]. On the other hand, NoSQL systems utilize commodity shared-nothing hardware while ensuring scalability, fault-tolerance, and cost-effectiveness, traded for consistency. Thus, the ACID guarantees are typically relaxed or replaced by the Basically Available Soft-state Eventual consistency (BASE) guarantees. While RDBMS are considered mature and well-established, NoSQL systems still lack experienced programmers and most of these systems are still in pre-production phases [313]. Several big data storage management applications were developed for commercial internal production workloads such as BigTable [347], PNUTS [348], and DynamoDB [349]. BigTable is a column-oriented storage system developed at Google for managing structured large-scale data sets at the petabyte scale. It only supports single-row transactions and performs an atomic read-modify-write sequence that relies on a distributed locking system called Chubby. PNUTS is a massive-scale database system designed at Yahoo to support their web-based applications, while DynamoDB is a highly scalable and available distributed key-value data store developed at Amazon to support their cloud-based applications [340]. Examples of open-source big data management systems include HBase, HadoopDB, and Cassandra [314]. HBase is a key-value column-oriented database management system, while HadoopDB is a hybrid system that combines the scalability of MapReduce with the performance guarantees of parallel databases.
In HadoopDB, the queries are expressed in SQL but are executed in parallel through the MapReduce framework [313], [350]. Cassandra is a highly scalable, eventually consistent, distributed structured key-value store developed at Facebook [351]. Examples of other systems that integrate indexing capabilities with MapReduce are Hadoop++ [352] and the Hadoop Aggressive Indexing Library (HAIL). Hadoop++ adds indexing and joining capabilities to Hadoop without changing its framework, while HAIL creates different clustered indexes for each data replica and supports multi-attribute querying [315]. D. Distributed In-memory Processing: Distributed in-memory processing systems are widely preferred for iterative, interactive, and real-time big data applications due to the bottleneck of the relatively slow data materialization processes in existing disk-based systems [316]. Running these applications in legacy MapReduce systems requires repetitive materialization as map and reduce tasks have to be re-launched iteratively. This in turn wastes disk Input/Output (I/O), CPU, and network bandwidth resources [315]. In-memory systems use fast memory units such as Dynamic Random Access Memory (DRAM) or cache units to provide rapid access to data during run-time and avoid slow disk I/O operations. However, as memory units are volatile, in-memory systems are required to adopt advanced mechanisms to guarantee fault-tolerance and data durability. Examples of in-memory NoSQL databases are RAMCloud [353] and HANA [354]. RAMCloud aggregates DRAM resources from thousands of commodity servers to provide a large-scale database system with low latency. Distributed cache systems can also enhance the performance of large-scale web applications. Examples of these systems are Redis, which is an in-memory data structure store, and Memcached, which is a light-weight in-memory key-value object caching system with a strict Least Recently Used (LRU) eviction mechanism [316]. To support in-memory interactive and iterative data processing, Spark, which is a scalable data analytics platform written in Scala, was introduced [355]. In Spark, the data is stored as Resilient Distributed Datasets (RDD) [356], which are a general-purpose and fault-tolerant abstraction for data sharing in distributed computations. RDDs are created in memory through coarse-grained deterministic transformations on datasets such as map, flatmap, filter, join, and GroupByKey. In cases of insufficient RAM, Spark performs lazy materialization of RDDs. RDDs are immutable, and applying further transformations creates new RDDs. This provides fault-tolerance without the need for replication, as the information about the transformations that created each RDD can be retrieved and reapplied to obtain the lost portions. E. Distributed Graph Processing: The partitioning and processing of graphs (i.e. descriptions of entities by vertices and of their relationships by connecting edges) is considered a key class of big data applications, especially for social networks that contain graphs with up to Billions of entities and edges [357]. Most big data graph processing applications utilize in-memory systems due to the iterative nature of their algorithms. Pregel [358], developed at Google, targets the processing of massive graphs on distributed clusters of commodity machines by using the Bulk Synchronous Parallel (BSP) programming model.
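The vertex-centric, superstep-based style of BSP graph processing can be sketched with a toy PageRank loop (illustrative plain Python, not the Pregel or Giraph API): in every superstep each vertex sends messages to its neighbours, and after a synchronization barrier all vertices update their state from the messages they received.

```python
# Toy vertex-centric PageRank run as synchronized supersteps (BSP style).
def run_pagerank(graph, num_supersteps=10, d=0.85):
    """graph: dict mapping each vertex to a list of its out-neighbours."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(num_supersteps):
        # Compute phase: each vertex sends rank/out-degree to its neighbours as messages.
        messages = {v: [] for v in graph}
        for v, neighbours in graph.items():
            if neighbours:
                share = ranks[v] / len(neighbours)
                for u in neighbours:
                    messages[u].append(share)
        # Synchronization barrier, then every vertex updates its state from received messages.
        ranks = {v: (1 - d) / n + d * sum(messages[v]) for v in graph}
    return ranks

print(run_pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```

In a distributed setting, the vertices (and hence the message exchanges) would be partitioned across workers, which is exactly where the partitioning strategies discussed next matter.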
Giraph, which is the open-source implementation of Pregel, uses online hashing or range-based partitioning to dispatch sub-graphs to workers [39]. Trinity is a distributed graph engine that optimizes distributed memory usage and communication cost under the assumption that the whole graph is partitioned across a distributed memory cloud [359]. Other examples of distributed graph processing applications are GraphLab [360], which utilizes asynchronous distributed shared memory, and PowerGraph [361], which focuses on the efficient partitioning of graphs with power-law distributions. F. Big Data Streaming Applications: Applications related to e-commerce and social networking typically receive virtually unbounded data, also known as streams, that are asynchronously generated by a huge number of sources or users at uncontrollable rates. Stream processing systems are required to provide low latency and efficient processing that copes with the arrival rate of the events within the streams. If they do not cope, the processing of some events is dropped, leading to load shedding, which is undesirable. Conceptually, a streaming system's model is composed of continuously arriving input data applied to almost static queries, while the batch and RDBMS systems model is composed of different queries applied to static data [362]. Stream processing can be performed in batch systems; however, this leads to several inefficiencies. Firstly, the implementation is not straightforward as it requires transforming the streams into partitions of batch data before processing. Secondly, considering relatively large partitions increases the processing latency, while considering small partitions increases the overheads of segmenting and of maintaining the inter-segment dependencies, in addition to the fault-tolerance overheads caused by the frequent materialization. Main Memory MapReduce (M3R) [363], MillWheel [364], Storm [346], Yahoo S4 [365], and Spark Streaming [366] are examples of stream processing systems. M3R was introduced to provide high reliability for interactive and continuous queries over terabytes of streamed data in small clusters [363]. However, it does not provide adequate resilience guarantees as it caches the results entirely in memory. MillWheel is a scalable, fault-tolerant, low-latency stream processing engine developed at Google [364] that allows time-based aggregations and provides fine-grained check-pointing. Storm was developed at Twitter as a distributed and fault-tolerant platform for real-time processing of streams [346]. It utilizes two primitives, spouts and bolts, to apply transformations to data streams, also named tuples, in a reliable and distributed manner. The spouts define the stream sources, while the bolts perform the required computations on the tuples and emit the resultant modified tuples to other bolts. The computations in Storm are described as graphs where each node is either a spout or a bolt, and the edges are the routes of the tuples. Storm relies on different grouping methods to specify the distribution of the tuples between the nodes. These include shuffle grouping, where streams are partitioned randomly; field grouping, where partitioning is performed according to a defined criterion; all grouping, where the streams are sent to all bolts; and global grouping, where all streams are copied to a single bolt. The Yahoo Simple Scalable Streaming System (S4) is a general-purpose distributed stream processing engine that provides low latency and fault-tolerance [365].
In S4, the computations are performed by identical Processing Elements (PEs) that are logically hosted in Processing Nodes (PNs). The coordination between PEs is performed by "ZooKeeper", which is an open-source cluster management application. The PEs either emit results to another PE or publish them, while PNs are responsible for listening to events, executing the required operations, dispatching events, and emitting output events. Spark Streaming [366], which is extended from Spark, introduced a stream programming model named discretized streams (D-Streams) to provide consistency, fault tolerance, and efficient integration with batch systems. Two types of operations can be used: transformation operations that produce new D-Streams, and output operations that save the resultant RDDs to HDFS. Spark Streaming allows users to seamlessly combine streaming, batch, and interactive queries. It also contains stateful operators such as windowing, incremental aggregation, and time-skewed joins. G. The Lambda Architecture: The Lambda architecture is a hybrid big data processing architecture that provides concurrent arbitrary processing and querying of batch and real-time data in a single entity [367]. As depicted in Figure 6, it consists of three layers: the batch layer, the speed layer, and the serving layer. The batch layer contains a batch system such as Hadoop or Spark to support the querying and processing of archived or stored data, while the speed layer contains a streaming system to provide low-latency processing of incoming input data in real-time. The input data is fed to both the batch and speed layers in parallel to produce two sets of results that represent the batch and real-time views of the queries, respectively. The serving layer's role is to combine both results and produce the finalized results. The Lambda architecture is typically integrated with general messaging systems such as Kafka to aggregate the queries of users. Kafka is a high-performance, open-source messaging system developed at LinkedIn for the purpose of aggregating high-throughput log files [368]. Although the Lambda architecture introduces a high level of flexibility by combining the benefits of both real-time and batch systems, it lacks simplicity due to the need to maintain two systems. Moreover, the Lambda architecture encounters some limitations with event-oriented applications [367].

Fig. 6. Components of the Lambda architecture.
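A minimal sketch of the serving layer's merge step described above is shown below (illustrative Python; the view contents are hypothetical and stand for results materialized by the batch layer and incrementally maintained by the speed layer):

```python
# Precomputed batch view (e.g. produced by a periodic Hadoop/Spark job over archived data).
batch_view = {"page_A": 10_240, "page_B": 3_310}
# Incremental real-time view (e.g. maintained by a stream processor since the last batch run).
speed_view = {"page_A": 57, "page_C": 4}


def serve_count(key):
    # The serving layer answers a query by merging the (stale but complete) batch view
    # with the (fresh but partial) speed view.
    return batch_view.get(key, 0) + speed_view.get(key, 0)


print(serve_count("page_A"))  # 10297
```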

III APPLICATIONS-FOCUSED OPTIMIZATION STUDIES This Section discusses some big data applications optimization studies performed at the application level. Optimizations at this level include tuning application parameters rather than using the default settings, or modifying the programming models themselves to obtain enhanced performance. The rest of this Section is organized as follows: Subsection III-A considers studies focusing on optimizing data and/or job and task placements. Subsection III-B summarizes studies that optimize job scheduling, while Subsection III-C describes and evaluates studies focusing on reducing the completion time of jobs. Finally, Subsection III-D provides a summary of big data benchmarking suites, traces, and simulators. Table I provides a summary of the studies presented in the first three Subsections while focusing on their objectives, targeted platform, optimization tools, benchmarks used, and experimental setup and/or simulation environments. A. Optimized Jobs/Data Placements: The early work in [36] addressed the problem of optimizing data placement in homogeneous Hadoop clusters. A dynamic algorithm that assigns data fragments according to the nodes' processing capacities was proposed to reduce data movement between over-utilized and under-utilized servers. However, the effects of replication were not considered. Typically, most of the nodes within a Hadoop cluster are kept active to provide high data availability, which is energy inefficient. In [37], the energy consumption of Hadoop clusters was reduced by dynamically switching off unused nodes while ensuring that replicas of all data sets are contained in a covering subset which is kept always active. Although the energy consumption was reduced, the running time of jobs was increased. Dynamic sizing and locality-aware scheduling in dedicated MapReduce clusters with three types of workloads, namely batch MapReduce jobs, web applications, and interactive MapReduce jobs, were addressed in [38]. A balance between performance and energy saving was achieved by allocating the optimum number of servers and delaying batch workloads while considering the data locality and delay constraints of web and interactive workloads. A Markov Decision Process (MDP) model was considered to reflect the stochastic nature of job arrivals, and an event-driven simulator was used to empirically validate the optimality of the proposed algorithms. Energy savings between 30% and 59% were achieved over the no-allocation strategy. Graph and iterative applications typically have highly dependent and skewed workloads as some portions of the datasets are accessed for processing more than others. The study in [39] proposed a Dependency Aware Locality for MapReduce (DALM) algorithm to process highly skewed and dependent input data. The algorithm was tested with Giraph and was found to reduce the cross-server traffic by up to 50%. Several studies addressed optimizing reduce task placement to minimize networking traffic by following greedy approaches. These approaches degrade the performance as they maximize the intermediate data locality for the current reduce tasks without considering the effects on the input data locality of map tasks in the following MapReduce jobs [40]. The theoretical and experimental study in [40] addressed optimizing reduce task data locality for sequential MapReduce jobs and achieved up to 20% improvement in performance compared to the greedy approaches.
The efficiency of indexing, grouping, and join querying operations can be significantly affected by the framework's data placement strategy. In [41], a data block allocation approach was proposed to reduce job execution time and improve query performance in Hadoop clusters. The default random data placement policy of Hadoop 1.0.3 was modified by using k-means clustering techniques to co-allocate related data blocks on the same clusters while considering the default replication factor. The results indicated a query performance improvement of up to 25%. The study in [42] addressed the poor resource utilization caused by misconfiguration in HBase. An algorithm named HConfig for semi-automating resource allocation and bulk data loading was proposed, and an improvement between 2% and 3.7% was achieved compared to the default settings. Several recent studies considered job or data placement optimizations in Hadoop 2.x with YARN. The impact of data locality on job completion times in YARN with the delay scheduler was addressed in [43]. To measure the data locality achieved, a YARN Location Simulator (YLocSim) tool was developed, validated, and compared with experimental results in a private cluster. Moreover, the effects of the inherent resource allocation imbalance, which increases with data locality, and of the redundant I/O operations due to requests for the same data by different containers, were studied. Existing job schedulers ignore the partitioning skew caused by uneven volumes of intermediate data shuffled to reducers. In [44], a framework to control the resource allocation of reduce tasks, named Dynamic REsource Allocation technique for MapReduce with Partitioning Skew (DREAMS), was developed while considering data replication. Experimental results show an improvement of up to 2.29 times compared to native YARN. Inappropriate configurations in YARN lead to degraded performance as the resource usage of tasks typically varies during execution, while the resources initially assigned to them are fixed. In [45], JellyFish, which is a self-tuning system based on YARN, was proposed to reduce the completion time and improve utilization by considering elastic containers, performing online reconfigurations, and rescheduling idle resources. The study considered the most critical subset of YARN parameters, and an average improvement of 65% was achieved compared to default YARN for repetitive jobs. B. Jobs Scheduling: Default scheduling mechanisms such as FIFO, the capacity scheduler, and HFS can lead to undesirable performance, especially with mixtures of small interactive and long batch jobs [318]. Several studies suggested enhanced scheduling mechanisms to overcome the inefficiencies of the default schedulers in terms of resource utilization, job makespan (i.e. the completion time of the last task), or energy efficiency. The conflict between fairness in scheduling and data locality was addressed in [46]. To maximize data locality, some tasks assigned according to fair scheduling are intentionally delayed until a node containing the corresponding data is available. An algorithm named delay scheduling was implemented on HFS and tested on commercial and private clusters using Hive workloads from Facebook. It was found that only short delays were required to achieve almost 100% locality. The response times of small jobs were improved by about five times at the expense of moderately slowing down larger jobs.
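The core rule of delay scheduling can be sketched as follows (an illustrative simplification with a toy Job class, not the HFS code from [46]): a job at the head of the fair-share queue declines a bounded number of scheduling opportunities rather than launch a non-local task.

```python
# Illustrative simplification of delay scheduling; Job is a toy stand-in, not HFS code.
class Job:
    def __init__(self, name, local_nodes, max_skips=3):
        self.name, self.local_nodes, self.max_skips = name, set(local_nodes), max_skips
        self.skipped = 0

    def offer_slot(self, node):
        """Called when `node` has a free slot and this job heads the fair-share queue."""
        if node in self.local_nodes:
            self.skipped = 0
            return f"launch local task of {self.name} on {node}"
        if self.skipped < self.max_skips:
            self.skipped += 1          # decline and wait briefly for a data-local node
            return None                # the slot is offered to the next job instead
        return f"launch non-local task of {self.name} on {node}"  # bounded wait expired


job = Job("hive_query", local_nodes=["node-3", "node-7"])
for free_node in ["node-1", "node-2", "node-3"]:
    print(free_node, "->", job.offer_slot(free_node))
```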
Quincy, a slot-based scheduler for Dryad, was implemented as a minimum-cost flow algorithm in [47] to achieve fairness and data locality for concurrent distributed jobs. The scheduling problem was encoded as a graph where the edges represent the competing demands of locality and fairness. Quincy provided better performance than queue-based schedulers and was capable of effectively reducing the network traffic. In [48], a data replication-aware scheduling algorithm was proposed to reduce the network traffic and speculative executions. The proposed map jobs scheduler was implemented in two waves. Firstly, the empty slots in the cluster were filled based on the number of hosted maps and the replication schemes. Secondly, a run-time scheduler was utilized to increase the locality and balance the intermediate data distribution for the shuffling phase. A map scheduler that balances locality and load balancing was proposed in [49]. In this work, the computing cluster is modeled as a time-slotted system where job arrivals are modeled as Bernoulli random variables and the service time is modelled as a geometric random variable with different mean values based on data locality. The scheduler utilized 'Join the Shortest Queue' (JSQ) and maximum weight policies and was proven to be throughput and delay optimal under heavy-traffic conditions. In [50], a Resource-aware Adaptive Scheduling (RAS) mechanism was proposed to improve resource utilization and achieve user-defined completion time goals. RAS utilizes offline job profiling information to enable dynamic adjustments of the number of slots in each machine while meeting the different requirements of the map and reduce phases. FLEX is a flexible scheduling allocation scheme introduced in [51] to optimize the response time, makespan, and SLA compliance. FLEX ensures minimum slot guarantees for jobs, as in the fair scheduler, and additionally introduces maximum slot guarantees. Experimental results showed that FLEX outperformed the fair scheduler by up to 30% in terms of the response time. The studies in [52], [53] scheduled MapReduce jobs according to a priori known job size information to optimize Hadoop clusters performance. HFSP in [52] focused on resource allocation fairness and the reduction of the response time among concurrent interactive and batch jobs by using the concepts of an ageing function and virtual time. LsPS in [53] is a self-tuning two-tier scheduler, where the first tier schedules among multiple users and the second tier schedules the jobs of each user. It aimed to improve the average response time by adaptively selecting between fair and FIFO schedulers. Prior job information was also utilized in [54] for offline MapReduce scheduling while taking server assignments into consideration. The authors in [55] reported that shuffling in MapReduce accounted for about a third of the overall jobs completion time. A joint scheduling mechanism for the map, shuffle, and reduce phases that considered their dependencies was implemented via linear programs and heuristics with precedence constraints. The study in [56] focused on improving the CPU utilization by dynamically scheduling waiting map tasks to nodes with high I/O waiting time. The execution time of I/O intensive jobs with the proposed scheduler was improved by 23% compared to FIFO. To reduce the energy consumption of performing MapReduce jobs, the work in [57] proposed an energy-aware scheduling mechanism while considering the Dynamic Voltage and Frequency Scaling (DVFS) settings for the cluster.
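The DVFS-based scheduling in [57] builds on the observation that dynamic CPU power falls roughly with the cube of the clock frequency while the execution time grows only linearly as the frequency drops, so the lowest frequency that still meets the deadline typically minimizes energy. A minimal sketch under these textbook assumptions (not the authors' models; the power figure and frequencies are hypothetical) is given below.

```python
# Minimal sketch of deadline-aware DVFS frequency selection (illustrative only;
# assumes dynamic power ~ f^3 and runtime ~ 1/f, a common textbook simplification).
def pick_frequency(frequencies_ghz, runtime_at_max_s, deadline_s, p_max_w=100.0):
    f_max = max(frequencies_ghz)
    best = None
    for f in frequencies_ghz:
        runtime = runtime_at_max_s * f_max / f          # slower clock, longer run
        power = p_max_w * (f / f_max) ** 3              # cubic dynamic-power scaling
        energy = power * runtime
        if runtime <= deadline_s and (best is None or energy < best[2]):
            best = (f, runtime, energy)
    return best   # (frequency, runtime, energy) or None if no setting meets the deadline

print(pick_frequency([1.2, 1.8, 2.4], runtime_at_max_s=600, deadline_s=1000))
```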
DVFS was experimentally utilized in [58] to reduce the energy consumption of computation-intensive workloads. In [59], two heuristics, namely EMRSA-I and EMRSA-II, were developed to assign map and reduce tasks to slots with the goal of reducing the energy consumption while satisfying SLAs. The energy consumption was reduced by about 40% compared to default schedulers at the expense of an increased jobs makespan. In [60], a dynamic slot allocation framework, DynamicMR, was proposed to improve the efficiency of Hadoop. A Dynamic Hadoop Slot Allocation (DHSA) algorithm was utilized to relax the strict slot allocation to either map or reduce tasks by enabling their reallocation to achieve a dynamic map-to-reduce ratio. DynamicMR outperformed YARN by 2%-9%, and considerably reduced the network contention caused by the high number of over-utilized reduce tasks. Unlike scheduling at the task level, the work in [61] introduced PRISM, which is a fine-grained resource-aware scheduler. PRISM, which divides tasks into phases each with its own resource usage profile, provided a 1.3 times reduction in the running time compared to default schedulers. Several recent studies considered optimizing MapReduce scheduling under YARN. HaSTE in [62] utilized the fine-grained CPU and RAM resources management capabilities of YARN to schedule map and reduce tasks under dependency constraints without the need for prior knowledge of tasks execution times. HaSTE improved the resources utilization and the makespan even for mixed workloads. The study in [63] suggested an SLA-aware energy-efficient scheduling scheme based on DVFS for Hadoop with YARN. The scheme, which was applied to the per-application AM, reduces the CPU frequency when tasks finish before their expected completion time. In [64], a priority-based resource scheduler for a streaming system named Quasit was proposed. The scheduler dynamically modifies the data paths of lower priority tuples to allow faster processing of higher priority tuples in vehicular traffic streams.
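The dynamic slot allocation idea behind DHSA in [60] can be illustrated with a minimal sketch (the slot counts are hypothetical and this is not the authors' code): idle slots of one type are temporarily lent to the other phase whenever its demand exceeds its own slot pool.

```python
# Minimal sketch of dynamic map/reduce slot allocation in the spirit of DHSA
# (illustrative only; all numbers are hypothetical).
def allocate_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Return (maps_to_launch, reduces_to_launch) for one scheduling round."""
    maps = min(pending_maps, map_slots)
    reduces = min(pending_reduces, reduce_slots)
    # Lend idle reduce slots to waiting map tasks, and vice versa.
    idle_reduce = reduce_slots - reduces
    idle_map = map_slots - maps
    maps += min(pending_maps - maps, idle_reduce)
    reduces += min(pending_reduces - reduces, idle_map)
    return maps, reduces

# Early in a job: many map tasks, no reduce tasks ready yet.
print(allocate_slots(map_slots=8, reduce_slots=4, pending_maps=20, pending_reduces=0))
# Late in a job: map phase finished, reduce tasks dominate.
print(allocate_slots(map_slots=8, reduce_slots=4, pending_maps=0, pending_reduces=10))
```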

C. Completion Time:
The reduction of the jobs overall completion time was considered in several studies with the aim of improving the SLA or reducing the power consumption in the underlying clusters. Numerical evaluations were utilized in [65] to test two power-efficient resource allocation approaches in a pool of MapReduce clusters. The algorithms developed aimed to reduce the end-to-end delay or the energy consumption while considering the availability as an SLA metric. In an effort to provide predictable services to deadline-constrained jobs, the work in [66] suggested a Resource and Deadline-aware Hadoop scheduler (RDS). RDS is based on online optimization and a self-learning completion time estimator for future tasks, which makes it suitable for dynamic Hadoop clusters with mixtures of energy sources and workloads. The work in [67] focused on reducing the completion time of small jobs that account for the majority of the jobs in production Hadoop clusters. The proposed scheduler, Fair4S, achieved an improvement by a factor of 7 compared to the fair scheduler, where 80% of small jobs waited less than 4 seconds to be served. In [68], a Bipartite Graph MapReduce scheduler (BGMRS) was proposed for deadline-constrained MapReduce jobs in clusters with heterogeneous nodes and dynamic job execution times. By optimizing scheduling and resources allocations, BGMRS reduced the deadline miss ratio by 79% and the completion time by 36% compared to the fair scheduler. The authors in [69] developed an Automatic Resource Inference and Allocation (ARIA) framework based on jobs profiling to reduce the completion time of MapReduce jobs in shared clusters. A Service Level Objective (SLO) scheduler was developed to utilize the predicted completion times and determine the schedule of resources allocation for tasks to meet soft deadlines. The study in [70] proposed a Dynamic Priority Multi-Queue Scheduler (DPMQS) to reduce the completion time of map tasks in heterogeneous environments. DPMQS increased both the data locality and the priority of map jobs that are near completion. Optimizing the scheduling of mixed MapReduce-like workloads was considered in [71] through offline and online algorithms that determine the order of tasks that minimizes the weighted sum of completion times. The authors in [72] also emphasized the role of optimizing jobs ordering and slots configurations in reducing the total completion time for offline jobs and proposed algorithms that can improve non-optimized Hadoop by up to 80%. The work in [73] considered optimizing four NoSQL databases (i.e. HBase, Cassandra, Hive, and HadoopDB) by reducing the Waiting Energy Consumption (WEC) caused by idle nodes waiting for job assignments, I/O operations, or results from other nodes. RoPE was proposed in [74] to reduce the response time of relational queries performed as cascaded MapReduce jobs in SCOPE, which is a parallel processing engine used by Microsoft. A profiler for code and data properties was used to improve future invocations of the same queries. RoPE achieved 2× improvements in the response time for 95% of Bing's production jobs while using 1.5× less resources. Job failures lead to a large increase in completion time. To improve the performance of Hadoop under failures, a modified MapReduce workflow with a fine-grained fault-tolerance mechanism called BEneath the Task Level (BeTL) was proposed in [75]. BeTL allows generating more files during the shuffling to create more checkpoints.
It improved the performance of Hadoop by 6.6% under no failures and by up to 51% under failures. The work in [76] proposed four multi-queue size-based scheduling policies to reduce the jobs slowdown variability, where the slowdown reflects the idle time spent waiting for resources or I/O operations. Several factors such as parameter sensitivity, load unbalance, heavy traffic, and fairness were considered. The work in [77] optimized the number of reduce tasks, their configurations, and memory allocations based on profiling the intermediate results size. The results indicated that job failures due to insufficient memory were avoided and that the completion time was reduced by up to 88.79% compared to legacy memory allocation approaches. To improve the performance of MapReduce in memory-constrained systems, Mammoth was proposed in [78] to provide global memory management. In Mammoth, related map and reduce tasks were launched in a single Java Virtual Machine (JVM) as threads that share the memory at run time. Mammoth actively pushes intermediate results to reducers, unlike Hadoop which passively pulls them from disks. A rule-based heuristic was used to prioritize memory allocations and revocations among map, shuffle, and reduce operations. Mammoth was found to be 5.19 times faster than Hadoop 1 and to outperform Spark for interactive and iterative jobs when the memory is insufficient [78]. An automatic skew mitigation approach, SkewTune, was proposed and optimized in [79]. SkewTune detects different types of skew and effectively re-partitions the unprocessed data of the stragglers to process it on idle nodes. The results indicated a reduction by a factor of 4 in the completion time for workloads with skew and minimal overhead for workloads without skew.
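Profile-driven deadline scheduling of the kind used by ARIA [69] rests on simple makespan bounds: a phase with n tasks of average duration avg and maximum duration max executed on s slots takes roughly between (n/s)·avg and ((n-1)/s)·avg + max. The sketch below uses such bounds to size a slot allocation for a soft deadline; it is a simplification of the ARIA model, and the profile numbers are hypothetical.

```python
# Minimal sketch of profile-driven slot sizing for a soft deadline,
# in the spirit of ARIA's makespan bounds (illustrative; numbers are hypothetical).
def phase_bounds(num_tasks, slots, avg, mx):
    """Classic lower/upper makespan bounds for greedy task assignment."""
    low = num_tasks / slots * avg
    up = (num_tasks - 1) / slots * avg + mx
    return low, up

def min_slots_for_deadline(profile, deadline):
    """Smallest equal map/reduce slot count whose upper-bound estimate meets the deadline."""
    for slots in range(1, 1000):
        _, t_map = phase_bounds(profile["maps"], slots, profile["map_avg"], profile["map_max"])
        _, t_red = phase_bounds(profile["reduces"], slots, profile["red_avg"], profile["red_max"])
        if t_map + profile["shuffle_avg"] + t_red <= deadline:
            return slots
    return None

job_profile = {"maps": 200, "map_avg": 20.0, "map_max": 45.0,
               "reduces": 40, "red_avg": 60.0, "red_max": 90.0,
               "shuffle_avg": 120.0}                     # seconds, hypothetical profile
print(min_slots_for_deadline(job_profile, deadline=600.0))
```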

TABLE I SUMMARY OF APPLICATIONS-FOCUSED OPTIMIZATIONS STUDIES Ref Objective Application Tools Benchmarks/workloads Experimental Setup/Simulation environment [36]* Optimize data placement Hadoop 1.x Dynamic Grep, WordCount 5 heterogeneous nodes with Intel's in heterogeneous clusters algorithm (Core 2 Duo, Celeron, Pentium 3) [37]* Reduce energy Hadoop 0.20.0 Modification to Webdata_sort, data_scan 36 nodes (8 CPU cores, 32GB RAM, Gigabit consumption by Hadoop from Gridmix benchmark NIC, two disks), 48-port HP ProCurve 2810- scaling down unutilized and defining (16-128 GB) 48G switch clusters covering subset [38]* Balance energy - locality-aware Geometric distribution Event-driven simulations consumption scheduler for tasks arrival rate and performance via Markov Decision Process [39]* Dependency-Aware Hadoop 1.2.1, Modification to 3.5 GB Graph from 4 nodes (Intel i7 3.4 GHz Quad, 16GB DDR3 Locality for MapReduce Giraph 1.0.0 HDFS replication Wikimedia database, RAM, 1 TB disk, 1 Gbps NIC, NETGEAR 8- (DALM) scheme and 2.1 GB of public social port switch scheduling networks data algorithm [40]* Optimizing reduce task Hadoop 1.x Classical stochastic Grep (10 GB), 7 slave nodes (4 cores 2.933 GHz CPU, locality for sequential sequential Sort (15 GB, 4.3 GB) 32KB cache, 6GB RAM, 72GB disk) MapReduce jobs assignment [41]* Optimizing data Hadoop 1.0.3, k-means clustering 920 GB business ad-hoc 10 nodes; 1 NameNode (6 2.6 GHz CPU cores, placements for query with Hive 0.10.0 for data queries from TPC-H 16GB RAM, 1TB SATA), 9 DataNodes (Intel operations placement, HDFS i5, 4GB RAM, 300GB disk), 1 Gbps Ethernet extension to support customized data placement [42]* HConfig: Configuring HBase 0.96.2, Algorithms to semi- YCSB Benchmark 13 nodes (1 manager, 3 coordinators, 9 data loading in HBase Hadoop 2.2.0 automate data workers), and 40 nodes (1 manager, 3 clusters loading and coordinators, 36 workers) with (AMD resource allocation Opeteron CPU, 8GB RAM, 2 SATA 1TB disks), 1 Gigabit Ethernet [43]* Impact of data locality Hadoop 2.3.0, Modify Rumen and 8GB of synthetic text for One node (4 CPU cores, 16GB RAM), 16 in YARN on completion Sqoop, Pig, YARN Grep, WordCount, 10GB nodes (2 CPU cores, 8GB RAM), 1 Gigabit time and I/O operations Hive Scheduler Load TPC-H queries Ethernet, YARN Location Simulator Simulator (SLS) (YLocSim) to report data locality [44]* DREAMS: dynamic YARN 2.4.0 Additional feature WC, Inverted Index, k- 21 Xen-based virtual machines (4 2GHz reduce tasks resources in YARN means, classification, CPU cores, 8GB RAM, 80GB disk) allocation DataJoin, Sort, Histo- movies (5GB, 27GB) [45]* JellyFish: Self tuning Hadoop 2.x Parameters tuning, PUMA benchmark 4 nodes (dual Intel 6 cores Xeon system based on YARN resources (TeraSort, WordCount, CPU, 15MB L3 cache, 7GB RAM, rescheduling Grep, Inverted Index) 320GB disk) Gigabit Ethernet via elastic containers [46]° Delay scheduling: Hadoop 0.20 Algorithms Facebook traces (text 100 nodes in Amazon EC2 (4 2 GHz core, 4 balance fairness and data implemented search, simple filtering disks, 15GB RAM, 1 Gbps links), 100 nodes locality in HDFS selection, aggregation, (8 CPU cores, 4 disks), 1 Gbps Ethernet join) [47]° Quincy: Fair Scheduling Dryad Graph-based Sort(40,160,320) GB, 243 nodes in 8 racks (16GB RAM, with locality and fairness algorithms Join (11.8, 41.8) GB, 2 2.6 GHz dual core AMD) 48-port PageRank(240) GB, Gigabit Ethernet per rack WordCount (0.1-5) GB [48]° Maestro: Replica- Hadoop 0.19.0, Heuristic GridMix (Sort, 20 virtual nodes in 
local cluster, 100 Grid5000 aware Scheduling Hadoop 0.21.0 WordCount) nodes (2 GHz dual core AMD, 2GB RAM, (2.5,1,12.5,200) GB 80GB disk) [49]° Map task scheduling to - Join Shortest Parameters from search Simulations for a 400 machines cluster balance data locality and Queue, Max Weight jobs in databases load balancing Scheduler in heavy traffic queues [50]° Resource-aware Hadoop 0.23 Scheduler and Gridmix Benchmark 22 nodes (64-bit 2.8 GHz Intel Xeon, Adaptive Scheduling job profiler (Sort,Combine,Select) 2GB RAM), Gigabit Ethernet (RAS) [51]° FLEX: flexible allocation Hadoop 0.20.0 Standalone plug-in 1.8 TB synthesized 26 nodes (3 GHz Intel Xeon, 13 4-core blades scheduling scheme or data, and GridMix2 in one rack and 13 8-core blades in second add-on module in rack FAIR [52]° HFSP: size- Pig, Job size estimator PigMix (1,10,100 20 workers with TaskTracker based scheduling Hadoop 1.x and algorithm GB and 1TB) (4 CPU cores, 8GB RAM) [53]° LsPS: self-tuning Hadoop 1.x Plug-in WordCount, Grep, EC2 m1.large (7.5GB RAM, 850GB disk), 11 two-tiers scheduler scheduler PiEstimator, Sort nodes (1 master, 10 slaves), trace-driven simulations [54]° Joint scheduling of Hadoop 1.2.0 3-approximation WordCount (43.7 GB 16 nodes EC2 VM (1 GHz CPU, MapReduce in servers algorithm, heuristic Wikipedia document 1.7GB memory, 160GB disk) package) [55]° Joint Scheduling of - Linear Synthesized workloads Event-based simulations processing and shuffle programming, heuristics [56]° Dynamic slot scheduling Hadoop 1.0.3 Dynamic algorithm Sort (3,6,9,12,15) GB 2 masters (12 CPU cores 1.9 GHz AMD, 32GB for I/O intensive jobs based RAM), 4,8,16 slaves (Intel Core i5 1.9 GHz, on I/O and CPU 8GB RAM) statistics [57]° Energy-efficient - Polynomial time Synthesized workloads MATLAB simulations Scheduling constant-factor approximation algorithm [58]° Energy efficiency for Hadoop 0.20.0 DVFS control in Sort, CloudBurst, 8 nodes (AMD Opteron quad-core 2380 with comp- source code Matrix Multiplication DVFS support, 2 64 kB L1 caches) Gigabit utation intensive and via external Ethernet workloads scheduler [59]° Energy-aware Hadoop 0.19.1 Scheduling TeraSort, Page Rank, 2 nodes (24GB RAM,16 2.4 GHz CPU cores, Scheduling algorithms k-means 1TB Disk), 2 nodes (16GB RAM,16 2.4 GHz EMRSA-I, CPU cores, 1TB Disk) EMRSA-II [60]° DynamicMR: slot Hadoop 1.2.1 Algorithms: PI- PUMA benchmark 10 nodes (Intel's X5675, 3.07 allocation in shared DHSA, PD-DHSA, GHz, 24GB RAM, 56GB disk) clusters SEPB, slot pre- scheduling [61]° PRISM: Fine-grained Hadoop 0.20.2 Phase-level Gridmix 2, 10 nodes; 1 master, 15 slaves (Quad-core Xeon resource aware scheduling PUMA benchmark E5606, 8GB RAM, 100GB disk), Gigabit scheduling algorithm Ethernet [62]° HaSTE: scheduling in Hadoop YARN dynamic WordCount (14,10.5) 8 nodes (8 CPU cores, 8GB RAM) YARN to reduce 2.2.0 programming GB, Terasort 15GB, makespan algorithm WordMean 10.5GB, PiEstimate [63]° SLA-aware energy - DVFS in the per- Sort, Matrix CloudSim-based simulations for 30 servers efficient scheduling in applications master Multiplications with network speeds between 100-200 Mbps Hadoop YARN [64]° Priority-based resource Quasit Meta-scheduler 40 million tuples of 5 nodes; 1 master, 4 workers (AMD scheduling for realistic 3 hours Athlon64 3800+, 2GB RAM) Distributed Stream vehicular traffic traces Processing Systems [65]† Power-efficient resources - Algorithms - Arena simulator allocation and mean end- to-end delay minimization [66]† Resource and Deadline- Hadoop 1.2.1 Receding horizon PUMA benchmark 
21 Virtual Machines, 1 master and aware scheduling in control algorithm, (WordCount, TeraSort, 20 slaves (1 CPU core, 2GB RAM) dynamic Hadoop clusters self-learning Grep) completion time estimator [67]† Fair4S: Modified Hadoop 0.19 Scheduler and Synthesized workloads Discrete event-based Fair Scheduler modi- generated by Ankus MapReduce simulator fication to JobTracker [68]† Deadline-Constrained Hadoop 1.2.1 Bipartite Graph WordCount, 20 virtual machines in 4 physical machines MapReduce Scheduling Modelling Sort, Grep (Quad core 3.3 GHz, 32 GB RAM, 1TB disk), to perform BGMRS Gigabit Ethernet. MATLAB simulations for scheduler 3500 nodes [69]† ARIA (Automatic Hadoop 0.20.2 Service level WordCount, Sort, 66 nodes (4 AMD CPU cores, 8GB RAM, 2 Resource Inference and Objective Bayesian classification, 160GB disks) in two racks, Gigabit Ethernet Allocation) scheduler to meet TF-IDF from Mahout, soft deadlines WikiTrends, twitter workload [70]† Scheduling for Hadoop 1.2.4 Dynamic Priority Text search, Sort, Cluster-1: 6 nodes (dual core CPU 3.2 GHz, improved response time multi-queue WordCount, Page Rank 2GB RAM, 250GB disk), Cluster-2: 10 nodes scheduler as a plug- (dual core CPU 3.2 GHz, 2GB RAM, 250GB in disk), Gigabit Ethernet [71]† Scheduling for fast - 3-approximation Synthesized workload Simulations completion in algorithms; MapReduce-like systems OFFA, Online heuristic ONA [72]† Dynamic Job Ordering Hadoop 1.x Greedy algorithms PUMA Benchmark 20 nodes EC2 Extra Large instances (4 and slot configuration based on (WordCount, Sort, CPU cores, 15GB RAM, 4 420GB disks) Johnson’s rule for Grep), synthesized 2-stage flow shop Facebook traces [73]† Reduction of idle HDFS 0.20.2, Energy Loading, Grep, Selection, 12 nodes ( Intel i5-2300, 8GB RAM, 1TB disk) energy consumption HBase 0.90.3, consumption Aggregation, join due to waiting in NoSQL Hive 0.71, model Cassandra 1.0.3, HadoopDB 0.1.1 [74]† RoPE: reduce response SCOPE Query optimizer 80 jobs from major Bing’s Production cluster (tens of thousands time of relational that business groups of 64 bit, multi-core, commodity servers) querying piggyback job execution [75]† BEneath the Task Level Hadoop 2.2.0 Algorithms to Hibench Benchmark 16 nodes in Windows Azure (1 CPU core, (BeTL) modify (WordCount, Hive 1.75GB RAM, 60GB disk), Gigabit Ethernet fine-grained checkpoint MapReduce queries) strategy workflow [76]† Job Slowdown Hadoop 1.0.0 Four algorithms; SWIM traces, 6 nodes (24 dual-quad core, 24GB RAM, Variability reduction FBQ, Grep, sort, WordCount 50TB storage), 1 Gigabit Ethernet and 20 TAGS, SITA, Gbit/s InfiBand, Mumak simulations COMP [77]† Memory-aware reduce Hadoop 1.2.1 Mnemonic PUMA benchmarks 1 node (4 CPU cores, 7GB RAM) tasks configurations mechanism (InvertedIndex, of Reduce tasks to determine Ranked InvertedIndex, number Self Join, Sequence Count, WordCount) [78]† SkewTune: mitigating Hadoop 0.21.1 Modification to job Inverted Index, 20 nodes cluster (2 GHz quad-core CPU, skew in MapReduce tracker and task PageRank, 16GB RAM, 2 750GB disks) applications tracker CloudBurst [79] † Mammoth: global Hadoop 1.0.1 Memory WordCount, Sort, 17 nodes (2 8 CPU cores 2.6 GHz Intel Xeon memory management management WordCount with E5-2670, 32GB memory, 300GB SAS disk) via the public pool Combiner *Jobs/data placement, °Jobs Scheduling, †Completion time.

D. Benchmarking Suites, Production Traces, Modelling, Profiling Techniques, and Simulators for Big Data Applications:
1) Benchmarking Suites: Understanding the complex characteristics of big data workloads is an essential step toward optimizing the configuration of the framework parameters used and identifying the sources of bottlenecks in the underlying clusters. As with many legacy applications, MapReduce and other big data frameworks are supported by several standard benchmarking suites such as [80]-[88]. These benchmarks have been widely utilized to evaluate the performance of big data applications in different infrastructures either experimentally, analytically, or via simulations, as summarized for the optimization studies in Tables I, III, V, and VI. Moreover, they can be used in production environments for initial tuning and debugging purposes, in addition to stress-testing and bottleneck analysis before the actual run of intended commercial services. The workloads contained in these benchmarks are typically defined by their semantics and run on previously collected or randomly generated datasets. Examples are text retrieval-based (e.g. Word-Count (WC), Word-Count with Combiner (WCC), and Sort) and web search-based (e.g. Grep, Inverted Index, and Page Rank) workloads. WC and WCC calculate the occurrences of each word in large distributed documents. WC and WCC differ in that the reduction is performed entirely at the reduce stage in WC, while it is done partially at the map stage with the aid of a combiner in WCC (a minimal sketch contrasting the two is given after Table II below). Sort generates alphabetically sorted output from input documents. Grep finds the matches of regular expressions (regex) in input files, while Inverted Index generates a word-to-document indexing for a list of documents. PageRank is a link analysis algorithm that measures the popularity of web pages based on their referral by other websites. In addition to the above examples, computations related to graphs and to machine learning such as k-means clustering are also considered in benchmarking big data applications. These workloads vary in being I/O, memory, or CPU intensive. For example, Grep and Sort are I/O intensive, while WC, PageRank, and k-means are CPU intensive [8]. This should be considered in optimization studies to correctly address bottlenecks and targeted resources. For a detailed review of benchmarking different specialized big data systems, the reader is referred to the survey in [317], where workloads generation techniques, input data generation, and assessment metrics are extensively reviewed. Here, we briefly describe some examples of the widely used big data benchmarks and their workloads as summarized in Table II and listed below:
1) GridMix [80]: GridMix is the standard benchmark included within the Hadoop distribution. GridMix has three versions and provides a mix of workloads synthesized from traces generated by Rumen [369] from Hadoop clusters.
2) HiBench1 [81]: HiBench is a comprehensive benchmark suite provided by Intel for big data. HiBench provides a wide range of workloads to evaluate computation speed, system throughput, and resource utilization.
3) HcBench [82]: HcBench is a Hadoop benchmark provided by Intel that includes workloads with a mix of CPU, storage, and network intensive jobs with Gamma inter-job arrival times for realistic cluster evaluations.
4) PUMA [83]: PUMA is a MapReduce benchmark developed at Purdue University that contains workloads with different computational and networking demands.
5) Hive Benchmark [84]: The Hive benchmark contains 4 queries and targets comparing Hadoop with Pig.
6) PigMix [85]: PigMix is a benchmark that contains 12 query types to test the latency and scalability performance of Apache Pig and MapReduce.
7) BigBench [86]: BigBench is an industrial-based benchmark that provides queries over structured, semi-structured, and unstructured data.
8) TPC-H [87]: The TPC-H benchmark, provided by the Transaction Processing Performance Council (TPC), allows generating realistic datasets and performing several business-oriented ad-hoc queries. Thus, it can be used to evaluate NoSQL and RDBMS scalability, processing power, and throughput.
9) YCSB2 [88]: The Yahoo Cloud Serving Benchmark (YCSB) targets testing the inserting, reading, updating, and scanning operations in database-like systems. YCSB contains 20 records data sets and provides a tool to create the workloads.

TABLE II
EXAMPLES OF BENCHMARKING SUITES FOR BIG DATA APPLICATIONS AND THEIR WORKLOADS
Benchmark | Application | Workloads
GridMix [80] | Hadoop | Synthetic loadjob, Synthetic sleepjob
HiBench [81] | Hadoop, SQL, Kafka, Spark Streaming | Micro Benchmarks (Sort, WordCount, TeraSort, Sleep, enhanced DFSIO to test HDFS throughput), Machine Learning (Bayesian Classification, k-means, Logistic Regression, Alternating Least Squares, Gradient Boosting Trees, Linear Regression, Latent Dirichlet Allocation, Principal Components Analysis, Random Forest, Support Vector Machine, Singular Value Decomposition), SQL (Scan, Join, Aggregate), Websearch Benchmarks (PageRank, Nutch indexing), Graph Benchmark (NWeight), Streaming Benchmarks (Identity, Repartition, Stateful WC, Fixwindow)
HcBench [82] | Hadoop, Hive, Mahout | Telco-CDR (Call Data Records) interactive queries, Hive workloads (PageRank URLs, aggregates by source, average of PageRank), k-means clustering iterative jobs in machine learning, and Terasort
PUMA [83] | Hadoop | Micro Benchmarks (Grep, WordCount, TeraSort), term vector, inverted index, self-join, adjacency-list, k-means, classification, histogram, histogram ratings, sequence count, ranked inverted index
Hive [84] | Hive | Grep selection, Ranking selection, user visits aggregation, user visits join
PigMix [85] | Pig | Explode, fr join, join, distinct agg, anti-join, large group by key, nested split, group all, order by 1 field, order by multiple fields, distinct + union, multi-store
BigBench [86] | RDBMS, NoSQL | Business queries (cross-selling, customer micro-segmentation, sentiment analysis, enhancing multi-channel customer experience, assortment and pricing optimization, performance transparency, return analysis, inventory management, price comparison)
TPC-H [87] | RDBMS, NoSQL | Ad-hoc queries for New Customer Web Service, Change Payment Method, Create Order, Shipping, Stock Management Process, Order Status, New Products Web Service Interaction, Product Detail, Change Item
YCSB [88] | Cassandra, HBase, Yahoo's PNUTS | Update heavy, Read heavy, Read only, Read latest, Short ranges
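To make the WC/WCC distinction described earlier concrete, the following sketch (plain Python rather than a Hadoop job) shows how a map-side combiner shrinks the intermediate data that would otherwise be shuffled to the reducers, while leaving the final counts unchanged.

```python
# Sketch of WordCount with and without a map-side combiner (illustrative only).
from collections import Counter
from itertools import chain

def map_wc(line):
    # WC: emit one (word, 1) pair per occurrence; all pairs are shuffled.
    return [(w, 1) for w in line.split()]

def map_wcc(line):
    # WCC: combine locally, so at most one pair per distinct word is shuffled.
    return list(Counter(line.split()).items())

def reduce_counts(pairs):
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

lines = ["big data big clusters", "big data centers and networks"]
wc_pairs = list(chain.from_iterable(map_wc(l) for l in lines))
wcc_pairs = list(chain.from_iterable(map_wcc(l) for l in lines))
print("shuffled pairs:", len(wc_pairs), "vs", len(wcc_pairs))
print(reduce_counts(wc_pairs) == reduce_counts(wcc_pairs))   # same final result
```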

2) Production Traces:

1 Available at: https://github.com/intel-hadoop/hibench
2 Available at: https://github.com/brianfrankcooper/YCSB
As the above-mentioned benchmarks are defined by their semantics, where the code functionalities are known and the jobs are submitted by a single user to run on deterministic datasets, they might be incapable of fully representing production environment workloads where realistic mixtures of workloads with different data sizes and inter-arrival times coexist for multiple users. Information about such realistic workloads can be provided in the form of traces collected from previously submitted jobs in production clusters. Although sharing such information for production environments is hindered by confidentiality, legal, and business restrictions, several companies have provided archived job traces while normalizing the resources usage. Examples of realistic evaluations based on publicly available production traces were conducted by Yahoo [89], Facebook [90], [91], Cloudera and Facebook [92], Google [93], [94], IBM [95], and in clusters with scientific [96] or business-critical workloads [97]. These traces describe various job features such as the number of tasks, data characteristics (i.e. input, shuffle, and output data sizes and their ratios), and completion time, in addition to job inter-arrival times and resources usage, without revealing information about the semantics or users' sensitive data. Then, synthesized workloads can be generated by running dummy codes that artificially generate shuffle and output data sizes matching the trace on randomly generated data. To accurately capture the characteristics in the trace while running in testing clusters with a scale smaller than production clusters, effective sampling should be performed. Traces based on ten months of log files for data-intensive MapReduce jobs running in the M45 supercomputing production Hadoop cluster for Yahoo were characterized in [89] according to utilization, job patterns, and sources of failures. Facebook traces collected from 600 nodes over 6 months and Yahoo traces were utilized in [90] to provide comparisons and insights for MapReduce jobs in production clusters. Both traces were classified by k-means clustering according to the number of jobs, input, shuffle, and output data sizes, map and reduce tasks, and job durations into 10 and 8 bins for the Facebook and Yahoo traces, respectively. Both traces indicated input data sizes ranging between kBytes and TBytes. The workloads are labeled as small jobs, which constitute most of the jobs, load jobs with only map tasks, in addition to expand, aggregate, and transformation jobs, based on input, shuffle, and output data sizes. A framework that properly samples the traces to synthesize representative and scaled-down workloads for use in smaller clusters was proposed, and sleep requests in Hadoop were utilized to emulate the inter-arrivals of jobs. Facebook traces from 3000 machines with a total data size of 12 TBytes for MapReduce workloads with significant interactive analysis were utilized in [91] to evaluate the energy efficiency of allocating more jobs to idle nodes in interactive job clusters. Six Cloudera traces from e-commerce, telecommunications, media, and retail users' workloads and Facebook traces collected over a year for 2 million jobs were analyzed in [92]. Several insights for production jobs, such as the weekly time series and the burstiness of submissions, were provided.
These traces are available in a public repository which also contains a synthesizing tool, SWIM3, which is integrated with Hadoop. Although the previously-mentioned traces contain rich information about various job characteristics, the lack of per-job resources utilization information makes them only partially representative for estimating workload resource demands. Google workload traces were characterized in [93] and [94] using k-means clustering based on their duration and per-task CPU and memory usage to aid in capacity planning, forecasting demand growth, and improving task scheduling. Insights such as "most tasks are short", "the duration of long and short tasks follows a bimodal distribution", and "most of the resources are taken by a few intensive jobs" were provided. The traces4 were collected from a 12k-machine cluster in 2011 and include information about scheduling requests, taken actions, task submission times and normalized resources usage, and machine availability [94]. However, disk and networking resources were not covered. The traces collected from IBM-based private clouds for banking, communication, e-business, production, and telecommunication industries in [95] further considered disk and file system usage in addition to CPU and memory. The inter-dependencies between the resources were measured, and disk and memory resources were found to be negatively correlated, indicating the potential benefit of co-locating memory-intensive and disk-intensive tasks. A user-centric study was conducted in [96] based on traces5 collected from three research clusters; OPENCLOUD, M45, and WEB MINING. Workloads, configurations, in addition to resources usage and sharing information were used to address the gaps between data scientists' needs and systems design. Evaluations of preferred applications and the penalties of using default parameters were provided.
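Trace-driven workload synthesis of the kind performed by the SWIM-style tooling above, i.e. sampling job sizes from a trace and emulating inter-arrival times, can be sketched as follows; the trace entries, scaling factor, and arrival rate are hypothetical placeholders rather than any published trace format.

```python
# Minimal sketch of synthesizing a scaled-down workload from a job trace
# (illustrative; trace entries and the arrival rate are hypothetical).
import random

trace = [  # (input_GB, shuffle_GB, output_GB) per job, as reported in a trace
    (0.001, 0.0005, 0.0001), (0.2, 0.05, 0.01), (150.0, 40.0, 5.0), (2.0, 0.7, 0.1),
]

def synthesize(num_jobs, scale=0.1, mean_interarrival=30.0, seed=1):
    """Sample jobs from the trace, scale data sizes down for a small test
    cluster, and draw exponential (Poisson-process) inter-arrival times."""
    rng = random.Random(seed)
    t = 0.0
    workload = []
    for _ in range(num_jobs):
        t += rng.expovariate(1.0 / mean_interarrival)
        inp, shuf, out = rng.choice(trace)
        workload.append({"submit_s": round(t, 1),
                         "input_GB": inp * scale,
                         "shuffle_GB": shuf * scale,
                         "output_GB": out * scale})
    return workload

for job in synthesize(5):
    print(job)
```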

3 Available at: https://github.com/SWIMprojectucb/swim/wiki
4 Available at: https://github.com/google/cluster-data
5 Available at: http://www.pdl.cmu.edu/HLA/
Two large-scale and long-term traces6 collected from distributed data centers for business-critical workloads, such as financial simulators, were utilized in [97] to provide basic statistics, correlations, and time-pattern analysis. Full characteristics for the provisioned and actual usage of CPU, memory, disk I/O, and network I/O throughput resources were presented. However, no information about the inter-arrival times was provided. The Ankus MapReduce workload synthesizer was developed as part of the study in [67] based on e-commerce traces collected from Taobao, which is a 2k-node production Hadoop cluster. The inter-arrival times in the traces were found to be Poisson.
3) Modelling and Profiling Techniques: Statistical-based characterization and modelling were considered for big data workloads based on production cluster traces as in [98], or benchmarks as in [99] and [101]. Also, different profiling and workload modelling studies such as [102]-[105] were conducted with the aim of automating cluster configurations or estimating different performance metrics, such as the completion time, based on the selected configurations and resources availability. A statistical-driven workloads generator was developed in [98] to evaluate the energy efficiency of MapReduce clusters under different scales, configurations, and scheduling policies. A framework to ease publishing production traces anonymously based on inter-arrival times, job mixes, resources usage, idle time, and data sizes was also proposed. The framework provides non-parametric statistics such as the averages and standard deviations and five-number percentile summaries (1st, 25th, 50th, 75th, and 99th) for inter-arrival times and data sizes. Statistical modelling for GridMix, Hive, and HiBench workloads based on principal component analysis for 45 metrics and regression models was performed in [99] to provide performance estimations for Hadoop clusters under different workloads and configurations. BigDataBench7 was utilized in [100] to examine the performance of 11 representative big data workloads in modern superscalar out-of-order processors. Correlation analysis was performed to identify the key factors that affect the Cycles Per Instruction (CPI) count for each workload. Keddah8 in [101] was proposed as a toolchain to profile, empirically characterize, and reproduce Hadoop traffic based on end-host or switch traffic measurements. Flow-level traffic models can be derived by Keddah for use with network simulators with varied settings that affect networking requirements such as the replication factor, cluster size, split size, and number of reducers. Results based on TeraSort, PageRank, and k-means workloads in dedicated and cloud-based clusters indicated high correlation with Keddah-based simulation results. To automate Hadoop cluster configurations, the authors in [102], [103] developed a cost-based optimization that utilizes an online profiler and a What-if Engine. The profiler uses the BTrace Java-based dynamic instrumentation tool to collect job profiles at run time for the data flows (i.e. input data and shuffle volumes) and to calculate the cost based on the program, input data, resources, and configuration parameters at task granularity.
The What-if Engine contains a cost-based optimizer that utilizes a task scheduler simulator and model-based optimization to estimate the costs if different combinations of cost variables are used. This suits just-in-time configuration of the computations that run periodically in production clusters, where the profiler can trial the execution on a small subset of the cluster to obtain the cost function, and the What-if Engine then obtains the best configurations to be used for the rest of the workload. In [104], a Hadoop performance model is introduced. It estimates the completion time and resources usage based on Locally Weighted Linear Regression and Lagrange Multipliers, respectively. The non-overlapped and overlapped phases of shuffling and the number of reduce waves were carefully considered. Polynomial regression is used in [105] to estimate the CPU usage in clock cycles for MapReduce jobs based on the number of map and reduce tasks.
4) Simulators: Tuning big data applications in clusters requires time-consuming and error-prone evaluations for a wide range of configurations and parameters. Moreover, the availability of a dedicated cluster or an experimental setup is not always guaranteed due to their high deployment costs. To ease tuning the parameters and to study the behaviour of big data applications in different environments, several simulation tools were proposed [106]-[119], and [148]. These simulators differ in their engines, scalability, flexibility with parameters, level of detail, and the support for additional features such as multiple disks and data skew. Mumak [106] is an Apache Hadoop simulator included in its distribution to simulate the behaviour of large Hadoop clusters by replaying previously generated traces. A built-in tool, Rumen [369], is included in Hadoop to generate these traces by extracting previous jobs' information from their log files.

6 Available at: http://gwa.ewi.tudelft.nl/datasets/Bitbrains
7 Available at: http://prof.ict.ac.cn/BigDataBench
8 Available at: https://github.com/deng113jie/keddah
Rumen collects more than 40 properties of the tasks. In addition, it provides the topology information to Mumak. However, Mumak simplifies the simulations by assuming that the reduce phase starts only after the map phase finishes; thus, it does not provide accurate modelling for shuffling and provides only rough completion time estimations. SimMR is proposed in [107] as a MapReduce simulator that focuses on modelling different resource allocation and scheduling approaches. SimMR is capable of replaying real workload traces as well as synthetic traces based on the statistical properties of the workloads. It relies on a discrete event-based simulator engine that accurately emulates Job Tracker decisions in Hadoop for map/reduce slot allocation, and a pluggable scheduling policy engine that simulates decisions based on the available resources. SimMR was tested on a 66-node cluster and was found to be more accurate and two orders of magnitude faster than Mumak. It simplifies the node modelling by assuming several cores but only one disk. MRSim9 is a discrete event-based MapReduce simulator that relies on SimJava and GridSim to test workload behaviour in terms of completion time and utilization [108]. The user provides cluster topology information and job specifications such as the number of map and reduce tasks, the data layout (i.e. locations and replication factor), and algorithm descriptions (i.e. number of CPU instructions per record and average record size) for the simulations. MRSim focuses on modelling multi-cores, a single disk, and network traffic, in addition to memory, buffers, merge, parallel copy, and sort parameters. The results of MRSim were validated on a single-rack cluster with four nodes. The authors in [109] proposed MR-cloudsim as a simulation tool for MapReduce in cloud computing data centers. MR-cloudsim is based on the widely used open-source cloud systems event-driven simulator, CloudSim [110]. In MR-cloudsim, several simplifications, such as assigning one reduce per map and allowing the reduce phase to start only after the map phase finishes, are assumed. To assist with MapReduce cluster design and testing, MRPerf10 is proposed in [111] to provide fine-grained simulations while focusing on modelling the activities inside the nodes, the disk and data parameters, and the inter- and intra-rack network configurations. MRPerf is based on Network Simulator-2 (ns-2)11, which is a packet-level simulator, and DiskSim12, which is an advanced disk simulator. A MapReduce heuristic is used to model Hadoop behaviour and perform scheduling. In MRPerf, the user provides the topology information, the data layout, and the job specifications, and obtains a detailed phase-level trace that contains information about the completion time and the volume of transferred data. However, MRPerf needs several minutes per evaluation and has the limitation of modelling only one replica and a single disk per node. Also, it does not enable speculative execution modelling and simplifies I/O and computation processes by not overlapping them. MRemu13 in [112] is an emulation-based framework based on Mininet 2.0 for MapReduce performance evaluations in terms of completion time in different data center networks. The user can emulate arbitrary network topologies and assign bandwidth, packet loss ratio, and latency parameters.
Moreover, network control based on SDN can be emulated. A layered simulation architecture, CSMethod, is proposed in [113] to map the interactions between different software and hardware entities at the cluster level with big data applications. Detailed models for JVMs, the name nodes, data nodes, JT, TT, and scheduling were developed. Furthermore, diverse hardware choices such as components, specifications, and topologies were considered. Similar approaches were also used in [114] and [115] to simulate NoSQL and Hive applications, respectively. As part of their study, the authors in [136] developed a network flow-level discrete-event simulator named PurSim to aid with simulating MapReduce executions in up to 200 virtual machines. A few recent articles considered the modelling and simulation of YARN environments [116]-[119]. The Yarn Scheduler Load Simulator (SLS) is included in the Hadoop distribution to be used with Rumen to evaluate the performance of different YARN scheduling algorithms with different workloads [116]. SLS utilizes a single JVM to exercise a real RM with thread-based simulators for the NM and AM. However, SLS ignores simulating the network effects as the NM and AM simulators interact with the RM only via heartbeat events. The authors in [117] suggested an extension of SLS with an SDN-based network emulator, MaxiNet, and a data center traffic generator, DCT2, to add realistic modelling for network and traffic in YARN environments and to emulate the interactions between job scheduling and flow scheduling. The Real-Time ABS language is utilized in [118] to develop the ABS-YARN simulator that focuses on prototyping YARN and modelling job executions.

9 Available at: http://code.google.com/p/mrsim
10 Available at: https://github.com/guanying/mrperf
11 Available at: http://www.isi.edu/nsnam/ns
12 Available at: http://www.pdl.cmu.edu/DiskSim/
13 Available at: https://github.com/mvneves/mremu
YARNsim in [119] is a parallel discrete-event simulator for YARN that provides comprehensive protocol-level accuracy simulations for task executions and data flow. Detailed modelling of networking, HDFS, data skew, and I/O read and write latencies is provided. However, YARNsim simplifies scheduling policies and fault tolerance. A comprehensive set of Hadoop benchmarks, in addition to bioinformatics clustering applications, were utilized for experimental validation and an average error of 10% was achieved.
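At their core, trace-replaying simulators such as SimMR and MRSim combine a slot pool, task durations taken from a trace or profile, and an event loop that assigns waiting tasks to free slots. The toy sketch below captures only that core (it deliberately ignores shuffling, networking, data skew, and failures, which the surveyed simulators model in far more detail, and assumes at least one slot of each type).

```python
# Toy discrete-event simulator for a single MapReduce job (illustrative only:
# no shuffle, network, or failure modelling, unlike the simulators surveyed).
import heapq

def simulate(map_durations, reduce_durations, map_slots, reduce_slots):
    clock, events = 0.0, []          # events: (finish_time, slot_type)
    pending = [("map", d) for d in map_durations]
    reduces = [("reduce", d) for d in reduce_durations]
    free = {"map": map_slots, "reduce": reduce_slots}
    maps_left = len(map_durations)

    while pending or events:
        # Launch as many waiting tasks as there are free slots.
        still_waiting = []
        for kind, dur in pending:
            if free[kind] > 0:
                free[kind] -= 1
                heapq.heappush(events, (clock + dur, kind))
            else:
                still_waiting.append((kind, dur))
        pending = still_waiting
        # Advance to the next task completion.
        clock, kind = heapq.heappop(events)
        free[kind] += 1
        if kind == "map":
            maps_left -= 1
            if maps_left == 0:       # reduce phase starts after the map phase
                pending.extend(reduces)
    return clock

print(simulate(map_durations=[10] * 12, reduce_durations=[30] * 4,
               map_slots=4, reduce_slots=2))   # -> 90.0 for this toy job
```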

IV. CLOUD COMPUTING SERVICES, DEPLOYMENT MODELS, AND RELATED TECHNOLOGIES:
Cloud computing aims to enable seamless access for multiple users or tenants to a pool of computational, storage, and networking resources that typically reside within and between several geographically distributed data centers. Unlike traditional IT services that are limited by localized resources inaccessible by remote computing units, cloud computing services allow dynamic outsourcing to software and/or hardware resources. Hence, they can provide scalable and large-scale computational solutions while increasing the resources utilization. Moreover, cloud computing considerably reduces both the capital expenditure (CAPEX) and operational expenses (OPEX) of software and hardware and increases the resilience. Consequently, it is continuing to encourage wide deployments of large-scale Internet-based services by various organizations and enterprises [370]. Although MapReduce and many other big data frameworks were originally provisioned for use in local clusters under controlled environments, there is an increasing number of cloud computing-based big data application realizations and services despite the incurred overheads and challenges. This Section provides a brief overview of cloud computing concepts and discusses the challenges and implications of deploying big data applications in cloud computing environments. Moreover, it reviews some related emerging technologies that support computing and networking resource provisioning in cloud computing infrastructures and can hence improve the performance of big data applications. These related technologies aim mainly to virtualize and softwarize networking and computing systems at different levels to provide agile and flexible future-proof solutions. The rest of this Section is organized as follows: Subsection IV-A reviews cloud computing services and deployment models while Subsection IV-B reviews machine and network virtualization, Network Function Virtualization (NFV), containers, and Software-Defined Networking (SDN) technologies. Subsection IV-C discusses the requirements and the challenges of deploying big data applications in cloud computing infrastructures, while Subsection IV-D illustrates the options for deploying big data applications in geo-distributed clouds. Finally, Subsection IV-E presents the implications of big data on cloud networking infrastructures.
A. Cloud Computing Services and Deployment Models:
The ability to share hardware, software, development platforms, network, or applications resources between multiple users enables cloud computing systems to provide what can be described as anything-as-a-service (XaaS). Cloud computing services can be categorized according to the outsourced resources and end-users' privileges into Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS) [371]. SaaS provides on-demand Internet-based services and applications to end-users without providing the privilege of controlling or accessing the hardware, network, or development platform resources. Examples of SaaS are Salesforce, Google Apps, and Microsoft Office 365. In the PaaS model, end-users have access privileges to the platform which enables them to develop, control, and upgrade their own cloud applications, but not to the underlying hardware. Thus, more flexibility is provided without the need for owning, operating, and maintaining the hardware.
Microsoft Azure, AWS Elastic Beanstalk, and Google App Engine are examples of PaaS provided as a pay-as-you-go service. The IaaS model provides end-users with extra provisioning privileges that allow them to fully control the hardware, the operating systems, and the applications development platforms [372]. Examples of IaaS are Amazon Elastic Compute Cloud (EC2), Rackspace, and Google Compute Engine. Based on their accessibility and ownership, cloud infrastructures have four deployment models which are public, private, hybrid, and community clouds [313]. Providers of public clouds allow users to rent cloud resources so they can accelerate the launching of their businesses, services, and applications. The advantages include the elimination of CAPEX and OPEX costs, and the reduction of risks especially for start-up companies. The disadvantage is however the lack of fine-grained control over data storage, network, or security settings. In the private cloud model, the infrastructure is exclusive for use by the owners. Although the costs are high, the full control of the infrastructure typically leads to guaranteed performance, reliability, and security metrics. In the hybrid clouds model, a private cloud may offload or burst some of its workload to a public cloud due to exceeding its capacity, for costs reduction, or to achieve other performance metrics [121], [142]. Hybrid clouds combine the benefits of both private and public clouds but at the expense of additional controlling overhead. Finally, community clouds are owned by several organizations to serve customers with shared and specific interests and needs [313].
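The bursting behaviour of the hybrid model described above can be illustrated with a small sketch (the capacities and the public-cloud price are hypothetical): requests are kept in the private cloud while capacity remains, and the overflow is sent to the public cloud.

```python
# Minimal sketch of a hybrid-cloud bursting decision (illustrative only;
# capacities and the cost figure are hypothetical).
def place_requests(requests_cores, private_capacity, public_cost_per_core_hour=0.05):
    private_used, public_cores = 0, 0
    for cores in requests_cores:
        if private_used + cores <= private_capacity:
            private_used += cores          # keep the workload in the private cloud
        else:
            public_cores += cores          # burst the overflow to the public cloud
    return {"private_used": private_used,
            "public_cores": public_cores,
            "hourly_public_cost": public_cores * public_cost_per_core_hour}

print(place_requests([16, 32, 8, 64, 24], private_capacity=96))
```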

B. Cloud Computing and Big Data Related Technologies: 1) Machine Virtualization: Machine virtualization abstracts the CPU, memory, network, and disk resources and provides isolated interfaces to allow several Virtual Machines (VMs), also called instances, to coexist in the same physical machine while having different Operating Systems (OS) [373]. Cloud computing relies heavily on machine virtualization as it allows dynamic resources sharing among different workloads, applications, or tenants while isolating them [374]. Moreover, VMs are widely considered in the billing structures of pay-as-you-go cloud services as they can be leased with different prices based on pre-defined sets of resources and configurations. For example, IaaS is typically provided in the form of on-demand VMs such as on EC2 [375] and Google Compute engine [376]. Examples of EC2 offerings are the general purpose t2 VMs with nano, micro, small, medium, large, xlarge, and 2xlarge options, in addition to compute, memory, or storage optimized offers [375]. Google Compute engine offers standard machines (e.g. n1-standard-64), high memory, and high CPU machines [376]. Figure 7(a) illustrates the components of a physical machine with VMs.

Fig. 7. Physical machines with (a) VMs and (b) containers.

Virtualization increases the utilization and energy efficiency of cloud infrastructures as, with proper management, VMs can be efficiently assigned and seamlessly migrated while running, and can thus be consolidated into fewer physical machines. Then, more machines can be switched to low-power or sleep modes. Managing and monitoring VMs in cloud data centers have been realized through several management platforms (i.e. hypervisors) such as Xen [377], KVM [378], and VMware [379]. Also, virtual infrastructure management tools such as OpenNebula and VMware vSphere were introduced to support the management of virtualized resources in heterogeneous environments such as hybrid clouds [374]. However, compared to "bare-metal" implementations (i.e. native use of physical machines), virtualization can add performance overheads related to the additional software layer, memory management requirements, and the virtualization of I/O and networking interfaces [380], [381]. Moreover, additional networking overheads are encountered when migrating VMs between several clouds for load balancing and power saving purposes through commercial or dedicated inter-data center networks [382]. These migrations, if done concurrently by different service providers, can lead to increased network congestion, exceeded delay limits, and hence service performance fluctuations and violations of SLAs [383]. In the context of VMs, several studies considered optimizing their placements and resource allocation to improve big data applications performance. Also, overcoming the overheads of using VMs with big data applications is considered, as detailed in Subsection V-B.
2) Network Virtualization: To complement the benefits of the mature VM technologies used in cloud computing systems, virtualizing networking resources has also been considered [384], [385]. Network virtualization enables efficient use of cloud networking resources by allowing multiple heterogeneous Virtual Networks (VNs), also known as slices, composed of virtual links and nodes (i.e. switches and routers), to coexist on the same physical network (substrate network). As with VMs, VNs can be created, updated, migrated, and deleted upon need and hence allow customizable and scalable on-demand allocations. The assignment and mapping of the VNs onto the physical network resources can be performed through offline or online Virtual Network Embedding (VNE) algorithms [386]. VNE can target different goals such as maximizing revenue and resilience through effective link remapping and virtual node migrations, in addition to energy efficiency by consolidating over-provisioned resources [387], [388]. Figure 8 illustrates the concept of VNs and VNE where three different VNs share the physical network. Several challenges can be encountered with network virtualization due to the large scale and the heterogeneous and autonomous nature of cloud networking infrastructures [384]. Also, different infrastructure providers (InPs) can have conflicting goals and non-unified QoS measures for their network services.
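The VM consolidation idea mentioned above is often illustrated with bin-packing heuristics such as first-fit decreasing; the sketch below is such a toy (the VM demands are hypothetical, and production placement additionally weighs migration cost, SLAs, and network load).

```python
# Minimal sketch of VM consolidation via first-fit decreasing bin packing
# (illustrative; real placement also considers migration cost, SLAs, and traffic).
def consolidate(vm_demands, host_capacity):
    hosts = []                                    # each host is a list of VM demands
    for demand in sorted(vm_demands, reverse=True):
        for host in hosts:
            if sum(host) + demand <= host_capacity:
                host.append(demand)               # fits on an already-active host
                break
        else:
            hosts.append([demand])                # otherwise power on a new host
    return hosts

vms = [0.5, 0.2, 0.7, 0.3, 0.4, 0.1, 0.6]         # normalized CPU demands (hypothetical)
active_hosts = consolidate(vms, host_capacity=1.0)
print(len(active_hosts), "active hosts:", active_hosts)
```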

Fig. 8. The concept of virtual networks and VNE.

3) Network Function Virtualization (NFV): Network Function Virtualization (NFV) is the detachment of networking functions (NFs) from their proprietary equipment (i.e. middleboxes and appliances) [389]-[393]. Various NFs such as routing, load balancing, firewalls, multimedia caching, QoS monitoring, gateways, proxies, Deep Packet Inspection (DPI), Dynamic Host Configuration Protocol (DHCP), and Network Address Translation (NAT), in addition to mobile networking and telecom operator services such as BaseBand Units (BBUs) and Mobility Management Entities (MMEs), can then be provided as software instances virtualized in data centers, servers, or high capacity edge networking devices [394], [395]. The state-of-the-art implementations of such functions in proprietary hardware are characterized by relatively high OPEX and CAPEX, and slow per-device upgrades and development cycles. NFV allows rapid development of each NF independently, which supports the release and upgrade of new and existing networking and telecom services in a cost-effective, scalable, and agile manner. Such virtualized NF instances can be created, replicated, terminated, and migrated as needed to elastically improve the handling of the evolving functions and the increasing big data traffic to be acquired, transferred, cached, or processed by those functions. Moreover, NFV can increase the energy efficiency, for example by consolidating and scaling the resource usage of BBUs in cloud environments and virtualized MME pools in base stations [396]-[399]. A key effort to standardize NFV was initiated in 2012 by the European Telecommunications Standards Institute (ETSI), which was selected by seven leading telecom operators to provide the requirements, deployment recommendations, and unified terminologies for NFV. According to ETSI, the components of a Virtual Network Function (VNF) architecture are illustrated in Figure 9 and are listed below [393], [394]:
• The networking services which are delivered as VNFs.
• The NFV Infrastructure (NFVI) composed of the software and hardware to virtualize the computing and networking infrastructure required to realize and host the VNFs.
• The Management and Orchestration (MANO) which manages the life-cycle of VNFs and their resources usage.
NFV aided by cloud computing infrastructures is considered a key enabler for future 5G networks that target innovation, agility, and programmability in networking services for multi-tenant usage with heterogeneous requirements and demands [393], [400]. To achieve this, telecom operators are required to transform their infrastructure design and management, for example by integrating VNFs with Cloud Radio Access Networks (C-RANs) in the front-haul network (e.g. virtualized BBUs that support multiple Remote Radio Heads (RRHs) and MMEs) [401], [402], and with virtualized Evolved Packet Core networks (vEPC) in the back-haul network [390], [391]. Moreover, and to provide isolated and unified end-to-end network slices in 5G, wireless and spectrum resources in access [403] and wireless networks [404], [405] are also virtualized and integrated with cloud infrastructures and VNFs. Among the challenges with NFV deployments are the need for the coexistence of NFV-enabled and legacy equipment, and the sensitivity of VNFs to the underlying NFVI specifications [406].
Further challenges are encountered in the optimized allocation of NFVI resources to VNFs while keeping the cost and energy consumption low, and in the composition of end-to-end chained services where traffic has to pass through an ordered set of VNFs to deliver the required functionality [394].

Fig. 9. The concept of NFV and the ETSI-NFV architecture [393].
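The service chaining challenge above can be made concrete with a minimal, hypothetical sketch (not an ETSI MANO implementation): an ordered chain of VNFs is greedily assigned to NFVI nodes with sufficient spare CPU. All function names, demands, and capacities below are invented for illustration only.

```python
# Minimal greedy placement of an ordered VNF chain onto NFVI nodes.
# All names, CPU demands, and capacities are hypothetical examples.
def place_chain(chain, node_capacity):
    """Assign each (vnf, cpu_demand) in order to the first node with spare CPU."""
    placement, remaining = {}, dict(node_capacity)
    for vnf, cpu in chain:
        for node, free in remaining.items():
            if free >= cpu:
                placement[vnf] = node
                remaining[node] -= cpu
                break
        else:
            raise RuntimeError(f"no NFVI capacity left for {vnf}")
    return placement

# Example chain: firewall -> DPI -> NAT, placed over three NFVI nodes.
chain = [("firewall", 2), ("dpi", 4), ("nat", 1)]
print(place_chain(chain, {"node-a": 4, "node-b": 4, "node-c": 4}))
```

A production orchestrator would additionally route the inter-VNF links, enforce the chain order across them, and account for cost and energy objectives, all of which this sketch deliberately omits.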

4) Container-based Virtualization: Container-based virtualization is a recently introduced technology for cloud computing infrastructures proposed to reduce the overheads of VMs, as containers can help provide up to five times better performance [407]. While hypervisors perform the isolation at the hardware level and require an independent guest OS for each VM, containers perform the isolation at the OS level and can thus be regarded as a lightweight alternative to VMs [408]. In each physical machine, the containers share the OS kernel and provide isolation by having different user spaces, for example by using Linux kernel containment (LKC) user-space interfaces. Figure 7 compares the components of physical machines when hosting VMs (Figure 7(a)) and containers (Figure 7(b)). In addition to sharing the OS, containers can also share the binary and library files of the applications running on them. With these features, containers can be deployed, migrated, restarted, and terminated faster than VMs and can be deployed in larger numbers on a single machine. However, due to sharing the OS, containers can be less secure than VMs. To improve their security, containers can be deployed inside VMs and share the guest OS [409], at the cost of reduced performance. Some examples of Linux-based container engines are Linux-VServer, OpenVZ, and Linux Containers (LXC). The performance isolation, ease of resource management, and overheads of these systems for MapReduce clusters have been addressed in [410], and near bare-metal performance was reported. Docker [407] is an open-source and widely-used container manager that extends LKC with kernel and application APIs within the containers [411]. Docker has been used to launch the containers within YARN (e.g. in Hadoop 2.7.2) to provide better software environment assembly and consistency, and elasticity in assigning resources to different components within YARN (e.g. map or reduce containers) [412]. Besides YARN, other cloud resource management platforms, such as Mesos and Quasar, have successfully adopted containers for resource management [160]. Similar to YARN, V-Hadoop was proposed in [413] to enable the use of Linux containers for Hadoop. V-Hadoop scales the number of containers used according to resource usage and availability in cloud environments.

5) Software-defined Networking (SDN): Software-defined networking (SDN) is an evolving networking paradigm that separates the control plane, which generates the rules for where and how data packets are forwarded, from the data plane, which handles the packets received at the device according to agreed rules [414]. Legacy networks, as depicted in Figure 10(a), contain networking devices with vendor-specific integrated data and control planes that are hard to update and scale, while software-defined networks introduce a broader level of programmability and flexibility in networking operations and provide a unified view of the entire network. SDN architectures can have centralized or semi-centralized controllers distributed across the network [415] to monitor and operate networking devices such as switches and routers while treating them as simple forwarding elements [416], as illustrated in Figure 10(b). Software-defined networks have three main layers, namely the infrastructure, control, and application layers.
The infrastructure layer, which is composed of SDN-enabled devices, interacts with the control layer through a Southbound Interface (SBI), while the application layer connects to the control layer through a Northbound Interface (NBI) [415], [417]. Several protocols for the SBI, such as OpenFlow, were developed as an effort to standardize and realize SDN architectures [418]-[420]. OpenFlow performs the switching at the flow granularity (i.e. a group of sequenced packets with a common set of header fields), where each forwarding element contains a flow table that receives rule updates dynamically from the SDN controllers. Examples of available centralized OpenFlow controllers are NOX, POX, Trema, Ryu, Floodlight, Beacon, and Maestro, while examples of distributed controllers are Onix, ONOS, and HyperFlow [420]-[422]. Software-based OpenFlow switches such as Open vSwitch (OVS) have also been introduced to enable SDN in virtualized environments and to ease the interactions between hypervisors, container engines, various application elements, and software-based SDN controllers while connecting several physical machines [423]. An additional tool that aids SDN control at a finer granularity is the Programming Protocol-independent Packet Processors (P4) high-level language, which enables arbitrary modifications to packets in forwarding elements at line rate [424]. The flexibility and agility benefits that result from adopting SDN in large-scale networks have been experimentally validated in several testbeds [425] and in commercial Wide Area Networks (WANs) that interconnect geo-distributed cloud data centers, such as Google's B4 WAN [426] and Microsoft's Software-driven WAN (SWAN) [427]. SDN in WANs supports various Traffic Engineering (TE) operations and improves congestion management, load balancing, and fault tolerance. Also, SDN relaxes the over-provisioning requirements of traditional WANs, which are typically kept at 30-60% utilization to preserve resilience and performance [421], [422]. The programmability of SDN enables fine-grained integration of adaptive routing and flow aggregation protocols with additional resource allocation and energy-aware algorithms across different network layers [428]. This allows dynamic configuration and ease of implementation for networking applications with scalability requirements and dynamic traffic. These concepts have found widespread use in cloud computing services and big data applications [429], [430], NFV [391]-[393], and wireless networks [405], in addition to intra data center networking [421], [431], as will be discussed in Subsection VI-E.

Fig. 10. Architectures of (a) legacy and (b) SDN-enabled networks.
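The SBI interaction can be made concrete with a minimal sketch for the Ryu controller mentioned above (one of several possible controllers). The application below simply installs a low-priority table-miss rule that floods packets, only to illustrate how a controller pushes OpenFlow flow-table entries to forwarding elements; it is run with ryu-manager and is not a production controller application.

```python
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class MinimalHub(app_manager.RyuApp):
    """Install a single table-miss rule that floods packets on each switch."""
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def on_switch_features(self, ev):
        dp = ev.msg.datapath
        parser, ofp = dp.ofproto_parser, dp.ofproto
        match = parser.OFPMatch()                        # match every packet
        actions = [parser.OFPActionOutput(ofp.OFPP_FLOOD)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        # Push the flow-table entry to the forwarding element over the SBI.
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=0,
                                      match=match, instructions=inst))
```

Real controller applications would instead compute per-flow rules (e.g. learning-switch forwarding or TE policies) and update them dynamically as traffic changes.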

6) Advances in Optical Networks: Transport networks, as depicted in Figure 1, are composed of a core WAN, typically connected as a mesh to link cities in a country or across continents, Metropolitan Area Networks (MANs) connected in ring or star topologies, and access networks mostly based on Passive Optical Network (PON) technologies and topologies [432]. Core networks use fiber optic communication technologies due to their high capacity, reach, and energy efficiency. High-end Internet Protocol (IP) routers in core networks are widely integrated with optical switches to form IP over Wavelength-Division-Multiplexing (WDM) architectures. The IP layer aggregates the traffic from metro and access networks and connects with the optical layer through short reach interfaces. The optical layer is composed of optical switching devices, mainly Reconfigurable Optical Add Drop Multiplexers (ROADMs) realized by Wavelength Selective Switches (WSSs), or Optical Cross Connect (OXC) switches, to drop wavelengths of terminating traffic, insert wavelengths of newly received traffic, and convert wavelengths, if necessary, to groom transit traffic. In addition, transponders for modulation and demodulation and Optical-Electrical-Optical (O/E/O) conversions are required [433]. Between the nodes, fiber links spanning up to thousands of kilometers are utilized. Due to physical impairments in fibers causing optical power losses and pulse dispersion, amplification is required, mainly with Erbium-Doped Fiber Amplifiers (EDFAs) at fixed distances. Reamplification, Reshaping, and Retiming (3R) regenerators can also be installed at distances that depend on the line rate. Such networks have sophisticated heterogeneous configurations and are typically configured to be static for long periods to reduce labor risks and costs. However, the increasing aggregated traffic due to Internet-based services with big data workloads makes legacy infrastructures incapable of serving future demands efficiently. This calls for improvements at different layers [434]-[437]. Legacy Coarse or Dense WDM (CWDM or DWDM) systems, standardized by the International Telecommunication Union (ITU), utilized transponders based on On-Off Keying (i.e. intensity) modulation and direct detection, with carriers in the Conventional band (C-band) (1530-1565 nm), 50 GHz spacing between channels, and up to 96 channels [438]. Typically, Single-Mode Fiber (SMF) links with a Single Line Rate (SLR) of 10 Gbps or 40 Gbps per channel are utilized. To increase the capacity of existing fiber plants and to improve the spectral efficiency, the use of Mixed Line Rates (MLR) and advanced modulation formats such as duobinary and Differential Quadrature Phase Shift Keying (DQPSK) was proposed [438]. The rates in MLR systems can be adjusted so that short-reach traffic is transported at high rates and long-reach traffic at lower rates to reduce regeneration requirements. With the advances in digital coherent detection enabled by high sampling rate Digital-to-Analogue Converters (DACs), Digital Signal Processors (DSPs), pre and post digital Chromatic Dispersion (CD) compensation, and Forward Error Correction (FEC) [439]-[441], the use of coherent higher order modulation formats such as QPSK and Quadrature Amplitude Modulation (QAM), in addition to polarization multiplexing, was also proposed to enhance the spectral efficiency.
DSPs can realize matched filters for detection and hence can enable transmission at the Nyquist limit for Inter-Symbol Interference (ISI)-free transmission, which relaxes the 50 GHz guard band requirements between adjacent WDM channels, allowing more channels in the C-band [440]. The use of the Long-band (L-band) (1565-1625 nm) and the Short-band (S-band) (1460-1530 nm) was also proposed to accommodate more channels; however, costly and complex amplifiers such as Raman amplifiers are required, in addition to careful impairment and non-linearity compensation approaches [440], [442]. Space Division Multiplexing (SDM)-based solutions that enable wavelength reuse in the fiber are also considered to increase link capacities. However, these solutions call for the replacement of SMFs with Multi-Mode Fibers (MMFs) (i.e. with a large core diameter to accommodate several lightpaths propagating in different modes) or Multi-Core Fibers (MCFs) (i.e. with several cores within the fiber) to spatially separate lightpaths that use the same carrier [442]. To further improve the spectral efficiency and allow dynamic bandwidth allocation, the concept of superchannels in Elastic Optical Networks (EONs) was also introduced [443]-[446]. A superchannel is composed of bundled spatial or spectral channels with variable bandwidths at a granularity of 12.5 GHz, as defined by the ITU-T SG15/G.694.1 FlexGrid. These channels can be transmitted as a single entity with guard bands only between superchannels. Such channels can be constructed with Nyquist WDM or with coherent Optical Orthogonal Frequency Division Multiplexing (O-OFDM), which overlaps adjacent subcarriers [447]-[449]. Such flexibility in bandwidth assignment in FlexGrids requires programmable and adaptive networking equipment such as Bandwidth-Variable Transponders (BV-Ts), Bandwidth-Variable Wavelength Selective Switches (BV-WSSs), and Colorless, Directionless, and Contentionless (CDC) ROADMs, as detailed in [435], [436], and [442]. The modulation format, line rate, and bandwidth of each channel can then be dynamically configured with SDN control [436] to provide programmable and agile Software-Defined Optical Networks with fine-grained control at the optical component and IP layer levels [188], [450].
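As a rough illustration of the arithmetic behind the FlexGrid granularity discussed above, the following back-of-the-envelope sketch compares the number of fixed 50 GHz channels with the number of 12.5 GHz FlexGrid slots that fit in the C-band, assuming only the 1530-1565 nm band edges quoted above (commercial 96-channel systems use slightly wider band definitions).

```python
# Rough C-band capacity comparison: fixed 50 GHz grid vs 12.5 GHz FlexGrid slots.
# Illustration of the arithmetic only, not a full EON design.
C = 299_792_458  # speed of light, m/s

f_high = C / 1530e-9                      # ~195.9 THz
f_low = C / 1565e-9                       # ~191.6 THz
band_ghz = (f_high - f_low) / 1e9         # ~4,380 GHz of C-band spectrum

fixed_channels = int(band_ghz // 50)      # ~87 channels on a rigid 50 GHz grid
flexgrid_slots = int(band_ghz // 12.5)    # ~350 slots of 12.5 GHz each

print(f"C-band width ~ {band_ghz:.0f} GHz")
print(f"Fixed 50 GHz grid: ~{fixed_channels} channels")
print(f"FlexGrid (12.5 GHz slots): ~{flexgrid_slots} slots to bundle into superchannels")
```

The finer slots do not by themselves add spectrum; the gain comes from bundling them into right-sized superchannels with guard bands only between superchannels, as described above.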

C. Challenges and requirements for Big Data Applications deployments in Cloud Environments: Although Hadoop and other big data frameworks were originally designed and provisioned to run in dedicated clusters under controlled environments, several cloud-based services, enabled by leasing resources from public, private, or hybrid clouds, are offering big data computations to public users aiming to increase the profit and utilization of cloud infrastructures [451]. Examples of such services are Amazon Elastic MapReduce (EMR)14, Microsoft's Azure HDInsight15, VMWare's Serengeti project [452], Cloud MapReduce [120], and Resilin [453]. Cloud-based implementations are increasingly considered as cost-effective, powerful, and scalable delivery models for big data analytics and can be preferred over cluster-based deployments, especially for interactive workloads [454]. However, to suit cloud computing environments, conventional big data applications are required to adopt several changes in their frameworks [124], [148]. Also, to suit big data applications and to meet the SLAs and QoS metrics for the heterogeneous demands of multiple users, cloud computing infrastructures are required to provide on-demand elastic and resilient services with adequate processing, storage, memory, and networking resources [126]. Here we identify and discuss some key challenges and requirements for big data applications in cloud environments:

Framework modifications: Single-site cluster implementations typically couple data and compute nodes (e.g. data nodes and task JVMs coexist in the same machines) to realize data locality. In virtualized cloud environments, and because elasticity is favored for computing while resilience is favored for storage, the two entities are typically decoupled or split [142]. For example, VMWare follows this split Hadoop architecture, and Amazon EMR utilizes S3 for storage and EC2 instances for computing. This requires an additional data loading step into the computing VMs before running jobs [148]; a brief illustrative sketch of this step is given at the end of this list. Such transfers are typically free of charge within the same geographical zone to promote the use of both services, but are charged if carried between different zones. For cloud-based RDBMSs, and as discussed in Section II-C, the ACID requirements are replaced by the BASE requirements, where either the availability or the consistency is relaxed to guarantee partition tolerance, which is a critical requirement in cloud environments [455]. Another challenge associated with the scale and heterogeneity of cloud implementations is that application debugging cannot be performed in smaller configurations as in single-site environments and requires tracing at the actual scale [313].

Offer selection and pricing: The availability of a large number of competing Cloud Service Providers (CSPs), each offering VMs with different characteristics, can be confusing to users, who should define their policies and requirements. To ease optimizing Hadoop-based applications for inexperienced users, some platforms such as Amazon EC216 provide a guiding script to aid in estimating computing requirements and tuning configurations.

14 Available at: https://aws.amazon.com/emr/
15 Available at: https://azure.microsoft.com/en-gb/services/hdinsight/
However, unified methods for estimation and comparison are not available. CSPs are also challenged with resource allocation while meeting revenue goals, as surveyed in [456], where different economic and pricing models at different networking layers are considered.

Resource allocation: Cloud-based deployments for big data applications require dynamic resource allocation at run-time as the arrivals and volumes of newly generated data are unprecedented and production workloads might change periodically or irregularly. Fixed resource allocation or pre-provisioning can lead either to under-provisioning, and hence QoS violations, or to over-provisioning, which leads to increased costs [120]. Moreover, cloud-based computations might experience performance variations due to interference caused by shared resource usage. Such variations require careful resource allocation, especially for scientific workflows, which also require elasticity in resource assignment as the requirements of different stages vary [457]. Data intensive applications and scientific workflows in clouds have data management challenges (i.e. transfers between storage and compute VMs) and data transfer bottlenecks over WANs [458].

QoS and SLA guarantees: In cloud environments, guaranteeing and maintaining QoS metrics such as response time, processing time, trust, security, failure rate, and maintenance time to meet SLAs, which are the agreements between CSPs and users about service requirements and violation penalties, is a challenging task due to several factors, as discussed in the survey in [459]. With multiple tenants sharing the infrastructure, QoS metrics for each user should be predictable and independent of the co-existence with other users, which requires efficient resource allocation and performance monitoring. A QoS metric that directly impacts the revenue is the response time of interactive services. It was reported in [460] and [461] that a latency of 100 ms in search results caused a 1% loss in Amazon's sales, while a latency of 500 ms caused a 20% drop in sales, and a speed-up of 5 seconds resulted in a 10% increase in sales for Google. QoE in video services is also impacted by latency. It was measured in [462] that with more than 2 seconds of delay in content delivery, 60% of the users abandon the service. Different interactive services depending on big data analytics are expected to have similar impacts on revenue and customer behaviors.

Resilience: Increasing the reliability, availability, and data confidentiality, and ensuring the continuity of services provided by cloud infrastructures and applications against cyber-attacks or system failures, are critical requirements, especially for sensitive services such as banking and healthcare applications. Resiliency in cloud infrastructures is approached at different layers, including the computing and networking hardware layer, the middleware and virtualization layer, and the applications layer, through efficient replication, checkpointing, and cloud collaboration, as extensively surveyed in [463].

Energy consumption: A drawback of cloud-based applications is that the energy consumption at both the network and end-user devices can exceed the energy consumption of local deployments [464], and [465].
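The extra data-loading step noted under framework modifications can be sketched as follows with the AWS SDK for Python (boto3); the bucket, key, and local path are hypothetical placeholders, and a real EMR job would normally read directly from S3 or stage many objects in parallel.

```python
# Illustrative only: stage one S3 input object onto a compute VM's local disk
# before running a job, as needed when storage and compute are decoupled.
import boto3

def stage_input_locally(bucket: str, key: str, local_path: str) -> None:
    """Download a single input object from S3 to the local filesystem."""
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, local_path)

if __name__ == "__main__":
    stage_input_locally("example-input-bucket", "logs/part-00000", "/tmp/part-00000")
```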
D. Options for Big Data Applications deployment in Geo-distributed Cloud Environments: CSPs place their workloads and content in geo-distributed data centers to improve the quality of their services, and rely on multiple Internet Service Providers (ISPs) or dedicated WANs [426], [427] to connect these data centers. Advantages such as load balancing, increased capacity, availability and resilience against catastrophic failures, and reduced latency by being close to the users are attained [440]. Such environments attract commercial and non-commercial big data analytics as they match the geo-distributed nature of data generation and provide scales beyond single-cluster implementations. Thus, there is a recent interest in utilizing geo-distributed data centers for big data applications despite the challenging requirements, as surveyed in [323] and suggested in [466]. Figure 11 illustrates different scenarios for deploying big data applications in geo-distributed data centers and compares single-site clusters (case A) with various geo-distributed big data deployments including bulk data transfers (case B), cloud bursting (case C), and different implementations of geo-distributed big data frameworks (case D), as outlined in [177]. Case B refers to legacy and big data bulk transfers between geo-distributed data centers, including data backups, content replication, and VM migrations, which can reach between 330 TB and 3.3 PB a month [164], requiring cost- and performance-optimized scheduling and routing. Case C is for hybrid frameworks, where a private cloud bursts some of its workloads to public clouds for various goals such as reducing costs or accelerating completion time. Such scenarios require interfacing with public cloud frameworks, in addition to profiling for performance estimation to ensure that bursting yields a gain compared to localized computations.

16 Available at: https://wiki.apache.org/hadoop/AmazonEC2

Fig. 11. Different scenarios for deploying big data applications in geo-distributed cloud environments.
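For case C in Figure 11, the decision to burst ultimately reduces to whether uploading the data and computing remotely beats finishing locally. The following sketch illustrates that break-even check with purely hypothetical figures for data size, WAN bandwidth, and public-cloud speed-up; it is an illustration of the profiling idea, not a method from the surveyed works.

```python
# Illustrative cloud-bursting break-even check (not from the surveyed works).
def should_burst(data_gb: float, wan_gbps: float,
                 local_hours: float, remote_speedup: float) -> bool:
    upload_hours = (data_gb * 8) / (wan_gbps * 3600)  # GB -> Gb, seconds -> hours
    remote_hours = local_hours / remote_speedup
    return upload_hours + remote_hours < local_hours

# Example: 500 GB of input over a 1 Gb/s WAN link, a 10-hour local run, and 4x
# more public-cloud capacity: ~1.1 h upload + 2.5 h compute, so bursting wins.
print(should_burst(data_gb=500, wan_gbps=1.0, local_hours=10, remote_speedup=4))
```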

Geo-distributed big data frameworks in public clouds have three main deployment modes [177]. Case D-1 copies all input data into a single data center prior to processing. This case is cost-effective and less complex in terms of framework management and computations, but encounters delays due to WAN bottlenecks. Case D-2 distributes the map-like tasks of a single job across geo-distributed locations to locally process input data. Then, intermediate results are shuffled across the geo-distributed locations to a single location where the final reduce-like tasks compute the final output. This arrangement suits workloads whose intermediate data sizes are much smaller than the input data. Although networking overheads can be reduced compared to case D-1, this case requires complex geo-aware control of the distributed framework components and experiences task performance fluctuations under heterogeneous cloud computing capabilities. The last case, D-3, creates a separate job in each location and transmits the individual output results to a single location where a final job aggregates the results. This realization relaxes the fine-grained control requirements of the distributed framework in D-2, but is considered costly due to the large number of jobs. Also, it only suits associative workloads such as 'averages' or 'counting', where the computations can be performed progressively in stages. Although such options provide flexibility in implementing geo-distributed big data applications, some unavoidable obstacles can be encountered. Due to privacy and governance regulations, copying data beyond their regions might be prohibited [467]. Also, due to HDFS federation (i.e. data nodes cannot register in other locations governed by a different organization), and the possibility of using different DFSs, additional code to unify data retrieval is required.

E. Big Data Implications on the Energy Consumption of Cloud Networking Infrastructures: Driven by the economic, environmental, and social impacts of the increased CAPEX, OPEX, Global Greenhouse Gas (GHG) emissions, and carbon footprints resulting from the expanding demands for Internet-based services, tremendous efforts have been devoted by industry and academia to reduce the power consumption and increase the energy efficiency of transport networks [468]-[472]. These services, empowered by fog, edge, and cloud computing and various big data frameworks, incur huge traffic loads on networking infrastructures and computational loads on hosting data centers, which in turn increase the power consumption and carbon footprints of these infrastructures [1], [473]-[475]. The energy efficiency of utilizing different technologies for wireless access networks has been addressed in [476]-[481], and for wired PONs and hybrid access networks in [482]-[488]. Core networks, which interconnect cloud data centers with metro and access networks containing IoT and user devices, transport huge amounts of aggregated traffic. Therefore, optimizing core networks plays an important role in improving the energy efficiency of the cloud networking infrastructures challenged by big data.
The reduction of energy consumption and carbon footprint in core networks, mainly IP over WDM networks, has been widely considered in the literature by optimizing the design of their systems, devices, and/or routing protocols [432], [489]-[506], utilizing renewable energy sources [507]-[513], and optimizing the resource assignment and content placement of different Internet-based applications [387], [514]-[527]. The early positioning study in [489] on greening the Internet addressed the impact of coordinated and uncoordinated sleeps (for line cards, crossbars, and main processors within switches) on routing protocols such as Open Shortest Path First (OSPF) and Internal Border Gateway Protocol (IBGP). Factors such as how, when, and where to put devices to sleep, and the overheads of redirecting the traffic and awakening the devices, were addressed. The study pointed out that energy savings are feasible but challenging due to the modifications required in devices and protocols. In [490], several energy minimization approaches were proposed, such as Dynamic Voltage Scaling (DVS) and Dynamic Frequency Scaling (DFS) at the circuit level, and efficient routing based on equipment with efficient energy profiles at the network level. The consideration of switching off idle nodes and rate adaptation has also been reported in [491]. The energy efficiency of the bypass and non-bypass virtual topologies and traffic grooming schemes in IP over WDM networks has been assessed in [492] through Mixed Integer Linear Programming (MILP) and heuristic methods. The non-bypass approach requires O/E/O conversion of lightpaths (i.e. traffic carried optically over fiber links and optical devices) at all intermediate nodes so that traffic is processed electronically in the IP layer and routed onto the following lightpaths. On the other hand, the bypass approach omits the need for O/E/O conversion at intermediate nodes and hence reduces the number of IP router ports needed, achieving power consumption savings between 25% and 45% compared to the non-bypass approach. In [493], a joint optimization of the physical topology of core IP over WDM networks, the energy consumption, and the average propagation delay was considered under bypass or non-bypass virtual topologies for symmetric and asymmetric traffic profiles. An additional 10% saving was achieved compared to the work in [492]. Traffic-focused optimizations for IP over WDM networks were also considered, for example in [494]-[499]. Optimizing static and dynamic traffic scheduling and grooming was considered in [494]-[497] in normal and post-disaster situations to reduce the energy consumption and demand blocking ratio. Techniques such as utilizing excess capacity, traffic filtering, protection path utilization, and service differentiation were examined. To achieve a lossless reduction of transmitted traffic, the use of Network Coding (NC) in non-bypass IP over WDM networks was proposed in [498], [499]. Network-coded ports encode bidirectional traffic flows via XOR operations and hence reduce the number of router ports required compared to uncoded ports. The energy efficiency and resilience trade-offs in IP over WDM networks were also addressed, as in [500]-[503]. The impact on the energy consumption of restoration after link cuts and core node failures was addressed in [500]. An energy efficient NC-based 1+1 protection scheme was proposed in [501] and [502], where the encoding of multiple flows sharing protection paths in non-bypass IP over WDM networks was optimized.
MILP models, heuristics, and closed-form expressions for the networking power consumption as a function of the hop count, network size, and demands indicated power savings of up to 37% compared to conventional 1+1 protection. The authors in [503] optimized the traffic grooming and the assignment of router ports to protection or working links under different protection schemes while considering sleep modes for protection ports and cards. Up to 40% saving in the power consumption was achieved. Utilizing renewable energy resources such as solar and wind to reduce non-renewable energy usage in IP over WDM networks with data centers was proposed in [507] and [508]. Factors such as the average availability of renewable energy and its transmission losses, regular and inter data center traffic, and the network topology were considered to optimize the locations of the data centers, and an average reduction of 73% in non-renewable energy usage was achieved. The work in [509] considered periodic reconfiguration of virtual topologies in IP over WDM networks based on a "follow the sun, follow the wind" operational strategy. Renewable energy was also considered in IP over WDM networks for cloud computing to green their traffic routing [510], content distribution [511], service migration [512], and VNE assignments [513]. The energy efficiency of Information-Centric Networks (ICNs) and Content Distribution Networks (CDNs) was extensively surveyed in [512] and [514], respectively. CDNs are scalable services that cache popular content throughout ISP infrastructures, while ICNs support name-based routing to ease access to content. The placement of data centers and their content in IP over WDM core nodes was addressed in [516] while considering the energy consumption, propagation delay, and users' upload and download traffic. An early effort to green the Internet [517] suggested distributing Nano Data Centres (NaDa) next to home gateways to provide various caching services. In [518], the energy efficiency of Video-on-Demand (VoD) services was examined by evaluating five strategic caching locations in core, metro, and access networks. The work in [519]-[521] addressed the energy efficiency of Internet Protocol Television (IPTV) services by optimizing video content caching in IP over WDM networks while considering the size and power consumption of the caches and the popularity of the content. To maximize cache hit rates, the dynamics of TV viewing behaviors throughout the day were explored. Several optimized content replacement strategies were proposed, and up to 89% power consumption reduction was achieved compared to networks with no caching. The energy efficiency of Peer-to-Peer (P2P) protocol-based CDNs in IP over WDM networks was examined in [522] while considering the network topology, content replication, and the behaviors of users. In [523], the energy efficiency and performance of various cloud computing services over non-bypass IP over WDM networks under centralized and distributed computing modes were considered. Energy-aware MILP models were developed to optimize the number, location, capacity, and contents of the clouds for three cloud services, namely content delivery, Storage-as-a-Service (SaaS), and virtual machine (VM)-based applications. An energy efficient cloud content delivery heuristic (DEER-CD) and a real-time VM placement heuristic (DEER-VM) were developed to minimize the power consumption of these services.
The results showed that replicating popular content and services in several clouds yielded 43% power savings compared to centralized placements. The placement of VMs in IP over WDM networks for cloud computing was optimized in [524] while considering their workloads, intra-VM traffic, number of users, and replica distribution, and an energy saving of 23% was achieved compared to single-location placements. The computing and networking energy efficiency of cloud services realized with VMs and VNs in scenarios using a server, a data center, or multiple geo-distributed data centers was considered in [387]. A Real-time heuristic for Energy Optimized Virtual Network Embedding (REOViNE), which considered the delay, client locations, load distribution, and efficient energy profiles for data centers, was proposed, and up to 60% power savings were achieved compared to bandwidth-cost-optimized VNE. Moreover, the spectral and energy efficiencies of O-OFDM with adaptive modulation formats and Adaptive Link Rate (ALR) power profiles were examined. To bridge the gap between traffic growth and networking energy efficiency in wired access, mobile, and core networks, GreenTouch17, a leading Information and Communication Technology (ICT) research consortium composed of fifty industrial and academic contributors, was formed in 2010 to provide architectures and specifications targeting energy efficiency improvements by a factor of 1000 in 2020 compared to 2010. As part of the GreenTouch recommendations, and to provide ISP operators with a road map for the energy efficient design of cloud networks, the work in [504], [506] proposed a combined consideration of the IP over WDM design approaches and the cloud networking-focused approaches in [387] and [523]. The design approaches jointly consider optical bypass, sleep modes for components, efficient protection, MLR, and optimized topology and routing, in addition to improvements in hardware, where two scenarios, Business-As-Usual (BAU) and BAU with GreenTouch improvements, are examined. Evaluations on the AT&T core network with 7 realistic data center locations and 2020 projected traffic, based on the Cisco Visual Network Index (VNI) forecast and a population-based gravity model, indicated energy efficiency improvements of 315x compared to 2010 core networks. Focusing on big data and its applications, the work in [526], [527] addressed improving the energy efficiency of transport networks while considering the different "5V" characteristics of big data and suggested progressive processing in intermediate nodes as the data traverse from sources to central data centers. A tapered network that utilizes limited processing capabilities in core nodes, in addition to 2 optimally selected cloud data centers, is proposed in [527], and an energy consumption reduction of 76% is achieved compared to centralized processing. In [527], the effect of data volumes on the energy consumption is also examined. The work in the above mentioned two papers is extended in [173], [174] and is further explained in Section V with other energy efficiency and/or performance-related studies for big data applications in cloud networking environments.
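As a toy illustration of the XOR-based encoding behind the network-coded ports in [498], [499], [501], and [502] (and not code from those works), the snippet below shows how a relay can forward a single coded packet for two counter-directional flows, with each endpoint recovering the other's payload using the packet it already holds; the payloads are hypothetical and assumed to be of equal length.

```python
# Toy illustration of XOR network coding for two counter-directional flows.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

pkt_a_to_b = b"payload from A"
pkt_b_to_a = b"payload from B"

coded = xor_bytes(pkt_a_to_b, pkt_b_to_a)      # one coded transmission at the relay

recovered_at_b = xor_bytes(coded, pkt_b_to_a)  # B XORs with the packet it sent
recovered_at_a = xor_bytes(coded, pkt_a_to_b)  # A XORs with the packet it sent

assert recovered_at_b == pkt_a_to_b and recovered_at_a == pkt_b_to_a
```

In the surveyed schemes this idea is applied at the router-port level so that fewer ports (and hence less power) are needed for the same bidirectional traffic, with the optimization of which flows to encode left to the MILP and heuristic models cited above.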

V. CLOUD NETWORKING-FOCUSED OPTIMIZATION STUDIES
This Section addresses a number of network-level optimization studies that target big data applications deployed partially or fully in virtualized cloud environments. Unlike in dedicated cluster implementations, cloud-based applications encounter additional challenges in meeting QoS and SLA requirements, such as the heterogeneity of infrastructures and the uncertainty of resource availability.

17 Available at: www.greentouch.org
Thus, a wide range of optimization objectives and utilizations of cloud technologies and infrastructures are proposed. This Section is organized as follows: Subsection V-A focuses on cloud resource management and optimization for applications, while Subsection V-B addresses VM and container placement and resource allocation optimization studies. Subsection V-C discusses optimizations for big data bulk transfers between geo-distributed data centers and for inter data center networking with big data traffic. Finally, Subsection V-D summarizes studies that optimize big data applications while utilizing SDN and NFV. The studies presented in this Section are summarized in Tables III and IV for generic cloud-based and specific big data frameworks, respectively.

A. Cloud Resource Management: To avoid undesired under- or over-provisioning in cloud-based MapReduce systems, the authors in [120] provided a theoretically-based mechanism for resource scaling at run-time based on three sufficient conditions that satisfy hard-deadline QoS metrics. A dynamic resource allocation policy was examined in [121] to meet soft deadlines of map tasks in hybrid clouds while minimizing their budget. The policy utilized local resources in addition to public cloud resources to allow concurrent executions and hence reduce the completion time. In [122], a Hyper-Heuristic Scheduling Algorithm (HHSA) was proposed to reduce the makespan of Hadoop map tasks in virtualized cloud environments. HHSA was found to outperform the FIFO and Fair schedulers for jobs with heterogeneous map tasks. To jointly optimize resource allocation and scheduling for MapReduce jobs in clusters and clouds while considering data locality and meeting earliest start time, execution time, and deadline SLAs, the work in [123] proposed a MapReduce Constraint Programming-based Resource Management (MRCP-RM) algorithm. An average reduction of 63% in missed deadlines was achieved compared to existing schedulers. Cloud RESource Provisioning (CRESP) was proposed in [124] to minimize the cost of running MapReduce workloads in public clouds. A reliable cost model that allocates MapReduce workloads while meeting deadlines is obtained by quantifying the relations between the completion time, the size of the input data, and the required resources. However, a linear relation was assumed between data size and computation cost, which does not suit intensive computations. The energy efficiency of running a mix of MapReduce and video streaming applications in P2P and community clouds was considered in [125]. Results based on simulations that estimate the processing and networking power consumption indicated that community clouds are more efficient than P2P clouds for MapReduce workloads. Due to the varying prices, in addition to the heterogeneity in hardware configurations and provisioning mechanisms among different cloud providers, choosing a service that ensures adequate and cost-efficient resources for big data applications is considered a challenging task for users and application providers. In [126], a real-time QoS-aware multi-criteria decision making technique was proposed to ease the selection of IaaS clouds for applications with strict delay requirements such as interactive online games and real-time mobile cloud applications. Factors such as the cost, hardware features, and location, in addition to WAN QoS, were considered. Typically, cloud providers adopt resource-centric interfaces, where the users are fully responsible for defining their requirements and selecting corresponding resources. As these selections are usually non-optimal in terms of performance and utilization, Bazaar was alternatively proposed in [127] as a job-centric interface. Users only provide high-level performance and cost goals, and the provider then allocates the corresponding computing and networking resources. Bazaar increased the accepted request rate by between 3% and 14%, as it exploited the fact that several sets of resources lead to similar performance and hence provide more flexibility in resource assignment. In [128], an accurate projection-based model was proposed to aid the selection of VM and network bandwidth resources in public clouds with homogeneous nodes.
The model profiles MapReduce applications on small clusters and projects the predicted resources onto larger cluster requirements. The concurrency of in-memory sort, external merge-sort, and disk read/write speeds were considered. To address the heterogeneity in cloud systems caused by regular server replacements due to failures, the work in [129] proposed Application to Datacenter Server Mapping (ADSM) as a QoS-aware recommendation system. Results with 10 different server configurations indicated up to 2.5x efficiency improvement over a heterogeneity-oblivious mapping scheme with minimal training overhead. The authors in [130] considered a strategy for resource placement to provide high-availability cloud applications. A multi-constrained MILP model was developed to minimize the total downtime of cloud infrastructures while considering the interdependencies and redundancies between applications, and the Mean Time To Failure (MTTF) and Mean Time To Recover (MTTR) of the underlying infrastructure and application components. Up to 52% performance improvement was obtained compared to the OpenStack Nova scheduler. The studies in [131]-[134] focused on the management of bandwidth resources in cloud environments. The authors in [131] proposed CloudMirror to guarantee bandwidth allocation for interactive and multi-tier cloud applications. A network abstraction that utilizes tenants' knowledge of application bandwidth requirements, based on their communication patterns, was used. Then, a workload placement algorithm that balances locality and availability, and a runtime mechanism to enforce the bandwidth allocation, were used. The results indicated that CloudMirror can accept more requests while improving the availability from 20% to 70%. Falloc was proposed in [132] to provide VM-level bandwidth guarantees in IaaS environments. First, a minimum base bandwidth is guaranteed for each VM, then a proportional share policy through a Nash bargaining game approach is utilized to allocate excess bandwidth. Compared to fixed bandwidth reservation, an improvement of 16% in job completion time was achieved. ElasticSwitch in [133] is a distributed solution that can be implemented in hypervisors to provide minimum bandwidth guarantees and then dynamically allocate residual bandwidth to VMs in a work-conserving manner. Small testbed-based results showed successful guarantees even for challenging traffic patterns such as one-to-all traffic, and improved link utilization to between 75% and 99%. A practical distributed system to provide fine-grained bandwidth allocation among tenants, EyeQ, was examined in [134]. To provide minimum bandwidth guarantees and resolve contention at the edge, EyeQ maintains at each host a Sender EyeQ Module (SEM) that controls the transmission rate and a Receiver EyeQ Module (REM) that measures the received rate. These modules can be implemented in the codebase of virtual switches in untrusted environments or as a shim layer in non-virtualized trusted environments. For traffic with different transport protocols, EyeQ maintained the 99.9th percentile of latency equivalent to that of dedicated links. The authors in [135] addressed optimizing NoSQL database deployments in hybrid cloud environments. A cost-aware resource provisioning algorithm that considers the existence of different SLAs among multiple cloud tiers was proposed to meet query QoS requirements while reducing their cost.
In [136], the need for dynamic provisioning of cloud CPU and RAM resources was addressed for real-time streaming applications such as Storm. Jackson open queueing network theory was utilized to model arbitrary operations such as loops, splits, and joins. The fairness of assigning Spark-based jobs in geo-distributed environments was addressed in [137] through max-min fairness scheduling. For up to 5 concurrent sort jobs, and compared to the default Spark scheduler, job completion time improvements of up to 66% were achieved. The work in [138] outlined that the memoryless fair resource allocation strategies of YARN are not suitable for cloud computing environments as they violate desirable allocation properties. To address this, two fair resource allocation mechanisms, Long-Term Resource Fairness (LTRF) for single-level allocation and Hierarchical LTRF (H-LTRF), were proposed and implemented in YARN. The results indicated improvements compared to existing schedulers in YARN; however, only RAM resources were considered. While considering CPU, memory, storage, network, and data resource management in multi-tenant Hadoop environments, the authors in [139] utilized a meta-data scheme, a meta-data manager, and a multi-tenant scheduler to introduce multi-tenancy features into YARN and HDFS. Kerberos authentication was also proposed to enhance the security and access control of Hadoop.
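To make the fairness notion used in [137] and related schedulers concrete, the following is a minimal sketch of max-min fair sharing of a single resource by progressive filling; it illustrates the general technique rather than the scheduler proposed in that work, and the capacity and demand figures are hypothetical.

```python
# Progressive-filling max-min fair allocation of one resource (illustrative).
def max_min_fair(capacity, demands):
    alloc = [0.0] * len(demands)
    unsatisfied = set(range(len(demands)))
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)       # equal split of what is left
        for i in list(unsatisfied):
            grant = min(share, demands[i] - alloc[i])
            alloc[i] += grant
            remaining -= grant
            if alloc[i] >= demands[i] - 1e-9:      # demand fully met
                unsatisfied.remove(i)
    return alloc

# Example: 100 bandwidth units shared by demands of 10, 40 and 80 units yields
# approximately [10, 40, 50]: small demands are met and the rest is split evenly.
print(max_min_fair(100, [10, 40, 80]))
```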

B. Virtual Machines and Containers Assignments: The efficiency of VMs for big data applications relies heavily on the resource allocation scheme used. To ease sizing the virtualized resources for cloud MapReduce clusters and meeting the requirements of non-expert users, the authors in [140] suggested Elastisizer, which automates the selection of both VM sizes and job configurations. Elastisizer utilizes Starfish detailed profiling [103], in addition to white-box and black-box techniques, to estimate the virtualized resource needs of workloads. Cura was proposed in [141] to provide cost-effective provisioning for interactive MapReduce jobs by utilizing Starfish profiling, VM-aware scheduling, and online VM reconfigurations. Reductions of 80% in costs and 65% in job completion time were achieved, but at the cost of increased risk for cloud providers as they fully decide on resource allocation and on satisfying users' QoS metrics. Cloud providers typically over-provision resources to ensure meeting SLAs; however, this leads to poor utilization. The work in [142] utilized the over-provisioning of interactive transactional applications in hybrid virtual and native clusters by concurrently running MapReduce batch jobs to increase the utilization. HybridMR, a 2-phase hierarchical scheduler, was proposed to dynamically configure the native and virtual clusters. The first phase categorizes the incoming jobs according to their expected virtualization overhead, then decides on placing them in physical or virtual machines. The second phase is composed of a dynamic resource manager (DRM) at run-time and an Interference Prevention System (IPS) to reduce the interference between tasks in co-hosted VMs and tasks co-located in the same VM. The results indicated an improvement of 40% in completion time compared with fully virtualized systems, and of 45% in utilization compared to native systems, with minimal performance overhead. The differences in the requirements of long and short duration MapReduce jobs were utilized in [143] through fine-grained resource allocation and scheduling. A trade-off between scheduling speed and accuracy was considered, where accurate constraint programming was utilized to maximize the revenue of long jobs, and two heuristics, first-fit and best-fit, were utilized to schedule small jobs. As a result, the number of accommodated jobs increased by 18%, leading to an improvement of 30% in revenue. However, simplifying assumptions were made, such as each job running on only one machine, no preemption, and homogeneous VMs. The elasticity of VM assignments was utilized in several studies to reduce the energy consumption of running big data applications in cloud environments [144]-[147]. In [144], an online Energy-Aware Rolling-Horizon (EARH) scheduling algorithm was proposed to dynamically adjust the scale of VMs to meet the real-time requirements of aperiodic independent big data jobs while reducing the energy consumption. Cost and Energy-Aware Scheduling (CEAS) was proposed in [145] to reduce the cost and energy consumption of workflows while meeting their deadlines by utilizing DVFS and five sub-algorithms. The work in [146] considered spatial and temporal placement algorithms for VMs that run repeated batch MapReduce jobs. It was found that spatio-temporal placements outperform spatial-only placements in terms of both energy saving and job performance. The work in [147] considered energy efficient VM placement while focusing on the characteristics of MapReduce workloads.
Two algorithms, TRP, which trades off VM duration against resource utilization, and VCS, which enhances the utilization of active servers, were utilized to reduce the energy consumption. Savings of about 16% and 13% were obtained compared to the Recipe Packing (RP) and Bin Packing (BP) heuristics, respectively. Several studies addressed data locality improvements in virtualized environments [148]-[151]. To reduce the impact of sudden increases in shuffling traffic on the performance of clouds, Purlieus was proposed in [148] as a cloud-based MapReduce system. The system suggests using dedicated clouds with preloaded data and utilizes separate storage and compute units, which is different from conventional MapReduce clouds, where the data must be imported before the start of the processing. Purlieus improved the input and intermediate data locality by coupling data and VM placements during both map and reduce phases via heuristics that consider the type of MapReduce jobs. The challenge of seamlessly loading data into VMs is addressed by loop-back mounting and VM disk attachment techniques. The results indicated an average reduction in job execution time of 50% and in traffic between VMs of up to 70%. However, it assumed knowledge of dataset loads, job arrival rates, and mean execution times. The headroom CPU resources that cloud providers typically preserve to accommodate demand peaks were utilized in [149] to maximize the data locality of Hadoop tasks. These resources were dynamically adjusted at run-time via hot-plugging. If unstarted tasks in a VM have their data locally, headroom CPU resources are reserved for that VM while equivalent resources are removed from other VMs to keep the price constant for the users. For the allocation and deallocation of resources, two Dynamic Resource Reconfiguration (DRR) schedulers were proposed: a synchronous DRR that utilizes CPU headroom resources only when they are available, and a queue-based DRR that delays the scheduling of tasks to their locality-matched VMs until headroom resources are available. An average improvement in throughput of 15% was achieved. DRR differs from Purlieus in that it assumes no prior knowledge and schedules dynamically; however, the locality of reduce tasks was not considered. To tackle resource usage interference between VMs in virtual MapReduce clusters, which can cause performance degradation, the authors in [150] proposed an interference- and locality-aware (ILA) scheduling algorithm based on delay scheduling and task performance prediction models. Based on observations that the highest interference is in the I/O resources, ILA was tested under different storage architectures. ILA achieved speed-ups of 1.5-6.5 times for individual jobs and a 1.9 times improvement in throughput compared with four state-of-the-art schedulers, interference-aware-only scheduling, and locality-aware-only scheduling. The authors in [151] proposed CAM, a cloud platform that supports cloud MapReduce by determining the optimal placement of new VMs requested by users while considering the physical location and the distribution of workloads and input data. CAM exposes networking topology information to Hadoop and integrates the IBM ISAAC protocol with GPFS-SNC for the storage layer, which allows co-locating related data blocks to increase the locality of VM image disks. As a result, the cross-rack traffic related to the operating system and MapReduce was reduced by 3 times, and the execution duration was reduced by 8.6 times compared to state-of-the-art resource allocation schemes.
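Several of the placement schemes above build on classic packing heuristics, such as the first-fit and best-fit algorithms used in [143] and the bin-packing baselines compared against in [147]. The sketch below illustrates both heuristics for packing hypothetical VM vCPU demands onto identical hosts, where every host left unopened is a candidate for a sleep mode; it is a generic illustration, not the schedulers from those works.

```python
# Illustrative first-fit and best-fit packing of VM vCPU demands onto hosts.
def first_fit(demands, capacity):
    hosts = []                                # residual capacity of opened hosts
    for d in demands:
        for i, free in enumerate(hosts):
            if free >= d:                     # first host that still fits the VM
                hosts[i] -= d
                break
        else:
            hosts.append(capacity - d)        # open a new host
    return len(hosts)

def best_fit(demands, capacity):
    hosts = []
    for d in demands:
        fitting = [i for i, free in enumerate(hosts) if free >= d]
        if fitting:
            i = min(fitting, key=lambda i: hosts[i])   # tightest remaining space
            hosts[i] -= d
        else:
            hosts.append(capacity - d)
    return len(hosts)

vm_vcpus = [4, 8, 2, 6, 4, 2, 8]              # hypothetical VM demands
print(first_fit(vm_vcpus, 16), best_fit(vm_vcpus, 16))  # hosts opened by each
```

Production placement additionally accounts for multiple resource dimensions, interference, and data locality, which is where the surveyed schemes differ from these baselines.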
To increase the revenue and reliability of business-critical workloads in virtualized environments, the authors in [152] proposed Mnemos, a self-expressive resource management and scheduling architecture. Mnemos utilized existing system monitoring and VM management tools, in addition to a portfolio scheduler and the Nebu topology-aware scheduler, to adapt the policies according to workload types. Mnemos was found to respond well to workload changes and to reduce costs and risks. Similarly, to provide automated real-time sizing of NoSQL clusters based on workloads and user-defined policies, TIRAMOLA, an open-source framework for use in IaaS platforms, was suggested in [153]. TIRAMOLA integrates VM-based monitoring with a Markov Decision Process (MDP) to decide on the addition and removal of VMs while monitoring the client and server sides. The results showed successful VM resizing that tracks the typical sinusoidal variations in commercial workloads while using different load smoothing techniques to avoid VM assignment oscillations (i.e. VMs being continuously added and removed without being useful to workloads). To enhance current Xen-based VM schedulers, which do not fully support MapReduce workloads, a MapReduce Group Scheduler (MRG) was proposed in [154]. MRG aims to improve the scheduling efficiency by aggregating the I/O requests of VMs located in the same physical machine to reduce context switching, and to improve fairness by assigning weights to VMs. MRG was tested with speculative execution and on multiprocessor systems, and an improvement of 30% was achieved compared to Xen's default credit scheduler. In [155], a large segment scheme and the Xen I/O ring mechanism between front-end and back-end drivers were utilized to optimize block I/O operations for Hadoop. As a result, CPU usage was reduced by a third during I/O, and the throughput was improved by 10%. A few recent studies addressed Hadoop improvements in Linux container-based systems [156]-[160]. FlexTuner was proposed in [156] to allow users and developers to jointly configure the network topology and routing algorithms between Linux containers, and to analyze communication costs for applications. Compared with Docker containers that are connected through a single virtual Ethernet bridge, FlexTuner uses virtualized Ethernet links to connect the containers and the hosts (i.e. each container has its own host name and IP address). The authors in [157] utilized a modified k-nearest neighbour algorithm to select the best configurations for newly submitted jobs based on information from similar past jobs in Hadoop clusters based on Docker containers. A 28% gain in performance was obtained compared to default YARN configurations. An automated configuration for Twister, a dedicated MapReduce implementation for iterative computation workloads in container-driven clouds, was proposed in [158]. The traditional client/server communication model (e.g. RPC) was replaced with a publish/subscribe model via the NaradaBrokering open-source messaging infrastructure. A negligible difference in job completion time was found between physical and container-based implementations. To improve the networking performance of Linux container-based Hadoop in cloud environments, the authors in [159] proposed the use of IEEE 802.3ad link aggregation, which provides separate interfaces to instances and then aggregates the traffic at the physical interface.
Completion time results indicated performance comparable to bare-metal implementations and improvements of up to 33.73% compared to regular TCP. EHadoop in [160] tackled the impact of saturated WAN resources on the completion time and costs of running MapReduce in elastic cloud clusters. A network-aware scheduler that profiles jobs and monitors network bandwidth availability was proposed to schedule the jobs on the optimum number of containers so that costs are reduced and network contention is avoided. Compared to the default YARN scheduler, a 30% reduction in costs was achieved.

C. Bulk Transfers and Inter Data Center Networking (DCN): Several studies considered optimizing bulk data transfers for backups, replication, and data migration between geo-distributed data centers in terms of link utilization, cost, completion time, or energy efficiency [161]-[170]. The early work in [161] proposed NetStitcher to utilize the unused bandwidth between multiple data centers for bulk transfers. NetStitcher used future bandwidth availability information (e.g. a fixed free slot between 3 and 6 AM) and intermediate data centers between the source and destination in a store-and-forward fashion to overcome availability mismatches between non-neighbouring nodes. Moreover, bulk data was split and routed over multiple paths to overcome intermediate node storage constraints. Postcard in [162] further considered different file types and multiple source-destination pairs, and proposed a time-slotted model to capture the dynamics of storing and forwarding. An online optimization algorithm was developed to utilize the variability in bandwidth prices to reduce the routing costs. GlobeAny was proposed in [163] as an online profit-driven scheduling algorithm for inter data center data transfers requested by multiple cloud applications. Request admission, store-and-forward routing, and bandwidth allocation were considered jointly to maximize the cloud provider's profit. Furthermore, differentiated services can be provided by adjusting application weights to satisfy different priority and latency requirements. CloudMPcast was suggested in [164] to optimize bulk transfers from a source to multiple geo-distributed destinations through overlay distribution trees that minimize costs without affecting transfer times, while considering full mesh connections and store-and-forward routing. The charging models used by public cloud service providers, which are characterized by flat costs depending on location and by discounts for transfers exceeding thresholds in the range of terabytes, were utilized. Results indicated improved utilization and savings for customers of 10% and 60% for Azure and EC2, respectively, compared to direct transfers. Similarly, the work in [165] optimized big data broadcasting over heterogeneous networks by constructing a pipelined broadcast tree. An algorithm was developed to select the optimum uplink rate and construct an optimal LockStep Broadcast Tree (LSBT). The authors in [166] and [167] focused on improving bulk transfers while considering the optical networking and routing layer. In [166], a joint optimization of the backup site and the data transmission paths was proposed to accelerate regular backups. The results indicated that it is better to select the sites and the paths separately. The spectrum management agility of elastic optical networks, realized by bandwidth-variable transponders and wavelength selective switches, was utilized in [167] to handle the uncertainty and heterogeneity of big data traffic by adaptively routing backups, which were modeled as dynamic anycasts. End-system solutions, besides networking solutions, were proposed in [168] to reduce the energy consumption of data transfers between geo-distributed data centers. Factors such as file sizes, TCP pipelining, parallelism, and concurrency were considered. Moreover, data transfer power models based on power meter measurements, which consider CPU, memory, hard disk, and NIC resources, were developed. To emulate geo-distributed transfers in testbeds, artificial delays obtained by setting the round-trip time (RTT) were considered.
A similar RTT-based emulation approach was also utilized in [169] and [170]. To assist users in finding the best configurations, the work in [169] investigated the effects of the distance on the delay and the traffic characteristics in geo-distributed data centers while considering different block sizes, locality, and replicas. The authors in [170] experimentally addressed the cost and speed efficiency of data movements into a single data center for MapReduce workloads. Two online algorithms, Online Lazy Migration (OLM) and Randomized Fixed Horizon Control (RFHC), were proposed, and the authors concluded that MapReduce computations are generally better processed in a single data center. On the other hand, several other studies have utilized the fact that, for most production jobs, the intermediate data generated is much smaller than the input data, and they proposed alternative solutions based on distributed computations [171]-[174]. To reduce the computational and networking costs of processing globally collected data, the authors in [171] proposed a parallel algorithm to jointly consider data-centric job placement and network coding-based routing to avoid duplicate transmissions of intermediate data. An improvement of 70% was achieved compared to simple data aggregation and regular routing. To decrease MapReduce inter data center traffic while providing predictable completion times, the authors in [172] also addressed the joint optimization of input data routing and task placement. The bandwidth between data centers and the ratio of input data to intermediate data were determined by profiling, and the global allocation of map and reduce tasks was performed so that the maximum time required for input data and intermediate data transmissions is minimized. A reduction of 55% was achieved compared to the centralized approach. The authors in [173], [174] improved the energy efficiency of bypass IP over WDM transport networks while considering the different 5V attributes of big data. The use of distributed Processing Nodes (PNs) in the sources and/or intermediate core nodes was suggested to progressively process the data chunks before reaching cloud data centers, and hence reduce the network usage. In [173], a joint optimization of the processing in PNs or cloud data centers and the routing of data chunks with different volumes and processing requirements yielded up to 52% reduction in the power consumption. The capacities and power usage effectiveness (PUE) of the PNs and the availability of the required processing frameworks were also examined. The impact of the velocity of big data on the power consumption of the network was further considered in [174] where two processing modes were examined: expedited (i.e. real-time), which is delay optimized, and relaxed (i.e. batch), which is energy optimized. Power consumption savings ranging between 60% for the fully relaxed mode and 15% for the fully expedited mode (due to its additional processing requirements) were achieved compared to single-location processing without progressive processing. Several studies addressed detailed optimizations of the configurations and scheduling of the MapReduce framework in geo-distributed environments, for example to reduce the costs [175]-[177], the completion time [178], [179], and for hybrid clouds [180], [181], while a few addressed optimizing geo-distributed data streaming, querying, and graph processing applications [182]-[187].
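A back-of-the-envelope comparison helps explain why such progressive processing saves network resources. The sketch below is our illustration (not the MILP models of [173], [174]) with assumed chunk sizes, hop counts, and a 10:1 data reduction ratio.

```python
# Back-of-the-envelope comparison: shipping raw chunks to a central data
# center versus processing them at a nearby processing node (PN) and
# forwarding only the reduced output (all parameters are assumptions).

def network_volume(chunks_gb, hops_to_dc, hops_to_pn, hops_pn_to_dc,
                   reduction_ratio):
    """Return (centralized, progressive) volume-distance products in GB*hops."""
    centralized = sum(c * hops_to_dc for c in chunks_gb)
    progressive = sum(c * hops_to_pn + c * reduction_ratio * hops_pn_to_dc
                      for c in chunks_gb)
    return centralized, progressive

# A 10:1 reduction at a PN two hops away versus a data center six hops away.
print(network_volume([120, 80, 200], hops_to_dc=6, hops_to_pn=2,
                     hops_pn_to_dc=4, reduction_ratio=0.1))
```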
Among these studies, an inter data center allocation framework was proposed in [175] to reduce the costs of running large jobs while considering the variabilities in electricity and bandwidth prices. The framework considered job indivisibility (i.e. the ability to run in data centers without the need for inter communications), data center processing capacities, and time constraints. A StochAstic power redUction schEme (SAVE) was proposed in [176] to schedule batch jobs arriving at a front-end server onto geo-distributed back-end clusters while trading off power costs and delay. Scheduling in SAVE was based on two time scales: the first allocates jobs and activates/deactivates servers, and the second, which is finer, manages service rates by controlling CPU power levels. To reduce the cost and execution time of sequential MapReduce jobs in geo-distributed environments, G-MR in [177] utilized Data Transformation Graphs (DTGs) to select an optimal execution sequence detailed by data movements and processing locations. G-MR used approximating functions to estimate completion times for map and reduce tasks, and the sizes of intermediate and output data. Replicas were considered by constructing and comparing a DTG for each. Moreover, the effects of choosing different providers (i.e. different inter data center bandwidths, and networking and compute costs) with data centers in the US east coast, US west coast, Europe, and Asia were considered. The work in [178] addressed some geo-distributed Hadoop limitations through prediction-based job localization, map input data pre-fetching to balance the completion times of local and remote map tasks, and by modifying the HDFS data placement policy to be sub-cluster aware to reduce output data writing time. An improvement of 48.6% was achieved for the completion time of reduce tasks. The authors in [179] addressed the placement of MapReduce jobs in geo-distributed clouds with limited WAN bandwidth. A Community Detection-based Scheduling (CDS) algorithm was utilized to reduce the WAN data transmission volume and hence reduce the overall completion time while considering the dependencies between tasks, workloads, and data centers. Compared to a hyper-graph partition based scheduling and a greedy algorithm, the transmitted data volume was reduced by 40.7% and the completion time was reduced by 35.8%. A performance model was proposed in [180] to estimate the speed-up in completion time for iterative MapReduce jobs when considering cloud bursting. The model focused on weak links between on-premise and off-premise VMs and suggested asynchronous rack-aware replica placement and replica-aware scheduling to avoid additional data transmission to off-premise VMs if a task is scheduled before its data is replicated there. Experimental results with up to 15 on-premise VMs and 3 off-premise VMs indicated between 1-10% error in the model-based estimation. Compared to ARIA [69], up to one order of magnitude higher accuracy was achieved. BStream was proposed in [181] as a cloud bursting framework that optimally couples batch processing in resource-limited internal clouds (IC) with stream processing in external clouds (EC). To account for the delays incurred when uploading data to the EC, BStream matched the stream processing rate with the data upload rate by allocating enough EC resources. Storm was used for stream processing with the aid of Nimbus, Zookeeper, and LevelDB for coordination, and Kafka servers to temporarily store data in the EC.
BStream assumed homogeneous resources and no data skew, and suggested performing partial reduction in the IC if the tasks are associative; alternatively, final reduce tasks run only in the IC. Two checkpointing strategies were developed to optimize the reduce phase by overlapping the transmission of reduce results back to the IC with input data bursting and processing in the EC. However, users need to write the same tasks twice (i.e. once for Hadoop and once for Storm). SAGE was proposed in [182] as an efficient service-oriented geo-distributed data streaming framework. SAGE suggested the use of loosely coupled cloud queues to manage and process streams locally and then utilized independent global aggregator services for the final results. Compared to single-location processing, a performance improvement by a factor of 3.3 was achieved. Moreover, an economic model was developed to optimize the resource usage and costs of the decomposed services. However, the impact of limited WAN bandwidths on the performance was not addressed. To address this, the authors in [183] proposed JetStream, which utilizes structured Online Analytical Processing (OLAP) data cubes to aggregate data streams in edge nodes and adaptively filters (i.e. degrades) the data to match the instantaneous capacity of the WAN at the expense of analysis quality. The work in [184] aimed to reduce the inter data center networking cost while meeting the SLAs of big data stream processing in geo-distributed data centers by jointly optimizing VM placements and flow balancing. Each task was replicated in multiple VMs and the distribution of data streams between them was optimized. The results revealed improvements compared to single-VM task allocation. The computational and communication costs of streaming workflows in geo-distributed data centers were considered in [185]. An Extended Streaming Workflow Graph (ESWG)-based allocation algorithm that accounts for semantic types (i.e. sequential, parallel, and choice for incoming and outgoing patterns) and the price diversity of computational and communication resources was constructed. The algorithm assumes that each task is allocated a VM in each data center, and divides the graph into sub-graphs to simplify the allocation solutions. The results indicated improved performance and reduced costs compared to greedy, computational-cost-only, and communication-cost-only approaches. To optimize query processing latency and the usage of limited-bandwidth, variably-priced WANs in geo-distributed Spark-like systems, Iridium was proposed in [186]. An online heuristic was used that iteratively optimizes the placement of data prior to query arrival, based on its value per byte, as well as the placement of processing for complex DAGs of tasks. Iridium was found to speed up queries by 3-19 times and reduce WAN bandwidth usage by 15-64% compared to single-location processing and unmodified Spark. The work in [187] proposed G-Cut for efficient geo-distributed static and dynamic graph partitioning. Two optimization phases were suggested to account for the usage constraints and the heterogeneity of WANs. The first phase improves the greedy vertex-cut partitioning approach of PowerGraph to be geo-aware and hence reduces inter data center data transfers. The second phase adopts a refinement heuristic to consider the variability in WAN bottlenecks. Compared to unmodified PowerGraph, reductions of up to 58% in the completion time and 70% in WAN usage were achieved.
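The bandwidth-adaptive degradation idea of JetStream [183] can be sketched as choosing the finest aggregation level whose output rate fits the currently available WAN capacity; the level names and rates below are assumptions for illustration only.

```python
# Toy sketch of bandwidth-adaptive degradation in the spirit of JetStream
# [183]: choose the finest aggregation level of an OLAP-style cube whose
# output rate fits the instantaneous WAN capacity (levels are assumptions).

LEVELS = [                             # ordered from finest to coarsest
    ("per-second, per-URL", 400.0),    # Mbps produced at this granularity
    ("per-minute, per-URL", 80.0),
    ("per-minute, per-domain", 12.0),
    ("per-hour, per-domain", 1.5),
]

def choose_degradation(wan_capacity_mbps):
    """Return the finest level whose rate does not exceed the WAN capacity,
    falling back to the coarsest level if none fits."""
    for name, rate in LEVELS:
        if rate <= wan_capacity_mbps:
            return name, rate
    return LEVELS[-1]

print(choose_degradation(50.0))   # -> ('per-minute, per-domain', 12.0)
```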

D. SDN, EON, and NFV-based Optimization Studies: Static transport circuit-switched networks cannot provide bandwidth guarantees to applications as they have low visibility of the transported traffic. The early work in [188] utilized OpenFlow as a unified control plane for the packet-switched and circuit-switched layers to enable application-aware routing that can meet advanced requirements such as low latency or low jitter. OpenFlow was shown to enable differentiated services and resilience for applications with different priorities. To provide latency and protection guarantees to applications, the Application Centric IP/Optical Networks Orchestration (ACINO) project was proposed in [189]. ACINO provides applications with the ability to specify reliability and latency requirements, and then grooms the traffic of applications with similar requirements into specific optical services. Moreover, an IP layer restoration mechanism was introduced to assist optical layer restoration. A Zero Touch Provisioning, Operation and Management (ZTP/ZTPOM) model was proposed in [190]. It utilized SDN and NFV to support Cloud Service Delivery Infrastructures (CSDI) by automating the provisioning of storage and networking resources of multiple cloud providers. However, these studies considered generalized large-scale web, voice, and video applications, and did not focus on big data applications. The authors in [191] focused on big data transmission and demonstrated the benefits of using Nyquist superchannels in software-defined optical networks for high speed transmission. Routing, Modulation Level, and Spectrum Allocation (RMLSA) optimizations were carried out to select paths and modulation levels that improve the transmission speed while ensuring flexible spectrum allocation and high spectral efficiency. Compared to traditional Routing and Spectrum Assignment (RSA), an improvement of 77% was achieved. However, only bandwidth requirements were considered. To control geo-distributed shuffling traffic that typically saturates local and wide area network switches, the work in [192] suggested an OpenFlow (OF) over Ethernet enabled Hadoop implementation to dynamically control flow paths and define OF-enabled queues with different priorities to accelerate jobs. Experiments with sort workloads demonstrated that OF-enabled Hadoop outperformed standard Hadoop. Palantir was proposed in [193] as an SDN service in Hadoop to ease obtaining adequate network proximity information, such as rack-awareness within the data center and data center-awareness in geo-distributed deployments, without revealing security-sensitive information. Two optimization algorithms were proposed: the first for data center-aware replica placement and task scheduling, and the second for proactive data prefetching to accelerate tasks with non-local data. To automate the provisioning of computing and networking resources in hybrid clouds for big data applications, VersaStack was proposed in [194] as an SDN-based full-stack model-driven orchestration. VersaStack computes models for the virtual cloud network, manages VM placement, computes L2 paths, performs Single Root Input/Output Virtualization (SR-IOV) stitching, and manages Ceph, which is a networked block storage system. Enhancing bulk data transfers has been considered with the aid of SDN and elastic optical inter data center networking in [195]-[199].
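Before turning to those bulk-transfer studies, the data center-aware replica placement idea behind Palantir [193] can be illustrated with a short sketch; the demand-driven placement rule, data structures, and node names below are our assumptions, not Palantir's actual algorithm.

```python
# Illustrative data center-aware replica placement (inspired by the idea in
# [193]): place replicas of a block in the data centers that are expected to
# read it most often, one rack per replica (all inputs are made-up examples).

def place_replicas(expected_reads, nodes_by_dc_rack, replicas=3):
    """expected_reads: {data_center: expected read count for this block}
    nodes_by_dc_rack: {(data_center, rack): [node, ...]}
    Returns chosen nodes, visiting data centers in demand order and taking
    one node from an unused rack in each."""
    chosen, used_racks = [], set()
    for dc in sorted(expected_reads, key=expected_reads.get, reverse=True):
        for (d, rack), nodes in nodes_by_dc_rack.items():
            if d == dc and (d, rack) not in used_racks and nodes:
                chosen.append(nodes[0])
                used_racks.add((d, rack))
                break
        if len(chosen) == replicas:
            break
    return chosen

nodes = {("dc1", "r1"): ["n11"], ("dc1", "r2"): ["n12"],
         ("dc2", "r1"): ["n21"], ("dc3", "r1"): ["n31"]}
print(place_replicas({"dc1": 40, "dc2": 25, "dc3": 5}, nodes))
```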
To overcome the limitations of current inter data center WANs with pre-allocated static links and capacities for big data transfers, the authors in [195] proposed the Open Transport Switch (OTS), a lightweight virtual switch that abstracts the control of transport network resources for hybrid clouds. OpenFlow was extended to manage packet-optical cross connects (XCONs) through additional messaging capabilities. OTS allows applications to optimally compute end-to-end paths across multiple domains or to just submit bandwidth requirements to OTS. Two modes of operation were suggested: implicit and explicit. The former allows the SDN controller to communicate only with edge switches, which are located between different transport domains (e.g. Ethernet, Multiprotocol Label Switching (MPLS), OTN), while the latter unifies the control of all the components across different domains. Malleable reservations for bulk data transfers in EONs were proposed in [196]. While accounting for flow-oriented reservations with fixed bandwidth allocations, the RSA of the bulk transfers was adjusted dynamically to increase the utilization of the EON and to reclaim fragmented 2-D spectrum. The authors in [197] considered dynamic routing for different data chunks in bulk data transfers between geo-distributed data centers by utilizing an SDN centralized controller and distributed gateway servers in the data centers. Task admission control, routing, and the store-and-forward mechanism were considered to maximize the utilization of dedicated links between the data centers. To boost high priority transfers when the networking resources are insufficient, the transmission of lower priority chunks can be paused. These chunks are then temporarily stored in intermediate gateway servers and forwarded later at a carefully computed time. To further enhance the transfer performance, three dynamic algorithms were proposed: bandwidth reserving, dynamic adjustment, and future demand friendly. Owan was proposed in [198] as an approach that can help reduce the completion time of deadline-constrained, business-oriented bulk transfers in WANs by utilizing a centralized SDN controller that controls ROADMs and packet switches. Network throughput was maximized by joint optimization of the network topology, routing, and transfer rates while assuming prior knowledge of demands. Topology reconfigurations were achieved by changing the connectivity between router and ROADM ports, for example to double the capacity of certain links. Simulated annealing was utilized to update the network gradually and hence ease the reconfigurations. Moreover, consistency was ensured while applying the updates to avoid packet drops. Compared to packet-layer control only, the results indicated 4.45 times faster transfers and 1.36 times lower completion time. Flexgrid optical technologies were proposed with SDN control to meet SLAs and increase the revenue of inter data center transfers in [199]. The Application-Based Network Operations (ABNO) architecture, a centralized control-plane architecture being standardized by the IETF, was utilized to manage application-oriented semantics that identify data volumes and completion time requirements. Heuristics for Routing and Scheduled Spectrum Allocation (RSSA) were proposed to adjust BV transponders and optical cross-connects (OXCs) to ensure sufficient resources and hence meet deadlines.
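The simulated-annealing search used by Owan [198] to explore topology configurations follows the standard annealing pattern sketched below; the cooling schedule, neighbourhood move, and toy throughput objective are stand-ins and not the Owan implementation.

```python
# Generic simulated-annealing skeleton of the kind used by Owan [198] to
# search over topology configurations (textbook SA loop with a stand-in
# objective; link set, demands, and budget below are made up).

import math
import random

def simulated_annealing(initial_state, neighbour, score, steps=5000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Maximize score(state) while cooling geometrically from t_start to t_end."""
    rng = random.Random(seed)
    state, best = initial_state, initial_state
    for i in range(steps):
        temp = t_start * (t_end / t_start) ** (i / steps)
        cand = neighbour(state, rng)
        delta = score(cand) - score(state)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            state = cand                 # accept improving or some worse moves
        if score(state) > score(best):
            best = state
    return best

# Toy usage: pick which router<->ROADM port pairs get a second wavelength,
# doubling a link's capacity, under a budget of reconfigurable ports.
LINKS, BUDGET = ["A-B", "B-C", "C-D", "A-D"], 2
demand = {"A-B": 18.0, "B-C": 7.0, "C-D": 14.0, "A-D": 4.0}

def score(doubled):
    return sum(min(demand[l], 20.0 if l in doubled else 10.0) for l in LINKS)

def neighbour(doubled, rng):
    new = set(doubled) ^ {rng.choice(LINKS)}
    return frozenset(new) if len(new) <= BUDGET else doubled

print(sorted(simulated_annealing(frozenset(), neighbour, score)))
```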
In [199], if workloads with high priority and the nearest completion times lack resources, the resources assigned to the lowest priority workloads can be dynamically reduced. A few recent studies addressed utilizing NFV to improve the performance of big data applications [200]-[203]. An NFV framework for genome-centric networking was proposed in [200] to address the challenges of exchanging, between data centers, both genetic data with large file sizes (e.g. 3.2 GB) and the VM images that contain the processing software. To reduce the traffic, SDN and NFV-based caches were utilized to store parts of the files to be processed along the path. A signalling protocol with on-path and off-path message distribution was proposed to discover the cache resources. NFV enables service chaining by using sequences of VNFs to perform fundamental network operations, such as firewalling and deep packet inspection (DPI), on traffic. The work in [201] addressed optimizing the placement of VNF instances to minimize the communication costs between connected VNF instances while considering the volumes of big data traffic. An extended network service chain graph was proposed and the joint optimization of VNFs and the flows between them was considered. The ETSI NFV specifications were followed in [202] and [203] to enhance cloud infrastructures with big data video workloads. The authors in [202] implemented an elastic CDN as a virtualized network function to deliver live TV and VoD services via the standard MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH). A centralized CDN manager was proposed to control the virtualized CDN functions, such as request admission and intermediate and leaf cache contents and capacity, while reducing the networking costs and ensuring high QoE. NUBOMEDIA was proposed in [203] as an elastic PaaS with an integrated API stack that combines Real-time Multimedia Communication (RTC) with multimedia big data processing, such as video content analysis (VCA) and augmented reality (AR), with the aid of NFV. The results indicated that NUBOMEDIA scaled well with increasing requests while maintaining the QoS for WebRTC workloads. A summary of the cloud networking-focused optimization studies for big data applications discussed is given in Table III. Table IV, in contrast, focuses on generic cloud applications and provides a summary in a fashion similar to that in Table III.
TABLE III SUMMARY OF CLOUD NETWORKING-FOCUSED OPTIMIZATION STUDIES FOR BIG DATA APPLICATIONS Ref Objective Application Tools Benchmarks/workloads Experimental Setup/Simulation environment [120]* Resources scaling to avoid MapReduce Theoretical HiBench (TeraSort Emulator for IaaS and over and under provisioning analysis and WordCount) cloud-based MapReduce [121]* Dynamic resources allocation in Hadoop 1.x Policy applied WordCount Aneka platform; 4 IBM X3200 M3 servers hybrid clouds to meet soft deadlines In the master (4 CPU cores, 3 GB RAM) and a master (2 cores, 2 GB RAM) EC2 m1.small instances.
[122]* Hyper-Heuristic Scheduling Algorithm Hadoop 1.1.3 simulated annealing, genetic fMRI, protein annotation, CloudSim simulator (4 VMs in a data center), (HHSA) for cloud computing systems algorithm, swarm, ant colony PSPLIB, Grep, TeraSort Single node (Intel i7, 64GB RAM) multi-node cluster with 4 VMs (8 cores, 16GB RAM) [123]* MapReduce jobs resource Hadoop 1.2.1 Matchmaking and scheduling Synthetic workloads, 11 nodes cluster in EC2 (Intel Xeon allocation and scheduling algorithm (MRCP-RM) WordCount CPU, 3.75 GB RAM), simulations [124]* Cloud RESource Provisioning (CRESP) Hadoop 1.0.3 Profiling and WordCount, Sort, 16 nodes cluster (4 Quad-core CPU, 16GB to minimize cost or completion time machine learning TableJoin, PageRank RAM, 2 500GB disks), EC2 small instances (1 CPU, 1.7GB RAM, 160GB disk) [127]* Bazaar: job-centric interface for Hadoop 0.20.2 Profiling+ greedy heuristic Sort, WordCount, GridMix, TF- Prediction: 35 Emulab nodes (from data analytics applications in clouds for resources selection IDF, LinkGraph Cloudera), evaluation on 26 Emulab nodes [128]* Public cloud resources allocation to Hadoop 1.x Naive and linear Sort, Word Count, 33 nodes (4 cores, 4 GB RAM, meet completion time requirements profiling GridMix 1TB disk for data), 1 Gbps Ethernet [132]* Falloc: VM-level bandwidth Hadoop 1.0.0 Nash bargaining Random VM-VM traffic, 16 nodes cluster with 64 VMs, 1 Gbps guarantees in IaaS environments game, rate control WordCount, Sort, join Trace-driven Mininet simulations [135]* Cost model for NoSQL Elasticsearch Profiling + look- YCSB 6 nodes with mixture of VMs with in public and private clouds 0.20.6 ahead optimization (2-4 CPU, 2.4-3.2 GHz, 3-8GB RAM) [136]* Dynamic resource scheduling for Data Storm DRS Video logo detection, 6 nodes (Intel quad core CPU, Stream Management Systems (DSMS) scheduler frequent pattern detection 8 GB RAM), LAN network [137]* Fairness scheduler for Spark jobs Apache Linear program, Sort workloads for 12 EC2 instances in 6 geo-distributed In geo-distributed data centers Spark heuristic multiple jobs data centers (2 CPU, 7.5GB RAM, 32GB SSD [138]* Long Term Fair Resource Hadoop 2.2.0 LTYARN scheduler Facebook traces, Purdue suite, 10 nodes cluster (2 Intel X5675 Allocation in cloud computing In YARN Hive workloads CPUs„ 24GB RAM) to emulate EC2 TPC-H, Spark ML t2.medium (2 CPU cores, 4 GB RAM) [139]* Introduce multi-tenancy Hadoop 2.6.0 Modification to WordCount, Grep, 8 nodes cluster(2.4 GHz CPU, 3GB features in Hadoop YARN and HDFS PiEstimator RAM, 200 GB HDD), 1 GB Ethernet [140]° Elastisizer: automating virtualized Hadoop 1.x Profiling, white Link Graph, Join, TF-IDF, Various EC2 instances cluster sizing for cloud MapReduce and black-box methods TeraSort, WordCount [141]° Cura: cost-effective cloud Hadoop 1.x Profiling, VM-aware Facebook-like traces 20 nodes KVM-based cluster (16 provisioning for MapReduce scheduling, online generated by SWIM cores CPU, 16GB RAM) in two racks, VM reconfiguring 1 Gbps Ethernet, Java based simulations [142]° HybridMR: MapReduce scheduler in Hadoop 0.22.0 2-phase hierarchical Sort, WC, PiEst, DistGrep, 24 nodes (dual core 2.4 GHz CPU, 4GB hybrid native/virtualized clusters scheduler, profiling twitter, K-means, TestDFSIO RAM), 48 VMs (1 CPU core, 1 GB RAM) [146]° Energy-aware MapReduce in clouds Hadoop 1.x BinCardinality, BinDu- Sort, WordCount, 6 nodes (dual core CPU, 2GB RAM, by spatio-temporal placement tradeoffs ration, RandomFF, recipe PiEstimator 250GB disk), 3 VM types, 1 Gbps Ethernet. 
[147]° Energy-efficient VM placement in IaaS Hadoop 1.1.0 Tight Recipe Packing (TRP), Sort, TeraSort, 10 nodes Xen-based cluster (4 cores cloud data centers running MapReduce Virtual Cluster Scaling (VCS) WordCount CPU, 6GB RAM, 600GB storage) Gigabit Ethernet, Java-based simulations [148]° Purlieus: optimizing data and VM Hadoop 1.x Heuristics based Grep, Permutation 2 racks, 20 nodes, 20 VMs/job (4 2 GHz placements for MapReduce in clouds on k-club generator, Sort CPUs, 4GB RAM), KVM as hypervisor, 1 Gbps network, simulations (PurSim) [149]° Locality-aware Dynamic Hadoop 0.20.2 Hot-plugging, synchronous Hive workloads (Grep, 100 nodes EC2 cluster (8 CPU, 7 GB RAM), Resource Reconfiguration (DRR) and queue-based scheduler select aggregation, join) 6 nodes cluster (6 CPU, 16GB RAM), With 5 VMs/node (2CPU, 2GB) [150]° Interference and locality-aware Hadoop Meta task scheduler TeraSort, Grep RWrite, 12 physical machines in 2 racks forming MapReduce tasks scheduling in 0.20.205 WCount, TeraGen 72-nodes Xen-based cluster (12 CPU cores, virtual clusters 32GB, 500GB disk), 1 Gbps Ethernet [151]° CAM: topology-aware Vanilla Scheduler based on Hive workloads 4 machines (2 quad CPU, 48GB RAM), resources manager for Hadoop min-cost flow problem KVM hypervisor, 1 Gbps Ethernet, 23 VMs MapReduce in clouds (2 CPU, 1GB RAM), simulations (PurSim) [152]° Self-expressive management for business Hadoop and Mnemos: portfolio scheduler, Real-world business-critical Simulations critical workloads in clouds Microsoft HPC Nedu: virtualization scheduler traces from 1300 VMs [153]° TIRAMOLA: open-source Hadoop 1.0.1, Ganglia for monitoring, YCSB benchmark OpenStack cactus cluster with 20 clients framework to automate VM HBase 0.92.0 Markov Decision Process VMs (2 CPU cores, 4GB RAM), and resizing in NoSQL clusters 16 server VMS (4 CPU cores, 8GB RAM) [154]° Enhanced Xen scheduler Hadoop 0.20.2 modification to Xen Word Count, Grep, sort, Machine type 1 (2.8 GHz 2 core CPU, 2MB for MapReduce hypervisor, 2-level Hive benchmarks L2 cache, 3GB RAM) Machine type 2 (3 scheduling policy GHz 2 core CPU, 6MB L2 cache, 6GB RAM) [155]° Enhancing Hadoop Hadoop 1.x Optimizing Xen TestDFSIO (Intel i7 CPU, 16GB RAM, SSD 128GB disk) on Xen-based VMs I/O ring mechanism [157]° Automatic configurations for Hadoop 2.x Lightweight custom k- HiBench workloads (k- 5 nodes (2 Quad-core 2.8 GHz CPU, 12 MB L2 Hadoop on Docker containers nearest neighbour heuristic Means, Bayes, PageRank) cache, 8GB RAM, 320GB disk), 1 Gbps Ethernet [158]° Iterative MapReduce Twister Pub/sub broker network k-means clustering 2 Docker containers on Docker containers via NaradaBrokering [159]° Improving networking performance Hadoop 2.7.3 Link aggregation via TestDFSIO 2 ProLiant DL385 G6 servers (2 6-cores AMD, 32GB in Container-based Hadoop IEEE 802.3ad standard RAM), IEEE 802.3ad-enabled Gigabit Ethernet [160]° EHadoop: Network-aware scheduling YARN Online profiling, HiBench (sort, 30 nodes, 4 data and 20-25 compute (8 CPU in elastic MapReduce clusters FIFO and fair schedulers WordCount, PageRank) cores, 8GB RAM), 800 Mbps network [169]† Improving Hadoop in Hadoop 2.2.0 Statistical analysis of WordCount, 18 nodes cluster in 3 sub-clusters, 1 Gbps network, each geo-distributed data centers traffic log data TeraSort sub-cluster contains 3 racks and 6 data nodes [171]† Data-centric cloud architecture Hadoop 1.x parallel algorithm for job Aggregate, expand, Simulations for 12 nodes in EC2 for geo-distributed MapReduce scheduling and data routing transform, 
summary [172]† MapReduce in geo-distributed data Hive Chance-constrained Hive traces Simulations for 12 data centers with network bandwidth centers with predictable completion time optimization uniformly distributed between 100Mbps and 1Gbps [173]† Balancing processing location and type, Hadoop 1.x, MILP, Input data chunks (10 -330)GB NSFNET, Cost239, and Italian networks with 14 and energy consumption in big data others heuristic with output sizes (0.01-330)GB processing nodes and 2 cloud data centers networks [174]† Balancing processing speed and energy Hadoop 1.x MILP Input data chunks (10 -220)GB NSFNET network with 14 processing nodes consumption in big data networks with output sizes (0.2-220)GB and 2 cloud data centers [177]† G-MR: MapReduce framework for Hadoop 1.x Data Transformation CensusData, WordCount, 10 large instances from 4 EC2 data centers Geographically-distributed data centers. Graph (DTG) algorithm KNN, Yahoo, Google traces (4 CPU cores, 7.5GB RAM), 2 VICCI clusters [178]† Hadoop system-level improvements Hadoop 1.x Sub-cluster scheduling and WordCount, Grep 20 nodes in 2 sub-clusters, cross access latency in geo-distributed data centers reduce output placement of 200 ms, and subcluster latency of 1 ms [180]† Performance model for iterative Hadoop 2.6.0 Profiling, rack-aware Iterative GREP, k-means, 8 nodes, 4 (4 CPU cores, 500GB HDD, 4GB RAM), MapReduce with cloud bursting scheduler HiBench, PageRank 4 (2_8 CPU cores, 1 TB HDD, 64GB RAM), KVM [181]† BStream: Cloud bursting Hadoop 0.23.0, Profiling, modification WordCount, MultiWord- Cluster: 21 nodes (2 2GHz CPU, 4GB framework for MapReduce Storm 0.8.1 to YARN Count, InvertedIndex, , RAM, 500GB disk), Cloud: 12 nodes sort (Quad 3.06GHz CPU, 8GB RAM) [187]† G-Cut: geo-distributed PowerGraph modification to PowerGraph PageRank, Shortest Single EC2 instances, graph partitioning partitioning algorithm Source Path (SSSP), Subgraph Simulations on Grid5000 Isomorphism (SI) on 5 real- world graphs [192]‡ OpenFlow (OF)-enabled Matsu, modified JT, MalStone Benchmark, Open Cloud Consortium Project infra- Hadoop over Ethernet Hadoop 1.x Open vSwitch (OVS) sort structure with 10 nodes, 3 data centers, OF-enabled switches, 10Gbps network [193]‡ Palantir: SDN service to obtain Hadoop 1.x Service module Facebook traces Testbed with 32 nodes in 4 data centers, network proximity for Hadoop in Floodlight each with 2 racks, Pronto 3290 OpenFlow core switch and OVS in Top-of-Rack (ToR) switches *Cloud resources management, °VM and containers assignments, †Bulk transfers and inter DCN, ‡SDN and NFV-based.

TABLE IV SUMMARY OF CLOUD NETWORKING-FOCUSED OPTIMIZATION STUDIES FOR GENERIC CLOUD APPLICATIONS Ref Objective Tools Benchmarks/workloads Experimental Setup/Simulation environment [125]* Energy efficiency in cloud data energy consumption MapReduce synthesized Simulations centers with different granularities evaluation workloads, video streaming [126]* Cloud recommendation system based Analytic Hierarchy Process Information about cloud Single machine as master, server from NeCTAR on multicriteria QoS optimization (AHP) decision making providers (e.g. prices, locations) cloud, small EC2 instance, and C3.8xlarge EC2 instance [129]* QoS and heterogeneity-aware Application Lightweight controller Single and multi-threaded std Homogeneous 40 nodes clusters with 10 configurations, to Datacenter Server Mapping (ASDM) in cluster schedulers benchmarks, Microsoft workloads heterogeneous 40 nodes cluster with 10 machine types [130]* Highly available cloud MILP model for availability- Randomly generated 50 nodes cluster with total applications and services aware VM placements MTTF and MTTR (32 CPU cores, 30GB RAM) [131]* CloudMirror: Applications-based network TAG modeling Empirical and synthesized Simulations for tree-based cluster with 2048 servers abstraction and workloads placements with placement algorithm workloads high availability [133]* ElasticSwitch: Work-conserving minimum Guarantee Partitioning (GP), Shuffling traffic 100 nodes testbed (4 3GHz CPU, 8GB RAM) bandwidth guarantees for cloud computing Rate Allocation (RA), OVS [134]* EyeQ: Network performance Sender and receiver Shuffling traffic, 16 nodes cluster (Quad core CPU), 10 Gbps NIC Isolation at the servers EyeQ modules Memcached traffic packet-level simulations [143]° Multi-resource scheduling in Constrained programming, Google cluster Trace-driven simulations for cloud data centers with VMs first-fit, best-fit heuristics Traces (1024 nodes, 3-tier tree topology) [144]° Real-time scheduling for Rolling-horizon Google cloud Simulations via tasks in virtualized clouds optimization Tracelogs CloudSim toolkit [145]° Cost and energy aware scheduling VM selection/reuse, tasks Montage, LIGO, Simulations via For deadline-constrained tasks in clouds with merging/slaking heuristics SIPHT, CyberShake CloudSim toolkit [156]° FlexTuner: Flexible container- Modified Mininet, MPICH2 version 1.3, Simulation for one VM based tuning system iperf tools NAS Parallel Benchmarks (2GB RAM, 1 CPU core) [161]† NetStitcher: system for inter Multipath and multi-hop, Video and content Equinix topology, 49 CDN (Quad Xeon data center bulk transfers store-and-forward algorithms sharing traces CPU, 4GB RAM, 3TB disk), 1 Gbps links [162]† Costs reduction for Convex optimizations, Uniform distribution Simulations for inter inter-data center traffic time-slotted model for file sizes DCN with 20 data centers [163]† Profit-driven traffic scheduling for inter Lyapunov Uniform distribution Simulations for 7 nodes in EC2 DCN with multi cloud applications optimizations for arrival rate with 20 different cloud applications [164]† CloudMPcast: Bulk transfers in multi ILP and Heuristic based Data backup, Trace-driven simulations for 14 data centers with CSP pricing models on Steiner Tree Problem video distribution data centers from EC2 and Azure [165]† Reduce completion time of big data LockStep Broadcast - Numerical evaluations broadcasting in heterogeneous clouds Tree (LSBT) algorithm [166]† Optimized regular data backup in ILP and Transfer of Simulations for US backbone
Geographically-distributed data centers heuristics 1.35 PBytes network topology with 6 data centers [167]† Optimize migration and backups in Greedy anycast Poisson arrival demands, Simulations for EON-based geo-distributed DCNs algorithms -ve exponential service time NSFNET network [168]† Energy saving in end- Power modelling by 20 and 100GB data sets Servers (Intel Xeon, 6GB RAM, 500 GB disk), (AMD FX, to-end data transfers linear regression with different file sizes 10GB RAM, 2TB disk), Yokogawa WT210 power meter [170]† Reducing costs of migrating geo- An offline and 2 Meteorological 22 nodes to emulate 8 data centers, 8 gateways, and dispersed big data to the cloud online algorithms data traces 6 user-side gateways, additional node for routing control [175]† Reduction of electricity and bandwidth Distributed 22k jobs from 4 data centers with varying electricity costs in inter DCNs with large jobs algorithms Google cluster traces and bandwidth costs throughout the day [176]† Power cost reduction in distributed data Two time scales Facebook Simulations for 7 geo-distributed data centers centers for delay-tolerant workloads scheduling algorithm traces [179]† Tasks scheduling and WAN bandwidth Community Detection 2000 files with Simulations in China-VO allocation for big data analytics based Scheduling (CDS) different sizes network with 5 data centers [182]† SAGE: service-oriented architecture Azure Queues 5 streaming services Azure public cloud for data streaming in public cloud with synthetic benchmark (North and West US and North and West Europe) [183]† JetStream: execution engine for OLAP cubes, 140 Million HTTP requests (51GB) VICCI testbed to emulate Coral CDN data streaming in WAN adaptive data filtering [184]† Networking cost reduction for geo- MILP model, Multiple Streams with four Simulations for distributed big data stream processing VM Placement algorithm different semantics NSFNET network [185]† Reduction of streaming workflows MILP and 500 streaming workflows Simulations for costs in geo-distributed data centers 2 heuristics each with 100 tasks 20 data centers [186]† Iridium: low-latency geo- Linear program, Bing, Conviva, Facebook, TPC- EC2 instances in 8 worldwide regions, distributed data analytics online heuristic DS queries, AMPLab Big-Data trace-driven simulations [188]‡ Application-aware Aggregation and TE OpenFlow, Voice, video, 7 GE Quanta packet switched, 3 Ciena CoreDirector in converged packet-circuit networks POX controller and web traffic hybrid switch, 6 PCs with random traffic generators [189]‡ Application-Centric IP/optical SDN Bulk transfers, dynamic Software for controller and use Network Orchestration (ACINO) orchestrator 5G services, security, CDN cases for ACINO infrastructure [190]‡ Zero Touch Provisioning (ZTP) Network automation Bioinformatics, GEANT network and use for managing multiple clouds with SDN and NFV UHD video editing cases for ZTP/ZTPOM [191]‡ Routing, Modulation level and Spec- MILP, Bulk data Emulation of NSFNET trum Allocation for bulk transfers NOX controller transfers network with OF-enabled WSSs [194]‡ VersaStack: full-stack model-driven Resources modelling and Reading/writing from/to AWS and Google VMs, OpenStack-based VMs orchestrator for VMs in hybrid clouds orchestration workflow Ceph parallel storage with SR-IOV interfaces, 100G SDN network [195]‡ Open Transport Switch OTS prototype Bulk data ESnet Long Island Metropolitan (OTS) for Cloud bursting based on OpenFlow transfers Area Network (LIMAN) testbed [196]‡ Malleable Reservation
for efficient MILP, dynamic Bulk data Simulations on NSFNET bulk data transfer in EONs programming transfers [197]‡ SDN-based bulk data Dynamic algorithm, Bulk data 10 data centers (IBM BladeCenter HS23 cluster), 10 OF- transfers orchestration OpenFlow, Beacon transfers enabled HP3500 switches, 10 server-based gateways [198]‡ Owan: SDN-based traffic management Simulated Bulk data Prototype, testbed, simulations system for bulk transfers over WAN annealing transfers for ISP and inter data center WAN [199]‡ SDN-managed Bulk transfers ABNO, ILP Bulk data OMNeT++ simulations for Telefonica (TEL), British in Flexgrid inter DCNs heuristics transfers Telecom (BT), and Deutsche Telekom (DT) networks [200]‡ Genome-Centric cloud-based NetServ, algorithms Genome 3-server clusters with total (160 CPU cores, 498GB networking and processing off-path signaling data RAM, and 37TB storage), emulating GEANT Network [201]‡ VNF placement for MILP, Random DAGs 10 nodes network with random big data processing heuristics for chained NFs communication cost values [202]‡ NFV-based CDN network for OpenStack-based HD and full Leaf cache on SYNERGY testbed (1 DC with 3 servers big data video distribution CDN manager HD videos each with Intel i7 CPU, 16GB RAM, 1TB HDD) [203]‡ NUBOMEDIA: real-time multimedia PaaS-based WebRTC 3 media server on communication and processing APIs data KVM-based instances *Cloud resources management, °VM and containers assignments, †Bulk transfers and inter DCN, ‡SDN and NFV-based.

VI. DATA CENTERS TOPOLOGIES, ROUTING PROTOCOLS, TRAFFIC CHARACTERISTICS, AND ENERGY EFFICIENCY:

Data centers can be classically defined as large warehouses that host thousands of servers, switches, and storage devices to provide various data processing and retrieval services [528]. Intra Data Center Networking (DCN), defined by the topology (i.e. the connections between the servers and switches), link capacities, switching technologies, and routing protocols, is an important design aspect that impacts the performance, power consumption, scalability, resilience, and cost. Data centers have been successfully hosting legacy web applications but are challenged by the need to host an increasing number of big data and cloud-based applications with elastic requirements and multi-tenant, heterogeneous workloads. Such requirements are continuously challenging data center architectures to improve their scalability, agility, and energy efficiency while providing high performance and low latency.
The rest of this Section is organized as follows: Subsection VI-A reviews electronic switching-based data centers, while Subsection VI-B reviews proposed and demonstrated hybrid electronic/optical and optical switching-based data centers. Subsection VI-C briefly describes HPC clusters and disaggregated data centers. Subsection VI-D presents traffic characteristics in cloud data centers, while Subsection VI-E reviews intra DCN routing protocols and scheduling mechanisms. Finally, Subsection VI-F addresses the energy efficiency in data centers.
A. Electronic Switching Data Centers: Extensive surveys on the categorization and characterization of different data center topologies and infrastructures are presented in [529]-[535]. In what follows, we briefly review some of the state-of-the-art electronic switching DCN topologies while emphasizing their suitability for big data and cloud applications. Servers in data centers are typically organized in “racks” where each rack typically accommodates between 16 and 32 servers. A Top-of-Rack (ToR) switch (also known as an access or edge switch) is used to provide direct connections between the rack's servers and indirect connections with other racks via higher-layer switches according to the DCN topology. Most legacy DCNs have a multi-rooted tree structure where the ToR layer is connected either to an upper core layer (two-tier) or to upper aggregation and core layers (three-tier) [528]. For various improvement purposes, alternative designs based on Clos networks, flattened connections with high-radix switches, unstructured connections, and wireless transceivers were also considered. These architectures can be classified as switch-centric as the servers are only connected to ToR switches and the routing functionalities are exclusive to the switches. Another class of DCNs, known as server-centric, utilizes servers (or sets of servers) with multiport NICs and software-based routing to aid traffic forwarding. A brief description of some electronic switching DCNs is provided below and some examples of small topologies are illustrated in Figure 12 showing the architecture in each case:
• Three-tier data centers [528]: Three-tier designs have access, aggregation, and core layers (Figure 12(a)). Different subsets of ToR/access switches are connected to aggregation switches which connect to core switches with higher capacity to ensure all-to-all rack connectivity. This increases the over-subscription ratio as the bisection bandwidth between different layers varies due to link sharing.
Supported by firewall, load balancing, and security features in their expensive switches, three-tier data centers were well-suited for legacy Internet-based services with dominant north-south traffic and were widely adopted in production data centers.
• k-ary Fat-tree [536]: Fat-tree was proposed to provide 1:1 oversubscription and multiple equal-cost paths between servers in a cost-effective manner by utilizing commodity switches with the same number of ports (k) at all layers (a sizing sketch is given after this list). Fat-tree organizes sets of equal numbers of edge and aggregation switches in pods and connects each pod as a complete bipartite graph. Each edge switch is connected to a fixed number of servers and each pod is connected to all core switches, forming a folded-Clos network (Figure 12(b)). The Fat-tree architecture is widely considered in industry and research [529], indicating its efficiency with various workloads. However, its wiring complexities increase massively with scaling.
• VL2 [537]: VL2 is a three-tier Clos-based topology with 1:1 oversubscription proposed to provide performance isolation, load balancing, resilience, and agility in workload placement by using a flat layer-2 addressing scheme with address resolution. VL2 suits virtualization and multi-tenancy; however, its wiring complexities are high.
• Flattened ButterFLY (FBFLY) [538]: FBFLY is a cost-efficient topology that flattens k-ary n-fly butterfly networks into k-ary n-flat networks by merging the n switches in each row into a single high-radix switch. FBFLYs improve the path diversity of butterfly networks and achieve folded-Clos network performance under load-balanced traffic at half the cost. However, with random adversarial traffic patterns, both load balancing and routing become challenging; thus, FBFLY is not widely considered for big data applications.
• HyperX [539]: HyperX is a direct network of switches proposed as an extension to hypercube and flattened butterfly networks. Further design flexibility is provided as several regular or general configurations are possible. For load-balanced traffic, HyperX achieved the performance of folded Clos with fewer switches. Like FBFLY, HyperX did not explicitly target improving big data applications.
• Spine-leaf (e.g. [540], [541]): Spine-leaf DCNs are folded Clos-based architectures that gained widespread adoption by industry as they utilize commercially-available high-capacity and high-radix switches. Spine-leaf allows flexibility in the number of spine switches, leaf switches, and servers per leaf, and in the link capacities at all layers (e.g. in Figure 12(c)). Hence, controllable oversubscription according to cost-performance trade-offs can be attained. Their commercial usage indicates acceptable performance with big data and cloud applications. However, wiring complexities are still high.
• BCube [542] and MDCube [543]: BCube is a generalized hypercube-based architecture that targets modular data centers with scales that fit in shipping containers. The scaling in BCube is recursive, where the first building block “BCube0” is composed of n servers and an n-port commodity switch, and the kth level (i.e. BCubek) is composed of n BCubek-1 blocks and n^k n-port switches. Figure 12(d) shows a BCube1 with n=4. To support multipath routing and provide low latency, high bisection bandwidth, and fault tolerance, BCube utilizes switches and servers equipped with multiple ports so that servers connect to switches at different levels.
BCube is hence suitable for several traffic patterns, such as one-to-one, one-to-many, one-to-all, and all-to-all, which arise in big data workloads. However, at large scales, bottlenecks between lower and higher levels increase and the address space has to be overwritten. For larger scales, MDCube in [543] was proposed to interconnect BCube containers by high speed links in 1D or 2D connections.
• CamCube [544]: CamCube is a server-centric architecture that directly connects servers in a 3D torus topology where each server is connected to its 6 neighbouring servers. In CamCube, switches are not needed for intra-DCN routing, which reduces costs and energy consumption. A greedy key-based routing algorithm, symbiotic routing, is utilized at the servers, which enables application-specific routing and arbitrary in-network functions, such as caching and aggregation at each hop, that can improve big data analytics [214]. However, CamCube might not suit delay-sensitive workloads at a high number of servers due to routing complexity, longer paths, and high store-and-forward delays.

• DCell [545]: DCellk is a recursively-scaled data center that utilizes a commodity switch per DCell0 pod to connect its servers, and the remaining ports of the (k+1) server ports for direct connections with servers in other pods of the same level and in higher-level pods. Figure 12(e) shows a DCell1 with 4 servers per pod. DCell provides high bandwidth, scalability, and fault-tolerance at low cost. In addition, under all-to-all, many-to-one, and one-to-many traffic patterns, DCell achieves balanced routing, which ensures high performance for big data applications. However, as it scales, longer paths between servers in different levels are required.
• FiConn [546]: FiConn is a server-centric DCN that utilizes switches and dual-port servers to recursively scale while maintaining a low diameter and high bandwidth at reduced cost and wiring complexity compared to BCube and DCell. In FiConn0, one port in each server is connected to the switch, and at each level, half of the remaining ports in the pods are reserved for connections with servers in the next level. For example, Figure 12(f) shows a FiConn2 with 4 servers per FiConn0. Real-time requirements are supported by employing a small diameter and hop-by-hop traffic-aware routing according to the network condition. This also improves the handling of the bursty traffic of big data applications.
• Unstructured data centers with random connections: With the aim of reducing the average path lengths and easing incremental expansions, unstructured DCNs based on random graphs such as Jellyfish [547], Scafida [548], and Small-World Data Center (SWDC) [549] were proposed. Jellyfish [547] creates random connections between homogeneous or heterogeneous ToR switches and connects hosts to the remaining ports to support incremental scaling while achieving higher throughput due to low average path lengths. Scafida [548] is an asymmetric scale-free data center that incrementally scales under limits on the longest path length. In Scafida, two disjoint paths are assigned per switch pair to ensure high resilience. SWDC [549] includes its servers in the routing and connects them using small-world-inspired connection distributions. Unstructured DCNs, however, have routing and wiring complexities, and their performance with big data workloads is not widely addressed.
• Data centers with wireless 60 GHz radio transceivers: To improve the performance of tree-based DCNs without additional wiring complexity, the use of wireless transceivers at servers or ToR switches was also proposed [550], [551].
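To make the scaling properties of the topologies above concrete, the short calculation below applies the standard sizing formulas for fat-tree [536] and BCube [542]; the chosen k and n values are examples only.

```python
# Quick sizing check for some of the topologies above (standard formulas for
# fat-tree [536] and BCube [542]; k and n below are example values only).

def fat_tree(k):
    """k-port switches: k pods, (k/2)^2 core switches, k^3/4 servers."""
    servers = k ** 3 // 4
    edge = agg = k * (k // 2)
    core = (k // 2) ** 2
    return {"servers": servers, "switches": edge + agg + core}

def bcube(n, k):
    """BCube_k from n-port switches: n^(k+1) servers, (k+1)*n^k switches."""
    return {"servers": n ** (k + 1), "switches": (k + 1) * n ** k}

print(fat_tree(48))   # {'servers': 27648, 'switches': 2880}
print(bcube(8, 3))    # {'servers': 4096, 'switches': 2048}
```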

Fig. 12. Examples of electronic switching DCNs (a) Three-tier, (b) Fat-tree, (c) Spine-leaf, (d) BCube, (e) DCell, and (f) FiConn.
B. Hybrid Electronic/Optical and All Optical Switching Data Centers: Optical switching technologies have been proposed for full or partial use in DCNs as solutions to overcome the bandwidth limitations of electronic switching, reduce costs, and improve performance and energy efficiency [552]-[557]. Such technologies eliminate the need for O/E/O conversion at intermediate hops and make the interconnections data-rate agnostic. Hybrid architectures add Optical Circuit Switching (OCS), typically realized with Micro-Electro-Mechanical System Switches (MEMSs) or free-space links, to enhance the capacity of an existing Electronic Packet Switching (EPS) network. To benefit from both technologies, bursty traffic (i.e. mice flows) is handled by the EPS while bulky traffic (i.e. elephant flows) is offloaded to the OCS (a toy classification sketch is given after the list below). MEMS-based OCS requires a reconfiguration time on the scale of ms or µs before setting up paths between pairs of ToR switches, and because packet headers are not processed, external control is needed for the reconfigurations. Another shortcoming of MEMSs is their limited port count. WDM technology can increase the capacity of ports without a huge increase in power consumption [270], resolve wavelength contention, and reduce wiring complexities at the cost of additional devices for multiplexing and de-multiplexing, and fast-tuning lasers and tuneable transceivers at ToRs or servers. In addition, most of the advances in optical networking discussed in Subsection IV-B6 have also been considered for DCNs, such as OFDM [558], PON technologies [559]-[573], and EONs [574]-[576]. In hybrid and all optical DCNs, both active and passive components were considered. The passive components, including fibers, waveguides, splitters, couplers, Arrayed Waveguide Gratings (AWGs), and Arrayed Waveguide Grating Routers (AWGRs), do not consume power but suffer insertion, crosstalk, and attenuation losses. Active components include Wavelength Selective Switches (WSSs), which can be configured to route different sets out of a total of M wavelengths at an input port to N different output ports (i.e. a 1×N switch), MEMSs, Semiconductor Optical Amplifiers (SOAs), which can provide switching times in the range of ns, Tuneable Wavelength Converters (TWCs), and Mach-Zehnder Interferometers (MZIs), which are external modulators based on controllable phase shifts in split optical signals. In addition to OCS, Optical Packet Switching (OPS) [577]-[580] was also considered, with or without intermediate electronic buffering. Examples of hybrid electrical/optical and all optical switching DCNs are summarized below, and some are illustrated in Figure 13:
• c-Through [581]: In c-Through, electronic ToR switches are connected to a two-tier EPS network and a MEMS-based OCS, as depicted in Figure 13(a). The EPS maintains persistent but low bandwidth connections between all ToRs and handles mice flows, while the OCS must be configured to provide high bandwidth links between pairs of ToRs at a time to handle elephant flows. As the MEMSs used have ms switching times, c-Through was only proven to improve the performance of workloads with slowly varying traffic.
• Helios [582]: In Helios, electronic ToR switches are connected to a single tier containing an arbitrary number of EPSs and MEMS-based OCSs, as in Figure 13(b).
Helios performs WDM multiplexing on the OCS links and hence requires WDM transceivers in the ToRs. Due to its complex control, Helios was only demonstrated to improve the performance of applications with second-scale traffic stability.
• Mordia [583]: Mordia is a 24-port OCS prototype based on a ring connection between ToRs, each with a 2D MEMS-based WSS, that provides an 11.5 µs reconfiguration time at 65% of the efficiency of electronic switching. Mordia can support unicast, multicast, and broadcast circuits, and enables offloading of both long and short flows, which makes it suitable for big data workloads. However, it has limited scalability as each source-destination pair needs a dedicated wavelength.
• Optical Switching Architecture (OSA) [584] / Proteus [585]: OSA and Proteus utilize a single MEMS-based optical switching matrix to dynamically change the physical topology of electronic ToR connections. Each ToR is connected to the MEMS via an optical module that contains multiplexers/demultiplexers for WDM, a WSS, circulators, and couplers, as depicted in Figure 13(c). This flexible design allows multiple connections per ToR to handle elephant flows and eliminates blocking for mice flows by enabling multi-hop connections via relaying ToRs. OSA was examined with bulk transfers and mice flows, and minimal overheads were reported while achieving 60%-100% of the non-blocking bisection bandwidth.
• Data center Optical Switch (DOS) [586]: DOS utilizes an (N+1)-port AWGR to connect N electronic ToR switches through OPS with the aid of optical label extractors, as shown in Figure 13(d). Each ToR is connected via a TWC to the AWGR, which enables it to connect to one other ToR at a time, while each ToR can receive from multiple ToRs simultaneously. The remaining ports of the AWGR are connected to an electronic buffer to resolve contention between transmitting ToRs. DOS suits applications with bursty traffic patterns; however, its disadvantages include the limited scalability of AWGRs and the power-hungry buffering.
• Petabit [587]: Petabit is a bufferless, high-radix OPS architecture that utilizes three stages of AWGRs (input, central, and output) in addition to TWCs, as depicted in Figure 13(e). At each stage, the wavelength can be tuned to a different one according to the contention at the next stage. However, electronic buffering at the ToRs and effective scheduling are required to achieve high throughput. Petabit can scale without impacting latency and thus can guarantee high performance for applications even at large scales.
• Free-Space Optics (FSO)-based data centers: Using FSO-based interconnects with ceiling mirrors to link ToRs in DCNs was proposed by several studies such as FireFly [588] and Patch Panels [589].
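As referenced above, a toy flow classifier illustrates the elephant/mice split used in hybrid EPS/OCS fabrics; the byte threshold and flow records are illustrative assumptions rather than values from the cited systems.

```python
# Toy flow classifier for a hybrid EPS/OCS fabric: elephant flows above a byte
# threshold are steered to optical circuits, everything else stays on the
# packet network (threshold and flow records are illustrative assumptions).

ELEPHANT_BYTES = 100 * 1024 * 1024   # 100 MB, an assumed cutoff

def split_flows(flows):
    """flows: list of dicts with 'src_rack', 'dst_rack', 'bytes'.
    Returns (ocs_requests, eps_flows): circuit requests are aggregated per
    rack pair because an OCS port serves one rack-to-rack circuit at a time."""
    ocs, eps = {}, []
    for f in flows:
        if f["bytes"] >= ELEPHANT_BYTES:
            pair = (f["src_rack"], f["dst_rack"])
            ocs[pair] = ocs.get(pair, 0) + f["bytes"]
        else:
            eps.append(f)
    return ocs, eps

demo = [
    {"src_rack": "r1", "dst_rack": "r7", "bytes": 2_000_000_000},  # shuffle
    {"src_rack": "r1", "dst_rack": "r7", "bytes": 500_000_000},
    {"src_rack": "r2", "dst_rack": "r3", "bytes": 40_000},         # RPC
]
print(split_flows(demo))
```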

Fig. 13. Examples of hybrid/all optical switching DCNs (a) c-Through, (b) Helios, (c) OSA/Proteus, (d) DOS, and (e) Petabit.

C. HPC Clusters and Disaggregated Data Centers: The DCNs summarized in the previous two Subsections target production and enterprise data centers with general-purpose usage. Alternatively, HPC clusters target a specific set of compute-intensive applications and are thus designed with specialized servers and accelerators such as Graphical Processing Units (GPUs), in addition to dedicated high-speed interconnects to parallel storage systems such as InfiniBand and Myrinet, torus or mesh topologies, and photonic interconnects for chip-to-chip, board-to-board, blade-to-blade, and rack-to-rack links [590]-[592]. According to the International Data Corporation (IDC), more than 67% of HPC facilities performed big data analytics [78]. Another variation of conventional data centers is disaggregating the CPU, memory, IO, and network resources at the rack, pod, or entire data center level to achieve utilization and power efficiency advantages over legacy single-box servers [593]-[599]. Combinations of big data applications with uncorrelated resource demands can be deployed in disaggregated data centers with higher utilization, energy efficiency, and performance [593]. D. Characteristics of Traffic inside Data Centers: Traffic characteristics within enterprise, cloud, social networking, and university campus data centers have been reported and analysed in [600]-[604] to provide several insights about traffic patterns, volume variations, and congestion, in addition to various flow statistics such as their duration, arrivals, and inter-arrival times. Intra data center traffic is mainly composed of different mixes of data center application traffic, including retrieval services and big data analytics, and of provisioning operations such as data transfers, replication, and backups. The first three pioneering studies by Microsoft Research [600]-[602] pointed out that the traffic monitoring tools used by ISPs in WANs do not suit data center environments as their traffic characteristics are not statistically similar. The authors in [600] utilized low-overhead socket-level instrumentation at 1500 servers to collect application-aware traffic information. This cluster contained diverse workloads including MapReduce-style workloads that generated 10GB per server per day. Two traffic patterns were defined: Work-Seeks-Bandwidth, for engineered applications with high locality, and Scatter-Gather, for applications that require servers to push or pull data from several others. It was found that 90% of the traffic stays inside the racks and that 80% of the flows last less than 10 s, while less than 0.1% last more than 200 s. The traffic patterns showed transient or stable spikes, and the flow inter-arrivals formed periodic short-term bursts spaced by an average of 15 ms and a maximum of 10 s. In [601], an empirical study of traffic patterns in 19 tree-based two-tier and three-tier data centers for web-based services was carried out based on coarse-grained measurements of link utilization and packet loss rates taken from Simple Network Management Protocol (SNMP) logs at routers and switches every five minutes. Average link loads were found to be highest at the core switches, while the highest packet losses were found at edge (i.e. ToR) switches. Additionally, fine-grained packet-level statistics at five edge switches in a smaller cluster showed a clear ON-OFF intensity and log-normal inter-arrivals during ON intervals.
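As a toy illustration of the reported ON-OFF behaviour, the sketch below generates synthetic packet arrivals with log-normal inter-arrival times inside exponentially distributed ON periods; all distribution parameters are assumptions and are not fitted to the measurements in [601].

```python
# Synthetic per-port arrival generator echoing the ON-OFF behaviour reported
# in [601]: log-normal packet inter-arrivals inside ON periods separated by
# idle OFF gaps (all distribution parameters are illustrative, not fitted).

import random

def on_off_arrivals(duration_s, on_mean_s=0.05, off_mean_s=0.2,
                    lognorm_mu=-9.0, lognorm_sigma=1.0, seed=1):
    """Return packet arrival timestamps (seconds) over [0, duration_s)."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while t < duration_s:
        on_end = t + rng.expovariate(1.0 / on_mean_s)       # ON period length
        while t < min(on_end, duration_s):
            t += rng.lognormvariate(lognorm_mu, lognorm_sigma)
            arrivals.append(t)
        t = on_end + rng.expovariate(1.0 / off_mean_s)       # OFF gap
    return [a for a in arrivals if a < duration_s]

pkts = on_off_arrivals(1.0)
print(len(pkts), "packets in 1 s of synthetic trace")
```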
Reverse engineering methods to obtain fine-grained characteristics from SNMP logs were suggested for data center traffic generators and simulators. SNMP logs were also utilized in [602] to study the traffic empirically while considering broader data center usages and topologies. These included 10 data centers with two-tier, three-tier, star-like, and Middle-of-Rack switch-based (i.e. connecting servers in several racks) topologies for university campuses and private enterprises, in addition to web-based services and big data analytics cloud data centers. The send/receive patterns of applications including authentication services, HTTP and secure HTTP-based services, and local-use applications were examined at the flow level to measure their effects on link utilization, congestion, and packet drop rate. It was found that 80% of the traffic stayed inside the racks in cloud data centers, while 40-90% left the racks in university and enterprise data centers. Fine-grained packet traces from selected switches in 4 DCNs indicated that 80% of the flows are small (i.e. ≤ 10 kB), that fewer than 10,000 flows per second per rack were active, and that inter-arrivals were less than 10 µs for 2-13% of the flows. The recent studies [603] and [604] presented the traffic characteristics inside Facebook's 4-tier Clos-based data center that hosts hundreds of thousands of 10 Gbps servers. In [603], data center-wide monitoring tools and per-host packet-level traces were utilized to characterize the traffic while focusing on its locality, stability, and predictability. The practice in this architecture is to assign each machine a single role (e.g. cache, web server, Hadoop), to localize each type of workload in certain racks, and to vary the level of oversubscription according to the workload needs. Traffic was found to be neither fully rack-local nor all-to-all, and without ON-OFF behaviour. Also, it was found that servers communicated with up to hundreds of other servers concurrently, most of the flows were long-lived, and non-Hadoop packets were <200 Bytes. To capture fine-grained network behaviours such as µbursts (i.e. high utilization events lasting <1 ms), high-resolution measurements at the rack level with a granularity of tens to hundreds of µs were utilized in [604] for the same data center and application sets as in the previous study. The measurements were based on a developed counter collection framework that polls packet counters and buffer utilization statistics every 25 µs and 50 µs, respectively. It was found that high utilization events were short-lived, as more than 70% of bursts lasted at most tens of µs, and that the load is very unbalanced, as web and Hadoop racks had hot downlink ports while cache racks had hot uplink ports. 90% of bursts lasted <200 µs for all application types and <50 µs for web racks, and the highest tail was recorded for Hadoop racks at 0.5 ms. It was noticed that the packets included in µbursts are larger than those outside them, that µbursts were caused by application behavioural changes, and that the arrival rate of µbursts was not Poisson, with 40% of inter-arrivals being >100 µs for cache and web racks. Regarding the impact of µbursts on shared buffers, Hadoop racks had ports with buffers at >50% utilization, while web and cache racks had a maximum of 71% and 64% of their port buffers at high utilization. Latency and packet loss measurements between VMs in different public clouds were performed in [605] through a developed tool, PTPmesh, which aids cloud users in monitoring network conditions.
Results for the one-way messaging delay between data centers in the same and in different clouds were shown to range from µs to ms values. Specific traffic measurements for big data applications were presented in [279], where three traffic patterns were reported: single peaks, repeated fixed-width peaks, and peaks with varying heights and widths. To forecast Hadoop traffic demands in cloud data centers, HadoopWatch, which utilizes real-time file system monitoring tools, was proposed in [606]. HadoopWatch monitors the meta-data and log files of Hadoop to extract accurate traffic information for importing, exporting, and shuffling data flows a few seconds before they enter the network. Based on coflow information, a greedy joint optimization of flow scheduling and routing improved job completion times by about 14%. Several recent studies tackled improving the profiling, estimation, and generation of data center traffic to aid in examining DCN TE, congestion control, and load balancing methods [607]-[610]. The authors in [607] applied sketch-based streaming algorithms to profile data center traffic while considering its skewness among different services. To model the spatial and temporal characteristics of traffic in large-scale data center systems, ECHO was developed in [610] as a scalable, topology- and workload-independent modeling scheme that utilizes hierarchical Markov Chains at the granularities of groups of racks, racks, and servers. CREATE, a fine-grained Traffic Matrix (TM) estimation scheme proposed in [608], utilizes the sparsity of traffic between ToR switches to remove underutilized links; the correlation coefficients between ToR switches in the reduced topology are then extracted from SNMP counters and service placement logs to estimate the TM. In [609], random number generators for Poisson shot-noise processes were utilized to design a realistic flow-level TM generator between hosts in tree-based DCNs, which considers flow arrival rates, the ratio of flows staying in the same rack, durations, and sizes, and which can be used with packet-level traffic generators for DCN simulations.

E. Intra Data Centers Routing Protocols and Traffic Scheduling Mechanisms: Routing protocols, which define the rules for choosing the paths for flows or flowlets between source and destination servers, were extensively surveyed in [611]-[616]. Routing in DCNs can be static or adaptive, where path assignments change dynamically according to criteria measured by a feedback mechanism. Adaptive routing or traffic scheduling can be centralized, where a single controller gathers network-wide information and distributes routing and rate decisions to switches and servers, or distributed, where the decisions are taken independently by the switches or servers based on a partial view of the network. Centralized mechanisms provide optimal decisions but have limited scalability, while distributed mechanisms are scalable but not always optimal. Tree-based data centers such as three-tier designs typically utilize VLANs with the Spanning Tree Protocol (STP), a simple Layer 2 protocol that eliminates loops by disabling redundant links and forcing the traffic to route through core switches. Spine-leaf DCNs typically use improved protocols such as Transparent Interconnection of Lots of Links (TRILL) or Shortest Path Bridging (SPB) that enable the utilization of all available links while ensuring loop-free routing.
CONGA [617] was proposed as a distributed flowlet routing mechanism for spine-leaf data centers that achieves load balancing by utilizing leaf-to-leaf congestion feedback. Improved tree-based DCNs such as Fat-tree and server-centric DCNs require routing protocols designed closely with their topological properties to fully exploit the topology. For example, Fat-tree requires specific routing with two-level forwarding tables for servers with fixed pre-defined addresses [536]. To select paths according to network bandwidth utilization in Fat-tree, DARD was proposed in [618] as a host-based distributed adaptive routing protocol. For agility, VL2 uses two addresses for servers: a Location-specific Address (LA) and an Application-specific Address (AA) [537]. For packet forwarding, VL2 employs Equal Cost Multi-Path (ECMP), a static Layer 3 routing protocol that distributes flows to paths by hashing, and Valiant Load Balancing (VLB), which randomly selects intermediate nodes between a source and a destination. BCube employs a source routing protocol (BSR) [542], DCell adopts a distributed routing protocol (DFR) [545], and JellyFish [547] uses a k-shortest paths algorithm. c-Through uses Edmonds' algorithm to obtain the MEMS configurations from the traffic matrix, and the ToR switch traffic is then sent via VLAN-based routing to the OCS or the EPS [581]. Helios has a complex control scheme with three modules: a Topology Manager (TM), a Circuit Switch Manager (CSM), and a Pod Switch Manager (PSM) [582]. Mordia utilizes a Traffic Matrix Scheduling (TMS) algorithm that obtains effective short-lived circuit schedules, based on predicted demands, which are applied to configure the MEMS and WSSs sequentially. OSA and Proteus solve a maximum-weight b-matching problem to enable the connection of multiple ToR switches, configure the WSSs to match capacities, and then use shortest path-based routing [584]. Using the Internet's Transmission Control Protocol (TCP) in data center environments has been shown to be inefficient [613] due to the different nature of data center traffic, the higher sensitivity to incast, and the key requirement of minimizing the Flow Completion Time (FCT) for data center applications. Thus, different transport protocols were proposed for DCNs. DCTCP [619] provides similar or better throughput than TCP and guarantees low Round Trip Times (RTT) by actively controlling the queue lengths in the switches. MPTCP [620] splits flows into sub-flows and balances the load across several paths via linked congestion control; however, it might perform excessive splitting, which requires extensive CPU and memory resources at end hosts. D2TCP [621] is a Deadline-aware Data center TCP protocol that considers single paths for flows and performs load balancing. For FCT reduction, D3, proposed in [622], uses flow deadline information to control the transmission rate. pFabric [623] and PDQ [624] enable the prioritization of the flows closest to completion, while DeTail [625] splits flows and performs adaptive load balancing based on queue occupancy to reduce the highest FCT. Alternatively, the centralized schedulers Orchestra [258], Varys [262], and Baraat [263], which are further elaborated in Subsection VII-C, target reducing the completion time of coflows, which are sets of flows with application-related semantics such as the intermediate data shuffling in MapReduce.
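To make the ECMP behaviour described above concrete, the following simplified sketch (an illustration of hash-based path selection in general, not the implementation of any particular switch or of the cited protocols) hashes a flow's 5-tuple so that all packets of the flow follow the same equal-cost path, which is also why colliding elephant flows can congest a single path:

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Static ECMP-style selection: hash the flow 5-tuple and use the
    result modulo the number of equal-cost paths. Every packet of the
    same flow maps to the same path, so large flows can collide."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

# Hypothetical equal-cost uplinks from a leaf switch.
paths = ["agg-sw-1", "agg-sw-2", "agg-sw-3", "agg-sw-4"]
print(ecmp_next_hop("10.0.1.2", "10.0.3.4", 40123, 50010, "tcp", paths))
```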
Virtualization in data centers was surveyed in [626] with a focus on routing and resource management, and in [627] with a focus on techniques for network isolation in multi-tenant data centers. SDN has been widely considered for data centers as it can improve load balancing and congestion detection and mitigation [421], [431], [628]. To allow users to reserve bandwidth in data centers for their VM-to-VM communications, SecondNet was proposed in [629] as a centralized controller that determines the rate and path for each user's flow. Hedera [630] detects elephant flows and maximizes their throughput via a centralized SDN-based controller. ElasticTree [631] improves the energy efficiency of Fat-tree DCNs by dynamically switching off sets of links and switches while meeting demands and maintaining fault tolerance.

F. Energy Efficiency in Data Centers: The energy consumption in data centers is attributed to servers and storage, networking devices, cooling, power provisioning, and lighting facilities, with shares of 26%, 10%, 50%, 11%, and 3% of the total energy consumption, respectively [632]. As the energy consumption of servers is becoming more proportional to their load, and hence their energy efficiency is improving faster, the share attributed to networking is expected to increase [633]. In [632], techniques for modeling data center energy consumption were comprehensively surveyed and divided into hardware-centric and software-centric approaches. In [634], green metrics, including the Power Usage Effectiveness (PUE) (defined as the total facility power over the IT equipment power), and measurement tools that can characterize emissions were surveyed to aid in sustaining distributed data centers. Several studies considered reducing the energy consumption and costs in data centers at different levels [635]-[639]. At the hardware level, dynamically switching off idle components, proposing hardware with inherently more efficient components, DVFS, and utilizing optical networking elements were considered. For example, to improve the energy proportionality of Ethernet switches, the Energy Efficient Ethernet (EEE) standard [639] was developed. EEE enables three states for interfaces: active, idle with no transmission, and low-power idle (i.e. deep sleep). Although EEE has gained industrial adoption, its activation is often not advised due to uncertainty about its impact on application performance [283]. Placing workloads and VMs into fewer servers and scheduling tasks to shave peak power usage were also proposed to balance the power consumption and utilization in data centers.
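As an illustrative calculation only, taking the energy breakdown quoted above at face value and treating servers, storage, and networking (26% + 10%) as the IT load, the corresponding PUE would be roughly:

```latex
\mathrm{PUE} \;=\; \frac{P_{\text{total facility}}}{P_{\text{IT equipment}}}
\;\approx\; \frac{100\%}{26\% + 10\%} \;\approx\; 2.8
```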

VII. DATA CENTERS-FOCUSED OPTIMIZATION STUDIES

This Section summarizes a number of big data application optimization studies that consider the characteristics of the hosting data centers, including improving their designs and protocols or analyzing the impact of their computing and networking parameters on the applications’ performance. Subsection VII-A addresses the performance, scalability, flexibility, and energy consumption improvements and tradeoffs for big data applications under various data center topologies and design considerations. Subsection VII-B focuses on studies that improve intra data center routing protocols to enhance the performance of big data applications while improving the load balancing and utilization of the data centers. Subsection VII-C discusses flow, coflow, and job scheduling optimization studies that target different application and data center performance goals. Finally, Subsection VII-D addresses studies that utilize advanced technologies to scale up big data infrastructures and improve their performance. The studies presented in this Section are summarized in Tables V and VI.

A. Data Center Topology: Evaluating the performance and energy efficiency of big data applications in different data center topologies was considered in [204]-[209]. The authors in [204] modeled Hadoop clusters with up to 4 ToR switches and a core switch to measure the influence of the network on the performance. Several simplifications, such as homogeneous servers and uniform data distribution, were applied, and model-based and experimental evaluations indicated that Hadoop scaled well under 9 different cluster configurations. The MRPerf simulator was utilized in [205] to study the effect of the data center topology on the performance of Hadoop while considering several parameters related to the cluster (e.g. CPU, RAM, and disk resources), the configuration (e.g. chunk size, number of map and reduce slots), and the framework (e.g. data placement and task scheduling). DCell was compared to star and double-rack clusters with 72 nodes under the assumptions of a single replica and no speculative execution, and was found to improve sorting by 99% compared to double-rack clusters. The authors in [206] extended the CloudSim simulator [110] into CloudSimExMapReduce to estimate the completion time of jobs in different data center topologies with different workload distributions. Compared to a hypothetically optimal topology for MapReduce with a dedicated link for each intermediate data shuffling flow, CamCube provided the best performance. Different levels of intermediate data skew were also examined and worse performance was reported for all the topologies. In [207], we examined the effects of the data center network topology on the performance and energy efficiency of shuffling operations in MapReduce with sort workloads in data centers with electronic, hybrid, and all-optical switching technologies and different rate/server values. The results indicated that the optical switching technologies reduced the average power consumption by 54% compared to electronic switching data centers while providing comparable performance. In [208], the Network Power Effectiveness (NPE), defined as the ratio between the aggregate throughput and the power consumption, was evaluated for six electronic switching data center topologies under regular and energy-aware routing. The power consumption of the switches, and of the server NIC ports and CPU cores used to process and forward packets in server-centric topologies, was considered.
The results indicated that FBFLY achieved the highest NPE, followed by the server-centric data centers, and that NPE is only slightly impacted by the topology size, as the number of switches scales almost linearly with the data center size for the topologies examined. Design choices such as link speeds, oversubscription ratios, and buffer sizes in spine-leaf architectures were examined by simulations in [209] with realistic web search queries having Poisson arrivals and a heavy-tailed size distribution. It was found that ECMP is efficient only at link capacities higher than 10 Gbps, as 10 Gbps links resulted in a 40% performance degradation compared to an ideal non-blocking switch. Higher oversubscription ratios degraded the performance only at loads of 60% and higher. Examining the spine and leaf switch queue sizes revealed that it is better to keep them consistent and that additional capacity is more beneficial at the leaf switches. Flexible Fat-tree was proposed in [210] as an improvement and generalization of the Fat-tree topology in [536] to achieve higher aggregate bandwidth and richer paths by allowing an uneven number of aggregation and access switches in the pods. With more aggregation switches, shuffling results indicated about 50% improvement in the completion time. As a cost-effective solution to improve oversubscribed production data centers, the concept of flyways, where additional on-demand wireless or wired links are provisioned between congested ToRs, was introduced in [211] and further examined in [212]. Under sparse ToR-to-ToR bandwidth requirements, the results indicated that a few flyways allocated at the right ToR switches improved the performance by up to 50%, bringing it closer to that of 1:1 DCNs. The flyways can be 802.11g, 802.11n, or 60 GHz wireless links, or random wired connections between a subset of the ToR switches via commodity switches. The wired connections, however, cannot help if the congestion is between unconnected ToRs. A central controller was proposed to gather the demands and utilize MPLS to forward the traffic over the oversubscribed link or one of the flyways. In [213], a spectrum-efficient and failure-tolerant design for wireless data centers with 60 GHz transceivers was examined for data mining applications. A spherical rack architecture based on a bimodal degree distribution for the servers’ connections was proposed to reduce the hop count and hence the transmission time compared to a traditional wireless data center with cylindrical racks and same-degree Cayley graph connections. Challenges related to interference, path loss, and the optimization of hub server selection were addressed to improve the data transmission rate. Moreover, the efficiency of executing MapReduce in failure-prone environments (due to software and hardware failures) was simulated. Several big data frameworks that tailor their computations to the data center topology or exploit its properties were proposed in [214]-[216]. Camdoop [214] is a MapReduce-like system that runs on CamCube and exploits its topology by aggregating the intermediate data along the path to the reduce workers. A window-based flow control protocol and independent disjoint spanning trees with the same root per reduce worker were used to provide load balancing. Camdoop achieved improvements over switch-centric Hadoop and Dryad deployments, and over Camdoop variants running TCP or without aggregation.
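The benefit of such in-network aggregation can be illustrated with a minimal sketch (a generic combiner illustration, not Camdoop's implementation): partial word counts arriving at an intermediate hop are merged before being forwarded, so the volume sent towards the reduce worker shrinks:

```python
def aggregate_at_hop(incoming_partials):
    """Merge partial (key -> count) dictionaries arriving at an on-path
    node so that only one combined record per key is forwarded upstream,
    mimicking the traffic reduction of on-path aggregation."""
    merged = {}
    for partial in incoming_partials:
        for key, count in partial.items():
            merged[key] = merged.get(key, 0) + count
    return merged

# Two children each forward partial counts; the hop forwards one merged dict.
children = [{"cat": 3, "dog": 1}, {"cat": 2, "fish": 5}]
print(aggregate_at_hop(children))   # {'cat': 5, 'dog': 1, 'fish': 5}
```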
In [215], network-awareness and the utilization of existing or attached networking hardware were proposed to improve the performance of query applications. In-network processing in ToR switches with attached Network-as-a-Service (NaaS) boxes was examined to partially reduce the data and hence reduce bandwidth usage and increase the query throughput. For API transparency, a shim layer is added to perform software-based custom routing of the traffic through the NaaS boxes. RAMCube, a RAM-based key-value store for BCube [542], was proposed in [216] to address the false failure detection in large data centers caused by network congestion, the blockage of entire racks due to ToR switch failures, and the traffic congestion during recovery. A symmetric multi-ring arrangement that restricts failure detection and recovery to one hop in BCube is proposed to provide fast fault recovery. Experimental evaluation of the throughput under a single switch failure with 1 GbE NICs in the servers indicated that a maximum of 8.3 seconds is needed to fully transmit the data from a recovery server to a backup server. The topologies of data centers were also considered in optimizing VM assignments for various applications, as in [217]-[220]. A Traffic-aware VM Placement Problem (TVMPP) was proposed in [217] to improve the scalability of data centers. TVMPP follows a two-tier approximation algorithm that leverages knowledge of the traffic demands and the data center topology to co-locate VMs with heavy mutual traffic in nearby hosts. First, the hosts and the VMs are clustered separately and a 1-to-1 mapping that minimizes the aggregate traffic cost is performed; then, each VM within each cluster is assigned to a single host. The gain of TVMPP over random placements was examined for different traffic patterns, and the results indicated that multi-level architectures such as BCube benefit more than tree-based architectures and that heterogeneous traffic leads to larger benefits. To tackle intra data center network performance variability in multi-tenant data centers with network-unaware VM assignments, the work in [218] proposed Oktopus, an online network abstraction and virtualization framework that offers minimum bandwidth guarantees. Oktopus formulates virtual or oversubscribed virtual clusters to suit different types of cloud applications in terms of the bandwidth requirements between their VMs. These allocations are based on greedy heuristics that are exposed to the data center topology, the residual bandwidth in links, and the current VM allocation. The results showed that allocating VMs while accounting for the oversubscription ratio improved the completion time and reduced tenant costs by up to 75% while maintaining the provider's revenue. In [219], a communication cost minimization-based heuristic, Traffic Amount First (TAF), was proposed for VM-to-PM assignments under architectural and resource constraints and was examined in three data center topologies. Inter-VM traffic was reduced by placing VMs with higher mutual traffic in the same PM as much as possible. A topology-independent resource allocation algorithm, NetDEO, was designed in [220] based on swarm optimization to gradually reallocate existing VMs and allocate newly accepted VMs by matching resources and availability. NetDEO maintains the performance during network and server upgrades and accounts for the topology in the VM placements.
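A greatly simplified sketch of traffic-aware placement (not the two-tier TVMPP algorithm of [217] or the Oktopus heuristics of [218]) captures the basic idea of co-locating the most heavily communicating VM pairs; the VM names, capacities, and traffic values below are hypothetical:

```python
def colocate_heavy_pairs(vm_traffic, host_capacity, num_hosts):
    """Simplified traffic-aware placement: repeatedly take the VM pair
    with the largest mutual traffic and place both VMs on the same host
    if capacity allows, so heavy traffic stays host-local."""
    hosts = {h: [] for h in range(num_hosts)}
    placed = {}
    pairs = sorted(vm_traffic.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _ in pairs:
        for h, vms in hosts.items():
            free = host_capacity - len(vms)
            need = (a not in placed) + (b not in placed)
            # Co-locate only if both VMs can end up on this host.
            if need and free >= need and all(placed.get(v, h) == h for v in (a, b)):
                for v in (a, b):
                    if v not in placed:
                        vms.append(v)
                        placed[v] = h
                break
    return placed

traffic = {("vm1", "vm2"): 900, ("vm3", "vm4"): 600, ("vm1", "vm3"): 50}
print(colocate_heavy_pairs(traffic, host_capacity=2, num_hosts=3))
```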
The performance of big data applications in SDN-controlled electronic and hybrid electronic/optical switching data center topologies was considered in [221]-[228]. To evaluate the impact of networking configurations on the performance of big data applications in SDN-controlled multi-rack data centers before deployment, a Flow Optimized Route Configuration Engine (FORCE) was proposed in [221]. FORCE emulates the building of virtual topologies with OVS over the physical network controlled by SDN, enabling the network to be optimized and the application performance to be enhanced at run-time; improvements by up to 2.5 times were achieved. To address big data applications and their need for frequent reconfigurations, the work in [222] examined run-time ToR-level SDN-based topology modification in a hybrid data center with a core MEMS switch and electrical Ethernet-based switches. Different topology construction and routing mechanisms that jointly optimize the performance and network utilization were proposed for single aggregation, shuffling, and partially overlapping aggregation communication patterns. The authors accounted for reconfiguration delays by starting the applications early and accounted for consistency in routing table updates. The work in [223] experimentally examined the performance of MapReduce in two hybrid electronic/optical switching data centers, namely c-Through and Helios, with SDN control. An “observe-analyze-act” control framework based on OpenFlow was utilized to configure the OCS and the packet networks. The authors addressed the hardware and software challenges and emphasized the need for near real-time analysis of application requirements to optimally obtain hybrid switch scheduling decisions. The work in [224] addressed the challenges associated with handling long-lived elephant flows of background applications while running Hadoop jobs in Helios hybrid data centers with SDN control. Although SDN control of electronic switches can provide alternative routes to reduce congestion and allow prioritizing packets in the queues, such techniques largely increase the switches' CPU and RAM requirements in the presence of elephant flows. Alternatively, [224] proposed detecting elephant flows and redirecting them via the high-bandwidth OCS to improve the performance of Hadoop. To reduce the latency of multi-hop connections between servers in 2D Torus data centers, the work in [225] proposed the use of SDN-controlled MEMS to bypass electronic links and directly connect the servers. Results based on an emulation testbed and an all-to-all traffic pattern indicated that optical bypassing can reduce the end-to-end latency for 11 of the 15 hosts by 11%. To improve the efficiency of multicasting and incasting in workloads such as HDFS reads, joins, VM provisioning, and in-cluster software updates, a hybrid architecture based on Optical Space Switches (OSS) was proposed in [226] to establish point-to-point links on-demand that connect passive splitters and combiners. The splitters transparently duplicate the data optically at line rate and the combiners aggregate incast traffic under orchestration system control with TDM. Compared with small-scale electronic non-blocking switches, similar performance was obtained, indicating potential gains for the optical accelerators at larger scales where non-blocking performance is not attainable with electronic switches. Unlike the above work, which fully offloads multicast traffic to the optical layer, HERO [227] was proposed to integrate optical passive splitters and FSO modules with electronic switching to handle multicast traffic.
During the optical switch configuration, HERO multicasts through the electronic switches and then migrates to optical multicasting. HERO exhibited a linear increase in the completion time with increasing flow sizes, and significantly outperformed the binomial and ring Message Passing Interface (MPI) broadcasting algorithms over electronic switching for messages of at most 12 kBytes and larger than 12 kBytes, respectively. A software-defined and controlled hybrid OPS and OCS data center was proposed and examined in [228] for dynamic multi-tenant Virtual Data Center (VDC) provisioning. A topology manager was utilized to build the VDCs with different provisioning granularities (i.e. macro and micro) in an Architecture-on-Demand (AoD) node with OPS and OCS modules in addition to ToRs and servers. For VM placement, a network-aware heuristic was used that jointly considers computing and networking resources and takes the wavelength continuity, the heterogeneity of the optical devices, and the VM dynamics into account. Improvements of 30.1% and 14.6% were achieved by the hybrid topology with 8 and 18 wavelengths per ToR switch, respectively, compared to using OCS only. The work in [229] proposed a 3D MEMS crossbar to connect server blades for scalable stream processing systems. Software-based control, routing, and scheduling mechanisms were utilized to adapt the topology to the computational needs of the processing graph while accounting for the MEMS reconfiguration and protocol stack delays. To overcome the high blocking ratio and scalability limitations of single-hop MEMS-based OCS, the authors in [230] proposed a distributed multi-hop OCS that integrates WDM and SDM technologies with multi-rooted tree-based data centers. A multi-wavelength optical switch based on Microring Resonators (MR) was designed to ensure fast switching. A modification to Fat-tree was proposed in which half of the electronic core, aggregation, and access switches are replaced with the OCS, and distributed control was utilized at each optical switch with low-bandwidth copper links to interconnect the switches and combine the control with the EPS. Compared with Hedera and DARD, a much faster optical path setup (i.e. 126.144 µs) was achieved with much lower control messaging overhead.

B. Data Center Routing: In [231], the placement problem of reduce tasks in multi-rack environments was analyzed to decrease cross-rack traffic based on two greedy approaches. Compared to original Hadoop with random reduce task placement, up to 32% speedup in completion time was achieved across different workloads. A scalable DCN-aware load balancing technique for key distribution and routing in the shuffling phase of MapReduce was proposed in [232] while considering DCN bandwidth constraints and addressing data skewness. A centralized heuristic with two subproblems, network flow and load balancing, was examined and compared to three state-of-the-art techniques: the load balancing-based LPT, the fairness-based LEEN, and the default routing algorithm with hash-based key distribution in MapReduce. For synthetic and realistic traffic, the network-aware load balancing algorithm outperformed the others by 40% in terms of completion time and achieved a maximum load per reduce task comparable to that of LPT.
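The intuition behind such cross-rack-aware placement can be shown with a minimal sketch (not the specific greedy algorithms of [231] or the heuristic of [232]): each reduce partition is placed on the rack that already stores most of its map output, so only the remaining bytes cross racks during shuffling; the partition and rack names below are hypothetical:

```python
def place_reduce_tasks(partition_bytes_per_rack):
    """Greedy reduce-task placement sketch: assign each reduce partition
    to the rack holding most of its map output, so the bytes crossing
    racks during shuffle are minimized partition by partition.
    partition_bytes_per_rack: dict partition -> {rack: bytes of map output}."""
    placement, cross_rack = {}, 0
    for part, per_rack in partition_bytes_per_rack.items():
        best_rack = max(per_rack, key=per_rack.get)
        placement[part] = best_rack
        cross_rack += sum(b for r, b in per_rack.items() if r != best_rack)
    return placement, cross_rack

data = {"p0": {"rackA": 700, "rackB": 100}, "p1": {"rackA": 50, "rackB": 400}}
print(place_reduce_tasks(data))   # p0 -> rackA, p1 -> rackB, 150 bytes cross racks
```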
To improve shuffling and reduce its routing costs under varying data sizes and data reduction ratios, a joint intermediate data partitioning and aggregation scheme was proposed in [233]. A decomposition-based distributed online algorithm was proposed to dynamically adjust the data partitioning by assigning keys with larger data sizes to reduce tasks closer to the map tasks, while optimizing the placement and migration of aggregators that merge the traffic of the same key from multiple map tasks before sending it to remote reduce tasks. For large-scale computations (i.e. 100 keys), and compared to random hash-based partitioning with no aggregation and with a random placement of 6 aggregators, the scheme reduced the completion time by 50% and 26%, respectively. DCP was proposed in [234] as an efficient and distributed cache sharing protocol that reduces the intra data center traffic in Fat-tree data centers by eliminating the need to retransmit redundant packets. It utilizes a packet cache in each switch for the elimination and a Bloom filter header field to store and share cached packet information among switches. Simulation results for an 8-ary Fat-tree data center showed that DCP eliminated 40-60% of the retransmissions. To use the bandwidth in BCube data centers effectively, the work in [235] proposed and optimized two schemes for in-network aggregation at the servers and switches: the first, for incast traffic, was modeled as a minimal incast aggregation tree problem, and the second, for shuffling traffic, was modeled as a minimal shuffle aggregation subgraph problem. Efficient 2-approximation algorithms, named IRS-based and SRS-based, were suggested for the incast and shuffling traffic, respectively. Moreover, an effective forwarding scheme based on in-switch and in-packet Bloom filters was utilized, at a cost of 10 Bytes per packet, to ease the identification of related flows. The results for the SRS-based scheme revealed a traffic reduction of 32.87% for a small BCube and 53.33% for a large-scale BCube with 262,144 servers. The optimization of data transfer throughput in scientific applications was addressed in [236] while considering the impact of combining different levels of TCP flow pipelining, parallelism, and concurrency, and the heterogeneity of file sizes. Recommendations were presented, such as using pipelining only for file sizes smaller than a threshold related to the bandwidth-delay product and with different levels for different file size ranges. To improve the throughput, routing scalability, and upgrade flexibility in electronic switching data centers with random topologies, Space Shuffle (S2) was proposed in [237] as a greedy key-based multi-path routing mechanism that operates in multiple ring spaces. Results based on fine-grained packet-level simulations that considered the finite sizes of shared buffers and forwarding tables and the acknowledgment (ACK) packets indicated improved scalability compared to Jellyfish, and higher throughput and shorter paths than SWDC and Fat-tree; however, the overheads associated with packet reordering were not considered. An oblivious distributed adaptive routing scheme for Clos-based data centers was proposed in [238] and was proven, via approximate Markov models and simulations, to converge to non-blocking assignments with minimal out-of-order packet delivery when the flows are sent at half the available bandwidth.
While transmitting at the full bandwidth with strictly or rearrangeably non-blocking routing resulted in exponential convergence times, the proposed approach converged in less than 80 µs in a 1152-node cluster and showed a weak dependency on the network size, at the cost of a minimal delay due to retransmitting the first packets of redirected flows. The work in [239] utilized stochastic permutations to generate bursty traffic flows and statistically evaluated the expected throughput of several Layer 2 single-path and multi-path routing protocols in oversubscribed spine-leaf data centers. These included deterministic single-path selection based on hashing the source or destination, worst-case optimal single-path routing, ECMP-like flow-level multipathing, and stateful Round Robin-based packet-level multipathing (packet spraying). Simulation results indicated that the throughput of the ECMP-like multipath routing is lower than that of the deterministic single path due to flow collisions, as 40% of the flows experienced a 2-fold slowdown, while packet spraying outperformed all the examined protocols. The authors in [240] proposed DCMPTCP to improve MPTCP through three mechanisms: Fallback for rACk-local Traffic (FACT) to reduce the unnecessary creation of sub-flows for rack-local traffic, an ADAptive packet scheduler (ADA) to estimate flow lengths and enhance their division, and Signal sHARE control (SHARE) to enhance the congestion control for short flows with many-to-one patterns. Compared with two congestion control mechanisms typically used with MPTCP, LIA and XMP, a 40% reduction in the transmission time of inter-rack flows was achieved. The energy efficiency of routing big data application traffic in data centers was considered in [241]-[243]. In [241], preemptive flow scheduling and energy-efficient routing were combined to improve the utilization in Fat-tree data centers by maximizing the sharing of switches while exclusively assigning each flow to the links it needs during its schedule. Compared to link bandwidth sharing with ECMP, and to flow preemption with ECMP, additional energy savings of 40% and 30% were achieved, respectively, at the cost of an increased average completion time. To improve the energy efficiency of MapReduce-like systems, the work in [242] examined combining VM assignment with TE. A heuristic, GEERA, was designed to first cluster the VMs via minimum k-cut and then assign them via local search, while accounting for the traffic patterns. Compared with other load-balancing techniques (i.e. Randomized Shortest Path (RSP), and the integral optimal and fractional solutions), an additional 20% energy saving was achieved on average, with total average savings of 60% in Fat-tree and 30% in BCube data centers. A GreenDCN framework was proposed in [243] to green switch-centric data center networks (with Fat-tree as the focus) by optimizing VM placement and TE while considering network features, such as the regularity and roles of switches, and the applications' traffic patterns. The energy-efficient VM assignment algorithm (OptEEA) merges VMs with heavy mutual traffic into super-VMs, assigns jobs to different pods through k-means clustering, and finally assigns the super-VMs to racks through minimum k-cut. Then, the energy-aware routing algorithm (EER) utilizes the first-fit decreasing algorithm and MPTCP to balance the traffic across a minimized number of switches. Compared with greedy VM assignment and shortest path routing, GreenDCN achieved a 50% reduction in the energy consumption.
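Since the energy-aware routing step in several of these schemes boils down to packing flows onto as few network elements as possible, a minimal first-fit decreasing sketch is shown below; it is an illustrative simplification, not the EER algorithm of [243] or ElasticTree [631], and the flow demands are hypothetical:

```python
def consolidate_flows(flow_demands_gbps, link_capacity_gbps):
    """First-fit decreasing: sort flows by demand and pack each one into
    the first already-active link with enough residual capacity, opening
    a new link only when necessary. Unused links can then be powered off."""
    active_links = []          # residual capacity of each powered-on link
    assignment = {}
    for flow, demand in sorted(flow_demands_gbps.items(),
                               key=lambda kv: kv[1], reverse=True):
        for i, residual in enumerate(active_links):
            if residual >= demand:
                active_links[i] -= demand
                assignment[flow] = i
                break
        else:                  # no existing link fits: power on a new one
            active_links.append(link_capacity_gbps - demand)
            assignment[flow] = len(active_links) - 1
    return assignment, len(active_links)

flows = {"f1": 6.0, "f2": 4.0, "f3": 3.0, "f4": 1.0}
print(consolidate_flows(flows, link_capacity_gbps=10.0))  # only 2 links stay on
```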
The optimization of routing traffic between VMs in multi-tenant data centers was considered in [244]-[246]. VirtualKnotter [244] utilizes a two-step heuristic to optimize VM placement in virtualized data centers while accounting for the congestion caused by core switch oversubscription and unbalanced workload placement. While other TE schemes operate on fixed source and destination pairs, VirtualKnotter considers reallocating them by optimizing VM migration to further reduce the congestion. Compared to a baseline clustering-based VM placement, a 53% reduction in congestion was achieved for production data center traffic. To allow the effective multiplexing of applications with different routing requirements, the authors in [245] proposed an online Topology Switching (TS) abstraction that defines a different logical topology and routing mechanism per application according to its goals (e.g. bisection bandwidth, isolation, and resilience). Results based on simulations for an 8-ary Fat-tree indicated that the tasks achieved their goals with TS, while unified routing via ECMP with shortest paths and isolation based on VLANs failed to guarantee the different goals. The work in [246] addressed the fairness of bandwidth allocation and link sharing between multiple applications in private cloud data centers. A distributed algorithm based on dual decomposition was utilized to assign link bandwidths to flows by maximizing the social welfare across the applications while maintaining performance-centric fairness with controlled relaxation. The authors assumed that the bottlenecks are at the access links of the PMs and considered workloads where half of the tasks communicate with the other half with no data skew. To evaluate the algorithm, two different scenarios for application allocation and communication requirements were used, and the results indicated the flexibility and the fast convergence of the proposed algorithm. SDN-based solutions to optimize the routing of big data application traffic in various data center topologies were discussed in [247]-[257]. MILPFlow was developed in [247] as a routing optimization tool set that utilizes MILP modeling based on an SDN topology description, which defines the characteristics of the data center, to deploy data path rules in OpenFlow-based controllers. To improve the routing of shuffling traffic in Fat-tree data centers, an application-aware SDN routing scheme was proposed in [248]. The scheme includes a Fat-tree manager, a MapReduce manager, a link load monitor, and a routing component. The Fat-tree manager maintains information about the properties of the different connections to prioritize the assignment of flows with less path flexibility. Emulation results indicated a reduction in the shuffling time of 20% and 10% compared to Round Robin-based ECMP without and with data skew, respectively, and a reduction of around 60% compared to Spanning Tree and the Floodlight forwarding module (shortest path-based routing). To enhance the shuffling between map and reduce VMs under background traffic in OpenFlow-controlled clusters, the work in [249] suggested dynamically assigning flows to queues with different rates in OVS and LINC software switches. Results showed that prioritizing Hadoop traffic and providing more bandwidth to straggling reduce tasks can reduce the completion time by 42% compared to solely using a 50 Mbps queue. In [250], a dynamic algorithm for workload balancing between different racks in a Hadoop cluster was proposed.
The algorithm estimates the processing capability of each rack and accordingly modifies the allocation of unfinished tasks towards the racks with the least completion time and higher computing capacity. The proposed algorithm was found to decrease the completion time of the slowest jobs by 50%. A network Overlay Framework (NoF) was proposed in [251] to guarantee the bandwidth requirements of Hadoop traffic at run time. NoF achieves this by defining network topologies, setting communication paths, and prioritizing traffic. A fully virtualized spine-leaf cluster was utilized to examine the impact on job execution time of redirecting Hadoop flows through the overlay network controlled by NoF, and a reduction in the completion time of 18-66% was achieved. An Application-Aware Network (AAN) platform with SDN-based adaptive TE was examined in [252] to control the traffic in Hadoop clusters at a fine-grained level and achieve better performance and resource allocation. An Address Resolution Protocol (ARP) resolver algorithm that avoids flooding was proposed instead of STP to update the routing tables. Compared with MapReduce over oversubscribed links, the SDN-based TE resulted in completion time improvements between 16× and 337× for different workloads. To dynamically reduce the volume of shuffling traffic and green the data exchange, the work in [253] utilized spate coding-based middleboxes under SDN control in 2-tier data centers. The scheme uses a sampler to obtain side information (i.e. traffic from map to reduce tasks residing in the same node) in addition to the coder/decoder at the middleboxes. Then, OpenFlow is used to multicast the coded packets according to an optimized grouping of multicast trees. Compared to native vanilla Hadoop, reductions in the exchanged volume of 43% and 59% were achieved when allocating the middlebox in the aggregation switch and in a ToR switch, respectively; compared to Camdoop-like in-network aggregation, the reductions were 13% and 38%. As an improvement of MPTCP, the authors in [254] proposed a responsive centralized scheme that adds subflows dynamically and selects the best route according to the current traffic conditions in SDN-controlled Fat-tree data centers. A controller that employs Hedera's demand estimation to perform subflow route calculation and path selection, and a monitor per server to adjust the number of subflows dynamically, were utilized. Compared with ECMP and Hedera under shuffling with background traffic, the responsive scheme achieved a 12% improvement in completion time while utilizing a lower number of subflows. To overcome the scalability issues of centralized OpenFlow-based controllers, the work in [255] proposed LocalFlow, a distributed algorithm at each switch that exploits the knowledge of its active flows and defines the forwarding rules at the flowlet, individual flow, and sub-flow resolutions. LocalFlow targets data centers with symmetric topologies and can tolerate asymmetries caused by link and node failures. It allows, but minimizes, the spatial splitting of flows, aggregating flows per destination and splitting only if the load imbalance exceeds a slack threshold. To avoid the pitfalls of packet reordering, the duplicate acknowledgment (dup-ACK) threshold at end hosts is slightly increased. LocalFlow improved the throughput by up to 171%, 19%, and 23% compared to ECMP, MPTCP, and Hedera, respectively.
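The slack-based splitting rule can be illustrated in a greatly simplified form (the actual LocalFlow algorithm in [255] splits flows more carefully; even splitting and the loads below are only illustrative assumptions):

```python
def split_decision(candidate_link_loads, aggregate_demand, slack=0.1):
    """Illustrative slack-threshold rule: send the whole per-destination
    aggregate on the least-loaded link unless doing so would exceed the
    balanced load by more than a slack fraction, in which case split it
    evenly across the candidate links."""
    ideal = (sum(candidate_link_loads) + aggregate_demand) / len(candidate_link_loads)
    least = min(range(len(candidate_link_loads)), key=candidate_link_loads.__getitem__)
    unsplit_load = candidate_link_loads[least] + aggregate_demand
    if unsplit_load <= (1.0 + slack) * ideal:
        return {least: aggregate_demand}          # no splitting needed
    share = aggregate_demand / len(candidate_link_loads)
    return {i: share for i in range(len(candidate_link_loads))}  # split evenly

print(split_decision([2.0, 2.5, 3.0], aggregate_demand=4.0, slack=0.1))
```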
XPath was proposed in [256] to allow different applications to explicitly route their flows without the overheads of establishing paths and dynamically adding them to the routing tables of commodity switches. A two-step compression algorithm was utilized to enable pre-installing a very large number of desired paths into the IP tables of commodity switches in large-scale data centers despite their limited table sizes. First, to reduce the number of unique path IDs, non-conflicting paths, such as convergent and disjoint ones, are clustered into path sets and each set is assigned a single ID. Second, the sets are mapped to IP addresses based on Longest Prefix Matching (LPM) to reduce the number of IP table entries. A logically centralized SDN-based controller can then be utilized to dynamically update the IDs of the paths, instead of the routing tables, and to handle failures. For MapReduce shuffling traffic, XPath enabled choosing non-conflicting parallel paths and achieved a 3× completion time reduction compared to ECMP. The recent work in [257] discussed the need for applying Machine Learning (ML) techniques to bridge the gap between large-scale DCN topological information and various networking applications such as traffic prediction, monitoring, routing, and workload placement. A methodology named Topology2Vec was proposed to generate topology representations in the form of low-dimensional vectors from the actual topologies while accounting for the dynamics of link and node availability due to failures or congestion. To demonstrate the usability of Topology2Vec, the problem of placing SDN controllers to reduce end-to-end delays was addressed. A summary of the data center-focused optimization studies is given in Table V.

TABLE V SUMMARY OF DATA CENTER -FOCUSED OPTIMIZATION STUDIES - I Ref Objective Application/ Tools Benchmarks/workloads Experimental Setup/Simulation environment platform [204]* Hadoop performance model Hadoop Modelling for MapReduce program to 160 servers in 5 racks (4 CPU cores, 8GB RAM, 2 250GB disk), for multi-rack clusters 0.21.0 processing time read, encrypt, then sort Gigabit Ethernet Cisco Catalyst 3750G and 2960-S, Bonnie++, Netperf [205]* Simulation approach to evaluate Hadoop 1.x ns-2 network simulator, TeraSort, MRPerf-based simulations for (single rack, double rack, tree-based data center topology effects on Hadoop DiskSim, C++, Tcl, Search, Index and DCell) topologies with 2, 4, 8, 16 nodes each with (2 Xeon Quad Python 2.5 GHz core, 4 750GB SATA), 1 Gbps Ethernet [206]* Evaluating job finish time for MapReduce Hadoop 1.x CloudSimExMapReduce Scientific Simulations for hierachical, fat-tree, CamCube, DCell, and workloads in different data centers topologies simulator, CloudDynTop workloads hypothetically optimal topology for MapReduce, 6-9 servers, 1 Gbps and 10 Gbps links [207]* Optimizing shuffling operations in electronic, Google’s MILP Sort Spine-leaf, Fat-tree, BCube, DCell, c-Through, hybrid, and optical data center topologies MapReduce Helios, Proteus topologies with 16 nodes, 10 Gbps links [208]* Evaluation of data centers Network - Regular and energy- One-to-one and all-to-all Simulations for Fat-tree, Vl2, FBFLY, BCube, DCell, FiConn with Power Effectiveness (NPE) aware TCP traffic flows similar network diameter, ∼8000 or ∼100 servers, and a mix of 1, 10 Flyways: wireless or wired additional links routing algorithms GbE links on-demand between congested ToR switches [209]* Data path performance evaluation - Network simulation Realistic workload with a OMNET++-based Simulations for spine-leaf data center (100 servers in spine-leaf architectures cradle mix of in 4 racks) with oversubscription ratios (1:1, 2.5:1, 5:1) and 10, 40, small and large jobs, and 100 Gigabit Ethernet bursty traffic

[210]* FFTree: Adjustment to pods in Fat-tree Google’s Flexibility in pods design WordCount Ns3-based simulations for 16 Data Nodes in Linux containers for higher bandwidth and flexibility MapReduce defined by edge offset, connected by TapBridge NetDevice BulkSendApplication [211]*, Flyways: wireless or wired additional links MapReduce- Central controller, MPLS Production MapReduce Simulation for 1500 servers in 75 racks with 20 server per rack) [212]* on-demand between congested ToR switches like data label workloads and additional Flyways with (0.1, 0.6, 1, and 2) Gbps capacity per mining switching, optimization Flyway problem

[213]* Failure-tolerant and spectrum efficient - Bimodal degree 125 Mbytes input data, Simulations for a spherical rack with 200 servers 10 of them are hub wireless data center for big data applications distribution, space and 20 Mbytes intermediate servers, time division multiple data access scheme [214]* Camdoop: MapReduce-like system Hadoop key-based routing, in- WordCount, Sort 27 servers CamCube (Quad Core 2.27 GHz CPU, 12GB RAM, 32GB in CamCube data centers 0.20.2, network prcessing, SSD), 1 Gbps Quadport NIC, 2 1 Gbps dual NIC, packet level Dryad, spanning tree simulator for 512 server CamCube Camdoop [215]* In-network distributed query processing Apache Solr NaaS box attached to Queries from Solr cluster (1 master, 6 workers), 1 Gbps Ethernet for servers, 10 system to increase throughput ToR for in-network 75 clients Gbps for NaaS box processing [216]* RAMCube : BCube-oriented design Key-value RPC through Ethernet, Key-value workloads BCube(4,1) with 16 servers (2.27GHz Quad core CPU, 16GB RAM, for resilient key-value store store one hop with set, get, delete 7200 RPM 1TB disk) and 8 8-port DLink GbE, ServerSwitch 1 Gbps allocation to recovery operations NIC in each server server [217]* Traffic-Aware VM Placement Problem - Two-tier approximate VM-to-VM global and Tree, VL2, Fat-tree, and BCube data centers with (TVMPP) in data centers algorithm partitioned 1024 VMs and traffic, production traces [218]* Oktopus: Intra data center network - Greedy allocation Symmetric and 25 nodes with 5 ToRs and core switch (2.27GHz CPU, 4GB RAM, 1 virtualization for predictable performance algorithms, rate-limiting asymmetric VM-to- VM Gbps), Simulations for multi-tenant data center with 16000 servers in at end hosts static and dynamic traffic 4 racks and 4 VMs per server [219]* VM to PM mapping based on architectural - Traffic Amount First Uniformly random VM- Simulations for Fat-tree, VL2, and Tree-based data centers in large- and resources constraints (TAF) heuristic to-VM traffic scale (60 VMs) and small-scale (10 VMs) settings [220]* NetDEO: VM placement in data centers Multi-tier Meta-heuristic based on Synthesized traces Simulations for data center networks with heterogeneous servers data centers and efficient system upgrade web Simulated annealing and different topologies (non-homogeneous Tree, FatTree, BCube) applications [221]* Emulator for SDN-based data centers Hadoop 1.x OVS Simulated Hadoop shuffle Testbed with 1 primary server (2.4 GHz, 4GB RAM), 12 client with reconfigurable topologies traffic generator workstations (dual-core CPU, 2GB RAM), 2 Pica8 Pronto 3290 SDN- enabled Gigabit Ethernet switches) [222]* SDN-approach in Application-Aware Hadoop 2.x Ryu, OVS, lightweight HiBench benchmark (sort, 1 master and 8 slaves in GENI testbed (2.67GHz CPU, 8GB RAM) Networks (AAN) to improve Hadoop REST-API, ARP resolver word count, scan, join, 100 Mbps per link. 
PageRank) [223]* Application-aware run-time SDN-based - 2D Torus topology - Hybrid data center with OpenFlow-enabled ToR electrical switches control for data centers with Hadoop configuration algorithms connected to an electrical core switch and MEMS-based core optical switch [224]* Experimental evaluation for MapReduce Hadoop 1.x Topology manager, TrintonSort (900 GB) and 24 servers in 4 racks, 5 Gbps packet link and 10 Gbps optical link in c-Through and Helios OpenFlow, circuit switch TCP long-lived traffic Monaco switches, Glimmerglass MEMS manager [225]* Measuring latency in Torus-based hybrid - Network Orchestrator Real-time network 2D Torus network testbed constructed with 4 MEMS sliced from a optical/electrical switching data centers Module (NOM) traffic via Iperf 96×96 MEMS- based CrossFiber LiteSwitcM and two Pica8 OpenFlow switches to emulate 16 NIC [226]* Acceleration of incast and multicast traffic - Floodlight, Redis pub/sub Two sets of 50 multicast 8 nodes hybrid cluster testbed with optical splitters and with on-demand passive optical modules messages, jobs (500MB-5GB) with combiners connected with controlled Optical Space Switch (OSS) integer program to 4 and 7 groups configure OSS [227]* HERO: Hybrid accelerated delivery MPICH for SDN controller, greedy 50 multicast flows with Small free-space optics mutlicast testbed with passive optical 1:9 for multicast traffic Message optical links assignment uniform flows splitters for throughput and delay evaluation, flow-level simulations Passing algorithm (100MB-1GB) and for 100 racks in spine and leaf hybrid topology Interface groups (10-100) (MPI) [228]* Multi-tenant virtual optical data center Virtual Data OpenDayLight, network- 2 VDC, randomly FPGA optoelectronics 12×10GE ToRs, SOA-based OPS switch, with network-aware VM placement Center aware VM placement generated 500 Polatis 192×192 MEMS- based switch as optical back-plane, (VDC) based on variance VDCs with Poisson simulations for 20/40 servers per rack, 8 racks composition requests arrival

[229]* Reconfigurable 3D MEMS based data center System S Scheduling Optimizer for Multiple jobs for 3 IBM Bladecenter-H chassis, 4 HS21 blade servers per chassis, for optimized stream computing systems Distributed Applications streaming applications Calient 320×320-port 3D MEMS (SODA) [230]* Hybrid switching architecture for cloud - Multi wavelength optical 3 synthetic traffic patterns OPNET simulations for the proposed hybrid OCS and EPS switching applications in Fat-tree data centers switch, distributed with mice and elephant in Fat-tree with 128 servers to evaluate the average end-to-end delay control for scheduling flows and the network throughput. [231]° Reduce tasks placements to Hadoop 1.x Linear-time greedy WordCount, Grep, Cluster with 4 racks; A, B , C, and D with 7, 5, 4, and 4 servers (Intel reduce cross-rack traffic algorithm, binary search PageRank, k-means, Xeon E5504, E5520, E5620, and E5620 CPUs, 8GB RAM), 1 GbE Frequency Pattern Match links [232]° Data center network-aware load balancing to - Greedy Synthetic and Wikipedia Simulations for 12-ary Fat-tree with 1 Gbps links optimize shuffling in MapReduce with skewed algorithm page-to-page link datasets data [233]° Joint intermediate data partitioning and - Profiling, decomposition- Dump files in Wikimedia, Simulations for three-tier data center with 20 nodes aggregation to improve the shuffling phase of based distributed WordCount, random MapReduce algorithm shuffling [234]° DCP: Distributed cache protocol to reduce - In-memory store, Randomly generated Simulations for Fat-tree with k=16 (1024 servers, 64 core switches, redundant packets transmission in Fat-tree bloom filter header packets with Zipf 28 aggregation, 128 edge), simulations distribution [235]° In-network aggregation in BCube for Hadoop 1.x IRS-based incast, SRS- WordCount with 61 VMs in 6 nodes emulated BCube (2 8-cores CPU, 24GB RAM, scalable and efficient data shuffling based shuffle, Bloom combiner 1TB disk), large-scale simulations for BCube(8,5) with filter [236]° Application-level TCP tuning for data transfers - Recursive Chunk Bulk data transfers high-speed networking testbeds and cloud networks through pipelining, parallelism, and concurrency Division, Parallelism- (512KB-2GB) per file Emulab-based emulations, AWS EC2 instances Concurrency Pipelining [237]° Space Shuffle (S2): greedy routing on multiple - Greediest routing, Random permutation Fine-grained packet-level event-based simulations for rings to improve throughput, scalability, and MILP traffic the proposed data center, Jellyfish, SWDC, and Fat-tree flexibility [238]° Non-blocking distributed oblivious adaptive - Approx. 
Markov chain Random permutation OMNet++-based simulations for InfiniBand network (three-level Clos routing in Clos-based data centers for big data models to predict traffic of 245KB flows DCN with 24 input, 24 output, and 48 intermediate switches and 1152 applications convergence time nodes), 40 Gbps links [239]° Evaluation for different routing protocols on - Stochastic permutations Bursty traffic, delay- Flow-level simulations for spine and leaf data center the performance of spine and leaf data centers for traffic generation sensitive workloads with 8 spine switches, 16 leaf switches, and 128 end nodes [240]° DCMPTCP: improved MPTCP for load - Fallback for rACk-local Many-to-one traffic, data Ns3-based simulations for Spine and leaf data center with 8 spine, 8 balancing in data centers with rack local and Traffic, ADAptive packet mining and web search leaf switches, and 64 nodes per leaf switch, 10 and 40 Gbps links inter racks traffic scheduler traffic [241]° Greening data centers by Flow Preemption - Algorithm for the 10k flows wit exponential Simulations for 24-ary Fat-tree (FP) and Energy-Aware Routing (EAR) FP and EAR scheme distribution with mean of 64MB [242]° Improving the energy efficiency of routing in - Approximate-algorithm Uniform random traffic Fat-tree (4-ary and 8-ary), BCube(2,2) and BCube(8,2) data centers by joint VM assignments and TE (GEERA) and number of VMs

[243]° GreenDCN: scalable framework to green data - Time-slotted algorithms; Synthetic jobs with Simulations for 8-ary and 12-ary Fat-tree data centers, 2 VMs per center network through optimized VM optEEA, EER normal distribution for server, identical switches with 300W max power and max processing placement and TE no. of servers speed of 1 Tbps [244]° VirtualKnotter: efficient online VM placement - Multiway θ-Kernighan- Synthetic and realistic Dynamic simulations and migration to reduce congestion in data Lin and simulated traffic centers annealing [245]° Topology switching to allow multiple - Allocator, centralized Randomly generated Simulations for 16-ary Fat-tree with 1 Gbps links applications to use different routing schemes topology server routing tasks [246]° Performance-centric fairness in links bandwidth - Gradient projection- Two scenarios for Simulations for a private data center with homogeneous nodes allocation in private clouds data centers based distributed applications traffic algorithm [247]° MILPFlow: a Tool set for modeling and data - MILP, Video streaming, Mininet-based emulation for 4-ary Fat-tree in VirtualBox-based paths deployment in OpenFlow-based data OVS Iperf traffic VMs, 1 GbE NIC centers [248]° Application-aware SDN routing Hadoop 1.x Floodlight controller, WordCount EstiNet-based emulation for 20 OpenFlow switches in 4-ary Fat-tree in data centers managers, links monitor, with 16 nodes routing component [249]° Hadoop acceleration in an Cloudera Floodlight, OVS and Sort (0.4 MB - 4GB), 10 VMs in 3 nodes in Cloudera connected by a physical switch OpenFlow-based cluster distribution LINC switches Iperf for background of Hadoop traffic [250]° SDN-controlled dynamic workload balancing - Balancing algorithm WordCount Mumak-based simulations for a data center with three racks to improve completion time of Hadoop jobs based on estimation and prediction [251]° SDN-based Network Overlay Framework (NoF) Hadoop Configuration engine, TeraGen, TeraSort, Virtualized testbed with spine-leaf virtual switches, VirtualBox for to define networks based on applications 2.3.0 OVS, POX controller iperf for background VMs requirements traffic [252]° SDN-approach in Application-Aware Hadoop 1.x Ryu, OVS, lightweight HiBench benchmark (sort, 1 master and 8 slaves in GENI testbed (2.67GHz CPU, 8GB RAM), Networks (AAN) to improve Hadoop REST-API, ARP resolver word count, scan, join, 100 Mbps per link PageRank) [253]° Dynamic control for data volumes through SDN Vanilla Sampler, spate coding- TeraSort, Grep Prototype: 2-tier data center with 8 nodes (12 cores CPU, 128GB control and spade coding in cloud data centers Hadoop based middleboxes for RAM, 1TB disk), Testbed: 8 VMs controlled by Citrix XenServer and coding/decoding connected with OVS [254]° Responsive multipath TCP for optimized - Demand estimation and Random, permutation, NS3-based simulations for 8-ary Fat-tree with 1 Gbps links and shuffling in SDN-based data centers route calculation and shuffling traffic customized SDN controller algorithms [255]° LocalFlow: local link balancing for scalable - Switch-local pcap packet traces, Packet-level network simulations (stand-alone and htsim-based) for and optimal flow routing in data centers algorithm MapReduce-style flows 8-ary and 16-ary Fat-tree, VL2, oversubscribed variations [256]° XPath: explicit flow-level path control in - 2 step compression Random TCP Testbed: 6-ary Fat-tree testbed with Pronto Broadcom 48-port commodity switches in data centers algorithm connections, 
sequential Ethernet switches, Algorithm evaluation for BCube, Fat-tree, HyperX, read/write, shuffling and VL2 with different scales [257]° Topology2Vec: DCN representation learning - Biased random walk Real-world Internet Simulations for networking applications sampling ML-based topologies from the network partitioning Topology Zoo * Data center topology, ° Data center routing.
C. Scheduling of Flows, Coflows, and Jobs in Data Centers:
Scheduling big data traffic at the flow level was addressed in [258]-[261]. Orchestra [258], a task-aware centralized cluster manager, aimed to reduce the average completion time of various data transfer patterns for batch, iterative, and interactive workloads. Within each transfer, a Transfer Controller (TC) was used for monitoring and updating the sources associated with destination sets. For broadcast, a BitTorrent-like protocol, Cornet, was utilized, and for shuffle, an algorithm named Weighted Shuffle Scheduling (WSS) was used. Orchestra allows scheduling at the transfer level between application stages, where concurrent transfers belonging to the same job can be optimized through an Inter-Transfer Controller (ITC) that can utilize FIFO, fair, and priority-based scheduling. Broadcast was improved by 4.5 times compared to the status quo broadcast mechanism in native Hadoop, and high priority transfers were improved by 1.7 times. Seawall [259] is an edge scheduler that uses a hypervisor-based mechanism to ensure fair sharing of DCN links between tenants. FlowComb was proposed in [260] as a centralized and transparent network management framework to improve the utilization of clusters and reduce the processing time of big data applications. FlowComb utilized software agents installed in the nodes to detect transfer requests and report to the centralized engine, and OpenFlow to enforce forwarding rules and paths. An improvement of 35% in the completion time of sort workloads was achieved with 60% path enforcement. In [261], distributed flow scheduling was proposed to achieve adaptive routing and load balancing through DiFS. For several traffic patterns, DiFS achieved better aggregate throughput compared to ECMP, and comparable or higher throughput compared to Hedera and DARD. Optimizing coflow scheduling, which is more application-aware than flow scheduling, to minimize the completion time of workloads was the focus in [262]-[265]. Varys was proposed in [262] as a coordinated inter-coflow scheduler in data centers targeting predictable performance and reduced completion time for big data applications. A greedy coflow scheduler and a per-flow rate allocator were utilized to identify the slowest flow in a coflow and adjust the rate of the accompanying flows to the slowest one, so that networking resources can be used by other coflows. Trace-driven simulations indicated that Varys achieved 3.66×, 5.53×, and 5.65× improvements compared to fair sharing, per-flow scheduling, and FIFO, respectively. Baraat in [263] suggested decentralized task-aware scheduling for coflows to reduce their tail completion times. FIFO with limited multiplexing (FIFO-LM) is used to schedule the tasks while avoiding head-of-line blocking for small flows by changing the multiplexing level when heavy tasks arrive. Compared to pFabric [622] and Orchestra, the completion time of 95% of MapReduce workloads was reduced by 43% and 93%, respectively.
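To make the coflow-level rate adjustment described above concrete, the following minimal Python sketch illustrates the slowest-flow ("bottleneck") allocation idea used by Varys-like schedulers [262]: every flow in a coflow is paced to finish together with the coflow's slowest flow, which frees residual bandwidth for other coflows. The data structures, single-bottleneck-link model, and function names are illustrative assumptions, not the actual Varys implementation.

```python
def allocate_coflow_rates(flows, link_bw):
    """Slowest-flow ('bottleneck') rate allocation for one coflow.

    flows: list of dicts {'id', 'bytes', 'link'} - remaining bytes and the
           bottleneck link each flow traverses (illustrative structure).
    link_bw: dict mapping link -> available bandwidth in bytes/s.
    Returns per-flow rates such that every flow finishes with the slowest
    one, leaving residual bandwidth for other coflows.
    """
    # Coflow completion time if every flow used its full link bandwidth.
    slowest = max(f['bytes'] / link_bw[f['link']] for f in flows)
    # Pace all flows to finish together at the coflow's bottleneck time.
    return {f['id']: f['bytes'] / slowest for f in flows}

# Example: two shuffle flows belonging to the same coflow.
flows = [{'id': 'f1', 'bytes': 8e9, 'link': 'rackA-up'},
         {'id': 'f2', 'bytes': 2e9, 'link': 'rackB-up'}]
link_bw = {'rackA-up': 1e9, 'rackB-up': 1e9}   # 1 GB/s each
print(allocate_coflow_rates(flows, link_bw))   # f1 at 1 GB/s, f2 at 0.25 GB/s
```

In Varys itself, this per-coflow allocation is combined with a Smallest-Effective-Bottleneck-First (SEBF) ordering across coflows, as summarized in Table VI.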
A decentralized coflow-aware scheduling system (D-CAS) that dynamically and explicitly sets the priorities of flows according to the maximum load at the senders was proposed in [264] to minimize coflow completion time. D-CAS achieved a performance close to Varys, with up to 15% difference, and outperformed Baraat by 1.4 and 4 times for homogeneous and heterogeneous workloads, respectively. Rapier in [265] integrated routing and scheduling at the coflow level to optimize the performance of big data applications in DCNs with commodity switches. Compared to Varys with ECMP and to optimized routing alone, improvements of about 80% and 60% in coflow completion time were achieved, respectively. Scheduling traffic in DCNs within SDN environments was addressed in [266]-[269]. Online scheduling of multicast flows in Fat-tree DCNs was considered in [266] to ensure bounds on congestion and improve the throughput and load balancing. A centralized Bounded Congestion Multicast Scheduling (BCMS) algorithm was developed for use with OpenFlow, and improvements compared to VLB and random scheduling were reported. Pythia in [267] focused on improving the network performance under skewed workloads. Run-time prediction of intermediate data sizes and centralized OpenFlow-based control were utilized, and up to 46% reduction in completion time was obtained compared to ECMP. An SDN OpenFlow network with Combined Input and Crosspoint Queued (CICQ) switches was proposed in [268] to dynamically schedule the packets of different big data applications. An application-aware computing and networking resource allocation scheme based on neural network predictions was developed to allocate adequate resources so that the SLA is minimally violated. For five different DCN applications, the scheme achieved an SLA violation rate below 4% at the cost of a 9.21% increase in the energy consumption. The authors in [269] proposed and experimentally demonstrated a heuristic for Bandwidth-Aware Scheduling with SDN (BASS) to minimize the job completion time in Hadoop clusters. The heuristic prioritizes local task scheduling but considers remote assignment, together with the link assignment for the associated data transfers, whenever this reduces the total completion time. BASS was found to reduce the completion time by 10%. The work reported in [270]-[273] focused on scheduling traffic in optical and hybrid DCNs. In [270], the gaps between the performance of OCS interconnects and the requirements of latency-sensitive applications were addressed. Two scheduling algorithms, a centralized Static Circuit Flexible Topology (SCFT) and a distributed Flexible Circuit, Flexible Topology (FCFT), were proposed, and up to 2.44× improvement over Mordia [283] was achieved. Resource allocation in NEPHELE was addressed in [271] while accounting for the SDN controller delay and the random allocation of iterative MapReduce tasks. Compared to Mordia, NEPHELE uses multiple WDM rings instead of one to enable efficient use of resources, and introduces application-aware and feedback-based synchronous slotted scheduling algorithms. In [272], different end-to-end continuous-time scheduling algorithms were designed for a hybrid packet optical DCN proposed by the same authors, and were compared using random traffic. Effective traffic scheduling for a proposed Packet-Switched Optical Network (PSON) DCN with space switches and layers of AWGRs was examined in [273].
To treat traffic flows differently and optimally based on their type, three machine-learning flow detection algorithms were used and compared in terms of accuracy and classification speed. Scheduling algorithms that consider the priority of flows and the occupancy of buffers were proposed, and improvements in terms of packet loss ratio and average delay compared to Round Robin were reported. The authors in [274] maximized the throughput of data retrieval for single and multiple applications in tree-based DCNs while accounting for link bandwidths and the allocation of data replicas. The proposed approximation algorithm achieved near-optimal performance and improved retrieval time compared to random scheduling. A topology-aware heuristic for data placement that minimizes remote data access costs in geo-distributed MapReduce clusters was proposed in [275] based on an optimized replica-balanced distribution tree. CoMan was proposed in [276] to improve bandwidth utilization and reduce the completion time of several big data frameworks in multiplexed data centers. A Virtual Link Group (VLG) abstraction was utilized to define a shared bandwidth resource pool, and an approximation algorithm was developed. Compared to ECMP with ElasticSwitch [133], the bandwidth utilization and completion time were improved by 2.83× and 6.68×, respectively. The authors in [277] proposed probabilistic map and reduce task scheduling algorithms that consider the data center topology and the link statuses in computing the cost and latency of data transmission. Based on these calculations, the algorithm incorporates randomness in the assignments, where the tasks with the least transmission time are scheduled on the available slots with higher probability. Thus, locality and completion time are balanced without enforcement. Compared to a coupling scheduling method and to fair scheduling, reductions of 17% and 46% were achieved. A Network-Aware Scheduler (NAS) was examined in [278] for improving the shuffling phase in multi-rack clusters. NAS utilizes three algorithms: Map Task Scheduling (MTS) to balance cross-node traffic caused by skewness, Congestion-Avoidance Reduce Task Scheduling (CA-RTS) to reduce cross-rack traffic, and Congestion-Reduction RTS (CR-RTS) to prioritize jobs with light shuffle traffic. Compared to state-of-the-art schedulers such as fair and delay scheduling, improvements in throughput by up to 62%, and reductions in the average completion time by up to 44% and in cross-rack traffic by up to 40%, were reported. A fine-grained framework, Time-Interleaved Virtual Cluster (TIVC), that accounts for the dynamics of big data applications' bandwidth requirements was proposed and tested in a system named PROTEUS in [279]. Based on observing, via profiling four workloads, that the traffic peaked only during 30-60% of the execution time, TIVC targeted reducing the over-provisioning of bandwidth and allowing overlapped scheduling of jobs. First, the user's application was profiled and a cost model based on bandwidth cap and performance trade-offs was generated, then a dynamic programming-based spatio-temporal job allocation algorithm was used to maximize the utilization and revenue. Compared with Oktopus in [218] for mixed batch jobs, a reduction of 34.5% in the completion time was achieved. For dynamically arriving jobs at 80% load, PROTEUS reduced the rejection ratio to 3.4%, compared with the 9.5% reported for Oktopus.
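The temporal-interleaving idea behind TIVC/PROTEUS [279] can be illustrated with a simple admission test: a new job is accepted only if its time-varying bandwidth profile, added to the profiles of already-admitted jobs, never exceeds the link capacity in any time slot. The sketch below is a simplified single-link model with made-up per-slot profiles; it is not the profiling or dynamic-programming allocation actually used in PROTEUS.

```python
def can_admit(existing, new_job, link_cap):
    """Temporal-interleaving admission test in the spirit of TIVC/PROTEUS.

    existing: list of per-slot bandwidth profiles (e.g., in Gbps) of
              already-admitted jobs; new_job: profile of the candidate job.
    The job is admitted only if the summed demand stays within link_cap in
    every slot. Single-link model and values are illustrative only.
    """
    horizon = max([len(new_job)] + [len(p) for p in existing])
    for t in range(horizon):
        demand = sum(p[t] for p in existing if t < len(p))
        demand += new_job[t] if t < len(new_job) else 0
        if demand > link_cap:
            return False
    return True

# Two admitted jobs whose shuffle peaks do not overlap, and two candidates.
admitted = [[2, 8, 2, 0], [0, 2, 8, 2]]
print(can_admit(admitted, [6, 0, 0, 6], link_cap=10))   # True
print(can_admit(admitted, [0, 6, 0, 0], link_cap=10))   # False (slot 1: 8+2+6)
```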
The authors in [280] explored the benefits of planning ahead for MapReduce and DAG-based production workloads and developed Corral, which jointly optimizes data placement and task scheduling at the rack level. Corral modified HDFS and the AM to enable rack selection for data and task placement and hence increase locality and reduce the overall cross-rack traffic. Compared to native YARN with the capacity scheduler, the makespan was reduced by up to 33% while cross-rack traffic was reduced by up to 90%. Simulation results for integrating Corral with Varys [262] showed a clear improvement over using the capacity scheduler with Varys and over using Corral with TCP, indicating the importance of integrating network and data/task scheduling solutions. The authors in [281] proposed Mercury as an extension of YARN that enables hybrid resource management in large clusters by allowing the AMs of applications to request either guaranteed containers via centralized scheduling or queueable containers via distributed scheduling, according to their needs. Compared to default YARN, an average improvement of 35% in throughput was obtained. The work in [282] optimized the allocation of container resources in terms of reduced congestion and improved data locality by considering the networking resources and the data locations. Compared to the default placement in YARN, a 67% reduction in completion time for network-intensive workloads was obtained. The energy efficiency of data centers through workload and traffic scheduling was considered in [283]-[286]. The work in [283] examined the performance-energy tradeoffs when using the Low Power Idle (LPI) link sleep mode of the EEE standard [639] with MapReduce workloads. The timing parameters of entering and leaving the LPI mode in 10GbE links were optimized while utilizing packet coalescing (i.e., delaying outgoing packets during the quiet mode and aggregating them for transmission in the following active mode). Depending on the oversubscription ratio and the workloads, EEE achieved power savings of between 5 and 8 times compared to legacy Ethernet. A detailed study of the optimum packet coalescing settings under different TCP settings and with shallow and deep buffers (i.e., 128 kB and 10 MB per port) was provided, and an additional improvement by a factor of two was achieved. Scheduling applications on fewer machines to save energy in the networking devices of large-scale data centers was considered in [284] through a Hierarchical Scheduling Algorithm (HSA). The workloads were first allocated to nodes according to their subsets (i.e., connectivity with ToR switches), then according to the levels of the higher-layer switches and the associated data transmission costs. HSA was tested with a Dynamic Max Node Sorting Method and two classical bin packing solutions, and the proposed method resulted in improved performance and stability. Willow in [285] aimed to reduce the energy consumption of switches in Fat-tree DCNs through SDN-based dynamic scheduling while accounting for the performance of network-limited elastic flows. The number of activated switches, as well as their usage duration, was considered in computing the routing path per flow. Compared to ECMP and classical heuristics (i.e., simulated annealing and particle swarm optimization), up to 60% savings were achieved. The authors in [286] proposed JouleMR as a green-aware and cost-effective task and job scheduling framework for MapReduce workloads that maximizes renewable energy usage while accounting for the dynamic pricing of brown energy and for renewable energy storage.
To obtain a job/task scheduling plan, performance-energy consumption models were utilized to guide a two-phase heuristic. The first phase optimizes the start times of tasks/jobs to reduce the brown energy usage, and the second phase optimizes the resources assigned per task/job to further reduce it while satisfying soft deadlines. Compared to Hadoop with YARN, a reduction of 35% in brown energy usage and of 21% in overall energy consumption was obtained.
D. Performance Improvements based on Advances in Computing, Storage, and Networking Technologies:
Comparisons of scale-up and scale-out systems for Hadoop were discussed in [287]-[290]. In [287], the authors showed that running sub-terascale Hadoop workloads on a single scaled-up server can be more efficient than using scale-out clusters. For the scale-up system, several optimizations were suggested for the storage, the number of concurrent tasks, the heartbeats, the JVM heap size, and the shuffling operation. A hybrid Hadoop cluster with scale-out and scale-up servers was proposed in [288]. Based on profiling the performance of workloads on scale-out and scale-up clusters, a scheduler that assigns each job to the better-suited cluster was designed. A study measuring the impact on HDFS of improved networking, such as InfiniBand and 10 Gigabit Ethernet, of the selected protocol, and of SSD storage systems was presented in [289]. With HDD storage, enhanced networking achieved up to 100% performance improvement, while with SSDs the improvement was by up to 219%. The authors in [290] discussed the need to optimize task scheduling and data placement for data-centric workloads in compute-centric (i.e., HPC) clusters. A comprehensive evaluation was conducted to study the impact of storing intermediate RDDs in RAM, on local SSDs, and on the cluster's parallel file system (which requires write locks) on Spark workloads with different compute and I/O requirements. An Enhanced Load Balancer scheduler was suggested, and a Congestion-Aware Task Dispatching to SSD mechanism was also proposed. Enhancements of 25% and 41.2% were achieved. It was also found that increasing the chunk size from 32 MB to 128 MB reduced the job execution time by 15.9% as scheduling overheads were reduced. The performance of big data applications under SSD storage was addressed in [291]-[294]. The work in [291] examined the performance improvements and economics (i.e., cost-per-performance) of the partial and full use of SSDs in MapReduce clusters. A comparison between SATA Hard Disk Drives (HDDs) and Peripheral Component Interconnect express (PCIe) SSDs with the same aggregate I/O bandwidth showed that SSDs achieved up to 70% better performance at 2.4× the cost of HDDs. When upgrading HDD systems with SSDs, proper configurations with multiple HDFSs should be considered. The authors in [292] discussed the need to modify Hadoop frameworks when using SSDs and proposed a direct map output collection method, a pre-read model for HDFS, a reduce scheduler to address skewness, and a modified data block placement policy to extend the lifetime of SSDs. The first two methods achieved improvements of 30% and 18%, the scheduler produced an improvement of 12%, and the overall improvement was 21%. To accelerate I/O-intensive and memory-intensive MapReduce jobs, mpCache, which dynamically caches input data from HDDs into SATA SSDs, was proposed and examined in [293], and average speedups of 2× were gained.
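As a rough illustration of SSD-based input caching in the spirit of mpCache [293], the sketch below admits HDFS input splits into a bounded SSD cache and evicts them in LRU order. The class name, API, sizes, and the plain LRU policy are illustrative assumptions; mpCache's actual design includes an admission control policy and a main-cache replacement scheme (see Table VI).

```python
from collections import OrderedDict

class SSDCache:
    """Minimal SSD input-split cache: hot HDFS splits are copied from HDD
    into a bounded SSD cache and evicted in LRU order. Illustrative only,
    not mpCache's actual design."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()              # split_id -> size in bytes

    def read(self, split_id, size, read_from_hdd):
        if split_id in self.entries:              # SSD hit: fast path
            self.entries.move_to_end(split_id)
            return 'ssd'
        read_from_hdd(split_id)                   # miss: serve this read from HDD
        while self.used + size > self.capacity and self.entries:
            _, evicted_size = self.entries.popitem(last=False)   # evict LRU
            self.used -= evicted_size
        if size <= self.capacity:                 # admit into the SSD cache
            self.entries[split_id] = size
            self.used += size
        return 'hdd'

cache = SSDCache(capacity_bytes=2 * 128 * 1024**2)   # room for two 128 MB splits
fetch = lambda s: None                                # stand-in for an HDD read
print([cache.read(s, 128 * 1024**2, fetch) for s in ['a', 'b', 'a', 'c', 'a']])
# ['hdd', 'hdd', 'ssd', 'hdd', 'ssd']  (split a stays hot; c evicts the LRU split b)
```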
The aggregate performance of Non-Volatile Memory Express (NVMe) SSDs with a high number of Docker containers for database applications was evaluated in [294]. Under an optimized selection of the instance count, up to 6× higher write throughput can be obtained compared to a single instance. With multiple applications, the performance can degrade by 50%. Optimizing data transfers from networked storage systems was addressed in [295]-[297]. Google traces were used in [295] to examine the performance of Hadoop with Network Attached Storage (NAS). MRPerf [111] was utilized to evaluate the slowdown factor of using a NAS, which requires rack-remote transfers, instead of conventional Direct Attached Storage (DAS) for different numbers of racks and different workloads. The results showed that CPU-intensive workloads had the smallest slowdown factor and that TeraSort workloads can benefit the most from an assumed enhanced NAS. FlexDAS in [296] was proposed as a switch based on SAS expanders that forms a Disk Area Network (DAN) to provide flexibility in connecting computing nodes to, and disconnecting them from, HDDs while maintaining the I/O performance of DAS. Results based on a prototype showed that FlexDAS can detach and attach an HDD in 1.16 seconds, and for three I/O-intensive applications it achieved the same I/O performance as DAS. The work in [297] addressed optimizing and balancing the I/O operations required for terabit-scale data transfers from centralized Parallel File Systems (PFS) over WANs for competing scientific applications. An end-to-end layout-aware optimization based on a scheduling algorithm for the sources (i.e., disks) and for the sinks (i.e., requesting servers) was utilized to coordinate the use of the network and storage and to reduce I/O imbalances caused by congestion at some disks. Remote Direct Memory Access (RDMA), which enables zero-copy transfers (i.e., accessing a remote machine's RAM without involving its operating system), was utilized in [298], [299] to improve the performance of big data applications. Fast Remote Memory (FaRM) was proposed in [298] as a system for key-value and graph stores that regards the RAM of the machines in a cluster as a shared address space. FaRM utilizes one-sided RDMA reads for lock-free serializable access, RDMA writes for fast message passing, and single-machine transactions with optimized locality. A kernel driver, PhyCo, was developed to enable addressing 2 GB memory regions and overcome the limited number of entries in NIC page tables. Compared to TCP/IP, FaRM improved the request rate by 11× and 9× for request sizes of 16 and 512 bytes, respectively. Hadoop-A in [299] reduced the overheads of materializing intermediate data in MapReduce by proposing a virtual shuffling mechanism that utilizes RDMA over InfiniBand and optimizes the use of memory buffers. Improvements of up to 27% compared to default shuffling and a 12% reduction in power consumption were achieved.
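The shared address space abstraction used by FaRM [298] can be pictured with the toy model below, in which every address maps to a 2 GB region owned by one machine and a read fetches the value without running any code on the owning machine. Real FaRM relies on RDMA-capable NICs and the PhyCo kernel driver; neither is modeled here, and the class, the region-to-machine mapping, and all names are illustrative assumptions.

```python
class SharedAddressSpace:
    """Toy model of a cluster-wide shared address space (FaRM-like idea):
    reads fetch remote regions directly ('one-sided'), i.e. without running
    application code on the owning machine. Local simulation only."""
    REGION = 2 * 1024**3                          # 2 GB regions (size mentioned for PhyCo)

    def __init__(self, machines):
        self.machines = machines
        self.memory = {m: {} for m in machines}   # per-machine (region, offset) -> value

    def locate(self, addr):
        region = addr // self.REGION
        return self.machines[region % len(self.machines)], region

    def write(self, addr, value):
        machine, region = self.locate(addr)
        self.memory[machine][(region, addr % self.REGION)] = value

    def read(self, addr):                         # stand-in for a one-sided read
        machine, region = self.locate(addr)
        return self.memory[machine].get((region, addr % self.REGION))

space = SharedAddressSpace(['m0', 'm1', 'm2'])
space.write(5 * 2 * 1024**3 + 42, b'value')       # lands in region 5, owned by m2
print(space.read(5 * 2 * 1024**3 + 42))           # b'value'
```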
The author in [301] proposed a system for exascale computation that includes low-power processors, byte-addressable NAND flash as the main memory, and Dynamic RAM (DRAM) as a cache for the flash. A total of 2550 nodes in 51 boards were designed to have their main memories connected as a Hoffman-Singleton graph, which provides 2.6 PB capacity and 9.1 TB/s bisection bandwidth, and reduces the latency to at most four hops. Mars in [302] was proposed as a MapReduce runtime system that utilized NVIDIA GPUs, AMD GPUs, and multi-core CPUs to accelerate the computations. To ease the programming of MapReduce on GPUs, the MarsCUDA programming framework was introduced to automate the task partitioning, data allocation, and thread allocation to key-value pairs. The results showed that Mars outperformed Phoenix, a CPU-only MapReduce system for multi-core CPUs, by 22 times. The authors in [303] proposed coupling the use of CPUs and GPUs to accelerate map tasks with the use of MPTCP to reduce shuffling time, thereby improving the performance and reliability of Hadoop. FPMR was proposed in [304] as a framework that helps developers boost MapReduce for machine learning and data mining on FPGAs while masking the details of task scheduling, data management, and communications between the FPGAs and the CPU. A processor scheduler was implemented in the FPGA to control the assignment of map and reduce tasks, and a Common Data Path (CDP) was built to perform data transfers to several tasks without redundancy. As an example, a compute-intensive ranking algorithm, RankBoost, was mapped onto FPMR. Compared to a CPU-only implementation, speedups of 16.74× and 31.8× were achieved without and with the CDP, respectively. SODA in [305] introduced a framework for software-defined accelerators based on FPGAs. To target heterogeneous architectures, SODA utilizes a component-based programming model for the accelerators and dataflow execution phases to parallelize the tasks with an out-of-order scheduling scheme. For a Constrained Shortest Path Finding (CSPF) algorithm for SDN applications with 128 nodes, an acceleration of 43.75× was achieved. To accelerate MapReduce with high energy efficiency in hosting data centers, the authors in [306] extended the current High Level Synthesis (HLS) toolflow to include the MapReduce framework, which allows customizing the acceleration and scaling to the data center level. To allow an easy transition from C/C++ to Register Transfer Level (RTL) designs, the tasks and data paths in MapReduce were decoupled, the accessible memory for each task was isolated, and a dataflow computational approach was enforced. The throughput was improved by up to 4.3× while achieving a two-orders-of-magnitude improvement in energy efficiency compared to a multi-core processor. A multilevel NoSQL cache, implemented as an FPGA-based in-NIC cache and a software-based in-kernel cache, was proposed in [307] to accelerate NoSQL processing. To efficiently process entries with different sizes, multiple heterogeneous Processing Elements (PEs) (i.e., a string PE, a hash PE, a list PE, and a set PE) were utilized instead of pipelining. FPGA-based NICs were also utilized in [308] to perform one-at-a-time processing for data streaming instead of the micro-batch processing of Apache Spark Streaming, and filtering the data in the NIC was suggested to reduce the volume of processed data. Experimental results demonstrated that the WordCount throughput was improved by 22× and the change point detection latency was reduced by 94.12%.
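The contrast between NIC-side filtering with one-at-a-time processing, as explored in [308], and host-side processing can be sketched as follows: a filter predicate standing in for the NIC logic drops irrelevant records before they reach the host, which then processes the survivors record by record rather than in micro-batches. The predicate, record format, and WordCount host logic are illustrative assumptions only.

```python
def nic_filter(records, predicate):
    """Stand-in for NIC-side filtering: records that fail the predicate are
    dropped before reaching the host, and survivors are forwarded one at a
    time rather than accumulated into micro-batches. Illustrative only."""
    for rec in records:
        if predicate(rec):
            yield rec                    # forwarded to the host immediately

def host_wordcount(stream):
    counts = {}
    for line in stream:                  # one-at-a-time processing on the host
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

lines = ['error disk full', 'ok heartbeat', 'error net down', 'ok heartbeat']
print(host_wordcount(nic_filter(lines, lambda l: l.startswith('error'))))
# {'error': 2, 'disk': 1, 'full': 1, 'net': 1, 'down': 1}
```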
The authors in [309] examined the challenges associated with partitioning large graphs for the Graphlet counting algorithm from bioinformatics and proposed a framework for reconfigurable acceleration in FPGAs that achieved an order-of-magnitude improvement and better scalability compared to processing on quad-core CPUs. The challenges for big data applications in disaggregated data centers were discussed in [310], [311]. In [310], the network requirements for supporting big data applications in disaggregated data centers were discussed. The key findings were that existing networking hardware providing between 40 Gbps and 100 Gbps data rates is sufficient for rack-scale, and even data center-scale, disaggregation with some workloads, provided that the network software is improved and the protocols used are deadline-aware. In [311], the authors demonstrated the feasibility of composable rack-scale disaggregation for heterogeneous big data workloads, and a negligible latency impact was recorded for some of the workloads, such as Memcached. Table VI provides the second part of the summary of data center-focused optimization studies reported in this section.

TABLE VI SUMMARY OF DATA CENTER -FOCUSED OPTIMIZATION STUDIES - II Ref Objective Application/ Tools Benchmarks/workloads Experimental Setup/Simulation environment platform [258]* Orchestra: Task-aware cluster management Spark BitTorrent-like protocol, Facebook traces EC2 instances, DETERlab cluster to reduce the completion time of data transfers Weighted Shuffle Scheduling [259]* Seawall: Centralized network bandwidth - Shim layer, Strawman Web workloads Cosmos; 60 nodes in 3 racks (Xeon 2.27 GHz CPU, allocation scheme in multitenant environments bandwidth allocator 4GB RAM), 1 Gbps NIC ports [260]* FlowComb: Transparent network-management Hadoop 1.x Software agents, Sort 10GB 14 nodes framework for high utilization and fast OpenFlow processing [261]* Distributed Flow Scheduling (DiFS) for - Imbalance Detection and, Deterministic, random, Packet-level stand-alone simulator for Fat-tree adaptive routing in hirarchical data centers Explicit Adaption and shuffling flows (120 with 16 hosts and 1024 hosts algorithm GB) [262]* Varys: Centralized co-flow scheduling to reduce - Smallest-Effective- Facebook traces Extra large high-memory instances in a 100-machine EC2 cluster, completion time of data-intensive applications Bottleneck First (SEBF) task-level trace-driven simulations heuristic [263]* Baraat: Decentralized task-aware scheduler for Memchached FIFO limited Bing, Cosmos, 20 nodes in 5 racks testbed (Quad core, 4GB RAM), 1 GbE co-flows in DCNs to reduce tail completion time multiplexing, Smart and Facebook traces flow-based simulations, ns-2 large-scale simulations Priority Class (SPC) [264]* D-CAS: A decentralized coflow-aware - Online 2-approximation Facebook traces Python-based trace-driven simulations scheduling in large-scale DCNs scheduling algorithm [265]* Rapier: coflow-aware joint routing - Online heuristic, Many-to-many Spine-leaf DCN with 9 nodes (4 CPU cores,8GB RAM, 500GB disk) and scheduling in DCNs OpenFlow communication patterns large-scale event-based flow-level simulations for 512 nodes Fat-tree and VL2 [266]* Low-complexity online multicast - BCMS algorithm Unicast and mixed Event-driven flow simulations for 32-ary Fat-tree flows scheduling in Fat-tree DCNs synthetic traffic [267]* Pythia: real-time prediction of data Hadoop 1.1.2 Application-specific HiBench benckmarks: 10 nodes (12 CPU cores, 128 GB RAM, 1 SATA disk) in communication volume to accelerate instrumentation, Sort and Nutch index two racks, OpenFlow-enabled ToRs MapReduce OpenFlow [268]* Application-aware Resources Allocation - Neural networks Different applications Cloudsim-based simulations for 20 servers with 80 VMs scheme for SDN-based data centers traffic [269]* BASS: Bandwidth-Aware Scheduling Hadoop 1.2.1 OVS, time-slotted WordCount, Sort 5-nodes cluster with SDN in Hadoop clusters bandwidth allocation algorithm [270]* Scheduling in dynamic OCS-switching data - SCFT and FCFT NAS Parallel Benchmark OCSEMU emulator with 20 nodes (E5520 CPU, centers for delay-sensitive applications scheduling algorithms (NPB) suite 24 GB RAM, 500 GB disk) [271]* Slotted resource allocation in an - Online and incremental Iterative MapReduce OMNET++ 4.3.1-based packet-level simulations SDN control-based hybrid DCN scheduling heuristics workloads

[272]* Scheduling algorithms in hybrid - MILP, heuristics 10k-20k random 1024 nodes in two-tier network packet optical data centers requests (4 core MEMS switches, and 64 ToRs) [273]* Scheduling in packet-switched optical DCNs - Mahout, C4.5, and Naïve Random traffic Simulations for 80 ToRs connected by two 40×40 AWGRs Bayes Discretization and two 1 × 2 space switches (NBD) [274]* Max-throughput data transfer scheduling - MILP, heuristic 1-1, many-1 Bulk Simulations for three-tier and Fat-tree DCNs with 128 nodes and to minimize data retrieval time transfers VL2 with 160 nodes. [275]* Topology-aware data placement Hadoop 2.2.0 Heuristic WordCount, TeraSort, 18 nodes, TopoSim MapReduce simulator in geo-distributed data centers scheduler k-means [276]* CoMan: Global in-network management in - 3/2 approximation 55 flows from Emulation for a Fat-tree DCN with 10 switches and 8 servers multiplexed data centres algorithm 10 applications (using Pica8 3297), trace-driven simulations [277]* Network-aware MapReduce tasks placement Hadoop 1.2.1 Probabilistic tasks Wordcount, Grep from Palmetto HPC platform with 1978 slave nodes (16 cores CPU, 16GB to reduce transmission costs in DCNs scheduling algorithm, BigDataBench, Terasort RAM, 3GB disk), 10 GbE [278]* Network-aware Scheduler (NAS) Hadoop 2.x MTS, CA-RTS, and CR- Facebook traces 40 nodes in 8 racks cluster with 1 GbE for cross-rack links for high performance shuffling RTS scheduling Trace-driven event-based simulations for 600 nodes in 30 racks with algorithms 200 users

[279]* PROTEUS: system for Temporally Interleaved Hive, Profiling, Sort, WordCount, Join, Profiling: 33 nodes (4 cores 2.4GHz CPU, 4GB RAM), Gigabit Virtual Cluster (TIVC) abstraction for multi- Hadoop dynamic programming aggregation switch tenant data centers 0.20.0 prototype: 18-nodes in three-tier data center with NetFPGA switches, simulations [280]* Corral: offline scheduling framework to reduce Hadoop 2.4 Offline planner, SWIM and Microsoft 210 nodes (32 cores) in 7 racks, 10GbE cross-rack traffic for production workloads cluster scheduler traces, TPC-H large-scale simulations for 2000 nodes in 50 racks [281]* Mercury: Fine-grained hybrid resources Hadoop 2.4.1 Daemon in each node, Gridmix and Microsoft’s 256 nodes cluster (2 8-core CPU, 128GB RAM, 10 3TB disks), management and scheduling extensions to YARN production traces 10 Gbps in rack, 6 Gbps between racks [282]* Optimizing containers allocation to reduce Hadoop 2.7, Simulated annealing, k-means, 8 nodes (8-core CPU, 32GB RAM, SATA disk) in a Fat-tree topology congestion and improve data locality Apache Flink modification to AM connected components 1 GbE, OpenFlow 1.1 0.9 [283]* Examining the impact of the Energy Hadoop 1.x EEE in switches, TeraSort, Search, MRperf-based packet-level simulations Efficient Ethernet on MapReduce workloads packets coalescing Index for two racks cluster with up to 80 nodes [284]* Energy-aware hierarchical applications - k th max node sorting, Uniform / normal random C++ based simulations for up to 4096 nodes scheduling in large DCNs dynamic max node applications demands with 32, 64, 128, and 256 ports switches sorting algorithms [285]* Willow: Energy-efficient SDN-based flows Hadoop 1.x Online greedy Locally generated 16 nodes (AMD CPU, 2GB RAM), 1 GbE, Simulations for Fat-tree scheduling in data centers with network-limited approximation algorithm MapReduce traces and Fat-tree with disabled core switches, 1 GbE workloads [286]* JouleMR: Cost-effective and energy-aware Hadoop 2.6.0 Two-phase heuristic TeraSort, GridMix 10 nodes cluster (2 6-cores CPUs, 24GB RAM, 500GB disk), 10 GbE MapReduce jobs and tasks scheduling Facebook traces simulations for 600 nodes in tree-structured DCN, with1GbE links [287]* Evaluation of scale-up vs scale-out Vanilla Parameters optimization Log processing, select, 16 nodes (Quad-core CPU, 12GB RAM, 160GB HDD, 32GB SSD), 1 systems for Hadoop workloads Hadoop to Hadoop in scale-up aggregate, join, TeraSort, GbE, Dell PowerEdge910 (4 8-core CPU, 512GB RAM, 2 600GB servers k-means, indexing HDD, 8 240GB SSD) [288]° Hybrid scale-out and scale-up Hadoop 1.2.1 Automatic Hadoop WordCount, Grep, 12 nodes (2 4-core CPU, 16GB RAM, 193GB HDD), 10 Gbps Hadoop cluster parameters optimization, Terasort, TestDFSIO, Mytinet, 2 nodes (4 6-core CPU, 505GB RAM, 91GB HDD), 10 Gbps scheduler to select cluster Facebook traces Mytinet, OrangeFS [289]° Measuring the impact of InfiniBand and Hadoop Testing different network Sort, random write, 10 nodes cluster (Quad-core CPU, 6GB RAM, 250GB HDD or (up to 10Gigabit on HDFS 0.20.0 interfaces and sequential write 4), 64GB SSD), InfiniBand DDR 16 Gbps links and 10GbE technologies [290]° Optimizing HPC clusters for dual compute- Spark 0.7.0 Enhanced Load Balancer, GroupBy, Grep, Logistic Hyperion cluster with 101 nodes ( 2 8-core CPUs, 64GB RAM, centric and data-centric computations Congestion-Aware Task Regression (LR) 128GB SSD) in 2 racks, InfiniBand QDR 32 Gbps links Dispatching [291]° Cost-per-performance evaluation of MapReduce Hadoop 2.x 
Standalone evaluation Teragen, Terasort, (8 core CPU, 48GB RAM ) 10 GbE, single rack, SSDs 1.3TB, HHDs clusters with SSDs and HDDs with different storage and Teravalidate, WordCount, 2TB setups 6 HDDs, 11 HDDs, 1 SSD, (6 HHDs +1 SSD) application shuffle, HDFS configurations [292]° Improving Hadoop framework for Hadoop Modified map data Terasort, DFSIO 16 nodes (8 core CPU, 256GB RAM, 600GB HDD, 512GB SATA better utilization of SSDs 2.6.0, handling, pre-read in SSD, 10GbE) 8 nodes (6 core CPU, 256GB RAM, 250GB HDD, Spark 1.3 HDFS, reduce scheduler, 800GB NVMe SSD, 10GbE) Ceres-II 39 nodes (6 core CPU, 64GB placement policy RAM, 960GB NVMe SSD, 40GbE) [293]° mpCache: SSD-based acceleration for Hadoop 2.2.0 Modified HDFS, Grep, WC - Wikipedia 7 nodes (2 8-core CPUs, 20MB cache, 32GB RAM, 2TB SATA MapReduce workloads in commodity clusters admission control classification - Netflix, HDD, 2 160GB SATA SSD) policy, main cache PUMA replacement scheme, [294]° Evaluation of IO-intensive applications Cassandra Docker’s data volume Cassandra-stress, (32 core hyper-threaded CPU, 20480KB cache, 128GB RAM, with Docker containers 3.0, TPC-C benchmark and 3 960GB NVMe SSD) MySQL 5.7, FIO 2.2 [295]° Examining the performance of different Hadoop - Modification to MRPerf Synthesized Google MRperf-based simulations for 2 and 4 racks clusters schedulers and impact of NAS on Hadoop to implement Fair share traces for 9174 jobs scheduler and Quincy [296]° Improving the flexibility of direct attached Hadoop FlexDAS switch, SAS TeraSort for 1TB, YCSB 12 nodes (4 core CPU, 48GB RAM), 10GbE and 48 external HDDs storage via a Disk Area Network (DAN) 2.0.5, expanders, Host Bus COSBench 0.3.0.0 Cassandra Adapters (HBAs) Swift 1.10.0 [297]° Optimizing bulk data transfers from parallel Scientific Layout-aware source Offline Storage Tables Spider storage system at the Oak Ridge Leadership computing file systems via layout-aware I/O operations applications algorithm, layout-aware (OSTs) with 20MB and Facility with 96 DataDirect Network (DDN) S2A9900 RAID sink algorithm 256MB files controllers for 13400 1TB SATA HDDs [298]° FaRM: Fast Remote Memory system using Key-value Circular buffer for YCSB 20 nodes (2 8-core CPUs, 128GB RAM, 240GB SSD), 40Gbps RDMA to improve in-memory key-value store and graph RDMA messaging, RDMA over Converged Ethernet (RoCE) stores PhyCo kernel driver, ZooKeeper [299]° HadoopA: Virtual shuffling for efficient Hadoop virtual shuffling through TeraSort, Grep, 21-nodes cluster (dual socket quad-core 2.13 GHz Intel Xeon, 8 GB data movement in MapReduce 0.20.0 three-level segment WordCount, and RAM, 500 GB disk, 8 PCI-Express 2.0 bus), InfiniBand QDR switch, near-demand merging, Hive workloads 48-port 10GigE and merging sub-trees

[300]° Micro-architectural characterization of scale- - Intel VTune for Caching, NoSQL, PowerEdge M1000e (2 Intel 6-core CPUs, 32KB out workloads characterization MapReduce, media L1 cache, 256KB L2 cache, 12MB L3 cache, 24GB RAM) PARSEC, SPEC2006, TPC-E [301]° A 2 PetaFLOP, 3 Petabyte, 9 TB/s - Hoffman-Singleton - 2,550 nodes (64-bit Boston chip, 64GB DDR3 SDRAM, 1TB NAND and 90 kW cabinet for exascale computations topology flash)

[302]° Mars: Accelerating MapReduce Hadoop 1.x CUDA, Hadoop String match, matrix 240-core NVIDIA GTX280+ 4-core CPU), (128-core NVIDIA with GPUs streaming, GPU Prefix multiplication, MC, 8800GTX + 4-core CPU), (320 core ATI Radeon HD 3870 + 2-core Sum routine, GPU Black-scholes, similarity CPU) Bitonic Sort routine score, PCA [303]° Using GPUs and MPTCP to Hadoop 1.x CUDA, Hadoop pipes, Terasort, WordCount and Node1 (Intel i7 920, 24GB RAM, 4 1TB HDD), node2 (Intel Quad improve Hadoop performance MPTCP PiEstimate DataGen Q9400, NVIDIA GT530 GPU, 4GB RAM, 500GB HDD), heterogeneous 5 nodes [304]° FPMR: a MapReduce framework - On-chip processor RankBoost 1 node (Intel Pentium 4, 4GB RAM), Altera Stratix II EP2S180F1508 on FPGA to accelerate RankBoost scheduler, Common Data FPGA, Quartus II 8.1 and ModelSim 6.1-based simulations Path (CDP)

[305]° SODA: software defined acceleration for big - Vivado high level Constrained Shortest Path Xilinx Zynq FPGA, ARM Cortex processor data with heterogeneous reconfigurable synthesis tools, out-of- Finding (CSPF) for SDN multicore FPGA resources order scheduling algorithm [306]° HLSMapReduceFlow: Synthesizable - High-Level Synthesis- WordCount, histogram, Virtex-7 FPGA MapReduce Accelerator for FPGA-coupled Data based MapReduce Matrix multiplication, Centers dataflow linear regression, PCA, k- means [307]° FPGA-based in-NIC and software-based NoSQL DRAM and multi USR, SYS, APP, NetFPGA-10G ( Xilinx Virtex-5 XC5VTX240T) of NetFPGA, in-Kernel caches for NoSQL processing elements in ETC, VAR NetFPGA, Netfilter framework for kernel cache [308]° FPGA-based processing in NIC Spark 1.6.0, one-at-a-time WordCount, change NetFPGA-10G (Xilinx Virtex-5 XC5VTX240TFFG1759-2) as NIC in for Spark streaming Scala 2.10.5 methodology operations point detection server node (Intel core i5 CPU, 8GB DRAM) and a client node [309]° FPGA-acceleration for large - Graph Processing Graphlet counting Convey HC-1 server with Virtex-5 LX330s FPGA graph processing Elements (GPEs), algorithm memory interconnect network, run-time management unit [310]° Network requirements for big data Hadoop, Page-level memory WordCount, sort, EC2 instances (m3.2xlarge, c3.4xlarge), with virtual private network application in disaggregated data centers Spark, access, block-level collaborative (VPC), simulations and emulations GraphLab, distributed data filtering, YCSB Timely placement RDMA and dataflow, integrated NICs Memchached, HERD, SparkSQL [311]° A composable architecture for rack-scale Memchached, Empirical approaches for 100k Memcached H3 RAID array with 12 SATA drives and single PCIe Gen3 × 4 port, disaggregation for big data computing Giraph, resources provisioning, operations, PCIe switch with 16 Gen2 × 4 port, host bus adapter connected to Cassandra PCIe switches TopKPagerank, 10k IBM 3650M4 Cassandra operations * Scheduling in data centers, °Performance improvements based on advances in technologies. VIII. SUMMARY OF OPTIMIZATION STUDIES AND FUTURE RESEARCH DIRECTIONS: Tremendous efforts were devoted in the last decade to address various challenges in optimizing the deployment of big data applications and their infrastructures. The ever increasing data volumes and the heterogeneous requirements of available and proposed frameworks in dedicated and multi-tenant clusters are still calling for further improvements of different aspects of these infrastructures. We categorized the optimization studies of big data applications into three broad categories which are at the applications-level, cloud networking- level, and at the data center-level. Application-level studies focused on the efforts to improve the configuration or structure of the frameworks. These studies were further categorized into optimizing jobs and data placements, jobs scheduling, and completion time, in addition to benchmarking, production traces, modeling, profiling and simulators for big data applications. Studies at the cloud networking level addressed the optimization beyond single cluster deployments, for example for transporting big data and/or for in-network processing of big data. These studies were further categorized into cloud resource management, virtual assignments and container assignments, bulk transfers and inter data center networking, and SDN, EON, and NFV optimization studies. 
Finally, the data center-level studies focused on optimizing the design of the clusters to improve the performance of the applications. These studies were further categorized into topology studies, routing studies, studies on the scheduling of flows, coflows, and jobs, and studies on advances in computing, storage, and networking technologies. In what follows, we highlight key challenges for big data application deployments and provide some insights into research gaps and future directions.
Big Data Volumes: In most of the studies, there is a huge gap between the actual volumes of big data and the tested traffic or workload volumes. The main reason is the relatively high cost of experimenting in large clusters or renting IaaS. This calls for improving existing simulators or performance models to enable accurate testing of systems at large scale. Data volumes are continuing to grow beyond the capabilities of existing systems due to many bottlenecks in computing and networking. This will require continued investment in scale-up systems that incorporate different technologies such as SDN and advanced optical networks for intra and inter data center networking. Another key challenge is related to the veracity and value of big data, which calls for cleansing techniques prior to processing to eliminate unnecessary computations.
Workload characteristics and their modeling: Big data workload characteristics and the available frameworks will keep changing and evolving. Most big data studies address only the MapReduce framework, while few consider other variations such as key-value stores, streaming, and graph processing applications, or a mixture of applications. Also, several studies have utilized scaled and outdated production traces, where only high-level statistics are available, or a subset of workloads from micro-benchmarks for their evaluations, which might not be very representative. Thus, there is a need for more comprehensive, enhanced, and updated benchmarking suites and production traces that reflect more recent frameworks and workload characteristics.
Resource allocation: Different workloads can have uncorrelated CPU, I/O, memory, and networking resource requirements, and placing them together can improve the overall infrastructure utilization. However, isolating resources such as cache, network, and I/O via recent management platforms at large scale is still challenging. Improving some of the resources can change the optimum configurations for applications. For example, improving the networking can reduce the need to maintain data locality and hence reduce the stress on task scheduling. Also, improving the CPU can change CPU-bound workloads into I/O-bound or memory-bound workloads. Another challenge is that there is still a gap between the users' knowledge of their requirements and the resources they lease, which leads to non-optimal configurations and hence wasted resources, delayed responses, and higher energy consumption. More tools to aid users in understanding their requirements or their workloads are required for better resource utilization.
Performance in proposed clusters: Most of the research that enhances big data frameworks, reported in Section III, was carried out in scale-out clusters while overlooking the physical layer characteristics and focusing on the framework details.
Alternatively, most of the studies in Sections V and VII have considered custom-built simulators, SDN switch-based emulations, or small scale-up/out clusters while focusing on the hardware impact, considering only a subset of configurations, and oversimplifying the frameworks' characteristics. Although it might sound more economical to scale out infrastructures as the computational requirements increase, this might not be sufficient for applications with strict QoS requirements, as scaling out depends on a high level of parallelism, which is constrained by the network between the nodes. Hence, improving the networking and scaling up the data centers are required and are gaining research and industrial attention. When improving DCNs, many tradeoffs should be considered, including scalability, agility, and end-to-end delays, in addition to the complexity of the routing and scheduling mechanisms required to fully exploit the higher-bandwidth links. This will mostly be satisfied by application-centric DCN architectures that utilize SDN to dynamically vary the topologies at run-time to match the requirements of deployed applications.
Cluster heterogeneity: The potential existence of hardware with different specifications, for example due to replacements in very large clusters, can lead to completion time imbalance among tasks. This requires more attention in big data studies and accurate profiling of the performance of all nodes so that task scheduling can be improved to reduce the imbalance.
Multi-tenancy environments: Multi-tenancy is a key requirement enabled by the virtualization of cloud data centers, where the workloads of different users are multiplexed in the same infrastructure to improve the utilization. In current cloud services, static cluster configurations and manual adjustments are still followed but are not optimal. There is still a lack of dynamic job allocation and scheduling for multiple users. In such environments, the performance isolation between users who share resources such as the network and I/O is not yet widely addressed. Also, fairness between users and optimal pricing while maintaining acceptable QoS for all users is still a challenging research topic requiring more comprehensive multi-objective studies.
Geo-distributed frameworks: The challenges faced in geo-distributed frameworks were summarized in Subsection IV-C. These include the need to modify frameworks that were originally designed for single-cluster deployments. There is a need for new frameworks for offers and pricing, optimal resource allocation in heterogeneous clusters, QoS guarantees, resilience, and energy consumption minimization. A key challenge with workloads in geo-distributed frameworks is that not all workloads can be divided, as the whole data set may need to reside in a single cluster; in addition, transporting some data sets from remote locations can incur high data-access latency. Extended SDN control between transport networks and within data centers is a promising research area to jointly optimize path computations, provision distributed resources, and reduce job completion times, and to improve big data applications and frameworks in geo-distributed environments.
Power consumption: Current infrastructures have a trade-off between energy efficiency and application performance. Most cloud providers still favor over-provisioning to meet SLAs over reducing the power consumption.
Reducing the power consumption while maintaining the performance is an area that should be explored further in designing future systems. For example, it is attractive to consider more energy-efficient components and more efficient geo-distributed frameworks that reduce the need for transporting big data. It is also attractive to perform progressive computations in the network as the data transits it, while considering agile technologies such as SDN, VNE, and NFV, which can improve the energy efficiency of big data applications. However, the impact of these strategies on application performance should be comprehensively addressed.

IX. CONCLUSIONS
Improving the performance and efficiency of big data frameworks and applications is an ongoing critical research area, as they are becoming the mainstream approach for implementing various services, including data analytics and machine learning at large scale. These are services that continue to grow in importance. Support should also be developed for other services deployed fully or partially in cloud and fog computing environments with ever-increasing volumes of generated data. Big data interacts with systems at different levels, starting from the acquisition of data from users and IoT devices through wireless and wired access networks, to transmission through WANs, into different types of data centers for storage and processing via different frameworks. This paper surveyed big data applications and the technologies and network infrastructure needed to implement them. It identified a number of key challenges and research gaps in optimizing big data applications and infrastructures. It has also comprehensively summarized early and recent efforts towards improving the performance and/or energy efficiency of such big data applications at different layers. For the convenience of readers with different backgrounds, brief descriptions of big data applications and frameworks, cloud computing and related emerging technologies, and data centers are provided in Sections II, IV, and VI, respectively. The optimization studies that appear in Section III focus on the frameworks, those in Section V focus on cloud networking, and those in Section VII focus on data centers, while comprehensive summaries are given in Tables I-VI. The survey paid attention to a range of existing and proposed technologies and focused on different frameworks and applications including MapReduce, data streaming, and graph processing. It considered different optimization metrics (e.g., completion time, fairness, cost, profit, and energy consumption), reported studies that used different representative workloads, optimization tools, and mathematical techniques, and covered simulation-based and experimental evaluations in clouds and prototypes. We provided some future research directions in Section VIII to aid researchers in identifying the limitations of current solutions and hence determine the areas where future technologies should be developed in order to improve big data applications, their infrastructures, and their performance.

ACKNOWLEDGEMENTS
Sanaa Hamid Mohamed would like to acknowledge Doctoral Training Award (DTA) funding from the UK Engineering and Physical Sciences Research Council (EPSRC). This work was supported by the Engineering and Physical Sciences Research Council through the INTERNET (EP/H040536/1), STAR (EP/K016873/1), and TOWS (EP/S016570/1) projects. All data are provided in full in the results section of this paper.

Kachris, “High-level synthesizable dataflow MapReduce accelerator for FPGA- coupled data centers,” in Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015 International Conference on, July 2015, pp. 26–33. [307] Y. Tokusashi and H. Matsutani, “Multilevel NoSQL Cache Combining In-NIC and In-Kernel Approaches,” IEEE Micro, vol. 37, no. 5, pp. 44–51, September 2017. [308] K. Nakamura, A. Hayashi, and H. Matsutani, “An FPGA-based low-latency network processing for spark streaming,” in 2016 IEEE International Conference on Big Data (Big Data), Dec 2016, pp. 2410–2415. [309] B. Betkaoui, D. B. Thomas, W. Luk, and N. Przulj, “A framework for FPGA acceleration of large graph problems: Graphlet counting case study,” in 2011 International Conference on Field-Programmable Technology, Dec 2011, pp. 1–8. [310] P. X. Gao, A. Narayan, S. Karandikar, J. Carreira, S. Han, R. Agarwal, S. Ratnasamy, and S. Shenker, “Network Requirements for Resource Disaggregation,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). Savannah, GA: USENIX Association, 2016, pp. 249–264. [311] C.-S. Li, H. Franke, C. Parris, B. Abali, M. Kesavan, and V. Chang, “Composable architecture for rack scale big data computing,” Future Generation Computer Systems, vol. 67, pp. 180 – 193, 2017. [312] M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mob. Netw. Appl., vol. 19, no. 2, pp. 171–209, Apr. 2014. [313] S. Sakr, A. Liu, D. Batista, and M. Alomari, “A Survey of Large Scale Data Management Approaches in Cloud Environments,” Communications Surveys Tutorials, IEEE, vol. 13, no. 3, pp. 311–336, Third 2011. [314] L. Jiamin and F. Jun, “A Survey of MapReduce Based Parallel Processing Technologies,” Communications, China, vol. 11, no. 14, pp. 146–155, Supplement 2014. [315] Y. Zhang, T. Cao, S. Li, X. Tian, L. Yuan, H. Jia, and A. V. Vasilakos, “Parallel Processing Systems for Big Data: A Survey,” Proceedings of the IEEE, vol. 104, no. 11, pp. 2114–2136, Nov 2016. [316] H. Zhang, G. Chen, B. C. Ooi, K. L. Tan, and M. Zhang, “In-Memory Big Data Management and Processing: A Survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 7, pp. 1920–1948, July 2015. [317] R. Han, L. K. John, and J. Zhan, “Benchmarking Big Data Systems: A Review,” IEEE Transactions on Services Computing, vol. PP, no. 99, pp. 1–1, 2017. [318] G. Rumi, C. Colella, and D. Ardagna, “Optimization Techniques within the Hadoop Eco-system: A Survey,” in Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2014 16th International Symposium on, Sept 2014, pp. 437–444. [319] B. T. Rao and L. S. S. Reddy, “Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments,” CoRR, vol. abs/1207.0780, 2012. [320] R. Li, H. Hu, H. Li, Y. Wu, and J. Yang, “Mapreduce parallel programming model: A state-of-the-art survey,” International Journal of Parallel Programming, vol. 44, no. 4, pp. 832–866, Aug 2016. [321] J. Wu, S. Guo, J. Li, and D. Zeng, “Big Data Meet Green Challenges: Big Data Toward Green Applications,” IEEE Systems Journal, vol. 10, no. 3, pp. 888–900, Sept 2016. [322] P. Derbeko, S. Dolev, E. Gudes, and S. Sharma, “Security and privacy aspects in mapreduce on clouds: A survey,” Computer Science Review, vol. 20, pp. 1 – 28, 2016. [323] S. Dolev, P. Florissi, E. Gudes, S. Sharma, and I. Singer, “A Survey on Geographically Distributed Big- Data Processing using MapReduce,” IEEE Transactions on Big Data, vol. PP, no. 99, pp. 1–1, 2017. [324] M. Hadi, A. Lawey, T. 
El-Gorashi, and J. Elmirghani, “Big Data Analytics for Wireless and Wired Network Design: A Survey,” Computer Networks, vol. 132, pp. 180–199, February 2018. [325] J. Wang, Y. Wu, N. Yen, S. Guo, and Z. Cheng, “Big Data Analytics for Emergency Communication Networks: A Survey,” IEEE Communications Surveys Tutorials, vol. 18, no. 3, pp. 1758–1778, thirdquarter 2016. [326] X. Cao, L. Liu, Y. Cheng, and X. Shen, “Towards Energy-Efficient Wireless Networking in the Big Data Era: A Survey,” IEEE Communications Surveys Tutorials, vol. PP, no. 99, pp. 1–1, 2017. [327] S. Yu, M. Liu, W. Dou, X. Liu, and S. Zhou, “Networking for Big Data: A Survey,” IEEE Communications Surveys Tutorials, vol. 19, no. 1, pp. 531–549, Firstquarter 2017. [328] S. Wang, J. Zhang, T. Huang, J. Liu, T. Pan, and Y. Liu, “A Survey of Coflow Scheduling Schemes for Data Center Networks,” IEEE Communications Magazine, vol. 56, no. 6, pp. 179–185, June 2018. [329] K. Wang, Q. Zhou, S. Guo, and J. Luo, “Cluster Frameworks for Efficient Scheduling and Resource Allocation in Data Center Networks: A Survey,” IEEE Communications Surveys Tutorials, pp. 1–1, 2018. [330] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: Distributed Data-parallel Programs from Sequential Building Blocks,” SIGOPS Oper. Syst. Rev., vol. 41, no. 3, pp. 59–72, Mar. 2007. [331] T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. J. Fernández- Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, and S. Whittle, “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of- Order Data Processing,” Proceedings of the VLDB Endowment, vol. 8, pp. 1792–1803, 2015. [332] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System,” SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29–43, Oct. 2003. [333] V. Kalavri and V. Vlassov, “MapReduce: Limitations, Optimizations and Open Issues,” in 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, July 2013, pp. 1031– 1038. [334] S. Babu, “Towards Automatic Optimization of MapReduce Programs,” in Proceedings of the 1st ACM Symposium on Cloud Computing, ser. SoCC ’10. New York, NY, USA: ACM, 2010, pp. 137–142. [335] P. Lama and X. Zhou, “AROMA: Automated Resource Allocation and Configuration of Mapreduce Environment in the Cloud,” in Proceedings of the 9th International Conference on Autonomic Computing, ser. ICAC ’12. New York, NY, USA: ACM, 2012, pp. 63–72. [336] A. Rabkin and R. Katz, “How Hadoop Clusters Break,” Software, IEEE, vol. 30, no. 4, pp. 88–94, July 2013. [337] D. Cheng, J. Rao, Y. Guo, C. Jiang, and X. Zhou, “Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 3, pp. 774–786, March 2017. [338] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears, “MapReduce Online,” in Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’10. Berkeley, CA, USA: USENIX Association, 2010, pp. 21–21. [339] T. White, Hadoop: The Definitive Guide, 1st ed. O’Reilly Media, Inc., 2009. [340] C. Ji, Y. Li, W. Qiu, U. Awada, and K. Li, “Big Data Processing in Cloud Computing Environments,” in Pervasive Systems, Algorithms and Networks (ISPAN), 2012 12th International Symposium on, Dec 2012, pp. 17–23. [341] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. 
Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, and E. Baldeschwieler, “Apache Hadoop YARN: Yet Another Resource Negotiator,” in Proceedings of the 4th Annual Symposium on Cloud Computing, ser. SOCC ’13. New York, NY, USA: ACM, 2013, pp. 5:1–5:16. [342] I. Polato, D. Barbosa, A. Hindle, and F. Kon, “Hadoop branching: Architectural impacts on energy and performance,” in Green Computing Conference and Sustainable Computing Conference (IGSC), 2015 Sixth International, Dec 2015, pp. 1–4. [343] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, “Pig Latin: A Not-so-foreign Language for Data Processing,” in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’08. New York, NY, USA: ACM, 2008, pp. 1099–1110. [344] B. Saha, H. Shah, S. Seth, G. Vijayaraghavan, A. Murthy, and C. Curino, “Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’15. New York, NY, USA: ACM, 2015, pp. 1357–1369. [345] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy, “Hive: A Warehousing Solution over a Map-reduce Framework,” Proc. VLDB Endow., vol. 2, no. 2, pp. 1626–1629, Aug. 2009. [346] A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy, “Storm@twitter,” in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’14. New York, NY, USA: ACM, 2014, pp. 147–156. [347] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: A Distributed Storage System for Structured Data,” ACM Trans. Comput. Syst., vol. 26, no. 2, pp. 4:1–4:26, Jun. 2008. [348] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, “PNUTS: Yahoo!’s Hosted Data Serving Platform,” Proc. VLDB Endow., vol. 1, no. 2, pp. 1277–1288, Aug. 2008. [349] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s Highly Available Key-value Store,” SIGOPS Oper. Syst. Rev., vol. 41, no. 6, pp. 205–220, Oct. 2007. [350] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin, “HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads,” Proc. VLDB Endow., vol. 2, no. 1, pp. 922–933, Aug. 2009. [351] A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,” SIGOPS Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40, Apr. 2010. [352] J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad, “Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing),” Proc. VLDB Endow., vol. 3, no. 1-2, pp. 515–529, Sep. 2010. [353] J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, G. Parulkar, M. Rosenblum, S. M. Rumble, E. Stratmann, and R. Stutsman, “The Case for RAMClouds: Scalable High-performance Storage Entirely in DRAM,” SIGOPS Oper. Syst. Rev., vol. 43, no. 4, pp. 92–105, Jan. 2010. [354] F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner, “SAP HANA Database: Data Management for Modern Business Applications,” SIGMOD Rec., vol. 40, no. 4, pp. 45–51, Jan. 2012. [355] M. Zaharia, M. Chowdhury, M. J. Franklin, S. 
Shenker, and I. Stoica, “Spark: Cluster Computing with Working Sets,” in Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, ser. HotCloud’10. Berkeley, CA, USA: USENIX Association, 2010, pp. 10–10. [356] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing,” in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 2–2. [357] A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan, “One Trillion Edges: Graph Processing at Facebook-scale,” Proc. VLDB Endow., vol. 8, no. 12, pp. 1804–1815, Aug. 2015. [358] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A System for Large-scale Graph Processing,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’10. New York, NY, USA: ACM, 2010, pp. 135–146. [359] B. Shao, H. Wang, and Y. Li, “Trinity: A Distributed Graph Engine on a Memory Cloud,” in Proceedings of SIGMOD 2013. ACM SIGMOD, June 2013. [360] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, “Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud,” Proc. VLDB Endow., vol. 5, no. 8, pp. 716– 727, Apr. 2012. [361] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” in Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). Hollywood, CA: USENIX, 2012, pp. 17–30. [362] D. Bernstein, “The Emerging Hadoop, Analytics, Stream Stack for Big Data,” Cloud Computing, IEEE, vol. 1, no. 4, pp. 84–86, Nov 2014. [363] A. M. Aly, A. Sallam, B. M. Gnanasekaran, L. V. Nguyen-Dinh, W. G. Aref, M. Ouzzani, and A. Ghafoor, “M3: Stream Processing on Main-Memory MapReduce,” in 2012 IEEE 28th International Conference on Data Engineering, April 2012, pp. 1253–1256. [364] T. Akidau, A. Balikov, K. Bekiroglu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle, “Mill-Wheel: Fault-Tolerant Stream Processing at Internet Scale,” in Very Large Data Bases, 2013, pp. 734–746. [365] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari, “S4: Distributed Stream Computing Platform,” in Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, ser. ICDMW ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 170–177. [366] M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica, “Discretized Streams: An Efficient and Fault-tolerant Model for Stream Processing on Large Clusters,” in Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, ser. HotCloud’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 10–10. [367] N. Marz and J. Warren, Big Data: Principles and Best Practices of Scalable Realtime Data Systems, 1st ed. Greenwich, CT, USA: Manning Publications Co., 2015. [368] J. Kreps, N. Narkhede, and J. Rao, “Kafka: A distributed messaging system for log processing,” in Proceedings of 6th International Workshop on Networking Meets Databases (NetDB), Athens, Greece, 2011. [369] Apache Hadoop Rumen. (Cited on 2016, Dec). [Online]. Available: https://hadoop.apache.org/docs/stable/hadoop-rumen/Rumen.html [370] M. Sadiku, S. Musa, and O. 
Momoh, “Cloud Computing: Opportunities and Challenges,” Potentials, IEEE, vol. 33, no. 1, pp. 34–36, Jan 2014. [371] B. Biocic, D. Tomic, and D. Ogrizovic, “Economics of the cloud computing,” in MIPRO, 2011 Proceedings of the 34th International Convention, May 2011, pp. 1438–1442. [372] N. da Fonseca and R. Boutaba, Cloud Architectures, Networks, Services, and Management. Wiley-IEEE Press, 2015, p. 432. [373] J. E. Smith and R. Nair, “The architecture of virtual machines,” Computer, vol. 38, no. 5, pp. 32–38, May 2005. [374] B. Sotomayor, R. S. Montero, I. M. Llorente, and I. Foster, “Virtual Infrastructure Management in Private and Hybrid Clouds,” IEEE Internet Computing, vol. 13, no. 5, pp. 14–22, Sept 2009. [375] Amazon EC2. (Cited on 2017, Dec). [Online]. Available: https: //aws.amazon.com/ec2 [376] Google Compute Engine Pricing. (Cited on 2017, Dec). [Online]. Available: https://cloud.google.com/compute/pricing [377] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the Art of Virtualization,” SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 164–177, Oct. 2003. [378] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, “kvm: the Linux Virtual Machine Monitor,” in Proceedings of the Linux Symposium, vol. 1, Ottawa, Ontario, Canada, Jun. 2007, pp. 225–230. [379] F. Guthrie, S. Lowe, and K. Coleman, VMware vSphere Design, 2nd ed. Alameda, CA, USA: SYBEX Inc., 2013. [380] T. Kooburat and M. Swift, “The Best of Both Worlds with On-demand Virtualization,” in Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems, ser. HotOS’13. Berkeley, CA, USA: USENIX Association, 2011, pp. 4–4. [381] D. Venzano and P. Michiardi, “A Measurement Study of Data-Intensive Network Traffic Patterns in a Private Cloud,” in 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing, Dec 2013, pp. 476–481. [382] F. Xu, F. Liu, H. Jin, and A. Vasilakos, “Managing Performance Overhead of Virtual Machines in Cloud Computing: A Survey, State of the Art, and Future Directions,” Proceedings of the IEEE, vol. 102, no. 1, pp. 11– 31, Jan 2014. [383] G. Wang and T. S. E. Ng, “The Impact of Virtualization on Network Performance of Amazon EC2 Data Center,” in Proceedings of the 29th Conference on Information Communications, ser. INFOCOM’10. Piscataway, NJ, USA: IEEE Press, 2010, pp. 1163–1171. [384] Q. Duan, Y. Yan, and A. V. Vasilakos, “A Survey on Service-Oriented Network Virtualization Toward Convergence of Networking and Cloud Computing,” IEEE Transactions on Network and Service Management, vol. 9, no. 4, pp. 373–392, December 2012. [385] R. Jain and S. Paul, “Network virtualization and software defined networking for cloud computing: a survey,” Communications Magazine, IEEE, vol. 51, no. 11, pp. 24–31, November 2013. [386] A. Fischer, J. F. Botero, M. T. Beck, H. de Meer, and X. Hesselbach, “Virtual Network Embedding: A Survey,” IEEE Communications Surveys Tutorials, vol. 15, no. 4, pp. 1888–1906, Fourth 2013. [387] L. Nonde, T. El-Gorashi, and J. Elmirghani, “Energy Efficient Virtual Network Embedding for Cloud Networks,” Lightwave Technology, Journal of, vol. 33, no. 9, pp. 1828–1849, May 2015. [388] L. Nonde, T. E. H. Elgorashi, and J. M. H. Elmirghani, “Cloud Virtual Network Embedding: Profit, Power and Acceptance,” in 2015 IEEE Global Communications Conference (GLOBECOM), Dec 2015, pp. 1– 6. [389] R. Mijumbi, J. Serrat, J. Gorricho, N. Bouten, F. De Turck, and R. 
Boutaba, “Network Function Virtualization: State-of-the-Art and Research Challenges,” Communications Surveys Tutorials, IEEE, vol. 18, no. 1, pp. 236–262, Firstquarter 2016. [390] H. Hawilo, A. Shami, M. Mirahmadi, and R. Asal, “NFV: state of the art, challenges, and implementation in next generation mobile networks (vEPC),” IEEE Network, vol. 28, no. 6, pp. 18–26, Nov 2014. [391] V. Nguyen, A. Brunstrom, K. Grinnemo, and J. Taheri, “SDN/NFVBased Mobile Packet Core Network Architectures: A Survey,” IEEE Communications Surveys Tutorials, vol. 19, no. 3, pp. 1567–1602, thirdquarter 2017. [392] D. A. Temesgene, J. Núñez-Martínez, and P. Dini, “Softwarization and Optimization for Sustainable Future Mobile Networks: A Survey,” IEEE Access, vol. 5, pp. 25 421–25 436, 2017. [393] I. Afolabi, T. Taleb, K. Samdanis, A. Ksentini, and H. Flinck, “Network Slicing & Softwarization: A Survey on Principles, Enabling Technologies & Solutions,” IEEE Communications Surveys Tutorials, pp. 1–1, 2018. [394] J. G. Herrera and J. F. Botero, “Resource Allocation in NFV: A Comprehensive Survey,” IEEE Transactions on Network and Service Management, vol. 13, no. 3, pp. 518–532, Sept 2016. [395] L. Peterson, A. Al-Shabibi, T. Anshutz, S. Baker, A. Bavier, S. Das, J. Hart, G. Palukar, and W. Snow, “Central office re-architected as a data center,” IEEE Communications Magazine, vol. 54, no. 10, pp. 96–101, October 2016. [396] A. N. Al-Quzweeni, A. Q. Lawey, T. E. H. Elgorashi, and J. M. H. Elmirghani, “Optimized Energy Aware 5G Network Function Virtualization,” IEEE Access, pp. 1–1, 2019. [397] A. Al-Quzweeni, A. Lawey, T. El-Gorashi, and J. M. H. Elmirghani, “A framework for energy efficient NFV in 5G networks,” in 2016 18th International Conference on Transparent Optical Networks (ICTON), July 2016, pp. 1–4. [398] A. Al-Quzweeni, T. E. H. El-Gorashi, L. Nonde, and J. M. H. Elmirghani, “Energy efficient network function virtualization in 5G networks,” in 2015 17th International Conference on Transparent Optical Networks (ICTON), July 2015, pp. 1–4. [399] J. Zhang, Y. Ji, X. Xu, H. Li, Y. Zhao, and J. Zhang, “Energy efficient baseband unit aggregation in cloud radio and optical access networks,” IEEE/OSA Journal of Optical Communications and Networking, vol. 8, no. 11, pp. 893–901, Nov 2016. [400] M. Peng, Y. Li, Z. Zhao, and C. Wang, “System architecture and key technologies for 5G heterogeneous cloud radio access networks,” IEEE Network, vol. 29, no. 2, pp. 6–14, March 2015. [401] M. Peng, Y. Sun, X. Li, Z. Mao, and C. Wang, “Recent Advances in Cloud Radio Access Networks: System Architectures, Key Techniques, and Open Issues,” IEEE Communications Surveys Tutorials, vol. 18, no. 3, pp. 2282–2308, thirdquarter 2016. [402] A. Checko, H. L. Christiansen, Y. Yan, L. Scolari, G. Kardaras, M. S. Berger, and L. Dittmann, “Cloud RAN for Mobile Networks; A Technology Overview,” IEEE Communications Surveys Tutorials, vol. 17, no. 1, pp. 405–426, Firstquarter 2015. [403] L. Velasco, L. M. Contreras, G. Ferraris, A. Stavdas, F. Cugini, M. Wiegand, and J. P. Fernandez-Palacios, “A service-oriented hybrid access network and clouds architecture,” IEEE Communications Magazine, vol. 53, no. 4, pp. 159–165, April 2015. [404] M. Kalil, A. Al-Dweik, M. F. A. Sharkh, A. Shami, and A. Refaey, “A Framework for Joint Wireless Network Virtualization and Cloud Radio Access Networks for Next Generation Wireless Networks,” IEEE Access, vol. 5, pp. 20 814–20 827, 2017. [405] M. Richart, J. Baliosian, J. Serrat, and J. L. 
Gorricho, “Resource Slicing in Virtual Wireless Networks: A Survey,” IEEE Transactions on Network and Service Management, vol. 13, no. 3, pp. 462–476, Sept 2016. [406] R. Bolla, R. Bruschi, F. Davoli, C. Lombardo, J. F. Pajo, and O. R. Sanchez, “The dark side of network functions virtualization: A perspective on the technological sustainability,” in 2017 IEEE International Conference on Communications (ICC), May 2017, pp. 1–7. [407] D. Bernstein, “Containers and Cloud: From LXC to Docker to Kubernetes,” Cloud Computing, IEEE, vol. 1, no. 3, pp. 81–84, Sept 2014. [408] C. Pahl and B. Lee, “Containers and Clusters for Edge Cloud Architectures – A Technology Review,” in 2015 3rd International Conference on Future Internet of Things and Cloud, Aug 2015, pp. 379–386. [409] I. Mavridis and H. Karatza, “Performance and Overhead Study of Containers Running on Top of Virtual Machines,” in 2017 IEEE 19th Conference on Business Informatics (CBI), vol. 02, July 2017, pp. 32– 38. [410] M. G. Xavier, M. V. Neves, and C. A. F. D. Rose, “A Performance Comparison of Container-Based Virtualization Systems for MapReduce Clusters,” in 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Feb 2014, pp. 299– 306. [411] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,” in Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on, March 2015, pp. 171–172. [412] Docker Container Executor. (Cited on 2017, Mar). [Online]. Available: https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html [413] S. Radhakrishnan, B. J. Muscedere, and K. Daudjee, “V-Hadoop: Virtualized Hadoop using containers,” in 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA), vol. 00, Oct. 2016, pp. 237–241. [414] D. Kreutz, F. M. V. Ramos, P. E. Veríssimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig, “Software- Defined Networking: A Comprehensive Survey,” Proceedings of the IEEE, vol. 103, no. 1, pp. 14–76, Jan 2015. [415] F. Bannour, S. Souihi, and A. Mellouk, “Distributed SDN Control: Survey, Taxonomy, and Challenges,” IEEE Communications Surveys Tutorials, vol. 20, no. 1, pp. 333–354, Firstquarter 2018. [416] B. Nunes, M. Mendonca, X.-N. Nguyen, K. Obraczka, and T. Turletti, “A Survey of Software-Defined Networking: Past, Present, and Future of Programmable Networks,” Communications Surveys Tutorials, IEEE, vol. 16, no. 3, pp. 1617–1634, Third 2014. [417] W. Xia, Y. Wen, C. H. Foh, D. Niyato, and H. Xie, “A Survey on Software-Defined Networking,” IEEE Communications Surveys Tutorials, vol. 17, no. 1, pp. 27–51, Firstquarter 2015. [418] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling innovation in campus networks,” SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69–74, Mar. 2008. [419] F. Hu, Q. Hao, and K. Bao, “A Survey on Software-Defined Network and OpenFlow: From Concept to Implementation,” IEEE Communications Surveys Tutorials, vol. 16, no. 4, pp. 2181–2206, Fourthquarter 2014. [420] A. Lara, A. Kolasani, and B. Ramamurthy, “Network Innovation using OpenFlow: A Survey,” IEEE Communications Surveys Tutorials, vol. 16, no. 1, pp. 493–512, First 2014. [421] A. Mendiola, J. Astorga, E. Jacob, and M. 
Higuero, “A Survey on the Contributions of Software-Defined Networking to Traffic Engineering,” IEEE Communications Surveys Tutorials, vol. 19, no. 2, pp. 918–953, Secondquarter 2017. [422] O. Michel and E. Keller, “SDN in wide-area networks: A survey,” in 2017 Fourth International Conference on Software Defined Systems (SDS), May 2017, pp. 37–42. [423] B. Pfaff, J. Pettit, T. Koponen, E. J. Jackson, A. Zhou, J. Rajahalme, J. Gross, A. Wang, J. Stringer, P. Shelar, K. Amidon, and M. Casado, “The Design and Implementation of Open vSwitch,” in Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’15. Berkeley, CA, USA: USENIX Association, 2015, pp. 117–130. [424] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: Programming Protocol-independent Packet Processors,” SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, pp. 87–95, Jul. 2014. [425] T. Huang, F. R. Yu, C. Zhang, J. Liu, J. Zhang, and Y. Liu, “A Survey on Large-Scale Software Defined Networking (SDN) Testbeds: Approaches and Challenges,” IEEE Communications Surveys Tutorials, vol. 19, no. 2, pp. 891–917, Secondquarter 2017. [426] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat, “B4: Experience with a Globally-deployed Software Defined Wan,” SIGCOMM Comput. Commun. Rev., vol. 43, no. 4, pp. 3–14, Aug. 2013. [427] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, and R. Wattenhofer, “Achieving High Utilization with Software-driven WAN,” SIGCOMM Comput. Commun. Rev., vol. 43, no. 4, pp. 15–26, Aug. 2013. [428] J. Wang, Y. Yan, and L. Dittmann, “Design of energy efficient optical networks with software enabled integrated control plane,” Networks, IET, vol. 4, no. 1, pp. 30–36, 2015. [429] L. Cui, F. R. Yu, and Q. Yan, “When big data meets software-defined networking: SDN for big data and big data for SDN,” IEEE Network, vol. 30, no. 1, pp. 58–65, January 2016. [430] H. Huang, H. Yin, G. Min, H. Jiang, J. Zhang, and Y. Wu, “Data- Driven Information Plane in Software- Defined Networking,” IEEE Communications Magazine, vol. 55, no. 6, pp. 218–224, 2017. [431] T. Hafeez, N. Ahmed, B. Ahmed, and A. W. Malik, “Detection and Mitigation of Congestion in SDN Enabled Data Center Networks: A Survey,” IEEE Access, vol. 6, pp. 1730–1740, 2018. [432] Y. Zhang, P. Chowdhury, M. Tornatore, and B. Mukherjee, “Energy Efficiency in Telecom Optical Networks,” Communications Surveys Tutorials, IEEE, vol. 12, no. 4, pp. 441–458, Fourth 2010. [433] R. Ramaswami, K. N. Sivarajan, and G. H. Sasaki, Optical Networks, 3rd ed. Morgan Kaufmann, 2010. [434] H. Yin, Y. Jiang, C. Lin, Y. Luo, and Y. Liu, “Big data: transforming the design philosophy of future internet,” Network, IEEE, vol. 28, no. 4, pp. 14–19, July 2014. [435] K.-I. Kitayama, A. Hiramatsu, M. Fukui, T. Tsuritani, N. Yamanaka, S. Okamoto, M. Jinno, and M. Koga, “Photonic Network Vision 2020 - Toward Smart Photonic Cloud,” Lightwave Technology, Journal of, vol. 32, no. 16, pp. 2760–2770, Aug 2014. [436] A. S. Thyagaturu, A. Mercian, M. P. McGarry, M. Reisslein, and W. Kellerer, “Software Defined Optical Networks (SDONs): A Comprehensive Survey,” IEEE Communications Surveys Tutorials, vol. 18, no. 4, pp. 2738–2786, Fourthquarter 2016. [437] Y. Yin, L. Liu, R. Proietti, and S. J. B. 
Yoo, “Software Defined Elastic Optical Networks for Cloud Computing,” IEEE Network, vol. 31, no. 1, pp. 4–10, January 2017. [438] A. Nag, M. Tornatore, and B. Mukherjee, “Optical Network Design With Mixed Line Rates and Multiple Modulation Formats,” Journal of Lightwave Technology, vol. 28, no. 4, pp. 466–475, Feb 2010. [439] Y. Ji, J. Zhang, Y. Zhao, H. Li, Q. Yang, C. Ge, Q. Xiong, D. Xue, J. Yu, and S. Qiu, “All Optical Switching Networks With Energy- Efficient Technologies From Components Level to Network Level,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 8, pp. 1600–1614, Aug 2014. [440] X. Zhao, V. Vusirikala, B. Koley, V. Kamalov, and T. Hofmeister, “The prospect of inter-data-center optical networks,” IEEE Communications Magazine, vol. 51, no. 9, pp. 32–38, September 2013. [441] G. Tzimpragos, C. Kachris, I. B. Djordjevic, M. Cvijetic, D. Soudris, and I. Tomkos, “A Survey on FEC Codes for 100 G and Beyond Optical Networks,” IEEE Communications Surveys Tutorials, vol. 18, no. 1, pp. 209–221, Firstquarter 2016. [442] D. M. Marom, P. D. Colbourne, A. D’errico, N. K. Fontaine, Y. Ikuma, R. Proietti, L. Zong, J. M. Rivas- Moscoso, and I. Tomkos, “Survey of photonic switching architectures and technologies in support of spatially and spectrally flexible optical networking [invited],” IEEE/OSA Journal of Optical Communications and Networking, vol. 9, no. 1, pp. 1–26, Jan 2017. [443] X. Yu, M. Tornatore, M. Xia, J. Wang, J. Zhang, Y. Zhao, J. Zhang, and B. Mukherjee, “Migration from fixed grid to flexible grid in optical networks,” IEEE Communications Magazine, vol. 53, no. 2, pp. 34–43, Feb 2015. [444] M. Jinno, H. Takara, B. Kozicki, Y. Tsukishima, Y. Sone, and S. Matsuoka, “Spectrum-efficient and scalable elastic optical path network: architecture, benefits, and enabling technologies,” IEEE Communications Magazine, vol. 47, no. 11, pp. 66–73, November 2009. [445] O. Gerstel, M. Jinno, A. Lord, and S. J. B. Yoo, “Elastic optical networking: a new dawn for the optical layer?” IEEE Communications Magazine, vol. 50, no. 2, pp. s12–s20, February 2012. [446] B. C. Chatterjee, N. Sarma, and E. Oki, “Routing and Spectrum Allocation in Elastic Optical Networks: A Tutorial,” IEEE Communications Surveys Tutorials, vol. 17, no. 3, pp. 1776–1800, thirdquarter 2015. [447] G. Zhang, M. D. Leenheer, A. Morea, and B. Mukherjee, “A Survey on OFDM-Based Elastic Core Optical Networking,” IEEE Communications Surveys Tutorials, vol. 15, no. 1, pp. 65–87, First 2013. [448] A. Klekamp, U. Gebhard, and F. Ilchmann, “Energy and Cost Efficiency of Adaptive and Mixed-Line-Rate IP Over DWDM Networks,” Journal of Lightwave Technology, vol. 30, no. 2, pp. 215–221, Jan 2012. [449] T. E. El-Gorashi, X. Dong, and J. M. Elmirghani, “Green optical orthogonal frequency-division multiplexing networks,” IET Optoelectronics, vol. 8, pp. 137–148(11), June 2014. [450] H. Harai, H. Furukawa, K. Fujikawa, T. Miyazawa, and N. Wada, “Optical Packet and Circuit Integrated Networks and Software Defined Networking Extension,” Journal of Lightwave Technology, vol. 32, no. 16, pp. 2751–2759, Aug 2014. [451] IT center, Intel, “Big Data in the Cloud: Converging Technologies,” Solution Brief, 2015. [452] Project Serengeti: There’s a Virtual Elephant in my Datacenter. (Cited on 2018, May). [Online]. Available: https://octo.vmware.com/project-serengeti-theres-a-virtual-elephant-in-my-datacenter/ [453] A. Iordache, C. Morin, N. Parlavantzas, E. Feller, and P. 
Riteau, “Resilin: Elastic MapReduce over Multiple Clouds,” in 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, May 2013, pp. 261–268. [454] D. Wang and J. Liu, “Optimizing big data processing performance in the public cloud: opportunities and approaches,” Network, IEEE, vol. 29, no. 5, pp. 31–35, September 2015. [455] D. Agrawal, S. Das, and A. El Abbadi, “Big Data and Cloud Computing: Current State and Future Opportunities,” in Proceedings of the 14th International Conference on Extending Database Technology, ser. EDBT/ICDT ’11. New York, NY, USA: ACM, 2011, pp. 530–533. [456] N. C. Luong, P. Wang, D. Niyato, Y. Wen, and Z. Han, “Resource Management in Cloud Networking Using Economic Analysis and Pricing Models: A Survey,” IEEE Communications Surveys Tutorials, vol. 19, no. 2, pp. 954–1001, Secondquarter 2017. [457] Y. Zhao, X. Fei, I. Raicu, and S. Lu, “Opportunities and Challenges in Running Scientific Workflows on the Cloud,” in 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Oct 2011, pp. 455–462. [458] E.-S. Jung and R. Kettimuthu, “Challenges and Opportunities for Data- Intensive Computing in the Cloud,” Computer, vol. 47, no. 12, pp. 82–85, 2014. [459] M. H. Ghahramani, M. Zhou, and C. T. Hon, “Toward cloud computing QoS architecture: analysis of cloud systems and cloud services,” IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 1, pp. 6–18, Jan 2017. [460] Latency is Everywhere and it Costs you Sales - How to Crush it. (Cited on 2017, Dec). [Online]. Available: http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it [461] R. Kohavi, R. M. Henne, and D. Sommerfield, “Practical Guide to Controlled Experiments on the Web: Listen to Your Customers Not to the Hippo,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’07. New York, NY, USA: ACM, 2007, pp. 959–967. [462] S. S. Krishnan and R. K. Sitaraman, “Video Stream Quality Impacts Viewer Behavior: Inferring Causality Using Quasi-experimental Designs,” in Proceedings of the 2012 Internet Measurement Conference, ser. IMC ’12. New York, NY, USA: ACM, 2012, pp. 211–224. [463] C. Colman-Meixner, C. Develder, M. Tornatore, and B. Mukherjee, “A Survey on Resiliency Techniques in Cloud Computing Infrastructures and Applications,” IEEE Communications Surveys Tutorials, vol. 18, no. 3, pp. 2244–2281, thirdquarter 2016. [464] A. Vishwanath, F. Jalali, K. Hinton, T. Alpcan, R. W. A. Ayre, and R. S. Tucker, “Energy Consumption Comparison of Interactive Cloud-Based and Local Applications,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 4, pp. 616–626, April 2015. [465] J. Baliga, R. W. A. Ayre, K. Hinton, and R. S. Tucker, “Green Cloud Computing: Balancing Energy in Processing, Storage, and Transport,” Proceedings of the IEEE, vol. 99, no. 1, pp. 149–167, Jan 2011. [466] H. Zhang, Q. Zhang, Z. Zhou, X. Du, W. Yu, and M. Guizani, “Processing geo-dispersed big data in an advanced mapreduce framework,” Network, IEEE, vol. 29, no. 5, pp. 24–30, September 2015. [467] A. Vulimiri, C. Curino, P. B. Godfrey, T. Jungblut, J. Padhye, and G. Varghese, “Global Analytics in the Face of Bandwidth and Regulatory Constraints,” in Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’15. Berkeley, CA, USA: USENIX Association, 2015, pp. 323– 336. [468] F. Idzikowski, L. Chiaraviglio, A. Cianfrani, J. L. Vizcaíno, M. Polverini, and Y. 
Ye, “A Survey on Energy- Aware Design and Operation of Core Networks,” IEEE Communications Surveys Tutorials, vol. 18, no. 2, pp. 1453–1499, Secondquarter 2016. [469] W. V. Heddeghem, B. Lannoo, D. Colle, M. Pickavet, and P. Demeester, “A Quantitative Survey of the Power Saving Potential in IP-Over-WDM Backbone Networks,” IEEE Communications Surveys Tutorials, vol. 18, no. 1, pp. 706–731, Firstquarter 2016. [470] M. N. Dharmaweera, R. Parthiban, and Y. A. ¸Sekercio˘glu, “Toward a Power-Efficient Backbone Network: The State of Research,” IEEE Communications Surveys Tutorials, vol. 17, no. 1, pp. 198–227, Firstquarter 2015. [471] R. S. Tucker, “Green Optical Communications—Part I: Energy Limitations in Transport,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 17, no. 2, pp. 245–260, March 2011. [472] R. S. Tucker, “Green Optical Communications—Part II: Energy Limitations in Networks,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 17, no. 2, pp. 261–274, March 2011. [473] W. V. Heddeghem, M. D. Groote, W. Vereecken, D. Colle, M. Pickavet, and P. Demeester, “Energy- efficiency in telecommunications networks: Link-by-link versus end-to-end grooming,” in 2010 14th Conference on Optical Network Design and Modeling (ONDM), Feb 2010, pp. 1–6. [474] A. Fehske, G. Fettweis, J. Malmodin, and G. Biczok, “The global footprint of mobile communications: The ecological and economic perspective,” IEEE Communications Magazine, vol. 49, no. 8, pp. 55–62, August 2011. [475] H. A. Alharbi, M. Musa, T. E. H. El-Gorashi, and J. M. H. Elmirghani, “Real-Time Emissions of Telecom Core Networks,” in 2018 20th International Conference on Transparent Optical Networks (ICTON), July 2018, pp. 1–4. [476] D. Feng, C. Jiang, G. Lim, L. J. Cimini, G. Feng, and G. Y. Li, “A survey of energy-efficient wireless communications,” IEEE Communications Surveys Tutorials, vol. 15, no. 1, pp. 167–178, First 2013. [477] L. Budzisz, F. Ganji, G. Rizzo, M. A. Marsan, M. Meo, Y. Zhang, G. Koutitas, L. Tassiulas, S. Lambert, B. Lannoo, M. Pickavet, A. Conte, I. Haratcherev, and A. Wolisz, “Dynamic Resource Provisioning for Energy Efficiency in Wireless Access Networks: A Survey and an Outlook,” IEEE Communications Surveys Tutorials, vol. 16, no. 4, pp. 2259–2285, Fourthquarter 2014. [478] M. Ismail,W. Zhuang, E. Serpedin, and K. Qaraqe, “A Survey on Green Mobile Networking: From The Perspectives of Network Operators and Mobile Users,” IEEE Communications Surveys Tutorials, vol. 17, no. 3, pp. 1535–1556, thirdquarter 2015. [479] K. Gomez, R. Riggio, T. Rasheed, and F. Granelli, “Analysing the energy consumption behaviour of WiFi networks,” in 2011 IEEE Online Conference on Green Communications, Sept 2011, pp. 98–104. [480] S. Xiao, X. Zhou, D. Feng, Y. Yuan-Wu, G. Y. Li, and W. Guo, “Energy-Efficient Mobile Association in Heterogeneous Networks With Device-to-Device Communications,” IEEE Transactions on Wireless Communications, vol. 15, no. 8, pp. 5260–5271, Aug 2016. [481] A. Abrol and R. K. Jha, “Power Optimization in 5G Networks: A Step Towards GrEEn Communication,” IEEE Access, vol. 4, pp. 1355–1374, 2016. [482] L. Valcarenghi, D. P. Van, P. G. Raponi, P. Castoldi, D. R. Campelo, S. Wong, S. Yen, L. G. Kazovsky, and S. Yamashita, “Energy efficiency in passive optical networks: where, when, and how?” IEEE Network, vol. 26, no. 6, pp. 61–68, November 2012. [483] J. Kani, S. Shimazu, N. Yoshimoto, and H. 
Hadama, “Energy-efficient optical access networks: issues and technologies,” IEEE Communications Magazine, vol. 51, no. 2, pp. S22–S26, February 2013. [484] P. Vetter, D. Suvakovic, H. Chow, P. Anthapadmanabhan, K. Kanonakis, K. Lee, F. Saliou, X. Yin, and B. Lannoo, “Energy-efficiency improvements for optical access,” IEEE Communications Magazine, vol. 52, no. 4, pp. 136–144, April 2014. [485] B. Skubic, E. I. de Betou, T. Ayhan, and S. Dahlfort, “Energy-efficient next-generation optical access networks,” IEEE Communications Magazine, vol. 50, no. 1, pp. 122–127, January 2012. [486] E. Goma, M. Canini, A. Lopez, N. Laoutaris, D. Kostic, P. Rodriguez, R. Stanojevic, and P. Yague, “Insomnia in the Access (or How to Curb Access Network Related Energy Consumption),” Proceedings of the ACM SIGCOMM 2011 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2011. [487] A. S. Gowda, A. R. Dhaini, L. G. Kazovsky, H. Yang, S. T. Abraha, and A. Ng’oma, “Towards Green Optical/Wireless In-Building Networks: Radio-Over-Fiber,” Journal of Lightwave Technology, vol. 32, no. 20, pp. 3545–3556, Oct 2014. [488] B. Kantarci and H. T. Mouftah, “Energy efficiency in the extendedreach fiber-wireless access networks,” IEEE Network, vol. 26, no. 2, pp. 28–35, March 2012. [489] M. Gupta and S. Singh, “Greening of the Internet,” in Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM ’03. New York, NY, USA: ACM, 2003, pp. 19–26. [490] J. C. C. Restrepo, C. G. Gruber, and C. M. Machuca, “Energy Profile Aware Routing,” in 2009 IEEE International Conference on Communications Workshops, June 2009, pp. 1–5. [491] S. Nedevschi, L. Popa, G. Iannaccone, S. Ratnasamy, and D. Wetherall, “Reducing Network Energy Consumption via Sleeping and Rateadaptation,” in Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, ser. NSDI’08. Berkeley, CA, USA: USENIX Association, 2008, pp. 323– 336. [492] G. Shen and R. Tucker, “Energy-Minimized Design for IP Over WDM Networks,” Optical Communications and Networking, IEEE/OSA Journal of, vol. 1, no. 1, pp. 176–186, June 2009. [493] X. Dong, T. E. H. El-Gorashi, and J. M. H. Elmirghani, “On the Energy Efficiency of Physical Topology Design for IP Over WDM Networks,” Journal of Lightwave Technology, vol. 30, no. 12, pp. 1931–1942, June 2012. [494] S. Zhang, D. Shen, and C. K. Chan, “Energy-Efficient Traffic Grooming in WDM Networks With Scheduled Time Traffic,” Journal of Lightwave Technology, vol. 29, no. 17, pp. 2577–2584, Sept 2011. [495] Z. H. Nasralla, T. E. H. El-Gorashi, M. O. I. Musa, and J. M. H. Elmirghani, “Energy-Efficient Traffic Scheduling in IP over WDM Networks,” in 2015 9th International Conference on Next Generation Mobile Applications, Services and Technologies, Sept 2015, pp. 161–164. [496] Z. H. Nasralla and T. E. H. El-Gorashi and M. O. I. Musa and J. M. H. Elmirghani, “Routing post-disaster traffic floods in optical core networks,” in 2016 International Conference on Optical Network Design and Modeling (ONDM), May 2016, pp. 1–5. [497] Z. H. Nasralla, M. O. I. Musa, T. E. H. El-Gorashi, and J. M. H. Elmirghani, “Routing post-disaster traffic floods heuristics,” in 2016 18th International Conference on Transparent Optical Networks (ICTON), July 2016, pp. 1–4. [498] M. O. I. Musa, T. E. H. El-Gorashi, and J. M. H. 
Elmirghani, “Network coding for energy efficiency in bypass IP/WDM networks,” in 2016 18th International Conference on Transparent Optical Networks (ICTON), July 2016, pp. 1–3. [499] M. O. I. Musa and T. E. H. El-Gorashi and J. M. H. Elmirghani, “Energy efficient core networks using network coding,” in 2015 17th International Conference on Transparent Optical Networks (ICTON), July 2015, pp. 1–4. [500] T. E. H. El-Gorashi, X. Dong, A. Lawey, and J. M. H. Elmirghani, “Core network physical topology design for energy efficiency and resilience,” in 2013 15th International Conference on Transparent Optical Networks (ICTON), June 2013, pp. 1–7. [501] M. Musa, T. Elgorashi, and J. Elmirghani, “Energy efficient survivable IP-over-WDM networks with network coding,” IEEE/OSA Journal of Optical Communications and Networking, vol. 9, no. 3, pp. 207–217, March 2017. [502] M. Musa and T. Elgorashi and J. Elmirghani, “Bounds for energy efficient survivable IP over WDM networks with network coding,” IEEE/OSA Journal of Optical Communications and Networking, vol. 10, no. 5, pp. 471–481, May 2018. [503] Y. Li, L. Zhu, S. K. Bose, and G. Shen, “Energy-Saving in IP Over WDM Networks by Putting Protection Router Cards to Sleep,” Journal of Lightwave Technology, vol. 36, no. 14, pp. 3003–3017, July 2018. [504] J. M. H. Elmirghani, L. Nonde, A. Q. Lawey, T. E. H. El-Gorashi, M. O. I. Musa, X. Dong, K. Hinton, and T. Klein, “Energy efficiency measures for future core networks,” in 2017 Optical Fiber Communications Conference and Exhibition (OFC), March 2017, pp. 1–3. [505] J. M. H. Elmirghani, T. Klein, K. Hinton, L. Nonde, A. Q. Lawey, T. E. H. El-Gorashi, M. O. I. Musa, and X. Dong, “GreenTouch GreenMeter core network energy-efficiency improvement measures and optimization,” IEEE/OSA Journal of Optical Communications and Networking, vol. 10, no. 2, pp. A250–A269, Feb 2018. [506] M. O. I. Musa, T. El-Gorashi, and J. M. H. Elmirghani, “Bounds on GreenTouch GreenMeter Network Energy Efficiency,” Journal of Lightwave Technology, pp. 1–1, 2018. [507] X. Dong, T. El-Gorashi, and J. Elmirghani, “IP Over WDM Networks Employing Renewable Energy Sources,” Lightwave Technology, Journal of, vol. 29, no. 1, pp. 3–14, Jan 2011. [508] X. Dong, T. El-Gorashi, and J. M. H. Elmirghani, “Green IP over WDM Networks: Solar and Wind Renewable Sources and Data Centres,” in 2011 IEEE Global Telecommunications Conference – GLOBECOM 2011, Dec 2011, pp. 1–6. [509] G. Shen, Y. Lui, and S. K. Bose, ““Follow the Sun, Follow the Wind” Lightpath Virtual Topology Reconfiguration in IP Over WDM Network,” Journal of Lightwave Technology, vol. 32, no. 11, pp. 2094– 2105, June 2014. [510] M. Gattulli, M. Tornatore, R. Fiandra, and A. Pattavina, “Low- Emissions Routing for Cloud Computing in IP-over-WDM Networks with Data Centers,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 28–38, January 2014. [511] A. Q. Lawey, T. E. H. El-Gorashi, and J. M. H. Elmirghani, “Renewable energy in distributed energy efficient content delivery clouds,” in 2015 IEEE International Conference on Communications (ICC), June 2015, pp. 128–134. [512] S. K. Dey and A. Adhya, “Delay-aware green service migration schemes for data center traffic,” IEEE/OSA Journal of Optical Communications and Networking, vol. 8, no. 12, pp. 962–975, December 2016. [513] L. Nonde, T. E. H. Elgorashi, and J. M. H. Elmirgahni, “Virtual Network Embedding Employing Renewable Energy Sources,” in 2016 IEEE Global Communications Conference (GLOBECOM), Dec 2016, pp. 1–6. 

Sanaa Hamid Mohamed received the B.Sc. degree (honors) in electrical and electronic engineering from the University of Khartoum, Sudan, in 2009 and the M.Sc. degree in electrical engineering from the American University of Sharjah (AUS), United Arab Emirates, in 2013. She received a full-time graduate teaching assistantship in the MSELE program at AUS in 2011, and a Doctoral Training Award (DTA) from the UK Engineering and Physical Sciences Research Council (EPSRC) to fund her PhD studies at the University of Leeds in 2015. She is currently a PhD student at the Institute of Communication and Power Networks, University of Leeds, UK. She held research and teaching positions at the University of Khartoum, AUS, Sudan University of Science and Technology, and Khalifa University between 2009 and 2013. She has been an IEEE member since 2008 and is a member of the IEEE Communications, Photonics, Computer, Cloud Computing, Software Defined Networks, and Sustainable ICT societies. Her research interests include wireless communications, optical communications, optical networking, software-defined networking, data center networking, big data analytics, and cloud and fog computing.

Taisir E. H. El-Gorashi received the B.S. degree (first-class Hons.) in electrical and electronic engineering from the University of Khartoum, Khartoum, Sudan, in 2004, the M.Sc. degree (with distinction) in photonic and communication systems from the University of Wales, Swansea, UK, in 2005, and the PhD degree in optical networking from the University of Leeds, Leeds, UK, in 2010. She is currently a Lecturer in optical networks in the School of Electrical and Electronic Engineering, University of Leeds. Previously, she held a Postdoctoral Research post at the University of Leeds (2010–2014), where she focused on the energy efficiency of optical networks, investigating the use of renewable energy in core networks, green IP over WDM networks with data centers, energy-efficient physical topology design, the energy efficiency of content distribution networks, distributed cloud computing, network virtualization, and big data. In 2012, she was a BT Research Fellow, where she developed energy-efficient hybrid wireless-optical broadband access networks and explored the dynamics of TV viewing behavior and program popularity. The energy efficiency techniques developed during her postdoctoral research contributed 3 of the 8 carefully chosen core network energy efficiency improvement measures recommended by the GreenTouch consortium for every operator network worldwide. Her work has led to several invited talks at GreenTouch, Bell Labs, the Optical Network Design and Modelling conference, the Optical Fiber Communications conference, the International Conference on Computer Communications, the EU Future Internet Assembly, the IEEE Sustainable ICT Summit, and the IEEE 5G World Forum, and to collaborations with Nokia and Huawei.

Jaafar M. H. Elmirghani (M’92–SM’99) is the Director of the Institute of Communication and Power Networks within the School of Electronic and Electrical Engineering, University of Leeds, UK. He joined Leeds in 2007; prior to that (2000–2007), as Chair in Optical Communications at the University of Wales Swansea, he founded, developed, and directed the Institute of Advanced Telecommunications (IAT) and the Technium Digital (TD), a technology incubator/spin-off hub. He has provided outstanding leadership in a number of large research projects at the IAT and TD. He received the Ph.D. in the synchronization of optical systems and optical receiver design from the University of Huddersfield, UK, in 1994 and the DSc in Communication Systems and Networks from the University of Leeds, UK, in 2014. He has co-authored Photonic Switching Technology: Systems and Networks (Wiley) and has published over 500 papers. His research interests are in optical systems and networks. Prof. Elmirghani is a Fellow of the IET, a Fellow of the Institute of Physics, and a Senior Member of the IEEE. He was Chairman of the IEEE ComSoc Transmission, Access and Optical Systems Technical Committee, Chairman of the IEEE ComSoc Signal Processing and Communications Electronics Technical Committee, and an editor of IEEE Communications Magazine. He was founding Chair of the Advanced Signal Processing for Communication Symposium, which started at IEEE GLOBECOM’99 and has continued since at every ICC and GLOBECOM. Prof. Elmirghani was also founding Chair of the first IEEE ICC/GLOBECOM optical symposium at GLOBECOM’00, the Future Photonic Network Technologies, Architectures and Protocols Symposium, which continues to date under different names. He was the founding Chair of the first Green Track at ICC/GLOBECOM at GLOBECOM 2011, and since 2012 has been Chair of the IEEE Sustainable ICT Initiative, a pan-IEEE Societies initiative responsible for green and sustainable ICT activities across the IEEE, within the IEEE Technical Activities Board (TAB) Future Directions Committee (FDC) and the IEEE Communications Society. He has served on the technical program committees of 38 IEEE ICC/GLOBECOM conferences between 1995 and 2019, including 18 times as Symposium Chair. He received the IEEE Communications Society Hal Sobol Award and the IEEE ComSoc Chapter Achievement Award for excellence in chapter activities (both in 2005), the University of Wales Swansea Outstanding Research Achievement Award in 2006, the IEEE Communications Society Signal Processing and Communication Electronics Outstanding Service Award in 2009, a best paper award at IEEE ICC 2013, and the IEEE ComSoc Transmission, Access and Optical Systems Outstanding Service Award in 2015 in recognition of “Leadership and Contributions to the Area of Green Communications”. He also received the GreenTouch 1000x Award in 2015 for “pioneering research contributions to the field of energy efficiency in telecommunications”, the 2016 IET Optoelectronics Premium Award, and, shared with 6 GreenTouch innovators, the 2016 Edison Award in the “Collective Disruption” category for their work on the GreenMeter, an international competition; these awards are clear evidence of his seminal contributions to green communications, which have a lasting impact on the environment and society. He is currently an editor of IET Optoelectronics, the Journal of Optical Communications, IEEE Communications Surveys and Tutorials, and the IEEE Journal on Selected Areas in Communications Series on Green Communications and Networking.
He was Co-Chair of the GreenTouch Wired, Core and Access Networks Working Group, an adviser to the Commonwealth Scholarship Commission, a member of the Royal Society International Joint Projects Panel, and a member of the Engineering and Physical Sciences Research Council (EPSRC) College. He was Principal Investigator (PI) of the £6m EPSRC INTelligent Energy awaRe NETworks (INTERNET) Programme Grant, 2010–2016, and is currently PI of the £6.6m EPSRC Terabit Bidirectional Multi-user Optical Wireless System (TOWS) for 6G LiFi Programme Grant, 2019–2024. He has been awarded in excess of £30 million in grants to date from EPSRC, the EU, and industry, and has held prestigious fellowships funded by the Royal Society and by BT. He was an IEEE ComSoc Distinguished Lecturer from 2013 to 2016.