Effectiveness of Cloud Services for Scientific and VoD Applications
University of Massachusetts Amherst
ScholarWorks@UMass Amherst

Doctoral Dissertations
Dissertations and Theses

March 2015

Effectiveness of Cloud Services for Scientific and VoD Applications

Dilip Kumar Krishnappa
University of Massachusetts Amherst

Follow this and additional works at: https://scholarworks.umass.edu/dissertations_2
Part of the OS and Networks Commons

Recommended Citation
Krishnappa, Dilip Kumar, "Effectiveness of Cloud Services for Scientific and VoD Applications" (2015). Doctoral Dissertations. 310.
https://doi.org/10.7275/6377319.0 https://scholarworks.umass.edu/dissertations_2/310

This Open Access Dissertation is brought to you for free and open access by the Dissertations and Theses at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected].

EFFECTIVENESS OF CLOUD SERVICES FOR SCIENTIFIC AND VOD APPLICATIONS

A Dissertation Presented by
DILIP KUMAR KRISHNAPPA

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY

February 2015

Electrical and Computer Engineering

© Copyright by Dilip Kumar Krishnappa 2015
All Rights Reserved

EFFECTIVENESS OF CLOUD SERVICES FOR SCIENTIFIC AND VOD APPLICATIONS

A Dissertation Presented by
DILIP KUMAR KRISHNAPPA

Approved as to style and content by:
Michael Zink, Chair
Lixin Gao, Member
David Irwin, Member
Prashant Shenoy, Member
Christopher V. Hollot, Department Chair
Electrical and Computer Engineering

To my wonderful parents, caring brother and loving sister.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor Prof. Michael Zink. It has been an honor to be his first PhD student. For the past 5 years, he has been a constant support and guide in all my academic decisions.
His mentoring and guidance have helped me enjoy my stay at UMass and provided the confidence to finish my PhD.

I would like to thank my committee members Prof. David Irwin, Prof. Lixin Gao and Prof. Prashant Shenoy for presiding over my dissertation. I would like to thank Prof. Ramesh Sitaraman from UMass Amherst, Prof. Carsten Griwodz and Prof. Pål Halvorsen from University of Oslo for giving me an opportunity to work with them during parts of my dissertation. I would like to thank all the reviewers and shepherds of all the conference and journal articles I have submitted during my PhD. I would like to thank my colleagues Cong Wang, Divyashri Bhat, Samamon Khemmarat and Eric Adams for working closely with me on projects related to my dissertation.

My time at UMass was made enjoyable in large part due to the many friends that became a part of my life during these past 5 years. A special thanks to Ravi Vantipalli, Vikram Desai, Abhinna Jain, Satish Inampudi, Chilakam Madhusudan, Siddhartha Nalluri, Srikar Damaraju and Pavan Prakash. Their support during all the lows and highs of being a PhD student has helped me through the past 5 years and provided moments to cherish for life.

Finally, I am obliged to my grandparents, parents, my brother Amith Kumar, my sister Shruthi, my cousins Chethan and Chaitra, who have influenced me the most and were always there to support and encourage me. They have been my source of strength. I dedicate this thesis to them.

ABSTRACT

EFFECTIVENESS OF CLOUD SERVICES FOR SCIENTIFIC AND VOD APPLICATIONS

FEBRUARY 2015

DILIP KUMAR KRISHNAPPA
B.E., VISVESWARAIAH TECHNOLOGICAL UNIVERSITY, INDIA
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Michael Zink

Cloud platforms have emerged as the primary data warehouse for a variety of applications, such as DropBox, iCloud, Google Music, etc. These applications allow users to store data in the cloud and access it from anywhere in the world.
Commercial clouds are also well suited for providing high-end servers for rent to execute applications that require computation resources sporadically. Cloud users only pay for the time they actually use the hardware and the amount of data that is transmitted to and from the cloud, which has the potential to be more cost effective than purchasing, hosting, and maintaining dedicated hardware. In this dissertation, we look into the efficiency of the cloud Infrastructure-as-a-Service (IaaS) model for two real-time, high-bandwidth applications: a scientific application of short-term weather forecasting and Video on Demand (VoD) services.

Short-term Weather Forecasting

Since today's weather forecasts only cover large regions every few hours, their use in severe weather is limited. Dedicating high-end servers for executing scientific applications that run intermittently, such as severe weather detection or generalized weather forecasting, wastes resources. In this dissertation, we present CloudCast, an application that provides short-term weather forecasts based on the user's current location. Since severe weather is rare, CloudCast leverages pay-as-you-go cloud platforms to eliminate dedicated computing infrastructure. CloudCast has two components: 1) an architecture linking weather radars to cloud resources, and 2) a Nowcasting algorithm for generating accurate short-term weather forecasts.

Our hypothesis is that the connectivity and diversity of current cloud platforms mitigate the impact of staging data and computation latency for Nowcasting. In evaluating our hypothesis, we study CloudCast's design space, which requires significant data staging to the cloud. Short-term weather forecasting has both computational and network requirements.
While it executes rarely, whenever severe weather approaches, it benefits from an IaaS model; however, since its results are time-critical, sufficient bandwidth must be available to transmit radar data to cloud platforms before it becomes stale. In this dissertation, we analyze the networking capabilities of multiple commercial (Amazon's EC2 and Rackspace) and research (GENICloud and ExoGENI) cloud platforms for our CloudCast application. We conduct network capacity measurements between radar sites and cloud platforms throughout the country. Our results indicate that research cloud testbeds perform the best for both serial and parallel data transfer, with average throughputs of 110.22 Mbps and 17.2 Mbps, respectively. We also found that the cloud services perform better in the distributed data transfer case, where a subset of nodes transmit data in parallel to a cloud instance. The results also indicate that serial transfers achieve tolerable throughput, while parallel transfers represent a bottleneck for real-time weather forecasting.

In addition to network capacity analysis, we also analyze the computation capability of cloud services for our CloudCast application. We measure the cost and computation time to generate 15-minute forecasts for real weather data on cloud services. Also, we analyze the forecast accuracy of executing Nowcasting in the cloud using canned weather data and show high accuracy for ten minutes in the future. Finally, we execute CloudCast live using an on-campus radar, and show that it delivers a 15-minute Nowcast to a mobile client in less than 2 minutes after data sampling started, which is quick enough for users and emergency management personnel to take desired action.

VoD Services

Advances in Internet bandwidth connectivity to homes, coupled with the streaming of TV shows and other media content on the Internet, have led to a huge increase in VoD services in the past decade.
The advent of sites such as YouTube, Hulu, Netflix, DailyMotion and various other video content providers makes it feasible for users to share their own videos on these VoD websites or watch their favorite movies or TV shows online. The popularity of VoD services has motivated U.S. national TV channels such as ABC, NBC and CBS to offer their prime time programs on the Internet for users to catch up on their shows via streaming on their websites. In 2012, Internet video traffic made up 57% of the total network traffic in the Internet. This shows that, with the popularity of VoD services, the amount of Internet traffic due to video increases tremendously. However, the popularity of videos offered in VoD services follows a long-tail distribution, where only a few videos are highly popular and most of the videos offered are requested rarely. This long-tail popularity distribution in VoD services is especially significant when content is user generated. Analysis of a YouTube trace shows that only 23% of the YouTube videos are requested more than once over a period of 24 hours. Consequently, the 77% of videos that are requested only once reduce the efficiency of caches deployed all around the world to serve the videos requested.

Video streaming in VoD services such as Netflix, Hulu, YouTube, etc., has moved from traditional streaming of a fixed bit rate video file to the client to Adaptive Bit Rate (ABR) streaming. ABR streaming is realized through video streaming over HTTP, where the source content is segmented into small multi-second (usually between 2 and 10 seconds) segments and each segment is encoded at multiple bit rates. The process of creating multiple bit rate versions of a video is called transcoding. Once each video is transcoded into multiple bit rates, ABR streaming allows the client to choose an appropriate quality version for each video segment based on the available bandwidth between the source and client.
However, the long-tail popularity distribution of VoD services shows that transcoding all videos provided by VoD services to all the supported bit rates wastes transcoding resources and storage space. In this dissertation, we provide solutions to the issues posed by the long-tail popularity distribution of VoD services. We look into the feasibility of using cloud services to reduce the traffic in the Internet by reducing the number of video requests from the long tail of the VoD popularity distribution. In particular, this dissertation focuses on the YouTube video service, the world's most popular Internet service that hosts user-generated videos.
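
As a rough illustration of the per-segment quality choice in ABR streaming described above, the following Python sketch picks, for each segment, the highest bit rate that fits the measured bandwidth. It is only a minimal sketch under stated assumptions, not the selection logic of any particular player or of this dissertation: the function names `pick_bitrate` and `plan_session`, the 0.8 safety margin, and the example bit rate ladder are all hypothetical.

```python
from typing import List, Sequence


def pick_bitrate(available_kbps: float,
                 ladder_kbps: Sequence[int],
                 safety: float = 0.8) -> int:
    """Return the highest bit rate that fits the bandwidth budget.

    The budget is the measured bandwidth scaled by a safety margin
    (an assumed value, not taken from the source). If no bit rate
    fits, fall back to the lowest one so playback can continue.
    """
    budget = available_kbps * safety
    candidates = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)


def plan_session(bandwidth_trace_kbps: Sequence[float],
                 ladder_kbps: Sequence[int]) -> List[int]:
    """Choose one bit rate per segment, one bandwidth sample per segment."""
    return [pick_bitrate(bw, ladder_kbps) for bw in bandwidth_trace_kbps]


if __name__ == "__main__":
    # Hypothetical ladder of transcoded versions, in kbps.
    ladder = [235, 750, 1750, 3000, 5800]
    # Hypothetical bandwidth measured before each segment, in kbps.
    trace = [3000, 100, 10000]
    print(plan_session(trace, ladder))
```

Only the bit rates a client actually selects need to exist for a given segment, which is why transcoding every rarely requested video to every bit rate in the ladder can waste resources.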