Leveraging Cloud-Based Predictive Analytics to Strengthen Audience Engagement
Total Page:16
File Type:pdf, Size:1020Kb
TECHNICAL PAPER Leveraging Cloud-Based Predictive Analytics to Strengthen Audience Engagement By U. Shakeel and M. Limcaco Abstract appointed time for the airing of our favorite TV show. To grow their business and increase their audience, content People now expect to watch the programming they distributors must understand the viewing habits and interests want on their own schedule, which is driving more and of content consumers. This typically requires solving tough more media companies to consider providing their own computational problems, such as rapidly processing vast over-the-top service. These services use the internet amounts of raw data from Web sites, social media, devices, for delivery, which introduces potential quality issues catalogs, and back-channel sources. For- that are out of the control of the media tunately, today’s content distributors can owner or distributer. To mitigate these take advantage of the scalability, cost risks, many media distributors invest effectiveness, and pay-as-you-go model This raises the heavily in solutions that detect play- of the cloud to address these challenges. question of how to back issues; these solutions require In this paper, we show content distribu- build the next- large amounts of computational capac- ity to process massive, raw datasets to tors how to use cloud technologies to build generation media predictive analytic solutions. We exam- provide real-time course correction. In ine architectural patterns for optimiz- deliv ery platform this way, distributors can provide more ing media delivery, and we discuss how that not only delivers reliable content that caters to the view- to assess the overall consumer experience reli able content ing habits of their audience. based on representative data sources. but also ensures This raises the question of how to Finally, we present concrete implementa- that the content is build the next-generation media deliv- ery platform that not only delivers reli- tions of cloud-based machine learning ser- experienced the way vices and show how to use the services to able content but also ensures that the profle audience demand, to cue content the con tent creator content is experienced the way the con- recommendations, and to prioritize the intended. tent creator intended. One approach is delivery of related media. to have the delivery platform predict when events, such as network conges- Introduction tion or low- quality streams, will occur in the future, An abundance of technical advancements has and subsequently guide consumers in the right direc- expanded the range of options for media con- tion. Powerful tools toward this goal include using data A sumers. Today’s consumers can choose to have generated by consumers from their interaction with 3-D, 4K, HDR, or even 8K content displayed content, social media, and multiple screens in conjunc- on a variety of sophisticated devices. Given the public’s tion with predictive modeling, machine learning, and appetite for these high-end devices, media creators are real-time analytics. According to Nielsen, social media constantly under pressure to increase resolution and activity drives higher broadcast TV ratings for 48% of quality to compete in an ever-expanding war of content shows;1 in a similar survey by Netflix, over 75% of what choices. people watch is based on Netflix’s recommendations.2 In addition to the changes in display technologies, In terms of audience engagement, there are two basic on-demand content and streaming media delivery have categories: changed the habits of content viewers. Gone are the ■■ Content experience—Using predictions and analytics on days when we used to circle around the TV set at an viewers’ viewing habits, player network logs, and data- sets to quickly analyze existing issues or predict future issues. The predictions can be used to minimize or even Digital Object Identifer 10.5594/JMI.2016.2602121 Date of publication: XXXX eliminate a poor customer experience. 1545-0279/16©2016SMPTE XXXX 2016 | SMPTE Motion Imaging Journal 1 jmi-shakeel-2602121.indd 1 20/09/16 1:30 PM ■■ Content relevance—Using predictions and analytics on can easily scale up and down based on demand. This historical and some real-time datasets to detect and elasticity means that as requirements change, you can recommend relevant content and personalize content easily resize your environment (horizontally or vertically) and ads, thereby improving the experience for con- on AWS to meet your needs without having to wait for tent selection. additional hardware or overinvest to provision for peak capacity. This scalability is especially important for mis- Audience Engagement Signals sion-critical applications. In contrast, system designers To build a next-generation media delivery platform and for traditional infrastructures have no choice but to over- deliver a better customer experience, content distribu- provision because systems must be able to handle surges tors can capture, fuse, and synthesize background sig- in data volumes due to increases in business demand. nals (noise) to create models to analyze, for both batch In addition, the dynamic computing resources of and real-time data. We can leverage both transactional AWS run on a world-class infrastructure, which is data (from user interactions such as searching, playing, available across the 11 different geographic regions that watching/listening, and contacting sales/support) to AWS supports. The AWS platform offers several scal- behavioral activities (such as sharing, tagging, liking, able services, such as Amazon Simple Storage Service reviewing the content, and so on) to build a prediction (Amazon S3) and AWS Data Pipeline. The scalability, model. The analysis can be descriptive (aggregation, elasticity, and global footprint of the AWS platform retrospective), predictive (statistical, machine learn- make it an extremely good fit for solving big data prob- ing), or prescriptive (what should we do about it) based lems. You can read about how many AWS customers on technology choices. In the remainder of this paper, have implemented successful big data analytics work- we focus on descriptive and predictive analytics, espe- loads on the AWS Case Studies & Customer Success cially in the context of content relevance. Stories, Powered by the AWS Cloud Web page. AQ 1 Machine Learning for Predictive Analytics Technology Machine learning (ML) is a broad area of tools and Relevant Products and Solutions techniques that can help you use historical data to make The technology spectrum in this space can be catego- better business decisions. ML algorithms discover pat- rized into three main areas: terns in data and construct predictive models using ■■ Desktop-driven data science tooling (including these patterns, allowing you to use the models to make Microsoft Excel, KNIME, IBM SPSS Statistics, predictions from future data. For example, you could RapidMiner, R language and environment, Weka, use ML to predict whether customers will select a title MATLAB, and Octave) to view based on data such as their viewing history, what ■■ Server-side big data platforms (several products within other users in their same demographic have watched, the Hadoop platform, such as Apache Mahout, Spark and even whom they follow on social media platforms. MLlib, Oxdata, GraphLab, R+ Hadoop, Radoop, You then can use the predictions to identify which cus- Apache Hama, Apache Giraph, Apache HBase Kiji, tomers are most likely to respond to your personalized, and BigML) promotional e-mail. ■■ Scoring/prediction deployment services (Zementis ADAPA, and Orynx). Benefits of the AWS Cloud Many of these options (Prediction IO, Mortar, for Predictive Analytics BigML) are available in the AWS Marketplace as well. Amazon Web Services (AWS) offers a comprehensive, AWS Marketplace allows users to launch Amazon EC2 end-to-end portfolio of cloud computing services that you can use to analyze your consumers’ interactions. AWS performs high-volume data processing at scale Single Node Multi Node and at a fraction of the cost compared with traditional Visualization data analytics infrastructure solutions. It is important to & understand the scale of such a dataset. For example, as Analytics R KNIME GraphLab of 2011, Netflix has been managing 20 billion requests Octave WEKA Mahout per month for millions of consumers across more than Matlab Python Kits Spark Mllib Excel 60 geographies.3 Netflix leverages tens of thousands of H2O EC2 virtual instances on demand and terminating the RDBMS instances when the work is complete. SAN/NAS HBASE Analyzing large datasets requires significant compute DAS HDFS capacity that can vary in size based on the amount of Storage input data and the analysis required. This characteristic of big data workloads is ideally suited to the pay-as-you- go cloud-computing model of AWS, where applications FIGURE 1. Relevant analytics technologies. 2 SMPTE Motion Imaging Journal | XXXX 2016 jmi-shakeel-2602121.indd 2 20/09/16 1:30 PM Hot Cold Storage Amazon EBS Amazon S3 Amazon Glacier Processing Amazon EC2 Amazon EMR AWS Data Pipeline Ingest Amazon Kinesis AWS Storage AWS Import/Export Gateway Database DynamoDB Amazon Redshift Amazon RDS FIGURE 2. Relevant services: AWS. instances or Amazon EMR workloads with software data object store as well as an ideal storage solution for licensing included in the hourly pricing model, so you digital content (source and processed content files). simply pay a prorated license fee for the time you use Amazon DynamoDB6 is a fast, fully managed NoSQL the application. You can choose appropriate products database service that makes it simple and cost-effective specific to the use case, depending on whether you need to store and retrieve any amount of data or serve any a single node job or a distributed, clustered job. level of request traffic. DynamoDB has provisioned guaranteed throughput and single-digit millisecond Relevant Services on the AWS Platform latency, which makes it a great fit for gaming, ad tech, AWS data analytics solutions consist of fully man- mobile, and many other big data applications.