MapReduce: A Flexible Data Processing Tool

By Jeffrey Dean and Sanjay Ghemawat

Contributed articles. DOI:10.1145/1629175.1629198
Communications of the ACM | January 2010 | Vol. 53 | No. 1

MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.

MapReduce is a programming model for processing and generating large data sets.[4] Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs and a reduce function that merges all intermediate values associated with the same intermediate key. We built a system around this programming model in 2003 to simplify construction of the inverted index for handling searches at Google.com. Since then, more than 10,000 distinct programs have been implemented using MapReduce at Google, including algorithms for large-scale graph processing, text processing, machine learning, and statistical machine translation. The Hadoop open source implementation of MapReduce has been used extensively outside of Google by a number of organizations.[10,11]

To help illustrate the MapReduce programming model, consider the problem of counting the number of occurrences of each word in a large collection of documents. The user would write code like the following pseudocode:

    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));

The map function emits each word plus an associated count of occurrences (just "1" in this simple example). The reduce function sums together all counts emitted for a particular word.

MapReduce automatically parallelizes and executes the program on a large cluster of commodity machines. The runtime system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing required inter-machine communication. MapReduce allows programmers with no experience with parallel and distributed systems to easily utilize the resources of a large distributed system. A typical MapReduce computation processes many terabytes of data on hundreds or thousands of machines. Programmers find the system easy to use, and more than 100,000 MapReduce jobs are executed on Google's clusters every day.

Compared to Parallel Databases

The query languages built into parallel database systems are also used to express the type of computations supported by MapReduce. A 2009 paper by Andrew Pavlo et al. (referred to here as the "comparison paper"[13]) compared the performance of MapReduce and parallel databases. It evaluated the open source Hadoop implementation[10] of the MapReduce programming model, DBMS-X (an unidentified commercial database system), and Vertica (a column-store database system from a company co-founded by one of the authors of the comparison paper). Earlier blog posts by some of the paper's authors characterized MapReduce as "a major step backwards."[5,6] In this article, we address several misconceptions about MapReduce in these three publications:

• MapReduce cannot use indices and implies a full scan of all input data;
• MapReduce inputs and outputs are always simple files in a file system; and
• MapReduce requires the use of inefficient textual data formats.

We also discuss other important issues:

• MapReduce is storage-system independent and can process data without first requiring it to be loaded into a database. In many cases, it is possible to run 50 or more separate MapReduce analyses in complete passes over the data before it is possible to load the data into a database and complete a single analysis;
• Complicated transformations are often easier to express in MapReduce than in SQL; and
• Many conclusions in the comparison paper were based on implementation and evaluation shortcomings not fundamental to the MapReduce model; we discuss these shortcomings later in this article.

We encourage readers to read the original MapReduce paper[4] and the comparison paper[13] for more context.

Heterogeneous Systems

Many production environments contain a mix of storage systems. Customer data may be stored in a relational database, and user requests may be logged to a file system. Furthermore, as such environments evolve, data may migrate to new storage systems. MapReduce provides a simple model for analyzing data in such heterogeneous systems. End users can extend MapReduce to support a new storage system by defining simple reader and writer implementations that operate on the storage system. Examples of supported storage systems are files stored in distributed file systems,[7] database query results,[2,9] data stored in Bigtable,[3] and structured input files (such as B-trees). A single MapReduce operation easily processes and combines data from a variety of storage systems.

Now consider a system in which a parallel DBMS is used to perform all data analysis. The input to such analysis must first be copied into the parallel DBMS. This loading phase is inconvenient. It may also be unacceptably slow, especially if the data will be analyzed only once or twice after being loaded. For example, consider a batch-oriented Web-crawling-and-indexing system that fetches a set of Web pages and generates an inverted index. It seems awkward and inefficient to load the set of fetched pages into a database just so they can be read through once to generate an inverted index. Even if the cost of loading the input into a parallel DBMS is acceptable, we still need an appropriate loading tool. Here is another place MapReduce can be used; instead of writing a custom loader with its own ad hoc parallelization and fault-tolerance support, a simple MapReduce program can be written to load the data into the parallel DBMS.

Indices

The comparison paper incorrectly said that MapReduce cannot take advantage of pregenerated indices, leading to skewed benchmark results in the paper. For example, consider a large data set partitioned into a collection of nondistributed databases, perhaps using a hash function. An index can be added to each database, and the result of running a database query using this index can be used as an input to MapReduce. If the data is stored in D database partitions, we will run D database queries that will become the D inputs to the MapReduce execution. Indeed, some of the authors of Pavlo et al. have pursued this approach in their more recent work.[11]

Another example of the use of indices is a MapReduce that reads from Bigtable. If the data needed maps to a sub-range of the Bigtable row space, we would need to read only that sub-range instead of scanning the entire Bigtable. Furthermore, like Vertica and other column-store databases, we will read data only from the columns needed for this analysis, since Bigtable can store data segregated by columns.

Yet another example is the processing of log data within a certain date range; see the Join task discussion in the comparison paper, where the Hadoop benchmark reads through 155 million records to process the 134,000 records that fall within the date range of interest. Nearly every logging system we are familiar with rolls over to a new log file periodically and embeds the rollover time in the name of each log file. Therefore, we can easily run a MapReduce operation over just the log files that may potentially overlap the specified date range, instead of reading all log files.

Complex Functions

Map and Reduce functions are often fairly simple and have straightforward SQL equivalents. However, in many cases, especially for Map functions, the function is too complicated to be expressed easily in a SQL query, as in the following examples:

• Extracting the set of outgoing links from a collection of HTML documents and aggregating by target document;
• Stitching together overlapping satellite images to remove seams and to select high-quality imagery for Google Earth;
• Generating a collection of inverted index files using a compression scheme tuned for efficient support of Google search queries;
• Processing all road segments in the world and rendering map tile images that display these segments for Google Maps; and
• Fault-tolerant parallel execution of programs written in higher-level languages (such as Sawzall[14] and Pig Latin[12]) across a collection of input data.

Conceptually, such user-defined functions (UDFs) can be combined with SQL queries, but the experience reported in the comparison paper indicates that UDF support is either buggy (in DBMS-X) or missing (in Vertica). These concerns may go away over the long term, but for now, MapReduce is a better framework for doing more complicated tasks (such as those listed earlier) than the selection and aggregation that are SQL's forte.

Structured Data and Schemas

Pavlo et al. did raise a good point that schemas are helpful in allowing multiple applications to share the same data. For example, consider the following schema from the comparison paper:

    CREATE TABLE Rankings (
      pageURL VARCHAR(100)
        PRIMARY KEY,
    [...]

"MapReduce is a highly effective and efficient tool for large-scale fault-tolerant [...]"

[...] of protocol buffers uses an optimized binary representation that is more compact and much faster to encode and decode than the textual formats used by the Hadoop benchmarks in the comparison paper. For example, the automatically generated code to parse a Rankings protocol buffer record runs in 20 nanoseconds per record as compared to the 1,731 nanoseconds required per record to parse the textual input format used in the Hadoop benchmark mentioned earlier.
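
The word-count pseudocode near the start of this article can be tried out on a single machine. The following is a minimal Python sketch, not the Google or Hadoop implementation: the names map_fn, reduce_fn, and run_mapreduce are hypothetical, words are assumed to be separated by whitespace, and the in-memory dictionary stands in for the grouping by intermediate key that the real runtime performs across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, str]]:
    """key: document name; value: document contents.

    Emits (word, "1") for every word, mirroring EmitIntermediate(w, "1").
    Splitting on whitespace is an assumption made for this sketch.
    """
    for word in value.split():
        yield word, "1"


def reduce_fn(key: str, values: Iterable[str]) -> str:
    """key: a word; values: the counts emitted for that word."""
    result = 0
    for v in values:
        result += int(v)
    return str(result)


def run_mapreduce(documents: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical single-process driver (not part of the article).

    It applies map_fn to every document, groups the intermediate values
    by key, and then applies reduce_fn once per key.
    """
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {
        word: reduce_fn(word, counts)
        for word, counts in sorted(intermediate.items())
    }


if __name__ == "__main__":
    docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
    print(run_mapreduce(docs))
```

Counts are kept as strings only to stay close to the pseudocode, which emits "1" and parses it back in reduce. The sketch has no partitioning, scheduling, or fault handling, which are the parts the article attributes to the runtime system.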