Hadoop MapReduce – Similar to Google MapReduce


Hadoop and Tools
• Various Linux Hadoop clusters around
  – http://hadoop.apache.org
  – Amazon EC2
• Windows and other platforms
  – The NetBeans plugin simulates Hadoop
  – The workflow view works on Windows
• Hadoop-based tools
  – For developing in Java: NetBeans plugin
• HBase: distributed data store presented as a large table
• Hive: data warehouse, SQL
• Pig: a SQL-like, high-level data processing script language
• Mahout: machine learning algorithms on Hadoop

Installing Hadoop
http://hadoop.apache.org/
Supported Platforms
• GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
• Windows is also a supported platform.
Required Software
• Required software for Linux and Windows includes:
  – Java™ 1.6.x, preferably from Sun, must be installed.
  – ssh must be installed and sshd must be running to use the Hadoop scripts.

HDFS – Hadoop Distributed File System, modeled on Google GFS.
Hadoop MapReduce – Similar to Google MapReduce.
HBase – Similar to Google Bigtable.
• Data is divided into various tables.
• A table is composed of columns; columns are grouped into column families.
[Slides: Example; Multi-dimensional map; Physical view]

Problem with MapReduce
Hadoop supports data-intensive distributed applications using MapReduce. However...
  – MapReduce is hard to program (users know SQL/bash/Python).
  – No schema.

What is HIVE?
A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
  – ETL.
  – Structure.
  – Access to different storage.
  – Query execution via MapReduce.
Key Building Principles:
  – SQL is a familiar language
  – Extensibility: types, functions, formats, scripts
  – Performance

Data Units
• Databases.
• Tables.
• Partitions.
• Buckets (or Clusters).

Type System
Primitive types
  – Integers: TINYINT, SMALLINT, INT, BIGINT.
  – Boolean: BOOLEAN.
  – Floating point numbers: FLOAT, DOUBLE.
  – String: STRING.
Complex types
  – Structs: {a INT; b INT}.
  – Maps: M['group'].
  – Arrays: ['a', 'b', 'c'], A[1] returns 'b'.

Examples – DDL Operations
  CREATE TABLE sample (foo INT, bar STRING) PARTITIONED BY (ds STRING);
  SHOW TABLES '.*s';
  DESCRIBE sample;
  ALTER TABLE sample ADD COLUMNS (new_col INT);
  DROP TABLE sample;

Examples – DML Operations
  LOAD DATA LOCAL INPATH './sample.txt'
    OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');
  LOAD DATA INPATH '/user/falvariz/hive/sample.txt'
    OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');

SELECTS and FILTERS
  SELECT foo FROM sample WHERE ds='2012-02-24';
  INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out'
    SELECT * FROM sample WHERE ds='2012-02-24';
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive-sample-out'
    SELECT * FROM sample;

Aggregations and Groups
  SELECT MAX(foo) FROM sample;
  SELECT ds, COUNT(*), SUM(foo) FROM sample GROUP BY ds;
  FROM sample s
    INSERT OVERWRITE TABLE bar
    SELECT s.bar, count(*) WHERE s.foo > 0 GROUP BY s.bar;

Join
  CREATE TABLE customer (id INT, name STRING, address STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '#';
  CREATE TABLE order_cust (id INT, cus_id INT, prod_id INT, price INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  SELECT * FROM customer c JOIN order_cust o ON (c.id=o.cus_id);
  SELECT c.id, c.name, c.address, ce.exp
    FROM customer c
    JOIN (SELECT cus_id, sum(price) AS exp FROM order_cust GROUP BY cus_id) ce
    ON (c.id=ce.cus_id);

Multi table insert - Dynamic partition insert
Multi-table insert (one scan of the source, every partition value written out by hand):
  FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='US')
      SELECT pvs.viewTime, … WHERE pvs.country = 'US'
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='CA')
      SELECT pvs.viewTime, ... WHERE pvs.country = 'CA'
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='UK')
      SELECT pvs.viewTime, ... WHERE pvs.country = 'UK';
Dynamic partition insert (country is left unbound, so the partition is chosen per row):
  FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
      SELECT pvs.viewTime, ...

Apache Pig

MapReduce not Good Enough?
• Restricted programming model
• Only two phases
• Single job chain for long data flow
• Put the logic at the right phase
• Programmers are responsible for this
• Too many lines of code even for simple logic
• How many lines do you have for word count?

Pig to the Rescue
• High-level dataflow language (Pig)
• Much simpler than Java
• Simplifies the data processing
• Puts the operations at the appropriate phases
• Chains multiple MapReduce jobs

Motivation by Example
Suppose we have user data in one file and website data in another file. We need to find the top 5 most visited pages by users aged 18-25.
[Slides: In MapReduce; In Pig; Pig runs over Hadoop]

Pig: Data Flow Language
• Users specify a sequence of operations to process data
• More control over the process, compared with a declarative language
• Supports various data types
• Supports schemas
• Supports user-defined functions

Machine Learning
• “Machine Learning is programming computers to optimize a performance criterion using example data or past experience”
• Subset of Artificial Intelligence
Types
• Supervised
  – Using labeled training data, create a function that predicts the output of unseen inputs
• Unsupervised
  – Using unlabeled data, create a function that predicts output
• Semi-Supervised
  – Uses labeled and unlabeled data

Example: Clustering
• Unsupervised
• Find Natural Groupings
  – Documents
  – Search Results
  – People
  – Genetic traits in groups
  – Many, many more uses

Example: Collaborative Filtering
• Unsupervised
• Recommend people and products
  – User-User
    » User likes X, you might too
  – Item-Item
    » People who bought X also bought Y
Amazon.com

Example: Classification/Categorization
• Many, many types
• Spam Filtering
• Named Entity Recognition (NER)
• Phrase Identification
• Sentiment Analysis
• Classification into a Taxonomy
NER?

Example: Information Retrieval
• Learning Ranking Functions
• Learning Spelling Corrections
• User Click Analysis and Tracking

Other
• Image Analysis
• Robotics
• Games
• Higher level natural language processing
• Many, many others

What is Apache Mahout?
• A Mahout is an elephant trainer/driver/keeper, hence…
  Hadoop (and other distributed techniques) + Machine Learning = Mahout
• Goal:
  – Scalable machine learning algorithms with the Apache License

What?
• Hadoop brings:
  – Map/Reduce API
  – HDFS
  – In other words, scalability and fault-tolerance
• Mahout brings:
  – Library of machine learning algorithms
  – Examples

Why Mahout?
• Many Open Source ML libraries either:
  – Lack Community
  – Lack Documentation and Examples
  – Lack Scalability
  – Lack the Apache License ;-)
  – Or are research-oriented

Current Status
• What’s in Mahout:
  – Simple Matrix/Vector library
  – Taste Collaborative Filtering
  – Clustering
    » Canopy/K-Means/Fuzzy K-Means/Mean-shift/Dirichlet
  – Classifiers
    » Naïve Bayes
    » Complementary NB
  – Evolutionary
    » Integration with Watchmaker for fitness function.
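
The item-item idea from the collaborative filtering slide ("People who bought X also bought Y") can be illustrated with a small co-occurrence count. The sketch below is plain single-machine Python, not Mahout's Taste API and not a MapReduce job; the function names (cooccurrence_counts, also_bought) and the purchase data are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count, for each pair of items, how many users bought both."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        # De-duplicate each user's items so a pair is counted once per user.
        for x, y in combinations(sorted(set(items)), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    neighbours = counts.get(item, {})
    # Highest count first; ties are broken by item name for determinism.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase histories: user -> items bought.
baskets = {
    "alice": ["book", "lamp", "pen"],
    "bob": ["book", "pen"],
    "carol": ["book", "lamp"],
    "dave": ["pen", "ink"],
}

counts = cooccurrence_counts(baskets)
print(also_bought(counts, "book"))  # → ['lamp', 'pen']
```

Raw co-occurrence is the simplest possible similarity and favours popular items; a production recommender would normally normalise the counts, and on data of the size Hadoop targets the counting step itself would be distributed rather than run in one process.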