Hortonworks Data Platform Command Line Installation (November 30, 2016)

docs.cloudera.com

Hortonworks Data Platform: Command Line Installation

Copyright © 2012-2016 Hortonworks, Inc. Some rights reserved.

The Hortonworks Data Platform, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing, processing, and analyzing large volumes of data. It is designed to handle data from many sources and formats quickly, easily, and cost-effectively. The Hortonworks Data Platform consists of the essential set of Apache Software Foundation projects that focus on the storage and processing of Big Data, along with operations, security, and governance for the resulting system. This includes Apache Hadoop -- which includes MapReduce, Hadoop Distributed File System (HDFS), and Yet Another Resource Negotiator (YARN) -- along with Ambari, Falcon, Flume, HBase, Hive, Kafka, Knox, Oozie, Phoenix, Pig, Ranger, Slider, Spark, Sqoop, Storm, Tez, and ZooKeeper. Hortonworks is the major contributor of code and patches to many of these projects. These projects have been integrated and tested as part of the Hortonworks Data Platform release process, and installation and configuration tools have also been included.

Unlike other providers of platforms built using Apache Hadoop, Hortonworks contributes 100% of our code back to the Apache Software Foundation. The Hortonworks Data Platform is Apache-licensed and completely open source. We sell only expert technical support, training, and partner-enablement services. All of our technology is, and will remain, free and open source.

Please visit the Hortonworks Data Platform page for more information on Hortonworks technology. For more information on Hortonworks services, please visit either the Support or Training page. Feel free to contact us directly to discuss your specific needs.
Except where otherwise noted, this document is licensed under Creative Commons Attribution ShareAlike 4.0 License. http://creativecommons.org/licenses/by-sa/4.0/legalcode

Table of Contents

1. Preparing to Manually Install HDP
    1.1. Meeting Minimum System Requirements
        1.1.1. Hardware Recommendations
        1.1.2. Operating System Requirements
        1.1.3. Software Requirements
        1.1.4. JDK Requirements
        1.1.5. Metastore Database Requirements
    1.2. Virtualization and Cloud Platforms
    1.3. Configuring Remote Repositories
    1.4. Deciding on a Deployment Type
    1.5. Collect Information
    1.6. Prepare the Environment
        1.6.1. Enable NTP on Your Cluster
        1.6.2. Disable SELinux
        1.6.3. Disable IPTables
        1.6.4. Install Berkeley Database for Falcon
    1.7. Download Companion Files
    1.8. Define Environment Parameters
    1.9. Creating System Users and Groups
    1.10. Determining HDP Memory Configuration Settings
        1.10.1. Running the YARN Utility Script
        1.10.2. Calculating YARN and MapReduce Memory Requirements
    1.11. Configuring NameNode Heap Size
    1.12. Allocating Adequate Log Space for HDP
    1.13. Downloading the HDP Maven Artifacts
2. Installing Apache ZooKeeper
    2.1. Install the ZooKeeper Package
    2.2. Securing ZooKeeper with Kerberos (optional)
    2.3. Securing ZooKeeper Access
        2.3.1. ZooKeeper Configuration
        2.3.2. YARN Configuration
        2.3.3. HDFS Configuration
    2.4. Set Directories and Permissions
    2.5. Set Up the Configuration Files
    2.6. Start ZooKeeper
3. Installing HDFS, YARN, and MapReduce
    3.1. Set Default File and Directory Permissions
    3.2. Install the Hadoop Packages
    3.3. Install Compression Libraries
        3.3.1. Install Snappy
        3.3.2. Install LZO
    3.4. Create Directories
        3.4.1. Create the NameNode Directories
        3.4.2. Create the SecondaryNameNode Directories
        3.4.3. Create DataNode and YARN NodeManager Local Directories
        3.4.4. Create the Log and PID Directories
        3.4.5. Symlink Directories with hdp-select
4. Setting Up the Hadoop Configuration
5. Validating the Core Hadoop Installation
    5.1. Format and Start HDFS
    5.2. Smoke Test HDFS
    5.3. Configure YARN and MapReduce
    5.4. Start YARN
    5.5. Start MapReduce JobHistory Server
    5.6. Smoke Test MapReduce
6. Installing Apache HBase
    6.1. Install the HBase Package
    6.2. Set Directories and Permissions
    6.3. Set Up the Configuration Files
    6.4. Validate the Installation
    6.5. Starting the HBase Thrift and REST Servers
7. Installing Apache Phoenix
    7.1. Installing the Phoenix Package
    7.2. Configuring HBase for Phoenix
    7.3. Configuring Phoenix to Run in a Secure Cluster
    7.4. Validating the Phoenix Installation
    7.5. Troubleshooting Phoenix
8. Installing and Configuring Apache Tez