Lenovo Big Data Validated Design for Hortonworks Data Platform Using ThinkSystem Servers

Last update: 14 December 2017
Version 1.0
Configuration Reference Number: BGDHW01XX74

Describes a validated design for Hortonworks Data Platform, powered by Apache Hadoop and Apache Spark
Solution based on the powerful, versatile Lenovo ThinkSystem SR650 server with high speed ThinkSystem switches up to 100Gb network speeds
Deployment considerations for high-performance, cost-effective and scalable solutions
Contains detailed bill of material for different servers and associated networking

Dan Kangas
Weixu Yang
Ajay Dholakia
Brian Finley

Table of Contents

1 Introduction
2 Business problem and business value
  2.1 Business problem
  2.2 Business value
3 Big Data Requirements
  3.1 Functional requirements
  3.2 Non-functional requirements
4 Architectural overview
5 Component model
6 Operational model
  6.1 Hardware description
    6.1.1 Lenovo ThinkSystem SR650 Server
    6.1.2 Lenovo ThinkSystem SR630 Server
    6.1.3 Lenovo RackSwitch G8052
    6.1.4 Lenovo RackSwitch G8272
    6.1.5 Lenovo RackSwitch NE10032 - Cross-Rack Switch
  6.2 Cluster nodes
    1.1.1 Worker nodes
    6.2.1 Master Nodes
  6.3 Systems management
  6.4 Networking
    6.4.1 Data network
    6.4.2 Hardware management network
    6.4.3 Multi-rack network
  6.5 Predefined cluster configurations
7 Deployment considerations
  7.1 Increasing cluster performance
  7.2 Designing for high ingest rates
  7.3 Designing for Storage Capacity and Performance
    7.3.1 Node Capacity
    7.3.2 Node Throughput
    7.3.3 HDD controller
  7.4 Designing for in-memory processing with Apache Spark
  7.5 Data Network Adapter Options
  7.6 Estimating disk space
  7.7 Scaling considerations
  7.8 High availability considerations
    7.8.1 Networking considerations
    7.8.2 Hardware availability considerations
    7.8.3 Storage availability
    7.8.4 Software availability considerations
  7.9 Migration considerations
8 Appendix: Bill of Materials
  8.1 Master node
  8.2 Worker node
  8.3 Systems Management Node
  8.4 Management network switch
  8.5 Data network switch
  8.6 Rack
  8.7 Cables
9 Acknowledgements
10 Resources
11 Document history
12 Trademarks and special notices

1 Introduction

This document describes the reference architecture for Hortonworks Data Platform (HDP), a distribution of Apache Hadoop with enterprise-ready capabilities. It provides a predefined and optimized Lenovo hardware infrastructure for the Hortonworks Data Platform.
The intended audience is IT professionals, technical architects, sales engineers, and consultants who plan, design, and implement the Hortonworks big data solution using Lenovo hardware. It is assumed that you are familiar with Hadoop components and capabilities. For more information about Hadoop, see section 10, “Resources”.

This Hortonworks reference architecture was validated on the Lenovo hardware described in this document. The hardware bill of materials is provided, and this predefined configuration serves as a baseline for a big data solution that can be modified to meet specific customer requirements, such as lower cost, improved performance, or increased reliability. See section 8, “Appendix: Bill of Materials”.

The Hortonworks Data Platform, powered by Apache Hadoop, is a highly scalable and fully open source platform for storing, processing, and analyzing large volumes of structured and unstructured data. It is designed to handle data from many sources and formats quickly, easily, and cost-effectively. Hortonworks expands and enhances this technology to withstand the demands of your enterprise, adding management, security, governance, and analytics features. The result is a more enterprise-ready solution for complex, large-scale analytics.

2 Business problem and business value

This section describes the business problem that is associated with big data environments and the value that is offered by the Hortonworks Data Platform solution and Lenovo hardware.

2.1 Business problem

By 2012, the world generated 2.5 million terabytes (TB) of data daily, a level that is expected to increase to 44 zettabytes (44 trillion gigabytes) by 2020. In all, 90% of the data in the world today was created in the last two years alone. This data