Lenovo Big Data Validated Design for Cloudera Enterprise on ThinkSystem SR655 and SR635 Servers

Last update: 28 June 2021
Version 1.3

  • Reference architecture for Cloudera Enterprise with Apache Hadoop and Apache Spark
  • Solution based on the ThinkSystem SR635/SR655 server with AMD CPU inside
  • Deployment considerations for scalable racks including detailed validated bills of material
  • Solution based on ThinkSystem SR655 compute node with storage pool

Xiaotong Jiang (Lenovo)
Xifa Chen (Lenovo)
Ajay Dholakia (Lenovo)
Weixu Yang (Lenovo)
Dan Kangas (Lenovo)

Table of Contents

1 Introduction
2 Business problem and business value
3 Requirements
    Functional Requirements
    Non-functional Requirements
4 Architectural Overview
    Cloudera Enterprise
    Bare-metal Cluster
5 Component Model
    Cloudera Components
    Apache Spark on Cloudera
6 Operational Model
    Hardware Description
        6.1.1 Lenovo ThinkSystem SR635 Server
        6.1.2 Lenovo ThinkSystem SR655 Server
    Cluster Node Configurations
        6.2.1 Worker Nodes
        6.2.2 Master and Utility Nodes
        6.2.3 System Management and Edge Nodes
    Cluster Software Stack
        6.3.1 Cloudera Enterprise CDH
        6.3.2 Red Hat Operating System
    Cloudera Service Role Layouts
    System Management
    Networking
        6.6.1 Data Network
        6.6.2 Hardware Management Network
        6.6.3 Multi-rack Network
        6.6.4 10Gb and 25Gb Data Network Configurations
    Predefined Cluster Configurations
        6.7.1 SR655 Configurations
        6.7.2 Cluster Storage Capacity
7 Deployment considerations
    Increasing Cluster Performance
    Processor Selection
        7.2.1 SR635/SR655 Processors
    Designing for Storage Capacity and Performance
        7.3.1 Node Capacity
        7.3.2 Node Throughput
        7.3.3 HDD Controller
    Memory Size and Performance
    Data Network Considerations
    Estimating Disk Space
    High Availability Considerations
        7.7.1 Network Availability
        7.7.2 Cluster Node Availability
        7.7.3 Storage Availability
        7.7.4 Software Availability
    Linux OS Configuration Guidelines
        7.8.1 OS configuration for Cloudera CDH
    Designing for High Ingest Rates
8 Bill of Materials
    Master Node
    Worker Node
    System Management Node
    Rack
    Cables
    Software
9 Resources
Document history

1 Introduction

This document describes the reference architecture for Cloudera Enterprise on ThinkSystem servers with AMD CPUs. It provides a predefined and optimized hardware infrastructure for Cloudera Enterprise, a distribution of Apache Hadoop and Apache Spark with enterprise-ready capabilities from Cloudera. This reference architecture provides the planning, design considerations, and best practices for implementing Cloudera Enterprise with Lenovo products. Lenovo and Cloudera worked together on this document, and the reference architecture described herein was validated by Lenovo and Cloudera.

The ever-increasing volume, variety, and velocity of data available to an enterprise brings the challenge of deriving the most value from it. This task requires suitable data processing and management software running on a tuned hardware platform. With Apache Hadoop and Apache Spark emerging as popular big data storage and processing frameworks, enterprises are building so-called Data Lakes with these components.

Cloudera brings the power of Hadoop to the customer's enterprise. Hadoop is an open source software framework that is used to reliably manage large volumes of structured and unstructured data. Cloudera expands and enhances this technology to withstand the demands