Dell EMC Isilon and Cloudera Reference Architecture and Performance Results
Total Page:16
File Type:pdf, Size:1020Kb
Reference Architecture Dell EMC Isilon and Cloudera Reference Architecture and Performance Results Abstract This document is a high-level design, performance results, and best-practices guide for deploying Cloudera Enterprise Distribution on bare-metal infrastructure with Dell EMC’s Isilon scale-out NAS solution as a shared storage backend. August 2020 H18523 Revisions Revisions Date Description August 2020 Initial release Acknowledgments Author: Kirankumar Bhusanurmath, Analytics Solutions Architect, Dell EMC Support: The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any software described in this publication requires an applicable software license. Copyright © 2020 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [9/22/2020] [Reference Architecture] [H18523] 2 Dell EMC Isilon and Cloudera Reference Architecture and Performance Results | H18523 Table of contents Table of contents Revisions ..................................................................................................................................................................... 2 Acknowledgments ........................................................................................................................................................ 2 Table of contents ......................................................................................................................................................... 3 Executive summary ...................................................................................................................................................... 5 Disclaimer .................................................................................................................................................................... 5 Audience and scope .................................................................................................................................................... 6 Glossary of terms ......................................................................................................................................................... 6 1 Dell EMC Isilon distributed storage array for HDFS and bare-metal nodes as compute nodes ................................ 8 1.1 Physical cluster component list ..................................................................................................................... 9 1.2 Logical cluster topology ................................................................................................................................ 9 1.3 Sizing recommendation for physical nodes ................................................................................................. 10 1.4 Sizing recommendation for storage allocation ............................................................................................. 11 1.5 Supportability and compatibility matrix ........................................................................................................ 11 2 Performance testing and platform positioning ....................................................................................................... 13 2.1 DFSIO testing with Dell EMC Isilon F800 .................................................................................................... 13 2.1.1 Overview .................................................................................................................................................... 13 2.1.2 Test environment setup .............................................................................................................................. 13 2.1.3 Testing with DFSIO .................................................................................................................................... 13 2.1.4 Hadoop cluster configurations .................................................................................................................... 14 2.1.5 Performance results ................................................................................................................................... 14 2.2 Dell EMC Isilon model for Hadoop use cases ............................................................................................. 14 3 Platform tuning recommendations ........................................................................................................................ 16 3.1 CPU ........................................................................................................................................................... 16 3.1.1 CPU BIOS settings ..................................................................................................................................... 16 3.1.2 CPUfreq governor ...................................................................................................................................... 16 3.2 Memory ...................................................................................................................................................... 17 3.2.1 Minimize anonymous page faults ................................................................................................................ 17 3.2.2 Disable transparent huge-page compaction ................................................................................................ 17 3.2.3 Disable transparent huge-page defragmentation......................................................................................... 17 3.3 Network...................................................................................................................................................... 17 3.3.1 OneFS TCP tuning ..................................................................................................................................... 17 3.3.2 Compute nodes network tuning .................................................................................................................. 18 3.3.3 Verify NIC advanced features ..................................................................................................................... 19 3.3.4 NIC ring buffer configurations ..................................................................................................................... 19 3 Dell EMC Isilon and Cloudera Reference Architecture and Performance Results | H18523 Table of contents 3.4 Storage ...................................................................................................................................................... 19 3.4.1 Disk/FS mount options ............................................................................................................................... 19 3.4.2 FS Creation Options ................................................................................................................................... 20 A Technical support and resources ......................................................................................................................... 21 A.1 Related resources ...................................................................................................................................... 21 4 Dell EMC Isilon and Cloudera Reference Architecture and Performance Results | H18523 Executive summary Executive summary Dell EMC works closely with Cloudera, understanding the needs of Apache Hadoop customers, validating hardware and software-based solutions, empowering organizations with deeper insights and enhanced data- driven decision making by using the right infrastructure for the right data. Dell EMC Isilon and Cloudera - Combines a powerful yet simple, highly efficient, and massively scalable storage platform with integrated support for Hadoop analytics. Dell EMC Isilon is the first, and only, scale-out NAS platform to incorporate native support for the HDFS layer. With your unstructured data on Isilon, these capabilities allow you to quickly implement an in-place data analytics approach and avoid unnecessary capital expenditures, increased operational costs, and time-consuming replication of your big data to a separate infrastructure. This document describes the high-level design, performance results, and best-practices guide for deploying Cloudera Enterprise Distribution on bare-metal infrastructure with Dell EMC’s Isilon scale-out NAS solution as a shared storage backend. Disclaimer The documentation is and contains Cloudera proprietary information protected by copyright and other intellectual property rights. No license under copyright or any other intellectual property right is granted herein. Copyright information for Cloudera software may be found within the documentation accompanying each component in a particular release. Cloudera software includes software from various open source or other third-party projects, and may be released under the Apache Software License 2.0 (“ASLv2”), the Affero General Public License version 3 (AGPLv3), or other license terms. Other software included may be released under the terms of alternative open-source licenses. Review the license and notice files accompanying the software for