
Best Practices for Scaling Deep Learning Training and Inference with TensorFlow* On Intel® Xeon® Processor Based HPC Infrastructures

Version: 1.0
Date of Issue: August 2018

Prepared By: Aishwarya Bhandare¶, Deepthi Karkada¶, Kushal Datta¶, Anupama Kurpad§, Vamsi Sripathi¶, Sun Choi¶, Vikram Saletore¶
§Connectivity Group & ¶AI Products Group, Data Center Group, Intel Corp
Customer Solutions Technical Enabling/AIPG

Notices and Disclaimers:

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Intel, the Intel logo, Xeon, Xeon Phi and Nervana are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

© 2018 Intel Corporation. All rights reserved.
Table of Contents

1. Best Practices for TensorFlow Over Intel® Xeon® ...... 4
   1.1 TensorFlow Setup and Installation ...... 4
   1.2 Install MPI, if not already installed ...... 4
   1.3 Install Uber's Horovod Library ...... 5
   1.4 Installing tf_cnn_benchmarks ...... 5
   1.5 Preparing the ImageNet2012-1K Dataset ...... 5
       1.5.1 Steps to download and prepare the Dataset ...... 6
       1.5.2 Already have the ImageNet-1K Dataset ...... 6
       1.5.3 Dataset Striped on Lustre ...... 6
   1.6 Example: Running ResNet-50 with tf_cnn_benchmarks ...... 7
       1.6.1 tf_cnn_benchmarks: ResNet-50 ...... 7
       1.6.2 Training on Single-Node with Multiple Workers ...... 8
       1.6.3 Using OpenMPI ...... 9
       1.6.4 Using Intel® MPI ...... 10
       1.6.5 Using MVAPICH2 ...... 10
       1.6.6 Training on Multiple Nodes with Multiple Workers ...... 11
       1.6.7 Evaluating the Accuracy of the Trained Model ...... 11
       1.6.8 Multi-Stream Inference on the Trained Model ...... 12
       1.6.9 Running Inference on the Trained Model ...... 13
2. Using Singularity ...... 15
   2.1 Installing Singularity ...... 15
   2.2 Building Singularity Image ...... 15
   2.3 Running TensorFlow With Singularity ...... 15
3. Using NFS and SLURM ...... 17
   3.1 Using NFS Mounted File System ...... 17
   3.2 Using SLURM Scheduler ...... 17
4. TensorFlow Build Instructions ...... 18
   4.1 Building TensorFlow ...... 18
   4.2 Install TensorFlow using script ...... 19
5. Sample Scripts ...... 20
   5.1 TensorFlow build script ...... 20
   5.2 SLURM scripts ...... 22
   5.3 Singularity scripts ...... 24
       5.3.1 Install script ...... 24
       5.3.2 Recipe file for TensorFlow wheel downloaded from a URL ...... 24
       5.3.3 Recipe file for TensorFlow wheel on local file system ...... 25
       5.3.4 Singularity run-script ...... 27
6. Troubleshooting ...... 28
   6.1 TensorFlow Import Issues ...... 28
       6.1.1 Importing TensorFlow ...... 28
       6.1.2 Run ldd to find the dynamically linked libraries ...... 28
       6.1.3 Check by running ...... 29
       6.1.4 Another Common Error when Importing TensorFlow ...... 30
       6.1.5 Verify that TensorFlow is Using the Right Version of gcc ...... 30
       6.1.6 Run ldd again after adding the correct version of gcc ...... 31
   6.2 TensorFlow Build Issues ...... 32
   6.3 Horovod Install Issues ...... 33
   6.4 Verify Intel® Omni-Path Architecture (OPA) ...... 33
       6.4.1 Verify that OPA is Up and Running ...... 33
       6.4.2 Verify Install (Example of a good install) ...... 33
       6.4.3 Verify OPA Fabric Performance