Validation of an HPC Cluster: A Sometimes Neglected Aspect of System Administration
A walk-through of methods and procedures
Michael Hebenstreit, Intel® Corp., CRT Datacenter, Senior Cluster Architect
tut118, SC2010

Agenda
• CRT-DC – the Customer Response Data Center
• The problem
• Tier 1: the hardware
• Tier 2: the installed image
• Tier 3: performance tests
• A look at MS Windows*
• Commercial solutions

CRT Datacenter Challenges
• Support for variety
  – Multitude of different hardware architectures
  – Early access often means alpha and beta systems end up in cluster configurations
• Support for different customers
  – OEMs, end users, ISVs
  – Some want their own configuration
  – Manage access while preserving the security of each user's data
  – Protect the internal network and Intel IP from external disclosure
• Support for scaling
  – Often requires an exclusive period due to custom configurations
  – Compute nodes are taken out of circulation for the duration of the project

CRT-DC cluster configuration
[Figure: 360 Urbanna and 64 Supermicro* compute nodes (24 GB RAM each); admin1/admin2, pbs-serv1/pbs-serv2, login, login2 and compile nodes; Panasas* storage for /home and long-term data; DDN* Lustre, LFS4 (HDD) and LFS5 (SSD) Lustre file systems; Force10* 1GbE network and QDR InfiniBand network]

Exemplary Configurations
• Nodes
  – 360 Intel SR1600UR: Xeon® X5670 (WSM, 2.93 GHz, 12 cores/node), 24 GB
  – 64 Supermicro 6026T-NTR+: 34 Xeon® X5560 (NHM, 2.8 GHz, 8 cores/node), 40 Xeon® X5677 (WSM, 3.47 GHz, 8 cores/node), all 24 GB
• Cluster file systems
  – Panasas* (70 TB storage)
  – DDN* Lustre (28 TB storage)
  – HDD Lustre (23 TB storage)
  – SSD Lustre (3 TB storage)
• Distributed GigE
  – Force10* Networks C-300 backbone, Force10 Networks S50N top-of-rack switches
• Distributed InfiniBand*
  – Mellanox* MTS3600Q: 18 spine and 28 leaf switches, 504 ports
• Software stack
  – RedHat* EL5, OFED 1.3+, Lustre 1.6.4.3+
• The cluster has been on the TOP500 list since June 2006 (best ranking #68, worst #153)

Classification
• Hardware and software defects: a system is dead or does not operate correctly
• Inconsistencies: the configuration (config files, installed RPMs…) is not identical across the cluster
• Degradation: a system operates correctly but has lost performance; keep log files so results can be compared over time

The Linux Toolbox
• Executing commands in parallel – pdsh*
• Consolidating pdsh output – dshbak*
• cat, grep, sum, sed, awk…
• Shell scripting
• Advanced programming languages such as Python* or Perl*
pdsh homepage: http://sourceforge.net/projects/pdsh

redirect (>) – Sends the output of a command into a file, overwriting the file
[smartuser@server1~]$ echo "\"To err is human -" > text1
[smartuser@server1~]$ echo "and to blame it on a computer is even more so."\" > text2
[smartuser@server1~]$ echo "Robert Orben" > text3

cat (concatenate) – Displays the contents of one or more files on standard output. It is most commonly used to display a single file on the screen.
[smartuser@server1~]$ cat text1
"To err is human -
[smartuser@server1~]$ cat text2
and to blame it on a computer is even more so."
[smartuser@server1~]$ cat text3
Robert Orben
[smartuser@server1~]$ cat text1 text2 text3
"To err is human -
and to blame it on a computer is even more so."
Robert Orben
[smartuser@server1~]$ cat text1 text2 text3 > text4
[smartuser@server1~]$ cat text4
"To err is human -
and to blame it on a computer is even more so."
Robert Orben

grep – Finds a text pattern within a file and returns the line(s) containing the pattern.
It is most commonly used to find a word, but it can find a character, phrase, sentence or any regular expression.
[smartuser@server1~]$ grep computer text4
and to blame it on a computer is even more so."

grep -i – Because grep is case sensitive, -i is used to ignore case.
[smartuser@server1~]$ grep to text4
and to blame it on a computer is even more so."
[smartuser@server1~]$ grep -i to text4
"To err is human -
and to blame it on a computer is even more so."

grep -c – Counts the number of lines that contain the expression.
[smartuser@server1~]$ grep -c is text4
2

grep -v – Selects the lines that do not contain the expression.
[smartuser@server1~]$ grep -v is text4
Robert Orben

grep -q – Prints nothing and exits as soon as a match is found; the result is reported only through the exit status, which the shell stores in $?. Echoing $? shows whether the expression is present: success = 0, failure = 1 (2 signals an error, such as a missing file). This is useful in "if" statements to avoid confusing output to a user.
[smartuser@server1~]$ grep -q man text4; echo $?
0
[smartuser@server1~]$ grep -q woman text4; echo $?
1

sum – Computes a 16-bit checksum for each given file and counts the blocks each file occupies. A typical use is after a file transfer: compare the checksum of the copy with that of the original to verify file integrity.
[smartuser@server1~]$ sum text4
05333 1
[smartuser@server1~]$ sum text1 text2 text3
24872 1 text1
63331 1 text2
20594 1 text3

awk (printing a specific column) – awk is generally used to search output or a file for a pattern and then manipulate it. awk splits each input line into fields and assigns them to the variables $1, $2, $3, $4 and so on, with $NF holding the last field. The smart user can then manipulate the values through these variables.
[smartuser@server1~]$ cat text4
"To err is human -
and to blame it on a computer is even more so."
Robert Orben
[smartuser@server1~]$ awk '{print $2}' text4
err
to
Orben

To limit the output, we can give awk a pattern so that it only considers the line beginning with "and":
[smartuser@server1~]$ awk '/^and/ {print $3" "$6" "$7}' text4
blame a computer

piping with "|" – The pipe sends the output of one command directly into another command. Here is another way to get the same output:
[smartuser@server1~]$ grep blame text4 | awk '{print $3" "$6" "$7}'
blame a computer

sed (changing text) – sed is most useful for making text transformations on an input stream, whether from a file or a pipeline. The quotes contain the instructions sed is to follow: s means substitute, computer is the expression to find, dog is the expression to put in its place, and g (global) tells sed to replace every occurrence on a line instead of only the first one.
[smartuser@server1~]$ grep blame text4 | awk '{print $3" "$6" "$7}' | sed 's/computer/dog/g'
blame a dog
Or:
[smartuser@server1~]$ awk '/^and/ {print $3" "$6" "$7}' text4 | sed 's/computer/cat/g'
blame a cat

For fun:
[smartuser@server1~]$ OTHERS="horse pig mouse goat"
[smartuser@server1~]$ echo $OTHERS
horse pig mouse goat
[smartuser@server1~]$ for i in $OTHERS; do awk '/^and/ {print "You should "$3" "$6" "$7}' text4 | sed "s/computer/$i/g"; done
You should blame a horse
You should blame a pig
You should blame a mouse
You should blame a goat

sort – Sorts lines either alphabetically or numerically (sort -n). If you pipe standard output into sort, you see the sorted results on your screen.
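The tools above (pdsh-style output, awk fields, pipes and exit status) combine into a simple check for the "Inconsistencies" class from the Classification slide. What follows is a minimal sketch, not the dshbak tool itself: it assumes pdsh's "host: line" output format, the function name group_identical is hypothetical, and the node names and kernel versions are invented sample data. It groups hosts whose output is identical and returns a non-zero exit status when more than one group exists, so it can drive an "if" statement the same way grep -q does.

```shell
#!/usr/bin/env bash
# group_identical (hypothetical helper, not part of pdsh): reads pdsh-style
# "host: line" output on stdin, prints one group per distinct output together
# with the hosts that produced it, and exits 1 if the hosts disagree (0 otherwise).
# It only approximates the idea behind `dshbak -c`.
group_identical() {
  awk '
    {
      i = index($0, ": ")                 # pdsh prefixes each line with "host: "
      if (i == 0) next                     # skip lines without that prefix
      host = substr($0, 1, i - 1)
      if (!(host in out)) order[++n] = host          # remember first-seen host order
      out[host] = out[host] substr($0, i + 2) "\n"   # collect per-host output
    }
    END {
      for (k = 1; k <= n; k++) {
        h = order[k]
        if (out[h] in members) {
          members[out[h]] = members[out[h]] "," h    # same output as an earlier host
        } else {
          uniq[++g] = out[h]                         # first host with this output
          members[out[h]] = h
        }
      }
      for (k = 1; k <= g; k++)
        printf "----------------\n%s\n----------------\n%s", members[uniq[k]], uniq[k]
      exit (g > 1)                         # non-zero status signals an inconsistency
    }'
}

# Invented sample data standing in for: pdsh -w node[01-04] uname -r
sample='node01: 2.6.18-194.el5
node02: 2.6.18-194.el5
node03: 2.6.18-164.el5
node04: 2.6.18-194.el5'

if printf '%s\n' "$sample" | group_identical; then
  echo "all nodes identical"
else
  echo "WARNING: nodes differ"
fi
```

On a live cluster the sample variable would be replaced by real output, for example pdsh -w node[01-04] 'rpm -qa | sort | md5sum' | group_identical to compare installed package lists (node names hypothetical). For interactive inspection, dshbak -c already provides similar grouping of identical output.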