System Operations IT-ST-FDO
Index:
- Bash
- Common Operations
- EOS
- Filesystem Operations
- CASTOR
- Interventions
- How to implement the SSO on EOSCOCKPIT Machine
- XrdFed
- Rundeck
- Gitlab
- SAMBA

Please note that these commands and procedures may no longer be up to date. Verify them before any use.

Bash:
Shell command line --> "command" "options" "arguments".
- scp [source] [dest] --> Secure copy to other machines, e.g. scp log.cron root@lxbst2277:/etc/file.conf
- cp [options]... Source Dest --> Copy Source to Dest or to a directory.
- lp [options]... [file...] --> Send files to a printer.
- cd [options] [directory] --> Change directory: change the current working directory to a specific folder.
- pwd [-LP] --> Print the working directory.
- ls [options]... [file]... --> List information about files.
- ll = ls -l [file] --> List directory contents in long-listing format.
- cat [options] [file]... --> Concatenate and print (display) the content of files.
- grep [options] PATTERN [FILE...] --> Search file(s) for specific text.
- sort [options] [file...] --> Sort, merge, or compare all lines from the given files.
- cut [OPTION]... [FILE]... --> Extract sections (e.g. columns) from each line of files.
- tr [options]... SET1 [SET2] --> Translate, squeeze, and/or delete characters.
- mv [options]... Source... Directory --> Move or rename files or directories.
- source filename [arguments] --> Read and execute commands from filename in the current shell context.
- mkdir [options] folder... --> Create new folder(s), if they do not already exist.
- xrdcp [options] source destination --> Copy one or more files from one location to another (XRootD copy).
- df [option]... [file]... --> Disk free: display free disk space.
- alias [-p] [name[=value] ...] --> Create an alias, i.e. a name that the shell replaces with a string (typically a longer command).
- awk <options> 'Program' Input-File1 Input-File2 --> Text processing: find and replace text, sort/validate/index data files.
- fs [options: la, sa, lq] [directory] --> Show or set who can access files or directories on AFS (la = list ACL, sa = set ACL, lq = list quota).
- rm [options]... file... --> Remove files (delete/unlink).
- man [command name] --> Format and display help pages.
- ssh -l root [machine_name] or ssh [user@][hostname] --> OpenSSH SSH client (remote login program).
- kinit [-R] --> Obtain a Kerberos ticket-granting ticket; with -R, renew the existing one.
- history --> Command-line history. To remove a line from the history --> history -d [line_number]
- service [service_name] COMMAND --> Start, stop, restart or show the status of daemons and other services. COMMAND = start | stop | status | restart
- vimdiff [options] [file1] [file2] --> Edit two or more versions of a file and show the differences.
- uptime --> Tell how long the system has been running.
- /dev/urandom --> Kernel random number generator; read from it to create files with random content.
- less [path][script] --> View the content of a file (e.g. all the code of a script) page by page.
- Parsing --> [script_path]... --[parameter1]... --[parameter2]... --> Syntax for passing parameters to a script that parses its own command-line options.
- CTRL+R --> Reverse search through the command history.
- locate [word] --> Find files by name (useful to find where the logs are).
- iftop, iotop, iostat --> Monitoring: network bandwidth (iftop), per-process disk I/O (iotop), CPU and disk I/O statistics (iostat).
- uname -a --> Show the hostname, kernel version and architecture of the machine.
- ifconfig --> Show network interface information (IP, MAC address, etc.).
- eos-disk-menu.py
- uname -r --> Check the kernel version of a machine (to compare it with what is in GRUB --> cat /boot/grub/grub.conf).
- service iptables stop --> Stop the firewall.
- wall --> Send a message to everybody logged in to the machine.
- who --> Show who is logged in.
- last --> Show the last logged-in users, crashes and reboots.

Common Operations:
How to create a new directory and an empty file:
[root@lxfsrd06c03 ~]# mkdir /etc/castor/
[root@lxfsrd06c03 ~]# echo -n > /etc/castor/castor.conf

Special Characters:
# --> Comment.
* --> String wildcard.
\ --> Quote the next character.
> --> Output redirect.
< --> Input redirect.
& --> Background job, e.g. [command..] > [file] & runs the command in the background and sends its output to a file.
| --> Pipe: send the output of a command to the standard input of another command instead of a file.
; --> Command separator: allows putting two or more commands on the same line.
` --> Backtick: everything between backticks is evaluated (executed) by the shell before the main command, and its output is substituted in place.

Automated Operations:
- for i in [host1] [host2] [host3] ...; do echo "[command] $i"; done | sh  (generates one command line per host and pipes them to sh for execution)

Parallel Login:
- wassh -t 99999 -l root [host1],[host2],[host3] "[command]"

File Permission:
- ls -l /afs/.../file.key --> Check the permissions that are set.
- chmod [600] /afs/.../file.key --> Set file permissions.
Each digit is the sum of R = 4, W = 2, X = 1, for Me (owner), Group and Other:
          Me      Group   Other
          R W X   R W X   R W X
          4 2 1   4 2 1   4 2 1
file.key  r w -   - - -   - - -   --> 6 0 0 (chmod 600)

How to make a script executable:
- chmod +x [script_path]

Python debugging --> pdb, or strace -p [PID] to trace the system calls of a running process.

Service Now (SNOW) assignments:
- In case of hardware problems → Repair Services
- In case of software problems → Sys Admins Team
- In case of a network cable problem → Facility Operation (search for the FE in SNOW)

How to recover the AFS home directory from backup:
This procedure is for the WORK space. For the user space, have a look at the link below.
~/tmp$ fs mkm -dir test -vol work.$USER.backup   (mount the backup volume on the directory "test")
~/tmp$ ls test   (copy the files you need)
~/tmp$ fs rmm -dir test   (when you are done, remove the mount point)
User space: https://cern.service-now.com/service-portal/article.do?n=KB0000430
To recover single files written after 6:30 pm of the previous day, use → afs restore

Check information about users and groups:
- getent passwd [user]
- getent group [GID]
- ls -l [path]
- id [user]
- groups [user]
- getacl [path]
- chown user:group [path] --> Change ownership.
- ACLs → http://eos.readthedocs.org/en/latest/configuration/permission.html

How to see if a machine is a headnode:
- cat /etc/sysconfig/eos
- nslookup [ip]

How to transform a vertical list into a horizontal one:
for i in `cat /afs/cern.ch/user/a/afiorot/LISTS/abc`; do echo "$i "| tr '\n' ' '; done

How to check the status of all the nodes in an instance:
[afiorot@aiadm054 ~]$ for i in `wassh -c eoslhcb --list`; do roger show $i 2>/dev/null | tr '\n' ' ' | tr '[{}]:,' ' ';echo; done | awk '{print $1, $2, $3, $4, $9, $10, $8, $13, $14, $15, $16, $17, $18, $19, $20}' | sort
for i in `wassh -cl eos/cms/storage --list`; do roger show $i 2>/dev/null | tr '\n' ' ' | tr '[{}]:,' ' ';echo; done
2>/dev/null --> Discards error messages (stderr), so they do not appear in the output.
tr '\n' '|' --> If values are in a column (one per line), this command puts them on one line, separated by "|".
tr " " "\n" --> If values are on the same line, this command puts them in a column.
[root@p05508916e55509 ~]# cat /afs/cern.ch/user/a/afiorot/LISTS/ldap | tr '\n' '|' | sed 's/||/\n/g' | tr '|' ' ' | grep -E "gidNumber|uidNumber"

Service or process problems:
If a service or process has problems or shows error messages, restart it --> service [srvname] restart

When a service goes down:
- Check which kind of error it is on the IT Status Board.

How to compute some statistics (sort | uniq -c | sort -nr counts how many times each value occurs and lists the most frequent first):
[afiorot@c2atlassrv301 ~]$ zgrep atlspecial /var/log/castor/stagerd.log-201412* | grep atnight | tr ' ' '\n' | grep Filename | sort | uniq -c | sort -nr
[afiorot@c2atlassrv301 ~]$ zgrep atlspecial /var/log/castor/stagerd.log-201412* | grep atnight | tr ' ' '\n' | grep Type | sort | uniq -c | sort -nr

How to see the mapping of a user:
[afiorot@c2atlassrv301 ~]$ grep atlas003 /etc/grid-security/grid-mapfile

How to add a machine in Puppet:
echo "c2repack-2.cern.ch 128.142.37.94 C8-60-00-1B-D8-D6" | ai-foreman addhost -o "SLC 6.7" --architecture "x86_64" --hostgroup "castor/c2repack/headnode" --environment "production" --ptable "Castor_MDRaid1_SystemOnSmallestDisks" -m "SLC"

How to check the list of files present in a CASTOR pool:
/afs/cern.ch/project/castor/www/DiskPoolDump/lhcb.lhcbdisk.last.gz

How to create a directory in EOS and list it:
- eos mkdir /eos/castorlhcb-decommission/lhcbt3.remains
- eos ls -l /eos/castorlhcb-decommission/
drwxrwsr-+ 1 afiorot c3 1 Jun 03 17:17 castor
drwxrwsr-+ 1 root root 2 Sep 14 15:31 lhcbdisk.remains
drwxrwsr-+ 1 root root 0 Sep 16 14:08 lhcbt3.remains

How to sum a list of numbers:
...
17268959
942202576
289531559
[afiorot@wlustagemanu ~]$ awk '{ sum += $1 } END { print sum }' /afs/cern.ch/user/a/afiorot/list.txt
11780693487319

LEMON METRICS:
Exceptions:
If there is an unusual exception, search for it with --> ls /etc/lemon/agent/metrics/ | grep [exception], then take its ID and look it up in https://metricmgr.cern.ch

How to read a metric:
To understand how a notification works, e.g. exception.packetsDropped:
find the notification in metricmgr.cern.ch --> look at the correlation, which in this case is:
Correlation: ((13367:1 eq 'interface') || (13367:3 > 300 ) || (13367:5 > 300))
The correlation explains how the exception works and which counters are taken into consideration.
13367:1 → the first number is the metric ID, while the second is the index of the field within that metric (have a look at the Metrics page in metricmgr.cern.ch → open the Metric class for details).
The correlation takes into account 3 fields (1, 3, 5), which correspond to:
1 InterfaceName
3 NumReceivedDroppedLastInterval
5 NumTransmittedDroppedLastInterval
At this point we know what the correlation means: if "NumReceivedDroppedLastInterval" or "NumTransmittedDroppedLastInterval" is greater than 300 during a 5-minute interval, the notification is raised.
- To check that the interval is 5 min → log in to metricmgr → open the notification → Edit → Period (300 sec in this case)

How to disable a metric temporarily:
lemon-host-check -d ID --duration=2678400
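The threshold check described under "How to read a metric", and the --duration value above, can be reproduced with a small Bash sketch. It is not how LEMON evaluates correlations: it only mirrors the two counter comparisons (fields 3 and 5 against 300) and leaves out the InterfaceName term. The function names are hypothetical, and the sketch assumes that --duration is expressed in seconds (2678400 = 31 × 86400, i.e. 31 days).

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only (not part of LEMON).

# packets_dropped_alert RX_DROPPED TX_DROPPED
# Mirrors (13367:3 > 300) || (13367:5 > 300): prints "raise" if either
# dropped-packet counter exceeds the threshold, otherwise "ok".
packets_dropped_alert() {
    local rx_dropped=$1 tx_dropped=$2 threshold=300
    if (( rx_dropped > threshold || tx_dropped > threshold )); then
        echo "raise"
    else
        echo "ok"
    fi
}

# days_to_seconds DAYS
# Converts days to seconds, assuming lemon-host-check --duration uses seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 3600 ))
}

packets_dropped_alert 120 450   # prints: raise
packets_dropped_alert 120 80    # prints: ok
days_to_seconds 31              # prints: 2678400
```

Under the same assumption, disabling a metric for one week would use --duration=604800 (7 × 86400).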