Table of Contents

Transferring Files & Data
    File Transfer: Overview
    Verifying Files Transferred to the Lou Mass Storage System

Using Shift for Local and Remote Transfers (Recommended)
    Shift Transfer Tool Overview
    Checking Shift Transfer Status and Restarting Transfers
    Shift Command Options
    Using Shift for Local Transfers and Tar Operations
    Using Shift for Transfers and Tar Operations Between Two NAS Hosts
    Using Shift for Remote Transfers and Tar Operations

Local Transfers
    Checking File Integrity
    Local Commands
    Shift Transfer Tool Overview

Remote Transfers
    Remote File Transfer Commands
    Checking File Integrity
    Using GPG to Encrypt Your Data
    Shift Transfer Tool Overview
    Using bbFTP and bbSCP for Remote Transfers
        The bbSCP Script
        Using bbSCP for Test and Verification
        Using bbFTP for Remote File Transfers
    Using SUP for Remote Transfers
        Using the Secure Unattended Proxy (SUP)
    Advanced SUP Use
        Using the SUP Virtual File System
        Using the SUP without the SUP Client
    Examples
        Inbound File Transfer Through SFEs: Examples
        Outbound File Transfer Examples

Optimizing/Troubleshooting
    Increasing File Transfer Rates
    Dealing with Slow File Retrieval
    TCP Performance Tuning for WAN Transfers
    Optional Advanced Tuning for Linux
    Streamlining PBS Job File Transfers to Lou
    File Transfers Tips
    Troubleshooting SCP File Transfer Failure with Protocol Error

Transferring Files & Data

File Transfer: Overview

Here is a general overview of the various file transfer scenarios within the NAS environment, with pointers to related articles.

File Transfers Between NAS Hosts

For basic information about transferring files within the NAS secure enclave, see the following articles:

• Local File Transfer Commands - cp, cxfscp, mcp, shiftc
• Remote File Transfer Commands - scp, bbftp/bbscp, shiftc
• Pleiades Front-End Usage Guidelines - file transfer between the compute systems or mass storage systems
• Streamlining File Transfers from Pleiades Compute Nodes to Lou - within a PBS job
• Checking File Integrity - to ensure the integrity of the data before and after the transfer

File Transfers Between a NAS Host and Your Local Host

Transferring files between a NAS host (such as Pleiades, Endeavour, or Lou) and a remote host, such as your local desktop, is complex. There are multiple factors that you should be aware of:

Which Commands to Use

Remote file transfer commands such as scp, bbftp, bbscp, and sup/shiftc are supported on most NAS systems. Depending on how the transfers are performed, you may need the client software, the server software, or both for SCP, bbFTP, the bbSCP script, and/or the SUP/Shift client installed on your local host.

Transfer Rates

File transfer rates with the scp command, especially using SCP from versions of OpenSSH older than 4.7, can be as slow as 2 megabytes per second (MB/sec). For transferring large files over a long distance, consider doing the following:

• Upgrade to the latest version of OpenSSH.
• Enable compression by adding -C to the scp command line if the data will compress well.
• Use bbftp/bbscp and/or sup/shiftc.
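To put the 2 MB/sec figure in perspective, the following minimal sketch estimates how long a transfer takes at a constant rate. The 100 GB file size and the 100 MB/sec comparison rate are hypothetical values chosen for illustration, not measurements from any NAS system, and decimal units (1 MB = 1,000,000 bytes) are assumed.

```python
def transfer_seconds(size_bytes, rate_bytes_per_sec):
    """Return the time in seconds to move size_bytes at a constant rate."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_sec


MB = 1_000_000          # decimal megabyte (assumption for this sketch)
GB = 1_000 * MB

file_size = 100 * GB    # hypothetical file size
slow_rate = 2 * MB      # the 2 MB/sec scp rate mentioned above
fast_rate = 100 * MB    # hypothetical improved rate, not a measured value

slow_hours = transfer_seconds(file_size, slow_rate) / 3600
fast_minutes = transfer_seconds(file_size, fast_rate) / 60

print(f"at 2 MB/sec:   {slow_hours:.1f} hours")
print(f"at 100 MB/sec: {fast_minutes:.1f} minutes")
```

Under these assumptions the slow transfer takes roughly 13.9 hours, while the faster one takes under 17 minutes, which is why the tuning steps in the list above matter for large files.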

Security Issues

• With SCP, your authentication information (such as password or passcode) and data are encrypted.
• With bbFTP and bbSCP, only the authentication information is encrypted, while data is not.
• You can use the SFEs or Secure Unattended Proxy (SUP). SUP is the easiest way to transfer files from and/or to your site if your local system is configured to allow the transfer.

Inbound File Transfers

When a file transfer command is initiated on a remote host, such as your local desktop, the transfer must go through either the SFEs or SUP.

Using the Secure Front Ends

Going through the SFEs requires authentication via RSA SecurID at the time of the transfer. You will be prompted for your RSA SecurID passcode when you issue the file transfer commands (such as scp, bbftp, or bbscp).

You can use one of the following approaches:

1. One step via SSH Passthrough: Initiate scp, bbftp/bbscp from your local system to Pleiades, Endeavour, or Lou if SSH Passthrough has been set up.
2. One step via SSH ProxyCommand: Initiate scp from your local system through an SFE using the ssh -oProxyCommand option.

To learn more, see Inbound File Transfers through SFEs Examples.

Using the Secure Unattended Proxy

Going through the SUP does not require RSA SecurID token authentication at the time of transfer. Instead, special "SUP keys" using RSA SecurID authentication must be obtained ahead of time. The SUP keys are good for one week and are used automatically to authenticate your file transfers using scp, bbftp, or bbscp issued on a command line or in a job script.

TIP: We highly recommend that you learn about and use the Shift tool, which can be used together with the SUP to provide automated, reliable, and fast file transfers.

WARNING: Although users have accounts on the SUP servers, no login session is allowed.

Going through the SUP offers multiple benefits over going through the SFEs:

• SUP allows transfers to be unattended; that is, you do not have to type in your password, passphrase, or passcode when the file transfer command is issued. File transfers can therefore be done within a script that is scheduled to run ahead of time. File transfers through the SFEs cannot be done in a script.
• File transfers through SUP are done in one step, and setting up SSH passthrough is not needed since the SFEs are not involved.
• SUP automatically sets some options, such as the port range allowed for bbFTP transfers, so that you don't have to set them explicitly. As a result, the syntax for bbFTP over SUP is much simpler than for bbFTP without SUP.

TIP: Some sites only allow specific outbound ports; SUP allows setting custom ports manually if needed. For example:

sup bbftp -E 'bbftpd -e50000:51000' -e 'put foobar /tmp/foobar' pfe21.nas.nasa.gov

See Using the Secure Unattended Proxy (SUP) and Shift Transfer Tool Overview for more information.

NAS Username and Your Local Username

If your NAS username and local username are different, you may have to add the appropriate username in the scp, bbftp, bbscp, or sup/shiftc command line.

• If you issue the command on your local host, then username is your NAS username.
• If you issue the command on a NAS host, then username is your local username.

The examples in Outbound File Transfer Examples and Inbound File Transfers through SFEs Examples show the correct syntax for adding the appropriate username to the file transfer commands.

For inbound file transfers, if you have correctly included your NAS username in the ~/.ssh/config file of your local system, you do not have to include your NAS username in the scp, bbftp, bbscp, or sup/shiftc command. A template for ~/.ssh/config is available for download.

Improving File Transfer Performance

Some file transfer commands provide options that can be used to improve your transfer rates. For example, enabling compression during file transfers may help in some cases; with the bbftp command, you can use multiple streams instead of a single stream for better performance. Read Tips for File Transfers and Increasing File Transfer Rates for more information.

File transfer performance is also dependent on some system-wide settings. If necessary, ask your local system administrator to look into issues discussed in the following articles:

• TCP Performance Tuning for WAN Transfers
• Optional Advanced Tuning for Linux
• Pittsburgh Supercomputing Center's Enabling High Performance Data Transfers - a properly tuned TCP/IP stack

The NAS Networks team can help you analyze the performance of your file transfers. Contact us at [email protected].

Verifying Files Transferred to the Lou Mass Storage System

It is a good practice to confirm whether files are copied correctly to the Lou mass storage system after you transfer them.

TIP: The easiest way to verify the integrity of your file contents is to use the NAS-developed Shift tool to transfer the files. By default, Shift automatically performs a checksum operation on the data at both the source and the destination, as part of the transfer. If corruption is detected, partial file transfers and checksum operations will be performed until the problem is fixed.

If you transfer files using a command other than Shift, such as scp, the simplest and most lightweight method to verify transferred files is to compare the size (disk usage) and number of files of the original with those of the copy. For example, if you use scp to transfer dir1 from pfe21 to lfe5:

pfe21% du -sk dir1
353760 dir1
pfe21% find dir1 | wc -l
51
pfe21% scp -rp dir1 lfe5:

lfe5% du -sk --apparent-size dir1
353684 dir1
lfe5% find dir1 | wc -l
51

In the example, the sizes nearly match (353760; 353684), and the number of files matches exactly (51).

Note: In most cases the sizes will not match exactly. Using the --apparent-size option of the du command is necessary on the Lou systems because the data may reside on either tape or disk.
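The same size-and-count comparison can be scripted. The sketch below is a rough Python equivalent of the du and find commands in the example; the function names and the 1% size tolerance are assumptions made for illustration, and the size calculation only approximates what du --apparent-size reports because it sums regular file lengths and ignores directory entries.

```python
import os


def tree_summary(root):
    """Return (entry_count, apparent_size_kb) for a directory tree.

    entry_count includes the root, every subdirectory, and every file,
    which is what `find root | wc -l` counts.
    apparent_size_kb sums the byte lengths of regular files only, so it
    approximates (does not reproduce) `du -sk --apparent-size`.
    """
    entries = 1  # the root directory itself
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return entries, (total_bytes + 1023) // 1024


def looks_consistent(src_summary, dst_summary, tolerance=0.01):
    """Counts must match exactly; sizes may differ by a small fraction.

    The 1% default tolerance is an arbitrary choice for this sketch.
    """
    src_count, src_kb = src_summary
    dst_count, dst_kb = dst_summary
    if src_count != dst_count:
        return False
    largest = max(src_kb, dst_kb, 1)
    return abs(src_kb - dst_kb) / largest <= tolerance


# Numbers from the example above: pfe21 (original) and lfe5 (copy).
print(looks_consistent((51, 353760), (51, 353684)))
```

In practice, tree_summary would be run once on each host and the two results compared. A passing check only indicates that nothing obvious is missing; it does not verify file contents the way a checksum does.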

Using Shift for Local and Remote Transfers (Recommended)

Shift Transfer Tool Overview

The NAS-developed Shift tool can copy files locally on NAS enclave hosts, transfer files between hosts inside the NAS enclave, and transfer files between the NAS enclave and remote hosts. You can also use Shift to check the status of transfers at any time, receive email notification of completion, errors, and warnings, and restart interrupted transfers or transfers with errors.

All functionality is accessed through the Shift client, which is invoked via the shiftc command. The syntax for shiftc is similar to the syntax for the cp and scp commands.

Shift is the recommended method for transferring files to and from the Lou mass storage system, as it can create tar files as part of the transfer and split a transfer into multiple tar files for oversized directories (larger than 1 TB).

Advanced Features

Shift includes the following advanced features:

• Automatic parallelization of transfers
• Local and remote tar creation and extraction
• Synchronization based on modification times and checksums (similar to rsync)
• Automatic file integrity verification and correction
• Ability to restart transfers
• Automatic retrieval of files from tape storage (DMF-managed Lou filesystems)
• Ability to check status of current transfers

How to Use Shift

See the following articles for detailed information about how to use Shift:

• Using Shift for Local Transfers and Tar Operations
• Using Shift for Transfers and Tar Operations Between Two NAS Hosts
• Using Shift for Remote Transfers and Tar Operations
• Checking Shift Transfer Status and Restarting Transfers
• Shift Command Options

Additional Resources

You can also see presentation slides for three HECC training webinars that demonstrate how to use the tool:

• Simplifying and Optimizing Your Data Transfers (PDF)
• Advanced Features of the Shift Automated File Transfer Tool (PDF)
• Simple Automated File Transfers Using SUP and Shift (PDF)

Recordings of each presentation are also available in the Past Webinars Archive.

Note: Some hostnames and options may have changed since the webinars were presented.

Checking Shift Transfer Status and Restarting Transfers

Shift can provide status on transfers currently in progress, and can report results of transfers that were performed within the past week. You can restart and complete any interrupted transfers or transfers that completed with errors.

Note: Transfers that have been inactive for more than a week cannot be restarted.

• Check transfer status from a NAS high-end computing host (for example, pfe21):

pfe21% shiftc --status

• Check transfer status from a remote host (for example, your local system):

your_local_system% sup shiftc --status

• Restart a transfer that has errors or was interrupted:

pfe21% shiftc --restart --id=transfer_id

Checking Transfer Status

Show the status of all transfers:

pfe21% shiftc --status
id | state | dirs | files | file size     | date  | run  | rate
   |       | sums | attrs | sum size      | time  | left |
---+-------+------+-------+---------------+-------+------+---------
1  | done  | 0/0  | 1/1   | 92KB/92KB     | 10/03 | 2s   | 46KB/s
   |       | 0/0  | 0/0   | 0.0B/0.0B     | 17:06 |      |
2  | done  | 0/0  | 1/1   | 92KB/92KB     | 10/03 | 8s   | 11.5KB/s
   |       | 0/0  | 1/1   | 0.0B/0.0B     | 17:06 |      |
3  | done  | 1/1  | 2/2   | 99KB/99KB     | 10/03 | 1s   | 99KB/s
   |       | 4/4  | 0/0   | 198KB/198KB   | 17:07 |      |
4  | error | 1/1  | 1/2   | 92KB/99KB     | 10/03 | 3s   | 30.7KB/s
   |       | 0/0  | 0/0   | 0.0B/0.0B     | 17:08 |      |
5  | done  | 1/1  | 64/64 | 65.5GB/65.5GB | 10/03 | 29s  | 2.26GB/s
   |       | 0/0  | 0/0   | 0.0B/0.0B     | 17:09 |      |
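For scripted monitoring, a status listing like the one above can be turned into records. The following sketch assumes the layout shown in the example: two header lines, a separator line beginning with "---", and exactly two pipe-delimited lines per transfer. That layout is inferred from the sample output only, so the real output of a given Shift version may differ. The function name, key names, and the shortened SAMPLE text are illustrative.

```python
# Column names taken from the two header lines of the example listing.
FIRST_LINE_KEYS = ("id", "state", "dirs", "files",
                   "file_size", "date", "run", "rate")
SECOND_LINE_KEYS = (None, None, "sums", "attrs",
                    "sum_size", "time", "left", None)

# Shortened copy of the example output (transfers 1 and 4 only).
SAMPLE = """\
id | state | dirs | files | file size     | date  | run  | rate
   |       | sums | attrs | sum size      | time  | left |
---+-------+------+-------+---------------+-------+------+---------
1  | done  | 0/0  | 1/1   | 92KB/92KB     | 10/03 | 2s   | 46KB/s
   |       | 0/0  | 0/0   | 0.0B/0.0B     | 17:06 |      |
4  | error | 1/1  | 1/2   | 92KB/99KB     | 10/03 | 3s   | 30.7KB/s
   |       | 0/0  | 0/0   | 0.0B/0.0B     | 17:08 |      |
"""


def parse_status(text):
    """Parse a two-lines-per-transfer status listing into dictionaries."""
    body = []
    past_separator = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("---"):
            past_separator = True
            continue
        if past_separator:
            body.append(line)

    transfers = []
    # Pair each transfer's first line with its continuation line.
    for first, second in zip(body[0::2], body[1::2]):
        record = {}
        for key, cell in zip(FIRST_LINE_KEYS, first.split("|")):
            record[key] = cell.strip()
        for key, cell in zip(SECOND_LINE_KEYS, second.split("|")):
            if key is not None:
                record[key] = cell.strip()
        transfers.append(record)
    return transfers


failed = [t["id"] for t in parse_status(SAMPLE) if t["state"] == "error"]
print(failed)
```

The resulting list of failed transfer identifiers could then be passed, one at a time, to shiftc --restart --id=... as described earlier in this article.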

Showing Detailed Status

The examples in this section show how to display details about specific transfers that are listed in the output in the previous example.

TIP: Showing the detailed status of a single transfer might result in extremely large output, as a single transfer may contain hundreds, thousands, or even millions of transferred files, and multiple lines may be output for each file. Therefore, if you want to see the detailed status of a very large transfer, redirect the output to a file. For example:

shiftc --status --id=5 > status.id5

You can also search for files matching a given pattern, and show only their status. For example:

shiftc --status --id=1 --search=file2

Show the detailed status of all operations in transfer #2, from your local system:

your_local_system% sup shiftc --status --id=2
state | op     | target              | size | date  | run  | rate
      | tool   | info                |      | time  | left |
------+--------+---------------------+------+-------+------+-------
done  | cp     | lfe2:/u/user1/file1 | 92KB | 10/03 | 5s   | 18KB/s
      | bbftp  | -                   |      | 17:06 |      |
done  | chattr | lfe2:/u/user1/file1 | -    | 10/03 | 1s   | -
      | sftp   | -                   |      | 17:06 |      |

Show the detailed status of all operations in transfer #3 that involve a filename containing file2:

pfe21% shiftc --status --id=3 --search=file2
state | op   | target               | size | date  | run  | rate
      | tool | info                 |      | time  | left |
------+------+----------------------+------+-------+------+------
done  | cp   | /username/dir2/file2 | 7KB  | 10/03 | 1s   | 7KB/s
      | mcp  | -                    |      | 17:07 |      |
done  |      | /username/dir2/file2 | 7KB  | 10/03 | 1s   | 7KB/s
      | msum | -                    |      | 17:07 |      |

Show the detailed status of all operations in transfer #4 that have an error state:

pfe21% shiftc --status --id=4 --state=error
state | op    | target                  | size | date | run  | rate
      | tool  | info                    |      | time | left |
------+-------+-------------------------+------+------+------+-----
error | cp    | /username/dir2/file2    | 7KB  | -    | -    | -
      | rsync | rsync: send_files       |      |      |      |
      |       | failed to open          |      |      |      |
      |       | "/username/dir2/file2": |      |      |      |
      |       | Permission denied       |      |      |      |

Showing Transfer History

Show the history of all transfers:

pfe21% shiftc --history
id | origin             | command
---+--------------------+---------------------------------------------------------------------
1  | pfe21.nas.nasa.gov | shiftc file1 /u/username/dir1
   | [/u/user1]         |
2  | pfe21.nas.nasa.gov | shiftc -p file1 lfe2:
   | [/u/user1]         |
3  | your_local_system  | sup shiftc -r --no-verify /username/dir1 lfe2:/username/dir2
   | [/Users/user1]     |
4  | your_local_system  | sup shiftc -r --secure lfe2:/username/dir2 .
   | [/username]        |
5  | pfe21.nas.nasa.gov | shiftc -r --hosts=2 /nobackup/user1/bigdir1 /nobackup/user1/bigdir2
   | [/u/user1]         |

Show the history of all transfers that involve a host or a command containing lfe2 from your local system:

your_local_system% sup shiftc --history --search=lfe2
id | origin             | command
---+--------------------+------------------------------------------
2  | pfe21.nas.nasa.gov | shiftc -p file1 lfe2:
   | [/u/user1]         |
4  | your_local_system  | shiftc -r --secure lfe2:/username/dir2 .
   | [/username]        |

Shift Command Options

Options for the shiftc command are described in this section. For more information, including syntax and examples, see man shiftc on any Pleiades or Lou front-end system (PFE or LFE).

Initialization Options

Transfers are initialized using syntax identical to cp and scp commands for local and remote transfers, respectively.

--clients=NUM
Parallelize the transfer by using additional clients on each host. If the number given is 1, no additional clients will be used. A number greater than 1 will fork additional processes on each host to more fully utilize system resources and improve transfer performance.

--create-tar
Create a tar file of all sources at the destination, which must be a non-existing filename. This option implies --recursive and --no-offline. By default, multiple tar files are created at 1 TB boundaries. The split size may be changed or splitting disabled using the --split-tar option. The --index-tar option may be used to produce a table of contents file for each tar file created. Note that this option cannot be used with --sync.

-L, --dereference
Always follow symbolic links to both files and directories. Note that this can result in file and directory duplication at the destination as all symbolic links will become real files and directories.

-d, --directory
Create any missing parent directories. This option allows files to be transferred to a directory hierarchy that may not already exist, similar to the -d option of the install command.

--exclude=REGEX
Do not transfer source files matching the given regular expression. Note that regular expressions must be given in Perl syntax (for details, see perlre(1) on The Perl Foundation website) and should be quoted on the command line when including characters normally expanded by the shell (such as "*"). Shell wildcard behavior can be approximated by using ".*" in place of "*".

--extract-tar
Extract all source tar files to the destination, which must be an existing directory or non-existing directory name. This option implies --no-offline. Note that only tar archives in the POSIX ustar format are supported, but GNU extensions for large UIDs, GIDs, file sizes, and filenames are handled appropriately. Also note that this option cannot be used with --sync.

-f, --force
Overwrite existing read-only files at the destination by temporarily adding owner write permission. File permissions will be restored later in the transfer. Note, however, that if the transfer does not complete successfully, files may be left with the wrong permissions. Also note that files marked as immutable using chattr +i cannot be overwritten even when this option is in effect.

--host-file=FILE
Parallelize the transfer by using additional clients on the hosts specified in the given file (one hostname per line). This option implies a value for the --hosts option that is equal to the number of hosts in the file plus any additional hosts from the --host-list option. Fewer hosts may be used by explicitly specifying a value for the --hosts option. Note that the actual number of client hosts used will depend on the number of hosts that have equivalent access to the source and/or destination filesystems. Within PBS job scripts, this option can be set to the $PBS_NODEFILE variable to use all nodes of the job.

--host-list=LIST
Parallelize the transfer by using additional clients on the hosts specified in the given comma-separated list. This option implies a value for the --hosts option that is equal to the number of hosts on the list plus any additional hosts from the --host-file option. Fewer hosts may be used by explicitly specifying a value for the --hosts option. Note that the actual number of client hosts used will depend on the number of hosts that have equivalent access to the source and/or destination filesystems.

--hosts=NUM
Parallelize the transfer by using additional clients on (at most) the given number of hosts. If the number given is 1, no additional clients will be used. A number greater than 1 enables automatic transfer parallelization where additional clients may be invoked on additional hosts to increase transfer performance. Note that the actual number of clients used will depend on the number of hosts for which Shift has filesystem information and the number of hosts that have equivalent access to the source and/or destination filesystems. Client hosts will be accessed as the current user with host-based authentication or an existing ssh agent that contains an ssh key from a file matching ~/.ssh/id*.

--identity=FILE
Authenticate to remote systems using the given ssh identity file. The corresponding public key must reside in the appropriate user's ~/.ssh/authorized_keys file on the remote host. Note that only identity files without passphrases are supported. If a passphrase is required, an ssh agent may be used instead, but with a loss of reliability. This option is not needed if the remote host accepts host-based authentication from client hosts.

-I, --ignore-times
By default, the --sync option skips the processing of files that have the same size and modification time at the source and destination. This option specifies that files should always be processed by checksum regardless of size and modification time.

--include=REGEX
Only transfer source files matching the given regular expression. Note that regular expressions must be given in Perl syntax (see perlre(1) on the Perl Foundation website for details) and should be quoted on the command line when including characters normally expanded by the shell (such as "*"). Shell wildcard behavior can be approximated by using ".*" in place of "*".

--index-tar
Create a table-of-contents file for each tar file created with the --create-tar option. The table of contents will show each file in the tar file along with permissions, user/group ownership, and size. For a tar file "file.tar", the table of contents will be named "file.tar.toc". Unless the --no-verify option is used, a checksum file named "file.tar.sum" will also be created, which is suitable as input for msum --check-tree -c. Note that when the --split-tar option is used, multiple table-of-contents files may be created. For each split tar file "file.tar-i.tar", the table of contents will be named "file.tar-i.tar.toc" and the checksum file will be named "file.tar-i.tar.sum".

--newer=[TYPE:]DATE
Only transfer source files whose modification time (or combination of modification, access, and/or creation times) is newer (inclusive) than the given date. Any date string supported by the Perl Date::Parse module (see Date::Parse(3) for details) can be specified. An optional type expression of the form "[acmACM]+(|[acmACM]+)*" can be given to specify conditions in which one or more times are or are not newer than the date, where: "a" is access time; "c" is creation time; "m" is modification time; and "A", "C", and "M" are their inverses, respectively. For example, "aM|cm" would transfer source files whose access time was newer than the date but whose modification time was not newer, or files whose creation time and modification time were both newer. Note that this option can be combined with --older to specify exact date ranges.

-P, --no-dereference
Never follow symbolic links to files or directories. Note that this can result in broken links at the destination, as files and directories referenced by symbolic links that were not explicitly transferred or implicitly transferred using --recursive might not exist on the target.

-T, --no-target-directory
Do not treat the destination specially when it is a directory or a symbolic link to a directory. This option can be used with recursive transfers to copy a directory's contents into an existing directory instead of into a new subdirectory beneath it as is done by default.

--older=[TYPE:]DATE
Only transfer source files whose modification time (or combination of modification, access, and/or creation times) is older than the given date. Any date string supported by the Perl Date::Parse module (see Date::Parse(3) for details) can be specified. An optional type expression of the form "[acmACM]+(|[acmACM]+)*" can be given to specify conditions in which one or more times are or are not older than the date, where: "a" is access time; "c" is creation time; "m" is modification time; and "A", "C", and "M" are their inverses, respectively. For example, "aM|cm" would transfer source files whose access time was older than the date but whose modification time was not older, or files whose creation time and modification time were both older. Note that this option can be combined with --newer to specify exact date ranges.

--pipeline
Produce verified files earlier in the transfer by preferring to process the normal sequence of operations (find, copy, checksum, verify checksum, change attributes) in reverse order. In default (non-pipeline) operation, these stages are performed in order: all files are found before any are copied, all files are copied before any are checksummed, and so on. When this option is enabled, files that have reached the change attribute stage will be processed before files that have reached the verify checksum stage, which will be processed before files that have reached the checksum stage, etc. This allows you to perform parallel processing on verified files while the transfer is still ongoing. To determine the list of files that have been successfully verified in a transfer with id "N", use --status --id=N --state=done --search=chattr. When multiple clients are participating in the transfer (i.e., --clients or --hosts options are specified with a value greater than 1), different clients will prefer different stages for more overlap of reads and writes between the source and destination filesystems. Note that while several strategies are employed to ensure that checksums are computed from disk and not from cache, it is safest to use this option only when there is actually a need to process destination files during the transfer.

--ports=NUM1:NUM2
Use ports from the range NUM1-NUM2 for the data streams of TCP-based transports (currently, bbcp, bbftp, fish-tcp, and gridftp). All connections originate from the client host so the given port range must be allowed on the network to the remote host and by the remote host itself.

-R, -r, --recursive
Transfer directories recursively. This option implies --no-dereference. Note that any symbolic links pointing to directories that are given on the command line will be followed during recursive transfers (identical to the default behavior of the cp command).

--secure
Encrypt data during remote transfers and use secure ciphers and MACs with SSH-based transports. Note that this option will, in most cases, decrease performance as it eliminates some higher performance transports and increases CPU utilization during SSH connections.

--sync
Synchronize files between the source and destination, similar to the rsync command. By default, files that have the same size and modification time at the source and destination will not be transferred. If the size or modification time of a file differs between the two, the contents of the file will be compared via checksum and any portions that differ will be transferred to the destination. To skip the size and modification time checks and always begin with the checksum stage, use -I or --ignore-times. If the --no-verify option is specified, integrity verification is not performed; this will increase performance when there are many files at the source that are not at the destination, but will decrease performance when there are large files that have only small changes between the source and destination. Setting the --retry option to zero with this option can be used to show which files differ without making any changes. Note that when syncing directories, the destination should be specified as the parent of the location where the source directory should be transferred to. Also note that this option cannot be used with the --create-tar or --extract-tar options.

--user=USER
Set the user that will be used to access remote systems.

--wait
Block until the transfer completes and print a summary of the transfer. This option implies --no-mail. An exit value of 0 indicates that the transfer has successfully completed while an exit value of 1 indicates that the transfer has failed or that the waiting process was terminated prematurely. This option may be used together with --monitor to show the real-time status of the transfer while waiting.
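The type expressions accepted by --newer and --older can be hard to read at first. The sketch below models how an expression such as "aM|cm" is evaluated for --newer, based only on the description above: alternatives separated by "|" are ORed, the letters within one alternative are ANDed, lowercase letters require the time to be newer (inclusive) than the date, and uppercase letters require the opposite. It is an illustration of the documented semantics, not Shift's actual implementation; the function name and the sample timestamps are hypothetical, and the date parsing done by Date::Parse is not reproduced.

```python
import re
from datetime import datetime

# Same shape as the type expression described for --newer and --older.
TYPE_PATTERN = re.compile(r"[acmACM]+(\|[acmACM]+)*")


def matches_newer(type_expr, times, cutoff):
    """Return True if a file's times satisfy a --newer style expression.

    times maps "a", "c", and "m" to datetime values (access, creation,
    and modification time); cutoff is the comparison date.
    """
    if not TYPE_PATTERN.fullmatch(type_expr):
        raise ValueError(f"invalid type expression: {type_expr!r}")

    def holds(letter):
        newer = times[letter.lower()] >= cutoff  # "newer" is inclusive
        return newer if letter.islower() else not newer

    # OR across "|" alternatives, AND across letters in each alternative.
    return any(all(holds(ch) for ch in alternative)
               for alternative in type_expr.split("|"))


cutoff = datetime(2023, 10, 1)
old, new = datetime(2023, 9, 1), datetime(2023, 10, 3)

read_only_recently = {"a": new, "c": old, "m": old}   # satisfies "aM"
recently_created = {"a": old, "c": new, "m": new}     # satisfies "cm"
recently_modified = {"a": new, "c": old, "m": new}    # satisfies neither

for label, times in [("read_only_recently", read_only_recently),
                     ("recently_created", recently_created),
                     ("recently_modified", recently_modified)]:
    print(label, matches_newer("aM|cm", times, cutoff))
```

With no type expression, the options compare modification time only, which corresponds to the expression "m" in this model. For --older, the comparison direction would be reversed.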

Feature Disablement Options

The following options disable certain default features.

--no-cron
Do not attempt to recover from host/process failures via cron. Note that when such a failure occurs, the transfer will become stuck in the "run" state until stopped.

--no-mail[=LIST]
By default, emails are sent when a transfer completes successfully, aborts with errors, or is stopped; and for the first instances of alerts, errors, throttling, and/or warnings while running. This option prevents emails from being sent altogether or, optionally, for a specific subset of states. The given list may be a comma-separated subset of {alert, done, error, run, stop, throttle, warn}. This option may be desirable when performing a large number of scripted transfers. Note that equivalent transfer status and history information can always be manually retrieved using the --status and --history options, respectively.

--no-offline
By default, files transferred to and from DMF-managed filesystems will be migrated to offline media as soon as the transfer is completed. This option specifies that files should instead be kept online (not migrated). Note that DMF may still choose to migrate a file even when this option is enabled.

--no-preserve[=LIST]
By default, times, permissions, ownership, striping, ACLs, and extended attributes of transferred files and directories are preserved when possible. This option specifies that these items (or an optional specified subset) should not be preserved. The given list may be a comma-separated subset of {acl, mode, owner, stripe, time, xattr}. Note that permissions may be left in various states depending on the invoking user's umask and the transport utilized. In particular, read access at the destination may be more permissive than read access at the source.

--no-recall
By default, files transferred from DMF-managed filesystems will be recalled from offline media as soon as the transfer begins and again before each batch of files is processed. This option specifies that files should not be recalled. Note that DMF will still recall files as needed even when this option is enabled.

--no-sanity
Disable file existence and size checks at the end of the transfer. This option was included for benchmarking and completeness purposes and is not recommended for general use.

--no-silent
By default, the checksums of all files transferred with Shift are stored in a per-user database. When a file with a known checksum is transferred and has not been modified since the checksum was stored, the transfer will be put into the "alert" state if the current checksum does not match the stored checksum. This option disables the storage of checksums and comparison against existing checksums. While silent corruption detection adds minimal overhead during normal operation, it can increase the probability of lock contention when there are large numbers of clients.

--no-verify
By default, files are checksummed at the source and destination to verify that they have not been corrupted. If corruption is detected, the corrupted portion of the destination file is automatically corrected using a partial transfer from the original source. This functionality decreases the performance of transfers in proportion to the file size. If assurance of integrity is not required, the --no-verify option may be used to disable verification.

History, Management, and Status Options

Once one or more transfers have been initialized, you may view transfer history, stop/restart transfers, and/or check transfer status with the following options.

--history[=csv]
Show a brief history of all transfers including the transfer identifier, the origin host/directory, and the original command. When --history=csv is specified, history is shown in CSV format.

--id=NUM
Specify the transfer identifier to be used with management and status commands.

--last-sum
Query the silent corruption database for all files given on the command line and print (one file per line) the last known checksum, the file modification time associated with this checksum, and the filename. When --index-tar is given, the first file argument is assumed to be a tar file and the remaining arguments names of files within the tar for which checksum information will be printed. A checksum of "-" means that no information is stored for the file.

--mgr=HOST
Set the host that will be used to manage transfers. By default, this host will be accessed as the current user with host-based authentication or an existing ssh agent. The user and/or identity used to access the manager host may be changed with the --mgr-user and --mgr-identity options, respectively.

--mgr-identity=FILE
Authenticate to the manager host using the given ssh identity file. The corresponding public key must reside in the appropriate user's ~/.ssh/authorized_keys file on the manager host. Note that only identity files without passphrases are supported. If a passphrase is required, an ssh agent may be used instead, but with a loss of reliability. This option is not needed if the manager host accepts host-based authentication from client hosts.

--mgr-user=USER
Set the user that will be used to access the manager host. Note that if the transfer is initiated by root and the --mgr-identity option is not specified, manager communication will be performed as the given user, so that user must be authorized to run processes locally. In particular, care should be taken on PBS-controlled nodes, where the given user should either own the node or be on the user exception list.
--monitor[=FORMAT]
Show the realtime status of all running transfers including the transfer identifier, the current state, the number of directories completed, the number of files transferred, the number of files checksummed, the number of attributes preserved, the amount of data transferred, the amount of data checksummed, the time the transfer started, the duration of the transfer, the estimated time remaining in the transfer, and the rate of the transfer. Note that updates are realtime with respect to the information available to the manager and not with respect to the transports that may be carrying out the transfer. Status will be returned in CSV format when --monitor=csv is specified. Duration and estimated time will be zero-padded when --monitor=pad is specified. When --monitor=color is specified, transfers in the {error, run, throttle, warn} states will be shown with {red, green, magenta, yellow} coloring, respectively. When --id is specified, only the given transfer will be shown. When all transfers (or the one specified) have completed, the command will exit. This option may be used with the --wait option to monitor progress while waiting.

--plot[=[BY:]LIST]
Produce output suitable for piping into gnuplot (version 5 or above) that shows detailed performance over time across all transfers. The --id and --state options may be used to plot only a single transfer or transfers in a particular state, respectively. The default plot will show the aggregate performance of each I/O operation (such as cp, sum, and cksum) and the aggregate performance of each metadata operation (such as find, mkdir, ln, and chattr). I/O operations are plotted against the left y-axis while metadata operations are plotted against the right y-axis. The list of plotted items may be changed by giving a comma-separated list consisting of one or more of {chattr, cksum, cp, find, io, ln, meta, mkdir, sum}. Note that "io" is a shorthand for "cp,sum,cksum" and "meta" is a shorthand for "find,mkdir,ln,chattr". The list of items may be grouped by any of {host, id, user} by prefixing one of these terms to the list. For example, --plot=id:cp would show a curve for the copy performance of each transfer id. When a grouping is given without a specific list of metrics (for example, --plot=id), "io" is assumed.

--restart[=ignore]
Restart the transfer associated with the given --id that was stopped due to unrecoverable errors or stopped explicitly via the --stop option. If --restart=ignore is specified, all existing errors will be ignored and the transfer will progress as if the associated files and directories were no longer part of the transfer. Note that transfers must be restarted on the original client host or one that has equivalent filesystem access. A subset of the available command-line options may be re-specified during a restart, including --bandwidth, --buffer, --clients, --cpu, --disk, --files, --force, --host-file, --host-list, --hosts, --io, --ior, --iow, --local, --net, --netr, --netw, --no-cron, --no-mail, --no-offline, --no-recall, --pipeline, --ports, --preallocate, --remote, --retry, --secure, --size, --streams, --stripe, --threads, and --window.

--search=REGEX
When the --status and --id options are specified, this option will show the full status of file operations in the associated transfer whose source or destination filename match the given regular expression. When the --history option is specified, this option will show a brief history of the transfers in which the origin host or original command matches the given regular expression. Note that regular expressions must be given in Perl syntax (see perlre(1) for details).

--state=STATE
When the --status and --id options are specified, --state=STATE will show the full status of file operations in the associated transfer that have the given state. When --id is not specified, this option will show the brief status of transfers in the given state. Valid states are done, error, none, queue, run, and warn. A state of "none" will show a summary of the given transfer.

--stats[=csv]
Show stats across all transfers including transfer counts, rates, tool usage, initialization options, error counts, and error messages. When --stats=csv is specified, stats are shown in CSV format without error messages.
--status[=FORMAT]
Show a brief status of all transfers including the transfer identifier, the current state, the number of directories completed, the number of files transferred, the number of files checksummed, the number of attributes preserved, the amount of data transferred, the amount of data checksummed, the time the transfer started, the duration of the transfer, the estimated time remaining in the transfer, and the rate of the transfer. When the number of transfers exceeds a set threshold (the default is 20), older successfully completed transfers beyond that limit will be omitted for readability. These omitted transfers can be shown using --status with --state=done. Status will be returned in CSV format when --status=csv is specified. Duration and estimated time will be zero-padded when --status=pad is specified. When --status=color is specified, transfers in the {done, error, run, stop, throttle, warn} states will be shown with {default, red, green, cyan, magenta, yellow} coloring, respectively.
When --id is specified, --status[=FORMAT] will show the full status of every file operation in the associated transfer. For each operation, this includes the state, the type, the tool used for processing, the target path, associated information (error messages, checksums, byte ranges, and/or running host) when applicable, the size of the file, the time processing started, and the rate of the operation. Note that not all of these items will be applicable at all times (for example, "rate" will be empty if the state is "error"). Also note that operations are processed in batches, so the rate shown for a single operation will depend on the other operations processed in the same batch. When --status=color is specified, operations in the {done, error, queue, run, warn} states will be shown with {default, red, cyan, green, yellow} coloring, respectively.

--stop
Stop the transfer associated with the given --id. Note that transfer operations currently in progress will run to completion but new operations will not be processed. Stopped transfers may be restarted with the --restart option.
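The management options above are typically combined in a short sequence: inspect the failed operations of a transfer, stop it, then restart it. The following is a minimal dry-run sketch of that sequence. The helper names (show_errors, stop_transfer, restart_transfer) and the SHIFTC variable are hypothetical conveniences for this example; only the shiftc options themselves come from the descriptions above. Because SHIFTC is set to "echo shiftc", the commands are printed rather than executed.

```shell
# Dry-run sketch: SHIFTC is "echo shiftc", so each helper prints the command
# it would run. Set SHIFTC=shiftc on a NAS host to execute for real.
SHIFTC="echo shiftc"

# Show only the failed file operations of one transfer.
show_errors() { $SHIFTC --status --id="$1" --state=error; }

# Stop a transfer; operations already in progress still run to completion.
stop_transfer() { $SHIFTC --stop --id="$1"; }

# Restart a stopped transfer; pass "ignore" as $2 to skip existing errors.
restart_transfer() {
    if [ "$2" = "ignore" ]; then
        $SHIFTC --restart=ignore --id="$1"
    else
        $SHIFTC --restart --id="$1"
    fi
}

show_errors 2               # prints: shiftc --status --id=2 --state=error
stop_transfer 2             # prints: shiftc --stop --id=2
restart_transfer 2 ignore   # prints: shiftc --restart=ignore --id=2
```

Remember that a restart must be issued from the original client host or one with equivalent filesystem access.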

Transfer Tuning Options

Some advanced options are available to tune various aspects of the shiftc command's behavior. These options are not needed by most users.

--bandwidth=BITS
Choose the TCP window size and number of TCP streams of TCP-based transports (currently, bbcp, bbftp, fish-tcp, and gridftp) based on the given bits per second. The suffixes k, m, g, and t may be used for Kb, Mb, Gb, and Tb, respectively. The default bandwidth is estimated to be 10 Gb/s if a 10 GE adapter is found on the client host, 1 Gb/s if the client host can be resolved to an organization domain (by default, one of the six original generic top-level domains), and 100 Mb/s otherwise.

--buffer=SIZE
Use memory buffer(s) of the given size when configurable in the underlying transport being utilized (currently, all but rsync). The suffixes k, m, g, and t may be used for KiB, MiB, GiB, and TiB, respectively. The default buffer size is 4 MiB. Increasing the buffer size trades higher memory utilization for more efficient I/O.

--files=COUNT
Process transfers in batches of at least the given number of files. The suffixes k, m, b or g, and t may be used for 1E3, 1E6, 1E9, and 1E12, respectively. The default batch count is 1000 files. This option works in concert with --size and --interval to manage the number of checkpoints and the overhead of transfer management. A batch will initially consist of at least --files files or --size bytes, whichever is reached first. The batch may then be dynamically increased in size until there is enough work to span --interval seconds. To make batch selection completely dynamic, use --files=1 and --size=1.

--interval=SECS
Process transfers in batches that take about the given number of seconds. The default interval is 30 seconds. This option works in concert with --files and --size to manage the number of checkpoints, as well as the overhead of transfer management. A batch will initially consist of at least --files files or --size bytes, whichever is reached first. The batch may then be dynamically increased in size until there is enough work to span --interval seconds.
Note that the actual time a batch takes will depend on its contents, and that the interval will be increased as the number of clients participating in a transfer increases to minimize contention for manager locks. To make batch selection completely static, use --interval=0.

--local=LIST
Specify one or more local transports to be used for the transfer in order of preference, separated by commas. Valid transports for this option currently include bbcp, bbftp, cp, fish, fish-tcp, gridftp, mcp, and rsync. Note that the given transport(s) will be given priority, but may not be used in some cases (for example, rsync is not capable of transferring a specific portion of a file as needed by verification mode). In such cases, the default transport based on File::Copy will be used. The tool actually used for each file operation can be shown using the --status option with the --id option set to the given transfer identifier.

--preallocate=NUM
Preallocate files when their sparsity is under the given percent, where sparsity is defined as the number of bytes a file takes up on disk divided by its size. Note that this option will only have an effect when the fallocate command is available, the destination file does not already exist, and the target filesystem properly supports fallocate's -n option. Also note that this option will not function properly when either bbftp or rsync (to a DMF filesystem) is utilized as the transport due to their use of temporary files.

--remote=LIST
Specify one or more remote transports to be used for the transfer in order of preference, separated by commas. Valid transports for this option currently include bbcp, bbftp, fish, fish-tcp, gridftp, rsync, and sftp. Note that the given transport(s) will be given priority, but may not be used in some cases (for example, bbftp is not capable of transferring files with spaces in their names and is also incompatible with the --secure option). In such cases, the default transport based on sftp will be used. The tool actually used for each file operation can be shown using --status with the --id option set to the given transfer identifier.

--retry=NUM
Retry operations deemed recoverable up to the given number of attempts per file.
The default number of retries is 2. A value of zero disables retries. Note that disabling retries also disables the ability of --sync to change file contents. Also note that the given value is cumulative across all stages of a file's processing, so different stages may not be retried the same number of times.

--size=SIZE
Process transfers in batches of at least the given total file size. The suffixes k, m, g, and t may be used for KB, MB, GB, and TB, respectively. The default batch size is 4 GB. This option works in concert with --files and --interval to manage the number of checkpoints and the overhead of transfer management. A batch will initially consist of at least --size bytes or --files files, whichever is reached first. The batch may then be dynamically increased in size until there is enough work to span --interval seconds. To make batch selection completely dynamic, use --files=1 and --size=1.

--split=SIZE
Parallelize the processing of single files using chunks of the given size. The suffixes k, m, g, and t may be used for KiB, MiB, GiB, and TiB, respectively. The default split size is zero, which disables single file parallelization. A split size of less than 1 GiB is not recommended. Lowering the split size will increase parallelism but decrease the performance of each file chunk and increase the overhead of transfer management. Raising the split size will have the opposite effect. The ideal split size for a given file is the size of the file divided by the number of concurrent clients available. Note that --split=SIZE does not have an effect unless the value of the --hosts option is greater than 1. Also note that --split=SIZE can, in some cases, decrease remote transfer performance as it eliminates some higher performance transports.

--split-tar=SIZE
Create tar files of around the given size when used with the --create-tar option. When multiple tar files are created for a destination tar file "file.tar", the resulting split tar files will be named "file.tar-i.tar" starting from "file.tar-1.tar". The suffixes k, m, g, and t may be used for KB, MB, GB, and TB, respectively. The default split tar size is 1 TB. A value of zero disables splitting. A split tar size of greater than 2 TB is not recommended. Note that resulting tar files may still be larger than specified when source files exist that are larger than the given size.

--streams=NUM
Use the given number of TCP streams in TCP-based transports (currently, bbcp, bbftp, fish-tcp, and gridftp). The default is the number of streams necessary to fully utilize the specified/estimated bandwidth using the maximum TCP window size. Note that it is usually preferable to specify the --bandwidth option, which allows an appropriate number of streams to be set automatically. Increasing the number of streams can increase performance when the maximum window size is set too low or there is cross-traffic on the network, but too high a value can decrease performance due to increased congestion and packet loss.

--stripe=[CEXP][::[SEXP][::PEXP]]
By default, a file transferred to a Lustre filesystem will be striped according to an administrator-defined policy (one stripe per GiB when not configured). It is recommended, although not required, that this policy preserve existing striping when the source resides on Lustre and has non-default striping. To disregard existing striping, use --no-preserve=stripe. To disable automatic striping completely and use the default Lustre behavior for all files and directories, use --stripe=0. The user may override the default policy by specifying expressions for one or more of the stripe count (CEXP), stripe size (SEXP), and stripe pool (PEXP). For the stripe count, a positive number less than 65,536 indicates a fixed number of stripes to use for all destination files and directories. A greater number or size defined with the suffixes k, m, g, and t for KiB, MiB, GiB, and TiB, respectively, specifies that files will be allocated one stripe per given size while directories will be striped according to the default policy.
Finally, an arbitrary Perl expression (see perlsyn(1) for details) involving the constants NM, SZ, SC, and SS for source name, size, stripe count, and stripe size, respectively, may be specified to dynamically define the stripe count differently for every file and directory in the transfer. For example, the expression "NM =~ /foo/ ? 4 : (SZ < 10g ? 2g : 10g)" would set the stripe count of files whose name contains "foo" to 4, and the stripe count of files whose name does not contain "foo" to either one stripe per 2 GiB when the file size is less than 10 GiB or one stripe per 10 GiB otherwise. Striping behavior may be further refined by specifying a stripe size expression and/or Lustre pool name expression with similar conventions. The stripe count and/or stripe size can be left empty before the colons when specifying the stripe size or pool, respectively. For example, --stripe=::4m would specify the stripe size to be 4 MiB while using the default stripe count policy and, similarly, --stripe=::::pool1 would use the pool "pool1" while using the default stripe count and stripe size. Note that if the stripe pool is a Perl expression and not a simple alphanumeric pool name, pool names must use Perl conventions for indicating strings, such as quotes and/or quote-like operators, for example: "NM =~ /foo/ ? q(poolfoo) : q(poolbar)"

--threads=NUM
Use the given number of threads in multi-threaded transports and checksum utilities (currently, mcp and msum). The default number of threads is 4. Increasing the number of threads can increase transfer/checksum performance when a host has excess resource capacity, but can reduce performance when any associated resource has reached its maximum.

--verify-fast
By default, files are checksummed at the source and destination to verify that they have not been corrupted, with the source being read once during the copy and again during the checksum.
This option specifies that the source copy buffer should be reused when possible for the source checksum calculations. This potentially increases performance up to 33%, but does not allow bits corrupted during the initial read to be detected.

--window=SIZE
Use a TCP send/receive window of the given size in TCP-based transports (currently, bbcp, bbftp, fish-tcp, and gridftp). The suffixes k, m, g, and t may be used for KB, MB, GB, and TB, respectively. The default is the product of the specified/estimated bandwidth and the round-trip time between source and destination. Note that it is usually preferable to specify the --bandwidth option, which allows an appropriate window size to be set automatically. Increasing the window size allows TCP to operate more efficiently over high bandwidth and/or high latency networks, but too high a value can overrun the receiver and cause packet loss.
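The default --window value is described as the product of bandwidth and round-trip time, and the default --streams value as the number of streams needed to fill that bandwidth at the maximum window size. The arithmetic can be sketched as follows. This is only a worked illustration of those two statements, not Shift's internal calculation; the function names are hypothetical, and the 50 ms round-trip time and 16 MiB maximum window are assumed example values.

```shell
# window_bytes BITS_PER_SEC RTT_MS
# Bandwidth-delay product in bytes: (bits/s / 8) * (ms / 1000).
window_bytes() { echo $(( $1 / 8 * $2 / 1000 )); }

# streams_needed BDP_BYTES MAX_WINDOW_BYTES
# Smallest stream count whose combined windows cover the product (ceiling).
streams_needed() { echo $(( ($1 + $2 - 1) / $2 )); }

bdp=$(window_bytes 10000000000 50)      # 10 Gb/s link, 50 ms round trip (assumed)
echo "window: $bdp bytes"               # prints: window: 62500000 bytes
echo "streams: $(streams_needed "$bdp" 16777216)"   # prints: streams: 4
```

In practice, giving --bandwidth alone lets Shift derive both values, which is why the option descriptions recommend it over setting --window or --streams by hand.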

Transfer Throttling Options

Transfers can be throttled to prevent resource exhaustion when they reach configured thresholds for CPU, disk, I/O, and/or network utilization.

--cpu=NUM
Throttle the transfer when the local CPU usage reaches the specified percent of the total available. This option is disabled by default but may be desirable to prevent transfers from consuming too much of the local CPU. Once the given threshold is reached, a sleep period will be induced between each batch of files to achieve an average CPU utilization equal to the value specified. Note that this functionality is currently supported only on Unix-like systems.

--disk=NUM1:NUM2
Suspend/resume the transfer when the target filesystem disk usage reaches the specified percent of the total available. This option is disabled by default but may be desirable to prevent transfers from consuming too much local or remote disk space. Once the first threshold is reached, the transfer will suspend until enough disk resources have been freed on the target to bring the disk utilization under the second threshold. Note that this functionality is currently supported only on Unix-like systems.

--io=NUM
Throttle the transfer when the local I/O usage reaches the specified rate in MB/s. This option is disabled by default but may be desirable to prevent transfers from consuming too much of the local I/O bandwidth. Once the given threshold is reached, a sleep period will be induced between each batch of files to achieve an average I/O rate equal to the value specified.

--ior=NUM
Throttle the transfer when the local I/O reads reach the specified rate in MB/s. This option is similar to the --io option, but only applies to reads.

--iow=NUM
Throttle the transfer when the local I/O writes reach the specified rate in MB/s. This option is similar to the --io option, but only applies to writes.

--net=NUM
Throttle the transfer when the local network usage reaches the specified rate in MB/s. This option is disabled by default but may be desirable to prevent transfers from consuming too much of the local network bandwidth. Once the given threshold is reached, a sleep period will be induced between each batch of files to achieve an average network rate equal to the value specified.

--netr=NUM
Throttle the transfer when the local network reads reach the specified rate in MB/s. This option is similar to the --net option, but only applies to reads.

--netw=NUM
Throttle the transfer when the local network writes reach the specified rate in MB/s. This option is similar to the --net option, but only applies to writes.
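The two thresholds of --disk=NUM1:NUM2 form a suspend/resume band: the transfer suspends at the first value and resumes only once usage falls under the second. A minimal sketch of that behavior is shown below. It is a hypothetical model for illustration, not Shift's code; the function name next_state, the state labels, and the usage samples are all made up for this example.

```shell
# next_state CURRENT_STATE USAGE_PCT SUSPEND_AT RESUME_BELOW
# Models the --disk=NUM1:NUM2 band: suspend at NUM1, resume under NUM2.
next_state() {
    if [ "$1" = "run" ] && [ "$2" -ge "$3" ]; then
        echo "suspend"
    elif [ "$1" = "suspend" ] && [ "$2" -lt "$4" ]; then
        echo "run"
    else
        echo "$1"       # inside the band: keep the current state
    fi
}

# Walk a made-up series of disk usage samples through --disk=90:80.
state=run
for usage in 85 91 88 79 85; do
    state=$(next_state "$state" "$usage" 90 80)
    echo "usage=${usage}% -> $state"
done
# prints:
#   usage=85% -> run
#   usage=91% -> suspend
#   usage=88% -> suspend
#   usage=79% -> run
#   usage=85% -> run
```

Choosing a second threshold well below the first avoids rapid toggling when usage hovers near a single limit.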

Using Shift for Local Transfers and Tar Operations

For transfers on the same host within the NAS enclave—for example, between two directories on pfe21—the syntax for shiftc is similar to the cp command.

Note: If source and destination paths are not specified on the command line, any number of source/destination combinations will be read from the standard input, stdin (one combination per line).

The syntax for local Shift transfers is as follows:

shiftc [option]... source dest
shiftc [option]... source... directory
shiftc [option]...

For information about Shift options, including those used in the examples in this article, see Shift Command Options.
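When source and destination paths are read from stdin, a small generator can produce one combination per line. The sketch below assumes that each line holds a source and a destination separated by whitespace; that separator is an assumption not confirmed by this article, so check the Shift documentation before relying on it. The helper name emit_pairs is hypothetical, and paths containing spaces are not handled.

```shell
# emit_pairs DEST_ROOT SOURCE...
# Prints one "source destination" combination per line (assumed format).
emit_pairs() {
    dest_root=$1
    shift
    for src in "$@"; do
        echo "$src $dest_root"
    done
}

emit_pairs /nobackup/username/dir2 inputs outputs
# prints:
#   inputs /nobackup/username/dir2
#   outputs /nobackup/username/dir2

# To hand the combinations to Shift instead of printing them (not run here):
#   emit_pairs /nobackup/username/dir2 inputs outputs | shiftc -r
```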

Transferring Files Locally

The examples in this section show you how to use the shiftc command to transfer files on the same NAS host.

Note: The first example includes output; subsequent examples show only the command line.

• Copy local file1 into existing directory /u/username:

pfe21% shiftc file1 /u/username
Shift id is 1
Detaching process (use --status option to monitor progress)

• Copy file1 from /u/username to /nobackup/username/dir1 on pfe21:

pfe21% shiftc /u/username/file1 /nobackup/username/dir1

• Recursively copy the directory inputs inside another directory, /nobackup/username/dir2:

% shiftc -r inputs /nobackup/username/dir2

Complete the same operation with data verification turned off:

% shiftc -r --no-verify inputs /nobackup/username/dir2

TIP: Although using the --no-verify option can improve the speed of your transfer, it is not recommended because without data verification, the integrity of your data cannot be ensured.

• Copy local file1 in the current directory to existing local directory /u/username/dir1:

pfe21% shiftc file1 /u/username/dir1

• Recursively copy local directory /nobackup/username/dir1 to local directory /nobackup/username/dir2 using 2 client hosts to perform the transfer:

pfe21% shiftc -r --hosts=2 /nobackup/username/dir1 /nobackup/username/dir2

• Recursively copy local directory nobackup/username/dir1 to local directory dir2, but exclude files ending in .log:

pfe21% shiftc -r --exclude='\.log$' nobackup/username/dir1 /dir2
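When a multi-host transfer such as the --hosts=2 example above is dominated by one very large file, the --split option can spread that file across the clients. Its description gives the ideal split size as the file size divided by the number of concurrent clients, with 1 GiB as the recommended minimum. The sketch below works through that arithmetic. The function name ideal_split_gib is hypothetical, rounding up is a choice made for this example, and the sketch assumes one client per host.

```shell
# ideal_split_gib FILE_SIZE_GIB CLIENTS
# File size divided by client count, rounded up, never below 1 GiB.
ideal_split_gib() {
    split=$(( ($1 + $2 - 1) / $2 ))
    if [ "$split" -lt 1 ]; then
        split=1
    fi
    echo "$split"
}

size_gib=100    # example file size in GiB (assumed)
hosts=4         # example number of client hosts (assumed)
split=$(ideal_split_gib "$size_gib" "$hosts")

# Print the resulting command line rather than running it.
echo "shiftc --hosts=$hosts --split=${split}g bigfile /nobackup/username/dir2"
# prints: shiftc --hosts=4 --split=25g bigfile /nobackup/username/dir2
```

Note that --split has no effect unless --hosts is greater than 1, so the two options are only useful together.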

Creating and Extracting Tar Files Locally

You can use Shift to transfer a directory and write it into a tar file in one step, resulting in a portable tar file that can be read by either shiftc or tar. In the same step, you can also create a table-of-contents (.toc) file that lists the files contained in the archive along with their sizes and attributes (recommended).

Because of their sequential nature, tar files cannot be efficiently updated in place. As a workaround, incremental tar files can be used, which are separate tar files that consist of files updated after the time the original or subsequent incremental updates were created.

Creating Tar Files

The examples in this section show you how to create tar files on the same NAS host.

• Create a tar file (dir1.tar) of the directory /nobackup/username/dir1 and put it in the current directory, along with a corresponding table of contents (dir1.tar.toc):

pfe21% cd /nobackup/username
pfe21% shiftc --create-tar --index-tar dir1 dir1.tar

Note: If the dir1 directory is over 1 TB, it will be split into multiple tar files prefixed with dir1.

• The Pleiades nobackup filesystems are mounted on Lou. This makes it possible to create local tar files from your nobackup filesystem directly to your Lou home directory when you are logged into Lou (lfe5-8).

For example, to transfer the dir1 directory as a tar file from your nobackup filesystem to the data_dir directory in your Lou home directory:

lfe5% cd /nobackupp8/username/data_dir
lfe5% shiftc --create-tar --index-tar dir1 /u/username/data_dir/dir1.tar

Note: In the above example, the first step is to change into the directory /nobackupp8/username/data_dir, which is the parent directory of dir1 (the directory you are transferring). This prevents the tar operation from creating extraneous prepended directories when the tar file is extracted.
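When a directory exceeds the split size (1 TB by default), the --split-tar description states that the pieces of "file.tar" are named "file.tar-i.tar", starting from "file.tar-1.tar". The sketch below lists the names to expect for a given data size. It is an estimate only: Shift creates tar files of "around" the given size, so the real number of pieces can differ. The function name split_tar_names and the 2500 GB example size are assumptions for illustration.

```shell
# split_tar_names TAR_NAME TOTAL_GB SPLIT_GB
# Prints the expected tar file names, one per line (estimated piece count).
split_tar_names() {
    parts=$(( ($2 + $3 - 1) / $3 ))     # ceiling of total / split size
    if [ "$parts" -le 1 ]; then
        echo "$1"                        # small enough: a single tar file
        return
    fi
    i=1
    while [ "$i" -le "$parts" ]; do
        echo "$1-$i.tar"
        i=$(( i + 1 ))
    done
}

split_tar_names dir1.tar 2500 1000
# prints:
#   dir1.tar-1.tar
#   dir1.tar-2.tar
#   dir1.tar-3.tar
```

These names are what a wildcard such as dir1.*tar matches when the pieces are extracted later.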

Creating Incremental Tar Files

The examples in this section show you how to create incremental tar files on the same NAS host.

• Create an incremental tar file (dir1-2020.tar) of all files modified on or after January 1st, 2020 in the /nobackup/username/dir1 directory, and put it in the current directory, along with a corresponding table of contents (dir1-2020.tar.toc):

pfe21% cd /nobackup/username
pfe21% shiftc --create-tar --index-tar --newer="Jan 1 2020" dir1 dir1-2020.tar

• Create an incremental tar file (dir1-update.tar) of all files modified in the dir1 directory of your nobackup directory after the original tar (dir1.tar) was successfully completed, to the data_dir directory in your Lou home directory:

lfe5% cd /nobackupp/username/data_dir
lfe5% shiftc --create-tar --index-tar --newer=`stat -c %Y /u/username/data_dir/dir1.tar` \
      dir1 /u/username/data_dir/dir1-update.tar

Notes:

♦ The shiftc command line shown above is too long to be formatted as one line, so it is broken with a backslash (\).
♦ The version of stat currently deployed on the systems does not allow file creation time to be retrieved, so the above example might miss files that were modified or created while the original tar was being written.
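One possible way to narrow the gap described in the last note is to record a timestamp before the original tar is started and pass that value to --newer later, instead of using the tar file's modification time. The sketch below is a suggestion under stated assumptions, not a documented NAS procedure: the stamp file name is hypothetical, the shiftc commands are printed rather than run, and a scratch directory stands in for your real working directory. It relies on --newer accepting seconds since the epoch, as the stat -c %Y example above does.

```shell
# Scratch directory so the example touches no real data.
workdir=$(mktemp -d)
stamp_file="$workdir/dir1.tar.stamp"    # hypothetical companion file

# 1. Record the start time BEFORE creating the original tar.
date +%s > "$stamp_file"
echo "shiftc --create-tar --index-tar dir1 /u/username/data_dir/dir1.tar"

# 2. Later, build the incremental tar from the recorded start time.
stamp=$(cat "$stamp_file")
echo "shiftc --create-tar --index-tar --newer=$stamp dir1 /u/username/data_dir/dir1-update.tar"

rm -rf "$workdir"
```

With this approach, files changed while the original tar was being written may appear in both archives, which is harmless, rather than in neither.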

Extracting Tar Files

The examples in this section show you how to extract files from a tar file on the same host.

• Extract the tar file dir1.tar on Lou directly into your /nobackup directory:

lfe5% cd /nobackup/username/data_dir
lfe5% shiftc --extract-tar /u/username/data_dir/dir1.tar .

• If dir1 was over 1 TB and therefore split into multiple tar files:

pfe21% shiftc --extract-tar /u/username/data_dir/dir1.*tar .

• Extract the files 1g.20 through 1g.29 from dir1.tar to the current directory:

pfe21% shiftc --extract-tar --include='1g\.2[0-9]' dir1.tar .

Note: As shown in this example, shiftc uses Perl-style regular expressions for some options.
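Before running an extraction, it can help to preview which names a pattern such as '1g\.2[0-9]' selects. The sketch below uses grep -E as a stand-in for Shift's Perl-style matching; for this particular pattern the two syntaxes behave the same, but that equivalence does not hold for every Perl regular expression. The helper name preview_include and the sample file names are made up for illustration, and how Shift applies the pattern to full paths is not covered here.

```shell
# preview_include REGEX
# Reads names on stdin and prints those the pattern would select.
preview_include() { grep -E -- "$1"; }

# Unanchored pattern: also selects 1g.200, which may not be intended.
printf '%s\n' 1g.19 1g.20 1g.29 1g.30 1g.200 1gx20 | preview_include '1g\.2[0-9]'
# prints:
#   1g.20
#   1g.29
#   1g.200

# Anchoring the end of the name narrows the selection.
printf '%s\n' 1g.19 1g.20 1g.29 1g.30 1g.200 1gx20 | preview_include '1g\.2[0-9]$'
# prints:
#   1g.20
#   1g.29
```

The backslash before the dot matters: without it, the dot matches any character, so a name like 1gx20 would also be selected.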

Using Shift for Transfers and Tar Operations Between Two NAS Hosts

For transfers between two host systems within the NAS enclave—for example, from a PFE to an LFE—the syntax for the shiftc command is nearly the same as that of local transfers.

Note: If source and destination paths are not specified on the command line, any number of source/destination combinations will be read from stdin (one combination per line).

shiftc [OPTION]... source dest
shiftc [OPTION]... source... directory
shiftc [OPTION]...

For information about Shift options, including those used in the examples in this article, see Shift Command Options.

Transferring Files Within the Enclave

The examples in this section show you how to transfer files between two hosts within the enclave.

Note: The first example includes output; subsequent examples show only the command line.

• Perform basic host-to-host transfers (for example, /u/username/file1 on pfe21 to your data/dir2 directory on Lou):

pfe21% shiftc /u/username/file1 lfe:data/dir2
Shift id is 2
Detaching process (use --status option to monitor progress)

TIP: If you have a small transfer that will complete quickly, using the --wait option will provide a helpful indication that your transfer is complete—you'll know it's done as soon as your prompt returns. Also, if you include shiftc in a script, using the --wait option will prevent the next step in the script from starting until the transfer is complete.

The next two examples show this option.

• Copy local file1 in the current directory to your Lou home directory while waiting for completion:

pfe21% shiftc --wait file1 lfe:

• Synchronize the local directory /u/username/dir1 with the Lou directory /u/username/dir2/dir1 while waiting for completion:

pfe21% shiftc -r --sync --wait /u/username/dir1 lfe:/u/username/dir2
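In a script, --wait keeps later steps from starting until the transfer has finished. The following dry-run sketch shows the ordering only. The function name sync_then_continue and the SHIFTC variable are hypothetical, the command is printed rather than executed, and the sketch makes no assumption about what exit status shiftc returns.

```shell
# Dry run: SHIFTC is "echo shiftc", so the command is printed, not executed.
# Set SHIFTC=shiftc on a NAS host to run it for real.
SHIFTC="echo shiftc"

# sync_then_continue SOURCE DEST
# With --wait, the second line runs only after the transfer has returned.
sync_then_continue() {
    $SHIFTC -r --sync --wait "$1" "$2"
    echo "transfer returned; next step can start"
}

sync_then_continue /u/username/dir1 lfe:/u/username/dir2
# prints:
#   shiftc -r --sync --wait /u/username/dir1 lfe:/u/username/dir2
#   transfer returned; next step can start
```

Without --wait, shiftc detaches immediately, so a following step could start before the data has arrived.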

Creating and Extracting Tar Files Within the Enclave

You can use Shift to transfer a directory and write it into a tar file in one step, resulting in a portable tar file that can be read by either shiftc or tar. In the same step, you can also create a table-of-contents (.toc) file that lists the files contained in the archive along with their sizes and attributes (recommended).

Because of their sequential nature, tar files cannot be efficiently updated in place. As a workaround, incremental tar files can be used, which are separate tar files that consist of files updated after the time the original or subsequent incremental updates were created.

Creating Tar Files

This example creates a tar file (dir1.tar) of the dir1 directory in /nobackup/user1 directly into your Lou home directory, along with a corresponding table of contents (dir1.tar.toc):

pfe21% cd /nobackup/user1
pfe21% shiftc --create-tar --index-tar dir1 lfe:dir1.tar

Note: If the dir1 directory is over 1 TB, it will be split into multiple tar files prefixed with dir1.

Creating Incremental Tar Files

The examples in this section show you how to create incremental tar files from one host to another within the enclave.

• Create an incremental tar file (dir1-2020.tar) of all files modified on or after January 1st, 2020 in the directory /nobackup/username/dir1 directly to your Lou home directory, along with a corresponding table of contents (dir1-2020.tar.toc):

pfe21% cd /nobackup/username
pfe21% shiftc --create-tar --index-tar --newer="Jan 1 2020" dir1 lfe:dir1-2020.tar

• Create an incremental tar file (dir1-update.tar) of all files modified in the dir1 directory of your nobackup directory after the original tar (dir1.tar) was successfully completed, to the data_dir directory in your Lou home directory:

pfe21% cd /nobackupp/username/data_dir
pfe21% shiftc --create-tar --index-tar --newer=`ssh lfe stat -c %Y dir1.tar` dir1 lfe:dir1-update.tar

Note: The version of stat currently deployed on the systems does not allow file creation time to be retrieved, so the above example might miss files that were modified or created while the original tar was being written.
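To preview locally which files an incremental archive might pick up, you can compare modification times against a reference file with standard tools. The sketch below is only an approximation: it assumes that --newer selects by modification time, as the examples above suggest, and find -newer is strictly "later than" rather than "on or after". Shift itself makes the actual selection. The function name files_newer_than is hypothetical.

```shell
#!/bin/sh
# Sketch: list regular files under a directory whose modification time is
# later than that of a local reference file.
# files_newer_than is a hypothetical helper, not part of Shift.
files_newer_than() {
    dir=$1
    ref=$2
    find "$dir" -type f -newer "$ref" | sort
}
```

For example, files_newer_than dir1 reference_file lists the files in dir1 that were modified after reference_file, where reference_file is any local file carrying the timestamp of interest.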

Extracting Tar Files

These examples show you how to extract tar files from one host to another within the enclave.

• Extract the dir1.tar file from your Lou home directory to the current directory:

pfe21% shiftc --extract-tar lfe:dir1.tar .

• If dir1 was over 1 TB and therefore split into multiple tar files, extract them from your Lou home directory to the current directory as follows:

pfe21% shiftc --extract-tar lfe:'dir1.*tar' .

Note: As shown in the example, quotation marks are required when using wildcards (e.g., *) with sources on a remote host.

• Extract the files 1g.20 through 1g.29 from dir1.tar to the current directory:

pfe21% shiftc --extract-tar --include='1g\.2[0-9]' lfe:dir1.tar .

Note: As shown in this example, shiftc uses Perl-style regular expressions for some options.
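Before running an extraction, you can check which names a pattern like the one above would select by trying it against a list of candidate names. The sketch below uses grep -E as a stand-in: extended and Perl-style regular expressions agree for this simple pattern, but not for every pattern. The helper name preview_include and the sample file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: preview which names match an --include style pattern.
# grep -E is only a stand-in for shiftc's Perl-style matching; the two
# agree for simple patterns like this one.
# preview_include is a hypothetical helper name.
preview_include() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -E -- "$pattern"
}

# The backslash makes the dot literal; [0-9] matches exactly one digit.
preview_include '1g\.2[0-9]' 1g.19 1g.20 1g.25 1g.29 1g.30 1gx20
```

With these sample names, only 1g.20, 1g.25, and 1g.29 are printed. Without the backslash, the dot would match any character and 1gx20 would be selected as well.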

Using Shift for Remote Transfers and Tar Operations

Transfers between a remote system and a host system within the NAS enclave—for example, between your local system and a Pleiades Front-End node (PFE)—must be carried out using the Secure Unattended Proxy (SUP).

To use the SUP for Shift transfers to NAS systems, you must first download the SUP client, authorize one or more NAS hosts for SUP operations, and then authorize one or more NAS directories for writes. A brief summary of these steps is shown below. For a full overview, see Using the Secure Unattended Proxy.

TIP: For higher-performance remote transfers, you can download and install bbFTP to make it available for Shift to use. Please note that bbFTP does not encrypt the data.

Downloading SUP to Enable Remote Transfers

Complete the following steps to set up your system for remote transfers:

1. Download and install SUP client (one time):

your_local_system% wget -O sup https://www.nas.nasa.gov/hecc/support/kb/file/9
your_local_system% chmod 700 sup
your_local_system% mv sup ~/bin

2. Authorize host for SUP operations (one time per host):

your_local_system% ssh pfe21
pfe21% touch ~/.meshrc

3. Authorize directories for writes (one or more times per host):

your_local_system% ssh pfe21
pfe21% echo /tmp >>~/.meshrc
pfe21% echo /nobackup/$USER >>~/.meshrc
pfe21% echo /u/$USER >>~/.meshrc

4. Download and install bbFTP (optional for higher performance)

Performing Remote Transfers

For remote transfers, the shiftc syntax is similar to local transfers and transfers between two hosts within the enclave, except that you must prepend sup (the SUP client) to each shiftc command. Also, remote Shift transfers must always be initiated from the system that is external to the NAS enclave, but files may be transferred in either direction.

sup shiftc [OPTION]... source dest
sup shiftc [OPTION]... source... directory
sup shiftc [OPTION]...

For example, the following enclave-to-enclave transfer copies file1 into the directory ~/data/run2 on lfe5:

pfe21% shiftc /home/username/file1 lfe5:~/data/run2

If the above example were changed into a remote transfer, it would become:

your_local_system% sup shiftc /home/username/file1 lfe5:~/data/run2

In general, shiftc [args] becomes sup shiftc [args].

Note: For information about Shift options, including those used in the examples in this article, see Shift Command Options.

File Transfer Examples

The examples in this section show you how to transfer files between your local system and a host within the enclave. The first example includes output; subsequent examples show only the command line.

• Perform a remote transfer via the Secure Unattended Proxy (for example, /username/file1 on your local system to your home directory on pfe21):

your_local_system% sup shiftc /username/file1 pfe21:
Shift id is 3
Detaching process (use --status option to monitor progress)

• Recursively copy local directory /username/dir1 on your local system to directory /username/dir2 on lfe5:

your_local_system% sup shiftc -r /username/dir1 lfe5:/username/dir2

• Recursively copy remote directory /username/dir2 on lfe5 to the current directory on your local system using an encrypted transport:

your_local_system% sup shiftc -r --secure lfe5:/username/dir2 .

Creating and Extracting Tar Files Remotely

You can use Shift to transfer a directory and write it into a tar file in one step, resulting in a portable tar file that can be read by either shiftc or tar. In the same step, you can also create a table-of-contents (.toc) file that lists the files contained in the archive along with their sizes and attributes (recommended).

Because of their sequential nature, tar files cannot be efficiently updated in place. As a workaround, you can use incremental tar files: separate tar files that contain only the files updated since the original tar file, or the most recent incremental tar file, was created.

Creating Tar Files

This example creates a tar file (dir1.tar) of the directory dir1 on your local system directly into your Lou home directory, along with a corresponding table of contents (dir1.tar.toc):

your_local_system% sup shiftc --create-tar --index-tar dir1 lfe:dir1.tar

Note: If the dir1 directory is over 1 TB, it will be split into multiple tar files prefixed with dir1.

Creating Incremental Tar Files

The example in this section shows you how to create incremental tar files from your local system to your Lou home directory.

Create an incremental tar file (dir1-2020.tar) of all files modified on or after January 1st, 2020 in the directory dir1 on your local system directly to your Lou home directory, along with a corresponding table of contents (dir1-2020.tar.toc):

your_local_system% sup shiftc --create-tar --index-tar --newer="Jan 1 2020" dir1 lfe:dir1-2020.tar

Extracting Tar Files

These examples show you how to extract tar files from your Lou home directory to your local system.

• Extract the dir1.tar file from your Lou home directory to the current directory on your local system:

your_local_system% sup shiftc --extract-tar lfe:dir1.tar .

• If dir1 was over 1 TB and therefore split into multiple tar files:

your_local_system% sup shiftc --extract-tar lfe:'dir1.*tar' .

Note: As shown in this example, quotation marks are required when using wildcards (e.g., *) with sources on a remote host.

• Extract the files 1g.20 through 1g.29 from dir1.tar in your Lou home directory to the current directory on your local system:

your_local_system% sup shiftc --extract-tar --include='1g\.2[0-9]' lfe:dir1.tar .

Note: As shown in this example, shiftc uses Perl-style regular expressions for some options.

Local Transfers

Checking File Integrity

It is a good practice to confirm whether your files are complete and accurate before you transfer the files to or from NAS, and again after the transfer is complete.

The easiest way to verify the integrity of file transfers is to use the NAS-developed Shift tool for the transfer, with the --verify option enabled. As part of the transfer, Shift will automatically checksum the data at both the source and destination to detect corruption. If corruption is detected, partial file transfers/checksums will be performed until the corruption is rectified.

For example:

pfe21% shiftc --verify $HOME/filename /nobackuppX/username
lou% shiftc --verify /nobackuppX/username/filename $HOME
your_localhost% sup shiftc --verify filename pfe:

In addition to Shift, there are several algorithms and programs you can use to compute a checksum. If the results of the pre-transfer checksum match the results obtained after the transfer, you can be reasonably certain that the data in the transferred files is not corrupted. If data is corrupted during a transfer, a good checksum algorithm will yield different results before and after the transfer, with high probability.

The following checksum programs are available on HECC systems:

sum
Computes a checksum using the BSD sum or System V sum algorithm; also counts the number of blocks (1 KB-block or 512 B-block) in a file

cksum
Computes a cyclic redundancy check (CRC) checksum; also counts the number of bytes in a file

md5sum
Computes a 128-bit MD5 checksum, which is represented by a 32-character hexadecimal number

msum
High performance multi-threaded checksum utility, developed at NAS. By default, computes 128-bit MD5 checksums, but you can compute other types using --hash-type. Note that for full compatibility with md5sum output, you must use the --split-size=0 option, which will also decrease performance on large files. For more information, see man msum on any HECC front-end system.

mtar
Feature-enhanced version of tar, developed at NAS. By default, computes 128-bit MD5 checksums, but you can compute other types using --hash-type. To compute checksums of files contained within a tar archive without extracting them, use the --print-hash option with the -t option. For a full list of options and hash types, run mtar --help on any NAS front-end system.

For example:

% ls -l filename
-rw------- 1 username group_id 67358 Nov 15 11:49 filename

% sum filename
50063 66

% cksum filename
269056887 67358 filename

% md5sum filename
cfe0fc62607e9dc6ea0c231982316b75 filename

% msum filename
cfe0fc62607e9dc6ea0c231982316b75 filename

% mtar -tf filename.tar --print-hash
e7334b7bed07fea35544092274118b1c a.out
9bd31bc329e0123adc8e190e27c5bb18 a.pl
d0ec90a41d4644a16436a87dd4c008ae a.sh

To check the integrity of an existing tar file against a directory:

% mtar -tf dir_name.tar | sort > sums.dir_name.tar
% find dir_name | xargs md5sum | sort > sums.dir_name.dir
% diff sums.dir_name.tar sums.dir_name.dir

The md5sum utility is more reliable than the sum or cksum commands for detecting accidental file corruption, as the chances of accidentally having two files with identical MD5 checksums are extremely small. The utility is installed by default in most Unix, Linux, and Unix-like operating systems. We recommend that you compute the md5sum of a file before and after the transfer.

The following example shows that the file filename is complete and accurate after the transfer, based on its md5sum.

pfe21% md5sum filename
cfe0fc62607e9dc6ea0c231982316b75 filename

pfe21% scp filename local_username@your_localhost:

your_localhost% md5sum filename
cfe0fc62607e9dc6ea0c231982316b75 filename
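The before-and-after comparison can also be scripted. The following sketch compares the MD5 digests of two files that are both reachable from the current host; for a copy on another host, run md5sum on each side and compare the printed digests instead. The function name same_md5 is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if two readable files have the same MD5 digest.
# same_md5 is a hypothetical helper. md5sum prints "<digest>  <name>",
# so only the first field is compared.
same_md5() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(md5sum < "$1" | cut -d ' ' -f 1)
    b=$(md5sum < "$2" | cut -d ' ' -f 1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, same_md5 filename /nobackup/username/filename && echo "checksums match" prints the message only when the two digests are identical.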

See the sum, cksum, md5sum, and msum man pages for more information on these commands. See Using mtar to Create or Extract Tar Files on Lustre for more information on mtar.

Local File Transfer Commands

For large file transfers within the NAS enclave, use the commands shiftc, mcp, or cxfscp. The slower cp command also can be used.

The following file transfer commands can be used when both the source and destination locations are accessible on the same host where the command is issued.

Shift Command

Shift (shiftc) is a NAS-developed tool for performing automated local and remote file transfers. Shift utilizes a variety of underlying file transports to achieve maximum performance for files of any size on any file system.

Where is it installed at NAS?

shiftc is installed on the Pleiades and Lou front-end systems (PFEs and LFEs).

When to use it?

The command shiftc can be used as a drop-in replacement for cp at any time on any system on which it is installed.

Examples

lfe5% shiftc /nobackup/username/filename $HOME
pfe27% shiftc $HOME/filename /nobackup/username

Performance

shiftc utilizes mcp, when available, so it can be up to 10 times faster than cp for large files (2+ GB) and can achieve up to 1.8 GB/sec on a single host. Using the --hosts option, it has been measured to achieve up to 5.0 GB/sec on 8 hosts.

For more information, see Shift File Transfer Overview.

cxfscp

cxfscp is a program for quickly copying large files to and from a CXFS filesystem, such as the Lou front-end (LFE) home file system. It can be significantly faster than cp on CXFS filesystems since it uses multiple threads and large direct I/Os to fully utilize the bandwidth to the storage hardware.

For files less than 64 kilobytes in size, which will not benefit from large direct I/Os, cxfscp will use a separate thread for copying these files using buffered I/O similar to cp.

Where is it installed at NAS?

cxfscp is installed on the LFEs.

When to use it?

The Pleiades Lustre filesystems (/nobackuppX) are mounted on lfe[5-8]. The command cxfscp can be issued on any of these hosts to copy large files between the LFE's CXFS home file system and Pleiades's /nobackup. This is an easy way to transfer files between Lou and Pleiades without the need for scp, bbftp, or bbscp.

Examples

lfe5% cxfscp -rp --bi /nobackupp4/username/testdir_a /u/username/tests
lfe5% cxfscp -p --bo /u/username/data* /nobackupp4/username/data_dir

Performance

Some benchmarks done by NAS staff show that cxfscp is typically 4-7 times faster than cp for large files (2+ GB) and can achieve up to 650 MB/sec.

For more information, read man cxfscp.

cp

cp is a Unix command for copying files between two locations (for example, two different directories of the same filesystem or two different filesystems such as NFS, CXFS or Lustre).

Where is it installed at NAS?

cp is available on all NAS systems except the secure front ends (SFEs).

Use the -p option to preserve the timestamp on the file.

Examples

pfe21% cp -p $HOME/filename $HOME/newdir/filename2
pfe21% cp -p $HOME/filename /nobackup/username

Shift Transfer Tool Overview

The NAS-developed Shift tool can copy files locally on NAS enclave hosts, transfer files between hosts inside the NAS enclave, and transfer files between the NAS enclave and remote hosts. You can also use Shift to check the status of transfers at any time, receive email notification of completion, errors, and warnings, and restart interrupted transfers or transfers with errors.

All functionality is accessed through the Shift client, which is invoked via the shiftc command. The syntax for shiftc is similar to the syntax for the cp and scp commands.

Shift is the recommended method for transferring files to and from the Lou mass storage system, as it can create tar files as part of the transfer and split a transfer into multiple tar files for oversized directories (larger than 1 TB).

Advanced Features

Shift includes the following advanced features:

• Automatic parallelization of transfers
• Local and remote tar creation and extraction
• Synchronization based on modification times and checksums (similar to rsync)
• Automatic file integrity verification and correction
• Ability to restart transfers
• Automatic retrieval of files from tape storage (DMF-managed Lou filesystems)
• Ability to check status of current transfers

How to Use Shift

See the following articles for detailed information about how to use Shift:

• Using Shift for Local Transfers and Tar Operations
• Using Shift for Transfers and Tar Operations Between Two NAS Hosts
• Using Shift for Remote Transfers and Tar Operations
• Checking Shift Transfer Status and Restarting Transfers
• Shift Command Options

Additional Resources

You can also see presentation slides for three HECC training webinars that demonstrate how to use the tool:

• Simplifying and Optimizing Your Data Transfers (PDF)
• Advanced Features of the Shift Automated File Transfer Tool (PDF)
• Simple Automated File Transfers Using SUP and Shift (PDF)

Recordings of each presentation are also available in the Past Webinars Archive.

Note: Some hostnames and options may have changed since the webinars were presented.

Remote Transfers

Remote File Transfer Commands

Use the file transfer commands scp, shiftc, bbftp, or bbscp when the source and destination are located on different hosts—either on two different NAS high-end computing hosts, or on a NAS host and a remote host such as your local desktop system.

Except for bbftp, the basic syntax is:

copy-command [options]...source...destination

scp Command

The Secure Copy Protocol (scp) command, based on the Secure Shell (SSH) protocol, is a means of securely transferring files between a local and a remote host. Both your authentication information (such as password or passcode) and your data are encrypted.

The most widely used scp program is from OpenSSH.

Where is scp installed at NAS?

A copy of scp from OpenSSH without the is available on the Pleiades front-end systems (PFEs), Lou, and the secure front-end systems (SFEs).

Do you need it installed on your local host?

If you have a version of SSH installed on your local host, scp is most likely already installed there.

When to use it?

Typically, scp is used to transfer small files within NAS (<< 5 GB) or offsite (<< 1 GB) that take a reasonable amount of time to complete.

Examples

In these examples, "outbound" means the command is initiated on a NAS host such as Pleiades or Lou, whether the file is being pushed or pulled. "Inbound" means the command is initiated on your local host.

Note: Omit local_username@ and nas_username@ in the examples below if your local username and NAS username are identical. These examples assume you already know how to log into the NAS enclave.

For outbound transfer:

lou% scp local_username@your_localhost.domain:file1 ./file2

For inbound transfer if SSH passthrough has been set up correctly:

your_localhost% scp file1 [email protected]:file2

For inbound transfer if you have not set up SSH passthrough:

your_localhost% scp -oProxyCommand='ssh [email protected] ssh-proxy %h' file1 [email protected]:file2

where sfeX is sfe[6-9].

Note: Enter the command in the above example on a single line.

Performance

Within the NAS secure enclave, depending on source and destination hosts and other factors, the performance range will be 40-100 MB/sec.

If your data will compress well, consider enabling compression by adding -C to your scp command line.

We recommend using OpenSSH 5.0 or a newer version.

Shift Command (shiftc)

Shift (shiftc) is a NAS-developed tool for performing automated local and remote file transfers. Shift utilizes a variety of underlying file transports to achieve maximum performance for files of any size on any file system.

Where is it installed at NAS?

Shift is installed on Lou and on the PFEs.

Do you need it installed on your local host?

For transfers between your local host and NAS systems, you must install the SUP client as discussed in Shift File Transfer Overview.

When to use it?

Shift is the recommended method for transferring files to and from the Lou mass storage system, as it can create tar files as part of the transfer and split a transfer into multiple tar files for oversized directories (larger than 1 TB).

Shift can be used as a drop-in replacement for scp or bbscp between any enclave systems. For transfers between your local host and NAS systems, the transfer must be initiated from your local host with shiftc invoked via the SUP client (that is, using the command sup shiftc). If an encrypted transfer is required, use the shiftc --secure option.

Example

pfe27% shiftc /nobackupp2/username/filename lou:
your_localhost% sup shiftc pfe:filename .

Performance

Shift uses the highest performing file transport that is available on both sides of the transfer, and is optimal for the sizes of the files being transferred. This means that Shift will be as fast as bbFTP for large transfers and faster than bbFTP for small and mixed transfers.

For more information, see File Transfer: Overview and Shift File Transfer Overview.

bbftp command

bbFTP is a high-performance remote file transfer tool that supports parallel TCP streams for data transfers. It splits a single file into several pieces and sends them through parallel streams; the whole file is then rebuilt on the remote site. bbFTP also allows dynamically adjustable TCP/IP window sizes instead of the statically defined window size used by normal scp. In addition, it provides a secure control channel over SSH and allows data to be transferred in cleartext to avoid the overhead of unnecessary encryption. These characteristics allow bbFTP to achieve transfers that are faster than with normal scp.

We recommend using bbftp in place of scp for large data transfers over long distances.
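The split-and-rebuild idea can be illustrated locally with standard tools. The sketch below is not how bbFTP is implemented: it only cuts a non-empty file into a fixed number of pieces, reassembles them, and confirms that the result is byte-identical, which is the property the receiving side must preserve. The function name split_and_rebuild and the default of 8 pieces are illustrative choices.

```shell
#!/bin/sh
# Conceptual sketch only: cut a non-empty file into pieces, rebuild it,
# and check that the rebuilt file is byte-identical to the original.
# split_and_rebuild is a hypothetical helper; bbFTP sends the pieces over
# parallel TCP streams rather than through temporary files.
split_and_rebuild() {
    file=$1
    pieces=${2:-8}
    work=$(mktemp -d) || return 1
    size=$(wc -c < "$file")
    # Round up so that at most $pieces chunks are produced.
    chunk=$(( (size + pieces - 1) / pieces ))
    [ "$chunk" -gt 0 ] || chunk=1
    split -b "$chunk" "$file" "$work/part."
    # Suffixes (aa, ab, ...) sort in order, so the glob restores the sequence.
    cat "$work"/part.* > "$work/rebuilt"
    cmp -s "$file" "$work/rebuilt"
    status=$?
    rm -rf "$work"
    return $status
}
```

For example, split_and_rebuild filename 8 returns success when the eight pieces reassemble into an exact copy of filename.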

Where is it installed at NAS?

Both the bbFTP server (bbftpd) and client (bbftp) are installed on Lou, the PFEs, and the SUP.

Do you need it installed on your local host?

If you want to initiate bbftp from your local host, you must download and install the client version of bbFTP on your local host. If you want to initiate bbftp from a NAS system and transfer files from/to your local host, download and install the server version of bbFTP on your local host.

When to use it?

Consider using bbFTP when transferring large files ( > 1 GB) offsite. Be sure to use multiple streams to get better transfer rates.

Example

bbFTP is like a non-interactive FTP, and the syntax can be complicated.

your_localhost% bbftp -u nas_username -e 'setnbstream 8; get filename' -E 'bbftpd -s -m 8' lou.nas.nasa.gov

Note: Enter the command in the above example on a single line.

Performance

bbFTP typically transfers data 10-20 times faster than normal scp.

If you are not getting good performance, check with your network administrator to see if performance tuning is needed on your system. See the article bbFTP for more instructions on installing and using bbFTP.

bbscp

bbSCP is a bbFTP wrapper that provides an scp-like command-line interface. It assembles the proper command line for bbFTP and then executes bbftp to perform the transfers. bbSCP is designed and tested for bbFTP version 3.2.0. bbSCP encrypts only usernames and passwords; it does not encrypt the data being transferred.

Where is it installed at NAS?

bbSCP is installed on Lou and the PFEs under /usr/local/bin.

Do you need it installed on your local host?

If you want to initiate bbscp from your local host, you need to:

• Download and install bbftp-client-3.2.0 on your local host
• Download bbSCP version 1.0.6 (also attached at the end of this article) and install it on your local host

When to use it?

Use the bbscp script when you want the bbFTP functionality and performance but with scp-like syntax. It can be used to transfer files within NAS enclave or between NAS and a remote site.

Example

your_localhost% bbscp filename [email protected]:

Performance

The performance of bbSCP is the same as that of bbFTP.

See The bbscp Script for more information (, performance tuning, test and verification).

Checking File Integrity

It is a good practice to confirm whether your files are complete and accurate before you transfer the files to or from NAS, and again after the transfer is complete.

The easiest way to verify the integrity of file transfers is to use the NAS-developed Shift tool for the transfer, with the --verify option enabled. As part of the transfer, Shift will automatically checksum the data at both the source and destination to detect corruption. If corruption is detected, partial file transfers/checksums will be performed until the corruption is rectified.

For example:

pfe21% shiftc --verify $HOME/filename /nobackuppX/username
lou% shiftc --verify /nobackuppX/username/filename $HOME
your_localhost% sup shiftc --verify filename pfe:

In addition to Shift, there are several algorithms and programs you can use to compute a checksum. If the results of the pre-transfer checksum match the results obtained after the transfer, you can be reasonably certain that the data in the transferred files is not corrupted. If data is corrupted during a transfer, a good checksum algorithm will yield different results before and after the transfer, with high probability.

The following checksum programs are available on HECC systems:

sum
Computes a checksum using the BSD sum or System V sum algorithm; also counts the number of blocks (1 KB-block or 512 B-block) in a file

cksum
Computes a cyclic redundancy check (CRC) checksum; also counts the number of bytes in a file

md5sum
Computes a 128-bit MD5 checksum, which is represented by a 32-character hexadecimal number

msum
High performance multi-threaded checksum utility, developed at NAS. By default, computes 128-bit MD5 checksums, but you can compute other types using --hash-type. Note that for full compatibility with md5sum output, you must use the --split-size=0 option, which will also decrease performance on large files. For more information, see man msum on any HECC front-end system.

mtar
Feature-enhanced version of tar, developed at NAS. By default, computes 128-bit MD5 checksums, but you can compute other types using --hash-type. To compute checksums of files contained within a tar archive without extracting them, use the --print-hash option with the -t option. For a full list of options and hash types, run mtar --help on any NAS front-end system.

For example:

% ls -l filename
-rw------- 1 username group_id 67358 Nov 15 11:49 filename

% sum filename
50063 66

% cksum filename
269056887 67358 filename

% md5sum filename
cfe0fc62607e9dc6ea0c231982316b75 filename

% msum filename
cfe0fc62607e9dc6ea0c231982316b75 filename

% mtar -tf filename.tar --print-hash
e7334b7bed07fea35544092274118b1c a.out
9bd31bc329e0123adc8e190e27c5bb18 a.pl
d0ec90a41d4644a16436a87dd4c008ae a.sh

To check the integrity of an existing tar file against a directory:

% mtar -tf dir_name.tar | sort > sums.dir_name.tar
% find dir_name | xargs md5sum | sort > sums.dir_name.dir
% diff sums.dir_name.tar sums.dir_name.dir
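The same manifest-and-diff approach can be used to compare a directory with a copy of it, using only standard tools. In this sketch the function names are hypothetical, file names are assumed not to contain newlines, and digests are computed from inside each directory so that the recorded paths are relative and therefore comparable.

```shell
#!/bin/sh
# Sketch: build a sorted MD5 manifest for every regular file in a
# directory, then diff the manifests of two directories.
# md5_manifest and same_tree are hypothetical helper names.
md5_manifest() {
    # Run inside the directory so recorded paths are relative to it.
    # -type f skips directories, which md5sum cannot hash.
    ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 )
}

same_tree() {
    m1=$(mktemp) && m2=$(mktemp) || return 2
    md5_manifest "$1" > "$m1"
    md5_manifest "$2" > "$m2"
    diff "$m1" "$m2"    # prints any differing or missing entries
    status=$?
    rm -f "$m1" "$m2"
    return $status
}
```

For example, same_tree dir_name /nobackup/username/dir_name prints nothing and returns success when every file has the same digest in both trees.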

The md5sum utility is more reliable than the sum or cksum commands for detecting accidental file corruption, as the chances of accidentally having two files with identical MD5 checksums are extremely small. The utility is installed by default in most Unix, Linux, and Unix-like operating systems. We recommend that you compute the md5sum of a file before and after the transfer.

The following example shows that the file filename is complete and accurate after the transfer, based on its md5sum.

pfe21% md5sum filename
cfe0fc62607e9dc6ea0c231982316b75 filename

pfe21% scp filename local_username@your_localhost:

your_localhost% md5sum filename
cfe0fc62607e9dc6ea0c231982316b75 filename

See the sum, cksum, md5sum, and msum man pages for more information on these commands. See Using mtar to Create or Extract Tar Files on Lustre for more information on mtar.

Using GPG to Encrypt Your Data

Encryption helps protect your files during inter-host file transfers that use protocols that are not already encrypted—for example, when using bbftp or ftp, or when using shiftc without the --secure option. We recommend using the GNU Privacy Guard (GPG), an Open Source OpenPGP-compatible encryption system.

GPG has been installed on Pleiades, Endeavour, and Lou at /usr/bin/gpg. If you do not have GPG installed on the system(s) that you would like to use for transferring files, please see the GPG website.

Choosing What Cipher to Use

We recommend using the cipher AES256, which uses a 256-bit Advanced Encryption Standard (AES) key to encrypt the data. Information on AES can be found at the National Institute of Standards and Technology's Computer Security Resource Center.

You can set your cipher in one of the following ways:

• Add cipher-algo AES256 to your ~/.gnupg/gpg.conf file (options in this file are written without the leading dashes).
• Add --cipher-algo AES256 in the command line to override the default cipher, CAST5.

Examples

If you choose not to add the cipher-algo AES256 to your gpg.conf file, you can add --cipher-algo AES256 on any of these simple example command lines to override the default cipher, CAST5.

Creating an Encrypted File

The two commands below are equivalent. They encrypt the test.out file and produce the encrypted version in the test.gpg file:

% gpg --output test.gpg --symmetric test.out

% gpg -o test.gpg -c test.out

You will be prompted for a passphrase, which will be used later to decrypt the file.

Decrypting a File

The following command decrypts the test.gpg file and produces the test.out file:

% gpg --output test.out -d test.gpg

You will be prompted for the passphrase that you used to encrypt the file. If you don't use the --output option, the command output goes to STDOUT. If you don't use any flags, it will decrypt to a file without the .gpg suffix. For example, using the following command line would result in the decrypted data in a file named "test":

% gpg test.gpg

Selecting a Passphrase

Your passphrase should have sufficient information entropy. We suggest that you include five words of 5-10 letters in size, chosen at random, with spaces, special characters, and/or numbers embedded into the words.

You need to be able to recall the passphrase that was used to encrypt the file.

Factors that Affect Encrypt/Decrypt Speed on NAS Filesystems

We do not recommend using the --armour option for encrypting files that will be transferred to/from NAS systems. This option is mainly intended for sending binary data through email, not via transfer commands such as bbftp or ftp. The file size tends to be about 33% bigger than without this option, and encrypting the data takes about 10-15% longer.
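The size penalty follows from the encoding itself: ASCII armor is a base64-style encoding, which represents every 3 bytes of binary data as 4 printable characters, plus line breaks and a small header. The sketch below uses the standard base64 utility as a stand-in for gpg's armor to show the growth on a sample file. This is an approximation, since gpg adds its own header, checksum, and line length. The function name armor_overhead_pct is hypothetical, and the input file is assumed to be non-empty.

```shell
#!/bin/sh
# Sketch: report how much a non-empty file grows when base64-encoded,
# as an integer percentage of its original size.
# base64 stands in for gpg's ASCII armor here (an approximation).
# armor_overhead_pct is a hypothetical helper name.
armor_overhead_pct() {
    raw=$(wc -c < "$1")
    enc=$(base64 < "$1" | wc -c)
    # Integer percentage growth relative to the raw size.
    echo $(( (enc - raw) * 100 / raw ))
}
```

For binary input of a few kilobytes or more, the reported growth is roughly one third, in line with the figure quoted above.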

The level of compression used when encrypting/decrypting affects the time required to complete the operation. There are three options for the compression algorithm: none, zip, and zlib.

• --compress-algo none or --compress-algo 0
• --compress-algo zip or --compress-algo 1
• --compress-algo zlib or --compress-algo 2

For example:

% gpg --output test.gpg --compress-algo zlib --symmetric test.out

If your data is not compressible, --compress-algo 0 (none) gives you a performance increase of about 50% compared to --compress-algo 1 or --compress-algo 2.

If your data is highly compressible, choosing the zlib or zip option will not only increase the speed by 20-50%, it will also reduce the file size by up to 20x. For example, in one test on a NAS system, a 517 megabyte (MB) highly compressible file was compressed to 30 MB.

The zlib option is not compatible with PGP 6.x, but neither is the cipher algorithm AES256. Using the zlib option is about 10% faster than using the zip option on a NAS system, and zlib compresses about 10% better than zip.
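One way to choose between these options is to measure how well a sample of your data compresses before you encrypt it. The sketch below uses gzip as a quick stand-in, since gzip applies the same DEFLATE method that underlies zip and zlib. The 1 MiB sample size and the function name compress_pct are arbitrary choices for illustration, and the result is only a rough guide to what gpg will achieve.

```shell
#!/bin/sh
# Sketch: estimate compressibility from the first 1 MiB of a file.
# Prints the compressed size as a percentage of the sample size:
# near 100 means little to gain (consider --compress-algo none),
# a much lower value means compression should pay off.
# compress_pct is a hypothetical helper name.
compress_pct() {
    sample=$(( $(head -c 1048576 "$1" | wc -c) ))
    packed=$(( $(head -c 1048576 "$1" | gzip -c | wc -c) ))
    [ "$sample" -gt 0 ] || return 1
    echo $(( packed * 100 / sample ))
}
```

For example, compress_pct test.out prints a small number for a highly compressible text file and a value close to 100 for data that is already compressed.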

Random Benchmark Data

We tested the encryption/decryption speed of three different files (1 MB, 150 MB, and 517 MB) on NAS systems. The file used for the 1 MB test was an RPM file, presumably already compressed, since the resulting file sizes for the none/zip/zlib options were within 1% of each other. The 150 MB file was an ISO file, also assumed to be a compressed binary file for the same reasons. The 517 MB file was a text file. These runs were performed on a CXFS filesystem when many other users' jobs were running. The performance reported here is for reference only, and not the best or worst performance you can expect.

Using AES256 as the Cipher Algorithm

Option | 1 MB File | 150 MB File | 517 MB File
with --armour | ~5.5 secs to encrypt | ~40 secs to encrypt | -
without --armour | ~4 secs to encrypt | ~35 secs to encrypt | -
without --armour, zlib compression | - | ~33 secs to encrypt; ~28 secs to decrypt to file | ~33 secs, resultant file size ~30 MB; ~34 secs to decrypt to file
without --armour, zip compression | - | ~36 secs to encrypt; ~31 secs to decrypt to file | ~38 secs, resultant file size ~33 MB; ~34 secs to decrypt to file
without --armour, no compression | - | ~19 secs to encrypt; ~25 secs to decrypt to file | ~49 secs, resultant file size ~517 MB; ~75 secs to decrypt to file

Shift Transfer Tool Overview

The NAS-developed Shift tool can copy files locally on NAS enclave hosts, transfer files between hosts inside the NAS enclave, and transfer files between the NAS enclave and remote hosts. You can also use Shift to check the status of transfers at any time, receive email notification of completion, errors, and warnings, and restart interrupted transfers or transfers with errors.

All functionality is accessed through the Shift client, which is invoked via the shiftc command. The syntax for shiftc is similar to the syntax for the cp and scp commands.

Shift is the recommended method for transferring files to and from the Lou mass storage system, as it can create tar files as part of the transfer and split a transfer into multiple tar files for oversized directories (larger than 1 TB).

Advanced Features

Shift includes the following advanced features:

• Automatic parallelization of transfers
• Local and remote tar creation and extraction
• Synchronization based on modification times and checksums (similar to rsync)
• Automatic file integrity verification and correction
• Ability to restart transfers
• Automatic retrieval of files from tape storage (DMF-managed Lou filesystems)
• Ability to check status of current transfers

How to Use Shift

See the following articles for detailed information about how to use Shift:

• Using Shift for Local Transfers and Tar Operations
• Using Shift for Transfers and Tar Operations Between Two NAS Hosts
• Using Shift for Remote Transfers and Tar Operations
• Checking Shift Transfer Status and Restarting Transfers
• Shift Command Options

Additional Resources

You can also see presentation slides for three HECC training webinars that demonstrate how to use the tool:

• Simplifying and Optimizing Your Data Transfers (PDF)
• Advanced Features of the Shift Automated File Transfer Tool (PDF)
• Simple Automated File Transfers Using SUP and Shift (PDF)

Recordings of each presentation are also available in the Past Webinars Archive.

Note: Some hostnames and options may have changed since the webinars were presented.

Using bbFTP and bbSCP for Remote Transfers

The bbSCP Script

The NAS-developed bbSCP script is a bbFTP wrapper that provides an scp-like command line interface. bbSCP encrypts only usernames and passwords; it does not encrypt the data being transferred.

Downloading bbSCP

If you plan to use the bbscp command on your local system, you must download bbSCP (also attached at the end of this article) and download and install the bbFTP client on your local system.

The bbSCP script has been installed on Pleiades and Lou.

Using bbSCP

Note that bbSCP is just a client-side wrapper for bbFTP, so, as with bbFTP, you must use the fully-qualified domain name of the target host (for example, pfe21.nas.nasa.gov) if you are not within the NAS domain.

See the bbSCP man page for more usage details.

BBSCP(1) User Contributed Perl Documentation BBSCP(1)

NAME
bbscp - bbftp wrapper, provides an scp-like commandline interface

SYNOPSIS
bbscp [OPTIONS] [[user@]host1:]file_or_dir1 [...] [[user@]host2:]dir2

DESCRIPTION
bbscp does unencrypted copies of files either from the localhost to a directory on a remote host, or from a remote host to a directory on the localhost (see the -N option for the only exception to this). It assembles the proper commandline for bbftp (designed and tested for bbftp version 3.2.0, see RESTRICTIONS) and then executes bbftp to perform the transfer(s).

The "-s", "-p 2", and "-r 1" options for bbftp are set by default, along with the following options:

setoption keepaccess
setoption keepmode
setoption nocreatedir

The options -p and -r can be overridden on the commandline.

Note the following limitations and capabilities in different transfer scenarios:

copying from localhost to remote host

- regular files
bbftp will overwrite a pre-existing file of the same name on the remote host without asking for confirmation.

- directories
This script recursively transfers entire directories (only for local-to-remote transfers!).

- symbolic links (see RESTRICTIONS)
Symlinks on the localhost are treated just like the thing they point to, and are ignored if they point to something that doesn't exist.

copying from remote host to localhost

- regular files
bbftp will overwrite a pre-existing file of the same name on the localhost without asking for confirmation.

- directories
There is no way at this time to transfer entire directories from a remote host to the localhost.

- symbolic links (see RESTRICTIONS)
Symlinks on the remote host are treated just like the thing they point to (which means they are ignored if they point to a directory or to something that doesn't exist).

OUTPUT
The default output mode of the script displays "OK" or "FAILURE" for each of the transfer operations that bbftp performs. This display occurs after bbftp has finished running, so it may be delayed for some time depending on the duration of the transfer(s).

The script switches to more verbose output if the user provides 1 or more of the verbose output commandline options (-l, -t, -V, and -W).

OPTIONS
-B name/location of bbftp executable. default is "bbftp"

-d dry-run. script performs its duty but does not actually execute bbftp. the bbftp commandline is printed, along with the contents of the bbftp control-file

-h minimal help text

-k keep bbftp command file that this script creates

-l long-winded (extra verbose) output from bbftp. uses undocumented bbftp option (-d)

-N transfer a single file and rename it at the destination. both local-to-remote and remote-to-local transfer is supported. see RESTRICTIONS

-v version of this script

-X set the size of the TCP send window (in kilobytes). default is the bbftp default size

-Y set the size of the TCP receive window (in kilobytes). default is the bbftp default size

-z suppress the security disclaimer

bbftp options that can be specified on the commandline of this script:

-D[min_port:max_port] (e.g. "-D", "-D40000:40100")

-E

-L

-p

-R

-r

-t

-V

-W

RESTRICTIONS
Version of bbftp
It's very important to use bbftp version 3.2.0 with bbscp -- there's at least 1 known issue with using bbftp 3.1.0.

Possible shell issues
bash and tcsh interpret commandline text in different ways, so you may need to use quotes or other delimiters to use bbscp. In particular, bash and tcsh are known to handle wildcards differently.

Wildcards
If the -N option is not in use, wildcards can be used in remote host file specifications, but only for the names of files, not for directories. So, for example, "user@host:/tmp/file*" is acceptable, but "user@host:/tm*/file*" is not.

Symbolic links
Symlinks are not bbftp's strong suit -- if you wish to transfer a collection of files that includes symlinks it is highly recommended that you first make a tar-file and then transfer the tar-file.

Use of -N option
Wildcards are not supported in remote host file specifications w/ -N.

If the destination is a symlink it will be overwritten, regardless of what that symlink points to.

EXAMPLES
Note: these examples have been tested with bash, changes may be needed for them to work in tcsh (see RESTRICTIONS).

local file to remote directory (username must be the same on both machines)
bbscp /u/username/data/file1 machine:target_dir

local file to remote file w/ different name
bbscp -N /u/username/data/file1 machine:file89

multiple local files to remote directory
bbscp /u/username1/data/*file username2@machine:/tmp

local directory to remote home directory
bbscp /u/username1/data username2@machine:

remote file to local directory
bbscp username1@machine:data/file5 /u/username2/source_dir

remote file to local file w/ different name
bbscp -N username1@machine:data/file5 /u/username2/source_dir/file93

multiple remote files to local directory
bbscp -V username1@machine:/u/username1/data/file* /tmp

multiple remote files to local directory
bbscp -V username1@machine:file1.txt username1@machine:stuff.dat /tmp

AUTHOR
Greg Matthews [email protected]

perl v5.8.8 2010-12-10 BBSCP(1)
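The wildcard rules in the RESTRICTIONS section of the man page can be checked before a transfer is attempted. The following Python sketch is one way to express them; the function name is hypothetical, and treating *, ?, and [ as the wildcard characters is an assumption (the man page only shows *). It is not part of bbscp.

```python
_WILDCARDS = set("*?[")  # assumed wildcard characters; the man page only shows "*"

def _has_wildcard(text: str) -> bool:
    return any(ch in _WILDCARDS for ch in text)

def remote_spec_ok(spec: str, rename: bool = False) -> bool:
    """Check a [user@]host:path specification against the man page rules.

    Without -N (rename=False), wildcards may appear only in the file name.
    With -N (rename=True), wildcards are not supported at all.
    """
    _host, sep, path = spec.partition(":")
    if not sep:
        raise ValueError("not a remote specification: " + spec)
    if rename:
        return not _has_wildcard(path)
    directory, _slash, _name = path.rpartition("/")
    return not _has_wildcard(directory)

if __name__ == "__main__":
    print(remote_spec_ok("user@host:/tmp/file*"))   # wildcard in file name only
    print(remote_spec_ok("user@host:/tm*/file*"))   # wildcard in directory part
```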

Performance Tuning

To find the transfer rate, turn on the -V option.

As with bbFTP, performance is affected by the number of streams and the TCP send/receive window sizes. You can set the number of streams by using the -p option. Starting with bbSCP version 1.0.6, the default is 2 streams. To set the window sizes in KB, use the -X option for the send window and -Y for the receive window. The default is the bbFTP default send/receive window size.

For more information concerning test and verification of bbSCP, see Using bbSCP for Test and Verification.

The bbSCP script was written in Perl by NAS staff member Greg Matthews.

Using bbSCP for Test and Verification

The following examples provide test and verification data and sample commands for using bbSCP between two hosts (crow & lfe2.nas.nasa.gov or dmzfs1.nas.nasa.gov).

Straight File Transfer

This example demonstrates the transfer of a file named 100mb:

crow% bbscp -V 100mb [email protected]:/nobackup1/user/

/home/user/bin/bbscp: will run commandline: bbftp -s -r 1 -V -p 8 -u user -i /tmp/bbscp.lKCrSUg lfe2.nas.nasa.gov

/home/user/bin/bbscp: begin output of bbftp:

------
WARNING! This is a US Government computer. This system is for .....
------
Authenticated with partial success.

Plugin authentication

Enter PASSCODE:

>> COMMAND : setoption keepaccess
<< OK
>> COMMAND : setoption keepmode
<< OK
>> COMMAND : setoption nocreatedir
<< OK
>> COMMAND : put 100mb /nobackup1/user/100mb
<< OK
104857600 bytes send in 5.43 secs (1.89e+04 KB/sec or 147 Mb/s)

/home/user/bin/bbscp: end output of bbftp
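If you collect -V output in a log, the summary line can be parsed to track transfer rates over time. The sketch below is a hypothetical helper, not part of bbSCP or bbFTP; the line format is taken from the sample output above, and the unit interpretation (KB as 1024 bytes, Mb as 1024 x 1024 bits) is inferred from that one sample rather than from bbFTP documentation.

```python
import re

# Matches the summary line printed with -V, e.g.
# "104857600 bytes send in 5.43 secs (1.89e+04 KB/sec or 147 Mb/s)"
_SUMMARY = re.compile(r"(\d+) bytes send in ([\d.]+) secs")

def parse_transfer(line: str):
    """Return (bytes, seconds, KB/sec, Mb/s) or None if the line does not match."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    nbytes = int(match.group(1))
    secs = float(match.group(2))
    kb_per_sec = nbytes / 1024 / secs
    mbit_per_sec = nbytes * 8 / (1024 * 1024) / secs
    return nbytes, secs, kb_per_sec, mbit_per_sec

if __name__ == "__main__":
    sample = "104857600 bytes send in 5.43 secs (1.89e+04 KB/sec or 147 Mb/s)"
    print(parse_transfer(sample))
```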

Renaming File at Destination

This example demonstrates how to transfer a single file (named 100mb) and rename it (to crow-100mb) at the destination. Both local-to-remote and remote-to-local transfers are supported.

crow% bbscp -V -N 100mb [email protected]:/nobackup1/user/crow-100mb

/home/user/bin/bbscp: will run commandline: bbftp -s -r 1 -V -p 8 -u user -i /tmp/bbscp.5eUBcTX lfe2.nas.nasa.gov

/home/user/bin/bbscp: begin output of bbftp:

------
WARNING! This is a US Government computer. This system is for .....
------
Authenticated with partial success.

Plugin authentication

Enter PASSCODE:

>> COMMAND : setoption keepaccess
<< OK
>> COMMAND : setoption keepmode
<< OK
>> COMMAND : setoption nocreatedir
<< OK
>> COMMAND : put 100mb /nobackup1/user/crow-100mb
<< OK
104857600 bytes send in 5.3 secs (1.93e+04 KB/sec or 151 Mb/s)

/home/user/bin/bbscp: end output of bbftp

Adjusting the TCP Window Size

This example demonstrates the use of the -X and -Y options to set the TCP window size (available in bbSCP version 1.0.2 and above):

crow% ./bbscp -V -N -X 2000 -Y 2000 1gig.dat [email protected]:/home/user/garbage.dat
bbscp: will run commandline: bbftp -s -r 1 -V -p 8 -u kfreeman -i /tmp/bbscp.SNxL5RT dmzfs1.nas.nasa.gov
bbscp: begin output of bbftp:
[email protected]'s password:

>> COMMAND : setoption keepaccess
<< OK
>> COMMAND : setoption keepmode
<< OK
>> COMMAND : setoption nocreatedir
<< OK
>> COMMAND : setsendwinsize 2000
<< OK
>> COMMAND : setrecvwinsize 2000
<< OK
>> COMMAND : put 1gig.dat /home/kfreeman/garbage.dat
<< OK

1109393408 bytes send in 34.6 secs (3.13e+04 KB/sec or 244 Mb/s)
bbscp: end output of bbftp

Dry Run/Debugging

This example demonstrates the use of the -d option for a dry run. In this case, the bbSCP script performs its duty but does not actually execute bbFTP. The bbftp command line is printed, along with the contents of the bbFTP control file.

lfe2.user% bbscp -d -V -N one-gig [email protected]:/home/user/data/lfe2-one-gig

/usr/local/bin/bbscp: would have run commandline: bbftp -s -r 1 -V -p 8 -u user -i /tmp/bbscp.4PZYIuL crow.eos.nasa.gov

/usr/local/bin/bbscp: bbftp control-file (/tmp/bbscp.4PZYIuL) looks like:
setoption keepaccess
setoption keepmode
setoption nocreatedir
put one-gig /home/user/data/lfe2-one-gig
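To make the relationship between the wrapper and bbFTP concrete, the following Python sketch reproduces the control file and command line shown in the dry-run output above. It is an illustration only, not the actual Perl implementation of bbSCP; the function names are hypothetical and the defaults are copied from this example.

```python
def build_control_file(local_path: str, remote_path: str) -> str:
    """Return control-file text matching the dry-run listing above."""
    lines = [
        "setoption keepaccess",
        "setoption keepmode",
        "setoption nocreatedir",
        f"put {local_path} {remote_path}",
    ]
    return "\n".join(lines) + "\n"

def build_bbftp_argv(user: str, host: str, control_path: str,
                     streams: int = 8, tries: int = 1) -> list[str]:
    """Return the bbftp argument list printed by the dry run (with -V)."""
    return ["bbftp", "-s", "-r", str(tries), "-V", "-p", str(streams),
            "-u", user, "-i", control_path, host]

if __name__ == "__main__":
    print(build_control_file("one-gig", "/home/user/data/lfe2-one-gig"), end="")
    print(" ".join(build_bbftp_argv("user", "crow.eos.nasa.gov", "/tmp/bbscp.4PZYIuL")))
```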

Using bbFTP for Remote File Transfers

When and Why to Use bbFTP

If your data is being transferred to or from a NAS system over the wide area network, scp will almost always be the limiting factor, due to the static TCP windowing defined in the OpenSSH source code (versions older than 4.7). The optimal window size for a data transfer is given by the Bandwidth Delay Product (BDP): the bandwidth of the pipe multiplied by the latency. With the window size statically defined for lower-speed networks, scp can never properly utilize the available bandwidth. bbFTP has dynamically adjustable window sizes (up to the maximum allowed by the system) and can also transmit multiple simultaneous streams of data. We have found that this application provides the best mechanism for making use of the bandwidth available between two sites.
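As a worked example of the Bandwidth Delay Product, the Python sketch below computes the window size for a given link. The 1 Gb/s bandwidth and 30 ms delay are illustrative assumptions, not measurements of any NAS link, and the function names are hypothetical.

```python
import math

def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth Delay Product in bytes: bandwidth multiplied by latency."""
    return bandwidth_bits_per_sec * rtt_ms // (8 * 1000)

def window_kb(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Smallest whole number of KB (1024 bytes) that covers the BDP."""
    return math.ceil(bdp_bytes(bandwidth_bits_per_sec, rtt_ms) / 1024)

if __name__ == "__main__":
    # Assumed link: 1 Gb/s with a 30 ms delay.
    print(bdp_bytes(1_000_000_000, 30), "bytes")
    print(window_kb(1_000_000_000, 30), "KB")
```

For this assumed link the product is 3,750,000 bytes, far larger than a window of a few hundred KB, which is why a statically small window limits throughput on long paths.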

Things to check:

• Are you using scp to transfer files?
• Is it OK to transfer your data unencrypted?
• Are you transferring files to an offsite location? (outside NAS or NASA Ames)
• Is the average delay between sites larger than 30 ms?
• Is the data being transferred in large files (1 GB+)?

If the answer to all of these is 'Yes', then the bbFTP application will improve data transfer rates. Please follow the guide below to get started.
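The checklist above can be written as a small decision function. This is only a sketch of the rule as stated in this article; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TransferProfile:
    uses_scp: bool          # currently transferring with scp
    unencrypted_ok: bool    # data may travel unencrypted
    offsite: bool           # remote end is outside NAS or NASA Ames
    avg_delay_ms: float     # average delay between the two sites
    typical_file_gb: float  # typical size of the files being moved

def bbftp_likely_to_help(profile: TransferProfile) -> bool:
    """True only when every item on the checklist is answered 'Yes'."""
    return all([
        profile.uses_scp,
        profile.unencrypted_ok,
        profile.offsite,
        profile.avg_delay_ms > 30,
        profile.typical_file_gb >= 1,
    ])

if __name__ == "__main__":
    print(bbftp_likely_to_help(TransferProfile(True, True, True, 45.0, 2.0)))
```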

Downloading bbFTP

bbFTP has been tested to work on many operating systems: Linux, IRIX, Solaris, BSD, and MacOSX. Other systems may also be supported.

If you intend to initiate bbFTP from your local system, you will need to install the bbFTP client on your local system. If you intend to initiate bbFTP from a NAS host, you will need to install the bbFTP server on your local system.

Download the latest version (3.2.1) from the bbFTP distribution site, IN2P3.

Installing bbFTP

If you download a source code distribution, follow the instructions below to build and install bbFTP. This guide covers the client setup only. Installing the server version is similar.

your_local_system% tar -zxvf bbftp*
your_local_system% cd bbftp*/bbftpc (or bbftp*/bbftpd for the server version)
your_local_system% ./configure
your_local_system% make
your_local_system% make install (optional, requires root privileges to install)

By default, the application will install in /usr/local/bin. If you do not have admin privileges, you may skip the last step and copy the bbFTP binary to your home directory, or run it from the current location.

Note: If you are using a Mac system with a newer OS such as El Capitan or Sierra, you may receive the following error message:

bbftp_get.c:98:8: error: expected ')'
my64_t ntohll(my64_t v) ;

To resolve this error, change the ./configure line listed above as follows:

your_local_system% env CFLAGS=-DHAVE_NTOHLL ./configure

Using bbFTP

To write the version of bbFTP and default values to standard output:

bbftp -v

For example:

pfe21% bbftp -v
bbftp version 3.2.0
Compiled with : default port 5021
compression with Zlib-1.2.3
encryption with OpenSSL 0.9.8a 11 Oct 2005
default ssh command = ssh -q
default ssh remote command = bbftpd -s
default number of tries = 5
default sendwinsize = 256 Kbytes
default recvwinsize = 256 Kbytes
default number of stream = 1

To request the execution of commands contained in the control file control_file or the control_commands using remote_username on remote_host:

bbftp [options] [-u remote_username] -i control_file [remote_host]
bbftp [options] [-u remote_username] -e control_commands [remote_host]

Note that either the -i or the -e option is mandatory. The examples given in this article all use -e control_commands.

Available options are:

[-b (background)]
[-c (gzip compress)]
[-D[min:max] (Domain of Ephemeral Ports)]
[-f errorfile]
[-E server command for ssh]
[-I ssh identity file]
[-L ssh command]
[-s (use ssh)]
[-S (use ssh in batch mode)]
[-m (special output for statistics)]
[-n (simulation mode: no data written)]
[-o outputfile]
[-p number of // streams]
[-q (for QBSS on)]
[-r number of tries ]
[-R .bbftprc filename]
[-t (timestamp)]
[-V (verbose)] will print out the transfer rate
[-w controlport]
[-W (print warning to stderr)]

For more information about each option, see man bbftp. Those used in the examples will be briefly described.

Single stream vs multiple streams

Single stream:

Using single stream is the easiest, but may not provide optimal performance.

In the examples below, bbFTP is run from the current working directory. If it was installed in a system path location, the "./" may be omitted.

The -s option tells bbFTP to use ssh to remotely start a bbftpd daemon. It usually starts the binary bbftpd -s, but this can be changed through the -E option.

The first command is to pull a file from a remote host using get and the second command is to push a file to the remote host using put.

./bbftp -s -u remote_username -e 'get filename' remote_host

./bbftp -s -u remote_username -e 'put filename' remote_host

Multiple streams:

For transfers between two NAS hosts, such as Pleiades and Lou, no more than 2 streams should be used.

For transfers between your site and NAS, more streams will probably help. In several tests, using 8 streams gave the best performance.

If there is little increase in the transfer rate from single stream to multiple streams, a lower number may be used. The value must be changed in both the control command -e and the server command -E so that the server listens for the same number of streams as the client requests.

In the examples below, the -s option is not used. Instead, the -E option specifies the server command (bbftpd -s, with -m 8 added to match the stream count) that ssh starts remotely as the bbftpd daemon.

./bbftp -u remote_username -e 'setnbstream 8; get filename' -E 'bbftpd -s -m 8' remote_host

./bbftp -u remote_username -e 'setnbstream 8; put filename' -E 'bbftpd -s -m 8' remote_host
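Because the stream count has to match in both the -e and -E arguments, a script that builds the command from a single value avoids mismatches. The following Python sketch shows one way to do that; the function names are hypothetical, and the command is only assembled, not executed.

```python
import shlex

def bbftp_multistream_argv(user: str, host: str, action: str, filename: str,
                           streams: int = 8, bbftp: str = "./bbftp") -> list[str]:
    """Build a multi-stream bbftp command with matching client and server counts."""
    if action not in ("get", "put"):
        raise ValueError("action must be 'get' or 'put'")
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return [
        bbftp, "-u", user,
        "-e", f"setnbstream {streams}; {action} {filename}",  # client side
        "-E", f"bbftpd -s -m {streams}",                      # server side
        host,
    ]

def to_shell(argv: list[str]) -> str:
    """Quote the argument list for display or for pasting into a shell."""
    return shlex.join(argv)

if __name__ == "__main__":
    print(to_shell(bbftp_multistream_argv("remote_username", "remote_host", "get", "filename")))
```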

• File related commands

You may need to use the command cd to change directory on the remote host or lcd to change directory on the host where the bbftp command is issued in order to get or put files from/to the directory you intend to use. For the rules, please see the man page of bbFTP. Here are some examples:

bbftp -s -u remote_username -e 'cd /u/username/abc; get filename' remote_host

bbftp -s -u remote_username -e 'cd /u/username/abc; lcd def; put filename' remote_host

• Initiating bbFTP from a host outside of NAS domain

If you want to initiate bbFTP from a host that is not within the NAS domain to transfer files to/from a NAS host (not including dmzfs1), you must first set up SSH passthrough. Then, complete these steps:

In the .ssh/config file on your local system, be sure to include entries with the fully-qualified domain name. For example:

Host pfe21.nas.nasa.gov
    ProxyCommand ssh sfe1.nas.nasa.gov /usr/local/bin/ssh-proxy pfe21.nas.nasa.gov

In the bbftp command line, use the fully-qualified domain name of the NAS host (for example, pfe21.nas.nasa.gov):

your_local_system% ./bbftp -s -u nas_username -e 'get filename' pfe21.nas.nasa.gov

These two steps are needed because bbFTP uses the gethostbyname function to check a hostname for connection and then uses ssh to connect to that hostname. Thus, a fully-qualified domain name in the ~/.ssh/config file is required. If the fully-qualified domain name cannot be found in ~/.ssh/config, you will get the error:

BBFTP-ERROR-00061 : Error waiting MSG_LOGGED_STDIN message

For Pleiades, one has to use pfe[20-27].nas.nasa.gov. The front-end load balancer, pfe.nas.nasa.gov, does not work with bbFTP. For example:

your_local_system% bbftp -s -u nas_username -e 'get filename' pfe.nas.nasa.gov
BBFTP-ERROR-00017 : Hostname not found (pfe.nas.nasa.gov)
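A script that selects the target host can reject the load-balancer name before bbftp is called. The following Python sketch assumes the pfe[20-27] range given above is current; the function name is hypothetical, and the check covers Pleiades front ends only.

```python
import re

# Fully-qualified Pleiades front-end names, pfe20 through pfe27.
_PFE_FQDN = re.compile(r"pfe2[0-7]\.nas\.nasa\.gov")

def is_pleiades_frontend_fqdn(host: str) -> bool:
    """True for pfe[20-27].nas.nasa.gov; False for pfe.nas.nasa.gov or short names."""
    return _PFE_FQDN.fullmatch(host) is not None

if __name__ == "__main__":
    for name in ("pfe21.nas.nasa.gov", "pfe.nas.nasa.gov", "pfe21"):
        print(name, is_pleiades_frontend_fqdn(name))
```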

On the other hand, for ssh or scp, one can use either the fully-qualified domain name above or the abbreviated name below:

Host pfe21
    ProxyCommand ssh sfe1.nas.nasa.gov /usr/local/bin/ssh-proxy pfe21.nas.nasa.gov

• Specifying port range

You can define the NAS port range, 50000-51000, by using one of the following methods. (The default port, 5021/tcp, will remain open because it is the control channel for bbFTP.)

bbftp -s -u remote_username -D50000:51000 -e 'setnbstream 8; put filename' -E 'bbftpd -s -m8' remotehost

bbftp -u username -e ' put file' -E 'bbftpd -s -e 50000:51000' hostname


Performance Tuning

To find the transfer rate, turn on the -V option.

Performance of bbFTP is affected by the number of streams and the TCP window sizes.

The TCP window size determines the amount of outstanding data a transmitting end-host can send on a particular connection before it gets acknowledgment back from the receiving end-host. For optimal performance, the window size should be set to the value of the Bandwidth Delay Product (i.e., the product of the bandwidth of the pipe and the latency). bbFTP is compiled with a default send and receive TCP window size, as can be seen with the -v option, and can dynamically adjust the window size (up to the maximum allowed by the system) for better performance. However, a user can also choose a non-default send/recv window size (in KB). For example:

bbftp -e 'setrecvwinsize 1024; setsendwinsize 1024; put filename' -E 'bbftpd -s' remote_host

For high-speed links where bbFTP is still not performing as well as expected, the cause may be a system windowing limitation. Most operating systems have the maximum window size set to a small value, such as 64 KB. As a practice, NAS systems are set to a minimum of 512 KB.

If you are not receiving good performance, ask your system administrator if performance tuning is necessary for your local system.
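To see why a small system window limit matters, note that a single TCP stream cannot carry more than one window of data per round trip. The Python sketch below applies that relationship to the window sizes mentioned in this article; the 30 ms round-trip time is an assumed value, and the function name is hypothetical.

```python
def max_stream_bits_per_sec(window_kb: int, rtt_ms: int) -> int:
    """Upper bound on single-stream throughput: one window per round trip."""
    window_bits = window_kb * 1024 * 8
    return window_bits * 1000 // rtt_ms

if __name__ == "__main__":
    # Window sizes mentioned in this article, with an assumed 30 ms round trip.
    for kb in (64, 256, 512):
        mbps = max_stream_bits_per_sec(kb, 30) / 1_000_000
        print(f"{kb} KB window: at most {mbps:.1f} Mb/s per stream")
```

Under this assumption, a 64 KB window caps a stream at roughly 17 Mb/s regardless of link speed, which is one reason multiple streams and larger windows help on long paths.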

Using SUP for Remote Transfers

Using the Secure Unattended Proxy (SUP)

The Secure Unattended Proxy (SUP) allows you to perform remote operations on specific hosts within the NAS enclave (currently, the Pleiades and Lou front-end systems [PFEs and LFEs]) without needing to use your RSA SecurID token at the time of the operation.

To accomplish this, you must first obtain special "SUP keys" using RSA SecurID authentication, which you can then use to perform operations from unattended jobs and/or scripts. Each SUP key is valid for a period of one week from the time it is generated. You may have multiple SUP keys at the same time, which will expire asynchronously.
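For unattended jobs, it can help to know when a key will stop working. The sketch below computes the expiry from the one-week lifetime stated above; it assumes you record the generation time yourself, since this article does not describe a way to query it from the SUP, and the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SUP_KEY_LIFETIME = timedelta(weeks=1)  # validity period stated in this article

def sup_key_expiry(generated_at: datetime) -> datetime:
    """Time at which a key generated at generated_at stops being valid."""
    return generated_at + SUP_KEY_LIFETIME

def sup_key_valid(generated_at: datetime, now: datetime) -> bool:
    """True while the key is still within its one-week lifetime."""
    return now < sup_key_expiry(generated_at)

if __name__ == "__main__":
    created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # example timestamp
    print(sup_key_expiry(created).isoformat())
```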

SUP keys are currently allowed to call scp, sftp, bbftp, qstat, rsync, shiftc, ssh-balance, and test. In the future, other operations may be available via the SUP.

Note: The shiftc command is built into the sup command; therefore, when you download the SUP client, you will also get shiftc.

SUP Usage Summary

The steps in this section demonstrate how to quickly get up and running with the SUP. Each step is explained in more detail in subsequent sections.

In these steps, host refers to a PFE or LFE; the command lines shown are examples.

1. Download and install the SUP client in your personal bin directory (one time):

your_localhost% wget -O sup https://www.nas.nasa.gov/hecc/support/kb/file/9
your_localhost% chmod 700 sup
your_localhost% mv sup ~/bin

2. Authorize host for SUP operations (one time per host):

pfe% touch ~/.meshrc

3. Authorize directories for write operations (one or more times per host):

pfe% echo /nobackupp2 >> ~/.meshrc

4. Execute a command (each time):

your_localhost% sup scp foobar pfe21:/nobackupp2/username/c_foobar

5. Examine expected output (as needed)
6. Troubleshoot problems (as needed)

SUP Client

The SUP client performs all the steps needed to execute commands through the SUP as if the SUP itself did not exist. Commands that are allowed to pass through the SUP can be executed as if the remote host were directly connected by simply prepending the client command sup, as shown in step 4, above. Besides executing remote commands, the client also includes an operating-system-independent virtual filesystem that allows files across all SUP-connected resources to be accessed using standard filesystem commands.

Requirements

The client requires Perl version 5.8.5 or above to execute and has been tested successfully on Linux, OS X, and Windows under Cygwin and coLinux. Only SSH is required to use the SUP. However, if these requirements cannot be met it is possible to use the SUP without the client.

Note for Windows users: even if the client is not used, scp and sftp require functionality only found in the OpenSSH versions of these commands, so Cygwin or coLinux will still be needed.

Installation

Complete the following steps to install the client:

1. Download the client and save it to a file called sup
2. Make the client executable by using chmod 700 sup
3. Move the client to a location in your $PATH

SSH Configuration

If your local username differs from your NAS username, it is recommended that you add the following to your ~/.ssh/config file, where nas_username is replaced with your NAS username:

Host sup.nas.nasa.gov sup-key.nas.nasa.gov
    User nas_username

Note: If you are using a config file based on the NAS config template, you do not have to do this step.

Alternatively, the client's -u option can be used as described in the next section. If your local username is the same as your NAS username, no additional configuration or command-line options are required.

SUP Command-line Options

-b
Disable user interaction; for use within scripts. Note that the client will fail if any interaction is required--normally only needed when your SUP key has expired or is otherwise unavailable.

-k
By default, the client leaves any SSH agents started on your behalf running for future invocations after the client exits. This option forces spawned agents to be killed before exiting. Note that -b automatically implies -k.

-u username
Specify NAS username. Note that this option is required if your local username differs from your NAS username and you have not modified your SSH configuration appropriately.

-v
Enable verbose output for debugging purposes.

SUP Authorizations

The basic set of operations that may be performed using the SUP is specified by the administrator. To protect accounts from malicious use of SUP keys, users must grant execute and write permissions to SUP operations on each target system.

Authorizing Execute Operations

By default, even SUP operations permitted by site policy are not allowed to execute on a given host. To enable SUP operations to a given host (PFE or LFE), the file ~/.meshrc must exist on that host, which can be created by invoking the following:

% touch ~/.meshrc

Note that the PFEs share their home filesystems, so this needs to be done on only one of these nodes. Other systems must be authorized separately. Once this file exists on a host, all operations permitted by site policy are allowed to execute on that host.

Authorizing Write Operations

By default, SUP operations are not allowed to write to the filesystem on a given host. To enable writes to a given directory on a given host, that directory must be added (on a separate line) to the ~/.meshrc file on that host. For example, the following lines in ~/.meshrc indicate that writes should be permitted to /nobackupp2 and your home directory.

/nobackupp2
/u/username

Each directory is the root of allowed writes, so this configuration would allow writes to all files and directories rooted at /nobackupp2 and your home directory (for example, /nobackupp2/some/dir).

Note that the root directory cannot be authorized. Also note that dot files (that is, ~/.*) in your home directory are never writable regardless of the contents of ~/.meshrc.
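The write rules above can be modeled in a few lines, which may help when checking whether a planned destination will be accepted. The Python sketch below is a model of the rules as described in this article, not the SUP's actual implementation; the function names are hypothetical, lines that are not absolute paths are ignored as an assumption, and paths are compared as written (no symlink or ".." resolution).

```python
from pathlib import PurePosixPath

def parse_meshrc(text: str) -> list[PurePosixPath]:
    """Return the authorized write roots listed one per line in ~/.meshrc."""
    roots = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        root = PurePosixPath(line)
        # The root directory cannot be authorized; relative entries are skipped.
        if not root.is_absolute() or root == PurePosixPath("/"):
            continue
        roots.append(root)
    return roots

def write_allowed(target: str, roots: list[PurePosixPath], home: str) -> bool:
    """Model of the rules: under an authorized root, and not a dot file in home."""
    path = PurePosixPath(target)
    home_path = PurePosixPath(home)
    if home_path == path or home_path in path.parents:
        relative = path.relative_to(home_path)
        # Dot files (~/.*) are never writable, whatever ~/.meshrc contains.
        if relative.parts and relative.parts[0].startswith("."):
            return False
    return any(path == root or root in path.parents for root in roots)

if __name__ == "__main__":
    roots = parse_meshrc("/nobackupp2\n/u/username\n")
    print(write_allowed("/nobackupp2/some/dir", roots, "/u/username"))
    print(write_allowed("/u/username/.ssh/authorized_keys", roots, "/u/username"))
```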

Executing Commands Through SUP

Usage examples of each command that may be executed through the SUP are given below.

Note: SUP commands must be authorized for execution on each target host, and transfers to a given host must be authorized for writes. Before a given operation is performed, the client may ask for certain information, including the existing or new passphrase for ~/.ssh/id_rsa, the password + passcode for sup.nas.nasa.gov, and/or the password + passcode for sup-key.nas.nasa.gov.

For more detailed information on each command, see their corresponding man pages.

File Transfer Commands

bbftp

your_localhost% sup bbftp -e "put foobar /u/username/c_foobar" pfe21.nas.nasa.gov

Note that you must use the fully qualified domain name of the target host (in this case, pfe21.nas.nasa.gov) if you are not within the NAS domain.

bbscp

your_localhost% sup bbscp foobar pfe21.nas.nasa.gov:/u/username/c_foobar

Note that the bbscp script is just a client-side wrapper for bbftp, therefore, as with bbftp, you must use the fully qualified domain name of the target host (in this case, pfe21.nas.nasa.gov) if you are not within the NAS domain.

rsync

your_localhost% sup rsync foobar pfe21:/nobackupp2/username/c_foobar

If you intend to transfer files to your home directory, note that even if your home directory has been authorized for writes, rsync transfers to your home directory will fail unless the --inplace option is specified. This is because rsync uses temporary files starting with "." during transfers, which cannot be written in your home directory. You can avoid this problem by specifying --inplace as shown in the following example:

your_localhost% sup rsync --inplace foobar pfe21:

scp

your_localhost% sup scp foobar pfe21:/nobackupp2/username/c_foobar

sftp

your_localhost% sup sftp pfe21

shiftc

your_localhost% sup shiftc foobar pfe21:/nobackupp2/username/c_foobar

For more information, see Using Shift for Remote Transfers and Tar Operations.

File Monitoring Command

test

your_localhost% sup ssh pfe21 test -f /u/username/c_foobar
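An unattended script can use the test command above to wait for a file to appear. The Python sketch below polls with the same command line; it assumes that sup passes the exit status of the remote test command back to the caller, which this article does not state, and the function names, attempt count, and delay are hypothetical.

```python
import subprocess
import time

def remote_file_exists(host: str, path: str, run=subprocess.run) -> bool:
    """Run 'sup ssh HOST test -f PATH' and treat exit status 0 as 'file exists'."""
    result = run(["sup", "ssh", host, "test", "-f", path])
    return result.returncode == 0

def wait_for_remote_file(host: str, path: str, attempts: int = 10,
                         delay_s: float = 60.0,
                         exists=remote_file_exists, sleep=time.sleep) -> bool:
    """Poll until the file exists or the attempts run out."""
    for attempt in range(attempts):
        if exists(host, path):
            return True
        if attempt + 1 < attempts:
            sleep(delay_s)  # wait before the next check
    return False
```

The exists and sleep parameters are there so the polling logic can be exercised without a SUP connection; in normal use, call wait_for_remote_file("pfe21", "/u/username/c_foobar").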

Job Monitoring Command

qstat

your_localhost% sup ssh pfe21 qstat @pbspl1

ssh-balance and bbFTP

ssh-balance

your_localhost% sup ssh-balance -l pfe

Note that this command allows the Pleiades load balancer to be used with bbftp:

your_localhost% sup bbftp -e "put foobar /u/username/c_foobar" `sup ssh-balance -l pfe`

SUP Expected Output

The following sequence shows the expected output for the command:

your_localhost% sup scp foobar pfe21:/nobackupp2/username/c_foobar

for a user who has never used the SUP before.

The conditions under which each sub-sequence will be seen are indicated next to each header. Most of the items will only be seen once or during key generation. A second invocation will only show the command output portion.

1. Host key verification (seen once per client host)

No host key found for sup-key.nas.nasa.gov
...continue if fingerprint is 1b:9a:82:2b:b9:b0:7d:e5:08:50:1d:e8:14:76:a2:2e
The authenticity of host 'sup-key.nas.nasa.gov (198.9.4.24)' can't be established.
RSA key fingerprint is 1b:9a:82:2b:b9:b0:7d:e5:08:50:1d:e8:14:76:a2:2e.
Are you sure you want to continue connecting (yes/no)? yes
No host key found for sup.nas.nasa.gov
...continue if fingerprint is 52:f3:61:9b:9c:73:79:4d:22:cb:f3:cd:9a:29:4e:fe
The authenticity of host 'sup.nas.nasa.gov (198.9.4.21)' can't be established.
RSA key fingerprint is 52:f3:61:9b:9c:73:79:4d:22:cb:f3:cd:9a:29:4e:fe.
Are you sure you want to continue connecting (yes/no)? yes

2. Identity creation (seen during key generation if no identity available)

Cannot find identity /home/user/.ssh/id_rsa
...do you wish to generate it? (y/n) y
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
a3:cf:e5:50:12:6f:14:b1:21:59:19:a8:33:aa:77:40 user@host

3. Identity addition to agent (seen during key generation)

Adding identity /home/user/.ssh/id_rsa to agent
Enter passphrase for /home/user/.ssh/id_rsa:
Identity added: /home/user/.ssh/id_rsa (/home/user/.ssh/id_rsa)

4. Identity initialization (seen once per identity)

Initializing identity on sup-key.nas.nasa.gov (provide login information)
Password:
Enter PASSCODE:
Key a3:cf:e5:50:12:6f:14:b1:21:59:19:a8:33:aa:77:40 uploaded successfully

5. SUP key generation (seen when no valid SUP keys available)

Generating key on sup.nas.nasa.gov (provide login information)
Password:
Enter PASSCODE:

6. Client upgrade (seen during key generation when new client available)

A newer version of the client is available (0.39 vs. 0.37)
...do you wish to the current version? (y/n) y

7. Command output (always seen)

foobar 100% 5 0.0KB/s 00:00

SUP Troubleshooting

The following error messages may be encountered during your SUP client usage. Note that the -v option can be given to the SUP client to output additional debugging information.

• "WARNING: Your password has expired"

This message indicates that your current password has expired and must be changed. To change your password, you must log in to an LDAP host (for example, Lou) through an SFE and change your LDAP password. This change will be automatically propagated to the SUP within a few minutes.

• "Permission denied (~/.meshrc not found)"

This message indicates that you have not created a .meshrc file in your home directory on the target host. SUP commands must be authorized for execution on each target host.

• "Permission denied (unauthorized command)"

This message indicates that you have attempted an operation that is not currently authorized by the SUP. Check that the command line is valid and that the attempted command is one of the authorized commands. Certain options to authorized commands may also be disallowed, but these should never be needed in standard usage scenarios.

• "Permission denied during file access" (various forms)

These messages indicate that you attempted to read or write a file for which such access is not allowed. The most common cause is forgetting to authorize directories for writes. Reads and writes of ~/.* are never permitted.

• "Permission denied (publickey)"

This message indicates that you may not have proper permissions on your ~/.ssh and/or home directory on the target host. Check to make sure that ~/.ssh is not readable/writable by other users/groups and that your home directory is not writable by other users/groups.

Advanced SUP Use

Using the SUP Virtual File System

The SUP virtual file system (VFS) capability can be used to issue file-related commands to all SUP-connected hosts from the command line of your local host.

Accessing Files Across Multiple Hosts

The SUP client includes a virtual file system capability that allows files across all SUP-connected resources to be accessed using standard file system commands. For example, once you have activated VFS as described below, the command:

% ls pfe21:/tmp

from your local host would list the files in /tmp on pfe21. The command:

% cp filename pfe21:/tmp

from your local host would copy the file filename from your current directory on your local host to /tmp on pfe21.

The set of supported commands includes cat, cd, chgrp, chmod, chown, cmp, cp, df, diff, du, file, grep, head, less, ln, ls, mkdir, more, mv, pwd, rm, rmdir, tail, tee, test, touch, and wc.

Note that this functionality is not a true file system since only these commands are supported and only when used from within a shell. Unlike more general approaches such as FUSE, however, the SUP capability is completely portable and can be enabled with no additional privileges or software.

Commands through the VFS functionality can act on any combination of local and remote files, where remote files are prefixed with hostname:. For example, the command:

% cat pfe21:/tmp/rfile ~/lfile

would print the file rfile in /tmp on pfe21 as well as the file lfile in the user's home directory on the local host to the terminal. Any number of hosts can be included in any command. For example, the command:

% diff lfe2:/tmp/lfe_file pfe21:/tmp/pfe_file

would show the differences between the file lfe_file in /tmp on lfe2 and the file pfe_file in /tmp on pfe21. The client determines if any remote access is needed based on the path(s) given. If not, it will execute the command locally as given as rapidly as possible. Fully local commands also support all options with the exception of options of the form -f value (that is, single-dash options that take values).
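The hostname: prefix convention can be illustrated with a small classifier. This is only a sketch: is_remote_path is a hypothetical helper, and the rule it applies (a non-empty run of letters, digits, dots, and hyphens before the first colon) is an assumption, not the SUP client's actual parsing logic.

```shell
# Hypothetical helper: guess whether an argument uses the VFS
# "hostname:path" form. Illustration only, not the SUP client's parser.
is_remote_path() {
    case "$1" in
        *:*) ;;             # contains a colon, keep checking
        *) return 1 ;;      # no colon, treat as local
    esac
    _irp_host=${1%%:*}      # text before the first colon
    case "$_irp_host" in
        # Empty prefix, or a prefix with characters that cannot be part
        # of a host name (such as "/"), is treated as a local path.
        ''|*[!A-Za-z0-9.-]*) return 1 ;;
    esac
    return 0
}
```

For example, is_remote_path pfe21:/tmp/rfile succeeds, while is_remote_path /tmp/a:b fails because the text before the colon contains a slash.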

VFS Activation

Requirements

Currently, SUP VFS functionality is only supported for bash, but csh support is planned for the future. This functionality requires Perl version 5.8.5 (note that this is more recent than version 5.6.1 required by the basic client functionality). It also requires the standard Unix utilities cat, column, false, sort, and true; and has been tested successfully on Linux, OS X, and Windows under Cygwin and coLinux. Note that users of Windows under Cygwin may need to install the coreutils and util-linux packages to obtain these utilities.

Activation and Deactivation on Your Local Host

Complete these steps.

1. Install the SUP client if you have not already done so.
2. Activate VFS functionality in a bash shell.

♦ For an interactive bash shell, run:

eval `sup -s bash`

♦ For a non-interactive bash shell, begin the script with the following:

#!/bin/bash
shopt -s expand_aliases
eval `sup -s bash`

The instructions will load aliases and functions that will intercept specific commands and replace them with commands that will perform the requested actions through the SUP client.

To deactivate VFS functionality in a bash shell or bash script at any time, run:

% eval `sup -r bash`
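In scripts that may run on hosts where the SUP client is missing, the activation step can be guarded. The following is a minimal sketch, assuming a hypothetical helper name activate_sup_vfs; only the shopt and eval lines come from the instructions above, and the checks around them are illustrative.

```shell
# Sketch: activate the SUP VFS only under bash with the sup client on
# PATH. Any arguments are passed through to "sup -s bash".
activate_sup_vfs() {
    # The VFS is documented as bash-only.
    [ -n "${BASH_VERSION:-}" ] || { echo "SUP VFS requires bash" >&2; return 1; }
    # Refuse to continue if the sup client cannot be found.
    command -v sup >/dev/null 2>&1 || { echo "sup client not found" >&2; return 1; }
    shopt -s expand_aliases        # needed in non-interactive bash
    eval "$(sup -s bash "$@")"     # same activation as documented
}
```

A script could then call activate_sup_vfs -u nasuser || exit 1 before issuing any commands on remote paths.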

Command-Line Options

The behavior of the virtual file system can be modified using various options at the time it is activated.

-ocmd=opts
    Specify default options for a given command since the VFS functionality overrides any existing aliases for its supported set of commands.

-t transport
    Change the file transport from its sftp default to transport. Currently, the only additional transport available is bbftp. Note that using bbftp as the transport may slow down certain operations on small files as bbftp has higher startup overhead.

-u user
    Specify NAS user name. Note that this option is required if your local user name differs from your NAS user name.

For example, the following invocation activates the client virtual file system using bbftp as the transport mechanism, nasuser as the user and adds colorization of local file listings using the Linux ls --color=always option.

% eval `sup -s bash -t bbftp -u nasuser -ols=--color=always`

VFS Caveats

The VFS functionality is still somewhat experimental. In general, it works for the most common usage scenarios with some caveats. In particular:

• "Whole file" commands (that is, commands that must process the entire file), including cat, cmp, diff, grep, wc (and currently more/less due to implementation) retrieve files first before processing for efficiency. Thus, these commands should not be executed on very large files.

• There is a conflict between commands that take piped input and the custom globbing of the client, thus these commands have portions of globbing support disabled. These commands are grep, head, less, more, tail, tee, and wc. In these cases, globbing will work for absolute prefixes, but not relative. For example, grep filename pfe21:/tmp/* will work, but cd pfe21:/tmp; grep filename * will not.

• Redirection to/from remote files doesn't work. The same effect can be achieved using cat and tee (for example, grep localhost < pfe21:/etc/hosts > pfe21:/tmp/a could be done with cat pfe21:/etc/hosts | grep localhost | tee pfe21:/tmp/a > /dev/null). Redirection still works normally for local files.

• The first time a command is run involving a particular host, a SFTP connection is created to that host. When running , it may appear as if a zombie client process is running.

VFS Commands

Currently supported commands and their currently supported options are below. Unsupported options will simply be ignored except where noted. All commands are still subject to SUP authorizations, thus something that cannot be executed or written normally through the SUP cannot be executed or written through this functionality either.

• cat (no options)

• cd (no options)
Note that when changing to remote directories, cd only changes $PWD, so to make changes visible the working directory (that is, \w in bash) must be in your prompt. For example, the following prompt:

export PS1="\h[\w]> "

would display the current host name followed by the current working directory.

• chgrp (no options)
Groups may be specified either by number or by name. Names will be resolved on the remote host.

• chmod (no options)
Modes must be specified numerically (for example, 0700). Symbolic modes, such as a+rX, are not currently supported.

• chown (no options)
Users and groups may be specified either by number or by name. Names will be resolved on the remote host.

• cmp (all options)

• cp [-r]
Note that copies between two remote hosts transfer files to the local host first since the SUP does not allow third party transfers. Thus, very large file transfers between remote systems should be achieved using an alternate approach.

• df [-i]
Note that 1024-byte blocks are used.

• diff (all options)

• du [-a] [-b] [-s]
Note that 1024-byte blocks are used.

• file (all options)

• grep (all options)

• head [-number]
Note that head does not support the form -n number, so, for example, to display the first 5 lines of a file, use -5 and not -n 5.

• less (all options)

• ln [-s]
Note that hard links are not supported. Links from remote files to local files (for example, ln -s pfe21:/filename /filename) will be dereferenced during certain operations (for example, cat /filename will cat pfe21:/filename).

• ls [-1] [-d] [-l]
For efficiency purposes, ls behaves slightly differently for remote commands than for local commands. In particular ls -l will not show links by default and will show what is actually linked instead of the link itself. Link details can be obtained using the -d option (for example, ls -ld *).

Also for efficiency, ls processes remote files before local files, so output ordering may be changed when remote and local files are interleaved on the ls command line. For example, ls /file1 pfe21: /file2 would show pfe21: first, then /file1, then /file2.

• mkdir (no options)

• more (all options)

• mv (no options)

• pwd (no options)

• rm [-r]

• rmdir (no options)

• tail [-number]
Note that tail does not support the form -n number, so, for example, to display the last 5 lines of a file, use -5 and not -n 5.

• tee [-a]

• test [-b] [-c] [-d] [-e] [-f] [-g] [-h] [-k] [-L] [-p] [-r] [-s] [-S] [-u] [-w]
Note that compound and string tests are not supported. Compound and string tests can be achieved using multiple test commands separated by shell compound operators. For example, instead of

% test -f pfe21:/filename -a "abc" != "123"

do

% test -f pfe21:/filename && test "abc" != "123"

Alternatively, the actual test command can be executed through the SUP:

% sup ssh pfe21 test -f /filename -a "abc" != "123"

• touch (no options)

• wc (all options)

Using the SUP without the SUP Client

The recommended way to transfer files using the SUP is by using the SUP client. However, because the SUP client requires Perl, it may not be suitable for all purposes, in which case you can use this method.

The only software that is actually required to use the SUP is SSH. This article details the manual steps required to use the SUP with only SSH. Before using this method, be sure to read the client instructions for a full overview of the SUP.

SUP Manual Usage Summary

The steps below demonstrate how to get up and running with the SUP without the client, using a bbFTP transfer to lfe5 as an example. For full details on each step, click the link provided; otherwise, simply read this page to completion.

1. Initialize a long-term key on sup-key.nas.nasa.gov (one time)

ssh -x -oPubkeyAuthentication=no sup-key.nas.nasa.gov \
    mesh-keygen --init <~/.ssh/authorized_keys

2. Generate a SUP key (one time per week)

eval `ssh-agent`
ssh-add ~/.ssh/id_rsa
ssh -A -oPubkeyAuthentication=no sup.nas.nasa.gov \
    mesh-keygen |tee ~/.ssh/meshkey.`date -d week +%s`
ssh-agent -k

3. Authorize host for SUP operations (one time per host)

ssh lfe5 'touch ~/.meshrc'

4. Authorize directories for writes (one or more times per host)

ssh lfe5 'echo /tmp >>~/.meshrc'

5. Prepare the SUP key for use (one time per session)

eval `ssh-agent`
ssh-add -t 1w ~/.ssh/meshkey.[0-9]*

6. Execute command (each time)

bbftp -L "ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q" \
    -e "put /foo/bar /tmp/c_foobar" lfe5.nas.nasa.gov

7. Troubleshoot problems (as needed)

SUP Key Generation

1. On the very first use only, invoke the mesh-keygen command with the --init option on sup-key.nas.nasa.gov to upload an SSH authorized_keys file (used only during key generation and revocation). An authorized_keys file contains one or more SSH public keys that allow the corresponding SSH private keys to be used for authentication to a system. The uploaded authorized_keys file can be an existing file (such as your ~/.ssh/authorized_keys file from any host) or one created specifically for this purpose using a new SSH key pair generated with ssh-keygen. The public keys in this file must be in OpenSSH format (that is, not the format of the commercial SSH version used on the Secure Front-Ends [SFEs]) and must not contain any forced commands (that is, command=). For example, to upload an existing authorized_keys file, the following can be invoked:

ssh -x -oPubkeyAuthentication=no sup-key.nas.nasa.gov \
    mesh-keygen --init <~/.ssh/authorized_keys

You will be prompted to authenticate using both your NAS password (sometimes referred to as your "lou" or "LDAP" password) and your RSA SecurID passcode.

Users who have never connected to sup-key.nas.nasa.gov before may need to add a -oStrictHostKeyChecking=ask option to the SSH command line. (RSA key fingerprint of sup-key.nas.nasa.gov is 1b:9a:82:2b:b9:b0:7d:e5:08:50:1d:e8:14:76:a2:2e)

Note that this is on sup-key only and that you must use the -oPubkeyAuthentication=no option as shown. Users outside NAS may need to add an appropriate SSH option to set their login name, such as -l username.

2. Start an SSH agent (or use one currently running):

eval `ssh-agent -s` (if your shell is sh/bash)

or

eval `ssh-agent -c` (if your shell is csh/tcsh)

3. Add a private key corresponding to one of the public keys in the authorized_keys file of Step 1 to the agent (this is unnecessary if an agent is already running with the key loaded). For example:

ssh-add ~/.ssh/id_rsa

4. Invoke the mesh-keygen command on sup.nas.nasa.gov. You will be prompted to authenticate using both your NAS password and your RSA SecurID passcode. After successful authentication, the mesh-keygen command prints a SUP key to your terminal, which should be saved to a file in a directory that is readable only by you. This key can be saved to a file by copy-and-paste, redirecting standard output, or using the tee command. For example, to generate a key and save it with tee to a file whose name starts with ~/.ssh/meshkey and ends with the key's expiration time (one week from now, in seconds since the epoch), the following can be invoked:

ssh -A -oPubkeyAuthentication=no sup.nas.nasa.gov \ mesh-keygen |tee ~/.ssh/meshkey.`date -d week +%s`

Users who have never connected to sup.nas.nasa.gov before may need to add a -oStrictHostKeyChecking=ask option to the SSH command line. (RSA key fingerprint of sup.nas.nasa.gov is 52:f3:61:9b:9c:73:79:4d:22:cb:f3:cd:9a:29:4e:fe)

Note that you must use the -oPubkeyAuthentication=no option as shown. Users outside NAS may need to add an appropriate SSH option to set their login name, such as -l username.

5. Protect your keys. In order to perform unattended operations, SUP keys cannot be encrypted, thus should always be protected with appropriate file system permissions (that is, 400 or 600). Check the permissions of your key immediately after generation and modify if necessary. You are responsible for the privacy of your keys.
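The permission check on key files could be scripted roughly as follows. This is a sketch only: protect_sup_keys is a hypothetical helper, and it tightens any file that has group or other permission bits set to mode 600.

```shell
# Sketch: make sure SUP key files are accessible only by their owner.
# Files already at 400 or 600 are left untouched.
protect_sup_keys() {
    for key in "$@"; do
        [ -f "$key" ] || continue
        # Characters 5-10 of the ls mode string hold the group and
        # other permission bits (for example "r--r--" for mode 644).
        go_bits=$(ls -ld -- "$key" | cut -c5-10)
        if [ "$go_bits" != "------" ]; then
            chmod 600 -- "$key" && echo "fixed: $key"
        fi
    done
}
```

Running protect_sup_keys ~/.ssh/meshkey.* right after key generation would print one line for each file whose permissions were changed.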

SUP Key Management

Each invocation of mesh-keygen creates a new SUP key that is valid for one week from the time of generation. Users may have multiple keys at once that all expire at different times. To facilitate the management of multiple SUP keys, the mesh-keytime and mesh-keykill commands are available.
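When key files are named with an epoch-seconds suffix, as in the ~/.ssh/meshkey.`date -d week +%s` example, a rough local check of which keys have probably expired is possible without contacting the SUP. The sketch below relies entirely on that naming convention, which is an assumption about how you saved your keys; sup_key_status is a hypothetical helper, and mesh-keytime remains the authoritative source.

```shell
# Sketch: classify key files by the epoch-seconds suffix in their names.
# Prints "valid", "expired", or "unknown" followed by the file name.
sup_key_status() {
    now=$(date +%s)
    for key in "$@"; do
        expiry=${key##*.}            # text after the last dot
        case "$expiry" in
            ''|*[!0-9]*)             # suffix is not a plain number
                echo "unknown $key"
                continue ;;
        esac
        if [ "$expiry" -gt "$now" ]; then
            echo "valid $key"
        else
            echo "expired $key"
        fi
    done
}
```

For example, sup_key_status ~/.ssh/meshkey.* lists each saved key with its apparent state.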

Mesh-keytime

To determine the expiration time of a SUP key stored in a file /key/file, the following can be invoked:

ssh -xi /key/file -oIdentitiesOnly=yes -oBatchMode=yes \
    sup.nas.nasa.gov mesh-keytime

The key fingerprint and expiration time will be printed to your terminal.

Mesh-keykill

To invalidate a specific SUP key stored in a file /key/file before its expiration time has passed, you must have an SSH agent running with the same key you use to generate SUP keys as described in Steps 2 and 3 of the SUP Key Generation section. After which, the following can be invoked:

ssh -Axi /key/file -oIdentitiesOnly=yes -oBatchMode=yes \
    sup.nas.nasa.gov mesh-keykill

To invalidate all currently valid SUP keys, the following can be invoked:

ssh -Ax -oPubkeyAuthentication=no sup.nas.nasa.gov mesh-keykill --all

In this case, you will be prompted to authenticate using both your NAS password and your RSA SecurID passcode.

SUP Key Preparation

Currently, the only operations allowed with a SUP key are scp, sftp, bbftp, qstat, rsync, and test. For all operations, an SSH agent must be started with the SUP key loaded, which can be scripted as needed, because the key is unencrypted.

1. Start an SSH agent:

eval `ssh-agent -s` (if your shell is sh/bash)

or

eval `ssh-agent -c` (if your shell is csh/tcsh)

2. Add a SUP key to the agent (this is the only key required to perform unattended SUP operations) with one week expiration:

ssh-add -t 1w /key/file

Adding the -t option will prevent a buildup of keys in the agent, which can cause login failure as described in the SUP Troubleshooting section. Keys may be explicitly removed from the agent using the following:

ssh-keygen -y -f /key/file >/key/file.pub
ssh-add -d /key/file

3. Make sure agent forwarding and batch mode are enabled in your SSH client. The examples below include the appropriate options to enable agent forwarding (-A) and batch mode (-oBatchMode=yes).

SUP Commands

Examples of the use of each command that may be executed through the SUP are given below. Note that SUP commands must be authorized for execution on each target host and transfers to a given host must be authorized for writes.

bbftp

(man page)

bbftp -L "ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q" \
    -e "put /foo/bar /tmp/c_foobar" lfe5.nas.nasa.gov

Note that you must use the fully-qualified domain name of the target host (in this case, lfe5.nas.nasa.gov) if you are not within the NAS domain.

bbscp

(man page)

bbscp -L "ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q" \
    foobar lfe5.nas.nasa.gov:/tmp/c_foobar

Note that bbscp is just a client-side wrapper for bbftp, thus like bbftp, you must use the fully-qualified domain name of the target host (in this case, lfe5.nas.nasa.gov) if you are not within the NAS domain.

qstat

(man page available on Pleiades and Lou)

ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q lfe5 qstat @pbs1

rsync

(man page)

rsync -e "ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q" \
    foobar lfe5:/tmp/c_foobar

Note that even if your home directory has been authorized for writes, rsync transfers to your home directory will fail unless the -T or --temp-dir option is specified. This is because rsync uses temporary files starting with "." during transfers, which cannot be written in your home directory. By specifying an alternate temporary directory that is authorized for writes, this problem can be avoided. For example, the following uses /tmp as the temporary directory when files are transferred to the home directory. Make sure that the temporary directory specified has enough space for the files being transferred.

rsync -T /tmp -e "ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q" \
    foobar lfe5:

scp

(man page)

1. Create a file (for example, "supwrap") containing the following:

#!/bin/sh
exec ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q "$@"

2. Make the created file executable:

chmod 700 supwrap

3. Initiate the transfer. For example:

scp -S ./supwrap foobar lfe5:/tmp/c_foobar

sftp

(man page)

1. Create a file (for example, "supwrap") containing the following:

#!/bin/sh
exec ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q "$@"

Note that this file is identical to the one described for scp.

2. Make the created file executable:

chmod 700 supwrap

3. Initiate the transfer. For example:

sftp -S ./supwrap lfe5

test

(man page)

ssh -Aqx -oBatchMode=yes sup.nas.nasa.gov ssh -q lfe5 test -f /tmp/c_foobar

SUP Troubleshooting

The following error messages may be encountered during SUP usage:

• "WARNING: Your password has expired"

This message indicates that your current NAS password has expired and must be changed. To change your password, you can log in to an LDAP host (for example, an LFE or PFE) through the SFEs and follow the prompts, or you can use the NAS Password Change Form. This change will be automatically propagated to the SUP within a few minutes.

• "Permission denied (~/.meshrc not found)"

This message indicates that you have not created a .meshrc file in your home directory on the target host. SUP commands must be authorized for execution on each target host.

• "Permission denied (key expired)"

SUP keys are only valid for one week from the time of generation. This message indicates that the SUP key used for authentication has expired and is no longer valid. You must generate a new SUP key or use a different SUP key before attempting another operation.

• "Permission denied (publickey,keyboard-interactive)"

This message indicates that you have not provided the appropriate authentication credentials to the SUP. There may be several causes:

♦ If you are generating a SUP key and also receive an "Error copying key..." message, you have not loaded a private key into your SSH agent corresponding to one of the public keys in the authorized_keys file uploaded to sup-key in Steps 1-3 of the SUP Key Generation section. You can verify that the correct key is loaded by running ssh-keygen -l -f uploaded_key_file and ssh-add -l and checking that the fingerprint of your uploaded key file has been loaded into your SSH agent.

♦ If you have specified -oBatchMode=yes on the command line, a valid SUP key may not have been loaded into your SSH agent. There may also be too many keys loaded into your agent. SSH tries each key in the agent sequentially, so a valid key may still fail if it was added to the agent after a number of invalid keys greater than or equal to the login attempt limit. Check the number of keys in the agent using ssh-add -l. The agent may be cleared of keys using ssh-add -D.

♦ If you have specified -oPubkeyAuthentication=no, you have not provided a valid NAS password and/or a valid RSA SecurID passcode.

• "Permission denied (unauthorized command)"

This message indicates that you have attempted an operation that is not currently authorized by the SUP. Check that the command line is valid and that the attempted command is one of the authorized commands. Certain options to authorized commands may also be disallowed, but these should never be needed in standard usage scenarios.

• "Permission denied during file access" (various forms)

These messages indicate that you attempted to read or write a file for which such access is not allowed. The most common cause is forgetting to authorize directories for writes. Reads and writes of ~/.* are never permitted.

• "Permission denied (publickey)"

This message indicates that you may have improper permissions on your ~/.ssh and/or home directory on the target host. Check to make sure that ~/.ssh is not readable/writable by other users/groups and that your home directory is not writable by other users/groups.
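For the case of too many keys in the agent, the key count can be checked from the listing that ssh-add -l prints. The following sketch reads such a listing on standard input. The helper names are hypothetical, and the default limit of 6 is an assumption (it mirrors the OpenSSH server default for MaxAuthTries); the login attempt limit actually enforced by the SUP is not stated here.

```shell
# Sketch: count identities in "ssh-add -l" output read from stdin.
# An empty agent prints a single "The agent has no identities." line,
# which must not be counted as a key.
count_agent_keys() {
    grep -c -v -e '^The agent has no identities' -e '^$' || true
}

# Warn when the number of loaded keys reaches the given limit
# (default 6, an assumed value).
agent_key_warning() {
    limit=${1:-6}
    n=$(count_agent_keys)
    if [ "$n" -ge "$limit" ]; then
        echo "warning: $n keys loaded (limit $limit); consider ssh-add -D"
    else
        echo "ok: $n keys loaded"
    fi
}
```

Typical use would be ssh-add -l | agent_key_warning.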

Examples

Inbound File Transfer Through SFEs: Examples

Inbound file transfers through the secure front ends (SFEs) require RSA SecurID token authentication. Files cannot be transferred directly to the SFEs, so transfers must be done using either SSH passthrough or using scp with the ‑oProxyCommand option.

Note: While this article covers file transfers through the SFEs only, you can also transfer files through the Secure Unattended Proxy.

To simplify the instructions, the approaches are described in terms of transfers to or from one of the Pleiades front ends (PFEs), such as pfe21, but they also apply to any of the other systems that are in the secure enclave—such as other PFEs, or the Lou front ends (LFEs).

For some of the methods described, two commands are provided. The first command (a) is used if you have identical usernames on your local system and on the NAS systems, or if the usernames are different but you have set up your local ~/.ssh/config file to include the NAS username. To learn how to set this up, download the ~/.ssh/config template. The second command (b) is used if your usernames are different and you do not include the NAS username in your local ~/.ssh/config file.
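To illustrate what including the NAS username in ~/.ssh/config means, the sketch below prints a minimal stanza. It is not the official NAS template: the Host pattern *.nas.nasa.gov and the helper name nas_ssh_config_stanza are assumptions, and the template may contain further settings.

```shell
# Sketch: print an ssh_config stanza that sets the login name for
# hosts matching *.nas.nasa.gov (assumed pattern, not the NAS template).
nas_ssh_config_stanza() {
    nas_user=$1
    cat <<EOF
Host *.nas.nasa.gov
    User $nas_user
EOF
}
```

The output could be reviewed and then appended with nas_ssh_config_stanza nas_username >> ~/.ssh/config, after which the (a) forms of the commands apply.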

File Transfers Using scp with -oProxyCommand

The scp command is not recommended for files over 1 GB; see Remote File Transfer Commands for a comparison of commands. If you have not set up SSH passthrough, you must use scp with the ‑oProxyCommand option for inbound file transfers through the SFEs.
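The size guideline can be checked before starting a transfer. The sketch below is illustrative only: suggest_transfer_tool is a hypothetical helper, and the default threshold treats 1 GB as 2^30 bytes, which is an assumption.

```shell
# Sketch: suggest a transfer command based on file size.
# Usage: suggest_transfer_tool FILE [THRESHOLD_BYTES]
suggest_transfer_tool() {
    file=$1
    threshold=${2:-1073741824}           # assumed meaning of "1 GB"
    size=$(wc -c < "$file") || return 1  # fails if the file is unreadable
    size=$((size + 0))                   # drop padding some wc versions add
    if [ "$size" -ge "$threshold" ]; then
        echo bbftp
    else
        echo scp
    fi
}
```

Keep in mind that bbftp does not encrypt the data it transfers, so the suggestion applies only when encryption is not required.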


Using scp with the ‑oProxyCommand option to push files out of your local system:

(a) your_local_system% scp -oProxyCommand='ssh sfe6.nas.nasa.gov ssh-proxy %h' filename pfe21.nas.nasa.gov:

or

(b) your_local_system% scp -oProxyCommand='ssh nas_username@sfe6.nas.nasa.gov ssh-proxy %h' filename nas_username@pfe21.nas.nasa.gov:

Using scp with the ‑oProxyCommand option to pull files into your local system:

(a) your_local_system% scp -oProxyCommand='ssh sfe6.nas.nasa.gov ssh-proxy %h' pfe21.nas.nasa.gov:filename .

or

(b) your_local_system% scp -oProxyCommand='ssh nas_username@sfe6.nas.nasa.gov ssh-proxy %h' nas_username@pfe21.nas.nasa.gov:filename .
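The choice between forms (a) and (b) can be made mechanically. The sketch below only prints the push command rather than running it. It assumes form (b) differs from form (a) solely by a user@ prefix on both host names; sfe_scp_push_cmd is a hypothetical helper, and sfe6 and pfe21 are simply the hosts used in the examples.

```shell
# Sketch: print the scp push command in form (a) or (b).
# Usage: sfe_scp_push_cmd FILE [NAS_USER] [LOCAL_USER]
sfe_scp_push_cmd() {
    file=$1
    nas_user=${2:-}
    local_user=${3:-$(id -un)}
    if [ -z "$nas_user" ] || [ "$nas_user" = "$local_user" ]; then
        prefix=""                # form (a): usernames match
    else
        prefix="$nas_user@"      # form (b): name the NAS user explicitly
    fi
    # %%h prints a literal %h, which ssh replaces with the target host.
    printf "scp -oProxyCommand='ssh %ssfe6.nas.nasa.gov ssh-proxy %%h' %s %spfe21.nas.nasa.gov:\n" \
        "$prefix" "$file" "$prefix"
}
```

The printed command is meant to be inspected before use; file names that contain spaces would need additional quoting.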

File Transfers Using SSH Passthrough

If you have set up SSH passthrough correctly, you can use scp, bbftp, bbscp, or shiftc to transfer files between your local system and a NAS host. You can transfer files directly into the NAS host and avoid the need to double-authenticate. The passage through the SFEs is transparent.

Using scp

Using scp to push files out of your local system:

(a) your_local_system% scp filename pfe21.nas.nasa.gov:

or

(b) your_local_system% scp filename nas_username@pfe21.nas.nasa.gov:

Using scp to pull files into your local system:

(a) your_local_system% scp pfe21.nas.nasa.gov:filename .

or

(b) your_local_system% scp nas_username@pfe21.nas.nasa.gov:filename .

Using bbftp

This requires that you have a bbFTP client installed on your local system.

Using bbftp to push files out of your local system:

(a) your_local_system% bbftp -s -e 'setnbstream 8; put filename' pfe21.nas.nasa.gov

or

(b) your_local_system% bbftp -s -u nas_username -e 'setnbstream 8; put filename' pfe21.nas.nasa.gov


Using bbftp to pull files into your local system:

(a) your_local_system% bbftp -s -e 'setnbstream 8; get filename' pfe21.nas.nasa.gov

or

(b) your_local_system% bbftp -s -u nas_username -e 'setnbstream 8; get filename' pfe21.nas.nasa.gov


See bbFTP for more instructions.

Using bbscp

This command requires that you have the bbFTP client and the NAS bbSCP script installed on your local system.

To push files out of your local system:

(a) your_local_system% bbscp filename pfe21.nas.nasa.gov:

or

(b) your_local_system% bbscp filename nas_username@pfe21.nas.nasa.gov:

To pull files into your local system:

(a) your_local_system% bbscp pfe21.nas.nasa.gov:filename .

or

(b) your_local_system% bbscp nas_username@pfe21.nas.nasa.gov:filename .

See bbSCP for more instructions.

Using shiftc

See Shift File Transfer Overview for shiftc examples.

More About File Transfers

See the list of links in the File Transfer Overview.

Outbound File Transfer Examples

You can transfer files between NAS and your local site either by running the transfer command on a Pleiades or Lou front-end node (PFE or LFE), or by running the command on your local system. Running the command on your local system requires you to go through a secure front end (SFE) or to set up the Secure Unattended Proxy (SUP) or SSH passthrough. Therefore, if your local system is set up to allow direct inbound connections, then starting the transfer from a PFE or LFE will be much simpler than starting the transfer from your local system.

The sample command lines shown below demonstrate how to run the scp, bbftp, or bbscp commands on a PFE to transfer files to or from your local system. You can also apply the commands to other systems in the enclave, such as the LFEs.

Notes:

• The sample command lines use pfe21 as an example.
• Two command lines are provided for each transfer method. Use the first one if your username for your local system is the same as your username for the NAS systems. If the usernames are different, use the second.

Using scp

To push files out of a PFE to your local system:

pfe21% scp filename your_local_system:
pfe21% scp filename local_username@your_local_system:

To pull files into a PFE from your local system:

pfe21% scp your_local_system:filename .
pfe21% scp local_username@your_local_system:filename .

Using bbftp

The bbftp tool can only be used for data that doesn't need to be encrypted. If you find that using scp gives poor performance rates, and you aren't required to encrypt the data transfer, you might get better performance using bbftp. To use bbftp, the bbFTP server (bbftpd) must be installed on your local system.

To push files out of a PFE to your local system:

pfe21% bbftp -s -e 'setnbstream 8; put filename' your_local_system
pfe21% bbftp -s -u local_username -e 'setnbstream 8; put filename' your_local_system

To pull files into a PFE from your local system:

pfe21% bbftp -s -e 'setnbstream 8; get filename' your_local_system
pfe21% bbftp -s -u local_username -e 'setnbstream 8; get filename' your_local_system

For more detailed instructions, see Using bbFTP for Remote File Transfers.

Using bbscp

The bbscp tool is a wrapper for bbftp that provides scp-like syntax. Like bbftp, bbscp can only be used for data that doesn't need to be encrypted. To use bbscp, the bbFTP server (bbftpd) must be installed on your local system.

To push files out of a PFE to your local system:

pfe21% bbscp filename your_local_system:
pfe21% bbscp filename local_username@your_local_system:

To pull files into a PFE from your local system:

pfe21% bbscp your_local_system:filename .
pfe21% bbscp local_username@your_local_system:filename .

For more detailed instructions, see The bbscp Script.

Optimizing/Troubleshooting

Increasing File Transfer Rates

If you are moving large files, use the shiftc or bbftp commands instead of cp or scp. An online NAS service can help diagnose your remote network connection issues, and our network experts can work with your specific file transfer problems.

For fastest file transfer between Pleiades /nobackup and Lou, log into Lou and use shiftc, cxfscp, mcp, or mtar. A simple cp or tar will also work, but at slower speeds.

Moving large amounts of data efficiently to or from NAS across the network can be challenging. Often, minor system, software, or network configuration changes can increase network performance by an order of magnitude or more.

If you are experiencing slow transfer rates, try these quick tips:

• The Pleiades /nobackup filesystems are mounted on Lou, enabling disk-to-disk copying, which should give the highest transfer rates. You can use the shiftc, cp, or mcp commands to copy files, or even make tar files, directly from Pleiades /nobackup to your Lou home directory.
• If using the scp command, make sure you are using OpenSSH version 5 or later. Older versions of SSH have a hard limit on transfer rates and are not designed for WAN transfers. You can check your version of SSH by running the command ssh -V.
• For files that are a gigabyte or larger, we recommend using bbFTP. This application transfers simultaneous streams of data and doesn't have the overhead associated with encrypting all the data (authentication is still encrypted).
• Another reliable option for large file transfers is the Shift transfer tool, which includes options specific to the NAS environment, such as checking to see whether files residing on Lou are also on tape.
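The ssh -V check mentioned in the list above can be scripted. The following is a minimal sketch, not an official NAS tool: the function name openssh_major is made up for illustration, and it assumes the version banner begins with "OpenSSH_" followed by the version number, which may not hold for every SSH build.

```shell
# openssh_major "VERSION BANNER"
# Prints the major version number from a banner such as the one ssh -V emits.
# Prints nothing if the banner does not start with "OpenSSH_<digits>".
openssh_major() {
    printf '%s\n' "$1" | sed -n 's/^OpenSSH_\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical usage (ssh -V writes its banner to stderr, hence 2>&1):
# major=$(openssh_major "$(ssh -V 2>&1)")
# [ "${major:-0}" -ge 5 ] || echo "OpenSSH is older than version 5; consider upgrading"
```

Keeping the parsing separate from the ssh call makes it easy to check the function against a saved banner from another host.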

One-on-One Help

If you would like further assistance, contact the NAS Control Room at [email protected], and a network expert will work with you or your local administrator one-on-one to identify methods for increasing your transfer rates.

To learn about other network-related support areas see End-to-End Networking Services.

Dealing with Slow File Retrieval

On Lou, commands that should finish quickly may occasionally take a long time. This problem is usually due to slow retrieval of files from tape to disk.

When you run the ls command on Lou, the output shows all your Lou files on disk. However, most of the files are actually written to tape using the Data Migration Facility (DMF).

One reason for slow file retrieval is that for some multiple-file transfers (for example, an scp transfer with a list of files), Linux feeds each file to DMF one at a time, and DMF does not deal well with retrieving one file at a time from a long list of files. This means that the tapes containing the files are constantly being loaded and unloaded, which is very slow (and is bad for the tapes and tape drives). As the list of files gets longer (through the use of "*" or moving a "tree" of files), the problem grows to where it can take hours to transfer a set of files that would only take a few minutes if they were on disk. This can be particularly problematic when several people do these types of file transfers at the same time.

The methods described below can help you avoid these problems.

Note: For more information about the commands in this section, see Data Migration Facility (DMF) Commands.

Optimizing File Retrieval

You can fetch files to disk as a group by running the dmget command before running your file transfer. dmget reads the tape once and gets all the requested files in a single pass.

Run dmget on the same list of files you are about to transfer. Then, after the dmget operation completes, you can transfer the files using scp/ftp/cp as you had originally intended. Or, you can put dmget in the background and run your transfer while dmget is working. If any files are already on disk, dmget sees this and doesn't try to get them from tape.

DMF also provides the dmfind command, which enables you to walk a file tree to find offline files to give to dmget.

Note: Be sure you are in the correct directory before running dmfind. Use the pwd command to determine your current directory.

Please check to make sure too much data isn't brought back online at once, either by using du with the --apparent-size option or by using /usr/local/bin/dmfdu. For example:

lou% /usr/local/bin/dmfdu filename
filename
      13 MB regular      340 files
    1114 MB dual-state  1920 files
   74633 MB offline     2833 files
      13 MB small        340 files
   75761 MB total       5093 files

File transfer rates vary depending on the load on the system and how many users are transferring files at the same time. Typically, scp transfers between Lou and Pleiades on the /nobackup file system run at 30 to 120 MB/s for files larger than 100 MB, using the 10-gigabit network interface.

Example 1:

lou% dmget *.data &
lou% scp -qp *.data myhost.jpl.nasa.gov:/home/user/dir_name

Example 2:

lou% dmfind /u/username/FY2000 -state OFL -print | dmget &
lou% scp -rqp /u/username/FY2000 hostname:/nobackup/username/dir_name

You can see the state of a file by running dmls -l instead of ls -l.

Maximum Amount of Data to Retrieve Online

The online disk space for Lou is considerably smaller than its tape storage capacity, and it is impossible to retrieve all files to online storage at the same time. Using the Shift tool for file transfers automatically ensures that files on Lou are retrieved in batches and released afterwards so there is no need to manually split up the transfer. If you do not use Shift, however, then you should confirm whether there is enough disk space before you retrieve a large amount of data.

The df command shows the amount of free space in a filesystem. The Lou script dmfdu reports how much total (online and offline) data exists in a directory. To use dmfdu, simply cd into the directory you want to check, and execute the script.
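As a rough illustration of this check, the sketch below compares a directory's apparent size with the free space that df reports for its filesystem. The function name fits_online is hypothetical, and the sketch assumes GNU du and df. Files that are already online are counted as well, so the estimate is conservative.

```shell
# fits_online DIR
# Succeeds (exit status 0) when the apparent size of DIR, in KB, is smaller
# than the free space df reports for the filesystem that holds DIR.
fits_online() {
    need_kb=$(du -sk --apparent-size "$1" | awk '{print $1}')
    free_kb=$(df -Pk "$1" | awk 'NR == 2 {print $4}')
    echo "need ${need_kb} KB, free ${free_kb} KB"
    [ "$need_kb" -lt "$free_kb" ]
}

# Hypothetical usage on Lou before running dmget on a directory:
# fits_online project1 && echo "OK to retrieve" || echo "retrieve in smaller batches"
```

Passing this check only shows that the data would fit at the moment it was run; other users' retrievals can reduce the free space afterwards.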

If you would like to know the total amount of data under your home directory on Lou, you need to first find out if your account is under s1i-s1n or s2i-s2n. Assuming you are under s1c, you can then use dmfdu /s1c/user_id to find the total amount. Another alternative is to simply cd to your home directory and use dmfdu *, which will show use for each file or directory.

Lou's archive filesystems are between 85 TB and 450 TB in size, but the available space typically floats between 10% and 30%. In Example 3, 14% of the space is unused (32 TB available out of 228 TB).

It is best to retrieve no more than 10 TB at a time. As shown in Example 3, it is best to release the space (dmput -r) after using the retrieved files (scp, edit, compile, etc), then retrieve the next group of files, use them, and release the space again, and so on.

Example 3:

To retrieve one directory's data from tape, copy the data to a remote host, release the data blocks, and then retrieve more data from tape:

lou% df -lh .
Filesystem          Size  Used Avail Use% Mounted on
/dev/cxvm/sfa2-s2l  228T  196T   32T  86% /lou/s2l
lou% dmfdu project1 project2
project1
       2 MB regular      214 files
      13 MB dual-state     1 files
 2229603 MB offline      101 files
       2 MB small        214 files
 2229606 MB total        315 files
project2
       7 MB regular      245 files
    4661 MB dual-state    32 files
 2218999 MB offline       59 files
       7 MB small        245 files
 2223668 MB total        336 files
lou% cd project1
lou% dmfind . -state OFL -print | dmget &
lou% scp -rp /u/username/project1 remote_host:/nobackup/username

## (Verify that the data has successfully transferred)
lou% dmfind . -state DUL -print | dmput -rw
lou% df -lh .
lou% cd ../project2
lou% dmfind . -state OFL -print | dmget &
lou% scp -rpq /u/username/project2 remote_host:/nobackup/username
lou% dmfind . -state DUL -print | dmput -rw

TCP Performance Tuning for WAN Transfers

You can maximize your wide-area network bulk data transfer performance by tuning the TCP settings on your local host. This article shows some common configuration tasks for enabling high-performance data transfers on your system.

Note that making changes to your system should only be done by a lead system administrator or someone who is authorized to make changes.

Linux

1. Edit the file sysctl.conf located under the /etc directory, and add the following lines:

net.core.wmem_max = 4194304
net.core.rmem_max = 4194304

2. Then have them loaded by running sysctl -p
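A common rule of thumb is that a TCP buffer should be at least as large as the path's bandwidth-delay product (link speed multiplied by round-trip time). The sketch below shows the arithmetic. The function name bdp_bytes and the example path (1 Gbit/s, 32 ms round trip) are illustrative assumptions, not measurements of any NAS link.

```shell
# bdp_bytes MBITS_PER_SEC RTT_MS
# Prints the bandwidth-delay product in bytes:
#   (Mbit/s * 1,000,000 / 8) bytes/s * (RTT_ms / 1000) s = Mbit/s * RTT_ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# Hypothetical path: 1 Gbit/s link with a 32 ms round-trip time.
bdp_bytes 1000 32    # prints 4000000
```

For that hypothetical path the result is about 4 MB, which is in line with the 4194304-byte limits set above. A faster link or a longer round trip would call for larger buffers.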

Windows

We recommend using a tool like Dr. TCP.

1. Set the "Tcp Receive Window" to at least 4000000
2. Turn on "Window Scaling," "Selective Acks," and "Time Stamping"

Other options for tuning Windows XP TCP are the SG TCP Optimizer or using Windows Registry Editor to edit the registry, but the latter is only recommended for Windows users who are already familiar with registry parameters.

Mac OS 10.4

Note that these changes require root access.

In order to allow the Mac operating system to retain the parameters after a reboot, edit the following variables in /etc/sysctl.conf:

1. Set maximum TCP window sizes to 4 megabytes

net.inet.tcp.sendspace= 4194304
net.inet.tcp.recvspace= 4194304

2. Set maximum Socket Buffer sizes to 4 megabytes

kern.ipc.maxsockbuf= 4194304

Mac OS 10.5 and Later

Use the sysctl command for the following variable:

sysctl -w net.inet.tcp.win_scale_factor=8

If you follow these steps and are still getting less than your expected throughput, please contact the NAS network group at [email protected] (attn: Networks). We will work with you on tuning your system to optimize file transfers.

You can also try the additional steps outlined in the related articles listed below.

Optional Advanced Tuning for Linux

This document describes additional TCP settings that can be tuned on high-performance Linux systems. This is intended for 10-Gigabit hosts, but can also be applied to 1-Gigabit hosts. The following steps should be taken in addition to the steps outlined in TCP Performance Tuning for WAN transfers.

Configure the following /etc/sysctl.conf settings for faster TCP

1. Set maximum TCP window sizes to 12 megabytes:

net.core.rmem_max = 11960320
net.core.wmem_max = 11960320

2. Set minimum, default, and maximum TCP buffer limits:

net.ipv4.tcp_rmem = 4096 524288 11960320
net.ipv4.tcp_wmem = 4096 524288 11960320

3. Set maximum network input buffer queue length:

net.core.netdev_max_backlog = 30000

4. Disable caching of TCP congestion state (Linux Kernel version 2.6 only). Fixes a bug in some Linux stacks:

net.ipv4.tcp_no_metrics_save = 1

5. Use the BIC TCP congestion control algorithm instead of the TCP Reno algorithm (Linux Kernel versions 2.6.8 to 2.6.18):

net.ipv4.tcp_congestion_control = bic

6. Use the CUBIC TCP congestion control algorithm instead of the TCP Reno algorithm (Linux Kernel versions 2.6.18 and newer):

net.ipv4.tcp_congestion_control = cubic

7. Set the following to 1 (should default to 1 on most systems):

net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1

A reboot will be needed for changes to /etc/sysctl.conf to take effect, or you can attempt to reload sysctl settings (as root) with sysctl -p.

For additional information visit the Energy Sciences Network (ESnet) website.

If you have a 10-Gb system or if you follow these steps and are still getting less than your expected throughput, please contact NAS Control Room staff at [email protected], and we will work with you on tuning your system to optimize file transfers.

Streamlining PBS Job File Transfers to Lou

Some users prefer to streamline the storage of files (created during a job run) to Lou, within a PBS job. Because direct access to the Lou storage nodes from the Pleiades compute nodes and from Endeavour has been disabled, all file transfers to Lou within a PBS job must first go through one of the Pleiades front-end systems (PFEs).

Here is an example of what you can add to your PBS script to accomplish this:

1. ssh to a PFE (for example, pfe21) and create a directory on Lou where the files are to be copied.

ssh -q pfe21 "ssh -q lou mkdir -p $SAVDIR"

Here, $SAVDIR is assumed to have been defined earlier in the PBS script. Note the use of -q for quiet-mode, and double quotes so that shell variables are expanded prior to the ssh command being issued.

2. Use scp via a PFE to transfer the files.

ssh -q pfe21 "scp -q $RUNDIR/* lou:$SAVDIR"

Here, $RUNDIR is assumed to have been defined earlier in the PBS script.
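The two steps can be combined so that the copy is skipped if the directory cannot be created. This is a sketch only: the helper names run and save_to_lou, the DRY_RUN switch, and the default paths are all hypothetical and not part of PBS or any NAS tool. With DRY_RUN=1 the commands are printed rather than executed, which lets you inspect them before submitting the job.

```shell
# Hypothetical defaults; in a real PBS script these would already be defined.
PFE=${PFE:-pfe21}
RUNDIR=${RUNDIR:-/nobackup/username/run1}
SAVDIR=${SAVDIR:-/u/username/run1}

# run CMD [ARGS...]: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# save_to_lou: create the target directory on Lou through the PFE, then copy
# the run directory's files to it. Stops at the first step that fails.
save_to_lou() {
    run ssh -q "$PFE" "ssh -q lou mkdir -p $SAVDIR" || return 1
    run ssh -q "$PFE" "scp -q $RUNDIR/* lou:$SAVDIR" || return 1
}

# Hypothetical usage at the end of the job script:
# save_to_lou || echo "transfer to Lou failed" >&2
```

As in the original commands, the double-quoted strings are expanded on the compute node before ssh runs, so $RUNDIR and $SAVDIR must be set in the job script itself.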

File Transfers Tips

The following quick and easy techniques may improve your performance rates when transferring files remotely to or from NAS.

• Transfer files from the /nobackup filesystem, which is often faster than the locally mounted disks.
• If you are using scp and your data is compressible, try adding the -C option to enable file compression, which can sometimes double your performance rates:

% scp -C filename user@remote_host.com:

• For SCP transfers, use a low-process-overhead cipher such as aes128-gcm@openssh.com or arcfour. This can increase your transfer rates by 5x, compared to older methods such as 3des:

% scp -c aes128-gcm@openssh.com filename user@remote_host.com:

• If you are transferring files from Lou, make sure they are online, rather than on the tape archive, before you perform the transfer operation.

Note: If you use the shiftc command to transfer your files, it will automatically bring any files that are on the tape archive online before it transfers them. If you are not using shiftc, use the following DMF commands to determine the location of your files and bring them online if necessary:

% dmls -al filename   # show the status of your file.
% dmget filename      # retrieve your file from tape prior to transferring.

For a full list of DMF commands, see DMF commands.

• If you are transferring many small files, try using the tar command to combine them into a single file prior to transfer. Copying one large file is faster than transferring many small files.
• For files larger than a gigabyte, we recommend using bbFTP software, which can achieve much faster rates than single-stream applications such as scp or rsync.
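The tar tip above can be sketched as follows. The function name bundle_dir, the results directory, and the remote host are placeholders chosen for illustration.

```shell
# bundle_dir DIR ARCHIVE
# Packs everything under DIR into one tar archive so that a single large
# file is transferred instead of many small ones.
bundle_dir() {
    tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Hypothetical usage; results/ and the remote host are placeholder names:
# bundle_dir results results.tar
# scp results.tar user@remote_host.com:
```

Using -C with the parent directory keeps the paths inside the archive relative, so the archive unpacks into a single results directory on the receiving side. If the data is compressible, adding -z to the tar options would also compress the archive.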

To improve your performance by modifying your system, see TCP Performance Tuning for WAN Transfers.

If you continue experiencing slow transfers and want to work with a network engineer to help improve file transfers, please contact the NAS Control Room at [email protected].

Troubleshooting SCP File Transfer Failure with Protocol Error

To address security issues with the scp command, we are in the process of adding checks to ensure that the files returned by a remote server match the files requested by the user. In some cases, the checks implemented in scp to address this issue may cause requested files to be rejected with the following error message: protocol error: filename does not match request

This error can occur when you use quotation marks to escape special characters (such as a space) on the remote server, or when you use wildcard characters. The safest way to avoid this issue is to use the sftp command instead of scp to retrieve files. This avoids complications due to interpretation of the requested file names by the shell on the remote server.

For file retrieval, the syntax and command-line options for sftp are very similar to those for scp. For example, to retrieve files matching test*.c from a remote server to the directory somedir, use the following command line:

$ sftp somehost:'test*.c' somedir/

Alternatively, you can add the -T option to the scp command line to disable checking of the file names returned by the remote server. Following the example above, the scp command line would be:

$ scp -T somehost:'test*.c' somedir/

If you need further help, please contact the NAS Control Room at (800) 331-8737 or (650) 604-4444.
