Let the Environment Variable Help You: Moving Files across Studies and Creating SAS® Library On-The-Go

David Liang, Chiltern International, Wilmington, NC

ABSTRACT

In clinical trials, datasets and SAS programs are often stored under different studies and products in the UNIX system. SAS programmers need to access those locations frequently, to read in data for programming or to copy files for reuse in a new analysis. Typing the lengthy directory path is time-consuming and nerve-racking. This paper describes an efficient way to store the various directory paths in advance through UNIX environment variables. Those predefined environment variables can be used for UNIX file operations (copying, deleting, searching for files, etc.). Information carried by those variables can also be passed into SAS to construct libraries at your convenience.

INTRODUCTION

The UNIX system is very popular in the pharmaceutical industry. Clinical trial datasets (raw data, SDTM, ADaM), along with SAS programs, are often stored under various studies and products in the UNIX system. Following is an example of the directory structures in UNIX.

/////rawdata/

/////sdtmdata/

/////adamdata/

/////prog/

SAS programmers need to access those files from one study directory to another, to read data into SAS programs or to copy SAS programs for reuse. Typing the lengthy directory path is painful and time-consuming.

The UNIX environment is defined by environment variables. When you log in on UNIX, your current (login) shell sets a unique working environment for you, which is maintained until you log out. UNIX allows you to set environment variables. In this paper, a set of environment variables is used to store your commonly used study directory paths in the login file. Your login file may vary depending on the shell that you are using (.cshrc for C shell, .bashrc for bash, etc.). Once those variables are defined, you may use the environment variable name in place of the lengthy directory path during file operations, which saves typing.

Furthermore, environment variables can serve as a convenient way to pass information to SAS programs running under UNIX. You can access those variables through a “pipe” in combination with the DATA step function SYSGET (Thacher 2010).

In this paper, two methods are introduced for accessing those environment variables in SAS. One is to use the %SYSGET function to retrieve a specific environment variable, while the other is to call a macro, %SETENV, to access a set of variables that you predefined. Study directory path information carried by those variables can then be used to construct SAS libraries for data access.

DEFINING ENVIRONMENT VARIABLES

The way in which you define an environment variable depends on the shell that you are running.

For Bourne shell (sh and bash), the syntax is:

export var=value

For C shell (csh and tcsh), the syntax is:

setenv var value

We will use Bourne shell as an example throughout this paper.

When assigning a study directory path to an environment variable, you have two choices. You may use the long version of the path, which points directly to the location where the datasets or programs are stored. In that way, you only need to type the variable name when referring to that location. The drawback of that approach is that you have to create multiple environment variables for each study when multiple subdirectories under that study need to be accessed. In the examples below, the environment variables sxxxfd1r, sxxxfd2r, sxxxfv1r, sxxxf1p, and syyyf1p are created using that approach. Two of them (sxxxf1p, syyyf1p) will be used in the application example Ex 1 in the next section.

export sxxxfd1r=/pdtabc/sxxx/final/draft1/rawdata
export sxxxfd2r=/pdtabc/sxxx/final/draft2/rawdata
export sxxxfv1r=/pdtabc/sxxx/final/version1/rawdata
export sxxxf1p=/pdtabc/sxxx/final/version1/prog
export syyyf1p=/pdtabc/syyy/final/version1/prog

On the other hand, you may use the short version of the path, i.e., the path pointing to the study rather than to the dataset or program folders under that study. In that way, only one environment variable is needed for accessing the subdirectories under that study, so fewer environment variable names need to be memorized. See the examples below. Those two variables will be used in the application examples (Ex 2, Ex 3, and Ex 4) in the next section.

export szzz=/pdtabc/szzz
export swww=/pdtefg/swww
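The two approaches can also be combined. The following is a minimal sketch of lines that could go in a bash login file, in which a long-version variable is built from a short-version one. The variable name szzzw24p is a hypothetical name chosen for illustration; it is not one of the variables used elsewhere in this paper.

```shell
# Hypothetical lines for a bash login file such as ~/.bashrc.
# Short version: one variable per study, pointing to the study directory.
export szzz=/pdtabc/szzz
export swww=/pdtefg/swww

# Long version derived from the short one (szzzw24p is an illustrative name).
export szzzw24p="$szzz/wk_24/version1/prog"

# Confirm what each variable expands to.
echo "$szzz"
echo "$szzzw24p"
```

If the study directory ever moves, only the short-version line needs to change, since the derived variable picks up the new value at the next login.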

USING ENVIRONMENT VARIABLES IN FILE OPERATION

To use an environment variable in a UNIX command, you need to preface it with a dollar sign ($). This tells the command interpreter that you want the variable's value, not its name, to be used. To see the value of an environment variable, you can use the echo command.

echo $<name>
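As a small illustration of how the shell decides where a variable name ends, the sketch below uses the szzz path from this paper. The _old suffix is a made-up example, not a directory from the paper.

```shell
szzz=/pdtabc/szzz           # study directory used in this paper's examples

echo $szzz                   # prints the value: /pdtabc/szzz
echo "$szzz/wk_24"           # a '/' cannot be part of a name, so $szzz expands as expected

# When a letter, digit, or underscore follows, braces mark where the name ends.
# Without them the shell would look for a variable called szzz_old instead.
renamed="${szzz}_old"
echo "$renamed"
```

This is why the examples in this paper can write $szzz/wk_24/... directly without braces.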

Following are some application examples that use environment variables for copying files across studies, searching for files in another location, and comparing files from different studies.

COPYING FILES FROM ANOTHER STUDY LOCATION

When a new analysis starts, you might want to borrow some programs from another similar study. Environment variables come in handy for that kind of operation. See the two examples below.

Ex 1. Copy the t-demog.sas program from the study xxx final analysis version1 directory to the study yyy final version1 directory.

The UNIX command without the environment variables is very lengthy. It's a lot of typing.

cp /pdtabc/sxxx/final/version1/prog/t-demog.sas /pdtabc/syyy/final/version1/prog/

The UNIX command using the predefined environment variables (sxxxf1p, syyyf1p) is very short and neat.

cp $sxxxf1p/t-demog.sas $syyyf1p

Ex 2. Copy the t-demog.sas program from the study zzz week 24 analysis version1 directory to the study www final analysis version1 directory.

The UNIX command without the environment variables is like this:

cp /pdtabc/szzz/wk_24/version1/prog/t-demog.sas /pdtefg/swww/final/version1/prog

The UNIX command using the predefined environment variables (szzz, swww) is shorter.

cp $szzz/wk_24/version1/prog/t-demog.sas $swww/final/version1/prog

In Example 1, the two environment variables hold the full directory paths of the program locations, so the least typing is needed. In Example 2, the two environment variables hold the paths of the studies rather than the program locations, so a bit more typing is needed than in Example 1. However, if you are familiar with the tab autocompletion feature in UNIX, the additional typing is trivial.
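The copy in Example 2 can be tried safely in a sandbox. The sketch below is an illustration under assumptions: temporary directories created with mktemp stand in for the real study folders, and the file content is a dummy.

```shell
# Sandbox: temporary directories stand in for the study folders.
root=$(mktemp -d)
mkdir -p "$root/pdtabc/szzz/wk_24/version1/prog" "$root/pdtefg/swww/final/version1/prog"
echo "/* demo program */" > "$root/pdtabc/szzz/wk_24/version1/prog/t-demog.sas"

# Short-version variables, pointing to the sandbox copies of the studies.
export szzz="$root/pdtabc/szzz"
export swww="$root/pdtefg/swww"

# Same command shape as Ex 2; the quotes protect paths that contain spaces.
cp "$szzz/wk_24/version1/prog/t-demog.sas" "$swww/final/version1/prog"

# The copied program should now be listed in the target folder.
ls "$swww/final/version1/prog"
```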

SEARCHING FILES FROM ANOTHER STUDY LOCATION

Due to the complexity of UNIX directory structures, looking for a SAS program in other study directories might not be easy. Fortunately, the find command gives you a tool to search for files under a study directory and its sub-directories. With the use of environment variables, this tool is very easy to use.

Ex 3. Search for the t-snapshot.sas program under study zzz and its sub-directories. The -print option instructs the find command to display the search results.

UNIX command without environment variables:

find /pdtabc/szzz -name t-snapshot.sas -print

UNIX command using the environment variable (szzz):

find $szzz -name t-snapshot.sas -print
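The search in Example 3 can be rehearsed in a sandbox as follows. This is a sketch under assumptions: the directory tree is created in a temporary location, and the wk_48 folder and the 't-*.sas' pattern are added only for illustration.

```shell
# Sandbox: build a small study tree under a temporary directory.
root=$(mktemp -d)
export szzz="$root/pdtabc/szzz"
mkdir -p "$szzz/wk_24/version1/prog" "$szzz/wk_48/version1/prog"
touch "$szzz/wk_24/version1/prog/t-snapshot.sas" "$szzz/wk_48/version1/prog/t-snapshot.sas"

# Same command shape as Ex 3: list every copy under the study.
find "$szzz" -name t-snapshot.sas -print

# A wildcard pattern must be quoted so that find, not the shell, expands it.
matches=$(find "$szzz" -name 't-*.sas' -print | wc -l)
echo "programs found: $matches"
```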

COMPARING FILES ACROSS STUDY LOCATIONS

When reusing SAS programs from one analysis to another, we might want to know the differences between versions, so that the right version is selected for the new analysis. The diff command provides a simple way to do such a comparison. Example 4 illustrates another application of the environment variables.

Ex 4. Compare the t-rnac.sas program under the study zzz week 48 version1 directory against the one under the study www final draft2 directory. The options -wbi instruct the diff command to ignore white space and differences in case.

UNIX command without environment variables:

diff -wbi /pdtabc/szzz/wk_48/version1/prog/t-rnac.sas /pdtefg/swww/final/draft2/prog/t-rnac.sas

UNIX command using environment variables (szzz, swww):

diff -wbi $szzz/wk_48/version1/prog/t-rnac.sas $swww/final/draft2/prog/t-rnac.sas
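To see what the -wbi options change, the sketch below compares two dummy files that differ only in case and spacing. The file names and contents are invented for illustration. The diff command returns exit status 0 when it finds no differences and 1 when it finds some.

```shell
# Two dummy programs that differ only in letter case and white space.
dir=$(mktemp -d)
printf 'proc print data=adsl;\nrun;\n'   > "$dir/old.sas"
printf 'PROC PRINT  DATA=adsl ;\nrun;\n' > "$dir/new.sas"

# A plain diff reports the files as different.
if diff "$dir/old.sas" "$dir/new.sas" > /dev/null; then plain=same; else plain=different; fi

# With -wbi, the case and white-space changes are ignored.
if diff -wbi "$dir/old.sas" "$dir/new.sas" > /dev/null; then relaxed=same; else relaxed=different; fi

echo "plain: $plain, with -wbi: $relaxed"
```

Ignoring cosmetic changes in this way helps the comparison focus on differences in the program logic.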

Besides their application in UNIX file operations, the information carried by environment variables can be used in SAS programs too. The next section illustrates how to do that.

CREATING SAS LIBRARY ON-THE-GO

USING %SYSGET FUNCTION

There are several ways to pass information from environment variables to SAS. One way is to use the %SYSGET function. See the example below.

1 %let sxxxdata=%sysget(sxxxfd1r);
2 Libname inputds "&sxxxdata";

Line 1 uses the %SYSGET function to retrieve the value of the environment variable sxxxfd1r and assigns that value to the macro variable sxxxdata through the %LET statement.

Line 2 associates the SAS library with the libref inputds through the LIBNAME statement.

Without the environment variable, we would need to write the following code to associate the SAS library.

Libname inputds "/pdtabc/sxxx/final/draft1/rawdata";
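Because %SYSGET reads the environment of the SAS process, a variable has to be exported before SAS is started for SAS to see it. The sketch below illustrates the difference, using a child shell as a stand-in for the SAS session. The variable names demo_local and demo_exported are hypothetical.

```shell
# One variable is only set in the current shell; the other is exported.
demo_local=/pdtabc/sxxx/final/draft1/rawdata
export demo_exported=/pdtabc/sxxx/final/draft1/rawdata

# A child process (here sh, standing in for SAS) inherits only exported variables.
seen_local=$(sh -c 'echo "$demo_local"')
seen_exported=$(sh -c 'echo "$demo_exported"')

echo "not exported: [$seen_local]"
echo "exported:     [$seen_exported]"
```

If %SYSGET appears to return nothing for a variable that echo displays correctly, a missing export in the login file is one thing worth checking.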

CREATING %SETENV MACRO

The %SYSGET function approach allows us to retrieve a specific environment variable and use that value in the libname statement. In case you want to retrieve a set of environment variables at once, you may use the following %SETENV macro.

1 %macro setenv;
2 filename path_list pipe "set";
3 data work.envvars;
4 length name $10 value $120;
5 infile path_list DLM='=' MISSOVER lrecl=32767;
6 input name $ value $;
7 if name ^= " " then do;
8 call symputx(name,value);
9 end;
10 run;
11 %mend setenv;

Line 2 uses the FILENAME statement to assign a fileref to a pipe. The pipe enables the SAS application to receive input from UNIX commands. The SET command retrieves all of the predefined environment variables. You may use the grep command here if you need to filter the records and select only the ones that you are interested in.

Line 6 reads in two variables from the external file: NAME is the environment variable name, and VALUE is the value assigned to that environment variable.

Line 8 uses the CALL SYMPUTX routine to assign the value of the VALUE variable to the macro variable named by NAME, removing both leading and trailing blanks.
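To see the kind of name=value records that the pipe delivers to the DATA step, and how they might be filtered, consider the sketch below. It rests on assumptions: it uses the env command (which lists exported variables only) rather than set, and the pattern '^s[a-z]{3}=' is a hypothetical filter that matches the four-letter study variables used in this paper.

```shell
export szzz=/pdtabc/szzz
export swww=/pdtefg/swww

# Keep only records whose name is 's' followed by three lowercase letters.
env | grep -E '^s[a-z]{3}=' | sort

# Split one record at the first '=' only, so that an '=' inside a value survives.
record='szzz=/pdtabc/szzz'
name=${record%%=*}     # text before the first '='
value=${record#*=}     # text after the first '='
echo "name=$name value=$value"
```

A filter of this kind could be placed inside the quoted command in Line 2, so that only the study variables become SAS macro variables.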

With that, the study directory information is passed into SAS macro variables through the environment variables. The environment variable name is used as the SAS macro variable name. Those SAS macro variables can be used to construct the directory path in the LIBNAME statement. Following are two SAS libraries assigned using that approach.

Libname swwwadam "&swww/final/draft1/adamdata";
Libname szzzsdtm "&szzz/wk_24/version1/sdtmdata";

Without the environment variables, we would need to write the following code to associate the two SAS libraries mentioned above.

Libname swwwadam "/u/pdtefg/swww/final/draft1/adamdata";
Libname szzzsdtm "/u/pdtabc/szzz/wk_24/version1/sdtmdata";

Many pharmaceutical companies use a relative path in the LIBNAME statement to achieve efficiency. The LIBNAME statement below associates the libref sdtm with the SAS library under the /sdtmdata/ folder, which shares the same parent directory with the current working directory.

Libname sdtm "../sdtmdata";

The relative path approach has its advantages: it is efficient and makes code portable. It has its limitation too, as you can't use it to access folders in another study. The environment variable approach, on the other hand, doesn't limit you to the current working directory. It is very useful when you need to pull data from several studies for integration or study comparison.

CONCLUSION

This paper suggests that assigning study directory paths to UNIX environment variables can improve programming efficiency by eliminating the time spent typing lengthy directory paths. Those predefined environment variables can be used in UNIX commands for file operations. Information from those environment variables can also be passed into SAS programs, through the %SYSGET function or through a DATA step in combination with the CALL SYMPUTX routine, to construct SAS libraries that don't limit you to the current working directory.

REFERENCES

Thacher, C. 2010. “Make Your SAS Code Environmentally Aware.” Proceedings of the SAS Global Forum 2010 Conference, Seattle, WA, 090-2010.

ACKNOWLEDGMENTS

I would like to thank Lila Thome for all of her support and advice in editing this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

David Liang
Chiltern International Inc.
2528 Independence Blvd., Suite 101
Wilmington, NC 28412
Email: [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
