
NESUG 18 Applications

Using SAS to Perform Maintenance Tasks on ASCII Files

Chris Moriak, AstraZeneca, Wilmington, DE

ABSTRACT

The ability of SAS® to read and write ASCII files makes it a good application for performing maintenance tasks on ASCII files. For example, when copying and modifying programs, there is often specific text, like a trial number, that needs to be changed in all programs. It would be great if there were a utility to make such global changes to all programs within a directory. Another example is comparing two directories of ASCII output to identify any differences between them and to document the differences. SAS programmers dread comparing two SAS outputs to determine what, if anything, has changed between program runs. The manual process of dredging through page after page is both inefficient and tedious. This paper will discuss how SAS can help accomplish both these tasks.

The platform used in this paper is UNIX, but one can modify the code for use on other platforms. The files need to be in ASCII format (e.g. SAS or LST files). Topics in the paper include the PROC COMPARE procedure, obtaining a list of filenames from the operating system, converting the ASCII output into a data set, and the DATA _NULL_ statement to manipulate files.

COMMON TECHNIQUES

When performing maintenance tasks on files, the process is the same: identify, retrieve, service, and export. The two examples discussed in this paper use a common technique that can be applied to most maintenance tasks.

IDENTIFY FILES

To identify a select group of files for processing, one first needs to obtain a list of potential files. A SAS program can retrieve a list of files on the UNIX platform by using the FILENAME statement with a pipe: filename mydir pipe options. The pipe tells SAS that instead of identifying a file, information will be passed to it via the filename. In order to pass a list of files to SAS, the full statement is filename mydir pipe "ls -1 file.ext", where "file.ext" is the file and extension name (wild cards can be used). The "ls -1" command tells UNIX to provide a list of files, one entry per line. Next, add a data step using the filename to store the identified files in a data set. One can then use this information later to loop through the files. The code for this step:

    filename mydir pipe "ls -1 *.sas";

    data pgms;
      length file $30 ;
      infile mydir ;
      input file $ ;
    run;

IMPORT FILE INTO DATA SET

Now that one has a usable list of files in SAS, one can begin servicing the files. One by one, import the files into SAS by converting them into temporary data sets with one variable. This task is accomplished by using the INFILE and FILE statements in a data step:

    filename old "/user/temp/myfile.sas";
    filename new "/user/temp/myfile_new.sas";

    data temp;
      infile old missover sharebuffers length=len;
      file new;
      length outline inline $200;
      input @;
      input @1 inline $varying200. len;

PERFORM TASK

Once the file line is read into the data set, one can perform a task or function on that line. The line below replaces the string "find" with the string "replace".

      outline = trim(tranwrd(inline, "find", "replace"));

EXPORT DATA SET TO FILE

After performing the necessary tasks on the file line, the last step is to export the observation back to the file. The FILE statement shown above with the following PUT statement accomplishes this task:

      x = length(outline);
      put @1 outline $varying200. x;
    run;

A SEARCH AND REPLACE MACRO PROGRAM

PROCESS

Once one knows this technique of reading an ASCII file into SAS, servicing it, then exporting it out as a file, one can find many uses for it. One such use is performing a search and replace on one or more files in a directory. The steps for this task:

1. Macro Parameters
2. Get Filenames
3. Loop Through Filenames
4. Copy Files for Backup
5. Import File into Data Set
6. Find and Replace String
7. Export Data Set as File
8. Clean Up

1. MACRO PARAMETERS

The macro parameters necessary for this utility:

    DIR     - Directory where files are located
    FIND    - String to find
    REPLACE - Replacement string
    WHERE   - Clause to subset files
    BACKUP  - Whether to keep a backup of the original file

2. GET FILENAMES

The first step is to identify the number and names of the files in the directory. To do this, use an X command to change the operating system directory to the requested directory. Set a filename statement with the pipe option to query the operating system for the filenames within the directory. Then, create a data set containing the filenames. The SAS code to perform these tasks is shown below. The record containing the name of the running program is deleted so that the program does not alter itself when performing the search and replace. The SAS code for identifying the program name may need to be modified depending on your system.

    ***Change the directory;
    x "cd &dir";

    ***Get program name;
    %let prog = %sysfunc(reverse(%scan(%sysfunc(reverse(&sysprocessname)),1,/)));

    filename mydir pipe "ls -1 *.sas";

    *** Create data set of SAS programs;
    data pgms;
      length file $30 ;
      infile mydir ;
      input file $ ;
      if upcase(file)="%upcase(&prog)" then delete;
    run;

3. LOOP THROUGH FILENAMES

Using PROC SQL, create a macro variable string that contains the names of the files with a separator between each name. The SQL procedure will include a WHERE clause to subset the possible files, and SQL will count the final number of files. The macro variable string allows SAS to loop through each file.

    *** Create macro variable of program files;
    proc sql noprint;
      select trim(file) into : pgm separated by '~'
      from pgms
      %if %length(&where)>0 %then
      %do;
        where &where
      %end;
      order by file ;
      %let cnt=&sqlobs;
    quit;

    ***Loop through each file;
    %do i=1 %to &cnt ;
      %let pgm1= %scan(&pgm,&i,~) ;

4. COPY FILES FOR BACKUP

The worst thing that can happen when performing a find and replace is for the user to mistakenly replace text. Therefore, one should allow the user to keep backups of the original files. Add an X command to copy each original file to a backup file with an ".xrp" file extension to show that this program created these files.

      ***Copy files to a backup file;
      x "cp -f &pgm1 &pgm1..xrp";

5. IMPORT FILE INTO DATA SET

It is time to read in each file line by line. Create two filename statements following the technique outlined earlier. One statement will be for the input file, the other for the output file. Then, read in the file line by line. One can choose to create a temporary data set or simply use a DATA _NULL_ to skip the data set creation.

      ***Create filenames for input and output files;
      filename old "&dir./&pgm1..xrp";
      filename new "&dir./&pgm1";

      ***Read in input file;
      data _null_;
        infile old missover sharebuffers length=len;
        file new ;
        length outline inline $ 200;
        input @;

        ***Input line from file;
        input @1 inline $varying200. len;

6. FIND AND REPLACE STRING

Once the line is read into the program data vector as a record, one can perform a task on it. The task in this example is find and replace using the TRANWRD function. The additional TRIM function will remove any trailing blanks.

        ***Perform search and replace;
        outline=trim(tranwrd(inline,"&find","&replace"));

7. EXPORT DATA SET AS FILE

After performing the find and replace on the record, it is time to export the record back to the ASCII file. The FILE statement in the DATA _NULL_ tells SAS to output the data to a file instead of to a data set. To output the file, a PUT statement is necessary. Afterwards, add the %END statement to tell SAS to retrieve the next file.

        ***Output line back to file;
        x=length(outline);
        put @1 outline $varying200. x;
      run;
    %end;

8. CLEAN UP

It is always a good idea to clean up the SAS environment when a program finishes processing. This means deleting temporary data sets and clearing the filenames. If the user does not wish to keep the backup files, then delete those as well via an X command.

    ***Delete temporary data sets;
    proc datasets lib=work mt=data nolist;
      delete pgms;
    quit;

    ***Clear filenames;
    filename new clear;
    filename old clear;
    filename mydir clear;

    ***Delete backup files if not wanted;
    %if %upcase(&backup) ne Y %then
    %do;
      x "rm *.xrp" ;
    %end;

EXAMPLE MACRO CALL

Here is an example of the macro call. In this example, the user is searching all the files in the directory /project/study/pgms and changing every occurrence of the string '123' to '456'.

    %xreplace(dir=/project/study/pgms
             ,find=123
             ,replace=456);

EXAMPLE RESULT

All the programs in /project/study/pgms used to have the statement:

    libname final '/abc/sw-023-5412H123';

Now, all the programs in /project/study/pgms have the statement:

    libname final '/abc/sw-023-5412H456';
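Because a mistaken replacement is the main risk named in step 4, a preview pass can be useful before running the macro. The following is a minimal sketch and is not part of the original macro: it assumes a single file at the illustrative path /user/temp/myfile.sas and a hypothetical macro variable FIND, and it only writes the matching lines to the SAS log without changing the file.

```sas
/* Hypothetical preview step: list the lines that contain the search   */
/* string. The path and the value of FIND are placeholders.            */
%let find = 123;

filename chk "/user/temp/myfile.sas";

data _null_;
  /* TRUNCOVER keeps short lines intact; lines longer than 200 bytes   */
  /* are truncated, matching the 200-byte limit used in the paper.     */
  infile chk truncover lrecl=200;
  length inline $200;
  input inline $char200.;

  /* INDEX returns the position of the first match, or 0 if none.      */
  if index(inline, "&find") > 0 then
    put "Line " _n_ ": " inline;
run;

filename chk clear;
```

Reviewing the log output first shows whether the search string also occurs in places that should not be changed, such as inside a longer number.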

COMPARING MULTIPLE ASCII FILES

PROCESS

Another task that can be done using this technique is comparing multiple ASCII files within two directories. The steps for this task:

1. Macro Parameters
2. Get Filenames
3. Find Matching Filenames
4. Loop Through Comparison
5. Import Files into Data Sets
6. Compare Data Sets
7. Output Report
8. Clean Up

1. MACRO PARAMETERS

The macro parameters necessary for this utility:

    DIR1   - Directory 1 path
    DIR2   - Directory 2 path
    IGNORE - Starting string to tell SAS to ignore entire line
    OBS    - Number of observations to print that are different
    ONLY   - Whether to only print tables that are different
    WHERE  - Where condition to subset file names
    TYPE   - File extension

2. GET FILENAMES

The first step is to identify the files. For this task, one needs to get the filenames from both directories.

    ***Get filenames in directory 1;
    x "cd &dir1";

    filename dir1 pipe "ls -1 *.&type";

    data dir1;
      length file $30 ;
      infile dir1 ;
      input file $ ;
    run;

    ***Get filenames in directory 2;
    x "cd &dir2";

    filename dir2 pipe "ls -1 *.&type";

    data dir2;
      length file $30 ;
      infile dir2 ;
      input file $ ;
    run;

3. FIND MATCHING FILENAMES

For this utility, a prerequisite is that the files to compare must have the same name. Thus, the next step is to determine which files can be compared. The user can modify the macro to add the option to specify two specific files to compare. A simple PROC SQL join quickly determines the intersection of the two file lists.

    proc sql ;
      create table match as
      select a.file
      from dir1 as a, dir2 as b
      where a.file = b.file ;
    quit;

4. LOOP THROUGH COMPARISON

Using PROC SQL, create a macro variable string that contains the names of the files with a separator between each name. The SQL procedure will include a WHERE clause to subset the possible files and will count the final number of files. The macro variable string allows SAS to loop through each file.

    *** Create macro variable of files;
    proc sql noprint;
      select trim(file) into : pgm separated by '~'
      from match
      %if %length(&where)>0 %then
      %do;
        where &where
      %end;
      order by file ;
      %let cnt=&sqlobs;
    quit;

    ***Loop through each file;
    %do i=1 %to &cnt ;
      %let pgm1= %scan(&pgm,&i,~) ;

5. IMPORT FILES INTO DATA SETS

For this task, one needs to create two filename statements and import the files into two data sets. Also, one needs to create a variable containing the observation record number for use later in the program. It is also good to delete the carriage return character, byte(13), from each line.

      filename in1 "&dir1./&pgm1";
      filename in2 "&dir2./&pgm1";

      *** make data set of file1;
      data file1;
        infile in1 missover sharebuffers length=len;
        length inline $ 200;
        input @;
        input @1 inline $varying200. len;
        _obs_=_n_;
        inline=compress(inline,byte(13));
      run;

      *** make data set of file2;
      data file2;
        infile in2 missover sharebuffers length=len;
        length inline $ 200;
        input @;
        input @1 inline $varying200. len;
        _obs_=_n_;
        inline=compress(inline,byte(13));
      run;

6. COMPARE DATA SETS

The next step is to compare the two data sets and identify the records that are different. This process should be easy, but for some unknown reason SAS v8.2 (and earlier releases) makes it difficult. PROC COMPARE can compare the data sets, but it will only output the record numbers that are different, not the actual records. This is why one has to create the observation record number variable in an earlier step. One has to re-query the data sets to retrieve the actual records. The OUTNOEQUAL option tells PROC COMPARE to output only the observations that are not equal.

      ** Compare Files;
      proc compare base=file1 compare=file2
                   out=x3 noprint outnoequal;
      run;

      %let xobs=0;

      ** Get OBS that are different;
      proc sql noprint ;
        select _obs_ into : xobs separated by ' '
        from x3;
      quit;

      *** Get records from the OBS that are different;
      data x4 ;
        merge file1 (where=(_obs_ in (&xobs)) rename=(inline=inline1))
              file2 (where=(_obs_ in (&xobs)) rename=(inline=inline2));
        by _obs_;

A disadvantage of many of the shareware and third party applications that compare files is that they do not allow the user to specify a line to ignore. Often, company output standards require a program identification line. It becomes a burden when these compare applications state that every page is different just because the user ID is different or the run time changed. In SAS, one can tell the program to ignore a line if it begins with a user-specified string.

        *** Remove records to be ignored;
        %let xlen=%length(&ignore);
        %if &xlen > 0 %then
        %do;
          if substr(inline1,1,&xlen)="&ignore" or
             substr(inline2,1,&xlen)="&ignore" then delete;
        %end;
      run;

7. OUTPUT REPORT

It is time to check whether SAS found any differences between the two files. If there are no differences, then give the user the option to print a notification stating this fact.

      title1 "Comparing LST or Text Files";
      title3 "File1= &dir1/&pgm1";
      title4 "File2= &dir2/&pgm1";

      %let xob=;
      %if %length(&obs)> 0 %then
      %do;
        title5 "Only Showing &obs Differences";
        %let xob=(obs=&obs);
      %end;

      %let xxobs=0;

      *** Check number of differences;
      proc sql noprint ;
        select count(*) into: xxobs
        from x4 ;
      quit;

      *** If number of differences = 0;
      %if &xxobs eq 0 %then
      %do;
        %if %upcase(&only) ne Y %then
        %do;
          data _null_;
            file print ;
            put // @45 "==== THERE ARE NO DIFFERENCES =====";
          run;
        %end;
      %end;

If there are differences, then print a report with two columns displaying the lines that are different. To avoid lengthy reports when many lines are different, one can tell the program to display only a specific number of differences.

      %if &xxobs > 0 %then
      %do;
        proc report data=x4 &xob nowd headskip headline split="~";
          column _obs_ inline1 inline2 ;
          define _obs_ / group "Line~Where~Different" width=10;
          define inline1 / display "Line from File1" width=58 flow;
          define inline2 / display "Line from File2" width=58 flow;
          break after _obs_ / skip;
        run;
      %end;
    %end;

As stated earlier, the program will loop through these steps until it has compared all eligible files.

8. CLEAN UP

Every good program should clean up after itself. After completing the report, the program should clear all file references and delete all temporary data sets.

EXAMPLE MACRO CALL

In this example, the user is requesting to compare all the LST files beginning with "t11" in directory /abc/sw-012-0123/prod with files of the same name in /abc/sw-012-0123/test. The user is asking for a report on all files, showing only the first 25 differences. Also, the program should ignore any line that begins with "/abc", which is the program identification line.

    %qclst(dir1=/abc/sw-012-0123/prod,
           dir2=/abc/sw-012-0123/test,
           where=file like "t11%",
           only=n, obs=25, type=lst,
           ignore=/abc);

EXAMPLE RESULT

Example results from this macro can be found in Appendix A. If one has hundreds of pages of output or hundreds of files to compare, then one can see how useful this utility is to increase one's QC efficiency.

CONCLUSION

Programmers like SAS because of its ability to manipulate many different file types. This feature allows SAS programmers to write their own file utilities. The paper discussed two utilities performed on ASCII files that use a similar technique. The reader should be able to take this technique, adapt it to his or her own situation, and create even more neat and really useful utilities.

ACKNOWLEDGMENTS

SAS is a Registered Trademark of the SAS Institute, Inc. of Cary, North Carolina. ® indicates US registration.

CONTACT INFORMATION

The author welcomes comments, suggestions, and questions by e-mail: [email protected].


APPENDIX A: EXAMPLE OUTPUT

Comparing LST or Text Files

File1= /abc/sw-012-0123/prod/t1102010101.lst
File2= /abc/sw-012-0123/test/t1102010101.lst
Only Showing 25 Differences

Date: 04JUN2004 Time: 14:15

Line
Where
Different   Line from File1                        Line from File2
----------  -------------------------------------  -------------------------------------
19          TREAT A 115 0.38 0.254 0.35 -0.12      TREAT A 115 0.38 0.254 0.35 -0.13

36          TREAT A 123 0.19 0.286 0.14 -0.61      TREAT A 123 0.20 0.286 0.14 -0.61

40          TREAT A 124 0.38 0.347 0.32 -0.67      TREAT A 124 0.38 0.348 0.32 -0.67

72          TREAT A 115 0.38 0.254 0.35 -0.12      TREAT A 115 0.38 0.254 0.35 -0.13


Comparing LST or Text Files

File1= /abc/sw-012-0123/prod/t1102010102.lst
File2= /abc/sw-012-0123/test/t1102010102.lst
Only Showing 25 Differences

Date: 04JUN2004 Time: 14:15

==== THERE ARE NO DIFFERENCES =====


APPENDIX B: SEARCH AND REPLACE MACRO

    %macro replace(dir=, find=, replace=, where=, backup=);

    ***Change the directory;
    x "cd &dir";

    ***Get program name;
    %let prog = %sysfunc(reverse(%scan(%sysfunc(reverse(&sysprocessname)),1,/)));

    filename mydir pipe "ls -1 *.sas";

    *** Create data set of SAS programs;
    data pgms;
      length file $30 ;
      infile mydir ;
      input file $ ;
      if upcase(file)="%upcase(&prog)" then delete;
    run;

    *** Create macro variable of program files;
    proc sql noprint;
      select trim(file) into : pgm separated by '~'
      from pgms
      where 1=1 /*&where*/
      order by file ;
      %let cnt=&sqlobs;
    quit;

    ***Loop through each file;
    %do i=1 %to &cnt ;
      %let pgm1= %scan(&pgm,&i,~) ;

      x "cp -f &pgm1 &pgm1..xrp";

      filename old "&dir./&pgm1..xrp";
      filename new "&dir./&pgm1";

      data _null_;
        infile old missover sharebuffers length=len;
        file new ;
        length outline inline $ 200;
        input @;
        input @1 inline $varying200. len;
        outline=trim(tranwrd(inline,"&find","&replace"));
        x=length(outline);
        put @1 outline $varying200. x;
      run;
    %end;

    proc datasets lib=work mt=data nolist;
      delete pgms;
    quit;

    filename new clear;
    filename old clear;
    filename mydir clear;

    %if %upcase(&backup) ne Y %then
    %do;
      x "rm *.xrp" ;
    %end;

    %mend replace;


APPENDIX C: COMPARE MULTIPLE ASCII FILES MACRO

    %macro qclst(dir1=, dir2=, ignore=, obs=, only=, where= ,type=);

    ***Get filenames in directory 1;
    x "cd &dir1";

    filename dir1 pipe "ls -1 *.&type";

    data dir1;
      length file $30 ;
      infile dir1 ;
      input file $ ;
    run;

    ***Get filenames in directory 2;
    x "cd &dir2";

    filename dir2 pipe "ls -1 *.&type";

    data dir2;
      length file $30 ;
      infile dir2 ;
      input file $ ;
    run;

    proc sql ;
      create table match as
      select a.file
      from dir1 as a,
           dir2 as b
      where a.file = b.file ;
    quit;

    *** Create macro variable of files;
    proc sql noprint;
      select trim(file) into : pgm separated by '~'
      from match
      %if %length(&where)>0 %then
      %do;
        where &where
      %end;
      order by file ;
      %let cnt=&sqlobs;
    quit;

    ***Loop through each file;
    %do i=1 %to &cnt ;
      %let pgm1= %scan(&pgm,&i,~) ;

      filename in1 "&dir1./&pgm1";
      filename in2 "&dir2./&pgm1";

      *** make data set of file1;
      data file1;
        infile in1 missover sharebuffers length=len;
        length inline $ 200;
        input @;
        input @1 inline $varying200. len;
        _obs_=_n_;
        inline=compress(inline,byte(13));
      run;

      *** make data set of file2;
      data file2;
        infile in2 missover sharebuffers length=len;
        length inline $ 200;
        input @;
        input @1 inline $varying200. len;
        _obs_=_n_;
        inline=compress(inline,byte(13));
      run;

      ** Compare Files;
      proc compare base=file1 compare=file2 out=x3 noprint outnoequal;
      run;

      %let xobs=0;

      ** Get OBS that are different;
      proc sql noprint ;
        select _obs_ into : xobs separated by ' '
        from x3;
      quit;

      *** Get records from the OBS that are different;
      data x4 ;
        merge file1 (where=(_obs_ in (&xobs)) rename=(inline=inline1))
              file2 (where=(_obs_ in (&xobs)) rename=(inline=inline2));
        by _obs_;

        *** Remove records to be ignored;
        %let xlen=%length(&ignore);
        %if &xlen > 0 %then
        %do;
          if substr(inline1,1,&xlen)="&ignore" or
             substr(inline2,1,&xlen)="&ignore" then delete;
        %end;
      run;

      title1 "Comparing LST or Text Files";
      title3 "File1= &dir1/&pgm1";
      title4 "File2= &dir2/&pgm1";

      %let xob=;
      %if %length(&obs)> 0 %then
      %do;
        title5 "Only Showing &obs Differences";
        %let xob=(obs=&obs);
      %end;

      %let xxobs=0;

      *** Check number of differences;
      proc sql noprint ;
        select count(*) into: xxobs
        from x4 ;
      quit;

      *** If number of differences = 0;
      %if &xxobs eq 0 %then
      %do;
        %if %upcase(&only) ne Y %then
        %do;
          data _null_;
            file print ;
            put // @45 "==== THERE ARE NO DIFFERENCES =====";
          run;
        %end;
      %end;

      %if &xxobs > 0 %then
      %do;
        proc report data=x4 &xob nowd headskip headline split="~";
          column _obs_ inline1 inline2 ;
          define _obs_ / group "Line~Where~Different" width=10;
          define inline1 / display "Line from File1" width=58 flow;
          define inline2 / display "Line from File2" width=58 flow;
          break after _obs_ / skip;
        run;
      %end;
    %end;

    %mend qclst;
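The macro above finds differing line numbers with PROC COMPARE and then re-queries the two data sets. As a point of comparison only, the sketch below shows a swapped-in alternative that is not the author's method: a single MERGE by line number that keeps rows whose text differs. The two small data sets stand in for imported files, their contents are made up for illustration, and the names file1, file2, and diff are assumptions.

```sas
/* Stand-in for the first imported file; the lines are illustrative. */
data file1;
  infile datalines truncover;
  length inline $200;
  input inline $char200.;
  _obs_ = _n_;               /* line number used as the merge key */
  datalines;
TREAT A 115 0.38 -0.12
TREAT A 123 0.19 -0.61
;
run;

/* Stand-in for the second imported file. */
data file2;
  infile datalines truncover;
  length inline $200;
  input inline $char200.;
  _obs_ = _n_;
  datalines;
TREAT A 115 0.38 -0.13
TREAT A 123 0.19 -0.61
;
run;

/* Pair the lines by number and keep only the pairs that differ.     */
/* A line present in only one file is paired with a blank and kept.  */
data diff;
  merge file1 (rename=(inline=inline1))
        file2 (rename=(inline=inline2));
  by _obs_;
  if inline1 ne inline2;
run;

proc print data=diff noobs;
  var _obs_ inline1 inline2;
run;
```

This variant compares lines strictly position by position, so an inserted or deleted line makes every later line appear different, which is also true of the line-number approach used in the macro.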
