FILEVAR OPTION: CONTROLLING FILES READ AND/OR WRITTEN
Steve Goins, American Express, Phoenix, AZ

ABSTRACT
While there are several simple ways to concatenate external files, it can be very useful to control the files used. This paper will cover concatenating external files and the basics of using the FileVar option. I will provide some example programs using the FILEVAR option: collecting metrics, making changes to all programs in a directory, etc.

INTRODUCTION
The FileVar option to the Infile and File statements provides much flexibility by allowing the files read from and/or written out to be changed at run time. I will first cover other ways of reading multiple files, and then focus on the FileVar option.

This paper will only deal with the Windows operating systems (Windows 95, 98, ME, NT, 2000) the code has been tested on. I used my PC for testing so file names are short and readable.

READING CONCATENATED FLAT FILES
Two methods for arranging for multiple files to be read as one file are: 1) group the file names together using the SAS grouping character, and 2) refer to the files with a wild card character.

Group the Files
Place the file names, surrounded by a pair of tic marks or double quotes, within parentheses.

    FileName FileRef ('c:\Files\tst1.txt'
                      'c:\Files\tst2.txt'
                      'c:\Files\tst3.txt');
    Data GroupExample;
      Infile FileRef;
      Input Amount 3.;
    Run;

Wild Card Character
Use a wild card character in referring to the files. In Windows, this is the asterisk "*" for any number of characters, or the question mark "?" for 1 character.

    FileName FileRef 'c:\Files\*.txt';
    Data WildExample;
      Infile FileRef;
      Input Amount 3.;
    Run;

The wild card character can also be used in the Infile statement.

    Data WildExample;
      Infile 'c:\Files\tst?.txt';
      Input Amount 3.;
    Run;

Related Infile Options
Two Infile options that can be very useful when reading concatenated files are EOV and FileName.

EOV = variable
    Assigns 1 to variable when the first record in a series of concatenated files is read. SAS does not reset the variable to 0. The variable is not written to the data set.

FileName = variable
    SAS assigns the name of the currently open file to the variable. Like automatic variables, the FileName variable is not written out.

    1  Data a;
    2    Length Fn FileNm $25;
    3    Infile "c:\files\tst?.txt" FileName=Fn;
    4    Input Amt 3.;
    5    FileNm=Fn;
    6  Run;

Line 3 names Fn as the FileName variable, and line 5 copies Fn to a variable, FileNm, which will remain on the data set.

Below is the log from the above program, and some output.

    NOTE: The infile "c:\files\tst?.txt" is:
          File Name=c:\files\tst3.txt,
          File List=c:\files\tst?.txt,RECFM=V,LRECL=256
    NOTE: The infile "c:\files\tst?.txt" is:
          File Name=c:\files\tst2.txt,
          File List=c:\files\tst?.txt,RECFM=V,LRECL=256
    NOTE: The infile "c:\files\tst?.txt" is:
          File Name=c:\files\tst1.txt,
          File List=c:\files\tst?.txt,RECFM=V,LRECL=256
    NOTE: 3 records were read from the infile "c:\files\tst?.txt".
          The minimum record length was 3.
          The maximum record length was 3.
    NOTE: 3 records were read from the infile "c:\files\tst?.txt".
          The minimum record length was 3.
          The maximum record length was 3.
    NOTE: 3 records were read from the infile "c:\files\tst?.txt".
          The minimum record length was 3.
          The maximum record length was 3.
    NOTE: The data set WORK.A has 9 observations and 2 variables.
    NOTE: DATA statement used:
          real time 0.66 seconds

    Proc Print Data=a NoObs;
    Run;

    FileNm               Amount
    c:\files\tst1.txt       111
    c:\files\tst1.txt       222
    c:\files\tst1.txt       333
    c:\files\tst2.txt       222
    c:\files\tst2.txt       444
    c:\files\tst2.txt       666
    c:\files\tst3.txt       333
    c:\files\tst3.txt       666
    c:\files\tst3.txt       999

FILEVAR
The power of the FileVar option comes from the flexibility in creating the FileVar variable. The source for the FileVar variable can come from a field from a SAS file or raw file, can be created from fields read in, or can be specified in the code.

FILEVAR Basics
Below is a basic example of using the FileVar option to read 3 files.

    1  Data Stuff;
    2    Infile Cards;
    3    Input Fil2Read $17.;
    4    Infile Dummy FileVar=Fil2Read End=Done;
    5    Do While (Not Done);
    6      Input Amount 3.;
    7      Output;
    8    End;
    9  Cards;
    C:\Files\tst1.txt
    C:\Files\tst2.txt
    C:\Files\tst3.txt
    ;
    Run;

    NOTE: The data set WORK.STUFF has 9 observations and 1 variables.
    NOTE: DATA statement used:
          real time 0.16 seconds

Line 2: The first Infile statement tells SAS to start reading data after the Cards statement.

Line 3: This Input statement reads a file name into a variable that will be used in the FileVar option for reading multiple files. This variable will not be kept in the data because the variable that feeds the FileVar= option is not kept.

Line 4: The file reference, Dummy, can be any valid SAS file reference name of up to 8 characters. An automatic variable is set up, with End=Done, to indicate the last record of the file by setting the variable Done to 1.

Line 5: A DO loop is used here to read all the records of the file. This is necessary because on the second iteration of the DATA step, the first Infile statement reads another record, which will be input to the FileVar= option in the second Infile statement. The DO While loop checks at the start of the loop while a DO Until loop checks at the end. The DO While loop should be used if you do not want the DATA step to end when it comes to a zero-byte file.

Line 7: The OUTPUT statement writes out each record within the DO loop.

INFILE Statement
FILEVAR= Defines a variable whose change in value causes the INFILE statement to close the current input file and open a new one the next time the INFILE statement executes. (1)

You can create the FILEVAR variable by using the INFILE statement to read a directory listing from unnamed pipes, text files, or data lines. Or you could read a SAS data set to create the FILEVAR variable.

Unnamed Pipes as Input
An unnamed pipe can be used to direct a directory listing to the SAS system for use as input.

    FileName fList Pipe "Dir C:\Files\t*.txt/b";

The above fList file reference will refer to the data below:

    tst3.txt
    tst2.txt
    tst1.txt

You do not get the full path, so you will usually add the file name to the directory in your code. You can add the /s (include subdirectories) option after /b to get the full path, if it is appropriate to include subdirectories.

    FileName fList Pipe "Dir C:\Files\*.txt/b/s";

The above fList file reference will refer to the data below:

    C:\Files\tst3.txt
    C:\Files\tst2.txt
    C:\Files\tst1.txt
    C:\Files\SchedulerJobs.TXT

With the Windows Dir command, the date and other fields may start in different columns, depending on the Windows version.

Creating the FileVar Variable in Code
You can derive the FileVar variable in code before the Infile (with FileVar) statement. You will need to stop the DATA step after you have read enough records, because here there will not be an end of file or end of SAS data set to end the DATA step.

    Data Stuff;
      Drop Cnt;
      Cnt + 1;
      FileNm = "C:\Files\tst" !! put(Cnt,1.) !! ".txt";
      Src = FileNm;
      Infile dummy FileVar = FileNm End = Done;
      Do While (Not Done);
        Input Amount 3.;
        Output;
      End;
      If Cnt = 3 Then Stop;
    Run;

    NOTE: The infile DUMMY is:
          File Name=C:\Files\tst1.txt,
          RECFM=V,LRECL=256
    NOTE: The infile DUMMY is:
          File Name=C:\Files\tst2.txt,
          RECFM=V,LRECL=256
    NOTE: The infile DUMMY is:
          File Name=C:\Files\tst3.txt,
          RECFM=V,LRECL=256
    NOTE: 3 records were read from the infile DUMMY.
          The minimum record length was 3.
          The maximum record length was 3.
    NOTE: 3 records were read from the infile DUMMY.
          The minimum record length was 3.
          The maximum record length was 3.
    NOTE: 3 records were read from the infile DUMMY.
          The minimum record length was 3.
          The maximum record length was 3.
    NOTE: The data set WORK.STUFF has 9 observations and 2 variables.
    NOTE: DATA statement used:
          real time 0.37 seconds

FILE Statement
FILEVAR= Defines a variable whose change in value causes the FILE statement to close the current output file and open a new one the next time the FILE statement executes. (1)

Below is an example of using FileVar= to write to multiple files. You should sort the file before the last DATA step when there are many different files to write out, so that files are only closed and opened when there is a new value of the BY variable.

    Data Stateinfo;
      Infile Cards;
      Input State $2. ThatDate $6. Amt $4.;
    Cards;
    AL0602023000
    AK0601024500
    AL0701023000
    AR0601024455
    AL0603021200
    AR0702029999
    ;
    Run;

    Proc Sort Data=Stateinfo;
      By State;
    Run;

    Data _Null_;
      Set Stateinfo;
      By State;
      If First.State Then File2Put = "C:\Files\" !! State !! ".txt";
      File Dummy FileVar = File2Put;
      Put State $2. ThatDate $6. Amt $4.;
    Run;

Below is the log, which follows the last DATA step, showing the 6 records written out to 3 files.

    NOTE: The file DUMMY is:
          File Name=C:\Files\AK.txt,
          RECFM=V,LRECL=256
    NOTE: The file DUMMY is:
          File Name=C:\Files\AL.txt,
          RECFM=V,LRECL=256
    NOTE: The file DUMMY is:
          File Name=C:\Files\AR.txt,
          RECFM=V,LRECL=256
    NOTE: 1 record was written to the file DUMMY.
          The minimum record length was 12.
          The maximum record length was 12.
    NOTE: 3 records were written to the file DUMMY.
          The minimum record length was 12.
          The maximum record length was 12.
    NOTE: 2 records were written to the file DUMMY.
          The minimum record length was 12.
          The maximum record length was 12.
    NOTE: There were 6 observations read from the data set WORK.STATEINFO.

Subtleties of FILEVAR
You have more control with the FileVar option than with the other methods of reading multiple files. You can limit the number of records sent to the last Infile statement by using IF logic on the data coming in, or by limiting the incoming records with Obs= and/or FirstObs=.

Using a subsetting IF or a Delete statement within the DO loop will stop processing that file. Control will then go to the next iteration of the DATA step if there are more input records naming files to read. Be careful when leaving a loop early by using the Leave statement.

We have been using simple examples here. For very complicated code, it may be easier to build the 2 different parts separately, and then put them together later.
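The point made under Subtleties of FILEVAR about limiting which records reach the last Infile statement can be sketched as follows. This is a minimal illustration rather than code from the original examples: the extension test is an assumed selection rule, notes.doc is a hypothetical list entry that is never opened, and tst1.txt and tst3.txt are assumed to exist as in the earlier examples.

```sas
Data Stuff;
  Length Fil2Read Src $60;
  Infile Cards TruncOver;
  Input Fil2Read $60.;
  /* Assumed rule: skip list entries whose extension is not txt.   */
  /* Delete returns to the top of the DATA step, so the second     */
  /* Infile statement never sees this file name.                   */
  If Upcase(Scan(Fil2Read, -1, '.')) ne 'TXT' Then Delete;
  /* The FileVar= variable is not kept, so copy it to keep the name */
  Src = Fil2Read;
  Infile Dummy FileVar=Fil2Read End=Done;
  Do While (Not Done);
    Input Amount 3.;
    Output;
  End;
Cards;
C:\Files\tst1.txt
C:\Files\notes.doc
C:\Files\tst3.txt
;
Run;
```

Because the test runs before the Infile statement that carries FileVar=, a rejected entry costs nothing: the file is never opened, and the next DATA step iteration simply reads the next file name.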

EXAMPLES

Writing to Multiple Files to Create Test Files

    Data _Null_;
      Do I = 1 to 3;
        FileOut = "C:\Files\tst" !! Put(I,1.) !! ".txt";
        File Dummy FileVar=FileOut;
        Do J = 1 to 3;
          Amount = (I * 111) * J;
          Put Amount 3.;
        End;
      End;
    Run;

Writing To Multiple Files
We were migrating from one scheduler system to another. We needed to create 129 batch files for running production SAS jobs. We had the below from the Technologies team that owned the old scheduler.

[Screen capture of the old scheduler's job list: one row per job, showing job names such as UPDAAT and UPDAATBAD, each followed by job parameters beginning with -sysin.]

I used the class field for the directory, the job name for the name of the batch file, and the job parms (SAS job, where to write log, etc.) in building the batch files.

When the task of creating these batch files came up, the FileVar option immediately came to mind. The below code took a minute or two to write, and a second to run.

Below is the SAS code I used to create the 133 batch files, along with extracts of the log.

    FileName SchList 'C:\Files\SchedulerJobs.TXT';
    %Let SasPath= ""C:\Program Files\Sas Institute\SAS\V8\Sas.exe"";

    Data _Null_;
      Infile SchList TruncOver FirstObs=7;
      Input JobClass $18. JobNm $23. JobCmd $169.;
      CmdLoc = "C:\Files\" !! Trim(JobClass) !! "\" !! Trim(JobNm) !! ".cmd";
      CmdInstr = "&SasPath" !! Trim(JobCmd);
      File Stuff FileVar = CmdLoc;
      Put CmdInstr;
    Run;

    NOTE: The infile SCHLIST is:
          File Name=C:\Files\SchedulerJobs.TXT,
          RECFM=V,LRECL=210
    NOTE: The file STUFF is:
          File Name=C:\Files\AUTHMIS\UPDAAT.cmd,
          RECFM=V,LRECL=256
    NOTE: The file STUFF is:
          File Name=C:\Files\AUTHMIS\UPDAATBAD.cmd,
          RECFM=V,LRECL=256
    NOTE: The file STUFF is:
          File Name=C:\Files\AUTHMIS\UPDVOL.cmd,
          RECFM=V,LRECL=256
    NOTE: 1 record was written to the file STUFF.
          The minimum record length was 210.
          The maximum record length was 210.
    NOTE: 1 record was written to the file STUFF.
          The minimum record length was 212.
          The maximum record length was 212.
    NOTE: DATA statement used:
          real time 0.75 seconds

Reading Header Records from Multiple Files
We do not need to loop through the records of the files in the example below because we only need the first record from each file.

    %Let DirIn = L:\Csasf\;
    FileName FileList Pipe "Dir &DirIn.*.dat";

    Data Lookc;
      Format HdrDt FileDt mmddyy10.;
      InFile FileList Truncover;
      Input @01 FileDt ?? mmddyy10.
            @40 FileNm $9.;
      If substr(FileNm,1,2) = 'AO';
      DatFile = "&DirIn." !! FileNm;
      Infile sasin Filevar=DatFile Dlm=',';
      Input Hdr Gdg : $8. HdrDt : yymmdd6.;
    Run;
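To see why no DO loop is needed when only the header record of each file is wanted, the following self-contained sketch may help. It is an illustration under stated assumptions, not the production code: it assumes SAS on Windows, writes two hypothetical test files (hdr1.dat and hdr2.dat) into the WORK library's directory, and uses made-up header values. The names Dir, HdrId, Dummy and Dummy2 are introduced here only for the example.

```sas
/* Assumption: Windows SAS; test files go in the WORK library's folder */
%Let Dir = %SysFunc(PathName(Work));

/* Build two small test files: one header record, one detail record */
Data _Null_;
  Length FileOut $260 Hdr $20;
  Do I = 1 To 2;
    FileOut = "&Dir" !! "\hdr" !! Put(I,1.) !! ".dat";
    Hdr = "GDG" !! Put(I,1.) !! ",020601";   /* made-up header values */
    File Dummy FileVar=FileOut;
    Put Hdr;
    Put "detail,record";
  End;
Run;

/* Read only the first record of each file */
Data Headers;
  Length DatFile $260 HdrId $8;
  Format HdrDt mmddyy10.;
  Do I = 1 To 2;
    DatFile = "&Dir" !! "\hdr" !! Put(I,1.) !! ".dat";
    /* A changed FileVar= value closes the old file and opens the new one */
    Infile Dummy2 FileVar=DatFile Dlm=',' TruncOver;
    /* A single Input therefore reads the header record of that file */
    Input HdrId : $8. HdrDt : yymmdd6.;
    Output;
  End;
  Stop;   /* no end of file will end this step, so stop explicitly */
Run;
```

Each time the Infile statement executes with a new file name, reading starts again at the first record of the newly opened file, so the detail records are never touched.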

Reading From and Writing To Multiple Files
The SAS code below reads in the files of a directory, replaces all occurrences of a word, and writes out the new files to another directory.

    %Macro ChngCode(DirIn,DirOut,OldWrd,NewWrd);
    FileName PipeIn Pipe "Dir &DirIn.*.sas/b";
    Data _Null_;
      Infile PipeIn TruncOver;
      Input Fn $Char80.;
      FileIn  = "&DirIn" !! Trim(Fn);
      FileOut = "&DirOut" !! Trim(Fn);
      Infile StuffIn Filevar = FileIn Truncover End=Done;
      File StuffOut Filevar = FileOut;
      /* Place File statement beneath DO While statement
         if you do not want 0-byte files written out. */
      Do While (Not Done);
        Input ProgLine $Char100.;
        If Index(ProgLine,"&OldWrd") Then
          ProgLine = Tranwrd(ProgLine,"&OldWrd","&NewWrd");
        Put ProgLine $Char100.;
      End;
    Run;
    %Mend;

    %ChngCode(C:\Files\Proj1\, C:\Files\Proj1a\, Amt, Amount);

Gathering Real Time of Multiple Jobs
This example pulls data from 6 directories, and parses the FileNm variable to get the directory (group) and the name of the program (also the name of the log, which has a different extension). The directory command with the /b and /s options returns the full path names for the files.

    FileName LogFiles Pipe "Dir C:\Files\Sas\*.log/b/s";

    Data Stuff;
      Drop LogLine;
      Length Group $6 Job $32;
      Infile LogFiles TruncOver;
      Input FileNm $60.;
      Job   = Scan(Scan(FileNm,-1,"\"),1,".");
      Group = Scan(FileNm,-2,"\");
      Infile dummy FileVar=FileNm TruncOver End=LastRec;
      Do While (Not LastRec);
        Input LogLine $25.;
        If LogLine = "NOTE: The SAS System used" Then Do;
          Input +15 RealTime & $20.;
          If Index(RealTime,":") Then Do;
            RtSs  = Input(Scan(RealTime,-1,":"),2.);
            RtMm  = Input(Scan(RealTime,-2,":"),2.);
            RtHh  = Max(0,Input(Scan(RealTime,-3,":"),2.));
            RtHms = Hms(RtHh,RtMm,RtSs);
          End;
          Else Do;
            RtSs  = Input(Scan(RealTime,1," "),2.);
            RtHms = Hms(0,0,RtSs);
          End;
          Format RtHms Time.;
          Output;
        End;
      End;
    Run;

    Proc Summary Data=Stuff Nway Missing NoObs Print;
      Class Group;
      Var RtHms;
      Output Out=SumStuff(Drop=_type_ Rename=(_freq_=Jobs))
             Sum=TotTime Mean=MeanTime;
    Run;

    Proc Print Data=SumStuff NoObs;
    Run;

                The SAS System

                                   Mean
    Group    Jobs     TotTime      Time
    Auth        4    11:27:47   2:51:57
    Cesg       16     0:55:05   0:03:27
    Corp       57    41:37:46   0:43:49
    Intl       41     0:08:38   0:00:13
    Wee         5     0:02:36   0:00:31
    pms         7     1:00:57   0:08:42

CONCLUSION
The FileVar option is a very powerful tool that allows you to dynamically change the file being read or written. You can process many files with just a little bit of SAS code.

REFERENCES FOR FURTHER READING

1 SAS Help for the FILEVAR= option

SAS Help and Examples

TS-581 Using FILEVAR= to read multiple external files in a DATA Step

SAS Technical Tips Quick Tip: Making Sense of the INFILE and INPUT Statements by Randall Cates

TRADEMARKS
SAS and all other SAS Institute product and service names are trademarks or registered trademarks of SAS Institute Inc., Cary, NC, USA. ® indicates US registration on all published material. Other brand and product names are trademarks of their respective companies.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:

    Steven D. Goins
    American Express
    2512 W Dunlap Avenue
    Phoenix, AZ 85021
    Work Phone: 602-537-9521
    Fax: 602-537-9297
    E-mail Address: [email protected]
