SESUG 2016

Paper CC–134

Using SAS® to Get a List of File Names and Other Information from a in ® Windows

Imelda C. Go, South Carolina Department of Education, Columbia, SC

ABSTRACT

The ability to create a data set which contains the names of all the files in a directory can be very useful. This listing can be easily generated using the Disk (DOS) via the Windows command prompt. Within SAS, you can use the PIPE option in a FILENAME statement and the DIR command to generate the listing. This paper goes through three possible uses of this data set with the file listing. The first example is a situation where many files with the same input structure need to be read from the same directory. Instead of hard-coding the file name to create each of the many data sets, apply a SAS macro on each file name in the directory listing to create each data set. The second example involves the need to collect data files and determine which files are still missing. The third example is a situation where you are running out of computer storage space and need to identify files based on file size/ and/or age for possible deletion or other action. An often overlooked source of solutions is the SAS Knowledge Base online. Two SAS samples (24820 and 41880) can be used to address the above situations.

Generating a which contains file information for a directory, or a directory and all of its subdirectories is easily accomplished using a DOS command issued the Windows command prompt.

Disk Operating Systems (DOS) started in the early 80s and are operating systems that use the command line. DOS was the most common operating systems for IBM-PCs and compatibles from the 80s till the 90s. As true DOS systems became obsolete, the operating systems continued to provide users with access to the command prompt. Younger programmers may have never worked with the command prompt in a true DOS environment.

Let us first discuss how a user can obtain the directory listing through the Windows command prompt:

The DIR DOS command provides a listing of files. The information in the listing will depend on the exact form of the DIR command used. Options and parameters could vary, which change the exact form of the listing. A search online for “DIR DOS commands” will lead to many references regarding the DIR DOS command. Just be aware that exact syntax specifics might vary depending on the DOS version being referred to. The DIR syntax notes provided in this paper are not exhaustive.

Once the user decides on the final form of the DIR command to use, the redirection operator may be used to send the output of the command to a file. The listing may be redirected using the > or the >> operator. The > operator writes the output of the command into the specified file.

For example, the following command generates the default listing for the c:\ directory and writes the listing into c:\dircontents.txt

dir c:\ > c:\dircontents.txt

The file specified is created if it did not exist prior to the DIR command being issued and the file is overwritten if the specified file already exists.

Instead of overwriting an existing file, the >> redirection appends the DIR command output to the contents of the file specified. dir c:\ >> c:\dircontents.txt Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

DIR Syntax

The basic syntax will look like this where anything in brackets and parentheses is an option: DIR [d:][][filename][/A:(attributes)][/O:(order)][/B][/C][/CH][/L][/S][/P][/W]

Here are a few basic forms of the DIR command: dir displays the listing for the current directory dir *.csv displays the listing for files with a .csv extension in the current directory dir c:/ *. displays files with no file extension in the c:/ directory

/A Switch and Attributes [/A:(attributes)]

The attributes limit the listing of files and directories in the path. The specific attributes are specified right after /A:. Attributes that help limit the listing include: • hidden (H) or not hidden (-H) files • system (S) or non-system (-S) files • directories only (D) or files only (-D) • read-only files (R) or read/write files (-R)

For example: /A:HS is for a listing with hidden system files only /A:-DR is for read-only files only and excludes directories in the listing

/O Switch and Order Options [/O:(order)]

There are DOS options that affect the order of the listing. This is not really critical since the data can be sorted within SAS. Sort options include: • by alphabetical file name (N) or reverse alphabetical file name (-N) • by alphabetical extension (E) or reverse alphabetical extension (-E) • by chronological date and (D) or reverse chronological date and time (-D) • by increasing size (S) or decreasing size (-S)

Other Switches

There are other switches that can be useful. In particular, the /S switch is important for this paper’s third example. • /S displays the listing in the specified directory ([path]) and all subdirectories below it. • /L which displays the listing in lowercase letters. • /W displays only the file and directory names and excludes other file information but in a five-column .

2 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

THE DIR LISTING FROM THE COMMAND PROMPT

Let us look at how we can acquire this listing via the command prompt in a Windows system. Let us suppose the following three files are in c:\Users\igo\Desktop\example1

Open the Windows command prompt and then make your way to the directory by issuing the (change directory) command. (A discussion of the CD command’s syntax is not included in this paper.)

In the example below, we are now in the c:\Users\igo\Desktop\example1 directory. We then issue the DIR command in its simplest form without parameters, attributes, or switches. In its simplest form, the DIR command provides a listing in the current directory. The listing is displayed on the screen.

We can issue the “dir > example1dir.txt” command to save the listing to a file. This command takes the listing generated by the DIR command and writes it into the example1dir.txt file in the same (current) directory. If we issue the DIR command right after issuing the redirection command, we see that the example1dir.txt file is in the current directory. Since we did not specify a different path for example1dir.txt, the TXT file was written in the current directory, which is c:\Users\igo\Desktop\example1 in the example.

3 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

The text file has the same contents that the DIR command displayed in the command prompt window. We can observe that most of the file details are displayed in fixed width format and that it is possible to isolate the information for the date, time, size in bytes, and the file name by counting the column positions.

The strategy is simple: generate a listing with the DIR command and then create a data set using the information in the file.

The following example demonstrates the use of the /s and /a switches. The /a:-d portion of the command specified that only files be included in the listing. Therefore, directories will not be included.

4 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

PIPE OPTION AND DIR COMMAND IN FILENAME STATEMENT

Now that we know what the DIR command produces, let us look at what we can do from within SAS directly. Using a FILENAME statement with the PIPE option and the DIR command, we can generate the listing and use a DATA step to extract the data elements we need. The example below shows that the path for which you need a listing for is in double quotes (“c:\Users\igo\Desktop\example1”) and the exact DIR command specification is in single quotes. The DIR command below specifies the directory path for which you need the listing for. This technique involves an unnamed pipe, which allows you to take the output generated by the DIR command and use it as input for the data set. Note how you can do this directly in SAS and not need the listing to be saved in a file as illustrated above with the redirection of output.

The listing variable shown below contains every line in the listing. We use the fact that the listing contains data in fixed width format. We are able to use the SUBSTR function to isolate the filename value. The datasetname value is just the filename value in compressed form without spaces and the period. This is just an arbitrary dataset name used for the sake of providing an example.

Right now we have lines than needed. We can fix this by adding another line of code that eliminates the extra lines and leaves us with observations 8-11 only. This is a very simple example and the condition for deletion works for this example. Depending on the actual from of the DIR listing, you might have to adjust the condition for deletion. The need to adjust the condition will occur in Example 3. In the below, the example1dir.txt was deleted because it is not one of the three data files of interest.

5 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

The resulting data set looks like the following.

EXAMPLE 1: ONE DATA INPUT STRUCTURE BUT MANY FILES

There are many files with the same input structure that need to be read from the same directory. Instead of manually creating the code to read each of the many data sets, apply a SAS macro on each file name in the file information data set to create each data set.

Let us use the file listing data set we just created to produce a series of macro calls (one for each data set to be read) into a text file. The code could look like the following:

The resulting text file looks like this:

We now have all the information we need to read each specifc file and create a corresponding data set: (1) the file name and (2) the data set name. We may proceed to write our macro. We can associate the file with the macro calls with “maccall” in a FILENAME statement. We then use the %INCLUDE statement to bring these macro calls into the program for execution.

6 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

EXAMPLE 2: TAKING FILE INVENTORY

There is a set of data files and we want to determine if all expected files are in a directory and if not, which ones are missing. For this example, let us suppose you need data from four schools. The data file names you expect for each school are shown in the following table. It’s convenient that the school ID is embedded in the data file name and is the 5th character in the file name.

School School ID Anticipated data file name AB Elementary 1 data1.txt CD Middle 2 data2.txt EF High 3 data3.txt GH School 4 data4.txt

We use the information above and populate a web site.

Let us suppose the data files that have come in are in the directory C:\Users\igo\Desktop\example2. We see that the GH School has no data file since data4.txt is missing.

We can determine which files are missing by comparing the two data sets (first one being the list of file names in the directory and the second one being the list of complete schools) via the school id value.

7 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

The above yields the data set that flags which school has no data file.

EXAMPLE 3: WHERE ARE THE BIGGEST AND OLDEST FILES BY FILE TYPE?

The third situation is you are running out of computer storage space and need to identify files based on file size, age, and type for possible deletion or other action. You can run the DIR command as in the above examples and create reports for the largest and oldest files by file type. However, since large numbers of files are spread out in different directories, you will need to use the /S switch, which will list all the files in the subdirectories from where you issued the DIR command. Be sure to set the path to the appropriate root directory so that all the desired subdirectories are included in the output. Let us suppose the root directory is: c:\example3. Using the /S switch will include the listing for any of its subdirectories. Note that Examples 1 and 2 only involve one directory. The DOS command DIR “c:\example3”/s will produce the following output:

8 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

We observe that there are multiple files with the same name. This fact emphasizes the need to know which directory contains each file. The path is only listed at the top of each directory/subdirectory listing. With a few minor adjustments in our programming, we can isolate the file information we need. We can add the statements that will extract the date, time, file size, and file type; and associate the directory with each file.

9 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

Now that we see the output, we can think of ways to remove the unwanted records. Adding the following conditions to the DATA step will delete the unwanted records and the variables that are no longer needed.

The final data set looks like the following:

From this data set we have what we need to answer our questions. We can sort by datetime value to determine, which are the oldest files. We can sort by filesize to determine which files are the largest. We can also analyze the data by filetype value. We also know the name and location of every single file in our structure.

No doubt you can use macros 24820 and 41880 in the SAS Knowledge Base to accomplish some of the things illustrated in this paper. These two macros are included in this paper as a reference. However, it is always nice to know how a particular macro works. After all, the macros in the SAS Knowledge Base are listed with disclaimers:

By knowing how the data are extracted from the DIR listing, we can be confident that any macro we use/develop truly meets our specific needs and that we can run tests on our own code. That is, we can have much more than blind faith in someone else’s code.

10 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

Sample 24820: Creating a Directory Listing Using SAS for Windows. Retrieved July 1, 2016, from the SAS Support Knowledge Base Web site: http://support.sas.com/kb/24/820.html

11 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016 Continuation of Sample 24820: Creating a Directory Listing Using SAS for Windows. Retrieved July 1, 2016, from the SAS Support Knowledge Base Web site: http://support.sas.com/kb/24/820.html

12 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016 Sample 41880: Read all files from a directory and create separate SAS® data sets with unique names. Retrieved July 1, 2016, from the SAS Support Knowledge Base Web site: http://support.sas.com/kb/41/880.html

13 Using SAS® to Get a List of File Names and Other Information from a Directory in Microsoft® Windows, continued SESUG 2016

REFERENCES

DIR. DIR description from DOS the Easy Way. Retrieved August 9, 2016, from DOS the Easy Way Web site: http://www.easydos.com/dir.html

Sample 24820: Creating a Directory Listing Using SAS for Windows. Retrieved July 1, 2016, from the SAS Support Knowledge Base Web site: http://support.sas.com/kb/24/820.html

Sample 41880: Read all files from a directory and create separate SAS® data sets with unique names. Retrieved July 1, 2016, from the SAS Support Knowledge Base Web site: http://support.sas.com/kb/41/880.html

Wenston, J. (1997). Using unnamed pipes to simplify access to external files. Paper Published in the Proceedings of the Annual Conference of the Northeast SAS Users Group, Baltimore, MD, USA.

TRADEMARK NOTICE

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Microsoft, Encarta, MSN, and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Other brand and product names are trademarks of their respective companies.

CONTACT INFORMATION

Imelda C. Go SC Department of Education 1429 Senate St. Columbia, SC 29201

14