
CODY I Data Conversion

SAS (R) TUTORIAL SESSION: CONVERTING DATA BETWEEN "FOREIGN" FORMATS AND SAS (R) SYSTEM FILES

Dr. Ronald Cody
Robert Wood Johnson Medical School

This paper will discuss ways to move data from such formats as ASCII, Lotus(r), and dBASE(r) to a SAS system file. The discussion also covers moving SAS system files from other platforms (such as UNIX) to system files on PCs. PROC DIF, DBF, CPORT, and CIMPORT will be discussed, as well as a non-SAS Institute package, DBMS/COPY, which translates data between a variety of formats, including SAS system files. The special problems of missing values and incompatible formats are also addressed.

I. Reading Data from an External ASCII File

One common way for users to enter data into a microcomputer is with a word processing package. Several such packages, such as PCWRITE(r) and WordStar(r) (non-document mode only), write directly in ASCII format. Others, such as Word Perfect(r) and Multimate(r), use their own proprietary format. These latter packages contain translation routines that can convert their internal format to standard ASCII. In Word Perfect, the choice "Save to DOS Text File" will write ASCII files, while in Multimate you must run a translate program.

Another way to create ASCII files is to have a spreadsheet program or a database program "print to a file." This technique is similar to sending data to the printer, except that the resulting text will reside in a disk file. Care must be exercised here so that the package you are using does not format the text by adding margins or placing page breaks in the file. In Lotus, be sure to select the "Unformatted" and "Margin" (set left to zero) options in the Print menu before writing out the file.

When all else fails, ASCII is a good "common denominator" between other packages and SAS system files. Regardless of how the ASCII file was created, let us now see how a SAS program can read such a file.
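A file produced by "print to a file" may carry a left margin or page breaks. Short records also cause trouble for column input, as the example that follows shows. For both reasons it can help to inspect a file before reading it. The following is a minimal Python sketch and is not part of SAS. The function name `inspect_records` is hypothetical, and the sketch assumes one record per line. It reports four things:
- the minimum and maximum record lengths (the same figures the SAS log prints);
- the numbers of any records containing a form-feed character (a page break);
- the number of leading blanks shared by every nonblank record (a left margin).

```python
def inspect_records(lines):
    """Summarize the records of a text file before reading it with column input.

    `lines` is any iterable of strings, such as an open text file. Line-ending
    characters are removed before measuring, but trailing blanks are kept,
    because they count toward the record length.
    """
    records = [line.rstrip("\r\n") for line in lines]
    lengths = [len(r) for r in records]
    # 1-based numbers of the records that contain a form feed (page break).
    page_breaks = [n for n, r in enumerate(records, start=1) if "\f" in r]
    # Ignore form feeds when measuring the left margin of nonblank records.
    cleaned = [r.replace("\f", "") for r in records]
    margins = [len(r) - len(r.lstrip(" ")) for r in cleaned if r.strip()]
    return {
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "page_breaks": page_breaks,
        "left_margin": min(margins, default=0),
    }


# Hypothetical usage with the ASCII.TXT file from this section:
# with open("ASCII.TXT") as f:
#     print(inspect_records(f))
```

Two results are worth acting on:
- A minimum length shorter than the last column named on the INPUT statement (column 11 for ASCII.TXT) signals short records, so MISSOVER is needed.
- A nonzero margin or any page break suggests regenerating the file with the print options described above.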
The ASCII file that was used in the program example which follows is listed below:

    FILE ASCII.TXT

    001M2368160   ID is in cols 1-3, SEX in col 4, AGE in 5-6
    002F4462 99   HEIGHT in col 7-8 and WEIGHT in col 9-11
    003M29  200
    004F2765      Note: This is a short record
    005M6672220
    006F6060100

The SAS program to read this file is shown next:

    DATA ASCII;
       INFILE 'ASCII.TXT' MISSOVER;
       INPUT ID 1-3 SEX $ 4 AGE 5-6 HEIGHT 7-8 WEIGHT 9-11;
    RUN;
    PROC PRINT NOOBS;
       TITLE 'SAMPLE DATA SET';
       VAR ID--WEIGHT;
    RUN;

Special care must be taken when reading this file. Notice that subject 004 has a short record (i.e., the carriage return was pressed immediately after "65" was entered, and no blanks were typed). Unlike files on mainframe systems, this file is not padded on the right with blanks. Without the MISSOVER option on the INFILE statement, the program would move to the next record in an attempt to read a value for WEIGHT, even though the INPUT statement specifies columns 9-11 for this variable. Below is a listing of the SAS data set that was produced without the MISSOVER option:

    Result of PROC PRINT when Option MISSOVER was Not Specified

    ID    SEX    AGE    HEIGHT    WEIGHT
     1     M      23      68        160
     2     F      44      62         99
     3     M      29       .        200
     4     F      27      65          5
     6     F      60      60        100

Notice first that the SAS pointer went to record five to look for a value for WEIGHT. It read the first three columns of that record, which actually contained the ID number of the next subject (005). For the next observation, the SAS pointer then moved on to the following record. As a result, the data in record five were skipped, and the values in the last record (record six) appear in the fifth observation. The SAS LOG below shows that the SAS pointer went to a new line to read data and that the minimum record length was 8.

    SAS LOG where Option MISSOVER was Not Specified

    NOTE: The infile 'ASCII.TXT' is file 0:\SASDATA\ASCII.TXT.
    NOTE: 6 records were read from the infile 0:\SASDATA\ASCII.TXT.
          The minimum record length was 8.
          The maximum record length was 11.
    NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
    NOTE: The data set WORK.ASCII has 5 observations and 5 variables.

You may see this NOTE in a SAS LOG when you did not intend the SAS pointer to go to a new line to read data (as it would with an INPUT statement that uses @@). If so, think about the short-record problem and use the MISSOVER option to solve it. It is a good idea to include the MISSOVER option when reading ASCII files with SAS/PC.

II. Reading Data from a Lotus(r) Spreadsheet Via DIF Format

One way of converting a Lotus spreadsheet into a SAS system file is via a DIF (Data Interchange Format) file. Once your spreadsheet has been translated to a DIF file, you may use PROC DIF to convert the DIF file into a SAS system file. Let's first discuss the format of the original spreadsheet. You may simply have columns of variables, with the first row of the spreadsheet containing the values for your first observation. Below is a sample of a simple spreadsheet (containing the same values as the sample ASCII file above):

    Lotus Spreadsheet Example 1

         A    B    C    D     E
    1    1    M    23   68    160
    2    2    F    44   62     99
    3    3    M    29         200
    4    4    F    27   65
    5    5    M    66   72    220
    6    6    F    60   60    100

Notice that the numbers are right-justified and the character variables are left-justified. When this spreadsheet is converted to a SAS system file, the numbers will become SAS numeric (8-byte) variables and the characters will become character variables of length 20. Character values longer than 20 bytes will be truncated when we use the DIF format for our translation.

Another form of the spreadsheet has column headings in the first row. This form is shown next:

    Lotus Spreadsheet Example 2

         A    B    C    D       E
    1    ID   SEX  AGE  HEIGHT  WEIGHT
    2    1    M    23   68      160
    3    2    F    44   62       99
    4    3    M    29           200
    5    4    F    27   65
    6    5    M    66   72      220
    7    6    F    60   60      100

Finally, you may have one or more lines of text or comments in your spreadsheet.
An example of this is shown next:

    Lotus Spreadsheet Example 3

         A    B    C    D       E
    1    These lines contain comments that we do not
    2    want to include with our data.
    3    -------------------------------------------
    4    ID   SEX  AGE  HEIGHT  WEIGHT
    5    1    M    23   68      160
    6    2    F    44   62       99
    7    3    M    29           200
    8    4    F    27   65
    9    5    M    66   72      220
    10   6    F    60   60      100

There are two ways of dealing with Examples 2 and 3. The first is to use the "RANGE" command of Lotus, before entering the Lotus translate program, to name a range that includes only the data. We can then translate only that range and create a DIF file that will be identical to the one from Example 1. The other alternative is to translate the spreadsheet intact and use the SKIP option of PROC DIF to skip the first n lines of the spreadsheet (one line for Example 2, four lines for Example 3).

Now that we have translated our WK1 file to a DIF file, we are ready to see how PROC DIF works. The syntax for PROC DIF is:

    PROC DIF DIF=fileref OUT=sas_file SKIP=n;

where

    fileref  = a file reference to the .DIF file
    sas_file = name of the newly created SAS system file
    n        = number of lines of the spreadsheet to skip

For example, suppose our original worksheet file was called LOTUS.WK1. The translated DIF file will be named LOTUS.DIF (the .DIF extension is added automatically by the translate routine). If we want our SAS system file to be called LOTUSAS, we would write our PROC statements as follows:

    FILENAME IN 'LOTUS.DIF';
    PROC DIF DIF=IN OUT=LOTUSAS;
    RUN;

The variables in the resulting SAS data set would be named COL1, COL2, COL3, etc. You could rename these variables using PROC DATASETS, for example:

    PROC DATASETS;
       MODIFY LOTUSAS;
       RENAME COL1=ID COL2=SEX COL3=AGE COL4=HEIGHT COL5=WEIGHT;
    RUN;
    QUIT;

III. Reading Data From a Lotus Spreadsheet via DBF Format

An alternate method of converting a Lotus spreadsheet to a SAS system file is to first convert the spreadsheet to DBF format (choose dBase III from the Lotus translate screen) and then use PROC DBF to create the SAS system file.
There are advantages and disadvantages to this method. First, the Lotus translate routine expects the first row of the spreadsheet to contain variable names and subsequent rows to contain data values. If there are extraneous rows or columns in your spreadsheet, use the "Range" command in Lotus to name a range where your variable names and values are located. The Lotus-to-DBF conversion is also more particular than the Lotus-to-DIF conversion. The translate routine insists that the second row of the spreadsheet (the first row of data) either contains data values or is formatted. After the conversion is completed, the resulting SAS system file will have the same variable names as the column headings.
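The SAS variable names come from the DBF field names, so it can be useful to see where those names are stored in the file. The following is a minimal Python illustration, not PROC DBF itself, and both function names are hypothetical. It assumes the common dBASE III layout:
- A 32-byte file header holds the record count (bytes 4-7), the header length (bytes 8-9) and the record length (bytes 10-11) as little-endian integers.
- The header is followed by one 32-byte descriptor per field: name in bytes 0-10, type letter in byte 11, width in byte 16 and decimals in byte 17.
- A 0x0D byte terminates the field descriptors.
- Each fixed-width record starts with a one-byte deletion flag, where '*' marks a deleted record.

The name area is 11 bytes including a terminating null, so field names (and hence the resulting variable names) are at most 10 characters long. Treating a blank numeric field as missing (None) is this sketch's own choice.

```python
import struct


def read_dbf_fields(data):
    """Return (record_count, fields) from the header of a dBASE III file.

    `data` holds the file's bytes; each field is (name, type, width, decimals).
    """
    record_count, header_len, _record_len = struct.unpack("<IHH", data[4:12])
    fields = []
    offset = 32  # field descriptors follow the 32-byte file header
    while offset < header_len - 1 and data[offset] != 0x0D:
        desc = data[offset:offset + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode("latin-1")
        fields.append((name, chr(desc[11]), desc[16], desc[17]))
        offset += 32
    return record_count, fields


def read_dbf_records(data):
    """Yield each non-deleted record as a dict keyed by field name.

    Numeric ('N') fields become floats, or None when blank (a missing value);
    all other fields are returned as stripped text. latin-1 is assumed as the
    code page because it maps every byte without raising an error.
    """
    record_count, header_len, record_len = struct.unpack("<IHH", data[4:12])
    _, fields = read_dbf_fields(data)
    for i in range(record_count):
        start = header_len + i * record_len
        record = data[start:start + record_len]
        if record[:1] == b"*":  # deletion flag set: skip this record
            continue
        values, pos = {}, 1  # byte 0 is the deletion flag
        for name, ftype, width, _decimals in fields:
            text = record[pos:pos + width].decode("latin-1").strip()
            if ftype == "N":
                values[name] = float(text) if text else None
            else:
                values[name] = text
            pos += width
        yield values


# Hypothetical usage with a file written by the Lotus translate utility:
# with open("LOTUS.DBF", "rb") as f:
#     data = f.read()
# print(read_dbf_fields(data))
```

Each DBF record is a fixed-width line, much like the ASCII records read with column input in Section I. The difference is that the column positions come from the header rather than from the INPUT statement.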