J-Files: Tipsheet on file formats

Extend your knowledge: A tipsheet on file formats

By Jeff South, School of Mass Communications, VCU

When you get a file on a disk or download it from the Internet, how do you know what program to use to open it and view or work with its contents?

The key is to look at the file extension - the "dot-something-something-something" at the end of the file name. For example, if the file name ends in ".XLS", it stands for an Excel spreadsheet; if the extension is ".DBF", it's a dBASE file; if it's ".HTM" or ".HTML", the file is in hypertext markup language, formatted as a Web page.
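The same lookup can be done in a few lines of code. This is only a sketch: the table `KNOWN_FORMATS` and the function name `describe_extension` are made up for illustration, and the table holds just the extensions mentioned in the paragraph above.

```python
import os

# Hypothetical lookup table, limited to extensions named in this tipsheet.
KNOWN_FORMATS = {
    ".xls": "Excel spreadsheet",
    ".dbf": "dBASE file",
    ".htm": "Web page (HTML)",
    ".html": "Web page (HTML)",
}


def describe_extension(filename):
    """Return a description of the file type, judged only by its extension."""
    # splitext turns "BUDGET.XLS" into ("BUDGET", ".XLS").
    _, ext = os.path.splitext(filename)
    # Extensions are compared in lowercase so ".XLS" and ".xls" match.
    return KNOWN_FORMATS.get(ext.lower(), "unknown format")


print(describe_extension("BUDGET.XLS"))
print(describe_extension("index.html"))
```

An extension is only a label, so a lookup like this tells you what a file claims to be, not what it actually contains.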

There are hundreds of extensions these days, and they can get pretty confusing. Even the word extension is confusing: It comes from the PC world, but in Mac lingo, an extension is a piece of software that goes in your System Folder.

It's important to know what different extensions stand for, so you'll know what kind of file you're dealing with. That way, you'll know whether you have a program that can open a particular file. If you don't have the right program, you may have to get the file converted to a different format. (You can convert a file by opening it with its "native" program and saving it in an alternative format. Or you can use conversion software like Junction, Conversions Plus or MacLink to turn a file from, say, Excel to dBASE.)

If you're working with Excel, you usually don't have to do a lot of data conversion. That's because Excel can open many spreadsheet formats, like Lotus 1-2-3 files. Still, Excel chokes on a few file types (like the Quattro Pro spreadsheet format).

If you're using Visual FoxPro, you need your data in a dBASE format. Fortunately, FoxPro has a way to take other file types and change them into dBASE.

Before I start listing extensions and what they stand for, you should know one other thing: the difference between ASCII and binary files.

ASCII, which stands for American Standard Code for Information Interchange and is pronounced ASS-key, is plain-vanilla text: no formatting, no special characters. You can open ASCII files on a Mac with TeachText, SimpleText or Word (but when you go to save the file in Word, be sure to save it as "text only" - another synonym for ASCII). You can open the same ASCII file on a PC with Notepad, WordPad, Word, XyWrite, WordPerfect or any other word-processing program.

Binary files, on the other hand, have special coding that tells a computer how the file was created and formatted. An Excel spreadsheet, for example, is a binary file: If you try to open it with a word processor, you'll get garbage -- because it chokes on the initial coding, which describes, among other things, the number and widths of columns. If you write a story and save it in Word, that's also a binary file. You can open it in Word, and it looks great. But because of the invisible coding, it will look like junk if you try to open it with a different word processor.
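If you're not sure which kind of file you have, a rough test is to check whether every byte is an ordinary text character. The sketch below assumes that "plain ASCII" means printable characters plus tab, carriage return and line feed; the function names and the 4,096-byte sample size are illustrative choices, not a standard.

```python
def looks_like_ascii_text(data):
    """Return True if every byte is printable ASCII, a tab, CR or LF."""
    allowed_controls = {9, 10, 13}  # tab, line feed, carriage return
    return all(32 <= byte < 127 or byte in allowed_controls for byte in data)


def file_looks_like_ascii(path, sample_size=4096):
    """Apply the same check to the first sample_size bytes of a file on disk."""
    with open(path, "rb") as handle:
        return looks_like_ascii_text(handle.read(sample_size))


# A line of tab-separated text passes; arbitrary non-text bytes do not.
print(looks_like_ascii_text(b"Name\tCity\tAmount\r\n"))
print(looks_like_ascii_text(b"\xd0\xcf\x11\xe0\x00\x01"))
```

A check like this only samples the start of the file, so treat a "yes" as a good hint rather than a guarantee.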

http://www.people.vcu.edu/~jcsouth/library/analysis/formats.html

Here are some extensions you're likely to encounter when you get data to crunch or you're surfing the Internet. All of these are binary files and, unless otherwise noted, can be opened with Excel:

Spreadsheet files

• .DIF - Data Interchange Format, the format for VisiCalc, the first spreadsheet.
• .WRK, .WR1 - Symphony, another older spreadsheet.
• .SLK - Multiplan, yet another older spreadsheet.
• .RXD, .R2D - Reflex, a rather obscure spreadsheet.
• .WKS, .WK1, .WK3, .WKE - Various versions of Lotus 1-2-3, a popular spreadsheet.
• .WQ1, .WQ2 - The DOS version of Quattro Pro, a solid spreadsheet (but for PCs only, Mac fans).
• .WB1, .WB2 - The Windows version of Quattro Pro. Many journalists say this spreadsheet is as good as Excel (and much cheaper). Maybe that's why Excel won't open .WB* files!
• .XLS - Well, duh.

Database files

• .DB2, .DBF, .DB4 - dBASE file. dBASE is probably the most popular format for a database table: You can open a dBASE file with almost any spreadsheet program (Excel, Lotus, Quattro Pro) or database manager (FoxPro, Paradox, Access).
• .MDB - An MS Access file.
• .DB - A Paradox file. Paradox and Quattro Pro are owned by the same company; Excel won't open .DB or .WB* files.

FoxPro, especially the new versions, has spawned a bunch of new extensions. They mostly have to do with the way FoxPro "sees" and organizes dBASE tables. These extensions include:

• .DBC - A database file, which indicates that two or more dBASE tables are related and make up a database.
• .IDX, .CDX - Index files, which help FoxPro put data in order.
• .QPR, .PRG - Query and program files, instructions to FoxPro to do something with a dBASE table.
• .PJX, .FPC, .CAT - Project files. In FoxPro, a project can consist of dBASE tables, queries and other things; a ".PJX" file tells FoxPro all these things are related.

Because .DIF, .WKS, .DBF and similar files are binary, when you open them in Excel, Excel knows exactly how to put the data into columns and rows. But not all data files are binary. Often, you'll get data, on disk or from the Internet, as a text file - something you could open (but couldn't sort or manipulate) with a word processor. When you open a text file with Excel, you have to guide Excel through the process of putting the data into columns. This is called "parsing" the data. And you do it with Excel's Import Wizard: You tell the Wizard what kind of text file you're dealing with, and the Wizard walks you through the parsing process. (Visual FoxPro also has an Import Wizard for opening text files and turning them into dBASE tables. Microsoft Access, another popular database manager, also has a wizard for importing text files.)

Here are some extensions you'll see associated with text files containing data:

• .TXT - This often means there are tabs between each column, and there may be quotes around each piece of data. For example, the data might look like:

Name             City             Amount
"George Bush"    "Dallas"         "$5,000"
"Bob Bullock"    "Austin"         "$7,000"
"Pete Laney"     "Hale Center"    "$2,000"

That's called a tab-delimited file. Occasionally, you might see such a file with a .TAB extension.

• .CSV - For "comma-separated values." Instead of a tab between each column, there's a comma.
• .PRN, .DAT, .ASC, .SDF - Instead of a delimiter, there is space between each column. When you open such a file, you'll see that each column lines up:

George Bush    Dallas         $5,000
Bob Bullock    Austin         $7,000
Pete Laney     Hale Center    $2,000

Some people call this format a "space-delimited file" or "standard data format" (hence, the .SDF extension).
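For readers who would rather script the parsing than click through a wizard, here is a rough sketch of how the three layouts above could be split into columns with Python's standard csv module. The sample data mirrors the examples in this tipsheet; the function names and the column widths (14, 14 and 6 characters) are assumptions chosen to fit that sample, not anything a real file is guaranteed to follow.

```python
import csv
import io

# The rows we expect to recover from each layout.
ROWS = [["George Bush", "Dallas", "$5,000"],
        ["Pete Laney", "Hale Center", "$2,000"]]

TAB_TEXT = '"George Bush"\t"Dallas"\t"$5,000"\n"Pete Laney"\t"Hale Center"\t"$2,000"\n'
CSV_TEXT = '"George Bush","Dallas","$5,000"\n"Pete Laney","Hale Center","$2,000"\n'
# Space-aligned sample: name and city padded to 14 characters each.
FIXED_TEXT = "".join(n.ljust(14) + c.ljust(14) + a + "\n" for n, c, a in ROWS)


def parse_delimited(text, delimiter):
    """Split tab- or comma-delimited text into rows; quotes are stripped."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def parse_fixed_width(text, widths):
    """Cut each line at fixed column widths and trim the padding spaces."""
    rows = []
    for line in text.splitlines():
        row, start = [], 0
        for width in widths:
            row.append(line[start:start + width].strip())
            start += width
        rows.append(row)
    return rows


print(parse_delimited(TAB_TEXT, "\t"))
print(parse_delimited(CSV_TEXT, ","))
print(parse_fixed_width(FIXED_TEXT, (14, 14, 6)))
```

Note that the quotes matter in the comma-separated case: they keep the comma inside "$5,000" from being read as a column break.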

There's not a lot of consistency in applying extensions to text files. You'll frequently find .TXT on space-delimited files, and .ASC (for ASCII) on tab- or comma-delimited files. If you're not sure of the format, use trial and error in importing the file into Excel. Or open the file in a word processor and examine it. When you do this, be sure to use a font like Courier, in which each character is the same width - so the columns will line up. (In other words, use a monospaced font. If you use a proportional font, in which an M is much wider than an I, the columns in a space-delimited file won't align.) You also should tell your word processor to show invisible characters, like tabs, so you'll be able to see if the file is tab-delimited. When examining a data file with a word processor, be careful not to save the file as binary, like in the Word format. Save it as text - or just close it without saving it.
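The trial-and-error approach can also be automated. The sketch below tries a few candidate delimiters and keeps the first one that splits every row into the same number of columns. The function name `guess_delimiter` and the candidate list are illustrative assumptions, and the test is deliberately simple.

```python
import csv
import io


def guess_delimiter(text, candidates=("\t", ",", ";")):
    """Return the first candidate that gives every row the same column
    count (more than one column), or None if no candidate fits."""
    for delimiter in candidates:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter)
        rows = [row for row in reader if row]  # skip blank lines
        column_counts = {len(row) for row in rows}
        if rows and len(column_counts) == 1 and column_counts.pop() > 1:
            return delimiter
    return None


tab_sample = 'Name\tCity\tAmount\n"George Bush"\t"Dallas"\t"$5,000"\n'
comma_sample = 'Name,City,Amount\n"Pete Laney","Hale Center","$2,000"\n'

print(repr(guess_delimiter(tab_sample)))
print(repr(guess_delimiter(comma_sample)))
```

One caution: a space-aligned file whose dollar amounts contain unquoted commas can fool a check like this, so it's still worth eyeballing the file in a monospaced font.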

From time to time, you'll need a robust word processor to massage text files of data. For instance, you may get a file in which someone has used a semicolon as the delimiter. It could look like this:

"George Bush" ; "Dallas" ; "$5,000"
"Bob Bullock" ; "Austin" ; "$7,000"
"Pete Laney" ; "Hale Center" ; "$2,000"

In such cases, you may want to replace the semicolon with a tab, so that your spreadsheet or database program will recognize the file format and let you import it.
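A search-and-replace in a word processor does the job, but a short script can do the same swap while leaving any semicolon inside quotes alone. This is a minimal sketch using Python's csv module; the function name `semicolons_to_tabs` is made up, and it assumes the stray spaces around each semicolon are not part of the data.

```python
import csv
import io


def semicolons_to_tabs(text):
    """Rewrite semicolon-delimited text as tab-delimited text."""
    # skipinitialspace drops the space that follows each semicolon.
    reader = csv.reader(io.StringIO(text), delimiter=";", skipinitialspace=True)
    output = io.StringIO()
    writer = csv.writer(output, delimiter="\t", lineterminator="\n")
    for row in reader:
        # strip() removes the space left in front of each semicolon.
        writer.writerow([field.strip() for field in row])
    return output.getvalue()


sample = ('"George Bush" ; "Dallas"; "$5,000"\n'
          '"Bob Bullock" ; "Austin"; "$7,000"\n')
print(semicolons_to_tabs(sample))
```

The output drops the quotation marks, since a tab-delimited file only needs them when a value itself contains a tab or a quote.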

Besides knowing the extensions for data files, you should learn other kinds of extensions. For instance, if you get a lot of information on disks or download a lot from the Internet, you'll probably see:

• .ZIP - A "zipped" file. This means it contains compressed information: All the air, so to speak, has been sucked out of the original file, so it's only a fraction of its full size. This has been done using a PC program called PKZIP. To inflate a .ZIP file, you usually use a PC program called - ta-da - PKUNZIP. But there are also programs that let you inflate a .ZIP file on a Mac.
• .ARC - Another kind of compressed file from a PC.
• .EXE - This can indicate a self-extracting compressed file from a PC, meaning PKUNZIP is inside the file. In Windows, all you have to do is double-click on an .EXE file and it will inflate.
• .SIT, .SEA - Files that have been compressed on a Mac with the program StuffIt. If you have an .SIT file, you'll need StuffIt to inflate it. An .SEA file stands for "self-extracting archive": It will unstuff itself.
• .GZ - A compressed file from a Unix system. To inflate it, you need a program called GUNZIP.
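The .ZIP and .GZ formats can also be inflated from a script, without PKUNZIP or GUNZIP, using Python's standard zipfile and gzip modules. The sketch below builds a small archive in memory just so it has something to open; the member name "donors.txt" and the helper names are invented for the example.

```python
import gzip
import io
import zipfile


def unzip_members(zip_bytes):
    """Return {member name: inflated bytes} for a .ZIP archive in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def gunzip(gz_bytes):
    """Inflate the contents of a .GZ file held in memory."""
    return gzip.decompress(gz_bytes)


# Build a small zipped file so the example is self-contained.
original = b"Name\tCity\tAmount\n" * 200
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("donors.txt", original)
zipped = buffer.getvalue()

print(len(original), "bytes before;", len(zipped), "bytes zipped")
print(unzip_members(zipped)["donors.txt"] == original)
```

The other formats in the list (.ARC, .SIT, .SEA) have no counterpart in Python's standard library, so this sketch does not cover them.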

If you spend much time on the Net, you'll encounter several other extensions:

• .HTM, .HTML - Web pages coded in hypertext markup language. These are ASCII files, but they have codes you can't see in Netscape. These codes tell Netscape the color of the page, the size of the text and how to link certain phrases or images to other information.
• .GIF, .JPG, .JPEG - The format for most images you'll find on the Web.
• .BMP, .BMF, .PIC, .PICT, .TIFF - Other image formats.
• .AU, .WAV - Audio files.
• .MPG, .MPEG, .AVI, .MOV - Video files.
• .PDF - The format for Adobe Acrobat. A .PDF document can contain text, graphics and photos, laid out page by page. You need Acrobat, which is free, to open such a file.
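To see the hidden codes in an .HTML file for yourself, you can read the file as plain text and pull out the tags. This sketch uses Python's built-in HTML parser to list the links on a page; the class name `LinkCollector` and the tiny sample page are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> link in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A made-up page: the browser shows only the words "File formats".
page = ('<html><body bgcolor="white">'
        '<a href="formats.html">File formats</a>'
        '</body></html>')

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```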
