Born Broken: Fonts And Information Loss In Legacy Documents
Geoffrey Brown and Kam Woods
Indiana University School of Informatics and Computing

Key Questions
How pervasive are font substitution problems?
What information is available to identify fonts?
How well can we match the fonts required by a document collection?
How can we assist archivists in identifying serious font issues?

Font Substitution
[Sample document: MCTM Bulletin, February 2005, page 8, "TI Corner" program listing for the TI-83, shown in two renderings labeled "Default Substitution" and "Correctly Rendered".]
[Sample document: United Nations General Assembly, A/57/270/Corr.1, 27 September 2002, "Implementation of the United Nations Millennium Declaration: Report of the Secretary-General", Corrigendum.]

Born Broken: Fonts and Information Loss in Legacy Digital Documents

Geoffrey Brown and Kam Woods
Department of Computer Science, Indiana University
150 S. Woodlawn Ave.
Bloomington, IN 47405-7104

Abstract

For millions of legacy documents, correct rendering depends upon resources such as fonts that are not generally embedded within the document structure. Yet there is significant risk of information loss due to missing or incorrectly substituted fonts.

In this paper we use a collection of 225,000 Word documents to assess the difficulty of matching font requirements with a database of fonts. We describe the identifying information contained in common font formats, font requirements stored in Word documents, the API provided by Windows to support font requests by applications, the documented substitution algorithms used by Windows when requested fonts are not available, and the ways in which support software might be used to control font substitution in a preservation environment.

Overview

Q. I need 'font x'. ... Can you tell me where I can get it?
A. This is very unlikely, as there are over 100,000 digital fonts in existence.

Doubtless, many readers have witnessed PowerPoint presentations where the slides were clearly missing glyphs (visible characters) or were otherwise poorly rendered. In most cases, this unhappy event is the direct result of copying the presentation from the machine upon which it was created to a machine provided for the presentation without ensuring that the target machine has the required fonts. The reason is not always clear to the presenter because Microsoft Office performs font substitution without warning.

Annoying font substitutions occur frequently. For example, symbols such as apostrophes and quotations are rendered with the "WP TypographicSymbol" font in WordPerfect Office 11. When these documents are migrated to Microsoft Word, this font dependence is pre-

... for mathematical symbols may result in total information loss. For example, our experimental data include 9 documents with program listings for the Texas Instruments TI-83 series calculators rendered with the Ti83Pluspc font, which provides various mathematical symbols; these program listings are incomprehensible when rendered without this obscure font. This is illustrated in Figure 1. Compounding the problem, Texas Instruments has published a variety of calculator fonts with different internal names and possibly incompatible glyphs.

[Figure 1: TI83plus font samples. Panels: "TI83plus Font" and "Regular", each showing the sample text ABCDEFGHIJKLM NOPQRSTUVWXYZ.]

In the process of preparing this paper we found several documents with barcodes rendered using the font "Barcode 3 of 9 by request". When rendered with font substitutions, the bar codes of the numbers are replaced by Arabic numerals. Thus, the substitution preserved the numeric meaning, but not the functional ability to be scanned! Unfortunately, there are many barcode fonts and we were unable to find the required font; however, we were able to find a suitable substitute and to configure Office to accept our substitute as illustrated in Figure 2.
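As a rough illustration of what matching font requirements against a font database can look like, the Python sketch below counts how many documents request each font that has no match. It is not the authors' method: the exact, case-insensitive comparison of font names is an assumption made here for simplicity, and the file names, the `required` mapping, and the `available` list are hypothetical. Only the three problem font names come from the text.

```python
from collections import Counter


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so 'Ti83Pluspc ' matches 'ti83pluspc'."""
    return " ".join(name.split()).casefold()


def missing_fonts(required_by_doc, available):
    """Count, for each unmatched font name, how many documents require it.

    required_by_doc: dict mapping a document name to the font names it requests.
    available: iterable of font names present in the font database.
    Matching is exact on the normalized name (an assumption of this sketch).
    """
    known = {normalize(n) for n in available}
    missing = Counter()
    for fonts in required_by_doc.values():
        # A set ensures each document is counted once per font.
        for font in {normalize(f) for f in fonts}:
            if font not in known:
                missing[font] += 1
    return missing


# Hypothetical document collection and font database.
required = {
    "listing.doc": ["Ti83Pluspc", "Times New Roman"],
    "corrigendum.doc": ["Barcode 3 of 9 by request", "Times New Roman"],
    "letter.doc": ["WP TypographicSymbol", "times new roman"],
}
available = ["Times New Roman", "Arial", "Courier New"]

report = missing_fonts(required, available)
for font, count in report.most_common():
    print(f"{font}: required by {count} document(s), no match in font database")
```

A report of this kind only says that a substitution will happen. Whether the substitution is harmless or destroys meaning, as with the calculator and barcode fonts, still has to be judged per font.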