A reprint from American Scientist the magazine of Sigma Xi, The Scientific Research Society

This reprint is provided for personal and noncommercial use. For any other use, please send a request Brian Hayes by electronic mail to [email protected]. Computing Science

Writing Math on the

Brian Hayes

he was in- because characters could be placed Tvented at a physics laboratory, The Web would make with equal facility at any position and and the first users were scientists and scaled to any size. But the software for engineers. You might think, therefore, running the phototypesetting machines that this new channel of communica- a dandy blackboard offered no easy way to exploit this flex- tion would be especially well adapted ibility. The printed product was often to scientific discourse—that it would if only we could inferior to expertly set metal type. facilitate the expression of ideas like scribble an equation Enter Donald E. Knuth of Stanford ∂B University, who was so discontented E = with the deteriorating quality of math- ∇ × − ∂t ematical that he set aside or signments to blogs. Ideally, the Web other work and undertook to build his ∞ 1 1 ζ(s) = = . would serve as an extension of the own typesetting system. The eventual ns 1 1 blackboard where people gather to result was TeX, a formatting language n=1 p prime ps  − talk about science and math during not just for equations but for entire If only it were so! The truth is, the ba- coffee breaks. We need chalk for that documents. Leslie Lamport, now of sic protocols of the Web offer almost blackboard. Microsoft Research, soon introduced no support for rendering mathemat- The problem is not one of simple LaTeX, a higher-level language built ics or other specialized notations such neglect. Over the years there have atop Knuth’s TeX. (The names are pro- as chemical formulas. Presenting such been many earnest efforts to build a nounced tek and lah-tek.) material on a Web page often requires reliable facility for writing and reading The first equation in the first para- software add-ons or plug-ins to be in- mathematics online. The trouble is, no graph on this page is encoded in LaTeX stalled by the author or the reader or one solution has yet gained the kind as follows: both. Fine-tuning the display of math- of widespread adoption that would \nabla \times \mathbf{E}= ematics can be a fussy and finicky pro- make it a standard, supported in main- -\frac{\partial \mathbf{B}} cess, not much easier than formatting stream Web servers and browsers. Still, {\partial{t}}. equations with a typewriter. The re- there’s room for hope. We have tech- sults sometimes render differently—or nologies that work, if we can agree on Each of the terms beginning with a not at all—in various Web browsers. how to use them. And a minor change backslash is a LaTeX command. For This is a sad situation: As the Web has to the infrastructure of the Web might example, \frac{}{} constructs a frac- evolved into a thriving marketplace smooth the way for more online math. tion, with whatever appears inside and playground, the scholarly and the two pairs of brackets forming scientific community that created the Penalty Copy the numerator and the denomina- technology has not been well served. Even in the world of ink-on-paper pub- . Some commands simply insert a The confused state of online math- lishing, mathematical notation can be a single character, as with \times ( ), × ematical typography is worrisome as challenge. In the days when printing \partial (∂) and \nabla ( ). The well as sad. In years to come the Web was done with metal type, equations \mathbf{} command applies ∇a bold- will surely be the most important con- had to be assembled by hand, piecing face to the bracketed text. duit for scientific information. Already together symbols on individual sliv- Let me call attention to what is not it is a major channel for distributing ers of metal and shimming them into present in the TeX encoding. There are publications and preprints in many position. A manuscript with a lot of no explicit instructions for arranging disciplines, and it is becoming a venue mathematics was known as “penalty the symbols on the page; all of the for less-formal jottings and conversa- copy” because printers would charge geometric details, such as centering tions—everything from homework as- extra to compose it. the numerator over the denominator By the 1970s, metal type was giv- and drawing a horizontal bar between Brian Hayes is senior writer for American Sci- ing way to optical and electronic type- them, are handled automatically. entist. Additional material related to the “Com- setting devices, which projected let- Also missing is any hint of what the puting Science” column appears in Hayes’s blog terforms onto photographic film. In equation means; TeX is a language for at bit-player.org. Address: 211 Dacian Avenue, principle the new machinery might describing mathematical notation, not Durham, NC 27701. Internet: brian@bit-player. have aided mathematical typesetting for doing mathematics. For example, © 2009 Brian Hayes. Reproduction with permission only. 98 American Scientist, Volume 97 Contact [email protected]. “ E” is merely a sequence of three symbols,∇× with no reference to the op- eration those symbols denote—namely the “curl” of a vector field, a measure of the field’s rotation or angular mo- mentum. (The first equation is one of James Clerk Maxwell’s four famous equations describing the electromag- netic field. The second example gives two definitions of the Riemann zeta function, revealing a remarkable iden- tity between an infinite sum and an infinite product.) TeX has transformed the process of putting mathematical ideas on paper. What once required the services of a skilled typesetter can now be done by a mere mathematician. A manuscript can go from the author to a printed journal without human intervention. But what about the Web, which did not yet exist when Knuth developed TeX? Modern TeX systems produce output in the form of Postscript or PDF files. Although documents in these for- mats can be distributed over the Web— the arXiv preprint server dispatches thousands of them every day—they are not quite first-class citizens of the Web world. Browsers cannot display Postscript or PDF files without the aid Mathematics remains a marginal participant on the World Wide Web, where finely typeset of extra software, and many readers equations are difficult to produce and their appearance blends poorly with other textual ele- choose to download such files and ments. In this excerpt from a page of Wikipedia, the collaborative encyclopedia, most math- view them offline or print them. This ematical notation is rendered by means of embedded images, which differ in size, style and alignment from the rest of the text. In order to show how the page was assembled, the works well enough for static docu- stylesheet governing the display was altered to give all images a green border and a contrast- ments such as journal articles, but it’s ing background. A few bits of mathematical notation, marked in bright red, are not images not ideal for the more volatile and but ordinary characters; yet even they do not match the size of the main text. The highlighted interactive areas of the Web, such as panel associated with the first equation shows the TeX code that produced the equation. blogs or the always-under-revision of Wikipedia. For mathematics to be fully at home online, it needs to ming language, so that Web pages be- of , which have room be translated into one of the Web’s na- come not just static documents but in- for a larger collection of glyphs than tive languages. teractive programs. Still, none of these earlier font formats. But it’s still not to features directly address the needs of be taken for granted that every reader Marking It Up scientific and mathematical writing. will have the necessary fonts installed. From the outset, the primary language There are no HTML tags for integrals, The second problem is one of layout. of the Web has been HTML (Hypertext say, or for matrices. Mathematical notation is two-dimen- ), in which “tags” Two main impediments stand in sional; in order to represent a matrix, identify various parts of a document’s the way of presenting mathematics on say, or a summation with upper and structure, such as headings, paragraphs the Web. First is the alphabet prob- lower bounds, it’s necessary to specify and lists. In its earliest versions, HTML lem: Mathematicians have created a the exact x and y coordinates of sym- was quite simple; it couldn’t even do sprawling zoo of novel symbols and bols. Some elements of mathematical subscripts and superscripts, so there embellished or transformed versions notation, such as brackets and the radi- wasn’t much hope of displaying elabo- of familiar characters. The nabla ( ) cal that designates a square root, vary rate mathematical notation. that appears in Maxwell’s equations∇ is in shape and size as well as position. Several revisions later, HTML does a notable example, and it is joined by Encoding such geometric information have subscript and superscript tags. hundreds of other unusual glyphs—∂, in HTML and CSS is not impossible, but And a supplementary language called R, , , , —not to mention the entire it stretches the technology to its limit. CSS (Cascading Style Sheets) allows Greek∀ ∃ alphabet⊥ ∫ and occasional bor- Early in the history of the Web, a finer control over many aspects of the rowings from Hebrew and other lan- group of mathematicians and other appearance of a Web page. Modern guages. The difficulty of reproducing interested parties gathered to address browsers are also equipped with an these characters has been alleviated this issue in a systematic way. The re- interpreter for JavaScript, a program- to some extent by the recent adoption sult was a new markup language called © 2009 Brian Hayes. Reproduction with permission only. www.americanscientist.org Contact [email protected]. 2009 March–April 99 MathML, which was endorsed in 1998 Instead it relies on a variety of inge- symbol becomes a separate image, and by the World Wide Web Consortium. nious but ad hoc workarounds. Most multiple images have to be assembled I’ll return below to the present status often there is a TeX system somewhere and carefully positioned to represent and future prospects of MathML, but it in the background. an equation. In other cases an entire will suffice for now to note that most of One common strategy is to convert equation is encapsulated in a single the mathematical notation to be found a mathematical expression to an im- image. The practice takes us back to on the Web is not encoded in MathML. age, or “bitmap.” In some cases each the pre-alphabetic tradition of picto- graphic writing. Pictograms have one big advantage: b √b2 4ac Any symbol, no matter how arcane, x = − ± − can be displayed in any browser. But 2a there are also drawbacks. Symbols in the images may not blend well with TeX encoding on the page, and it’s hard to x=\frac{-b\pm\sqrt{b^2-4ac}}{2a} control size, spacing and alignment. A math-intensive document could have MathML presentation encoding MathML content encoding hundreds of small images, which are slow to load and display. Images can-  x  not be copied and pasted in the same  =  way that text can, and the equations    x cannot easily be edited or revised.     But relying on fonts to supply    -   mathematical symbols also has haz-    b    ards. The author of a Web page can-    €    not know what fonts are available on                the reader’s computer, or which of the           b available fonts will be selected for any       b     given symbol. Thus a page that looks       2     fine to the author may display very          differently for some readers, or could      -      be completely indecipherable.      4           a       From Author to Server to Reader      c       For mathematical prose written in            b TeX, one Web-publishing strategy           2         is to translate the entire document         in advance, creating an HTML ver-    2       sion that can then be displayed in a    a        browser without further special han-          dling. Two programs for this purpose          4 are LaTeX2HTML (written in 1995         a by Nikos Drakos, now of Gartner        Research) and TeX4HT (by Eitan M.        c Gurari of Ohio State University). In       essence, the translation programs re-          place the “back end” of a TeX system    with software that generates HTML    and bitmap images rather than output    intended for printing.     2 A translator of this kind runs on the     a author’s own computer; the HTML    files and accompanying images are   then uploaded to a Web server for dis-  tribution. This work is suited to long-lived and seldom-modified docu- ments written in TeX. The arrangement Markup languages provide a one-dimensional description of two-dimensional mathemati- is less convenient with frequently up- cal notation; the markup also encodes troublesome symbols (such as the square-root sign, or dated content or Web sites maintained radical) in an ordinary alphabet. Here the quadratic formula (top) is represented first in the TeX markup language and then in two versions of MathML, a language devised explicitly for by multiple authors in collaboration. displaying mathematics on the Web. The presentation layer of MathML focuses on the nota- It’s also not ideal for documents that tion itself—on symbols and their arrangement—whereas the content layer attempts to capture are written mostly in HTML rather meaning. Although the content description is quite prolix, it cannot quite represent the full than TeX, with just occasional snippets semantics of the quadratic formula, because content MathML has no “±“ operator. of mathematics. © 2009 Brian Hayes. Reproduction with permission only. 100 American Scientist, Volume 97 Contact [email protected]. A second strategy is to perform the author server reader translation not on the author’s com- iπ puter but on the Web server. Two pro- \Tex e 1 = 0 − grams created by John Forkosh adopt this modus operandi. MathTeX relies on a separate TeX system to do the actual + + + rendering of mathematical notation; MimeTeX is self-contained, with its own TeX interpreter. Both programs \Tex \Tex eiπ 1 = 0 generate bitmap images of entire equa- − tions. This scheme is not meant for processing complete TeX documents; + + it works with HTML documents that have interspersed TeX expressions. Processing mathematics on the server eiπ 1 = 0 works particularly well for collabora- − tive Web sites, since the translation + software has to be installed on only one machine. Wikipedia and many \Tex \Tex \Tex similar sites rely on this approach. (The math processor for Wikipedia is a Translation from one markup language into another and finally into a formatted equation program called texvc, which generates takes place in stages that can be performed at various points along the pathway from author to images for complex expressions but reader. Typically, an input language such as TeX is converted into some combination of HTML outputs HTML for simple ones.) (the main language of the Web) and images. The bulk of this work can happen on the author’s The drawbacks of server-side soft- computer (top), on the Web server (middle) or on the reader’s computer (bottom). ware are those that afflict all image- based solutions—clumsy typography ematics is liberated from all the con- Computer Modern faces introduced by and a profusion of tiny image files. But straints of HTML. The plug-in can Knuth; those fonts are freely available, if the result falls short of elegance, at supply its own fonts and can place but jsMath can access them only if the least it works reliably for most readers, characters with as much precision reader downloads and installs them. A no matter what browser they choose. as the hardware will allow. The big fallback strategy is to assemble equa- ­disadvantage is that none of this magic tions from individual character images. Let the Browser Do It works until the end user downloads The full set of images—some 78 mega- As we have just seen, translation soft- and installs the plug-in. This barrier bytes worth—is stored on the server; ware can run on the author’s comput- to entry tends to discourage casual only the subset needed is downloaded er or on a server. There is one more visitors to a Web site. It also creates a with any particular document. The possibility: performing the translation threshold effect: Authors hesitate to third option is to build equations from on the reader’s computer, or what is rely on the technology until enough Unicode fonts. The reader selects one of known as “the client side.” Under this readers adopt it, and vice versa. the three rendering methods through a plan, TeX or some other encoding of An interesting alternative to a plug-in pop-up control panel. mathematical content is written into might be called a slip-in. The idea is to Squeezing a TeX interpreter into a the Web document and passed along bundle up the translator program and Web page is an impressive feat, but unchanged by the server, to be inter- include it as part of the Web page itself. it adds considerable bulk and com- preted by the browser. There’s no need for the user to install plexity. Documents with many equa- Of course the problem with sending any software; the translation program tions take a while to finish rendering. TeX to the browser is that the browser runs automatically when the page is Cervone is now launching a follow- has no idea what to do with it. That loaded into a browser. A program of on project called MathJAX, supported issue can be addressed by means of this kind called jsMath takes advantage by several publishers of mathematical a plug-in—a software component in- of the JavaScript programming lan- software. The aim is to make the sys- stalled within the browser. guage built into Web browsers. tem more flexible and responsive. The best known mathematical plug- JsMath is the creation of Davide P. in is techexplorer, a program initially Cervone of Union College in Schenect- Whatever Happened to MathML? developed by Robert S. Sutor at IBM ady, N.Y. Cervone undertook the proj- Typesetting mathematics with HTML and now maintained and distributed ect mainly to meet his own needs: He and bitmap images is rather like turn- by Integre Technical Publishing. Tech­ wanted to distribute class notes and ing a bicycle into a sailboat. You can’t explorer can display complete LaTeX homework assignments via the Web, help admiring the audacity of the at- documents or it can render just the and none of the available solutions tempt, but the result still doesn’t seem mathematical expressions within an were entirely satisfactory. So he wrote like the best vehicle for the purpose. HTML document. In either case the what amounts to a TeX interpreter in a Why not choose MathML, which was display of equations is based on fonts JavaScript program. designed explicitly for this task? rather than images. JsMath offers three styles of equation MathML is a variety of XML (the The big advantage of a software rendering. The most graceful display eXtensible Markup Language). Com- plug-in is that the rendering of math- requires a set of six fonts based on the pared with TeX, it is a more formal © 2009 Brian Hayes. Reproduction with permission only. www.americanscientist.org Contact [email protected]. 2009 March–April 101 language, and it is also far more ver- One reason is lack of support in brows- was solved in the case of PDF files, bose. Consider the simple expression ers. In the early years, the only way which do embed fonts. Even if propri- x+1, which might be encoded in TeX to read MathML Web documents was etary typefaces were off limits, there as $x+1$. (The dollar signs mark the with plug-in software. More recently are enough freely available fonts—in- content as mathematics rather than a few browsers—notably those of the cluding all those commonly used with ordinary text.) In MathML the same family, such as —have TeX—to make the prospect attractive. expression takes this form: gained native support for MathML. For a change of this kind to have But there is still confusion over how any impact, all the browser makers MathML content should be embed- would have to adopt it. Those are the x ded in an HTML document. Moreover, same people who have so far resisted + MathML has not solved the fonts prob- implementing MathML. Why would 1 lem; readers are still responsible for they treat the font proposal any dif- . installing appropriate fonts. ferently? I think there is reason for op- Each symbol is tagged to indicate its Another factor inhibiting the spread timism on this score, not because the role— for an identifier, for an of MathML is simply that TeX is deep- mathematical community has much operator, for a number—and the ly entrenched, particularly in physics, clout but because embedded fonts expression as a whole is wrapped in an mathematics and computer science. If would be of value to other constituen- tag to show that it belongs on a you live in a TeX-centric universe—I cies. Advertisers, in particular, would single line. Capturing this information have a friend who even writes love let- be pleased to gain greater control over is potentially useful, since identifiers, ters in TeX—it’s hard to see any benefit . operators and numbers are accorded of a new and very different language. Meanwhile, as I finish preparing this different typographic treatment. (TeX column for the press, I also face the task has to infer the role of each symbol, Embedded Assets of helping to get my own penalty copy and occasionally gets it wrong.) How will it all turn out? Will Web sites ready for publication on the American There’s more. The style of markup of the future be chock full of MathML, Scientist Web site. Those irksome equa- shown above is only half of MathML. or will TeX and HTML continue to pre- tions and curious characters I’ve been It’s called the presentation language; vail? Or will something else altogether writing about will somehow have to there is also a content language, which come along? be made Web-friendly. I don’t know attempts to express meaning rather I have no answers for these ques- exactly how we’re going to do that, but than layout. The expression x+1 would tions, but I want to suggest an adjust- I suspect some bicycles are going to be have this content markup: ment in the way the Web works—a outfitted with spinnakers and jibs. small change that could improve any strategy for displaying mathematical Bibliography notation. It has to do with where fonts Beeton, Barbara, Asmus Freytag and Mur- x come from. ray Sargent III. 2008. Unicode support for 1 Under the present rules, a Web au- mathematics. Unicode Technical Report No. . 25. http://www.unicode.org/reports/tr25 thor can request a particular font, and Bos, Bert. 2008. For and against standardiz- Here does not refer to the the reader’s browser will honor the re- ing font embedding. http://www.w3.org/ symbol + but to the mathematical op- quest if the font is available on the client Fonts/Misc/eot-report-2008 eration of addition. In this structure we machine. If not, some default font is Cervone, Davide P. Web site. jsMath: A method of get a glimpse of a grand vision—Web substituted. Wouldn’t it be more helpful including mathematics in Web pages. http:// pages with active mathematical con- if the author could supply the missing www.math.union.edu/~dpvc/jsMath tent, where the menu of things you font, either by embedding it directly in Goossens, Michel, and Sebastian Rahtz with might do with an expression includes the page or by referring the browser to Eitan Gurari, Ross Moore and Robert Sutor. 1999. The LaTeX Web Companion: Integrating not just copying, pasting and printing a site where the font is available? Given TeX, HTML, and XML. Reading, Mass.: Ad- but also solving an equation, graphing such a mechanism, any font-based sys- dison-Wesley Longman. a function and factoring a polynomial. tem for presenting mathematics could Jipsen, Peter. Web site. Translating ASCII math no- MathML has an enthusiastic com- ensure that all the needed symbols are tation to Presentation MathML. http://www1. munity of developers and users. There ready at hand. chapman.edu/~jipsen/asciimath.html is commercial software for writing and This is not a new idea. A proposal Knuth, Donald E. 1984. The TEXbook. Illustra- editing MathML documents (notably for “Webfonts” was included in a draft tions by Duane Bibby. Reading, Mass.: Ad- dison-Wesley Pub. Co. from Design Science and Integre Tech- CSS standard in 1998, and the idea was Miner, Robert, and Jeff Schaefer. 1998. A gen- nical Publishing) as well as a noncom- even implemented in a few browsers, tle introduction to MathML. http://www. mercial translation program called including Microsoft’s Internet Explor- dessci.com/en/support/mathtype/tu- ASCIIMathML, created by Peter Jipsen er. But the proposal never caught on, torials//default.htm of Chapman University in Orange, and it was removed from later ver- Miner, Robert. 2005. The importance of Calif. Several large scholarly publish- sions of the standard. Recently, Håkon MathML to mathematics communication. ers, including the American Institute Wium Lie of Software has called Notices of the AMS 52:532–538. of Physics, have based their operations for renewed consideration of the idea. Soiffer, Neil. 1997. MathML: A proposal for representing mathematics in HTML. on XML and MathML; so has the U.S. Much of the discussion centers on SIGSAM Bulletin 31(3):44. Patent Office. legal questions of interest to the own- Wright, Francis J. 2000. Interactive mathemat- On the Web, however, MathML has ers of copyrights. This doesn’t ics via the Web using MathML. SIGSAM not exactly swept away the competition. seem like an insuperable problem. It Bulletin 34(2):49–57. © 2009 Brian Hayes. Reproduction with permission only. 102 American Scientist, Volume 97 Contact [email protected].