TUGboat, Volume 34 (2013), No. 2 223

Representing linguistic pitch in (X )LE ATEX not be segmented — an alternative will be presented in Section 4. Kevin Donnelly Charis SIL [10] is selected as the main font. This is a Abstract normal text font, like Times, but includes a vast range of glyphs to meet a range of phonetic and orthographic re- Linguists, especially those working with languages, quirements. (This article is typeset in Charis SIL instead of may need to depict pitch levels in the language examples the normal TUGboat font in order to enable easier display they use. This article looks at some options in LAT X and E of text being discussed.) X T X for representing pitch and tone. The emphasis is E E \defaultfontfeatures{Mapping=tex-text, on Africanist linguistics, since that is the area with which Scale=MatchLowercase} I am most familiar. \setmainfont{Charis SIL}

1 Introduction The \defaultfontfeatures command must precede the actual font selection, and tells fontspec to use com- For extended linguistic work X TE EX is preferable, since it mon T X aliases (e.g. -- will produce – , and | will pro- uses UTF-8 natively, and allows fonts to be chosen that E duce | instead of —), and to scale all fonts to match the will represent all aspects of the language’s orthography main font’s lower-case size. This ensures that any ad- (whether that is already standardised, or is one of the ditional fonts selected (see next paragraph) will better things that you are working on). However, it is still pos- match the weight of the main font. sible to represent pitch and tone effectively even if you fontspec also allows fonts to be set up for particu- need to use LATEX, though a different set of tools is re- lar purposes. For instance, if we are annotating the syn- quired. Notes for both TEX flavours are given here, and tactic relationships of particular words, and wish to use a sample documents for each are attached — pitch_latex.tex separate font for that, in the preamble we could add and pitch_xetex.tex. Everything has been tested on Ubuntu \newfontfamily{\syntaxfont}{Liberation Sans} 12.04 running TEX Live 2012 (note that older TEX Live ver- \newcommand \syn[1]{{\syntaxfont #1}} sions may not produce the same output). Here, the first line specifies Liberation Sans as thefont 2 Setting up the document for syntax annotations in example glosses, and the second Both sample documents assume that you are writing a line sets up a new command so that book, and additional packages or code to enable pitch \syn{annotation} and tone to be represented are then added. can be used instead of the more cumbersome For LAT X, the package tipa [8] is used, with addi- E {\syntaxfont annotation} tional code to allow pitch marks to be printed without sidebars (this is common for East Asian languages, but For both files, the package expex [2] is optional, not for African languages): but recommended because it offers good typesetting of \makeatletter linguistic examples, and allows word-by-word glossing. \renewcommand\@tonestembar{% \setbox0\hbox{\tipaencoding\char'277}% 3 Tone-marking individual words \hbox{\vrule height \ht0 depth \dp0 The first priority in representing a tone-language isthe width 0pt}} availability of sufficient glyphs (mostly ) for the \makeatother tone-marking. It is worth noting that the readability of Times is set as the default font: diacritics (or even whether they are displayed at all) de- pends crucially on the font — not all are capable of show- \usepackage{mathptmx} ing all diacritics, or placing them in the right location. For X TE EX, the package fontspec [9] is used. Most Diacritics commonly used in representing tone in- (but apparently not all) of the features in tipa are now clude: á à â ǎ ā a̋ ȁ a̍ ꜛa ꜜa ꜝa ꜞa. This list is not exhaus- part of xunicode, so there is no need to activate tipa tive — many other diacritics are available. (doing so produces warnings such as: Command \sups al- Many of these diacritics may already be available ready defined). Unfortunately, without tipa, commands via your keyboard (for instance, on the default UK key- like \textupstep in X TE EX do not work properly (they board in Ubuntu, á is produced using AltGr+; then a; à appear after the letter they refer to rather than before by AltGr+# then a, and â by AltGr+’ then a). If dia- it), so some workaround code is needed — see Section 3. critics are to be used frequently, it is worth setting up a Moreover, the LATEX code above to print pitch marks with- keyboard layout to allow this. out sidebars will not work (thanks to Alan Munn for point- In LATEX, commands can be used to access diacrit- ing this out), since the equivalent glyphs in Unicode can- ics — the Comprehensive LATEX Symbol List [6] and Appen-

Representing linguistic pitch in (X )LE ATEX 224 TUGboat, Volume 34 (2013), No. 2 dix A.3 of the TIPA Manual [8] are valuable here. Most of clusions as to how these pitches should be represented in the other diacritics above can be accessed as follows: the tone-marking. ǎ \v{a} The \tone command can be used for this in LATEX: ā \={a} ibuuna [~\tone{11}\, \tone{5555}\, \tone{11}~] \H{a} a̋ Here, the individual pitches are represented by the num- \H*{a} ȁ ber of the tone letter (see A.2.1 of the TIPA Manual [8]) — \textvbaraccent{a}} a̍ five levels are available. Longer pitches can be depicted \textupstep{a} ꜛa by repeating the tone numbers, as in \tone{5555}. Glides \textdownstep{a} ꜛa can be represented by sequences of numbers, and some The exceptions are the superscript exclamation marks ꜝ variation is possible here; as shown in the accompany- and ꜞ (used in African-language linguistics to represent ing file pitch_latex.tex, a rising pitch on the last of and respectively). a variant pre-pausal form of ibuuna can be represented X TE EX offers more scope for defining diacritics, since using any of \tone{13}, \tone{113}, \tone{1133} and UTF-8 allows diacritics to be combined (rather than hav- \tone{133}. ing to depend on precomposed glyphs). This means that The most reliable way of handling spacing between writing a\char"030D will produce a̍, because the num- the pitchmarks is to use \hspace{⟨dimen⟩}, giving mea- ber for the UTF-8 combining for the vertical dia- surements in em, mm or pt. For ease of use, this can be critic is 030D — the relevant numbers for each glyph can set up in a command such as: be found by using a utility like KCharSelect. [3] \newcommand \gap[1]{\hspace{#1mm}} However, since it is difficult to remember these char- so that a gap of 2mm can then be inserted by using acter numbers, and they are tedious to type, it is best to \gap{2}, and so on. set up commands to produce them. Setting up the follow- In X TE EX, Charis SIL provides five tone letters similar ing commands in the preamble: to those in TIPA, in each of four variants: line or dot with \newcommand \aup[1]{\char"F19E#1} tonestembar on the right or the left (see section 1.4 of \newcommand \adown[1]{\char"F19F#1} Marking Tone [7]). Issuing the command will allow us to get the Africanist up/downstep characters \fontspec[Renderer=Graphite, RawFeature= by using \aup{a} and \adown{a}: ꜝa, ꜞa. {Special=Hide tone staves}] {Charis SIL} In those cases where the LATEX tipa commands do not produce the desired results in X TE EX, you can override after \begin{document} will remove the tonestembars on them — this may also be useful if you have an older text glides, but not on level pitches, so these glyphs are inap- which you are converting, where you wish to minimise propriate for African-language linguistics. the number of textual changes that need to be made: However, Charis SIL also provides a set of nine pitch \renewcommand \textupstep[1]{\char"A71B#1} marks (glyphs F1F1–F1F9) specifically aimed at address- \renewcommand \textdownstep[1]{\char"A71C#1} ing this issue (see section 2 of Marking Tone [7]); these allow pitches to be shown inline. To simplify usage, the This allows \textupstep{a} and \textdownstep{a} to glyph numbers can be replaced by a \pitch command: continue to be used to give ꜛa and ꜜa respectively. \newcommand \pitch[1]{\char"F1F#1} The greater flexibility offered by combining diacrit- ics can also be used to produce new diacritics. For in- and ibuuna [   ] can then be produced by writing: stance, adding an acute and then a macron: ibuuna [~\pitch1\gap{1}\pitch9\pitch9 \newcommand \hbend[1]{#1\char"0301\char"0304} \gap{1}\pitch1~] can yield a diacritic to represent the end of a high-tone repeating the glyph number if longer pitches need to be bridge: \hbend{a} gives á.̄ shown. Inserting the command: However, not all combining diacritics may be visible \fontspec[Renderer=Graphite]{Charis SIL} to X TE EX — in Charis SIL, for instance, glyphs in the ranges after \begin{document} will again allow glides to be rep- 07nn, 1Dnn and 20nn do not give the expected output. resented in a variety of ways: [   엵 ] [   엵] [   엵] [   엵]. 4 Pitch representation for individual words Since these pitch-glyphs are in the Private Use Area Inline pitch representation of individual words has been of the font (in other words, they are not yet official Uni- used in the past in African-language linguistics to draw at- code glyphs) Charis SIL is the only font that they can be tention to the pitch contours of individual words, so that, used with. for example, the pitches of the kiKongo word ibuuna [  Mark Wibrow has presented a solution [1, 13] us-  ] (so) can be shown without drawing premature con- ing TikZ [11] that works with both LATEX and X TE EX, is

Kevin Donnelly TUGboat, Volume 34 (2013), No. 2 225 font-independent, and allows a greater variety of pitch- An approach that avoids this uses pstricks code marks. Using this, marking such as ibuuna [. ] can by John Frampton, which sets up six pitchmarks (glides be produced by writing: can also be specified). The marks are then applied simply [\tikz[baseline={(0,0.25ex)}]% by prefacing each vowel with the relevant level, so that \contour[contour only, contour scale=2ex/6, writing: contour marks={1.55.1}] \1ib\5u\5un\1a · b\1as\5i\5id\4i {ibuuna};] k\3il\2umb\pitchup u The positioning and size of the inline marking is han- will give: dled by the baseline and contour scale directives. The marking itself is input using numbers 1–5 to represent (2) ibuuna · basiidi kilumbu the levels, locating them above each letter that requires a pitchmark; letters not requiring pitchmarks must be des- so · they set aside a day ignated by a full stop (.) — that is, the number of items Mark Wibrow’s TikZ solution [13][1] works on both in the pitchmark line must be the same as the number of LATEX and X TE EX. The correspondence between letters and letters, spaces, etc. in the text line. (To do this easily, pitchmarks makes it easy to place and edit the marks your editor should be switched temporarily to a mono- while keeping the text easy to read: space font if it is not already using one.) contour marks={0.55.0....0.55.4..3.2..?}] A variety of glides can be represented: [. ] {ibuuna · basiidi kilumbu}; [. ][. ][. ][. ], along with the sort of shape which would be difficult to represent with The pitchmarks can be placed above or below the text, depending on the values used for contour raise: either of the earlier options, e.g. [. ][. ] Since only the pitchmarks are shown, they can be .ibuuna · basiidi kilumbu lengthened by duplicating letters in the text. For instance, to get an extra-long mark for the first syllable, use: (3) contour marks={1111.55.,}] so · they set aside a day … {iiiibuuna};] giving [. ] — see also the end of section 7. (4) .ibuuna · basiidi kilumbu 5 Representing pitchlevels over word-sequences so · they set aside a day … Depicting pitchlevels over word-sequences rather than in- Note that the TikZ solution will not work in LATEX dividual words is usually best done using a tiered format. if the text contains characters such as raised dots (·), so The same general principles as for inline marking apply. this is another argument for using X TE EX. Note also that A tipa For L TEX, the pitchmarks look best when used expex places its example number at the baseline of the expex in conjunction with , as shown in the accompa- TikZ graphic — there is currently no means of adjusting nying file pitch_latex.tex, where the pitchmark line takes the vertical placement of the example number. the place of a gloss. Note that sequences of more than one word in the text line need to be enclosed in braces, 6 Representing pitch contours over phrases and items in the text line which do not need pitchmarks In tone languages, the pitch domain is the syllable, and aligned to them should be marked by empty braces in the sections 3–5 have shown various means of representing gloss (pitchmark) line. pitch on the syllable. For pitch-accent and lan- The main drawback to this approach is that trial- guages, the pitch domain is the word, phrase, or sentence, and-error is required to place the pitchmarks, experiment- and this section sets out a method for depicting pitch con- \gap ing with different s until the marks are aligned with tours over these longer stretches. the relevant syllable. The pitchmark line is also hard to Mark Wibrow’s TikZ code [1, 12] allows contours edit, since it is unclear to which part of the text line it to be represented in a variety of ways in both LATEX and refers. X TE EX. In the following examples, an English interroga- For X TE EX, the Charis SIL pitchmarks can be used in tive sentence (“Where are you going?”) will be used, on expex tipa conjunction with — as with , this requires the which two pitch contours will be represented. The first same trial-and-error to space the pitchmarks (in the fol- of these (a) is the neutral or default contour, in answer lowing examples, a raised dot (·) denotes a short pause): to “I’m going out now” — this has high pitch on where and go. The second contour (b) is a focussed one, in answer (1) ibuuna · basiidi kilumbu to “I’m going out now, but not to the shops”, and has high         엵 so · they set aside a day … pitch on are.

Representing linguistic pitch in (X )LE ATEX 226 TUGboat, Volume 34 (2013), No. 2

One option is to place the words themselves on dif- the gap between the text line and the contour can be spec- ferent levels, using the tokens follow contour directive. ified by using contour raise — the gap here is twice that in the previous examples. Where go are you (5) a. . ing? are .Where are you going? you (9) go .Where ing? b. The contour can also be annotated by drawing ad- \path An inline notation is used, where a defined character ditional s such as the following, part of example (by default, the pipe character |, but this can be changed (10): to other characters, e.g. an asterisk) is used to specify the \path [draw=red, ->] ([yshift=0.25cm] anchor points for the contour, and the height of the con- mycontour-1) -- ([yshift=0.25cm] mycontour-3) node [midway, sloped, above] tour is specified by a following digit from 0–10 in square {\small rising}; brackets: \contour[tokens follow contour] contour mark prefix is used to give the contour a name {|[10]_Where_ |[3]are you |[6]_go_|[0]ing?}; (here mycontour), and then a red arrow (grayscaled for The /high-pitch composite in example (5) has the printed version) is drawn between the relevant anchor been marked by underlining (generated by a preceding points of mycontour, counted sequentially (in this case, yshift and following underscore). It is also possible to place the between anchor points 1 and 3). The directive words in boxes (box tokens), and expand the spaces be- specifies how far above the contour the arrow should be, node tween them (space token width), as in this example: and the clause specifies the location and size of the annotation “rising”. The contour is also styled to appear Where as a dotted line. go are you falling . ing? (6) a. rising A more familiar option is to show the text on a line as usual, with a contour line above it. Example (7) shows (10) b. .Where are you going? this, and also different styles for the contour linecontour ( /.style)— thick and gray in (a), and ultra thick and red in (b): 7 Replicating early typesetting of tone Nowadays the conventions of describing tone and into- nation are well-established, but at the beginning of the study of the role of pitch in languages even concepts such .Where are you going? (7) a. as “absolute pitch” and “relative pitch” were not well- understood. One of the earliest researchers in this field, Karl Laman, a Swedish missionary to the Kongo, produced the first dictionary of an African language to be tone- marked throughout [5]. His 1922 book, The Musical Ac- b. .Where are you going? cent or Intonation in the Kongo Language [4], was one of the A contour can be displayed by itself, without show- first modern attempts to systematise and describe pitch ing the text line, by using contour only: linguistically. When referring to early research, it may be useful . to quote it in its original format, and this may include (8) a. diacritics and pitchmarks that are no longer widely used, In some cases it may be useful to compare two con- or were even invented solely for that particular piece of tours directly. This can be done by drawing both con- research. LATEX and X TE EX offer possible ways to replicate tours, but setting one of them to contour only, as in ex- this type of material. ample (9). In this case, the focussed contour is compared Laman’s 1922 marking system for the six main pitch- to the neutral contour, with the latter styled so that it is levels he posits for kiKongo (there are a few others) are drawn with a dashed line. This example also shows how set out in Table 1. His marking is emulated here by using

Kevin Donnelly TUGboat, Volume 34 (2013), No. 2 227 stress diacritics (\textprimstress and \textsecstress or intonational languages. A variety of diacritics is avail- from the tipa package for LATEX, and Unicode characters able, particularly if X TE EX is used to give access to a com- 02C8 and 02CC for X TE EX, though the tipa characters prehensive font like Charis SIL. Add-on packages such as will work in X TE EX too), and a super/subscript + sign. TikZ/PGF allow diagrammatic representation of pitch lev- els and contours. Intricate marking systems from early Gradation Designation Diacritic works on pitch can be replicated simply and inexpen- very high aˈˈ high sively. high aˈ + semi-high a 9 Acknowledgements mid semi-low a+ I am grateful to Christina Thiele for comments on earlier low aˌ versions of this paper. low very low aˌˌ References Table 1: Laman’s pitch-marking system [1] Kevin Donnelly and Mark Wibrow. TikZ , 2013. http://gitorious.org/ For ease of use, commands can be set up to repro- tikz-pitch-contour. duce the marks (which can appear before or after the rel- [2] John Frampton. ExPex, 2012. http://www.math. evant vowel): neu.edu/ling/tex/expex. \newcommand{\hi}{\textprimstress} [3] KDE. KCharSelect, 2013. http://utils.kde.org/ \newcommand{\vlo}{\char"02CC\char"02CC} projects/kcharselect. \newcommand{\shi}{{$^{\scriptscriptstyle +}$}} [4] Karl E. Laman. The Musical Accent or Intonation in and so on. This allows Laman’s examples to be replicated the Kongo Language. Svenska Missionsförbundet, by inserting the mark shortcut at the appropriate point in Stockholm, 1922. Available from the Internet the word, as the following examples (from p. 12 of [4], Archive at http://archive.org/details/ musicalaccentori00lamarich but with inline pitchmarking added) show: . + mfu mu [. ](king) [5] Karl E. Laman. Dictionnaire Kikongo-Français. IRCB, Brussels, 1936. mukˌaˈnda [. ](book) [6] Scott Pakin. The Comprehensive LAT X Symbol List, zoˈbongo+ [. ](antelope) E 2009. http://www.ctan.org/pkg/comprehensive. ka+ba [. ](to share) + [7] Lorna Priest. Marking tone, 2007. http://scripts. ba+kaˌla [. ](man) + sil.org/cms/scripts/render_download.php? tū la [. ](to put) format=file&media_id=PitchContours.v1. di+nsuˌsuˌ [. ](herbal plant) pdf&filename=PitchContours.v1.pdf Similar approaches can be used for other markings via http://scripts.sil.org/ipahome. in [4], and for works by other early researchers, enabling [8] Fukui Rei. tipa — Fonts and macros for IPA phonetics such material, of intrinsic interest to the history of linguis- characters, 2002. http://www.ctan.org/pkg/tipa. tics as well as to descriptive linguistics, to be distributed [9] Will Robertson. fontspec — Advanced font selection in without great expense. X E LATEX and LuaLATEX, 2013. These examples incidentally demonstrate that more http://www.ctan.org/pkg/fontspec. aesthetic placement of the inline pitchmarks discussed in [10] SIL. Charis SIL 4.112, 2011. http://scripts.sil. section 4 can be achieved by manipulation of the text line org/CharisSIL. in the contour marks directive. For instance, writing out [11] Till Tantau. TikZ and PGF, 2010. mukanda in full http://www.ctan.org/pkg/pgf. contour marks={.1.>..1}] [12] Mark Wibrow. Using TikZ to depict intonation, 2013. {mukanda};] http://tex.stackexchange.com/questions/ 107941/using-tikz-to-depict-intonation. gives mukˌaˈnda [. ], which contains unsightly [13] Mark Wibrow. Using TikZ to depict pitchlevel, 2013. gaps in the inline marking. To deal with this, simply re- http://tex.stackexchange.com/questions/ move excess letters, using: 108530/using-tikz-to-depict-pitchlevel. contour marks={1.>.1}] {ukana};] ⋄ Kevin Donnelly to give the more attractive mukˌaˈnda [. ]. Llanfairpwllgwyngyllgogerychwyrndro bwllllantysiliogogogoch 8 Conclusions Wales kevin (at) dotmon dot com TEX is a powerful and versatile system for typesetting aca- demic work dealing with pitch, whether relating to tonal http://kevindonnelly.org.uk

Representing linguistic pitch in (X )LE ATEX