Animated Phonetic Script – Exploring Temporality in Visual Speech

Sean Isaacs

S0937900

Master of Science – Design and Digital Media

August 2016

Word Count: 6138


Contents

History of Written Language
International Phonetic Alphabet
Regular Featural Phonetic Scripts
Inspiration
APS Design Context
APS Design
Consonants
Manner of articulation (Appendix 4)
Place of articulation (Appendix 5)
Voicing (Appendix 6)
Vowels
Prosodic elements
Methodology
Evaluation
Obstacles
Applications of APS
Acknowledgements
References
Appendix


Records of the use of written language date back to at least 3,500 B.C. The dominant media on which text is written have changed over the years from clay, to papyrus, to parchment, then to paper, which has been the globally dominant writing medium since the mid-15th century. Static written text has been directly used in new media, and whilst moving text (such as in kinetic typography) is not unheard of, it is not a preferred medium of written language communication. This dissertation aims to explore how a dynamic, temporal dimension could be added to the writing of text, by deconstructing how writing has historically been used, and by combining this with the study of the physiology of human speech. Could such an implementation of visual language be the next logical technological advancement? Does it have a place in cutting-edge technologies such as virtual reality displays?

History of Written Language

In broad terms, language is a system of communication of ideas and emotions, of which speech is the phonic medium of transmission (Crystal, 2008). Written language refers to the use of visible signs (or tactile signs in the case of systems such as Braille) to systematically represent units of language in order to allow them to subsequently be retrieved by individuals familiar with the encoding of the language in question (Coulmas, 1999, p. 560). Written language is thought to have emerged independently at least twice – in 3,300 B.C. Mesopotamia and 9th century B.C. Mesoamerica (Wilford, 1999; Lo, 2012).

The means by which language is transcribed have changed over time as media used to record writing developed.

Early scripts were recorded in impressions created by blunt wedge-shaped reed styli in clay – a system which evolved out of the use of symbolic and pictographic clay tablets and tokens, predominantly for accounting purposes. Papyrus use emerged in Ancient Egypt around 3,000 B.C. Owing to the flexibility and lightness of papyrus, long texts could conveniently be written, stored and transported in the form of scrolls and codices (which were developed later, in 200-300 A.D.) (UTexas-Austin, n.d.). In 200-100 B.C., parchment – a more expensive and more durable material made of animal hide – superseded papyrus as the writing material of choice (EncyclopaediaBrittanica, 2016), before being obsoleted by paper, which was invented in China around 100 B.C. (GeorgiaTech, 2006). The use of paper spread throughout Asia and into Northern Africa and Europe over the course of 1,000 to 1,300 years, but did not gain significant favour in Europe until the rise of the printing press in the mid-15th century (GeorgiaTech, 2006).

For nearly 600 years, paper has remained the dominant medium for written language, whether for handwriting, printing, or typing. In the late 19th century, film added the capacity for movement to visual media. Film was used to display text temporally whilst still a young medium, with the earliest known use in 1899 by Georges Méliès (Bellantoni & Woolman, 1999). This is the first example of kinetic typography (Figure 1), or moving text, a technique which has since seen widespread use in the forms of animated film and television titles and visualised monologues.

Figure 1: Kinetic Typography by Gery Greyhound
https://commons.wikimedia.org/wiki/File:140815_AlvinKyneKonflikt_1.gif

The temporal dimension in kinetic typography affords it several qualities absent from static type. The motion of words appearing and disappearing from view can mimic the pace of conversational speech. Established animation principles that give cartoon characters emotional and visceral qualities (Lodigiani, 2014) can also be applied to the words which appear in works of kinetic typography.

Whilst tools such as Adobe Flash/Animate and Apple Motion facilitate the animation of typography, considerable user input and time is still needed. It is therefore frequently an artistic decision, rather than a favoured means of conveying a message, to use kinetic typography.

Pieces of kinetic typography largely tend to use single words as the smallest unit of typography to be animated. Experienced readers are generally thought to interpret words holistically to some degree, rather than strictly letter-by-letter (Reicher, 1969), although there are studies which refute this (Pelli, et al., 2003). Spoken words are primarily recognised syllable-by-syllable, and have also been shown to be interpreted phoneme-by-phoneme by individuals with even moderate literacy levels (Mehler, et al., 1981). The contrasting means by which whole written words and segmented spoken words are processed means that kinetic typography cannot capture all of the fine-level temporal detail of speech.

Written language also does not capture many prosodic elements of speech. Some, such as intonation and pitch, can be gleaned from learned word pronunciation rules, such as stress-timing in the English language, and punctuation (Crystal, 2008). This is however not always the case; the sentence “Did he steal my wallet?” can have five different meanings depending on which word is emphasised. Other more complex prosodic properties, such as emotion or sarcasm, are quite apparent in speech, but must often be inferred contextually with some difficulty from written text.

In common practice, such prosodic elements are not directly encoded into written text, although writing systems such as the International Phonetic Alphabet (IPA) have methods in place to do so (2016). Precise changes in pitch can also be recorded by combining musical notation with written lyrics.

International Phonetic Alphabet

The International Phonetic Alphabet (IPA) was invented in 1888 with the goal of creating a script with an ideal phonetic orthography, or 1:1 phoneme-to-grapheme correspondence, that would be universal across all human languages. One key point of policy was to make use of as many ordinary Latin characters as possible, whilst minimising the addition of new characters.

The forms of the characters of the Latin alphabet have roots in Egyptian hieroglyphs (~3,500 B.C.), which were initially logographs of Ancient Egyptian words that were subsequently assigned phonemes associated with each (Loprieno, 2009).

Figure 2: Egyptian hieroglyphs for Pharaoh (“pr”, logograph for “house”; “aa”, logograph for “great”)

The characters of the Latin alphabet have evolved from their logographic roots, tending towards simplicity and ease of writing. There is therefore no available logic to intuit a phoneme-grapheme relationship a priori given the characters and phonemes alone. That is, one could not take any quality of a phoneme, whether auditory or physiological, link it to the visual properties of its corresponding grapheme, and then apply the same logic to match another phoneme to its corresponding grapheme.

In order to visually represent speech at the phonemic level whilst preserving its temporality, it is therefore inappropriate to use IPA characters as the basis for the visual units, as they bear no directly intuitable featural relationships to the phonemes they represent.


IPA remains the global standard in phonetic categorisation and description of human speech sounds. It categorises pulmonic consonants by the different possible permutations of place of articulation, manner of articulation, and voicing, and vowels by different configurations of tongue backness, tongue height, and lip roundedness (Figure 3) (Ager, 2016). IPA therefore serves ideally to form the basis of a script which is designed on physiological features and can be algorithmically generated.

Figure 3: IPA Pulmonic Consonant Chart
Consonants on the left half of each column are unvoiced, consonants on the right half of each column are voiced

Regular Featural Phonetic Scripts

Writing systems in which graphemes are designed to have features which correspond with the vocal physiology used to create their respective phonemes are known as featural systems (Sampson, 1985). Several of these exist, of which the earliest known and by far most widely used is Hangul, the official alphabet of South and North Korea, developed in the 15th century.

In Hangul, all coronal consonants (articulated with the tip of the tongue), for example, are based on the “ㄴ” figure, which represents a tongue with an upwardly curled tip. This character corresponds to the /n/ sound produced when the tip of the tongue is in contact with the alveolar ridge and air is expelled nasally. “ㄷ” has the addition of a horizontal line at the top of the character, indicating the consonant is plosive (air flow is blocked, pressure built up, then released), and thus produces the /d/ sound. “ㅌ” builds on this further by adding a central horizontal line indicating the consonant is aspirated, producing a breathy /t/ sound. The vowels of Hangul are composed of adjoining horizontal and vertical lines, and are not featurally designed.

Other slightly more regular featural writing systems include Visible Speech (VS), created by Alexander Melville Bell (graduate of the University of Edinburgh, father of Alexander Graham Bell) (1867) and the Human Physiological Alphabet (HPA), created by Geoffrey Graham Tudor (1995).

In VS, consonants consist of circular shapes with four possible 90° rotations and added modifiers to differentiate them by manner of articulation. Vowels are depicted as vertical lines, differentiated by curled modifiers at the termini and short, centrally-bisecting strokes (Appendix 1).

HPA has a relatively higher level of consistency between consonants. The frame of reference, or “letter field”, in which characters are drawn and the order in which characters appear largely correspond with how they occur in the vocal tract. HPA draws the lips at the left-most position, the glottis at the right-most position, and the nasal cavity at the upper-most position (Appendix 2).

Inspiration

The creation of APS was originally inspired by a problem faced by the current generation of virtual reality head-mounted displays (VR HMDs). Computer vision technology, graphics processing speed and bandwidth, and motion tracking have advanced to the stage where creating a compelling virtual reality experience is realistically achievable; however, certain technologies involved in the process, such as high-fidelity haptic input and optical displays, still lag behind.

Although HMDs like the Oculus Rift and HTC Vive use screens with 2160x1080 pixel resolutions, due to their proximity to the viewer’s eyes, individual pixels can still be observed (known as the “screen-door effect” or SCD) (Unity, 2016). This can be immersion-breaking for a VR user, and in a medium where immersion is key, this is a significant drawback. The phenomenon is exacerbated in lower-end VR devices with even lower screen resolutions, such as Google’s Cardboard system, which uses a smartphone’s display, processor, accelerometers, inclinometers and magnetometers to determine the user’s orientation.

Figure 4: Screen-door effect
https://upload.wikimedia.org/wikipedia/commons/0/0e/Screen-

The SCD becomes yet more apparent when users are presented with small typography. This, coupled with the discomfort caused when converging one’s eyes on objects that are too close, causes reading text in VR to be an unpleasant experience for many users (Unity, 2016).

Several solutions currently exist to address this problem. The simplest is to enlarge text and locate it closer to the user’s visual horizon, to reduce the discomfort caused by a user’s eyes converging on a nearby object. Putting the text further away, however, does reduce its pixel size, so the SCD can still be quite apparent. Enlarging the text to take up a large proportion of the user’s vision can reduce this effect, although it can be distracting.

The use of voice recordings or generated speech is currently the best candidate for text replacement in VR, as it resolves the eye-strain involved in reading small, pixelated text and minimally distracts the user’s visual attention.

Other text-less visual solutions might include the use of various markers or gestural symbols to provide a user with visual information without relating directly or indirectly to language.

It remains worthwhile to consider how existing technologies can be used directly or as inspiration to address the problems associated with the display of text in VR.

Scrolling text has existed since its use in papyrus scrolls around 3,000 B.C. (Ager, 2016), and in modern display media features prominently in film credits and web pages as a means of fitting long bodies of text into fixed display areas. Paginated text has existed since the advent of the codex around 200-300 A.D. (UTexas-Austin, n.d.), and features prominently in digital media such as eBooks. Both of these reduce the space taken up by text to some extent, but still tend to show a reader relatively large chunks of text at any one time.

Reducing the amount of text displayed to one to three words at a time and presenting them one at a time to a user at a fixed position on a screen reduces the area taken up by the text even further. Displaying text in this way is known as rapid serial visual presentation (RSVP), and has been shown to increase reading speed by 33% with no significant increases in cognitive load or decreases in comprehension (Öquista & Goldstein, 2003). This might appear to be a good candidate for text display on a VR display; however, as with kinetic typography, the smallest unit presented to the user at one time by RSVP is the whole word, so it loses some temporal resolution when compared to speech.

Logically, the area taken up by visually displaying speech can be reduced to the display of single phonemes at a time, and reasonably no further. This would serve as the basis for the formation of words in a temporally-spaced, animated script.


Such a script would ideally need to be intuitive and visually simple. The script developed for this dissertation, Animated Phonetic Script (APS), attempts to do this by reducing the physiological components involved in creating speech sounds into simple, yet distinctive forms, and combining and morphing between them in various permutations to emulate the movements of the articulators involved in human speech.

From this initial conception, it became apparent that there had been no prior attempt to create a script specifically designed for a digital display medium that, unlike any other alphabet, was designed specifically with animation at its core. With this realisation, the design of APS has become an exercise in exploring the possibilities, obstacles, practicalities, and applications of such a script.

APS Design Context

When designing a script with the central focus of being animated, non-phonetic symbology lacking ideal phonetic orthography, which is the norm for most written languages, is inappropriate. All the graphemes in APS are mapped 1:1 to their corresponding phonemes.

Phonetic alphabets with ideal or close to ideal phonetic orthographies exist; however, due to the non-featural or inconsistently featural character designs found in each of them, animating and morphing between their characters will tend to be jarring.

For example, the reference frame and orientation of different Hangul characters with respect to the position of the physiological features they correspond with is inconsistent, particularly between vowels and consonants (Ager, 2016).

The relationship between graphic features and physiological features in Visible Speech is also quite inconsistent, as a circle segment can varyingly represent either the tongue or the lips depending on its rotation, and vowels are composed out of very different constituent shapes to consonants (Appendix 1) (Bell, 1867).

Whilst less apparent, some small inconsistencies in the construction of the consonants of the Human Physioalphabet do exist. Namely, the elements (letter constructors representing where a consonant is articulated) always depict the point of a consonant’s articulation, except in the case of retroflex consonants, in which the element represents the shape of the tongue. Additionally, a distinction is made between nasal articulation and oral areas of articulation based on vertical position, but the same distinction is not drawn for glottally articulated consonants, which have relatively more vertical separation from orally articulated consonants (Appendix 2).


HPA vowels, whilst featural in overall design, are constructed by pictographic sub-units to represent roundedness, tongue height and tongue backness, and like VS vowels, have totally different forms to HPA consonants (Tudor, 1995).

Due to the irregular and inconsistently featural designs of the graphemes of IPA, Hangul, VS, and HPA, none is suitable to serve as a framework for the graphic design of a new animable script, hence APS uses many of the same principles by which these alphabets are constructed, but varies in the approach to its design.

These alphabets have also been developed to be easily handwritten, and so feature relatively simplified (though arguably not in the case of VS) letter forms for rapid notation. This is unnecessary for an entirely digitally-used script, whose characters could be generated with a keystroke. They also all have a tendency for their characters to represent side-profile views of the vocal articulators they depict, which is less apt for a featural script that is meant to emulate speech to some degree, as the attitude of a speaker tends to be with his/her face towards the listener. Fixing the reference frame for APS characters in this orientation allows the meanings of different characters to be intuited through direct comparison.

APS Design

The IPA’s classification of phonemes by permutations of physiological factors in their articulation serves as the basis for the organisation of APS grapheme construction. Pulmonic consonants, which are produced by the obstruction of air escaping the glottis or oral cavity, and which comprise the majority of all consonants produced by humans and include all consonants in the English language, are organised chiefly by place and manner of articulation, as well as voicing. Vowels are classified by the elements of tongue height, tongue backness, and lip roundedness.

APS graphemes consist of curves, lines, circles and simple shapes representing the IPA’s elements of phoneme classification. The shape, size, position, opacity, line weight, display duration, and, to a small extent, colour of these shapes vary depending on which grapheme is drawn and how its pronunciation is affected by its context within a word or sentence.

Consonants

See full APS consonant chart at:
http://playground.eca.ed.ac.uk/~s0937900/pc.html
(Appendix 3)

APS consonants mirror IPA consonants in their taxonomy, using visual representations of the same three parameters in their graphemes. All possible, and some impossible, simple pulmonic consonants (not including affricates and co-articulated consonants) can be generated by the combination of these three parameters in different permutations.

Manner of articulation (Appendix 4)

The manner of articulation for each consonant is indicated through the use of the following kinetic and static symbols.

Place of articulation (Appendix 5)

Where the tongue is involved in the place of articulation for each consonant, it is depicted by a distinctive curved line and semi-opaque mark at the point of contact or approximation of the tongue with the other articulators. Where the tongue is partially- or un-involved in the articulation of a consonant, the other relevant articulators are represented by curves, ellipsoids, and lines, as detailed below.

Affricate consonants are formed by rapidly morphing between their constituent simple pulmonic consonants.

Voicing (Appendix 6)

Pulmonic consonants are classified as either voiced or unvoiced in the IPA, meaning they are produced by ejecting air from the lungs respectively with or without vibration of the vocal cords.

In APS, voiced consonants are indicated by an oscillating sinusoidal waveform below the main grapheme display area, whilst unvoiced consonants are indicated by a flat line in the same position.

Vowels

See full APS vowel chart at:
http://playground.eca.ed.ac.uk/~s0937900/v.html
(Appendix 7)

Using the same modular construction principle employed in the design of pulmonic consonants, vowels are built from permutations of visual depictions of tongue height, tongue backness, and lip roundedness.

The representation of the tongue in APS vowels consists of two connected semicircles, and changes only in size and vertical position between vowels, where size represents vowel backness, with a smaller shape representing a further-back vowel and a larger shape representing a further-forward vowel.

Tongue height is simply displayed as the y-position of the tongue within the frame in which the APS graphemes are being displayed.

Object size alone is not a preattentively processed property of human vision (Stuart, et al., 1993); however, stark colour differences, enclosure and subitising of up to four objects are (Few, 2004).

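The modular consonant construction described in the Consonants section can be illustrated with a short sketch. It is not the Adobe Animate implementation used for APS; the category names are taken from the rows and columns of the standard IPA pulmonic consonant chart, and the class name, function names and waveform parameters are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from itertools import product
import math

# Column and row labels of the standard IPA pulmonic consonant chart.
# Assumption: APS uses these same categories for its place and manner symbols.
PLACES = ("bilabial", "labiodental", "dental", "alveolar", "postalveolar",
          "retroflex", "palatal", "velar", "uvular", "pharyngeal", "glottal")
MANNERS = ("plosive", "nasal", "trill", "tap", "fricative",
           "lateral fricative", "approximant", "lateral approximant")


@dataclass(frozen=True)
class Consonant:
    """A simple pulmonic consonant described by the three APS parameters."""
    place: str
    manner: str
    voiced: bool


def all_simple_consonants():
    """Every place x manner x voicing permutation, whether articulable or not."""
    return [Consonant(p, m, v) for p, m, v in product(PLACES, MANNERS, (False, True))]


def voicing_indicator(consonant, n_points=33, cycles=2.0, phase=0.0, amplitude=1.0):
    """Vertical offsets of the line drawn below the grapheme display area.

    Voiced consonants get a sinusoid (advance `phase` each frame to make it
    oscillate); unvoiced consonants get a flat line.
    """
    if not consonant.voiced:
        return [0.0] * n_points
    step = 2 * math.pi * cycles / (n_points - 1)
    return [amplitude * math.sin(step * i + phase) for i in range(n_points)]
```

Enumerating the permutations this way yields combinations that no speaker can articulate alongside the possible ones, which mirrors the remark that APS can generate "all possible, and some impossible" simple pulmonic consonants.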

Additionally, it can be difficult to differentiate APS vowels with similar tongue heights at a glance. For this reason, six small red reference points (black-ringed white circles for a colourblind version) are also present in the APS frame. These give vowels that have similar backnesses and/or heights more salience and distinctiveness, either by being occluded, triggering changes in their subitisation by a reader (Saltzman & Garner, 1948), or by simply serving as salient points of reference which either partially or fully enclose or are enclosed by the tongue shape (Elder, 1992) (Figures 7a/7b).

Figure 6: IPA Vowel Chart
Where vowels occur in pairs, the left character is unrounded, and the right character is rounded

In most cases, IPA does not differentiate between the level of roundedness for vowels at more than a binary level. In APS, rounded vowels are represented by a tight lip circle at the centre of the grapheme, with the tongue external or partially internal to the circle. Unrounded vowels are represented by a large circle which frames the tongue. A continuous spectrum of ellipsoid shapes between the forms of unrounded and rounded vowel indicators is possible. These shapes tend to be uninformative, particularly in English language usage, where distinctions in level of vowel roundedness rarely occur, and so are not utilised in APS vowel design.

Figures 7a & 7b: APS vowel graphemes for ɪ and ə
7a shows 2 red reference points occluded, 7b shows no reference points occluded with tongue between two sets of points

Although the IPA vowel table is drawn on a continuum from front-to-back (tongue backness) and close-to-open (tongue height) (Figure 6), vowels have been shown to be recognised similarly to consonants in the form of discrete phonemes rather than points within a continuous range (Cross, et al., 1965). APS therefore divides tongue backness into five discrete levels and tongue height into seven discrete levels, although, as with roundedness, a continuous spectrum is technically possible, if somewhat impractical.

Unlike IPA vowels, in which backness has a forward tendency as tongue height decreases (mirroring the physiology of vocalising vowels), APS does not visually depict this. All vowels, regardless of height, are assigned one of the five tongue sizes depending on their backness relative to vowels of the same height (Appendix 7). This prevents more open vowels from having depictions of tongue backness too difficult to distinguish from one another. Consequently, all back vowels are always drawn using the smallest tongue size. Although this choice might initially seem counterintuitive, the benefit gained in easier visual discrimination of vowels is significant. Additionally, speakers are able to easily intuit the maximum backness level of an open vowel, as exceeding it will cause discomfort.

All vowels involve the vibration of the vocal cords, and so all include a voicing indicator.

Prosodic elements

Syllabic stress, which is transcribed in IPA, is indicated by increased weight of the lines and curves in the graphemes that form the syllable (see the 5th syllable - ˈreɪ - in demonstration 1 below). This is a visual property that can be rapidly and preattentively processed (Few, 2004).

Pitch changes, such as pitch rises on the last syllables of list items or questions (denoted in written language by commas and question marks, see the 8th syllable in demonstration 1 below), and drops in pitch on the last syllables of non-question sentences (denoted in written language as full-stops and exclamation marks), are indicated by respective rises and falls in the position of the voicing indicator to discrete upper and lower positions, occluding one of three red reference points at any given pitch level. These reference points rely on the same preattentively processed attributes exploited by the six vowel reference points.

Methodology

The designs for all the APS grapheme components were created within Adobe Animate CC. Tongue shapes, place of articulation indicators, and manner of articulation symbols (excluding fricative symbols) consist of straight lines and Bezier curves. Lips are represented by ellipsoids that collapse into a single curve for labiodental consonants. Fricative symbols and voicing indicators are Animate MovieClip class objects, and have internal sub-animations (undulations). All other shapes drawn in APS graphemes are standard “shape” class Animate objects.

Long vowels, such as ɑː (“are” in Received Pronunciation) are displayed for 20 frames. Shortened vowels, such as most uses of the mid central vowel ə (schwa), are displayed for 10 frames. Regular length vowels are displayed for 15 frames. Simple consonants are displayed for 10 frames, and affricate consonants have their constituent simple consonants split over 15 frames. Consonants at the first or last position of a word are elongated to 15 frames to provide words with a clearer beginning and end. Pauses between words, although often unheard in conversational speech, are included to make individual words more easily separable. These are set at 30 frames after a comma, 40 frames at the end of a sentence, else at 15 frames.

Individual features (manner indicators, tongue shape, place indicators, lip shape etc.) fade in and out and morph between their various forms as one grapheme leads into another. Discontinuous shapes, such as manner indicators, fade in over 4 frames and out over 10 frames. Fricative manner symbols fade in and out depending on their two-dimensional position.

APS exploration/learning demonstration 1 – 24fps (slow)
http://playground.eca.ed.ac.uk/~s0937900/slow.html

The demo above contains a phonetic pangram of all phonemes used in Received Pronunciation English (“Are those shy Eurasian footwear, cowboy chaps, or jolly, earthmoving headgear?”). There is an interactive, navigable progress indicator that can be toggled on at the bottom left of the frame.

Conversational APS demonstration 2 – 120fps (fast)
http://playground.eca.ed.ac.uk/~s0937900/amzp.html

The framerate of demonstration 1 is set low at 24 fps to allow unfamiliar readers to become acquainted with APS. A conversational speed of APS occurs at a framerate closer to 120 fps, shown here with inclusion of synthetically generated speech with the correct pronunciation for each phoneme (generated with Googamaphone’s Type and Speak app, 2016). At this framerate, APS becomes virtually unintelligible to even an experienced reader, so realistically, APS would be displayed at a speed slower than most conversational speech to preserve its interpretability.

APS demonstration 3 – 48 fps (moderate)
http://playground.eca.ed.ac.uk/~s0937900/mod.html

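The frame counts given in the Methodology can be collected into a small timing sketch. This is an illustration only, not the procedure used to build the demonstrations: the symbol sets are deliberately incomplete, the function names are hypothetical, and two points the text leaves open are settled by assumption (every ə is treated as shortened, and affricates are not elongated at word edges).

```python
LONG_MARK = "ː"
# Hypothetical, incomplete symbol sets used only for this illustration.
VOWEL_SYMBOLS = set("iɪeɛæaɑɒɔoʊuʌəɜ")
AFFRICATES = {"ʧ", "ʤ"}
SHORTENED_VOWELS = {"ə"}  # assumption: every schwa is shortened


def phoneme_frames(phoneme, at_word_edge=False):
    """Display duration of one phoneme, in frames."""
    if phoneme[0] in VOWEL_SYMBOLS:
        if phoneme.endswith(LONG_MARK):
            return 20          # long vowel
        if phoneme in SHORTENED_VOWELS:
            return 10          # shortened vowel
        return 15              # regular-length vowel
    if phoneme in AFFRICATES:
        return 15              # constituent consonants split over 15 frames
    return 15 if at_word_edge else 10  # simple consonant


def pause_frames(punctuation=""):
    """Pause inserted after a word, depending on the punctuation that follows it."""
    if punctuation == ",":
        return 30
    if punctuation in {".", "!", "?"}:
        return 40
    return 15


def word_frames(phonemes):
    """Total frames for a word given as a list of phoneme strings."""
    last = len(phonemes) - 1
    return sum(phoneme_frames(p, at_word_edge=(i in (0, last)))
               for i, p in enumerate(phonemes))


def seconds(frames, fps):
    """Convert a frame count to seconds at a given framerate."""
    return frames / fps
```

Under these assumptions, a word transcribed as ð, əʊ, z occupies 45 frames plus a 15-frame pause, which is 2.5 seconds at the 24 fps of demonstration 1 and 0.5 seconds at 120 fps.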

Evaluation

APS within context of featural speech representation

Featurally represented speech lies on a spectrum of metaphoric purity. The purest level of visual metaphor in terms of the relationship between the visual and phonemic properties of a script would be one whose graphemes are completely unrelated to the phonemes they represent. Extremely simple shapes could be used in such a script, which may make it easier to read. Learning a purely metaphoric alphabet would, however, be far less intuitive than learning a script of moderate metaphoric purity such as APS, as its forms would be only context-dependent, and not anatomical feature-dependent. An example of such a script is Gregg shorthand (Gregg, 1922).

Featural scripts such as Hangul, Visible Speech and the Human Physioalphabet have slightly reduced metaphoric purity, as their symbols, although abstracted to varying extent from direct representations of the physiological features, maintain some degree of featural design. Incidentally, the inconsistencies between the design of consonants and vowels in all of these writing systems render them inappropriate for the purpose of letter-by-letter animation.

APS falls below Hangul, VS and HPA in terms of metaphoric purity, as its features, whilst being two-dimensional forms, directly emulate the same features in the human vocal anatomy. This reduced metaphoric purity means the symbols of APS are significantly more intuitive than featural writing systems of higher metaphoric purity. It is possible that, once the hurdle of learning a purer metaphoric featural script has been overcome, reading it might be faster and/or easier than reading APS. Unlike any other featural script, the orientations of APS elements are fixed relative to each other and to their anatomical positions (in a forward-facing manner).

At the lowest level of metaphoric purity, speech can be visualised by displaying an avatar or recording of a person and reading its lips. This has the lowest bandwidth of all featural display variants, as the internal articulators of such a display would be obscured by external features, rendering interpretation of its output difficult to impossible.

Obstacles

The first obstacle to anyone using APS is that the script must first be learned. With this challenge in mind, the design of APS is structured to be as intuitive and quick to learn as possible. It may still be the case that it is difficult to learn or follow for some users. Older individuals might particularly struggle to keep up with an APS message once it has started, as visual response times deteriorate with age (Humes, et al., 2009).


The design of APS is the work of a single person, so its mechanisms and the intuitiveness of the visual metaphors used in its design are likely to vary subjectively from user to user. No formal feedback on its usability has yet been collected. Should APS be pursued further, such a survey would need to be carried out.

Encapsulating all the complex properties of speech in a visual metaphor is difficult. APS currently distinguishes between individual phonemes, stressed and unstressed syllables, phoneme pacing, and basic pitch changes, such as the rise on the final syllable of a question.

Higher resolution data such as precise pitch, volume, degree of stress, and pacing are not impossible to represent, but would require the development of new systems of representation. Changes in pitch and volume could technically be included in the APS design by altering the wavelength and amplitude of the voicing indicator. The way pitch shifts are currently implemented at the ends of clauses is unlikely to be replaced entirely by such a precise pitch display, as although pitch changes between clauses are salient to a listener, they are difficult to detect visually in a moving waveform. Different degrees of stress, such as primary and secondary stress, could be represented by distinct stroke thicknesses; however, this was excluded from the current version of APS for simplification purposes. More accurate pacing is achievable with additional manual fine-tuning; however, it may also be possible to use pacing parameters generated by speech synthesis software (Birkholz & Pappalardo, 2013).

Complex prosody, such as sarcasm, tone, or emotional content, would be impossible to transcribe into intuitive forms in the same way that APS graphemes are designed. These might, however, be represented by changes in colour, shifts in scale, or by adding ideographic diacritics to the APS graphemes.

Generating an APS message from human input is currently a more involved process than it reasonably has the potential to be. The creation of a new message is currently achieved by copying and pasting frames one grapheme at a time in Adobe Animate. As APS graphemes are modularly constructed, and rules exist for morphing between characters and transcribing some aspects of prosody, it would be possible to create an algorithm to automatically combine individual visual featural elements into phonemes, phonemes into words, and words into sentences. Input could be converted from English text to IPA using software such as WordsAPI, after which it would be ready to be parsed and fed to the APS drawing algorithm. Ideally, the input would be the linguistic analysis output of an articulatory speech synthesiser, as such output would include prosodic data unavailable in standard IPA transcriptions, such as intonation and duration.
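The automatic combination of featural elements into phonemes, words and sentences can be sketched in code. The following Python example is a minimal illustration only. It assumes the IPA transcription has already been obtained (for instance from a service such as WordsAPI, whose interface is not modelled here). The `Grapheme` class, the `FEATURES` table and both functions are hypothetical names introduced for illustration; the table covers only a handful of consonants and does not reproduce the actual APS drawing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grapheme:
    """Feature bundle for one phoneme (illustrative, not the APS specification)."""
    ipa: str
    manner: str   # manner of articulation
    place: str    # place of articulation
    voiced: bool  # selects the voiced or unvoiced indicator


# Hypothetical, deliberately partial table: IPA symbol -> (manner, place, voiced).
FEATURES = {
    "p": ("plosive", "bilabial", False),
    "b": ("plosive", "bilabial", True),
    "m": ("nasal", "bilabial", True),
    "t": ("plosive", "alveolar", False),
    "d": ("plosive", "alveolar", True),
    "n": ("nasal", "alveolar", True),
    "s": ("fricative", "alveolar", False),
    "z": ("fricative", "alveolar", True),
}


def compose_word(ipa_word):
    """Combine featural elements into one grapheme per phoneme of a word."""
    graphemes = []
    for symbol in ipa_word:
        if symbol not in FEATURES:
            raise KeyError(f"no feature entry for {symbol!r}")
        manner, place, voiced = FEATURES[symbol]
        graphemes.append(Grapheme(symbol, manner, place, voiced))
    return graphemes


def compose_sentence(ipa_sentence):
    """Combine words into a sentence: one list of graphemes per word."""
    return [compose_word(word) for word in ipa_sentence.split()]
```

A drawing routine could then walk the nested list word by word, rendering each grapheme from its manner, place and voicing features. Prosodic data such as stress and pacing would need additional fields, which are omitted here.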

Morphing between shapes would require some vector operations.

At the time of writing, APS only features graphemes for simple pulmonic consonants, two affricate pulmonic consonants (ʧ, ʤ), and one co-articulated consonant (w), as English is limited primarily to these. APS would need to be expanded by adding new symbols (designed using the same principles) to encompass all IPA letters before being universally applicable.

IPA diacritics are not featured in the current version of APS; adding them is generally a matter of including the relevant articulatory manner or lip involvement, such as an unvoiced/voiced indicator.

Some diacritics are more problematic (e.g. e̙ - retracted tongue root), and their inclusion might necessitate the re-design of certain features of APS.

Tonality, present in languages such as Cantonese, may be representable in APS by experimenting with the vertical range of the voicing/pitch indicator.

Applications of APS

APS has become, at its core, a challenge in designing a practical visual metaphor for speech – itself a practical metaphor for thought – rather than simply a means of replacing text as the means of visually displaying speech on a small digital display whilst economising on the area used. For this purpose, it is likely to be impractical for several reasons. Firstly, the speed at which APS conveys information is under 100 wpm at easily readable framerates. Due to the simplification of APS characters and the restrictions in the area they occupy, the temporal dimension of APS messages must carry a greater information burden. APS is thereby impractically slow compared to most people’s reading speeds, the reading speed achievable by the rapid serial visual presentation of words, and even the speed of conversational speech. Auditory processing occurs more quickly than haptic or visual feedback (Humes, et al., 2009). Compared to speech audition, the reading of an APS message will interfere significantly more with the observation of visual elements other than the APS message (Wickens, 2002). Backing up to re-read an APS message from a previous point is also a significantly more involved process than doing the same when reading static text.

Speech is itself a technology by which concepts and emotions are converted into auditory information, ordered sequentially in time. Written language takes this a step further, removing the direct temporal component of speech and translating it into two spatial dimensions. Temporality is not entirely absent from written language; it takes time to process information written on a page or displayed on a screen when scanning it from side to side and top to bottom, as occurs in the reading of most written languages.
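The dependence of APS reading speed on frame rate can be made concrete with some simple arithmetic. The Python sketch below is an illustration under stated assumptions, not a measurement of APS. The function names are hypothetical, and the example values used in the note that follows (30 frames per second, 6 frames per phoneme, 4 phonemes per word) are placeholders chosen for the arithmetic rather than figures taken from the APS demonstrations.

```python
def words_per_minute(frame_rate_hz, frames_per_phoneme, phonemes_per_word):
    """Estimate reading rate when each phoneme occupies a fixed number of frames."""
    if min(frame_rate_hz, frames_per_phoneme, phonemes_per_word) <= 0:
        raise ValueError("all parameters must be positive")
    phonemes_per_second = frame_rate_hz / frames_per_phoneme
    return 60.0 * phonemes_per_second / phonemes_per_word


def max_frames_per_phoneme(frame_rate_hz, target_wpm, phonemes_per_word):
    """Largest number of frames each phoneme may occupy to reach a target rate."""
    if min(frame_rate_hz, target_wpm, phonemes_per_word) <= 0:
        raise ValueError("all parameters must be positive")
    return frame_rate_hz * 60.0 / (target_wpm * phonemes_per_word)
```

With the placeholder values, `words_per_minute(30, 6, 4)` gives 75.0 wpm, and `max_frames_per_phoneme(30, 100, 4)` gives 4.5 frames. Under these assumptions, each phoneme would have to be drawn and morphed in fewer than five frames to exceed 100 wpm, which suggests why legibility and speed pull in opposite directions for a phoneme-by-phoneme animation.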

In removing a proportion of the temporal component of speech, and encoding learned rules of interpretation – such as punctuation to indicate separations in related and unrelated ideas, and to distinguish questions from declarative statements – the reading and writing of text has the potential to transmit information at far greater speeds than is possible through speech.

The processing of words becomes more and more preattentive, and therefore more rapid, to readers as they develop in experience. Readers can further reduce the time involved in reading by reducing their subvocalisation, creating yet greater distinction between auditory and written means of language interpretation.

APS does, nevertheless, address the problem of limited area for text display on a small VR HMD screen by displaying relatively large characters in a relatively small area when compared to static text. A developer may choose to implement such a system if the drawbacks of APS (reduced reading speed, higher cognitive load) are not of concern, and its benefits (reduced area, truer speech emulation) are desired.

APS demonstration 4 – HUD
http://playground.eca.ed.ac.uk/~s0937900/hud.html

It is most likely that neither using APS nor the direct display of bodies of text is the way forward in VR, the aim of which should be to simulate reality. Floating text or morphing phonetic symbols would stand out as aberrations in an otherwise natural-looking environment. Speech, although man-made, is less contrived and feels more natural than potentially immersion-breaking written instructions. VR games and applications will likely be best suited to using disembodied voices or, less uncannily, virtual avatars, to guide users through a virtual environment.

If large amounts of detailed information must be quickly given to a user, static text may remain the best available solution, owing to its relatively high bandwidth for fast readers.

Other solutions to address this problem will surely be developed and perfected over time, and may trend away from textual representation of information and towards ideographic and gestural visual cues.

In its current state, APS does not have a fully organic, human quality to its appearance. APS has been simplified for clarity, so subtleties in the way facial anatomy moves when vocalising are lost. Whilst this might detract from its fidelity in visually emulating human speech, it is of benefit if it is used for a purpose where a less-than-human, or even a robotic, quality is desirable.

This might, for example, be in the animation of cartoon robots, or of speaking AI avatars, such as Apple’s Siri or Microsoft’s Cortana. Synthesised speech mimics human sounds whilst being generated computationally – a process mirrored by APS – so such a juxtaposition seems apt.

APS demonstration 5 – Robot character speech animation
http://playground.eca.ed.ac.uk/~s0937900/robot.html

APS could also be used to replace speech in situations where text is an inappropriate or undesirable substitute. Text can replace speech in most cases where a visual equivalent is required, such as subtitles for individuals with hearing impairments, though as previously mentioned, text is not as closely tied to the temporal aspects of speech as is APS, and scrolling or RSVP have less temporal resolution owing to their phrase-by-phrase or word-by-word animation, unlike APS, which is animated phoneme-by-phoneme. An example where APS might be preferable to text subtitles is in the visualisation of a narrator’s speech in a film or game.

Acknowledgements

I’d like to thank John Lee for his feedback and supervision throughout the process of my Masters dissertation. I would also like to thank Jules Rawlinson and Andrew Connor for their insightful feedback during the interim project review.

References

Ager, S., 2016. Ancient Egyptian scripts (hieroglyphs, and ). [Online] Available at: http://www.omniglot.com/writing/egyptian.htm [Accessed 17 Aug 2016].

Ager, S., 2016. International Phonetic Alphabet (IPA). [Online] Available at: http://www.omniglot.com/writing/ipa.htm [Accessed 17 Aug 2016].

Ager, S., 2016. Korean alphabet, pronunciation, and language. [Online] Available at: http://www.omniglot.com/writing/korean.htm [Accessed 17 Aug 2016].

Bell, A. M., 1867. Visible Speech: the Science of Universal Alphabetics. 1st ed. London: Simpkin, Marshall & Co.

Bellantoni, J. & Woolman, M., 1999. Type in Motion. New York City: Rizzoli.

Birkholz, P. & Pappalardo, F., 2013. Modeling Consonant-Vowel Coarticulation for Articulatory Speech Synthesis. PLoS One, 8(4).

Coulmas, F., 1999. The Blackwell Encyclopedia of Writing Systems. Revised ed. Oxford: Wiley-Blackwell.

Cross, D. V., Lane, H. L. & Sheppard, W. C., 1965. IDENTIFICATION AND DISCRIMINATION FUNCTIONS FOR A VISUAL CONTINUUM AND THEIR RELATION TO THE MOTOR THEORY OF SPEECH PERCEPTION. Journal of Experimental Psychology, 70(1), pp. 63-74.

Crystal, D., 2008. A Dictionary of Linguistics and Phonetics. 6th ed. Malden: Blackwell Publishing Ltd.

Elder, J., 1992. Contour closure and the perception of shape. McGill University.

EncyclopaediaBrittanica, 2016. parchment | writing material. [Online] Available at: https://www.britannica.com/topic/parchment [Accessed 17 Aug 2016].

Few, S., 2004. Perceptual Edge. [Online] Available at: http://www.perceptualedge.com/articles/ie/visual_perception.pdf [Accessed 17 Aug 2016].

GeorgiaTech, 2006. The Invention of Paper. [Online] Available at: http://www.ipst.gatech.edu/amp/collection/museum_invention_paper.htm [Accessed 17 Aug 2016].

GeorgiaTech, 2006. The Spread of Papermaking in Europe. [Online] Available at: http://www.ipst.gatech.edu/amp/collection/museum_pm_euro.htm [Accessed 17 Aug 2016].

Gregg, D. J. R., 1922. The Basic Principles of Gregg Shorthand. New York City: s.n.

Humes, L. E., Busey, T. A., Craig, J. C. & Kewley-Port, D., 2009. The effects of age on sensory thresholds and temporal gap detection in hearing, vision, and touch. Attention, Perception, & Psychophysics, 71(4), pp. 860-871.

IPA, 2016. Full IPA Chart | International Phonetic Association. [Online] Available at: https://www.internationalphoneticassociation.org/content/full-ipa-chart [Accessed 17 Aug 2016].

Lodigiani, V., 2014. THE ILLUSION OF LIFE. [Online] Available at: http://the12principles.tumblr.com/ [Accessed 17 Aug 2016].

Lo, L., 2012. Ancientscripts.com. [Online] Available at: http://www.ancientscripts.com/index.html [Accessed 17 Aug 2016].

Loprieno, A., 2009. Ancient Egyptian: A Linguistic Introduction. Online ed. Cambridge: Cambridge University Press.

Mehler, J., Dommergues, J. Y., Frauenfelder, U. & Segui, J., 1981. The Syllable's Role in Speech Segmentation. Journal of Verbal Learning and Verbal Behaviour, Volume 20, pp. 298-305.

Öquista, G. & Goldstein, M., 2003. Towards an improved readability on mobile devices: evaluating adaptive rapid serial visual presentation. Interacting with Computers, Volume 15, pp. 539-558.

Pelli, G. G., Farell, B. & Moore, D. C., 2003. The remarkable inefficiency of word recognition. Nature, Volume 423, pp. 752-756.

Reicher, G. M., 1969. Perceptual Recognition as a Function of Meaningfulness of Stimulus Material. Journal of Experimental Psychology, 81(2), pp. 275-280.

Saltzman, I. J. & Garner, W. R., 1948. Reaction time as a measure of span of attention. Journal of Psychology, Volume 25, pp. 227-241.

Sampson, G., 1985. Writing systems. London: Hutchinson.

Stuart, G. W., Terence, R. J. & Bossomaier, S., 1993. Preattentive processing of object size: implications for theories of size perception. Perception, Volume 22, pp. 1175-1193.

Tudor, G. G., 1995. PHYSIOALPHABET AKA HUMAN PHYSIOLOGICAL ALPHABET. [Online] Available at: http://www.physioalphabet.com/ [Accessed 17 Aug 2016].

Unity, 2016. USER INTERFACES FOR VR. [Online] Available at: https://unity3d.com/learn/tutorials/topics/virtual-reality/user-interfaces-vr [Accessed 17 Aug 2016].

UTexas-Austin, n.d. Early Writing. [Online] Available at: http://www.hrc.utexas.edu/educator/modules/gutenberg/books/early/ [Accessed 17 Aug 2016].

Wickens, C. D., 2002. Multiple resources and performance prediction. Theoretical Issues in Ergonomics Science, 3(2), pp. 159-177.

Wilford, J. N., 1999. Who Began Writing? Many Theories, Few Answers. The New York Times, 6 April.

WordsAPI, 2014. Words API. [Online] Available at: https://www.wordsapi.com/ [Accessed 17 Aug 2016].

Appendix

Appendix 1

From Visible Speech: the Science of Universal Alphabetics, Bell 1867

Appendix 2

From Physioalphabet - Pulmonic consonants and vowels, Tudor 1995 http://www.physioalphabet.com/

Appendix 3

APS Pulmonic consonant chart http://playground.eca.ed.ac.uk/~s0937900/pc.html

Appendix 4

APS Pulmonic Consonant Articulation Manners

Panels: Plosive (unvoiced), Plosive (voiced), Nasal, Flap/Tap, Trill, Approximant, Fricative, Lateral approximant, Lateral fricative

These are drawn at the point of obstruction of the airways.

Appendix 5

APS Pulmonic Consonant Articulation Places

Panels: Dental, Bilabial, Labiodental, Alveolar, Post-alveolar, Retroflex, Uvular, Velar, Palatal, Pharyngeal, Glottal

Appendix 6

APS Voicing Indicators

Panels: Unvoiced Indicator, Voiced Indicator

Appendix 7

APS Vowel Chart http://playground.eca.ed.ac.uk/~s0937900/pc.html
