
(11) EP 3 026 541 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 01.06.2016 Bulletin 2016/22

(51) Int Cl.: G06F 3/0482 (2013.01)

(21) Application number: 16150079.8

(22) Date of filing: 01.09.2009

(84) Designated Contracting States: AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

(30) Priority: 05.09.2008 US 205780

(62) Document number(s) of the earlier application(s) in accordance with Art. 76 EPC: 09792124.1 / 2 329 348

(71) Applicant: Apple Inc., Cupertino, CA 95014 (US)

(72) Inventors:
• MASON, James, Cupertino, CA California 95014 (US)
• BOETTCHER, Jesse, Cupertino, CA California 95014 (US)

(74) Representative: Gillard, Matthew Paul, Withers & Rogers LLP, 4 More London Riverside, London SE1 2AU (GB)

Remarks:
• Claims filed after the date of filing of the application (Rule 68(4) EPC).
• This application was filed on 04-01-2016 as a divisional application to the application mentioned under INID code 62.

(54) MULTI-TIERED VOICE FEEDBACK IN AN ELECTRONIC DEVICE

(57) This invention is directed to providing voice feedback to a user of an electronic device. Because each electronic device display may include several speakable elements (i.e., elements for which voice feedback is provided), the elements may be ordered. To do so, the electronic device may associate a tier with the display of each speakable element. The electronic device may then provide voice feedback for displayed speakable elements based on the associated tier. To reduce the complexity in designing the voice feedback system, the voice feedback features may be integrated in a Model View Controller (MVC) design used for displaying content to a user. For example, the model and view of the MVC design may include additional variables associated with speakable properties. The electronic device may receive audio files for each speakable element using any suitable approach, including for example by providing a host device with a list of speakable elements and directing a text to speech engine of the host device to generate and provide the audio files.

Printed by Jouve, 75001 PARIS (FR)

Description in a predetermined order (e.g., based on tiers associated with each displayed object). Background of the Invention [0006] In some embodiments, a method, electronic de- vice, and computer readable media for providing voice [0001] This invention is directed to providing multi-5 feedback to a user of an electronic device may be pro- tiered voice feedback in an electronic device. vided. The electronic device may display several ele- [0002] Many electronic devices provide a significant ments and identify at least two of the elements for which number of features or operations accessible to a user. to provide voice feedback. The electronic device may The number of available features or operations may often determine a tier associated with the display of each of exceed the number of inputs available using an input10 the identified elements, where the tier defines the relative mechanism of the electronic device. To allow users to importance of each displayed element. The electronic access electronic device operations that are not specif- device may then provide voice feedback for the identified ically tied to particular inputs (e.g., inputs not associated elements in an order of the determined tiers, for example with a key sequence or button press, such as a MENU such that voice feedback is first provided for the most button on an iPod, available from Apple Inc.), the elec- 15 important element, and subsequently provided for the tronic device may provide menus with selectable options, next most important element until voice feedback has where the options are associated with electronic device been provided for each element. operations. For example, an electronic device may dis- [0007] In some embodiments, a method, electronic de- play a menu with selectable options on a display, for ex- vice, and computer readable media for providing audio ample in response to receiving an input associated with 20 feedback for displayed content may be provided. 
The the menu from an input mechanism (e.g., a MENU but- electronic device may direct a display to display several ton). elements, where speakable properties are associated [0003] Because the menu is typically displayed on an with at least two of the elements. The electronic device electronic device display, a user may be required to look may determine a tier associated with each of the at least at thedisplay to select aparticular option. This may some- 25 two elements and generate a queue that includes the at times not be desirable. For example, if a user desires to least two elements. The determined tiers may set the conserve power (e.g., in a portable electronic device), order of the elements in the generated queue. The elec- requiring the electronic device to display a menu and tronic device may direct an audio output to sequentially move a highlight region navigated by the user to provide speak each queue element in the order of the queue, a selection may use up power. As another example, if a 30 where the audio output includes voice feedback associ- user is in a dark environment and the display does not ated with each of the at least two elements. include back lighting, the user may not be able to distin- [0008] In some embodiments, a method, electronic de- guish displayed options of the menu. As still another ex- vice and computer readable media for speaking the text ample, if a user is blind or visually impaired, the user may of elements displayed by an electronic device may be not be able to view a displayed menu. 35 provided. The electronic device may display several el- [0004] To overcome this issue, some systems may ements with which speakable properties are associated. provide audio feedback in response to detecting an input The speakable properties may identify, for each element, from a user or a change in battery status, as described text to speak. The electronic device may display the sev- in commonly assigned U.S. Patent Publication No. 
eral elements in several views, where each view is as- 2008/0129520, entitled "ELECTRONIC DEVICE WITH 40 sociatedwith speakable order. The electronic devicemay ENHANCED AUDIO FEEDBACK" (Attorney Docket No. generate a queue that includes the several elements, P4250US1, which is incorporated by reference herein in where the order of the elements in the queue is set from its entirety. In some cases, the electronic device may the speakable order of each view (e.g., such that ele- provide voice feedback describing options that a user ments with a higher speakable order are at the beginning may select or operations that the user may direct the45 of the queue). The electronic device may wait for a first electronic device to perform. If several menus are simul- timeout to lapse and identify audio files associated with taneously displayed, or if a display includes different each of the elements of the queue. During the first time- modules or display areas (e.g., several views), the elec- out, the electronic device may modify audio playback to tronic device may have difficulty determining the objects make speech easier to hear and to prevent the electronic or menu options, or the order of objects or menu options, 50 device from speaking while a transaction is detected. The for which to provide a voice feedback. audio files may include the spoken speakable property text to speak for each element. The electronic device Summary of the Invention may sequentially play back the identified audio files in the order of the queue and pause for a second timeout. [0005] This invention is directed to systems and meth- 55 The second timeout may allow the electronic device to ods for providing multi-tiered voice feedback to a user. return audio playback to the pre-speaking configuration In particular, this invention is directed to providing voice (e.g., playback). 
In some embodiments, the elec- feedback for several displayed objects (e.g., menu items) tronic device may receive the audio files from a host de-

2 3 EP 3 026 541 A1 4 vice that generates the audio files using a text to speech FIG. 14 is a flowchart of an illustrative process for engine from the speakable property text to speak for each providing static strings to an electronic device; and element. FIG. 15 is a flowchart of an illustrative process for providing dynamic strings to an electronic device. Brief Description of the Drawings 5 Detailed Description [0009] The above and other features of the present invention, its nature and various advantages will be more [0010] An electronic device operative to provide selec- apparent upon consideration of the following detailed de- tive voice feedback based on tiers associated with dis- scription, taken in conjunction with the accompanying 10 played options is provided. drawings in which: [0011] The electronic device may include a processor and a display. The electronic device may display any FIG. 1 is a schematic view of a electronic device in suitable information to the user. For example, a display accordance with one embodiment of the invention; may include a title bar, a menu with selectable options, FIG. 2 is a schematic view of an illustrative display 15 an information region for displaying information related screen having content for which voice feedback may to one or more options, information identifying media or be available in accordance with one embodiment of files available for selection, or any other suitable infor- the invention; mation. As the user accesses the display, the electronic FIG. 3 is a schematic view of an illustrative queue of device may provide voice feedback for the different dis- speakable items for playback associated with the 20 played elements. display of FIG. 2 in accordance with one embodiment [0012] Each displayed element may be associated of the invention; with different properties. In some embodiments, dis- FIG. 
4 is a schematic view of an electronic device played elements for which voice feedback is to be pro- display after receiving a user selection of an option vided may be associated with a speakable property. The of the display of FIG. 2 in accordance with one em- 25 speakable property may include the text to be spoken for bodiment of the invention; the associated element. In addition, each element, as FIG. 5 is a schematic view of an illustrative queue of part of a view implemented for displaying the element, speakable items for playback associated with the may be associated with a speakable order or tier. As an display of FIG. 4 in accordance with one embodiment electronic device displays elements (e.g., as part of the of the invention; 30 view), the electronic device may determine, from the FIG. 6 is a schematic view of the electronic device speakable properties and the speakable orders, the text display of FIG. 4 having a different marked option in for which to provide voice feedback (e.g., the text to accordance with one embodiment of the invention; speak) and the order or tiers associated with each ele- FIG. 7 is a schematic view of an illustrative queue of ment. The electronic device may select the element hav- speakable items for playback associated with the 35 ing the highest tier and provide voice feedback (e.g., display of FIG. 6 in accordance with one embodiment speak) for the selected element. The electronic device of the invention; may then successively select each element having the FIG. 8 is a schematic view of an electronic device next highest tier and provide voice feedback for the sub- display provided in response to a user selecting the sequent elements in tier order (e.g., using a queue in highlighted menu option of FIG. 6 in accordance with 40 which the order of elements is set by the tiers associated one embodiment of the invention; witheach element). Elements thatdo not includea speak- FIG. 
9 is a schematic view of an illustrative queue of able property or speakable order (e.g., elements for speakable items for playback associated with the which no voice feedback is provided) may be ignored or display of FIG. 8 in accordance with one embodiment skipped by the electronic device as it provides voice feed- of the invention; 45 back. FIG. 10 is a schematic view of an illustrative "Now [0013] The electronic device may determine which el- Playing" display in accordance with one embodiment ement to speak at a particular time using any suitable of the invention; approach. In some embodiments, the electronic device FIG. 11 is a schematic view of an illustrative queue may provide voice feedback in response to detecting a of speakable items for a Now Playing display in ac- 50 transaction (e.g., a decision regarding what elements can cordance with one embodiment of the invention; be spoken). For example, the electronic device may de- FIG. 12 is an illustrative state diagram for speaking tect a transaction in response to determining that the dis- speakable strings in accordance with one embodi- play has transitioned, or in response to receiving a user ment of the invention; action causing the display to change (e.g., the user se- FIG. 13 is a schematic view of an illustrative com- 55 lected an option or moved a highlight region). In response munications system including an electronic device to detecting a transaction, the electronic device may iden- and a host device in accordance with one embodi- tify the speakable elements of the updated display, and ment of the invention; the tiers associated with the speakable elements (e.g.,

3 5 EP 3 026 541 A1 6 elements within the transaction to speak in order). The firmware, user preference information data (e.g., media electronic device may then create a new queue of ele- playback preferences), authentication information (e.g. ments for which voice feedback is to be provided based libraries of data associated with authorized users), life- on the identified elements of the updated display, and style information data (e.g., food preferences), exercise provide voice feedback based on the newly created5 information data (e.g., information obtained by exercise queue. In some embodiments, the new queue may be monitoring equipment), transaction information data constructed by replacing unspoken equal or lower tier (e.g., information such as credit card information), wire- items of an existing queue. The particular elements spo- less connection information data (e.g., information that ken, and the order in which the elements are spoken may may enable electronic device 100 to establish a wireless change with each transaction. 10 connection), subscription information data (e.g., informa- [0014] The audio files that are played back in response tion that keeps track of or television shows or to receiving an instruction to provide voice feedback for other media a user subscribes to), contact information a particular displayed element may be generated using data (e.g., telephone numbers and email addresses), cal- any suitable approach. In some embodiments, to provide endar information data, and any other suitable data or high quality audio using a text to speech (TTS) engine, 15 any combination thereof. the audio files may be received from a host device con- [0018] Memory 106 can include cache memory, semi- nected to the electronic device. 
This approach may be permanent memory such as RAM, and/or one or more particularly desirable if the electronic device has limited different types of memory used for temporarily storing resources (e.g., inherent memory, processing and power data. In some embodiments, memory 106 can also be limitations due to the portability of the electronic device). 20 used for storing data used to operate electronic device The electronic device may provide a host device with a applications, or any other type of data that may be stored file listing strings associated with each element to be spo- in storage 104. In some embodiments, memory 106 and ken by the device. The host device may then convert the storage 104 may be combined as a single storage me- strings to speech using a text-to-speech engine and pro- dium. vide the audio files of the speech to the electronic device. 25 [0019] Input mechanism 108 may provide inputs to in- The electronic device may then consult a mapping of put/output circuitry of the electronic device. Input mech- strings to audio files to provide the proper audio file for anism 108 may include any suitable input mechanism, playback in response to determining that voice feedback such as for example, a button, keypad, dial, a click wheel, for a displayed element is to be provided. or a touch screen. In some embodiments, electronic de- [0015] FIG. 1 is a schematic view of a electronic device 30 vice 100 may include a capacitive sensing mechanism, in accordance with one embodiment of the invention. or a multi-touch capacitive sensing mechanism. Some Electronic device 100 may include processor 102, stor- sensing mechanisms are described in commonly owned age 104, memory 106, input mechanism 108, audio out- U.S. Patent Application No. 10/902,964, filed July 10, put 110, display 112, and communications circuitry 114. 2004, entitled "Gestures for Touch Sensitive Input De- In some embodiments, one or more of electronic device 35 vice," and U.S. 
Patent Application No. 11/028,590, filed components 100 may be combined or omitted (e.g., com- January 18, 2005, entitled "Mode-Based Graphical User bine storage 104 and memory 106). In some embodi- Interfaces for Touch Sensitive Input Device," both of ments, electronic device 100 may include other compo- which are incorporated herein in their entirety. nents not combined or included in those shown in FIG. [0020] Audio output 110 may include one or more 1 (e.g., a power supply or a bus), or several instances of 40 speakers (e.g., mono or stereo speakers) built into elec- the components shown in FIG. 1. For the sake of sim- tronic device 100, or an audio connector (e.g., an audio plicity, only one of each of the components is shown in jack or an appropriate Bluetooth connection) operative FIG. 1. to be coupled to an audio output mechanism. For exam- [0016] Processor 102 may include any processing cir- ple, audio output 110 may be operative to provide audio cuitry operative to control the operations and perform- 45 data using a wired or wireless connection to a headset, ance of electronic device 100. For example, processor headphones or earbuds. 100 may be used to run applications, [0021] Display 112 may include display circuitry (e.g., firmware applications, media playback applications, me- a screen or projection system) for providing a display dia editing applications, or any other application. In some visible to the user. For example, display 112 may include embodiments, a processor may drive a display and proc- 50 a screen (e.g., an LCD screen) that is incorporated in ess inputs received from a user interface. electronic device 100. 
As another example, display 112 [0017] Storage 104 may include, for example, one or may include a movable display or a projecting system for more storage mediums including a hard-drive, solid state providing a display of content on a surface remote from drive, flash memory, permanent memory such as ROM, electronic device 100 (e.g., a video projector). In some any other suitable type of storage component, or any55 embodiments, display 112 can include a coder/decoder combination thereof. Storage 104 may store, for exam- (Codec) to convert digital media data into analog signals. ple, media data (e.g., music and video files), application For example, display 112 (or other appropriate circuitry data (e.g., for implementing functions on device 100), within electronic device 100) may include video Codecs,

4 7 EP 3 026 541 A1 8 audio Codecs, or any other suitable type of Codec. [0026] The electronic device may provide voice feed- [0022] Display 112 also can include display driver cir- back for any suitable displayed content, including for ex- cuitry, circuitry for driving display drivers, or both. Display ample menu options or content available for playback to 112may be operativeto display content (e.g., media play- a user (e.g., voice feedback for metadata associated with back information, application screens for applications im- 5 media, such as an artist name, media title, or album). plemented on the electronic device, information regard- FIG. 2 is a schematic view of an illustrative display screen ing ongoing communications operations, information re- having content for which voice feedback may be available garding incoming communications requests, or device in accordance with one embodiment of the invention. Dis- operation screens) under the direction of processor 102. play 200 may include several areas on which content is [0023] One or more of input mechanism 108, audio 10 displayed. For example, display 200 may include title bar output 110 and display 112 may be coupled to input/out- 210, menu 220 and additional information 230. Title bar put circuitry. The input/output circuitry may be operative 210 may include title 212 indicating the mode or applica- to convert (and encode/decode, if necessary) analog sig- tion in use by the electronic device. For example, title nals and other signals into digital data. In some embod- 212 may include iPod (e.g., the top most title when no iments, the input/output circuitry can also convert digital 15 application has been selected), Music, Videos, Photos, data into any other type of signal, and vice-versa. For Podcasts, Extras, and Settings. 
Other titles may be avail- example, the input/output circuitry may receive and con- able, for example when an accessory device is coupled vert physical contact inputs (e.g., from a multi-touch to the electronic device (e.g., a radio accessory or work- screen), physical movements (e.g., from a mouse or sen- out accessory). Title bar 210 may also include any other sor), analog audio signals (e.g., from a microphone), or 20 suitable information, including for example battery indi- any other input. The digital data can be provided to and cator 214. received from processor 102, storage 104, memory 106, [0027] Menu 220 may include several selectable op- or any other component of electronic device 100. In some tions 222, including for example options for selecting a embodiments, several instances of the input/output cir- mode or application, or options associated with a partic- cuitry can be included in electronic device 100. 25 ular selected mode or application. A user may select an [0024] Communications circuitry 114 may be operative option from menu 220 by navigating highlight region 224 to communicate with other devices or with one or more over an option. The user may provide a selection instruc- servers using any suitable communications protocol. tion (e.g., by pressing a button or providing any other Electronic device 100 may include one more instances suitable input) while the highlight region is over a partic- of communications circuitry for simultaneously perform- 30 ular option to select the particular option. Additional in- ing several communications operations using different formation 230 may include any suitable information, in- communications networks. 
For example, communica- cluding for example information associated with the tions circuitry may support Wi-Fi (e.g., a 802.11 protocol), mode or application identified by title 212, one or more Ethernet, Bluetooth™ (which is a trademark owned by displayed options 222, the particular option identified by Bluetooth Sig, Inc.), radio frequency systems, cellular 35 highlight region 224, or any other suitable information. networks (e.g., GSM, AMPS, GPRS, CDMA, EV-DO, [0028] The electronic device may generate display EDGE, 3GSM, DECT, IS-136/TDMA, iDen, LTE or any 200, or any other display using any suitable approach. other suitable cellular network or protocol), infrared, In some embodiments, a Model-View-Controller (MVC) TCP/IP (e.g., any of the protocols used in each of the architecture or design may be used. The model may in- TCP/IP layers), HTTP, BitTorrent, FTP, RTP, RTSP,40 clude any suitable information coupled to a view for dis- SSH, Voice over IP (VOIP), any other communications play by a controller (e.g., the controller may query the protocol, or any combination thereof. In some embodi- model to construct views, or modify a view’s connection ments, communications circuitry 114 may include one or to a model at runtime). For example, a model may include more communications ports operative to provide a wired one or more strings or images. Each view may be con- communications link between electronic device 100 and 45 figured to display (e.g., support) one or more types of a host device. For example, a portable electronic device element. The view may pass the supported types to a may include one or more connectors (e.g., 30 pin con- get_Property call, in response to which the model may nectors or USB connectors) operative to receive a cable provide data associated with the supported type to the coupling the portable electronic device to a host compu- view for display by the device. Several views may be ter. Using software on the host computer (e.g. 
iTunes 50 combined to form each display. For example, display 200 available from Apple Inc.), the portable electronic device may include at least one view for each area of the display. may communicate with the host computer. [0029] To facilitate providing voice feedback for dis- [0025] In some embodiments, electronic device 100 played content, the electronic device may incorporate may include a bus operative to provide a data transfer voice feedback variables and settings in the MVC archi- path for transferring data to, from, or between control 55 tecture associated with the actual display of content. In processor 102, storage 104, memory 106, input/output some embodiments, the model may include an additional circuitry 108, sensor 110, and any other component in- speakable property field. The speakable property field cluded in the electronic device. may include any suitable information necessary or useful

5 9 EP 3 026 541 A1 10 for providing voice feedback. In some embodiments, the any suitable combination of displayed elements. For ex- speakable property field may include an indication that ample, the electronic device may speak only one menu voice feedback is to be provided (e.g., a toggled setting). item (e.g., the menu item identified by a highlight region). The electronic device may determine the text to speak As another example, the electronic device may speak using any suitable approach. In some embodiments, the 5 several menu items (e.g., all menu items that come after view or scheduling system may query the property ID of the highlighted menu item). As still another example, the the type associated with the view. In some embodiments, electronic device may speak all menu items. To ensure a fixed size ID generated from a property ID (e.g., using that the electronic device first speaks the menu item iden- a hash table) may instead or in addition be provided to tified by the highlight region, the electronic device may identify the text for which to provide voice feedback. In 10 associate a higher tier or order to the corresponding some embodiments, the speakable property may instead menu item. This discussion will interchangeably use the or in addition include a string of text to be spoken by the terms "speaking" a speakable element or string and electronic device, or a pointer to the field having the text "playing an audio file" associated with a speakable ele- to be displayed in the model. ment or string to describe providing voice feedback for a [0030] The electronic device may incorporate the tier 15 speakable element. or importance in any suitable component of the MVC ar- [0033] In some embodiments, the speech scheduler chitecture, including for example as a speakable order may only include one speakable element for each tier of variable associated with each view. The speakable order each view in the queue. 
may provide an indication of the importance of the speakable element displayed in the corresponding view, for example relative to other text in other views that may be displayed. The indication may include, for example, a tier of speech. The electronic device may define any suitable speakable order or tier, including for example, context (e.g., associated with menu titles), focus (e.g., list control, such as highlight region position), choice (e.g., an option associated with an item on a list), property (e.g., a detailed description or lyrics for media), detail, and idle. Each view may be associated with one or more tiers or speakable orders, for example based on the model or elements displayed in the view. For example, a view may be associated with several tiers if a menu option and associated setting (e.g., Backlight option 224 and setting 226) are simultaneously displayed within a view. Alternatively, the menu option and setting may be provided in different views.

[0031] If a view or several views are displayed as part of a display, the electronic device may retrieve from the model the elements to display, and the manner in which to display the elements. In addition, the electronic device may retrieve the speakable properties from each model and the speakable order from each displayed view. The electronic device may provide voice feedback for any suitable speakable element of a display. For example, the electronic device may provide voice feedback for one or more views. As another example, the electronic device may provide voice feedback for one or more elements in a particular view. In some embodiments, the electronic device may provide voice feedback, in a particular view, for only one element at each tier (e.g., provide voice feedback for only one element in menu 220, where each option is associated with a particular tier).

[0032] To provide voice feedback for displayed speakable elements in the proper order, a speech scheduler of the electronic device may define a queue of items for which to provide voice feedback (e.g., speakable items) in which the speakable order or tier sets the order of the elements in the queue. The electronic device may speak

This may provide an easy mechanism, for example, for the electronic device to speak only a menu item that is highlighted (e.g., only speak "Music" and not the other items in menu 220 by assigning the Focus tier only to the "Music" menu option). If, within a transaction, several displayed items change within a view at a given tier, the speech scheduler may only place the most recent changed item in the queue. To provide voice feedback for several items associated with a same speakable order in a single transaction, the electronic device may display the several items in distinct views associated with the same speakable order. The speech scheduler may use any suitable approach for providing voice feedback for different elements of views having the same tier (e.g., Idle tier in a Now Playing display, described below in more detail). For example, the speech scheduler may follow the order of the elements in one or more resource files, an order based on the graphical position of the views, alphabetically, or using any suitable order.

[0034] FIG. 3 is a schematic view of an illustrative queue of speakable items for playback associated with the display of FIG. 2 in accordance with one embodiment of the invention. Queue 300 may be depicted using any suitable approach. In the example of FIG. 3, queue 300 may include list 310 of speakable strings to speak successively. Each speakable string, as part of a view, may be associated with a speakable tier, identified in corresponding column 340. Using the elements from display 200 (FIG. 2), the speakable strings may include iPod string 312 having Context tier 342 and Music string 313 having Focus tier 343 (e.g., the menu item identified by the highlight region is the only one spoken). In implementations in which all menu items are spoken (e.g., and not only the menu items identified by a highlight region), the speakable strings may include a Videos string, a Photos string, a Podcasts string, an Extras string, a Settings string, a Shuffle Songs string, and a Backlight string, for example all having Choice tiers (e.g., a tier below the Focus tier of Music string 313). In addition, because the Backlight option may be displayed with an associated

setting, queue 300 may also include an On string associated with a Properties tier, which may be spoken after the Backlight string is spoken. In implementations in which only the highlighted option is spoken, the electronic device may assign a Focus tier to the Backlight string and a Choice tier to the On string in response to detecting that the highlight region has been placed over the Backlight option in the menu. The electronic device may identify audio files associated with each of the speakable strings (e.g., using a hash or database) and successively play back each of the identified audio files in the order set by queue 300.

[0035] When the content on the electronic device display changes, the electronic device may modify the voice feedback provided to reflect the changed display. FIG. 4 is a schematic view of an electronic device display after receiving a user selection of an option of the display of FIG. 2 in accordance with one embodiment of the invention. Similar to display 200 (FIG. 2), display 400 may include several areas on which content is displayed. For example, display 400 may include title bar 410, menu 420 and additional information 430. Title bar 410 may include title 412 indicating the mode or application in use by the electronic device. In the example of FIG. 4, title 412 may include Music, indicating the option from menu 220 (FIG. 2) that was selected.

[0036] Menu 420 may include several selectable options 422, including for example options associated with a particular selected mode or application. A user may select an option from menu 420 by navigating highlight region 424 over the option. The user may provide a selection instruction (e.g., by pressing a button or providing any other suitable input) while the highlight region is over a particular option to select the particular option. In the example of FIG. 4, options 422 may include Cover Flow, Playlists, Artists, Albums, Songs, Genres, Composers, Audiobooks, and Search. Additional information 430 may include any suitable information, including for example information associated with the mode or application identified by title 412, one or more displayed options 422, the particular option identified by highlight region 424, or any other suitable information.

[0037] In response to determining that the displayed content has changed (e.g., in response to detecting a transaction), the speech scheduler may update or revise the queue of speakable items providing voice feedback for the display. For example, the speech scheduler may determine the speakable properties associated with each view of the changed display to generate the queue. FIG. 5 is a schematic view of an illustrative queue of speakable items for playback associated with the display of FIG. 4 in accordance with one embodiment of the invention. Queue 500 may be depicted using any suitable approach. In the example of FIG. 5, queue 500 may include list 510 of speakable strings to speak successively. Each speakable string, as part of a view, may be associated with a speakable tier, identified in corresponding column 540. Using the elements from display 400 (FIG. 4), the speakable strings may include Music string 512 having Context tier 542 and Cover Flow string 513 having Focus tier 543 (e.g., the menu option identified by the highlight region). In implementations where all menu items are spoken, queue 500 may include a Playlists string, an Artists string, an Albums string, a Songs string, a Genres string, a Composers string, an Audiobooks string, and a Search string, for example all having a Choice tier (e.g., a tier below Focus tier 543 of Cover Flow string 513). The electronic device may identify audio files associated with each of the speakable strings (e.g., using a hash or database) and successively play back each of the identified audio files in the order set by queue 500.

[0038] In some embodiments, the voice feedback provided by the electronic device may change when the displayed content remains the same, but when a marker controlled by the user (e.g., a highlight region) changes. This may allow a user to identify the action that will be performed in response to a user selection of the option identified by the marker as the user moves the marker. FIG. 6 is a schematic view of the electronic device display of FIG. 4 having a different marked option in accordance with one embodiment of the invention. Similar to display 400 (FIG. 4), display 600 may include several areas on which content is displayed. For example, display 600 may include title bar 610, menu 620 and additional information 630. Title bar 610 may include title 612 indicating the mode or application in use by the electronic device, which may be the same mode (e.g., Music) as display 400.

[0039] Menu 620 may include the same selectable options 622 as display 400. As shown in FIG. 6, a user may have navigated highlight region 624 over an Artists option (e.g., instead of a Cover Flow option as in display 400). The displayed additional information 630 may include any suitable information, including for example information associated with the mode or application identified by title 612, one or more displayed options 622, the particular option identified by highlight region 624, or any other suitable information. In the example of FIGS. 4 and 6, the additional information displayed may be different, reflecting the position of highlight region 624.

[0040] In response to determining that the position of the highlight region has changed (e.g., in response to detecting a transaction), the speech scheduler may update the queue of speakable items providing voice feedback for the display. For example, the speech scheduler may determine the revised, modified or updated speakable properties associated with each view of the changed display to generate the queue. FIG. 7 is a schematic view of an illustrative queue of speakable items for playback associated with the display of FIG. 6 in accordance with one embodiment of the invention. Queue 700 may be depicted using any suitable approach. In the example of FIG. 7, queue 700 may include list 710 of speakable strings to speak successively. Each speakable string, as part of a view, may be associated with a speakable tier, identified in corresponding column 740. Using the elements from display 600 (FIG. 6), the speakable strings

may include Music string 712 having Context tier 742 and Artists string 713 having Focus tier 743 (e.g., the menu option identified by the highlight region). In particular, the listing of speakable strings in queue 700 may be different than that of queue 500 (FIG. 5) to reflect that the highlight region moved down to the Artists option. For example, the speakable strings that would be spoken in queue 500 before queue 700 may be removed from queue 700. The electronic device may identify audio files associated with each of the speakable strings (e.g., using a hash or database) and successively play back each of the identified audio files in the order set by queue 700. In implementations where voice feedback for non-highlighted menu options is provided, queue 700 may include an Albums string, a Songs string, a Genres string, a Composers string, an Audiobooks string, a Search string, a Cover Flow string, and a Playlists string, for example all having a Choice tier (e.g., a tier below Focus tier 743 of Artists string 713). The other menu options may be ordered in any suitable manner, including for example as a repeating list that begins with the menu item identified by the highlight region.

[0041] The electronic device may play back any portion of a speakable option audio file in response to detecting a transaction. In some embodiments, if the electronic device begins playing back the audio files associated with display 200 when the user provides an instruction to access display 400, or the audio files associated with the speakable strings of display 400 as the user moves the highlight region to the position reflected in display 600, the electronic device may selectively stop playing back the audio file or continue playing back the audio file based on at least one of the tier associated with the audio file and the modification of the speech scheduler queue of speakable items. In some embodiments, the speech scheduler may first determine the updated queue, and compare the initial queue to the updated queue. In particular, the speech scheduler may determine, from the beginning of the queues, the portions of the initial queue and updated queue that remain the same, and the position of the updated queue from which the order of speakable elements changes. For example, as the speech scheduler moves from queue 300 to queue 500, the speech scheduler may determine that the queues do not share any common speakable strings and therefore are different from the first position. As another example, as the speech scheduler moves from queue 500 to queue 700, the speech scheduler may determine that the queues share the speakable string associated with the Context tier, but differ starting with the speakable string associated with the Focus tier.

[0042] The speech scheduler may further determine the position on each of the initial queue and the updated queue (if present) of the speakable string for which audio is currently being provided. For example, as the speech scheduler moves from queue 500 to queue 700, the speech scheduler may determine whether the speakable string for which an audio file is played back is the speakable string "Music" (e.g., the speakable string shared by queues 500 and 700) or a different speakable string (e.g., not shared by queues 500 and 700). If the speech scheduler determines that the currently spoken speakable string falls within the speakable strings shared by the initial and updated queues, the speech scheduler may continue to speak or play back the audio associated with the speakable string, and subsequently continue to play back audio associated with the speakable strings of the updated queue in the order set by the updated queue. For example, if the electronic device is playing back the audio associated with the speakable string "Music" (which has a Context tier) as the user causes the display to change from display 400 to display 600, the electronic device may provide the audio associated with the speakable string "Artists" (the next item in the queue associated with display 600) when the electronic device finishes playing back the audio associated with the speakable string "Music" (e.g., instead of the audio associated with the speakable string "Cover Flow," which was the next speakable string in the queue associated with display 400).

[0043] If the speech scheduler instead determines that the currently spoken speakable string does not fall within the range of speakable strings shared by the initial and updated queues, the electronic device may cease playing back the audio associated with the currently spoken speakable string. For example, the electronic device may cease playing back the audio as soon as the speech scheduler determines that the currently spoken speech is not within the range of shared speakable strings. The electronic device may then resume playing back audio associated with any suitable speakable string of the updated queue, including for example speakable strings of the updated queue starting with the speakable string of the updated queue from which the order of speakable elements changed. For example, if the electronic device is currently speaking the speakable string "Cover Flow" as the user causes the electronic device to move from display 400 to display 600, the electronic device may stop playing back the audio associated with the speakable string "Cover Flow" (e.g., and only play back the audio for "Cover") and begin playing back the audio associated with the speakable string "Artists" (e.g., the first speakable string of queue 700 that is different from queue 500). In implementations in which all menu items are spoken, if the electronic device is currently speaking the speakable string "Genres" as the user causes the electronic device to move from display 400 to display 600, the electronic device may stop playing back the audio associated with the speakable string "Genres" and begin playing back the audio associated with the speakable string "Artists." The speakable string "Genres" may then be spoken again when it is reached in the queue associated with display 600 (e.g., queue 700). Accordingly, if a user moves a highlight region along the options displayed in display 400 at an appropriate speed, the electronic device may only play back portions (e.g., the first syllables) of each

of the options of display 400.

[0044] In some embodiments, the electronic device may provide voice feedback for menu items that are not statically provided by the electronic device firmware or operating system. For example, the electronic device may provide voice feedback for dynamic strings generated based on content provided by the user to the electronic device (e.g., from a host device). In some embodiments, the electronic device may provide voice feedback for media transferred to the electronic device by a user (e.g., based on metadata associated with the transferred media). FIG. 8 is a schematic view of an electronic device display provided in response to a user selecting the highlighted menu option of FIG. 6 in accordance with one embodiment of the invention. Similar to display 600 (FIG. 6), display 800 may include several areas on which content is displayed. For example, display 800 may include title bar 810, menu 820 and additional information 830. Title bar 810 may include title 812 indicating the mode or application in use by the electronic device (e.g., "Artists").

[0045] Menu 820 may include any suitable listing associated with "Artists" mode, including for example listing 822 of the artist names for media available to the electronic device (e.g., media stored by the electronic device). The electronic device may gather the artist names using any suitable approach, including for example from metadata associated with the media. The displayed additional information 830 may include any suitable information, including for example information associated with one or more artists identified in menu 820 (e.g., information related to the media available from the artist identified by highlight region 824), or the mode or application identified by title 612.

[0046] In response to detecting a transaction (e.g., a user selection of the Artists option in display 600, FIG. 6), the speech scheduler may update the queue of speakable items to reflect the displayed dynamic artist names. For example, the speech scheduler may determine the revised, modified or updated speakable properties associated with each view of the changed display to generate the queue. FIG. 9 is a schematic view of an illustrative queue of speakable items for playback associated with the display of FIG. 8 in accordance with one embodiment of the invention. Queue 900 may be depicted using any suitable approach. In the example of FIG. 9, queue 900 may include list 910 of speakable strings to speak successively. Each speakable string, as part of a view, may be associated with a speakable tier, identified in corresponding column 940. Using the elements from display 800 (FIG. 8), the speakable strings may include Artists string 912 having Context tier 942 and Common string 913 having Focus tier 943 (e.g., the artist identified by the highlight region). In implementations where voice feedback for non-highlighted menu options is provided, queue 900 may include a The Corrs string, a Craig David string, a Creed string, a D12 string, a Da Brat string, and a Daniel Beddingfield string, for example all having a Choice tier (e.g., a tier below Focus tier 843 of Common string 813). The other artists may be ordered in any suitable manner, including for example as a repeating list that begins with the artist identified by the highlight region.

[0047] In some embodiments, the electronic device may selectively provide voice feedback based on the status of media playback. For example, the electronic device may not provide voice feedback for particular elements or in a particular mode when the electronic device is playing back media. FIG. 10 is a schematic view of an illustrative "Now Playing" display in accordance with one embodiment of the invention. Display 1000 may include title bar 1010, menu 1020 and additional information 1030. Title bar 1010 may include title 1012 indicating the mode or application in use by the electronic device. For example, title 1012 may include iPod (e.g., the top most title when no application has been selected), Music, Videos, Photos, Podcasts, Extras, Settings, and Now Playing. Title bar 1010 may also include any other suitable information, including for example battery indicator 1014.

[0048] Menu 1020 may include several selectable options 1022, including for example options for selecting a mode or application, or options associated with a particular selected mode or application. A user may select an option from menu 1020 by navigating highlight region 1024 over an option. The user may provide a selection instruction (e.g., by pressing a button or providing any other suitable input) while the highlight region is placed over a particular option to select the particular option. For example, to view information related to media that is currently being played back (e.g., currently playing or paused media), the user may select a Now Playing option. In response to receiving a user selection of the Now Playing option, the electronic device may display additional information 1030 related to the now playing media. For example, additional information 1030 may include artist 1032, title 1034, and album 1036 overlaid on album art. In some embodiments, each of artist 1032, title 1034 and album 1036 may be associated with the same or different views (e.g., different views to allow for voice feedback of the additional information using the same tier for all of the additional information elements).

[0049] In response to receiving a selection of the Now Playing option of display 1000 (FIG. 10), the speech scheduler may update the queue of speakable items to speak one or more strings related to the now playing media. For example, the speech scheduler may determine the revised, modified or updated speakable properties associated with each view of the changed display to generate the queue. FIG. 11 is a schematic view of an illustrative queue of speakable items for a Now Playing display in accordance with one embodiment of the invention. Queue 1100 may be depicted using any suitable approach. In the example of FIG. 11, queue 1100 may include list 1110 of speakable strings to speak successively. Each speakable string, as part of a view, may be associated with a speakable tier, identified in corresponding column 1140. Using the elements from display 1000


(FIG. 10), the speakable strings may include iPod string 1112 having Context tier 1142, Now Playing string 1113 having Focus tier 1143 (e.g., the menu option identified by the highlight region), Mika string 1114 having Idle tier 1144, Grace Kelly string 1115 having Idle tier 1145, and Life in Cartoon Motion string 1116 having Idle tier 1146.

[0050] To ensure that voice feedback for the artist, title and album are not provided at inopportune times, the electronic device may not provide voice feedback for speakable elements associated with the Idle tier when media is playing back (e.g., not paused). For example, the electronic device may first determine whether media is playing back. In response to determining that no media is playing back, the electronic device may provide voice feedback for all of the elements in queue 1100, including the elements associated with the Idle tier. If the electronic device instead determines that media is currently being played back, the electronic device may provide voice feedback for elements in queue 1100 from views associated with tiers other than the Idle tier. The speech scheduler may, in response to detecting that media is playing back, remove elements associated with the Idle tier from queue 1100, or instead skip elements associated with Idle tier in queue 1100. The electronic device may assign an Idle tier to any suitable displayed information, including for example to information displayed in an additional information window or area (e.g., the number of songs or photos stored on the device).

[0051] The electronic device may determine what strings to speak at what time using any suitable approach. FIG. 12 is an illustrative state diagram for speaking speakable strings in accordance with one embodiment of the invention. State diagram 1200 may include several states and several paths for accessing each of the several states. The electronic device may begin in Idle state 1202. For example, the electronic device may remain in the Idle state when no content is displayed. As another example, the electronic device may remain in the Idle state when content is displayed, but the displayed content is not associated with voice feedback (e.g., an album cover art is displayed). As still another example, the electronic device may remain in the Idle state when speakable content is displayed, but the speakable content has all been spoken.

[0052] While in Idle state 1202, the electronic device may monitor for transactions of the display. Any decision by the electronic device regarding what elements to speak may result in a transaction. A transaction may be initiated (and detected by the electronic device) using several different approaches. For example, a transaction may be detected in response to receiving a user instruction (e.g., a user selection of a selectable option causing the display to change). As another example, a transaction may be detected in response to a transition of the display (e.g., the display changing, for example due to a timeout or due to a user moving a highlight region). In response to detecting a transaction, the electronic device may move to Update step 1204. At Update step 1204, the electronic device may update the variables or fields associated with providing voice feedback. For example, a speech scheduler may generate a queue of items for the electronic device to speak, for example based on fields available from one or more models used to generate views for the post-transaction display. The electronic device may move to PreSpeakTimeout state 1206 after Update step 1204.

[0053] At PreSpeakTimeout state 1206, the electronic device may pause for a first timeout. During the timeout, the electronic device may perform any suitable operation, including for example generate the queues of speakable strings to speak, identify the audio files associated with the speakable strings and perform initial operations for preparing the audio files for playback, duck or fade prior audio outputs (e.g., outputs due to music playback), or perform any other suitable operation. For example, the electronic device may reduce prior audio feedback (e.g., ducking) so that the spoken string may be clearer. As another example, the electronic device may pause the playback of media during the voice feedback (e.g., so that the user does not miss any of the media). As still another example, the electronic device may use PreSpeakTimeout state to ensure that no more recent transactions are detected (e.g., a subsequent movement of a highlight region) to avoid partially speaking text. The electronic device may remain in PreSpeakTimeout state 1206 for any suitable duration, including for example a duration in the range of 0ms to 500ms (e.g., 100ms). Once the first timeout associated with PreSpeakTimeout state 1206 has lapsed, the electronic device may move to Resume step 1206 to access Speaking state 1210.

[0054] At Speaking state 1210, the electronic device may speak a speakable item placed in the queue generated during Update step 1204. For example, the electronic device may identify the audio file associated with a speakable item in the generated queue and play back the identified audio file. When the electronic device finishes speaking the first item in the voice feedback queue generated by the speech scheduler, the electronic device may determine that proper voice feedback has been provided and move to Complete step 1212. At Complete step 1212, the speech scheduler may remove the spoken speakable element from the queue or move a pointer to the next speakable element in the queue. In some embodiments, the electronic device may instead remove the speakable element from the queue just before speaking the element (e.g., while in Speaking state 1210) so that the first speakable element identified by the electronic device after Complete step 1212, as the electronic device returns to Speaking state 1210, is the next element to speak. The electronic device may successively move between Speaking state 1210 and Complete step 1212 until all of the speakable items in the queue generated during an Update step (e.g., Update step 1204) have been spoken (e.g., the queue is empty or the pointer has reached the end of the queue), or until the display is changed and a new Update step is performed.
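As an illustration only, the path from Idle state 1202 through Update step 1204 and PreSpeakTimeout state 1206 to the loop between Speaking state 1210 and Complete step 1212 might be sketched in Python as follows. The names `Tier`, `State`, `Speakable`, `build_queue` and `run_transaction` are hypothetical and do not appear in the application. The sketch assumes that tiers are spoken in the order in which the description lists them (context, focus, choice, property, detail, idle), that items sharing a tier keep their display order, that Idle-tier items are dropped while media is playing, and that `play` stands in for identifying and playing back an audio file. The PostSpeakTimeout and error paths of state diagram 1200 are omitted.

```python
import time
from dataclasses import dataclass
from enum import Enum, IntEnum


class Tier(IntEnum):
    # Assumed priority: the order in which the description lists the tiers.
    CONTEXT = 0
    FOCUS = 1
    CHOICE = 2
    PROPERTY = 3
    DETAIL = 4
    IDLE = 5


class State(Enum):
    IDLE = "Idle"
    PRE_SPEAK_TIMEOUT = "PreSpeakTimeout"
    SPEAKING = "Speaking"


@dataclass(frozen=True)
class Speakable:
    text: str
    tier: Tier


def build_queue(displayed, media_playing=False):
    """Update step: order the displayed speakable items by tier."""
    items = [s for s in displayed
             if not (media_playing and s.tier == Tier.IDLE)]
    # sorted() is stable, so items sharing a tier keep their display order.
    return sorted(items, key=lambda s: s.tier)


def run_transaction(displayed, play, media_playing=False,
                    pre_speak_timeout=0.1, sleep=time.sleep):
    """Walk Idle -> Update -> PreSpeakTimeout -> Speaking/Complete -> Idle.

    Returns the sequence of states visited (hypothetical helper).
    """
    trace = [State.IDLE]
    queue = build_queue(displayed, media_playing)   # Update step
    trace.append(State.PRE_SPEAK_TIMEOUT)
    sleep(pre_speak_timeout)                        # first timeout, e.g. 100 ms
    while queue:
        trace.append(State.SPEAKING)
        item = queue.pop(0)      # remove the element just before speaking it
        play(item.text)          # stand-in for audio-file playback
    trace.append(State.IDLE)     # queue empty: nothing left to speak
    return trace
```

With the Now Playing items of queue 1100 and `media_playing=True`, only "iPod" and "Now Playing" would be passed to `play` in this sketch.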

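The comparison of an initial queue with an updated queue, which decides whether the item being spoken is allowed to finish or is cut off, could be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: `shared_prefix_length` and `on_transaction` are hypothetical names, each queue is modelled as a list of (string, tier) pairs, and `current_index` is the position in the initial queue of the item being spoken.

```python
def shared_prefix_length(initial, updated):
    """Count the leading items that are identical in both queues."""
    length = 0
    for old, new in zip(initial, updated):
        if old != new:
            break
        length += 1
    return length


def on_transaction(initial, updated, current_index):
    """Decide what to do with the item currently being spoken.

    Returns (interrupt, next_index), where next_index is the position
    in the updated queue of the next item to speak.
    """
    shared = shared_prefix_length(initial, updated)
    if current_index < shared:
        # The current item is shared by both queues: let it finish,
        # then continue with the item that follows it in the updated queue.
        return False, current_index + 1
    # The current item is not shared: stop it and resume at the first
    # position where the updated queue differs from the initial queue.
    return True, shared
```

For queues 500 and 700, a transaction while "Music" is being spoken yields `(False, 1)`, so "Artists" follows "Music"; a transaction while "Cover Flow" is being spoken yields `(True, 1)`, so "Cover Flow" is cut off and "Artists" is spoken next.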

[0055] In response to detecting a transaction (e.g., described above) while in Speaking state 1210, the electronic device may move to Update step 1214. At Update step 1214, the electronic device may update the variables or fields associated with providing voice feedback to conform to the display resulting from the transaction. For example, the speech scheduler may update the speakable elements, and the order of speakable elements for which to provide voice playback based on the display after the transaction, in an updated voice feedback queue. In some embodiments, the electronic device may in addition determine the portion of the updated queue, starting with the first speakable element of the queue, that matches the initial voice feedback queue (e.g., prior to step 1214), and identify the current speakable element for which voice feedback is being provided. If the electronic device determines that the current speakable element is within the portion of shared speakable elements of the initial and updated queues, the electronic device may return to Speaking state 1210 and continue to speak the next speakable element of the updated queue (e.g., using Complete step 1212 and Speaking state 1210). If the electronic device instead determines that the current speakable element is not within the portion of shared speakable elements of the initial and updated queues, the electronic device may cease speaking the current speakable element (e.g., stop playing back the audio file associated with the current speakable element) and return to Speaking state 1210. Upon returning to Speaking state 1210, the electronic device may provide voice feedback for the speakable elements of the updated queue, for example beginning with the first speakable element of the queue after the determined portion of shared speakable elements.

[0056] Once the electronic device has provided voice feedback for every element in the queue generated by the speech scheduler (e.g., once the queue is empty), the electronic device may move to no_ready_queue step 1216. At no_ready_queue step 1216, the electronic device may receive an indication that the queue of speakable items is empty from the speech scheduler (e.g., a no_ready_queue variable). From no_ready_queue step 1216, the electronic device may move to PostSpeakTimeout state 1218. At state 1218, the electronic device may pause for a second timeout. During the timeout, the electronic device may perform any suitable operation, including for example preparing other audio for playback, initializing an operation selected by a user (e.g., in response to detecting a selection instruction for one of the displayed and spoken menu options), or any other suitable operation. The electronic device may instead or in addition return audio output from a ducked or faded mode (e.g., enabled during PreSpeakTimeout state 1206) to a normal mode for playing back audio or other media. Alternatively, the electronic device may resume the playback of paused media. The electronic device may remain in PostSpeakTimeout state 1218 for any suitable duration, including for example a duration in the range of 0ms to 500ms (e.g., 100ms). Once the first timeout associated with PostSpeakTimeout state 1218 has lapsed, the electronic device may move to Resume step 1220 to return to Idle state 1202.

[0057] In some embodiments, the electronic device may detect a transaction (e.g., described above) while in PostSpeakTimeout state 1218 and move to Update step 1222. Update step 1222 may include some or all of the features of Update step 1214. At Update step 1222, the electronic device may update the variables or fields associated with providing voice feedback to conform to the display resulting from the transaction. For example, the speech scheduler may update the speakable elements, and the order of speakable elements for which to provide voice playback based on the display after the transaction, in an updated voice feedback queue. In some embodiments, the electronic device may in addition determine the portion of the updated queue, starting with the first speakable element of the queue, that matches the initial voice feedback queue (e.g., prior to step 1222), and identify the current speakable element for which voice feedback is being provided (e.g., as described above in connection with Update step 1214). The electronic device may then return to Speaking state 1210 and provide voice feedback for the speakable elements of the updated queue, for example beginning with the first speakable element of the queue after the determined portion of shared speakable elements.

[0058] In some embodiments, the electronic device may detect an error in the speaking process. For example, the electronic device may receive, at play_error step 1224, an indication of an error associated with Speaking state 1210. The electronic device may receive any suitable indication of an error at step 1224, including for example a play_error variable. The electronic device may then reach ErrorSpeaking state 1226. The electronic device may perform any suitable operation in ErrorSpeaking state 1226. For example, the electronic device may perform a debugging operation, or other operation for identifying the source of the error. As another example, the electronic device may gather information associated with the error to provide to the developer of the software for debugging or revision. If the electronic device completes the one or more operations associated with ErrorSpeaking state 1226, the electronic device may move to Complete step 1228 and return to Speaking state 1210 to continue to provide voice feedback for the speakable elements in the queue generated by the speech scheduler.

[0059] Alternatively, if the electronic device fails to perform all of the operations associated with ErrorSpeaking state 1226, the electronic device may move to Resume step 1230 and return to Speaking state 1210. The electronic device may fail to perform the operations associated with Speaking state 1210 for any suitable reason, including for example a failure to receive a valid "Complete" message, receiving a user instruction to cancel the ErrorSpeaking operations or to return to Speaking state

11 21 EP 3 026 541 A1 22

1210, an error timeout (e.g., 100 ms), or any other suitable reason or based on any other suitable condition.

[0060] The electronic device may acquire audio files associated with each of the speakable elements using any suitable approach. In some embodiments, the audio files may be locally stored by the electronic device, for example as part of firmware or software of the device. An inherent limitation of this approach, however, is that firmware is generally provided globally to all electronic devices sold or used in different locations where languages and accents may vary. To ensure voice feedback is provided in the proper language or with the proper accent, the firmware used by each device may need to be personalized. This may come at a significant cost, as several versions of firmware may need to be stored and provided, and be significantly more complex, as the firmware or software provider may need to manage the distribution of different firmware or software to different devices. In addition, the size of audio files (e.g., as opposed to text files) may be large and prohibitive to provide as firmware or software updates.

[0061] In some embodiments, the electronic device may generate audio files locally using a text to speech (TTS) engine operating on the device. Using such an approach, each electronic device may provide text strings associated with different menu options in the language associated with the device to the TTS engine of the device to generate audio files for voice feedback. This approach may allow for easier firmware or software updates, as changes to displays in which speakable elements are present may be reflected by a change in text strings on which the TTS engine may operate. The TTS engine available from the electronic device, however, may limit this approach. In particular, if the electronic device has limited resources, such as limited memory, processing capabilities, or power supply (e.g., limitations associated with a portable electronic device), the quality of the speech generated by the TTS engine may be reduced. For example, intonations associated with dialects or accents may not be available, or speech associated with particular languages (e.g., languages too different from a default language) may not be supported.

[0062] In some embodiments, the electronic device may instead or in addition receive audio files associated with speakable elements from a host device to which the electronic device is connected. FIG. 13 is a schematic view of an illustrative communications system including an electronic device and a host device in accordance with one embodiment of the invention. Communications system 1300 may include electronic device 1302 and communications network 1310, which electronic device 1302 may use to perform wired or wireless communications with other devices within communications network 1310. For example, electronic device 1302 may perform communications operations with host device 1320 over communications network 1310. Although communications system 1300 may include several electronic devices 1302 and host devices 1320, only one of each is shown in FIG. 13 to avoid overcomplicating the drawing.

[0063] Any suitable circuitry, device, system or combination of these (e.g., a wireless communications infrastructure including communications towers and telecommunications servers) operative to create a communications network may be used to create communications network 1310. Communications network 1310 may be capable of providing wireless communications using any suitable short-range or long-range communications protocol. In some embodiments, communications network 1310 may support, for example, Wi-Fi (e.g., an 802.11 protocol), Bluetooth (registered trademark), radio frequency systems (e.g., 1300 MHz, 2.4 GHz, and 5.6 GHz communication systems), infrared, protocols used by wireless and cellular phones and personal email devices, or any other protocol supporting wireless communications between electronic device 1302 and host device 1320. Communications network 1310 may instead or in addition be capable of providing wired communications between electronic device 1302 and host device 1320, for example using any suitable port on one or both of the devices (e.g., 30-pin, USB, FireWire, Serial, or Ethernet).

[0064] Electronic device 1302 may include any suitable device for receiving media or data. For example, electronic device 1302 may include one or more features of electronic device 100 (FIG. 1). Electronic device 1302 may be coupled with host device 1320 over communications link 1340 using any suitable approach. For example, electronic device 1302 may use any suitable wireless communications protocol to connect to host device 1320 over communications link 1340. As another example, communications link 1340 may be a wired link that is coupled to both electronic device 1302 and media provider 1320 (e.g., an Ethernet cable). As still another example, communications link 1340 may include a combination of wired and wireless links (e.g., an accessory device for wirelessly communicating with host device 1320 may be coupled to electronic device 1302). In some embodiments, any suitable connector, dongle or docking station may be used to couple electronic device 1302 and host device 1320 as part of communications link 1340.

[0065] Host device 1320 may include any suitable type of device operative to provide audio files to electronic device 1302. For example, host device 1320 may include a computer (e.g., a desktop or laptop computer), a server (e.g., a server available over the Internet or using a dedicated communications link), a kiosk, or any other suitable device. Host device 1320 may provide audio files for speakable elements of the electronic device using any suitable approach. For example, host device 1320 may include a TTS engine that has access to more resources than one available locally on electronic device 1302. Using a more expansive host device TTS engine, host device 1320 may generate audio files associated with text strings for speakable elements of the electronic device. The host device TTS engine may allow the electronic device to provide voice feedback in different languages
or with personalized accents or voice patterns (e.g., using a celebrity voice or an accent from a particular region). The TTS engine may include a general speech and pronunciation rules for different sounds to generate audio for the provided text and convert the generated audio to a suitable format for playback by the electronic device (e.g., AIFF files). In some embodiments, the TTS engine may include a pre-processor for performing music specific processing (e.g., substituting the string "feat." or "ft." with "featuring"). In some embodiments, host device 1320 may limit the amount of media transferred to the electronic device to account for the storage space needed to store the audio files associated with providing voice feedback (e.g., calculate the space expected to be needed for the voice feedback audio files based on the expected number of media files stored on the electronic device).

[0066] The host device may identify the text strings for which to provide audio files using any suitable approach. In some embodiments, the host device may identify text strings associated with data transferred from the host device to the electronic device, and provide the identified text strings to a TTS engine to generate corresponding audio files. This approach may be used, for example, for text strings associated with metadata for media files (e.g., title, artist, album, genre, or any other metadata) transferred from the host device to the electronic device (e.g., music or video). In some embodiments, the electronic device may identify to the host device the particular metadata for which to provide audio feedback (e.g., the electronic device identifies the title, artist and album metadata). The host device may use any suitable approach for naming and storing audio files in the electronic device. For example, the audio file name and stored location (e.g., directory number) may be the result of applying a hash to the spoken text string.

[0067] For speakable elements that are not transferred from the host device to the electronic device (e.g., text of menu options of the electronic device firmware), however, the host device may not be aware of the text strings for which the TTS engine is to provide audio files. In some embodiments, the electronic device may provide to the host device a text file (e.g., an XML file) that includes strings associated with each of the static speakable elements for which voice feedback is provided. The electronic device may generate the text file with the speakable element strings at any suitable time. In some embodiments, the file may be generated each time the electronic device boots based on data extracted from the firmware or software source code during compiling. For example, when the electronic device compiles the source code associated with the models and views for display, the electronic device may identify the elements having a speakable property (e.g., the speakable elements) and extract the text string to speak and the priority associated with the speakable element. In some embodiments, the electronic device may generate the text file in response to detecting a change in the voice feedback language, voice feedback voice, or build change.

[0068] The extracted text may be provided to the host device in a data file (e.g., an XML file) generated when the electronic device boots. This approach may allow for easier changing of speakable elements with firmware or software updates, as the compiled firmware or software code may include the extracted speakable element information needed by the host device to generate audio files for voice feedback. In response to receiving the text file, the host device may generate, using the TTS engine, audio files for each of the speakable elements. In some embodiments, the text file may include an indication of a language change to direct the host device to generate new audio files for the changed text or using the changed voice or language. Systems and methods for generating audio files based on a received text file are described in more detail in commonly assigned U.S. Publication No. 2006/0095848, entitled "AUDIO USER INTERFACE FOR COMPUTING DEVICES" (Attorney Docket No. P3504US1), which is incorporated by reference herein in its entirety.

[0069] The following flowcharts describe illustrative processes for providing audio files used for voice feedback to an electronic device. FIG. 14 is a flowchart of an illustrative process for providing static strings to an electronic device. Process 1400 may begin at step 1402. At step 1404, the electronic device may generate a data file listing static strings. For example, the electronic device may extract, from firmware, strings of text displayed by the electronic device for which voice feedback may be provided. At step 1406, the electronic device may provide the file to a host device. For example, the electronic device may provide the file to the host device using a wired or wireless communications path.

[0070] At step 1408, the host device may convert the static strings of the provided data file to audio files. For example, the host device may use a TTS engine to generate audio for each of the static strings (e.g., generate audio, compress the audio, and convert the audio to a file format that may be played back by the electronic device). At step 1410, the host device may transfer the generated audio to the electronic device. For example, the host device may transfer the generated audio files to the electronic device over a communications path. Process 1400 may then end at step 1412. The host device may store the audio files at any suitable location on the electronic device, including for example at a location or directory number resulting from a hash of the text string to speak.

[0071] FIG. 15 is a flowchart of an illustrative process for providing dynamic strings to an electronic device. Process 1500 may begin at step 1502. At step 1504, the host device may identify media to transfer to the electronic device. For example, the host device may retrieve a list of media (e.g., media within playlists) to transfer to the electronic device. At step 1506, the host device may identify metadata strings associated with the identified media. For example, the host device may retrieve specific metadata strings identified by a host device (e.g., artist, title and album strings) for each identified media item to be transferred to the electronic device.

[0072] At step 1508, the host device may convert the identified metadata strings (e.g., dynamic strings) to audio files. For example, the host device may use a TTS engine to generate audio for each of the dynamic strings (e.g., generate audio, compress the audio, and convert the audio to a file format that may be played back by the electronic device). At step 1510, the host device may transfer the generated audio to the electronic device. For example, the host device may transfer the generated audio files to the electronic device over a communications path. Process 1500 may then end at step 1512. The host device may store the audio files at any suitable location on the electronic device, including for example at a location or directory number resulting from a hash of the text string to speak.

[0073] The above-described embodiments of the present invention are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims which follow.

Statements of Invention

[0074]

1. A method for providing voice feedback to a user of an electronic device, comprising:

    displaying a plurality of elements;
    identifying at least two of the plurality of elements for which to provide voice feedback;
    determining tiers associated with the display of each of the identified at least two plurality of elements; and
    providing voice feedback for the identified at least two plurality of elements in an order of the determined tiers.

2. The method of statement 1, further comprising:

    retrieving audio files associated with each of the identified at least two plurality of elements; and
    playing back the retrieved audio files.

3. The method of statement 2, wherein playing back further comprises playing back in the order of the determined tiers.

4. The method of statement 1, further comprising:

    generating an initial queue comprising the identified at least two plurality of elements in response to identifying and determining; and
    ordering the identified elements in the initial queue based on the determined tiers.

5. The method of statement 4, wherein providing further comprises providing voice feedback for the elements of the initial queue in the order of the elements in the initial queue.

6. The method of statement 4, further comprising:

    changing at least one of the displayed plurality of elements; and
    updating at least a portion of the initial queue in response to changing.

7. The method of statement 6, further comprising:

    re-identifying at least two of the plurality of elements for which to provide voice feedback in response to changing;
    re-determining tiers associated with the display of each of the re-identified at least two plurality of elements; and
    generating a revised queue comprising the re-identified at least two of the plurality of elements.

8. The method of statement 7, further comprising:

    detecting the identified element for which voice feedback was provided during changing;
    determining that the detected element has the same position and tier in the initial queue and the revised queue; and
    providing voice feedback from the revised queue starting with the detected element.

9. The method of statement 8, further comprising:

    completing the voice feedback for the detected element.

10. The method of statement 7, further comprising:

    detecting the identified element for which voice feedback was provided during changing;
    comparing the initial queue and the revised queue to identify common portions of the queues;
    determining that the detected element is not in a portion of the revised queue that is in common with the initial queue; and
    stopping providing voice feedback for the detected element.

11. The method of statement 10, further comprising:

    identifying the first element of the revised queue following the common portions of the queues; and
    providing voice feedback from the revised queue starting with the identified first element.
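
As an illustration only, the tier-based ordering recited in statements 1, 4 and 5 could be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the described embodiments: the names Element, build_initial_queue and speak_queue are hypothetical, a lower tier number is assumed to be spoken first, and elements sharing a tier are assumed to keep their display order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Element:
    """A displayed element (hypothetical model, not from the application)."""
    text: str                   # text string to speak
    tier: Optional[int] = None  # None means no speakable property is set


def build_initial_queue(displayed: List[Element]) -> List[Element]:
    """Identify the speakable elements and order them by tier.

    Assumption: a lower tier number is spoken first. sorted() is stable,
    so elements sharing a tier keep their display order.
    """
    speakable = [e for e in displayed if e.tier is not None]
    return sorted(speakable, key=lambda e: e.tier)


def speak_queue(queue: List[Element], play: Callable[[str], None]) -> None:
    """Provide voice feedback for each queue element in queue order."""
    for element in queue:
        play(element.text)


if __name__ == "__main__":
    displayed = [
        Element("Music", tier=2),
        Element("Battery icon"),       # displayed but not speakable
        Element("Settings", tier=1),
        Element("Videos", tier=2),
    ]
    speak_queue(build_initial_queue(displayed), print)
```

Here play stands in for whatever mechanism retrieves and plays back the audio file associated with each text string; passing print simply lists the strings in the order in which they would be spoken.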

12. An electronic device operative to provide audio feedback for displayed content, comprising a processor, a display, and an audio output, the processor operative to:

    direct the display to display a plurality of elements in views, wherein speakable properties are associated with at least two of the plurality of elements;
    determine tiers associated with the views of each of the at least two of the plurality of elements from the associated speakable properties;
    generate a queue comprising the at least two of the plurality of elements, wherein the order of the queue elements is set by the determined tiers; and
    direct the audio output to sequentially speak each queue element in the order of the queue.

13. The electronic device of statement 12, wherein the processor is further operative to:

    direct the display to display at least two text strings; and
    direct the audio output to provide voice feedback for the at least two text strings.

14. The electronic device of statement 12, wherein the processor is further operative to:

    detect a transaction; and
    generate a revised queue comprising modified elements with which a speakable property is associated.

15. The electronic device of statement 14, wherein the processor is further operative to:

    direct the audio output to provide audio associated with each element of the revised queue in the order of the revised queue.

16. The electronic device of statement 14, wherein the processor is further operative to:

    determine that a displayed element of the plurality of elements with which a speakable property is associated has changed; and
    detect a transaction.

17. The electronic device of statement 16, wherein the processor is further operative to:

    detect a user input changing a displayed element of the plurality of elements with which a speakable property is associated; and
    detect a transaction.

18. The electronic device of statement 17, wherein the input comprises at least one of a user selection of a displayed option and a user instruction to move a highlight region.

19. A method for speaking text of elements displayed by an electronic device, comprising:

    defining a plurality of elements with which speakable properties are associated;
    displaying the plurality of elements in a plurality of views, wherein each view is associated with a speakable order;
    generating a queue comprising the plurality of elements, wherein the order of the plurality of elements in the queue is set from the speakable order;
    pausing for a first timeout;
    identifying audio files associated with each of the plurality of elements of the queue, wherein the audio files comprise text to speak for each element;
    sequentially playing back the identified audio files in the order of the queue; and
    pausing for a second timeout.

20. The method of statement 19, wherein identifying further comprises retrieving audio files associated with each of the plurality of elements from a hash of the text to speak.

21. The method of statement 19, wherein the audio files are received from a host device.

22. The method of statement 21, wherein the host device generates the audio files using a text to speech engine.

23. The method of statement 22, further comprising:

    providing the text to speak for each of the plurality of elements to the host device; and
    receiving audio files generated using a text to speech engine applied to the provided text to speak for each of the plurality of elements.

24. The method of statement 19, further comprising:

    changing at least one of the displayed plurality of elements; and
    generating a revised queue comprising the changed displayed plurality of elements ordered from the speakable orders associated with the displayed views.

25. A computer readable media for providing voice feedback to a user of an electronic device, the computer readable media comprising computer program
logic recorded thereon for:

    displaying a plurality of elements;
    identifying at least two of the plurality of elements for which to provide voice feedback;
    determining tiers associated with the display of each of the identified at least two plurality of elements; and
    providing voice feedback for the identified at least two plurality of elements in an order of the determined tiers.

Claims

1. A method for providing voice feedback to a user of an electronic device, comprising:

    identifying at least two speakable items to be provided to a user, the at least two speakable items including a first speakable item associated with a first speakable order, and at least one second speakable item associated with a second speakable order;
    generating an initial queue including the identified first speakable item and the at least one second speakable item;
    ordering the first speakable item and the at least one second speakable item in the initial queue based on the respective speakable order associated with the first speakable item and the at least one second speakable item; and
    providing voice output corresponding to the first speakable item and the at least one second speakable item in the order specified in the initial queue.

2. The method of claim 1, further comprising:

    determining whether content displayed on the electronic device has changed;
    in accordance with a determination that the content displayed on the electronic device has changed:

        identifying at least two further speakable items to be provided to the user, the at least two further speakable items including a third speakable item associated with a third speakable order, and at least one fourth speakable item associated with a fourth speakable order;
        updating at least a portion of the initial queue to generate a revised queue including at least the third speakable item and the at least one fourth speakable item; and
        ordering at least the third speakable item and the at least one fourth speakable item in the revised queue based on the respective speakable order associated with the third speakable item and the at least one fourth speakable item.

3. The method of claim 2, further comprising, when the content displayed on the electronic device has changed during the providing of the voice output corresponding to the first speakable item and the at least one speakable item:

    detecting the speakable item for which voice output was provided during changing of the content displayed on the electronic device;
    determining that the detected speakable item has the same speakable order in the initial queue and in the revised queue; and
    providing voice output from the revised queue starting with the detected speakable item.

4. The method of claim 2, further comprising, when the content displayed on the electronic device has changed during the providing of the voice output corresponding to the first speakable item and the at least one speakable item:

    detecting the speakable item for which voice output was provided during changing of the content displayed on the electronic device;
    comparing the initial queue and the revised queue to identify common portions of the queues;
    determining that the speakable item for which voice output was provided during changing of the content displayed on the electronic device is not in a portion of the revised queue that is in common with the initial queue; and
    stopping providing voice output for the speakable item for which voice output was provided during changing of the content displayed on the electronic device.

5. The method of claim 4, further comprising:

    identifying a speakable item of the revised queue that follows the common portions of the queues; and
    providing voice output from the revised queue starting with the identified speakable item of the revised queue that follows the common portions of the queues.

6. The method of claim 1, wherein items associated with the first speakable order precede items associated with the second speakable order in the initial queue.

7. The method of claim 1, wherein providing the voice
output comprises providing the voice output for the first speakable item and the at least one second speakable item sequentially and without human intervention.

8. An electronic device operative to provide voice feedback to a user, comprising a processor, a display, and an audio output, the processor operative to:

    identify at least two speakable items to be provided to a user, the at least two speakable items including a first speakable item associated with a first speakable order, and at least one second speakable item associated with a second speakable order;
    generate an initial queue including the identified first speakable item and the at least one second speakable item;
    order the first speakable item and the at least one second speakable item in the initial queue based on the respective speakable order associated with the first speakable item and the at least one second speakable item; and
    provide voice output corresponding to the first speakable item and the at least one second speakable item in the order specified in the initial queue.

9. The electronic device of claim 8, the processor further operative to:

    determine whether content displayed on the electronic device has changed;
    in accordance with a determination that the content displayed on the electronic device has changed:

        identify at least two further speakable items to be provided to the user, the at least two further speakable items including a third speakable item associated with a third speakable order, and at least one fourth speakable item associated with a fourth speakable order;
        update at least a portion of the initial queue to generate a revised queue including at least the third speakable item and the at least one fourth speakable item; and
        order at least the third speakable item and the at least one fourth speakable item in the revised queue based on the respective speakable order associated with the third speakable item and the at least one fourth speakable item.

10. The electronic device of claim 9, the processor further operative to, when the content displayed on the electronic device has changed during the providing of the voice output corresponding to the first speakable item and the at least one speakable item:

    detect the speakable item for which voice output was provided during changing of the content displayed on the electronic device;
    determine that the detected speakable item has the same speakable order in the initial queue and in the revised queue; and
    provide voice output from the revised queue starting with the detected speakable item.

11. The electronic device of claim 9, the processor further operative to, when the content displayed on the electronic device has changed during the providing of the voice output corresponding to the first speakable item and the at least one speakable item:

    detect the speakable item for which voice output was provided during changing of the content displayed on the electronic device;
    compare the initial queue and the revised queue to identify common portions of the queues;
    determine that the speakable item for which voice output was provided during changing of the content displayed on the electronic device is not in a portion of the revised queue that is in common with the initial queue; and
    stop providing voice output for the speakable item for which voice output was provided during changing of the content displayed on the electronic device.

12. The electronic device of claim 11, the processor further operative to:

    identify a speakable item of the revised queue that follows the common portions of the queues; and
    provide voice output from the revised queue starting with the identified speakable item of the revised queue that follows the common portions of the queues.

13. The electronic device of claim 8, wherein items associated with the first speakable order precede items associated with the second speakable order in the initial queue.

14. The electronic device of claim 8, wherein the processor operative to provide the voice output is further operative to provide the voice output for the first speakable item and the at least one second speakable item sequentially and without human intervention.

15. A non-transitory computer readable storage media for providing voice feedback to a user of an electronic
device, the computer readable media comprising computer program logic recorded thereon for:

    identifying at least two speakable items to be provided to a user, the at least two speakable items including a first speakable item associated with a first speakable order, and at least one second speakable item associated with a second speakable order;
    generating an initial queue including the identified first speakable item and the at least one second speakable item;
    ordering the first speakable item and the at least one second speakable item in the initial queue based on the respective speakable order associated with the first speakable item and the at least one second speakable item; and
    providing voice output corresponding to the first speakable item and the at least one second speakable item in the order specified in the initial queue.

16. The non-transitory computer readable storage media of claim 15, further comprising computer program logic for:

    determining whether content displayed on the electronic device has changed;
    in accordance with a determination that the content displayed on the electronic device has changed:

        identifying at least two further speakable items to be provided to the user, the at least two further speakable items including a third speakable item associated with a third speakable order, and at least one fourth speakable item associated with a fourth speakable order;
        updating at least a portion of the initial queue to generate a revised queue including at least the third speakable item and the at least one fourth speakable item; and
        ordering at least the third speakable item and the at least one fourth speakable item in the revised queue based on the respective speakable order associated with the third speakable item and the at least one fourth speakable item.

17. The non-transitory computer readable storage media of claim 16, further comprising computer program logic for, when the content displayed on the electronic device has changed during the providing of the voice output corresponding to the first speakable item and the at least one speakable item:

    detecting the speakable item for which voice output was provided during changing of the content displayed on the electronic device;
    determining that the detected speakable item has the same speakable order in the initial queue and in the revised queue; and
    providing voice output from the revised queue starting with the detected speakable item.

18. The non-transitory computer readable storage media of claim 16, further comprising computer program logic for, when the content displayed on the electronic device has changed during the providing of the voice output corresponding to the first speakable item and the at least one speakable item:

    detecting the speakable item for which voice output was provided during changing of the content displayed on the electronic device;
    comparing the initial queue and the revised queue to identify common portions of the queues;
    determining that the speakable item for which voice output was provided during changing of the content displayed on the electronic device is not in a portion of the revised queue that is in common with the initial queue; and
    stopping providing voice output for the speakable item for which voice output was provided during changing of the content displayed on the electronic device.

19. The non-transitory computer readable storage media of claim 18, further comprising computer program logic for:

    identifying a speakable item of the revised queue that follows the common portions of the queues; and
    providing voice output from the revised queue starting with the identified speakable item of the revised queue that follows the common portions of the queues.

20. The non-transitory computer readable storage media of claim 15, the computer program logic for providing the voice output further comprises computer program logic for providing the voice output for the first speakable item and the at least one second speakable item sequentially and without human intervention.
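
For illustration only, the queue revision behaviour recited in claims 3 to 5 could be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: the names common_prefix_length and resume_point are hypothetical, the "common portions" of the two queues are assumed to be their shared leading items, and an item is treated as having the same speakable order in both queues when it lies inside that shared leading portion.

```python
from typing import Sequence, Tuple


def common_prefix_length(initial: Sequence[str], revised: Sequence[str]) -> int:
    """Count the leading items that are identical in both queues.

    Assumption: the "common portions" are the shared leading items.
    """
    count = 0
    for old_item, new_item in zip(initial, revised):
        if old_item != new_item:
            break
        count += 1
    return count


def resume_point(initial: Sequence[str],
                 revised: Sequence[str],
                 speaking_index: int) -> Tuple[bool, int]:
    """Decide how to continue when the display changes during speech.

    speaking_index is the position, in the initial queue, of the item
    being spoken when the content changed.

    Returns (stop_current, next_index):
      * (False, speaking_index): the item sits in the common portion, so
        voice output continues from the revised queue starting with it.
      * (True, n): the item is outside the common portion, so its voice
        output is stopped and output restarts at revised[n], the first
        item following the common portion.
    """
    common = common_prefix_length(initial, revised)
    if speaking_index < common:
        return False, speaking_index
    return True, common


if __name__ == "__main__":
    initial = ["Now Playing", "Artist", "Album"]
    revised = ["Now Playing", "Artist", "Genre", "Album"]
    for index in range(len(initial)):
        print(initial[index], resume_point(initial, revised, index))
```

A caller would check next_index against the length of the revised queue before speaking, since the revised queue may end with the common portion.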


REFERENCES CITED IN THE DESCRIPTION

This list of references cited by the applicant is for the reader’s convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description

• US 20080129520 A [0004]
• US 02859005 A [0019]
• US 90296404 A [0019]
• US 20060095848 A [0068]
