US009124910B2
(12) United States Patent — Kellicker
(10) Patent No.: US 9,124,910 B2
(45) Date of Patent: Sep. 1, 2015
(54) SYSTEMS AND METHODS OF PROCESSING CLOSED CAPTIONING FOR VIDEO ON DEMAND CONTENT
(71) Applicant: Wowza Media Systems, LLC, Evergreen, CO (US)
(72) Inventor: Scott Kellicker, Worthington, OH (US)
(73) Assignee: WOWZA MEDIA SYSTEMS, LLC, Golden, CO (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 14/225,674
(22) Filed: Mar. 26, 2014
(65) Prior Publication Data
(58) Field of Classification Search: USPC 725/87, 91, 40, 32; 348/468. See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS
5,347,365 A 9/1994 Harigai et al.
5,572,260 A 11/1996 Onishi et al.
(Continued)

FOREIGN PATENT DOCUMENTS
WO 2009052118 A3 4/2009

OTHER PUBLICATIONS
"Captions: Embed Closed Captions for Web, Mobile, and OTT", Zencoder, Retrieved on Oct. 4, 2012, Retrieved from <
(Representative drawing on the cover page: the flowchart of method 400, reproduced as FIG. 4 on Sheet 4 of 4.)
(51) Int. Cl.
H04N 21/472 (2011.01)
H04N 21/488 (2011.01)

(56) References Cited

U.S. PATENT DOCUMENTS
7,054,804 B2* 5/2006 Gonzales et al. .... 704/8
7,272,658 B1 9/2007 Edelman et al.
7,587,500 B1 9/2009 Edelman et al.
7,961,878 B2 6/2011 Whillock et al.
8,051,287 B2 11/2011 Shetty et al.
8,065,426 B2 11/2011 Edelman et al.
8,205,076 B1 6/2012 Shetty et al.
8,285,867 B1 10/2012 Edelman et al.
2002/0154891 A1 10/2002 Kimura et al.
2004/0047589 A1 3/2004 Kim .... 386/46
2005/0075857 A1* 4/2005 Elcock et al. .... 704/2
2011/0083144 A1* 4/2011 Bocharov et al. .... 725/32
2011/0128443 A1* 6/2011 Blanchard et al. .... 348/468
2011/0149153 A1 6/2011 Nam et al.
2011/0164673 A1* 7/2011 Shaffer .... 375/240.01
2011/0302417 A1 12/2011 Whillock et al.
2012/0023255 A1 1/2012 Edelman et al.
2012/0066235 A1 3/2012 Itakura .... 707/751
2012/0102158 A1* 4/2012 Ma et al. .... 709/219
2012/0143606 A1* 6/2012 Pham .... 704/235
2012/0176540 A1 7/2012 Labrozzi et al.
2012/0301111 A1* 11/2012 Cordova .... 386/245
2012/0316860 A1 12/2012 Reitan
2013/0300929 A1* 11/2013 Hines .... 348/468
2014/0028912 A1 1/2014 Lenzi et al. .... 348/468

OTHER PUBLICATIONS
"Closed Captioning of Internet Protocol-Delivered Video Programming: Implementation of the Twenty-First Century Communications and Video Accessibility Act of 2010", Federal Register, vol. 77, No. 151, Aug. 6, 2012, pp. 46632-46633.
"FCC Releases Internet Protocol Captioning Rules", National Association of the Deaf, Retrieved from
U.S. Patent Sep. 1, 2015 Sheet 2 of 4 US 9,124,910 B2

200: CC Data in Platform-Independent Format

210: No. 1
Text: "JOHN: He should be here any minute."
Start Time: 0:19:20.000
End Time: 0:19:22.000
Language: eng
Location: <50, 400>
Font: Proportional Sans Serif
Size: 2
Style: (none)
BG Color: Black
FG Color: White
Passed-through

220: No. 2
Text: "KNOCKING ON THE DOOR"
Start Time: 0:19:21.700
End Time: 0:19:23.700
Language: eng
Location: <30, 300>
Font: Proportional Sans Serif
Size: 2
Style: Italics
BG Color: Black
FG Color: White
Passed-through

FIG. 2
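The per-segment fields illustrated in FIG. 2 can be modeled as a simple data class. The sketch below is illustrative only: the class and field names are assumptions rather than part of the claimed system, chosen to mirror the segment attributes shown in FIG. 2 (the specification later notes that the system's components may be implemented as Java classes).

```java
// Hypothetical sketch of one platform-independent caption segment,
// mirroring the fields illustrated in FIG. 2. All names are illustrative.
public class CaptionSegment {
    public final int number;         // segment number, e.g., 1 or 2
    public final String text;        // caption text
    public final long startMs;       // start time in milliseconds
    public final long endMs;         // end time in milliseconds
    public final String language;    // language identifier, e.g., "eng"
    public final int x, y;           // location, e.g., <50, 400>
    public final String font;        // e.g., "Proportional Sans Serif"
    public final int size;
    public final String style;       // e.g., "Italics" or ""
    public final String bgColor, fgColor;

    public CaptionSegment(int number, String text, long startMs, long endMs,
                          String language, int x, int y, String font, int size,
                          String style, String bgColor, String fgColor) {
        this.number = number; this.text = text;
        this.startMs = startMs; this.endMs = endMs;
        this.language = language; this.x = x; this.y = y;
        this.font = font; this.size = size; this.style = style;
        this.bgColor = bgColor; this.fgColor = fgColor;
    }

    // True if this segment should be on screen at the given stream time.
    public boolean activeAt(long tMs) {
        return tMs >= startMs && tMs < endMs;
    }
}
```

For instance, segment No. 2 of FIG. 2 would carry startMs = 1,161,700 (0:19:21.700) and endMs = 1,163,700; note that the two segments of FIG. 2 overlap in time, so both would be active at 0:19:21.900.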
Receive closed captioning data at a computing device, where the closed captioning data is associated with a video on demand asset and wherein the closed captioning data is represented in a first format (302)

Convert the closed captioning data from the first format to a platform-independent format (304)

Convert the closed captioning data from the platform-independent format to a second format (306)

Transmit the closed captioning data in the second format to a destination device (308)
FIG. 3
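The two-hop method of FIG. 3 (first format, then platform-independent, then second format) can be sketched end to end. The formats below are deliberate simplifications and assumptions: an SRT-style timestamp line with a "|" separator standing in for the first format, and a WebVTT-style cue line standing in for the second; neither is the patent's actual wire format.

```java
// Hypothetical two-hop pipeline for the FIG. 3 method: ingest a source format
// into a platform-independent cue, then convert the cue to an output format.
public class CaptionPipeline {
    // Minimal platform-independent representation: start/end in ms plus text.
    public static class Cue {
        public final long startMs, endMs;
        public final String text;
        public Cue(long startMs, long endMs, String text) {
            this.startMs = startMs; this.endMs = endMs; this.text = text;
        }
    }

    // Step 304: ingest an SRT-like cue "H:MM:SS,mmm --> H:MM:SS,mmm|text".
    // (The "|" separator is an assumption to keep the example one line.)
    public static Cue ingest(String line) {
        String[] parts = line.split("\\|", 2);
        String[] times = parts[0].split(" --> ");
        return new Cue(toMs(times[0]), toMs(times[1]), parts[1]);
    }

    private static long toMs(String t) {
        String[] f = t.replace(',', ':').split(":"); // H, MM, SS, mmm
        return ((Long.parseLong(f[0]) * 60 + Long.parseLong(f[1])) * 60
                + Long.parseLong(f[2])) * 1000 + Long.parseLong(f[3]);
    }

    // Step 306: convert the platform-independent cue to a WebVTT-style block.
    public static String toWebVtt(Cue c) {
        return fmt(c.startMs) + " --> " + fmt(c.endMs) + "\n" + c.text;
    }

    private static String fmt(long ms) {
        return String.format("%02d:%02d:%02d.%03d", ms / 3_600_000,
                (ms / 60_000) % 60, (ms / 1000) % 60, ms % 1000);
    }
}
```

The design point the method turns on is the middle hop: because every ingester targets the same Cue shape and every converter starts from it, supporting M input and N output formats costs M + N plug-ins rather than M x N pairwise translators.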
400

Receive, from a destination device, a request for closed captioning data (e.g., in a first language) associated with a stream of a video on demand asset, where the stream is associated with a particular outbound streaming protocol (402)

Identify a source of the requested closed captioning data (e.g., based on the first language) (404)

Receive the closed captioning data from the identified source in a first format (406)

Convert the closed captioning data from the first format to a platform-independent format and from the platform-independent format to a second format that is compatible with the particular outbound streaming protocol (408)

Transmit the closed captioning data in the second format via the stream (410)

Receive a request to switch the closed captioning data from the first language to a second language (412)

Transmit second closed captioning data that is associated with the second language and that is in the second format via the stream without interrupting the stream (414)
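Steps 412-414 above, switching caption languages without interrupting the stream, can be sketched as follows. This is a hedged illustration, not the patent's implementation: the class, its methods, and the string-list caption tracks are all assumptions made to show one way a running stream can keep reading captions while the selected track changes underneath it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 412-414: the outbound stream keeps running
// while the caption track it reads from is switched between languages.
public class CaptionSelector {
    private final Map<String, List<String>> tracksByLanguage;
    private volatile String language; // swapped on request; no stream restart

    public CaptionSelector(Map<String, List<String>> tracks, String initial) {
        this.tracksByLanguage = tracks;
        this.language = initial;
    }

    // Step 412: a request arrives to switch languages mid-stream.
    public void switchLanguage(String lang) {
        if (!tracksByLanguage.containsKey(lang)) {
            throw new IllegalArgumentException("no captions for " + lang);
        }
        language = lang; // takes effect on the next caption fetched
    }

    // Step 414: called once per caption interval while the stream plays on.
    public String captionAt(int index) {
        return tracksByLanguage.get(language).get(index);
    }
}
```

Because the switch only changes which track the next fetch reads, the video/audio stream itself is never torn down, which is the property step 414 requires.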
FIG. 4

SYSTEMS AND METHODS OF PROCESSING CLOSED CAPTIONING FOR VIDEO ON DEMAND CONTENT

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of and claims priority to U.S. patent application Ser. No. 13/889,979, filed May 8, 2013, which is a continuation of U.S. patent application Ser. No. 13/652,272, filed Oct. 15, 2012, the content of each of which is incorporated by reference herein in its entirety.

BACKGROUND

The popularity of the Internet, coupled with the increasing capabilities of personal/mobile electronic devices, has provided consumers with the ability to enjoy multimedia content almost anytime and anywhere. For example, live (e.g., sports events) and video on demand (VOD) content (e.g., pre-recorded television shows and movies) can be streamed via the Internet to personal electronic devices (e.g., computers, mobile phones, and Internet-enabled televisions).

Broadcast television channels typically include one or more closed captioning tracks to make programs more accessible to those that are deaf or hard of hearing. Similarly, movies on digital video discs (DVDs) often include subtitle tracks in multiple languages. However, Internet-accessible content is not typically provided with closed captioning. Moreover, the large number of available closed captioning formats makes it difficult to implement closed captioning systems that are compatible with the various electronic devices and platforms available to consumers, even though government regulations may mandate closed captioning for Internet-accessible content. For example, in the United States, the Federal Communications Commission (FCC) has issued a rule entitled "Closed Captioning of Internet Protocol-Delivered Video Programming: Implementation of the Twenty-First Century Communications and Video Accessibility Act of 2010". Many providers of streaming video may be concerned with only a specific streaming technology, application, or platform. Thus, a single-platform closed captioning system offered by such providers in response to government regulations may not support a large number of users that use different closed captioning formats and may not scale as new closed captioning formats become popular.

SUMMARY

Systems and methods of processing closed captioning data for video on demand content are disclosed. The disclosed systems and methods may support receiving closed captioning data in a variety of embedded and file-based formats and outputting, on-the-fly, closed captioning data in accordance with a variety of output protocols for delivery to a variety of devices and mobile device platforms.

For example, closed captioning data in one or more languages may be received as a timed text markup language (TTML) file or as part of a motion picture experts group (MPEG)-4 part 14 (MP4) video stream. The closed captioning data may be converted into a platform-independent format (e.g., a platform-independent timed text representation). The data in the platform-independent format may then be converted into a desired output format that is compatible with a streaming protocol to be used to deliver the closed captions (and video stream) to a destination device. If multiple types of devices request closed captions at the same time, multiple closed caption streams may be generated on-the-fly from the platform-independent data. For example, platform-independent captioning data may be used to generate consumer electronics association (CEA)-608 captions that are embedded into a hypertext transfer protocol (HTTP) live streaming (HLS) stream, action message format (AMF) onTextData events included in a HTTP dynamic streaming (HDS) stream, and/or captions included in a smooth streaming manifest. The described systems and methods may advantageously utilize a plug-in architecture, enabling users to create custom ingesters to convert input captions to the platform-independent format and custom converters to generate output captions from data represented in the platform-independent format.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a particular embodiment of a system that is operable to process closed captioning for video on demand content;

FIG. 2 is a diagram to illustrate a particular embodiment of a platform-independent closed captioning format that may be used by the system of FIG. 1;

FIG. 3 is a flowchart to illustrate a particular embodiment of a method of processing closed captioning for video on demand content; and

FIG. 4 is a flowchart to illustrate another particular embodiment of a method of processing closed captioning for video on demand content.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a particular embodiment of a system 100 that is operable to process closed captioning for video on demand content. It should be noted, however, that although various embodiments herein are described with reference to closed captioning (e.g., text intended for those that are deaf or hard of hearing, and therefore including transcriptions of sounds, such as "wind howls" or "knocking on door"), the embodiments may also be used with subtitles (e.g., written translations of dialogue being spoken). Moreover, embodiments described herein may also be used to process and deliver "open" captioning (i.e., captions that appear "hardcoded" into a video stream and therefore cannot be selectively disabled like closed captioning).

The system 100 includes a media server 102 that includes one or more input interfaces 104, one or more output interfaces 106, and one or more processors 108. As used herein, the term "media server" may refer to software application(s) that perform media serving and processing, hardware systems (e.g., servers) that support or perform media serving or processing, or any combination thereof. Thus, various operations described with reference to a media server may be implemented using hardware, software (e.g., instructions executable by the processor(s) 108), or any combination thereof.

The input interface(s) 104 may be configured to receive closed captioning data associated with a video on demand (VOD) asset, such as a pre-recorded television show, movie, etc. The incoming closed captioning data may be represented in a first format (e.g., a platform-specific format). In a particular embodiment, the input interface(s) 104 may be wired and/or wireless interfaces that enable the media server 102 to receive data via a network, such as the Internet. For example, the input interface(s) 104 may include an Ethernet interface, a universal serial bus (USB) interface, a wireless interface compatible with an Institute of Electrical and Electronics Engineers (IEEE) 802.11 protocol, etc.
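The ingester/converter split that the Summary describes can be sketched as two plug-in seams. This is an illustrative sketch under stated assumptions: the interface and method names below are not the media server's actual API, and the string-in/string-out signatures are a simplification of whatever the platform-independent representation actually carries.

```java
// Hypothetical plug-in seams: an ingester maps a source caption format into
// the platform-independent representation; a converter maps that back out to
// a protocol-specific output format. All names here are assumptions.
public class CaptionPlugins {
    public interface Ingester {
        String toPlatformIndependent(String sourceCaptions);
    }

    public interface Converter {
        String fromPlatformIndependent(String genericCaptions);
    }

    // The server composes whichever pair matches the input and output formats.
    public static String process(String in, Ingester ingester, Converter converter) {
        return converter.fromPlatformIndependent(ingester.toPlatformIndependent(in));
    }
}
```

Because both seams are interfaces, a "custom" ingester or converter is simply another implementation supplied by a user, e.g. `CaptionPlugins.process(data, s -> s.trim(), s -> "cue: " + s)`.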
For example, the input interface(s) 104 may include a storage device, a file system, network attached storage (NAS), cloud based storage, etc., that is accessible to the media server 102 via a wired or wireless connection to a network (e.g., the Internet). In some embodiments, such input devices may be accessed without a network connection (e.g., via a direct load operation, a direct electrical link, etc.).

The output interface(s) 106 may be configured to transmit closed captioning data to a destination device (e.g., smartphones, tablet computers, laptop computers, set-top boxes, televisions, portable media players, game consoles, etc.). The outgoing closed captioning data may be represented in a second format (e.g., a second platform-specific format). In a particular embodiment, the output interface(s) 106 may be wired and/or wireless interfaces that enable the media server 102 to transmit data via a network, such as the Internet. For example, the output interface(s) 106 may include an Ethernet interface, a USB interface, a wireless interface compatible with an IEEE 802.11 protocol, etc. In some embodiments, like the input interface(s) 104, the output interface(s) 106 may include a storage device, a file system, network attached storage (NAS), cloud based storage, etc., that is accessible to the media server 102 via a wired or wireless connection to a network (e.g., the Internet). Alternately, such output devices may be accessed without a network connection (e.g., via a direct store operation, a direct electrical link, etc.). Thus, various types of input interface(s) 104 and output interface(s) 106 may be supported by the media server 102.

The media server 102 may also include various components configured to perform closed caption processing and stream processing functions. For example, the media server 102 may include one or more ingesters that are executable by the processor(s) 108. Each ingester may be configured to convert closed captioning data that is received via the input interface(s) 104 to a platform-independent format (e.g., a "generic" timed text representation). An illustrative example of a platform-independent format for closed captioning data is further described with reference to FIG. 2.

One or more of the ingesters may be configured to process closed captioning data that is included in data file(s). For example, a TTML ingester 110 may be configured to convert closed captioning data that is included in a TTML file 140 received from a first caption source 130 (e.g., the media server 102 itself, an external caption server, or an external VOD server) from a TTML format to the platform-independent format. The TTML ingester 110 may read and process the TTML file 140 in "chunks" or all at once. Additional ingesters may be provided to process embedded (e.g., "in-stream") captions. For example, an MP4 ingester 112 may be configured to extract closed captioning data that is embedded in an MP4 VOD stream 142 received from a second caption source 132 and to convert the extracted closed captioning data to the platform-independent format. The MP4 VOD stream 142 may include audio and video data associated with the VOD asset as well as timed text information associated with closed captioning for the VOD asset. One or more ingesters may also be configured to receive caption data from an external software system (e.g., software as a service (SaaS)).

In a particular embodiment, the media server 102 may expose details of the platform-independent format, so that users may define "custom" ingesters (e.g., an illustrative custom ingester 114 that receives custom format closed captioning data 144 from a third caption source 134) that can convert closed captioning data from other formats to the platform-independent format. Various embedded formats and file formats may be supported via "built-in" or custom ingesters. For example, the embedded formats may include, but are not limited to, action message format (AMF) onTextData events, consumer electronics association standards (CEA)-608, CEA-708, motion pictures expert group (MPEG)-4 part 17, 3rd generation partnership project (3GPP) timed text, digital video broadcasting (DVB) subtitling, sistema brasileiro de televisao digital (SBTVD), digital terrestrial multimedia broadcast (DTMB), and world system teletext (WST). The file formats may include, but are not limited to, Scenarist closed captioning (SCC), timed text markup language (TTML), distributed format exchange profile (DFXP), society of motion picture and television engineers (SMPTE) timed text (SMPTE-TT), web video text tracks (WebVTT), SubRip (SRT), synchronized accessible media interchange (SAMI), European broadcasting union (EBU)-STL, and EBU timed text (EBU-TT).

The media server 102 may include a processing module 116 and one or more storage devices 118. The processing module 116 may perform various VOD stream processing operations, such as video transcoding, aspect ratio conversion, resolution conversion, color space conversion, format conversion, etc. The processing module 116 may also perform closed caption processing operations on closed captioning data that is received from the ingesters 110-114 in the platform-independent format.

The data storage device(s) 118 may include volatile memory (e.g., random access memory (RAM)). Alternately, or in addition, the storage device(s) 118 may include non-volatile memory, such as one or more hard disks. In a particular embodiment, the processing module 116 may store platform-independent closed captioning data and/or associated VOD assets in the data storage device(s) 118 for subsequent retrieval. Alternately, or in addition, the data storage device(s) 118 may be used to temporarily cache data during stream processing and closed caption processing. Storage and caching of data may be performed in-memory (e.g., to RAM) and/or to hard disk(s).

The media server 102 may further include one or more converters that are executable by the processor(s) 108. Each converter may be configured to convert closed captioning data that is received from the ingesters 110-114, the processing module 116, and/or the storage device(s) 118 from the platform-independent format to a particular output format. Various output formats may be supported by the media server 102.

For example, a hypertext transfer protocol (HTTP) live streaming (HLS) converter 120 may be configured to convert closed captioning data from the platform-independent format into CEA-608 format closed captioning that can be embedded in a HLS stream 160 transmitted to a tablet computer 150. The HLS stream 160 may include all or a portion of a VOD asset associated with the closed captioning. In a particular embodiment, the HLS stream 160 may be used to deliver VOD content and associated closed captioning to iOS® devices (iOS is a registered trademark of Cisco Systems, Inc. of San Jose, Calif. and is used by Apple Inc. of Cupertino, Calif. under license).

As another example, a HTTP dynamic streaming (HDS) converter 122 may convert closed captioning data from the platform-independent format into onTextData events that can be embedded in a HDS stream 162 that may be transmitted to a smartphone 152 or other device that is compatible with video for Adobe® Flash® (Adobe and Flash are registered trademarks of Adobe Systems Inc. of San Jose, Calif.).

As yet another example, the media server 102 may include a smooth streaming converter 124. "Smooth streaming" may generally refer to an Internet Information Services (IIS) extension from Microsoft Corp. of Redmond, Wash. that implements adaptive bitrate streaming.
The smooth streaming converter 124 may convert closed captioning data from the platform-independent format into data that can be embedded in a smooth streaming stream 164 to be provided to a desktop computer 154 or included in a manifest associated with the stream 164. To illustrate, the smooth streaming stream 164 may be provided to devices running Microsoft® Silverlight® or other smooth streaming clients (Microsoft and Silverlight are registered trademarks of Microsoft Corp. of Redmond, Wash.).

In a particular embodiment, each of the ingesters 110-114 and the converters 120-126 may be implemented using software classes. For example, each of the ingesters 110-114 and the converters 120-126 may be Java classes that are executable by a Java Virtual Machine (JVM) executing on the processor(s) 108. In alternate embodiments, non-Java implementations may be used. The media server 102 may thus provide a plug-in architecture to implement closed captioning for VOD assets. Specific ingesters and converters may be loaded dynamically (e.g., just-in-time), based on what types of captioning data are being received and what types of captioning data are being transmitted.

In a particular embodiment, the media server 102 may expose details of the platform-independent format. For example, the platform-independent format may be made available via application programming interfaces (APIs), documentation, and/or publicly available class declaration or definition files. Exposing details of the platform-independent format may enable users, or third party developers, to define "custom" converters that can convert closed captioning data from the platform-independent format into other formats. For example, a custom converter 126 may generate a stream 166 that is provided to an Internet-enabled television 156. The output formats supported by the media server 102 may include, but are not limited to, HLS, HDS, smooth streaming, and MPEG dynamic adaptive streaming over HTTP (MPEG-DASH).

In a particular embodiment, custom converters may also be used to perform other output processing functions. For example, the custom converter 126 may color all captions associated with dialogue of a first speaker (e.g., "John") blue and all captions associated with dialogue of a second speaker (e.g., "Susan") orange (e.g., to enhance caption readability or to make such captions more distinguishable to a viewer that is color-blind). Multiple converters may be applied prior to transmitting an output stream to a destination device. Similarly, multiple ingesters may be applied when processing an input stream from a VOD or caption source.

During operation, one or more of the ingesters 110-114 may receive closed captioning data associated with a VOD asset, where the closed captioning data is represented in a first format. For example, the closed captioning data may be in a file that is received by the media server 102 along with the VOD asset. Alternately, the VOD asset and/or the closed captioning data may previously have been received by the media server 102 and stored in the storage device(s) 118. As another example, the closed captioning data may be embedded in a stream of the VOD asset that is received by the media server 102. The ingesters 110-114 may convert the closed captioning data from the first format into the platform-independent format. The converters 120-126 may convert the closed captioning data from the platform-independent format into a second format that is compatible with particular destination device(s).

The aforementioned operations may be performed on-the-fly (e.g., during a single VOD streaming session) in response to a request from a destination device for closed captioning data associated with a stream of a VOD asset. The request may be received during communication of a stream of the VOD asset, where the VOD asset is being received from one of the VOD sources 130-134 or the storage device(s) 118, transcoded by the processing module 116, and provided to the destination device. For example, a user at a destination device may elect to turn on closed captions for a VOD stream that the user is viewing. In a particular embodiment, the request may also specify a desired language for the closed captions. In response to the request, and without interrupting display of the VOD stream, the media server 102 may identify a source of the requested closed captioning data, load an ingester that is compatible with the format of the closed captioning data provided by the source, process the platform-independent closed captioning data provided by the ingester, load a converter to convert the processed platform-independent closed captioning data into a format that is compatible with the destination device, and provide the converted closed captioning data to the destination device. Similarly, the media server 102 may switch closed captioning from a first language to a second language without interrupting communication of the VOD stream. Moreover, it should be noted that although multiple ingesters 110-114 and multiple converters 120-126 are shown in FIG. 1, this is for example only. In some embodiments, a single ingester or type of ingester, and/or a single converter or type of converter, may be present. For example, the media server 102 may include the TTML ingester 110 but no video stream-based ingester. As another example, the media server 102 may include the MP4 ingester 112 but no file-based ingester.

The system 100 of FIG. 1 may thus enable dynamic receipt, conversion, and transmission of closed captioning data for VOD assets. For example, a VOD asset provider may provide a TTML file with closed captioning data and rely on the media server 102 to dynamically process the TTML file to provide closed captioning over multiple outbound protocols to multiple types of devices. Unlike single-format systems, the system 100 of FIG. 1 may process data in various input and output formats, thereby providing a flexible closed captioning framework that accommodates the disparity in formats provided by VOD sources and formats expected by destination devices. Moreover, the provided framework may be extended by users or third parties by creation of custom ingesters and custom converters (e.g., as new input and output formats become popular).

FIG.
2 is a diagram to illustrate a particular embodiment of a platform-independent closed captioning format that may be used by the system 100 of FIG. 1, and is generally designated 200.

In FIG. 2, the platform-independent closed captioning data includes a first segment 210 of timed text and a second segment 220 of timed text. Each of the segments 210, 220 may include a segment number 231, text 232, a start time 233, an end time 234, and a language identifier 235. The segments 210, 220 may also include layout information, such as a location 236 (e.g.,
* * * * *