1 3GPP2 C.S0050-0 Version 1.0 Date: 12 December 2003

2 3GPP2 File Formats for Multimedia Services

3

4 COPYRIGHT

3GPP2 and its Organizational Partners claim copyright in this document and individual Organizational Partners may copyright and issue documents or standards publications in individual Organizational Partner's name based on this document. Requests for reproduction of this document should be directed to the 3GPP2 Secretariat at [email protected]. Requests to reproduce individual Organizational Partner's documents should be directed to that Organizational Partner. See www.3gpp2.org for more information. 5 6 C.S0050-0 v1.0

1

2 1 Contents

3 1 Contents ...... 2

4 2 List of Figures...... 3

5 3 List of Tables ...... 4

6 4 Scope...... 5

7 5 References...... 5

8 6 Abbreviations...... 6

9 7 Introduction...... 7

10 8 3GPP2 File Format “.3g2”...... 8 11 8.1 Conformance ...... 8 12 8.1.1 File identification 8 13 8.1.2 Registration of codecs 9 14 8.1.3 Interpretation of 3GPP2 file format 9 15 8.1.4 Limitation of the ISO base media file format 9 16 8.2 Codec Registration ...... 9 17 8.2.1 Overview 9 18 8.2.2 Sample Description Box 9 19 8.3 Video ...... 11 20 8.3.1 MPEG-4 Video 11 21 8.3.2 H.263 12 22 8.4 Audio and Speech...... 14 23 8.4.1 MPEG-4 AAC 15 24 8.4.2 AMR 15 25 8.4.3 EVRC 18 26 8.4.4 13K (QCELP) 20 27 8.4.5 SMV 23 28 8.5 Timed Text Format...... 25 29 Text wrap 25 30 TextWrapBox 26

31 9 Presentation and Layout Support (SMIL)...... 26 32 9.1 Media Synchronization and Presentation Format...... 26 33 9.1.1 Document Conformance 27 34 9.1.2 User Agent Conformance 27 35 9.1.3 3GPP2 SMIL Language Profile definition 27 36 9.1.4 Content Model 30

File Format for Multimedia Services for 2 C.S0050-0 v1.0

1 10 File Format for 13K Speech “.QCP” ...... 31

2 11 Compact Multimedia Format “.cmf” ...... 31 3 11.1 Media Presentation Format...... 32 4 11.1.1 Definitions for cmf format 32 5 11.2 Definition of a cmf file ...... 32 6 11.3 Definition of a cmf-header...... 32 7 11.4 Definition of cmf-tracks ...... 35 8 11.5 Tables ...... 40 9 11.5.1 TimeBase Values 40 10 11.5.2 Pitch Bend Range Values 41 11 11.5.3 Fine Pitch Bend Values 41 12 11.6 Acceptable Profiles for CMF file format...... 42 13 11.6.1 Talking Picture messaging 42 14 11.6.2 Audio-only profile 42 15 11.6.3 Picture ringers 42 16 CMF Conformance Guidelines ...... 42 17 11.7.1 Subchunk Requirements 42 18 11.7.2 MIDI Requirements 43 19 11.7.3 Wave Packet Requirements 43 20 11.7.4 Cuepoints and Jumps 43 21 11.7.5 Recycle Requirements 43 22 11.8 File Extension and MIME type for Media presentation ...... 44

23 Annex A. File formats Difference with 3GPP (Informative)...... 44 24 A.1 Relations between ISO, 3GPP, and 3GPP2 file format...... 44 25 A.2 Differences between 3GPP2 and 3GPP ...... 45 26 A.3 Usage of 3GPP branding...... 46

27 Annex B. Guideline for File Format Usage (Informative)...... 46 28 B.1 MSS (Multimedia Streaming Service) ...... 46 29 B.1.1 Server storage for RTP streaming 46 30 B.1.2 Transmission format for pseudo-streaming 46 31 B.2 MMS ...... 48 32 B.3 File download and play back...... 48 33 C.1 EVRC and SMV Speech ...... 48

34

35 2 List of Figures

36 37 Figure 8-1: ISO File Format Box Structure Hierarchy ...... 10 38 Figure 8-2: Frame byte alignment...... 20 39 Figure 8-3: SMV frame byte alignment...... 23

File Format for Multimedia Services for cdma2000 3 C.S0050-0 v1.0

1 Figure A 1: File formats in ISO...... 44 2 Figure A 2: 3GPP file format ...... 45 3 Figure A 3: 3GPP2 file format...... 45 4 Figure B 1: Hinted Presentation for Streaming (Reprint from ISO/IEC 14496-12) ...... 46 5 Figure B 2: Basic sequence of pseudo-streaming ...... 47 6 Figure B 3: Fragmented movie file format...... 48 7 8

9 3 List of Tables

10 11 Table 8-1: The File-Type Box ...... 8 12 Table 8-2: SampleEntry fields...... 11 13 Table 8-3: MP4VisualSampleEntry fields ...... 12 14 Table 8-4: H263SampleEntry fields ...... 13 15 Table 8-5: The H263SpecificBox fields H263SampleEntry...... 14 16 Table 8-6: AudioSampleEntry fields...... 15 17 Table 8-7: AMRSampleEntry fields...... 16 18 Table 8-8: The AMRSpecificBox fields for AMRSampleEntry...... 16 19 Table 8-9: AMRDecSpecStruc ...... 17 20 Table 8-10: EVRCSampleEntry fields...... 19 21 Table 8-11: The EVRCSpecificBox fields for EVRCSampleEntry ...... 19 22 Table 8-12: EVRCDecSpecStruc ...... 19 23 Table 8-13: QCELPSampleEntry fields...... 21 24 Table 8-14: The QCELPSpecificBox fields for QCELPSampleEntry ...... 21 25 Table 8-15: QCELPDecSpecStruc ...... 21 26 Table 8-16: Mapping table ...... 23 27 Table 8-17: SMVSampleEntry fields...... 24 28 Table 8-18: The SMVSpecificBox fields for SMVSampleEntry ...... 24 29 Table 8-19: SMV DECSpecStruc ...... 25 30 Table 8-20: TextWrapBox description ...... 26 31 Table 9-1: Content model for the 3GPP2 SMIL profile ...... 31 32 Table 11-1: TimeBase Values...... 41 33 Table 11-2: Pitch Bend Range values...... 41 34 Table 11-3: Fine PITCH bend range values ...... 42 35

File Format for Multimedia Services for cdma2000 4 C.S0050-0 v1.0

1 4 Scope

2 The objective is to define and standardize a set of common file formats to be used in 3 multimedia services such as Multimedia Streaming Service (MSS) and Multimedia 4 Messaging Service (MMS) and to provide interoperability with existing 3G and the 5 Internet multimedia services to the greatest extent possible. The specific media types 6 and descriptions to be covered include: video, audio, images, graphics, high fidelity 7 audio as well as presentation layout and synchronization.

8 5 References

9 1. S.R0001 – 3GPP2 Specifications List. 10 2. S.R0002 – 3G Capability Descriptions. 11 3. ISO/IEC 14496-12:2003 “Information technology – Coding of audio- 12 visual-objects – Part 12: ISO base media file format”. 13 4. ISO/IEC 14496-14:2003, “Information technology – Coding of audio- 14 visual-objects – Part 14: MP4 file format”. 15 5. 3GPP TS 26.234, “Transparent end-to-end packet switched streaming 16 service (PSS): Protocols and codecs (Release 4)”. 17 6. 3GPP TS 26.234, “Transparent end-to-end packet switched streaming 18 service (PSS): Protocols and codecs (Release 5)”. 19 7. ISO/IEC 14496-1~6: “Information Technology — Generic Coding of 20 Audio-Visual Object”. 21 8. ITU-T Recommendation H.263: “Video Coding for Low Bitrate 22 Communication” 23 9. 3GPP TS 26.071 "Adaptive Multi-Rate (AMR) Speech Codec; General 24 Description” 25 10. 3GPP TS 26.171: “AMR Speech Codec, wideband; General Description”. 26 11. C.S0014-0-1: Enhanced Variable Rate Codec, Speech Service Option 3 27 for Wideband Spread Spectrum Digital Systems 28 12. IETF RFC 2658: “RTP Payload Format for PureVoice(tm) Audio”. 29 13. IETF RFC 3558: “RTP Payload Format for Enhanced Variable Rate 30 Codecs (EVRC) and Selectable Mode Vocoders (SMV)”, June, 2003.b 31 14. IETF RFC 3267: “RTP payload format and file storage format for the 32 Adaptive Multi-Rate (AMR) Adaptive Multi-Rate Wideband (AMR-WB) 33 audio codecs”, March 2002. 34 15. C.S0020-0: “High Rate Speech Service Option 17 for Wideband Spread 35 Spectrum Communications Systems” 36 16. C.S0030-0 Version 2.0 Selectable Mode Vocoder Service Option for 37 Wideband Spread Spectrum Communication Systems. 38 17. 3GPP TS 26.101: “Adaptive Multi-Rate (AMR) speech codec frame 39 structure” 40 18. 3GPP TS 26.201: “AMR Wideband Speech Codec; Frame Structure”

File Format for Multimedia Services for cdma2000 5 C.S0050-0 v1.0

1 19. W3C Recommendation: "Synchronized Multimedia Integration Language 2 (SMIL 2.0)", http://www.w3.org/TR/2001/REC-smil20-20010807/, 3 August 2001. 4 20. ITU-T Recommendation H.263 (annex X): "Annex X: Profiles and levels 5 definition". 6 21. IETF RFC 2234: “Augmented BNF for Syntax Specification: ABNF”. 7 22. IETF RFC 3625: “The QCP File Format and Media Types for Speech 8 Data”. 9 23. ”“Unicode Standard Annex #13: Unicode Newline guidelines”, by Mark 10 Davis. An integral part of “The Unicode Standard, Version 3.1. 11 24. "Standard MIDI Files 1.0", RP-001, "The Complete MIDI 1.0 Detailed 12 Specification, Document Version 96.1" The MIDI Manufacturers 13 Association, Los Angeles, CA, USA, February 1996. 14 25. ITU-T Recommendation T.81: "Information technology; Digital 15 compression and coding of continuous-tone still images: Requirements 16 and guidelines". 17 26. IETF RFC 2083: "PNG (Portable Networks Graphics) Specification version 18 1.0 ". 19 27. Technical note TN1081: “Understanding the Differences between Apple 20 and Windows IMA-ADPCM Compressed Sound Files”, Developer 21 Connection, http://developer.apple.com/technotes/tn/tn1081.html 22 28. “Windows BMP” 23 http://msdn.microsoft.com/library/default.asp?url=/library/en- 24 us/gdi/bitmaps_5jhv.asp 25 29. “Windows Multimedia, Resource /WAVE” 26 http://msdn.microsoft.com/library/default.asp?url=/library/en- 27 us/multimed/htm/_win32_resource_interchange_file_format_services.as 28 p , http://msdn.microsoft.com/library/default.asp?url=/library/en- 29 us/multimed/htm/_win32_about_waveform_audio.asp 30

31 6 Abbreviations

32 For the purpose of this document, the following abbreviations apply: 33 3G Third Generation system 3GP File Format for 3GPP Multimedia Services 3GPP Third Generation Partnership Program 3GPP2 Third Generation Partnership Project 2 AAC AMR Adaptive Multi-Rate AMR-WB Adaptive Multi-Rate - Wideband BMP Bit Map Picture

File Format for Multimedia Services for cdma2000 6 C.S0050-0 v1.0

CMF Compact Multimedia Format EVRC Enhanced Variable Rate Codec IETF Internet Engineering Task Force IP Internet Protocol ISO International Standards Organization ITU-T International Telecommunication Union - Telecommunication Sector JPEG Joint Photographic Experts Group MIDI Musical Instrument Digital Interface MIME Multipurpose Internet Mail Extensions MMA MIDI Manufacturers Association MPEG Moting Picture Experts Group QCELP Qualcomm Code Excited Linear Prediction PNG Portable Network Graphihcs RFC Request for Comments RTP Realtime Transport Protocol SMIL Synchronized Multimedia Integration Language SMV Selectable Mode Vocoder 1

2 7 Introduction

3 The purpose of this standard is to define a set of file formats to be used with 3GPP2 4 multimedia services. Among these file formats is a new format designated as the 3GPP2 5 file format or “.3g2” file format. It is the recommended format to use and can contain 6 multiple media types (such as, video, audio, and timed text). Also included in this 7 release are a file format for 13K [15], “.qcp”, a presentation and layout description 8 language and file format, “.smi”, and a compact multimedia file format, “.cmf”. 9 These file formats can be used for, but not limited to 10 - multimedia content downloading to a terminal, 11 - multimedia file generation and uploading from the originating terminal, 12 - multimedia content exchange between MMS and/or MSS servers, 13 - multimedia content storing to a server, and 14 - multimedia message exchange with other industry system. 15 This document does not specify how this file format is used in specific services. In other 16 words, it should be expressly mandated, if necessary, in other service specifications 17 whether to use this specification.

File Format for Multimedia Services for cdma2000 7 C.S0050-0 v1.0

1 8 3GPP2 File Format “.3g2”

2 The purpose of this section is to define the 3GPP2 file format for multimedia services. 3 This file format is based on the ISO base media file format [3]. Also, it adopts the 4 methodology defined in [5] to integrate necessary structures for inclusion of non-ISO 5 codecs such as H.263 [8], AMR [9], AMR-WB [10], and extends this approach to include 6 3GPP2 specific codecs: EVRC [11], SMV [16] and 13K Speech (QCELP) [15].

7 8.1 Conformance

8 The 3GPP2 file format, used in the specification for timed media (such as video, audio, 9 and timed-text), is structurally based on the ISO base media file format defined in [3]. 10 However, the conformance statement for 3GPP2 files is defined in the present document 11 by addressing file identification (file extension, brand identifier and MIME type 12 definition) and registration of codecs. 13 14 NOTE: Future releases may expand the conformance statement for 3GPP2 files to include more codecs 15 or functionalities by defining new boxes. Boxes of unknown type in the 3GPP2 file shall be 16 ignored.

17 8.1.1 File identification

18 3GPP2 multimedia files can be identified using several mechanisms. When stored in 19 traditional computer file systems, these files should be given the file extension “.3g2” 20 (readers should allow mixed case for the alphabetic characters). The following MIME 21 types are expected to be registered and should be used: “video/3gpp2” (for visual or 22 audio/visual content, where visual includes both video and timed text) and 23 “audio/3gpp2” (for purely audio content). 24 A file-type box Table 8-1, as defined in the ISO Base Media File Format [3] shall be 25 present in conforming files. The file type box ‘ftyp’ shall occur before any variable-length 26 box (e.g. movie, free space, media data). Only a fixed-size box such as a file signature, 27 if required, may precede it. 28 The brand identifier for this specification is '3g2a'. This brand identifier shall occur in 29 the compatible brands list, and may also be the primary brand. Readers should check 30 the compatible brands list for the identifiers they recognize, and not rely on the file 31 having a particular primary brand, for maximum compatibility. Files may be 32 compatible with more than one brand, and have a 'best use' other than this 33 specification, yet still be compatible with this specification. 34

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 'ftyp' Brand Unsigned int(32) The major or ‘best use’ of this file MinorVersion Unsigned int(32) CompatibleBrands Unsigned int(32) A list of brands, to end of the Box 35 Table 8-1: The File-Type Box

36 Brand: Identifies the ‘best use’ of this file. The brand should match the file extension. 37 For files with extension ‘.3g2’ and conforming to this specification, the brand shall

File Format for Multimedia Services for cdma2000 8 C.S0050-0 v1.0

1 be ‘3g2a’. 2 MinorVersion: This identifies the minor version of the brand. Files with brand '3g2X', 3 where X is an alphabet, shall have a corresponding releaseX.Y.Z such that X = 1 4 when the brand equals A; X = 2 when the brand equals B; and so on. A conforming 5 minor version value for releaseX.y.z uses the byte aligned and right adjusted value 6 of releaseX*2562 + y*256+z. 7 CompatibleBrands: a list of brand identifiers (to the end of the Box). ‘3g2a’ shall be a 8 member of this list. The brand compatibility list shall include major brands “3gp4” 9 and/or “3gp5” as described in [5] and [6] when the file content meets the conditions 10 defined in Annex A section A.3.

11 8.1.2 Registration of codecs

12 In 3GPP2 files, MPEG-4 video and MPEG-4 AAC audio streams, and other ISO codec 13 streams as well as the non-ISO media streams such as AMR narrow-band speech, AMR 14 wide-band speech, EVRC speech, H.263 video, 13K speech, SMV speech and timed text 15 can be included as described in this specification.

16 8.1.3 Interpretation of 3GPP2 file format

17 All index numbers used in the 3GPP2 file format start with the value one rather than 18 zero, in particular “first-chunk” in Sample to chunk box, “sample-number” in Sync 19 sample box and “shadowed-sample-number”, “sync-sample-number” in Shadow sync 20 sample box.

21 8.1.4 Limitation of the ISO base media file format

22 The following limitation to the ISO base media file format [3] shall apply to a 3GPP2 file: 23 A 3GPP2 file shall be self-contained, i.e., there shall not be references to external media 24 data from inside the 3GPP2 file.

25 8.2 Codec Registration

26

27 8.2.1 Overview

28 The purpose of this section is to give some background information about the Sample 29 Description Box in the ISO base media file format [3]. The following sections define the 30 necessary structure for integration of the MPEG-4 video, H.263 video, AMR speech, 31 AMR-WB speech, MPEG-4 AAC audio, 13K speech, EVRC speech, and SMV speech 32 media specific information in a 3GPP2 file. Section 8.3 and 8.4 describe Sample Entries 33 for video and audio media respectively. Section 8.5 describes TextSampleEntry for the 34 timed text.

35 8.2.2 Sample Description Box

36 In an ISO file, Sample Description Box gives detailed information about the coding type 37 used, and any initialization information needed for that coding. The Sample Description 38 Box can be found in the ISO file format Box Structure Hierarchy shown in Figure 8-1.

File Format for Multimedia Services for cdma2000 9 C.S0050-0 v1.0

Movie Box

Track Box

Media Box

Media Information Box

Sample Table Box

Sample Description Box

1

2 Figure 8-1: ISO File Format Box Structure Hierarchy

3 The Sample Description Box can have one or more Sample Entry Boxes. Valid Sample 4 Entry Boxes already defined for ISO [5] and MP4 [5] include MP4AudioSampleEntry, 5 MP4VisualSampleEntry, and HintSampleEntry. 6 In addition, the Sample Entry Box for H.263 video shall be H263SampleEntry. 7 The Sample Entry Box for AMR and AMR-WB speech shall be AMRSampleEntry. 8 The Sample Entry Box for EVRC speech shall be EVRCSampleEntry. 9 The Sample Entry Box for 13K (QCELP) speech shall be QCELPSampleEntry or 10 MP4AudioSampleEntry. 11 The Sample Entry Box for SMV speech shall be SMVSampleEntry. 12 The Sample Entry Box for timed text shall be TextSampleEntry. 13 14 The format of SampleEntry and its fields are explained as follows: 15

File Format for Multimedia Services for cdma2000 10 C.S0050-0 v1.0

1 SampleEntry ::= 2 MP4VisualSampleEntry | 3 MP4AudioSampleEntry | 4 H263SampleEntry | 5 AMRSampleEntry | 6 EVRCSampleEntry | 7 QCELPSampleEntry | 8 SMVSampleEntry | 9 TextSampleEntry | 10 HintSampleEntry 11

Field Type Details Value MP4VisualSampleEntry Entry type for visual samples defined in the MP4 specification [5]. MP4AudioSampleEntry Entry type for audio samples defined in the MP4 specification [5]. H263SampleEntry Entry type for H.263 visual samples defined in section 8.3.2 of the present document. AMRSampleEntry Entry type for AMR and AMR-WB speech samples defined in section 8.4 of the present document. EVRCSampleEntry Entry type for EVRC speech samples defined in section 8.4 of the present document. QCELPSampleEntry Entry type for 13k (QCELP) speech samples defined in section 8.4 of the present document. SMVSampleEntry Entry type for SMV speech samples defined in section 8.4 of the present document. TextSampleEntry Entry type for timed text samples defined in section 8.5 of the present document. HintSampleEntry Entry type for hint track samples defined in the ISO specification [5]. 12 Table 8-2: SampleEntry fields

13 8.3 Video

14 This section describes Sample Entries for video.

15 8.3.1 MPEG-4 Video

16 NOTE: Throughout this document MPEG-4 Visual is referred to as MPEG-4 video, which should be 17 taken to mean encoding of natural (pixel based) video using MPEG-4 Visual methods.

18 8.3.1.1 MP4VisualSampleEntry Box

19 The MP4VisualSampleEntry Box in the MPEG-4 file format [4] is defined as follows: 20 21 MP4VisualSampleEntry ::= BoxHeader 22 Reserved_6

File Format for Multimedia Services for cdma2000 11 C.S0050-0 v1.0

1 Data-reference-index 2 Reserved_16 3 Width 4 Height 5 Reserved_4 6 Reserved_4 7 Reserved_4 8 Reserved_2 9 Reserved_32 10 Reserved_2 11 Reserved_2 12 ESDBox 13

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 'mp4v' Reserved_6 Unsigned int(8) [6] 0 Data-reference-index Unsigned int(16) Index to a data reference that to use to retrieve the sample data. Data references are stored in data reference boxes. Reserved_16 Const unsigned int(32) [4] 0 Width Unsigned int(16) Maximum width, in pixels of the stream Height Unsigned int(16) Maximum height, in pixels of the stream Reserved_4 Const unsigned int(32) 0x00480000 Reserved_4 Const unsigned int(32) 0x00480000 Reserved_4 Const unsigned int(32) 0 Reserved_2 Const unsigned int(16) 1 Reserved_32 Const unsigned int(8) [32] 0 Reserved_2 Const unsigned int(16) 24 Reserved_2 Const int(16) -1 ESDBox Box containing an elementary stream descriptor for this stream. 14 Table 8-3: MP4VisualSampleEntry fields

15 The stream type specific information is in the ESDBox structure. 16 This version of the MP4VisualSampleEntry, with explicit width and height, shall be 17 used for MPEG-4 video streams conformant to this specification. 18 19 NOTE: Width and height parameters together may be used to allocate the necessary memory in the 20 playback device without need to analyse the video stream.

21 8.3.2 H.263

22 8.3.2.1 H263SampleEntry Box

23 The Box type of the H263SampleEntry Box shall be 's263'. 24 The H263SampleEntry Box is defined as follows: 25

File Format for Multimedia Services for cdma2000 12 C.S0050-0 v1.0

1 H263SampleEntry ::= BoxHeader 2 Reserved_6 3 Data-reference-index 4 Reserved_16 5 Width 6 Height 7 Reserved_4 8 Reserved_4 9 Reserved_4 10 Reserved_2 11 Reserved_32 12 Reserved_2 13 Reserved_2 14 H263SpecificBox 15 Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 's263' Reserved_6 Unsigned int(8) [6] 0 Data-reference-index Unsigned int(16) Index to a data reference that to use to retrieve the sample data. Data references are stored in data reference boxes. Reserved_16 Const unsigned int(32) [4] 0 Width Unsigned int(16) Maximum width, in pixels of the stream Height Unsigned int(16) Maximum height, in pixels of the stream Reserved_4 Const unsigned int(32) 0x00480000 Reserved_4 Const unsigned int(32) 0x00480000 Reserved_4 Const unsigned int(32) 0 Reserved_2 Const unsigned int(16) 1 Reserved_32 Const unsignedint(8) [32] 0 Reserved_2 Const unsigned int(16) 24 Reserved_2 Const int(16) -1 H263SpecificBox Information specific to the H.263 decoder. 16 Table 8-4: H263SampleEntry fields

17 If one compares the MP4VisualSampleEntry box to the H263SampleEntry box the main 18 difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems, 19 with a Box suitable for H.263: H263SpecificBox. The H263SpecificBox field structure 20 for H.263 is described in Section 8.3.2.2. 21 8.3.2.2 H263SpecificBox field for H263SampleEntry Box

22 The H263SpecificBox fields for H.263 shall be as defined in Table 8-5. The 23 H263SpecificBox for the H263SampleEntry Box shall always be included if the 3GPP2 24 file contains H.263 media.

File Format for Multimedia Services for cdma2000 13 C.S0050-0 v1.0

1 The H263SpecificBox for H263 is composed of the following fields.

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) ‘d263’ DecSpecificInfo H263DecSpecStruc Structure which holds the H.263 Specific information 2 Table 8-5: The H263SpecificBox fields H263SampleEntry

3 BoxHeader Size and Type: indicate the size and type of the H.263 decoder-specific 4 Box. The type must be ‘d263’. 5 DecSpecificInfo: This is the structure where the H263 stream specific information 6 resides. 7 H263DecSpecStruc is defined as follows: 8 9 struct H263DecSpecStruc{ 10 Unsigned int (32) vendor 11 Unsigned int (8) decoder_version 12 Unsigned int (8) H263_Level 13 Unsigned int (8) H263_Profile 14 } 15 The definitions of H263DecSpecStruc members are as follows: 16 vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor 17 field gives information about the vendor whose codec is used to create the encoded 18 data. It is an informative field which may be used by the decoding end. If a vendor 19 already has a four character code, that code should be used in this field. Otherwise, a 20 vendor may create a four character code which best expresses the vendor’s name. This 21 field may be ignored. 22 decoder_version: version of the vendor’s decoder which can decode the encoded 23 stream in the best (i.e. optimal) way. This field is closely associated with the vendor 24 field. It may be used advantageously by vendors which have optimal encoder-decoder 25 version pairs. The value shall be set to 0 if the decoder version has no importance for 26 the vendor. This field may be ignored. 27 H263_Level and H263_Profile: These two parameters define which H263 profile and 28 level is used. These parameters are based on the MIME media type, video/H263-2000. 29 The profile and level specifications can be found in [20]. 30 31 EXAMPLE 1: H.263 Baseline = {H263_Level = 10, H263_Profile = 0}

32 EXAMPLE 2: H.263 Profile 3 @ Level 10 = {H263_Level = 10 , H263_Profile = 3}

33 NOTE: The "hinter", for the creation of the hint tracks, can use the information given by 34 the H263DecSpecStruc members.

35 8.4 Audio and Speech

36 This section describes Sample Entries for audio and speech.

File Format for Multimedia Services for cdma2000 14 C.S0050-0 v1.0

1 8.4.1 MPEG-4 AAC

2 3 8.4.1.1 MP4AudioSampleEntry Box

4 The MP4AudioSampleEntryBox in the MPEG-4 file format [4] is defined as follows: 5 6 MP4AudioSampleEntry::= BoxHeader 7 Reserved_6 8 Data-reference-index 9 Reserved_8 10 Reserved_2 11 Reserved_2 12 Reserved_4 13 TimeScale 14 Reserved_2 15 ESDBox 16

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 'mp4a' Reserved_6 Unsigned int(8) [6] 0 Data-reference-index Unsigned int(16) Index to a data reference that to use to retrieve the sample data. Data references are stored in data reference Boxs. Reserved_8 Const unsigned int(32) [2] 0 Reserved_2 Const unsigned int(16) 2 Reserved_2 Const unsigned int(16) 16 Reserved_4 Const unsigned int(32) 0 TimeScale Unsigned int(16) Copied from track Reserved_2 Const unsigned int(16) 0 ESDBox Box containing an elementary stream descriptor for this stream. 17 Table 8-6: AudioSampleEntry fields

18 The stream type specific information is in the ESDBox structure. 19 NOTE: Other ISO codecs such as MPEG2 Audio Layer III (MP3) can be expressed in this Box 20 according to MPEG-4 File Format [4].

21 8.4.2 AMR

22 AMR and AMR-WB speech data are stored in the stream according to the AMR and 23 AMR-WB storage format for single channel header [14], without the AMR magic 24 numbers. 25 8.4.2.1 AMRSampleEntry Box

26 For narrow-band AMR, the box type of the AMRSampleEntry Box shall be 'samr'. For 27 AMR wideband (AMR-WB), the box type of the AMRSampleEntry Box shall be 'sawb'.

File Format for Multimedia Services for cdma2000 15 C.S0050-0 v1.0

1 The AMRSampleEntry Box is defined as follows: 2 3 AMRSampleEntry ::= BoxHeader 4 Reserved_6 5 Data-reference-index 6 Reserved_8 7 Reserved_2 8 Reserved_2 9 Reserved_4 10 TimeScale 11 Reserved_2 12 AMRSpecificBox: 13 Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 'samr' or ‘sawb’ Reserved_6 Unsigned int(8) [6] 0 Data-reference-index Unsigned int(16) Index to a data reference that to use to retrieve the sample data. Data references are stored in data reference Boxs. Reserved_8 Const unsigned int(32) [2] 0 Reserved_2 Const unsigned int(16) 2 Reserved_2 Const unsigned int(16) 16 Reserved_4 Const unsigned int(32) 0 TimeScale Unsigned int(16) Copied from media header Box of this media Reserved_2 Const unsigned int(16) 0 AMRSpecificBox Information specific to the decoder. 14 Table 8-7: AMRSampleEntry fields

15 If one compares the MP4AudioSampleEntry Box to the AMRSampleEntry Box the main 16 difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems, 17 with a box suitable for AMR and AMR-WB. The AMRSpecificBox field structure is 18 described in section 8.4.2.2. 19 8.4.2.2 AMRSpecificBox field for AMRSampleEntry Box

20 The AMRSpecificBox fields for AMR and AMR-WB speech shall be as defined in Table 21 8-8. The AMRSpecificBox for the AMRSampleEntry Box shall always be included if the 22 3GPP2 file contains AMR or AMR-WB media. 23

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) ‘damr’ DecSpecificInfo AMRDecSpecStruc Structure which holds the AMR and AMR-WB Specific information 24 Table 8-8: The AMRSpecificBox fields for AMRSampleEntry

25 BoxHeader Size and Type: indicate the size and type of the AMR decoder-specific Box. 26 The type must be ‘damr’.

File Format for Multimedia Services for cdma2000 16 C.S0050-0 v1.0

1 DecSpecificInfo: the structure where the AMR and AMR-WB stream specific 2 information resides. 3 The AMRDecSpecStruc is defined as follows: 4 Field Type Details Value vendor Unsigned int(32) decoder_version Unsigned int(8) mode_set Unsigned int(16) mode_change_period Unsigned int(8) frames_per_sample unsigned int(8) 5 Table 8-9: AMRDecSpecStruc

6 The definitions of AMRDecSpecStruc members are as follows: 7 vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor 8 field gives information about the vendor whose codec is used to create the encoded 9 data. It is an informative field which may be used by the decoding end. If a 10 manufacturer already has a four character code it should be used in this field. 11 Otherwise, a vendor may create a four character code which best expresses the 12 vendor’s name. This field may be ignored. 13 decoder_version: version of the vendor’s decoder which can decode the encoded 14 stream in the best (i.e. optimal) way. This field is closely associated with the vendor 15 field. It may be used advantageously by vendors, which have optimal encoder-decoder 16 version pairs. The value shall be set to 0 if the decoder version has no importance for 17 the vendor. This field may be ignored. 18 mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to 19 one mode. The bit index of the mode is calculated according to the 4 bit FT field of the 20 AMR or AMR-WB frame structure. The mode_set bit structure is as follows: 21 (B15xxxxxxB8B7xxxxxxB0) where B0 (Least Significant Bit) corresponds to Mode 0, and 22 B8 corresponds to Mode 8. 23 The mapping of existing AMR modes to FT is given in table 1.a in [17]. A value of 24 0x81FF means all modes and comfort noise frames are possibly present in an AMR 25 stream. 26 The mapping of existing AMR-WB modes to FT is given in Table 1.a in [18]. A value of 27 0x83FF means all modes and comfort noise frames are possibly present in an AMR-WB 28 stream. 29 As an example, if mode_set = 0000000110010101b, only Modes 0, 2, 4, 7 and 8 are 30 present in the stream. 31 mode_change_period: defines a number N, which restricts the mode changes only at a 32 multiple of N frames. If no restriction is applied, this value shall be set to 0. If 33 mode_change_period is not 0, the following restrictions apply to it according to the 34 frames_per_sample field: 35 if (mode_change_period < frames_per_sample) 36 frames_per_sample = k x (mode_change_period) 37 else if (mode_change_period > frames_per_sample) 38 mode_change_period = k x (frames_per_sample) 39 where k : integer [2, …]

File Format for Multimedia Services for cdma2000 17 C.S0050-0 v1.0

1 If mode_change_period is equal to frames_per_sample, then the mode is the same for all 2 frames inside one sample. 3 frames_per_sample: defines the number of frames to be considered as 'one sample' 4 inside the 3GPP2 file. This number shall be greater than 0 and less than 16. A value of 5 1 means each frame is treated as one sample. A value of 10 means that 10 frames (of 6 duration 20 msec each) are put together and treated as one sample. It must be noted 7 that, in this case, one sample duration is 20 (msec/frame) x 10 (frame) = 200 msec. For 8 the last sample of the stream, the number of frames can be smaller than 9 frames_per_sample, if the number of remaining frames is smaller than 10 frames_per_sample. 11 12 NOTE: The "hinter", for the creation of the hint tracks, can use the information given by the 13 AMRDecSpecStruc members.

14 8.4.3 EVRC

15 EVRC speech data shall be stored inside of a media track in such a way that is 16 described in Section 11 of [13]. The magic number shall not be included. The codec 17 data frames are stored in a consecutive order with a single TOC entry field as a prefix 18 per each of data frame, where the TOC field is extended to one octet by setting the four 19 most significant bits of the octet to zero, as illustrated in the following figure. 20 21 |<-- Octet 1 -->|<-- Octet 2 -->|<-- …… -->|<-- Octet N -->| 22 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+……+-+-+-+-+-+-+-+-+-+-+ 23 |0|0|0|0|FR Type| One EVRC speech data frame | 24 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+……+-+-+-+-+-+-+-+-+-+-+

25 26 8.4.3.1 EVRCSampleEntry Box

27 For EVRC, the Box type of the EVRCSampleEntry Box shall be 'sevc'. 28 The EVRCSampleEntry Box is defined as follows: 29 30 EVRCSampleEntry ::= BoxHeader 31 Reserved_6 32 Data-reference-index 33 Reserved_8 34 Reserved_2 35 Reserved_2 36 Reserved_4 37 TimeScale 38 Reserved_2

File Format for Multimedia Services for cdma2000 18 C.S0050-0 v1.0

1 EVRCSpecificBox

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 'sevc’ Reserved_6 Unsigned int(8) [6] 0 Data-reference-index Unsigned int(16) Index to a data reference that to use to retrieve the sample data. Data references are stored in data reference Boxs. Reserved_8 unsigned int(32) [2] 0 Reserved_2 unsigned int(16) 2 Reserved_2 unsigned int(16) 16 Reserved_4 unsigned int(32) 0 TimeScale Unsigned int(16) Copied from media header Box of this media Reserved_2 unsigned int(16) 0 EVRCSpecificBox Information specific to the decoder. 2 Table 8-10: EVRCSampleEntry fields

3 If one compares the MP4AudioSampleEntry Box to the EVRCSampleEntry Box the main 4 difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems, 5 with a box suitable for EVRC speech. The EVRCSpecificBox field structure is described 6 in section 8.4.3.2. 7 8.4.3.2 EVRCSpecificBox field for EVRCSampleEntry Box

8 The EVRCSpecificBox fields for EVRC shall be as defined in Table 8-11. The 9 EVRCSpecificBox for the EVRCSampleEntry Box shall always be included if the 3GPP2 10 file contains EVRC media. 11

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) ‘devc’ DecSpecificInfo EVRCDecSpecStruc Structure which holds the EVRC Specific information 12 Table 8-11: The EVRCSpecificBox fields for EVRCSampleEntry

13 BoxHeader Size and Type: indicates the size and type of the EVRC decoder-specific 14 Box. The type shall be ‘devc’. 15 DecSpecificInfo: the structure where the EVRC stream specific information resides. 16 The EVRCDecSpecStruc is defined as follows: 17 Field Type Details Value vendor Unsigned int(32) decoder_version Unsigned int(8) frames_per_sample Unsigned int(8) 18 Table 8-12: EVRCDecSpecStruc

19 The definitions of EVRCDecSpecStruc members are as follows: 20 vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor 21 field gives information about the vendor whose codec is used to create the encoded 22 data. It is an informative field which may be used by the decoding end. If a 23 manufacturer already has a four character code it should be used in this field.

File Format for Multimedia Services for cdma2000 19 C.S0050-0 v1.0

1 Otherwise, a vendor may create a four character code which best expresses the 2 vendor’s name. This field may be ignored. 3 decoder_version: version of the vendor’s decoder which can decode the encoded 4 stream in the best (i.e. optimal) way. This field is closely associated with the vendor 5 field. It may be used advantageously by vendors, which have optimal encoder-decoder 6 version pairs. The value shall be set to 0 if the decoder version has no importance for 7 the vendor. This field may be ignored. 8 frames_per_sample: defines the number of frames to be considered as 'one sample' 9 inside the MP4 file. This number shall be greater than 0 and should be carefully chosen 10 since the ‘access unit’ is decided depending on the value defined by this field. For 11 example, a value of 1 means each frame is treated as one sample. A value of 10 means 12 that 10 frames (of duration 20 msec each) are aggregated and treated as one sample. It 13 must be noted that, in this case, one sample duration is 20 (msec/frame) x 10 (frame) = 14 200 msec. For the last sample of the stream, the number of frames can be smaller than 15 frames_per_sample, if the number of remaining frames is smaller than 16 frames_per_sample. 17

18 8.4.4 13K (QCELP)

19 13K speech data shall be stored inside of a media track in the same way for codec data 20 frame format as described in Section 3.2 of [12]. Each codec data frame is zero-padded 21 to become of multiple of octets and the frames are stored in a consecutive order, as 22 illustrated in the following figure. Here ‘z’ is the stuffing bit used to keep byte 23 alignment; its value is 0. 24 25 |<-- Octet 1 -->|<-- Octet 2 -->|<-- …… -->|<-- Octet N -->| 26 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+……+-+-+-+-+-+-+-+-+-+-+ 27 | Rate | One 13K speech data frame …|z|z| 28 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+……+-+-+-+-+-+-+-+-+-+-+ 29 30 Figure 8-2: Frame byte alignment

31 8.4.4.1 QCELPSampleEntry Box

32 For 13K, the box type of the QCELPSampleEntry Box shall be 'sqcp'. 33 The QCELPSampleEntry Box is defined as follows: 34 35 QCELPSampleEntry::= BoxHeader 36 Reserved_6 37 Data-reference-index 38 Reserved_8 39 Reserved_2 40 Reserved_2 41 Reserved_4 42 TimeScale 43 Reserved_2

File Format for Multimedia Services for cdma2000 20 C.S0050-0 v1.0

1 QCELPSpecificBox 2

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 'sqcp’ Reserved_6 Unsigned int(8) [6] 0 Data-reference-index Unsigned int(16) Index to a data reference that to use to retrieve the sample data. Data references are stored in data reference Boxs. Reserved_8 Const unsigned int(32) [2] 0 Reserved_2 Const unsigned int(16) 2 Reserved_2 Const unsigned int(16) 16 Reserved_4 Const unsigned int(32) 0 TimeScale Unsigned int(16) Copied from media header Box of this media Reserved_2 Const unsigned int(16) 0 QCELPSpecificBox Information specific to the decoder. 3 Table 8-13: QCELPSampleEntry fields

4 If one compares the MP4AudioSampleEntry Box to the QCELPSampleEntry Box the 5 main difference is in the replacement of the ESDBox, which is specific to MPEG-4 6 systems, with a box suitable for 13k. The QCELPSpecificBox field structure is 7 described in Section 8.4.4.2. 8 8.4.4.2 QCELPSpecificBox field for QCELPSampleEntry Box

9 The QCELPSpecificBox fields for 13K speech shall be as defined in Table 8-14. The 10 QCELPSpecificBox for the QCELPSampleEntry Box shall always be included if the 11 3GPP2 file contains 13K speech media. 12

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) ‘dqcp’ DecSpecificInfo QCELPDecSpecStruc Structure which holds the 13K (QCELP) speech specific information 13 Table 8-14: The QCELPSpecificBox fields for QCELPSampleEntry

14 BoxHeader Size and Type: indicate the size and type of the 13k decoder-specific Box. 15 The type must be ‘dqcp’. 16 DecSpecificInfo: the structure where the 13K speech stream specific information 17 resides. The QCELPDecSpecStruc is defined as follows: 18 Field Type Details Value vendor Unsigned int(32) decoder_version Unsigned int(8) frames_per_sample Unsigned int(8) 19 Table 8-15: QCELPDecSpecStruc

20 The definitions of QCELPDecSpecStruc members are as follows:

File Format for Multimedia Services for cdma2000 21 C.S0050-0 v1.0

1 vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor 2 field gives information about the vendor whose codec is used to create the encoded 3 data. It is an informative field, which may be used by the decoding end. If a 4 manufacturer already has a four character code, it is recommended that it uses the 5 same code should be used in this field. Otherwise, a vendor may create a four character 6 code which best expresses the vendor’s name. Else, it is recommended that the 7 manufacturer creates a four character code which best addresses the manufacturer’s 8 name. This field may be safely ignored. 9 decoder_version: version of the vendor’s decoder which can decode the encoded 10 stream in the best (i.e. optimal) way. This field is closely associated with the vendor 11 field. It may be used advantageously by the vendors, which have optimal encoder- 12 decoder version pairs. The value shall be set to 0 if the decoder version has no 13 importance for the vendor. This field may be safely ignored. 14 frames_per_sample: defines the number of frames to be considered as 'one sample' 15 inside the file. This number shall be greater than 0 and should be carefully chosen 16 since the ‘access unit’ is decided depending on the value defined by this field. A value of 17 1 means each frame is treated as one sample. A value of 10 means that 10 frames (of 18 duration 20 msec each) are aggregated and treated as one sample. It must be noted 19 that, in this case, one sample duration is 20 (msec/frame) x 10 (frame) = 200 msec. For 20 the last sample of the stream, the number of frames can be smaller than 21 frames_per_sample, if the number of remaining frames is smaller than 22 frames_per_sample. 23 8.4.4.3 13K (QCELP) Support in MP4AudioSampleEntry Box

24 For storage of 13K speech media, MP4AudioSampleEntry also can be used. 13K speech 25 data shall be stored inside of a media track in the same way as described in [22] 26 When storing 13K speech bitstreams in a 3GPP2 file, the handler-type field within the 27 HandlerAtom shall be set to ‘soun’ to indicate media of type AudioStream, and the 28 SampleEntry Box type shall be 'mp4a' and the same Box described in Section 8.4.1.1 is 29 used. 30 For inclusion of 13K speech media in MP4AudioSampleEntry, the stream type specific 31 information is in the ESDBox structure. The 13K speech codec is to be signaled by new 32 value from the ‘User Public’ area of ‘objectTypeIndication’ within the 33 DecoderConfigDescriptor structure. 34 objectTypeIndication = 0xE1 ; 35 The QCELPDecoderSpecificInfo in ABNF [21] format is specified as 36 QCELPDecoderSpecificInfo = QLCM fmt 37 The above ABNF rule indicates that QCELPDecoderSpecificInfo is the same as the 38 header for 13K vocoder in “.qcp” file format as described in [22], but without RIFF, riff- 39 size, or anything after fmt. In addition, if the size of packets is completely constant i.e. 40 fixed rate encoding, the following rules apply to the definition of fmt (see variable-rate 41 referenced by codec-info). 42 num-rates = 0 43 rate-map = 0 44 major = 1

File Format for Multimedia Services for cdma2000 22 C.S0050-0 v1.0

1 8.4.4.4 Mapping of QCELPSampleEntry Box and 13K Support in 2 MP4AudioSampleEntry Box

3 Variables in QCELPSampleEntry Box and DecoderSpecificInfo in 4 MP4AudioSampleEntry Box for 13K is translated as described in the following table. 5 QCELPSampleEntry MP4AudioSampleEntry Vendor The first 4 bytes of Name field decoder_version The fifth byte of Name field framesPerSample (fPS) N.A. fPS = sPB / (sPS * 0.02) N.A. AvgBitsPerSec (aBPS) Calculated based on duration field in Track Header Box and data size in Sample Size Box. N.A. bytesPerBlock (bPB) Calculated according to the equation: aBPS = bPB * 8(bits/Byte) / (0.02 * fPS) N.A. samplePerBlock (sPB) sPB = sPS * 0.02 * fPS N.A. numOfRates and bytesPerPacket When fixed rate encoding is used, all fields should be set 0x00. When variable rate encoding is used, example 1 in packet definition in [22]. 6 7 Table 8-16: Mapping table

8 8.4.5 SMV

9 SMV speech data shall be stored inside of a media track in such a way that is described 10 in Section 11 of [13]. The magic number is not included. The codec data frames are 11 stored in a consecutive order with a single TOC entry field as a prefix per each of data 12 frame, where the TOC field is extended to one octet by setting the four most significant 13 bits of the octet to zero, as illustrated in the following figure. 14 15 |<-- Octet 1 -->|<-- Octet 2 -->|<-- …… -->|<-- Octet N -->| 16 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+……+-+-+-+-+-+-+-+-+-+-+ 17 |0|0|0|0|FR Type| One SMV speech data frame | 18 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+……+-+-+-+-+-+-+-+-+-+-+

19 20 Figure 8-3: SMV frame byte alignment

File Format for Multimedia Services for cdma2000 23 C.S0050-0 v1.0

1 8.4.5.1 SMVSampleEntry Box

2 For SMV speech, the box type of the SMVSampleEntry Box shall be 'ssmv'. 3 The SMVSampleEntry Box is defined as follows: 4 5 SMVSampleEntry ::= BoxHeader 6 Reserved_6 7 Data-reference-index 8 Reserved_8 9 Reserved_2 10 Reserved_2 11 Reserved_4 12 TimeScale 13 Reserved_2 14 SMVSpecificBox 15 Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) 'ssmv’ Reserved_6 Unsigned int(8) [6] 0 Data-reference-index Unsigned int(16) Index to a data reference that to use to retrieve the sample data. Data references are stored in data reference Boxs. Reserved_8 Const unsigned int(32) [2] 0 Reserved_2 Const unsigned int(16) 2 Reserved_2 Const unsigned int(16) 16 Reserved_4 Const unsigned int(32) 0 TimeScale Unsigned int(16) Copied from media header Box of this media Reserved_2 Const unsigned int(16) 0 SMVSpecificBox Information specific to the decoder. 16 Table 8-17: SMVSampleEntry fields

17 If one compares the AudioSampleEntry Box for the SMVSampleEntry Box the main 18 difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems, 19 with a box suitable for SMV. The SMVSpecificBox field structure is described in 20 section 8.4.5.2. 21 8.4.5.2 SMVSpecificBox field for SMVSampleEntry Box

22 The SMVSpecificBox fields for SMV shall be as defined in Table 8-18. The 23 SMVSpecificBox for the SMVSampleEntry Box shall always be included if the MP4 file 24 contains SMV media.

Field Type Details Value BoxHeader.Size Unsigned int(32) BoxHeader.Type Unsigned int(32) ‘dsmv’ DecSpecificInfo SMVDecSpecStruc Structure which holds SMV Specific information 25 Table 8-18: The SMVSpecificBox fields for SMVSampleEntry

26 BoxHeader Size and Type: indicate the size and type of the SMV decoder-specific Box. 27 The type must be ‘dsmv’.

File Format for Multimedia Services for cdma2000 24 C.S0050-0 v1.0

1 DecSpecificInfo: the structure where the SMV stream specific information resides. 2 The SMVDecSpecStruc is defined as follows: 3 Field Type Details Value Vendor Unsigned int(32) decoder_version Unsigned int(8) Frames_per_sample Unsigned int(8) 4 Table 8-19: SMV DECSpecStruc

5 The definitions of SMVDecSpecStruc members are as follows: 6 vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor 7 field gives information about the vendor whose codec is used to create the encoded 8 data. It is an informative field, which may be used by the decoding end. If a 9 manufacturer already has a four character code, it is recommended that it uses the 10 same code should be used in this field. Otherwise, a vendor may create a four character 11 code which best expresses the vendor’s name. Else, it is recommended that the 12 manufacturer creates a four character code which best addresses the manufacturer’s 13 name. This field may be safely ignored. 14 decoder_version: version of the vendor’s decoder which can decode the encoded 15 stream in the best (i.e. optimal) way. This field is closely associated with the vendor 16 field. It may be used advantageously by the vendors, which have optimal encoder- 17 decoder version pairs. The value shall be set to 0 if the decoder version has no 18 importance for the vendor. This field may be safely ignored. 19 frames_per_sample: defines the number of frames to be considered as 'one sample' 20 inside the MP4 file. This number shall be greater than 0 and should be carefully chosen 21 since the ‘access unit’ is decided depending on the value defined by this field. For 22 example, a value of 1 means each frame is treated as one sample. A value of 10 means 23 that 10 frames (of duration 20 msec each) are aggregated and treated as one sample. It 24 must be noted that, in this case, one sample duration is 20 (msec/frame) x 10 (frame) = 25 200 msec. For the last sample of the stream, the number of frames can be smaller than 26 frames_per_sample, if the number of remaining frames is smaller than 27 frames_per_sample. 28

29 8.5 Timed Text Format

30 For timed text in downloaded files, 3GPP Timed Text shall be used [6]. This is for 31 download use, not for streaming. 32 Note that operators may deploy terminals specified with rules and restrictions in 33 addition to those in 3GPP timed text. Behavior that is optional for 3GPP Timed Text 34 may be mandatory for particular deployments. In particular, the required character set 35 is almost certainly dependent on the geography of the deployment. 36 In 3GPP2 Timed Text, a text wrap function is additionally included as an optional 37 feature. Therefore, 3GPP Timed Text is modified as following.

38 8.5.1 Text wrap

39 Automatic wrapping of text from line to line is complex, and can require hyphenation 40 rules and other complex, language-specific criteria. For these reasons, text wrap is

File Format for Multimedia Services for cdma2000 25 C.S0050-0 v1.0

1 optional in this specification. Text wrap is specified using TextWrapBox. Terminals 2 supporting this feature shall use TextWrapBox. When text wrap is not used or a string 3 is too long to be drawn within the box, it is clipped. The terminal may choose whether 4 to clip at the pixel boundary, or to render only whole glyphs. 5 There may be multiple lines of text in a sample (hard wrap). Terminals shall start a 6 new line for the Unicode characters line separator (\u2028), paragraph separator 7 (\u2029) and line feed (\u000A). Terminals should follow Unicode Technical Report 13 8 [23]. Terminals should treat carriage return (\u000D), next line (\u0085) and CR+LF 9 (\u000D\u000A) as new line.

10 8.5.2 TextWrapBox

11 'twrp' - Specifies text wrap: the Box contains one 8-bit integer as a wrap mode flag. 12 13 class TextWrapBox() extends TextSampleModifierBox (‘twrp’) { 14 unsigned int(8) wrap_flag; 15 } 16 17 wrap_flag: 18 Value Description 0 No wrap 1 Automatic wrap 2-255 Reserved

19 Table 8-20: TextWrapBox description

20 9 Presentation and Layout Support (SMIL)

21 This section describes the 3GPP2 SMIL profile.

22 9.1 Media Synchronization and Presentation Format

23 3GPP2 SMIL is a markup language based on SMIL Basic [19] and SMIL Scalability 24 Framework. 25 3GPP2 SMIL consists of the modules required by SMIL Basic Profile (and SMIL 2.0 Host 26 Language Conformance) and additional MediaAccessibility, MediaDescription, 27 MediaClipping, MetaInformation, PrefetchControl, EventTiming and BasicTransitions 28 modules. All of the following modules are included: 29 - SMIL 2.0 Content Control Modules – BasicContentControl, SkipContentControl 30 and PrefetchControl 31 - SMIL 2.0 Layout Module – BasicLayout 32 - SMIL 2.0 Linking Module – BasicLinking, LinkingAttributes 33 - SMIL 2.0 Media Object Modules – BasicMedia, MediaClipping, 34 MediaAccessibility and MediaDescription 35 - SMIL 2.0 Metainformation Module – Metainformation 36 - SMIL 2.0 Structure Module – Structure

File Format for Multimedia Services for cdma2000 26 C.S0050-0 v1.0

1 - SMIL 2.0 Timing and Synchronization Modules – BasicInlineTiming, 2 MinMaxTiming, BasicTimeContainers, RepeatTiming and EventTiming 3 - SMIL 2.0 Transition Effects Module – BasicTransitions

4 9.1.1 Document Conformance

5 A conforming 3GPP2 SMIL document shall be a conforming SMIL 2.0 document. 6 All 3GPP2 SMIL documents use SMIL 2.0 namespace. 7 8 3GPP2 SMIL documents may declare requirements using systemRequired attribute: 9 EXAMPLE1: 10 13 Namespace URI http://www.3gpp2.org/SMIL20/FFMS10/ identifies the version of the 14 3GPP2 SMIL profile described in the present document. Authors may use this URI to 15 indicate requirement for exact 3GPP2 SMIL semantics for a document or a subpart of a 16 document: 17 EXAMPLE2: 18 21 The content authors should generally not include the FFMS requirement in the 22 document unless the SMIL document relies on FFMS specific semantics that are not 23 part of the W3C SMIL. The reason for this is that SMIL players that are not conforming 24 3GPP2 FFMS10 user agents may not recognize the FFMS10 URI and thus refuse to play 25 the document.

26 9.1.2 User Agent Conformance

27 A conforming 3GPP2 SMIL user agent shall be a conforming SMIL Basic User Agent. 28 A conforming user agent shall implement the semantics of 3GPP2 SMIL as described in 29 Sections 9.1.3 and 9.1.4. 30 A conforming user agent shall recognize: 31 - the URIs of all included SMIL 2.0 modules, 32 - the URI http://www.3gpp2.org/SMIL20/FFMS10/ as referring to all modules 33 and semantics of the version of the 3GPP2 SMIL profile as described in the 34 present document.

35 9.1.3 3GPP2 SMIL Language Profile definition

36 3GPP2 SMIL is based on SMIL 2.0 Basic language profile [19]. This section defines the 37 content model and integration semantics of the included modules where they differ 38 from those defined by SMIL Basic.

File Format for Multimedia Services for cdma2000 27 C.S0050-0 v1.0

1 9.1.3.1 Content Control Modules

2 3GPP2 SMIL includes the content control functionality of the BasicContentControl, 3 SkipContentControl and PrefetchControl modules of SMIL 2.0. PrefetchControl is not 4 part of SMIL Basic and is an additional module in this profile. 5 All BasicContentControl attributes listed in the module specification shall be 6 supported. 7 NOTE: The SMIL specification [19] defines that all functionality of PrefetchControl 8 module is optional. This means thatalthough PrefetchControl is mandatory, user 9 agents may implement some of none of the semantics of PrefetchControl module. 10 The PrefetchControl module adds the prefetch1 element to the content model of SMIL 11 Basic body, switch, par and seq elements. The prefetch element has the attributes 12 defined by the PrefetchControl module (mediaSize, mediaTime and bandwidth), the 13 src attribute, the BasicContentControl attributes and the skip-content attribute. 14 9.1.3.2 Layout Module

15 3GPP2 SMIL shall use the BasicLayout module of SMIL 2.0 for spatial layout. The 16 module is part of SMIL Basic. 17 Default values of the width and height attributes for root-layout shall be the 18 dimensions of the device display area. 19 9.1.3.3 Linking Module

20 3GPP2 SMIL shall use the SMIL 2.0 BasicLinking module for providing hyperlinks 21 between documents and document fragments. The BasicLinking module is from SMIL 22 Basic. 23 When linking to destinations outside the current document, implementations may 24 ignore values "play" and "pause" of the 'sourcePlaystate' attribute and values "new" and 25 "pause" of the 'show' attribute, instead using the semantics of values "stop" and 26 "replace" respectively. When the values of 'sourcePlaystate' and 'show' are ignored the 27 player may also ignore the 'sourceLevel' attribute since it is of no use then. 28 9.1.3.4 Media Object Modules

29 3GPP2 SMIL includes the media elements from the SMIL 2.0 BasicMedia module and 30 attributes from the MediaAccessibility, MediaDescription and MediaClipping modules. 31 MediaAccessibility, MediaDescription and MediaClipping modules are additions in this 32 profile to the SMIL Basic. 33 MediaClipping module adds to the profile the ability to address sub-clips of continuous 34 media. MediaClipping module adds 'clipBegin' and 'clipEnd´ (and for compatibility 35 'clip-begin' and 'clip-end') attributes to all media elements. 36 MediaAccessibility module provides basic accessibility support for media elements. 37 New attributes 'alt', 'longdesc' and 'readIndex' are added to all media elements by this 38 module. MediaDescription module is included by the MediaAccessibility module and 39 adds 'abstract', 'author' and 'copyright' attributes to media elements.

1 Bold indicates elements that are not part of SMIL 2.0 Basic

File Format for Multimedia Services for cdma2000 28 C.S0050-0 v1.0

1 9.1.3.5 MetaInformation Module

2 The MetaInformation module of SMIL 2.0 is included in the profile. This module is an 3 addition, in this profile, to the SMIL Basic and provides a way to include descriptive 4 information about the document content into the document. 5 This module adds meta and metadata elements to the content model of SMIL Basic 6 head element. 7 9.1.3.6 Structure Module

8 The Structure module defines the top-level structure of the document. It is included in 9 SMIL Basic. 10 9.1.3.7 Timing and Synchronization modules

11 The timing modules included in the 3GPP2 SMIL are BasicInlineTiming, MinMaxTiming, 12 BasicTimeContainers, RepeatTiming and EventTiming. The EventTiming module is an 13 addition in this profile to the SMIL Basic. 14 For 'begin' and 'end' attributes either single offset-value or single event-value shall be 15 allowed. Offsets shall not be supported with event-values. 16 Event timing attributes that reference invalid IDs (for example elements that have been 17 removed by the content control) shall be treated as being indefinite. 18 Supported event names and semantics shall be as defined by the SMIL 2.0 Language 19 Profile. All user agents shall be able to raise the following event types: 20 - activateEvent; 21 - beginEvent; 22 - endEvent. 23 The following SMIL 2.0 Language event types should be supported: 24 - focusInEvent; 25 - focusOutEvent; 26 - inBoundsEvent; 27 - outBoundsEvent; 28 - repeatEvent. 29 User agents shall ignore unknown event types and not treat them as errors. 30 Events do not bubble and shall be delivered to the associated media or timed elements 31 only. 32 9.1.3.8 Transition Effects Module

33 3GPP2 SMIL profile includes the SMIL 2.0 BasicTransitions module to provide a 34 framework for describing transitions between media elements. 35 NOTE: The SMIL specification [19] defines that all functionality of BasicTransitions 36 module is optional: "Transitions are hints to the presentation. Implementations must be 37 able to ignore transitions if they so desire and still play the media of the presentation". 38 This means that even although the BasicTransitions module is mandatory user agents

File Format for Multimedia Services for cdma2000 29 C.S0050-0 v1.0

1 may implement semantics of the BasicTransitions module only partially or not to 2 implement them at all. Content authors should use transitions in their SMIL 3 presentation where this appears useful. User agents that fully support the semantics of 4 the Basic Transitions module will render the presentation with the specified transitions. 5 All other user agents will leave out the transitions but present the media content 6 correctly. 7 User agents that implement the semantics of this module should implement at least the 8 following transition effects described in SMIL 2.0 specification [19]: 9 - barWipe; 10 - irisWipe; 11 - clockWipe; 12 - snakeWipe; 13 - pushWipe; 14 - slideWipe; 15 - fade. 16 A user agent should implement the default subtype of these transition effects. 17 A user agent that implements the semantics of this module shall at least support 18 transition effects for non-animated image media elements. For purposes of the 19 Transition Effects modules, two media elements are considered overlapping when they 20 occupy the same region. 21 BasicTransitions module adds attributes 'transIn' and 'transOut' to the media elements 22 of the Media Objects modules, and value "transition" to the set of legal values for the 23 'fill' attribute of the media elements. It also adds transition element to the content 24 model of the head element.

25 9.1.4 Content Model

26 Table 9-1shows the full content model and attributes of the 3GPP2 SMIL profile. The 27 attribute collections used are defined by SMIL Basic [19], SMIL Host Language 28 Conformance requirements, chapter 2.4. Changes to SMIL Basic are shown in bold.

File Format for Multimedia Services for cdma2000 30 C.S0050-0 v1.0

1

Element Elements Attributes smil head, body COMMON-ATTRS, CONTCTRL-ATTRS, xmlns layout, switch, head meta, metadata, COMMON-ATTRS transition TIMING-ELMS, body MEDIA-ELMS, COMMON-ATTRS switch, a, prefetch layout root-layout, region COMMON-ATTRS, CONTCTRL-ATTRS, type COMMON-ATTRS, backgroundColor, height, width, skip- root-layout EMPTY content COMMON-ATTRS, backgroundColor, bottom, fit, height, region EMPTY left, right, showBackground, top, width, z-index, skip- content, regionName COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, ref, animation, audio, img, repeat, region, MEDIA-ATTRS, clipBegin(clip-begin), area video, text, textstream clipEnd(clip-end), alt, longDesc, readIndex, abstract, author, copyright, transIn, transOut a MEDIA-ELMS COMMON-ATTRS, LINKING-ATTRS COMMON-ATTRS, LINKING-ATTRS, TIMING-ATTRS, area EMPTY repeat, shape, coords, nohref TIMING-ELMS, COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, par, seq MEDIA-ELMS, repeat switch, a, prefetch TIMING-ELMS, switch MEDIA-ELMS, COMMON-ATTRS, CONTCTRL-ATTRS layout, a, prefetch COMMON-ATTRS, CONTCTRL-ATTRS, mediaSize, prefetch EMPTY mediaTime, bandwidth, src, skip-content meta EMPTY COMMON-ATTRS, content, name, skip-content metadata EMPTY COMMON-ATTRS, skip-content COMMON-ATTRS, CONTCTRL-ATTRS, dur, type, transition EMPTY subtype, startProgress, endProgress, direction, fadeColor. skip-content 2 Table 9-1: Content model for the 3GPP2 SMIL profile

3 10 File Format for 13K Speech “.QCP”

4 This section describes the qcp file format for reading and writing 13K vocoder packets. 5 RFC2658 [12] specifies RTP streaming for 13K vocoder but does not include a file 6 format. The qcp file format is described in [22]. The MIME type for “.qcp” files with 13K 7 vocoder is “audio/qcelp”.

8 11 Compact Multimedia Format “.cmf”

9 This section presents a file format for creating multimedia files to combine 13K vocoder 10 speech, WAVE/RIFF sound [29], IMA ADPCM sound [27], MIDI [24], text, JPEG [25] 11 pictures, PNG pictures [26], BMP pictures [28] and animation data. This is similar to 12 standard MIDI and provides functionality similar to that provided by the SMIL 13 presentation layer. The format described here results in smaller files compared to SMIL, 14 while maintaining the functionality to create synchronized multimedia presentations. 15 Some examples of presentations that can be created using this format are:

File Format for Multimedia Services for cdma2000 31 C.S0050-0 v1.0

1 • Multimedia ringers with graphics, text, MIDI and speech 2 • Audio postcard messages with speech and JPEG 3 • Advertisements with graphics, text and audio 4 • Karaoke with graphics 5 • Animated cartoons with MIDI, text and speech 6 In addition, the above format allows for addition of new and custom sub-chunks with 7 full backward compatibility. Media elements can be recycled, resulting in more compact 8 files.

9 11.1 Media Presentation Format

10 This section describes “.cmf”, a binary format container for multimedia elements with 11 embedded synchronization information. The file format is presented in ABNF format as 12 specified in [21]

13 11.1.1 Definitions for cmf format

14 This section defines the definitions used in describing the cmf file format. 15 16 UINT32 = 4OCTET 17 ; this field contains a 32-bit integer stored 18 ; as a sequence of four octets, in 19 ; big-endian order (most significant 20 ; octet first) 21 22 UINT16 = 2OCTET 23 ; this field contains a 16-bit integer stored 24 ; as a sequence of two octets, in 25 ; big-endian order (most significant 26 ; octet first) 27 28 OCTET = %x00-FF 29 ; an octet, also called a byte - any possible 30 ; combination of eight bits, forming a single 31 ; integer or part of a larger integer having 32 ; more than eight bits 33 34 length2 = 2OCTET 35 length4 = 4OCTET 36 ; The above two define the length of a track, 37 ; field, chunk, etc. 38

39 11.2 Definition of a cmf file

40 A cmf file is composed of file ID, file length, header information and one or more content 41 tracks, as shown below. 42 cmf-file = "cmid" length4 cmf-header 1*4cmf-tracks

43 11.3 Definition of a cmf-header

44 The following ABNF describes the cmf header. The header sets up some global 45 parameters for interpreting the cmf-tracks.

File Format for Multimedia Services for cdma2000 32 C.S0050-0 v1.0

1 2 ; cmf-header specifies the header format for mm 3 cmf-header = header-length content-type nTracks *sub-chunk 4 header-length = UINT16 5 content-type = (melody (complete / part) ) 6 / (song instruments) 7 melody = %x01 8 ; used for ringers 9 complete = %x01 10 ; all of the melody 11 part = %x02 12 ; part of the melody 13 song = %x02 14 ; used for pictures + audio 15 instruments = OCTET 16 ; 0VMFPTWS 17 ; V=1 contains other vocal parts 18 ; M=1 contains male vocal parts 19 ; F=1 contains female vocal parts 20 ; P=1 contains picture data 21 ; T=1 contains text data 22 ; W=1 contains wave data 23 ; S=1 contains musical event 24 25 nTracks = OCTET 26 ; Currently, up to 4 tracks are supported. 27 28 sub-chunk = vers-chunk cnts-chunk note-chunk 29 ; one or more required chunks 30 ; Only one of each type is allowed. 31 sub-chunk =/ *optional-chunk 32 33 optional-chunk = code-chunk 34 / titl-chunk 35 / date-chunk 36 / sorc-chunk 37 / copy-chunk 38 / exsn-chunk 39 / exsa-chunk 40 / exsb-chunk 41 / exsc-chunk 42 / cuep-chunk 43 / pcpi-chunk 44 / cnts-chunk 45 / prot-chunk 46 / poly-chunk 47 / wave-chunk 48 49 vers-chunk = "vers" %x0004 "0500" 50 code-chunk = "code" %x0001 code-value 51 titl-chunk = "titl" UINT16 *data 52 date-chunk = "date" UINT16 *data 53 sorc-chunk = "sorc" %x0001 source-info 54 copy-chunk = "copy" UINT16 *data 55 ; Content provider’s copyright notice 56 note-chunk = "note" %x0002 note-msg-config 57 note-msg-config = [%x0000 / %x0001] 58 ; %x0000 = Note message is of length 3 octet 59 ; %x0001 = Note message is of length 4 octet 60 ; The extra octet is used to include velocity

File Format for Multimedia Services for cdma2000 33 C.S0050-0 v1.0

1 ; and Octave Shift information. 2 exsn-chunk = "exsn" %x0002 2data 3 ; specifies the length of normal extension 4 ; status-A message 5 exsa-chunk = "exsa" %x0002 2data 6 ; specifies the length of extension 7 ; status-A, class A message 8 exsb-chunk = "exsb" %x0002 2data 9 ; specifies the length of extension 10 ; status-A, class B message 11 exsc-chunk = "exsc" %x0002 2data 12 ; specifies the length of extension 13 ; status-A, class C message 14 cuep-chunk = "cuep" 4nTracks *data 15 ; This sub chunk specifies starting offsets into 16 ; each track where the player can locate the 17 ; "cue point start point", or the beginning of the 18 ; theme music in the track. The "Length" field, 19 ; and subsequently the combined total length of the 20 ; "Cue Point Start Point" fields, must equal the 21 number 22 ; of track chunks multiplied by 4. Every 4 bytes 23 ; consists of the "Cue Point Start Point" field 24 ; corresponding to a track chunk. 25 ; Each cue point start point is defined to be a byte 26 ; offset from the first event in the corresponding 27 ; track to the beginning of the theme music event in 28 ; that track. If a cue point start point value is 29 ; FFFFFFFF, the cue point is considered "invalid" and 30 ; not used. 31 pcpi-chunk = "pcpi" %x0001 [%x00-01] 32 ; pcpi describes the picture packet information. 33 ; %0x00 = XY offsets in pcpi are in percent 34 ; %0x01 = XY offsets in pcpi are in pixels 35 cnts-chunk = "cnts" UINT16 [media-type *[";" media-type]] 36 ; Describes the various media contents that are 37 ; present in the file. 38 ; Multiple media are separated by ";" in cnts-chunk 39 ; Example = SONG;WAVE;TEXT;PICT 40 prot-chunk = "prot" UINT16 *data 41 poly-chunk = "poly" %x0001 data 42 wave-chunk = "wave" %x0001 data 43 44 code-value = %b00000000 ; ANSI CHARSET 45 / %b00000001 ; ISO8859-1 46 / %b00000010 ; ISO8859-2 47 / %b00000011 ; ISO8859-3 48 / %b00000100 ; ISO8859-4 49 / %b00000101 ; ISO8859-5 50 / %b00000110 ; ISO8859-6 51 / %b00000111 ; ISO8859-7 52 / %b00001000 ; ISO8859-8 53 / %b00001001 ; ISO8859-9 54 / %b00001010 ; ISO8859-10 55 / %b10000001 ; HANGUL CHARSET 56 / %b10000010 ; Chinese Simplified 57 / %b10000011 ; Chinese Traditional 58 / %b10000100 ; Hindi 59 ; Others Reserved 60

File Format for Multimedia Services for cdma2000 34 C.S0050-0 v1.0

1 source-info = %x00 / %x01 / %x02 / %x03 / %x04 / %x05 2 ; %x00 = No copyright, downloaded (from the net) 3 ; %x01 = Copyrighted, downloaded (from the net) 4 ; %x02 = No copyright, mobile originated 5 ; %x03 = Copyrighted, mobile originated 6 ; %x04 = No copyright, from desktop 7 ; %x05 = Copy righted, from desktop 8 9 data = %x00-ff 10 media-type = ["SONG" ; Contains MIDI 11 /"WAV" ; Contains Wave sounds 12 /"TEXT" ; Contains text data 13 /"PICT" ; Contains still image data 14 /"ANIM" ; Contains animation data 15 /"LED" ; Contains LED data 16 /"VIB"] ; Contains VIB data

17 11.4 Definition of cmf-tracks

18 This section defines the cmf-tracks field. A cmf file can contain one or more tracks, as 19 indicated by the nTracks field. 20 21 ; cmf-tracks specifies the multi media content and synchronization 22 cmf-tracks = "trac" length4 *event 23 24 event = delta-time event-message 25 delta-time = OCTET 26 ; Delta time is described the elapsed time from a 27 previous 28 ; event. The unit of time is determined from Tempo and 29 ; TimeBase. Default Tempo Value is 125. 30 ; Default TimeBase Value is 48. 31 event-message = note-message 32 / ext-A-message 33 / ext-B-message 34 / ext-info-message 35 36 note-message = note-status gate-time 37 / note-status gate-time vel-oct-shift 38 ; If note-msg-config is 1, 39 ; we have velocity-octaveShift info. 40 note-status = OCTET 41 ; vvkkkkkk 42 ; vv = Assigned channel index, defined 43 ; with respect to the channel reference index. 44 ; kkkkkk = key number, from 0 to 62. 45 ; Key number 63 is prohibited. 46 ; Key number 15 is middle C of keyboard 47 gate-time = OCTET 48 ; Continuation time from note-on to note-off. 49 ; If a gate-time value of more than 255 is required 50 ; multiple note-messages are used. See the examples. 51 vel-oct-shift = OCTET 52 ; Velocity (6 bits) and OctaveShift (2 bits) 53 ; eeeeeess 54 ; eeeeee = Velocity. 0 to 63. 55 ; ss = Octave Shift 56 ; 01 = Increase one octave

File Format for Multimedia Services for cdma2000 35 C.S0050-0 v1.0

1 ; 00 = ineffective 2 ; 11 = decrease one octave 3 ; 10 = decrease two octaves 4 5 ; Extension Information message format 6 ext-A-message = %xFF A-command-data 7 ext-B-message = %xFF B-command-data 8 ext-info-message = %xFF %x11110001 length2 -data 9 / %xFF %x11110010 length2 text-data 10 / %xFF %x11110011 length2 picture-data 11 / %xFF %x11110100 length2 animation-data 12 13 14 A-command-data = UINT16 15 ; Assigned channel fine pitch bend command 16 ; A-command-data = 0vvnnnnn nnnnnnnn 17 ; vv = Assigned channel index (0..3) 18 ; nnnnnnnnnnnnn = Fine Pitch Bend [cents] 19 ; (range: 0x0000 to 0x1fff) (see table) 20 ; Fine pitchbend message sets the change value of the 21 ; pitch specified in the note message. 22 23 24 B-command-data = master-volume 25 / master-balance 26 / master-tune 27 / part-configuration 28 / pause 29 / stop 30 / reset 31 / timebase-tempo 32 / cuepoint 33 / jump 34 / NOP 35 / end-of-track 36 / program-change 37 / bank-change 38 / volume 39 / panpot 40 / pitchbend 41 / channel-assign 42 / pitchbend-range 43 / wave-channel-volume 44 / wave-channel-panpot 45 / text-control 46 / picture-control 47 / vib-control 48 / LED-control 49 50 51 master-volume = %b10110000 %x00-7F ; Specifies the volume of system 52 ; for all audio events. The 53 ; default value is 100 (0 dB). 54 ; Range is from 0 to 127. 55 master-balance = %b10110001 %x00-7F ; 0x00 = Pan Left, 0x40 = Center, 56 ; 0x7F = Pan Right 57 master-tune = %b10110011 %x34-4C ; Master Tune for music synthesizer 58 ; 0b00110100 = -(12 x 100 ) [cents] 59 ; … 60 ; 0b00111110 = -( 2 x 100 ) [cents]

File Format for Multimedia Services for cdma2000 36 C.S0050-0 v1.0

1 ; 0b00111111 = -( 1 x 100 ) [cents] 2 ; 0b01000000 = 0 [cents] 3 ; 0b01000001 = ( 1 x 100 ) [cents] 4 ; 0b01000010 = ( 2 x 100 ) [cents] 5 ; … 6 ; 0b01001100 = (12 x 100 ) [cents] 7 part-configuration = %b10111001 %x00 ; reserved 8 pause = %b10111101 %x00 ; 0x00 = Pause player 9 stop = %b10111110 %x00 ; 0x00 = Stop player 10 reset = %b10111111 %x00 ; 0x00 = Reset controllers 11 timebase-tempo = %b11000000-%b11001111 %x14-FF 12 ; Timebase table 13 ; Bottom 4-bits of command value 14 ; contain timebase index (see 15 ; table) 16 ; Data value contains tempo. Tempo 17 ; is defined as the number of 18 ; quarter notes in one minute. 19 cuepoint = %b11010000 %x00-01 ; 0x00 = Cuepoint startpoint, 20 ; 0x01 = Cuepoint endpoint 21 jump = %b11010001 jump-data 22 jump-data = OCTET ; jump-data = ddiimmmmm 23 ; dd = destination(0)/jump(1) point 24 ; ii = jump ID (0 to 3) 25 ; mmmm = # of times to jump 26 ; (15 = infinity) 27 NOP = %b11011110 NOP-data 28 NOP-data = OCTET ; NOP-data contains value N in 29 ; equation 256 * N + (delta time). 30 end-of-track= %b11011111 %x00 ; 0x00 = end of track 31 program-change = %b11100000 prog-data 32 prog-data = OCTET ; prog-data = ccdddddd 33 ; cc = channel index 34 ; dddddd = program change value 35 ; (range: 0x00 to 0x3f) 36 bank-change = %b11100001 bank-change-attr 37 bank-change-attr = OCTET ; bank-change-attr = ccdddddd 38 ; cc = channel index 39 ; dddddd = bank change value 40 ; (range: 0x00 to 0x3f) 41 42 volume = %b11100010 volume-attr 43 volume-attr = OCTET ; volume-attr = ccdddddd 44 ; cc = channel index 45 ; dddddd = volume change value 46 ; (range: 0x00 to 0x3f) 47 48 panpot = %b11100011 panpot-attr 49 panpot-attr = OCTET ; panpot-attr = ccdddddd 50 ; cc = channel index 51 ; dddddd = panpot change value 52 ; (range: 0x00 to 0x3f) 53 ; 0x00 = Far Left 54 ; 0x20 = Center 55 ; 0x3F = Far Right 56 pitchbend = %b11100100 pitchbend-attr 57 pitchbend-attr = OCTET ; pitchbend-attr = ccdddddd 58 ; cc = channel index 59 ; dddddd = pitchbend change value 60 ; (range: 0x00 to 0x3f) (see table)

File Format for Multimedia Services for cdma2000 37 C.S0050-0 v1.0

1 channel-assign = %b11100101 channel-data 2 channel-data = OCTET ; channel-data = cc00dddd 3 ; cc = channel index (0-3) 4 ; dddd = channel value (0-15) 5 pitchbend-range = %b11100111 pitchrange-data 6 pitchrange-data = OCTET ; pitchrange-data = cc0ddddd 7 ; cc = channel index 8 ; ddddd = pitch bend range(0-12) 9 wave-channel-volume = %b11101000 wave-vol 10 wave-vol = OCTET ; wave-vol = ccdddddd 11 ; cc = wave channel index 12 ; dddddd = wave volume change value 13 ; (range: 0x00 to 0x3f) 14 wave-channel-panpot = %b11101001 wave-panpot 15 wave-panpot = OCTET ; wave-panpot – ccdddddd 16 ; cc = wave channel index 17 ; dddddd = wave panpot change value 18 ; (range: 0x00 to 0x3f) 19 text-control= %b11101011 %x00-05 ; 0x00 = Text Enable 20 ; 0x01 = Text Disable 21 ; 0x02 = Clear text 22 ; 0x03 = reserved 23 ; 0x04 = Increase cursor by 1 byte 24 ; 0x05 = Increase cursor by 2 bytes 25 picture-control = %b11101100 %x00-02; 0x00 = Picture Enable 26 ; 0x01 = Picture Disable 27 ; 0x02 = Clear picture 28 29 LED-control = %b11101101 led-data 30 led-data = OCTET ; led-data = %b0cdddddd 31 ; c = enable/disable LED 32 ; dddddd = color pattern 33 vib-control = %b11101110 vib-data 34 vib-data = OCTET ; vib-data = %b0cdddddd 35 ; c = enable/disable vib 36 ; dddddd = vibrator pattern 37 38 39 ; Wave packet data 40 ; wav-data. The data chunk can be multiple audio formats. 41 ; Wave packet allows recycling of data, player must implement storage 42 ; functions for recycled data 43 ; The *data is length2 – 7 bytes. length2 in ext-info-message. 44 wav-data = wav-atrb1 wav-atrb2 packet-offset prev-flag *data 45 wav-atrb1 = OCTET ; Channel # (2 bits) and Channel ID (6 bits) 46 ; ccdddddd 47 ; cc = Wave channel # (0-3) 48 ; dddddd = Wave channel ID (0-63) 49 wav-atrb2 = OCTET ; Wave packet mode (2) and Wave format (6) 50 ; rrffffff 51 ; rr = Wave packet mode 52 ; 00 = Store Mode 53 ; 01 = Set Mode 54 ; 10 = Recycle Mode 55 ; ffffff = Wave format 56 ; 000000 = WAV/RIFF format 57 ; 000100 = QCP-13k format 58 ; 000101 = IMA ADPCM format 59 packet-offset = UINT32 ; specifies offset in bytes to next Wave packet 60 prev-flag = OCTET ; specifies if current Wave packet is continued

File Format for Multimedia Services for cdma2000 38 C.S0050-0 v1.0

1 ; from previous (for those formats with frame 2 ; history) 3 4 ; Text packet data 5 ; The *data is length2 – 1 bytes. length2 in ext-info-message. 6 7 text-data = text-atrb *data 8 text-atrb = OCTET ; Set/Append and XY Alignment 9 ; 0cxxxyyy 10 ; c = Set(0)/Append(1) a string 11 ; xxx=x-align (000=Left, 001=Center, 010=Right) 12 ; yyy=y-align (000=Bottom, 001=Center, 010=Top) 13 14 15 ; Picture packet data 16 ; The *data is length2 – 5 bytes. length2 in ext-info-message. 17 ; Picture packet allows recycling of data, player must implement storage 18 ; functions for recycled data 19 20 picture-data = pict-atrb1 pict-atrb2 pict-atrb3 pict-x-off pict-y-off 21 *data 22 pict-atrb1 = OCTET ; ccaaaaaa 23 ; cc = reserved 24 ; aaaaaa = Picture packet ID (0-63) 25 pict-atrb2 = OCTET ; Picture packet mode (2) and format (6) 26 ; rrffffff 27 ; rr = Picture packet mode 28 ; 00 = Store Mode 29 ; 01 = Set Mode 30 ; 10 = Recycle Mode 31 ; ffffff = Picture format 32 ; 000001 = BMP format 33 ; 000010 = JPEG format 34 ; 000011 = PNG format 35 pict-atrb3 = OCTET ; Draw Mode: Normal 36 ; 00000000 37 pict-x-off = OCTET ; If subchunk for Picture packet = 0 38 ; 00000000 = X-offset 0% 39 ; 00000001 = X-offset 1% 40 ; … 41 ; 01100100 = X-offset 100% 42 ; 01100101 = Left 43 ; 01100110 = Center 44 ; 01100111 = Right 45 ; If subchunk for Picture packet = 1 46 ; pict-x-off = pixel offset from left (0..255) 47 48 pict-y-off = OCTET; If subchunk for Picture packet = 0 49 ; 00000000 = Y-offset 0% 50 ; 00000001 = Y-offset 1% 51 ; … 52 ; 01100100 = Y-offset 100% 53 ; 01100101 = Top 54 ; 01100110 = Center 55 ; 01100111 = Bottom 56 ; If subchunk for Picture packet = 1 57 ; pict-y-off = pixel offset from top (0..255) 58 59 ; Animation packet data 60 ; The *data is length2 – 8 bytes. length2 in ext-info-message.

File Format for Multimedia Services for cdma2000 39 C.S0050-0 v1.0

1 2 animation-data = -atrb0 anim-atrb1 anim-atrb2 anim-x-off anim-y- 3 off *data 4 anim-atrb0 = 4OCTET / UINT32 5 anim-atrb1 = OCTET ; Animation packet mode (2) and packet ID (6) 6 ; rrffffff 7 ; rr = Animation packet mode 8 ; 01 = reserved 9 ; ffffff = Animation packet ID 10 ; 000000 = reserved 11 12 anim-atrb2 = OCTET ; cccaaaaa 13 ; ccc = Animation format 14 ; 000 = Reserved animation format 15 ; 001 16 ; … 17 ; 111 = Reserved for Future animation format 18 ; aaaaa = Animation Command 19 ; 00000 = Image object data 20 ; 00001 = Frame ID 21 ; 00010 = Frame command 22 anim-x-off = OCTET ; If subchunk for Animation packet = 0 23 ; 00000000 = X-offset 0% 24 ; 00000001 = X-offset 1% 25 ; … 26 ; 01100100 = X-offset 100% 27 ; 01100101 = Left 28 ; 01100110 = Center 29 ; 01100111 = Right 30 ; If subchunk for Animation packet = 1 31 ; anim-x-off = pixel offset from left (0..255) 32 33 anim-y-off = OCTET ; If subchunk for Animation packet = 0 34 ; 00000000 = Y-offset 0% 35 ; 00000001 = Y-offset 1% 36 ; … 37 ; 01100100 = Y-offset 100% 38 ; 01100101 = Top 39 ; 01100110 = Center 40 ; 01100111 = Bottom 41 ; If subchunk for Animation packet = 1 42 ; anim-y-off = pixel offset from top (0..255) 43 44

45 11.5 Tables

46 Tables required for cmf file entries.

47 11.5.1 TimeBase Values

48 TimeBase is expressed by the lower 4-bits of the status byte. The default value is 48. 49 ----0000 TimeBase = 6 ----0001 TimeBase = 12

File Format for Multimedia Services for cdma2000 40 C.S0050-0 v1.0

----0010 TimeBase = 24 ----0011 TimeBase = 48 ----0100 TimeBase = 96 ----0101 TimeBase = 192 ----0110 TimeBase = 384 ----0111 Reserved ----1000 TimeBase = 15 ----1001 TimeBase = 30 ----1010 TimeBase = 60 ----1011 TimeBase = 120 ----1100 TimeBase = 240 ----1101 TimeBase = 480 ----1110 TimeBase = 960 ----1111 Reserved 1 Table 11-1: TimeBase Values

2 11.5.2 Pitch Bend Range Values

3 The following table contains a description of the contents of Pitch Bend Range. 4 “RangeValue” is defined in the section of Assigned Channel PitchBend Range. That 5 initial value is 2. 6 000000 -( 32 x RangeValue x 100 / 32 ) [cents] … … 011110 -( 2 x RangeValue x 100 / 32 ) [cents] 011111 -( 1 x RangeValue x 100 / 32 ) [cents] 100000 0 [cent] 100001 ( 1 x RangeValue x 100 / 32 ) [cents] 100010 ( 2 x RangeValue x 100 / 32 ) [cents] … … 111111 ( 31 x RangeValue x 100 / 32 ) [cents] 7 Table 11-2: Pitch Bend Range values

8 11.5.3 Fine Pitch Bend Values

9 The following table contains a description of the fine pitch bend value. 10 0000000000000 -( 4096 x RangeValue x 100 / 4096 ) [cents] … …

File Format for Multimedia Services for cdma2000 41 C.S0050-0 v1.0

0111111111110 -( 2 x RangeValue x 100 / 4096 ) [cents] 0111111111111 -( 1 x RangeValue x 100 / 4096 ) [cents] 1000000000000 0 [cent] 1000000000001 ( 1 x RangeValue x 100 / 4096 ) [cents] 1000000000010 ( 2 x RangeValue x 100 / 4096 ) [cents] … … 1111111111111 ( 4095 x RangeValue x 100 / 4096 ) [cents] 1 Table 11-3: Fine PITCH bend range values

2 11.6 Acceptable Profiles for CMF file format

3 4 It is required that compliant players check the acceptable profiles through the cnts 5 subchunk. Only the following profiles are specified. Other profiles are invalid 6 configurations of the CMF file format. Additional profiles may be added in future 7 versions.

8 11.6.1 Talking Picture messaging

9 cnts = WAV;PICT 10 This profile is primarily used for messaging applications. PICT can either be JPEG or 11 PNG for graphics/image data.

12 11.6.2 Audio-only profile

13 cnts = SONG;WAV 14 This profile is primarily used for ringers and other audio only applications such as the 15 audio portion of a game application. SONG is used for MIDI and WAV is used to 16 provide QCP or ADPCM sound effects.

17 11.6.3 Picture ringers

18 cnts = SONG;WAV;PICT 19 This profile is an enhancement on 11.6.2 that adds graphics capability for picture or 20 animated ringers. PICT can be either PNG or JPEG format.

21 11.7 CMF Conformance Guidelines

22 This section describes implementation requirements and recommendations for 23 encoders and decoders interoperating with content in the 3GPP2 specified CMF file 24 format.

25 11.7.1 Subchunk Requirements

26 There are 3 required subchunks: note, vers, and cnts. All encoders are required to 27 include these subchunks and all decoders are required to verify the existence of these

File Format for Multimedia Services for cdma2000 42 C.S0050-0 v1.0

1 subchunks before playing the content.

2 11.7.2 MIDI Requirements

3 Implementations of all MIDI related functions are interpreted the same as General MIDI 4 requirements set up by the MMA. Where parameters are provided with different 5 resolution, those parameters should be mapped to equivalent dynamic range.

6 11.7.3 Wave Packet Requirements

7 Encoders are recommended to break wave packets into reasonable subchunks and time 8 stamped with the use of the prev-flag to ensure continuous decoding of audio packets. 9 A typical implementation breaks audio packets into 0.5 second chunks with 10 appropriate time stamps. 11 This recommendation allows client players to implement effective fast-forward and 12 rewind operations without affecting the ability to properly handle wave packets. Using 13 0.5 second chunks allows a typical decoder to achieve 0.5 second resolution in forward 14 and rewind increments. 15 Decoders should be able to correctly implement a continuous bitstream interface to the 16 decoder when wave packets are provided with prev-flag data set. When prev-flag is not 17 set, the decoder should reset the wave decoder appropriately. 18

19 11.7.4 Cuepoints and Jumps

20 21 11.7.4.1 Cuepoints

22 Cuepoints are used to provide an alternative play mode for CMF files. When in cue- 23 point play mode, the decoder should jump to the cue start point when starting 24 playback. All rules for setup that are observed for normal playback at the beginning of 25 the file should be observed. For example, an encoder is required to insert all 26 configuration events in between cuepoint boundaries even if those events are 27 redundant with configuration events outside cue-point boundaries. 28 11.7.4.2 Jump Points

29 Jump points are used to reuse portions of the playback using loops to reduce file size. 30 The decoder is required to parse a Jump Destination point and save a pointer to the 31 file. Up to 4 JUMP ID’s can be saved for later reference. When a jump command is 32 received for a given destination ID, the decoder should continue playback from the 33 destination point as normal with no further memory. The loop number specifies the 34 number of times the jump should be taken. After the final jump, decoding should 35 continue as normal ignoring the final jump command. 36

37 11.7.5 Recycle Requirements

38 Recycling is supported in Picture, Wave, and Animation packets. The use of recycle is 39 recommended to optimize file sizes for data transmission. Each packet group allows for 40 up to 64 individual ID’s to be selected be used for recycle.

File Format for Multimedia Services for cdma2000 43 C.S0050-0 v1.0

1 11.7.5.1 Store Command

2 The “store” operation specifies that the decoder should not display the data, but instead 3 cache the data for displaying at some future time. 4 11.7.5.2 Set Command

5 The “set” operation specifies that the decoder should both cache the data and 6 displaying it. 7 11.7.5.3 Recycle Command

8 The “recycle” operation specifies that the decoder should redisplay picture image data 9 previously cached by a “store” or “set” operation that used the same “packet ID” value 10 specified in the “Attributes 1” field. 11 12

13 11.8 File Extension and MIME type for Media presentation

14 The media files created as per the above format specification shall use the extension of 15 “.cmf”, short for Compact Multimedia Format. The MIME type “application/cmf” is 16 expected to be registered and used.

17 Annex A. File formats Difference with 3GPP (Informative)

18

19 A.1 Relations between ISO, 3GPP, and 3GPP2 file format

20 ISO defines ISO Base Media File Format as a basis of developing a media container for 21 various usages. It describes a basic architecture of the multimedia file, and mandatory 22 /optional elements in it. There are some extensions over ISO Base Media File Format, 23 one of which is an MP4 file format to support MPEG-4 visual/audio codecs and various 24 MPEG-4 Systems features such as object descriptors and scene descriptions. 25 26 27 MP4 extension ISO Base Media 28 File Format 29 30 31 Figure A 1: File formats in ISO

32 3GPP extended ISO Base Media File Format to incorporate new media codecs and a 33 timed text feature. They also use MPEG-4 visual/audio codecs, a portion of MP4 34 extension is included. The relation is depicted in Figure A.2. It is noted that 3GPP file 35 format does not use some features in ISO Base Media File Format. 36 3GPP file format

File Format for Multimedia Services for cdma2000 44 C.S0050-0 v1.0

1 2 3 4 5 6 Figure A 2: 3GPP file format

7 3GPP2 employs full aspects of ISO Base Media File Format. It also adds new codecs and 8 extends a 3GPP timed text. On the other hand, it only uses the same portion of MP4 9 extension as 3GPP does. Figure A.3 illustrates it. 10 3GPP2 file format 11 12 13 14 15 16 17 Figure A 3: 3GPP2 file format

18

19 A.2 Differences between 3GPP2 and 3GPP

20 a) Features in ISO Base Media File Format 21 • Movie fragment 22 3GPP2 does not prohibit this feature while 3GPP does. It may be useful for 23 various applications, e.g., pseudo-streaming and live authoring of a movie file. 24 b) New extensions 25 • QCELPSampleEntry and 13K speech support in MP4AudioSampleEntry 26 • SMVSampleEntry 27 • EVRCSampleEntry 28 These codecs are used in 3GPP2 and there are no definitions in 3GPP file 29 format, so their encapsulations are defined. 30 c) Enhancements to 3GPP features 31 • Enhancements to 3GPP Timed Text 32 Word wrap and link functionality for phone and mail are enhanced from 3GPP Timed 33 Text. 34 d) File identifications 35 3GPP2 has its own file extension, MIME types, and file brand identifier.

File Format for Multimedia Services for cdma2000 45 C.S0050-0 v1.0

1 A.3 Usage of 3GPP branding

2 Conditions for using 3GPP branding are as follows: 3 • The media types contained in the “.3g2” file are restricted to those identified 4 for use in the “.3gp” file format [5],[6], specifically AMR and AMR-WB 5 speech, MPEG4 and H.263 video, MPEG4 AAC audio, and timed text 6 without optional textwrap feature. 7 • The file does not include fragmented media. 8 It is left to the implementer to understand the implications of the absence of minor 9 versioning support in the compatible branding list.

10 Annex B. Guideline for File Format Usage (Informative)

11 FFMS is a generic standard and includes all features. Since some features may be 12 useful but others may not in some services, this section shows a usage guideline in 13 various services.

14 B.1 MSS (Multimedia Streaming Service)

15

16 B.1.1 Server storage for RTP streaming

17 A streaming server stores multimedia content in 3GPP2 file format, reads out media 18 data from the file, and transmits to a client in an RTP packet. Hint tracks can be useful 19 to the server when creating RTP packets.

ISO file moov mdat …other boxes trak (video) Interleaved, time-ordered, video trak (audio) and audio frames, and hint instructions trak (hint)

20 21 Figure B 1: Hinted Presentation for Streaming (Reprint from ISO/IEC 14496-12)

22 B.1.2 Transmission format for pseudo-streaming

23 The definition of pseudo-streaming is media playback during its download. It is 24 assumed that the download is carried out by HTTP (TCP) in this contribution. 25 26 HTTP is used for control and data transmission.

File Format for Multimedia Services for cdma2000 46 C.S0050-0 v1.0

Client Server GET /foo.mp4 http/1.1 HTTP/1.1 200 OK

Start of the file receiving Start of playback

End of the file receiving 1 2 3 Figure B 2: Basic sequence of pseudo-streaming

4 5 HTTP specifies the file as a control protocol, and the file is transmitted in the response 6 to the request. Playback starts during file download. This avoids a lengthy wait for a file 7 to finish downloading (due to the size of the file and/or the transmission channel 8 bitrate). Additionally, some receivers may not have sufficient memory to store an entire 9 file. To ensure smooth playback, the client must receive all necessary header 10 information and the first part of media data of sufficient temporal length to 11 accommodate the maximum jitter in the system. Accordingly, requirements for the file 12 format are as follows. 13 1) moov position 14 moov has information necessary for decoding the media data in the file and 15 therefore needs to be positioned at the beginning of the file. 16 2) media interleave 17 If multiple media are included in the file, then they should be interleaved as a 18 “chunk”. The size of a chunk relates waiting time to playback. A typical size of a 19 chunk is a few seconds. 20 3) movie fragmentation 21 The size of moov becomes large for lengthy movies. Therefore, long movies should 22 be fragmented with each fragment having a header (moov or moof). 23 4) I-frame beginning 24 The first video frame in mdat should start with an I-frame so that the client can 25 start decoding from the first frame. 26 5) Timed Text 27 Timed Text can be supported for the pseudo-streaming service. Text samples are 28 also interleaved as well as audio and video samples. 29 30 ftyp moov A V A V A V moof A V A V 31 32 33 chunk 34 fragment

File Format for Multimedia Services for cdma2000 47 C.S0050-0 v1.0

1 2 Figure B 3: Fragmented movie file format.

3 B.2 MMS

4 3GPP2 files used for the purpose of MMS contain rather short duration clips that are 5 transferred from a client to a server and vice versa. Considering the MMS feature 6 should require less complexity, the following restrictions are useful. 7 • Movie fragment is not useful for short duration video, therefore it should not be 8 used. 9 • The maximum number of tracks should be one for video, one for audio and one 10 for text. 11 • The maximum number of sample entries should be one per track for video and 12 audio (but unrestricted for text). 13 • Compact sample sizes ('stz2') should not be used.

14 B.3 File download and play back

15 A 3GPP2 file is downloaded to a client and played back locally. Random positioning in 16 the downloaded clip is an attractive feature. However, addressing information included 17 in the default boxes such as ‘stts’ and ‘stsc’ only have a relative time difference, thus 18 some processing is required to get the absolute address within the file that corresponds 19 to the indicated relative time difference. mfra Box provides direct address information 20 in the file and is useful for random positioning during play back. 21 22 Annex C. Support for Additional Codecs in QCP File Format (Informative) 23

24 C.1 EVRC and SMV Speech

25 The “.qcp” file format as described in [22] supports storage for EVRC and SMV 26 vocoders. The MIME type for “.qcp” files with EVRC is “audio/evrc-qcp” and the MIME 27 type for “.qcp” files with SMV is “audio/smv-qcp”. 28 29 30

File Format for Multimedia Services for cdma2000 48