Looney Tunes: Exposing the Lack of DRM Protection in Indian Streaming Services

Ahaan Dabholkar, de.ci.phe.red Lab, Indian Institute of Technology Bhilai
Sourya Kakarla, Indian Institute of Technology Kharagpur
Dhiman Saha, de.ci.phe.red Lab, Indian Institute of Technology Bhilai

arXiv:2103.16360v1 [cs.CR] 30 Mar 2021

Abstract
Numerous studies have shown that streaming is now the most preferred way of consuming multimedia content, as evidenced by the proliferation of streaming service providers and the exponential growth of their subscriber base. Riding on advances in low-cost electronics, high-speed communication and extremely cheap data, Over-The-Top (OTT) music streaming is now the norm in the music industry and is worth millions of dollars. This is especially true in India, where major players offer so-called freemium models with monthly active user bases running into the millions. These services, namely Gaana [33], Airtel Wynk [32] and JioSaavn [34], attract a significantly bigger audience than their 100% subscription-based peers like Amazon Prime Music etc. [35]. Given their ubiquity and market dominance, it is pertinent to do a systematic analysis of these platforms so as to ascertain their potential as hotbeds of piracy. This work investigates the resilience of the content protection systems of the four biggest music streaming services (by subscriber base) from India, namely Airtel Wynk, Gaana, JioSaavn and Hungama. By considering the Digital Rights Management (DRM) system employed by Spotify as a benchmark, we analyse the security of these platforms by attempting to steal the streamed content efficiently. Finally, we present a holistic overview of the flaws in their security mechanisms and discuss possible mitigation strategies. To the best of our knowledge, this work constitutes the first attempt to analyze the security of OTT music services from India. Our results further confirm the time-tested belief that security through obscurity is not a long-term solution and leaves such platforms open to piracy and a subsequent loss of revenue for all the stakeholders.

Keywords: Digital Rights Management, Web Security, Piracy, OTT Audio Streaming

1 Introduction
OTT is an acronym for "over-the-top" and refers to the distribution of multimedia (audio, video) content over a public network. Recent trends have shown a mass adoption of smart mobile devices in the consumer market. This, coupled with a higher penetration of high-speed, cheap Internet, the emergence of advanced technologies such as 5G and 4G, developed online payment infrastructure and continual demand for content within the entertainment domain, is projected to grow the global OTT service market from $81.60 billion in 2019 to $156.9 billion by 2024, a CAGR (Compound Annual Growth Rate) of 14% [38]. The Asia Pacific region is set to record the highest growth rate during the forecast period. According to a joint report published by the Indian Music Industry (IMI) and Deloitte [18], the audio-video OTT market in India is valued at around US$ 280 million, with nearly 150 million monthly active users accessing soundtracks across various platforms.

Table 1. OTT music services currently operating in India

Service Name     Business Model          Origin          Reference
Airtel Wynk      Bundle, Ad Supported    Domestic        [32]
Amazon Music     Paid                    International   [2]
Apple Music      Paid                    International   [4]
Gaana            Ad Supported            Domestic        [33]
Hungama          Ad Supported            Domestic        [6]
JioSaavn         Bundle, Paid            Domestic        [34]
Spotify          Ad Supported            International   [11]
Youtube Music    Subscription            International   [13]

Revenue from digital means contributes nearly 78% to the overall recorded music industry revenue in India and 54% [27] globally. A survey of India's audio streaming market reveals that it is primarily divided among the domestic players Wynk, Gaana, JioSaavn and Hungama and the global players Spotify, Amazon Music, Apple Music and, more recently, YouTube Music (Table 1). As per a consumer insights survey conducted by the IFPI in 2018 [28], an average internet user in India spends 21.5 hours every week listening to music, higher than the global average of 17.8 hours. It is interesting to note that, despite this popularity, contemporary literature lacks a security analysis of any of the domestic OTT platforms; this gap forms the primary motivation of this work.

[Figure 1. The dominance of streaming as the main source of revenue in the Indian music industry [22]. Stacked bar chart for the years 2014 to 2018 with the categories Streaming, Other Digital, Physical, Performance Rights and Synchronisation. Note: All USD numbers have been calculated using the exchange rate $1 = INR 68.43. Source: IFPI, Global Music Report, 2019.]

This easy and free access to content was thought to have solved several issues regarding unsanctioned sharing of media [30], as it provided Music-as-A-Service, which was more lucrative to the consumer than content ownership [24]. However, with the consequential emergence of "stream-ripping", piracy has increasingly kept pace. The gravity of the situation is reflected in the numbers: estimates point to almost US$ 250 million lost each year in India alone, while the estimated number of stream-rippers in the US has grown to an alarming 17 million [42]. The surging popularity of such platforms has also not been missed by the shadier sections of our society with more sinister agendas [14]. Coupled with the 40% to 60% of revenue that is lost to pirates, there is a dire need to take a critical look at the security of such content delivery platforms. A recent paper on bypassing DRM protection in online video streaming [45] is one of the many research efforts highlighting the need for a deeper understanding of how OTT services should be deployed in practice.

In this work we systematically analyze the four leading OTT music service providers in India, namely Wynk, Gaana, JioSaavn and Hungama, comparing them to the best practices in the industry. To our great surprise, our research reveals that none of these platforms adopts any state-of-the-art DRM protection. Instead, they attempt only a very rudimentary form of code obfuscation. As a result, we were able not only to reverse engineer their protocols but also to devise mechanisms leading to automated, unsupervised and uninterrupted download of music from their servers. We develop detailed Proof-of-Concepts for the same and illustrate case studies on each of the platforms. To put things in context, we also investigate the Spotify web application and find it adopting very standard DRM protection, making it a benchmark in the comparative study that we furnish later. Finally, we discuss possible mitigation strategies to salvage the situation. As a part of responsible disclosure, we attempted to communicate this work to the concerned parties. With the exception of Wynk, response from the others is awaited.

Our Contributions
Our contributions can be summed up as follows:
• We present a security analysis of the content protection systems in place for four of the biggest music streaming services (by subscriber base) in India.
• We highlight the lax security protocols in place in all these services by attempting to steal content in an undetectable way and provide proofs of concept to automatically acquire content by reverse engineering their content delivery protocols.
• We present a comparative study of these apps with the current state-of-the-art DRM systems.
• We present a discussion on the design choices employed by these services and make recommendations to enhance their security.

Organisation Of The Paper
The following sections contain the conclusions and results of our experiments while reverse engineering said services. We first provide a primer on Adaptive Streaming in Section 2.1, which is used by most of the OTT streaming services and which helps us elucidate the protocols involved clearly. We follow this up in Section 2.3 with a brief note on present-day DRM systems. Section 3 is dedicated to describing the Widevine DRM used by Spotify to protect its content, to establish a benchmark for comparing the other services. This leads us into the results of the reverse engineering in Section 4, where we give reconstructions of the protocols used. Section 5 contains discussions on the flaws in current DRM systems and the design choices made by these services, followed by our conclusions in Section 7.

1.1 Responsible Disclosure
All the services mentioned here were contacted prior to submission of this manuscript with reports on the vulnerabilities in their protocols and with offers to collaborate on the fix. It should be noted that none of these services have vulnerability disclosure programs and hence finding a suitable point of contact was tough. When informed of the break, Airtel Wynk was all for the idea of a collaborative fix but ended up deploying a haphazard patch without consultation or proper notice, which was eventually broken using the same techniques.

2 Background
This section is provided as a primer for familiarising the reader with certain technologies that are heavily referred to in this work.

2.1 Adaptive Streaming
Classical streaming protocols used a technique called progressive streaming to deliver content. In this technique, a single file sitting on the vendor's server was delivered to the client requesting it. Though this method was simple, it had some obvious inefficiencies, which are demonstrated using a toy example below:

1. Consider two clients with two different displays, one having a 720p display and the other having a 4K one. With the progressive streaming protocol, both clients would be delivered the same content despite the differences in their hardware capabilities. If the content streamed was in 4K, it would not pose a problem for the second client; the first client, however, would receive 4K media which would eventually be downscaled to 720p (or not run at all, depending on the decoding hardware).
2. A problem would also arise if one of the clients had severely limited network bandwidth. This client would be unable to consume content meaningfully owing to its unnecessarily large size.

The idea of adaptive streaming aims at solving both these issues in real time. The first problem is solved by having encodings at multiple bitrates of the same media on the content-delivery servers, while the second problem is solved by providing the client with the ability to switch between encodings mid-stream depending on its resources. This adaptiveness is facilitated by dividing the source content into chunks and indexing it. Hence, if network degradation is detected by the client, the next chunk can be retrieved from a lower bitrate encoding, thus maintaining the flow of content.

We will now describe the HTTP Live Streaming (HLS) Protocol, an adaptive streaming protocol which is predominantly used by most streaming services including, in our case, Wynk Music to serve content efficiently. An understanding of its architecture is necessary to grasp the underlying protocol. The HLS Protocol was developed by Apple Inc. and released to the public in 2009. According to survey reports from 2019, HLS remains the most adopted protocol, with more than 45% of broadcasters using it to provide streaming services to their clients [41].

2.2 HTTP Live Streaming (HLS)
As the name suggests, HLS is an adaptive streaming protocol that delivers content over HTTP/HTTPS. The HLS Architecture [26] essentially involves three components:
1. The Streaming Server
2. The Distribution Component
3. The Streaming Client

[Figure 2. HLS Architecture]

A typical configuration (Figure 2) consists of a hardware encoder that encodes Audio/Video input into MP3/H.264 and encapsulates it into an MPEG-2 transport stream. A software segmenter then divides the stream into chunks (.ts) of equal duration and creates an index file index.m3u8 that contains links to those chunks. This process is carried out for each encoding of the A/V stream, and a master index file, also called the manifest and usually named master.m3u8, is generated. The manifest identifies and points to the different index files available for that particular A/V stream. This manifest is then served by the streaming server over HTTP to the client, which selects the suitable encoding based on the resources available and requests the index file of that encoding. Once the index is received, the client sequentially makes requests for the chunks, enabling playback on its device. When a network change is detected, a lower bitrate encoded index file is requested in order to retrieve the next chunk, for continuous playback.

2.3 Digital Rights Management (DRM)
DRM can be thought of as a digital lock, or as protections in place to secure proper usage of proprietary technologies and copyrighted works [21]. Although DRM systems are enforced in many countries through licensing agreements and laws such as the Digital Millennium Copyright Act (DMCA) [40], which criminalise circumvention, their efficacy and ulterior motives have been the subject of constant debate. A discussion on these technicalities is however not relevant to this paper, where we will instead focus upon the DRM techniques used primarily by OTT service providers. Most streaming services such as Netflix, Hulu and Amazon Prime require playback devices to support some form of DRM. Common choices for DRM schemes are Microsoft PlayReady [8], Apple FairPlay [3], Adobe PrimeTime [1], Marlin [7] and Google Widevine [12]. Most of these DRM schemes at least provide browser support through Content Decryption Modules [20] (CDMs), which follow the Encrypted Media Extensions (EME) [20] specification, which in turn is implemented by all major browsers today. This uniformity has resulted in an unprecedented ease of implementing basic content protection at an efficient cost. An example would be Google's Shaka Player [10]: an open source player which can be integrated into a project with relative ease, which uses the Widevine DRM scheme and supports adaptive streaming over MPEG-DASH [48] and HLS.

Earlier Work on DRM
Breaking DRM protection has been the focus of all kinds of hacker groups ever since the popularity of commercial software grew, giving rise to the so-called pirate hubs which are still popular today. Right from spoofing KMS systems for acquiring Windows licenses to patching AAA titles deploying the Denuvo Anti Tamper [5] system, the community has been witness to some rather creative, albeit illegal, ways of stealing content over the years. Academic attention to the problem of breaking DRM systems [15, 16, 46], however, has proved to be rather mild. To the best of our knowledge, our work is the first such analysis of Indian OTT music streaming webapps. We did however take inspiration for the subject from the work done by Wang et al. [46] on automatically bypassing DRM systems, and for a way to present our findings we looked at Kumar et al.'s [31] work on analysing UPI apps in India.

3 Spotify: Demonstrating Widevine
Having been baffled by the results of our preliminary investigations into Wynk, we were curious to see if this trend was followed across the board by even the big players in the game, and hence we decided to focus our attention on Spotify. We were quite satisfied to observe that Spotify proved resilient to the basic reversing techniques that had proved fatal for the other services in terms of content security. However, we must clarify that we do not claim that Spotify is infallible, just that it would require more effort than what was put in for all the others combined. Here we present a high-level overview explaining how Spotify protects its content while streaming, and we use this analysis later to highlight the deficiencies in the other protocols.

3.1 Methodology
We decided to target the Spotify Web Player as it was clearly suited for comparison with the other services. By monitoring network requests made by the web player and using a combination of static and dynamic analysis of the client-side Javascript modules, we were able to piece together the inner workings. The documented architecture [23] for Widevine was also heavily referred to in order to establish context.

3.2 Summary Of Findings
Spotify¹ is currently using Widevine Level L3 to implement DRM for its content, which is streamed to a modified version of the Shaka player that uses the HTML5 Media Source Extensions to interact with the CDM.²

The CDM is a precompiled binary³, implemented as a shared library (libwidevinecdm.so on Linux) for Google Chrome and as a plugin for Mozilla Firefox.

¹ For future reference, unless explicitly stated otherwise, a reference to Spotify refers to the Spotify Web Player.
² For a detailed explanation, refer to the EME documentation [20].
³ An open source CDM or OCDM can be viewed in the Chromium Project's source, however the Chrome CDM is closed source [19].

Coming to the retrieval, Figure 3 depicts the protocol followed. We would like to point out that we have intentionally left out the exact details in some parts of the protocol in the interests of keeping the description brief.

1. Login. This is the first part of the protocol that a client encounters when trying to start playback. Streaming does not start until user identification and authentication is done. There are numerous options for authentication using OAuth, but all of them effectively end up setting identification cookies on the client. Let us denote these cookies by C. These cookies are later used to set up and maintain a player state which is used to track playback, sync multiple devices, gather insights etc.
2. Acquire Access Token. An access token is requested from the Spotify servers using the cookies C. As in the protocol, we shall refer to this token by Bearer. Bearer is an authenticated token that has a long expiration time and is required by and subsequently used for most operations of the Spotify client.
3. Get Resource URI. To actually retrieve the media file from the Content Distribution Network (CDN), we require an authorized URI which permits access to the content on the server. This URI is obtained from the Spotify servers by making a request and leveraging the Bearer token as authorization. If the Bearer token is valid, the server responds with a list of multiple URIs (we assume for redundancy).
4. Retrieving the First Chunk Of Data. According to the Widevine specification, the first chunk of data is used to gather licensing information for subsequent decryption of content. Having obtained the CDN URI in the previous request, the player requests the first chunk of data of a certain size by setting the Range header in the request. The server response, which contains a chunk of the media file (distinguished from its header), is used to extract initial licensing information called initData. initData is then passed on to the Content Decryption Module (CDM).
5. Obtaining the License. The fragmented⁴ media chunks retrieved from the servers are encrypted using AES-128 in CTR mode. Hence, in order to initialise playback, decryption of these media needs to be performed. The information (keys, initialisation vectors) needed for decryption is included in the license. Based on the initData, the CDM generates an encrypted license request and passes it to the player. The player then relays this request to a license URI that was obtained asynchronously. If the request and its payload are valid, the server responds with an encrypted response that is relayed by the player to the CDM. The CDM decrypts the response to obtain the license.
6. Playback. Once the licensing information has been obtained, 10 second chunks are downloaded from the servers and passed to the CDM, which decrypts those chunks and passes them to the Audio/Video Stack for playback.

⁴ See moof boxes [25] for information on the format of the media fragments.

[Figure 3. Spotify's Content Retrieval Protocol - Widevine L3 (Reconstructed). Participants: CDM and Player (client browser), Server. Message sequence:
 1. Login: authentication using OAuth / Spotify login; the server sets cookies.
 2. Get Access Token: GET to open.spotify.com/get_access_token?... with the cookies; the response contains clientId: "..." and accessToken: "< Bearer >".
 3. Get AppCertificate: GET to spclient.wg.spotify.com/widevine-license/v1/application-certificate; the response contains the application certificate bytes.
 4. Get Authorized CDN URI: GET to gae-spclient.spotify.com/storage-resolve/files/audio/interactive/?version=100000... with authorization: Bearer; the response contains cdnurl: ["audio-ak-spotify-com.akamaized.net/audio/?_token_=exp=<..>hmac=<..>",..] and fileid.
 5. Retrieve First Chunk Of Media: GET to audio-ak-spotify-com.akamaized.net/audio/?_token_=exp=<..>hmac=<..> with Range: bytes=0-163388; the response bytes (ftyp dash iso6 mp4 ... mvhd ... pssh) are used to extract initData, which is passed to the CDM.
 6. Get License URL: GET to gae-spclient.spotify.com/melody/v1/license_url?keysystem=com.widevine.alpha... with authorization: Bearer; the response contains uri: "widevine-license/v1/audio/license?exp=<...>&cp=&tok=<...>" and expires: <...>.
 7. Retrieve encrypted license (keys): the CDM generates an encrypted licence request; the player sends a POST to gae-client.spotify.com/widevine-license/v1/audio/license?exp=<...>&cp=&tok=<...> with authorization: Bearer and the encrypted request as payload; the response is an encrypted licence blob (bytes), which the CDM decrypts.
 8. Encrypted Content: GET to audio-ak-spotify-com.akamaized.net/audio/?_token_=exp=<..>hmac=<..> with Range: bytes=163389-329949; the response is an encrypted media file blob (bytes); repeated for different ranges. The CDM decrypts the media and communicates with the A/V stack using the MSE method.]

3.3 Lessons Learnt
... content as it is streamed to a client's device. For the sake of establishing standard practices, we highlight a few of them below:

1. Mandatory User Identification. The login process forces a client to identify itself in order to use the services. Spotify, while providing flexibility of login options with OAuth, also implements reCAPTCHA protection against bots. In addition to this, Spotify can track each user's activity, which could potentially be used to recognise malicious use.
2. Streamed Content is Encrypted. To prevent stream-ripping (discussed in Section 5.5), content stored on the servers is encrypted.
3. No Hardcoded Keys. The keys for decryption of the streamed content are not hardcoded in the files that the user has direct access to.
4. License Information is Invisible to the Player. The license information passed between the CDM and the server is encrypted and hence is not accessible to the user.
5. Content Decryption Module. The CDM is theoretically the weakest part of the protocol. In terms of usable/practical security, since it is a closed source binary, it offers a basic level of protection against direct observation of the decrypted content, but it is theoretically vulnerable to black box cryptanalysis techniques and some implementation-level exploits. L2 and L1 level Widevine attempt to mitigate this vulnerability by having the decryption occur in a Trusted Execution Environment (TEE) [44].

Now that a benchmark has been established, we proceed to present an analysis of the four biggest OTT music streaming services in India, in the process highlighting security gaffes where DRM is concerned.

4 Case Studies
This section forms the basis of the work done in this paper. We present here a reconstruction of the streaming protocols used by the four biggest (by subscriber base) music streaming services in India, with a view to formulating an exploit to steal their content. The reconstruction of these protocols involved reverse engineering the Javascript modules executing on the client browser using static and dynamic techniques such as code de-obfuscation, debugging etc., observing network packets using Burp [9] and a fair amount of intuition. In all cases, we were able to completely replicate the protocols in order to get access to the audio content using these standard techniques of reverse engineering. Some code obfuscation aside, none of these services used industry standard DRM, and all could be broken with minimal effort by a dedicated attacker.

Given below is a summarised analysis of Airtel Wynk, JioSaavn, Gaana and Hungama. Using this analysis we were able to write scripts to automatically steal content. In the interest of keeping the descriptions concise, we have deliberately sacrificed rigorous function definitions in favour of broad descriptions of what those functions do, while illustrating them with names similar to their names in the actual JS code. For a detailed description refer to Appendix A. The implementation details are furnished in Section 6.
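Since the services examined in this section deliver audio over HLS, the client-side handling described in Section 2.2 (read the manifest, pick an encoding that fits the available bandwidth, then list the chunk URIs from the index) can be sketched as follows. This is a minimal illustration and not the code of any service or of our proofs of concept: the manifest text, the host cdn.example.invalid and the helper names parse_master, select_variant and parse_index are all hypothetical, and only the BANDWIDTH attribute of each variant is considered.

```python
import re
from urllib.parse import urljoin

# Hypothetical master manifest: three encodings of the same stream.
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=64000
low/index.m3u8
#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=120000,BANDWIDTH=128000
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=320000
high/index.m3u8
"""

# Hypothetical index file for one encoding: two 10 second chunks.
INDEX = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
seg1.ts
#EXT-X-ENDLIST
"""


def parse_master(text, base_url):
    """Return a sorted list of (bandwidth, absolute index URI) pairs."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines[:-1]):
        if line.startswith("#EXT-X-STREAM-INF:"):
            # The lookbehind skips the AVERAGE-BANDWIDTH attribute.
            match = re.search(r"(?<![A-Z-])BANDWIDTH=(\d+)", line)
            if match:
                # The URI of a variant is the line after its tag.
                uri = urljoin(base_url, lines[i + 1])
                variants.append((int(match.group(1)), uri))
    return sorted(variants)


def select_variant(variants, available_bps):
    """Pick the highest bitrate that fits, else fall back to the lowest."""
    fitting = [v for v in variants if v[0] <= available_bps]
    return max(fitting) if fitting else min(variants)


def parse_index(text, base_url):
    """Return the absolute chunk URIs in playback order."""
    return [
        urljoin(base_url, ln.strip())
        for ln in text.splitlines()
        if ln.strip() and not ln.startswith("#")
    ]


if __name__ == "__main__":
    base = "https://cdn.example.invalid/a/master.m3u8"
    variants = parse_master(MASTER, base)
    bandwidth, index_uri = select_variant(variants, available_bps=200_000)
    print(bandwidth, index_uri)
    print(parse_index(INDEX, index_uri))
```

Re-running select_variant with a fresh bandwidth estimate before each chunk request gives the mid-stream switching behaviour that adaptive streaming relies on.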

4.1 The Curious Case Of Wynk
Airtel Wynk Music was the first service that we came across that had serious flaws in its content security mechanism. The flaws were such that we were able to write scripts to automatically steal content at the highest available quality. The protocol diagram in Figure 4 describes the working of Wynk prior to our disclosure.

The Protocol in A Nutshell.
• Client Registration. The client is identified to the server using a POST request containing the deviceId and userAgent parameters in the payload. These parameters are set by the client and appear to be random in nature. Our observation was that persisting the values for these parameters had no effect on the execution of the protocol. As a response to this request, the server replies with values for uid and token.
• Compute Search Id For Resource. A search_id was computed based on the song URL through a combination of string operations and table lookups.
• Acquire Authenticated URI. A POST request is made to retrieve the authenticated URI for content retrieval from the CDN. Using token as the key, a SHA1-HMAC of a string containing the search_id is generated. The Base64 encoded value of this HMAC is assigned to a request header x-bsy-utkn after appending it to the uid. The result of this request is a URL with a set of signed cookies, which we will refer to as CloudFront Cookies.
• Acquire Manifest. On making a request to the URI obtained as a response in the previous request, the server responds with the manifest file, which contains URIs to the various index.m3u8 files available.
• Acquire index.m3u8. Using the URI for the index file of the highest quality available, a request is made with query parameters set using the CloudFront Cookies obtained previously. A successful response from the server gives us the index.m3u8 file of our choice.
• Getting Content. By making GET requests to the chunk URIs present in the index file and setting the appropriate query parameters, the client starts receiving .ts media files from the server. By appending those files in order, the complete audio file is obtained.

[Figure 4. Content Retrieval Protocol - Wynk (Reconstructed). Message sequence between Client and Server:
 1. Client Registration: the client generates deviceId and userAgent, then sends a POST to "sapi.wynk.in/music/v3/account/login" with payload {"deviceId":"c7782aa9-908a-5d13-1643-d26ed0f09429","userAgent":"c7782aa9-908a-5d13-1643-d26ed0f09429"}. Response 200: {"isSystemGeneratedContentLang":true,"uid":"4h5TX4l0byA23YpO4UovEHiZ6sk4","songQuality":"a","autoPlaylists":true,"downloadQuality":"hd","lastAutoRenewalOffSettingTimestamp":0,"isRegistered":false,"lang":"en","notifications":true,"token":"hLO545xc"}.
 2. Get Authorized URI: the client computes search_id, then sends a POST to "https://playback.wynk.in/streaming/v4/cscgw/.html?ets=true&hlscapable=1&sq=a&lang=en" with header x-bsy-utkn and payload {}. Response 200: {"success":true,"url":"","cookie":{"CloudFront-Policy":"<...>","CloudFront-Signature":"<...>","CloudFront-Key-Pair-Id":"<...>"},"lyrics":{}}.
 3. Get Manifest: GET to url (from the previous response). Response 200: master.m3u8 manifest file.
 4. Get Index: GET to the index.m3u8 URI as specified in the manifest, with query parameters "Policy", "Signature", "Key-Pair-Id". Response 200: index.m3u8 index file.
 5. GET to the chunk URIs as specified in index.m3u8, with query parameters "Policy", "Signature", "Key-Pair-Id". Response 200: audio content chunk (.ts). Repeat.]

Following our disclosure, Wynk made certain changes to their protocol that are listed below:
1. A code obfuscation scheme was introduced that replaced function/variable names and identifiers with what were essentially array lookups. The array used for lookup was included in the source code, which rendered the obfuscation useless.

2. The client registration process was redesigned and a time window was introduced using Time Based OTPs (TOTPs) [49].

However, the content retrieval part of the protocol remained the same. The introduced changes only served to complicate the process of getting the authenticated URIs for the CDN. Moreover, content was still being streamed without encryption⁵. The revised protocol is described in Figure 5. Needless to say, we did not face any difficulties in breaking Wynk once more.

⁵ When we say without encryption, we refer to the fact that after the decryption from the HTTPS layer and gzip unpacking has occurred, the audio content is directly playable (unencrypted).

[Figure 5. Revised Content Retrieval Protocol - Wynk (Reconstructed). Message sequence between Client and Server:
 1. The client generates BK and deviceId and sets pk and sk.
 2. spit_out() call: GET to "https://img.wynk.in/webassets/<...>_1.jpg", Response 200 {....}; GET to "https://img.wynk.in/webassets/<...>_2.jpg", Response 200 {....}.
 3. check() call: POST to "https://ping.wynk.in/health/check" with headers 'tk', 'bk' and payload {"pid": " "}. Response 200 with headers {"k","n","y","w","m","z","a","p"}.
 4. login() call: POST to "https://login.wynk.in/music/account/v1/login" with headers x-bsy-ptot, x-bsy-cip and payload {}. Response 200: {dt, uid, token, kt}.
 5. Get Authorized URI: the client computes search_id, then sends a POST to "https://stream.wynk.in/song/v4/stream?ets=true&hlscapable=1&sq=a&lang=en&id= " with headers x-bsy-uuid, x-bsy-utkn, x-bsy-t and payload {}. Response 200: {"success":true,"url":"","cookie":{"CloudFront-Policy":"<...>","CloudFront-Signature":"<...>","CloudFront-Key-Pair-Id":"<...>"},"lyrics":{}}.
 6. Get Manifest: GET to the authorized url (from the previous response). Response 200: master.m3u8 manifest file.
 7. Get Index: GET to the index.m3u8 URI as specified in the manifest, with query parameters "Policy", "Signature", "Key-Pair-Id". Response 200: index.m3u8 index file.
 8. GET to the chunk URIs as specified in index.m3u8, with the same query parameters. Response 200: audio content chunk (.ts). Repeat.]

The New Protocol in a Nutshell.
• ... protocol involved the generation of certain tokens, namely BK, deviceId, pk and sk. pk and sk were values that were hardcoded into the source code, while BK and deviceId were generated using the epoch time and a pseudorandom number.
• spit_out(BK, deviceId). Two requests are made to the server using the outputs of this method, which basically does some intermixing of the strings deviceId and BK. The output strings are then appended with "_1.jpg" and "_2.jpg" and treated as endpoints for requests. We are not entirely sure why the image extensions are used in particular; however, we can confidently say that the response to those requests has no further use. That being said, if either of those requests is not made, the protocol fails subsequently.
• check() & login(). These functions are named after the endpoints to which requests are made. A successful response to the check endpoint returns several parameters which are used to compute the values of certain headers in the request to the login endpoint. A successful response from the login endpoint contains the dt, uid, token and kt parameters, among others.
• Compute Search Id. This method did not change compared to the previous deployment of Wynk.
• Acquire Authenticated URI. The values received in the previous step are used to set the headers for another request as follows:
  – x-bsy-uuid ← dt
  – x-bsy-utkn ← similar computations as before⁶
  – x-bsy-t ← AES(kt, TOTP(dt||sk, 600, 6))⁷
  This POST request, if successful, returns the CloudFront Cookies with a URI. The rest of the protocol follows identically to the previous version of Wynk.

⁶ The changes can be observed in the Appendix.
⁷ 6 digit TOTP generated with a window of 600 seconds. CryptoJS implementation of AES used.

It is pretty evident from the analysis that Wynk went to Response HTML Code........ greater lengths to complicate the retrieval mechanism post disclosure, however they failed to address the crux of the Get Song Info Extract problem. Encrypted URI
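The header computations in the Acquire Authenticated URI step can be illustrated with a minimal Python sketch. It follows the x-bsy-utkn and TOTP steps as listed in Algorithm 10 of the Appendix; the function names are ours, the byte encoding of dt||sk as plain UTF-8 is an assumption, and the final AES step is omitted because the CryptoJS key-derivation details are not specified in the text.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional

# Request suffix as given in Algorithm 10 of the Appendix.
STREAM_SUFFIX = "/song/v4/stream?ets=true&hlscapable=1&sq=a&lang=en&id="


def utkn_header(uid: str, token: str, search_id: str) -> str:
    """x-bsy-utkn <- uid || ":" || Base64(HMAC-SHA1(token, msg))."""
    msg = "POST" + STREAM_SUFFIX + search_id + "{}"
    digest = hmac.new(token.encode(), msg.encode(), hashlib.sha1).digest()
    return uid + ":" + base64.b64encode(digest).decode()


def totp(secret: bytes, step: int = 600, digits: int = 6,
         now: Optional[float] = None) -> str:
    """RFC 6238 TOTP over HMAC-SHA1 with a configurable time step."""
    if now is None:
        now = time.time()
    counter = int(now) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def bsy_t_otp(dt: str, sk: str, now: Optional[float] = None) -> str:
    """OTP that is subsequently AES-encrypted under kt (encryption not shown).

    Assumption: dt||sk is used as the raw UTF-8 TOTP secret.
    """
    return totp((dt + sk).encode(), step=600, digits=6, now=now)
```

Both values depend only on data returned by the login endpoint and on constants shipped to the client, which is why this layer adds effort for the analyst but no secrecy.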

4.2 JioSaavn Joins The Jam

Motivated by our findings on Wynk, we looked into other platforms to test whether the situation found with Wynk was the general norm among established players. JioSaavn is the second most popular India-based music streaming service in terms of number of subscribers. It took some vigilant effort to get to the media content, but once the relevant execution path was found, piecing together the protocol was straightforward.

[Figure 6. Content Retrieval Protocol - JioSaavn (Reconstructed). Client-server exchange: (1) Get Encrypted URL: a GET request to the webapp song URL (https://www.jiosaavn.com/song//) returns HTML containing the encrypted URL of the song media (example: ID2ieOjCrwfgWv4B1ImC5QfbsDy%2F3il...). (2) Get Authorized CDN URL: a GET request to the song.generateAuthToken endpoint returns JSON of the form {'auth_url': , 'type': 'mp4', 'status': 'success'} (auth_url example: 'https://ac.cf.saavncdn.com/577/2c5b...203b13ea_320.mp4?Expires=1602217480&Signature=jWboJ6ut2Tn...EQw1wWuJw5W4g__&Key-Pair-Id=APKAJB334VX63D3WJ5ZQ'). (3) Retrieve First Chunk Of Media: a GET request to auth_url returns content of type audio/mp4, served as a stream (206).]

The Protocol in A Nutshell.
• Acquiring Song Info Interestingly, the metadata related to the song is served within an HTML response. It is found within the JavaScript variable window.__INITIAL_DATA__. The parameters that are essential for fetching the media content are encrypted_media_url and perma_url.
• Generate Auth Token A GET request is made to https://www.jiosaavn.com/api.php?call=song.generateAuthToken&url=&bit_rate= to obtain the authorised URL that is used to fetch the media from the CDNs. The relevant parameters are url, which is the encrypted_media_url discussed above, and bit_rate, which takes the values "128", "320", "64", "32", "16". The response contains auth_url, which is verified by the CDN to authorize a request.
• Downloading Media A GET request is made to auth_url to finally retrieve the relevant media. This URL is sufficient for authorization.

4.3 Getting Gaana

Gaana was the third service that we looked at. Figuring out the protocol was easy as Gaana had no code obfuscation and, at least for the non-logged-in user, did not rely on cookies at all. Figure 7 demonstrates the working of the protocol.

• Getting Song Details Where Wynk relied on interaction with the server to obtain an authorised resource URI, Gaana instead embeds all information as text in the .html code of the song's page. The current song information is a JSON string present in a tag with 'data-type':'playSong'. The path key contains AES-CBC encrypted URIs with PKCS#7 padding[29] for various bitrates indexed as high, medium, low.
• Decrypting path The key and initialization vector are hardcoded in the JS files and we use those values to decrypt and obtain the authorized URI.
• Acquire Manifest A request to the authorised URI returns the manifest.m3u8, which contains the URI for index.m3u8.
• Playback The index.m3u8 file contains URIs for all chunks of the audio. After iterating through this file and making requests for all chunks (.ts files), we can append them together to obtain the complete audio.

[Figure 7. Content Retrieval Protocol - Gaana (Reconstructed). Client-server exchange: (1) Get Song Info: a GET request returns the HTML code of the song's page; the client extracts the encrypted URI and decrypts it with AES-CBC. (2) Get Manifest: a GET request to the decrypted URI returns manifest.m3u8. (3) Get Index: a GET request to the index URI present in manifest.m3u8 returns index.m3u8. (4) A GET request to the URI of each audio chunk as given in index.m3u8 returns segment.ts; repeat.]

4.4 Hunting Hungama

Hungama is yet another popular music streaming service in India. We explored its content serving mechanism, found it to be quite similar to that of JioSaavn, and reverse engineered the following protocol.

The Protocol in A Nutshell.
• Audio Player Data A GET request is made to https://www.hungama.com/audio-player-data/track/ to fetch the metadata of the song. The song_id parameter required for this request is obtained from the WebApp URL of the song, which is of the form https://www.hungama.com/song//. The response contains the relevant values media_id and file. The file URL contains the parameter token.
• Fetching Media Url A POST request with an empty body is made to http://www.hungama.com/mdnurl/song/{song_id}?token=. The response contains media_url, which is used to retrieve the media. The bit rate can be chosen by setting the hcom_audio_qty parameter to one of "high", "low", "medium" in the Cookie header.
• Downloading Media A GET request is made to media_url to download the relevant media. This URL is sufficient for authorization.

[Figure 8. Content Retrieval Protocol - Hungama (Reconstructed). Client-server exchange: (1) Get File URL: a GET request to https://www.hungama.com/audio-player-data/track/ (where the webapp song URL is https://www.hungama.com/song//) returns [{...'mediaid': , 'file': 'https://www.hungama.com/mdnurl/song/?token='...}]. (2) Get Authorized Media URL: a POST request to http://www.hungama.com/mdnurl/song/{song_id}?token= with body {} returns {'status': 1, 'mesage': 'successful', 'response': {'media_url': , 'type': 'mp3'}}. (3) Retrieve First Chunk Of Media: a GET request to media_url (example: https://media.hungama.com/c/4/0d7/22d/40990679/40990679_128.mp3?9pFejfIg...FzIiRo3dQ) returns content of type mp3, served as a stream (206).]

5 Discussion

5.1 Comparative Analysis

Among the case studies showcased in this work, reversing the protocol for Wynk was the most challenging in terms of effort due to the intricacies and complexity of the implementation. Yet, in the end, it turned out to be a matter of getting through the different layers of obfuscation, which had no theoretical security guarantees. Compared to Wynk, the other three services (JioSaavn, Gaana and Hungama) had simpler content serving mechanisms, which were reversed with little effort. The patch implemented by Wynk after our disclosure shows that they are indeed concerned about protecting their content, so it would be interesting to see how these platforms embrace DRM in the future instead of working on stop-gap solutions which only delay the inevitable.
We present a summary of the best/worst practices in Table 2 based on our investigations.

Table 2. A Comparative Analysis of Practices (services compared: Spotify, Wynk, JioSaavn, Gaana, Hungama; ✓ marks per practice)
  Mandatory User Identification: ✓
  Streamed Content Encryption: ✓
  Hardcoded Keys: ✓ ✓
  DRM Scheme: ✓
  Cookie Based Authentication Timeout: ✓ ✓
  Premium Content Access Restrictions: ✓
  Obfuscation/Minification: ✓ ✓ ✓ ✓ ✓

5.2 Why No DRM ?

What might have led to such an oversight? DRM protection is not a new or novel concept and has been a part of the industry for a fair amount of time. This would imply that deploying

  begin Retrieval
    [chunks] ← get_song(C)

Algorithm 2: register() Function
  Input: deviceId, userAgent
  Output: uid, token
  Data: Request Headers: H
  begin
    url := "https://sapi.wynk.in/music/v3/account/login"
    payload := {"deviceId": "deviceId", "userAgent": "userAgent"}
    uid, token ← POST(url, H, payload)
    return uid, token

Algorithm 5: get_song() Function
  Input: C  /* Response Object Containing Authenticated URIs & Signatures */
  Output: [chunks]
  begin
    manifest_url ← Extract manifest file URI from C
    manifest_file ← GET(manifest_url)
    index_uri ← Identify and extract highest quality index.m3u8 URI
    index_file ← GET(index_uri)
    chunks = [ ]
    forall chunk_url in index_file do
      tmp ← GET(chunk_url)
      chunks.push[tmp]
    return chunks
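The get_song() routine of Algorithm 5 can be sketched in Python as follows. This is a minimal illustration, not the authors' tool: the playlist parsing handles only the basic HLS tags needed here, and the fetch callable, which maps a URL to response bytes and would carry any cookies or signed query parameters, is a hypothetical parameter introduced so that the sketch stays independent of any HTTP library.

```python
from typing import Callable, List
from urllib.parse import urljoin


def pick_best_variant(master_text: str, master_url: str) -> str:
    """Return the absolute URI of the highest-BANDWIDTH variant playlist."""
    lines = [ln.strip() for ln in master_text.splitlines() if ln.strip()]
    best_bw, best_uri = -1, None
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        bandwidth = 0
        for part in line.split(":", 1)[1].split(","):
            key, _, val = part.partition("=")
            if key.strip() == "BANDWIDTH" and val.strip().isdigit():
                bandwidth = int(val)
        # The variant URI is the next line that is not a tag or comment.
        for nxt in lines[i + 1:]:
            if not nxt.startswith("#"):
                if bandwidth > best_bw:
                    best_bw, best_uri = bandwidth, nxt
                break
    if best_uri is None:
        raise ValueError("no variant stream found in master playlist")
    return urljoin(master_url, best_uri)


def chunk_uris(index_text: str, index_url: str) -> List[str]:
    """Return absolute URIs of all media segments listed in an index playlist."""
    return [urljoin(index_url, ln.strip())
            for ln in index_text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]


def get_song(manifest_url: str, fetch: Callable[[str], bytes]) -> List[bytes]:
    """Fetch the manifest, the best-quality index, then every chunk in order."""
    manifest = fetch(manifest_url).decode("utf-8")
    index_url = pick_best_variant(manifest, manifest_url)
    index = fetch(index_url).decode("utf-8")
    return [fetch(uri) for uri in chunk_uris(index, index_url)]
```

Concatenating the returned chunks in order, for example with b"".join(chunks), yields the complete audio as described for the playback step.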

A.2 Wynk v2

Algorithm 6: Client Side in Wynk 2.0
  Input: url  /* Resource Url */
  Output: [chunks]  /* Audio Chunk List */
  begin Initialisation
    BK ← gen_bk(T, R)
    deviceId ← gen_random_id(R)
    pk = Base64enc(https://sapi.wynk.in/music)
    sk = 51ymYn1MS
  begin Authentication
    /* Authorisation with the Wynk Servers to enable authenticated retrieval requests */
    spit_out(BK, deviceId)
    U := {k,n,y,w,m,z,a,p} ← check(BK, T)
    M := {dt,uid,token,kt,...} ← login(U, T)
    C ← request_manifest(url, token, dt, sk)
  begin Retrieval
    [chunks] ← get_song(C)

Algorithm 7: spit_out() Function
  Input: BK, deviceId
  begin
    d1, d2 ← deviceId[0...36), deviceId[36...72)
    In d1, d2 replace "−" → ""
    d3, d4 ← mix_it(d1, BK), mix_it(d2, BK)
    url := "https://img.wynk.in/webassets/"
    GET(url||d3||"_1.jpg")
    GET(url||d4||"_2.jpg")

Algorithm 8: check() Function
  Input: BK
  Output: k,n,y,w,m,z,a,p
  Data: Request Header: H
  begin
    url := "https://ping.wynk.in/health/check"
    H[tk] ← T
    H[bk] ← BK[0, len(BK)/2)
    p ← BK[len(BK)/2, len(BK))
    U ← POST(url, H, payload: {"pid": "p"})
    return U

Algorithm 9: login() Function
  Input: k,n,y,w,m,z,a,p
  Output: dt,uid,token,kt,...
  Data: Request Header: H′
  begin
    url := "https://login.wynk.in/music/account/v1/login"
    BS := k||n||y||w||m||z||a||p
    H′[x-bsy-ptot] ← T
    /* Generate x-bsy-cip from BS value */
    a ← [ ], b ← 0, t ← 0
    for t ≤ len(BS) − 1 do
      e = 10(BS[t]) + BS[t + 1]
      if e ≤ 55 then
        if 2 ∤ b then
          a.push(200 + e)
        else
          a.push(100 + e)
        b++
      else
        a.push(100 + e)
    H′[x-bsy-cip] ← concat(a)
    C ← POST(url, H′, payload: {})
    return C

Algorithm 10: request_manifest() Function
  Input: {url,dt,uid,token,kt}
  Output: Authenticated CloudFront Resource Parameters
  Data: Request Header: H′′
  begin
    H′′[x-bsy-uuid] ← dt
    /* Generating header x-bsy-utkn */
    suffix := "/song/v4/stream?ets=true&hlscapable=1&sq=a&lang=en&id="
    search_id ← get_search_id(url)
    msg := "POST"|| suffix || search_id ||"{}"
    digest ← SHA1_HMAC(token, msg)
    H′′[x-bsy-utkn] ← uid||":"||Base64Enc(digest)
    /* Generating x-bsy-t using Time Based OTPs and CryptoJS AES */
    H′′[x-bsy-t] ← AES(kt, TOTP(dt||sk, 600, 6))
    /* Send POST Request To Server */
    X ← POST(url, H′′, payload: {})
    return X
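The x-bsy-cip loop of Algorithm 9 leaves a few details open, so the following Python sketch is only one possible reading. It assumes that BS is an even-length string of decimal digits, that t advances two digits per iteration, that b is incremented only in the e ≤ 55 branch, and that concat(a) joins the decimal representations of the pushed values. None of these assumptions is confirmed by the text, and the function name is ours.

```python
def encode_cip(bs: str) -> str:
    """One reading of the x-bsy-cip encoding loop in Algorithm 9."""
    if not (bs.isascii() and bs.isdigit()) or len(bs) % 2:
        raise ValueError("sketch assumes an even-length string of decimal digits")
    out = []
    b = 0
    for t in range(0, len(bs), 2):  # assumption: consume two digits per step
        e = 10 * int(bs[t]) + int(bs[t + 1])
        if e <= 55:
            # Alternate the offset between 100 and 200 depending on parity of b.
            out.append(200 + e if b % 2 else 100 + e)
            b += 1
        else:
            out.append(100 + e)
    return "".join(str(v) for v in out)
```

Under this reading the encoding is a fixed, keyless transformation of server-supplied values, which is consistent with the observation that these layers are obfuscation rather than protection.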

References
[1] [n.d.]. Adobe Primetime. https://www.adobe.com/in/marketing/primetime.html
[2] [n.d.]. Amazon Music. https://music.amazon.in/home
[3] [n.d.]. Apple Fairplay Streaming. https://developer.apple.com/streaming/fps/
[4] [n.d.]. Apple Music. https://www.apple.com/in/music/
[5] [n.d.]. Denuvo By Irdeto. https://irdeto.com/denuvo/
[6] [n.d.]. Hungama. https://www.hungama.com/
[7] [n.d.]. Marlin DRM. https://www.intertrust.com/products/drm-system/marlin-drm/
[8] [n.d.]. Microsoft PlayReady. https://www.microsoft.com/playready/
[9] [n.d.]. PortSwigger Burp Proxy. https://portswigger.net/support/using-burp-proxy
[10] [n.d.]. Shaka Player. https://github.com/google/shaka-player
[11] [n.d.]. Spotify. https://www.spotify.com/in/
[12] [n.d.]. Widevine. https://www.widevine.com/
[13] [n.d.]. Youtube Music. https://music.youtube.com/
[14] Ben Munson. [n.d.]. Streaming services are being hit hard by hackers, Akamai says. https://www.fiercevideo.com/video/streaming-services-are-being-hit-hard-by-hackers-akamai-says
[15] Alex Biryukov, Gaëtan Leurent, and Arnab Roy. 2013. Cryptanalysis of the "Kindle" Cipher. In Selected Areas in Cryptography, Lars R. Knudsen and Huapeng Wu (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 86–103.
[16] Bruce E. Boyden. 2011. Is DRM working?: how could we tell?. In Proceedings of the 11th ACM Workshop on Digital Rights Management, Chicago, Illinois, USA, October 21, 2011, Yan Chen, Stefan Katzenbeisser, and Ahmad-Reza Sadeghi (Eds.). ACM, 1–2. https://doi.org/10.1145/2046631.2046633
[17] David Buchanan. 2019. Retrieved October 4, 2020 from https://twitter.com/David3141593/status/1080606827384131590
[18] Deloitte India. [n.d.]. Audio OTT Economy in India. https://www2.deloitte.com/in/en/pages/technology-media-and-telecommunications/articles/OTT.html
[19] Cory Doctorow. 2019. After Years of Insisting that DRM in HTML Wouldn't Block Open Source Implementations, Google says It Won't Support Open Source Implementations. Retrieved October 4, 2020 from https://boingboing.net/2019/04/03/i-hate-being-right-2.html
[20] David Dorwin, Jerry Smith, Mark Watson, and Adrian Bateman. 2017. Encrypted Media Extensions. Technical Report.
[21] EC-Council. 2009. Computer Forensics: Investigating Network Intrusions and Cybercrime. Cengage Learning.
[22] Financial Express. 2019. Audio OTT contributes 70% to music industry revenue: Deloitte-IMI. https://www.financialexpress.com/industry/deloitte-imi-audio-ott-contributes-70-pct-to-music-industry-revenue/1695436/
[23] Google. 2017. Widevine DRM Architecture Overview. Retrieved October 4, 2020 from http://www.whymatematica.com/wp-content/uploads/2018/08/Widevine_DRM_Architecture_Overview.pdf
[24] Hennie van Kuijeren. [n.d.]. The Preference for Music as a Service as Opposed to Download to Own. https://www.inholland.nl/media/10674/masterthesis-hennie-van-kuijeren-the-preference-for-music-as-a-service-a.pdf
[25] Apple Inc. 2017. HTTP Live Streaming. Technical Report. https://tools.ietf.org/html/draft-pantos-http-live-streaming-23#section-3.3
[26] Apple Inc. 2020. Understanding the HTTP Live Streaming Architecture. Retrieved October 4, 2020 from https://developer.apple.com/documentation/http_live_streaming/understanding_the_http_live_streaming_architecture
[27] International Federation of the Phonographic Industry. 2018. IFPI Global Music Report 2018. https://www.ifpi.org/downloads/GMR2018.pdf
[28] International Federation of the Phonographic Industry. 2018. IFPI Music Consumer Insight Report. https://www.ifpi.org/downloads/Music-Consumer-Insight-Report-2018.pdf
[29] B. Kaliski. 1998. PKCS #7: Cryptographic Message Syntax. Technical Report. https://tools.ietf.org/html/rfc2315
[30] Kate Swanson. 2013. A Case Study on Spotify: Exploring Perceptions of the Music Streaming Service. MEIEA Journal 13, 1 (2013).
[31] Renuka Kumar, Sreesh Kishore, Hao Lu, and Atul Prakash. 2020. Security Analysis of Unified Payments Interface and Payment Apps in India. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 1499–1516. https://www.usenix.org/conference/usenixsecurity20/presentation/kumar
[32] Airtel Digital Ltd. 2020. Wynk Music - Homepage. Retrieved October 4, 2020 from https://wynk.in/music
[33] Gamma Gaana Ltd. 2020. Gaana - Homepage. Retrieved October 4, 2020 from https://gaana.com/
[34] Saavn Media Pvt Ltd. 2020. JioSaavn - Homepage. Retrieved October 4, 2020 from https://www.jiosaavn.com/
[35] Vainavi Mahendra. 2019. Hybrid OTT and music streaming platforms get more users than 100% subscription-based platforms. Retrieved October 4, 2020 from https://brandequity.economictimes.indiatimes.com/news/media/hybrid-ott-and-music-streaming-platforms-get-more-users-than-100-subscription-based-platforms/69359360
[36] Colin Mann. 2020. Study: Stream-ripping piracy on the rise. Retrieved September 17, 2020 from https://advanced-television.com/2020/09/17/study-massive-increase-in-stream-ripping-piracy/
[37] mark. 2019. Audio Downloader Prime. Retrieved September 21, 2019 from https://www.jiosaavn.com/
[38] Markets and Markets. 2020. Over-The-Top Services Market by Type (Online Gaming, Music Streaming, VoD and Communication), Monetization Model (Subscription-based, Advertising-based, and Transaction-based), Streaming Device, Vertical, and Region - Global Forecast to 2024. TC2445 (2020). https://www.marketsandmarkets.com/Market-Reports/over-the-top-ott-market-41276741.html
[39] Andy Maxwell. 2019. The Scene: Pirates Ripping Content From Amazon & Netflix. Retrieved October 4, 2020 from https://torrentfreak.com/the-scene-pirates-ripping-content-from-amazon-netflix-190707/
[40] U. S. Government Publishing Office. [n.d.]. Public Law 105–304–Digital Millennium Copyright Act. Retrieved October 4, 2020 from http://www.gpo.gov/fdsys/pkg/PLAW-105publ304/html/PLAW-105publ304.htm
[41] Traci Ruether. 2019. 2019 Video Streaming Latency Report. Retrieved October 4, 2020 from https://www.wowza.com/blog/2019-video-streaming-latency-report
[42] Russ Crupnick. [n.d.]. Thanks to Stream-Ripping, Music Piracy Still a Scourge. https://www.musicwatchinc.com/blog/thanks-to-stream-ripping-music-piracy-still-a-scourge/
[43] Fidus Information Security. 2019. A Primer on Widevine and How It Can Be Abused to Download Encrypted Movies/Shows. Retrieved October 4, 2020 from https://fidusinfosec.com/breaking-content-protection-on-streaming-websites/
[44] Robert Triggs. 2019. Widevine Digital Rights Management Explained. Retrieved October 4, 2020 from https://www.androidauthority.com/widevine-explained-821935/
[45] Ruoyu Wang, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2013. Steal This Movie: Automatically Bypassing DRM Protection in Services. In USENIX Security Symposium. USENIX Association, 687–702.
[46] Ruoyu Wang, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2013. Steal This Movie: Automatically Bypassing DRM Protection in Streaming Media Services. In 22nd USENIX Security Symposium (USENIX Security 13). USENIX Association, Washington, D.C., 687–702. https://www.usenix.org/conference/usenixsecurity13/technical-sessions/paper/wang_ruoyu
[47] Wikipedia. 2019. Analog Hole. Retrieved October 4, 2020 from https://en.wikipedia.org/wiki/Analog_hole#:~:text=The%20analog%20hole%20(also%20known,ultimately%20reproduced%20using%20analog%20means.
[48] Wikipedia. 2019. Dynamic Adaptive Streaming over HTTP. Retrieved October 4, 2020 from https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
[49] Wikipedia. 2019. Time Based One Time Passwords. Retrieved October 4, 2020 from https://en.wikipedia.org/wiki/Time-based_One-time_Password_algorithm
