Indian Regional Movie Dataset for Recommender Systems

Prerna Agarwal*          Richa Verma*          Angshul Majumdar
IIIT Delhi               IIIT Delhi            IIIT Delhi
[email protected]          [email protected]          [email protected]

*Prerna Agarwal (now working with IBM Research) and Richa Verma (now working with TCS Research) have contributed equally.

ABSTRACT
The Indian regional movie dataset is the first database of regional Indian movies, users and their ratings. It consists of movies belonging to 18 different Indian regional languages and metadata of users with varying demographics. Through this dataset, the diversity of Indian regional cinema and its huge viewership is captured. We analyze the dataset, which contains roughly 10K ratings of 919 users and 2,851 movies, using some supervised and unsupervised collaborative filtering techniques like Probabilistic Matrix Factorization, Matrix Completion, Blind Compressed Sensing etc. The dataset consists of metadata information of users such as age, occupation, home state and known languages. It also consists of metadata of movies such as genre, language, release year and cast. India has a wide base of viewers, which is evident from the large number of movies released every year and the huge box-office revenue. This dataset can be used for designing recommendation systems for Indian users and regional movies, which do not yet exist. The dataset can be downloaded from https://goo.gl/EmTPv6.

KEYWORDS
Movie dataset, Collaborative Filtering, Metadata, Indian Regional Cinema, Crowd Sourcing

1 INTRODUCTION
Recommendation systems [17, 18, 19, 20, 33] utilize user ratings to provide personalized suggestions of items like movies and products. Some popular brands that provide such services are Amazon [34], Netflix, IMDb, BarnesAndNoble [35], etc. Collaborative Filtering (CF) and Content-Based (CB) recommendation are two commonly used techniques for building recommendation systems. CF systems [21, 22, 23] operate by gathering user ratings for different items in a given domain and comparing various users, their similarities and differences, to determine the items to be recommended. Content-based methods recommend items by comparing representations of content that a user is interested in to representations of the content that an item consists of.
There are several datasets like MovieLens [1] and Netflix [2] that are available for testing and benchmarking recommendation systems. We present an Indian regional movie dataset on similar lines. India has been the largest producer of movies in the world for the last few years, with a lot of diversity in languages and viewers. As per the UNESCO cinema statistics [9], India produces around 1,724 movies every year, with as many as 1,500 movies in Indian regional languages. India's importance in the global film industry is largely because India is home to Bollywood in Mumbai. There is a huge audience base in India, with a population of 1.3 billion, which is evident from the fact that there are more than two thousand multiplexes in India where over 2.2 billion movie tickets were sold in 2016. The box office revenue in India keeps rising every year. Therefore, there is a huge need for a dataset like Movielens in the Indian context that can be used for testing and benchmarking recommendation systems for Indian viewers. As of now, no such recommendation system exists for Indian regional cinema that can tap into the rich diversity of such movies and help provide regional movie recommendations for interested audiences.

1.1 Motivation
As of now, the Netflix and Movielens datasets do not have a comprehensive listing of regional productions, as the clipping in Figure 1 (borrowed from [39]) shows. Therefore, a substantial source of data comprising movies of various regions, with varying languages and genres encompassing a wider folklore, is strongly needed, one that could provide such data in a suitable format required for building and benchmarking recommendation systems.
To capture the diversity of Indian regional cinema, popular websites like Netflix are trying to shift focus towards it [36, 37]. The goal is to bring some of the greatest stories from Indian regional cinema to a global platform. Through this, viewers are exposed to a wide variety of new and diverse stories from India. As a result of this initiative, Indian regional cinema will be available across countries. Building a recommendation system using a dataset of such movies and their audience can prove to be useful in such situations. Here, we present such a dataset, which is the first of its kind.

Figure 1: Amazon and Netflix: Focus on Regional Films

1.2 Contributions
• Web portal for data collection: A web portal where a user can sign up by filling in details like email, date of birth, gender, home town, languages known and occupation. The user can then rate movies as like/dislike.
• Indian Regional Cinema Dataset: It is the first dataset of Indian regional cinema which contains ratings by users for different regional movies along with user and movie metadata. User metadata is collected while signing up on the portal. Movie metadata consists of genre, release year, description, language, writer, director, cast and IMDb rating.
• Detailed analysis of the dataset using some supervised and unsupervised collaborative filtering techniques.

2 RELATED WORK
MovieLens [1] is a web-based portal that recommends movies. It uses the film preferences of its users, collected in the form of movie ratings and preferred genres, and then utilizes collaborative filtering techniques to make movie recommendations. The Department of Computer Science and Engineering at the University of Minnesota houses a research lab known as GroupLens Research that created MovieLens in 1997 [38]. The Indian Regional Cinema dataset is inspired by MovieLens; the primary goal was to collect data for performing research on providing personalized recommendations. MovieLens released three datasets for testing recommendation systems: the 100K, 1M and 10M datasets, and released a 20M dataset as well in 2016. In these datasets, users and movies are represented with integer IDs, while ratings range from 1 to 5 at a gap of 0.5.
Netflix released a training dataset for their contest, the Netflix Prize [8], which consists of about 100,000,000 ratings for 17,770 movies given by 480,189 users. Each rating in the training dataset consists of four entries: user, movie, date of grade, grade. Users and movies are represented with integer IDs, while ratings range from 1 to 5.
These datasets are largely for Hollywood movies and TV series, and their viewers. They are not designed for those user communities which are inclined towards watching Indian regional cinema.
From the viewpoint of recommender systems, there has been a lot of work using user ratings for items and metadata to predict their liking and disliking of other items [4, 5, 6, 11]. Many unsupervised and supervised collaborative filtering techniques have been proposed and benchmarked on the MovieLens dataset. Here, in this paper, we have chosen a few popular techniques such as user-user similarity to establish a baseline, and then deeper techniques such as Blind Compressed Sensing, Probabilistic Matrix Factorization, Matrix Completion and Supervised Matrix Factorization are used on our dataset to provide benchmarking results. These techniques are chosen over others because they have proven to provide better accuracy in recent works [6].

3 INDIAN REGIONAL MOVIE DATASET
This is the first dataset of Indian regional cinema, covering movies in 18 different regional languages and a variety of user ratings for such movies. It consists of 919 users with varying demographics, 2,851 movies of different genres, and roughly 10K ratings.

3.1 Metadata Information
The data for movies has been scraped from IMDb [3]. IMDb has a collection of Indian movies spanning multiple Indian regional languages and genres. Each movie is associated with the following metadata:
• Movie id: Each movie has a unique id for its representation.
• Description: Description of the movie for users.
• Language: Language(s) used in the movie. A movie may have been released in multiple regional languages. The distribution is shown in Table 1.
• Release date: Date of release of the movie.
• Rating count: As per IMDb, to judge the popularity of the movie.
• Crew: Director, writer and cast of the movie.
• Genre: Movie genre. It can be one or multiple out of the 20 genres available on IMDb. The number of movies for each genre is shown in Figure 2.

Table 1: Language Distribution

Language     Movie Count   User Count
Hindi        615           902
Bengali      582           28
Assamese     22            9
Tamil        313           30
Nepali       51            9
Punjabi      150           78
Rajasthani   18            14
Malayalam    346           16
Bhojpuri     26            21
Kannada      303           11
Haryanvi     3             18
Manipuri     8             4
Urdu         129           23
Marathi      204           14
Telugu       338           18
Oriya        98            6
Gujarati     49            7
Konkani      6             4

Figure 2: Genre Distribution

For better recommendations, it is important to include the factors which influence user ratings the most. The following metadata information is collected for a user:
• User id: Each user has a unique id for its representation.
• Languages: The languages known by the user. The user counts per language are shown in Table 1.
• State: The state of India that the user belongs to. The region-wise distribution is shown in Figure 3.

Figure 4: Age Distribution

Figure 3: Region-wise Distribution

• Age: Date of birth of the user is taken as input to calculate the age of the user. The distribution is shown in Figure 4.
• Gender: Denotes the gender of the user. The gender distribution of the data is shown in Figure 5.
• Occupation: Denotes the occupation of the user. It can be any one of student, self-employed, service, retired and others. Its distribution is shown in Figure 6.

Figure 5: Gender Distribution

Table 2: Comparison of Datasets

Dataset           Movie Count   User Count   Rating Count   Sparsity (%)   Release Year
Our Dataset       2,851         919          10,000         99.96          2017
Movielens 100K    1,700         1,000        100,000        99.94          4/1998
Movielens 1M      6,000         4,000        1,000,000      99.96          2/2003

Figure 6: Occupation Distribution

3.2 MovieLens vs Indian Regional Cinema Dataset
The key difference between the presented dataset and MovieLens is that the latter does not contain movies from Indian regional cinema; MovieLens only has a few Hindi and Urdu movies. Also, our data has been collected mainly from viewers of regional movies in India. The user metadata thus collected can be used to recommend more relevant movies for such audiences.
Also, the MovieLens datasets are biased towards a certain category of users: they contain data only from users who have rated at least twenty movies. The datasets do not include the data of those users who could not find enough movies to rate or did not find the system easy enough to use, and there is a possibility that there is a fundamental difference between such users and the other users in the datasets. Our dataset makes no such distinction among users based on the number of movies that they have rated. To make the process of rating multiple movies easier for a user, we have used the concept of binary rating for movies, where a user can either "like" or "dislike" a movie, denoted by "1" and "-1" in the dataset. On the other hand, MovieLens uses a 10-point rating scale (from 0 to 5). A basic comparison of these datasets is shown in Table 2. The table indicates the number of users, movies and ratings, the release year, and the sparsity of the datasets.
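To make the layout concrete, the following is a minimal sketch of how such a like/dislike matrix and the sparsity figure in Table 2 might be computed. The file name ratings.csv and its column names (user_id, movie_id, rating) are hypothetical; the released dataset may use a different layout.

```python
import numpy as np
import pandas as pd

# Hypothetical file and column names; the released dataset may differ.
ratings = pd.read_csv("ratings.csv")  # columns: user_id, movie_id, rating (+1 like / -1 dislike)

users = ratings["user_id"].astype("category")
movies = ratings["movie_id"].astype("category")

# Dense user x movie matrix, with 0 marking "not rated".
R = np.zeros((users.cat.categories.size, movies.cat.categories.size))
R[users.cat.codes, movies.cat.codes] = ratings["rating"].values

# Sparsity as reported in Table 2: percentage of unobserved user-movie pairs.
sparsity = 100.0 * (1.0 - np.count_nonzero(R) / R.size)
print(f"users={R.shape[0]}, movies={R.shape[1]}, sparsity={sparsity:.2f}%")
```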

3.3 Dataset Collection
For the collection of user information and movie ratings, a web portal named Flickscore [15] was created, where users can sign up by filling in all the details as shown in Figure 7. The user has to provide the preferred languages so that the portal can ask the user to rate movies in those languages. While signing up, the user is prompted to fill in the metadata information. The user can then log in to the portal and rate movies as either like or dislike, and the responses are recorded as shown in Figure 8.

Figure 7: Sign Up form on Portal

Figure 8: Rating Movies on Portal

4 UNSUPERVISED COLLABORATIVE FILTERING TECHNIQUES
To analyze the dataset, some unsupervised techniques are used, such as user-user similarity, item-item similarity, Matrix Factorization, Probabilistic Matrix Factorization, Blind Compressed Sensing etc. The main advantage of such techniques is their ease of implementation and their incremental nature. On the other hand, they depend entirely on user rating data, and their performance decreases as the sparsity of the data increases. These techniques cannot address the cold start problem, i.e., when a new user or item is added to the dataset and its ratings are not available, because they rely on the ratings of users to make predictions. Bias correction is performed on the dataset by calculating the global mean, user bias and item bias, and then the techniques below are used to predict the rating of a new user for an item.

4.1 User and Item-based similarity
In the user-user model, a similarity matrix A is calculated, where each entry A_ij is the cosine similarity between user i and user j. It denotes how similar the two users are: the higher the score, the higher the similarity. Similarly, in the item-item model, each entry A_ij of the similarity matrix A denotes the cosine similarity score between item i and item j; the higher the score, the more similar the two items. Cosine similarity for two users u and u' is calculated as:

$$S_{u,u'} = \frac{\sum_j r_{u,j}\, r_{u',j}}{\sqrt{\sum_j r_{u,j}^2}\,\sqrt{\sum_j r_{u',j}^2}} \qquad (1)$$

where r_{i,j} denotes the rating by the i-th user for the j-th item. The prediction of the rating of the u-th user for the j-th item is done as:

$$\hat{r}_{u,j} = \sum_{u' \in N(u),\; j \in R(u')} w_{u,u'}\, r_{u',j} \qquad (2)$$

where w_{u,u'} is the normalized similarity weight and r_{u',j} is the rating by user u' for the j-th item. Similar to user-user similarity, item-item similarity is calculated by computing the cosine similarity between two items, and ratings are predicted in an analogous way.
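A minimal NumPy sketch of equations (1) and (2) on a toy like/dislike matrix (0 marks an unrated entry). The neighbourhood N(u) is read here as all other users who have rated the target item, which is one possible interpretation; the paper does not fix a neighbourhood size.

```python
import numpy as np

# Toy like/dislike matrix: rows are users, columns are movies, 0 = not rated.
R = np.array([[ 1, -1,  0,  1],
              [ 1,  0,  1,  1],
              [-1,  1,  1,  0]], dtype=float)

def cosine_sim(a, b):
    """Equation (1): cosine similarity between two users' rating vectors."""
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return np.dot(a, b) / den if den > 0 else 0.0

def predict(R, u, j):
    """Equation (2): similarity-weighted combination of other users' ratings for item j."""
    sims, rats = [], []
    for v in range(R.shape[0]):
        if v != u and R[v, j] != 0:          # user v has rated item j
            sims.append(cosine_sim(R[u], R[v]))
            rats.append(R[v, j])
    if not sims or np.sum(np.abs(sims)) == 0:
        return 0.0                           # no neighbours: fall back to a neutral score
    w = np.array(sims) / np.sum(np.abs(sims))   # normalized similarity weights w_{u,u'}
    return float(np.dot(w, rats))

print(predict(R, 2, 3))   # predicted score of user 2 for movie 3
```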
4.2 Matrix Factorization
There are some hidden traits (latent factors) behind users' liking or disliking of items, which may be inferred from the pattern of their ratings. This model maps users and movies to a joint latent factor space. Each item i and user u is associated with a vector q_i and p_u respectively, which measures the extent to which the item or user possesses those factors. The dot product q_i^T p_u denotes the liking of a user for a specific item, which approximates the rating r_{ui} [10]. Computing the mapping of each user and item to factor vectors is a major challenge. Imputation can prove to be expensive as it noticeably increases the amount of data during calculation. To model the observed ratings directly with regularization, the following equation is used:

$$\min_{q^*, p^*} \sum_{(u,i) \in \kappa} \left(r_{ui} - q_i^T p_u\right)^2 + \lambda\left(\|q_i\|^2 + \|p_u\|^2\right) \qquad (3)$$

Here, κ is the set of user-item pairs in the training set for which r_{ui} is known. The system fits a model on the already observed ratings and uses that model to predict the new ratings. The intuition behind using matrix factorization to analyze this dataset is that there should be some latent features that determine how a user rates an item. For example, two users may give high ratings to a certain movie if they both like the actors/actresses of the movie, or if the movie is an action movie, which is a genre preferred by both users. Hence, if we can discover these latent features, we should be able to predict the rating given by a certain user to a certain item, because the features associated with the user should match the features associated with the item.

4.3 Probabilistic Matrix Factorization
Probabilistic Matrix Factorization can handle large datasets because it scales linearly with the number of observations in the dataset. Let R_ij represent the rating of a user i for a movie j, and let U and V be the latent feature matrices for users and movies, respectively, with column vectors U_i representing user-specific latent feature vectors and V_j representing movie-specific latent feature vectors [4]. The log posterior over movie and user features with hyperparameters is maximized by minimizing the following objective:

$$\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}\left(R_{ij} - U_i^T V_j\right)^2 + \frac{\lambda_u}{2}\sum_{i=1}^{N}\|U_i\|_{Frob}^2 + \frac{\lambda_v}{2}\sum_{j=1}^{M}\|V_j\|_{Frob}^2 \qquad (4)$$

where λ_u and λ_v are the regularization parameters for users and items respectively, and I_ij indicates whether user i has rated movie j. A local minimum of the objective can be computed by gradient descent in U and V. The model performance is measured by computing the mean absolute error (MAE) and root mean squared error (RMSE) on the test set. This model can be viewed as a probabilistic extension of the SVD model, since if all ratings have been observed, the objective reduces to the SVD objective in the limit of prior variances going to infinity. This technique better addresses the sparsity and scalability problems and thus improves prediction performance. It also gives an intuitive rationale for recommendation.

4.4 Blind Compressed Sensing
Assuming that both latent factor matrices are dense is not reasonable. Each user will like or dislike every trait to a certain extent [24], so the user latent factor matrix can be dense; however, any item will possess only a few of the attributes and never all of them. Hence, the item latent factor matrix will ideally have a sparse structure rather than the dense one formulated in earlier works. The objective of this approach is to find the user and item latent factor matrices: the user latent factor matrix can be dense, but the same does not logically follow for the item latent factor matrix, and the sparsity of the item latent factor matrix increases the recommendation accuracy significantly [5]. The following equation is minimized:

$$\min_{U,V} \|Y - A(UV)\|_2^2 + \lambda_u\|U\|_{Frob} + \lambda_v\|V\|_{Frob} \qquad (5)$$

where λ_u and λ_v are regularization parameters for users and items respectively, A is the binary mask operator, Y is the rating matrix, and U and V are the user and item latent factor matrices, which were assumed to be dense in earlier models.
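Sections 4.2-4.4 all minimize variants of a regularized factorization objective. The sketch below implements the plain version in equation (3) with stochastic gradient descent over the observed ratings; the rank, learning rate and regularization weight are illustrative choices, not the settings used in the paper.

```python
import numpy as np

def matrix_factorization(R, rank=5, lam=0.05, lr=0.01, epochs=100, seed=0):
    """Minimize eq. (3) by SGD over the observed entries of R (0 = unrated)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, rank))   # user factors p_u
    Q = 0.1 * rng.standard_normal((n_items, rank))   # item factors q_i
    observed = np.argwhere(R != 0)                   # the set kappa of known ratings
    for _ in range(epochs):
        rng.shuffle(observed)
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            # simultaneous gradient step on p_u and q_i from the same error term
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - lam * P[u]),
                          Q[i] + lr * (err * P[u] - lam * Q[i]))
    return P, Q

# A predicted rating is the dot product of the learned factors: P[u] @ Q[i].
```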

Table 3: Unsupervised Techniques

Technique                            Movielens 100K        Our Dataset           Movielens 1M
                                     MAE      RMSE         MAE      RMSE         MAE      RMSE
User-User similarity                 0.6980   1.026        0.5307   1.03         0.607    0.8810
Item-Item similarity                 0.744    1.061        0.648    1.049        0.671    0.9196
Matrix Factorization                 0.828    1.128        0.471    0.971        0.6863   0.8790
Probabilistic Matrix Factorization   0.7564   0.9639       0.481    0.9372       0.7241   0.9127
Blind Compressed Sensing             0.7356   0.9409       0.463    0.9612       0.6917   0.8789
Matrix Completion                    0.8324   1.102        0.4827   0.9264       0.7196   0.9102


Table 4: Supervised Technique

Technique                         Movielens 100K        Our Dataset           Movielens 1M
                                  MAE      RMSE         MAE      RMSE         MAE      RMSE
Supervised Matrix Factorization   0.7199   0.9196       0.4367   0.9283       0.6709   0.8567

4.5 Matrix Completion
Matrix completion involves filling in the missing entries of a partially observed matrix. It aims to compute the matrix with the lowest rank or, if the rank of the completed matrix is known, a matrix of rank r that matches the known entries. A popular approach for solving the problem is nuclear-norm-regularized (NN) matrix completion [7], as shown in the following equation:

$$\min_{R} \|B - R\|_{Frob}^2 + \lambda\,\|R\|_{NN} \qquad (6)$$

where

$$B = R_{k-1} + M^T\!\left(Y - M \cdot R_{k-1}\right) \qquad (7)$$

and M is the binary mask, R_{k-1} is the rating matrix imputed at the previous iteration, and Y is the original rating matrix.
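Equations (6)-(7) describe an iterative scheme that alternates between re-imposing the observed entries and solving a nuclear-norm-regularized problem; the latter has a closed-form solution via soft-thresholding of singular values, which is the standard device in [7]. The snippet below is a rough SoftImpute-style sketch under that reading, with an arbitrary λ and iteration count rather than the paper's actual settings.

```python
import numpy as np

def matrix_completion(Y, mask, lam=1.0, iters=50):
    """Iterate eq. (7), then solve eq. (6) by singular value soft-thresholding.

    Y    : rating matrix with zeros at unobserved entries
    mask : boolean matrix, True where a rating is observed
    """
    R = np.zeros_like(Y)
    for _ in range(iters):
        B = R + mask * (Y - R)                        # eq. (7): keep observed entries of Y
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                  # soft-threshold the singular values
        R = (U * s) @ Vt                              # low-rank solution of eq. (6)
    return R
```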
5 SUPERVISED COLLABORATIVE FILTERING TECHNIQUES
To analyze the dataset, some supervised techniques are used, such as Supervised Matrix Factorization. The main advantage of supervised methods is that whenever a new user or a new item comes in, predictions can still be made for it, which unsupervised techniques fail to do [28, 29, 30, 31]; this is the cold start problem. These methods are scalable and make use of the metadata information of users and items, because of which they give more accurate predictions, as the relations are established well. Bias correction is performed on the dataset by calculating user bias and item bias, and then the technique below is used to calculate the rating of a new user for an item.

5.1 Supervised Matrix Factorization
The task of predicting ratings becomes difficult largely because of the sparsity of the ratings available in the database of a recommender system. Therefore, using knowledge related to user demographics and item categories can enhance prediction accuracy [25, 26, 27, 32]. Classes are formed as per the users' age group, gender and occupation, and a user can belong to multiple classes at a time. Class label information is important for learning the latent factor vectors of users and movies in a supervised environment, in a way that they are consistent with the available class labels. Class label information puts in additional constraints which reduce the search space, as a result of which the indeterminacy of the problem is reduced. Mathematically, within the matrix factorization framework, additional user metadata (encoded in W) and item metadata (encoded in Q) can be used and the following equation can be minimized [6]:

$$\min_{U,V,A,C} \|Y - M(UV)\|_{Frob} + \lambda_u\|U\|_{Frob} + \lambda_v\|V\|_{Frob} + \mu_u\|W - UC\|_{Frob} + \mu_v\|Q - AV\|_{Frob} \qquad (8)$$

where W_{ij} = 1 if user i belongs to class j and 0 otherwise, C is the linear map from the latent factor space to the classification domain, Q is the class information matrix for items created similarly to W, and the other variables have their usual meanings. Introducing supervised learning into the latent factor model helps in improving the prediction accuracy by reducing the problem of rating matrix sparsity. The values of the regularization parameters are determined using the L-curve technique [16].
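As a rough illustration of how the class matrices W and Q enter the model, the following sketch only evaluates a squared-Frobenius reading of the objective in equation (8) for given factor matrices U, V and linear maps C, A (the squaring is an assumption for differentiability); the actual minimization and the L-curve parameter selection [16] used in the paper are not reproduced here.

```python
import numpy as np

def supervised_mf_objective(Y, mask, U, V, C, A, W, Q,
                            lam_u=0.1, lam_v=0.1, mu_u=0.1, mu_v=0.1):
    """Squared-Frobenius reading of eq. (8); the boolean 'mask' plays the role of M."""
    fit = np.linalg.norm(mask * (Y - U @ V)) ** 2            # data term ||Y - M(UV)||
    reg = lam_u * np.linalg.norm(U) ** 2 + lam_v * np.linalg.norm(V) ** 2
    user_meta = mu_u * np.linalg.norm(W - U @ C) ** 2         # user class constraint: W ~ UC
    item_meta = mu_v * np.linalg.norm(Q - A @ V) ** 2         # item class constraint: Q ~ AV
    return fit + reg + user_meta + item_meta
```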
6 EXPERIMENTS AND RESULTS
Three different datasets are used to compare the results of the supervised and unsupervised collaborative filtering techniques used to predict user ratings: Movielens 100K, MovieLens 1M and our dataset of Indian regional movies. For error calculation, the mean absolute error (MAE) and root mean squared error (RMSE) are computed between the actual and the predicted ratings. The datasets are divided into 5 folds for evaluation, and the ratings are binarized into like/dislike (1/-1) labels for the experiments. Results of the different techniques on these datasets are shown in Table 3 and Table 4.
As the values in Table 3 indicate, the basic cosine similarity measures between users and movies perform fairly well on all datasets, and the minimum MAE values result from the experiments using our dataset. Since the sparsity of the regional cinema dataset and the Movielens 1M dataset is very high (as indicated in Table 2), techniques like Probabilistic Matrix Factorization and Blind Compressed Sensing perform better than the basic similarity measures, and among them the least MAE is again obtained on our dataset.
To use the metadata information, it is encoded in the form of one-hot vectors of 1's and 0's, where in the case of languages multiple 1's can be present in the vector. Since the supervised technique uses both user and item metadata, it outperforms the unsupervised collaborative filtering techniques. Among all three datasets, the minimum MAE is obtained on our dataset. This shows that the Indian Regional Cinema dataset can prove to be useful for building and benchmarking recommendation systems in the Indian context, which has the most diverse languages and demographics.
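For reference, the reported MAE and RMSE over a 5-fold split of the observed ratings can be computed as below. The predict_fn argument is a hypothetical placeholder standing for any of the techniques from Sections 4 and 5; it is assumed to take the training triples and the held-out (user, item) pairs and return predicted ratings.

```python
import numpy as np
from sklearn.model_selection import KFold

def evaluate(ratings, predict_fn, n_splits=5, seed=0):
    """ratings: array of (user, item, rating) triples; returns mean MAE and RMSE over folds."""
    maes, rmses = [], []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(ratings):
        train, test = ratings[train_idx], ratings[test_idx]
        preds = predict_fn(train, test[:, :2])        # predict for held-out (user, item) pairs
        err = test[:, 2] - preds
        maes.append(np.mean(np.abs(err)))
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return np.mean(maes), np.mean(rmses)
```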
7 CONCLUSION AND FUTURE WORK
India is a country where not only are many languages spoken, but the population's demographics are also very diverse in nature. Therefore, Indian regional cinema has a lot of diversity when it comes to the number of languages and the demographics of the viewers. Thousands of such movies are produced annually and there is a huge community of people who watch them. Therefore, a recommender system for Indian regional movies is needed to address the preferences of their growing number of viewers. This dataset has around 10K ratings by Indian users, along with their demographic information. We believe that this dataset could be used to design, improve and benchmark recommendation systems for Indian regional cinema. We plan to release the dataset after its publication. We further want to release another version of this dataset with more ratings and users, which will help to improve the current state of recommender systems for the Indian audience.

REFERENCES
[1] MovieLens: https://movielens.org/
[2] Netflix: https://www.netflix.com/in/
[3] IMDb: http://www.imdb.com/
[4] A. Mnih and R. Salakhutdinov, Probabilistic matrix factorization, in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 1257-1264.
[5] A. Gogna and A. Majumdar, Blind compressive sensing framework for collaborative filtering, 2015. Available: http://arxiv.org/abs/1505.01621
[6] A. Gogna and A. Majumdar, A comprehensive recommender system model: Improving accuracy for both warm and cold start users, IEEE Access, vol. 3, pp. 2803-2813, 2015.
[7] T. Hastie, R. Mazumder, J. Lee, and R. Zadeh, Matrix completion and low-rank SVD via fast alternating least squares, https://arxiv.org/abs/1410.2596
[8] Netflix Prize: http://www.netflixprize.com/
[9] http://uis.unesco.org/en/news/record-number-films-produced
[10] Y. Koren, Matrix factorization techniques for recommender systems, IEEE Computer Society, Aug. 2009.
[11] X. Su and T. M. Khoshgoftaar, A survey of collaborative filtering techniques, Advances in Artificial Intelligence, vol. 2009, p. 4, 2009.
[12] T. Hofmann, Latent semantic models for collaborative filtering, ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 89-115, 2004.
[13] G. Adomavicius and A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734-749, 2005.
[14] S. Gleichman and Y. C. Eldar, Blind compressed sensing, IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6958-6975, 2011.
[15] Flickscore: http://flickscore.iiitd.edu.in
[16] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, vol. 161. Englewood Cliffs, NJ, USA: Prentice-Hall, 1974.
[17] M. F. Hornick and P. Tamayo, Extending recommender systems for disjoint user/item sets: The conference recommendation problem, IEEE Trans. Knowl. Data Eng., vol. 24, no. 8, pp. 1478-1490, Aug. 2012.
[18] Q. Liu, E. Chen, H. Xiong, Y. Ge, Z. Li, and X. Wu, A cocktail approach for travel package recommendation, IEEE Trans. Knowl. Data Eng., vol. 26, no. 2, pp. 278-293, Feb. 2014.
[19] Y. Koren and R. Bell, Advances in collaborative filtering, in Recommender Systems Handbook. New York, NY, USA: Springer, 2011, pp. 145-186.
[20] X. Su and T. M. Khoshgoftaar, A survey of collaborative filtering techniques, Adv. Artif. Intell., vol. 2009, Jan. 2009, Art. ID 4.
[21] R. M. Bell and Y. Koren, Improved neighborhood-based collaborative filtering, in Proc. KDD-Cup Workshop, 2007, pp. 7-14.
[22] C. Desrosiers and G. Karypis, A comprehensive survey of neighborhood-based recommendation methods, in Recommender Systems Handbook. New York, NY, USA: Springer, 2011, pp. 107-144.
[23] J. Wang, A. P. de Vries, and M. J. T. Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2006, pp. 501-508.
[24] S. Gleichman and Y. C. Eldar, Blind compressed sensing, IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6958-6975, Oct. 2011.
[25] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, Recommender systems with social regularization, in Proc. 4th ACM Int. Conf. Web Search Data Mining, 2011, pp. 287-296.
[26] P. Massa and P. Avesani, Trust-aware recommender systems, in Proc. ACM Conf. Recommender Syst., 2007, pp. 17-24.
[27] X. Tang, Y. Xu, and S. Geva, Learning higher order interactions for user and item profiling based on tensor factorization, in Proc. 20th Int. Conf. Intell. User Interfaces, 2015, pp. 213-224.
[28] Q. Gu, J. Zhou, and C. Ding, Collaborative filtering: Weighted nonnegative matrix factorization incorporating user and item graphs, in Proc. SDM, 2010, pp. 199-210.
[29] S.-T. Park, D. Pennock, O. Madani, N. Good, and D. DeCoste, Naive filterbots for robust cold-start recommendations, in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2006, pp. 699-705.
[30] A. L. V. Pereira and E. R. Hruschka, Simultaneous co-clustering and learning to address the cold start problem in recommender systems, Knowl.-Based Syst., vol. 82, pp. 11-19, Jul. 2015.
[31] Z.-K. Zhang, C. Liu, Y.-C. Zhang, and T. Zhou, Solving the cold-start problem in recommender systems with social tags, EPL (Europhys. Lett.), vol. 92, no. 2, p. 28002, Nov. 2010.
[32] A.-T. Nguyen, N. Denos, and C. Berrut, Improving new user recommendations with rule-based induction on cold user data, in Proc. ACM Conf. Recommender Syst., 2007, pp. 121-128.
[33] G. Adomavicius and A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734-749, Jun. 2005.
[34] Amazon: http://www.amazon.in
[35] BarnesAndNoble: https://www.barnesandnoble.com
[36] https://www.clickondetroit.com/entertainment/netflix-steps-up-its-battle-with-amazon-in-india
[37] http://www.financialexpress.com/industry/technology/the-desi-content-battleground/798145/
[38] https://grouplens.org/datasets/movielens/
[39] http://timesofindia.indiatimes.com/companies/amazon-to-join-bollywood-film-industry-hires-consultants-to-create-a-blueprint-for-a-hindi-film-studio/articleshow/58176571.cms