
Article

Deep Learning Architecture for Collaborative Filtering Recommender Systems

Jesus Bobadilla *, Santiago Alonso and Antonio Hernando

Dpto. Sistemas Informáticos, Escuela Técnica Superior de Ingeniería de Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain; [email protected] (S.A.); [email protected] (A.H.)

* Correspondence: [email protected]

 Received: 17 March 2020; Accepted: 30 March 2020; Published: 3 April 2020 

Abstract: This paper provides an innovative deep learning architecture to improve collaborative filtering results in recommender systems. It exploits the potential of the reliability concept to raise the quality of predictions and recommendations by incorporating prediction errors (reliabilities) into the deep learning layers. The underlying idea is to recommend items that are both highly predicted and reliable. We use the deep learning architecture to extract the existing non-linear relations between predictions, reliabilities, and accurate recommendations. The proposed architecture consists of three related stages, providing three stacked abstraction levels: (a) real prediction errors, (b) predicted errors (reliabilities), and (c) predicted ratings (predictions). In turn, each abstraction level requires a learning process: (a) Matrix Factorization from ratings, (b) a Multilayer Neural Network fed with real prediction errors and hidden factors, and (c) a Multilayer Neural Network fed with reliabilities and hidden factors. A complete set of experiments has been run involving three representative and open datasets and a state-of-the-art baseline. The results show strong prediction improvements and also important recommendation improvements, particularly for the recall quality measure.

Keywords: collaborative filtering; reliabilities; deep learning; recommender systems; matrix factorization

1. Introduction

Recommender Systems (RS) are powerful tools to address the information overload problem on the Internet. They make use of diverse sources of information. Explicit votes from users to items and implicit interactions are the basis of Collaborative Filtering (CF) RS. Examples of implicit interactions are clicks, listened songs, watched movies, likes, dislikes, etc. Often, hybrid RS [1] are designed to combine CF with some other type of information filtering: demographic [2], context-aware [3], content-based [4], social information [5], etc. RS cover a wide range of recommendation targets, such as travel [6], movies [7], restaurants [8], fashion [9], news [10], etc. Traditionally, CF RS have been implemented using machine learning methods and algorithms, mainly the memory-based K-nearest-neighbors algorithm and, more recently, the Matrix Factorization (MF) method. MF is an efficient model-based approach that obtains accurate recommendations; it is easy to understand and to implement, and it is based on dimension reduction, where a reduced set of hidden factors contains the ratings information. Predictions are obtained by computing the dot product of the user and item hidden factors. There are several implementations of MF, such as PMF [11] and BNMF. An important drawback of MF predictions is that the linear dot product cannot catch the complex non-linear relations existing among the set of hidden factors. Neural models do not have this restriction, and they are poised to lead CF research in RS.
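To make this linear bottleneck concrete, the following minimal NumPy sketch (all matrix contents, sizes, and names are placeholder assumptions, not the paper's data) shows how an MF prediction reduces to the dot product of two hidden factor vectors:

```python
import numpy as np

F = 10                            # number of hidden factors (assumed)
S = np.random.rand(1000, F)       # users hidden factors matrix (placeholder)
Q = np.random.rand(F, 500)        # items hidden factors matrix (placeholder)

def mf_prediction(u: int, i: int) -> float:
    """Linear MF prediction: the dot product of user u's and item i's factors."""
    return float(S[u] @ Q[:, i])

print(mf_prediction(42, 7))       # a single predicted rating
```

Because the prediction is a plain weighted sum, interactions between factors (e.g., a factor that only matters when another one is high) cannot be expressed; this is precisely the limitation that neural layers remove.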

Currently, RS research is directed toward deep learning approaches [12,13]. It is usual to find neural content-based approaches: images [8,9], text [10,14], music [15,16], videos [17,18], etc. These targets are processed using different neural network architectures: Convolutional Neural Networks [19], Recurrent Neural Networks [20], Multilayer Neural Networks (MNN) [3,21], and autoencoders [7,22] are usual approaches. The preceding publications are focused on content-based or hybrid filtering. CF neural approaches can be classified as: (1) Deep Factorization Machines (deepFM) [23] and (2) Neural Collaborative Filtering (NCF) [24]. NCF simultaneously processes item and user information using a dual neural network. The two main NCF implementations are (a) NCF [24] and (b) Neural Network Matrix Factorization (NNMF) [25]. Since NCF reports better accuracy than NNMF, we use it as our baseline. Figure 1 shows the NCF architecture: the input sparse data (ratings of the users and ratings of the items) are converted into dense information by means of an embedding layer. Then, a Multilayer Perceptron combines both feature paths by concatenating them. Training is performed using the known ratings as labels. Once the architecture has learned, it can predict unknown ratings in a non-linear process. Our proposed architecture makes use of the NCF one, expanding it to hold training using known errors and then predicting reliabilities. We will refer to the proposed architecture as Reliability-based Neural Collaborative Filtering (RNCF).

Figure 1. Neural Collaborative Filtering (NCF) architecture.

There are some publications that combine wide and deep learning [26,27] to support CF RS. The wide side is implemented using a perceptron, whereas the deep side is implemented using an MNN. The wide component learns simple relations (memorization); the deep component learns complex relations (generalization). The proposed architecture (RNCF) focuses on the deep learning approach to catch complex relations, both to predict reliabilities and to predict rating values. The wide learning cannot make these functions, but it could slightly improve the overall results. A potential future work is to expand RNCF to hold a wide learning component. The deepFM architecture [28] joins a deep and a wide component to process the raw input vectors. It simultaneously learns low and high-order feature interactions. Compared to NCF, deepFM achieves a slight accuracy improvement (0.33% Criteo dataset, 0.48% Company dataset). The proposed approach (RNCF) fits better with the NCF architecture than with the deepFM one. Additionally, the wide architectures have the size of the input layer as a drawback that jeopardizes their scalability. Given the limited room for improvement using the deepFM model, we have chosen the more regular and scalable NCF architecture as a base to create our model. There are some other approaches that provide neural networks based on improved information coming from data, such as [29], where authors extract interaction behavior from bipartite graphs. Deep-learning architectures can be designed by extracting multi-criteria information from data [30]. Additionally, a classification-based deep learning collaborative filtering approach is stated in [31], where the learning process takes as information two different binary sources: (a) relevant/non-relevant votes, and (b) voted/non-voted items.
Once we have reviewed the most important neural network solutions that the state of the art provides us, we are going to explain the main concepts of the proposed approach by means of Figure 2. Some state-of-the-art deep learning approaches to CF RS use hidden factors of users and items as embeddings [24]. Then, from the hidden factors, an MNN learns the existing non-linear relations between hidden factors and ratings (Figure 2a). Once the MNN has learned, predictions can be made by feedforward running the MNN, feeding it with the hidden factors corresponding to the ⟨user, item⟩ pairs that do not have an associated rating. These types of architectures provide improved results with regard to the classical MF approach, where predictions are obtained by means of a linear hidden factors-based dot product.

Related to the reliability or confidence of the RS recommendations, some published papers [32] provide additional information to each prediction or recommendation: its reliability. As an example, we can recommend an item with 4 stars and a reliability of 2 stars, or we can recommend another item with 3.5 stars and a reliability of 4 stars. It is probable that the recommended user will choose the second item, since even knowing that our recommendation is lower (3.5 compared to 4), it is more reliable (4 compared to 2). Using the same concept, we can select recommendations from the set of predictions taking into account not only the prediction values, but also their reliabilities. By carefully studying the relations between predictions and reliabilities, we could probably extract some rules to recommend. Instead, we use a deep learning stage to learn the existing non-linear relation between predictions, reliabilities, and accurate recommendations.

A recommendation reliability is defined as its predicted error [32]. That is, in addition to predicting ratings, we also predict the error of those predictions. The hypothesis of this paper claims that the state-of-the-art architectures can be improved by adding the predicted errors (reliabilities) to the MNN learning. Thus, the state-of-the-art architecture shown in Figure 2a can be improved by using the architecture shown in Figure 2b.

Figure 2. Proposed method (RNCF) concepts: (a) NCF baseline, (b) improved NCF by adding reliabilities, (c) extraction of the hidden factors and the real prediction errors, (d) prediction of the errors (reliabilities), (e) training stage in the RNCF architecture. NCF: Neural Collaborative Filtering, RNCF: Reliability-based Neural Collaborative Filtering.

The question that now arises is the way to get the predicted errors: reliabilities (black circles in Figure 2b). This process will need two separated stages, as shown in Figure 2c,d. Figure 2c shows the way we can get a set of real prediction errors: first, an MF is processed to obtain the users and items hidden factors; then, we predict each existing rating in the dataset. Since we predict known ratings, we can obtain real (exact) prediction errors (top of Figure 2c). The next process (Figure 2d) is to make use of the real prediction errors to train an MNN; the MNN is fed with hidden factors in its input layer, and it learns using the real prediction errors as labels. Once the MNN has learned, it can feedforward reliabilities: error predictions (black circles) for the not-voted ⟨user, item⟩ pairs, where each user and each item consist of their hidden factors. Finally, to accomplish our objective, to accurately predict and recommend using hidden factors and reliabilities (Figure 2b), we must train the MNN; Figure 2e shows the way we use the known ratings (left to right faded circles) to train the MNN: known ratings are used as labels in the neural network training process, so that the MNN learns to accurately predict ratings (make predictions) from hidden factors and reliabilities. The trained MNN in Figure 2e is then used feedforward to make non-linear predictions (Figure 2b).
Please note that the proposed method acts as a collaborative filtering black box that is fed just with the votes of each ⟨user, item⟩ pair. In this way, it can be used as a base to implement hybrid recommender systems containing demographic, context-based, content-based, and social information.

2. Materials and Methods

The proposed RNCF architecture gets improved recommendation results by performing three different learning processes that produce three different abstraction levels in the whole system. Borrowing diagrams from Figure 2, Figure 3 shows the overall architecture: each one of the performed processes as well as their stacked connections. The first RNCF process is drawn in the bottom gray rectangle of Figure 3; it is labeled as 'Abstraction level 1'. Using the CF dataset containing the ratings cast by the users to the items, we run a machine learning MF method. In this way, we obtain two dense matrices containing the hidden factors: the users matrix and the items matrix. From the two hidden factor matrices, predictions can be obtained by processing the dot product of each selected user and item. Usually, predictions are computed just for the not-voted items of each user, since it is not useful to recommend consumed items. In our proposed RNCF architecture, in its first learning process, we do the opposite: predictions are obtained for the voted (consumed) items (Equation (11)). This is because, in the first abstraction level, we look for the prediction errors made in the MF stage. The first stage of the proposed method can be mathematically interpreted as:

Let $U$ be the set of users in the CF dataset (1)

Let $I$ be the set of items in the CF dataset (2)

Let $V$ be the set of categorical values of the ratings, where '$\bullet$' means 'not consumed': $V = \{1, 2, 3, 4, 5, \bullet\}$ (3)

Let $r_{u,i}$ be the rating of the user $u \in U$ to the item $i \in I$, with $r_{u,i} \in V$ (4)

Let $F$ be the number of chosen factors for the MF method (5)

Let $s_{u,f}$ be the hidden factor $f \in F$ of the user $u \in U$ (6)

Let $q_{f,i}$ be the hidden factor $f \in F$ of the item $i \in I$ (7)

Let $p_{u,i}$ be the prediction of the rating of the user $u \in U$ on the item $i \in I$ (8)

$p_{u,i} = \sum_{f \in F} s_{u,f} \cdot q_{f,i}$ (9)

Let $e_{u,i}$ be the error of the prediction $p_{u,i}$ (10)

$e_{u,i} = (r_{u,i} - p_{u,i})^2, \quad \forall r_{u,i} \neq \bullet$ (11)
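Stage 1 can be sketched in a few lines of NumPy (a toy illustration with assumed shapes and placeholder data; in the paper, the factors are computed with the CF4J framework described later). It implements Equations (9) and (11): predictions are computed for all pairs, and real squared errors are kept only for the voted ones:

```python
import numpy as np

num_users, num_items, F = 1000, 500, 10
S = np.random.rand(num_users, F)            # s_{u,f}: users hidden factors (placeholder)
Q = np.random.rand(F, num_items)            # q_{f,i}: items hidden factors (placeholder)
R = np.full((num_users, num_items), np.nan) # np.nan plays the role of the bullet symbol
R[0, 0], R[0, 3] = 4.0, 2.0                 # toy voted ratings

P = S @ Q                                   # Equation (9): p_ui = sum_f s_uf * q_fi
voted = ~np.isnan(R)                        # only voted ratings yield real errors
E = np.where(voted, (R - P) ** 2, np.nan)   # Equation (11): e_ui = (r_ui - p_ui)^2
```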

Figure 3. Proposed RNCF architecture.

The second stage of the proposed method is drawn in the middle of Figure 3, which is labeled as 'Abstraction level 2'. As it can be seen, we make use of an MNN to predict the MF error of each prediction. In this case, as usual, once the neural network has learned, predictions are made for the not-voted ratings (Equation (18)). To accomplish the learning process, the neural network input is fed with the hidden factors of users and items (Equation (13)), whereas each label output is fed with each error obtained in Stage 1 of the proposed method (Equation (15)). That is, the Stage 2 MNN learns the errors obtained in Stage 1 to provide the expected error of each prediction (reliability). In its learning operation, the second-stage mathematical interpretation is:

Let $X$ be the learning set of samples used to feed the MNN (12)

$X = \{\langle s_{u,f}, q_{f,i} \rangle \mid \forall u \in U, \forall i \in I, \forall f \in F, \; r_{u,i} \neq \bullet\}$ (13)

Let $Y$ be the target set of labels used to feed the MNN (14)

$Y = \{e_{u,i} \mid \forall u \in U, \forall i \in I, \; r_{u,i} \neq \bullet\}$ (15)

Please note that in the first stage, real errors are obtained by predicting known (voted) ratings (Equation (11)), whereas in the second stage, we predict errors (reliabilities) for the not-voted ratings (Equation (18)), based on the real errors of the voted ratings.

Once the MNN has learned, in its feedforward operation (prediction of errors), the second-stage mathematical interpretation is:

Let $g$ be the Stage 2 MNN feedforward process (16)

Let $e'_{u,i}$ be the prediction of the error (reliability) obtained by the Stage 2 MNN (17)

$e'_{u,i} = g(X), \quad \forall u \in U, \forall i \in I \mid r_{u,i} = \bullet$ (18)

The MNN learns by minimizing the error $(e_{u,i} - e'_{u,i})^2, \quad \forall r_{u,i} \neq \bullet$ (19)
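A minimal Keras sketch of this Stage 2 MNN follows (the layer sizes, the Adam optimizer, and the toy random tensors are our assumptions; the paper only specifies Keras on top of TensorFlow). The input is the concatenation of the user and item hidden factors, the label is the Stage 1 real error, and the MSE loss matches Equation (19):

```python
import numpy as np
from tensorflow import keras

F = 10                                        # number of hidden factors (assumed)
inputs = keras.Input(shape=(2 * F,))          # concatenated [s_u, q_i]
x = keras.layers.Dense(64, activation="relu")(inputs)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)            # e'_ui: predicted error (reliability)
model_g = keras.Model(inputs, outputs)
model_g.compile(optimizer="adam", loss="mse") # minimizes (e_ui - e'_ui)^2, Equation (19)

X = np.random.rand(2048, 2 * F)               # placeholder for the voted-pairs factors
Y = np.random.rand(2048, 1)                   # placeholder for the Stage 1 real errors
model_g.fit(X, Y, epochs=15, batch_size=64, validation_split=0.2)

# Feedforward on not-voted pairs yields the reliabilities of Equation (18).
reliabilities = model_g.predict(np.random.rand(5, 2 * F))
```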

Finally, the RNCF Stage 3 is in charge of predicting the not-voted rating values. It takes, as input, the users and items hidden factors, as well as the previously predicted reliabilities. It is expected that the predicted reliabilities will provide the necessary additional information to improve predictions and recommendations. In its learning operation, the third-stage mathematical interpretation is:

Let $X'$ be the learning set of samples used to feed the MNN: (20)

$X' = \{\langle s_{u,f}, q_{f,i}, e'_{u,i} \rangle \mid \forall u \in U, \forall i \in I, \forall f \in F, \; r_{u,i} \neq \bullet\}$ (21)

Let $Y'$ be the target set of labels used to feed the MNN: (22)

$Y' = \{r_{u,i} \mid \forall u \in U, \forall i \in I, \; r_{u,i} \neq \bullet\}$ (23)

Once the third-stage MNN has learned, its feedforward prediction operation can be fed by selecting the not-voted ratings:

Let $h$ be the Stage 3 MNN feedforward process (24)

Let $t_{u,i}$ be the prediction of the rating $r_{u,i}$: (25)

$t_{u,i} = h(X'), \quad \forall u \in U, \forall i \in I \mid r_{u,i} = \bullet$ (26)

The MNN learns by minimizing the error $(t_{u,i} - r_{u,i})^2, \quad \forall r_{u,i} \neq \bullet$ (27)

The recommendation set $Z$ is chosen by selecting the $N$ highest predictions:

$Z = \{t_{u,i} \mid t_{u,i} \geq t'_{u,i}, \; \forall t'_{u,i} \in Z^c\}, \quad |Z| = N$ (28)
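The Stage 3 MNN and the top-N selection of Equation (28) can be sketched the same way (again, layer sizes and toy data are our assumptions, not the paper's configuration): the input simply grows by one column, the reliability, and the N highest predicted ratings are kept:

```python
import numpy as np
from tensorflow import keras

F = 10
inputs = keras.Input(shape=(2 * F + 1,))      # [s_u, q_i, e'_ui] per sample
x = keras.layers.Dense(64, activation="relu")(inputs)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)            # t_ui: predicted rating
model_h = keras.Model(inputs, outputs)
model_h.compile(optimizer="adam", loss="mse") # minimizes (t_ui - r_ui)^2, Equation (27)

# After training on the voted ratings, recommend for one user:
X_prime = np.random.rand(200, 2 * F + 1)      # placeholder not-voted samples of a user
t = model_h.predict(X_prime).ravel()
N = 11
Z = np.argsort(t)[-N:][::-1]                  # Equation (28): indices of the N highest t_ui
```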

Now, the experiments' design is explained, including the used baseline, the chosen datasets, the tested quality measures, the selected parameter values in the machine learning and deep learning algorithms, the used frameworks, etc. Experiments are performed using cross-validation, and they have been tested on open CF datasets widely used by RS research papers. We have selected Netflix*, MovieLens [33], and FilmTrust [34] as datasets. Table 1 contains the most relevant facts about these datasets. The cross-validation values used in the experiments are summarized in Table 2. The chosen quality measure for prediction has been the Mean Absolute Error (MAE), whereas the chosen quality measures for recommendation have been Precision, Recall, and F1.

Table 1. Used datasets and their main facts.

Dataset             #Ratings    #Items   #Users   Ratings Values
Netflix* (subset)   4,553,208   7000     10,000   1 to 5
MovieLens           1,000,209   3706     6040     1 to 5
FilmTrust           33,470      2509     1227     0.5 to 4, step 0.5

Table 2. Cross-validation values.

Parameter                       Value
% training                      80
% testing                       20
N factors                       {1, ..., 96}, step 5
Recommendation threshold (θ)    4

The Precision measure returns the proportion of relevant recommendations with regard to the total number of N recommendations, where {relevant recommendations} are those whose ratings are greater than or equal to a threshold. Recall is the proportion of relevant recommendations with regard to the total number of relevant items (from the total number of items voted by the user). Please note that whereas N is a constant, the number of relevant items is not. Recall is a "relative" measure, since it is more difficult to extract relevant recommendations from a set of a few relevant items than from a large set of relevant items. The F1 score combines Precision and Recall, giving the same relevance to each one of them: $F1 = 2 \cdot Precision \cdot Recall / (Precision + Recall)$. A minimal code sketch of these measures is given at the end of this section.

Finally, to implement the deep learning processes, we have made use of the Keras API, working on top of TensorFlow. In this case, the programming language has been Python. Keras is a user-friendly API: it minimizes the required number of user actions for common use cases. Keras allows making designs by using fully configurable modules, and it provides easy extensibility. The machine learning matrix factorization has been made using the Java-based CF4J framework [35].

The baseline for our experiments is the NCF neural architecture [24]. This is a state-of-the-art approach that outperforms the existing MF methods and gets better accuracy than the Neural Network Matrix Factorization design (NNMF). The deepFM architecture [28] has achieved similar results to the NCF one. Overall, the NCF architecture provides a representative baseline that outperforms the machine learning KNN and MF methods; it is also a representative baseline for neural state-of-the-art approaches to the CF RS field.
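As announced above, here is a minimal sketch of the three recommendation measures for a single user (the toy arrays and the helper name are ours; θ = 4 is the relevance threshold of Table 2):

```python
import numpy as np

def precision_recall_f1(predictions, true_ratings, n=11, theta=4.0):
    """Top-n Precision, Recall and F1 for one user (hypothetical helper)."""
    top_n = np.argsort(predictions)[-n:]        # the N recommended items
    relevant = true_ratings >= theta            # relevant items in the test set
    hits = np.sum(relevant[top_n])              # relevant recommendations
    precision = hits / n                        # absolute: depends on the constant N
    recall = hits / max(np.sum(relevant), 1)    # relative: depends on #relevant items
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

preds = np.array([4.2, 3.1, 4.8, 2.0, 3.9])     # toy predictions
truth = np.array([5.0, 3.0, 4.0, 2.0, 4.0])     # toy test ratings
print(precision_recall_f1(preds, truth, n=2))   # (1.0, 0.666..., 0.8)
```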

3. Results

Results in this section have been split into two subsections: the first one shows and explains the prediction quality results, both for the proposed RNCF and for the NCF baseline. The second subsection is devoted to showing the recommendation quality results.

3.1. Prediction Results

As explained in the previous section, the proposed RNCF architecture contains a Multilayer Neural Network (Figure 3, abstraction level 2) that learns from the errors made in the MF predictions. Once the network has learned, it can predict "errors in predictions" (reliabilities). The smaller the predicted error, the greater its reliability. Figure 4 shows the quality of the obtained reliabilities on the MovieLens dataset. As it can be seen, the neural network is not overfitted, and its testing Mean Absolute Error (MAE) is close to the 0.01 value when 15 epochs have been run. The reliability values can be considered accurate enough to feed the neural network on top of the proposed architecture (Figure 3, abstraction level 3). The Netflix and FilmTrust results are very similar to the MovieLens ones; they have been omitted to keep this subsection as short as possible.

Once reliabilities can be predicted on the abstraction layer 2 of the RNCF architecture, they can feed the abstraction layer 3 (Figure 3). Figures 5 and 6 show the MAE obtained by using both the NCF baseline and the proposed RNCF. Figure 5 shows the MovieLens results, whereas Figure 6 shows the Netflix ones. FilmTrust prediction results have been omitted to keep the size of this subsection short: they show the same behavior as the MovieLens and the Netflix results.


Figure 4. MovieLens. Mean Absolute Error (MAE) quality in the prediction of the reliabilities.

Figure 5 shows that the neural network is not overfitted, both for the NCF baseline (top graph of Figure 5) and for the proposed RNCF (bottom graph of Figure 5). Comparing the testing MAE values at the end of the training (40 epochs in the baseline and 60 epochs in RNCF), we can claim the superiority of the proposed RNCF when applied to the MovieLens dataset: an error of 0.47 versus 0.62. RNCF achieves an improvement of 24% in the quality of the predictions. From the analysis of Figure 6 (results obtained using the Netflix dataset), the same conclusions can be extracted: there is no overfitting, and RNCF achieves an improvement of approximately 22% in the quality of the predictions. Overall, the proposed method achieves a strong improvement in the quality of the RS predictions.

Figure 5. MovieLens. Mean Absolute Error (MAE) obtained by training both the NCF baseline (top chart) and the proposed RNCF neural architecture (chart below). Training and testing evolutions for the 60 first epochs (x-axis).

Since the processed version of Netflix is much bigger than the MovieLens one, we have also used a bigger value for the neural network batch size hyperparameter in order to avoid a significant increase in runtime. High batch sizes tend to produce fluctuations in the resulting trends, as shown in Figure 6 (Test MAE). These fluctuations can be smoothed by averaging a series of separate runs of the same neural network, as sketched below. Fluctuations appear on the Test MAE curve, and not on the Training MAE curve, due to the smaller size of the test set.
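A minimal sketch of this smoothing follows, where train_once is a hypothetical function (our assumption) that performs one training run and returns its per-epoch test MAE curve:

```python
import numpy as np

def averaged_test_mae(train_once, runs=5):
    """Average the test MAE curves of several separate runs to smooth fluctuations."""
    curves = np.stack([train_once() for _ in range(runs)])  # shape: (runs, epochs)
    return curves.mean(axis=0)                              # smoothed curve
```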

Figure 6. Netflix. Mean Absolute Error (MAE) obtained by training both the NCF baseline (top chart) and the proposed RNCF neural architecture (chart below). Training and testing evolutions for the 60 first epochs (x-axis).

3.2. Recommendation Results

In this section, the first set of experiments tests the proposed RNCF architecture on the Netflix* dataset. F1 results are shown in the top graph of Figure 7. As it can be seen, recommendation quality is improved for all the tested values of N (number of recommendations). It is remarkable that the most improved values are the most relevant ones: around 11 recommendations for a single user. The bottom graph in Figure 7 shows the details of the F1 quality measure: we can separately see its Precision and Recall components. In this dataset, Precision does not significantly change from the baseline to the proposed RNCF, and the complete F1 improvement is due to the Recall behavior. Additionally, low numbers of recommendations (N) produce better recommendation quality improvements. As we will see, the same type of results has been found on the experiments performed using the MovieLens and the FilmTrust datasets. Regarding numerical improvement values, if we take N = 11 as a reasonable number of recommendations, we approximately obtain the following improvement percentages: Netflix 34%, MovieLens 20%, FilmTrust 12%. These improvements decrease when a higher number of recommendations is selected.

Figure 7. Netflix recommendation accuracy. Top chart: F1 comparative between the NCF baseline and the proposed RNCF. Chart below: Precision and Recall comparative between the NCF baseline and the proposed RNCF; x-axis: number of recommendations (N).

We have conducted some additional research to understand why the Recall quality measure explains most of the F1 improvement, as well as why the quality improvement decreases when the number of recommendations rises. Finally, we have found the justification: there is an inverse correlation between prediction values and reliability values. There is a tendency for higher predictions to present lower reliabilities. This finding has a deep influence on the evolution of both the Precision and the Recall quality measures: usually, recommendations are made by selecting the highest predicted values; instead, the proposed architecture 'discards' the non-reliable recommendations (the ones with a high predicted error). Since the proposed architecture can discard non-reliable predictions from the set of highly predicted items, it will do a better job if there is a sufficient number of predictions to select and to discard, particularly given the narrow margin provided by the low reliabilities related to high predictions. Here is where the Recall quality measure can show its behavior. Recall is a relative measure, relative to the number of relevant items: highly valued items that usually are the highly predicted ones. On the other hand, Precision is an absolute measure that depends on the constant N (number of recommendations). The Recall difference between the baseline method and the proposed one will increase as the number of relevant items increases, since the proposed method will be able to better choose the most reliable and highly predicted ones. Instead, Precision does not incorporate the number of relevant items in its equation: it just incorporates the constant N value. Finally, the higher the number of recommendations, the lower the RNCF recommendation improvement: simply, the higher the value of N, the faster the reliable items run out, and the Recall improvement decreases.

Figures 8 and 9, corresponding to the MovieLens and the FilmTrust experiments, show the same main behavior as the discussed Figure 7 (Netflix*). It is remarkable that the bigger the dataset, the higher the proposed neural architecture improvement. This is the expected result, since big datasets offer better opportunities to find items that are, at the same time, highly predicted and highly reliable. By selecting these items, our architecture can improve recommendations. This way, we can claim that the proposed RNCF deep learning architecture is scalable in the sense that it works better when the dataset size increases.

Figure 8. MovieLens recommendation accuracy. Top chart: F1 comparative between the NCF baseline and the proposed RNCF. Chart below: Precision and Recall comparative between the NCF baseline and the proposed RNCF; x-axis: number of recommendations (N).

Figure 9. FilmTrust recommendation accuracy. Top chart: F1 comparative between the NCF baseline and the proposed RNCF. Chart below: Precision and Recall comparative between the NCF baseline and the proposed RNCF; x-axis: number of recommendations (N).

4. Discussion

In the same way that more accurate or less accurate predictions exist, there are more reliable or less reliable predictions. The proposed deep learning architecture uses its first levels of abstraction to learn and to predict the reliability of each prediction. Subsequently, the architecture learns the non-linear relationships between accuracy and reliability; in this way, we implement a filter that selects the predictions that are most likely to satisfy the user, taking into account each prediction's reliability. This set of predictions determines the most accurate recommendations.

The results show a strong improvement in the quality of predictions. They also show an overall improvement in the recommendation quality: results from the three tested datasets show the improvement of the F1 measure. Most of the F1 improvement is due to its Recall component, since the higher the number of relevant ratings a user has issued, the greater the possibilities of the proposed method to select the most reliable predictions. On the other hand, as expected, when a large number of items is recommended, the Recall improvement decreases, since the most reliable and accurate items run out. For the above reasons, the experiments also show a recommendation quality improvement as the dataset size increases. Since the proposed deep learning architecture only uses ratings as input data, it can be seen as a collaborative filtering black box that can be enhanced by applying hybrid approaches: demographic, social, context-aware, content-based, etc. Finally, as future work, we propose (1) improving the proposed architecture results by enhancing its reliability extraction stage, and (2) adding a wide learning component to the neural architecture, providing memorization by catching simple relations from multiple features.

Author Contributions: Authors J.B., S.A. and A.H. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Dong, X.; Yu, L.; Wu, Z.; Sun, Y.; Yuan, L.; Zhang, F. A Hybrid Collaborative Filtering Model with Deep Structure for Recommender Systems. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 1309–1315. Available online: https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14676 (accessed on 12 February 2017).
2. Pazzani, M. A Framework for Collaborative, Content-Based and Demographic Filtering. Artif. Intell. Rev. 1999, 13, 393–408. [CrossRef]
3. Ebesu, T.; Fang, Y. Neural Citation Network for Context-Aware Citation Recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 1093–1096. [CrossRef]
4. Musto, C.; Greco, C.; Suglia, A.; Semeraro, G. Askme Any Rating: A Content-Based Recommender System Based on Recurrent Neural Networks. In Proceedings of the IIR, Venezia, Italy, 7–10 May 2016.
5. Wang, X.; He, X.; Nie, L.; Chua, T.-S. Item Silk Road. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 185–194. [CrossRef]
6. Yang, C.; Bai, L.; Zhang, C.; Yuan, Q.; Han, J. Bridging Collaborative Filtering and Semi-Supervised Learning. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 20–24 August 2017; pp. 1245–1254. [CrossRef]
7. Yi, B.; Shen, X.; Zhang, Z.; Shu, J.; Liu, H. Expanded Recommendation Framework and Its Application in Movie Recommendation. In Proceedings of the 2016 10th International Conference on Software, Knowledge, Information Management & Applications (SKIMA), Chengdu, China, 15–17 December 2016; pp. 298–303. [CrossRef]
8. Chu, W.-T.; Tsai, Y.-L. A hybrid recommendation system considering visual information for predicting favorite restaurants. World Wide Web 2017, 20, 1313–1331. [CrossRef]
9. He, R.; McAuley, J. Ups and Downs. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 507–517. [CrossRef]
10. Hsieh, C.-K.; Yang, L.; Wei, H.; Naaman, M.; Estrin, D. Immersive Recommendation. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 51–62. [CrossRef]
11. Li, K.; Zhou, X.; Lin, F.; Zeng, W.; Alterovitz, G. Deep Probabilistic Matrix Factorization Framework for Online Collaborative Filtering. IEEE Access 2019, 7, 56117–56128. [CrossRef]

12. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep Learning Based Recommender System. ACM Comput. Surv. 2019, 52, 1–38. [CrossRef]
13. Mu, R.; Zeng, X.; Han, L. A Survey of Recommender Systems Based on Deep Learning. IEEE Access 2018, 6, 69009–69022. [CrossRef]
14. Wang, X.; Yu, L.; Ren, K.; Tao, G.; Zhang, W.; Yu, Y.; Wang, J. Dynamic Attention Deep Model for Article Recommendation by Learning Human Editors' Demonstration. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 20–24 August 2017; pp. 2051–2059. [CrossRef]
15. Liang, D.; Zhan, M.; Ellis, D.P. Content Aware Collaborative Music Recommendation Using Pre-Trained Neural Networks. In Proceedings of the 16th International Society for Music Information Retrieval Conference, ISMIR, Málaga, Spain, 26–30 October 2015; pp. 295–301. [CrossRef]
16. Wang, X.; Wang, Y. Improving Content-Based and Hybrid Music Recommendation Using Deep Learning. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 627–636. [CrossRef]
17. Chen, J.; Zhang, H.; He, X.; Nie, L.; Liu, W.; Chua, T.-S. Attentive Collaborative Filtering. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 335–344. [CrossRef]
18. Chen, X.; Zhang, Y.; Ai, Q.; Xu, H.; Yan, J.; Qin, Z. Personalized Key Frame Recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 315–324. [CrossRef]
19. Tang, J.; Wang, K. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Los Angeles, CA, USA, 5–9 February 2018; pp. 565–573. [CrossRef]
20. Suglia, A.; Greco, C.; Musto, C.; De Gemmis, M.; Lops, P.; Semeraro, G. A Deep Architecture for Content-Based Recommendations Exploiting Recurrent Neural Networks. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, Bratislava, Slovakia, 9–12 July 2017; pp. 202–211. [CrossRef]
21. Alashkar, T.; Jiang, S.; Wang, S.; Fu, Y. Examples-Rules Guided Deep Neural Network for Makeup Recommendation. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 941–947.
22. Zhang, S.; Yao, L.; Xu, X. AutoSVD++. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 957–960. [CrossRef]
23. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Toronto, ON, Canada, 19–25 August 2017; pp. 1725–1731.
24. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.-S. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; pp. 173–182. [CrossRef]
25. Dziugaite, G.K.; Roy, D.M. Neural Network Matrix Factorization. arXiv 2015, arXiv:1511.06443.
26. Cheng, H.-T.; Ispir, M.; Anil, R.; Haque, Z.; Hong, L.; Jain, V.; Liu, X.; Shah, H.; Koc, L.; Harmsen, J.; et al. Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10. [CrossRef]
27. Yuan, W.; Wang, H.; Baofang, H.; Wang, L.; Wang, Q. Wide and Deep Model of Multi-Source Information-Aware Recommender System. IEEE Access 2018, 6, 49385–49398. [CrossRef]
28. Rendle, S. Factorization Machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, NSW, Australia, 13–17 December 2010; pp. 995–1000. [CrossRef]
29. Yin, R.; Li, K.; Zhang, G.; Lu, J. A deeper graph neural network for recommender systems. Knowl.-Based Syst. 2019, 185, 105020. [CrossRef]
30. Nassar, N.; Jafar, A.; Rahhal, Y. A novel deep multi-criteria collaborative filtering model for recommendation system. Knowl.-Based Syst. 2020, 187, 104811. [CrossRef]
31. Bobadilla, J.; Ortega, F.; Gutiérrez, A.; Alonso, S. Classification-based Deep Neural Network Architecture for Collaborative Filtering Recommender Systems. Int. J. Interact. Multimed. Artif. Intell. 2020, 6, 68. [CrossRef]

32. Zhu, B.; Ortega, F.; Bobadilla, J.; Gutierrez, A. Assigning reliability values to recommendations using matrix factorization. J. Comput. Sci. 2018, 26, 165–177. [CrossRef]
33. Harper, F.; Konstan, J. The MovieLens Datasets. ACM Trans. Interact. Intell. Syst. 2015, 5, 1–19. [CrossRef]
34. Golbeck, J.; Hendler, J. FilmTrust: Movie Recommendations Using Trust in Web-Based Social Networks. In Proceedings of the 3rd IEEE Consumer Communications and Networking Conference, Las Vegas, NV, USA, 8–10 January 2006; Volume 1, pp. 282–286. [CrossRef]
35. Ortega, F.; Zhu, B.; Bobadilla, J.; Hernando, A. CF4J: Collaborative filtering for Java. Knowl.-Based Syst. 2018, 152, 94–99. [CrossRef]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).