A Spatiotemporal Dilated Convolutional Generative Network for Point-Of-Interest Recommendation

International Journal of Geo-Information


Chunyang Liu 1, Jiping Liu 2,*, Shenghua Xu 2, Jian Wang 3, Chao Liu 4, Tianyang Chen 5 and Tao Jiang 2

1 School of Environment Science and Spatial Informatics, China University of Mining and Technology (CUMT), Xuzhou 221116, China; [email protected]
2 Chinese Academy of Surveying and Mapping, Beijing 100830, China; [email protected] (S.X.); [email protected] (T.J.)
3 School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; [email protected]
4 School of Geodesy and Geomatics, Anhui University of Science and Technology, Huainan 232001, China; [email protected]
5 Department of Geography and Earth Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA; [email protected]
* Correspondence: [email protected]; Tel.: +86-138-0105-1915

Received: 18 January 2020; Accepted: 18 February 2020; Published: 19 February 2020

Abstract: With the growing popularity of location-based social media applications, point-of-interest (POI) recommendation has become important in recent years. Several techniques, especially collaborative filtering (CF), Markov chain (MC), and recurrent neural network (RNN) based methods, have recently been proposed for POI recommendation. However, CF-based and MC-based methods are ineffective at representing complicated interaction relations in historical check-in sequences. Although RNNs and their variants have been successfully employed in POI recommendation, they depend on a hidden state of the entire past and therefore cannot fully utilize parallel computation within a check-in sequence. To address these limitations, we propose a spatiotemporal dilated convolutional generative network (ST-DCGN) for POI recommendation. Firstly, inspired by Google DeepMind's WaveNet model, we introduce a simple but very effective dilated convolutional generative network as a solution to POI recommendation, which can efficiently model a user's complicated short- and long-range check-in sequences by using a stack of dilated causal convolution layers and a residual block structure. Then, we propose to acquire the user's spatial preference by modeling continuous geographical distances, and to capture the user's temporal preference by considering two types of periodic time patterns (i.e., hours in a day and days in a week). Moreover, we conducted an extensive performance evaluation using two large-scale real-world datasets, namely Foursquare and Instagram. Experimental results show that the proposed ST-DCGN model is well suited to POI recommendation problems and can effectively learn dependencies in and between check-in sequences. The proposed model attains state-of-the-art accuracy with less training time in the POI recommendation task.

Keywords: point-of-interest recommendation; dilated causal convolution; residual block; spatial preference; temporal preference

ISPRS Int. J. Geo-Inf. 2020, 9, 113; doi:10.3390/ijgi9020113; www.mdpi.com/journal/ijgi

1. Introduction

During the past few years, with the rapid growth of mobile devices and location-based social network (LBSN) services, these services have attracted many users to share their locations and experiences, and massive amounts of check-in data have accumulated. The huge volume of check-in data and contextual information has brought opportunities for researching human mobility behavior on a large scale [1,2]. Point-of-interest (POI) recommendation plays an important role in LBSNs because it can predict users' preferences, provide users with valuable suggestions, and assist them in making adequate decisions in their daily routines and trip planning [3,4]. Figure 1 illustrates an example of POI recommendation: given all users' check-in sequence data, the task is to predict the POI that a user will visit at a specific time point by mining the user's location preferences and movement patterns. This task is meaningful and important, as it not only helps users discover interesting locations and increases their engagement with location-based services, but also creates opportunities for LBSN service providers to increase their revenue through personalized advertising [5]. Therefore, research on POI recommendation has attracted widespread attention from academia and industry [6–9].

Figure 1. An example of point-of-interest recommendation.

UnlikeUnlike items items such such as as news, videos, andand musicmusic in in traditional traditional context-free context-free recommender recommender systems, systems, the theuser’s user’s history history check-in check-in data data implies implies the the interactions interactions between between a user a user and and POIs POIs in ain physical a physical world world [10 ]. [10].Thus, Thus, geospatial geospatial information, information, such such as geographical as geograph distance,ical distance, would would have have a significant a significant effect effect on user’s on user’sdaily daily activities activities and check-in and check-in behaviors. behaviors. For example,For example, people people prefer prefer to go to to go nearby to nearby malls malls or gyms or gymsbecause because such asuch decision a decision is more is time-e moreffi cienttime-efficient than attending than attending similar places similar in aplaces further in distance. a further As distance.per Tobler’s As per First Tobler’s Law of First Geography Law of [ 11Geography] that “Everything [11] that is“Everything related to everything is related else,to everything but near thingselse, butare near more things related are thanmore distant related things”, than distant adjacent things”, POIs adjacent are more POIs geographically are more geographically relevant than relevant distant thanPOIs. distant In the POIs. literature, In the spatial literature, influence spatial has beeninfluence mostly has modeled been mostly by utilizing modeled the distanceby utilizing between the distancetwo POIs; between moreover, two manyPOIs; existingmoreover, studies many have existing shown studies that there have is shown a strong that relationship there is a betweenstrong relationshipuser’s check-in between activities user’s and check-in geographical activities distances and geographical [12,13]. Besides, distances temporal [12,13]. 
context Besides, and temporal sequential contextrelations and are sequential also crucial relations factors are that also affect crucial human fact real-lifeors that check-in affect human activities real-life [7,14– check-in16] due toactivities the time [7,14–16]sensitivity due of theto the POI time recommendation. sensitivity of Forthe example,POI recommendation. people would repeatedlyFor example, go topeople the gym would after repeatedlywork on weekdays, go to the gym and theyafter couldwork alsoon weekdays, prefer to visit and cinemasthey could at nightalso prefer on weekends. to visit cinemas This also at reflectsnight onthe weekends. periodic characteristicsThis also reflects of users’the periodic check-in characteri behaviors,stics e.g., of users’ different check-in hours behaviors, in a day or e.g., diff erentdifferent days hoursin a week. in a day In addition,or different sequential days in a relations week. In ofaddition, the check-in sequential also need relations to be of considered. the check-in For also instance, need tomost be considered. people may For want instance, to find most a hotel people instead may of want a gym to afterfind a arriving hotel instead at the of airport. a gym Therefore, after arriving how atto the eff ectivelyairport. Therefore, capture user’s how short-to effectively and long-range capture user’s dependencies short- and from long-range a given check-independencies sequence from is aalso given an interestingcheck-in sequence problem is to also be investigation. an interesting However, problem how to tobe accurately investigation. predict However, user’s movement how to accuratelybehavior preferencepredict user’s according movement to complex behavior spatiotemporal preference contextual according features to complex and sequential spatiotemporal patterns contextualis still a challenging features and issue. sequential patterns is still a challenging issue. 
POI recommendation methods have been applied in numerous studies, most of which are based on collaborative filtering (CF) [17] and Markov chains (MC) [18]. However, traditional user-based CF methods, item-based CF methods, and matrix factorization (MF)-based CF methods find it difficult to handle long-range sequences and to incorporate various features effectively because they only learn linear or low-order interactions between features. Moreover, MC-based methods assume strong independence among different components and only utilize the last POI when modeling check-in sequences. Recently, deep learning-based methods, especially RNN-based methods, have been applied to POI recommendation and have been reported to be effective [10,13,19]. RNN-based methods outperform other POI recommendation methods since they can learn long-range dependencies effectively. Moreover, some studies integrate spatiotemporal contextual information into the RNN structure to enhance the performance of POI recommendation [20,21]. While RNNs and their variants have shown an impressive capability in modeling check-in sequences, these RNN-based methods depend on a hidden state of the entire past, so they cannot effectively utilize parallel computation within a check-in sequence or fully learn high-level interactions between features [22]. Consequently, these issues inevitably limit further performance improvements when RNNs are applied to POI recommendation.

To address the identified issues in existing studies, and inspired by the WaveNet model [23], we propose a spatiotemporal dilated convolutional generative network, or ST-DCGN for short, as a solution to POI recommendation. The framework of the proposed method is depicted in Figure 2. This model not only models complex long-range sequential relations to acquire the user's sequential preference, but also models continuous geographic movement and temporal periodic patterns to acquire the user's personalized spatiotemporal preference. In our experiments, we observe that our model outperforms state-of-the-art algorithms on two publicly available datasets, namely Foursquare [24] and Instagram [25]. In conclusion, our contributions are summarized as follows:

• We propose a novel POI recommendation framework based on the WaveNet model, in which a conditional generative model and dilated causal convolutions are used to enable much larger receptive fields and to model complex long-range check-in sequences. The framework not only achieves higher recommendation performance, but also appears to have lower model complexity than the identified state-of-the-art POI recommendation methods.
• Considering the importance of spatiotemporal contextual information, we acquire the user's personalized spatial preference by modeling continuous geographical distances, and capture the user's personalized temporal preference by modeling specific continuous time IDs, which integrate patterns at two time scales (i.e., hours in a day and days in a week).
• We conducted experiments to study the spatiotemporal characteristics of users' check-in behavior on two real-world datasets and compared ST-DCGN with seven baseline approaches to POI recommendation. Extensive experiments showed that ST-DCGN was effective and significantly outperformed state-of-the-art methods.

[Figure 2 (image not reproduced). Panels: LBSN data, data preprocessing, and check-in sequences; model training with Step 1, personalized spatiotemporal preference (continuous geographical distances, different time IDs), and Step 2, the spatiotemporal dilated convolutional generative network (a simple generative model, embedding layer, dilated causal convolution layers, final layer); experimental evaluation with Recall@k, F1-score@k, and NDCG@k.]

Figure 2. Framework of the proposed model for point-of-interest recommendation.

The rest of this paper is organized as follows: The existing related studies are briefly reviewed in Section 2. The details of our ST-DCGN method are presented in Section 3. Experiments and results of the proposed method are described in Section 4. Finally, conclusions and future work are given in Section 5.

2. Related Work

In this section, we review related work from two streams of methods: conventional and deep learning-based POI recommendation methods.

2.1. Conventional POI Recommendation Methods

POI recommendation has been widely investigated in the field of LBSNs. Most previous solutions learned user preferences for POIs using CF-based methods. User-based CF and item-based CF techniques are widely exploited for POI recommendation [6,7]. For example, Ye et al. [6] first proposed user-based and item-based approaches for POI recommendation using CF techniques, which assumed that similar users had similar tastes for locations and that users were interested in similar POIs. Furthermore, other researchers employed model-based CF techniques such as MF for POI recommendation in LBSNs [5,8,17,26], which searched for users' potential location preferences by factorizing a user-POI matrix into two low-rank matrices, each of which represented the latent factors of users or POIs.

Unlike traditional recommender systems, POI recommender systems need to consider geographical influence, temporal influence, sequential influence, and other characteristics (e.g., social relationships, reviews, and categories) [8,21]. Geographical influence has been proven to be a significant factor in POI recommendation [13], and many existing studies focus on integrating geographical information because of the well-known strong correlation between users' activities and geographical distance. Existing methods of modeling geographical influence mainly use several types of spatial distribution functions, such as the power law function, the multi-center Gaussian distribution, or kernel density estimation [17,26–28]. For example, Cheng et al. [17] observed that users tended to visit nearby POIs around several centers (i.e., the most popular POIs), and thus captured geographical influence by modeling the probability of a user's check-in at a location with a multi-center Gaussian model (MGM). In addition, Zhang et al. [28] captured personalized geographical influence using a kernel density estimation approach. Lian et al. [26] proposed the GeoMF model to incorporate geographical information into MF and used two-dimensional kernel density estimation to characterize geographical influence over distance. The results of these works demonstrated the effectiveness of incorporating spatial context in POI recommendation.

Temporal influence has also been shown by recent studies to be effective for modeling users' check-in behavior [5,7,14]. For example, Yuan et al. [7] argued that users' visiting preferences for some locations exhibited time periodicity. Thus, they split time into hourly slots and proposed a time-aware POI recommendation method. Gao et al. [14] proposed four temporal aggregation strategies to integrate a user's check-in preferences across different temporal states. Furthermore, some studies focus on applying content information, such as social information and other characteristics in LBSNs, to POI recommendation. For example, Li et al. [29] presented a unified POI recommendation approach that exploited geographical, social, and categorical associations between users and POIs. Yang et al. [30] considered both check-ins and comments on venues in location recommendation and proposed a fusion framework to obtain a unified preference model from both check-ins and tips. However, most of these approaches fail to model complicated relations in check-in sequence data.

In addition to traditional CF methods, sequential methods have been considered for POI recommendation, and they mostly rely on Markov chains. Mathew et al. [31] proposed a hybrid approach based on hidden Markov models (HMMs), which clusters location histories according to their characteristics and then trains an HMM for each cluster. Cheng et al. [18] proposed a matrix factorization model, namely FPMC-LR, that includes both a personalized Markov chain and localized regions to solve the POI recommendation task. However, the strong Markov assumption underlying these methods makes it difficult to capture more effective relationships among different components.

2.2. Deep Learning-Based POI Recommendation Methods

Deep learning has been widely applied in many research fields, such as computer vision [32,33], natural language processing [34,35], and speech recognition [36,37]. Many deep learning techniques have also recently been applied to POI recommendation systems, which may change the architectures of traditional recommenders and bring new opportunities to improve recommendation accuracy [38]. For example, a few previous works utilized Word2vec [39] to model human mobility behavior [40,41]. Recently, RNN-based methods have gained remarkable attention and become more powerful in modeling a user's sequential history and transitions. For example, Liu et al. [19] first brought RNNs to next-location prediction, employing a spatial temporal recurrent neural network (ST-RNN) to model local temporal and spatial contexts in each layer, with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances. Kong et al. [42] built a hierarchical spatial-temporal long short-term memory (HST-LSTM) model, which combined spatial-temporal influence into LSTM to mitigate the problem of data sparsity. Zhao et al. [20] proposed an ST-LSTM network for next POI recommendation, which modeled spatiotemporal intervals between check-ins under the LSTM architecture to learn users' visiting behavior. Cui et al. [13] proposed the Distance2Pre network for next POI prediction, which mines spatial preference by modeling the correlation between a user and distance. Moreover, some researchers have integrated attention models into RNNs and achieved better performance. For example, Huang et al. [10] developed an attention-based spatiotemporal LSTM (ATST-LSTM) network for next POI recommendation, which used spatiotemporal contextual information to focus selectively on the relevant historical check-in records in a check-in sequence. Feng et al. [43] proposed an attentional mobility model, namely DeepMove, which predicted human mobility from lengthy and sparse trajectories.

However, the above RNN-based methods depend on a hidden state of the entire past and therefore cannot effectively utilize parallel computing within a check-in sequence. This also limits the speed of the model's training and evaluation [22]. By contrast, a structure based on a convolutional neural network (CNN) does not depend on the computation of each preceding time step in the sequence history, but little work exists on POI recommendation using a CNN structure. Wang et al. [44] proposed a CNN-based visual content enhanced POI recommendation (VPOI) model, which incorporated visual content into a probabilistic model for learning user and POI latent features, but they only used the CNN framework to extract features from images. Furthermore, Tang et al. [45] proposed a convolutional sequence embedding recommendation model that treats recent actions as an "image" over time and latent dimensions and learns sequential patterns using convolutional filters. It abandoned RNN structures and demonstrated that this CNN-based recommender can achieve superior performance to the popular RNN model in the top-N sequential recommendation task. Yuan et al. [22] proposed a simple, efficient, and highly effective convolutional generative network for next-item recommendation, which was capable of learning high-level representations from both short- and long-range item dependencies. However, these two sequence recommendation methods do not consider spatiotemporal contextual information, and they are not specialized solutions to POI recommendation.

Unlike existing studies, our work incorporates geographical influence and temporal influence in a personalized way into a spatiotemporal dilated convolutional generative network to capture the user's sequential preference and spatiotemporal preference.
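To make the contrast with recurrent models concrete, the following minimal sketch shows a one-dimensional dilated causal convolution in plain Python. It is an illustration only, not the ST-DCGN implementation: the function names, the kernel size of 2, and the dilation schedule 1, 2, 4, 8 are assumptions chosen for the example. Each output position is computed from the inputs alone, with no dependence on a previously computed hidden state, which is why such layers can be evaluated in parallel across a sequence.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """1D causal convolution: out[t] uses only x[t], x[t - d], x[t - 2d], ...

    Positions before the start of the sequence are treated as zeros
    (equivalent to left zero-padding), so no future value is ever read.
    """
    k = len(weights)
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - (k - 1 - j) * dilation  # last weight aligns with step t
            if idx >= 0:
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps (current one included) seen after stacking layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Illustrative values only: four stacked layers with kernel size 2.
print(causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
print(receptive_field(2, [1, 2, 4, 8]))
```

With this hypothetical schedule, the receptive field grows exponentially with depth while the number of weights grows only linearly, which is the property that makes stacked dilated layers attractive for long check-in sequences.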

3. Proposed Method

In this section, we first formulate the problem of POI recommendation and then describe our approach to obtaining personalized spatiotemporal preference and the components of ST-DCGN, which include personalized spatiotemporal preference processing, a simple generative model under spatiotemporal conditions, an embedding layer, dilated causal convolution layers, and a final layer.

3.1. Problem Formulation

Let $U = \{u_1, u_2, \cdots, u_m\}$ and $X = \{x_1, x_2, \cdots, x_n\}$ be the sets of $m$ users and $n$ POIs, respectively. Each POI has a unique identifier and geographical coordinates (latitude and longitude). For user $u$, the check-in sequence consists of that user's historical check-ins arranged in chronological order, denoted by $X^u = \{x_1^u, x_2^u, \cdots, x_T^u\}$. Given each user's check-in sequence $X^u$, the goal of POI recommendation is to predict the most likely POI $x_{T+1}$ that user $u$ will visit at the next time point $T+1$.

3.2. Personalized Spatiotemporal Preference

In this part, we model check-in sequences and capture personalized spatiotemporal preference by considering geographical influences and temporal periodic patterns. Recent studies show that continuous geographic movement and temporal periodic patterns are important for POI recommendation [10,13,16,19].

3.2.1. Personalized Spatial Preference

Previous works show that the power law distribution and the multi-center Gaussian distribution can represent geographical information by using the users' overall historical check-in records [7,17]. Although they reflect geographical differences in users' check-in behavior, they ignore personalized differences between users. To better model a user's personalized check-in behavior, we use the geographical distances between the user's consecutive check-ins to model the personalized spatial preference. More specifically, we calculate the distance between every two successive POIs in each user's check-ins and map these distances to discrete bins. For example, as shown in Figure 3, $\Delta s_1$ is mapped to the interval $\Delta d$ to $2\Delta d$, and $\Delta s_3$ is mapped to the interval $2\Delta d$ to $3\Delta d$; every other distance value can be mapped to a specific interval in the same way. In our scheme, we need to define one value $\Delta d$ to represent the width of the discrete bins; we discuss the effects of this parameter setting in the experiments.

Figure 3. User's check-in behavior and its spatiotemporal context division.
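The distance binning described in Section 3.2.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the text does not say how geographical distance is computed, so the haversine great-circle formula is assumed here, and the function names, the kilometre unit, and the zero-based half-open bin convention are hypothetical choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_bins(coords, delta_d_km):
    """Map the distance between each pair of successive check-ins to a bin index.

    Bin b covers the half-open interval [b * delta_d, (b + 1) * delta_d),
    so a distance between delta_d and 2 * delta_d falls into bin 1.
    """
    return [int(haversine_km(a, b) // delta_d_km)
            for a, b in zip(coords, coords[1:])]


# Hypothetical check-in coordinates (lat, lon) for one user, in visit order.
checkins = [(0.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
print(distance_bins(checkins, delta_d_km=50.0))
```

A sequence of T check-ins yields T - 1 distances; how the first position is filled when building the fixed-length distance sequence is not specified in this section, so the sketch leaves that step out.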

We transform each user's check-in sequence $X^u = \{x_1^u, x_2^u, \cdots, x_T^u\}$ into a fixed-length sequence $E_X^u = \{x_1^u, x_2^u, \cdots, x_k^u\}$, where $k$ represents the maximum length that we consider. If the sequence length is greater than $k$, we only consider the most recent $k$ check-in records. If the sequence length is less than $k$, we add padding items to the left until the length becomes $k$. We can therefore further obtain the fixed-length continuous geographic distance sequence $E_S^u = \{r_1^u, r_2^u, \cdots, r_k^u\}$, and the continuous geographic distance matrix for all $m$ users is as follows:

$$
E_S = \begin{bmatrix}
r_1^1 & r_2^1 & \cdots & r_k^1 \\
r_1^2 & r_2^2 & \cdots & r_k^2 \\
\vdots & \vdots & \ddots & \vdots \\
r_1^m & r_2^m & \cdots & r_k^m
\end{bmatrix}
\tag{1}
$$

3.2.2. Personalized Temporal Preference

3.2.2.Previous Personalized works Temporal have shown Preference that users’ check-in behavior exhibits periodic characteristics [7,16]. For example, users tend to check in around the gym from 18:00 to 20:00 on Tuesday and Thursday evenings,Previous but prefer works to have go to shown the market that users’ for shopping check-in onbehavior Saturday exhibits from periodic 15:00 to 17:00.characteristics Therefore, [7,16]. we canFor divide example, the users time periodictend to check pattern in intoaround two the scales: gym Di fromfferent 18:00 hours to 20:00 in a day on andTuesday different and daysThursday in a week.evenings, To capture but prefer two to periodic go to the patterns market offor users’ shopping check-in on Saturday behaviors, from we 15:00 introduce to 17:00. a two-slice Therefore, time we n o indexingcan divide scheme the time [16]. periodic As shown pattern in Figure into3 two, we scales firstly: obtainDifferent the hours timestamp in a day sequence and differentTu, Tu ,days, Tinu a 1 2 ··· k week. To capture two periodic patterns of users’n u ucheck-inuo behaviors, we introduce a two-sliceu time corresponding to the user’s check-in sequence x1, x2, , x , and then divide each timestamp Ti into indexing scheme [16]. As shown in Figure 3, we firstly··· obtaink the timestamp sequence 𝑇 ,𝑇 ,⋯,𝑇 the specific time interval of a week and a day. To be specific, a timestamp is divided into two slices in corresponding to the user’s check-in sequence 𝑥 ,𝑥 ,⋯,𝑥 , and then divide each timestamp 𝑇 terms of day of week, and hour slot. Furthermore, we split a week into seven days (i.e., Sunday to into the specific time interval of a week and a day. To be specific, a timestamp is divided into two Saturday) and a day into 24 h (i.e., 1 to 24). Then, we use 3 bits to denote the day in one week and 5 bits slices in terms of day of week, and hour slot. Furthermore, we split a week into seven days (i.e., to define the hour in one day. 
Finally, we convert the binary code into a unique decimal digit as the Sunday to Saturday) and a day into 24 h (i.e., 1 to 24). Then, we use 3 bits to denote the day in one time ID. In this time indexing scheme, we can obtain T=7 24 = 168 time slices. Figure4 demonstrates week and 5 bits to define the hour in one day. Finally,× we convert the binary code into a unique the procedure of encoding an exemplary time stamp, “2016-08-29 23:29:12”. Therefore, we can further decimal digit as the time ID. In this time indexing scheme,n we can obtaino 𝑇=7× 24 = 168 time slices. obtain fixed-length continuous time ID sequences Eu = tu, tu, , tu , and the continuous time ID Figure 4 demonstrates the procedure of encoding anT exemplary1 2 ··· timek stamp, “2016-08-29 23:29:12”. matrix for all m users are provided as follows. Therefore, we can further obtain fixed-length continuous time ID sequences𝑬 = 𝑡 ,𝑡 ,⋯,𝑡 , and the continuous time ID matrix for all m users 1are provided1 as1 follows.  t t t   1 2 ··· k   t2tt11t2  t 12  E =  1 122 ··· k  (2) T  22 2  tt t E  =············12 k  T tmtm tm  (2) 12 ··· k mm m tt12 tk
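The two preprocessing steps of this section, truncating or left-padding a sequence to length $k$ and mapping a timestamp to a time ID, can be illustrated with a minimal Python sketch. Several details are assumptions rather than the exact scheme of Figure 4: the padding value `PAD`, the zero-based day and hour indices, and the choice to place the 3 day bits above the 5 hour bits. The function names are hypothetical, and the resulting ID for the example timestamp may therefore differ from the one shown in the figure.

```python
from datetime import datetime

PAD = 0  # hypothetical padding ID; the text does not name its padding item


def to_fixed_length(seq, k, pad=PAD):
    """Keep the most recent k items, or left-pad with `pad` up to length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(seq) >= k:
        return list(seq[-k:])
    return [pad] * (k - len(seq)) + list(seq)


def time_id(ts):
    """Encode a timestamp as an 8-bit code: 3 bits for the day, 5 for the hour."""
    day = (ts.weekday() + 1) % 7   # Sunday=0 ... Saturday=6 (assumed zero-based)
    hour = ts.hour                 # 0 ... 23 (assumed zero-based hour slot)
    return (day << 5) | hour       # day bits are assumed to be the high bits


ts = datetime.strptime("2016-08-29 23:29:12", "%Y-%m-%d %H:%M:%S")
print(time_id(ts))
print(to_fixed_length([7, 8, 9], k=5))
```

Under these assumptions every (day, hour) pair receives a distinct ID, so exactly 168 different values occur, although they are not contiguous integers.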

Figure 4. Time indexing scheme demonstration.

3.3. A Generative Model under Spatiotemporal Conditions

In this section, we introduce a novel generative model that is operated directly on the user's check-in sequence. The solution proposed here is inspired by the idea of WaveNet [23], a generative model for raw audio based on the PixelCNN [46] architecture. WaveNet provides a generic and flexible framework for tackling many applications that rely on audio generation (e.g., text-to-speech, music, speech enhancement, voice conversion, source separation). Similarly, we consider a user's history check-in sequence $E_X^u = \{x_1^u, x_2^u, \cdots, x_k^u\}$, given a model with parameter $\theta$. We aim to output the next value $\hat{x}_{k+1}^u$ conditional on the check-in sequence history. Let $p(E_X^u \mid \theta)$ be the joint probability of check-in sequence $\{x_1^u, x_2^u, \cdots, x_k^u\}$; moreover, we can factorize $p(E_X^u \mid \theta)$ as a product of conditional probabilities by chain rule as follows:

$$p(E_X^u \mid \theta) = \prod_{i=1}^{k} p(x_{i+1}^u \mid x_1^u, \cdots, x_i^u, \theta) \tag{3}$$

where the POI sample $x_{k+1}^u$ is therefore conditioned on the samples of all the previous POIs $\{x_1^u, x_2^u, \cdots, x_k^u\}$.

As mentioned, we considered the spatial and temporal contextual information in the POI recommendation. Therefore, we also consider continuous geographic distance sequences $E_S^u = \{r_1^u, r_2^u, \cdots, r_k^u\}$ and continuous time ID sequences $E_T^u = \{t_1^u, t_2^u, \cdots, t_k^u\}$ as conditional inputs, when predicting the user's check-in sequence $E_X^u = \{x_1^u, x_2^u, \cdots, x_k^u\}$. Further, we can model the conditional distribution $p(E_X^u \mid \theta)$ of the check-in sequence given these inputs. Equation (3) now becomes

$$p(E_X^u \mid \theta) = \prod_{i=1}^{k} p(x_{i+1}^u \mid x_1^u, \cdots, x_i^u, r_1^u, r_2^u, \cdots, r_i^u, t_1^u, t_2^u, \cdots, t_i^u, \theta) \tag{4}$$

where the conditional probability distribution is modelled by using stacked layers of dilated convolutions, which we will describe later.
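To make the factorization in Equation (4) concrete, the following minimal sketch enumerates the (context, target) pairs that it implies for one observed sequence and evaluates a log-likelihood as a sum of log conditional probabilities. The function names are hypothetical, and only targets that lie inside the observed sequence are enumerated; the conditional probabilities themselves would come from the network and are simply passed in here.

```python
import math


def conditional_examples(x, r, t):
    """Yield (context, target) pairs following the factorization in Equation (4).

    x, r and t are equal-length lists of POI IDs, distances and time IDs.
    """
    if not (len(x) == len(r) == len(t)):
        raise ValueError("x, r and t must have the same length")
    for i in range(1, len(x)):
        context = (x[:i], r[:i], t[:i])
        yield context, x[i]


def sequence_log_likelihood(cond_probs):
    """Sum of log conditional probabilities, i.e., the log of their product."""
    return sum(math.log(p) for p in cond_probs)


pairs = list(conditional_examples([10, 11, 12], [0.0, 1.2, 0.4], [55, 56, 60]))
for context, target in pairs:
    print(context, "->", target)
```

Because every context is a prefix of the same known sequence, all pairs can be scored in a single pass at training time.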

3.4.3.4. Embedding Embedding Look-Up Look-Up Layer Layer Given a user’s continuous check-in sequence, the model retrieves each of the first k POIs Given a user’s continuous check-in sequence, the model retrieves each of the first k POIs 𝑬 = u n u u uo E𝑥,𝑥=,⋯,𝑥x , x , via, x a look-upvia a look-up table, and table, stacks and these stacks POI these embeddings POI embeddings together. together.Similarly, Similarly,we deal with we X 1 2 ··· k n o 𝑬 = 𝑟,𝑟,⋯,𝑟u = u u u thedeal user’s with thecontinuous user’scontinuous geographicgeographic distance sequences distance sequences ES r1 ,andr2, time, rk IDand sequence time ID n o ··· 𝑬sequence = 𝑡 ,𝑡E,⋯,𝑡u = tu , simultaneously.tu, , tu simultaneously. Assuming the Assuming embedding the embeddingdimension is dimension 2d, where d is can 2d, be where set asd T 1 2 ··· k thecan number be set as of the inner number channels of inner in channelsthe convolutiona in the convolutionall network, we network, create three we create embedding three embedding matrices 𝑬 ∈ℝ×u, 𝑬 k∈ℝ2d ×u , andk 2 d 𝑬 ∈ℝ×u fork 2 d POIs, geographic distances, and time IDs, matrices E0 R × , E0 R × , and E0 R × for POIs, geographic distances, and time IDs, X ∈ S ∈ T ∈ respectively.respectively. Inspired Inspired by by previous previous work work [22], [22], ou ourr proposed proposed method method will will learn learn the the embedding embedding layer layer through one-dimensional convolution filters. To be specific, the 2D matrix (i.e., 𝑬 u, 𝑬uand 𝑬u) is through one-dimensional convolution filters. To be specific, the 2D matrix (i.e., EX0 , ES0 and ET0 ) is reshapedreshaped from k𝑘×2𝑑2d to to a 1a 1×𝑘×2𝑑k 2d three-dimensional three-dimensional tensor. tensor. Figure Figure5 illustrates 5 illustrates the reshaping the reshaping process. process. × × ×

Figure 5. Transformation from the standard 2D filter to the 1D 2-dilated filter.

3.5. Dilated Causal Convolutions Layer

The traditional convolution operation has several obvious drawbacks for sequence prediction problems, e.g., (1) some sequential information will be lost during the pooling process; (2) a simple standard causal convolution is only able to increase the receptive field with size linear in the depth of the network. This makes it challenging to handle long-range dependence of the check-in history sequence, as shown in Figure 6. Therefore, inspired by early work on speech modeling [23], our solution here is to construct the proposed generative model by using a dilated causal convolution algorithm enabling an exponentially large receptive field. Figure 7 depicts a dilated causal convolution with filter size $g = 3$ and dilation factors $l = 1, 2, 4, 8$. We can see that a dilated convolution is a convolution where the filter is applied over an area larger than its length by skipping input values with a certain step. It is equivalent to a convolution with a larger filter derived from the original filter by dilating it with zeros, but is significantly more efficient since it utilizes fewer parameters. Thus, the dilated convolutional operation can better handle users' long-term check-in sequences without using more network layers.
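The difference between linear and exponential growth of the receptive field can be checked with a few lines of arithmetic. The sketch below relies on the standard observation that a causal layer with filter size $g$ and dilation $l$ extends the receptive field by $(g - 1) \cdot l$ positions; the function name is hypothetical.

```python
def receptive_field(filter_size, dilations):
    """Receptive field of stacked causal convolutions with the given dilations."""
    return 1 + (filter_size - 1) * sum(dilations)


g = 3
print(receptive_field(g, [1, 1, 1, 1]))              # four standard causal layers
print(receptive_field(g, [1, 2, 4, 8]))              # four dilated layers
print(receptive_field(g, [1, 2, 4, 8, 1, 2, 4, 8]))  # the dilation pattern repeated twice
```

With $g = 3$, four standard causal layers cover 9 positions, whereas four dilated layers with $l = 1, 2, 4, 8$ cover 31, and repeating the pattern covers 61.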

Figure 6. A figure showing a stack of causal convolutional layers.

Figure 7. The proposed generative architecture with dilated causal convolutional network.

In addition, at training time, the conditional probabilities for all timesteps can be calculated in parallel because all timesteps of check-in sequences are known. Note that RNN-based models, which depend on a hidden state of the entire check-in history, cannot fully utilize such a parallel mechanism. As a result, the computational advantage of CNN models makes them preferable for POI recommendation systems.

More formally, given a one-dimensional sequence input $X \in \mathbb{R}^k$ and a filter $f: \{0, 1, \cdots, g-1\} \rightarrow \mathbb{R}$, the one-dimensional dilated convolution $F$ on element $s$ of the sequence is defined as

$$F(s) = (X *_l f)(s) = \sum_{i=0}^{g-1} f(i) \cdot X_{s - l \cdot i} \tag{5}$$

where $f$ is the filter function, $g$ is the filter size, $l$ is the dilation factor, and $s - l \cdot i$ accounts for the direction of the past. Clearly, the dilated causal convolution algorithm can better capture long-term check-in sequence dependencies without using more network layers and larger filters. In practice, to further increase the receptive fields and model capacity, we just need to repeat the dilated convolution structure in Figure 7 by stacking (e.g., 1, 2, 4, 8, 1, 2, 4, 8).

As discussed in [22], in order to learn higher-level feature representations from long-range sequence dependencies, an intuitive method is to increase the number of layers in our network. However, in practice, it also easily results in the degradation problem, which makes the training process much harder. To solve this problem, we introduce residual connections [33,47] in our method. As shown in Figures 7 and 8b, a residual block contains two branches. One branch converts the input layer $E$ to $F$ through a series of network layers, including the dilated causal convolution with layer normalization [48], activation (e.g., ReLU [49]), and $1 \times 1$ convolution in a specific order. The other branch is a direct projection of the input $E$. The residual mapping $F(E)$ can be computed as follows:

$$E_1 = W_1(\mathrm{ReLU}(\varphi(E))) + b_1 \tag{6}$$

$$E_2 = W_2(\mathrm{ReLU}(\varphi(E_1))) + b_2 \tag{7}$$

$$F(E) = W_3(\mathrm{ReLU}(\varphi(E_2))) + b_3 \tag{8}$$

where $\varphi$ denotes the layer normalization. $W_1$, $W_2$, $W_3$, $b_1$, $b_2$, and $b_3$ are a set of weights and biases for the residual block. Specifically, $W_2$ denotes the dilated causal convolution weight function with filter size $g = 3$ and dilation factors $l = 1, 2, 4, 8$. $W_1$ and $W_3$ denote standard $1 \times 1$ convolution weight functions.
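Equations (5) to (8) can be sketched in plain Python without any deep learning library. This is an illustrative sketch, not the authors' TensorFlow implementation: the zero left-padding for positions before the start of the sequence, the channel sizes ($2d = 4$ outside the block and $d = 2$ inside), the random weights, and all function names are assumptions. The block output adds the input back element-wise, giving $F(E) + E$.

```python
import math
import random


def layer_norm(vec, eps=1e-5):
    """Normalise one position's channel vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def relu(vec):
    return [max(0.0, v) for v in vec]


def affine(weight, vec):
    """Multiply an (out x in) weight matrix with an in-sized vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def conv1x1(seq, weight, bias):
    """1 x 1 convolution: the same affine map applied at every position."""
    return [[o + b for o, b in zip(affine(weight, x), bias)] for x in seq]


def dilated_causal_conv(seq, taps, bias, dilation):
    """Equation (5) with channels: out[s] = bias + sum_i taps[i] @ seq[s - dilation * i].

    Positions before the start of the sequence count as zeros (assumed padding).
    """
    out = []
    for s in range(len(seq)):
        acc = list(bias)
        for i, weight in enumerate(taps):
            j = s - dilation * i
            if j >= 0:
                acc = [a + o for a, o in zip(acc, affine(weight, seq[j]))]
        out.append(acc)
    return out


def pre_activation(seq):
    """ReLU(phi(.)) applied position-wise, phi being layer normalisation."""
    return [relu(layer_norm(x)) for x in seq]


def residual_block(E, params, dilation):
    """Equations (6)-(8) followed by the element-wise addition F(E) + E."""
    E1 = conv1x1(pre_activation(E), params["W1"], params["b1"])
    E2 = dilated_causal_conv(pre_activation(E1), params["W2"], params["b2"], dilation)
    F = conv1x1(pre_activation(E2), params["W3"], params["b3"])
    return [[f + e for f, e in zip(f_row, e_row)] for f_row, e_row in zip(F, E)]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Usage with hypothetical sizes: sequence length 6, 2d = 4, d = 2, g = 3.
rng = random.Random(0)
params = {
    "W1": random_matrix(rng, 2, 4), "b1": [0.0, 0.0],
    "W2": [random_matrix(rng, 2, 2) for _ in range(3)], "b2": [0.0, 0.0],
    "W3": random_matrix(rng, 4, 2), "b3": [0.0] * 4,
}
E = random_matrix(rng, 6, 4)
out = residual_block(E, params, dilation=2)
```

Two properties are easy to verify with this sketch: with all weights and biases set to zero the block reduces to the identity mapping, and changing the last input position never alters earlier outputs, which is the causality property.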

Figure 8. Fusion input layer (a), dilated residual blocks (b), and final output layer (c).

The desired mapping is now recast into $F(E) + E$ by element-wise addition. This effectively allows layers to learn modifications to the identity mapping rather than the entire transformation, which has been proven beneficial in deeper networks by previous literature [22,33,47]. In our framework, we capture the geographical influences and temporal periodic patterns by modeling specific spatiotemporal information. Therefore, we need to integrate the continuous geographic distance sequence and the specific time ID sequence into our network. As shown in Figure 8a, the check-in sequence input $E_X^{\prime u}$ and the specific spatiotemporal conditions (i.e., $E_S^{\prime u}$ and $E_T^{\prime u}$) are fused through dilated causal convolutions and summed with the parametrized skip connections in the first layer. The result of the first layer is the input to the subsequent dilated convolution layer, with a residual connection from the input to the output of the convolution (see Figure 8b). Instead of the standard residual connection, we use a parametrized skip connection in the first layer, dynamically adjusting the weight parameters to ensure that our model correctly extracts the necessary relations between the forecast and both the check-in sequence input and the specific spatiotemporal conditions. The conditioning on the continuous geographic distance sequence $E_S^{\prime u}$ and the specific time ID sequence $E_T^{\prime u}$ is done by computing the activation function of the convolution in the first layer as:

$$E = \mathrm{ReLU}(w_x *_l E_X^{\prime u} + b_x) + \mathrm{ReLU}(w_r *_l E_S^{\prime u} + b_r) + \mathrm{ReLU}(w_t *_l E_T^{\prime u} + b_t) \tag{9}$$

where $w_x$, $w_r$, and $w_t$ are learnable convolution filters, $b_x$, $b_r$, and $b_t$ are the corresponding biases, $*_l$ denotes the convolution operator with dilation factor $l$, and $E$ denotes the result of multivariate sequence fusion.
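The fusion in Equation (9) can be illustrated on single-channel sequences. The sketch below is a simplification under stated assumptions: each input is a plain list of numbers rather than a $k \times 2d$ embedding matrix, positions before the start of the sequence are treated as zeros, and the function names are hypothetical.

```python
def causal_conv(seq, filt, bias, dilation=1):
    """Single-channel dilated causal convolution with assumed left zero-padding."""
    return [
        bias + sum(w * seq[s - dilation * i]
                   for i, w in enumerate(filt) if s - dilation * i >= 0)
        for s in range(len(seq))
    ]


def relu(seq):
    return [max(0.0, v) for v in seq]


def fuse(e_x, e_s, e_t, w_x, w_r, w_t, b_x=0.0, b_r=0.0, b_t=0.0, dilation=1):
    """Equation (9): sum of three ReLU-activated convolution branches."""
    branch_x = relu(causal_conv(e_x, w_x, b_x, dilation))
    branch_r = relu(causal_conv(e_s, w_r, b_r, dilation))
    branch_t = relu(causal_conv(e_t, w_t, b_t, dilation))
    return [x + r + t for x, r, t in zip(branch_x, branch_r, branch_t)]


fused = fuse([1.0, 2.0, 3.0], [-1.0, -1.0, -1.0], [0.5, 0.5, 0.5],
             w_x=[1.0, 1.0], w_r=[1.0], w_t=[2.0])
print(fused)
```

Because each branch passes through its own ReLU before the sum, a branch whose convolution output is negative at some position contributes nothing there, which is one way the learned filters can weight the three inputs differently.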

3.6. Final Layer and Network Training

As already mentioned, the matrix in the last layer of the dilated causal convolution architecture has the same dimensional size as the input embedding $E$ (i.e., $E \in \mathbb{R}^{k \times 2d}$), but the result we need should be a probability distribution over all POIs in the output sequence, from which the top-k POI recommendation list is generated. In such a view, we use a fully connected layer with weight matrix $W \in \mathbb{R}^{2d \times n}$. As mentioned, we aim to maximize the conditional likelihood (Equation (4)). Clearly, maximizing $\log p(E_X^u \mid \theta)$ is mathematically equivalent to minimizing the sum of the binary cross-entropy loss for each item in $\{x_1^u, x_2^u, \cdots, x_k^u\}$. Furthermore, we use a negative sampling strategy (e.g., sampled softmax [50]) to avoid the calculation of the full softmax distributions for network training.
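As an illustration of the final layer, the sketch below projects one $2d$-dimensional hidden vector onto $n$ POI scores, ranks the POIs for a top-k list, and computes a loss over the target plus a few sampled negatives instead of all $n$ POIs. It is a simplified stand-in, not the authors' training code: negatives are drawn uniformly, no sampling correction is applied, and all names and sizes are hypothetical.

```python
import math
import random


def logits_for(hidden, weight, candidates):
    """Scores h . W[:, j] for the candidate POI columns j of a 2d x n matrix."""
    return [sum(h * row[j] for h, row in zip(hidden, weight)) for j in candidates]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(hidden, weight, k):
    """Indices of the k POIs with the highest probability under the full softmax."""
    n = len(weight[0])
    probs = softmax(logits_for(hidden, weight, range(n)))
    return sorted(range(n), key=lambda j: probs[j], reverse=True)[:k]


def sampled_softmax_loss(hidden, weight, target, num_negatives, rng):
    """Cross-entropy over the target plus uniformly sampled negative POIs."""
    n = len(weight[0])
    negatives = rng.sample([j for j in range(n) if j != target], num_negatives)
    probs = softmax(logits_for(hidden, weight, [target] + negatives))
    return -math.log(probs[0])


W = [[1.0, 0.0, 2.0, 0.0],   # hypothetical 2d x n weights with 2d = 2, n = 4
     [0.0, 1.0, 0.0, 3.0]]
h = [1.0, 1.0]
print(top_k(h, W, 2))
print(sampled_softmax_loss(h, W, target=3, num_negatives=2, rng=random.Random(0)))
```

The cost of the sampled loss grows with the number of negatives rather than with the total number of POIs, which is the motivation for negative sampling given above.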

4. Experimental Results and Analysis

In this section, extensive experiments are conducted to compare our proposed ST-DCGN model with several state-of-the-art POI recommendation approaches. Firstly, two publicly accessible datasets are described and analyzed in detail. Then, baseline methods and evaluation metrics are introduced. Finally, experimental results are fully demonstrated, which include the recommendation performance and influence of hyper-parameters. In summary, our work attempts to answer the following research questions:

RQ1: Can our proposed method perform better than state-of-the-art baselines in accuracy for POI recommendation tasks?

RQ2: Does ST-DCGN outperform other deep neural networks (i.e., GRU, Distance2Pre, ST-RNN) in efficiency for POI recommendation tasks?

RQ3: How do the parameters affect our model performance, such as the embedding size, spatial window widths, and sequence length?

4.1. Datasets Description and Analysis

Our experiments were conducted on two publicly accessible LBSN check-in datasets. The first one is the Foursquare check-ins, which were collected in Tokyo City from April 2012 to February 2013 [24]. The second one is the Instagram check-ins, which were collected in New York City from June 2011 to November 2016 [25]. Both datasets provide sufficient richness of user check-ins. Each check-in contains user ID, POI ID, and timestamp. For both datasets, we removed POIs checked in by fewer than five users and users who have checked in at fewer than five POIs to reduce noise and alleviate data sparsity problems. Furthermore, we also removed check-in data without time stamps in the original Instagram dataset and extracted data from October 2015 to September 2016 as our experimental dataset. After pre-processing, statistics of the two datasets are shown in Table 1.

Similar to some previous work [12,18], we further analyzed the geographic influence and temporal periodic patterns of the two datasets. Figure 9 presents all users' check-in distribution in the two datasets, and we can find that the check-in distributions in the two datasets were significantly different. More specifically, for both datasets, the check-in distribution of users was concentrated in some hot areas, but the Foursquare check-in distribution was more scattered than that of Instagram, which may be due to the different distribution of hot spots. This phenomenon further revealed the spatial patterns across different cities. Moreover, we further investigated the geographical influence on users' successive check-in behavior. In order to more intuitively explain the impact of geographical distance on users' check-in behaviors, we calculated the cumulative distribution function (CDF) of geographical distance between any two check-ins and between two consecutive check-ins of the same user in the Foursquare and Instagram datasets, respectively, as shown in Figure 10a,b. The results in Figure 10a indicate that users' check-in behaviors have high geographic relevance, since the CDF curves for both datasets increase fast when the distance is small. This phenomenon is more apparent in Figure 10b because it considers the user's two consecutive check-ins. The above analysis suggests that it is necessary to consider the distance effect of continuous check-in behaviors in the POI recommendation algorithm. Thus, we attempted to utilize continuous geographical distance to capture users' personalized spatial preferences and movement patterns.
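The distance analysis described above can be sketched as follows. The text does not state which distance measure was used, so the haversine great-circle distance, the Earth radius constant, the availability of (latitude, longitude) coordinates for each POI, and all function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def consecutive_distances(points):
    """Distances between each check-in and the next one of the same user."""
    return [haversine_km(a, b) for a, b in zip(points, points[1:])]


def empirical_cdf(values, threshold):
    """Fraction of values that are <= threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Three hypothetical check-in locations of one user.
trace = [(35.6812, 139.7671), (35.6586, 139.7454), (35.7101, 139.8107)]
distances = consecutive_distances(trace)
print(distances, empirical_cdf(distances, 5.0))
```

Evaluating `empirical_cdf` over a grid of thresholds gives a curve of the kind plotted in Figure 10.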

Table 1. Basic statistics of Foursquare and Instagram dataset.

| Statistics | Foursquare | Instagram |
|---|---|---|
| #Users | 2293 | 16,889 |
| #POIs | 6870 | 3961 |
| #Check-ins | 385,914 | 278,735 |
| Avg. #check-ins per user | 168.3 | 16.5 |
| Avg. #visited POIs per user | 56.2 | 70.4 |
| Sparsity | 97.550% | 99.583% |
| Time span | April 2012–February 2013 | October 2015–September 2016 |
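Two of the derived rows in Table 1 can be recomputed from the raw counts. The sketch assumes that sparsity is defined as one minus the number of check-ins divided by the number of user and POI pairs; the text does not state the definition, but this assumption reproduces the reported percentages.

```python
def sparsity(num_users, num_pois, num_checkins):
    """Assumed definition: 1 - #check-ins / (#users * #POIs)."""
    return 1.0 - num_checkins / (num_users * num_pois)


datasets = {
    "Foursquare": (2293, 6870, 385914),
    "Instagram": (16889, 3961, 278735),
}
for name, (users, pois, checkins) in datasets.items():
    avg_per_user = round(checkins / users, 1)
    print(name, avg_per_user, "{:.3%}".format(sparsity(users, pois, checkins)))
```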

Figure 9. Distribution of all users' check-ins in the two datasets: (a) Foursquare (Tokyo); (b) Instagram (New York).

Figure 10. Geographical neighbor influence in users' check-in behaviors. (a) Cumulative distribution function of geographical distance between users' any two check-ins; (b) cumulative distribution function of geographical distance between users' consecutive check-ins.

We further explored two temporal periodic patterns of users' check-in behaviors. More specifically, for the two datasets, we compared users' check-in probabilities at different times in a day and different days in a week by calculating the check-in frequencies in the corresponding time slots, respectively, as shown in Figure 11. Based on the results in Figure 11, we found that the two datasets exhibited different temporal patterns, reflecting different living habits in different regions. More specifically, for the Foursquare dataset, Figure 11a shows that check-ins on weekdays were mainly concentrated between 8:00–9:00 and 19:00–20:00, while check-ins on weekends were mainly concentrated between 17:00–18:00, which also reflects the periodic characteristics of users' check-in behavior. For the Instagram dataset, the difference in check-in time pattern between weekdays and weekends was relatively small, but there were still differences in the check-in patterns at different time periods. In summary, there are significant time periodic characteristics in users' check-in behavior. Therefore, we attempted to use specific time ID coding to capture the users' personalized temporal preferences and periodic patterns.

Figure 11. Periodic pattern in users' check-in behaviors: (a) Foursquare; (b) Instagram.

4.2. Baseline Approaches

To evaluate the effectiveness of our proposed method, we compared ST-DCGN with the following representative baseline approaches for POI recommendation.

• Bayesian Personalized Ranking (BPR): This work presents the generic optimization criterion BPR-OPT derived from the maximum posterior estimator for optimal personalized ranking [51]. BPR is a classic baseline method for general POI recommendation.
• GRU: RNN is effective for the POI recommendation task, and we applied an extension of RNN called GRU for capturing the long-term dependency [52].
• FPMC-LR: A state-of-the-art Markov chain method for POI recommendation. This method is designed based on a first-order Markov chain and uses neighbors as negative samples [18].
• PRME-G: A state-of-the-art metric embedding method for POI recommendation, in which the spatial distance is considered as the weight [12].
• Caser: A state-of-the-art standard 2D CNN-based method for personalized top-N sequential recommendation [45], which we applied to POI recommendation.
• Distance2Pre: A state-of-the-art GRU-based model for POI prediction, which acquires the spatial preference by modeling distances between successive POIs [13].
• ST-RNN: A state-of-the-art RNN-based model for POI recommendation [19], which incorporates both local temporal and spatial transition context.

N () 1 Rkuu T R@k =  (10) NTu =1 u

N 2/⋅⋅()Rk() T k() Rk() T / T = 1 uu uuu F@1 k  (11) N = () + () u 1 ()Rkuu T// k() Rk uuu T T

k rel 1121N n − NDCG@ k = u=1 ()+ (12) NYu n=1 log2 k 1 where k indicates the number of POIs recommended to the user. We report R@k, F1@k, and NDCG@k with k = 5, 10, and 20 in our experiments. 𝑅𝑘 indicates the Top-k list recommended to the user. 𝑇 represents the number of POIs the user visited. 𝑟𝑒𝑙 indicates the relevance of the nth POI to the user. 𝑌 represents the maximum DCG value of user u. ISPRS Int. J. Geo-Inf. 2020, 9, 113 14 of 20

Additionally, all experiments were implemented in Python 3.5 and TensorFlow on one graphics processing unit (GPU), an NVIDIA GeForce RTX 2080Ti. For the Foursquare dataset, the learning rate and batch size were set to 0.001 and 30, respectively. For the Instagram dataset, the learning rate and batch size were set to 0.001 and 40, respectively. Inspired by previous studies [13,21,22], we evaluated the POI recommendation results using leave-one-out evaluation. More specifically, we used the last (i.e., next) POI of each check-in sequence as the test data and the remaining POIs as the training data. Furthermore, all baseline methods were reimplemented on the two datasets mentioned, and the relevant parameters were set according to the optimal configuration in the original papers.
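The leave-one-out protocol described above can be sketched as follows. This is an illustrative assumption about the data layout (a dictionary mapping each user to a chronologically ordered list of POI ids); the function name and the rule of skipping sequences shorter than two check-ins are hypothetical and not taken from the experiments.

```python
def leave_one_out_split(sequences):
    """Hold out the last POI of each check-in sequence as the test target.

    sequences: dict mapping a user id to a list of POI ids in visit order.
    Returns (train, test), where train[user] is the sequence without its
    last POI and test[user] is that last POI.
    """
    train, test = {}, {}
    for user, seq in sequences.items():
        if len(seq) < 2:
            # Assumption: a sequence needs at least one training check-in
            # and one test check-in, so shorter sequences are skipped.
            continue
        train[user] = seq[:-1]
        test[user] = seq[-1]
    return train, test

# Hypothetical check-in sequences.
checkins = {"u1": [3, 7, 7, 2], "u2": [5]}
train, test = leave_one_out_split(checkins)
print(train, test)
```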

4.4. Recommendation Performance

The performances of our proposed model ST-DCGN and the baselines on the Foursquare and Instagram datasets, evaluated by R@k, F1@k, and NDCG@k, are shown in Figures 12 and 13, respectively (RQ1). Our main findings are as follows:

(1) ST-DCGN outperformed all baseline approaches on the Foursquare and Instagram datasets, showing that ST-DCGN is effective for the POI recommendation task.
(2) Both BPR and GRU fell behind the other methods because they only model user–POI interactions without considering any contextual information about users' check-in behavior. Furthermore, it is worth noting that GRU did not always achieve better performance than BPR, especially on the Foursquare dataset. This result indicates that a good neural network architecture (i.e., RNN cell) alone is not enough to obtain excellent accuracy in the POI recommendation task, so more spatial and temporal context should be considered.
(3) In comparison to BPR and GRU, FPMC-LR and PRME-G incorporated geographical and sequential information, and they took advantage of different ranking-based optimization strategies. Therefore, their performance on the two datasets was clearly better, indicating that modeling spatial contexts is indeed useful for POI recommendation.
(4) Caser obtained much better performance than GRU, which demonstrates the advantage of using a CNN architecture. Although Caser does not integrate any spatiotemporal context information, it still outperformed FPMC-LR, since FPMC-LR only models the first-order Markov chain while Caser captures high-order relations.
(5) Distance2Pre performed clearly better than FPMC-LR and PRME-G due to its capability of modeling the user's sequential preference and spatial preference with an RNN architecture. ST-RNN achieved further improvement by incorporating temporal contextual information. These improvements indicate that neural networks with spatiotemporal contextual information can obtain very promising performance in the POI recommendation task.
(6) ST-DCGN greatly outperformed Distance2Pre and ST-RNN on both datasets. Compared with ST-RNN, ST-DCGN improved R@5, R@10, R@20, NDCG@5, NDCG@10, and NDCG@20 by 14.62%, 12.13%, 7.95%, 17.92%, 15.50%, and 13.39%, respectively, on the Foursquare dataset. For the Instagram dataset, the corresponding improvements were 11.03%, 22.29%, 26.91%, 7.18%, 14.40%, and 16.12%, respectively.

Figure 12. Performance comparison with state-of-the-art approaches on Foursquare: (a) Recall; (b) F1-score; (c) NDCG.

Figure 13. Performance comparison with state-of-the-art approaches on Instagram: (a) Recall; (b) F1-score; (c) NDCG.

In addition to verifying the accuracy of our proposed model, we also evaluated the efficiency of ST-DCGN in Table 2 (RQ2). Our proposed ST-DCGN required less training time than the other neural network models (i.e., GRU, Distance2Pre, ST-RNN). The reason is that CNN-based methods can effectively save training time through the fully parallel mechanism of convolutions. For example, we can adopt parallelism when calculating the product of conditional probabilities. It is worth noting that although Caser achieved higher efficiency than the RNN-based methods by using a CNN structure and parallel computing, ST-DCGN achieved further improvements in training time compared with Caser, confirming the advantage of using the dilated convolutional generative network.

Table 2. Overall training time (hours).

| Dataset | GRU | Caser | Distance2Pre | ST-RNN | ST-DCGN |
|---|---|---|---|---|---|
| Foursquare | 1.595 | 1.227 | 2.309 | 2.958 | 1.157 |
| Instagram | 2.116 | 1.892 | 4.236 | 5.156 | 1.793 |

In summary, ST-DCGN improved over the best baseline approaches on the two datasets with respect to the three metrics. On one hand, our model took advantage of a 1D dilated causal convolutional network and residual learning to increase the receptive field and enable the training of much deeper networks, which greatly enhances the modeling of the user's long-term dependency and short-term interest. Moreover, such a CNN-based network structure can fully utilize parallel computation to improve training efficiency. On the other hand, ST-DCGN took advantage of the personalized spatiotemporal information, and it can effectively acquire the user's spatial preference and temporal preference.
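How stacked dilated causal convolutions enlarge the receptive field can be illustrated with a small pure-Python sketch. It is not the ST-DCGN implementation: the kernel size of 2, the dilation factors 1, 2, 4, 8 (a WaveNet-style doubling pattern), and the all-ones filter weights are assumptions chosen only to make the effect visible, and residual connections are omitted.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """1D causal convolution with the given dilation and implicit left zero-padding.

    The output at step t only reads inputs at steps t, t - dilation,
    t - 2 * dilation, ..., so it never looks at future check-ins.
    """
    kernel_size = len(weights)
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - (kernel_size - 1 - j) * dilation
            if idx >= 0:  # positions before the sequence start count as zero
                total += w * x[idx]
        out.append(total)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output step of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)

def run_stack(x, dilations, weights):
    """Apply one causal dilated convolution per dilation factor, in order."""
    h = x
    for d in dilations:
        h = causal_dilated_conv1d(h, weights, d)
    return h

# Assumed configuration: kernel size 2, dilations doubling per layer.
DILATIONS = (1, 2, 4, 8)
WEIGHTS = [1.0, 1.0]

# A single impulse at step 0 spreads over exactly receptive_field steps.
impulse = [1.0] + [0.0] * 19
response = run_stack(impulse, DILATIONS, WEIGHTS)
print(receptive_field(2, DILATIONS))
print(sum(1 for v in response if v != 0.0))
```

With four layers of kernel size 2, the last output position is influenced by 16 input steps, whereas four undilated layers would cover only 5. Every output position of a layer can also be computed independently of the others, which is the property that allows parallel computation within a sequence.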

4.5. Sensitivity Analysis of Parameters

In this part, we explored the effects of several key hyper-parameters on the performance of ST-DCGN. Here, we focused on analyzing the impacts of embedding size, spatial window width, and sequence length (RQ3). Experiments were conducted on both the Foursquare and Instagram datasets.

Figure 14 presents the effects of embedding size on the performance. We analyzed the performance of the proposed ST-DCGN model on both datasets with different embedding sizes (i.e., 20, 40, 60, 80, 100, and 120), using R@5 and R@10 as the metrics. The figure shows that the performance of ST-DCGN gradually increased with the embedding size, because a higher-dimensional representation can learn more latent features and capture more complex interactions. The performance of our model stabilized when the embedding size reached 60 and 80 on the Foursquare and Instagram datasets, respectively. However, a larger embedding size may degrade model performance due to overfitting. Therefore, we chose the embedding size 2d = 60 for the Foursquare dataset and 2d = 80 for the Instagram dataset.

Figure 14. Effect of embedding size in ST-DCGN: (a) Foursquare; (b) Instagram.

Table 3 shows the impact of different spatial window widths. We analyzed the performance of the proposed ST-DCGN model on both datasets with different spatial window widths (i.e., 0.1 km, 0.3 km, 0.5 km, and 0.7 km) regarding R@5 and F1@5. Table 3 shows that ST-DCGN achieved the best performance on the Foursquare dataset when the spatial window width Δd was set to 0.5 km, while the best performance on the Instagram dataset was achieved when Δd was set to 0.3 km. An explanation is that the distance distributions of consecutive check-ins differ between the two datasets. For example, 85% and 93% of consecutive check-ins were less than 10 km apart in the Foursquare and Instagram datasets, respectively, as shown in Figure 10b. Therefore, a larger Δd value may be more suitable when a dataset covers longer distances.

Table 3. Performance of ST-DCGN with varying window width by R@5 and F1@5.

| Dataset | Metric | 0.1 km | 0.3 km | 0.5 km | 0.7 km |
|---|---|---|---|---|---|
| Foursquare | R@5 | 0.3920 | 0.4248 | 0.4277 | 0.4121 |
| Foursquare | F1@5 | 0.1276 | 0.1367 | 0.1426 | 0.1251 |
| Instagram | R@5 | 0.3752 | 0.3810 | 0.3793 | 0.3681 |
| Instagram | F1@5 | 0.1189 | 0.1270 | 0.1207 | 0.1186 |
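The distance distribution of consecutive check-ins mentioned above can be inspected with a short sketch. It is illustrative only: the haversine great-circle formula, the mean Earth radius constant, the function names, and the (latitude, longitude) tuple layout are assumptions, since the distance computation behind Figure 10b is not described in this section.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def consecutive_distances(checkins):
    """Distances between each check-in and the next one in the same sequence.

    checkins: list of (latitude, longitude) tuples in visit order.
    """
    return [haversine_km(*a, *b) for a, b in zip(checkins, checkins[1:])]

def share_within(distances, threshold_km):
    """Fraction of consecutive check-in pairs closer than threshold_km."""
    if not distances:
        return 0.0
    return sum(d < threshold_km for d in distances) / len(distances)

# Hypothetical sequence: one long hop followed by a repeat visit.
sequence = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
dists = consecutive_distances(sequence)
print(dists)
print(share_within(dists, 10.0))
```

Computing `share_within(dists, 10.0)` over all sequences of a dataset yields the kind of statistic quoted in the text (the share of consecutive check-ins closer than 10 km), which can guide the choice of Δd.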

Figure 15 presents the performance of the proposed ST-DCGN with different sequence lengths while keeping the other optimal hyperparameters unchanged. We can observe that the best POI recommendation performance was achieved when the maximum sequence length was k = 80 on the Foursquare dataset and k = 30 on the Instagram dataset. This result further suggests that our method can learn both short-term and long-term sequence dependencies well.

Figure 15. Performance of ST-DCGN with different sequence lengths by R@5 and R@10: (a) Foursquare; (b) Instagram.


5. Conclusions and Future Work

In this work, we presented a spatiotemporal dilated convolutional generative network (i.e., ST-DCGN) for POI recommendation based on a deep neural network known as the WaveNet architecture [23]. The proposed method introduces a conditional generative model and a dilated causal convolutional network to model users' check-in sequences, which is very effective at modeling short- and long-range dependencies. Compared with RNN-based methods, such a network structure can fully utilize parallel computation within a check-in sequence and greatly reduce the training and evaluation time of the model. In addition, we acquired the user's personalized spatial preference and personalized temporal preference by using the continuous geographical distance and an encoded specific time ID in each time step. Extensive experiments were conducted to evaluate the performance of ST-DCGN and other comparative methods. The experimental results showed that our proposed ST-DCGN model can achieve better performance than state-of-the-art methods for POI recommendation.

In the future, we will incorporate more check-in features to improve the performance of POI recommendation, such as users' activities, comment text, and picture information. We will also explore more advanced neural networks, such as graph convolutional neural networks. Moreover, recent studies show that some conventional methods based on matrix factorization could generalize better [53,54]. Therefore, these methods are also worth exploring in the future.

Author Contributions: Conceptualization and methodology, Chunyang Liu; validation and formal analysis, Chunyang Liu, Jiping Liu, Shenghua Xu, and Jian Wang; writing—original draft preparation, Chunyang Liu, Chao Liu, and Tianyang Chen; writing—review and editing, Chunyang Liu and Tao Jiang. All authors have read and agreed to the published version of the manuscript.

Funding: This research is partially supported by the National Key Research and Development Program of China (2016YFC0803101 and 2016YFC0803108).

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Gao, H.; Tang, J.; Liu, H. Exploring social-historical ties on location-based social networks. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, Dublin, Ireland, 4–7 June 2012.
2. Ding, R.; Chen, Z. RecNet: A deep neural network for personalized POI recommendation in location-based social networks. Int. J. Geogr. Inf. Sci. 2018, 32, 1631–1648.

3. Majid, A.; Chen, L.; Chen, G.; Mirza, H.T.; Hussain, I.; Woodward, J. A context-aware personalized travel recommendation system based on geotagged social media data mining. Int. J. Geogr. Inf. Sci. 2018, 27, 662–684.
4. Wan, L.; Hong, Y.; Huang, Z.; Peng, X.; Li, R. A hybrid ensemble learning method for tourist route recommendations based on geo-tagged social networks. Int. J. Geogr. Inf. Sci. 2018, 32, 2225–2246.
5. Li, X.; Cong, G.; Li, X.L.; Pham, T.A.N.; Krishnaswamy, S. Rank-geofm: A ranking based geographical factorization method for point of interest recommendation. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 433–442.
6. Ye, M.; Yin, P.; Lee, W.C.; Lee, D.L. Exploiting geographical influence for collaborative point-of-interest recommendation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 325–334.
7. Yuan, Q.; Cong, G.; Ma, Z.; Sun, A.; Thalmann, N.M. Time-aware point-of-interest recommendation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 28 July–1 August 2013; pp. 363–372.
8. Cai, L.; Xu, J.; Liu, J.; Pei, T. Integrating spatial and temporal contexts into a factorization model for POI recommendation. Int. J. Geogr. Inf. Sci. 2018, 32, 524–546.
9. Gan, M.; Gao, L. Discovering Memory-Based Preferences for POI Recommendation in Location-Based Social Networks. ISPRS Int. J. Geo-Inf. 2019, 8, 279.
10. Huang, L.; Ma, Y.; Wang, S.; Liu, Y. An Attention-based Spatiotemporal LSTM Network for Next POI Recommendation. IEEE Trans. Serv. Comput. 2019, 99.
11. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46, 234–240.
12. Feng, S.; Li, X.; Zeng, Y.; Cong, G.; Chee, Y.M.; Yuan, Q. Personalized ranking metric embedding for next new POI recommendation. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
13. Cui, Q.; Tang, Y.; Wu, S.; Wang, L. Distance2Pre: Personalized Spatial Preference for Next Point-of-Interest Prediction. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Macau, China, 14–17 April 2019; pp. 289–301.
14. Gao, H.; Tang, J.; Hu, X.; Liu, H. Exploring temporal effects for location recommendation on location-based social networks. In Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China, 12–16 October 2013; pp. 93–100.
15. Kefalas, P.; Manolopoulos, Y. A time-aware spatio-textual recommender system. Expert. Syst. Appl. 2017, 78, 396–406.
16. Wang, W.; Yin, H.; Du, X.; Nguyen, Q.V.H.; Zhou, X. TPM: A temporal personalized model for spatial item recommendation. ACM Trans. Intell. Syst. Technol. 2018, 9, 1–25.
17. Cheng, C.; Yang, H.; King, I.; Lyu, M.R. Fused matrix factorization with geographical and social influence in location-based social networks. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012.
18. Cheng, C.; Yang, H.; Lyu, M.R.; King, I. Where you like to go next: Successive point-of-interest recommendation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–19 August 2013.
19. Liu, Q.; Wu, S.; Wang, L.; Tan, T. Predicting the next location: A recurrent model with spatial and temporal contexts. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
20. Zhao, P.; Zhu, H.; Liu, Y.; Li, Z.; Xu, J.; Sheng, V.S. Where to Go Next: A Spatio-temporal LSTM model for Next POI Recommendation. arXiv 2018, arXiv:1806.06671.
21. Liu, C.; Liu, J.; Wang, J.; Xu, S.; Han, H.; Chen, Y. An Attention-Based Spatiotemporal Gated Recurrent Unit Network for Point-of-Interest Recommendation. ISPRS Int. J. Geo-Inf. 2019, 8, 355.
22. Yuan, F.; Karatzoglou, A.; Arapakis, I.; Jose, J.M.; He, X. A Simple Convolutional Generative Network for Next Item Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11–15 February 2019; pp. 582–590.

23. Oord, A.V.D.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499.
24. Yang, D.; Zhang, D.; Zheng, V.W.; Yu, Z. Modeling user activity preference by leveraging user spatial temporal characteristics in LBSNs. IEEE Trans. Syst. Man Cybern. Syst. 2014, 45, 129–142.
25. Chang, B.; Park, Y.; Park, D.; Kim, S.; Kang, J. Content-Aware Hierarchical Point-of-Interest Embedding Model for Successive POI Recommendation. In Proceedings of the Program Committee of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2019; pp. 3301–3307.
26. Lian, D.; Zhao, C.; Xie, X.; Sun, G.; Chen, E.; Rui, Y. GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 831–840.
27. Kurashima, T.; Iwata, T.; Hoshide, T.; Takaya, N.; Fujimura, K. Geo topic model: Joint modeling of user's activity area and interests for location recommendation. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, Rome, Italy, 4–8 February 2013; pp. 375–384.
28. Zhang, J.D.; Chow, C.Y. iGSLR: Personalized geo-social location recommendation: A kernel density estimation approach. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Orlando, FL, USA, 5–8 November 2013; pp. 334–343.
29. Li, H.; Ge, Y.; Hong, R.; Zhu, H. Point-of-interest recommendations: Learning potential check-ins from friends. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Hilton Hotel, San Francisco, CA, USA, 13–17 August 2016; pp. 975–984.
30. Yang, D.; Zhang, D.; Yu, Z.; Wang, Z. A sentiment-enhanced personalized location recommendation system. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, Paris, France, 1–3 May 2013; pp. 119–128.
31. Mathew, W.; Raposo, R.; Martins, B. Predicting future locations with hidden Markov models. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; pp. 911–918.
32. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Proc. Syst. 2012, 60, 1097–1105.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
34. Irsoy, O.; Cardie, C. Deep recursive neural networks for compositionality in language. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2096–2104.
35. Dong, L.; Yang, N.; Wang, W.; Wei, F.; Liu, X.; Wang, Y.; Hon, H.W. Unified Language Model Pre-training for Natural Language Understanding and Generation. arXiv 2019, arXiv:1905.03197.
36. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.R.; Jaitly, N.; Sainath, T. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal. Proc. Mag. 2012, 29, 82–97.
37. Amodei, D.; Ananthanarayanan, S.; Anubhai, R.; Bai, J.; Battenberg, E.; Case, C.; Chen, J. Deep speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 173–182.
38. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. 2019, 52, 5.
39. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 3111–3119.
40. Liu, X.; Liu, Y.; Li, X. Exploring the Context of Locations for Personalized Location Recommendations. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 1188–1194.
41. Feng, S.; Cong, G.; An, B.; Chee, Y.M. Poi2vec: Geographical latent representation for predicting future visitors. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
42. Kong, D.; Wu, F. HST-LSTM: A Hierarchical Spatial-Temporal Long-Short Term Memory Network for Location Prediction. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 2341–2347.

43. Feng, J.; Li, Y.; Zhang, C.; Sun, F.; Meng, F.; Guo, A.; Jin, D. Deepmove: Predicting human mobility with attentional recurrent networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, Lyon, France, 23–27 April 2018; pp. 1459–1468.
44. Wang, S.; Wang, Y.; Tang, J.; Shu, K.; Ranganath, S.; Liu, H. What your images reveal: Exploiting visual contents for point-of-interest recommendation. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 391–400.
45. Tang, J.; Wang, K. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Los Angeles, CA, USA, 5–9 February 2018; pp. 565–573.
46. Oord, A.V.D.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. arXiv 2016, arXiv:1601.06759.
47. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645.
48. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
49. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814.
50. Jean, S.; Cho, K.; Memisevic, R.; Bengio, Y. On using very large target vocabulary for neural machine translation. arXiv 2014, arXiv:1412.2007.
51. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 452–461.
52. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
53. Ayala-Gómez, F.; Daróczy, B.Z.; Mathioudakis, M.; Benczúr, A.; Gionis, A. Where could we go? Recommendations for groups in location-based social networks. In Proceedings of the 2017 ACM on Web Science Conference, Troy, NY, USA, 25–28 June 2017; pp. 93–102.
54. Rendle, S.; Zhang, L.; Koren, Y. On the difficulty of evaluating baselines: A study on recommender systems. arXiv 2019, arXiv:1905.01395.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).