International Journal of Geo-Information

Article From Motion Activity to Geo-Embeddings: Generating and Exploring Vector Representations of Locations, Traces and Visitors through Large-Scale Mobility Data

Alessandro Crivellari * and Euro Beinat

Department of Geoinformatics—Z_GIS, University of Salzburg, 5020 Salzburg, Austria; [email protected]
* Correspondence: [email protected]

 Received: 29 January 2019; Accepted: 4 March 2019; Published: 8 March 2019 

Abstract: The rapid growth of positioning technology allows tracking motion between places, making trajectory recordings an important source of information about place connectivity, as they map the routes that people commonly follow. In this paper, we utilize users’ motion traces to construct a behavioral representation of places based on how people move between them, ignoring geographical coordinates and spatial proximity. Inspired by natural language processing techniques, we generate and explore vector representations of locations, traces and visitors, obtained through an unsupervised approach, which we generically named motion-to-vector (Mot2vec), trained on large-scale mobility data. The algorithm consists of two steps, the trajectory pre-processing and the Word2vec-based model building. First, mobility traces are converted into sequences of locations that unfold in fixed time steps; then, a Skip-gram Word2vec model is used to construct the location embeddings. Trace and visitor embeddings are finally created by combining the location vectors belonging to each trace or visitor. Mot2vec provides a meaningful representation of locations, based on the motion behavior of users, defining a direct way of comparing locations’ connectivity and providing analogous similarity distributions for places of the same type. In addition, it defines a metric of similarity for traces and visitors beyond their spatial proximity and identifies common motion behaviors between different categories of people.

Keywords: embeddings; Word2vec; unsupervised learning; trajectories; motion behavior

1. Introduction

The only source of information about relationships between different locations or traces (sequences of locations) provided by the traditional geographical representation is their spatial proximity. However, some specific applications may take advantage of further relationships ignored by the simple geographic coordinates. Such missing information may be relevant in cases where places (and connectivity between them) strongly influence people’s behavior over the territory, or when, vice versa, people’s movements between places can provide a meaningful indication of the functionality of such places. Nowadays, the rapid growth of positioning technology makes it easier to track motion, allowing many devices to acquire current locations. These trajectory recordings are important sources of information about place connectivity, showing the routes that people commonly follow [1–4]. In this paper, we explore the concept of “behavioral proximity” between places, based on people’s trajectories, not on locations’ geography. We construct location embeddings as a vector representation strictly based on large-scale human mobility, defining a metric of similarity beyond the simple geographical proximity.

ISPRS Int. J. Geo-Inf. 2019, 8, 134; doi:10.3390/ijgi8030134 www.mdpi.com/journal/ijgi

The meaning of “behavioral proximity” is related to the concept of trajectory. Two locations are behaviorally similar if they often belong to the same trajectories; they often share the same neighbor locations along the trace. Of course, closer locations are likely to be more connected to each other, but this is not always the case. For example, if two places are located next to each other but no road connects them, those places are not behaviorally close, even if geographically they are; or, again, if two locations are spatially close to each other but rarely visited together, rarely following the same route, their behavioral distance is higher than the one between two locations geographically more distant but often sharing common trajectories. This can also reflect how people tend to visit places when traveling.

Figure 1a shows three locations, where the spatial distance LOC1–LOC2 is comparable with the distance LOC2–LOC3. However, people’s behavior between those locations is very different. The motion traces between LOC1 and LOC2 are much sparser: from a behavioral perspective, LOC2 is “closer” to LOC3. Figure 1b depicts people’s trajectories passing through LOC1 and LOC2; Figure 1c shows the ones passing through LOC2 and LOC3. This behavioral difference is captured by the embeddings.


Figure 1. Locations having comparable distances LOC1–LOC2 and LOC2–LOC3 (a), with trajectories passing through LOC1 and LOC2 (b), and LOC2 and LOC3 (c), within a maximum time period of six hours.

Our study is not limited to a location level but can also be extended to a trace level, allowing comparisons that go beyond the simple spatial distance between centers of mass (COMs), considering also the “behavioral relationships” between traces. Figure 2 shows three traces where the distance between COM1 and COM2 is comparable with the distance between COM1 and COM3. However, TRACE1 and TRACE2 are located in proximity of a common area (different sides of the same lake), while TRACE3 is in a different area. Trace embeddings are able to capture the influence of particular areas of interest, defining similar vector representations for traces located in their proximity, hence considered behaviorally related.

Besides locations and traces, we explore a third level of representation, the visitor level, conceptually similar to the trace one. We propose comparisons intra-visitor, e.g., different behaviors of the same user in different hours of the day, and inter-visitor, between different customers or different groups of customers, e.g., to study the motion behavior of tourists grouped for instance by nationality.


Figure 2. Traces having comparable distances COM1–COM2 and COM1–COM3 but behaviorally different meaning: TRACE1 and TRACE2 are located in the proximity of the same area of interest represented by the same lake.

BesidesOur purpose locations is the and design traces, of a machine-readablewe explore a thir representation,d level of representation, whereby behaviorally the visitor similar level, conceptuallylocations (traces, similar visitors) to the share trace similarone. We representations propose comparisons in mathematical intra-visitor, terms. e.g., We different hereby behaviors generate ofand the explore same user a dense in different vector representation hours of the day, obtained and byinter-visitor, means of between an embedding different method customers that weor differentgenerically groups called of motion-to-vectorcustomers, e.g., to (Mot2vec), study the mo applyingtion behavior the tools of oftourists Word2vec, grouped primarily for instance used by in nationality.natural language processing (NLP), to pre-processed trajectories. As Word2vec learns high-dimensional embeddingsOur purpose of words is the where design vectors of a machine-readable of similar words representation, end up near in whereby the vector behaviorally space, Mot2vec similar is locationsan NLP-inspired (traces, visitors) technique share that similar treats singlerepresenta locationstions in as mathematical “words” and terms. trajectories We hereby as “sentences”. generate andApplying explore the a Word2vec dense vector algorithm representation on a corpus obtain of pre-collecteded by means traces, of an high-dimensional embedding method embeddings that we genericallyof locations called are obtained, motion-to-vector and the vectors (Mot2vec), of behaviorally applying the related tools locations of Word2vec, occupy primarily the same used part in of naturalthe vector language space. Mot2vec proces issing initially (NLP), trained to onpre-processed trajectory data trajectories. 
to obtain feature As vectorsWord2vec of locations, learns high-dimensionalwhich in turn can beembeddings used to create of words trace andwhere visitor vector vectors.s of similar words end up near in the vector space,We Mot2vec evaluated is an the NLP-inspired method on a large-scale technique bigthat dataset treats madesingle of locations trajectories as “words” of foreign and visitors trajectories in Italy. asWe transformed“sentences”. theApplying trajectories the into Word2vec discrete (in algorithm space and time)on a location corpus sequences of pre-collected and fed them traces, to a high-dimensionalSkip-gram Word2vec-based embeddings model, of whichlocations defined are obta the embeddingined, and the vector vectors for each of behaviorally location according related to locationsits frequent occupy previous the same and next part locations of the vector along space. the trajectories Mot2vec is in initially the dataset. trained We on finally trajectory used locationdata to obtainembeddings feature to vectors construct of locations, trace embeddings which in turn and can visitor be used embeddings. to create trace Behavioral and visitor comparisons vectors. and clusterWe visualization evaluated the were method performed on a large-scale on the basis big of datase humant made motion of activity, trajectories in particular of foreign highlighting visitors in Italy.the differences We transformed between the spatial trajectories and behavioral into discre proximities.te (in space and time) location sequences and fed them to a Skip-gram Word2vec-based model, which defined the embedding vector for each location according2. Related to Work its frequent previous and next locations along the trajectories in the dataset. We finally used Vectorlocation space embeddings models of meaning to construct have beentrace used em forbeddings several decadesand visitor [5], butembeddings. 
the recent employment Behavioral comparisonsof machine learning and cluster to train visualization such models were greatly perfor improvedmed on their the precision,basis of human reaching motion the state-of-art activity, in particularcomputational highlighting linguistics the domain. differences Their betw abilityeen spatial of efficiently and behavioral calculating proximities. semantic similarity between linguistic entities is extremely useful in several fields, such as web search, opinion mining, and 2.document Related Work collections management [6–8]. VectorThe use space of distributional models of models meaning of meaning have been was introducedused for several by Charles decades Osgood [5], and but subsequently the recent employmentfurther developed of machine [9]. Meanings learning to of train words such were mode representedls greatly improved as vectors their containing precision, the reaching frequencies the state-of-artof co-occurrences in computational with every wordlinguistics in the domain. training Their corpus. ability Words of efficiently were both calculating points and axessemantic in a similaritymulti-dimensional between space,linguistic in which entities the is proximity extremel ofy similaruseful wordsin several was guaranteedfields, such byas theirweb frequent search, opinionco-occurrences. mining, However,and document in case collections of large corpora,management this representation[6–8]. may end up with millions of dimensions,The use causing of distributional very sparse vectors.models of meaning was introduced by Charles Osgood and subsequentlyThe solution further to this developed curse of [9]. dimensionality Meanings of was words the were birth ofrepresented “word embeddings”, as vectors containing dense vectors the frequenciesretaining meaningful of co-occurrences relations with between every entities. word in The the main training approaches corpus. 
to Words construct were word both embeddings points and axescomprise in a multi-dimensional count-based models space, [10 in], which predictive the proximity models usingof similar artificial words neural was guaranteed networks by [11 their,12], frequentand Global co-occurrences. vectors for WordHowever, Representation in case of larg [13].e corpora, Nowadays, this representation the last two approaches may end up are with the millions of dimensions, causing very sparse vectors.

most popular, boosting almost all areas of NLP. Their novelty consists of actively employing machine learning. The real improvement started in 2013 with the use of artificial neural networks for learning meaningful word vector representations. Mikolov [14] introduced Word2vec, a fast and efficient approach to learn word embeddings by means of a neural network model explicitly trained for that purpose. It featured two different algorithms: Continuous Bag-of-words and Skip-gram. The main idea consisted of continuously defining a prediction problem while scanning the training corpus through a sliding window. The objective is to predict the current word with the help of its contexts, and the outcome of the prediction determines whether to adjust the current word vector and in what direction.

A great deal of follow-up research was performed after Mikolov’s paper. Pennington [13] released GloVe, a different approach to learn word embeddings combining the global count models and the local context window prediction models. Levy and Goldberg showed that Skip-gram implicitly factorizes a word-context matrix of point-wise mutual information coefficients [15], and studied its performance for different choices of hyperparameters, emphasizing its great robustness and computational efficiency [16]. Le and Mikolov [17] proposed Paragraph Vector to learn distributed representations also for paragraphs and documents. Bojanowski [18] released fastText, creating embeddings for character sequences. The main frameworks and toolkits available are: the Word2vec original C code [19], the Gensim framework for Python including Word2vec and fastText implementations [20], the Word2vec implementation in TensorFlow [21], and the GloVe reference implementation [22]. Currently, word embeddings are employed in almost every NLP task related to meaning and widely used instead of discrete word tokens as an input to more complex neural network models.
In addition, embeddings began to be utilized in completely different fields, such as to represent chemical molecular substructures [23] or, in general, to model categorical attributes [24,25].

In geography-related fields, the use of embeddings started to be explored very recently. Attempts were made at modeling vector representations based on spatial proximity between points of interest in cities for place type similarity analysis [26,27] and functional region identification [28]. Regarding embeddings generated from human motion, their study is primarily focused on urban road systems and city-level mobility [29–31]. Recent research on social media data also includes multi-context embedding models for personalized recommendation systems, representing different features such as user personal behaviors, place categories and points of interest, in the same high-dimensional space [32–34]. Moreover, a few works deal with embeddings for inferring users’ information from mobility traces, such as demographic information and mobile behavior similarity [35,36]. However, the majority of the literature focuses on local movements and regular user motion behaviors, and does not provide clear insights into the embedding representation space. Our work, part of this very new wave of embedding investigation in geoinformatics, aims to further explore the meaning and future potential of vector representations based on the collective motion activity derived from large-scale mobility data.

3. Methodology

Mot2vec is an unsupervised method trained on unlabeled data to obtain high-dimensional feature vectors (embeddings) of locations, which in turn can be used to create trace and visitor vectors. The algorithm consists of two steps, the trajectory pre-processing and the Word2vec-based model building. This section describes how to pre-process trajectories in order to feed them to the embedding model, and how to apply and train the model on such trajectories for constructing embeddings of locations, traces and visitors.

3.1. Trajectory Pre-Processing

A trajectory is initially composed of a series of track points expressed as T = {p_i | i = 1, 2, 3, ..., N}, where N is the number of track points. Each track point contains spatial information such as longitude, latitude, and time stamp [37], expressed as p_i = (lon_i, lat_i, t_i). Different ways of collecting mobility data

(e.g., from device’s applications or from the network’s cell towers) may lead to different resolutions in time and space. In our analysis, we do not consider the actual geographical position of each track point, but represent each location with an ID. In case of serious sparsity of the trajectory data, leading to many unique locations with a very low number of occurrences, which may cause high computational cost and a poor embedding representation of places, we suggest grouping adjacent locations together. Existing methods utilize cell-based partitioning, clustering, and stop point detection to transform trajectories into discrete cells, clusters and stay points [38]. A valuable option may be to fix some meaningful reference points on the territory and project the other locations to the nearest fixed point. The minimum spatial resolution can be chosen freely and may vary according to the precision required by different applications. The final result consists of a certain number of fixed points identified by a unique ID, representing a particular location or area.

Mobility traces are then converted into sequences of IDs that unfold in fixed time steps, i.e., if timestep = 1 h, the next ID in the sequence refers to one hour later than the previous ID. In this way, time information is encoded implicitly in the position along the sequence. If more than one event falls within a single time step, the one with the most occurrences is chosen to represent the location of the user. The length of the time step depends on both the data source and the demands of the final application, in particular to balance accuracy with completeness of the sequences: a short time unit would increase trace fragmentation, while a long unit would reduce the accuracy of representation of the actual mobility trace.
In general, other sampling rules can be employed without impacting the functionality of the algorithm, e.g., sequences with irregular time differences between points, leading however to a different meaning and behavioral representation in the vector space. In conclusion, the input to the embedding model consists of discrete location sequences (ID_1, ID_2, ..., ID_N), where, given a time step unit t, locations correspond to times (t, 2t, ..., Nt). In the next subsection, we present the Word2vec algorithm and how it is trained for learning location embedding representations.
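As an illustration of the pre-processing described above, the following sketch converts raw track points into a discrete ID sequence with an hourly time step; the grid-snapping function `snap_to_grid` and the 0.01-degree cell size are our own illustrative choices, not the paper’s settings. When several events fall within one time step, the most frequent location is kept:

```python
from collections import Counter

def snap_to_grid(lon, lat, cell_deg=0.01):
    """Map a coordinate onto a fixed grid cell ID (hypothetical 0.01-degree cells)."""
    return f"{round(lon / cell_deg)}_{round(lat / cell_deg)}"

def to_id_sequence(track_points, step_s=3600):
    """Convert track points (lon, lat, unix_time) into a sequence of location IDs
    unfolding in fixed time steps; the most frequent cell within a step wins."""
    buckets = {}
    for lon, lat, t in track_points:
        buckets.setdefault(t // step_s, []).append(snap_to_grid(lon, lat))
    sequence = []
    for step in sorted(buckets):
        # majority vote when several events fall within a single time step
        sequence.append(Counter(buckets[step]).most_common(1)[0][0])
    return sequence

points = [(13.35, 45.44, 0), (13.36, 45.44, 1200), (13.35, 45.44, 2400),
          (13.73, 45.65, 3600)]
print(to_id_sequence(points))  # → ['1335_4544', '1373_4565']
```

A cell-based partition is only one of the options mentioned in the text; projecting points to fixed reference locations would simply replace `snap_to_grid` with a nearest-neighbor lookup.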

3.2. Word2vec Model for Location Representation

3.2.1. Model Description

Embeddings are dense vectors of meaning based on the distribution of element co-occurrences in large training corpora. Elements occurring in similar contexts have similar vectors, but particular vector components (features) are not directly related to any particular properties. Word2vec is one of the most efficient techniques to define these vectors. We now briefly present how it works.

Each element in the training corpus is associated with a random initial vector of a pre-defined size, therefore obtaining a weight matrix of dimensionality num_elements × vector_size. During training, we move through the corpus with a sliding window containing the current focus element and its neighbor elements (its context). Although Word2vec is an unsupervised method, it still internally defines an auxiliary prediction task, where each instance is a prediction problem: the aim is to predict the current element with the help of its contexts (or vice versa). The outcome of the prediction determines whether we adjust the element vectors and in what direction. Prediction here is not an aim in itself; it is just a proxy to learn vector representations.

Word2vec can be applied in two different forms: Continuous Bag-of-words (CBOW) and Skip-gram, both shown to outperform traditional count-based models in various tasks [39]. At training time, CBOW learns to predict the current element based on its context, while Skip-gram learns to predict the context based on the current element. This inversion statistically has some effects. CBOW treats an entire context as one observation, smoothing over a lot of the distributional information. On the other hand, Skip-gram treats each context-target pair as a new observation, and this tends to work better for larger datasets. A graphic representation of the two algorithms is reported in Figure 3. The network is made of a single linear projection layer between the input and the output layers.
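The sliding-window prediction task can be made concrete with a small sketch (our own illustration, not the paper’s code): under Skip-gram, every (target, context) pair within the window becomes one training instance, whereas CBOW would treat the whole window as a single observation.

```python
def skipgram_pairs(sequence, window=2):
    """Enumerate Skip-gram training instances from one sequence: each
    (target, context) pair within the window is a separate prediction problem."""
    pairs = []
    for i, target in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the target does not predict itself
                pairs.append((target, sequence[j]))
    return pairs

# With window=1, each element predicts only its immediate neighbors:
print(skipgram_pairs(["A", "B", "C", "D"], window=1))
# → [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B'), ('C', 'D'), ('D', 'C')]
```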


Figure 3. Graphic representation of CBOW and Skip-gram model with a context window of two elements in the past and two in the future.

In our method, we adopted the Skip-gram approach: at each training instance, the input for the prediction is the current element vector. The training objective is to maximize the probability of observing the correct context elements cE_1 ... cE_j given the target element E_t, with regard to its current embedding θ_t. The cost function C is the negative log probability of the correct answer, as reported in Equation (1):

C = −∑_{i=1}^{j} log(p(cE_i | E_t)). (1)

This function is defined over the entire dataset, but it is typically optimized with stochastic gradient descent (or any other type of stochastic gradient optimizer) using mini-batch training, usually with an adaptive learning rate. The gradient of the loss is derived with respect to the embedding parameters θ, i.e., ∂C/∂θ, and the embeddings are updated accordingly by taking a small step in the direction of the gradient. This process is repeated over the entire training corpus, tweaking the embedding vectors for each element until they converge to optimal values.

3.2.2. Model Training

After pre-processing, trajectories are represented as sequences of fixed and discrete locations, each of them identified by a unique ID. We connect each location ID to a vector of pre-defined size: the whole list of places refers to a lookup table where each ID corresponds to a particular unique row of the embedding matrix of size num_locations × vector_size.

During training, we move the sliding window through each single trace, feeding the Skip-gram Word2vec model with the current focus location and its neighbor context locations in the trajectory. The window size can be chosen freely, depending on the final purpose and the time resolution of the trajectories. A smaller time resolution may lead to a larger window in terms of locations, e.g., one hour of context window contains four locations if time_resolution = 15 min, while just one if time_resolution = 1 h. Aside from time resolution, larger windows increase the influence of distant (in time) locations, whereas smaller windows consider only very close locations along the trajectory, hence visited in a shorter time period, emphasizing the local proximity. The process is represented in Figure 4 with a context window of three locations in the past and three in the future. The model updates the embedding matrix according to locations’ contexts along the trajectories, based on the internal auxiliary prediction task, defining similar vectors for behaviorally related locations.


Figure 4. The process of sliding window (with a length of three locations in the past and three in the future) and the model input–output pairs.

Once the embedding representation of single locations is obtained, we can use it to create dense vectors of traces, visitors, and even groups of visitors. Following a simple but efficient approach consisting of averaging the word embeddings in a text for creating sentence and document vectors, which has proven to be a strong baseline across a multitude of NLP tasks (e.g., [40,41]), we combine continuous location vectors into continuous trace and visitor vectors. Since trace (visitor) meaning is composed of individual location meanings, we define a composition function consisting of an average vector V over the vectors of all elements E_1 ... E_n in the composition, as reported in Equation (2):

V = (1/n) × ∑_{k=1}^{n} E_k. (2)

If the element vectors are generated from a good embedding model, this bottom-up approach can be very efficient. Despite the main disadvantage of not taking the location order into account, there are several advantages in building composition embeddings in this way:

• It works fast and reuses already trained models.
• Behaviorally connected locations collectively increase or decrease the expression of the corresponding components.
• Meaningful locations automatically become more important than noise locations.

Figure 5 shows a summarizing graph of the embedding generation process.

ISPRS Int. J. Geo-Inf. 2019, 8, 134

Figure 5. Overview of the embedding generation steps. Step 1: vector representations of locations are generated by means of an unsupervised training process. Step 2: location vectors are retrieved and averaged to obtain trace and visitor vectors.

4. Experiment

This section first describes the dataset used for training Mot2vec and the evaluation metrics, and then shows the results at different levels: locations, traces, visitors, and groups of visitors. The Skip-gram embedding model was implemented and executed on TensorFlow using an AWS EC2 p3.2xlarge GPU instance.

4.1. Experimental Preparation

4.1.1. Dataset

A real-world trajectory dataset is used to evaluate the model. Data were provided by a major telecom operator and consist of an anonymized sample of seven months of roamers' call detail records (CDRs) in Italy. The dataset spans the period between the beginning of May and the end of November 2013. Each CDR is related to a mobile phone activity (e.g., phone calls, SMS communication, data connection),
enriching the event with a time stamp and the current position of the device, represented as the coverage area of the principal antenna. The size of this coverage area can vary from a few tens of meters in a city to a few kilometers in remote areas. CDRs have already been utilized in studies of human mobility to characterize people's behavior and predict human motion [42–46].
The sequences of antenna connection events are often discontinuous and sparse, depicting the usual erratic profile of mobile activity patterns. To reduce trace fragmentation, we chose the time step unit to be one hour. If more than one event occurred in the same hour, we selected the location associated with the majority of those events in order to represent the current position of the user. In addition, due to the time step unit chosen and the geographical sparsity of the locations, we defined a minimum space resolution of 2 km. Hence, we selected as reference points the antennas with the highest number of connections within a 2 km distance, and projected the other antennas to the closest

reference point. We also eliminated locations with just a few tens of occurrences, being almost never visited and therefore considered a bias of the overall behavior of foreign visitors in Italy. In general, however, parameters such as the time and space resolution can be chosen differently and should be set according to the characteristics of the dataset. We finally obtained 1-hour encoded sequences of almost six thousand unique locations over the whole Italian territory. Table 1 summarizes the characteristics of the pre-processed dataset. It contains 5.1 million trajectories, with an average trajectory length of 11.2 hours. The average traveled distance per hour is 13.4 km.
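The hourly encoding step can be sketched as follows; the event format `(hour, location_id)` and the function name are hypothetical, and the projection of antennas onto 2 km reference points is omitted:

```python
from collections import Counter

def to_hourly_sequence(events):
    """Convert time-stamped CDR events [(hour, location_id), ...] into a
    sequence of one location per hour: if several events fall in the same
    hour, keep the location observed most often in that hour."""
    by_hour = {}
    for hour, loc in events:
        by_hour.setdefault(hour, []).append(loc)
    sequence = []
    for hour in sorted(by_hour):
        majority_loc, _ = Counter(by_hour[hour]).most_common(1)[0]
        sequence.append(majority_loc)
    return sequence

# Three events at hour 9 (majority location "A"), one at 10, one at 12.
events = [(9, "A"), (9, "B"), (9, "A"), (10, "B"), (12, "C")]
print(to_hourly_sequence(events))  # ['A', 'B', 'C']
```

Note that gaps (here hour 11) are simply skipped in this sketch; real pre-processing would also need a rule for splitting fragmented traces.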

Table 1. Summary characteristics of the pre-processed dataset.

Num. Users    Num. Traces    Avg Trace Length (in Time)    Avg Displacement per Hour    Num. Locations
1.6 million   5.1 million    11.2 hours                    13.4 km                      5903

4.1.2. Experimental Settings

We implemented the model with a window size of three hours (locations) in the past and three in the future, and a vector size of 100 dimensions. The Word2vec algorithm was trained using a mini-batch approach with noise-contrastive estimation loss and the Adam optimizer [47,48]. To measure the "closeness" between embeddings, we use the cosine similarity, translating the behavioral similarity of locations, traces and visitors into the cosine of the angle between vectors: similarity decreases as the angle grows, and increases as the angle shrinks. As shown in Equation (3), cosine similarity consists of the dot product of unit-normalized vectors:

$$\cos(\mathrm{LOC1}, \mathrm{LOC2}) = \frac{\vec{V}(\mathrm{LOC1}) \cdot \vec{V}(\mathrm{LOC2})}{\|\vec{V}(\mathrm{LOC1})\| \, \|\vec{V}(\mathrm{LOC2})\|}. \qquad (3)$$
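A minimal implementation of Equation (3), assuming the embeddings are plain NumPy vectors:

```python
import numpy as np

def cosine_similarity(v1, v2):
    """Equation (3): dot product of the two unit-normalized vectors."""
    v1, v2 = np.asarray(v1, dtype=float), np.asarray(v2, dtype=float)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # 0.7071... (45-degree angle)
```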

To visually display relationships between high dimensional vectors, we apply t-distributed Stochastic Neighbor Embedding (t-SNE). The t-SNE method reduces dimensionality while trying to keep similar instances close and dissimilar instances apart. It is widely used for visualizing clusters of instances in high-dimensional space [49].

4.2. Evaluation

The evaluation is performed on two levels: location embeddings and compositional embeddings. Location evaluation focuses on the behavioral similarity between single places, the direct output of the Mot2vec model. We compare geographical and behavioral proximity between close and distant places, showing the different behavioral patterns between locations belonging to big cities or small towns, or representing centers of high motion connectivity such as airports and train stations. Compositional evaluation deals with trace and visitor embeddings, which are obtained by averaging vectors of single locations. We explore the meaning of trace similarity, showing how particular areas of interest influence representations of traces located in their proximity, and study user compositions, a particular case of trace embedding analysis where the number of visited locations may be very high. We also perform intra-visitor analysis, in order to show how movements of the same user can be grouped and compared on the basis of a selected attribute, and display a vector representation of groups of visitors, in particular grouping tourists of the same nationality, in order to reveal meaningful clusters in their motion behavior.

4.2.1. Evaluation on Location Embeddings

Geographical distance and "behavioral" distance are not proportional: the same distances in space do not imply the same behavioral similarities. Although locations' distances may be geographically comparable, people's motion between them may be very different. Figure 1 in the Introduction reported the example of the similar spatial distances LOC1–LOC2 and LOC3–LOC2. In Figure 6, we represent the embeddings of those three locations together with their geographic coordinates and the cosine similarity between them: cos(LOC1, LOC2) is equal to 0.48, while cos(LOC3, LOC2) is equal to 0.70. This shows that, although LOC1 and LOC3 are almost equally far from LOC2 in terms of spatial distance (in particular, LOC1–LOC2 is slightly shorter), their behavioral representations are very different (higher similarity between LOC3 and LOC2), in accordance with the trajectories depicted in Figure 1.
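The mismatch between spatial and behavioral proximity can be illustrated with a small sketch (the coordinates and embeddings below are invented, not the actual LOC1–LOC3 data): two points roughly equidistant from a reference can have very different cosine similarities:

```python
import math
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cos_sim(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy LOC1 and LOC3 sit at comparable spatial distances from LOC2 ...
d12 = haversine_km(45.0, 9.0, 45.0, 9.5)
d32 = haversine_km(45.0, 10.0, 45.0, 9.5)
# ... yet their (toy) embeddings make LOC3 far more similar to LOC2.
s12 = cos_sim([0.9, 0.1, 0.0], [0.2, 0.9, 0.1])
s32 = cos_sim([0.3, 0.8, 0.2], [0.2, 0.9, 0.1])
```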

Figure 6. Geographic representation and embedding representation of LOC1, LOC2 and LOC3 of the example of Figure 1. Although spatial distance LOC1–LOC2 is slightly shorter, LOC3 and LOC2 are substantially more similar.

To have a general idea about relationships between location embeddings, vectors can be dimensionally reduced through t-SNE and plotted. In Figure 7, each location is labeled as the province and colored as the region it belongs to. We can clearly notice a tendency of grouping locations of the same region/province (reasonably, places located in the same area are more likely to belong to the same trajectories), but it is not always the case. Moreover, locations having many connections spread across the whole territory are placed at the center of the plot. In order to make the plot understandable, only the 100 most visited locations are represented.


Figure 7. t-SNE reduction of the 100 most visited locations labeled as the province and colored as the

region they belong to.

Analyzing similarity distributions, we observed recurrent patterns based on location types, in particular related to places belonging to cities, small towns and rural areas, airports and stations. Locations in towns or rural areas tend to have very high similarity values with places in the surrounding area and low similarities with more distant locations. In other words, places belonging to rural areas are maximally connected to nearby locations. We present an example in Figure 8.

cos(X,S1) = 0.813
cos(X,S2) = 0.777
cos(X,S3) = 0.772
cos(X,S4) = 0.740
cos(X,S5) = 0.694

Figure 8. Example about similarity of towns/rural areas. The top five similarities with location X are reported: high similarities tend to distribute over the local area surrounding location X.

On the other hand, locations in cities tend not to reach comparable levels of similarity. Nearby locations tend to be again the most similar ones, but the cosine similarity values do not reach the levels achieved for rural areas: closer and distant places have a narrower similarity range. This is mainly due to the presence in cities of centers of long-distance movements, such as train stations or airports, but also to a general tendency of tourists to travel between cities, while moving within the same local areas in the case of small towns or touristic villages. Therefore, small towns present higher values of the top cosine similarities, but towards a very limited number of locations, whereas cities show lower top similarity values spread over a larger number of locations, due to the higher number of connections they are linked to. The particular case of train stations in a city reveals how top similarity values are also obtained for distant locations. Figure 9 displays the example of the city of Milan, reporting the top five similarities for a generic location in the city (a) and for the main train station (b): similarity values are clearly lower than in the previous example about towns/rural areas. It is worth noticing that the third highest similarity with the main train station in Milan is the main train station in Turin, and again the fourth and the fifth ones are in proximity of a transit station crossed by many trains connecting Milan to various locations in northeastern Italy.
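Retrieving top similarities such as those in Figures 8 and 9 is a nearest-neighbor query under cosine similarity; a sketch over a toy embedding matrix (names and data are illustrative):

```python
import numpy as np

def top_k_similar(emb, query_idx, k=5):
    """Return (index, cosine similarity) of the k locations most similar
    to the query location, given an (n_locations, dim) embedding matrix."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit[query_idx]          # cosine against every location
    order = np.argsort(-sims)              # descending similarity
    order = order[order != query_idx][:k]  # drop the query itself
    return [(int(i), float(sims[i])) for i in order]

rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 8))             # 50 toy locations, 8-dim vectors
neighbors = top_k_similar(emb, query_idx=3, k=5)
```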


cos(X,S1) = 0.680
cos(X,S2) = 0.631
cos(X,S3) = 0.625
cos(X,S4) = 0.623

cos(X,S5) = 0.604

(a)

cos(X,S1) = 0.725
cos(X,S2) = 0.651

cos(X,S3) = 0.626

cos(X,S4) = 0.608

cos(X,S5) = 0.597

(b)

Figure 9. Two examples about similarity of cities. The top five similarities with location X are reported, in case X represents a generic location in the city (a), and in case X represents the main train station of the city (b). Top similarity values are lower than in Figure 8. Moreover, the train station has three out of its top five similarities related to other train stations outside the city.


The case of airports is an "extreme" version of big cities and train stations: top similarity values are even lower and may also comprise a few very distant locations. In particular, similarity is evident between airports connected by frequent flight routes. We present an example in Figure 10, where the main airport in Rome shows a high similarity with the airport in Naples.

cos(X,S1) = 0.748
cos(X,S2) = 0.683

cos(X,S3) = 0.591

cos(X,S4) = 0.561

cos(X,S5) = 0.553

Figure 10. Example about similarity of airports. The top five similarities with location X are reported: top similarity values are even lower than in Figure 9. It is worth observing that the fourth highest similarity is related to another airport in another city.

Similarities between location types can describe how people tend to move across the territory, revealing the main transportation connections. Comparing similarities between airports and stations of two different cities can reveal whether people tend to travel by plane or by train when moving from one city to the other. In Figure 11, we present an example about the cities of Milan, Bologna and Bari. Comparing Milan and Bologna discloses a higher similarity between stations rather than airports, implying a larger number of people traveling by train. On the other hand, comparing Milan and Bari exhibits a higher similarity between airports, implying a tendency of traveling by plane, in accordance with the long distance.
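This airport-versus-station comparison reduces to two cosine evaluations per city pair; a sketch with invented toy vectors (the real comparison would use the trained location embeddings, and the function name is hypothetical):

```python
import numpy as np

def cos_sim(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def dominant_link(airport_a, airport_b, station_a, station_b):
    """Guess the prevalent travel mode between two cities by comparing
    the similarity of their airports with that of their train stations."""
    by_plane = cos_sim(airport_a, airport_b)
    by_train = cos_sim(station_a, station_b)
    return ("plane" if by_plane > by_train else "train"), by_plane, by_train

# Toy embeddings in which the two stations are nearly aligned.
mode, _, _ = dominant_link([1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0.9, 0.1])
print(mode)  # train
```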



Figure 11. Similarity relationships between airports and stations of three different cities. The example suggests a tendency of traveling by train between Milan and Bologna, and by plane between Milan and Bari.

TheTheThe overall overall overall different different different simila similarity similarityrity behavior behavior of of ofcities, cities, towns towns and and and airports airports airports can can can be be be clearly clearly observed observed in in in thethethe normalized normalized normalized histograms histograms histograms of of ofFigure Figure Figure 12, 12 12, depicting, depictingdepicting th theireirtheir cosine cosinecosine similarity similarity distributions distributions towards towards towards all all allof of theofthethe locations locations locations in in inthe the the dataset. dataset. The The histograms histograms are are zoomed zoomedzoomed in in inbetween betweenbetween values valuesvalues 0.4 0.40.4 and and and 0.9 0.9 0.9 of of ofcosine cosine similarity.similarity.similarity. Cities Cities havehave have aa tendency tendencya tendency for for for larger larger larger percentages perc percentagesentages of of mid-low ofmid-low mid-low similarity similarity similarity locations locations locations (double (double (double in thein in theslotthe slot 0.4–0.5) slot 0.4–0.5) 0.4–0.5) and and a and lower a lowera percentagelower percentage percentage of high of ofhigh similarities high similarities similarities (half in(half (half the in slot inthe the 0.7–0.8) slot slot 0.7–0.8) 0.7–0.8) with respect with with respect torespect towns. to to towns.Airportstowns. Airports presentAirports present a present lower a percentage lowera lower percentage percentage of both of high ofboth both and high high low and and similarities, low low similarities similarities consequently, consequently, consequently having havinga having larger a a largernumberlarger number number of very oflow ofvery very similarity low low similarity similarity locations locations locations (cos < 0.4).(cos (cos < 0.4).< 0.4).



Cosine     Towns     Cities    Airports
0.4–0.5    0.0741    0.1414    0.0643
0.5–0.6    0.0212    0.0365    0.0137
0.6–0.7    0.0059    0.0054    0.0015
0.7–0.8    0.0016    0.0007    0.0002

Figure 12. Similarity distribution of towns, cities and airports in the slot cos = 0.4–0.9. Towns present higher percentages for high similarity slots, cities for mid-low similarity slots, and airports for very low slots.

We finally measured the average distance of city, town and airport locations with their top 20 highest similarity locations. Results are reported in Table 2. The behavior is even more distinctive if we consider how towns have a lower mean distance despite the higher sparsity of antennas in rural areas.

Table 2. Average distance with top 20 highest similarity locations.

Towns top 20 mean distance       14.59 km
Cities top 20 mean distance      23.99 km
Airports top 20 mean distance    72.45 km

4.2.2. Evaluation on Compositional Embeddings

The embedding representation allows for comparing traces on the basis of their behavioral meaning, not simply as the geographical distance between their COMs. Figure 13 shows five traces located in different areas of Italy. Comparing the reported cosine similarities between trace T1 and the other four, the behavioral distance between traces seems somehow to correspond to their geographical distance. As a general tendency, this is true when comparing different traces located in different separate areas over a large territory. However, this is not determined directly by their geographical proximity, but indirectly, because of the higher number of paths between places in adjacent areas: single locations are generally more connected to places within a certain maximum radius. In addition, it is interesting to notice that both T4 and T5 comprise an airport location: removing it from their composition function leads to a substantial drop in similarity, showing once again the influence of airports as centers of long-distance connectivity.
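The airport-exclusion check amounts to recomputing the average of Equation (2) without that location; a toy sketch (all vectors are invented, with the "airport" vector deliberately unlike the rest of its own trace but sharing a component with T1's area):

```python
import numpy as np

def cos_sim(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def trace_vector(locs):
    """Equation (2): mean of the location vectors in the trace."""
    return np.mean(np.asarray(locs, dtype=float), axis=0)

t1 = trace_vector([[0.2, 0.8, 0.3], [0.1, 0.9, 0.4]])
airport = [0.1, 0.2, 1.0]                    # long-distance hub profile
t4_locs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], airport]
with_airport = cos_sim(t1, trace_vector(t4_locs))
without_airport = cos_sim(t1, trace_vector(t4_locs[:-1]))
# In this toy setup, excluding the airport lowers the similarity,
# mirroring the drop reported for T4 and T5 in Figure 13.
```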


cos(T1, T2) = 0.896
cos(T1, T3) = 0.702
cos(T1, T4) = 0.392 (excl. airport = 0.334)
cos(T1, T5) = 0.344 (excl. airport = 0.284)

Figure 13. Similarity comparison between five traces distributed over a wide territory. Each of T4 and T5 comprises an airport location in the composition: similarities in brackets are reported excluding those airport locations from the traces.

Exploring trace behavior at a smaller scale, considering activities within small areas, reveals how relationships between trace vectors can go beyond spatial proximity, detecting more complex information than the COM position and trace direction in space alone. We noticed that higher similarities were obtained for traces located near the same common specific area, such as a lake, a coastline, a big city neighborhood or a confined rural area. Figure 14 shows three examples of this type, depicting traces with comparable COM distances but different behavioral representations. Traces near different sides of the same lake (a), traces reaching the same shore (b), and traces crossing the same city (c) are all cases where a common area of interest determines higher similarities for traces located in its proximity. We name an area as "of interest" if it is subjected to frequent paths of users largely moving over it. It is therefore defined by the motion behavior of people, not by geographical borders: a lake may be an area of interest not as such, but because of the similar embeddings of the places surrounding it, implying a considerable motion activity on that particular delimited territory. In other words, an area subjected to significant local connectivity "attracts" traces nearby, causing similar vector representations, due to the presence of highly connected locations (according to the general motion activity) in the different trace compositions.

A visitor embedding vector is a particular case of trace composition vector, in which the number of track points (and therefore locations) may be very high, hence covering a larger area, and not being part of a single continuous trace but belonging to all the traces of each specific user. As the way of constructing visitor embeddings is the same as for trace embeddings, what we inferred about traces is generally valid also for visitors.

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 17 of 24

ISPRS Int. J. Geo-Inf. 2019, 8, 134 17 of 23 comparison. As expected, the highest similarity is between V1 and V2, covering geographically closer areas, but V1 is more similar than V2 towards V3 and V4. This is because V1 covers a larger territory near bigger cities, whereas V2 is located in a coastal area made of small towns and more distant from big cities: as we reported previously, cities are centers of longer-distance connections. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 17 of 24
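The compositional step described above — averaging the vectors of the time-stamped locations belonging to a trace or visitor, then comparing the results with cosine similarity — can be sketched as follows. This is a minimal illustration: the embedding table is a random toy stand-in for the trained vectors, and `compose_embedding` is a hypothetical helper name, not code from the paper.

```python
import numpy as np

def compose_embedding(location_ids, location_vectors):
    # Average the vectors of the time-stamped locations of a trace or visitor.
    return np.stack([location_vectors[loc] for loc in location_ids]).mean(axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embedding table: location ID -> vector (in practice, learned by Skip-gram).
rng = np.random.default_rng(0)
location_vectors = {loc: rng.standard_normal(8) for loc in "ABCD"}

t1 = compose_embedding(["A", "B", "A"], location_vectors)  # a trace visiting A, B, A
t2 = compose_embedding(["A", "B", "C"], location_vectors)
print(cosine_similarity(t1, t2))
```

A visitor vector is obtained the same way, simply feeding in the time-stamped locations of all the traces of that user instead of a single continuous trace.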

(a) cos(T1, T2) = 0.693, cos(T1, T3) = 0.641; (b) cos(T1, T2) = 0.735, cos(T1, T3) = 0.699; (c) cos(T1, T2) = 0.826, cos(T1, T3) = 0.806

Figure 14. Three examples of traces with similar COM distances but different behavioral representation: (a) shows two traces (T1 and T2) near different sides of the same lake, and a third one (T3) near a different lake; (b) shows two traces (T1 and T2) reaching the same tract of coastline, and a third one (T3) traveling in the inland parallel to the coast; (c) shows two traces (T1 and T2) crossing the metropolitan area of Milan, and a third one (T3) traveling along the highway at the border of the city.



cos(V1, V2) = 0.625; cos(V1, V3) = 0.423; cos(V1, V4) = 0.373; cos(V2, V3) = 0.371; cos(V2, V4) = 0.320; cos(V3, V4) = 0.235

Figure 15. Similarity comparison between four visitors distributed over a wide territory. The red spots represent the area covered by the movements of each visitor. The highest similarity is between V1 and V2, located in geographically closer areas. However, V1 is more similar than V2 towards V3 and V4: V1 covers a larger territory along the Arno river near bigger cities such as Florence, Leghorn and Pisa, whereas V2 is located in a very touristic coastal area made of small towns and more distant from cities, centers of longer-distance motion behavior.



In addition to the inter-visitor analysis, an intra-visitor comparison can be performed to study particular motion patterns in the traveling behavior of a single user. The visited locations can be gathered in different groups, which in turn can be compared with each other. An example may be the study of a visitor’s behavior in different hours of the day. Figure 16 shows how the motion behavior of a user over several days is distributed during the morning, afternoon, evening and night, also reporting the cosine similarities between the different parts of the day. In particular, this example reports morning and evening having the highest similarity, while afternoon and night share the lowest similarity: the motion activity during morning and evening is mainly performed in a confined area, being centered on the same few locations, whereas it is spread over a wider territory in the afternoon and comprises one single location in the night.
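The intra-visitor, time-of-day comparison can be sketched by partitioning a user’s time-stamped locations into the four daily intervals and averaging per group. The helper names (`part_of_day`, `daypart_embeddings`) and the toy vectors are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def part_of_day(hour):
    # Map an hour of the day to the four intervals used in Figure 16.
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"

def daypart_embeddings(events, location_vectors):
    # events: list of (hour, location_id) pairs for one user over several days.
    # Returns one averaged embedding per part of the day.
    groups = {}
    for hour, loc in events:
        groups.setdefault(part_of_day(hour), []).append(location_vectors[loc])
    return {part: np.mean(vecs, axis=0) for part, vecs in groups.items()}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: compare a user's morning vs. afternoon behavior with toy vectors.
vectors = {"home": np.array([1.0, 0.2]), "office": np.array([0.1, 1.0])}
parts = daypart_embeddings([(8, "home"), (9, "home"), (14, "office")], vectors)
print(cosine(parts["morning"], parts["afternoon"]))
```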

cos(morning, afternoon) = 0.957 cos(morning, night) = 0.963 cos(afternoon, night) = 0.854 cos(morning, evening) = 0.978 cos(afternoon, evening) = 0.973 cos(evening, night) = 0.903

Figure 16. Similarity comparison between motion behaviors of a user (over several days) during different parts of the day: morning (6:00 a.m.–12:00 p.m.), afternoon (12:00 p.m.–6:00 p.m.), evening (6:00 p.m.–12:00 a.m.), and night (12:00 a.m.–6:00 a.m.). For each spot, we reported the percentage of time spent in every location for each of the four time intervals. Morning and evening are shown to have the highest similarity, while afternoon and night share the lowest similarity. The motion activity during morning and evening is mainly performed in the same few locations, whereas it comprises a wider territory in the afternoon and one single location in the night.

We finally performed an analysis on groups of visitors, in particular grouping users by nationality. The aim was to observe if there were common visiting patterns in the motion of visitors coming from various countries. Therefore, we averaged the embeddings of each time-stamped location for every user of the same nationality, and compared the newly obtained vectors representing each country. We reduced the vectors using t-SNE and plotted them to visually find clusters. As shown in Figure 17, countries with similar culture and geographical proximity often appear close to each other. Groups of nationalities belonging to the same areas are clearly visible: the majority of Eastern-European countries on the upper right (Slovakia, Czech Republic, Poland, Hungary, Slovenia, Croatia, Bosnia and Herzegovina, Serbia, Lithuania, Estonia, Romania, Bulgaria, Montenegro, and Macedonia); many countries of Central and Northern Europe on the lower right (Germany, Switzerland, Belgium, Luxembourg, Denmark, Norway, Finland, Ireland, United Kingdom, and Iceland); a group of English-speaking countries and former British colonies on the mid-lower part of the plot (United States, Canada, Australia, New Zealand, South Africa, and Hong Kong); a group of Asian countries on the mid-lower right (China, Indonesia, South Korea, and Japan); and again small groups of countries such as: Russia, Ukraine and Kazakhstan; France, Spain and Portugal; United Arab Emirates, Kuwait, Jordan and Saudi Arabia. Moreover, the central part of the plot mainly consists of various African and South American countries. This analysis suggests similar paths for users of neighboring countries and can be seen as an interesting starting point to study if and how belonging to specific countries or continents may determine characteristic visiting motion behaviors.
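The per-nationality aggregation can be sketched as below: all time-stamped location embeddings of every user of a nationality are pooled and averaged into one vector per country, which can then be projected with t-SNE. The function name `country_embeddings` and the input layout are illustrative assumptions.

```python
import numpy as np

def country_embeddings(user_locations, user_nationality, location_vectors):
    # Average the embeddings of each time-stamped location over all users
    # of the same nationality, yielding one behavioral vector per country.
    buckets = {}
    for user, locs in user_locations.items():
        country = user_nationality[user]
        buckets.setdefault(country, []).extend(location_vectors[l] for l in locs)
    return {c: np.mean(vs, axis=0) for c, vs in buckets.items()}

# 2-D projection for visual clustering (as in Figure 17), e.g. with scikit-learn:
#   from sklearn.manifold import TSNE
#   countries, vectors = zip(*country_embeddings(...).items())
#   coords = TSNE(n_components=2).fit_transform(np.stack(vectors))
```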

Figure 17. t-SNE reduction of the embedding vectors representing the motion behavior of visitors’ main nationalities in Italy.

4.3. Discussion

Mot2vec is a method for creating dense vectors of locations, traces and visitors, based on the motion behavior of people. The geography of places is ignored; only the trajectories passing from one location to another are used to construct the embedding vectors. Even though very close locations have higher chances of being behaviorally similar than ones in distant regions, we showed that spatial proximity and behavioral proximity are not proportional: there are cases where places more distant in space are also more similar, especially if traversed by popular routes, always depending on how people move over the territory. We reported how places in small towns and rural areas tend to have higher similarities with nearby locations (local movement), whereas, in big cities, and even more in correspondence with stations and airports, top similarities are lower in absolute value and more distributed over a larger number of places, often comprising more distant locations (longer-distance movement). Therefore, the similarity distribution varies according to the type of location considered, potentially helping to identify the kind of place, such as belonging to a rural area, a city, or an airport.
Similarity analysis may also be used to reveal particular functions of places within a certain area: higher similarities between two locations in two different cities can identify centers of connection, such as the train stations connecting those cities; or, if we already know the locations of train stations, bus stations and airports within two different cities, the highest similarity can reveal which means of transportation people generally use to move between those two cities.

Embeddings of traces can be constructed from the vectors of single locations. In this way, trace comparison assumes a different meaning than simply measuring the distance between COMs, acquiring a behavioral flavor, being influenced by areas of interest (limited groups of locations having very similar representations, often belonging to specific areas such as borders of lakes, tracts of coastline, city metropolitan areas, or confined rural territories). As for places, traces in the same regions tend to have a more similar representation than traces located far away. However, on a smaller scale, different behaviors emerge: traces with comparable COM distances may have a different cosine similarity due to the presence of areas of interest influencing the representations of the traces located nearby. We showed indeed cases of traces with comparable distances and the same direction in space (or the same angles between directions), where the ones in proximity of the same area of interest have a higher similarity.

Moreover, embeddings of visitors can be created in order to compare behaviors not only between different users, but also between groups of traces belonging to the same user. We showed how the motion of a user can be studied according to the time of the day, displaying for example that a particular user has a lower similarity during specific hours of the day as compared to the other hours. This gives an idea about the motion patterns of a user’s activities, revealing under which conditions the behavior is substantially different from usual. We finally studied similarities between the mobility of users grouped by nationality, pointing it out as a relevant factor in understanding the motion of people, whose movements appear to be related to cultural and geographic affinity between users’ nationalities. This implicitly highlights how nationality can be taken into consideration as a potential feature for helping predict movements of foreign visitors. Depending on research purposes, any other type of grouping can be performed, using embedding vectors to inspect similarities between different categories.

In conclusion, embeddings have the advantage of meaningfully representing locations on the basis of users’ motion, even over a wide territory, and are suitable for different applications. The main scenarios are related to similarity searching, clustering approaches, and prediction algorithms. Comparisons of similar places can be performed in order to quickly identify which locations a place is more connected to, which routes are often traveled by users, where the flux of people mainly flows between locations, and which places belong to highly connected delimited areas of the territory. Clustering can be used on locations, users and groups of users, uncovering interesting associations, whereas deeper studies on the classification of location types can take advantage of their different similarity distributions. We can finally utilize location embeddings as a pre-processing step for prediction models, expecting a performance improvement over traditional representations.
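As a concrete illustration of the similarity-searching scenario, the top-k locations most behaviorally connected to a query place can be retrieved directly from the embedding matrix. The matrix below is a random toy stand-in for the trained vectors, and `top_k_similar` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def top_k_similar(query_id, ids, matrix, k=3):
    # Return the k location IDs whose embeddings have the highest cosine
    # similarity with the query location's embedding.
    normed = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = normed @ normed[ids.index(query_id)]
    order = np.argsort(-sims)  # indices sorted by descending similarity
    return [(ids[i], float(sims[i])) for i in order if ids[i] != query_id][:k]

# Toy stand-in for the trained location embeddings.
ids = ["loc_%d" % i for i in range(10)]
matrix = np.random.default_rng(1).standard_normal((10, 16))
print(top_k_similar("loc_0", ids, matrix))
```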

5. Conclusions

In this paper, we explore dense vector representations of locations, traces and visitors, constructed from large-scale mobility data. The Mot2vec model for generating embeddings consists of two steps, the trajectory pre-processing and the Skip-gram Word2vec-based model building. Aiming at constructing the input data to Skip-gram, we transformed the original trajectories into location sequences by fixing a time step, in order to encode time information implicitly in the position along the sequence. Mobility traces are converted to sequences of locations unfolding in fixed time steps, where, if more than one location event falls within a single time step, the one with the most occurrences is chosen to represent the location of the user. Thereafter, the Skip-gram Word2vec model is used to construct the location embeddings, being one of the most efficient techniques to define dense vectors of meaning. Embeddings of traces and visitors are finally created by averaging the vectors of the time-stamped locations belonging to each specific trace or visitor.

In general, embeddings obtained from the motion behavior of people were revealed to be a meaningful representation of locations, allowing a direct way of comparing locations’ connections and providing analogous similarity distributions for places of the same type. They also allow identifying common areas of interest over the territory and common motion behaviors of users, finding similarities intra-user, inter-user and inter-group.

There are several potential extensions of this paper. In particular, embedding representations can be tested in various applications, either fed into machine learning models or used as a basis for similarity searching and clustering approaches: comparison and clustering of similar places and people’s behaviors, pre-processing for predictive models, studies of place connectivity and human motion over the territory, and information retrieval on the functionality of places. As we studied behavioral similarities between nationalities, different types of user categories can be explored to find useful features that can be utilized in further applications. Moreover, we used a dataset composed of foreign visitors’ trajectories spread over a whole country, but different datasets can be employed, comprising motion traces belonging to a different type of users, or dealing with different territory dimensions, for example focusing on a particular region. Depending on the type of source data, different resolutions in time and space can be explored; in particular, GPS data would allow finer resolutions than telecom data. To conclude, as word embeddings are now employed in practically every NLP task related to meaning, geo-embeddings are meaningful representations that can potentially be used for various objectives related to human motion behavior and applied to a wide range of applications dealing with mobility traces.
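The two-step pipeline summarized above — fixed-time-step location sequences fed to Skip-gram — can be sketched as follows. The binning helper `to_location_sequence` and the hyperparameter values are illustrative assumptions; the Skip-gram step is shown with gensim’s standard API.

```python
from collections import Counter

def to_location_sequence(events, step_seconds=3600):
    # Convert a raw trace of (timestamp, location_id) events into a sequence
    # of locations unfolding in fixed time steps; within each step, the
    # location with the most occurrences represents the user's position.
    if not events:
        return []
    start = min(t for t, _ in events)
    bins = {}
    for t, loc in events:
        bins.setdefault(int((t - start) // step_seconds), []).append(loc)
    return [Counter(bins[i]).most_common(1)[0][0] for i in sorted(bins)]

# Each sequence plays the role of a "sentence" and locations the "words".
# Skip-gram training, e.g. with gensim (hyperparameters are illustrative):
#   from gensim.models import Word2Vec
#   model = Word2Vec(sequences, vector_size=128, window=5, sg=1, min_count=1)
#   vec = model.wv[some_location_id]
```

Note that this sketch simply skips time steps with no events; handling such gaps (e.g. carrying the last known location forward) is a design choice left open here.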

Author Contributions: A.C. conceived and designed the experiments, analyzed the data and wrote the paper. E.B. supervised the work, helped with designing the conceptual framework, and edited the manuscript.

Funding: This research was funded by the Austrian Science Fund (FWF) through the Doctoral College GIScience at the University of Salzburg (DK W 1237-N23).

Acknowledgments: The authors would like to thank Vodafone Italia for providing the dataset for the case study.

Conflicts of Interest: The authors declare no conflict of interest.


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).