Dynamic Content Based Ranking
Seppo Virtanen
University of Cambridge

Mark Girolami
University of Cambridge and The Alan Turing Institute

Abstract

We introduce a novel state space model for a set of sequentially time-stamped partial rankings of items and textual descriptions for the items. Based on the data, the model infers text-based themes that are predictive of the rankings, enabling forecasting tasks and trend analysis. We propose a scaled Gamma process based prior for capturing the underlying dynamics. Based on two challenging and contemporary real data collections, we show that the model infers meaningful and useful textual themes and performs better than existing related dynamic models.

1 INTRODUCTION

Social media is becoming increasingly prevalent, affecting and altering population behaviour. Users of social media are interactively exposed to and distribute content online, creating communities or networks of similar users. Within these communities users may amplify and disseminate specific content that may become viral, analogous to the rapid spread of disease. It is highly relevant to analyse exposure, uncovering the complex dynamics of user interaction with social media, and to detect viral content.

For example, Facebook and Twitter are replacing traditional news providers, whereas Movielens and Netflix are changing the consumption and distribution of movies. Users are exposed to items, such as news articles or movies, and provide content by sharing news articles or rating movies.

We propose a general and natural framework to analyse exposure and uncover dynamics by analysing partial rankings of items based on their population popularity or relevance within consecutive time frames. Raw counts of likes, shares or ratings may naturally be used to compute the rankings of the top-M items, reflecting the relative order of items. Viral content may, first, directly attain the top rank or evolve more slowly towards the top over consecutive rankings, second, dominate the top ranks for some time, and, finally, decline towards lower ranks, eventually disappearing from the rankings.

In this work, we present a novel model for dynamic partial rankings of items that may be accompanied by text data providing rich information about the item content. Our main goal is to uncover textual content that explains the dynamic rankings and to use it for exploration and summarisation in an easily interpretable manner, and for prediction. Based on two real data collections of movies and news, using user-provided tags/keywords and actual news articles, respectively, we show that the proposed model is useful for ranking prediction and trend analysis, and is able to capture the complex dynamics of viral content.

In the following sections, we first introduce the model accompanied by an MCMC algorithm for posterior inference. Section 3 discusses related work. Then we present the experiments and results. The final section concludes the paper.

2 MODEL

2.1 Data

In this work, we consider partial or top-M time-stamped rankings, that is, partial permutations, of items $y^{(t)} = \{y_1^{(t)}, \ldots, y_M^{(t)}\}$, where $t = 1, \ldots, T$ and $m = 1, \ldots, M$ denote time stamps and rank positions ($m = 1$ being the highest position), respectively. Each $y_m^{(t)}$ provides an identifier (for example, an integer) into a set of items $\mathcal{I}$.
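For concreteness, such data can be stored as an array of item identifiers per time stamp. The following is a minimal Python sketch; the array layout and the toy identifiers are our own illustrative assumptions, not part of the paper.

```python
import numpy as np

# Toy top-M rankings: T = 3 time stamps, M = 4 rank positions.
# y[t, m] holds the identifier of the item ranked (m+1)-th at time t;
# position m = 0 is the highest rank.
y = np.array([
    [7, 2, 9, 0],   # ranking at t = 1
    [2, 7, 5, 9],   # ranking at t = 2
    [2, 5, 7, 1],   # ranking at t = 3
])

# The item set I is simply the union of identifiers that ever appear;
# items outside the top-M at a given time are unobserved at that time.
items = np.unique(y)
print(items)  # [0 1 2 5 7 9]
```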
2.2 Dynamic Plackett-Luce Model

We present a Plackett-Luce (PL)-based (Luce, 2005; Plackett, 1975) state space model for the rankings: the probability of a ranking $y^{(t)}$ is

$$\mathrm{PL}(y^{(t)}) = \prod_{m=1}^{M} \frac{\hat{z}_{y_m^{(t)}}^{\top} w^{(t)}}{\sum_{m'=m}^{M} \hat{z}_{y_{m'}^{(t)}}^{\top} w^{(t)}} = \frac{\hat{z}_{y_1^{(t)}}^{\top} w^{(t)}}{\sum_{m'=1}^{M} \hat{z}_{y_{m'}^{(t)}}^{\top} w^{(t)}} \times \frac{\hat{z}_{y_2^{(t)}}^{\top} w^{(t)}}{\sum_{m'=2}^{M} \hat{z}_{y_{m'}^{(t)}}^{\top} w^{(t)}} \times \cdots \times \frac{\hat{z}_{y_{M-1}^{(t)}}^{\top} w^{(t)}}{\sum_{m'=M-1}^{M} \hat{z}_{y_{m'}^{(t)}}^{\top} w^{(t)}}, \qquad (1)$$

where $\hat{z}_d \in \Delta^K$, for $y_m^{(t)} = d$, denotes $K$ item-specific compositional features, which are positive and sum to one over $K$, for the $d$th item, and $w^{(t)}$, for $t = 1, \ldots, T$, denote positively-valued feature-specific dynamic regression coefficients. The rankings depend on item-specific (positive) scores $\lambda_d^{(t)} = \hat{z}_d^{\top} w^{(t)}$, which may be interpreted as a weighted average of the feature-specific scores $w_k^{(t)}$, for $k = 1, \ldots, K$. Even though the total number of items is not known, the model is internally consistent, meaning that there is no need to take into account all possible items. Instead, only items that appear in the partial rankings are required (Hunter, 2004; Guiver and Snelson, 2009). The likelihood (1) may be intuitively decomposed to illustrate a sampling process of items without replacement, proportionally to the scores, such that the denominator for $m = i$ excludes items that occurred in $y^{(t)}$ at ranks $m < i$. The higher the score, the better the rank.

Following Guiver and Snelson (2009), we adopt a Gamma-based prior for the scores. We present a dynamic Gamma process for the regression coefficients (that is, the feature-specific scores),

$$w_k^{(t)} \sim \mathrm{Gamma}\!\left(\tau, \frac{\tau}{w_k^{(t-1)}}\right), \qquad w_k^{(1)} \sim \mathrm{Gamma}(\alpha_0, \beta_0), \qquad (2)$$

for $k = 1, \ldots, K$ and $t = 2, \ldots, T$, and assume $\tau \sim \mathrm{Gamma}\big(\alpha_0^{(\tau)}, \beta_0^{(\tau)}\big)$.

For $t \geq 2$, the process may be intuitively interpreted via the scaling property of Gamma-distributed random variables: for

$$\delta_k^{(t)} \sim \mathrm{Gamma}(\tau, \tau) \quad \text{and} \quad w_k^{(t-1)} > 0, \qquad w_k^{(t)} = \delta_k^{(t)} w_k^{(t-1)} \sim \mathrm{Gamma}\!\left(\tau, \frac{\tau}{w_k^{(t-1)}}\right).$$

Here, $\delta_k^{(t)}$ represents a random multiplicative constant of the dynamical process. The prior mean of $w_k^{(t)}$ is given by $\mathbb{E}\big[w_k^{(t)}\big] = w_k^{(t-1)}$, irrespective of $\tau$, whereas $\tau$ affects the prior variation, $\mathrm{Var}\big[w_k^{(t)}\big] = \big(w_k^{(t-1)}\big)^2/\tau$, and specifies the shape of the distribution through controlling the skewness $2/\sqrt{\tau}$ and kurtosis $6/\tau$. Figure 1 illustrates simulated coefficients. Depending on the value of $\tau$, the prior is flexible enough to account for smoothness, burstiness and self-excitation. For large values of $\tau$, the coefficients are smooth and vary slowly in time. For smaller $\tau$, the coefficients may experience burstiness and encourage self-excitation via increased variance.

Figure 1: Simulated regression coefficients over 100 time steps for different values of $\tau$ ((a) $\tau = 1$, (b) $\tau = 10$, (c) $\tau = 100$), for an initial value of $w^{(1)} = 1$. For $\tau \leq 1$, the curves fluctuate strongly and eventually fall close to zero, and for $\tau \geq 100$ the curves become smooth with values increasingly close to 1 (not shown).

Further, for $1 < t < T$, the prior conditional distribution of $w_k^{(t)}$ given $\{w_k^{(t')} : t' \neq t\}$, proportional to

$$p\big(w_k^{(t)} \mid \{w_k^{(t')} : t' \neq t\}\big) \propto p\big(w_k^{(t+1)} \mid w_k^{(t)}\big)\, p\big(w_k^{(t)} \mid w_k^{(t-1)}\big), \qquad (3)$$

corresponds to a generalised inverse Gaussian (GIG) distribution,

$$w_k^{(t)} \sim \mathrm{GIG}\!\left(0,\; \frac{2\tau}{w_k^{(t-1)}},\; 2\tau w_k^{(t+1)}\right) \propto \big(w_k^{(t)}\big)^{-1} \exp\!\left(-\tau\!\left(\frac{w_k^{(t)}}{w_k^{(t-1)}} + \frac{w_k^{(t+1)}}{w_k^{(t)}}\right)\right), \qquad (4)$$

which illustrates how the prior naturally takes into account coefficient ratios between forward and backward time steps, and also imposes sparsity due to the (logarithmic) penalty term, $-\log\big(w_k^{(t)}\big)$.
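To make the without-replacement reading of (1) concrete, here is a minimal Python sketch that evaluates the log-likelihood of one partial ranking; the function name, array layout and toy values are our own illustrative assumptions, not from the paper.

```python
import numpy as np

def pl_log_likelihood(y_t, Z_hat, w_t):
    """Log of the Plackett-Luce probability (1) of one top-M ranking.

    y_t   : (M,) integer item identifiers, highest rank first.
    Z_hat : (|I|, K) compositional features; each row is positive, sums to one.
    w_t   : (K,) positive feature-specific coefficients at time t.
    """
    scores = Z_hat[y_t] @ w_t  # item scores lambda_d^(t) = z_d^T w^(t)
    log_lik = 0.0
    for m in range(len(y_t)):
        # The denominator only sums over items not yet drawn (ranks >= m);
        # the final factor (m = M) equals one and contributes zero.
        log_lik += np.log(scores[m]) - np.log(scores[m:].sum())
    return log_lik

# Toy example with |I| = 3 items and K = 2 features.
Z_hat = np.array([[0.8, 0.2],
                  [0.5, 0.5],
                  [0.1, 0.9]])
w_t = np.array([1.5, 0.5])
print(pl_log_likelihood(np.array([2, 0, 1]), Z_hat, w_t))
```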
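The scaled Gamma process (2) and the GIG full conditional (4) are both straightforward to sample. The sketch below reproduces Figure 1-style trajectories and draws from (4) via SciPy's geninvgauss; the function names are ours, and the mapping from the three-parameter $\mathrm{GIG}(0, a, b)$ to SciPy's (p, b, scale) parameterisation is our own derivation, not from the paper.

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(0)

def simulate_gamma_process(tau, T=100, w1=1.0):
    """Forward simulation of (2): w^(t) ~ Gamma(shape=tau, rate=tau / w^(t-1))."""
    w = np.empty(T)
    w[0] = w1
    for t in range(1, T):
        # NumPy's gamma takes shape and scale (= 1 / rate), so the prior mean
        # is shape * scale = w^(t-1) and the variance is (w^(t-1))^2 / tau.
        w[t] = rng.gamma(shape=tau, scale=w[t - 1] / tau)
    return w

for tau in (1.0, 10.0, 100.0):  # the three regimes shown in Figure 1
    path = simulate_gamma_process(tau)
    print(f"tau={tau:>5}: min={path.min():.3f}, max={path.max():.3f}")

def sample_gig_conditional(tau, w_prev, w_next):
    """One draw from (4), GIG(0, 2*tau / w_prev, 2*tau * w_next).

    Matching densities against SciPy's geninvgauss(p, b, scale) gives
    scale = sqrt(w_next * w_prev) and b = 2 * tau * sqrt(w_next / w_prev).
    """
    scale = np.sqrt(w_next * w_prev)
    b = 2.0 * tau * np.sqrt(w_next / w_prev)
    return geninvgauss.rvs(p=0.0, b=b, scale=scale, random_state=rng)

print(sample_gig_conditional(tau=10.0, w_prev=1.2, w_next=0.8))
```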
2.3 Constructing Features

Assuming $K = |\mathcal{I}|$ and a diagonal feature matrix, such that $\hat{z}_{d,d} = 1$ and zero for all off-diagonal elements, we obtain item-specific dynamic scores, $\lambda_d^{(t)} = w_d^{(t)}$. However, for this simple choice the number of parameters increases rapidly with $|\mathcal{I}|$. Further, the model is unable to generalise to new items and fails to properly leverage statistical strength between the coefficients. To overcome these limitations, we assume text-based data exist for the items.

For $y_m^{(t)} = d$, let $x_d = \{x_{d,1}, \ldots, x_{d,N_d}\}$ denote a set of bag-of-words text data, containing $N_d$ word tokens $x_{d,n} \in \mathcal{V}$ over a word vocabulary $\mathcal{V}$. In the following, we refer to $x_d$ as a (textual item-specific) document. The text data is available for each unique item, $d = 1, \ldots, |\mathcal{I}|$.

A straightforward approach would be to regress directly on the empirical word distributions, that is, $K = |\mathcal{V}|$ and

$$\hat{z}_{d,k} = \frac{1}{N_d} \sum_{n=1}^{N_d} \#[x_{d,n} = k],$$

for $k = 1, \ldots, |\mathcal{V}|$, inferring word-specific scores. However, text data is often high-dimensional and sparse, complicating inference. Additionally, we know that word usage exhibits complex dependencies, so treating words independently would not be optimal; hence, it would be useful to share statistical strength via joint modelling. Thus, we present a joint model where the topic assignments depend both on the text data and on the rankings, via (5) and (1), respectively.

2.4 MCMC for Posterior Inference

Following Griffiths and Steyvers (2004), we adopt (partially) collapsed Gibbs sampling for inference for the topic assignments, analytically marginalising out topics and topic proportions. The probability that the word $x_{d,n} = i$ is assigned to the $k$th topic is given by

$$p(z_{d,n} = k) \propto \left(N_{d,k}^{-(dn)} + \alpha\right) \times \frac{G_{k,i}^{-(dn)} + \gamma}{\sum_{j=1}^{|\mathcal{V}|} G_{k,j}^{-(dn)} + \gamma} \times \prod_{t} \cdots$$
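The first two factors above are the standard collapsed Gibbs update for latent Dirichlet allocation; the product over $t$, truncated in this excerpt, reweights each candidate topic by the ranking likelihood (1). Below is a minimal sketch of the text-only part, with the ranking factor left as an explicit stub; all names, counts and hyperparameter values are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def topic_assignment_probs(d, i, N_dk, G_kv, alpha, gamma):
    """Normalised probabilities that token x_{d,n} = i takes each topic k.

    N_dk : (D, K) document-topic counts with the current token removed.
    G_kv : (K, |V|) topic-word counts with the current token removed.
    Implements the two text factors of the collapsed Gibbs update, using
    the standard |V| * gamma smoothing in the denominator.
    """
    doc_term = N_dk[d] + alpha                           # N_{d,k}^{-(dn)} + alpha
    word_term = (G_kv[:, i] + gamma) / (G_kv.sum(axis=1) + G_kv.shape[1] * gamma)
    p = doc_term * word_term
    # Stub: the full model would also multiply in prod_t PL(y^(t)) evaluated
    # with z_{d,n} set to each candidate topic k, per the (truncated) update.
    return p / p.sum()

# Toy counts: D = 2 documents, K = 3 topics, |V| = 5 vocabulary words.
N_dk = np.array([[2.0, 0.0, 1.0],
                 [0.0, 3.0, 1.0]])
G_kv = rng.integers(0, 4, size=(3, 5)).astype(float)
probs = topic_assignment_probs(d=0, i=2, N_dk=N_dk, G_kv=G_kv,
                               alpha=0.1, gamma=0.01)
new_topic = rng.choice(3, p=probs)   # Gibbs step: resample the token's topic
print(probs, new_topic)
```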