Paired-Dual Learning for Fast Training of Latent Variable Hinge-Loss MRFs: Appendices
A. Probabilistic Soft Logic

In this supplement, we describe the models used in our experiments using probabilistic soft logic (PSL) (Bach et al., 2015), a language for defining hinge-loss potential templates. PSL's variables are logical atoms, and its rules use logical operators such as conjunction and implication to define dependencies between these variables. All variables are continuous in the [0, 1] interval. Conjunction of Boolean variables X ∧ Y is generalized to continuous variables using the hinge function max{X + Y − 1, 0}, which is known as the Łukasiewicz t-norm. Disjunction X ∨ Y is relaxed to min{X + Y, 1}, and negation ¬X is relaxed to 1 − X. To define a model, PSL rules are grounded out with all possible substitutions for logical terms. The groundings define hinge-loss potentials that share the same weight and whose values are the ground rule's distance to satisfaction, e.g., for X → Y the distance to satisfaction is max{X − Y, 0}.

In PSL, rules consist of a conjunction of literals in the body and a disjunction of literals in the head of the rule. The continuous interpretation of the rule becomes a hinge-loss function for the rule's distance to satisfaction. Finally, each rule is annotated with a non-negative weight, which is the parameter shared across all potentials templated by that rule.

B. Discovering Latent Groups in Social Media

Data set. The data set from Bach et al. (2013a) is roughly 4.275M tweets collected from about 1.350M Twitter users via a query that focuses on South American users. The tweets were collected from Oct. 6 to Oct. 8, 2012, a 48-hour window around the Venezuelan presidential election on Oct. 7. The two major candidates were Hugo Chávez, the incumbent, and Henrique Capriles. Chávez won with 55% of the vote.

The goal is to learn a model that relates language usage and social interactions to latent group membership. We first identify 20 users as top users based on being the most retweeted or, in the case of the state-owned television network's account, being of particular interest.

We organize the variables in our model using PSL predicates. Whether each regular user tweeted a hashtag is represented with the PSL predicate USEDHASHTAG. Tweets that mention or retweet a top user are not counted, since they are too closely related to the target interaction. For example, if User 1 tweets the hashtag #hayuncamino then USEDHASHTAG(1, #hayuncamino) has an observed truth value of 1.0. The PSL predicate REGULARUSERLINK represents whether a regular user retweeted or mentioned any user in the full data set that is not a top user, regardless of whether that mentioned or retweeted user is a regular user. Whether a regular user retweeted or mentioned a top user is represented with the PSL predicate TOPUSERLINK. Finally, the latent group membership of each regular user is represented with the PSL predicate INGROUP.

Latent group model. We construct an HL-MRF model for predicting interactions of regular users with top users via latent group membership. We treat atoms with the USEDHASHTAG or REGULARUSERLINK predicate as the set of conditioning variables x, atoms with the TOPUSERLINK predicate as the set of target variables y, and atoms with the INGROUP predicate as the set of latent variables z.

When defining our model, let H be the set of hashtags used by at least 15 different regular users (|H| = 33), let T be the set of top users (|T| = 20), and let G = {g0, g1} be the set of latent groups.

We first include rules that relate hashtag usage to group membership. For each hashtag in H and each latent group, we include a rule of the form

w_{h,g} : USEDHASHTAG(U, h) → INGROUP(U, g)    ∀h ∈ H, ∀g ∈ G

so that there is a different rule weight governing how strongly each commonly used hashtag is associated with each latent group.
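The Łukasiewicz relaxations and the distance-to-satisfaction hinge described above can be sketched in a few lines of Python. This is an illustrative sketch only; the function names and example truth values are hypothetical and this is not the PSL library's implementation:

```python
# Illustrative sketch of the Lukasiewicz relaxations used by PSL.
# Function names and example truth values are hypothetical, not PSL's API.

def luk_and(x, y):
    """Conjunction x AND y relaxed to max{x + y - 1, 0} (Lukasiewicz t-norm)."""
    return max(x + y - 1.0, 0.0)

def luk_or(x, y):
    """Disjunction x OR y relaxed to min{x + y, 1}."""
    return min(x + y, 1.0)

def luk_not(x):
    """Negation NOT x relaxed to 1 - x."""
    return 1.0 - x

def distance_to_satisfaction(body, head):
    """For a ground rule body -> head, the potential is max{body - head, 0}:
    zero whenever the head is at least as true as the body."""
    return max(body - head, 0.0)

# A ground rule such as USEDHASHTAG(1, h) -> INGROUP(1, g) with observed
# USEDHASHTAG(1, h) = 1.0 and current INGROUP(1, g) = 0.25 is 0.75 away
# from satisfaction; its weighted potential pushes INGROUP(1, g) upward.
print(distance_to_satisfaction(1.0, 0.25))  # 0.75
```

Because the template weight is shared by every grounding, learning a single parameter such as w_{h,g} adjusts all ground potentials produced by that rule at once.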
Recorded Twitter interactions form the features in this model. We identify all other users that either retweeted or mentioned at least one of the top users and used at least one hashtag in a tweet that was not a mention or a retweet of a top user. Filtering by these criteria, the set contains 1,678 regular users (i.e., users that are not top users).

Second, we include a rule associating social interactions with group commonality:

w_social : REGULARUSERLINK(U1, U3) ∧ REGULARUSERLINK(U2, U3) ∧ U1 ≠ U2 ∧ INGROUP(U1, G) → INGROUP(U2, G).

This rule encodes the intuition that regular users who interact with the same people on Twitter are more likely to belong to the same latent group. Adding this rule leverages one of the advantages of general log-linear models with latent variables: the ability to easily include dependencies among latent variables. Third, we include rules of the form

w_{g,t} : INGROUP(U, g) → TOPUSERLINK(U, t)    ∀g ∈ G, ∀t ∈ T

for each latent group and each top user so that there is a parameter governing how strongly each latent group tends to interact with each top user. Last, we constrain the INGROUP atoms for each regular user to sum to 1.0, making INGROUP a mixed-membership assignment.

We specify initial parameters w by initializing w_{h,g} to 2.0 for all hashtags and groups, w_social to 2.0, and w_{g,t} to 5.0 for all top users and groups, except two hashtags and two top users which we assign as seeds. We initially associate the top user hayuncamino (Henrique Capriles's campaign account) and the hashtag for Capriles's campaign slogan #hayuncamino with Group 0 by initializing the parameters associating them with Group 0 to 10.0 and those associating them with Group 1 to 0.0. We initially associate the top user chavezcandanga (Hugo Chávez's account) and the hashtag for Chávez's campaign slogan #elmundoconchávez with Group 1 in the same way.

For entropy surrogates we add the following rules, all with fixed weights of 10.0:

w_entropy : ¬INGROUP(U, g)    ∀g ∈ G,
w_entropy : ¬TOPUSERLINK(U1, U2).

The full results for all ten folds are presented in Figures 1, 2, 3, and 4.

C. Modeling Latent User Features in Trust Networks

We build our model based on that of Huang et al. (2013), which encodes rules consistent with triadic closure in social networks. Instead, we include rules for all possible configurations of directed triads, including those that do not imply balanced behavior, so the learning algorithm can attribute weight to any configuration if it helps optimize its objective. Removing symmetries, there are 12 distinct logical formulas. Four are for a cyclic structure:

w_cyc^1 : TRUSTS(A, B) ∧ TRUSTS(B, C) → TRUSTS(C, A),
w_cyc^2 : TRUSTS(A, B) ∧ ¬TRUSTS(B, C) → TRUSTS(C, A),
w_cyc^3 : ¬TRUSTS(A, B) ∧ ¬TRUSTS(B, C) → TRUSTS(C, A),
w_cyc^4 : ¬TRUSTS(A, B) ∧ ¬TRUSTS(B, C) → ¬TRUSTS(C, A).

And eight are for a non-cyclic "v" structure:

w_v^1 : TRUSTS(A, B) ∧ TRUSTS(B, C) → TRUSTS(C, B),
w_v^2 : TRUSTS(A, B) ∧ ¬TRUSTS(B, C) → ¬TRUSTS(C, B),
w_v^3 : ¬TRUSTS(A, B) ∧ TRUSTS(B, C) → ¬TRUSTS(C, B),
w_v^4 : ¬TRUSTS(A, B) ∧ ¬TRUSTS(B, C) → TRUSTS(C, B),
w_v^5 : TRUSTS(A, B) ∧ TRUSTS(B, C) → ¬TRUSTS(C, B),
w_v^6 : TRUSTS(A, B) ∧ ¬TRUSTS(B, C) → TRUSTS(C, B),
w_v^7 : ¬TRUSTS(A, B) ∧ TRUSTS(B, C) → TRUSTS(C, B),
w_v^8 : ¬TRUSTS(A, B) ∧ ¬TRUSTS(B, C) → ¬TRUSTS(C, B).

We also include pairwise interactions:

w_pair^+ : TRUSTS(A, B) → TRUSTS(B, A),
w_pair^− : ¬TRUSTS(A, B) → ¬TRUSTS(B, A).

To add latent variable reasoning, we add predicates TRUSTING and TRUSTWORTHY that take a single actor as input. The rules

w_latent^1 : TRUSTING(A) → TRUSTS(A, B),
w_latent^2 : TRUSTWORTHY(B) → TRUSTS(A, B),
w_latent^3 : TRUSTING(A) ∧ TRUSTWORTHY(B) → TRUSTS(A, B)

infer trust from these latent predicates, and the rules

w_latent^4 : TRUSTS(A, B) → TRUSTING(A),
w_latent^5 : TRUSTS(A, B) → TRUSTWORTHY(B)

infer the latent values from other trust predictions and observations. All rules are initialized to weights of 1.0. Note that in this problem the structure of the social network is observed, so these rules are grounded only for TRUSTS(A, B) atoms where A and B are observed to know each other.

For entropy surrogates, we use the following rules, all with fixed weights of 10.0:

w_entropy : TRUSTS(A, B),
w_entropy : ¬TRUSTS(A, B),
w_entropy : TRUSTING(A),
w_entropy : ¬TRUSTING(A),
w_entropy : TRUSTWORTHY(A),
w_entropy : ¬TRUSTWORTHY(A).

The full results for all eight folds are presented in Figures 5, 6, and 7.

D. Image Reconstruction

The latent HL-MRF model we use for image reconstruction reasons over variables representing the brightness of pixel values BRIGHT, a binary, thresholded brightness of observed pixels (i.e., an indicator of whether they have intensity greater than 0.5) BINARY, and a set of six latent states LATSTATE. The intuition behind the model is that the observed pixel intensities and the thresholded intensities provide ev-

... that share bright and dark pixel locations with the seed images. Starting with this initialization, which includes no information about the unthresholded pixel intensities, the learning algorithms fit the models to also predict pixel intensity. Figure 8 shows details of the learned model and example reconstructions.

E.