Inducing Writers' Values on Concept Ordering from Microblog

DBSJ Journal Vol. 16, No. 2 Regular Paper March 2018

Tatsuya IWANARI (Non-member; Recruit Holdings Co., Ltd.)
Naoki YOSHINAGA (Member; Institute of Industrial Science, the University of Tokyo)
Masashi TOYODA (Member; Institute of Industrial Science, the University of Tokyo)
Masaru KITSUREGAWA (Member; Institute of Industrial Science, the University of Tokyo; National Institute of Informatics)

This article proposes a robust method of inducing microblog writers' values on concept ordering in a specific domain (e.g., genders, residential areas, and time series) from their writings in the domain. The values on concept ordering are represented by sets of ordered concepts (e.g., London, Berlin, and Rome) in accordance with a common attribute intensity expressed by an adjective (e.g., entertaining). Existing methods infer social-media users' values by aggregating various pieces of evidence for the given concepts and adjective from their writings, but suffer from a data sparseness problem when the target domain becomes more specific, since it is more difficult to gather a sufficient amount of evidence from less data. We therefore introduce two techniques to solve the data sparseness problem: 1) exploiting adjectives whose intensity correlates with that of the target adjective (e.g., heavy for large) and 2) referring to concept orderings in more general domains where more text is available than in the target domain. We evaluate our method on real-world concept orderings with various domains on our 5-year microblog (Twitter) archive.

1. Introduction

We make decisions every day by ordering two or more concepts on the basis of common knowledge or common sense that we have. For example, imagine a situation in which we buy fruit juice. If we want something sweet to drink, we choose apple juice rather than lemon juice because we know that apples are generally sweeter than lemons. On the other hand, when we want to investigate unfamiliar things or concepts (e.g., Gypsophila), we typically endeavor to understand the concept by comparing or ordering it with similar and familiar concepts (e.g., Rose and Carnation) from various perspectives (e.g., beautiful, cheap) and specific standpoints (e.g., for women, in spring). At present, people are forced to spend a substantial amount of time wading through massive amounts of text to get an overview of others' opinions, or spend a lot of money to call for votes from experts or crowd workers in order to derive a convincing ordering.

Motivated by these situations, Nishina et al. [15] initiated a task of ordering concepts based on common attribute intensity expressed by an adjective (§ 2), and Iwanari et al. presented a system that derives concept orderings [10] by aggregating various pieces of evidence such as co-occurrence of a concept and adjective from social media text [9] (§ 3). The system collects microblog posts written by specific writers and at a certain time of interest (say, domain) to induce concept orderings of the target domain, which reflect their values on the target concepts. It is not only practically beneficial for understanding concepts from others' ordering-based values to make correct decisions but also interesting from a sociological perspective for inversely understanding common views shared by a certain demographic and/or from a certain period of time. As the target domain becomes more specific, however, it becomes more difficult to gather enough evidence due to the data sparseness problem, which prevents the system from making convincing orderings.

To solve the data sparseness problem, we propose a robust method of ordering concepts that gathers more evidence by (1) exploiting adjectives whose intensity correlates with that of the target adjective and (2) referring to concept orderings in more general domains (where more text is available) than the target domain in the supervised framework [10] (§ 4). By addressing the data sparseness problem, this study opens the way to acquiring values in more specific domains, or ultimately, individual values.

We validate the effectiveness of our method in terms of the correlation between the system-generated and gold-standard orderings for real-world concepts obtained from social media text (§ 5). Experimental results on our 5-year Twitter archive confirmed that our method obtained more convincing concept orderings in specific domains than the baseline [9].

2. Task Settings

This section describes the input, output, and gold standard of our concept ordering task. We exploit microblog posts in a specific domain to induce the common values shared by the writers (users) in that domain. The domains of individual users are identified in advance (§ 5.1.2).

Input. A set of nominal concepts is provided along with an adjective that represents an attribute shared by all members of the set. We provide an antonym of the target adjective, if any exists, to reduce the ambiguity of the adjective. We refer to a pair of concepts and adjective (and its antonym) as a query. In addition to a query, our method accepts one of the pre-identified domains (e.g., women, living in Kanto region).

Output. Given these inputs, our goal is to output an ordered list of the given concepts on the basis of attribute intensity. For example, when a set of concepts {elephant, whale, dog, mouse} and an adjective large (along with the antonym small) are given, the expected output is whale ≻ elephant ≻ dog ≻ mouse, where whale is the largest, elephant is the second largest, and so forth. The output ordering is required to reflect the common values of writers in the specified domain.

Gold Standard. We ask multiple crowd workers to order concepts from various viewpoints (adjectives) and to provide their domain information (e.g., age, gender, prefecture they live in, SNS they use) (§ 5.1.1). We then generate the gold-standard orderings for a domain that maximize the average Spearman's rank correlation coefficient, ρ, against the orderings of crowd workers in the domain. The resulting orderings can be considered as common values shared in the domain.

3. Related Work

Concept ordering is a relatively new task initiated by Nishina et al. [15]. In this section, we first discuss tasks related to the concept-ordering task, and then introduce existing approaches to concept ordering.

Question answering systems extract answers to factual questions (e.g., 'What is the average temperature in Tokyo?') from text [18], and some researchers have attempted to extract attributes and their values from the Web [4, 1, 24, 28, 21]. These studies can partly help us to perform our task, particularly when we order concepts in terms of the intensity of objective and numerical attributes (e.g., largeness, heaviness, and expensiveness).

Aspect-based sentiment analysis mines reviews or other texts for opinions on entities (e.g., products or movies) [17]. Some of these studies have handled statements comparing multiple items (e.g., 'car x is two feet longer than car y') [11]. Kurashima et al. [13] proposed aggregating such statements to rank products in accordance with their popularity. This sort of information is also used with our method but is integrated with other evidence to obtain orderings for concepts that are not directly compared in texts. This strategy distinguishes our method from those proposed for aspect-based sentiment analysis.

In contrast to these studies, the concept ordering task is more general in that it handles not only objective attributes (with numerical intensity, e.g., size [21]) but also subjective attributes.

[...] of more specific domains in which a smaller amount of text, thus a smaller amount of evidence, is available.

Our study addresses the above data sparseness problem by exploiting adjectives correlating with the target adjective and statistics obtained from domains that are more general than the target domain. Lee et al.'s recent work [14] (which appeared after the draft version of this article first appeared) addresses the data sparseness problem by exploiting synonymous adjectives in a way similar to our approach. They obtained promising results for ordering concepts with English datasets, which confirms the effectiveness of using similar adjectives to solve the data sparseness problem.

4. Proposed Method

This section describes our method of concept ordering. We adopt Iwanari's supervised framework of concept ordering [10], and introduce two smoothing techniques to gather supplemental pieces of evidence on concept ordering to address the data sparseness problem. The first technique exploits evidence on adjectives whose intensity correlates with that of the target adjective (e.g., heavy for large), while the second one refers to evidence obtained from domains that are more general than the target domain.

In what follows, we first briefly explain Iwanari's method of concept ordering (§ 4.1). We then explain the two smoothing techniques (§ 4.2, § 4.3).

4.1 Ordering Concepts Based on Common Attribute Intensity

Iwanari et al. [10] resorted to massive amounts of social media text to collect textual evidence that represents writers' perception of concept ordering, and then obtained a convincing ordering by integrating this evidence in ranking SVM [12] and support vector regression (SVR) [6]. They exploited four types of evidence to capture the common view on concepts from social media text: (1) co-occurrences of a concept and an adjective (e.g., How large that whale is!), (2) [...]
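As an illustration of the gold-standard construction described in Section 2, the following minimal sketch selects the ordering that maximizes the average Spearman's ρ against a set of crowd workers' orderings. The text does not state how the maximization is carried out, so the exhaustive search over permutations is an assumption that is only practical for small concept sets. The function names (spearman_rho, gold_ordering) and the sample worker orderings are hypothetical, and the orderings are assumed to be tie-free.

```python
from itertools import permutations


def spearman_rho(order_a, order_b):
    """Spearman's rho between two tie-free orderings of the same concepts."""
    n = len(order_a)  # assumes n >= 2
    rank_b = {concept: rank for rank, concept in enumerate(order_b)}
    # Sum of squared rank differences over all concepts.
    d2 = sum((rank - rank_b[concept]) ** 2 for rank, concept in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def gold_ordering(worker_orderings):
    """Return the permutation with the highest average rho against all workers.

    Exhaustive search is an assumption of this sketch, not the authors' procedure.
    """
    concepts = worker_orderings[0]

    def average_rho(candidate):
        total = sum(spearman_rho(candidate, w) for w in worker_orderings)
        return total / len(worker_orderings)

    return max(permutations(concepts), key=average_rho)


# Hypothetical orderings from three workers for the adjective "large".
workers = [
    ("whale", "elephant", "dog", "mouse"),
    ("whale", "elephant", "dog", "mouse"),
    ("elephant", "whale", "dog", "mouse"),
]
gold = gold_ordering(workers)
print(" > ".join(gold))
```

For tie-free orderings, maximizing the average ρ is equivalent to minimizing the total squared rank difference, so sorting concepts by their mean rank across workers yields a maximizing ordering without enumerating permutations.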
