Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus

Total Page:16

File Type:pdf, Size:1020Kb

Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus Suguru Matsuyoshi1, Hirotaka Kameko2, Yugo Murawaki3, Shinsuke Mori4 1Graduate School of Informatics and Engineering, The University of Electro-Communications 2Graduate School of Engineering, The University of Tokyo 3Graduate School of Informatics, Kyoto University 4Academic Center for Computing and Media Studies, Kyoto University 1Chofugaoka, Chofu, Tokyo, Japan 2Hongo, Bunkyo-ku, Tokyo, Japan 3;4Yoshida-honmachi, Sakyo-ku, Kyoto, Japan [email protected], [email protected], [email protected], [email protected] Abstract In recent years, there has been a surge of interest in the natural language processing related to the real world, such as symbol grounding, language generation, and nonlinguistic data search by natural language queries. We argue that shogi (Japanese chess) commentaries, which are accompanied by game states, are an interesting testbed for these tasks. A commentator refers not only to the current board state but to past and future moves, and yet such references can be grounded in the game tree, possibly with the help of modern game-tree search algorithms. For this reason, we previously collected shogi commentaries together with board states and have been developing a game commentary generator. In this paper, we augment the corpus with manual annotation of modality expressions and event factuality. The annotated corpus includes 1,622 modality expressions, 5,014 event class tags and 3,092 factuality tags. It can be used to train a computer to identify words and phrases that signal factuality and to determine events with the said factuality, paving the way for grounding possible and counterfactual states. Keywords: game commentary, modality, factuality, symbol grounding 1. Introduction The field of natural language processing (NLP) has expe- rienced a resurgence of interest in the symbol grounding problem (Harnad, 1990). An increasing number of datasets available align natural language expressions with real world objects in the form of images and videos (Hashimoto et al., 2014; Mori et al., 2014b; Ferraro et al., 2015), and systems built on top of such datasets typically perform image/video- to-text generation (Ushiku et al., 2011; Yang et al., 2011). Figure 1: Starting setup of shogi (left: normal depiction, These systems are, however, more akin to great apes than right: chess-like depiction). to humans in the sense that “[t]heir lives are lived entirely in the present” (Donald, 1991). While there is no evi- dence that nonhuman animals communicate past episodes A human commentator usually expresses to shogi fans the and planned future events, human language is abundant degree of confidence on the future move he is describing, with them (Szagun, 1978). It is even suggested that the which reflects his evaluation of the game-tree. Since con- faculty of language has a close connection to the ability to fidence is expressed through a wide variety of words and image the future (Suddendorf and Corballis, 1997). phrases, which we call modality expressions, identifying such expressions and binding the factual statuses to events Here we argue that commentaries for an extensive-form are necessary steps toward automatic generation of human- game (e.g., chess) are an ideal testbed for developing like commentaries. truly intelligent systems. Specifically, we focus on shogi To this end, we annotate a shogi corpus for a commen- (Japanese chess). Unlike typical image captions, human tator’s confidence in this work. Specifically, we adopt a comments on shogi games are full of references to past and three-layer annotation scheme: modality expression, event future moves as we will see in Subsection 2.2. Yet, thanks class and factuality. The first layer specifies constituents of to the well-definedness of the world, many of such refer- a sentence construction conveying a certain level of confi- ences can be grounded in a game tree. Although ambi- dence. The second layer extracts event mentions that can be guities inherent in natural language remain a challenging mapped into a game tree. The last layer denotes confidence problem, modern game-tree search algorithms (Tsuruoka level of each event mention. et al., 2002) help distinguish realistic states from unreal- istic ones. For this reason, we have collected shogi com- 2. Game and Commentary mentaries together with the corresponding board states and have been developing a game commentary generator (Mori 2.1. Shogi et al., 2016; Kameko et al., 2015). Shogi is a two-player board game similar to chess as illus- Tag Meaning ゴキゲン中飛車との予測は外れた (I expected that Tu Turn white used cheerful central rook strategy, but he did Po Position not.) Pi Piece Ps Piece specifier • Mentioning future directions. Mc Move compliment △1四香とすれば決戦 (White’s Lx1d would shift the Mn Move name phase to the end game.) Me Move evaluation St Strategy 3. Japanese Modality Expressions Ca Castle Ev Evaluation: entire We augment the SGC corpus with annotations of modality Ee Evaluation: part expressions and event factuality. Table 2 shows a sample Re Region text annotated according to our annotation scheme. In this Ph Phase section, we describe the first layer of modality expressions. Pa Piece attribute Modality expressions are words and multi-word expres- Pq Piece quantity sions which impose a proposition in a sentence on propo- Hu Human sitional modality and event modality (Palmer, 2001). Here Ti Time are some examples of Japanese modality expressions (EV Ac Player action denotes an event mention and ME denotes a modality ex- Ap Piece action pression): Ao Other action EX 1 後手は歩を成り捨て た (White sacrifice - Ot Other notion EV ME EV edME the pawn in the oppenent’s zone.) Table 1: Shogi-specific named entity tags. EX 2 こ の 試 合 で は 居 飛 車 を採用EV す る かもしれないME (White mayME useEV rook stay strategy in the game.) trated in Fig 1. The goal of each player is to capture the opponent’s king. Each player moves one piece alternately. EX 3 おそらくME △ 1 四 香 が良好EV (ProbablyME The biggest difference from chess is that a player can drop white’s Lx1d will be goodEV.) a captured piece back onto the board instead of making a move. For detailed explanation of shogi, please refer to EX 4 ▲8五桂の跳ねEV 出しを防いME だ (The move (Leggett, 2009). preventedME black’s Nx8e attackEV.) 2.2. Shogi Commentary EX 5 ここで△1四歩と受けEV ればME 先手はつらい Professional players and writers give commentaries of pro- (IfME white choosesEV Px1d, black will have a hard fessional matches for shogi fans. Mori et al. (2016) col- time.) lected shogi commentaries with board states described in In EX 1,“た (PAST)” is an auxiliary verb for past tense and Shogi Forsyth-Edwards notation (SFEN). The corpus con- indicates that the event is a fact. In EX 2,“かもしれない sists of 6,523 matches including 744,327 sentences and (may)” is a multi-word expression functioning as an auxil- 11,083,669 words. Mori et al. (2016) first segmented nine iary verb and expressing the low possibility of the event. In matches, or 2,041 sentences, automatically with a text an- EX 3,“おそらく (probably)” is an adverb which suggests alyzer KyTea1 and then corrected word boundaries manu- that the event is probable. “防い (prevent)” is a verb used ally. Finally they manually added to the sentences shogi- as a modality expression while in EX 4 it is used a main specific named entities (NEs) tags defined in Table 1. We verb and can be seen as an event mention. This counter- call the annotated corpus the Shogi Game Commentary factive verb makes it explicit that the event “跳ね (attack)” Corpus (SGC corpus). did not happen. In EX 5,“ば (if)” is a conjunctive particle In the SGC corpus, we observed that the following con- that leads to a conditional construction. Factuality of event tents were mainly expressed with recollection of the past mentions in the hypothetical construction is absolutely un- and imagination of the future: certain. • Reason behind moves. In the first layer of our annotation scheme, we mark up この桂馬打ちは飛車取りを狙ったものだ (This drop modality expressions as described above. of knight aims at capturing the opponent’s rook.) 3.1. POS 角の仇ですね (The move must be a revenge for his bishop.) There have been several studies of detecting Japanese modality expressions (Suzuki et al., 2012; Izumi et al., • Prediction of the next moves and strategies. 2013) (Kamioka et al., 2015). However, they focus on aux- 美濃囲いが有効だ (Black will build Mino castle for iliary verbs and functional multi-word expressions. Modal making it better.) adverbs and conjunctive particles are largely out of scope of these studies. By contrast, we target all types of modality 1http://www.phontron.com/kytea/ expressions for mark-up regardless of their parts-of-speech. Layer String 先手 は 美濃 囲い が 崩れ て い る の で 、 飛車 交換 は 後手 の 得 に な り そう だ 。 black TOP Mino castle NOM break have PP because rook change TOP white ’s good to be be likely to POS N P N N P V P V Sf P Aux Pnc N N P N P N P V Sf Adj Aux Pnc NE Tu-B O Ca-B Ca-I O Ao-B O O O O O O Mn-B Mn-I O Tu-B O Ee-B O Ao-B O O O O Modality O O O O O MEn-B O O O O O O O O O O O O O O O MEa-B O O Event class EVe EVe EVf EVi EVe Factuality FNc FPc FPr Table 2: Five annotation layers for a commentary string with a well-formatted board state. 3.2.
Recommended publications
  • Game State Retrieval with Keyword Queries
    Game State Retrieval with Keyword Queries Atsushi Ushiku Shinsuke Mori Kyoto University Kyoto University [email protected] [email protected] Hirotaka Kameko Yoshimasa Tsuruoka The University of Tokyo The University of Tokyo [email protected] [email protected] ABSTRACT learning (often complex) formats of search queries. Cur- There are many databases of game records available online. rently, however, retrieval systems that accept natural lan- In order to retrieve a game state from such a database, users guage queries for nonlinguistic data are available only for a usually need to specify the target state in a domain-specific few domains such as images [4] and videos [13]. language, which may be difficult to learn for novice users. One example of nonlinguistic data is stock charts. A stock In this work, we propose a search system that allows users trader may want to search for a stock chart similar to his to retrieve game states from a game record database by us- current situation in order to predict what will happen next. ing keywords. In our approach, we first train a neural net- Another example is records of games. A chess player may work model for symbol grounding using a small number of want to practice by using past game records for reference. pairs of a game state and a commentary on it. We then Building a natural language-based search system for such apply it to all the states in the database to associate each nonlinguistic data is viable if all of the data are already as- of them with characteristic terms and their scores.
    [Show full text]
  • Introduction to Shogi
    Front Cover A Brief Introduction to Shogi Roger Hare January 2021 This is work in progress and will be updated periodically. For the current version, please see Eric Cheymol's Shogi page, or contact me at [email protected] Please also feel free to send feed-back to [email protected] As far as I am aware, no copyright material is included here without permission. If you think that someone's copyright has been infringed, please let me know. The document is 'protected'. Please do not attempt to edit or take extracts from the document – thank you. This document is free and may be freely shared on a no-cost basis. If you have paid for this document, you have been ripped-off!! This release (January 2021) contains additional material about 'Shogi Etiquette' (p. 61). Footnote has been updated to record another sad bastard offering a badly out-of-date copy of the file (p. 6). Change Log and Download information added (currently not visible to raders). Table of Contents [4] Acknowledgments. [6] Editing and Copying [7] Foreword. [12] Introduction to Shogi. [19] The Shogi Board and Pieces. [34] Shogi Diagrams. [39] Moves of the Pieces. [44] Shogi Notation. [55] The Rules of Shogi. [63] Shogi Openings and Castles. [72] Shape, Tesuji, Forks, Pins and Skewers. [88] Shogi Problems. [101] Shogi Proverbs. [104] A Game for Beginners. [133] 10 Shogi Maxims. [134] More about Shogi Notation. [138] Some More Sample Games. [159] Shogi Variants. [165] Glossary of Shogi Terms. [171] Computer Shogi. [183] Shogi Equipment. [187] Bibliography and other Shogi Resources.
    [Show full text]
  • Supporting Interface for Beginners Watching Japanese
    Proceedings of the International MultiConference of Engineers and Computer Scientists 2019 IMECS 2019, March 13-15, 2019, Hong Kong Supporting Interface for Beginners Watching Japanese Chess Games: Visualization of Putting Timing of Captured Piece and Value of Player’s Action Daisuke Shimizu, Yoko Nishihara, Member, IAENG, and Ryosuke Yamanishi, Member, IAENG, Abstract—This paper proposes a supporting interface for game board. Experimental results for the interface showed beginners of Japanese Chess. Japanese Chess players move their that the participants with the interface could understand the pieces to check mate each opponent king. The players make positions of games more. However, there may be any items predictions of the future situations of game. The predictions are hard to think up for beginners of Japanese Chess. Especially, to be evaluated and visualized for the beginners. Some of the the beginners do not understand why a player moves the piece participants mentioned that the putting timing of a captured to the square. They often fail to understand the situations of piece is unclear and the value of player’s action is also games. If such blind information is visualized for the beginners, unclear even if they use the interface. They also mentioned they may understand the situations of games smoothly and they that player’s tactics are also unclear. We found that the three may be able to predict the next action by players. This research improves the interface that have proposed in the previous study items should be visualized for the beginners. for supporting the beginners watching Japanese Chess games. This paper proposes a new supporting interface for the We add new visualized items to the previous interface.
    [Show full text]
  • Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus
    Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus Suguru Matsuyoshi1, Hirotaka Kameko2, Yugo Murawaki3, Shinsuke Mori4 1Graduate School of Informatics and Engineering, The University of Electro-Communications 2Graduate School of Engineering, The University of Tokyo 3Graduate School of Informatics, Kyoto University 4Academic Center for Computing and Media Studies, Kyoto University 1Chofugaoka, Chofu, Tokyo, Japan 2Hongo, Bunkyo-ku, Tokyo, Japan 3;4Yoshida-honmachi, Sakyo-ku, Kyoto, Japan [email protected], [email protected], [email protected], [email protected] Abstract In recent years, there has been a surge of interest in the natural language processing related to the real world, such as symbol grounding, language generation, and nonlinguistic data search by natural language queries. We argue that shogi (Japanese chess) commentaries, which are accompanied by game states, are an interesting testbed for these tasks. A commentator refers not only to the current board state but to past and future moves, and yet such references can be grounded in the game tree, possibly with the help of modern game-tree search algorithms. For this reason, we previously collected shogi commentaries together with board states and have been developing a game commentary generator. In this paper, we augment the corpus with manual annotation of modality expressions and event factuality. The annotated corpus includes 1,622 modality expressions, 5,014 event class tags and 3,092 factuality tags. It can be used to train a computer to identify words and phrases that signal factuality and to determine events with the said factuality, paving the way for grounding possible and counterfactual states.
    [Show full text]