Automatically Generating Psychiatric Case Notes from Digital Transcripts of Doctor-Patient Conversations

Total Page:16

File Type:pdf, Size:1020Kb

Automatically Generating Psychiatric Case Notes from Digital Transcripts of Doctor-Patient Conversations Automatically Generating Psychiatric Case Notes From Digital Transcripts of Doctor-Patient Conversations Nazmul Kazi Indika Kahanda Gianforte School of Computing Gianforte School of Computing Montana State University Montana State University Bozeman, MT, USA Bozeman, MT, USA [email protected] [email protected] Abstract Limited face-to-face time is especially disadvan- tageous for working with mental health patients Electronic health records (EHRs) are notori- where the psychiatrist could easily miss a non- ous for reducing the face-to-face time with verbal cue highly important for the correct diag- patients while increasing the screen-time for clinicians leading to burnout. This is espe- nosis. Moreover, EHR’s usability related prob- cially problematic for psychiatry care in which lems lead to unstructured and incomplete case maintaining consistent eye-contact and non- notes (Kaufman et al., 2016) which are difficult verbal cues are just as important as the spoken to search and access. words. In this ongoing work, we explore the Due to the above-mentioned downsides of feasibility of automatically generating psychi- EHRs, there have been recent attempts for de- atric EHR case notes from digital transcripts veloping novel methods for incorporating vari- of doctor-patient conversation using a two-step approach: (1) predicting semantic topics for ous techniques and technologies such as natu- segments of transcripts using supervised ma- ral language processing (NLP) for improving the chine learning, and (2) generating formal text EHR documentation process. In 2015, American of those segments using natural language pro- Medical Informatics Association reported time- cessing. Through a series of preliminary ex- consuming data entry is one of the major prob- perimental results obtained through a collec- lems in EHRs and recommended to improve EHRs tion of synthetic and real-life transcripts, we by allowing multiple modes of data entry such demonstrate the viability of this approach. as audio recording and handwritten notes (Payne 1 Introduction et al., 2015). Nagy et al.(2008) developed a voice-controlled EHR system for dentists, called An electronic health record (EHR) is a digital ver- DentVoice, that enables dentists to control the sion of a patient’s health record. EHRs were in- EHR and take notes over voice and without taking troduced as a means to improve the health care off their gloves while working with their patients. system. EHRs are real-time and store patient’s Kaufman et al.(2016) also developed an NLP- records in one place and can be shared with other enabled dictation-based data entry where clini- clinicians, researchers and authorized personals cians can write case notes over voice and able to instantly and securely. The use and implementa- reduce the time by more than 60%. tion of EHRs were spurred by the 2009 US Health Psychiatrists mostly collect information from Information Technology for Economic and Clin- their patients through conversations and these con- ical Health (HITECH) Act and 78% office-based versations are the primary source of their case clinicians reported using some form of EHR by notes. In a long-term project in collaboration with 2013 (Hsiao and Hing, 2014). National Alliance of Mental illness (NAMI) Mon- Presently, all clinicians are required to digitally tana and the Center for Mental Health Research document their interactions with their patients us- and Recovery (CMHRR) at Montana State Uni- ing EHRs. These digital documents are called case versity, we envision a pipeline that automatically notes. Manually typing case notes is time con- records a doctor-patient conversation, generates suming (Payne et al., 2015) and limits the face- the corresponding digital transcript of the conver- to-face time with their patients, which leads to sation using speech-to-text API and uses natural both patient dis-satisfaction and clinician burnout. language processing and machine learning tech- 140 Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 140–148 Minneapolis, Minnesota, June 7, 2019. c 2019 Association for Computational Linguistics niques to predict and/ or extract important pieces to generate a more formal (i.e. written) version of of information from the text. This relevant text is the text which goes in to the corresponding section then converted to a more formal written version of of the EHR form. the text and are used for auto-populating the dif- These semantic topics are suggested by the do- ferent sections of the EHR form. main experts from NAMI Montana and corre- In this work, we focus on the back-end of the spond to the main sections of a typical EHR form. above mentioned pipeline, i.e. we explore the fea- They are (1) Client details: personal information sibility of populating sections of EHR form us- of a patient, such as name, age, birth date etc., (2) ing the information extracted from a digital tran- Chief complaint: refers to the information regard- script of a doctor-patient conversation. In order ing a patient’s primary problem for which the pa- to gather gold-standard data, we develop a hu- tient is seeking medical attention., (3) Medical his- man powered digital transcript annotator and ac- tory: any past medical condition(s), treatment(s) quire annotated versions of digital transcripts of and record(s), (4) Family history: indicates medi- doctor-patient conversations with the help domain cal history of a family member of the patient, and experts. As the first step in our two-step approach, (5) Social history: refers to information about pa- we develop a machine learning model that can pre- tient’s social interactions, e.g. friends, work, fam- dict the semantic topics of segments of conversa- ily dinner etc. We call these semantic categories tions. Then we develop natural language process- “EHR categories” interchangeably. The formal ing techniques to generate a formal written text us- text is essentially the summary text that the clini- ing the corresponding segments. In this paper, we cian would write or type into the EHR form based present our preliminary findings from these two on the interaction with the patient. tasks; Figure1 depicts the high-level overview of our two-step approach. Previous studies most related to our work are (1) Lacson et al.(2006) predicting semantic topics for medical dialogue turns in the home hemodial- ysis, and (2) Wallace et al.(2014) automati- cally annotating topics in transcripts of patient- provider interactions regarding antiretroviral ad- herence. While both studies successfully use ma- chine learning for predicting semantic topics (al- beit different topics to ours) they do not focus on the development of NLP models for text summa- rization (i.e. formal text generation). The rest of the paper is structured as follows. We describe our two-step approach, data collec- tion and processing, machine learning models and natural language processing methods in chapter 2. In chapter 3, we report and discuss the perfor- mance of our methods. We summarize our find- Figure 1: High-level overview of our approach. Task1: ings, discuss limitations and potential future work Predicting EHR categories. Task 2: Formal text gen- in chapter 4. eration. ML: Machine Learning. EHR: Electronic Heallth Record. 2 Methods 2.1 Approach 2.2 Transcripts of doctor-patient dialogue As depicted in Figure1, we divide the task of Our raw dataset is composed of 18 digital tran- generating case notes from digital transcripts of scripts of doctor-patient conversations and covers doctor-patient conversations into two sub tasks: 11 presenting conditions. The presenting condi- (1) using supervised learning models to predict se- tions are Attention-deficit/ hyperactivity disorder mantic topics for segments of the transcripts and (ADHD), Alzheimer’s disease, Anger, Anorexia, then (2) using natural language processing models Anxiety, Bipolar, Borderline Personality Disor- 141 der (BPD), Depression, Obsessive Compulsive Disorder (OCD), Post Traumatic Stress Disorder (PTSD) and Schizophrenia. All transcripts are la- beled with speaker tags “Doctor:” and “Patient:” to indicate the words uttered by each individual. Thirteen of these transcripts are synthetic in that they are handwritten (i.e. typed) by a domain expert from NAMI Montana who has years of experience working with mental illness patients. Hence, each synthetic transcript represents a real case scenario of conversation between a patient (suffering from one of the presenting conditions mentioned above) and a psychiatric doctor/ clin- ician who verbally interviews the patient in a 2- person dialogue set up. Table1 reports summary statistics. Figure 2: Screen shot of the human-powered transcript annotator. Left panel displays an example transcript Rest of the five transcripts are part of Coun- while the semantic concepts are shown on the right. seling & Therapy database1 from the Alexander Street website. Hence, we refer to them as AS transcripts for the rest of the paper. Each of these the expert annotators. Any conversation pair that AS transcripts is generated from a real-life conver- was found to be irrelevant to the five categories sation between a patient and a clinician. Majority is annotated with a new category called “Others”. of these transcripts cover multiple mental condi- Conversation pair level annotations eliminated the tions. challenges in annotating a question or an answer In order to annotate transcripts using seman- on their own without the proper context provided tic topics mentioned above, we develop a human- by the preceding/ following sentences. powered transcript annotator as shown in Figure 2.3 Task 1: Predicting EHR categories 2, a responsive web application, that takes digital transcripts as input, breaks down each transcript In this task, we use the annotated digital tran- into segments where each segment starts with a scripts to generate the training data to train super- speaker tag (Doctor: or Patient:) and generates vised classification models using two different ap- samples by pairing each doctor segment with the proaches.
Recommended publications
  • Definiteness Determined by Syntax: a Case Study in Tagalog 1 Introduction
    Definiteness determined by syntax: A case study in Tagalog1 James N. Collins Abstract Using Tagalog as a case study, this paper provides an analysis of a cross-linguistically well attested phenomenon, namely, cases in which a bare NP’s syntactic position is linked to its interpretation as definite or indefinite. Previous approaches to this phenomenon, including analyses of Tagalog, appeal to specialized interpretational rules, such as Diesing’s Mapping Hypothesis. I argue that the patterns fall out of general compositional principles so long as type-shifting operators are available to the gram- matical system. I begin by weighing in a long-standing issue for the semantic analysis of Tagalog: the interpretational distinction between genitive and nominative transitive patients. I show that bare NP patients are interpreted as definites if marked with nominative case and as narrow scope indefinites if marked with genitive case. Bare NPs are understood as basically predicative; their quantificational force is determined by their syntactic position. If they are syntactically local to the selecting verb, they are existentially quantified by the verb itself. If they occupy a derived position, such as the subject position, they must type-shift in order to avoid a type-mismatch, generating a definite interpretation. Thus the paper develops a theory of how the position of an NP is linked to its interpretation, as well as providing a compositional treatment of NP-interpretation in a language which lacks definite articles but demonstrates other morphosyntactic strategies for signaling (in)definiteness. 1 Introduction Not every language signals definiteness via articles. Several languages (such as Russian, Kazakh, Korean etc.) lack articles altogether.
    [Show full text]
  • Arxiv:2106.08037V1 [Cs.CL] 15 Jun 2021 Alternative Ways the World Could Be
    The Possible, the Plausible, and the Desirable: Event-Based Modality Detection for Language Processing Valentina Pyatkin∗ Shoval Sadde∗ Aynat Rubinstein Bar Ilan University Bar Ilan University Hebrew University of Jerusalem [email protected] [email protected] [email protected] Paul Portner Reut Tsarfaty Georgetown University Bar Ilan University [email protected] [email protected] Abstract (1) a. We presented a paper at ACL’19. Modality is the linguistic ability to describe b. We did not present a paper at ACL’20. events with added information such as how de- sirable, plausible, or feasible they are. Modal- The propositional content p =“present a paper at ity is important for many NLP downstream ACL’X” can be easily verified for sentences (1a)- tasks such as the detection of hedging, uncer- (1b) by looking up the proceedings of the confer- tainty, speculation, and more. Previous studies ence to (dis)prove the existence of the relevant pub- that address modality detection in NLP often p restrict modal expressions to a closed syntac- lication. The same proposition is still referred to tic class, and the modal sense labels are vastly in sentences (2a)–(2d), but now in each one, p is different across different studies, lacking an ac- described from a different perspective: cepted standard. Furthermore, these senses are often analyzed independently of the events that (2) a. We aim to present a paper at ACL’21. they modify. This work builds on the theoreti- b. We want to present a paper at ACL’21. cal foundations of the Georgetown Gradable Modal Expressions (GME) work by Rubin- c.
    [Show full text]
  • Calculus of Possibilities As a Technique in Linguistic Typology
    Calculus of possibilities as a technique in linguistic typology Igor Mel’uk 1. The problem stated: A unified conceptual system in linguistics A central problem in the relationship between typology and the writing of individual grammars is that of developing a cross-linguistically viable con- ceptual system and a corresponding terminological framework. I will deal with this problem in three consecutive steps: First, I state the problem and sketch a conceptual system that I have put forward for typological explora- tions in morphology (Sections 1 and 2). Second, I propose a detailed illus- tration of this system: a calculus of grammatical voices in natural languages (Section 3). And third, I apply this calculus (that is, the corresponding con- cepts) in two particular case studies: an inflectional category known as an- tipassive and the grammatical voice in French (Sections 4 and 5). In the latter case, the investigation shows that even for a language as well de- scribed as French a rigorously standardized typological framework can force us to answer questions that previous descriptions have failed to re- solve. I start with the following three assumptions: 1) One of the most pressing tasks of today’s linguistics is description of particular languages, the essential core of this work being the writing of grammars and lexicons. A linguist sets out to describe a language as pre- cisely and exhaustively as possible; this includes its semantics, syntax, morphology and phonology plus (within the limits of time and funds avail- able) its lexicon. 2) Such a description is necessarily carried out in terms of some prede- fined concepts – such as lexical unit, semantic actant, syntactic role, voice, case, phoneme, etc.
    [Show full text]
  • Respiratory Therapy Pocket Reference
    Pulmonary Physiology Volume Control Pressure Control Pressure Support Respiratory Therapy “AC” Assist Control; AC-VC, ~CMV (controlled mandatory Measure of static lung compliance. If in AC-VC, perform a.k.a. a.k.a. AC-PC; Assist Control Pressure Control; ~CMV-PC a.k.a PS (~BiPAP). Spontaneous: Pressure-present inspiratory pause (when there is no flow, there is no effect ventilation = all modes with RR and fixed Ti) PPlateau of Resistance; Pplat@Palv); or set Pause Time ~0.5s; RR, Pinsp, PEEP, FiO2, Flow Trigger, rise time, I:E (set Pocket Reference RR, Vt, PEEP, FiO2, Flow Trigger, Flow pattern, I:E (either Settings Pinsp, PEEP, FiO2, Flow Trigger, Rise time Target: < 30, Optimal: ~ 25 Settings directly or by inspiratory time Ti) Settings directly or via peak flow, Ti settings) Decreasing Ramp (potentially more physiologic) PIP: Total inspiratory work by vent; Reflects resistance & - Decreasing Ramp (potentially more physiologic) Card design by Respiratory care providers from: Square wave/constant vs Decreasing Ramp (potentially Flow Determined by: 1) PS level, 2) R, Rise Time (­ rise time ® PPeak inspiratory compliance; Normal ~20 cmH20 (@8cc/kg and adult ETT); - Peak Flow determined by 1) Pinsp level, 2) R, 3)Ti (shorter Flow more physiologic) ¯ peak flow and 3.) pt effort Resp failure 30-40 (low VT use); Concern if >40. Flow = more flow), 4) pressure rise time (¯ Rise Time ® ­ Peak v 0.9 Flow), 5) pt effort (­ effort ® ­ peak flow) Pplat-PEEP: tidal stress (lung injury & mortality risk). Target Determined by set RR, Vt, & Flow Pattern (i.e. for any set I:E Determined by patient effort & flow termination (“Esens” – PDriving peak flow, Square (¯ Ti) & Ramp (­ Ti); Normal Ti: 1-1.5s; see below “Breath Termination”) < 15 cmH2O.
    [Show full text]
  • Categories of Nouns
    Categories of nouns Nouns describe people, places or things. They also express a range of meanings such as concepts, qualities, organisations, communities, sensations and events. What do they look like? A small proportion of nouns have identifiable ‘noun endings’. These include: Tradition, ability, excellence, significance, factor, rigour. Many plural nouns end in ‘s’, e.g. cats Proper Nouns and Capital Letters Words which begin with capital letters but are not always at the beginning of sentences are often the names of people, places (town, countries, etc.) or institutions. These are called proper nouns. Example Lauren and Jack, Africa, International House A capital letter is also used for days of the week, months of the year, and the names of nationalities, ethnic groups and languages. Example Tuesday, August, Swahili Where do nouns come in sentences? Nouns can: o act as the subject of a verb: Example Cats kill mice o act as the object of a verb: Example Cats kill mice Developed by Learning Advisers 1 o act as the complement of a verb: Example They are men They often end a phrase which begins with an article such as a, an or a quantifier such as either, any, or many. They also often follow adjectives: Adjective Adjective Example A drunk either way a much older elite larger mice Countable and Uncountable Nouns Countable or ‘unit’ nouns [C] have a singular and a plural form, e.g. bookbooks. Uncountable or ‘mass’ nouns [U] have only one form, e.g. furniture. [C] [U] Singular Plural another biscuit three apples not much success The distinction between countable and uncountable is based on whether or not we can count what the noun describes.
    [Show full text]
  • The Interaction Between Telicity and Agentivity: Experimental Evidence from Intransitive Verbs in German and Chinese
    Originally published in: Lingua vol. 200 (2017), pp. 84-106. DOI: https://doi.org/10.1016/j.lingua.2017.08.006 POSTPRINT a,b,* a a a,b Tim Graf , Markus Philipp , Xiaonan Xu , Franziska Kretzschmar , a Beatrice Primus a Institute of German Language and Literature I, University of Cologne, Albertus-Magnus-Platz, 50923 Cologne, Germany b CRC 1252 Prominence in Language, University of Cologne, Germany The interaction between telicity and agentivity: Experimental evidence from intransitive verbs in German and Chinese Abstract: Telicity and agentivity are semantic factors that split intransitive verbs into (at least two) different classes. Clear-cut unergative verbs, which select the auxiliary HAVE, are assumed to be atelic and agent-selecting; unequivocally unaccusative verbs, which select the auxiliary BE, are analyzed as telic and patient-selecting. Thus, agentivity and telicity are assumed to be inversely correlated in split intransitivity. We will present semantic and experimental evidence from German and Mandarin Chinese that casts doubts on this widely held assumption. The focus of our experimental investigation lies on variation with respect to agentivity (specifically motion control, manipulated via animacy), telicity (tested via a locative vs. goal adverbial), and BE/HAVE-selection with semantically flexible intransitive verbs of motion. Our experimental methods are acceptability ratings for German and Chinese (Experiments 1 and 2) and event-related potential (ERP) measures for German (Experiment 3). Our findings contradict the above- mentioned assumption that agentivity and telicity are generally inversely correlated and suggest that for the verbs under study, agentivity and telicity harmonize with each other. Furthermore, the ERP measures reveal that the impact of the interaction under discussion is more pronounced on the verb lexeme than on the auxiliary.
    [Show full text]
  • Grammatical Gender and Linguistic Complexity
    Grammatical gender and linguistic complexity Volume I: General issues and specific studies Edited by Francesca Di Garbo Bruno Olsson Bernhard Wälchli language Studies in Diversity Linguistics 26 science press Studies in Diversity Linguistics Editor: Martin Haspelmath In this series: 1. Handschuh, Corinna. A typology of marked-S languages. 2. Rießler, Michael. Adjective attribution. 3. Klamer, Marian (ed.). The Alor-Pantar languages: History and typology. 4. Berghäll, Liisa. A grammar of Mauwake (Papua New Guinea). 5. Wilbur, Joshua. A grammar of Pite Saami. 6. Dahl, Östen. Grammaticalization in the North: Noun phrase morphosyntax in Scandinavian vernaculars. 7. Schackow, Diana. A grammar of Yakkha. 8. Liljegren, Henrik. A grammar of Palula. 9. Shimelman, Aviva. A grammar of Yauyos Quechua. 10. Rudin, Catherine & Bryan James Gordon (eds.). Advances in the study of Siouan languages and linguistics. 11. Kluge, Angela. A grammar of Papuan Malay. 12. Kieviet, Paulus. A grammar of Rapa Nui. 13. Michaud, Alexis. Tone in Yongning Na: Lexical tones and morphotonology. 14. Enfield, N. J. (ed.). Dependencies in language: On the causal ontology of linguistic systems. 15. Gutman, Ariel. Attributive constructions in North-Eastern Neo-Aramaic. 16. Bisang, Walter & Andrej Malchukov (eds.). Unity and diversity in grammaticalization scenarios. 17. Stenzel, Kristine & Bruna Franchetto (eds.). On this and other worlds: Voices from Amazonia. 18. Paggio, Patrizia and Albert Gatt (eds.). The languages of Malta. 19. Seržant, Ilja A. & Alena Witzlack-Makarevich (eds.). Diachrony of differential argument marking. 20. Hölzl, Andreas. A typology of questions in Northeast Asia and beyond: An ecological perspective. 21. Riesberg, Sonja, Asako Shiohara & Atsuko Utsumi (eds.). Perspectives on information structure in Austronesian languages.
    [Show full text]
  • “Doing Gender” in Doctor-Patient Interactions
    “DOING GENDER” IN DOCTOR-PATIENT INTERACTIONS: GENDER COMPOSITION OF DOCTOR-PATIENT DYADS AND COMMUNICATION PATTERNS A thesis submitted to Kent State University in partial fulfillment of the requirements for the degree of Master of Arts by Kelly MacArthur August, 2008 Thesis written by Kelly MacArthur B.A., Fairleigh Dickinson University 2004 M.A., Kent State University, 2008 Approved by , Advisor Timothy Gallagher , Chair, Department of Sociology Richard Serpe , Dean, College of Arts and Sciences John R.D. Stalvey ii TABLE OF CONTENTS LIST OF TABLES ...................................................................................................................iv ACKNOWLEDGEMENTS ......................................................................................................v 1. INTRODUCTION.................................................................................................................1 2. LITERATURE REVIEW......................................................................................................4 Language and Gender...................................................................................................4 The Turn-Taking Model...............................................................................................7 “Doing Gender” in Doctor-Patient Interactions .........................................................14 The Environment in Which Women Physicians Work...............................................16 Do Women and Men Communicate Differently?.....................................................
    [Show full text]
  • The Effects of Positive Versus Negative Language Use by a Physician
    THE EFFECTS OF POSITIVE VERSUS NEGATIVE LANGUAGE USE BY A PHYSICIAN To what extent does positive versus negative language used by a physician impact the patient outcomes of patients with nonspecific low back pain? A randomized study using video vignettes. Bachelor Thesis De invloed van positief versus negatief taalgebruik door een huisarts op patiëntuitkomsten. May 15, 2017 RADBOUD UNIVERSITY NIJMEGEN International Business Communication I.A. Stortenbeker M. Balsters Iris Nijemeisland, S4333950 [email protected] 06-23861611 Words: 7063 2 Summary The way in which a physician communicates has a great impact on patients, but little research has been focused on examining which specific words influence patient outcomes. Given the fact that up till 95% of people will experience low back pain at one point during their lives, this research tried to answer to what extent positive versus negative language used by a physician impacts the patient outcomes of patients with nonspecific low back pain. By using two different versions of fragments in a video vignette the researchers tried to measure any differences in analogue patients with regards to kinesiophobia (‘fear of moving’), recovery expectations, intention to therapy compliance and trust in the physician. This experiment did not indicate any present relationship between either a positive or a negative version of language use, and its impact on patient outcomes of people with nonspecific low back pain. No significant results with regard to these variables were found. After making a distinction between participants with a medical occupation and those with non-medical backgrounds, a significant result for the variable ‘empathy towards the patient’ was found.
    [Show full text]
  • The Grammatical Category of Modality∗
    The grammatical category of modality∗ Valentine Hacquard University of Maryland, College Park, Maryland, United States [email protected] Abstract In many languages, the same words are used to express epistemic and root modality. These modals further tend to interact with tense and aspect in systematic ways, based on their interpretation. Is this pattern accidental, or a consequence of grammar or mean- ing? I address this question by: (i) comparing `grammatical' modals to verbs/adjectives that share meanings with modals, but not the same scope constraints; (ii) examining pat- terns of grammaticalization from `lexical' to `grammatical' modality; (iii) comparing scope interactions in languages where modals are 'polysemous' and in those where they are not. 1 Introduction In many languages, the same modal words are used to express a variety of `root' and `epistemic' meanings. English may, for instance, can express deontic or epistemic possibility. About half of the 200+ languages in [vdAA05] have a single form that is used to express both kinds of modality. Yet, in many other languages, modal markers are unambiguously determined for meaning. In the Kratzerian tradition ([Kra81, Kra91, Kra12]), modals are lexically specified only for force (as existential or universal quantifiers over worlds), and the various meanings a `polysemous'1 modal expresses arise from the modal combining with various modal bases and ordering sources. This account, based on the case of polysemous languages, easily extends to non-polysemous ones: a modal can further lexically specify the kind of modal base and ordering source it allows, restricting its meaning to a single epistemic or root meaning. An alternative account based on non-polysemous cases would provide separate lexical entries for the various modals, and extend to polysemous cases by postulating ambiguity.
    [Show full text]
  • Arxiv:1909.02224V2 [Cs.CL] 9 Sep 2019 Code Gender Bias in Society (Bolukbasi Et Al., 2016; Other Languages, Such As French (FR)
    Examining Gender Bias in Languages with Grammatical Gender Pei Zhou1;2, Weijia Shi1, Jieyu Zhao1, Kuan-Hao Huang1, Muhao Chen1;3, Ryan Cotterell4, Kai-Wei Chang1 1Department of Computer Science, University of California Los Angeles 2Department of Computer Science, University of Southern California 3Department of Computer and Information Science, University of Pennsylvania 4Department of Computer Science, Johns Hopkins University [email protected]; fswj0419,jyzhao, khhuang, [email protected]; [email protected]; [email protected] Abstract in English word embeddings cannot be directly applied to languages with grammatical gender1, Recent studies have shown that word embed- where all nouns are assigned a gender class and dings exhibit gender bias inherited from the the corresponding dependent articles, adjectives, training corpora. However, most studies to and verbs must agree in gender with the noun (e.g. date have focused on quantifying and mitigat- ing such bias only in English. These analy- in Spanish: la buena enfermera the good female ses cannot be directly extended to languages nurse, el buen enfermero the good male nurse) (Cor- that exhibit morphological agreement on gen- bett, 1991, 2006). This is because most existing ap- der, such as Spanish and French. In this paper, proaches define bias in word embeddings based on we propose new metrics for evaluating gender the projection of a word on a gender direction (e.g. bias in word embeddings of these languages “nurse” in English is biased because its projection and further demonstrate evidence of gender on the gender direction inclines towards female). bias in bilingual embeddings which align these When the grammatical gender exists, such bias def- languages with English.
    [Show full text]
  • Role and Reference Grammar
    Work Papers of the Summer Institute of Linguistics, University of North Dakota Session Volume 37 Article 5 1993 Role and reference grammar Robert D. Van Valin Jr. Follow this and additional works at: https://commons.und.edu/sil-work-papers Part of the Linguistics Commons Recommended Citation Van Valin, Robert D. Jr. (1993) "Role and reference grammar," Work Papers of the Summer Institute of Linguistics, University of North Dakota Session: Vol. 37 , Article 5. DOI: 10.31356/silwp.vol37.05 Available at: https://commons.und.edu/sil-work-papers/vol37/iss1/5 This Article is brought to you for free and open access by UND Scholarly Commons. It has been accepted for inclusion in Work Papers of the Summer Institute of Linguistics, University of North Dakota Session by an authorized editor of UND Scholarly Commons. For more information, please contact [email protected]. ROLE AND REFERENCE GRAMMAR Robert D. Van Valin, Jr. State University of New York at Buffalo 1 Introduction 2 Historical background 3 Central concepts of the theory 3. I Clause structure 3.2 Semantic structure 3 .3 FOCUS structure 3. 4 Grammatical relations and linking 4 Some implications of RRG 1 Introduction* Role and Reference Grammar [RRG] (Van Valin 1993a) may be termed a "structural-functionalist theory of grammar"; this locates it on a range of perspectives from extreme formalist at one end to radical functionalist at the other. RRG falls between these two extremes, differing markedly from each. In contrast to the extreme formalist view, RRG views language as a system of communicative social action, and consequently, analyzing the communicative functions of morphosyntactic structures has a vital role in grammatical description and theory from this perspective.
    [Show full text]