Analyzing Sentiments of German Job References

(Invited Paper)

Finn Folkerts∗, Vanessa Schreck†, Shirin Riazy‡ and Katharina Simbeck§
Hochschule für Technik und Wirtschaft Berlin
Treskowallee 8, 10318 Berlin, Germany
∗Email: [email protected]
†Email: [email protected]
‡Email: [email protected]
§Email: [email protected]

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract: Filling a vacancy takes a lot of (costly) time. Automated pre-processing of applications using artificial intelligence technology can help to save time, e.g., by analyzing applications using machine learning algorithms. We investigated whether such systems are potentially biased in terms of gender, origin and nobility. For this purpose we created a corpus of common German reference letter sentences on which we performed sentiment analysis using the cloud services by Amazon, Google, IBM and Microsoft. We established that all tested services rate the sentiment of the same template sentences very inconsistently and with bias, at least with regard to gender.

I. INTRODUCTION

To improve efficiency, companies automate their recruiting process by using either external software solutions or self-developed tools. Besides quality of and cost per recruitment, staffing time is an important performance indicator [1]. For this reason, Natural Language Processing (NLP) systems are sometimes used to pre-select candidates [2], [3]. The use of such artificial intelligence (AI) models is expected to increase productivity [4], but often lacks transparency [5], [6]. The systems are used as black boxes, and this non-transparency may obfuscate ethical issues such as discrimination [2], [4], [7]. Automation and digitization could contribute to making HR processes less prone to prejudices or stereotypes. AI systems are commonly expected to be objective and therefore to help avoid potential discrimination. However, when systems learn from biased data they will reproduce those biases [8]. Sentiment analysis systems have been shown to assess sentences inconsistently, depending on gender and race of the subject [9].

In this research we analysed the fairness of commercial sentiment analysis systems. We have extended the work of [9] to the human resources (HR) domain by testing the sentiment analysis systems provided by Google Natural Language API¹, Amazon Web Service Comprehend², IBM Watson Natural Language Understanding³ and Microsoft Azure Cognitive Service⁴ with sentences from German job reference letters. For this purpose we compiled a test corpus with typical German job reference letter sentences. We explored whether the sentiment scores for almost identical sentences differ between male and female subjects, between German and Turkish surnames, and between German surnames with and without nobiliary particle.

Our experiments show that all tested sentiment analysis services evaluate almost identical sentences unpredictably differently, depending on the subject used in a sentence.

¹ https://cloud.google.com/natural-language/
² https://aws.amazon.com/comprehend
³ https://www.ibm.com/watson/services/natural-language-understanding/
⁴ https://azure.microsoft.com/en-en/services/cognitive-services/

II. RELATED WORK

Several studies concerned with bias of NLP systems regarding gender or origin have shown how algorithms reproduce stereotypes. A comparison of 200 sentiment analysis systems [9] found evidence for strong bias. In [8], the authors trained a system that learned word associations from the Common Crawl⁵ corpus via word embedding techniques. The system replicated stereotypes: women were more strongly associated with family and arts, while men were more strongly associated with career and sciences. Bolukbasi et al. [10] attempted to develop a methodology to reduce bias in applications using word embedding. However, algorithmic bias is not limited to NLP but also affects computer vision systems. It turned out that one particular training data set contained 33% more pictures with women associated with the activity cooking than men, which amplifies the reproduction of gender-based stereotypes when models trained with this data set are used [11].

The Gender Shades study [12] identified a high misclassification rate for face recognition algorithms with respect to the skin tone of a person. Compared to light-skinned individuals, dark-skinned individuals were up to 34.4% more likely to be misclassified.

However, traditional HR processes are biased, too. Several studies have shown that applicants with uncommon or foreign-sounding names are discriminated against by potential employers [13]–[15].

⁵ https://commoncrawl.org/

III. THE GERMAN JOB REFERENCE CORPUS (GJRC)

We compiled a test corpus with typical German job reference letter sentences from German books on how to write job reference letters [16]–[20]. We combined those template sentences with subjects of varying gender, origin and nobility.

We focused on comparing sentences with different surnames, expecting to receive lower sentiment scores for non-German names. We chose to compare German surnames with Turkish surnames as citizens of Turkish origin represent the largest minority in Germany [21].

Further, we expected higher sentiment scores for names that indicate nobility ("von", "zu"). However, a nobiliary particle has not granted any privileges by law in Germany since 1919 [22]. Consequently, we examined the differences for ten German surnames with nobiliary particle as well as ten Turkish surnames and ten common German surnames.

To find suitable surnames for our experiment, we looked up the lists of members of the German state parliaments. Then we mapped the names to their origins and randomly picked ten German surnames, ten German surnames with nobiliary particle and ten Turkish surnames. In Table I we listed all surnames that we used to compile the GJRC.

TABLE I
SURNAMES USED TO COMPILE THE CORPUS.

German        German with nobiliary particle    Turkish
Becker        von Eyb                           Aras
Dürr          von Halem                         Erikli
Gruber        vom Bruch                         Bozoğlu
Haußmann      von Berg                          Çağlar
Klein         von Breitenbuch                   Taş
Pfeiffer      von Brunn                         Dogan
Sänze         von Danwitz                       Demirel
Pohl          von Angern                        Özdemir
Stettner      von Kalben                        Yilmaz
Zimmermann    von Pein                          Yüksel

Following a literature review, we collected 843 different sentences that are commonly used in German job references. In order to generate multiple versions of the same sentence, we modified each one so that it can be used as a template: all words that are gender-specific or require gender-specific declension were substituted with a suitable placeholder. Four examples of such template sentences are shown in Table II. Note that by law, German reference letters must be phrased favorably to the employee [23], even if the employee did not perform well. Therefore, a generally positive sentiment can be expected.

To compile the German Job Reference Corpus, we combined each template sentence with each of the 30 different surnames and both gender-specific titles. This yields 60 distinct sentences originating from the same template. Additionally, we altered each template sentence by replacing the title and surname with the corresponding male or female pronoun, thus adding another two sentences per template to the corpus. In total, the corpus consists of 52,266 sentences (843 templates with 62 variants each), out of which 1,686 sentences are formed with a pronoun instead of a name. Corpus available on GitHub⁶.

⁶ https://github.com/iug-htw/GJRC

IV. EXPERIMENT

We conducted an experiment using the cloud services for sentiment analysis from Amazon, Google, IBM and Microsoft, because they own the biggest market shares [24]. We tested whether there are significant differences in the sentiment scores depending on gender or surname of the subject in the corpus sentences. We compared all sentences containing a female subject with those containing a male subject. Accordingly, we compared German surnames with Turkish surnames and German surnames with and without nobiliary particles.

Using Python scripts, we automated the sentiment analysis of all 52,266 sentences via the services' application programming interfaces (API). The data was collected through July 2019 and is also available on GitHub⁷. Due to free tiers and promotion credits the overall cost for this study did not exceed $100.

In the following section we describe the sentiment analysis and how susceptible the four services are to variations in the three categories gender, origin and nobility.

A. Data pre-processing

Each of the services provides scores on a different scale. The scores $X^{G}$ of Google and $X^{IBM}$ of IBM lie in the interval $[-1, 1]$. For a service $S \in \{G, IBM\}$, $|X^{S}| \in (0, 1]$ defines the magnitude and $\operatorname{sgn}(X^{S}) \in \{-1, 1\}$ defines the direction of negative or positive ratings. Note that a realization of $X^{S} = 0$ defines a neutral rating without magnitude.

The ratings $X^{M}$ provided by Microsoft lie in the interval $[0, 1]$, where $x^{M} = 0$ represents the maximal realization of a negative rating and $x^{M} = 1$ defines the positive extremum. We define the correction

\[ \hat{X}^{M} := X^{M} \cdot 2 - 1 \in [-1, 1] \tag{1} \]

in order to compare this score to the previously introduced ones. The scores of the Amazon service were provided in the shape $(X^{A}) \in [0, 1]^{|L|}$, where $L := \{\text{negative}, \text{neutral}, \text{positive}, \text{mixed}\}$ defines all possible labels and each value represents the probability of belonging to the respective label. Similar to before, we have mapped the scores provided by Amazon to our desired scale as seen in (4). For

\[ (X^{A})_{i=1,\dots,4} := (X^{A}_{neg}, X^{A}_{neu}, X^{A}_{pos}, X^{A}_{mix}) \tag{2} \]

and realizations

\[ (x^{A}_{neg}, x^{A}_{neu}, x^{A}_{pos}, x^{A}_{mix}) \tag{3} \]

of the random variable, we have removed the label mixed as it never occurred as highest probability and mapped the remaining probabilities to

\[ \hat{X}^{A} := \frac{X^{A}_{pos} - X^{A}_{neg}}{X^{A}_{neg} + X^{A}_{neu} + X^{A}_{pos}} \in [-1, 1], \tag{4} \]
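As a rough illustration of the corpus construction in Section III and the score mappings (1) and (4), the following Python sketch may help. It is not the authors' script. The function names, the `{subject}` placeholder syntax, the titles "Herr"/"Frau" and the example sentence are assumptions made for this example, and the real templates additionally contain placeholders for gender-specific declension, which are omitted here.

```python
from itertools import product


def rescale_microsoft(score):
    """Map a Microsoft score from [0, 1] to [-1, 1], following (1)."""
    return 2.0 * score - 1.0


def rescale_amazon(neg, neu, pos):
    """Map Amazon label probabilities to [-1, 1], following (4).

    The 'mixed' probability is dropped, as described in the text.
    """
    total = neg + neu + pos
    if total <= 0:
        raise ValueError("label probabilities must sum to a positive value")
    return (pos - neg) / total


def expand_template(template, surnames):
    """Build all variants of one template sentence.

    Hypothetical simplification: the template holds a single '{subject}'
    placeholder; titles and pronouns below are assumed, not taken from
    the published corpus.
    """
    titles = ("Herr", "Frau")
    pronouns = ("Er", "Sie")
    named = [
        template.format(subject=f"{title} {surname}")
        for title, surname in product(titles, surnames)
    ]
    pronominal = [template.format(subject=p) for p in pronouns]
    return named + pronominal


if __name__ == "__main__":
    # Three surnames from Table I; the sentence is an invented example.
    demo = expand_template(
        "{subject} erledigte alle Aufgaben zu unserer vollsten Zufriedenheit.",
        ["Becker", "von Eyb", "Aras"],
    )
    for sentence in demo:
        print(sentence)
    print(rescale_microsoft(0.75), rescale_amazon(0.25, 0.25, 0.5))
```

With all 30 surnames from Table I, `expand_template` returns 30 × 2 + 2 = 62 sentences per template, which matches the reported corpus size of 843 × 62 = 52,266. Both rescaling functions return values in [-1, 1], so their outputs are directly comparable with the Google and IBM scores.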
