Authorship Attribution and Author Profiling of Lithuanian Literary Texts Jurgita Kapociˇ ut¯ e-Dzikien˙ e˙ Andrius Utka Ligita Šarkute˙ Vytautas Magnus University Vytautas Magnus University Kaunas Univ. of Technology K. Donelaicioˇ 58, LT-44248, K. Donelaicioˇ 58, LT-44248, K. Donelaicioˇ 73, LT-44029, Kaunas, Lithuania Kaunas, Lithuania Kaunas, Lithuania
[email protected] [email protected] [email protected] Abstract posts, Internet comments, tweets, etc.) the au- thorship analysis is becoming more and more top- In this work we are solving authorship at- ical. In this respect it is important to consider tribution and author profiling tasks (by fo- the anonymity factor, as it allows everyone to ex- cusing on the age and gender dimensions) press their opinions freely, but on the other hand, for the Lithuanian language. This paper opens a gate for different cyber-crimes. Therefore reports the first results on literary texts, the authorship research –which for a long time which we compared to the results, pre- in the past was mainly focused on literary ques- viously obtained with different functional tions of unknown or disputed authorship– drifts to- styles and language types (i.e., parliamen- wards more practical applications in such domains tary transcripts and forum posts). as forensics, security, user targeted services, etc. Available text corpora, linguistic tools, and sophis- Using the Naïve Bayes Multinomial and ticated methods even more accelerate the devel- Support Vector Machine methods we in- opment of the authorship research field, which is vestigated an impact of various stylistic, no longer limited to authorship attribution (when character, lexical, morpho-syntactic fea- identifying who, from a closed-set of candidate tures, and their combinations; the differ- authors, is the actual author of a given anonymous ent author set sizes of 3, 5, 10, 20, 50, text document) only.