Turkish Journal of Physiotherapy and Rehabilitation; 32(3) ISSN 2651-4451 | e-ISSN 2651-446X

A COMPREHENSIVE STUDY OF AUTOMATED AUTHOR PROFILING TECHNIQUES FOR PERSONALITY TRAIT DETECTION

Aparna M C1, Dr M.N. Nachappa2
1Research Scholar, School of Computer Science and Information Technology, JAIN (Deemed to be University), Bangalore [email protected]
2Professor and Head, School of Computer Science and Information Technology, JAIN (Deemed to be University), Bangalore [email protected]

ABSTRACT

The features of an individual's behaviour, motivation, emotion, and cognitive patterns make up their personality. Personality strongly influences a person's life, physical well-being, work choices, and health [1]. Author Profiling (AP) is the task of determining an author's demographic traits, such as age group, native language, educational qualification, and personality traits [2]. The field is being actively researched because it has applications in many areas, such as marketing. Progress has been considerable, as machine learning and deep learning models such as Support Vector Machines, M5 regression, CNNs, and RNNs are now used for automatic classification. This paper describes significant machine learning and deep learning models for Author Profiling aimed at personality detection, along with an extensive study of current research trends, challenges, and applications of detecting the personality traits of the author of a text.

Keywords: Author Profiling, Big 5 personality traits, Text Classification.

I. INTRODUCTION

An individual's choice of words and the way they use them convey a great deal of information about that individual and hint at their age, social status, sex, personality, and motives [3]. Hence, whether a person is emotionally distant or close, shallow or thoughtful, neurotic, extraverted, or open to new experiences can be sensed from the words that person uses. Moreover, large volumes of text samples can be analysed to identify the personality traits of authors automatically and to predict their potential responses and behaviours [4]. Machine learning (ML) algorithms are used extensively in a rapidly evolving digital world to detect relevant patterns in the data that surrounds us. Author profiling thus aims to infer information about a person by analysing texts written by that person using ML or deep learning techniques [5]. It has extensive applications in digital data security, spotting predatory activities, countering cyber-terrorism, and detecting fraud. It is also used in more familiar settings such as market research, chatbots, diagnosis, and improving customer service [6].

This paper presents a survey of how automated personality trait detection methodologies have progressed through the years. The basis for measuring personality traits is covered first, under "Personality trait measures". Section II covers various applications of the study. Section III gives a detailed description of the datasets available for the study of AP. Section IV analyses the current research trends in AP for personality traits using machine learning and deep learning, along with their methodologies. Finally, Sections V and VI describe the challenges and conclusions, respectively.

Personality trait measures: While personality theory remains marked by its own set of disputes, researchers have primarily concentrated on the Big-5 personality trait model over the last few decades. Numerous additional traits are being fused into the Big-5 trait model for entrepreneurial work, including innovativeness, self-efficacy, risk attitudes, and locus of control. Thus, personality traits are very helpful in predicting a person's life outcomes.

A personality trait is a characteristic pattern of thinking, feeling, or behaving that tends to be consistent over time and across similar situations [7]. Natural languages include many adjectives for describing personality, and many of them can be arranged under the Big Five trait dimensions:

a) Extroversion (e.g., friendly, energetic and assertive).

b) Agreeableness (e.g., compassionate, respectful, and trusting).

c) Conscientiousness (e.g., orderly, hardworking, and responsible).

d) Neuroticism (e.g., worrying, temperamental, and pessimistic).

e) Openness to Experience (e.g., intellectual, artistic, imaginative, and open-minded).

A great number of individual, interpersonal, and social-institutional outcomes have been associated with the Big 5 personality traits, and the reviews conducted to date support this association [5]. For example, high Extroversion has been related to leadership capacity and status in society, Conscientiousness to health and job performance, Agreeableness to relationship satisfaction and volunteerism, Neuroticism to negative emotionality and relationship conflict, and Openness to political liberalism and spirituality [8].

As most research uses the Big-Five personality trait measurement for classification, we considered this model for the analysis and for comparing model accuracies.

II. APPLICATIONS

Automated personality recognition systems have a large number of industrial applications today. Given the research under way, market opportunities can be expected to grow in the near future. The models are expected to measure personality consistently and accurately; if this is achieved, demand for automated personality recognition software is likely to increase. Since research in this field is still developing, models with better reliability and accuracy are likely to emerge. Almost every human-computer interaction in the future could involve artificial intelligence. Many computational devices equipped with a personality are being developed, which allows them to react to different people in different ways. For example, a phone could offer different modes for people with different personalities, leading to more personalized interaction with the device. Personality traits can also be used to achieve higher accuracy in tasks such as sarcasm detection, polarity disambiguation of words, and lie detection.

Below are a few areas in which personality detection plays a vital role:

a) Enhanced Personal Assistants: Automatic detection of the user's personality can help automated voice assistants such as Google Assistant, Siri, and Alexa give customized responses. To increase user satisfaction, these assistants can also be programmed to display different personalities depending on the user's personality.

b) Recommendation Systems: People who share a personality type tend to share similar interests and hobbies. Thus, products and services that one user has rated positively can be recommended to another user of a similar personality type. For example, Yin et al. proposed modelling customers' automobile purchase intentions based on their hobbies and personality [9]. In addition, Yang and Huang developed a game recommendation system that recommends games to players according to personality traits derived automatically from their chats with other players [10].

c) Polarity Detection of words: Cambria et al. mention that personality detection is a subtask of sentiment analysis and that it can be used for polarity disambiguation of words in sentiment lexicons [11]. Majumder et al. state that it can also be used to distinguish sarcastic from non-sarcastic content [12].

d) Specialized Health Care and Counselling: People now give more importance to mental health care than in earlier years, and many individuals are coming forward to seek professional counselling for mental health-related issues. Personality detection can be very helpful in addressing such problems and in providing better counselling guidance.

e) Forensic Science: Determining the linguistic profile of the author of a suspicious text can be very helpful in gaining background information about that author. Personality detection models can also help increase the accuracy of lie detectors [13].

f) Job Recruiting: Most recruiting processes today concentrate on the aptitude of the candidate. While aptitude is essential, it is equally important to recruit a candidate whose personality matches the job role so that they perform better in the position. For example, after a personality detection test, candidates with high neuroticism scores can be eliminated for positions that require leadership qualities [14].

g) Study of Psychology: Automated detection of personality traits can help in finding strong relations between people's personality traits and their behaviours. This information can be helpful in discovering new dynamics of the human spirit.

h) Forecasting personality traits of voters: On a broader scale, politicians can use automated personality detection as a parameter for developing campaigns targeted at voters of a certain personality. Analytics firms can create psychographic profiles of voters from large amounts of analysed behavioural data. These profiles can be very helpful for targeted campaigning based on the interests of people with a certain kind of personality at a location for a particular political event.

i) Smartphone addiction levels: In general, addictions are closely related to an individual's personality, and the same applies to smartphone addiction. Personality trait detection is therefore also used to predict smartphone addiction levels.

III. AVAILABLE DATASETS

Different datasets are being created for author profiling in various languages. Besides personality detection, these datasets are used for many other applications such as gender identification, native language identification, occupation detection, and age group detection. A few of the datasets are listed below:

k) Enron Dataset: Joe Bartling collected the Enron data over two weeks in May 2002 at the Enron Corporation headquarters in Houston, and the CALO Project (A Cognitive Assistant that Learns and Organizes) then prepared it. The dataset contains texts from 150 users drawn from senior management, organized into folders. It contains around 500,000 emails. The Federal Energy Regulatory Commission obtained the data during its investigation, after which the data was made public.

l) RusPersonality: A Russian corpus created for several purposes, including authorship profiling, authorship attribution, deception detection, and gender detection. The corpus contains over 1850 documents from 1145 respondents and is still expanding. The average text length is around 230 words. The corpus is freely available on request for academic research purposes. Tatiana Litvinova is the head of the Corpus and Authorship Profiling Lab (RusProfiling Lab) [15].

m) SMS AP 18: A corpus of SMS texts in English and Roman Urdu. It has 810 profiles, of which 610 are male and 200 are female. The dataset has 84,694 SMS messages, an average of 104.56 messages per profile with a standard deviation of 64.87. In total there are 732,219 words (tokens) and 50,974 word types (unique tokens) [16].

n) The NUS SMS Corpus: This corpus consists of SMS (Short Message Service) messages compiled mainly for study at the Department of Computer Science, National University of Singapore. The dataset contains 67,093 SMS messages taken from the corpus. The messages were collected primarily from Singaporeans, mostly students of the University, who volunteered to share their data. They were informed about the research and about how the data would be used. The data collectors also gathered metadata about the messages and their senders to enable different types of evaluation. The corpus was created by Tao Chen and Min-Yen Kan [17].

o) RUEN-AP-17: It contains authors' profiles along with demographic details such as age, gender, language, native place, qualification, personality, and occupation. It includes a bilingual dictionary of 7749 entries that translates Roman Urdu words to English. It comprises 479 unique profiles with 1,032,899 terms and 453,986 word types, an average of 2156 tokens per profile [6].

p) Essays I: This dataset has about 2468 anonymous essays labelled with the personality traits of their authors. Volunteers wrote stream-of-consciousness texts in a controlled environment, and the authors were asked to rate their own Big-Five personality traits [3].

q) SenticNet Dataset: The dataset contains more than 14,000 concepts with polarity scores ranging from -1.0 to +1.0. It also has 7600 multiword concepts [18].

r) MBTI Kaggle: A dataset collected from users of personalitycafe.com, an online forum where users answer a questionnaire that sorts them into MBTI types and then chat publicly with other users. The dataset is available on Kaggle and contains 8600 rows with two columns: (1) the person's MBTI type indicator and (2) fifty posts by that user. With 50 posts per user, the number of data points is 430,000 [19].

s) MyPersonality: This dataset was created through a Facebook app that asked its users to answer a personality questionnaire, and the responses were later used for psychological research. However, data sharing for research purposes was stopped in 2018 [12].

t) Italian FriendFeed: A sample of 748 Italian FriendFeed users yielded a total of 1065 posts. The data was gathered from FriendFeed's public URL, where fresh posts are published. Language identification has already been applied to this dataset [20].

u) Personae: A corpus built specifically for predicting personality traits. It is a collection of 145 student texts about a documentary on Artificial Life, containing both factual description and opinion, with around 1400 words per student [21].

IV. RELATED WORK

A survey on Machine Learning Classifiers: To classify the author of a text by personality traits, researchers have used a variety of machine learning techniques such as Bayesian classifiers, K-nearest neighbour (KNN), Decision Trees, Support Vector Machines (SVMs), Neural Networks, Rocchio's Algorithm, Latent Semantic Analysis, Fuzzy Correlation, and Genetic Algorithms, among others [22]. Pennebaker et al. [23] discovered some of the earliest reliable links between language use and personality, such as the frequency with which certain terms are used. Extraverts, for example, were more likely to use words like great, happy, and awesome to describe positive emotions. Those with higher levels of Neuroticism, on the other hand, were found to use first-person singular pronouns like me, I, and mine more frequently.

Poria et al. went on to investigate the influence of common-sense knowledge combined with sentiment data on personality recognition. They built a feature vector from LIWC features, Sentic-based emotional features, and MRC features, and used these features to train a model with the Sequential Minimal Optimization (SMO) classifier [18].

Personality trait detection has gained popularity in various fields, such as game recommendation systems and fake news alert systems. Hsin-Chang Yang et al. created a model to detect the personality traits of gamers in order to improve their game recommendation system. They employed the M5 regression tree, which gave them 74 percent accuracy in Roshchina's report. The recognizer model assigns each trait a score on a scale of 1 to 7; for example, a 7 denotes a person who is exceptionally outgoing [10]. Shrestha et al. developed a model that uses linguistic and personality features to detect fake news spreaders in social media networks. They employed Extra Trees for n-grams and tweet embeddings, and logistic regression for sentiment analysis as the best classifier for style features. Their model obtained an F accuracy score of 0.72 [24].

Personality identification through social media, particularly Twitter, has gained popularity in recent years. This may be partly because data collection is direct and available via the Twitter API [25].

Framework for Author Profiling to detect personality traits using Machine Learning models:

Preprocessing the dataset: Preprocessing converts a collection of documents into a modified representation. It is carried out primarily to reduce the complexity of the documents and make them easier to handle; here, the documents are transformed from full text into document vectors. One of the most prominent properties of text classification problems is the high dimensionality of text data, and document preprocessing allows the data to be manipulated and represented properly. A few of the steps in data preprocessing are:

Tokenization: This step breaks lengthy documents or texts into smaller chunks, or tokens. Larger sections of text are tokenized into sentences, which can then be tokenized into words, and so on. It is also known as text segmentation or lexical analysis.

Normalization: This step brings all of the text to a uniform form. Examples include converting text to upper or lower case, converting numerals to their word equivalents, and eliminating punctuation. Three common normalization steps are stemming, lemmatization, and removal of stop words.
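The tokenization and normalization steps described above can be illustrated with a minimal sketch in plain Python. It is not taken from any of the surveyed systems: the regular expressions, the tiny stop-word list, and the function names are assumptions made for illustration, and stemming and lemmatization are left out.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into word tokens, which also drops punctuation."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def normalize(tokens):
    """Lower-case the tokens and remove stop words."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOP_WORDS]


text = "The party was GREAT! I am happy to meet new people."
sentences = split_sentences(text)
tokens = [normalize(tokenize(s)) for s in sentences]
print(sentences)
print(tokens)
```

The stop-word list here deliberately keeps the pronoun "i". Since first-person singular pronouns are reported above as a cue for Neuroticism, removing them as stop words could discard a useful signal, so the choice of stop words may need to be revisited for personality detection.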

Feature Extraction: Efficient data manipulation and representation require Dimensionality Reduction (DR). This is a crucial step because high-dimensional, redundant features and irrelevant data often reduce the performance of classification algorithms. DR excludes irrelevant keywords with the help of a statistical process, thereby creating a low-dimensional vector. Feature extraction methods include bag-of-words representations such as count vectors and TF-IDF, as well as word embeddings.
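As an illustration of the count-vector and TF-IDF representations named above, the following sketch computes both by hand. It assumes one common TF-IDF variant (term frequency divided by document length, inverse document frequency as log(N / df)); libraries often use smoothed variants, and the three toy documents are invented for the example.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of all distinct tokens in the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})


def count_vector(doc, vocab):
    """Bag-of-words: raw count of each vocabulary term in one document."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF with tf = count / document length and idf = log(N / df).

    Assumes every document is non-empty.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vectors


# Toy, already tokenized documents (invented for illustration).
docs = [
    ["happy", "great", "party"],
    ["worried", "i", "worried"],
    ["great", "party", "i"],
]
vocab = build_vocabulary(docs)
bow = [count_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)
print(vocab)
print(bow)
```

A term that occurs in every document gets an IDF of log(1) = 0 under this variant, which is how TF-IDF down-weights words that do not help separate authors.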

Feature Selection: This step is used to increase the accuracy, efficiency, and scalability of the text classifier. The key idea is to pick a subset of features from the original set of extracted features. Information Gain (IG), Chi-Square, and Odds Ratio are a few examples of feature selection strategies.
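Of the strategies listed above, Chi-Square is perhaps the simplest to write out. The sketch below scores each term by the chi-square statistic of the 2x2 table "term present or absent" versus "document in the target class or not", and keeps the top k terms. The class labels and documents are hypothetical toy data, and the helper names are assumptions.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic between presence of `term` and class `target`."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or class is constant, so there is no association
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, target, k):
    """Return the k terms with the highest chi-square score (ties: A to Z)."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(
        vocab, key=lambda t: (-chi_square(docs, labels, t, target), t)
    )
    return ranked[:k]


# Hypothetical toy corpus with made-up class labels.
docs = [
    ["great", "party", "friends"],
    ["great", "fun", "friends"],
    ["quiet", "book", "home"],
    ["quiet", "home", "friends"],
]
labels = ["extravert", "extravert", "introvert", "introvert"]
top = select_top_k(docs, labels, "extravert", 2)
print(top)
```

Note that the statistic measures the strength of association, not its direction: in this toy corpus "home" scores as high as "great" because its absence is just as informative about the target class as the presence of "great".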

Classification Models: Automatic text classification has been researched intensively, and the field is progressing quickly. To build a model that classifies text by personality traits, machine learning approaches such as Decision Trees, Bayesian classifiers, K-nearest neighbour (KNN), Latent Semantic Analysis, Support Vector Machines (SVMs), Rocchio's Algorithm, Genetic Algorithms, and Fuzzy Correlation, among others, have been used as classifiers. Supervised learning approaches are commonly used for automatic text classification: pre-defined category labels are assigned to texts based on the likelihood suggested by a training set of labelled texts.
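To make the supervised setting concrete, the sketch below implements one of the classifier families named above, a Bayesian classifier, as a multinomial Naive Bayes model with add-one smoothing. It is a minimal illustration rather than the method of any surveyed paper; the class name, the two labels, and the training texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {t for doc in docs for t in doc}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            total = sum(self.term_counts[label].values())
            # Log prior of the class.
            score = math.log(self.class_counts[label] / n_docs)
            for term in doc:
                if term not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.term_counts[label][term]
                # Smoothed log likelihood of the term given the class.
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled training texts (already tokenized).
train_docs = [
    ["great", "party", "friends"],
    ["happy", "awesome", "party"],
    ["worried", "alone", "tired"],
    ["worried", "sad", "alone"],
]
train_labels = [
    "high_extraversion",
    "high_extraversion",
    "low_extraversion",
    "low_extraversion",
]
model = NaiveBayesText().fit(train_docs, train_labels)
prediction = model.predict(["awesome", "party", "tonight"])
print(prediction)
```

The training set of labelled texts plays the role described above: the label assigned to a new text is the one under which its words are most probable, given the word counts seen for each label during training.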

Fig 1: General Framework of AP system

Related work on Deep learning technologies: Machine learning has been the focus of many researchers since the early 2000s. It is worth noting, however, that neural network designs and models have improved since 2014, opening the path for deep learning models to flourish. These deep learning models outperform earlier machine learning models in terms of accuracy. Social media data is used extensively for research in the AP field these days, but essay datasets also remain a popular choice.

For document-level personality recognition, Majumder et al. used a deep Convolutional Neural Network (CNN). Each word was represented in the input as a fixed-length feature vector using Word2Vec, and the CNN was used to extract unigram, bigram, and trigram features from the text. A total of 84 additional features were then added to the feature vector extracted by the deep CNN. Finally, this vector was fed into a fully connected layer for the final detection of personality traits [3].

Hernandez and Scott used a Recurrent Neural Network (RNN) to capture some of the information that ML classifiers such as Naïve Bayes would otherwise ignore. They used Keras to evaluate multiple RNN models, including simple RNN, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and bidirectional LSTM. Among these, they found that LSTM gave the best results [19].

Liu et al. used characters, the atomic units of text, as features. They combined them with a deep learning model in which an individual's personality trait is predicted using a hierarchical, vectorial word and sentence representation. They used a bi-RNN with GRU as the recurrent unit instead of standard word embeddings like GloVe or Word2Vec. The performance of GRU is similar to that of LSTM, but it is computationally less expensive [26].

2CLSTM is a personality recognition model developed by Xiangguo Sun et al. This approach combines bidirectional LSTMs (Long Short-Term Memory networks) with a CNN to detect user personalities based on the structure of texts. Here, the CNN is used to reconstruct the course of creating the articles, while the RNN deciphers the text by mimicking human reading behaviour. The abstract feature combination based on closely connected sentences is represented by a notion called Latent Sentence Group (LSG) [27].
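The idea of using a CNN to extract unigram, bigram, and trigram features, mentioned above for document-level recognition, can be pictured as sliding a filter that spans n consecutive word vectors over the text and keeping its strongest response. The sketch below shows this in plain Python. It is not the architecture of [3]: the two-dimensional word vectors and the filter weights are made-up values, whereas a real model would use pretrained word vectors and learn many filters during training.

```python
def conv_max_pool(embeddings, filt):
    """Slide an n-word filter over the word vectors and keep the max response.

    `embeddings` is a list of word vectors, `filt` a list of n weight
    vectors (one per word position in the n-gram window).
    """
    n = len(filt)  # filter width = n-gram size
    responses = []
    for start in range(len(embeddings) - n + 1):
        window = embeddings[start:start + n]
        score = sum(
            w * x
            for filt_row, vec in zip(filt, window)
            for w, x in zip(filt_row, vec)
        )
        responses.append(max(0.0, score))  # ReLU activation
    # Max-over-time pooling; 0.0 if the text is shorter than the filter.
    return max(responses, default=0.0)


# Made-up 2-dimensional "word vectors" for a four-word sentence.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# One hypothetical filter per n-gram size (1, 2 and 3 words wide).
filters = {
    1: [[1.0, -1.0]],
    2: [[1.0, 0.0], [0.0, 1.0]],
    3: [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
}
features = [conv_max_pool(embeddings, filters[n]) for n in (1, 2, 3)]
print(features)
```

Each filter contributes one number to the feature vector regardless of text length, which is why the pooled output can be combined with other fixed-length features and passed to a fully connected layer.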

Fig 2: Architecture of 2CLSTM model[27].

The general framework of Author Profiling using Deep Learning Methodologies: A neural network is a connectionist computational system. Conventional computational systems are procedural: the program starts from the first line of code and executes the lines one after another. A neural network does not follow that sequence. Instead, it is a structure made up of nodes that process information in parallel. Each node of the network is called a neuron, and a network with more neurons can generally model more complex patterns. The ability to learn is one of the strongest features of neural networks. A neural network is a complex system that adapts to its environment by modifying its internal structure as data flows through it. Each connection between nodes has an associated weight, and the signal the network produces is judged as either good or bad depending on its output. If the signal is bad, the network adjusts the weights to improve its output. If the signal is good, the weights need not be adjusted.
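The weight-adjustment behaviour described above can be sketched with the classic perceptron learning rule for a single neuron. This is used here only as the simplest stand-in: deep learning models are normally trained with backpropagation and gradient descent rather than this rule, and the two input features and their values are hypothetical.

```python
def predict(weights, bias, features):
    """Single neuron: weighted sum followed by a step activation."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(samples, n_features, learning_rate=0.1, epochs=20):
    """Perceptron rule: change the weights only when the output is wrong."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            error = target - predict(weights, bias, features)
            if error == 0:
                continue  # "good" signal: leave the weights unchanged
            # "Bad" signal: nudge each weight toward the correct output.
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


# Hypothetical features per text:
# [count of positive-emotion words, count of worry-related words]
# Target 1 = high on some trait, 0 = low (made-up labels).
samples = [([3, 0], 1), ([2, 1], 1), ([0, 3], 0), ([1, 2], 0)]
weights, bias = train(samples, n_features=2)
outputs = [predict(weights, bias, f) for f, _ in samples]
print(outputs)
```

On this toy data the rule stops changing the weights once every training example is classified correctly, which mirrors the statement that a good signal requires no adjustment.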

Fig 3: A general framework of a Neural network-based model for text classification.

V. CHALLENGES

a) Many social media datasets are available, but they have various limitations. These include the unavailability of comprehensive data for analysis due to privacy issues. In addition, the information given on social media might not be genuine, as the number of fake profiles grows every day [25].

b) Automated author profiling belongs to computer science, whereas psychology belongs to the social sciences; building a bridge between the viewpoints and validation techniques of both fields is therefore very important. Rather than blindly trusting the data given, one must ask how reliable the data is and whether it is acceptable to forgo human intervention completely.

c) Identifying the most appropriate personality trait model for a given application is still an open question. Researchers have found a discrepancy between observed personality and self-assessed personality.

d) Cross-cultural and cross-lingual effects on personality have to be studied to improve the accuracy of personality detection models. To achieve this, corpora from other languages need to be included, which will make the models more complex.

e) Data generated on social media, such as status updates, tweets, comments, and reviews, usually contains slang such as "btw" for "by the way" and "ppl" for "people". These words do not exist in the dictionary, which in turn affects the accuracy of Author Profiling algorithms.

f) The foremost problem in Author Profiling is the ambiguity of language: a word or phrase may have several meanings and can therefore be understood in multiple possible senses.
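One simple way to reduce the slang problem raised in challenge (e) is to expand known abbreviations with a lookup table before further processing. The sketch below assumes such a table exists; only the two entries "btw" and "ppl" come from the text above, and the function name and regular expression are illustrative choices.

```python
import re

# Hypothetical lookup table; a real one would need many more entries.
SLANG = {"btw": "by the way", "ppl": "people"}


def expand_slang(text, table=SLANG):
    """Replace each word found in the table (case-insensitively)."""

    def replace(match):
        word = match.group(0)
        return table.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", replace, text)


expanded = expand_slang("Btw, ppl here are friendly")
print(expanded)
```

A table-based approach only covers slang that has already been catalogued, so it eases rather than removes the problem, and it does nothing for the ambiguity described in challenge (f).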

VI. CONCLUSION

Machine learning and psychology have many complementary interests, leading researchers to discover novel ways of solving problems in the respective domains. As a result, it is critical to understand the diverse demands and interests of the numerous stakeholders across disciplines in order to achieve significant innovation in the field. In this paper, we discussed popular Author Profiling techniques for personality trait detection. Given our current understanding of the problems, there is a lot of room for further research in this area.

REFERENCES

1 S. Hussain, M. Abbas, K. Shahzad, and S. A. Bukhari, “PERSONALITY TRAITS AND CAREER CHOICES,” no. December, 2016, doi: 10.5897/AJBM11.2064.
2 S. Argamon et al., “Psychological aspects of natural language use: Our words, our selves,” Proc. Ninth Int. Conf. CLEF Assoc. (CLEF 2018), vol. 20, no. 4, pp. 42–46, Feb. 2018, doi: 10.1145/3293339.3293342.
3 N. Majumder, I. P. Nacional, A. Gelbukh, and I. P. Nacional, “Deep Learning-Based Document Modeling for Personality Detection from Text,” 2017.
4 F. Rangel, P. Rosso, M. Potthast, and B. Stein, “Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter,” Work. Notes Pap. CLEF, 2017.
5 D. J. Ozer and V. Benet-Martínez, “Personality and the prediction of consequential outcomes,” Annu. Rev. Psychol., vol. 57, no. February, pp. 401–421, 2006, doi: 10.1146/annurev.psych.57.102904.190127.
6 M. Fatima, K. Hasan, S. Anwar, and R. M. A. Nawab, “Multilingual author profiling on Facebook,” Inf. Process. Manag., vol. 53, no. 4, pp. 886–904, 2017.
7 G. Chittaranjan, B. Jan, and D. Gatica-Perez, “Who's who with big-five: Analyzing and classifying personality traits with smartphones,” Proc. - Int. Symp. Wearable Comput. ISWC, pp. 29–36, 2011, doi: 10.1109/ISWC.2011.29.
8 C. J. Soto, “How Replicable Are Links Between Personality Traits and Consequential Life Outcomes? The Life Outcomes of Personality Replication Project,” Psychol. Sci., vol. 30, no. 5, pp. 711–727, 2019, doi: 10.1177/0956797619831612.
9 M. Briedienė and J. Kapočiutė-Dzikienė, “An automatic author profiling from non-normative Lithuanian texts,” CEUR Workshop Proc., vol. 2145, pp. 99–105, 2018.
10 H. C. Yang and Z. R. Huang, “Mining personality traits from social messages for game recommender systems,” Knowledge-Based Syst., vol. 165, pp. 157–168, 2019, doi: 10.1016/j.knosys.2018.11.025.
11 E. Cambria, S. Poria, A. Gelbukh, and M. Thelwall, “Sentiment Analysis Is a Big Suitcase,” IEEE Intell. Syst., vol. 32, no. 6, pp. 74–80, 2017, doi: 10.1109/MIS.2017.4531228.
12 Y. Mehta, N. Majumder, A. Gelbukh, and E. Cambria, “Recent trends in deep learning based personality detection,” Artif. Intell. Rev., vol. 53, no. 4, pp. 2313–2339, 2020, doi: 10.1007/s10462-019-09770-z.
13 F. Rangel, P. Rosso, B. Verhoeven, W. Daelemans, M. Potthast, and B. Stein, “Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations,” CEUR Workshop Proc., vol. 1609, pp. 750–784, 2016.
14 M. F. Annemarie and A. Sukma, Interdisciplinary Perspectives on Algorithmic Job Candidate Screening, 2018.
15 T. Litvinova, P. Seredin, O. Litvinova, and O. Zagorovskaya, “Profiling a set of personality traits of text author: what our words reveal about us,” Res. Lang., vol. 14, no. 4, pp. 409–422, 2016.
16 M. Fatima et al., “Multilingual SMS-based author profiling: Data and methods,” Nat. Lang. Eng., vol. 24, no. 5, pp. 695–724, 2018.
17 T. Chen and M.-Y. Kan, “The National University of Singapore SMS Corpus,” 2015.
18 S. Poria, A. Gelbukh, and B. Agarwal, “LNAI 8266 - Common Sense Knowledge Based Personality Recognition from Text,” pp. 484–496.
19 Hernandez and Knight, “Predicting MBTI from text.”
20 F. Celli, “Unsupervised Personality Recognition for Sites,” ICDS 2012, Sixth Int. Conf. Digit. Soc., no. May, pp. 59–62, 2012, [Online]. Available: http://personality.altervista.org/docs/2012_celli_icds.pdf.
21 K. Luyckx and W. Daelemans, “Personae: a Corpus for Author and Personality Prediction from Text,” 2008.
22 B. Baharudin, L. H. Lee, and K. Khan, “A Review of Machine Learning Algorithms for Text-Documents Classification,” J. Adv. Inf. Technol., vol. 1, no. 1, 2010, doi: 10.4304/jait.1.1.4-20.
23 J. W. Pennebaker, M. R. Mehl, and K. G. Niederhoffer, “Psychological aspects of natural language use: Our words, our selves,” Annu. Rev. Psychol., vol. 54, no. 1, pp. 547–577, 2003.
24 Shrestha, F. Spezzano, and A. Joy, “Detecting Fake News Spreaders in Social Networks via Linguistic and Personality Features: Notebook for PAN at CLEF 2020,” in L. Cappellato, C. Eickhoff, N. Ferro, and A. Névéol, CLEF 2020 Labs Work. Noteb. Pap., no. September, pp. 22–25, 2020, [Online]. Available: ceur-ws.org.
25 M. Pundlik Kalghatgi, M. Ramannavar, and N. S. Sidnal, “A Neural Network Approach to Personality Prediction based on the Big-Five Model,” Int. J. Innov. Res. Adv. Eng., vol. 2, no. 8, pp. 56–63, 2015, [Online]. Available: http://www.ijirae.com/volumes/Vol2/iss8/09.AUAE10095.pdf.
26 L. Liu, D. Preoţiuc-Pietro, Z. R. Samani, M. E. Moghaddam, and L. Ungar, “Analyzing personality through social media profile picture choice,” Proc. 10th Int. Conf. Web Soc. Media, ICWSM 2016, pp. 211–220, 2016.
27 X. Sun, B. Liu, J. Cao, J. Luo, and X. Shen, “Who am I? Personality detection based on deep learning for texts,” IEEE Int. Conf. Commun., vol. 2018-May, pp. 1–6, 2018, doi: 10.1109/ICC.2018.8422105.