<<

future

Article Social Emotional Opinion Decision with Newly Coined Words and Polarity of Social † Networks Services

Jin Sol Yang 1 , Myung-Sook Ko 2 and Kwang Sik Chung 1,*

1 Department of Computer Science, Graduate School, Korea National Open University, Jongno-gu, Dongsung-Dong, Seoul 30387, Korea 2 Department of Business Administration, Bucheon University, WonMi-Gu, SimGok-Dong 25, Bucheon 14632, Korea * Correspondence: [email protected]; Tel.: +82-23668-4654 This paper is an extended version of our paper: Yang, J.S.; Chung, K.S. Newly-Coined Words and Emoticon † Polarity for Social Emotional Opinion Decision. In Proceedings of the IEEE 2nd International Conference on Information and Computer Technologies, Kahului, HI, USA, 14–17 March 2019; p. 76.

 Received: 9 June 2019; Accepted: 22 July 2019; Published: 25 July 2019 

Abstract: Nowadays, based on mobile devices and internet, social network services (SNS) are common trends to everyone. Social opinions as public opinions are very important to the government, company, and a person. Analysis and decision of social polarity of SNS about social happenings, political issues and government policies, or commercial products is very critical to the government, company, and a person. Newly coined words and on SNS are created every day. Specifically, emoticons are made and sold by a person or companies. Newly coined words are mostly made and used by various kinds of communities. The SNS big data mainly consist of normal text with newly coined words and emoticons so that newly coined words and emoticons analysis is very important to understand the social and public opinions. Social big data is informally made and unstructured, and on social network services, many kinds of newly coined words and various emoticons are made anonymously and unintentionally by people and companies. In the analysis of social data, newly coined words and emoticons limit the guarantee the accuracy of analysis. The newly coined words implicitly contain the social opinions and trends of people. The emotional states of people significantly are expressed by emoticons. Although the newly coined words and emoticons are an important part of the social opinion analysis, they are excluded from the emotional dictionary and social big data analysis. In this research, newly coined words and emoticons are extracted from the raw Twitter’s twit and analyzed and included in a pre-built dictionary with the polarity and weight of the newly coined words and emoticons. The polarity and weight are calculated for emotional classification. The proposed emotional classification algorithm calculates the weight of polarity (positive or negative) and results in total polarity weight of social opinion. If the total polarity weight of social opinion is more than the pre-fixed threshold value, the twit message is decided as positive. If it is less than the pre-fixed threshold value, the twit message is decided as negative and the other values mean neutral opinion. The accuracy of the social big data analysis result is improved by quantifying and analyzing emoticons and newly coined words.

Keywords: social media opinion; social big data; social big data analysis; emoticon; newly coined words; social emotional opinion

Future Internet 2019, 11, 165; doi:10.3390/fi11080165 www.mdpi.com/journal/futureinternet Future Internet 2019, 11, 165 2 of 14

1. Introduction As social networks services become popular and expresses social opinions more efficiently, social network service participants increase, and this increases the volume of social big data. Social big data increases a vast amount and this is increasing exponentially and spreading rapidly on social media. Also, social big data are mainly composed of mixed texts, music, and are the existing text-oriented data. By analyzing the emotions, opinions, and public opinion of these social big data, companies can create an emerging online market when social big data are used for corporate marketing and strategy development. The Korean government and global Korean company recognize social big data as important data and actively utilize it for customer’s needs and demands. For this reason, research on text mining, natural language processing, and emotional analysis is actively carried out. The emotional analysis based on social big data can help people’s consumption aspect and opinions on government policy or company product evaluation. According to the research [1], companies analyze and survey consumers’ opinion not only for their images, but also for other companies’ products and services. However, social big data contains many newly coined words and emoticons. The newly coined words and emoticon express people’s emotions and opinions appropriately and are directly related to the social emotion. These newly coined words and emoticons are difficult to classify systematically based on morphemes or are difficult to be preregistered. In this paper, Korean-based newly coined words and global emoticons are extracted from Twitter. The emotional polarity of SNS with the coined words are analyzed based on it. The remainder of the paper is organized as follows. In Section2, we review previous emotional analysis research and show weak points of them. The proposed social emotional opinion decision algorithms with newly coined words and the emoticon polarity of social network services (SNS) are described in Section3. We introduce the architecture and function processes of social emotional opinion decision with newly coined words and emoticon dictionary and the experiment results are described in Section4. Finally, we conclude in Section5.

2. Related Works Most of the early emotional analysis studies have analyzed the opinions of websites and social media services and provided analysis results of ‘positive/negative’ or ‘good/no’ [2–8]. In [9], online sentences that cannot be correctly analyzed by the previous morpheme analyzer is focused on, because they have many kinds of grammatical mistakes and misspellings. Further, the length of sentences is too short to understand the exact meaning. In order to solve these problems, this research used the word selection method using the priority of the words in the sentence. In this paper, an emotional analysis module that constructs a word property database is proposed. The emotional analysis module is dependent on the part of speech by separating verb and cognition based on the part-of-speech information extracted from the morpheme analyzer. Ref. [9] used Support-vector machine (SVM) algorithm for text emotion analysis. Ref. [9] can compensate for errors in spelling and spacing, but emotional analysis is difficult for actual newly coined words and emoticons in [9]. In order to solve these problems, emotional words and information are extracted from the newly coined word and emoticon by additionally using the coined word and emoticon dictionary that are proposed in this paper. In [10,11], polarity is defined only as negative and positive, and did not considered newly coined words and emoticons. In [12–14], polarity is classified between negative and positive. However, formal words and a formal dictionary are used to analyze SNS opinions. Previous Korean emotion classification research has not yet established a formal classification system for Korean emotion categories, and Korean emotional resource such as SentiWordNet in English is also lacking. This leads to the absence of learning corpus used in the machine learning approach, suggesting the difficulty of classifying Korean emotions. Previous Korean emotional analysis techniques exclude profanity, abbreviation, newly coined words and emoticon, and only analyzing standard vocabulary in dictionaries. However, many articles (texts) such as SNS, homepage comments, and contain spoken words, idiomatic expressions, Future Internet 2019, 11, 165 3 of 14 newly coined words, and emoticons. By filtering these texts, spoken words, idiomatic expressions, newly coined words, and emoticons are excluded from the emotional analysis process. In particular, Korean characters have vowels and consonants so that newly coined words come from a combination of vowels and consonants. These kinds of newly coined words need different emotional analyses from Future Internet 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/futureinternet English and have words that are constructed from English. In this paper, emotional analysis methods In particular, Korean characters have vowels and consonants so that newly coined words come from for Korean-based newly coined words and emoticons are proposed. a combination of vowels and consonants. These kinds of newly coined words need different emotional analyses from English and have words that are constructed from English. In this paper, 3. Newly Coined Words and Emoticon Extraction emotional analysis methods for Korean-based newly coined words and emoticons are proposed. Generally speaking, the internet news provided by mass media uses standard words. However, the social3. Newly data Coined created Words by and individuals Emoticon includeExtraction many non-standard words (newly coined words, emoticons,Generally etc.). Newly speaking, coined the internet words news and emoticons provided by are mass utilized media inuses SNS standard and they words. have However, sentimental meaning.the social For data example, created the by ‘mujigaemaeneoindividuals include (very many poor non-standard manners)’ words in the (newly article coined “geu words, salam-eun jeongmalemoticons, mujigaemaeneoya!!! etc.). Newly coined (He wo isrds in and very emoticons poor manners!!!)’ are utilized in in social SNS and data they is ahave newly sentimental coined word that combinesmeaning. For ‘muji example, (very)’ the and ‘mujigaemaeneo ‘gaemaeneo (poor (very manners)’ poor manners)’ and has in athe negative article meaning.“geu salam-eun However, jeongmal mujigaemaeneoya!!! (He is in very poor manners!!!)’ in social data is a newly coined word when performing the emotional analysis based on the existing Korean dictionary, “mujigaemae (very that combines ‘muji (very)’ and ‘gaemaeneo (poor manners)’ and has a negative meaning. However, poor manners)” of “geu salam-eun jeongmal mujigaemaeneoya!!! (He is in very poor manners!!!)” are when performing the emotional analysis based on the existing Korean dictionary, “mujigaemae (very excludedpoor frommanners)” the analysis of “geu salam-eun even though jeongmal they aremujigaemaeneoya!!! important emotion (He words.is in very If poor weseparately manners!!!)” set up newlyare coined excluded words from and the emoticons analysis even as sentimentalthough they are dictionaries important and emotion useexisting words. If emotional we separately dictionaries set basedup on newly Korean coined dictionaries, words and we emoticons can improve as sentim accuracyental dictionaries in emotional and analysis.use existing This emotional study uses Twitterdictionaries and Naver based Blog on to Korean establish dictionaries, newly coined we can words improve and accuracy emoticons-based in emotional sentimental analysis. This dictionaries. study Fromuses collected Twitter social and data,Naver Twitter Blog to extractsestablish the newl newlyy coined coined words words, and characteremoticons-based type emoticons, sentimental 4-byte characterdictionaries. type emoticons, From collected and Naver social blogdata, extractsTwitter extracts image type the newly emoticons. coined Twitter words, usescharacter its own type search applicationemoticons, programming 4-byte interface type emoticons, (API) to collect and Na socialver blog data, extracts and image since Navertype emoticons. does not Twitter provide its uses its own search application programming interface (API) to collect social data, and since Naver own API, it develops a separate crawler and collects blogs. We then extract the newly coined words does not provide its own API, it develops a separate crawler and collects blogs. We then extract the and emoticons from the collected social data using the proposed algorithm. Figure1 shows the whole newly coined words and emoticons from the collected social data using the proposed algorithm. processFigure of the 1 shows newly the coined whole wordsprocess and of the emoticon newly co extractionined wordsprocess. and emoticon extraction process.

FigureFigure 1. Newly1. Newly coined coined wordswords and emot emoticonicon extraction extraction process. process.

3.1. Social3.1. Social Data Data Collection Collection We collectWe collect Twitter Twitter and and Naver Naver blogs blogs to to collect collect social data data for for extracting extracting newly newly coined coined words words and and emoticons.emoticons. Twitter Twitter is a social is a networksocial network service thatservic communicatese that communicates in short sentences.in short sentences. The characteristics The characteristics of Twitter include ease of use, speed and scalability, and asymmetric networks. In of Twitter include ease of use, speed and scalability, and asymmetric networks. In terms of ease of terms of ease of use, Twitter is not burdened with writing short-sentence-oriented content with a 140- use, Twitter is not burdened with writing short-sentence-oriented content with a 140-character limit. character limit. It also has an advantage that it can be easily linked with external web pages through It alsomobile has an devices advantage and APIs that [1]. it A can blog be can easily be a linked specific with webpage external format web or a pages tool for through creating mobile it that can devices and APIssimply [1 ].upload A blog articles can be (such a specific as short webpage mentions format on daily or activities a tool for or creating existing itarticles) that can [2]. simply Real-time upload Twitter tweets are randomly collected through Twitter search API. Twitter’s search API provides

Future Internet 2019, 11, 165 4 of 14 articles (such as short mentions on daily activities or existing articles) [2]. Real-time Twitter tweets are randomly collected through Twitter search API. Twitter’s search API provides about 1%–2% real-time Future Internet 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/futureinternet tweets for all of Twitter’s tweet. Twitter4j library developed on the java platform is used. Figure2 about 1%–2% real-time tweets for all of Twitter’s tweet. Twitter4j library developed on the java is a function using the Twitter4j library. Because Naver blog does not provide its own search API, Futureplatform Internet is 2019used., 11 ,Figure x; doi: FOR 2 is PEER a function REVIEW using the Twitter4j library. www. Becausemdpi.com/journal/futureinternet Naver blog does not it developsaboutprovide 1%–2% and its own uses real-time search a crawler API,tweets that it developsfor collects all of and NaverTwitter’s uses blog. atweet. crawler The Twitter4j crawler that collects library periodically Naver developed blog. collects Theon the thecrawler java contents of theplatformperiodically Naver blogis used.collects included Figure the contents in2 theis a “Entertainmentfunction of the Naver using blog the/Art” includedTwitter4j category. in library. the “Entertainment/Art” Because Naver blog category. does not provide its own search API, it develops and uses a crawler that collects Naver blog. The crawler periodically collects the contents of the Naver blog included in the “Entertainment/Art” category.

FigureFigure 2. 2.Collect Collect tweets tweets using the the Twitter4j Twitter4j library library [15]. [15 ].

3.2. Preprocessing3.2. Preprocessing Process Process Figure 2. Collect tweets using the Twitter4j library [15]. Randomly collected tweets are raw data, and preprocessing is required to minimize errors in the Randomly3.2. Preprocessing collected Process tweets are raw data, and preprocessing is required to minimize errors in the extractionextraction of newly of newly coined coined words words and and emoticons. emoticons. Randomly collected collected tweets tweets contain contain some some unneeded unneeded information, such as usernames, URLs, hashtags, and so on. Unnecessary information is not included information,Randomly such as collected usernames, tweets URLs, are raw hashtags, data, and and prepro so on.cessing Unnecessary is required informationto minimize errors is not in included the extractionin the subject of newly of pre-extraction coined words and and may emoticons. be a factor Randomly in lowering collected the accuracy tweets contain of emotional some unneeded analysis. in the subject of pre-extraction and may be a factor in lowering the accuracy of emotional analysis. information,To remove unnecessary such as usernames, information, URLs, hashtags,this study and us esso on.the Unnecessaryregular expression information pattern is not of includedthe java To removeinplatform. the unnecessarysubject Figures of pre-extraction 3–5 information, are functions and this mayused study be in athe usesfactor preprocessing the in lowering regular expressionprocess the accuracy of the pattern oftweeter. emotional of theFigure javaanalysis. 3 platform.is a FiguresTofunction 3remove–5 are for functionsunnecessaryremoving a used URL information, infrom the a preprocessingtweet, this Figurestudy 4us ises process a thefunction regular of for the removingexpression tweeter. a Figurepatternuser ID,3 of andis the a Figure function java for removingplatform.5 is a function aFigures URL for from3–5removing are a tweet,functions a hash Figure usedtag. The 4in is the URL, a functionpreprocessing user ID, for and removing process hashtags of a thefrom user tweeter. randomly ID, and Figure Figurecollected 3 is5 a is a functionfunctiontweets for using removingfor removing URL removal a hash a URL function, tag. from The a tweet,user URL, ID Figure userremova ID,4 isl function, anda function hashtags and for hashtag removing from randomlyremoval a user function.ID, collected and Figure Table tweets using51 URLis ana function example removal forof function, aremoving tweet in user which a hash ID a removaltag.preprocessing The URL, function, processuser andID, of and hashtaga Twitter hashtags removalis completed. from randomly function. In Naver collected Table blog,1 is an exampletweetsimage of usingtype a tweet emoticonsURL in removal which consist a function, preprocessing of HTML user IDtags. processremova Sincel of function,you a Twitter do notand is needhashtag completed. to removalextract In the function. Naver image blog, Table type image type1emoticons, emoticons is an example you consist shouldof a tweet of remove HTML in which all tags. content a preprocessing Since and you tags doexcept process not the need of a Twitter to extract tag is from completed. the your image blog. In typeNaverTwitter emoticons, blog, and imageblogs that type have emoticons been preprocessed consist of areHTML stored tags. in theSince social you data do databasenot need for to the extract dictionary the image extraction. type you should remove all content and tags except the tag from your blog. Twitter and blogs that emoticons,As shown inyou Table should 2, the remove dictionary all content extraction and tags DB except is composed the of the tag original from your text blog. ID number, Twitter andthe have been preprocessed are stored in the social data database for the dictionary extraction. As shown blogschannel, that and have the been content. preprocessed The original are storednumber in is the a keysocial value data that database is automatically for the dictionary generated extraction. when a in TableAsTwitter 2shown, the and dictionary in a Tableblog are 2, extractionstored.the dictionary A channel DB extraction is means composed a DB division is of composed the between original of a Twitterthe text original ID and number, atext blog, ID theand number, channel,a content the and the content.channel,means collected Theand originalthe social content. data. number The original is a key number value is that a key is value automatically that is automatically generated generated when a Twitterwhen a and a blogTwitter are stored. and a blog A channelare stored. means A channel a division means a between division between a Twitter a Twitter and a and blog, a blog, and and a content a content means collectedmeans social collected data. social data.

Figure 3. Uniform resource locator (URL) removal function [16].

FigureFigure 3. Uniform3. Uniform resource resource locatorlocator (U (URL)RL) removal removal function function [16]. [16 ].

Future Internet 2019, 11, 165 5 of 14

Future Internet 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/futureinternet

FigureFigure 4. 4.User User IDID removal function function [16]. [16 ].

FigureFigure 5. 5.Hashtag Hashtag removal function. function.

TableTable 1. 1.Example Example of of TwitterTwitter pre-processing process. process.

RawRaw Source Source Data Data Compensated Compensated Social Social Data Data RT @parfaitfemi: And really so handsome that the movie is RT And really so handsome that the movie is RT @parfaitfemi: And really so handsome that the not the Olympics …#Pyeongchang Winter Olympics—ShortRT And not really the Olympics so handsome …#Pyeongchang that the movie Winter is not the movie is not the Olympics ... #Pyeongchang Winter Track 1500 m Men Finalists Sandor Liu (Hungary) OlympicsOlympics—Short... #Pyeongchang Track Winter 1500 m Olympics—Short Men Olympics—ShortTrack 1500 m Men Track Finalists 1500 mSandor Men FinalistsLiu (Hungary) Sandor Olympics—Short Track 1500 m Men https://t.co/LWPRXGyiBq Track 1500Finalists m Men Sandor Finalists Liu (Hungary) Sandor Liu (Hungary) Liuhttps://t.co/LWPRXGyiBq (Hungary) https://t.co/LWPRXGyiBq Finalists Sandor Liu (Hungary) RT @rifkrifk: RT Pyeongchang toilet since LOL LOL LOL RTPyeongchang @rifkrifk: toilet since LOL LOL LOL men’s toilet is left, men’s toilet is left, cause the women are Pyeongchangcause the women toilet sinceare always LOL LOLright LOL(right) men’s toilet RT Pyeongchang toilet since LOL LOL LOL men’s is left, cause the women are always right (right) toilet isalways left, cause right the (right) women are always right (right) https://t.co/TjoQGvI8iM https://t.co/TjoQGvI8iM RT @1theleft: Worldwide 5 billion people support the TV RT Worldwide 5 billion people support the RT2925 @1theleft: players Worldwide in 92 countries 5 billion The largest people number support in the historyRT # Worldwide TV 2925 5players billion in people 92 countries support The the largest TV 2925 TVPyeongchang 2925 players inOlympics 92 countries # Peace The Olympics largest number# Daily in players number in 92 countries in history The # Pyeongchang largest number Olympics in history historyhttps://t.co/X8TPswXKWH # Pyeongchang Olympics # Peace Olympics # # Pyeongchang# Peace Olympics Olympics # Daily # Peace Moon Olympics # Daily Daily Moon https://t.co/X8TPswXKWH Moon Table 2. Examples of social data for word extraction. Table 2. Examples of social data for word extraction. Id Channel Content RT @6882c93abe3b4a1: “Park Geun-hye, cares about jeong Yu Ra.” said the horse riding Id ChannelRT Content @6882c93abe3b4a1: “Park Geun-hye, cares about jeong Yu Ra.” said the horse riding system was a non-linear reality. The former vice chairman of Korea Racing Authority, Lee 1 Twitter SangyoungRT @6882c93abe3b4a1: came out as a witness “Park Geun-hye,and said, “I careshave heard about directly jeong Yuthat Ra.” former said President the horse riding system was a non-linear reality. The former vice chairman of Korea Racing 1 Twitter Park Geun-hye chose to preserve his …” Discussions:Authority, Lee‘Cooperation Sangyoung in the came Left’ out by asPark a witnessGeun-hye, and ‘Workers’ said, “I Movement’ have heard directly that Discussions:former President In the controversy Park Geun-hye surrounding chose to th preservee direction his of the... movement” of withdrawal sinceDiscussions: the parliamentary ‘Cooperation cabinet in was the elected Left’ by and Park Park Geun-hye, Geun- · Labor ‘Workers’ Party · Labor Movement’ Solidarity 2 Twitter · Discussions:Labor Front held In thea public controversy debate on surrounding December 29th. the The direction debate ofat the Korean movement of Confederationwithdrawal since of Trade the parliamentaryUnions (KCTU) cabineton the theme was electedof ‘struggle and for Park the Geun- withdrawalLabor of Partythe · · ParkLabor Geun-hye Solidarity regimeLabor and Frontthe worker’s held a mo publicvement’ debate was the on Decemberpurpose of advancing 29th. The the debate at 2 Twitter · movementthe Korean and Confederation seeking joint activi of Tradeties of Unions the left.…wspaper.org (KCTU) on the theme of ‘struggle for the sticker and seekingimage” width=“185” joint activities height=“160”> of the left. ... type=p50_50” class=“__se_img_el” alt=“sticker image” width=“185” height=“160”> ““

Future Internet 2019, 11, 165 6 of 14

Future Internet 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/futureinternet 3.3. Extraction of Newly Coined Words We use Naver-open dictionaries as base data fo forr extracting newly coined coined words words from from social social data. data. The Naver Open Dictionary, aa user-participation-openuser-participation-open dictionary has 32 newnew languageslanguages including Korean, English, Chinese, and Japanese,Japanese, registered onon thethe websitewebsite (opendict.naver.com).(opendict.naver.com). Users of open dictionaries can evaluate “good” or “dislike”“dislike” for new words registered by other users and can register new ones in the open dictionary directly directly.. The newly coined words are displayed on the page in the “real-time word” list as shown in Figure 6 6 andand waitswaits forfor evaluationevaluation byby otherother users.users. In this study, we developed a dedicated web crawler to extract Naver-open dictionary. Figure 66 is is thethe “real-time“real-time word” page ofof thethe Naver-openNaver-open dictionarydictionary websitewebsite thatthat thethe webweb crawlercrawler willwill collect.collect. The search conditions at the time of collection are limited to Korean. Since the Naver Open Dictionary is a user participation dictionary, itit cancan bebe aa dictionary word that does not match the dictionary meaning or is not frequently used in the SNS. To solve this problem, we store words and meanings of dictionaries containing more than 10 ‘likes’ in the Naver Open Dictionary DB. We use the collected Naver-Open Dictionary to extract the newly coined words andand emoticonsemoticons fromfrom socialsocial data.data.

Figure 6. Naver Open Dictionary.

3.4. Separation of Word Unit In the word segment separation step, the tokenizing operation for the word phrase is performed after separating the social data into the word units. The original social data table for extracting newly coined wordswords andand emoticonsemoticons stores stores only only some some data data of of the the entire entire social social data. data. A rawA raw social social data data table table for extractingfor extracting a newly a newly coined coined word word and and an emoticonan emoticon is extracted is extracted by extractingby extracting only only partial partial data data of the of entirethe entire social social data. data. The The boundary boundary between between the the word word and and the the word word is isdecided decided basedbased on the spacing. The words in spacing units in thethe originaloriginal social data table for extractingextracting newly coined words and emoticons areare tokenized.tokenized. Then,Then, when when we we are are performing performing emotional emotional analysis, analysis, a raw a raw social social data data table table for emotionalfor emotional analysis analysis including including collected collected keywords keywords is also is also tokenized tokenized in a in space a space unit. unit. 3.5. Emoticon Extraction 3.5. Emoticon Extraction In this study, three types of emoticons are extracted from tokenized words. First, it extracts In this study, three types of emoticons are extracted from tokenized words. First, it extracts emoticons of the form consisting of a combination of characters such in Figure7. expresses emoticons of the form consisting of a combination of characters such in Figure 7. Hangul expresses characters by combining the three elements of initial/neutral/longitudinal sound. Therefore, Hangul characters by combining the three elements of initial/neutral/longitudinal sound. Therefore, Hangul cannot express a character as a single of initial/neutral/longitudinal sound. However, most cannot express a character as a single element of initial/neutral/longitudinal sound. However, most of the emoticons consist of only the initial sound. Using these characteristics, a word consisting of of the emoticons consist of only the initial sound. Using these characteristics, a word consisting of only an initial sound in a tokenized word is judged as an emoticon and extracted. The extracted only an initial sound in a tokenized word is judged as an emoticon and extracted. The extracted character type emoticons are stored in the emoticon candidate table. Second, we extract the emoticon character type emoticons are stored in the emoticon candidate table. Second, we extract the emoticon of the image form as shown in Figure8. Recently, as the online community evolves, it is changing of the image form as shown in Figure 8. Recently, as the online community evolves, it is changing from a character-shaped emoticon to an image-type emoticon. SNS companies develop their own from a character-shaped emoticon to an image-type emoticon. SNS companies develop their own unique image emoticons and provide them to users. Image-type emoticons are generally composed of unique image emoticons and provide them to users. Image-type emoticons are generally composed tags in HTML format. After extracting the tag from the tokenized word, it finds unique of tags in HTML format. After extracting the tag from the tokenized word, it finds unique patterns of SNS companies only. For example, if there is a value of “sticker image” in alt attribute in the emoticon tag of image form in Figure 8, extract tag. The extracted tag

Future Internet 2019, 11, 165 7 of 14

Future Internet 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/futureinternet Future Internet 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/futureinternet patternsFuture Internet of SNS2019, companies11, x; doi: FOR only. PEER ForREVIEW example, if there is a value of “sticker www.mdpi.com/journal image” in alt/futureinternet attribute in is stored in the emoticon candidate table with a character type emoticon. Third, we extract the theis stored emoticon in the tag emoticon of image formcandidate in Figure table8, with extract a character tag. type The emoticon. extracted Third, tag extract is stored the emoticons in 4-Byte character, as shown in Figure 9. 4-Byte Unicode characters have been inemoticons the emoticon in 4-Byte candidate Unicode table character, with a characteras shown type in Figure emoticon. 9. 4-Byte Third, Unicode we extract characters the emoticons have been in extended from 2-Byte Unicode characters to 4 Byte, so that it is possible to represent characters in 4-Byteextended Unicode from 2-Byte character, Unicode as shown characters in Figure to 94. Byte, 4-Byte so Unicode that it is characters possible haveto represent been extended characters from in picture format. Emoticons that use 4-Byte Unicode characters are “” developed by NTT 2-Bytepicture Unicodeformat. charactersEmoticons tothat 4 Byte, use so4-Byte that itUnicode is possible characters to represent are “Emoji” characters developed in picture by format. NTT TOKOMO Co., Ltd. of . “Emoji” is supported by Apple and Google as well as SNS company EmoticonsTOKOMO Co., that Ltd. use 4-Byteof Japan. Unicode “Emoji” characters is supported are “Emoji” by Apple developed and Google by NTTas well TOKOMO as SNS company Co., Ltd. Facebook. A 4-Byte Unicode character-type emoticon extraction process, such as “Emoji”, examines ofFacebook. Japan. “Emoji”A 4-Byte is Unicode supported character-type by Apple and emoticon Google extraction as well as process, SNS company such as Facebook. “Emoji”, examines A 4-Byte all characters in the tokenized word to find a character encoded in 4-Byte Unicode. The extracted 4- Unicodeall characters character-type in the tokenized emoticon word extraction to find a process, character such encoded as “Emoji”, in 4-Byte examines Unicode. all charactersThe extracted in the 4- Byte Unicode character type emoticons are stored in the emoticon candidate table as character type tokenizedByte Unicode word character to find a type character emoticons encoded are instored 4-Byte in Unicode. the emoticon The extracted candidate 4-Byte table Unicodeas character character type emoticons. typeemoticons. emoticons are stored in the emoticon candidate table as character type emoticons.

Figure 7. Character type emoticons [16]. Figure 7. Character type emoticons [16]. [16].

Figure 8. ImageImage type emoticons [16]. [16]. Figure 8. Image type emoticons [16].

Figure 9. 4-Byte character type emoticons [16]. Figure 9. 4-Byte character type emoticons [16]. Figure 9. 4-Byte character type emoticons [16]. 3.6. Dictionary Registration 3.6. Dictionary Registration 3.6. DictionaryIf the extraction Registration of the newly coined word and emoticon is completed, the registrant registers the If the extraction of the newly coined word and emoticon is completed, the registrant registers newlyIf coinedthe extraction word and of the emoticon newly emotioncoined word dictionary. and emoticon The registrant is completed, categorizes the registrant the effective registers newly the newly coined word and emoticon emotion dictionary. The registrant categorizes the effective coinedthe newly words coined and word the eff andective emoticon emoticons emotion in the dict newlyionary. coined The word registrant candidate categorizes table and theemoticon effective newly coined words and the effective emoticons in the newly coined word candidate table and candidatenewly coined table. words Valid and newly the coined effective words emoticons that are in not the included newly incoined the general word emotionalcandidate dictionarytable and emoticon candidate table. Valid newly coined words that are not included in the general emotional andemoticon the newly candidate coined table. word Valid dictionary newly includecoined words buzzwords that are or feednot included stuff items in thatthe general represent emotional political dictionary and the newly coined word dictionary include buzzwords or feed stuff items that anddictionary social issues.and the The newly feed stucoinedff is Koreanword dictionary slang, which include is mainly buzzwords used by teenagers.or feed stuff Among items adults, that represent political and social issues. The feed stuff is Korean slang, which is mainly used by teenagers. therepresent frequency political use of and feed social stuff issues.s is increasing, The feed and stuff the is Korean use of feed slang, stu whffs isich also is mainly spreading used rapidly by teenagers. in SNS. Among adults, the frequency use of feed stuffs is increasing, and the use of feed stuffs is also ValidAmong emoticons adults, the that frequency are not registered use of feed in the stuffs emoticons is increasing, dictionary and include the use character of feed type stuffs emoticons, is also spreading rapidly in SNS. Valid emoticons that are not registered in the emoticons dictionary include spreading rapidly in SNS. Valid emoticons that are not registered in the emoticons dictionary include character type emoticons, Emoji emoticons, and image tag type emoticons. The classification of valid character type emoticons, Emoji emoticons, and image tag type emoticons. The classification of valid newly coined words and emoticons is done manually. To solve this problem, we developed a newly coined words and emoticons is done manually. To solve this problem, we developed a

Future Internet 2019, 11, 165 8 of 14

Emoji emoticons, and image tag type emoticons. The classification of valid newly coined words and FutureemoticonsFuture InternetInternet is 20192019 done,, 1111,, x;x; manually. doi:doi: FORFOR PEERPEER To REVIEWREVIEW solve this problem, we developed www. www. a dedicatedmdpi.com/journal/futureinternetmdpi.com/journal/futureinternet tool to clarify the dedicatedclassification.dedicated tooltool The toto clarifyclarify dedicated thethe classi toolclassi canfication.fication. search TheThe the dedicateddedicated extracted newlytooltool cancan coined searchsearch words thethe extractedextracted and emoticons newlynewly in coinedcoined detail wordsaswords shown andand in emoticonsemoticons Figure 10 andinin detaildetail can specify asas shownshown polarity inin FigureFigure and weight. 1010 andand Polaritycancan specifyspecify can polaritypolarity be classified andand intoweight.weight. positive PolarityPolarity and cannegative,can bebe classifiedclassified and weights intointo positivepositive can range andand from negative,negative, 1 to 5. The andand newly weightsweights coined cancan wordsrangerange from andfrom emoticons 11 toto 5.5. TheThe registered newlynewly coinedcoined using wordsthewords dedicated andand emoticonsemoticons tool are registered storedregistered in the usingusing emotion thethe dedicateddedicated dictionary ttoolool table. areare storedstored inin thethe emotionemotion dictionarydictionary table.table.

FigureFigure 10. 10. ClassificationClassificationClassification ofof newnew words words and and emoticons emoticons using using dedicated dedicated tools. tools.

4.4. EmotionalEmotional AnalysisAnalysis ProposedProposed emoticon emoticon and and newly-coined newly-coined words words polari polaritypolarityty decision decision system system consists consists of of enter enter process process ofof collected collected keyword, keyword, social social data data collectioncollection process,process, contentscontents pre-processingprpre-processinge-processing process, process, morpheme morpheme extractionextraction process process and and emotional emotional clas classificationclassificationsification processprocess inin FigureFigure 1111.11..

FigureFigure 11. 11. EmotionalEmotional analysisanalysis process process [16]. [[16].16].

4.1.4.1. EnteringEntering CollectedCollected KeywordsKeywords EmotionalEmotional analysis analysis users users select select the the topics topics to to be be su subjectedsubjectedbjected to to emotional emotional analys analysisanalysisis for for social social big big data data emotionalemotional analysisanalysisanalysis and andand enter enterenter it. it.it. For ForFor example, example,example, in inin an anan emotional emotionalemotional analysis analysisanalysis of the ofof the socialthe socialsocial big databigbig datadata of the ofof 2017 thethe 20172017 U-20U-20 WorldWorld CupCup KoreaKorea vs.vs. PortugalPortugal gamegame heldheld onon 3030 MayMay 2017,2017, thethe analysisanalysis usersusers enterenter thethe searchsearch termterm asas 2017.05.30,2017.05.30, thethe searchsearch termterm isis “U20”“U20” “Soccer”,“Soccer”, “World“World Cup”,Cup”, “Korea“Korea Portugal”.Portugal”.

4.2.4.2. CollectingCollecting SocialSocial DataData

Future Internet 2019, 11, 165 9 of 14

U-20 World Cup Korea vs. Portugal game held on 30 May 2017, the analysis users enter the search term as 2017.05.30, the search term is “U20” “Soccer”, “World Cup”, “Korea Portugal”.

4.2. Collecting Social Data When collecting social data, the collection process collects the collected data including the search word entered in the “input step of the collected keyword” using the API provided by the SNS vendor. The collected data containing the query are stored in the original message table. As shown in Table1, the elements of the original message table are automatically generated keys as the original text number (ID), and the collected messages are the contents (CONTENT).

4.3. Preprocessing Process The collected data containing the query are raw data, and a preprocessing process is required to minimize errors in emotion analysis. The collection data containing the query include some unnecessary information such as user name, URL, and hash tag. Unnecessary information is information that is not subject to analysis. Therefore, it may be a factor that lowers the accuracy of emotional analysis. To remove unnecessary information, this study uses the regular expression pattern of the java platform. Figure4 is a function that removes the URL if the message contains a URL, and Figure5 is a function that removes the user ID when the user ID is included in the message. Figure6 is a function that removes a hash tag when a hash tag is included in the message. URL removal function, user ID removal function, and hashtag removal function are applied to the collected data including the search term, and the user name, URL, and hash tag entry are removed. The collected data, including the search term after the preprocessing process, are stored in the correction data table. The elements of the correction data table are the same as the original data table.

4.4. Morpheme Extraction The morpheme is extracted by the morpheme extractor process in the message in which the preprocessing has been completed. In [9], morphemes is defined as the basic unit for analyzing language and the smallest grammar unit, that cannot be analyzed any more. Morphological analysis is also a process of separating morphemes from words and restoring them. In order to extract the morpheme of the message, this study uses the 2.4 version of the Comoran morpheme analyzer, Shin-Jun Su, provided in the form of a Java library. Unlike conventional morpheme analyzers, Comoran morpheme analyzer can analyze multiple phrases as a part of speech, allowing more accurate analysis of proper nouns including white space (movie title, restaurant name, song title, etc.). In this study, only the verbs, nouns, and adjectives that will match the emotional word among the parts of the morpheme extracted by the Comoran morpheme analyzer are used for emotional classification. The extracted morphemes are stored in the morpheme table. Elements of table consist of original number (ID) and morphology (MORPHEME).

4.5. Emotional Classification In general, emotional classification of natural language is classified into analysis method based on emotion dictionary and analysis method through machine learning. The analysis method based on the emotion dictionary calculates how many times the emotion word is included in the natural language. In this paper, the emotion dictionary-based analysis method is used. In addition to the basic emotion dictionary, a new emulation dictionary and newly coined-word dictionary are separately constructed and used for emotion analysis. The basic emotion dictionary uses the Korean sentiment analysis corpus (KOSAC) of [10]. Korean Emotion Analysis Corpus (KOSAC) annotated 17,582 emotional expressions, using 332 newspaper articles and 7744 sentences as a commentary for constructing Korean emotion corpus essential for emotional analysis [10,17]. Elements of the emotion dictionary using the Korean emotion analysis corpus are composed of a dictionary code (SEQ), a dictionary name (WORD), a polarity (SENTIMENT), and a weight (INTENSITY). In Figure 12, the polarity of the emotion Future Internet 2019, 11, 165 10 of 14 dictionary is positive if the word is a positive word or negative if it is a negative word. The weights are weighted from 1 to 4, and the higher the number, the stronger the polarity. The newly coined words and emoticon dictionary are extracted by using the coined word and emoticon extraction process presented in this paper. In general, the analysis method based on emotion dictionary extracts morphemes from natural language through morphological analysis and whether the morpheme contains emotion words Futureis computed. Internet 2019 However,, 11, x; doi: morphologicalFOR PEER REVIEW extraction is impossible because www. themdpi.com/journal/futureinternet newly coined word and impossibleemoticon do because not have the a newly basicmorpheme. coined word In and order emotic to solveon do this not problem, have a basic we first morpheme. extract morphemes In order to solveusing this morphological problem, we analyzer first extract as well morphemes as an existing usin emotionalg morphological analysis analyzer method andas well extract as an emotional existing wordsemotional by comparing analysis method them with and basic extract emotional emotional dictionaries words by in extractedcomparing morpheme. them with In basic the second emotional step, dictionariesthe newly coined in extracted word and morpheme. emoticon areIn the extracted second from step, the the completed newly coined data beforeword theand morphological emoticon are extractedanalysis using fromthe the newly completed coined data words before and the emoticon morphological dictionary. analysis Emotion using words, the newly newly coined words,words and emoticonsemoticon dictionary. extracted from Emotion the message words, includenewly coined polarity words, and weight, and emoticons and polarity extracted and weight from the are messageused for emotionalinclude polarity classification. and weight, The emotionand polarity classification and weight equation are used summarizes for emotional the weights classification. among Thethe wordsemotion with classification positive polarity equation and summarizes adds all the the weights weights among among the the words words with with negative positive polarity.polarity andSubsequently, adds all the the weights negative among weight the value words is subtractedwith negative from polarity. the sum Subsequent of the positively, the weight negative values, weight and valuethe resulting is subtracted value stores from thethe correspondingsum of the positive value inweight the variable. values, Here, and the if the resulting result value value is morestores than the 1,corresponding the sensitivity value of the in message the variable. is determined Here, if the as aresultffirmative. value If is the more result than is less1, the than sensitivity 1, the negative of the valuemessage is determined. is determined as affirmative. If the result is less than 1, the negative value is determined.

Figure 12. Emotional polarity decision algorithm [15] [15]..

5. Discussion Based on newly coined words and emoticon extraction process presented in this paper, we have constructed a newly coined word and emoticon extraction and classification system and an emotional analysis system with newly coined words and emoticon based on the emotional analysis process in Chapter 4. The system is built on JAVA JDK 1.6 and CentOS 6.5. MYSQL 5.5 is used as the database to store social data. The social data are extracted only from Twitter. We used twitter4j library and the Java Wrapper of Twitter API, in order to randomly collect tweets in Korean. The collection period is from 7 March 2018 to 7 May 2018, and a total of 4,210,744 tweets were collected. The collected tweets were processed through a preprocessing process to remove the URL, user ID, and hashtag. We collected 90,373 Naver Open dictionaries using the crawlers developed in the JAVA language for

Future Internet 2019, 11, 165 11 of 14

5. Discussion Based on newly coined words and emoticon extraction process presented in this paper, we have constructed a newly coined word and emoticon extraction and classification system and an emotional analysis system with newly coined words and emoticon based on the emotional analysis process in Section4. The system is built on JAVA JDK 1.6 and CentOS 6.5. MYSQL 5.5 is used as the database to store social data. The social data are extracted only from Twitter. We used twitter4j library and the Java Wrapper of Twitter API, in order to randomly collect tweets in Korean. The collection period isFuture from Internet 7 March 2019, 11 2018, x; doi: to FOR 7 May PEER 2018, REVIEW and a total of 4,210,744 tweets www. weremdpi.com/journal/futureinternet collected. The collected tweets were processed through a preprocessing process to remove the URL, user ID, and hashtag. extracting new words from the collected tweets. In the collected tweets, if the word in the open We collected 90,373 Naver Open dictionaries using the crawlers developed in the JAVA language dictionary is included, it is judged as a newly coined word, and 55,996 cases were extracted. Also, for extracting new words from the collected tweets. In the collected tweets, if the word in the 31,340 character-type emoticons and 1171 4-bit type character-type emoticons were extracted for the open dictionary is included, it is judged as a newly coined word, and 55,996 cases were extracted. emoticons. Image-type emoticons are excluded because they are not provided by Twitter. On Twitter, Also, 31,340 character-type emoticons and 1171 4-bit type character-type emoticons were extracted we collected 725,928 tweets from 6 August 2018 to 7 September 2018, with the keyword ‘idol.’ As a for the emoticons. Image-type emoticons are excluded because they are not provided by Twitter. criterion for measuring the accuracy of the emotion analysis method presented in this paper, a person On Twitter, we collected 725,928 tweets from 6 August 2018 to 7 September 2018, with the keyword directly classified tweets having a positive inclination and a tweet having a negative inclination in a ‘idol.’ As a criterion for measuring the accuracy of the emotion analysis method presented in this tweet. The process of categorizing 725,928 tweets by hand requires a lot of time and labor, limiting paper, a person directly classified tweets having a positive inclination and a tweet having a negative the number of positive tweets to 926, negative tweets to 327, with a total of 1253 validation tweets. inclination in a tweet. The process of categorizing 725,928 tweets by hand requires a lot of time The evaluation items of this experiment were emotion analysis accuracy using existing Korean and labor, limiting the number of positive tweets to 926, negative tweets to 327, with a total of dictionary-based emotion dictionary, emotion analysis accuracy using new word and emoticon 1253 validation tweets. The evaluation items of this experiment were emotion analysis accuracy using emotion dictionary, emotion analysis accuracy using existing Korean dictionary-based emotion existing Korean dictionary-based emotion dictionary, emotion analysis accuracy using new word dictionary, and newly coined word and emoticon emotion dictionar, respectively. The tweets are pre- andverified emoticon and the emotion URL, user dictionary, ID, and emotion hashtag analysis of tweets accuracy are removed. using existingThe emotion Korean dictionary dictionary-based based on emotionthe existing dictionary, Korean anddictionary newly coineduses Korean word andsent emoticoniment analysis emotion corpus dictionar, (KOSAC). respectively. Korean TheSentiment tweets areAnalysis pre-verified Corpus and (KOSAC) the URL, consists user ID,of morphemes. and hashtag Therefore, of tweets we are extract removed. morphemes The emotion from dictionarythe tweets basedfor verification on the existing in emotional Korean analysis dictionary using uses emotional Korean dictionary sentiment based analysis on existing corpus Korean (KOSAC). dictionary, Korean Sentimentand then extract Analysis emotional Corpus (KOSAC)words by consists matching of morphemes. Korean sentiment Therefore, analysis we extract corpus morphemes (KOSAC). fromShin theWon-Su’s tweets forComorian verification morpheme in emotional analyzer analysis 2.4 [10] using was emotional used for dictionarymorphological based extraction on existing from Korean the dictionary,verification and tweet. then The extract measurement emotional words method by matchingverifies the Korean correspondence sentiment analysis between corpus the (KOSAC).tweet of Shinpositive Won-Su’s inclination Comorian and the morpheme tweet of analyzernegative 2.4 inclination [10] was usedclassified for morphological by the person extraction in the verification from the verificationtweet. tweet. The measurement method verifies the correspondence between the tweet of positive inclinationFigure and 13 shows the tweet the offrequency negative of inclination use of newl classifiedy coined by words the person and emoticons in the verification in tweets tweet. used in experiments.Figure 13 In shows total, the65.4% frequency of the collected of use of tweets newly had coined newly words coined and words emoticons and emoticons in tweets usedused inin experiments.the experiment. In total,SNS 65.4%users usually of the collected use standard tweets wo hadrds, newly newly coined coined words words, and emoticonsand emoticons used in the experiment.tweets. SNS users usually use standard words, newly coined words, and emoticons in the tweets.

Figure 13. Frequency of newly coined words and emoticons used.

Figure 14 shows the results of the analysis on the positive tweets. The accuracy of emotional analysis by Korea standard word with emotional dictionaries, the accuracy of emotional analysis by the proposed newly coined words and emoticon dictionary, the accuracy of emotional analysis by Korea standard word with emotional dictionaries and the proposed newly coined words and emoticon dictionary are compared. The accuracy of emotional analysis by Korea standard word with emotional dictionaries is 53.5%. The accuracy of emotional analysis by the proposed newly coined words and emoticon dictionary is 56.3%. The accuracy of emotional analysis by Korea standard word with emotional dictionaries and the proposed newly coined words and emoticon dictionary is 80.2%. With Korean standard word with emotional dictionaries and the proposed newly coined words and emoticon dictionary, accuracy improvement of emotional analysis is 26.7%. The number of positive tweets is affected by events as like concert, donations, and showcases. For newly coined words and

Future Internet 2019, 11, 165 12 of 14

Figure 14 shows the results of the analysis on the positive tweets. The accuracy of emotional analysis by Korea standard word with emotional dictionaries, the accuracy of emotional analysis by the proposed newly coined words and emoticon dictionary, the accuracy of emotional analysis by Korea standard word with emotional dictionaries and the proposed newly coined words and emoticon dictionary are compared. The accuracy of emotional analysis by Korea standard word with emotional dictionaries is 53.5%. The accuracy of emotional analysis by the proposed newly coined words and emoticon dictionary is 56.3%. The accuracy of emotional analysis by Korea standard word with emotional dictionaries and the proposed newly coined words and emoticon dictionary is 80.2%. With Korean standard word with emotional dictionaries and the proposed newly coined words and emoticon dictionary, accuracy improvement of emotional analysis is 26.7%. The number of positive Futuretweets Internet is aff 2019ected, 11 by, x; doi: events FOR asPEER like REVIEW concert, donations, and showcases. www. Formdpi.com/journal/futureinternet newly coined words and emoticonsemoticons that that are are not not included in in the the first first analysis stage, stage, polarity polarity of of positive positive tweets tweets is is decided decided only only with Korean standard words with emotional dictionari dictionaries.es. After After that, that, polarity polarity of of positive positive tweets tweets with with thethe proposed proposed newly newly coined coined words words and and emoticon dictio dictionarynary added added to to polarity of positive tweets are decideddecided with Korea standard words with emotional dictionaries.

Figure 14. AccuracyAccuracy analysis analysis of of positive positive tweets. tweets. Figure 15 shows the negative tweet results. The average accuracy of emotional analysis using Figure 15 shows the negative tweet results. The average accuracy of emotional analysis using standard word based emotional dictionaries is 55.8%. The average accuracy of emotional analysis using standard word based emotional dictionaries is 55.8%. The average accuracy of emotional analysis new emulation and emoticon-based emotional dictionaries is 53.8%. The average accuracy of emotional using new emulation and emoticon-based emotional dictionaries is 53.8%. The average accuracy of analysis using standard word-based emotional dictionaries and new terms and emoticon-based emotional analysis using standard word-based emotional dictionaries and new terms and emoticon- emotional dictionaries is 80.1%, which is 24.3% higher than the existing emotion analysis. Emotional based emotional dictionaries is 80.1%, which is 24.3% higher than the existing emotion analysis. analysis results of negative tweets yielded similar experimental results as well as positive tweets. Emotional analysis results of negative tweets yielded similar experimental results as well as positive Emotional dictionaries based on the Korean standard word and the emotional dictionaries based on tweets. Emotional dictionaries based on the Korean standard word and the emotional dictionaries the newly coined word and emoticon showed a similar level of emotion analysis accuracy. based on the newly coined word and emoticon showed a similar level of emotion analysis accuracy.

Future Internet 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/futureinternet emoticons that are not included in the first analysis stage, polarity of positive tweets is decided only with Korean standard words with emotional dictionaries. After that, polarity of positive tweets with the proposed newly coined words and emoticon dictionary added to polarity of positive tweets are decided with Korea standard words with emotional dictionaries.

Figure 14. Accuracy analysis of positive tweets.

Figure 15 shows the negative tweet results. The average accuracy of emotional analysis using standard word based emotional dictionaries is 55.8%. The average accuracy of emotional analysis using new emulation and emoticon-based emotional dictionaries is 53.8%. The average accuracy of emotional analysis using standard word-based emotional dictionaries and new terms and emoticon- based emotional dictionaries is 80.1%, which is 24.3% higher than the existing emotion analysis. Emotional analysis results of negative tweets yielded similar experimental results as well as positive Futuretweets. Internet Emotional2019, 11, 165dictionaries based on the Korean standard word and the emotional dictionaries13 of 14 based on the newly coined word and emoticon showed a similar level of emotion analysis accuracy.

Figure 15. Accuracy analysis of negative tweets.

6. Conclusions As mobile devices were developed and gained popularity, smartphones became a necessity in everyday life, and the exchanges of personal opinions about social issues or society happenings through smart phones enter into general use. As a result, the importance of analyzing individual opinions in SNS is increasing. Specifically, companies and governments have been interested in the results of emotional analysis on social data. However, research on emotion analysis techniques for newly coined words and various kinds of emoticons has been limited due to too quick modification and creation of newly coined words and emoticons. In this research, a newly coined words and emoticons dictionary construction and analysis method is proposed. For newly coined words and emoticons dictionary construction, the morphological analyzer extracts the morphemes from the twit messages of Twitter, and the newly coined word and emoticon are extracted from the pre-processed data (morphological analysis results). This newly coined words and emoticons dictionary is used to decide the polarity of twit messages. In newly coined words and emoticons dictionary analysis method, the social data emotional polarity decision algorithm is proposed. The social data emotional polarity decision algorithm uses an emotion classification equation and the proposed newly coined words and emoticons dictionary. In order to evaluate the effectiveness of the newly coined words and emoticon in emotional analysis of SNS (twitter messages), we compared the results analyzed in the emotion dictionary based on the existing Korean dictionary and the analysis results based on the newly coined word and emoticon dictionary. After collecting about 700,000 twits that contain the keyword ‘idol’ from from 6 August 2018 to 7 September 2018 using Twitter’s API, emotional analysis experiments showed that emotions, including new words and emoticon dictionaries, the accuracy of the analysis and the accuracy of the non-emotional analysis are, respectively, improved. In order to fully understand the user base and to correctly analyze the relationship between the users and the options, user analyze can be added for further studies. This study has its limitations as it did not include detailed user information, since Twitter API does not provide it. Added user survey can provide more details about the relationship between the sentiment opinions and where and how the differences or correlations lie among certain users or certain group of users. Future Internet 2019, 11, 165 14 of 14

Author Contributions: Conceptualization, K.S.C.; Methodology, K.S.C.; Software, J.S.Y.; Validation, K.S.C. and M.-S.K.; Data Curation, J.S.Y.; Resources, J.S.Y.; Writing—Review & Editing, K.S.C. and M.-S.K.; Supervision, K.S.C. Funding: This research received no external funding. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Kim, S.Y.; Park, S.T.; Kim, Y.K. Samsung-Apple patent war case analysis: Focus on the strategy to deal with patent litigation. J. Digit. Converg. 2015, 13, 117–125. 2. Bollen, J.; Mao, H.; Zeng, X. Twitter mood predicts the stock market. J. Comput. Sci. 2011, 2, 1–8. [CrossRef] 3. DiGrazia, J.; McKelvey, K.; Bollen, J.; Rojas, F. More tweets, more votes: Social media as a quantitative indicator of political behavior. PLoS ONE 2013, 8, e79449. [CrossRef][PubMed] 4. Pak, A.; Paroubek, P. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, 17–23 May 2010; Volume 10. 5. Agarwal, A.; Xie, B.; Vovsha, I.; Rambow, O.; Passonneau, R. Sentiment analysis of twitter data. In Proceedings of the Workshop on Languages in Social Media (LSM 2011), Portland, OR, USA, 23 June 2011; pp. 30–38. 6. Go, A.; Bhayani, R.; Huang, L. Twitter Sentiment Classification Using Distant Supervision; CS224N Project Report; Stanford University: Stanford, CA, USA, 2009. 7. Kim, S.M.; Hovy, E. Determining the sentiment of opinions. In Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland, 23–27 August 2004. 8. Yadollahi, A.; Shahraki, A.G.; Zaiane, O.R. Current State of Text Sentiment Analysis from Opinion to Emotion Mining. ACM Comput. Surv. 2017, 50, 25. [CrossRef] 9. Song, E. The Sensitivity Analysis for Customer Feedback on Social Media. J. Korea Inst. Inf. Commun. Eng. 2015, 19, 780–786. [CrossRef] 10. Kim, M.; Jang, H.; Jo, Y.; Shin, H. KOSAC: Korean Sentiment Analysis Corpus. In Proceedings of the Korea Information Science Society, 2013; pp. 650–652. 11. Oh, P.; Hwang, B. Real-time Spatial Recommendation System based on Sentiment Analysis of Twitter. J. Soc. E-Bus. Stud. 2016, 21, 15–28. [CrossRef] 12. Lee, G.H.; Lee, K.J. Sentiment Analysis of Twitter on Current Trends Extracted from Newspapers. Korea Inf. Process Soc. 2013, 2, 731–738. 13. Kim, K.; Lee, J. Sentiment Analysis of Twitter using Lexical Functional Information. In Proceedings of the KISS 2014 Conference, Busan, , 25–27 July 2014; pp. 734–736. 14. Hong, D.; Jeong, H.; Park, S.; Han, E.; Kim, H.; Yun, I. Study on the Methodology for Extracting Information from SNS Using a Sentiment Analysis. J. Korea Inst. Intell. Transp. Syst. 2017, 16, 141–155. [CrossRef] 15. Yang, J.S.; Chung, K.S. Newly-coined Words and Emoticon Polarity for Social Emotional Opinion Decision. In Proceedings of the 2019 IEEE 2nd International Conference on Information and Computer Technologies, Kahului, HI, USA, 14–17 March 2019; pp. 126–130. 16. Yang, J.S.; Chung, K.S. Newly-Coined Words and Emoticons Dictionary Construction for Social Data Sentiment Analysis. In Proceedings of the 2019 11th International Conference on Future Computer and , Rangoon, Myanmar, 27 February–1 March 2019; pp. 126–130. 17. Jang, K.; Park, S.; Kim, W. Automatic Construction of a Negative/positive Corpus and Emotional Classification using the Internet Emotional Sign. J. KIISE 2015, 42, 512–521. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).