Information School

INF6000 Dissertation COVER SHEET (TURNITIN)

Registration Number 160128947 Family Name Zhao First Name Pan


Assessment Word Count: 10893


Competitive Intelligence in Social Media Twitter: Huawei, OPPO and VIVO

A study submitted in partial fulfilment of the requirements for the degree of MSc Information Management

at

THE UNIVERSITY OF SHEFFIELD

by

Pan Zhao

September 2017

Abstract

In the information age, social media has become part of people's daily lives and an important way for people to communicate and socialize. As social media has developed, it has generated more and more data, which contain a wealth of valuable information. Users express their views, take part in corporate marketing activities and obtain the information they need through social media. Enterprises can understand users' needs and preferences from their behavior on social media and use that information to improve their products and services. Smartphone brands can obtain user feedback and advice from social media through sound data collection and processing, and can identify competitors by comparing the products of different brands. This study takes three Chinese smartphone companies as its research object. The researcher analyzes the social media data of the three companies on Weibo, designs suitable data collection and data analysis methods, demonstrates the correlation between social media data and market performance, and at the same time verifies that the method is feasible. The three brands are Huawei, OPPO and VIVO, which ranked as the top three by sales in China's smartphone market in 2016. The study uses Chinese word segmentation, text mining, sentiment analysis and other techniques.

Keywords: social media, smart phone, Weibo, Chinese word segmentation, text mining, sentiment analysis.

Table of Contents

Abstract
1. Introduction and context
1.1 Introduction and context
1.2 Research aims and objectives
2. Literature review
2.1 Social media
2.2 Social media platforms in China
2.3 Competitive Intelligence
2.3.1 Competitive Intelligence Process
2.3.2 The Benefits of Competitive Intelligence
2.4 Social media in competitive intelligence
3. Methodology
3.1 Quantitative
3.2 Data collection methods
3.3 Pre-processing Data
3.4 Data analysis methods
3.4.1 Text Mining
3.4.2 Sentiment analysis
3.5 Implementation process
3.5.1 Weibo Sentiment Polarity Analysis
3.5.2 Purchase intention analysis
3.6 Practicalities
3.7 Ethical
4. Result and Finding
4.1 Social media and Marketing
4.2 Text mining
4.2.1 OPPO
4.2.2 Huawei
4.2.3 VIVO
4.3 Weibo Sentiment Polarity Analysis
4.4 Purchase Intention Analysis
4.5 Summary
5. Discussion
6. Conclusion
Reference
Appendix

1. Introduction and context

1.1 Introduction and context

With the rapid development of social media in recent years, the way people receive information has gradually shifted from one-way media such as television and radio to many-to-many interactive dissemination centred on social media. The number of social media users has grown rapidly: according to Mintel (2016), Facebook's daily active users increased from 327 million in 2010 to more than 1 billion in 2015. According to the earnings report of Sina's Weibo (hereinafter referred to as Weibo), the number of monthly active users (MAU) in December 2016 increased 33% from a year earlier to 331 million, of which 90% were mobile users. The number of daily active users (DAU) in December 2016 increased 30% from a year earlier to 139 million (Finance Sina, 2017).

Social media is an efficient marketing tool: advertising costs are much lower than in traditional media, and the reach is wider. Mzinga and Babson (2009) reported that 86% of companies use social media for commercial purposes, with 57% using it as a marketing tool.

Given its huge user base, the commercial value of social media keeps growing. Social media can be used to gain insight into the market and to find out how competitors and users evaluate a product (Kim, 2016). This study is based on the Chinese social media platform Weibo and takes the smartphone industry as its research object. It explores the relationship between users and the mobile phone market as reflected in social media, and then translates the information embedded in social media into competitive intelligence that can provide reliable decision support for smartphone businesses. The research objects are three Chinese smartphone brands: Huawei, OPPO and VIVO. According to Iimedia (2017), these three companies overtook Apple, Samsung and other international brands to occupy the top three places in the Chinese smartphone market. OPPO ranked first with a market share of 18%, Huawei second with 17.6%, VIVO third with 15.4% and Apple fourth with 14.6%, while Samsung did not enter the top five (Iimedia, 2017). The three companies represent the Chinese smartphone industry, so studying the relationship between them and social media is an important reference for understanding that industry.

As an open social media platform, microblogging is an important channel for businesses to communicate with users and potential users (Malhotra, 2016). China's mobile phone companies each have an official account on the platform dedicated to customer communication and marketing. Communication between enterprises and users produces a large amount of unstructured data, and processing these data with natural language processing, text mining, sentiment analysis and other techniques can yield reliable intelligence (Ma, 2017). In this study, all social media data come from Weibo.

1.2 Research aims and objectives

The aim of this study is to mine competitive intelligence from social media data related to Huawei, OPPO and VIVO, to discuss the link between social media data and actual market performance, and to establish the relevant methodology. The project focuses on how to collect target information from Chinese social media and on how to collate and analyse that information once it has been collected.

In order to achieve the above aim, the specific objectives are:

1. Analyse data about the three companies gathered from the Weibo platform using text mining and sentiment analysis.

2. Define the impact of social media on market performance.

3. Define the relationship between social media data and corporate market performance.

2. Literature review

2.1 Social media

There is no uniform definition of social media in academia. Antony (2008) defines social media as a new kind of online media that gives users a great deal of space and has the following characteristics: participation, openness, communication, connectivity, community and multiplatform. Andreas Kaplan and Michael Haenlein (2010) define social media as a series of web applications based on the technology and ideology of Web 2.0 that allow users to create and communicate their own content. Obar and Wildman's review determined that social media should have the following four characteristics: 1. social media are Internet applications based on Web 2.0; 2. user-generated content (UGC) is the main source of social media content; 3. social media sites or applications should be designed around user needs; 4. social media connect different users through user data in order to develop social networks (Obar & Wildman, 2015). Although scholars do not define social media in exactly the same way, the essence of their definitions is the same: social media is a set of applications that use Internet technology to remove barriers of time and space and facilitate communication between people.

Typical social media include blogs, microblogs, wikis, video sharing sites, social networking sites, forums and more. Social media is rich in functions; the basic ones generally include establishing personal connections; disseminating, sharing and commenting on information; establishing groups; playing social games; recommending and searching; and browsing information (Zhang Chunhong, 2012). There are many types of social media, and many representative websites and applications are familiar to the public.

In September 2009, Mzinga and Babson conducted a social media survey of 555 companies in the financial, consulting, marketing, human resources and engineering industries, and 86% of these companies used social media for commercial purposes. Specific applications include: marketing (57%); internal collaboration and learning (39%); customer service and support (29%); sales (25%); human resources (21%); enterprise strategy (); product development (14%). The types of social media used by the surveyed companies include file sharing, podcasts, Facebook, Twitter, YouTube, creative sharing and slide sharing. According to Paniagua (2014), social media can strongly affect the relationship between an organization and society, and can reveal users' preferences. Social media is also an effective marketing tool. Through these channels, social media affects business performance (Saravanakumar & SuganthaLakshmi, 2012). To sum up, social media serves different functions in commercial applications, and different types of social media can complement each other to play a greater role.

2.2 Social media platforms in China

KantarMediaCIC, a well-known internet word of mouth (IWOM) consultancy, released "CHINA SOCIAL MEDIA LANDSCAPE 2016", which divides China's social media platforms into two categories according to the maturity of the social media industry and the functional attributes of the platforms: "Functional Segmentation Platforms" and "Mobilized Interest Communities". Michael Toedman said that social media still plays an important role in people's lives and that, in China, social media continues to dominate consumers' time, attention, and trust in products and services, which is why leading brands see huge commercial value in social media data (KantarMediaCIC, 2017).

Weibo is a Chinese social media platform launched by Sina.com on November 3, 2009. It offers microblogging services, and third-party software can publish information to it through the API. As of December 2016, its MAU had reached 331 million (Finance Sina, 2017).

The main functions of Weibo are publishing, forwarding, subscribing, commenting, searching (a user can insert a topic between two # signs) and private messaging. Its product features are: 1. Each post cannot exceed 140 characters. 2. Users can publish and receive information anytime and anywhere through the web, the client, WAP and other means. 3. Fast spread: when a user publishes a message, all of his or her fans can see it at the same time and can forward it to their own fans with one click, so that it spreads by fission. 4. Real-time search: users can find information that other Weibo users released a few seconds ago, which makes it more timely than a traditional search engine.

2.3 Competitive Intelligence

Competitive Intelligence (CI) is a process in which people ethically collect, analyze, and disseminate information about the business environment, competitors, and the organization (Miller, 2001). The first thing to note is that CI is not corporate espionage: the CI process is both ethical and legal (Fleisher, 2001). As CI is an emerging research area, there is no commonly accepted and unified definition of CI, and different scholars have their own views. The most accepted definition is: "the processes or practices of generating and disseminating operational intelligence via planning, ethical and legal collection, processing and analyzing internal and external or competitive environment information so as to help decision-makers participate in decision-making and provide competitive advantages for enterprises" (Pellissier & Nenzhelele, 2013).

Köseoglu and Ross pointed out in their study that CI contained two concepts of competitive and intelligence. Competitive refers to the process of competition among individuals or organizations, and intelligence refers to the forecast for the future as well as the discovery of laws. In the CI process, implementers needed to collect data, information and knowledge from the environment in which they participated. Conversely, in a similar process, all the activities aimed at the collection, analysis and dissemination of data, information or knowledge were called CI (Köseoglu and Ross, 2016). The application of CI could help an organization to gain competitive advantages ahead of its rivals, provide the basis for organizational stability, reduce risks and avoid inefficiencies brought about by information redundancy (Ponis and Christou, 2013).

Moreover, other scholars have offered varied definitions of CI. Wright, et al. argued that CI is a process of collecting information that should be used for process planning and decision making in order to improve performance (Wright, et al, 2009). Xu et al. regarded CI as a technique for filtering effective information from complex intelligence sources, whose purposes are to analyze and interpret the information and to convey the results to decision-makers to aid decision-making (Xu, et al., 2011). Du believed that CI is a strategic tool for identifying potential threats and opportunities (Du, 2013).

Miao Qijao (1995) shows that 78% of the enterprises carrying out competitive intelligence activities were from the United States, with the rest from Western Europe, Canada, Latin America and other countries. Chunjing and others described the commonly used competitive intelligence analysis methods in their respective articles (Yang, 2007). In the "Domestic Competitive Intelligence Review", Wang Yongsheng describes the methods used in competitive intelligence research over the course of its development in China and the main competitive intelligence analysis methods, such as SWOT analysis, benchmarking, PEST analysis and financial statement analysis. He then predicts that China's competitive intelligence has broad market prospects and great potential for development (Wang Yongsheng, 2011). Chen Feng and Liang Zhanping think that corporate culture, formal organization and professional staff are the key factors of enterprise competitive intelligence (Chen Feng, 2002). Zhan Hongqiao identifies five key factors that affect enterprise competitive intelligence work: corporate culture, senior management support, professional quality and ability, information technology facilities, and organizational structure, and analyzes how these five factors affect the work (Zhan Hongqiao, 2003).

2.3.1 Competitive Intelligence Process

The CI process is generally divided into four successive and cyclic steps (Sass, et al., 2015): Planning, Collection, Analysis, and Communication or Dissemination. First, action resources are planned and the purpose of the actions is determined. Useful information is then collected in the Collection step from White information (open-source information), Grey information (private domain information) and Black information (illegally obtained information). The core of CI lies in the third step, Analysis: discovering intelligence from the collected information and using it to support strategy formulation and decision-making. After these steps are completed, the intelligence and analysis results are delivered to the final decision-makers in a variety of ways.

Nasri's research added two further elements to the above four steps: 1. Process and structure. The implementation of CI requires a complete structure and a systematic method to support actions, and the actions must be clearly backed by the method, the structure, policies and hardware facilities. 2. Organizational awareness and culture. Instead of being an individual's action, CI requires the collaborative work of multiple departments in an organization. Hence, organizational awareness and culture about CI are needed to ensure that the entire process is well performed.

These two points were not the specific CI implementation steps, but the factors that affected the whole process of CI and the environment that supported the CI process. Viviers, et al.'s research suggested that infrastructure and appropriate policies were necessary if employees were expected to contribute effectively to the CI system, which was the basis of the CI process. Meanwhile, if there were no awareness and related corporate culture of sharing information and intelligence within an organization, it would be difficult to develop CI within the organization.

The complete CI process is shown in Figure 1.

Fig. 1 The competitive intelligence cycle (Fleisher & Bensoussan, 2003)

2.3.2 The Benefits of Competitive Intelligence

Organizing and implementing CI produced significant benefits (Sass, et al., 2015): 1. improving the analytical ability of management; 2. sharing knowledge or thoughts so that the knowledge of the members in the organization became more explicit, generating new knowledge and applying the knowledge; 3. identifying the factors that affected organization operation, such as the application of new technologies, the actions of competitors, the actions of suppliers and customers; and 4. improving the understanding of external influences and providing the basis for the continuous improvement of the organization.

2.4 Social media in competitive intelligence

A comparison of the research on competitive intelligence in China and in Western countries shows that Internet information is an important source of competitive intelligence, and that social media information in particular is of great significance to enterprise competitive intelligence work. Steve Duncan, QuadTech (2006) argued that social networks can be used to find experts, especially technical personnel, and that searches can be automated with the LinkedIn Connect Manager tools to track the various connections. On the specific use of social media in enterprise competitive intelligence work, Nicole Black, Daily Record (2010) believes that social media can simplify the traditional competitive intelligence collection process and offers real-time search. Lambert, G (2009) proposed that social media can be used to monitor the personnel of rival companies, track trends in other companies' news releases, identify new products, and understand changes in litigation and internal corporate potential.

Wang Shuyi and Wang Xin showed that Twitter can play an important role in competitive intelligence work. They introduced two practical applications of Twitter: using the content published on Twitter to monitor competitors' information, and visualizing communication on Twitter to construct a graph of a rival's social network (Wang Shuyi, 2010). Yu Bo discussed the significance of microblogs for information science from several aspects, including the attributes of microblogs, information dissemination, information pollution, information acquisition, information fairness, information service integration, competitive intelligence value, microblog social networks and spatial ecology (Yu Bo, 2010). Huang Xiaobin introduced the method of competitive intelligence collection based on Weibo, constructed the relevant collection framework, and discussed the main methods of competitive intelligence analysis (Huang Xiaobin, 2012).

3. Methodology

This study takes three typical Chinese smartphone enterprises as samples and their products as the analysis targets, and uses text mining and sentiment analysis to derive users' overall views of these products. The relevance of the information contained in social media to the sales performance of the products is assessed by comparing users' views of the products, sales performance and the amount of data generated by discussion. The research by Kim (2016) is similar to this study, and the method design here follows Kim's approach, which supports the feasibility of the method. At the same time, the study tests whether the theory from Kim (2016) is equally applicable in the Chinese market.

3.1 Quantitative

Because quantitative research is relatively open in its methods, it is suitable for exploratory research and often produces new insights and changes of direction (Bryman, 2006). This research is based on social media data: a large amount of such data needs to be collected and then analyzed to answer the research question. This requires quantitative research methods, for example using the amount of data to analyze the relationship between corporate performance and the degree of attention paid to a product.

3.2 Data collection methods

This study is based on China's social media platform Weibo, and uses an open source crawler tool to collect the posts associated with the three mobile phone brands by keyword. Open source crawler tools are free data collection tools developed by many developers. The data collected from Weibo contain eight attributes: publisher ID, publisher name, post content, release time, post source, number of forwards, number of comments and number of likes. Four of these attributes were used in this study: publisher ID, publisher name, post content and post source. The greatest limitation for this study is that Weibo does not allow the comments on each post to be collected through the API. The method therefore has a shortcoming: the crawler tool has to be used in a specific environment to pick up topic-related data from social media by keyword.

Table 1 Open Source Crawler Framework Comparison

| Crawler | Platform | Development language | Advantages | Disadvantages |
|---|---|---|---|---|
| Larbin (Larbin.sourceforge,2017) | Linux | C++ | Good performance, stable operation, good support for Chinese | No delete function, duplicate items may be removed by mistake |
| Nutch (Nutch.apache,2017) | Windows/Linux | Java | Support for Lucene and Hadoop, with the advantages of distributed operation | Unstable |
| Heritrix (Webarchive.jira,2017) | Windows/Linux | Java | Good scalability, full-featured, high degree of crawling process control | Poor support for Chinese, poor fault tolerance mechanism |
| Web SPHINX (Sphinx,2017) | Windows/Linux | Java | High efficiency, easy to expand, good support for Chinese | No longer maintained |
| PolyBot (Cis.poly,2017) | Linux | Python/C++ | Good scalability | Complex to use, user interface is not compatible |
| WebCollector (Crawlscript.github,2017) | Windows | Java | Easy to use, simple interface, good stability, good support for Chinese | Extensibility is not good |
| GooSeeker | Windows/Mac | Multiple programming languages | Visual operation, no programming basics required, high reliability | Poor scalability |

In this project, seven relatively widely used open source crawler frameworks were compared, and GooSeeker was chosen. GooSeeker is a mature data acquisition tool, and there are three reasons for choosing it: 1. its operating interface is simple and easy to use, with visual operation; 2. it supports Chinese very well and is widely used in China, so a lot of helpful information is available; 3. it supports the Windows platform, which meets the equipment requirements of this project. Although GooSeeker is poorly scalable and has a less functional framework than other open source crawlers, it is sufficient to meet the data acquisition needs of this study.

To avoid irrelevant data, this study used keyword search to acquire relevant data. Specifically, the Weibo data acquisition tool in GooSeeker was used to search for the keywords "Huawei", "OPPO" and "VIVO", and all the search results were collected. Each collected post includes the publisher ID, the content of the post, the number of comments, the number of forwards, and so on.

Due to Weibo API (Application Programming Interface) restrictions, each search could obtain at most 50 pages of results, and each page included about 15 posts. To minimize the impact of these restrictions on the reliability of the data set, a rigorous data collection method was designed for this study. It is described in detail as follows:

1. The time span of the sample. This study aimed to examine the relevance between social media data and the actual sales of mobile phone brands. The enterprise sales data were for 2016, so the data were collected from 01/01/2016 to 31/12/2016.

2. The sample size. The volume of data for the whole of 2016 was too large to be collected. In this study, a "day" was regarded as a cycle, and the data of nearly 30 cycles would be collected.

3. Sampling methods. The study required a reasonable extraction of approximately 30 days of representative data. There were three methods to choose from: 1. collecting the data of the three brands in the same month; 2. collecting the data of the three brands in their respective new product release months; and 3. collecting the data on the 5th, 15th and 25th of every month. Method 1 had the advantage that the external factors affecting users' emotions and attitudes were the same within the same time period (for example, a national outbreak of disease might make people's emotions more aggressive), which minimized the impact of external factors on this study. Nevertheless, its shortcomings were also prominent. Owing to the characteristics of the mobile phone industry, manufacturers' marketing around each new product release greatly increases users' attention (ref), but different brands might release their new products in different months, which made it difficult to choose which month's data to collect. Method 2 avoided the shortcomings of Method 1, but the external influencing factors differed between months, which had a negative impact on this study. The advantage of Method 3 was that the data were collected more evenly throughout the year, covering all the months and the common external influencing factors. Method 3 combined the advantages of Method 1 and Method 2 while avoiding the above shortcomings, so Method 3 was adopted.

4. The countermeasures against the Weibo API restrictions. The Weibo API imposed the following restriction on keyword search: for a given time interval, the results for a keyword were returned in pages, with at most 50 pieces of data shown on each page and at most 20 pages, that is, at most 1000 pieces of data per search (1). Under this rule, if there were more than 1000 pieces of data within one day, the data beyond the first 1000 would be lost. Such failure to acquire complete data would have a serious negative impact on this study. To eliminate this effect, a "Time" parameter was used to divide each cycle into multiple time periods so that the number of data items within each period was less than 1000 and more data could be collected. The time periods were determined as follows: 1. searching for the keywords "Huawei", "OPPO" and "VIVO" in Weibo for 05/01/2016, 05/06/2016 and 05/12/2016 respectively; and 2. viewing the search results and, taking the last posting time as a starting point, searching the posts from that time to 24:00 and confirming that the results were fewer than 1000. In theory, therefore, the data for a whole day could be collected by dividing each day into two 12-hour periods. In actual operation, however, serious data loss was found, and the search results differed from run to run under the same search conditions. To avoid the influence of this on the study, each cycle was divided into three overlapping periods: 0:00 - 09:00, 07:00 - 17:00, and 15:00 - 24:00.

5. Data check and supplement. After data collection with GooSeeker under the above conditions, each data set was checked manually, mainly to find time periods with obviously missing data and to supplement them. The keywords used for supplementing were still "Huawei", "OPPO" and "VIVO", but the time parameter depended on the specific missing data.
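The sampling scheme (Method 3) and the three overlapping daily periods described above can be sketched in Python as a collection plan. This is an illustrative sketch only: the function names and the tuple layout are assumptions made for the example, and the code does not call GooSeeker or the Weibo API.

```python
from datetime import date, datetime, time, timedelta

# Method 3: collect on the 5th, 15th and 25th of every month.
SAMPLE_DAYS = (5, 15, 25)
# Overlapping periods per cycle as (start hour, end hour); 24 means 24:00.
WINDOWS = ((0, 9), (7, 17), (15, 24))
KEYWORDS = ("Huawei", "OPPO", "VIVO")


def sample_dates(year=2016):
    """Return every sampled day (one 'cycle' each) in the given year."""
    return [date(year, month, day)
            for month in range(1, 13)
            for day in SAMPLE_DAYS]


def windows_for(day):
    """Return the three overlapping (start, end) periods for one day."""
    midnight = datetime.combine(day, time(0, 0))
    return [(midnight + timedelta(hours=start), midnight + timedelta(hours=end))
            for start, end in WINDOWS]


def collection_plan(year=2016, keywords=KEYWORDS):
    """List every (keyword, start, end) search needed for the sample."""
    return [(keyword, start, end)
            for day in sample_dates(year)
            for keyword in keywords
            for start, end in windows_for(day)]


if __name__ == "__main__":
    plan = collection_plan()
    print(len(sample_dates()), "cycles,", len(plan), "searches")
    print(plan[0])
```

Because adjacent periods overlap by two hours, posts from the overlap are collected twice, which is why duplicate removal is needed before analysis.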

3.3 Pre-processing Data

To improve the accuracy of the data analysis in this study, it was necessary to screen and clean the data before analysis. Data preprocessing mainly had to complete the following two tasks:

1. Removing duplicate data. To make the collected data more complete, each cycle was divided into three overlapping time periods during data collection. As a result, data in the overlapping periods were collected several times, so the data needed to be cleaned to ensure data quality.

2. Removing advertisements. Because the data collection targets were brands and enterprises, and manufacturers run plenty of publicity and promotional activities on Weibo for business purposes, the collected data contained many advertisements and marketing activities, which seriously interfered with data quality.

The first step of data cleansing was to remove duplicate data. The principle was to compare the two attributes of each piece of data: blogger IDs and posts. As long as the two attributes were identical, it was illustrated that the pieces of data were actually the same, and only one should be retained.
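The duplicate-removal rule above (two records are the same when both the blogger ID and the post content match) can be sketched as follows. This is a minimal sketch, assuming each post is a dictionary; the field names `publisher_id` and `content` are hypothetical and not taken from the GooSeeker export format.

```python
def remove_duplicates(posts):
    """Keep the first occurrence of each (publisher ID, post content) pair."""
    seen = set()
    unique = []
    for post in posts:
        key = (post["publisher_id"], post["content"])
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


if __name__ == "__main__":
    sample = [
        {"publisher_id": "1001", "content": "My new phone arrived"},
        {"publisher_id": "1001", "content": "My new phone arrived"},  # collected twice
        {"publisher_id": "1002", "content": "My new phone arrived"},  # different user
    ]
    print(len(remove_duplicates(sample)))
```

Comparing both attributes keeps identical text posted by different users, which is a distinct record under the rule described above.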

The second step was to clear the advertisements and marketing posts from the data sets. The process was:

1. Screening out the data posted by publishers whose names contained "Huawei", "VIVO" or "OPPO". These were official marketing accounts, and the published contents were advertisements or marketing information.

2. Based on observation of the data sets and a study of Weibo advertising and marketing models, two main forms of social media advertising were found: 1. having accounts with a large number of fans publish advertisements of all kinds to obtain a high number of reads, although the spread of such posts is very limited; and 2. encouraging users to forward a post by means of lucky draws, cash rewards or gifts in order to obtain a high number of reads. Advertisements of the first form spread little and had only a minor impact on this study, so advertisements of the second form were the main target for removal.

The post sources and topics that contained advertisements are listed in Table 2:

Table 2 The posts source and topic that contain the advertisement

Brands   Posts Source                            Topic
Huawei   Weibo activities, fans cash awards,     #Android client feedback#,
         Weibo brand activities, voting          #Xiaoyun weekend topic#, etc.
VIVO     Weibo activities, fans cash awards,     #Be a heroes, decisive battle Jingdong#, etc.
         Weibo brand activities, voting
OPPO     Weibo activities, fans cash awards,     #I want to sing with you#,
         Weibo brand activities, voting          #OPPO Olympic time#, etc.
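The advertisement filter can be sketched as follows. This is only an illustration of the rules described in this section, not the procedure actually run in the study: the field names `publisher`, `source` and `text` are hypothetical, and the topic list contains only two of the topics from Table 2.

```python
BRAND_NAMES = ("huawei", "oppo", "vivo")
# Post sources associated with promotional campaigns (from Table 2).
AD_SOURCES = {"Weibo activities", "fans cash awards",
              "Weibo brand activities", "voting"}
# Two example topics from Table 2; the real list would be longer.
AD_TOPICS = ("#Android client feedback#", "#Xiaoyun weekend topic#")


def is_advertisement(post: dict) -> bool:
    """Flag official accounts, promotional sources and campaign topics."""
    publisher = post["publisher"].lower()
    if any(brand in publisher for brand in BRAND_NAMES):
        return True  # official marketing account
    if post["source"] in AD_SOURCES:
        return True  # lucky draw, cash award, vote and similar campaigns
    return any(topic in post["text"] for topic in AD_TOPICS)


posts = [
    {"publisher": "Huawei Mobile", "source": "iPhone client", "text": "New P9 on sale"},
    {"publisher": "user_a", "source": "fans cash awards", "text": "Forward to win a phone"},
    {"publisher": "user_b", "source": "Android client",
     "text": "#Xiaoyun weekend topic# which phone do you use?"},
    {"publisher": "user_c", "source": "Android client", "text": "My OPPO R9 charges fast"},
]
valid_posts = [p for p in posts if not is_advertisement(p)]
print([p["publisher"] for p in valid_posts])
```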

3.4 Data analysis methods

Data analysis of social media content used techniques such as text mining and sentiment analysis. Text mining can extract valuable information from natural language. A common practice is to count the frequency of words in the sample; the frequency reflects the importance of a word, which helps to identify the main opinions about a business or product (Meyer, Hornik, & Feinerer, 2008). Sentiment analysis can be used to explore users' emotions and to support decision-making (Rani & Rani, 2016). Both methods can process a large amount of data in a short time. The purpose of the sentiment analysis was to study whether the sentiment tendencies towards a product are related to its sales. This study used RStudio and Weka for text mining and sentiment analysis.

3.4.1 Text Mining

Chinese Word Segmentation

Chinese Word Segmentation refers to dividing a sentence or a sequence of Chinese characters into separate words. It is the basis of and key to Chinese text analysis, and it is needed because the boundaries between Chinese words are not explicit. Unlike English text, a Chinese sentence has no spaces between words, so a computer cannot process the text correctly without segmentation. The difficulties of Chinese Word Segmentation mainly arise from two aspects: 1. Ambiguity recognition. The same sentence may be segmented in two or more ways, and different segmentations carry different meanings; and 2. New word recognition. Segmentation relies on the words in a dictionary, and a word that is not in the dictionary will not be recognized by the computer. A widely used software package for Chinese Word Segmentation is Jieba, which was originally written in Python.

The R version of Jieba (jiebaR) supports four word segmentation modes, namely Maximum Probability, Hidden Markov Model, Query Segment and Mix Segment, as well as POS tagging (Part-of-Speech tagging) and keyword extraction. It also provides some basic text analysis algorithms such as TFIDF (term frequency-inverse document frequency) and text similarity analysis. It was developed using Rcpp and CppJieba, and its latest version was updated on 28/09/2016.
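The dependence of segmentation on a dictionary can be illustrated with a small Python sketch of forward maximum matching. This is a deliberately simple textbook technique, not the algorithm implemented in Jieba, and the toy dictionary is an assumption made for illustration.

```python
def forward_max_match(sentence: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy dictionary-based segmentation: always take the longest match."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


toy_dictionary = {"华为", "手机", "拍照", "很", "好"}
known = forward_max_match("华为手机拍照很好", toy_dictionary)
# "荣耀" is not in the toy dictionary, so it falls apart into single characters.
unknown = forward_max_match("荣耀手机", toy_dictionary)
print(known)
print(unknown)
```

The second call shows the new word problem described above: a word missing from the dictionary is split into single characters instead of being recognized as one unit.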

This study chose Jieba package as the word segmentation tool for the following reasons:

1. This study had no need for recognition and proprietary vocabulary recognition.

2. Weibo posts tend to be colloquial, so word segmentation depends more on the logical relationships within the context.

3. To meet the research objectives, techniques such as TFIDF and text similarity analysis were needed.

4. Jieba was frequently updated, meaning that new words appearing in recent years were included, which was critical to word segmentation.
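For readers unfamiliar with TFIDF, the following Python sketch shows one common textbook variant (term frequency multiplied by the natural logarithm of the inverse document frequency). It is an assumption made for illustration and not necessarily the weighting that Jieba applies; the English tokens stand in for segmented Weibo posts.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Score each term of each tokenized document with tf * ln(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    ["oppo", "charge", "fast"],
    ["huawei", "camera"],
    ["oppo", "camera", "camera"],
]
scores = tf_idf(docs)
print(scores[0])
```

In the first document, "charge" scores higher than "oppo" because "oppo" also appears in another document and is therefore less distinctive.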

3.4.2 Sentiment analysis

As an important component of Natural Language Processing, sentiment analysis means that a computer uses algorithms to judge automatically the mood, psychological reaction and sentiment state of the author conveyed by a text. At present, sentiment analysis for Chinese Weibo is still in its infancy. Although sentiment analysis for English Twitter has achieved preliminary results, its methods are not fully applicable to Chinese Weibo because of the great differences between Chinese and English in writing and expression. In addition, Weibo content is limited to 140 characters, with a low threshold for posting and few constraints on language, which results in confused diction, colloquial expression and quite irregular grammar. Posts also often contain pictures, links, Internet buzzwords, emoticons and irregular punctuation combinations. These features make traditional text classification methods inapplicable, complicate the word segmentation of Weibo corpora and greatly interfere with Weibo sentiment analysis. Consequently, the accuracy of multi-sentiment analysis for Chinese Weibo is still low at this stage. The methods of identifying Weibo sentiments fall into two main categories: the machine learning method and the lexicon-based unsupervised method.

The machine-learning-based method mainly uses training sets annotated with sentiment categories to train classifiers, taking words, sentiment words and themes as classification features, and uses the resulting classification models for later sentiment classification. The general machine learning methods for text classification include k-Nearest Neighbor, Naïve Bayes, Maximum Entropy and Support Vector Machines. The main idea is: extracting features from corpora according to a feature extraction algorithm; training classifiers on the training corpora with these features; and predicting the sentiment categories of unknown samples with the trained classifiers. The general approach is to first conduct binary classification on the data sets, dividing them into sentiment and non-sentiment expressions, and then to identify specific sentiments with a more targeted algorithm.

For example, Ouyang Chunping et al. (2014) first conducted binary classification on a post with an NB (Naïve Bayes) classification algorithm to determine whether it expressed a sentiment, and then performed fine-grained sentiment classification on the posts with sentiment by using SVM (Support Vector Machine) and KNN (k-Nearest Neighbor) classification algorithms to identify the specific sentiment categories. So far, the supervised method has produced relatively good experimental results. Go et al. (2009) performed positive and negative sentiment classification on tweets with three machine learning methods, Naïve Bayes, Maximum Entropy and Support Vector Machines, and the accuracy exceeded 80%. More advanced machine learning techniques have also been applied. For instance, Huanhuan Liu et al. (2013) proposed a joint model that uses the relationship between news readers and comment authors to classify the sentiments of both. This method assumes that the sentiments of news readers are consistent with those of comment authors, so a small number of annotated news readers' sentiments and the corresponding comment authors' sentiment corpora were used for co-training. The final experimental results showed that relatively high classification performance could be achieved with only 10 initially annotated samples. Meanwhile, the experiments on the sentiment tendencies of news readers and comment authors showed better classification performance than traditional algorithms.

The above three examples represent the main applications of machine learning in the field of Natural Language Processing, but the flaws are also apparent: 1. It relies heavily on a large number of annotated corpora, that is, corpora manually annotated with sentiment categories. Corpora from different fields differ greatly, so much manpower and time must be spent on annotation, which limits sentiment analysis for posts that carry a great deal of information and cover a wide range of fields. 2. It is essentially based on keyword and sentence features of sentiment and does not take into account the relationship between semantics and context. 3. Its results are short-lived and hard to generalize. For a specific data set, the machine learning method learns from a specific corpus, and the features found have high recognition ability only for the target data set and transfer poorly to other data sets. Furthermore, Internet culture and buzzwords change quickly, so old annotations soon lose their value. 4. Human sentimental reactions are very complicated and changeable, especially complex sentiments such as gratitude, pride and guilt. Existing text sentiment classification systems tend to map numerous complex sentiments to basic sentiments, such as mapping worry to sadness and satisfaction to happiness. However, because human sentiments are subjective, it is hard to map all complex sentiments to the corresponding basic sentiments in the actual annotation process.
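The train-then-predict idea behind the supervised method can be sketched with a small multinomial Naïve Bayes classifier written in plain Python. It is a toy illustration under stated assumptions, not the classifier of any study cited above: the four annotated examples are invented, and English tokens are used only for readability.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][token]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented, manually "annotated" training corpus.
train_docs = [["love", "this", "phone"], ["great", "camera"],
              ["hate", "slow", "phone"], ["bad", "battery"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great", "phone"]))
print(model.predict(["slow", "battery"]))
```

The sketch also makes the first flaw visible: the classifier only knows the words that occur in its annotated training corpus.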

Lexicon-based sentiment analysis uses sentiment lexicons to classify the sentiment of Weibo posts according to established rules. The main idea is: matching the words obtained by segmenting a post against the sentiment lexicon; counting the sentiment words of each kind in the post; and finally judging the sentiment of the post from these counts and the established rules. The general approach is: first segmenting the corpora into words; then tagging the part of speech; and finally carrying out sentiment analysis with the corresponding sentiment lexicon. For example, Nasukawa et al. (2003) performed word segmentation, POS (part-of-speech) tagging and syntactic dependency analysis on the corpus, selected nouns, adjectives and verbs from the corpus for sentiment pre-definition, and adopted a rule-based method for sentiment analysis.

The main Chinese sentiment lexicons are HowNet and NTUSD. HowNet is a Chinese and English sentiment lexicon created by the Computer Language Information Center of the Chinese Academy of Sciences. It clearly separates words that express sentiments from words that express evaluation; the sentiment words are divided into positive and negative sentiment words, with a total of 836 positive sentiment words and 1254 negative sentiment words. NTUSD is a Chinese word database created by National Taiwan University (NTU) based on the binary division of textual sentiment. It divides 11086 words into 2810 positive attribute words and 8276 negative attribute words.

The shortcomings of the lexicon-based method are: 1. It depends on the sentiment lexicons, so the quality of the lexicons determines the accuracy of the unsupervised method; 2. Chinese sentiment lexicons are incomplete and still under construction; and 3. The sentiment lexicons are mostly constructed by manually screening words from existing lexicons, assigning sentiment categories to them and annotating sentiment intensity, so most of the sentiment categories and intensity annotations are based on the original meaning of words and fail to identify new words.
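The counting rule described above can be sketched in Python as follows. The two word sets are toy lists chosen for illustration, not the actual contents of HowNet or NTUSD, and the majority rule is an assumption, since the exact rule is not specified here.

```python
# Toy sentiment word lists (assumptions for illustration only).
POSITIVE_WORDS = {"好", "喜欢", "满意"}
NEGATIVE_WORDS = {"差", "失望", "慢"}


def polarity(tokens: list[str]) -> str:
    """Classify a segmented post by comparing sentiment word counts."""
    positive = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negative = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # no sentiment words, or equal counts


print(polarity(["拍照", "很", "好"]))
print(polarity(["充电", "慢", "失望"]))
print(polarity(["华为", "手机"]))
```

A post without any lexicon word is labelled neutral, so under this rule sentiment expressed in words missing from the lexicon goes undetected.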

3.5 Implementation process

The machine-learning-based method is costly and relies on manually annotated corpora, but its results are more accurate and it can identify subtle sentiments. The lexicon-based method does not depend on annotated corpora, which removes the machine learning method's dependence on a large number of annotated corpora and avoids, to a certain extent, the high cost of training classifiers, so it is more suitable for Weibo sentiment analysis with a large data volume. Based on the above discussion, this study combined the two methods to analyze the data sets.

3.5.1 Weibo Sentiment Polarity Analysis

Sentiment polarity recognition was carried out on all the valid data in the data sets of Huawei, OPPO and VIVO by using the lexicon-based method. The sentiment polarity of the complete data sets represented Weibo users' attitudes towards each brand and could reflect brand reputation and the likelihood of purchase. The specific method was as follows:

1. Conducting data preprocessing and deleting duplicate data and advertisements. Only such data sets could reflect the real sentiments of Weibo users towards the three manufacturers.

2. Using the lexicons HowNet and NTUSD. In sentiment polarity recognition, the more words the lexicon contains, the higher the accuracy of recognition. As a result, as shown in the table below, the two lexicons were combined in this study and duplicate words were removed:

Dictionaries  Positive  Negative  In total  Repetition
HowNet        836       1254      2090      1392
NTUSD         2810      8276      11086

3. Applying RStudio as the analysis tool for sentiment polarity recognition.
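Combining two lexicons and removing duplicates is a set union. The Python sketch below shows the operation on toy word sets and then applies the same counting rule to the totals in the table, assuming that the Repetition figure counts the words that appear in both lexicons. The resulting size is a derived illustration and is not a figure reported in this study.

```python
def merge_lexicons(first: set[str], second: set[str]) -> set[str]:
    """Union of two word lists; a word present in both is kept once."""
    return first | second


# Toy word sets (assumptions for illustration only).
toy_a = {"好", "喜欢", "差"}
toy_b = {"好", "满意", "差", "慢"}
merged = merge_lexicons(toy_a, toy_b)
overlap = toy_a & toy_b
# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
assert len(merged) == len(toy_a) + len(toy_b) - len(overlap)

# Same rule applied to the totals in the table above.
hownet_total, ntusd_total, repeated = 2090, 11086, 1392
combined_total = hownet_total + ntusd_total - repeated
print(combined_total)
```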

3.5.2 Purchase intention analysis

Purchase intention is essentially a special sentiment tendency, so sentiment analysis techniques can also be applied to recognize it. The machine learning method is better suited to the precise recognition of this "special sentiment". Unlike the sentiment analysis studies mentioned above, which require accurate recognition of multiple sentiments, here it was enough to recognize accurately whether a Weibo user showed a tendency towards "buying" or "not buying". SVMs (Support Vector Machines) are a reliable means to achieve this. The texts are first converted into a unified vector form so that the original data have vector features, and these features are then used to build the model. The finished SVM model separates the two feature patterns clearly and can be used to judge whether target data conform to the model features, that is, whether a user shows a tendency towards "buying" or "not buying".

The specific method was as follows:

1. Performing data preprocessing, deleting duplicate data and advertisements, and forming “Data set_O”.

2. Randomly selecting 200 pieces of data from the data set of each of the three manufacturers to constitute a “Training set” of 600 pieces.

3. Manually annotating the data in the Training set: data showing purchase intention were annotated "T", and data showing no purchase intention "F".

4. Using the Word2Vec package in RStudio to convert the corpora in the Training set into vectors, forming “Training set_1”.

5. Building the model on “Training set_1” with 10-fold cross-validation by using the SVMs in the data mining tool WEKA (Waikato Environment for Knowledge Analysis). Since the modeling process was based on 10-fold cross-validation, it was not necessary to set aside a separate validation set to validate the accuracy of the model.

6. Using the Word2Vec package in RStudio to convert the corpora in “Data set_O” into vectors, forming a Vector set that contained “Huawei-Vec”, “OPPO-Vec” and “VIVO-Vec”.

7. Using the model to process “Huawei-Vec”, “OPPO-Vec” and “VIVO-Vec”. The results obtained were the users’ purchase intentions.
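The 10-fold cross-validation in step 5 can be sketched in Python as follows. This is not WEKA's implementation: the fold assignment, the function names and the majority-class baseline (used here in place of an SVM so that the example stays self-contained) are all assumptions, and the labels are synthetic.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, k: int = 10, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, return overall accuracy."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_x = [samples[i] for i in range(len(samples)) if i not in held]
        train_y = [labels[i] for i in range(len(samples)) if i not in held]
        predict = train_fn(train_x, train_y)
        correct += sum(predict(samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def majority_baseline(train_x, train_y):
    """Stand-in for a real classifier: always predicts the most common label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda sample: most_common


# Synthetic data: 600 placeholder samples, 120 labelled "T" and 480 labelled "F".
samples = list(range(600))
labels = ["T"] * 120 + ["F"] * 480
folds = k_fold_indices(len(samples))
accuracy = cross_validate(samples, labels, majority_baseline)
print(len(folds), accuracy)
```

Every sample is used for testing exactly once, which is why no separate validation set is needed. A baseline of this kind also gives a reference point against which a reported model accuracy can be compared.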

3.6 Practicalities

This research is based on social media data. There are two ways to access such data: 1. acquiring them through the official API or purchasing them legitimately from the relevant companies; and 2. crawling them with a crawler program. Crawler technology itself is unobjectionable, and its use for research is legal and ethical.

The techniques used in the data analysis, such as text mining and sentiment analysis, are mature, and a large body of literature, tutorials and related research is available for reference. The feasibility of these techniques for processing social media data has been demonstrated by a number of researchers.

3.7 Ethical

The data collected in this study came from the mobile phone companies’ official Weibo accounts; all posts and user comments are public and do not involve user privacy. Li Shenglong (2015) pointed out that, in the People's Republic of China, using web crawler technology for the purpose of unfair business competition violates the Anti-Unfair Competition Law. At the same time, according to the British Psychological Society's Ethics Guidelines for Internet-mediated Research (2013), data in social networks are automatically regarded as the result of public activities. All the data obtained with the crawler tool were used only in this study and not for commercial purposes. The researcher did not explore the specific information contained in user comments; the data were used only for sentiment analysis and other technical processing. In addition, the original data have been stored in the data warehouse of the University of Sheffield. Based on the above discussion, informed consent was not required for the data acquisition.

4. Result and Finding

The data collection process strictly followed the data collection method.

From 22:23:58 on 18/07/2017 to 00:09:36 on 21/07/2017 lasting for 73 hours and 14 minutes, a total of 95425 pieces of data were collected.

After data preprocessing, the results obtained are shown in the table below:

Table3. Data details

Brand    Collected  Duplicate data  Advertisement  Valid data
Huawei   30153      1077            5566           23510
VIVO     30871      5209            12939          12723
OPPO     34401      5990            10976          17435

4.1 Social media and Marketing

The more often a brand was mentioned on social media, the more people paid attention to it. Mentions included users’ forwarding of brand marketing and activities, because forwarding itself represented the user's attention to the brand. The 2016 sales data and the post quantities of the three brands are shown in the table below. Comparing the data shows a strong relationship between Weibo post quantity and actual sales. In general, the sales and market share of OPPO in 2016 were close to those of Huawei: OPPO sold 1.8 million more mobile phones than Huawei. In other words, the sales of OPPO were 2.35% higher than those of Huawei, and its market share was 0.4 percentage points higher. In terms of Weibo post quantity, Huawei had more posts than OPPO, but only by about 2.34%.

The sales of VIVO were 9.66% lower than those of Huawei and 11.73% lower than those of OPPO; its market share was 1.6 percentage points lower than Huawei's and 2 percentage points lower than OPPO's; and its Weibo post quantity was 11.88% lower than Huawei's and 9.82% lower than OPPO's. The gaps between VIVO and the other two brands were more than three times the gap between OPPO and Huawei, indicating an obvious relationship between Weibo post quantity (including advertisements) and market data.

Table4. Weibo posts quantity contrasts with market performance

Brand    Shipments in 2016 (M)  Market share in 2016  Weibo posts quantity
OPPO     78.4                   16.8%                 28411
Huawei   76.6                   16.4%                 29076
VIVO     69.2                   14.8%                 25622

REF: IDC China Mobile Phone Tracker
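The relative gaps between the brands can be recomputed from the figures in Table 4 with a short Python sketch. The helper name `relative_gap` is an assumption; the input values are taken directly from the table.

```python
def relative_gap(value: float, reference: float) -> float:
    """Difference between value and reference, as a percentage of reference."""
    return (value - reference) / reference * 100


# Figures from Table 4.
shipments = {"OPPO": 78.4, "Huawei": 76.6, "VIVO": 69.2}  # millions of units
posts = {"OPPO": 28411, "Huawei": 29076, "VIVO": 25622}

oppo_vs_huawei_sales = relative_gap(shipments["OPPO"], shipments["Huawei"])
huawei_vs_oppo_posts = relative_gap(posts["Huawei"], posts["OPPO"])
vivo_vs_huawei_sales = relative_gap(shipments["VIVO"], shipments["Huawei"])
vivo_vs_oppo_sales = relative_gap(shipments["VIVO"], shipments["OPPO"])
vivo_vs_huawei_posts = relative_gap(posts["VIVO"], posts["Huawei"])
vivo_vs_oppo_posts = relative_gap(posts["VIVO"], posts["OPPO"])

print(round(oppo_vs_huawei_sales, 2), round(huawei_vs_oppo_posts, 2))
print(round(vivo_vs_huawei_sales, 2), round(vivo_vs_oppo_sales, 2))
print(round(vivo_vs_huawei_posts, 2), round(vivo_vs_oppo_posts, 2))
```

Negative values mean that the first brand is below the reference brand. Differences in market share are differences between two percentages and are therefore expressed in percentage points rather than as relative gaps.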

4.2 Text mining

The text mining results are displayed as word clouds, and each brand is analyzed from two aspects: first, the brand or product features that Weibo users were most concerned about; and second, the brand's competitors identified from the word cloud.

4.2.1 OPPO

After text mining was performed on the posts related to OPPO and meaningless stopwords were removed, the 15 most frequent words were: OPPO, mobile phone, R9, shooting, webpage, Li Yifeng, Yang Yang, ColorOS, charge, Huawei, OPPOR9, songs, TFBOYS, music, Xiaomi (Millet). These words can be divided into three categories: mobile phone functions, brand spokesmen and competitive brands.

a) Product and Brand Characteristics Analysis

The results show that Weibo users’ discussion of OPPO products focused on shooting, webpage browsing, music and charging. Among OPPO products, the most discussed was the "OPPO R9". As the most watched mobile phone model released by OPPO in 2016, it was equipped with the Android-based operating system "ColorOS" and had the following three features: 1. fast charging; 2. a 16-megapixel front-facing camera that outperformed its 13-megapixel rear camera; and 3. an independent audio processing system with better audio decoding capability (REF). Furthermore, OPPO's advertising strategy was one of the crucial reasons why Weibo users paid attention to these features. OPPO had three slogans: 1. "Charge for five minutes, and you can talk for two hours"; 2. "Keep real! OPPO REAL music mobile phone"; and 3. "Listen to your voice, OPPO mobile phone".

The features of OPPO products and the key topics in advertisements and Weibo posts were as follows:

Table5. Product characteristics compare with posts topics

Product features                      Advertisement                         Posts Topics
Fast charging                         Fast charging                         Take pictures
High performance camera               High performance music player system  Web browsing
ColorOS                                                                     Music
High performance music player system                                        Charging

From the above comparison, it can be seen that Weibo users' focus was highly consistent with the features of OPPO products and the promotional content in its advertisements.

b) Competitor Analysis

When users referred to OPPO on Weibo, they often mentioned other brands for comparison. For instance, "How does oppo r7s work? Or Xiaomi? I can't decide whether to buy oppo r7s or Xiaomi…" (the 6926th piece of OPPO data); and "I have been using OPPO for ten years. Its fast charging is good, but I suggest it should be faster, as it’s slower than Samsung" (the 7788th piece of OPPO data). These posts can be divided into two categories: 1. requests for suggestions on choosing between the products of different brands; and 2. comparisons of the advantages and disadvantages of different products. These posts contain valuable competitive intelligence that can help enterprises to identify their competitors.

In the word frequency statistics for OPPO, its competitors and their frequencies were as follows:

Table6. Competitive brand frequency

Competitor Brand   Frequency
Huawei             1158
Xiaomi (Millet)    983
Apple              851
MEIZU              394
Samsung            497

Comparing the above data shows that OPPO’s biggest competitor was Huawei, which was mentioned 1158 times in the whole sample, followed by Xiaomi (Millet) with 983 mentions. Huawei and Xiaomi were both leaders in China's mobile phone industry with distinctive product features: Huawei was known for its performance and quality, while Xiaomi was famous for its high cost-performance. Apple and Samsung ranked third and fifth respectively. It is worth mentioning that in the IDC China smartphone sales ranking for 2016, Huawei ranked second with a market performance very close to OPPO's; Apple ranked fourth; and Samsung was not in the top five.
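Competitor frequencies of this kind can be obtained by counting brand names in the segmented posts. The Python sketch below illustrates the idea on invented English tokens. The competitor list and the sample posts are assumptions, and the study itself worked on Chinese text in R.

```python
from collections import Counter


def competitor_mentions(tokenized_posts: list[list[str]],
                        competitors: set[str]) -> list[tuple[str, int]]:
    """Count how often each competitor brand name occurs across all posts."""
    counts = Counter()
    for tokens in tokenized_posts:
        for token in tokens:
            name = token.lower()  # treat "Huawei" and "huawei" as the same brand
            if name in competitors:
                counts[name] += 1
    return counts.most_common()  # sorted from most to least mentioned


# Invented sample posts about OPPO, already split into tokens.
sample_posts = [
    ["oppo", "or", "huawei"],
    ["Huawei", "is", "good"],
    ["slower", "than", "samsung"],
]
ranking = competitor_mentions(sample_posts, {"huawei", "samsung", "apple"})
print(ranking)
```

Sorting the counts in descending order gives the competitor ranking directly.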

4.2.2 Huawei

The fifteen most frequent words in Huawei's text mining results were: Huawei, web pages, mobile phones, Honor, music, Xiaomi (Millet), Zhang Yixing, P9, Apple, Ren Zhengfei, Samsung, China, made in China, technology, worldwide. These words can be divided into four categories: mobile phone functions, brand spokesperson, brand characteristics and competitive brands.

a) Product and brand characteristics analysis

In terms of products, Weibo users were most concerned about two phones, the Honor and the P9. Unlike the case of OPPO, however, Weibo users' discussion of Huawei did not focus on product features and functions but on its competitors and its brand characteristics. Huawei is a world top 500 ICT (information and communication technology) solution provider whose main business is selling communications equipment. “Made in China pride”, “Science and Technology Innovation” and “Global sales” are Huawei's well-known brand labels in China. In recent years, its share in the European and American markets has improved (REF). These unique brand characteristics are an important factor in attracting attention, and the “Huawei” brand is widely accepted as standing for high-quality products. Huawei's advertising strategy reflects this accurately: 1. “Huawei, not just the world top 500!”; 2. “the top 500 of the world in your hand”; and 3. “If you like simple. Our details will make you moved!” The key features of the Huawei brand and the key topics in advertisements and Weibo posts are as follows:

Table7. Brand characteristics compare with posts topics

Brand features                      Advertisement       Posts Topics
Made in China                       Top 500             Mobile phone
Top 500                             Guarantee quality   Honor
Science and Technology Innovation                       P9
Global sales                                            China
                                                        Domestic
                                                        Technology
                                                        World

From the above comparison, it can be seen that Weibo users' concerns were highly consistent with Huawei's brand characteristics and the promotional content in its advertisements.

b) Competitor Analysis

As a world top 500 enterprise whose performance in the European and American markets has been improving in recent years, Huawei competes not only with other domestic Chinese mobile phone brands but also with world-class industry giants such as Apple and Samsung. In Huawei's word frequency statistics, its competitors and their frequencies were as follows:

Table8. Competitive brand frequency

Competitor Brand   Frequency
Xiaomi (Millet)    1897
Apple              1875
Samsung            1341
MEIZU              880
ZTE                317

Comparing the above data shows that Huawei's biggest rivals were Xiaomi (Millet) and Apple, which were mentioned 1897 and 1875 times respectively in the whole sample, followed by Samsung with 1341 mentions. Xiaomi's product characteristics differ from Huawei's: Xiaomi pays more attention to product cost, so people choosing between product cost and product quality compare and discuss the two brands. Apple and Samsung are world-class brands and the industry benchmarks for product quality, and they are also strong rivals of Huawei. MEIZU and ZTE are Chinese mobile phone brands, but their market performance is far below that of Huawei, Xiaomi and other brands, and the data also show that the two brands were mentioned far less often than the others.

4.2.3 VIVO

The fifteen most frequent words in the text mining results for VIVO-related posts were: VIVO, selfie, soft, mobile phone, dual camera, Huawei, X7, front camera, Xiaomi (Millet), X9, Xplay5, Song Zhongji, Xplay6, recording, border. These words can be divided into four categories: mobile phone functions, models, brand spokesmen and competitive brands.

a) Product and Brand Characteristics Analysis

The results show that Weibo users' discussion focused on VIVO's selfie function. Among VIVO products, the “X7”, “X9”, “Xplay5” and “Xplay6” received the most attention. These four phones belong to two different series, one low-end and one high-end. VIVO's "X" series focuses on performance, while the "Xplay" series focuses on thinness and design. VIVO differentiated its products through the selfie camera: a high-performance 20-megapixel front dual camera system with an innovative front flash, which attracted many selfie enthusiasts. VIVO's advertisements also highlighted the camera function: 1. "It does not matter that there is no sunshine, you are the focus of the front line. It does not matter that there is no background, because you are the scenery."; and 2. "Soft-light selfie, illuminate your beauty". The following are VIVO's product features and the key topics in advertisements and Weibo posts:

Table9. Product characteristics compare with posts topics

Product features                     Advertisement   Posts Topics
Front high performance dual camera   Selfie          Selfie
Front flash                                          Soft light (front flash)
                                                     Front dual camera

As can be seen from the above comparison, Weibo users' focus was highly consistent with VIVO's product features and the promotional content in its advertisements.

b) Competitor Analysis

In VIVO's word frequency statistics, its competitors and their frequencies were as follows:

Table10. Competitive brand frequency

Competitor Brand   Frequency
Huawei             1253
Xiaomi (Millet)    1156
Apple              853
Samsung            572
MEIZU              426

Comparing the above data shows that VIVO's biggest competitor was Huawei, which was mentioned 1253 times in the whole sample, followed by Xiaomi (Millet) with 1156 mentions. Apple and Samsung ranked third and fourth respectively. Interestingly, the rankings of VIVO's and OPPO's competitors are very similar, but OPPO does not appear among the competitor brands in VIVO's text mining results. Although VIVO lags behind OPPO in market performance, the two brands use exactly the same marketing model: both sell through offline stores, their stores are found all over China, and almost every OPPO store stands next to a VIVO store. The two brands would therefore seem to be competitors, but the data suggest the opposite. This surprising result may be due to the two brands' distinctive product features and user base positioning.

4.3 Weibo Sentiment Polarity Analysis

Through sentiment analysis, we can see Weibo users' attitudes towards the three companies in 2016 and discuss the relationship between these sentiment tendencies and market performance. Sentiment tendencies are divided into three categories: positive, neutral and negative. The results are shown below:

Table11. Weibo sentiment polarity result

              OPPO          Huawei        VIVO
Sample size   17435         23510         12723
Positive      2362 (13.5%)  3808 (16%)    1526 (12%)
Neutral       13140 (75%)   17093 (73%)   8740 (69%)
Negative      1933 (11%)    2609 (11%)    2457 (19%)

The table summarises the results of the sentiment polarity analysis. Huawei had the largest share of positive posts among the three brands (16%). VIVO had 19% negative posts, far higher than OPPO and Huawei. Neutral posts accounted for about 70% of each brand's posts; these include a large number of posts whose sentiment could not be identified, so the results are not accurate. There are two reasons for this: 1. The dictionary used for sentiment analysis cannot cover all sentiment vocabulary; and 2. Chinese sentences are often ambiguous, so the sentiment in such sentences cannot be matched to the dictionary.

4.4 Purchase Intention Analysis

The user's posts in Weibo often include the user's intention to buy, for example: "I must buy Huawei next time!" (No. 942 data of Huawei), “I decided to buy a Huawei” (No. 1680 data of Huawei), "I want to buy it after watching the oppo ad" (No. 6394 data of appo). These posts contain suggestions that the user wants to buy, or have decided to buy. This information is a valuable competitive, this part makes a comparison between the market performance of these 3 brands and the conclusions through purchase intention analysis.

After data preprocessing, 600 samples were randomly extracted and manually labelled as a training set. The researcher then used the SVM classifier in WEKA to process the sample data set "Trainingset_1" and obtained a model accuracy of 68%. "True" means that the user has an intention to buy; all other posts are labelled "False". The results are shown in the following table:

Table 12. Weibo purchase intention analysis results

                        OPPO           Huawei         VIVO
Sample size             17435          23510          12723
True                    3312 (19%)     3996 (17%)     1399 (11%)
False                   14123 (81%)    19514 (83%)    11324 (89%)
Shipments in 2016 (M)   78.4           76.6           69.2

According to the results, 19% of the OPPO posts and 17% of the Huawei posts express a purchase intention, and this difference is similar to the gap between their sales. VIVO lags well behind the other two brands in both purchase intention and annual sales. So, the greater the proportion of posts with an intention to buy, the better the market performance.
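The comparison can be checked with a little arithmetic. The Python sketch below recomputes the purchase intention shares from the counts in Table 12 and tests whether the brands rank in the same order by share and by 2016 shipments. The variable and function names are illustrative and not part of the original analysis.

```python
# Counts and shipments copied from Table 12.
brands = ["OPPO", "Huawei", "VIVO"]
sample_size = {"OPPO": 17435, "Huawei": 23510, "VIVO": 12723}
intent_true = {"OPPO": 3312, "Huawei": 3996, "VIVO": 1399}
shipments_m = {"OPPO": 78.4, "Huawei": 76.6, "VIVO": 69.2}

# Share of posts labelled "True" (purchase intention) for each brand.
intent_share = {b: intent_true[b] / sample_size[b] for b in brands}


def ranking(values):
    """Return the brand names sorted from the highest value to the lowest."""
    return sorted(values, key=values.get, reverse=True)


# True when both measures order the three brands identically.
same_order = ranking(intent_share) == ranking(shipments_m)

for b in brands:
    print(f"{b}: {intent_share[b]:.1%} purchase intention, {shipments_m[b]}M shipped")
print("Same ranking:", same_order)
```

With only three brands, an identical ranking is suggestive rather than conclusive.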

4.5 Summary

In summary, Weibo contains a wealth of competitive intelligence. The number of posts about a brand on Weibo is closely related to its market performance: the more users discuss a brand, the higher that brand's annual sales. The value of text mining is clear: these fragmented posts reveal what people pay attention to in a product or brand, as well as information about competitors. Comparing the posts with product features, brand characteristics and advertising themes shows that Weibo users focus mainly on the product features and advertising themes of the three brands. From this, companies can understand consumers through information in social media, and competitors can be identified intuitively by ranking how often competing brands are mentioned. The results of the sentiment polarity analysis are not accurate enough, mainly because of technical limitations, and this is worth improving. The results of the purchase intention analysis are clear: the greater the proportion of posts with an intention to buy, the better the market performance.

5. Discussion

The above analysis shows that fragmented social media information contains a wealth of competitive intelligence, and that valuable information can be obtained through a reasonable method of analysis. This study compared Weibo posts with market performance in an in-depth study of three Chinese smartphone manufacturers: Huawei, OPPO and VIVO. Based on the results of the analysis, the research objectives are discussed below:

1. Analyse data about the 3 companies gathered from the Weibo platform using text mining and sentiment analysis.

This research sampled the data by taking the Weibo posts published on the 5th, 15th and 25th of each month as the data source, which helps to exclude interference from external factors. Text mining of the posts leads to two conclusions: 1. The hot topics in people's discussion of a brand on Weibo are the characteristics of that brand or its products; 2. Competitors can be identified by ranking the frequency with which competing brands appear in posts, and the ranking is somewhat related to market performance, although the relationship is not very obvious. The sentiment analysis results fall into two parts: 1. The results of lexicon-based sentiment polarity recognition do not show obvious patterns, mainly because the relevant Chinese text mining technologies are immature and ambiguity in natural language cannot be identified; 2. The results of the machine learning-based purchase intention analysis show a clear pattern: the more posts that contain a purchase intention, the better the market performance of the corresponding brand.
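The date-based sampling rule can be expressed compactly. The following Python sketch assumes each post is available as a (date, text) pair, which is a hypothetical record layout chosen for illustration, and keeps only the posts published on the 5th, 15th or 25th of a month in 2016.

```python
from datetime import date

# Days of each month that are kept in the sample.
SAMPLE_DAYS = {5, 15, 25}


def is_sampled(post_date, year=2016):
    """Return True if the post date falls on a sampled day of the given year."""
    return post_date.year == year and post_date.day in SAMPLE_DAYS


def sample_posts(posts):
    """Filter (date, text) pairs down to the sampled days.

    The (date, text) layout is an assumption made for this sketch.
    """
    return [(d, text) for d, text in posts if is_sampled(d)]
```

Three days per month gives 36 sampled days over the year, spread evenly so that no single campaign or product launch dominates the sample.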

Kim et al. (2016), in their study of the iPhone 6 and Galaxy S5, collected mobile phone-related tweets in real time and used the same lexicon-based and machine learning-based methods for sentiment analysis. However, their findings were not clear-cut. They attribute this to Twitter's 140-character limit, which prevents people from fully expressing their feelings. They did not use text mining to analyse the data, but used the collected data to predict future sales.

This study used a method similar to that of Kim et al., but did not predict future sales. Instead, it focused on finding the competitive intelligence contained in the collected data. This study differs from the research of Kim et al. in three respects: 1. The data collection and data pre-processing methods are very different. The research object of this study is the data of the three brands throughout 2016, so three days of each month were extracted as the data source; 2. The technique used to process the data is different. The study is based on the Chinese social media platform Weibo, and Chinese text is much more difficult to process than English text because of the particular characteristics of the language; 3. This research focuses more on the link between social media data and actual market performance. It discussed the relationship between the quantity of Weibo posts and actual market performance, used text mining to analyse the relationship between users' concerns and product features in posts, and carried out competitor analysis using high-frequency vocabulary statistics.
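The competitor analysis based on mention frequency can be illustrated as follows. This Python sketch counts how many posts about one brand mention each other brand and ranks the results. The brand list is a hypothetical example and the matching is a simple case-insensitive substring test, not the procedure used in this study.

```python
from collections import Counter

# Hypothetical list of brand names to look for in the posts.
COMPETITOR_BRANDS = ["华为", "OPPO", "VIVO", "小米", "苹果"]


def competitor_ranking(posts, own_brand):
    """Count the posts that mention each other brand, most frequent first."""
    counts = Counter()
    for text in posts:
        upper = text.upper()  # case-insensitive match for Latin brand names
        for brand in COMPETITOR_BRANDS:
            if brand != own_brand and brand.upper() in upper:
                counts[brand] += 1
    return counts.most_common()
```

For example, `competitor_ranking(oppo_posts, "OPPO")` would list the brands mentioned most often alongside OPPO, where `oppo_posts` stands for the list of post texts collected for that brand.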

Although the two studies are similar, their purposes differ. This study is also imperfect because of technical and other constraints, and there is room for improvement.

2. Defining the impact of social media on market performance.

The smartphone industry is highly competitive, especially in recent years as the market has approached saturation. Differences in handset performance are becoming smaller and smaller, so user-oriented design and sound marketing are key factors in winning market share. There is no doubt that social media is a low-cost and efficient marketing platform. As the top three companies in the 2016 market, Huawei, OPPO and VIVO all run active marketing activities on Weibo.

Mzinga and Babson (2009) pointed out that 86% of companies use social media for commercial purposes, with 57% using it as a marketing tool. Paniagua and Sapena (2014) also found that social media is an effective marketing tool. In the results of this study, OPPO has 10976 advertising posts, accounting for 39% of the total advertising; Huawei has 5566, accounting for 19% of the total; and VIVO has 12939, accounting for 50% of the total. These data indicate that the three companies actively use the social media platform as a marketing tool. In other words, the use of social media as a marketing tool is one of the important factors affecting market performance.

3. Defining the relationship between social media data and corporate market performance

According to the research results, the social media data of the three brands are clearly correlated with their market performance. The correlation appears in two ways: 1. The more Weibo posts there are about a brand, the better its market performance; 2. The more people who express an intention to buy, the better the market performance. The first point includes a large number of users forwarding advertising content. Forwarding an advertisement represents the user's response to a marketing activity and expresses a positive attitude towards the product.

In Kim et al.'s (2016) study of social media competitive intelligence, all forwarded posts and advertisements were removed during data pre-processing because "the content that the user forwarded was largely meaningless." That study did not discuss the relationship between the number of posts and market data. This study discussed the relationship between the two and identified the correlation between them.

On the other hand, the conclusions of this study are based on a sample of the 2016 data rather than the complete annual data, which may lead to inaccurate results. In addition, the study only discussed the relationship between the number of posts and the total annual sales of the three brands. Thus, the results cannot fully represent the relationship between social media data and corporate market performance.
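One way to quantify the relationship between post volume and sales is a Pearson correlation coefficient. The Python sketch below is an illustration that was not part of the original analysis: it applies the standard formula to the sample sizes and 2016 shipments reported in Table 12. With only three brands, the coefficient is descriptive and cannot establish statistical significance.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Values from Table 12, in the order OPPO, Huawei, VIVO.
post_counts = [17435, 23510, 12723]  # sampled posts per brand
shipments_m = [78.4, 76.6, 69.2]     # 2016 shipments in millions

r = pearson(post_counts, shipments_m)
print(f"Pearson r between post count and shipments: {r:.2f}")
```

A larger number of brands, or monthly rather than annual figures, would be needed before such a coefficient could support a firm conclusion.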

6. Conclusion

Social media has become integrated into people's lives as an important way to communicate and socialise, and it therefore contains a great deal of valuable information. The aim of this study was to mine competitive intelligence from social media data related to Huawei, OPPO and VIVO, to discuss the link between social media data and actual market performance, and to establish the relevant methodology. The results show that brand-related social media data are clearly correlated with market performance, and that the social media platform is a very important marketing tool for enterprises. Text mining and sentiment analysis of the collected data produced a number of valuable conclusions, which also suggests that the methodology designed in this study is applicable to other competing products or services studied for the same purpose.

Limited by Chinese text processing technology and by the ability of the researcher, this study still has flaws. In the sentiment analysis, the results of the sentiment polarity analysis did not differ significantly between brands. The fundamental reasons are: 1. The emotion dictionaries available for Chinese natural language processing are very incomplete and cannot cover most of the Chinese vocabulary used to express emotion; 2. Unlike English, Chinese requires complex word segmentation, and the segmentation in this study has many defects because of the researcher's limited technical ability; 3. Ambiguity and new words in Chinese cannot be recognised by computers. A further limitation is that the study was restricted by the available data resources: the researcher could not obtain enough data, so only the quantity of posts and the quantity of advertisements were discussed, and only briefly.

This study can be improved in the future in the following three respects:

1. Data collection. With sufficient resources, data could be collected at a higher density so that the results are more accurate.

2. Improving Chinese text processing technology. A machine learning-based method could be used to build emotion dictionaries tailored to actual needs, for example by extracting phone-related emotional words from consumers' comments about phones on Amazon. This would improve the accuracy of the results.

3. Automated processing. Current data collection and data processing technologies already allow the whole process from data collection to data analysis to be automated, which is the future direction of technological development.

All in all, this study used the Chinese social media platform Weibo as its data source and collected posts related to the three best-selling mobile phone brands in China in 2016. Although the results have shortcomings, the study achieved its purpose and set out a plan for future improvement.

References

Bao, S., Xu, S., Zhang, L., Yan, R., Su, Z., Han, D., & Yu, Y. (2012). Mining social emotions from affective text. IEEE transactions on knowledge and data engineering, 24(9), 1658-1670.

Bergeron, P. (2000). Government Approaches to Foster Competitive Intelligence Practice in SMEs: A Comparative Study of Eight Governments. In Proceedings of the ASIS Annual Meeting (Vol. 37, pp. 301-8).

British Psychological Society (2013). Ethics Guidelines for Internet-mediated Research. INF206/1.2013. Leicester: Author. Available from: http://www.bps.org.uk/system/files/Public%20files/inf206-guidelines-for-internet-mediatedresearch.pdf

Bryman, A. (2006). Integrating quantitative and qualitative research: how is it done?. Qualitative research, 6(1), 97-113.

Cis.poly.edu. (2017). Web Exploration and Search Technology Lab. [online] Available at: http://cis.poly.edu/westlab/polybot/ [Accessed 29 May 2017].

China Industrial Network. (2017). 2016-2022 China social media market operating situation and development prospects. [online] Chyxx.com. Available at: http://www.chyxx.com/research/201605/415250.html [Accessed 11 Apr. 2017].

Changhuo, B., Yan, L., & Xiuling, W. (2006). Human Intelligence Network [J]. Information Studies: Theory & Application, 2, 000.

Chen Feng. (2002).Factors of success in enterprise competitive intelligence work. 中国信息导报, (9), 44-45.

Chen, M.(2004). Application of a mathematical model of Information Retrieval (LSI) in competitive intelligence system, Proceedings of the 2004 International Conference on Management Science & Engineering,1(2):437-440

Chunpin Ouyang, Xiaohua Yang, Longyan Lei, Qiang Xu, Ying Yu, & Zhiming Liu. (2014). 多策略中文微博细粒度情绪分析研究 (Multi-strategy Chinese microblogging fine-grained emotion analysis). 北京大学学报 (自然科学版), 50(1), 67-72.

Crawlscript.github.io. (2017). WebCollector by CrawlScript. [online] Available at: http://crawlscript.github.io/WebCollector/ [Accessed 26 May 2017].

Chen, Y., Jin, P., & Yue, L. (2008, December). Ontology-driven extraction of enterprise competitive intelligence in the Internet. In Future Generation Communication and Networking Symposia, 2008. FGCNS'08. Second International Conference on (Vol. 2, pp. 35-38). IEEE.

Duncan, S. (2006). CI: Social Networking Systems as Competitive Intelligence Tools. Competitive Intelligence Magazine, 9(4), 16-19.

Du Toit, A. S. A. (2013). Comparative study of competitive intelligence practices between two retail banks in Brazil and South Africa. Journal of Intelligence Studies in Business, 3(2).

Du Toit, A. S. (2015). Competitive intelligence research: An investigation of trends in the literature. Journal of Intelligence Studies in Business, 5(2).

Erdelez, S., & Ware, N. (2001). Finding competitive intelligence on Internet start-up companies: a study of secondary resource use and information-seeking processes. Information Research, 7(1), 7-1.

Finance Sina (2017). 微博发布 2016 年第四季度及全年财报. [online] Available at: http://finance.sina.com.cn/stock/usstock/c/2017-02-23/doc-ifyavvsk2753481.shtml [Accessed 3 Jun. 2017].

Fleisher, C. S. (2001). An introduction to the management and practice of competitive intelligence (CI). Managing frontiers in competitive intelligence, 3-18.

Fleisher, C. S., & Bensoussan, B. E. (2003). Strategic and competitive analysis: methods and techniques for analyzing business competition. Upper Saddle River, NJ: Prentice Hall.

Gordon-Till, J. (2004). Competitive Intelligence–Law and Ethics. Legal Information Management, 4(01), 17-18.

Go A, Bhayani R, Huang L. Twitter Sentiment classification using distant supervision.CS224N Project Report, Stanford, 2009:1-12

Gruebel, R. J., & Weida, W. A. (1997). Market and competitive intelligence: Targeting the sci-tech market place. Journal of AGSI, 6(2), 68-93.

Heydon, A., & Najork, M. (1999). Mercator: A scalable, extensible web crawler. World Wide Web, 2(4), 219-229.

Huang Xiaobin, & Nie Bin. (2012). The collection and analysis of enterprise competitive intelligence based on Weibo. 情报理论与实践, 35(5), 5-9.

Iimedia. (2017). 2016-2017 年中国智能手机市场监测报告. [online] Available at: http://www.iimedia.cn/49815.html [Accessed 1 Jun. 2017].

Kantar Media. (2017). CHINA SOCIAL MEDIA LANDSCAPE 2016. [online] ciccorporate. Available at: http://www.ciccorporate.com/download/China_Social_Media_Landscape_2016_presentation_deck.pdf [Accessed 7 Apr. 2017].

Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business horizons, 53(1), 59-68.

Kim, Y., Dwivedi, R., Zhang, J., & Jeong, S. R. (2016). Competitive intelligence in social media Twitter: iPhone 6 vs. Galaxy S5. Online Information Review, 40(1), 42-61.

Köseoglu, M. A., Ross, G., & Okumus, F. (2016). Competitive intelligence practices in hotels. International Journal of Hospitality Management, 53, 161-172.

Koutalakis, S. (2010). Mzinga, Inc: The Leader In On-Demand Social Software. http://www.mzinga.com/company/newsdetail.asp?lang=en&newsID=252&strSection=company&strPage=news

Lambert, G. (2009). Harnessing free-flowing competitive intelligence through social media sites. Law Practice Management,(07):26-28.

Larbin.sourceforge.net. (2017). Larbin : Parcourir le web, telle est ma passion. [online] Available at: http://larbin.sourceforge.net/index-eng.html [Accessed 29 May 2017].

Li Longsheng. (2015). Research on Commercial Ethics in Internet. 法律适用, (9), 57-61.

Liu H, Li S, Zhou G, et al. Joint Modeling of News Reader’s and Comment Writer’s Emotions[C]//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013: 511–515.

Malhotra, C. K., & Malhotra, A. (2016). How CEOs can leverage twitter. MIT Sloan Management Review, 57(2), 73.

Marteau, P. F., & Krumeich, C. (1995). Semantic analysis of text applied to competitive intelligence. IDT, 95, 258-265.

Marín Llanes, L., CARRO CARTAYA, J., & ESPIN ANDRADE, R. (1999). Some information analysis techniques for the competitive intelligence process. FID review, 1(4-5), 64-71.

Ma, T., Zhang, Y., Huang, L., Shang, L., Wang, K., Yu, H., & Zhu, D. (2017).Text mining to gain technical intelligence for acquired target selection: A case study for China's computer numerical control machine tools industry. Technological Forecasting and Social Change, 116, 162-180.

Mayfield, A. (2008). What is social media.

Meyer, D., Hornik, K., & Feinerer, I. (2008). Text mining infrastructure in R. Journal of statistical software, 25(5), 1-54.

Miao Qijao. (1995). Competitive Intelligence - Foreign Development Trends and Its Impact on China. 情报理论与实践, (1), 2-10.

Mintel (2016) Social and Media Networks - UK - May 2016 retrieved from Mintel academic database

Miller, S. H. (2001). Competitive Intelligence–an overview. Competitive Intelligence Magazine, 1(11), 1-14.

Mzinga and Babson Executive Education (2009). Social Software in Business.

Nasri, W. (2011). Competitive intelligence in Tunisian companies. Journal of Enterprise Information Management, 24(1), 53-67.

Nasukawa T, Yi J. Sentiment analysis: Capturing favorability using natural language processing. Proceedings of the 2nd international conference on Knowledge capture, ACM, 2003:70-77.

Nicole,B. , Daily, R. (2010). Commentary: gain competitive intelligence using social media. Daily Record.

Nutch.apache.org. (2017). Apache Nutch™ -. [online] Available at: http://nutch.apache.org/ [Accessed 29 May 2017].

Obar, J. A., & Wildman, S. S. (2015). Social media definition and the governance challenge: An introduction to the special issue.

Paniagua, J., & Sapena, J. (2014). Business performance and social media: Love or hate?. Business horizons, 57(6), 719-728.

Patton, M. Q. (2005). Qualitative research. John Wiley & Sons, Ltd.

Ponis, S. T., & Christou, I. T. (2013). Competitive intelligence for SMEs: a web-based decision support system. International Journal of Business Information Systems, 12(3), 243-258.

Pomerol, J. C. (1997). Artificial intelligence and human decision making. European Journal of Operational Research, 99(1), 3-25.

Rani, V. V., & Rani, K. S. (2016). Twitter Streaming and Analysis through R. Indian Journal of Science and Technology, 9(45).

Pikas, C. K. (2005). Blog searching for competitive intelligence, brand image, and reputation management. Online, 29(4), 16-21.

Saravanakumar, M., & SuganthaLakshmi, T. (2012). Social media marketing. Life Science Journal, 9(4), 4444-4451.

Sassi, D. B., Frini, A., Abdessalem, W. B., & Kraiem, N. (2015, May). Competitive intelligence: History, importance, objectives, process and issues. In Research Challenges in Information Science (RCIS), 2015 IEEE 9th International Conference on (pp. 486-491). IEEE.

Sohu.com. (2017). IDC:2017 年第一季度国产品牌领跑中国智能手机市场. [online] Available at: http://www.sohu.com/a/138209930_485557 [Accessed 1 Jun. 2017].

Sphinx. (2017). WebSPHINX: A Personal, Customizable Web Crawler. [online] Available at: https://www.cs.cmu.edu/~rcm/websphinx/ [Accessed 1 Jun. 2017].

Stanat, R. (1986). Building a document-based competitive intelligence system. In National online meeting. 7 (pp. 433-438).

Tarapanoff, K., Gomes da Nobrega, R., & Cormier, P. M. J. (1999). Competitive intelligence and scenarios: a methodological proposal for a case study in Brazil. FID review, 1(4-5), 31-41.

Wagers, R. (1986). Online Sources of Competitive Intelligence. Database, 9(3), 28-38.

Wang Shuyi, & Wang Xin. (2010). Enterprise competitive intelligence collection based on micro-blog Twitter. 情报学报, (3), 545-552.

Wang Yongshen, Li Min, & Ren Baoshi. (2011). A review of competitive intelligence analysis methods in China. 科技情报开发与经济, 21(35), 164-166.

Webarchive.jira.com. (2017). Heritrix - Heritrix - IA Webteam Confluence. [online] Available at: https://webarchive.jira.com/wiki/display/Heritrix [Accessed 29 May 2017].

Wright, S., Eid, E. R., & Fleisher, C. S. (2009). Competitive intelligence in practice: empirical evidence from the UK retail banking sector. Journal of Marketing Management, 25(9-10), 941-964.

Xu, K., Liao, S. S., Li, J., & Song, Y. (2011). Mining comparative opinions from customer reviews for Competitive Intelligence. Decision support systems, 50(4), 743-754.

Yang Chunjing, & Liu Yinxuan. (2007). Analysis method of Competitive Intelligence. 科技情报开发与经济, 17(36), 108-110.

Yu Bo. (2010). Discussion on the significance of Weibo in Informatics. 图书情报工作, (22), 57-60.

Zhan Hongqiao. (2003). competitive intelligence key influencing factors analysis. 图书馆学刊, (z1).

Zhang Chunhong, Yu Cuibo, Zhu Xinning, &Gao Ya. (2012). Social networking(SNS) technology base and development case.

Appendix

a) Delete duplicate data

install.packages("wordcloud2")
library(wordcloud2)

# Huawei: drop rows with a duplicated value in column X.5
huawei <- read.csv('C:/Users/Administrator/Desktop/12345/HUAWEI-1/Sheet111.csv')
huawei2 <- huawei[!duplicated(huawei$X.5), ]
setwd('C:/Users/Administrator/Desktop/shujuquchong')
write.csv(huawei2, file = 'huawei-3.csv')

# OPPO
oppo <- read.csv('C:/Users/Administrator/Desktop/12345/oppo-1/Sheet1-vivo.csv')
oppo2 <- oppo[!duplicated(oppo$X.5), ]
write.csv(oppo2, file = 'oppo-3.csv')

# VIVO
vivo <- read.csv('C:/Users/Administrator/Desktop/12345/vivo-1/Sheet1-vivo.csv')
vivo2 <- vivo[!duplicated(vivo$X.5), ]
write.csv(vivo2, file = 'vivo-3.csv')

# Only keep columns X.2 and X.5 for VIVO
vivo3 <- vivo[, c("X.2", "X.5")]

b) Word cloud

import os

import jieba
import matplotlib.pyplot as plt
import numpy as np
from jieba.analyse import extract_tags
from PIL import Image
from wordcloud import WordCloud

from ReadContent import Content

# Chinese font needed to render the characters (Windows path)
font = "c:\\windows\\fonts\\simsun.ttc"


def GeneratePicture(content, max_words=50, imgname='', Picname='华为.jpg'):
    """Write a word-frequency file and draw a word cloud for the given text."""
    path = os.getcwd()
    wordfreq_path = os.path.join(path, '{}.词频统计 txt'.format(imgname))

    # Pick the top keywords, then count their occurrences in the segmented text
    tags = extract_tags(content, topK=max_words)
    word_list = list(jieba.cut(content))
    word_freq_dict = dict()
    with open(wordfreq_path, 'a+', encoding='utf-8') as wordfreq_txt:
        for tag in tags:
            freq = word_list.count(tag)
            word_freq_dict[tag] = freq
            wordfreq_txt.write(tag + ' ' + str(freq) + '\n')

    if Picname:
        # Use the picture as a mask that shapes the cloud.
        # scipy.misc.imread has been removed from SciPy, so Pillow is used instead.
        back_coloring = np.array(Image.open(os.path.join(path, Picname)))
        wc = WordCloud(font_path=font, background_color="white", max_words=max_words,
                       max_font_size=100, mask=back_coloring, random_state=42)
    else:
        wc = WordCloud(font_path=font, background_color="white", max_words=max_words,
                       max_font_size=100, random_state=42)
    wc.generate_from_frequencies(word_freq_dict)
    plt.imshow(wc)
    plt.axis("off")
    plt.show()

    # Save the picture as <imgname><max_words>.png in the working directory
    wc.to_file(os.path.join(path, imgname + '%d.png' % max_words))


textdata = Content().read_content(file='huawei-3.csv')
GeneratePicture(content=textdata, max_words=100, imgname='huawei', Picname='')

c) Chinese Word Segmentation

import csv
import os

import jieba
import pandas as pd


def open_dict(path):
    """Read a word list (one word per line) into a list."""
    with open(path, 'r', encoding='utf-8') as dictionary:
        return [word.strip('\n') for word in dictionary]


def Go(file, output):
    """Segment the post text of every row in `file` and write the rows to `output`."""
    outputpath = os.path.join(os.getcwd(), output)
    outputfile = open(outputpath, 'w+', encoding='utf-8', newline='')
    writer = csv.writer(outputfile)
    writer.writerow(('1', '关键字', '博主头像', '博主 id', '博主', '博主主页', '博文', '博文独立网址',
                     '发布时间', '发布终端', '转发数', '评论数', '点赞数', '', '', '博文分词'))
    path = os.path.join(os.getcwd(), file)
    df = pd.read_csv(path, encoding='utf-8')

    stopword = set(open_dict(os.path.join(os.getcwd(), 'Stopword.txt')))

    # DataFrame.ix has been removed from pandas, so rows are read by position
    for i in range(len(df)):
        row_data = df.iloc[i, :]
        row_num = row_data['1']
        keyword = row_data['关键字']
        img = row_data['博主头像']
        ID = row_data['博主 id']
        BoZhu = row_data['博主']
        BoZhu_url = row_data['博主主页']
        Bowen = row_data['博文']
        Bowen_url = row_data['博文独立网址']
        date = row_data['发布时间']
        Mobile = row_data['发布终端']
        ZhuanFaShu = row_data['转发数']
        PingLunShu = row_data['评论数']
        DianZanShu = row_data['点赞数']
        fullpath = row_data['']
        createdate = row_data['']

        # Segment the post text with jieba, then drop single-character
        # tokens and stop words
        Bowen_Seg = list(jieba.cut(Bowen))
        Bowen_Seg = [w for w in Bowen_Seg if len(w) >= 2 and w not in stopword]
        Bowen_Seg = ' '.join(Bowen_Seg)

        writer.writerow((row_num, keyword, img, ID, BoZhu, BoZhu_url, Bowen, Bowen_url, date,
                         Mobile, ZhuanFaShu, PingLunShu, DianZanShu, fullpath, createdate, Bowen_Seg))

    outputfile.close()


Go(file='huawei-3.csv', output='new-huawei-3.csv')

Application 015330

Section A: Applicant details

Created: Tue 20 June 2017 at 16:31

First name: Pan

Last name: Zhao

Email: [email protected]

Programme name: Information Management

Module name: Research Methods and Dissertation Preparation

Last updated: 29/06/2017

Department: Information School

Date application started: Tue 20 June 2017 at 16:31

Applying as: Undergraduate / Postgraduate taught

Research project title: Competitive Intelligence Based On Social Media

Section B: Basic information

1. Supervisor(s)

Name Email

Pamela McKinney [email protected]

2: Proposed project duration

Proposed start date: Sat 1 July 2017

Proposed end date: Fri 1 September 2017

3: URMS number (where applicable)

URMS number - not entered -

4: Suitability

Takes place outside UK? No

Involves NHS? No

Healthcare research? No

ESRC funded? No

Involves adults who lack the capacity to consent? No

Led by another UK institution? No

Involves human tissue? No

Clinical trial? No

Social care research? No

5: Vulnerabilities

Involves potentially vulnerable participants? No

Involves potentially highly sensitive topics? No

Section C: Summary of research

1. Aims & Objectives

The core of this project is to study how companies use social media to gain competitive intelligence and insight into market information in order to improve their competitiveness. Taking the Chinese smartphone industry as the object of study, the research investigates the competitive landscape of the industry and the competitive positions of Huawei, OPPO and VIVO, based on their social media. The project focuses on how to collect target information from Chinese social media, and on how to collate and analyse the information once it has been collected. To achieve this aim, the specific objectives are:

1. To understand which social media platforms are used by Chinese smartphone businesses, the characteristics of these platforms and their impact on the business.

2. To collect related posts from Weibo, including the publisher ID, the content of the post, the number of comments and the number of forwards, and to analyse them using text mining and sentiment analysis.

3. To determine the factors that define the competitive landscape of the mobile phone industry in China, based on social media.

4. To assess the role of social media in the Chinese mobile phone industry.

5. To research and compare the relationship between the companies' social media data and their actual performance.

6. To understand the limitations and challenges of social media research in the mobile phone industry.

2. Methodology

Methodology

This study takes three typical Chinese smartphone enterprises as its sample and their products as the target of analysis, and then uses text mining and sentiment analysis techniques to derive users' overall view of these products. Users' views of the products, sales performance and the volume of discussion are compared in order to assess how the information contained in social media relates to the sales performance of the products. The study also tests whether the approach of Kim et al. (2016) is equally applicable in the Chinese market.

Quantitative

Because quantitative research is relatively open in method, it is suitable for exploratory research and often produces new insights and changes of direction (Bryman, 2006). This research is based on social media data: it needs to collect a large amount of social media data and then answer the research questions by analysing them. This requires quantitative research methods, for example using the volume of data to analyse the relationship between corporate performance and the level of attention paid to a product.

Main research questions:

1. How many people are posting about the three brands, and what is the relationship between the number of brand-related posts and brand market performance?

2. What are people talking about, and has their attitude to the brand affected market performance?

3. What impact does social media have on the brand, including its market performance and people's attitude towards it?

To answer these three questions, market data need to be compared with social media data in order to find the relationship between them.

Data collection methods

This study is based on China's social media platform Weibo, and it will use a crawler tool to collect the posts associated with the three mobile phone brands by keyword. The data are required to be true and accurate. However, permission to use Weibo's Application Programming Interface (API) is strictly controlled, and even after a successful application the amount of data that can be collected is limited, so a complete data set cannot be obtained. The greatest impact on this study is that Weibo does not allow the comments on each post to be collected through the API. A crawler tool therefore has to be used in a specific environment to collect topic-related data from social media by keyword, and this method also has some shortcomings.

In this project, six widely used open source crawler frameworks were compared, and the open source crawler framework WebCollector (Crawlscript, 2017) was chosen. WebCollector is a Java-based open source web crawler framework. There are three reasons for choosing it: 1. Its interface is simple and easy to use, and it can be operated visually; 2. WebCollector supports Chinese very well and is widely used in China, so a lot of information is available to help with development; 3. It supports the Windows platform, which meets the equipment requirements of this project. Although WebCollector scales poorly and offers fewer functions than other open source crawlers, it is sufficient to meet the data acquisition needs of this study. The crawler program to be used in this study has largely been developed; see the appendix.

The data collected in this study fall into two categories: social media data and enterprise sales data. The social media data come from Weibo. To avoid irrelevant data, this study uses the hashtag "#" to find relevant posts; specifically, "#华为 (Huawei)", "#OPPO" and "#VIVO" are used. Each collected post includes the publisher ID, the content of the post, the number of comments, the number of forwards, and so on. Following Kim et al. (2016), the estimated data size is 30,000 posts. The enterprise sales data come from major media reports, reports published by research institutions and corporate annual reports; the main concern is mobile phone sales and turnover.

Data collection is planned for July 1 to July 10, 2017, and will cover all relevant posts published from January to December 2016.

Data analysis methods

Data analysis of the social media content will use techniques such as text mining and sentiment analysis. Text mining can extract valuable information from natural language. A common practice is to count the frequency of words in the sample; frequency reflects the importance of a word, which helps to identify the main opinions about a business or product (Meyer, Hornik, & Feinerer, 2008). Sentiment analysis can be used to explore users' emotions and to support decision making (Rani & Rani, 2016). Both methods can process a large amount of data in a short time. The purpose of the sentiment analysis is to study whether the sentiment expressed towards a product is related to its sales. Sentiment analysis methods fall into two categories: those based on a sentiment dictionary and those based on machine learning. Machine learning methods are costly, so the dictionary-based method is chosen here, with the aim of making enterprise competitive intelligence work fast, accurate and efficient. This study will use RStudio for text mining and sentiment analysis.
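As a rough illustration of the two techniques named above, the following Python sketch counts word frequencies and computes a dictionary-based sentiment score. It is not the R code used in the study. It assumes the posts have already been segmented into words, and the `POSITIVE` and `NEGATIVE` word sets are a tiny hypothetical lexicon, not a published sentiment dictionary.

```python
from collections import Counter

# Hypothetical miniature lexicon, for illustration only.
POSITIVE = {"好", "喜欢", "流畅"}
NEGATIVE = {"差", "卡顿", "失望"}


def word_frequencies(tokenised_posts):
    """Count how often each word occurs across all segmented posts."""
    counts = Counter()
    for tokens in tokenised_posts:
        counts.update(tokens)
    return counts


def sentiment_score(tokens):
    """Return (positive word count) minus (negative word count) for one post."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos - neg
```

A score above zero suggests a mostly positive post and a score below zero a mostly negative one. This simple count ignores negation and intensity, which a fuller dictionary method would need to handle.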

3. Personal Safety

Raises personal safety issues? No

Personal safety management

- not entered -

Section D: About the participants

1. Potential Participants

There are no potential participants in this research.

2. Recruiting Potential Participants

The study does not need to recruit potential participants.

2.1 Advertising methods

Will the study be advertised using the volunteer lists for staff or students maintained by CiCS? No

- not entered -

3. Consent

Will informed consent be obtained from the participants? (i.e. the proposed process) No

This study is based on social media data.

4. Payment

Will financial/in kind payments be offered to participants? No

- not entered -

5. Potential Harm to Participants

What is the potential for physical and/or psychological harm/distress to the participants?

This study is based on social media data and has no direct participants, so it will not cause harm or distress to anyone.

How will this be managed to ensure appropriate protection and well-being of the participants?

This study is based on social media data and has no direct participants, so no measures are needed to protect participants' well-being.

Section E: About the data

1. Data Confidentiality Measures

In this project, all collected data will be anonymised with respect to the publisher, except for posts from official accounts, and no post content will be reproduced in the dissertation. The collected data will not be shared with any individual or organization, except for the school backup.

2. Data Storage

The data will be kept by me and processed on my personal computer. When not in use, the data will be encrypted, and no one other than me will be able to read or use the dataset. No personal factors enter into the data analysis: the user ID is not an element of the analysis, and no user ID or other personal information appears in this study. The data will also be stored on the Information School's research data server.

Section F: Supporting documentation

Information & Consent

Participant information sheets relevant to project? No

Consent forms relevant to project? No

Additional Documentation

None

External Documentation

- not entered -

Official notes

- not entered -

Section G: Declaration

Signed by: Pan Zhao Date signed: Thu 29 June 2017 at 10:34 Downloaded: 04/09/2017 Approved: 29/06/2017

Pan Zhao Registration number: 160128947 Information School Programme: Information Management

Dear Pan

PROJECT TITLE: Competitive Intelligence Based On Social Media APPLICATION: Reference Number 015330

On behalf of the University ethics reviewers who reviewed your project, I am pleased to inform you that on 29/06/2017 the above-named project was approved on ethics grounds, on the basis that you will adhere to the following documentation that you submitted for ethics review:

University research ethics application form 015330 (dated 29/06/2017).

If during the course of the project you need to deviate significantly from the above-approved documentation please inform me since written approval will be required.

Yours sincerely

Larah Hogg Ethics Administrator Information School

Information School.

Access to Dissertation

A Dissertation submitted to the University may be held by the Department (or School) within which the Dissertation was undertaken and made available for borrowing or consultation in accordance with University Regulations.

Requests for the loan of dissertations may be received from libraries in the UK and overseas. The Department may also receive requests from other organisations, as well as individuals. The conservation of the original dissertation is better assured if the Department and/or Library can fulfill such requests by sending a copy. The Department may also make your dissertation available via its web pages.

In certain cases where confidentiality of information is concerned, if either the author or the supervisor so requests, the Department will withhold the dissertation from loan or consultation for the period specified below. Where no such restriction is in force, the Department may also deposit the Dissertation in the University of Sheffield Library.

To be completed by the Author – Select (a) or (b) by placing a tick in the appropriate box

If you are willing to give permission for the Information School to make your dissertation available in these ways, please complete the following:

X (a) Subject to the General Regulation on Intellectual Property, I, the author, agree to this dissertation being made immediately available through the Department and/or University Library for consultation, and for the Department and/or Library to reproduce this dissertation in whole or part in order to supply single copies for the purpose of research or private study

(b) Subject to the General Regulation on Intellectual Property, I, the author, request that this dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from the date of its submission. Subsequent to this period, I agree to this dissertation being made available through the Department and/or University Library for consultation, and for the Department and/or Library to reproduce this dissertation in whole or part in order to supply single copies for the purpose of research or private study

Name Pan Zhao

Department Information School

Signed Pan Zhao

Date 03/09/2017

To be completed by the Supervisor – Select (a) or (b) by placing a tick in the appropriate box

(a) I, the supervisor, agree to this dissertation being made immediately available through the Department and/or University Library for loan or consultation, subject to any special restrictions (*) agreed with external organisations as part of a collaborative project.

*Special restrictions

(b) I, the supervisor, request that this dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from the date of its submission. Subsequent to this period, I agree to this dissertation being made available through the Department and/or University Library for loan or consultation, subject to any special restrictions (*) agreed with external organisations as part of a collaborative project

Name

Department

Signed

Date

THIS SHEET MUST BE SUBMITTED WITH DISSERTATIONS IN ACCORDANCE WITH DEPARTMENTAL REQUIREMENTS.