
PUBLIC POLICY RESEARCH FUNDING SCHEME

Project Number: 2013.A8.009.14A

Project Title: Can Online Opinion Reflect Public Opinion? An Investigation into the Interplays between Online Opinion, Public Opinion, and Mass Media

Principal Investigator: Dr FU King Wa

Institution/Think Tank: The University of Hong Kong

Project Duration (Month): 30

Funding (HK$): 556,888.00

This research report is uploaded onto the Central Policy Unit’s (CPU’s) website for public reference. The views expressed in this report are those of the Research Team of this project and do not represent the views of the CPU and/or the Assessment Panel. The CPU and/or the Assessment Panel do not guarantee the accuracy of the data included in this report.

Please observe the “Intellectual Property Rights & Use of Project Data” as stipulated in the Guidance Notes of the Public Policy Research Funding Scheme.

A suitable acknowledgement of the funding from the CPU should be included in any publication/publicity arising from the work done on a research project funded in whole or in part by the CPU.

The English version shall prevail whenever there is any discrepancy between the English and Chinese versions.


Can online opinion reflect public opinion? An investigation into the interplays between online opinion, public opinion, and mass media


PUBLIC POLICY RESEARCH FUNDING SCHEME

CENTRAL POLICY UNIT

(Project No. 2013.A8.009.14A)


Data Visualization: Online Communities

Based on a dataset of Hong Kong-based Facebook Pages, the above network diagram shows how online communities are formed and connected. A node in the diagram represents an individual Facebook Page, and an edge between two pages denotes one or more Facebook posts shared between them. The color of each edge indicates the nature of the traffic to and from the communities to which the pages belong.

Table of Contents

Data Visualization: Online Communities
Table of Contents
Abstract
    English version
    Chinese version
Introduction
Conceptual Framework
    Conceptualizing Online Opinion Expression
    Previous Research in Hong Kong
    Online Public Opinion and Citizen Sensor
    Online Opinion, Public Opinion, and Traditional Media
    Research Questions
Method
    Data Collection
        Collecting Facebook Data
        Collecting Hong Kong Golden Forum Data
        Collecting Online News Media Data
    Ethics Approval
Results
    Pattern of Online Public Opinion in Hong Kong
        Online Discussion Forum
        Online News Media
        Facebook Public Pages
    Interaction between Facebook and Online News Media
    Interaction between public opinion, online media and social media
        Machine Learning Analysis
        Performance of Predictions
    Case Study: The 2016 Legislative Council Election
Policy Implications and Recommendations
    Interaction between media and public opinion
        Policy Recommendation 1 ‒ Gathering Online Public Opinion as a Formal Process of Public Consultation
    Cyberbalkanization and the Information Cocoon
        Policy Recommendation 2 ‒ Promoting Online Policy Deliberation
        Policy Recommendation 3 ‒ Cross-departmental Online Public Engagement Policy
        Policy Recommendation 4 ‒ Open Government via Social Media
        Policy Recommendation 5 ‒ Long-Term Commitment to Democracy
    Research and Development
        Policy Recommendation 6 ‒ Supporting E-government and Internet Social Research
References

Abstract

English version

Online media are known as essential tools for global citizens, especially the younger generation, to utilize when they participate in public affairs or political activities. Much has been observed about the role of new media in policy development and public governance, but evidence is still limited. Even though preliminary research findings in this area in the Hong Kong context are available, further study is required to scrutinize the complex interactions between formation of online opinion, public opinion, and public governance in a unique political context where controversial policy debates have appeared to be never ending since the beginning of the term of office of the Chief Executive CY Leung.

This study aims to analyze a large dataset of online media sources. A major objective of the exercise is to further investigate the patterns and characteristics of a representative subset of Hong Kong online public opinion with respect to the debate on Hong Kong’s political system reform. Our study methodology is designed to collect online opinion content in Hong Kong systematically and analyze the data using both quantitative and qualitative approaches. Moreover, this study seeks to examine empirically the interplays among online opinion, public opinion, and traditional mass media. We test the temporal relationships between a set of time series variables and examine their inter-associations.

This project reveals a stronger relationship between social media (messages on Facebook Pages) and public opinion (the government’s approval ratings) than between online news media and public opinion. It also finds a cyber-balkanized online public opinion sphere. Based on the findings, six policy recommendations are made.

Based on a public policy research perspective, this study helps understand the online opinion and political participation of Hong Kong citizens, which can make long-term and substantial contributions to the government’s public engagement and policy deliberation. This line of research sheds light on the process of formation of public opinion as well as the role of online opinion and the mass media in shaping public opinion.

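Chinese version (translated into English)

Online media are regarded as tools for citizens’ political participation, especially among the younger generation. Many believe that online media play a role in policy making. Although some basic research on this subject exists in Hong Kong, a deeper understanding of the interactions among online media, public opinion and public governance is needed.

Building on an earlier pilot study, this study aims to extend the project into a full-scale study that covers a range of policy topics and collects more complete and representative sources of online public opinion in Hong Kong, in order to better understand the characteristics and patterns of Hong Kong online public opinion and to analyze its role in the controversy over political reform. The study systematically collects and stores Hong Kong online public opinion; the analytical tools include time-trend data, keyword analysis, social network analysis and sentiment analysis. The study also uses empirical data to analyze the interactions among online media, public opinion and mass media.

This study finds that the relationship between social media (Facebook public pages) and opinion polls is stronger than that between online news media and opinion polls. It also finds a so-called “cyberbalkanization” phenomenon. Based on the results, six social policy recommendations are made.

Taking a policy research perspective, this study examines the development of online media and of Hong Kong citizens’ online political participation, which can help the government formulate appropriate public engagement and deliberation policies.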

Introduction

This research project aims to provide a theoretical understanding of the role of social media users’ opinion expression in shaping (or being shaped by) public opinion, as well as their interactions with the mainstream mass media in Hong Kong. Using a systematic data collection approach, the study helps understand the characteristics of new media use and political participation of Hong Kong citizens. The findings are expected to be valuable to academics, policy makers and political practitioners. We expect the results to contribute to the substantial and long-term development of the ways in which the Hong Kong government, or public authorities in general, engage proactively with citizens for public deliberation via new media and technological platforms.

Online media are known as essential tools for citizens, especially the younger generation, to participate in civic and political activities. They also play a vital role in political movements, elections, policy development and public governance. In a previous project, the Principal Investigator of the current study, Dr. King-wa Fu, obtained pilot findings on the overall landscape of the Hong Kong online opinion sphere (entitled “Understanding and Analyzing Online Public Opinion in Hong Kong Cyberspace”, available on the Central Policy Unit’s research reports website). However, we have yet to fully understand the complex interactions between online opinion, public opinion, and public governance, or to investigate their composition and the mechanisms of their formation. This is particularly so because the research is situated in the unique political context of the term of office of the Chief Executive CY Leung, during which a series of controversial policy and social issues are being debated.

Evidence-based (and data-driven) policy development is essential to good public governance. If policy makers fail to establish structural procedures and institutional arrangements to give an appropriate response to the public demand for online engagement and political deliberation, effective governance will be compromised and social disengagement and cynicism might be generated. This can, at worst, undermine the long-term democratic development of a society.

Against this backdrop, as stated earlier, there are two main project objectives in this study. First, this study seeks to extend the scope of research from a pilot study to a full-scale study that covers a broader range of policy topics and collects a larger set of cases, which would be potentially more representative of online news and online content sources. The research design is targeted to further investigate the patterns and the characteristics of the Hong Kong online opinion sphere with respect to a list of strategic policy areas, namely constitutional reform, housing policy, the Hong Kong-Mainland China conflict, population policy, and environmental issues.


Moreover, this study aims to examine analytically and empirically an interwoven relationship between online opinion, public opinion, and traditional mass media, through which we can investigate and theorize the mechanism of their formation. We intend to test the temporal relationships between the time series of these three variables and also inspect their possible confounding or mediating relationships between each other. This line of research can shed light on the process of the formation of public opinion as well as the role of online opinion and the mass media in shaping public opinion. We are also interested in issues related to polarization of online opinion and the so-called “cyberbalkanization” phenomenon as well as the topical variation between various information sources.

As outlined in this report, this study has achieved the following two project objectives:

1. On the basis of our previous pilot study, this study extends the scope of research to a full-scale study that covers a broader range of policy areas and a wider range of online media sources, with the objective of further investigating the characteristics of online opinion in Hong Kong with respect to a list of strategic policy topics.

2. This study examines empirically the interactions between online opinion, public opinion, and traditional mass media by testing the associations between these variables and their mediating relationships.

Conceptual Framework

Hong Kong is a globally top-ranked and well-developed city in the use of computing and information technology (International Telecommunication Union, 2009). Its population has a high penetration of personal computer ownership and Internet connection, and its residents are known to be sophisticated technology users. According to Hong Kong Government figures (September 2016), the household broadband penetration rate reached 86% in 2016; the mobile subscriber penetration rate was 232%; and 2.5G and 3G/4G mobile subscribers exceeded 15 million.

With a large population of sophisticated users of new media technologies, Hong Kong society has been experiencing a drastic change in approaches to citizens’ engagement with the increasingly complicated sociopolitical environment, particularly by means of novel ways of online engagement and opinion expression in relation to politics and public affairs. In recent years, large-scale collective actions in Hong Kong, for example protests against the financial approval of the HK-China express rail link project (January 2010), the 2012 political reform proposal (June 2010), the anti-national education campaign (September 2012), and the “Umbrella Movement” (2014), have drawn much local and global attention to the use of technology in politics, through which mostly younger people (the so-called “post-’80s” or even “post-’90s” generations) and many “critical citizens” (Norris, 1999) have become active participants in policy debates. Public voices are heard on a variety of new media platforms through which many citizens are mobilized to join political activities. From the perspective of public governance, the new modes of civic participation can broaden the conventional meaning of public opinion and establish a new avenue for enriching political discussion and societal debate, promoting pluralism and increasing diversity of minority opinion (Organisation for Economic Co-operation and Development, 2007).

The emergence of online public opinion on new media has posed a challenge to the government’s current model of public engagement and political deliberation. For example, public discontent over the introduction of a national education curriculum first spread widely on the Internet in the summer of 2012 (a copy of a controversial national education textbook was posted online), after a long process of formal consultation and legislation. Under public pressure and a wave of large-scale protests, the Hong Kong government decided in October 2012 to “retune” the mandatory three-year program and allow schools to decide the implementation progress.

Public opinion is conventionally collected through telephone surveys, an approach commonly used by governments and organizations to ask the public about values, attitudes, comments on social policy, or approval ratings (Lavrakas, 2008). This research methodology typically deploys probability-based sampling, i.e. each inhabitant has an equal chance of being chosen. Using a random sampling scheme, the result obtained is, theoretically speaking, a representative snapshot of the population. Despite the popularity of this approach, pollsters in recent years have increasingly struggled to obtain good-quality phone survey results because of the decline in household landline use (due to the growing use of mobile phones and smart devices), high nonresponse rates (it is hard to gain respondents’ cooperation to complete the survey), and growing response bias attributed to self-reporting (Kempf & Remington, 2007; Kreuter, 2009).

Much has been done to investigate the relationships between traditional media and political participation in Hong Kong (see review by Lee & Chan, 2009). These studies routinely found positive relationships between traditional media use (newspaper and television) and political knowledge about elections (Guo, 2000) or between media exposure and the likelihood of voting (Cheung, Chan, & Leung, 2000). However, there is a research gap in understanding what online public opinion means to the society.

Conceptualizing Online Opinion Expression

With the characteristics of robustness, interactivity, rapid information dissemination, and globally open borders, the Internet has been recognized as a key ingredient for citizen political participation (DiMaggio, Hargittai, Neuman, & Robinson, 2001; Ronfeldt, 1992; Sparks, 2001). Online platforms enable citizens to take part in politics and substantially reduce the cost of participation. With the development of social media, citizens are further empowered to express opinions via a wider range of applications, including writing on or replying to personal blogs or online BBS, tweeting their updates, retweeting messages to followers, engaging in social networking sites, or sending instant messages via applications such as WhatsApp or WeChat. Many optimists believe that the digital platform can encourage democratic participation, free expression of opinion and thus democratic deliberation, serving ideally as a virtual platform of Habermasian communicative rationality, namely the public sphere (Habermas, 1989) or a marketplace of knowledge (Hayek, 1984).

This line of thought rests primarily on the notion that online political participants are no longer passive receivers of political information and knowledge, but rather are empowered by new technologies to be active agents, initiators or contributors to political discussion and deliberation. The opportunity for active involvement, in principle, facilitates citizens’ political participation regardless of their race, socioeconomic status, or gender. New media can broaden citizens’ channels for expressing opinions on a variety of public issues and/or personal interests. This is particularly significant in the context of a worldwide trend of lower citizen interest in parliamentary politics, the emergence of “critical citizens” (Norris, 1999) and increasing public aspirations for democracy (Norris, 2011). Some people have high hopes that the Internet will encourage younger generations to participate in civic engagement (Dahlgren, 2007).

While many people anticipate that the Internet will be a potential solution for the public’s falling interest in conventional politics, many scholars contend that this anticipation is too optimistic and the evidence inconclusive (for a review, see Debatin, 2008). Empirical research data, including findings in the United States, Hong Kong and multinational studies, support a “digital gap” between engaged and disengaged populations in terms of political engagement (Fu, Wong, Law & Yip, 2016; Hindman, 2009; Norris, 2001; Smith, Schlozman, Verba, & Brady, 2009), that is, an imbalance of online political participation based on demographic or social characteristics such as age, gender, educational attainment, or class. For example, Smith et al. (2009, p. 3) argue that “… the Internet is not changing the fundamental socioeconomic character of civic engagement in America. When it comes to online activities such as contributing money, contacting a government official or signing an online petition, the wealthy and well educated continue to lead the way.”

Another major challenge to the optimistic attitude toward the Internet’s contribution is that it focuses excessively on the societal influence of technology per se, while the broader sociocultural and political contexts are not sufficiently emphasized. Consequently, the Internet’s impact on public engagement might have been exaggerated. The overemphasis on technological characteristics, namely technological determinism, invites a wide range of criticisms, such as that of “cyberutopianism” (Morozov, 2011).

The impact of the Internet on political participation is generally summarized in two theses: a Mobilization Thesis and a Normalization Thesis (see the review and meta-analysis by Boulianne, 2009; Gibson, Lusoli, & Ward, 2005). The Mobilization Thesis presents an optimistic view of the contribution of the Internet to empowering or mobilizing citizens, particularly inactive or disengaged individuals, to become more resourceful in accessing political knowledge and more engaged with like-minded citizens and interest groups via online platforms. Using the Internet can lower a citizen’s entry barrier to politics and can broaden the opportunity for participation, in contrast to conventional means such as joining political parties or community groups. Not only is this view appealing to those who expect the Internet to reverse the worldwide decline in political interest and social capital (Coleman & Gotze, 2001), but it also appears to offer a solution to the widening social gaps in political participation between diverse populations: people with lower socioeconomic status, women, youth, or any politically disadvantaged citizens.

The Normalization Thesis, in certain contexts called the Reinforcement Thesis, proposes that the major consequence of online engagement is the activation and reinforcement of those who are already knowledgeable, more highly educated, politically engaged or interested in conventional political participation, but not of inactive and disengaged citizens. This thesis is empirically backed up by a number of cross-sectional studies in the past decade (see two recent examples in the United States: Hindman, 2009; Smith et al., 2009). Norris’s notion of a “virtuous circle” exemplifies a strong case from an international perspective, positing that media use, primarily of traditional media, may serve to mobilize only already engaged citizens, rather than disengaged people, to become involved in the process of engagement (Norris, 2000). At worst, the new online platform might merely reinforce the gap between information-rich and information-poor and between engaged and disengaged citizens.

A whole array of individual determinants of political participation is identified from the literature. Political interest has been found to be a strong predictor for political participation as well as a mediator between media use and political participation (Boulianne, 2009; Livingstone & Markham, 2008). Political knowledge, political efficacy, political attitude, political orientations, social capital, or interpersonal/social networks are among a set of key contributors for predicting an individual’s pattern of political participation (Lee, 2006, 2010; Livingstone & Markham, 2008; Shen, Wang, Guo, & Guo, 2009; Wang, 2007; Zhang & Chia, 2006).

Previous Research in Hong Kong

In March 2010, the Central Policy Unit (CPU) of the HKSAR Government commissioned a team of HKU researchers to investigate the needs, views, and frustrations of Hong Kong’s 18-to-29-year-old generation (Yip, Wong, Law, & Fu, 2011). Further analysis was conducted to investigate the pattern and characteristics of the Hong Kong younger generation’s online political participation and online media use (Fu, Wong, Law & Yip, 2016). Using latent class analysis, we identified four clusters of young people with different levels of political participation and patterns of offline and online media use: (1) “critical citizens” (14%) who have the highest likelihood of using online media, engaging in offline and online political participation, casting a vote in the last election, and holding “critical” views, such as left-wing, liberal, distrusting of government, and post-materialist values; (2) modestly politically active nonvoters (15%) who are likely to participate in several types of offline or online political activities but have lower intention to be registered voters or to participate in elections; (3) voters (29%) who have lower intention to participate in offline or online political activities but are likely to be registered voters and more interested in casting a vote in an election; and (4) “disengaged” individuals (42%) who have the lowest intention to participate in offline or online political activities and to be voters. Based on this typology, the results suggest that the “critical citizens,” who are mostly politically active online users, constitute a core group within the Hong Kong online public opinion sphere, but the group is a minority. Their voice might be over-represented and regarded as a “mainstream” view on the Internet, but not necessarily in society in general.


Patterns and characteristics of Hong Kong online opinion were then further explored. In 2010, the Principal Investigator of this project, Dr. King-wa Fu, was commissioned by the Central Policy Unit to conduct a pilot study on the characteristics of online public opinion in Hong Kong. The project successfully developed a robust and reliable data collection methodology for sampling the space of online public opinion in Hong Kong. The final research report describes online public opinion in Hong Kong and characterizes its main features. Selected topics of online discussion of social policy were analyzed and discussed (Fu & Chau, 2011).

Using the data collected, we also established an approach to examining the interactions between online and offline public opinion. Major findings of the study are highlighted as follows: 1) Our data shows that the number of posts increases and the sentiment score decreases as public outcry over controversial issues happens; 2) Major keywords extracted from online contents can inform public governance – for example those emerging topics that appear frequently but were rarely found in the past; 3) Digital presences of “opinion leaders” can be identified; and 4) The results of the two case studies demonstrate the ways in which online opinion is incorporated into policy discussion.

Using the sentiment analysis technique, we developed an online sentiment score to reflect online public sentiment toward the government. Further analysis seeks to test the temporal association between the online sentiment score and phone survey poll results in Hong Kong (Fu & Chan, 2013). Our findings suggest that online sentiment scores can lead phone survey results by about 8–15 days, and the results are significantly correlated.
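The lead-lag relationship described above can be illustrated with a simple lagged correlation. The following is a minimal sketch and not the procedure used in Fu & Chan (2013): it assumes two equally long daily series, uses plain Pearson correlation, and all function names and the toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlations(sentiment, poll, max_lag):
    """Correlate sentiment on day t with the poll result on day t + lag."""
    if len(sentiment) != len(poll):
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        x = sentiment[: len(sentiment) - lag]  # drop the last `lag` days
        y = poll[lag:]                         # drop the first `lag` days
        result[lag] = pearson(x, y)
    return result


# Toy data (hypothetical): the poll repeats the sentiment series two days later.
sentiment = [1, 3, 2, 5, 4, 6, 2, 7, 3, 8]
poll = [0, 0] + sentiment[:-2]
correlations = lagged_correlations(sentiment, poll, max_lag=4)
best_lag = max(correlations, key=correlations.get)
```

With real data, the two series would first need to be aligned by date, and the statistical significance of each lagged correlation would need to be assessed separately.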

Online Public Opinion and Citizen Sensor

The above two studies have provided us a better understanding about online opinion in Hong Kong with respect to the following two main arguments: 1) online opinion can act as a “citizen sensor” to reflect the public’s reaction to social policy or public event; 2) online sentiment can be quantified into an indicator for prediction of an approval rating of the Hong Kong government (as measured by a phone survey). Many studies have used social media sentiment (mainly from Twitter) to predict voting results (Ceron, Curini, Iacus, & Porro, 2013; Marchetti-Bowick & Chambers, 2012). But Mejova, Srinivasan, & Boynton (2013) conclude that social media sentiment is overwhelmed by negatively toned posts about politicians, and the results are not entirely predictive of results in national polls.

With the rapid development of the field, Fu & Chau (2011) suggest the establishment of an online public opinion tracking system in Hong Kong. Such a system could help policy makers and the general public to keep track of online discussions of social and political topics, and could follow changes, especially short-term changes, in citizens’ online sentiments toward public governance and social policy.

Online Opinion, Public Opinion, and Traditional Media

In this study, we ask two key questions that are important to the understanding of the interactions between media and politics in Hong Kong: How can we account for the correlation between population-based opinion and the nonrepresentative samples of online expression (mainly “critical citizens”)? What is the role of mass media in shaping public opinion, online opinion, or both concurrently?

A previous study has indicated a set of theoretical explanations for the interplays between online opinion, offline public opinion, and mass media (Fu & Chan, 2013). Three plausible explanations are suggested: 1) both online and offline opinion are mainly driven by the same group of opinion leaders; 2) their relationship is confounded by mass media impact; 3) both results are caused by the same sources of systematic error.

Drawing on the two-step flow of communication model (Katz, 1957), opinion leaders’ views are important to the formation of public discourse, and consequently can direct the general population’s opinion. This group of elite members of the society can extend the opinion formation process from offline to online environments and reinforce their influential power across a variety of offline and online media. Their online opinions can therefore be used for estimating the overall population’s view.

The second explanation is that the correlation of public opinion (primarily from surveys) and online sentiment is confounded by mass media impact. Media theories, for example agenda setting and media framing research, have informed us with established evidence to support the view that public opinion is profoundly shaped by news media (Kepplinger, 2008). But on the other hand, news and information are among the main sources of social media content in Hong Kong and online media have significant power to direct the news media’s agenda (Fu & Chau, 2011). Therefore, we argue that if citizens generally as well as online news users are both exposed to, and strongly influenced by, news media, both phone survey results and online sentiment may become two distinct indicators for the same underlying variable: the attitude of the news media.

Third, there might be methodological and sampling bias in both research designs. Both phone survey results and social media content might be imperfect forms of operationalization for public opinion measurement, i.e. measures deviating from the “true value” systematically, and therefore both would be outcomes with systematic error.

Based on these explanations, which are not entirely mutually exclusive, a conceptual diagram is devised [Figure 1]. This study is designed to address the first two explanations. To the author’s knowledge, no study has empirically examined the interactions among online public opinion, mass media, and conventional means of collecting public opinion (e.g. telephone surveys) in Hong Kong or elsewhere. Here are our research questions.

[Figure 1: Conceptual Diagram. Labels recoverable from the diagram: Mass Media, Public Opinion, Online Opinion, Opinion Leaders, Media reporting on polling, Agenda setting/framing, Online news, Bottom-up agenda setting.]

Research Questions

Drawing on the conceptual framework as described in this section, we seek to examine the following research questions in this study.

RQ1) What are the patterns and characteristics of Hong Kong online opinion?

RQ2) How is the formation of online opinion associated with policy development and government-society interactions?

RQ3) What are the interactions among online opinion, public opinion (phone survey- based), and mass media?

Method

Data Collection

Collecting Facebook Data

By Facebook’s definition, Facebook pages are developed “for businesses, brands, organizations and public figures to share their stories and connect with people. Like profiles, pages can be customized with stories, events and more. People who like a page can get updates in News Feed.”

Five Facebook pages, namely Scholarism (學民思潮), Supporthktv (萬人齊撐!!! 快發牌比香港電視!!!), Passiontimes (熱血時報), Salutetohkpolice (向香港警察致敬) and Supportnationaleducation (理性撐國民教育), were selected as the seed pages for snowball sampling. These pages were selected on the basis of their relevance and significance to both supporting (the first three) and opposing (the last two) the Occupy Movement in Hong Kong. Using the selected seeds as the initial set of pages, additional Hong Kong-based Facebook pages were gradually collected, iteration by iteration, by tracing the posts or links shared on the current set of Facebook pages.
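The iterative expansion described above can be sketched as follows. This is a minimal illustration rather than the project's actual crawler: it assumes the sharing relations are already available as an in-memory mapping (the hypothetical `shared_from`), it omits the check that a page is Hong Kong-based, and the page names other than the seed are invented.

```python
def snowball(seeds, shared_from, max_rounds=10):
    """Expand a set of seed pages by following shared posts.

    shared_from maps a page to the pages whose posts it has shared.
    Each round adds the pages newly discovered from the previous round.
    """
    pool = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        discovered = set()
        for page in frontier:
            discovered.update(shared_from.get(page, ()))
        frontier = discovered - pool  # keep only pages not yet in the pool
        if not frontier:
            break                     # nothing new: sampling has converged
        pool |= frontier
    return pool


# Hypothetical sharing relations for illustration only.
shared_from = {
    "Scholarism": ["PageA"],
    "PageA": ["PageB"],
    "PageB": ["Scholarism"],
    "Unrelated": ["PageC"],
}
pool = snowball(["Scholarism"], shared_from)
```

Pages that are never reached from a seed (here `Unrelated` and `PageC`) stay outside the pool, which is why the choice of seeds on both sides of the debate matters.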

When a page was included in the sampling pool, the page’s publicly available posts (including original posts and shared posts, depending on the privacy setting of an individual page) were obtained by using the Facebook Graph API (https://developers.facebook.com/docs/graph-api). For each page, posts published since July 1, 2014 were retrieved. The data fields accessible via the Facebook Graph API consist of the following variables: the body of the post, the date and time of publication, and the shared URL. The sampling pool was continuously checked for new posts over the entire study period.

Update the sampling pool

The sampling pool was updated by two methods.

1) The first method was used from the beginning of the project in July 2014. The sampling pool was updated monthly. New candidate pages were searched for by locating all stories shared in the last 30 days on the pages of the current sampling pool. Candidate pages were first screened for inclusion by a pretrained Support Vector Machine (SVM) classifier. The classifier used a feature vector built from page fields such as descriptions, titles and usernames, and was trained on an initial set of 1,387 pages together with the whole set of excluded pages. The performance of the classifier is acceptable, with an F1 score of 71%. The classification results of the SVM classifier were then double-checked by one human rater (CH Chan) before final inclusion. As of August 2015, the total number of pages in the sampling pool was 3,011. Subsequently, for all newly included pages, all publicly available posts published since July 1, 2014, were retrieved.

2) The second method has been in use since 2016. As in the first method, new candidate pages were found through shared stories, but the sampling pool was updated weekly and the search window was extended to the last 60 days. For each candidate page with at least 300 fans, the Facebook Insights API (https://developers.facebook.com/docs/graph-api/reference/v2.6/insights) was used to obtain the proportion of Hong Kong-based fans among all fans of the page. If the majority (>50%) of the fans came from Hong Kong, the candidate page was included in the sampling pool. This method fully automates page inclusion without human input.
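The inclusion rule of the second method can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the `fans_by_country` dictionary and the `"HK"` country code are hypothetical stand-ins for whatever the Insights API returned, and the API call itself is not shown.

```python
def should_include(total_fans, fans_by_country, min_fans=300, home="HK"):
    """Apply the rule described above: at least `min_fans` fans and a
    strict majority (>50%) of them based in `home`.

    `fans_by_country` maps a country code to a fan count (hypothetical
    input shape, not the actual Insights API response).
    """
    if total_fans < min_fans:
        return False
    home_fans = fans_by_country.get(home, 0)
    return home_fans / total_fans > 0.5


# Example: 300 of 500 fans are Hong Kong based, so the page qualifies.
print(should_include(500, {"HK": 300, "TW": 100, "US": 100}))
```

A page with exactly half of its fans in Hong Kong is rejected here, because the text requires a majority above 50%.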

The Facebook sharing network

A post-sharing network was then constructed from all shared-post data collected in the sampling pool. In the network, a node represents an individual Facebook page, and a directed, weighted edge from Page A to Page B (A -> B) is formed when Page B shares a post of Page A. The direction of the edge indicates the flow of information from Page A to Page B, and the edge weight is the total number of Page A’s posts shared by Page B within the study period. For example, A -8-> B means that 8 posts of Page A were shared by Page B.
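The edge construction described above can be illustrated with a minimal sketch. The input format, a list of `(source_page, sharing_page)` pairs with one pair per shared post, is an assumption made for this example, as is the decision to ignore a page sharing its own post.

```python
from collections import Counter


def build_sharing_network(shares):
    """Return a dict mapping (source_page, sharing_page) to an edge weight.

    Each element of `shares` is one shared post: a post of `source_page`
    that was shared by `sharing_page` (edge direction: source -> sharer).
    """
    weights = Counter()
    for source_page, sharing_page in shares:
        if source_page == sharing_page:
            continue  # assumption: self-shares do not form an edge
        weights[(source_page, sharing_page)] += 1
    return dict(weights)


# Eight posts of Page A shared by Page B give the edge A -8-> B.
shares = [("A", "B")] * 8 + [("B", "A")]
network = build_sharing_network(shares)
print(network[("A", "B")], network[("B", "A")])
```

Keeping the two directions as separate keys preserves the information flow, so A -> B and B -> A can carry different weights.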

Collecting commenters and likers information

Based on page information and betweenness centrality in the network, post-level commenters and likers of selected pages were collected through the Facebook Graph API once a month. These data can be used to analyze whether and how the content creation and information sharing patterns of Facebook pages might polarize some Facebook users.

Every month, all posts published on a selected page over one whole day were obtained, and the comments and likes on those posts were collected.

Because of limited computing resources and the Facebook API quota, likers’ and commenters’ information was collected for only some of the indexed pages.

Collecting Hong Kong Golden Forum Data

We also aimed to collect all threads published on the HK Golden Forum, together with the replies posted within 60 days of the publication of each original thread.

A unique numeric identifier for each thread is inferred from the thread’s hyperlink (e.g. http://forum14.hkgolden.com/view.aspx?message=5793684, where 5793684 is the unique identifier). The thread identifier (TID) is observed to increase sequentially over time: the later the thread, the larger its TID. The latest TID is therefore taken as the largest TID among the thread URLs on the listing page of all latest threads (http://forum1.hkgolden.com/topics_bw.htm).
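Extracting the TID from a thread URL, and picking the latest TID from a list of URLs, might look like the following sketch. It relies only on the URL pattern quoted above; the function names are illustrative and not taken from the project's code.

```python
from urllib.parse import parse_qs, urlparse


def thread_id(url):
    """Return the numeric TID from the `message` query parameter, or None."""
    values = parse_qs(urlparse(url).query).get("message")
    if values and values[0].isdigit():
        return int(values[0])
    return None


def latest_tid(urls):
    """Return the largest TID found among `urls`, or None if there is none."""
    tids = [tid for tid in map(thread_id, urls) if tid is not None]
    return max(tids) if tids else None


print(thread_id("http://forum14.hkgolden.com/view.aspx?message=5793684"))
```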

Starting from the TID of the first thread published on 1 July 2014 at 00:00:00, a programmatically controlled Internet browser visited the thread URLs one by one in sequence, obtaining the HTML of each thread from the July 1 TID up to the latest TID. An HTML parser from the Python Beautiful Soup library was used to parse the HTML and extract its content. The body of each thread (the initial post and all replies, collectively called posts) was obtained first. For every post, the unique identifier of the poster, the date and time of posting, the body of the post, any URLs included and the URLs of any images (including emoticons) in the body were then extracted.

To collect all replies posted within 60 days, each thread was revisited 60 days after its initial posting date, and the updated set of replies and follow-up posts was obtained using the same method. Replies posted more than 60 days after the initial post were excluded.

Collecting Online News Media Data

Similar to the data collection method used for the HK Golden Forum, customized HTML parsers were developed to download and extract data from a list of online news media websites (listed below) from September 2014 onward. News articles from these sites were harvested using one of the following two methods:

1. RSS (Rich Site Summary) feed approach: Some online media provide an RSS news feed that “pushes” the latest article links to subscribers. Sources such as Mingpao, RTHK, Speakout, InMedia, Post852, VJ Media, Localpress, Standnews and HKGPao offer an RSS news feed service, and their feeds were checked regularly for new articles.

2. Mobile indices approach: For online media that do not provide an RSS feed, the mobile versions of their websites were checked regularly for new article links. Sources whose data were collected by this method include: , Oriental Daily, Sun, Hong Kong Economic Journal, AM730, Sing Tao, Passion Times, Metro HK and HK Daily News.

For each newly posted news item, its hyperlink, publication date, article title and the content were extracted.

Ethics Approval

The study was approved (Ref No.: EA110414) by the Human Research Ethics Committee for Non-Clinical Faculties (HRECNCF) regarding the ethical aspects of the project.

During the study period, the Journalism and Media Studies Centre (JMSC) at the University of Hong Kong was responsible for data storage. The data were collected using a self-developed computer program and stored on a computer server maintained by the JMSC researchers. All user identifiers (including username, display name, user ID code, and web links to photos) will be deleted or replaced with personally unidentifiable pseudo-codes once all data have been collected at the end of the project.

Results

Pattern of Online Public Opinion in Hong Kong

Online Discussion Forum

Word Cloud Analysis

An online discussion forum is a BBS-type online bulletin board that allows a user to start a discussion topic (a new thread), post a message under a thread, or reply publicly to others’ messages. The number of replies generated by a thread or a post is therefore an indicator of its level of engagement: the larger the number of replies, the higher the level of engagement.

First, we analyzed the messages posted on the Hong Kong Golden Forum (HKGF, https://www.hkgolden.com) as an example. We aim to examine the relationship between the use of textual content and the level of engagement in one of the most popular online forums in Hong Kong.

As shown in Figure 2, a word cloud analysis presents a “cloud” of keywords from popular topics found on the HKGF. The font size of each term indicates the number of replies generated: the larger the font size, the larger the number of replies, i.e. the higher the level of engagement. The figure also shows that many terms are unrelated to public affairs or political topics in Hong Kong and instead concern entertainment or leisure activities. We therefore believe that using the HKGF as a primary information source for studying online public opinion might be futile. Studying public-policy-related discussions on the HKGF would require extra work to filter out the “noise” generated by the large number of irrelevant topics, such as entertainment, music/movies, computer games, or even porn, in order to extract the “signal” from the raw data.

Because of this platform characteristic, we decided not to study the online forum data any further.

Figure 2: Word Cloud Analysis (Hong Kong Golden Forum)


Online News Media

Topical analysis

In recent years, online news media have become important news sources for Hong Kong citizens. Online platforms are also key channels for news dissemination on social media via sharing, forwarding, or retweeting, which creates an “information cascade” effect that reaches the general public at large. However, less is known about which news topics the online news media often cover and how much “media diversity,” i.e. the extent to which the media run stories that are distinct, unique or exclusive, the online news platforms offer to online readers.

First, we deployed topic modelling analysis to examine the variation of topics covered by the Hong Kong based online news media outlets. Second, we tested whether or not clusters of topical similarity exist among the samples of online news media.

News items were collected from major online news sources during the study period, including the following websites:

輔仁媒體 (http://www.vjmedia.com.hk/),

HKG 報 (http://hkgpao.com/),

本土新聞 (http://www.localpresshk.com/),

熱血時報 (http://www.passiontimes.hk/),

立場新聞 (http://thestandnews.com/),

852 郵報 (http://www.post852.com/),

獨立媒體 (http://www.inmediahk.net/),

港人講地 (http://speakout.hk/).

An unsupervised topic model was developed to automatically determine the probability of topic membership of a given free text news article. The model can determine the topic membership of a news article in 50 topics.

This model serves two purposes. First, it is used to determine the heterogeneity in topical interests between media outlets. For a given media outlet (which publishes a certain number of news items within a specific time period), the topic probabilities of its news items are averaged for each topic. The resulting topical vector, known as the “phenotype vector” of the media outlet, represents its tendency to cover each topic. The cosine distance between phenotype vectors was used to evaluate the “closeness” between online media outlets in terms of topical interest.
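The phenotype vector and the cosine distance between two outlets can be sketched as follows. The example uses three made-up topics instead of the 50 in the report, and all numbers are invented for illustration; only the averaging and the cosine distance follow the description above.

```python
import math


def phenotype_vector(article_topic_probs):
    """Average the per-article topic probabilities of one outlet, topic by topic."""
    n_articles = len(article_topic_probs)
    n_topics = len(article_topic_probs[0])
    return [
        sum(article[j] for article in article_topic_probs) / n_articles
        for j in range(n_topics)
    ]


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


# Two hypothetical outlets with two articles each over three topics.
outlet_a = phenotype_vector([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
outlet_b = phenotype_vector([[0.1, 0.2, 0.7], [0.1, 0.4, 0.5]])
print(cosine_distance(outlet_a, outlet_b))
```

A distance near 0 means the two outlets spread their coverage over topics in similar proportions; a larger distance means their topical interests diverge.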

In the following diagram, we compared the topic probabilities of six selected popular news topics, namely budget approval for the high speed railway (高鐵撥款), CY Leung/LegCo (梁振英/立法會), police/Mong Kok clash (警察/旺角衝突), post-Umbrella Movement (勇武/後雨傘), equal opportunity for sexual orientation/discrimination (同志平權/歧視) and The University of Hong Kong saga (香港大學), among a selected set of online news outlets ([輔]仁媒體, [H]KG 報, [本]土新聞, [熱]血時報, [立]場新聞, [8]52 郵報, [獨]立媒體, [港]人講地; the bracketed character or letter denotes the corresponding online news media outlet).

As shown in Figure 3, we found both differences and similarities in topical interest between the online media outlets. A hierarchical cluster analysis of the heterogeneity of the outlets’ phenotype vectors is presented as a dendrogram in Figure 4.

Figure 3: Differences and similarities in topical interest between a set of online media outlets ([輔]仁媒體, [H]KG 報, [本]土新聞, [熱]血時報, [立]場新聞, [8]52 郵報, [獨]立媒體, [港]人講地)

Figure 4: Hierarchical cluster analysis on the heterogeneity of online media outlets ([輔]仁媒體, [H]KG 報, [本]土新聞, [熱]血時報, [立]場新聞, [8]52 郵報, [獨]立媒體, [港]人講地)

Second, we aimed to test the variation in sentiment between online news outlets with respect to a specific topic of interest by applying sentiment analysis to each outlet’s content on that topic. In this case, a simple lexicon-based sentiment model was used. With that model, we can computationally determine whether a media outlet covers a news topic in a relatively positive, negative, or neutral tone.

In Figure 5, “The University of Hong Kong saga” is selected as a case study. News items on this topic are grouped by online news outlet, and an overall sentiment score is calculated to show the average sentiment polarity of all news articles about “The University of Hong Kong saga” for each outlet. The result shows diverse attitudes toward the same topic among the sampled online media: e.g. [輔]仁媒體 reported relatively “positively” on the incident (mainly about the students’ protest), whereas [H]KG 報 ran its stories in a more negative tone.
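A lexicon-based score averaged per outlet could be computed along the following lines. This is a sketch, not the model used in the study: the tiny English word lists, the scoring formula and the assumption that articles arrive already tokenized (Chinese text would first need word segmentation) are all illustrative choices.

```python
from collections import defaultdict

# Hypothetical mini-lexicons; a real lexicon would be far larger.
POSITIVE = {"support", "praise", "peaceful"}
NEGATIVE = {"condemn", "chaos", "violent"}


def article_score(tokens):
    """Score one article in [-1, 1]: (positive - negative) / matched words."""
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


def outlet_sentiment(articles):
    """Average the article scores per outlet.

    `articles` is an iterable of (outlet_name, tokens) pairs.
    """
    scores = defaultdict(list)
    for outlet, tokens in articles:
        scores[outlet].append(article_score(tokens))
    return {outlet: sum(vals) / len(vals) for outlet, vals in scores.items()}


sample = [
    ("Outlet A", ["students", "support", "peaceful", "protest"]),
    ("Outlet A", ["council", "meeting", "support", "condemn"]),
    ("Outlet B", ["violent", "chaos", "at", "meeting"]),
]
print(outlet_sentiment(sample))
```

Articles with no lexicon matches score 0.0 and therefore pull an outlet's average toward neutral, which is one of the weaknesses of such a simple approach.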

However, this part of the study is only a simple pilot test of sentiment analysis of online news media. The sentiment analysis model requires further enhancement with better methodology, such as a natural language model, to improve accuracy.

Figure 5: Sentiment analysis on “The University of Hong Kong saga” contents of online news media ([輔]仁媒體, [H]KG 報, [本]土新聞, [熱]血時報, [立]場新聞, [8]52 郵報, [獨]立媒體, [港]人講地)

Facebook Public Pages

Online Communities

A Facebook sharing network was created and analyzed by using social network analysis. The analysis was then used to study the pattern of online communities and information hubs (or online opinion leaders) within the Facebook sharing network.

Connectivity via sharing indicates whether the ties between pages are strong or weak, and strong ties mark the formation of an online community. The larger the number of shared posts, the stronger the ties between the pages and the higher the chance that they belong to the same online community. Community membership of each Facebook page was computationally determined by the unsupervised Walktrap community detection algorithm (Pons & Latapy, 2005). Once community membership was assigned, the main actors within each community, i.e. information hubs or online opinion leaders, were identified by calculating each node’s betweenness centrality and out-degree centrality. Pages with high betweenness and/or high out-degree centrality are regarded as information hubs.

We analyzed the formation of online communities and their network-structural characteristics. This study aimed to investigate the nature of “cyber-balkanization” in Hong Kong over a longer period, i.e. between July 1, 2014, and May 31, 2016. Particular attention was paid to newer developments in the post-Occupy-Central period.

As shown in Table 1, we found that the two largest online communities by number of member pages sit at the two poles of the Hong Kong political spectrum (localists and pro-establishment groups, respectively). This seems to suggest that the Hong Kong political atmosphere, as reflected by the online sharing pattern, has become even more “polarized” toward the two ends of the political spectrum in the post-Occupy-Central period. These two mega online communities exceeded the other groups in scale and number of pages.

The movie/entertainment industry and soccer groups, i.e. 電影公司/電影人 and 足球, ranked as the third and fourth largest online communities, respectively.

The social movement and media groups, i.e. 社運組織、媒體, and the environment-related groups, i.e. 保育、農業、環保人士, followed in fifth and sixth place; they did not even rank among the top four.


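Table 1: Top Online Communities of Hong Kong based Facebook Pages

| No. | Online community | Number of pages | Highest betweenness pages |
|---|---|---|---|
| 1 | 反左膠、本土派、退聯等 (anti-“leftard”, localists, student-federation disaffiliation, etc.) | 489 | 豬場新聞 Pignewshk; 科大行動; 唐生大地震 |
| 2 | 親北京、撐警等 (pro-Beijing, pro-police, etc.) | 212 | 時聞香港; 向香港警察致敬; 港獨不代表我 |
| 3 | 電影公司/電影人 (film companies/filmmakers) | 134 | Edko Films Ltd. 安樂影片; 電影與娛樂盛事 Movie & Event Marketing; UA Cinemas |
| 4 | 社運組織媒體 (social movement organization media) | 99 | 香港獨立媒體網; USP United Social Press 社媒; MM |
| 5 | 社運組織 (social movement organizations) | 67 | 學民思潮 Scholarism; Kit Da Sketch - Kit Man; 黃之鋒 Joshua |
| 6 | 新聞評論 #1 (news commentary #1) | 62 | 召集十萬人反黑警!; 炮打司令部; 升旗易得道 |
| 7 | 新聞評論 #2 (news commentary #2) | 71 | 福佳與林忌創作; 無神論者的巴別塔; 寰雨膠事錄 Gaus.ee 國際軍事政治經濟新聞放送局 |
| 8 | 3C | 60 | 網絡的事; 石先生; 數碼捕籠 |
| 9 | 本土意識 (local consciousness) | 90 | 香港人 Secrets; 我係香港人; 昔日香港 |
| 10 | 娛樂 (entertainment) | 76 | 麻利有隻小綿羊; my903.com 商業電台; King Jer 娛樂台 |
| 11 | 保育、農業 (conservation, agriculture) | 84 | 東北告急,無你點得 ?; 集雜志 Zine; 馬寶寶社區農場 Mapopo Community Farm |
| 12 | 足球 (soccer) | 120 | keymansoho 足球版圖; 香港超級聯賽 Hong Kong Premier League; 大球場道 Road to Stadium |
| 13 | 交通 (transport) | 74 | 柏斯敦巴士台 plaxtonl's Bus Page; Hong Kong International Airport 香港國際機場; hkitalk.net 香港交通﹒資訊網 |
| 14 | 旅遊 (travel) | 64 | 新假期 JetSo; 新假期周刊; 杜遊珍 |
| 15 | 民主派 KOL 及組織 (pan-democrat KOLs and organizations) | 72 | 潘小濤; 前線科技人員; Charles Mok 莫乃光 |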

Cyberbalkanization

Using time series analysis, this study also examines the association between cyberbalkanization and real-life polarization of public opinion. A set of 1,387 Hong Kong-based Facebook Pages (between July 1 and December 15, 2014) was analyzed. The pages’ public posts were retrieved and a post-sharing network (1,397 nodes and 41,404 edges) was formed. A community detection algorithm extracted the online communities computationally and assigned each page a community membership. The daily degree of cyberbalkanization was then quantified as the ratio of the number of shares through strong-tie connections (intracommunity sharing) to the total number of shares. The index of political polarization, in turn, was computed from opinion poll data as the proportion of respondents who gave extreme ratings to the government leader in Hong Kong (Chan & Fu, 2015).
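The daily degree of cyberbalkanization, as defined above, reduces to a simple ratio. The sketch below assumes that one day's shares are given as `(source_page, sharing_page)` pairs and that community membership is a dictionary from page to community label; both input shapes and the handling of unassigned pages are illustrative assumptions.

```python
def cyberbalkanization_index(daily_shares, community_of):
    """Return intracommunity shares divided by all shares for one day.

    Shares involving a page without a community label are skipped
    (an assumption of this sketch). Returns None on a day without shares.
    """
    total = 0
    intra = 0
    for source_page, sharing_page in daily_shares:
        if source_page not in community_of or sharing_page not in community_of:
            continue
        total += 1
        if community_of[source_page] == community_of[sharing_page]:
            intra += 1
    return intra / total if total else None


community_of = {"A": 1, "B": 1, "C": 2}
daily_shares = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "B")]
print(cyberbalkanization_index(daily_shares, community_of))
```

A value close to 1 means that almost all sharing on that day stayed within communities, i.e. a highly balkanized network.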

The time series analysis found that the daily degree of cyberbalkanization was significantly associated with the level of political polarization, particularly in the opinion poll results of the younger age group (Chan & Fu, 2015). This result provides empirical evidence that cyberbalkanization can serve as a leading indicator of the polarization of public opinion at least 10 days ahead, suggesting that social media data analysis can supplement traditional public opinion research methods, such as phone surveys, during social controversy.

Most Shared Facebook Messages

We then identified the individual message items that generated the largest number of shares between the sampled Facebook pages in the study period. As shown in Table 2, most of the message items are related to Hong Kong public affairs or social issues; only a few are about entertainment news. Even though the top-ranked item, entitled “黎明就取消演唱會致歉” (Leon Lai apologizes for cancelling his concert), is an entertainment story, its online virality was mainly attributable to local politics: netizens held up Mr. Leon Lai’s apology as a “reference model” in contrast to the Hong Kong government’s poor crisis response and government officers’ reluctance to apologize publicly in many previous incidents. The result indicates that a substantial share of the messages shared among Facebook pages in Hong Kong serves the exchange of political information among Facebook users.

Table 2: Top Ranked Facebook Shared Messages between Pages

| Post contents in Chinese title | Number of shares between pages | Video or not |
|---|---|---|
| 黎明就取消演唱會致歉 | 102 | yes |
| 黎明 100 毛咖啡廣告 | 88 | yes |
| 黎明重開演唱會 | 88 | yes |
| 立場新聞結束聲明 | 86 | no |
| 聯署要求白宮支持香港民主 | 85 | no |
| TVB 回應毛孟靜就 J5 簡體字幕的質詢 | 84 | no |
| 覺醒配音:《希特拉都反網絡 23 條》 | 79 | yes |
| 蘋果日報:版權修訂條例草案表決二讀獲得通過 | 78 | no |
| 循道中學 SAS 歌曲 MV (已刪除) | 78 | yes |
| 蘋果突發:上大帽山被阻 惡女狂鬧阿 Sir | 76 | yes |
| 香港西米電視:3 分鐘看懂 - 網絡廿三條 | 76 | yes |
| NOW 新聞:【政情】梁振英籲商界勿捐錢予本地大學 | 74 | yes |
| 萬人聯署: 聲援被捕女童 | 72 | no |
| 反對網絡 23 條聯署 | 70 | no |
| Last Blood: 感謝 Google 提供港豬證明書。 | 68 | no |
| 聯署要求白宮關注失蹤書店人士 | 67 | no |
| 蘋果日報:車死人現場 街坊執橙笑晒口 | 67 | no |
| 本土民主前線:選舉事務處濫權踐踏自由,高度自治消失殆盡 | 64 | no |
| 黎明演唱會現場直播 | 60 | yes |
| 熱血時報:【旺角警民衝突】有市民被警員打到頭破血流 | 60 | yes |
| 開片:網絡挑機-關愛默示錄 (被消失的 Credit 現身版) | 58 | yes |
| 一群無綫新聞部記者的公開信 | 57 | no |
| 東九龍社區關注組:東九投票結果速報! | 55 | no |
| 蘋果日報:登記做選民(2014 年) | 55 | no |
| 蘋果日報:大內密探劉皇發 | 55 | yes |
| 香港地:Chirs CHUNG | 53 | no |
| 蘋果突發:梁齊昕慶萬聖節 怒摑母親唐青儀 | 53 | yes |
| 鍵盤戰線:「網絡廿三 一定唔得」二讀集會 | 52 | no |
| 十年:登陸 Google! | 52 | no |
| 熱血時報:【旺角警民衝突】警員拔槍 | 52 | no |
| 明報職工協會:執總深夜被炒 不明不白 | 51 | no |
| NOW 新聞直播 | 50 | yes |
| 蘋果日報:寧要大笨象養得大,也不想香港人活得健康的政府 | 50 | no |
| 有線新聞:1,500 元住「劏上劏」房 | 50 | yes |
| 東網:麒麟 KO 葉劉 | 50 | yes |
| 李克勤遺失行李嬲怒 | 49 | no |
| 周庭:An Urgent Cry from Hong Kong | 49 | yes |
| 結束一桶專棄:緊急呼籲 | 49 | no |
| 100 毛:第一屆毛記電視分獎典禮正式完結! | 48 | no |
| 啟德坊十條行人專用街道仲未有名,巴絲打畀啲 idea 過嚟啦! | 48 | no |
| 蘋果日報:「以胸襲警」如此下場 | 48 | no |
| Jackz:賽馬直擊:《第 50 屆工展會盃》 | 48 | yes |
| 破折號:黃之鋒錢詩文遇襲暴徒街頭拳打腳踢(己刪除) | 47 | no |
| 吉野家:熊本地震捐助活動 | 47 | no |
| 香港超級聯賽:FULL-TIME 香港 0-0 中國 | 47 | no |
| 十年:《十年》電影官方預告片 | 47 | yes |
| 一字馬致敬:《合成》 | 47 | yes |
| 蘋果日報:火爆姐版《堅‧香港地》 Rap 出港人心聲 | 46 | yes |
| MM:警司揮警棍擊打市民後頸 | 46 | yes |
| 100 毛:特事特辦.機場實測 | 45 | yes |

Interaction between Facebook and Online News Media

This section investigates the interactions between Facebook public pages and the online news media. The main research question is whether news on social media is associated with web-based online news media in terms of news volume, news content or sentiment toward the government. This helps in evaluating the effectiveness of online media as news distribution platforms and in identifying the differences in communication functions among online news platforms.

Table 3 shows the major sources of shared links embedded in the posts on the sampled Facebook pages. The domain names of the links shared in the aforementioned Facebook sharing network are sorted by the number of unique sharers, which avoids inflating a domain’s figure when the same sharer shares its links multiple times.

Facebook.com and youtube.com are the two main domains of the shared links, showing widespread information sharing across social media platforms. However, we cannot exclude the possibility that some of the shared content on Facebook and YouTube is in fact news-related, e.g. from a Facebook news page or a YouTube news channel. Further effort is needed to analyze and regroup the data for an overall investigation. The generic domain parser¹ we are using cannot combine related domain names (such as hk.on.cc and orientaldaily.on.cc).
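The tally behind such a table, counting shares and unique sharers per domain and sorting by unique sharers, can be sketched as follows. The study used the R package urltools; this sketch swaps in Python's standard `urllib.parse` purely for illustration, and the `(page_id, url)` input format is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_stats(shared_links):
    """Return (domain, shares, unique_sharers) rows sorted by unique sharers.

    `shared_links` is an iterable of (page_id, url) pairs, one per shared link.
    """
    share_counts = defaultdict(int)
    sharers = defaultdict(set)
    for page_id, url in shared_links:
        domain = urlparse(url).hostname  # lower-cased host, or None
        if not domain:
            continue
        share_counts[domain] += 1
        sharers[domain].add(page_id)
    rows = [(d, share_counts[d], len(sharers[d])) for d in share_counts]
    # Sort by unique sharers, then by shares, then alphabetically.
    rows.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return rows


links = [
    ("page1", "https://hk.on.cc/a"),
    ("page1", "https://hk.on.cc/b"),
    ("page2", "https://youtu.be/x"),
    ("page3", "https://youtu.be/y"),
]
for row in domain_stats(links):
    print(row)
```

As with the generic parser mentioned above, this treats hk.on.cc and orientaldaily.on.cc as separate domains; merging them would need an explicit outlet-to-domain mapping.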

We observed that traditional media outlets such as Next Media or Apple Daily (including hk.apple.nextmedia.com, s.nextmedia.com, hkm.appledaily.com, nextplus.nextmedia.com), Oriental Daily (including hk.on.cc, orientaldaily.on.cc) and (including news.mingpao.com, m.mingpao.com) played a vital role in generating shared content on social media. Moreover, alternative online news media, such as Stand News (thestandnews.com), In Media (www.inmediahk.net), HK01² (www.hk01.com) and Post 852 (www.post852.com), followed some leading traditional media as popular news sources on the Facebook sharing network. The news aggregator Yahoo News (hk.news.yahoo.com) was also an important source of shared links.

1 https://cran.r-project.org/web/packages/urltools/index.html

Table 3: Domain names of major shared links on Facebook Pages

| Domain | Number of shares | Number of unique sharers |
|---|---|---|
| www.facebook.com | 4471162 | 12190 |
| www.youtube.com | 109626 | 7357 |
| youtu.be | 66792 | 5954 |
| hk.apple.nextmedia.com | 68161 | 4330 |
| goo.gl | 95849 | 3324 |
| bit.ly | 163895 | 3234 |
| hk.on.cc | 26455 | 2901 |
| s.nextmedia.com | 28550 | 2558 |
| news.mingpao.com | 34843 | 2529 |
| hkm.appledaily.com | 9069 | 1853 |
| thestandnews.com | 33071 | 1752 |
| hk.news.yahoo.com | 11323 | 1594 |
| www.inmediahk.net | 17635 | 1418 |
| docs.google.com | 3275 | 1401 |
| nextplus.nextmedia.com | 6789 | 1391 |
| m.youtube.com | 3634 | 1372 |
| www.hk01.com | 3882 | 1270 |
| news.now.com | 12095 | 1268 |
| topick.hket.com | 13450 | 1242 |
| orientaldaily.on.cc | 4778 | 1218 |
| www.am730.com.hk | 5734 | 1067 |
| gph.is | 3366 | 1058 |
| programme..hk | 3661 | 1035 |
| www.post852.com | 38969 | 1033 |
| m.mingpao.com | 3433 | 1015 |

2 It is worth noting that HK01 is a new online media outlet (less than a year old, as of writing) and is rapidly gaining ground on social media.

Second, the volume of Facebook shares and the total number of items published by all online media outlets are displayed in Figure 6. Each volume is expressed relative to its value on September 1, 2014 (= 1.0). The figure shows diverging trends in the two time series: the volume of online news media production gradually dwindled, whereas Facebook shared posts showed an increasing trend over the study period.

Figure 6: Volume of Facebook sharings (FB sharing) and number of published items by all indexed media outlets (Media)
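Expressing each series relative to its value on September 1, 2014 is a simple rescaling, sketched below. The daily counts are invented for illustration, and the use of ISO date strings as dictionary keys is an assumption of this example.

```python
def index_to_baseline(series, baseline_date):
    """Divide every value by the value on `baseline_date`, so that date equals 1.0.

    `series` maps a date string to a daily volume.
    """
    baseline = series[baseline_date]
    if baseline == 0:
        raise ValueError("baseline volume must be non-zero")
    return {date: value / baseline for date, value in series.items()}


# Made-up daily volumes for two series of very different scale.
fb_sharing = {"2014-09-01": 2000, "2014-09-02": 2500, "2014-09-03": 3000}
media_items = {"2014-09-01": 400, "2014-09-02": 380, "2014-09-03": 360}

print(index_to_baseline(fb_sharing, "2014-09-01"))
print(index_to_baseline(media_items, "2014-09-01"))
```

Rescaling in this way lets two series with very different absolute volumes be compared on a single axis.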

Interaction between public opinion, online media and social media

A time series of the government approval rating was created using data obtained from the HKUPOP (Public Opinion Programme, The University of Hong Kong). The HKUPOP conducts telephone polls to assess the approval rating of the Hong Kong Chief Executive, Mr. CY Leung. Each month, the poll asks 500-1000 respondents the question: “Please use a scale of 0-100 to rate your extent of support to the Chief Executive Leung Chun-Ying, with 0 indicating absolutely not supportive, 100 indicating absolutely supportive and 50 indicating half-half. How would you rate the Chief Executive Leung Chun-Ying?” We obtained the raw data of each survey from the HKUPOP website.³

From the HKUPOP data, a time series of the weighted average rating of CY Leung (APR) was created. This time series was used as the “ground truth” to be predicted.

3 http://data.hkupop.hku.hk/v3/hkupop/ce2012_leung/ch.html

A machine learning approach was used to study the interactions between quantitative changes in the volume of social media posts, the volume of traditional media news items and the variability in APR. The Regularized Random Forest algorithm (Deng & Runger, 2013) was used because it handles interactions between features automatically and penalizes any time series that is not predictive or too predictive, striking a balance between underfitting and overfitting.

The traditional media time series was generated from the data regarding the daily number of news items collected from RTHK, Mingpao, Oriental Daily, oncc instant, Apple Daily, Sing Tao Headline News and Hong Kong Economic Journal websites.

The social media time series was based on the number of message items created by all the large communities in the Facebook sharing network (as shown in Table 1). Based on the data between July 1, 2014, and May 31, 2016, 15 large online communities were detected in the Facebook sharing network.

Machine Learning Analysis

The study period ran from August 1, 2014, to November 22, 2016. The start date was chosen because data collection began on July 1, 2014, and we wanted to create lagged time series with lags of up to 30 days. The end date was that of the most recent HKUPOP survey as of this writing.

The entire period was divided into two parts: a training period (August 1, 2014, to May 31, 2016) and a testing period (June 1, 2016, to November 22, 2016). The APR time series was standardized (mean-centered and scaled) using the mean and standard deviation calculated within the training period. This was done to minimize the influence of extreme outliers.
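Standardizing with training-period statistics only, so that no information from the testing period leaks into the scaling, might look like this sketch. The ratings are invented, and the use of the population standard deviation is an assumption, since the report does not say which estimator was used.

```python
import statistics


def standardize_with_training_stats(train, test):
    """Scale both series with the mean and SD of the training series only."""
    mean = statistics.fmean(train)
    sd = statistics.pstdev(train)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("training series has zero variance")
    scale = lambda values: [(value - mean) / sd for value in values]
    return scale(train), scale(test)


# Hypothetical approval ratings on the 0-100 scale.
train_apr = [40.0, 45.0, 50.0, 55.0, 60.0]
test_apr = [38.0, 42.0]
z_train, z_test = standardize_with_training_stats(train_apr, test_apr)
print(z_train)
print(z_test)
```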

A benchmark model was created to predict the APR, using the 30-day lagged APR as the only feature to train a regularized random forest model.

Three additional regularized random forest models were created by combining the 30-day lagged APR with the lagged traditional media time series, the lagged Facebook time series, or both. The traditional media and Facebook time series were lagged by 3 to 30 days. To correct for weekday and holiday variations in the volume of Facebook and traditional media items, all time series were standardized with the weekday and holiday means and standard deviations of the published items.

The prediction accuracy was evaluated using the mean absolute error (MAE) between predicted time series and the actual APR.
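The two building blocks mentioned above, lagged features and the mean absolute error, can be sketched in plain Python. This does not reproduce the regularized random forest itself; the helper names and the toy series are illustrative assumptions.

```python
def lagged_features(series, lags):
    """Build one feature row per time point t from series[t - lag] for each lag.

    Only time points with a full history (t >= max(lags)) are returned,
    together with the matching target values series[t].
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return rows, targets


def mean_absolute_error(actual, predicted):
    """Return the average absolute difference between two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


rows, targets = lagged_features([1, 2, 3, 4, 5], lags=[1, 2])
print(rows, targets)
print(mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 1.0]))
```

In the setting described above, the lags for the media and Facebook series would run from 3 to 30 days, e.g. `lags=range(3, 31)`.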

Performance of Predictions

The actual APR and the four predicted APRs are presented in Figure 7. All four predictions were decalibrated (less accurate) in the forecasting period, indicating overfitting.

From the performance analysis, we found that the model trained with traditional media data (“Lagged 30d APR + traditional media”) was only slightly better than the benchmark in the forecasting period. However, the prediction based on Facebook data was considerably better than the benchmark. Adding traditional media to the Facebook-based prediction did not yield any substantial improvement. From this analysis, we found that activity within the large Facebook online communities was associated with the change in APR, whereas activity in the traditional media was not.

The variable importance analysis of the combined model (“Lagged 30d APR + Facebook + Traditional media”) also reveals that the activities of the online communities 1, 6, 7 and 14 are the most predictive of the future APR. The activity of traditional media is not useful for the prediction.

Table 4: Mean Absolute Error between Actual and Predicted Time Series

| Model | Mean Absolute Error, training period (Aug 1, 2014, to May 31, 2016) | Mean Absolute Error, forecasting period (Jun 1, 2016, to Nov 22, 2016) |
|---|---|---|
| Lagged 30d APR only | 0.817 | 0.882 |
| Lagged 30d APR + Traditional Media | 0.565 | 0.827 |
| Lagged 30d APR + Facebook | 0.432 | 0.666 |
| Lagged 30d APR + Facebook + Traditional Media | 0.424 | 0.656 |

Figure 7: Actual APR and the four predicted APRs

Case Study: The 2016 Legislative Council Election

Using only the Facebook data between December 2015 and November 2016, we found a major spike in Facebook activity during the Legislative Council election (September 2016), after adjusting for weekday variation in posting volume. As shown in Figure 8, social media posts related to collective actions during this period did not generate as much attention as relatively trivial dates such as the Mid-Autumn Festival and a Typhoon Signal No. 8 day.

Figure 8: Facebook activity over the Legislative Election Period

Candidates’ Facebook Pages

An in-depth analysis was conducted to investigate how the candidates used their Facebook pages during the election season. Seventy-seven candidates’ Facebook pages were studied, and all posts published during the election season (between July 30, 2016, and September 3, 2016), together with their engagement data, were collected in November 2016.

The average number of positive engagements (“LIKE”, “LOVE”, “WOW”) per item published by each candidate during the election season was calculated. The average support for each candidate across all HKUPOP election polls was also calculated. The correlation coefficient between the two was 0.24 (p = 0.038), suggesting that the average positive engagement might be a rough proxy for offline support.

Similarly, the average positive engagement is significantly correlated with the final vote count (r = 0.397, p < 0.001).

Figure 9: Scatterplot of Actual votes/Polling Result (y-axis, upper and lower panel, respectively) against Positive Engagement on the Candidates’ Facebook Page (x-axis)
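A correlation of this kind can be computed as in the sketch below, assuming the Pearson coefficient (the report does not state which coefficient was used). The engagement and vote figures are invented and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical candidates: average positive engagement vs. votes received.
engagement = [120.0, 300.0, 80.0, 950.0, 410.0]
votes = [21000, 35000, 18000, 52000, 30000]
print(pearson_r(engagement, votes))
```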

A negative binomial regression was deployed to study the relationship between the actual votes and the average positive engagement by the political factions of the candidates (Pro-Beijing [Pro BJ], Pan-Democrat [Pan-dem] and Localists). We found that the positive engagement is still a significant predictor for the actual votes after adjusting for the political factions.

Table 5: Prediction for Actual Votes by Political Factions

| Predictors | Estimate | 95% CI |
|---|---|---|
| (Intercept) | 10.2356439 | 9.8424 to 10.6681 |
| Average positive engagement | 0.0012277 | 0.0004 to 0.0023* |
| Pro-Beijing faction | Reference | |
| Pan-democrat faction | -0.5659991 | -1.0664 to -0.0824* |
| Localist faction | -0.2991438 | -1.0484 to 0.5659 |

* p < 0.05
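To read these estimates, it may help to convert them into multiplicative effects. The sketch below assumes the usual log link of a negative binomial regression, which the report does not state explicitly, so the coefficients are treated as being on the log scale of expected votes. The function names are illustrative.

```python
import math

# Estimates copied from Table 5; the log link is an assumption.
INTERCEPT = 10.2356439
ENGAGEMENT_COEF = 0.0012277
FACTION_COEF = {
    "pro_beijing": 0.0,       # reference category
    "pan_democrat": -0.5659991,
    "localist": -0.2991438,
}


def rate_ratio(coefficient, delta=1.0):
    """Multiplicative change in expected votes for a change of `delta`."""
    return math.exp(coefficient * delta)


def expected_votes(avg_positive_engagement, faction):
    """Expected votes implied by the fitted coefficients under a log link."""
    linear = (INTERCEPT
              + ENGAGEMENT_COEF * avg_positive_engagement
              + FACTION_COEF[faction])
    return math.exp(linear)


# Effect of 100 additional average positive engagements per post.
print(rate_ratio(ENGAGEMENT_COEF, delta=100))
# Pan-democrat candidates relative to pro-Beijing candidates, all else equal.
print(rate_ratio(FACTION_COEF["pan_democrat"]))
```

Under this reading, each coefficient acts multiplicatively on expected votes, so a positive engagement coefficient implies proportionally more votes for candidates with more positive engagement within the same faction.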

Policy Implications and Recommendations

Interaction between media and public opinion

In the field of political communication research, the relationship between media, politics and public governance is well established. But the media in this chain of relations largely refers to traditional media outlets, which follow the traditional model of journalism (newsroom setting, editorial cleaning, and nonjudgmental, “fact-based” reporting) and operate mainly through one-to-many distribution channels. Even though most major traditional media outlets have established web-based online news sections, the extent to which news messages are disseminated nowadays relies heavily on the cascade effect of social media.

Our findings reveal a stronger relationship between social media post traffic, i.e. messages on Facebook Pages, and public opinion, i.e. phone surveys about the government’s approval rating, in contrast to the relationship between online news media and public opinion. Indeed while the social media factor is considered in the equation, online news media could not contribute much to the model in predicting public opinion. This result strongly suggests that social media are playing an even more important role than online news media (mainly traditional media), in distributing news messages and shaping the formation of public opinion, particularly during a period of public controversy.

This result should not be too surprising: we observed a similar pattern in the characteristics of online public opinion in our previous research report and suggested that the government pay more attention to the emergence of social media (Fu & Chau, 2011). With stronger evidence and argument now available, we reiterate our previous position and make the first policy recommendation.

Policy Recommendation 1 ‒ Gathering Online Public Opinion as a Formal Process of Public Consultation

Our findings consistently suggest that a better understanding of public opinion and social media would help policy makers and the public keep track of the online discussion of various social and political topics, as well as follow changes, especially short-term changes, in citizens’ sentiment toward public governance and social policy. Social media analysis is complementary to the current phone-survey mode of public opinion research. Online public opinion data collection systems should be supported by recurring financial and human resources. Technical and maintenance work within the system and routine data analysis can be partly outsourced to relevant commercial entities. We recommend that the core daily analysis of online public opinion be undertaken by existing public opinion research units within the government.

Cyberbalkanization and the Information Cocoon

Almost a decade ago, Harvard scholar Cass R. Sunstein described an online phenomenon in which “people sometimes go to extremes simply because they are consulting others who think as they do. The rise of blogs makes it all the easier for people to live in echo chambers of their own design. Indeed some bloggers, and many readers of blogs, live in information cocoons” (Sunstein, 2008, p. 94). The formation of information cocoons within an online networked communication system, also called cyberbalkanization (van Alstyne & Brynjolfsson, 1996), is known to lead to opinion polarization in the public domain, which is unhealthy for a deliberative democratic system (Sunstein, 2008). Sociologist Zygmunt Bauman said in an interview about social media, “Social media don’t teach us to dialogue because it is so easy to avoid controversy… But most people use social media not to unite, not to open their horizons wider, but on the contrary, to cut themselves a comfort zone where the only sounds they hear are the echoes of their own voice, where the only things they see are the reflections of their own face. Social media are very useful, they provide pleasure, but they are a trap.”4

Our findings support the view that Hong Kong online public opinion, which closely reflects real-life public opinion, is remarkably polarized. Several relatively isolated online political factions are observed, and they persist throughout the whole term of Chief Executive CY Leung’s administration. These are the online communities shown in the data visualization at the beginning of this report; they reflect real-life political factions across the political spectrum, ranging from pro-establishment and pan-democrat to localist.

Again, the result is not entirely surprising. As the warnings of Sunstein and Bauman suggest, the current polarization can be damaging to Hong Kong society as well as to any future development of its political system and democracy. We have to admit that there is no single remedy and no quick fix for this problem, and that more work should be done to bridge the gap between the polarized groups. We make the following policy recommendations:

Policy Recommendation 2 ‒ Promoting Online Policy Deliberation

As observed in social media, a large group of self-expressive Hong Kong citizens, including many younger people, are demanding that the government deliver a high quality of public governance and transparency. This trend results in an increasing citizen demand for public deliberation, a process through which social and policy issues are debated and discussed and consent achieved between governments, social stakeholders, and interest groups, usually through an established mechanism with a set of defined procedures, and finally determined by a trusted mechanism or public body. Various online democratic models are being experimented with, for example Google Votes (Hardt & Lopes, 2015). We understand that the goal of such online deliberation cannot be achieved overnight and should be supported by a change in citizens’ values, stronger media literacy and democratic literacy in society, and an open social policy and transparent information policy on the part of the government.

4 INTERVIEW: Zygmunt Bauman: “Social media are a trap” http://elpais.com/elpais/2016/01/19/inenglish/1453208692_424660.html

Policy Recommendation 3 ‒ Cross-departmental Online Public Engagement Policy

Our findings regarding the Legislative Council show that, regardless of political faction, social media engagement can to some extent translate into opinion toward candidates and even their actual political support (actual votes). Indeed, in recent years Hong Kong citizens’ new media engagement in politics has profoundly challenged the government’s existing public engagement system, which has been repeatedly found to be unsuccessful in consolidating and satisfying public demand. As of this writing, the 2017 Hong Kong-Palace Museum “deal” exemplifies the government’s poor strategy in public engagement. As stated in the Chief Executive’s Policy Address almost a decade ago, the government has long recognized the importance of developing online public engagement as a policy direction (Chief Executive of Hong Kong, 2008), but so far a well-established, cross-departmental policy commitment remains nonexistent. If the government continues to fail to establish institutional procedures and arrangements to address growing public demand for public engagement, both online and offline, it will certainly undermine the effectiveness of public governance and create additional public disengagement and cynicism (Dahlgren, 2009).

Policy Recommendation 4 ‒ Open Government via Social Media

Although social media are known as powerful tools for disseminating information rapidly over a network of online users and across platforms, the government does not seem to have established a cross-departmental practice of using social media for public communication purposes. Such a practice would be especially crucial for the Hong Kong Government when undertaking damage control in response to widespread negative public opinion in society. Currently, Facebook, Twitter, and Sina Weibo accounts are used by some government departments. Some official announcements and senior officials’ articles are regularly posted on these platforms and updated. Nevertheless, most of these accounts offer solely one-way information dissemination, and the Chief Executive’s Facebook account is poorly managed. This approach to engaging the online public is usually implemented on an ad hoc basis and is subject to each individual department’s preference. The process by which the collective public view is channeled into policy decision making is not sufficiently transparent. The public already has low trust in the government and even lower political efficacy regarding the government’s public consultation process. The government should investigate how to use the Internet and social media to facilitate open data access and immediate official response.

Policy Recommendation 5 ‒ Long-Term Commitment to Democracy

While the relationship between media and politics depends on context, the most fundamental and systemic problem is the immature development of the political system, a pressing agenda item that the forthcoming administration should address. As stated in Article 45 of the Basic Law, the “ultimate aim” of political development in Hong Kong is that the Chief Executive and Legislative Councilors will be selected by universal suffrage. No matter how polarized views in society are and how difficult it may be to achieve consent, it is the Hong Kong Government’s constitutional duty to make a concerted effort to meet public demands for a democratic and open society. It is essential for the government to make a long-term commitment to promoting a democratic culture of public deliberation via transparency, dialogue with mutual respect, and a consent-making process. Moreover, the government must perform its basic duty to maintain and safeguard Hong Kong citizens’ freedom of the press, freedom of expression and freedom of information (both offline and online) and make no law that infringes upon those basic freedoms.

Research and Development

Policy Recommendation 6 ‒ Supporting E-government and Internet Social Research

Besides online public opinion research, there is an urgent need to investigate the characteristics, value systems (for example political aspiration), processes, mechanisms and patterns of offline and online public or political engagement and participation among Hong Kong citizens, as well as the interaction between their offline and online media use, focusing especially on the younger generation and on politically disengaged and disadvantaged individuals. These research questions are essential for the development of citizen-centric public engagement policy. The research can serve as part of evidence-based policy making by clarifying the characteristics of new media use and political participation among Hong Kong citizens, and it can become a reference for future long-term and cross-cultural comparative studies. Studies should be designed to be relevant to all stakeholders: academics, policy makers and political practitioners.

References

Boulianne, S. (2009). Does Internet Use Affect Engagement? A Meta-Analysis of Research. Political Communication, 26(2), 193-211.
Box, GEP, Jenkins, GM, & Reinsel, GC. (1976). Time series analysis: Forecasting and Control (Revised ed.). San Francisco: Holden-Day.
Census and Statistics Department. (2012). Thematic Household Survey (Report No. 49, Use of new media). Hong Kong: Census and Statistics Department.
Ceron, A, Curini, L, Iacus, SM, & Porro, G. (2013). Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France. New Media & Society.
Chan, C. H., & Fu, K. W. (2015, June). Predicting Political Polarization from Cyberbalkanization: Time series analysis of Facebook pages and Opinion Poll during the Hong Kong Occupy Movement. In Proceedings of the ACM Web Science Conference (p. 36).
Cheung, CK, Chan, WT, & Leung, KW. (2000). Roles of civic duty, political responsiveness, and mass media for voter turnout in Hong Kong. International Journal of Public Opinion Research, 12(2), 199-207.
Coleman, S, & Gotze, J. (2001). Bowling Together: Online Public Engagement in Policy Deliberation. London: Hansard Society.
Dahlgren, P. (2007). Introduction: youth, civic engagement and learning via new media. In Peter Dahlgren (Ed.), Young citizens and new media: learning for democratic participation (pp. 1-18). New York: Routledge.
Debatin, B. (2008). The Internet as a New Platform for Expressing Opinions and as a New Public Sphere. In Wolfgang Donsbach & Michael W. Traugott (Eds.), The SAGE handbook of public opinion research (pp. 64-72). London: SAGE Publications.
Deng, H, & Runger, G. (2013). Gene Selection With Guided Regularized Random Forest. https://arxiv.org/pdf/1209.6425.pdf
DiMaggio, P, Hargittai, E, Neuman, WR, & Robinson, JP. (2001). Social implications of the Internet. Annual Review of Sociology, 27, 307-336.
Fu, KW, & Chan, CH. (2013). Analyzing Online Sentiment to Predict Telephone Poll Results. Cyberpsychol Behav Soc Netw, 16(9), 702-707.
Fu, KW, & Chau, M. (2011). Understanding and Analyzing Online Public Opinion in Hong Kong Cyberspace. Hong Kong: Central Policy Unit, Hong Kong SAR Government.
Fu, K.W., Wong, P. W. C., Law, Y. W., & Yip, P. S. F. (2016). Building a typology of young people’s conventional and online political participation: A randomized mobile phone survey in Hong Kong, China. Journal of Information Technology & Politics, 13(2), 126-141. doi:10.1080/19331681.2016.1158138
Gibson, R, Lusoli, W, & Ward, S. (2005). Online Participation in the UK: Testing a 'Contextualised' Model of Internet Effects. The British Journal of Politics & International Relations, 7(4), 561-583.
Guo, Z. (2000). Media use habits, audience expectations and media effects in Hong Kong's first legislative council election. International Communication Gazette, 62(2), 133.
Habermas, J. (1989). The structural transformation of the public sphere: an inquiry into a category of Bourgeois society. Cambridge: Polity Press.
Hardt, S., & Lopes, L. C. (2015). Google Votes: A Liquid Democracy Experiment on a Corporate Social Network.
Hayek, F. (1984). The use of knowledge in society. In C. Nishiyama & K. Leube (Eds.), The essence of Hayek. Stanford: Hoover.
Hindman, MS. (2009). The myth of digital democracy. Retrieved from http://press.princeton.edu/chapters/s8781.pdf
International Telecommunication Union. (2009). Measuring the Information Society - The ICT Development Index. Geneva, Switzerland: International Telecommunication Union.
Katz, E. (1957). The two-step flow of communication: An up-to-date report on a hypothesis. Public Opinion Quarterly, 21(1), 61-78.
Kempf, AM, & Remington, PL. (2007). New challenges for telephone survey research in the twenty-first century. Annual Review of Public Health, 28, 113-126.
Kepplinger, HM. (2008). Effects of the News Media on Public Opinion. In Wolfgang Donsbach & Michael W. Traugott (Eds.), The SAGE handbook of public opinion research (1st ed., pp. 192-204). London: SAGE Publications.
Kreuter, F. (2009). Survey Methodology: International Developments. Retrieved from http://ssrn.com/abstract=1447883
Lavrakas, PJ. (2008). Surveys by Telephone. In Wolfgang Donsbach & Michael W. Traugott (Eds.), The SAGE handbook of public opinion research (1st ed., pp. 249-262). London: SAGE Publications.
Lee, FLF. (2006). Collective Efficacy, Support for Democratization, and Political Participation in Hong Kong. International Journal of Public Opinion Research, 18(3), 21p.
Lee, FLF. (2010). The Perceptual Bases of Collective Efficacy and Protest Participation: The Case of Pro-Democracy Protests in Hong Kong. International Journal of Public Opinion Research, 22(3), 20p.
Lee, FLF, & Chan, JM. (2009). Making Sense of Political Transition: Political Communication Research in Hong Kong. In Lars Willnat & Annette Aw (Eds.), Political communication in Asia (pp. 9-42). New York: Routledge.
Livingstone, S, & Markham, T. (2008). The contribution of media consumption to civic participation. British Journal of Sociology, 59(2), 351-371.
Marchetti-Bowick, M, & Chambers, N. (2012). Learning for microblogs with distant supervision: political forecasting with Twitter. Paper presented at the Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France.
Mejova, Y, Srinivasan, P, & Boynton, B. (2013). GOP primary season on twitter: "popular" political sentiment in social media. Paper presented at the Proceedings of the sixth ACM international conference on Web search and data mining, Rome, Italy.
Morozov, E. (2011). The net delusion: The dark side of internet freedom. New York: Public Affairs.
Norris, P. (1999). Critical citizens: global support for democratic government. Oxford: Oxford University Press.
Norris, P. (2000). A virtuous circle: political communications in postindustrial societies. Cambridge: Cambridge University Press.
Norris, P. (2001). Digital divide: civic engagement, information poverty, and the Internet worldwide. Cambridge: Cambridge University Press.
Norris, P. (2011). Democratic Deficit: Critical Citizens Revisited. Cambridge, New York: Cambridge University Press.
Organisation for Economic Co-operation and Development. (2007). Participative web and user-created content: Web 2.0, wikis and social networking. Retrieved from www.sourceoecd.org/scienceIT/9789264037465
Ronfeldt, D. (1992). Cyberocracy is coming. The Information Society, 8(4), 243-296.
Shen, F, Wang, N, Guo, ZS, & Guo, L. (2009). Online network size, efficacy, and opinion expression: Assessing the Impacts of Internet use in China. International Journal of Public Opinion Research, 21(4), 451-476.
Smith, A, Schlozman, KL, Verba, S, & Brady, H. (2009). The Internet and Civic Engagement. Washington, D.C.: Pew Internet & American Life Project.
Sparks, C. (2001). The Internet and the global public sphere. In W. Lance Bennett & Robert M. Entman (Eds.), Mediated politics: Communication in the future of democracy (pp. 75-98).
Sunstein, C. R. (2008). Neither Hayek nor Habermas. Public Choice, 134, 87–95.
Van Alstyne, M., & Brynjolfsson, E. (1996). Electronic Communities: Global Villages or Cyberbalkanization? (Best Theme Paper). ICIS 1996 Proceedings, 5.
Wang, S. (2007). Political use of the internet, political attitudes and political participation. Asian Journal of Communication, 17(4), 381-395.
Yip, P, Wong, P, Law, F, & Fu, K. (2011). A Study on Understanding our Young Generation. In HKSAR government Central Policy Unit (Series Ed.). Retrieved from http://www.cpu.gov.hk/english/research_reports.htm
Zhang, W, & Chia, SC. (2006). The Effects of Mass Media Use and Social Capital on Civic and Political Participation. Communication Studies, 57(3), 21p.
