History & Philosophy of Medicine
doi: 10.12032/HPM20210404031

Analysis of microblog public opinion characteristics on traditional Chinese medicine against COVID-19 based on deep learning

Shi-Pian Li1, Xue-Meng Cai1, Cheng Chen1, Ze-Lin Wei2, Wen-Zong Zhang3, Dai-Le Zhang1, Yong-Ming Guo1, Xin-Ju Li1*

1Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China. 2Central China Normal University, Wuhan 430077, China. 3Beijing University of Technology, Beijing 100081, China.

*Corresponding to: Xin-Ju Li. Tianjin University of Traditional Chinese Medicine, No. 10, Poyanghu Road, West Area, Tuanbo New Town, Jinghai District, Tianjin 301617, China. E-mail: [email protected].

Abstract
This study examined public opinion on traditional Chinese medicine on microblog, a social network, during the coronavirus disease 2019 (COVID-19) pandemic. The research background was the nationwide fight against COVID-19, the research topic was the strength of traditional Chinese medicine during the pandemic, and the research object was public opinion. The timeline was divided into three stages according to the overall change in heat. To explore and compare people’s emotions and topics of concern regarding traditional Chinese medicine across the stages of the pandemic, deep learning analysis methods such as emotional analysis and Latent Dirichlet Allocation analysis were used. This study found that the positive share of the public’s “emotional composition” regarding traditional Chinese medicine improved significantly over the timeline, while the public’s autonomy was enhanced and overall public opinion began to show an increasing trend.

Keywords: Deep learning, COVID-19, Public opinion analysis, Traditional Chinese medicine

Competing interests: The authors declare no conflicts of interest.

Acknowledgments: The authors did not receive any funding for this study.
Abbreviations: COVID-19, coronavirus disease 2019; TCM, traditional Chinese medicine; LSTM, long short-term memory network; LDA, latent Dirichlet allocation.

Citation: Li SP, Cai XM, Chen C, et al. Analysis of microblog public opinion characteristics on traditional Chinese medicine against COVID-19 based on deep learning. Hist Philos Med. 2021;3(2):5. doi: 10.12032/HPM20210404031.

Executive editor: Shan-Shan Lin.

Submitted: 16 March 2021, Accepted: 04 April 2021, Online: 16 April 2021.

© 2021 By Authors. Published by TMR Publishing Group Limited. This is an open access article under the CC-BY license (http://creativecommons.org/licenses/BY/4.0/).

Submit a manuscript: https://www.tmrjournals.com/hpm

REVIEW

Background

During the coronavirus disease 2019 (COVID-19) pandemic, people responded to calls to refrain from going outdoors, and the time spent on the internet increased greatly. By December 2020, the internet penetration rate in China had reached 70.4%, and the number of internet users was approximately 989 million [1]. At the peak of the pandemic, average internet use reached 30.8 hours per week, significantly higher than in other periods. During this time, people followed the progress of front-line pandemic work. For example, the construction of the Wuhan Huoshenshan Hospital and the Wuhan Leishenshan Hospital became hot topics online. Internet public opinion on public health also reached an unprecedented scale. The decentralized trend of online social communication has contributed to the actual degree of freedom of public opinion. In particular, the internet has become a new channel for expressing opinion, which in turn has enabled research on public opinion about epidemic-related content.

Because of a lack of knowledge of traditional Chinese culture, the public has low awareness of traditional Chinese medicine (TCM) and easily misinterprets it. Since the 13th Five-Year Plan, the Communist Party of China Central Committee with Comrade Xi Jinping has attached great importance to the development of TCM. According to the opinion of the Communist Party of China Central Committee and the State Council on promoting the inheritance, innovation, and development of TCM, researchers should promote the benefits of TCM culture through the media, strengthen and standardize the dissemination and popularization of TCM knowledge about the prevention and treatment of diseases, and create a social atmosphere in which people cherish, love, and support TCM [2]. During the fight against COVID-19, there were approximately 5,000 Chinese medical staff. For patients with COVID-19, the utilization rate of Chinese medicines was over 92% [3], and the effective rate among the confirmed cases in Hubei was over 90%. TCM has played an irreplaceable role in the fight against this pandemic. Research on internet public opinion about TCM can complement the existing research systems on public opinion, support policy implementation, and provide a reference for TCM as essential to improving health during the pandemic.

Methods

Literature research method
In the 18th century, Rousseau put forward the concept of “public opinion”. Domestic public opinion research started relatively late. The academic community regards the protest against the NATO bombing incident on the People’s Network Forum in 1999 as the point at which public opinion effectively entered Chinese society. We searched the general database of HowNet by year of publication with “舆情 (public opinion)” as the keyword (Figure 1). The rise of public opinion research is only about ten years old. Nowadays, public opinion research mainly focuses on public opinion monitoring, analysis, and guidance. Through the analysis of public opinion literature on HowNet, preliminary results were obtained.

Figure 1 Trends of HowNet public opinion literature

Data sources and engineering characteristics
In this paper, we crawled all 198,928 text records containing the keyword “中医药 (TCM)” from microblog. First, we crawled the microblog data using the Scrapy distributed crawler framework and configured a proxy to get around microblog’s anti-crawling mechanism. Then, we preprocessed the data, deleted the stop words (https://github.com/goto456/stopwords), applied the term frequency–inverse document frequency (TF-IDF) [4] algorithm to the text data, and finally obtained a matrix representation containing all the text information.

Text sentiment analysis
Text sentiment analysis is the process of analyzing, processing, summarizing, and reasoning about subjective text with emotional color [5]. This section uses the long short-term memory (LSTM) network from deep learning [6] for the emotional analysis of the text. The LSTM network is a variant of the recurrent neural network. A recurrent neural network can retain only short-term memory because of the vanishing gradient. The LSTM network combines short-term memory with long-term memory through subtle gate control and solves the vanishing gradient problem to a certain extent. At present, it performs well on problems in time series, natural language processing, and speech recognition.

In this paper, we chose the popular microblog comment emotion data set as the training set for the neural network. The data set had three types of tags: positive, negative, and neutral. In this model, the number of input layer nodes of the LSTM network was set to 50. Because the final prediction could be one of three types, the number of output layer nodes was 3. A dropout layer was added to prevent overfitting. By adjusting parameters and comparing the results, the number of nodes in the hidden layer was set to 16. In building the model, mean square error (MSE) was chosen as the loss function, tanh as the activation function in the hidden layer, and softmax as the activation function of the output layer. To find the best balance between memory efficiency and memory capacity, an RTX Titan graphics card was selected. Because of the low number of parameters, there was no need to consider the use of video memory, so batch training was selected. Training was stopped when the training loss fell below 1e-5, which happened after about four iterations. There were 14,795 training parameters in total, with a final training accuracy of 94%. The model was saved and output as the model weights. Finally, all the parameters were used to train the neural network, and the test data was fed into the final model.

Latent Dirichlet allocation
Latent Dirichlet allocation (LDA) is a topic analysis method that mines text topics using a probabilistic model [7]. Based on the maximum likelihood method and a generative model, LDA reduces high-dimensional text data to a lower-dimensional space. On this basis, with a prior distribution, it forms a naive Bayesian model of article, topic, and single word. Finally, LDA finds the semantic structure and mines articles by calculating their probability. Each text can be expressed as the probability distribution P(z) of a series of topics, and each topic is the probability distribution of a series of words. The probability of the words in each text is as follows.

$$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)$$

Here T is the number of topics and z_i is the topic assigned to word w_i.

In this paper, we used Gensim to build the topic analysis model and pyLDAvis to visualize it. Finally, we calculated the text perplexity (confusion degree) to determine the topic parameters and to evaluate the model. Perplexity reflects how uncertain the trained model is about which topics a document belongs to. The perplexity changes with the number of topics, and a lower value indicates a better document clustering effect.

Basis of stage division
According to the latest statistics of the China Internet Network Information Center, microblog is the third largest social networking platform, second only to WeChat Moments and Qzone. Because of its openness of information, it lends itself more readily to data mining. Therefore, microblog was used as the data source platform in this study. Baidu is a heavily used search engine, with nearly 200 million daily active users.
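The TF-IDF step described under "Data sources and engineering characteristics" can be illustrated with a minimal sketch. It assumes documents that are already tokenized and stripped of stop words, and it uses one common weighting, tf × log(N/df). The study does not say which TF-IDF variant or library was used, and the function name `tfidf_matrix` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocabulary, matrix) with one TF-IDF row per tokenized document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        row = [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix

# Hypothetical, already-tokenized posts (not data from the study).
docs = [
    ["tcm", "treatment", "effective"],
    ["tcm", "hospital", "wuhan"],
    ["hospital", "construction", "wuhan"],
]
vocab, matrix = tfidf_matrix(docs)
```

Each row of `matrix` corresponds to one post and each column to one vocabulary term, which is the kind of matrix representation the preprocessing step produces.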
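To make the loss function and stopping rule from the "Text sentiment analysis" section concrete, the sketch below computes a softmax over three class scores and the mean square error against a one-hot label, then checks the 1e-5 stopping threshold. The scores, the label order (positive, negative, neutral), and the helper names are hypothetical. This is not the study's training code.

```python
import math

STOP_THRESHOLD = 1e-5  # training stopped once the loss fell below this value

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mse(predicted, target):
    """Mean square error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Hypothetical output-layer scores for one post; assumed label order:
# positive, negative, neutral.
scores = [4.0, 0.5, 0.1]
one_hot_positive = [1.0, 0.0, 0.0]

probs = softmax(scores)
loss = mse(probs, one_hot_positive)
should_stop = loss < STOP_THRESHOLD
```

In practice the loss would be averaged over a batch of posts rather than computed for a single one, and the comparison with the threshold would be made after each training iteration.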
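The word-probability formula and the perplexity ("confusion degree") criterion from the "Latent Dirichlet allocation" section can be worked through on toy numbers. The two topics, their word distributions, and the document's topic mixture below are invented for illustration. Perplexity is computed with the standard definition, exp(-(1/N) * sum of log P(w_i)), which the study is assumed, not confirmed, to follow.

```python
import math

# Hypothetical P(w | z = j) for T = 2 topics over a three-word vocabulary.
topic_word = [
    {"tcm": 0.6, "hospital": 0.3, "wuhan": 0.1},  # topic 0
    {"tcm": 0.1, "hospital": 0.4, "wuhan": 0.5},  # topic 1
]
# Hypothetical P(z = j) for one document.
doc_topic = [0.75, 0.25]

def word_probability(word):
    """P(w) = sum over topics j of P(w | z = j) * P(z = j)."""
    return sum(pw[word] * pz for pw, pz in zip(topic_word, doc_topic))

def perplexity(words):
    """exp of the negative mean log-probability; lower means a better fit."""
    log_likelihood = sum(math.log(word_probability(w)) for w in words)
    return math.exp(-log_likelihood / len(words))

p_tcm = word_probability("tcm")  # 0.6 * 0.75 + 0.1 * 0.25
doc_perplexity = perplexity(["tcm", "tcm", "hospital"])
```

Choosing the number of topics then amounts to fitting models with different values of T and comparing their perplexity on the same texts.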