Medical Research in China

Big data and medical research in China

Luxia Zhang and colleagues discuss the development of big data in Chinese healthcare and the opportunities for its use in medical research

BMJ: first published as 10.1136/bmj.j5910 on 5 February 2018.

The quantity of data that is routinely generated and collected has increased greatly in the past decade, as has our ability to analyse and interpret these data, particularly in medicine. China’s large population and universal healthcare system provide rich sources of data, and interest in the application of big data to medicine has grown in the past few years. It is hoped that the combined use of large data resources and new technologies will solve many existing medical problems and provide better evidence for decision making.1

Key messages
• The application of big data to health and medicine is a national priority for China
• Several initiatives to promote big data have been started by the government and researchers
• The use of big data and new data technologies has the potential to improve medical research and the understanding of health and disease

What do we mean by big data?

Big data has been defined as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”2

Digital healthcare data are now common. Large volumes of medical data are generated through medical records, regulatory requirements, and medical research.3 Worldwide, the volume of data is projected to double every two years, which will result in 50 times more data in 2020 than in 2011.4

In addition to data volume,5 variety and velocity are also important for usability; together these comprise the 3Vs of big data. The variety comes from the multiple sources of data (box 1), both structured and unstructured, which reflect the whole health and disease process.

Box 1: Sources of medical big data
• Administrative and claims data
• Routine population statistics and major disease surveillance data
• Real world data, such as electronic medical records, medical imaging, and data from health examinations
• Research data, including biomarkers, and multiomic information from clinical trials or cohort studies
• Registries (eg, of devices, procedures, and diseases)
• Data from mobile medical devices
• Data reported by patients

Medical data are also being combined with information from social media, occupational information, geographical location, and economic and environmental data.6 Integrating all these information sources into datasets that can be analysed is key to utilising big data. In addition, the speed at which big data are generated and processed should meet the real time demands of preventing and managing disease.

Recently, veracity has been added as a goal of big data,7 although some argue that big data are difficult to validate and can never be completely accurate.5 8 Nonetheless, to make the best use of big data, quality is important.

An important concept of big data is that assembly of the data is not the purpose. Instead, data must be analysed, interpreted, and acted on. Therefore, to get the best value from big data, new technologies and analytical methods (eg, machine learning) are needed and the information generated must be evaluated for clinical effectiveness and translated into tools for use in clinical practice.9

What data are gathered in China and how?

Promoting the use of big data in medicine is a national priority in China. In June 2016, the State Council of China issued an official notice on the development and use of big data in the healthcare sector.10 The council acknowledged that big data in health and medicine were a strategic national resource and their development could improve healthcare, and it set out programmatic development goals, key tasks, and an organisational framework.

After regional health data centres were established in Shanghai and Ningbo, the National Health and Family Planning Commission announced in 2016 that China would establish more regional and national centres and industrial parks that focused on big data in health and medicine as part of a national pilot programme to make more meaningful use of these data.11 Four cities in Fujian and Jiangsu provinces in eastern China were chosen as the pilot sites, and the centres are now under construction. The goal is to integrate the following datasets:
• Regional health data, including claims data from nationally funded basic health insurance that covers over 95% of the Chinese population12
• Administrative data from local health offices
• Data from public health services of the Chinese Center for Disease Control and Prevention, especially for women and children, and for surveillance networks of the main non-communicable diseases
• Birth and death registries
• Electronic medical records from hospitals, including primary, secondary, and tertiary hospitals.

China is already making use of big data. The country’s personal identification system could be used to link data from various sources. Medical claims data from the national social insurance system have been used to generate a 5% sampling database and an overall database covering over 0.6 billion beneficiaries in the past five years, which are available to scientific researchers. Applications to use these data are managed by organisations such as the Chinese Research Association; there is no public access.

Since 2016, many academic research projects using these national datasets have been approved to evaluate the current and future clinical and economic burden of chronic diseases such as cardiovascular disease, diabetes, kidney disease, and chronic obstructive pulmonary disease. Furthermore, other national administrative databases, including the national standardised discharge summary of inpatients and the national death registry, with hundreds of millions of patient records, have been used by medical and public health researchers.13 14

China is also focusing on personalising medicine. Since 2016, the Ministry of Science and Technology has initiated and funded many “precision medicine” projects under the national key research and development programme. A centralised and integrated data platform for precision medicine is being developed, which will store all patient/population data as well as biosamples collected from a series of large cohort studies and from biobanks. The platform is expected to include at least 0.7 million participants, 0.4 million from the general population and 0.3 million from patients with major non-communicable diseases. China’s large population base and centralised governance mean that very large sample sizes can be reached, which is of great value to personalised medicine initiatives.

As well as the government-led projects, Chinese academic medical societies are leading data-sharing initiatives (box 2). In October 2017, the School of Public Health at Peking University announced the launch of the China Cohort Consortium (chinacohort.bjmu.edu.cn/home). Currently 20 cohorts with more than 2 million participants are included. The activities of the consortium include using common data models for data harmonisation, performing individual participant data meta-analyses, and generating new cohorts. Furthermore, disease based data sharing platforms, including for cardiovascular disease, stroke, cancer, and kidney disease, have been established by medical specialists with the support of the government. For example, the China Kidney Disease Network (kidney.net.cn), which launched in 2015, integrates various sources of data on kidney disease and uses new analytic techniques to provide evidence for healthcare policy, strengthen academic research, and promote effective disease management.15

Box 2: Current projects applying big data to medicine in China
Government led
• Development of regional health data centres with pilot programmes in four cities in Fujian and Jiangsu provinces (http://en.nhfpc.gov.cn/2016-10/24/c_70420.htm)
• Opening of existing national administrative, claims, death registry, and other databases for academic use
• Promotion of precision medicine by supporting cohort studies and integrated data platforms (http://www.most.gov.cn/tztg/201603/t20160308_124542.htm)
Researcher initiated
• China Cohort Consortium (http://chinacohort.bjmu.edu.cn/home)
• China Kidney Disease Network (kidney.net.cn)
• Others funded by the government include cardiovascular disease (eg, China Cardiovascular Surgery Registry), stroke (eg, Chinese National Stroke Registry), and cancer (eg, National Central Cancer Registry of China)

What are the challenges and what needs to be done?

Electronic record systems

Electronic medical records, whether collected by one organisation or for individual patients across organisations, are not commonly used for research in China. They are primarily used for clinical practice and largely contain unstructured data. Although over 90% of hospitals in China use electronic records, accessibility to and quality of the data are not optimal.

Adoption of individual electronic health records has been impeded by incompatibility between different hospital systems. China has over 300 commercial providers of hospital information systems with various technical structures and data standards. Furthermore, healthcare systems are not required to exchange data with each other. Some regions are planning to establish regional electronic health records but most are in preliminary stages. To overcome these problems, the interoperability of electronic records needs to be improved, especially for data structures, data standards, and data transfer agreements. Health authorities, hospitals, and electronic record companies must agree on how to improve hospital information systems. Technologies that can integrate data from different sources are also needed. In addition, the government should introduce policies to strengthen data exchange and integration across organisations.

Lack of medical terminology system

The lack of a widely adopted and consistently implemented medical terminology system is another problem for using big data in medical research. For example, since 2002, the use of the International Classification of Diseases (ICD-9, and more recently ICD-10) has been mandated by the National Health and Family Planning Commission for all hospital patients. However, the growth of hospital information systems has resulted in many variations in the coding of other clinical terms beyond diagnosis, making data exchange difficult. Widely accepted terminology systems, such as the Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT), the Unified Medical Language System (UMLS), or the General Architecture for Languages, Encyclopaedias and Nomenclatures in Medicine (GALEN), are not available in China. By integrating and distributing key terminology, classification, and coding standards in medicine, these systems promote more effective and interoperable biomedical information systems and services, including electronic health records. More effort is needed to resolve linguistic differences between Chinese and English beyond the existing translation of terms.

Current medical practice patterns

Medical practice patterns and the infrastructure of health systems in China also impede the meaningful use of big data. The lack of an established referral system and the heterogeneity in the quality of healthcare contribute to “medical migration,” when patients travel to different provinces and cities to seek medical care. In the current Chinese medical system, it is almost impossible to track a patient through electronic record systems for clinical purposes as there is no unified national platform that can consolidate all the data from all healthcare institutions in China. The main barrier to conducting a “deep patient” study,16 where machine learning is used to predict future adverse events using medical data, is obtaining the longitudinal data and outcomes of each patient from electronic records. Furthermore, the wide differences in medical practice raise concerns about the veracity of data.

Data quality

The problems described above affect the quality of big data. It has been shown that, when the quality of clinical data is higher, big data analytics produce more valid, stable, and clinically useful results.17 However, it is difficult to validate high volume datasets. One way of dealing with the data quality problem is to examine the characteristics of the database, judge which variables are likely to be relatively accurate (for example, expenditure from claims data), and answer questions based on those variables. Improving the veracity of data requires an ongoing and joint effort by multiple sectors to rigorously examine the validity, representativeness, and completeness of data.

Privacy concerns

Although privacy is an extremely important topic for big data in health and medicine, there is no specific law or guidance on this in China. Regulation from authorities and research standards about privacy protection are needed that do not jeopardise the completeness of data that can be used.

Opportunities to improve health

The use of big data in medicine includes public health promotion (disease monitoring and population management), healthcare management (quality control and performance measurement), drug and medical device surveillance, routine clinical practice (risk prediction, diagnosis accuracy, and decision support), and research.19

The existing mandatory national administrative databases in China produce big data that can easily be used to monitor trends in major diseases and provide evidence for policy making in healthcare. New data analytics, such as machine learning to replace much of the work of radiologists and anatomical pathologists, can also be used and are an active area of research in China.18 However, for applications that need detailed and high quality clinical information and long term follow-up, such as predicting long term outcomes and providing support for clinical decisions, the data systems in China need to be developed further.

In China, discussion on big data in medicine has focused on how to collect, store, integrate, and manage data and has been led by computer scientists and the health information industry. However, the future of big data in medicine is in using new analytic techniques such as machine learning to answer clinical questions, educating doctors and policy makers to understand big data, and promoting the use of tools generated by big data and big data technologies that support clinical decision making.

Conclusion

China’s national campaign to promote the application of big data in health and medicine is likely to change medical research, medical practice, and the development of the healthcare industry in the near future. Despite the great interest in big data, we advocate following Confucian doctrine to ensure that we obtain true value for medicine: that is, to learn extensively, inquire carefully, think deeply, discriminate clearly, and practise faithfully.

We thank Alan Leichtman (Arbor Research Collaborative for Health, and University of Michigan) and Roseanne Yeung (University of Alberta) for their constructive suggestions and editing. We also thank Fan Liu (former chief information officer of Peking University People’s Hospital and Peking University International Hospital) for comments on electronic record systems.

Contributors and sources: LZ is a renal epidemiologist and the executive deputy director of Peking University, Center for Data Science in Health and Medicine. HW is the founding director and key architect of several national medical databases in China. QL is a principal investigator focusing on artificial intelligence in health and medicine. MHZ is a nephrologist with substantial experience in experimental research and population based studies in China. QMZ is an academician of the Chinese Academy of Engineering and the chief scientist of the 973 National Fundamental Program in China. His main interest is the translational study of cancer. LZ and HW contributed equally to this work and are the guarantors. This article arose from discussions about the status and future directions of big data in health and medicine in China, and the relationship with traditional medical studies.

Competing interests: We have read and understood BMJ policy on declaration of interests and declare that the article was funded by the World Health Organization (WHO Reference 2014/435380-0), the National Key Technology R&D Program of the Ministry of Science and Technology (2016YFC1305400), and the University of Michigan Health System-Peking University Health Science Center Joint Institute for Translational and Clinical Research (BMU20140479).

Provenance and peer review: Commissioned; externally peer reviewed.

Luxia Zhang, professor1 2
Haibo Wang, researcher3 4
Quanzheng Li, associate professor5
Ming-Hui Zhao, professor1 6
Qi-Min Zhan, professor7

1 Renal Division, Department of Medicine, Peking University First Hospital, Peking University Institute of Nephrology, Beijing, China
2 Peking University, Center for Data Science in Health and Medicine, Beijing, China
3 Clinical Trial Unit, First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
4 China Standard Medical Information Research Center, Shenzhen, China
5 MGH & BWH Center for Clinical Data Science, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
6 Peking-Tsinghua Center for Life Sciences, Beijing, China
7 Peking University, Health Science Center, Beijing, China

Correspondence to: L Zhang [email protected]

1 Obermeyer Z, Emanuel EJ. Predicting the future: big data, machine learning, and clinical medicine. N Engl J Med 2016;375:1216-9. doi:10.1056/NEJMp1606181
2 Gartner. Big data. https://www.gartner.com/it-glossary/big-data/
3 Auffray C, Balling R, Barroso I, et al. Making sense of big data in health research: towards an EU action plan. Genome Med 2016;8:71. doi:10.1186/s13073-016-0323-y
4 Austin C, Kusumoto F. The application of big data in medicine: current implications and future directions. J Interv Card Electrophysiol 2016;47:51-9. doi:10.1007/s10840-016-0104-y
5 Baro E, Degoul S, Beuscart R, Chazard E. Toward a literature-driven definition of big data in healthcare. Biomed Res Int 2015;2015:639021. doi:10.1155/2015/639021
6 Fernández-Luque L, Bau T. Health and social media: perfect storm of information. Healthc Inform Res 2015;21:67-73. doi:10.4258/hir.2015.21.2.67
7 Kruse CS, Goswamy R, Raval Y, Marawi S. Challenges and opportunities of big data in health care: a systematic review. JMIR Med Inform 2016;4:e38. doi:10.2196/medinform.5359
8 Ward JC. Oncology reimbursement in the era of personalized medicine and big data. J Oncol Pract 2014;10:83-6. doi:10.1200/JOP.2014.001308
9 Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol 2016;13:350-9. doi:10.1038/nrcardio.2016.42
10 The State Council. The People’s Republic of China. China to boost big data application in health and medical sectors. 2016. http://english.gov.cn/policies/latest_releases/2016/06/24/content_281475379018156.htm
11 National Health and Family Planning Commission of the PRC. China to build health care big data centers, industrial parks. 2016. http://en.nhfpc.gov.cn/2016-10/24/c_70420.htm
12 Shan L, Wu Q, Liu C, et al. Perceived challenges to achieving universal health coverage: a cross-sectional survey of social health insurance managers/administrators in China. BMJ Open 2017;7:e014425. doi:10.1136/bmjopen-2016-014425
13 Zhang L, Long J, Jiang W, et al. Trends in chronic kidney disease in China. N Engl J Med 2016;375:905-6. doi:10.1056/NEJMc1602469
14 Zhou M, Wang H, Zhu J, et al. Cause-specific mortality for 240 causes in China during 1990-2013: a systematic subnational analysis for the Global Burden of Disease Study 2013. Lancet 2016;387:251-72. doi:10.1016/S0140-6736(15)00551-6
15 Zhang L, Wang H, Long J, et al. China Kidney Disease Network (CK-NET) 2014 annual data report. Am J Kidney Dis 2017;69(suppl 2):S1-S149. doi:10.1053/j.ajkd.2016.06.011
16 Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 2016;6:26094. doi:10.1038/srep26094
17 Altman RB, Ashley EA. Using “big data” to dissect clinical heterogeneity. Circulation 2015;131:232-3. doi:10.1161/CIRCULATIONAHA.114.014106
18 The State Council. The People’s Republic of China. China issues guideline on artificial intelligence development. 2017. http://english.gov.cn/policies/latest_releases/2017/07/20/content_281475742458322.htm

Cite this as: BMJ 2018;360:j5910
http://dx.doi.org/10.1136/bmj.j5910

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
