Big data and medical research in China
BMJ: first published as 10.1136/bmj.j5910 on 5 February 2018
MEDICAL RESEARCH IN CHINA

Luxia Zhang and colleagues discuss the development of big data in Chinese healthcare and the opportunities for its use in medical research

BMJ 2018;360:j5910 | doi: 10.1136/bmj.j5910

The quantity of data that is routinely generated and collected has increased greatly in the past decade, as has our ability to analyse and interpret these data, particularly in medicine. China’s large population and universal healthcare system provide rich sources of data, and interest in the application of big data to medicine has grown in the past few years. It is hoped that the combined use of large data resources and new technologies will solve many existing medical problems and provide better evidence for decision making.1

KEY MESSAGES
• The application of big data to health and medicine is a national priority for China
• Several initiatives to promote big data have been started by the government and researchers
• The use of big data and new data technologies has the potential to improve medical research and the understanding of health and disease

What do we mean by big data?

Big data has been defined as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”2

Digital healthcare data are now common. Large volumes of medical data are generated through medical records, regulatory requirements, and medical research.3 Worldwide, the volume of data is projected to double every two years, which will result in 50 times more data in 2020 than in 2011.4

In addition to data volume,5 variety and velocity are also important for usability, together comprising the 3Vs of big data. The variety comes from the multiple sources of data (box 1), both structured and unstructured, which reflect the whole health and disease process.

Box 1: Sources of medical big data
• Administrative and claims data
• Routine population statistics and major disease surveillance data
• Real world data, such as electronic medical records, medical imaging, and data from health examinations
• Research data, including biomarkers, and multiomic information from clinical trials or cohort studies
• Registries (eg, of devices, procedures, and diseases)
• Data from mobile medical devices
• Data reported by patients

Medical data are also being combined with information from social media, occupational information, geographical location, and economic and environmental data.6 Integrating all these information sources into datasets that can be analysed is key to utilising big data. In addition, the speed at which big data are generated and processed should meet the real time demands of preventing and managing disease.

Recently, veracity has been added as a goal of big data,7 although some argue that big data are difficult to validate and can never be completely accurate.5 8 Nonetheless, to make the best use of big data, quality is important.

An important concept of big data is that assembly of the data is not the purpose. Instead, data must be analysed, interpreted, and acted on. Therefore, to get the best value from big data, new technologies and analytical methods (eg, machine learning) are needed and the information generated must be evaluated for clinical effectiveness and translated into tools for use in clinical practice.9

What data are gathered in China and how?

Promoting the use of big data in medicine is a national priority in China. In June 2016, the State Council of China issued an official notice on the development and use of big data in the healthcare sector.10 The council acknowledged that big data in health and medicine were a strategic national resource and their development could improve healthcare in China, and it set out programmatic development goals, key tasks, and an organisational framework.

After regional health data centres were established in Shanghai and Ningbo, the National Health and Family Planning Commission announced in 2016 that China would establish more regional and national centres and industrial parks that focused on big data in health and medicine as part of a national pilot programme to make more meaningful use of these data.11 Four cities in Fujian and Jiangsu provinces in eastern China were chosen as the pilot sites, and the centres are now in construction. The goal is to integrate the following datasets:
• Regional health data, including claims data from nationally funded basic health insurance that covers over 95% of the Chinese population12
• Administrative data from local health offices
• Data from public health services of the Chinese Center for Disease Control and Prevention, especially for women and children, and for surveillance networks of the main non-communicable diseases
• Birth and death registries
• Electronic medical records from hospitals, including primary, secondary, and tertiary hospitals.

China is already making use of big data. The country’s personal identification system could be used to link data from various sources. Medical claims data from the national social insurance system have been used to generate a 5% sampling database and an overall database covering over 0.6 billion beneficiaries in the past five years, which are available to scientific researchers. Applications to use these data are managed by organisations such as the Chinese Health Insurance Research Association; there is no public access.

Since 2016, many academic research projects using these national datasets have been approved to evaluate the current and future clinical and economic burden of chronic diseases such as cardiovascular disease, diabetes, kidney disease, and chronic obstructive pulmonary disease. Furthermore, other national administrative databases, including the national standardised discharge summary of inpatients and the national death registry, with hundreds of millions of patient records, have been used by medical and public health researchers.13 14

China is also focusing on personalising medicine. Since 2016, the Ministry of Science and Technology has initiated and funded many “precision medicine” projects under the national key research and development programme. A centralised and integrated data platform for precision medicine is being developed, which will store all patient/population data as well as biosamples collected from a series of large cohort studies and from biobanks. The platform is expected to include at least 0.7 million participants, 0.4 million from the general population and 0.3 million from patients with major non-communicable diseases. China’s large population base and centralised governance mean that very large sample sizes can be reached, which is of great value to personalised medicine initiatives.

As well as the government-led projects, Chinese academic medical societies are leading data-sharing initiatives (box 2). In October 2017, the School of Public Health at Peking University announced the launch of the China Cohort Consortium (chinacohort.bjmu.edu.cn/home). Currently 20 cohorts with more than 2 million participants are included. The activities of the consortium include using common data models for data harmonisation, performing individual participant data meta-analyses, and generating new cohorts. Furthermore, disease based data sharing platforms,

What are the challenges and what needs to be done?

Electronic record systems

Electronic medical records, whether collected by one organisation or for individual patients across organisations, are not commonly used for research in China. They are primarily used for clinical practice and largely contain unstructured data. Although over 90% of hospitals in China use electronic records, accessibility to and quality of the data are not optimal.

Adoption of individual electronic health records has been impeded by incompatibility between different hospital systems. China has over 300 commercial providers of hospital information systems with various technical structures and data standards. Furthermore, healthcare systems are not required to exchange data with each other. Some regions are planning to establish regional electronic health records but most are in preliminary stages. To overcome these problems, the interoperability of electronic

Current medical practice patterns

Medical practice patterns and the infrastructure of health systems in China also impede the meaningful use of big data. The lack of an established referral system and the heterogeneity in the quality of healthcare contribute to “medical migration,” when patients travel to different provinces and cities to seek medical care. In the current Chinese medical system, it is almost impossible to track a patient through electronic record systems for clinical purposes as there is no unified national platform that can consolidate all the data from all healthcare institutions in China. The main barrier to conducting a “deep patient” study,16 where machine learning is used to predict future adverse events using medical data, is obtaining the longitudinal data and outcomes of each patient from electronic records. Furthermore, the wide differences in medical practice raise concerns about the veracity of data.