Computer Methods and Programs in Biomedicine 127 (2016) 44–51
Journal homepage: www.intl.elsevierhealth.com/journals/cmpb
Cancer-disease associations: A visualization and
animation through medical big data
Usman Iqbal (a,b), Chun-Kung Hsu (a,b,1), Phung Anh (Alex) Nguyen (a,b), Daniel Livius Clinciu (a,b), Richard Lu (a,b), Shabbir Syed-Abdul (a,b), Hsuan-Chia Yang (a,b,c), Yao-Chin Wang (a), Chu-Ya Huang (a,1), Chih-Wei Huang (a,b), Yo-Cheng Chang (a), Min-Huei Hsu (a,b,d), Wen-Shan Jian (a,b,e,f), Yu-Chuan (Jack) Li (a,b,g,∗)

a Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taiwan
b International Center for Health Information Technology (ICHIT), Taipei Medical University, Taiwan
c Institute of Biomedical Informatics, National Yang Ming University, Taiwan
d Bureau of International Cooperation, Ministry of Health and Welfare, Taipei, Taiwan
e School of Health Care Administration, Taipei Medical University, Taipei, Taiwan
f Faculty of Health Sciences, Macau University of Science and Technology, Macau, China
g Department of Dermatology, Taipei Medical University – Wan Fang Hospital, Taipei, Taiwan
Article info

Article history: Received 1 December 2015; Received in revised form 6 January 2016; Accepted 11 January 2016

Keywords: Visual analytics; Disease visualization; Big data visualization; Cancer disease visualization; Cancer comorbidities visualization

Abstract

Objective: Cancer is the primary disease responsible for death and disability worldwide. Currently, prevention and early detection represent the best hope for cure. Knowing in advance which diseases are expected to occur with a particular cancer could help physicians better tailor their treatment for cancer. The aim of this study was to build an animated visualization tool called Cancer Associations Map Animation (CAMA) to chart the association of cancers with other diseases over time.

Methods: The study population was collected from the Taiwan National Health Insurance Database during the period January 2000 to December 2002. A total of 782 million outpatient visits were used to compute the associations of nine major cancers with other diseases. A motion chart was used to quantify and visualize the associations between diseases and cancers.

Results: The CAMA motion chart that was built successfully facilitated the observation of cancer-disease associations across ages and genders. The CAMA system can be accessed online at http://203.71.86.98/web/runq16.html.

Conclusion: The CAMA animation system is an animated medical data visualization tool which provides a dynamic, time-lapse view of cancer-disease associations across different age groups and genders. Derived from a large, nationwide healthcare dataset, this exploratory data analysis tool can detect cancer comorbidities earlier than is possible by manual inspection. Taking into account the trajectory of cancer-specific comorbidity development may help clinicians and healthcare researchers to explore early-stage hypotheses more efficiently, develop new cancer treatment approaches, and identify potential effect modifiers or new risk factors associated with specific cancers.

© 2016 Published by Elsevier Ireland Ltd.

∗ Corresponding author at: College of Medical Science and Technology, Taipei Medical University, Chair Dermatology Department, Wan-Fang Hospital, No. 250, Wu-Hsing Street, Taipei, Taiwan. Tel.: +886 2 27361661x7601; fax: +886 2 6638 7537.
E-mail address: [email protected] (Y.-C. Li).
1 Equal contribution with second author.
http://dx.doi.org/10.1016/j.cmpb.2016.01.009
0169-2607/© 2016 Published by Elsevier Ireland Ltd.
1. Introduction

As advancements in medicine and healthcare contribute to longer life expectancies and substantial growth of the aged population [1–4], it is becoming increasingly important to predict cancer and its concomitant diseases early in the disease process. This is because early prediction and detection are considered the key to initiating preventive and curative procedures [5,6]. Over the past two decades, research has amassed much information and knowledge on gene expression profiles [7–9], protein interactions [10–13] and genome-wide associations (GWAS) [14,15], which has contributed to a better understanding of the pathophysiological mechanisms in hereditary diseases, cancers and neurological disorders. Nevertheless, a comprehensive understanding of the various mechanisms underlying most diseases is still in its infancy.

A shift from traditional methods of phenome-wide association (PWAS) studies is occurring. Current visualization methodologies rely on case reports and sampled cohort studies for investigating one-to-one disease relationships [16]. The increasing availability of large amounts of data in many national healthcare systems has prompted informaticians to begin analyzing multiple disease associations simultaneously [17–19]. Furthermore, rapid adoption of electronic health records has enabled the accumulation of significant amounts of patient-level clinical data by many healthcare providers [20,21].

One of the prime targets for large-scale data visualization techniques is cancer, because it is responsible for such a large burden of disease worldwide. Each year, tens of millions of people are diagnosed with cancer around the world, and more than half of the patients eventually die from it. Cancer prevention is an essential component of all cancer control plans because about 40% of all cancer deaths can be prevented [22]. For example, Larsson et al. observed that type 2 diabetes, a preventable disease, is associated with an approximately 20% higher risk of breast cancer [23]. The underlying mechanism may be related to alterations in circulating concentrations of insulin, insulin-like growth factors (IGFs), and endogenous sex hormones [24]. Insulin inhibits the production of sex hormone-binding globulin (SHBG) [25], which results in an increase in free steroid hormones (free estrogens in particular) because testosterone successfully competes with estrogen for SHBG [26]. Insulin is also a growth-promoting hormone with mitogenic effects in both normal and malignant breast tissue [27]. Insulin suppresses IGF binding protein-1 and thus increases the bioavailability of IGF-1 [28]. Furthermore, studies have also reported that hyperinsulinemia with insulin resistance may increase the risk of breast cancer [29,30]. Thus, early awareness of the signs and symptoms of comorbidities that are involved in the tumor development process is important for prevention and early intervention [31].

The Cancer Associations Map Animation (CAMA) project was conceived of and built by the College of Medical Science and Technology, Taipei Medical University, Taiwan. It is an applied data visualization tool designed with an intuitive interface that is intended for use by real-world clinicians and health researchers who do not have computer programming skills or a strong statistics background. Using CAMA, anyone can verify or unearth new associations between cancers and other diseases based on a large medical dataset. Gathering the large volumes of medical data required for the CAMA project was aided in particular by Taiwan’s highly accessible universal healthcare system [32] and high rates of annual doctor visits per individual (15 visits per year).

2. Materials and methods

This section provides details of the CAMA system, beginning from the process of data gathering to the development of the visualization tool. The study consisted of two primary steps:

I. Establishment of the Disease–Disease Association (DDA) knowledge database.
II. Visualization approach.

2.1. Establishment of the disease–disease association (DDA) knowledge database

2.1.1. Data source

In this study, we used Taiwan’s National Health Insurance (NHI) claims database, a database that covered 99% of Taiwan’s population by 2013 [33]. The claims data included outpatient visits, dental visits, hospitalizations, medications prescribed, medications refilled, laboratory and imaging examinations, and procedure codes for all of Taiwan’s 23 million people each year. The National Health Insurance Research Database (NHIRDB) has become an important “Big Data” research data source that has provided an ideal opportunity to conduct clinical research [33,34]. Moreover, because Taiwan’s National Health Insurance Bureau cross-checks all diagnostic codes and medication codes before reimbursement, the codes are known for their exceptional fidelity.

2.1.2. Data gathering and cleaning

Data from Taiwan’s NHI database from January 1, 2000 to December 31, 2002 were eligible for inclusion in this study.
During this timeframe approximately 787.86 million outpatient visit claims were recorded in the database. Each record in the database consisted of a patient’s visit date, the patient’s deidentified national ID, age, gender, and one to three ICD9-CM diagnosis codes. We excluded all records with a miscoded diagnosis, errors regarding the date of birth, or missing or unknown gender. In total, 5.88 million (0.75%) records were excluded, while the remaining 782 million visit claims were valid records. There were 9.73 million unique males and 10.29 million unique females in the database.

2.1.3. Definition of disease–disease (DD) “association”

We assumed that two diseases were associated if they occurred at least once in the same person over 36 consecutive months (3 years). In this study, a three-year window (i.e. January 2000 to December 2002) was used to observe and compute the associations among cancers and other diseases. All patient records were stratified by gender and age. Therefore, the dataset contained 200 subsets (i.e. 2 genders with 100 age groups). Each subset was used separately to quantify the strength of the disease–disease associations.

Each patient was followed for 36 months and all of the unique three-digit ICD9-CM codes were obtained for each patient (e.g., ICD9-CM code 250 and its children were used to define diabetes). All possible pairwise combinations of these ICD9-CM codes were then calculated. Disease–disease associations were defined as the number of times that two diseases co-occurred within a one year window. Q values were calculated for each pair in order to measure the association strength [35]. If no association was found between a DD pair, Q would be equal to 1. Positive DD associations would give Q values greater than 1, and negative DD associations would give Q values less than 1. However, DD associations with fewer than 5 co-occurrences, or with fewer than one co-occurrence per hundred thousand of the age-specific population [36], were by default considered “rare associations” and were not included in this study.

2.1.4. Creating a disease–disease association (DDA) knowledge database

After computing all DD associations for the 200 subsets, we stored these associations in a DDA database that contained 3.9 million and 4.4 million unique associations along with their Q values. The Q values were gender-specific, with a separate value for males and a separate value for females (Figure S1 in Appendix).

2.1.5. Distribution of disease co-occurrences and prevalence threshold

As with any sampled statistic, there is uncertainty about the true population Q values. In the present version of the method implementation, however, we intentionally avoid using confidence intervals, p-values or other statistical tools normally involved in hypothesis testing. We do so in order to reflect the empirical information contained in the data without any subjective interpretation that could otherwise be introduced through, for example, the choice of a significance level.

Let CAB be an outcome of a random variable X, and consider disease B as a risk factor for disease A. Table 1 shows the margins CA and CB. Assuming that subjects are affected independently of each other, X follows a non-central hypergeometric distribution X ∼ Hyper(N, CA, CB) [37]. X1 and X2 were defined as the thresholds on CA and CB, respectively; an association was therefore retained only if CAB > max(0, CA × X1, CB × X2). Moreover, the cut-off value α was defined as a threshold on the Q value, to comply with the clinical significance of the relationship between diseases A and B. By default, the thresholds X1 and X2 and the cut-off α were 0.001, 0.001 and 1, respectively.

Table 1 – An example of disease A associated with disease B by 2 × 2 table.

Disease B     Disease A                 Total
              A present    A absent
B present     CAB          –            CB
B absent      –            –            –
Total         CA           –            N

2.2. Visualization approach

After creating a large disease–disease association database, the next step was to make the knowledge it contains easy to visualize. Our goal was to display large amounts of data efficiently given limited screen space. We first considered rendering each event (i.e. association) as a point. However, even mid-sized traces often contain far too many events to display on-screen without causing clutter and losing the ability to show patterns and meaningful information. The motion chart, a dynamic bubble chart used mostly in finance and economics [38–40], was therefore employed in this study. This allowed efficient and interactive exploration and visualization of longitudinal multivariate data [41].

2.2.1. Interface

Our system used the familiar 2D motion bubble chart, which enabled the display of large multivariate data such as thousands of disease data points and allowed for interactive visualization of the data using additional dimensions (i.e. time, the size of circles and colors) to show different characteristics of the data. The central object of a motion chart is a circle. Circles have three important characteristics: size, position and appearance. Using variable mapping, motion charts allow users to control the appearance of the circles at different time points. This mechanism enhances the dynamic appearance of the data in the motion chart and facilitates the visual inspection of disease associations, patterns and trends in multivariate datasets [41]. Box 1 shows the specific mapping variables in our study.

3. Results

In this study, we concentrated on 9 common malignant neoplasms, identified by the following ICD9-CM codes: Stomach cancer, 151; Colorectal cancer, 153; Rectum rectosigmoid junction cancer, 154; Liver cancer, 155; Lung cancer, 162; Skin cancer, 173; Breast cancer (female), 174; Cervical cancer, 180; and Prostate cancer, 185.
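The nine ICD9-CM codes listed above, together with the three-digit grouping and pairwise counting described in Section 2.1.3, can be illustrated with a short sketch. This is not the authors' implementation: the names CANCER_CODES, three_digit and count_cancer_pairs, and the input layout (a mapping from a deidentified patient ID to the ICD9-CM codes recorded for that patient within one gender and age subset), are assumptions made for the example. Truncating a code to its first three characters is a simplification that stands in for "a code and its children".

```python
from collections import Counter

# The nine cancers studied, keyed by three-digit ICD9-CM code (Section 3).
CANCER_CODES = {
    "151": "Stomach cancer",
    "153": "Colorectal cancer",
    "154": "Rectum rectosigmoid junction cancer",
    "155": "Liver cancer",
    "162": "Lung cancer",
    "173": "Skin cancer",
    "174": "Breast cancer (female)",
    "180": "Cervical cancer",
    "185": "Prostate cancer",
}


def three_digit(code: str) -> str:
    """Map a code such as '250.01' to its three-digit parent '250' (simplified)."""
    return code.strip()[:3]


def count_cancer_pairs(patient_codes: dict) -> Counter:
    """Count, per (cancer, other disease) pair, the patients in whom both occur.

    patient_codes is a hypothetical layout: patient ID -> list of ICD9-CM codes.
    Each patient contributes at most once to a pair, however often a code repeats.
    """
    counts = Counter()
    for codes in patient_codes.values():
        unique = {three_digit(c) for c in codes}
        for cancer in unique.intersection(CANCER_CODES):
            for other in unique - {cancer}:
                counts[(cancer, other)] += 1
    return counts


# Toy example with made-up patients.
patients = {
    "p1": ["174.9", "250.00", "250.01"],
    "p2": ["174.1", "401.9"],
    "p3": ["250.00"],
}
pairs = count_cancer_pairs(patients)
```

In this toy input, patient p3 has diabetes only and therefore contributes to no cancer pair, while the two 250.x codes of p1 collapse into a single diabetes entry.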
Table 2 – Number of associations with 9 cancers by gender.

ICD9-CM – Cancer name                         Number of Q in male   Number of Q in female   Total
151 – Stomach cancer                          22,062                20,633                  42,695
153 – Colorectal cancer                       30,653                31,782                  62,435
154 – Rectum rectosigmoid junction cancer     24,166                24,412                  48,578
155 – Liver cancer                            32,817                29,524                  62,341
162 – Lung cancer                             28,941                26,530                  55,471
173 – Skin cancer                             17,074                16,977                  34,051
174 – Female breast cancer                    –                     32,633                  32,633
180 – Cervical cancer                         –                     30,678                  30,678
185 – Prostate cancer                         22,416                –                       22,416
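The Q values counted in Table 2 are those that passed the filters of Sections 2.1.3 and 2.1.5. The sketch below shows one way such a filter could look. It rests on an assumption that the text does not state explicitly: Q is taken to be the ratio of observed co-occurrences to the number expected if the two diseases were independent, which is consistent with Q = 1 meaning no association. The function names, the strict comparison against α, and the separation of the stratum total N from the age-specific population are likewise illustrative choices, not the authors' code.

```python
def q_value(c_ab: int, c_a: int, c_b: int, n: int) -> float:
    """Observed co-occurrences divided by the count expected under independence.

    Assumed definition: expected = C_A * C_B / N, so Q = 1 means no association.
    The margins c_a and c_b must be positive.
    """
    expected = c_a * c_b / n
    return c_ab / expected


def keep_association(c_ab, c_a, c_b, n, age_population,
                     x1=0.001, x2=0.001, alpha=1.0,
                     min_count=5, min_rate=1e-5):
    """Apply the default filters described in Sections 2.1.3 and 2.1.5.

    age_population is the size of the age-specific population (hypothetical input).
    """
    # "Rare associations": fewer than 5 co-occurrences, or fewer than
    # one co-occurrence per hundred thousand of the age-specific population.
    if c_ab < min_count or c_ab / age_population < min_rate:
        return False
    # Prevalence threshold: C_AB > max(0, C_A * X1, C_B * X2).
    if c_ab <= max(0, c_a * x1, c_b * x2):
        return False
    # Cut-off alpha on the Q value (strict comparison is an assumption).
    return q_value(c_ab, c_a, c_b, n) > alpha


# Worked example: 50 co-occurrences, margins 1000 and 2000, N = 100,000.
# Expected count = 1000 * 2000 / 100000 = 20, so Q = 50 / 20 = 2.5.
example_q = q_value(50, 1000, 2000, 100_000)
```

With the default thresholds, the worked example is kept (Q = 2.5 exceeds α = 1), whereas a pair with only 3 co-occurrences would be dropped as a rare association regardless of its Q value.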
Table 2 shows the number of associations with 9 malignant neoplasms in this study. The disease associations with their classified categories were observed in the CAMA system. By default all disease groups were queried and presented in the motion chart. Different disease groups were shown in different colors (see Table S1 in Appendix). A particular disease (i.e. single disease) that was associated with malignant neoplasms was also presented in CAMA. Fig. 1 shows an overview of our CAMA system interface.

A visualization and animation of various diseases associated with female breast cancer were performed in three different age groups (i.e. 50, 60, and 70 years old), as shown in Fig. 2. Moreover, the size of the circles shows the number of co-occurrences of a particular disease with breast cancer. We observed that the circles could appear in four different areas of the chart (i.e. right-top, left-top, right-bottom, and left-bottom). For example, a circle in the top-left of the chart means that the disease has a higher prevalence but a lower association with breast cancer. A circle at the bottom-right of the chart shows that the disease is more strongly associated with breast cancer but has a lower prevalence in Taiwan.

The motion chart could also display the trend of a particular disease by double clicking on the circle. Disease–disease associations according to individual age groups are presented in Fig. 3 (i.e. Diabetes mellitus, ICD9-CM 250, and Infection of kidney, ICD9-CM 590). The number of Q values, particular diseases, and co-occurrences according to age were also shown in bar charts.

Box 1: Mapping variables in this study.

Motion chart parameter    Mapping variable in this study
Time                      Presents age of patients (i.e. 100 age groups)
X-axis                    Presents the scale of association’s strength (i.e. Q-values)
Y-axis                    Presents the scale of count number of relative disease
Size of circle            Presents the number of co-occurrence of both diseases A and B
Color                     Presents the category of disease (see Table S1 in Appendix)

Fig. 1 – Overview of the CAMA system interface.

Fig. 2 – Illustrations of other diseases with female breast cancer changes across 50-age, 60-age and 70-age.

Fig. 3 – Display trends of particular disease by age.

4. Discussion

Cancer is a systemic disease, so other comorbid diseases that a cancer patient has will affect the cancer’s clinical course and the effectiveness of treatments. However, cancer comorbidities change dramatically with age and sex, so an animated visualization of individual cancers and their associations with other diseases was developed based on this assumption. Animation enhances the details in a chart and facilitates the observation of disease associations, patterns and trends in multivariate datasets [41].

Initially, many were skeptical that such associations were real, including the statistician Karl Pearson, who wrote: “I must frankly admit that at first I viewed Dr. Maynard’s conclusion as in some-way based on disregarded spurious correlation, and due to non-allowance for population, age or general unhealthiness factors. But I have been gradually forced by the pressure of these statistical results to consider it something very real” [42]. We hypothesized that the CAMA system could visually display the link between chronic diseases (e.g. diabetes, hypertensive disease, or hyperlipidemia) and cancers. Some studies have reported an increased risk of lung cancer in diabetic patients [43,44]. The risk of breast cancer is reported to be approximately 20% higher and that of colorectal cancer 30% higher in diabetic subjects compared to those without diabetes [43,45,46]. Those results are consistent with the Q values which we observed in our system. While this is a good start, several unanswered questions have also been raised. For example, it remains unclear whether the association between diabetes and cancer is direct (e.g., hyperglycemia leads to cancer), whether diabetes is a marker of underlying biologic factors that alter cancer risk (e.g., insulin resistance and hyperinsulinemia), or whether the association is confounded, that is, whether diabetes is related to a common risk factor such as obesity that is also a risk factor for cancer. Other common risk factors for cancer include lack of physical activity, diet, race/ethnicity, alcohol, and smoking.

Nevertheless, this study has potential advantages. First, this is the first time that a visualization system using motion charts has been applied to health care data specifically to explore the relationships between cancers and other diseases. Second, the CAMA animation system focuses on the associations of 9 common cancers with their comorbidities, which could reveal several related diseases for early detection of cancer risk. The motion chart provides an array of opportunities and challenges with regard to disease classification and may aid in the development of new treatment approaches. Third, physicians and researchers can easily observe trends over time for a particular disease with a single mouse click. Moreover, researchers can filter out weak associations with cancers by setting a threshold on the number of patients or on the Q values. Fourth, the CAMA visualization system is well-suited for displaying cancer-disease associations in big datasets such as Taiwan’s National Health Insurance Database. Thus, the CAMA motion chart can be a useful tool for doctors, epidemiologists, and public health researchers.

However, the study also has some limitations. Our study is based on the assumption that two diseases are associated if they were recorded for the same patient within the same 1-year observation window, so we did not observe anything outside the window. Acute and chronic diseases were treated identically: regardless of how many times a disease was observed during the observation period, it was counted as a single occurrence. The motion chart presented in this study also does not reveal causation of diseases, although it could be used to identify notable associations among diseases. This could generate hypotheses that would otherwise be difficult to imagine.
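As a reading aid for the chart layout described in Section 3 (Box 1 and the four chart areas) and for the threshold filtering mentioned in the Discussion, the following sketch models one circle of the motion chart. It is an illustration under stated assumptions, not the CAMA implementation: the Bubble class, the chart_area and visible functions, and in particular the split points that divide the chart into four areas (Q = 1 on the X-axis and an arbitrary count of 10,000 on the Y-axis) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bubble:
    """One circle in the motion chart, following the mapping in Box 1."""
    age: int            # Time: age group shown at this animation step
    q_value: float      # X-axis: association strength
    disease_count: int  # Y-axis: count of the related disease
    cooccurrence: int   # Size of circle: co-occurrences of diseases A and B
    category: str       # Color: disease category


def chart_area(b: Bubble, q_split: float = 1.0, count_split: int = 10_000) -> str:
    """Name the chart area of a bubble; both split points are assumed values."""
    vertical = "top" if b.disease_count >= count_split else "bottom"
    horizontal = "right" if b.q_value >= q_split else "left"
    return f"{vertical}-{horizontal}"


def visible(bubbles, min_patients: int = 0, min_q: float = 0.0):
    """Keep only bubbles that meet user-chosen thresholds on patients and Q."""
    return [b for b in bubbles
            if b.cooccurrence >= min_patients and b.q_value >= min_q]


# Made-up bubbles for age 60: a common, weakly associated disease
# and a less common, strongly associated one.
common = Bubble(60, 0.8, 50_000, 900, "Endocrine")
specific = Bubble(60, 3.2, 400, 120, "Neoplasms")
areas = [chart_area(common), chart_area(specific)]  # top-left, bottom-right
strong_only = visible([common, specific], min_q=1.0)
```

Under these assumed split points, the first bubble falls in the top-left area (higher prevalence, lower association) and the second in the bottom-right area (stronger association, lower prevalence), matching the interpretation given in the Results.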
5. Conclusion

The animated visualization of cancer-disease associations using a large medical dataset was accomplished by building a motion chart called Cancer Associations Map Animation (CAMA). CAMA was used to analyze 782 million outpatient visits obtained from NHIRDB over a three-year period and provided dynamic animation of cancer-disease associations across different age groups and genders. Such information can be used to identify comorbidity relationships for clinicians and provide additional reference data for medical researchers.

Author contributions

U.I., C-K.H., W-S.J., and Y-C.L. invented and developed the concept. Y-C.L. and M-H.H. obtained the data. Y-C.C., P-A.N., C-Y.H., and H-C.Y. organized and validated the data. Y-C.C., P-A.N., W-S.J., and S-S.A. reviewed the methods. C-K.H. and C-W.H. implemented the method and developed the website. W-S.J. and Y-C.W. gave technical support. U.I., C-K.H., D.L.C., and R.L. wrote the manuscript.

Funding

This research is sponsored in part by the Ministry of Science and Technology (MOST) under grant MOST 103-2221-E-038-014, MOST 103-2221-E-038-016, MOST 103-2622-E-038-004-CC2, Ministry of Health and Welfare (MOHW), Taiwan, under grant MOHW103-TD-B-111-01, MOHW103-CC-EMR-05, Health and Welfare Surcharge of Tobacco Products grant MOHW104-TDU-B-212-124-001, Taipei Medical University under grant 99TMU-WFH-10, 101TMU-SHH-21, TMU102-AE1-B31, Taipei Medical University and Taipei Medical University Hospital (101-TMU-TMUH-03) and Ministry of Education, Taiwan, under grant TMUTOP103006-6.

Competing interests

None.
Acknowledgements

We would like to thank Mr. Yong-Fu Yen from the College of Medical Science & Technology, Taipei Medical University, Taiwan, who contributed to building the website for the CAMA system.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.cmpb.2016.01.009.

References

[1] J. Appleby, A. Harrison, Spending on Health Care: How Much is Enough? King’s Fund, 2006.
[2] F.G. Castles, Population aging and the public purse: how real is the problem? in: Australasian Political Studies Association Conference, Canberra, Australia, October 3–6, 2000.
[3] A. Gandjour, Aging diseases – do they prevent preventive health care from saving costs? Health Econ. 18 (3) (2009) 355–362.
[4] D.B. Reuben, E. Keeler, T.E. Seeman, A. Sewall, S.H. Hirsch, J.M. Guralnik, Development of a method to identify seniors at high risk for high hospital utilization, Med. Care 40 (9) (2002) 782–793.
[5] P. Greenwald, A favorable view: progress in cancer prevention and screening, Cancer Prevent. (2007) 3–17, Springer.
[6] S. Lee, H. Huang, M. Zelen, Early detection of disease and scheduling of screening examinations, Stat. Methods Med. Res. 13 (6) (2004) 443–456.
[7] L. Li, M. Shiga, W.K. Ching, H. Mamitsuka, Annotating gene functions with integrative spectral clustering on microarray expressions and sequences, Genome Inform. (2010) 95–120, World Scientific.
[8] U.F. Lingappa, X. Wu, A. Macieik, S.F. Yu, A. Atuegbu, M. Corpuz, J. Francis, C. Nichols, A. Calayag, H. Shi, Host–rabies virus protein–protein interactions as druggable antiviral targets, Proc. Natl. Acad. Sci. 110 (10) (2013) E861–E868.
[9] S.S. Ray, S. Bandyopadhyay, S.K. Pal, Gene ordering in partitive clustering using microarray expressions, J. Biosci. 32 (Suppl. 1) (2007) 1019–1025.
[10] M.W. Gonzalez, M.G. Kann, Protein interactions and disease, PLoS Comput. Biol. 8 (12) (2012) e1002819.
[11] J.N. Hirschhorn, M.J. Daly, Genome-wide association studies for common diseases and complex traits, Nat. Rev. Genet. 6 (2) (2005) 95–108.
[12] Y. Lee, H. Li, J. Li, E. Rebman, I. Achour, K.E. Regan, E.R. Gamazon, J.L. Chen, X.H. Yang, N.J. Cox, Network models of genome-wide association studies uncover the topological centrality of protein interactions in complex diseases, J. Am. Med. Inform. Assoc. 20 (4) (2013) 619–629.
[13] M. Vidal, D.W. Chan, M. Gerstein, M. Mann, G.S. Omenn, D. Tagle, S. Sechi, The human proteome – a scientific opportunity for transforming diagnostics, therapeutics, and healthcare, Clin. Proteomics 9 (1) (2012) 6.
[14] M. Baker, Genomics: the search for association, Nature 467 (7319) (2010) 1135–1138.
[15] M. Vidal, M.E. Cusick, A.-L. Barabasi, Interactome networks and human disease, Cell 144 (6) (2011) 986–998.
[16] S. Syed-Abdul, M. Moldovan, P.A. Nguyen, R. Enikeev, W.S. Jian, U. Iqbal, M.H. Hsu, Y.C. Li, Profiling phenome-wide associations: a population-based observational study, J. Am. Med. Inform. Assoc.: JAMIA (2015).
[17] H. Cao, G. Hripcsak, M. Markatou, A statistical methodology for analyzing co-occurrence data from a large sample, J. Biomed. Inform. 40 (3) (2007) 343–352.
[18] H. Cao, M. Markatou, G.B. Melton, M.F. Chiang, G. Hripcsak, Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2005, p. 106.
[19] E.S. Chen, G. Hripcsak, H. Xu, M. Markatou, C. Friedman, Automated acquisition of disease–drug knowledge from biomedical and clinical documents: an initial study, J. Am. Med. Inform. Assoc. 15 (1) (2008) 87–98.
[20] P. Nambisan, G.L. Kreps, S. Polit, Understanding electronic medical record adoption in the United States: communication and sociocultural perspectives, Interact. J. Med. Res. 2 (1) (2013).
[21] I.M. Xierali, C.-J. Hsiao, J.C. Puffer, L.A. Green, J.C. Rinaldo, A.W. Bazemore, M.T. Burke, R.L. Phillips, The rise of electronic health record adoption among family physicians, Ann. Fam. Med. 11 (1) (2013) 14–19.
[22] World Health Organization, Cancer Control: Knowledge into Action: WHO Guide for Effective Programmes, vol. 2, World Health Organization, 2007.
[23] S.C. Larsson, C.S. Mantzoros, A. Wolk, Diabetes mellitus and risk of breast cancer: a meta-analysis, Int. J. Cancer 121 (4) (2007) 856–862.
[24] E.J. Gallagher, D. LeRoith, Diabetes, cancer, and metformin: connections of metabolism and cell proliferation, Ann. N. Y. Acad. Sci. 1243 (1) (2011) 54–68.
[25] S.R. Plymate, R.C. Hoop, R.E. Jones, L.A. Matej, Regulation of sex hormone-binding globulin production by growth factors, Metabolism 39 (9) (1990) 967–970.
[26] W. Rosner, The functions of corticosteroid-binding globulin and sex hormone-binding globulin: recent advances, Endocr. Rev. 11 (1) (1990) 80–91.
[27] B. van der Burg, G.R. Rutteman, M.A. Blankenstein, S.W. de Laat, E.J. van Zoelen, Mitogenic stimulation of human breast cancer cells in a growth factor-defined medium: synergistic action of insulin and estrogen, J. Cell. Physiol. 134 (1) (1988) 101–108.
[28] C.A. Conover, P.D. Lee, J.A. Kanaley, J.T. Clarkson, M.D. Jensen, Insulin regulation of insulin-like growth factor binding protein-1 in obese and nonobese humans, J. Clin. Endocrinol. Metab. 74 (6) (1992) 1355–1360.
[29] P.F. Bruning, J.M. Bonfrer, P.A. van Noord, A.A. Hart, M. de Jong-Bakker, W.J. Nooijen, Insulin resistance and breast-cancer risk, Int. J. Cancer 52 (4) (1992) 511–516.
[30] G. Yang, G. Lu, F. Jin, Q. Dai, R. Best, X.-O. Shu, J.-R. Chen, X.-Y. Pan, M. Shrubsole, W. Zheng, Population-based, case–control study of blood C-peptide level and breast cancer risk, Cancer Epidemiol. Biomark. Prevent. 10 (11) (2001) 1207–1211.
[31] WHO, Cancer, vol. 2014, 2014.
[32] W.S.H. Chan, Taiwan’s healthcare report 2010, EPMA J. 1 (4) (2010) 563–585.
[33] U. Iqbal, P.A. Nguyen, S. Syed-Abdul, H.C. Yang, C.W. Huang, W.S. Jian, M.H. Hsu, Y. Yen, Y.C. Li, Is long-term use of benzodiazepine a risk for cancer? Medicine 94 (6) (2015) e483.
[34] Y.-C. Chen, H.-Y. Yeh, J.-C. Wu, I. Haschler, T.-J. Chen, T. Wetter, Taiwan’s National Health Insurance Research Database: administrative health care database as study object in bibliometrics, Scientometrics 86 (2) (2011) 365–380.
[35] P.A. Nguyen, S. Syed-Abdul, U. Iqbal, M.-H. Hsu, C.-L. Huang, H.-C. Li, D.L. Clinciu, W.-S. Jian, Y.-C.J. Li, A probabilistic model for reducing medication errors, PLOS ONE 8 (12) (2013) e82401.
[36] H.P. Administration, Health Surveillance, vol. 2014, Ministry of Health and Welfare, Taiwan, 2014.
[37] C.J. Lloyd, Statistical Analysis of Categorical Data, Wiley, New York, 1999.
[38] A. Grossenbacher, The globalisation of statistical content, Stat. J. IAOS 25 (3) (2008) 133–144.
[39] H. Rosling, Global Population Growth, Box by Box, vol. 2014, TED@Cannes, 2010.
[40] H. Rosling, Religions and Babies, vol. 2014, TedxSummit, 2012.
[41] J. Al-Aziz, N. Christou, I.D. Dinov, SOCR motion charts: an efficient, open-source, interactive and dynamic applet for visualizing longitudinal multivariate data, J. Stat. Educ. 18 (3) (2010) 1–29.
[42] K. Pearson, A. Lee, E.M. Elderton, On the correlation of death-rates, J. R. Stat. Soc. 73 (5) (1910) 6.
[43] M.-Y. Lee, K.-D. Lin, P.-J. Hsiao, S.-J. Shin, The association of diabetes mellitus with liver, colon, lung, and prostate cancer is independent of hypertension, hyperlipidemia, and gout in Taiwanese patients, Metabolism 61 (2) (2012) 242–249.
[44] E.T. Petridou, T.N. Sergentanis, C.N. Antonopoulos, N. Dessypris, I.L. Matsoukis, K. Aronis, A. Efremidis, C. Syrigos, C.S. Mantzoros, Insulin resistance: an independent risk factor for lung cancer? Metabolism 60 (8) (2011) 1100–1106.
[45] M. Inoue, M. Iwasaki, T. Otani, et al., Diabetes mellitus and the risk of cancer: results from a large-scale population-based cohort study in Japan, Arch. Intern. Med. 166 (17) (2006) 1871–1877.
[46] S.C. Larsson, N. Orsini, A. Wolk, Diabetes mellitus and risk of colorectal cancer: a meta-analysis, J. Natl. Cancer Inst. 97 (22) (2005) 1679–1687.