Computer Methods and Programs in Biomedicine 127 (2016) 44–51

journal homepage: www.intl.elsevierhealth.com/journals/cmpb

Cancer-disease associations: A visualization and animation through medical big data

Usman Iqbal (a,b), Chun-Kung Hsu (a,b,1), Phung Anh (Alex) Nguyen (a,b), Daniel Livius Clinciu (a,b), Richard Lu (a,b), Shabbir Syed-Abdul (a,b), Hsuan-Chia Yang (a,b,c), Yao-Chin Wang (a), Chu-Ya Huang (a,1), Chih-Wei Huang (a,b), Yo-Cheng Chang (a), Min-Huei Hsu (a,b,d), Wen-Shan Jian (a,b,e,f), Yu-Chuan (Jack) Li (a,b,g,∗)

a Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taiwan

b International Center for Health Information Technology (ICHIT), Taipei Medical University, Taiwan

c Institute of Biomedical Informatics, National Yang Ming University, Taiwan

d Bureau of International Cooperation, Ministry of Health and Welfare, Taipei, Taiwan

e School of Health Care Administration, Taipei Medical University, Taipei, Taiwan

f Faculty of Health Sciences, Macau University of Science and Technology, Macau, China

g Department of Dermatology, Taipei Medical University – Wan Fang Hospital, Taipei, Taiwan

Article info

Article history:
Received 1 December 2015
Received in revised form 6 January 2016
Accepted 11 January 2016

Keywords:
Visual analytics
Disease visualization
Big data
Cancer disease visualization
Cancer comorbidities visualization

Abstract

Objective: Cancer is a leading cause of death and disability worldwide. Currently, prevention and early detection represent the best hope for cure. Knowing in advance which diseases are expected to occur with a particular cancer could help physicians better tailor their cancer treatment. The aim of this study was to build an animated visualization tool, called Cancer Associations Map Animation (CAMA), to visualize the association of cancers with other diseases over time.

Methods: The study population was drawn from the Taiwan National Health Insurance Database for the period January 2000 to December 2002; 782 million outpatient visits were used to compute the associations of nine major cancers with other diseases. A motion chart was used to quantify and visualize the associations between diseases and cancers.

Results: The CAMA motion chart that was built successfully facilitated the observation of cancer-disease associations across ages and genders. The CAMA system can be accessed online at http://203.71.86.98/web/runq16.html.

Conclusion: The CAMA animation system is a medical data visualization tool which provides a dynamic, time-lapse, animated view of cancer-disease associations across different age groups and genders. Derived from a large, nationwide healthcare dataset, this exploratory data analysis tool can detect cancer comorbidities earlier than is possible by manual inspection. Taking into account the trajectory of cancer-specific comorbidity development may help clinicians and healthcare researchers to explore early-stage hypotheses more efficiently, develop new cancer treatment approaches, and identify potential effect modifiers or new risk factors associated with specific cancers.

© 2016 Published by Elsevier Ireland Ltd.

∗ Corresponding author at: College of Medical Science and Technology, Taipei Medical University, Chair Dermatology Department, Wan-Fang Hospital, No. 250, Wu-Hsing Street, Taipei, Taiwan. Tel.: +886 2 27361661x7601; fax: +886 2 6638 7537.

E-mail address: [email protected] (Y.-C. Li).

1 Equal contribution with second author.

http://dx.doi.org/10.1016/j.cmpb.2016.01.009

0169-2607/© 2016 Published by Elsevier Ireland Ltd.

1. Introduction

As advancements in medicine and healthcare contribute to longer life expectancies and substantial growth of the aged population [1–4], it is becoming increasingly important to predict cancer and its concomitant diseases early in the disease process. This is because early prediction and detection are considered the key to initiating preventive and curative procedures [5,6]. Over the past two decades, research has amassed much information and knowledge on gene expression profiles [7–9], protein interactions [10–13] and genome-wide associations (GWAS) [14,15], which have contributed to a better understanding of the pathophysiological mechanisms in hereditary diseases, cancers and neurological disorders. Nevertheless, a comprehensive understanding of the various mechanisms underlying most diseases is still in its infancy.

A shift from traditional methods of phenome-wide associations (PWAS) studies is occurring. Current visualization methodologies rely on case reports and sampled cohort studies for investigating one-to-one disease relationships [16]. The increasing availability of large amounts of data in many national healthcare systems has prompted informaticians to begin analyzing multiple disease associations simultaneously [17–19]. Furthermore, rapid adoption of electronic health records has enabled the accumulation of significant amounts of patient-level clinical data by many healthcare providers [20,21].

One of the prime targets for large-scale data visualization techniques is cancer, because it is responsible for such a large burden of disease worldwide. Each year, tens of millions of people are diagnosed with cancer around the world, and more than half of the patients eventually die from it. Cancer prevention is an essential component of all cancer control plans because about 40% of all cancer deaths can be prevented [22]. For example, Larsson et al. observed that type 2 diabetes, a preventable disease, is associated with a 20% increase in the risk of breast cancer [23]. The underlying mechanism may be related to alterations in circulating concentrations of insulin, insulin-like growth factors (IGFs), and endogenous sex hormones [24]. Insulin inhibits the production of sex hormone-binding globulin (SHBG) [25], which results in an increase in free steroid hormones (free estrogens in particular) because testosterone successfully competes with estrogen for SHBG [26]. Insulin is also a growth-promoting hormone with mitogenic effects in both normal and malignant breast tissue [27]. Insulin suppresses IGF binding protein-1 and thus increases the bioavailability of IGF-1 [28]. Furthermore, studies have also reported that hyperinsulinemia with insulin resistance may increase the risk of breast cancer [29,30]. Thus, early awareness of the signs and symptoms of comorbidities that are involved in the tumor development process is important for prevention and early intervention [31].

The Cancer Associations Map Animation (CAMA) project was conceived and built by the College of Medical Science and Technology, Taipei Medical University, Taiwan. It is an applied data visualization tool designed with an intuitive interface that is intended for use by real-world clinicians and health researchers who do not have computer programming skills or a strong statistics background. Using CAMA, anyone can verify or unearth new associations between cancers and other diseases based on a large medical dataset. Assembling the large volumes of medical data required for the CAMA project was aided in particular by Taiwan's highly accessible universal healthcare system [32] and high rates of annual doctor visits per individual (15 visits per year).

2. Materials and methods

This section provides details of the CAMA system, from the process of data gathering to the development of the visualization tool. The study consisted of two primary steps:

I. Establishment of the Disease–Disease Association (DDA) knowledge database.
II. Visualization approach.

2.1. Establishment of the disease–disease association (DDA) knowledge database

2.1.1. Data source

In this study, we used Taiwan's National Health Insurance (NHI) claims database, a database that covered 99% of Taiwan's population by 2013 [33]. The claims data included outpatient visits, dental visits, hospitalizations, medications prescribed, medications refilled, laboratory and examinations, and procedure codes for all of Taiwan's 23 million people each year. The National Health Insurance Research Database (NHIRDB) has become an important "Big Data" research data source that has provided an ideal opportunity to conduct clinical research [33,34]. Moreover, because Taiwan's National Health Insurance Bureau cross-checks all diagnostic codes and medication codes before reimbursement, the codes are known for their exceptional fidelity.

2.1.2. Data gathering and cleaning

Data from Taiwan's NHI database from January 1, 2000 to December 31, 2002 were eligible for inclusion in this study.


During this timeframe, approximately 787.86 million outpatient visit claims were recorded in the database. Each record consisted of the visit date, the patient's de-identified national ID, age, gender, and one to three ICD9-CM diagnosis codes. We excluded all records with a miscoded diagnosis, errors in the date of birth, or missing or unknown gender. In total, 5.88 million (0.75%) records were excluded, leaving 782 million valid visit claims. There were 9.73 million unique males and 10.29 million unique females in the database.

2.1.3. Definition of disease–disease (DD) "association"

We assumed that two diseases were associated if they occurred at least once in the same person over 36 consecutive months (3 years). In this study, three-year windows (i.e. January 2000 to December 2002) were used to observe and compute the associations among cancers and other diseases. All patient records were stratified by gender and age. Therefore, the dataset contained 200 subsets (i.e. 2 genders with 100 age groups). Each subset was used separately to quantify the strength of the disease–disease associations.

Each patient was followed for 36 months and all of the unique three-digit ICD9-CM codes were obtained for each patient (e.g., ICD9-CM code 250 and its children were used to define diabetics). All possible pairwise combinations of these ICD9-CM codes were then calculated. Disease–disease associations were defined as the number of times that two diseases co-occurred within a one year window. Q values were calculated for each pair in order to measure the association strength [35]. If no association was found between a DD pair, Q would be equal to 1. Positive DD associations would give Q values greater than 1, and negative DD associations would give Q values less than 1. However, DD associations with fewer than 5 co-occurrences, or fewer than one per hundred thousand of the age-group population [36], were by default considered "rare associations" and were not included in this study.

2.1.4. Creating a disease–disease association (DDA) knowledge database

After computing all DD associations for the 200 subsets, we stored these associations in a DDA database that contained 3.9 million and 4.4 million unique associations along with their Q values. The Q values were gender-specific, with a separate value for males and a separate value for females (Figure S1 in Appendix).

2.1.5. Distribution of disease co-occurrences and prevalence threshold

As with any sampled statistic, there is uncertainty about the true population Q values. This could be handled by reporting confidence intervals for each Q value. In the present version of the method implementation, however, we intentionally avoid using confidence intervals, p-values or other statistical tools normally involved in hypothesis testing. We do so in order to reflect the empirical information contained in the data without any subjective interpretation that could otherwise be introduced through, for example, the choice of a significance level.

Let $C_{AB}$ be an outcome of a random variable $X$, and consider disease B as a risk factor for disease A. Table 1 shows the margins $C_A$ and $C_B$. Assuming that subjects are affected independently of each other, $X$ follows a non-central hypergeometric distribution, $X \sim \mathrm{Hyper}(N, C_A, C_B)$ [37]. $X_1$ and $X_2$ were defined as the thresholds on $C_A$ and $C_B$, respectively, so that an association was retained only if $C_{AB} > \max(0, C_A X_1, C_B X_2)$. Moreover, a cut-off value $\alpha$ was defined as a threshold on the Q value, to reflect the clinical significance of the relationship between diseases A and B. By default, the thresholds $X_1$, $X_2$ and the cut-off $\alpha$ were 0.001, 0.001 and 1, respectively.

Table 1 – An example of disease A associated with disease B by 2 × 2 table.

              Disease A
Disease B     A present    A absent    Total
B present     $C_{AB}$     –           $C_B$
B absent      –            –           –
Total         $C_A$        –           $N$

2.2. Visualization approach

After creating a large disease–disease association database, the next step was to make the knowledge it contains easy to visualize. Our goal was to display large amounts of data efficiently given limited screen space. We first considered rendering each event (i.e. association) as a point. However, even mid-sized traces often contain far too many events to display on-screen without causing clutter and losing the ability to show patterns and meaningful information. The motion chart, a dynamic bubble chart used mostly in finance and economics [38–40], was therefore employed in this study. This allowed efficient and interactive exploration and visualization of longitudinal multivariate data [41].

2.2.1. Interface

Our system used the familiar 2D motion bubble chart, which enabled the display of large multivariate data such as thousands of disease data points and allowed for interactive visualization of the data using additional dimensions (i.e. time, the size of circles and colors) to show different characteristics of the data. The central object of a motion chart is a circle. Circles have three important characteristics: size, position and appearance. Using variable mapping, motion charts allow users to control the appearance of the circles at different time points. This mechanism enhances the dynamic appearance of the data in the motion chart and facilitates the visual inspection of disease associations, patterns and trends in multivariate datasets [41]. Box 1 shows the specific mapping variables in our study.

3. Results

In this study, we concentrated on 9 common malignant neoplasms, identified by ICD9-CM code: stomach cancer, 151; colorectal cancer, 153; rectum and rectosigmoid junction cancer, 154; liver cancer, 155; lung cancer, 162; skin cancer, 173; female breast cancer, 174; cervical cancer, 180; and prostate cancer, 185.
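As a minimal sketch of the association measure and default thresholds described in Sections 2.1.3 and 2.1.5, the Python below assumes that Q is the ratio of the observed co-occurrence count to the count expected under independence, $C_A C_B / N$. The text does not spell out the formula (it cites [35]), so this definition is an assumption chosen because it reproduces the stated behaviour (Q = 1 for no association, Q > 1 for positive, Q < 1 for negative). The function names, the strict inequalities and the example counts are likewise hypothetical.

```python
def q_value(c_ab, c_a, c_b, n):
    """Observed-to-expected co-occurrence ratio (assumed form of Q).

    c_ab: patients with both diseases; c_a, c_b: patients with each disease;
    n: patients in the gender/age stratum. Requires c_a, c_b and n > 0.
    """
    expected = c_a * c_b / n  # co-occurrences expected if A and B were independent
    return c_ab / expected


def keep_association(c_ab, c_a, c_b, n, x1=0.001, x2=0.001, alpha=1.0,
                     min_count=5, min_rate=1e-5):
    """Apply the default filters from the text to one disease pair in one stratum."""
    # "Rare associations": fewer than 5 co-occurrences, or fewer than
    # one per hundred thousand of the stratum population.
    if c_ab < min_count or c_ab / n < min_rate:
        return False
    # Prevalence threshold: C_AB > max(0, C_A * X1, C_B * X2).
    if c_ab <= max(0, c_a * x1, c_b * x2):
        return False
    # Cut-off on Q (assumed to be strict: keep only Q > alpha).
    return q_value(c_ab, c_a, c_b, n) > alpha
```

With made-up counts of 1,000 patients with disease A and 2,000 with disease B in a stratum of 100,000, independence predicts 20 co-occurrences, so 50 observed co-occurrences would give Q = 2.5 under this assumed definition.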


Table 2 – Number of associations with 9 cancers by gender.

ICD9-CM – Cancer name                        Number of Q in male   Number of Q in female   Total
151 – Stomach cancer                         22,062                20,633                  42,695
153 – Colorectal cancer                      30,653                31,782                  62,435
154 – Rectum rectosigmoid junction cancer    24,166                24,412                  48,578
155 – Liver cancer                           32,817                29,524                  62,341
162 – Lung cancer                            28,941                26,530                  55,471
173 – Skin cancer                            17,074                16,977                  34,051
174 – Female breast cancer                   –                     32,633                  32,633
180 – Cervical cancer                        –                     30,678                  30,678
185 – Prostate cancer                        22,416                –                       22,416

Box 1: Mapping variables in this study.

Motion chart parameter    Mapping variable in this study
Time                      Age of patients (i.e. 100 age groups)
X-axis                    Scale of the association's strength (i.e. Q-values)
Y-axis                    Scale of the count number of the relative disease
Size of circle            Number of co-occurrences of both diseases A and B
Color                     Category of disease (see Table S1 in Appendix)

Table 2 shows the number of associations with the 9 malignant neoplasms in this study. The disease associations with their classified categories were observed in the CAMA system. By default, all disease groups were queried and presented in the motion chart. Different disease groups were shown in different colors (see Table S1 in Appendix). A particular disease (i.e. a single disease) that was associated with malignant neoplasms was also presented in CAMA. Fig. 1 shows an overview of our CAMA system interface.

A visualization and animation of various diseases associated with female breast cancer was performed for three different age groups (i.e. 50, 60, and 70 years old), as shown in Fig. 2. The size of the circles shows the number of co-occurrences of a particular disease with breast cancer. We observed that the circles could appear in four different areas of the chart (i.e. top-right, top-left, bottom-right, and bottom-left). For example, a circle in the top-left of the chart means that the disease has a higher prevalence but a lower association with breast cancer. A circle at the bottom-right of the chart shows that the disease is more strongly associated with breast cancer but has a lower prevalence in Taiwan.

The motion chart could also display the trend of a particular disease by double clicking on the circle. Disease–disease associations according to individual age groups are presented in Fig. 3 (i.e. Diabetes mellitus [ICD9-CM], 250 and Infection of kidney, 590). The number of Q values, the particular diseases, and their co-occurrences were also shown according to age.

Fig. 1 – Overview of the CAMA system interface.

Fig. 2 – Diseases associated with female breast cancer and their changes across ages 50, 60 and 70.

4. Discussion

Cancer is a systemic disease, so a cancer patient's other comorbid diseases will affect the cancer's clinical course and the effectiveness of treatments. However, cancer comorbidities change dramatically with age and sex, so an animated visualization of individual cancers and their associations with other diseases was developed based on this assumption. Animation enhances the details in a chart and facilitates the observation of disease associations, patterns and trends in multivariate datasets [41].

Initially, many were skeptical that associations between diseases were real, including the statistician Karl Pearson, who wrote: "I must frankly admit that at first I viewed Dr. Maynard's conclusion as in some-way based on disregarded spurious correlation, and due to non-allowance for population, age or general unhealthiness factors. But I have been gradually forced by the pressure of these statistical results to consider it something very real" [42]. We hypothesized that the CAMA system could visually display the link between chronic diseases (e.g. diabetes, hypertensive disease, or hyperlipidemia) and cancers. Some studies have reported an increased risk of lung cancer in diabetic patients [43,44]. The risk of breast cancer is reported to be approximately 20% higher and that of colorectal cancer 30% higher in diabetic subjects compared to those without diabetes [43,45,46]. Those results are consistent with the Q values which we observed in our system. While this is a good start, several unanswered questions have also been raised. For example, it remains unclear whether the association between diabetes and cancer is direct (e.g., hyperglycemia leads to cancer), whether diabetes is a marker of underlying biologic factors that alter cancer risk (e.g., insulin resistance and hyperinsulinemia), or whether diabetes is a confounder for cancer. That is, diabetes is related to a common risk factor such as obesity, which is also a risk factor for cancer. Other common risk factors for cancer include lack of physical activity, diet, race/ethnicity, alcohol, and smoking.

Nevertheless, this study has potential advantages. First, this work is the first time that a visualization system using motion charts has been applied to health care data specifically to explore the relationships between cancers and other diseases. Second, the CAMA animation system focused on the association of 9 common cancers with their comorbidities, which could reveal several related diseases in early cancer risk detection. The motion chart provides an array of opportunities and challenges with regard to disease classification and may aid in the development of new treatment approaches. Third, physicians and researchers can easily observe trends over time for a particular disease with a single mouse click. Moreover, researchers could filter out weak associations with cancers by setting thresholds on the number of patients or on Q values. Fourth, the CAMA visualization system is well-suited for displaying cancer-disease associations in big datasets such as Taiwan's National Health Insurance Database. Thus, the CAMA motion chart can be a useful tool for doctors, epidemiologists, and public health researchers.

However, the study also has some limitations. Our study is based on the assumption that two diseases are associated if they were recorded for the same patient within the same 1-year observation window, so we did not observe anything outside the window. Both acute and chronic diseases were treated identically; thus, regardless of how many times a disease was observed during the observation period, it was counted as a single occurrence. The motion chart presented in this study does not reveal causation of diseases either, although it could be used to identify notable associations among diseases. This could generate hypotheses that would otherwise be difficult to imagine.
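The counting rule described above, in which each disease is counted once per patient regardless of how often it was recorded, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tuple layout of a visit, the function names, and the simplification that each patient has a single age within the observation window are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def three_digit(code):
    """Collapse an ICD9-CM code such as '250.01' to its three-digit parent '250'."""
    return code.split(".")[0][:3]


def count_cooccurrences(visits):
    """Count diseases and disease pairs once per patient, per (gender, age) stratum.

    visits: iterable of (patient_id, gender, age, [ICD9-CM codes]) tuples.
    Returns {(gender, age): {"n": patients, "single": Counter, "pair": Counter}}.
    """
    # Collect each patient's set of unique three-digit codes across all visits.
    codes_by_patient = {}
    for patient_id, gender, age, codes in visits:
        key = (gender, age, patient_id)
        codes_by_patient.setdefault(key, set()).update(three_digit(c) for c in codes)

    strata = {}
    for (gender, age, _patient_id), codes in codes_by_patient.items():
        stratum = strata.setdefault(
            (gender, age), {"n": 0, "single": Counter(), "pair": Counter()}
        )
        stratum["n"] += 1
        stratum["single"].update(codes)  # each disease counted once per patient
        stratum["pair"].update(combinations(sorted(codes), 2))  # all unordered pairs
    return strata
```

Sorting the codes before pairing gives every pair a single canonical key, for example ("174", "250") for breast cancer with diabetes, so a pair is never split across two orderings.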


Fig. 3 – Trends of a particular disease by age.
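The variable mapping in Box 1, and the per-disease trend view of Fig. 3, can be illustrated with a small data-structure sketch. The class names, field names and the `trend` helper are hypothetical and are not taken from the CAMA implementation; the sketch only assumes the Box 1 mapping (age as the time frame, Q value on the X-axis, disease count on the Y-axis, co-occurrence count as circle size, disease category as color).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One stored cancer-disease association for a single age group."""
    age: int
    disease: str          # three-digit ICD9-CM code of the associated disease
    category: str         # disease category used for coloring
    q_value: float
    disease_count: int    # number of patients with the associated disease
    cooccurrences: int    # number of patients with both diseases


@dataclass(frozen=True)
class Bubble:
    """One circle in one frame of the motion chart."""
    frame: int
    x: float
    y: int
    size: int
    color: str
    label: str


def to_bubble(a):
    """Map an association to chart parameters following Box 1."""
    return Bubble(frame=a.age, x=a.q_value, y=a.disease_count,
                  size=a.cooccurrences, color=a.category, label=a.disease)


def trend(associations, disease):
    """Bubbles for one disease ordered by age, as in a per-disease trend view."""
    bubbles = (to_bubble(a) for a in associations if a.disease == disease)
    return sorted(bubbles, key=lambda b: b.frame)
```

Treating age as the frame index is what turns a static bubble chart into an animation: stepping through frames moves each disease's circle as its Q value and counts change from one age group to the next.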

5. Conclusion

The animated visualization of cancer-disease associations using a large medical dataset was accomplished by building a motion chart called Cancer Associations Map Animation (CAMA). CAMA was used to analyze 782 million outpatient visits obtained from NHIRDB over a three-year period and provided dynamic animation of cancer-disease associations across different age groups and genders. Such information can be used to identify comorbidity relationships for clinicians and provide additional reference data for medical researchers.

Author contributions

U.I., C-K.H., W-S.J., and Y-C.L. invented and developed the concept. Y-C.L., M-H.H. obtained the data. Y-C.C., P-A.N., C-Y.H., and H-C.Y. organized and validated the data. Y-C.C., P-A.N., W-S.J., S-S.A. reviewed the methods. C-K.H., C-W.H. implemented the method and developed the website. W-S.J., Y-C.W. gave technical support. U.I., C-K.H., D.L.C., and R.L. wrote the manuscript.

Funding

This research is sponsored in part by Ministry of Science and Technology (MOST) under grant MOST 103-2221-E-038-014, MOST 103-2221-E-038-016, MOST 103-2622-E-038-004-CC2, Ministry of Health and Welfare (MOHW), Taiwan, under grant MOHW103-TD-B-111-01, MOHW103-CC-EMR-05, Health and Welfare Surcharge of Tobacco Products grant MOHW104-TDU-B-212-124-001, Taipei Medical University under grant 99TMU-WFH-10, 101TMU-SHH-21, TMU102-AE1-B31, Taipei Medical University and Taipei Medical University Hospital (101-TMU-TMUH-03) and Ministry of Education, Taiwan, under grant TMUTOP103006-6.

Competing interests

None.


Acknowledgements

We would like to thank Mr. Yong-Fu Yen from the College of Medical Science & Technology, Taipei Medical University, Taiwan, who contributed to building the website for the CAMA system.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.cmpb.2016.01.009.

References

[1] J. Appleby, A. Harrison, Spending on Health Care: How Much is Enough? King's Fund, 2006.

[2] F.G. Castles, Population aging and the public purse: how real is the problem? in: Australasian Political Studies Association Conference, Canberra, Australia, October 3–6, 2000.

[3] A. Gandjour, Aging diseases – do they prevent preventive health care from saving costs? Health Econ. 18 (3) (2009) 355–362.

[4] D.B. Reuben, E. Keeler, T.E. Seeman, A. Sewall, S.H. Hirsch, J.M. Guralnik, Development of a method to identify seniors at high risk for high hospital utilization, Med. Care 40 (9) (2002) 782–793.

[5] P. Greenwald, A favorable view: progress in cancer prevention and screening, Cancer Prevent. (2007) 3–17, Springer.

[6] S. Lee, H. Huang, M. Zelen, Early detection of disease and scheduling of screening examinations, Stat. Methods Med. Res. 13 (6) (2004) 443–456.

[7] L. Li, M. Shiga, W.K. Ching, H. Mamitsuka, Annotating gene functions with integrative spectral clustering on microarray expressions and sequences, Genome Inform. (2010) 95–120, World Scientific.

[8] U.F. Lingappa, X. Wu, A. Macieik, S.F. Yu, A. Atuegbu, M. Corpuz, J. Francis, C. Nichols, A. Calayag, H. Shi, Host–rabies virus protein–protein interactions as druggable antiviral targets, Proc. Natl. Acad. Sci. 110 (10) (2013) E861–E868.

[9] S.S. Ray, S. Bandyopadhyay, S.K. Pal, Gene ordering in partitive clustering using microarray expressions, J. Biosci. 32 (Suppl. 1) (2007) 1019–1025.

[10] M.W. Gonzalez, M.G. Kann, Protein interactions and disease, PLoS Comput. Biol. 8 (12) (2012) e1002819.

[11] J.N. Hirschhorn, M.J. Daly, Genome-wide association studies for common diseases and complex traits, Nat. Rev. Genet. 6 (2) (2005) 95–108.

[12] Y. Lee, H. Li, J. Li, E. Rebman, I. Achour, K.E. Regan, E.R. Gamazon, J.L. Chen, X.H. Yang, N.J. Cox, Network models of genome-wide association studies uncover the topological centrality of protein interactions in complex diseases, J. Am. Med. Inform. Assoc. 20 (4) (2013) 619–629.

[13] M. Vidal, D.W. Chan, M. Gerstein, M. Mann, G.S. Omenn, D. Tagle, S. Sechi, The human proteome – a scientific opportunity for transforming diagnostics, therapeutics, and healthcare, Clin. Proteomics 9 (1) (2012) 6.

[14] M. Baker, Genomics: the search for association, Nature 467 (7319) (2010) 1135–1138.

[15] M. Vidal, M.E. Cusick, A.-L. Barabasi, Interactome networks and human disease, Cell 144 (6) (2011) 986–998.

[16] S. Syed-Abdul, M. Moldovan, P.A. Nguyen, R. Enikeev, W.S. Jian, U. Iqbal, M.H. Hsu, Y.C. Li, Profiling phenome-wide associations: a population-based observational study, J. Am. Med. Inform. Assoc.: JAMIA (2015).

[17] H. Cao, G. Hripcsak, M. Markatou, A statistical methodology for analyzing co-occurrence data from a large sample, J. Biomed. Inform. 40 (3) (2007) 343–352.

[18] H. Cao, M. Markatou, G.B. Melton, M.F. Chiang, G. Hripcsak, Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2005, p. 106.

[19] E.S. Chen, G. Hripcsak, H. Xu, M. Markatou, C. Friedman, Automated acquisition of disease–drug knowledge from biomedical and clinical documents: an initial study, J. Am. Med. Inform. Assoc. 15 (1) (2008) 87–98.

[20] P. Nambisan, G.L. Kreps, S. Polit, Understanding electronic medical record adoption in the United States: communication and sociocultural perspectives, Interact. J. Med. Res. 2 (1) (2013).

[21] I.M. Xierali, C.-J. Hsiao, J.C. Puffer, L.A. Green, J.C. Rinaldo, A.W. Bazemore, M.T. Burke, R.L. Phillips, The rise of electronic health record adoption among family physicians, Ann. Fam. Med. 11 (1) (2013) 14–19.

[22] World Health Organization, Cancer Control: Knowledge into Action: WHO Guide for Effective Programmes, vol. 2, World Health Organization, 2007.

[23] S.C. Larsson, C.S. Mantzoros, A. Wolk, Diabetes mellitus and risk of breast cancer: a meta-analysis, Int. J. Cancer 121 (4) (2007) 856–862.

[24] E.J. Gallagher, D. LeRoith, Diabetes, cancer, and metformin: connections of metabolism and cell proliferation, Ann. N. Y. Acad. Sci. 1243 (1) (2011) 54–68.

[25] S.R. Plymate, R.C. Hoop, R.E. Jones, L.A. Matej, Regulation of sex hormone-binding globulin production by growth factors, Metabolism 39 (9) (1990) 967–970.

[26] W. Rosner, The functions of corticosteroid-binding globulin and sex hormone-binding globulin: recent advances, Endocr. Rev. 11 (1) (1990) 80–91.

[27] B. van der Burg, G.R. Rutteman, M.A. Blankenstein, S.W. de Laat, E.J. van Zoelen, Mitogenic stimulation of human breast cancer cells in a growth factor-defined medium: synergistic action of insulin and estrogen, J. Cell. Physiol. 134 (1) (1988) 101–108.

[28] C.A. Conover, P.D. Lee, J.A. Kanaley, J.T. Clarkson, M.D. Jensen, Insulin regulation of insulin-like growth factor binding protein-1 in obese and nonobese humans, J. Clin. Endocrinol. Metab. 74 (6) (1992) 1355–1360.

[29] P.F. Bruning, J.M. Bonfrer, P.A. van Noord, A.A. Hart, M. de Jong-Bakker, W.J. Nooijen, Insulin resistance and breast-cancer risk, Int. J. Cancer 52 (4) (1992) 511–516.

[30] G. Yang, G. Lu, F. Jin, Q. Dai, R. Best, X.-O. Shu, J.-R. Chen, X.-Y. Pan, M. Shrubsole, W. Zheng, Population-based, case–control study of blood C-peptide level and breast cancer risk, Cancer Epidemiol. Biomark. Prevent. 10 (11) (2001) 1207–1211.

[31] WHO, Cancer, vol. 2014, 2014.

[32] W.S.H. Chan, Taiwan's healthcare report 2010, EPMA J. 1 (4) (2010) 563–585.

[33] U. Iqbal, P.A. Nguyen, S. Syed-Abdul, H.C. Yang, C.W. Huang, W.S. Jian, M.H. Hsu, Y. Yen, Y.C. Li, Is long-term use of benzodiazepine a risk for cancer? Medicine 94 (6) (2015) e483.

[34] Y.-C. Chen, H.-Y. Yeh, J.-C. Wu, I. Haschler, T.-J. Chen, T. Wetter, Taiwan's National Health Insurance Research Database: administrative health care database as study object in bibliometrics, Scientometrics 86 (2) (2011) 365–380.

[35] P.A. Nguyen, S. Syed-Abdul, U. Iqbal, M.-H. Hsu, C.-L. Huang, H.-C. Li, D.L. Clinciu, W.-S. Jian, Y.-C.J. Li, A probabilistic model for reducing medication errors, PLOS ONE 8 (12) (2013) e82401.


[36] H.P. Administration, Health Surveillance, vol. 2014, Ministry of Health and Welfare, Taiwan, 2014.

[37] C.J. Lloyd, Statistical Analysis of Categorical Data, Wiley, New York, 1999.

[38] A. Grossenbacher, The globalisation of statistical content, Stat. J. IAOS 25 (3) (2008) 133–144.

[39] H. Rosling, Global Population Growth, Box by Box, vol. 2014, TED@Cannes, 2010.

[40] H. Rosling, Religions and Babies, vol. 2014, TedxSummit, 2012.

[41] J. Al-Aziz, N. Christou, I.D. Dinov, SOCR motion charts: an efficient, open-source, interactive and dynamic applet for visualizing longitudinal multivariate data, J. Stat. Educ. 18 (3) (2010) 1–29.

[42] K. Pearson, A. Lee, E.M. Elderton, On the correlation of death-rates, J. R. Stat. Soc. 73 (5) (1910) 6.

[43] M.-Y. Lee, K.-D. Lin, P.-J. Hsiao, S.-J. Shin, The association of diabetes mellitus with liver, colon, lung, and prostate cancer is independent of hypertension, hyperlipidemia, and gout in Taiwanese patients, Metabolism 61 (2) (2012) 242–249.

[44] E.T. Petridou, T.N. Sergentanis, C.N. Antonopoulos, N. Dessypris, I.L. Matsoukis, K. Aronis, A. Efremidis, C. Syrigos, C.S. Mantzoros, Insulin resistance: an independent risk factor for lung cancer? Metabolism 60 (8) (2011) 1100–1106.

[45] M. Inoue, M. Iwasaki, T. Otani, et al., Diabetes mellitus and the risk of cancer: results from a large-scale population-based cohort study in Japan, Arch. Intern. Med. 166 (17) (2006) 1871–1877.

[46] S.C. Larsson, N. Orsini, A. Wolk, Diabetes mellitus and risk of colorectal cancer: a meta-analysis, J. Natl. Cancer Inst. 97 (22) (2005) 1679–1687.