Artificial Neural Networks Approach to Early Lung Cancer Detection
Total Page:16
File Type:pdf, Size:1020Kb
Cent. Eur. J. Med. • 9(5) • 2014 • 632-641 DOI: 10.2478/s11536-013-0327-6 Central European Journal of Medicine Artificial neural networks approach to early lung cancer detection Research Article Krzysztof Goryński*1, Izabela Safian2, Włodzimierz Grądzki3, Michał Piotr Marszałł1, Jerzy Krysiński4, Sławomir Goryński5, Anna Bitner6, Jerzy Romaszko7, Adam Buciński2 1 Department of Medicinal Chemistry, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, 85-089 Bydgoszcz, Poland 2 Department of Biopharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, 85-089 Bydgoszcz, Poland 3 Ward of Diagnostic-monitoring of Tuberculosis and Illness of Lungs, Voivodship Centre of the pulmonology, 85-326 Bydgoszcz, Poland 4 Department of Pharmaceutical Technology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, 85-089 Bydgoszcz, Poland 5 Department of Paliative Medicine, Regional Specialist Hospital in Grudziadz, 86-300 Grudziadz, Poland 6 Chair and Department of Hygiene and Epidemiology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, 85-094 Bydgoszcz, Poland 7 NZOZ Pantamed Sp z o.o. in Olsztyn, ul. Pana Tadeusza 6, 10-461, Olsztyn, Poland Received 10 October 2012; Accepted 5 March 2014 Abstract: Lung cancer is rated with the highest incidence and mortality every year compared with other forms of cancer, therefore early detection and diagnosis is essential. Artificial Neural Networks (ANNs) are “artificial intelligence” software which have been used to assess a few prognostic situations. In this study, a database containing 193 patients from Diagnostic and Monitoring of Tuberculosis and Illness of Lungs Ward in Kuyavia and Pomerania Centre of the Pulmonology (Bydgoszcz, Poland) was analysed using ANNs. Each patient was described using 48 factors (i.e. age, sex, data of patient history, results from medical examinations etc.) and, as an output value, the expected presence of lung cancer was established. All 48 features were retrospectively collected and the database was divided into a training set (n=97), testing set (n=48) and a validating set (n=48). The best prediction score of the ANN model (MLP 48-9-2) was above 0.99 of the area under a receiver operator characteristic (ROC) curve. The ANNs were able to correctly classify 47 out of 48 test cases. These data suggest that Artificial Neural Networks can be used in prognosis of lung cancer and could help the physician in diagnosis of patients with the suspicion of lung cancer. Keywords: Artificial Neural Networks • Cancer diagnosis • Lung cancer• Risk factors © Versita Sp. z o.o 1. Introduction of abnormal cells in the lung. In 2009, about 15% of all cancer cases and 29% of all cancer-related deaths were Lung cancer is a serious, worldwide health and epidemi- due to lung cancer [1,2]. It is rated with the highest death ological problem. It is defined as the uncontrolled growth rate every year and is the second most diagnosed after * E-mail: [email protected] 632 K Goryński et al prostate (28%) and breast (28%) cancers (for men and and Gleason sum. The published literature in the last women, respectively) [1,2]. Lung cancer is most often decade suggests that ANN models have proven to be found in elderly people because it develops over a long valuable tools in reducing the workload of oncologists by period of time. Falling ill with lung cancer peaks after the detecting cancer symptoms and providing decision sup- age of 70 years [3]. port, potentially with the ability to automatically estimate Many lung cancers are diagnosed at an advanced the model on-line. stage, leading to a poor prognosis. Early diagnosis of In this study, the ANN model was successful built lung cancer is therefore important to facilitate treatment and used to achieve a highly accurate set of patients and potential cure whilst the disease is still in its early with developing or already suffering from lung cancer stages [4,5]. An Artificial Neural Network (ANN) is a tool from a wide, healthy population. In other words, ANNs that be used to assist the physician with the diagnosis of can be a useful tool for the selection of the smallest patients with the suspicion of the lung cancer. group with highest risk of developing lung cancer. An ANN is a computational model based on the human brain. ANNs contain some nodes which are connected through weights. Each node receives data 2. Materials and methods from previous nodes, adds it together and outputs data through a nonlinear function, and then propagates data 2.1. Patient characteristics to proceeding nodes. The first neuronal layer of the ANN is the input layer composed of variable number The study was based on retrospectively reviewed of collected data from observation. The next additional clinical records of patients treated at the Diagnostic and neuronal layers compose of hidden layers created to Monitoring of Tuberculosis and Illness of Lungs Ward generate a variable number of numerical combinations. in Kuyavia and Pomerania Centre of the pulmonology The last neuronal layer named output layer generates in Bydgoszcz (Poland) in 2006. The data collected in- the answer (numbers that represent the output). Most of cluded demographic factors (e.g. age, education, place ANN models are based on the idea of supervised train- of residence, etc.), clinical data from biochemical mea- ing. There are two phases in the ANN action: training/ surements (e.g. haemoglobin, creatinine, hematocrit, learning phase and test phase. A validating phase is etc.), results obtained from spirometry tests (e.g. forced still often made. In training/learning phase, input data expiratory volume in 1 second, forced vital capacity, are presented to the ANN and weights are adjusted and etc.) as well as information gathered from interview fixed. In other words, the ANN does indeed learn the (e.g. lung cancer in the family, dyspnea, weight loss, input patterns in the learning phase. In folllowing test etc.) and examination (e.g. auscultatory changes above phase, the data which are not used in the previous pulmonary fields, form of the X-ray examination, etc.) (training/learning) phase are presented to the ANN and done by primary care physician and specialist doctor. the ANN’s outputs are used to estimate its learning. If All variables considered in this study are presented in the estimate of ANN’s action is satisfactory, it can be Table 1. used in its own specific application [6-8]. A total of 193 patients entered the study, aged Many scientists from different fields of technology between 26 and 85 years. Within the study group, the and science use neural networks to solve problems median age was 61 years, 70.5% were men, and more in control, pattern classification, function approxima- than 40% patients live in the provinces. The major- tion and medical diagnosis. In recent times ANNs are ity of the respondents were current cigarette smokers becoming quite popular in medicine, particularly for (91.7%). Diagnosis of lung cancer was reported for 103 clinical diagnosis based on experimental and clinical patients. Baseline characteristics of the study popula- data [9-13]. ANNs are a convenient tool in various tasks tion are shown in Table 2. such as for blood cell classification, EEG, EMG, ECG This work has been approved by the Research Eth- analysis and bone fracture healing assessment, for di- ics Committee at the Collegium Medicum in Bydgoszcz, agnosis, healing, and prognosis in hypertension [14-16]. Nicolaus Copernicus University in Torun, Poland. Recently many publications refer to the application of neural networks to predict and diagnosis breast, ovar- 2.2. ANN Analysis ian, gastrointestinal, bronchial, skin or prostate cancer [17-20]. More often the ANNs performance is compared Artificial neural networks were used for the analysis with traditional screening methods using in oncology data and Statistica v. 8 software (StatSoft, Inc. Stat- e.g. in prostate cancer (prostate specific antigen, PSA), soft, Tulsa, OK., USA) was applied. Statistica Neural serum marker, digital rectal examination, age and race, Networks software randomly divides data set into three 633 Artificial neural networks approach to early lung cancer detection Table 1. Variables considered in the ANN analysis. No Variable name Variable value 1 Age years 2 Height cm 3 Weight kg 4 Sex (1) – female, (2) - male 5 Zubrod score (0) - asymptomatic (Fully active, able to carry on all predisease activities without restriction) (1) - symptomatic but completely ambulatory (Restricted in physically strenuous activity but ambulatory and able to carry out work of a light or sedentary nature. For example, light housework, office work) (2) - symptomatic, <50% in bed during the day (Ambulatory and capable of all self care but unable to carry out any work activities. Up and about more than 50% of waking hours) (3) - symptomatic, >50% in bed, but not bedbound (Capable of only limited self-care, confined to bed or chair 50% or more of waking hours) (4) - bedbound (Completely disabled. Cannot carry on any self-care. Totally confined to bed or chair) 6 Place of residence (1) – capital of province, (2) – city of county, (3) – borough, (4) – village, (5) - homeless 7 Education (0) – N/A, (1) – elementary, (2) – vocational, (3) – secondary, (4) - university 8 Occupational exposure (0) – ND, (1) – absence of exposure, (2) – asbestos, (3) – silicas, (4) - aromatic organic compounds 9 Smoking cigarettes (0) - non-smoker, (1) – smoker 10 Chronic obstructive pulmonary