TAGR: Telehealth Automatically Generated Recommendations
University of the Philippines Manila
College of Arts and Sciences
Department of Physical Sciences and Mathematics

TAGR: Telehealth Automatically Generated Recommendations

A special problem in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science

Submitted by: Kryle Marxel E. Molina
June 2016

Permission is given for the following people to have access to this SP:
Available to the general public: Yes
Available only after consultation with author/SP adviser: No
Available only to those bound by confidentiality agreement: No

ACCEPTANCE SHEET

The Special Problem entitled "TAGR: Telehealth Automatically Generated Recommendations" prepared and submitted by Kryle Marxel E. Molina in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science has been examined and is recommended for acceptance.

Marvin John C. Ignacio, M.Sc. (candidate)
Adviser

EXAMINERS (Approved / Disapproved):
1. Gregorio B. Baes, Ph.D. (candidate)
2. Avegail D. Carpio, M.Sc.
3. Richard Bryann L. Chua, Ph.D. (candidate)
4. Perlita E. Gasmen, M.Sc. (candidate)
5. Ma. Sheila A. Magboo, M.Sc.
6. Vincent Peter C. Magboo, M.D., M.Sc.

Accepted and approved as partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science.

Ma. Sheila A. Magboo, M.Sc.
Unit Head
Mathematical and Computing Sciences Unit
Department of Physical Sciences and Mathematics

Marcelina B. Lirazan, Ph.D.
Chair
Department of Physical Sciences and Mathematics

Leonardo R. Estacio Jr., Ph.D.
Dean
College of Arts and Sciences

Abstract

Telehealth Automatically Generated Recommendations (TAGR) is a module for automatically classifying SMS and e-mail messages with their appropriate tags as required by the National Telehealth Center for their National Telehealth Service Program (NTSP). It utilizes support vector machines in text classification which are configurable in a web-based interface.
The default classifier trained on the base dataset gives an average accuracy of 73% via 10-fold cross validation.

Keywords: natural language processing, text classification, machine learning, support vector machine

Contents

Acceptance Sheet
Abstract
List of Figures
List of Tables
I. Introduction
    A. Background of the Study
    B. Statement of the Problem
    C. Objectives of the Study
    D. Significance of the Project
    E. Scope and Limitations
    F. Assumptions
II. Review of Related Literature
III. Theoretical Framework
    A. National Telehealth Center
    B. Telemedicine
    C. Doctors to the Barrios
    D. Referrals
    E. Machine Learning
    F. Natural Language Processing
    G. N-gram
    H. Term Frequency-Inverse Document Frequency (TF-IDF)
    I. Chi Square (χ²) Feature Selection
    J. Support Vector Machine (SVM)
    K. web2py
    L. scikit-learn
IV. Design and Implementation
    A. Use Case Diagram
    B. Data Flow Diagram
    C. Database Design
    D. Data Dictionary
    E. System Architecture
    F. Technical Architecture
V. Results
    A. General View
    B. Telehealth Nurse View
    C. Administrator View
VI. Discussions
VII. Conclusions
VIII. Recommendations
IX. Bibliography
X. Appendix
    A. Source Code
XI. Acknowledgement

List of Figures

1 Referral workflow
2 Supervised learning process
3 Support vector machine; generated optimal hyperplane
4 Support vector machine with kernel trick
5 Overview use case diagram
6 Administrator use case diagram
7 Top level data flow diagram
8 Explosion of data flow diagram
9 Sub-explosion for 1. Spam blocker and message tagger
10 Sub-explosion for 1.1 Spam blocker
11 Sub-explosion for 1.2 Message tagger
12 Sub-explosion for 2. Verification, Mark as spam
13 Sub-explosion for 2. Verification, Unmark as spam
14 Sub-explosion for 2. Verification, Change message tag
15 Sub-explosion for 3. Classifier configuration
16 Sub-explosion for 3.1 Configure the blocker
17 Sub-explosion for 3.2 Configure the tagger, Train a new model
18 Sub-explosion for 3.2 Configure the tagger, Load a model
19 Sub-explosion for 3.2 Configure the tagger, Choose a model
20 Entity relationship diagram
21 Log in page, TAGR
22 Nurse welcome page, TAGR
23 Edit profile details, TAGR
24 Edit password, TAGR
25 View Messages, TAGR
26 Confirm removal of verified status, TAGR
27 Confirm marking a message as spam, TAGR
28 Verify a message, TAGR
29 View Filtered Messages, TAGR
30 Confirm unmarking a message as spam, TAGR
31 Admin welcome page, TAGR
32 Configure message filter list, TAGR
33 Adding an item to the message filter list, TAGR
34 Deleting an item from the message filter list, TAGR
35 TAGR Settings page, TAGR
36 Train a new model, TAGR
37 Training evaluation, TAGR
38 Manual testing, TAGR
39 Manual testing with prediction, TAGR
40 Save the trained model, TAGR
41 Load a trained model, TAGR
42 Loaded model evaluation, TAGR
43 Choose a model to use, TAGR
44 A different model is deployed, TAGR
45 Hidden input page for simulating new messages, TAGR

List of Tables

1 auth_user table
2 auth_membership table
3 auth_group table
4 t_message table
5 t_filter table
6 t_model table
7 Specifications of the machines used in development and testing
8 Data from National Telehealth Center

I. Introduction

A. Background of the Study

In most countries, providing healthcare to the public is a responsibility of the government. As such, governments have bodies and mechanisms in place for a working public health system. In the Philippines, the Department of Health (DOH) fulfills this role as the country's national health policy maker and regulatory institution [1]. Part of its mandate is attaining universal healthcare, which translates to accessible and equitable quality healthcare for every Filipino. Its mission and vision place emphasis on guaranteeing health, especially for the poor [2]. This raises the issue of ensuring that healthcare is accessible even to those living in remote and rural areas.

There is a disparity in healthcare attention between rural and urban areas, favoring the latter.
This is evidenced by the abundance of health centers and personnel near metropolitan areas in contrast with areas far from urban centers. Hospitals in the National Capital Region (NCR) and Region IV-A make up 17% of all hospitals across all regions, while hospitals in the Autonomous Region in Muslim Mindanao (ARMM) and Region XIII together amount to only 7%. To put this in perspective, Region IV-B has a slightly smaller population than ARMM, but almost twice as many hospitals [3]. Consequently, health workers are more concentrated in Regions III, IV-A and the NCR than in Mindanao.

However, the shortage of health workers does not stop at the level of urban-rural differences. The Philippines, in general, does not have a sufficient number of health professionals to see to the needs of the population. One cited cause is the increased migration of health workers over the years [4].

Despite the shortage of health workers, the DOH still attempts to address the problem of distribution with programs like Doctors to the Barrios (DttB). The DttB program deploys physicians to underserved and difficult-to-access municipalities for at least two years [5]. The University of the Philippines College of Medicine (UPCM) has also looked into mandatory return of service programs as a way to bring more doctors to where they are needed the most [6]. While these programs may alleviate the problem of shortage and distribution of personnel, they do not involve the construction of more health centers to accommodate the communities. Rural physicians might not have all the facilities and resources that they may need during their deployment.

The University of the Philippines Manila - National Telehealth Center (NThC) performs research and development to improve