DE-IDENTIFIED MULTIDIMENSIONAL MEDICAL RECORDS FOR DISEASE POPULATION DEMOGRAPHICS AND IMAGE PROCESSING TOOLS DEVELOPMENT

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the

Graduate School of The Ohio State University

By

Barbaros Selnur Erdal, D.D.S., M.S.

*****

Electrical and Computer Engineering

The Ohio State University

2011

Dissertation Committee:

Prof. Bradley D. Clymer, Adviser
Prof. Elliott D. Crouser
Prof. Umit V. Catalyurek
Prof. Kun Huang

© Copyright by

Barbaros Selnur Erdal

2011

ABSTRACT

Recently, the National Institutes of Health (NIH) has outlined its scientific priorities in a strategic plan, the "NIH Roadmap for Medical Research". In direct alignment with these priorities, many academic and research-oriented medical institutions across the United States conduct numerous clinical and translational research studies on an ongoing basis. From a personalized health care and translational research perspective, efforts of this nature quite often span multiple departments or even institutions. We consider these activities as a knowledge and information flow which takes place around multidimensional, heterogeneous clinical and research data that are collected from disparate sources.

The primary objective of the research and development described in this thesis is to provide an integrative platform where multidimensional data from multiple disparate sources can be easily accessed, visualized, and analyzed. We believe that the ability to execute such truly integrative queries, visualizations, and analyses across multiple data types is critical to the ability to execute highly effective clinical and translational research. Therefore, to address the preceding gap in knowledge, we introduce a model computational framework that is intended to support the integrative query, visualization, and analysis of structured data, narrative text, and image data sets in support of translational research activities. The introduced framework also aims to address the challenges posed by regulatory compliance, patient privacy/confidentiality concerns, and the need to facilitate multicenter research paradigms.

Dedicated to Sevinc and Deniz

ACKNOWLEDGMENTS

I would like to thank several people who believed in me and supported me throughout my doctoral studies and particularly during my dissertation. First and foremost, I would like to express my sincere thanks to my mentors: my advisor Dr. Bradley Clymer and Dr. Elliott Crouser. They never quit believing in me and supported me, even when nobody thought the things I wanted to accomplish were possible. Because of their guidance, I am a better researcher and professional. I would also like to thank my other committee members: Dr. Umit Catalyurek for his continuous support and Dr. Kun Huang for his constructive feedback. Also, earnest thanks and appreciation to Dr. Philip Payne, Dr. Nathan Hall, and Dr. Michael Knopp for their support and collaboration in many research projects.

Special thanks to Dr. Hakan Ferhatosmanoglu, Dr. Han-Wei Shen, and Dr. Umit Catalyurek for their support and encouragement in the early stages of my career. If it weren't for their support, this degree wouldn't have been possible. Great appreciation is also dedicated to my supervisors from the James Cancer Hospital, Dr. Miguel Villalona and Dr. Gregory Otterson, for bringing me into the OSU Medical Center.

Also thanks to fellow colleagues Dr. Mehmet Kale, Dr. Lee Cooper, Dr. Olcay Sertel, and Brian Myers for being great friends and supporters; to my supervisor, mentor, and great friend Dr. Felix Liu, for being there at every stage of this journey; and to all my fellow water polo teammates, for giving me a mental escape and the occasional battle scars.

Last but not least, heartfelt thanks go to my mom, Selma; my sister Funda; my aunt Rengin; my uncle Engin; my stepdad Mustafa; my in-laws Tumer and Memnune; my brother-in-law Kurtulus; and my best friend Akif for helping me get to this point and providing me a great deal of emotional support. Finally, heartfelt love and appreciation is offered to my wife Sevinc and my son Deniz, whose loving support and encouragement carried me through the tough times and reminded me of the priorities in life.

Thank you all!

VITA

August 01, 1972 ...... Born - Istanbul, Turkey

1991-1997 ...... D.D.S., Dentistry, Ege University

1997-1998 ...... Dentist (Private Practice), Karsiyaka, Izmir, Turkey

2000-2001 ...... Research Assistant, James Cancer Hospital / Division of Hematology and Oncology, The Ohio State University Medical Center

2001-2005 ...... M.S., , The Ohio State University

2005-2009 ...... Senior Systems Consultant, The Ohio State University Medical Center

2006-2009 ...... M.S., Electrical & Computer Engineering, The Ohio State University

2009-present ...... Ph.D.c., Electrical & Computer Engineering, The Ohio State University

2009-present ...... Biomedical Informatics Consultant, The Ohio State University Medical Center

PUBLICATIONS

Research Publications

Peer Reviewed Journal Publications

Nadella P, Shapiro C, Otterson GA, Hauger M, Erdal S, Kraut E, Clinton S, Shah M, Stanek M, Monk P and Villalona-Calero MA, Pharmacobiologically Based Scheduling of Capecitabine and Docetaxel Results in Antitumor Activity in Resistant Human Malignancies, Journal of Clinical Oncology, vol. 20(11), pp. 2616–2623, 2002.

Altiparmak F, Ferhatosmanoglu H, Erdal S and Trost DC, Information Mining over Heterogeneous and High Dimensional Time Series Data in Clinical Trials Databases, IEEE Transactions on Information Technology in Biomedicine, vol. 10(2), pp. 254–263, 2006.

Erdal S, Catalyurek UV, Payne P, Saltz J, Kamal J and Gurcan MN, A Knowledge-Anchored Integrative Image Search and Retrieval System, J Digit Imaging, vol. 22(2), pp. 166–182, 2009.

Stawicki SP, Schuster D, Liu J, Kamal J, Erdal S, Gerlach AT, Whitmill ML, Lindsey DE, Thomas YM, Murphy C, Steinberg SM and Cook CH, Introducing the glucogram: Description of a novel technique to quantify clinical significance of acute hyperglycemic events, OPUS 12 Scientist, vol. 3(1) pp. 1–5, 2009.

Stawicki SPA, Schuster D, Liu J, Kamal J, Erdal S, Gerlach AT, Whitmill ML, Lindsey DE, Murphy C, Steinberg SM and Cook CH, The glucogram: A new quantitative tool for glycemic analysis in the surgical intensive care unit, Int. J. Crit. Ill. & Inj. Sci, vol. 1(1) pp. 5–12, 2011.

Erdal BS, Liu J, Ding J, Chen J, Marsh CB, Kamal J and Clymer BD, A Database De-identification Framework to Enable Direct Queries on Medical Data for Secondary Use, Methods Inform Med, 2011.

Peer Reviewed Conference/Symposia Publications and Demonstrations

Erdal S, Ozturk O, Armbruster A, Ferhatosmanoglu F and Ray WC, A Time Series Analysis of Microarray Data, Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04), p. 366, 2004.

Erdal S, Catalyurek UV, Kamal J, Saltz J and Gurcan MN, Information Warehouse Application of caGrid: A Prototype Implementation, In caBIG, Cancer Biomedical Informatics Grid, 2007 Annual Meeting, Washington, D.C., 2007.

Erdal S, Catalyurek UV, Saltz J, Kamal J and Gurcan MN, Flexible Patient Information Search and Retrieval Framework: Pilot Implementation, In Proceedings - SPIE, The International Society for Optical Engineering, vol. 6516, p. 6516OI

Erdal S, Catalyurek UV, Saltz J, Kamal J and Gurcan MN, Integrating a PACS System to caGrid: A De-identification and Integration Framework, In SIIM, Society of Imaging Informatics in Medicine, Providence, RI, 2007.

Altiparmak F, Ozturk O, Erdal S, Ferhatosmanoglu H and Trost DC, Combining Mining Results from Multiple Sources in Clinical Trials and Microarray Applications, The MMIS workshop at the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Jose, CA, 2007.

Altiparmak F, Erdal S, Ozturk O and Ferhatosmanoglu H, A Multi-Metric Similarity Based Analysis of Microarray Data, The IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 317–324, Fremont, CA, 2007.

Liu J, Erdal S, Silvey SA, Ding J, Marsh CB and Kamal J, Toward a Fully De-identified Biomedical Information Warehouse, In AMIA, American Medical Informatics Association Annual Proceedings, pp. 370–374, 2009.

Erdal S, Clymer BD, Liu J, Kamal J, Knopp MV and Hall N, Integrative Searches on HIS, PACS and RIS with Utilization of an Information Warehouse, In SIIM, Society of Imaging Informatics in Medicine, Washington, D.C., 2011.

Peer Reviewed Conference/Symposia Abstracts

Altiparmak F, Ozturk O, Erdal S, Ferhatosmanoglu H and Trost DC, Information Mining over Heterogeneous Microarray and Clinical Data, LSS Computational Systems Bioinformatics Conference, 2006.

Erdal S and Kamal J, An Indexing Scheme for Medical Free Text Searches: A Prototype, AMIA, American Medical Informatics Association Annual Proceedings, p. 918, 2006.

Ding J, Erdal S, Dhaval R and Kamal J, Augmenting Oracle Text with the UMLS for Enhanced Searching of Free-Text Medical Reports, AMIA, American Medical Informatics Association Annual Proceedings, p. 940, 2007.

Erdal S, Ding J, Osborn C, Mekhjian H and Kamal J, ICD9 Code Assistant: A Prototype, AMIA, American Medical Informatics Association Annual Proceedings, p. 950, 2007.

Ding J, Erdal S, Borlawsky T, Liu J, Golden-Kreutz D, Kamal J and Payne PRO, The Design of a Pre-Encounter Clinical Trial Screening Tool: ASAP, AMIA, American Medical Informatics Association Annual Proceedings, p. 631, 2008.

Rogers P, Erdal S, Santangelo J, Liu J, Schuster S and Kamal J, Use of Synthesized Data to Support Complex Ad-hoc Queries in an Enterprise Information Warehouse: A Diabetes Use Case, AMIA, American Medical Informatics Association Annual Proceedings, p. 1029, 2008.

Santangelo J, Erdal S, Wellington L, Mekhjian H and Kamal J, Identification of potential surgical site infections leveraging an enterprise clinical information warehouse, AMIA, American Medical Informatics Association Annual Proceedings, p. 941, 2008.

Erdal S, Rogers P, Santangelo J, Buskirk J, Ostrander M, Liu J and Kamal J, Data delivery workflow in an academic information warehouse, AMIA, American Medical Informatics Association Annual Proceedings, p. 942, 2008.

Kamal J, Silvey SA, Buskirk J, Dhaval R, Erdal S, Ding J, Ostrander M, Borlawsky T, Smaltz DH and Payne PRO, Innovative applications of an enterprise-wide information warehouse, AMIA, American Medical Informatics Association Annual Proceedings, p. 1134, 2008.

Erdal S, Liu J, Silvey SA, Kamal J and Hall N, Delivering Images and Searching PACS Metadata: The Data Warehouse Approach, AMIA, American Medical Informatics Association Annual Proceedings, p. 837, 2009.

Waugh C, Erdal S, Deiter RJ, Buskirk J, Dyta R, Liu J, Marsh C, Wood K, Mastronarde J and Kamal J, Pulmonary Portal: Study Management through Web Portals, AMIA, American Medical Informatics Association Annual Proceedings, p. 1074, 2009.

Erdal S, Clymer BD, Liu J, Kamal J, Knopp MV and Hall N, X-Ray and CT Dose Calculations: Operations and Safety, AMIA, American Medical Informatics Association Annual Proceedings, p. 1032, 2010.

Erdal S, Clymer BD, Liu J, Kamal J, Knopp MV and Hall N, Integrative searches on PACS metadata with utilization of an Information Warehouse, J Nucl Med, 51 (Supplement 2), p. 1336, 2010.

Liu J, Chen J, Ding J, Erdal S, Kellough D, Huebner K, Shapiro C, Ramirez M-T and Kamal J, Integration of Tissue Microarray Data in an Information Warehouse to Enable Breast Cancer Research, AMIA, American Medical Informatics Association Annual Proceedings, p. 1149, 2010.

Kahmann S, Erdal S, Liu J, Kamal J and Clymer B, Generalizable Session Dependent De-identification Methods, AMIA, American Medical Informatics Association Annual Proceedings, p. 1751, 2011.

Liu J, Erdal S and Kamal J., A Step against Re-identification: Keep Low Volume Query Result inside Your Data Source, AMIA, American Medical Informatics Association Annual Proceedings, p. 1866, 2011.

Erdal S, Liu J, Craig K, Kamal J and Clymer BD, Proxy PACS Servers for Image Delivery through an Information Warehouse, AMIA, American Medical Informatics Association Annual Proceedings, p. 1752, 2011.

Erdal S, Clymer BD and Crouser E, Increasing Prevalence of Sarcoidosis in a Midwest USA Metropolitan Health System, Am J Respir Crit Care Med, vol. 183, p. A5636, 2011.

Erdal S, Crouser E and Clymer BD, Quantitative Computerized Two-Point Correlation Analysis of Lung CT Scans Correlates with Pulmonary Function in Pulmonary Sarcoidosis, Am J Respir Crit Care Med, vol. 183, p. A5644, 2011.

FIELDS OF STUDY

Major Field: Electrical & Computer Engineering

TABLE OF CONTENTS

Page

Abstract ...... ii

Dedication ...... iii

Acknowledgments ...... iv

Vita ...... vi

List of Tables ...... xiv

List of Figures ...... xv

Chapters:

1. Introduction ...... 1

1.1 Introduction ...... 1
1.2 Statement of Problem ...... 3
1.3 Organization of the Thesis ...... 5

2. A Knowledge-Anchored Integrative Image Search and Retrieval System . . . . 7

2.1 Summary ...... 8
2.2 Introduction ...... 8
2.3 Background and Significance ...... 10
2.3.1 Information Needs in the Clinical and Translational Research Domains ...... 11
2.3.2 Grid-computing Electronic Data Interchange Platforms ...... 13
2.3.3 Knowledge-anchored Information Retrieval ...... 14
2.3.4 Image Retrieval Tools ...... 17
2.3.5 OSUMC Information Warehouse ...... 18

2.4 Methods ...... 19
2.4.1 Motivating Use Case ...... 21
2.4.2 Three-Tiered Software Framework ...... 25
2.5 Results ...... 34
2.6 Discussion ...... 36
2.7 Conclusion ...... 38

3. A Database De-identification Framework to Enable Direct Queries on Medical Data for Secondary Use ...... 41

3.1 Summary ...... 41
3.2 Introduction ...... 42
3.3 Background ...... 45
3.3.1 The OSUMC Honest Broker Protocol ...... 48
3.4 Design Objectives ...... 49
3.5 System Description ...... 49
3.5.1 Operations on the source system ...... 51
3.5.2 Operations on the destination system ...... 53
3.5.3 System Validation ...... 57
3.5.4 Test Environment and Setup ...... 59
3.6 Test Queries and Results ...... 60
3.6.1 Generalizability ...... 62
3.6.2 Reliability ...... 62
3.7 Discussion ...... 64
3.8 Conclusion ...... 69

4. Unexpectedly High Prevalence of Sarcoidosis in a Representative U.S. Metropolitan Population ...... 73

4.1 Summary ...... 73
4.2 Introduction ...... 74
4.3 Methods ...... 75
4.4 Results ...... 76
4.4.1 Demographics of the regional population ...... 76
4.4.2 Comparison of sarcoidosis prevalence to that of other rare lung diseases ...... 77
4.4.3 Changes in prevalence of sarcoidosis over time in our health care system ...... 78
4.4.4 Estimate of sarcoidosis prevalence in Columbus, Ohio ...... 78
4.5 Discussion ...... 79

5. Computer Analysis of Chest CT for Sarcoidosis ...... 84

5.1 Summary ...... 84
5.2 Introduction ...... 85
5.3 Methods ...... 87
5.3.1 Sarcoidosis patient population ...... 87
5.3.2 CT Image Analysis ...... 88
5.3.3 Statistical methods ...... 89
5.4 Results ...... 89
5.4.1 Two-point correlation analysis of CT images reduces background signal from normal lung structures ...... 89
5.4.2 LTS strongly correlates with forced vital capacity (FVC) and total lung capacity (TLC) ...... 90
5.4.3 Correlations between LTS and pulmonary function remain significant after reducing image intensity precision ...... 90
5.5 Discussion ...... 91

6. Conclusions ...... 95

Bibliography ...... 99

LIST OF TABLES

Table Page

2.1 Combining ICD9 Codes for "Malignant Neoplasm of Trachea, Bronchus, and Lung" with Radiology Reports that Contain Concept Corresponding to "Lung Nodules" along with Pathology Reports that Contain Concept Corresponding to "Carcinoma." ...... 39

2.2 Combining ICD9 Codes for "Coagulation Defects" with Radiology Reports that Contain Concept Corresponding to "Pulmonary Embolism." ...... 40

3.1 Tables de-identified during testing ...... 59

3.2 Queries with different forms and parameters were used during performance evaluations. Each query is repeated in a non-sequential order with different parameters...... 71

3.3 Statistics on pseudorandom number generation. The minimum pass value for a given test is 29 for sequences of size 32 bits; the minimum pass rate for the random excursion (variant) test is approximately 2 for a sample size of 3. The number of sequences used was 86400. We had a 100% pass rate in all tests. ...... 72

4.1 Demographics of the Columbus, Ohio patient population compared to the U.S. in 2010. ...... 77

4.2 Sarcoidosis Prevalence Compared to Other Rare Lung Diseases...... 77

LIST OF FIGURES

Figure Page

1.1 Some of the data sources and analysis tools that support clinical and translational research at OSUMC. ...... 2

2.1 Illustration of translational research information flow model...... 11

2.2 Conceptual model for our software framework and model system implementation. ...... 20

2.3 The multitier implementation of our framework provides a means to implement a Grid-based or web-based electronic data interchange platform within a service-oriented architecture. End-user access to privileged data is managed in one or more ways: 1) a fully privileged user within the institutional firewall (left hand side) can access the data in any way preferred; 2) an external user (located outside the firewall, right hand side) is subject to additional restriction in order to access data; and 3) the multitier service-oriented approach allows for the easy deployment of custom services. ...... 22

2.4 By first querying for available meta-data in one or more relational databases, users are able to identify patients of interest; later corresponding images can be retrieved from a PACS...... 23

2.5 The interface provides interactive assistance to users in order to map keywords used during query formulation to appropriate diagnosis codes (ICD9-CM). Once a query is executed, users may browse the result sets using a hierarchical "drill down" model. ...... 24

2.6 The Ontology Tools Package (OTP) allows users to interact with a local UMLS knowledge collection instance, diagnostic databases, and text report databases. Users can query OTP with a keyword or textual phrase, which is then mapped to one or more concept codes derived from text mining approaches informed by the UMLS and using MMTx. Once users finalize a set of targeted search concepts, text reports that contain those concepts are queried and returned. Users can also retrieve ICD9-CM codes that correspond to their query keywords, and use those codes to query tables containing structured data. ...... 29

3.1 Data Request Process. Top: normal process with IRB review. Bottom: HBP process...... 43

3.2 Schematic overview of clinical data from identifiable to de-identified repositories. ...... 44

3.3 De-identification process across databases; on the left, operations on the source database system; on the right, operations on the destination database system. ...... 50

3.4 Overview of identifier de-identification process...... 55

3.5 Average query execution times. De-identified (red) vs. Limited (blue). These queries need to execute in under one minute in our source identifiable environment. Detailed descriptions of the queries, along with their syntax, are provided in the Appendix. ...... 61

4.1 Sarcoidosis age distribution, 1995-2010. The average age of the sarcoidosis patients was 48 yrs, with the vast majority falling in the 3rd (light blue), 4th (dark blue), and 5th (red) decades. ...... 79

4.2 Changes in gender distribution between 1995 and 2010. Females are represented in blue and males in red. ...... 80

4.3 Changes in race distribution between 1995 and 2010; overall race distribution. African American (blue) and white (red) races represented the majority of sarcoidosis patients, reflecting regional demographics. ...... 81

4.4 Sarcoidosis prevalence vs. lung cancer prevalence over time. Note that these trend lines were derived from patients residing in zip codes within Franklin County (Columbus), OH, to reduce bias relating to changing referral patterns. The results here are based upon actual US Census Data from 1990, 2000 and 2010 as well as the census estimates for intervening years. ...... 82

5.1 Schematic representation of our two-point correlation function based approach (LTS). (a) LTS accepts chest CT studies in DICOM format as input; (b) using Hounsfield units, the lungs are segmented, and the segmented lung volume is raster scanned; (c) during this scan, each pixel is compared to its neighbors at various distances within a threshold; (d) this process is repeated for the entire lung segment; (e) the mismatches for each pixel are summed at the individual pixel level and later added to the overall volume of mismatches. ...... 87

5.2 Two-point correlation analysis of CT images highlights diseased lung. The image on the left is produced by filtering out the tissues other than the lungs. The image on the right is a "result" image which shows the calculated features for every pixel from the source image. Areas marked by "1" (green arrows) highlight example ROIs with enhanced discrimination between normal and abnormal lung tissue. Areas marked by "2" (red arrows) highlight example ROIs with enhanced discrimination between "normal" blood vessels and adjacent "diseased" lung. ...... 90

5.3 Correlation between CT image score and lung function. The score calculated from the CT images reflected the amount of irregularity within the lungs (that is, the percentage of irregular (textured) lung within the overall lung volume for the given CT image). Higher CT image scores correlated well with lower percent-predicted FVC (r = −0.92) and percent-predicted TLC (r = −0.88). LTS did not correlate well with percent-predicted DLCO (r = −0.15). ...... 91

CHAPTER 1

INTRODUCTION

1.1 Introduction

Recently, the National Institutes of Health (NIH) has outlined its scientific priorities in a strategic plan, the "NIH Roadmap for Medical Research" [1]. From a personalized health care and translational research perspective, many of these priorities are in direct alignment with The Ohio State University Medical Center's (OSUMC) own mission areas: patient care, research, and education. At OSUMC, numerous clinical and translational research studies are conducted on an ongoing basis, and these efforts span multiple departments and institutions. We consider these clinical and translational research activities as a knowledge and information flow which takes place around multidimensional, heterogeneous clinical and research data that are collected from disparate sources.

Figure 1.1 illustrates a conceptual model of clinical and translational research information flow, available data sources, and some of the available analysis tools for researchers at OSUMC (who are supported by an academic information warehouse). To better serve the clinical and translational research community, our broader aims include: 1) providing integrative tools that comply with regulatory policies; 2) enabling networking as well as collaborative studies among researchers and institutions; and 3) providing new computer-assisted diagnosis (CAD) platforms that enable correlative analysis of multidimensional, heterogeneous data in a clinical and translational research setting.

Figure 1.1: Some of the data sources and analysis tools that support clinical and translational research at OSUMC.

Within academic medical centers such as OSUMC, large amounts of multidimensional, heterogeneous data are collected electronically on an ongoing basis. These data include clinical parameters derived during the patient care process, as well as financial and operational information. Clinical data can take many forms, including but not limited to: (1) structured, codified data elements; (2) semi-structured or unstructured narrative text; and (3) multimodal images [2]. While such data are readily available to clinical providers and administrators, accessing the same data for research or business intelligence purposes is often a challenge, usually because of regulatory compliance requirements and concerns over patient privacy and confidentiality. Furthermore, data stored in operational systems are not necessarily structured in a manner that supports integrative longitudinal or class-based query and analysis, a requirement that often exists in the research context [3]. Therefore, even when regulatory and patient privacy and confidentiality concerns are adequately addressed, it often remains difficult to query and access such data for research and business intelligence. One way institutions have tried to address this problem is by extracting clinical, operational, and financial data from source systems and subsequently storing this information in a centralized data warehouse that uses a data model optimized for longitudinal and/or class-based queries [4, 5]. However, imaging data sets are not usually physically stored in such warehouses because of concerns over data storage capacity and are instead commonly stored and managed using a separate Picture Archiving and Communication System (PACS). For purposes of clinical [6], translational, and educational research, the integration and retrieval of image data along with structured data and narrative text is highly desirable [7, 8].
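To make the distinction concrete, the following sketch shows the kind of longitudinal, class-based query that such a warehouse data model is optimized for. The schema, table names, and sample rows are hypothetical simplifications for illustration only, not the actual OSUMC warehouse design.

```python
import sqlite3

# Hypothetical, simplified warehouse schema (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients    (pid INTEGER PRIMARY KEY, birth_year INTEGER);
CREATE TABLE diagnoses   (pid INTEGER, icd9 TEXT, dx_date TEXT);
CREATE TABLE lab_results (pid INTEGER, test TEXT, value REAL, lab_date TEXT);
""")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [(1, 1960), (2, 1975)])
conn.executemany("INSERT INTO diagnoses VALUES (?, ?, ?)",
                 [(1, "135", "2009-03-01"),     # ICD9 135: sarcoidosis
                  (2, "162.9", "2010-06-15")])  # ICD9 162.9: lung cancer
conn.executemany("INSERT INTO lab_results VALUES (?, ?, ?, ?)",
                 [(1, "FVC", 3.1, "2009-04-01"),
                  (1, "FVC", 2.8, "2010-04-01")])

# Class-based selection (all sarcoidosis patients) combined with a
# longitudinal view of each selected patient's pulmonary function tests.
rows = conn.execute("""
SELECT d.pid, l.lab_date, l.value
FROM diagnoses d JOIN lab_results l ON d.pid = l.pid
WHERE d.icd9 = '135' AND l.test = 'FVC'
ORDER BY d.pid, l.lab_date
""").fetchall()
print(rows)  # -> [(1, '2009-04-01', 3.1), (1, '2010-04-01', 2.8)]
```

In a transactional source system the same question would typically require joining across many encounter-oriented tables; the warehouse model collapses it into a single patient-centric join, which is what makes self-service research queries tractable.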

1.2 Statement of Problem

The primary objective of the research and development described in this thesis is to provide an integrative platform where multidimensional data from multiple disparate sources can be easily accessed, visualized, and analyzed. We believe that the ability to execute such truly integrative queries, and to enable visualizations and analyses across multiple data types, is critical to the ability to conduct highly effective clinical and translational research. Therefore, to address the preceding gap in knowledge, we introduce a model computational framework that is intended to support the integrative query, visualization, and analysis of structured data, narrative text, and image data sets in support of translational research activities. The introduced framework also aims to address the challenges posed by regulatory compliance, patient privacy/confidentiality concerns, and the need to facilitate multicenter research paradigms. The model we introduce is motivated by two types of translational research-oriented end users, specifically:

(1) clinical researchers who need to track their patients through a unified and simplified interface which presents all types of data (narrative, structured, and image) originating from disparate source systems; and

(2) researchers who are seeking correlations between phenotypic and genotypic parameters of a given patient dataset from a personalized health care perspective.

In either scenario, such integration requires the ability to locate patients' data within multiple source systems (e.g., for patients who are enrolled on a given clinical trial for sarcoidosis, bring all pulmonary function test (PFT) results and chest computed tomography (CT) images where the primary care physician's dictated notes mention "shortness of breath" within the text). This can be achieved by utilizing some combination of structured data and narrative text, and then subsequently querying a PACS system to identify and obtain image sets for such patients that are potentially generated by one or more modalities (e.g., computed tomography (CT), magnetic resonance imaging (MRI), etc.) and that correspond to the desired anatomic location(s).
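A minimal sketch of this two-stage workflow follows. The record structures, field names, and in-memory "PACS index" are hypothetical stand-ins for the actual warehouse and PACS interfaces, chosen only to show the shape of the query.

```python
# Hypothetical records standing in for warehouse and PACS metadata.
trial_enrollment = {101, 102}  # patient IDs enrolled on the sarcoidosis trial
notes = [
    {"pid": 101, "text": "Patient reports shortness of breath on exertion."},
    {"pid": 102, "text": "No respiratory complaints today."},
]
pacs_index = [
    {"pid": 101, "accession": "A-001", "modality": "CT",  "body_part": "CHEST"},
    {"pid": 101, "accession": "A-002", "modality": "MRI", "body_part": "HEAD"},
    {"pid": 103, "accession": "A-003", "modality": "CT",  "body_part": "CHEST"},
]

def find_study_accessions(phrase, modality, body_part):
    # Stage 1: combine structured data (trial enrollment) with a
    # narrative-text filter to identify the patients of interest.
    pids = {n["pid"] for n in notes
            if n["pid"] in trial_enrollment
            and phrase.lower() in n["text"].lower()}
    # Stage 2: query the PACS metadata for matching image studies.
    return [s["accession"] for s in pacs_index
            if s["pid"] in pids
            and s["modality"] == modality
            and s["body_part"] == body_part]

print(find_study_accessions("shortness of breath", "CT", "CHEST"))  # -> ['A-001']
```

In the real system the second stage would be expressed as a DICOM query against the PACS rather than a list comprehension, but the control flow, structured-plus-text filtering first, image retrieval second, is the same.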

Based on our preceding strategic goals and broader aims, our specific aims in this thesis are designed to address the following specific obstacles, as well as the gaps in knowledge where innovation is needed:

1) Develop integrative interfaces that enable correlative analysis between phenotypic data (e.g., radiographic images, clinical disease manifestations) and laboratory data (e.g., pathology), thereby improving clinical and translational research discovery throughput.

2) Provide HIPAA and local IRB compliance at the data sources, so that researchers can serve themselves while performing cost-effective and timely research.

3) Evaluate the usability of aims 1 and 2, and provide applications around motivating use cases.

4) Demonstrate integrative image search capabilities to support development of image analysis techniques around motivating use cases.

1.3 Organization of the Thesis

For the remainder of this thesis, we frame our discussions around the motivating use case of pulmonary diseases, which are treated at the OSUMC Davis Heart and Lung Research Institute (DHLRI), where pulmonary medicine researchers work closely with pulmonary patients. From a personalized health care perspective, research studies are conducted and data are gathered at both the genotypic and phenotypic levels. Institutional data gathered on pulmonary patients include microarray, tissue, clinical notes such as radiology and pathology reports, PFTs, as well as image data. At the phenotypic level, the combination of CT imaging studies and PFTs is the gold standard for diagnosing and staging diseases such as sarcoidosis; and the integration with collected genotypic-level data (i.e., gene expression data extracted from pathology slides) is, in many cases, a necessity for precise classification and a better understanding of the disease's diagnostic and prognostic characteristics. Based on our preceding specific aims and our motivating use case, we have defined the following specific aims:

Aim 1: To provide an integrative, open source framework where data from basic and clinical research data sources can be accessed and visualized from a unified interface. We address this aim in Chapter 2.

Aim 2: To provide HIPAA and local IRB compliance at the data sources, so that researchers can serve themselves while performing cost-effective and timely research through secondary use of clinical data. We address this aim in Chapter 3.

Aim 3: To demonstrate applications of aims 1 and 2 around the motivating case of pulmonary diseases. We address this aim in Chapter 4.

Aim 4: To further demonstrate integrative image search capabilities to support development of image analysis techniques for the motivating case of pulmonary diseases. We address this aim in Chapter 5.

CHAPTER 2

A KNOWLEDGE-ANCHORED INTEGRATIVE IMAGE SEARCH AND RETRIEVAL SYSTEM

In this chapter, we introduce a framework that is the first of its kind. This data integration methodology has significant impact on research applications and beyond. Being able to query and retrieve clinical images and related metadata from PACS in bulk allows for responding to crucial queries related to research, operations, quality, and safety, as well as business intelligence. In addition, it provides timely and cost-effective access to clinical data for secondary use. This framework enables otherwise unanswerable queries for many different types of users, such as researchers, business administrators, system administrators, management engineers, medical students, and clinicians. The main use cases picked as examples in this chapter are deliberately simplified for readability; in real-world scenarios, much more complex cases can be addressed with this methodology. For example, information related to patients' scheduling (operational, business intelligence), medications, lab values, allergies (clinical), and payor (for clinical trials, clinical research) can easily be requested together as part of a query that corresponds to subjects on cancer clinical trials. Or, scheduling information along with radiation strength, duration, and time stamps for each image series can easily be requested for a study involving quality assessment of PET/CT scans and patient safety. Whether it is for research, education, business intelligence, operations, or safety, gathering data sets and the statistical information derived from them is crucial for institutions.

2.1 Summary

Clinical data that may be used in a secondary capacity to support research activities are regularly stored in three significantly different formats: 1) structured, codified data elements; 2) semi-structured or unstructured narrative text; and 3) multimodal images. In this chapter we describe the design and testing of a computational system that is intended to support the ontology-anchored query and integration of such data types from multiple source systems. Additional features of the described system include: 1) the use of Grid services-based electronic data interchange models in order to enable the use of our system in multisite settings; and 2) the use of a software framework intended to address both potential security and patient confidentiality concerns that arise when transmitting or otherwise manipulating potentially privileged personal health information. We frame our discussion within the specific experimental context of the concept-oriented query and integration of correlated structured data, narrative text, and images for cancer research.

2.2 Introduction

Within academic medical centers, large volumes of multidimensional, heterogeneous data are collected on an ongoing basis. These data include patient clinical, financial, and operational information. Clinical data can take many forms, including but not limited to: 1) structured data; 2) narrative text; and 3) images [2]. While such data are readily available to clinical providers and administrators, retrieving the same data for secondary uses is often a challenge, generally due to concerns over patient privacy and confidentiality. Moreover, data in operational systems are not necessarily organized in a way that supports integrative longitudinal or class-based query and analysis. Hence, even when patient privacy and confidentiality concerns are sufficiently addressed, it often remains challenging to query and access such data for secondary use. One technique institutions have used to address this problem is extracting clinical, operational, and financial data from source systems and storing them in an enterprise data warehouse whose data model is optimized for longitudinal and/or class-based queries [4, 5]. Imaging data sets are not usually stored in such warehouses due to concerns over storage capacity, and are commonly stored and managed using a PACS. Yet, for purposes of clinical [6], translational, and educational research [7, 9, 10], the retrieval of image data integrated with structured and text data is highly desirable [7, 8].

There are several examples in the current literature of integrative and ontology-anchored image search or query tools [3, 11–13]; however, to the best of our knowledge, none of these tools has been shown to also support the simultaneous query and subsequent integration of image data sets with structured data and narrative text. We believe that enabling the execution of such truly integrative, ontology-anchored queries across multiple data types is critical to the ability to execute highly effective clinical and translational research. Therefore, in order to address the preceding gap in knowledge, we have formulated a model computational system that supports the integrative query of structured data, narrative text, and image data sets in support of research activities. This system has also been designed to address the challenges posed by regulatory compliance, patient privacy/confidentiality concerns, and the need to facilitate multicenter research paradigms. The model we describe is motivated by two types of research-oriented end-users, specifically: 1) clinical researchers who need to perform queries such as “find all patients with brain CT and MRI images who have a prior medical history of congestive heart failure, high blood pressure, and diabetes”; and 2) imaging informatics researchers who need to obtain image sets defined by both a specific anatomic location and phenotypic parameters in order to evaluate techniques such as computer-aided diagnosis algorithms [14]. In either scenario, such a query requires the ability to locate patients with the specified phenotypic parameters, utilizing some combination of structured data and narrative text, and then subsequently to query a PACS to identify and obtain image sets for those patients that are potentially generated by one or more modalities (e.g., CT, MRI) and that correspond to the desired anatomic location(s).

Given the preceding motivation and use cases, in the following sections of this chapter we: 1) provide pertinent and contributing background material concerning: a) information needs in the clinical and translational research domains; b) Grid-computing electronic data interchange platforms; c) ontology-anchored information retrieval techniques; d) image retrieval tools; e) applicable regulatory compliance and patient privacy/confidentiality concerns that must be addressed when using clinical data for research purposes; and f) the specific experimental context for our model formulation, the Ohio State University Medical Center (OSUMC) Information Warehouse (IW); 2) describe the methods and system design approaches used for our model formulation process; 3) report upon initial feasibility evaluation results relating to the described model; and 4) describe the implications, limitations, and next steps for our work.

2.3 Background and Significance

In this section, we summarize several areas that contribute to or otherwise inform the model formulation work reported in later sections of the chapter.

Figure 2.1: Illustration of the translational research information-flow model.

2.3.1 Information Needs in the Clinical and Translational Research Domains

The relationship between the information needs of clinical and translational researchers and currently available information technology (IT) can be understood by conceptualizing the translational research process (which subsumes clinical research) as a sequential information-flow model, as illustrated in Figure 2.1 (adapted from [15]).

At each stage in this model, a combination of dual-purpose and research-specific IT systems may be utilized. Examples of such systems that can support translational research include: 1) literature search tools such as PubMed and OVID [16]; 2) protocol authoring [17] and data mining tools [18]; 3) simulation and visualization tools [19]; 4) research-specific web portals [20]; 5) electronic data collection or capture tools [21]; 6) participant screening tools [22]; 7) electronic health records (EHRs) [23]; 8) computerized physician order entry (CPOE) systems [24]; 9) decision support systems [25]; and 10) picture archiving and communication systems (PACS) [26, 27].

Numerous reports have described increased translational capacity, improved data quality, and decreased clinical trial protocol deviations resulting from the use of such IT systems [19, 21].

In addition, the use of IT [28] in studies involving multiple, geographically distributed research sites has been shown to have demonstrable benefits in terms of increased efficiency and decreased resource requirements [24, 29]. However, despite the promise that the integration of research-specific and dual-purpose IT systems holds for the translational research enterprise, institutional informatics infrastructures providing such integration are generally absent, an outcome largely attributed to socio-technical factors [30]. Given such problematic adoption of IT in the translational research domain, it is critical for new systems and technology frameworks to provide significant value to investigators and research staff in order to overcome potential barriers to acceptance.

A particular concern in the context of translational research and the utilization of information technology to support such activities is the need to maintain proper security and confidentiality of privileged or protected health information (PHI). In many cases, providing for such security, confidentiality, and ethical conduct of research requires the provision of partially or completely de-identified data sets (including imaging data) to researchers.

In general, two overriding frameworks exist that dictate such needs: Institutional Review Boards (IRBs) and the Health Insurance Portability and Accountability Act (HIPAA), as summarized briefly below:

1) Institutional Review Boards (IRBs) are federally mandated oversight bodies responsible for monitoring, approving, and ensuring the regulatory compliance of human-subjects research (of note, Institutional Animal Care and Use Committees, IACUCs, also exist to oversee animal research).

2) The Health Insurance Portability and Accountability Act (HIPAA) includes compulsory security standards pertaining to the protection and use of a well-defined collection of privileged data elements known as protected health information (PHI) [31, 32]. HIPAA guidelines mandate the removal of 18 specified identifiers from PHI prior to its research use. Significant challenges exist when attempting to transact such PHI to support research operations, especially when employing technologies such as the Grid-based electronic data interchange models in the literature [33–37]. These challenges include ensuring that appropriate access controls are maintained throughout a distributed architecture, certifying that consumers of such data have a valid and documented purpose for accessing PHI, and maintaining the confidentiality of such data while in transit or stored throughout a potentially heterogeneous computing environment. An example of a HIPAA-compliant research data repository that contains both image data and phenotypic data is the RIDER (Reference Image Database to Evaluate Response) archive of CT scans for lung cancer patients [10].

2.3.2 Grid-computing Electronic Data Interchange Platforms

We define a computational Grid, per the convention used by Foster and Kesselman, as “a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities concerned, above all, with large-scale pooling of resources, whether compute cycles, data, sensors, or people” [38]. Grid computing has become the new means for creating distributed infrastructures and virtual organizations for multi-institutional research and enterprise applications. The use of Grid computing has evolved from being a platform targeted at large-scale computing applications to a new architectural paradigm for sharing information, data, and software as well as computational and storage resources. A vast array of middleware systems and toolkits has been developed to support the implementation of Grid computing infrastructures. These include middleware and tools for service deployment and remote service invocation, security, resource monitoring and scheduling, high-speed data transfer, metadata and replica management, component-based application composition, and workflow management. Central to the ability to develop such Grid computing middleware and toolkits is the existence of widely accepted standards. The Grid services framework builds on and extends Web Services for scientific applications. It defines mechanisms for additional features such as stateful services, service notification, and management of service/resource lifetime. A primary example of a Grid computing initiative focused on the biomedical domain is the National Cancer Institute’s (NCI) Cancer Biomedical Informatics Grid (caBIG, https://cabig.nci.nih.gov/) program, which uses a Grid computing infrastructure, named caGrid, to provide a common, extensible, collaborative platform for data interchange and analysis between cancer researchers and institutions.

2.3.3 Knowledge-anchored Information Retrieval

Ontologies and terminologies (also described as vocabularies) are a type of conceptual knowledge collection, comprising definitions of both atomic units of knowledge (e.g., facts) and the network of hierarchical and/or semantic relationships between those atoms [39]. Ontologies can be either formal or semi-formal, which corresponds to their level of ontological commitment: the degree to which the output of a computational agent’s reasoning based upon the ontology is consistent with the ontology’s definition and structure. Controlled terminologies or vocabularies that do not satisfy the preceding definition of what constitutes an ontology, but that do contain definitions of concepts and hierarchical relationships between such definitions, are another example of a conceptual knowledge collection. Instances of ontologies in the biomedical domain include SNOMED-CT and the NCI Thesaurus, while commonly used controlled terminologies include ICD9-CM and CPT. A frequently employed resource when reasoning over the content of such ontologies and terminologies and their potential interrelationships is the Unified Medical Language System (UMLS), a metathesaurus of over 100 biomedical ontologies and vocabularies incorporating in excess of 1 million concepts and 4 million synonyms, which is maintained by the National Institutes of Health (NIH) National Library of Medicine (NLM) [40–42]. The primary benefit of using the UMLS is the ability to identify a concept of interest and subsequently discover synonymous definitions for that concept in multiple source ontologies or terminologies. In addition, it is also possible to reason upon all possible hierarchical and semantic relationships between concepts in the UMLS, based upon the relational structures subsumed from the included source ontologies and terminologies. Such conceptual knowledge collections are highly useful when performing information retrieval tasks, as they allow for object-oriented, semantic, or class-based definitions or expansions of queries. For example, if one were to pose a query of the following type (represented in pseudo-code): “SELECT ALL WHERE PROCEDURE = ’MRI’ AND ANATOMIC LOCATION = ’lower extremity’”, it would be difficult to identify and return such patients, since it is highly unlikely that such imaging procedures would be codified and stored as having an anatomic location of “lower extremity.” Instead, it is likely that such procedures would be codified as having an anatomic location such as “foot”, “ankle”, “lower leg”, or “knee.” However, if the concept “lower extremity” is contained in an ontology or terminology, and is related (hierarchically and/or semantically) to concepts including the preceding specific anatomic locations (e.g., “foot”, etc.), then by reasoning upon the contents of that knowledge collection, the earlier query could be expanded to the type “SELECT ALL WHERE PROCEDURE = ’MRI’ AND ANATOMIC LOCATION IN (’foot’, ’ankle’, ’lower leg’, ’knee’).”
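The query expansion described above can be sketched in a few lines. The concept hierarchy below is a toy stand-in for a UMLS-style knowledge collection; apart from the anatomic terms taken from the example, the names and the flat-dictionary representation are hypothetical illustrations, not the actual system.

```python
# Illustrative sketch of ontology-anchored query expansion. HIERARCHY is a
# hypothetical stand-in for a UMLS-style knowledge collection: parent concept
# mapped to its narrower (child) concepts.
HIERARCHY = {
    "lower extremity": ["foot", "ankle", "lower leg", "knee"],
}

def expand_concept(concept, hierarchy):
    """Return the concept plus all of its descendants in the hierarchy."""
    expanded = [concept]
    for child in hierarchy.get(concept, []):
        expanded.extend(expand_concept(child, hierarchy))
    return expanded

def build_query(procedure, anatomic_concept, hierarchy):
    """Rewrite a single-concept predicate as an expanded IN-list (pseudo-SQL)."""
    terms = expand_concept(anatomic_concept, hierarchy)
    in_list = ", ".join(f"'{t}'" for t in terms)
    return (f"SELECT ALL WHERE PROCEDURE = '{procedure}' "
            f"AND ANATOMIC_LOCATION IN ({in_list})")

print(build_query("MRI", "lower extremity", HIERARCHY))
```

In a production setting the hierarchy traversal would instead walk the parent-child relations of the UMLS, but the expansion step itself has exactly this shape.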

In the context of our model system formulation, there is a particular focus on the use of controlled terminologies, specifically ICD9-CM (the Clinical Modification of the International Classification of Diseases, Ninth Revision). Within the United States, ICD9-CM is the most commonly used medical coding system for procedures and diagnoses. Such codes are manually assigned to patient records by medical and billing experts based on numerous information sources, including: 1) clinician-provided progress and/or diagnostic reports (usually consisting of free text); 2) quantitative findings such as laboratory data; and 3) prior medical history (in both coded and non-coded forms) [43, 44]. A patient encounter (which could encompass either an outpatient visit or a multiday inpatient admission) can be characterized by multiple diagnosis and procedure codes, usually summarized by what is known as a primary code, which represents the motivation for that encounter (e.g., the diagnosis or procedure which incurred the encounter). ICD9-CM codes are hierarchically organized. For example, the codes between 160 and 165 correspond to malignant neoplasms of respiratory and intrathoracic organs, with associated subdiagnoses of such neoplasms being indicated using decimalized variants of the codes, such as code 162.2, which specifically defines primary disease in the upper lung lobes.
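The hierarchical organization of ICD9-CM makes class-based selection straightforward: every code in a category block shares a numeric stem, so decimalized subdiagnoses can be captured with a simple range test. A minimal sketch, with fabricated patient records:

```python
# Class-based retrieval over hierarchical ICD9-CM codes: select every record
# whose code falls in the 160-165 block (malignant neoplasms of respiratory
# and intrathoracic organs), including decimalized subdiagnoses such as 162.2.
def in_icd9_range(code, low, high):
    """True if the integer category part of an ICD9-CM code lies in [low, high]."""
    try:
        category = int(code.split(".")[0])
    except ValueError:
        return False  # non-numeric stems (e.g., V/E codes) fall outside this block
    return low <= category <= high

# Fabricated (patient, diagnosis code) pairs for illustration only.
records = [
    ("patient-a", "162.2"),   # within the 160-165 block
    ("patient-b", "428.0"),   # congestive heart failure: excluded
    ("patient-c", "165.9"),   # within the 160-165 block
]

matches = [pid for pid, code in records if in_icd9_range(code, 160, 165)]
print(matches)  # → ['patient-a', 'patient-c']
```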

In addition to the use of ICD9-CM, our model system formulation also employs an advanced type of knowledge-anchored information retrieval known as text mining. At the most basic level, text mining involves the use of a computational agent, usually informed by one or more conceptual knowledge collections, in order to parse (i.e., decompose text into constituent components at one or more levels of granularity, ranging from paragraphs to words) and tag (i.e., apply a codified concept identifier to a parsed component that is representative of its lexical and/or semantic meaning) narrative free text. Such parsed and tagged text can then be queried as structured data. An example of the current state of the art in biomedical text mining tools is the open-source MetaMap Transfer (MMTx) application provided by the NLM, which uses the contents of the UMLS as a knowledge source to inform the parsing and tagging of biomedical narrative text (codifying such text in terms of UMLS concepts) [45–48]. In addition to open-source applications, numerous commercial database vendors such as IBM, Microsoft, and Oracle provide free-text indexing and mining capabilities within the scope of their database management systems [49–51]. However, these capabilities are usually limited to keyword-based searches. One exception is the free-text search functionality provided by Oracle, which does have limited thesaurus-based capabilities. Yet, in the case of the thesaurus-based text mining functionality provided by Oracle, there is a noticeable lack of available biomedical thesauri that can be utilized for such purposes. Additional examples of commercial text mining tools include: 1) IBM’s UIMA [52], which employs an ontology-anchored approach to concept tagging; and 2) Vivisimo [53], which supports concept-oriented search of tagged narrative text, where such tagging can be informed by the use of commonly available ontologies or terminologies.
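A parse-and-tag pipeline of the kind described above can be caricatured as follows. The concept dictionary and the identifiers in it are invented for illustration; they stand in for lookups against the UMLS, and this sketch is in no way a substitute for MMTx itself.

```python
# Minimal parse-and-tag sketch in the spirit of MMTx-style text mining.
# CONCEPTS maps dictionary phrases to invented concept identifiers; a real
# pipeline would resolve phrases against the UMLS instead.
CONCEPTS = {
    "chronic obstructive lung disease": "CONCEPT-0001",  # hypothetical ID
    "lung nodule": "CONCEPT-0002",                       # hypothetical ID
}

def parse_and_tag(text, concepts):
    """Scan normalized narrative text and tag any dictionary phrase it contains."""
    normalized = text.lower()
    return [(phrase, concept_id)
            for phrase, concept_id in concepts.items()
            if phrase in normalized]

report = "Patient has chronic obstructive lung disease."
print(parse_and_tag(report, CONCEPTS))
# → [('chronic obstructive lung disease', 'CONCEPT-0001')]
```

Once tagged in this manner, the (phrase, concept) pairs can be stored in a relational table and queried exactly like any other structured data element.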

2.3.4 Image Retrieval Tools

In the case of medical images, the most commonly used storage repositories are Picture Archiving and Communication Systems (PACS). Most PACS support the Digital Imaging and Communications in Medicine (DICOM) standard [54] (currently version 3). In PACS that are compliant with the DICOM standard, medical images are stored and retrieved using patient metadata (e.g., descriptors such as the medical record number, MRN). The primary focus of image retrieval functionality within modern PACS is to support conventional clinical operations and not necessarily research requirements. An example use case in terms of clinical operations would be a radiologist querying the system for a patient’s latest MRI or CT scan using the patient’s MRN, and then reviewing the imagery for the related visit.
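The clinical retrieval use case just described amounts to a metadata query: given an MRN and a modality, find the most recent matching study. A toy in-memory sketch follows; the study list stands in for the DICOM metadata a real PACS would expose, and the field names are illustrative rather than actual DICOM attribute names.

```python
# Toy sketch of MRN-based study retrieval. The in-memory list of study
# records is a stand-in for metadata that a DICOM-compliant PACS would
# expose; all values are fabricated.
from datetime import date

studies = [
    {"mrn": "123456", "modality": "CT",  "date": date(2010, 3, 1)},
    {"mrn": "123456", "modality": "MRI", "date": date(2010, 9, 15)},
    {"mrn": "123456", "modality": "MRI", "date": date(2009, 1, 7)},
    {"mrn": "654321", "modality": "MRI", "date": date(2010, 5, 2)},
]

def latest_study(mrn, modality, studies):
    """Return the most recent study for a patient and modality, or None."""
    hits = [s for s in studies if s["mrn"] == mrn and s["modality"] == modality]
    return max(hits, key=lambda s: s["date"]) if hits else None

print(latest_study("123456", "MRI", studies))
```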

Recently, there have been efforts to improve the image retrieval process in order to support image-related research [36, 37, 55–57], as well as to enable better integration of imaging data with Electronic Health Records (EHRs) [58–60]. However, the preceding efforts are still relatively immature, and wide-scale adoption of such tools and processes is limited.

Research uses of most clinical data, including imaging data, present additional challenges beyond those associated with clinical uses, in particular because of the frequent need to de-identify such data sets. In order to de-identify imaging data sets for research purposes, the commonly employed practice is to generate a de-identified duplicate of the desired DICOM data as found in the source, production PACS, and then store the duplicate in a research-specific PACS instance [10, 57, 61]. While this approach is feasible for the retrieval and use of well-defined cases for a given experimental context, it does not readily support integration with, and subsequent retrieval and de-identification of, related imaging and phenotypic data.
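The duplicate-and-de-identify workflow can be sketched as below. The tag names are a small illustrative subset of the HIPAA identifiers (not the complete list), and the record is a plain dictionary rather than a real DICOM data set; a production tool would operate on actual DICOM attributes.

```python
# Sketch of the duplicate-and-de-identify practice: copy a record's metadata,
# blank direct identifiers, and assign a research pseudonym, leaving the
# source object untouched. Tag names are an illustrative subset only.
import copy

IDENTIFYING_TAGS = {"PatientName", "PatientID", "PatientBirthDate", "PatientAddress"}

def deidentify(dicom_metadata, pseudonym):
    """Return a de-identified duplicate; the source record is not modified."""
    duplicate = copy.deepcopy(dicom_metadata)
    for tag in IDENTIFYING_TAGS:
        if tag in duplicate:
            duplicate[tag] = ""
    duplicate["PatientID"] = pseudonym  # research-specific surrogate identifier
    return duplicate

source = {"PatientName": "Doe^Jane", "PatientID": "123456",
          "PatientBirthDate": "19501231", "Modality": "CT"}
research_copy = deidentify(source, "SUBJ-0001")
print(research_copy)
```

Note that the duplicate retains non-identifying attributes (here, the modality), which is what allows the research copy to remain useful for integrative queries.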

2.3.5 OSUMC Information Warehouse

The Information Warehouse (IW) at The Ohio State University Medical Center (OSUMC) is a comprehensive repository integrating data from over 80 clinical, operational, and research systems throughout the institution. The IW serves a broad variety of customers in all mission areas at OSUMC, including: 1) clinical operations; 2) administration; 3) education; and 4) research. Content in the IW includes structured medical and financial data, clinical free-text reports, tissue and genomic data, and limited numbers of medical images. Data stored in the IW can be queried from and presented to end-users in identifiable, partially de-identified, or completely de-identified forms, depending on applicable IRB and institutional policies and requirements. As described earlier, the IW does include medical free text from sources such as radiology and pathology reports, as well as structured and codified data elements such as age, sex, and diagnosis. However, imaging data (e.g., PET, CT, MRI) are stored in a separate picture archiving and communication system (PACS), with only limited physically duplicated image data within the IW. This physical separation of most image data makes the integrative query of multiple data types throughout OSUMC, including image data, extremely challenging. It should be noted that this problem is not unique to this IW; it is quite common in other IWs.

2.4 Methods

As stated at the outset of this chapter, we report upon the development and implementation of a model system intended to enable the integrative, knowledge-anchored query of multiple information types, including structured, narrative text, and image data, in support of research requirements. Our model system is based upon a framework [62–64] that is intended to provide a convenient interactive environment for such integrative data retrieval tasks (Figure 2.2).

Figure 2.2: Conceptual model of our software framework and model system implementation.

The resulting system that was implemented based upon this framework allows end-users to retrieve images based on characteristics defined in correlative text and structured data elements stored within the OSUMC IW. Such retrieval and presentation of image data sets using phenotypic context derived from narrative text and structured data involves the handling of all data types related to the image, including the image itself as well as associated heterogeneous, multidimensional textual and structured data. In order to enable such a data handling process, our framework leverages knowledge-anchored information retrieval techniques, specifically combining the UMLS Metathesaurus (UMLS MT) [65, 66] knowledge collection with text mining platforms incumbent to the Oracle database management system employed by the IW. The use of the UMLS MT allows the expansion of simple keyword-based queries into concept-oriented queries that can then be applied to the contents of clinical narrative text and structured data elements such as diagnosis codes. Due to the research orientation of our framework, we have also incorporated mechanisms to partially or completely de-identify the PHI contained in any returned data set. This provides compliance with the applicable regulatory requirements, including HIPAA. Finally, in order to allow for the elegant evolution of our framework in light of constantly emerging technology platforms and standards, as well as the need to enable research activities that span institutional and geographic boundaries, our framework incorporates a multitiered service-oriented architecture (SOA, Figure 2.3).

A primary benefit of this multitiered SOA is the ability to utilize emergent, research-oriented electronic data interchange platforms, such as the previously introduced caGrid middleware [67].

2.4.1 Motivating Use Case

For the remainder of this chapter, we will frame our discussion using the following motivating use case: An investigator is interested in lung cancer patients. In addition to images, the investigator would like to access findings in both radiology and pathology reports, as well as diagnosis codes, for each patient from whom images are obtained. The specific findings of interest in the radiology reports should mention lung nodules, while the pathology reports should mention adenocarcinoma.

The interaction between our system (Figure 2.4) and the researcher (i.e., end-user) incorporates the following components:

Query Construction

The construction of a query intended to identify and retrieve patients that meet the given criteria can be decomposed into two parts: 1) a structured data search query; and 2) a free-text search query.

Figure 2.3: The multitier implementation of our framework provides a means to implement a Grid-based or web-based electronic data interchange platform within a service-oriented architecture. End-user access to privileged data is managed in one or more ways: 1) a fully privileged user within the institutional firewall (left-hand side) can access the data in any way preferred; 2) an external user (located outside the firewall, right-hand side) is subject to additional restrictions in order to access data; and 3) the multitier service-oriented approach allows for the easy deployment of custom services.

Figure 2.4: By first querying for available metadata in one or more relational databases, users are able to identify patients of interest; corresponding images can later be retrieved from a PACS.

Structured data search query: In this case, the user is looking for ICD9-CM codes that correspond to different types of lung cancer. At this stage, if the user already knows the codes for lung cancer (e.g., 162, 162.2, etc.), he or she can provide those codes. For users not familiar with the ICD9-CM coding system, a code lookup facility is provided, informed by the UMLS knowledge collection. This lookup system supports keyword-based searches. Once ICD9-CM codes are entered or selected, construction of the structured query is complete.
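The code lookup facility described above can be illustrated as a keyword search over a term index. The index entries below are fabricated examples; the actual system derives such mappings from the UMLS knowledge collection rather than a hand-written table.

```python
# Sketch of a keyword-based ICD9-CM code lookup. CODE_INDEX is a fabricated
# miniature term index standing in for mappings derived from the UMLS.
CODE_INDEX = {
    "162":   "malignant neoplasm of trachea, bronchus, and lung",
    "428.0": "congestive heart failure",
    "401":   "essential hypertension",
}

def lookup_codes(keyword, index):
    """Return sorted (code, description) pairs whose description mentions the keyword."""
    keyword = keyword.lower()
    return sorted((code, desc) for code, desc in index.items() if keyword in desc)

print(lookup_codes("lung", CODE_INDEX))
```

The user would then pick from the returned candidates, completing the structured half of the query without needing prior knowledge of the coding system.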

Free-text search query: The free-text query in this instance requires the assessment of two different report types, radiology and pathology, using the initial keywords “lung nodules” and “carcinoma”, respectively. The system end-user is provided the ability to expand these keywords using the contents of the UMLS knowledge collection. Once the end-user has identified the report types, entered the search keywords, and expanded the search scope per the contents of the UMLS, the construction of the free-text search query is complete.

During the construction of both query types (free-text and structured), the presentation tier solicits end-user data entry as appropriate and passes that data to a Grid service in the application tier. This Grid service then interacts with a local UMLS instance that is available via the data sources tier. Asynchronous calls in the presentation tier (e.g., the user interface) allow these transactions to occur in parallel, so that the user can construct both queries simultaneously.

Query Execution

Once the user is satisfied with the queries as constructed, he or she can execute them to identify matching patients or cases. At this point in the system workflow, the presentation tier passes the query to the application tier, which in turn interacts with the data sources tier. All queries are executed in parallel: queries on diagnosis codes for lung cancer, free-text queries for lung nodules (radiology reports), and free-text queries for adenocarcinoma (pathology reports) all return results simultaneously. Once all available results have been returned, the presentation tier joins them and presents them to the user.
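The parallel execution and join step might be sketched as follows, with stub functions standing in for the three sub-queries against the data sources tier; the patient identifiers and result sets are fabricated, and the intersection shown is one plausible join semantics (all criteria must match).

```python
# Sketch of parallel sub-query execution with an intersection join. The three
# stubs stand in for calls into the data sources tier; their results are
# fabricated patient-identifier sets.
from concurrent.futures import ThreadPoolExecutor

def diagnosis_query():
    return {"p1", "p2", "p3"}   # patients with lung-cancer ICD9-CM codes

def radiology_text_query():
    return {"p2", "p3", "p4"}   # radiology reports mentioning lung nodules

def pathology_text_query():
    return {"p3", "p4", "p5"}   # pathology reports mentioning adenocarcinoma

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(q) for q in
               (diagnosis_query, radiology_text_query, pathology_text_query)]
    results = [f.result() for f in futures]

# Join: keep only patients satisfying all three criteria.
matching_patients = set.intersection(*results)
print(matching_patients)  # → {'p3'}
```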

Figure 2.5: The interface provides interactive assistance to users in order to map keywords used during query formulation to appropriate diagnosis codes (ICD9-CM). Once a query is executed, users may browse the result sets using a hierarchical “drill down” model.

Data Browsing and Retrieval

Upon completion of the preceding phase of the system workflow, end-users may browse the returned patients or cases. Once a patient or case is selected, the presentation tier calls the Grid services in the application tier in order to retrieve the patient- or case-specific data. Instead of presenting abstract results for such a query, the radiology report containing the concept “lung nodule”, the pathology report containing the concept “adenocarcinoma”, any structured, codified data pertaining to the “diagnosed with lung cancer” characteristic, and finally the corresponding images are presented collectively for further evaluation and review (Figure 2.5).

2.4.2 Three-Tiered Software Framework

There are several complementary goals within this framework. The realization of all these goals requires the use of components that satisfy both infrastructure constraints (e.g., available platforms and prevailing data interchange standards) and institutional or regulatory requirements as they pertain to the use of PHI for research purposes. The tiered architecture is one of the common techniques used today for the separation of presentation, application, and data [68, 69] in web-based applications. We utilize this approach within our framework to achieve flexibility and efficiency in terms of development, deployment, and management. The framework used in our model system (Figure 2.3) consists specifically of the following three tiers:

1) Presentation Tier: provides end-user interaction capabilities based upon the features of the framework via web interfaces;

2) Application Tier: utilizes standards-compliant Grid services to interact with and apply logic to multiple, heterogeneous data sources;

3) Data Sources Tier: supports access to data sources such as relational database management systems and PACS.

In the following subsections, detailed descriptions of the design approaches and functionality pertaining to each tier are provided.

Presentation Tier

As stated previously, the primary goal of this framework and the resulting system is to provide simple yet flexible ways of identifying and retrieving images from a PACS that are characterized by one or more heterogeneous sources of phenotypic data. Simplicity is delivered through a unified user interface that handles all available data types, while flexibility is demonstrated by the expandability of queries. Our user interface gives the end-user the flexibility to execute such queries by first allowing searches on existing sources of phenotypic data such as free-text reports and diagnosis codes (or any other patient-related data). Once the patients are identified according to all available metadata, the PACS is then queried and the associated images are retrieved (Figure 2.5). Since the number of retrieved images depends on the number of identified patients, proper utilization of existing free text (radiology and pathology reports) and structured data (diagnosis codes) in the search criteria is crucial. This framework brings together the knowledge-anchored free-text capabilities provided by the UMLS knowledge collection and the text indexing and keyword search capabilities of Oracle. The utilization of the UMLS and Oracle’s text indexing allows our framework to provide interactive query expansion capabilities on free-text documents, along with assistance on diagnosis code selection, through an interactive web-based user interface. Selected images are retrieved using PixelMed, an open-source Java-based toolkit [70]. The web interface component of the presentation tier enables users to construct and execute queries based upon both free-text (e.g., radiology and pathology) and structured data elements (e.g., diagnosis codes) in an interactive fashion. Once patients are identified based upon the specified query parameters, their images along with radiology reports, pathology reports, and diagnosis codes (ICD9-CM) are presented to the users. During the preceding query construction and result set presentation process, end-users interact with and are assisted by the presentation layer in the following ways:

UMLS-based text query expansion: Within the IW, free-text reports are stored in relational databases (Oracle, version 10gR2), where they are indexed for fast text searches.

However, these indexes only allow keyword-based searches. As introduced earlier, we have adopted a combination of the UMLS Metathesaurus (MT) and MetaMap (MMTx) in order to convert users' keyword queries into conceptual queries. The UMLS MT allows us to present alternatives to or expansions of given query criteria. For example, the user might enter the search term "melanocarcinoma" when searching the contents of pathology reports, but would receive few results due to the infrequent use of that specific term in such text (in the test dataset to be described later in the evaluation section of this chapter, such a search returns no text reports). However, if this keyword were expanded to include the synonym "melanoma" per the contents of the UMLS MT, the user would likely retrieve many more reports (again in our test dataset, such a search returns 2706 text reports with the specified term). The MMTx API allows us to parse clinically relevant terms and phrases that can then be expanded and used for query operations when the initial user-entered search criteria consist of larger text constructs (e.g., a sentence or paragraph). For example, the concept of "chronic obstructive lung disease" can be extracted from the sentence "Patient has chronic obstructive lung disease." and can subsequently be mapped to a corresponding UMLS concept.
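The expansion behavior described above can be illustrated with a small, self-contained sketch. The synonym table and sample reports below are toy stand-ins: the real system resolves synonyms through the UMLS Metathesaurus and MMTx, and searches an Oracle text index rather than scanning strings in memory.

```python
# Toy sketch of UMLS-style query expansion over free-text reports.
# SYNONYMS is a hypothetical stand-in for the UMLS Metathesaurus.
SYNONYMS = {
    "melanocarcinoma": ["melanoma"],
    "lung": ["thoracic", "chest"],
    "nodule": ["lesion"],
}

def expand(term):
    """Return the user's term plus any Metathesaurus-style synonyms."""
    return [term] + SYNONYMS.get(term.lower(), [])

def search_reports(reports, term, use_expansion=True):
    """Return reports containing the term or, optionally, its expansions."""
    keywords = expand(term) if use_expansion else [term]
    return [r for r in reports if any(k in r.lower() for k in keywords)]

reports = [
    "Pathology consistent with melanoma, left shoulder.",
    "No evidence of melanoma recurrence.",
    "Benign nevus; no malignancy identified.",
]

# Without expansion the rare term matches nothing; with expansion the
# synonym "melanoma" recovers the relevant reports.
print(len(search_reports(reports, "melanocarcinoma", use_expansion=False)))  # 0
print(len(search_reports(reports, "melanocarcinoma", use_expansion=True)))   # 2
```

As in the "melanocarcinoma" example above, the end user reviews and approves the suggested expansions before the search is executed.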

UMLS-based diagnosis code selection: When users are unsure which ICD9-CM code to choose when querying structured data elements that are encoded using that terminology, context-specific assistance that enables those users to select appropriate code(s) is provided.

Specifically, during our UMLS MT installation, controlled vocabularies and dictionaries related to ICD9-CM codes are included. This allows us to provide hierarchical searches on ICD9-CM codes initiated by user-supplied text. For example, during the construction of a query, a user may not know the proper codes for a targeted type of lung cancer. In this case the user may use the assistance mechanism to navigate the ICD9-CM hierarchy, traversing through the concepts "neoplasm" and "lung cancer" in order to visualize and select the appropriate specific lung cancer codes.
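The hierarchy-assisted selection can be sketched as a walk over a nested code tree. The fragment of the ICD9-CM hierarchy below is hypothetical and heavily abridged; in the deployed system the hierarchy is materialized from the local UMLS MT tables and served interactively.

```python
# Minimal sketch of hierarchy-assisted ICD9-CM code selection.
# ICD9_TREE is an abridged, illustrative fragment of the hierarchy.
ICD9_TREE = {
    "neoplasm": {
        "lung cancer": {
            "162.3": "Malignant neoplasm of upper lobe, bronchus or lung",
            "162.5": "Malignant neoplasm of lower lobe, bronchus or lung",
        },
    },
}

def navigate(tree, path):
    """Walk the concept hierarchy along `path` and return the subtree."""
    node = tree
    for concept in path:
        node = node[concept]
    return node

# A user unsure of the proper code drills down through broader concepts
# and is shown the specific codes with their descriptions.
codes = navigate(ICD9_TREE, ["neoplasm", "lung cancer"])
for code, description in sorted(codes.items()):
    print(code, "-", description)
```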

Image handling and display: After constructing and executing queries, users may browse the result set using a hierarchical "drill-down" model. For example, the interface allows end users to drill down through multiple layers of granularity in a CT study, beginning with a series and ending at a single image. In addition, the presentation layer allows users to compare images within a series. Corresponding radiology or pathology reports are also displayed when retrieving and presenting image data (Figure 2.5).

The system's presentation layer is implemented using JavaServer Pages (JSP) running on Apache Tomcat. In addition, asynchronous calls to the application tier are used in order to provide a more dynamic human-computer interaction experience.

Application Tier

To support relational database connectivity sufficient to access both textual and structured data elements, and to support the use of the UMLS knowledge collection as described earlier, an Ontology Tools Package (OTP) was created within the application tier. In addition, to enable the query, retrieval, and manipulation of images from our PACS system, we utilized the functionality provided by the PixelMed open source JAVA toolkit. These two packages form the core of our application tier, and are designed and implemented to manage interaction between the presentation and data source tiers. Furthermore, these packages are wrapped as caGrid services, as described in the following sections.

Figure 2.6: The Ontology Tools Package (OTP) allows users to interact with a local UMLS knowledge collection instance, diagnostic databases, and text report databases. Users can query OTP with a keyword or textual phrase, which is then mapped to one or more concept codes derived from text mining approaches informed by the UMLS and using MMTx. Once users finalize a set of targeted search concepts, text reports that contain those concepts are queried and returned. Users can also retrieve ICD9-CM codes that correspond to their query keywords and use those codes to query tables containing structured data.

Ontology Tools Package (OTP): The OTP is a collection of functions from several existing APIs, including both the MMTx API from the National Library of Medicine and Oracle's JDBC drivers. Additionally, OTP allows for queries on diagnostic data tables to be executed and linked with UMLS MT expanded conceptual search queries on radiology and pathology reports. For performance considerations, local copies of controlled vocabularies and dictionaries from the UMLS MT are maintained and utilized by the OTP (Figure 2.6). When a user interacts with the relational databases targeted by our system, OTP's functions are utilized during the query construction, execution, and retrieval of the resulting datasets, as described below:

Query construction with OTP: During the construction of text queries, OTP takes free-text entered by end users, expands that free-text, and returns synonyms and other semantically relevant concepts through a process of reasoning upon the previously described local UMLS tables. However, the final selection of such search terms in order to define a conceptual text query is based upon end-user evaluation of such suggested expansions. Similarly, for the construction of diagnosis code-based queries, user-supplied free-text entries are expanded with the help of the UMLS. Instead of keywords, ICD9-CM codes and their descriptions are returned to the user. Again, end users decide which of the returned diagnosis codes are to be included with their query.

Query execution with OTP: During the query execution process, OTP executes the generated query against text and structured data tables, links the datasets, and returns identifiers for further retrieval of other correlated data such as radiology images contained in a PACS. This data linkage process is largely made possible through the use of some combination of the three common identifiers used to store and identify PHI at OSUMC (and which are generally analogous to those used in most common clinical information systems): Medical Record Number (MRN, used to identify an individual patient), Encounter Number (ENC, used to identify a patient-specific encounter from which an item of data is derived), and Accession Number (ACC#, used to identify the specific component or activity associated with an encounter from which an item of data is derived). When a user-constructed query is executed, OTP returns only identifiers or de-identified pointers to the data, depending on the access privileges assigned to a user for a specific project. It is important to note that at this stage, no data other than such identifiers are returned by OTP (e.g., no textual, structured, or image data are returned).
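The identifier-only linkage step can be illustrated with a minimal relational sketch. Here sqlite3 stands in for Oracle, the table and column names are hypothetical, and, mirroring the behavior described above, only identifiers (accession numbers) come back from the query.

```python
# Sketch of OTP-style query execution: link text hits and diagnosis hits
# on shared identifiers and return ONLY identifiers, never the data.
# sqlite3 stands in for Oracle; the schema below is illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE report_hits (mrn TEXT, enc TEXT, acc TEXT);
    CREATE TABLE dx_hits     (mrn TEXT, enc TEXT, icd9 TEXT);
    INSERT INTO report_hits VALUES ('P1','E1','A1'), ('P2','E2','A2');
    INSERT INTO dx_hits     VALUES ('P1','E1','162.3'), ('P3','E3','162.5');
""")

def execute_query(conn):
    """Return accession numbers for encounters matching BOTH criteria."""
    rows = conn.execute("""
        SELECT r.acc
          FROM report_hits r
          JOIN dx_hits d ON d.mrn = r.mrn AND d.enc = r.enc
    """).fetchall()
    return [acc for (acc,) in rows]

# Only patient P1 satisfies both the text and diagnosis criteria, so a
# single accession number is returned for later PACS retrieval.
print(execute_query(db))  # ['A1']
```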

Data retrieval with OTP: Once the end user selects a patient or group of patients for which to obtain further data, OTP manages the retrieval of all nonimaging data (with image data retrieval being managed using PixelMed). All the data tables that need to be accessed with this framework are indexed using the three previously defined identifiers: MRN, ENC, and ACC#. Therefore, on-demand access to any of the text or structured data elements does not present significant performance implications.

Use of PixelMed: PixelMed provides open source JAVA libraries for reading, writing, manipulating, and communicating DICOM objects. In addition, PixelMed provides a simple PACS and WADO (Web Access to DICOM Objects) server implementation. In our framework and model system, PixelMed is used for handling all operations related to DICOM objects, as follows:

DICOM query construction and execution: Once a user identifies which patient cases he or she wants to review, it is then necessary to query for and retrieve corresponding images from the targeted PACS. Our PACS system (AGFA, Impax 5.2) can be queried by Accession Number (ACC#). Here, the accession numbers retrieved from the OTP are formed into DICOM queries, which are subsequently executed through the use of standard DICOM messaging. Each ACC# is mapped to a DICOM Study Object within the PACS, supporting the retrieval of: 1) the entire imaging study; 2) any series contained in the study; or 3) any single image from any series.
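Conceptually, each accession number is placed into a query identifier at the desired query/retrieve level. The sketch below uses plain dictionaries keyed by standard DICOM attribute names purely for illustration; the actual DICOM messaging is handled by PixelMed and is not shown here.

```python
# Illustrative sketch of forming an accession number into a
# C-FIND-style query identifier. QueryRetrieveLevel, AccessionNumber,
# and StudyInstanceUID are standard DICOM attribute keywords; the
# dictionary representation itself is a simplification.
def build_query(accession_number, level="STUDY"):
    """Build a key/value query identifier at the given retrieve level."""
    if level not in ("STUDY", "SERIES", "IMAGE"):
        raise ValueError("unknown query/retrieve level: %s" % level)
    return {
        "QueryRetrieveLevel": level,
        "AccessionNumber": accession_number,
        # Empty values act as "return keys": the PACS fills them in.
        "StudyInstanceUID": "",
    }

query = build_query("A1")
print(query["QueryRetrieveLevel"], query["AccessionNumber"])
```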

DICOM object manipulations: When images are retrieved for display via the previously described web interface, they must be converted to a web-compatible format such as JPEG or Bitmap (BMP). In addition, images may need to be de-identified based upon a user's and/or project's privileges. PixelMed provides functionality such as the conversion and de-identification of DICOM objects (based on DICOM 3.0 Standards).

Grid enablement: Grid-enabling our model system allows us to support research collaborations involving institutionally and geographically disparate participants, a scenario that has motivated many recent infrastructural research and development programs, such as caBIG. In this project, we specifically have Grid-enabled our system using the caBIG-developed caGrid middleware. In order to implement this type of functionality, a wrapper application was created that supports Grid-compliant electronic data interchange with both OTP and PixelMed (thus making them available as caGrid analytical services). Since Grid services operate in a manner similar to web services or applications, implementing our Grid-based system within the institutional firewall did not require any specialized network configurations other than the placement of the presentation and application tiers within a demilitarized zone (DMZ) of our network, which for security purposes incorporates an access control list to restrict connectivity to known hosts.

Data Sources Tier

Several key requirements influenced the design of the data sources tier, specifically the need to 1) ensure HIPAA-compliant de-identification of the data when necessary; 2) generate proxy data sources for efficient access; and 3) manage access privileges at multiple levels of granularity, from projects to individuals. These requirements are described in detail in the following sections.

De-identification: The need to de-identify data can be broadly separated into requirements that pertain to structured data, textual data, and image data:

Structured Data: Since structured data reside in relational databases, de-identification is straightforward compared to other data types. This type of de-identification was achieved by removing and replacing patient-unique identifiers (using surrogate identifiers which we refer to as de-identifiers).
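One way to realize such surrogate identifiers is a keyed one-way mapping, sketched below. The HMAC construction and the secret key are illustrative assumptions, not the exact scheme used in the IW; the point is that the mapping is consistent (preserving linkage across tables) yet not reversible without the key.

```python
# Illustrative sketch of replacing patient-unique identifiers with
# surrogate "de-identifiers". A keyed HMAC is ONE possible way to
# obtain consistent surrogates; the key below is a placeholder and
# would be held by the institution, never stored with the data.
import hashlib
import hmac

SECRET_KEY = b"institution-held secret"  # hypothetical placeholder

def de_identifier(mrn):
    """Map an MRN to a stable surrogate identifier that cannot be
    reversed without the secret key."""
    digest = hmac.new(SECRET_KEY, mrn.encode(), hashlib.sha256).hexdigest()
    return "DID-" + digest[:12]

# The same MRN always maps to the same surrogate, so linkage between
# data elements survives even though the original identifier is gone.
print(de_identifier("12345678") == de_identifier("12345678"))  # True
print(de_identifier("12345678") != de_identifier("87654321"))  # True
```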

Text Data: Within the OSUMC IW, all text reports are preprocessed in order to generate distinct, de-identified versions of the original source text, which were used in this project.

Image Data: PixelMed allows DICOM objects to be separated into metadata and image components, and provides total programmatic control over the metadata section through its JAVA-based API. Hence, by rewriting and manipulating the metadata of the DICOM objects, images are de-identified or anonymized according to the HIPAA guidelines.
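The metadata rewriting step can be sketched with the DICOM header modeled as a plain dictionary (PixelMed operates on actual DICOM attribute lists). The attribute names are standard DICOM keywords, but the removal list below is an abridged illustration, not the complete confidentiality profile.

```python
# Sketch of de-identifying a DICOM object by rewriting its metadata.
# The removal list is illustrative; a HIPAA-compliant deployment
# covers the full set of identifying attributes.
IDENTIFYING_ATTRIBUTES = {"PatientName", "PatientBirthDate", "PatientAddress"}

def deidentify_header(header, surrogate_id):
    """Return a copy of the header with identifying attributes removed
    and the PatientID replaced by a surrogate de-identifier."""
    clean = {k: v for k, v in header.items() if k not in IDENTIFYING_ATTRIBUTES}
    clean["PatientID"] = surrogate_id
    return clean

header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345678",
    "PatientBirthDate": "19700101",
    "Modality": "CT",
    "StudyDescription": "CHEST CT",
}
clean = deidentify_header(header, "DID-000000000001")
print(sorted(clean))  # ['Modality', 'PatientID', 'StudyDescription']
```

Using the same surrogate here as in the structured data tables is what keeps the de-identified images linkable to the rest of the de-identified phenotypic data.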

Proxy data sources: In order to prevent computational performance issues pertaining to the use of operational data repositories, we implemented proxy relational databases and a proxy PACS as part of our data sources tier. Research images and data are moved to these proxy resources in a time-sensitive, batch operation. In order to support regulatory compliance, research datasets are also de-identified during this batch operation. The two specific components of our proxy data source are:

Proxy Relational Database: Within our framework, this is a physically and logically distinct Oracle instance. Oracle enables tables and views based on remote data sources; however, in our case this proxy database only holds de-identified summary tables and materialized views that are based on our operational databases. User accounts created for accessing this proxy database have view-only privileges for that data.

Proxy PACS: The proxy PACS is an instance of PixelMed's PACS server. Images that have been approved for research use are moved to this PACS in de-identified form. The use of this proxy PACS allows for more frequent and large-scale queries to be executed than would be possible if searching the operational PACS used for clinical care purposes, owing to performance degradation concerns in such an alternative scenario.

2.5 Results

In order to evaluate the feasibility and performance of our model system implementation, we applied the previously introduced motivating use case to 30 months of patient data stored in the OSUMC IW, which comprised 1,373,194 radiology reports and 304,212 pathology reports that corresponded to 4,753,985 patient encounters. Our example searches focused on two main ICD9 code categories: lung cancer (162: Malignant neoplasm of trachea, bronchus, and lung) and coagulation defects (286: Coagulation defects). While our example searches involve both radiology and pathology reports for the first category (lung cancer), searches for the second category (coagulation defects) were applied to radiology reports only.

Lung cancer searches: Under this category, 7 different ICD9-CM codes were used for query construction: Trachea (162.0), Main Bronchus (162.2), Upper Lobe (162.3), Middle Lobe (162.4), Lower Lobe (162.5), Other Parts of Bronchus or Lung (162.8), and Bronchus and Lung, Unspecified (162.9). For our search on radiology reports, the concept "lung nodule" was expanded to include "thoracic" and "chest" for "lung", and "lesion" for "nodule". For our search on pathology reports, the keyword "carcinoma" was expanded by the keyword "neoplasm". Table 2.1 depicts the effects of the thesaurus-based conceptual expansions for each lung cancer ICD9 code. In some cases, the use of knowledge-anchored query expansion captures more relevant data than would be possible in queries without such expansion.

Coagulation defects searches: Under this category, 9 different ICD9-CM codes were used for query construction: "Congenital factor VIII disorder" (286.0), "Congenital factor IX disorder" (286.1), "Congenital factor XI deficiency" (286.2), "Congenital deficiency of other clotting factors" (286.3), "von Willebrand's disease" (286.4), "Hemorrhagic disorder due to intrinsic circulating anticoagulants" (286.5), "Defibrination syndrome" (286.6), "Acquired coagulation factor deficiency" (286.7), and "Other and unspecified coagulation defects" (286.9). For our search on radiology reports, we expanded the keyword "pulmonary embolism" to include "lung" and "chest", along with "PE", a synonym for pulmonary embolism. Table 2.2 depicts the effects of the thesaurus-based conceptual expansions for each coagulation defects ICD9 code.

In our "lung cancer" cases, query expansion returned 43% more reports on average than without such expansion when applied to radiology reports (Table 2.1). However, in the context of pathology reports, this increase in returned reports was only 4%. In "coagulation defects" cases, similar query expansion yielded 8% additional reports compared to a query without such expansion (Table 2.2). Considering our earlier example of the keyword expansion of "melanocarcinoma" with "melanoma", where we demonstrated a much greater (0 to 2706) expansion, these results demonstrate the impact of the initial query keywords on overall retrieval performance.

The execution times associated with our model system were evaluated in two stages: 1) the first stage focused on the average time required to identify and retrieve images, which was found to range between 2-10 seconds; 2) the second stage focused on the average time required to retrieve structured and free-text data corresponding to a given image or image series, which was found to range between 4-10 seconds. These measurements were derived from 30 repeated measurements of the elapsed time required to retrieve and view the first 8 images of a CT or MRI series for a given patient, and then to retrieve on average 3 related text reports and structured data elements for that study, via the previously introduced web interface.

2.6 Discussion

The retrieval of image data in support of research requirements is usually more meaningful when patient-derived phenotypic context data accompanies such images. Such phenotypic data can be derived from multiple sources, including free-text data such as radiology or pathology reports and structured data such as diagnosis codes. As we have demonstrated, by applying integrative knowledge-anchored strategies, conceptual searches spanning all of the preceding data types are possible and in some cases can generate larger amounts of data meeting the criteria used to define a motivating use case and its associated data query requirements than is otherwise possible. We have also demonstrated that in those instances where the de-identification of imaging and corresponding phenotypic data is needed in order to satisfy regulatory and patient confidentiality requirements, such de-identification can be performed in such a manner that overall context and the ability to recreate the linkage between data elements is maintained. While such an approach by necessity introduces additional technical challenges to the proper de-identification of data, the programmatic utilization of the open-source PixelMed PACS API within our framework allows us to replace identifiers within image metadata with de-identifiers which are consistent with other de-identified phenotypic data, thus demonstrating one possible solution to such challenges.

Interoperability is a major requirement for sharing data and collaborative work in a research environment, especially when that environment spans institutionally and geographically distributed investigators and research participants. By adopting the caGrid middleware in our framework and model system, we are able to facilitate such interoperability on both the syntactic and semantic levels, thus addressing the prior requirement. In particular, the multitier architecture we have adopted simplifies deployment of new technology such as caGrid within our framework. This multitiered framework has additional benefits, including more efficient control of and access to multiple underlying data sources as well as the ability to mitigate potential performance concerns in operational systems through the utilization of appropriate proxy data sources.

There are several limitations of our framework and model system which should be noted, including: 1) our relatively simple approach to text mining does not exploit more advanced semantic interpretation of clinical narrative text, nor does it allow for the detection of and reasoning upon negation within that text, which could affect the recall and precision of information retrieval tasks (however, an analysis of the recall and precision of the text mining process was infeasible during this study due to its limited scope); 2) the query expansion techniques employed, by virtue of the previously noted simple approaches to text mining, do not lend themselves to fully assisting end users in identifying optimal descriptors or codes to be used during the query formulation process, and thus the efficacy of our queries is in part reliant on the domain expertise and heuristic knowledge of our end-users; 3) the text data de-identification scheme employed relies on pre-existing de-identification processes external to our framework and model system; and 4) our evaluation of the described software framework and model system is limited to a single instance and basic scope, owing to the preliminary status of our work as a model formulation effort.

2.7 Conclusion

We have described a model and associated software framework with a promising and unique combination of components capable of providing translational research users with an integrative query and information retrieval tool that spans multiple, critical biomedical information sources, including structured data, narrative text, and images. Furthermore, the inclusion of de-identification mechanisms as standards-compliant electronic data interchange modalities within our system has significant potential to address inherent challenges to the conduct of multicenter or cross-disciplinary translational research in the modern regulatory environment. Our future plans for this project include the continued evaluation of the framework, with specific emphasis on the types of novel hypotheses that can be addressed using such a knowledge-anchored, integrative query platform, as well as its applicability to other usage scenarios. We fully anticipate that our system, with its focus on satisfying a critical translational research information need, will continue to develop into an operational platform for use by researchers at OSUMC that will also be extensible to the broader informatics and research communities.

ICD9 Code   L+N/C   L+N/-   N/C   N/-   L/C   L/-   -/C   -/-
162             1       1     1     1     0     0     0     0
162.2          28      28    26    26    25    24    22    22
162.3          83      80    70    67    68    66    56    54
162.4           6       6     5     5     5     5     5     5
162.5          39      37    35    33    31    29    26    24
162.8          10      10     9     9     7     7     6     6
162.9          29      26    22    20    25    22    18    16

Table 2.1: Combining ICD9 Codes for "Malignant Neoplasm of Trachea, Bronchus, and Lung" with Radiology Reports that Contain the Concept Corresponding to "Lung Nodules" along with Pathology Reports that Contain the Concept Corresponding to "Carcinoma." Each cell gives the number of reports returned. Column labels combine the radiology-report expansion state (L = "lung" expanded, N = "nodule" expanded) with the pathology-report expansion state (C = "carcinoma" expanded); "-" denotes no expansion.

ICD9 Code   L+C/PE   L+C/-   C/PE   C/-   L/PE   L/-   -/PE   -/-
286              9       9      9     9      9     9      9     9
286.1            2       2      2     2      2     2      2     2
286.3          164     163    164   163    164   163    164   163
286.4            7       7      7     7      7     7      7     7
286.5            2       2      2     2      2     2      2     2
286.6           39      39     38    38     39    39     38    38
286.7           43      42     43    42     34    33     34    33
286.9          371     317    367   313    353   291    339   277

Table 2.2: Combining ICD9 Codes for "Coagulation Defects" with Radiology Reports that Contain the Concept Corresponding to "Pulmonary Embolism." Each cell gives the number of reports returned. Column labels combine the expansion state of "lung" (L) and "chest" (C) with the expansion state of "pulmonary embolism" (PE); "-" denotes no expansion.

CHAPTER 3

A DATABASE DE-IDENTIFICATION FRAMEWORK TO ENABLE DIRECT QUERIES ON MEDICAL DATA FOR SECONDARY USE

3.1 Summary

In this chapter, we introduce a novel data de-identification methodology which directly addresses HIPAA and local IRB concerns and gives direct access to much-needed clinical data for secondary use. The data are always guaranteed to be consistently and reliably de-identified. This is the first reported de-identification scheme that ensures a zero-knowledge protocol at the database level, even at the session level. From a clinical and translational researcher's perspective, this means timely access to data. A potential researcher who is preparing a grant submission that is due in a week may not be able to afford waiting for a data analyst to test a last-minute hypothesis which relies on retrospective analysis. In this case, timely access to data may differentiate between the researcher who gets the data and the grant, and the one who does not.

To give a more concrete real-world example that also elaborates on the integrative search capabilities introduced in Chapter 2, let us assume we have a hypothetical investigator who is applying for a multisite grant to investigate interstitial pulmonary diseases, and he or she needs to find the number of accessible tissue samples for sarcoidosis patients from which RNA (for gene expression studies) can be extracted. Using a de-identified data warehouse, without the need for an IRB approval, the researcher can get the most accurate numbers on accessible tissue samples in a timely fashion. On the contrary, if the researcher had directly asked the same question of tissue banking services, even after spending time on an IRB approval, the researcher might potentially be misled. For one, the tissue banking services would not know the final clinical diagnosis (sarcoidosis) given the pathological diagnosis of non-caseating granulomas (which lead to sarcoidosis), because within the clinical workflow the clinical diagnosis is assigned after the pathological report. In this scenario, using a de-identified data warehouse that links all the data elements that are needed, the researcher may be able to find the closest number of samples. For OSUMC, if we look at the samples collected within the past 10 years, this number is around 400 (the researcher should apply for the grant; there are enough samples to work with). Without the integrative, de-identified solution, the researcher would not be able to identify more than 15 samples (the subset of non-caseating granuloma samples that had the prior diagnosis of sarcoidosis at OSUMC), and therefore the researcher's chances of applying for the multisite grant and getting a good score would be very slim. In summary, being able to execute timely, consistent, secure, and reliable queries (eliminating the need for an IRB) on clinical data for secondary use is crucial for institutions.

3.2 Introduction

Enabling clinical and translational research is dependent on the availability of electronic medical record (EMR) systems as data sources [71, 72]. The use of EMR data for research is governed by federal privacy regulations such as HIPAA [73] and the Common Rule [74]. In order to use identifiable data in research, an Institutional Review Board (IRB) approval must be obtained. Even though the IRB procedures can be extensive enough to impede opportune research and discoveries, these rules are in place to ensure complete safeguarding of Protected Health Information (PHI).

Generally, when PHI for subjects within a dataset is removed, EMR data can be treated as nonhuman data. To address the concern of timeliness in research activities, the IRB has granted the Information Warehouse [75] (IW) the "Honest Broker" status (the Honest Broker Protocol [76], or HBP) that allows IW analysts to provide de-identified clinical data for research (Figure 3.1).

Figure 3.1: Data Request Process. Top: normal process with IRB review (protocol formulation, review, and revision before data access). Bottom: HBP process (data request under a data use agreement).

Researchers only need to sign a data use agreement before gaining access to de-identified data. While the HBP greatly accelerated research access to data, IW analysts are overwhelmed with new data requests that need to be manually de-identified and delivered. This manual process also carries the risk of accidental PHI exposure. These limitations and/or problems can be remedied by the creation of a de-identified IW (DIW) [77] in conjunction with query tools that allow researchers to build complex queries without specific knowledge of any query languages.

Figure 3.2: Schematic overview of clinical data flowing from identifiable to de-identified repositories (IW to DIW, via de-identification to a limited dataset and then a de-identified dataset).

In this chapter we report on our de-identification processes (Figure 3.2) that meet HIPAA regulations as well as more restrictive rules set forth by the OSU IRB. Following Figure 3.2, these processes can be explained as follows. Beginning with identifiable data inside the IW repository, all data go through a set of de-identification algorithms (step 1) and the resulting data are stored in a separate database as a limited dataset with dates unchanged in the DIW (step 2). These data can become available to researchers when a limited dataset is requested (step 3). Alternatively, all de-identified surrogate identifiers (DSI) and dates are further de-identified using a 2-stage scheme, both in the database and in each user session, leading to a completely de-identified dataset (step 4). These processes are generalizable: they can be used to export data into destination databases (or warehouses) while satisfying destination security requirements through different parameter sets; they can also be used to publish data from existing systems without exposing source identifiers. In this work, we describe processes that de-identify data in order to enable the use of such data in research-related endeavors. These processes operate at the database system level, and the idea is to make the databases themselves protect the data from both internal and external threats while still enabling incremental data updates.

3.3 Background

Preserving patient privacy and confidentiality while sharing [78] and enabling the secondary use [79, 80] of medical data brings forward the problem of de-identification [79, 80]. This has been the focus of much research in recent years [81–87]. As Narayanan and Shmatikov [88] recognize, unnecessary de-identification may destroy the utility of datasets. However, as noted by Cavoukian and El Emam [89], this does not mean we should stop de-identifying; rather, we should also incorporate careful risk assessments [86, 89, 90].

Many efforts have been made to de-identify data in conformance with HIPAA and IRB regulations, covering both structured [81, 85, 91, 92] and unstructured [93–95] clinical data. Methods have been described using a hash function to build a de-identified bio-repository [91]. Other methods have been reported on either de-identification schemes [87] or attempts to guarantee anonymity [96, 97]. De-identification of medical free text has been studied with great progress in past years [98]. Here, we would like to note that we consider de-identification of MRI and CT images a structured-data de-identification problem, which can be solved by proper handling of DICOM headers (see Chapter 2), and we consider removal of information from the pixels of images (e.g., removal of patient names from ultrasound images) an optical character recognition (OCR) problem, which is outside the scope of this thesis.

De-identification methods for structured data can be divided into two categories: (1) heuristic methods, and (2) statistical methods. When heuristic methods are employed, usually all PHI-containing field values are removed or replaced in a procedural manner. Well-known examples in the literature include the works of Roden et al. [91] and Pulley et al. [99], both of which utilize a one-way hash algorithm (SHA-512) to de-identify their data; in addition, patients in their health system are presented with opt-out forms so they can be excluded from the repositories if they choose. There are also cryptographic approaches in the literature that demonstrate returning aggregate results without disclosing any patient information [100, 101] and performing secure join queries on encrypted datasets originating from multiple institutions [102]. Statistical methods as reported in the studies of Sweeney [97, 103, 104] form the basis of many other works in this category [96]. In k-anonymity [97], Sweeney describes an algorithm that protects the anonymity of an individual in a dataset by ensuring the existence of at least k records with the same characteristics (e.g., at least four other records have the same values) belonging to other individuals in the same dataset. Also in this category, many differential-privacy-based methods [105] are defined in order to govern the inclusion or removal of database fields or items in datasets. Generally, the probability of re-identification versus the effect of added noise on the statistical meaning derived from the datasets is calculated and acted upon.
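The k-anonymity property described above can be checked mechanically: every combination of quasi-identifier values must occur in at least k records. The following is a minimal illustrative sketch (not Sweeney's algorithm itself); the field names and example data are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier value combination occurs in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical records with 3-digit ZIP prefix and birth year as quasi-identifiers.
data = [
    {"zip3": "432", "birth_year": 1960, "dx": "173"},
    {"zip3": "432", "birth_year": 1960, "dx": "496"},
    {"zip3": "430", "birth_year": 1955, "dx": "716.91"},
]
print(is_k_anonymous(data, ["zip3", "birth_year"], 2))  # False: the last record is unique
```

In practice such a check is only the verification step; achieving k-anonymity requires generalization or suppression of the quasi-identifier values.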

De-identification of unstructured medical text is an active research area, and much advancement has been made. The research community has even organized challenges, such as the i2b2 challenge, where de-identification and concept extraction from medical text were evaluated using precision, recall, and balanced F-measure (2 × Precision × Recall / (Precision + Recall)) against a community-prepared and annotated gold-standard medical text corpus [98]. Promising methods include systems combining dictionaries with pattern matching [106] as well as statistical systems [107]. In the i2b2 challenge, the best-performing system was the work of Wellner et al., who combined two toolkits for named entity recognition (Carafe and LingPipe) [107] and achieved precision, recall, and F-measure all greater than 96%. Later, using the same i2b2 corpus, Uzuner et al. produced higher precision (99%), recall (97%), and F-measure (98%); they used an open-source Support Vector Machine (SVM) library called LIBSVM in their implementation [108].
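The balanced F-measure used in these evaluations is the harmonic mean of precision and recall; a one-line sketch shows how the reported figures relate:

```python
def f_measure(precision, recall):
    """Balanced F-measure: 2 * P * R / (P + R), the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision 0.99 and recall 0.97 yield an F-measure of about 0.98,
# consistent with the figures reported for the LIBSVM-based system.
print(round(f_measure(0.99, 0.97), 2))  # 0.98
```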

In a recent review by El Emam et al. [86], the risk of re-identification is shown to be quite low for a properly de-identified dataset. Empirical tests conducted by statistical experts assembled at the U.S. Department of Health and Human Services' Office of the National Coordinator for Health Information Technology have shown that the risk of re-identification is as low as 0.013 percent when HIPAA safe harbor methods are used. Their tests were conducted under realistic conditions using a 15,000-patient set [109]. It is also worth noting the potential for leaks or attacks caused by rogue employees (inside jobs) [89].

In cryptography, a protocol by which one party interactively answers another's question without revealing the underlying hidden secret is called a zero-knowledge protocol [110]. To give a medical-dataset-related example, one way to approach a zero-knowledge protocol could be to add a random number to patient record identifiers while answering a question. Here, we would produce a new set of unique random codes for each patient for the given question, and this approach could potentially be used as a building block for developing HIPAA- and Common Rule-compliant query methods [111].
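A minimal sketch of this idea follows: each question is answered with freshly generated surrogate codes, so answers to different questions cannot be linked through patient identifiers. The function name and identifiers are hypothetical.

```python
import secrets

def answer_question(patient_ids):
    """Issue a fresh random surrogate code per patient, valid for this one question only."""
    return {pid: secrets.token_hex(8) for pid in patient_ids}

ids = ["900000001", "900000002"]
codes_q1 = answer_question(ids)   # codes used when answering question 1
codes_q2 = answer_question(ids)   # unrelated codes for question 2
# The same patient receives unrelated codes across questions (with overwhelming
# probability), so result sets from different questions cannot be joined on identity.
```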

3.3.1 The OSUMC Honest Broker Protocol

The IW's "Honest Broker" status as a provider of de-identified clinical data for research purposes was approved as an annually reviewed procedural protocol by The Ohio State University IRB in April 2006.

IW data analysts prepare the de-identified data by removing PHI and deliver the dataset as a bundle of coherently linked files to the researcher. In a de-identified dataset, dates are recorded as time intervals from a patient's first visit. Alternatively, actual dates, ZIP codes, and ages over 89 can be included in a limited dataset [88, 112]. To safeguard against accidental identification, resulting de-identified or limited datasets with fewer than 25 records are not delivered; this number is determined by the OSU IRB protocol (and can be adjusted for other institutions). Under certain circumstances, a limited dataset is preferred over a de-identified dataset because of the inclusion of temporal elements in studies. Also, queries based on previous results are not allowed, to prevent users from performing longitudinal studies using either a limited or a de-identified dataset. DSI are changed between each query.

This approach, while allowing retrospective analysis, inhibits forward longitudinal study of particular subjects. Furthermore, should an inadvertent identification occur due to an investigator's prior knowledge of a particularly unique data pattern, the Data Use Agreement stipulates that the investigator must immediately seek IRB oversight.

3.4 Design Objectives

Our first objective is to provide a framework simple enough that it can be easily modified in response to potential policy changes from our IRB or HIPAA, or to structural changes in the source database(s). Moreover, there are other honest broker protocols in the literature, such as the University of Michigan [113] and the University of Pittsburgh [114] honest broker systems. The design of our framework should be generic enough that it can, as a whole or in part, be adopted or adapted in these systems based on specific institutional needs.

To support both research and education, it is desirable to have a completely de-identified database (DIW) that is similar in data structure to the identifiable version (IDB). This DIW must be HIPAA compliant in terms of the masking or removal of PHI, and should ideally be capable of defending against re-identification attacks, both internal and external. Structural similarity between the IDB and DIW helps ensure that applications previously developed for the IDB remain applicable to the DIW with minimal modification.

3.5 System Description

In general, our de-identification framework is constructed as a methodology for extracting, transforming, and loading (ETL) data from multiple sources into a comprehensive de-identified database (or data warehouse). Quite often, sources of data within large medical institutions reside across multiple systems; thus, even when they are collected in a single repository (e.g., a data warehouse), their origins are usually federated. This means that potentially multiple record identifiers could have been used for a given patient. Hence, while removing PHI from records, special attention is needed to maintain record consistency across systems.

Figure 3.3: De-identification process across databases; on the left, operations on the source database system; on the right, operations on the destination database system.

Following our system architecture depicted in Figure 3.3, we describe how we create a de-identified version of our source database system. In this model, data flow is one way only, and the objective is to sever all possible linkages from destination to source in terms of patient identifiers, while the source can still safely update the destination. Our description follows the direction of data travel through the de-identification pipeline: we first explain the operations taking place in the source database system, followed by the operations on the destination database system.

3.5.1 Operations on the source system

As shown on the left side of Figure 3.3, these operations take place in a "Source Encryption Account" (SEA), an account that has read privileges (SQL SELECT) on all database schemas (excluding the SYSTEM account) chosen to be de-identified.

The SEA can be further safeguarded by using data encryption functions within the database, if available. No database account (except the SYSTEM account) can see the contents of the SEA. Using its stored procedures, the SEA can execute functions that hash given strings, generate random numbers of variable sizes, or write data to master tables. The hashing and random number generator functions mentioned here are implemented as Java stored procedures using open-source functions of the java.security package (java.security.MessageDigest and java.security.SecureRandom). These are cryptographically strong pseudo-random numbers that are minimally compliant with the statistical random number generator tests [115] specified in FIPS 140-2, Security Requirements for Cryptographic Modules, section 4.9.1 [116].

First, looking at the tables to be de-identified, the SEA creates master tables for each unique identifier type, such as Medical Record Number (MRN), Encounter Number (ENC), and Accession Number (ACC#). These master tables hold all unique identifiers for all patients whose data are going to be de-identified. Then, based on these Master tables, Mapper tables are generated using the SHA-2 hashing-based algorithm implementations (SHA-256, SHA-512) available as part of the stored procedures in the SEA. SHA-2-based algorithms were chosen over SHA-1 because of potential mathematical weaknesses of SHA-1 indicated by experts [117]. During our tests, where we repeatedly performed entire Mapper table builds, we used SHA-256 instead of SHA-512 for faster table generation. Using a secure random number implementation from its stored procedures, the SEA then assigns a unique random number to each hashed value (f(x), Figure 3.3). This process results in the generation and storage of a hashed and random value pair (e.g., hMRN, rMRN) for each unique identifier (e.g., MRN). The use of a hashing algorithm on all IDs in the source system is a necessary step to ensure that there is no accidental mapping exposure to database users who have access privileges to the source system. We chose not to use the hashed string itself, as reported by Roden et al. [91], for two reasons: (1) a hashed string is considered a directly derived ID, and thus its use is questionable under HIPAA; and (2) numeric IDs perform better than long strings in database queries [77].

Using the Mapper tables and its stored procedures, the SEA takes each record from the tables to be de-identified, first re-runs its hash function on the identifiers to be replaced (e.g., computes hMRN from the MRN) in order to find the random value to be used (e.g., rMRN), and then replaces the identifier with the precalculated random value. This record, which now carries a DSI in the form of a real identifier, is written to the corresponding staging table in the destination database's encryption account. Only the Mapper tables, which hold the hashed value and random number pairs, are kept at the SEA. While the hashed values scramble the data, the replacement random number used in the destination system ensures compatibility with the source system. The Master tables are only used as temporary in-memory data structures and are discarded after each data transfer. This guards against accidental exposure of the linkage between random ID and real ID pairs, and prevents internal users from manually linking data between IDs used in the source and de-identified data.
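The production implementation uses Java stored procedures inside the database; the hash-then-randomize scheme itself can be sketched as follows. Function and field names are hypothetical; the one-to-one surrogate assignment and the re-hash lookup mirror the steps just described.

```python
import hashlib
import secrets

def build_mapper(identifiers, digits=10):
    """Mapper table sketch: SHA-256 hash of each identifier -> unique random surrogate."""
    mapper, used = {}, set()
    low = 10 ** (digits - 1)
    for ident in identifiers:
        h = hashlib.sha256(ident.encode()).hexdigest()   # e.g. hMRN
        r = low + secrets.randbelow(9 * low)             # e.g. rMRN
        while r in used:                                 # enforce one-to-one mapping
            r = low + secrets.randbelow(9 * low)
        used.add(r)
        mapper[h] = r
    return mapper

def replace_identifier(record, mapper, id_field="mrn"):
    """Re-run the hash on the identifier, look up its surrogate, and substitute it."""
    h = hashlib.sha256(record[id_field].encode()).hexdigest()
    out = dict(record)
    out[id_field] = mapper[h]                            # DSI in the shape of a real ID
    return out

mapper = build_mapper(["900000001", "900000002"])
row = replace_identifier({"mrn": "900000001", "dx": "173"}, mapper)
```

Because the surrogate is found by re-hashing the incoming identifier, repeated loads of the same patient map to the same surrogate, which is what keeps records consistent across federated source systems.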

3.5.2 Operations on the destination system

As shown on the right side of Figure 3.3, operations here take place in a "Destination Encryption Account" (DEA). Evaluating the tables populated by the SEA, the DEA creates its own Mapper tables using its own stored procedures, analogous to the SEA's. During the creation of these new Mapper tables, each unique identifier receives multiple additional unique random numbers. Hence, each unique random number originally created by the SEA (e.g., rMRN) gets multiple columns of unique random numbers (CURN) at the DEA. For example, an MRN Mapper table would have the rMRN as its key, and with n random MRN columns it would have columns uMRN1, uMRN2, ..., uMRNn. In addition, the DEA creates a random date offset (rDOS) value for each random MRN (uDOS1, uDOS2, ..., uDOSn) and adds these to the Mapper table as columns. As explained below in the creation of Limited Datasets and De-identified Datasets, Mapper tables other than the MRN Mapper (Encounter Mapper, Accession Mapper, etc.) do not have random date offset columns. This process allows the production of independent random series to be used against potential attackers who may try brute-force attacks (by creating multiple sessions) to expose the static random numbers used by the SEA. The number of CURNs is pre-determined prior to setup (we are currently working on a dynamic version as well); during our tests we used a minimum of 2. CURNs are shuffled and refreshed after 40,000 login sessions or within 24 hours (the SEA initiates an update), whichever comes first. The additional layer of CURNs adds less than 10% overhead on query execution times (when a query directly involves an identifier-related field). In Oracle-based systems we eliminate this overhead by utilizing bitmap join indexes [118].
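The DEA-side Mapper structure can be sketched as follows, assuming n = 2 alternate surrogate columns as in our tests; the bounded date-offset range used here is an arbitrary illustrative choice, not the production parameter.

```python
import secrets

def build_dea_mapper(r_mrns, n=2, max_offset_days=30):
    """For each source surrogate rMRN, keep n alternate surrogates (uMRN1..uMRNn)
    and n random date offsets in days (uDOS1..uDOSn)."""
    return {
        r: {
            "uMRN": [1_000_000_000 + secrets.randbelow(9_000_000_000) for _ in range(n)],
            "uDOS": [secrets.randbelow(2 * max_offset_days + 1) - max_offset_days
                     for _ in range(n)],
        }
        for r in r_mrns
    }

def refresh_curns(dea_mapper):
    """Periodic shuffle/refresh: regenerate all alternate surrogate columns."""
    n = len(next(iter(dea_mapper.values()))["uMRN"])
    return build_dea_mapper(dea_mapper.keys(), n=n)

dea = build_dea_mapper([6708389166, 5273519526])
```

Regenerating the CURN columns wholesale, as refresh_curns does, is what defeats an attacker who opens many sessions hoping to observe a stable surrogate.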

The destination system uses two more database accounts to expose data to its users: (1) a Limited Account (LA) enabling access to Limited Datasets, and (2) a De-identified Account (DA) enabling access to De-identified Datasets. Neither of these accounts holds any physical tables; all database objects in these accounts are views based on tables and views in the DEA. The DEA controls how frequently the views in these two accounts are refreshed.

Creation of Limited Datasets: Using its stored procedures, the DEA creates views by joining the tables populated by the SEA with the Mapper tables. In these views each unique identifier (e.g., rMRN) is replaced by another (e.g., uMRNn). These views keep the original date values. From these views the DEA creates the limited-dataset views in the LA; view and column names, however, are assigned according to the naming conventions of the source system, which ensures compatibility and simplicity of adaptation for tools and users familiar with the source system. The DEA obtains an additional random number from the session variables for each user login. Session-dependent random variables are preferably picked from higher-cardinality numbers. For example, if we were to map a 9-digit MRN to a 10-digit MRN, the random session addition would be a value between 1,000,000,000 and 9,000,000,001. This addition is introduced to prevent users from comparing results across different studies. It also ensures that a user gets a different identifier for a given patient every time the user logs in to the system (at the system level, the database also limits how long a session can last).
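The 9-to-10-digit session mapping just described can be sketched with a simple additive scheme. This is illustrative only: the actual per-session function is hidden inside the DEA's stored procedures, and addition is used here merely to show the digit-length and cardinality arithmetic.

```python
import secrets

SESSION_LOW, SESSION_HIGH = 1_000_000_000, 9_000_000_000

def new_session_key():
    """Per-login random addition drawn from the high-cardinality range above."""
    return SESSION_LOW + secrets.randbelow(SESSION_HIGH - SESSION_LOW + 1)

def user_visible_mrn(u_mrn, session_key):
    """Map a 9-digit surrogate to a 10-digit, session-specific identifier."""
    return u_mrn + session_key

u_mrn = 670838916                                  # hypothetical 9-digit surrogate
mrn_session1 = user_visible_mrn(u_mrn, new_session_key())
mrn_session2 = user_visible_mrn(u_mrn, new_session_key())
# The same surrogate is seen as a different 10-digit identifier in each session.
```

Since any 9-digit surrogate plus a key in this range yields a value between 1,100,000,000 and 9,999,999,999, the result is always 10 digits, and results from two sessions cannot be compared on identifiers.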

Creation of De-identified Datasets: Figure 3.4 provides a walkthrough of the de-identification process. In this example, an MRN, 900000001, is hashed into a long string (H_MRN in the H_R_MAPPING table) using SHA-256. This hashed string is assigned a random number, 6708389166. Patient data for 900000001 are replaced by 6708389166 as a surrogate identifier and saved in the de-identified IW database (LIM_ENC_DX, the blue box). By using the linkage in USER_MRN, we further de-identify the unique surrogate identifier to a value that is changed periodically (U_MRN). This changing U_MRN is further de-identified by a hidden algorithm using a user-login-session-dependent random key to produce the user-viewable identifier in ENC_DX of the de-identified database (the green box). Once a user has obtained a dataset and has logged off, the de-identified MRN, 4760383073, cannot be linked back to any intermediate or original identifiers (U_MRN, R_MRN, and MRN); even data analysts do not have the capability of linking the result set back to any identifiable information. Also shown in this example is the date shifting result. Subject-dependent date shift values of +4 and -5 days are applied to MRN 900000001 and 900000002, respectively. A session-dependent shift of +2 days is further applied to all date results in this example session's query. This produces different date shifts among different subjects in the same query, as well as different date shifts between the same query run in different user sessions.

Figure 3.4: Overview of identifier de-identification process.

These datasets are created in a fashion similar to the Limited Datasets, with access through the DA (based on privileges, a user can access either the LA or the DA, but not both). The difference is that the date columns are treated as identifiers as well; hence, records are padded with unique random date offset values (for the DA) for each MRN, as well as with random dynamic session variables. As a result, the time intervals between the dates for any specific patient are kept, but the original dates are no longer available. Beginning from the source system, the whole process can be described as a one-way, one-to-one mapping: (source) MRN + hashing + (source and destination) random number + (destination) random number + random session variable.
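The date handling can be sketched as follows, reproducing the worked example of Figure 3.4 (+4-day subject offset and +2-day session offset for MRN 900000001). The key property is that intervals between a subject's dates are preserved while the calendar dates themselves are obscured.

```python
from datetime import date, timedelta

def shift_dates(dates, subject_offset_days, session_offset_days):
    """Apply a per-subject plus a per-session day offset; intervals are preserved."""
    delta = timedelta(days=subject_offset_days + session_offset_days)
    return [d + delta for d in dates]

visits = [date(1996, 1, 4), date(1996, 1, 5)]
shifted = shift_dates(visits, subject_offset_days=4, session_offset_days=2)
print(shifted)  # [datetime.date(1996, 1, 10), datetime.date(1996, 1, 11)]
```

A different subject in the same query receives a different subject offset (e.g., -5 days in the figure), so dates cannot be compared across subjects, while the session offset makes repeated runs of the same query disagree with each other.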

As one might expect, adding additional views and functions over existing table structures brings performance overhead. However, this overhead can be minimized or eliminated using methods such as placing additional indexes on source tables or pinning frequently used lookup tables in shared memory, resulting in dramatic improvement in query execution times and overall performance. Being a read-only system (end users are not allowed to alter or write new data), our de-identified instance was fine-tuned for fast query response times rather than for real-time data updates. Our current configuration is designed for updating once daily.

3.5.3 System Validation

We performed our system validation by following the trail of the data movement and tested the necessary functionalities along the way. Some of these functionalities are in place to fulfill HIPAA requirements; others fulfill our internal IRB requirements. While our focus here remains the validation of Limited and De-identified Dataset generation, we cover validation of all main system functionalities.

ZIP code roll-ups: Our current HBP requires that we follow the HIPAA Privacy Rule [3]. Hence, we retain ZIP codes with the same 3 initial digits only when those areas contain more than 20,000 people. The scripts related to this functionality are executed in the SEA before the data are passed on to the DEA. We tested this feature by declaring different under-populated ZIP codes, and we verified the results by counting the merged population changes and the disappearance of the under-populated ZIP codes in the result set.
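The roll-up rule can be sketched as a lookup against a table of ZIP3 populations; the population figures below are hypothetical, and collapsing under-populated areas to "000" follows the HIPAA safe-harbor convention.

```python
def roll_up_zip(zip_code, zip3_population, threshold=20_000):
    """Truncate to the first 3 digits; collapse under-populated ZIP3 areas to '000'."""
    zip3 = zip_code[:3]
    return zip3 if zip3_population.get(zip3, 0) > threshold else "000"

populations = {"432": 1_200_000, "045": 15_000}   # hypothetical ZIP3 populations
print(roll_up_zip("43210", populations))  # 432
print(roll_up_zip("04501", populations))  # 000
```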

Handling Patients over Age 89: This feature only applies to de-identified datasets. Within our framework, when patients reach age 90, updates to age-related fields in their records are suspended. The SEA continues to process these records and keeps them in a separate table. Updates to the DEA for these patients occur only after the patients are deceased. Until that time, their age is kept at the fixed value of 89. We verified this feature by querying the datasets to be passed from the SEA and found no patients over the age of 89. We also verified that, when the suspended dataset (patients 90 and older) was merged back, we could retrieve the original dataset.

Limited and De-identified Dataset Generation: In theory, each source identifier should be replaced by a new identifier every single time. One-to-one mapping is a rule that we enforce on all tables and views handling transactions related to identifier de-identification. For completeness, we still performed test queries across the system. For example, one of the queries we executed counted the maximum number of patients and encounters seen among different diagnosis codes, grouped by month and ordered by maximum encounters. We executed similar queries in the source system and compared the results with the limited dataset. Then, we executed similar queries without date constraints and compared the results from the source database with results from the limited and de-identified datasets, verifying that the total counts matched in all cases. Finally, running the same query under multiple accounts and sessions, we verified that during concurrent sessions by multiple users (10 during our tests) or consecutive sessions by the same user (4 during our tests), no identical identifiers are retrieved.

We are aware that having a parameterized, function-based date obfuscation function enables the de-identified dataset to produce results similar to the source dataset (which has real dates) when small random shifts are used during date field generation. The statistical variations and associated re-identification risk factors introduced by small random time shifts are currently outside the scope of this thesis and will be further evaluated and reported later.

Query rejection: As an optional, parameterized functionality, query rejection can be turned on to safeguard against at least two scenarios, in order to minimize the possibility of subject re-identification: (1) the number of records returned is below a preset threshold; and (2) the number of unique subjects represented in the returned records is below a preset threshold. Our Honest Broker status through the OSU IRB mandates this parameter to be set at 25 or higher, but it can be set to any number through a user-definable parameter. To test this feature, we executed queries that would return results below the preset parameter. We verified that when queries return fewer than 25 distinct subjects and/or records, a message is presented to the user instead of the result set, while normal queries behave seamlessly for the user. The query rejection functionality is implemented as an optional in-database module using Fine Grained Auditing [119].
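The rejection rule itself amounts to a simple guard over the result set. In this sketch the threshold and subject field are parameters (25 being our IRB-mandated minimum); the production version enforces the rule inside the database rather than in application code.

```python
def release_result(rows, subject_field="mrn", threshold=25):
    """Return rows only if both record count and distinct-subject count meet the
    threshold; otherwise return a rejection message instead of any data."""
    if len(rows) < threshold or len({r[subject_field] for r in rows}) < threshold:
        return "Query rejected: result set is below the minimum size set by protocol."
    return rows

small = [{"mrn": i} for i in range(10)]
large = [{"mrn": i} for i in range(30)]
print(release_result(small))          # rejection message
print(len(release_result(large)))     # 30
```

Note that the distinct-subject check matters independently of the record count: 30 records drawn from 10 patients must still be rejected.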

3.5.4 Test Environment and Setup

For our tests we transferred 4 tables from our source clinical database to our de-identified instance, and then generated limited and de-identified dataset versions of each table at the destination (Table 3.1).

Tables          Columns   PHI   Dates   Records      Description
ENC_CLIN        45        2     4       5,697,118    OSUMC clinical encounters
ENC_DX          15        2     2       20,175,845   One or more diagnoses for each encounter
ENC_ICD9_PROC   14        2     2       1,899,487    One or more procedures for each patient
PATIENT         16        1     3       1,005,657    Patient demographics

Table 3.1: Tables de-identified during testing

Using these tables and the Mapper tables, we created 2 views for each table: one view for limited dataset access, and another for de-identified dataset access. The following is an example SQL code snippet for these types of views:

CREATE OR REPLACE VIEW PATIENT_DEIDENTIFIED
    (MRN, GENDER, RACE, DOB, ...)
AS SELECT b.MRN_R + DEIDVW.GET_MRN() MRN,
    a.GENDER,
    a.RACE,
    ...
FROM PATIENT.PATIENT_LTD a, ENCRYPT.MRM_MAPPER b
WHERE b.MRN_RANDOM = a.MRN;

As mentioned earlier, users get their random numbers for each new session by calling stored procedures (DEIDVW) of the DEA. Since these views are function based, there are no additional maintenance requirements.

All tests were performed on a Sun Fire V445 server (4x 1.6 GHz UltraSPARC IIIi CPUs, 32 GB of memory) running Solaris 10 and Oracle Relational Database Management System (RDBMS) 11gR1, with tablespace compression turned on. All necessary indexes were created on all patient identifier fields (rMRN, rENC, etc.), date fields, and code lookup fields, with table statistics gathered using all available data prior to test query executions. Frequently used code lookup tables (Mapper tables) and their indexes were pinned into memory for fast access. We allowed each connection to utilize only a single CPU.

3.6 Test Queries and Results

During our query performance evaluation we executed 8 representative queries (see Table 3.2 in the appendix for query details). We executed each query 20 times with 4 different parameter sets, on 2 views per source table (one representing a limited dataset and one a de-identified dataset), and we measured query execution times (1,280 total executions) by CPU time (1.4–44 seconds) as reported by Oracle's SQL Trace Facility and TKPROF utility. Our average CPU time was 15 seconds. These are considered reasonable execution times for the given types of queries in our environment.

Figure 3.5: Average query execution times, de-identified (red) vs. limited (blue). These queries are required to execute in under one minute in our source identifiable environment. Detailed descriptions of the queries, along with their syntax, are provided in the Appendix.

Overall, during our tests, views for limited datasets performed better than their de-identified counterparts (Figure 3.5). This is because we perform fewer lookups during the creation of limited datasets (since original dates are used). In fact, since the de-identified views are generated by more complex functions, slowdowns in query times are expected (e.g., Query 3 and Query 5 in Table 3.2) whenever join conditions on date fields are present. However, the whole system being read-only allows us to optimize for fast reads (rather than reads and writes); hence, we stay within reasonable query response times. Also, through our experiments we observed that as queries get more complex (e.g., Query 6 is a more complex query than Query 4), query execution times are not impacted as much as one might expect. Although this may seem counterintuitive, our query evaluations revealed that the additional query complexity enabled this performance upturn through increased filtering.

3.6.1 Generalizability

As mentioned earlier, our framework is a de-identification framework that is, in general, a HIPAA- and local-IRB-compliant ETL methodology. Our initial local instances were built using PL/SQL and Java stored procedures for data transfers. However, different enterprise-level production environments may have their own requirements for software deployment. Therefore, to test generalizability, we ran all our scripts through two other industry-accepted ETL tools: IBM DataStage (version 8.5) [120] and Oracle Warehouse Builder (11g) [121]. Both tools allow execution of local scripts in databases; hence, we were able to transfer data from a SEA to a DEA in both cases. In addition, we created an example i2b2 database instance to demonstrate the DEA's session-based capabilities [122], and another prototype system that demonstrates generation of de-identified image datasets, which can also be viewed on mobile devices (e.g., iPad, iPhone, or Google Android-based devices) [123].

3.6.2 Reliability

In addition to measuring how queries performed in our environment, we further evaluated the reliability of our de-identification framework.

Internal Consistency: As a continuation of our development and potential problem discovery efforts, we created artificial datasets with one non-PHI field injected with a unique identifier (IUI) marking each record. We created 10 artificial tables holding 20 million such records. We then ran these tables through our de-identification framework for 100 consecutive runs and, using a third independent database (independent from both the source and destination databases), we compared the resulting de-identified sets from the 100 independent sessions. As expected, our IUIs matched 100% of the time, while PHI fields either mismatched 100% of the time or contained no data, depending on the previously defined HIPAA requirements (and stricter OSU IRB requirements when necessary, based on the number of subjects returned).
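The consistency check above can be sketched in a few lines. This is a toy stand-in, not the framework's actual PL/SQL implementation: the field names (`iui`, `mrn`, `name`) and the hash construction are illustrative assumptions. The injected unique identifier passes through untouched; the record identifier is replaced with a session-dependent pseudo-value; direct PHI is dropped.

```python
import hashlib

def deidentify(record, session_seed):
    """Toy de-identification step (illustrative only): keep the IUI,
    replace the record identifier with a session-dependent pseudo-value,
    and drop direct PHI outright."""
    pseudo = hashlib.sha256(f"{session_seed}:{record['mrn']}".encode()).hexdigest()
    return {"iui": record["iui"], "mrn": pseudo, "name": None}

def check_runs(records, n_runs=3):
    """Mimics the consistency check described above: across independent
    sessions, IUIs must match 100% of the time, while PHI-derived fields
    must never match."""
    runs = [[deidentify(r, seed) for r in records] for seed in range(n_runs)]
    for a, b in zip(runs, runs[1:]):
        assert all(x["iui"] == y["iui"] for x, y in zip(a, b))  # 100% match
        assert all(x["mrn"] != y["mrn"] for x, y in zip(a, b))  # 100% mismatch
    return True

records = [{"iui": i, "mrn": f"M{i:04d}", "name": "n"} for i in range(10)]
check_runs(records)
```

In the actual evaluation the comparison ran over 10 tables, 20 million records, and 100 sessions; the logic of the check is the same.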

Random number generation during sessions (session security): As can be seen in Figures 3.3 and 3.4 in our METHODS section, final pseudo-identifier generation for a given session relies on random numbers generated for use during that session. Therefore, in order to hide the underlying identifier or pointer, our random number generation process has to be rigorously tested, because if the DSIs are exposed and not changed, they can potentially be used as the "new" patient identifiers. For that purpose we followed the National Institute of Standards and Technology (NIST) guidelines for statistically testing random and pseudorandom number generators (PRNG) [124]. NIST statistical evaluations mainly measure algorithmic predictability through multiple tests [125]; a PRNG method passing multiple tests indicates its strength. While NIST's testing package includes 15 tests, some of these tests supersede others; for example, Maurer's "Universal Statistical" Test [126] supersedes the Monobits Test [125, 127]. However, the former is intended for longer bit sequences; therefore, we did not employ Maurer's Test during our evaluation but instead used the Monobits Test (our test numbers were 32 bits long). We looked at 86400 pseudorandom numbers (PRN) generated by our session login script, representing one PRN generation for each second of a 24-hour period. These numbers were then converted into 32-bit binary sequences and given as input to the NIST package (NIST recommends a minimum of 1000 PRNs for reliable results; we used 86400). Table 3.3 shows our test results.

In Table 3.3 we provide a condensed version of our results. The results report includes the proportion of passing sequences, where each pass or fail is assigned based on the internal P-value for each test. During NIST statistical testing, a sequence passes a statistical test whenever the P-value ≥ α and fails otherwise; for further details please refer to the NIST documentation [125]. During our tests, some of the statistical tests were applied multiple times according to the default settings of the NIST testing package. For example, the Non-Overlapping Template Matching Test was repeated 149 times; we had 104 results with a passing proportion of 32/32 (42 with 31/32, etc.). Our PRNG methodology produced satisfactory random numbers on every applicable NIST statistical test.
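As a concrete illustration of the kind of per-sequence decision the NIST suite makes, the Frequency (Monobits) Test used in our evaluation can be sketched as follows. This follows the definition in NIST SP 800-22 (the function name and α = 0.01 default are our choices, not part of the standard's API):

```python
import math

def monobits_test(bits, alpha=0.01):
    """NIST Frequency (Monobits) Test: checks whether the numbers of ones
    and zeros in a bit sequence are approximately equal, as expected for a
    random sequence. Returns (P-value, pass/fail at significance alpha)."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)           # map {0,1} -> {-1,+1} and sum
    s_obs = abs(s) / math.sqrt(n)
    p_value = math.erfc(s_obs / math.sqrt(2))  # complementary error function
    return p_value, p_value >= alpha

# A perfectly balanced 32-bit sequence passes...
p_ok, passed = monobits_test([0, 1] * 16)
# ...while a constant 32-bit sequence fails.
p_bad, failed = monobits_test([1] * 32)
```

In our evaluation, each of the 86400 session-generated 32-bit sequences was subjected to this test (among others), and a proportion of passing sequences was reported, as in Table 3.3.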

3.7 Discussion

It is worthwhile for readers to compare the necessity of the steps in our framework against existing systems. Our design objectives were to form a core framework that is flexible enough to aid operational needs and enable future modifications. Within our framework, the source data can be in any form or structure in a database, and after de-identification that form and structure can be maintained (with the exception of free-text reports). This approach decouples query tools from operational security needs by relying on database security: it takes the maintenance and responsibility of HIPAA compliance away from query tools and pushes these responsibilities down to the database level. This means most of the open source or commercial tools used for looking at the data (e.g., for training and research purposes) can remain operational with minimal or no changes when pointed to a de-identified instance. This is crucial, especially for open source tools, because with enough time and effort their behavior can be altered or modified, and any security provided at the tool level can potentially be bypassed. The individual components of our framework are based on well-established and well-known methods such as hashing and random number generation, which makes it easily adoptable and adaptable [122, 123]. The cryptographic-level information security methods used in our framework are empirically measurable, and they achieve a 100 percent pass rate on well-established national statistical standards [124, 125].

The unique combination in which the individual methods are applied makes this framework compliant with HIPAA, the Common Rule, and local IRB requirements.

On the source system, the use of a hash function, followed by another random number before the data are sent from the SEA to the DEA, prevents hash keys from being exposed to potential outside attackers, while the hash keys themselves provide a defense layer against potential internal attackers. If we were to rely on hash keys only, an outside attacker could potentially expose all patient identifiers by knowing only a subset of identifiers (the attacker would then have both the input and the output for a given subset, so the method could potentially be broken). While there are examples in the literature using the one-way-hash method [21, 29], in those examples every patient included in the datasets has an opt-out signature on file. We are trying to avoid the need for collection of opt-out forms (in our source EHR systems we have more than 2.6 million patients, with admissions dating back to 1985), and this can be avoided by satisfying HIPAA and the Common Rule. We would also like to note that under HIPAA, use of a one-way hash as a substitute for de-identification is explicitly forbidden. In practice, this means access to data for research use cannot be granted without IRB approval when using a one-way-hash-only approach.
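The layering described above can be sketched minimally as follows. This is an illustration of the idea, not the framework's actual construction: the function name, SHA-256, and the 16-byte salt size are our assumptions. The point is that mixing a fresh random value into the one-way hash before data leave the SEA means that knowing a subset of (identifier, output) pairs no longer lets an attacker enumerate the rest.

```python
import hashlib
import secrets

def pseudo_identifier(patient_id: str, session_salt: bytes) -> str:
    """Sketch of hash-plus-random layering: the one-way hash alone defends
    against casual internal inspection, while the per-session random salt
    keeps the hash keys themselves from being exposed externally."""
    return hashlib.sha256(session_salt + patient_id.encode()).hexdigest()

salt_a = secrets.token_bytes(16)  # fresh random value per session/transfer
salt_b = secrets.token_bytes(16)
first = pseudo_identifier("123456789", salt_a)
second = pseudo_identifier("123456789", salt_b)
# The same patient maps to different pseudo-identifiers across sessions,
# so an exposed value cannot serve as a stable "new" patient identifier.
```

Within one session the mapping is stable (so joins still work), but across sessions it changes, which is exactly the property the text argues a bare one-way hash lacks.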

On the destination system, the use of session-based random number generation methods enables implementation of a zero-knowledge protocol for the medical datasets recycled for secondary use. These methods are in place to prevent patient identifiers from being traced back to record identifiers. Here, we would like to note that a simple records-based system cannot be considered exempt from HIPAA: in practice, de-identifiers such as static random numbers that remain unchanged across queries cannot be used by themselves in order to receive exemption. The added session-dependent identifier generation and query rejection methods minimize the risk of re-identification. Our current (pseudo)random number generator provides mapping by increasing the cardinality of the data: when a 9-digit random identifier is mapped to a 10-digit one, the chance of two users getting the same number in concurrent sessions, or of the same user getting the same number in consecutive sessions, is on the order of 1 in 10^10 within our scheme. Considering this is the third place where random switching takes place, even with a brute-force attack the attacker can only get back to the previous source number given by the DEA, which is replaced within 24 hours. In this scheme, even if the physical hardware is stolen or lost, there would not be any identifier that can be mapped back to the source data, since the numbers from the SEA are nowhere in the destination system.
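The cardinality-increasing mapping can be illustrated with a short sketch (the function name is hypothetical, and the production generator is the PL/SQL routine evaluated in Section 3.6.2, not this Python code):

```python
import secrets

def session_identifier() -> str:
    """Illustrative sketch: a 9-digit internal identifier space is replaced
    by 10-digit random identifiers drawn fresh each session. With 10**10
    possible values, the chance that two sessions draw the same number is
    on the order of 1 in 10**10, as noted in the text."""
    return f"{secrets.randbelow(10**10):010d}"

sid = session_identifier()  # a fresh 10-digit string each session
```

Because the identifier is regenerated per session, it cannot accumulate meaning across queries the way a static random number would.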

Unlike k-anonymity and differential privacy based methods, our query restriction methods do not alter query result sets beyond HIPAA requirements. For example, while all patients over 89 may be marked as 89 regardless of their age, no female patient would be marked male (or vice versa) in order to prevent re-identification. There is growing concern that tampering with medical data in order to prevent re-identification may diminish its value and render the data unusable [89, 96], and following HIPAA and the Common Rule has already been shown to be effective [89, 109]. If and when a query produces a narrow patient population (an increase in the risk of re-identification), rather than returning an altered dataset our query restriction methods simply produce a message telling the investigator to seek further IRB guidance in order to access the dataset requested by the query. The number of patients available in our QA system was around 1 million, and this number will exceed 2.6 million in the production version. The minimum number of distinct records suggested by our IRB for our environment was 25 (10^-6 of the current population), and we predict that as the patient population grows there will be fewer queries requiring specific IRB approval. However, we would like to point out that this is a number each institution should evaluate based on its local environment (the level of risk each institution is willing to take may differ). Nevertheless, if an institution chooses to employ further restrictive methods, our framework does not prohibit such modifications.
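The reject-rather-than-perturb policy can be sketched in a few lines (names and return shape are our illustrative choices; the production implementation lives in the database layer, not in application code):

```python
# Threshold suggested by the local IRB in the text; treat as configurable.
MIN_DISTINCT_PATIENTS = 25

def release(rows, distinct_patient_count):
    """Sketch of the query-restriction policy: instead of perturbing the
    result set (as k-anonymity or differential privacy would), a query
    whose patient population is too narrow is rejected outright, with a
    message directing the investigator to the IRB."""
    if distinct_patient_count < MIN_DISTINCT_PATIENTS:
        return None, ("Result population too narrow; please seek further "
                      "IRB guidance to access this dataset.")
    return rows, None

rows, message = release([("dx", "250.00")], distinct_patient_count=12)
# rows is None here; only the guidance message is returned
```

Note that the data themselves are never altered: queries either return exactly the HIPAA-compliant result set or nothing at all.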

We acknowledge there are limitations to our de-identification framework. Despite removing all HIPAA-required PHI and taking additional preventative measures, there is still a chance of re-identification. As indicated by many researchers in the literature [82, 90, 128], there is always a possibility of re-identification by a primary caregiver, relative, or other party who can identify a patient by recognizing his or her unique combination of diagnostic codes, test results, lab values, etc. Currently, we address this issue through our data use agreements, which are part of our HBP.

While our framework fully supports the inclusion of scrubbed text documents in the de-identification process, we did not include de-identified text documents in our institutional implementation for two reasons. First, analysis of IW customer data requests indicated that more than 95% of queries were code or value based; hence, while addressing our immediate needs we did not focus on text de-identification. Second, we believe that 100% removal of identifiers from free-text reports cannot be guaranteed using automated methods at this point. Even though great progress has been made in medical free-text de-identification in recent years, the inclusion of text documents still carries a higher risk than we are willing to take at this time. Our current repositories include more than 8 million text reports. The best system we have seen in the literature has 99% precision [108]; even if we had adopted such an implementation, with the assumption that it could de-identify any type of text document, we would still carry the risk of exposing more than 80,000 patient identifiers. Currently, for our environment, we are evaluating the inclusion of text documents as a query-only system (no document retrieval allowed) where searches are allowed only through controlled vocabularies [129-131]. In recent years great progress has been made in information extraction from textual documents [132-136], and we believe a combination of these new techniques could potentially speed up lengthy chart reviews. However, if the benefits outweigh the risks for one's purpose, our modular framework enables integration of de-identified text reports. There are mature tools in the literature that can be employed for such purposes [98, 137]. We recommend use of these methods before the data are moved (in the SEA) into a de-identified instance (DEA).

Currently, our framework is being tested for quality and assurance by developers, business analysts, and a select number of expert users (who can write and execute their own SQL queries without assistance). Our analysts report that, when used with queries where de-identification is a requirement, our framework enables time savings of 30 minutes to an hour, depending on the complexity of the dataset, compared to the routine editing performed by analysts. In addition, potential errors that would result in additional laborious work are eliminated. Even though a general end-user savings or usability report is not available at this point, based on the responses from our expert users we predict users will appreciate the time savings, since it is quite possible for an end user to wait 10 days to 3 weeks to receive manually prepared datasets for their queries from an analyst.

3.8 Conclusion

The creation of a De-identified Information Warehouse (DIW) is a continuing effort at the OSUMC IW to better support research activities using clinical data. The ultimate goal is to enable a direct connection between a researcher and the data. The IW's HBP has shortened the time it takes for researchers to access the data they need: compared to 10 data requests in 2003, the IW received 256 and 311 research data requests in 2009 and 2010, respectively.

With the HBP, OSUMC researchers can gain access to limited and de-identified data much faster than before. However, IW analysts can only process so many data requests at any given time. The new bottleneck facing researchers is the lengthening data request queue. As a solution, DIW can easily be coupled with commercial database query tools such as Oracle Answers [138]; in-house developed query tools such as uQuery [139]; or open source development efforts such as i2b2 [122, 140, 141], caGrid [142] etc. This can improve the efficiency of data requests by letting researchers perform their own queries on their time, with greater interactivity and increased flexibility.

Our framework successfully de-identifies and removes all HIPAA-mandated PHI from structured data elements, and provides conformance with the local IRB, while maintaining data integrity. The framework guarantees that new identifiers are generated for each session. This makes it a suitable tool for aiding the core de-identification operations that need to take place at medical data warehouses in order to support non-human-subject research using clinical data. Data sharing between multiple institutions, on the other hand, has other challenges associated with data standardization, network management, and further issues beyond the scope of this work. Once these challenges are properly addressed, this framework can be used as part of the solution for multi-institutional data sharing.

Query 1: Number of diagnoses with diabetes.
SELECT count(1) FROM ENC_DX_LTD WHERE dx_cd like '250%';

Query 2: Number of patients who had a primary diagnosis of "Ulcer of Other Part of Foot" and a non-primary diagnosis of diabetes.
SELECT count(distinct a.mrn) FROM ENC_DX_LTD a, ENC_CLIN_LTD b WHERE a.mrn = b.mrn AND a.encounter_no = b.encounter_no AND b.prim_dsch_dx_cd = '707.15' AND a.dx_cd like '250%' AND a.dx_rank <> 1;

Query 3: Number of diagnoses with diabetes where the patient stayed more than 30 days.
SELECT count(MRN) FROM ENC_DX_LTD b WHERE b.dx_cd like '250%' AND b.DSCH_DT - b.adm_dt > 30;

Query 4: Number of patients diagnosed with CHF who recorded a diagnosis of "Unspecified Chest Pain" where length of stay was greater than 30 days.
SELECT count(1) FROM ENC_CLIN_LTD a WHERE a.prim_dsch_dx_cd = '428.0' AND a.mrn IN (SELECT mrn FROM ENC_DX_LTD b WHERE b.dx_cd = '786.50' AND b.DSCH_DT - b.adm_dt > 30);

Query 5: Patients who had catheterization and transthoracic echo procedures within given date ranges.
SELECT count(distinct b.MRN) FROM ENC_ICD9_PROC_LTD b, ENC_ICD9_PROC_LTD a WHERE a.MRN = b.MRN AND a.ICD9_DT between to_date('1/1/2008','MM/DD/YYYY') and to_date('12/31/2009','MM/DD/YYYY') AND a.ICD9_CD in ('37.21', '37.22', '37.23') AND b.ICD9_DT between to_date('1/1/2008','MM/DD/YYYY') and to_date('12/31/2009','MM/DD/YYYY') AND b.ICD9_CD = '88.72';

Query 6: Patients with a history of CHF who also had catheterization or transthoracic echo procedures within a given date range.
SELECT count(1) FROM ENC_CLIN_LTD a WHERE a.prim_dsch_dx_cd like '428%' AND a.mrn IN (SELECT b.MRN FROM ENC_ICD9_PROC_LTD b WHERE b.ICD9_DT between to_date('1/1/2008','MM/DD/YYYY') and to_date('12/31/2009','MM/DD/YYYY') AND b.ICD9_CD IN ('37.21', '37.22', '37.23', '88.72'));

Query 7: Number of patients who had a primary diagnosis of "Ulcer of Other Part of Foot" and a non-primary diagnosis of diabetes, who also had a length of stay of more than 10 days.
SELECT count(distinct a.mrn) FROM ENC_DX_LTD a, ENC_CLIN_LTD b WHERE a.mrn = b.mrn AND a.encounter_no = b.encounter_no AND b.prim_dsch_dx_cd = '707.15' AND a.dx_cd like '250%' AND a.dx_rank <> 1 AND b.DSCH_DT - b.adm_dt > 10;

Query 8: Number of patients who had a primary diagnosis of "Ulcer of Other Part of Foot" and a non-primary diagnosis of diabetes, whose length of stay was greater than 10 days and who had any history of catheterization or a transthoracic echo procedure.
SELECT count(distinct a.mrn) FROM ENC_DX_LTD a, ENC_CLIN_LTD b WHERE a.mrn = b.mrn AND a.encounter_no = b.encounter_no AND b.prim_dsch_dx_cd = '707.15' AND a.dx_cd like '250%' AND a.dx_rank <> 1 AND b.DSCH_DT - b.adm_dt > 10 AND a.mrn IN (SELECT c.MRN FROM ENC_ICD9_PROC_LTD c WHERE c.ICD9_CD in ('37.21', '37.22', '37.23', '88.72'));

Table 3.2: Queries with different forms and parameters were used during performance evaluations. Each query is repeated in a non-sequential order with different parameters.

Test                                          Proportion (examples for this score)
Frequency (Monobits) Test                     31/32
Frequency Test within a Block                 31/32
Cumulative Sums (Cusum) Test                  31/32 (2)
Runs Test                                     32/32
Test for the Longest Run of Ones in a Block   32/32
Binary Matrix Rank Test                       31/32
Discrete Fourier Transform (Spectral) Test    31/32
Non-Overlapping Template Matching Test        32/32 (104), 31/32 (42), 30/32 (2), 29/32 (1)
Approximate Entropy Test                      32/32
Random Excursions Test                        3/3 (7), 2/3 (1)
Random Excursions Variant Test                3/3 (18)
Serial Test                                   32/32 (2)
Linear Complexity Test                        30/32

Table 3.3: Statistics on pseudorandom number generation. The minimum pass value for a given test is 29 for sequences of size 32 bits; the minimum pass rate for the Random Excursions (Variant) Test is approximately 2 for a sample size of 3. The number of sequences used was 86400. We had a 100% pass rate in all tests.

CHAPTER 4

UNEXPECTEDLY HIGH PREVALENCE OF SARCOIDOSIS IN A REPRESENTATIVE U.S. METROPOLITAN POPULATION

4.1 Summary

As one demonstration of the use of de-identified data, we report on changes in the patient population for sarcoidosis at OSUMC. The prevalence of sarcoidosis in the United States is unknown, with estimates ranging widely from 1-40 per 100,000. We sought to determine the prevalence of sarcoidosis in our health system and to further establish whether the prevalence was changing over time. We interrogated the electronic medical records of all patients treated in our health system from 1995-2010 (1.48 million patients) using the common ICD9 code for sarcoidosis (135) and lung cancer (162), and several other lung diseases characterized, like sarcoidosis, as "rare lung diseases." The patient demographic information (race, gender, age) was further analyzed using association rule mining algorithms to identify signature data patterns. The prevalence of sarcoidosis in our health system increased steadily from 164/100,000 in 1995 to 307/100,000 in 2010, and this trend could not be ascribed simply to changes in patient demographics or patient referral patterns. We further estimate that the prevalence of sarcoidosis exceeds 48 per 100,000 in Franklin County, Ohio, the demographic profile of which is nearly identical to that of the U.S. Sarcoidosis prevalence increased over time relative to lung cancer, a benchmark disease with stable disease prevalence, and exceeded that of other rare lung diseases. We postulate that the observed 2-fold increase in sarcoidosis disease prevalence in our health system is primarily related to improved detection and diagnostic approaches, and we conclude that the actual prevalence of sarcoidosis in central Ohio greatly exceeds current U.S. estimates.

4.2 Introduction

Sarcoidosis is a chronic systemic disease that commonly affects the lungs and afflicts adults in the prime of their lives. Based upon the ACCESS trial, the largest epidemiological study of sarcoidosis to date, the annual incidence of the disease varies from 5/100,000 in whites to 39/100,000 in African Americans, with an overall prevalence estimated to be less than 40/100,000 in the USA [143]. As such, sarcoidosis is characterized as a "rare lung disease" (i.e., fewer than 200,000 cases) in the USA. However, firm data relating to the true prevalence of sarcoidosis are lacking, and it is unclear whether the prevalence of the disease is changing over time.

Several features of sarcoidosis tend to obscure the diagnosis, leading to an under-appreciation of the potential impact of the disease on the health care system and society as a whole. Sarcoidosis frequently presents with non-specific complaints, ranging from fatigue and depression, to "asthma symptoms" (wheezing, cough), to arthritis and muscle pain or weakness. As such, sarcoidosis can mimic other diseases, leading to misdiagnosis and inappropriate treatments. When these non-specific disease manifestations prevail, the underlying diagnosis of sarcoidosis may be overlooked. In support of this concept, a review of 9324 forensic autopsy cases in Cleveland, Ohio indicated that the actual prevalence of sarcoidosis exceeds 300/100,000, an order of magnitude higher than suspected based upon death certificate reporting [144]. If the prevalence of sarcoidosis approaches this forensic figure, the disease would have to be reclassified as a common lung disease.

Several recent developments could contribute to an apparent and/or actual increase in the prevalence of sarcoidosis over the past 15 years. There is reason to believe that the detection of sarcoidosis has improved as clinical standards have shifted towards the more extensive use of high-resolution imaging techniques (e.g., CT scanning) and more effective sampling techniques (e.g., endobronchial ultrasound-guided biopsy). And to the extent that exposure to various environmental antigens promotes sarcoidosis [145], it follows that acute or chronic exposures to inhaled [146] and perhaps ingested antigens [147] could contribute to more cases of sarcoidosis. The objective of this study was to determine if the prevalence of sarcoidosis is changing over time in our health system and in our community.

4.3 Methods

The Ohio State University Medical Center’s institutional Information Warehouse (IW) was interrogated from 1995-2010 by means of Structured Query Language (SQL) queries.

The census data for Franklin County were obtained from the US Census Bureau's online resources (http://quickfacts.census.gov/qfd/states/39/39049.html), contained the actual census values from the years 2000 and 2010 as well as the estimates for the years between them, and were uploaded into the IW in the form of database tables. The patient demographic information (race, gender, age) was gathered for patient groups using specific ICD9 diagnosis codes for sarcoidosis (135), idiopathic pulmonary fibrosis (IPF; 516.3), hypersensitivity pneumonitis (HSP; 495), alpha-1 antitrypsin deficiency (AAT; 273.4), and lung cancer (162); demographics on all patients within our system were collected as well. In this regard, The Ohio State University Medical Center is a regional referral center for all of these rare lung diseases. For the purposes of these analyses, each patient was counted once for a given year. We then applied data mining and statistical analysis techniques in order to better understand and verify the significance of our results. For data mining we utilized the well-known Apriori algorithm, which is designed for mining association rules in large databases [148]. Briefly, the Apriori algorithm operates on large transactional databases to identify frequently co-occurring items or events. A simple example is items frequently co-occurring in market baskets during purchases from a grocery store (i.e., "60% of the time that bread is sold so are pretzels and that 70% of the time jelly is also sold" [149]). The results of such an analysis can be used to identify or understand customer profiles; here we use it to understand the demographic profiles of given sets of sarcoidosis patients. We utilized Weka (version 3.7.2), an open source data mining software package [150] (http://www.cs.waikato.ac.nz/ml/weka/), for our data mining and related visualizations. For statistical analysis and verification we used logistic regression; our analyses were performed using SAS (version 9.2). Before the demographic data were analyzed, as a prerequisite for the Apriori algorithm and the logistic regression methods, we turned our numeric age values (at the time of discharge) into categorical sets by binning them into decades (30s, 40s, 50s, etc.).
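To make the Apriori idea concrete, a minimal frequent-itemset miner is sketched below. This is an illustration of the algorithm's candidate-generation and support-pruning steps, not the Weka implementation we actually used, and the demographic "transactions" are invented for the example, not actual patient records.

```python
def apriori(transactions, min_support):
    """Minimal sketch of the Apriori frequent-itemset algorithm [148]:
    candidate itemsets of size k+1 are built (join step) from frequent
    itemsets of size k, then pruned by minimum support."""
    n = len(transactions)
    singles = {frozenset([item]) for t in transactions for item in t}
    freq = {}
    current = {s for s in singles
               if sum(s <= t for t in transactions) / n >= min_support}
    k = 1
    while current:
        for s in current:
            freq[s] = sum(s <= t for t in transactions) / n
        # join step: combine frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
        k += 1
    return freq

# Illustrative demographic "transactions": (race, gender, age decade)
records = [
    frozenset({"black", "female", "40s"}),
    frozenset({"black", "female", "50s"}),
    frozenset({"white", "female", "40s"}),
    frozenset({"black", "male", "40s"}),
]
frequent = apriori(records, min_support=0.5)
```

With a 50% support threshold, itemsets such as {black, female} survive (they co-occur in 2 of the 4 records), which is the kind of signature demographic pattern the analysis in this chapter looks for.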

4.4 Results

4.4.1 Demographics of the regional population

The demographic profile of Columbus, Ohio closely approximates that of the USA (Table 4.1; and 2010 census results: http://quickfacts.census.gov/qfd/states/00000.html).

                     Race                              Gender          Age
                     African American  Other  White    Female  Male    Median
OSU Health System    14%               12%    74%      52%     48%     49.1
US Census            13%               15%    73%      51%     49%     36.8

Table 4.1: Demographics of Columbus, Ohio patient population compared to U.S in 2010.

4.4.2 Comparison of sarcoidosis prevalence to that of other rare lung diseases

Ohio State University Medical Center (OSUMC) is a regional referral center for all types of lung disease, including sub-specialty clinics for rare lung diseases (IPF, HSP, AAT, sarcoidosis) as well as lung cancer. A total of 1.48 million patients were encountered at OSUMC from 1995-2010 and were included in the analysis. The prevalence of sarcoidosis, and the number of patient encounters with the health system associated with the diagnosis of sarcoidosis, was significantly higher than for all other rare lung diseases (Table 4.2).

Disease       Sarcoidosis     HP              IPF             AAT             All Patients
ICD9          135             495             516.3           273.4           -
Patients      3758            528             2297            179             1.4 Million
Age (avg)     49              53              61              52              46
Sex           58% F / 42% M   51% F / 49% M   51% F / 49% M   45% F / 55% M   54% F / 46% M
Race (W/B/O)  57% / 40% / 3%  83% / 13% / 4%  81% / 15% / 4%  93% / 5% / 2%   74% / 14% / 12%

Table 4.2: Sarcoidosis Prevalence Compared to Other Rare Lung Diseases.

4.4.3 Changes in prevalence of Sarcoidosis over time in our health care system

In keeping with previous reports [143, 151], sarcoidosis was most common in the third through fifth decades (Figure 4.1), with a strong female gender bias (Figure 4.2). Despite regional census data indicating that the ratio of African Americans to whites is 1:5, the absolute number of African Americans presenting with sarcoidosis slightly exceeded that of whites (Figure 4.3). Over the 15-year period analyzed in this study there was no significant change in the age distribution, whereas there was a significant decrease over this time in the proportion of females, from 65 to 56% (p < 0.0001), and an increase in African American race, from 43 to 51% (p < 0.0001), in the patient sample (Figures 4.1-4.3). The overall prevalence of sarcoidosis in our patient population (approximately 95% were local residents, and this did not change over time) increased steadily from 164/100,000 in 2000 to 330/100,000 in 2010, whereas the prevalence of lung cancer and the population of Franklin County were unchanged over time (Figure 4.4). This increase in sarcoidosis prevalence over time remained highly significant after adjusting for changes in regional referral patterns, based upon zip code location of residence.

4.4.4 Estimate of sarcoidosis prevalence in Columbus, Ohio

Limiting the analysis of patient data to residency zip codes within Columbus, OH, and very conservatively assuming that all sarcoidosis cases in this referral base were cared for at our institution, the minimum estimated prevalence of sarcoidosis in Columbus, OH was 48/100,000 in 2010. The actual market share of our health care system for the greater Columbus area ranged from 23% in 1995 to 26% in 2010. Thus, the true prevalence of sarcoidosis in this region may exceed 200/100,000 in 2010.
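The market-share adjustment behind that last figure is a one-line calculation; the sketch below uses 24% as a representative value within the reported 23-26% range (our choice for illustration):

```python
# Back-of-envelope check of the market-share adjustment described above.
observed_prevalence = 48.0  # per 100,000, counting only cases seen at OSUMC
market_share = 0.24         # representative OSUMC share of greater Columbus
implied_prevalence = observed_prevalence / market_share
# -> approximately 200 per 100,000 if OSUMC sees ~24% of regional cases
```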

[Figure 4.1 appears here: stacked bar chart of sarcoidosis patient counts by age decade (20s-80s) for each year, 1995-2010.]

Figure 4.1: Sarcoidosis age distribution, 1995-2010. The average age of the sarcoidosis patients was 48 yrs, with the vast majority falling in the 3rd (light blue), 4th (dark blue), and 5th (red) decades.

4.5 Discussion

The actual health care burden associated with sarcoidosis in the U.S. is unknown; however, this study indicates that the prevalence of sarcoidosis in our patient population is nearly an order of magnitude greater than previously estimated [143] and has increased nearly three-fold within the past 15 years. This dramatic increase in disease prevalence over time is not fully explained by changes in the distribution of the patient population with respect to race or gender. Moreover, the prevalence of sarcoidosis was shown to increase over time relative to lung cancer, an index disease for which the national prevalence trends have remained relatively stable over the past 15 years [152]. Using the most conservative analytical approach, the prevalence of sarcoidosis in Columbus, Ohio, a population

[Figure 4.2 appears here: bar chart of sarcoidosis patient counts by gender for each year, 1995-2010.]

Figure 4.2: Changes in gender distribution over the years 1995 to 2010. Females are represented in blue and males in red.

closely resembling the demographic profile of the U.S., was at least 48/100,000 in 2010, which is over 3 times greater than the estimates of sarcoidosis prevalence in the U.S. reported in the Orphanet Report Series for Rare Diseases (www.orpha.net/orphacom/cahiers/docs/GB/Prevalence_of_rare_diseases_by_alphabetical_list.pdf), a resource used by the NIH's Office of Rare Lung Diseases Research to track disease prevalence.

Previous studies have indicated that sarcoidosis may be much more common than previously predicted. Interestingly, a large autopsy series conducted in Cleveland, Ohio indicated that the true prevalence of sarcoidosis was nearly identical to that reported in our patient population (approximately 300/100,000) [144]. If these results were to reflect national trends, they would correspond to over 900,000 sarcoidosis cases in the U.S., placing it well outside the range of rare lung diseases. Using an extremely conservative approach, including the

[Figure 4.3 appears here: bar chart of sarcoidosis patient counts by race for each year, 1995-2010.]

Figure 4.3: Changes in race distribution over the years 1995 to 2010; overall race distribution. African American (blue) and white (red) race represented the majority of sarcoidosis patients, reflecting regional demographics.

assumption that all regional sarcoidosis patients are cared for at The Ohio State University Medical Center, the prevalence of sarcoidosis was at least 48/100,000 in the greater Columbus, Ohio population in 2010. To the extent that our patient population is representative of trends in the U.S., the burden of this disease on our health care system and society as a whole (e.g., disability, missed work, reduced quality of life) may be much higher than anticipated. The high prevalence of sarcoidosis observed in our patient population is in keeping with European studies indicating that sarcoidosis is much more common than other interstitial lung diseases [153]. Accepting that the annual incidence of sarcoidosis among European and African Americans is estimated to be 3-10/100,000 and 35-80/100,000, respectively [154], and considering that most patients with sarcoidosis are diagnosed in early adulthood and have near-normal life expectancy [155, 156], it follows that the lifetime risk

[Figure 4.4 appears here: prevalence trend lines for the Franklin County population, lung cancer, and sarcoidosis from Jan-95 to Jan-10, with fitted trend lines (R² = 0.9115 and R² = 1).]

Figure 4.4: Sarcoidosis prevalence vs. lung cancer prevalence over time. Note that these trend lines were derived from patients residing in zip codes within Franklin County (Columbus), OH, to reduce bias relating to changing referral patterns. The results here are based upon actual US Census Data from 1990, 2000, and 2010 as well as the census estimates for intervening years.

of developing sarcoidosis in the U.S. would exceed the annual incidence by at least an order of magnitude. Our findings support this simple mathematical model.
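The arithmetic behind this simple model can be checked directly. A minimal sketch, assuming the 2010 US Census population total (~308.7 million) and, for the steady-state step, an illustrative disease duration of 40 years; neither figure is stated in the text:

```python
# Back-of-envelope check of the prevalence arithmetic above.
US_POPULATION_2010 = 308_745_538          # 2010 US Census total (assumed here)
AUTOPSY_PREVALENCE = 300 / 100_000        # ~300/100,000 from the autopsy series [144]

national_cases = AUTOPSY_PREVALENCE * US_POPULATION_2010
print(f"Implied national case count: {national_cases:,.0f}")  # over 900,000

# Steady-state prevalence ~= annual incidence x mean disease duration.
# With diagnosis in early adulthood and near-normal life expectancy,
# a duration of several decades is plausible (assumed: 40 years).
incidence_low, incidence_high = 3 / 100_000, 10 / 100_000  # European Americans [154]
duration_years = 40
lifetime_low = incidence_low * duration_years
lifetime_high = incidence_high * duration_years
print(f"Implied lifetime risk: {lifetime_low * 100_000:.0f}-"
      f"{lifetime_high * 100_000:.0f} per 100,000")
# 120-400 per 100,000: at least an order of magnitude above the annual incidence.
```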

There are a number of limitations of this study that could explain the apparent rapid increase in disease prevalence in our health system within the past 15 years. Potential sources of bias include the availability of more sensitive diagnostic screening techniques (e.g., CT scans), improved survival, greater recognition of sarcoidosis among medical professionals, and changing referral patterns to our institution. Alternatively, environmental (e.g., antigen exposure) or host factors (e.g., stress, vitamin D, smoking) could be contributing to the recent increase in sarcoidosis prevalence. Despite all of these potential confounders, it is undeniable that the regional prevalence of sarcoidosis is at least 48/100,000, much higher than current estimates (see above). Further studies are needed to clarify the actual burden of disease and to identify the variables contributing to the apparent increase in disease prevalence.

In summary, this study demonstrates an unexpectedly high prevalence of sarcoidosis in a large Midwest U.S. community, the demographic profile of which closely matches that of the U.S. The apparent prevalence has been rising rapidly over the past 15 years. We speculate that improved screening (e.g., CT scans) and diagnostic techniques, together with increased disease awareness, have improved our ability to detect disease. This would explain why our results closely approximate those reported by Reid in a large autopsy series conducted in a nearby community (Cleveland, OH) [144]. However, it remains possible that the regional prevalence of sarcoidosis has increased due to changes in host or environmental factors. Uncertainties relating to disease prevalence, together with recent data showing that sarcoidosis-related mortality has been increasing over the past several decades in the U.S. [157], emphasize the need to determine the true prevalence of sarcoidosis.

CHAPTER 5

COMPUTER ANALYSIS OF CHEST CT FOR SARCOIDOSIS

As one demonstration of integrative image search capabilities to support the development of image analysis techniques, we report on a new image analysis technique for evaluating chest CT images, which correlates with pulmonary function tests in pulmonary sarcoidosis patients. This novel two-point correlation CT image analysis tool provides an accurate, quantitative approach for measuring lung disease severity in patients with pulmonary sarcoidosis. No such image analysis tool is currently used in clinical practice. The computerized CT image analysis tool presented here may be useful for objectively assessing disease progression and response to treatment in patients with pulmonary sarcoidosis. Moreover, this tool could be applied to clinical research designed to test the effects of treatments on patients presenting with pulmonary sarcoidosis.

5.1 Summary

Chest CT scans are commonly used in the clinical setting to assess lung disease severity in patients presenting with pulmonary sarcoidosis. Despite their ability to reliably detect subtle changes in lung disease, the utility of chest CT for guiding therapy is limited by the fact that image interpretation by radiologists is qualitative and highly variable. Thus, we sought to create a computerized CT image analysis tool that would provide quantitative and clinically relevant information. Based upon preliminary trials, we established that a two-point correlation analysis approach was able to reduce the background signal attendant to normal lung structures, such as blood vessels, airways and lymphatics. This approach was applied to multiple lung fields to generate an overall lung texture score (LTS), representing the quantity of diseased lung parenchyma. Using de-identified lung CT and PFT data from The Ohio State University Medical Center's Information Warehouse, we analyzed 35 consecutive CT scans for which simultaneous matching PFTs were available to determine if the LTS correlated with standard PFT results. We found very strong inverse correlations between LTS and both FVC and TLC (Pearson's correlation coefficients of −0.92 and −0.88, respectively), whereas the correlation with DLCO was much weaker. The image analysis protocol was conducted quickly (< 1 minute per study) on a standard laptop computer running the publicly available NIH ImageJ Toolkit. Thus, the two-point correlation image analysis tool is highly practical and appears to reliably assess lung disease severity. We predict that this tool will be useful for clinical and research applications.

5.2 Introduction

Pulmonary function testing is currently the standard tool for objectively assessing lung disease severity in pulmonary sarcoidosis [158] and other interstitial lung diseases [159–163]. High resolution chest CT imaging is highly reproducible and is very sensitive for detecting lung pathology [164], unlike pulmonary function tests, which are subject to error relating to the patient, including effort and ethnic variability influencing reference standards, and technical variables, such as equipment, laboratory methodology, and different interpretive algorithms [165]. However, the current approach to interpretation of CT scan results, which depends upon radiologists, is poorly standardized and qualitative, both for clinical and research applications in patients with interstitial lung disease [166]. Thus, there is a need for a more objective and quantifiable approach to chest CT image analysis.

Previous attempts to quantify the severity of interstitial lung disease, particularly fibrotic lung disease, based upon computer-aided CT scan image analysis, were of limited utility. In general, previous studies have reported modest correlations between radiographic features (e.g., lung attenuation) and pulmonary function parameters [167–169], leading the authors of these studies to conclude that computer CT image analysis is "relatively insensitive to textural changes, such as ground glass abnormality, reticular abnormality" [167]. Thus, to account for textural characteristics of the lung relating specifically to disease, while minimizing the "noise" relating to normal features, including blood vessels, airways and other contiguous structures, we developed a two-point correlation function based approach to quantify the severity of diseased lung using conventional resolution CT images. To this end, we developed a proprietary plug-in program which can be operated on the publicly available NIH ImageJ Toolkit (http://rsbweb.nih.gov/ij/index.html). In our sarcoidosis patient population, the computerized image analysis tool is shown to be highly efficient (less than 1 minute per CT scan), to correlate strongly with PFT parameters, particularly FVC and TLC, and to run on an ordinary laptop computer. This novel two-point correlation function based image analysis tool is very practical for clinical and research applications relating to sarcoidosis and potentially other interstitial lung diseases.


Figure 5.1: Schematic representation of our two-point correlation function based approach (LTS). (a) LTS accepts chest CT studies in DICOM format as input; (b) using Hounsfield units, the lungs are segmented, and the segmented lung volume is then raster scanned; (c) during this scan each pixel is compared to its neighbors at various distances within a threshold; (d) this process is repeated for the entire lung segment; (e) the mismatches for each pixel are summed at the individual pixel level and then integrated throughout the volume to give the total volume of mismatches.

5.3 Methods

5.3.1 Sarcoidosis patient population

The Ohio State University Medical Center's clinical Information Warehouse serves as an honest broker for de-identified patient data, including image files and pulmonary function test results. Using "sarcoidosis" and ICD-9 code 135 as search terms, we analyzed 35 consecutive chest CT studies derived from 28 patients with an established diagnosis of sarcoidosis which had simultaneous (within 2 months) pulmonary function test results available.

5.3.2 CT Image Analysis

The lungs were segmented using Hounsfield units, and a two-point correlation function based (TPCF-based) approach (Figure 5.1) was employed to quantify the proportion of diseased lung, using a proprietary plug-in program which can be operated on the publicly available NIH ImageJ Toolkit (http://rsbweb.nih.gov/ij/index.html). The lung texture score (LTS) derived from these analyses represents the percentage of the lung parenchyma exhibiting non-uniform texture, as described in Figure 5.1. Briefly, our TPCF-based approach, LTS, is adapted from the field of materials science [170]. In materials science, while calculating the TPCF, samples from a given section of a material (metal, alloy, etc.) are compared to samples from another section in question, and the TPCF measures the correlation between the two sample populations. During our LTS calculations (TPCF-based approach), rather than measuring the correlation between the sections (pixels in our images), we measure the mismatches for a given CT study. Hence, following Figure 5.1, the LTS calculation can be explained as follows:

Step 1: For a given chest CT (Figure 5.1a), the lungs are segmented using the Hounsfield units (e.g., between −1000 and −200) (Figure 5.1b); during this segmentation operation, a ratio value between the segmented lung tissue volume and the overall CT volume (LR: Lung Ratio) is calculated as well. We then convert the values in this volume from 16 bits (65,536 gray levels) to 8 bits (256 gray levels) and enhance their contrast using histogram equalization.
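The Step 1 intensity pipeline can be sketched as follows. This is a minimal stand-in for the proprietary plug-in: the function name and the simple min-max rescaling are our own assumptions, not the plug-in's actual implementation.

```python
import numpy as np

def to_8bit_equalized(vol16):
    """Rescale a 16-bit CT volume to 8 bits, then equalize its histogram."""
    v = vol16.astype(np.float64)
    span = max(v.max() - v.min(), 1.0)
    v8 = ((v - v.min()) / span * 255.0).astype(np.uint8)

    # Classic histogram equalization: map each of the 256 gray levels
    # through the cumulative distribution function (CDF).
    hist = np.bincount(v8.ravel(), minlength=256)
    cdf = hist.cumsum() / v8.size
    lut = np.round(cdf * 255.0).astype(np.uint8)   # lookup table
    return lut[v8]

rng = np.random.default_rng(0)
vol = rng.integers(0, 65536, size=(16, 16, 16))
out = to_8bit_equalized(vol)
```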

Step 2: Within this given lung volume the pixel values are converted from 8 bits to 4 bits (16 gray levels). For each pixel (Figure 5.1c) in this volume (Figure 5.1d), comparisons are made with the surrounding neighbors. Here, if we were to compare pixels 2 steps away in every direction (X: sagittal, Y: transversal and Z: axial), we would be making 125 comparisons (5×5×5) per pixel. Comparisons between pixels can be made at any distance (e.g., a 1-pixel distance would give 27 comparisons (3×3×3), a 3-pixel distance 343 comparisons (7×7×7), etc.). Using this logic, mismatches for each pixel are thresholded (e.g., if there is more than a 75% mismatch with the surrounding pixels, count the pixel as a mismatch). Then, the mismatches (Figure 5.1e) are integrated over the given lung volume, and this integrated value is turned into a ratio of mismatches for the entire volume (MR: Mismatch Ratio). The final value of LTS is calculated by dividing the Mismatch Ratio by the Lung Ratio (LTS = MR/LR).
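Putting Steps 1 and 2 together, the LTS computation can be sketched in a few lines of NumPy. This is a simplified reimplementation for illustration only, not the proprietary ImageJ plug-in: the quantization collapses the 16-bit → 8-bit → 4-bit pipeline into a single min-max mapping to 16 gray levels, the neighborhood count includes the center voxel (matching the 125-comparison count in the text), and all parameter names are our own.

```python
import numpy as np

def lung_texture_score(ct_hu, lung_lo=-1000, lung_hi=-200,
                       dist=2, mismatch_frac=0.75, levels=16):
    """Sketch of LTS = MR / LR over a 3-D volume of Hounsfield units."""
    lung = (ct_hu >= lung_lo) & (ct_hu <= lung_hi)
    lr = lung.mean()                        # LR: lung voxels / all voxels
    if lr == 0.0:
        return 0.0

    # Quantize the whole volume to 16 gray levels (simplified stand-in
    # for the 16-bit -> 8-bit -> equalized -> 4-bit pipeline).
    lo, hi = float(ct_hu.min()), float(ct_hu.max())
    q = ((ct_hu - lo) / max(hi - lo, 1.0) * (levels - 1)).astype(np.int16)

    # Compare each voxel with every neighbor in a (2*dist+1)^3 cube:
    # 125 comparisons per voxel for dist=2, as in the text.
    qp = np.pad(q, dist, mode="edge")
    nz, ny, nx = q.shape
    diff = np.zeros(q.shape, dtype=np.int32)
    n = 0
    for dz in range(-dist, dist + 1):
        for dy in range(-dist, dist + 1):
            for dx in range(-dist, dist + 1):
                shifted = qp[dist + dz:dist + dz + nz,
                             dist + dy:dist + dy + ny,
                             dist + dx:dist + dx + nx]
                diff += (shifted != q)
                n += 1

    # A lung voxel counts as a mismatch when more than 75% of its
    # neighborhood disagrees with it.
    mismatch = (diff / n > mismatch_frac) & lung
    mr = mismatch.mean()                    # MR: mismatch voxels / all voxels
    return mr / lr                          # LTS = MR / LR
```

On a synthetic volume with a uniform "lung" region the score stays near zero; filling the same region with random attenuation values drives it up, which is the intended behavior of the mismatch counting.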

5.3.3 Statistical methods

We used Pearson's approach to measure the correlation between the current gold-standard PFT results associated with each CT scan and the calculated LTS.
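For reference, Pearson's r for paired observations can be computed directly. The values below are hypothetical stand-ins for the (LTS, percent-predicted FVC) pairs; the study data are not reproduced here.

```python
import numpy as np

# Hypothetical (LTS, percent-predicted FVC) pairs -- illustrative only.
lts = np.array([12.0, 25.0, 31.0, 44.0, 58.0])
fvc = np.array([98.0, 81.0, 74.0, 60.0, 42.0])

r = np.corrcoef(lts, fvc)[0, 1]   # Pearson's correlation coefficient
print(f"r = {r:.2f}")             # prints r = -1.00 for these nearly linear pairs
```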

5.4 Results

5.4.1 Two-point correlation analysis of CT images reduces background signal from normal lung structures

The image analysis tool effectively eliminates the background noise represented by normal contiguous lung structures (e.g., blood vessels, airways) such that the signals from alveolar and interstitial compartments are highlighted (Figure 5.2).


Figure 5.2: Two-point correlation analysis of CT images highlights diseased lung. The image on the left is produced by filtering out the tissues other than the lungs. The image on the right is a "result" image which shows the calculated features for every pixel from the source image. Areas marked by "1" (green arrows) highlight example ROIs with enhanced discrimination between normal and abnormal lung tissue. Areas marked by "2" (red arrows) highlight example ROIs with enhanced discrimination between "normal" blood vessels and adjacent "diseased" lung.

5.4.2 LTS strongly correlates with forced vital capacity (FVC) and total lung capacity (TLC)

Higher LTS is shown to correlate strongly (inversely) with percent-predicted FVC and TLC, whereas the correlation with DLCO was relatively weak (Figure 5.3).

5.4.3 Correlations between LTS and pulmonary function remain significant after reducing image intensity precision

16-bit images are converted to 4-bit images during our calculations; this implies that lower dose CT images (less radiation exposure, but lower signal-to-noise ratio) may be suitable for LTS analyses.

Figure 5.3: Correlation between CT image score and lung function. The score calculated from the CT images reflected the amount of irregularity within the lungs (that is, the percentage of irregular (textured) lung within the overall lung volume for the given CT image). Higher CT image scores correlated well with lower percent-predicted FVC (r = −0.92) and percent-predicted TLC (r = −0.88). LTS did not correlate well with percent-predicted DLCO (r = −0.15).

5.5 Discussion

Despite ready access to highly automated and reproducible high quality images available from lung CT scans [171], CT image analysis is currently of limited utility for assessing the severity of pulmonary sarcoidosis due to the subjective and non-quantifiable nature of the radiologist's interpretation of interstitial lung disease severity. As such, subtle changes in CT image characteristics are typically ignored in the context of clinical sarcoidosis research. The current study indicates that a novel two-point correlation analysis of CT images strongly correlates with pulmonary function parameters (FVC, TLC) representing the current standard for estimating lung disease severity in patients with pulmonary sarcoidosis [172, 173].

Currently available clinical end-points for sarcoidosis research all have their limitations. For instance, patients often report a paradoxical worsening of their quality-of-life despite treatments that effectively prevent lung disease progression. On the other hand, patients can remain asymptomatic with "normal" lung function despite obvious progression of pulmonary sarcoidosis based upon radiographic criteria. A more objective measure of the overall burden of pulmonary disease is needed to guide clinical decision-making and research. By eliminating the large background signal generated by normal lung structures (blood vessels, lymphatics, airways), the computerized two-point correlation analysis described herein assesses changes in radiographic texture relating to interstitial and alveolar processes. Thus, this quantitative approach is expected to objectively and more sensitively detect changes in the overall burden of parenchymal lung disease.

Another potential advantage of the computerized two-point correlation analysis approach is the potential to reduce the intensity of radiation exposure required to generate useful CT images. In this regard, high-resolution CT scans with high signal-to-noise ratio (SNR) are generally regarded as superior to conventional CT images for detecting changes in lung disease severity in patients with interstitial lung diseases (ILDs), such as sarcoidosis. However, there is growing concern relating to the long-term risks attendant to the higher levels of radiation exposure when using serial CT scans for the assessment of disease progression [174]. Computer-aided CT image analysis was recently shown to detect subtle changes in lung texture attendant to early asymptomatic ILD using low-dose CT studies [175]. While the retrospective nature of the current study does not allow us to determine the minimal radiation dose required to maintain a strong correlation with standardized pulmonary function results, we were able to show strong correlations between the LTS and both FVC and TLC while using 4-bit precision. Being able to work with 16 gray levels suggests that LTS is robust with respect to SNR and should be able to work with low-dose scans. We plan to investigate this in a later study.

In addition to the study design limitations mentioned above, a number of other questions are raised by these compelling results. Our analysis was confined to a single institution using similar CT scanners and protocols; there was no attempt to correlate image analysis with a particular sarcoidosis disease phenotype (e.g., nodular, ground glass, fibrotic); and it is unclear if this approach is also effective for detecting changes in the severity of other ILDs. To further enhance the performance of this image analysis tool, future studies should start with the raw image data to optimize the protocol for the purpose of minimizing radiation exposure while further improving the detection of changes in lung texture relating to ILD, and perhaps obstructive lung disease. With respect to the latter, our unpublished data suggest that lower LTS scores correspond to emphysema severity. Finally, we speculate that the relatively poor correlation between LTS and DLCO in sarcoidosis patients relates to the upper-lobe predominance of the disease, which tends to preserve DLCO compared to interstitial lung diseases affecting the lower lobes (e.g., IPF). As such, we expect improved correlation with DLCO could be achieved by weighting the image analysis based upon the severity of apical versus basilar lung disease.

In summary, the novel two-point CT image analysis approach described herein is shown to strongly correlate with the severity of pulmonary sarcoidosis based upon standard PFT criteria, and these correlations were obtained using 16 gray levels. This implies that the computer image analysis approach could reduce the risks of radiation exposure while providing a more objective assessment of disease progression for clinical and research applications. Another benefit of this approach is its demonstrated efficiency in terms of computer power, expense and time. Specifically, the analysis can be conducted within one minute on a conventional laptop computer using NIH's open source ImageJ Toolkit along with our ImageJ Java software plug-in. Additional studies are needed to determine if the two-point correlation CT image analysis approach is effective for other interstitial and obstructive lung diseases.

CHAPTER 6

CONCLUSIONS

The primary objective of the research and development described in this thesis has been to provide an integrative platform where multidimensional data from multiple disparate sources can be easily accessed, visualized, and analyzed. We believe that the ability to execute such truly integrative queries, visualizations and analyses across multiple data types is critical to executing highly effective clinical and translational research. Therefore, to address the existing gap in knowledge, we have introduced a model computational framework that supports the integrative query, visualization and analysis of structured data, narrative text, and image data sets in support of translational research activities. The introduced framework also addresses the challenges posed by regulatory compliance, patient privacy/confidentiality concerns, and the need to facilitate multicenter research paradigms.

In Chapter 2, we have described a model and associated software framework with a unique combination of components that are capable of providing translational research users with an integrative query and information retrieval tool that spans multiple, critical biomedical information sources including structured data, narrative text, and images. While there are several examples in the current literature of integrative and ontology-anchored image search or query tools, to the best of our knowledge, our framework is the first to also support the simultaneous query and subsequent integration of image data sets with structured data and narrative text. This data integration methodology has significant impact on research applications and beyond. Being able to query and retrieve clinical images and related metadata from PACS in bulk allows for responding to very crucial queries related to research, operations, quality, safety as well as business intelligence. In addition, it provides timely and cost-effective access to clinical data for secondary use. Use of this framework enables otherwise unanswerable queries for many different types of users such as researchers, business administrators, system administrators, management engineers, medical students and clinicians. Our future plans for this project include the continued evaluation of the framework, with specific emphasis on the types of novel hypotheses that can be addressed using such a knowledge-anchored, integrative query platform, as well as its applicability to other usage scenarios. We fully anticipate that our system, with its focus on satisfying a critical translational research information need, will continue to develop into an operational platform for use by researchers at OSUMC that will also be extensible to the broader informatics and research communities.

Chapter 3 further advances our framework from Chapter 2. The creation of a De-identified Information Warehouse (DIW) is a continuing effort at the OSUMC IW to better support research activities using clinical data. The ultimate goal is to enable a direct connection between a researcher and data. The IW's HBP has shortened the time it takes for researchers to access the data they need. With the HBP, OSUMC researchers can gain access to limited and de-identified data much faster than before. However, IW analysts can only process so many data requests at any given time. The new bottleneck facing researchers is the lengthening data request queue. As a solution, the DIW can easily be coupled with commercial database query tools, in-house developed query tools, or open source development efforts such as i2b2, caGrid, etc. This can improve the efficiency of data requests by letting researchers perform their own queries on their own time, with greater interactivity and increased flexibility.

Our framework successfully de-identifies and removes all HIPAA-mandated PHI from structured data elements, and provides conformance with the local IRB, while maintaining data integrity. There are no other de-identification frameworks in the literature which guarantee a zero-knowledge protocol, in which new identifiers are generated even for each session. This makes our framework a suitable tool for aiding core de-identification operations, which need to take place at medical data warehouses in order to support non-human-subject research using clinical data. From a clinical and translational researcher's perspective, this means timely access to data. A potential researcher who is preparing a grant submission that is due in a week may not be able to afford waiting for a data analyst to test a last-minute hypothesis which relies on retrospective analysis. In this case, timely access to data may differentiate between the researcher who gets the data and the grant, and the one who does not.

As a use case scenario utilizing the infrastructures and information acquisition techniques developed in Chapters 2 and 3 for secondary use of medical data, in Chapter 4 we have demonstrated prevalence calculations on retrospective data. In summary, this study validates an unexpectedly high prevalence of sarcoidosis in a large Midwest U.S. community, the demographic profile of which closely matches that of the U.S. The apparent prevalence has been rapidly increasing over the past 15 years. This information would otherwise remain anecdotal, but fortunately, the previously developed novel infrastructures and techniques (Chapters 2 and 3) enable timely quantification of suspicions arising from OSUMC researchers' clinical encounters.

In Chapter 5, further utilizing the infrastructures and information acquisition techniques developed in Chapters 2 and 3, we have demonstrated the development of image processing tools using de-identified medical data. In summary, the novel two-point CT image analysis approach described herein is shown to strongly correlate with the severity of pulmonary sarcoidosis based upon standard PFT criteria, and these correlations were obtained using 16 gray levels. This implies that the computer image analysis approach could reduce the risks of radiation exposure while providing a more objective assessment of disease progression for clinical and research applications. Another benefit of this approach is its demonstrated efficiency in terms of computer power, expense and time. Specifically, the analysis can be conducted within one minute on a conventional laptop computer using NIH's open source ImageJ Toolkit along with our ImageJ Java software plug-in. As future work, we plan to conduct additional studies to determine if the two-point correlation CT image analysis approach is effective for other interstitial and obstructive lung diseases.

In summary, the work in this thesis consists of novel techniques that facilitate high-throughput clinical and translational research. We further demonstrate these capabilities by introducing novel discoveries and clinical findings obtained using the frameworks themselves. The author believes that the infrastructural advances introduced around the novel frameworks and information retrieval techniques presented in this thesis will be valuable to many researchers who strive for data. After all, timely access to data is a critical component of timely and efficient clinical and translational research.

BIBLIOGRAPHY

[1] NIH. NIH Roadmap for Medical Research. http://nihroadmap.nih.gov/, 2009.

[2] Cimino J.J. From data to knowledge through concept-oriented terminologies: experience with the medical entities dictionary. J Am Med Inform Assoc., 7(3):288–297, 2000.

[3] Sujansky W. Heterogeneous database integration in biomedicine. J Biomed Inform., 34(4):285–298, 2001.

[4] Kamal J., Rogers P., Saltz J., and Mekhjian H. Information warehouse as a tool to analyze computerized physician order entry order set utilization: Opportunities for improvement. In AMIA Annu Symp Proc., pages 336–340, 2003.

[5] Prather J.C., Lobach D.F., Goodwin L.K., Hales J.W., Hage M.L., and Hammond W.E. Medical data mining: knowledge discovery in a clinical data warehouse. In Proc AMIA Annu Fall Symp, pages 101–105, 1997.

[6] Brown M.S., McNitt-Gray M.F., Pais R., Shah S.K., Qing P., Da Costa I., Aberle D.R., and Goldin J.G. Cad in clinical trials: current role and architectural requirements. Comput Med Imaging Graph., 31(4-5):332–337, 2007.

[7] Sigal R. Pacs as an e-academic tool. In International Congress Series 1281 (CARS 2005): Computer Assisted Radiology and Surgery, pages 900–904, 2005.

[8] Boochever S.S. His/ris/pacs integration: getting to the gold standard. Radiol Manage, 26:16–24, 2004.

[9] Kamauu A.W., DuVall S.L., Robison R.J., Liimatta A.P., Wiggins R.H., and Avrin D.E. Informatics in radiology (inforad): vendor-neutral case input into a server-based digital teaching file system. Radiographics, 26(6):1877–1885, 2007.

[10] NCIA: Reference Image Database to Evaluate Response (RIDER). http://ncia.nci.nih.gov/ncia/collections., 2007.

[11] Guarino N. and Poli R., editors. Formal Ontology in Conceptual Analysis and Knowledge Representation, chapter Toward principles for the design of ontologies used for knowledge sharing. Kluwer, Norwell, 1993.

[12] Bruce G.B. and Joseph P. Ontology-guided knowledge discovery in databases. In Proceedings of the International Conference on Knowledge Capture, 2001.

[13] Smith B. and Kumar A. On controlled vocabularies in bioinformatics: a case study in the gene ontology. Biosilico: Drug Discovery Today, 2(1):246–252, 2004.

[14] Gurcan M.N., Sahiner B., Petrick N., Chan H.P., Kazerooni E.A., Cascade P.N., and Hadjiiski L. Lung nodule detection on thoracic computed tomography images: preliminary evaluation of a computer-aided diagnosis system. Med Phys., 29(11):2552–2558, 2002.

[15] Payne P.R., Johnson S.B., Starren J.B., Tilson H.H., and Dowdy D. Breaking the translational barriers: the value of integrating biomedical informatics and translational research. J Investig Med., 53(4):192–200, 2005.

[16] Ebbert J.O., Dupras D.M., and Erwin P.J. Searching the medical literature using pubmed: a tutorial. Mayo Clin Proc., 78(1):87–91, 2003.

[17] Olson G.M. et al., editors. Proceedings of SAICSIT, South Africa, chapter Collaboratories to support distributed science: the example of international HIV/AIDS research. ACM Press, 2002.

[18] Butler D. Data, data, everywhere. Nature, 414(6866):840–841, 2001.

[19] Marks R.G., Conlon M., and Ruberg S.J. Paradigm shifts in clinical trials enabled by information technology. Stat Med., 20(17-18):2683–2696, 2001.

[20] Payne P.R., Greaves A.W., and Kipps T.J. Crc clinical trials management system (ctms): an integrated information management solution for collaborative clinical research. In AMIA Annu Symp Proc, page 967, 2003.

[21] Kuchenbecker J., Dick H.B., Schmitz K., and Behrens-Baumann W. Use of internet technologies for data acquisition in large clinical trials. Telemed J E Health., 7(1):73–76, 2001.

[22] Marks L. and Power E. Using technology to address recruitment issues in the clinical trial process. Trends Biotechnol., 20(3):105–109, 2002.

[23] Bates D.W., Ebell M., Gotlieb E., Zapp J., and Mullins H.C. A proposal for electronic medical records in u.s. primary care. J Am Med Inform Assoc., 10(1):1–10, 2003.

[24] Sung N.S. et al. Central challenges facing the national clinical research enterprise. JAMA, 289(10):1278–1287, 2003.

[25] Bates D.W. et al. Effect of computerized physician order entry and a team intervention on prevention of serious medication errors. JAMA, 280(15):1311–1316, 1998.

[26] Huang H. et al. Picture archiving and communication systems (PACS) in medicine. Springer, NY, 1991.

[27] Duerinckx A.J. and Pisa E.J. Filmless picture archiving and communication system (pacs) in diagnostic radiology. In Proc SPIE, pages 9–18, 1982.

[28] Gurcan M.N., Sharma A., Kurc T., Oster S., Langella S., Hastings S., Siddiqui K.M., Siegel E.L., and Saltz J. Gridimage: a novel use of grid computing to support interactive human and computer-assisted detection decision support. J Digit Imaging, 20:160–171, 2007.

[29] Craver J.M. and Gold R.S. Research collaboratories: their potential for health behavior researchers. Am J Health Behav., 26(6):504–509, 2002.

[30] Kukafka R., Johnson S.B., Linfante A., and Allegrante J.P. Grounding a new information technology implementation framework in behavioral science: a systematic analysis of the literature on it use. J Biomed Inform., 36(3):218–227, 2003.

[31] Johnson M.S., Gonzales M.N., and Bizila S. Responsible conduct of radiology research part v. the health insurance portability and accountability act and research. Radiology., 237(3):757–764, 2005.

[32] Liu B.J., Zhou Z., and Huang H.K. A hipaa-compliant architecture for securing clinical images. J Digit Imaging., 19(2):172–180, 2006.

[33] Amendolia S.R., Estrella F., Hassan W., Hauer T., Manset D., McClatchey R., Rogulin D., and Solomonides T. Mammogrid: a service oriented architecture based medical grid application. In 3rd International Conference on Grid and Cooperative Computing, 2004.

[34] Blanquer I., Hernandez V., Mas F., and Segrelles D. A middleware grid for storing, retrieving and processing dicom medical images. In Workshop on Distributed Databases and Processing in Medical Image Computing (DIDAMIC), 2004.

[35] Espert I.B., Garcaa V.H., and Quilis J.D. An ogsa middleware for managing medical images using ontologies. J Clin Monit Comput., 19(4-5):295–305, 2005.

[36] Montagnat J., Duque H., Pierson J.M., Breton V., Brunie L., and Magnin I.E. Medical image content-based queries using the grid. In HealthGrid, 2003.

[37] Power D., Politou E., Slaymaker M., Harris S., and Simpson A. A relational approach to the capture of dicom files for grid-enabled medical imaging databases. In ACM symposium on applied computing., 2004.

[38] Foster I. and Kesselman C. The Grid 2: blueprint for a new computing infrastructure. 2nd Ed. Morgan Kaufmann, 2003.

[39] Payne P.R., Mendonca E.A., Johnson S.B., and Starren J.B. Conceptual knowledge acquisition in biomedicine: a methodological review. J Biomed Inform., 40:582–602, 2007.

[40] NLM. Unified Medical Language System. http://www.nlm.nih.gov/research/umls/meta2.html., 2007.

[41] Bodenreider O. Using umls semantics for classification purposes. In Proc AMIA Symp., pages 86–90, 2004.

[42] Campbell K.E., Oliver D.E., Spackman K.A., and Shortliffe E.H. Representing thoughts, words, and things in the umls. J Am Med Inform Assoc., 5(5):421–431, 1998.

[43] Thomas B.J., Ouellette H., Halpern E.F., and Rosenthal D.I. Automated computer-assisted categorization of radiology reports. Am J Roentgenol., 184(2):687–690, 2005.

[44] Tsui F-C., Wagner M.M., Dato V., and Chang C.H. Value of icd-9-coded chief complaints for detection of epidemics. J Am Med Inform Assoc., 9:41–47, 2002.

[45] Friedman C., Shagina L., Lussier Y., and Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc., 11(5):392–402, 2004.

[46] Srinivasan S., Rindflesch T.C., Hole W.T., Aronson A.R., and Mork J.G. Finding umls metathesaurus concepts in medline. In American Medical Informatics Association Annual Symposium., pages 727–731, 2002.

[47] Taira R.K., Soderland S.G., and Jakobovits R.M. Automatic structuring of radiology free-text reports. Radiographics, 21:237–245, 2001.

[48] Zou Q., Chu W.W., Morioka C., Leazer G.H., and Kangarloo H. Indexfinder: a method of extracting key concepts from clinical texts for indexing. In American Medical Informatics Association Annual Symposium, pages 763–767, 2003.

[49] Alonso O. et al. Oracle text white paper. available at http://www.oracle.com/technology/products/text/index.html, 2006.

[50] International Business Machines Corporation. Db2 text extender. available at ftp://ftp.software.ibm.com/software/data/db2/extenders/text/db2tewkspecsheet.pdf., 2002.

[51] Microsoft Corporation. Sql server 2000 full-text search deployment white paper. available at http://www.support.microsoft.com/kb/323739, 2004.

[52] Ferrucci D. and Lally A. Building an example application with the unstructured information management architecture. IBM Syst J., 43(3):455–475, 2004.

[53] Baecker R., Small I., and Mander R. Bringing icons to life. In Proceedings of the SIGCHI conference on human factors in computing systems: reaching through technology, pages 1–6, 1991.

[54] NEMA: Digital imaging and communications in medicine. http://www.medical.nema.org, 2007.

[55] Armato III S.G., McLennan G., McNitt-Gray M.F., Meyer C.R., Yankelevitz D., Aberle D.R., Henschke C.I., Hoffman E.A., Kazerooni E.A., MacMahon H., Reeves A.P., Croft B.Y., and Clarke L.P. Lung image database consortium: developing a resource for the medical imaging research community. Radiology, 232:739–748, 2004.

[56] Sigal R. Pacs as an e-academic tool. In CARS 2005: computer assisted radiology and surgery., 2005.

[57] Toms A.P., Kasmai B., Williams S., and Wilson P. Building an anonymized catalogued radiology museum in pacs: a feasibility study. Br J Radiol., 79:661–671, 2006.

[58] Cohen S., Gilboa F., and Uri S. Pacs and electronic health records. In SPIE, 2002.

[59] Lehmann T., Wein B., and Greenspan H. Integration of content-based image retrieval to picture archiving and communication systems. In Medical Informatics Europe Conference, 2003.

[60] Traina A., Rosa N.A., and Traina C. Integrating images to patient electronic medical records through content-based retrieval techniques. In 16th IEEE Symposium on Computer-Based Medical Systems., 2003.

[61] Leoni L., Manca S., Giachetti A., and Zanetti G. A virtual data grid architecture for medical data using srb. In EuroPACS-MIR, 2004.

[62] Erdal S., Catalyurek U.V., Saltz J., Kamal J., and Gurcan M.N. Flexible patient information search and retrieval framework: pilot implementation. In Proceedings of the SPIE Medical Imaging., 2007.

[63] Erdal S., Catalyurek U.V., Saltz J., Kamal J., and Gurcan M.N. Information warehouse application of cagrid: a prototype implementation. In caBIG 2007 Annual Meeting., 2007.

[64] Erdal S., Catalyurek U.V., Payne P.R.O., Saltz J., Kamal J., and Gurcan M.N. Integrating a pacs system to grid: a de-identification and integration framework. In Annual Meeting of the Society for Imaging Informatics in Medicine (SIIM), 2007.

[65] Lindberg C. The unified medical language system (umls) of the national library of medicine. J Am Med Rec Assoc, 61(5):40–42, 1990.

[66] Lindberg D.A., Humphreys B.L., and McCray A.T. The unified medical language system. Methods Inf Med., 32(4):281–291, 1993.

[67] Cancer Biomedical Informatics Grid (caBIG). https://cabig.nci.nih.gov/workspaces/architecture/cagrid, 2006.

[68] Eckerson W.W. Three tier client/server architecture: achieving scalability, performance, and efficiency in client server applications. Open Inf Syst., 10:1–12, 1995.

[69] Gallaugher J. and Ramanathan S. Choosing a client/server architecture. a comparison of two-tier and three-tier systems. Inf Syst Manage Mag, 13(2):7–13, 1996.

[70] Clunie D.A. Dicom structured reporting. PixelMed Publishing, 2000.

[71] Powell J. and Buchan I. Electronic health records should support clinical research. J Med Internet Res., Jan-Mar;7(1):e4, 2005.

[72] Weiner M. and Embi P. Toward reuse of clinical data for research and quality improvement: the end of the beginning? Ann Intern Med, 151:359–360, 2009.

[73] U.S. Dept. of Health and Human Services. Standards for privacy of individually identifiable health information, final rule., 2002.

[74] U.S. Dept. of Health and Human Services. Federal policy for the protection of human subjects (the common rule), 1991.

[75] Kamal J., Silvey S.A., Buskirk J., Dhaval R., Erdal S., Ding J., Ostrander M., Borlawsky T., Smaltz D.H., and Payne P.R. Innovative applications of an enterprise-wide information warehouse. In AMIA Annu Symp Proc., page 1134, 2008 Nov 6.

[76] Silvey S.A., Schulte J., Smaltz D.H., and Kamal J. Honest broker protocol streamlines research access to data while safeguarding patient privacy. In AMIA Annu Symp Proc., page 1133, 2008 Nov 6.

[77] Liu J., Erdal S., Silvey S.A., Ding J., Marsh C.B., and Kamal J. Toward a fully de-identified biomedical information warehouse. In AMIA Annu Symp Proc., pages 370–374, 2009 Nov 14.

[78] Boussi Rahmouni H., Solomonides T., Casassa Mont M., Shiu S., and Rahmouni M. A model-driven privacy compliance decision support for medical data sharing in europe. Methods Inf Med., 50(4):326–36. Epub 2011 Jul 26., 2011 Aug 15.

[79] Holzer K. and Gall W. Utilizing ihe-based electronic health record systems for secondary use. Methods Inf Med., 50(4):319–25. Epub 2011 Mar 21., 2011 Aug 15.

[80] Safran C., Bloomrosen M., Hammond W.E., Labkoff S., Markel-Fox S., Tang P.C., Detmer D.E., and Expert Panel. Toward a national framework for the secondary use of health data: an american medical informatics association white paper. J Am Med Inform Assoc., 14(1):1–9. Epub 2006 Oct 31., 2007 Jan-Feb.

[81] Wylie J.E. and Mineau G.P. Biomedical databases: protecting privacy and promoting research. Trends Biotechnol., 21(3):113–116, 2003, March.

[82] Loukides G., Gkoulalas-Divanis A., and Malin B. Anonymization of electronic medical records for validating genome-wide association studies. Proc Natl Acad Sci., 107(17):7898–7903, 2010, April.

[83] Claerhout B. and DeMoor G.J. Privacy protection for clinical and genomic data. the use of privacy-enhancing techniques in medicine. Int J Med Inform., 74(2-4):257–265, 2005 Mar.

[84] Cooper T. and Collman J., editors. Advances in Medical Informatics: Knowledge Management and Data Mining in Biomedicine, chapter Managing Information Security and Privacy in Healthcare Data Mining, pages 95–137. Springer, 2005.

[85] de Moor G.J., Claerhout B., and de Meyer F. Privacy enhancing technologies: the key to secure communication and management of clinical and genomic data. Methods Inform Med., 42:148–153, 2003.

[86] El Emam K., Jabbouri S., Sams S., Drouet Y., and Power M. Evaluating common de-identification heuristics for personal health information. J Med Internet Res., Oct-Dec;8(4):e28, 2006.

[87] Kohane I.S., Dong H., and Szolovits P. Health information identification and de-identification toolkit. In Proc AMIA Symp., pages 356–360, 1998.

[88] Narayanan A. and Shmatikov V. Privacy and security: myths and fallacies of "personally identifiable information". Communications of the ACM., 53(6):24–26, 2010.

[89] Cavoukian A. and El Emam K. Dispelling the myths surrounding de-identification: Anonymization remains a strong tool for protecting privacy. Technical report, Discussion Papers, Information and Privacy Commissioner of Ontario., June 2011.

[90] El Emam K., Dankar F.K., Vaillancourt R., Roffey T., and Lysyk M. Evaluating the risk of re-identification of patients from hospital prescription records. The Canadian Journal of Hospital Pharmacy., 62(4):307–319, 2009.

[91] Roden D.M., Pulley J.M., Basford M.A., Bernard G.R., Clayton E.W., Balser J.R., and Masys D.R. Development of a large-scale de-identified dna biobank to enable personalized medicine. Clinical Pharmacology and Therapeutics., 84(3):362–369, 2008.

[92] Lyman J.A., Scully K., and Harrison Jr J.H. The development of health care data warehouses to support data mining. Clinics in Laboratory Medicine., 28(1):55–71, 2008.

[93] Berman J.J. Concept-match medical data scrubbing. how pathology text can be used in research. Archives of Pathology and Laboratory Medicine., 127(6):680–686, 2003.

[94] Gardner J. and Xiong L. Hide: An integrated system for health information de-identification. In 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS), pages 254–259, June 2008.

[95] Gupta D., Saul M., and Gilbertson J. Evaluation of a de-identification (de-id) software engine to share pathology reports and clinical documents for research. Am J Clin Pathol., 121:176–186, 2004.

[96] El Emam K. and Dankar F. K. Protecting privacy using k-anonymity. J Am Med Inform Assoc., 15(5):627–637, 2008 Sept.-Oct.

[97] Sweeney L. k-anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems., 10(5):557–570, 2002 October.

[98] Meystre S.M., Friedlin F.J., South B.R., Shen S., and Samore M.H. Automatic de-identification of textual documents in the electronic health record: a review of recent research. BMC Med Res Methodol., 10:70, 2010 Aug.

[99] Pulley J., Clayton E., Bernard G.R., Roden D.M., and Masys D.R. Principles of human subjects protections applied in an opt-out, de-identified biobank. Clin Transl Sci., 3(1):42–48, 2010 Feb.

[100] Kantarcioglu M., Jiang W., Liu Y., and Malin B. A cryptographic approach to securely share and query genomic sequences. IEEE Trans Inf Technol Biomed., 12(5):606–617, 2008 Sept.

[101] Hacigumus H., Iyer B., Li C., and Mehrotra S. Executing sql over encrypted data in the database-service-provider model. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data., 2002.

[102] Kantarcioglu M., Jiang W., and Malin B. A privacy-preserving framework for inte- grating person-specific databases, privacy in statistical databases. LNCS, 5262:298– 314, 2008.

[103] Sweeney L. Guaranteeing anonymity when sharing medical data, the datafly system. In Proc AMIA Annu Fall Symp., pages 51–55, 1997.

[104] Sweeney L. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems., 10(5):571–588, 2002 Oct.

[105] Dwork C. Differential privacy: a survey of results. In Proceedings of the 5th international conference on Theory and applications of models of computation, TAMC'08., pages 1–19, 2008.

[106] Neamatullah I., Douglass M.M., Lehman L.H., Reisner A., Villarroel M., Long W.J., Szolovits P., Moody G.B., Mark R.G., and Clifford G.D. Automated de-identification of free-text medical records. BMC Med Inform Decis Mak., 8:32, 2008 Jul 24.

[107] Wellner B., Huyck M., Mardis S., Aberdeen J., Morgan A., Peshkin L., Yeh A., Hitzeman J., and Hirschman L. Rapidly retargetable approaches to de-identification in medical records. J Am Med Inform Assoc., 14(5):564–573, 2007 Sept.-Oct.

[108] Uzuner O., Sibanda T.C., Luo Y., and Szolovits P. A de-identifier for medical discharge summaries. Artif Intell Med., 42(1):13–35, 2008 January.

[109] Lafky D. The safe harbor method of de-identification: an empirical test. Department of Health and Human Services presentation, http://www.ehcca.com/presentations/hipaawest4/lafky_2.pdf., Oct. 8, 2009.

[110] Goldwasser S., Micali S., and Rackoff C. The knowledge complexity of interactive proof systems. SIAM Journal on Computing, 18(1):186–208, 1989 February.

[111] Berman J.J. Health and human services workshop on the hipaa privacy rules de-identification standard, http://www.hhshipaaprivacy.com, March 8-9, 2010.

[112] U.S. Dept. of Health and Human Services, Office for Human Research Protections (OHRP). Guidance on research involving coded private information or biological specimens., October 2008.

[113] Boyd A.D., Saxman P.R., Hunscher D.A., Smith K.A., Morris T.D., Kaston M., Bayoff F., Rogers B., Hayes P., Rajeev N., Kline-Rogers E., Eagle K., Clauw D., Greden J.F., Green L.A., and Athey B.D. The university of michigan honest broker: a web-based service for clinical and translational research and practice. J Am Med Inform Assoc., 16(6):784–791, 2009 Nov.-Dec.

[114] Dhir R., Patel A.A., Winters S., Bisceglia M., Swanson D., Aamodt R., and Becich M.J. A multidisciplinary approach to honest broker services for tissue banks and clinical data: a pragmatic and practical model. Cancer, 113(7):1705–1715, 2008 Oct.

[115] Oracle Corporation. Java 2 platform standard edition version 1.4.2. http://download.oracle.com/javase/1.4.2/docs/api/java/security/securerandom.html, Date accessed: April 2011.

[116] National Institute of Standards and Technology (NIST). Computer security division, computer security resource center, http://csrc.nist.gov/groups/stm/index.html, Date accessed: April 2011.

[117] Schneier B. Sha-1 broken. http://www.schneier.com/blog/archives/2005/02/sha1_broken.html, February 15, 2005.

[118] Oracle Corporation. Oracle database data warehousing guide, 11g release 1 (11.1), chapter 6, indexes. http://download.oracle.com/docs/cd/b28359_01/server.111/b28313/indexes.htm, September 2011.

[119] Oracle Corporation. Fine grained auditing. http://www.oracle.com/technetwork/database/security/index-083815.html, July 2010.

[120] International Business Machines Corporation. Ibm infosphere datastage. http://www-01.ibm.com/software/data/infosphere/datastage/requirements.html#ibm%20infosphere%20datastage85, Date accessed: September 2011.

[121] Oracle Corporation. Oracle warehouse builder. http://www.oracle.com/technetwork/developer-tools/warehouse/overview/introduction/index.html, Date accessed: September 2011.

[122] Kahmann S., Erdal B.S., Liu J., Kamal J., and Clymer B.D. Generalizable session dependent de-identification methods. In AMIA 2011 Annual Symposium, October 2011.

[123] Erdal B.S., Liu J., Key C.B., Kamal J., and Clymer B.D. Proxy pacs servers for image delivery through an information warehouse. In AMIA 2011 Annual Symposium, October 2011.

[124] National Institute of Standards and Technology (NIST). Computer security division, computer security resource center. random number generation. http://csrc.nist.gov/groups/st/toolkit/rng/index.html, Date accessed: September 2011.

[125] National Institute of Standards and Technology (NIST). Computer security division, computer security resource center. a statistical test suite for the validation of random number generators and pseudo random number generators for cryptographic applications. http://csrc.nist.gov/groups/st/toolkit/rng/documentation_software.html, Date accessed: September 2011.

[126] Maurer U.M. A universal statistical test for random bit generators. Journal of Cryptology., 5(2):89–105, 1992.

[127] Chung K.L. Elementary Probability Theory with Stochastic Processes. Springer-Verlag, New York, 1979.

[128] Malin B. Secure construction of k-unlinkable patient records from distributed providers. Artificial Intelligence in Medicine., 48(1):29–41, 2010.

[129] NLM. Unified medical language system. http://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html, Date accessed: April 2011.

[130] Bodenreider O. Using umls semantics for classification purposes. In AMIA Annu Symp Proc., pages 86–90, 2000.

[131] Campbell K.E., Oliver D.E., Spackman K.A., and Shortliffe E.H. Representing thoughts, words, and things in the umls. J Am Med Inform Assoc., 5(5):421–431, 1998 Sept.-Oct.

[132] Meystre S.M., Savova G.K., Kipper-Schuler K.C., and Hurdle J.F. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform., pages 128–144, 2008.

[133] Uzuner O., Goldstein I., Luo Y., and Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc., 15(1):14–24, 2008.

[134] Uzuner O. Recognizing obesity and co-morbidities in sparse data. J Am Med Inform Assoc., 16(4):561–570, 2009.

[135] Suzuki T., Yokoi H., Fujita S., and Takabayashi K. Automatic dpc code selection from electronic medical records: text mining trial of discharge summary. Methods Inf Med., 47(6):541–548, 2008.

[136] Murff H.J., FitzHenry F., Matheny M.E., Gentry N., Kotter K.L., Crimin K., Dittus R.S., Rosen A.K., Elkin P.L., Brown S.H., and Speroff T. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA., 306(8):848–855, 2011.

[137] Uzuner O., Luo Y., and Szolovits P. Evaluating the state-of-the-art in automatic de-identification. J Am Med Inform Assoc., 14(5):550–563, 2007 Sept.-Oct.

[138] Oracle Corporation. Oracle business intelligence enterprise edition plus. http://www.oracle.com/technetwork/middleware/bi-enterprise-edition/overview/index.html, Date accessed: April 2011.

[139] Ding J., Liu J., and Kamal J. uquery: hipaa-compliant web query tool for retrieving patient clinical data from a data warehouse. In AMIA Annu Symp Proc., page 821, Nov 2009.

[140] Murphy S.N., Mendis M.E., Berkowitz D.A., Kohane I., and Chueh H. Integration of clinical and genetic data in the i2b2 architecture. In AMIA Annu Symp Proc., page 1040, 2006.

[141] Murphy S.N., Mendis M., Hackett K., Kuttan R., Pan W., and Phillips L., et al. Architecture of the open-source clinical research chart from informatics for integrating biology and the bedside. In AMIA Annu Symp Proc., pages 548–552, 2007.

[142] Saltz J., Oster S., Hastings S., Langella S., Kurc T., Sanchez W., Kher M., Manisundaram A., Shanbhag K., and Covitz P. cagrid: Design and implementation of the core architecture of the cancer biomedical informatics grid. Bioinformatics., 22(15):1910–1916, 2006.

[143] ATS Board of Directors and ERS Executive Committee. Statement on sarcoidosis. joint statement of the american thoracic society (ats), the european respiratory society (ers) and the world association of sarcoidosis and other granulomatous disorders (wasog) adopted by the ats board of directors and by the ers executive committee, february 1999. Am J Respir Crit Care Med, 160:736–55, 1999.

[144] Reid J.D. Sarcoidosis in coroners' autopsies: a critical evaluation of diagnosis and prevalence from cuyahoga county, ohio. Sarcoidosis Vasc Diffuse Lung Dis., 15:4–51, 1998.

[145] Iannuzzi M.C., Rybicki B.A., and Teirstein A.S. Sarcoidosis. N Engl J Med., 357:2153–2165, 2007.

[146] Crowley L.E., Herbert R., Moline J.M., Wallenstein S., Shukla G., Schechter C., Skloot G.S., Udasin I., Luft B.J., Harrison D., Shapiro M., Wong K., Sacks H.S., Landrigan P.J., and Teirstein A.S. Sarcoid-like granulomatous pulmonary disease in world trade center disaster responders. Am J Ind Med., 54:175–84, 2011.

[147] Sola R., Boj M., Hernandez-Flix S., and Camprubi M. Silica in oral drugs as a possible sarcoidosis-inducing antigen. Lancet., 373:1943–1944, 2009.

[148] Agrawal R. and Srikant R. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, Santiago, Chile, pages 487–499, September 1994.

[149] Dunham M.H. Data Mining Introductory and Advanced Topics. Pearson Education, Inc., 2003.

[150] Witten I.H., Frank E., and Hall M.A. Data Mining: Practical Machine Learning Tools and Techniques (Third Edition). Morgan Kaufmann, 2011.

[151] Milman N. and Selroos O. Pulmonary sarcoidosis in nordic countries 1950-1982. epidemiology and clinical picture. Sarcoidosis., 7:50–7, 1990.

[152] Edwards B.K., Ward E., Kohler B.A., Eheman C., Zauber A.G., Anderson R.N., Jemal A., Schymura M.J., Lansdorp-Vogelaar I., Seeff L.C., van Ballegooijen M., Goede S.L., and Ries L.A.G. Annual report to the nation on the status of cancer, 1975-2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer., 116:544–73, 2010.

[153] Thomeer M.J., Costabel U., Rizzato G., Poletti V., and Demedts M. Comparison of registries of interstitial lung diseases in three european countries. Eur Respir J Suppl., 32:114s–118s, 2001.

[154] Rybicki B.A. and Iannuzzi M.C. Epidemiology of sarcoidosis: recent advances and future prospects. Semin Respir Crit Care Med., 28:22–35, 2007.

[155] Huang C.T., Heurich A.E., Sutton A.L., and Lyons H.A. Mortality in sarcoidosis. a changing pattern of the causes of death. Eur J Respir Dis., 62:231–8, 1981.

[156] Richie R.C. Sarcoidosis: a review. J Insur Med., 37:283–94, 2005.

[157] Swigris J.J., Olson A.L., Huie T.J., Fernandez-Perez E.R., Solomon J., Sprunger D., and Brown K.K. Sarcoidosis-related mortality in the united states from 1988 to 2007. Am J Respir Crit Care Med., 183(11):1524–30, 2011.

[158] Keir G. and Wells A.U. Assessing pulmonary disease and response to therapy: which test? Semin Respir Crit Care Med., 31:409–18, 2010.

[159] Goh N.S., Desai S.R., Veeraraghavan S., Hansell D.M., Copley S.J., Maher T.M., Corte T.J., Sander C.R., Ratoff J., Devaraj A., Bozovic G., Denton C.P., Black C.M., du Bois R.M., and Wells A.U. Interstitial lung disease in systemic sclerosis: a simple staging system. Am J Respir Crit Care Med., 177:1248–54, 2008.

[160] Psathakis K., Mermigkis D., Papatheodorou G., Loukides S., Panagou P., Polychronopoulos V., Siafakas N.M., and Bouros D. Exhaled markers of oxidative stress in idiopathic pulmonary fibrosis. Eur J Clin Invest., 36:362–7, 2006.

[161] Wells A.U. Pulmonary function tests in connective tissue disease. Semin Respir Crit Care Med., 28:379–88, 2007.

[162] Thomeer M., Grutters J.C., Wuyts W.A., Willems S., and Demedts M.G. Clinical use of biomarkers of survival in pulmonary fibrosis. Respir Res., 11:89, 2010.

[163] Martinez F.J. and Flaherty K. Pulmonary function testing in idiopathic interstitial pneumonias. Proc Am Thorac Soc., 3:315–21, 2006.

[164] Sundaram B., Chughtai A.R., and Kazerooni E.A. Multidetector high-resolution computed tomography of the lungs: protocols and applications. J Thorac Imaging., 25:125–41, 2010.

[165] Berry C.E. and Wise R.A. Interpretation of pulmonary function test: issues and controversies. Clin Rev Allergy Immunol., 37:173–180, 2009.

[166] Al-Khawari H., Athyal R.P., Al-Saeed O., Sada P.N., Al-Muthairi S., and Al-Awadhi A. Inter- and intraobserver variation between radiologists in the detection of abnormal parenchymal changes on high-resolution computed tomography. Ann Saudi Med., 30:129–33, 2010.

[167] Best A.C., Lynch A.M., Bozic C.M., Miller D., Grunwald G.K., and Lynch D.A. Quantitative ct indexes in idiopathic pulmonary fibrosis: relationship with physiologic impairment. Radiology, 228:407–14, 2003.

[168] Hartley P.G., Galvin J.R., Hunninghake G.W., Merchant J.A., Yagla S.J., Speakman S.B., and Schwartz D.A. High-resolution ct-derived measures of lung density are valid indexes of interstitial lung disease. J Appl Physiol., 76:271–77, 1994.

[169] Rienmüller R.K., Behr J., Kalender W.A., Schätzl M., Altmann I., Merin M., and Beinert T. Standardized quantitative high resolution ct in lung diseases. J Comput Assist Tomogr., 15:742–9, 1991.

[170] Jiao Y., Stillinger F.H., and Torquato S. Modeling heterogeneous materials via two-point correlation functions: basic principles. Phys. Rev. E, 76(3):031110, Sep 2007.

[171] Shaker S.B., Dirksen A., Laursen L.C., Maltbaek N., Christensen L., Sander U., Seersholm N., Skovgaard L.T., Nielsen L., and Kok-Jensen A. Short-term reproducibility of computed tomography-based lung density measurements in alpha-1 antitrypsin deficiency and smokers with emphysema. Acta Radiol., 45:424–30, 2004.

[172] Baughman R.P., Drent M., Kavuru M., Judson M.A., Costabel U., du Bois R., Albera C., Brutsche M., Davis G., Donohue J.F., Müller-Quernheim J., Schlenker-Herceg R., Flavin S., Lo K.H., Oemar B., and Barnathan E.S. Infliximab therapy in patients with chronic sarcoidosis and pulmonary involvement. Am J Respir Crit Care Med., 174:795–802, 2006.

[173] Wasfi Y.S., Rose C.S., Murphy J.R., Silveira L.J., Grutters J.C., Inoue Y., Judson M.A., and Maier L.A. A new tool to assess sarcoidosis severity. Chest., 129:1234–45, 2006.

[174] Mayo J.R. Ct evaluation of diffuse infiltrative lung disease: dose considerations and optimal technique. J Thorac Imaging., 24:252–9, 2009.

[175] Park S.C., Tan J., Wang X., Lederman D., Leader J.K., Kim S.H., and Zheng B. Computer-aided detection of early interstitial lung diseases using low-dose ct images. Phys Med Biol., 56:1139–53, 2011.
