MIMBCD-UI

Medical Imaging Multimodality Breast Cancer Diagnosis User Interface

Francisco Maria Galamba Ferrari Calisto

Thesis to obtain the Master of Science Degree in

Information Systems and Computer Engineering

Supervisors: Prof. Jacinto Carlos Marques Peixoto do Nascimento
Prof. Daniel Jorge Viegas Gonçalves

Examination Committee
Chairperson: Prof. Luís Manuel Antunes Veiga
Supervisor: Prof. Jacinto Carlos Marques Peixoto do Nascimento
Member of the Committee: Prof. Hugo Miguel Aleixo Albuquerque Nicolau

February 2018

Acknowledgments

I would like to convey my gratitude to Hospital Fernando Fonseca (HFF) for the collaboration. I would especially like to thank Doctors Clara Aleluia, Gisela Andrade, Willian Schmitt, Pedro Tomás Marques and Ana Sofia Germano from the HFF for their generous support and medical expertise. A great thanks also to Doctor Cristina Ribeiro da Fonseca, who helped this project with all her effort, and to all the clinicians and radiologists who helped this project in some way. My appreciation also goes to Bruno Cardoso, Bruno Dias, Daniel Da Costa, Bruno Oliveira, Ana Beatriz Alves, Rodrigo Lourenço, Ricardo Cruz, Filipe Fernandes and João Miranda for their help and, above all, for the good companionship. Thanks to Professors Daniel Simões Lopes and Daniel Mendes for their aid during our research work. A special thanks to my fellows André Mateus, Gonçalo Pais and João Campos, not forgetting Dr Pedro Miraldo, who supported our project. Last but not least, thanks to Joana Teixeira and Lídia Freitas for their feedback. Fundação para a Ciência e a Tecnologia (FCT) and Instituto Superior Técnico (IST) partially supported this work through the FCT/UID/EEA/50009/2013 project and the BL89/2017-IST-ID grant.

Abstract

Breast cancer is one of the most commonly occurring types of cancer among women. The primary strategy to reduce mortality is early detection and treatment based on medical imaging technologies. The current workflow applied in breast cancer diagnosis involves several imaging modalities. The fact that no single modality has high enough sensitivity for a reliable diagnosis supports the need for multimodal imaging in breast cancer diagnosis; the combination of modalities can significantly increase diagnostic accuracy. It also reduces the number of unnecessary biopsies, leading to better patient care and lower health care costs. In this work, we used interaction techniques to build a user interface adapted to the standard needs of a radiology room. This user interface allows the combination of Mammogram (MG), Ultrasound (US), Magnetic Resonance Imaging (MRI) and text data to assist the clinician in establishing the diagnosis. The work involves the development and design of a user interface for automatic detection, segmentation and classification from breast MG, US and MRI, as well as textual data notations and information visualisation. We conclude, through user analysis and evaluation, that our methods, techniques and developments are satisfactory. Moreover, this work provides a framework that can be applied to new medical interaction systems.

Keywords: Human-computer interaction (HCI), Usability testing, User interface design, User-centered design, Health care information systems, Health informatics

Resumo

Breast cancer is one of the most common types of cancer among women. The primary strategy to reduce mortality is early detection and treatment based on medical imaging technologies. The current workflow applied in breast cancer diagnosis involves several imaging modalities. The need for multimodal imaging in breast cancer diagnosis is based on the fact that no single modality has specificity and sensitivity high enough for a reliable diagnosis. However, their combination can significantly increase diagnostic accuracy. This reduces the number of unnecessary biopsies, which leads to better patient care while lowering health care costs. In this work, we use interaction techniques to develop a user interface adapted to the standard needs of a radiology room. This user interface allows the combination of Mammography (MG), Ultrasound (US), Magnetic Resonance Imaging (MRI) and text data to help the clinician establish the diagnosis. The work involves the development and design of a user interface for automatic detection, segmentation and classification in breast MG, US and MRI, in addition to textual data notations and information visualisation. We conclude, through user analysis and evaluation, that our methods, techniques and developments are satisfactory. Furthermore, this work provides a set of rules that can be applied to new medical interaction systems.

Palavras-Chave: Human-Computer Interaction (HCI), Usability Testing, User Interface Design, User-Centered Design, Health Information Systems, Health Informatics

Contents

List of Tables

List of Figures

Acronyms

1 Introduction
  1.1 Motivation and Context
  1.2 Challenges
  1.3 Contributions
  1.4 Document Structure

2 Related Work
  2.1 Definitions
    2.1.1 Methodology
  2.2 Clinical Domain
    2.2.1 Activity-Based Computing
    2.2.2 Fine-Needle Aspiration
    2.2.3 Picture Archiving and Communication Systems
    2.2.4 Computer-Aided Diagnosis
    2.2.5 Patient Visualization
    2.2.6 Patient Progress
    2.2.7 Interaction System
      2.2.7.1 CAS & CADx
      2.2.7.2 User Interface for CADx
  2.3 User-Centered Design
    2.3.1 Prototyping
  2.4 Environments
  2.5 Systems
  2.6 Overview

3 Methodology
  3.1 Questionnaires
  3.2 Prototypes
  3.3 Interviews & Observation

4 Implementation
  4.1 System Architecture
    4.1.1 Image Processing
    4.1.2 Proposed Architecture Components
    4.1.3 Auxiliary Files
  4.2 Services
  4.3 User and Technical Manuals

5 Experimental Evaluation
  5.1 Approach
    5.1.1 Participants
    5.1.2 Apparatus
    5.1.3 Tasks
    5.1.4 Statistical Data Analysis
  5.2 Evaluation
    5.2.1 Quantitative Evaluation
      5.2.1.1 Time Evaluation
      5.2.1.2 Accuracy Evaluation
      5.2.1.3 Number of Interactions Evaluation
      5.2.1.4 Number of Errors Evaluation
    5.2.2 Qualitative Evaluation
      5.2.2.1 Positive and Negative Affect Scale
      5.2.2.2 Intrinsic Motivation Inventory
      5.2.2.3 Experience Needs Satisfaction

6 Results
  6.1 Performance
    6.1.1 Time Results
    6.1.2 Accuracy Results
    6.1.3 Number of Interactions Results
    6.1.4 Number of Errors Results
  6.2 User Experience
  6.3 Summary

7 Recommendations & Discussion
  7.1 Functional
  7.2 User Interface
  7.3 Limitations
  7.4 Future Work

8 Conclusion

Bibliography

A Main Appendix

B External Source

List of Tables

2.1 Table of Systems & Topics for the Related Work

5.1 Table of Radiologist Expert Level
5.2 Table of Patient Studies

6.1 Time (seconds) Performance Results
6.2 Accuracy Results
6.3 Num. of Anno. vs Hit Rate Score (HRS) Results
6.4 Total Number of Interactions
6.5 Total Number of Errors
6.6 Positive and Negative Affect
6.7 Intrinsic Motivation Inventory
6.8 Experience Needs of Satisfaction
6.9 Kruskal-Wallis (H) Test Mean Rank
6.10 Post-Hoc Tukey's Honest Significant Difference (THSD) Descriptive Statistics by Expert Level

List of Figures

1.1 A screenshot of a multimodality data set [1], showing several modalities of breast cancer imaging
1.2 Breast Imaging Reporting and Data System (BI-RADS) Assessment Categories
1.3 Proposal: The two different levels

2.1 Input and output in formal usability inspections
2.2 Critical vehicle for discovery and usability early in an emerging field's development, from Gabbard et al. [21]
2.3 Depicted user-centered design activities, from Gabbard et al. [21]
2.4 Zoomable interface of PHiP
2.5 activeNotes Prototype System
2.6 Operating room interaction system example
2.7 User-Centered Design Process, from Kaygin [58]
2.8 Iconographic Display of Mammographic Features

3.1 User-Centered Design (UCD) Model, from Journal of the American Medical Informatics Association [92]
3.2 Patients List
3.3 SELECT Breast
3.4 RIGHT SELECT Options
3.5 Bottom Options
3.6 Horizontal Modality Set
3.7 Vertical Modality Set
3.8 Low-Fi Prototype: Picture Archiving and Communication System (PACS) System Simulation
3.9 Low-Fi Prototype: Multimodality of Imaging
3.10 Low-Fi Prototype: Mammography Example (Group)
3.11 Low-Fi Prototype: Computer-Aided Diagnosis (CADx) System Simulation
3.12 High-Fi Prototype: PACS System Simulation
3.13 Hi-Fi Prototype: CADx System Simulation

4.1 Digital Imaging and Communications in Medicine (DICOM) Image Manipulating Tools
4.2 DICOM Image Objects Hierarchy
4.3 Orthanc Software Architecture
4.4 Unified Modeling Language (UML) request sequence to the Medical Imaging Multimodality Breast Cancer Diagnosis User Interface (MIMBCD-UI) project services
4.5 Deployment of web technologies in the PACS

5.1 Position of the user in relation to the devices
5.2 Anonymous 1 Left Craniocaudal (LCC) Modality
5.3 Anonymous 4 US Modality
5.4 Anonymous 2 MG Modality
5.5 Performance Evaluation Phases and Questionnaires
5.6 Annotation Classification Areas

Acronyms

ANOVA Analysis of Variance
API Application Programming Interface
AR Augmented Reality
AWS Amazon Web Services
BI-RADS Breast Imaging Reporting and Data System
CAD Computer-Aided Detection
CADx Computer-Aided Diagnosis
CAS Computer-Aided Surgery
CC Craniocaudal
CCD Clinician-Centered Design
CR Computed Radiography
DDR Direct Digital Radiography
DICOM Digital Imaging and Communications in Medicine
EC2 Elastic Compute Cloud
ENS Experience Needs Satisfaction
FNA Fine-Needle Aspiration
HCI Human-Computer Interaction
HFF Hospital Fernando Fonseca
HRS Hit Rate Score
HSJ Hospital São José
HTTP Hypertext Transfer Protocol
ICS Image Checker System
IMI Intrinsic Motivation Inventory
IMI Imagens Médicas Integradas
IP Internet Protocol
IPO Instituto Português de Oncologia
LCC Left Craniocaudal
M/ORIS Medical/Operating Room Interaction System
MG Mamografia
MG Mammogram
MIMBCD-UI Medical Imaging Multimodality Breast Cancer Diagnosis User Interface
ML Machine Learning
MLO Mediolateral Oblique
MRI Ressonância Magnética
MRI Magnetic Resonance Imaging
OR Operation Room
PACS Picture Archiving and Communication System
PANAS Positive and Negative Affect Scale
ROI Region of Interest
ROIs Regions of Interest
SAMS Serviços de Assistência Médico-Social
TCP Transmission Control Protocol
THSD Tukey's Honest Significant Difference
UCD User-Centered Design
UI User Interface
UIs User Interfaces
UML Unified Modeling Language
US Ultrasom
US Ultrasound
UX User Experience
VR Virtual Reality
WADO Web Access to DICOM Objects

Chapter 1

Introduction

The goal of this thesis is to develop a new User Interface (UI) to collect ground truth (i.e. annotations) concerning two types of lesions that can occur in breast screening: masses and calcifications. Visualisations of the mentioned lesions should also be available, so that a proper diagnosis can be made and patient follow-up performed. The task of collecting such annotations is of chief importance. Indeed, with such large annotated datasets it is possible to develop learning methodologies for automatic diagnosis in breast screening (e.g. Machine Learning based approaches). Another important aspect is that the system should consider several image modalities (Figure 1.1). This is because, in current clinical setups [2], radiologists perform the diagnosis using several image modalities. With such multimodality, it is possible to obtain a reliable diagnosis, since the complementarity of the information contained in these modalities is crucial for the completion of the exam. More specifically, the following image modalities [3] are considered: (i) Craniocaudal (CC) and Mediolateral Oblique (MLO) Mammography (Letter B of Figure 1.1); (ii) US (Letter A of Figure 1.1); and (iii) MRI volumes (Letters C and D of Figure 1.1).

Figure 1.1: A screenshot of a multimodality data set [1], showing several modalities of breast cancer imaging.

In this thesis, we propose a system that efficiently allows for the automatic collection of annotations comprising the lesions observed in breast screening (i.e. masses and calcifications). The problem addressed herein is novel, since the UI must be deployed in a clinical domain, which constitutes a new paradigm. The contribution of this thesis is thus threefold. First, it bridges the gap between Human-Computer Interaction (HCI) and a clinical domain; indeed, the literature is still scarce in this regard. Second, it is of crucial importance to know the clinical realm and the workflow of the diagnosis processes that take place in a radiology room. Finally, and in detail, the radiologist profile must be identified. Radiologists require rapid and reliable access to images, and within digital environments based on web browser technologies this access is increasing. Our system specifies a set of web-based services for presenting and accessing multimodal medical images, which makes the communication very flexible. The server provides access to the medical imaging data and implements all the functions (annotations, zoom, among others), which may be downloaded locally as pages, so that the only software necessary on the radiologist's workstation is a web browser; in the end, there is no particular client application. All the above concerns are addressed throughout this thesis.
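To make this access pattern concrete, the sketch below shows how a browser-side client could request a DICOM object through a WADO-style HTTP service. This is a minimal TypeScript illustration: the base URL and the `requestDicomImage` helper are assumptions for this example, not the project's actual API, although the query parameters follow the standard WADO-URI convention.

```typescript
// A minimal sketch of the web-based access pattern described above, assuming
// a WADO-style service. The base URL and the requestDicomImage() helper are
// illustrative assumptions, not the project's actual API.
async function requestDicomImage(
  baseUrl: string,
  studyUID: string,
  seriesUID: string,
  objectUID: string
): Promise<ArrayBuffer> {
  // Build a WADO-URI query: the server locates and serves the DICOM object,
  // so the browser needs nothing beyond standard HTTP support.
  const params = new URLSearchParams({
    requestType: "WADO",
    studyUID,
    seriesUID,
    objectUID,
    contentType: "application/dicom",
  });
  const response = await fetch(`${baseUrl}/wado?${params.toString()}`);
  if (!response.ok) {
    throw new Error(`WADO request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```

In this design, all heavy processing stays on the server; the browser only issues standard HTTP requests, which is what keeps the radiologist's workstation requirements down to a web browser.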

1.1 Motivation and Context

We focus on the development of a new system for the problem of breast cancer image analysis for two primary reasons. Firstly, breast cancer is the most commonly occurring type of cancer among women [4], the main strategy to reduce mortality and morbidity being early detection and treatment based on medical imaging technologies. Secondly, the current workflow applied in breast cancer diagnosis naturally involves several imaging modalities, making it an ideal test bed for a proper diagnosis methodology. The fact that no single modality has a sensitivity and specificity high enough for a reliable diagnosis [5] supports the need for multimodal imaging in breast cancer diagnosis; indeed, the combination of modalities can significantly increase diagnostic accuracy [6, 7, 8, 9]. Our work aims to reduce the number of unnecessary biopsies, which leads to lower health care costs and better patient care. Currently, in clinical settings, initial breast cancer diagnoses are made using MG for older patients and US/MRI for younger patients [7]. If suspicious lesions are found, the case is followed up with a biopsy. To avoid false negatives, the false-positive rate is typically high, resulting in a high number of unnecessary biopsies. The complementary nature of multimodal imaging has the potential to decrease the false-positive and false-negative rates, because of the reduced likelihood of a mistake happening in more than one imaging modality. The development of a multimodality UI will be helpful for the automation of the diagnosis. As described above, if a large data set is available, it is possible to have a Machine Learning (ML) based approach (e.g. Deep Learning methodologies) that can automatically classify the whole exam, giving a high probability to those cases (Figure 1.2) where a biopsy is necessary. Such automation is important since it reduces the inspection performed by the radiologist, which is still rudimentary in current clinical setups. For example, in cases where the mammography is unclear, as happens with dense breasts, the radiologist has to perform additional tests. First, the radiologist performs an additional inspection of US images (besides the inspection of the MG). This second examination may be insufficient, since the US image may miss some details. Thus, by requiring a third examination in MRI, it becomes possible to observe the finest details. However, the MRI has many false positives, and the radiologist must return to the US again in an attempt to find the changes that are compatible with the MRI.

From the above, it is clear that the radiologist performs the diagnosis in a loop fashion1 within image modalities. Such a "loop-procedure" should be avoided, and this is possible if one has large annotated data sets. Another important aspect of multimodality is the desire for a precise and correct diagnosis. For instance, lesions G1 and G2 [10] are common in the diagnostic. In the former case (lesion G1), the lesion is well differentiated, appearing as malign in US and benign in MRI. In the latter (lesion G2), the lesion is not well differentiated, appearing as benign in US and malign in MRI. This lesion anatomy justifies the importance of using multimodality information. Besides having large annotated multimodality datasets, the follow-up of the patient is crucial. This follow-up means that the annotations of a given patient should be analysed through time. Indeed, masses and micro-calcifications change their appearance, position and also their density [8]. Taking micro-calcifications as an example, one can observe a random localisation of such lesions. However, if the patient is going to develop cancer in the near future, the randomness tends to diminish, and the micro-calcifications tend to become more organised, exhibiting a directional spatial organisation. In the case of masses, this type of lesion exhibits regular shapes in benign cases; however, its morphology changes if the patient is developing cancer. In this situation, the shape of the masses becomes sharper and irregular.

Figure 1.2: BI-RADS Assessment Categories.

1It is standard practice at Hospital Fernando Fonseca (HFF) to perform medical exams using these technologies.

From the above, temporal inspection must also be considered in the system developed in this thesis. To tackle the issues mentioned above (collecting a large data set of annotations, as well as the changes in a patient's lesions), the developed system incorporates the following functionalities:

1. First, it provides an efficient way to collect the ground truth comprising masses, calcifications and the corresponding BI-RADS (Figure 1.2) label for a given image modality. These are annotated by the clinician, who delineates the mass contour, marks the calcification positions and labels the corresponding image. Thus, as the number of examinations increases, the ground-truth database is updated accordingly.

2. Second, based on a query of the patient ID, it is possible to retrieve all the annotations of a given patient collected in a period (typically two years) on any imaging modality. This process is important since it streamlines the job of the radiologist. Therefore, the clinician can automatically supervise the evolution of the masses and calcifications and inspect their morphology. In this way, the expert can better predict the risk of tumour appearance. A minimal sketch of such an annotation record and follow-up query is given after this list.
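The TypeScript sketch below shows a hypothetical shape for one ground-truth record, combining the elements listed above (mass contour, calcification positions, BI-RADS label), together with the follow-up query of item 2. The field names and the in-memory filter are assumptions for illustration, not the system's actual schema or storage, which are described in Chapter 4.

```typescript
// Hypothetical shape of one ground-truth record; all field names are
// assumptions for illustration, not the actual schema.
type Point = { x: number; y: number };

interface LesionAnnotation {
  patientId: string;
  modality: "MG" | "US" | "MRI";
  acquiredAt: Date;                    // enables the follow-up query below
  massContour: Point[];                // contour delineated by the clinician
  calcifications: Point[];             // marked calcification positions
  biRads: 0 | 1 | 2 | 3 | 4 | 5 | 6;   // BI-RADS assessment category
}

// Follow-up query (item 2): all annotations for one patient in a period,
// here over a plain in-memory array standing in for the database.
function annotationsForPatient(
  db: LesionAnnotation[],
  patientId: string,
  from: Date,
  to: Date
): LesionAnnotation[] {
  return db.filter(
    (a) =>
      a.patientId === patientId &&
      a.acquiredAt.getTime() >= from.getTime() &&
      a.acquiredAt.getTime() <= to.getTime()
  );
}
```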

1.2 Challenges

The current challenges of HCI for medical imaging diagnosis include the following issues: (1) the UI prototypes and input environment devices that underlie these clinical tools are in a dynamic state, where the user (radiologist) profile is always changing; (2) the UI prototypes themselves may be highly fluid because of the relative ease of changing the information visualization of medical images [11] (DICOM standard); (3) many User Interfaces (UIs) are used in situations in which a variety of influences on medical and clinical environment outcomes exist, few of which are subject to easy assessment or experimental prototyping; (4) researchers often lack familiarity with users (more specifically radiologists) and with evaluation methods and techniques; and (5) researchers often believe that evaluation will delay the research, increase costs, and have limited impact on research benefits. Addressing these issues requires additional research to improve the effectiveness of newer systems, promoting evaluation. Nevertheless, existing evaluation methods should and can be adapted to assess UI prototyping and HCI theory. In our preliminary work, radiologist observations and hospital meetings proved to be a way to question radiologists about their needs and to understand their behaviour and the future context of use. It is in the radiology room that radiologists have to communicate their precise requirements, while being capable of explaining their goals and how they approach their tasks. We spent about 240 hours in Physician Meetings. In the laboratory, we spent more than 1100 hours on R&D and more than 250 hours on Prototyping. The R&D work comprises several stages. Next, each of the stages is detailed, as well as the approximate time for its realisation:

1. 250 hours to collect State-Of-The-Art work;

2. 50 hours to perform Questionnaires (see User Profile on Appendix B);

3. 50 hours to perform Interviews (see Scripts on Appendix B) with domain questions;

4. 150 hours to perform radiologist Observations supported by the scripts (see Scripts on Appendix B) and also developing several reports (see Reports on Appendix B);

5. 100 hours to perform raw data and radiologist’s information Analysis;

6. 100 hours to perform Research work on the laboratory;

7. 400 hours to perform thesis and papers Writing.

Concerning the Prototyping, we divided it into Low-Fi and High-Fi development. The Low-Fi took about 30 hours of development, while the High-Fi took about 220 hours. Finally, outside the laboratory, the Physician Meetings comprised about 30 meetings at the following institutions:

• HFF;
• Hospital São José (HSJ);
• Serviços de Assistência Médico-Social (SAMS) Hospital;
• Clínica Europa;
• Cedima;
• Imagens Médicas Integradas (IMI) Campo Pequeno;
• Instituto Português de Oncologia (IPO) Lisboa.

These methods and contextual inquiry seem promising, yet challenges exist in the use and analysis of the vast amount of collected raw data and radiologists' information.

1.3 Contributions

The thesis contributions are what guide the practical research. However, the purpose is not solely to research and develop a UI that will produce a data set of diagnostic annotations from radiologists' interactions, but also to create a tool for patient follow-up that will enable more reliable human-machine learning. Likewise, we reflect on and discuss what contributions a medical diagnosis for multimodal imaging, through several UI prototypes, can make to HCI practices, as a way to gain insight for future medical imaging diagnosis. By conducting research through a real-world, practical implementation, we aim to address both purposes (Figure 1.3), which work together to create theoretical and practical knowledge.

Figure 1.3: Proposal: The two different levels.

Handling the practical case of prototyping a UI for multimodal medical imaging diagnosis allows us to explore the research field and actively answer the research question. This thesis thus begins by looking into the contribution of HCI practices to medical and clinical subjects.

The two levels are equally important from an HCI perspective, as the work deals not only with issues of the prototype concerning radiologists, but also with the quality of the data set produced from the annotations made by radiologists, which is crucial to the machine.

1.4 Document Structure

This section describes the structure of the document, which comprises eight chapters. Next, we detail each of them. In Chapter 2, the document provides background knowledge regarding the development of a system for breast cancer diagnosis using multimodal medical imaging, such as important Definitions in Section 2.1 and the Clinical Domain in Section 2.2. Furthermore, in Section 2.3, we describe theoretical models of analysis, proposing evaluation and development approaches for this work, while describing development methodologies as well as the UCD approach. In Section 2.4, we describe other Environments of interaction and techniques. We describe some existing systems in Section 2.5, focusing on systems which attempt to implement breast cancer diagnosis, i.e. detection and diagnosis of the lesions, as well as treatments before and after surgery. To conclude the Related Work chapter, we provide an overview of the topics studied in Section 2.6. In Chapter 3, we discuss the empirical methodology used to find out the needs of radiologists. In Section 3.1, we describe and discuss a variety of questionnaires used to collect primary quantitative data. Section 3.2 focuses on Prototypes and how this approach helped our research through to the final design of the system solution. The user Interviews & Observation that guided the design and development of the final solution are part of Section 3.3. In Chapter 4, the Implementation is presented: Section 4.1 describes the System Architecture, Section 4.2 describes the Services supporting the proposed architecture, and Section 4.3 concludes the chapter with the User and Technical Manuals of the prototype and system. In Chapter 5, we present the Experimental Evaluation, integrating the system in a real-world scenario. Section 5.1 describes the Approach taken, while Section 5.2 describes the Evaluation, a formal verification of the statistical data, both quantitative and qualitative. Chapter 6 presents the Results: Section 6.1 analyses Performance using several metrics, Section 6.2 presents the results related to User Experience, and Section 6.3 Summarises our results, discussing several performance aspects and experience improvements. In Chapter 7, an exploratory discussion is performed through Recommendations & Discussion of the system. Section 7.1 describes the Functional improvements of the system, Section 7.2 debates how the User Interface can be improved, Section 7.3 describes some Limitations of the system, and Section 7.4 suggests the Future Work of this thesis. Finally, we summarise the main achievements of this thesis, as well as the main Conclusions, in Chapter 8.

Chapter 2

Related Work

In this chapter, we discuss software for both clinical and non-clinical tools. More specifically, we address image modalities in medical applications. We also cover software concerning (non-)multimodal views and existing work in the field of medical and clinical user interfaces.

2.1 Definitions

The term Medical Imaging Multimodality [3] is very broad. Therefore, a comprehensive explanation regarding the multimodality of medical imaging is provided jointly with the HCI paradigm in the context of this thesis. The typical goal of medical imaging multimodality is to explore medical sets of images across their various modalities (MG, US and MRI). This thesis explores the need for multimodal imaging in breast cancer diagnosis through a system able to collect a large number of lesion annotations over multiple medical imaging modalities. This work involves the design and development of a UI that supports automatic detection, segmentation and classification of breast lesions in several image modalities comprising MG, US and MRI, as well as information visualisation, to serve as a more natural and direct form of interaction between radiologists and systems. This new form of interaction could potentially minimise the Gulf of Execution [12], since the radiologists now communicate their intentions to the system through interactions, which subsequently produce annotations. The term usability refers to the ease of use and learnability of a human-computer system, which makes it a broad concept. Several usability methods exist to evaluate a technique or a product regarding user satisfaction and efficiency. A set of methods exists for evaluating the usability of a health system. Those methods [13] can in most cases be divided into separate categories: (i) Model-based; (ii) Inspection-based; (iii) User-based; and (iv) Scenario-based; of which category (iii) will be of main interest in this section. Next, we detail each of them. Model-based. Model-based usability evaluation methods can answer questions about how users would perform a certain task, relying on computational models of human behaviour and cognitive processes [14]. These methods are mainly applicable to systems operated with keyboard and mouse as input devices. The aim of usability evaluation is not only to identify usability problems in a health system, but also to provide information that lets us modify the system and UI to reduce or remove the identified problems. Here again, model-based approaches can be of great interest: by extracting scenarios from models, it is easier to point out which part of which model is involved in the identified usability problems. Due to major limitations for clinical UIs, we concluded that model-based evaluation would be the least important method to take into account.

The exploratory nature of the clinical domain, in waiting to find out how radiologists interact with systems, and the need to understand usability evaluation [15] and functionality requirements, contributed to this conclusion. Model-based methods also have certain drawbacks: most of the models are limited to expert-user behaviour and thus cannot model novice users of the evaluated system. Inspection-based. Inspection-based methods are used at an early stage of UI development, to evaluate prototypes or a set of specifications for a system that cannot be tested directly with users. They characterise a set of methods in which an evaluator inspects a UI. The great advantage of these methods is that they do not require any users, which often makes them cost- and time-efficient. The evaluator inspects the system's usability based on a series of heuristics (Figure 2.1). However, this method also has drawbacks for this study, since it should be applied by multiple usability experts to maximise the effective measurement of the UI usability [16]. When specific domain knowledge is needed, as in our study, usability experts must also possess that knowledge to evaluate the system at hand. In our case, the required domain knowledge concerns medical image viewers and radiologists in the radiology room, which is probably not the average expertise of usability experts. Moreover, inspection-based usability evaluation models do not take users' performance measures on the system into account, which in this case is paramount, given the time and distraction constraints of radiology room environments.

Figure 2.1: Input and output in formal usability inspections.

User-based. User-based methods are concerned with gathering input from relevant users interacting with UIs. Questionnaires are the most widely used methods in this category, measuring the user's subjective preference after using the relevant system, either quantitatively or qualitatively. These methods are critical for driving UI activities, usability, and early discovery in an emerging development field (Figure 2.2), such as medical image diagnosis. Over time, contributions from the field emerge, leading eventually to adopted UI guidelines and standards (Figure 2.2). Lessons learned [17] by a technological field from conducting user-based studies not only provide value to the field as a whole, regarding insight into a part of the UI space, but are also critical for the usability of a particular UI. Contributions to the field, as time progresses, begin to form metaphors [18] and a collection of simple design guidelines [19] from which we can speculate. The simple design guidelines are eventually shaken down into a collection of tried-and-true guidelines [20] and metaphors [19] that we adopt.

Figure 2.2: Critical vehicle for discovery and usability early in an emerging field's development, from Gabbard et al. [21].

Scenario-based. In the scenario-based methodology, the definition is often preceded by another usability measure. Questionnaires therefore complement scenario-based testing [22], where several tasks are presented in the form of scenarios. These scenarios explain what the participant needs to do on the system, but not how to perform it [13].

2.1.1 Methodology

In this section, we detail how the previously mentioned concepts relate to the present work. From the four methods, we chose only two of them: the (i) User-based as the main method, and the (ii) Inspection-based as a secondary method.

Both methods try to describe and predict the user's behaviour, but they do so differently. Whereas user-based methods can only retrospectively identify UI and usability issues, after the user performs specific tasks, inspection-based methods can only guess at such issues based on the heuristics used and on the experience and knowledge of the person who evaluates the tasks. Although the work reported here falls within the user-based methods, this approach emphasises iterative activities between the user task analysis phase [23], where user tasks are understood and requirements are gathered, and the formative user-centered evaluation phase [24], where an application-level UI prototype has been developed and is being analyzed. Because of that, we chose the user-based approach as the primary approach for this phase. In conjunction with this approach, inspection-based evaluation can be coupled with the user-based methods (Figure 2.2) to assist in the UI activity. The inspection-based techniques, using expert assessment, can be iteratively combined with user-based means to refine and understand useful parameters and, most importantly, to improve the whole system. These UI activities are driven by several inputs, which is the strength of this approach: inputs from the user task analysis phase, UI parameters correlated with expert evaluation results (inspection-based methods), and UI performance derived from user-based methods. Of the three main activities shown in Figure 2.2, there are two logical starting points: UI design and user-based methods. Starting with UI design activities has the advantage that we can explore the design space early, with little time investment in system development. Moreover, candidate designs can be explored with radiologists quickly and easily. If mocked up correctly, static prototypes can be presented as Low-Fidelity Prototypes (also known as Low-Fi Prototypes), allowing us to get an idea of how the prototype may be perceived [25, 26]. Once a set of prototypes has been created, expert evaluations can be applied to assess the static UIs (Low-Fi Prototypes), culling prototypes that are likely to be less effective than others. Evaluations using experts are also useful for further understanding the interaction space by identifying potential user-based levels and factors. These can be examined further to determine the match between inspection-based methods (expert evaluation) and user-based methods, like user studies and questionnaires (Appendix B). In cases where the prototype is somewhat understood, and we have specific questions about how different prototype parameters might support user task performance, we may be able to conduct a user-based evaluation as a starting point. Under this approach, we start with experimental prototype parameters as opposed to specific UI prototypes. Figure 2.3 shows that user-based methods not only identify UI prototype parameters to assist researchers, but also have the potential to produce UI prototype lessons learned and guidelines, and to generate innovation, which provides tangible contributions to the field while also improving usability.

Finally, a set of iteratively refined UI prototypes forms the basis for the overall and final UI prototype. This prototype can then be analyzed using formative user-centered evaluation [27, 28]. The current model offers a user-centered approach for increasing the use of results from internal evaluation. The characterisation of the user-centered evaluation is threefold: (i) attention to the general communication aspects of evaluation; (ii) delineation of the evaluator and decision-maker roles; and (iii) an ordered set of steps with usefulness as the primary concern at each step. That way, a user-centered approach to analysis helps evaluation do what it is supposed to do: provide information that gets used to increase the effectiveness of decisions. User-centered evaluation accomplishes this through a pervasive concern with utility and by carefully attending to the three characteristic features of this approach to evaluation. For the current model, we developed several scripts (Appendix B) to guide us during evaluations.

Figure 2.3: Depicted user-centered design activities, from Gabbard et al. [21].

In conclusion, user-based methods for usability evaluation are very suitable for the current research. Due to its exploratory nature, we will try to see whether such systems are desired over the current situation, finding out the feasibility and usability of a new interaction technique from qualitative and quantitative data gathered in a specific working environment.

2.2 Clinical Domain

This section relates the clinical domain to the main topics connected with the fields of UIs [29], Usability [30] and User Experience (UX) [31]. The better a clinical domain is understood, the easier it will be to observe radiologists and to spot UI issues. All of the works cited below reported numerous problems and deficiencies, which considerably hinder the efficiency of system use and radiologists' routine work, fundamentally supporting this thesis by giving us early information about them.

2.2.1 Activity-Based Computing

In this section we address other concepts, like Activity-Based Computing (ABC) for Medical Work in Hospitals [32], which seeks to create computational support for human activities, contributing to the growing research on support for human activities, mobility, collaboration, and context-aware computing. To summarise, activity-based computing builds and expands upon prior work within each of the areas described: activity management, virtual window management, collaboration support systems and context-awareness. These topics do not approach a multimodal view of images, but they were valuable for understanding the context and most of the problems and solutions in other fields. We are interested in examining the radiologist's interactions and actions. These interactions should also be a useful addition to our system, since we need to generate information from the actions (breast annotations) of several radiologists over a long-term activity. Our Meta-Information Generation (Meta-Info. Gen.) is produced from the radiologists' interactions on the image: as they annotate the image, the system generates coordinates from those annotations, related to the image's patient ID.
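As a rough illustration of this meta-information generation, the TypeScript sketch below turns each annotation interaction into a coordinate record tied to the image's patient ID. The record shape and names are assumptions for this example, not the system's actual implementation.

```typescript
// Each annotation interaction becomes a coordinate record tied to the
// image's patient ID; the record shape and names are illustrative.
interface InteractionEvent {
  patientId: string;   // links the coordinates to the patient
  imageId: string;     // image being annotated
  x: number;           // annotation coordinates on the image
  y: number;
  timestamp: number;   // when the interaction happened (ms since epoch)
}

const interactionLog: InteractionEvent[] = [];

// Called for every annotation interaction (e.g. a point placed on a lesion).
function recordInteraction(
  patientId: string,
  imageId: string,
  x: number,
  y: number
): void {
  interactionLog.push({ patientId, imageId, x, y, timestamp: Date.now() });
}
```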

2.2.2 Fine-Needle Aspiration

Another direction of work is known as Fine-Needle Aspiration (FNA) [33]. This work aims to improve the diagnostic accuracy of breast FNA through an interactive computer system that evaluates cytological features derived directly from a digital scan of breast FNA slides. The system uses computer vision techniques to analyze cell nuclei and classifies them using an inductive method based on linear programming. This class of concepts is also researched in breast cytology diagnosis via digital image analysis [34], which uses computer-based image analysis and is likewise important clinical domain context for our work. The reported efficiency of systems for medical imaging breast cancer diagnosis from FNAs varies considerably. Ahmad et al. [35] researched FNA performance parameters, finding some inconsistent variables and constraints. FNA diagnosis is highly operator-dependent, which emphasises the need to develop individual performance characteristics for those performing this diagnostic. One goal of the present work is to improve the diagnostic accuracy of FNA by increasing its objectivity and thereby making it less operator-dependent. This image analysis is also a prognosis study [32]. ML applied to breast cancer diagnosis introduces the diagnosis of breast cancer done by computers, and the medical literature [35] gives us evidence for a cytological examination of the breast supported by a statistical review.

12 2.2.3 Picture Archiving and Communication Systems

Medical services nowadays rely heavily on digital imaging technology, due to the image modalities utilised in the medical field such as computed MG, US and MRI. These techniques require image-processing tools and digital management, which have been the primary reason for the development of PACS [36]. This technology provides economical storage of, and convenient access to, images from multiple modalities; images are transmitted digitally via the PACS, eliminating the need to manually file, retrieve, or transport image sets. The universal format for PACS image storage and transfer is the DICOM [37] format. One work close to our system is proposed in [38]. It addresses web-based data processing and management of medical imaging, running over the network in an internet browser, with abilities similar to a PACS. The advantages of being an online system make it one step closer towards a total online medical imaging system and online imaging in general. Despite the excellent information this work offers in the field of this research, it does not provide any ability to analyse multimodal images, one of the concerns that is the goal of this thesis.
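Since DICOM structures everything a PACS stores and transfers, the sketch below models its information hierarchy (Patient, then Study, then Series, then Instance; cf. Figure 4.2) as TypeScript types. The particular field selection is a minimal assumption for illustration; real DICOM objects carry many more attributes.

```typescript
// A minimal model of the DICOM information hierarchy
// (Patient -> Study -> Series -> Instance, cf. Figure 4.2).
interface DicomInstance {
  sopInstanceUID: string;          // unique identifier of one image/object
}

interface DicomSeries {
  seriesInstanceUID: string;
  modality: "MG" | "US" | "MR";    // standard DICOM modality codes
  instances: DicomInstance[];
}

interface DicomStudy {
  studyInstanceUID: string;
  studyDate: string;               // DICOM dates are YYYYMMDD strings
  series: DicomSeries[];
}

interface DicomPatient {
  patientID: string;
  studies: DicomStudy[];
}
```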

2.2.4 Computer-Aided Diagnosis

As we have seen, the PACS [36] faces ever-growing adoption in hospitals and clinics worldwide [39]. The digitization and sharing of medical images are progressively replacing the use of tomography films, thus reducing costs and increasing the possibility of remote medical diagnosis through telemedicine solutions. In line with this trend, CADx [40] is also gaining ground. CADx is an interdisciplinary technology combining elements of machine learning and computer vision with radiological image processing. A typical application is the detection of a tumour. For instance, some hospitals use CADx to support preventive medical check-ups in mammography (diagnosis of breast cancer). CADx typically intends to provide a suggested diagnosis, based on automatic quantitative analysis of medical images, to aid physicians in their final diagnosis. The output of ML-based algorithms may be useful in situations where the human visual system might fail, or in cases of physician fatigue, alerting them to possible well-known problems. According to an article on computer-aided diagnosis in medical imaging [40], there are two types of CADx systems. The first type assists with lesion detection, searching for abnormal patterns in images, such as micro-calcification groups in mammography images. The second type assists the diagnosis by quantifying image characteristics; for example, extracted information about a lesion's shape may help determine whether a tumour is malignant or benign. Since our system is characterised as CADx-related, it is of prominent importance to describe and analyse this literature.

13 2.2.5 Patient Visualization

Another work visualises patient histories on a PDA [41], describing two different UIs for a mobile device tool: (1) one that displays patient histories; and (2) another that permits visually querying patient data stored in the hospital database. PHiP (Patient History in Pocket) [41] is a tool designed for a mobile device that does both, exploiting information visualisation techniques so that it can accommodate a fair number of clinical cases on the screen.

Figure 2.4: Zoomable interface of PHiP.

The objective of this work is to display as much information as possible about the patient history on limited display space, providing overview data as well as details. Showing an overview of multiple facets of records on a single screen gives users a better sense of the type and volume of available data. In our work, we go further than PHiP (Patient History in Pocket) [41]: besides providing patient visualisation (like PHiP [41]), we also offer a temporal display of the patient's lesions.

That work was developed according to a user-centered approach. Besides the user studies conducted in the hospital during the requirements phase, the authors performed physician evaluations of the different prototypes, and this kind of information is quite useful for our research field as well. While their system can provide temporal details of the patient, the authors optimised the patient visualisation. In our work, patient visualisation comprises (i) the temporal morphologic evolution of the masses; and (ii) the temporal evolution of the calcifications2. In short, our system has the capacity to merge patient visualisation and patient follow-up by using dynamic information over time.

2.2.6 Patient Progress

In the computer-assisted creation of patient progress notes [42], a prototype application called activeNotes supports the creation of critical care notes by physicians in a hospital intensive care unit. It integrates automated, context-sensitive patient data retrieval, and user control of automated data updates and alerts, into the note creation process.

Figure 2.5: activeNotes Prototype System.

A critical care note is a clinical document, written by a hospital physician, that documents the patient's progress and prognosis. This kind of work helps our project by giving us analysis of physicians' feedback and user understanding. It also informs us, through its qualitative study, about the right path through prototype design and user experience evaluation. The physician-driven management of patient progress notes in an intensive care unit [42] describes the design of exploratory, focused techniques to support data input and control of electronic progress note content. These techniques help our project by giving us an alternative design exploration, including observations, structured and semi-structured interviews, design and implementation of a prototype, as well as feedback gathered in a qualitative study with physician surveys. This approach had not yet been applied to the morphologic/density diagnosis of breast cancer lesions, something that emerges from this thesis.

2The same evolution can have directional or non-directional aspects.

15 2.2.7 Interaction System

The Medical/Operating Room Interaction System (M/ORIS) [43] proposes an architecture for a real-time multimodal system providing non-contact, adaptive user interfacing for Computer-Aided Surgery (CAS) [44]. That paper focuses on the proposed activity monitoring aspects of M/ORIS. The researchers analysed HCI issues in an Operation Room (OR) based on real-world case studies. In the following, we focus on interfaces for CADx. As reported [43], an automation system in the medical environment has to obey strict safety rules, combining real-time monitoring of the physician's activity with gesture interpretation for specific interactions. We can follow the same philosophy to automatically predict the radiologist's needs; thus, any automated UI must be overridable by the radiologist's decision. From common interactions, they identify a UI for endoscopy, showing that the transition from one interaction to another can be automated using this information.

Figure 2.6: Operating room interaction system example.

M/ORIS [43] was vital to us since it provides a multimodal framework for physician interpretation and support. Although the system is related to CAS, and not to CADx as ours is, several guidelines describe the design of its medical UI. One of the directions we followed was the User Tests, and we adopted some of their usability evaluations. The authors also stress the importance of a Clinician-Centered Design (CCD) for both the efficiency and safety of a clinical domain system.

16 2.2.7.1 CAS & CADx

CAS [45] and CADx [40, 46] can contribute to the general cost-cutting trend in health care by making it possible for a smaller staff to perform the same activity in less time than with traditional methods. They also help prevent human error, and the system can better audit the source of problems and reduce error effects. In particular, it is likely that soon a single doctor will have to control and diagnose several computer-based processes during a surgical, diagnostic or clinical intervention. Efficient UI design that matches the constraints of clinical environments and helps reduce the doctor's workload will, in large part, determine the success of CAS and CADx. Most works on CAS and CADx systems for medical image analysis rely on the extraction of a specific disease feature, as well as a supervised classification or regression model based on this feature's expression [45, 46]. Both definitions and the respective research work are of chief importance to our research, as we need to understand and develop a CADx-based system with new interaction techniques supported by the CAS literature. Some of these definitions are User Tests, Multimodal Architecture, Characteristics of Clinical Procedures, Requirements and Activity Identification.

2.2.7.2 User Interface for CADx

The impressive development [47] in multimodal medical imaging technology during the last decades has provided physicians with an increasing amount of patient-specific functional and anatomical data. Furthermore, the rising use of non-ionizing real-time imaging, in particular optical and ultrasound imaging, during cancer analysis procedures creates the need for the design and development of new information visualisation and display technology [47], allowing physicians to take full advantage of abundant sources of independent preoperative and intra-operative data. Augmented Reality (AR) applied to the clinical domain was proposed as a paradigm, bringing new interactive solutions and visualisation techniques [48] into perspective. CADx technologies, whether they enhance traditional methods (e.g. image visualization [49]) or provide new tools such as augmented displays [47], share a common need for a multimodal imaging UI, as in our work. Interface issues are systematically brought up in connection with new computer-assisted techniques, and poor UI design is cited [50, 49] as a significant limiting factor for many operations. In particular, physicians criticise the lack of user-centered design, the difficulty of operating computer-assisted equipment during surgery, and the failure to convey information without otherwise constraining the physicians. This supports the argument for the need to characterise the radiologist's profile as a user (CCD) and to understand how radiology processes work in the radiology room. To address these issues, several authors have developed guidelines for medical UI design. An introduction to human factors in medical devices [51, 52] and work on making medical device interfaces more user-friendly [53] stress the importance of a CCD for both efficiency and human error reduction. A framework for determining component and overall accuracy for CAS systems [54] proposes a way of evaluating the benefits of new UI paradigms in CAS system design that can also be applied to CADx system design, since one is for surgery and the other is for diagnosis. By addressing several variables, this thesis contributes a specific guideline for the development of a UI for CADx of breast cancer diagnosis.

17 2.3 User-Centered Design

Multimodal systems that process users' input have become a vital and expanding field, especially within the past years, with advances in a growing number of research and application areas. The increasing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient and powerfully expressive means of HCI than in the past. Multimodal interfaces are expected to support a broader range of diverse applications, be usable by a broader spectrum of the average population, and function more reliably under realistic and challenging usage conditions. Despite recommendations that users be involved in the design and testing of image diagnosis technologies, few works describe how to include radiologists in systematic and meaningful ways to ensure that UIs are customised to meet their needs. UCD is an approach that involves end-users throughout the design process, so that the technology supports their tasks, is easy to operate, and adds value to users. Involving radiologists in the design and testing ensures functionality and usability, therefore increasing the likelihood of promoting the intended outcomes. Usability is defined by Bond et al. [55] as a measure of the ease with which a medical system can be used and learned, including its efficiency, effectiveness and safety, and is employed to evaluate medical applications in a clinical expert setting. UCD is an approach for developing prototypes [56] that incorporates user-centered activities through the entire development process (Figure 2.7). This UCD approach [57] allows end-users, like radiologists, to influence how a prototype takes shape, to increase its ultimate usability.

Figure 2.7: User-Centered Design Process, from Kaygin [58].

2.3.1 Prototyping

In this thesis, the following methods are taken into account [56]: (1) task and requirements analysis; (2) assessing the intended radiologists through observation; (3) testing the features and UIs; (4) analysing and resolving usability issues; (5) evaluating the possible design alternatives; and (6) developing and testing prototypes with radiologists in an iterative manner. As we do in the MIMBCD-UI project and in this master thesis, incorporating UCD principles keeps the focus on understanding the users' (radiologists') needs. These principles guide the overall development of a medical diagnosis UI for multimodality imaging. Once considered costly and time-consuming, the UCD approach now has well-established evidence of its benefits. Involving users in the prototyping and development phases [56] of a new system improves the system's quality, since this approach gives us a more accurate assessment of the radiologists' requirements and a higher level of radiologist acceptance. However, as demonstrated by Calisto et al. [59], there is a resistance to change. UCD has also been found to reduce development time [?], because usability problems are identified and resolved in the prototyping phase rather than after launch. Applying UCD to system development improves usability and functionality, and therefore increases the likelihood of promoting the intended medical imaging diagnosis behaviours and outcomes. Although UCD is frequently acclaimed as a means of ensuring user acceptability, its employment with radiologist users has not been widely disseminated in medical science and the clinical field as disciplines, despite calls for its employment. This inconsistency is problematic, since medical and clinical professionals typically identify the medical issues or behaviours of concern and propose possible interventions for the diagnostic.

2.4 Environments

In many contemporary systems, there is a big opportunity [48] to improve the UI. Cluttered displays, complex and tedious procedures, inconsistent sequences of actions, inadequate functionality and insufficient informative feedback can generate debilitating anxiety and stress. In this section, we describe the most relevant interaction devices [59]: traditional (mouse and keyboard), touch and virtual reality. A comparison of user performance with three related devices on a desktop display and two orientations, using a small vertical screen, is presented in [60]. On the other hand, [61] highlighted that, for steering tasks, users are about twice as fast with some devices than with others. Both works were the base support for our practice of comparing, through participants, the attentional demands that different input devices put onto radiologists. These studies typically report that users are slower or faster at learning and using digital technologies depending on the interaction device, so techniques must be adapted to better suit radiologists' capabilities. Other authors [62, 63] investigated the use of additional fingers to modify the mapping between touch and control point on displays. In this manner, a radiologist can switch to a more accurate mapping of lesions when precise control is needed and can default back to direct single-finger input when working with more significant targets. Caprani [64] compared a collection of input devices designed to improve the accuracy and precision needed to select a target, as well as the design of the screen elements of bare-hand interaction with a touch device. They found that performance and preferred interaction styles depend on the target size, leading to different performance rankings. They also conclude that the system should provide the radiologists with a variety of selection tools, so that the radiologist can choose the most appropriate tool for the task.

An early work [65] showed that touch input has a significant positive impact regarding speed when performing target selection, but a decrease when performing shape matching [66]. On the other hand, some other works [67, 68, 69] found an increase in the performance of tasks that require recall and spatial memory when using touch devices. The quality of experience in touch tasks showed them to be more efficient than traditional ones when doing target acquisition tasks such as shape docking and moving targets. These developments suggest that radiologist performance improves with larger target and touch screen [66] sizes, increased spacing between targets, as well as alternative methods to interact with smaller targets. The seminal work of Withana [70] on the quantitative comparison of pointing device systems shows that indirect traditional input systems compare favourably to direct stylus input. While there are essential differences between stylus and touch input systems, one would expect that these results might generalise to single-finger pointing. By focusing on different devices, this work supports our choice of apparatus. Also proposed in [70] is haptography, a methodology for capturing, reproducing and ultimately modelling device-related haptic properties for the clinical domain, which is helpful for delivering force feedback to the radiologist in the radiology room. Bhalla [71] compared traditional and touch screen inputs in a single task, investigating the differences between the above input devices for assignments on a variety of displays for medical monitoring devices. It becomes apparent that system developers need to consider the kind of input their system requires when choosing between touch and traditional input devices [66]. While a touch input modality may not lead to higher performance regarding speed and accuracy for single tasks, other considerations, such as fatigue, spatial memory, and awareness of the radiologist's actions, might convince a system developer to choose single-interaction input over traditional input devices. Among the earliest work in the HCI field is the study of Schultheis et al. [72], which articulated the benefits of input on radiologist interface tasks. This body of research has typically investigated interaction with traditional devices or virtual reality [73] devices, but not in tabletop system settings. Ullrich et al. [74] investigated the value of proprioception in asymmetric bi-manual tasks. They found that radiologists benefit from working in a single absolute reference frame when completing bi-manual tasks in the absence of visual feedback. Geiger et al. [75] studied symmetric inputs performed with traditional device systems. They found that, for many tasks, symmetric input outperforms not only single interaction but also asymmetric input regarding performance and preference, and they advocate the addition of a second interaction on traditional device systems. Yu et al. [76] describe an experiment in which participants performed better when using a traditional device than when using touch device systems directly on a table while completing an asymmetric task. Because the task used in this study required pixel-accurate positioning, the authors suggest that the superior performance of the traditional system may be due to the relatively large size and low accuracy of one's fingertips. Both works allow us to address the fatigue issues that arise from a wall of interaction.
Other issues of interest include the interaction angle for other kinds of devices. Irwin et al. [77] studied this behaviour in their work investigating how the screen angle affects user performance and fatigue. These studies support the fact that the standard monitor position used by traditional devices is not optimal, at least when using a touchscreen system. Understanding this principle improved our knowledge of interaction angle issues. That way, the development of our system is conditioned on an angle setup and prepared for horizontal and vertical settings, as well as mobile device (responsive) interaction.

Another relevant aspect that deserves interest is the presence of bias. This bias manifests as consistent differences between the location users intend to interact with and where they actually interact [78]. Concerning the preferences shown by radiologists at various positions relative to the device, the results found by Lundstrom et al. [79] indicate that device biases depend on the viewing angle. In our work, we must overcome the challenges inherent in evaluating multiple variables and noisy experimental results, and the challenge of measuring metrics, while advancing and quantifying our knowledge regarding device interactions. This knowledge is an improvement, but still scarce for achieving quantitative results or, even more critically, statistical significance about those device interactions in the context of the clinical domain. The literature from these authors is of immense importance in supporting our research understanding and system decisions, by giving us meaning over the area.

2.5 Systems

Breast cancer diagnosis systems have been studied for many years from different angles: the cause of the lesion; the detection of the lesion; the diagnosis of the lesion; the diagnosis systems; and methods of treatment before and after surgery. Two paradigms divide these studies: the first defines breast cancer lesions as local and regional lesions; the second treats them under precise lesion control, where systems help diagnose and follow up patients, as a preventive act and for early detection of lesions. The breast cancer diagnosis systems work [80] is an overview of the studies that have been done to assist medical systems and user interfaces in producing more accurate and faster diagnoses of breast cancer patients. Among systems that focus on data extraction, Ganesan et al. [81] describe a computer-aided system that estimates the malignancy probability of a mammography lesion, assisting radiologists in deciding on patient information management while improving diagnostic accuracy.

Figure 2.8: Iconographic Display of Mammographic Features

The R2 Image Checker System (ICS) [82] identifies potential Regions of Interest (ROI) by detecting clusters of microcalcifications or spiculated masses. First, radiologists read the images, then view the results of the ICS analysis on a display monitor. The radiologist may then return to the original image to confirm whether anything was missed at a location flagged on the monitor. Initiatives in nuclear medicine imaging and other techniques in breast cancer diagnosis have been taken by researchers to explore the nuclear medicine field for an improvement in the diagnosis of lesions. This approach is a novel way of using CADx technology, extending the paradigm by providing radiologists with a follow-up feature for the annotated lesions. This system underlines the importance of an ICS, or what we more commonly call the follow-up feature, as an important feature of our system. A work from Chen et al. [83] claims that a reliable and effective mass screening protocol for at-risk patients aged 40 to 49 years does not exist. Early pre-menopausal women patients in this age range have denser breast tissue, so lesions, masses or calcifications are not easily detectable by the mammogram. Contrast-enhanced magnetic resonance imaging has recently been indicated as a promising complementary technique to mammography and as a potential tool for screening younger women patients, due to its three-dimensional characteristics. An MRI-based imaging tool does not require the use of ionising radiation [84], and it produces higher diagnostic sensitivity, especially in the case of dense breast tissue. The MRI single-modality imaging tool of [85] is a web-based CADx system for manual and automatic extraction and analysis of breast masses. The radiologists use a normal browser to view scans from a remote station, in order to exchange know-how among clinicians and improve data management. The proposed systems are important to our research, as they yield promising results and facts. The web-based tool is likely to be improved by using accessible architectures for a better radiologist's performance [86] on the diagnostic relations between curve morphologies, enhancement values and lesion detection, emphasising the same need for accessibility in our system. Recognising the crude drawbacks of a mammography-based breast diagnosis system, several studies [87, 88] improve data extraction techniques by merging computer-aided systems for lesion extraction with progressive images of microcalcifications. Different approaches were used in the single modality to extract data or, more precisely, to retrieve female patient breast images at higher resolution, with the goal of producing a system that could detect the smallest tumour [89] in breast cancer patients with fewer of the false-positive cases that lead to unnecessary biopsies. Concentrating on discovering tumours in breasts with denser tissue (younger pre-menopausal women) also serves the long-term aim of reducing the mortality rate of this disease by introducing an early breast cancer detection system.

2.6 Overview

We present in Table 2.1 the most used medical systems: PHiP [41]; ActiveNotes [42]; M/ORIS [43]; TTBMIDA [59]; VRRRRoom [73]; FI3D [76]; MVT [79]; CADStream [89]; 3TP [89]; Mammatool [89]; R2 [82]; and the proposed MIMBCD-UI. Nine features are used to define each system, where a check mark indicates the presence of a feature and a cross indicates its absence. The columns Contrast, Zoom and Anno. (Annotations) indicate that the UI has the respective image-processing feature. Follow-up means that the system provides physicians with a follow-up feature for the patients. Patient Vis. (Patient Visualization) means that the system has a feature to manage and select patients (e.g. PACS) for an early visualisation. Med. Img. (Medical Image) refers to all systems that work with medical images. UCD means the implementation of that methodology during the development of the system. Meta-Info. Gen. (Meta-Information Generation) means the feature of generating information from user interaction to be consumed by the ML algorithms. Finally, Other Env. (Other Environment) means that a system is ready for non-traditional devices (beyond keyboard and mouse), for instance touch-based devices, AR or Virtual Reality (VR).

Works (rows): PHiP; ActiveNotes; M/ORIS; TTBMIDA; VRRRRoom; FI3D; MVT; CADStream; 3TP; Mammatool; R2; MIMBCD-UI. Features (columns): Contrast; Zoom; Anno.; Follow-up; Patient Vis.; Med. Img.; UCD; Meta-Info. Gen.; Other Env.

Table 2.1: Table of Systems & Topics for the Related Work

Most of the related CADx systems have the Anno. (Annotations) feature but lack Meta-Info. Gen. (Meta-Information Generation). M/ORIS [43], MVT [79], CADStream [89], 3TP [89], Mammatool [89] and R2 [82] have the three UI features: (1) Contrast; (2) Zoom; and (3) Anno. (Annotations). However, only M/ORIS is prepared for patient Follow-up and Patient Vis. (Patient Visualization), as the system is an intelligent collection of modules and sensors designed specifically to address the need for suitable user interaction. This work was the most complete in relation to our needs, and the research done there remains a good guideline for our work. On the other hand, M/ORIS is a system developed for the operating room and the surgery purpose (CAS), instead of the radiology room and the diagnostic purpose (CADx).

Both PHiP [41] and ActiveNotes [42] are tools designed for patient Follow-up and Patient Vis. (Patient Visualization) on Other Env. (Other Environment) devices, permitting users to visualise and query patient data stored in the hospital database. Different UIs and interaction techniques were subjected to both formal and informal user testing, comparing their impact on users. Also, the two systems were developed according to a user-centered approach (UCD), thanks to the collaboration of their physicians at the hospital. Besides the user studies conducted in a hospital at the requirements phase, both works performed evaluations of different prototypes with physicians, up to the final prototypes shown in their works. In the development and design of our system, we aim to be consistent with other initiatives. For instance, the efforts of the Med. Img. (Medical Image) systems bring together healthcare information system needs and medical imaging system needs; as a matter of fact, our system also processes and manages medical images. M/ORIS [43], TTBMIDA [59], VRRRRoom [73], MVT [79], CADStream [89], 3TP [89], Mammatool [89] and R2 [82] present basic concepts of image processing and management in a UI. However, only M/ORIS [43], TTBMIDA [59], VRRRRoom [73] and MVT [79] relate the Med. Img. (Medical Image) and UCD concepts. There have been almost no attempts at developing and designing a system for multimodality of medical imaging that produces large annotated datasets. As referred to previously in this thesis, this is not an easily approachable concept, which makes it difficult to implement such a system in real-world practice. We have seen several systems, proposed by researchers, presenting and describing the most used implementations in the clinical domain, and have studied how they can be used in practical systems [40, 90]. Throughout the years, several systems have related UCD and healthcare research, implementing medical systems to support clinicians and patients. For instance, the M/ORIS [43], TTBMIDA [59], VRRRRoom [73], FI3D [76] and MVT [79] systems were developed using the UCD approach, as we already described. As we have seen in many contemporary systems, there is a big opportunity to improve the UI following a user-centred approach (UCD). Cluttered displays (Follow-up and Patient Vis.), complex and tedious procedures, inconsistent sequences of actions (Contrast, Zoom and Anno.), inadequate functionality and insufficient informative feedback can generate debilitating anxiety and stress. While the use of UIs for multimodality medical image (Med. Img.) diagnosis is not new in medical and clinical settings, breast cancer specifically has not been studied. Although our approach does not address diagnosis itself, it is of crucial importance to observe the radiologists' systems in their typical environment (the radiology room) and how they work. Moreover, we specifically work with health professionals. The above features are used to define each system and were purposely introduced to combine novel (Meta-Info. Gen. and Other Env.), yet simple, immersive healthcare visualisations and interactions.

Chapter 3

Methodology

The empirical methodology3 used in this part of the master thesis has proven successful in detecting the needs of radiologists, indicating the appropriate interaction system to test, defining performance measures and, finally, determining usability improvements. In particular, the user-based usability evaluation has shown to be efficient in testing an entirely new technique and eliciting useful feedback from the domain users, the radiologists. The methodology approached here is thus advantageous and recommended for future innovative interaction systems in the radiology room or, more specifically, in medical diagnosis for multimodality imaging. Such systems have to be tested for usability, yet no model-based usability evaluation exists for them, and inspection-based usability evaluation does not suffice. Furthermore, this section is a welcome addition to the slim body of HCI and UCD research on medical UIs in general. It provides a practical methodology for applying new practical medical interaction innovations, which will certainly become more and more important in the years to come. For medical imaging diagnosis, our method included observation, questionnaires, surveys, focus groups and interviews. All observation, focus group and interview sessions were audiotaped and transcribed for later analysis. Interviews and focus groups served as data resources: a set of initial interviews was conducted with radiologists to learn about current practices in medical imaging diagnosis, the related clinical domain processes, and the issues that are important to radiologists. Radiologists were directly interviewed to learn their understanding of and experience with the domain. They were also asked to relate their feelings about the prototypes. We led radiologist focus groups, where they gave feedback on the first Low-Fi Prototypes and discussed possible issues. Finally, radiologists indicated what they would like to have in a prototype (phase Design and Develop Solutions via Rapid Prototyping of Figure 3.1) and their willingness to use it. A computer-use survey was used to document medical practices, asking about the use of computer devices, features and UX. Throughout the needs assessment, we conducted a series of interviews with radiologists from various hospitals and private clinics, namely HFF, SAMS Hospital, IMI Campo Pequeno, HSJ, Clínica Europa, Cedima and IPO Lisboa. These radiologists helped us identify prototyping needs that were not immediately apparent to us and lent support to some of the conclusive perspectives of the final UI, also known as the High-Fi Prototypes [91]. For the observation part of the needs assessment, we observed the operational standards (important concepts like DICOM, CADx and so on) to better understand how radiologists work in the domain context. We also observed interactions between radiologists and those prototypes in the radiology room. As a result of these observations, we were able to identify specific strategies used to inform and conclude the UI prototyping choices, as discussed further on.

3Defined as a term for creating a methodology to apply UCD in the medical imaging field as a verifiable observation method or experience, instead of HCI theory or pure logic.

Figure 3.1: UCD Model, from Journal of the American Medical Informatics Association [92].

The medical imaging standard DICOM arose from the need to standardise the communication of digital diagnostics. It was originally intended for the transmission and dissemination of medical objects through the TCP/IP protocols and, over time, grew to include further definitions in the standard. In this master thesis and research work we provide a medical imaging prototype for multimodality diagnosis, using the Cornerstone JavaScript Library [93] and the Orthanc Server [38] to support the prototype implementation. Also, a comprehensive evaluation of the system architecture and services is made, describing advanced functionality for visualisation and UI integration. Focusing on Open Source components, our description of the system architecture and services shows that advanced visualisation and suitably implemented UI prototypes can also be found in the Open Source field and not only in commercial products.
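As an illustration of this DICOM-over-HTTP pipeline, the sketch below shows how a single instance served by Orthanc can be rendered with Cornerstone. It is a minimal sketch, not our final implementation: the element id, server URL and instance identifier are illustrative placeholders, and it assumes the cornerstoneWADOImageLoader companion library has been registered with Cornerstone.

```javascript
// Minimal sketch: display one DICOM instance served over HTTP by Orthanc.
const element = document.getElementById('dicomViewport'); // placeholder element id
cornerstone.enable(element); // prepare the element as a Cornerstone canvas

// The 'wadouri:' scheme tells the WADO image loader to fetch the file via HTTP;
// Orthanc exposes raw DICOM files under /instances/<id>/file (default port 8042).
const imageId = 'wadouri:http://localhost:8042/instances/<instance-id>/file';

cornerstone.loadImage(imageId).then(function (image) {
  cornerstone.displayImage(element, image); // draw the decoded pixel data
});
```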

Our prototype implementation (phase Implement Solution of Figure 3.1) aims to transmit DICOM images via Hypertext Transfer Protocol (HTTP) for later display on the UI in a multimodality view. The prototype also has a set of features that lead radiologists through the annotation use case, where the diagnostic is made by annotating the masses and calcifications (lesions) of the breast. During the implementation of this prototype, we encountered some challenges. We identified the lack of interfaces for system integration as a key issue in clinical research. The available systems and tools operate standalone, demanding workflow optimisation and system integration for our clinical imaging-based prototype. For instance, powerful functionality for annotation, zoom and so on is required but is not integrated into a single framework. Implementing an optimal prototype becomes challenging, particularly for our research application, which demands a complex set of tools. The final solution implementation is further described in the Implementation chapter.

3.1 Questionnaires

The focus of this section is the application of demographic and psychometric methods through the analysis and evaluation of questionnaires, understanding the user profile and characteristics, and measuring4 the user needs for satisfaction with system usability and UX. The primary goals of the questionnaire methodology are to: (1) understand and discuss demographic profiles, as well as psychometric characteristics of the users, measuring the user needs for satisfaction and a fitting UI language that translates into a UX measure; and (2) provide statistical data from the users that will blend into the research interpretation with a later prototyping analysis. Conventional wisdom regarding question order is that general questions should precede specific questions. However, evidence supporting this assertion from experimental and quasi-experimental studies on aspects of questionnaire appearance is scanty, regardless of how some research identifies outlines and theoretical bases for aspects of survey design. It suggests that the presentation of a questionnaire can influence the obtained responses and respondents' decisions at several stages. Our inquiries took into account aspects suggested by Irwin et al. [94], who argue that semantics and syntax, including the choice and order of response categories, can have a meaningful impact on the nature and quality of responses. A need for consistency of visual information and for an understanding of a graphical, non-verbal language in the application was found in our research work. The spatial arrangement of data and other visual phenomena, such as colour and form, are essential choices in questionnaire design. In our case, we followed the rules related to spatial arrangement [94] as follows: (a) for the questions themselves, we chose linear interrupted; (b) for the answer options, we chose lists; and (c) for answer tables, a matrix. Throughout the development of the online questionnaires, doubts emerged that prompted questions about points that would later be cleared up with the Low-Fi Prototypes. Radiologists were asked to help us validate the online questionnaires and some of the questions, as a first check of reasonability and of the quality of lesion reporting. This first online questionnaire was used to characterise the user of a mammography multimodality imaging UI, carried out in the framework of an innovative UI for monitoring and diagnosis of breast lesions in various medical imaging modalities. The questionnaire took about 10 minutes, divided into nine short phases:

1. A simple description and introduction of what the project is and what objectives will be faced in the rest of the project.

2. Understanding the radiologist's profile, where we ask questions like sex, age, etc. It is also in this section that we try to understand a little of the professional pattern that most mammography professionals have; for this, we ask how long they have carried out medical functions, in which sectors of the infrastructure the radiologist works and, finally, their speciality.

3. Characterisation of the clinical activity through questions about technological preferences, the clinical practice and its support for visualisation, and the tools, technologies and software of their medical units.

4. Understanding what training the radiologists have to work in the field, asking if they have any training in digital mammography and their views on the need to update knowledge in digital

4Questionnaires support the phases Assess and Analyze Needs; Rank & Select Needs; Identify Barriers & Incentives; Identify Solutions; and Articulate Goals of Figure 3.1.

mammography and mammography certification.

5. Questions of a technical nature, asking about mammography image acquisition doses as well as the understanding of exposure/dose indicators to monitor the quality of the examination. It is also in this section that the physician's opinion is asked on the impact of digital mammography on mammography reading/interpretation time.

6. Questions on the physicians' ideas about medical image-related technologies and applied tools for MG.

7. Questions on the physicians' opinions about medical image-related technologies and applied tools for US.

8. Questions on the physicians' ideas about medical image-related technologies and applied tools for MRI.

In these sections, we ask physicians how frequently they use the following post-processing tools:

• Contrast;
• Contrast Inversion;
• Zoom;
• Pan;
• Fill;
• Crop;
• ROI;
• Annotation;
• Filters;
• Histogram;
• 3D Reconstruction;
• CADx.

This list will give us information about the tools most useful to our UI and their priorities, since the order in which tools appear is vital to radiologists when time is a constraint for them. Finally, the last section, the ninth, addresses the issue of the digital versus the analogue model, with questions on the improvements that digital brought over analogue and on what it covers.

3.2 Prototypes

Information from surveys and quantitative measures is analysed with standard descriptive statistics. Those outcomes add meaning to the qualitative findings of a needs assessment [92], providing indications of feature positioning and interaction aspects of the situation that would not otherwise be apparent. A needs assessment [92] can be used to explore what is currently occurring and how radiologists feel about it, identifying potential solutions. As a first step, we worried about how to ensure that the prototypes, resource-rich, visually appealing and easy to use, are useful to the intended radiologists. For our system, the Low-Fi Prototypes [91] were designed and developed with Balsamiq Mockups [95], a tool the researchers used to produce a set of evaluated prototypes. Those prototypes supplied the conclusions about improvements and issues of the UI. They also gave us a perspective on how users interact with their daily tools.

Through the qualitative analysis process, we identified four significant gains during prototyping:

• Assumptions and goals;

• Issues and improvements;

• New features;

• User liability;

The needs identified in both types of analysis are further examined by us, selecting and prioritising those to address. Organisational goals are considerations, but so are the consequences of needs not being met and the available time, budget and expertise of the radiologists. Moreover, it is essential to identify the barriers the radiologists may face and what incentives they may associate with the study. These will be important factors during the design and development of the prototypes. Once we fully understood the related tasks and goals, we started the design and development of the prototypes. Many brainstorming sessions within our team shaped the experiences we want to create for the radiologists as they go about achieving the goals we have set, always keeping in mind the barriers they face and any motivational incentives on which we may be able to capitalise. We applied what we know from theory and what we learned from this process. For each prototype, we created a storyboard that suggests the features, UI structure and controls to be provided. The aesthetics are purposely ignored until the finalisation of the Low-Fi Prototypes, holding to the "form follows function" principle, where the look and feel of the prototypes is developed after the functionality, to support it. When creating the storyboards, we referenced guidelines for practical prototype and UI development. The guidelines we used address prototype design, the design of the prototype structure and user interactions; the guidelines for the design of the look and feel of the prototype were left for later consideration. We developed our set as the result of many hours of user observation, interviews and research. It includes the design and development of the prototype, as well as an evaluation of the state-of-the-art research, an evaluation of successful health and clinical prototypes, and reviews of the literature on instructional design, UI development, usability testing and UX assessment. Over several weeks, we transformed our discussion-based and theoretical ideas into a conceptual model (see Report 10 in Appendix B) of UI elements and workflows. We started with Low-Fidelity paper mock-ups, since they let us easily and cheaply create experimental UIs and modify the structure of the elements. Using common material supplies like paper, markers, index cards, scissors and post-its, we sketched the first screens and element structures, including menus, options and tools. The paper mock-up, with its handwritten text, crooked lines and last-minute corrections, was not very neat, but it was enough for the radiologist test participants to see what the basic structure of information and preferences looks like. Figure 3.2 shows the user the first screen, the login and patient list. This screen is important to us since it is the opening window of the UI, it creates the first impact, and it is how the physician will choose the patient. Even though this is not our responsibility, it is important to prototype this first screen to understand the usability and interaction between the screens.

Figure 3.2: Patients List.

On this screen (Figure 3.3) we validate the organisation of the image multi-modality across the display. We also establish here where options should be and how the user interacts with these options, such as the left-click selection to get the entire view of a breast.

Figure 3.3: SELECT Breast.

A mouse RIGHT SELECT option (Figure 3.4) seems to be the most common way to activate a toolbox and, as far as we understand, it might be the best choice. It can increase the number of clicks but, on the other hand, will decrease time; for this reason it is the better throughput option, since physicians give more value to time than to the number of clicks.

Figure 3.4: RIGHT SELECT Options.

The screen shown in Figure 3.5 displays the other way to organise the toolbox: each small breast view has its own bottom toolbox. This toolbox reduces the number of clicks. However, we suspect it will increase the time to click.

Figure 3.5: Bottom Options.

Another issue is the spatial arrangement of the images, as well as of the tools that allow the modality selection (Figure 3.6). The Horizontal Modality Set (Figure 3.6) is one of the options we showed to radiologists to evaluate how comfortable they feel with it. It seems to us the most intuitive way to organise the modality set of elements, but it is where we lose space, since we only have proper width for two breast views; it might be better to have just two views at a viable size than more breasts on the screen at a lower amplitude. Even so, to us it seems the better choice between the Horizontal and the Vertical arrangement of the modality set of options.

Figure 3.6: Horizontal Modality Set.

On the other hand, the Vertical Modality Set (Figure 3.7) is the second option we showed to radiologists to evaluate how comfortable they feel with it. We stress again that this might not be the best choice, since we lose inherent power; however, we increase the number of breast views on the screen and the number of sets on the sides of the screen, since here we can play width against height. For the MIMBCD-UI project we bring together two separate systems in the same prototype: one for a list of patients, simulating a PACS system (Figures 3.2, 3.8 and 3.12), and the other a multimodality medical imaging system for the diagnosis of breast cancer, also called a CADx system (Figures 3.11 and 3.13). The PACS system provides access to images from multiple modalities, where reports are transmitted digitally via PACS. The CADx system architecture is extended appropriately and integrated into the PACS, taking into account the geometric relation between different images, which can also be extracted from multi-modal data. The first stage of the proposed extended CADx Low-Fi Prototype (Figure 3.11) will allow extracting and combining characteristics of the lesion. The attributes from the corresponding Regions of Interest (ROIs) of the lesion will form a data-set, depicting the lesion in different projections in multi-dimensional diagnoses, such as CC and MLO (both MG), as well as in the multimodality of medical imaging, US and MRI. The idea behind this strategy is that the radiologist will always use multimodality projections of a breast for the diagnosis of a mass or a calcification, to partially resolve superpositions of the tissues. This stage requires linking ROIs in different views that correspond to the same lesion.

Figure 3.7: Vertical Modality Set.

To include the radiologist in the design and development of a new medical and clinical information system, we adopted a participatory approach for user requirement gathering and prototype development. Our experience describes the development of the prototypes and reports the demonstration of some of the second Low-Fi Prototypes developed in Balsamiq [95]. The quality of data is a product of the tools and techniques adopted. Requirement specification templates, wireframing tools (e.g. Balsamiq Mockups) and shared iterative knowledge with stakeholders play a crucial role in our implementation of the participatory approach. Based on user requirements and paper prototypes, the UI development can be triggered. Balsamiq Mockups [95] is a rapid wireframing tool that can build a rapid prototype in software engineering, as said before. It can be used to draw an interface sketch for user interaction. Once radiologists find the solutions functional and practical, we treat these as the High-Fi Prototypes. UI prototyping is the most critical part of this research work. It should follow the UI rules and meet the users' needs at the same time. The final prototypes of this Low-Fi phase should meet the following requirements:

• The UI should provide support to the user in understanding the multi-modality imaging interaction, so radiologists can diagnose breast cancer better and faster.

• The UI should help the user find where the breast masses and calcifications are.

• The UI should help young, inexperienced radiologists to quickly become familiar with the user interface.

• The UI must allow the radiologists to visualise the three modalities (MG, US and MRI) at different screen levels and sizes while measuring and diagnosing breast cancer.

Something immediately pointed out by radiologists was the need to compare the last two MG modality image acquisitions (CC and MLO) on the main screen, but it is clear that this cannot be the first screen to appear. Whatever the preferences, we must not forget to have, as the first screen, a clear option screen to choose the patient's (PACS) name and exam dates (Figure 3.8).

Figure 3.8: Low-Fi Prototype: PACS System Simulation.

A proposed second screen will have four views of the imaging multi-modality, as we can see in Figure 3.9, and we will consider this the main screen, since the patient was already chosen.

Figure 3.9: Low-Fi Prototype: Multimodality of Imaging.

From the requirement of having the set of CC and MLO screen views, we implemented a prototype with most of the screen devoted to this option, as we can see in Figure 3.10. On the other hand, this is not enough information to show on this screen, and it is fundamental to have a set of filters with the last two dated image acquisitions (CC and MLO) to compare with each other.

Figure 3.10: Low-Fi Prototype: Mammography Example (Group).

We used a participatory approach to gather UI requirements and to model prototypes (Figure 3.11) for analysis and development activities. For instance, many of the user requirements are themselves research questions for us (see Report 4 of Appendix B) related to the data collected during this phase. Likewise, the perception of the right spatial arrangement, UI structure and interaction techniques was achieved thanks to the participatory approach. It is here that we understood how radiologists do breast cancer classification and what they do with different cases of the diagnosis. We faced special design challenges as we implemented the prototype guidelines (CCD) to meet clinician requirements and needs. As the prototype emerged, it was refined through rapid prototyping, by an iterative method (UCD) of staged development and early analysis and evaluation. Repeated cycles of prototype development, evaluation and revision took place. We relied predominantly on two evaluation techniques during prototyping to support the user requirements and needs: (i) a participatory approach; and (ii) user testing. We often used think-aloud5 protocols with both of these techniques, in which a radiologist is asked to think aloud while interacting to accomplish the task goals. This phase also aims to use a participatory approach in subsequent phases of the UI life cycle, like the High-Fi Prototype development described in the next section.

5In our thinking-aloud tests we asked radiologists to think out loud continuously while using both the Low-Fi and High-Fi Prototypes.

Figure 3.11: Low-Fi Prototype: CADx System Simulation.

To summarise this section, a UI mainly aimed at the diagnosis of breast cancer with multimodality medical imaging was prototyped. MIMBCD-UI already has similar Computer-Aided Detection (CAD) UIs, which primarily give the researchers a baseline for comparison. The work from this section has the potential to benefit radiologists. The prototyping design and development is based on the understanding of the current CADx systems and the MIMBCD-UI research study case. In the in-depth observation and horizontal comparison of the MIMBCD-UI data, we found that differences among different radiologist groups do exist. The CADx view is an efficient way of dealing with this kind of database. The interface will be developed for the current CADx, PACS and DICOM server system examples.

3.3 Interviews & Observation

In this section, we describe our interview and observation methodology, discussing its weaknesses and strengths. Transcripts were selected as the method because the number of test users was quite small. These transcripts make it possible to observe radiologists during the interview, giving an opportunity to make the evaluation flexible. Observation was selected for gathering radiologists' non-verbal expressions and for understanding the whole radiology room process sequence, because a radiologist may not be aware of related and past experiences that could be important to us, or may not be capable of expressing them verbally. Our evaluation techniques were interviews before and after the prototyping phase and observation during the use of the prototypes. The term UX encompasses the concepts of usability and affective engineering, but its definition is not clear. User interviews and observation were conducted to design and develop a final UI prototype. Additionally, they validate the UI and improve issues that might not be clear from the previous phases or even from the iterative process of developing a prototype. Interviews and observations are suitable methods to gather and evaluate final issues of the UI, as well as its Usability and UX. These methods

are essential, because radiologists may not be aware of typical experiences or be capable of expressing them verbally. In our clinical domain, Usability and UX broadly describe all aspects of the interactions between radiologist and UI. UX, in general, has been captured with techniques like interviews and observation, even though this HCI research area (UCD) is understated. However, for UX and its evaluation, the relation with the medical and clinical (CCD) field has not been established yet. One reason for this may be the weakness of the definition of UX and of its relationship to usability. Radiologists were asked to think aloud6 during the user tests. Interviews and observations were recorded; more specifically, the interactions between radiologists and the computer were saved as screen video recordings, and we used applications to record interaction data (number of errors, number of interactions, positions and screen areas). The interview questions were developed on the basis of a literature review [96]. Questions concerning the radiologist's domain experience, adaptability, the context of use, and medical and clinical factors were important and were obtained by adding observation to the interviews. Selecting observation helped us understand the radiology room while getting some information about the radiologist's use cases. The prototype was tested by radiologists. The aim was to collect performance metrics and the experiences of different radiologists. We tested the prototype with eight test users, aiming to collect different performance, usability and UX measurements. All radiologists were familiar with PACS (Figure 3.12) and CADx (Figure 3.13) systems. We did not use more radiologist tests because this first evaluation session aimed to collect preliminary information on the suitability of interviews and observations for performance, usability and UX research.

Figure 3.12: High-Fi Prototype: PACS System Simulation.

It was essential to understand which colours to choose for the UI, what the feature order should be, which interactions we must have on the UI, like a drag-and-drop functionality to manage the multimodality of images, and so on. In planning the research, those factors define and decide how to capture the information we need and which information that is; for instance, whether experience relating

6The thinking-aloud technique was important here to understand the exact moment when the radiologist understands the use case and proceeds with the action.

to UI issues is required to gather it, or whether it is adaptive to the context of use. Furthermore, the test atmosphere and situation have to be as natural as possible for the radiologists, because they shape the kind of performance, usability and UX observed. So we used the radiology room for that, whether we were in a hospital or in a clinic. In interviews, it is crucial that the questions related to UX measures are straightforward, so that the radiologist can understand them easily. Also, the order of the questions may affect how the radiologists interpret the problems, and this will influence the answers. Moreover, the radiologist should not be prompted with questions about UX before the UI tests and before it is a topical issue. This approach is a challenge for researchers, because we have to find a balance between when to ask questions and when to expect the radiologist to talk about the issues freely. One of the findings was that if the radiologist was focused on interacting with the UI during interviews, the radiologist may not be so concentrated on the interview questions but had better performance. On the other hand, when the radiologist was more focused on the interview, we had a better discussion of the UI issues and improvements, with an expression of opinions. We developed the test scenarios so that they were appropriate for the test environment, the radiology room. The setting was a radiology room with a table, a chair, a laptop, a keyboard and a mouse. The improvement ideas from the first radiologist evaluation were taken into account in the next assessment. In the first radiologist evaluation, we established the right number of DICOM images needed for testing to have the proper trade-off between time7 and collected information. Hence, we recorded the interviews and observations with screen, interaction and sound recordings. Before the actual tests, a pilot was performed in the lab, and it confirmed that the test scenario and cases were appropriate. We updated the interview questions from the Questionnaires and Low-Fi Prototyping phases, focusing on the radiologists' prior experiences and expectations, the prototypes' relatedness and reliability, the context of use and medical factors. The evaluation of performance needs more efficient ways of capturing it. For instance, tests were screen-recorded in a way that gave us further information about the radiologist's number of errors, number of interactions, number of annotations per lesion, time per use case and hit rate. Information about what was happening on the screen was captured via the radiologist's thinking aloud, the researcher standing by the test user, and a later viewing of the recording in the lab.
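To give an idea of how such interaction data can be captured alongside the screen recording, the following minimal sketch logs timestamped events from which metrics like task time or click counts can later be derived. The function and variable names (logEvent, sessionLog, elapsedSeconds) are our own illustrative choices, not part of any existing library.

```javascript
// Minimal sketch of an interaction logger used during a test session.
const sessionLog = [];

function logEvent(type, detail) {
  // Store a timestamped record; the log is exported when the session ends.
  sessionLog.push({ type: type, detail: detail, time: Date.now() });
}

// Example hook: record every click with its screen position and button.
document.addEventListener('mousedown', function (e) {
  logEvent('click', { x: e.clientX, y: e.clientY, button: e.button });
});

// Derived metric: elapsed seconds between the first and last logged event.
function elapsedSeconds() {
  if (sessionLog.length < 2) return 0;
  return (sessionLog[sessionLog.length - 1].time - sessionLog[0].time) / 1000;
}
```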

The use of screen recording elicited some new questions for performance research:

1. Does the order of the features influence time performance and task completion?

2. Do interaction techniques, like drag-and-drop, improve the multimodality imaging view for radiologists?

3. How can we reduce the number of errors?

4. How can we produce more annotations in the same period of time?

7We estimate a maximum of approximately 10 minutes for the use case tests done by radiologists in the radiology room. An early work [59] supports this estimate.

Concerning the first question, we wrongly assumed that the main-purpose feature, the annotations, should be the first one, since it was our system's purpose. However, radiologists first try to adapt (Figure 4.1) the DICOM image by configuring the luminosity (WW/WC), or even inverting colours; next they Zoom the DICOM image and move (Pan) it into position. Only at the end do they make the annotation. This is a reflection we made after understanding that radiologists seemed to take more pleasure in interacting with the system. For that reason, a drag-and-drop mechanism to manage the various modalities of medical imaging on the screen was also important to keep in mind, answering the second question. The third question arose when we understood that radiologists typically make some mistakes and like to re-annotate less accurate annotations. Finally, for the fourth question, we focused on trying to improve the number of interactions, for instance the number of annotations in the same time span, since this brings more information to the Meta-Info. Gen. (Meta-Information Generation) and more learning input points. The following paragraphs describe each question and answer in more detail.

Figure 3.13: Hi-Fi Prototype: CADx System Simulation.

For the first question, a broad evaluation showed that the order of the features influences time performance and task completion. A task should be formulated in a particular way, so that the radiologist can understand rapidly what the task is and how to conclude it. Therefore, the user's language is a fundamental step to reduce the entropy of the process. Sessions with radiologist tests can be run like open sessions, giving us the possibility to create several options to be tested. One of them is the order of the features and buttons on the menu toolbar. That way, we better understood the process hierarchy and the optimal order of the elements. However, there were exciting challenges to be clarified. First of all, testing tasks should be formulated and explained very carefully, so that the radiologists can understand them easily and the completion of the task is not influenced negatively. Second, radiologists can express opinions about the system and its characteristics, but verbally describing feelings is more difficult. Also, what radiologists think could be useful for them may not, in fact, be right. For the second question, thinking about and testing new interaction techniques also gave us information about radiologists' behaviour. However, we need to interpret the radiologists' inputs and outputs, because the profile and characteristics of each radiologist will affect the performance of each

interaction technique tried. Moreover, users interact very differently; for example, one uses the full feature set (WW/WC, Invert, Zoom and Pan) to manipulate (Figure 4.1) the DICOM image, while others use just some of them. Overall, radiologists were intrigued by the multimodality visualisation and felt that implementing new interaction techniques, like a drag-and-drop technique, provides value to the UI prototype. The drag-and-drop technique for the multimodality management of the medical image sets is especially interesting: it provides an intuitive technique for managing and manipulating large sets of medical images. For the third question, we observed a direct correlation between the number of errors and the lesion morphology. The more irregular in shape the mass is, the more errors a radiologist often makes in the delineation. Also, a higher number of masses and calcifications on the screen produces more errors. To deal with these issues, at first we developed several buttons to Edit, Redo, Undo or Erase those annotation errors. But then we understood that the mouse interaction was enough to cover this task. The solution was easy: when radiologists want to fix an annotation, they just need to press the annotation they want to fix. Removing buttons and features reduces the number of errors while also removing interaction entropy from our UI prototype. Fourth and finally, from our studies it was clear that using bullet points is better for the number of annotations over time, since we observed faster actions and fewer errors. Also, it improves our Meta-Info. Gen. (Meta-Information Generation) by giving the exact coordinates of those bullet points. We confirmed that several methods need to be applied in a later medical image manipulation of the information retrieved from those annotations, to obtain a better data set. In this Interviews & Observation phase, we conclude that the best chance we have to improve annotations over time is to give radiologists the most common technique for annotating medical images. Typically, radiologists are used to interpreting an image by pressing several bullet points around the lesion, contrary to what we first (wrongly) thought, which was a drawing technique over the lesion. In addition to the interviews and observations, we will need more efficient ways to get information for UI performance, usability and experience (UX) conclusions. Concerning the later interpretation of all this data, we discuss it in the evaluation chapter, where we describe, analyse and present performance improvements related to the number of annotations.
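As a hedged illustration of how those bullet-point coordinates can feed the Meta-Information Generation, the sketch below serialises one lesion annotation to JSON. The field names and the exportAnnotation helper are our own illustrative choices, not a fixed format of the system.

```javascript
// Minimal sketch: serialise the bullet points delineating one lesion so the
// resulting record can later be consumed by the ML algorithms.
function exportAnnotation(sopInstanceUid, points) {
  // points: an ordered array of {x, y} pixel coordinates clicked by the radiologist
  return JSON.stringify({
    instance: sopInstanceUid,            // which DICOM image was annotated
    lesion: points,                      // bullet points around the mass/calcification
    createdAt: new Date().toISOString()  // when the annotation was made
  });
}

// Illustrative call with made-up values:
// exportAnnotation('1.2.840.example', [{ x: 124, y: 310 }, { x: 131, y: 305 }]);
```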

Chapter 4

Implementation

Among the major decisions to take in a project like this, we present our choices over the available tools. For our work, we needed to find modules that provide the ability to execute our primary operations and requirements. In the following sections, the architecture of the system is explained in detail, as well as other network-related aspects.

4.1 System Architecture

In this project, a major decision to take into account is the choice of technologies, like programming languages, libraries and frameworks. Although our experience in JavaScript, HTML and CSS was not at an expert level, we knew that it was a very flexible set of technologies for developing a multimodality imaging UI, also allowing a multi-paradigm approach (JavaScript): (i) Object-Oriented; (ii) Interpreted; (iii) Client Side; (iv) both Imperative and Declarative; and (v) Functional. Furthermore, it has complete standard libraries for our Research Project, allowing the development of the prototypes. For our work, we needed to find tools that provide the ability to execute the following central operations:

1. A DICOM image objects server, to serve a database of DICOM images, sending them via HTTP to be later shown by the UI viewer;

2. The UI must read and acknowledge the DICOM image objects server;

3. The UI must offer a multimodality medical imaging visualisation, to manage the various modalities of a medical imaging diagnostic;

4. The UI must allow radiologists to manipulate (Figure 4.1) the DICOM objects:

(a) Change Luminosity (WW/WC);
(b) Invert;
(c) Zoom In and Zoom Out;
(d) Image Position (Pan);
(e) Stack Scroll;
(f) Length Measurement;
(g) Manage Number of Windows;
(h) Annotations (Freehand);

5. The system must generate data-set information saved in a source file (this information will later be consumed by the machine and used by the ML Algorithms);

Figure 4.1: DICOM Image Manipulating Tools.

In this section, we explain the architecture of the system in detail. Throughout this master thesis, several standards are introduced, placing the project in the context of a research work, which includes some objectives and the methodology followed to complete them. Finally, the process of implementation and the final solution are described.

4.1.1 Image Processing

Before analysing the UI development technologies, let us briefly introduce the DICOM image objects server, so that we can describe the aspects of particular relevance for the project. The communication between devices is done using the DICOM network protocol over the Transmission Control Protocol (TCP) and Internet Protocol (IP). To establish a connection, devices or applications negotiate which is the client and which is the server, establishing an association. Once associated, orders can be sent to copy, save, delete and move DICOM objects. DICOM objects are not limited to image data: they contain a data structure with a large amount of meta-information. Each element within a DICOM object is classified according to a dictionary of tags that uniquely identify a data field. Among these numerous tags, some fields point to the patient's data (name, date of birth), to the image data, and to the study and series the object belongs to. To clarify what a Series in DICOM is, we must refer to the hierarchy presented in Figure 4.2. In this standard, one Patient has multiple Studies, one Study has multiple Series, and one Series has multiple Instances (images, i.e., DICOM objects). One of the primary goals of the standard is to identify DICOM objects uniquely. In a DICOM object, the patient's data, the image and other data relate one object to another (such as series and studies), binding it unequivocally. A part of the standard describes how requests and responses should work in a web service that serves DICOM objects; throughout its evolution, the standard has incorporated definitions for web services and RESTful8 [97] web services (Figure 4.3). This part of the standard is of particular interest to the development of the project, since the goal of allowing access to DICOM objects via a UI is to implement a service that complies with the specifications of the standard.

8RESTful: architectural style for software design of a web service, defined for sending, retrieving and querying for medical images and related information.
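For illustration, a request to such a web service for a single DICOM object could look as follows; this is the WADO-URI style introduced in the next paragraphs, and the host, port and UID values are placeholders only:

GET http://pacs.example.org:8042/wado?requestType=WADO
    &studyUID=1.2.3.4.5
    &seriesUID=1.2.3.4.5.6
    &objectUID=1.2.3.4.5.6.7
    &contentType=application%2Fdicom

The required parameters (requestType, studyUID, seriesUID, objectUID) identify the object unequivocally, while contentType asks the server to return the raw DICOM object rather than a rendered image.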

Figure 4.2: DICOM Image Objects Hierarchy.

The DICOM standard includes a Web Access to DICOM Objects (WADO) [37] definition. This definition has evolved considerably and is now effectively the web services branch of the medical imaging standard. The advantage of a WADO service is the alternative way of communicating with a PACS server through HTTP to retrieve objects, without the need to deploy the communications protocol of the DICOM standard. This opens the door to information servers for web clients that understand the format of a DICOM object, but not necessarily the communications protocol, as is the case with a DICOM-based viewer UI prototype.

It is important to underline that the developer tools are mainly supported by Open Source projects. The first type of solution is a full DICOM server with storage, management, exploration and visualisation of DICOM objects. For this reason, we need an application as the middleware/communication interface that covers those needs, in a way where it is possible to define individual imaging data flows between multiple PACS servers and to manage the data workflow easily. Among several options, we found the Orthanc project [38] to be our PACS service server. Orthanc is an Open Source project under a free software license. The project offers a medical imaging server (PACS) with the capacity to manage and explore its contents, as well as to visualise the same image sources in the web browser using its own implementation of a medical image viewer. It also has an implementation of the WADO protocol, so that third-party applications, such as web browsers or other PACS services, can retrieve server objects via HTTP without the need to establish communications through the DICOM communications protocol.

Orthanc (Figure 4.3) is fast and easy to administer, designed to improve medical imaging flows in the radiology room. Multiple instances of Orthanc can be deployed within a hospital to establish electronic gateways independent of the PACS. With the use of Web 2.0 technologies, Orthanc offers a modern and straightforward programming interface that hides the complexity of the file format and the DICOM protocol: scripts can automate tasks such as the transfer or anonymisation of images. These unique features contribute to addressing the issues raised in this section.

The objectives of Orthanc that we have just stated place strong constraints on the computer architecture. In particular, it should be possible to deploy multiple instances of the server on the radiology room network (to create numerous DICOM gateways), or even on the same computer (to pool the material costs). Orthanc is therefore designed to be both lightweight and self-contained. It is undemanding on hardware and does not need any external software to function: Orthanc integrates its own database system (based on SQLite) and stores DICOM files directly in the computer's file system. Thanks to Orthanc, it is effortless to convert any computer into a DICOM server: download an executable file and launch it with a set of terminal commands. If necessary, a configuration text file can be adapted to adjust the DICOM parameters, such as the application entity title or the TCP port number.
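As an illustration of such a configuration file, a minimal Orthanc setup could look as follows; the values are examples only, the exact set of options depends on the Orthanc version, and Orthanc accepts comments in its JSON configuration:

{
  "Name" : "MIMBCD-UI-PACS",        // display name of this Orthanc instance
  "DicomAet" : "ORTHANC",           // application entity title
  "DicomPort" : 4242,               // TCP port of the DICOM protocol
  "HttpPort" : 8042,                // port of the REST/WADO web interface
  "StorageDirectory" : "OrthancStorage",
  "StorageCompression" : true,      // compress incoming DICOM files losslessly
  "MaximumStorageSize" : 0          // in MB; 0 disables the recycling threshold
}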

Figure 4.3: Orthanc Software Architecture.

The Orthanc server is programmed to run both on desktops and on servers or virtual machines in the radiology room. To maximise this portability, Orthanc is programmed in the C++ language. This strategy also confers the speed needed to run on low-end equipment. We paid special attention to the fact that desktops and virtual machines are not equipped with large storage space. However, since DICOM images are rarely compressed in clinical routine, Orthanc can be configured to compress the incoming DICOM images non-destructively (using a set of tools). This configuration typically doubles the storage capacity of a computer. Similarly, Orthanc implements a system for recycling disk space, automatically erasing the oldest series of images as soon as the available disk space falls below a certain threshold. This mechanism is very useful for implementing a DICOM buffer between systems. A mechanism to protect some important patients from recycling is also integrated. Figure 4.3 shows the internal architecture of Orthanc. We highlighted in blue the third-party libraries9 embedded in our UI prototype.

Opening the Orthanc source code also contributes to knowledge sharing on the DICOM standard according to an academic approach, and proposes a pragmatic response to an interoperability challenge and to the need for uniformity in medical imaging practice. Thanks to its support of web technologies, Orthanc enables the electronic and secure exchange of medical images between hospitals and/or clinical companies. The use of such electronic transmissions would limit economic costs, environmental costs and the risks of loss in medical imaging practice. Furthermore, Orthanc development currently focuses on expanding the support of the DICOM standard, on the development of more sophisticated UIs, and on the integration of primitives dedicated to nuclear medicine and radiotherapy.

9A third-party library is a reusable development component that we use to support our system. We used several of them (e.g. Hammer.js, jQuery and so on).

4.1.2 Proposed Architecture Components

Our proposed system can be framed in the diagram shown in Figure 4.4. More specifically, it comprises the implementation of the (i) Client service, which accesses the (ii) MIMBCD-UI services, served by a (iii) PACS server (Orthanc). Part of the DICOM standard must be accepted as a service of the PACS server type and must be answered by the MIMBCD-UI services, in order to determine the parameters. Additionally, the standard includes an exception with respect to several parameters. We first describe the (iii) PACS server (Orthanc), since we first upload the DICOM images through this system. Second, we describe the (ii) MIMBCD-UI services and the several libraries they use. Third, we describe the (i) Client.

The standard defines parameters in two variants: required and optional. If a PACS server (Orthanc) supports search by unique instance identifier, the study and series parameters are not necessary. The required parameters are those that identify a DICOM object unequivocally: the study, series and instance identifiers. All other parameters refer to modifications of the image or of the data format. This is the case of the server implemented by our system, in which we treat and validate the parameters while doing searches, reducing the size of transactions. Figure 4.4 shows a UML sequence diagram corresponding to the execution of an HTTP Request on the MIMBCD-UI (WADO) services. We explain the design of the service below. Among these tools, we can find the support of the Cornerstone [98] libraries. These libraries have in common both HTML5 and JavaScript, so they allow the manipulation, display and access to DICOM object information completely in a web browser (Client). Upon receiving an HTTP Request, the service collects the parameters and binds them to a data model. If the request is validated correctly, the service searches for the location of the object in the PACS and, if available, responds to the HTTP Request by returning the object's data. Otherwise, the request is answered with an HTTP Error Code as specified in the WADO standard. Possible answers are 406 Not Acceptable for invalid requests and 404 Not Found for searches without results.

This phase of the project marks a significant turning point in the development of the system. At this stage, we researched the different types of DICOM viewers and similar tools that are available. The objective of this research was to determine the most suitable basis for the development of the DICOM-based viewer as a browser UI; therefore, the focus of the search was on web-based solutions, mainly from Open Source projects.
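A minimal sketch of this validation flow is shown below, assuming a Node.js/Express service; the route, parameter handling and the findInPacs lookup helper are illustrative, not the actual MIMBCD-UI code:

const express = require('express');
const app = express();

// Hypothetical lookup: forward the query to the PACS and return the object or null.
async function findInPacs(studyUID, seriesUID, objectUID) {
  return null; // stub for the sketch
}

// WADO endpoint: validate the parameters, then fetch the object from the PACS.
app.get('/wado', async (req, res) => {
  const { requestType, studyUID, seriesUID, objectUID } = req.query;
  // The required parameters identify a DICOM object unequivocally.
  if (requestType !== 'WADO' || !studyUID || !seriesUID || !objectUID) {
    return res.status(406).send('Not Acceptable'); // invalid request
  }
  const object = await findInPacs(studyUID, seriesUID, objectUID);
  if (!object) {
    return res.status(404).send('Not Found'); // search without results
  }
  res.type('application/dicom').send(object); // return the object's data
});

app.listen(3000);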

In the research study several aspects were considered:

1. Ability to run in a browser environment.

2. Compatibility with the transfer syntaxes (or encodings) supported by the PACS.

3. Ability to recover objects via UI.

4. Image manipulation capabilities.

5. Capacity for integration (possibility or facility to adapt the visualizer to the project).

For clarification, the integration capacity mainly considers two factors: whether the tool can easily be configured for an existing PACS, and whether the application type is compatible with an Amazon Web Services (AWS) server such as an Elastic Compute Cloud (EC2) instance. This last aspect is only considered for the convenience that the PACS services used with the MIMBCD-UI (WADO) services are deployed on an AWS server. As a result of this research, it was decided to explore the Cornerstone [93] libraries to create a web-based DICOM viewer. While this solution requires a greater implementation effort than those studied that offer a fully functional image viewer, it is a basis with multimodality medical imaging capabilities.

Figure 4.4: UML request sequence to the MIMBCD-UI project services.

Cornerstone is a JavaScript library designed to show medical images in a web environment by making use of the HTML5 Canvas element. Cornerstone Core [93] is not meant to be a complete application itself, but rather a component used as part of a larger, more complex application. The MIMBCD-UI (WADO) services are an example of using the various Cornerstone libraries to build a simple study viewer.

Cornerstone Core [93] is agnostic to the actual container used to store image pixels and to the transport mechanisms used to get the image data. As a matter of fact, Cornerstone Core [93] itself has no ability to read/parse or load images; instead, it depends on one or more ImageLoader and Parser components (Validate() and Parse() in Figure 4.4) to function. The goal here is to avoid constraining developers to work within a single container and transport, storing images in a variety of formats (e.g. DICOM). By providing flexibility concerning the container and transport, we obtain the highest-performance image display, since no conversion to an alternate container or transport is required. We hope that developers feel empowered to load images from any image container using any transport. See the CornerstoneWADOImageLoader [93] project for an example of a DICOM WADO-based Image Loader.
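As a minimal sketch of this pipeline, assuming the legacy browser API of Cornerstone Core, where loadImage returns a Promise, and an image loader registered for the wadouri: scheme (the URL is a placeholder):

// Attach Cornerstone to a DOM element and display one DICOM instance.
const element = document.getElementById('dicomViewer');
cornerstone.enable(element); // prepares an HTML5 Canvas inside the element

// The scheme prefix of the imageId selects the registered image loader.
const imageId = 'wadouri:http://pacs.example.org:8042/wado?requestType=WADO&...';

cornerstone.loadImage(imageId).then(function (image) {
  cornerstone.displayImage(element, image); // draws the pixel data on the Canvas
});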

However, Cornerstone Core [93] does not include any mouse, touch or keyboard bindings to manipulate the various image properties, such as scale, translation or WW/WC. The goal here is to avoid constraining the use of this library to a given system paradigm. We hope that researchers are empowered to create new standards, possibly using new input mechanisms, as we did in an earlier [59] research work, to interact with medical images (e.g. Hammer.js). Cornerstone does provide a set of Application Programming Interfaces (APIs) allowing the manipulation of the image properties via JavaScript. That way, we implemented some tools built on top of the Cornerstone [93] libraries, like a touch UI, for the medical imaging diagnosis.
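For instance, WW/WC and zoom can be changed programmatically through the viewport API. The following sketch continues the previous one, using legacy Cornerstone calls with arbitrary numeric values:

// Read the current viewport of the enabled element, adjust it, write it back.
const viewport = cornerstone.getViewport(element);
viewport.voi.windowWidth = 400;      // WW: contrast
viewport.voi.windowCenter = 40;      // WC: brightness
viewport.scale = 1.5;                // zoom factor
viewport.invert = !viewport.invert;  // toggle colour inversion
cornerstone.setViewport(element, viewport); // triggers a redraw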

To summarise, at the architecture level the application is composed of three elements:

• Core. As the application engine, it handles the aspects related to the representation of the image: it manages the cache of images and loads them, draws them on the Canvas, enables and disables them as needed, modifies the representation of the pixels if the element changes size, and so on. The core application engine was implemented using the Cornerstone Core library.

• Image Loader. This component defines the transport method and the container that the application will use. It is in charge of carrying out the request for a DICOM object and of interpreting the data, and thus should be able to decode the appropriate transfer syntax. The project implements an ImageLoader thanks to the cornerstoneWADOImageLoader library, a system that allows recovering a DICOM object from a WADO service.

• Tools. Tools modify the representation of an object. Typical tools applied to a DICOM object are: move, zoom, colour inversion, calculation of angles and lengths, and playback (if it is multi-frame). These tools were implemented thanks to the cornerstoneTools library; a short activation sketch follows this list.
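The sketch below shows how such tools were typically bound to an enabled element with the legacy cornerstoneTools API of the time; the mouse button masks are 1 = left, 2 = middle, 4 = right:

// Enable the mouse input sources on the element, then activate individual tools.
cornerstoneTools.mouseInput.enable(element);
cornerstoneTools.mouseWheelInput.enable(element);

cornerstoneTools.wwwc.activate(element, 1);   // left drag adjusts WW/WC
cornerstoneTools.pan.activate(element, 2);    // middle drag pans the image
cornerstoneTools.zoom.activate(element, 4);   // right drag zooms
cornerstoneTools.zoomWheel.activate(element); // mouse wheel also zooms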

The advantage of this architecture design is that the application is agnostic to the tools and loaders it uses, so it is possible to define new tools or UI loaders for different uses without rewriting the application core or any of the other components. Cornerstone supports this project with the necessary means to develop the UI of a DICOM image with annotations, as it provides sufficient handling capabilities to meet the objectives of the project. Once the MIMBCD-UI (WADO) services (Figure 4.4 and Figure 4.5) were brought together with the technology for implementing the UI (Figure 4.5), we started the design phase that serves to explore the content and produce the use cases. The development of the UI is divided into a requirements specification phase, an outline of the interface and, finally, the implementation of the server logic to integrate the search capability with the CADx and PACS. The back-end, or server logic, of the application is also implemented in the JavaScript language on the Node.js framework, following the architecture to facilitate the division of responsibilities and the maintainability of the UI. The application controllers are responsible for translating the interaction of the User (Figure 4.5) into requests. The front-end, or client logic, of the application uses JavaScript, HTML5 and CSS3 to handle the data returned by the server and to modify the behaviour of the Client (Figure 4.4) side. The User (Figure 4.5) receives a view in HTML format, resulting in the generation of JSON (JavaScript Object Notation) data (R&D in Figure 4.5) as Operations are performed by the User (Figure 4.5) on the Client (Figure 4.4) side. The Client (Figure 4.4) application is responsible for presenting data dynamically by loading only those parts of the results table with which the User (Figure 4.5) interacts.

Figure 4.5: Deployment of web technologies in the PACS.

For the DICOM display phase, the project offered freedom in terms of design and characteristics, limiting only the object manipulation tools that may be presented in the system, as a static DICOM Object (Figure 4.5) representation [99]. The reason is that the application of measurement tools requires a high degree of verification, which is not desired for an application that will not be used for medical diagnostic purposes. The DICOM Object (Figure 4.5) view application is based on Cornerstone (MIMBCD-UI in Figure 4.4 and Figure 4.5) and vanilla JavaScript (Operations in Figure 4.5). In this application, Cornerstone is used to obtain and manipulate the image data, while the state of the manipulation tools (Operations in Figure 4.5) and the interface are controlled with a vanilla JavaScript application. The operation of the display is explained below. When a User (Figure 4.5) selects an image from the Browser (Figure 4.5), the DICOM viewer application launches in a new window. On the server side (MIMBCD-UI in Figure 4.4 and Figure 4.5), the controller responds with a view containing the reference to a DICOM Object (Figure 4.5) and an application based on vanilla JavaScript (Operations in Figure 4.5) that controls the Cornerstone application. When the view has loaded on the client, the display application passes the HTTP (Figure 4.4) reference to the Cornerstone loader, which asks for the DICOM Object (Figure 4.5) and shows it. Once we completed the project goals, we obtained an architecture with expanded capabilities, enabling web services using our MIMBCD-UI source with WADO services and the browser application. Figure 4.5 shows a diagram of the state after a theoretical deployment.

By completing the prototype, the three initially raised objectives were met and, as a result, three individual components were obtained that together provide the functionality to meet the overall aim of the prototype. Recapitulating the tasks carried out, we can mainly analyse two aspects. First, there are the technical decisions, such as the choices of development technologies. Second, there are the results of the research in the search for solutions to annotate DICOM images in a well-designed UI. From a technical point of view, we obtained the results with the technologies chosen for the development of the different parts. We used some analyses to study the performance and experience of our prototype, although the primary goal of the development of this prototype was not the study of an exclusive interaction between the user and the interface. We notably demonstrate this in the next section of this master thesis, making the prototype easily scalable and expandable in the clinical direction. The overall progress of the prototype at the development level has been satisfactory, although with better planning and forecasting of tasks it could have gone further, especially regarding some features (see the Future Work section).

Finally, a significant aspect of the prototype is the successful outcome of the research into the solution to annotate a DICOM image. The research carried out over the course of the prototype determined that there are many variables to take into account in the development of the prototype.

4.1.3 Auxiliary Files

The system has an extra source of auxiliary files that directly interact with several modules and are important to the prototype system. Some of these auxiliary files are related to Figure 4.4 and Figure 4.5.

The auxiliary files are listed as follows:

• Image Loader: The ImageLoader is implemented in JavaScript as a plugin that is responsible for taking the DICOM image as an ImageId and returning the corresponding pixel data for that same image. Since loading DICOM images frequently requires a call to the Orthanc (PACS in Figure 4.4) server, the image-loading API needs to be asynchronous.

• Promises: Handling asynchronous (HTTP Request in Figure 4.4) operations in JavaScript is done via Promises. Our prototype therefore assumes that image loaders return a Promise, which the prototype uses to receive the pixel data asynchronously, or an error if loading fails.

• Image Rendering: An ImageRendered event is emitted every time the image is redrawn or an annotation (Operations in Figure 4.5) is finished on our prototype. This event enables drawing inside the image by using the HTML5 canvas context10 (e.g. text or geometry). Also, HTML overlays can be updated with various viewport properties, such as zoom, window level and window width.

• Viewport: Each enabled element has a viewport (UI in Figure 4.5) that describes how a DICOM image should be rendered. For an enabled element, the viewport can be obtained via the getViewport() method and set via the setViewport() method.

• jQuery: The main reason for using jQuery as a dependency is its custom event handling and deferred support. Both of these could be replaced with non-jQuery libraries or some equivalent tool.

• DICOM Parser: The dicomParser comes from a lightweight library for parsing DICOM byte streams (DICOM Object in Figure 4.5) in modern HTML5-based browsers and Node.js; it is fast, easy to use and has no required external dependencies. Actually displaying DICOM images is quite complex due to the variety of pixel formats, as we have already argued, and the compression algorithms supported by the DICOM Object (Figure 4.5) are also a hard abstraction issue. The ImageLoader uses this library to extract the pixel data from DICOM files and display the images on the UI prototype; a short loader sketch follows this list.
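A minimal sketch of such a Promise-based loader, assuming the legacy Cornerstone registration API and the dicomParser library; the example: scheme, the URL rewriting and the fixed 16-bit grayscale window values are illustrative assumptions:

// Register a custom image loader: fetch a DICOM file, parse it with dicomParser
// and resolve a Promise with an image object Cornerstone can display.
function loadExampleImage(imageId) {
  const url = imageId.replace('example:', 'http://pacs.example.org:8042/');
  return fetch(url)
    .then((response) => response.arrayBuffer())
    .then((buffer) => {
      const byteArray = new Uint8Array(buffer);
      const dataSet = dicomParser.parseDicom(byteArray); // parse the byte stream
      const pixelElement = dataSet.elements.x7fe00010;   // Pixel Data tag (7FE0,0010)
      // Assumes uncompressed, little-endian, 16-bit grayscale pixel data.
      const pixelData = new Uint16Array(
        byteArray.buffer, pixelElement.dataOffset, pixelElement.length / 2);
      return {
        imageId: imageId,
        rows: dataSet.uint16('x00280010'),    // Rows (0028,0010)
        columns: dataSet.uint16('x00280011'), // Columns (0028,0011)
        height: dataSet.uint16('x00280010'),
        width: dataSet.uint16('x00280011'),
        color: false,
        minPixelValue: 0,
        maxPixelValue: 4095,
        slope: 1.0,
        intercept: 0,
        windowCenter: 2048,
        windowWidth: 4096,
        getPixelData: () => pixelData,
        sizeInBytes: pixelData.byteLength,
      };
    });
}

cornerstone.registerImageLoader('example', loadExampleImage);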

10The Canvas Rendering Context interface is used for drawing annotations and text onto the DICOM images.

4.2 Services

There are other service solutions but, for the proposed architecture, AWS provides a scalable environment for processing and storing medical images. While the proposed prototype is based on a client/server architecture, where the server is in the cloud (AWS), traditional structures do not allow radiologists to interact with, analyse and visualise patient information remotely stored in a DICOM repository, as our prototype does. Such an approach has the drawback of requiring massive data transfer between the server and the client, limiting the number of processing actions on DICOM data on a local computer to an underlying image enhancement algorithm. In our prototype, by storing the DICOM images of a patient on the server, they can be processed and analysed by the MIMBCD-UI prototype UI, also using a large number of machine nodes (to reach performance), and only the relevant information, or even the image data information, is sent remotely to the final user (radiologist).

4.3 User and Technical Manuals

User and technical manuals are used in many different kinds of environments: (i) they may be used in a low-light room or in a bright place; (ii) they may be used in a comfortable and user-friendly setting or in an environment that is hostile or, worse, dangerous. These are different environments, and some basic rules can ensure the right use of the user and technical manuals. First of all, we need to provide the user with a technical manual that is directly accessible from the UI, or at least comfortable to consult. Second, we need to consider the light environments of radiologists, in our case the radiology room, where the light is dim. Third, we must provide a mechanism, or a manual design, that gives the user the opportunity to consult the manual and work at the same time. Finally, learning from the most common UI issues, we aim to frequently update both the user and technical manuals following the user and system needs. The MIMBCD-UI prototype User and Technical Manuals are provided to help radiologists and researchers understand how to use the UI as a tool. The document in Appendix B, in its User Manual and Technical Manual sections, also contains details explaining how the UI works and how it is configured, including the required file formats and sample issues.

Chapter 5

Experimental Evaluation

In this section, we present an experimental evaluation of the integration of the above UI prototype in a real-world scenario. More specifically, we explore the radiologists' receptivity to the current solution.

5.1 Approach

We present a performance and experience analysis approach, conducting a study of the UI prototype aligned with the previously mentioned evaluation and results. We demonstrate that the UI prototype is comfortable to interact with and fast enough to solve the annotation tasks. For each DICOM image, the same simple image feature (e.g. delete/correct annotation) is considered. Regarding speed and accuracy, we studied the performance differences of our UI prototype for a variety of clinical and medical interactive tasks with different DICOM image difficulties (i.e., shape irregularities). Each user task has a harder DICOM image than the previous one and a harder feature task to perform. These tasks include features such as ROI annotations and length measurements, among others, as referenced before in this document. Although UX while inserting annotations is important when evaluating inputs, there is still a lack of research examining how the experience affects interactive UIs. To tackle this, in this section we focus primarily on cognition and time performance. We also considered the experience as a means to validate the sensation, affect and value of the interaction. Indeed, while UX analysis and evaluation have been applied to clinical systems [100], time measures and error scales are still missing. Moreover, studying the UI prototype performance can improve medical and clinical competences, since it can reduce the running time on DICOM images to complete common tasks in the diagnosis process. We focus on time performance and human error reduction, adding more understanding about the time-performance relation between the radiologists mentioned above.

In an attempt to reproduce the usual work conditions that radiologists have, we brought the most common interaction workspace into the hospital meeting room and into the clinics we visited. The experiments were conducted at the radiology services of HFF, SAMS Hospital, Clínica Europa and IPO Lisboa. The radiologists sat in the most comfortable position in front of a traditional environment. We designed an experiment to better understand the performance of the inputs made by radiologists. More specifically, we are interested in developing a deeper understanding of radiologists' needs for motivation and satisfaction when using new UI prototypes, and in comparing conditions to better understand the obtained results through standard methods of time performance (e.g. the number of interaction annotations and hit rate score).

5.1.1 Participants

The study (Appendix A) was performed at the HFF, SAMS Hospital, IPO Lisboa and with a radiologist of Clínica Europa at Carcavelos, Portugal, with which we created collaboration protocols. We evaluated our system with eight medical experts, aged 26-60 (Mdn=44.25). Four of them were female (50%) and four were male (50%). None of them has a physical condition or disease that limits the use of a monitor for long periods of time. The radiologists were grouped into three (Table 5.1) expert groups: (1) Intern; (2) Junior; and (3) Senior. Five of the radiologists were senior (more than ten years of experience), two of them had junior experience, and one was an intern at the HFF. One of the radiologists works only in the private sector, two work only in the public sector, and the other five work in both sectors. Seven of them (87.50%) reported Hospitals or Hospital Centers as the clinical-profile infrastructures where their function fits; five of them reported Clinics of Complementary Diagnostic and Therapeutic Means. As for the type of clinical activities, six of the radiologists (75%) reported being familiar with patient tracking. All of them do breast cancer diagnosis, five (62.50%) do patient intervention and, finally, six (75%) do patient follow-up. All of them reported the digital format as the favourite for the mammography diagnosis model. However, while 12.50% prefer the Hardcopy11 for the clinical practice that supports the visualisation, 87.50% of the radiologists preferred the Softcopy12 over the Hardcopy.

Radiologist   1   2   3   4   5   6   7   8
Intern        ✓
Junior            ✓   ✓
Senior                    ✓   ✓   ✓   ✓   ✓

Table 5.1: Table of Radiologist Expert Level

Radiologists reported on the technologies used in their hospital or clinic unit: two of them (25%) answered Mammograph with Computed Radiography (CR) Digital Technology13 (not integrated); five (62.50%) answered Direct Digital Radiography (DDR) Mammograph14 (digitally integrated); and three of them (37.50%) pointed to the DDR with Tomosynthesis15 functionality. Ultrasound dedicated to breast studies16 was reported by seven of the radiologists (87.50%). Finally, six of the radiologists (75%) reported being used to working with Equipment for MRI17. One of them did not select an option.

11Hardcopy: physical copy of the DICOM image, where we can make notes with pencil and pen. 12Softcopy: virtual copy of the DICOM image, where we can make digital annotations. 13CR has been known as an indirect digital technology, bridging the gap between fully digital detectors and X-Ray film. 14DDR is an online system that includes a detector as an integral part of the mammographic unit, allowing the digital image to be read in real time. Also, DDR is a more viable option for effective breast cancer screening. 15Tomosynthesis is a particular kind of mammogram that produces a 3D image of the breast by using several low-dose X-Rays obtained at different angles. The breast is positioned and compressed in the same way as for a mammogram, but the X-Ray tube moves in a circular arc around the breast. The information from the X-Rays is sent to a computer, which produces a focused 3D image of the breast. The X-Ray dose for a tomosynthesis image is similar to that of a regular mammogram. 16Ultrasound can demonstrate the cystic quality of some breast lesions. The exam is very operator-dependent, and images may be difficult to reproduce because of variable transducer positions and settings from one review to the next. Breast ultrasound is, therefore, best used in evaluating specific lesions, such as nodules visible on mammography or palpable lesions. 17MRI hardware includes the electrical and mechanical components of a scanning device.

Typically, 87.50% of the radiologists, meaning seven out of eight, are used to having a PACS on their workstation. Also, 62.50% of the radiologists reported being used to working with the DICOM format, meaning these are familiar topics. All stated that they analyse medical images for diagnostic purposes on a daily basis.

5.1.2 Apparatus

The prototype for the experiment is implemented in JavaScript using the cornerstone [98] library and standard HTML/CSS for the UI, as defined in the previous chapter. The device used in the experiment for a traditional environment is a MacBook Pro, Retina, 13-inch, Early 2015, with the standard integrated keyboard of the laptop, together with a Microsoft Mobile Mouse 4000. The MacBook Pro features a 14 nm Broadwell 2.7 GHz Intel (Core i5) processor (5257U), with dual independent processor cores on a single silicon chip, a 3 MB shared level 3 cache, 8 GB of onboard 1866 MHz LPDDR3 SDRAM, and an integrated Intel Iris Graphics 6100 graphics processor that shares memory with the system. It also has a high-resolution LED-backlit 13.3" widescreen 2560x1600 (227 ppi) "Retina" display, which proved an excellent choice for the prototype display. The Microsoft Wireless Mobile Mouse 4000 has BlueTrack technology that tracks on almost any surface, and a power slider that conserves battery. It is right- and left-hand compatible, has a contoured shape, and its transceiver attaches to the undercarriage. All of these specifications proved useful for the test. The base of the prototype display (Figure 5.1) is located about 45 cm from the edge of the table [59], since it is the most comfortable distance for the radiologists. The initial position of the mouse was set in line with the base, 12 cm from the edge of the laptop and 22 cm in a diagonal line from the corner of the display.

Figure 5.1: Position of the user in relation to the devices.

For the interaction and user records, we used several tools to support the study. We developed this prototype to be interacted with through the Google Chrome browser, which offers great performance due to its fast JavaScript engine. In fact, that advantage was a major reason for our initial prototype development; Chrome is still very fast, although other browsers have since equalled or even surpassed it in speed, and they are also compatible with our prototype. We also used Apple's default media application, QuickTime, which is ready and waiting for screen recordings of radiologist inputs, for later lab analysis of all the information. Finally, we used the Chrome Recorder, a tool that records the UI, takes screenshots and replays the actions. It can also mock XHR calls with a HAR file, which makes it possible to record the radiologists' annotation coordinates.

5.1.3 Tasks

The experiment was carried out over several days. On each day, radiologists performed one repetition, on one DICOM image for each patient (Table 5.2), for a given task, with just three DICOM image patients to be analysed. The job consists in the delineation of three different DICOM images, with increasing difficulty in the annotation process. This difficulty means that the DICOM images present BI-RADS irregularities (i.e. more irregularly shaped masses and higher classification density) to the radiologist. Each frame is a DICOM image of a patient containing lesions (masses and calcifications).

Figure 5.2: Anonymous 1 LCC Modality.

In the first DICOM image (Figure 5.2), three lesions were presented to the radiologist with low difficulty in an LCC modality. In the second DICOM image (Figure 5.3), the source is different: we took a US modality with a medium annotation difficulty, asking the radiologist to perform this simple annotation task. Finally, the last DICOM image (Figure 5.4) was a typical MG modality, but with a huge and irregular mass lesion; here the annotation difficulty was hard, as expected.

Figure 5.3: Anonymous 4 US Modality.

Each radiologist participated in a one-minute training session in which we explained the interface features that must be used to perform the shape annotation. Then, the radiologist was asked to complete, as in the training session, the same task on the three patient DICOM images, with no external intervention. When completing the tasks for each image, the radiologist had to say the Patient Name of the next Phase, to be able to access the following image. If we observed that the radiologist did not carry out the task within a given time, we marked it as an incomplete task. All user interactions were recorded on the video screen. We also recorded the heat maps and the coordinates.

Figure 5.4: Anonymous 2 MG Modality.

Patient Studies

Phase   Patient Name   Modality   Image Difficulty   Number of Lesions
1       Anonymous 1    LCC        +                  3
2       Anonymous 4    US         ++                 1
3       Anonymous 2    MG         +++                1

Table 5.2: Table of Patient Studies

5.1.4 Statistical Data Analysis

Statistical analysis was performed using IBM SPSS Statistics [101] Version 24.0 (IBM Corporation). We plotted several histograms of the radiologists' data, and its global distribution was found to be nonparametric18. SPSS requires a long-form data arrangement for the independent-samples test and a wide-form arrangement for the paired test. Accordingly, the data may need to be manipulated properly before being analysed with software packages. We analysed the performance using several metrics, including (1) time, (2) number of interactions, (3) hit rate score and (13) errors of the conditions list. During the test, some notes were collected, including preferences, improvement suggestions and the applicability of the UI prototype (i.e. what advantages the UI prototype brings to clinical setups). All analyses used the SPSS software. As a first step, we may consider the use of a paired-sample t-test to assess whether the means of group variables from the same population differ. However, if we are interested in testing whether the means of more than two variables are equal, the appropriate statistical deduction will depend on the underlying distributions [102]. Both the parametric Analysis of Variance (ANOVA) [103] test and the non-parametric Kruskal-Wallis [104] test are adequate for testing differences between more than two condition samples from the same population of radiologists.

Several conditions are present in the tests, and they are as follows:

1. Time;
2. Number of Interactions;
3. Hit Rate;
4. Competence;
5. Autonomy;
6. Relatedness;
7. Immersion;
8. Positive Affect;
9. Negative Affect;
10. Enjoyment;
11. Intuitive Controls;
12. Preferences;
13. Errors;

Since the data collected does not meet the applicability pre-conditions required for ANOVA [105], i.e., the data set does not follow a Gaussian distribution and the number of samples is small, we resort to using the Kruskal-Wallis [104] ranked analysis of variance to test the performance and experience, which is useful for testing between-subjects effects for two or more conditions. The univariate analysis also used the Kruskal-Wallis supported by χ2 procedures. Nevertheless, multivariate regression19 using forward stepwise modelling procedures was used to identify the most parsimonious20 set

18We call them nonparametric because such tests make no assumptions about the parameters (such as the mean and variance) of a distribution, nor do they assume the use of any particular distribution.

of work predictors of performance and experience. The Kruskal-Wallis analysis of variance was used to compare differences between the 13 groups. When significant differences were found, THSD [106] tests were used for post hoc21 pairwise comparisons between every two groups, whenever both data sets met the required applicability pre-conditions. The Kruskal-Wallis and THSD tests were used to compare scores between single or global groups to evaluate performance. These tests were also used to compare time performance and the number of errors committed by the groups. For the data analysis, the Kruskal-Wallis and THSD [106] tests were also used to compare (Table 6.3) the number of interactions and accuracy (hit rate score) between groups, based on the interview and observation findings from the video screen data and the recorded coordinates. Baseline differences in performance and experience were analysed using those extensions of the statistical data analysis tests.

A common problem in user and task data analysis is to decide whether or not sample differences in central tendency reflect exact differences between users. It is appropriate to use fixed effects of Kruskal-Wallis for a k-sample case (in our research, for every two groups) when the assumptions of independence of errors, homogeneity of variance and Normal Distribution are met. When equality of variance and/or Normal Distribution are doubtful, the use of non-parametric statistical procedures is recommended. Two non-parametric counterparts are the Kruskal-Wallis rank test and the typical average scores test, already referenced, which uses normalised observations in place of ranks. Being based on ranks makes the Kruskal-Wallis test suitable for the k-sample case. Moreover, statistical tests of hypotheses over an equal number of users are sensitive to location shifts, and the Kruskal-Wallis test statistic is asymptotically distributed as χ2 with k − 1 degrees of freedom. Sampling is assumed to be random, where samples are drawn from users with continuous distributions on the one hand, and are infinite or sampled with replacement on the other. Large values of the statistic lead to the rejection of the null hypothesis. The statistic is computed as follows, and a toy worked example is given after the definitions.

H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)

where:

n_i = number of observations in the i-th sample;

N = total number of observations (∑ n_i) in all samples combined;

R_i = sum of ranks in the i-th sample;
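As a toy check of the formula (numbers invented purely for the arithmetic): take k = 2 samples of n_1 = n_2 = 2 observations whose ranks are {1, 2} and {3, 4}, so that R_1 = 3, R_2 = 7 and N = 4. Then

H = \frac{12}{4(4+1)}\left(\frac{3^2}{2} + \frac{7^2}{2}\right) - 3(4+1) = 0.6 \times 29 - 15 = 2.4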

The THSD is based on a studentized range distribution, similar to Student's t (the q statistic), but taking into account the number of data treatments under consideration. The test is somewhat in-between, being frequently used when comparing all pairs of means. We used the THSD in conjunction with the Kruskal-Wallis to find means that are significantly different from each other. Comparing all possible pairs of means, it applies simultaneously to our condition set22 (e.g. Time, Number of Interactions) all pairwise comparisons (µi − µj) and identifies any difference between two means, for each of the 13 conditions, that is greater than the expected standard error. When all sample sizes are equal (and in our research we always compare the same number of radiologists and conditions), the confidence coefficient for the set is 1 − α for any 0 ≤ α ≤ 1.

19A technique that estimates a single regression model with more than one outcome variable, as we are making a comparison between quantitative and qualitative conditions. 20The most straightforward model/theory, with the fewest assumptions and variables but with the greatest explanatory power. 21For example, we use the post hoc test to determine whether Time, Number of Interactions, Hit Rate and Errors were statistically significantly different between the Intern, Junior and Senior groups. 22µi stands for the average computed through the test for the i-th identifier, while µj stands for the average for the j-th identifier.

The q statistic is given by:

q = \frac{y_{max} - y_{min}}{\sqrt{S^2/n}}

where:

y_max = the largest sample mean;

y_min = the smallest sample mean; S^2 = the pooled sample variance of these samples; n = the sample size;
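As a toy check of the formula (again with invented numbers): for y_max = 10, y_min = 6, a pooled variance S^2 = 4 and n = 4 observations per condition,

q = \frac{10 - 6}{\sqrt{4/4}} = 4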

The THSD approach uses mean rank sums and can be employed for equally as well as unequally sized condition samples without ties. The null hypothesis H0 : y_max = y_min is rejected if a critical absolute difference of mean rank sums is exceeded. For this data analysis section, we have assumed that the goal is to find a model with the best generalisation results over the condition set. In doing this, we have been concerned primarily with the choice of a subset of the literature [106, 104, 105] regarding the model options we have. In designing the experiment for comparing several conditions from each radiologist, we have taken into consideration all the sources of variation that any statistical test should control. After collecting data from our radiologists, we analysed the error means. The complete procedure has proven useful in our data analysis of the radiologists.

5.2 Evaluation

Performance and experience are addressed in a within-subjects design with the UI prototype. Figure 5.5 shows the order of the user (radiologist) tests. First, the user took a Demographic questionnaire on an online form (Electronic Questionnaires). Second, the user took a Training session in a mouse-based environment, where we showed the user the features and a summary of the task. Third, the user executed a set of pre-defined Tasks, already referenced before (Table 5.2). This was followed by a questionnaire: the fourth phase, where we ask the user about the Experience of use. At the end, another Final online form (Electronic Questionnaires) is answered by the user.

The after-task questionnaire concerns the Experience of use (UX) and takes place after each condition test, resorting to a set of questions rated on a Likert scale. We inquire the user about the nine conditions listed in the Statistical Data Analysis section above (conditions 4 to 12). A final questionnaire about the UI prototype is then filled in, regarding satisfaction and experience.

5.2.1 Quantitative Evaluation

The focus of this section is on the methods for quantitative evaluation: (i) methods for determining performance requirements; and (ii) methods for evaluating compliance through the achievement of the tasks, in the practical context of prototype development for medical and clinical purposes. To determine the performance during the task, we compute several metrics, including (1) time performance, (2) number of interactions, (3) accuracy and (13) number of errors in the annotation process.

Figure 5.5: Performance Evaluation Phases and Questionnaires.

5.2.1.1 Time Evaluation

Successful user recognition should, in general, be supported not only by classification through interaction-time functions but also by the time-space context of each task. If a radiologist reaches for and performs a valid device (mouse and keyboard) manipulative interaction with the UI prototype for a DICOM image annotation, but the annotation does not succeed from the radiologist's point of view, the system should consider it an error. This method will increase the Number of Errors variable. Still, only a small number of cases so far exploit this fact.

As computational effectiveness in user recognition arises, the trade-off is classical:

Recognition Time vs Recognition Applicability vs Task Complexity

Most of the model-based interactions supported by features characterise several parameters (Figure 4.1). The time taken to learn the system and the retention of the acquired knowledge over time also affect the utility of the system, as does radiologist acceptance [81] of the system, i.e., subjective satisfaction. Nevertheless, a system may be evaluated favourably on every performance measure. We base our time evaluation on a screen video recording of the radiologist's interaction with the UI prototype using the traditional devices (mouse and keyboard on the laptop). We examine radiologists while they annotate lesions in the assigned tasks to determine the start and end times of those defined tasks. The start time was defined as the moment the radiologist initiated the Use Case Tasks Phase (Figure 5.5), i.e., the exact moment when a radiologist ends the Training Phase (Figure 5.5) and opens the PACS (Figure 3.12) view of the prototype.

5.2.1.2 Accuracy Evaluation

We measure accuracy by computing the Hit Rate Score (HRS). This measure is defined as the percentage of annotations that lie in three areas (Figure 5.6). We measure these three areas from the ground truth (see the black line in Figure 5.6). The first area, Area A, is the area delimited by the two Green Lines, having a width of 2ε, where ε is the perpendicular distance from a point in the ground truth (black line) to the Green Line23. The second area, Area B, also has a width of 2ε; however, Area B comprises two regions, each falling between the Green Lines (inner boundary) and the Red Lines (outer boundary).

Figure 5.6: Annotation Classification Areas.

The third area, Area C, is the complement of the union of regions A and B. Taking a perpendicular line at a point in the ground truth, Area A has a width of 2ε, whilst Area B also has a width of 2ε. We define a high-accuracy annotation whenever it reaches a score of three points (HRS = 3). Low accuracy is defined whenever we obtain a score of one (HRS = 1). Finally, every annotation outside Area A and Area B (i.e. in Area C) is scored as zero (HRS = 0), meaning that the annotation has no accuracy. As we can see in Figure 5.6, the areas are defined and easily computed from screen-recording coordinates.
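A sketch of this scoring rule, under the reading that Area A covers perpendicular distances up to ε and Area B the band up to 2ε; both helper functions are our own illustrative code, not part of Cornerstone:

// Approximate perpendicular distance: minimum Euclidean distance from the
// annotation point to the sampled ground-truth contour points.
function distanceToGroundTruth(point, groundTruth) {
  return Math.min(...groundTruth.map(
    (g) => Math.hypot(g.x - point.x, g.y - point.y)));
}

// Score one annotation point: Area A -> 3, Area B -> 1, Area C -> 0.
function hitRateScore(point, groundTruth, epsilon) {
  const d = distanceToGroundTruth(point, groundTruth);
  if (d <= epsilon) return 3;     // high accuracy (HRS = 3)
  if (d <= 2 * epsilon) return 1; // low accuracy (HRS = 1)
  return 0;                       // no accuracy (HRS = 0)
}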

5.2.1.3 Number of Interactions Evaluation

Reducing the number of operations, i.e., annotations, will produce less information for the Meta-Info. Gen. (Meta-Information Generation) and less consumed data. This issue brought us a trade-off that we need to measure, between reducing the number of interactions and having less available data.

23Here, the ε is the diameter of the annotation point.

The number of interactions, and also the number of annotations, are computed as follows:

NoA_{total} = \sum_{ls=1}\sum_{ann=1} \zeta_{img=0} + \sum_{ls=1}\sum_{ann=1} \zeta_{img=1} + \sum_{ls=1}\sum_{ann=1} \zeta_{img=2}

NoI_{total} = NoA_{total} + \sum_{i=1} \delta_i

where:

img = the img-th image of the three (Table 5.2) tested DICOM images; ls = the ls-th lesion of a given DICOM image; ann = the ann-th annotation of a given lesion from a DICOM image;

δ_i = UI event action on the i-th interaction of the user;

ζ_j = UI event action on the j-th annotation of the user;

NoI_total = total number of interactions;

NoA_total = total number of annotations (∑ ζ_j) over all lesions combined from the three (Table 5.2) tested DICOM images;

We measure the number of interactions by counting the total number of annotations on each image (img=0, img=1 and img=2) and each lesion (ls=1), plus the total number of event actions by the radiologist on the UI. We consider as event actions on the UI the pressing of a UI button, or any other moment when the radiologist interacts with our system.
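As an illustrative counter over a recorded event log (the event names and fields are hypothetical, not the actual prototype schema):

// Count interactions: every logged UI event counts, annotations are a subset.
function countInteractions(eventLog) {
  const annotations = eventLog.filter((e) => e.type === 'annotation').length;
  return { annotations: annotations, interactions: eventLog.length };
}

// Example: two annotation points and one button press.
const log = [
  { type: 'annotation', x: 120, y: 85 },
  { type: 'annotation', x: 131, y: 92 },
  { type: 'button', id: 'invert' },
];
console.log(countInteractions(log)); // { annotations: 2, interactions: 3 }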

5.2.1.4 Number of Errors Evaluation

Typically, the criterion of error refers to the means available to prevent and detect command errors, data entry errors, or actions with destructive consequences for the task. It is better to identify mistakes before validation rather than after, when detection is less disruptive. The research in this section supported the hypothesis that the perceptual structure of an input task is an important consideration when designing a multimodal medical imaging UI for breast cancer diagnosis. Task completion time, the number of errors, and radiologist acceptance improved when the UI best matched the perceptual structure of the input attributes. We recorded the number of mistakes, as well as the screen recording of the UI prototype. We document three measures of errors:

• Task Errors;
• Mouse Errors;
• Annotation Errors;

Task Errors relate to when the radiologist forgets the task at hand, such as which patient should be the subject of the present diagnostic. Mouse Errors were recorded when a participant accidentally selected an incorrect feature from one of the menus displayed on the UI prototype. Annotation Errors were identified when the input did not match the most likely diagnosis for each tissue slide and fell far outside a reasonable Accuracy Evaluation range. Finally, to account for the number of errors, we compute the number of times that an error occurs, by observation in the radiology room. This accounting means that the radiologist made a mistake and needed to repeat the annotation task. To determine the actual number of errors, we analysed the diagnostic output from the UI prototype, the recorded observations of the researcher, and a review of the screen recordings made during the study.

5.2.2 Qualitative Evaluation

We characterise the qualitative evaluation through the concern of exploring phenomena from the perspective of the radiology room, among many distinct features. We captured all detailed data (Appendix A and the User Experience section of Appendix B), following a mainly inductive rather than deductive24 analytic process. Answering the 'What is?', 'How?' and 'Why?' questions, the process employs a variety of methods. Items from the Statistical Data Analysis section are framed in different contexts, mostly known as the Positive and Negative Affect Scale (PANAS) [107], the Intrinsic Motivation Inventory (IMI) [108] and Experience Needs Satisfaction (ENS) [109]. We group the nine conditions as follows:

• PANAS;

– Positive Affect; – Negative Affect;

• IMI;

– Enjoyment; – Competence; – Autonomy; – Relatedness;

• ENS;

– Immersion; – Intuitive Controls; – Preferences;

For measuring purposes, at the end of the Use Case Tasks Phase (Figure 5.5), in the Experience Phase (Figure 5.5), radiologists completed a series of validated scales. The following scales are considered: (i) PANAS [107, 110]; (ii) IMI [108]; and (iii) ENS [109]. When we reach the end of the use case tasks, radiologists are asked to provide information and to rank each condition, providing any comments.

5.2.2.1 Positive and Negative Affect Scale

Some scales have been created to measure several factors. However, many existing measures are inadequate, showing poor discriminant or convergent validity, or low reliability. To fill the need for a reliable and valid PANAS [107] that is also brief and easy to administer, we address the two 5-item scales that comprise the PANAS. The scales are shown to be highly internally consistent [110]. Briefly, positive affect reflects the extent to which a person feels enthusiastic, active and alert. High positive affect is a state of high energy, full concentration and pleasurable engagement, whereas sadness and lethargy characterise low positive affect. In contrast, negative affect is a general dimension of subjective distress and unpleasurable engagement that subsumes a variety of aversive states, including anger, contempt, disgust, guilt, fear and nervousness, with low negative affect being a state of

24An inductive approach is concerned with generating new theories from the Statistical Data Analysis. A deductive method aims at testing those theories.

calmness and serenity. These two factors represent affective state dimensions that roughly correspond to the dominant personality factors of extraversion and anxiety, respectively. The scales correlate at predicted levels with measures of related constructs and show the same pattern of relations with external variables. For instance, positive affect is relevant to user activity and shows significant diurnal variation, whereas negative affect is significantly related to perceived stress and shows no circadian pattern25 in the radiology room. Thus, we offer the PANAS as an efficient, reliable and valid means of measuring these two critical dimensions.

5.2.2.2 Intrinsic Motivation Inventory

Human beings can be engaged and proactive or alienated and passive, as a function of several conditions in action. Self-Determination Theory has focused on the user-context conditions that facilitate, as opposed to forestall, the natural processes of self-motivation, as well as the psychological development of users. More specifically, we examine human factors that enhance or undermine well-being, self-regulation and intrinsic motivation. Radiologists' intrinsic motivation was assessed using the interest and enjoyment subscale of the IMI [108]. It is a validated multidimensional measurement instrument, and we use it to measure radiologists' subjective experiences related to enjoyment of, and interest in, the activities conducted in the radiology room experiments, by measuring intrinsic motivation.

In this section, we detail those conditions, addressing the implications of four important outcomes (Enjoyment, Competence, Autonomy and Relatedness). We begin with an examination of the IMI approach and of the user's tendency toward learning through the UI prototype interaction, and we consider research specifying the conditions that forestall versus facilitate this special type of motivation. Motivation concerns equifinality, persistence, direction and energy in all intention and activation aspects of the user. We highly value it because of the consequences that motivation produces. Therefore, it is a preeminent concern to radiologists, since it involves mobilising others to act.

Given the number of goals we hoped to compare and measure (e.g. ability, outcome, normative and learning goals), we wanted to use the fewest possible items to measure each type of goal while still maintaining a high reliability of the study. We felt that using relatively few items would minimise the confusion and frustration radiologists might experience. Our IMI questionnaire also uses a five-point Likert scale to measure Enjoyment. Competence was predicted to be positively correlated with Enjoyment. For each item, radiologists were asked to rate their agreement on a 5-point Likert-type scale ranging from 1 (Strongly Disagree) to 5 (Strongly Agree).

5.2.2.3 Experience Needs Satisfaction

Need satisfaction [109] is assumed to represent the underlying motivational mechanism that orients and energises user behaviour, with the satisfaction of psychological needs regarded as the essential nourishment for an individual's optimal functioning. Three basic needs are distinguished and addressed in our study: (1) Immersion; (2) Intuitive Controls; and (3) Preferences. First, psychological theory has posited that immersion is related to specific user needs, including those for exploration, interaction and experience [111]. Satisfaction of such needs represents a primary reason for using the medical UI prototype, indicating the relevance of satisfying immersion-related needs. Immersion is pleasurable, often satisfying particular user needs; however, no existing construct describes the degree of satisfaction with immersion-related needs. Our analysis,

25 Changes that follow a daily cycle.

thus devises a construct, immersion satisfaction, to represent the extent to which immersion-related radiologists' requirements are satisfied. This term may help subsequent computer studies to describe such a concept. Second, convergent correlations between intuitive controls and need satisfaction established an association between radiologists and the satisfaction of relatedness needs. Intuitive controls are another variable of interest when assessing need satisfaction in the clinical field: the degree to which the UI controls are simple, make sense, are quickly mastered and do not interfere with the task. We thus applied a measure of Intuitive Controls as a subscale of the ENS that assesses the link between the radiologist and the action taking place within the task. Intuitive controls can contribute to motivation by being associated with a greater sense of freedom and power, and they enhance a sense of competence (IMI). Therefore, insofar as intuitive controls predict motivational outcomes of the diagnosis, we expect this effect to also be mediated by perceived autonomy and power. Third, at this early stage of prototype development, coupling the evaluation of need satisfaction with an assessment of radiologist characteristics and preferences provides essential insights as the medical and clinical fields explore the complexities of implementing the UI prototype in ways that are most meaningful for patients and strategically situated in the context of radiologist-centered diagnosis. Understanding radiologist needs and preferences is a crucial starting place from which to begin to unravel this complexity. Perhaps increasing basic psychological need satisfaction is only one of several possible interventions that could reduce the doubts experienced by radiologists. In conclusion, active counselling with these radiologists may require that explicit attention be paid both to reducing the individual's use of negative, maladaptive strategies and to directly helping them find more positive, adaptive methods of satisfying unmet psychological needs.

Chapter 6

Results

We analyse the results to understand whether we reached our proposed goals. Taking these into account, some minor changes were made to improve our system. The following sections describe each result in detail and draw several conclusions.

6.1 Performance

We use the Kruskal-Wallis Analysis of Variance [104] to test performance and experience, since it is suitable for testing between-subjects effects for three26 or more conditions [105]. In the following sections, we describe the results regarding first- and second-order statistics, using the mean (M_cond) and standard deviation (σ_cond), as well as the standard error of the mean, defined as follows.

sem_cond = σ_cond / √N

where:

cond: the condition, one of time (Time), hrs (Hit Rate Score), noa (Number of Annotations), noi (Number of Interactions), noe (Number of Errors), pa (Positive Affect), na (Negative Affect), enj (Enjoyment), comp (Competence), aut (Autonomy), rela (Relatedness), imme (Immersion), ic (Intuitive Controls) or pref (Preferences);

N: the number of users (radiologists);

sem_cond: standard error of the mean of the condition (cond);

M_cond: mean value of the condition (cond);

σ_cond: standard deviation of the condition (cond).
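As an illustration of the statistics above, the following TypeScript sketch computes the mean, the sample standard deviation and the standard error of the mean for one condition; the sample values are hypothetical, not our measured data.

// Mean of a sample.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator, as reported by SPSS).
function stdDev(xs: number[]): number {
  const m = mean(xs);
  const variance = xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1);
  return Math.sqrt(variance);
}

// Standard error of the mean: sem_cond = σ_cond / √N.
function sem(xs: number[]): number {
  return stdDev(xs) / Math.sqrt(xs.length);
}

// Hypothetical completion times (seconds) for N = 8 radiologists.
const times = [296, 103, 150, 180, 120, 210, 160, 128];
console.log(mean(times), stdDev(times), sem(times));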

6.1.1 Time Results

Like most clinical systems, MIMBCD-UI was not developed primarily to address HCI problems, but as part of a more significant clinical endeavour to represent essential research concepts of medical imaging diagnosis. However, since HCI is a component of radiologist performance in the radiology room, a reasonable proposal for a cognitive system should allow one to analyse and compare UI designs and then recommend and evaluate improvements based on the results.

Table 6.1: Time (seconds) Performance Results.

Table 6.1 shows the overall time performance of the radiologists (N = 8) in completing the annotation task as described in the Tasks section. As mentioned above, for all events, all annotation coordinates are saved and all videos of the user interaction are recorded. The radiologist expert level, addressed in the Participants section (Table 5.1), is also important here. The average of each subject's successful tasks is used to obtain the completion time aggregated by radiologist, excluding unsuccessful tests from the completion time analysis. Evaluating the completion time of the task, the analysis revealed (M_time = 168.38s, σ_time = 64.12, sem_time(x̄) = 22.6698) a maximum time spent of 296 seconds and a minimum time spent of 103 seconds per radiologist. The post-hoc tests, conducted by applying the THSD test, revealed that there are differences between all the interactions regarding completion time. This information is addressed in the Summary section below, where we gather and summarise all the information.

26 The conditions typically number 15, counting the radiologist number as a condition as well.

6.1.2 Accuracy Results

To evaluate the accuracy results (Table 6.2) when annotating a breast lesion, we created and analysed an HRS, already addressed and explained in the Accuracy Evaluation section.

Evaluating the accuracy results of the lesions, the analysis revealed (M_hrs = 123.25, σ_hrs = 128.145, sem_hrs(x̄) = 45.3061) a maximum HRS of 414 score points, achieved by radiologist five, a senior radiologist, and a minimum HRS of 24 score points, from radiologist three, also a senior radiologist. We do not have enough information about the professional group experience to draw conclusions about the relation between the two. The number of annotations has a direct relationship with the HRS, as we compute the percentage of annotations (Table 6.3) scored with three points (HRS = 3) versus one point (HRS = 1). The more accurate annotations we make, the better the generated information data will be; consequently, the larger the percentage of the red bar, the higher the percentage of high scores (HRS = 3). From Table 6.2 it is clear that radiologist five performed the most annotations and was also the most accurate (Table 6.3). The trade-off was time (Table 6.1): observing the time spent by radiologist five, we can see that this was the slowest radiologist, which is understandable given that radiologist five performed the highest number of annotations and the most accurate ones.
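Under the scoring just described, the total HRS and the share of high-score annotations can be computed as in the following sketch; the per-annotation scores are hypothetical, and the three-versus-one point scheme is the one stated above.

// Each annotation scores 3 points (accurate) or 1 point, as described above.
type AnnotationScore = 1 | 3;

// Share of high-score annotations, i.e. the "red bar" percentage of Table 6.3.
function highScoreShare(scores: AnnotationScore[]): number {
  const high = scores.filter((s) => s === 3).length;
  return (high / scores.length) * 100;
}

// The total HRS is simply the sum of the per-annotation scores.
function totalHrs(scores: AnnotationScore[]): number {
  return scores.reduce((acc, s) => acc + s, 0);
}

// Hypothetical scores for one radiologist's annotations.
const scores: AnnotationScore[] = [3, 3, 1, 3, 1, 3];
console.log(highScoreShare(scores).toFixed(1), totalHrs(scores)); // "66.7" 14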

Table 6.2: Accuracy Results.
Table 6.3: Num. of Anno. vs HRS Results.

6.1.3 Number of Interactions Results

In this section, we analyse the impact of the number of interactions (Table 6.4), of different interaction patterns, and of inconsistencies in decision-maker responses on the convergence of an interactive UI prototype. In our context, the total number of interactions is the sum of all image annotations and UI interactions. The UI interactions are the ones referenced in the System Architecture section (Figure 4.1), where Zoom and Pan were the most used tools. The results indicate that it is possible to obtain solutions that are very good or even nearly optimal with a reasonable number of interactions. There is also surprising robustness towards different patterns of interaction with the decision maker. From Table 6.4 we can observe the relation between annotations and the total number of interactions (M_noi = 62.38, σ_noi = 50.177, sem_noi(x̄) = 17.7402), where the mean number of interactions (M_noi = 62.38) is 11.62% higher than the mean number of annotations (M_noa = 55.13). Although several differences between the number of annotations (M_noa = 55.13, σ_noa = 39.39, sem_noa(x̄) = 19.49) and the number of interactions were noticed, the difference between the means is not particularly relevant and was expected, and the same difference was observed for both σ_noa and sem_noa(x̄).

Table 6.4: Total Number of Interactions.

When comparing interactions with annotations, we need to balance two attributes. On the one hand, fewer interactions give the user a better experience, more motivation and fewer errors. On the other hand, a higher number of annotations generates more information data for the Meta-Info. Gen. (Meta-Information Generation), representing an important variable to focus on. This balance is desirable since we want to maximise the number of interactions and the accuracy while minimising the time spent. More interactions mean that we obtain a higher number of annotations, and thus better input for the system at the same time.

6.1.4 Number of Errors Results

Another metric, the error counter (Table 6.5), could have been disregarded, since the method chosen as the error counter did not affect the conclusions about the overall performance measures and the total number of interactions. Nevertheless, the errors (Table 6.5) must be taken into consideration when trying to understand accuracy and the time spent repeating annotations. Typical criteria and measures include the amount of time it takes users to learn the UI well enough to perform the scheduled tasks and the number of errors they make; the results (M_noe = 1.50, σ_noe = 1.414, sem_noe(x̄) = 0.4999) show that our UI is sufficiently robust, with a mean value of 1.50 errors per radiologist. This means that each radiologist, over a mean interaction time of M_time = 168.38s, makes 1.50 errors on average. These values are completely acceptable and lower than expected.

Table 6.5: Total Number of Errors.

6.2 User Experience

Affect fosters intrinsic motivation, as reflected by the choice of activity in a situation and by the rated enjoyment of a challenging task, and it also promotes responsible workload behaviour in a task completion situation. Where there were tasks to be completed, radiologists in the Positive Affect condition reduced their time on the enjoyable ones, successfully completing the task while also spending their time in a more enjoyable way. We present the results (Table 6.6 - Table 6.8) for affect, enjoyment, need satisfaction and preferences.

Affect. There was a significant main effect (M_pa = 4.38, σ_pa = 0.518, sem_pa(x̄) = 0.1831) of Positive Affect (Table 6.6): the median value (m_pa = 4.00) is the same as the minimum value (min_pa = 4.00) and below the maximum value (max_pa = 5.00). A percentage of 37.50% voted x_pa = 5.00, which corresponds to the maximum value attributed in these results, while the remaining 62.50% voted x_pa = 4.00, corresponding to the minimum value. The marginal effect (Table 6.6) of Negative Affect (M_na = 1.50, σ_na = 0.756, sem_na(x̄) = 0.2672) is, as commonly expected, low in comparison to the good results of Positive Affect, with the minimum value (min_na = 1.00) covering 62.50% of the radiologists and a maximum value of max_na = 3.00. A percentage of 62.50% voted x_na = 1.00, a percentage of 25% voted x_na = 2.00 and, finally, a percentage of 12.50%27 voted x_na = 3.00.

Enjoyment. The results (M_enj = 4.25, σ_enj = 0.463, sem_enj(x̄) = 0.1636) from the IMI, and more specifically about enjoyment (Table 6.7), were clearly positive: a high percentage of 75% voted x_enj = 4.00 and the other 25% voted the maximum value of x_enj = 5.00 on the 5-point Likert scale. The median enjoyment (m_enj = 4.00) was also high.

Competence. Radiologists also perceived themselves as competent (Table 6.7) when using our UI prototype features and tools. A percentage of 50% selected the maximum value of x_comp = 5.00 on the

27 A percentage of 12.50% of the radiologists corresponds to one in eight radiologists.

Table 6.6: Positive and Negative Affect.

Likert scale, while the other 50% chose x_comp = 4.00. Here the results are even higher (M_comp = 4.50, σ_comp = 0.535, sem_comp(x̄) = 0.1891). The median value (m_comp = 4.50) is further evidence that radiologists felt competent using our UI prototype.

Autonomy. Autonomy (Table 6.7), like competence, showed that our UI prototype gave the radiologists a good experience with respect to these conditions (competence and autonomy). The results (M_aut = 4.50, σ_aut = 0.535, sem_aut(x̄) = 0.1891) showed that, as in the competence condition, a percentage of 50% selected the maximum value of x_aut = 5.00, while the other 50% chose x_aut = 4.00. The median value (m_aut = 4.50) is further evidence that radiologists felt autonomous.

Relatedness. As a scale, relatedness (Table 6.7) assesses the affective, cognitive and experiential aspects of the radiologists' connection to an earlier, similar experience. Observing the results, we can conclude that a percentage of 50% of the radiologists felt highly related (x_rela = 4.00) to our UI prototype. A percentage of 37.50% (x_rela = 5.00) felt extremely related to the UI prototype, while only 12.50% (x_rela = 3.00) felt normally related. The mean value (M_rela = 4.25, σ_rela = 0.707, sem_rela(x̄) = 0.2499) is higher than average, so relatedness is also a preserved condition in the study. This scale correlated as expected with other condition measures, and relatedness was a better predictor of radiologist involvement, sustainable consumption and user identification. The relatedness results support the validity of the affective, cognitive and experiential aspects of radiologist connection with the UI prototype.

Immersion. Several aspects related to immersion were discovered in these results that correspond to the existing research on the immersion scale. Regarding the entry into a diagnosis, assessment is integral to any level of radiologist engagement with a tool in the radiology room. The more a task is oriented to the UI, the more users will put up with minor usability issues, provided the overall experience is pleasurable. Our immersion levels showed a significant positive mean effect

Table 6.7: Intrinsic Motivation Inventory.

(M_imme = 4.00, σ_imme = 0.756, sem_imme(x̄) = 0.2672) for this condition. While 25% of the radiologists voted x_imme = 5.00, the maximum value, another 25% voted x_imme = 3.00; the remaining 50% voted x_imme = 4.00, an average value representing the majority, as the median value (m_imme = 4.00) also showed. In contrast, engagement, and therefore enjoyment through immersion, is not possible if there are usability and control problems; usability flaws could hinder it.

Intuitive Controls. Using specific subscales, several hypotheses are tested from the ENS conditions. We first explored the main and interactive effects of radiologists on each task variable. The results were positive (M_ic = 4.75, σ_ic = 0.463, sem_ic(x̄) = 0.1636) for this condition: 75% of the radiologists said that the controls were highly intuitive (x_ic = 5.00), while the other 25% answered that the controls were intuitive (x_ic = 4.00).

Preferences. As preferences are scaled differently (1 ≤ x_pref ≤ 4.00), the mean value (M_pref = 2.63, σ_pref = 0.916, sem_pref(x̄) = 0.3238) is closer to the middle of the scale (m_pref = 3.00), meaning that radiologists preferred our tool for the proposed solution.

These results indicate that affect does foster intrinsic motivation, as well as enjoyment of the tasks. The reactions of participants in the affect condition may illustrate radiologists' typical, everyday thinking and self-regulation concerning intrinsic motivation and need satisfaction. We found that these radiologists were responsive to an intrinsic motivational orientation. These findings are significant because self-determination research shows [112, 113, 114] rather convincingly that radiologists who lack an intrinsic motivational orientation and are externally regulated show relatively poor functioning.

Table 6.8: Experience Needs of Satisfaction.

6.3 Summary

To summarise, in this section we discuss how our results suggest that the performance and experience of our UI prototype validate its effect on radiologist behaviour, as well as on subjective user preferences. For testing significance, we used the non-parametric Kruskal-Wallis analysis of variance by ranks (Table 6.9) and post-hoc THSD tests. We applied the THSD tests for pairwise comparisons, since the THSD test is more sensitive when making a large number of comparisons than other commonly used tests, as further detailed in the Recommendations & Discussion section. The Kruskal-Wallis test (Table 6.9) showed a significant difference28 between subjects with different expert levels (Table 5.1) for all tasks, indicating that the average score (χ²_exp = 2.705, α_exp = 0.911) of both performance and experience received from the subjects with a lower expert level was lower than the average scores from the subjects with a high expert level in radiology. That means more and better-generated information data for the Meta-Info. Gen. (Meta-Information Generation). For instance, if we further analyse the 50% highest-scoring radiologists (in order: eight, two, five and six), we can see that three of them have a senior expert level and just one has a junior expert level. Next, we describe how we conducted the data examination.

28 Information provided by the SPSS software, which allows us to test whether our sample means (interval variables of a Normal Distribution) significantly differ from our hypothesised values.

Table 6.9: Kruskal-Wallis (H) Test Mean Rank.

For that we need a set of variables that are defined as follows:

grp: grouping (grp) variable with several grouping options;

exp: expert (exp) level of the grouping (grp) variable;

kw: Kruskal-Wallis (kw) test option of the grouping (grp) variable or condition;

thsd: THSD (thsd) test option of the grouping (grp) variable or condition;

χ²_grp: Chi-Square of the grouping (grp) variable;

α_grp: Alpha value of the grouping (grp) variable.
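For reference, the sketch below computes the Kruskal-Wallis H statistic for scores grouped by expert level. It is a minimal TypeScript illustration with hypothetical group data (the actual analysis was performed in SPSS); ties receive averaged ranks, and no tie correction is applied.

// Kruskal-Wallis H statistic for k independent groups (no tie correction).
function kruskalWallisH(groups: number[][]): number {
  // Pool all observations, remembering their group index.
  const pooled = groups.flatMap((g, gi) => g.map((v) => ({ v, gi })));
  pooled.sort((a, b) => a.v - b.v);

  // Assign 1-based ranks, averaging over runs of tied values.
  const ranks: number[] = new Array(pooled.length).fill(0);
  let i = 0;
  while (i < pooled.length) {
    let j = i;
    while (j + 1 < pooled.length && pooled[j + 1].v === pooled[i].v) j++;
    const avgRank = (i + j + 2) / 2;
    for (let k = i; k <= j; k++) ranks[k] = avgRank;
    i = j + 1;
  }

  // Sum the ranks within each group.
  const rankSums: number[] = new Array(groups.length).fill(0);
  pooled.forEach((p, idx) => { rankSums[p.gi] += ranks[idx]; });

  // H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1).
  const n = pooled.length;
  const sum = groups.reduce((acc, g, gi) => acc + rankSums[gi] ** 2 / g.length, 0);
  return (12 / (n * (n + 1))) * sum - 3 * (n + 1);
}

// Hypothetical scores grouped by expert level: Intern, Junior, Senior.
console.log(kruskalWallisH([[12, 15], [22, 30, 25], [20, 18, 41]])); // ≈ 4.25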

Post-hoc THSD tests (p ≤ 0.05) were conducted to further examine the effect of our UI prototype complexity on the condition differences. The Kruskal-Wallis test (M_kw = 19.98, σ_kw = 49.94, sem_kw(x̄) = 4.7188), grouped by the expert level of the radiologists, revealed a tendential main effect of performance and experience for the Senior expert level of the grouping variables. On the other hand, the post-hoc THSD tests (q(1.28) = 0.671, p = 0.03) revealed (Table 6.10) that the Junior values (M_thsd = 22.47, σ_thsd = 40.28, sem_thsd(x̄) = 3.8061) prevailed against the Senior (M_thsd = 20.43, σ_thsd = 57.34, sem_thsd(x̄) = 5.4181) and Intern (M_thsd = 12.76, σ_thsd = 20.15, sem_thsd(x̄) = 1.9040) values. The statistical reason is that Senior radiologist three has low levels of performance (Table 6.1 - Table 6.5) and experience (Table 6.6 - Table 6.8), while Junior radiologist two achieved great results on time, with a good ratio between annotations and interactions, as well as on affect, intrinsic motivation and need satisfaction. The singular Intern result (M_thsd = 12.76, σ_thsd = 20.15, sem_thsd(x̄) = 1.9040) demonstrated that a low expert level influences the quality and quantity of the generated information data for the Meta-Info. Gen. (Meta-Information Generation).

Table 6.10: Post-Hoc THSD Descriptive Statistics by Expert Level.

The difference shown by the Kruskal-Wallis test (H = 56.50, p = 0.089) for the expert level in comparison to the overall conditions of performance and experience allowed us to conclude that there is a relation between the conditions and the radiologist profile. Also, trivial relations between conditions supported the results of the study. We conducted an empirical study in which radiologists were asked to perform several tasks; all tasks are part of the medical imaging diagnosis of a radiologist and important for its successful completion. As such, the successful completion of a task is directly related to the success of the information data generated from the DICOM images29, which is so important for the Meta-Info. Gen. (Meta-Information Generation). Statistical analysis revealed that the expert level of the radiologists influenced the performance and experience conditions. Also, the results showed that the UI prototype is robust enough to be pursued in production. This research is part of a larger project aimed at developing a multimodal medical imaging system for breast cancer diagnosis, allowing radiologists to visualise and annotate several modalities of medical imaging.

29 The patient DICOM images that will be diagnosed by the radiologist.

Chapter 7

Recommendations & Discussion

The exploratory research in this section, together with the performance and experience evaluation of the system in the Evaluation part, yielded new points of possible functional improvement. The System Architecture section is also essential for debating several system decisions and their revisions. During the performance and experience evaluation, radiologists were given the possibility to provide feedback on the test case system and to indicate whether they missed specific functionality or had other issues they would like to see improved. We subdivide these improvements into the following categories: (i) functional improvements; and (ii) UI improvements. There are no direct effects of the affect, intrinsic motivation and need satisfaction results on any of the experience measures. The only interaction involving both performance and experience suggests that each condition has its pair of relations. Furthermore, these results show that radiologists are significantly happier (positive affect) remaining in their usual workflow tool. Also, radiologists showed higher competence, autonomy, relatedness and immersion, with mental involvement in the tasks, when using our UI prototype.

7.1 Functional

Functional system improvements include the option to switch between patient studies. This improvement is significant because radiologists often want to look at different reviews, or at old and new image studies of the patient. It should also be possible to filter by image source type, for example between MG, US, MRI and X-Ray scans. Ideally, the system should also be able to display 3D image reconstructions and allow for 3D interaction, but this is not a requirement and would demand a more complex interaction style in three dimensions. Less frequently mentioned, but potentially valuable, suggestions include the addition of optional patient information, such as a historical study directly available in our system. While working on Cornerstone and its associated repositories to remove the jQuery dependency, we came more and more to the conclusion that the architecture of our system is hard to test and maintain. There are so many events fired that it is tough to follow the program flow and to remove the jQuery event dependency. In fact, the system would greatly benefit from a reactive programming architecture [115] as a more scalable solution. Another future improvement is the desire for SVG overlaid on the canvas instead of the current drawing approach, which requires the image to be re-rendered when a drawing is moved. With the WebGL renderer this becomes only a minor issue, but it exists nonetheless. Another idea would be to further formalise the above approach and define some well-structured hooks. The tools would need to implement functions with specific signatures that would get called at the appropriate moment of the processing pipeline (Figure 4.5). Something that needs to be specified

is the order in which the handlers of the different tools get called. Ideally, this would be deterministic and configurable, perhaps by setting a priority per tool or something similar; a sketch of such a mechanism follows. On the other hand, if getting SVG overlays into the right position proves too complicated, an alternative is to store the rendered image in a buffer and, when the data changes, load the image back from the buffer and draw the tools over it.
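As a thought experiment, such a hook mechanism could look like the following TypeScript sketch; all names (stages, registry, priorities) are hypothetical and do not correspond to an existing Cornerstone API.

// Hypothetical pipeline stages at which tool handlers can be invoked.
type Stage = "preRender" | "render" | "postRender";

interface ToolHook {
  tool: string;
  priority: number; // lower runs first; configurable per tool
  handler: (stage: Stage) => void;
}

class HookRegistry {
  private hooks: ToolHook[] = [];

  register(hook: ToolHook): void {
    this.hooks.push(hook);
    // Keep a deterministic call order: priority first, then tool name.
    this.hooks.sort(
      (a, b) => a.priority - b.priority || a.tool.localeCompare(b.tool));
  }

  // Invoke every registered handler for the given pipeline stage.
  run(stage: Stage): void {
    for (const h of this.hooks) h.handler(stage);
  }
}

// Usage: the zoom tool is given precedence over annotation drawing.
const registry = new HookRegistry();
registry.register({ tool: "zoom", priority: 0, handler: (s) => console.log(`zoom:${s}`) });
registry.register({ tool: "annotate", priority: 10, handler: (s) => console.log(`annotate:${s}`) });
registry.run("render");

Making both the order and the priority explicit would address the non-deterministic event flow noted above while keeping the mechanism configurable per tool.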

7.2 User Interface

It is possible to improve the UI of the system by extending the current interface with a screen to switch between different studies. This could be visualised using large thumbnails of preselected (pre-operative) studies, and by adding an extra filter through which radiologists can access diverse settings, such as comparing with other patient information. An implementation with complex mouse interactions can be achieved using keyboard modifiers such as SPACE, SHIFT, CTRL and ALT in several combinations. The definition of constants for keyboard and mouse buttons would be a good opportunity to improve interaction performance and reduce the time spent during annotations. Another improvement could be a key (usually DEL) to remove the currently highlighted annotation. Finally, on the same keyboard and mouse interaction topic, it could be interesting to provide an API to store and retrieve the enabled/activated tools for each mouse button; this would make it easier for radiologists to switch tools in their applications (see the sketch after this paragraph). Further improvements to the UI must also be considered. For instance, touch panel gestures would be necessary for the following enhancements: (i) a pinch gesture to zoom; (ii) a gesture to control window and level mode; (iii) gestures to control the DICOM stacks; and (iv) a gesture mode to change buttons. Additionally, we present thoughts on more esoteric gesture interactions with our system as recommendations for those who might continue our research work, such as reordering or shuffling the DICOM images using gestures.
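The constants and the store/retrieve API suggested above could be sketched as follows; the names are hypothetical, and the snippet only assumes standard DOM events, not any existing Cornerstone interface.

// Hypothetical keyboard constants, matching DOM KeyboardEvent.key values.
const Keys = {
  SPACE: " ",
  SHIFT: "Shift",
  CTRL: "Control",
  ALT: "Alt",
  DEL: "Delete",
} as const;

type MouseButton = 0 | 1 | 2; // left, middle, right (MouseEvent.button)

// Store/retrieve the activated tool per mouse button.
class ToolBindings {
  private active = new Map<MouseButton, string>();

  activate(button: MouseButton, tool: string): void {
    this.active.set(button, tool);
  }

  toolFor(button: MouseButton): string | undefined {
    return this.active.get(button);
  }
}

const bindings = new ToolBindings();
bindings.activate(0, "pan");  // left button pans
bindings.activate(2, "zoom"); // right button zooms

// Hypothetical helper standing in for the annotation store.
function removeHighlightedAnnotation(): void { /* ... */ }

// DEL removes the currently highlighted annotation, as suggested above.
document.addEventListener("keydown", (e) => {
  if (e.key === Keys.DEL) removeHighlightedAnnotation();
});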

7.3 Limitations

Although the results are positive in favour of the performance and experience evaluation in the radiology room, they need to be interpreted carefully, as the interaction with multimodal medical imaging for breast cancer diagnosis could also be benefiting from a novelty effect. Further experiments with different systems, and possibly during real breast diagnosis procedures, will have to demonstrate whether this is the case. From a system perspective, one of the issues we face is that when an image fails to load during stack scrolling, the UI does not recover on its own and we need to fix the image first. In short, if the first (displayed) image in the stack fails to load, even if the system shows a warning message over the blank canvas, the stack scroll tool breaks and there is no way to scroll down to the other images. This issue is a limitation of the stack tool. An early solution is the use of a dummy 'error' DICOM image, sketched below. We know that a reliable system is impossible without a faithful implementation. In our system, several limitations, such as the point-to-point connectivity design, pose a severe threat to reliability and scalability. In fact, our static system configuration, using statically dependent protocols such as a static DICOM representation [99] instead of the more dynamic one, eliminates any chance of gradual recovery. While this was not a requirement for this phase, it is still a limitation to keep in mind. The reason why we chose the static DICOM representation is simple: the static description is less complicated and cumbersome compared to the dynamic one.
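A minimal sketch of that early solution, assuming a promise-based image loader (the interface below is hypothetical, not the actual Cornerstone signature):

// Hypothetical image type standing in for a loaded DICOM slice.
interface DicomImage {
  imageId: string;
  pixelData: Uint8Array;
}

// Dummy "error" image rendered as a blank placeholder slice.
const ERROR_IMAGE: DicomImage = {
  imageId: "dummy://error",
  pixelData: new Uint8Array(0),
};

// Wrap the loader so a failed slice never breaks the stack scroll tool.
async function loadImageOrPlaceholder(
  imageId: string,
  load: (id: string) => Promise<DicomImage>,
): Promise<DicomImage> {
  try {
    return await load(imageId);
  } catch {
    // Keep the stack scrollable instead of leaving a broken blank canvas.
    return ERROR_IMAGE;
  }
}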

7.4 Future Work

Some possible future work using this thesis as an experimental baseline is apparent, as further research and experimentation strongly recommend the multimodality of medical imaging. It would be interesting to assess the effects of our research work on the diagnosis of breast cancer and on the accuracy of the treatment that will be given to the patient. One of the evolving future needs is the combination of radiology and historical patient information paired with an ML-based approach. This may be considered the next innovation for finding appropriate solutions for complex cases, since Deep Learning methodologies could automatically classify not only the available significant dataset of diagnostic images, but also cross this information with the patient's clinical past. Using a combination of different classification methods that are not in the same category (semantic, numeric or fuzzy) will increase the performance of many medical history search features. Although some of the criteria involved in evaluating historical information samples are often difficult to formulate in computational form, a study of the correlation between semantic-level features and the medical imaging diagnosis using ML methods can be used to develop models that relate to a reliable diagnosis of the patients. These semantic features may capture medical clues often used by expert radiologists when assessing a breast diagnosis. Semantic-level features require a significant amount of medical data, since each fundamental medical concept should be represented in the training data. Another interesting direction for future work is to incorporate our system into a global active learning mechanism. It would provide the system with more helpful diagnostic information from global medical experts and researchers on specific undiagnosed samples. This idea would also combine semi-supervised learning with active learning in the co-training paradigm, applied at a global scale. This paradigm would draw much of its strength from its global level, broad distribution, vast dataset availability, and flexible medical image debate for breast cancer. Furthermore, quantitative ML methods allow a straightforward system in which particular diagnoses could be identified and subsequently inserted into the network to be learned. Iterative diagnosis based on this approach would considerably increase the reliability of the system without compromising the efficiency of the radiology room. This approach suggests a general, rational and unbiased way to develop a live learning system for breast cancer diagnosis. Due to their quantitative nature, such policies would cover a wide range of cases of interest in breast cancer by readily integrating radiologists with the global predictions of others involved. These predictions could perform live learning diagnostics, improving the algorithms and the radiologists' diagnoses in less time and at lower cost. Finally, this is an example of an interactive feedback loop whereby advanced, computationally optimised experimental strategies improve medical research on breast cancer. Another opportunity that would improve our system's capacity to help the radiologists' workflow and patient healthcare is the fact that, over time, there has been an increase in both the complexity and the number of radiology reports.
Formerly, reports were produced by distinct methods: (a) dictation onto a handheld tape with subsequent transcription; (b) completion of a handwritten form for subsequent typing; (c) direct dictation to a secretary. These methods are time-consuming for the radiologist and entail higher costs for the health system: they are expensive in terms of transcriptionist costs, since typically some third party charges the health system for the service, and they may lead to inevitable delays in the availability of the reports. Meanwhile, there is an opportunity to integrate newer methods into our system, as there have been significant advances in speech recognition systems. Recognition and correction of speech during digital dictation have been reported as a suitable replacement for report production when compared with word-processed reports prepared by radiologists.

Chapter 8

Conclusion

There exist several research developments concerning UIs for clinical purposes. However, the effort to develop a UI in a breast multimodality screening context is still scarce. This thesis fills the above gap, presenting innovations in the field. More specifically, this thesis offered an effective way of developing a new UI to collect a significant amount of annotations that characterise the lesions under inspection (i.e. masses and calcifications). Furthermore, this collection is performed by several radiologists who can interact simultaneously and remotely with the developed UI. This functionality is a significant step towards an efficient way of collecting data, since it makes it possible to gather lesion information in a short time. Moreover, since the diagnostic process in medical imaging is mandatory for the correct detection of breast lesions, inspection by visualisation and annotation of lesions in a multimodality view is fundamental for the purpose. Another innovation presented in the system is the ability to provide the radiologist with the temporal evolution of the lesions. For this purpose, the radiologist only has to select the ID of the patient and define a period in which exams were performed (e.g. two months), and the UI can provide relevant temporal and clinical information. More specifically, it provides the following information visualisation: (i) whether (or not) a given mass is acquiring an irregular morphological shape; and (ii) whether (or not) the calcifications are progressively becoming commonly located. If one (or both) of the above conditions is met, the BI-RADS will be increasing. An empirical methodology was used in this thesis. It has proven successful in detecting the needs of radiologists, indicating the appropriate interaction system testing and performance measures, and finally determining usability improvements. It provides a practical methodology applicable to new medical interaction innovations, which will undoubtedly become more and more critical in the years to come, and it is a welcome addition to the generally slim body of HCI and UCD research on medical UIs. We also applied demographic and psychometric methods through the analysis and evaluation of questionnaires, understanding the user profile and characteristics, to measure user need satisfaction with the system usability and UX. These methods gave us information about which tools were most useful to our UI and what the priorities among the features were. We analysed surveys and quantitative measures as descriptive statistics regarding this information; meaning is added from those outcomes to the qualitative findings, providing indicators of feature positioning and interaction aspects. We were concerned with how the prototypes could be useful to the intended radiologists, and the prototype phase gave us knowledge of our system and our research study. Having completed these thesis goals, we have a specification of the system architecture with expanded capabilities, thus enabling web services using our system source, with several core services and the web browser application.

One of the evolving future needs, as a pre-diagnostic phase, is the combination of radiology and historical patient information paired with an ML-based approach. Although some of the criteria involved in evaluating historical information samples are often difficult to formulate in computational form, a study of the correlation between semantic-level features and the medical imaging diagnosis using ML methods can be used to develop models that relate to a reliable diagnosis of the patients. These semantic features may capture medical clues often used by expert radiologists when assessing a breast diagnosis. Semantic-level features require a significant amount of medical data, with each essential medical concept represented in the training data. Another interesting direction for future work is the autonomous generation of radiology reports. Here, there are several strategies to support this use case in the radiology room workflow. We can create another UI that gives radiologists information on both the patient's historical information and the medical imaging diagnosis, both supported by the ML algorithms. It would also be interesting, in this post-diagnosis phase, to integrate a speech recognition feature to handle missing information in the final report. This increment would be substantial because, if data is missing from the automatically generated reports, the radiologist would have an easy way to add information through speech recognition within the existing workflow of the radiology room. The ML algorithms could also be improved by this speech recognition technique, as it brings another learning variable to the machines.

Bibliography

[1] L. K. L. França, A. G. V. Bitencourt, H. L. S. Paiva, C. B. Silva, N. P. Pereira, J. Paludo, L. Graziano, C. S. Guatelli, J. A. d. Souza, and E. F. Marques, “Role of magnetic resonance imaging in the planning of breast cancer treatment strategies: comparison with conventional imaging techniques,” Radiologia brasileira, no. AHEAD, pp. 0–0, 2017.

[2] R. Velaga and M. Sugimoto, “Future paradigm of breast cancer resistance and treatment,” in Resistance to Targeted Therapies in Breast Cancer, pp. 155–178, Springer, 2017.

[3] W. H. Kim, J. M. Chang, H.-G. Moon, A. Yi, H. R. Koo, H. M. Gweon, and W. K. Moon, “Comparison of the diagnostic performance of digital breast tomosynthesis and magnetic resonance imaging added to digital mammography in women with known breast cancers,” European radiology, vol. 26, no. 6, pp. 1556–1564, 2016.

[4] R. L. Siegel, K. D. Miller, and A. Jemal, “Cancer statistics, 2016,” CA: a cancer journal for clinicians, vol. 66, no. 1, pp. 7–30, 2016.

[5] B. L. Sprague, R. F. Arao, D. L. Miglioretti, L. M. Henderson, D. S. Buist, T. Onega, G. H. Rauscher, J. M. Lee, A. N. Tosteson, K. Kerlikowske, et al., “National performance benchmarks for modern diagnostic digital mammography: update from the breast cancer surveillance consortium,” Radiology, vol. 283, no. 1, pp. 59–69, 2017.

[6] T. Huzarski, B. Górecka-Szyld, J. Huzarska, G. Psut-Muszyńska, G. Wilk, R. Sibilski, C. Cybulski, B. Kozak-Klonowska, M. Siołek, E. Kilar, et al., “Screening with magnetic resonance imaging, mammography and ultrasound in women at average and intermediate risk of breast cancer,” Hereditary cancer in clinical practice, vol. 15, no. 1, p. 4, 2017.

[7] N. Cho, W. Han, B.-K. Han, M. S. Bae, E. S. Ko, S. J. Nam, E. Y. Chae, J. W. Lee, S. H. Kim, B. J. Kang, et al., “Breast cancer screening with mammography plus ultrasonography or magnetic resonance imaging in women 50 years or younger at diagnosis and treated with breast conservation therapy,” JAMA oncology, vol. 3, no. 11, pp. 1495–1502, 2017.

[8] P. J. DiSaia, W. T. Creasman, R. S. Mannel, D. S. McMeekin, and D. G. Mutch, Clinical Gynecologic Oncology E-Book. Elsevier Health Sciences, 2017.

[9] C. C. Riedl, N. Luft, C. Bernhart, M. Weber, M. Bernathova, M.-K. M. Tea, M. Rudas, C. F. Singer, and T. H. Helbich, “Triple-modality screening trial for familial breast cancer underlines the importance of magnetic resonance imaging and questions the role of mammography and ultrasound regardless of patient mutation status, age, and breast density,” Journal of Clinical Oncology, vol. 33, no. 10, pp. 1128–1135, 2015.

[10] G. Scaperrotta, C. Ferranti, E. Capalbo, B. Paolini, M. Marchesini, L. Suman, C. Folini, L. Mariani, and P. Panizza, “Performance and role of the breast lesion excision system (bles) in small clusters of suspicious microcalcifications,” European journal of radiology, vol. 85, no. 1, pp. 143–149, 2016.

[11] N. E. M. Association, P. NEMA, et al., “Iso 12052, digital imaging and communications in medicine (dicom) standard,” National Electrical Manufacturers Association, Rosslyn, VA, USA, 2016.

[12] J. Vermeulen, K. Luyten, E. van den Hoven, and K. Coninx, “Crossing the bridge over norman’s gulf of execution: revealing feedforward’s true identity,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1931–1940, ACM, 2013.

[13] S. M. Stuij, “Usability evaluation of the kinect in aiding surgeon-computer interaction,” 2013.

[14] F. Paterno, Model-based design and evaluation of interactive applications. Springer Science & Business Media, 2012.

[15] E. Nabovati, H. Vakili-Arki, S. Eslami, and R. Khajouei, “Usability evaluation of laboratory and radiology information systems integrated into a hospital information system,” Journal of medical systems, vol. 38, no. 4, p. 35, 2014.

[16] W. Jorritsma, F. Cnossen, and P. M. van Ooijen, “Merits of usability testing for pacs selection,” international journal of medical informatics, vol. 83, no. 1, pp. 27–36, 2014.

[17] J. R. Lewis, “Usability: lessons learned and yet to be learned,” International Journal of Human-Computer Interaction, vol. 30, no. 9, pp. 663–684, 2014.

[18] K. Hänsel, N. Wilde, H. Haddadi, and A. Alomainy, “Challenges with current wearable technology in monitoring health data and providing positive behavioural support,” in Proceedings of the 5th EAI International Conference on Wireless Mobile Communication and Healthcare, pp. 158–161, ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2015.

[19] P. Merrell, E. Schkufza, Z. Li, M. Agrawala, and V. Koltun, “Interactive furniture layout using interior design guidelines,” in ACM Transactions on Graphics (TOG), vol. 30, p. 87, ACM, 2011.

[20] D. Maron, C. Missen, and J. Greenberg, “Lo-fi to hi-fi: a new way of conceptualizing metadata in underserved areas with the egranary digital library,” in Proceedings of the 2014 International Conference on Dublin Core and Metadata Applications, pp. 37–42, Dublin Core Metadata Initiative, 2014.

[21] J. L. Gabbard and J. E. Swan II, “Usability engineering for augmented reality: Employing user-based studies to inform design,” IEEE Transactions on visualization and computer graphics, vol. 14, no. 3, pp. 513–525, 2008.

[22] A. Khunlertkit, L. Dorissaint, A. Chen, L. Paine, and P. J. Pronovost, “Reducing and sustaining duplicate medical record creation by usability testing and system redesign.,” Journal of patient safety, 2017.

[23] M. F. Walji, E. Kalenderian, M. Piotrowski, D. Tran, K. K. Kookal, O. Tokede, J. M. White, R. Vaderhobli, R. Ramoni, P. C. Stark, et al., “Are three methods better than one? a comparative assessment of usability evaluation methods in an ehr,” International journal of medical informatics, vol. 83, no. 5, pp. 361–367, 2014.

[24] D. A. Hanauer, D. T. Wu, L. Yang, Q. Mei, K. B. Murkowski-Steffy, V. V. Vydiswaran, and K. Zheng, “Development and empirical user-centered evaluation of semantically-based query recommendation for an search engine,” Journal of Biomedical Informatics, vol. 67, pp. 1–10, 2017.

[25] J. Mirkovic, O. B. Kristjansdottir, U. Stenberg, T. Krogseth, K. C. Stange, and C. M. Ruland, “Patient insights into the design of technology to support a strengths-based approach to health care,” JMIR research protocols, vol. 5, no. 3, 2016.

[26] S. A. Mummah, T. N. Robinson, A. C. King, C. D. Gardner, and S. Sutton, “Ideas (integrate, design, assess, and share): A framework and toolkit of strategies for the development of more effective digital interventions to change health behavior,” Journal of medical Internet research, vol. 18, no. 12, 2016.

[27] W. Seidelman, M. Lee, C. Carswell, T. Kent, B. Fu, and R. Yang, “User centered design of a hybrid-reality display for weld monitoring,” in CHI’14 Extended Abstracts on Human Factors in Computing Systems, pp. 2059–2064, ACM, 2014.

[28] R. M. Ratwani, A. Zachary Hettinger, A. Kosydar, R. J. Fairbanks, and M. L. Hodgkins, “A framework for evaluating electronic health record vendor user-centered design and usability testing processes,” Journal of the American Medical Informatics Association, vol. 24, no. e1, pp. e35–e39, 2017.

[29] S. T. Rosenbloom, R. A. Miller, K. B. Johnson, P. L. Elkin, and S. H. Brown, “A model for evaluating interface terminologies,” Journal of the American Medical Informatics Association, vol. 15, no. 1, pp. 65–76, 2008.

[30] A. C. Li, J. L. Kannry, A. Kushniruk, D. Chrimes, T. G. McGinn, D. Edonyabo, and D. M. Mann, “Integrating usability testing and think-aloud protocol analysis with near-live clinical simulations in evaluating clinical decision support,” International journal of medical informatics, vol. 81, no. 11, pp. 761–772, 2012.

[31] J. Viitanen, H. Hyppönen, T. Lääveri, J. Vänskä, J. Reponen, and I. Winblad, “National questionnaire study on clinical ict systems proofs: physicians suffer from poor usability,” International journal of medical informatics, vol. 80, no. 10, pp. 708–725, 2011.

[32] J. E. Bardram, “Activity-based computing for medical work in hospitals,” ACM Trans. Comput.-Hum. Interact., vol. 16, no. 2, pp. 1–36, 2009.

[33] A. M. Ahmad, G. M. Khan, S. A. Mahmud, and J. F. Miller, “Breast cancer detection using cartesian genetic programming evolved artificial neural networks,” in Proceedings of the 14th annual conference on Genetic and evolutionary computation, pp. 1031–1038, ACM, 2012.

[34] P. Filipczuk, T. Fevens, A. Krzyzak, and R. Monczak, “Computer-aided breast cancer diagnosis based on the analysis of cytological images of fine needle biopsies,” IEEE Transactions on Medical Imaging, vol. 32, no. 12, pp. 2169–2178, 2013.

[35] A. M. Ahmad, G. M. Khan, S. A. Mahmud, and J. F. Miller, “Breast cancer detection using cartesian genetic programming evolved artificial neural networks,” in Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, GECCO ’12, (New York, NY, USA), pp. 1031–1038, ACM, 2012.

[36] K. Chuang, Picture archiving and communication systems (PACS) in medicine, vol. 74. Springer Science & Business Media, 2013.

[37] M. Mustra, K. Delac, and M. Grgic, “Overview of the dicom standard,” in ELMAR, 2008. 50th International Symposium, vol. 1, pp. 39–44, IEEE, 2008.

[38] S. Jodogne, C. Bernard, M. Devillers, E. Lenaerts, and P. Coucke, “Orthanc – A lightweight, RESTful DICOM server for healthcare and medical research,” in Biomedical Imaging (ISBI), IEEE 10th International Symposium on, (San Francisco, CA, USA), pp. 190–193, Apr. 2013.

[39] H. Huang, “Medical imaging, pacs, and imaging informatics: retrospective,” Radiological physics and technology, vol. 7, no. 1, pp. 5–24, 2014.

[40] B. van Ginneken and C. L. Novak, “Computer-aided diagnosis,” in Proceedings of SPIE, vol. 8315, pp. 831501–83, 2012.

[41] C. Ardito, P. Buono, M. F. Costabile, and R. Lanzilotti, “Two different interfaces to visualize patient histories on a pda,” in Proceedings of the 8th conference on Human-computer interaction with mobile devices and services, pp. 37–40, ACM, 2006.

[42] L. Wilcox, J. Lu, J. Lai, S. Feiner, and D. Jordan, “Activenotes: computer-assisted creation of patient progress notes,” in CHI’09 Extended Abstracts on Human Factors in Computing Systems, pp. 3323–3328, ACM, 2009.

[43] S. Grange, “Medical/operating room interaction system,” 2007.

[44] A. V. Reinschluessel, J. Teuber, M. Herrlich, J. Bissel, M. van Eikeren, J. Ganser, F. Koeller, F. Kollasch, T. Mildner, L. Raimondo, et al., “Virtual reality for user-centered design and evaluation of touch-free interaction techniques for navigating medical images in the operating room,” in Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 2001–2009, ACM, 2017.

[45] R. Wen, B. P. Nguyen, C.-B. Chng, and C.-K. Chui, “In situ spatial ar surgical planning using projector-kinect system,” in Proceedings of the Fourth Symposium on Information and Communication Technology, SoICT ’13, (New York, NY, USA), pp. 164–171, ACM, 2013.

[46] J.-Z. Cheng, D. Ni, Y.-H. Chou, J. Qin, C.-M. Tiu, Y.-C. Chang, C.-S. Huang, D. Shen, and C.-M. Chen, “Computer-aided diagnosis with deep learning architecture: applications to breast lesions in us images and pulmonary nodules in ct scans,” Scientific reports, vol. 6, p. 24454, 2016.

[47] T. Sielhorst, M. Feuerstein, and N. Navab, “Advanced medical displays: A literature review of augmented reality,” Journal of Display Technology, vol. 4, no. 4, pp. 451–467, 2008.

[48] D. Guha, N. M. Alotaibi, N. Nguyen, S. Gupta, C. McFaul, and V. X. Yang, “Augmented reality in neurosurgery: A review of current concepts and emerging applications,” Canadian Journal of Neurological Sciences, vol. 44, no. 3, pp. 235–245, 2017.

[49] E. Garai, S. Sensarn, C. L. Zavaleta, N. O. Loewke, S. Rogalla, M. J. Mandella, S. A. Felt, S. Friedland, J. T. Liu, S. S. Gambhir, et al., “A real-time clinical endoscopic system for intraluminal, multiplexed imaging of surface-enhanced raman scattering nanoparticles,” PLoS One, vol. 10, no. 4, p. e0123185, 2015.

[50] L. Maier-Hein, P. Mountney, A. Bartoli, H. Elhawary, D. Elson, A. Groch, A. Kolb, M. Rodrigues, J. Sorger, S. Speidel, et al., “Optical techniques for 3d surface reconstruction in computer-assisted laparoscopic surgery,” Medical image analysis, vol. 17, no. 8, pp. 974–996, 2013.

[51] P. Carayon, T. B. Wetterneck, A. J. Rivera-Rodriguez, A. S. Hundt, P. Hoonakker, R. Holden, and A. P. Gurses, “Human factors systems approach to healthcare quality and patient safety,” Applied ergonomics, vol. 45, no. 1, pp. 14–25, 2014.

[52] P. Carayon, Handbook of human factors and ergonomics in health care and patient safety. CRC Press, 2016.

[53] R. J. Shaw, M. M. Horvath, D. Leonard, J. M. Ferranti, and C. M. Johnson, “Developing a user-friendly interface for a self-service healthcare research portal: cost-effective usability testing,” Health Systems, vol. 4, no. 2, pp. 151–158, 2015.

[54] K. Sun, T. S. Pheiffer, A. L. Simpson, J. A. Weis, R. C. Thompson, and M. I. Miga, “Near real-time computer assisted surgery for brain shift correction using biomechanical models,” IEEE journal of translational engineering in health and medicine, vol. 2, pp. 1–13, 2014.

[55] R. R. Bond, D. D. Finlay, C. D. Nugent, G. Moore, and D. Guldenring, “A usability evaluation of medical software at an expert conference setting,” Computer methods and programs in biomedicine, vol. 113, no. 1, pp. 383–395, 2014.

[56] A. D. V. Dabbs, B. A. Myers, K. R. Mc Curry, J. Dunbar-Jacob, R. P. Hawkins, A. Begey, and M. A. Dew, “User-centered design and interactive health technologies for patients,” Computers, informatics, nursing: CIN, vol. 27, no. 3, p. 175, 2009.

[57] T. J. Hagedorn, S. Krishnamurty, and I. R. Grosse, “An information model to support user-centered design of medical devices,” Journal of biomedical informatics, vol. 62, pp. 181–194, 2016.

[58] B. KAYGIN and M. DEMİR, “A research on the importance of user-centered design in furniture,”

[59] F. M. Calisto, A. Ferreira, J. C. Nascimento, and D. Gonçalves, “Towards touch-based medical image diagnosis annotation,” in Proceedings of the 2017 ACM International Conference on Interactive Surfaces and Spaces, pp. 390–395, ACM, 2017.

[60] A. C. McLaughlin, W. A. Rogers, and A. D. Fisk, “Using direct and indirect input devices: Attention demands and age-related differences,” ACM Transactions on Computer-Human Interaction (TOCHI), vol. 16, no. 1, p. 2, 2009.

[61] S. A. Jax, D. A. Rosenbaum, and J. Vaughan, “Extending fitts law to manual obstacle avoidance,” Experimental brain research, vol. 180, no. 4, pp. 775–779, 2007.

[62] H. Benko and D. Wigdor, “Imprecision, inaccuracy, and frustration: The tale of touch input,” in Tabletops-Horizontal Interactive Displays, pp. 249–275, Springer, 2010.

[63] L. Vlaming, C. Collins, M. Hancock, M. Nacenta, T. Isenberg, and S. Carpendale, “Integrating 2d mouse emulation with 3d manipulation for visualizations on a multi-touch table,” in ACM International Conference on Interactive Tabletops and Surfaces, pp. 221–230, ACM, 2010.

[64] N. Caprani, N. E. O’Connor, and C. Gurrin, “Touch screens for the older user,” in Assistive technologies, InTech, 2012.

[65] L. G. Motti, N. Vigouroux, and P. Gorce, “Interaction techniques for older adults using touchscreen devices: a literature review,” in Proceedings of the 25th Conference on l’Interaction Homme-Machine, p. 125, ACM, 2013.

[66] C. Forlines, D. Wigdor, C. Shen, and R. Balakrishnan, “Direct-touch vs. mouse input for tabletop displays,” in Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 647–656, ACM, 2007.

[67] A. Ebert, M. Deller, D. Steffen, and M. Heintz, “where did i put that?–effectiveness of kinesthetic memory in immersive virtual environments,” Universal Access in Human-Computer Interaction. Applications and Services, pp. 179–188, 2009.

[68] P. Kapur, M. Jensen, L. J. Buxbaum, S. A. Jax, and K. J. Kuchenbecker, “Spatially distributed tactile feedback for kinesthetic motion guidance,” in Haptics Symposium, 2010 IEEE, pp. 519–526, IEEE, 2010.

[69] V. Nacher, A. Ferreira, J. Jaen, and F. Garcia-Sanjuan, “Are kindergarten children ready for indirect drag interactions?,” in Proceedings of the 2016 ACM on Interactive Surfaces and Spaces, pp. 95–101, ACM, 2016.

[70] A. Withana, M. Kondo, Y. Makino, G. Kakehi, M. Sugimoto, and M. Inami, “Impact: Immersive haptic stylus to enable direct touch and manipulation for surface computing,” Computers in Entertainment (CIE), vol. 8, no. 2, p. 9, 2010.

[71] M. R. Bhalla and A. V. Bhalla, “Comparative study of various touchscreen technologies,” International Journal of Computer Applications, vol. 6, no. 8, pp. 12–18, 2010.

[72] U. Schultheis, J. Jerald, F. Toledo, A. Yoganandan, and P. Mlyniec, “Comparison of a two-handed interface to a wand interface and a mouse interface for fundamental 3d tasks,” in 3D User Interfaces (3DUI), 2012 IEEE Symposium on, pp. 117–124, IEEE, 2012.

[73] M. Sousa, D. Mendes, S. Paulo, N. Matela, J. Jorge, and D. S. Lopes, “Vrrrroom: Virtual reality for radiologists in the reading room,” in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 4057–4062, ACM, 2017.

[74] S. Ullrich, T. Knott, Y. C. Law, O. Grottke, and T. Kuhlen, “Influence of the bimanual frame of reference with haptics for unimanual interaction tasks in virtual environments,” in 3D User Interfaces (3DUI), 2011 IEEE Symposium on, pp. 39–46, IEEE, 2011.

[75] C. Geiger and O. Rattay, “Tubemouse-a two-handed input device for flexible objects,” in 3D User Interfaces, 2008. 3DUI 2008. IEEE Symposium on, pp. 27–34, IEEE, 2008.

[76] L. Yu, P. Svetachov, P. Isenberg, M. H. Everts, and T. Isenberg, “Fi3d: Direct-touch interaction for the exploration of 3d scientific visualization spaces,” IEEE transactions on visualization and computer graphics, vol. 16, no. 6, pp. 1613–1622, 2010.

[77] C. B. Irwin and M. E. Sesto, “Performance and touch characteristics of disabled and non-disabled participants during a reciprocal tapping task using touch screen technology,” Applied ergonomics, vol. 43, no. 6, pp. 1038–1043, 2012.

[78] H. Huang and H.-H. Lai, “Factors influencing the usability of icons in the lcd touchscreen,” Displays, vol. 29, no. 4, pp. 339–344, 2008.

[79] C. Lundstrom, T. Rydell, C. Forsell, A. Persson, and A. Ynnerman, “Multi-touch table system for medical visualization: Application to orthopedic surgery planning,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pp. 1775–1784, 2011.

[80] A. Jalalian, S. B. Mashohor, H. R. Mahmud, M. I. B. Saripan, A. R. B. Ramli, and B. Karasfi, “Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: a review,” Clinical imaging, vol. 37, no. 3, pp. 420–426, 2013.

[81] K. Ganesan, U. R. Acharya, C. K. Chua, L. C. Min, K. T. Abraham, and K.-H. Ng, “Computer-aided breast cancer detection using mammograms: a review,” IEEE Reviews in biomedical engineering, vol. 6, pp. 77–98, 2013.

[82] B. van Ginneken, C. M. Schaefer-Prokop, and M. Prokop, “Computer-aided diagnosis: how to move from the laboratory to the clinic,” Radiology, vol. 261, no. 3, pp. 719–732, 2011.

[83] J.-H. Chen, H. Yu, M. Lin, R. S. Mehta, and M.-Y. Su, “Background parenchymal enhancement in the contralateral normal breast of patients undergoing neoadjuvant chemotherapy measured by DCE-MRI,” Magnetic Resonance Imaging, vol. 31, no. 9, pp. 1465–1471, 2013.

[84] M. J. Yaffe, “Mammographic density. Measurement of mammographic density,” Breast Cancer Research, vol. 10, no. 3, p. 209, 2008.

[85] D. Saslow, C. Boetes, W. Burke, S. Harms, M. O. Leach, C. D. Lehman, E. Morris, E. Pisano, M. Schnall, S. Sener, et al., “American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography,” CA: A Cancer Journal for Clinicians, vol. 57, no. 2, pp. 75–89, 2007.

[86] F. Sardanelli, F. Podo, F. Santoro, S. Manoukian, S. Bergonzi, G. Trecate, D. Vergnaghi, M. Federico, L. Cortesi, S. Corcione, et al., “Multicenter surveillance of women at high genetic breast cancer risk using mammography, ultrasonography, and contrast-enhanced magnetic resonance imaging (the High Breast Cancer Risk Italian 1 study): final results,” Investigative Radiology, vol. 46, no. 2, pp. 94–105, 2011.

[87] J. Tang, R. M. Rangayyan, J. Xu, I. El Naqa, and Y. Yang, “Computer-aided detection and diagnosis of breast cancer with mammography: recent advances,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 2, pp. 236–251, 2009.

[88] J. J. Fenton, S. H. Taplin, P. A. Carney, L. Abraham, E. A. Sickles, C. D’Orsi, E. A. Berns, G. Cutter, R. E. Hendrick, W. E. Barlow, et al., “Influence of computer-aided detection on performance of screening mammography,” New England Journal of Medicine, vol. 356, no. 14, pp. 1399–1409, 2007.

[89] C. Dromain, B. Boyer, R. Ferre, S. Canale, S. Delaloge, and C. Balleyguier, “Computed-aided diagnosis (CAD) in the detection of breast cancer,” European Journal of Radiology, vol. 82, no. 3, pp. 417–423, 2013.

[90] N. Berger, Z. Varga, T. Frauenfelder, and A. Boss, “MRI-guided breast vacuum biopsy: Localization of the lesion without contrast-agent application using diffusion-weighted imaging,” Magnetic Resonance Imaging, vol. 38, pp. 1–5, 2017.

[91] P. Palanque, J.-F. Ladry, D. Navarre, and E. Barboni, “High-fidelity prototyping of interactive systems can be formal too,” Human-Computer Interaction. New Trends, pp. 667–676, 2009.

[92] L. van Velsen, T. van der Geest, M. ter Hedde, and W. Derks, “Requirements engineering for e-government services: A citizen-centric approach and case study,” Government Information Quarterly, vol. 26, no. 3, pp. 477–486, 2009.

[93] D. Haak, C.-E. Page, and T. M. Deserno, “A survey of DICOM viewer software to integrate clinical research and medical imaging,” Journal of Digital Imaging, vol. 29, no. 2, pp. 206–215, 2016.

[94] D. E. Irwin, J. W. Varni, K. Yeatts, and D. A. DeWalt, “Cognitive interviewing methodology in the development of a pediatric item bank: a patient reported outcomes measurement information system (PROMIS) study,” Health and Quality of Life Outcomes, vol. 7, no. 1, p. 3, 2009.

[95] S. Faranello, Balsamiq wireframes quickstart guide. Packt Publishing Ltd, 2012.

[96] M. Kenteris, D. Gavalas, and D. Economou, “An innovative mobile electronic tourist guide application,” Personal and Ubiquitous Computing, vol. 13, no. 2, pp. 103–118, 2009.

[97] F. Valente, C. Viana-Ferreira, C. Costa, and J. L. Oliveira, “A RESTful image gateway for multiple medical image repositories,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 3, pp. 356–364, 2012.

[98] A. Arbelaiz, A. Moreno, L. Kabongo, and A. García-Alonso, “Volume visualization tools for medical applications in ubiquitous platforms,” in eHealth 360°, pp. 443–450, Springer, 2017.

[99] W. E. Hammond, “Standards for global health information systems,” Global Health Informatics: How Information Technology Can Change Our Lives in a Globalized World, p. 94, 2016.

[100] S. Crisan and I. G. Tarnovan, “Optimization of a multi-touch sensing device for biomedical applications,” in Advanced Engineering Forum, vol. 8, pp. 545–552, Trans Tech Publications, 2013.

[101] D. Grimes, D. S. Tan, S. E. Hudson, P. Shenoy, and R. P. Rao, “Feasibility and pragmatics of classifying working memory load with an electroencephalograph,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 835–844, ACM, 2008.

[102] D. C. Montgomery, Design and analysis of experiments. John Wiley & Sons, 2017.

[103] D. D. Salvucci and P. Bogunovich, “Multitasking and monotasking: the effects of mental workload on deferred task interruptions,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 85–88, ACM, 2010.

[104] G. W. Corder and D. I. Foreman, Nonparametric statistics: A step-by-step approach. John Wiley & Sons, 2014.

[105] J. Lazar, J. H. Feng, and H. Hochheiser, Research methods in human-computer interaction. Morgan Kaufmann, 2017.

[106] H. D. Bondell and B. J. Reich, “Simultaneous factor selection and collapsing levels in ANOVA,” Biometrics, vol. 65, no. 1, pp. 169–177, 2009.

[107] E. L. Merz, V. L. Malcarne, S. C. Roesch, C. M. Ko, M. Emerson, V. G. Roma, and G. R. Sadler, “Psychometric properties of Positive and Negative Affect Schedule (PANAS) original and short forms in an African American community sample,” Journal of Affective Disorders, vol. 151, no. 3, pp. 942–949, 2013.

[108] B. S. Bell and S. W. Kozlowski, “Active learning: effects of core training design elements on self-regulatory processes, learning, and adaptability,” Journal of Applied Psychology, vol. 93, no. 2, p. 296, 2008.

[109] A. Broeck, M. Vansteenkiste, H. Witte, B. Soenens, and W. Lens, “Capturing autonomy, competence, and relatedness at work: Construction and initial validation of the work-related basic need satisfaction scale,” Journal of Occupational and Organizational Psychology, vol. 83, no. 4, pp. 981–1002, 2010.

[110] S. Kaplan, J. C. Bradley, J. N. Luchman, and D. Haynes, “On the role of positive and negative affectivity in job performance: a meta-analytic investigation,” Journal of Applied Psychology, vol. 94, no. 1, p. 162, 2009.

[111] D. J. Kuss, J. Louws, and R. W. Wiers, “Online gaming addiction? Motives predict addictive play behavior in massively multiplayer online role-playing games,” Cyberpsychology, Behavior, and Social Networking, vol. 15, no. 9, pp. 480–485, 2012.

[112] J. W. Adie, J. L. Duda, and N. Ntoumanis, “Autonomy support, basic need satisfaction and the optimal functioning of adult male and female sport participants: A test of basic needs theory,” Motivation and Emotion, vol. 32, no. 3, pp. 189–199, 2008.

[113] C. Lonsdale, C. M. Sabiston, I. M. Taylor, and N. Ntoumanis, “Measuring student motivation for physical education: examining the psychometric properties of the perceived locus of causality questionnaire and the situational motivation scale,” Psychology of Sport and Exercise, vol. 12, no. 3, pp. 284–292, 2011.

[114] E. L. Deci and R. M. Ryan, Self-determination. Wiley Online Library, 2010.

[115] S. Mansilla, Reactive programming with RxJS: untangle your asynchronous JavaScript code. The Pragmatic Programmers, 2015.

Appendix A

Main Appendix

Appendix B

External Source

The following information concerns several links that complement this thesis.

Figures
The list of appendix figures is as follows:
- Trends in Death Rates;
- Stage of Diagnosis;
- Triple Negative;
- Age-Specific Incidence Rates.

User Manual
The user manual can be accessed via our GitHub repository:
- User Manual.

Technical Manual
The technical manual can be accessed via our GitHub repository:
- Technical Manual.

User Profile
The user profile information is as follows:
- User Profile;
- Result 1;
- Result 2;
- Result 3;
- Result 4;
- Result 5.

User Experience
The following questionnaires can be accessed via our GitHub repository:
- Questionnaire 1;
- Questionnaire 2;
- Questionnaire 3;
- Questionnaire 4.

Scripts
The following scripts can be accessed via our GitHub repository:
- Script 1;
- Script 2.

Reports
The following reports can be accessed via our GitHub repository:
- Report 1;
- Report 2;
- Report 3;
- Report 4;
- Report 5;
- Report 6;
- Report 7;
- Report 8;
- Report 9;
- Report 10.
